Transactions in Workflow Foundation-land

I’ve been spending some quality time with SSB and WF of late. On balance, my opinion of both technologies is very positive, though each has some notable warts. For Service Broker, they got the transactional messaging semantics right, but much of the lower-level connection management (what SSB calls “routes”) is clumsy to deal with. For Workflow Foundation, the execution model is amazingly flexible. Unfortunately, WF’s support for transactions is significantly more rigid.

If you’re building an SSB app, your typical execution thread looks like this:

  1. Start a transaction.
  2. Receive message(s) from top of the queue.
  3. Execute service business logic. Obviously, this varies from service to service but it typically involves reading and writing data in the database as well as sending messages to other services.
  4. Commit the transaction
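
The four steps above can be sketched in a few lines. This is only an illustrative model, not Service Broker code: it assumes Python’s built-in sqlite3 module as a stand-in for the database, and the table names (queue, orders) and the helpers make_db, send and process_next are hypothetical names invented for the example.

```python
import sqlite3


def make_db():
    # isolation_level=None puts sqlite3 in autocommit mode, so the
    # BEGIN/COMMIT/ROLLBACK statements below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE orders (body TEXT)")
    return conn


def send(conn, body):
    # Hypothetical helper: append a message to the stand-in queue table.
    conn.execute("INSERT INTO queue (body) VALUES (?)", (body,))


def process_next(conn, business_logic):
    """Process one message; return False if the queue was empty."""
    conn.execute("BEGIN")                      # 1. start a transaction
    try:
        row = conn.execute(                    # 2. receive from the top of the queue
            "SELECT id, body FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return False
        conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        business_logic(conn, row[1])           # 3. run the service's business logic
        conn.execute("COMMIT")                 # 4. commit receive and updates together
        return True
    except Exception:
        # Any failure undoes both the receive and the data changes,
        # so the message goes back to the top of the queue.
        conn.execute("ROLLBACK")
        raise
```

Because the receive and the business logic share one transaction, a handler that throws leaves the message on the queue, which is exactly the behavior that makes poison messages a problem later on.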

When I sat down to marry SSB and WF, I naively assumed I could simply use WF for step three above. Alas, that turns out to be impossible. This thread on MSDN Forums has most of the gory details, but the short version is that WF does not support flowing host managed transactions into the workflow instance. As per Joel West in the aforementioned thread:

“[T]he WF runtime in V1 only supports flowing in a transaction on WorkflowInstance.Unload. There are various ways that you could try and hack this (with a custom persistence service or WorkflowCommitWorkBatchService) but if you do this it won’t work correctly 100% of the time and the times when it fails (error conditions or failures causing the tx to rollback) will be exactly when you are expecting transactional consistency.

Bottom line – the only way to make this work is to call WorkflowInstance.Unload inside your transaction scope.  This was the best that we could do in V1 to try and enable this pattern in some form.  Not always ideal but it can be made to work for most scenarios that require usage of an external transaction.”

So the WF compatible execution thread looks like this:

  1. Start a transaction
  2. Receive message(s) from the top of the queue
  3. Load/Create the associated workflow instance for the received messages
    • All messages received are guaranteed to be from the same SSB conversation group, which is roughly analogous to a WF instance, so this turns out to be fairly easy
  4. Enqueue the received message in the workflow instance
  5. Unload the workflow instance
  6. Commit the host transaction
  7. Reload the workflow instance
  8. Run the workflow instance (note, I’m using the manual scheduling service)
    • Workflow instance creates a transaction if needed
  9. Unload the workflow instance (typically done via UnloadOnIdle in the persistence service)
    • Assuming the workflow instance needed a transaction, it gets committed after unload
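
To make the shape of this concrete, here is a minimal sketch of the two-transaction flow. It is a model under stated assumptions, not the WF or SSB API: sqlite3 stands in for both the SSB queue and the workflow persistence store, and the tables (ssb_queue, wf_inbox, wf_output) and functions (make_store, send, transfer, run_instance) are hypothetical. Loading and unloading the instance is reduced to reading and writing the instance’s persisted inbox.

```python
import sqlite3


def make_store():
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
    conn.executescript("""
        CREATE TABLE ssb_queue (id INTEGER PRIMARY KEY, conv_group TEXT, body TEXT);
        CREATE TABLE wf_inbox  (id INTEGER PRIMARY KEY, instance TEXT, body TEXT);
        CREATE TABLE wf_output (instance TEXT, body TEXT);
    """)
    return conn


def send(conn, conv_group, body):
    conn.execute("INSERT INTO ssb_queue (conv_group, body) VALUES (?, ?)",
                 (conv_group, body))


def transfer(conn):
    """Transaction 1 (host managed): move one conversation group's
    messages from the queue into its workflow instance's inbox."""
    conn.execute("BEGIN")
    try:
        head = conn.execute(
            "SELECT conv_group FROM ssb_queue ORDER BY id LIMIT 1").fetchone()
        if head is None:
            conn.execute("COMMIT")
            return None
        group = head[0]
        # One conversation group maps to one workflow instance; order is preserved.
        conn.execute(
            "INSERT INTO wf_inbox (instance, body) "
            "SELECT conv_group, body FROM ssb_queue WHERE conv_group = ? ORDER BY id",
            (group,))
        conn.execute("DELETE FROM ssb_queue WHERE conv_group = ?", (group,))
        conn.execute("COMMIT")
        return group
    except Exception:
        conn.execute("ROLLBACK")
        raise


def run_instance(conn, instance, activity):
    """Transaction 2 (workflow managed): process the instance's inbox.
    Returns False and rolls back to the persisted inbox on failure."""
    conn.execute("BEGIN")
    try:
        rows = conn.execute(
            "SELECT id, body FROM wf_inbox WHERE instance = ? ORDER BY id",
            (instance,)).fetchall()
        for msg_id, body in rows:
            conn.execute("INSERT INTO wf_output VALUES (?, ?)",
                         (instance, activity(body)))
            conn.execute("DELETE FROM wf_inbox WHERE id = ?", (msg_id,))
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        return False
```

In this model, a message that makes activity raise stays in wf_inbox for its own instance after the rollback, while ssb_queue is already empty of that conversation group and other instances keep running.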

Basically, you use two transactions: one host-managed transaction to move the message from SSB to the WF instance, and one WF-managed transaction to process the message. The need for two transactions instead of one is unfortunate, but required given the current design of WF. And frankly, given the importance and difficulty of transaction management, I’m not that surprised that WF has hard-coded transaction semantics. Trying to build a generic transaction flow model that would work in the myriad scenarios WF is targeting would have been extremely difficult. At least there is a workaround, even if it means using two transactions and loading and unloading the workflow instance twice.

However, there is a silver lining to the two-transaction approach: two unexpected benefits when dealing with poison messages. First, SSB doesn’t have a dead letter queue like MSMQ does. Moving a poison message to a dead letter queue would break SSB’s exactly-once, in-order semantics. (MSMQ doesn’t guarantee in-order delivery.) But moving all messages into the WF instance gets them out of the main SSB queue, so poison messages don’t continue to get processed over and over.

Second, because the workflow instance is persisted after the messages are enqueued, there’s a representation of the workflow after the message is received but before the message is processed. If there’s a poison message, attempting to process the message will fail and roll back to this state. This persisted workflow instance could be sent to a developer who could step through it to determine the cause of the error. We could even have developer versions of runtime workflow services so we could read remote data and simulate data updates. I wouldn’t want the developer updating production data in this way, but it would be great for troubleshooting issues.
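
As a rough illustration of that second benefit, the sketch below models a persisted workflow instance as a JSON snapshot. Everything here is hypothetical (the WorkflowInstance class, the dict used as a persistence store, and the persist, load and run helpers); real WF persistence works differently. The point is only that a failed run leaves the pre-processing snapshot untouched, so it can be handed to someone for debugging.

```python
import json


class WorkflowInstance:
    """Toy stand-in for a workflow instance: an inbox plus some state."""

    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.inbox = []
        self.state = {"processed": []}


def persist(store, wf):
    # "Unload": serialize the instance into the store (a plain dict here).
    store[wf.instance_id] = json.dumps({"inbox": wf.inbox, "state": wf.state})


def load(store, instance_id):
    # "Load": rebuild the instance from its last persisted snapshot.
    data = json.loads(store[instance_id])
    wf = WorkflowInstance(instance_id)
    wf.inbox, wf.state = data["inbox"], data["state"]
    return wf


def run(store, instance_id, activity):
    """Process every queued message; persist only if all of them succeed."""
    wf = load(store, instance_id)
    try:
        while wf.inbox:
            msg = wf.inbox.pop(0)
            wf.state["processed"].append(activity(msg))
    except Exception:
        # Nothing is written back, so the store still holds the snapshot
        # taken after the messages were enqueued but before processing.
        return False
    persist(store, wf)
    return True
```

After a failed run, the snapshot in the store can be copied to a development machine and replayed through run with a debugger attached and a stubbed-out activity, without touching production data.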

Comments:

The WF transaction model makes more sense if you come from a BizTalk background (like me), because BizTalk has essentially the same restriction. That said, BizTalk is far, far more flexible in its transaction facilities, but you still have to remember that message receive and send operations operate in completely separate transactions from orchestrations. However, this is improved in BizTalk because you have the MessageBox right in the middle of the messaging engine and the orchestration engine, which makes implementing your kind of scenario completely natural even under the two-transaction separation. I'm not quite sure I agree about the whole poison message thingie, though. First of all, handling "real" poison messages (i.e. malformed ones that you really need to discard) requires breaking in-order delivery semantics if you hope to keep processing. The only way to avoid that, if the problematic message is important, is to completely stop processing, fix the offending message(s) and resume processing. And this, I think, is pretty much true across the board (not specific to SSSB or MSMQ). Second, MSMQ does support in-order delivery, afaik, for *messages sent in the same transaction*. This is key because MSMQ doesn't have the notion of a conversation/dialog like SSSB does (though it's certainly possible to implement it if you really want it). That said, I think you may be talking more about "in-order processing", which is a completely different thing, and which you seem to be dismissing. I'm not quite sure why you'd care so much about in-order delivery if you're not doing in-order processing after that (i.e. message 2 might finish processing before message 1 does), so that might be something to consider.
I'm not very deep on BizTalk, but I'm working on it. That said, comparing the high-level TX capabilities of a product like BizTalk to the low-level TX capabilities of a technology like WF is an apples-to-oranges comparison. From what I understand about MessageBox, it sounds fairly similar to SSB, which would make an SSB/WF implementation conceptually similar to BizTalk. But I need to learn much more about BizTalk. I agree 100% that poison messages need to be fixed before you can resume processing. But the silver lining of the two-transaction approach to SSB & WF is that you're only stalling the instance with the poison message. If you left the poison message in the SSB queue, it would keep getting picked up and throwing an exception until an operator came along to do something. If you move it to WF, then it will only affect the instance that message is intended for. I'm not dismissing in-order processing at all. It's one of the primary values of SSB in my opinion. Assuming WF doesn't reorder messages in a queue, moving them from SSB to WF wouldn't break the in-order semantics.
Harry, I just realized I misread your original post and didn't notice that you were delivering all messages in a single conversation group to a single workflow instance. In that case, you're probably maintaining ordered processing. FWIW, it seems you're implementing the kind of solution that is usually handled in BizTalk through a feature known as Convoys (look them up, the details might be useful to you at least for ideas, I think). BTW, the BizTalk MessageBox is far more than SSB, though they share some features, I think. Mostly, the MsgBox doesn't only do queuing, it's also the underlying Pub/Sub and content-based routing engine in BizTalk. I do think it might be possible to implement the message box on top of SSB, and maybe the BizTalk team might explore that for possible performance improvements in future releases. And yes, I realize I'm a pita sometimes nagging about BizTalk (but I do think for the kind of work you're doing, knowing how BizTalk manages some of it would prove useful).
Don't worry. I'm a PITA sometimes nagging about SSB! :)