The Importance of Idempotence

Every organization has some operations or processes that have to happen Exactly Once. Your employer needs to make sure they issue your paycheck exactly once. Your bank needs to make sure that paycheck is deposited in your account exactly once. Exactly Once isn’t something that just “traditional” enterprises like banks care about. Google needs to make sure your AdSense check is issued exactly once. Amazon needs to make sure your credit card is charged exactly once. Especially when there’s money involved, the company wants to make sure it gets handled correctly – Exactly Once.

In application (aka siloed) development, transactions are often used to ensure stuff happens Exactly Once, to good effect. But how do we guarantee Exactly Once now that we’re connecting systems together? Given how well transactions work inside applications, it’s not surprising that early attempts to guarantee Exactly Once between systems relied on distributed transactions, this time to not-so-good effect. Pat Helland summarized the problems with distributed transactions this way:

“The two-phase commit protocol will ensure perfect consistency given infinite time.  I say that because it will wait and wait and wait until the transaction is resolved and then provide perfect consistency.   Of course, while partitioned and waiting, arbitrary swaths of the application’s database may be locked up rendering the application unusable.  For this reason, I’ve frequently referred to the two phase commit protocol as the “Anti-Availability Protocol”. “
Pat Helland, SOA and Newton’s Universe

So now we’re faced with a dilemma. Transactions are, for all practical purposes, unusable to ensure Exactly Once processing between connected systems. And yet, the business requirement to ensure Exactly Once hasn’t gone away. We need another way.

The first fallacy of distributed computing is that the network is reliable. It’s usually works, but usually isn’t a guarantee. If I send a message to a remote system but don’t get an acknowledgement, which got lost: the original message or the ack? There’s no way to know, so I have to send the message again. But if I send it again and it’s the ack that got lost, then the target system will receive the message multiple times.

Since the network is not reliable, there’s no way to guarantee that a message will be delivered exactly once. The best we can go is ensure a message will be delivered at least once. However, that implies the target system will receive some messages multiple times. If we need to ensure Exactly Once, we need to make sure the target system won’t duplicate the work if it receives duplicate messages. In other words, we need the target system to be idempotent.

“In computer science, the term idempotent is used to describe method or subroutine calls which can safely be called multiple times, as invoking the procedure a single time or multiple times results in the system maintaining the same state i.e. after the method call all variables have the same value as they did before.

Example: Looking up some customer’s name and address are typically idempotent, since the system will not change state based on this. However, placing an order for a car for the customer is not, since running the method/call several times will lead to several orders being placed, and therefore the state of the system being changed to reflect this.”
Wikipedia, Idempotence (Computer Science)

Or more succinctly:

“Idempotent Means It’s OK to Arrive Multiple Times”
Pat Helland (again)

I can’t overstate the importance of designing your cross-system communication to be idempotent. If you care about ensuring Exactly Once, each step of your process has to be either transactional or idempotent, or you’ll be screwed. It’s interesting to note that you have to be transactional *OR* idempotent, but not both. You can chain together multiple steps in long business process, across multiple disparate systems, but as long as each step is either transactional or idempotent, you can guarantee Exactly Once across the entire process. In other words:

Transactional/Exactly Once == Idempotent/At Least Once

This implies that you can substitute an idempotent operation for a transactional operation, and still ensure Exactly Once.

Let’s look at an example. Typically you ensure Exactly Once processing with MSMQ by receiving messages within the scope of a transaction along with whatever other work you’re doing. But what if you can’t use a transactional receive, say because it’s a remote queue? What would an idempotent equivalent for transactional receive look like?

How about:

  1. Peek a message from the remote queue
  2. Insert the message into the target system database, using the unique MSMQ Message ID as the primary key
  3. Remove the message from the queue by ID

Each of those steps is idempotent. Peek is a read, which is naturally idempotent. Inserting the message into the database is idempotent, since we use the message ID as the primary key. As long as that ID is unique, we can never insert it into the database more than once. Finally, removing a message based on it’s unique ID is also naturally idempotent. Once the message is in the target system database, we can use traditional transactions to ensure it gets processed Exactly Once.

So we took a single transactional operation and turned it into a series of idempotent steps. Both ensure each message is processed Exactly Once. Given the choice, I’d rather write the transactional operation – it’s much less code since we’re we can use existing infrastructure – aka the distributed transaction coordinator. But if the transactional infrastructure isn’t available, I’d rather write multiple idempotent steps and ensure Exactly Once rather than risk losing or duplicating messages.

I’ve got more on this topic, but in the meantime think about this: How do you think durable messaging infrastructure like MSMQ ensures exactly once delivery? You can use that pattern, even if you’re not using durable messaging infrastructure.

Morning Coffee 122

  • Sorry for the posting lag. Had a few technical difficulties around here. In the process of moving hosts, so expect more glitches.
  • My talk at the p&p Summit on Monday went really well. At least, it felt good and the applause at the end felt genuine. I recorded the audio on my laptop, so I’ll be posting a Silverlight version as soon as I figure out how to adjust the levels so their somewhat consistent. Paraesthesia and #2872 have reactions.
  • Speaking of the p&p Summit, Scott Hanselman posted his ASP.NET MVC demo from his talk. Said ASP.NET MVC bits aren’t available yet, so you can’t, you know, run the demo for yourself. But at least you can review what the ASP.NET MVC code will look like.
  • I stopped by the SOA/BPM conference last week and saw Jon, Sam and Jesus among others. Spent quite a bit of time talking to Sam and his Neudesic colleagues about this “physically distributed/logically centralized” approach that I think is hogwash. It sounds to me like Neudesic approach is really federated not centralized, though I’m not sure David Pallmann would agree. Federated makes much more sense to me than centralized.
  • Nick Malik continues his series on SOA Business Operations Model. I especially like his point that this isn’t a series of choices, you need to “look at your company and try to understand which model the business has selected.”
  • The first CTP of PowerShell 2.0 is out! Check out what’s new on the PowerShell team blog and Jeffrey Snover’s TechEd Presentation. (via Sam Gentile)
  • Soma announced updates to VC++ coming next year, including TR1 support and a “major” MFC upgrade to support creating native apps that look like Office, IE or VS. I get supporting TR1, but the idea that people are clamoring for MFC updates is kinda surprising. Many years ago when I first came to MSFT, a friend asked “But don’t you hate Microsoft?” to which I responded “No, I just hate MFC”. Obviously, not everyone agrees with that sentiment.
  • Steve Vinoski thinks there’s no hope for IT. Funny, I keep agreeing with Steve’s overall point but disagreeing with his reasoning. I still don’t buy the serendipity argument. I like compiled languages. And I think he’s overstating the amount of “real, useful guidance” for REST floating around. Basically, there’s “the book“.
  • In widely reported news, Windows Live launched their next generation services. Don’t bother with the press release, just go to the new WL home page.
  • Speaking of WL, Dare Obasanjo points to the Live Data Interactive SDK page where you can experiment with the WL Contacts REST API. It gives you a good sense of how the Web3S protocol works. Pretty well, IMO. However, how come WL Contacts Schema doesn’t include some type of update timestamp for sync purposes? If you wanted to build say a Outlook to WL Contacts sync engine, you’d have to download the entire address book and grovel thru it for changes every sync.
  • Speaking of Web3S, I’d love to see some info on how one might implement a service using Web3S. Yaron Goland positions Web3S as an alternative to APP that WL developed because they “couldn’t make APP work in any sane way for our scenarios”. I’m sure other folks have similar scenarios.