The Importance of Idempotence

Every organization has some operations or processes that have to happen Exactly Once. Your employer needs to make sure they issue your paycheck exactly once. Your bank needs to make sure that paycheck is deposited in your account exactly once. Exactly Once isn’t something that just “traditional” enterprises like banks care about. Google needs to make sure your AdSense check is issued exactly once. Amazon needs to make sure your credit card is charged exactly once. Especially when there’s money involved, the company wants to make sure it gets handled correctly – Exactly Once.

In application (aka siloed) development, transactions are often used to ensure stuff happens Exactly Once, to good effect. But how do we guarantee Exactly Once now that we’re connecting systems together? Given how well transactions work inside applications, it’s not surprising that early attempts to guarantee Exactly Once between systems relied on distributed transactions, this time to not-so-good effect. Pat Helland summarized the problems with distributed transactions this way:

“The two-phase commit protocol will ensure perfect consistency given infinite time. I say that because it will wait and wait and wait until the transaction is resolved and then provide perfect consistency. Of course, while partitioned and waiting, arbitrary swaths of the application’s database may be locked up rendering the application unusable. For this reason, I’ve frequently referred to the two phase commit protocol as the ‘Anti-Availability Protocol’.”
Pat Helland, SOA and Newton’s Universe

So now we’re faced with a dilemma. Transactions are, for all practical purposes, unusable to ensure Exactly Once processing between connected systems. And yet, the business requirement to ensure Exactly Once hasn’t gone away. We need another way.

The first fallacy of distributed computing is that the network is reliable. It usually works, but usually isn’t a guarantee. If I send a message to a remote system but don’t get an acknowledgement, which got lost: the original message or the ack? There’s no way to know, so I have to send the message again. But if I send it again and it’s the ack that got lost, then the target system will receive the message multiple times.

Since the network is not reliable, there’s no way to guarantee that a message will be delivered exactly once. The best we can do is ensure a message will be delivered at least once. However, that implies the target system will receive some messages multiple times. If we need to ensure Exactly Once, we need to make sure the target system won’t duplicate the work if it receives duplicate messages. In other words, we need the target system to be idempotent.
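To make the lost-ack problem concrete, here is a minimal Python sketch, assuming a simulated network that drops each message or ack with some fixed probability. The sender retries until it sees an ack (at least once), and the receiver skips any message ID it has already applied (idempotent). All names here (`IdempotentReceiver`, `send_at_least_once`, the `paycheck-N` IDs) are hypothetical, and the in-memory set stands in for whatever durable record a real target system would keep.

```python
import random


class IdempotentReceiver:
    """Hypothetical target system that remembers which message IDs it has applied."""

    def __init__(self):
        self.processed_ids = set()
        self.balance = 0
        self.deliveries = 0  # counts every arrival, duplicates included

    def handle(self, msg_id, amount):
        self.deliveries += 1
        if msg_id in self.processed_ids:
            return  # duplicate: skip the work, the sender still gets an ack
        # In a real system, the state change and the ID record would be
        # committed together in one local transaction.
        self.balance += amount
        self.processed_ids.add(msg_id)


def send_at_least_once(receiver, msg_id, amount, loss_rate, rng):
    """Resend until an ack arrives. Either the message or the ack may be lost."""
    attempts = 0
    while True:
        attempts += 1
        if rng.random() < loss_rate:
            continue  # message lost: receiver never saw it, sender sees no ack
        receiver.handle(msg_id, amount)
        if rng.random() < loss_rate:
            continue  # ack lost: receiver did the work, sender can't tell
        return attempts


rng = random.Random(7)
receiver = IdempotentReceiver()
for i in range(100):
    send_at_least_once(receiver, f"paycheck-{i}", 10, loss_rate=0.3, rng=rng)

print(receiver.deliveries)  # at least 100; anything above 100 is a duplicate
print(receiver.balance)     # 1000: each paycheck applied exactly once
```

Remove the `processed_ids` check and the balance drifts upward with every lost ack, which is exactly the duplicate-work problem described above.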

“In computer science, the term idempotent is used to describe method or subroutine calls which can safely be called multiple times, as invoking the procedure a single time or multiple times results in the system maintaining the same state i.e. after the method call all variables have the same value as they did before.

Example: Looking up some customer’s name and address are typically idempotent, since the system will not change state based on this. However, placing an order for a car for the customer is not, since running the method/call several times will lead to several orders being placed, and therefore the state of the system being changed to reflect this.”
Wikipedia, Idempotence (Computer Science)
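The Wikipedia example translates directly into code. This is a minimal sketch with a hypothetical `OrderSystem` class: the lookup is read-only, the naive `place_order` adds a new order on every call, and `place_order_idempotent` (an illustrative variant, assuming the caller supplies a unique order ID) makes repeat calls harmless.

```python
class OrderSystem:
    """Hypothetical system contrasting idempotent and non-idempotent calls."""

    def __init__(self):
        self.customers = {42: ("Jane Doe", "123 Main St")}
        self.orders = []        # used by place_order
        self.orders_by_id = {}  # used by place_order_idempotent

    def lookup_customer(self, customer_id):
        # Read-only: any number of calls leaves the state unchanged.
        return self.customers[customer_id]

    def place_order(self, customer_id, car):
        # Not idempotent: every call appends another order.
        self.orders.append((customer_id, car))

    def place_order_idempotent(self, order_id, customer_id, car):
        # Idempotent: keyed by a caller-supplied ID, so repeats are no-ops.
        self.orders_by_id.setdefault(order_id, (customer_id, car))


system = OrderSystem()
for _ in range(3):  # the same request "arrives" three times
    system.lookup_customer(42)
    system.place_order(42, "sedan")
    system.place_order_idempotent("order-1", 42, "sedan")

print(len(system.orders))        # 3: three cars ordered
print(len(system.orders_by_id))  # 1: one car ordered
```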

Or more succinctly:

“Idempotent Means It’s OK to Arrive Multiple Times”
Pat Helland (again)

I can’t overstate the importance of designing your cross-system communication to be idempotent. If you care about ensuring Exactly Once, each step of your process has to be either transactional or idempotent, or you’ll be screwed. It’s interesting to note that you have to be transactional *OR* idempotent, but you don’t need to be both. You can chain together multiple steps in a long business process, across multiple disparate systems, and as long as each step is either transactional or idempotent, you can guarantee Exactly Once across the entire process. In other words:

Transactional/Exactly Once == Idempotent/At Least Once

This implies that you can substitute an idempotent operation for a transactional operation, and still ensure Exactly Once.

Let’s look at an example. Typically you ensure Exactly Once processing with MSMQ by receiving messages within the scope of a transaction along with whatever other work you’re doing. But what if you can’t use a transactional receive, say because it’s a remote queue? What would an idempotent equivalent for transactional receive look like?

How about:

  1. Peek a message from the remote queue
  2. Insert the message into the target system database, using the unique MSMQ Message ID as the primary key
  3. Remove the message from the queue by ID

Each of those steps is idempotent. Peek is a read, which is naturally idempotent. Inserting the message into the database is idempotent, since we use the message ID as the primary key. As long as that ID is unique, we can never insert it into the database more than once. Finally, removing a message based on its unique ID is also naturally idempotent. Once the message is in the target system database, we can use traditional transactions to ensure it gets processed Exactly Once.
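Here is a minimal sketch of the peek / insert / remove sequence in Python. It does not use MSMQ: `RemoteQueue` is a hypothetical in-memory stand-in for the remote queue, and an in-memory SQLite database plays the target system, with a hypothetical `inbox` table keyed on the message ID. `INSERT OR IGNORE` is SQLite’s way of making a repeat insert on the same primary key a no-op; other databases spell this differently. The usage at the end simulates a crash between steps 2 and 3 and shows that rerunning all three steps doesn’t duplicate anything.

```python
import sqlite3


class RemoteQueue:
    """Hypothetical in-memory stand-in for a remote queue; not the MSMQ API."""

    def __init__(self):
        self._messages = {}  # message ID -> body, in arrival order

    def send(self, msg_id, body):
        self._messages[msg_id] = body

    def peek(self):
        # Step 1: read the first message without removing it (None if empty).
        return next(iter(self._messages.items()), None)

    def remove_by_id(self, msg_id):
        # Step 3: removing an ID that is already gone is a harmless no-op.
        self._messages.pop(msg_id, None)


def insert_message(db, msg_id, body):
    # Step 2: the message ID is the primary key, so a repeat insert is ignored.
    with db:  # commits on success, rolls back on exception
        db.execute(
            "INSERT OR IGNORE INTO inbox (msg_id, body) VALUES (?, ?)",
            (msg_id, body),
        )


def transfer_one(queue, db):
    """Run peek / insert / remove for one message; False when the queue is empty."""
    peeked = queue.peek()
    if peeked is None:
        return False
    msg_id, body = peeked
    insert_message(db, msg_id, body)
    queue.remove_by_id(msg_id)
    return True


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inbox (msg_id TEXT PRIMARY KEY, body TEXT)")
queue = RemoteQueue()
queue.send("id-1", "deposit paycheck")
queue.send("id-2", "charge credit card")

# Simulate a crash between steps 2 and 3: id-1 is in the database but still queued.
insert_message(db, *queue.peek())

# After restart, rerun the steps until the queue drains; id-1 is not duplicated.
while transfer_one(queue, db):
    pass

print(db.execute("SELECT COUNT(*) FROM inbox").fetchone()[0])  # 2
```

Processing the rows in `inbox` would then happen inside ordinary local transactions, as described above.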

So we took a single transactional operation and turned it into a series of idempotent steps. Both approaches ensure each message is processed Exactly Once. Given the choice, I’d rather write the transactional operation – it’s much less code since we can use existing infrastructure, aka the distributed transaction coordinator. But if the transactional infrastructure isn’t available, I’d rather write multiple idempotent steps and ensure Exactly Once than risk losing or duplicating messages.

I’ve got more on this topic, but in the meantime think about this: How do you think durable messaging infrastructure like MSMQ ensures exactly once delivery? You can use that pattern, even if you’re not using durable messaging infrastructure.

Morning Coffee 122

  • Sorry for the posting lag. Had a few technical difficulties around here. In the process of moving hosts, so expect more glitches.
  • My talk at the p&p Summit on Monday went really well. At least, it felt good and the applause at the end felt genuine. I recorded the audio on my laptop, so I’ll be posting a Silverlight version as soon as I figure out how to adjust the levels so they’re somewhat consistent. Paraesthesia and #2872 have reactions.
  • Speaking of the p&p Summit, Scott Hanselman posted his ASP.NET MVC demo from his talk. Said ASP.NET MVC bits aren’t available yet, so you can’t, you know, run the demo for yourself. But at least you can review what the ASP.NET MVC code will look like.
  • I stopped by the SOA/BPM conference last week and saw Jon, Sam and Jesus among others. Spent quite a bit of time talking to Sam and his Neudesic colleagues about this “physically distributed/logically centralized” approach that I think is hogwash. It sounds to me like Neudesic’s approach is really federated, not centralized, though I’m not sure David Pallmann would agree. Federated makes much more sense to me than centralized.
  • Nick Malik continues his series on SOA Business Operations Model. I especially like his point that this isn’t a series of choices, you need to “look at your company and try to understand which model the business has selected.”
  • The first CTP of PowerShell 2.0 is out! Check out what’s new on the PowerShell team blog and Jeffrey Snover’s TechEd Presentation. (via Sam Gentile)
  • Soma announced updates to VC++ coming next year, including TR1 support and a “major” MFC upgrade to support creating native apps that look like Office, IE or VS. I get supporting TR1, but the idea that people are clamoring for MFC updates is kinda surprising. Many years ago when I first came to MSFT, a friend asked “But don’t you hate Microsoft?” to which I responded “No, I just hate MFC”. Obviously, not everyone agrees with that sentiment.
  • Steve Vinoski thinks there’s no hope for IT. Funny, I keep agreeing with Steve’s overall point but disagreeing with his reasoning. I still don’t buy the serendipity argument. I like compiled languages. And I think he’s overstating the amount of “real, useful guidance” for REST floating around. Basically, there’s “the book”.
  • In widely reported news, Windows Live launched their next generation services. Don’t bother with the press release, just go to the new WL home page.
  • Speaking of WL, Dare Obasanjo points to the Live Data Interactive SDK page where you can experiment with the WL Contacts REST API. It gives you a good sense of how the Web3S protocol works. Pretty well, IMO. However, how come the WL Contacts Schema doesn’t include some type of update timestamp for sync purposes? If you wanted to build, say, an Outlook to WL Contacts sync engine, you’d have to download the entire address book and grovel thru it for changes on every sync.
  • Speaking of Web3S, I’d love to see some info on how one might implement a service using Web3S. Yaron Goland positions Web3S as an alternative to APP that WL developed because they “couldn’t make APP work in any sane way for our scenarios”. I’m sure other folks have similar scenarios.

Morning Coffee 121

  • My daughter had her tonsils & adenoids out yesterday. It was a routine procedure and it went by-the-numbers, but any parent will tell you it’s hard to see your kid in a hospital bed.
  • Given the previous bullet, I’m not at the SOA/BPM conference for the big announcement. Don’t worry, there’s lots of other folks covering the news.
  • It was a crappy sports weekend in the Pierson house. Va Tech snatched defeat from the jaws of victory, Southern Cal never led at Oregon, the Capitals lost twice, and the Redskins got blown out by the Pats. At least the Caps won big yesterday in Toronto.
  • Speaking of the Capitals, Peter Bondra retired Monday. I still think it’s a travesty that he didn’t spend his whole career in DC, but I’ve made my peace with it.
  • Nick Malik has a great series on business operations models and how they apply to SOA. Regular readers should be unsurprised that I favor low standardization, though I can see the value of high integration. That makes the Coordinated Operating Model my fav, though I can see the benefit of the Diversified Model as well. I can’t wait to read what Nick has to say on changing models.
  • Speaking of Nick, I’m doing a roundtable with him on “Making SOA Work in the Enterprise” @ the Strategic Architect Forum. Should be fun. Sorry for the lack of linkage on this, but it’s an invite-only event.
  • Jezz Santos has a new series of white papers on building software factories. First up: “Packaging with Visual Studio 2005”.
  • Aaron Skonnard has a new whitepaper on using the WCF LOB Adapter SDK with BTS 2006 R2. I’ve been building one of these things recently, so I’m looking forward to checking that out. (via Sam Gentile)
  • Tim Ewald looks at Resource Oriented Architecture (when did ROA become a TLA?) and wonders “what if your problem domain is more focused on processes than data?” I wonder that all the time. (via Jesus Rodriguez)
  • It’s not just durable messaging – Libor Soucek also disagrees with my opinions on centralized control. I agree 100% with Libor that centralized management would make the operations folks’ lives “much, MUCH easier” as he puts it. However, that doesn’t make it feasible at any significant scale. Furthermore, I wouldn’t describe an approach that requires that “all services adopt [the] same common management interface” as “pragmatic”. Frankly, just the opposite.

The Worst of Both Worlds

David Pallmann of Neudesic responded to my comment that “Physically distributed but logically centralized” didn’t make any sense to me at all:

What exactly does this mean? To some this may sound like a contradiction.

This simply means that a bus is physically more like the point-to-point architecture (spread out, no hub) but functionally more like the hub-and-spoke architecture (pub-sub messaging, centralized configuration and activity tracking, easy change management).

Unfortunately, I wasn’t confused about the seemingly contradictory nature of these concepts. In other words, I understand the “what” and “how” of David’s physically distributed/logically centralized approach.

I don’t understand the “why”. As in, “why would you want to do this?” or “why do you think this would work at any significant scale?”.

If we check out Neudesic’s page on their ESB product (which David pointed me to) we find the following blurb:

Centralized Management
The distributed nature of service oriented programming can create a management nightmare. Neuron·ESB supports this distributed architecture while simultaneously centralizing monitoring and configuration.

SOA’s “distributed nature” is its primary strength. SOA’s not primarily about standards or ease-of-connectivity – though those obviously play a role. It’s about enabling decentralized decision making. Since you can’t be both centralized and decentralized, enforcing centralized management basically negates SOA’s primary strength. This seems like the worst of both worlds to me. All the hassle of distributed decision making combined with all the hassle of centralized management.

Yes, decentralized decision making can create a management nightmare. Personally, I find a management nightmare much more attractive than anything centralized approaches have ever delivered in the IT industry.

Dare Obasanjo recently wrote “If You Fight the Web, You Will Lose“. He was talking about the Web as a Platform, but it’s good general advice. Can you imagine applying the marketing blurb above to the Internet at large?

Centralized Management
The distributed nature of the Internet can create a management nightmare. Neuron·ESB supports this distributed architecture while simultaneously centralizing monitoring and configuration.

If the Internet can somehow get by without centralized management, why can’t you?

Morning Coffee 120

  • Doing these morning coffee posts is a lot tougher since I cut back my blog reading. Where I used to have no trouble finding 4-5 coffee-worthy items every day, these days I seem to only get 1-2, if that.
  • After starting off 3-0 and 100% on the PK, the Caps dropped four in a row and have been miserable on special teams. The special teams woes continued last night against the Lightning, but they still won. Caps went 0-4 on the powerplay, and coughed up a short handed goal. But they also went 3-3 on the PK, so I guess it wasn’t all bad. Maybe my mother will stop calling for Hanlon’s job now. It’s a long season and as Peerless Prognosticator points out, the rebuild isn’t over.
  • Jomo Fisher, who helped Scott Hanselman auto-merge assemblies, has been digging around in F# of late. As it turns out, he’s joining the F# team so I’m thinking it’s not a huge stretch for him. If you’re a C# developer interested in getting a handle on this new F# thing, his blog is a good place to start.
  • Speaking of F#, Don Syme posts about yet another new F# feature: Async Workflows. Workflow is a bad term here IMO since it can be easily confused with WF. Regardless of its name, Async Workflows is about making .NET’s Async Programming model a first class citizen in F#. Robert Pickering has a good post explaining how this new feature works.
  • Microsoft sure has a lot of multi-threading / async-programming tools coming out. In addition to F# Async Workflows, there’s the Concurrency and Coordination Runtime, Parallel LINQ and the Task Parallel Library. I would hope all this work eventually coalesces as a coherent product offering.
  • Now that F# is being “productized”, I wonder if the language evolution will slow down. Async workflows were introduced in F# 1.9.2.9. Other recent changes include Computation Expressions (v1.9.2), Use Bindings (v1.9.2) and Active Patterns (v1.9.1). F# seems to churn more in minor releases than C# does in major releases. Of course, that’s because F# was a research project, not a “real” product. Now that it’s going to be a product, will the rate of innovation slow?