The Importance of Idempotence

Every organization has some operations or processes that have to happen Exactly Once. Your employer needs to make sure they issue your paycheck exactly once. Your bank needs to make sure that paycheck is deposited in your account exactly once. Exactly Once isn’t something that just “traditional” enterprises like banks care about. Google needs to make sure your AdSense check is issued exactly once. Amazon needs to make sure your credit card is charged exactly once. Especially when there’s money involved, the company wants to make sure it gets handled correctly – Exactly Once.

In application (aka siloed) development, transactions are often used to ensure stuff happens Exactly Once, to good effect. But how do we guarantee Exactly Once now that we’re connecting systems together? Given how well transactions work inside applications, it’s not surprising that early attempts to guarantee Exactly Once between systems relied on distributed transactions, this time to not-so-good effect. Pat Helland summarized the problems with distributed transactions this way:

“The two-phase commit protocol will ensure perfect consistency given infinite time. I say that because it will wait and wait and wait until the transaction is resolved and then provide perfect consistency. Of course, while partitioned and waiting, arbitrary swaths of the application’s database may be locked up rendering the application unusable. For this reason, I’ve frequently referred to the two phase commit protocol as the “Anti-Availability Protocol”.”
Pat Helland, SOA and Newton’s Universe

So now we’re faced with a dilemma. Transactions are, for all practical purposes, unusable to ensure Exactly Once processing between connected systems. And yet, the business requirement to ensure Exactly Once hasn’t gone away. We need another way.

The first fallacy of distributed computing is that the network is reliable. It usually works, but usually isn’t a guarantee. If I send a message to a remote system but don’t get an acknowledgement, which got lost: the original message or the ack? There’s no way to know, so I have to send the message again. But if I send it again and it’s the ack that got lost, then the target system will receive the message multiple times.
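
In rough C#-flavored pseudocode, the sender’s only safe policy looks something like this (Send and WaitForAck are hypothetical stand-ins for whatever your transport provides):

    // Keep sending until an acknowledgement arrives. If it was the ack
    // that got lost, the target receives the message a second time --
    // at-least-once is the strongest guarantee the sender can offer.
    bool acked = false;
    while (!acked)
    {
        Send(message);               // may duplicate an earlier delivery
        acked = WaitForAck(timeout); // false means message *or* ack was lost
    }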

Since the network is not reliable, there’s no way to guarantee that a message will be delivered exactly once. The best we can do is ensure a message will be delivered at least once. However, that implies the target system will receive some messages multiple times. If we need to ensure Exactly Once, we need to make sure the target system won’t duplicate the work if it receives duplicate messages. In other words, we need the target system to be idempotent.

“In computer science, the term idempotent is used to describe method or subroutine calls which can safely be called multiple times, as invoking the procedure a single time or multiple times results in the system maintaining the same state i.e. after the method call all variables have the same value as they did before.

Example: Looking up some customer’s name and address are typically idempotent, since the system will not change state based on this. However, placing an order for a car for the customer is not, since running the method/call several times will lead to several orders being placed, and therefore the state of the system being changed to reflect this.”
Wikipedia, Idempotence (Computer Science)

Or more succinctly:

“Idempotent Means It’s OK to Arrive Multiple Times”
Pat Helland (again)

I can’t overstate the importance of designing your cross-system communication to be idempotent. If you care about ensuring Exactly Once, each step of your process has to be either transactional or idempotent, or you’ll be screwed. It’s interesting to note that each step has to be transactional *OR* idempotent, but it doesn’t need to be both. You can chain together multiple steps in a long business process, across multiple disparate systems, and as long as each step is either transactional or idempotent, you can guarantee Exactly Once across the entire process. In other words:

Transactional/Exactly Once == Idempotent/At Least Once

This implies that you can substitute an idempotent operation for a transactional operation, and still ensure Exactly Once.

Let’s look at an example. Typically you ensure Exactly Once processing with MSMQ by receiving messages within the scope of a transaction along with whatever other work you’re doing. But what if you can’t use a transactional receive, say because it’s a remote queue? What would an idempotent equivalent for transactional receive look like?

How about:

  1. Peek a message from the remote queue
  2. Insert the message into the target system database, using the unique MSMQ Message ID as the primary key
  3. Remove the message from the queue by ID

Each of those steps is idempotent. Peek is a read, which is naturally idempotent. Inserting the message into the database is idempotent, since we use the message ID as the primary key. As long as that ID is unique, we can never insert it into the database more than once. Finally, removing a message based on its unique ID is also naturally idempotent. Once the message is in the target system database, we can use traditional transactions to ensure it gets processed Exactly Once.
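
Here’s a minimal sketch of those three steps in C#, using System.Messaging and ADO.NET. The connection string and the InboundMessages table are placeholders, and error handling is pared down to the two duplicate cases that matter:

    using System;
    using System.Data.SqlClient;
    using System.IO;
    using System.Messaging;

    class IdempotentReceiver
    {
        // Placeholder connection string for the target system database
        const string ConnStr =
            "Data Source=.;Initial Catalog=Target;Integrated Security=SSPI";

        static void MoveOneMessage(MessageQueue queue)
        {
            // Step 1: Peek -- a read, so naturally idempotent
            Message msg = queue.Peek();

            // Step 2: Insert into the target database, keyed by the MSMQ
            // message ID. The primary key means a duplicate insert fails,
            // so the message lands in the table at most once.
            using (SqlConnection conn = new SqlConnection(ConnStr))
            {
                conn.Open();
                SqlCommand cmd = new SqlCommand(
                    "INSERT INTO InboundMessages (MessageId, Body) " +
                    "VALUES (@id, @body)", conn);
                cmd.Parameters.AddWithValue("@id", msg.Id);
                cmd.Parameters.AddWithValue("@body",
                    new StreamReader(msg.BodyStream).ReadToEnd());
                try
                {
                    cmd.ExecuteNonQuery();
                }
                catch (SqlException ex)
                {
                    if (ex.Number != 2627) // 2627 = primary key violation
                        throw;
                    // Duplicate key: an earlier attempt already stored this
                    // message. Fall through and dequeue it.
                }
            }

            // Step 3: Remove the message from the queue by its unique ID.
            // If an earlier attempt already removed it, there's nothing
            // left to do.
            try
            {
                queue.ReceiveById(msg.Id);
            }
            catch (InvalidOperationException)
            {
                // Message already gone -- removal is idempotent too
            }
        }
    }

If the process dies between steps 2 and 3, rerunning it peeks the same message again, hits the duplicate key on the insert, and still removes it from the queue – the message reaches the database Exactly Once either way.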

So we took a single transactional operation and turned it into a series of idempotent steps. Both ensure each message is processed Exactly Once. Given the choice, I’d rather write the transactional operation – it’s much less code since we can use existing infrastructure – aka the distributed transaction coordinator. But if the transactional infrastructure isn’t available, I’d rather write multiple idempotent steps and ensure Exactly Once than risk losing or duplicating messages.
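
For comparison, here’s roughly what the transactional version looks like when the queue is local and the DTC is available – again a sketch, with ProcessMessage standing in for whatever database work you’re doing:

    using System.Messaging;
    using System.Transactions;

    // Receive and process inside one distributed transaction. If anything
    // throws before Complete(), both the receive and the database work
    // roll back, and the message goes back on the queue.
    using (TransactionScope scope = new TransactionScope())
    {
        Message msg = queue.Receive(MessageQueueTransactionType.Automatic);
        ProcessMessage(msg); // hypothetical: enlists in the same transaction
        scope.Complete();
    }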

I’ve got more on this topic, but in the meantime think about this: How do you think durable messaging infrastructure like MSMQ ensures exactly once delivery? You can use that pattern, even if you’re not using durable messaging infrastructure.

The Worst of Both Worlds

David Pallmann of Neudesic responded to my comment that “Physically distributed but logically centralized” didn’t make any sense to me at all:

What exactly does this mean? To some this may sound like a contradiction.

This simply means that a bus is physically more like the point-to-point architecture (spread out, no hub) but functionally more like the hub-and-spoke architecture (pub-sub messaging, centralized configuration and activity tracking, easy change management).

Unfortunately, I wasn’t confused about the seemingly contradictory nature of these concepts. In other words, I understand the “what” and “how” of David’s physically distributed/logically centralized approach.

I don’t understand the “why”. As in, “why would you want to do this?” or “why do you think this would work at any significant scale?”.

If we check out Neudesic’s page on their ESB product (which David pointed me to) we find the following blurb:

Centralized Management
The distributed nature of service oriented programming can create a management nightmare. Neuron·ESB supports this distributed architecture while simultaneously centralizing monitoring and configuration.

SOA’s “distributed nature” is its primary strength. SOA’s not primarily about standards or ease-of-connectivity – though those obviously play a role. It’s about enabling decentralized decision making. Since you can’t be both centralized and decentralized, enforcing centralized management basically negates SOA’s primary strength. This seems like the worst of both worlds to me. All the hassle of distributed decision making combined with all the hassle of centralized management.

Yes, decentralized decision making can create a management nightmare. Personally, I find a management nightmare much more attractive than anything centralized approaches have ever delivered in the IT industry.

Dare Obasanjo recently wrote “If You Fight the Web, You Will Lose”. He was talking about the Web as a Platform, but it’s good general advice. Can you imagine applying the marketing blurb above to the Internet at large?

Centralized Management
The distributed nature of the Internet can create a management nightmare. Neuron·ESB supports this distributed architecture while simultaneously centralizing monitoring and configuration.

If the Internet can somehow get by without centralized management, why can’t you?

Throwing Gasoline on the Fire

Steve Vinoski has raised a bit of a flame war by admitting he has lost the ESB religion. Given that I’ve never been a fan of ESBs anyway, there’s a lot there that I agree with. In particular, I liked his description of “magical framework” middleware, his blaming enterprise architects for driving ESBs as the “single integration architecture” even though a single *anything* in the enterprise is untenable, and his point that flexibility means you don’t do any one thing particularly well.

However, Steve goes on to bash compiled languages and WS-* while suggesting the One True Integration Strategy™ is REST + <insert your favorite dynamic language here>, then acts surprised that the conversation degenerates into “us vs. them”. When you start by saying that compiled language proponents “natter on pointlessly”, I think you lose your right to later lament the deteriorating level of conversation.

All programming languages provide their own unique model of the execution environment.  Dynamic languages have a very different model than compiled languages. Arguing that this or that model is better for everyone, everywhere, in all circumstances seems unbelievably naive and arrogant at the same time.

On the other hand, I do agree with Steve’s point that most developers only know a single programming language, to their detriment. One-language developers often miss a better solution because their language of choice doesn’t provide the right semantics to solve the problem at hand. Developers could do a lot worse than learn a new language. And I don’t mean a C# developer should learn VB.

The most pressing example of picking the right language for the right problem today is multi-threading. Most languages – including dynamic languages – have shitty concurrency semantics. If you’re building an app to take advantage of many-core processing, “mainstream” languages like C#, Java and Ruby won’t help you much. But we’re starting to see languages with native concurrency semantics, like Erlang. Erlang is dynamically typed, but that’s not what makes it interesting. It’s interesting because of its native primitives for spawning tasks. I don’t see why you couldn’t add similar primitives for task spawning to a compiled functional language like F#.

As for REST vs. SOAP/WS-*, I thought it was interesting that Steve provided no rationale whatsoever for why you should avoid them. The more I listen to this pissing match of a debate, the more I think the various proponents are arguing over unimportant syntactical details when the semantics are basically the same. SOAP is just a way to add metadata to an XML message, much as HTTP headers are. WS-* provides a set of optional message-level capabilities for handling cross-cutting concerns like security. Past that, are the models really that different? Nope.

For system integration scenarios like Steve is talking about, I’m not sure how important any of the WS-* capabilities are. Security? I can get that at the transport layer (aka HTTPS). Reliable Messaging? If I do request/response (which REST excels at), I don’t need RM. Transactions? Are you kidding me? Frankly, the only capability you really need in this scenario is idempotence, and neither REST nor SOAP provides any standard mechanism to achieve that. (more on that in a later post)
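
That said, rolling your own idempotence over HTTP isn’t hard – it’s the same message ID trick as the MSMQ example above. One hedged sketch: have the caller mint a unique ID per logical operation, reuse it on every retry, and let the service dedup on it. The header name here is made up, since there’s no standard one:

    using System;
    using System.Net;

    // One ID per logical operation, reused across retries, so the service
    // can treat a duplicate POST as a no-op by keying on it (for example,
    // with a primary key, as in the MSMQ sketch above).
    string messageId = Guid.NewGuid().ToString();
    HttpWebRequest request =
        (HttpWebRequest)WebRequest.Create("http://example.com/orders");
    request.Method = "POST";
    request.Headers.Add("X-Message-Id", messageId); // non-standard header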

I understand that some vendors are taking the WS-* specs and building out huge centralized infrastructure products and calling them ESBs. I think Steve is primarily raging against that, and on that point I agree 100%. But Steve sounds like he’s traded one religion for another – “Born Again REST”. For me, picking the right tool for the job implies much less fanaticism than Steve displays in his recent posts.

The DevHawk 2007 World Tour

After spending almost all of fiscal year 07 (July ’06 thru June ’07) not traveling and not presenting, I’m going to be doing a few public talks to finish out the year. If you, dear reader, are going to one of these please drop me a line. Invariably, it’s the side meetings and discussions that are the most valuable at these conferences.

IT Architect Regional Conference 2007
October 15th – 16th, San Diego, CA

I’m a huge fan of IASA, so I’m thrilled to be doing their west regional conference. I’ve presented to a packed house for the local chapter before, so I think these folks will put on a good conference. They sure have a good selection of topics and speakers.

My session is called “Moving Beyond Industrial Software”. Here’s the abstract:

Computers have been instrumental in ushering in the post-industrial age. Yet, most enterprises today are run with an industrial mindset and the IT department is organized like a factory. This creates a tension between the forces of industrialization that define the organization and the forces of post-industrialization that define today’s marketplace. For example, our post-industrial world is becoming more decentralized by the day. Yet many organizations believe the key to a successful service oriented architecture – a very decentralized system design – is to have a central service repository.

In this session, Harry Pierson will examine this tension, get you thinking outside the industrial mindset and help you think about software development in a post-industrial way.

I’m very excited about this talk.

MS SOA & Business Process Conference
October 29th – November 2nd, Redmond, WA

I’m not presenting at MSSOABPC (that’s a mouthful) but it looks like most of my team is going. So if you’re going and want to hang out with the guys who are doing this stuff in the trenches @ MSIT, let me know. Also, I put out the call for anyone interested in a geek dinner. From the agenda, looks like they’re keeping us busy until 8pm every night Mon-Wed, so we can either a) have geek dinner Thursday or Friday or b) have geek beers after one of the receptions in the early part of the week.

patterns & practices Summit USA West
November 5th – 9th, Redmond, WA

I did the p&p Summit back in 2005, a very successful debut of my Developer 2.0 talk. (I’m doing that talk at a different conference this year, details below.) This year, I’m not 100% sure what I’m going to talk about yet. I’m currently slated to talk about the Rome project that I’m doing in MSIT, but given our current slow progress on that project, I’m probably going to talk about something else. I’m thinking either the “Moving Beyond Industrial Software” talk described above or the “Facing the Fallacies of Distributed Computing” talk described below. Any other suggestions?

DevTeach Vancouver 2007
November 26th – 30th, Vancouver, BC

This is a brand new experience for me. Frankly, I’d never heard of DevTeach before my friend Mario Cardinal suggested I submit a couple of sessions. Since it’s only a few hours’ drive away, I’m bringing the family along. We’ll see how that goes. And when I’m not doing my sessions or hanging out with the family, I might take in a session or two in the XNA track.

Here are the sessions I’m doing:

Developer 2.0
Finding Your Way in the Future of Software Development

The one constant in software development is change. Software development in 2007 is dramatically different than it was in 2000, which was in turn dramatically different than in 1993. You can be guaranteed that the platforms, languages, and tools will continue to evolve. Learn how Harry Pierson, Architect in Microsoft IT, believes software development is going to evolve in the next five years and what you must do today to remain competitive.

Facing the Fallacies of Distributed Computing
Sun Fellow Peter Deutsch is credited with authoring “The Eight Fallacies of Distributed Computing”. These are near-universal assumptions about distributed systems that “All prove to be false in the long run and all cause big trouble and painful learning experiences.” In this session, we will examine these fallacies in depth and learn how to avoid them on the Windows platform by leveraging Web Services, WCF and SQL Service Broker.

The Integration Business Case, continued

Nick responds to my visceral thoughts on the integration business case. There’s no point in excerpting it – go read the whole thing. I’ll wait.

It looks like for case #2, he added the ability to “change readily and inexpensively”, which is to say he made it overlap even further with #4 than it used to. He also changed #3 to make it clear that he was collecting metrics to give us “awareness of process efficiency”. That makes #3 overlap with #4 on efficiency instead of #1 on BI, but either way it’s still redundant.

So we’re still left with the business cases of Business Intelligence, Efficiency and Agility. Nick conflates Efficiency and Agility both in his original post and his follow-up, but I think it makes sense to separate them. I still stand by my original point that the business is only interested in directly funding Business Intelligence.

Nick is willing to bet a nice lunch that MSFT has invested more in improving operational efficiency than we have on BI in the past four years. He’s probably right, but he missed the point I was making. The business will readily invest in improving a specific process when it can measure the ROI of improving it. MSFT has lots of processes, and I’m sure most of them have significant room for improvement.

But Nick’s list isn’t about specific improvements. He explicitly wrote that he’s describing a scenario where “our systems are all optimally integrated”. Selling the business on generally improving efficiency is very different than selling the business on improving the efficiency of a specific process. I’d bet the same nice lunch that the vast majority – if not all – of the integration infrastructure running at MSFT was originally deployed as part of a specific business scenario that needed to be solved.

My point here is that most businesses are better at funding projects to meet specific business needs than they are at funding pure infrastructure projects.

As for agility: Martin Fowler once pointed out that adding flexibility means adding complexity. But chances are, you’ll be wrong about the flexibility you think you’ll need. So you end up with the additional complexity but none of the flexibility benefit. Martin recommends “since most of the time we get it wrong, just don’t put the flexibility in there”. Instead, you should strive for simplicity, since simpler systems are easier to understand and thus easier to change.

Does the same philosophy apply to process? I think so, though there is one thing I’d be willing to risk being wrong on. We all expect the steps in a process to change over time, so moving to a declarative model for process definition sounds like a good idea. Luckily, there’s existing platform infrastructure that helps you out here. But beyond that, I can’t think of a flexibility requirement that I’m so sure of that I’m willing to take on the additional complexity.

Again, I’m not saying efficiency or agility (or integration for that matter) are bad things. I’m saying they’re a tough sell to the business in the absence of specific scenarios. Selling the business on automating order processing is feasible. Selling the business on building out integration infrastructure because some future project will leverage it is much tougher. If you can sell them on it, either because the company is particularly forward thinking or because you can sell ice to Eskimos, then more power to you. But for the rest of us, it’s better to focus on specific scenarios that the business will value and keep the integration details under wraps.