Morning Coffee 135

  • Bill Gates does his last CES Keynote, and we announce a PC that looks like a purse?
  • News that Warner Brothers is going exclusively Blu-Ray is disappointing. However, I’m convinced that neither side will win this format war but that online downloads will trump both. Obviously, XBLM is a significant player in this space, but the market is crowding up quickly. Netflix apparently will unveil a new set-top box @ CES to let you watch HD movies via the Internet.
  • Don Syme has a roundup of posts by John Liao about F#. Mostly, WPF + F# with a couple of ASP.NET 2.0 posts and one on XML.
  • Speaking of F#, Stephan Tolksdorf has been working on an F# port of MS Research’s Parsec library called FParsec. Parsec is a “monadic parser combinator library”, something I have little experience with, so I’ve gone back to some source research on the topic, which I hope to blog at length about soon.
  • Steve Vinoski talks about serendipitous reuse in his latest Internet Computing article. I’m not a believer in reuse in the enterprise, serendipitous or otherwise, but I liked the conclusion to Steve’s article when he wrote “It’s highly ironic that many enterprise architects seek to impose centralized control over their distributed organizations. In many cases, such centralization is a sure recipe for failure.” Also, his point that “control without controlling” works sounds vaguely familiar.
  • Update: This is really Morning Coffee 136, but I don’t want to change the title since it’s part of the URL

Morning Coffee 134

  • Bill de Hora responds to a few of my Durable and RESTful ideas. He points out that relying on a client-generated ID can be troublesome, and recommends using multiple identifiers – one created by the sender, one by the receiver and one representing the message exchange itself. However, the sender ID is vulnerable to client bugs & tampering as Bill points out, and neither the receiver ID nor the exchange ID can be used to determine if a given message is a duplicate. If you don’t trust the sender, is it even possible to determine if a given message is a duplicate?
  • Pablo Castro confirms that there are “practical limits” to what ADO.NET Data Services can do with respect to idempotence. Nothing in his post was surprising, though I hope it will be more explicitly called out in the final docs. Developers used to the comforting protection of a transaction may be in for a rude awakening.
  • Dare Obasanjo has a great post comparing the new features in C# 3.0 to dynamic languages like IronPython. I believe many of the productivity aspects of dynamic languages have little to do with being dynamic.
  • Pat Helland noodles on durability and messaging, two topics near and dear to my heart (probably from working with him for a couple of years). I’m not sure where he’s going with this – his conclusion that “Basically, big, complex, and distributed system are big, complex, and distributed” isn’t exactly ground-breaking. But his point that “durable” isn’t a binary concept is worth more consideration. Also, his description of IMS only looking at the effects of a committed transaction is very similar to how web sites work, though obviously HTTP isn’t durable so you can’t make event horizon optimizations like IMS did.
  • Tangentially related, Werner Vogels discusses the idea of eventually consistent distributed databases. Today, that’s a problem mostly only Internet-scale sites like Amazon deal with. In the near future of continued data explosion + manycore, we’ll all have to deal with it.
  • Nick Malik argues that categorizing enterprise applications by lifecycle is much less useful than categorization based on organizational impact. He might also need a new chair.
  • Jesus Rodriguez digs into one of SSB’s new features in SQL 2008: conversation priorities.
  • Arnon Rotem-Gal-Oz and Sam Gentile are mixing it up over the definition of SOA. Sam thinks SOA has to include business drivers and Arnon doesn’t. I’m with Sam on this, defining “SOA” independently from “Applying SOA” seems pointless. Then again, rigorously defining SOA – much less arguing about said definition – seems like a waste of time in the first place IMHO.
  • Wow, this guy Zed is mad at the Ruby community.
  • Andrew Baron has 8 Reasons Why The TV Studios Will Die. Personally, I think reason #2 – Expendable Middle-Person – is the most important. If content producers can reach consumers directly, what value-add will the networks provide? (via United Hollywood)

Durable and RESTful

A while back I wondered if it’s still REST if you don’t use HTTP. The reason I wondered that is because, like many, I’ve become disillusioned with the WS-* stack over time and see REST as a viable alternative to all that spec-driven complexity. However, just because I’m looking to REST doesn’t mean I’m willing to give up on durable messaging. So I shouldn’t be asking “can I do REST without HTTP?” I should be asking “what protocol can I use to do durable messaging with REST?”

It turns out HTTP is just fine for RESTful durable messaging, if you take the time to make your POSTs idempotent. There’s even an IETF RFC that builds on HTTP and specifies a mechanism to do it.

As I wrote last month, idempotence is critically important to ensuring “things” happen exactly once when connecting disparate systems together. At the end of that post, I asked you, dear reader, to contemplate just how durable messaging systems ensure exactly once delivery. They do it by assigning each message to be delivered a unique identifier. Any non-idempotent operation can be made idempotent with unique identifiers and a message ID log.

“Not Idempotent:
Withdrawing $1 Billion.
If Haven’t Yet Done Withdrawal #XYZ for $1 Billion,
Then Withdraw $1 Billion and Label as #XYZ”
Pat Helland

For example, when you send a message in MSMQ, it’s assigned a 20 byte identifier which is “unique within your enterprise.” 1 If the destination system receives multiple messages with the same message ID, it knows they are duplicates and can safely toss all but one of the messages with the same ID. Exactly once, no transactions.
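Pat's withdrawal example can be sketched in a few lines. This is only an illustration of the "unique identifier plus message ID log" idea, not how MSMQ is implemented: the `Account` class and its in-memory `seen_ids` set are hypothetical stand-ins, and a real system would keep the log in a durable store.

```python
class Account:
    """Hypothetical account whose withdrawals are keyed by a message ID."""

    def __init__(self, balance):
        self.balance = balance
        self.seen_ids = set()  # the message ID log (in memory for this sketch)

    def withdraw(self, message_id, amount):
        """Apply the withdrawal once per message_id; repeats are ignored."""
        if message_id in self.seen_ids:
            return False  # duplicate delivery: toss it
        self.seen_ids.add(message_id)
        self.balance -= amount
        return True


account = Account(100)
first = account.withdraw("XYZ", 10)   # applied
second = account.withdraw("XYZ", 10)  # same ID, so treated as a duplicate
```

The second call leaves the balance untouched because withdrawal #XYZ is already in the log.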

While many operations in REST are naturally idempotent, using REST doesn’t magically make all your operations idempotent, contrary to popular belief. Have you ever seen a message like “please don’t press submit order twice” on the checkout page of an e-commerce website? It’s there because POST is not naturally idempotent and the site hasn’t taken any extra steps to identify duplicate POSTs. If the site embedded a unique ID in a hidden form field, it could use that to identify duplicate orders.

If you’re a RESTifarian, haven’t you seen this approach somewhere before?

Given that POST isn’t naturally idempotent, I think it’s kinda surprising that new resources are created in AtomPub by POSTing them to a collection rather than PUTting them to a specific URL. RESTful Web Services specifically points out that PUT is idempotent, so I wonder why AtomPub uses POST. I’d guess most AtomPub implementations (aka blogs) aren’t much concerned about ensuring Exactly Once. If a blog entry gets posted twice, you delete one and go on with your life.

However, if you wanted to use AtomPub and ensure Exactly Once, you can use the fact that Atom entries must contain exactly one ID element which as per the spec must be universally unique. From reading the Atom spec, the ID element seems primarily designed for Atom feed consumers, but AtomPub servers could also use it as an “idempotence identifier”, similar to how MSMQ uses the message ID. If you end up with multiple entries with the same entry ID, discard all but one.
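Here is a rough sketch of what using the entry ID as an idempotence identifier might look like. The `Collection` class and the sample entry are made up for illustration and this is not a real AtomPub server; the only things taken from the Atom spec are the namespace and the `id` element.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def entry_id(entry_xml):
    """Crack the entry document and return the text of its atom:id element."""
    root = ET.fromstring(entry_xml)
    return root.findtext(ATOM_NS + "id")


class Collection:
    """Hypothetical collection that keeps one entry per atom:id."""

    def __init__(self):
        self.entries = {}

    def post(self, entry_xml):
        eid = entry_id(entry_xml)
        if eid in self.entries:
            return False  # same entry ID seen before: discard the repeat
        self.entries[eid] = entry_xml
        return True


ENTRY = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    "<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>"
    "<title>Hello</title>"
    "</entry>"
)

collection = Collection()
created_first = collection.post(ENTRY)
created_again = collection.post(ENTRY)  # a retried POST of the same entry
```

Note that the server has to parse the body before it can tell whether the POST is a repeat.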

So by creating a unique identifier on the client side and logging that identifier on the server side, we can make any REST service idempotent. We can make it a durable service if we write the outgoing message – with the message ID we generate – to a durable store before trying to send it. If you write it to a durable store within the scope of a local transaction, you’re even closer to duplicating MSMQ’s functionality, yet the only protocol requirement beyond vanilla HTTP is having a unique message ID.
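A minimal sketch of the sending side, assuming a local SQLite database as the durable store. The `orders` and `outbox` tables and the `place_order` function are hypothetical names; the point is only that the business write and the outgoing message (with its generated ID) commit or roll back together.

```python
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open the local store; pass a file path to make it actually durable."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, item TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "message_id TEXT PRIMARY KEY, body TEXT, sent INTEGER DEFAULT 0)"
    )
    conn.commit()
    return conn


def place_order(conn, item):
    """Record the order and the message to send in one local transaction."""
    message_id = str(uuid.uuid4())  # a GUID, generated on the client side
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        conn.execute(
            "INSERT INTO outbox (message_id, body) VALUES (?, ?)",
            (message_id, item),
        )
    return message_id


conn = open_store()
mid = place_order(conn, "widget")
pending = conn.execute(
    "SELECT message_id FROM outbox WHERE sent = 0"
).fetchall()
```

Anything still in `outbox` with `sent = 0` is a message that has been promised but not yet acknowledged.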

The one problem with the Atom entry ID approach is that it requires cracking the message in order to see if we should process it. For REST services, I would think we’d want to stick the idempotence identifier in an HTTP header. We already use headers to implement conditional GET, why not a header for what amounts to conditional POST?

Turns out such a header exists in the AS2 spec, i.e. “MIME-Based Secure Peer-to-Peer Business Data Interchange Using HTTP”. AS2 defines a Message-Id HTTP header which “SHOULD be globally unique”. In the case of an HTTP error, AS2 specifies the “POST operation with identical content, including same Message-ID, SHOULD be repeated” and that “Servers SHOULD be prepared to receive a POST with a repeated Message-ID.” I assume this implies a server shouldn’t process a message with the same ID twice.
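As a sketch, here is how a client might attach such a header using Python's standard library. The URL is a placeholder and the ID is just a GUID string (I haven't tried to follow AS2's own formatting rules for the value), so treat this as an illustration of "same content, same Message-ID on retry" rather than an AS2 implementation. Nothing is actually sent.

```python
import urllib.request
import uuid

# Hypothetical endpoint; .invalid never resolves, and we only build requests.
URL = "http://example.invalid/orders"


def build_post(message_id, body):
    """Build a POST carrying the idempotence identifier in a header."""
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Message-Id": message_id, "Content-Type": "text/plain"},
        method="POST",
    )


message_id = str(uuid.uuid4())
body = b"order 42"

first_try = build_post(message_id, body)
retry = build_post(message_id, body)  # identical content, identical ID

# urllib stores header names capitalized ("Message-id"); HTTP header names
# are case-insensitive, so this is the same header on the wire.
sent_id = first_try.get_header("Message-id")
```

The server can compare the header against its log without opening the body.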

So what would a durable REST service look like? I think like this:

  1. Sending system records the intent to send a message by saving it to a local durable store, potentially in the scope of a local transaction. As part of saving the message, a unique message id is generated (I’d use a GUID, but as long as it’s unique it doesn’t matter.)
  2. A background thread in the sending system monitors the durable message store. When a new to-be-sent message arrives, the thread POSTs it to the destination, setting the Message-Id HTTP header to the unique identifier generated in step 1.
  3. The receiving system stores the Message-Id header value in a log table and processes the received message, potentially in the scope of a local transaction. Optionally, it can store the return message (if there is one) in the durable store as well.
  4. If the sending system doesn’t receive a 2xx status code, it rePOSTs the message to the receiving system until it does.
  5. If the receiving system receives a message that’s already listed in the log table, it ignores it and returns a success status code. Optionally, if the return message has been saved, the receiving system can resend the return message as long as it doesn’t redo the work.
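The five steps above can be simulated in memory to see that the work happens once even when responses get lost. Everything here is a stand-in: a dict plays the durable store, `FlakyTransport` plays an unreliable HTTP connection, and the class and function names are invented for the sketch.

```python
import uuid


class Receiver:
    def __init__(self):
        self.log = {}        # Message-Id -> saved return message (step 3)
        self.work_done = []  # visible side effects, for the demonstration

    def handle_post(self, message_id, body):
        if message_id in self.log:
            # Step 5: already processed, so resend the saved reply only.
            return 200, self.log[message_id]
        self.work_done.append(body)  # step 3: do the work
        reply = "processed " + body
        self.log[message_id] = reply
        return 200, reply


class FlakyTransport:
    """Delivers every POST but loses the first `drop` responses."""

    def __init__(self, receiver, drop):
        self.receiver = receiver
        self.drop = drop
        self.attempts = 0

    def post(self, message_id, body):
        self.attempts += 1
        status, reply = self.receiver.handle_post(message_id, body)
        if self.drop > 0:
            self.drop -= 1
            return 503, None  # the sender sees a failure; the work still ran
        return status, reply


def send_durably(outbox, transport, max_attempts=10):
    """Steps 2 and 4: POST each stored message until a 2xx comes back."""
    replies = {}
    for message_id, body in list(outbox.items()):
        for _ in range(max_attempts):
            status, reply = transport.post(message_id, body)
            if 200 <= status < 300:
                replies[message_id] = reply
                del outbox[message_id]  # acknowledged, safe to forget
                break
    return replies


outbox = {}  # step 1: this dict stands in for the durable store
message_id = str(uuid.uuid4())
outbox[message_id] = "order 42"

receiver = Receiver()
transport = FlakyTransport(receiver, drop=2)
replies = send_durably(outbox, transport)
```

In this run the sender POSTs three times, but `receiver.work_done` ends up with a single entry: the second and third POSTs hit the log table and just get the saved reply.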

This seems like a better approach than my original direction of doing REST over a durable protocol like MSMQ or SSB. What do you think?

Update: Erik Johnson points out that an HTTP POST’s idempotency is “left unsaid”. So my statement that “POST isn’t idempotent” isn’t quite correct. POST isn’t naturally idempotent. I’ve updated the post accordingly.

  1. Technically, the MSMQ message ID isn’t universally unique, as it is a 16 byte GUID representing the source system + a 4 byte sequence number. The sequence number can roll over after sending 2^32 messages. In practice, rolling over the message ID after 4 billion messages is rarely an issue.

Morning Coffee 124

  • While my blog was down last week, I finally finished Gears of War. I played thru on hardcore, but had to throttle back to casual to beat the last boss. I’d like to try and finish on hardcore, but I’ve moved on to Dead Rising – another game from last year I never had time to finish. I’m almost done the main play mode, though I understand there are other play modes that get unlocked when you finish it.
  • I’m forbidden from buying any new games before Christmas, so Mass Effect, Assassin’s Creed and The Orange Box will have to wait. My next game will either be Blue Dragon, which a friend let me borrow, or R6:Vegas, yet another (but the last) game from last year I never got time to play.
  • I’ll skip the “giving thanks” jokes and point out that Visual Studio 2008 and .NET FX 3.5 have shipped.  Soma has the announcement and both Scott Guthrie and Sam Gentile summarize what’s new. The Express editions are available from the new Express Developer Center. The VS SDK doesn’t appear to be released yet, but I’m sure it will be along in due course.
  • Speaking of VS SDK, CoDe Magazine did an entire issue on VS Extensibility which you can read online or download as PDF.
  • Nick Malik took a bunch of heat back in June for what some thought was a redefinition of Mort, one of the Developer Division personas. Now Paul Vick thinks it’s time to retire the Mort persona, primarily because of the negative connotation the name carries. His suggestion for a replacement is Ben (as in Franklin). And did you notice how similar Paul’s description of Mort is to what Nick described? I’d say some folks owe Nick an apology.
  • I said Friday I was going to take a closer look @ OpenID and OAuth. There’s an intro to OpenID on their wiki and Sam Ruby’s OpenID for non-SuperUsers seems to be the canonical source on implementing OpenID on your own blog. Frankly, reading the OpenID intro reminded me a lot of WS-Federation Passive Requestor Profile. Does OpenID have the equivalent of an “active” mode?
  • Likewise, the Beginner’s Guide to OAuth series of posts by Eran Hammer-Lahav is a good intro to OAuth. The phrase “Jane notices she is now at a Faji page by looking at the browser URL” from the protocol walkthru makes me worry that OAuth is vulnerable to phishing. Having one of the OAuth authors call phishing victims careless and wishing for Karl Rove to “scare people into being more careful and smarter about what they do online” makes me think my fears are well grounded. I’m thinking maybe OAuth and OpenID aren’t quite ready to nail down WS-*’s coffin.
  • In researching OpenID, I came across this presentation hosted on SlideShare. I had never seen SlideShare before – it’s kinda like YouTube for presentations. Sharing basic presentations is kinda lame – there doesn’t appear to be any animation support, so the slides are basically pictures. However, they also support “slidecasting” where you sync slides to an audio file hosted elsewhere. That I like. I have a bunch of old decks + audio, maybe I’ll stick them up there.

Afternoon Coffee 123

  • Morning Coffee is late this morning because we went for our Christmas portrait this morning and it took forever. The pictures turned out great though.
  • Nick Malik finishes up his series on business operation models by covering the diversification model. Also, Nick’s points about the synergy between a diversified model and the coordinated model are spot on. I happen to be a big fan of those models (aka the models with low standardization) which probably drives some of my more “unique” perspectives on SOA.
  • Scott Guthrie starts out a new series on a future technology; this time it’s the ASP.NET MVC Framework that gets the series treatment. The first entry in the series is a general overview. I wonder why there’s no cool code name for the MVC framework? Whatever it’s named, I like the auto routing and action rules – it seems very Rails-inspired.
  • Over the weekend, Don Box points out that the REST authentication story “blows chunks”. I’ve recently given up on the reliable part of the original “Secure, Reliable, Transacted Web Services” vision – and I never believed the transacted part. Security, on the other hand, is the one part of that original vision that has worked out IMO. My experience with the WS-* security stack has been pretty good, though Dare Obasanjo thinks that OpenID and OAuth are the final nail in the WS-* coffin.
  • Speaking of Dare, he goes on to say WS-* is to REST as Theory is to Practice. He makes the point that “The only times I encounter someone with good things to say about WS-* is if it is their job to pimp these technologies or they have already “invested” in WS-* and want to defend that investment.” I gave up pimping evangelizing technology a while back and I don’t want to be in the position of defending a bad investment, so I’m spending lots of time looking at REST.
  • Jesus Rodriguez takes a look at the Managed Services Engine and comes away excited. Jesus is a self-described “strong believer” in SOA governance. I’m a self-described strong disbeliever in SOA governance, so MSE sounds like more of the Worst of Both Worlds to me.
  • A little light reading: I pulled Applied Cryptography and A New Kind of Science out of my garage last weekend. Plus my copies of RESTful Web Services and Programming Erlang just arrived yesterday.