Blogging F# Code

I’m going to start posting about my F# parsing code soon. Obviously, I’ll make the code directly available, but I’m also going to be writing about it quite a bit. Since I’ll be posting lots of F# code snippets, I took the time to build an F# language syntax definition for CodeHTMLer. Of all the various WL Writer Insert Code plug-ins, CodeHTMLer is my favorite because it can be configured not to use <pre> tags, which many RSS readers handle poorly (in my experience).

In case anyone else wants it, I’ve stuck the CodeHTMLer F# language definition up on my SkyDrive. If you’re using the CodeHTMLer WL Writer Plug-in, you can easily add this to your machine. Once you’ve installed CodeHTMLer and run it once, go to the command line and type cd %appdata%\WindowsLiveWriter and you’ll find the LanguageDefinitions.xml file. Edit that file to insert the contents of my F# language definition after the <CodeLanguages> tag and you’re all set.

BTW, the first language in the file will be the default language in the plug-in, so if you’re an occasional F# user, you might want to add the F# definition to the end rather than the beginning of the file. If you don’t want to further edit the XML file manually, you can select “Edit Languages” in the plug-in and edit the order of the languages to your heart’s content.

Morning Coffee 128

After using Outlook 2007 as my RSS reader for a few months, I’ve gone back to RSS Bandit. I run two work machines (desktop + laptop) and I finally got tired of duplicated blog entries because each copy of Outlook downloads the same post. Also, for some reason Outlook downloads the same Technorati posts over and over again.
ADO.NET Entity Framework Beta 3 was released. The latest CTP of the EF Tools is also available. And as per the press release, EF has gained support from “Core Lab, DataDirect Technologies, Firebird Foundation Inc., IBM Corp., MySQL AB, Npgsql, OpenLink Software Inc., Phoenix Software International, Sybase Inc. and VistaDB Software Inc”. I’m not sure what that means, exactly, but I guess you’ll be able to use LINQ to Entities on a wide variety of DB platforms. Interesting that Oracle isn’t on that list. Not really surprising, but interesting.
Here’s a new ASP.NET MVC article from Scott Guthrie, this one on views and how you pass data to one from a controller. Using generics to get strongly-typed ViewData is pretty sweet. But where’s the MVC CTP that was supposed to be here this week?
In news about web app tool previews that did ship this week, Live Labs announces Volta. Haven’t installed or played with it yet, but I did read the fundamentals page. It primarily looks like a tool to compile MSIL -> JavaScript, so you can write your code in C# but execute it in the browser. Sam and Jesus are excited, Arnon not so much. Arnon’s argument that being able to postpone architectural decisions is too good to be true is fairly compelling, and not just because he quotes me to support his argument. But I’ll download it and provide further comment after I experiment with it myself.
Simple Sharing Extensions is now FeedSync. Not sure what else is new about it, other than it’s been blessed with “1.0” status. The Live FeedSync Dev Center has an introduction, a tutorial and the spec. (via LiveSide)
Dare likes tuples. Me too. I also like symbols.

Durable and RESTful

A while back I wondered if it’s still REST if you don’t use HTTP. The reason I wondered that is because, like many, I’ve become disillusioned with the WS-* stack over time and see REST as a viable alternative to all that spec-driven complexity. However, just because I’m looking to REST doesn’t mean I’m willing to give up on durable messaging. So I shouldn’t be asking “can I do REST without HTTP?” I should be asking “what protocol can I use to do durable messaging with REST?”

It turns out HTTP is just fine for RESTful durable messaging, if you take the time to make your POSTs idempotent. There’s even an IETF RFC that builds on HTTP and specifies a mechanism to do it.

As I wrote last month, idempotence is critically important to ensuring “things” happen exactly once when connecting disparate systems together. At the end of that post, I asked you, dear reader, to contemplate just how durable messaging systems ensure exactly once delivery. They do it by assigning each message to be delivered a unique identifier. Any non-idempotent operation can be made idempotent with unique identifiers and a message ID log.

“Not Idempotent:
Withdrawing $1 Billion.
Idempotent:
If Haven’t Yet Done Withdrawal #XYZ for $1 Billion,
Then Withdraw $1 Billion and Label as #XYZ”
Pat Helland

For example, when you send a message in MSMQ, it’s assigned a 20 byte identifier which is “unique within your enterprise.” ¹ If the destination system receives multiple messages with the same message ID, it knows they are duplicates and can safely toss all but one of the messages with the same ID. Exactly once, no transactions.
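To make the "unique identifier plus message ID log" idea concrete, here is a minimal Python sketch of Pat Helland's withdrawal example. The `Account` class, its method names and the in-memory set standing in for the log are all assumptions made for illustration; this is not how MSMQ is implemented.

```python
class Account:
    """Toy account; every name here is hypothetical."""

    def __init__(self, balance):
        self.balance = balance
        self.processed_ids = set()  # stands in for the message ID log

    def withdraw(self, amount):
        # Not idempotent: every call changes the balance again.
        self.balance -= amount

    def withdraw_once(self, message_id, amount):
        # Idempotent: a message ID we have already seen is skipped.
        if message_id in self.processed_ids:
            return False
        self.balance -= amount
        self.processed_ids.add(message_id)
        return True


account = Account(balance=100)
account.withdraw_once("XYZ", 10)
account.withdraw_once("XYZ", 10)  # duplicate delivery, ignored
print(account.balance)  # 90
```

In a real system the log and the balance change would have to be persisted together, otherwise a crash between the two steps could still lose or repeat the work.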

While many operations in REST are naturally idempotent, using REST doesn’t magically make all your operations idempotent, contrary to popular belief. Have you ever seen a message like “please don’t press submit order twice” on the checkout page of an e-commerce website? It’s there because POST is not naturally idempotent and the site hasn’t taken any extra steps to identify duplicate POSTs. If the site embedded a unique ID in a hidden form field, it could use that to identify duplicate orders.

If you’re a RESTifarian, haven’t you seen this approach somewhere before?

Given that POST isn’t naturally idempotent, I think it’s kinda surprising that new resources are created in AtomPub by POSTing them to a collection rather than PUTting them to a specific URL. RESTful Web Services specifically points out that PUT is idempotent, so I wonder why AtomPub uses POST. I’d guess most AtomPub implementations (aka blogs) aren’t much concerned about ensuring Exactly Once. If a blog entry gets posted twice, you delete one and go on with your life.

However, if you wanted to use AtomPub and ensure Exactly Once, you can use the fact that Atom entries must contain exactly one ID element which as per the spec must be universally unique. From reading the Atom spec, the ID element seems primarily designed for Atom feed consumers, but AtomPub servers could also use it as an “idempotence identifier”, similar to how MSMQ uses the message ID. If you end up with multiple entries with the same entry ID, discard all but one.

So by creating a unique identifier on the client side and logging that identifier on the server side, we can make any REST service idempotent. We can make it a durable service if we write the outgoing message – with the message ID we generate – to a durable store before trying to send it. If you write it to a durable store within the scope of a local transaction, you’re even closer to duplicating MSMQ’s functionality, yet the only protocol requirement beyond vanilla HTTP is having a unique message ID.

The one problem with the Atom entry ID approach is that it requires cracking the message in order to see if we should process it. For REST services, I would think we’d want to stick the idempotence identifier in an HTTP header. We already use headers to implement conditional GET, so why not a header for what amounts to conditional POST?

Turns out such a header exists in the AS2 spec, i.e. “MIME-Based Secure Peer-to-Peer Business Data Interchange Using HTTP”. AS2 defines a Message-Id HTTP header which “SHOULD be globally unique”. In the case of an HTTP error, AS2 specifies the “POST operation with identical content, including same Message-ID, SHOULD be repeated” and that “Servers SHOULD be prepared to receive a POST with a repeated Message-ID.” I assume this implies a server shouldn’t process a message with the same ID twice.
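As a rough illustration of what carrying the identifier in a header looks like, the Python sketch below builds (but does not send) a POST with a Message-Id header using the standard library. The `build_post` helper, the example URL and the use of a bare GUID as the ID value are my assumptions, not something taken from the AS2 spec.

```python
import uuid
import urllib.request


def build_post(url, body, message_id=None):
    """Build a POST request that carries an idempotence identifier.

    Pass the same message_id again when retrying the same message.
    """
    if message_id is None:
        message_id = str(uuid.uuid4())  # assumption: a GUID is unique enough
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Message-Id", message_id)
    return request


# Hypothetical endpoint; nothing is sent over the network here.
first_try = build_post("http://receiver.example/orders", b"<order/>", "abc-123")
retry = build_post("http://receiver.example/orders", b"<order/>", "abc-123")

# urllib normalizes header names to "Message-id"; HTTP header names are
# case-insensitive, so the receiver sees the same header either way.
print(first_try.get_header("Message-id") == retry.get_header("Message-id"))  # True
```

The important design point is that the ID is generated once, when the message is created, and reused on every retry rather than regenerated per attempt.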

So what would a durable REST service look like? I think like this:

1. Sending system records the intent to send a message by saving it to a local durable store, potentially in the scope of a local transaction. As part of saving the message, a unique message ID is generated (I’d use a GUID, but as long as it’s unique it doesn’t matter.)
2. A background thread in the sending system monitors the durable message store. When a new to-be-sent message arrives, the thread POSTs it to the destination, setting the Message-Id HTTP header to the unique identifier generated in step 1.
3. The receiving system stores the Message-Id header value in a log table and processes the received message, potentially in the scope of a local transaction. Optionally, it can store the return message (if there is one) in the durable store as well.
4. If the sending system doesn’t receive a 2xx status code, it rePOSTs the message to the receiving system until it does.
5. If the receiving system receives a message that’s already listed in the log table, it ignores it and returns a success status code. Optionally, if the return message has been saved, the receiving system can resend the return message as long as it doesn’t redo the work.
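The five steps above can be simulated in a few lines of Python. This is only a sketch under loud assumptions: plain dictionaries stand in for the durable store and the log table, a function call stands in for the HTTP POST, and the `Sender`, `Receiver` and `make_lossy_transport` names are invented for the example.

```python
import uuid


class Receiver:
    def __init__(self):
        self.log = {}        # Message-Id -> saved return message (step 3)
        self.work_done = []  # stands in for the real side effects

    def handle_post(self, message_id, body):
        if message_id in self.log:
            # Step 5: duplicate, so resend the saved reply without redoing work.
            return 200, self.log[message_id]
        self.work_done.append(body)  # step 3: do the work once
        response = f"processed {body}"
        self.log[message_id] = response
        return 200, response


class Sender:
    def __init__(self, transport):
        self.outbox = {}  # stands in for the local durable store (step 1)
        self.transport = transport

    def enqueue(self, body):
        message_id = str(uuid.uuid4())
        self.outbox[message_id] = body
        return message_id

    def pump(self, max_attempts=5):
        # Steps 2 and 4: POST each stored message, retrying until a 2xx arrives.
        for message_id, body in list(self.outbox.items()):
            for _attempt in range(max_attempts):
                status, _response = self.transport(message_id, body)
                if 200 <= status < 300:
                    del self.outbox[message_id]
                    break


def make_lossy_transport(receiver):
    """Deliver every call, but lose the reply to the very first one."""
    state = {"calls": 0}

    def transport(message_id, body):
        state["calls"] += 1
        status, response = receiver.handle_post(message_id, body)
        if state["calls"] == 1:
            return 503, None  # the work happened, but the sender never hears
        return status, response

    return transport


receiver = Receiver()
sender = Sender(make_lossy_transport(receiver))
sender.enqueue("order 42")
sender.pump()
print(receiver.work_done)  # ['order 42']
```

The lost reply forces a rePOST, yet the order is processed only once because the second delivery hits the log. A real receiver would need to write the log entry and do the work inside one local transaction for that guarantee to survive a crash.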

This seems like a better approach than my original direction of doing REST over a durable protocol like MSMQ or SSB. What do you think?

Update: Erik Johnson points out that an HTTP POST’s idempotency is “left unsaid”. So my statement that “POST isn’t idempotent” isn’t quite correct. POST isn’t naturally idempotent. I’ve updated the post accordingly.

¹ Technically, the MSMQ message ID isn’t universally unique, as it is a 16 byte GUID representing the source system + a 4 byte sequence number. The sequence number can roll over after sending 2^32 messages. In practice, rolling over the message ID after 4 billion messages is rarely an issue.

Functional Understanding

I was showing some of my cool (well, I think it’s cool) F# parsing code to some folks @ DevTeach. I realized very quickly that a) most mainstream developers are fairly unaware of functional programming and b) I suck at explaining why functional programming matters. So I decided to take another stab at it. I probably should have posted this before my recent series on F#, but better late than never I suppose.

Right off the bat, the term “functional” is confusing. When you say “function” to a mainstream developer, they hear “subroutine”. But when you say “function” to a mathematician, they hear “calculation”. Functions in functional programming (aka FP) are closer to the mathematical concept. If you think about math functions, they’re very different from subroutines. In particular, math functions have no intrinsic mutable data. If you have a math function like f(x) = x³, f(7) always equals 343, no matter how many times you call it. This is very different from a member like String.Length, where the value returned depends on the value of the string.

Another interesting aspect of math-style functions is that they have no side-effects. When you call StringBuilder.Append(), you’re changing the internal state of the StringBuilder object. But FP functions don’t work like that. Providing the same input always provides the same output (i.e. the same independent variable always yields the same dependent value).
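Here is a small sketch of the difference, written in Python rather than F# purely for illustration. The `cube` function mirrors f(x) = x³ from above; the `Counter` class is a made-up stand-in for a stateful object like StringBuilder.

```python
def cube(x):
    # Pure: the result depends only on the argument.
    return x ** 3


class Counter:
    def __init__(self):
        self.total = 0

    def add(self, n):
        # Impure: mutates internal state, so repeated calls give different results.
        self.total += n
        return self.total


print(cube(7), cube(7))                # 343 343
counter = Counter()
print(counter.add(7), counter.add(7))  # 7 14
```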

If you’re a .NET developer, this may sound strange, but you’re probably very familiar with the String class which works exactly the same way.

A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable.

A String object is called immutable (read-only) because its value cannot be modified once it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification.

In other words, all variables in FP are a lot like .NET Strings. In fact, in many FP languages, variables are actually called “values” because they don’t, in fact, vary.
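The same behavior is easy to see in Python, whose strings are also immutable. The sketch below additionally uses a frozen dataclass as a rough analogue of an FP-style value; the `Point` type is hypothetical and only there to show "modify by making a new copy".

```python
from dataclasses import dataclass, replace, FrozenInstanceError

original = "parser"
shouted = original.upper()  # returns a new string; original is untouched
print(original, shouted)    # parser PARSER


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)
q = replace(p, x=10)  # builds a new Point instead of changing p
print(p, q)           # Point(x=1, y=2) Point(x=10, y=2)

try:
    p.x = 99
except FrozenInstanceError:
    print("Point is immutable")
```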

It turns out that this approach to programming has significant upside for unit testing and concurrency. Unit tests typically spend significant effort getting the objects they’re testing into the right state to invoke the function under test. In FP, the result of a function is purely dependent on the values passed into it, which makes unit testing very straightforward. For concurrency, since functions don’t share mutable state, there’s no need to do complicated locking across multiple processors.

But if values don’t vary, how do we manage application state? FP apps typically maintain their state on the stack. For example, my F# parser starts with a string input and returns an abstract syntax tree. All the data is passed between functions on the stack. However, for most user-oriented non-console applications, keeping all state on the stack isn’t realistic. As Simon Peyton Jones points out, “The ultimate purpose of running a program is invariably to cause some side effect: a changed file, some new pixels on the screen, a message sent, or whatever.” So all FP languages provide some mechanism for purposefully implementing side effects, some (like Haskell) stricter in their syntax than others.
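To give a feel for "state on the stack", here is a toy parser sketched in Python. It is not my F# parser: the grammar (numbers joined by +), the tuple-based AST and the function names are all invented for the example. What it shows is that the current position is never stored in a shared variable; each function takes it as an argument and returns the new position alongside its result.

```python
from itertools import takewhile


def parse_number(text, pos):
    """Return (ast, new_pos), or None if no digits start at pos."""
    digits = "".join(takewhile(lambda ch: ch in "0123456789", text[pos:]))
    if not digits:
        return None
    return ("num", int(digits)), pos + len(digits)


def parse_sum_tail(text, pos, left):
    """Consume zero or more '+ number' pairs, nesting additions to the left."""
    if pos < len(text) and text[pos] == "+":
        result = parse_number(text, pos + 1)
        if result is None:
            return None
        right, next_pos = result
        return parse_sum_tail(text, next_pos, ("add", left, right))
    return left, pos


def parse_sum(text, pos=0):
    """Parse number ('+' number)* and return (ast, new_pos) or None."""
    result = parse_number(text, pos)
    if result is None:
        return None
    first, next_pos = result
    return parse_sum_tail(text, next_pos, first)


print(parse_sum("1+2+3"))
# (('add', ('add', ('num', 1), ('num', 2)), ('num', 3)), 5)
```

Because nothing outside the arguments is read or written, calling `parse_sum` twice on the same input always yields the same tree, which is exactly the property that makes this style easy to unit test.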

One of the nice things about F#’s multi-paradigm nature is that side effects are a breeze. Of course, that’s both a blessing and a curse, since much of the aforementioned upside comes from purposefully building side-effect free functions. But the more I work with F#, the more I appreciate the ability to do both functional as well as imperative object-oriented operations in the same language. For example, my parsing code so far is purely functional – it takes in a string to be parsed and returns an AST. But the logical next step would be to generate output based on that AST. Since F# supports non-functional code – not to mention the rich Base Class Library – generating output should be straightforward.

Studio Busting

A week ago, I wrote that the ongoing writers strike might accelerate the transition to Media 2.0. Several other folks think the same way and explain why much better than I have. Marc Andreessen (aka creator of Mosaic) has a fantastic post that not only explains this transition better than I can, it also helped me understand my views on unions in general.

In the post, he describes two economic models – the Hollywood model and the Silicon Valley model. The Hollywood model is highly-centralized, with a small number of huge companies (aka “big media”) owning practically everything. In contrast, the Silicon Valley model is highly-decentralized, where pretty much anyone can create a company or bring a product to market. Marc believes that the entertainment industry at large is transitioning to the decentralized model. I agree 110% – the general decentralization trend is one I highlight in my “Moving Beyond Industrial Software” presentation that I’ve been delivering recently.

Unions are a response to the dramatic power differential between an employer and individual employees. By pooling (aka centralizing) their bargaining power, the union provides a counter-balance to the power wielded by the employer(s). But in a decentralized model, unions aren’t really necessary. Marc describes the “alignment of interests between creators and financiers” as “near-perfect”. Near-perfect might be a bit on the rosy side, but it’s a model I’m much more comfortable with than mega-corporations & unions.

Some believe that the AMPTP (aka the studios) is trying to break the entertainment unions. But what if those unions decided to break the studios? I gotta think that while there are lots of quality writers out there, the best in the business are members of the writers guild. What if they just decided to stop writing for the studios and go into business for themselves? Patrick Goldstein of the LA Times wonders the exact same thing.

Disclaimer

The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the author's discretion.