Passion * Technology * Ruthless Competence

Thursday, August 27, 2009

The Last Mile of the Internet

Christian Weyer makes a great comment on yesterday’s post about the barbarian rediscovery of async messaging:

But how do these two toolkits solve the NAT/Firewall issue? Without a solution to this they are pretty much useless in breadth usage.

Simply put, they don’t. Frankly, they don’t even try. And I agree with Christian that the NAT/Firewall issue makes any async messaging based approach useless for clients. It’s kind of like the last mile problem in the telco/cable industries – you’ve got this great capability in the center, but you can’t leverage its full potential because of the massive effort it takes to push that capability all the way to the edge of the network.

Dave Winer has been pretty explicit with his RSS Cloud work: “The goal is to have a Small Pieces Loosely Joined equivalent of Twitter.” PubSubHubbub doesn’t mention Twitter by name, but the protocol spec specifically says “Polling sucks. We think a decentralized pubsub layer is a fundamental, missing layer in the Internet architecture today”. Both specs have a fundamental design that looks like this:

image

This picture leaves out multiple publishers and subscribers and the subscriber registration process, but you get the basic idea. And it all works great assuming that both the subscriber and the pub/sub infrastructure can accept incoming connections. While that seems like a fairly safe assumption for infrastructure pieces, it is clearly a faulty assumption for any subscriber running locally on a client machine. Client machines primarily live behind firewalls at the office, behind NAT routers at home or on mobile wireless networks – all of which disallow most if not all incoming connections. In other words, this works just fine for server subscribers (like, say, Google Reader) but not for client subscribers (like, say, TweetDeck).

image

As far as I can tell, the only way to enable client subscribers to play in this async messaging world is via some type of relay service. Any other solution I can think of depends on mass adoption of new technology, which as I mentioned in my last post is nearly impossible.

image

In this approach, the client subscriber makes an outbound connection to some type of relay infrastructure, which in turn creates an endpoint on the public internet for that client. Registration for pub/sub happens as normal, using the relay endpoint as the notification URL. Then, when a message arrives on the relay endpoint, it’s sent back down the outbound connection to the client.
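To make the relay flow concrete, here is a minimal in-process sketch in Python. It is not how any real relay service works: the `Relay` class, its method names and the `relay.example.com` URL are all hypothetical, and a queue stands in for the client's long-lived outbound connection.

```python
import queue
import threading


class Relay:
    """Toy stand-in for a relay service (hypothetical API, no real networking)."""

    def __init__(self):
        self._inboxes = {}              # client id -> queue of pending messages
        self._lock = threading.Lock()

    def connect(self, client_id):
        """The client dials OUT to the relay; the relay hands back a public URL."""
        with self._lock:
            self._inboxes.setdefault(client_id, queue.Queue())
        return f"https://relay.example.com/inbox/{client_id}"

    def deliver(self, client_id, message):
        """A hub POSTs a notification to the public endpoint for this client."""
        with self._lock:
            inbox = self._inboxes.get(client_id)
        if inbox is None:
            return False                # nobody is connected under that id
        inbox.put(message)
        return True

    def receive(self, client_id, timeout=1.0):
        """The client waits on its outbound connection for the next message."""
        with self._lock:
            inbox = self._inboxes[client_id]
        try:
            return inbox.get(timeout=timeout)
        except queue.Empty:
            return None                 # nothing arrived before the timeout


relay = Relay()
callback_url = relay.connect("client-42")           # use this as the notification URL
relay.deliver("client-42", {"feed": "updated"})     # what the hub would trigger
print(callback_url, relay.receive("client-42"))
```

The point of the sketch is only the direction of the connections: the client never accepts an incoming connection, yet it still has a public notification URL to register with the hub.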

The relay approach is technically feasible – it’s used in many places today. Exchange DirectPush uses this approach to support real-time delivery of mail to mobile devices – though the relay capability is built directly into Exchange client access servers rather than available as a separate service. The .NET Service Bus – part of Windows Azure – provides a hosted relay infrastructure that anyone can leverage (though their support of non-Windows platforms is pretty weak). I haven’t worked with it, but it looks like Opera’s new Unite platform includes a relay service as well (note, they call it a proxy service). Nice thing about Opera Unite is the async messaging infrastructure is built right into their browser, though you could achieve something similar in any browser using Flash or Silverlight.

Yes, having to relay messages sucks. But the question is, which sucks worse: polling or relaying?

Posted By Harry Pierson at 11:08 AM Pacific Daylight Time

Wednesday, August 26, 2009

Async Messaging and the Barbarian Hordes

At PDC 1996, Pat Helland did a six minute bit where he compared personal computing to the sacking of Rome and Microsoft Transaction Server to the Renaissance. It was called “Transaction Processing and the Barbarian Hordes” and in my opinion it should be required viewing for everyone in the tech industry.

Of course, the tech industry has changed significantly since PDC96. In particular, personal computing has become the new “Classical Rome” and web developers are the new barbarians. Just as Microsoft rediscovered transaction processing in the 90’s, it seems that RESTifarians are on the verge of rediscovering asynchronous messaging.

“The internet has been dead and boring for a while now.  It has reached a point of stability where flashes of technological creativity are rare, but every now and then some new technology can put a spark back in the ole gal (no sexism intended).

If you haven’t heard of WebHooks or PubSubHubBub its about time you did. Both are designed to  simplify and optimize the web.”

Mark Cuban, The Internet is about to change

Not to put too fine a point on it, but these “flashes of technological creativity” that Mark’s going gaga over aren’t new at all. Both Web Hooks and PubSubHubbub are essentially async messaging, the oldest form of messaging in the history of networking. But just as personal computing ignored the importance of transaction processing for a long time, REST has long ignored the importance of async messaging. Instead, web development has been focused exclusively on request/response – something I’ve struggled with for quite some time. But the rise of Twitter has driven many people to realize something I’ve known since 2003: “In order to truly evolve syndication…we need to break free of the synchronous polling model.” [1]

I love the slogan from this Web Hooks presentation: “so simple you’ll think it’s stupid”. Web Hooks aren’t stupid – far from it – but they certainly are simple. They’re basically callbacks – which Web Hooks creator Jeff Lindsay readily acknowledges - invoked across the network using standard REST technology like HTTP and XML or JSON. The canonical webhook examples are PayPal Instant Payment Notification and GitHub Post-Receive Hooks. In both cases, you register a custom notification URL with the system in question. Then, when something specific happens in the system, a message gets POSTed to the registered URL. In some scenarios, it’s a simple notification. For example, when GitHub receives a commit push, it POSTs a JSON message about the commit to the registered URL. In other scenarios, the initial message is the start of an async conversation - the system expects you to POST a message back to them sometime in the future. For example, when a customer makes a payment, PayPal POSTs a message to the URL you registered. You then confirm the payment by posting a message back to a well known PayPal URL.
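Here is a rough sketch of the register-then-POST pattern using only the Python standard library. It is not GitHub's or PayPal's actual API: the `/post-receive` path, the payload fields and the `notify` helper are assumptions made up for illustration, and the "remote" subscriber is just a local HTTP server on a throwaway port.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # notifications the subscriber has seen so far


class HookHandler(BaseHTTPRequestHandler):
    """Subscriber side: accepts POSTed notifications at the registered URL."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received.append(json.loads(self.rfile.read(length)))
        self.send_response(204)         # acknowledge; no response body needed
        self.end_headers()

    def log_message(self, *args):       # keep the example quiet
        pass


def notify(url, event):
    """Publisher side: POST a JSON message to one registered URL."""
    body = json.dumps(event).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


# Start the subscriber on a free local port and "register" its URL.
server = HTTPServer(("127.0.0.1", 0), HookHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
registered_hooks = [f"http://127.0.0.1:{server.server_port}/post-receive"]

# Something happens in the system, so every registered URL gets a POST.
for hook_url in registered_hooks:
    notify(hook_url, {"event": "push", "commit": "abc123"})

server.shutdown()
server.server_close()
print(received)
```

Note that the publisher needs nothing from the subscriber beyond an acknowledgement, which is what makes this a notification rather than a request/response exchange.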

Note, by the way, that both of these canonical examples depend on async messaging. GitHub isn’t going to do anything with a response anyway, so there’s no point in sending them a response. PayPal, on the other hand, is expecting a response. Yet, they use async messaging instead of an arguably simpler HTTP request/response operation. They do this for the same reason WS-Transaction is the Anti-Availability Protocol – the last thing you want to do is lock up precious resources in your system waiting for some nimrod on the other side of the Internet to respond to a request you sent. Instead, you do what PayPal does – send an async message, listen on a separate channel for a response, correlate the messages explicitly via some kind of conversation identifier and release your precious resources to do other work while you wait for the response.
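The send, release, correlate pattern can be sketched in a few lines of Python. This is an illustration only, not PayPal's protocol: the `PaymentNotifier` class, the `conversation_id` field and the message shapes are all hypothetical.

```python
import uuid


class PaymentNotifier:
    """Sends async notifications and correlates replies by conversation id."""

    def __init__(self, send):
        self._send = send       # ships a message and returns immediately
        self._pending = {}      # conversation id -> payment awaiting confirmation

    def payment_received(self, payment):
        conversation_id = str(uuid.uuid4())
        self._pending[conversation_id] = payment
        self._send({"conversation_id": conversation_id, "payment": payment})
        return conversation_id  # nothing blocks here; resources are free again

    def on_reply(self, reply):
        """Called whenever a reply shows up on the separate inbound channel."""
        payment = self._pending.pop(reply["conversation_id"], None)
        if payment is None:
            return None         # unknown or already-handled conversation
        return {"payment": payment, "confirmed": reply["confirmed"]}


outbox = []                                     # stands in for the outbound channel
notifier = PaymentNotifier(outbox.append)
cid = notifier.payment_received({"amount": "9.99", "currency": "USD"})

# Some time later, the merchant posts back using the same conversation id.
print(notifier.on_reply({"conversation_id": cid, "confirmed": True}))
```

The only state held while waiting is one dictionary entry per open conversation, rather than a thread or connection tied up until the other side answers.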

As for PubSubHubbub, it’s focused on real time delivery of new information. Dave Winer’s recent RSS Cloud efforts focus on real-time notification as well. In both cases, instead of subscribers polling a given RSS feed for changes every X amount of time, they register for notification when the feed is updated. This is very similar to the way GitHub uses async messages for commit push notification as described above.

Both PubSubHubbub and RSS Cloud include an intermediary that’s responsible for managing the list of current subscribers and relaying the notification when the publisher makes a change. Honestly, I’m not a fan of the Hub/Cloud intermediary – it feels a little too ESB-like to me. However, since it’s only relaying notifications it receives without transformation, I can live with it. Besides, there’s no reason why a publisher can’t act as its own hub. The vast majority of blogs and Twitter users have so few subscribers that the extra layer of abstraction is probably not worth it. On the other hand, if you’re going to run a notification hub for the largest users, you might as well use it for smaller ones as well.
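Stripped of the wire protocol, the intermediary's job is small. The sketch below is not the PubSubHubbub or RSS Cloud protocol; the `Hub` class and its method names are hypothetical, and plain Python callables stand in for subscriber callback URLs.

```python
from collections import defaultdict


class Hub:
    """Keeps the subscriber list per topic and relays notifications unchanged."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        if callback not in self._subscribers[topic]:
            self._subscribers[topic].append(callback)

    def unsubscribe(self, topic, callback):
        if callback in self._subscribers[topic]:
            self._subscribers[topic].remove(callback)

    def publish(self, topic, entry):
        """Relay the entry as-is to every current subscriber of the topic."""
        delivered = 0
        for callback in list(self._subscribers[topic]):
            callback(topic, entry)
            delivered += 1
        return delivered


inbox = []


def reader(topic, entry):
    inbox.append((topic, entry))


hub = Hub()
hub.subscribe("http://example.com/feed", reader)
print(hub.publish("http://example.com/feed", "new post"), inbox)
```

Because the hub does nothing but fan out, a publisher acting as its own hub amounts to owning this subscriber list directly.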

While I think Mark’s laid the “new technology” hype on pretty thick, I do think he hits the nail on the head regarding the major new business opportunities that can come from adopting the heretofore ignored async messaging model on the web:

“This could be an open door for the content business…Using The Associated Press as an example, AP could post their stories to a HUB. In realtime, the HUB can update member websites so that they will always have information first, before any aggregator. It may not take long for aggregators to recognize the new data on the member sites, but they won’t have it first.

The New York Times could do the same thing. Subscribers could get everything first, in realtime. Then after some delay which might be 1 minute, it might be 30 minutes depending on what the paper thinks is the value related to timeliness, it could post on the website and on twitter and facebook as updates. Would NY Times online readers pay $1 a month to be guaranteed that they get their news first, before anyone else ? I dont know.

In the sports world, text based play by play websites could be updated in realtime rather than pulling every 30 seconds or requiring the user to hit refresh every few seconds.”

Arguably, this opportunity is easier to realize precisely because async messaging isn’t new technology. Getting people to adopt a new technology is incredibly hard. It’s much easier to get people to adopt a new pattern for using an existing technology. And async messaging has been possible as long as the web has been in existence.

Web Hooks and PubSubHubbub are long overdue but very welcome steps forward in the evolution of the Internet. I wonder what the barbarians will rediscover next?


[1] Of course, writing a prediction like this is a far sight from actually implementing it. If I had actually put some engineering effort behind this in 2003, maybe I’d be a household name in the tech community by now. On the other hand, I said some things in that same post that have turned out to be spectacularly incorrect (“Indigo is going to make Longhorn a great platform for SOA”) so it probably wouldn’t have made much of a difference.

Posted By Harry Pierson at 11:33 AM Pacific Daylight Time

Tuesday, October 09, 2007

Throwing Gasoline on the Fire

Steve Vinoski has raised a bit of a flame war by admitting he has lost the ESB religion. Given that I've never been a fan of ESB's anyway, there's a lot there that I agree with. In particular I liked the description of "magical framework" middleware, blaming enterprise architects for driving ESB's as the "single integration architecture" even though a single *anything* in the enterprise is untenable and his point that flexibility means you don't do any one thing particularly well.

However, Steve goes on to bash compiled languages and WS-* while suggesting the One True Integration Strategy™ is REST + <insert your favorite dynamic language here>, then acts surprised that the conversation degenerates into "us vs. them". When you start by saying that compiled language proponents "natter on pointlessly", I think you lose your right to later lament the deteriorating level of conversation.

All programming languages provide their own unique model of the execution environment.  Dynamic languages have a very different model than compiled languages. Arguing that this or that model is better for everyone, everywhere, in all circumstances seems unbelievably naive and arrogant at the same time.

On the other hand, I do agree with Steve's point that most developers only know a single programming language, to their detriment. One-language developers often miss a better solution because their language of choice doesn't provide the right semantics to solve the problem at hand. Developers could do a lot worse than learn a new language. And I don't mean a C# developer should learn VB.

The most pressing example of picking the right language for the right problem today is multi-threading. Most languages - including dynamic languages - have shitty concurrency semantics. If you're building an app to take advantage of many-core processing, "mainstream" languages like C#, Java and Ruby won't help you much. But we're starting to see languages with native concurrency semantics like Erlang. Erlang is dynamically typed, but that's not what makes it interesting. It's interesting because of its native primitives for spawning tasks. I don't see why you couldn't add similar primitives for task spawning to a compiled functional language like F#.

As for REST vs. SOAP/WS-*, I thought it was interesting that Steve provided no rationale whatsoever for why you should avoid them. The more I listen to this pissing match debate, the more I think the various proponents are arguing over unimportant syntactical details when the semantics are basically the same. SOAP is just a way to add metadata to an XML message, much as HTTP headers are. WS-* provides a set of optional message-level capabilities for handling cross-cutting concerns like security. Past that, are the models really that different? Nope.

For system integration scenarios like Steve is talking about, I'm not sure how important any of the WS-* capabilities are. Security? I can get that at the transport layer (aka HTTPS). Reliable Messaging? If I do request/response (which REST excels at), I don't need RM. Transactions? Are you kidding me? Frankly, the only capability you really need in this scenario is idempotence, and neither REST nor SOAP provides any standard mechanism to achieve that. (more on that in a later post)
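Since neither stack gives you idempotence out of the box, you end up building it yourself. One common way to do that, sketched here in Python under my own assumptions (the `IdempotentReceiver` class and the idea of a sender-supplied message id are illustrative, not part of REST or SOAP), is to remember which messages have already been processed:

```python
class IdempotentReceiver:
    """Processes each message id at most once; retries get the stored result."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}      # message id -> result of the first processing

    def receive(self, message_id, payload):
        if message_id in self._results:
            return self._results[message_id]    # duplicate: no side effects
        result = self._handler(payload)
        self._results[message_id] = result
        return result


balance = {"amount": 100}


def debit(payload):
    balance["amount"] -= payload["debit"]
    return balance["amount"]


receiver = IdempotentReceiver(debit)
receiver.receive("msg-1", {"debit": 30})
receiver.receive("msg-1", {"debit": 30})    # the sender timed out and retried
print(balance["amount"])
```

A real implementation would keep the processed ids in durable storage and expire them eventually; the in-memory dictionary is only there to show the shape of the check.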

I understand that some vendors are taking the WS-* specs and building out huge centralized infrastructure products and calling them ESBs. I think Steve is primarily raging against that, and on that point I agree 100%. But Steve sounds like he's traded one religion for another - "Born Again REST". For me, picking the right tool for the job implies much less fanaticism than Steve displays in his recent posts.

Posted By Harry Pierson at 12:56 PM Pacific Daylight Time

Thursday, June 28, 2007

Morning Coffee 96

  • My friend David "LetsKillDave" Weller writes a long post on corporate blogging, responding to comments on the subject from Penny Arcade. Andre "Ozymandias" Vrignaud also responds. David is specifically talking about blogging within the gaming division, but his points apply pretty broadly to Microsoft as a whole when it comes to blogging. "I don't want to get fired", "I don't want to do things that needlessly hurt my company" and "We can say things that PR or marketing people can't.  Or won't." all ring true to me.
  • Speaking of gaming, there seem to be more than the average number of cool games coming out for Xbox 360 this summer. I just picked up Forza 2 which rocks with the Racing Wheel. The Darkness looks very cool and I laughed my ass off playing the Overlord demo. Both shipped this week and have gotten good reviews. On their way in August are Bioshock and Blue Dragon. Of course, there are a few other big games coming this holiday. A good, but expensive, year to be a gamer.
  • I laughed my ass off reading Larry O'Brien's Top 10 Things To Do With Your Petaflop Supercomputer, esp. #9.
  • WSDL 2.0, it's official. Nick Allen has the news. Personally, WSDL seems to be the spec most responsible for driving RPC-style request/response web services, so let's just say that I am not a fan.  
  • Joe McKendrick thinks something is "holding back SOA"? I don't think it's any one thing, but certainly the RPC style that most web service toolkits pretty much force down your throat isn't helping.
  • Nick Malik thinks Acropolis is promising as a SOA service consumer, but Udi Dahan thinks it doesn't support multi-threading well enough. I lean towards Nick on this one since I see multi-threading as a language problem, which a library like Acropolis can't solve on its own.
  • Jon Flanders has been busy building the BizTalk Server 2006 extensions for Windows Workflow Foundation (June CTP) SDK Sample. I'm not sure why the marketing folks gave this such a long and involved name, but the sample does look pretty cool. Paul Andrews has the project overview and demo video. However, given that the WF workflows are hosted in BTS, is it accurate to say "No Biztalk Experience Required"?
  • Speaking of WF, Tomas Restrepo takes a detailed look at the new WF service hosting in .NET FX 3.5. Mostly, he likes what he sees. I have the same problem he does with the message correlation IDs. I'd like to have other options here, including support for what I call "message data correlation" (Tomas describes this as "natural correlating identifiers") and "address correlation" which is basically the REST model.
Posted By Harry Pierson at 10:25 AM Pacific Daylight Time

Wednesday, May 30, 2007

Morning Coffee 85

  • Microsoft announces Surface Computing. When can you buy one for your house? Probably not anytime soon. TechMeme has lots more.
  • The one piece of swag I want more than anything else at TechEd is an Evil Mastermind shirt.
  • Nick Allen notes that WSDL 2.0 has reached "proposed recommendation" stage. I guess having a "recommended" version of WSDL is an improvement over the "note" version. But other than having a RESTful HTTP binding in addition to the SOAP binding - and being longer - what's new?
  • Speaking of description languages, Don Box writes about the Web Application Description Language which looks very REST-y in that it supports specifying both the URI as well as the payload format. Like Don, I agree with Erik Johnson who commented that "people attracted to REST (in whatever form) are rebelling against interface-based programming more than WS-* itself". I have a longer post on this coming, but suffice to say I'm really souring on interface-based programming.
  • Nick Malik writes that WCF is immature because of its "lack of a routable, intermediable, declared message durability option". Yeah, that's a huge problem in my book too. It also relates to the last bullet - since durable messaging is inherently async, it doesn't fit well into the interface-based programming model.
Posted By Harry Pierson at 11:32 AM Pacific Daylight Time

Friday, May 25, 2007

This Isn't The Droid I'm Looking For

Since David Ing responded to my REST/CRUD/CRAP entry on his blog, I guess that means I respond to his response here, right?

Actually, this is going to be very short(*) because I mostly agree with what David wrote. For example:

If we say stuff like 'REST shall/must replace all foolish WS-* SOAP systems (insert Nelson-style HaHa here)' then we are just repeating the same lessons as before - One size won't fit all and there is no One Ring to Bind Them illusion in REST as well. If you have something complex that fits the WS-* style then maybe REST isn't the droid you are looking for.

Of course, this begs the question "what is complex?" David described complex as "long running transaction updating multiple distributed stores". Personally, I'd replace "stores" with "systems" and I hate the term "long running transaction" so I would have written "long running operation" or "business process", but grammatical nitpicking aside that certainly seems like a fair description. Certainly, most of the scenarios I'm looking at in my day job could be described as being long running and updating multiple systems. David says later that he thinks "a lot of apps don’t need to be modeled that way", but I'm not aware of the alternatives so I'll let him expand on this on his own blog. I'll try to get to writing about my thoughts around protocol state next week.

I did disagree with this one point:

We may say REST is really about a protocol state and not CRUD, but unless the rest of the world gravitates to that view then I'm afraid it just won't be. If, say, through some ongoing groundswell of common usage, people start modelling entities as dereferenceable URI's and using POST to do Create and Updates, then REST will be about CRUD by default. This is all very unacademic and unjust, but thems the breaks peachy. The lesson from SOAP is not to fight it by trying to re-educate the masses after they get a perception in their head.

I think it doesn't really matter how the rest of the world gravitates, it only matters how you as the service provider choose to expose your service. If you're doing something simple like ATOM publishing, you can probably get away with REST as CRUD. (Would that be hi or lo REST?) If you're doing something more complex, either in terms of being long running or needing multiple updates in one atomic operation like Tim's airline example, you'll probably need to gravitate towards REST as protocol state. But can't the two models co-exist nicely, even in the same app? No re-education required.

UPDATE - From Fielding's Dissertation: "REST relies instead on the author choosing a resource identifier that best fits the nature of the concept being identified". My point exactly.


(*) Wow, this post wasn't that short at all. Can you imagine how long this would have been if I had mostly disagreed with David?

Posted By Harry Pierson at 1:59 PM Pacific Daylight Time

Thursday, May 24, 2007

REST is neither CRUD nor CRAP

In the wake of my praise for CRUD is CRAP, David Ing asked “how do you reconcile something like REST (astoria etc) with being CRUD-adverse? Is there a happy place where the two can go for coffee?” Sure there is: I hear Tim Ewald’s XML Nation serves great coffee and the scones are pretty good. (*)

Seriously, the key observation that Tim recently made is that REST != CRUD. Sure, it can be used that way and for simple scenarios it works fine. (I’ll define “simple scenarios” in a second.) But I don’t believe CRUD style REST works in the large. Tim said you can’t build with just CRUD because it’s “to simplistic to be useful”. I’ll go even more fundamental: using REST for CRUD means giving up transactions entirely. I've already accepted that building loosely coupled services means giving up distributed transactions. But the idea of giving up transactions entirely is just crazy talk.

So when I said "simple scenarios" above, I meant "scenarios that don't need transactions". (I take it as a given that RESTifarians aren't hot for WS-AtomicTransaction.) ATOM Publishing is a simple scenario because the web resource authoring scenario doesn’t need transactions to protect updates to multiple resources at a time. If it did, I don't believe the REST as CRUD approach they use would work.

As you might guess then, I’m not a fan of Astoria. I believe the sweet spot for so called “data services” will be read only (because they don't need transactions, natch). I'm sure there are some read/write scenarios Astoria will be useful for, but I think they will be limited - at least within the enterprise.

If REST != CRUD, then what is it? Let's go back to Tim's post:

Every communication protocol has a state machine. For some protocols they are very simple, for others they are more complex. When you implement a protocol via RPC, you build methods that modify the state of the communication. That state is maintained as a black box at the endpoint. Because the protocol state is hidden, it is easy to get things wrong. For instance, you might call Process before calling Init. People have been looking for ways to avoid these problems by annotating interface type information for a long time, but I'm not aware of any mainstream solutions. The fact that the state of the protocol is encapsulated behind method invocations that modify that state in non-obvious ways also makes versioning interesting.

The essence of REST is to make the states of the protocol explicit and addressable by URIs. The current state of the protocol state machine is represented by the URI you just operated on and the state representation you retrieved. You change state by operating on the URI of the state you're moving to, making that your new state. A state's representation includes the links (arcs in the graph) to the other states that you can move to from the current state. This is exactly how browser based apps work, and there is no reason that your app's protocol can't work that way too. (The ATOM Publishing protocol is the canonical example, though its easy to think that its about entities, not a state machine.)

While I disagree with Tim's characterization of ATOM (i.e. I believe APP is about entities, but it works because it doesn't need transactions), I agree 100% that REST is about protocol state. Tim lays this out very clearly in his airline reservation sample. Thus, I can spurn CRUD and still embrace REST if I want to.
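To show what "protocol state made explicit and addressable" might look like, here is a small Python sketch. It is not Tim's airline sample: the reservation URIs, the link names and the `Client` class are all made up for illustration, and a dictionary stands in for the server.

```python
# Hypothetical reservation protocol: each state is a URI, and its
# representation carries links (the arcs) to the states reachable from it.
STATES = {
    "/reservations/42/held": {
        "status": "held",
        "links": {
            "purchase": "/reservations/42/ticketed",
            "cancel": "/reservations/42/cancelled",
        },
    },
    "/reservations/42/ticketed": {
        "status": "ticketed",
        "links": {"cancel": "/reservations/42/cancelled"},
    },
    "/reservations/42/cancelled": {"status": "cancelled", "links": {}},
}


class Client:
    """Tracks the current protocol state purely as the URI last operated on."""

    def __init__(self, start_uri):
        self.uri = start_uri

    def representation(self):
        return STATES[self.uri]

    def follow(self, rel):
        """Move to a new state, but only along a link the current state offers."""
        links = self.representation()["links"]
        if rel not in links:
            raise ValueError(f"{rel!r} is not allowed from {self.uri}")
        self.uri = links[rel]
        return self.representation()


client = Client("/reservations/42/held")
print(client.follow("purchase")["status"])
print(sorted(client.representation()["links"]))
```

Contrast this with the RPC problem from Tim's quote: there is no way to "call Process before calling Init" here, because a transition that isn't linked from the current state simply isn't available.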

Further, Tim's points on the opaque nature of RPC style interactions (which web services appear to have fallen into despite the best of intentions) are spot on. If you're doing simple request/response services, the protocol state is trivial, so that works fine. However, in the scenarios I face, long running services are the norm and managing the protocol state is critical. I've got some ideas on how to do that, but that's a future blog post.


(*) Actually, I have no idea if Tim even likes coffee or scones. FYI, DevHawk Nation would not feature great coffee or pretty good scones. We would, however, have Arrogant Bastard Ale on tap.

Posted By Harry Pierson at 11:09 AM Pacific Daylight Time
Disclaimer: The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the author's discretion.