The Last Mile of the Internet

Christian Weyer makes a great comment on yesterday’s post about the barbarian rediscovery of async messaging:

But how do these two toolkits solve the NAT/Firewall issue? Without a solution to this they are pretty much useless in breadth usage.

Simply put, they don’t. Frankly, they don’t even try. And I agree with Christian that the NAT/Firewall issue makes any async messaging based approach useless for clients. It’s kind of like the last mile problem in the telco/cable industries – you’ve got this great capability in the center, but you can’t leverage its full potential because of the massive effort it takes to push that capability all the way to the edge of the network.

Dave Winer has been pretty explicit with his RSS Cloud work: “The goal is to have a Small Pieces Loosely Joined equivalent of Twitter.” PubSubHubbub doesn’t mention Twitter by name, but the protocol spec specifically says “Polling sucks. We think a decentralized pubsub layer is a fundamental, missing layer in the Internet architecture today”. Both specs have a fundamental design that looks like this:

This picture leaves out multiple publishers and subscribers and the subscriber registration process, but you get the basic idea. And it all works great assuming that both the subscriber and the pub/sub infrastructure can accept incoming connections. While that seems like a fairly safe assumption for infrastructure pieces, it is clearly a faulty assumption for any subscriber running locally on a client machine. Client machines primarily live behind firewalls at the office, behind NAT routers at home or on mobile wireless network – all of which disallow most if not all incoming connections. In other words, this works just fine for server subscribers (like, say Google Reader) but not for client subscribers (like, say TweetDeck).

As far as I can tell, the only way to enable client subscribers to play in this async messaging world is via some type of relay service. Any other solution I can think of depends on mass adoption of new technology, which as I mentioned in my last post is nearly impossible.

In this approach, the client subscriber makes an outbound connection to some type of relay infrastructure, which in turn creates a endpoint on the public internet for that client. Registration for pub/sub happens as normal, using the relay endpoint as the notification URL. Then, when a message arrives on the relay endpoint, it’s sent back down the outbound connection to the client.

The relay approach is technically feasible – it’s used in many places today. Exchange DirectPush uses this approach to support real-time delivery of mail to mobile devices – though the relay capability is built directly Exchange client access servers rather than available as a separate service. The .NET Service Bus – part of Windows Azure – provides a hosted relay infrastructure that anyone can leverage (though their support of non-windows platforms is pretty weak). I haven’t worked with it, but it looks like Opera’s new Unite platform includes a relay service as well (note, they call it a proxy service). Nice thing about Opera Unite is the async messaging infrastructure is built right into their browser, though you could achieve something similar in any browser using Flash or Silverlight.

Yes, having to relay messages sucks. But the question is, which sucks worse: polling or relaying?

Async Messaging and the Barbarian Hordes

At PDC 1996, Pat Helland did a six minute bit where he compared personal computing to the sacking of Rome and Microsoft Transaction Server to the Renaissance. It was called “Transaction Processing and the Barbarian Hordes” and in my opinion it should be required viewing for everyone in the tech industry.

Of course, the tech industry has changed significantly since PDC96. In particular, personal computing has become the new “Classical Rome” and web developers are the new barbarians. Just as Microsoft rediscovered transaction processing in the 90’s, it seems that RESTifarians are on the verge of rediscovering asynchronous messaging.

“The internet has been dead and boring for a while now.  It has reached a point of stability where flashes of technological creativity are rare, but every now and then some new technology can put a spark back in the ole gal (no sexism intended).

If you haven’t heard of WebHooks orPubSubHubBub its about time you did. Both are designed to  simplify and optimize the web.”

Mark Cuban, The Internet is about to change

Not to put too fine a point on it, but these “flashes of technological creativity” that Mark’s going gaga over aren’t new at all. Both Web Hooks and PubSubHubbub are essentially async messaging, the oldest form of messaging in the history of networking. But just as personal computing ignored the importance of transaction processing for a long time, REST has long ignored the importance of async messaging. Instead, web development has instead been focused exclusively on request/response – something I’ve struggled with for quite some time. But the rise of Twitter has driven many people to realize that something I’ve known since 2003: “In order to truly evolve syndication…we need to break free of the synchronous polling model.” 1

image

I love the slogan from this Web Hooks presentation: “so simple you’ll think it’s stupid”. Web Hooks aren’t stupid – far from it – but they certainly are simple. They’re basically callbacks – which Web Hooks creator Jeff Lindsay readily acknowledges – invoked across the network using standard REST technology like HTTP and XML or JSON. The canonical webhook examples are Paypal Instant Payment Notification and GitHub Post-Receive Hooks. In both cases, you register a custom notification URL with the system in question. Then, when something specific happens in the system, a message gets POSTed to the registered URL. In some scenarios, it’s a simple notification. For example, when GitHub receives a commmit push, it POSTs a JSON message about the commit to the registered URL. In other scenarios, the initial message is the start of an async conversation – the system expects you to POST a message back to them sometime in the future. For example, when a customer makes a payment, PayPal POSTs a message to the URL you registered. You then confirm the payment by posting a message back to a well known PayPal URL.

Note, by the way, that both of these canonical examples depend on async messaging. GitHub isn’t going to do anything with a response anyway, so there’s no point in sending them a response. PayPal, on the other hand, is expecting a response. Yet, they use async messaging instead of an arguably simpler HTTP request/response operation. They do this for same reason WS-Transaction is the Anti-Availability Protocol – the last thing you want to do is lock up precious resources in your system waiting for some nimrod on the other side of the Internet to respond to a request you sent. Instead you what PayPal does – send an async message, listen on a separate channel for a response, correlate the messages explicitly via some kind of conversation identifier and release your precious resources to do other work while you wait for the response.

image

As for PubSubHubbub, it’s focused on real time delivery of new information. Dave Winer’s recent RSS Cloud efforts focus on real-time notification as well. In both cases, instead of subscribers polling a given RSS feed for changes every X amount of time, they register for notification when the feed is updated. This is very similar to the way GitHub uses async messages for commit push notification as described above.

Both PubSubHubbub and RSS Cloud include an intermediary that’s responsible for managing the list of current subscribers and relaying the notification when the publisher makes a change.  Honestly, I’m not a fan of the Hub/Cloud intermediary – it feels a little too ESB-like to me. However, since it’s only relaying notifications it receives without transformation, I can live with it. Besides, there’s no reason why a publisher can’t act as it’s own hub. The vast number of blogs and twitter users have so few subscribers that the extra layer of abstraction is probably not worth it. On the other hand, if you’re going to run a notification hub for the largest users, you might as well use it for smaller ones as well.

While I think Mark’s laid the “new technology” hype on pretty thick, I do think he hits the nail on the head regarding the major new business opportunities that can come from adopting the heretofore ignored async messaging model on the web:

“This could be an open door for the content business…Using The Associated Press as an example, AP could post their stories to a HUB. In realtime, the HUB can update member websites so that they will always have information first, before any aggregator. It may not take long for aggregators to recognize the new data on the member sites, but they won’t have it first.

The New York Times could do the same thing. Subscribers could get everything first, in realtime. Then after some delay which might be 1 minute, it might be 30 minutes depending on what the paper thinks is the value related to timeliness, it could post on the website and on twitter and facebook as updates. Would NY Times online readers pay $1 a month to be guaranteed that they get their news first, before anyone else ? I dont know.

In the sports world, text based play by play websites could be updated in realtime rather than pulling every 30 seconds or requiring the user to hit refresh every few seconds.”

Arguably, this opportunity is easier to realize precisely because async messaging isn’t new technology. Getting people to adopt a new technology is incredibly hard. It’s much easier to get people to adopt a new pattern for using an existing technology. And async messaging has been possible as long as the web has been in existence.

Web Hooks and PubSubHubbub are long overdue but very welcome steps forward in the evolution of the Internet. I wonder what the barbarians will rediscover next?


  1. Of course, writing a prediction like this is a far sight from actually implementing it. If I had actually put some engineering effort behind this in 2003, maybe I’d be a household name in the tech community by now. On the other hand, I said some things in that same post that have turned out to be spectacularly incorrect (“Indigo is going to make Longhorn a great platform for SOA”) so it probably wouldn’t have made much of a difference.

Morning Coffee 172

  • I took the kids to see Fly Me To The Moon recently. We had to trek to Monroe (about 30 minutes away) because it’s a special 3D movie, and it was only playing there and in downtown Seattle. The movie’s story is insipid – three flies stow away on Apollo 11 – but all the space shots were actually kinda cool. It sure felt like they wanted to be scientifically and historically accurate about the the actual mission (well, other than the part about the flies). Patrick really liked it (he wants to build a rocket in the back yard) and Riley sat thru the whole thing with a minimum of fussing.
  • I’m a big fan of Joe Biden, so I’m really happy Obama picked him to be his running mate.
  • I know it’s old news but what the frak was John Edwards thinking? I like his policies, but the arrogance it takes to run for president when you know you’ve got that skeleton in your closet is mind-boggling.
  • On the other hand, watching the Sean Hannity and guest’s hypocrisy on Edwards’ affair, only to watch them scramble like cockroaches when Colmes points out McCain had admitted to having an affair was frakking hilarious.

OK, onto geek stuff:

  • My new boss Dave Remy has moved to a new blog. If you’re curious what he was up to for the 10 months he was away from Microsoft, he’s happy to share.
  • IPy and IRuby developer Curt Hagenlocher (aka Iron Curt) is blogging. Cue the Ozzy…I AM IRON CURT. Or don’t. Anyway, he dives in the deep end of the pool – no “hello world” lollyblogging for Iron Curt – digging into the stack implications of rethrowing exceptions and debugging emitted IL.
  • Srivatsn writes about static compilation of IPy scripts. Note, we’re not talking about static typing – it’s still the same good-old dynamically typed IronPython, just packaged up as an assembly, rather than as a bunch of .py files. Note, if you’re interested in compiling IronPython, you should check out the PYC sample we published as part of Beta 4.
  • Speaking of IPy Beta 4, Shri Borde posts about the COM dispatch support which is enabled by default as of Beta 4. If you’re driving COM automation clients (like Office) from IPy, this is a huge improvement over the old mechanism.
  • Jeff Hardy has released a new version of NWSGI, a managed version of Python’s Web Service Server Gateway Interface. My understanding is that this would allow any Python web stack written against WSGI to run in IIS with IronPython (subject to IronPython’s compatibility with CPython). Jeff’s been documenting his efforts getting Django running with NWSGI on his blog. Awesome work Jeff! (Thanks for the correction Seo!)
  • I never really bought into the “Attention Economy”, but Chris Anderson’s economic analysis of his DIY Drones site traffic was fascinating.
  • Lutz announces “it is time to move on” from Reflector and there was a collective horrified scream in the .NET community. He’s handing it over to Red Gate, who promised they “will continue to offer the tool for free to the community”.
  • I missed this when he posted it in June, but I really liked Nikhil Kothari use of the DLR in Silverlight to cut down on the XAML verbosity in his ViewModel action binding.
  • Brian McNamara previews the new Add Reference and file ordering support in the upcoming F# CTP. I’m really looking forward to the project-to-project reference support. I can’t tell you how many times I’ve gotten burned because my main project recompiled but my test project didn’t. You just get used to hitting Rebuild All instead of Build. As for file ordering, it’s a bit of a bummer that F# requires it, but the new experience is hella better than editing the project file by hand. I’m really looking forward to the new CTP.

Morning Coffee 164

  • Big news since my last Morning Coffee post was the announcement of Live Mesh. I’ve been running it for about a month, and I’m really digging it. Make sure you check out the team blog and watch the developer tour video (be on the lookout for IPy about half way thru the video)

ALT.NET

  • I had a great time @ the ALT.NET open space conference last weekend. I was somewhat distracted on Saturday as due to a family communication mixup, I had to bring my son Patrick with me. Jeffrey Palermo shot a cute video of him (3 minutes in) where he explains that he’s at the conference “to be with my dad”. Having a five year old is a little distracting, but everyone was amazingly cool with having him around. When he gets a little older I have no doubt he’ll be attending conferences and leading open sessions.
  • I did a session on F#, but it felt kinda all over the place. I hadn’t touched F# in a few months and it showed IMO. Matt Podwysocki was there to help keep the session from devolving into mass chaos. Thanks Matt.
  • My favorite session of the conference was Scott Hanselman’s “Are We Innovating?” talk, which I think originated from a question I asked him: There are many examples of large OSS projects in other dev communities that get ported to .NET (NHibernate, NAnt, MonoRail, etc). Can you name one that’s gone the other way? I can’t.
  • I took Matt’s advice and joined the local ALT.NET Seattle group.

DyLang Stuff

  • Martin Maly posts about how dynamic method dispatches are cached in three different layers by the DLR. You shouldn’t care about this stuff if you’re a DLR language user, but you will certainly care about it if you’re a DLR language builder.
  • I’m really excited to see Phil Haack (whom I met F2F @ ALT.NET) is experimenting with IronRuby & ASP.NET MVC. True, I’d rather it was IPy, but his Routes.LoadFromRuby would work with Python with very little code change.
  • Note to self, take a deeper look at Twining, the IPy database DSL by David Seruyange.
  • Daily Michael Foord – Ironclad 0.2 Released. Ironclad is a project to implement Python’s C extension API in C# so that IronPython could load standard Python C modules like SciPy and NumPy. So far, they’re able to load the bz2 module.

Other Stuff

  • Congrats to Brad and Jim for shipping xUnit.net 1.0.
  • Everyone seems to be jumping on the functional C# coding bandwagon. Bart De Smet’s series on pattern matching in C# is currently at eight posts. Now Luca Bolognese is in on the action, with three posts so far on functional code in C#. I like how Luca keeps writing that the C# syntax is “not terrible” for functional programming. Again, why suffer thru the “not terrible” syntax when you could be using F# instead? (via Charlie Calvert)
  • I need to take a look at VLinq. Charlie and Scott Hanselman both mentioned it recently.
  • I would like to have been in the conversation with Ted Neward, Neal Ford, Venkat Subramaniam, Don Box and Amanda Silver.
  • I haven’t had any time to play with XNA of late, which means the great list of GDC videos Dave Weller posted on the XNA team blog will remain beyond my ability to invest time for now.
  • There’s a new drop of Spec# from MS Research. IronRuby is using Spec# heavily as I recall.

Morning Coffee 163

Between MVP summit last week, ALT.NET this past weekend and an internal brown-bag presentation yesterday, my unread email and blog posts have piled up. Most of the following is old news, but I wanted to get something out. Especially since I feel a case of Caps Fever coming on that will force – forceyou understand – me to head home early today.

DyLang Stuff

  • My teammate Srivatsn demonstrates how to make your static C# types act more dynamic in order to interop better with DLR languages. For example, by implementing GetBoundMember and SetMemberAfter, you can support setting arbitrary attributes on a C# class from Python. Cool.
  • Today’s Michael Foord link: On Testing: Some Programmers Refuse to Get it. He’s responding to a comment by Allen Holub suggesting that having 110k of test code for 30k of production code is “a real indictment of the language” (IronPython). I’m with Michael on this, Holub’s suggestion is laughable and worse radically uninformed. I like the way Larry O’Brien (who passed on Holub’s comment in the first place) describes the views of tests from inside and outside the agile community. I also like his description of tests as “quality diodes”.

Other Stuff

  • Werner Vogels posts about a new Amazon EC2 feature: Persistent local storage. Basically, you can create an empty volume up to a terabyte in size and then mount it to your images as a drive. The objective seems to be able to run relational databases in the images, rather than being limited to S3 and SimpleDB. Kinda interesting, but given Google’s announcement last week, I think the shine is off EC2 a bit.
  • This past weekend’s Twitter outage has Dave Winer re-thinking the idea of building networks on a single point of failure. While obviously I agree with the concept, I don’t agree with his solution that “We need some big infrastructure companies to get into this game”. While there are some big blog infrastructures out there, most of that network was built on a massive number of small infrastructures. Why wouldn’t the same thing work for microblogging?