The Last Mile of the Internet

Christian Weyer makes a great comment on yesterday’s post about the barbarian rediscovery of async messaging:

But how do these two toolkits solve the NAT/Firewall issue? Without a solution to this they are pretty much useless in breadth usage.

Simply put, they don’t. Frankly, they don’t even try. And I agree with Christian that the NAT/Firewall issue makes any async messaging based approach useless for clients. It’s kind of like the last mile problem in the telco/cable industries – you’ve got this great capability in the center, but you can’t leverage its full potential because of the massive effort it takes to push that capability all the way to the edge of the network.

Dave Winer has been pretty explicit with his RSS Cloud work: “The goal is to have a Small Pieces Loosely Joined equivalent of Twitter.” PubSubHubbub doesn’t mention Twitter by name, but the protocol spec specifically says “Polling sucks. We think a decentralized pubsub layer is a fundamental, missing layer in the Internet architecture today”. Both specs have a fundamental design that looks like this:

This picture leaves out multiple publishers and subscribers and the subscriber registration process, but you get the basic idea. And it all works great assuming that both the subscriber and the pub/sub infrastructure can accept incoming connections. While that seems like a fairly safe assumption for infrastructure pieces, it is clearly a faulty assumption for any subscriber running locally on a client machine. Client machines primarily live behind firewalls at the office, behind NAT routers at home or on mobile wireless network – all of which disallow most if not all incoming connections. In other words, this works just fine for server subscribers (like, say Google Reader) but not for client subscribers (like, say TweetDeck).

As far as I can tell, the only way to enable client subscribers to play in this async messaging world is via some type of relay service. Any other solution I can think of depends on mass adoption of new technology, which as I mentioned in my last post is nearly impossible.

In this approach, the client subscriber makes an outbound connection to some type of relay infrastructure, which in turn creates a endpoint on the public internet for that client. Registration for pub/sub happens as normal, using the relay endpoint as the notification URL. Then, when a message arrives on the relay endpoint, it’s sent back down the outbound connection to the client.

The relay approach is technically feasible – it’s used in many places today. Exchange DirectPush uses this approach to support real-time delivery of mail to mobile devices – though the relay capability is built directly Exchange client access servers rather than available as a separate service. The .NET Service Bus – part of Windows Azure – provides a hosted relay infrastructure that anyone can leverage (though their support of non-windows platforms is pretty weak). I haven’t worked with it, but it looks like Opera’s new Unite platform includes a relay service as well (note, they call it a proxy service). Nice thing about Opera Unite is the async messaging infrastructure is built right into their browser, though you could achieve something similar in any browser using Flash or Silverlight.

Yes, having to relay messages sucks. But the question is, which sucks worse: polling or relaying?

Async Messaging and the Barbarian Hordes

At PDC 1996, Pat Helland did a six minute bit where he compared personal computing to the sacking of Rome and Microsoft Transaction Server to the Renaissance. It was called “Transaction Processing and the Barbarian Hordes” and in my opinion it should be required viewing for everyone in the tech industry.

Of course, the tech industry has changed significantly since PDC96. In particular, personal computing has become the new “Classical Rome” and web developers are the new barbarians. Just as Microsoft rediscovered transaction processing in the 90’s, it seems that RESTifarians are on the verge of rediscovering asynchronous messaging.

“The internet has been dead and boring for a while now. It has reached a point of stability where flashes of technological creativity are rare, but every now and then some new technology can put a spark back in the ole gal (no sexism intended).

If you haven’t heard of WebHooks orPubSubHubBub its about time you did. Both are designed to simplify and optimize the web.”

Mark Cuban, The Internet is about to change

Not to put too fine a point on it, but these “flashes of technological creativity” that Mark’s going gaga over aren’t new at all. Both Web Hooks and PubSubHubbub are essentially async messaging, the oldest form of messaging in the history of networking. But just as personal computing ignored the importance of transaction processing for a long time, REST has long ignored the importance of async messaging. Instead, web development has instead been focused exclusively on request/response – something I’ve struggled with for quite some time. But the rise of Twitter has driven many people to realize that something I’ve known since 2003: “In order to truly evolve syndication…we need to break free of the synchronous polling model.” ¹

I love the slogan from this Web Hooks presentation: “so simple you’ll think it’s stupid”. Web Hooks aren’t stupid – far from it – but they certainly are simple. They’re basically callbacks – which Web Hooks creator Jeff Lindsay readily acknowledges – invoked across the network using standard REST technology like HTTP and XML or JSON. The canonical webhook examples are Paypal Instant Payment Notification and GitHub Post-Receive Hooks. In both cases, you register a custom notification URL with the system in question. Then, when something specific happens in the system, a message gets POSTed to the registered URL. In some scenarios, it’s a simple notification. For example, when GitHub receives a commmit push, it POSTs a JSON message about the commit to the registered URL. In other scenarios, the initial message is the start of an async conversation – the system expects you to POST a message back to them sometime in the future. For example, when a customer makes a payment, PayPal POSTs a message to the URL you registered. You then confirm the payment by posting a message back to a well known PayPal URL.

Note, by the way, that both of these canonical examples depend on async messaging. GitHub isn’t going to do anything with a response anyway, so there’s no point in sending them a response. PayPal, on the other hand, is expecting a response. Yet, they use async messaging instead of an arguably simpler HTTP request/response operation. They do this for same reason WS-Transaction is the Anti-Availability Protocol – the last thing you want to do is lock up precious resources in your system waiting for some nimrod on the other side of the Internet to respond to a request you sent. Instead you what PayPal does – send an async message, listen on a separate channel for a response, correlate the messages explicitly via some kind of conversation identifier and release your precious resources to do other work while you wait for the response.

As for PubSubHubbub, it’s focused on real time delivery of new information. Dave Winer’s recent RSS Cloud efforts focus on real-time notification as well. In both cases, instead of subscribers polling a given RSS feed for changes every X amount of time, they register for notification when the feed is updated. This is very similar to the way GitHub uses async messages for commit push notification as described above.

Both PubSubHubbub and RSS Cloud include an intermediary that’s responsible for managing the list of current subscribers and relaying the notification when the publisher makes a change. Honestly, I’m not a fan of the Hub/Cloud intermediary – it feels a little too ESB-like to me. However, since it’s only relaying notifications it receives without transformation, I can live with it. Besides, there’s no reason why a publisher can’t act as it’s own hub. The vast number of blogs and twitter users have so few subscribers that the extra layer of abstraction is probably not worth it. On the other hand, if you’re going to run a notification hub for the largest users, you might as well use it for smaller ones as well.

While I think Mark’s laid the “new technology” hype on pretty thick, I do think he hits the nail on the head regarding the major new business opportunities that can come from adopting the heretofore ignored async messaging model on the web:

“This could be an open door for the content business…Using The Associated Press as an example, AP could post their stories to a HUB. In realtime, the HUB can update member websites so that they will always have information first, before any aggregator. It may not take long for aggregators to recognize the new data on the member sites, but they won’t have it first.

The New York Times could do the same thing. Subscribers could get everything first, in realtime. Then after some delay which might be 1 minute, it might be 30 minutes depending on what the paper thinks is the value related to timeliness, it could post on the website and on twitter and facebook as updates. Would NY Times online readers pay $1 a month to be guaranteed that they get their news first, before anyone else ? I dont know.

In the sports world, text based play by play websites could be updated in realtime rather than pulling every 30 seconds or requiring the user to hit refresh every few seconds.”

Arguably, this opportunity is easier to realize precisely because async messaging isn’t new technology. Getting people to adopt a new technology is incredibly hard. It’s much easier to get people to adopt a new pattern for using an existing technology. And async messaging has been possible as long as the web has been in existence.

Web Hooks and PubSubHubbub are long overdue but very welcome steps forward in the evolution of the Internet. I wonder what the barbarians will rediscover next?

Of course, writing a prediction like this is a far sight from actually implementing it. If I had actually put some engineering effort behind this in 2003, maybe I’d be a household name in the tech community by now. On the other hand, I said some things in that same post that have turned out to be spectacularly incorrect (“Indigo is going to make Longhorn a great platform for SOA”) so it probably wouldn’t have made much of a difference.↩

DevHawk World Tour FY2010

As I’ve done the past two years, here’s a list of all the places I’m going in the next fiscal year. Traditionally, I’ve done this post by calendar year, but all MSFT planning is done by FY and so invariably I miss events early in the calendar year but late in the fiscal (like PyCon last year). I’ll be updating this post periodically as I get tapped for more presentations. There are several other conferences I’m considering, submitting sessions for, in discussions with, but these are the ones that are confirmed.

Danish University Tour, Sept 7-11
My FY10 travels first take me to Copenhagen, where I was invited by the local subsidiary to present at four different universities in a single week. Don’t know how much sightseeing I’ll get done, but I’ll sure be talking a lot. My host Martin Esmann writes Stud.blog for Danish ComputerWorld and has a post (in Danish) about my visit. Personally, I am just excited about being featured in something called “Stud.blog”! 😄 Actually, Stud here means “Student” not “slender, upright members of wood” or any other definition of the term “stud”.

I’ll be visiting Aalborg University, Aarhus University, University of Southern Denmark and University of Copehhagen as well as delivering a TechTalk at the Microsoft Development Center Copenhagen, which is Microsoft’s biggest development center in Europe. I’ll primarily be delivering my Iron Languages introductory talk “Pumping Iron”, but there’s also some interest in language development on the DLR so I’ll be talking on that topic as well.

patterns & practices Summit Redmond 2009, Oct 12-16
This will be my third p&p Summit in a row and fourth in five years. This year, I’m doing a talk called “Not Everything is a new Nail() : How Languages Influence Design”. I was supposed to deliver this talk last year, but got side track with my day job and ended up talking about IronPython instead. Keith has made it VERY clear he doesn’t want another last minute substitution again this year.

Turing award winner Alan Perlis is credited with saying ‘A language that doesn’t affect the way you think about programming is not worth knowing.’ Yet, most programmers rarely venture outside of the comfort zone of statically-typed object-oriented languages. Our heavy use of object-oriented languages influences our thinking to the point that we can?t see alternative approaches at all. This isn?t to say the object-oriented languages are bad, but as is typical in most things, there is no one ‘best’ way for all situations. In this talk, VS Languages PM Harry Pierson will look at a given software development scenario from both the object-oriented and functional perspectives, in order to see how much on an influence language really has on our engineering efforts.

TechEd_Europe_2009

Tech·Ed Europe 2009, Nov 9-13
I knew I was going to be updating this post over time, but I didn’t expect to have to update it so soon! Literally the day after I posted this, I got the speaker invite for Tech·Ed Europe 2009. My session hasn’t been posted yet, but this is the abstract we submitted:

Dynamic Languages on the Microsoft .NET Framework
The Dynamic Language Runtime (DLR) adds a shared dynamic type system, a standard hosting model, and support for generating fast dynamic code to the CLR. IronPython and IronRuby are Microsoft’s dynamic language implementations on .NET. In this talk, we’ll show you how to interactively create great .NET applications using dynamic languages. You’ll walk away knowing why dynamic languages deserve a spot in your toolbox!

It’s kind of generic, but given that most of the audience probably hasn’t seen IronPython or IronRuby, having broad latitude in my presentation topic is a good thing. I’ll probably deliver a variant of my standard “Pumping Iron” talk like I’m doing in Denmark. I delivered it recently at an internal event with Jimmy, so there’s lots more IronRuby content than there used to be.

The only bummer about doing Tech·Ed Europe is that I’m only doing one measly talk. I’m asking around – I’d love to do a .NET user group or university talk while I’m in town. Any takers?

Find out what's
next

~~Microsoft Professional Developers Conference 2009, Nov 17-19~~
Update: Tech·Ed Europe and PDC are on back-to-back weeks this year so we’ll be sending a teammate-to-be-determined to PDC in my stead. My family is very pleased I won’t be gone for two weeks straight.

Last year, I was on the content team for PDC. This year, that PITA responsibility belongs to someone else so I might actually get real work done in the four weeks leading up to PDC. My team will tell you, last year PDC sucked up 100% of my time for a month as we were driving towards our 2.0 release.

Technically, I haven’t had a talk for PDC accepted yet. But I submitted three and two are looking good (though I assume only one will make it to the actual show) so I thought I’d just go ahead and include it on this post. If/when my talks get accepted, I’ll post links and abstracts. Also, if one of my PDC talks is accepted, I’ll probably submit a talk for ~~SoCal Code Camp~~ ~~as well.~~

pycon logo

PyCon 2010, Feb 19-21
This will also be my third PyCon in a row, though PyCon last year was a bit of a whirlwind since I had literally just joined the IronPython team. I finally feel like I might have something interesting to present at PyCon this year. Last year Dino and Jim handled the presentation duties from our team (with Michael Foord and Jonathan Hartley delivering a tutorial and Sarah Sutkiewicz speaking on FePy). We already have one announcement that I think is pretty significant lined up and might have a second depending on how hard I can push LCA and management between now and then. Talk proposals are due October 1st, so any suggestions would be appreciated!

HawkCodeBox

Last month, I lamented the lack of extensibility of the WPF text box. While there are several vendors and at least one open source custom syntax highlighting text box, it still really bothers me how inextensible the basic WPF text box is. I just want to do a simple colorizing REPL – why is that so hard?

So instead of using any of those syntax highlighting text boxes, I decided to build my own using the approach Ken Johnson wrote about on Code Project. As I wrote before, it’s a hack – you set the text box’s foreground and background brushes to transparent so that you can override OnRender – but it works.

The big change I made from Ken’s code was to use DLR TokenCategorizer instead of regular expressions to tokenize the code. TokenCategorizer is a service provided by the DLR hosting API, which will tokenize a given script source for you. Here’s the code that colorizes the text in the text box.

var source = Engine.CreateScriptSourceFromString(this.Text);
var tokenizer = Engine.GetService<TokenCategorizer>();
tokenizer.Initialize(null, source, SourceLocation.MinValue);

var t = tokenizer.ReadToken();
while (t.Category != TokenCategory.EndOfStream)
{
    if (SyntaxMap.ContainsKey(t.Category))
    {
        ft.SetForegroundBrush(_syntaxMap[t.Category],
             t.SourceSpan.Start.Index, t.SourceSpan.Length);
    }

    t = tokenizer.ReadToken();
}

As you can see, I ask the engine for a TokenCategorizer, initialize it with the text box’s current contents, then iterate thru the tokens, looking for ones in my SyntaxMap. If the token category is in the syntax map, we change the foreground brush for that span of formatted text (ft is a WPF FormattedText instance I created earlier in the method.

Of course, this approach isn’t very efficient – it re-colorizes the entire file on every change. It turns out that some DLR TokenCategorizer are restartable so you can cache the tokenizer state at any point and then return later with a new TokenCategorizer instance and pick up tokenizing where you left off. With this approach, you could say tokenize a line at a time, allowing you to only need to retokenize the line where the change occurred rather than the entire file. But only IronPython supports tokenizer restarting today, so I decided to take the easy way and simple re-colorize on every change.

I named the project HawkCodeBox and I’ve published the source up on GitHub. It’s fairly simple, but of course the goal wasn’t to build the be-all-end-all text editor – other people in the VS team already have that job.

CodePlex Editor Role

Ask Sara, I have been bugging her for a LONG time for this CodePlex feature. Actually, my team has been bugging her team for longer than either of us have been in these jobs.

Last week’s CodePlex release includes a feature known as “Editor Role”. If you look at the Project Role Matrix, you’ll notice two primary differences from what the standard logged-in user can do: they can create/edit wiki pages and they can’t rate releases. Developers and Coordinators can’t rate releases either – I guess the idea is that they don’t want members of the team rating their own releases (5 Stars! Again! Wow, we’re awesome!).

Until now, the only way to give members of the community the ability to edit the wiki also gave permission to edit work items, check in source code and make releases. We’re still working on getting Microsoft at large to understand the benefits of community collaboration aspect in open source, but in the meantime we just can’t give those permissions to people off the team. However, we would love to have contributions to our documentation wiki. ¹ With the new Editor Role, we’ll be able to grant wiki editor access without any of the other permissions.

Of course, the whole idea of “wiki permissions” kinda flies in the face of the basic wiki design principles. So we’re going to be pretty liberal about handing out editor permissions. If you’re interested in editing the wiki, drop me a line and I’ll get you hooked up.

Big mega-thanks to the CodePlex team for making this feature happen. I guess I’ll have to find something new to bug Sara about!

You can tell we’re a real open source project because we’re begging for documentation help!↩

Series

Disclaimer

The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the authors discretion.