Passion * Technology * Ruthless Competence

Thursday, August 27, 2009

The Last Mile of the Internet

Christian Weyer makes a great comment on yesterday’s post about the barbarian rediscovery of async messaging:

But how do these two toolkits solve the NAT/Firewall issue? Without a solution to this they are pretty much useless in breadth usage.

Simply put, they don’t. Frankly, they don’t even try. And I agree with Christian that the NAT/Firewall issue makes any async messaging based approach useless for clients. It’s kind of like the last mile problem in the telco/cable industries – you’ve got this great capability in the center, but you can’t leverage its full potential because of the massive effort it takes to push that capability all the way to the edge of the network.

Dave Winer has been pretty explicit with his RSS Cloud work: “The goal is to have a Small Pieces Loosely Joined equivalent of Twitter.” PubSubHubbub doesn’t mention Twitter by name, but the protocol spec specifically says “Polling sucks. We think a decentralized pubsub layer is a fundamental, missing layer in the Internet architecture today”. Both specs have a fundamental design that looks like this:

image

This picture leaves out multiple publishers and subscribers and the subscriber registration process, but you get the basic idea. And it all works great assuming that both the subscriber and the pub/sub infrastructure can accept incoming connections. While that seems like a fairly safe assumption for infrastructure pieces, it is clearly a faulty assumption for any subscriber running locally on a client machine. Client machines primarily live behind firewalls at the office, behind NAT routers at home or on mobile wireless networks – all of which disallow most if not all incoming connections. In other words, this works just fine for server subscribers (like, say, Google Reader) but not for client subscribers (like, say, TweetDeck).

image

As far as I can tell, the only way to enable client subscribers to play in this async messaging world is via some type of relay service. Any other solution I can think of depends on mass adoption of new technology, which as I mentioned in my last post is nearly impossible.

image

In this approach, the client subscriber makes an outbound connection to some type of relay infrastructure, which in turn creates an endpoint on the public internet for that client. Registration for pub/sub happens as normal, using the relay endpoint as the notification URL. Then, when a message arrives on the relay endpoint, it’s sent back down the outbound connection to the client.
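The relay flow described above can be sketched in a few lines of Python. This is only an in-process toy, assuming a queue can stand in for the client's outbound connection; the Relay class, its method names and the relay.example.com host are all hypothetical and don't correspond to any real relay service.

```python
import queue
import uuid


class Relay:
    """Toy in-process stand-in for a relay service (all names are hypothetical)."""

    def __init__(self, public_host="relay.example.com"):
        self.public_host = public_host
        # Public endpoint URL -> queue standing in for the client's outbound connection.
        self._connections = {}

    def connect(self):
        """Client side: open the 'outbound connection' and get a public endpoint back."""
        endpoint = "http://%s/%s" % (self.public_host, uuid.uuid4().hex)
        connection = queue.Queue()
        self._connections[endpoint] = connection
        return endpoint, connection

    def receive(self, endpoint, message):
        """Hub side: a notification arrives on the public endpoint."""
        connection = self._connections.get(endpoint)
        if connection is None:
            return False  # no client is connected behind this endpoint
        connection.put(message)  # push it back down the client's own connection
        return True


relay = Relay()
endpoint, connection = relay.connect()                # client: outbound only
delivered = relay.receive(endpoint, "feed updated")   # hub: inbound to the relay
message = connection.get(timeout=1)                   # client: reads its own connection
```

The client would hand `endpoint` to the hub as its notification URL during registration; nothing ever has to connect inbound to the client itself.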

The relay approach is technically feasible – it’s used in many places today. Exchange DirectPush uses this approach to support real-time delivery of mail to mobile devices – though the relay capability is built directly into Exchange client access servers rather than available as a separate service. The .NET Service Bus – part of Windows Azure – provides a hosted relay infrastructure that anyone can leverage (though their support of non-Windows platforms is pretty weak). I haven’t worked with it, but it looks like Opera’s new Unite platform includes a relay service as well (note, they call it a proxy service). Nice thing about Opera Unite is the async messaging infrastructure is built right into their browser, though you could achieve something similar in any browser using Flash or Silverlight.

Yes, having to relay messages sucks. But the question is, which sucks worse: polling or relaying?

Posted By Harry Pierson at 11:08 AM Pacific Daylight Time

Wednesday, August 26, 2009

Async Messaging and the Barbarian Hordes

At PDC 1996, Pat Helland did a six minute bit where he compared personal computing to the sacking of Rome and Microsoft Transaction Server to the Renaissance. It was called “Transaction Processing and the Barbarian Hordes” and in my opinion it should be required viewing for everyone in the tech industry.

Of course, the tech industry has changed significantly since PDC96. In particular, personal computing has become the new “Classical Rome” and web developers are the new barbarians. Just as Microsoft rediscovered transaction processing in the 90’s, it seems that RESTifarians are on the verge of rediscovering asynchronous messaging.

“The internet has been dead and boring for a while now.  It has reached a point of stability where flashes of technological creativity are rare, but every now and then some new technology can put a spark back in the ole gal (no sexism intended).

If you haven’t heard of WebHooks or PubSubHubBub its about time you did. Both are designed to  simplify and optimize the web.”

Mark Cuban, The Internet is about to change

Not to put too fine a point on it, but these “flashes of technological creativity” that Mark’s going gaga over aren’t new at all. Both Web Hooks and PubSubHubbub are essentially async messaging, the oldest form of messaging in the history of networking. But just as personal computing ignored the importance of transaction processing for a long time, REST has long ignored the importance of async messaging. Instead, web development has been focused exclusively on request/response – something I’ve struggled with for quite some time. But the rise of Twitter has driven many people to realize something I’ve known since 2003: “In order to truly evolve syndication…we need to break free of the synchronous polling model.” [1]

I love the slogan from this Web Hooks presentation: “so simple you’ll think it’s stupid”. Web Hooks aren’t stupid – far from it – but they certainly are simple. They’re basically callbacks – which Web Hooks creator Jeff Lindsay readily acknowledges - invoked across the network using standard REST technology like HTTP and XML or JSON. The canonical webhook examples are PayPal Instant Payment Notification and GitHub Post-Receive Hooks. In both cases, you register a custom notification URL with the system in question. Then, when something specific happens in the system, a message gets POSTed to the registered URL. In some scenarios, it’s a simple notification. For example, when GitHub receives a commit push, it POSTs a JSON message about the commit to the registered URL. In other scenarios, the initial message is the start of an async conversation - the system expects you to POST a message back to them sometime in the future. For example, when a customer makes a payment, PayPal POSTs a message to the URL you registered. You then confirm the payment by posting a message back to a well known PayPal URL.
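To make the "register a URL, get POSTed to later" pattern concrete, here's a minimal sketch in Python. It isn't how GitHub or PayPal implement their hooks; the WebHooks class, the event names and the example.com URLs are made up for illustration, and the usage at the bottom swaps in a fake transport so nothing actually goes over the network.

```python
import json
import urllib.request


def http_post(url, body):
    """Default transport: POST a JSON body to a registered notification URL."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


class WebHooks:
    """Hypothetical registry of notification URLs, keyed by event name."""

    def __init__(self, post=http_post):
        self._post = post
        self._urls = {}  # event name -> list of registered notification URLs

    def register(self, event, url):
        self._urls.setdefault(event, []).append(url)

    def fire(self, event, payload):
        # When something happens, POST a message to every URL registered for it.
        body = json.dumps({"event": event, "payload": payload})
        for url in self._urls.get(event, []):
            self._post(url, body)


# Usage with a fake transport that records what would have been sent.
sent = []
hooks = WebHooks(post=lambda url, body: sent.append((url, json.loads(body))))
hooks.register("push", "http://example.com/hooks/commits")
hooks.fire("push", {"commit": "abc123"})
hooks.fire("issue", {"id": 1})  # nobody registered for this event, so nothing is sent
```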

Note, by the way, that both of these canonical examples depend on async messaging. GitHub isn’t going to do anything with a response anyway, so there’s no point in sending them a response. PayPal, on the other hand, is expecting a response. Yet, they use async messaging instead of an arguably simpler HTTP request/response operation. They do this for the same reason WS-Transaction is the Anti-Availability Protocol – the last thing you want to do is lock up precious resources in your system waiting for some nimrod on the other side of the Internet to respond to a request you sent. Instead, you do what PayPal does – send an async message, listen on a separate channel for a response, correlate the messages explicitly via some kind of conversation identifier and release your precious resources to do other work while you wait for the response.
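Here's a rough sketch of that send, listen and correlate pattern in Python. It's an illustration under my own assumptions, not PayPal's actual protocol: the Conversations class, the conversation_id field and the message shapes are all hypothetical.

```python
import uuid


class Conversations:
    """Correlates async replies with the messages that started them (hypothetical)."""

    def __init__(self, send):
        self._send = send    # callable(message dict); fire and forget
        self._pending = {}   # conversation id -> callback awaiting the reply

    def start(self, body, on_reply):
        conversation_id = uuid.uuid4().hex
        self._pending[conversation_id] = on_reply
        self._send({"conversation_id": conversation_id, "body": body})
        # Return right away: no thread or connection is held while we wait.
        return conversation_id

    def handle_reply(self, message):
        """Called by whatever listens on the separate reply channel."""
        on_reply = self._pending.pop(message.get("conversation_id"), None)
        if on_reply is None:
            return False  # unknown or already-answered conversation
        on_reply(message["body"])
        return True


outbox, results = [], []
conversations = Conversations(send=outbox.append)
cid = conversations.start("payment received", results.append)
# ...some time later, the other side posts its answer on the reply channel...
handled = conversations.handle_reply({"conversation_id": cid, "body": "confirmed"})
duplicate = conversations.handle_reply({"conversation_id": cid, "body": "confirmed"})
```

The only thing kept around between the request and the response is one dictionary entry, which is the whole point.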

As for PubSubHubbub, it’s focused on real time delivery of new information. Dave Winer’s recent RSS Cloud efforts focus on real-time notification as well. In both cases, instead of subscribers polling a given RSS feed for changes every X amount of time, they register for notification when the feed is updated. This is very similar to the way GitHub uses async messages for commit push notification as described above.

Both PubSubHubbub and RSS Cloud include an intermediary that’s responsible for managing the list of current subscribers and relaying the notification when the publisher makes a change.  Honestly, I’m not a fan of the Hub/Cloud intermediary – it feels a little too ESB-like to me. However, since it’s only relaying notifications it receives without transformation, I can live with it. Besides, there’s no reason why a publisher can’t act as its own hub. The vast number of blogs and twitter users have so few subscribers that the extra layer of abstraction is probably not worth it. On the other hand, if you’re going to run a notification hub for the largest users, you might as well use it for smaller ones as well.

While I think Mark’s laid the “new technology” hype on pretty thick, I do think he hits the nail on the head regarding the major new business opportunities that can come from adopting the heretofore ignored async messaging model on the web:

“This could be an open door for the content business…Using The Associated Press as an example, AP could post their stories to a HUB. In realtime, the HUB can update member websites so that they will always have information first, before any aggregator. It may not take long for aggregators to recognize the new data on the member sites, but they won’t have it first.

The New York Times could do the same thing. Subscribers could get everything first, in realtime. Then after some delay which might be 1 minute, it might be 30 minutes depending on what the paper thinks is the value related to timeliness, it could post on the website and on twitter and facebook as updates. Would NY Times online readers pay $1 a month to be guaranteed that they get their news first, before anyone else ? I dont know.

In the sports world, text based play by play websites could be updated in realtime rather than pulling every 30 seconds or requiring the user to hit refresh every few seconds.”

Arguably, this opportunity is easier to realize precisely because async messaging isn’t new technology. Getting people to adopt a new technology is incredibly hard. It’s much easier to get people to adopt a new pattern for using an existing technology. And async messaging has been possible as long as the web has been in existence.

Web Hooks and PubSubHubbub are long overdue but very welcome steps forward in the evolution of the Internet. I wonder what the barbarians will rediscover next?


[1] Of course, writing a prediction like this is a far sight from actually implementing it. If I had actually put some engineering effort behind this in 2003, maybe I’d be a household name in the tech community by now. On the other hand, I said some things in that same post that have turned out to be spectacularly incorrect (“Indigo is going to make Longhorn a great platform for SOA”) so it probably wouldn’t have made much of a difference.

Posted By Harry Pierson at 11:33 AM Pacific Daylight Time

Friday, August 21, 2009

DevHawk World Tour FY2010

As I’ve done the past two years, here’s a list of all the places I’m going in the next fiscal year. Traditionally, I’ve done this post by calendar year, but all MSFT planning is done by FY and so invariably I miss events early in the calendar year but late in the fiscal (like PyCon last year). I'll be updating this post periodically as I get tapped for more presentations. There are several other conferences I'm considering, submitting sessions for, in discussions with, but these are the ones that are confirmed.

Danish University Tour, Sept 7-11

My FY10 travels first take me to Copenhagen, where I was invited by the local subsidiary to present at four different universities in a single week. Don’t know how much sightseeing I’ll get done, but I’ll sure be talking a lot. My host Martin Esmann writes Stud.blog for Danish ComputerWorld and has a post (in Danish) about my visit. Personally, I am just excited about being featured in something called “Stud.blog”! :) Actually, Stud here means “Student” not “slender, upright members of wood” or any other definition of the term “stud”.

I’ll be visiting Aalborg University, Aarhus University, University of Southern Denmark and University of Copenhagen as well as delivering a TechTalk at the Microsoft Development Center Copenhagen, which is Microsoft’s biggest development center in Europe. I’ll primarily be delivering my Iron Languages introductory talk “Pumping Iron”, but there’s also some interest in language development on the DLR so I’ll be talking on that topic as well.

patterns & practices Summit Redmond 2009, Oct 12-16

This will be my third p&p Summit in a row and fourth in five years. This year, I’m doing a talk called “Not Everything is a new Nail() : How Languages Influence Design”. I was supposed to deliver this talk last year, but got sidetracked with my day job and ended up talking about IronPython instead. Keith has made it VERY clear he doesn’t want another last minute substitution again this year.

Turing award winner Alan Perlis is credited with saying 'A language that doesn't affect the way you think about programming is not worth knowing.' Yet, most programmers rarely venture outside of the comfort zone of statically-typed object-oriented languages. Our heavy use of object-oriented languages influences our thinking to the point that we can't see alternative approaches at all. This isn't to say the object-oriented languages are bad, but as is typical in most things, there is no one 'best' way for all situations. In this talk, VS Languages PM Harry Pierson will look at a given software development scenario from both the object-oriented and functional perspectives, in order to see how much of an influence language really has on our engineering efforts.

Tech·Ed Europe 2009, Nov 9-13

I knew I was going to be updating this post over time, but I didn’t expect to have to update it so soon! Literally the day after I posted this, I got the speaker invite for Tech·Ed Europe 2009. My session hasn’t been posted yet, but this is the abstract we submitted:

Dynamic Languages on the Microsoft .NET Framework
The Dynamic Language Runtime (DLR) adds a shared dynamic type system, a standard hosting model, and support for generating fast dynamic code to the CLR. IronPython and IronRuby are Microsoft’s dynamic language implementations on .NET. In this talk, we’ll show you how to interactively create great .NET applications using dynamic languages. You’ll walk away knowing why dynamic languages deserve a spot in your toolbox!

It’s kind of generic, but given that most of the audience probably hasn’t seen IronPython or IronRuby, having broad latitude in my presentation topic is a good thing. I’ll probably deliver a variant of my standard “Pumping Iron” talk like I’m doing in Denmark. I delivered it recently at an internal event with Jimmy, so there’s lots more IronRuby content than there used to be.

The only bummer about doing Tech·Ed Europe is that I’m only doing one measly talk. I’m asking around – I’d love to do a .NET user group or university talk while I’m in town. Any takers?

Microsoft Professional Developers Conference 2009, Nov 17-19

Update: Tech·Ed Europe and PDC are on back-to-back weeks this year so we’ll be sending a teammate-to-be-determined to PDC in my stead. My family is very pleased I won’t be gone for two weeks straight.

Last year, I was on the content team for PDC. This year, that PITA responsibility belongs to someone else so I might actually get real work done in the four weeks leading up to PDC. My team will tell you, last year PDC sucked up 100% of my time for a month as we were driving towards our 2.0 release.

Technically, I haven’t had a talk for PDC accepted yet. But I submitted three and two are looking good (though I assume only one will make it to the actual show) so I thought I’d just go ahead and include it on this post. If/when my talks get accepted, I’ll post links and abstracts. Also, if one of my PDC talks is accepted, I’ll probably submit a talk for SoCal Code Camp as well.

PyCon 2010, Feb 19-21

This will also be my third PyCon in a row, though PyCon last year was a bit of a whirlwind since I had literally just joined the IronPython team. I finally feel like I might have something interesting to present at PyCon this year. Last year Dino and Jim handled the presentation duties from our team (with Michael Foord and Jonathan Hartley delivering a tutorial and Sarah Sutkiewicz speaking on FePy). We already have one announcement that I think is pretty significant lined up and might have a second depending on how hard I can push LCA and management between now and then. Talk proposals are due October 1st, so any suggestions would be appreciated!

Posted By Harry Pierson at 2:32 PM Pacific Daylight Time

Thursday, August 20, 2009

HawkCodeBox

Last month, I lamented the lack of extensibility of the WPF text box. While there are several vendors and at least one open source custom syntax highlighting text box, it still really bothers me how inextensible the basic WPF text box is. I just want to do a simple colorizing REPL – why is that so hard?

So instead of using any of those syntax highlighting text boxes, I decided to build my own using the approach Ken Johnson wrote about on Code Project. As I wrote before, it’s a hack – you set the text box’s foreground and background brushes to transparent so that you can override OnRender – but it works.

The big change I made from Ken’s code was to use DLR TokenCategorizer instead of regular expressions to tokenize the code. TokenCategorizer is a service provided by the DLR hosting API, which will tokenize a given script source for you. Here’s the code that colorizes the text in the text box.

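// Wrap the text box contents in a script source and ask the engine for its tokenizer
var source = Engine.CreateScriptSourceFromString(this.Text);
var tokenizer = Engine.GetService<TokenCategorizer>();
tokenizer.Initialize(null, source, SourceLocation.MinValue);

// Walk the token stream until the tokenizer reports the end of the source
var t = tokenizer.ReadToken();
while (t.Category != TokenCategory.EndOfStream)
{
    // Only colorize token categories that have a brush in the syntax map
    if (SyntaxMap.ContainsKey(t.Category))
    {
        ft.SetForegroundBrush(SyntaxMap[t.Category],
             t.SourceSpan.Start.Index, t.SourceSpan.Length);
    }

    t = tokenizer.ReadToken();
}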

As you can see, I ask the engine for a TokenCategorizer, initialize it with the text box’s current contents, then iterate thru the tokens, looking for ones in my SyntaxMap. If the token category is in the syntax map, we change the foreground brush for that span of formatted text (ft is a WPF FormattedText instance I created earlier in the method).

Of course, this approach isn’t very efficient – it re-colorizes the entire file on every change. It turns out that some DLR TokenCategorizers are restartable so you can cache the tokenizer state at any point and then return later with a new TokenCategorizer instance and pick up tokenizing where you left off. With this approach, you could, say, tokenize a line at a time, allowing you to retokenize only the line where the change occurred rather than the entire file. But only IronPython supports tokenizer restarting today, so I decided to take the easy way and simply re-colorize on every change.
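For the curious, the line-at-a-time idea looks roughly like this. It's a sketch in Python with a toy tokenizer of my own invention (the only state it tracks is "am I inside a triple-quoted string"), not the DLR TokenCategorizer API, and it only handles edits that replace an existing line. The names tokenize_line and LineCache are hypothetical.

```python
def tokenize_line(line, in_string):
    """Toy restartable tokenizer.

    Returns (tokens, state at end of line). Tokens are
    (category, start column, length) tuples.
    """
    tokens, pos = [], 0
    while pos < len(line):
        if in_string:
            close = line.find('"""', pos)
            end = len(line) if close < 0 else close + 3
            tokens.append(("string", pos, end - pos))
            in_string = close < 0
        else:
            opener = line.find('"""', pos)
            end = len(line) if opener < 0 else opener
            if end > pos:
                tokens.append(("code", pos, end - pos))
            if opener >= 0:
                tokens.append(("string", opener, 3))  # the opening quotes
                end = opener + 3
                in_string = True
        pos = end
    return tokens, in_string


class LineCache:
    """Caches tokens per line plus the tokenizer state at the start of each line."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.tokens = [None] * len(self.lines)
        self.start_state = [False] * (len(self.lines) + 1)
        self.retokenize_from(0)

    def retokenize_from(self, index):
        """Retokenize from index until the state rejoins the cache.

        Returns the number of lines that had to be tokenized.
        """
        state = self.start_state[index]
        count = 0
        for i in range(index, len(self.lines)):
            self.tokens[i], state = tokenize_line(self.lines[i], state)
            count += 1
            rejoined = state == self.start_state[i + 1]
            self.start_state[i + 1] = state
            # Stop early if the following line's cached tokens are still valid.
            if rejoined and i + 1 < len(self.lines) and self.tokens[i + 1] is not None:
                break
        return count

    def replace_line(self, index, text):
        self.lines[index] = text
        return self.retokenize_from(index)


cache = LineCache(["a = 1", "b = 2", "c = 3"])
small_edit = cache.replace_line(1, "b = 22")    # state unchanged, one line redone
big_edit = cache.replace_line(0, 'a = """')     # opens a string, rest must be redone
```

A change that doesn't alter the end-of-line state costs one line of work; a change that opens or closes a multi-line string ripples down until the states line up again.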

I named the project HawkCodeBox and I’ve published the source up on GitHub. It’s fairly simple, but of course the goal wasn’t to build the be-all-end-all text editor – other people in the VS team already have that job.

Posted By Harry Pierson at 11:49 AM Pacific Daylight Time

Sunday, August 16, 2009

CodePlex Editor Role

Ask Sara, I have been bugging her for a LONG time for this CodePlex feature. Actually, my team has been bugging her team for longer than either of us have been in these jobs.

Last week’s CodePlex release includes a feature known as “Editor Role”. If you look at the Project Role Matrix, you’ll notice two primary differences from what the standard logged-in user can do: they can create/edit wiki pages and they can’t rate releases. Developers and Coordinators can’t rate releases either – I guess the idea is that they don’t want members of the team rating their own releases (5 Stars! Again! Wow, we’re awesome!).

Until now, the only way to give members of the community the ability to edit the wiki also gave permission to edit work items, check in source code and make releases. We’re still working on getting Microsoft at large to understand the benefits of the community collaboration aspect of open source, but in the meantime we just can’t give those permissions to people off the team. However, we would love to have contributions to our documentation wiki. [1] With the new Editor Role, we’ll be able to grant wiki editor access without any of the other permissions.

Of course, the whole idea of “wiki permissions” kinda flies in the face of the basic wiki design principles. So we’re going to be pretty liberal about handing out editor permissions. If you’re interested in editing the wiki, drop me a line and I’ll get you hooked up. 

Big mega-thanks to the CodePlex team for making this feature happen. I guess I’ll have to find something new to bug Sara about!

[1] You can tell we’re a real open source project because we’re begging for documentation help!

Posted By Harry Pierson at 8:17 AM Pacific Daylight Time

Thursday, August 13, 2009

2009 Space Elevator Conference

Today marks the start of the 2009 Space Elevator Conference on the Microsoft campus. Last night, my father and I attended a free overview presentation on space elevators. My father is a huge sci-fi fan and has read many of Arthur C. Clarke’s books including The Fountains of Paradise so he was very excited for this opportunity. Unfortunately, while the idea of a space elevator is pretty exciting, the presentation itself left quite a bit to be desired.

For the un-initiated, a space elevator is just what it sounds like – an elevator into space. Chemical rockets are horribly inefficient, so instead the idea is to run a cable way out into space. According to Wikipedia, a space elevator would be a couple of orders of magnitude cheaper for getting things into space than chemical rocketry.

Of course, actually building a space elevator would have a massive up front cost and an engineering effort that would dwarf even the effort that landed mankind on the moon. One of the biggest problems is the substance the cable itself is built out of. This cable would be thousands of kilometers long, and would have to be extremely strong. Frankly, there’s no feasible material to make the cable from available to us today. Apparently, a cable strong enough made out of the strongest high tensile steel available today would weigh more than the entire universe! Not exactly feasible. But advancements in carbon nanotubes have scientists believing they might be able to make materials 100x stronger than high tensile steel. If that pans out, it would be feasible to build the space elevator cable from carbon nanotubes.

Another big issue is power for the climbers. Current thinking apparently is to beam power to the climbers via megawatt lasers – an idea that like carbon nanotubes would have far reaching impact on our society over and above space elevators. The idea of “beaming power” sounds nearly as fantastic as the space elevator itself, but apparently there’s an X-Prize style competition underway with a cool $2 million in prize money if you can build a beam powered climber that travels 5 meters/second.

While the idea of a space elevator is very fascinating and I was excited to spend an evening with my dad geeking out in a non-software related field, the presentation itself was kinda crappy. I have no doubt that Dr. Bryan Laubscher, who delivered the presentation, is one of the top minds in space elevator theory and technology in the world today. However, his presentation was bullet-point laden, rambling, incoherent at times and frankly boring.

For example, I get the feeling that Dr. Laubscher spends a lot of time defending the idea of a space elevator to skeptical NASA scientists. He spent WAY too much time talking about how inefficient chemical rockets are – I mean, mention it once but don’t keep coming back to that point over and over. He also went off on a strange tangent about the potential for societal decline when we turn our back on exploration. But he wasn’t presenting to skeptical NASA scientists last night – he was presenting to a group of enthusiastic amateurs. If you can’t tailor your presentation to your audience, there’s no way you’re going to be effective.

While the presentation could have been better, it still had some fascinating information. For example, there would probably have to be multiple space elevators – Dr. Laubscher estimated there would be five. It’s much more efficient to have the space elevator be one way so you need at least two – one to go up and one to go down. I never considered the idea of multiple space elevators before.

Apparently, last year’s Space Elevator Conference was on the Microsoft Campus and I wouldn’t be surprised if next year’s was as well. I hope it will be. I’d like to attend more of the conference. Saturday is Space Elevator 101 day at the conference but I’m driving my parents to the airport. In the meantime, there are some space elevator blogs to follow. Also, I met the president of the LiftPort Group which is headquartered in Seattle, so maybe I’ll get a chance to talk to him one-on-one sometime after the conference is over.

And I should probably read The Fountains of Paradise while I’m at it.

Posted By Harry Pierson at 4:39 PM Pacific Daylight Time

Wednesday, August 12, 2009

Invoking Python Functions from C# (Without Dynamic)

So I’ve compiled the Pygments package into a CLR assembly and loaded an embedded Python script, so now all that remains is calling into the functions in that embedded Python script. Turns out, this is the easiest step so far.

We’ll start with get_all_lexers and get_all_styles, since they’re nearly identical. Both functions are called once on initialization, take zero arguments and return a PythonGenerator (for you C# devs, a PythonGenerator is kind of like the IEnumerable that gets created when you yield return from a function). In fact, the only difference between them is that get_all_styles returns a generator of simple strings, while get_all_lexers returns a generator of PythonTuples, each holding the long name, a tuple of aliases, a tuple of filename patterns and a tuple of mime types. Here’s the implementation of the Languages property:

PygmentLanguage[] _languages;

public PygmentLanguage[] Languages
{
    get
    {
        if (_languages == null)
        {
            // Wait for the background script initialization to finish
            _init_thread.Join();

            var f = _scope.GetVariable<PythonFunction>("get_all_lexers");
            var r = (PythonGenerator)_engine.Operations.Invoke(f);
            var languages_list = new List<PygmentLanguage>();
            foreach (PythonTuple o in r)
            {
                // o[0] is the long name, o[1] is the tuple of aliases
                languages_list.Add(new PygmentLanguage()
                    {
                        LongName = (string)o[0],
                        LookupName = (string)((PythonTuple)o[1])[0]
                    });
            }

            _languages = languages_list.ToArray();
        }

        return _languages;
    }
}

If you recall from my last post, I initialized the _scope on a background thread, so I first have to wait for the thread to complete. If I was using C# 4.0, I’d simply be able to run _scope.get_all_lexers, but since I’m not I have to manually reach into the _scope and retrieve the get_all_lexers function via the GetVariable method. I can’t invoke the PythonFunction directly from C#, instead I have to use the Invoke method that hangs off _engine.Operations. I cast the return value from Invoke to a PythonGenerator and iterate over it to populate the array of languages.
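If the tuple indexing in that loop looks opaque, it may help to see the same projection written on the Python side. This is only a sketch: the get_all_lexers below is a stand-in with made-up sample data so the snippet runs without Pygments installed, though it yields tuples in the same (long name, aliases, filename patterns, mime types) shape described above.

```python
def get_all_lexers():
    """Stand-in for pygments.lexers.get_all_lexers, yielding illustrative sample data."""
    yield ("Python", ("python", "py"), ("*.py",), ("text/x-python",))
    yield ("C#", ("csharp", "c#"), ("*.cs",), ("text/x-csharp",))


# Same projection as the C# loop: keep the long name and the first alias.
languages = [
    {"long_name": name, "lookup_name": aliases[0]}
    for name, aliases, filenames, mimetypes in get_all_lexers()
]
```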

If you’re working with dynamic languages from C#, the ObjectOperations instance that hangs off the ScriptEngine instance is amazingly useful. Dynamic objects can participate in a powerful but somewhat complex protocol for binding a wide variety of dynamic operation types. The DynamicMetaObject class supports twelve different Bind operations. But the DynamicMetaObject binder methods are designed to be used by language implementors. The ObjectOperations class lets you invoke them fairly easily from a higher level of abstraction.

The last Python function I call from C# is generate_html. Unlike get_all_lexers, generate_html takes three parameters and can be called multiple times. The Invoke method has a params argument so it can accept any number of additional parameters, but when I tried to call it I got a NotImplemented exception. It turns out that Invoke currently throws NotImplemented if it receives more than 2 parameters. Yes, we realize that’s kinda broken and we are looking to fix it. However, it turns out there’s another way that’s also more efficient for a function like generate_html that we are likely to call more than once. Here’s my implementation of GenerateHtml in C#.

Func<object, object, object, string> _generatehtml_function;

public string GenerateHtml(string code, string lexer, string style)
{
    if (_generatehtml_function == null)
    {
        _init_thread.Join();
            
        var f = _scope.GetVariable<PythonFunction>("generate_html");
        _generatehtml_function = _engine.Operations.ConvertTo
                           <Func<object, object, object, string>>(f);
    }

    return _generatehtml_function(code, lexer, style);
}

Instead of calling Invoke, I convert the PythonFunction instance into a delegate using Operations.ConvertTo which I then cache and call like any other delegate from C#. Not only does Invoke fail for more than two parameters, it creates a new dynamic call site every time it’s called. Since get_all_lexers and get_all_styles are each only called once, it’s no big deal. But you typically call generate_html multiple times for a block of source code. Using ConvertTo generates a dynamic call site as part of the delegate, so that’s more efficient than creating one on every call.

The rest of the C# code is fairly pedestrian and has nothing to do with IronPython, as all access to Python code is hidden behind GenerateHtml as well as the Languages and Styles properties.

So as I’ve shown in the last few posts, embedding IronPython inside a C# application – even before we get the new dynamic functionality of C# 4.0 – isn’t really all that hard. Of course, we’re always interested in ways to make it easier. If you’ve got any questions or suggestions, please feel free to leave a comment or drop me a line.

Posted By Harry Pierson at 10:10 AM Pacific Daylight Time

Tuesday, August 11, 2009

Embedding Python Scripts in C# Applications

image

Now that I’ve got Pygments and its dependencies packaged up in an easy-to-distribute assembly, I need to be able to call it from C#. However, if you pop open pygments.dll in Reflector, you’ll notice it’s not exactly intuitive to access. Lots of compiler generated names like pygments$12 and StringIO$64 in a type named DLRCachedCode. Clearly, this code isn’t intended to be used by anything except the IronPython runtime.

So we better create one of those IronPython runtime thingies.

As you can see in the layer diagram to the left, PygmentsCodeSource is split into two parts – a C# part and a Python part. The Python part is very simple – just importing a couple of Pygments functions into the global namespace and a simple helper function to generate syntax highlighted HTML from a given block of code in a given language and style. The code itself is pretty simple. Note the reference to the pygments assembly I described last post. Here’s the entire file:

import clr
# Load the compiled pygments assembly so its modules can be imported.
clr.AddReference("pygments")

# Imported into the global namespace so they are visible in the script scope.
from pygments.lexers import get_all_lexers
from pygments.styles import get_all_styles

def generate_html(code, lexer_name, style_name):
  from pygments import highlight
  from pygments.lexers import get_lexer_by_name
  from pygments.styles import get_style_by_name
  from devhawk_formatter import DevHawkHtmlFormatter

  # Fall back to plain text and the default style when none is given.
  if not lexer_name: lexer_name = "text"
  if not style_name: style_name = "default"
  lexer = get_lexer_by_name(lexer_name)
  return highlight(code, lexer, DevHawkHtmlFormatter(style=style_name))

Instead of including this in the Pygments assembly, I embedded this file as a resource in my C# assembly. This way, I could use the standard DLR hosting APIs to create a script source and execute this code. I did have to build a concrete StreamContentProvider subclass (BasicStreamContentProvider below) to wrap the resource stream in, but otherwise it’s pretty straightforward.

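// Needs: using Microsoft.Scripting.Hosting;
// The engine and script source are shared by every PygmentsCodeSource instance.
static ScriptEngine _engine;
static ScriptSource _source;

private void InitializeHosting()
{
    _engine = IronPython.Hosting.Python.CreateEngine();

    // The Python script is embedded in this assembly as a manifest resource.
    var asm = System.Reflection.Assembly.GetExecutingAssembly();
    var stream = asm.GetManifestResourceStream(
                   "DevHawk.PygmentsCodeSource.py");
    // Wrap the stream so the DLR can read the script text from it.
    _source = _engine.CreateScriptSource(
                new BasicStreamContentProvider(stream),
                "PygmentsCodeSource.py");
}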

Once I got the engine and script source set up, all that remains is to set up a script scope to execute the script source in. For this specific application, it’s probably overkill to have a scope per instance – I think the syntax highlighting process is stateless, so a single scope could easily be shared across multiple PygmentsCodeSource instances. But I didn’t take any chances: I created a script scope per instance to execute the source in.

ScriptScope _scope;
Thread _init_thread;

public PygmentsCodeSource()
{
    if (_engine == null)
        InitializeHosting();

    _scope = _engine.CreateScope();

    // Run the script on a background thread; callers must Join before using _scope.
    _init_thread = new Thread(() => { _source.Execute(_scope); });
    _init_thread.Start();
}

You’ll notice that I’m executing the source in the scope on a background thread. That’s because it takes a while to execute, especially the first time. However, I don’t actually use the Python code until after the user types or copies a block of code into the UI and presses OK. In my experience, executing the Python code is typically finished by the time I get code into the box and press OK. I just need to make sure I add an _init_thread.Join guard anywhere I’m going to access the _scope to be sure the initialization is complete before I try to use it.
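The join guard is easier to see in a stripped-down form. Here’s a minimal sketch of the same pattern in plain Python rather than C#. LazyHighlighter, fake_init and the dictionary standing in for the script scope are all made up for illustration and aren’t part of the plugin:

```python
import threading


class LazyHighlighter(object):
    """Hypothetical stand-in for PygmentsCodeSource (illustration only)."""

    def __init__(self, slow_init):
        self._scope = {}
        # Kick off the expensive setup right away, off the caller's thread.
        self._init_thread = threading.Thread(target=slow_init,
                                             args=(self._scope,))
        self._init_thread.start()

    def _ready_scope(self):
        # Join guard: block until initialization is done. Joining a thread
        # that has already finished returns immediately, so repeated calls
        # are cheap.
        self._init_thread.join()
        return self._scope

    def languages(self):
        # Every access to the scope goes through the guard.
        return self._ready_scope()["languages"]


def fake_init(scope):
    # Stand-in for executing the script source in the scope.
    scope["languages"] = ["csharp", "python"]
```

Routing every access through a single guard method means there’s only one place to get the Join right.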

In the next, and last, post in this small series we’ll see how to invoke Python functions in the _scope I initialized above from C#.

Posted By Harry Pierson at 9:24 AM Pacific Daylight Time

Monday, August 10, 2009

Compiling Python Packages into Assemblies

In looking at my hybrid IronPython / C# Windows Live Writer plugin, we’re going to start at the bottom with the Pygments package. Typically a Python package is a physical on-disk folder that contains a collection of Python files (aka modules). And during early development of Pygments for WLWriter, that’s exactly how I used it. However, when it came time for deployment, I figured it would be much easier if I packaged up the Pygments package, my custom HTML formatter and the standard library modules that Pygments depends on into a single assembly.

IronPython ships with a script named pyc for compiling Python files into .NET assemblies. However, pyc is pretty much just a wrapper around the clr module’s CompileModules function. I wrote my own custom script to build the Pygments assembly from the files in the pygments and pygments_dependencies folders.

import clr
from System import IO
from System.IO.Path import Combine

def walk(folder):
  # Yield every file under folder, recursing into subfolders.
  for file in IO.Directory.GetFiles(folder):
    yield file
  for subfolder in IO.Directory.GetDirectories(folder):
    for file in walk(subfolder): yield file

folder = IO.Path.GetDirectoryName(__file__)

pygments_files = list(walk(Combine(folder, 'pygments')))
pygments_dependencies = list(walk(Combine(folder, 'pygments_dependencies')))

all_files = pygments_files + pygments_dependencies
all_files.append(Combine(folder, 'devhawk_formatter.py'))

# Raw string so the backslashes in the Windows path are taken literally.
clr.CompileModules(Combine(folder, r"..\external\pygments.dll"), *all_files)

Most of this code is a custom implementation of walk. I have all the IronPython and DLR dlls including ipy.exe checked into my source tree, but I don’t have the standard library checked in, so os.walk isn’t available to the build script. Other than that, the code is pretty straightforward – collect a bunch of files in a list and call CompileModules.
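For comparison, when the standard library is on the path, the same flattening of a folder tree can be written with os.walk. This is a minimal sketch, not part of the build script, and walk_files is a hypothetical name:

```python
import os


def walk_files(folder):
    """Yield the path of every file under folder, including subfolders."""
    for dirpath, dirnames, filenames in os.walk(folder):
        for filename in filenames:
            yield os.path.join(dirpath, filename)
```

The resulting list could be passed to CompileModules the same way all_files is above.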

The problem with this approach is that IronPython isn’t doing any kind of dependency checking when we compile the assembly. If you pass just the contents of the Pygments package into CompileModules, it will emit an assembly but that assembly will still depend on some modules in the standard library. If those aren’t available, the Pygments assembly won’t load. I’d love to have an automatic tool to determine module dependencies, but since I didn’t have such a tool I used a brute-force, by-hand solution. I wrote a small script, test_compiled_pygments, to exercise the Pygments assembly. If there were any missing dependencies, test_compiled_pygments would throw an exception indicating the missing module. For each missing dependency, I copied over the missing module, recompiled the project and tried again. Lather, rinse, repeat. Not fun, but Pygments only depended on seven standard library modules so it didn’t end up taking that long.
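The lather-rinse-repeat loop boils down to something like the following sketch. It isn’t the actual test_compiled_pygments script; first_missing_module and the exercise callable are hypothetical names, and it assumes a missing dependency surfaces as an ImportError:

```python
def first_missing_module(exercise):
    """Run exercise(); describe the first failed import, or return None."""
    try:
        exercise()
    except ImportError as exc:
        # Newer Pythons expose the module name as exc.name. Older ones
        # only carry it in the message ("No module named ...").
        return getattr(exc, "name", None) or str(exc)
    return None
```

Each run reports at most one missing module, which is why the process takes one compile per dependency.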

So having gone down this path of compiling Python files into an assembly, would I do it again? For an application with an installer like this one, yes, no question. I added the Pygments assembly as a reference to my C# library and it got added to the installer automatically. That was much easier than managing all of the Pygments files and its dependencies in the installer project manually. Plus, I still would have had to manually figure out the dependencies unless I chose to include the entire standard library.

I will point out that the compiled Pygments assembly is the largest single file in my deployed solution. It clocks in at 2.25MB. That’s about twice the size of the Python files that I compiled it from. So clearly, I’m paying for the convenience of deploying a single file in space and maybe load time. [1] I’m also paying in space for a private copy of IronPython and the DLR – the two IronPython and five DLR assemblies clock in around 3.16MB. In comparison, the actual Writer plugin assembly itself is only about 25KB! But for an installed desktop app like a WLWriter plugin, 5MB of assorted infrastructure isn’t worth worrying about compared to the hassle of ensuring a shared copy of IronPython is installed. I mean, even if you don’t know IronPython exists, you can still install and use Pygments for WLWriter. Simplifying the install process is easily worth 5MB in storage space on the user’s computer in my opinion.

Next up, we’ll look at the Python half of the PygmentsCodeSource component, which calls into this compiled Pygments library.


[1] I haven’t done it, but it would be interesting to compare the load time for the single larger pygments assembly vs. loading and parsing the Python files individually. If I had to guess, I’m thinking the single assembly would load faster even though it’s bigger since there’s less overhead (only loading one big file vs. lots of small ones) and you skip the parsing step. But that’s pure guesswork on my part.

Posted By Harry Pierson at 11:16 AM Pacific Daylight Time

Building a Hybrid C# / IronPython App Without Dynamic Type

Arguably, the biggest feature of C# 4.0 is the new dynamic type. And it’ll be great…when it ships. In the meantime, some of us want to build hybrid C# and IronPython applications today, such as my Pygments for Windows Live Writer plugin.

Pygments is a syntax highlighter, written in Python, with support for over one hundred languages. With the exception of a couple of bugs in our importer (discussed here) it works great with IronPython. It’s also extensible, so I was able to easily build a custom formatter to output exactly the HTML I want inserted in my blog posts. So it made perfect sense to use Pygments as the basis of a Windows Live Writer plugin.

As great a tool as Windows Live Writer is, its developers haven’t exactly seen the light when it comes to dynamic languages. If you want to create a custom Content Source for Windows Live Writer, you have to generate a compiled on-disk assembly with a static type and custom attributes. Not exactly IronPython’s forte, if you know what I mean. I did try to build a pure IronPython solution, but eventually gave up. So I ended up building a hybrid solution. The front end of the plugin as well as the UI elements are written in C# while the syntax highlighter engine is written in IronPython. And since this is running on the current .NET framework, I didn’t have the newfangled C# 4.0 dynamic type to help me.

Over the next couple of blog posts, I want to highlight a few aspects of how I built this plugin, including compiling Python packages into assemblies and invoking Python code from C# 3.0 and earlier. If you want to look for yourself, the source is up on GitHub.

Posted By Harry Pierson at 8:04 AM Pacific Daylight Time

Friday, August 07, 2009

Pygments for Windows Live Writer v1.0.2

I just uploaded a new version of my Pygments for WL Writer plugin to my SkyDrive. Nothing major here – some minor UI cleanup + an upgrade to IronPython 2.6 beta 2. Installing over the old version worked on my machine, but that’s as far as my testing has gone. I also pushed the latest source out to GitHub.

I’m still waiting on a fix for what Dino has taken to calling “Harry’s Pygments Import Bug” – which actually turned out to be three importer bugs. The Pygments lexers package is customized so as to abstract away the specific modules the individual lexers are defined in. I don’t use that functionality – I’m using get_all_lexers and get_lexer_by_name instead – but the bugs caused importing the package to fail, so in the meantime I commented out the lines that don’t work under IronPython. I think Dino’s got the fixes for this checked in, but I probably won’t update Pygments for WL Writer again until IronPython 2.6 RC.

Posted By Harry Pierson at 3:46 PM Pacific Daylight Time

Thursday, August 06, 2009

I Hate Global.asax

One of the things I’ve always loved about ASP.NET is how easily extensible it is. Back in 2000, I had a customer that wanted to “skin” their website using XML and XSLT – an approach Martin Fowler later called Transform View. We were working with classic ASP at the time, so the solution we ended up with was kind of ugly. But I was able to implement this approach in ASP.NET in a few hundred lines of code, which I wrote up in an MSDN article published back in 2003. In the conclusion of that article, I wrote the following:

Using ASP.NET is kind of like having your mind read. If you ever look at a site and think "I need something different," you'll most likely find that the ASP.NET architects have considered that need and provided a mechanism for you to hook in your custom functionality. In this case, I've bypassed the built-in Web Forms and Web Services support to build an entire engine that services Web requests in a unique way.

Nearly ten years later, I finally ran into a situation where ASP.NET failed to read my mind and didn’t provide a mechanism to hook in custom functionality: Global.asax.

I always thought of global.asax as an obsolete construct primarily intended to ease migration from classic ASP. After all, ASP.NET has first class support for customizing request handling at various points throughout the execution pipeline via IHttpModule. Handling those events in global.asax always felt vaguely hacky to me.

However, what I didn’t realize is that there are some events that can only be handled via global.asax (or its code behind). In particular, Application_Start/End and Session_Start/End can only be handled in global.asax. Worse, these aren’t true events. For reasons I’m sure made sense at the time but that I don’t understand, the HttpApplicationFactory discovers these methods via reflection rather than by an interface or other more typical mechanism. You can check it out for yourself with Reflector or the Reference Source – look for the method with the wonderful name ReflectOnMethodInfoIfItLooksLikeEventHandler. No, I’m not making that up.
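To make the name-based discovery concrete, here’s a rough Python analogue. This is not ASP.NET’s implementation. discover_hooks and HOOK_NAMES are hypothetical, and the sketch only shows what finding handlers by method name, instead of by interface or event subscription, looks like:

```python
# The four method names mentioned above; the list itself is illustrative.
HOOK_NAMES = ("Application_Start", "Application_End",
              "Session_Start", "Session_End")


def discover_hooks(app):
    """Return {name: bound method} for each hook-named callable on app."""
    hooks = {}
    for name in HOOK_NAMES:
        candidate = getattr(app, name, None)
        if callable(candidate):
            hooks[name] = candidate
    return hooks
```

Nothing in the type system tells the caller which hooks exist; a misspelled method name is silently ignored.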

The reason I suddenly care about global.asax is because Application_Start is where ASP.NET MVC apps configure their route table. But if you want to access the Application_Start method in a dynamic language like IronPython, you’re pretty much out of luck. The only way to receive the Application_Start pseudo-event is via a custom HttpApplication class. But you can’t implement your custom HttpApplication in a dynamically typed language like IronPython since ASP.NET finds the Application_Start method via reflection. Ugh.

If someone can explain to me why ASP.NET uses reflection to fire the Application_Start event, I’d love to hear it. Even better - I’d love to see this fixed in some future version of ASP.NET. How come the only way to configure a custom HttpApplication class is to specify it via global.asax? Wouldn’t it make sense to specify it in web.config instead?

In order to support Application_Start for dynamic languages you basically have two choices:

  1. Build a custom HttpApplication class in C# and reference it in global.asax. This is kind of the approach used by Jimmy’s ironrubymvc project. He’s got a RubyMvcApplication which he inherits his GlobalApplication from. Given that GlobalApplication is empty, I think he could remove his global.asax.cs file and just reference RubyMvcApplication from global.asax directly.
  2. Build custom Application_Start/End-like events out of IHttpModule Init and Dispose. You can have multiple IHttpModule instances in a given web app, so you’d need to make sure you fired Start and End only once. This is the approach taken by the ASP.NET Dynamic Language Support. [1]
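
As a rough illustration of option 2, here’s one way to get a single Start and a single End notification out of per-instance Init and Dispose calls. It’s a Python sketch under assumed mechanics (a shared counter guarded by a lock); AppLifetimeModule is a hypothetical name, and this isn’t necessarily how the ASP.NET Dynamic Language Support does it:

```python
import threading


class AppLifetimeModule(object):
    """Hypothetical analogue of an IHttpModule with Init and Dispose."""

    _lock = threading.Lock()
    _live_instances = 0
    events = []  # records what fired, for illustration only

    def init(self):
        with AppLifetimeModule._lock:
            AppLifetimeModule._live_instances += 1
            # Only the first live instance raises the start notification.
            if AppLifetimeModule._live_instances == 1:
                AppLifetimeModule.events.append("start")

    def dispose(self):
        with AppLifetimeModule._lock:
            AppLifetimeModule._live_instances -= 1
            # Only the last instance to go away raises the end notification.
            if AppLifetimeModule._live_instances == 0:
                AppLifetimeModule.events.append("end")
```

Note that with a plain counter, start would fire again if every instance were disposed and a new one created later, so this only approximates the real Application_Start semantics.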

So here’s the question, Iron Language fans: which of these approaches is better? I lean towards Option #1, since it traps exactly the correct event, though it does require a global.asax file to be hanging around (kind of like how the ASP.NET MVC template has a blank default.aspx file “to ensure that ASP.NET MVC is activated by IIS when a user makes a "/" request”). But I’m curious what the Iron Language community at large thinks. Feel free to leave me a comment or drop me an email with your thoughts.


[1] FYI, I’m working on getting the code for ASP.NET Dynamic Language Support released. In the meantime, you can verify what I’m saying via Reflector.

Posted By Harry Pierson at 11:58 AM Pacific Daylight Time
Disclaimer: The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the author’s discretion.