Modular Compilers

During Lang.NET, I ended up sitting next to Hua Ming, who’s been working on the .NET Classbox project I wrote about previously. .NET Classbox introduces a new syntax for “using” to C# – basically, you can use individual classes as well as whole namespaces, and you can extend the individual classes you use. Obviously, that meant having a custom compiler that was 99% vanilla C# plus the extra classbox syntax. Rather than building a C# compiler from scratch, the Classbox project extended the Mono Project C# compiler. Hua described the process as taking a “huge amount of time” and the compiler as “a monster”. Now, I’m not trying to knock Mono here; I imagine our C# compiler is just as hard to work with. SSCLI’s C# compiler directory alone is 5.5MB of source code spread across 126 .h and 68 .cpp files.

Is it just me, or does it seem crazy to have to muck about with such a large code base in order to add a relatively simple language feature? What I’d like to see is a more modular way of building compilers, so that integrating a small language feature like classbox would be a small amount of effort.

Of course, there is some work that’s been done in this space. MS Research had a Research C# compiler paper, but it’s three years old and one of the two authors has moved on to a cool product group job. I also discovered SUIF and the National Compiler Infrastructure Project, but these don’t look like they’ve been updated in a while.

I like the model that the Research C# compiler proposes. Basically, it looks like this:

  1. Specify the grammar in a modular way. In the paper, the grammar is specified in an Excel file, and you can use multiple files in a modular fashion, e.g. one file for the core language and another for the extensions.
  2. Late bind a grammar production to an action. Typically, in a lex/yacc style scenario, you embed the action code for a given production directly into the grammar, which makes it extremely hard to extend the existing syntax. In the paper, each production is linked with an instance of a type, so swapping in a new type would seem to be possible.
  3. Generate an abstract syntax tree that gets processed by multiple visitors. From the paper, the compiler has broken the “traditional” compiler steps – bind, typecheck, rewrite and generate binary (in this case IL) – into separate visitors. That makes adding extra steps or changing existing steps fairly straightforward.
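To make steps 2 and 3 a bit more concrete, here’s a minimal sketch in Python. It is not the Research C# compiler’s design; every name in it (ACTIONS, build, Visitor, TypeCheck, Generate) is made up for illustration, and the “IL” is just a list of stack-machine tuples. The idea is only that productions are bound to node types through a table that lives outside the grammar, and that each compiler phase is a separate visitor over the resulting tree.

```python
from dataclasses import dataclass

# Hypothetical AST node types, for illustration only.
@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: object
    right: object

# Late binding: production name -> node type, kept outside the grammar.
# An extension can rebind an entry without touching the grammar itself.
ACTIONS = {"num": Num, "add": Add}

def build(production, *children):
    """Look up the node type for a production at run time."""
    return ACTIONS[production](*children)

class Visitor:
    """Base class for a compiler phase; dispatches on the node's class."""
    def visit(self, node):
        # Walk the MRO so a subclassed node falls back to its parent's handler.
        for cls in type(node).__mro__:
            method = getattr(self, "visit_" + cls.__name__, None)
            if method is not None:
                return method(node)
        raise NotImplementedError(type(node).__name__)

class TypeCheck(Visitor):
    def visit_Num(self, node):
        return "int"

    def visit_Add(self, node):
        if self.visit(node.left) != "int" or self.visit(node.right) != "int":
            raise TypeError("add expects ints")
        return "int"

class Generate(Visitor):
    """Emit a list of stack-machine instructions (a stand-in for IL)."""
    def visit_Num(self, node):
        return [("push", node.value)]

    def visit_Add(self, node):
        return self.visit(node.left) + self.visit(node.right) + [("add",)]

def compile_tree(tree, passes):
    """Run each phase over the tree in order; return the last phase's result."""
    result = None
    for phase in passes:
        result = phase.visit(tree)
    return result

tree = build("add", build("num", 1), build("add", build("num", 2), build("num", 3)))
code = compile_tree(tree, [TypeCheck(), Generate()])
```

In this sketch, adding a phase means appending another visitor to the list passed to compile_tree, and a language extension can rebind an entry in ACTIONS to its own node type. Neither requires editing the existing phases.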

The only thing I don’t like about this specific approach is their Excel file based parser generator. It’s a huge step beyond the LEX/YACC approach as it is scanner-less (having separate scanner and parser steps kills any chance of modularity), but it still has to deal with ambiguous grammars. Personally, I’ve been looking at Parsing Expression Grammars in part because they aren’t ambiguous. For programming languages, supporting ambiguity in the grammar is a bug, not a feature.
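Here’s a rough sketch of why PEGs can’t be ambiguous: the choice operator is ordered, so the first alternative that matches wins and the rest are never tried. The Python below is a toy set of combinators I made up for illustration (lit, seq, choice and stmt are not from any real PEG library); each parser takes the text and a position and returns the end position, or None on failure.

```python
def lit(s):
    """Match the literal string s at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Match each parser in turn; fail if any one of them fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice: commit to the first alternative that matches."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

# Order matters: the same alternatives give different results.
short_first = choice(lit("a"), lit("ab"))
long_first = choice(lit("ab"), lit("a"))

# The classic dangling else, written as a PEG:
#   stmt <- "if c " stmt " else " stmt / "if c " stmt / "x"
def stmt(text, pos):
    return choice(
        seq(lit("if c "), stmt, lit(" else "), stmt),
        seq(lit("if c "), stmt),
        lit("x"),
    )(text, pos)
```

With these definitions, short_first("ab", 0) stops after one character while long_first("ab", 0) consumes both. For "if c if c x else x", the inner stmt tries its if/else alternative first, so the else always attaches to the nearest if. A context-free grammar with the same rules would allow two parse trees for that input.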

Extending WL Writer

So I downloaded the SDK for WL Writer and took a quick look. Basically, there are two types of extensions you can build:

  • App Launcher – so you can add a “Blog It” button to some other app to remotely launch WL Writer. I assume this is how the WL Toolbar integration works.
  • Content Source – so you can add some type of custom content to a post. Typical examples would be Technorati tags or Currently Listening To info.

Given that they are trying to support “every blogging service out there”, I’m surprised there’s not a way to build a pluggable blogging service. WL Writer only allows you to customize the content of the post via plugins. Customizing the metadata (i.e. categories) is right out. I realize it’s the hip thing to put Technorati tags right in your post content, but Technorati also picks up category information, which dasBlog already has great support for. What I’d really like is something that acts like del.icio.us’ new post form, where you can free type in your categories, it highlights words as you type, and it shows you a list of all your tags so you can click on them.

One other minor note – WL Writer does a good job of inserting hyperlinks. When you select a word, often the whitespace that follows it is also selected. Some HTML editors will insert the hyperlink over the whole selection, including the whitespace, which makes no sense. WL Writer gets it right and excludes any trailing whitespace from the hyperlink. Cool!
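For the curious, the behavior is easy to sketch. The Python below is my guess at the general idea, not WL Writer’s actual code; link_selection and its parameters are hypothetical, and it assumes the text is already HTML-safe and only escapes the URL.

```python
from html import escape

def link_selection(text, start, end, url):
    """Wrap text[start:end] in a link, leaving trailing whitespace outside it."""
    selected = text[start:end]
    core = selected.rstrip()           # the part that should become the link
    trailing = selected[len(core):]    # whitespace that stays outside the link
    if not core:
        return text                    # nothing but whitespace was selected
    anchor = '<a href="{}">{}</a>'.format(escape(url, quote=True), core)
    return text[:start] + anchor + trailing + text[end:]

# Selecting "this " (word plus trailing space) links only the word.
linked = link_selection("read this post now", 5, 10, "http://example.com")
```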