During Lang.NET, I ended up sitting next to Hua Ming, who's been working on the .NET Classbox project I wrote about previously. .NET Classbox introduces a new syntax for "using" to C# - basically, you can use individual classes as well as whole namespaces, and you can extend the individual classes you use. Obviously, that meant having a custom compiler that was 99% vanilla C# plus the extra classbox syntax. Rather than building a C# compiler from scratch, the Classbox project extended the Mono Project C# compiler. Hua described the process as taking a "huge amount of time" and the compiler as "a monster". Now, I'm not trying to knock Mono here; I imagine our C# compiler is just as hard to work with. SSCLI's C# compiler directory is 5.5MB of source code alone, spread across 126 .h and 68 .cpp files.
Is it just me, or does it seem crazy to have to muck about with such a large code base in order to add a relatively simple language feature? What I'd like to see is a more modular way of building compilers, so that integrating a small language feature like classbox would be a small amount of effort.
Of course, there is some work that's been done in this space. MS Research had a Research C# compiler paper, but it's three years old and one of the two authors has moved on to a cool product group job. I also discovered SUIF and the National Compiler Infrastructure Project, but these don't look like they've been updated in a while.
I like the model that the Research C# compiler proposes. Basically, it looks like this:
The only thing I don't like about this specific approach is their Excel-file-based parser generator. It's a huge step beyond the LEX/YACC approach, since it is scanner-less (having separate scanner and parser steps kills any chance of modularity), but it still has to deal with ambiguous grammars. Personally, I've been looking at Parsing Expression Grammars, in part because they aren't ambiguous. For programming languages, supporting ambiguity in the grammar is a bug, not a feature.
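To make the "not ambiguous" point concrete, here's a rough sketch in Python of the PEG idea. It isn't taken from the Research C# compiler or any real parser generator; the helper names (`lit`, `seq`, `choice`, `stmt`) and the toy "if c then" grammar are made up purely for illustration. It shows two things: the parser works directly on characters, with no separate scanner, and the ordered choice operator tries alternatives in order and commits to the first one that matches, so the classic dangling-else input gets exactly one parse.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def lit(s: str) -> Parser:
    """Match the literal string s directly against the input (no scanner step)."""
    def parse(text: str, pos: int):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Match every parser in order; fail if any one of them fails."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def choice(*parsers: Parser) -> Parser:
    """PEG ordered choice: the first alternative that matches wins."""
    def parse(text: str, pos: int):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


# Hypothetical toy grammar, written as a PEG:
#   stmt <- "if c then " stmt " else " stmt
#         / "if c then " stmt
#         / "x"
def stmt(text: str, pos: int):
    return choice(
        seq(lit("if c then "), stmt, lit(" else "), stmt),
        seq(lit("if c then "), stmt),
        lit("x"),
    )(text, pos)


if __name__ == "__main__":
    # The dangling-else input: which "if" does the "else" belong to?
    print(stmt("if c then if c then x else x", 0))
```

With a context-free grammar, that input has two valid parse trees. Here the if/else alternative is listed first, so the inner `stmt` grabs the `else` and the outer `if` ends up with no else branch. The grammar author decides the answer by ordering the alternatives, rather than the tool having to detect and resolve a conflict.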