Embedding Python Scripts in C# Applications

Now that I’ve got Pygments and its dependencies packaged up in an easy-to-distribute assembly, I need to be able to call it from C#. However, if you pop open pygments.dll in Reflector, you’ll notice it’s not exactly intuitive to access: lots of compiler-generated names like pygments$12 and StringIO$64 in a type named DLRCachedCode. Clearly, this code isn’t intended to be used by anything except the IronPython runtime.

So we better create one of those IronPython runtime thingies.

As you can see in the layer diagram to the left, PygmentsCodeSource is split into two parts – a C# part and a Python part. The Python part is very simple: it imports a couple of Pygments functions into the global namespace and defines a helper function that generates syntax-highlighted HTML from a given block of code in a given language and style. Note the reference to the pygments assembly I described last post. Here’s the entire file:

import clr
clr.AddReference("pygments")

from pygments.lexers import get_all_lexers
from pygments.styles import get_all_styles

def generate_html(code, lexer_name, style_name):
  from pygments import highlight
  from pygments.lexers import get_lexer_by_name
  from pygments.styles import get_style_by_name
  from devhawk_formatter import DevHawkHtmlFormatter

  if not lexer_name: lexer_name = "text"
  if not style_name: style_name = "default"
  lexer = get_lexer_by_name(lexer_name)
  return highlight(code, lexer, DevHawkHtmlFormatter(style=style_name))

Instead of including this in the Pygments assembly, I embedded this file as a resource in my C# assembly. This way, I could use the standard DLR hosting APIs to create a script source and execute this code. I did have to build a concrete StreamContentProvider class to wrap the resource stream in, but otherwise, it’s pretty straightforward.

static ScriptEngine _engine;
static ScriptSource _source;

private void InitializeHosting()
{
    _engine = IronPython.Hosting.Python.CreateEngine();

    var asm = System.Reflection.Assembly.GetExecutingAssembly();
    var stream = asm.GetManifestResourceStream(
                   "DevHawk.PygmentsCodeSource.py");
    _source = _engine.CreateScriptSource(
                new BasicStreamContentProvider(stream),  
                "PygmentsCodeSource.py");
}

Once I got the engine and script source set up, all that remains is to set up a script scope to execute the script source in. For this specific application, it’s probably overkill to have a scope per instance – I think the syntax highlighting process is stateless, so a single scope could easily be shared across multiple PygmentsCodeSource instances. But I didn’t take any chances: I created a script scope per instance to execute the source in.

ScriptScope _scope;
Thread _init_thread;

public PygmentsCodeSource()
{
    if (_engine == null)
        InitializeHosting();

    _scope = _engine.CreateScope();

    _init_thread = new Thread(() => { _source.Execute(_scope); });
    _init_thread.Start();
}

You’ll notice that I’m executing the source in the scope on a background thread. That’s because it takes a while to execute, especially the first time. However, I don’t actually use the Python code until after the user types or copies a block of code into the UI and presses OK. In my experience, executing the Python code is typically finished by the time I get code into the box and press OK. I just need to make sure I add an _init_thread.Join guard anywhere I’m going to access the _scope to be sure the initialization is complete before I try to use it.
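
The join guard is not specific to C#. As a minimal sketch of the pattern, here it is in Python rather than in the plugin's own C#. The names BackgroundInitialized, slow_setup and get_variable are hypothetical and do not come from the plugin; the sleep simply stands in for executing the script source.

```python
import threading
import time


class BackgroundInitialized(object):
    """Starts slow setup on a worker thread and joins before first use."""

    def __init__(self, slow_setup):
        self._scope = {}
        # Kick off the expensive work immediately, without blocking the caller.
        self._init_thread = threading.Thread(target=slow_setup,
                                             args=(self._scope,))
        self._init_thread.start()

    def get_variable(self, name):
        # Join guard: blocks only while setup is still running. Joining a
        # thread that has already finished returns immediately.
        self._init_thread.join()
        return self._scope[name]


def slow_setup(scope):
    time.sleep(0.1)  # hypothetical stand-in for executing the script source
    scope["generate_html"] = lambda code: "<pre>%s</pre>" % code


source = BackgroundInitialized(slow_setup)
highlight = source.get_variable("generate_html")
print(highlight("x = 1"))
```

Because every access goes through one method that joins first, the guard cannot be forgotten at an individual call site, which is the main risk of sprinkling Join calls around by hand.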

In the next, and last, post in this small series we’ll see how to invoke Python functions in the _scope I initialized above from C#.

Compiling Python Packages into Assemblies

In looking at my hybrid IronPython / C# Windows Live Writer plugin, we’re going to start at the bottom with the Pygments package. Typically a Python package is a physical on-disk folder that contains a collection of Python files (aka modules). And during early development of Pygments for WLWriter, that’s exactly how I used it. However, when it came time for deployment, I figured it would be much easier if I packaged up the Pygments package, my custom HTML formatter and the standard library modules that Pygments depends on into a single assembly.

IronPython ships with a script named pyc for compiling Python files into .NET assemblies. However, pyc is pretty much just a wrapper around the clr module’s CompileModules function. I wrote my own custom script to build the Pygments assembly from the files in the pygments and pygments_dependencies folders.

from System import IO
from System.IO.Path import Combine

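def walk(folder):
  # Yield the files directly inside this folder first...
  for file in IO.Directory.GetFiles(folder):
    yield file
  # ...then recurse into each subfolder.
  for subfolder in IO.Directory.GetDirectories(folder):
    for file in walk(subfolder): yield file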

folder = IO.Path.GetDirectoryName(__file__)

pygments_files = list(walk(Combine(folder, 'pygments')))
pygments_dependencies = list(walk(Combine(folder,'pygments_dependencies')))

all_files = pygments_files + pygments_dependencies
all_files.append(IO.Path.Combine(folder, 'devhawk_formatter.py'))

import clr
clr.CompileModules(Combine(folder, r"..\external\pygments.dll"), *all_files)

Most of this code is a custom implementation of walk. I have all the IronPython and DLR dlls, including ipy.exe, checked into my source tree, but I don’t have the standard library checked in, so I can’t rely on os.walk being available. Other than that, the code is pretty straightforward – collect a bunch of files in a list and call CompileModules.
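
For comparison, here is a minimal sketch of the same file collection written against the standard library's os.walk, assuming the standard library is on the path. The function name collect_files is hypothetical and is not part of the build script.

```python
import os


def collect_files(root):
    """Return every file path under root, recursing into subfolders."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found
```

The resulting list could be passed to CompileModules in the same way as the output of the custom walk.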

The problem with this approach is that IronPython isn’t doing any kind of dependency checking when we compile the assembly. If you pass just the contents of the Pygments package into CompileModules, it will emit an assembly but that assembly will still depend on some modules in the standard library. If those aren’t available, the Pygments assembly won’t load. I’d love to have an automatic tool to determine module dependencies, but since I didn’t have such a tool I used a brute-force, by-hand solution. I wrote a small script to exercise the Pygments assembly. If there were any missing dependencies, test_compiled_pygments would throw an exception indicating the missing module. For each missing dependency, I copied over the missing module, recompiled the project and tried again. Lather, rinse, repeat. Not fun, but Pygments only depended on seven standard library modules so it didn’t end up taking that long.
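
As a rough idea of what an automatic tool could start from, here is a minimal sketch that lists the top-level modules a piece of source imports. It assumes a Python with the standard ast module, and the function name imported_modules is hypothetical. It only sees static import statements, so anything loaded dynamically (for example via __import__) would still have to be found by hand.

```python
import ast


def imported_modules(source):
    """Return the set of top-level module names a source string imports."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import inside the package itself,
            # which is not an external dependency.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


sample = "import re, os.path\nfrom StringIO import StringIO\nfrom . import util\n"
print(sorted(imported_modules(sample)))
```

Running this over every file in the package and subtracting the package's own module names would give a candidate list of standard library modules to copy over, which the exercise script could then confirm.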

So having gone down this path of compiling Python files into an assembly, would I do it again? For an application with an installer like this one, yes, no question. I added the Pygments assembly as a reference to my C# library and it got added to the installer automatically. That was much easier than managing all of the Pygments files and its dependencies in the installer project manually. Plus, I still would have had to manually figure out the dependencies unless I chose to include the entire standard library.

I will point out that the compiled Pygments assembly is the largest single file in my deployed solution. It clocks in at 2.25MB. That’s about twice the size of the Python files that I compiled it from. So clearly, I’m paying for the convenience of deploying a single file in space and maybe load time.[1] I’m also paying in space for a private copy of IronPython and the DLR – the two IronPython and five DLR assemblies clock in around 3.16MB. In comparison, the actual Writer plugin assembly itself is only about 25KB! But for an installed desktop app like a WLWriter plugin, 5MB of assorted infrastructure isn’t worth worrying about compared to the hassle of ensuring a shared copy of IronPython is installed. I mean, even if you don’t know IronPython exists, you can still install and use Pygments for WLWriter. Simplifying the install process is easily worth 5MB in storage space on the user’s computer in my opinion.

Next up, we’ll look at the Python half of the PygmentsCodeSource component, which calls into this compiled Pygments library.


  1. I haven’t done it, but it would be interesting to compare the load time for the single larger pygments assembly vs. loading and parsing the Python files individually. If I had to guess, I’m thinking the single assembly would load faster even though it’s bigger since there’s less overhead (only loading one big file vs. lots of small ones) and you skip the parsing step. But that’s pure guesswork on my part.

Building a Hybrid C# / IronPython App Without Dynamic Type

Arguably, the biggest feature of C# 4.0 is the new dynamic type. And it’ll be great…when it ships. In the meantime, some of us want to build hybrid C# and IronPython applications today, such as my Pygments for Windows Live Writer plugin.

Pygments is a syntax highlighter, written in Python, with support for over one hundred languages. With the exception of a couple of bugs in our importer (discussed here) it works great with IronPython. It’s also extensible, so I was able to easily build a custom formatter to output exactly the HTML I want inserted in my blog posts. So it made perfect sense to use Pygments as the basis of a Windows Live Writer plugin.

As great a tool as Windows Live Writer is, its developers haven’t exactly seen the light when it comes to dynamic languages. If you want to create a custom Content Source for Windows Live Writer, you have to generate a compiled on-disk assembly with a static type and custom attributes. Not exactly IronPython’s forte, if you know what I mean. I did try to build a pure IronPython solution, but eventually gave up. So I ended up building a hybrid solution. The front end of the plugin as well as the UI elements are written in C# while the syntax highlighter engine is written in IronPython. And since this is running on the current .NET framework, I didn’t have the newfangled C# 4.0 dynamic type to help me.

Over the next couple of blog posts, I want to highlight a few aspects of how I built this plugin, including compiling Python packages into assemblies and invoking Python code from C# 3.0 and earlier. If you want to look for yourself, the source is up on GitHub.

Pygments for Windows Live Writer v1.0.2

I just uploaded a new version of my Pygments for WL Writer plugin to my SkyDrive. Nothing major here – some minor UI cleanup plus an upgrade to IronPython 2.6 beta 2. Installing over the old version worked on my machine, but that’s as far as my testing has gone. I also pushed the latest source out to GitHub.

I’m still waiting on a fix for what Dino has taken to calling “Harry’s Pygments Import Bug” – which actually turned out to be three importer bugs. The Pygments lexers package is customized so as to abstract away the specific modules the individual lexers are defined in. I don’t use that functionality – I’m using get_all_lexers and get_lexer_by_name instead – but the bugs caused importing the package to fail, so in the meantime I commented out the lines that don’t work under IronPython. I think Dino’s got the fixes for this checked in, but I probably won’t update Pygments for WL Writer again until IronPython 2.6 RC.

I Hate Global.asax

One of the things I’ve always loved about ASP.NET is how easily extensible it is. Back in 2000, I had a customer that wanted to “skin” their website using XML and XSLT – an approach Martin Fowler later called Transform View. We were working with classic ASP at the time, so the solution we ended up with was kind of ugly. But I was able to implement this approach in ASP.NET in a few hundred lines of code, which I wrote up in an MSDN article published back in 2003. In the conclusion of that article, I wrote the following:

Using ASP.NET is kind of like having your mind read. If you ever look at a site and think “I need something different,” you’ll most likely find that the ASP.NET architects have considered that need and provided a mechanism for you to hook in your custom functionality. In this case, I’ve bypassed the built-in Web Forms and Web Services support to build an entire engine that services Web requests in a unique way.

Nearly ten years later, I finally ran into a situation where ASP.NET failed to read my mind and didn’t provide a mechanism to hook in custom functionality: Global.asax.

I always thought of global.asax as an obsolete construct primarily intended to ease migration from classic ASP. After all, ASP.NET has first class support for customizing request handling at various points throughout the execution pipeline via IHttpModule. Handling those events in global.asax always felt vaguely hacky to me.

However, what I didn’t realize is that there are some events that can only be handled via global.asax (or its code behind). In particular, Application_Start/End and Session_Start/End can only be handled in global.asax. Worse, these aren’t true events. For reasons I’m sure made sense at the time but that I don’t understand, the HttpApplicationFactory discovers these methods via reflection rather than by an interface or other more typical mechanism. You can check it out for yourself with Reflector or the Reference Source – look for the method with the wonderful name ReflectOnMethodInfoIfItLooksLikeEventHandler. No, I’m not making that up.

The reason I suddenly care about global.asax is because Application_Start is where ASP.NET MVC apps configure their route table. But if you want to access the Application_Start method in a dynamic language like IronPython, you’re pretty much out of luck. The only way to receive the Application_Start pseudo-event is via a custom HttpApplication class. But you can’t implement your custom HttpApplication in a dynamically typed language like IronPython since it finds the Application_Start method via Reflection. Ugh.

If someone can explain to me why ASP.NET uses reflection to fire the Application_Start event, I’d love to understand why it works this way. Even better – I’d love to see this fixed in some future version of ASP.NET. How come the only way to configure a custom HttpApplication class is to specify it via global.asax? Wouldn’t it make sense to specify it in web.config instead?

In order to support Application_Start for dynamic languages you basically have two choices:

  1. Build a custom HttpApplication class in C# and reference it in global.asax. This is kind of the approach used by Jimmy’s ironrubymvc project. He’s got a RubyMvcApplication which he inherits his GlobalApplication from. Given that GlobalApplication is empty, I think he could remove his global.asax.cs file and just reference RubyMvcApplication from global.asax directly.
  2. Build custom Application_Start/End-like events out of IHttpModule Init and Dispose. You can have multiple IHttpModule instances in a given web app, so you’d need to make sure you fired Start and End only once. This is the approach taken by the ASP.NET Dynamic Language Support.[1]
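
The "only once" requirement in option 2 can be sketched with a lock and a counter. This is a language-neutral illustration written in Python, not a description of how the ASP.NET Dynamic Language Support actually does it; the class OnceOnlyLifetime and its method names are hypothetical.

```python
import threading


class OnceOnlyLifetime(object):
    """Turns many module init/dispose calls into one start and one end."""

    def __init__(self, on_start, on_end):
        self._on_start = on_start
        self._on_end = on_end
        self._lock = threading.Lock()
        self._live_modules = 0

    def module_init(self):
        with self._lock:
            self._live_modules += 1
            if self._live_modules == 1:
                self._on_start()  # first module in: treat as application start

    def module_dispose(self):
        with self._lock:
            self._live_modules -= 1
            if self._live_modules == 0:
                self._on_end()  # last module out: treat as application end


events = []
lifetime = OnceOnlyLifetime(lambda: events.append("start"),
                            lambda: events.append("end"))
for _ in range(3):
    lifetime.module_init()
for _ in range(3):
    lifetime.module_dispose()
print(events)
```

One caveat with counting this way: if the number of live modules drops to zero and later rises again, start fires a second time, so a real implementation would have to decide whether that counts as a new application lifetime.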

So here’s the question, Iron Language Fans: which of these approaches is better? I lean towards Option #1, since it traps exactly the correct event, though it does require a global.asax file to be hanging around (kind of like how the ASP.NET MVC template has a blank default.aspx file “to ensure that ASP.NET MVC is activated by IIS when a user makes a “/” request”). But I’m curious what the Iron Language Community at large thinks. Feel free to leave me a comment or drop me an email with your thoughts.


  1. FYI, I’m working on getting the code for ASP.NET Dynamic Language Support released. In the meantime, you can verify what I’m saying via Reflector.