DevHawk World Tour FY2010

As I’ve done the past two years, here’s a list of all the places I’m going in the next fiscal year. Traditionally, I’ve done this post by calendar year, but all MSFT planning is done by fiscal year, so invariably I miss events early in the calendar year but late in the fiscal year (like PyCon last year). I’ll be updating this post periodically as I get tapped for more presentations. There are several other conferences I’m considering, submitting sessions to or in discussions with, but these are the ones that are confirmed.

Danish University Tour, Sept 7-11
My FY10 travels first take me to Copenhagen, where I was invited by the local subsidiary to present at four different universities in a single week. Don’t know how much sightseeing I’ll get done, but I’ll sure be talking a lot. My host Martin Esmann writes Stud.blog for Danish ComputerWorld and has a post (in Danish) about my visit. Personally, I am just excited about being featured in something called “Stud.blog”! 😄 Actually, Stud here means “Student” not “slender, upright members of wood” or any other definition of the term “stud”.

I’ll be visiting Aalborg University, Aarhus University, University of Southern Denmark and University of Copenhagen, as well as delivering a TechTalk at the Microsoft Development Center Copenhagen, which is Microsoft’s biggest development center in Europe. I’ll primarily be delivering my Iron Languages introductory talk “Pumping Iron”, but there’s also some interest in language development on the DLR, so I’ll be talking on that topic as well.

patterns & practices Summit Redmond 2009, Oct 12-16
This will be my third p&p Summit in a row and fourth in five years. This year, I’m doing a talk called “Not Everything is a new Nail() : How Languages Influence Design”. I was supposed to deliver this talk last year, but got sidetracked with my day job and ended up talking about IronPython instead. Keith has made it VERY clear he doesn’t want another last-minute substitution this year.

Turing Award winner Alan Perlis is credited with saying ‘A language that doesn’t affect the way you think about programming is not worth knowing.’ Yet most programmers rarely venture outside the comfort zone of statically typed object-oriented languages. Our heavy use of object-oriented languages influences our thinking to the point that we can’t see alternative approaches at all. This isn’t to say that object-oriented languages are bad, but as with most things, there is no one ‘best’ way for all situations. In this talk, VS Languages PM Harry Pierson will look at a given software development scenario from both the object-oriented and functional perspectives, in order to see how much of an influence language really has on our engineering efforts.


Tech·Ed Europe 2009, Nov 9-13
I knew I was going to be updating this post over time, but I didn’t expect to have to update it so soon! Literally the day after I posted this, I got the speaker invite for Tech·Ed Europe 2009. My session hasn’t been posted yet, but this is the abstract we submitted:

Dynamic Languages on the Microsoft .NET Framework
The Dynamic Language Runtime (DLR) adds a shared dynamic type system, a standard hosting model, and support for generating fast dynamic code to the CLR. IronPython and IronRuby are Microsoft’s dynamic language implementations on .NET. In this talk, we’ll show you how to interactively create great .NET applications using dynamic languages. You’ll walk away knowing why dynamic languages deserve a spot in your toolbox!

It’s kind of generic, but given that most of the audience probably hasn’t seen IronPython or IronRuby, having broad latitude in my presentation topic is a good thing. I’ll probably deliver a variant of my standard “Pumping Iron” talk like I’m doing in Denmark. I delivered it recently at an internal event with Jimmy, so there’s lots more IronRuby content than there used to be.

The only bummer about doing Tech·Ed Europe is that I’m only doing one measly talk. I’m asking around – I’d love to do a .NET user group or university talk while I’m in town. Any takers?


~~Microsoft Professional Developers Conference 2009, Nov 17-19~~
Update: Tech·Ed Europe and PDC are on back-to-back weeks this year so we’ll be sending a teammate-to-be-determined to PDC in my stead. My family is very pleased I won’t be gone for two weeks straight.

Last year, I was on the content team for PDC. This year, that PITA responsibility belongs to someone else so I might actually get real work done in the four weeks leading up to PDC. My team will tell you, last year PDC sucked up 100% of my time for a month as we were driving towards our 2.0 release.

Technically, I haven’t had a talk for PDC accepted yet. But I submitted three and two are looking good (though I assume only one will make it to the actual show), so I thought I’d just go ahead and include it on this post. If/when my talks get accepted, I’ll post links and abstracts. Also, if one of my PDC talks is accepted, I’ll probably submit a talk for ~~SoCal Code Camp as well.~~


PyCon 2010, Feb 19-21
This will also be my third PyCon in a row, though PyCon last year was a bit of a whirlwind since I had literally just joined the IronPython team. I finally feel like I might have something interesting to present at PyCon this year. Last year Dino and Jim handled the presentation duties from our team (with Michael Foord and Jonathan Hartley delivering a tutorial and Sarah Sutkiewicz speaking on FePy). We already have one announcement that I think is pretty significant lined up and might have a second depending on how hard I can push LCA and management between now and then. Talk proposals are due October 1st, so any suggestions would be appreciated!

CodePlex Editor Role

Ask Sara: I’ve been bugging her for a LONG time for this CodePlex feature. Actually, my team has been bugging her team for longer than either of us has been in these jobs.

Last week’s CodePlex release includes a feature known as “Editor Role”. If you look at the Project Role Matrix, you’ll notice two primary differences from what the standard logged-in user can do: they can create/edit wiki pages and they can’t rate releases. Developers and Coordinators can’t rate releases either – I guess the idea is that they don’t want members of the team rating their own releases (5 Stars! Again! Wow, we’re awesome!).

Until now, the only way to give members of the community the ability to edit the wiki also gave them permission to edit work items, check in source code and make releases. We’re still working on getting Microsoft at large to understand the benefits of community collaboration in open source, but in the meantime we just can’t give those permissions to people off the team. However, we would love to have contributions to our documentation wiki.¹ With the new Editor Role, we’ll be able to grant wiki editor access without any of the other permissions.

Of course, the whole idea of “wiki permissions” kinda flies in the face of the basic wiki design principles. So we’re going to be pretty liberal about handing out editor permissions. If you’re interested in editing the wiki, drop me a line and I’ll get you hooked up.

Big mega-thanks to the CodePlex team for making this feature happen. I guess I’ll have to find something new to bug Sara about!

¹ You can tell we’re a real open source project because we’re begging for documentation help!

Invoking Python Functions from C# (Without Dynamic)

So I’ve compiled the Pygments package into a CLR assembly and loaded an embedded Python script, so now all that remains is calling into the functions in that embedded Python script. Turns out, this is the easiest step so far.

We’ll start with get_all_lexers and get_all_styles, since they’re nearly identical. Both functions are called once on initialization, take zero arguments and return a PythonGenerator (for you C# devs, a PythonGenerator is kind of like the IEnumerable that gets created when you yield return from a function). In fact, the only difference between them is that get_all_styles yields simple strings, while get_all_lexers yields PythonTuples, each holding the long name, a tuple of aliases, a tuple of filename patterns and a tuple of mime types. Here’s the implementation of the Languages property:

PygmentLanguage[] _languages;

public PygmentLanguage[] Languages
{
    get
    {
        // Build the list once, on first access, then reuse it
        if (_languages == null)
        {
            // Wait for the background thread that executes the Python script
            _init_thread.Join();

            var f = _scope.GetVariable<PythonFunction>("get_all_lexers");
            var r = (PythonGenerator)_engine.Operations.Invoke(f);
            var languages_list = new List<PygmentLanguage>();

            // Each item: (long name, aliases, filename patterns, mime types)
            foreach (PythonTuple o in r)
            {
                languages_list.Add(new PygmentLanguage()
                    {
                        LongName = (string)o[0],
                        // The first alias is the name the lexer is looked up by
                        LookupName = (string)((PythonTuple)o[1])[0]
                    });
            }

            _languages = languages_list.ToArray();
        }

        return _languages;
    }
}

If you recall from my last post, I initialized the _scope on a background thread, so I first have to wait for the thread to complete. If I were using C# 4.0, I’d simply be able to call _scope.get_all_lexers(), but since I’m not, I have to manually reach into the _scope and retrieve the get_all_lexers function via the GetVariable method. I can’t invoke the PythonFunction directly from C#; instead I have to use the Invoke method that hangs off _engine.Operations. I cast the return value from Invoke to a PythonGenerator and iterate over it to populate the array of languages.
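For readers coming from the C# side, here’s a rough CPython sketch of the same consumption pattern: a generator that lazily yields lexer tuples shaped like the ones described above, drained once into a list of simple records. fake_get_all_lexers and its data are hypothetical stand-ins, not real Pygments output. Skipping entries with no aliases is an addition of this sketch; the C# code above indexes the first alias unconditionally.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Shape of one entry: (long_name, aliases, filename_patterns, mime_types)
LexerInfo = Tuple[str, Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]


def fake_get_all_lexers() -> Iterator[LexerInfo]:
    """Hypothetical stand-in for pygments.lexers.get_all_lexers().

    Like a C# iterator method using yield return, nothing runs until
    the caller starts iterating.
    """
    yield ("Python", ("python", "py"), ("*.py",), ("text/x-python",))
    yield ("C#", ("csharp", "c#"), ("*.cs",), ("text/x-csharp",))
    yield ("Orphan", (), ("*.orphan",), ())  # no alias to look up by
    yield ("Text only", ("text",), ("*.txt",), ("text/plain",))


@dataclass
class PygmentLanguage:
    long_name: str
    lookup_name: str


def load_languages(lexers: Iterable[LexerInfo]) -> List[PygmentLanguage]:
    """Drain the generator once, keeping each lexer's first alias."""
    languages = []
    for long_name, aliases, _patterns, _mime_types in lexers:
        if not aliases:
            continue  # could never be looked up by name, so skip it
        languages.append(PygmentLanguage(long_name, aliases[0]))
    return languages
```

Calling load_languages(fake_get_all_lexers()) gives three records with lookup names python, csharp and text. As in the C# property, the result is worth caching, because a generator can only be iterated once.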

If you’re working with dynamic languages from C#, the ObjectOperations instance that hangs off the ScriptEngine instance is amazingly useful. Dynamic objects can participate in a powerful but somewhat complex protocol for binding a wide variety of dynamic operation types: the DynamicMetaObject class supports twelve different Bind operations. But the DynamicMetaObject binder methods are designed to be used by language implementors. The ObjectOperations class lets you invoke them fairly easily from a higher level of abstraction.

The last Python function I call from C# is generate_html. Unlike get_all_lexers, generate_html takes three parameters and can be called multiple times. The Invoke method has a params argument so it can accept any number of additional parameters, but when I tried to call it I got a NotImplementedException. It turns out that Invoke currently throws NotImplementedException if it receives more than two parameters. Yes, we realize that’s kinda broken and we are looking to fix it. However, there’s another way that’s also more efficient for a function like generate_html that we are likely to call more than once. Here’s my implementation of GenerateHtml in C#:

Func<object, object, object, string> _generatehtml_function;

public string GenerateHtml(string code, string lexer, string style)
{
    if (_generatehtml_function == null)
    {
        // Make sure the Python script has finished executing
        _init_thread.Join();

        // Convert the Python function to a typed delegate once and cache it
        var f = _scope.GetVariable<PythonFunction>("generate_html");
        _generatehtml_function = _engine.Operations.ConvertTo
                           <Func<object, object, object, string>>(f);
    }

    return _generatehtml_function(code, lexer, style);
}

Instead of calling Invoke, I convert the PythonFunction instance into a delegate using Operations.ConvertTo which I then cache and call like any other delegate from C#. Not only does Invoke fail for more than two parameters, it creates a new dynamic call site every time it’s called. Since get_all_lexers and get_all_styles are each only called once, it’s no big deal. But you typically call generate_html multiple times for a block of source code. Using ConvertTo generates a dynamic call site as part of the delegate, so that’s more efficient than creating one on every call.
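To make the cost difference concrete, here’s a CPython sketch that models it with a counter. CallSiteFactory, invoke, Highlighter and python_generate_html are all hypothetical names; the factory merely stands in for whatever work building a dynamic call site involves, so this illustrates the caching pattern, not the DLR’s actual implementation.

```python
from typing import Callable, Optional


class CallSiteFactory:
    """Hypothetical stand-in for call-site machinery. It only counts
    how many times a (notionally expensive) call site is built."""

    def __init__(self) -> None:
        self.sites_created = 0

    def make_site(self, target: Callable[..., str]) -> Callable[..., str]:
        self.sites_created += 1

        def site(*args: object) -> str:
            return target(*args)

        return site


def python_generate_html(code: str, lexer: str, style: str) -> str:
    """Hypothetical stand-in for the Python-side generate_html."""
    return f"<pre class='{lexer}-{style}'>{code}</pre>"


def invoke(factory: CallSiteFactory, target: Callable[..., str], *args: object) -> str:
    """Invoke-style: a brand-new call site on every call."""
    return factory.make_site(target)(*args)


class Highlighter:
    """ConvertTo-style: build the call site once, cache it, reuse it."""

    def __init__(self, factory: CallSiteFactory) -> None:
        self._factory = factory
        self._generate_html: Optional[Callable[..., str]] = None

    def generate_html(self, code: str, lexer: str, style: str) -> str:
        if self._generate_html is None:
            self._generate_html = self._factory.make_site(python_generate_html)
        return self._generate_html(code, lexer, style)
```

Calling each path three times builds three sites through invoke but only one through Highlighter, which is the same trade-off as caching the converted delegate in _generatehtml_function.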

The rest of the C# code is fairly pedestrian and has nothing to do with IronPython, as all access to Python code is hidden behind GenerateHtml and the Languages and Styles properties.

So as I’ve shown in the last few posts, embedding IronPython inside a C# application – even before we get the new dynamic functionality of C# 4.0 – isn’t really all that hard. Of course, we’re always interested in ways to make it easier. If you’ve got any questions or suggestions, please feel free to leave a comment or drop me a line.

Embedding Python Scripts in C# Applications

Now that I’ve got Pygments and its dependencies packaged up in an easy-to-distribute assembly, I need to be able to call it from C#. However, if you pop open pygments.dll in Reflector, you’ll notice it’s not exactly intuitive to access: lots of compiler-generated names like pygments$12 and StringIO$64 in a type named DLRCachedCode. Clearly, this code isn’t intended to be used by anything except the IronPython runtime.

So we better create one of those IronPython runtime thingies.

As you can see in the layer diagram to the left, PygmentsCodeSource is split into two parts: a C# part and a Python part. The Python part is very simple: it imports a couple of Pygments functions into the global namespace and defines a small helper function that generates syntax-highlighted HTML from a given block of code in a given language and style. Note the reference to the pygments assembly I described in my last post. Here’s the entire file:

import clr
clr.AddReference("pygments")  # the compiled Pygments assembly

# Imported at module scope so the C# host can fetch them from the script scope
from pygments.lexers import get_all_lexers
from pygments.styles import get_all_styles

def generate_html(code, lexer_name, style_name):
  from pygments import highlight
  from pygments.lexers import get_lexer_by_name
  from pygments.styles import get_style_by_name
  from devhawk_formatter import DevHawkHtmlFormatter

  # Fall back to plain text and the default style when none is given
  if not lexer_name: lexer_name = "text"
  if not style_name: style_name = "default"
  lexer = get_lexer_by_name(lexer_name)
  return highlight(code, lexer, DevHawkHtmlFormatter(style=style_name))

Instead of including this in the Pygments assembly, I embedded this file as a resource in my C# assembly. This way, I could use the standard DLR hosting APIs to create a script source and execute this code. I did have to build a concrete StreamContentProvider class (BasicStreamContentProvider below) to wrap the resource stream in, but otherwise, it’s pretty straightforward.

static ScriptEngine _engine;
static ScriptSource _source;

private void InitializeHosting()
{
    _engine = IronPython.Hosting.Python.CreateEngine();

    var asm = System.Reflection.Assembly.GetExecutingAssembly();
    var stream = asm.GetManifestResourceStream(
                   "DevHawk.PygmentsCodeSource.py");
    _source = _engine.CreateScriptSource(
                new BasicStreamContentProvider(stream),  
                "PygmentsCodeSource.py");
}

Once I’ve got the engine and script source set up, all that remains is to set up a script scope to execute the script source in. For this specific application, it’s probably overkill to have a scope per instance: I think the syntax highlighting process is stateless, so a single scope could easily be shared across multiple PygmentsCodeSource instances. But I didn’t take any chances; I created a script scope per instance to execute the source in.
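In CPython terms, the split between one shared script source and per-instance scopes looks roughly like compiling a code object once and exec-ing it into a fresh globals dict for each instance. This sketch is an analogy, not the DLR hosting API: EMBEDDED_SOURCE is a made-up stateful script (a counter) chosen precisely to show the isolation a scope per instance buys, and CodeSourceInstance is a hypothetical name.

```python
# Hypothetical embedded script; in the C# version the text comes from a
# manifest resource rather than a string literal.
EMBEDDED_SOURCE = '''
counter = 0

def bump():
    global counter
    counter += 1
    return counter
'''

# Compile once, loosely analogous to creating one shared ScriptSource.
_code = compile(EMBEDDED_SOURCE, "PygmentsCodeSource.py", "exec")


class CodeSourceInstance:
    """Executes the shared code object in its own namespace, loosely
    analogous to a ScriptScope per PygmentsCodeSource instance."""

    def __init__(self) -> None:
        self.scope: dict = {}
        exec(_code, self.scope)  # module-level names land in self.scope

    def call(self, name: str, *args):
        return self.scope[name](*args)
```

Two instances each get their own counter, so bumping one twice leaves the other at zero. For a stateless script like the Pygments helper, a single shared namespace would behave the same, which is why a scope per instance is probably overkill here.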

ScriptScope _scope;
Thread _init_thread;

public PygmentsCodeSource()
{
    if (_engine == null)
        InitializeHosting();

    _scope = _engine.CreateScope();

    _init_thread = new Thread(() => { _source.Execute(_scope); });
    _init_thread.Start();
}

You’ll notice that I’m executing the source in the scope on a background thread. That’s because it takes a while to execute, especially the first time. However, I don’t actually use the Python code until after the user types or copies a block of code into the UI and presses OK. In my experience, executing the Python code is typically finished by the time I get code into the box and press OK. I just need to make sure I add an _init_thread.Join guard anywhere I’m going to access the _scope to be sure the initialization is complete before I try to use it.
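The same start-now, join-before-use pattern can be sketched in CPython with threading.Thread. LazyScope and its short sleep are hypothetical; the sleep only simulates the slow first execution of the script. Thread.join() returns immediately once the thread has finished, so the guard costs almost nothing on every call after the first.

```python
import threading
import time
from typing import Callable, Dict


class LazyScope:
    """Runs a slow initializer on a background thread. Every accessor
    joins the thread first so it never sees a half-initialized scope."""

    def __init__(self, init_delay: float = 0.05) -> None:
        self._scope: Dict[str, Callable[[str], str]] = {}
        self._init_delay = init_delay
        self._init_thread = threading.Thread(target=self._initialize, daemon=True)
        self._init_thread.start()  # start work right away, as the constructor does

    def _initialize(self) -> None:
        time.sleep(self._init_delay)  # stand-in for executing the Python source
        self._scope["generate_html"] = lambda code: f"<pre>{code}</pre>"

    def generate_html(self, code: str) -> str:
        self._init_thread.join()  # the guard: wait until initialization is done
        return self._scope["generate_html"](code)
```

One design caveat for this sketch: if the initializer raises, join() still returns and the lookup fails with a KeyError, so a sturdier version would capture the exception on the worker thread and re-raise it in the accessor.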

In the next, and last, post in this small series we’ll see how to invoke Python functions in the _scope I initialized above from C#.

Compiling Python Packages into Assemblies

In looking at my hybrid IronPython / C# Windows Live Writer plugin, we’re going to start at the bottom with the Pygments package. Typically, Python packages are physical on-disk folders that contain a collection of Python files (aka modules). And during early development of Pygments for WLWriter, that’s exactly how I used it. However, when it came time for deployment, I figured it would be much easier if I packaged up the Pygments package, my custom HTML formatter and the standard library modules that Pygments depends on into a single assembly.

IronPython ships with a script named pyc for compiling Python files into .NET assemblies. However, pyc is pretty much just a wrapper around the clr module’s CompileModules function. I wrote my own custom script to build the Pygments assembly from the files in the pygments and pygments_dependencies folders.

from System import IO
from System.IO.Path import Combine

# Recursively yield every file under folder (no stdlib, so no os.walk)
def walk(folder):
  for file in IO.Directory.GetFiles(folder):
    yield file
  for subfolder in IO.Directory.GetDirectories(folder):
    for file in walk(subfolder): yield file

# folder containing this build script
folder = IO.Path.GetDirectoryName(__file__)

pygments_files = list(walk(Combine(folder, 'pygments')))
pygments_dependencies = list(walk(Combine(folder, 'pygments_dependencies')))

all_files = pygments_files + pygments_dependencies
all_files.append(Combine(folder, 'devhawk_formatter.py'))

# compile every collected module into a single assembly
import clr
clr.CompileModules(Combine(folder, r"..\external\pygments.dll"), *all_files)

Most of this code is a custom implementation of walk. I have all the IronPython and DLR DLLs, including ipy.exe, checked into my source tree, but not the standard library, so os.walk isn’t available. Other than that, the code is pretty straightforward: collect a bunch of files in a list and call CompileModules.
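Where the standard library is available, os.walk already does this job. For comparison, here’s the custom walk translated to CPython using os.scandir. It yields a folder’s files before recursing into its subfolders, like the IronPython version; sorting entries by name is an addition of this sketch, purely to make the output order deterministic.

```python
import os
from typing import Iterator


def walk(folder: str) -> Iterator[str]:
    """Yield every file path under folder: this folder's files first,
    then recurse into each subfolder (entries sorted by name)."""
    with os.scandir(folder) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        if entry.is_file():
            yield entry.path
    for entry in entries:
        if entry.is_dir():
            yield from walk(entry.path)
```

Because walk is a generator, wrapping it in list() as the build script does is what actually forces the directory traversal.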

The problem with this approach is that IronPython isn’t doing any kind of dependency checking when we compile the assembly. If you pass just the contents of the Pygments package into CompileModules, it will emit an assembly, but that assembly will still depend on some modules in the standard library. If those aren’t available, the Pygments assembly won’t load. I’d love to have an automatic tool to determine module dependencies, but since I didn’t have such a tool I used a brute-force, by-hand solution. I wrote a small script, test_compiled_pygments, to exercise the Pygments assembly; if there were any missing dependencies, it would throw an exception indicating the missing module. For each missing dependency, I copied over the missing module, recompiled the project and tried again. Lather, rinse, repeat. Not fun, but Pygments only depended on seven standard library modules, so it didn’t take that long.
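CPython’s standard library does include a tool that could automate part of this: modulefinder, which statically scans import statements. This hasn’t been checked under IronPython, and static scanning misses imports made dynamically (for example, __import__ with a computed name), so treat this as a sketch of how the dependency list could be gathered, not a replacement for actually exercising the compiled assembly. stdlib_dependencies is a hypothetical helper name.

```python
import os
import tempfile
from modulefinder import ModuleFinder
from typing import Set


def stdlib_dependencies(script_text: str) -> Set[str]:
    """Return the top-level module names a script (transitively) imports,
    as discovered by modulefinder's static analysis."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.py")
        with open(path, "w") as f:
            f.write(script_text)
        finder = ModuleFinder()
        finder.run_script(path)  # follows imports recursively
    # collapse dotted names like "json.decoder" to their top-level package
    return {name.split(".")[0] for name in finder.modules if name != "__main__"}
```

Feeding it a probe script that imports the package you plan to compile would list candidate modules to copy into pygments_dependencies, including transitive ones the by-hand loop discovers one failure at a time.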

So, having gone down this path of compiling Python files into an assembly, would I do it again? For an application with an installer like this one, yes, no question. I added the Pygments assembly as a reference to my C# library and it got added to the installer automatically. That was much easier than managing all of the Pygments files and its dependencies in the installer project manually. Plus, I still would have had to manually figure out the dependencies unless I chose to include the entire standard library.

I will point out that the compiled Pygments assembly is the largest single file in my deployed solution. It clocks in at 2.25MB, about twice the size of the Python files I compiled it from. So clearly, I’m paying for the convenience of single-file deployment in disk space, and maybe in load time.¹ I’m also paying in space for a private copy of IronPython and the DLR: the two IronPython and five DLR assemblies clock in around 3.16MB. In comparison, the actual Writer plugin assembly itself is only about 25KB! But for an installed desktop app like a WLWriter plugin, 5MB of assorted infrastructure isn’t worth worrying about compared to the hassle of ensuring a shared copy of IronPython is installed. I mean, even if you don’t know IronPython exists, you can still install and use Pygments for WLWriter. Simplifying the install process is easily worth 5MB in storage space on the user’s computer, in my opinion.

Next up, we’ll look at the Python half of the PygmentsCodeSource component, which calls into this compiled Pygments library.

¹ I haven’t done it, but it would be interesting to compare the load time for the single larger pygments assembly vs. loading and parsing the Python files individually. If I had to guess, I’m thinking the single assembly would load faster even though it’s bigger, since there’s less overhead (only loading one big file vs. lots of small ones) and you skip the parsing step. But that’s pure guesswork on my part.


Disclaimer

The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the author's discretion.