Passion * Technology * Ruthless Competence

Thursday, June 18, 2009

__clrtype__ Metaclasses: Named Attribute Parameters

In my last post, I added support for custom attribute positional parameters. To finish things off, I need to add support for named parameters as well. Custom attributes support named parameters for public fields and settable properties. It works kind of like C# 3.0’s object initializers. However, unlike object initializers, the specific fields and properties to be set on a custom attribute as well as their values are passed to the CustomAttributeBuilder constructor. With six arguments – five of which are arrays – it’s kind of an ugly constructor. But luckily, we can hide it away in the make_cab function by using Python’s keyword arguments feature.

def make_cab(attrib_type, *args, **kwds):
  clrtype = clr.GetClrType(attrib_type)
  argtypes = tuple(map(lambda x:clr.GetClrType(type(x)), args))
  ci = clrtype.GetConstructor(argtypes)

  props = ([],[])
  fields = ([],[])
  
  for kwd in kwds:
    pi = clrtype.GetProperty(kwd)
    if pi is not None:
      props[0].append(pi)
      props[1].append(kwds[kwd])
    else:
      fi = clrtype.GetField(kwd)
      if fi is not None:
        fields[0].append(fi)
        fields[1].append(kwds[kwd])
      else:
        raise Exception("No %s Member found on %s" % (kwd, clrtype.Name))
  
  return CustomAttributeBuilder(ci, args, 
    tuple(props[0]), tuple(props[1]), 
    tuple(fields[0]), tuple(fields[1]))

def cab_builder(attrib_type):
  return lambda *args, **kwds:make_cab(attrib_type, *args, **kwds)

You’ll notice that make_cab now takes a third parameter, in addition to the attribute type and the tuple of positional arguments we saw last post. This third parameter “**kwds” is a dictionary of named parameters. Python supports both positional and named parameter passing, like VB has for a while and C# will in 4.0. However, this **kwds parameter contains all the extra or leftover named parameters that were passed in but didn’t match any existing function arguments. Think of it like the params of named parameters.
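
If *args and **kwds are new to you, here’s a minimal plain-Python sketch of how the leftovers get collected. No CLR is involved, and describe_call is a made-up name used only for illustration:

```python
def describe_call(attrib_name, *args, **kwds):
    # args is a tuple holding the extra positional arguments.
    # kwds is a dict holding the named arguments that matched no declared parameter.
    return attrib_name, args, kwds


name, args, kwds = describe_call(
    "XmlRoot", "product", Namespace="http://samples.devhawk.net")
```

Here name is "XmlRoot", args is the one-element tuple ("product",) and kwds is a dictionary with the single key "Namespace". That dictionary is what make_cab walks to find the matching properties and fields.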

As I wrote earlier, custom attributes support setting named values of both fields and properties. We don’t want the developer to have to know if a given named parameter is a field or a property, so make_cab iterates over all the named parameters, checking first to see if it’s a property then if it’s a field. It keeps a list of all the field / property infos as well as their associated values. Assuming all the named parameters are found, those lists are converted to tuples and passed into the CustomAttributeBuilder constructor.

In addition to the change to make_cab, I also updated cab_builder slightly in order to pass the **kwds parameter on thru to the make_cab function. No big deal. So now, I can add an attribute with named parameters to my IronPython class and it still looks a lot like a C# attribute specification.

clr.AddReference("System.Xml")
from System.Xml.Serialization import XmlRootAttribute 
from System import ObsoleteAttribute, CLSCompliantAttribute
Obsolete = cab_builder(ObsoleteAttribute)
CLSCompliant = cab_builder(CLSCompliantAttribute)
XmlRoot = cab_builder(XmlRootAttribute)

class Product(object):
  __metaclass__ = ClrTypeMetaclass
  _clrnamespace = "DevHawk.IronPython.ClrTypeSeries" 
  _clrclassattribs = [
    Obsolete("Warning Lark's Vomit"), 
    CLSCompliant(False),
    XmlRoot("product", Namespace="http://samples.devhawk.net")]

  # remainder of Product class omitted for clarity

As usual, sample code is up on my skydrive.

Now that I can support custom attributes on classes, it would be fairly straightforward to add them to methods, properties, etc as well. The hardest part at this point is coming up with a well designed API that works within the Python syntax. If you’ve got any opinions on that, feel free to share them in the comments, via email, or on the IronPython mailing list.

Posted By Harry Pierson at 10:09 AM Pacific Daylight Time

Wednesday, June 17, 2009

__clrtype__ Metaclasses: Positional Attribute Parameters

The basic infrastructure for custom attributes in IronPython is in place, but it’s woefully limited. Specifically, it only works for custom attributes that don’t have parameters. Of course, most of the custom attributes that you’d really want to use require additional parameters, of both the positional and named variety. Since positional parameters are easier, let’s start with them.

Positional parameters get passed to the custom attribute’s constructor. As we saw in the previous post, you need a CustomAttributeBuilder to attach a custom attribute to an attribute target (like a class). Previously, I just needed to know the attribute type since I was hard coding the positional parameters. But now, I need to know both the attribute type as well as the desired positional parameters. I could have built a custom Python class to track this information, but it made much more sense just to use CustomAttributeBuilder instances. I built a utility function make_cab to construct the CustomAttributeBuilder instances.

def make_cab(attrib_type, *args):
  argtypes = tuple(map(lambda x:clr.GetClrType(type(x)), args))
  ci = clr.GetClrType(attrib_type).GetConstructor(argtypes)
  return CustomAttributeBuilder(ci, args)

from System import ObsoleteAttribute 

class Product(object):
  __metaclass__ = ClrTypeMetaclass
  _clrnamespace = "DevHawk.IronPython.ClrTypeSeries"   
  _clrclassattribs = [make_cab(ObsoleteAttribute , "Warning Lark's Vomit")]

  # remaining Product class definition omitted for clarity

In make_cab, I build a tuple of CLR types from the list of positional arguments that was passed in. If you haven’t seen the *args syntax before, it works like C#’s params keyword – any extra arguments are passed into the function as a tuple named args. I use Python’s built in map function (FP FTW!) to build a tuple of CLR types of the provided arguments, which I then pass to GetConstructor. Previously, I passed an empty tuple to GetConstructor because I wanted the default constructor. If you don’t pass any positional arguments, you still get the default constructor. Once I’ve found the right constructor, I pass it and the original tuple of arguments to the CustomAttributeBuilder constructor.
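
To see the map step in isolation, here’s a rough plain-Python analog. It uses the built-in type function where make_cab uses clr.GetClrType(type(x)), so it runs without the CLR; arg_types is a hypothetical helper name:

```python
def arg_types(*args):
    # map applies type() to every positional argument;
    # tuple() turns the result into the tuple shape GetConstructor wants.
    return tuple(map(type, args))


with_args = arg_types("Warning Lark's Vomit", False)
no_args = arg_types()
```

with_args comes out as (str, bool), while no_args is the empty tuple. That empty tuple is why calling make_cab with no positional arguments still asks for the parameterless constructor.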

One major benefit of this approach is that it simplifies the metaclass code. Since _clrclassattribs is now a list of CustomAttributeBuilders, now I just need to iterate over that list and call SetCustomAttribute for each.

    if hasattr(cls, '_clrclassattribs'):
      for cab in cls._clrclassattribs:
        typebld.SetCustomAttribute(cab)

The only problem with this approach is that specifying the list of custom attributes is now extremely verbose. Not only am I specifying the full attribute class name as well as the positional arguments, I’m also having to insert a call to make_cab. Previously, it kinda looked like a C# custom attribute, albeit in the wrong place. Not anymore. So I decided to write a function called cab_builder to generate less verbose calls to make_cab:

def cab_builder(attrib_type):
  return lambda *args:make_cab(attrib_type, *args)

from System import ObsoleteAttribute 
Obsolete = cab_builder(ObsoleteAttribute)

class Product(object):
  __metaclass__ = ClrTypeMetaclass
  _clrnamespace = "DevHawk.IronPython.ClrTypeSeries"   
  _clrclassattribs = [Obsolete("Warning Lark's Vomit")]

  # remaining Product class definition omitted for clarity

The cab_builder function returns an anonymous lambda function that closes over the attrib_type variable. Python lambdas are just like C# lambdas, except that they only support expressions [1]. The results of calling the lambda returned from cab_builder is exactly the same as calling make_cab directly, but less verbose. And since I named the function returned from cab_builder Obsolete, now my list of class custom attributes looks exactly like it does in C# (though still in a different place). As usual, the code is up on my skydrive.
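
The closure trick works the same way in plain Python without any CLR types. In this sketch make_record and record_builder are made-up stand-ins for make_cab and cab_builder, and the functools.partial line shows a standard library alternative that I believe behaves the same for this purpose:

```python
from functools import partial


def make_record(kind, *args):
    # Stand-in for make_cab: just remember what was asked for.
    return (kind, args)


def record_builder(kind):
    # The lambda closes over kind, so each returned function
    # remembers the value it was created with.
    return lambda *args: make_record(kind, *args)


Obsolete = record_builder("ObsoleteAttribute")
ObsoletePartial = partial(make_record, "ObsoleteAttribute")

via_closure = Obsolete("Warning Lark's Vomit")
via_partial = ObsoletePartial("Warning Lark's Vomit")
direct = make_record("ObsoleteAttribute", "Warning Lark's Vomit")
```

All three results are equal, which is the whole point: the builder only saves typing, it doesn’t change what gets constructed.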

If you’re only using the attribute once like this, it is kind of annoying to first declare the cab_builder function. If you wanted to, you could iterate over the types in a given assembly, looking for ones that inherit from Attribute and generate the cab_builder call dynamically. However, I’m not sure how performant that would be. Another possibility would be to iterate over the types in a given assembly and generate a Python module on disk with the calls to cab_builder. Then, you’d just have to import this module of common attributes but still be able to include additional calls to cab_builder as needed.

[1] The lack of statement lambdas in Python is one of my few issues with the language.

Posted By Harry Pierson at 11:02 AM Pacific Daylight Time

Monday, June 15, 2009

__clrtype__ Metaclasses: Simple Custom Attributes

I know it’s been a while since my last __clrtype__ post, but I was blocked on some bug fixes that shipped as part of IronPython 2.6 Beta 1. So now let’s start looking at one of the most requested IronPython features – custom attributes!

Over the course of the next three blog posts, I’m going to build out a mechanism for specifying custom attributes on the CLR type we’re generating via __clrtype__. All the various Builder classes in System.Reflection.Emit support a SetCustomAttribute method that works basically the same way. There are two overloads – the one I’m going to use takes a single CustomAttributeBuilder as a parameter.

For this first post, I’m going to focus on the basic custom attribute infrastructure, so we’re going to use the extremely simple ObsoleteAttribute. While you can pass some arguments to the constructor, for this first post I’m going to use the parameterless constructor. To keep things less confusing, I’m going back to the original version of the Product class, before I introduced CLR fields and properties. The one change I’m making is that I’m adding a list of attributes I want to add to the class.

from System import ObsoleteAttribute 

class Product(object):
  __metaclass__ = ClrTypeMetaclass
  _clrnamespace = "DevHawk.IronPython.ClrTypeSeries"   
  _clrclassattribs = [ObsoleteAttribute]
  
  # remainder of class omitted for clarity

Python list literals use the same square bracket syntax as C# attributes, so it kinda looks right to someone with a C# eye – though having the attribute specifications inside the class, rather than above it, is totally different. I wish I could use Python’s class decorators for custom class attributes, but class decorators run after metaclasses so unfortunately that doesn’t work. Also, I can’t leave off the “Attribute” suffix like you can in C#. If I really wanted to, I could provide a new type name in the import statement (“from System import ObsoleteAttribute as Obsolete”) but I thought spelling it out was clearer for this post.

Now that I have specified the class attributes, I can update the metaclass __clrtype__ method to set the attribute on the generated CLR class:

    if hasattr(cls, '_clrclassattribs'):
      for attribtype in cls._clrclassattribs:
        ci = clr.GetClrType(attribtype).GetConstructor(())
        cab = CustomAttributeBuilder(ci, ())
        typebld.SetCustomAttribute(cab)

I’m simply iterating over the list of _clrclassattribs (if it exists), getting the default parameterless constructor for each attribute type, creating a CustomAttributeBuilder instance from that constructor and then calling SetCustomAttribute. Of course, this is very simple because we’re not supporting any custom arguments or setting of named properties. We’ll get to that in the next post. In the meantime, you can get the full code for this post from my skydrive.

There is one significant issue with this custom attribute code. Attributes are typically marked with the AttributeUsage attribute that specifies a set of constraints, such as the kind of targets a given attribute can be attached to and if it can be specified multiple times. For example, the MTAThread attribute can’t be specified multiple times and it can only be attached to methods. However, those attribute constraints are validated by the compiler, not the runtime. I haven’t written any code yet to validate those constraints, so you can specify invalid combinations like multiple MTAThread attributes on a class. For now, I’m just going to leave it to the developer not to specify invalid attribute combinations. Custom attributes are passive anyway so I figure no one will come looking for an MTAThread attribute on a class or other such scenarios.

However, I’m interested in your opinion: When we get to actually productizing a higher-level API for __clrtype__, what kinds of attribute validation should we do, if any?

Posted By Harry Pierson at 10:34 AM Pacific Daylight Time

Wednesday, May 20, 2009

IronPython 2.6 Beta 1

In addition to the IronPython CTP for .NET Framework 4.0 Beta 1 I blogged about earlier, we also released the first beta of IronPython 2.6 today. How about that – two IronPython releases in one day! This is our second preview release as we work towards our 2.6 RTM in September. 2.6 Alpha 1 was released back in March.

There are two big new features in this release. The first is our implementation of the ctypes module. The ctypes module is like P/Invoke for Python. It allows Python code to call into unmanaged DLL functions. Here, for example, I’m calling into the standard wprintf function from msvcrt.dll

IronPython 2.6 Beta 1 (2.6.0.10) on .NET 2.0.50727.4918
>>> import ctypes
>>> libc = ctypes.cdll.msvcrt
>>> ret = libc.wprintf("%s\n", "hello")
hello
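
For comparison, here’s a small ctypes sketch written against CPython rather than IronPython. The platform check and the choice of strlen are my assumptions for illustration: msvcrt only exists on Windows, and on POSIX systems ctypes.CDLL(None) exposes the C library symbols already loaded into the process. The post doesn’t say whether every call here works on the IronPython 2.6 Beta 1 implementation.

```python
import ctypes
import os

# Assumption: use msvcrt on Windows, otherwise the symbols of the running process.
libc = ctypes.cdll.msvcrt if os.name == "nt" else ctypes.CDLL(None)

# Declare the signature so ctypes converts the argument and result correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
```

Declaring argtypes and restype up front is optional for simple cases, but it keeps ctypes from guessing at the conversion.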

Between ctypes and Ironclad, I think we’ll eventually be able to load most native Python extensions in IronPython. Woot!

The other big new feature in this release is a real implementation of sys._getframe. _getframe lets you write code that inspects the Python callstack. Previously, we supported _getframe only with a depth of zero which is to say you could inspect the current frame, but no others. Now, by default we don’t implement _getframe at all unless you pass in –X:Frames or –X:FullFrames on the command line. Removing the version of _getframe that only worked for depth zero fixes an issue with collections.py that broke much of the 2.6 standard library in IronPython 2.6 Alpha 1.

The difference between Frames and FullFrames is in what is returned by frame.f_locals member. If you’re running with FullFrames, we hoist all local variables into the heap so they can be accessed by our frame walker. If you’re running with Frames, our ability to access locals up the stack is limited. Sometimes they are available - If you called locals() in a frame up the stack for example, then f_locals will be available – but usually not. There’s a performance difference between the default (i.e. no Frames), –X:Frames and –X:FullFrames, hence why we provide the user fine grained control over the Frame support.
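
If you haven’t used _getframe before, this is the kind of code it enables. The sketch is written against CPython, where frames are always available; caller_info and compute are hypothetical names. Going by the description above, on IronPython 2.6 Beta 1 you’d need –X:Frames or –X:FullFrames for this to run at all, and the locals would only be reliably populated with FullFrames.

```python
import sys


def caller_info():
    # Depth 0 is this function's own frame; depth 1 is whoever called it.
    frame = sys._getframe(1)
    # Copy f_locals so the result is an ordinary dict snapshot.
    return frame.f_code.co_name, dict(frame.f_locals)


def compute():
    answer = 42
    return caller_info()


name, local_vars = compute()
```

Here name is "compute" and local_vars contains the caller’s answer variable, which is exactly the sort of up-the-stack access that FullFrames has to hoist locals into the heap to support.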

Our performance has gotten better relative to 2.6 Alpha 1. Our PyStone numbers have improved 80% from Alpha 1, similar to where we were in IronPython 2.0.1. We’ve also been able to cut our startup time about 25% from 2.0.1. We’re still an order of magnitude slower than CPython on startup, but we’re getting better. We’re significantly worse on PyBench than we were in 2.6 Alpha 1, but that’s primarily because there’s now a second exception test. As I described back in March, we get killed on the exceptions benchmarks – the two combine to consume nearly 62% of our total run time. Ouch!

Finally, there are bug fixes. Of particular relevance to readers of this blog are a series of fixes that allow me to continue on with my __clrtype__ series. Watch for that soon.

As I said back when we released Alpha 1, the release cycle on 2.6 will be much shorter than it was for 2.0. 2.0 had eight alphas, five betas and two release candidates over the course of around twenty months. We expect 2.6 to have one alpha, two betas and a release candidate over eight months. So please start using the beta as soon as you can so you can give us your feedback and we can fix your bugs!

Posted By Harry Pierson at 5:30 PM Pacific Daylight Time

IronPython 2.6 CTP for .NET 4.0 Beta 1

The .NET Framework 4.0 and Visual Studio 2010 Beta 1 are now generally available for download. Jason Zander has a very thorough rundown on some of the new features in this release. Of course, my favorite new feature in VS2010 is the new dynamic language support in C# and Visual Basic, which lets you easily call out to IronPython code from those languages.

For anyone who wants to experiment with interoperating C# or VB with IronPython, we released IronPython 2.6 CTP for .NET 4.0 Beta 1 today. There’s also a walkthru showing how you can use the standard Python library module random from both C# and VB. Note, when I first posted this there was a URL bug in that walkthru – it linked to IronPython 2.6 Alpha 1 rather than the .NET 4.0 Beta 1 IronPython CTP. Make sure you pick up the right version of IronPython if you want to try out the walkthru. Update: Looks like they fixed the redirect in the walkthru.

FYI, this is a CTP quality release with about the same functionality as IronPython 2.6 Alpha 1.  Essentially, this is the version of IronPython that was in the source tree when the VS team branched for Beta 1.

If you’ve got any feedback, please drop us a line on the mailing list.

Posted By Harry Pierson at 11:54 AM Pacific Daylight Time

Thursday, May 07, 2009

Checkin Comments for IronPython Source

We’ve been slowly but surely increasing the frequency of IronPython source drops. When I joined the team last April, we were only pushing the source about twice a month (sometimes only once a month). By last July, we were pushing source about once a week. Since mid-January, we’ve pushed out the latest source 131 times, which comes to about once a day on average since the start of the year. Big kudos to Dave Fugate, who’s primarily responsible for improving the frequency of our source code drops.

However, while we’ve been good about source code drop frequency, we haven’t been good about transparency. All those source drops have the same less-than-useful checkin comment “Latest IP sources migrated to CodePlex TFS”. If you wanted to know what was changed in a given changeset, you had to do the diff yourself.

But all those opaque code changes are a thing of the past now. Dave upgraded our source push script so that it emails a list of changes as well as the checkin comments whenever we update the source on CodePlex. For example, check out the source push announcement for our latest source drop. Now we publish added, deleted and modified sources as well as the comments for any checkins included in the source drop.

As Dave said on the mailing list, please let us know if you have any feedback on these source update emails. I think they’re awesome (though I did have one small suggestion) but we want to know what you think.

Posted By Harry Pierson at 11:19 AM Pacific Daylight Time

Friday, April 24, 2009

__clrtype__ Metaclasses Demo: Silverlight Databinding

I’ve gotten to the point where I can actually demo something interesting with __clrtype__ metaclasses: Silverlight Databinding. This is a trivial sample, data binding a list of Products (aka the sample class I’ve been using all week) to a list box. But according to Jimmy, this is something he gets asked about on a regular basis and there’s an AgDLR bug open for this. The __clrtype__ feature is specific to IronPython but I bet the IronRuby guys could implement something similar if they wanted to.

When you install IronPython 2.6 (or 2.0.1 for that matter), it comes with the AgDLR bits in the Silverlight subfolder. This includes Silverlight compatible versions of the DLR and IronPython as well as the Silverlight DLR host and the development web server Chiron in the Silverlight\bin directory. There is also a script in the Silverlight\script directory that will generate a dynamic Silverlight application from a template. I ran “sl.bat python sldemo” in order to build the skeleton project.

In the generated app.xaml file, I removed the default text box and replaced it with this XAML code that I stole nearly-verbatim from my blog post on data binding in WPF with IronPython. The only thing I changed was the binding path for the text block (title became name).

    <ListBox x:Name="listbox1" > 
      <ListBox.ItemTemplate> 
        <DataTemplate> 
          <TextBlock Text="{Binding Path=name}" /> 
        </DataTemplate> 
      </ListBox.ItemTemplate> 
    </ListBox>

Then in the App class, I set the ItemsSource of the ListBox to a hand-built list of Products.

class App:
  def __init__(self):
    root = Application.Current.LoadRootVisual(UserControl(), "app.xaml")
    root.listbox1.ItemsSource = [
      Product("Crunchy Frog", 10, 12),
      Product("Rams Bladder Cup", 10, 12),
      Product("Cockroach Cluster", 10, 12),
      Product("Anthrax Ripple", 10, 12),
      Product("Spring Surprise", 10, 12)]

And that’s pretty much it. I used Chiron’s /z command to create a Silverlight XAP file, uploaded it to Silverlight Streaming and embedded it right here in this post. Code is up on my skydrive as well. Using Silverlight Streaming for this app was very easy - basically upload the XAP file to their server and embed some iframe code in this post via the source view and that was it. I’m not sure I would use it for a production app, but it rocked for hosting this demo.

The XAP is a big download for such a trivial app - about 1.3MB. The vast majority of that is the DLR and IronPython assemblies. The XAP would only be 2.9kB if it was just the Python, XAML and manifest files. This kinda stinks, but there’s a new transparent platform extensions feature in Silverlight 3 so we can at least break the DLR and IronPython DLLs out into their own separate XAPs. That way they only get downloaded once and cached in the browser instead of being included in every single IronPython Silverlight application anyone creates.

So that’s one scenario down, one to go. In order to be able to build WCF services in IronPython, I have to add a lot more infrastructure – notably emitting CLR methods that can invoke dynamic methods as well as emitting custom attributes. Invoking dynamic methods means understanding DLR binders, so look for more posts on __clrtype__ next week.

Posted By Harry Pierson at 2:27 PM Pacific Daylight Time

__clrtype__ Metaclasses: Adding CLR Properties

When I was first experimenting with __clrtype__, I got to the point of making CLR fields work and then immediately tried to do some data binding with Silverlight. Didn’t work. Turns out Silverlight can only data bind against properties – fields aren’t supported. So now let’s add basic property support to ClrTypeMetaclass. Python has a rich mechanism for defining properties, but hooking that up requires DLR binders so for now I’m going to generate properties that are simple wrappers around the associated fields.

There’s enough code involved in defining a property to break it out into its own method:

  @staticmethod
  def define_prop(typebld, name, fieldtype, fieldbld):
    attribs = ( MethodAttributes.Public 
              | MethodAttributes.SpecialName 
              | MethodAttributes.HideBySig)
    clrtype = clr.GetClrType(fieldtype)
    
    getbld = typebld.DefineMethod("get_" + name, attribs, clrtype, None)
    getilgen = getbld.GetILGenerator()
    getilgen.Emit(OpCodes.Ldarg_0)
    getilgen.Emit(OpCodes.Ldfld, fieldbld)
    getilgen.Emit(OpCodes.Ret)

    setbld = typebld.DefineMethod("set_" + name, attribs, None, (clrtype,))
    setilgen = setbld.GetILGenerator()
    setilgen.Emit(OpCodes.Ldarg_0)
    setilgen.Emit(OpCodes.Ldarg_1)
    setilgen.Emit(OpCodes.Stfld, fieldbld)
    setilgen.Emit(OpCodes.Ret)

    prpbld = typebld.DefineProperty(name, 
      PropertyAttributes.None, clrtype, None)
    prpbld.SetGetMethod(getbld)
    prpbld.SetSetMethod(setbld)

You provide define_prop the TypeBuilder for the Type being constructed, the name and type of the property as well as the FieldBuilder that gets returned from the call to DefineField. In the previous installment, I wasn’t bothering to save the FieldBuilder to a variable since I never used it again. Now, I’m stashing it away for the call to define_prop as I’ll show below.

For each field, we define a get method, a set method and a property. The get function first executes ldarg_0 to load the current object reference onto the execution stack, then it executes ldfld to load the specified field from the object onto the stack, then it returns. The set function executes ldarg_0 to load the current object reference and ldarg_1 to load the value passed as the first argument onto the execution stack, then it executes stfld to store the value in the specified field of the object. Once I have the two methods, I call DefineProperty to create the PropertyBuilder and then associate the get and set methods with that property.
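
If IL isn’t your thing, the emitted get and set methods amount to roughly the following plain-Python property. This is only an analogy, not what the metaclass generates: ProductSketch is a made-up class, and unlike the emitted CLR type it uses a differently named backing attribute (_name).

```python
class ProductSketch(object):
    def __init__(self, name):
        self._name = name

    def _get_name(self):
        # Getter: load the object reference, load the field, return it
        # (ldarg_0, ldfld, ret).
        return self._name

    def _set_name(self, value):
        # Setter: load the object reference and the value, then store the field
        # (ldarg_0, ldarg_1, stfld, ret).
        self._name = value

    # Tie the two methods together under one name, like DefineProperty
    # followed by SetGetMethod and SetSetMethod.
    name = property(_get_name, _set_name)


p = ProductSketch("Crunchy Frog")
before = p.name
p.name = "Spring Surprise"
```

After the assignment, both p.name and the backing p._name hold "Spring Surprise".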

As I said before, Reflection.Emit is straightforward though tedious. Honestly, I didn’t go thru the Emit docs to figure out what the methods should look like. Instead, I wrote a basic wrapper property in C# and looked at the generated IL in Reflector.

The only other change here is adding the call to define_prop on our first iteration thru the list of _clrfields. Since the rest of __clrtype__ is the same, here’s just that code snippet:

    if hasattr(cls, "_clrfields"):
      for fldname in cls._clrfields: 
        fieldtype = clr.GetClrType(cls._clrfields[fldname])
        fieldbld = typebld.DefineField(fldname, fieldtype, 
                             FieldAttributes.Public)
        ClrTypeMetaclass.define_prop(typebld, fldname, fieldtype, fieldbld)

As I said above, I simply save off the result of calling DefineField so I can pass it to define_prop. I also save off the field type in a variable since I use it more than once. Avoids the second dictionary lookup and is clearer to understand what the function does.

Accessing the CLR properties via reflection is pretty straightforward – not very different than reflecting over CLR fields. The only significant difference between them is that CLR properties can be indexable and fields can’t, so you have to pass an index parameter to GetValue and SetValue. These aren’t indexed properties, so I pass in None for the index parameter.

>>> p = Product("Crunchy Frog", 10, 12)
>>> pi = p.GetType().GetProperty("name")
>>> pi.GetValue(p, None)
'Crunchy Frog'
>>> pi.SetValue(p, "Spring Surprise", None)
>>> pi.GetValue(p, None)
'Spring Surprise'
>>> p.name
'Spring Surprise'

One quick aside about the CLR type I’m generating here. I’m fairly certain this reflected object wouldn’t pass muster with the C# compiler. I’m defining a field and a property with the same name. It clearly works at the IL level, but I’m not sure what the C# compiler would do if you tried to refer to a CLR type like this. I should probably be prepending an underscore or something on the field name, but then I wonder if the field should also be private. There’s a whole API design discussion down that road, but I’m not quite ready to have that yet so I’m just leaving the fields public and having fields and properties with the same name. Luckily, I’m never generating a CLR type on disk so you can’t build a C# project that refers to it anyway.

Posted By Harry Pierson at 1:47 PM Pacific Daylight Time

Thursday, April 23, 2009

__clrtype__ Metaclasses: Adding CLR Fields

Now that we have the basic __clrtype__ metaclass infrastructure in place, let’s enhance it to add support for CLR fields. To do this, we’re going to need to add two things to our custom CLR type. First, we need to define the fields themselves. Second, we need to make sure that Python code reads and writes to the statically typed fields for the specified names rather than storing them in the object dictionary as usual. Here’s the updated version of ClrTypeMetaclass (or you can get it from my skydrive).

class ClrTypeMetaclass(type):
  def __clrtype__(cls):
    baseType = super(ClrTypeMetaclass, cls).__clrtype__()
    typename = cls._clrnamespace + "." + cls.__name__ \
                 if hasattr(cls, "_clrnamespace") \
                 else cls.__name__
                 
    typegen = Snippets.Shared.DefineType(typename, baseType, True, False)
    typebld = typegen.TypeBuilder

    for ctor in baseType.GetConstructors(): 
      ctorparams = ctor.GetParameters()
      ctorbld = typebld.DefineConstructor(
                  ctor.Attributes,
                  ctor.CallingConvention,
                  tuple([p.ParameterType for p in ctorparams]))
      ilgen = ctorbld.GetILGenerator()
      ilgen.Emit(OpCodes.Ldarg, 0)
      for index in range(len(ctorparams)):
        ilgen.Emit(OpCodes.Ldarg, index + 1)
      ilgen.Emit(OpCodes.Call, ctor)
      ilgen.Emit(OpCodes.Ret)

    if hasattr(cls, "_clrfields"):
      for fldname in cls._clrfields: 
        typebld.DefineField(
          fldname, 
          clr.GetClrType(cls._clrfields[fldname]), 
          FieldAttributes.Public)
          
    new_type = typebld.CreateType()
    
    if hasattr(cls, "_clrfields"):
      for fldname in cls._clrfields: 
        fldinfo = new_type.GetField(fldname)
        setattr(cls, fldname, ReflectedField(fldinfo))
        
    return new_type

All the base type, type name, type builder and constructor code in the first half of the __clrtype__ method is the same as last time, so we’ll focus on the second half. After emitting the constructor(s), next we iterate thru a dictionary named _clrfields (if it exists in the class) that maps field names to types. For each of these dictionary entries, we emit a public field on the CLR type with the specified name and type.

The first time I tried this, I simply added the custom field generation code I just described and left it at that. Didn’t work. Python doesn’t look to store information in fields defined by the static type metadata unless explicitly instructed to. That’s why I need to iterate over the declared list of fields a second time after the type has been created. The first time creates the CLR fields, the second time inserts a ReflectedField instance into the class dictionary. ReflectedField is a Python descriptor that reads and writes the field value by calling GetValue and SetValue on the contained FieldInfo object. Python uses the same name resolution for fields as it does for methods (in Python, methods are fields that store callable objects) so when IronPython discovers the ReflectedField descriptor in the class instance, it uses that to get or store the value rather than sticking it in the local dictionary.
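
The descriptor behavior is standard Python, so you can see it without the CLR. In this sketch StoredElsewhere is a hypothetical descriptor standing in for ReflectedField: instead of calling GetValue and SetValue on a FieldInfo, it keeps values in its own dictionary keyed by object id, which is simplistic but enough to show where the value ends up.

```python
class StoredElsewhere(object):
    """Hypothetical data descriptor that stores values outside the instance."""

    def __init__(self):
        self._values = {}

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself
        return self._values[id(instance)]

    def __set__(self, instance, value):
        self._values[id(instance)] = value


class Item(object):
    name = StoredElsewhere()

    def __init__(self, name, cost):
        self.name = name  # routed through StoredElsewhere.__set__
        self.cost = cost  # no descriptor, so it lands in the instance dictionary


item = Item("Crunchy Frog", 5.99)
```

After construction, item.__dict__ contains cost but not name, even though item.name still returns "Crunchy Frog". Because the descriptor defines __set__, Python consults it before the instance dictionary.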

Now here’s the new version of the Product class, this time with CLR fields as well as a custom type name:

class Product(object):
  __metaclass__ = ClrTypeMetaclass
  _clrnamespace = "DevHawk.IronPython.ClrTypeSeries"   
  _clrfields = {
    "name":str,
    "cost":float,
    "quantity":int,
    }
    
  def __init__(self, name, cost, quantity):
    self.name = name
    self.cost = cost
    self.quantity = quantity
    
  def calc_total(self):
    return self.cost * self.quantity

As you can see, the only thing that’s changed is the addition of the _clrfields dictionary. But now, we can use reflection to get and set the Product fields, like so:

>>> p = Product("Crunchy Frog", 5.99, 10)
>>> t = p.GetType()
>>> p.name
'Crunchy Frog'
>>> namefi = t.GetField("name")
>>> namefi.GetValue(p)
'Crunchy Frog'
>>> namefi.SetValue(p, "Spring Surprise")
>>> p.name
'Spring Surprise'

This is great progress, but not enough to get us to our first “real” scenario: data binding in Silverlight. Silverlight only supports data binding against public properties, so I’ll need to wrap all these CLR fields in CLR properties in my next post.

Posted By Harry Pierson at 11:30 AM Pacific Daylight Time

Wednesday, April 22, 2009

__clrtype__ Metaclasses: Customizing the Type Name

Now that we know a little about how IronPython uses CLR types under the hood, let’s start customizing those types. In a nutshell, __clrtype__ metaclasses are metaclasses that implement a function named __clrtype__ that takes the Python class definition as a parameter and returns a System.Type. IronPython will then use the returned Type as the underlying CLR type whenever you create an instance of the Python class.

Technically, you could emit whatever custom CLR Type you want to in __clrtype__, but typically you’ll want to emit a class that both implements whatever static CLR metadata you need as well as the dynamic binding infrastructure that IronPython expects. The easiest way to do this is to ask IronPython to emit a type that handles all the dynamic typing and then inherit from that type to add the custom CLR metadata you want.

Let’s start simple and hello-worldly by just customizing the name of the generated CLR type that’s associated with the Python class. There’s a fair amount of boilerplate code that is needed even for this simple scenario, and I can build on that as we add features that actually do stuff. If you want to follow along at home, you’ll need IronPython 2.6 Alpha 1 (or later) and you can get this code from my SkyDrive.

class ClrTypeMetaclass(type):
  def __clrtype__(cls):
    baseType = super(ClrTypeMetaclass, cls).__clrtype__()
    typename = cls._clrnamespace + "." + cls.__name__ \
                 if hasattr(cls, "_clrnamespace") \
                 else cls.__name__
                 
    typegen = Snippets.Shared.DefineType(typename, baseType, True, False)
    typebld = typegen.TypeBuilder

    for ctor in baseType.GetConstructors(): 
      ctorparams = ctor.GetParameters()
      ctorbld = typebld.DefineConstructor(
                  ctor.Attributes,
                  ctor.CallingConvention,
                  tuple([p.ParameterType for p in ctorparams]))
      ilgen = ctorbld.GetILGenerator()
      ilgen.Emit(OpCodes.Ldarg, 0)
      for index in range(len(ctorparams)):
        ilgen.Emit(OpCodes.Ldarg, index + 1)
      ilgen.Emit(OpCodes.Call, ctor)
      ilgen.Emit(OpCodes.Ret)

    return typebld.CreateType()

Like all Python metaclasses, ClrTypeMetaclass inherits from the built-in Python type object. If I wanted to customize the Python class as well, I could implement __new__ on ClrTypeMetaclass, but I only care about customizing the CLR type so it only implements __clrtype__. If you want to know more about what you can do with Python metaclasses, check out Michael Foord’s Metaclasses in Five Minutes.

First off, I want to get IronPython to generate the base class that will implement all the typical Pythonic stuff like name resolution and dynamic method dispatch. To do that, I call __clrtype__ on the supertype of ClrTypeMetaclass – aka the built-in type object. That function returns the System.Type that IronPython would have used as the underlying CLR type for the Python class if we weren’t using __clrtype__ metaclasses.

Once I have the base class, next I figure out what the name of the generated CLR type will be. This is pretty simple: I just use the name of the Python class. To make this logic a little more interesting, I added support for a custom namespace. If the Python class has a _clrnamespace field, I prepend it to the class name, separated by a dot, as the custom namespace. I should probably be using a double underscore – i.e. __clrnamespace – but I didn’t want to wrestle with name mangling in this prototype code.
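The naming rule is easy to try out in plain Python, without any CLR involved. The sketch below pulls the logic into a standalone helper (clr_type_name is a name I made up for illustration, as are the Plain and Mangled classes) and also shows the name mangling issue that made me avoid the double underscore.

```python
def clr_type_name(cls):
    """Return 'namespace.ClassName' if the class declares _clrnamespace."""
    if hasattr(cls, "_clrnamespace"):
        return cls._clrnamespace + "." + cls.__name__
    return cls.__name__


class Product(object):
    _clrnamespace = "DevHawk.IronPython.ClrTypeSeries"


class Plain(object):
    pass


class Mangled(object):
    # Inside a class body, a leading double underscore triggers name mangling,
    # so this attribute is actually stored as _Mangled__clrnamespace.
    __clrnamespace = "Example"


product_name = clr_type_name(Product)
plain_name = clr_type_name(Plain)
mangled_visible = hasattr(Mangled, "__clrnamespace")
mangled_stored_as = hasattr(Mangled, "_Mangled__clrnamespace")
```

With a double underscore, the metaclass would have to look the attribute up under its mangled, per-class name, which is the wrestling I wanted to avoid.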

Now that I have a name and a base class, I can generate the class I’m going to use. I’m using the DefineType method on the DLR’s Microsoft.Scripting.Generation.Snippets class for three reasons. First, there’s a CLR bug that doesn’t let you create a dynamic assembly from a dynamic method. Second, reusing the snippets assembly avoids the overhead of generating a new assembly. Finally, the types in Snippets.Shared get saved to disk if you run with the -X:SaveAssemblies flag, so you can inspect the custom CLR type that gets generated. The DefineType function takes four parameters: the type name, the base class, a preserve name flag and a generate debug symbols flag. If you pass false for preserve name, you get a name like foobar$1 instead of just foobar. As for debug symbols, since I don’t have any source code that I’m generating IL from, emitting debug symbols doesn’t make a lot of sense. DefineType returns a TypeGen, but I only need the TypeBuilder.

The last thing I need to do is implement the custom CLR type constructor(s). Constructors on IronPython CLR types will always have at least one parameter – the PythonType (PythonType == IronPython’s implementation of Python’s built-in type object) that’s used for dynamic name resolution. I don’t want to add any custom functionality in my custom CLR type constructors, so I simply iterate thru the list of constructors on the base class and generate a constructor on the custom CLR type with a matching parameter list that calls the base class constructor.

Emitting the IL for each constructor and its call to the base class is straightforward, if tedious. I define the constructor with the same attributes, calling convention and parameters as the base class constructor. Then I emit IL to load the local instance (i.e. ldarg 0) and all the parameters onto the stack, call the base constructor and finally return. Once all the constructors are defined, I can create the type and return.

Using the ClrTypeMetaclass is very easy - simply specify the __metaclass__ field in a class. If you want to customize the namespace, specify the _clrnamespace field as well. Here’s an example:

class Product(object):
  __metaclass__ = ClrTypeMetaclass
  _clrnamespace = "DevHawk.IronPython.ClrTypeSeries"   
  
  def __init__(self, name, cost, quantity):
    self.name = name
    self.cost = cost
    self.quantity = quantity
  
  def calc_total(self):
    return self.cost * self.quantity

You can verify this code has custom CLR metadata by calling GetType on a Product instance and inspecting the result via standard reflection techniques.

>>> m = Product('Crunchy Frog', 10, 20)
>>> m.GetType().Name
'Product'
>>> m.GetType().FullName
'DevHawk.IronPython.ClrTypeSeries.Product'

Great, so now I have a custom CLR type for my Python class. Unfortunately, at this point it’s pretty useless. Next, I’m going to add instance fields to the CLR type.

Posted By Harry Pierson at 12:51 PM Pacific Daylight Time

Tuesday, April 21, 2009

__clrtype__ Metaclasses: IronPython Classes Under the Hood

Before we start using __clrtype__ metaclasses, we need to understand a bit about how IronPython maps between CLR types and Python classes. IronPython doesn’t support Reflection based APIs or custom attributes today because IronPython doesn’t emit a custom CLR type for every Python class. Instead, it typically shares a single CLR type across many Python classes. For example, all three of these Python classes share a single underlying CLR type.

class shop(object):
  pass 

class cheese_shop(shop):
  def have_cheese(self, cheese_type):
    return False

class argument_clinic(object):
  def is_right_room(self, room=12):
    return "I've told you once"

import clr
print clr.GetClrType(shop).FullName
print clr.GetClrType(cheese_shop).FullName
print clr.GetClrType(argument_clinic).FullName 

Even though cheese_shop inherits from shop and argument_clinic inherits from object, all three classes share the same underlying CLR type. On my machine, running IronPython 2.6 Alpha 1, that type is named “IronPython.NewTypes.System.Object_1$1”.

IronPython can share the CLR type across multiple Python classes because that CLR type has no code specific to a given Python class. CLR types are immutable – once you build a CLR type, you can’t do things like add new methods, remove existing methods or change the inheritance hierarchy. But all those things are legal to do in Python. Here, I’m creating an instance of the cheese_shop class, but then changing that instance to be an argument_clinic instance instead.

>>> cs = cheese_shop()

>>> cs.have_cheese("Venezuelan Beaver Cheese")
False
>>> cs.is_right_room(12)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'cheese_shop' object has no attribute 'is_right_room'

>>> # Change the object's class at runtime
>>> cs.__class__ = argument_clinic # don't try this in C#!

>>> cs.have_cheese("Venezuelan Beaver Cheese")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'argument_clinic' object has no attribute 'have_cheese'
>>> cs.is_right_room(12)
"I've told you once"

When you call a method on a Python object, the name is resolved by walking a series of dictionaries. First, the dictionary of the object itself is searched for the method name. Assuming the name isn’t in the object dictionary, Python then looks in the __class__ dictionary. If it’s not there, Python recursively looks through the base classes stored in the __bases__ tuple until it finds the method or the name fails to resolve. If we re-assign __class__ at run time, we change the dictionary Python uses to resolve method names.
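Here’s a small sketch of that lookup order in plain Python. The Shop and CheeseShop classes below are throwaway examples for illustration; each step adds the same name one level closer to the object and shows that the closer definition wins.

```python
class Shop(object):
    def greet(self):
        return "from Shop"


class CheeseShop(Shop):
    pass


cs = CheeseShop()

# 1. The name isn't in the instance dictionary or in CheeseShop's dictionary,
#    so the lookup falls through to the base class Shop.
found_on_base = cs.greet()

# 2. Adding the name to the class dictionary shadows the base class.
CheeseShop.greet = lambda self: "from CheeseShop"
found_on_class = cs.greet()

# 3. Adding the name to the instance dictionary shadows the class.
cs.greet = lambda: "from instance"
found_on_instance = cs.greet()
```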

There are cases where IronPython generates a new underlying CLR type. For example, if you build a Python class that inherits from a CLR type, then IronPython will have to generate a new underlying CLR type that inherits from the CLR type in order to remain compatible. IronPython automatically overrides all the virtual methods of the base type, implementing the same dynamic method dispatch that I described above. This lets you pass the IronPython class wherever the base CLR type is expected.

The ability to swap Python classes at runtime depends on having the same underlying CLR type. If the underlying CLR type doesn’t match, then assigning a new value to the __class__ field of an object will fail. This applies both to IronPython classes that inherit from CLR types and to __clrtype__ metaclass types. In the code I’ll be blogging, I always generate a unique CLR type for every Python class, which means that I can’t dynamically retype the object. Given that the point of __clrtype__ metaclasses is to generate static type information, this hardly seems like a limitation. However, it’s something to be aware of as we explore the __clrtype__ feature.

Posted By Harry Pierson at 10:59 AM Pacific Daylight Time

Monday, April 20, 2009

Introducing __clrtype__ Metaclasses

Everyone knows Anders announced at PDC08 that C# 4.0 will include new features (aka the dynamic keyword + the DLR) that make it much easier for C# to call into dynamically typed code. What you probably don’t know is that IronPython 2.6 includes a new feature that makes it easier for IronPython code to be called by statically typed code.

While the vast majority of .NET is available to IronPython, there are certain APIs that just don’t work with dynamic code. In particular, any code that uses Reflection over an object’s CLR type metadata won’t work with IronPython. For example, while WPF supports ICustomTypeDescriptor, Silverlight only supports data binding against reflectable properties. Furthermore, any code that uses custom attributes inherently uses Reflection. For example, Darrel Hawley recently blogged a WCF host he wrote in IronPython, but he wrote the WCF service in C#. You can’t write WCF services in IronPython because WCF expects service classes to be adorned with ServiceContract and OperationContract attributes (among many others). IronPython users want to use these APIs. Support for custom attributes is one of the most common requests we get - it’s currently the 5th highest vote getter among open issues.

In IronPython 2.6, we’re adding the ability to customize the CLR type of Python classes. This means you can add custom attributes, emit properties, whatever you want. For those of you who’ve been dreaming of implementing WCF services or databinding in Silverlight purely in IronPython, this is the feature for you.

In a nutshell, IronPython 2.6 extends Python’s metaclass feature, which lets you customize the creation of classes. In the metaclass, you can implement an IronPython-specific method __clrtype__ which returns a custom System.Type of your own creation that IronPython will then use as the underlying CLR type of the Python class. Implementing __clrtype__ gives you the chance to implement whatever reflectable metadata you need: constructors, fields, properties, methods, events, custom attributes, nested classes, whatever.

Over a series of posts, I’ll be demonstrating this new feature and implementing some commonly requested scenarios – including Silverlight databinding and WCF services – purely in Python. Quick warning: __clrtype__ uses low level features like Python metaclasses, Reflection.Emit and DLR Binders so these posts will be deeper technically than usual. Don’t worry – this isn’t the API we expect everyone to use. Eventually, we want to have an easy to use API that will sit on top of the low-level __clrtype__ hook.

Posted By Harry Pierson at 10:17 AM Pacific Daylight Time

Wednesday, April 08, 2009

Writing an IronPython Debugger: Breakpoint Management

Setting a breakpoint was the second feature I implemented in ipydbg. While setting a breakpoint on the first line of the Python file being run is convenient, it was obviously necessary to provide the user a mechanism to create their own breakpoints, as well as enable and disable existing breakpoints.

First thing I had to do was to refactor the create_breakpoint method. Originally, I was searching thru the symbol documents looking for the one that matched the filename in OnUpdateModuleSymbols. However, since I wanted to specify new breakpoints via the same filename/line number combination, it made more sense to move the symbol document logic into create_breakpoint:

def create_breakpoint(module, filename, linenum):
    reader = module.SymbolReader
    if reader == None:
      return None
    
    # currently, I'm only comparing filenames. This algorithm may need
    # to get more sophisticated to support differentiating files with the
    # same name in different paths
    filename = Path.GetFileName(filename)
    for doc in reader.GetDocuments():
      if str.Compare(filename, Path.GetFileName(doc.URL), True) == 0:
        linenum = doc.FindClosestLine(linenum)
        method = reader.GetMethodFromDocumentPosition(doc, linenum, 0)
        function = module.GetFunctionFromToken(method.Token.GetToken())
        
        for sp in get_sequence_points(method):
          if sp.doc.URL == doc.URL and sp.start_line == linenum:
            return function.ILCode.CreateBreakpoint(sp.offset)
        
        return function.CreateBreakpoint()

The new version isn’t much different than the old. It loops thru the symbol documents looking for one that matches the filename argument. Then it creates the breakpoint the same way it did before. Eventually, I’m going to need a better algorithm than “only compare filenames”, but it works for now.
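To show why “only compare filenames” will eventually need replacing, here’s a rough equivalent of the comparison in plain Python. The same_file_name helper is a hypothetical name for illustration, and I’m using the standard library’s ntpath module in place of Path.GetFileName and the case-insensitive str.Compare call; it is a sketch of the idea, not the ipydbg code.

```python
import ntpath


def same_file_name(path_a, path_b):
    """Compare only the file name portion, ignoring case and directories."""
    return ntpath.basename(path_a).lower() == ntpath.basename(path_b).lower()


# A bare file name matches a full path to a file with that name.
match = same_file_name(r"C:\src\app\Main.py", "main.py")

# The weakness: two different files with the same name also "match".
collision = same_file_name(r"C:\src\app\util.py", r"C:\src\lib\util.py")
```

The second comparison is the case the comment in create_breakpoint warns about: files with the same name in different paths are indistinguishable.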

Once I made this change, it was trivial to implement a breakpoint add command. What was harder was deciding on the right user experience for this. I decided that breakpoint management was going to be the first multi-key command in ipydbg, so all the breakpoint commands are prefixed with a “b”. I use the same command routing decorator I used for input commands. As you can see, my breakpoint command looks a lot like my top level input method – read a key from the console then dispatch it via a commands dictionary that gets populated by @inputcmd decorators.

@inputcmd(_inputcmds, ConsoleKey.B)
def _input_breakpoint(self, keyinfo):
    keyinfo2 = Console.ReadKey()
    if keyinfo2.Key in IPyDebugProcess._breakpointcmds:
        return IPyDebugProcess._breakpointcmds[keyinfo2.Key](self, keyinfo2)
    else:
        print "\nInvalid breakpoint command", str(keyinfo2.Key)
        return False

Currently, there are four breakpoint commands: “a” for add, “l” for list, “e” for enable and “d” for disable. List is by far the simplest.

@inputcmd(_breakpointcmds, ConsoleKey.L)
def _bp_list(self, keyinfo):
  print "\nList Breakpoints"   
  for i, bp in enumerate(self.breakpoints): 
    sp = get_location(bp.Function, bp.Offset)
    state = "Active" if bp.IsActive else "Inactive"
    print "  %d%s:%d %s" % (i+1, sp.doc.URL, sp.start_line, state)
  return False

As you can see, I’m keeping a list of breakpoints in my IPyDebugProcess class. Originally, I used the AppDomain.Breakpoints list, but that only returns enabled breakpoints so I was forced to store my own list. Note also that I’m using the enumerate function, which yields a tuple of the zero-based index and the item for each element of the collection. I do this so I can refer to breakpoints by number when enabling or disabling them:

@inputcmd(_breakpointcmds, ConsoleKey.E)
def _bp_enable(self, keyinfo):
  self._set_bp_status(True)
  
@inputcmd(_breakpointcmds, ConsoleKey.D)
def _bp_disable(self, keyinfo):
  self._set_bp_status(False)

def _set_bp_status(self, activate):
  stat = "Enable" if activate else "Disable"
  try:
    bp_num = int(Console.ReadLine())
    for i, bp in enumerate(self.breakpoints): 
      if i+1 == bp_num:
        bp.Activate(activate)
        print "\nBreakpoint %d %sd" % (bp_num, stat)
        return False
    raise Exception("Breakpoint %d not found" % bp_num)

  except Exception, msg:
    with CC.Red: print "%s breakpoint Failed %s" % (stat, msg)

Since the code was identical, except for the value passed to bp.Activate, I factored the code into a separate _set_bp_status method. After the user presses ‘b’ and then either ‘e’ or ‘d’, they then type the number of the breakpoint provided by the breakpoint list command. _set_bp_status then simply iterates thru the list until it finds the matching breakpoint and calls Activate. Note that since it’s possible to have 10 or more breakpoints, I’m using ReadLine instead of ReadKey, meaning you have to hit return after you type in the breakpoint number.
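The lookup by number is easy to isolate. Here’s a minimal sketch in plain Python, using strings as placeholder values instead of real breakpoint objects; find_by_number is a hypothetical helper for illustration, not a function in ipydbg.

```python
def find_by_number(items, number):
    """Return the item whose 1-based position equals number, or None."""
    for i, item in enumerate(items):
        if i + 1 == number:         # enumerate counts from 0, users count from 1
            return item
    return None


breakpoints = ["bp_a", "bp_b", "bp_c"]   # placeholder values, not real breakpoints
pairs = list(enumerate(breakpoints))
second = find_by_number(breakpoints, 2)
missing = find_by_number(breakpoints, 10)
```

The i + 1 adjustment is what keeps the numbers shown by the list command and the numbers typed by the user in sync.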

Finally, I need a way to create new breakpoints. With the refactoring of create_breakpoint, this is pretty straightforward:

@inputcmd(_breakpointcmds, ConsoleKey.A)
def _bp_add(self, keyinfo):
  try:
    args = Console.ReadLine().Trim().split(':')
    if len(args) != 2: raise Exception("Only pass two arguments")
    linenum = int(args[1])

    for assm in self.active_appdomain.Assemblies:
      for mod in assm.Modules:
          bp = create_breakpoint(mod, args[0], linenum)
          if bp != None:
            self.breakpoints.append(bp)
            bp.Activate(True)
            Console.WriteLine("Breakpoint set")
            return False
    raise Exception("Couldn't find %s:%d" % (args[0], linenum))

  except Exception, msg:
    with CC.Red:
      print "Add breakpoint failed", msg

Most of _bp_add is processing the input arguments, looping through the modules and then storing the breakpoint that gets returned. When I set the initial breakpoint inside OnUpdateModuleSymbols, I have the module with updated symbols as an event argument. However, in the more general case we’ve got no way of knowing which module of the current app domain contains the filename in question. So we loop thru all the modules, calling create_breakpoint on each until one returns a non-null value. Of course, “all the modules” will include the IronPython implementation, but assuming you’re running against released bits the call to create_breakpoint will return right away if debug symbols aren’t available.
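The argument processing can be tried on its own in plain Python. The sketch below is a variation, not what _bp_add does: parse_breakpoint_spec is a hypothetical helper, and it splits on the last colon only (an assumption on my part) so that a path containing a drive letter would still parse, whereas _bp_add splits on every colon and insists on exactly two parts.

```python
def parse_breakpoint_spec(text):
    """Split 'filename:linenum' into (filename, linenum).

    Splits on the last colon only, so a drive letter such as 'C:' stays
    part of the file name. Raises ValueError on malformed input.
    """
    filename, sep, line_text = text.strip().rpartition(":")
    if not sep or not filename:
        raise ValueError("expected filename:linenum")
    return filename, int(line_text)


simple = parse_breakpoint_spec("test.py:12")
with_drive = parse_breakpoint_spec(r"C:\src\test.py:7")
```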

As usual, the latest version is up on GitHub. This will be the last update to ipydbg for a little while. I worked on it quite a bit while I was at PyCon and have been busy with other things since I got home. Don’t worry, I’ll come back to it soon enough. As I mentioned Monday, I want to get function evaluation working so I can have a REPL console running in the target process instead of the one I’ve got currently running in the debugger process.

Posted By Harry Pierson at 2:45 PM Pacific Daylight Time

Monday, April 06, 2009

Pygments for WL Writer v1.0.1

I just replaced the original v1.0.0 Pygments for WL Writer installer with a new and improved v1.0.1. The original URL still works – I archived the old version off with a new name. Updated source is available on GitHub.

The only change is that I now override OnSelectedContentChanged in the sidebar control. That way, if I have multiple blocks of pygmented code in a given post, the sidebar UI updates with the correct language and color scheme of the currently selected code block.

Posted By Harry Pierson at 12:57 PM Pacific Daylight Time

Writing an IronPython Debugger: REPL Console

While I was banging my head against a wall experimenting with understanding how CorValue extraction worked, I found myself wanting to dink around with the debugger objects in a REPL console. One of IronPython’s core strengths is support for “exploratory programming” via the REPL. It turned out bringing a REPL to ipydbg was quite simple.

Python includes two built-in features that make a DIY REPL quite easy: compile and exec (though technically, exec is a statement, not a function). As you might assume from their names, compile converts a string into what Python calls a code object while exec executes a code object in a given scope. Technically, exec can accept a string so I could get by without using compile. However, if you’re compiling a single interactive statement, compile can automatically insert a print statement if you’ve passed in an expression. In other words, if you type in “2+2” on the console it will print “4”, which is the behavior I wanted.
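The “single” behavior is easy to see in isolation. The sketch below is written for current CPython 3, where exec is a function rather than the Python 2 statement that ipydbg uses, and run_single is a hypothetical helper of my own that captures whatever the executed code prints so the difference between an expression and a statement is visible.

```python
import contextlib
import io


def run_single(source, global_scope, local_scope):
    """Compile source in 'single' mode, execute it, and return what it printed."""
    buf = io.StringIO()
    code = compile(source, "<input>", "single")
    with contextlib.redirect_stdout(buf):
        exec(code, global_scope, local_scope)
    return buf.getvalue()


scope = {}
printed_expr = run_single("2+2\n", {}, scope)          # expression: value is echoed
printed_stmt = run_single("x = 40 + 2\n", {}, scope)   # statement: nothing is echoed
```

Compiling the same expression with the “exec” kind instead would evaluate it and silently throw the result away.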

Here’s what my REPL console code looks like. I love that it’s only 20 lines of code.

@inputcmd(_inputcmds, ConsoleKey.R)
def _input_repl_cmd(self, keyinfo):
  with CC.Gray:
    print "\nREPL Console\nPress Ctl-Z to Exit"
    cmd = ""
    _locals = {'self': self}

    while True:
      Console.Write(">>>" if not cmd else "...")
      
      line = Console.ReadLine()
      if line == None:
        break
      
      if line:
        cmd = cmd + line + "\n"
      else:
        try:
          if len(cmd) > 0:
            exec compile(cmd, "<input>", "single") in globals(), _locals
        except Exception, ex:
          with CC.Red: print type(ex), ex
        cmd = ""

It’s pretty straightforward. I set up a dictionary to act as the local variable scope for the code that gets executed. I’m just reusing the current global scope, but I want the local scope to start with only the reference to the current IPyDebugProcess instance which is passed into _input_repl_cmd as “self”. All the other local variables like cmd and line won’t be available to the REPL code. Then I drop into a loop where I read lines from the console and execute them.

In order to support multi-line statements, I build up the cmd variable over multiple line inputs and I don’t execute it until the user inputs an empty line. In the standard Python console, it can recognize single line statements and execute them immediately. Dino showed me how to use the IronPython parser to do the same thing, but I haven’t implemented that in ipydbg yet. To exit the REPL loop, you type Ctl-Z, which returns None (aka null) from ReadLine instead of the empty string.

Since I never execute the code more than once, I have my exec and compile statements together on a single line. Compile takes the string to be compiled, the name of the file it came from (I’m using <input> for this) and the kind of code. Passing in “single” for the kind of code adds the auto-expression-print functionality I mentioned above. Then I exec the code object that’s returned in the specified scope I’m managing for this instance of the REPL loop. If you exit out of the REPL and re-enter it, you get a fresh new copy of the local scope so any functions or variables you defined in the last REPL are gone.

Runtime execution of code into a given scope is a hallmark of dynamic languages, but I’m still fairly green when it comes to Python so it took me a while to figure this out. Python code executes in a given scope, a combination of global and local variables. When you’re in the ipy.exe REPL, you’re at top level scope anyway, so global and local scope are the same – if you add something to global scope, it shows up in local scope and vice versa. Inside a function, you’ll have the same global scope, but the local scope will be different and changes to one won’t be reflected in the other. The ipydbg REPL isn’t a function per se, but it does provide an explicit local scope that gets disposed when you exit the REPL.
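Here’s a small sketch of that scoping behavior, again using CPython 3’s exec function rather than the Python 2 statement. The open_session helper and the variable names are made up for illustration; each call stands in for entering the REPL with a fresh local scope while the global scope is shared.

```python
global_scope = {"shared": "visible everywhere"}


def open_session():
    """Return a fresh local scope, like re-entering the REPL."""
    return {}


first = open_session()
exec("answer = 42", global_scope, first)       # assignment lands in the local scope
exec("seen = shared", global_scope, first)     # globals are still readable

second = open_session()                        # new session: 'answer' is gone
answer_survived = "answer" in second
answer_leaked_to_globals = "answer" in global_scope
```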

While having a debugger REPL is really convenient for prototyping new ipydbg commands, it’ll really shine once I get function evaluation working. Then I’ll be able to open a REPL console where the commands are executed in the target process instead of the debugger process as they are now. That will be very cool. Until then, the latest code is – as always – up on GitHub.

Posted By Harry Pierson at 12:07 PM Pacific Daylight Time

Writing an IronPython Debugger: Getting Arguments

It’s a small update, but I added support for displaying method arguments alongside the local variables. As I mentioned in that post, breaking out the CorValue extraction and display code into a shared function was a good idea – adding support for getting arguments was trivial since I could reuse that code.

Because there’s no hierarchy of scopes to deal with and the names are in the metadata instead of debug symbols, getting arguments is much easier than getting local variables.

def get_arguments(frame): 
    mi = frame.GetMethodInfo() 
    for pi in mi.GetParameters(): 
      if pi.Position == 0: continue
      arg = frame.GetArgument(pi.Position - 1)
      yield pi.Name, arg

You’ll notice that I’m yielding the arguments as a tuple of the name and value, the same as get_locals yields. I did refactor get_locals a bit – there’s no longer an argument to skip hidden variables (though get_locals still skips dynamic call site caches as it did before). Now, it’s up to the caller of get_arguments and get_locals to filter hidden variables as they see fit.

Because get_locals and get_arguments yield the same types, I was able to factor the code to print a value and loop through the collection of values into separate local functions.

@inputcmd(_inputcmds, ConsoleKey.L)  
def _input_locals_cmd(self, keyinfo):  
  def print_value(name, value):  
    display, type_name = display_value(extract_value(value))  
    with CC.Magenta: print "  ", name,   
    print display,  
    with CC.Green: print type_name  
      
  def print_all_values(f, show_hidden):  
      count = 0  
      for name,value in f(self.active_thread.ActiveFrame):  
        if name.startswith("$") and not show_hidden:
          continue  
        print_value(name, value)  
        count+=1          
      return count  
        
  print "\nLocals"  
  show_hidden = \
    (keyinfo.Modifiers & ConsoleModifiers.Alt) == ConsoleModifiers.Alt  
  count = print_all_values(get_locals, show_hidden)  
  count += print_all_values(get_arguments, show_hidden)  

  if count == 0:  
      with CC.Magenta: print "  No Locals Found"  

I really like the local functions feature of Python. In C#, you can define an anonymous delegate using the lambda syntax. But for a scenario like this, I like local functions better. However, I do like C#’s support for statement lambdas – Python only supports expression lambdas. So while I like local functions better in this scenario (because I’m using the method more than once) in something like an event handler, I like the statement lambda syntax better.
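To illustrate the trade-off, here’s a small sketch in plain Python. The summarize function and its data are made up for illustration; the local function holds a loop, a condition and an assignment, none of which would fit in a lambda, while the lambda is fine for a one-expression predicate.

```python
def summarize(names):
    # A local function can contain statements (a loop, an if, an assignment)...
    def count_matching(predicate):
        count = 0
        for name in names:
            if predicate(name):
                count += 1
        return count

    # ...while a lambda is limited to a single expression.
    is_hidden = lambda name: name.startswith("$")

    hidden = count_matching(is_hidden)
    visible = count_matching(lambda name: not is_hidden(name))
    return hidden, visible


result = summarize(["$site", "x", "y"])
```

Like print_all_values above, count_matching is used twice, which is the situation where a named local function earns its keep.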

As usual, the latest version of ipydbg is up on GitHub.

Posted By Harry Pierson at 9:46 AM Pacific Daylight Time

Sunday, April 05, 2009

Pygments for Windows Live Writer

For the past few years, I’ve used the CodeHTMLer plugin for Windows Live Writer for the code snippets in my blog. However, recently I discovered the Pygments Python syntax highlighter package which supports scores more languages than CodeHTMLer does. It also supports multiple color schemes and was easily extensible, so I could build an HTML formatter that didn’t use <pre> tags (which I’ve found DasBlog has issues with in the RSS feed, though honestly I’m running three minor releases behind the latest DasBlog release). IronPython supports Pygments just fine – at least, the one IPy bug that Pygments exposes has a simple workaround – so I set about building a Windows Live Writer plugin that uses it.

If you’re simply interested in the plugin itself, you can get it from my skydrive. The source is up on GitHub. For now, if you find any bugs, please leave a comment on this post. If there’s enough interest I’ll setup a site somewhere (CodePlex perhaps) where I can track bugs and feature requests.

Pygments for WL Writer is a smart content source. In WL Writer’s terminology, that means when you click inserted text in the editor window, it is treated as an atomic entity which you can then edit by using the Edit Code button in the Pygments for WL Writer sidebar editor. I often found that I would edit my code multiple times – usually to shorten lines so they’d fit on my blog without wrapping. CodeHTMLer for WL Writer is a standard content source, so it just spews the formatted code as HTML onto the page.

From an IronPython perspective, there’s some interesting stuff there. I decided to compile the pygments library into a DLL for easier distribution. If you look in the source, there’s a folder for the Pygments source as well as the parts of the standard Python library that Pygments depends on and my custom HTML formatter. Those all get compiled via a custom script which can be called by the build.bat file in the project root.

Some features I’m thinking about adding:

  • An extensibility model so that you can add new languages by dropping new Pygments lexers into the same folder the plugin is installed to. Pygments supports lots of languages, but not all of them – notably it’s missing Powershell and F#.
  • Support for new HTML formatters and color schemes using the same extensibility mechanism described above.
  • Support for selecting an HTML formatter.
  • Improving the code editor window. Currently, I’m using a standard WinForms multi-line TextBox, but that leaves a lot to be desired. With the Python work I do, I often need to be able to select a bunch of text and change its indenting via tab and shift-tab. If anyone has a suggestion for a good WinForms text editing control, let me know.
  • Being able to specify the font and size of the Pygmented code.
  • Storing user preferences – remembering the most recent syntax and color scheme the user used.

Feedback, as always, is appreciated. I’ll probably write a few posts about the project when I get a chance, so let me know if there’s anything you’re dying to hear about.

Posted By Harry Pierson at 12:28 PM Pacific Daylight Time

Wednesday, April 01, 2009

Writing an IronPython Debugger: Command Routing

At this point, ipydbg supports seven commands: Continue, Quit, Show Stack Trace, Show Locals, Step Over, Step In, and Step Out. All these commands are invoked by a single keystroke. I’m using Console.ReadKey in an attempt to cut down on the number of keystrokes needed for interacting with the debugger. If I only type ‘s’ instead of ‘s <enter>’ to step, I figure I’ll be twice as productive! :)

If I was writing ipydbg in C#, I could use a switch statement to dispatch commands in the _input method based on user keystrokes. However, Python doesn’t have a switch statement so I’ve been using a cascading set of if/elif/else statements instead. When you get up to seven if/elif clauses plus an else clause, the code smell is pretty overwhelming.

# Only has three if/elif clauses, but it's already a little smelly
val = Console.ReadKey()   
if val.Key == 'a':
  result = 'a'
elif val.Key == 'b':
  result = 'b'
elif val.Key == 'c':
  result = 'c'
else:
  print "unknown key"

Python might not have a switch statement, but it does have first-class functions so you can get the effects of a switch by using a dictionary.

def do_a():    
  return 'a'   
def do_b():    
  return 'b'   
def do_c():    
  return 'c'   
_switch = {'a':do_a, 'b':do_b, 'c':do_c}    

val = Console.ReadKey()    
if val.Key in _switch:
  result = _switch[val.Key]()
else:
  print "unknown key"

I like this approach much better. Individual if/elif blocks are now broken out into separate functions, which smells better than embedding them in one big function. Also, I like that my pseudo-switch statement is completely separate from how the _switch dictionary is initialized. However, this approach also separates the pseudo-case statement functions from the _switch dictionary as well. That’s not a good thing. You can easily imagine screwing up by adding a new function but forgetting to manually update the _switch dictionary.

What I need is a way to declaratively associate each switch function with its dictionary lookup key. Luckily, Python decorators provide a very clean way to do this.

_switch = {}       

@inputcmd(_switch, 'a')
def do_a():     
  return 'a'    
@inputcmd(_switch, 'b')
def do_b():     
  return 'b'    
@inputcmd(_switch, 'c')
def do_c():     
  return 'c'    

val = Console.ReadKey()     
if val.Key in _switch:
  result = _switch[val.Key]()
else:
  print "unknown key"

I’ve blogged about decorators before when I wanted to automatically invoke operations on the right thread in my WPF photo viewing app. The @inputcmd decorator is a bit more complicated than the @BGThread and @UIThread decorators since the @inputcmd decorator accepts arguments. Each of the @inputcmd decorators in the code above is equivalent to this code:

def do_a():      
  return 'a'

_tmp = inputcmd(_switch, 'a')
do_a = _tmp(do_a)

As you can see, the inputcmd function returns the decorator that wraps do_a, rather than being the decorator itself. This function that returns a function that returns a function is kinda confusing at first. But this approach allows you to configure the decorator for a specific purpose via the arguments – in this case, specifying which dictionary and which console key this function is associated with.

Also unlike @BGThread and @UIThread, I don’t actually want to modify the behavior of the methods decorated with @inputcmd. I only want to store a reference to them in the passed in dictionary. So implementing this decorator is very easy:

def inputcmd(cmddict, key):
    def deco(f):
        cmddict[key] = f
        return f
    return deco

The decorator simply inserts the function into the passed-in dictionary using the passed-in key. It then returns the function as is, so it’s not really rebinding the symbol to a new method (technically, it’s rebinding the symbol to the same function it’s currently bound to). If I also wanted to wrap the passed-in function to provide additional functionality, I could do that with a second locally defined function inside the deco function.

The latest version of ipydbg has been refactored to use @inputcmd instead of a cascading set of if/elif statement blocks. Now that that’s done, I can start working on multi-key commands.

Posted By Harry Pierson at 1:55 PM Pacific Standard Time

Tuesday, March 31, 2009

DevHawk on CodeCast

Ken Levy used to work around the corner from my office, back in his days on the VSX team. These days, he’s hosting the CodeCast (among other things) and he dropped by my office a while back to chat about IronPython for his podcast.

Check it out.

Posted By Harry Pierson at 10:36 AM Pacific Standard Time

Writing an IronPython Debugger: Displaying Values

Now that I can get the local variables for a given frame, I need to display them in the console. Eventually, I’d like to provide the ability to update the local variables as well, but you gotta crawl before you can run. Luckily, the debugger API is consistent about using the same COM interfaces – wrapped by the managed CorValue class – to represent all data values, including local variables, function arguments and object fields. So the work I do now to display CorValues in the console will be reusable in other contexts down the road.

While the debugger API is consistent about how it represents values in the target process, the API it uses is very complicated. The primary COM interface for accessing values is ICorDebugValue, but it has eight siblings: ICorDebugReferenceValue, ICorDebugHandleValue, ICorDebugStringValue, ICorDebugObjectValue, ICorDebugGenericValue, ICorDebugBoxValue, ICorDebugArrayValue, ICorDebugHeapValue. All those COM interfaces are represented in managed code by CorValue and its subclasses.

Furthermore, and confusingly, ICorDebugValues have both a Type and an ExactType. ExactType is what .NET developers typically think of as the type, aka the CLR type. Well, the debugger API’s representation of the CLR type at any rate. You can retrieve the value’s metadata as a System.Type compatible object via value.ExactType.Class.GetTypeInfo(). CorValue’s Type property, on the other hand, represents the object’s primitive or element type. For example, instances of .NET classes have an element Type of ELEMENT_TYPE_CLASS. There is a collection of primitive types (boolean, char, ints of various signedness and size, floats of various sizes) as well as types you wouldn’t call primitive but that the runtime has specific knowledge of (string, array and value types - aka structs in C# terminology).

If you’re confused by all that, don’t worry, so am I. Honestly, I’ve re-written this code several times, each time understanding the API just a bit better. Whatever the *right* way to use the interfaces, I’m sure I don’t know it. For my first cut at this, I essentially ported MDbg’s high level CorValue API – aka MDbgValue::InternalGetValue if you’re looking at the MDbg source code – over to Python. Along the way, I’ve improved on that code as I’ll describe below.

A given CorValue may be a primitive value like an int or it may be a reference to or a boxed version of some other CorValue object. So in order to print the CorValue, you have to go thru a series of attempts to dereference and unbox until you get to the “real” underlying CorValue object. From there, converting the value to a string I can print depends on the value’s element type. For primitive types like ints and floats, you can call CastToGenericValue to get a CorGenericValue “view” of the same CorValue object [1]. A CorGenericValue can read and write the value’s raw bytes in the target process’s memory. The GetValue method reads the data from the target process, then does an unsafe cast to the appropriate managed type. For example, an ELEMENT_TYPE_R4 CorValue gets cast into a System.Single. For CorValue strings, I call CastToStringValue and then access the String property. For classes, value types and objects, there’s no simple or standard approach to retrieving the data, so for now I return the result of calling CastToObjectValue. Eventually, I’ll want to provide a mechanism to read the specific fields of a class or value type.

Unfortunately, the mechanism above to read primitive types doesn’t work with IronPython. GetValue needs to know the correct element type in order to do the unsafe cast. For value types (aka any struct other than the basic primitives), GetValue will return the data as a byte array. The problem is that when you box a primitive, the original element type gets overwritten by ELEMENT_TYPE_VALUETYPE. You can’t get the original element type back, even after unboxing. So for boxed primitives, you can only retrieve the data as a raw byte array or as a CorObjectValue, neither of which is very useful.

Luckily, I was able to work around this. Under the hood, GetValue calls UnsafeGetValueAsType to do the actual work of reading the data from the target process and casting it to the right managed type. UnsafeGetValueAsType accepts an element type value as a method parameter. If you know the right element type value, you can call UnsafeGetValueAsType directly instead of going thru GetValue. While boxing overwrites the original element type value, an unboxed CorValue still has the CLR type metadata available. So I was able to map CLR types to element types (e.g. System.Single –> ELEMENT_TYPE_R4) in order to retrieve the underlying value of boxed primitive types.

_type_map = { 'System.Boolean': ELEMENT_TYPE_BOOLEAN,   
  'System.SByte'  : ELEMENT_TYPE_I1, 'System.Byte'   : ELEMENT_TYPE_U1,   
  'System.Int16'  : ELEMENT_TYPE_I2, 'System.UInt16' : ELEMENT_TYPE_U2,   
  'System.Int32'  : ELEMENT_TYPE_I4, 'System.UInt32' : ELEMENT_TYPE_U4,   
  'System.IntPtr' : ELEMENT_TYPE_I,  'System.UIntPtr': ELEMENT_TYPE_U,  
  'System.Int64'  : ELEMENT_TYPE_I8, 'System.UInt64' : ELEMENT_TYPE_U8,   
  'System.Single' : ELEMENT_TYPE_R4, 'System.Double' : ELEMENT_TYPE_R8,   
  'System.Char'   : ELEMENT_TYPE_CHAR, }   
     
_generic_element_types = _type_map.values()   

class NullCorValue(object):  
  def __init__(self, typename):  
    self.typename = typename  

def extract_value(value):  
    rv = value.CastToReferenceValue()  
    if rv != None:  
      if rv.IsNull:   
        typename = rv.ExactType.Class.GetTypeInfo().Name  
        return NullCorValue(typename)  
      return extract_value(rv.Dereference())  
    bv = value.CastToBoxValue()  
    if bv != None:  
      return extract_value(bv.GetObject())   

    if value.Type in _generic_element_types:  
      return value.CastToGenericValue().GetValue()  
    elif value.Type == ELEMENT_TYPE_STRING:  
      return value.CastToStringValue().String  
    elif value.Type == ELEMENT_TYPE_VALUETYPE:  
      typename = value.ExactType.Class.GetTypeInfo().Name   
      if typename in _type_map:  
        gv = value.CastToGenericValue()  
        return gv.UnsafeGetValueAsType(_type_map[typename])  
      else:  
        return value.CastToObjectValue()  
    elif value.Type in [ELEMENT_TYPE_CLASS, ELEMENT_TYPE_OBJECT]:  
      return value.CastToObjectValue()  
    else:  
      msg = "CorValue type %s not supported" % str(value.Type)
      raise Exception(msg)

It’s kinda ugly code and I’m thinking that at least some of it really belongs in the CorValue C# classes rather than in ipydbg. However, I’m not that interested in doing the significant refactoring it would take to make the CorValue API developer-friendly, so I did it here.

One thing to note that I didn’t cover earlier is the NullCorValue object. For reference values, there’s an IsNull property that may be set. If it is set, I need a mechanism that indicates the null value but also includes the type information. So I created a custom type that stores the type name to represent null. Again, this is something that should be a part of the CorValue API.
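
The boxed-primitive workaround is easier to see outside the debugger API. The sketch below is plain Python rather than ipydbg code: the standard struct module stands in for UnsafeGetValueAsType, the _format_map table and read_boxed function are hypothetical names, and little-endian data is assumed. The point is only that the raw bytes are useless until the CLR type name says how to interpret them.

```python
import struct

# Hypothetical stand-in for _type_map: CLR type name -> struct format code.
# IntPtr/UIntPtr and Char are left out to keep the sketch short.
_format_map = {
    'System.Boolean': '?',
    'System.SByte': 'b', 'System.Byte': 'B',
    'System.Int16': 'h', 'System.UInt16': 'H',
    'System.Int32': 'i', 'System.UInt32': 'I',
    'System.Int64': 'q', 'System.UInt64': 'Q',
    'System.Single': 'f', 'System.Double': 'd',
}

def read_boxed(typename, raw):
    """Interpret raw little-endian bytes according to the CLR type name.

    Unknown type names fall back to returning the raw bytes, which mirrors
    the not-very-useful byte array described above.
    """
    fmt = _format_map.get(typename)
    if fmt is None:
        return raw
    return struct.unpack('<' + fmt, raw)[0]
```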

Once I have my extracted value, I need to display it in the console. This is much simpler than extracting the value. As I wrote above, I’m not making any attempt to print a real representation for CorObjectValues. I could look at making a ToString call to get something useful, but that requires invoking a function in the target process and I haven’t gotten that far with ipydbg yet. So I just print “<…>” if it isn’t a string, primitive or null value.

def display_value(value):
  if type(value) == str:
    return (('"%s"' % value), 'System.String')
  elif type(value) == CorObjectValue:
    return ("<...>", value.ExactType.Class.GetTypeInfo().FullName)
  elif type(value) == NullCorValue:
    return ("<None>", value.typename)
  else:
    return (str(value), value.GetType().FullName)

Now all I need is to iterate thru the list of local variables and call extract_value and display_value on each in turn and print the results. I won’t reproduce that code here, but you can see it in the ipydbg project source on GitHub.

I’m happy with what I’ve gotten working (it took several days of banging my head against the proverbial wall to get it this far) but there’s still room for improvement. First, I’d like to be able to call ToString to get a class-specific generic representation as I described above. Second, I need a way to display the fields of a CorObjectValue object. It’s just a combination of metadata reading and CorObjectValue::GetFieldValue, but that code won’t write itself. Finally, there are other Python primitives - like list, dictionary and tuple – that ipydbg should have specific knowledge of and be able to display without requiring the user to drill into the member variables and the like.


[1] While the CorValue API does certain things very well, I wish it did a better job abstracting away the existence of the various ICorDebugValue interfaces. Hence the need for all the calls to CastToWhatever().

Posted By Harry Pierson at 8:35 AM Pacific Standard Time

Friday, March 27, 2009

IronPython 2.6 Alpha 1

Just in time for PyCon, we just shipped the first alpha of IronPython 2.6. As you can guess from the version number, the main feature of this version of IronPython will be the new features introduced in Python 2.6. As you can see, we’ve synced version numbers between IronPython and Python. No more explaining which version of IPy goes with which version of Python.

In addition to the start of 2.6 support, the other big feature of IronPython 2.6 is something called Adaptive Compilation. IronPython’s performance is pretty good compared to CPython. We’re about 28% faster than CPython (IPy 2.0.1 vs. CPy 2.5) on PyStone and about 10% faster on PyBench if you exclude the TryRaiseExcept test. [1] However, our startup time is not very good. These two facts are related: it takes a long time on startup to compile the Python code to IL (which is then JITted from IL to native code), but once that’s done the code runs really fast. However, if you’re only going to execute a function a few times, it typically isn’t worth the overhead to compile the function to IL. The Adaptive Compilation feature is an interpreter for DLR trees. The first few times you run a given Python function, it gets interpreted. At some point, after you’ve called the function enough times, IronPython 2.6 decides to take the hit and compile the function. If you want to go back to the old “always compile to IL” model, you can pass –O on the command line.
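
The interpret-first, compile-later idea can be sketched in a few lines of ordinary Python. This only illustrates the policy and is not IronPython’s implementation: the adaptive function, the interpreted_add and compile_add stand-ins, and the threshold of 3 are all assumptions made for the example.

```python
def adaptive(slow, fast_factory, threshold=3):
    """Run `slow` for the first `threshold` calls, then switch to a
    "compiled" version built once by `fast_factory`.

    All names and the threshold are hypothetical; the real heuristics
    are not described here.
    """
    state = {'calls': 0, 'fast': None}

    def call(*args):
        if state['fast'] is not None:
            return state['fast'](*args)
        state['calls'] += 1
        if state['calls'] > threshold:
            # Pay the one-time compilation cost now that the function is hot.
            state['fast'] = fast_factory()
            return state['fast'](*args)
        return slow(*args)

    call.state = state
    return call

compiles = []

def interpreted_add(a, b):
    return a + b              # stands in for interpreting the DLR tree

def compile_add():
    compiles.append('add')    # stands in for emitting and JIT-compiling IL
    return lambda a, b: a + b

add = adaptive(interpreted_add, compile_add)
results = [add(i, 1) for i in range(5)]
```

A function that is only called once or twice never reaches compile_add, which is the startup-time saving the feature is after.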

This is our first alpha of 2.6, and some things are kinda broken. In particular, there was a change to collections.py that breaks much of the Python Standard Library under IronPython. Dave has the details and the workaround. Rest assured, this will get fixed before we release. Dino is hard at work making _getframe work for depths greater than zero. Because it will have some perf impact, it won’t be enabled by default – you’ll have to pass a command-line parameter to enable it. But if you have to opt in to _getframe support for depth > 0, it makes sense to opt in to _getframe support entirely and do away with the current _getframe(0) only support. What’s nice about this approach is that it will work with collections.py regardless of whether you opt in to _getframe.

As stated in the release notes, the release cycle on 2.6 will be much shorter than 2.0. There were only seven months between 1.0 and 1.1, and we’re shooting for a slightly longer timeframe for 2.6. Certainly not like the twenty months that passed between 1.1 and 2.0. So please start trying it out as soon as you can and give us your feedback.


[1] IPy is over 4000% slower than CPy on TryRaiseExcept, 58,234 ms vs. 1,286 ms. This one test represents 44% of our overall test run time and causes IPy to run PyBench 57% slower than CPy instead of 10% faster. Python has a different philosophy on exceptions than the CLR does. Several Python exceptions like GeneratorExit and StopIteration are explicitly documented as “not considered an error”. This is very different from the CLR’s approach. At some point, we’re going to have to look at improving exception performance, but it’s not really a priority for the 2.6 release.

Posted By Harry Pierson at 9:20 AM Pacific Standard Time

Wednesday, March 25, 2009

Writing an IronPython Debugger: Getting Local Variables

I just pushed out a new drop of ipydbg that includes the first cut of support for showing local variables. Getting the value for a local variable is actually pretty simple. The CorFrame object (which hangs off active_thread) includes a method to get a local variable by index as well as getting a count of all local variables. The problem with these functions is that they don’t provide the name of the variable. For that, you’ve got to look in debug symbols.

From a CorFrame, you can retrieve the associated CorFunction. Since I added symbol reader support to CorModule, I added support for directly retrieving the ISymbolMethod for a CorFunction. From the method symbols, I can get the root lexical scope of the method. And from the symbol scope, I can get the locals. Scopes can be nested, so to get all the locals for a given function, you need to iterate thru all the child scopes as well.

So here’s my get_locals function:

def get_locals(frame, scope=None, offset=None, show_hidden=False): 
    #if the scope is unspecified, try and get it from the frame
    if scope == None:
        symmethod = frame.Function.GetSymbolMethod()
        if symmethod != None:
            scope = symmethod.RootScope
        #if scope still not available, yield the local variables
        #from the frame, with auto-gen'ed names (local_0, local_1, etc)
        else:
          for i in range(frame.GetLocalVariablesCount()):
            yield "local_%d" % i, frame.GetLocalVariable(i)
          return

    #if we have a scope, get the locals from the scope 
    #and their values from the frame
    for lv in scope.GetLocals(): 
        #always skip $site locals - they are cached callsites and 
        #not relevant to the ironpython developer
        if lv.Name == "$site": continue 
        if not lv.Name.startswith("$") or show_hidden: 
          v = frame.GetLocalVariable(lv.AddressField1) 
          yield lv.Name, v 

    if offset == None: offset = frame.GetIP()[0]

    #recursively call get_locals for all the child scopes
    for s in scope.GetChildren(): 
      if s.StartOffset <= offset and s.EndOffset >= offset: 
        for ret in get_locals(frame, s, offset, show_hidden): 
          yield ret

The function is designed to automatically retrieve the scope and offset, if they’re available. That way, I can simply call get_locals with the frame argument and it does the right thing. For example, if you don’t pass in a symbol scope explicitly, get_locals will attempt to retrieve the debug symbols. If debug symbols aren’t available, it iterates over the locals in the frame and yields each with a fake name (local_0, local_1, etc). If the debug symbols are available, then it iterates over the locals in the scope, then calls itself for each of the child scopes (skipping child scopes whose offset range doesn’t include the current offset).

The other feature of get_locals is deciding which locals to include. As you might expect, IronPython emits some local variables that are for internal runtime use. These variables get prefixed with a dollar sign. The dollar sign is not a legal identifier character in C# or Python, but IL has no problem with it. If you pass in False for show_hidden (or use the default value), then get_locals skips over any local variables whose name starts with the dollar sign.

Even if you pass in True for show_hidden, get_locals still skips over any variable named “$site”. $site variables are dynamic call site caches, a DLR feature that is used to efficiently dispatch dynamic calls by caching the results of previous invocations. Martin Maly’s blog has more details on these caches. As they are part of method dispatch, I never want to show them to the ipydbg user, so they get skipped regardless of the value of show_hidden.
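
The traversal and filtering rules are easier to follow with the debugger types stubbed out. In this sketch, Scope is a hypothetical stand-in for the symbol scope object, locals are plain name strings rather than CorValues, and visible_locals is an illustrative name, not a function in ipydbg.

```python
class Scope(object):
    """Hypothetical stand-in for a symbol scope: names plus child scopes."""
    def __init__(self, start, end, names, children=()):
        self.start, self.end = start, end
        self.names = list(names)
        self.children = list(children)

def visible_locals(scope, offset, show_hidden=False):
    """Yield local names in `scope` and in child scopes that contain
    `offset`, applying the same $site / $-prefix rules as get_locals."""
    for name in scope.names:
        if name == "$site":
            continue                      # never shown
        if not name.startswith("$") or show_hidden:
            yield name
    for child in scope.children:
        if child.start <= offset <= child.end:
            for name in visible_locals(child, offset, show_hidden):
                yield name

root = Scope(0, 100, ["x", "$site", "$tmp"], [
    Scope(10, 20, ["i"]),
    Scope(30, 40, ["j", "$site"]),
])
```

With this data, an offset of 15 yields x and i, while an offset of 35 with show_hidden=True yields x, $tmp and j; $site never appears.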

Now that I can get the local variables for a given frame, I need to convert those variables to something you can print on the screen. That turns out to be more complicated than you might expect, so it’ll have to wait for the next post (which may be a while, given that PyCon is this weekend). In the meantime, you can get the latest version of ipydbg from GitHub.

Posted By Harry Pierson at 3:27 PM Pacific Standard Time

Saturday, March 21, 2009

Writing an IronPython Debugger: A Little Hack…err…Cleanup

Yesterday, I pushed out two commits to ipydbg. The first was simple: I removed all of the embedded ConsoleColorMgr code in favor of the separate consolecolor.py module I blogged about Thursday. The second commit…well, let’s just say it’s not quite so simple.

Last weekend, I was experimenting with breakpoints when I discovered that the MoveNext method of BreakpointEnumerator was throwing a NotImplementedException. Up to that point, I hadn’t modified any of the MDbg C# source code except to merge the corapi and raw assemblies into a single assembly. But since I had to fix BreakpointEnumerator, I figured I should make some improvements to the C# code as well. For example, I added helper functions to easily retrieve the metadata for a class or function.

In my latest commit, I’ve added a SymbolReader property to CorModule. Previously, I managed the mapping from CorModules to SymbolReaders in my IPyDebugProcess class via the symbol_readers field. However, since mapping CorModules to SymbolReaders is something pretty much any debugger app would have to do, it made more sense to have that be a part of CorModule directly. So now, you can set and retrieve the SymbolReader directly on the module. Furthermore, I moved the logic to retrieve a SymbolReader from the IStream provided in the OnUpdateModuleSymbols event into the CorModule class as well.

I wouldn’t have bothered to blog this change at all, except that if you look at how the SymbolReader property is implemented under the hood, it’s not what you would expect. Instead of having SymbolReader as an instance variable on CorModule – as you might expect – CorModule has a static dictionary mapping CorModules to SymbolReaders. The instance SymbolReader property simply provides access to the underlying static dictionary.

//code taken from CorModule class in CorModule.cs
private static Dictionary<CorModule, ISymbolReader> _symbolsMap =   
                             new Dictionary<CorModule, ISymbolReader>();   

public ISymbolReader SymbolReader    
{   
    get   
    {   
        if (_symbolsMap.ContainsKey(this))   
            return _symbolsMap[this];   
        else   
            return null;   
    }   
    set   
    {   
        _symbolsMap[this] = value;   
    }   
}

Now obviously, this isn’t the way you typically implement properties. However, the problem is that there isn’t a 1-to-1 mapping between the underlying debugger COM object instances and the managed object instances that wrap them. For example, if you look at the CorClass::Module property, it constructs a new managed wrapper for the COM interface it gets back from ICorDebugClass.GetModule. That means that I can’t store the symbol reader as an instance field in the managed wrapper since I probably will never see a given managed wrapper module instance ever again.

All of the debugger API wrapper classes including CorModule inherit from a class named WrapperBase which overrides Equals and GetHashCode. The overridden implementations defer to the wrapped COM interface, which means that two separate managed wrapper instances of the same COM interface will have the same hash code and will evaluate as equal. The upshot is that object uniqueness is determined by the wrapped COM object rather than the managed object instance itself.

Using a static dictionary to store a module instance property provides the necessary “it doesn’t matter what managed object instance you use as long as they all wrap the same COM object underneath” semantics. If I create multiple instances of CorModule that all wrap the same underlying COM interface pointer, they’ll all share the same SymbolReader instance from the dictionary.
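
The same pattern can be sketched in plain Python, where __eq__ and __hash__ play the role of Equals and GetHashCode. The Wrapper and Module classes and the integer handles below are hypothetical stand-ins for WrapperBase, CorModule and the COM interface pointers; this is an analogy, not a port of the C# code.

```python
class Wrapper(object):
    """Hypothetical analogue of WrapperBase: equality and hashing defer to
    the wrapped handle, not to the wrapper instance."""
    def __init__(self, handle):
        self.handle = handle

    def __eq__(self, other):
        return isinstance(other, Wrapper) and self.handle == other.handle

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash(self.handle)

class Module(Wrapper):
    # Shared by all instances, like the static _symbolsMap in CorModule.
    _symbols_map = {}

    @property
    def symbol_reader(self):
        return Module._symbols_map.get(self)

    @symbol_reader.setter
    def symbol_reader(self, value):
        Module._symbols_map[self] = value

first = Module(handle=0x1234)
first.symbol_reader = "reader for module 0x1234"

# A brand new wrapper around the same handle sees the same reader.
second = Module(handle=0x1234)
other = Module(handle=0x5678)
```

Here first and second are different objects, yet second.symbol_reader returns the value stored through first, while other.symbol_reader is still None.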

Yeah, it feels kinda hacky, but it works.

Posted By Harry Pierson at 3:27 PM Pacific Standard Time

Thursday, March 19, 2009

IronPython ConsoleColorMgr

I really liked the ConsoleColorMgr class from my last ipydbg post so I took a few minutes to yank it out into its own separate module. I also took the opportunity to make a few improvements.

First off, I added support for background colors as well as foreground colors. Furthermore, both colors default to “None” which ConsoleColorMgr takes to mean leave that color unchanged.

from System import Console as _Console

class ConsoleColorMgr(object):
  def __init__(self, foreground = None, background = None):
    self.foreground = foreground
    self.background = background

  def __enter__(self):  
    self._tempFG = _Console.ForegroundColor  
    self._tempBG = _Console.BackgroundColor 
    if self.foreground is not None: _Console.ForegroundColor = self.foreground
    if self.background is not None: _Console.BackgroundColor = self.background
      
  def __exit__(self, t, v, tr):  
    _Console.ForegroundColor = self._tempFG 
    _Console.BackgroundColor = self._tempBG

The other change I made was to build a set of default ConsoleColorMgr instances in the consolecolor module, one for each of the values in ConsoleColor.

import sys  
from System import ConsoleColor, Enum
  
_curmodule = sys.modules[__name__]

for n in Enum.GetNames(ConsoleColor):
    setattr(_curmodule, n, ConsoleColorMgr(Enum.Parse(ConsoleColor, n)))

Note that for this set of default ConsoleColorMgr instances, I’m only setting the foreground color. If you want to set the background color, you have to create your own ConsoleColorMgr instances. This allows me to write the following:

from __future__ import with_statement
import consolecolor
from System import ConsoleColor

with consolecolor.Red:    
    print "Open the pod bay doors, HAL"   
with consolecolor.ConsoleColorMgr(ConsoleColor.Black, ConsoleColor.Red): 
    print "I'm sorry Dave, I'm afraid I can't do that." 

If you want it, I’ve put consolecolor.py up on my skydrive or it’s available as part of my devhawk_ipy project on GitHub.

Update - Christopher Bermingham pointed out that my sample snippet at the end doesn’t work unless you add “from __future__ import with_statement” to the top of your python file. I updated my code snippet to include this. Thanks Christopher!

Posted By Harry Pierson at 3:43 PM Pacific Standard Time

Writing an IronPython Debugger: Colorful Console

Now that I’ve added the current source code line to the console output, I wanted to start using color in order to make it easier to understand the various pieces of data that get output. Now, the various event handler messages get output in dark grey while the current line of source is in yellow. Here’s what it looks like on my machine (note, the top line with the green [11] is PowerShell and ipy2 is a PowerShell alias to ipy.exe v2.0.1)

[Screenshot: ipydbg on the console]

Writing color to the Windows console is a hassle because of the stateful API it uses. The problem is that I always want to return to the default color after I’ve written out a line of colored text. I wish there was an overload of Console.Write and WriteLine that took the foreground and background colors as arguments.

Of course, I could easily implement my own write and writeline methods that took color parameters. However, I was loath to do that as Python’s print statement is so convenient. So instead, I built a console color context manager. I got the idea from Luis Fallas’ XmlWriter context manager.

class ConsoleColorMgr(object): 
  def __init__(self, color): 
    self.color = color 

  def __enter__(self): 
    self.temp = Console.ForegroundColor 
    Console.ForegroundColor = self.color 
     
  def __exit__(self, t, v, tr): 
    Console.ForegroundColor = self.temp 

CCDarkGray = ConsoleColorMgr(ConsoleColor.DarkGray)
CCGray     = ConsoleColorMgr(ConsoleColor.Gray)
CCYellow   = ConsoleColorMgr(ConsoleColor.Yellow)

def OnCreateAppDomain(self, sender,e): 
    with CCDarkGray: 
      print "OnCreateAppDomain", e.AppDomain.Name 
    e.AppDomain.Attach()

Python’s with statement is similar to C#’s using statement. However, unlike IDisposable objects, Python context managers support both an enter and an exit method. This means I don’t have to construct a new object every time I want a context (in this case, the console colors) managed. So far, I’ve got three console color context managers defined – Gray, DarkGray and Yellow. I’m thinking that ConsoleColorMgr is a candidate for my assorted module collection at some point.
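
The reuse is easier to see without the console. In the sketch below, a plain dictionary named console stands in for System.Console (an assumption made so the example runs anywhere), ColorMgr is a simplified copy of ConsoleColorMgr, and a single instance is reused across several with blocks, including one that raises.

```python
console = {'foreground': 'Gray'}   # hypothetical stand-in for System.Console
history = []

class ColorMgr(object):
    def __init__(self, color):
        self.color = color

    def __enter__(self):
        self.saved = console['foreground']
        console['foreground'] = self.color

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if the block raised, so the color is always restored.
        console['foreground'] = self.saved

Yellow = ColorMgr('Yellow')        # built once, reused below

for _ in range(2):
    with Yellow:
        history.append(console['foreground'])
    history.append(console['foreground'])

try:
    with Yellow:
        raise ValueError("boom")
except ValueError:
    pass
```

One caveat of this design: because the saved color lives on the instance, nesting the same instance inside its own with block would overwrite the outer saved value. Nesting different instances works fine.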

Now that I can print in color, I wanted to modify my line printer to use color. Usually, the current sequence point corresponds to an entire line of python source. But as we see below, sometimes only part of a given line of source text is associated with a given sequence point.

[Screenshot: a sequence point that covers only part of a source line]

The other issue I ran into is that there’s always a sequence point at the very end of a function. Unlike the break at the start of the function I wrote about in my last post, this one I didn’t want to automatically step over. This is the last breakpoint for a given scope, so I should give the user one last chance to inspect the scope (once I add the ability to do that, at any rate) before we step out of it. However, I wanted a way of showing that we’re about to step out in the source code line view. I decided on writing a series of carets ^^^ to indicate that we’re at the end of a function.

[Screenshot: three carets marking the sequence point at the end of a function]

As you can see in the dark grey line in the screenshot above, the current sequence point starts and ends at line 4 column 23. Column 23 is beyond the end of line 4, so that’s what I look for in order to draw the three carets. Here’s the final version of _print_source_line:

def _print_source_line(self, sp, lines):
  line = lines[sp.start_line-1]
  with CCGray:
    Console.Write("%d: " % sp.start_line)
    Console.Write(line.Substring(0, sp.start_col-1))
    with CCYellow:
      if sp.start_col > len(line):
        Console.Write(" ^^^")
      else:
        Console.Write(line.Substring(sp.start_col-1,
                                     sp.end_col - sp.start_col))
    Console.WriteLine(line.Substring(sp.end_col-1))
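
The column arithmetic can be checked without a console. This sketch splits a line into the three pieces that _print_source_line writes, using Python slicing in place of Substring; split_source_line is a hypothetical helper name and the sample lines are made up for illustration.

```python
def split_source_line(line, start_col, end_col):
    """Split `line` into (before, current, after) using 1-based columns.

    If start_col is past the end of the line, the sequence point marks the
    end of the function and the middle piece becomes the caret marker.
    """
    before = line[:start_col - 1]
    if start_col > len(line):
        current = " ^^^"
    else:
        current = line[start_col - 1:end_col - 1]
    after = line[end_col - 1:]
    return before, current, after
```

For example, split_source_line("x = foo(1) + bar(2)", 5, 11) gives ("x = ", "foo(1)", " + bar(2)"), and a sequence point one column past the end of "  return x" (start and end column 11) gives ("  return x", " ^^^", "").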

So colorizing the current line of source code turned out to be a little harder than I had expected. But hey, I got a start of a reusable module out of it. That’s pretty cool. Anyway, the latest bits are, as always, up on GitHub.

Posted By Harry Pierson at 2:48 PM Pacific Standard Time

Writing an IronPython Debugger: Showing Source Code

It’s been almost a week since my last ipydbg post. I’m not done, I just needed to catch my breath for a few days and get some other work done. Contrary to popular belief, my day job revolves around more than just ipydbg! :)

Actually, I’ve made ten commits since my last post, but they’ve been mostly minor changes. For example, I was hacking around with breakpoints and restored a bunch of commented out code in BreakpointEnumerator. Since I was changing the original C# CorDebug wrapper source, I decided to add a few helper functions to return metadata for functions and classes as well as cleaning up some C# filenames. On the Python side, I added an active_appdomain field to IPyDebugProcess to go along with active_thread.

Today, I added what started as a fairly minor feature – showing the current line of source code at the start of the input loop. The initial code for this was cake, simply getting the sequence point for the current location and mapping that to a source file. In order to avoid hitting the file system over and over, I cache source files the first time they are accessed.

def _get_file(self,filename):
    filename = Path.GetFileName(filename)
    if not filename in self.source_files:
      self.source_files[filename] = File.ReadAllLines(filename)
    return self.source_files[filename] 

def _input(self):
    offset, sp = self._get_location(self.active_thread.ActiveFrame)
    lines = self._get_file(sp.doc.URL)
    print "%d:" % sp.start_line, lines[sp.start_line-1]
    #input loop omitted for clarity

However, when I did this, I discovered a slight issue. When you step into a Python function, the CLR debugger breaks at the very beginning of the function being stepped into. In C#, the function start is mapped to the opening curly brace of the function. IronPython, on the other hand, doesn’t map the start of the function to anything since there’s a bunch of infrastructure code at the start of every function that has no correlation to the python source. This means _get_location returns a null sequence point when I first step into a function and thus I wouldn’t be able to show any source code.

I could make the argument that the start of the function should be mapped to the colon that starts the function block. However, I’m not in a position to make changes to how the shipping version of IronPython emits debug symbols. So instead, I decided to insert an automatic step whenever I step into a function by modifying OnStepComplete:

def OnStepComplete(self, sender,e):
    offset, sp = self._get_location(e.Thread.ActiveFrame)
    print "OnStepComplete Reason:", e.StepReason, \
           "Location:", sp if sp != None else "offset %d" % offset
    if e.StepReason == CorDebugStepReason.STEP_CALL:
      self._do_step(e.Thread, False)
    else:
      self._do_break_event(e)

I have this nagging feeling that a simple step won’t suffice and I’ll need to add logic to ensure that I’m only auto-stepping when the start of the function doesn’t have a matching sequence point. But I have tested this with a few different python scripts and it appears to work fine. If I need something more sophisticated, I can always add it later. BTW, notice I modified the signature of _do_step so that it takes the thread as an argument rather than picking it up as an IPyDebugProcess field.
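
If the simple auto-step ever proves too eager, the guard could look something like the following sketch. It is plain Python with stand-in values: the step reasons here are just strings rather than CorDebugStepReason members, and should_auto_step is a hypothetical helper, not something in ipydbg.

```python
STEP_CALL = "STEP_CALL"      # stand-in for CorDebugStepReason.STEP_CALL
STEP_NORMAL = "STEP_NORMAL"  # stand-in for any other step reason


def should_auto_step(step_reason, sequence_point):
    """Step again only when we just entered a function AND the current
    location has no sequence point, i.e. there is no source line to show."""
    return step_reason == STEP_CALL and sequence_point is None
```

OnStepComplete would then call self._do_step(e.Thread, False) only when should_auto_step(e.StepReason, sp) is true, and fall through to self._do_break_event(e) otherwise.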

As usual, latest ipydbg (including new compiled version of CorDebug.dll) is available at GitHub.

Posted By Harry Pierson at 1:58 PM Pacific Standard Time

Friday, March 13, 2009

Writing an IronPython Debugger: Debugging Just My Code

As I wrote last time, in order to make debug stepping actually useful in ipydbg I need to avoid stepping into frames that are part of the IronPython infrastructure. I did something similar when I hid infrastructure frames in the stack trace. Originally, I had planned to automatically step again if we ended up on a frame that didn’t correspond to a python file. However, Mike Stall showed me a much cleaner and better performing solution: Just My Code. As I mentioned at the start of this series, support for JMC is one of the main reasons I wanted to build my own debugger rather than use MDbg.

Enabling JMC in the stepper object is trivial:

def create_stepper(thread, JMC = True):
  stepper = thread.ActiveFrame.CreateStepper()
  stepper.SetUnmappedStopMask(CorDebugUnmappedStop.STOP_NONE)
  stepper.SetJmcStatus(JMC) 
  return stepper

If I make that single change and run ipydbg, any step effectively turns into a full continue since none of the code has been marked as “My Code” yet. As you see, the tricky part of JMC isn’t enabling it on the stepper, it’s “painting” the parts of the code where you want JMC stepping to work. You can set JMC status at the module, class or the method level. In the case of ipydbg, it’s easiest to work at the class level:

infrastructure_methods =  ['TryGetExtraValue',     
    'TrySetExtraValue',     
    '.cctor',     
    '.ctor',     
    'CustomSymbolDictionary.GetExtraKeys',     
    'IModuleDictionaryInitialization.InitializeModuleDictionary']    

def OnClassLoad(self, sender, e):
    cmi = CorMetadataImport(e.Class.Module)
    mt = cmi.GetType(e.Class.Token)
    print "OnClassLoad", mt.Name

    if not e.Class.Module.IsDynamic:
      e.Class.JMCStatus = False
    elif mt.Name.startswith('IronPython.NewTypes'):
      e.Class.JMCStatus = False
    else:
      e.Class.JMCStatus = True
      for mmi in mt.GetMethods():
        if mmi.Name in infrastructure_methods:
          f = e.Class.Module.GetFunctionFromToken(mmi.MetadataToken)
          f.JMCStatus = False

OnClassLoad is where the action is. This event handler is responsible for enabling JMC for all class methods that map to python code. To understand how the logic in OnClassLoad works, you need to understand a little about the .NET types and code that IronPython generates. Note, the following description is for the IronPython 2.0 branch. Code generation evolves from release to release and I know for a fact there are changes in the upcoming 2.6 version. I assume that I’ll eventually have to sniff the IronPython version in order to set JMC correctly.

Today, IronPython generates all code into dynamic modules and methods. Since I want to limit stepping to python code only, I automatically disable JMC for non-dynamic modules. I can imagine a scenario where I want to step into non-dynamically generated code, but I think the best way to handle that would be to disable JMC at the stepper rather than widening the amount of code marked as JMC enabled.

For every module that gets loaded, IronPython generates a type. At a minimum you’re going to load two modules: site.py and whatever python script you ran. If you have the python standard library installed, site.py loads a bunch of other modules as well. Each of these module types has a bunch of standard methods that always get generated. For example, the global scope code in the module is placed in a static method on the module type called Initialize. Any python functions you define get generated as static methods with mangled names on the module type [1]. All these methods have corresponding python code and should be JMC enabled. The other standard methods on a module type should not be JMC enabled. So in my debugger, I mark the class as JMC enabled but then iterate over the list of methods and mark any in the list of standard methods (except for Initialize) as JMC disabled.

Of course, you can also create classes in Python. As you might expect, classes in Python are generated as .NET types. However, the semantics of Python classes are very different than .NET types. For example, you can change the inheritance hierarchy of python classes at runtime. That’s obviously not allowed for .NET types. So the .NET types we generate have all the logic to implement Python class semantics. As it turns out, these .NET types *only* have the logic to implement Python class semantics, which is to say they have *none* of the Python class methods’ code. This makes sense when you think about it – since Python can add and remove methods from a class at runtime, IronPython can’t put the method code in the .NET type itself. Instead, Python class methods are generated as static methods on the module type, just like top-level functions are. Since Python class types only contain Python class semantics logic, we never want to enable JMC for Python class types. Python class types get generated in the IronPython.NewTypes namespace, so it’s fairly easy to check the class name in OnClassLoad and automatically disable JMC for any classes in that namespace.
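
The three rules above (non-dynamic module, NewTypes class, module type with a few infrastructure methods) can be separated from the debugger API and checked on their own. Here is a small sketch of that decision as a pure function. The function name classify_for_jmc and its return shape are my own invention for illustration; only the method list and the namespace check come from the OnClassLoad code above.

```python
INFRASTRUCTURE_METHODS = frozenset([
    'TryGetExtraValue',
    'TrySetExtraValue',
    '.cctor',
    '.ctor',
    'CustomSymbolDictionary.GetExtraKeys',
    'IModuleDictionaryInitialization.InitializeModuleDictionary',
])


def classify_for_jmc(is_dynamic_module, type_name, method_names):
    """Return (class_is_my_code, methods_to_disable) for a loaded class."""
    if not is_dynamic_module:
        # Compiled assemblies are never "my code" for a Python debugger.
        return False, []
    if type_name.startswith('IronPython.NewTypes'):
        # Python class types hold only class-semantics plumbing.
        return False, []
    # Module type: enable the class, then carve out the standard methods.
    disabled = [m for m in method_names if m in INFRASTRUCTURE_METHODS]
    return True, disabled
```

For a module type with methods Initialize, .cctor and a mangled function name, this returns True for the class and lists only .cctor for disabling, so Initialize and the function stay steppable.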

Adding JMC support makes ipydbg significantly more usable. It’s almost like a real tool now, isn’t it? Latest bits are up on GitHub.


[1] FYI, IronPython generates python functions as dynamic methods in release mode and static module class methods in debug mode since you can’t step into dynamic methods. The description above is specific to debug mode since ipydbg exclusively runs in debug mode.

Posted By Harry Pierson at 3:43 PM Pacific Standard Time

Writing an IronPython Debugger: Stepping Thru Code

So far, I’ve written seven posts about my IronPython debugger, but frankly it isn’t very functional yet. It runs, breaks on the first line and can show a stack trace. Not exactly Jolt award material. In this post, I’m going to add one of the core functions of any debugger: stepping. Where previously I’ve written a bunch of code but had little to show in terms of features, now I’m getting three new features (basic step, step in and step out) at once!

def _input(self):
  #remaining _input code omitted for clarity
  elif k.Key == ConsoleKey.S:
      print "\nStepping"
      self._do_step(False)
      return
  elif k.Key == ConsoleKey.I:
      print "\nStepping In"
      self._do_step(True)
      return                
  elif k.Key == ConsoleKey.O:
      print "\nStepping Out"
      stepper = create_stepper(self.active_thread)
      stepper.StepOut()

def _do_step(self, step_in):
  stepper = create_stepper(self.active_thread)
  mod = self.active_thread.ActiveFrame.Function.Module
  if mod not in self.symbol_readers:
      stepper.Step(step_in)
  else:
    range = get_step_ranges(self.active_thread, self.symbol_readers[mod])
    stepper.StepRange(step_in, range)

Here you can see the _input clauses for step, step in and step out. Of the three, step out is the simplest to implement: create the stepper object and call StepOut. For step and step in, I could simply call Step (the boolean argument indicates if you want to step into or over functions) but that only steps a single IL statement. The vast majority of the time there are multiple IL instructions for every line of source code, so IL statement stepping is very tedious. As we learned when setting a breakpoint, debug symbols contain sequence points that map between source and IL locations. If they’re available, I use the sequence points to determine the range of IL statements to step over so that I can step single source statements instead.

The stepping code above depends on three helper functions defined at global scope.

def create_stepper(thread):
  stepper = thread.ActiveFrame.CreateStepper()
  stepper.SetUnmappedStopMask(CorDebugUnmappedStop.STOP_NONE)
  return stepper 
  
def create_step_range(start, end):
  range = Array.CreateInstance(COR_DEBUG_STEP_RANGE, 1)
  range[0] = COR_DEBUG_STEP_RANGE(startOffset = UInt32(start),
                                  endOffset = UInt32(end))
  return range
  
def get_step_ranges(thread, reader):
    frame = thread.ActiveFrame
    offset, mapResult = frame.GetIP()
    method = reader.GetMethod(SymbolToken(frame.FunctionToken))
    for sp in get_sequence_points(method):
        if sp.offset > offset:
            return create_step_range(offset, sp.offset)
    return create_step_range(offset, frame.Function.ILCode.Size)          

The first function, create_stepper, simply constructs and configures the stepper object. The call to SetUnmappedStopMask tells the debugger not to stop if it encounters code that can’t be mapped to IL. If you need to debug at that level, ipydbg is *not* for you.

Next is create_step_range, which exists purely for .NET interop purposes. There are three interop warts hidden in this function. First is creating a .NET array of COR_DEBUG_STEP_RANGE structs. Every time I write Array code like this, I wish for a CreateFromCollection static method on Array. However, in this case it isn’t that big a deal since it’s a one element array. Second wart is having to set the values of COR_DEBUG_STEP_RANGE via constructor keyword arguments. It turns out that IronPython disallows direct updates to value type fields (read this for the reason why). Instead, I pass the field values into the constructor as keyword arguments. Finally, you have to explicitly convert the start and end offsets to an unsigned int in order to set the offset fields in the COR_DEBUG_STEP_RANGE struct constructor.

Finally is get_step_ranges, which iterates thru the list of sequence points in the current method looking for the one with the smallest offset that is larger than the current offset position. If it can’t find a matching sequence point, it sets the range to the end of the current function. The start range offset is always the current offset. I did make a significant change to get_sequence_points – it no longer yields sequence points that have a start line of 0xfeefee. By convention, that indicates a sequence point to be skipped. Originally, the logic to ignore 0xfeefee sequence points was in get_location. But when I originally wrote get_step_ranges, it had essentially the same sequence point skipping logic, so I moved it to get_sequence_points instead.
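
To make the range calculation concrete without a debugger attached, here is a sketch of the same logic over plain tuples. The SequencePoint record below carries only the two fields this calculation needs, and step_range and visible_sequence_points are illustrative names rather than ipydbg functions. The sketch assumes the sequence points arrive sorted by IL offset.

```python
from collections import namedtuple

HIDDEN_LINE = 0xfeefee  # start line that marks a sequence point to skip

# Only the fields this sketch needs; the real objects carry more.
SequencePoint = namedtuple("SequencePoint", ["offset", "start_line"])


def visible_sequence_points(points):
    """Drop the 0xfeefee entries, as get_sequence_points is described as doing."""
    for sp in points:
        if sp.start_line != HIDDEN_LINE:
            yield sp


def step_range(current_offset, points, code_size):
    """Return (start, end) IL offsets covering the current source statement."""
    for sp in visible_sequence_points(points):
        if sp.offset > current_offset:
            return (current_offset, sp.offset)
    # No later sequence point: step to the end of the function.
    return (current_offset, code_size)
```

With sequence points at offsets 0, 10 (hidden) and 25 in a 40 byte method, stepping from offset 0 covers 0 to 25, and stepping from 25 runs to the end of the function.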

Technically, I’ve built three new features but the reality is that if you end up in IronPython infrastructure code it’s really hard to find your way back to python code. Step in is particularly useless right now. Luckily, the .NET debugger API supports a feature called “Just My Code” that will make stepping much more useful. In the meantime, the latest version of ipydbg is up on GitHub as usual.

Posted By Harry Pierson at 9:31 AM Pacific Standard Time

Wednesday, March 11, 2009

Writing an IronPython Debugger: Refactoring

When we last left ipydbg, it was up to about 200 lines of code. Not bad in terms of overall length, but I started to detect some code smell. I was relying pretty heavily on global variables and the structure of my code made it difficult to control how the debugger was run. I wanted to change ipydbg so it would automatically spin up an MTA thread if I forgot to add the -X:MTA command line parameter. But since my debugger and process objects were global, they’d get created on the main thread of ipydbg, regardless if it was STA or MTA. So for this “release” (I’d say I’m almost to version 0.0.0.1), I decided to focus on engineering and refactoring rather than new features.

The big new addition is the IPyDebugProcess class, which is clearly the workhorse of the application. All of the previously global variables are now class instance variables on IPyDebugProcess. Input and run along with all the event handlers as well as do_break_event and get_location are now class methods, as they need to access instance variables (setting the break event, accessing the symbol reader dictionary, etc.). Functions that didn’t need to access instance variables (get_sequence_points, create_breakpoint, get_dynamic_frames and get_method_info_for_frame) I left as top-level functions. If they get more complex, I may break them out into their own modules, but for now I left them in ipydbg.py.

The conversion process was fairly trivial. I had to add “self.” in lots of places and change the indentation level all over but that was pretty much it. Once I finished the conversion, I was able to add the run_debugger function to handle the thread creation, if necessary.

def run_debugger(py_file):
    if Thread.CurrentThread.GetApartmentState() == ApartmentState.STA:
        t = Thread(ParameterizedThreadStart(run_debugger))
        t.SetApartmentState(ApartmentState.MTA)
        t.Start(py_file)
        t.Join()   
    else:
        p = IPyDebugProcess()
        p.run(py_file)

if __name__ == "__main__":        
    run_debugger(sys.argv[1])        

Originally, I tried to put this logic in IPyDebugProcess.run. However, since I’m creating the debugger object in the __init__ function, that meant it would be created on the wrong thread. I could have moved the debugger creation to the run method or moved the thread management code to __init__, but I decided to factor that logic into a separate function completely. Felt cleaner that way.

Posted By Harry Pierson at 7:42 PM Pacific Standard Time

IronPython at PyCon

Here’s a quick quiz. Which of these tasks is harder to accomplish:

  1. Getting $6,000 from a variety of groups within Microsoft to pay for a Gold PyCon 2009 sponsorship.
  2. Sending PSF a check

If you guessed #2, you’d be right. It’s amazing how difficult the seemingly trivial task of “give those PSF folks money” turned out to be. But it’s done now, and you can see the MS logo there on the side of all the PyCon pages.

In addition to the sponsorship, there are some great looking IronPython sessions at PyCon.

Posted By Harry Pierson at 3:22 PM Pacific Standard Time

devhawk_ipy

As I write various python modules (many of which get blogged about), I dump them into a special folder on my machine(s). In my powershell profile script, I set the IRONPYTHONPATH environment variable so that these modules are available to the IPy interpreter (i.e. ipy.exe). To date, I’ve been pretty haphazard about this. But I decided to get a little more structured and put that folder under source control and make it available as “devhawk_ipy”.

So far, I’ve only got three scripts (plus an empty __init__.py) in devhawk_ipy.

Eventually I’ll put my code for working with WPF, LiveFX and Azure into this package, but I’m not happy with where they are yet.

Like ipydbg, devhawk_ipy is up on GitHub. For those non-Git users, I will continue to put these files up on my SkyDrive. I kind of see SkyDrive as a dumping ground for random content while devhawk_ipy is where stuff goes when it’s a little more polished.

Like IronPython, devhawk_ipy is licensed under the MS-PL. If you’re interested in contributing, feel free to fork and send me patches.

Posted By Harry Pierson at 2:44 PM Pacific Standard Time

Monday, March 09, 2009

Writing an IronPython Debugger: Dynamic Stack Trace

Now that I can interact with my debugger, it’s time to add a command. I decided to start with something simple – or at least something I thought would be simple - printing a stack trace.

In the unmanaged debugger API, threads have the concept of both stack chains and stack frames. A stack chain represents a segment of the physical stack. In a typical managed app, you’ll have at least two stack chains: the unmanaged stack chain and the managed stack chain. You can iterate through the stack chains for a given thread via the Chains property. However, ipydbg is a managed only debugger, so I can ignore the unmanaged stack chain. Instead, I just retrieve the current (managed) chain via the thread’s ActiveChain property.

Within a managed stack chain, there is a collection of stack frames. This is the call stack that managed developers are typically used to working with. It turns out that printing a raw stack trace is very easy to do. Here was my first stab at it:

elif k.Key == ConsoleKey.T:
  print "\nManaged Stack Trace"
  for f in active_thread.ActiveChain.Frames:
    offset, sp = get_location(f)
    metadata_import = CorMetadataImport(f.Function.Module)
    method_info = metadata_import.GetMethodInfo(f.FunctionToken)
    print "  ", \
      "%s::%s --" % (method_info.DeclaringType.Name, method_info.Name), \
      sp if sp != None else "(offset %d)" % offset

This elif block is part of the input method I showed last time. It loops thru the frames in the Active Chain of the active thread and prints some data to the console. As I said, pretty easy. Of course, the devil is in the details.

First detail I should call out is that active_thread variable. As per Mike Stall, “there is no notion of "active thread" in the underlying debug APIs. It's purely a construct in a debugger UI to make it easier for end-users.” My console based UI may be rudimentary, but it’s still a UI. Events like OnBreakpoint include the active thread as an event argument, so I stash that away in a variable so it’ll be available to the input loop.

Second detail is the call to get_location. When we last saw get_location, it was returning a formatted string. Since my last post, I’ve refactored the code so it returns the raw location data – a tuple of the raw IP offset and the associated sequence point, if available. I’ve also added a __str__ method to my sequence point object, so when I print it to the console, I get the filename and line nicely formatted.
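
As a rough illustration of that split between raw location data and formatting, here is a plain Python sketch. The class and the describe_location helper are hypothetical, and the exact output format is my guess from the sample output in this post rather than the actual ipydbg implementation.

```python
import ntpath  # handles Windows-style paths on any platform


class SequencePoint(object):
    """Minimal stand-in holding just what the formatter needs."""

    def __init__(self, url, start_line, offset):
        self.url = url
        self.start_line = start_line
        self.offset = offset

    def __str__(self):
        # Show only the file name, not the full path, to keep lines short.
        return "%s:%d (offset: %d)" % (
            ntpath.basename(self.url), self.start_line, self.offset)


def describe_location(offset, sp):
    """Format the (offset, sequence point) pair; sp may be None."""
    return str(sp) if sp is not None else "(offset %d)" % offset
```

Returning the raw tuple and formatting at the call site means the stack trace and the breakpoint handler can share get_location while printing different things.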

Finally, there’s all the CorMetadataImport code. In addition to wrapping the unmanaged debugger API, CorDebug also wraps the unmanaged metadata API. This code lets me get a MethodInfo compatible view of the function metadata for a given stack frame. I use it here to get the type and function name for each frame on the stack.

The end result looks something like this. Note, I’ve replaced “Microsoft.Scripting” with “MS.Scripting” to avoid word wrapping.

OnBreakpoint Initialize Location: simpletest.py:1 (offset: 84)
» t
Managed Stack Trace
   S$2::Initialize simpletest.py:1 (offset: 84)
   MS.Scripting.Runtime.OptimizedScriptCode::InvokeTarget (offset 72)
   MS.Scripting.ScriptCode::Run (offset 0)
   IronPython.Hosting.PythonCommandLine::RunFileWorker (offset 77)
   IronPython.Hosting.PythonCommandLine::RunFile (offset 15)
   MS.Scripting.Hosting.Shell.CommandLine::Run (offset 46)
   IronPython.Hosting.PythonCommandLine::Run (offset 240)
   MS.Scripting.Hosting.Shell.CommandLine::Run (offset 74)
   MS.Scripting.Hosting.Shell.ConsoleHost::RunCommandLine (offset 158)
   MS.Scripting.Hosting.Shell.ConsoleHost::ExecuteInternal (offset 32)
   MS.Scripting.Hosting.Shell.ConsoleHost::Execute (offset 63)
   MS.Scripting.Hosting.Shell.ConsoleHost::Run (offset 390)
   PythonConsoleHost::Main -- (offset 125)

As we can see, we may be on the first line of the python script, but we’ve got a pretty deep stack trace already. Everything but the top-most frame is from the underlying IronPython implementation. Those extra frames obscure the stack frames I actually care about, so it would be nice to hide any stack frames from IronPython or the DLR. It’s easy enough to write a python generator function that filters out frames that are from the DLR or IronPython namespaces. In order to get the type name, we need the method_info like we did above. I’ve factored that code into a separate function in order to avoid code duplication.

def get_method_info_for_frame(frame):
    if frame.FrameType != CorFrameType.ILFrame:
      return None
    metadata_import = CorMetadataImport(frame.Function.Module)
    return metadata_import.GetMethodInfo(frame.FunctionToken)
    
def get_dynamic_frames(chain):
  for f in chain.Frames:
    method_info = get_method_info_for_frame(f)
    if method_info == None:
      continue
    typename = method_info.DeclaringType.Name
    if typename.startswith("Microsoft.Scripting.") \
      or typename.startswith("IronPython.") \
      or typename == "PythonConsoleHost":
        continue
    yield f

You’ll notice I’ve added a guard to get_method_info_for_frame in order to ensure that the frame argument is an IL Frame. There are three types of stack frames in the debugger API: IL, native and internal. Most of the frames we’re dealing with are IL frames, but you do run into the occasional lightweight function (i.e. DynamicMethod) frame when debugging IronPython code. Typically, IronPython generates DynamicMethods for all python code except for a few cases related to .NET interop. However, you can’t debug DynamicMethods, so when you run with -D, we generate normal non-dynamic methods instead. However, even when running with -D, we still use DynamicMethods for call site dispatch. Since they’re an implementation detail, we want to filter those out in get_dynamic_frames too.

This gives us a much more manageable stack trace:

OnBreakpoint Initialize Location: simpletest.py:1 (offset: 84)
» t
Stack Trace
   S$2::Initialize -- simpletest.py:1 (offset: 84)
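
The filtering rule is easy to try out away from the debugger. The sketch below runs the same checks over stand-in frames. The Frame tuple is hypothetical (a type_name of None plays the role of a non-IL frame with no method metadata) and dynamic_frames is an illustrative name.

```python
from collections import namedtuple

# Stand-in for a debugger frame; type_name is None for non-IL frames.
Frame = namedtuple("Frame", ["type_name", "method_name"])

HIDDEN_PREFIXES = ("Microsoft.Scripting.", "IronPython.")
HIDDEN_TYPES = ("PythonConsoleHost",)


def dynamic_frames(frames):
    """Yield only the frames that belong to the user's Python code."""
    for f in frames:
        if f.type_name is None:
            continue  # no metadata, e.g. a DynamicMethod used for dispatch
        # str.startswith accepts a tuple of prefixes.
        if f.type_name.startswith(HIDDEN_PREFIXES) or f.type_name in HIDDEN_TYPES:
            continue
        yield f
```

Keeping the prefixes in a tuple means a new infrastructure namespace can be hidden by adding one string rather than another condition.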

As usual, the latest ipydbg source is up on GitHub.

Posted By Harry Pierson at 2:10 PM Pacific Standard Time

Wednesday, March 04, 2009

Writing an IronPython Debugger: Adding Interactivity

Now that ipydbg can set a breakpoint, it’s time to add some interactivity to the app. MDbg supports dozens of commands and currently ipydbg supports none. I’d love for ipydbg to support a wide range of commands like MDbg does, but for now let’s keep it simple and start with two: Continue and Quit. These aren’t very interesting as commands go, but that lets me focus this blog post on adding basic interactivity and future posts on specific commands.

First off, we have to understand how the CorDebug managed API supports interactivity. As we’ve seen, callbacks into the debugger are surfaced as managed events. If we look at the base class for all the debugger event arguments, we see that it exposes a Continue property. If you want the debugger to automatically continue after the event handler finishes running, you set the Continue property to true (which is the default). If you want the debugger to stay paused while you provide the developer a chance to poke around, you set Continue to false. In that case, the debugger stays paused until you call process.Continue explicitly.
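
The Continue flag pattern is simple enough to model in a few lines. This is only a toy model of the behavior described above, not the CorDebug wrapper itself; DebugEventArgs and fire are made-up names.

```python
class DebugEventArgs(object):
    """Toy event arguments: handlers may flip Continue to keep the target paused."""

    def __init__(self):
        self.Continue = True  # default: resume once the handlers finish


def fire(handlers, resume):
    """Run every handler, then resume the target unless one of them opted out."""
    args = DebugEventArgs()
    for handler in handlers:
        handler(None, args)
    if args.Continue:
        resume()
    return args
```

A handler that does nothing lets the target resume, while one that sets e.Continue = False leaves it paused until something else calls resume.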

Once we set the Continue property to false, we need a mechanism to signal the main thread of execution that it’s time to wake up and ask the user what they want to do next. Of course, that’s what WaitHandle and its descendants are for. In fact, we’re already using an AutoResetEvent in OnProcessExit to signal that the debugged app has exited so we should exit the debugger. However, now we have two different signals that we want to send: exit the debugger or enter the input loop. I decided to differentiate by using two separate AutoResetEvents:

terminate_event = AutoResetEvent(False)
break_event = AutoResetEvent(False)

def OnProcessExit(s,e): 
  print "OnProcessExit" 
  terminate_event.Set() 

def OnBreakpoint(s,e): 
  print "OnBreakpoint", get_location( 
    symbol_readers[e.Thread.ActiveFrame.Function.Module], e.Thread) 
  e.Continue = False 
  break_event.Set() 

#code to create debugger and process omitted for clarity

handles = Array.CreateInstance(WaitHandle, 2)
handles[0] = terminate_event 
handles[1] = break_event 

while True:
  process.Continue(False)

  i = WaitHandle.WaitAny(handles) 
  if i == 0:
    break 

  input()

Instead of the single call to process.Continue I had before, I’ve created an infinite “while True” loop that calls Continue, waits for one of the events to signal, then either exits the loop or enters the input loop (via the input function). Since there are two AutoResetEvents, I need to use the WaitAny method to wait for one of them to signal. WaitAny takes an array, which is kind of clunky to use from IronPython since the array has to be strongly typed. It would be much more pythonic if I could call WaitHandle.WaitAny([terminate_event, break_event]). WaitAny then returns an index into the array indicating which one received the signal. If it was the terminate_event that signaled, I exit the loop (and the application). Otherwise, I enter the input loop. Notice, by the way, in OnBreakpoint that I’m both setting Continue to false and signaling the break_event.
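
The shape of that outer loop can be tried without .NET. Plain Python has no direct counterpart to WaitHandle.WaitAny, so this sketch (written for Python 3) swaps in a queue.Queue that carries a tag saying which signal fired. The names run_loop, TERMINATE and BREAK are mine, and resume and on_break are placeholders for process.Continue and the input function.

```python
import queue

TERMINATE, BREAK = "terminate", "break"


def run_loop(signals, resume, on_break):
    """Resume the target, block until a signal arrives, repeat until TERMINATE."""
    while True:
        resume()
        if signals.get() == TERMINATE:
            break
        on_break()


def make_signals():
    """Event handlers on other threads would call signals.put(TERMINATE or BREAK)."""
    return queue.Queue()
```

A single queue sidesteps the need to wait on two handles at once, at the cost of being a different mechanism from the two AutoResetEvents used above.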

The “input loop” needs to be a loop because the user may want to type in multiple commands before letting the debugged app continue to execute. This means that the input function is implemented as another “while True” loop. When the user chooses a command that implies the process should continue, I simply exit out of the input function and the outer “while True” loop above executes the continue and waits for a signal.

Here’s what the input function looks like right now with our two basic commands:

def input():
  while True:
    Console.Write("» ")
    k = Console.ReadKey()
    
    if k.Key == ConsoleKey.Spacebar:
      Console.WriteLine("\nContinuing")
      return 
    elif k.Key == ConsoleKey.Q:
      Console.WriteLine("\nQuitting")
      process.Stop(0)
      process.Terminate(255)
      return
    else:
      Console.WriteLine("\n Please enter a valid command")

I’ve mapped “q” to quit the debugger and spacebar to continue. Since I’m using Console.ReadKey, you only have to type the key in question – no return needed. For continue, we don’t do anything but exit the input loop by returning. Continue gets called as part of the other loop and since we haven’t/can’t add additional breakpoints the debugged app will run until it ends. For quit, I call the Terminate method on process, hard coding the return value to 255. However, Terminate implicitly continues the debugged process. Since you can’t continue a running process, the call to Continue in the outer loop throws an exception. I avoid this exception by adding the call to Stop before Terminate. As per the Stop docs, the debugger maintains a “stop counter” and only resumes the debugged process when the counter reaches zero. Calling Stop increases the stop counter by one, calling Terminate decreases it by one, then the outer loop Continue call decreases it to zero and the process continues, terminates and fires the OnProcessExit event handler as usual.
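
The stop counter arithmetic is easier to follow with a toy model. The class below mirrors only the description in the paragraph above (Stop adds one, Terminate and Continue each take one away, and continuing a running process is an error). It is not the real debugger API, and StopCounterModel is a made-up name.

```python
class StopCounterModel(object):
    """Toy model of the stop counter as described in this post."""

    def __init__(self):
        self.count = 0  # 0 means the debuggee is running

    def stop(self):
        # Also used here to model the pause when a breakpoint event arrives.
        self.count += 1

    def _resume_once(self):
        if self.count == 0:
            raise RuntimeError("cannot continue a running process")
        self.count -= 1

    def terminate(self):
        self._resume_once()  # Terminate implicitly continues

    def continue_(self):
        self._resume_once()

    @property
    def running(self):
        return self.count == 0
```

Starting from a breakpoint (count 1), the sequence stop, terminate, continue_ goes 2, 1, 0 and ends with the process running. Without the extra stop, terminate already reaches 0 and the following continue_ raises, which matches the exception described above.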

Now that we have a basic interactive loop, I’ll be able to add more interesting commands. At some point I’ll probably need to refactor input a bit, since a huge if/elif/else statement is going to get ugly fast, but I’ll worry about that when it gets out of hand. As usual, the latest ipydbg source is up on GitHub.
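
One common way to keep a growing command list manageable is a dispatch table that maps keys to handler functions. The following is a sketch of that idea in plain Python, not a preview of how ipydbg ends up doing it. All names here are hypothetical, and read_key stands in for Console.ReadKey.

```python
def cmd_continue(state):
    state["log"].append("Continuing")
    return True  # True means: leave the input loop and resume the debuggee


def cmd_quit(state):
    state["log"].append("Quitting")
    state["quit"] = True
    return True


COMMANDS = {" ": cmd_continue, "q": cmd_quit}


def input_loop(read_key, state):
    """Read keys until a command asks to leave the loop."""
    while True:
        handler = COMMANDS.get(read_key())
        if handler is None:
            state["log"].append("Please enter a valid command")
        elif handler(state):
            return
```

Adding a command then means writing one function and one dictionary entry, and commands that should keep the prompt open simply return False.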

Posted By Harry Pierson at 2:06 PM Pacific Standard Time

Monday, March 02, 2009

Writing an IronPython Debugger: Setting a Breakpoint

Now that we have a debugger process up and running, let’s start adding some actual features. First up, we want to be able to set breakpoints. One of the nice things MDbg does is auto-set a breakpoint on the entrypoint function. For ipydbg, we’re going to auto-set a breakpoint on the first line of the python file being debugged.

In order to set a breakpoint, we need debugger symbols. They allow us to translate between “line one of simpletest.py” and the actual location in the code and back. We’re all used to seeing the PDB files that are produced when we compile a C# assembly. Unsurprisingly, the symbol store binder provides a method to load these PDB files from disk. But where do IronPython debug symbols come from? I know from my extensive reading of the ipy.exe command line parameters that you pass -D to enable application debugging, but since all the IL is being generated in memory, how does the debugger get access to the PDB files?

It turns out the debugger API includes an UpdateModuleSymbols callback method that the runtime uses to notify the debugger when the symbols change. The debugger symbols are provided in an IStream, and then you use the symbol binder to get a symbol reader. The .NET Framework already provides a managed API for reading and writing debug symbols. However, that API doesn’t support loading symbols from a stream, so the MDbg code includes its own wrapper around the symbol binder API to include that functionality. Here’s some code to get the debug symbol reader for an updated module and iterate through the associated files:

sym_binder = SymbolBinder()  
    
def OnUpdateModuleSymbols(s,e):  
  print "OnUpdateModuleSymbols"  
    
  metadata_import = e.Module.GetMetaDataInterface[IMetadataImport]()  
  reader = sym_binder.GetReaderFromStream(metadata_import, e.Stream)  

  for doc in reader.GetDocuments():   
    print "\t", doc.URL

process.OnUpdateModuleSymbols += OnUpdateModuleSymbols

If we run this version of ipydbg on simpletest.py with the IPy 2.0.1 release and the Python standard library installed, OnUpdateModuleSymbols gets called six times, once for each python file that gets loaded when simpletest runs (site.py, os.py, ntpath.py, stat.py, UserDict.py and simpletest.py). BTW, I tried running this code on the latest build of IPy (changeset 47624) and I’m getting a COM Interop exception. So for now, stick with 2.0.1.

Now that we can get these dynamically generated debug symbols, we can use them to create a breakpoint on the first line of the script being debugged. Every time OnUpdateModuleSymbols is called, I try to bind the initial breakpoint (unless it’s already been bound of course) by calling the following create_breakpoint function.

def create_breakpoint(doc, line, module, reader):
  line = doc.FindClosestLine(line)
  method = reader.GetMethodFromDocumentPosition(doc, line, 0)
  function = module.GetFunctionFromToken(method.Token.GetToken())
  
  for sp in get_sequence_points(method):
    if sp.doc.URL == doc.URL and sp.start_line == line:
      bp = function.ILCode.CreateBreakpoint(sp.offset)
      bp.Activate(True)
      return bp
      
  bp = function.CreateBreakpoint()
  bp.Activate(True)
  return bp

This code translates a given document/line into a function/offset where we can set a breakpoint. To do this, we use sequence points which, as per Rick Byers, are “used to mark a spot in the IL code that corresponds to a specific location in the original source”. So once we find the function that corresponds to a given line of code, we iterate over the sequence points until we find the one that matches the line we want to break on. If we find a matching sequence point, we set the breakpoint there. If we don’t, we set the breakpoint on the function itself. get_sequence_points is a simple wrapper around ISymbolMethod.GetSequencePoints. The original API is pretty ugly to use – managing six separate arrays of information – so get_sequence_points turns it into a generator function you can iterate over.
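
The "six parallel arrays into one generator" trick is worth seeing on its own. This sketch uses plain lists in place of the .NET arrays. The field names offset, doc and start_line match how sequence points are used in the code above, while the remaining three names and the function name sequence_points are my assumptions.

```python
from collections import namedtuple

SequencePoint = namedtuple(
    "SequencePoint",
    ["offset", "doc", "start_line", "start_col", "end_line", "end_col"])


def sequence_points(offsets, docs, start_lines, start_cols, end_lines, end_cols):
    """Yield one record per index from six parallel sequences."""
    columns = zip(offsets, docs, start_lines, start_cols, end_lines, end_cols)
    for fields in columns:
        yield SequencePoint(*fields)
```

Callers can then write a single for loop and use attribute access such as sp.offset instead of indexing into six arrays by hand.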

Now that the breakpoint is set, we want to trap the breakpoint event as well. That’s easy enough, we create an event handler for process.OnBreakpoint similar to the OnUpdateModuleSymbols event above. Eventually, we’ll have the ability to step when we break, but for now I’m just going to print out the current location when the breakpoint is hit. This is kind of the reverse of the operation above. Setting a breakpoint means going from a source location to an IL offset within a function. Printing the current location means going from an IL offset in a function back to the source location. Here’s the function to do that:

def get_location(reader, thread): 
  frame = thread.ActiveFrame 
  function = frame.Function 
   
  offset, mapping_result = frame.GetIP() 
  method = reader.GetMethod(SymbolToken(frame.Function.Token)) 
   
  real_sp = None 
  for sp in get_sequence_points(method): 
    if sp.offset > offset:  
      break 
    if sp.start_line != 0xfeefee:  
      real_sp = sp 
       
  if real_sp is None:
    return "Location (offset %d)" % (offset) 
   
  return "Location %s:%d (offset %d)" % ( 
    Path.GetFileName(real_sp.doc.URL), real_sp.start_line, offset) 

def OnBreakpoint(s,e):
  print "OnBreakpoint", get_location(
    symbol_readers[e.Thread.ActiveFrame.Function.Module], e.Thread)

Given a symbol reader and a debug thread, get_location returns a location string. It loops thru the sequence points, similar to create_breakpoint, in order to find the closest corresponding line of Python code to the current offset (check out Mike Stall’s blog for why I’m checking for 0xfeefee). In order to make this work, I need the symbol reader for the module that I retrieved in OnUpdateModuleSymbols. For now, I’m stashing the readers in a global dictionary named symbol_readers, keyed by module, where OnBreakpoint can access them.
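The lookup at the heart of get_location can be tried out without a debugger attached. The sketch below is a simplified stand-in, assuming the sequence points arrive as (IL offset, start line) pairs sorted by offset; closest_sequence_point and HIDDEN_LINE are names invented for this example.

```python
HIDDEN_LINE = 0xfeefee  # line number used for sequence points with no source line

def closest_sequence_point(points, offset):
  # points: iterable of (il_offset, start_line) pairs sorted by il_offset.
  # Returns the last non-hidden pair at or before offset, or None.
  best = None
  for il_offset, start_line in points:
    if il_offset > offset:
      break
    if start_line != HIDDEN_LINE:
      best = (il_offset, start_line)
  return best
```

When the current offset falls inside a hidden sequence point, the function reports the nearest real line before it, which is the behavior the real_sp variable provides in get_location.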

Ipydbg isn’t interactive yet, but it is now running, setting a breakpoint and successfully breaking at that breakpoint. As usual, the latest version of ipydbg is up on GitHub.

Posted By Harry Pierson at 3:59 PM Pacific Standard Time

Saturday, February 28, 2009

CodeHTMLer Language Definition for Python

As I’ve blogged before, I use CodeHTMLer to post code snippets on my blog. I hear SyntaxHighlighter is the new hotness, but since it relies on CSS the syntax highlighting only appears on the website and not in the RSS reader.

The problem with CodeHTMLer is that it only supports a handful of languages out of the box. But the language definition file is simple enough – just an XML file with a bunch of regular expressions. When I was doing a lot of F# work, I wrote an F# language definition. Now that I’m on the IronPython team, go figure I’m writing a lot of code in Python. I *know* I’ve written a Python language definition for CodeHTMLer more than once, but I would forget to post it and then lose it when I paved my laptop hard drive. So after doing this three or four times, I’ve finally remembered to put it up on my SkyDrive.

If you want to install this yourself to colorize Python code snippets with CodeHTMLer, follow the directions I posted earlier with the F# language definition.

Posted By Harry Pierson at 8:26 AM Pacific Standard Time

Friday, February 27, 2009

Writing an IronPython Debugger: Hello, Debugger!

Since I’m guessing most of my readers have never built a debugger before (I certainly hadn’t), let’s start with the debugger equivalent of Hello, World!

import clr  
clr.AddReference('CorDebug')  

import sys  
from System.Reflection import Assembly  
from System.Threading import AutoResetEvent  
from Microsoft.Samples.Debugging.CorDebug import CorDebugger  

ipy = Assembly.GetEntryAssembly().Location  
py_file = sys.argv[1]  
cmd_line = "\"%s\" -D \"%s\"" % (ipy, py_file)  

evt = AutoResetEvent(False)  

def OnCreateAppDomain(s,e):  
  print "OnCreateAppDomain", e.AppDomain.Name  
  e.AppDomain.Attach()  

def OnProcessExit(s,e):  
  print "OnProcessExit"  
  evt.Set()  

debugger = CorDebugger(CorDebugger.GetDefaultDebuggerVersion())  
process = debugger.CreateProcess(ipy, cmd_line)  

process.OnCreateAppDomain += OnCreateAppDomain  
process.OnProcessExit += OnProcessExit  

process.Continue(False)  

evt.WaitOne()

I start by adding a reference to the CorDebug library I discussed at the end of my last post (that’s the low level managed debugger API plus the C# definitions of the various COM APIs). Then I need both the path to the IPy executable as well as the script to be run, which is passed in on the command line (sys.argv). For now, I just use Reflection to find the path to the current ipy.exe and use that. I use those to build a command line – you’ll notice I’m adding the -D on the command line to generate debugger symbols.

Next, I define two event handlers: OnCreateAppDomain and OnProcessExit. When the AppDomain is created, the debugger needs to explicitly attach to it. When the process exits, we signal an AutoResetEvent to indicate our program can exit.
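The same wait-until-signaled pattern can be sketched with nothing but the standard library. This is only an analogy: threading.Event stands in for the AutoResetEvent (it is manual-reset, so it stays set after being signaled), and fake_debuggee is a made-up placeholder for the debugged process.

```python
import threading

exited = threading.Event()   # plays the role of the AutoResetEvent
log = []

def on_process_exit():
  log.append("OnProcessExit")
  exited.set()               # wake up the waiting main thread

def fake_debuggee():
  # Stand-in for the debugged process: do some work, then "exit".
  log.append("running")
  on_process_exit()

worker = threading.Thread(target=fake_debuggee)
worker.start()
exited.wait()                # like evt.WaitOne()
worker.join()
```

The main thread does no polling. It simply blocks until the exit handler, running on another thread, signals the event.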

Then it’s a simple process of creating the CorDebugger object, creating a process, setting up the process event handlers and then running the process via the call to Continue. We then wait on the AutoResetEvent for the debugged process to exit. And voila, you have the world’s simplest debugger in about 30 lines of code.

To run it, you run the ipy.exe interpreter and pass in the ipydbg script above and the Python script to be debugged. You also have to pass -X:MTA on the command line, as the ICorDebug objects only work from a multi-threaded apartment. When you run it, you get something that looks like this:

» ipy -X:MTA ipydbg.py simpletest.py
OnCreateAppDomain DefaultDomain
35
OnProcessExit

Simpletest.py is a very simple script that prints the results of adding two numbers together. Here, you see the event handlers fire by writing text out to the console.

For those of you who’d like to see this code actually run on your machine, I’ve created an ipydbg project up on GitHub. The tree version that goes with this blog post is here. If you’re not running Git, you can download a tar or zip of the project via the “download” button at the top of the page. It includes both the CorDebug source as well as the ipydbg.py file (shown above) and the simpletest.py file. It also has a compiled version of CorDebug.dll, so you don’t have to compile it yourself (for those IPy only coders who don’t have VS on their machine).

Posted By Harry Pierson at 5:41 PM Pacific Standard Time

Writing an IronPython Debugger: MDbg 101

Before I start writing any debugger code, I thought it would help to quickly review the .NET debugger infrastructure that is available as well as the design of the MDbg command line debugger. Please note, my understanding of this stuff is fairly rudimentary – Mike Stall is “da man” if you’re looking for a .NET debugger blogger to read.

The CLR provides a series of unmanaged APIs for things like hosting the CLR, reading and writing CLR metadata and – more relevant to our current discussion – debugging as well as reading and writing debugger symbols. These APIs are exposed as COM objects. The CLR Debugging API allows you to do all the things you would expect to be able to do in a debugger: attach to processes (actually, app domains), create breakpoints, step thru code, etc. Of course, being an unmanaged API, it’s pretty much unavailable to be used from IronPython. Luckily, MDbg wraps this unmanaged API for us, making it available to any managed language, including IronPython.

The basic design of MDbg looks like this:

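[Diagram: the MDbg assemblies stacked in layers. From bottom to top: raw, corapi, mdbgeng, then MDbg.exe alongside mdbgext and its extensions (…).]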

At the bottom is the “raw” assembly, which contains the C# definitions of the unmanaged debugger API – basically anything that starts with ICorDebug and ICorPublish. Raw also defines some of the metadata API, since that’s how type information is exposed to the debugger.

The next level up is the “corapi” assembly, which I refer to as the low-level managed debugger API. This is a fairly thin layer that translates the unmanaged paradigm into something more palatable to managed code developers. For example, COM enumerators such as ICorDebugAppDomainEnum are exposed as IEnumerable types. Also, the managed callback interface gets exposed as .NET events. It’s not perfect – the code is written in C# 1.0 style so there are no generics or yields.
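As an illustration of that kind of translation, here is a small Python sketch that turns a batch-oriented enumerator into something you can loop over. FakeComEnum and its next_batch method are hypothetical stand-ins, not the real ICorDebug interfaces or the corapi code.

```python
class FakeComEnum(object):
  # Hypothetical stand-in for a COM-style enumerator such as
  # ICorDebugAppDomainEnum: items are fetched in batches.
  def __init__(self, items):
    self._items = list(items)
    self._pos = 0

  def next_batch(self, count):
    batch = self._items[self._pos:self._pos + count]
    self._pos += len(batch)
    return batch

def iterate(com_enum, batch_size=2):
  # Keep asking for batches until one comes back empty.
  while True:
    batch = com_enum.next_batch(batch_size)
    if not batch:
      return
    for item in batch:
      yield item
```

The caller sees an ordinary iterable and never has to deal with batch sizes or fetch counts.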

Where corapi is the low-level API, “mdbgeng” is the high-level managed debugger API. As you would expect, it wraps the low-level API and provides automatic implementations of common operations. For example, this layer maintains a list of breakpoints so you can create them before the relevant assembly has been loaded. Then when assemblies are loaded, it goes thru the list of unbound breakpoints to see if any can be bound. It’s also this layer that automatically creates the main entrypoint breakpoint.
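The unbound breakpoint bookkeeping is easy to picture with a toy version. This sketch is an assumption about the general shape, not mdbgeng’s implementation: breakpoints are reduced to (module name, line) pairs and the PendingBreakpoints class is invented for the example.

```python
class PendingBreakpoints(object):
  # Hypothetical sketch: a breakpoint is a (module_name, line) pair.
  def __init__(self):
    self.unbound = []
    self.bound = []

  def add(self, module_name, line):
    # The module may not be loaded yet, so just remember the request.
    self.unbound.append((module_name, line))

  def on_module_load(self, module_name):
    # Bind every pending breakpoint that belongs to the new module.
    matched = [bp for bp in self.unbound if bp[0] == module_name]
    self.unbound = [bp for bp in self.unbound if bp[0] != module_name]
    self.bound.extend(matched)
    return matched
```

A debugger built this way would call on_module_load from its module load event and set real breakpoints for whatever comes back.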

Finally, at the top we have the MDbg application itself, as well as any MDbg extensions (represented by the … in the diagram above). The mdbgext assembly defines the types shared between MDbg.exe and the extension assemblies. MDbg has some cool extensions – including an IronPython extension – but for now I’m focused on building something as lightweight as possible, so I’m going to forgo an extensibility mechanism, at least for now.

My initial prototype was written against the high-level API. There were two problems with this approach. The first is that there’s no support for Just My Code in the high-level API. As I mentioned in my last post, JMC support is critical for this project. Adding JMC support isn’t hard, but I’m trying to make as few changes as possible to the MDbg source, since I’m not interested in forking and maintaining that code. Second, while the low-level API provides an event-based API (OnModuleLoad, OnBreakpoint, OnStepComplete, etc), the high-level API provides a more console-oriented looping API. I found the event-driven API to be cleaner to work with and I’m thinking it will work better if I ever build a GUI version of ipydbg. So I’ve decided to work against the low-level API (aka corapi).

I mentioned above that I didn’t want to change the MDbg source, but I did make one small change. The separation of corapi and raw into two separate assemblies is an outdated artifact of an earlier version of MDbg. So I decided to combine these two into a single assembly called CorDebug. Other than some simple cleanup to assembly level attributes to make a single assembly possible, I haven’t changed the source code at all.

Posted By Harry Pierson at 3:33 PM Pacific Standard Time

Writing an IronPython Debugger: Introduction

A while back I showed how you can use Visual Studio to debug IronPython scripts. While that works great, it’s lots of steps and lots of mouse work. I yearned for something lighter weight and that I could drive from the command line.

The .NET framework includes a command line debugger called MDbg, but after using it for a bit, I found I didn’t like it very much for IronPython debugging. MDbg automatically sets a breakpoint on the main entrypoint function, but only if it can find the debugging symbols. So when you use MDbg with the released version of IPy, the breakpoint never gets set. Instead, you have to trap the module load event, set a breakpoint in the Python file you’re debugging, then stop trapping the module load event. Every Time. That gets tedious.

Another problem with MDbg is that it’s not Just-My-Code (aka JMC) aware. JMC is this awesome debugging feature that was introduced in .NET 2.0 that lets the debugger “paint” the parts of the code that you want to step thru (aka “My Code”). By default, Visual Studio marks code with symbols as “my code” and code without symbols as “not my code”. [1] We don’t ship symbols with IronPython releases, so Visual Studio only steps thru the Python code. MDbg doesn’t support JMC, so I often found myself stepping into random parts of the IronPython implementation. That’s even more tedious.

Luckily, the source code to MDbg is available. So I got the wacky idea to build a debugger specifically for IronPython. CPython includes pdb (aka Python Debugger, not Program Database) but we don’t support it because we haven’t implemented settrace. Thus, ipydbg was born.

Over the course of this series of blog posts, I’m going to build out ipydbg. I have built out a series of prototypes so I’m fairly confident that I know how to build it. However, I’m not sure what it will look like at the end. If you’ve got any strong opinions on it one way or the other, be sure to email me or leave me comments.

BTW, major thanks to my VSL teammate Mike Stall (of Mike Stall's .NET Debugging Blog). Without his help, I would probably still be trying to make heads or tails of the MDbg source.


[1] VS uses the DebuggerNonUserCode attribute to provide fine grained control of what is considered “my code” and should be stepped thru.

Posted By Harry Pierson at 2:21 PM Pacific Standard Time

Sunday, February 15, 2009

IronPython 2.0.1

I’m on vacation this week, but I wanted to quickly point out that we shipped IronPython v2.0.1 last Friday. This has been a performance focused release, as you can see via our 2.0 vs. 2.0.1 benchmarks. We have improved our PyStone performance by about 11.5% and our Richards performance by just over 4%. Thanks to Dino for the perf improvements and Dave for the great performance report.

Posted By Harry Pierson at 8:20 PM Pacific Standard Time

Thursday, January 29, 2009

IronPython and CodeDOM: Dynamically Compiling C# Files

As part of my series on using IronPython with WPF [1], I built an extension method in C# that does dynamic member resolution on WPF FrameworkElements. The upshot of this code is that I can write “win1.listbox1” instead of “win1.FindName(‘listbox1’)” when using WPF objects from Python or any DLR language. Convenient, right?

The problem with this approach is that the C# extension method gets compiled into an assembly that’s bound to a specific version of the DLR. I recently started experimenting with a more recent build of IronPython and I couldn’t load the extension method assembly due to a conflict between the different versions of Microsoft.Scripting.dll. Of course, I could have simply re-compiled the assembly against the new bits, but that would mean every time I moved to a new version of IronPython, I’d have to recompile. Worse, it would limit my ability to run multiple versions of IronPython on my machine at once. I currently have three – count ‘em, *three* – copies of IronPython installed: 2.0 RTM, nightly build version 46242, and an internal version without the mangled namespaces of our public CodePlex releases. Having to manage multiple copies of my extension assembly would get annoying very quickly.

Instead of adding a reference to the compiled assembly, what if I could add a reference to a C# file directly? Kinda like how adding references to Python files works, but for statically compiled C#. That would let me write code like the following, which falls back to adding a reference to the C# file directly if adding a reference to the compiled assembly fails.

try:
  clr.AddReference('Microsoft.Scripting.Extension.Wpf.dll')
except:
  import codedom
  codedom.add_reference_cs_file('FrameworkElementExtension.cs',
    ['System', 'WindowsBase', 'PresentationFramework',
     'PresentationCore', 'Microsoft.Scripting'])

Since this technique uses CodeDOM, I decided to encapsulate the code in a Python module named codedom, which is frankly pretty simple. As a shout-out to my pals on the VB team, I broke compiling out into its own separate function so I could easily support adding VB as well as C# files.

def compile(prov, file, references):
  cp = CompilerParameters()
  cp.GenerateInMemory = True
  for ref in references:
    a = Assembly.LoadWithPartialName(ref)
    cp.ReferencedAssemblies.Add(a.Location)

  cr = prov.CompileAssemblyFromFile(cp, file)
  if cr.Errors.Count > 0:
    raise Exception(cr.Errors)
  return cr.CompiledAssembly
    
def add_reference_cs_file(file, references):
  clr.AddReference(compile(CSharpCodeProvider(), file, references))
  
def add_reference_vb_file(file, references):
  clr.AddReference(compile(VBCodeProvider(), file, references))

The compile function uses a CodeDOM provider, which provides a convenient function to compile an assembly from a single file. The only tricky part was adding the references correctly. Of the five references in this example, the only one CodeDOM can locate automatically is System.dll. For the others, it appears that CodeDOM needs the full path to the assembly in question.

Of course, hard-coding the assembly paths in my script would be too fragile, so instead I use partial names. I load each referenced assembly via Assembly.LoadWithPartialName then pass its Location to the CodeDOM provider via the CompilerParameters object. I realize that loading an assembly just to find its location is kind of overkill but a) I couldn’t find another mechanism to locate an assembly’s location given only a partial name and b) I’m going to be loading the referenced assemblies when I load the generated assembly anyway, so I figured loading them to find their location wasn’t a big deal. Note that while you’re probably used to passing a string to clr.AddReference, it can also accept an assembly object directly.

Of course, this approach isn’t what you would call “fast”. Loading the pre-compiled assembly is much, much faster than compiling the C# file on the fly. But I figure slow code is better than code that doesn’t work at all. Besides, the way the code is written, I only take the extra compile hit if the pre-compiled assembly won’t load.
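Stripped of the .NET specifics, the fast-path-then-fallback shape looks something like the sketch below. The load_with_fallback helper and its two callable parameters are hypothetical. In the script above, their roles are played by clr.AddReference and codedom.add_reference_cs_file.

```python
def load_with_fallback(load_precompiled, compile_from_source):
  # Both arguments are zero-argument callables (hypothetical names).
  # The slow path only runs when the fast path raises.
  try:
    return load_precompiled()
  except Exception:
    return compile_from_source()
```

The expensive compile step is never touched when the pre-compiled load succeeds, so the common case stays fast.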

I stuck my codedom.py file up on my SkyDrive. Feel free to leverage as you need.


[1] I had to put that series on the back burner in part because the December update to Windows Live totally broke my WPF photo viewing app. I’ve got a new WPF app I’m working on, but I’m not quite ready to blog about it yet.

Posted By Harry Pierson at 4:53 PM Pacific Standard Time

Wednesday, January 07, 2009

Nightly Builds Technical Info

Here are some technical details on my Nightly Builds solution. I broke them into a separate post because I figured most people are more interested in the actual service than how it’s built.

As you might expect, I built most of the solution in IronPython. All of the download, build, compress and Azure upload code was written in IPy. The one part I didn’t write in IPy was the Azure cloud web app, which I wrote in C#. Jon Udell’s been investigating getting IPy to run in Azure, but I just wanted something quick and dirty (as you can see from the utter lack of formatting) so I decided to use C# instead. Man, were my ASP.NET skills rusty.

As for the IronPython parts, for the most part I’m using external tools for downloading, building and compressing. I use the Source Control RSS Feed to discover recent source code changesets, CodePlex Client to download source from CodePlex, MSBuild to build the binaries, 7-zip to compress the binaries and the StorageClient library sample to upload the compressed binaries up to Azure blob storage.
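Discovering changesets from an RSS feed comes down to pulling a couple of fields out of each item element. The sketch below uses the standard library’s ElementTree rather than the .NET XML classes, and SAMPLE_FEED is a made-up feed for illustration. The real CodePlex feed carries more fields, and its exact titles are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>Changeset 1001</title><link>http://example.com/1001</link></item>
  <item><title>Changeset 1002</title><link>http://example.com/1002</link></item>
</channel></rss>"""

def list_changesets(feed_xml):
  # Return (title, link) for each <item> in an RSS 2.0 document.
  root = ET.fromstring(feed_xml)
  return [(item.findtext("title"), item.findtext("link"))
          for item in root.iter("item")]
```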

For building and compressing, I’m literally shelling out to MSBuild and 7-Zip via os.system. I looked at programmatically building via the MSBuild API, but I ran into an assembly binding bug that I wasn’t motivated enough to work around. As for creating zip files programmatically, IronPython doesn’t have a zlib module implementation yet so I just used 7-Zip’s command line utility instead.
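For anyone who wants to try the shell-out approach, here is a hedged sketch that uses subprocess instead of os.system. The file names are placeholders, and it assumes msbuild and 7z are both on the PATH. It is not the actual nightly build script.

```python
import subprocess

def build_commands(solution, archive, bin_dir):
  # Returns the two command lines as argument lists.
  # "msbuild" and "7z" are assumed to be on PATH.
  build = ["msbuild", solution, "/p:Configuration=Release"]
  compress = ["7z", "a", archive, bin_dir]
  return build, compress

def run_all(commands):
  for cmd in commands:
    # check_call raises CalledProcessError on a non-zero exit code,
    # which os.system would only report as a return value.
    subprocess.check_call(cmd)
```

Passing the arguments as a list avoids having to quote paths that contain spaces, and a failed build stops the script before it compresses stale binaries.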

For downloading from CodePlex, I originally started by shelling out to CodePlex Client. However, I wanted the ability to cloak folders – for example \Tutorial and \Src\Tests – that weren’t required to build. CodePlex Client has a very useful TFS library embedded in it – the build process combines all the libraries into a single executable via ILMerge. I could have compiled my own version of the TFS library, but instead I just load cpc.exe as an assembly reference via clr.AddReferenceToFileAndPath. It’s a nifty trick Jim Hugunin showed me once.

Uploading to Azure was very straightforward because of the StorageClient library. Here’s the code to create a blob container object (creating the actual blob container if it doesn’t already exist) and to upload a file to a container.

def get_blob_container(prj):
  azure_account = StorageAccountInfo(endpoint, None, azure_name, azure_key)
  storage = BlobStorage.Create(azure_account)
  container = storage.GetBlobContainer(prj.lower())
  if not container.DoesContainerExist():
    print "Creating", prj, "Azure Blob Storage Container"
    container.CreateContainer(None, ContainerAccessControl.Public)
  return container

def upload_to_azure(container, upload_filepath, azure_filename, metadata):
    print "Uploading", azure_filename, "to Azure"
    prop = BlobProperties(azure_filename)
    nv = NameValueCollection()
    for key in metadata:
      nv[key] = metadata[key]
    prop.Metadata = nv
    
    with File.OpenRead(upload_filepath) as stream:
      contents = BlobContents(stream)
      if not container.CreateBlob(prop, contents, True):
        raise Exception("Uploading " + azure_filename + " to Azure failed")

I’ve been working on some pure IronPython code to access the blob storage REST API directly, but that’s primarily to familiarize myself with the service. At some point, I’m going to want to leverage Table Storage but my brief experimentation with the StorageClient Table Storage interface makes me think that it depends on static typing too much to be useful for IPy. If that turns out to be true, the Table Storage REST API will be my only option.

As you can see in the code above, these Azure blob containers are set to be publicly accessible (via the ContainerAccessControl.Public argument passed to CreateContainer). So for my C# app, I’m simply calling XDocument.Load with the List Blobs operation url, shaping the results via LINQ to XML and binding them to nested ASP.NET Repeater controls.

Assuming people find this useful, I’m thinking of some additional improvements, in order of what I’m likely to get to first:

  • Caching Project Info in the cloud app
    Currently, I’m getting and processing the list of binary releases on every request. I’m sure caching that data would make it more efficient.
  • Virtual Build Environment
    Currently, I’m just building on my laptop. It would be nice to have a clean environment dedicated to running the build script.
  • Auto-Build
    My script uses the RSS feed to find the recent checkins, but I have to manually kick off the process. I’d like to set it up as a service that periodically checks the source code RSS feed automatically and downloads and builds any new releases that it finds.
  • Table Storage for Build Metadata
    Today, I am simply grabbing the list of all uploaded compressed binaries for a given project, parsing their names, and displaying that as a hierarchical list on the project page. If I used Table Storage, I could add additional metadata including social software features like ratings and comments.
  • Amazon EC2 Virtual Build Environment
    If I’m creating a virtual machine for my build environment, I could look at hosting it on Amazon EC2. They support Windows now after all. Ideally, I’d use an Azure worker role for compiling and compressing builds, but our build tools need access to the file system.
Posted By Harry Pierson at 3:23 PM Pacific Standard Time

IronPython Nightly Builds

IronPython 2.0 shipped about a month ago, but we’re still chugging along with our post 2.0 work. We’ve shipped seven source code releases since we shipped 2.0 and we should be back to our normal schedule of updating the source 2-3 times a week by next week. Given how often we ship source, we’re thinking of extending the time between binary drops. Binary releases have to be signed and there’s a fairly arduous process we have to go thru in order to get each binary release out the door.

However, there’s something nice and convenient about downloading a pre-compiled binary release. So I spent my Christmas vacation building a script to download and build IronPython nightly builds. Once built, I compress the binaries and upload them to Azure blob storage. Finally, I built a *very* simple cloud app for users to view and download available nightly builds. As an extra benefit, I’m also providing nightly builds of the DLR.

Please note, these are *NOT* official Microsoft releases of IronPython and/or DLR. They aren’t signed and they haven’t gone through the aforementioned release process. I’m just downloading the public source, building it with the publicly available tools, then making the results available on a publicly accessible website.

The website for the IronPython (and DLR) nightly builds is http://nightlybuilds.cloudapp.net.

As usual, I welcome any feedback. Is having prebuilt unsigned binaries of IPy releases useful? Do you want IronRuby binaries as well? What about social features (rating releases, comments, etc)? Please let me know what you think.

Posted By Harry Pierson at 3:18 PM Pacific Standard Time

Tuesday, December 16, 2008

IronPython and LiveFX: Raw HTTP Access

One of the cool things about the Live Framework is that while there’s a convenient .NET library available, you can use the raw HTTP interface from any platform. LiveFX data is served up over HTTP and is available in ATOM, RSS, JSON or POX formats. As I’ve already shown, you can easily use the .NET library from IronPython, but I wanted to try working with the raw HTTP interface to get a feel for that as well.

Unfortunately, it was harder than I expected it to be. The big issue is that the documentation on how to get LiveFX authorization tokens via raw HTTP is fairly sparse and occasionally contradictory. For example, there’s a whole section on Authentication and Live Framework, but it doesn’t cover this scenario. Luckily, I was able to figure it out with the help of the AtomPub Project Manager LiveFX Sample, a post on Alex Feinman’s blog, a post on Emmanuel Mesas’ blog and a little groveling around with Reflector. It does appear that the auth docs are in flux – Emmanuel refers to this MSDN article as being about RPS Soap requests, but it’s actually about delegated authority. (Is MSDN reusing URLs? Bad idea.) Also, the sample code has a comment that reads “to be replaced by delegated authorization” so it looks like changes are coming. In other words, no promises on how long this code will work!

If you look at the AtomPub Project Manager sample, there’s a WindowsLiveIdentity.cs file that implements a static GetTicket method that looks similar to both the code on Alex’s blog as well as the implementation of GetWindowsLiveAuthenticationToken. The upshot is that there’s a WS-Trust endpoint for Windows Live at https://dev.login.live.com/wstlogin.srf. You send it a RequestSecurityToken (aka RST) message (with a couple of extra WL specific extensions) and it responds with the security token you’ll need for accessing the LiveFX HTTP endpoints.

I ported the GetTicket function over to IronPython. I’m using .NET classes like WebRequest and XmlReader, but there’s nothing fancy here so I would expect it to be easy enough to port over to the standard Python library.

def get_WL_ticket(username, password, compactTicket):
    req = WebRequest.Create(_LoginEndPoint)
    req.Method = "POST"
    req.ContentType = "application/soap+xml; charset=UTF-8"
    req.Timeout = 30 * 10000
    
    rst = get_RST_message(username, password, compactTicket)
    rstbytes = Encoding.UTF8.GetBytes(rst)
    with req.GetRequestStream() as reqstm:
      reqstm.Write(rstbytes, 0, rstbytes.Length)
      
    with req.GetResponse() as resp:
      with resp.GetResponseStream() as respstm:
        with XmlReader.Create(respstm) as reader:
          if compactTicket:
            name = "BinarySecurityToken"
            namespace = "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"
          else:
            name = "RequestedSecurityToken"
            namespace = "http://schemas.xmlsoap.org/ws/2005/02/trust"

          if not reader.ReadToDescendant(name, namespace):
            raise Exception("couldn't find security token element")
          
          reader.ReadStartElement(name, namespace)
          token = reader.ReadContentAsString()
          reader.ReadEndElement()
          
          return Convert.ToBase64String(Encoding.UTF8.GetBytes(token))

This code simply uses a WebRequest object to post the RST message to the WS-Trust endpoint then parses the result to find the token. get_RST_message uses standard Python string formatting to generate the RST message that gets posted to the WS-Trust endpoint. I’m not exactly sure why you need to convert the token value to a byte array and then Base64 encode it, but that’s what the sample code does so I did it too.
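For anyone porting this to the standard Python library, the final encoding step might look roughly like the following. This is a sketch of just that one step, with encode_token as a made-up name. The request and XML parsing parts are not covered.

```python
import base64

def encode_token(token):
  # Text -> UTF-8 bytes -> Base64 text, the same two steps as
  # Convert.ToBase64String(Encoding.UTF8.GetBytes(token)).
  return base64.b64encode(token.encode("utf-8")).decode("ascii")
```

The result is plain ASCII, so it can be dropped straight into an HTTP header value.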

Once you have the authentication ticket, you need to download the root service endpoint document in order to get the base URL and the profiles link. Then you can download all the profiles or you can download a specific one if you know its leet-speak identifier. LiveFX data can be downloaded in a variety of formats: ATOM, JSON, RSS or POX. You choose your format by setting the Accept and Content-Type headers.

I wrote the following functions: a generic boilerplate download function as well as specific versions for downloading JSON and POX:

def download(url, contentType, authToken):
  req = WebRequest.Create(url)
  req.Accept = contentType
  req.ContentType = contentType
  req.Headers.Add(HttpRequestHeader.Authorization, authToken)
  
  return req.GetResponse() 
  
def download_json(url, authToken):
  resp = download(url, 'application/json', authToken)
  with StreamReader(resp.GetResponseStream()) as reader: 
      data = reader.ReadToEnd()
      return eval(data)

def download_pox(url, authToken):
  resp = download(url, 'text/xml', authToken)
  return XmlReader.Create(resp.GetResponseStream())

Using JSON in Python is really easy, since I can simply eval the returned string and get back Python dictionary objects, similar to what you can do in Javascript.
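One caveat is worth spelling out. eval only works while the payload sticks to literals that JSON and Python share. JSON’s true, false and null aren’t Python names, and eval will run whatever expression the server sends back. Where a json module is available (it ships with CPython 2.6 and later), a safer variant could look like this sketch. The sample document is made up, and only the BaseUri key comes from the code in this post.

```python
import json

def parse_json(text):
  # json.loads understands true/false/null and never executes code.
  return json.loads(text)

# Made-up sample document, for illustration only.
sample = '{"BaseUri": "https://example.invalid/", "Active": true, "Owner": null}'
doc = parse_json(sample)
```

The result is still an ordinary Python dictionary, so lookups like service['BaseUri'] work the same way.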

Here’s some code that uses the get_WL_ticket and download_json functions above to retrieve the user’s Personal Status Message:

#Get user's WL ticket
uid = raw_input("enter WL ID: ")   
pwd = raw_input("enter password: ")  

authToken = livefx_http.get_WL_ticket(uid, pwd, True)  

#download root service document
service = livefx_http.download_json(_LiveFxUri, authToken)  

#download general profile document
url = service['BaseUri'] + service['ProfilesLink'] + "/G3N3RaL"  

genprofile = livefx_http.download_json(url, authToken)  
print genprofile['ProfileBase']['PersonalStatusMessage']

POX is also fairly easy, though a bit more verbose than JSON. The sample code, which I have stuck on my SkyDrive, includes both POX and JSON code, so you can compare and contrast the differences.

Posted By Harry Pierson at 3:36 PM Pacific Standard Time

IronPython and LiveFX: Ori’s LiveOE.py

Ori Amiga is a Group Program Manager over in the Live Framework team whom you might have seen at PDC08 delivering the Lap Around LiveFX & Mesh Services and LiveFX Programming Model Architecture and Insights talks. And apparently, he’s an IronPython fan as he posted a small LiveFX Python module to his blog. It’s pretty simple – it only wraps Connect and ConnectLocal – but it does cut about ten lines of path appending, reference adding and module importing code into a single import statement. Here’s the profile access script from my last post rewritten to use Ori’s LiveOE module.

import LiveOE
from devhawk import linq

uid = raw_input("Enter Windows Live ID: ")
pwd = raw_input("Enter Password: ")

loe = LiveOE.Connect(uid, pwd)

general = linq.Single(loe.Profiles.Entries, 
  lambda e: e.Resource.Type == LiveOE.ProfileResource.ProfileType.General)

print loe.Mesh.ProvisionedUser.Name
print loe.Mesh.ProvisionedUser.Email
print general.Resource.ProfileInfo.PersonalStatusMessage
print linq.Count(loe.Contacts.Entries)

FYI, make sure you update the sdkLibsPath in LiveOE.py – I’m not sure where Ori has installed the LiveFX SDK, but it’s *not* in the location suggested by the read me file.

BTW, it turns out the WL Profile information is read only, which answers a question I had. However, reading the thread, it sounds like they will eventually get around to making it read-write.

Posted By Harry Pierson at 10:09 AM Pacific Standard Time

Friday, December 12, 2008

IronPython and LiveFX: Accessing Profiles

I recently got access to both the Windows Azure and Live Framework CTP programs. Frankly, I’m very interested in Live Mesh, so I decided to start with a simple LiveFX program. Scott (aka ScottIsAFool) at LiveSide posted a “quick and dirty” console app that pulls info from a user’s profile via LiveFX. It’s not Mesh per se, but it does use the same framework and resource model so I decided to port it to IronPython. FYI, this app won’t run unless you’ve received a LiveFX CTP token and provisioned yourself.

#Add LiveFX References
import sys
sys.path.append('C:\\Program Files\\Microsoft SDKs\\Live Framework SDK\\v0.9\\Libraries\\.Net Library')

import clr
clr.AddReference('Microsoft.LiveFX.Client')
clr.AddReference('Microsoft.LiveFX.ResourceModel')

from Microsoft.LiveFX.Client import LiveOperatingEnvironment
from Microsoft.LiveFX.ResourceModel.ProfileResource import ProfileType
from System.Net import NetworkCredential

from devhawk import linq

#get username and password from the user
uid = raw_input("Enter Windows Live ID: ")
pwd = raw_input("Enter Password: ")
creds = NetworkCredential(uid, pwd, "https://user-ctp.windows.net")

#print out user's info
loe = LiveOperatingEnvironment()
loe.Connect(creds)

general = linq.Single(loe.Profiles.Entries, 
  lambda e: e.Resource.Type == ProfileType.General)

print loe.Mesh.ProvisionedUser.Name
print loe.Mesh.ProvisionedUser.Email
print general.Resource.ProfileInfo.PersonalStatusMessage
print linq.Count(loe.Contacts.Entries)

I did modify the app slightly, reading the WLID and password off the console – I was *sure* I would accidentally post my personal credentials if I left them embedded in the app. Otherwise, it’s a straight port. First, I add references to the LiveFX DLLs. Since they’re not local to my script, I add the directory where they’re installed to sys.path, which lets me call clr.AddReference directly. Then I retrieve the user’s ID and password using raw_input (Python’s equivalent to Console.ReadLine). Finally, I connect to the user’s LiveOperatingEnvironment and pull their name, email address, personal status message and the number of contacts they have.

As per the original app, I use LINQ to find the right profile as well as count the number of contacts. I was able to reuse the linq.py file I wrote for my Rock Band song list screen scraper (though I did have to add the Count function since I hadn’t needed it previously). I’ve posted this script on my SkyDrive, and it includes my most recent linq.py file.

BTW, it doesn’t appear that you can set the PersonalStatusMessage programmatically, at least not currently. I was thinking it would be cool to build an app that sets your PSM via Twitter, but the set method of PersonalStatusMessage is marked internal. In fact, all the set methods of all the profile properties I looked at are marked internal. If someone knows how to update LiveFX resource objects in the current CTP, I’d appreciate it if you dropped me a line or left me a comment.

Posted By Harry Pierson at 5:50 PM Pacific Standard Time

Thursday, December 11, 2008

IronPython RTM News Gets Around

I just hit the MSDN home page, and what should I see?

[Image: msdn Home]

It’s cool to see JasonZ, aka my group’s general manager, blogging about our product.

I also fired up Visual Studio, and IronPython is the top headline there too:

[Image: VS home]

Not sure why the news is dated September 18th, but hey it’s really cool to see IronPython (not to mention the DLR, with the second headline) getting this kind of visibility.

Posted By Harry Pierson at 4:49 PM Pacific Standard Time

Wednesday, December 10, 2008

IPy RTW FTW!

This is a very pretty sight. It’s a screenshot from the IronPython CodePlex home page showing that 2.0 is the “current release”. Yes that’s right, dear reader, IronPython 2.0 has officially been released!

Get it now!

This release marks the end of a very busy year for me, nine months to the day since I accepted the offer to join the dynamic languages team. Between helping ship IronPython 2.0 and helping manage the languages and tools PDC08 track, I’ve been swimming in the deep end of the pool all year. Feels good to not have any immediate deliverables for the next month or two.

Major, major props to Dino, IronCurt, Dave and Srivatsn who have done the heavy lifting on the IPy side this release. Also major props to the DLR team, who are releasing the final 0.9 version of the DLR later today in concert with IPy 2.0. (Update: the DLR 0.9 RTW bits are now available) And of course, HUGE HUGE HUGE thanks to the vibrant IPy community, many of whom are listed by name in the release notes.

Even with 2.0 finally out the door, there’s no rest for the dynamic. As per the release notes, “we’re planning on releasing IronPython 2.0.1 fairly soon” so keep those bug reports coming. Going forward, we’ve got big plans for IronPython and we rely heavily on the continued input from our community, so please keep telling us where we can improve.

On a personal note, the past nine months have been busy – very busy – but they’ve also been a blast. Frankly, I was hesitant about joining the product groups for a long time because I was worried about the grind, the culture, the overall experience. Turns out my fears were overblown, though I’m thinking that’s at least partially related to the fact that I work on a “little” project like IronPython rather than a huge project like Visual Studio.

Posted By Harry Pierson at 2:57 PM Pacific Standard Time

Monday, December 01, 2008

IronPython and Linq to XML Part 4: Generating XML

Now that I have my list of Rock Band songs and I can get the right Zune metadata for most of them, I just need to write out the playlist XML. This is very straightforward to do with the classes in System.Xml.Linq.

def GenMediaElement(song):
  try:
    trackurl = zune_catalog_url + song.search_string
    trackfeed = XDocument.Load(trackurl)
    trackentry = First(trackfeed.Descendants(atomns+'entry'))
    trk = ScrapeEntry(trackentry)
    return XElement('media', (XAttribute(key, trk[key]) for key in trk))
  except:
    print "FAILED", song
    
zpl = XElement("smil",
  XElement("head",
    XElement("title", "Rock Band Generated Playlist")),
  XElement("body",
    XElement("seq", (GenMediaElement(song) for song in songs))))

settings = XmlWriterSettings()
settings.Indent = True
settings.Encoding = Encoding.UTF8
with XmlWriter.Create("rockband.zpl", settings) as xtw:
  zpl.WriteTo(xtw)

XElement’s constructor takes a name (XName to be precise) and any number of child objects. These child objects can be XML nodes (aka XObjects) or simple content objects like strings or numbers. If you pass an IEnumerable, the XElement constructor will iterate the collection and add all the items as children of the element. If you’ve had the displeasure of building an XML tree using the DOM, you’ll really appreciate XElement’s fluent interface. I was worried that Python’s significant whitespace would force me to put all the nested XElements on a single line, but luckily Python doesn’t treat whitespace inside parentheses as significant.

Creating collections in Python is even easier than it is in C#. Python supports a yield keyword, which is basically the equivalent of C#’s yield return. However, Python also supports generator expressions (the lazy cousins of list comprehensions), which are similar to F#’s sequence expressions. These are nice because you can specify a collection in a single line, rather than having to create a separate function, which is what you have to do to use yield. I have two generator expressions: (XAttribute(key, trk[key]) for key in trk) creates a collection of XAttributes, one for every item in the trk dictionary, and (GenMediaElement(song) for song in songs) generates a collection of XElements, one for every song in the song collection.
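
To make the yield versus generator expression comparison concrete, here is a minimal sketch in plain Python with no .NET types. The trk dictionary is a hypothetical stand-in for the one ScrapeEntry builds, and attribute_pairs is an illustrative name, not something from the original script.

```python
# Hypothetical stand-in for the trk dictionary built by ScrapeEntry.
trk = {'trackTitle': 'Pinball Wizard', 'trackArtist': 'The Who'}

def attribute_pairs(d):
    # Generator function: needs its own def plus a yield statement.
    for key in d:
        yield (key, d[key])

# Generator expression: the same lazy sequence, written inline.
pairs_inline = ((key, trk[key]) for key in trk)

# Both produce the same items; sorted() forces evaluation of each.
from_function = sorted(attribute_pairs(trk))
from_expression = sorted(pairs_inline)
```

Both forms are lazy, so nothing is computed until something (here sorted, in the post the XElement constructor) iterates over them.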

Once I’ve finished building the playlist XML, I need to write it out to a file. Originally, I used Python’s built in open function, but the playlist file had to be UTF-8 because of band names like Mötley Crüe. Zune’s software appears to always use UTF-8. In addition to setting the encoding, I also specify to use indentation, so the resulting file is somewhat readable by humans.
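
The encoding concern is easy to see outside of .NET as well. The following is only a sketch in plain Python, not the XmlWriter code from the script: it writes a band name through io.open with an explicit UTF-8 encoding and reads the raw bytes back. The write_utf8 and read_bytes helpers and the temporary file name are made up for illustration.

```python
import io
import os
import tempfile

def write_utf8(path, text):
    # io.open encodes the text as UTF-8 while writing.
    with io.open(path, 'w', encoding='utf-8') as f:
        f.write(text)

def read_bytes(path):
    # Read the file back as raw bytes to inspect the encoding.
    with io.open(path, 'rb') as f:
        return f.read()

band = u'M\u00f6tley Cr\u00fce'   # "Mötley Crüe"
path = os.path.join(tempfile.mkdtemp(), 'bands.txt')
write_utf8(path, band)
raw = read_bytes(path)
```

The two accented characters each take two bytes in UTF-8, which is why the file is two bytes longer than the string has characters.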

The playlist works great in the Zune software, but since it’s a streaming playlist there’s no easy way to automatically download all the songs and sync them to your Zune device. I expected to be able to right click on the playlist and select “download all", but there’s no such option. Zune does have a concept called Channels where the songs from a regularly updated feed are downloaded locally and synced to the device. However, the Zune software appears to be hardcoded to only download channels from the catalog service so I couldn’t tap into that. If anyone knows how to sign up to become a Zune partner channel, please drop me a line.

So there you have it. As usual, I’ve stuck the code up on my SkyDrive. If I can remember, I’ll try and run the script once a week and upload the new playlist to my SkyDrive as well.

Posted By Harry Pierson at 11:13 PM Pacific Standard Time
IronPython | LINQ | Rock Band | XML | Zune

Thursday, November 27, 2008

IronPython and Linq to XML Part 3: Consuming Atom Feeds

Now that I have my list of Rock Band songs, I need to generate a Zune playlist. I wrote that Zune just uses the WMP playlist format, but that’s not completely true. Media elements in a Zune playlist have several attributes that appear unique to Zune.

Because of Zune Pass, Zune supports the idea of streaming playlists where the songs are downloaded on demand instead of played from the local hard drive. In order to enable this, media elements in Zune playlists can have a serviceID attribute, a GUID that uniquely identifies the song on the Zune service. We also need the song’s album and duration – the Zune software summarily removes songs that don’t include the duration.

Of course, the Rock Band song list doesn’t include the Zune song service ID. It also doesn’t include the song’s album or duration. So we need a way, given the song’s title and artist (which we do have) to get its album, duration and service ID. Luckily, the Zune service provides a way to do exactly this, albeit an undocumented way. Via Fiddler2, I learned that Zune exposes a set of Atom feed web services on catalog.zune.net that the UI uses when you search the marketplace from the Zune software. There are feeds to search by artist and by album but the one we care about is the search by track. For example, here’s the track query for Pinball Wizard by The Who.

Since these feeds are real XML, I can simply use XDocument.Load to suck down the XML. Then I look for the first Atom entry element using LINQ to XML techniques similar to the ones I wrote about last time. If there are no Atom entry elements, that means that the search failed – either Zune doesn’t know about the song or it can’t find it via the Rock Band provided title and artist. Of the 461 songs on Rock Band right now, my script can find 417 of them on Zune automatically.

Of course, since the Zune data is in XML instead of HTML, finding the data I’m looking for is much easier than it was to find the Rock Band song data. Here’s the code to pull the relevant information we need out of the Zune catalog feed.

def ScrapeEntry(entry):  
  id = entry.Element(atomns+'id').Value 
  length = entry.Element(zunens+'length').Value 

  d = {} 
  d['trackTitle'] = entry.Element(atomns+'title').Value 
  d['albumArtist'] = entry.Element(zunens+'primaryArtist')
                       .Element(zunens+'name').Value 
  d['trackArtist'] = d['albumArtist'] 
  d['albumTitle'] = entry.Element(zunens+'album')
                       .Element(zunens+'title').Value 
   
  if id.StartsWith('urn:uuid:'): 
    d['serviceId'] = "{" + id.Substring(9) + "}" 
  else: 
    d['serviceId'] = id 
   
  m = length_re.Match(length) 
  if m.Success: 
    min = int(m.Groups[1].Value) 
    sec = int(m.Groups[2].Value) 
    d['duration'] = str((min * 60 + sec) * 1000) 
  else: 
    d['duration'] = '60000'
     
  return d

trackurl = catalogurl + song.search_string
trackfeed = XDocument.Load(trackurl) 
trackentry = First(trackfeed.Descendants(atomns+'entry')) 
track = ScrapeEntry(trackentry) 

A few quick notes:

  • The code above isn’t valid Python as shown: I added a couple of carriage returns (albumArtist and albumTitle) to get it to read well on the blog without wrapping badly.
  • song.search_string returns the song title and artist as a plus-delimited string, e.g. pinball+wizard+the+who. However, many Rock Band songs end in a parenthetical like (Cover Version), so I automatically strip that off for the search string.
  • duration in the Atom feed is stored like PT3M23S, which means the song is 3:23 long. The playlist file expects the song length in milliseconds, so I use a .NET regular expression to pull out the minutes and seconds and do the conversion. It’s not exact – song lengths usually aren’t a whole number of seconds – but as far as I can tell, Zune just uses that value for display in the UI; it doesn’t affect playback at all.
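
The last two notes can be sketched in plain Python. This is an approximation, not the script's code: the post doesn't show length_re or the body of search_string, so the LENGTH_RE pattern, the lowercasing, and the parenthetical-stripping expression below are all assumptions. It also returns the duration as an integer, where the script stores a string.

```python
import re

# Assumed pattern for durations such as PT3M23S (the post's length_re is not shown).
LENGTH_RE = re.compile(r'^PT(\d+)M(\d+)S$')

def duration_ms(length, default=60000):
    # Convert an Atom feed duration to milliseconds.
    # Falls back to one minute when the pattern doesn't match,
    # mirroring the fallback in ScrapeEntry.
    m = LENGTH_RE.match(length)
    if m is None:
        return default
    minutes = int(m.group(1))
    seconds = int(m.group(2))
    return (minutes * 60 + seconds) * 1000

def search_string(title, artist):
    # Drop a trailing parenthetical such as "(Cover Version)".
    title = re.sub(r'\s*\([^)]*\)\s*$', '', title)
    # Join the lowercased words of title and artist with plus signs.
    words = (title + ' ' + artist).lower().split()
    return '+'.join(words)
```

For example, duration_ms('PT3M23S') gives 203000, and both 'Pinball Wizard' and 'Pinball Wizard (Cover Version)' by 'The Who' produce pinball+wizard+the+who.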

Now that I have a list of songs with all the relevant metadata, next time I’ll write it out into a Zune playlist file.

Posted By Harry Pierson at 10:55 AM Pacific Standard Time
IronPython | LINQ | Rock Band | XML | Zune

Wednesday, November 26, 2008

Early Christmas from Iron Languages and DLR

Tomorrow may be Thanksgiving, but the Microsoft DevDiv dynamic language teams are trying to make it feel like Christmas with three separate pre-holiday releases.

  1. IronPython 2.0 RC2 
    We were really hoping to only have one release candidate, but we ended up with a couple of significant bugs that we couldn’t push off to 2.0.1. With December holidays coming soon, RC2 has a pretty small window before we declare RTM so now is the time to download the release and try your code out.
  2. IronRuby 1.0 Alpha 2 
    There’s been zero blog traffic on this, just a notice on the IronRuby mailing list. As per said notice, “Notable features” include “the inclusion of iirb.bat, igem.bat, irails.bat, irake.bat”.
  3. New DLR CodePlex Project 
    The DLR source has been available as part of IronPython for over a year but now they have their own home on CodePlex. Check out the Release Notes for an overview, read some Docs and Specs or just download their initial v0.9 beta. Their v0.9 beta is synced with IPy 2.0 RC2 (and their v0.9 final will sync with IPy 2.0 RTM) but it also includes synced versions of IronRuby and ToyScript in both source and binaries. Plus, Sesh has promised “weekly code drops”. Finally, unlike IronPython and IronRuby, DLR is using the discussion section of their CodePlex site – I’m eager to see how well the new-ish discussion/mailing list integration works.

So there you go, new versions of IronPython and IronRuby plus a whole new DLR CodePlex project to boot. Enjoy.

Posted By Harry Pierson at 10:49 PM Pacific Standard Time

IronPython and Linq to XML Part 2: Screen Scraping

First, I need to convert the HTML list of Rock Band songs into a machine readable format. That means doing a little screen scraping. Originally, I used Beautiful Soup but I found that UnicodeDammit got confused on names like Blue Öyster Cult and Mötley Crüe. I’m guessing it’s broken because IronPython doesn’t have non-unicode strings.

Instead, I used SgmlReader to provide an XmlReader interface over the HTML, then queried that data via Linq to XML. I used the version of SgmlReader from MindTouch since they include a compiled binary and it seems to be the only actively maintained version. I wrapped it all up in a function called load that loads HTML from either disk or the network (based on the URI scheme) into an XDocument.

def loadStream(streamreader):
  from System.Xml.Linq import XDocument
  from Sgml import SgmlReader
  
  reader = SgmlReader()
  reader.DocType = "HTML"
  reader.InputStream = streamreader
  return XDocument.Load(reader)
  
def load(url):
  from System import Uri
  from System.IO import StreamReader
  
  if isinstance(url, str):
    url = Uri(url)
  
  if url.Scheme == "file":
    from System.IO import File
    with File.OpenRead(url.LocalPath) as fs:
      with StreamReader(fs) as sr:
        return loadStream(sr)
  else:
    from System.Net import WebClient
    wc = WebClient()
    with wc.OpenRead(url) as ns:
      with StreamReader(ns) as sr:
        return loadStream(sr)

def parse(text):
  from System.IO import StringReader
  return loadStream(StringReader(text))

I call load, passing in the URL to the list of songs. The “official” Rock Band song page loads the actual content from a different page via AJAX, so I just load the actual list directly via my load function.

Once the HTML is loaded as an XDocument, I need a way to find the specific HTML nodes I was looking for. As I said earlier, XDocument uses Linq to XML – there is no other API for querying the XML tree. In the HTML, there’s a div tag with the id “content” that contains all the song rows as table row elements. I built a simple function that uses the LINQ Single method to find the tag by its id attribute value.

def FindById(node, id):
  def CheckId(n):
    a = n.Attribute('id')
    return a != None and a.Value == id
  
  return linq.Single(node.Descendants(), CheckId)

(Side note – I didn’t like the verbosity of the “a != None and a.Value == id” line of code, but XAttributes are not comparable by value. That is, I can’t write “node.Attribute(‘id’) == XAttribute(‘id’, id)”. And writing “node.Attribute(‘id’).Value == id” only works if every node has an id attribute. Not making XAttribute comparable by value seems like a strange design choice to me.)

LINQ to objects works just fine from IronPython, with a few caveats. First, IronPython doesn’t have extension methods, so you can’t chain calls together sequentially like you can in C#. So instead of collection.Where(…).Select(…), you have to write Select(Where(collection, …), …). Second, all the LINQ methods are generic, so you have to use the verbose list syntax (for example: Single[object] or Select[object,object]). Since Python doesn’t care about the generic types, I wrote a bunch of simple helper functions around the common LINQ methods that just use object as the generic type. Here are a few examples:

def Single(col, fun):
  return Enumerable.Single[object](col, Func[object, bool](fun))
  
def Where(col, fun):
  return Enumerable.Where[object](col, Func[object, bool](fun))
  
def Select(col, fun):
  return Enumerable.Select[object, object](col, Func[object, object](fun))
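
For readers without IronPython handy, the nested-call style can be tried out with pure-Python stand-ins. The lowercase single, where and select functions below are hypothetical analogues written for this sketch; they don't call into System.Linq, they just mimic the shape of the helpers above.

```python
def single(col, pred):
    # Like Enumerable.Single: exactly one match, otherwise an error.
    matches = [item for item in col if pred(item)]
    if len(matches) != 1:
        raise ValueError('expected exactly one match, found %d' % len(matches))
    return matches[0]

def where(col, pred):
    # Lazily yield the items that satisfy the predicate.
    return (item for item in col if pred(item))

def select(col, fun):
    # Lazily project each item through fun.
    return (fun(item) for item in col)

numbers = [1, 2, 3, 4, 5, 6]

# C# would chain this as numbers.Where(isEven).Select(square);
# without extension methods the calls nest inside out instead.
squares_of_evens = list(select(where(numbers, lambda n: n % 2 == 0),
                               lambda n: n * n))
```

Reading the nested form takes a little practice: the innermost call runs first, so the expression reads right to left compared with the chained C# version.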

Once I have the content node, all the songs are in tr nodes beneath it. I wrote a function called ScrapeSong that transforms a song tr node into a Song object (which I’ll talk about in the next installment of this series). I use LINQ methods Select, OrderBy and ThenBy to provide me an enumeration of Song objects, ordered by date added (descending) then artist name.

def ScrapeSong(node):    
  tds = list(node.Elements(xhtml.ns+'td'))   
  anchor = list(tds[0].Elements(xhtml.ns+'a'))[0]   
     
  title = anchor.Value   
  url = anchor.Attribute('href').Value   
  artist = tds[1].Value   
  year = tds[2].Value   
  genre = tds[3].Value   
  difficulty = tds[4].Value   
  _type = tds[5].Value   
  added = DateTime.Parse(tds[6].Value)   
     
  return Song(title, artist, added, url, year, genre, difficulty, _type)   

songs = ThenBy(OrderByDesc(  
          Select(content.Elements(xhtml.ns +'tr'), ScrapeSong),   
          lambda s: s.added), lambda s: s.artist)
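
The same two-level ordering can be sketched in plain Python without LINQ. This swaps in a different technique, two passes of the stable built-in sorted, in place of OrderByDesc and ThenBy. The Song namedtuple and the sample rows are made up for illustration and are not real song list data.

```python
from collections import namedtuple
from datetime import date

# Hypothetical minimal Song record; the real class has more fields.
Song = namedtuple('Song', ['title', 'artist', 'added'])

songs = [
    Song('Song A', 'Zed', date(2008, 11, 18)),
    Song('Song B', 'Abe', date(2008, 11, 25)),
    Song('Song C', 'Moe', date(2008, 11, 25)),
]

# sorted() is stable, so sort by the secondary key (artist) first,
# then by the primary key (date added, newest first).
by_artist = sorted(songs, key=lambda s: s.artist)
ordered = sorted(by_artist, key=lambda s: s.added, reverse=True)
titles = [s.title for s in ordered]
```

The two songs added on the same date keep their artist order after the second pass, which is the effect ThenBy provides in the LINQ version.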

And that’s pretty much it. Next, I’ll iterate thru the list of songs and get the details I need from Zune’s catalog web services in order to write out a playlist that the Zune software will understand.

Posted By Harry Pierson at 5:16 PM Pacific Standard Time
IronPython | LINQ | Rock Band | XML | Zune

IronPython and Linq to XML Part 1: Introduction

Shortly after I joined the VS Languages team, we had a morale event that included a Rock Band tournament. I didn’t play that day in the tournament since I had never played before, but I was hooked just the same. I got Rock Band for my birthday, Rock Band 2 shortly after it came out in September and I’m hoping to get the AC/DC Track Pack for Christmas.

There are lots of songs available for Rock Band - 461 currently available between on-disc and downloadable tracks – with more added every week. Frankly, there’s lots of music on that list that I don’t recognize. Luckily, I’m also a Zune Pass subscriber, so I can go out and download all the Rock Band tracks and listen to them on my Zune. But who has time to manually search for 461 songs? Not me. So I wrote a little Python app to download the list of Rock Band songs and save it as a Zune playlist.

I ended up using Linq to XML very heavily in this project. Zune playlists use the same XML format as Windows playlists, Zune exposes the backend music catalog via Atom feeds and I used Chris Lovett’s SgmlReader to expose the HTML list of Rock Band songs as XML. I realize Linq to XML wasn’t on “the list”, but I had a specific need so it got bumped to the head of the line.

BTW, for those who just want the playlist, I stuck it on my Skydrive. Unfortunately, there’s no Skydrive API right now, so I can’t automate uploading the new playlist every week. If anyone has alternative suggestions or a way to programmatically upload files to SkyDrive, let me know.

Posted By Harry Pierson at 5:07 PM Pacific Standard Time
IronPython | LINQ | Rock Band | XML | Zune

Monday, November 24, 2008

IronPython and WPF Part 5: Interactive Console

One of the hallmarks of dynamic language programming is the use of the interactive prompt, otherwise known as the Read-Eval-Print-Loop or REPL. Even though I’m building a WPF client application, I’d still like to have the ability to poke around and even modify the app as it’s running from the command prompt, REPL style.

If you work thru the IronPython Tutorial, there are exercises for interactively building both a WinForms and a WPF application. In both scenarios, you create a dedicated thread to service the UI so it can run while the interactive prompt thread is blocked waiting for user input. However, as we saw in the last part of this series, UI elements in both WinForms and WPF can only be accessed from the thread they are created on. We already know how to marshal calls to the correct UI thread – Dispatcher.Invoke. However, what we need is a way to intercept commands entered on the interactive prompt so we can marshal them to the correct thread before they execute.

Luckily, IronPython provides just such a mechanism: clr module’s SetCommandDispatcher. A command dispatcher is a function hook that gets called for every command the user enters. It receives a single parameter, a delegate representing the command the user entered. In the WPF and WinForms tutorials, you use this function hook to marshal the commands to the right thread to be executed. Here’s the command dispatcher from the WPF tutorial:

def DispatchConsoleCommand(consoleCommand):
    if consoleCommand:
        dispatcher.Invoke(DispatcherPriority.Normal, consoleCommand)

The dispatcher.Invoke call looks kinda like the UIThread decorator from the Background Processing part of this series, doesn’t it?

Quick aside: I looked at using SyncContext here instead of Dispatcher, since I don’t care about propagating a return value back to the interactive console thread. However, SyncContext expects a SendOrPostCallback, which takes a single object parameter. The delegate passed to the console hook function is an Action with no parameters. I could have built a wrapper function that took a single parameter which it would ignore, but I decided it wasn’t worth it. The more I look at it, the more I believe SyncContext is a good idea with a bad design.
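
For what it's worth, the wrapper described in that aside would be tiny. The sketch below is plain Python and only shows the shape of the adapter; ignore_state is a hypothetical name, and wiring it up to SendOrPostCallback in IronPython is left out.

```python
def ignore_state(action):
    # Adapt a no-argument callable to a one-argument callback signature.
    def callback(state):
        # The state argument is accepted and deliberately ignored.
        return action()
    return callback

calls = []
cb = ignore_state(lambda: calls.append('ran'))
cb(None)   # the caller must pass something, even if it is unused
```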

I wrapped all the thread creation and command dispatching into a reusable helper class called InteractiveApp.

class InteractiveApp(object):
  def __init__(self):
    self.evt = AutoResetEvent(False)
    
    thrd = Thread(ThreadStart(self.thread_start))
    thrd.ApartmentState = ApartmentState.STA
    thrd.IsBackground = True
    thrd.Start()
    
    self.evt.WaitOne()
    clr.SetCommandDispatcher(self.DispatchConsoleCommand)
    
  def thread_start(self):
    try:
      self.app = Application()
      self.app.Startup += self.on_startup
      self.app.Run()
    finally:
      clr.SetCommandDispatcher(None)

  def on_startup(self, *args):
    self.dispatcher = Threading.Dispatcher.FromThread(Thread.CurrentThread)
    self.evt.Set()
    
  def DispatchConsoleCommand(self, consoleCommand):
    if consoleCommand:
        self.dispatcher.Invoke(consoleCommand)
    
  def __getattr__(self, name):
    return getattr(self.app, name)

The code is pretty self explanatory. The constructor (__init__) creates the UI thread, starts it, waits for it to signal that it’s ready via an AutoResetEvent and then finally sets the command dispatcher. The UI thread creates and runs the WPF application, saves the dispatcher object as a field on the object, then signals that it’s ready. DispatchConsoleCommand is nearly identical to the earlier version, I’ve just made it an instance method instead of a stand-alone function. Finally, I define __getattr__ so that any operations invoked on InteractiveApp are passed thru to the contained WPF Application instance.
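
The startup handshake and the attribute pass-through can be tried without WPF. The following is a sketch in plain Python under stated assumptions: threading.Event stands in for AutoResetEvent, a daemon thread stands in for IsBackground, and FakeApp is a hypothetical placeholder for the WPF Application object. The guard on the name 'app' is an addition of this sketch to avoid infinite recursion if the attribute is looked up before it exists.

```python
import threading

class FakeApp(object):
    # Hypothetical stand-in for the WPF Application object.
    title = 'demo'

class InteractiveWrapper(object):
    def __init__(self):
        self.ready = threading.Event()
        worker = threading.Thread(target=self.thread_start)
        worker.daemon = True     # comparable to IsBackground = True
        worker.start()
        self.ready.wait()        # block until the worker has created self.app

    def thread_start(self):
        self.app = FakeApp()
        self.ready.set()         # signal the constructor that app exists

    def __getattr__(self, name):
        # Only called when normal lookup fails, so ready and app
        # resolve normally once they have been assigned.
        if name == 'app':
            raise AttributeError(name)
        return getattr(self.app, name)

wrapper = InteractiveWrapper()
```

After construction, wrapper.title resolves through __getattr__ to the wrapped object's title.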

In my app.py file, I look to see if the module has been started directly or if it’s been imported into another module. If the module is run directly (aka ‘ipy app.py’) then the global __name__ variable will be ‘__main__’. In that case, we start the application up normally (i.e. without the interactive prompt) by just creating an Application then running it with a Window instance. Otherwise, we are importing this app into another module (typically, the interactive console), so we create an InteractiveApp instance and we create an easy to use run method that can create the instance of the main window.

if __name__ == '__main__':
  app = wpf.Application()
  window1 = MainWin.MainWindow()
  app.Run(window1.root)
  
else:
  app = wpf.InteractiveApp()

  def run():
    global mainwin
    mainwin = MainWin.MainWindow()
    mainwin.root.Show()

If you want to run the app interactively, you simply import the app module and call run. Here’s a sample session where I iterate thru the items bound to the first list box. Of course, I can do a variety of other operations, such as manipulating the data or creating new UI elements.

IronPython 2.0 (2.0.0.0) on .NET 2.0.50727.3053  
>>> import app  
>>> app.run()  
#at this point the app window launches
>>> for i in app.mainwin.allAlbumsListBox.Items:  
...     print i.title  
...  
Harvest Festivals  
Mrs. Gardner's Art  
Riley's Playdate  
August 13  
Camp Days  
July 14  
May Photo Shoot  
Summer Play 2006  
Lake Washington With The Gellers  
Camp Pierson '06  
January 28

One small thing to keep in mind: if you exit the command prompt, the UI thread will also exit since it’s marked as a background thread. Also, it looks like you could shut the client down then call run again to restart it, but you can’t. If you shut the client down, the Run method in InteractiveApp.thread_start exits, resets the Command Dispatcher to nothing and the thread terminates. I could fix it so that you could run the app multiple times, but I find I typically only run the app once for a given session anyway.

Posted By Harry Pierson at 10:45 AM Pacific Standard Time

Friday, November 21, 2008

Resolver One 1.3 Released

IronPython’s biggest customer is Resolver Systems, makers of Resolver One, “a familiar spreadsheet-like interface with the powerful Python programming language to give you a tool to analyse and present your data.” While I think they have a great product on pure merit - I’ve been impressed with their product since I was introduced to it at Lang.NET this year - I’m particularly interested in Resolver One as it’s written in IronPython. They use IPy not only as the embedded language exposed to end users but as the underlying implementation language as well.

Furthermore, these guys are heavily involved in the IPy community. Resolver developer Michael Foord is writing a book on IronPython and was our first Dynamic Language MVP. Michael’s Resolver colleague Jonathan Hartley did me a solid by taking my space at ØreDev. Even the CTO and Co-founder Giles Thomas is a regular blogger and speaker at events. Let me tell you, having guys this great in the community sure makes my job easier.

I just wanted to give the Resolver folks a shout out and say major congratulations on shipping a new version of their core Resolver One product. Michael has more info on this release as well as a glance forward at their plans for their next release (IPy 2.0, woot!)

Posted By Harry Pierson at 10:53 AM Pacific Standard Time

Background Processing Re-Revisited

OK, here’s the last word on this whole background processing / concurrency decorators thing. I went back and re-wrote the original decorators, but using the approach I used with the SyncContext version. I don’t want to rehash it again, so here are the main points:

  • Instead of using a property to retrieve the dispatcher, I get it via Application.Current.MainWindow.Dispatcher (checking to be sure Current and MainWindow aren’t null…err, None). This way, I pick up the dispatcher automatically rather than forcing a specific interface on the class with decorated methods. In fact, this approach should work with pure functions as well.
  • Since I don’t have a convenient function like SetSynchronizationContext, I store the dispatcher in thread local storage for later use in calling back to the UI thread.
  • Unlike the SyncContext version, this version propagates the return value of @UIThread decorated functions. I don’t propagate the return value of @BGThread functions – there’d be no point farming a task to a background thread then blocking the UI thread waiting for a response.
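
The return value point in the last bullet can be illustrated without WPF. This is a rough sketch in plain Python, not the decorator code from the download: MiniDispatcher is a hypothetical stand-in for a UI dispatcher that runs callables on one dedicated thread, and ui_thread mimics a decorator that blocks until the call finishes and hands the result back to the caller.

```python
import threading
try:
    import queue             # Python 3
except ImportError:
    import Queue as queue    # Python 2

class MiniDispatcher(object):
    # Hypothetical stand-in for a UI dispatcher: one thread runs all work items.
    def __init__(self):
        self.work = queue.Queue()
        self.thread = threading.Thread(target=self._loop)
        self.thread.daemon = True
        self.thread.start()

    def _loop(self):
        while True:
            fun, args, result, done = self.work.get()
            try:
                result.append((True, fun(*args)))
            except Exception as exc:
                result.append((False, exc))
            done.set()

    def invoke(self, fun, *args):
        # Already on the dispatcher thread: call directly to avoid deadlock.
        if threading.current_thread() is self.thread:
            return fun(*args)
        result, done = [], threading.Event()
        self.work.put((fun, args, result, done))
        done.wait()              # block the caller until the work item ran
        ok, value = result[0]
        if not ok:
            raise value          # re-raise the failure on the calling thread
        return value

dispatcher = MiniDispatcher()

def ui_thread(fun):
    # Decorator: run fun on the dispatcher thread and return its result.
    def wrapper(*args):
        return dispatcher.invoke(fun, *args)
    return wrapper

@ui_thread
def add(a, b):
    return a + b

@ui_thread
def on_dispatcher_thread():
    return threading.current_thread() is dispatcher.thread
```

Calling add(2, 3) from the main thread returns 5 even though the addition ran on the dispatcher thread, which is the behavior a blocking Invoke makes possible.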

As usual, the code is on my SkyDrive. It includes both the SyncContext and Dispatcher version of the decorators.

Posted By Harry Pierson at 7:20 AM Pacific Standard Time

Introducing IronPython Article

FYI, my Introducing IronPython article from the .NET Languages issue of CoDe magazine is now available online in its entirety. Previously, only the introduction was available online. And while we’re on the subject, major thanks to the folks at the CoDe magazine booth at PDC, who gave me several copies of that issue.

Posted By Harry Pierson at 12:08 AM Pacific Standard Time

Thursday, November 20, 2008

IronPython and WPF Background Processing Revisited

Yesterday, I blogged about using decorators to indicate if a given function should execute on the UI or background thread. While the solution works, I wrote “I’m thinking there might be a way to use SynchronizationContext to marshal it automatically, but I haven’t tried to figure that out yet.” I had some time this morning so I figured out how to use SynchronizationContext instead of the WPF dispatcher.

Leslie Sanford wrote a pretty good overview, but the short version is that SyncContext is an abstraction for concurrency management. It lets you write code that is ignorant of specific synchronization mechanisms in concurrency-aware managed frameworks like WinForms and WPF. For example, while my previous version worked fine, it was specific to WPF. If I wanted to provide similar functionality that worked with WinForms, I’d have to rewrite my decorators to use Control.Invoke. But if I port them over to use SyncContext, they would work with WinForms, WPF and any other library that plugs into SyncContext.

SyncContext abstracts away both initially obtaining the sync context as well as marshaling calls back to the UI thread. SyncContext provides a static property to access the current context, instead of a framework specific mechanism like accessing the Dispatcher property of the WPF Window class. Once you have a context, you can call Send or Post to marshal the call back to the UI thread (Send blocks the calling thread, Post doesn’t).

With that in mind, here’s the new version of BGThread and UIThread. Slightly more complex, but still pretty simple clocking in at just under 30 lines.

from System.Threading import SynchronizationContext, SendOrPostCallback
from System.Threading import ThreadPool, WaitCallback
import warnings

def BGThread(fun):
  def argUnpacker(args):
    # args[-1] is the caller's SyncContext; args[:-1] are fun's real arguments
    oldSyncContext = SynchronizationContext.Current
    try:
      SynchronizationContext.SetSynchronizationContext(args[-1])
      fun(*args[:-1])
    finally:
      SynchronizationContext.SetSynchronizationContext(oldSyncContext)

  def wrapper(*args):
    # Capture the context on the calling thread and pass it along as the last element
    args2 = args + (SynchronizationContext.Current,)
    ThreadPool.QueueUserWorkItem(WaitCallback(argUnpacker), args2)

  return wrapper

def UIThread(fun):
  def unpack(args):
    ret = fun(*args)
    if ret is not None:
      warnings.warn(fun.__name__ + " function returned " + str(ret) + " but that return value isn't propagated to the calling thread")

  def wrapper(*args):
    if SynchronizationContext.Current is None:
      # No context to marshal to, so just call the function directly
      fun(*args)
    else:
      SynchronizationContext.Current.Send(SendOrPostCallback(unpack), args)

  return wrapper

In the BGThread wrapper, I add the current SyncContext to the parameter tuple that I pass to the background thread. Once on the background thread, I set the current SyncContext to the last element of the parameter tuple, then call the decorated function with the remaining parameters. (For the non-Pythonic: args[:-1] is Python slicing syntax that means “all but the last element of args”.) Using a try/finally block is probably overkill – I expect the current SyncContext to be either None or leftover garbage – but the urge to clean up after myself is apparently much stronger on the background thread than it is in, say, my office. :)
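If the slicing is unfamiliar, this tiny sketch shows the round trip in isolation. The pack and unpack helpers are illustrative names only, and plain strings stand in for the real arguments and the SyncContext.

```python
def pack(args, extra):
    # Append one extra value to the end of the argument tuple.
    return args + (extra,)

def unpack(packed):
    # packed[:-1] is everything but the last element; packed[-1] is the last one.
    return packed[:-1], packed[-1]

packed = pack(("self", "http://example.com"), "ctx")
call_args, ctx = unpack(packed)
```

The same slicing works when the decorated function takes no arguments at all: the packed tuple then holds only the extra value, and the slice yields an empty tuple.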

In the UIThread wrapper, I grab the current context and invoke the decorated method via the Send method. Like QueueUserWorkItem, SyncContext Send and Post only support a single parameter, so I use the same *args trick I described in my last post. (I changed the name to unpack in the code above for blog formatting purposes)

One major caveat about this approach is that there’s no way to return a value from a function decorated as UIThread. I understand why SyncContext.Post doesn’t return a value (it’s async), but SyncContext.Send is a synchronous call, so why doesn’t it marshal the return value back to the calling thread? WPF’s Dispatcher.Invoke and WinForms’ Control.Invoke both return a value. I didn’t handle the return value in my original version of UIThread, but now that I’ve moved over to using SyncContext, I can’t. Not sure why SyncContext is designed that way – seems like a design flaw to me. Since the return value won’t propagate, I sniff the decorated function’s return value and raise a warning if it’s not None.
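Send itself hands nothing back, but the callback is a Python closure, so one possible workaround (not something the code above does, and untested against the real SyncContext) is to stash the result in a list the wrapper can read once Send returns. In this sketch fake_send is a made-up stand-in that simply runs the callback synchronously, and ui_thread_with_result is a hypothetical name.

```python
def fake_send(callback, state):
    # Stand-in for a synchronous Send: runs the callback and returns nothing.
    callback(state)

def ui_thread_with_result(fun, send=fake_send):
    def wrapper(*args):
        result = []  # a mutable holder, since Python 2 closures can't rebind outer names

        def unpack(state):
            result.append(fun(*state))

        send(unpack, args)
        return result[0]

    return wrapper

add = ui_thread_with_result(lambda a, b: a + b)
```

This only makes sense for a blocking call like Send. With Post, the wrapper would return before the callback had a chance to fill in the result.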

I’ve uploaded the SyncContext version to my SkyDrive in case you want the code for yourself. Note, I’m thinking I’ll revise this code one more time – I want to rebuild the WPF version so that it propagates return values and picks up a dispatcher via Application.Current.MainWindow rather than having to have a dispatcher property on my class.

Posted By Harry Pierson at 2:57 PM Pacific Standard Time

Wednesday, November 19, 2008

IronPython and WPF Part 4: Background Processing

Like many apps today, my WL Spaces photo viewer is a connected app. The various WL Spaces RSS feeds that drive the app can take several seconds to download. Unless you like annoying your users, it’s a bad idea to lock up your user interface while you make synchronous network calls on your UI thread. Typically, this long running processing gets farmed out to a background thread, which keeps the UI thread free to service user events.

.NET provides a variety of mechanisms for doing long running processing on a background thread. For example, you can create a new thread, queue a work item to the ThreadPool or use the BackgroundWorker component. However, none of these are particularly pythonic, so I set out to see if I could leverage any of Python’s unique capabilities to make background processing as easy as possible. This is what I ended up with:

def OnClick(self, sender, args): 
    self.DLButton.IsEnabled = False 
    self.BackgroundTask(self._url.Text) 

@BGThread   
def BackgroundTask(self, url): 
    wc = WebClient()
    data = wc.DownloadString(Uri(url))   
    self.Completed(data) 
     
@UIThread 
def Completed(self, data): 
    self.DLButton.IsEnabled = True
    self._text.Text = data

By using the cool decorators feature of Python, I’m able to declaratively indicate whether I want a given method to be executed on the UI thread or on a background thread. Doesn’t get much easier than that. Even better, the implementations of BGThread and UIThread are only about twenty lines of Python code combined!

Decorators kinda look like custom .NET attributes. However, where .NET attributes are passive (you have to ask for them explicitly), decorators act as an active modifier to the functions they are attached to. In that respect, they’re kind of like aspects. Certainly, I would consider which thread a given method executes on to be a cross-cutting concern.

The Completed function above is exactly the same as if I had written the following:

def Completed(self, data): 
    self.DLButton.IsEnabled = True 
    self._text.Text = data 
Completed = UIThread(Completed)

In C#, you can’t pass a function as a parameter to another function – you have to first wrap that function in a delegate. Python, like F#, directly supports higher-order functions. This lets you easily factor common aspectual code out into reusable functions then compose them with your business logic. The decorators have no knowledge of the functions they are attached to, and the code that calls those functions is written in complete ignorance of the decorators. Python goes the extra mile beyond even F# by providing the ‘@’ syntax.
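Here is a small standalone sketch of that equivalence, using a made-up traced decorator instead of the threading ones so it runs anywhere. One function is decorated with the ‘@’ syntax and the other by hand, and both end up wrapped the same way.

```python
def traced(fun):
    # A higher-order function: takes a function, returns a wrapped function.
    calls = []

    def wrapper(*args):
        calls.append(fun.__name__)  # the cross-cutting part
        return fun(*args)           # the original business logic

    wrapper.calls = calls
    return wrapper

@traced
def double(x):
    return x * 2

def triple(x):
    return x * 3
triple = traced(triple)  # exactly what the @ syntax does for double
```

Callers just write double(4) or triple(4) and never see the decorator, and traced never needs to know what the wrapped functions do.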

Here are the implementations of the UIThread and BGThread decorators:

def BGThread(fun): 
  def argUnpacker(args): 
    fun(*args)
  
  def wrapper(*args): 
    ThreadPool.QueueUserWorkItem(WaitCallback(argUnpacker), args)
  
  return wrapper

def UIThread(fun):
  def wrapper(self, *args):
    if len(args) == 0:
      actiontype = Action1[object]
    else:
      actiontype = Action[tuple(object for x in range(len(args)+1))]

    action = actiontype(fun)
    self.dispatcher.Invoke(action, self, *args)
    
  return wrapper 

BGThread defines a wrapper function that queues a call to the decorated function to the .NET thread pool.  UIThread defines a wrapper that marshals the call to the UI thread by using a WPF Dispatcher. I’m thinking there might be a way to use SynchronizationContext to marshal it automatically, but I haven’t tried to figure that out yet. The above approach does require a dispatcher property hanging off the class, but that’s fairly trivial to implement and seems like a small price to pay to get declarative background thread processing.
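For anyone who wants to play with the BGThread idea outside of IronPython, here is a rough CPython 3 analogue. It swaps the .NET thread pool for concurrent.futures, and bg_thread, _pool and record_thread are illustrative names, not part of the demo app.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)  # hypothetical shared pool for the sketch

def bg_thread(fun):
    def wrapper(*args):
        # submit returns immediately; fun runs later on a pool thread
        _pool.submit(fun, *args)

    return wrapper

@bg_thread
def record_thread(results, done):
    results.append(threading.get_ident())  # id of the thread actually running this
    done.set()                             # signal the caller that the work finished

results = []
done = threading.Event()
record_thread(results, done)  # returns right away
finished = done.wait(5)       # wait (up to 5 seconds) for the background work
```

Unlike QueueUserWorkItem, submit accepts any number of arguments, so this version doesn’t need the argUnpacker step.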

A couple of quick implementation notes:

  • The ‘*args’ syntax used in those methods above means “give me the rest of the positional arguments in a tuple”. Kinda like the C# params keyword. But that syntax also lets you pass a tuple of parameters to a function, and have them broken out into individual parameters. QueueUserWorkItem only supports passing a single object into the queued function, so I pass the tupled arguments to the argUnpacker method, which in turn untuples the arguments and calls the decorated function.
  • The System assembly includes the single parameter Action<T> delegate. The current DLR provides Action delegates with zero, two and up to sixteen parameters. However, those are in a separate namespace (remember?) and IPy seems to have an issue with importing overloaded type names into the current scope. I could have used their namespace scoped name, but instead I redefined the version from System to be called Action1.
  • To interop with .NET generic types, IPy uses the legal but rarely used Python syntax type[typeparam]. For example, to create a List of strings, you would say “List[str]()”. The type parameter is a tuple, so in UIThread I build a tuple of objects based on the number of arguments passed into wrapper (with the special case of a single type parameter using Action1 instead of Action).
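The first and last notes can be tried out in plain Python. In this sketch, queue_single is a made-up stand-in for an API like QueueUserWorkItem that accepts only one state object, and type_params is a hypothetical helper that builds the same tuple of object types that UIThread hands to Action.

```python
def queue_single(callback, state):
    # Stand-in for an API that passes exactly one state object to the callback.
    callback(state)

def through_single_slot(fun):
    def arg_unpacker(args):
        fun(*args)  # untuple: spread the tuple back out into positional arguments

    def wrapper(*args):  # tuple up: collect the positional arguments
        queue_single(arg_unpacker, args)

    return wrapper

def type_params(arg_count):
    # One `object` per argument, plus one for self.
    return tuple(object for _ in range(arg_count + 1))

seen = []
remember = through_single_slot(lambda a, b: seen.append((a, b)))
remember(1, 2)
```

With no extra arguments, type_params(0) yields a one-element tuple, which corresponds to the single-parameter Action1 special case described above.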

I haven’t uploaded my WL Spaces Photo Viewer app because I keep making changes to it as I write this blog post series. However, for this post I built a simple demo app so I could focus on just the threading scenario. I’ve stuck the code for that demo up on my SkyDrive, so feel free to leverage it as you need.

Posted By Harry Pierson at 1:47 PM Pacific Standard Time

Monday, November 17, 2008

IronPython and WPF Part 3: Data Binding

Here’s the short version of this post: data binding in WPF to IPy objects just works...mostly. However, I’m guessing you are much more interested in the long version.

Typically, data binding depends on reflection. For example, the following snippet of XAML defines a data bound list box where the title property of each object in the bound collection gets bound to the text property of a text block control. WPF would typically find the title property of the bound objects via reflection.

<ListBox Grid.Column="0" x:Name="listbox1" >
  <ListBox.ItemTemplate>
    <DataTemplate>
      <TextBlock Text="{Binding Path=title}" />
    </DataTemplate>
  </ListBox.ItemTemplate>
</ListBox>

The problem is that IronPython objects don’t support reflection – or more accurately, reflection won’t give you the answer you’re expecting. Every IPy object does have a static type, but it implements Python’s dynamic type model. [1] Thus, if you reflect on the IPy object looking for the title property or field, you won’t find it. It might seem we’re in a bit of a bind (pun intended). However, WPF does provide an out:

“You can bind to public properties, sub-properties, as well as indexers of any common language runtime (CLR) object. The binding engine uses CLR reflection to get the values of the properties. Alternatively, objects that implement ICustomTypeDescriptor or have a registered TypeDescriptionProvider also work with the binding engine.”
WPF Binding Sources Overview, MSDN Library

Luckily for us, IronPython objects implement ICustomTypeDescriptor [2]. That snippet of XAML above? It’s straight from my photo viewing app. All I had to do was define the data template in the list box XAML then set the ItemsSource property of the list box instance.

w.listbox1.ItemsSource = albumsFeed.channel.item
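As a loose illustration of why looking at the type alone comes up empty, here is the same situation in pure Python. FeedItem is a made-up class, and this is only an analogy for CLR reflection, not a demonstration of it: the title attribute lives on the instance, so asking the class for it finds nothing, while asking the object works.

```python
class FeedItem(object):
    pass  # no title member is declared on the class

item = FeedItem()
item.title = "Photo Album: Vacation"  # attached dynamically at runtime

on_type = hasattr(FeedItem, "title")  # asking the class
on_instance = getattr(item, "title")  # asking the object itself
```

A descriptor-style lookup is closer to the second question, which is roughly why binding through ICustomTypeDescriptor can see properties that static reflection can’t.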

As I said, it just works. However, I did hit one small snag – hence the “mostly” caveat above.

If you look at the top level WL Spaces photos feed, you’ll see that each item’s title starts with “Photo Album:”. Yet in the screenshot of my app, you’ll notice that I’ve stripped that redundant text out of the title. Typically, if you want to change the bound value during the binding process, you build an IValueConverter class. I needed two value conversions in my app, stripping “Photo Album:” for the album list box and converting a string URL into a BitmapImage for the image list box.
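A minimal sketch of the first conversion might look like the following. AlbumTitleConverter is an illustrative name, and to keep the sketch runnable in plain Python the class does not inherit from anything. In IronPython it would presumably derive from the IValueConverter interface, whose Convert and ConvertBack methods take these four parameters.

```python
class AlbumTitleConverter(object):
    PREFIX = "Photo Album:"

    def Convert(self, value, targetType, parameter, culture):
        # Strip the redundant prefix if present; otherwise pass the text through.
        text = str(value)
        if text.startswith(self.PREFIX):
            return text[len(self.PREFIX):].strip()
        return text

    def ConvertBack(self, value, targetType, parameter, culture):
        # The list box only displays titles, so there is nothing to convert back.
        raise NotImplementedError("one-way binding only")

converter = AlbumTitleConverter()
title = converter.Convert("Photo Album: Hawaii", None, None, None)
```

The unused targetType, parameter and culture arguments are there only to match the interface’s method shape.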

IronPython objects can inherit from a .NET interface, so there’s no problem building an IValueConverter. However, in order to use a custom IValueConverter from XAML, you need to declare it in XAML as a static resource, and as you might imagine, dynamic IPy objects don’t work as static resources. So while I can define an IValueConverter in Python, I can’t create one from XAML.

There are a few possible solutions to this. The first is to build up the data template in code. If you do that, then you can programmatically add the converte