IronPython Post 2.0 Roadmap

Michael Foord (aka voidspace) twittered that “None of the IronPython team can get it together to blog regularly, except @jschementi of course.” While I’m not sure Jimmy is all that prolific either, Michael’s certainly right about me. I started this job at the beginning of April, and I’ve only blogged twenty one times in the three and a half months since. Worse, I’ve only blogged six times in the past month and a half – and half of those have been since Michael called out my lack o’ posting. My wife has blogged like twenty five times in that same time period. I can only plead pressures of a new job plus a two week vacation. I have been twittering a lot.

Michael was twittering in response to Todd Ogasawara’s post wondering about our Python 3.0 plans. Since we haven’t been particularly transparent (my fault) I thought I’d lay out our near and middle term plans.

First off, we’re on the verge of releasing 1.1.2 (the release candidate is available now), a service release in our 1.x branch which contains a bunch of bug fixes we’ve back ported from our 2.0 work. This is our last planned release in the 1.x branch. For those who don’t know, our 1.x branch tracks CPython’s 2.4 branch.

Most of our team’s focus has been on 2.0 which we’re on track to shipping later this year. Our 2.0 corresponds to CPython’s 2.5 branch. It’s a major release for us because of the addition of the Dynamic Language Runtime. Currently, you can get 2.0 Beta 3, with Beta 4 scheduled for early August (we go about 6 weeks between beta releases). If you want even fresher code than our latest release, you can pull and build the source yourself. We went about two months without pushing source due to some broken scripts, but they’re fixed now so we’re going to try and push out code much more often than we have in the past.

For the non-Python geeks in the audience, Python is undergoing a major change. Python 3.0 is going to break backwards compatibility with Python 2.x in number of ways. Breaking backwards compatibility always has to be handled carefully, so the Python community is investing quite a bit of effort to make the transition as smooth as possible.The Python Software Foundation is currently working on both 2.6 and 3.0 simultaneously. The idea is to have as much feature parity between the two releases (except for the stuff being removed from 3.0) and to provide an automatic tool to translating to the new version.

Let me be very clear (since as Todd discovered, we haven’t been to date) that once we get IronPython 2.0 out the door, we will start working towards IronPython 3.0, which will be our version of Python 3.0. We want to take the same stepping-stone approach that CPython is taking. So that means at a minimum we’ll do an IPy 2.1 with CPython 2.6′s new language and library features, (along with the usual bug fixing and other quality improvements we do every cycle) before then proceeding to work on IPy 3.0.

Until we get IPy 2.0 out the door, I’m not willing to talk about specific timelines. We’re an agile project and we’re going to be feature and quality driven, full stop. There were about seven months between the release of IPy 1.0 and 1.1, however that didn’t include much new Python feature work so it’s not a good comparison IMO. My gut tells me the IPy 2.1 release will take longer than a typical minor release while the IPy 3.0 release won’t take as long as a typical major release. Note, those are guesses, not commitments.

Besides IPy 2.1 and 3.0, the other major thing we’re working on is Visual Studio integration for IronPython. Yes, there is IronPythonStudio, but that’s a VS SDK sample not a production-quality VS integration the IPy team maintains or supports. The IntelliSense implementation is pretty flaky, the compile-oriented project system feels pretty un-pythonic and of course we need to upgrade it to support IPy 2.0 and the DLR (it would be nice if IronRuby could leverage our efforts down the road). Like everything else we do in this group, we’ll be publishing the VS Integration source code up on CodePlex as early and often as we can.

So to recap our current thinking:

IPy 1.1.2 in RC now, shipping in several weeks assuming we don’t find any major regressions
IPy 2.0 in beta now, shipping later this year
IPy 2.1 supporting new CPy 2.6 features at some point after IPy 2.1, probably longer than a typical minor release
First release of IPy integration with VS in the same timeframe as IPy 2.1 but with alpha drops as soon as we can
IPy 3.0 supporting new CPy 3.0 after IPy 2.1, probably shorter than a typical major release

One last thing, as many of you know the IronRuby project supports community contributions to the standard libraries. I wanted the IPy community to know I’m 100% committed for establishing a similar arrangement for IronPython. I’ve got nothing to announce yet, but rest assured I’ve been spending a lot of time talking to lawyers.

As always, if you’ve got opinions to share please feel free to leave me comments below, shoot me an email, or join the IPy mailing list.

IronPython 1.1.2 RC

It’s a little late, but we just pushed out the RC for IronPython 1.1.2. This is a minor release with a bunch of bug fixes we’ve backported from our 2.0 work.

However, there are two minor breaking changes I wanted to highlight:

Work Item #16348 – We’ve changed the nt.unlink method so that it raises an exception if the file doesn’t exist. This brings nt.unlink in line with CPython’s behavior, but does change the public behavior of the method.
Work Item #16735 – We’ve changed the return type of IronPython.Runtime.Operations.Ops.Id() from long to object. We return a boxed 32-bit integer for Ops.Id(), unless you’ve allocated over 2^32 objects in which case we roll over to our arbitrary precision IronMath.BigInteger type. Note, this only would affect statically typed languages compiled against the IronPython assembly. It wouldn’t affect python code in any way.

We think these are fairly minor (hence the reason we green-lit them) and furthermore these changes mirrors the eventual behavior of 2.0. Please let me know if either of the changes is a problem for you or your project.

Debugging IronPython Code in Visual Studio

In case I’m not the last person on the planet to figure this out…

In VS, click on File->Open->Project/Solution or press Ctl-Shift-O
Select ipy.exe from wherever you put it
Right click ipy.exe in Solution Explorer and select Properties
In the Command Arguments box, type “-D” (to generate debug code) and the full path to the script you want to execute. If you want to drop into interactive mode after the script executes, also include a “-i”
Open the script you specified in step 4 and place breakpoints as usual
Run via Debug->Start Debugging or press F5

Thanks Srivatsn for helping me out with this.

Deserializing XML with IronPython

Now that I can stream process XML, the next logical step is to deserialize it into some type of object graph. As I said in my last post, there are at least three different DOM-esque options on the .NET platform as well as two in the Python library (xml.dom and xml.minidom)

However, anyone who’s ever programmed against the DOM knows just what a major PITA it is.

Instead, you could deserialize the XML into a custom object tree, based on the nodes in the XML stream. In .NET, there are at least two libraries for doing this: the old-school XmlSerializer as well as the new-fangled DataContractSerializer. In these libraries, the PITA comes in defining the static types with all the various custom attribute adornments you need to tell the deserializer how to do it’s job. Actually, if you’re defining your code first, all those adornments aren’t that big a deal. However, if you’re starting from the XML, especially XML with lots of different namespaces – like say my RSS feed – defining a static type for this gets old fast.

Of course, if you’re not using a statically typed language… 😉

One of the cool aspects of dynamic languages is the ability to easily generate new types on the fly. In Python, you can create a new type by calling the type function. Here’s an example of creating a new type for a XML node:

def create_type(node, parent):  
  return type(node.name, (parent,), {'xmlns':node.namespace})

Since I’m working with XML, I wanted to make sure I handled namespaces. Thus, I add the namespace to the class definition (the third parameter in the type function above). This lets me walk up to any arbitrary object created from an XML element and check it’s namespace.

I used this dynamic type creation functionality in my xml2py module, which I added to my IronPython SkyDrive folder. It leverages ipypulldom, so make sure you get both. The heart of the module is the xml2py function, which recursively iterates thru the node stream and builds the tree. Attributes and child elements become named attributes on the object, so I can write code that looks like this:

import xml2py  
rss = xml2py.parse('http://feeds.feedburner.com/Devhawk')  
for item in rss.channel.item:  
  print item.title

You see? No screwing around with childNodes or getAttribute here.

The basic processing loop of xml2py creates a new instance of a new type when it encounters a start element tag. It then collects all the attributes and children of that element, and adds them as attributes on the element object, using the name of the type as name of the attribute. If there are multiple children with the same type name, xml2py converts that attribute to a list of values. For example, in an RSS feed, there will be likely be many rss.channel.item elements. In xml2py, the item attribute of the channel object will be a list of item objects.

Since attributes and child elements are getting slotted together, I added a _nodetype attribute on each so I can later tell (if I care) if the value was originally an attribute or element. I haven’t written py2xml yet, but that might be important then.

I do one optimization for simple string elements like <foo>bar</foo>. In this case, I create a type that inherits from string (hence the need for the parent parameter in the create_type function above) and contains the string text. It still has the xmlns and _nodetype attributes, so I could write item.title.xmlns (which is empty since RSS is in the default namespace) or item.title._nodetype (which would be XmlNodeType.Element)

It’s not much code – about 100 lines of code split evenly between the xml2py function and the _type_factory object. Given that you usually see the same element in an XML stream over an over, I didn’t want to create multiple types for the same element. So _type_factory caches types in a dictionary so I can reuse them. One of the cool things is that it’s a callable type (i.e. it implements __call__ so I can use the instance like a function. I started by defining a xtype function that didn’t cache anything, but then later switched xtype to be a _type_factory instance, but none of my code that called xtype had to change!

One other quick note. If you put xml2py.py and ipypylldom.py in a folder, you can experiment with them by launching ipy -i xml2py. This runs xml2py.py as a script, but dumps you into the interactive console when you’re thru. It will run the little snippet of code above which runs xml2py on my FeedBurner feed, but then you can play around with the rss object and see what it contains. Be sure to check out the xmlns attribute for each object in the rss.channel.link list.

Stream Processing XML in IronPython

When it comes to processing XML, there are two basic approaches – load it all into memory at once or process it a node at a time. In the .NET world where I have spent most of the past ten years, those two models are represented by XmlDocument and XmlReader. There are alternatives to XmlDocument, such as XDocument and XPathDocument, but you get the idea.

Out in non-MSFT land, the same two basic models exist, however the de facto standard for stream based processing is SAX, the Simple API for XML. SAX is supported by many languages, including Python.

Personally, I’ve never been a fan of SAX’s event-driven approach. Pushing events makes total sense for a human driven UI, but I never understood why anyone thought that was a good idea for stream processing XML. I like XmlReader’s pull model much better. When you’re ready for the next node, just call Read() – no mucking about setting content handlers or handling node processing events.

Luckily, the Python standard library supports both approaches. It provides both a SAX based parser as well as a pull based parser called pulldom. Pulldom doc’s are fairly sparse, but Paul Prescod wrote a nice introduction. Here’s an example from Paul’s site (slightly modified):

from xml.dom import pulldom
nodes = pulldom.parse( "file.xml" )  
for (event,node) in nodes:  
    if event=="START_ELEMENT" and node.tagName=="table":  
        nodes.expandNode( node )

Actually, I like this better than XmlReader, since it provides the nodes in a list-like construct that appeals to the functional programmer in me. I’d like it even more if Python had a native pattern matching syntax – you know, like F# – but you can get similar results by chaining together conditionals with elif.

However, IronPython doesn’t support any of the XML parsing modules from Python’s standard library. They’re all based on a C-based python module called pyexpat which IronPython can’t load. ¹ I wanted a pulldom type model, so I decided to wrap XmlReader to provide a similar API and lets me write code like this:

import ipypulldom  
nodes = ipypulldom.parse( "sample.xml" )
for node in nodes:
  if node.nodeType==XmlNodeType.Element:
    print node.xname

There are a few differences from pulldom, but it’s basically the same model. I’m using the native .NET type XmlNodeType rather than a string to indicate the node type. Furthermore, I made the node type a property of the node, rather than a separate variable. I also didn’t implement expandNode, though doing so would be a fairly straightforward combination of XmlReader.ReadSubtree and XmlDocument.Load.

I stuck the code for ipypulldom up in a new folder on my Skydrive: IronPython Stuff. It’s fairly short – only about 45 lines of code. Feel free to use it if you need it.

The FePy project has a .NET port of pyexpat as part of their distribution, so I assume that lets you use the standard pulldom implementation in IPy. FePy looks really cool but I haven’t had time to dig into it yet.↩

Series

Disclaimer

The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the authors discretion.