Python Blog Posts (page 1 of 2)
I’ve been doing a little CPython coding lately. Even though I left the IronPython team a while ago (and IronPython is now under new management) I’m still still a big fan of the Python language and it’s great for prototyping.
However, one thing I don’t like about Python is how it uses the PYTHONPATH environment variable. I like to keep any non-standard library dependencies in my project folder, but then you have to set the PYTHONPATH environment variable in order for the Python interpreter to resolve those packages. Personally, I wish there was a command line parameter for specifying PYTHONPATH – I hate having to modify the environment in order to execute my prototype. Yes, I realize I don’t have to modify the machine-wide environment – but I would much prefer a stateless approach to an approach that requires modification of local shell state.
I decided to build a Powershell script that takes allows the caller to invoke Python while specifying the PYTHONPATH as a parameter. The script saves off the current PYTHONPATH, sets it to the passed in value, invokes the Python interpreter with the remaining script parameters, then sets PYTHONPATH back to its original value. While I was at it, I added the ability to let the user optionally specify which version of Python to use (defaulting to the most recent) as well as a switch to let the caller chose between invoking python.exe or pythonw.exe.
The details of the script are fairly mundane. However, building a Powershell script that supported optional named parameters and collected all the unnamed arguments together in a single parameter took a little un-obvious Powershell voodoo that I thought was worth blogging about.
I started with the following param declaration for my function
param ( [string] $LibPath="", [switch] $WinApp, [string] $PyVersion="" )
These three named parameters control the various features of my Python Powershell script. Powershell has an automatic variable named $args that holds the arguments that don’t get bound to a named argument. My plan was to pass the contents of the $args parameter to the Python interpreter. And that plan works fine…so long as none of the non-switch parameters are omitted.
I mistakenly (and in retrospect, stupidly) thought that since I had provided default values for the named parameters, they would only bind to passed-in arguments by name. However, Powershell binds non-switch parameters by position if the names aren’t specified . For example, this is the command line I use to execute tests from the root of my prototype project:
cpy -LibPath .Libsite-packages .Scriptsunit2.py discover -s .src
Obviously, the $LibPath parameter gets bound to the “.Libsite-package” argument. However, since $PyVersion isn’t specified by name, it gets bound by position and picks up the “.Scriptsunit2.py” argument. Clearly, that’s not what I intended – I want “.Scriptsunit2.py” along with the remaining arguments to be passed to the Python interpreter while the PyVersion parameter gets bound to its default value.
What I needed was more control over how incoming arguments are bound to parameters. Luckily, Powershell 2 introduced Advanced Function Parameters which gives script authors exactly that kind of control over parameters binding. In particular, there are two custom attributes for parameters that allowed me to get the behavior I wanted:
- Position – allows the script author to specify what positional argument should be bound to the parameter. If this argument isn’t specified, parameters are bound in the order they appear in the param declaration
- ValueFromRemainingArguments – allows the script author to specify that all remaining arguments that haven’t been bound should be bound to this parameter. This is kind of like the Powershell equivalent of params in C# or the ellipsis in C/C++.
A little experimentation with these attributes yielded the following solution:
param ( [string] $LibPath="", [switch] $WinApp, [string] $PyVersion="", [parameter(Position=0, ValueFromRemainingArguments=$true)] $args )
Note, the first three parameters are unchanged. However, I added an explicit $args parameter (I could have named it anything, but I had already written the rest of my script against $args) with the Position=0 and ValueFromRemainingArguments=$true parameter attribute values.The combination of these two attribute values means that the $args parameter is bound to an array of all the positional (aka unnamed) incoming arguments, starting with the first position. In other words – exactly the behavior I wanted.
Not sure how many people need a Powershell script that sets PYTHONPATH and auto-selects the latest version of Python, but maybe someone will find it useful. Also, I would think this approach to variadic functions with optional named parameters could be useful in other scenarios where you are wrapping an existing tool or utility in PowerShell, but need the ability to pass arbitrary parameters thru to the tool/utility being wrapped.
Here’s a quick quiz. Which of these tasks is harder to accomplish:
- Getting $6,000 from a variety of groups within Microsoft to pay for a Gold PyCon 2009 sponsorship.
- Sending PSF a check
If you guessed #2, you’d be right. It’s amazing how difficult the seemly trivial task of “give those PSF folks money” turned out to be. But it’s done now, and you can see the MS logo there on the side of all the PyCon pages.
In addition to the sponsorship, there are some great looking IronPython sessions at PyCon.
- Michael Foord and Jonathan Hartley are running a Developing with IronPython tutorial. Michael is also delivering a talk Functional Testing of Desktop Applications.
- Jim Hugunin is delivering an invited talk IronPython: Directions, Data and Demos
- Dino Viehland is sitting on the Python VMs panel and doing a talk IronPython Implementation
- Sarah Dutkiewicz (aka Coding Geekette) is doing a talk Pumping Iron into Python: Intro to FePy
As I’ve blogged before, I use CodeHTMLer to post code snippets on my blog. I hear SyntaxHighlighter is the new hotness, but since it relies on CSS the syntax highlighting only appears on the website and not in the RSS reader.
The problem with CodeHTMLer is that it only supports a handful of languages out of the box. But the language definition file is simple enough – just an XML file with a bunch of regular expressions. When I was doing a lot of F# work, I wrote an F# language definition. Now that I’m on the IronPython team, go figure I’m writing a lot of code in Python. I *know* I’ve written a Python language definition for CodeHTMLer more than once, but I would forget to post it and then lose it when I paved my laptop hard drive. So after doing this three or four times, I’ve finally remembered to put it up on my SkyDrive.
If you want to install this yourself to colorize Python code snippets with CodeHTMLer, follow the directions I posted earlier with the F# language definition.
- Ben Hall announces IronEditor, a simple dev tool for IronPython and IronRuby. Pretty nice, though fairly simplistic (as Ben readily admits). For example, it doesn’t have an interactive mode, only the ability to execute scripts and direct the output to IronEditor’s output window. However, it is a good start and I’m sure it’ll just get better. One thing he’s apparently considering is a Silverlight version. (via Michael Foord)
- Speaking of “Iron” tools, Sapphire Steel have had an IronRuby version (in alpha) of their Ruby in Steel product for several months now. I wonder if John’s had a chance to play with it.
- Speaking of John, the ASP.NET MVC / IronRuby prototype he talked about @ TechEd is now available on ASP.NET MVC Preview 4 via Phil Haack.
- Ted Neward has an article exploring the IronPython VS Integration sample that ships in the VS SDK. As I mentioned the other day, we’re starting working on a production quality implementation of VS Integration for IPy.
- Ophir Kra-Oz (aka Evil Fish) blogs Python for Executives. I like his “Risk, Recruiting, Performance and Maturity” model – four boxes, perfect for keeping an executive’s attention! 😄 Plus Ophir has some nice things to say about IronPython. (via Michael Foord)
- Ronnie Maor blogs an extension method for PythonEngine to make Eval simpler. I especially like how he uses string format syntax so you can dynamically generate the code to eval. I wonder what this would look like in IPy 2.0 with DLR Hosting API. (via IronPython URLs)
- Speaking of DLR Hosting, Seshadri has another great DLR hosting post, this time hosting IPy inside of VS08 so you can script VS08 events (document saved, window created, etc) with Python.
- Justin Etheredge has a bunch of IronRuby posts – Getting IronRuby Up and Running, Running Applications in IronRuby, Learning Ruby via IronRuby and C# Part 1. (via Sam Gentile)
- Don Syme links to several F# related posts by Ray Vernagus, though he’s apparently also experimenting with IronRuby. I’m really interested in his Purely Functional Data Structures port to F#.
- Speaking of F#, Brian has a teaser screenshot of F# upcoming CTP. However, he chooses the New Item dialog to tease, which looks pretty much like the current new item dialog (the new one does have fewer F# templates). However, if you look in the Solution Explorer, you’ll notice a real “References” node. No more #I/#R! Yeah!
- The interactive graphic in Kevin Kelly’s One Machine article is fascinating. It really highlights that the vast vast vast majority of power, storage, CPU cycles and RAM come from personal computers on the edge. Even in bandwidth, where PC’s still have the highest share but it looks to be around 1/3rd, the aggregate of all edge devices (PCs, mobile phones, PDAs, etc.) still dominates the data centers.
When it comes to processing XML, there are two basic approaches – load it all into memory at once or process it a node at a time. In the .NET world where I have spent most of the past ten years, those two models are represented by XmlDocument and XmlReader. There are alternatives to XmlDocument, such as XDocument and XPathDocument, but you get the idea.
Personally, I’ve never been a fan of SAX’s event-driven approach. Pushing events makes total sense for a human driven UI, but I never understood why anyone thought that was a good idea for stream processing XML. I like XmlReader’s pull model much better. When you’re ready for the next node, just call Read() – no mucking about setting content handlers or handling node processing events.
Luckily, the Python standard library supports both approaches. It provides both a SAX based parser as well as a pull based parser called pulldom. Pulldom doc’s are fairly sparse, but Paul Prescod wrote a nice introduction. Here’s an example from Paul’s site (slightly modified):
from xml.dom import pulldom nodes = pulldom.parse( "file.xml" ) for (event,node) in nodes: if event=="START_ELEMENT" and node.tagName=="table": nodes.expandNode( node )
Actually, I like this better than XmlReader, since it provides the nodes in a list-like construct that appeals to the functional programmer in me. I’d like it even more if Python had a native pattern matching syntax – you know, like F# – but you can get similar results by chaining together conditionals with elif.
However, IronPython doesn’t support any of the XML parsing modules from Python’s standard library. They’re all based on a C-based python module called pyexpat which IronPython can’t load. 1 I wanted a pulldom type model, so I decided to wrap XmlReader to provide a similar API and lets me write code like this:
import ipypulldom nodes = ipypulldom.parse( "sample.xml" ) for node in nodes: if node.nodeType==XmlNodeType.Element: print node.xname
There are a few differences from pulldom, but it’s basically the same model. I’m using the native .NET type XmlNodeType rather than a string to indicate the node type. Furthermore, I made the node type a property of the node, rather than a separate variable. I also didn’t implement expandNode, though doing so would be a fairly straightforward combination of XmlReader.ReadSubtree and XmlDocument.Load.