Tomorrow may be Thanksgiving, but the Microsoft DevDiv dynamic language teams are trying to make it feel like Christmas with three separate pre-holiday releases.
- IronPython 2.0 RC2
We were really hoping to only have one release candidate, but we ended up with a couple of significant bugs that we couldn’t push off to 2.0.1. With December holidays coming soon, RC2 has a pretty small window before we declare RTM so now is the time to download the release and try your code out. - IronRuby 1.0 Alpha 2
There’s been zero blog traffic on this, just a notice on the IronRuby mailing list. As per said notice, “Notable features” include “the inclusion of iirb.bat, igem.bat, irails.bat, irake.bat”. - New DLR CodePlex Project T
he DLR source has been available as part of IronPython for over a year but now they have their own home on CodePlex. Check out the Release Notes for an overview, reads some Docs and Specs or just download their initial v0.9 beta. Their v0.9 beta is synced with IPy 2.0 RC2 (and their v0.9 final will sync with IPy 2.0 RTM) but it also includes synced versions of IronRuby and ToyScript in both source and binaries. Plus, Sesh has promised “weekly code drops”. Finally, unlike IronPython and IronRuby, DLR is using the discussion section of their CodePlex site – I’m eager to see how well the new-ish discussion/mailing list integration works.
So there you go, new versions of IronPython and IronRuby plus a whole new DLR CodePlex project to boot. Enjoy.
First, I need to convert the HTML list of Rock Band songs into a machine readable format. That means doing a little screen scraping. Originally, I used Beautiful Soup but I found that UnicodeDammit got confused on names like Blue Öyster Cult and Mötley Crüe. I’m guessing it’s broken because IronPython doesn’t have non-unicode strings.
Instead, I used SgmlReader to provide an XmlReader interface over the HTML, then queried that data via Linq to XML. I used the version of SgmlReader from MindTouch since they include a compiled binary and it seems to be the only active maintained version. I wrapped it all up in a function called load that loads HTML from either disk or the network (based on the URI scheme) into an XDocument.
def loadStream(streamreader):
from System.Xml.Linq import XDocument
from Sgml import SgmlReader
reader = SgmlReader()
reader.DocType = "HTML"
reader.InputStream = streamreader
return XDocument.Load(reader)
def load(url):
from System import Uri
from System.IO import StreamReader
if isinstance(url, str):
url = Uri(url)
if url.Scheme == "file":
from System.IO import File
with File.OpenRead(url.LocalPath) as fs:
with StreamReader(fs) as sr:
return loadStream(sr)
else:
from System.Net import WebClient
wc = WebClient()
with wc.OpenRead(url) as ns:
with StreamReader(ns) as sr:
return loadStream(sr)
def parse(text):
from System.IO import StringReader
return loadStream(StringReader(text))
I call load, passing in the URL to the list of songs. The “official” Rock Band song page loads the actual content from a different page via AJAX, so I just load the actual list directly via my load function.
Once the HTML is loaded as an XDocument, I need a way to find the specific HTML nodes I was looking for. As I said earlier, XDocument uses Linq to XML – there is not other API for querying the XML tree. In the HTML, there’s a div tag with the id “content” that contains all the song rows as table row elements. I built a simple function that uses the LINQ Single method to find the tag by it’s id attribute value.
def FindById(node, id):
def CheckId(n):
a = n.Attribute('id')
return a != None and a.Value == id
return linq.Single(node.Descendants(), CheckId)
(Side note – I didn’t like the verbosity of the “a != None and a.Value == id” line of code, by XAttributes are not comparable by value. That is, I can’t write “node.Attribute(‘id’) == XAttribute(‘id’, id)”. And writing “node.Attribute(‘id’).Value == id” only works if every node has an id attribute. Not making XAttribute comparable by value seems like a strange design choice to me.)
LINQ to objects works just fine from IronPython, with a few caveats. First, IronPython doesn’t have extension methods, so you can’t chain calls together sequentially like you can in C#. So instead of collection.Where(…).Select(…), you have to write Select(Where(collection, …), …). Second, all the LINQ methods are generic, so you have to use the verbose list syntax (for example: Single[object] or Select[object,object]). Since Python doesn’t care about the generic types, I wrote a bunch of simple helper functions around the common LINQ methods that just use object as the generic type. Here are a few examples:
def Single(col, fun):
return Enumerable.Single[object](col, Func[object, bool](fun))
def Where(col, fun):
return Enumerable.Where[object](col, Func[object, bool](fun))
def Select(col, fun):
return Enumerable.Select[object, object](col, Func[object, object](fun))
Once I have the content node, all the songs are in tr nodes beneath it. I wrote a function called ScrapeSong that transforms a song tr node into a Song object (which I’ll talk about in the next installment of this series). I use LINQ methods Select, OrderBy and ThenBy to provide me an enumeration of Song objects, ordered by date added (descending) than artist name.
def ScrapeSong(node):
tds = list(node.Elements(xhtml.ns+'td'))
anchor = list(tds[0].Elements(xhtml.ns+'a'))[0]
title = anchor.Value
url = anchor.Attribute('href').Value
artist = tds[1].Value
year = tds[2].Value
genre = tds[3].Value
difficulty = tds[4].Value
_type = tds[5].Value
added = DateTime.Parse(tds[6].Value)
return Song(title, artist, added, url, year, genre, difficulty, _type)
songs = ThenBy(OrderByDesc(
Select(content.Elements(xhtml.ns +'tr'), ScrapeSong),
lambda s: s.added), lambda s: s.artist)
And that’s pretty much it. Next, I’ll iterate thru the list of songs and get the details I need from Zune’s catalog web services in order to write out a playlist that the Zune software will understand.
Shortly after I joined the VS Languages team, we had a morale event that included a Rock Band tournament. I didn’t play that day in the tournament since I had never played before, but I was hooked just the same. I got Rock Band for my birthday, Rock Band 2 shortly after it came out in September and I’m hoping to get the AC/DC Track Pack for Christmas.
There are lots of songs available for Rock Band - 461 currently available between on-disc and downloadable tracks – with more added every week. Frankly, there’s lots of music on that list that I don’t recognize. Luckily, I’m also a Zune Pass subscriber, so I can go out and download all the Rock Band tracks and listen to them on my Zune. But who has time to manually search for 461 songs? Not me. So I wrote a little Python app to download the list of Rock Band songs and save it as a Zune playlist.
I ended up use Linq to XML very heavily in this project. Zune playlists use the same XML format as Windows playlists, Zune exposes the backend music catalog via a Atom feeds and I used Chris Lovett’s SgmlReader to expose the HTML list of Rock Band songs as XML. I realize Linq to XML wasn’t on “the list”, but I had a specific need so it got bumped to the head of the line.
BTW, for those who just want the playlist, I stuck it on my Skydrive. Unfortunately, there’s no Skydrive API right now, so I can’t automate uploading the new playlist every week. If anyone has alternative suggestions or a way to programmatically upload files to SkyDrive, let me know.