IronPython and Linq to XML Part 3: Consuming Atom Feeds

Now that I have my list of Rock Band songs, I need to generate a Zune playlist. I wrote that Zune just uses the WMP playlist format, but that’s not completely true. Media elements in a Zune playlist have several attributes that appear unique to Zune.

Because of Zune Pass, Zune supports the idea of streaming playlists where the songs are downloaded on demand instead of played from the local hard drive. In order to enable this, media elements in Zune playlists can have a serviceID attribute, a GUID that uniquely identifies the song on the Zune service. We also need the song’s album and duration – the Zune software summarily removes songs that don’t include the duration.

Of course, the Rock Band song list doesn’t include the Zune song service ID. It also doesn’t include the song’s album or duration. So we need a way, given the song’s title and artist (which we do have), to get its album, duration and service ID. Luckily, the Zune service provides a way to do exactly this, albeit an undocumented way. Via Fiddler2, I learned that Zune exposes a set of Atom feed web services on catalog.zune.net that the UI uses when you search the marketplace from the Zune software. There are feeds to search by artist and by album, but the one we care about is the search by track. For example, here’s the track query for Pinball Wizard by The Who.

Since these feeds are real XML, I can simply use XDocument.Load to suck down the XML. Then I look for the first Atom entry element using LINQ to XML techniques similar to the ones I wrote about last time. If there are no Atom entry elements, that means that the search failed – either Zune doesn’t know about the song or it can’t find it via the Rock Band provided title and artist. Of the 461 songs on Rock Band right now, my script can find 417 of them on Zune automatically.

Of course, since the Zune data is in XML instead of HTML, finding the data I’m looking for is much easier than it was to find the Rock Band song data. Here’s the code to pull the relevant information that we need out of the Zune catalog feed.

def ScrapeEntry(entry):
  id = entry.Element(atomns+'id').Value  
  length = entry.Element(zunens+'length').Value  

  d = {}  
  d['trackTitle'] = entry.Element(atomns+'title').Value  
  d['albumArtist'] = entry.Element(zunens+'primaryArtist').Element(zunens+'name').Value  
  d['trackArtist'] = d['albumArtist']  
  d['albumTitle'] = entry.Element(zunens+'album').Element(zunens+'title').Value  

  if id.StartsWith('urn:uuid:'):  
    d['serviceId'] = "{" + id.Substring(9) + "}"  
  else:  
    d['serviceId'] = id  

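  # Convert a duration such as PT3M23S to milliseconds
  m = length_re.Match(length)
  if m.Success:
    minutes = int(m.Groups[1].Value)
    seconds = int(m.Groups[2].Value)
    d['duration'] = str((minutes * 60 + seconds) * 1000)
  else:
    # Fall back to one minute, since Zune removes songs that have no duration
    d['duration'] = '60000'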

  return d  

trackurl = catalogurl + song.search_string
trackfeed = XDocument.Load(trackurl)  
trackentry = First(trackfeed.Descendants(atomns+'entry'))  
track = ScrapeEntry(trackentry)
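
For anyone following along without IronPython, the "first entry or nothing" check can be sketched with CPython 3's standard library. This is only an illustration under a few assumptions, not the script above: xml.etree.ElementTree stands in for LINQ to XML, the two feeds are hand-written samples (with a placeholder GUID) rather than real catalog output, and first_entry and service_id are names invented for the sketch.

```python
import xml.etree.ElementTree as ET

# The Atom namespace, in ElementTree's {uri}tag notation
ATOM = "{http://www.w3.org/2005/Atom}"

def first_entry(feed_xml):
    """Return the first Atom entry element, or None if the feed has none."""
    root = ET.fromstring(feed_xml)
    return next(root.iter(ATOM + "entry"), None)

def service_id(entry):
    """Turn urn:uuid:<guid> into {<guid>}; pass any other id through unchanged."""
    raw = entry.find(ATOM + "id").text
    prefix = "urn:uuid:"
    if raw.startswith(prefix):
        return "{" + raw[len(prefix):] + "}"
    return raw

# Hand-written sample feeds (placeholder GUID), for illustration only
HIT = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:uuid:00000000-0000-0000-0000-000000000000</id>
    <title>Pinball Wizard</title>
  </entry>
</feed>"""
MISS = '<feed xmlns="http://www.w3.org/2005/Atom"/>'

for feed in (HIT, MISS):
    entry = first_entry(feed)
    if entry is None:
        print("search failed")  # no entry element means no match
    else:
        print(entry.find(ATOM + "title").text, service_id(entry))
```

Returning None for an empty feed keeps the "search failed" case explicit, so the caller can skip that song instead of crashing while scraping a missing entry.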

A few quick notes:

  • song.search_string returns the song title and artist as a plus-delimited string, e.g. pinball+wizard+the+who. However, many Rock Band songs end in a parenthetical like (Cover Version), so I automatically strip that off for the search string.
  • duration in the Atom feed is stored like PT3M23S, which means the song is 3:23 long. The playlist file expects the song length in milliseconds, so I use a .NET regular expression to pull out the minutes and seconds and do the conversion. It’s not exact, since song lengths usually aren’t a whole number of seconds, but as far as I can understand, Zune just uses that value for display in the UI – it doesn’t affect playback at all.
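
Both notes boil down to a little string handling, which can be sketched in plain CPython 3. Treat this as an approximation under stated assumptions: the real search_string property and length_re pattern aren't shown in this post, so the two regular expressions, the lowercasing (taken from the pinball+wizard+the+who example) and the function names are guesses, and the sketch doesn't URL-encode punctuation.

```python
import re

# Assumed patterns; the original script's regular expressions are not shown
_PAREN_SUFFIX = re.compile(r"\s*\([^)]*\)\s*$")           # trailing "(Cover Version)"
_DURATION = re.compile(r"^PT(?:(\d+)M)?(?:(\d+)S)?$")     # e.g. PT3M23S or PT45S

def search_string(title, artist):
    """Build a plus-delimited query from title and artist, minus any trailing parenthetical."""
    title = _PAREN_SUFFIX.sub("", title)
    return "+".join((title + " " + artist).lower().split())

def duration_ms(length, default=60000):
    """Convert a PTxMyS duration to milliseconds, falling back to default."""
    m = _DURATION.match(length)
    if not m or not (m.group(1) or m.group(2)):
        return default
    minutes = int(m.group(1) or 0)
    seconds = int(m.group(2) or 0)
    return (minutes * 60 + seconds) * 1000

print(search_string("Pinball Wizard", "The Who"))
print(duration_ms("PT3M23S"))
```

For PT3M23S the arithmetic is (3 * 60 + 23) * 1000 = 203000 milliseconds. Making the minutes and seconds groups optional is a choice of this sketch, so that a value like PT45S still converts instead of falling through to the one-minute default.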

I now have a list of songs with all the relevant metadata. Next time, I’ll write it out into a Zune playlist file.

Early Christmas from Iron Languages and DLR

Tomorrow may be Thanksgiving, but the Microsoft DevDiv dynamic language teams are trying to make it feel like Christmas with three separate pre-holiday releases.

  1. IronPython 2.0 RC2 
    We were really hoping to only have one release candidate, but we ended up with a couple of significant bugs that we couldn’t push off to 2.0.1. With December holidays coming soon, RC2 has a pretty small window before we declare RTM so now is the time to download the release and try your code out.
  2. IronRuby 1.0 Alpha 2 
    There’s been zero blog traffic on this, just a notice on the IronRuby mailing list. As per said notice, “Notable features” include “the inclusion of iirb.bat, igem.bat, irails.bat, irake.bat”.
  3. New DLR CodePlex Project
    The DLR source has been available as part of IronPython for over a year, but now it has its own home on CodePlex. Check out the Release Notes for an overview, read some Docs and Specs or just download their initial v0.9 beta. Their v0.9 beta is synced with IPy 2.0 RC2 (and their v0.9 final will sync with IPy 2.0 RTM) but it also includes synced versions of IronRuby and ToyScript in both source and binaries. Plus, Sesh has promised “weekly code drops”. Finally, unlike IronPython and IronRuby, DLR is using the discussion section of their CodePlex site – I’m eager to see how well the new-ish discussion/mailing list integration works.

So there you go, new versions of IronPython and IronRuby plus a whole new DLR CodePlex project to boot. Enjoy.

IronPython and Linq to XML Part 2: Screen Scraping

First, I need to convert the HTML list of Rock Band songs into a machine readable format. That means doing a little screen scraping. Originally, I used Beautiful Soup but I found that UnicodeDammit got confused on names like Blue Öyster Cult and Mötley Crüe. I’m guessing it’s broken because IronPython doesn’t have non-unicode strings.

Instead, I used SgmlReader to provide an XmlReader interface over the HTML, then queried that data via Linq to XML. I used the version of SgmlReader from MindTouch since they include a compiled binary and it seems to be the only actively maintained version. I wrapped it all up in a function called load that loads HTML from either disk or the network (based on the URI scheme) into an XDocument.

def loadStream(streamreader):
  from System.Xml.Linq import XDocument
  from Sgml import SgmlReader

  reader = SgmlReader()
  reader.DocType = "HTML"
  reader.InputStream = streamreader
  return XDocument.Load(reader)

def load(url):
  from System import Uri
  from System.IO import StreamReader

  if isinstance(url, str):
    url = Uri(url)

  if url.Scheme == "file":
    from System.IO import File
    with File.OpenRead(url.LocalPath) as fs:
      with StreamReader(fs) as sr:
        return loadStream(sr)
  else:
    from System.Net import WebClient
    wc = WebClient()
    with wc.OpenRead(url) as ns:
      with StreamReader(ns) as sr:
        return loadStream(sr)

def parse(text):
  from System.IO import StringReader
  return loadStream(StringReader(text))

I call load, passing in the URL to the list of songs. The “official” Rock Band song page loads the actual content from a different page via AJAX, so I just load the actual list directly via my load function.

Once the HTML is loaded as an XDocument, I need a way to find the specific HTML nodes I’m looking for. As I said earlier, XDocument uses Linq to XML – there is no other API for querying the XML tree. In the HTML, there’s a div tag with the id “content” that contains all the song rows as table row elements. I built a simple function that uses the LINQ Single method to find the tag by its id attribute value.

def FindById(node, id):
  def CheckId(n):
    a = n.Attribute('id')
    return a != None and a.Value == id

  return linq.Single(node.Descendants(), CheckId)

(Side note – I didn’t like the verbosity of the a != None and a.Value == id line of code, but XAttributes are not comparable by value. That is, I can’t write node.Attribute('id') == XAttribute('id', id). And writing node.Attribute('id').Value == id only works if every node has an id attribute. Not making XAttribute comparable by value seems like a strange design choice to me.)
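
For comparison, here is roughly how the same lookup might look in CPython 3 with xml.etree.ElementTree, where a missing attribute simply comes back as None. This is a sketch of the idea rather than a port of FindById: find_by_id is a made-up name, the HTML fragment is a tiny hand-written sample, and the "exactly one match" rule is written out by hand to imitate what LINQ's Single enforces.

```python
import xml.etree.ElementTree as ET

def find_by_id(node, wanted):
    """Return the single descendant whose id attribute equals wanted."""
    # n.get('id') is None when the attribute is absent, so no separate null check
    matches = [n for n in node.iter()
               if n is not node and n.get("id") == wanted]
    if len(matches) != 1:
        raise ValueError("expected exactly one element with id=%r, found %d"
                         % (wanted, len(matches)))
    return matches[0]

# Hand-written sample markup, for illustration only
doc = ET.fromstring(
    '<html><body><div id="content"><table/></div><div/></body></html>')
content = find_by_id(doc, "content")
print(content.tag)
```

Skipping the node itself mirrors Descendants(), which doesn't include the element it is called on.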

LINQ to objects works just fine from IronPython, with a few caveats. First, IronPython doesn’t have extension methods, so you can’t chain calls together sequentially like you can in C#. So instead of collection.Where(…).Select(…), you have to write Select(Where(collection, …), …). Second, all the LINQ methods are generic, so you have to use the verbose list syntax (for example: Single[object] or Select[object,object]). Since Python doesn’t care about the generic types, I wrote a bunch of simple helper functions around the common LINQ methods that just use object as the generic type. Here are a few examples:

def Single(col, fun):
  return Enumerable.Single[object](col, Func[object, bool](fun))

def Where(col, fun):
  return Enumerable.Where[object](col, Func[object, bool](fun))

def Select(col, fun):
  return Enumerable.Select[object, object](col, Func[object, object](fun))
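
If the inside-out nesting is hard to picture, it can be tried without .NET at all. The lowercase where and select below are pure-Python stand-ins written for this sketch (plain generators, not the Enumerable wrappers above), and the list of years is made-up sample data.

```python
def where(col, fun):
    """Stand-in for Enumerable.Where: keep the items that pass the predicate."""
    return (x for x in col if fun(x))

def select(col, fun):
    """Stand-in for Enumerable.Select: project every item through fun."""
    return (fun(x) for x in col)

years = [1969, 1975, 1983, 1991]  # sample data

# C# would chain this:  years.Where(y => y < 1980).Select(y => y % 100)
# Without extension methods the first call ends up innermost:
nested = list(select(where(years, lambda y: y < 1980), lambda y: y % 100))

# The same pipeline as a native Python generator expression
native = [y % 100 for y in years if y < 1980]

print(nested, native)
```

The call that comes first in the C# chain sits deepest in the nested form, so the nested version reads from the inside out.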

Once I have the content node, all the songs are in tr nodes beneath it. I wrote a function called ScrapeSong that transforms a song tr node into a Song object (which I’ll talk about in the next installment of this series). I use LINQ methods Select, OrderBy and ThenBy to provide me an enumeration of Song objects, ordered by date added (descending) then artist name.

def ScrapeSong(node):
  tds = list(node.Elements(xhtml.ns+'td'))
  anchor = list(tds[0].Elements(xhtml.ns+'a'))[0]

  title = anchor.Value
  url = anchor.Attribute('href').Value
  artist = tds[1].Value
  year = tds[2].Value
  genre = tds[3].Value
  difficulty = tds[4].Value
  _type = tds[5].Value
  added = DateTime.Parse(tds[6].Value)

  return Song(title, artist, added, url, year, genre, difficulty, _type)

songs = ThenBy(OrderByDesc(
          Select(content.Elements(xhtml.ns +'tr'), ScrapeSong),
          lambda s: s.added), lambda s: s.artist)
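
The same "newest first, then by artist" ordering can be checked in plain CPython 3, which helps when testing the sort logic away from .NET. The sketch assumes a cut-down Song with only three fields and uses invented sample rows. It relies on Python's sort being stable, so it sorts by the secondary key first and the primary key second.

```python
from collections import namedtuple
from datetime import date

# Hypothetical minimal Song; the real class carries more fields
Song = namedtuple("Song", "title artist added")

def order_songs(songs):
    """Order by date added (descending), then by artist name (ascending)."""
    by_artist = sorted(songs, key=lambda s: s.artist)
    # Stable sort: songs added on the same day keep their artist order
    return sorted(by_artist, key=lambda s: s.added, reverse=True)

# Invented sample rows, for illustration only
sample = [
    Song("A", "Zed", date(2008, 11, 18)),
    Song("B", "Abe", date(2008, 11, 18)),
    Song("C", "Moe", date(2008, 11, 25)),
]

for s in order_songs(sample):
    print(s.added, s.artist, s.title)
```

ThenBy expresses the tie-break directly on an already ordered sequence, while the two-pass version gets the same result from stability alone.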

And that’s pretty much it. Next, I’ll iterate thru the list of songs and get the details I need from Zune’s catalog web services in order to write out a playlist that the Zune software will understand.

IronPython and Linq to XML Part 1: Introduction

Shortly after I joined the VS Languages team, we had a morale event that included a Rock Band tournament. I didn’t play that day in the tournament since I had never played before, but I was hooked just the same. I got Rock Band for my birthday, Rock Band 2 shortly after it came out in September and I’m hoping to get the AC/DC Track Pack for Christmas.

There are lots of songs available for Rock Band – 461 currently available between on-disc and downloadable tracks – with more added every week. Frankly, there’s lots of music on that list that I don’t recognize. Luckily, I’m also a Zune Pass subscriber, so I can go out and download all the Rock Band tracks and listen to them on my Zune. But who has time to manually search for 461 songs? Not me. So I wrote a little Python app to download the list of Rock Band songs and save it as a Zune playlist.

I ended up using Linq to XML very heavily in this project. Zune playlists use the same XML format as Windows playlists, Zune exposes the backend music catalog via Atom feeds and I used Chris Lovett’s SgmlReader to expose the HTML list of Rock Band songs as XML. I realize Linq to XML wasn’t on “the list”, but I had a specific need so it got bumped to the head of the line.

BTW, for those who just want the playlist, I stuck it on my SkyDrive. Unfortunately, there’s no SkyDrive API right now, so I can’t automate uploading the new playlist every week. If anyone has alternative suggestions or a way to programmatically upload files to SkyDrive, let me know.

IronPython and WPF Part 5: Interactive Console

One of the hallmarks of dynamic language programming is the use of the interactive prompt, otherwise known as the Read-Eval-Print-Loop or REPL. Even though I’m building a WPF client application, I’d still like to have the ability to poke around and even modify the app as it’s running from the command prompt, REPL style.

If you work thru the IronPython Tutorial, there are exercises for interactively building both a WinForms and a WPF application. In both scenarios, you create a dedicated thread to service the UI so it can run while the interactive prompt thread is blocked waiting for user input. However, as we saw in the last part of this series, UI elements in both WinForms and WPF can only be accessed from the thread they are created on. We already know how to marshal calls to the correct UI thread – Dispatcher.Invoke. However, what we need is a way to intercept commands entered on the interactive prompt so we can marshal them to the correct thread before they execute.

Luckily, IronPython provides just such a mechanism: the clr module’s SetCommandDispatcher. A command dispatcher is a function hook that gets called for every command the user enters. It receives a single parameter, a delegate representing the command the user entered. In the WPF and WinForms tutorials, you use this function hook to marshal the commands to the right thread to be executed. Here’s the command dispatcher from the WPF tutorial:

def DispatchConsoleCommand(consoleCommand):
    if consoleCommand:
        dispatcher.Invoke(DispatcherPriority.Normal, consoleCommand)

The dispatcher.Invoke call looks kinda like the UIThread decorator from the Background Processing part of this series, doesn’t it?

Quick aside: I looked at using SyncContext here instead of Dispatcher, since I don’t care about propagating a return value back to the interactive console thread. However, SyncContext expects a SendOrPostCallback, which expects a single object parameter. The delegate passed to the console hook function is an Action with no parameters. I could have built a wrapper function that took a single parameter which it would ignore, but I decided it wasn’t worth it. The more I look at it, the more I believe SyncContext is a good idea with a bad design.
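
To see the marshaling idea on its own, without WPF, here is a toy version in CPython 3. It is a sketch under obvious assumptions: WorkerDispatcher is an invented class built on a queue and a plain thread, not the real Dispatcher, and nothing here touches clr.SetCommandDispatcher. It only shows a hook that hands each command to one dedicated thread and waits for it to finish.

```python
import queue
import threading

class WorkerDispatcher(object):
    """Runs callables on one dedicated thread, a toy stand-in for a Dispatcher."""

    def __init__(self):
        self._commands = queue.Queue()
        self.thread = threading.Thread(target=self._loop)
        self.thread.daemon = True  # the worker exits when the main thread does
        self.thread.start()

    def _loop(self):
        while True:
            command, done, outcome = self._commands.get()
            try:
                outcome["result"] = command()
            except Exception as exc:
                outcome["error"] = exc  # handed back to the calling thread
            finally:
                done.set()

    def invoke(self, command):
        """Block the caller until command has run on the worker thread."""
        done = threading.Event()
        outcome = {}
        self._commands.put((command, done, outcome))
        done.wait()
        if "error" in outcome:
            raise outcome["error"]
        return outcome["result"]

dispatcher = WorkerDispatcher()

def dispatch_console_command(console_command):
    # Same shape as the tutorial hook: skip empty commands, marshal the rest
    if console_command:
        dispatcher.invoke(console_command)

print(dispatcher.invoke(threading.current_thread) is dispatcher.thread)
```

Because invoke blocks until the worker is done, the prompt doesn't come back until the command has actually run on the other thread, which is the behavior you want from an interactive console.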

I wrapped all the thread creation and command dispatching into a reusable helper class called InteractiveApp.

class InteractiveApp(object):
  def __init__(self):
    self.evt = AutoResetEvent(False)

    thrd = Thread(ThreadStart(self.thread_start))
    thrd.ApartmentState = ApartmentState.STA
    thrd.IsBackground = True
    thrd.Start()

    self.evt.WaitOne()
    clr.SetCommandDispatcher(self.DispatchConsoleCommand)

  def thread_start(self):
    try:
      self.app = Application()
      self.app.Startup += self.on_startup
      self.app.Run()
    finally:
      clr.SetCommandDispatcher(None)

  def on_startup(self, *args):
    self.dispatcher = Threading.Dispatcher.FromThread(Thread.CurrentThread)
    self.evt.Set()

  def DispatchConsoleCommand(self, consoleCommand):
    if consoleCommand:
        self.dispatcher.Invoke(consoleCommand)

  def __getattr__(self, name):
    return getattr(self.app, name)

The code is pretty self-explanatory. The constructor (__init__) creates the UI thread, starts it, waits for it to signal that it’s ready via an AutoResetEvent and then finally sets the command dispatcher. The UI thread creates and runs the WPF application, saves the dispatcher object as a field on the object, then signals that it’s ready. DispatchConsoleCommand is nearly identical to the earlier version; I’ve just made it an instance method instead of a stand-alone function. Finally, I define __getattr__ so that any operations invoked on InteractiveApp are passed thru to the contained WPF Application instance.
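
Two pieces of that class are easy to try in isolation: the "wait until the other thread is ready" handshake and the __getattr__ pass-thru. The CPython 3 sketch below assumes a FakeApplication class invented to stand in for the WPF Application, with threading.Event playing the role of AutoResetEvent. It is an illustration of the pattern, not a replacement for InteractiveApp.

```python
import threading

class FakeApplication(object):
    """Hypothetical stand-in for the wrapped WPF Application."""
    def __init__(self):
        self.title = "demo"

    def describe(self):
        return "app:" + self.title

class InteractiveWrapper(object):
    def __init__(self):
        self.ready = threading.Event()
        worker = threading.Thread(target=self._thread_start)
        worker.daemon = True
        worker.start()
        # Don't return until the worker has created self.app
        self.ready.wait()

    def _thread_start(self):
        self.app = FakeApplication()
        self.ready.set()

    def __getattr__(self, name):
        # Only called when normal lookup fails, so 'ready' and 'app' resolve
        # directly. If 'app' were never assigned, this line would recurse
        # forever, which is one more reason to wait for the ready signal.
        return getattr(self.app, name)

wrapper = InteractiveWrapper()
print(wrapper.describe(), wrapper.title)
```

Anything the wrapper doesn't define itself falls thru to the wrapped object, which is why code at the prompt can treat the wrapper as if it were the application.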

In my app.py file, I look to see if the module has been started directly or if it’s been imported into another module. If the module is run directly (aka ‘ipy app.py’) then the global __name__ variable will be ‘__main__’. In that case, we start the application up normally (i.e. without the interactive prompt) by just creating an Application then running it with a Window instance. Otherwise, we are importing this app into another module (typically, the interactive console), so we create an InteractiveApp instance and we create an easy to use run method that can create the instance of the main window.

if __name__ == '__main__':
  app = wpf.Application()
  window1 = MainWin.MainWindow()
  app.Run(window1.root)

else:  
  app = wpf.InteractiveApp()

  def run():
    global mainwin
    mainwin = MainWin.MainWindow()
    mainwin.root.Show()

If you want to run the app interactively, you simply import the app module and call run. Here’s a sample session where I iterate thru the items bound to the first list box. Of course, I can do a variety of other operations, such as manipulating the data or creating new UI elements.

IronPython 2.0 (2.0.0.0) on .NET 2.0.50727.3053
>>> import app
>>> app.run()
#at this point the app window launches
>>> for i in app.mainwin.allAlbumsListBox.Items:
...     print i.title
...
Harvest Festivals
Mrs. Gardner's Art
Riley's Playdate
August 13
Camp Days
July 14
May Photo Shoot
Summer Play 2006
Lake Washington With The Gellers
Camp Pierson '06
January 28

One small thing to keep in mind: if you exit the command prompt, the UI thread will also exit since it’s marked as a background thread. Also, it looks like you could shut the client down then call run again to restart it, but you can’t. If you shut the client down, the Run method in InteractiveApp.thread_start exits, resets the Command Dispatcher to nothing and the thread terminates. I could fix it so that you could run the app multiple times, but I find I typically only run the app once for a given session anyway.