First, I need to convert the HTML list of Rock Band songs into a machine readable format. That means doing a little screen scraping. Originally, I used Beautiful Soup but I found that UnicodeDammit got confused on names like Blue Öyster Cult and Mötley Crüe. I’m guessing it’s broken because IronPython doesn’t have non-unicode strings.
Instead, I used SgmlReader to provide an XmlReader interface over the HTML, then queried that data via Linq to XML. I used the version of SgmlReader from MindTouch since they include a compiled binary and it seems to be the only active maintained version. I wrapped it all up in a function called load that loads HTML from either disk or the network (based on the URI scheme) into an XDocument.
def loadStream(streamreader):
from System.Xml.Linq import XDocument
from Sgml import SgmlReader
reader = SgmlReader()
reader.DocType = "HTML"
reader.InputStream = streamreader
return XDocument.Load(reader)
def load(url):
from System import Uri
from System.IO import StreamReader
if isinstance(url, str):
url = Uri(url)
if url.Scheme == "file":
from System.IO import File
with File.OpenRead(url.LocalPath) as fs:
with StreamReader(fs) as sr:
return loadStream(sr)
else:
from System.Net import WebClient
wc = WebClient()
with wc.OpenRead(url) as ns:
with StreamReader(ns) as sr:
return loadStream(sr)
def parse(text):
from System.IO import StringReader
return loadStream(StringReader(text))
I call load, passing in the URL to the list of songs. The “official” Rock Band song page loads the actual content from a different page via AJAX, so I just load the actual list directly via my load function.
Once the HTML is loaded as an XDocument, I need a way to find the specific HTML nodes I was looking for. As I said earlier, XDocument uses Linq to XML – there is not other API for querying the XML tree. In the HTML, there’s a div tag with the id “content” that contains all the song rows as table row elements. I built a simple function that uses the LINQ Single method to find the tag by it’s id attribute value.
def FindById(node, id):
def CheckId(n):
a = n.Attribute('id')
return a != None and a.Value == id
return linq.Single(node.Descendants(), CheckId)
(Side note – I didn’t like the verbosity of the “a != None and a.Value == id” line of code, by XAttributes are not comparable by value. That is, I can’t write “node.Attribute(‘id’) == XAttribute(‘id’, id)”. And writing “node.Attribute(‘id’).Value == id” only works if every node has an id attribute. Not making XAttribute comparable by value seems like a strange design choice to me.)
LINQ to objects works just fine from IronPython, with a few caveats. First, IronPython doesn’t have extension methods, so you can’t chain calls together sequentially like you can in C#. So instead of collection.Where(…).Select(…), you have to write Select(Where(collection, …), …). Second, all the LINQ methods are generic, so you have to use the verbose list syntax (for example: Single[object] or Select[object,object]). Since Python doesn’t care about the generic types, I wrote a bunch of simple helper functions around the common LINQ methods that just use object as the generic type. Here are a few examples:
def Single(col, fun):
return Enumerable.Single[object](col, Func[object, bool](fun))
def Where(col, fun):
return Enumerable.Where[object](col, Func[object, bool](fun))
def Select(col, fun):
return Enumerable.Select[object, object](col, Func[object, object](fun))
Once I have the content node, all the songs are in tr nodes beneath it. I wrote a function called ScrapeSong that transforms a song tr node into a Song object (which I’ll talk about in the next installment of this series). I use LINQ methods Select, OrderBy and ThenBy to provide me an enumeration of Song objects, ordered by date added (descending) than artist name.
def ScrapeSong(node):
tds = list(node.Elements(xhtml.ns+'td'))
anchor = list(tds[0].Elements(xhtml.ns+'a'))[0]
title = anchor.Value
url = anchor.Attribute('href').Value
artist = tds[1].Value
year = tds[2].Value
genre = tds[3].Value
difficulty = tds[4].Value
_type = tds[5].Value
added = DateTime.Parse(tds[6].Value)
return Song(title, artist, added, url, year, genre, difficulty, _type)
songs = ThenBy(OrderByDesc(
Select(content.Elements(xhtml.ns +'tr'), ScrapeSong),
lambda s: s.added), lambda s: s.artist)
And that’s pretty much it. Next, I’ll iterate thru the list of songs and get the details I need from Zune’s catalog web services in order to write out a playlist that the Zune software will understand.
Shortly after I joined the VS Languages team, we had a morale event that included a Rock Band tournament. I didn’t play that day in the tournament since I had never played before, but I was hooked just the same. I got Rock Band for my birthday, Rock Band 2 shortly after it came out in September and I’m hoping to get the AC/DC Track Pack for Christmas.
There are lots of songs available for Rock Band - 461 currently available between on-disc and downloadable tracks – with more added every week. Frankly, there’s lots of music on that list that I don’t recognize. Luckily, I’m also a Zune Pass subscriber, so I can go out and download all the Rock Band tracks and listen to them on my Zune. But who has time to manually search for 461 songs? Not me. So I wrote a little Python app to download the list of Rock Band songs and save it as a Zune playlist.
I ended up use Linq to XML very heavily in this project. Zune playlists use the same XML format as Windows playlists, Zune exposes the backend music catalog via a Atom feeds and I used Chris Lovett’s SgmlReader to expose the HTML list of Rock Band songs as XML. I realize Linq to XML wasn’t on “the list”, but I had a specific need so it got bumped to the head of the line.
BTW, for those who just want the playlist, I stuck it on my Skydrive. Unfortunately, there’s no Skydrive API right now, so I can’t automate uploading the new playlist every week. If anyone has alternative suggestions or a way to programmatically upload files to SkyDrive, let me know.