Tag Archives : LINQ


Ambiguous ExtensionAttribute Errors

I was recently contacted by Nathanael Jones of the ImageResizer project about a question he had posted on Stack Overflow:

How can a single .NET assembly, targeting 2.0, 3.0, 3.5, 4.0, and 4.5 concurrently, support extension methods for both C# and VB.NET consumers?

Short Answer: You can’t. You think you can, but if you’re serious about targeting .NET 2.0/3.0 and 3.5+ as well as that whole C# and VB support thing, you can’t. Not really.

Long Answer: People love extension methods. Seriously, I think some people want to marry extension methods, they love them so much. They just can’t stand to be without their extension methods, even when they’re using .NET 2.0.

Rather than go without, some people figured out how to get extension methods support on older versions of the .NET Runtime. Extension methods are essentially a compile time technology – the IL that gets emitted for calling an extension method is identical to the IL for calling a normal static method. The only runtime dependency for extension methods is the ExtensionAttribute which is used to mark methods intended to be used as extension methods (as well as classes and assemblies that contain them). ExtensionAttribute is defined in System.Core from .NET 3.5, but it’s just a marker. If you define your own copy of ExtensionAttribute and use the VS 2008 or later version of the C# compiler, you can get extension methods to work on .NET 2.0.

Back when I was working on IronPython, we ran into this exact issue when we merged DLR expression trees with LINQ expression trees. LINQ trees used extension methods all over the place, but we still needed to support .NET 2.0 in IronPython. We were already using the VS08 compiler so all we had to do was add our own copy of ExtensionAttribute to the DLR and we were good to go…or so we thought. Instead, we discovered that this approach doesn’t work as advertised – at least not if you care about VB support.

The problem stems from having multiple copies of ExtensionAttribute. IronPython and DLR had no problem – they were compiled for .NET 2.0 and thus had only the one copy of ExtensionAttribute that we added to the DLR. But people who used IronPython or DLR in a .NET 3.5 project ended up with two copies of ExtensionAttribute – the copy we added to DLR and the official System.Core version. Two copies of a system provided type == start of a big problem.

Actually, if you’re only using C#, having multiple copies of ExtensionAttribute isn’t that big a deal. The C# compiler raises a warning when it finds multiple definitions of a type in the System namespace. Because ExtensionAttribute is in the System namespace, C# has to pick one of the colliding type definitions to use. However, since the copies of ExtensionAttribute are identical, it doesn’t matter which version the C# compiler picks.

Unfortunately, Visual Basic is much less forgiving when it encounters multiple versions of the same type. Instead of a warning like C#, the VB compiler raises an error when it encounters multiple definitions of ExtensionAttribute. So the “define your own ExtensionAttribute” approach leaves you with a DLL that won’t work from VB on .NET 3.5 or later.

Excluding VB on what was the most recent version of .NET at the time was a non-starter for us, so we investigated other options. We discovered that we could “solve” this issue (again “or so we thought”) by having an internal definition of ExtensionAttribute in every assembly that needed it. Since the types weren’t public, VB stopped complaining about type collisions. C# still had the compiler warning, but we had already decided not to care about that.

I said at the time “It seems counterintuitive, doesn’t it? To solve a multiple type definition problem, we defined even more copies of the type in question.” Yeah, turns out I was kinda way wrong about that. We discovered later that having an internal ExtensionAttribute per project solved the VB ambiguous type error but introduced a new “break all the other extension methods in the project error”.

Remember earlier when I wrote it didn’t matter which copy of ExtensionAttribute the C# compiler picks because they are “identical”? Remember when I wrote we could solve the VB ambiguous type error by changing the visibility of ExtensionAttribute? Whoops. Changing the visibility of our ExtensionAttribute meant it was no longer identical, which meant it kinda mattered which copy of ExtensionAttribute the C# compiler chose. If the C# compiler picked one of our internal ExtensionAttributes, it would break every use of extension methods in the project referencing IronPython or the DLR!

We investigated a bunch of ways to control which version of ExtensionAttribute was selected by the C# compiler, but we couldn’t find an easy, obvious way in MSBuild to control the order of references passed to the compiler. In the end, we moved the custom ExtensionAttribute into its own DLL. That way, we could reference it from our IronPython and DLR projects to get extension method support but .NET 3.5 projects referencing either IronPython or DLR could reference System.Core instead. We still got the C# warning, but since we were back to identical ExtensionAttribute definitions, the warning could be ignored.

Believe me, if there had been any way to remove the extension methods from the DLR and IronPython, we would have done it. Having a separate assembly with just a single custom attribute definition is an ugly hack, pure and simple. But the DLR was essentially the .NET 4.0 version of System.Core – getting it to run alongside the .NET 3.5 version of System.Core was bound to require hacks.

My advice to Nathanael on SO was the same as I gave at the top of this post: don’t use the “define your own ExtensionAttribute” hack to try and get extension method support on .NET 2.0. Extension methods are nice, but they aren’t worth the headache of dealing with the errors that stem from multiple definitions of ExtensionAttribute when you try to use your library from .NET 3.5 or later.


IronPython and Linq to XML Part 4: Generating XML

Now that I have my list of Rock Band songs and I can get the right Zune metadata for most of them, I just need to write out the playlist XML. This is very straightforward to do with the classes in System.Xml.Linq.

def GenMediaElement(song):
  try:
    trackurl = zune_catalog_url + song.search_string     
    trackfeed = XDocument.Load(trackurl)
    trackentry = First(trackfeed.Descendants(atomns+'entry'))
    trk = ScrapeEntry(trackentry)
    return XElement('media', (XAttribute(key, trk[key]) for key in trk))
  except:
    print "FAILED", song     
     
zpl = XElement("smil",     
  XElement("head",  
    XElement("title", "Rock Band Generated Playlist")),     
  XElement("body",     
    XElement("seq", (GenMediaElement(song) for song in songs))))

settings = XmlWriterSettings()
settings.Indent = True     
settings.Encoding = Encoding.UTF8     
with XmlWriter.Create("rockband.zpl", settings) as xtw:
  zpl.WriteTo(xtw)

XElement’s constructor takes a name (XName to be precise) and any number of child objects. These child objects can be XML nodes (aka XObjects) or simple content objects like strings or numbers. If you pass an IEnumerable, the XElement constructor will iterate the collection and add all the items as children of the element. If you’ve had the displeasure of building an XML tree using the DOM, you’ll really appreciate XElement’s fluent interface. I was worried that Python’s significant whitespace would force me to put all the nested XElements on a single line, but luckily Python doesn’t treat whitespace inside parentheses as significant.

Creating collections in Python is even easier than it is in C#. Python supports a yield keyword, which is basically the equivalent of C#’s yield return. However, Python also supports generator expressions (the lazy cousins of list comprehensions), which are similar to F#’s sequence expressions. These are nice because you can specify a collection in a single line, rather than having to create a separate function, which is what you have to do to use yield. I have two generator expressions: (XAttribute(key, trk[key]) for key in trk) creates a collection of XAttributes, one for every item in the trk dictionary, and (GenMediaElement(song) for song in songs) generates a collection of XElements, one for every song in the songs collection.
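To make the yield versus generator expression comparison concrete, here is a minimal sketch in plain Python with no .NET types involved. The sample trk dictionary and the function name attribute_pairs_with_yield are made up for illustration; tuples stand in for the XAttribute objects built in the real script.

```python
# Hypothetical stand-in for the dictionary returned by ScrapeEntry.
trk = {'trackTitle': 'Pinball Wizard', 'trackArtist': 'The Who'}

def attribute_pairs_with_yield(d):
    # Generator function: needs its own def and yields one item at a time.
    for key in d:
        yield (key, d[key])

# Generator expression: the same lazy sequence, written inline.
pairs_inline = ((key, trk[key]) for key in trk)

# Both are lazy; nothing is computed until something iterates over them.
from_yield = sorted(attribute_pairs_with_yield(trk))
from_expression = sorted(pairs_inline)
```

Either form can be handed to anything that iterates, which is why the generator expression can be passed straight into a constructor call.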

Once I’ve finished building the playlist XML, I need to write it out to a file. Originally, I used Python’s built-in open function, but the playlist file had to be UTF-8 because of band names like Mötley Crüe, so I switched to an XmlWriter with explicit settings. Zune’s software appears to always use UTF-8. In addition to setting the encoding, I also specify to use indentation, so the resulting file is somewhat readable by humans.
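The encoding issue is easy to see outside of .NET too. The following is a rough sketch in plain Python 3, not the XmlWriter approach used in the script: it writes a couple of band names with an explicit UTF-8 encoding and then looks at the raw bytes. The file name bands.txt and the temporary directory are assumptions made purely for the example.

```python
import io
import os
import tempfile

# Sample names; the point is the non-ASCII characters.
names = ['Mötley Crüe', 'Blue Öyster Cult']

# Hypothetical output location in a throwaway temp directory.
path = os.path.join(tempfile.mkdtemp(), 'bands.txt')

# Pass the encoding explicitly instead of relying on the platform default.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(', '.join(names))

# Reading the raw bytes back shows each umlaut stored as two UTF-8 bytes.
with io.open(path, 'rb') as f:
    raw = f.read()
```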

The playlist works great in the Zune software, but since it’s a streaming playlist there’s no easy way to automatically download all the songs and sync them to your Zune device. I expected to be able to right click on the playlist and select “download all”, but there’s no such option. Zune does have a concept called Channels where the songs from a regularly updated feed are downloaded locally and synced to the device. However, the Zune software appears to be hardcoded to only download channels from the catalog service so I couldn’t tap into that. If anyone knows how to sign up to become a Zune partner channel, please drop me a line.

So there you have it. As usual, I’ve stuck the code up on my SkyDrive. If I can remember, I’ll try and run the script once a week and upload the new playlist to my SkyDrive as well.


IronPython and Linq to XML Part 3: Consuming Atom Feeds

Now that I have my list of Rock Band songs, I need to generate a Zune playlist. I wrote that Zune just uses the WMP playlist format, but that’s not completely true. Media elements in a Zune playlist have several attributes that appear unique to Zune.

Because of Zune Pass, Zune supports the idea of streaming playlists where the songs are downloaded on demand instead of played from the local hard drive. In order to enable this, media elements in Zune playlists can have a serviceId attribute, a GUID that uniquely identifies the song on the Zune service. We also need the song’s album and duration – the Zune software summarily removes songs that don’t include the duration.

Of course, the Rock Band song list doesn’t include the Zune song service ID. It also doesn’t include the song’s album or duration. So we need a way, given the song’s title and artist (which we do have) to get its album, duration and service ID. Luckily, the Zune service provides a way to do exactly this, albeit an undocumented way. Via Fiddler2, I learned that Zune exposes a set of Atom feed web services on catalog.zune.net that the UI uses when you search the marketplace from the Zune software. There are feeds to search by artist and by album but the one we care about is the search by track. For example, here’s the track query for Pinball Wizard by The Who.

Since these feeds are real XML, I can simply use XDocument.Load to suck down the XML. Then I look for the first Atom entry element using LINQ to XML techniques similar to the ones I wrote about last time. If there are no Atom entry elements, that means that the search failed – either Zune doesn’t know about the song or it can’t find it via the Rock Band provided title and artist. Of the 461 songs on Rock Band right now, my script can find 417 of them on Zune automatically.

Of course, since the Zune data is in XML instead of HTML, finding the data I’m looking for is much easier than it was to find the Rock Band song data. Here’s the code to pull the relevant information out of the Zune catalog feed.

def ScrapeEntry(entry):
  id = entry.Element(atomns+'id').Value
  length = entry.Element(zunens+'length').Value

  d = {}
  d['trackTitle'] = entry.Element(atomns+'title').Value
  d['albumArtist'] = (entry.Element(zunens+'primaryArtist')
                        .Element(zunens+'name').Value)
  d['trackArtist'] = d['albumArtist']
  d['albumTitle'] = (entry.Element(zunens+'album')
                       .Element(zunens+'title').Value)

  # The feed id looks like urn:uuid:<guid>; the playlist wants {<guid>}
  if id.StartsWith('urn:uuid:'):
    d['serviceId'] = "{" + id.Substring(9) + "}"
  else:
    d['serviceId'] = id

  # Convert a PT3M23S style length to milliseconds
  m = length_re.Match(length)
  if m.Success:
    min = int(m.Groups[1].Value)
    sec = int(m.Groups[2].Value)
    d['duration'] = str((min * 60 + sec) * 1000)
  else:
    # Couldn't parse the length, so fall back to one minute
    d['duration'] = '60000'

  return d

trackurl = catalogurl + song.search_string
trackfeed = XDocument.Load(trackurl)
trackentry = First(trackfeed.Descendants(atomns+'entry'))
track = ScrapeEntry(trackentry)

A few quick notes:

  • The albumArtist and albumTitle assignments are each split across two lines (inside parentheses, so they are still valid Python) to get the code to read well on the blog without wrapping badly.
  • song.search_string returns the song title and artist as a plus delimited string, e.g. pinball+wizard+the+who. However, many Rock Band songs end in a parenthetical like (Cover Version), so I automatically strip that off for the search string.
  • duration in the Atom feed is stored like PT3M23S, which means the song is 3:23 long. The playlist file expects the song length in milliseconds, so I use a .NET regular expression to pull out the minutes and seconds and do the conversion. It’s not exact – song lengths usually aren’t a whole number of seconds – but as far as I can understand, Zune just uses that to display in the UI and it doesn’t affect playback at all.
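The last two notes can be sketched in plain Python. This is only an approximation under stated assumptions: the post doesn't show length_re or the body of search_string, so the regular expressions, the lowercasing, and the function names duration_ms and search_string below are guesses at one reasonable way to do it, using Python's re module rather than the .NET Regex class.

```python
import re

# Assumed pattern; the script's actual length_re is not shown in the post.
LENGTH_RE = re.compile(r'^PT(\d+)M(\d+)S$')

def duration_ms(length, default=60000):
    """Convert a 'PT3M23S' style duration to milliseconds."""
    m = LENGTH_RE.match(length)
    if m is None:
        # Same fallback as ScrapeEntry: pretend the song is one minute long.
        return default
    minutes, seconds = int(m.group(1)), int(m.group(2))
    return (minutes * 60 + seconds) * 1000

def search_string(title, artist):
    """Build a plus delimited query such as 'pinball+wizard+the+who'."""
    # Drop a trailing parenthetical like '(Cover Version)' from the title.
    title = re.sub(r'\s*\([^)]*\)\s*$', '', title)
    words = (title + ' ' + artist).lower().split()
    return '+'.join(words)
```

With these definitions, duration_ms('PT3M23S') gives 203000, and both 'Pinball Wizard' and 'Pinball Wizard (Cover Version)' by The Who produce the same query string. Unlike ScrapeEntry, this sketch returns the duration as an integer rather than a string.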

Now that I have a list of songs with all the relevant metadata, next time I’ll write it out into a Zune playlist file.


IronPython and Linq to XML Part 2: Screen Scraping

First, I need to convert the HTML list of Rock Band songs into a machine readable format. That means doing a little screen scraping. Originally, I used Beautiful Soup but I found that UnicodeDammit got confused on names like Blue Öyster Cult and Mötley Crüe. I’m guessing it’s broken because IronPython doesn’t have non-unicode strings.

Instead, I used SgmlReader to provide an XmlReader interface over the HTML, then queried that data via Linq to XML. I used the version of SgmlReader from MindTouch since they include a compiled binary and it seems to be the only actively maintained version. I wrapped it all up in a function called load that loads HTML from either disk or the network (based on the URI scheme) into an XDocument.

def loadStream(streamreader):
  from System.Xml.Linq import XDocument     
  from Sgml import SgmlReader     
   
  reader = SgmlReader()
  reader.DocType = "HTML"
  reader.InputStream = streamreader     
  return XDocument.Load(reader)
   
def load(url):
  from System import Uri     
  from System.IO import StreamReader     
   
  if isinstance(url, str):
    url = Uri(url)
   
  if url.Scheme == "file":
    from System.IO import File     
    with File.OpenRead(url.LocalPath) as fs:
      with StreamReader(fs) as sr:
        return loadStream(sr)
  else:
    from System.Net import WebClient     
    wc = WebClient()
    with wc.OpenRead(url) as ns:
      with StreamReader(ns) as sr:
        return loadStream(sr)

def parse(text):
  from System.IO import StringReader     
  return loadStream(StringReader(text))

I call load, passing in the URL to the list of songs. The “official” Rock Band song page loads the actual content from a different page via AJAX, so I just load the actual list directly via my load function.

Once the HTML is loaded as an XDocument, I need a way to find the specific HTML nodes I’m looking for. As I said earlier, XDocument uses Linq to XML – there is no other API for querying the XML tree. In the HTML, there’s a div tag with the id “content” that contains all the song rows as table row elements. I built a simple function that uses the LINQ Single method to find the tag by its id attribute value.

def FindById(node, id):
  def CheckId(n):
    a = n.Attribute('id')
    return a != None and a.Value == id     
   
  return linq.Single(node.Descendants(), CheckId)

(Side note – I didn’t like the verbosity of the “a != None and a.Value == id” line of code, but XAttributes are not comparable by value. That is, I can’t write “node.Attribute(‘id’) == XAttribute(‘id’, id)”. And writing “node.Attribute(‘id’).Value == id” only works if every node has an id attribute. Not making XAttribute comparable by value seems like a strange design choice to me.)

LINQ to objects works just fine from IronPython, with a few caveats. First, IronPython doesn’t have extension methods, so you can’t chain calls together sequentially like you can in C#. So instead of collection.Where(…).Select(…), you have to write Select(Where(collection, …), …). Second, all the LINQ methods are generic, so you have to use the verbose list syntax (for example: Single[object] or Select[object,object]). Since Python doesn’t care about the generic types, I wrote a bunch of simple helper functions around the common LINQ methods that just use object as the generic type. Here are a few examples:

def Single(col, fun):
  return Enumerable.Single[object](col, Func[object, bool](fun))
   
def Where(col, fun):
  return Enumerable.Where[object](col, Func[object, bool](fun))
   
def Select(col, fun):
  return Enumerable.Select[object, object](col, Func[object, object](fun))
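If you want to see the nested call shape without a .NET runtime handy, here is a rough pure-Python sketch. These are stand-ins written with generator expressions, not the Enumerable-backed helpers above, and the sample rows are invented for the example. The stand-in Single raises ValueError where the real Enumerable.Single would throw a .NET exception.

```python
def Where(col, fun):
    # Lazily keep only the items the predicate accepts.
    return (item for item in col if fun(item))

def Select(col, fun):
    # Lazily project each item through fun.
    return (fun(item) for item in col)

def Single(col, fun):
    # Return the one matching item; complain if there are zero or several.
    matches = [item for item in col if fun(item)]
    if len(matches) != 1:
        raise ValueError('expected exactly one match, found %d' % len(matches))
    return matches[0]

# Made-up rows, purely for illustration.
rows = [{'title': 'a', 'year': 1969},
        {'title': 'b', 'year': 1975},
        {'title': 'c', 'year': 1982}]

# Without extension methods the calls nest inside out instead of chaining.
titles = list(Select(Where(rows, lambda r: r['year'] > 1970),
                     lambda r: r['title']))
only = Single(rows, lambda r: r['title'] == 'a')
```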

Once I have the content node, all the songs are in tr nodes beneath it. I wrote a function called ScrapeSong that transforms a song tr node into a Song object (which I’ll talk about in the next installment of this series). I use LINQ methods Select, OrderBy and ThenBy to provide me an enumeration of Song objects, ordered by date added (descending) then artist name.

def ScrapeSong(node):     
  tds = list(node.Elements(xhtml.ns+'td'))    
  anchor = list(tds[0].Elements(xhtml.ns+'a'))[0]    
      
  title = anchor.Value    
  url = anchor.Attribute('href').Value    
  artist = tds[1].Value    
  year = tds[2].Value    
  genre = tds[3].Value    
  difficulty = tds[4].Value    
  _type = tds[5].Value    
  added = DateTime.Parse(tds[6].Value)    
      
  return Song(title, artist, added, url, year, genre, difficulty, _type)    

songs = ThenBy(OrderByDesc(   
          Select(content.Elements(xhtml.ns +'tr'), ScrapeSong),    
          lambda s: s.added), lambda s: s.artist)
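For comparison, the same "newest first, then by artist" ordering can be sketched in plain Python without LINQ. This is an alternative technique, not what the script does: it leans on the fact that Python's sort is stable, so sorting by the secondary key first and the primary key second gives the combined order. The minimal Song tuple and the sample data below are hypothetical.

```python
from collections import namedtuple
from datetime import date

# Hypothetical cut-down Song with only the fields the ordering needs.
Song = namedtuple('Song', ['title', 'artist', 'added'])

# Made-up sample data.
songs = [
    Song('t1', 'Zeta', date(2008, 11, 1)),
    Song('t2', 'Alpha', date(2008, 12, 1)),
    Song('t3', 'Alpha', date(2008, 11, 1)),
    Song('t4', 'Mid', date(2008, 12, 1)),
]

# Stable sort: secondary key (artist) first, then primary key (date added).
by_artist = sorted(songs, key=lambda s: s.artist)
ordered = sorted(by_artist, key=lambda s: s.added, reverse=True)

ordered_titles = [s.title for s in ordered]
```

Songs added on the same date keep their artist ordering from the first pass, which is the role ThenBy plays in the LINQ version.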

And that’s pretty much it. Next, I’ll iterate thru the list of songs and get the details I need from Zune’s catalog web services in order to write out a playlist that the Zune software will understand.


IronPython and Linq to XML Part 1: Introduction

Shortly after I joined the VS Languages team, we had a morale event that included a Rock Band tournament. I didn’t play that day in the tournament since I had never played before, but I was hooked just the same. I got Rock Band for my birthday, Rock Band 2 shortly after it came out in September and I’m hoping to get the AC/DC Track Pack for Christmas.

There are lots of songs available for Rock Band – 461 currently available between on-disc and downloadable tracks – with more added every week. Frankly, there’s lots of music on that list that I don’t recognize. Luckily, I’m also a Zune Pass subscriber, so I can go out and download all the Rock Band tracks and listen to them on my Zune. But who has time to manually search for 461 songs? Not me. So I wrote a little Python app to download the list of Rock Band songs and save it as a Zune playlist.

I ended up using Linq to XML very heavily in this project. Zune playlists use the same XML format as Windows playlists, Zune exposes the backend music catalog via Atom feeds, and I used Chris Lovett’s SgmlReader to expose the HTML list of Rock Band songs as XML. I realize Linq to XML wasn’t on “the list”, but I had a specific need so it got bumped to the head of the line.

BTW, for those who just want the playlist, I stuck it on my SkyDrive. Unfortunately, there’s no SkyDrive API right now, so I can’t automate uploading the new playlist every week. If anyone has alternative suggestions or a way to programmatically upload files to SkyDrive, let me know.