Passion * Technology * Ruthless Competence

Monday, December 01, 2008

IronPython and Linq to XML Part 4: Generating XML

Now that I have my list of Rock Band songs and I can get the right Zune metadata for most of them, I just need to write out the playlist XML. This is very straightforward to do with the classes in System.Xml.Linq.

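import clr
clr.AddReference('System.Xml')
clr.AddReference('System.Xml.Linq')
from System.Text import Encoding
from System.Xml import XmlWriter, XmlWriterSettings
from System.Xml.Linq import XDocument, XElement, XAttribute

# zune_catalog_url, atomns, First, ScrapeEntry and songs come from
# the earlier parts of this series.

def GenMediaElement(song):
  try:
    trackurl = zune_catalog_url + song.search_string
    trackfeed = XDocument.Load(trackurl)
    trackentry = First(trackfeed.Descendants(atomns+'entry'))
    trk = ScrapeEntry(trackentry)
    # one XML attribute per key in the metadata dictionary
    return XElement('media', (XAttribute(key, trk[key]) for key in trk))
  except:
    # lookup failed: fall through and return None, which XElement skips
    print "FAILED", song
    
zpl = XElement("smil",
  XElement("head",
    XElement("title", "Rock Band Generated Playlist")),
  XElement("body",
    XElement("seq", (GenMediaElement(song) for song in songs))))

settings = XmlWriterSettings()
settings.Indent = True
settings.Encoding = Encoding.UTF8
with XmlWriter.Create("rockband.zpl", settings) as xtw:
  zpl.WriteTo(xtw)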

XElement’s constructor takes a name (XName to be precise) and any number of child objects. These child objects can be XML nodes (aka XObjects) or simple content objects like strings or numbers. If you pass an IEnumerable, the XElement constructor will iterate the collection and add all the items as children of the element. If you’ve had the displeasure of building an XML tree using the DOM, you’ll really appreciate XElement’s fluent interface. I was worried that Python’s significant whitespace would force me to put all the nested XElements on a single line, but luckily Python doesn’t treat whitespace inside parentheses as significant.

Creating collections in Python is even easier than it is in C#. Python supports a yield keyword, which is basically the equivalent of C#’s yield return. However, Python also supports generator expressions (the lazy cousins of list comprehensions), which are similar to F#’s sequence expressions. These are nice because you can specify a collection in a single line, rather than having to create a separate function, which is what you have to do to use yield. I have two generator expressions: (XAttribute(key, trk[key]) for key in trk) creates a collection of XAttributes, one for every item in the trk dictionary, and (GenMediaElement(song) for song in songs) generates a collection of XElements, one for every song in the songs collection.
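To make the yield comparison concrete, here is a small sketch in plain Python with no .NET types involved. The attribute_pairs function and the sample trk dictionary are made up for illustration; plain tuples stand in for XAttribute objects.

```python
# A generator function: needs its own def just to use yield.
def attribute_pairs(trk):
    for key in trk:
        yield (key, trk[key])

# Sample metadata dictionary (illustrative values only).
trk = {"trackTitle": "Pinball Wizard", "trackArtist": "The Who"}

# The same sequence written inline as a generator expression.
from_function = list(attribute_pairs(trk))
from_expression = list((key, trk[key]) for key in trk)
```

Both produce the same items lazily; the expression form just saves the separate function.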

Once I’ve finished building the playlist XML, I need to write it out to a file. Originally, I used Python’s built-in open function, but the playlist file had to be UTF-8 because of band names like Mötley Crüe. Zune’s software appears to always use UTF-8. In addition to setting the encoding, I also turn on indentation, so the resulting file is somewhat readable by humans.
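For readers without IronPython handy, roughly the same idea can be sketched with the standard library's xml.etree.ElementTree standing in for System.Xml.Linq and XmlWriter. That is a swapped-in technique, not what the script above does. The function names, the sample track and the output file name are all assumptions, and indentation is left out.

```python
import xml.etree.ElementTree as ET

def build_playlist(tracks):
    # Same smil/head/body/seq shape as the playlist above.
    smil = ET.Element("smil")
    head = ET.SubElement(smil, "head")
    ET.SubElement(head, "title").text = "Rock Band Generated Playlist"
    seq = ET.SubElement(ET.SubElement(smil, "body"), "seq")
    for trk in tracks:
        # each metadata dictionary becomes the attributes of a media element
        ET.SubElement(seq, "media", trk)
    return smil

def playlist_bytes(tracks):
    # Asking for utf-8 keeps names like Mötley Crüe as real UTF-8 bytes.
    return ET.tostring(build_playlist(tracks), encoding="utf-8")

def save_playlist(path, tracks):
    # Binary mode: the bytes are already encoded.
    with open(path, "wb") as f:
        f.write(playlist_bytes(tracks))

sample = [{"trackTitle": "Dr. Feelgood", "trackArtist": "Mötley Crüe"}]
data = playlist_bytes(sample)
```

Calling save_playlist("rockband_sketch.zpl", sample) would write those bytes to disk.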

The playlist works great in the Zune software, but since it’s a streaming playlist there’s no easy way to automatically download all the songs and sync them to your Zune device. I expected to be able to right click on the playlist and select “download all”, but there’s no such option. Zune does have a concept called Channels where the songs from a regularly updated feed are downloaded locally and synced to the device. However, the Zune software appears to be hardcoded to only download channels from the catalog service so I couldn’t tap into that. If anyone knows how to sign up to become a Zune partner channel, please drop me a line.

So there you have it. As usual, I’ve stuck the code up on my SkyDrive. If I can remember, I’ll try and run the script once a week and upload the new playlist to my SkyDrive as well.

Posted By Harry Pierson at 11:13 PM Pacific Standard Time
IronPython | LINQ | Rock Band | XML | Zune

Thursday, November 27, 2008

IronPython and Linq to XML Part 3: Consuming Atom Feeds

Now that I have my list of Rock Band songs, I need to generate a Zune playlist. I wrote that Zune just uses the WMP playlist format, but that’s not completely true. Media elements in a Zune playlist have several attributes that appear unique to Zune.

Because of Zune Pass, Zune supports the idea of streaming playlists where the songs are downloaded on demand instead of played from the local hard drive. In order to enable this, media elements in Zune playlists can have a serviceId attribute, a GUID that uniquely identifies the song on the Zune service. We also need the song’s album and duration – the Zune software summarily removes songs that don’t include the duration.

Of course, the Rock Band song list doesn’t include the Zune song service ID. It also doesn’t include the song’s album or duration. So we need a way, given the song’s title and artist (which we do have) to get its album, duration and service ID. Luckily, the Zune service provides a way to do exactly this, albeit an undocumented way. Via Fiddler2, I learned that Zune exposes a set of Atom feed web services on catalog.zune.net that the UI uses when you search the marketplace from the Zune software. There are feeds to search by artist and by album but the one we care about is the search by track. For example, here’s the track query for Pinball Wizard by The Who.

Since these feeds are real XML, I can simply use XDocument.Load to suck down the XML. Then I look for the first Atom entry element using LINQ to XML techniques similar to the ones I wrote about last time. If there are no Atom entry elements, that means that the search failed – either Zune doesn’t know about the song or it can’t find it via the Rock Band provided title and artist. Of the 461 songs on Rock Band right now, my script can find 417 of them on Zune automatically.

Of course, since the Zune data is in XML instead of HTML, finding the data I’m looking for is much easier than it was to find the Rock Band song data. Here’s the code to pull the relevant information out of the Zune catalog feed.

def ScrapeEntry(entry):  
  id = entry.Element(atomns+'id').Value 
  length = entry.Element(zunens+'length').Value 

  d = {} 
  d['trackTitle'] = entry.Element(atomns+'title').Value 
  d['albumArtist'] = entry.Element(zunens+'primaryArtist')
                       .Element(zunens+'name').Value 
  d['trackArtist'] = d['albumArtist'] 
  d['albumTitle'] = entry.Element(zunens+'album')
                       .Element(zunens+'title').Value 
   
  if id.StartsWith('urn:uuid:'): 
    d['serviceId'] = "{" + id.Substring(9) + "}" 
  else: 
    d['serviceId'] = id 
   
  m = length_re.Match(length) 
  if m.Success: 
    min = int(m.Groups[1].Value) 
    sec = int(m.Groups[2].Value) 
    d['duration'] = str((min * 60 + sec) * 1000) 
  else: 
    d['duration'] = '60000'
     
  return d

trackurl = catalogurl + song.search_string
trackfeed = XDocument.Load(trackurl) 
trackentry = First(trackfeed.Descendants(atomns+'entry')) 
track = ScrapeEntry(trackentry) 

A few quick notes:

  • The code above isn’t valid Python as shown: I added a couple of carriage returns (in the albumArtist and albumTitle lines) to get it to read well on the blog without wrapping badly.
  • song.search_string returns the song title and artist as a plus-delimited string, e.g. pinball+wizard+the+who. However, many Rock Band songs end in a parenthetical like (Cover Version), so I automatically strip that off for the search string.
  • The length in the Atom feed is stored like PT3M23S, which means the song is 3:23 long. The playlist file expects the song length in milliseconds, so I use a .NET regular expression to pull out the minutes and seconds and do the conversion. It’s not exact – song lengths usually aren’t a whole number of seconds – but as far as I can tell, Zune just uses that to display in the UI – it doesn’t affect playback at all.
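The last two notes can be sketched in plain Python using the standard re module instead of a .NET regular expression. The pattern, the function names and the 60000 ms fallback parameter are assumptions for illustration; the post does not show length_re or the body of search_string.

```python
import re

# Hypothetical pattern for lengths like PT3M23S (minutes and seconds only).
LENGTH_RE = re.compile(r"^PT(\d+)M(\d+)S$")

def duration_ms(length, default_ms=60000):
    """Convert a PT<m>M<s>S length to milliseconds, or fall back to a default."""
    m = LENGTH_RE.match(length)
    if m is None:
        return default_ms
    minutes, seconds = int(m.group(1)), int(m.group(2))
    return (minutes * 60 + seconds) * 1000

def search_string(title, artist):
    """Build a plus-delimited query, dropping a trailing parenthetical."""
    title = re.sub(r"\s*\([^)]*\)\s*$", "", title)
    return "+".join((title + " " + artist).lower().split())
```

For PT3M23S this gives (3 * 60 + 23) * 1000 = 203000 milliseconds.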

Now that I have a list of songs with all the relevant metadata, next time I’ll write it out into a Zune playlist file.

Posted By Harry Pierson at 10:55 AM Pacific Standard Time
IronPython | LINQ | Rock Band | XML | Zune

Wednesday, November 26, 2008

IronPython and Linq to XML Part 2: Screen Scraping

First, I need to convert the HTML list of Rock Band songs into a machine-readable format. That means doing a little screen scraping. Originally, I used Beautiful Soup but I found that UnicodeDammit got confused on names like Blue Öyster Cult and Mötley Crüe. I’m guessing it’s broken because IronPython doesn’t have non-unicode strings.

Instead, I used SgmlReader to provide an XmlReader interface over the HTML, then queried that data via Linq to XML. I used the version of SgmlReader from MindTouch since they include a compiled binary and it seems to be the only actively maintained version. I wrapped it all up in a function called load that loads HTML from either disk or the network (based on the URI scheme) into an XDocument.

def loadStream(streamreader):
  from System.Xml.Linq import XDocument
  from Sgml import SgmlReader
  
  reader = SgmlReader()
  reader.DocType = "HTML"
  reader.InputStream = streamreader
  return XDocument.Load(reader)
  
def load(url):
  from System import Uri
  from System.IO import StreamReader
  
  if isinstance(url, str):
    url = Uri(url)
  
  if url.Scheme == "file":
    from System.IO import File
    with File.OpenRead(url.LocalPath) as fs:
      with StreamReader(fs) as sr:
        return loadStream(sr)
  else:
    from System.Net import WebClient
    wc = WebClient()
    with wc.OpenRead(url) as ns:
      with StreamReader(ns) as sr:
        return loadStream(sr)

def parse(text):
  from System.IO import StringReader
  return loadStream(StringReader(text))

I call load, passing in the URL to the list of songs. The “official” Rock Band song page loads the actual content from a different page via AJAX, so I just load the actual list directly via my load function.

Once the HTML is loaded as an XDocument, I need a way to find the specific HTML nodes I’m looking for. As I said earlier, XDocument uses Linq to XML – there is no other API for querying the XML tree. In the HTML, there’s a div tag with the id “content” that contains all the song rows as table row elements. I built a simple function that uses the LINQ Single method to find the tag by its id attribute value.

def FindById(node, id):
  def CheckId(n):
    a = n.Attribute('id')
    return a != None and a.Value == id
  
  return linq.Single(node.Descendants(), CheckId)

(Side note – I didn’t like the verbosity of the “a != None and a.Value == id” line of code, but XAttributes are not comparable by value. That is, I can’t write “node.Attribute(‘id’) == XAttribute(‘id’, id)”. And writing “node.Attribute(‘id’).Value == id” only works if every node has an id attribute. Not making XAttribute comparable by value seems like a strange design choice to me.)
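For comparison, here is a rough standard-library analog of FindById, using xml.etree.ElementTree rather than LINQ to XML (a swapped-in technique, not the code from this post). The find_by_id name and the ValueError are assumptions; the error mimics how Single fails unless exactly one element matches.

```python
import xml.etree.ElementTree as ET

def find_by_id(node, id_value):
    """Return the single descendant of node whose id attribute equals id_value."""
    # iter() also yields node itself, so skip it to mirror Descendants().
    # get() returns None when the attribute is missing, so no separate check.
    matches = [n for n in node.iter()
               if n is not node and n.get("id") == id_value]
    if len(matches) != 1:
        raise ValueError("expected exactly one element with id %r, found %d"
                         % (id_value, len(matches)))
    return matches[0]
```

Because get returns None for a missing attribute, the missing-attribute case needs no extra check here.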

LINQ to objects works just fine from IronPython, with a few caveats. First, IronPython doesn’t have extension methods, so you can’t chain calls together sequentially like you can in C#. So instead of collection.Where(…).Select(…), you have to write Select(Where(collection, …), …). Second, all the LINQ methods are generic, so you have to use the verbose indexer syntax to supply the type arguments (for example: Single[object] or Select[object,object]). Since Python doesn’t care about the generic types, I wrote a bunch of simple helper functions around the common LINQ methods that just use object as the generic type. Here are a few examples:

def Single(col, fun):
  return Enumerable.Single[object](col, Func[object, bool](fun))
  
def Where(col, fun):
  return Enumerable.Where[object](col, Func[object, bool](fun))
  
def Select(col, fun):
  return Enumerable.Select[object, object](col, Func[object, object](fun))
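To show what the nested call style looks like in use, here is a pure-Python stand-in for these helpers, built on generators instead of System.Linq.Enumerable. These are analogs written for illustration, not the .NET methods, and the sample titles are made up.

```python
def Where(col, fun):
    # lazily keep the items for which fun returns a true value
    return (item for item in col if fun(item))

def Select(col, fun):
    # lazily transform every item
    return (fun(item) for item in col)

def Single(col, fun):
    # like LINQ's Single: exactly one match, otherwise an error
    matches = [item for item in col if fun(item)]
    if len(matches) != 1:
        raise ValueError("expected exactly one match, found %d" % len(matches))
    return matches[0]

titles = ["Pinball Wizard", "Dr. Feelgood", "Godzilla"]

# Where runs first even though Select is written first.
shouted = list(Select(Where(titles, lambda t: "z" in t.lower()),
                      lambda t: t.upper()))
```

The innermost call runs first, so the calls read inside out rather than left to right as they do in C#.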

Once I have the content node, all the songs are in tr nodes beneath it. I wrote a function called ScrapeSong that transforms a song tr node into a Song object (which I’ll talk about in the next installment of this series). I use LINQ methods Select, OrderBy and ThenBy to provide me an enumeration of Song objects, ordered by date added (descending), then by artist name.

def ScrapeSong(node):    
  tds = list(node.Elements(xhtml.ns+'td'))   
  anchor = list(tds[0].Elements(xhtml.ns+'a'))[0]   
     
  title = anchor.Value   
  url = anchor.Attribute('href').Value   
  artist = tds[1].Value   
  year = tds[2].Value   
  genre = tds[3].Value   
  difficulty = tds[4].Value   
  _type = tds[5].Value   
  added = DateTime.Parse(tds[6].Value)   
     
  return Song(title, artist, added, url, year, genre, difficulty, _type)   

songs = ThenBy(OrderByDesc(  
          Select(content.Elements(xhtml.ns +'tr'), ScrapeSong),   
          lambda s: s.added), lambda s: s.artist)
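The same ordering can be sketched in plain Python without the LINQ helpers by leaning on the fact that sorted is stable. The simplified three-field Song tuple and the order_songs name are assumptions made for this example.

```python
from collections import namedtuple
from datetime import date

# Hypothetical, cut-down stand-in for the Song class.
Song = namedtuple("Song", "title artist added")

def order_songs(songs):
    """Newest first, ties broken by artist name."""
    # Sort by the secondary key first...
    by_artist = sorted(songs, key=lambda s: s.artist)
    # ...then by the primary key; stability keeps artists in order within a date.
    return sorted(by_artist, key=lambda s: s.added, reverse=True)

sample = [
    Song("A", "Zed", date(2008, 11, 25)),
    Song("B", "Abba", date(2008, 11, 25)),
    Song("C", "Mid", date(2008, 11, 18)),
]
ordered_titles = [s.title for s in order_songs(sample)]
```

Sorting twice is the usual Python counterpart to an OrderByDescending followed by a ThenBy.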

And that’s pretty much it. Next, I’ll iterate thru the list of songs and get the details I need from Zune’s catalog web services in order to write out a playlist that the Zune software will understand.

Posted By Harry Pierson at 5:16 PM Pacific Standard Time
IronPython | LINQ | Rock Band | XML | Zune

IronPython and Linq to XML Part 1: Introduction

Shortly after I joined the VS Languages team, we had a morale event that included a Rock Band tournament. I didn’t play that day in the tournament since I had never played before, but I was hooked just the same. I got Rock Band for my birthday, Rock Band 2 shortly after it came out in September and I’m hoping to get the AC/DC Track Pack for Christmas.

There are lots of songs available for Rock Band - 461 currently available between on-disc and downloadable tracks – with more added every week. Frankly, there’s lots of music on that list that I don’t recognize. Luckily, I’m also a Zune Pass subscriber, so I can go out and download all the Rock Band tracks and listen to them on my Zune. But who has time to manually search for 461 songs? Not me. So I wrote a little Python app to download the list of Rock Band songs and save it as a Zune playlist.

I ended up using Linq to XML very heavily in this project. Zune playlists use the same XML format as Windows playlists, Zune exposes the backend music catalog via Atom feeds and I used Chris Lovett’s SgmlReader to expose the HTML list of Rock Band songs as XML. I realize Linq to XML wasn’t on “the list”, but I had a specific need so it got bumped to the head of the line.

BTW, for those who just want the playlist, I stuck it on my SkyDrive. Unfortunately, there’s no SkyDrive API right now, so I can’t automate uploading the new playlist every week. If anyone has alternative suggestions or a way to programmatically upload files to SkyDrive, let me know.

Posted By Harry Pierson at 5:07 PM Pacific Standard Time
IronPython | LINQ | Rock Band | XML | Zune

Tuesday, May 06, 2008

Deserializing XML with IronPython

Now that I can stream process XML, the next logical step is to deserialize it into some type of object graph. As I said in my last post, there are at least three different DOM-esque options on the .NET platform as well as two in the Python library (xml.dom and xml.dom.minidom).

However, anyone who's ever programmed against the DOM knows just what a major PITA it is.

Instead, you could deserialize the XML into a custom object tree, based on the nodes in the XML stream. In .NET, there are at least two libraries for doing this: the old-school XmlSerializer as well as the new-fangled DataContractSerializer. In these libraries, the PITA comes in defining the static types with all the various custom attribute adornments you need to tell the deserializer how to do its job. Actually, if you're defining your code first, all those adornments aren't that big a deal. However, if you're starting from the XML, especially XML with lots of different namespaces - like say my RSS feed - defining a static type for this gets old fast.

Of course, if you're not using a statically typed language... ;)

One of the cool aspects of dynamic languages is the ability to easily generate new types on the fly. In Python, you can create a new type by calling the type function. Here's an example of creating a new type for an XML node:

def create_type(node, parent): 
  return type(node.name, (parent,), {'xmlns':node.namespace})

Since I'm working with XML, I wanted to make sure I handled namespaces. Thus, I add the namespace to the class definition (the third parameter in the type function above). This lets me walk up to any arbitrary object created from an XML element and check its namespace.
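Here is a self-contained sketch of that three-argument type call in action. The Node tuple is a hypothetical stand-in for whatever node object the parser hands over; only the name and namespace fields used by create_type are modeled.

```python
from collections import namedtuple

# Hypothetical node shape: just the two fields create_type reads.
Node = namedtuple("Node", "name namespace")

def create_type(node, parent):
    # type(name, bases, dict) builds a brand-new class at runtime
    return type(node.name, (parent,), {"xmlns": node.namespace})

# A string-derived type for a simple text element such as <title>.
Title = create_type(Node("title", ""), str)
t = Title("Hello")

# An object-derived type for an element in some namespace.
Entry = create_type(Node("entry", "http://www.w3.org/2005/Atom"), object)
```

The resulting t behaves like an ordinary string but also carries an xmlns attribute on its class.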

I used this dynamic type creation functionality in my xml2py module, which I added to my IronPython SkyDrive folder. It leverages ipypulldom, so make sure you get both. The heart of the module is the xml2py function, which recursively iterates thru the node stream and builds the tree. Attributes and child elements become named attributes on the object, so I can write code that looks like this:

import xml2py 
rss = xml2py.parse('http://feeds.feedburner.com/Devhawk') 
for item in rss.channel.item: 
  print item.title

You see? No screwing around with childNodes or getAttribute here.

The basic processing loop of xml2py creates a new instance of a new type when it encounters a start element tag. It then collects all the attributes and children of that element, and adds them as attributes on the element object, using the name of the type as the name of the attribute. If there are multiple children with the same type name, xml2py converts that attribute to a list of values. For example, in an RSS feed, there will likely be many rss.channel.item elements. In xml2py, the item attribute of the channel object will be a list of item objects.
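The promote-to-list behavior described above could look something like the following. This is a guess at the shape of the logic, not the actual xml2py source; the Element class and add_child function are names invented for the sketch.

```python
class Element(object):
    """Hypothetical empty container for slotted attributes and children."""

def add_child(parent, name, value):
    if not hasattr(parent, name):
        # first child with this name: store the value directly
        setattr(parent, name, value)
        return
    existing = getattr(parent, name)
    if isinstance(existing, list):
        # third and later children: extend the existing list
        existing.append(value)
    else:
        # second child: promote the single value to a list
        setattr(parent, name, [existing, value])

channel = Element()
add_child(channel, "title", "Devhawk")
for text in ("first", "second", "third"):
    add_child(channel, "item", text)
```

One consequence of this approach is that a name that occurs only once stays a single value rather than a one-element list.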

Since attributes and child elements are getting slotted together, I added a _nodetype attribute on each so I can later tell (if I care) if the value was originally an attribute or element. I haven't written py2xml yet, but that might be important then.

I do one optimization for simple string elements like <foo>bar</foo>. In this case, I create a type that inherits from string (hence the need for the parent parameter in the create_type function above) and contains the string text. It still has the xmlns and _nodetype attributes, so I could write item.title.xmlns (which is empty since RSS is in the default namespace) or item.title._nodetype (which would be XmlNodeType.Element).

It's not much code - about 100 lines of code split evenly between the xml2py function and the _type_factory object. Given that you usually see the same element in an XML stream over and over, I didn't want to create multiple types for the same element. So _type_factory caches types in a dictionary so I can reuse them. One of the cool things is that it's a callable type (i.e. it implements __call__) so I can use the instance like a function. I started by defining an xtype function that didn't cache anything, but then later switched xtype to be a _type_factory instance, and none of my code that called xtype had to change!
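A callable, caching factory along those lines might be sketched like this. It is an illustration of the pattern only: the class name, the cache key and the call signature are assumptions, not the real _type_factory.

```python
class TypeFactory(object):
    """Creates one type per (name, namespace, parent) and reuses it afterwards."""

    def __init__(self):
        self._cache = {}

    def __call__(self, name, namespace, parent=object):
        key = (name, namespace, parent)
        if key not in self._cache:
            # build the type only the first time this element shape is seen
            self._cache[key] = type(name, (parent,), {"xmlns": namespace})
        return self._cache[key]

# Callers use the instance exactly as they would a plain function.
xtype = TypeFactory()
item_type = xtype("item", "")
same_type = xtype("item", "")
title_type = xtype("title", "", str)
```

Because the instance is called like a function, swapping a plain function for it needs no changes at the call sites.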

One other quick note. If you put xml2py.py and ipypulldom.py in a folder, you can experiment with them by launching "ipy -i xml2py". This runs xml2py.py as a script, but dumps you into the interactive console when you're thru. It will run the little snippet of code above which runs xml2py on my FeedBurner feed, but then you can play around with the rss object and see what it contains. Be sure to check out the xmlns attribute for each object in the rss.channel.link list.

Posted By Harry Pierson at 5:37 PM Pacific Daylight Time

Stream Processing XML in IronPython

When it comes to processing XML, there are two basic approaches - load it all into memory at once or process it a node at a time. In the .NET world where I have spent most of the past ten years, those two models are represented by XmlDocument and XmlReader. There are alternatives to XmlDocument, such as XDocument and XPathDocument, but you get the idea.

Out in non-MSFT land, the same two basic models exist, however the de facto standard for stream based processing is SAX, the Simple API for XML. SAX is supported by many languages, including Python.

Personally, I've never been a fan of SAX's event-driven approach. Pushing events makes total sense for a human driven UI, but I never understood why anyone thought that was a good idea for stream processing XML. I like XmlReader's pull model much better. When you're ready for the next node, just call Read() - no mucking about setting content handlers or handling node processing events.

Luckily, the Python standard library supports both approaches. It provides both a SAX based parser as well as a pull based parser called pulldom. Pulldom docs are fairly sparse, but Paul Prescod wrote a nice introduction. Here's an example from Paul's site (slightly modified):

from xml.dom import pulldom
nodes = pulldom.parse( "file.xml" ) 
for (event,node) in nodes: 
    if event=="START_ELEMENT" and node.tagName=="table": 
        nodes.expandNode( node )

Actually, I like this better than XmlReader, since it provides the nodes in a list-like construct that appeals to the functional programmer in me. I'd like it even more if Python had a native pattern matching syntax - you know, like F# - but you can get similar results by chaining together conditionals with elif.
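As a runnable variation on the example above, here is a sketch for standard CPython that parses from a string and uses an elif chain in place of pattern matching. The scan function and its return shape are made up for illustration.

```python
from xml.dom import pulldom

def scan(xml_text):
    """Return (names of elements seen, XML of each expanded <table>)."""
    names, tables = [], []
    stream = pulldom.parseString(xml_text)
    for event, node in stream:
        if event == pulldom.START_ELEMENT and node.tagName == "table":
            # pull the whole subtree into the node; its child events are consumed
            stream.expandNode(node)
            tables.append(node.toxml())
        elif event == pulldom.START_ELEMENT:
            names.append(node.tagName)
        elif event == pulldom.CHARACTERS:
            pass  # text nodes are ignored in this sketch
    return names, tables
```

Note that elements inside an expanded table never show up as separate events, so they are absent from the names list.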

However, IronPython doesn't support any of the XML parsing modules from Python's standard library. They're all based on a C-based python module called pyexpat which IronPython can't load. [1] I wanted a pulldom type model, so I decided to wrap XmlReader to provide a similar API that lets me write code like this:

import ipypulldom 
nodes = ipypulldom.parse( "sample.xml" )  
for node in nodes:  
  if node.nodeType==XmlNodeType.Element:  
    print node.xname

There are a few differences from pulldom, but it's basically the same model. I'm using the native .NET type XmlNodeType rather than a string to indicate the node type. Furthermore, I made the node type a property of the node, rather than a separate variable. I also didn't implement expandNode, though doing so would be a fairly straightforward combination of XmlReader.ReadSubtree and XmlDocument.Load.

I stuck the code for ipypulldom up in a new folder on my SkyDrive: IronPython Stuff. It's fairly short - only about 45 lines of code. Feel free to use it if you need it.


[1] The FePy project has a .NET port of pyexpat as part of their distribution, so I assume that lets you use the standard pulldom implementation in IPy. FePy looks really cool but I haven't had time to dig into it yet.

Posted By Harry Pierson at 11:21 AM Pacific Daylight Time

Wednesday, February 27, 2008

Morning Coffee 150

  • Yesterday was the NHL trading deadline, and the Capitals were very busy. They obtained Huet from Montreal, Fedorov from Columbus and Cooke from Vancouver. Given they are fighting just to make the playoffs, going for three soon-to-be unrestricted free agents seems like an odd choice. However, the consensus (among my parents anyway) was that it's critical to get this very young Caps team some playoff experience. Even if all three walk at season's end, it'll be worth it if the Caps make a playoff run. Besides it's not like we gave up much: an extra second round pick in '09, a 19 year old defensive prospect (who was apparently 14th on the depth chart) and an underachieving winger.
  • Speaking of the Caps playoff chances, they are currently one and a half games back of the division leading Hurricanes and two games behind the current eighth seed Flyers. Yes, I rank hockey teams using baseball's standings system. Otherwise, you have to talk about games in hand (i.e. the Caps are five points behind Carolina with two games in hand).
  • The Writers Guild ratified the new contract, so Hollywood labor strife is now officially behind us. At least until July when the actors may go on strike.
  • It seems like a slow week for Microsoft geek news, which is odd since WS08, VS08 and SQL08 all launch today. I'm guessing it's the calm before the Mix storm next week.
  • After going dark for six months, Linq to XSD has been re-released to work with the RTM version of VS08. Scott Hanselman demonstrates Linq to XSD by applying it to OFX, an XML Schema he calls "goofy" but apparently helped develop. OFX uses derivation by restriction, which has no direct equivalent in C#, but Linq to XSD is able to translate between XML and objects without losing any of that type fidelity. Nice to know Linq to XSD can tolerate OFX's level of goofiness, though I'm guessing most people use much more straightforward schemas.
  • Speaking of Linq, I discovered LINQPad via a comment on Rob Conery's blog (which I found via DNK). It's basically a code snippet IDE for C# 3.0 and VB9, and it also has built-in database connection support, so it can fulfil much the same role as SQL Management Studio. I only played with it for a few minutes, but I was really impressed. This is definitely going in my utilities folder. I wonder if they're interested in supporting F#?
  • Not sure how I missed this, but you can get MSDN Magazine via the same Syndicated Client Experience as Architecture Journal. Unlike AJ which is divided into issues, the MSDN magazine client is divided into topics which is harder to square with the physical magazine. On the other hand, since MSDN Mag has been around longer, perhaps topics + search is a better discovery mechanism.
  • Soma announces the Visual Studio Gallery, a repository of VS Extensions. It's kinda cool, but the whole discovery mechanism is clunky. I might like to experiment with some free or even free trial products, but there's no way to filter on cost so finding them is a hassle. Also, there's no way for community members to vote, rate or comment on the products in any way.
  • Nick Malik can't answer the question "how does Enterprise Architecture demonstrate value?" I could be snarky and say "it doesn't", but that's only half the answer. It doesn't, but it should. My opinion, since you asked Nick, is that EA fails to deliver value because it tries to control the uncontrollable. Trying to gain efficiency thru establishing standards and eliminating overlap via reuse are pipe dreams, though literally millions of $$$ have been poured into those sink-holes. There are a few areas where centrally funded infrastructure projects can solve big problems that individual projects can't effectively tackle on their own. EA should focus their time there, where they can actually make a difference. Otherwise, they should stay out of projects' way.
Posted By Harry Pierson at 10:17 AM Pacific Standard Time

Wednesday, February 13, 2008

Morning Coffee 146

  • The writers strike is officially over. Everyone goes back to work today. Thomas Cleaver has what I thought was the best post summarizing how the writers won. TV Guide has a rundown of how and when various shows will resume. I can't wait to see Daily Show and Colbert Report tonight. Lost - aka the best show on TV - looks like it will be getting five more episodes (in addition to the eight shot before the strike).
  • Speaking of TV, Battlestar Galactica Fans: circle April 4th on your calendar.
  • Obama won all three "Potomac Primaries" yesterday, and is now the Democratic front-runner, though there's a long way to go before the convention. Scott Adams of Dilbert fame has a great take on presidential experience - I'm guessing he's an Obama fan.
  • In minor acquisition news, Microsoft is acquiring Caligari, makers of 3D modeling tool trueSpace. The Caligari folks are joining the Virtual Earth team, though I wonder what the XNA folks think of the acquisition. This isn't the first 3D modeling product Microsoft ever acquired - we owned Softimage for four years in the '90s.
  • Scott Hanselman and Tomas Restrepo both write about PowerShellPlus, which I saw week before last @ Lang.NET. Scott really likes it, for both PS novices and gurus, but Tomas thinks the UI is busy, based on the screenshots. Personally, I'm not doing much PS work lately - occasional one off stuff, but that's it - so it doesn't seem worth the effort.
  • Speaking of Scott & Tomas, Scott also has a nice gallery of VS themes. I'm partial to Tomas' Ragnarok Grey. Is there a VSThemesGallery.com site somewhere?
  • Still speaking of Scott, he points to the new ASP.NET Developer Wiki (beta). I poked around, but didn't find anything shiny. I was very surprised that searching for "MVC" returned no results.
  • Speaking of MVC, Scott Guthrie has a rundown on what's coming in the MIX preview release of ASP.NET MVC. Biggest news IMO is that it's /bin deployable - i.e. you don't need your hoster to do anything special to support MVC (assuming they already support ASP.NET 3.5). Also big news, they're releasing the source so you can build and patch (and enhance?) it yourself.
  • Chris Tavares continues his ObjectBuilder series and Tomas continues his DLR Notes series. BTW, my F# based DLR experimentation continues, albeit slowly (frakking day job). Hope to be able to post on this soon.
  • One of the things driving my interest in F# is manycore. An interesting tangent to manycore is general purpose programming on graphics processing units (aka GPGPU). MS Research just released a new version of Accelerator, just such a GPGPU system. I personally haven't played with it - I've been focused on writing parsers, not parallel code.
  • Is XQuery really "a promising technology of the future" as Don Box suggests? I see exactly zero demand or use for it in my day-to-day work. Of course, Don's paid to build future platform goo, so maybe it is promising and Don's afore-mentioned goo will leverage it, though I remain skeptical. As for XML being "Done like a well-cooked steak", I'd say XML is like a great steak cooked perfectly, except it's done exactly how you don't like it. You can appreciate its quality, but you don't really enjoy it as much as you could have.
Posted By Harry Pierson at 10:04 AM Pacific Standard Time

Monday, October 08, 2007

Morning Coffee 116

"Looks like I picked the wrong week to stop sniffing glue"
Steve McCroskey, Airplane!

  • So it's been a while since my last post. Just over a month, not including The F5 High, which wasn't "original IP". Frankly, I just stopped reading pretty much cold turkey. I wanted and needed to go heads down on day job stuff for a while. Since I haven't been reading, Morning Coffee is going to be a little cold while I ramp back up.
  • The new NHL season is upon us, and the Caps are looking good so far. Obviously, they have the new uniforms, but they're also out to a 2-0 start for the first time in five years. And in those two games, they've only allowed one goal and are 100% on the PK. It's nice to see them start strong, but obviously there's a long way to go. Here's hoping they can stay strong all season.
  • Speaking of staying strong, the wheels that were rattling last week came off the Trojan bandwagon completely this week. I'm not sure it's as big an upset as Appalachian State beating Michigan but it's close. What happened to the team that scored 5 TD's in a row on Nebraska?
  • Big news last week is that MSFT is going to release the source code to much of the .NET Framework. Scott Guthrie has the details. Frankly, between Rotor & Reflector, it wasn't like you couldn't see the source code anyway, so this seems like a no-brainer. But integrating it directly into the VS Debugging experience, that's frakking brilliant.
  • I haven't had a chance to install the new XML Schema Designer (Aug 07 CTP) but I was really impressed with this video. The XML Team blog has more details. However, I'm not sure what the ship vehicle is. The CTP installs on top of VS08 beta 2, but in the video they keep saying "a future version" of VS, implying that it's not going to be in VS08.
  • Dare is spending some time investigating SSB. I think it's interesting that some of the REST crowd are starting to see the need for durable messaging. Dare argues that the features and usage models are more important than wire protocol. As long as it's standardized, I don't care that much about the protocol. Several of the REST folks mentioned AMQP. I've got nothing against AMQP technically (frankly, I haven't read the spec), but what does it say about durable messaging vendors (including MSFT) that a financial institution felt the need to drive an interoperable durable messaging specification?
Posted By Harry Pierson at 9:57 AM Pacific Daylight Time

Friday, August 17, 2007

DataReaders, LINQ to XML and Range Generation

I'm doing a bunch of database / XML stuff @ work, so I decided to switch to VS08 beta 2 so I can use LINQ. For reasons I don't want to get into, I needed a way to convert arbitrary database rows, read using a SqlDataReader, into XML. LINQ to SQL was out, since the code has to work against arbitrary tables (i.e. I have no compile-time schema knowledge). But LINQ to XML helped me out a ton. Check out this example:

const string ns = "{http://some.sample.namespace.schema}";

while (dr.Read())
{
    XElement rowXml = new XElement(ns + tableName,
        from i in GetRange(0, dr.FieldCount)
        select
            new XElement(ns + dr.GetName(i), dr.GetValue(i)));
}

That's pretty cool. The only strange thing in there is the GetRange method. I needed an easy way to build a range of integers from zero up to (but not including) the number of fields in the data reader. I didn't know of any standard way, so I wrote this little two-line function:

IEnumerable<int> GetRange(int min, int max)
{
    for (int i = min; i < max; i++)
        yield return i;
}

It's simple enough, but I found it strange that I couldn't find a standard way to generate a range with a more elegant syntax. Ruby has standard range syntax that looks like (1..10), but I couldn't find the equivalent in C#. Did I miss something, or am I really on my own to write a GetRange function?

Update - As expected, I missed something. John Lewicki pointed me to the static Enumerable.Range method that does exactly what I needed.
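
For what it's worth, here's a rough sketch of how the loop above might look with Enumerable.Range in place of GetRange. One thing to watch: the second argument to Enumerable.Range is a count, not an upper bound, so it only lines up with GetRange(min, max) directly when min is zero. The RowsToXml class, the RowToXml helper and the in-memory DataTable are stand-ins I made up so the sample runs without a database; the code described above reads from a SqlDataReader.

```csharp
using System;
using System.Data;
using System.Linq;
using System.Xml.Linq;

static class RowsToXml
{
    // Same namespace literal as in the post. "{uri}name" is the expanded-name
    // form that converts implicitly from string to XName.
    const string ns = "{http://some.sample.namespace.schema}";

    // Hypothetical helper: builds one element per row, one child per column.
    static XElement RowToXml(IDataRecord dr, string tableName)
    {
        return new XElement(ns + tableName,
            // Range(start, count): yields 0 .. FieldCount - 1
            from i in Enumerable.Range(0, dr.FieldCount)
            select new XElement(ns + dr.GetName(i), dr.GetValue(i)));
    }

    static void Main()
    {
        // Assumption: an in-memory table standing in for a real query result.
        var table = new DataTable("Song");
        table.Columns.Add("Title", typeof(string));
        table.Columns.Add("Year", typeof(int));
        table.Rows.Add("Example", 2007);

        using (IDataReader dr = table.CreateDataReader())
        {
            while (dr.Read())
                Console.WriteLine(RowToXml(dr, table.TableName));
        }
    }
}
```

Taking an IDataRecord rather than a SqlDataReader is just a convenience for the sample, since any data reader implements it.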

Posted By Harry Pierson at 4:55 PM Pacific Daylight Time
ADO.NET | Database | Development | LINQ | Ruby | XML

Wednesday, July 25, 2007

Early Afternoon Coffee 105

  • My two sessions on Rome went very well. Sort of like what I did @ TechEd last month, but with a bit more kimono opening since it was an internal audience. The best thing about doing these types of talks is the questions and post-session conversation. I've missed that since moving over to MSIT.
  • Late last week, I got my phone switched over to the new Office Communications Server 2007 beta. In my old office, I used the Office Communicator PBX phone integration features extensively. However, when we moved we got new IP phones that didn't integrate with Communicator. So when a chance to get on the beta came along, I jumped. I'll let you know my impressions after a few weeks. In the meantime, you can read about Mark Deakin's experience.
  • Matevz Gacnik figures out how to build a transactional web service that interacts with the new transactional file system in Vista and Server 08. Interesting, but personally I don't believe in using transactional web services. The whole point of service orientation is to reduce the coupling between services. Tying two services (technically, a service consumer and provider) together in an atomic transaction seems like going in the wrong direction. Still, good on Matevz for digging into the transactional file system.
  • Udi Dahan gives us 6 simple steps to being a "top" IT consultant. I notice that getting well known, speaking and publishing are at the top of the list but actually being good at what you're well known for comes in at #5 on the list. I'm sure Udi thinks that's implicit in becoming a "top" consultant, but I'm not so sure.
  • Pat Helland thinks Normalization is for Sissies. Slide #6 has the key take away: "For God's Sake, Don't Normalize Immutable Data".
  • Larry O'Brien bashes the new binary efficient XML working group and working draft. I agree 100% w/ Larry. These aren't the droids we're looking for.
  • John Evdemon points to a new e-book from my old team called SOA in the Real World. I flipped thru it (figuratively) and it appears to drill into the Foundations of Solution Architecture as well as provide real-world case studies for each of the recurring logical capabilities. Need to give it a deeper read.
Posted By Harry Pierson at 12:36 PM Pacific Daylight Time
Disclaimer: The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the author's discretion.