DevHawk on CodeCast

Ken Levy used to work around the corner from my office, back in his days on the VSX team. These days, he’s hosting the CodeCast (among other things) and he dropped my my office a while back to chat about IronPython for his podcast.

Check it out.

Writing an IronPython Debugger: Displaying Values

Now that I can get the local variables for a given frame, I need to display them in the console. Eventually, I’d like to provide the ability to update the local variables as well, but you gotta crawl before you can run. Luckily, the debugger API is consistent about using same COM interfaces – wrapped by the managed CorValue class – to represent all data values, including local variables, function arguments and object fields. So the work I do now to display CorValues in the console will be reusable in other contexts down the road.

While the debugger API is consistent about how it represents values in the target process, the API it uses is very complicated. The primary COM interface for accessing values is ICorDebugValue, but it has eight siblings: ICorDebugReferenceValue, ICorDebugHandleValue, ICorDebugStringValue, ICorDebugObjectValue, ICorDebugGenericValue, ICorDebugBoxValue, ICorDebugArrayValue, ICorDebugHeapValue. All those COM interfaces are represented in managed code by CorValue and it’s subclasses.

Furthermore, confusingly ICorDebugValues have both a Type and an ExactType. ExactType is what .NET developers typically think of as the type, aka the CLR type. Well, the debugger API’s representation of the CLR type at any rate. You can retrieve the value’s metadata as a System.Type compatible object via value.ExactType.Class.GetTypeInfo().CorValue’s Type property, on the other hand, represents the object’s primitive or element type. For example, instances of .NET classes have an element Type of ELEMENT_TYPE_CLASS. There are a collection of primitive types (boolean, char, ints of various signage and size, floats of various size) as well as types you wouldn’t call primitive but that the runtime has specific knowledge of (string, array and value types – aka structs in C# terminology).

If you’re confused by all that, don’t worry so am I. Honestly, I’ve re-written this code several times, each time understanding the API just a bit better. Whatever the right way to use the interfaces, I’m sure I don’t know it. For my first cut at this, I essentially ported MDbg’s high level CorValue API – aka MDbgValue::InternalGetValue if you’re looking at the MDbg source code – over to Python. Along the way, I’ve improved on that code as I’ll describe below.

A given CorValue may be a primitive value like an int or it may be a reference to or a boxed version of some other CorValue object. So in order to print the CorValue, you have to go thru a series of attempts to dereference and unbox until you get to the “real” underlying CorValue object. From there, converting the value to a string I can print depends on the value’s element type. For primitive types like ints and floats, you can call CastToGenericValue to get a CorGenericValue “view” of the same CorValue object ¹. A CorGenericValue can read and write the raw bytes from memory in the target process of the value. The GetValue method reads the data from target process then does an unsafe cast to appropriate managed type. For example, an ELEMENT_TYPE_R4 CorValue gets cast into a System.Single. For CorValue strings, I call CastToStringValue and then access the String property. For classes, value types and objects, there’s no simple or standard approach to retrieving the data, so for now I return the result of calling CastToObjectValue. Eventually, I’ll want to provide a mechanism to read the specific fields of a class or value type.

Unfortunately, the mechanism above to read primitive types doesn’t work with IronPython. GetValue needs to know the correct element type in order to do the unsafe cast. For value types (aka any struct other than the basic primitives), GetValue will return a data as a byte array. The problem is that when you box a primitive, the original element types gets overwritten by ELEMENT_TYPE_VALUETYPE. You can’t get the original element type back, even after unboxing. So for boxed primitives, you can only retrieve the data as a raw byte array or as a CorObjectValue, neither of which is very useful.

Luckily, I was able to work around this. Under the hood, GetValue calls UnsafeGetValueAsType to do the actual work of reading the data from the target process and casting it to the right managed type. UnsafeGetValueAsType It accepts the an element type value as a method parameter. If your know the right element type value, you could call UnsafeGetValueAsType directly if instead of going thru GetValue. While boxing overwrites the original element type value, an unboxed CorValue still has the CLR type metadata available. So I was able to map CLR Types to element types (e.g. System.Single –> ELEMENT_TYPE_R4) in order to retrieve the underlying value of boxed primitive types.

_type_map = { 'System.Boolean': ELEMENT_TYPE_BOOLEAN,
  'System.SByte'  : ELEMENT_TYPE_I1, 'System.Byte'   : ELEMENT_TYPE_U1,
  'System.Int16'  : ELEMENT_TYPE_I2, 'System.UInt16' : ELEMENT_TYPE_U2,
  'System.Int32'  : ELEMENT_TYPE_I4, 'System.UInt32' : ELEMENT_TYPE_U4,
  'System.IntPtr' : ELEMENT_TYPE_I,  'System.UIntPtr': ELEMENT_TYPE_U,
  'System.Int64'  : ELEMENT_TYPE_I8, 'System.UInt64' : ELEMENT_TYPE_U8,
  'System.Single' : ELEMENT_TYPE_R4, 'System.Double' : ELEMENT_TYPE_R8,
  'System.Char'   : ELEMENT_TYPE_CHAR, }

_generic_element_types = _type_map.values()

class NullCorValue(object):
  def __init__(self, typename):
    self.typename = typename

def extract_value(value):
    rv = value.CastToReferenceValue()
    if rv != None:
      if rv.IsNull:
        typename = rv.ExactType.Class.GetTypeInfo().Name
        return NullCorValue(typename)
      return extract_value(rv.Dereference())
    bv = value.CastToBoxValue()
    if bv != None:
      return extract_value(bv.GetObject())

    if value.Type in _generic_element_types:
      return value.CastToGenericValue().GetValue()
    elif value.Type == ELEMENT_TYPE_STRING:
      return value.CastToStringValue().String
    elif value.Type == ELEMENT_TYPE_VALUETYPE:
      typename = value.ExactType.Class.GetTypeInfo().Name
      if typename in _type_map:
        gv = value.CastToGenericValue()
        return gv.UnsafeGetValueAsType(_type_map[typename])
      else:
        return value.CastToObjectValue()
    elif value.Type in [ELEMENT_TYPE_CLASS, ELEMENT_TYPE_OBJECT]:
      return value.CastToObjectValue()
    else:
      msg = "CorValue type %s not supported" % str(value.Type)
      raise (Exception, msg)

It’s kinda ugly code and I’m thinking that at least some of really belongs in the CorValue C# classes rather than in ipydbg. However, I’m not that interested in doing the significant refactoring it would take to make the CorValue API developer-friendly, so I did it here.

One thing to note that I didn’t cover earlier is the NullCorValue object. For reference values, there’s a IsNull property that may be set. If it is set, I need a mechanism to indicate the null value, but also includes the type information. So I created a custom type that can store the type name to represent null. Again, something that should be a part of the CorValue API.

Once I have my extracted value, I need to display it in the console. This is much simpler than the extracting the value. As I wrote above, I’m not making any attempt to print a real representation for CorObjectValues. I could look at making a call ToString call to get something useful, but that requires invoking a function in the target process and I haven’t gotten that far with ipydbg yet. So I just print “<…>” if it isn’t a string, primitive or null value.

def display_value(value):
  if type(value) == str:
    return (('"%s"' % value), 'System.String')
  elif type(value) == CorObjectValue:
    return ("<...>", value.ExactType.Class.GetTypeInfo().FullName)
  elif type(value) == NullCorValue:
    return ("<None>", value.typename)
  else:
    return (str(value), value.GetType().FullName)

Now all I need is to iterate thru the list of local variables and call extract_value and display_value on each in turn and print the results. I won’t reproduce that code here, but you can see it in the ipydbg project source on GitHub.

I’m happy with what I’ve gotten working (it took several days of banging my head against the proverbial wall to get it this far) but there’s still room for improvement. First, I’d like to be able to call ToString to get a class-specific generic representation as I described above. Second, I need a way to display the fields of a CorObejctValue object. It’s just a combination of metadata reading and CorObjectValue::GetFieldValue, but that code won’t write itself. Finally, there are other Python primitives – like list, dictionary and tuple – that ipydbg should have specific knowledge of and be able to display without requiring the user to drill into the member variables and the like.

While the CorValue API does certain things very well, I wish it did a better job abstracting away the existence of the various ICorDebugValue interfaces. Hence the need for all the calls to CastToWhatever().↩

IronPython 2.6 Alpha 1

Just in type for PyCon, we just shipped the first alpha of IronPython 2.6. As you can guess from the version number, the main feature of this version of IronPython will be the new features introduced in Python 2.6. As you can see, we’ve synced version numbers between IronPython and Python. No more explaining which version of IPy goes with which version of Python.

In addition to the start of 2.6 support, the other big feature of IronPython 2.6 is something called Adaptive Compilation. IronPython’s performance is pretty good compared to CPython. We’re about 28% faster than CPython (IPy 2.0.1 vs. CPy 2.5) on PyStone and about 10% faster on PyBench if you exclude the TryRaiseExcept test. ¹ However, our startup time is not very good. These two facts are related: it takes a long time on startup to compile to Python code to IL (and then JITted from IL to native code), but once that’s done the code runs really fast. However, if you’re only going to execute a function a few times, it typically isn’t worth the overhead to compile the function to IL. The Adaptive Compilation feature is an interpreter for DLR trees. The first few times you run a given Python function, it gets interpreted. At some point, after you’ve called the function enough times, IronPython 2.6 decides to take the hit and compile the function. If you want to go back to the old “always compile to IL” model, you can pass –O on the command line.

This is our first alpha of 2.6, and some things are kinda broken. In particular, there was a change to collections.py that breaks much of the Python Standard Library under IronPython. Dave has the details and the workaround. Rest assured, this will get fixed before we release. Dino is hard at work making _getframe work for depths greater than zero. Because it will have some perf impact, it won’t be enabled by default – you’ll have to pass a command-line parameter to enable it. But if you have to opt-in to _getframe support for depth > 0, it makes sense to opt-into _getframe support entirely and do away with the current _getframe(0) only support. What’s nice about this approach is that it will work with collections.py regardless if you opt-in to _getframe or not.

As stated in the release notes, the release cycle on 2.6 will be much shorter than 2.0. There was only seven months between 1.0 and 1.1, and we’re shooting for a slightly longer timeframe for 2.6. Certainly not like the twenty months that passed between 1.1 and 2.0. So please start trying it out as soon as you can and give us your feedback.

IPy is over 4000% slower than CPy on TryRaiseExcept, 58,234 ms vs. 1,286ms. This one test represents 44% of our overall test run time and causes IPy to run PyBench 57% slower than CPy instead of 10% faster. Python has a different philosophy on exceptions than CLR does. Several Python exceptions like GeneratorExit and StopIteration are explicitly documented as “not considered an error”. This is a very different approach to CLR’s approach. At some point, we’re going to have to look at improving exception performance, but it’s not really a priority for the 2.6 release.↩

Writing an IronPython Debugger: Getting Local Variables

I just pushed out a new drop of ipydbg that includes the first cut of support for showing local variables. Getting the value for a local variable is actually pretty simple. The CorFrame object (which hangs off active_thread) includes a method to get a local variable by index as well getting a count of all local variables. The problem with these functions is that they don’t provide the name of the variable. For that, you’ve got to look in debug symbols.

From a CorFrame, you can retrieve the associated CorFunction. Since I added symbol reader support to CorModule, I added support for directly retrieving the ISymbolMethod for a CorFunction. From the method symbols, I can get the root lexical scope of the method. And from the symbol scope, I can get the locals. Scopes can be nested, so to get all the locals for a given function, you need to iterate thru all the child scopes as well.

So here’s my get_locals function:

def get_locals(frame, scope=None, offset=None, show_hidden=False):  
    #if the scope is unspecified, try and get it from the frame

    if scope == None:  
        symmethod = frame.Function.GetSymbolMethod()  
        if symmethod != None:  
            scope = symmethod.RootScope  
        #if scope still not available, yield the local variables

        #from the frame, with auto-gen'ed names (local_1, etc)

        else:  
          for i in range(frame.GetLocalVariablesCount()):  
            yield "local_%d" % i, frame.GetLocalVariable(i)  
          return  

    #if we have a scope, get the locals from the scope  

    #and their values from the frame

    for lv in scope.GetLocals():  
        #always skip $site locals - they are cached callsites and  

        #not relevant to the ironpython developer

        if lv.Name == "$site": continue  
        if not lv.Name.startswith("$") or show_hidden:  
          v = frame.GetLocalVariable(lv.AddressField1)  
          yield lv.Name, v  

    if offset == None: offset = frame.GetIP()[0]  

    #recusively call get_locals for all the child scopes

    for s in scope.GetChildren():  
      if s.StartOffset <= offset and s.EndOffset >= offset:  
        for ret in get_locals(frame, s, offset, show_hidden):  
          yield ret

The function is designed to automatically retrieve the scope and offset, if they’re available. That way, I can simply call get_locals with the frame argument and it does the right thing. For example, if you don’t pass in a symbol scope explicitly get_locals will attempt to retrieve the debug symbols. If debug symbols aren’t available, iterates over the locals in the frame and yields each with a fake name (local_0, local_1, etc). If the debug symbols are available, then it iterates over the locals in the scope, then calls itself for each of the child scopes (skipping child scopes who’s offset range doesn’t overlap with the current offset).

The other feature of get_locals is deciding which locals to include. As you might expect, IronPython emits some local variables that are for internal runtime use. These variables get prefixed with a dollar sign. The dollar sign is not a legal identifier character in C# or Python, but IL has no problem with it. If you pass in False for show_hidden (or use the default value), then get_locals skips over any local variables who’s name starts with the dollar sign.

Even if you pass in True for show_hidden, get_locals still skips over any variable named “$site”. $site variables are dynamic call site caches, a DLR feature that are used to efficiently dispatch dynamic calls by caching the results of previous invocations. Martin Maly’s blog has more details on these caches. As they are part of method dispatch, I never want to show them to the ipydbg user, so they get skipped regardless of the value of show_hidden.

Now that I can get the local variables for a given frame, we need to convert those variables to something you can print on the screen. That turns out to be more complicated that you might expect, so it’ll have to wait for the next post (which may be a while, given that PyCon is this weekend). In the meantime, you can get the latest version of ipydbg from GitHub.

AgDLR 0.5

I mentioned yesterday that it looked like a new release of AgDLR was eminent and sure enough here it is. There are some really cool new features including Silverlight 3 Transparent Platform Extension support, In-Browser REPL and In-Browser testing of Silverlight apps. As with IronRuby 0.3, Jimmy has the a summary of the new AgDLR release.

One feature of the new release I did want to highlight was XapHttpHandler because I’m the one who wrote it! 😄

The Silverlight versions of IronPython and IronRuby ship with a tool called Chiron that provides a REPL-esque experience for building dynamic language Silverlight apps. John Lam had a good write-up on Chiron when we first released it last year, but basically the idea is that Chiron is a local web server that will auto-generate a Silverlight XAP from a directory of Python and/or Ruby files on demand. For example, if your HTML page requests a Silverlight app named app.xap, Chiron automatically creates the app.xap file from the files in the app directory. This lets you simply edit your Python and/or Ruby files directly then refresh your browser to get the new version without needing an explicit build step.

The problem is that, unlike IIS and the ASP.NET Development Server, Chiron doesn’t integrate with ASP.NET. So it’s fine for building Silverlight apps that stand alone or talk to 3rd party services. But if you want to build a Silverlight app that talks back to it’s ASP.NET host, you’re out of luck. That’s where XapHttpHandler comes in. XapHttpHandler does the same exact on-demand XAP packaging for dynamic language Silverlight applications that Chiron does, but it’s implemented as an IHttpHandler so it plugs into the standard ASP.NET pipeline. All you have to do is put the Chiron.exe in your web application’s bin directory and add XapHttpHandler to your web.config like so:

<configuration>
  <!--remaining web.config content ommitted for clarity-->
  <system.web>
    <httpHandlers>
      <add verb="*" path="*.xap" validate="false"
           type="Chiron.XapHttpHandler,Chiron"/>
    </httpHandlers>
  <system.web>
</configuration>

The new AgDLR drop includes a sample website that shows XapHttpHandler in action.

Quick note of caution: by design, XapHttpHandler does not cache the XAP file – it’s generated anew on every request. So I would highly recommend against using XapHttpHandler on a production web server. You’re much better off using Chiron to build a physical XAP file that you then deploy to your production web server.

Series

Disclaimer

The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the authors discretion.