Writing an IronPython Debugger: Displaying Values

Now that I can get the local variables for a given frame, I need to display them in the console. Eventually, I’d like to provide the ability to update the local variables as well, but you gotta crawl before you can run. Luckily, the debugger API is consistent about using same COM interfaces – wrapped by the managed CorValue class – to represent all data values, including local variables, function arguments and object fields. So the work I do now to display CorValues in the console will be reusable in other contexts down the road.

While the debugger API is consistent about how it represents values in the target process, the API it uses is very complicated. The primary COM interface for accessing values is ICorDebugValue, but it has eight siblings: ICorDebugReferenceValue, ICorDebugHandleValue, ICorDebugStringValue, ICorDebugObjectValue, ICorDebugGenericValue, ICorDebugBoxValue, ICorDebugArrayValue, ICorDebugHeapValue. All those COM interfaces are represented in managed code by CorValue and it’s subclasses.

Furthermore, confusingly ICorDebugValues have both a Type and an ExactType. ExactType is what .NET developers typically think of as the type, aka the CLR type. Well, the debugger API’s representation of the CLR type at any rate. You can retrieve the value’s metadata as a System.Type compatible object via value.ExactType.Class.GetTypeInfo().CorValue’s Type property, on the other hand, represents the object’s primitive or element type. For example, instances of .NET classes have an element Type of ELEMENT_TYPE_CLASS. There are a collection of primitive types (boolean, char, ints of various signage and size, floats of various size) as well as types you wouldn’t call primitive but that the runtime has specific knowledge of (string, array and value types – aka structs in C# terminology).

If you’re confused by all that, don’t worry so am I. Honestly, I’ve re-written this code several times, each time understanding the API just a bit better. Whatever the right way to use the interfaces, I’m sure I don’t know it. For my first cut at this, I essentially ported MDbg’s high level CorValue API – aka MDbgValue::InternalGetValue if you’re looking at the MDbg source code – over to Python. Along the way, I’ve improved on that code as I’ll describe below.

A given CorValue may be a primitive value like an int or it may be a reference to or a boxed version of some other CorValue object. So in order to print the CorValue, you have to go thru a series of attempts to dereference and unbox until you get to the “real” underlying CorValue object. From there, converting the value to a string I can print depends on the value’s element type. For primitive types like ints and floats, you can call CastToGenericValue to get a CorGenericValue “view” of the same CorValue object [1]. A CorGenericValue can read and write the raw bytes from memory in the target process of the value. The GetValue method reads the data from target process then does an unsafe cast to appropriate managed type. For example, an ELEMENT_TYPE_R4 CorValue gets cast into a System.Single. For CorValue strings, I call CastToStringValue and then access the String property. For classes, value types and objects, there’s no simple or standard approach to retrieving the data, so for now I return the result of calling CastToObjectValue. Eventually, I’ll want to provide a mechanism to read the specific fields of a class or value type.

Unfortunately, the mechanism above to read primitive types doesn’t work with IronPython. GetValue needs to know the correct element type in order to do the unsafe cast. For value types (aka any struct other than the basic primitives), GetValue will return a data as a byte array. The problem is that when you box a primitive, the original element types gets overwritten by ELEMENT_TYPE_VALUETYPE. You can’t get the original element type back, even after unboxing. So for boxed primitives, you can only retrieve the data as a raw byte array or as a CorObjectValue, neither of which is very useful.

Luckily, I was able to work around this. Under the hood, GetValue calls UnsafeGetValueAsType to do the actual work of reading the data from the target process and casting it to the right managed type. UnsafeGetValueAsType It accepts the an element type value as a method parameter. If your know the right element type value, you could call UnsafeGetValueAsType directly if instead of going thru GetValue. While boxing overwrites the original element type value, an unboxed CorValue still has the CLR type metadata available. So I was able to map CLR Types to element types (e.g. System.Single –> ELEMENT_TYPE_R4) in order to retrieve the underlying value of boxed primitive types.

_type_map = { 'System.Boolean': ELEMENT_TYPE_BOOLEAN,
  'System.SByte'  : ELEMENT_TYPE_I1, 'System.Byte'   : ELEMENT_TYPE_U1,
  'System.Int16'  : ELEMENT_TYPE_I2, 'System.UInt16' : ELEMENT_TYPE_U2,
  'System.Int32'  : ELEMENT_TYPE_I4, 'System.UInt32' : ELEMENT_TYPE_U4,
  'System.IntPtr' : ELEMENT_TYPE_I,  'System.UIntPtr': ELEMENT_TYPE_U,
  'System.Int64'  : ELEMENT_TYPE_I8, 'System.UInt64' : ELEMENT_TYPE_U8,
  'System.Single' : ELEMENT_TYPE_R4, 'System.Double' : ELEMENT_TYPE_R8,
  'System.Char'   : ELEMENT_TYPE_CHAR, }

_generic_element_types = _type_map.values()

class NullCorValue(object):
  def __init__(self, typename):
    self.typename = typename

def extract_value(value):
    rv = value.CastToReferenceValue()
    if rv != None:
      if rv.IsNull:
        typename = rv.ExactType.Class.GetTypeInfo().Name
        return NullCorValue(typename)
      return extract_value(rv.Dereference())
    bv = value.CastToBoxValue()
    if bv != None:
      return extract_value(bv.GetObject())

    if value.Type in _generic_element_types:
      return value.CastToGenericValue().GetValue()
    elif value.Type == ELEMENT_TYPE_STRING:
      return value.CastToStringValue().String
    elif value.Type == ELEMENT_TYPE_VALUETYPE:
      typename = value.ExactType.Class.GetTypeInfo().Name
      if typename in _type_map:
        gv = value.CastToGenericValue()
        return gv.UnsafeGetValueAsType(_type_map[typename])
      else:
        return value.CastToObjectValue()
    elif value.Type in [ELEMENT_TYPE_CLASS, ELEMENT_TYPE_OBJECT]:
      return value.CastToObjectValue()
    else:
      msg = "CorValue type %s not supported" % str(value.Type)
      raise (Exception, msg)

It’s kinda ugly code and I’m thinking that at least some of really belongs in the CorValue C# classes rather than in ipydbg. However, I’m not that interested in doing the significant refactoring it would take to make the CorValue API developer-friendly, so I did it here.

One thing to note that I didn’t cover earlier is the NullCorValue object. For reference values, there’s a IsNull property that may be set. If it is set, I need a mechanism to indicate the null value, but also includes the type information. So I created a custom type that can store the type name to represent null. Again, something that should be a part of the CorValue API.

Once I have my extracted value, I need to display it in the console. This is much simpler than the extracting the value. As I wrote above, I’m not making any attempt to print a real representation for CorObjectValues. I could look at making a call ToString call to get something useful, but that requires invoking a function in the target process and I haven’t gotten that far with ipydbg yet. So I just print “<…>” if it isn’t a string, primitive or null value.

def display_value(value):
  if type(value) == str:
    return (('"%s"' % value), 'System.String')
  elif type(value) == CorObjectValue:
    return ("<...>", value.ExactType.Class.GetTypeInfo().FullName)
  elif type(value) == NullCorValue:
    return ("<None>", value.typename)
  else:
    return (str(value), value.GetType().FullName)

Now all I need is to iterate thru the list of local variables and call extract_value and display_value on each in turn and print the results. I won’t reproduce that code here, but you can see it in the ipydbg project source on GitHub.

I’m happy with what I’ve gotten working (it took several days of banging my head against the proverbial wall to get it this far) but there’s still room for improvement. First, I’d like to be able to call ToString to get a class-specific generic representation as I described above. Second, I need a way to display the fields of a CorObejctValue object. It’s just a combination of metadata reading and CorObjectValue::GetFieldValue, but that code won’t write itself. Finally, there are other Python primitives – like list, dictionary and tuple – that ipydbg should have specific knowledge of and be able to display without requiring the user to drill into the member variables and the like.


  1. While the CorValue API does certain things very well, I wish it did a better job abstracting away the existence of the various ICorDebugValue interfaces. Hence the need for all the calls to CastToWhatever(). ↩︎

Comments:

Hmm, starting to look like you're having to deal with a lot of complexity by working directly with CorAPI. (I know you decided in your first post to go that route...) I'm thinking that MdbgEng / MdbgValue might be a lot simpler to use. Not that I know anything about it yet... Based on what you've learned by all your poking around, any thoughts on what it might be like to duplicate ipydbg, but building on MdbgEng classes instead of CorAPI? ~Steve