Writing an IronPython Debugger: Debugging Just My Code

As I wrote last time, in order to make debug stepping actually useful in ipydbg I need to avoid stepping into frames that are part of the IronPython infrastructure. I did something similar when I hide infrastructure frames in the stack trace. Originally, I had planned to automatically stepping again if we ended up on a frame that didn’t correspond to a python file. However, Mike Stall showed me a much cleaner and better performing solution: Just My Code. As I mentioned at the start of this series, support for JMC is one of the main reasons I wanted to build my own debugger rather than use MDbg.

Enabling JMC in the stepper object is trivial:

def create_stepper(thread, JMC = True):
  stepper = thread.ActiveFrame.CreateStepper()
  stepper.SetUnmappedStopMask(CorDebugUnmappedStop.STOP_NONE)
  stepper.SetJmcStatus(JMC)  
  return stepper

If I make that single change and run ipydbg, any step effectively turns into a full continue since none of the code has been marked as “My Code” yet. As you see, the tricky part of JMC isn’t enabling it on the stepper, it’s “painting” the parts of the code where you want JMC stepping to work. You can set JMC status at the module, class or the method level. In the case of ipdbg, it’s easiest to work at the class level:

infrastructure_methods =  ['TryGetExtraValue',
    'TrySetExtraValue',
    '.cctor',
    '.ctor',
    'CustomSymbolDictionary.GetExtraKeys',
    'IModuleDictionaryInitialization.InitializeModuleDictionary']

def OnClassLoad(self, sender, e):
    cmi = CorMetadataImport(e.Class.Module)
    mt = cmi.GetType(e.Class.Token)
    print "OnClassLoad", mt.Name

    if not e.Class.Module.IsDynamic:
      e.Class.JMCStatus = False
    elif mt.Name.startswith('IronPython.NewTypes'):
      e.Class.JMCStatus = False
    else:
      e.Class.JMCStatus = True
      for mmi in mt.GetMethods():
        if mmi.Name in infrastructure_methods:
          f = e.Class.Module.GetFunctionFromToken(mmi.MetadataToken)
          f.JMCStatus = False

OnClassLoad is where the action is. This event handler is responsible for enabling JMC for all class methods that map to python code. To understand how the logic in OnClassLoad works, you need to understand a little about the .NET types and code that IronPython generates. Note, the following description is for the IronPython 2.0 branch. Code generation evolves from release to release and I know for a fact there are changes in the upcoming 2.6 version. I assume that I’ll eventually have to sniff the IronPython version in order to set JMC correctly.

Today, IronPython generates all code into dynamic modules and methods. Since I want to limit stepping to python code only, I automatically disable JMC for non-dynamic modules. I can imagine a scenario where I want to step into non-dynamically generated code, but I think the best way to handle that would be to disable JMC at the stepper rather than widening the amount of code marked as JMC enabled.

For every module that gets loaded, IronPython generates a type. At a minimum you’re going to load two modules: site.py and whatever python script you ran. If you have the python standard library installed, site.py loads a bunch of other modules as well. Each of these module types have a bunch of standard methods that always get generated. For example, the global scope code in the module is placed in a static method on the module type called Initialize. Any python functions you define get generated static methods with mangled names on the module type 1. All these methods have corresponding python code and should be JMC enabled. The other standard methods on a module type should not be JMC enabled. So in my debugger, I mark the class as JMC enabled but then iterate over the list of methods and mark any in the list of standard methods (except for Initialize) as JMC disabled.

Of course, you can also create classes in Python. As you might expect, classes in Python are generated as .NET types. However, the semantics of Python classes are very different than .NET types. For example, you can change the inheritance hierarchy of python classes at runtime. That’s obviously not allowed for .NET types. So the .NET types we generate have all the logic to implement Python class semantics. As it turns out, these .NET types only have the logic to implement Python class semantics, which is to say they have none of Python class methods code. This makes sense when you think about it – since Python can add and remove methods from a class at runtime, IronPython can’t put the method code in the .NET type itself. Instead, Python class methods are generated as static methods on the module type, just like top-level functions are. Since Python class types only contain Python class semantics logic, we never want to enable JMC for Python class types. Python class types get generated in the IronPython.NewTypes namespace, so it’s fairly easy to check the class name in OnClassLoad and automatically disable JMC for classes any in that namespace.

Adding JMC support makes ipydbg significantly more usable. It’s almost like a real tool now, isn’t it? Latest bits are up on GitHub.


  1. FYI, IronPython generates python functions as dynamic methods in release mode and static module class methods in debug mode since you can’t step into dynamic methods. The description above is specific to debug mode since ipydbg exclusively runs in debug mode.

Writing an IronPython Debugger: Stepping Thru Code

So far, I’ve written seven posts about my IronPython debugger, but frankly it isn’t very functional yet. It runs, breaks on the first line and can show a stack trace. Not exactly Jolt award material. In this post, I’m going to add one of the core functions of any debugger: stepping. Where previously I’ve written a bunch of code but had little to show in terms of features, now I’m getting three new features (basic step, step in and step out) at once!

def _input(self):
  #remaining _input code omitted for clarity

  elif k.Key == ConsoleKey.S:
      print "nStepping"
      self._do_step(False)
      return
  elif k.Key == ConsoleKey.I:
      print "nStepping In"
      self._do_step(True)
      return
  elif k.Key == ConsoleKey.O:
      print "nStepping Out"
      stepper = create_stepper(self.active_thread)
      stepper.StepOut()

def _do_step(self, step_in):
  stepper = create_stepper(self.active_thread)
  mod = self.active_thread.ActiveFrame.Function.Module
  if mod not in self.symbol_readers:
      stepper.Step(step_in)
  else:
    range = get_step_ranges(self.active_thread, self.symbol_readers[mod])
    stepper.StepRange(step_in, range)

Here you can see the _input clauses for step, step in and step out. Of the three, step out is the simplest to implement: create the stepper object and call StepOut. For step and step in, I could simply call Step (the boolean argument indicates if you want to step into or over functions) but that only steps a single IL statement. The vast majority of the time there are multiple IL instructions for every line of source code, so IL statement stepping is very tedious. As we learned when setting a breakpoint, debug symbols contain sequence points that map between source and IL locations. If they’re available, I use the sequence points to determine the range of IL statements to step over so that I can step single source statements instead.

The stepping code above depends on three helper functions defined at global scope.

def create_stepper(thread):
  stepper = thread.ActiveFrame.CreateStepper()
  stepper.SetUnmappedStopMask(CorDebugUnmappedStop.STOP_NONE)
  return stepper  

def create_step_range(start, end):
  range = Array.CreateInstance(COR_DEBUG_STEP_RANGE, 1)
  range[0] = COR_DEBUG_STEP_RANGE(startOffset = UInt32(start),
                                  endOffset = UInt32(end))
  return range

def get_step_ranges(thread, reader):
    frame = thread.ActiveFrame
    offset, mapResult = frame.GetIP()
    method = reader.GetMethod(SymbolToken(frame.FunctionToken))
    for sp in get_sequence_points(method):
        if sp.offset > offset:
            return create_step_range(offset, sp.offset)
    return create_step_range(offset, frame.Function.ILCode.Size)

The first function, create_stepper, simply constructs and configures the stepper object. The call to SetUnmappedStopMask tells the debugger not to stop if it encounters code that can’t be mapped to IL. If you need to debug at that level, ipydbg is *not* for you.

Next is create_step_range, which exists purely for .NET interop purposes. There are three interop warts hidden in this function. First is creating a .NET array of COR_DEBUG_STEP_RANGE structs. Every time I write Array code like this, I wish for a CreateFromCollection static method on Array. However, in this case it isn’t that big a deal since it’s a one element array. Second wart is having to set the values of COR_DEBUG_STEP_RANGE via constructor keyword arguments. It turns out that IronPython disallows direct updates to value type fields (read this for the reason why). Instead, I pass in the field values into the constructor as keyword arguments. Finally, you have to explicitly convert the start and end offsets to a unsigned int in order to set the offset fields in the COR_DEBUG_STEP_RANGE struct constructor.

Finally is get_step_ranges, which iterates thru the list of sequence points in the current method looking for the one with the smallest offset that is larger than the current offset position. If it can’t find a matching sequence point, it sets the range to the end of the current function. The start range offset is always the current offset. I did make a significant change to get_sequence_points – it no longer yields sequence points that have a start line of 0xfeefee. By convention, that indicates a sequence point to be skipped. Originally, the logic to ignore 0xfeefee sequence points was in get_location. But when I originally wrote get_step_ranges, it had essentially the same sequence point skipping logic, so I moved it to get_location instead.

Technically, I’ve built three new features but the reality is that if you end up in IronPython infrastructure code it’s really hard to find your way back to python code. Step in is particularly useless right now. Luckily, the .NET debugger API supports a feature called “Just My Code” that will make stepping much more useful. In the meantime, the latest version of ipydbg is up on GitHub as usual.