Passion * Technology * Ruthless Competence

Tuesday, March 31, 2009

DevHawk on CodeCast

Ken Levy used to work around the corner from my office, back in his days on the VSX team. These days, he’s hosting the CodeCast (among other things) and he dropped by my office a while back to chat about IronPython for his podcast.

Check it out.

Posted By Harry Pierson at 10:36 AM Pacific Standard Time

Writing an IronPython Debugger: Displaying Values

Now that I can get the local variables for a given frame, I need to display them in the console. Eventually, I’d like to provide the ability to update the local variables as well, but you gotta crawl before you can run. Luckily, the debugger API is consistent about using the same COM interfaces – wrapped by the managed CorValue class – to represent all data values, including local variables, function arguments and object fields. So the work I do now to display CorValues in the console will be reusable in other contexts down the road.

While the debugger API is consistent about how it represents values in the target process, the API it uses is very complicated. The primary COM interface for accessing values is ICorDebugValue, but it has eight siblings: ICorDebugReferenceValue, ICorDebugHandleValue, ICorDebugStringValue, ICorDebugObjectValue, ICorDebugGenericValue, ICorDebugBoxValue, ICorDebugArrayValue, ICorDebugHeapValue. All those COM interfaces are represented in managed code by CorValue and its subclasses.

Furthermore, and confusingly, ICorDebugValues have both a Type and an ExactType. ExactType is what .NET developers typically think of as the type, aka the CLR type. Well, the debugger API’s representation of the CLR type at any rate. You can retrieve the value’s metadata as a System.Type compatible object via value.ExactType.Class.GetTypeInfo(). CorValue’s Type property, on the other hand, represents the object’s primitive or element type. For example, instances of .NET classes have an element Type of ELEMENT_TYPE_CLASS. There is a collection of primitive types (boolean, char, ints of various signedness and size, floats of various size) as well as types you wouldn’t call primitive but that the runtime has specific knowledge of (string, array and value types - aka structs in C# terminology).

If you’re confused by all that, don’t worry, so am I. Honestly, I’ve re-written this code several times, each time understanding the API just a bit better. Whatever the *right* way to use the interfaces is, I’m sure I don’t know it. For my first cut at this, I essentially ported MDbg’s high level CorValue API – aka MDbgValue::InternalGetValue if you’re looking at the MDbg source code – over to Python. Along the way, I’ve improved on that code as I’ll describe below.

A given CorValue may be a primitive value like an int or it may be a reference to or a boxed version of some other CorValue object. So in order to print the CorValue, you have to go thru a series of attempts to dereference and unbox until you get to the “real” underlying CorValue object. From there, converting the value to a string I can print depends on the value’s element type. For primitive types like ints and floats, you can call CastToGenericValue to get a CorGenericValue “view” of the same CorValue object [1]. A CorGenericValue can read and write the value’s raw bytes in the target process’s memory. The GetValue method reads the data from the target process then does an unsafe cast to the appropriate managed type. For example, an ELEMENT_TYPE_R4 CorValue gets cast into a System.Single. For CorValue strings, I call CastToStringValue and then access the String property. For classes, value types and objects, there’s no simple or standard approach to retrieving the data, so for now I return the result of calling CastToObjectValue. Eventually, I’ll want to provide a mechanism to read the specific fields of a class or value type.

Unfortunately, the mechanism above to read primitive types doesn’t work with IronPython. GetValue needs to know the correct element type in order to do the unsafe cast. For value types (aka any struct other than the basic primitives), GetValue will return the data as a byte array. The problem is that when you box a primitive, the original element type gets overwritten by ELEMENT_TYPE_VALUETYPE. You can’t get the original element type back, even after unboxing. So for boxed primitives, you can only retrieve the data as a raw byte array or as a CorObjectValue, neither of which is very useful.

Luckily, I was able to work around this. Under the hood, GetValue calls UnsafeGetValueAsType to do the actual work of reading the data from the target process and casting it to the right managed type. UnsafeGetValueAsType accepts an element type value as a method parameter. If you know the right element type value, you can call UnsafeGetValueAsType directly instead of going thru GetValue. While boxing overwrites the original element type value, an unboxed CorValue still has the CLR type metadata available. So I was able to map CLR Types to element types (e.g. System.Single –> ELEMENT_TYPE_R4) in order to retrieve the underlying value of boxed primitive types.

_type_map = { 'System.Boolean': ELEMENT_TYPE_BOOLEAN,   
  'System.SByte'  : ELEMENT_TYPE_I1, 'System.Byte'   : ELEMENT_TYPE_U1,   
  'System.Int16'  : ELEMENT_TYPE_I2, 'System.UInt16' : ELEMENT_TYPE_U2,   
  'System.Int32'  : ELEMENT_TYPE_I4, 'System.UInt32' : ELEMENT_TYPE_U4,   
  'System.IntPtr' : ELEMENT_TYPE_I,  'System.UIntPtr': ELEMENT_TYPE_U,  
  'System.Int64'  : ELEMENT_TYPE_I8, 'System.UInt64' : ELEMENT_TYPE_U8,   
  'System.Single' : ELEMENT_TYPE_R4, 'System.Double' : ELEMENT_TYPE_R8,   
  'System.Char'   : ELEMENT_TYPE_CHAR, }   
     
_generic_element_types = _type_map.values()   

class NullCorValue(object):  
  def __init__(self, typename):  
    self.typename = typename  

def extract_value(value):  
    rv = value.CastToReferenceValue()  
    if rv != None:  
      if rv.IsNull:   
        typename = rv.ExactType.Class.GetTypeInfo().Name  
        return NullCorValue(typename)  
      return extract_value(rv.Dereference())  
    bv = value.CastToBoxValue()  
    if bv != None:  
      return extract_value(bv.GetObject())   

    if value.Type in _generic_element_types:  
      return value.CastToGenericValue().GetValue()  
    elif value.Type == ELEMENT_TYPE_STRING:  
      return value.CastToStringValue().String  
    elif value.Type == ELEMENT_TYPE_VALUETYPE:  
      typename = value.ExactType.Class.GetTypeInfo().Name   
      if typename in _type_map:  
        gv = value.CastToGenericValue()  
        return gv.UnsafeGetValueAsType(_type_map[typename])  
      else:  
        return value.CastToObjectValue()  
    elif value.Type in [ELEMENT_TYPE_CLASS, ELEMENT_TYPE_OBJECT]:  
      return value.CastToObjectValue()  
    else:  
      msg = "CorValue type %s not supported" % str(value.Type)
      raise Exception(msg)

It’s kinda ugly code and I’m thinking that at least some of it really belongs in the CorValue C# classes rather than in ipydbg. However, I’m not that interested in doing the significant refactoring it would take to make the CorValue API developer-friendly, so I did it here.

One thing to note that I didn’t cover earlier is the NullCorValue object. For reference values, there’s an IsNull property that may be set. If it is set, I need a mechanism to indicate the null value that also includes the type information. So I created a custom type that can store the type name to represent null. Again, something that should be a part of the CorValue API.
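To see why the element type matters for the boxed-primitive workaround, here is a small sketch that runs outside the debugger. It uses Python's struct module as a stand-in for UnsafeGetValueAsType: the same four raw bytes decode to different values depending on which type you claim they are. The `_struct_formats` table and `decode_raw` helper are hypothetical names made up for this illustration; they are not part of ipydbg or the CorValue API.

```python
import struct

# Hypothetical stand-in for _type_map: CLR type name -> struct format code
# (little-endian is an assumption made for this demo).
_struct_formats = {
    'System.Boolean': '<?',
    'System.SByte': '<b', 'System.Byte': '<B',
    'System.Int16': '<h', 'System.UInt16': '<H',
    'System.Int32': '<i', 'System.UInt32': '<I',
    'System.Int64': '<q', 'System.UInt64': '<Q',
    'System.Single': '<f', 'System.Double': '<d',
}

def decode_raw(raw, typename):
    """Decode raw bytes as the named CLR primitive; unknown types stay as bytes."""
    fmt = _struct_formats.get(typename)
    if fmt is None:
        return raw
    return struct.unpack(fmt, raw)[0]

raw = struct.pack('<f', 1.5)                   # the four bytes of a System.Single
as_single = decode_raw(raw, 'System.Single')   # decoded with the right type
as_int32 = decode_raw(raw, 'System.Int32')     # same bytes, read as an int
as_unknown = decode_raw(raw, 'MyApp.Point')    # no mapping: raw bytes come back
```

Without the type name there is nothing to pick the format with, which is the position a boxed primitive leaves you in once its element type has been replaced by ELEMENT_TYPE_VALUETYPE.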

Once I have my extracted value, I need to display it in the console. This is much simpler than extracting the value. As I wrote above, I’m not making any attempt to print a real representation for CorObjectValues. I could look at making a ToString call to get something useful, but that requires invoking a function in the target process and I haven’t gotten that far with ipydbg yet. So I just print “<…>” if it isn’t a string, primitive or null value.

def display_value(value):
  if type(value) == str:
    return (('"%s"' % value), 'System.String')
  elif type(value) == CorObjectValue:
    return ("<...>", value.ExactType.Class.GetTypeInfo().FullName)
  elif type(value) == NullCorValue:
    return ("<None>", value.typename)
  else:
    return (str(value), value.GetType().FullName)

Now all I need is to iterate thru the list of local variables and call extract_value and display_value on each in turn and print the results. I won’t reproduce that code here, but you can see it in the ipydbg project source on GitHub.
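For readers who just want the shape of that loop, it might look roughly like the sketch below. This is not the code from ipydbg: get_locals, extract_value and display_value are replaced with trivial stand-ins so the example runs on its own, and `format_locals` is a hypothetical name.

```python
def fake_get_locals(frame):
    # Stand-in for get_locals: yields (name, raw value) pairs.
    for name, value in frame:
        yield name, value

def fake_extract_value(value):
    # Stand-in for extract_value; the real one dereferences and unboxes CorValues.
    return value

def fake_display_value(value):
    # Stand-in for display_value: returns (display text, type name).
    if isinstance(value, str):
        return ('"%s"' % value, 'str')
    if value is None:
        return ("<None>", 'NoneType')
    return (str(value), type(value).__name__)

def format_locals(frame):
    """Build one 'name (type): value' line per local variable."""
    lines = []
    for name, raw in fake_get_locals(frame):
        display, typename = fake_display_value(fake_extract_value(raw))
        lines.append("  %s (%s): %s" % (name, typename, display))
    return lines

# A "frame" here is just a list of (name, value) pairs.
frame = [("x", 42), ("name", "HAL"), ("result", None)]
for line in format_locals(frame):
    print(line)
```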

I’m happy with what I’ve gotten working (it took several days of banging my head against the proverbial wall to get it this far) but there’s still room for improvement. First, I’d like to be able to call ToString to get a class-specific generic representation as I described above. Second, I need a way to display the fields of a CorObjectValue object. It’s just a combination of metadata reading and CorObjectValue::GetFieldValue, but that code won’t write itself. Finally, there are other Python primitives - like list, dictionary and tuple – that ipydbg should have specific knowledge of and be able to display without requiring the user to drill into the member variables and the like.


[1] While the CorValue API does certain things very well, I wish it did a better job abstracting away the existence of the various ICorDebugValue interfaces. Hence the need for all the calls to CastToWhatever().

Posted By Harry Pierson at 8:35 AM Pacific Standard Time

Friday, March 27, 2009

IronPython 2.6 Alpha 1

Just in time for PyCon, we just shipped the first alpha of IronPython 2.6. As you can guess from the version number, the main feature of this version of IronPython will be the new features introduced in Python 2.6. As you can see, we’ve synced version numbers between IronPython and Python. No more explaining which version of IPy goes with which version of Python.

In addition to the start of 2.6 support, the other big feature of IronPython 2.6 is something called Adaptive Compilation. IronPython’s performance is pretty good compared to CPython. We’re about 28% faster than CPython (IPy 2.0.1 vs. CPy 2.5) on PyStone and about 10% faster on PyBench if you exclude the TryRaiseExcept test. [1] However, our startup time is not very good. These two facts are related: it takes a long time on startup to compile the Python code to IL (and then JIT the IL to native code), but once that’s done the code runs really fast. However, if you’re only going to execute a function a few times, it typically isn’t worth the overhead to compile the function to IL. The Adaptive Compilation feature is an interpreter for DLR trees. The first few times you run a given Python function, it gets interpreted. At some point, after you’ve called the function enough times, IronPython 2.6 decides to take the hit and compile the function. If you want to go back to the old “always compile to IL” model, you can pass -O on the command line.
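The counting-then-switching idea can be sketched in a few lines of plain Python. This only illustrates the general technique and says nothing about how IronPython actually implements it: the threshold of 3 is made up, and the "interpreted" and "compiled" paths are just two ordinary functions.

```python
class AdaptiveFunction(object):
    """Runs a slow path for the first few calls, then switches to a fast path."""

    def __init__(self, interpret, compile_fn, threshold=3):
        self.interpret = interpret      # cheap to start, slow per call
        self.compile_fn = compile_fn    # expensive once, returns a fast callable
        self.threshold = threshold      # hypothetical value, chosen for the demo
        self.calls = 0
        self.compiled = None

    def __call__(self, *args):
        if self.compiled is not None:
            return self.compiled(*args)
        self.calls += 1
        if self.calls > self.threshold:
            # Called often enough: pay the one-time compilation cost.
            self.compiled = self.compile_fn()
            return self.compiled(*args)
        return self.interpret(*args)

log = []

def interpreted_square(x):
    log.append("interpreted")
    return x * x

def compile_square():
    log.append("compiling")
    def compiled_square(x):
        log.append("compiled")
        return x * x
    return compiled_square

square = AdaptiveFunction(interpreted_square, compile_square)
results = [square(n) for n in range(5)]
```

A function that is only called once or twice never pays the compilation cost at all, which is where the startup win would come from.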

This is our first alpha of 2.6, and some things are kinda broken. In particular, there was a change to collections.py that breaks much of the Python Standard Library under IronPython. Dave has the details and the workaround. Rest assured, this will get fixed before we release. Dino is hard at work making _getframe work for depths greater than zero. Because it will have some perf impact, it won’t be enabled by default – you’ll have to pass a command-line parameter to enable it. But if you have to opt in to _getframe support for depth > 0, it makes sense to opt in to _getframe support entirely and do away with the current _getframe(0) only support. What’s nice about this approach is that it will work with collections.py regardless of whether you opt in to _getframe or not.

As stated in the release notes, the release cycle on 2.6 will be much shorter than 2.0. There were only seven months between 1.0 and 1.1, and we’re shooting for a slightly longer timeframe for 2.6. Certainly not like the twenty months that passed between 1.1 and 2.0. So please start trying it out as soon as you can and give us your feedback.


[1] IPy is over 4000% slower than CPy on TryRaiseExcept, 58,234 ms vs. 1,286 ms. This one test represents 44% of our overall test run time and causes IPy to run PyBench 57% slower than CPy instead of 10% faster. Python has a different philosophy on exceptions than the CLR does. Several Python exceptions like GeneratorExit and StopIteration are explicitly documented as “not considered an error”. That is very different from the CLR’s approach. At some point, we’re going to have to look at improving exception performance, but it’s not really a priority for the 2.6 release.
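For anyone who wants to check the "over 4000%" figure, the arithmetic is short. Only the two TryRaiseExcept timings below come from the test run quoted above; everything else is derived from them.

```python
ipy_ms = 58234.0   # IronPython time on TryRaiseExcept
cpy_ms = 1286.0    # CPython time on TryRaiseExcept

# How many times longer IronPython takes on this one test.
ratio = ipy_ms / cpy_ms

# "Percent slower" is the extra time relative to the CPython time.
percent_slower = (ipy_ms - cpy_ms) / cpy_ms * 100

print("IPy takes %.1fx as long" % ratio)
print("IPy is %.0f%% slower" % percent_slower)
```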

Posted By Harry Pierson at 9:20 AM Pacific Standard Time

Wednesday, March 25, 2009

Writing an IronPython Debugger: Getting Local Variables

I just pushed out a new drop of ipydbg that includes the first cut of support for showing local variables. Getting the value for a local variable is actually pretty simple. The CorFrame object (which hangs off active_thread) includes a method to get a local variable by index as well as getting a count of all local variables. The problem with these functions is that they don’t provide the name of the variable. For that, you’ve got to look in debug symbols.

From a CorFrame, you can retrieve the associated CorFunction. Since I added symbol reader support to CorModule, I added support for directly retrieving the ISymbolMethod for a CorFunction. From the method symbols, I can get the root lexical scope of the method. And from the symbol scope, I can get the locals. Scopes can be nested, so to get all the locals for a given function, you need to iterate thru all the child scopes as well.

So here’s my get_locals function:

def get_locals(frame, scope=None, offset=None, show_hidden=False):
    #if the scope is unspecified, try and get it from the frame
    if scope == None:
        symmethod = frame.Function.GetSymbolMethod()
        if symmethod != None:
            scope = symmethod.RootScope
        #if scope still not available, yield the local variables
        #from the frame, with auto-gen'ed names (local_0, etc)
        else:
          for i in range(frame.GetLocalVariablesCount()):
            yield "local_%d" % i, frame.GetLocalVariable(i)
          return

    #if we have a scope, get the locals from the scope
    #and their values from the frame
    for lv in scope.GetLocals():
        #always skip $site locals - they are cached callsites and
        #not relevant to the ironpython developer
        if lv.Name == "$site": continue
        if not lv.Name.startswith("$") or show_hidden:
          v = frame.GetLocalVariable(lv.AddressField1)
          yield lv.Name, v

    if offset == None: offset = frame.GetIP()[0]

    #recursively call get_locals for all the child scopes
    for s in scope.GetChildren():
      if s.StartOffset <= offset and s.EndOffset >= offset:
        for ret in get_locals(frame, s, offset, show_hidden):
          yield ret

The function is designed to automatically retrieve the scope and offset, if they’re available. That way, I can simply call get_locals with the frame argument and it does the right thing. For example, if you don’t pass in a symbol scope explicitly, get_locals will attempt to retrieve the debug symbols. If debug symbols aren’t available, it iterates over the locals in the frame and yields each with a fake name (local_0, local_1, etc). If the debug symbols are available, then it iterates over the locals in the scope, then calls itself for each of the child scopes (skipping child scopes whose offset range doesn’t overlap with the current offset).
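The nested-scope walk is easier to see without the debugger objects in the way. The sketch below uses a made-up `FakeScope` class in place of the real symbol scopes and a hypothetical `names_in_scope` helper; it only demonstrates the "this scope's locals, then every child scope whose range contains the current offset" rule.

```python
class FakeScope(object):
    """Stand-in for a symbol scope: an offset range, some locals, child scopes."""
    def __init__(self, start, end, names, children=()):
        self.start, self.end = start, end
        self.names = list(names)
        self.children = list(children)

def names_in_scope(scope, offset):
    """Yield locals of this scope, then of every child scope containing offset."""
    for name in scope.names:
        yield name
    for child in scope.children:
        if child.start <= offset <= child.end:
            for name in names_in_scope(child, offset):
                yield name

# root covers offsets 0-50 and has two child scopes; inner_b has a child too.
inner_a = FakeScope(10, 20, ["i"])
inner_b = FakeScope(30, 40, ["j"], [FakeScope(32, 36, ["k"])])
root = FakeScope(0, 50, ["x", "y"], [inner_a, inner_b])

at_15 = list(names_in_scope(root, 15))   # inside inner_a only
at_34 = list(names_in_scope(root, 34))   # inside inner_b and its child
at_25 = list(names_in_scope(root, 25))   # between the two child scopes
```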

The other feature of get_locals is deciding which locals to include. As you might expect, IronPython emits some local variables that are for internal runtime use. These variables get prefixed with a dollar sign. The dollar sign is not a legal identifier character in C# or Python, but IL has no problem with it. If you pass in False for show_hidden (or use the default value), then get_locals skips over any local variables whose name starts with the dollar sign.

Even if you pass in True for show_hidden, get_locals still skips over any variable named “$site”. $site variables are dynamic call site caches, a DLR feature that is used to efficiently dispatch dynamic calls by caching the results of previous invocations. Martin Maly’s blog has more details on these caches. As they are part of method dispatch, I never want to show them to the ipydbg user, so they get skipped regardless of the value of show_hidden.
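The two naming rules can be exercised on their own with plain strings. In this sketch `visible_local_names` is a hypothetical helper that applies the same tests as get_locals, and "$lineNo" is a made-up example of an internal variable name, not something IronPython is known to emit.

```python
def visible_local_names(names, show_hidden=False):
    """Apply get_locals' naming rules to a plain list of local variable names."""
    for name in names:
        if name == "$site":
            continue                      # call site caches are never shown
        if not name.startswith("$") or show_hidden:
            yield name

names = ["x", "$site", "$lineNo", "y", "$site"]
default_view = list(visible_local_names(names))
hidden_view = list(visible_local_names(names, show_hidden=True))
```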

Now that I can get the local variables for a given frame, I need to convert those variables to something I can print on the screen. That turns out to be more complicated than you might expect, so it’ll have to wait for the next post (which may be a while, given that PyCon is this weekend). In the meantime, you can get the latest version of ipydbg from GitHub.

Posted By Harry Pierson at 3:27 PM Pacific Standard Time

Tuesday, March 24, 2009

AgDLR 0.5

I mentioned yesterday that it looked like a new release of AgDLR was imminent and sure enough here it is. There are some really cool new features including Silverlight 3 Transparent Platform Extension support, In-Browser REPL and In-Browser testing of Silverlight apps. As with IronRuby 0.3, Jimmy has a summary of the new AgDLR release.

One feature of the new release I did want to highlight was XapHttpHandler because I’m the one who wrote it! :)

The Silverlight versions of IronPython and IronRuby ship with a tool called Chiron that provides a REPL-esque experience for building dynamic language Silverlight apps. John Lam had a good write-up on Chiron when we first released it last year, but basically the idea is that Chiron is a local web server that will auto-generate a Silverlight XAP from a directory of Python and/or Ruby files on demand. For example, if your HTML page requests a Silverlight app named app.xap, Chiron automatically creates the app.xap file from the files in the app directory. This lets you simply edit your Python and/or Ruby files directly then refresh your browser to get the new version without needing an explicit build step.
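A XAP file is just a zip archive, so the on-demand packaging step can be sketched with Python's zipfile module. This is not Chiron's code and it ignores everything else a real Silverlight package needs; `build_xap` is a hypothetical helper that only shows the "zip the app directory in memory for each request" idea.

```python
import io
import os
import tempfile
import zipfile

def build_xap(app_dir):
    """Zip every file under app_dir into an in-memory archive and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as xap:
        for root, _dirs, files in os.walk(app_dir):
            for name in files:
                full = os.path.join(root, name)
                # Store paths relative to the app directory.
                xap.write(full, os.path.relpath(full, app_dir))
    return buf.getvalue()

# Demo: a throwaway "app" directory containing a single Python file.
app_dir = tempfile.mkdtemp()
with open(os.path.join(app_dir, "app.py"), "w") as f:
    f.write("print('hello from Silverlight')\n")

xap_bytes = build_xap(app_dir)
entries = zipfile.ZipFile(io.BytesIO(xap_bytes)).namelist()
```

Because nothing is written to disk, every call rebuilds the archive from whatever is in the directory at that moment, which is what makes the edit-and-refresh workflow possible.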

The problem is that, unlike IIS and the ASP.NET Development Server, Chiron doesn’t integrate with ASP.NET. So it’s fine for building Silverlight apps that stand alone or talk to 3rd party services. But if you want to build a Silverlight app that talks back to its ASP.NET host, you’re out of luck. That’s where XapHttpHandler comes in. XapHttpHandler does the same exact on-demand XAP packaging for dynamic language Silverlight applications that Chiron does, but it’s implemented as an IHttpHandler so it plugs into the standard ASP.NET pipeline. All you have to do is put Chiron.exe in your web application’s bin directory and add XapHttpHandler to your web.config like so:

<configuration>
  <!--remaining web.config content omitted for clarity-->
  <system.web>
    <httpHandlers>
      <add verb="*" path="*.xap" validate="false" type="Chiron.XapHttpHandler,Chiron"/>
    </httpHandlers>
  </system.web>
</configuration>

The new AgDLR drop includes a sample website that shows XapHttpHandler in action.

Quick note of caution: by design, XapHttpHandler does not cache the XAP file - it’s generated anew on every request. So I would highly recommend against using XapHttpHandler on a production web server. You’re much better off using Chiron to build a physical XAP file that you then deploy to your production web server.

Posted By Harry Pierson at 10:25 PM Pacific Standard Time

Monday, March 23, 2009

IronRuby 0.3

Last week was Mix09, Microsoft’s annual conference for web development and design. There were some big announcements – Silverlight 3 Beta, ASP.NET MVC RTM, TDS support for SQL Data Services, new drops of Azure and LiveFX SDKs and I’m sure a bunch of other things that I’ve forgotten.

Of course, by far the most important thing that shipped at Mix09 was IronRuby 0.3.

Jimmy has the details on the new release and John Lam did a talk at Mix on dynamic languages in Silverlight. I haven’t seen an announcement, but it also looks like there’s a new version of AgDLR - aka the Silverlight Dynamic Languages SDK – as well.

Posted By Harry Pierson at 9:42 AM Pacific Standard Time

Saturday, March 21, 2009

Writing an IronPython Debugger: A Little Hack…err…Cleanup

Yesterday, I pushed out two commits to ipydbg. The first was simple: I removed all of the embedded ConsoleColorMgr code in favor of the separate consolecolor.py module I blogged about Thursday. The second commit…well, let’s just say it’s not quite so simple.

Last weekend, I was experimenting with breakpoints when I discovered that the MoveNext method of BreakpointEnumerator was throwing a NotImplementedException. Up to that point, I hadn’t modified any of the MDbg C# source code except to merge the corapi and raw assemblies into a single assembly. But since I had to fix BreakpointEnumerator, I figured I should make some improvements to the C# code as well. For example, I added helper functions to easily retrieve the metadata for a class or function.

In my latest commit, I’ve added a SymbolReader property to CorModule. Previously, I managed the mapping from CorModules to SymbolReaders in my IPyDebugProcess class via the symbol_readers field. However, since mapping CorModules to SymbolReaders is something pretty much any debugger app would have to do, it made more sense to have that be a part of CorModule directly. So now, you can set and retrieve the SymbolReader directly on the module. Furthermore, I moved the logic to retrieve a SymbolReader from the IStream provided in the OnUpdateModuleSymbols event into the CorModule class as well.

I wouldn’t have bothered to blog this change at all, except that if you look at how the SymbolReader property is implemented under the hood, it’s not what you would expect. Instead of having SymbolReader as an instance variable on CorModule, CorModule has a static dictionary mapping CorModules to SymbolReaders. The instance SymbolReader property simply provides access to the underlying static dictionary.

//code taken from CorModule class in CorModule.cs
private static Dictionary<CorModule, ISymbolReader> _symbolsMap =   
                             new Dictionary<CorModule, ISymbolReader>();   

public ISymbolReader SymbolReader    
{   
    get   
    {   
        if (_symbolsMap.ContainsKey(this))   
            return _symbolsMap[this];   
        else   
            return null;   
    }   
    set   
    {   
        _symbolsMap[this] = value;   
    }   
}

Now obviously, this isn’t the way you typically implement properties. The problem is that there isn’t a 1-to-1 mapping between the underlying debugger COM object instances and the managed object instances that wrap them. For example, if you look at the CorClass::Module property, it constructs a new managed wrapper for the COM interface it gets back from ICorDebugClass.GetModule. That means that I can’t store the symbol reader as an instance field in the managed wrapper since I probably will never see a given managed wrapper module instance ever again.

All of the debugger API wrapper classes including CorModule inherit from a class named WrapperBase which overrides Equals and GetHashCode. The overridden implementations defer to the wrapped COM interface, which means that two separate managed wrapper instances of the same COM interface will have the same hash code and will evaluate as equal. The upshot is that object uniqueness is determined by the wrapped COM object rather than the managed object instance itself.

Using a static dictionary to store a module instance property provides the necessary “it doesn’t matter what managed object instance you use as long as they all wrap the same COM object underneath” semantics. If I create multiple instances of CorModule that all wrap the same underlying COM interface pointer, they’ll all share the same SymbolReader instance from the dictionary.
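The same trick translates directly to Python, which may make the semantics easier to see. In this sketch `ModuleWrapper` is a made-up stand-in for CorModule, an integer handle plays the role of the wrapped COM interface, and `__eq__`/`__hash__` do the job that Equals and GetHashCode do in WrapperBase.

```python
class ModuleWrapper(object):
    """Stand-in for CorModule: a throwaway wrapper around an underlying handle."""

    _symbols_map = {}   # shared by all wrappers, keyed by wrapper equality

    def __init__(self, handle):
        self.handle = handle   # plays the role of the wrapped COM interface

    # Equality and hashing defer to the wrapped handle.
    def __eq__(self, other):
        return isinstance(other, ModuleWrapper) and self.handle == other.handle

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash(self.handle)

    @property
    def symbol_reader(self):
        return ModuleWrapper._symbols_map.get(self)

    @symbol_reader.setter
    def symbol_reader(self, value):
        ModuleWrapper._symbols_map[self] = value

first = ModuleWrapper(0x1234)
first.symbol_reader = "reader for module 0x1234"

second = ModuleWrapper(0x1234)   # a different wrapper around the same handle
other = ModuleWrapper(0x5678)    # a wrapper around a different handle
```

Here `second` sees the reader that was set through `first`, even though they are distinct Python objects, while `other` has none.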

Yeah, it feels kinda hacky, but it works.

Posted By Harry Pierson at 3:27 PM Pacific Standard Time

Thursday, March 19, 2009

IronPython ConsoleColorMgr

I really liked the ConsoleColorMgr class from my last ipydbg post so I took a few minutes to yank it out into its own separate module. I also took the opportunity to make a few improvements.

First off, I added support for background colors as well as foreground colors. Furthermore, both colors default to “None” which ConsoleColorMgr takes to mean leave that color unchanged.

from System import Console as _Console

class ConsoleColorMgr(object):
  def __init__(self, foreground = None, background = None):
    self.foreground = foreground
    self.background = background

  def __enter__(self):  
    self._tempFG = _Console.ForegroundColor  
    self._tempBG = _Console.BackgroundColor 
    if self.foreground is not None: _Console.ForegroundColor = self.foreground
    if self.background is not None: _Console.BackgroundColor = self.background
      
  def __exit__(self, t, v, tr):  
    _Console.ForegroundColor = self._tempFG 
    _Console.BackgroundColor = self._tempBG

The other change I made was to build a set of default ConsoleColorMgr instances in the consolecolor module, one for each of the values in ConsoleColor.

import sys  
from System import ConsoleColor, Enum
  
_curmodule = sys.modules[__name__]

for n in Enum.GetNames(ConsoleColor):
    setattr(_curmodule, n, ConsoleColorMgr(Enum.Parse(ConsoleColor, n)))

Note that for this set of default ConsoleColorMgr instances, I’m only setting the foreground color. If you want to set the background color, you have to create your own ConsoleColorMgr instances. This allows me to write the following:

from __future__ import with_statement
from System import ConsoleColor
import consolecolor

with consolecolor.Red:    
    print "Open the pod bay doors, HAL"   
with consolecolor.ConsoleColorMgr(ConsoleColor.Black, ConsoleColor.Red): 
    print "I'm sorry Dave, I'm afraid I can't do that." 

If you want it, I’ve put consolecolor.py up on my skydrive or it’s available as part of my devhawk_ipy project on GitHub.

Update - Christopher Bermingham pointed out that my sample snippet at the end doesn’t work unless you add “from __future__ import with_statement” to the top of your python file. I updated my code snippet to include this. Thanks Christopher!

Posted By Harry Pierson at 3:43 PM Pacific Standard Time

Writing an IronPython Debugger: Colorful Console

Now that I’ve added the current source code line to the console output, I wanted to start using color in order to make it clearer to understand the various pieces of data that get output. Now, the various event handler messages get output in dark grey while the current line of source is in yellow. Here’s what it looks like on my machine (note, the top line with the green [11] is PowerShell and ipy2 is a PowerShell alias to ipy.exe v2.0.1)

ipydbg on the console

Writing color to the Windows console is a hassle because of the stateful API it uses. The problem is that I always want to return to the default color after I’ve written out a line of colored text. I wish there was an overload of Console.Write and WriteLine that took the foreground and background colors as arguments.

Of course, I could easily implement my own write and writeline methods that took color parameters. However, I was loath to do that as Python’s print statement is so convenient. So instead, I built a console color context manager. I got the idea from Luis Fallas’ XmlWriter context manager.

from __future__ import with_statement
from System import Console, ConsoleColor

class ConsoleColorMgr(object):
  def __init__(self, color): 
    self.color = color 

  def __enter__(self): 
    self.temp = Console.ForegroundColor 
    Console.ForegroundColor = self.color 
     
  def __exit__(self, t, v, tr): 
    Console.ForegroundColor = self.temp 

CCDarkGray = ConsoleColorMgr(ConsoleColor.DarkGray)
CCGray     = ConsoleColorMgr(ConsoleColor.Gray)
CCYellow   = ConsoleColorMgr(ConsoleColor.Yellow)

def OnCreateAppDomain(self, sender,e): 
    with CCDarkGray: 
      print "OnCreateAppDomain", e.AppDomain.Name 
    e.AppDomain.Attach()

Python’s with statement is similar to C#’s using statement. However, unlike IDisposable objects, Python context managers support both an enter and an exit method. This means I don’t have to construct a new object every time I want a context (in this case, the console colors) managed. So far, I’ve got three console color context managers defined – Gray, DarkGray and Yellow. I’m thinking that ConsoleColorMgr is a candidate for my assorted module collection at some point.
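The enter/exit behavior can be tried without a Windows console. The sketch below swaps System.Console for a made-up `FakeConsole` class and uses a simplified `ColorMgr`, so it is an illustration of the pattern rather than the ipydbg code. It shows one manager instance being reused across two with blocks, and the color being restored even when the block raises.

```python
class FakeConsole(object):
    """Stand-in for System.Console with a single stateful color setting."""
    foreground = "Gray"

class ColorMgr(object):
    def __init__(self, color):
        self.color = color

    def __enter__(self):
        self.saved = FakeConsole.foreground
        FakeConsole.foreground = self.color

    def __exit__(self, exc_type, exc_value, traceback):
        FakeConsole.foreground = self.saved   # runs even if the block raised

yellow = ColorMgr("Yellow")   # constructed once, reused below
seen = []

with yellow:
    seen.append(FakeConsole.foreground)   # color inside the block
seen.append(FakeConsole.foreground)       # color after the block

try:
    with yellow:
        seen.append(FakeConsole.foreground)
        raise ValueError("something went wrong mid-print")
except ValueError:
    pass
seen.append(FakeConsole.foreground)       # restored despite the exception
```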

Now that I can print in color, I wanted to modify my line printer to use color. Usually, the current sequence point corresponds to an entire line of python source. But as we see below, sometimes only part of a given line of source text is associated with a given sequence point.

image

The other issue I ran into is that there’s always a sequence point at the very end of a function. Unlike the break at the start of the function I wrote about in my last post, this one I didn’t want to automatically step over. This is the last breakpoint for a given scope, so I should give the user one last chance to inspect the scope (once I add the ability to do that, at any rate) before we step out of it. However, I wanted a way of showing that we’re about to step out in the source code line view. I decided on writing a series of carets ^^^ to indicate that we’re at the end of a function.

image

As you can see in the dark grey line in the screenshot above, the current sequence point starts and ends at line 4 column 23. Column 23 is beyond the end of line 4, so that’s what I look for in order to draw the three carets. Here’s the final version of _print_source_line:

def _print_source_line(self, sp, lines):
  line = lines[sp.start_line-1]
  with CCGray:
    Console.Write("%d: " % sp.start_line)
    Console.Write(line.Substring(0, sp.start_col-1))
    with CCYellow:
      if sp.start_col > len(line):
        Console.Write(" ^^^")
      else:
        Console.Write(line.Substring(sp.start_col-1,
                                     sp.end_col - sp.start_col))
    Console.WriteLine(line.Substring(sp.end_col-1))
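The column arithmetic in _print_source_line can be checked in isolation. The sketch below is a pure-Python approximation: `split_source_line` is a hypothetical helper that returns the three segments instead of writing them to the console, and it uses Python slices rather than String.Substring (slices silently clamp out-of-range indexes, which Substring does not).

```python
def split_source_line(line, start_col, end_col):
    """Split a line into (before, highlighted, after) using 1-based columns."""
    before = line[:start_col - 1]
    if start_col > len(line):
        # Sequence point sits past the end of the line: end of function.
        return before, " ^^^", ""
    return before, line[start_col - 1:end_col - 1], line[end_col - 1:]

line = "    return x + y"                 # 16 characters, 4 of them indent
whole = split_source_line(line, 5, 17)    # the statement without its indent
part = split_source_line(line, 12, 17)    # only the expression
at_end = split_source_line(line, 17, 17)  # one column past the last character
```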

So colorizing the current line of source code turned out to be a little harder than I had expected. But hey, I got a start of a reusable module out of it. That’s pretty cool. Anyway, the latest bits are, as always, up on GitHub.

Posted By Harry Pierson at 2:48 PM Pacific Standard Time

Writing an IronPython Debugger: Showing Source Code

It’s been almost a week since my last ipydbg post. I’m not done, I just needed to catch my breath for a few days and get some other work done. Contrary to popular belief, my day job revolves around more than just ipydbg! :)

Actually, I’ve made ten commits since my last post, but they’ve been mostly minor changes. For example, I was hacking around with breakpoints and restored a bunch of commented out code in BreakpointEnumerator. Since I was changing the original C# CorDebug wrapper source, I decided to add a few helper functions to return metadata for functions and classes as well as cleaning up some C# filenames. On the Python side, I added an active_appdomain field to IPyDebugProcess to go along with active_thread.

Today, I added what started as a fairly minor feature – showing the current line of source code at the start of the input loop. The initial code for this was cake, simply getting the sequence point for the current location and mapping that to a source file. In order to avoid hitting the file system over and over, I cache source files the first time they are accessed.

def _get_file(self,filename):
    filename = Path.GetFileName(filename)
    if not filename in self.source_files:
      self.source_files[filename] = File.ReadAllLines(filename)
    return self.source_files[filename] 

def _input(self):
    offset, sp = self._get_location(self.active_thread.ActiveFrame)
    lines = self._get_file(sp.doc.URL)
    print "%d:" % sp.start_line, lines[sp.start_line-1]
    #input loop omitted for clarity

However, when I did this, I discovered a slight issue. When you step into a Python function, the CLR debugger breaks at the very beginning of the function being stepped into. In C#, the function start is mapped to the opening curly brace of the function. IronPython, on the other hand, doesn’t map the start of the function to anything since there’s a bunch of infrastructure code at the start of every function that has no correlation to the python source. This means _get_location returns a null sequence point when I first step into a function and thus I wouldn’t be able to show any source code.

I could make the argument that start of the function should be mapped to the colon that starts the function block. However, I’m not in a position to make changes to how the shipping version of IronPython emits debug symbols. So instead, I decided to insert an automatic step whenever I step into a function by modifying OnStepComplete:

def OnStepComplete(self, sender,e):
    offset, sp = self._get_location(e.Thread.ActiveFrame)
    print "OnStepComplete Reason:", e.StepReason, \
           "Location:", sp if sp != None else "offset %d" % offset
    if e.StepReason == CorDebugStepReason.STEP_CALL:
      self._do_step(e.Thread, False)
    else:
      self._do_break_event(e)

I have this nagging feeling that a simple step won’t suffice and I’ll need to add logic to ensure that I’m only auto-stepping when the start of the function doesn’t have a matching sequence point. But I have tested this with a few different python scripts and it appears to work fine. If I need something more sophisticated, I can always add it later. BTW, notice I modified the signature of _do_step so that it takes the thread as an argument rather than picking it up as an IPyDebugProcess field.
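
If that nagging feeling turns out to be right, the extra check might look something like the sketch below. It is plain Python with string stand-ins for the CorDebugStepReason values, and should_auto_step is a hypothetical helper that ipydbg does not currently have.

```python
STEP_CALL = "STEP_CALL"      # stand-in for CorDebugStepReason.STEP_CALL
STEP_NORMAL = "STEP_NORMAL"  # stand-in for any other step reason


def should_auto_step(step_reason, sequence_point):
    """Return True when the debugger should silently step again."""
    # Only auto-step on function entry, and only when the landing spot
    # has no source mapping that could be shown to the user.
    return step_reason == STEP_CALL and sequence_point is None


print(should_auto_step(STEP_CALL, None))
print(should_auto_step(STEP_CALL, "simpletest.py:3"))
```

OnStepComplete would then call _do_step only when should_auto_step returns True, and fall through to _do_break_event otherwise.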

As usual, latest ipydbg (including new compiled version of CorDebug.dll) is available at GitHub.

Posted By Harry Pierson at 1:58 PM Pacific Standard Time

Friday, March 13, 2009

Writing an IronPython Debugger: Debugging Just My Code

As I wrote last time, in order to make debug stepping actually useful in ipydbg I need to avoid stepping into frames that are part of the IronPython infrastructure. I did something similar when I hid infrastructure frames in the stack trace. Originally, I had planned to automatically step again if we ended up on a frame that didn’t correspond to a python file. However, Mike Stall showed me a much cleaner and better performing solution: Just My Code. As I mentioned at the start of this series, support for JMC is one of the main reasons I wanted to build my own debugger rather than use MDbg.

Enabling JMC in the stepper object is trivial:

def create_stepper(thread, JMC = True):
  stepper = thread.ActiveFrame.CreateStepper()
  stepper.SetUnmappedStopMask(CorDebugUnmappedStop.STOP_NONE)
  stepper.SetJmcStatus(JMC) 
  return stepper

If I make that single change and run ipydbg, any step effectively turns into a full continue since none of the code has been marked as “My Code” yet. As you see, the tricky part of JMC isn’t enabling it on the stepper, it’s “painting” the parts of the code where you want JMC stepping to work. You can set JMC status at the module, class or the method level. In the case of ipydbg, it’s easiest to work at the class level:

infrastructure_methods =  ['TryGetExtraValue',     
    'TrySetExtraValue',     
    '.cctor',     
    '.ctor',     
    'CustomSymbolDictionary.GetExtraKeys',     
    'IModuleDictionaryInitialization.InitializeModuleDictionary']    

def OnClassLoad(self, sender, e):
    cmi = CorMetadataImport(e.Class.Module)
    mt = cmi.GetType(e.Class.Token)
    print "OnClassLoad", mt.Name

    if not e.Class.Module.IsDynamic:
      e.Class.JMCStatus = False
    elif mt.Name.startswith('IronPython.NewTypes'):
      e.Class.JMCStatus = False
    else:
      e.Class.JMCStatus = True
      for mmi in mt.GetMethods():
        if mmi.Name in infrastructure_methods:
          f = e.Class.Module.GetFunctionFromToken(mmi.MetadataToken)
          f.JMCStatus = False

OnClassLoad is where the action is. This event handler is responsible for enabling JMC for all class methods that map to python code. To understand how the logic in OnClassLoad works, you need to understand a little about the .NET types and code that IronPython generates. Note, the following description is for the IronPython 2.0 branch. Code generation evolves from release to release and I know for a fact there are changes in the upcoming 2.6 version. I assume that I’ll eventually have to sniff the IronPython version in order to set JMC correctly.

Today, IronPython generates all code into dynamic modules and methods. Since I want to limit stepping to python code only, I automatically disable JMC for non-dynamic modules. I can imagine a scenario where I want to step into non-dynamically generated code, but I think the best way to handle that would be to disable JMC at the stepper rather than widening the amount of code marked as JMC enabled.

For every module that gets loaded, IronPython generates a type. At a minimum you’re going to load two modules: site.py and whatever python script you ran. If you have the python standard library installed, site.py loads a bunch of other modules as well. Each of these module types has a bunch of standard methods that always get generated. For example, the global scope code in the module is placed in a static method on the module type called Initialize. Any python functions you define get generated as static methods with mangled names on the module type [1]. All these methods have corresponding python code and should be JMC enabled. The other standard methods on a module type should not be JMC enabled. So in my debugger, I mark the class as JMC enabled but then iterate over the list of methods and mark any in the list of standard methods (except for Initialize) as JMC disabled.

Of course, you can also create classes in Python. As you might expect, classes in Python are generated as .NET types. However, the semantics of Python classes are very different than .NET types. For example, you can change the inheritance hierarchy of python classes at runtime. That’s obviously not allowed for .NET types. So the .NET types we generate have all the logic to implement Python class semantics. As it turns out, these .NET types *only* have the logic to implement Python class semantics, which is to say they have *none* of the Python class method code. This makes sense when you think about it – since Python can add and remove methods from a class at runtime, IronPython can’t put the method code in the .NET type itself. Instead, Python class methods are generated as static methods on the module type, just like top-level functions are. Since Python class types only contain Python class semantics logic, we never want to enable JMC for Python class types. Python class types get generated in the IronPython.NewTypes namespace, so it’s fairly easy to check the class name in OnClassLoad and automatically disable JMC for any classes in that namespace.
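
To make the three painting rules easier to see in one place, here is a sketch that pulls the decision out of OnClassLoad into a pure function you can run without a debugger. The name plan_jmc, its arguments and the sample type and method names in the usage lines are all hypothetical; only the rules themselves come from the handler above.

```python
INFRASTRUCTURE_METHODS = frozenset([
    'TryGetExtraValue',
    'TrySetExtraValue',
    '.cctor',
    '.ctor',
    'CustomSymbolDictionary.GetExtraKeys',
    'IModuleDictionaryInitialization.InitializeModuleDictionary',
])


def plan_jmc(is_dynamic_module, type_name, method_names):
    """Return (class_is_my_code, methods_to_disable) for a loaded class."""
    if not is_dynamic_module:
        # Rule 1: anything outside a dynamic module is not python code.
        return False, []
    if type_name.startswith('IronPython.NewTypes'):
        # Rule 2: generated Python class types hold semantics only.
        return False, []
    # Rule 3: module types are "my code", minus the standard plumbing.
    disabled = [m for m in method_names if m in INFRASTRUCTURE_METHODS]
    return True, disabled


print(plan_jmc(True, 'S$2', ['Initialize', '.cctor', 'foo$1']))
print(plan_jmc(True, 'IronPython.NewTypes.Foo_1', ['bar']))
print(plan_jmc(False, 'System.String', ['.ctor']))
```

Keeping the decision separate from the event handler would also give the eventual IronPython version sniffing one obvious place to live.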

Adding JMC support makes ipydbg significantly more usable. It’s almost like a real tool now, isn’t it? Latest bits are up on GitHub.


[1] FYI, IronPython generates python functions as dynamic methods in release mode and static module class methods in debug mode since you can’t step into dynamic methods. The description above is specific to debug mode since ipydbg exclusively runs in debug mode.

Posted By Harry Pierson at 3:43 PM Pacific Standard Time

Writing an IronPython Debugger: Stepping Thru Code

So far, I’ve written seven posts about my IronPython debugger, but frankly it isn’t very functional yet. It runs, breaks on the first line and can show a stack trace. Not exactly Jolt award material. In this post, I’m going to add one of the core functions of any debugger: stepping. Where previously I’ve written a bunch of code but had little to show in terms of features, now I’m getting three new features (basic step, step in and step out) at once!

def _input(self):
  #remaining _input code omitted for clarity
  elif k.Key == ConsoleKey.S:
      print "\nStepping"
      self._do_step(False)
      return
  elif k.Key == ConsoleKey.I:
      print "\nStepping In"
      self._do_step(True)
      return                
  elif k.Key == ConsoleKey.O:
      print "\nStepping Out"
      stepper = create_stepper(self.active_thread)
      stepper.StepOut()

def _do_step(self, step_in):
  stepper = create_stepper(self.active_thread)
  mod = self.active_thread.ActiveFrame.Function.Module
  if mod not in self.symbol_readers:
      stepper.Step(step_in)
  else:
    range = get_step_ranges(self.active_thread, self.symbol_readers[mod])
    stepper.StepRange(step_in, range)

Here you can see the _input clauses for step, step in and step out. Of the three, step out is the simplest to implement: create the stepper object and call StepOut. For step and step in, I could simply call Step (the boolean argument indicates if you want to step into or over functions) but that only steps a single IL statement. The vast majority of the time there are multiple IL instructions for every line of source code, so IL statement stepping is very tedious. As we learned when setting a breakpoint, debug symbols contain sequence points that map between source and IL locations. If they’re available, I use the sequence points to determine the range of IL statements to step over so that I can step single source statements instead.

The stepping code above depends on three helper functions defined at global scope.

def create_stepper(thread):
  stepper = thread.ActiveFrame.CreateStepper()
  stepper.SetUnmappedStopMask(CorDebugUnmappedStop.STOP_NONE)
  return stepper 
  
def create_step_range(start, end):
  range = Array.CreateInstance(COR_DEBUG_STEP_RANGE, 1)
  range[0] = COR_DEBUG_STEP_RANGE(startOffset = UInt32(start),
                                  endOffset = UInt32(end))
  return range
  
def get_step_ranges(thread, reader):
    frame = thread.ActiveFrame
    offset, mapResult = frame.GetIP()
    method = reader.GetMethod(SymbolToken(frame.FunctionToken))
    for sp in get_sequence_points(method):
        if sp.offset > offset:
            return create_step_range(offset, sp.offset)
    return create_step_range(offset, frame.Function.ILCode.Size)          

The first function, create_stepper, simply constructs and configures the stepper object. The call to SetUnmappedStopMask tells the debugger not to stop if it encounters code that can’t be mapped to IL. If you need to debug at that level, ipydbg is *not* for you.

Next is create_step_range, which exists purely for .NET interop purposes. There are three interop warts hidden in this function. First is creating a .NET array of COR_DEBUG_STEP_RANGE structs. Every time I write Array code like this, I wish for a CreateFromCollection static method on Array. However, in this case it isn’t that big a deal since it’s a one element array. Second wart is having to set the values of COR_DEBUG_STEP_RANGE via constructor keyword arguments. It turns out that IronPython disallows direct updates to value type fields (read this for the reason why). Instead, I pass the field values into the constructor as keyword arguments. Finally, you have to explicitly convert the start and end offsets to an unsigned int in order to set the offset fields in the COR_DEBUG_STEP_RANGE struct constructor.

Finally is get_step_ranges, which iterates thru the list of sequence points in the current method looking for the one with the smallest offset that is larger than the current offset position. If it can’t find a matching sequence point, it sets the range to the end of the current function. The start range offset is always the current offset. I did make a significant change to get_sequence_points – it no longer yields sequence points that have a start line of 0xfeefee. By convention, that indicates a sequence point to be skipped. Originally, the logic to ignore 0xfeefee sequence points was in get_location. But when I originally wrote get_step_ranges, it had essentially the same sequence point skipping logic, so I moved it to get_sequence_points instead.
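
The range calculation is simple enough to try out without the debugger API. The sketch below is plain Python over (offset, start_line) tuples; next_step_range and the sample offsets are made up for illustration. Unlike get_step_ranges above, which returns the first larger offset it sees, this version takes the minimum so it doesn’t depend on the sequence points arriving in offset order.

```python
HIDDEN_LINE = 0xfeefee  # start line that marks a sequence point to skip


def next_step_range(sequence_points, current_offset, code_size):
    """Return the (start, end) IL range covering the current statement.

    sequence_points is an iterable of (il_offset, start_line) tuples.
    """
    later = [offset for offset, line in sequence_points
             if line != HIDDEN_LINE and offset > current_offset]
    # No later sequence point: step to the end of the function.
    end = min(later) if later else code_size
    return current_offset, end


points = [(0, HIDDEN_LINE), (12, 1), (30, 2), (41, HIDDEN_LINE), (55, 3)]
print(next_step_range(points, 12, 80))
print(next_step_range(points, 30, 80))
print(next_step_range(points, 55, 80))
```

Note how the hidden point at offset 41 is ignored, so stepping from line 2 runs all the way to the sequence point for line 3.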

Technically, I’ve built three new features but the reality is that if you end up in IronPython infrastructure code it’s really hard to find your way back to python code. Step in is particularly useless right now. Luckily, the .NET debugger API supports a feature called “Just My Code” that will make stepping much more useful. In the meantime, the latest version of ipydbg is up on GitHub as usual.

Posted By Harry Pierson at 9:31 AM Pacific Standard Time

Thursday, March 12, 2009

VB Dev Lead Position Open

In case you’re job hunting, the VB team has a position open for a dev lead:

The Visual Basic team has a long history of delivering great value to our customers, and we are continuing that in the Dev10 release of Visual Studio. We’re looking for a Development Lead to help guide these efforts as well as shape future versions of the compiler.

The Visual Studio Languages group (VSL) develops VB, C#, F#, IronPython and IronRuby. As a member of this product unit, you’ll have the opportunity to work with others developing compilers and IDEs targeting the .NET runtime. You’ll benefit from their experience and contribute best practices and methodologies of your own. In VSL, developers work closely with their QA team, and we are committed to delivering the best value for our customers at very high quality.

As a Development Lead on the Visual Basic compiler, you’ll be the hand at the tiller of VB.NET compiler development. Specifically, you will:

  • Manage the day-to-day duties of the compiler and runtime development team, ensuring on-schedule delivery of high quality components.
  • Help chart the direction the compiler team takes by prioritizing efforts in coordination with your counterparts in QA and PM.
  • Contribute to the design of the Visual Basic programming language.
  • Mentor your team of developers to continue their career growth.
  • Help shape the engineering environment and procedures in Visual Studio Languages.
  • Work closely with the IDE team to help them provide a top notch editing and debugging experience.

To be successful, you’ll need the following:

  • A demonstrated aptitude for managing a team of high-caliber developers.
  • Excellent communication, collaboration and negotiation skills and the ability to drive open issues to closure.
  • Strong architectural sense and a working knowledge of the fundamentals of compiler design.
  • Passion for delivering customer solutions and quality software in general.
  • Working knowledge of the managed runtime environment is a strong plus.
Posted By Harry Pierson at 9:37 AM Pacific Standard Time

Wednesday, March 11, 2009

Writing an IronPython Debugger: Refactoring

When we last left ipydbg, it was up to about 200 lines of code. Not bad in terms of overall length, but I started to detect some code smell. I was relying pretty heavily on global variables and the structure of my code made it difficult to control how the debugger was run. I wanted to change ipydbg so it would automatically spin up an MTA thread if I forgot to add the –X:MTA command line parameter. But since my debugger and process objects were global, they’d get created on the main thread of ipydbg, regardless of whether it was STA or MTA. So for this “release” (I’d say I’m almost to version 0.0.0.1), I decided to focus on engineering and refactoring rather than new features.

The big new addition is the IPyDebugProcess class, which is clearly the workhorse of the application. All of the previously global variables are now class instance variables on IPyDebugProcess. Input and run along with all the event handlers as well as do_break_event and get_location are now class methods, as they need to access instance variables (setting the break event, accessing the symbol reader dictionary, etc.). Functions that didn’t need to access instance variables (get_sequence_points, create_breakpoint, get_dynamic_frames and get_method_info_for_frame) I left as top-level functions. If they get more complex, I may break them out into their own modules, but for now I left them in ipydbg.py.

The conversion process was fairly trivial. I had to add “self.” in lots of places and change the indentation level all over, but that was pretty much it. Once I finished the conversion, I was able to add the run_debugger function to handle the thread creation, if necessary.

def run_debugger(py_file):
    if Thread.CurrentThread.GetApartmentState() == ApartmentState.STA:
        t = Thread(ParameterizedThreadStart(run_debugger))
        t.SetApartmentState(ApartmentState.MTA)
        t.Start(py_file)
        t.Join()   
    else:
        p = IPyDebugProcess()
        p.run(py_file)

if __name__ == "__main__":        
    run_debugger(sys.argv[1])        

Originally, I tried to put this logic in IPyDebugProcess.run. However, since I’m creating the debugger object in the __init__ function, that meant it would be created on the wrong thread. I could have moved the debugger creation to the run method or moved the thread management code to __init__, but I decided to factor that logic into a separate function completely. Felt cleaner that way.

Posted By Harry Pierson at 7:42 PM Pacific Standard Time

IronPython at PyCon

Here’s a quick quiz. Which of these tasks is harder to accomplish:

  1. Getting $6,000 from a variety of groups within Microsoft to pay for a Gold PyCon 2009 sponsorship.
  2. Sending PSF a check

If you guessed #2, you’d be right. It’s amazing how difficult the seemingly trivial task of “give those PSF folks money” turned out to be. But it’s done now, and you can see the MS logo there on the side of all the PyCon pages.

In addition to the sponsorship, there are some great looking IronPython sessions at PyCon.

Posted By Harry Pierson at 3:22 PM Pacific Standard Time

devhawk_ipy

As I write various python modules (many of which get blogged about), I dump them into a special folder on my machine(s). In my powershell profile script, I set the IRONPYTHONPATH environment variable so that these modules are available to the IPy interpreter (i.e. ipy.exe). To date, I’ve been pretty haphazard about this. But I decided to get a little more structured and put that folder under source control and make it available as “devhawk_ipy”.

So far, I’ve only got three scripts (plus an empty __init__.py) in devhawk_ipy.

Eventually I’ll put my code for working with WPF, LiveFX and Azure into this package, but I’m not happy with where they are yet.

Like ipydbg, devhawk_ipy is up on GitHub. For those non-Git users, I will continue to put these files up on my SkyDrive. I kind of see SkyDrive as a dumping ground for random content while devhawk_ipy is where stuff goes when it’s a little more polished.

Like IronPython, devhawk_ipy is licensed under the MS-PL. If you’re interested in contributing, feel free to fork and send me patches.

Posted By Harry Pierson at 2:44 PM Pacific Standard Time

Monday, March 09, 2009

Writing an IronPython Debugger: Dynamic Stack Trace

Now that I can interact with my debugger, it’s time to add a command. I decided to start with something simple – or at least something I thought would be simple - printing a stack trace.

In the unmanaged debugger API, threads have the concept of both stack chains and stack frames. A stack chain represents a segment of the physical stack. In a typical managed app, you’ll have at least two stack chains: the unmanaged stack chain and the managed stack chain. You can iterate through the stack chains for a given thread via the Chains property. However, ipydbg is a managed only debugger, so I can ignore the unmanaged stack chain. Instead, I just retrieve the current (managed) chain via the thread’s ActiveChain property.

Within a managed stack chain, there is a collection of stack frames. This is the call stack that managed developers are typically used to working with. It turns out that printing a raw stack trace is very easy to do. Here was my first stab at it:

elif k.Key == ConsoleKey.T:
  print "\nManaged Stack Trace"
  for f in active_thread.ActiveChain.Frames:
    offset, sp = get_location(f)
    metadata_import = CorMetadataImport(f.Function.Module)
    method_info = metadata_import.GetMethodInfo(f.FunctionToken)
    print "  ", \
      "%s::%s --" % (method_info.DeclaringType.Name, method_info.Name), \
      sp if sp != None else "(offset %d)" % offset

This elif block is part of the input method I showed last time. It loops thru the frames in the Active Chain of the active thread and prints some data to the console. As I said, pretty easy. Of course, the devil is in the details.

First detail I should call out is that active_thread variable. As per Mike Stall, “there is no notion of "active thread" in the underlying debug APIs. It's purely a construct in a debugger UI to make it easier for end-users.” My console based UI may be rudimentary, but it’s still a UI. Events like OnBreakpoint include the active thread as an event argument, so I stash that away in a variable so it’ll be available to the input loop.

Second detail is the call to get_location. When we last saw get_location, it was returning a formatted string. Since my last post, I’ve refactored the code so it returns the raw location data – a tuple of the raw IP offset and the associated sequence point, if available. I’ve also added a __str__ method to my sequence point object, so when I print it to the console, I get the filename and line nicely formatted.

Finally, there’s all the CorMetadataImport code. In addition to wrapping the unmanaged debugger API, CorDebug also wraps the unmanaged metadata API. This code lets me get a MethodInfo-compatible view of the function metadata for a given stack frame. I use it here to get the type and function name for each frame on the stack.

The end result looks something like this. Note, I’ve replaced “Microsoft.Scripting” with “MS.Scripting” to avoid word wrapping.

OnBreakpoint Initialize Location: simpletest.py:1 (offset: 84)
» t
Managed Stack Trace
   S$2::Initialize simpletest.py:1 (offset: 84)
   MS.Scripting.Runtime.OptimizedScriptCode::InvokeTarget (offset 72)
   MS.Scripting.ScriptCode::Run (offset 0)
   IronPython.Hosting.PythonCommandLine::RunFileWorker (offset 77)
   IronPython.Hosting.PythonCommandLine::RunFile (offset 15)
   MS.Scripting.Hosting.Shell.CommandLine::Run (offset 46)
   IronPython.Hosting.PythonCommandLine::Run (offset 240)
   MS.Scripting.Hosting.Shell.CommandLine::Run (offset 74)
   MS.Scripting.Hosting.Shell.ConsoleHost::RunCommandLine (offset 158)
   MS.Scripting.Hosting.Shell.ConsoleHost::ExecuteInternal (offset 32)
   MS.Scripting.Hosting.Shell.ConsoleHost::Execute (offset 63)
   MS.Scripting.Hosting.Shell.ConsoleHost::Run (offset 390)
   PythonConsoleHost::Main -- (offset 125)

As we can see, we may be on the first line of the python script, but we’ve got a pretty deep stack trace already. Everything but the top-most frame is from the underlying IronPython implementation. Those extra frames obscure the stack frames I actually care about, so it would be nice to hide any stack frames from IronPython or the DLR. It’s easy enough to write a python generator function that filters out frames that are from the DLR or IronPython namespaces. In order to get the type name, we need the method_info like we did above. I’ve factored that code into a separate function in order to avoid code duplication.

def get_method_info_for_frame(frame):
    if frame.FrameType != CorFrameType.ILFrame:
      return None
    metadata_import = CorMetadataImport(frame.Function.Module)
    return metadata_import.GetMethodInfo(frame.FunctionToken)
    
def get_dynamic_frames(chain):
  for f in chain.Frames:
    method_info = get_method_info_for_frame(f)
    if method_info == None:
      continue
    typename = method_info.DeclaringType.Name
    if typename.startswith("Microsoft.Scripting.") \
      or typename.startswith("IronPython.") \
      or typename == "PythonConsoleHost":
        continue
    yield f

You’ll notice I’ve added a guard to get_method_info_for_frame in order to ensure that the frame argument is an IL Frame. There are three types of stack frames in the debugger API: IL, native and internal. Most of the frames we’re dealing with are IL frames, but you do run into the occasional lightweight function (i.e. DynamicMethod) frame when debugging IronPython code. Typically, IronPython generates DynamicMethods for all python code except for a few cases related to .NET interop. However, you can’t debug DynamicMethods, so when you run with –D, we generate normal non-dynamic methods instead. However, even when running with –D, we still use DynamicMethods for call site dispatch. Since they’re an implementation detail, we want to filter those out in get_dynamic_frames too.
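
To see the effect of the filter without a debugger attached, here is a sketch that applies the same prefix rules to (type name, method name) pairs standing in for real frames. The name visible_frames is hypothetical, and a type name of None is my stand-in for a non-IL frame with no method info.

```python
HIDDEN_PREFIXES = ("Microsoft.Scripting.", "IronPython.")
HIDDEN_TYPES = ("PythonConsoleHost",)


def visible_frames(frames):
    """Yield only the frames that belong to user python code.

    frames is an iterable of (type_name, method_name) pairs; type_name is
    None for frames that have no method info (non-IL frames).
    """
    for type_name, method_name in frames:
        if type_name is None:
            continue
        # str.startswith accepts a tuple and matches any of the prefixes.
        if type_name.startswith(HIDDEN_PREFIXES) or type_name in HIDDEN_TYPES:
            continue
        yield type_name, method_name


trace = [
    ("S$2", "Initialize"),
    (None, None),
    ("Microsoft.Scripting.Runtime.OptimizedScriptCode", "InvokeTarget"),
    ("IronPython.Hosting.PythonCommandLine", "RunFile"),
    ("PythonConsoleHost", "Main"),
]
for type_name, method_name in visible_frames(trace):
    print("%s::%s" % (type_name, method_name))
```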

This gives us a much more manageable stack trace:

OnBreakpoint Initialize Location: simpletest.py:1 (offset: 84)
» t
Stack Trace
   S$2::Initialize -- simpletest.py:1 (offset: 84)

As usual, the latest ipydbg source is up on GitHub.

Posted By Harry Pierson at 2:10 PM Pacific Standard Time

Wednesday, March 04, 2009

Writing an IronPython Debugger: Adding Interactivity

Now that ipydbg can set a breakpoint, it’s time to add some interactivity to the app. MDbg supports dozens of commands and currently ipydbg supports none. I’d love for ipydbg to support a wide range of commands like MDbg does, but for now let’s keep it simple and start with two: Continue and Quit. These aren’t very interesting as commands go, but that lets me focus this blog post on adding basic interactivity and future posts on specific commands.

First off, we have to understand how the CorDebug managed API supports interactivity. As we’ve seen, callbacks into the debugger are surfaced as managed events. If we look at the base class for all the debugger event arguments, we see that it exposes a Continue property. If you want the debugger to automatically continue after the event handler finishes running, you set the Continue property to true (which is the default). If you want the debugger to stay paused while you provide the developer a chance to poke around, you set Continue to false. In that case, the debugger stays paused until you call process.Continue explicitly.

Once we set the Continue property to false, we need a mechanism to signal the main thread of execution that it’s time to wake up and ask the user what they want to do next. Of course, that’s what WaitHandle and its descendants are for. In fact, we’re already using an AutoResetEvent in OnProcessExit to signal that the debugged app has exited so we should exit the debugger. However, now we have two different signals that we want to send: exit the debugger or enter the input loop. I decided to differentiate by using two separate AutoResetEvents:

terminate_event = AutoResetEvent(False)
break_event = AutoResetEvent(False)

def OnProcessExit(s,e): 
  print "OnProcessExit" 
  terminate_event.Set() 

def OnBreakpoint(s,e): 
  print "OnBreakpoint", get_location( 
    symbol_readers[e.Thread.ActiveFrame.Function.Module], e.Thread) 
  e.Continue = False 
  break_event.Set() 

#code to create debugger and process omitted for clarity

handles = Array.CreateInstance(WaitHandle, 2)
handles[0] = terminate_event
handles[1] = break_event

while True:
  process.Continue(False)

  i = WaitHandle.WaitAny(handles)
  if i == 0:
    break

  input()

Instead of the single call to process.Continue I had before, I’ve created an infinite “while True” loop that calls Continue, waits for one of the events to signal, then either exits the loop or enters the input loop (via the input function). Since there are two AutoResetEvents, I need to use the WaitAny method to wait for one of them to signal. WaitAny takes an array, which is kind of clunky to use from IronPython since the array has to be strongly typed. It would be much more pythonic if I could call WaitHandle.WaitAny([terminate_event, break_event]). WaitAny then returns an index into the array indicating which one received the signal. If it was the terminate_event that signaled, I exit the loop (and the application). Otherwise, I enter the input loop. Notice, by the way, in OnBreakpoint that I’m both setting Continue to false and signaling the break_event.
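
The shape of that outer loop is easier to play with outside of .NET. The sketch below is plain Python 3 and swaps the two AutoResetEvents and WaitAny for a single queue.Queue, since the standard library has no direct WaitAny equivalent; that is a substitution on my part, not how ipydbg does it. The names debugger_loop and fake_debuggee are made up, and the worker thread stands in for the debugger callbacks.

```python
import queue
import threading

TERMINATE, BREAK = 0, 1  # same meaning as the two handle indexes


def debugger_loop(signals, on_break):
    """Run until TERMINATE arrives; return how many breaks were handled."""
    breaks = 0
    while True:
        # The real loop would call process.Continue(False) here.
        signal = signals.get()  # blocks until a callback posts a signal
        if signal == TERMINATE:
            return breaks
        on_break()              # the input loop would go here
        breaks += 1


signals = queue.Queue()


def fake_debuggee():
    # Stand-in for OnBreakpoint firing twice, then OnProcessExit.
    signals.put(BREAK)
    signals.put(BREAK)
    signals.put(TERMINATE)


worker = threading.Thread(target=fake_debuggee)
worker.start()
handled = debugger_loop(signals, on_break=lambda: None)
worker.join()
print("breaks handled:", handled)
```

One behavioral difference to keep in mind: a queue remembers every signal posted to it, while an AutoResetEvent only remembers that it was set.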

The “input loop” needs to be a loop because the user may want to type in multiple commands before letting the debugged app continue to execute. This means that the input function is implemented as another “while True” loop. When the user chooses a command that implies the process should continue, I simply exit out of the input function and the outer “while True” loop above executes the continue and waits for a signal.

Here’s what the input function looks like right now with our two basic commands:

def input():
  while True:
    Console.Write("» ")
    k = Console.ReadKey()
    
    if k.Key == ConsoleKey.Spacebar:
      Console.WriteLine("\nContinuing")
      return 
    elif k.Key == ConsoleKey.Q:
      Console.WriteLine("\nQuitting")
      process.Stop(0)
      process.Terminate(255)
      return
    else:
      Console.WriteLine("\n Please enter a valid command")

I’ve mapped “q” to quit the debugger and spacebar to continue. Since I’m using Console.ReadKey, you only have to type the key in question – no return needed. For continue, we don’t do anything but exit the input loop by returning. Continue gets called as part of the outer loop and since we haven’t/can’t add additional breakpoints the debugged app will run until it ends. For quit, I call the Terminate method on process, hard coding the return value to 255. However, Terminate implicitly continues the debugged process. Since you can’t continue a running process, the call to Continue in the outer loop throws an exception. I avoid this exception by adding the call to Stop before Terminate. As per the Stop docs, the debugger maintains a “stop counter” and only resumes the debugged process when the counter reaches zero. Calling Stop increases the stop counter by one, calling Terminate decreases it by one, then the outer loop Continue call decreases it to zero and the process continues, terminates and fires the OnProcessExit event handler as usual.
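
The stop counter bookkeeping is easier to follow with a toy model. The sketch below is plain Python and only mirrors the behavior as I’ve described it above; StopCounterModel and quit_sequence are hypothetical names, and the model assumes the breakpoint itself accounts for the first stop. It is not how the CLR implements any of this.

```python
class StopCounterModel(object):
    """Toy model: the debuggee only runs while the counter is zero."""

    def __init__(self):
        self.count = 0

    def stop(self):
        # A breakpoint hit or an explicit Stop call pauses the debuggee.
        self.count += 1

    def resume(self):
        # An explicit Continue, or the implicit one inside Terminate.
        if self.count == 0:
            raise RuntimeError("cannot continue a running process")
        self.count -= 1

    @property
    def running(self):
        return self.count == 0


def quit_sequence(extra_stop):
    model = StopCounterModel()
    model.stop()        # paused at the breakpoint
    if extra_stop:
        model.stop()    # process.Stop(0)
    model.resume()      # process.Terminate(255) continues implicitly
    model.resume()      # process.Continue(False) in the outer loop
    return model.running


print("with Stop:", quit_sequence(extra_stop=True))
try:
    quit_sequence(extra_stop=False)
except RuntimeError as error:
    print("without Stop:", error)
```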

Now that we have a basic interactive loop, I’ll be able to add more interesting commands. At some point I’ll probably need to refactor input a bit, since a huge if/elif/else statement is going to get ugly fast, but I’ll worry about that when it gets out of hand. As usual, the latest ipydbg source is up on GitHub.
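
For what it’s worth, one common way to keep that if/elif/else chain from growing is a dictionary that maps keys to handler functions. The sketch below is plain Python with made-up names (COMMANDS, handle_key, a state dictionary) and uses characters in place of ConsoleKey values; it isn’t what ipydbg does today.

```python
def cmd_continue(state):
    state["log"].append("Continuing")
    return True   # True means: leave the input loop and resume


def cmd_quit(state):
    state["log"].append("Quitting")
    state["quit"] = True
    return True


# Adding a command means adding one entry here, not another elif.
COMMANDS = {
    " ": cmd_continue,
    "q": cmd_quit,
}


def handle_key(key, state):
    """Dispatch one key press; return True if the input loop should exit."""
    handler = COMMANDS.get(key.lower())
    if handler is None:
        state["log"].append("Please enter a valid command")
        return False
    return handler(state)


state = {"log": [], "quit": False}
for key in ["x", " "]:
    if handle_key(key, state):
        break
print(state["log"])
```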

Posted By Harry Pierson at 2:06 PM Pacific Standard Time

Monday, March 02, 2009

Writing an IronPython Debugger: Setting a Breakpoint

Now that we have a debugger process up and running, let’s start adding some actual features. First up, we want to be able to set breakpoints. One of the nice things MDbg does is auto-set a breakpoint on the entrypoint function. For ipydbg, we’re going to auto-set a breakpoint on the first line of the python file being debugged.

In order to set a breakpoint, we need debugger symbols. They allow us to translate between “line one of simpletest.py” and the actual location in the code and back. We’re all used to seeing the PDB files that are produced when we compile a C# assembly. Unsurprisingly, the symbol store binder provides a method to load these PDB files from disk. But where do IronPython debug symbols come from? I know from my extensive reading of the ipy.exe command line parameters that you pass –D to enable application debugging, but since all the IL is being generated in memory, how does the debugger get access to the PDB files?

It turns out the debugger API includes an UpdateModuleSymbols callback method that the runtime uses to notify the debugger when the symbols change. The debugger symbols are provided in an IStream, and then you use the symbol binder to get a symbol reader. The .NET Framework already provides a managed API for reading and writing debug symbols. However, that API doesn’t support loading symbols from a stream, so the MDbg code includes its own wrapper around the symbol binder API to include that functionality. Here’s some code to get the debug symbol reader for an updated module and iterate through the associated files:

sym_binder = SymbolBinder()

def OnUpdateModuleSymbols(s,e):
  print "OnUpdateModuleSymbols"

  # the symbol binder needs the module's metadata plus the symbol stream
  metadata_import = e.Module.GetMetaDataInterface[IMetadataImport]()
  reader = sym_binder.GetReaderFromStream(metadata_import, e.Stream)

  # list the source files covered by this module's symbols
  for doc in reader.GetDocuments():
    print "\t", doc.URL

process.OnUpdateModuleSymbols += OnUpdateModuleSymbols

If we run this version of ipydbg on simpletest.py with the IPy 2.0.1 release and the Python standard library installed, OnUpdateModuleSymbols gets called six times, once for each Python file that gets loaded when simpletest runs (site.py, os.py, ntpath.py, stat.py, UserDict.py and simpletest.py). BTW, I tried running this code on the latest build of IPy (changeset 47624) and I’m getting a COM Interop exception. So for now, stick with 2.0.1.

Now that we can get these dynamically generated debug symbols, we can use them to create a breakpoint on the first line of the script being debugged. Every time OnUpdateModuleSymbols is called, I try to bind the initial breakpoint (unless it’s already been bound, of course) by calling the following create_breakpoint function.

def create_breakpoint(doc, line, module, reader):
  # map the requested line to the method (and debugger function) containing it
  line = doc.FindClosestLine(line)
  method = reader.GetMethodFromDocumentPosition(doc, line, 0)
  function = module.GetFunctionFromToken(method.Token.GetToken())

  # break at the IL offset of the sequence point that matches the line
  for sp in get_sequence_points(method):
    if sp.doc.URL == doc.URL and sp.start_line == line:
      bp = function.ILCode.CreateBreakpoint(sp.offset)
      bp.Activate(True)
      return bp

  # no matching sequence point, so break on the function itself
  bp = function.CreateBreakpoint()
  bp.Activate(True)
  return bp

This code translates a given document/line into a function/offset where we can set a breakpoint. To do this, we use sequence points which, as per Rick Byers, are “used to mark a spot in the IL code that corresponds to a specific location in the original source”. So once we find the function that corresponds to a given line of code, we iterate over the sequence points until we find the one that matches the line we want to break on. If we find a matching sequence point, we set the breakpoint there. If we don’t, we set the breakpoint on the function itself. get_sequence_points is a simple wrapper around ISymbolMethod.GetSequencePoints. The original API is pretty ugly to use (you have to manage six separate arrays of information), so get_sequence_points turns it into a generator function you can iterate over.
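To make the "six arrays into a generator" idea concrete, here is a rough pure-Python sketch. It is not the ipydbg implementation: FakeDocument and FakeSymbolMethod are hypothetical stand-ins for the real symbol store objects, plain lists stand in for the .NET arrays that the real wrapper would have to allocate, and the field names beyond offset, doc and start_line (the ones create_breakpoint uses) are my own guesses.

```python
from collections import namedtuple

# offset, doc and start_line match how create_breakpoint uses a sequence
# point; the remaining field names are assumptions made for this sketch.
SequencePoint = namedtuple(
    "SequencePoint", "offset doc start_line start_col end_line end_col")

class FakeDocument(object):
    """Hypothetical stand-in for a symbol document; only exposes URL."""
    def __init__(self, url):
        self.URL = url

class FakeSymbolMethod(object):
    """Hypothetical stand-in for ISymbolMethod: fills six caller-owned arrays."""
    def __init__(self, rows):
        self._rows = rows
        self.SequencePointCount = len(rows)

    def GetSequencePoints(self, offsets, docs, lines, cols, end_lines, end_cols):
        for i, row in enumerate(self._rows):
            (offsets[i], docs[i], lines[i],
             cols[i], end_lines[i], end_cols[i]) = row

def get_sequence_points(method):
    count = method.SequencePointCount
    # Plain lists here; under IronPython these would be typed .NET arrays.
    offsets, docs, lines = [0] * count, [None] * count, [0] * count
    cols, end_lines, end_cols = [0] * count, [0] * count, [0] * count
    method.GetSequencePoints(offsets, docs, lines, cols, end_lines, end_cols)
    # Stitch the parallel arrays back together, one record per sequence point.
    for i in range(count):
        yield SequencePoint(offsets[i], docs[i], lines[i],
                            cols[i], end_lines[i], end_cols[i])
```

The caller never sees the parallel arrays; it just writes `for sp in get_sequence_points(method)` and reads named fields, which is what create_breakpoint does.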

Now that the breakpoint is set, we want to trap the breakpoint event as well. That’s easy enough: we create an event handler for process.OnBreakpoint, similar to the OnUpdateModuleSymbols handler above. Eventually, we’ll have the ability to step when we break, but for now I’m just going to print out the current location when the breakpoint is hit. This is kind of the reverse of the operation above. Setting a breakpoint means going from a source location to an IL offset within a function. Printing the current location means going from an IL offset in a function back to the source location. Here’s the function to do that:

from System.IO import Path
from System.Diagnostics.SymbolStore import SymbolToken

def get_location(reader, thread):
  frame = thread.ActiveFrame
  function = frame.Function

  offset, mapping_result = frame.GetIP()
  method = reader.GetMethod(SymbolToken(function.Token))

  # find the last real sequence point at or before the current IL offset
  real_sp = None
  for sp in get_sequence_points(method):
    if sp.offset > offset:
      break
    if sp.start_line != 0xfeefee:
      real_sp = sp

  if real_sp is None:
    return "Location (offset %d)" % (offset)

  return "Location %s:%d (offset %d)" % (
    Path.GetFileName(real_sp.doc.URL), real_sp.start_line, offset)

def OnBreakpoint(s,e):
  print "OnBreakpoint", get_location(
    symbol_readers[e.Thread.ActiveFrame.Function.Module], e.Thread)

Given a symbol reader and a debug thread, get_location returns a location string. It loops through the sequence points, similar to create_breakpoint, in order to find the closest corresponding line of Python code to the current offset (check out Mike Stall’s blog as to why I’m checking for 0xfeefee). In order to make this work, I need the symbol reader for the module that I retrieved in OnUpdateModuleSymbols. For now, I’m stashing the reader in a global dictionary named symbol_readers, keyed by module, where OnBreakpoint can access it.
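The search loop inside get_location can be tried out on its own, without a debugger attached. The sketch below pulls it into a hypothetical helper, closest_sequence_point, and runs it over hand-built data. It assumes the sequence points arrive sorted by ascending IL offset (which the early break relies on) and treats 0xfeefee as the marker for "hidden" sequence points that have no real source line.

```python
from collections import namedtuple

# Reduced sequence point record, just the two fields the search needs.
SequencePoint = namedtuple("SequencePoint", "offset start_line")

HIDDEN_LINE = 0xfeefee  # line number used for hidden sequence points

def closest_sequence_point(sequence_points, offset):
    """Return the last non-hidden sequence point at or before offset.

    Assumes sequence_points is sorted by ascending offset. Returns None
    when no real sequence point precedes the offset.
    """
    real_sp = None
    for sp in sequence_points:
        if sp.offset > offset:
            break  # everything from here on is past the current offset
        if sp.start_line != HIDDEN_LINE:
            real_sp = sp
    return real_sp

points = [
    SequencePoint(0, 1),
    SequencePoint(5, HIDDEN_LINE),  # skipped: no source line to report
    SequencePoint(12, 3),
]
```

With this data, an offset of 7 falls after the hidden point at offset 5 but still resolves to line 1, since that is the last point with a real source line.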

Ipydbg isn’t interactive yet, but it is now running, setting a breakpoint and successfully breaking at that breakpoint. As usual, the latest version of ipydbg is up on GitHub.

Posted By Harry Pierson at 3:59 PM Pacific Standard Time