Passion * Technology * Ruthless Competence

Friday, October 09, 2009

Hybrid App Debugging – The Debug Window

In my last installment, I added support for a separate debug window on a separate thread from the main window thread. That way, I can pause the execution of the main window while the debug window stays responsive to user input. Now, let’s add some functionality to the debug window. I’m going to start by showing the source code of the python file being executed.

private void OnTraceback(TraceBackFrame frame, string result, object payload)
{
    FunctionCode code = (FunctionCode)frame.f_code;
    if (_curCode == null || _curCode.co_filename != code.co_filename)
    {
        _source.Inlines.Clear();
        foreach (var line in System.IO.File.ReadAllLines(code.co_filename))
        {
            _source.Inlines.Add(new Run(line + "\r\n"));
        }
    }

The TraceBackFrame instance has a property f_code that represents the FunctionCode object being executed in this frame. We have to explicitly cast to FunctionCode type because currently we’re exposing all properties that hang off TraceBackFrame as object type. Since Python is a dynamic language, we’re going to use reflection against the instance itself anyway so it doesn’t really matter what the return type is. However, I’ve asked Dino to change the TraceBackFrame type to use explicit types in order to make it easier to use SetTrace from statically typed languages like C#. Look for that in RC2.

After we cast the code object so it can be used from C#, we check to see if the currently loaded file matches the file currently loaded into the UI. I’ve ranted recently about the limitations of WPF’s TextBox but I didn’t want to get hung up syntax highlighting for this sample so I just went ahead and used the RichTextBox. In the DebugWindow Loaded event handler, I create _source as a WPF Paragraph and then wrap it in a FlowDocument and use it as the RichTextBox’s Document. I set the FlowDocument to be extremely wide, so as to avoid word wrapping. Then when I need to load a new source file, I clear _source of it’s current contents and add a single line run for every line of code in the file. This convention becomes useful later when I go to highlight the current line of code.

Once I load the current file, I save the current frame, code, result and payload in instance fields and then switch on result to determine what to do next. Currently, I’m just highlighting the relevant line of code and setting a TextBlock control in the menu bar.

private void TracebackCall()
{
    dbgStatus.Text = string.Format("Call {0}", _curCode.co_name);
    HighlightLine((int)_curFrame.f_lineno, 
        Brushes.LightGreen, Brushes.Black);
}

private void TracebackReturn()
{
    dbgStatus.Text = string.Format("Return {0}", _curCode.co_name);
    HighlightLine(_curCode.co_firstlineno, 
        Brushes.LightPink, Brushes.Black);
}

private void TracebackLine()
{
    dbgStatus.Text = string.Format("Line {0}", _curFrame.f_lineno);
    HighlightLine((int)_curFrame.f_lineno, 
        Brushes.Yellow, Brushes.Black);
}

In Visual Studio, we typically highlight the current line of code in yellow. However, that doesn’t work as great in a language like Python that delineates code blocks with whitespace. In ipydbg, I indicated function return with three carets. But I didn’t want to be modifying the text in the RichTextBox here so instead I used different colors for the different traceback event types: light green for call, light pink for return and yellow for line. The frame object contains the current line number, which I use for call and line, while the code object has the first line of the current code object, which I use for return. HighlightLine highlights the line in question with the colors provided and also scrolls that line into view if it isn’t already visible.

So now when a traceback is handled, it shows the text for the file being executed and highlights the appropriate line, based on the type of traceback event that happened. Now all we need is to have some way be able to continue execution. In the code, you’ll see I’ve defined a series of RoutedUICommands for common debugger commands. I’ve got the StepIn command wired up in the DebugWindow XAML to a menu item and the “S” keystroke. All that remains is to define StepInExecuted.

private void StepInExecuted(object sender, ExecutedRoutedEventArgs e)
{
    dbgStatus.Text = "Running";

    foreach (var i in _source.Inlines)
    {
        i.Background = rtbSource.Background;
        i.Foreground = rtbSource.Foreground;
    }

    _dbgContinue.Set();
}

This function does three basic things: changes the dbgStatus text, resets all the text in the RichTextBox back to the default coloring, and sets the _dbgContinue AutoResetEvent which signals the main window thread that’s been blocked in OnTracebackReceived to continue.

With this post, I’m about even with the code that’s up on GitHub. That code has a few other capabilities – notably it will stop tracing if you close the debug window and it supports StepOut command which disables traceback for the current scope by returning null in OnTracebackReceived. But I haven’t implemented things like:

  • Set Next Statement
  • Viewing and changing variables
  • Debugger REPL
  • Breakpoint Management

Any suggestions on which of those would you like to see next?

Posted By Harry Pierson at 11:18 AM Pacific Daylight Time

Thursday, October 08, 2009

Hybrid App Debugging – Threading

I added traceback to my GetThings app in just two lines of code, but so far it doesn’t actually do anything that you would expect a debugger to do. But before we get to that, we need understand a little about how threading works for traceback debugging.

As I mentioned last time, the traceback debugger works by calling into the registered traceback handler at various times (entering/exiting a function, before executing a line of code and on exceptions). Execution of the Python code continues when the traceback function exits. That means that you have to block the execution thread while you let the user poke around with the debugger UI. For a console based app, that’s easy. For a GUI app, not so much.

At a minimum, you need to run your debugger UI on a separate thread from your main app window. If you want your main app window to be responsive while you debug, you’ll need to pump messages at a minimum (DoEvents in Windows Forms, similar approaches are available for WPF) or preferably run your python scripts on a background thread separate from either the main window UI thread or the debugger UI thread. To keep things simple, I’m going to simply block the main window thread while the debugger is active.

Since I’m going to have to setup a new thread for the debugger window, I decided to use a static constructor to centralize creating the thread, creating the window and registering the traceback handler all in one place.

static Thread _debugThread;
static DebugWindow _debugWindow;
static ManualResetEvent _debugWindowReady = new ManualResetEvent(false);

public static void InitDebugWindow(ScriptEngine engine)
{
    _debugThread = new Thread(() =>
    {
        _debugWindow = new DebugWindow(engine);
        _debugWindow.Show();
        Dispatcher.Run();
    });
    _debugThread.SetApartmentState(ApartmentState.STA);
    _debugThread.Start();

    _debugWindowReady.WaitOne();
    engine.SetTrace(_debugWindow.OnTracebackReceived);
}

As you can see, InitDebugWindow spins up a new thread and creates the debug window on that thread. Since it’s not the main WPF application thread, you have to explicitly call Dispatcher.Run to get the event queue pumping. You also have to explicitly set the apartment state to be single threaded for any threads creating WPF objects. Finally, I wait for the window to signal that it’s ready (it set’s the _debugWindowReady AutoResetEvent in the Window Loaded event) and then call SetTrace, passing in the debug window’s OnTracebackReceived event, on the thread that called InitDebugWindow.

It’s critical that you call SetTrace – and thus InitDebugWindow – on the thread that’s going to execute the Python code. Debugging in Python is per thread. Even if you execute the same code in the same ScriptScope with the same ScriptEngine but on a different thread, the traceback handler calls won’t fire. The way DebugWindow is written, it will only support debugging a single thread, but it would be pretty straightforward to support multiple threads by changing the way OnTracebackReceived gets signaled to continue.

Speaking of OnTracebackReceived, this was my initial basic implementation of it:

private TracebackDelegate OnTracebackReceived
    (TraceBackFrame frame, string result, object payload)
{
    Action<TraceBackFrame, stringobject> tbAction = this.OnTraceback;
    this.Dispatcher.BeginInvoke(tbAction, frame, result, payload);
    _dbgContinue.WaitOne();
    return this.OnTracebackReceived;
}

As we saw, the DebugWindow is running on a different thread than the traceback handler call will come in on. So OnTracebackReceived needs to invoke a new call on the correct thread by using Dispatcher.BeginInvoke. Even though OnTracebackReceived is always called on the main window thread, it still has access to the properties of the debug window thread like its Dispatcher. I used BeginInvoke to invoke OnTraceback asynchronously – OnTraceback isn’t going to return anything interesting and we’re going to wait on an AutoResetEvent before continuing anyway so I didn’t see any reason to use a synchronous call.

We’ll discuss OnTraceback more next post, but basically it will configure the UI for the traceback event that happened. Then DebugWindow will wait for user input. When the user indicates they want to resume execution, the command handler in question will set _dbgContinue and the original traceback will return so execution can continue.

Posted By Harry Pierson at 6:42 PM Pacific Daylight Time

Wednesday, October 07, 2009

Hybrid App Debugging – TracebackDelegate and SetTrace

Now that I’ve introduced my simple hybrid GetThings app, we need to set about adding support for debugging just the IronPython part of the app via the new lightweight debugging functionality we’re introducing in 2.6. Note, the code is up on github, but isn’t going to exactly match what I show on the blog. Also, I have a post RC1 daily build of IronPython in the Externals folder since I discovered a few issues while building this sample that Dino had to fix after RC1. Those assemblies will be updated as needed as the sample progresses.

We saw last time how how easy it is to execute a Python script to configure a C# app – only four lines of code. If we want to support debugging, we need to add a fifth:

private void Window_Loaded(object sender, RoutedEventArgs e)
{
    ScriptEngine engine = Python.CreateEngine();
    engine.SetTrace(this.OnTraceback);

    ScriptScope s = engine.CreateScope();
    s.SetVariable("items", lbThings.Items);
    engine.ExecuteFile("getthings.py", s);
}

You’ll notice the one new line – the call to engine.SetTrace. This is actually an extension method – ScriptEngine is a DLR hosting API class and but SetTrace is IronPython specific functionality [1]. If you look at the source of Python.SetTrace, you’ll see that it’s just a wrapper around SysModule.settrace, but it avoids needing to get the engine’s shared PythonContext yourself.

SetTrace takes a TracebackDelegate as a parameter. That delegate gets registered as the global traceback handler for the Python engine (on that thread, but we’ll ignore threading for now). Whenever that engine enters a new scope (i.e. a new function), the IronPython runtime calls into the global traceback handler. While the traceback handler runs, execution of the python code in that engine is paused. When the traceback handler returns, the engine resumes executing python code.

In addition to the global traceback handler, each scope has a local traceback handler as well. The TracebackDelegate type returns a TracebackDelegate which is used as the local traceback handler for the next traceback event within that scope. Traceback handlers can return themselves, some other TracebackDelegate, or null if they don’t want any more traceback events for the current scope. It’s kinda confusing, so here’s a picture:

image

You’ll notice three different traceback event types in the picture above: call, line and return. Call indicates the start of a scope, and is always invoked on the global traceback handler (i.e. the traceback passed to SetTrace). Line indicates the Python engine is about to execute a line of code and return indicates the end of a scopes execution. As you can see, the runtime uses the return value of the traceback for the next tracing call until the end of the scope. The return value from the “return” event handler is ignored.

So now that we know the basics of traceback handlers, here’s a simple TracebackDelegate that simply returns itself. The “Hello, world!” of traceback debugging if you will.

private TracebackDelegate OnTraceback
    (TraceBackFrame frame, string result, object payload)
{
    return this.OnTraceback;
}

If you run this code, there will be no functional difference from the code before you added the SetTrace call. That’s because we’re not doing anything in the traceback handler. But if you run this in the debugger with a breakpoint on this function, you’ll see that it gets called a bunch of times. In the python code from the last post, there are three scopes – module scope, download_stuff function scope and the get_nodes function scope. Each of those function scopes will have a call and return event, plus a bunch of line events in between.

The parameters for TracebackDelegate are described in the Python docs. The frame parameter is the current stack frame – it has information about the local and global variables, the code object currently executing, the line number being executed and a pointer to the previous stack frame if there is one. More information on code and frame objects is available in the python data model (look for “internal types”). Result is the reason why the traceback function is being called (in Python docs, it’s called “event” but that’s a keyword in C#). IronPython supports four traceback results: “call”, “line” and “return” as described above plus “exception” when an exception is thrown. Finally, the payload value’s meaning depends on the traceback result. For call and line, payload is null. For return, payload is the value being returned from the function. For exception, the payload is information about the exception and where it was thrown.

As I mentioned above, python code execution is paused while the traceback handler executes and then continues when the traceback handler returns. That means you need to block in that function if you want to let the user interact with the debugger. For a console app like PDB, you can do that with a single thread of execution easily enough. For a GUI app like GetThings, that means running the debugger and debugee windows on separate threads. And as I alluded to, tracing for Python script engines is per thread. So next time, we’ll look deeper into how to use multiple threads for lightweight debugging a hybrid app.


[1] Eventually, I’d like to see IronRuby support lightweight debugging as well. However, there’s no built in mechanism for Ruby debugging the way there is for Python, so it’s less clear how we should expose debugging to the Ruby developer. We’d also want to build a language neutral DLR Hosting API mechanism for lightweight debugging as well at that point. But honestly, we have higher priorities at this point.

Posted By Harry Pierson at 1:43 PM Pacific Daylight Time

Tuesday, October 06, 2009

Lightweight Debugging for Hybrid C#/IronPython Apps

designallCAZM93SM

One of the IronPython scenarios that I’m hearing more and more about recently is for polyglot programs. In these scenarios, part of the application is built in IronPython other parts are build in compiled, statically typed languages like C# or Visual Basic. Sometimes, programs are written this way to allow the C# app to access a Python library, like my Pygments for WL Writer plugin. Other programs want to be customizable by the end user, like Intellipad. Whatever the reason, I think that the number of these hybrid polyglot programs is going up, which partially explains why the C# team added the new dynamic type to C# 4.0.

(FYI: the You had me at “dynamic” shirt above is available for sale in my Zazzle store along with my Architecture Help 5¢ shirt)

The thing is that if you’re going to build polyglot apps, you’re probably going to want the ability to debug polyglot apps as well. I’ve written extensively about building a debugger for IronPython. However, ipydbg uses the CLR debugger under the hood which means you have to have the debugger and the code it’s debugging in separate processes. That’s a huge design burden for building a debuggable polyglot application. Luckily, as of IronPython 2.6, we support Python’s built-in trace debugging capability (aka sys.settrace). While you can use this in pure Python apps (like PDB), you can also use it polyglot C# (or VB)/IronPython apps as well. If only someone were to take the time to build a sample and document what he did along the way…

Hey, that sounds like PM work!

Seriously, let me introduce you to the worlds simplest Twitter application: GetThings. The app downloads a list of my tweets via the Twitter API and displays them in a list box. The UI is written in C# while the tweet download code is written in Python. Clearly, this is a pretty brain dead app – but the point isn’t to build a great Twitter app but rather to show how to use the settrace API from C#.

I’ve stuck the code up on GitHub. If you want to see the basic app in action sans debugging, start with the initial checkin. As you can see here, basic C# / IronPython integration is pretty trivial. I’m simply creating an engine and a scope, adding the list boxes’ Items property to the scope, and executing the getthings.py file from the disk.

private void Window_Loaded(object sender, RoutedEventArgs e)
{
    ScriptEngine engine = Python.CreateEngine();
    ScriptScope  scope = engine.CreateScope();
    scope.SetVariable("items", lbThings.Items);
    engine.ExecuteFile("getthings.py", scope);
}

Since GetThings.py is just a text file, the user can modify it to get a list of anything they want – some other user’s timeline, the public timeline, or even – gasp! – something not from Twitter! In fact, as you see below, I’ve actually modified it to pull the tweets from a file on disk so I can avoid hitting the network on every run.

import clr
clr.AddReference("System.Xml")
from System.Xml import XmlDocument

def get_nodes(xml):
    return xml.SelectNodes("statuses/status/text")

def download_stuff():
    x = XmlDocument()

    #load from disk to save time in development
    #x.Load("http://twitter.com/statuses/user_timeline/devhawk.xml")
    x.Load("devhawk.xml")

    for n in get_nodes(x):
        txt = n.InnerText
        items.Add(txt)

download_stuff()

OK, so that’s the basics of the world’s simplest hybrid C#/IronPython Twitter application. Next up, I’ll add the settrace basics.

Posted By Harry Pierson at 2:11 PM Pacific Daylight Time

Wednesday, July 08, 2009

Microsoft.Scripting.Debugging

If you’ve compiled IronPython from source recently, you may have noticed a new DLL: Microsoft.Scripting.Debugging. This DLL contains a lightweight, non-blocking debugger for DLR based languages that is going to enable both new scenarios as well as better compatibility with CPython. Needless to say, we’re very excited about it.

When I was actively working on my ipydbg series, I got several emails asking about using it in an embedded scripting scenario. Unfortunately, the ipydbg approach doesn’t work very well in the embedded scripting scenario. ipydbg uses ICorDebug and friends, which completely blocks the application being debugged. This means, your debugger has to run in a separate process. So either you run your debugger in your host app process and your scripts in a separate process or you run your debugger in a separate process debugging both the scripts and the host app. Neither option is very appealing.

Now with the DLR Debugger, you can run all three components in the same process. I think of the DLR debugger as a “cooperative” debugger in much the same way that Windows 3.x supported cooperative multitasking. It’s also known as trace or traceback debugging. Code being debugged yields to the debugger at set points during its execution. The debugger then does whatever it wants, including showing UI and/or letting the developer inspect or modify program state. When the debugger returns, execution of the original code continues until the next set point wherein the process repeats itself.

The primary point of entry for the DLR Debugger is the DebugContext class. Notable there is the TransformLambda method, which takes a normal DLR LambdaExpression and transforms it into a cooperatively debugged LambdaExpression. LambdaExpressions can contain DebugInfoExpressions – typically we insert them at the start of every Python code line as well as one at the end of the function. When we run IronPython in debug mode (i.e. –D), those get turned into sequence points as we saw back when I was working on ipydbg. When using the DLR Debugger, those DebugInfoExpressions are transformed into calls out to IDebugCallback.OnDebugEvent. The DLR Debugger implements the IDebugCallback interface on the TracePipeline class which also implements ITracePipeline. In OnDebugEvent, TracePipeline calls out to an ITraceCallback instance you provide. The extra layer of indirection means you can change your traceback handler without having to regenerate the debuggable version of your functions.

Of course, we hide all this DLR Debugger goo from you in IronPython. Python already has a mechanism for doing traceback debugging – sys.settrace. Our ITraceCallback, PythonTracebackListener, wrapps the DLR Debugger API to expose the sys.settrace API. That makes this feature a twofer – new capability for IronPython + better compatibility with CPython. Instead of needing a custom tool (i.e. ipydbg) you can now use PDB from the standard Python library (modulo bugs in our implementation). I haven’t been working on ipydbg recently since you’ll be able to use PDB soon enough.

For those hosting IronPython, we also have a couple of static extension methods in our hosting API (look for the SetTrace functions in IronPython\Hosting\Python.cs). These are simply wrappers around sys.settrace, so it has the same API regardless if you access it from inside IronPython or from the hosting API. But if you’re hosting IronPython in a C# application, those extension methods are very convenient to use.

This debugger will be in our regular releases of IronPython as of 2.6 beta 2 which is scheduled to drop at the end of this month. For those who just can’t wait, it’s available as source code starting with yesterday’s changeset. Please let us know what you think!

Posted By Harry Pierson at 2:42 PM Pacific Daylight Time

Wednesday, April 08, 2009

Writing an IronPython Debugger: Breakpoint Management

Setting a breakpoint was the second feature I implemented in ipydbg. While setting a breakpoint on the first line of the Python file being run is convenient, it was obviously necessary to provide the user a mechanism to create their own breakpoints, as well as enable and disable existing breakpoints.

First thing I had to do was to refactor the create_breakpoint method. Originally, I was searching thru the symbol documents looking for the one that matched the filename in OnUpdateModuleSymbols. However, since I wanted to specify by new breakpoints via the same filename/line number combination, it made more sense to move symbol document logic into create_breakpoint:

def create_breakpoint(module, filename, linenum):
    reader = module.SymbolReader
    if reader == None:
      return None
    
    # currently, I'm only comparing filenames. This algorithm may need
    # to get more sophisticated to support differntiating files with the 
    # same name in different paths
    filename = Path.GetFileName(filename)
    for doc in reader.GetDocuments():
      if str.Compare(filename, Path.GetFileName(doc.URL), True== 0:
        linenum = doc.FindClosestLine(linenum)
        method = reader.GetMethodFromDocumentPosition(doc, linenum, 0)
        function = module.GetFunctionFromToken(method.Token.GetToken())
        
        for sp in get_sequence_points(method):
          if sp.doc.URL == doc.URL and sp.start_line == linenum:
            return function.ILCode.CreateBreakpoint(sp.offset)
        
        return function.CreateBreakpoint()

The new version isn’t much different than the old. It loops thru the symbol documents looking for one that matches the filename argument. Then it creates the breakpoint the same way it did before. Eventually, I’m going to need a better algorithm than “only compare filenames”, but it works for now.

Once I made this change, it was trivial to implement a breakpoint add command. What was harder was deciding on the right user experience for this. I decided that breakpoint management was going to be the first multi-key command in ipydbg. so all the debug commands are prefixed with a “b”. I use the same command routing decorator I used for input commands. As you can see, my breakpoint command looks a lot like my top level input method – read a key from the console then dispatch it via a commands dictionary that gets populated by @inputcmd decorators.

@inputcmd(_inputcmds, ConsoleKey.B)
def _input_breakpoint(self, keyinfo):
    keyinfo2 = Console.ReadKey()
    if keyinfo2.Key in IPyDebugProcess._breakpointcmds:
        return IPyDebugProcess._breakpointcmds[keyinfo2.Key](self, keyinfo2)
    else:
        print "\nInvalid breakpoint command"str(keyinfo2.Key)
        return False

Currently, there are four breakpoint commands: “a” for add, “l” for list, “e” for enable and “d” for disable. List is by far the simplest.

@inputcmd(_breakpointcmds, ConsoleKey.L)
def _bp_list(self, keyinfo):
  print "\nList Breakpoints"   
  for i, bp in enumerate(self.breakpoints): 
    sp = get_location(bp.Function, bp.Offset)
    state = "Active" if bp.IsActive else "Inactive"
    print "  %d%s:%d %s" % (i+1, sp.doc.URL, sp.start_line, state)
  return False

As you can see, I’m keeping a list of breakpoints in my IPyDebugProcess class. Originally, I used AppDomain.Breakpoints list, but that only returns enabled breakpoints so I was forced to store my own list. Note also that I’m using the enumerate function, which returns a tuple of the collection count and item. I do this so I can refer to breakpoints by number when enabling or disabling them:

@inputcmd(_breakpointcmds, ConsoleKey.E)
def _bp_enable(self, keyinfo):
  self._set_bp_status(True)
  
@inputcmd(_breakpointcmds, ConsoleKey.D)
def _bp_disable(self, keyinfo):
  self._set_bp_status(False)

def _set_bp_status(self, activate):
  stat = "Enable" if activate else "Disable"
  try:
    bp_num = int(Console.ReadLine())
    for i, bp in enumerate(self.breakpoints): 
      if i+1 == bp_num:
        bp.Activate(activate)
        print "\nBreakpoint %d %sd" % (bp_num, stat)
        return False
    raise Exception"Breakpoint %d not found" % bp_num
    
  except Exception, msg:
    with CC.Red: print "&s breakpoint Failed %s" % (stat, msg)

Since the code was identical, except for the value passed to bp.Activate, I factored the code into a separate _set_bp_status method. After the user presses ‘b’ and then either ‘e’ or ‘d’, they then type the number of the breakpoint provided by the breakpoint list command. _set_bp_status then simply iterates thru the list until it finds the matching breakpoint and calls Activate. Note that since it’s possible to have 10 or more breakpoints, I’m using ReadLine instead of ReadKey, meaning you have to hit return after you type in the breakpoint number.

Finally, I need a way to create new breakpoints. With the refactoring of create_breakpoint, this is pretty straightforward

@inputcmd(_breakpointcmds, ConsoleKey.A) 
def _bp_add(self, keyinfo): 
  try
    args = Console.ReadLine().Trim().split(':'
    if len(args) != 2raise Exception"Only pass two arguments"  
    linenum = int(args[1]) 
     
    for assm in self.active_appdomain.Assemblies: 
      for mod in assm.Modules: 
          bp = create_breakpoint(mod, args[0], linenum) 
          if bp != None
            self.breakpoints.append(bp) 
            bp.Activate(True
            Console.WriteLine( "Breakpoint set"
            return False 
    raise Exception"Couldn't find %s:%d" % (args[0], linenum)     

  except Exception, msg: 
    with CC.Red: 
      print "Add breakpoint failed", msg 

Most of _bp_add is processing the input arguments, looping through the modules and then storing the breakpoint that gets returned. When I set the initial breakpoint inside OnUpdateModuleSymbols, I have the module with updated symbols as an event argument. However, in the more general case we’ve got no way of knowing which module of the current app domain contains the filename in question. So we loop thru all the modules, calling create_breakpoint on each until one returns a non-null value. Of course, “all the modules” will include the IronPython implementation, but assuming you’re running against released bits the call to create_breakpoint will return right away if debug symbols aren’t available.

As usual, the latest version is up on GitHub. This will be the latest update to ipydbg for a little while. I worked on it quite a bit while I was at PyCon and have been busy with other things since I got home. Don’t worry, I’ll come back to it soon enough. As I mentioned Monday, I want to get function evaluation working so I can have a REPL console running in the target process instead of the one I’ve got currently running in the debugger process.

Posted By Harry Pierson at 2:45 PM Pacific Daylight Time

Monday, April 06, 2009

Writing an IronPython Debugger: REPL Console

While I was banging my head against a wall experimenting with understanding how CorValue extraction worked, I found myself wanting to dink around with the debugger objects in a REPL console. One of IronPython’s core strengths is support for “exploratory programming” via the REPL. It turned out bringing a REPL to ipydbg was quite simple.

Python includes two built-in features that making DIY REPL quite easy: compile and exec (though technically, exec is a statement, not a function). As you might assume from their names, compile converts a string into what Python calls a code object while exec executes a code object in a given scope. Technically, exec can accept a string so I could get by without using compile. However, if you’re compiling a single interactive statement compile can automatically insert a print statement if you’ve passed in a an expression. In other words, if you type in “2+2” on the console it will print “4”, which is the behavior I wanted.

Here’s what my REPL console code look like. I love that it’s only 20 lines of code.

@inputcmd(_inputcmds, ConsoleKey.R)
def _input_repl_cmd(self, keyinfo):
  with CC.Gray:
    print "\nREPL Console\nPress Ctl-Z to Exit"
    cmd = ""
    _locals = {'self'self}

    while True:
      Console.Write(">>>" if not cmd else "...")
      
      line = Console.ReadLine()
      if line == None:
        break
      
      if line:
        cmd = cmd + line + "\n"
      else:
        try:
          if len(cmd) > 0:
            exec compile(cmd, "<input>""single"in globals(),_locals
        except Exception, ex:
          with CC.Red: print type(ex), ex
        cmd = ""

It’s pretty straightforward. I set up a dictionary to act as the local variable scope for the code that gets executed. I’m just reusing the current global scope, but I want the local scope to start with only the reference to the current IPyDebugProcess instance which is passed into _input_repl_cmd as “self”. All the other local variables like cmd and line won’t be available to the REPL code. Then I drop into a loop where I read lines from the console and execute them.

In order to support multi-line statements, I build up the cmd variable over multiple line inputs and I don’t execute it until the user inputs an empty line. In the standard Python console, it can recognize single line statements and execute them immediately. Dino showed me how to use the IronPython parser to do the same thing, but I haven’t implemented that in ipydbg yet. To exit the REPL loop, you type Ctl-Z, which returns None (aka null) from ReadLine instead of the empty string.

Since I never execute the code more than once, I have my exec and compile statements together on a single line. Compile takes the string to be compiled, the name of the file it came from (I’m using <input> for this) and the kind of code. Passing in “single” for the kind of code adds the auto-expression-print functionality I mentioned above. Then I exec the code object that’s returned in specified scope I’m managing for this instance of the REPL loop. If you exit out of the REPL and re-enter it, you get a fresh new copy of the local scope so any functions or variables you define in the last REPL are gone.

Runtime execution of code into a given scope is a hallmark of dynamic languages, but I’m still fairly green when it comes to Python so it took me a while to figure this out. Python code executes in a given scope, a combination of global and local variables. When you’re in the ipy.exe REPL, you’re at top level scope anyway, so global and local scope are the same – if you add something to global scope, it shows up in local scope and vis-versa. Inside a function, you’ll have the same global scope, but the local scope will be different and changes to one won’t be reflected in the other. The ipydbg REPL isn’t a function per-se, but it does provide an explicit local scope that gets disposed when you exit the REPL.

While having a debugger REPL is really convenient for prototyping new ipydbg commands, it’ll really shine once I get function evaluation working. Then I’ll be able to open a REPL console where the commands are executed in the target process instead of the debugger process as they are now. That will be very cool. Until then, the latest code is – as always – up on GitHub.

Posted By Harry Pierson at 12:07 PM Pacific Daylight Time

Writing an IronPython Debugger: Getting Arguments

It’s a small update, but I added support for displaying method arguments along side the local variables. As I mentioned in that post, breaking out the CorValue extraction and display code into a shared function was a good idea – adding support for getting arguments was trivial since I could reuse that code.

Because there’s no hierarchy of scopes to deal with and the names are in the metadata instead of debug symbols, getting arguments is much easier than getting local variables.

def get_arguments(frame): 
    mi = frame.GetMethodInfo() 
    for pi in mi.GetParameters(): 
      if pi.Position == 0continue 
      arg = frame.GetArgument(pi.Position - 1
      yield pi.Name, arg

You’ll notice that I’m yielding the arguments as a tuple of the name and value, the same as get_locals yields. I did refactor get_locals a bit – there’s no longer an argument to skip hidden variables anymore (though get_locals still skips dynamic call sites caches as it did before). Now, it’s up to the the caller of get_arguments and get_locals to filter hidden variables as they see fit.

Because get_locals and get_arguments yield the same types, I was able to factor the code to print a value and loop through the collection of values into separate local functions.

@inputcmd(_inputcmds, ConsoleKey.L)  
def _input_locals_cmd(self, keyinfo):  
  def print_value(name, value):  
    display, type_name = display_value(extract_value(value))  
    with CC.Magenta: print "  ", name,   
    print display,  
    with CC.Green: print type_name  
      
  def print_all_values(f, show_hidden):  
      count = 0  
      for name,value in f(self.active_thread.ActiveFrame):  
        if name.startswith("$"and not show_hidden:  
          continue  
        print_value(name, value)  
        count+=1          
      return count  
        
  print "\nLocals"  
  show_hidden = \ 
    (keyinfo.Modifiers & ConsoleModifiers.Alt) == ConsoleModifiers.Alt  
  count = print_all_values(get_locals, show_hidden)  
  count += print_all_values(get_arguments, show_hidden)  

  if count == 0:  
      with CC.Magenta: print "  No Locals Found"  

I really like the local functions feature of Python. In C#, you can define an anonymous delegate using the lambda syntax. But for a scenario like this, I like local functions better. However, I do like C#’s support for statement lambdas – Python only supports expression lambdas. So while I like local functions better in this scenario (because I’m using the method more than once) in something like an event handler, I like the statement lambda syntax better.

As usual, the latest version of ipydbg is up on GitHub.

Posted By Harry Pierson at 9:46 AM Pacific Daylight Time

Wednesday, April 01, 2009

Writing an IronPython Debugger: Command Routing

At this point, ipydbg support seven commands: Continue, Quit, Show Stack Trace, Show Locals, Step Over, Step In, and Step Out. All these commands are invoked by a single keystroke. I’m using Console.ReadKey in an attempt to cut down on the number of keystrokes needed for interacting with the debugger. If I only type ‘s’ instead of ‘s <enter>’ to step, I figure I’ll be twice as productive! :)

If I was writing ipydbg in C#, I could use switch statement to dispatch commands in the _input method based on user keystrokes. However, Python doesn’t have a switch statement so I’ve been using a cascading set of if/elif/else statements instead. When you get up to seven if/elif clauses plus an else clause, the code smell is pretty overwhelming.

# Only has three if/elif clauses, but it's already a little smelly
val = Console.ReadKey()   
if val.Key == 'a'
  result = 'a' 
elif val.Key == 'b' 
  result = 'b' 
elif val.Key == 'c' 
  result = 'c' 
else
  print "unknown key"

Python might not have a switch statement, but it does have first-order functions so you can get the effects of a switch by using a dictionary.

def do_a():    
  return 'a'   
def do_b():    
  return 'b'   
def do_c():    
  return 'c'   
_switch = {'a':do_a, 'b':do_b, 'c':do_c}    

val = Console.ReadKey()    
if val in _switch:
  result = _switch[val.Key]()
else:
  print "unknown key"

I like this approach much better. Individual if/elif blocks are now broken out into separate functions, which smells better than embedding them in one big function. Also, I like that my pseduo-switch statement is completely separate from the how the _switch dictionary is initialized. However, this approach also separates the pseudo-case statement functions from the _switch dictionary as well. That’s not a good thing. You can easily imagine screwing up by adding a new function but forgetting to manually update the _switch dictionary.

What I need is a way to declaratively associate the switch function with the dictionary lookup key that’s associated with it. Luckily, Python Decorators provides a very clean way to do this.

_switch = {}       

@inputcmd(_switch, 'a')
def do_a():     
  return 'a'    
@inputcmd(_switch, 'b')
def do_b():     
  return 'b'    
@inputcmd(_switch, 'c')
def do_c():     
  return 'c'    

val = Console.ReadKey()     
if val in _switch: 
  result = _switch[val.Key]() 
else
  print "unknown key"

I’ve blogged about decorators before when I wanted to automatically invoke operations on the right thread in my WPF photo viewing app. The @inputcmd decorator is a bit more complicated than the @BGThread and @UIThread decorators since @inputcmd decorator accepts arguments. Each of the @input command decorators in the code above is the equivalent to this code:

def do_a():      
  return 'a'

_tmp = inputcmd(_switch, 'a')
do_a = _tmp(do_a)

As you can see, the inputcmd function returns the decorator that wraps do_a, rather than being the decorator itself. This function that returns a function that returns a function is kinda confusing at first. But this approach allows you to configure the decorator for a specific purpose via the arguments – in this case, specifying which dictionary and which console key this function is associated with.

Also unlike @BGThread and @UIThread, I don’t actually want to modify the behavior of the methods decorated with @inputcmd. I only want to store a reference to them in the passed in dictionary. So implementing this decorator is very easy:

def inputcmd(cmddict, key):
    def deco(f):
        cmddict[key] = f
        return
    return deco

The decorator simply inserts the function into the passed-in dictionary using the passed in key. It then returns the function as is, so it’s not really rebinding the symbol to a new method (technically, it’s rebinding the symbol to the same function it’s currently bound to). If I wanted also wrap the passed in function to provide additional functionality, I could do that with a second locally defined function inside the deco function.

The latest version of ipydbg as been refactored to use @inputcmd instead of set of a cascading if/elif statement blocks. Now that that’s done, I can start working on multi-key commands.

Posted By Harry Pierson at 1:55 PM Pacific Standard Time

Tuesday, March 31, 2009

Writing an IronPython Debugger: Displaying Values

Now that I can get the local variables for a given frame, I need to display them in the console. Eventually, I’d like to provide the ability to update the local variables as well, but you gotta crawl before you can run. Luckily, the debugger API is consistent about using same COM interfaces – wrapped by the managed CorValue class – to represent all data values, including local variables, function arguments and object fields. So the work I do now to display CorValues in the console will be reusable in other contexts down the road.

While the debugger API is consistent about how it represents values in the target process, the API it uses is very complicated. The primary COM interface for accessing values is ICorDebugValue, but it has eight siblings: ICorDebugReferenceValue, ICorDebugHandleValue, ICorDebugStringValue, ICorDebugObjectValue, ICorDebugGenericValue, ICorDebugBoxValue, ICorDebugArrayValue, ICorDebugHeapValue. All those COM interfaces are represented in managed code by CorValue and it’s subclasses.

Furthermore, confusingly ICorDebugValues have both a Type and an ExactType. ExactType is what .NET developers typically think of as the type, aka the CLR type. Well, the debugger API’s representation of the CLR type at any rate. You can retrieve the value’s metadata as a System.Type compatible object via value.ExactType.Class.GetTypeInfo().CorValue’s Type property, on the other hand, represents the object’s primitive or element type. For example, instances of .NET classes have an element Type of ELEMENT_TYPE_CLASS. There are a collection of primitive types (boolean, char, ints of various signage and size, floats of various size) as well as types you wouldn’t call primitive but that the runtime has specific knowledge of (string, array and value types - aka structs in C# terminology).

If you’re confused by all that, don’t worry so am I. Honestly, I’ve re-written this code several times, each time understanding the API just a bit better. Whatever the *right* way to use the interfaces, I’m sure I don’t know it. For my first cut at this, I essentially ported MDbg’s high level CorValue API – aka MDbgValue::InternalGetValue if you’re looking at the MDbg source code – over to Python. Along the way, I’ve improved on that code as I’ll describe below.

A given CorValue may be a primitive value like an int or it may be a reference to or a boxed version of some other CorValue object. So in order to print the CorValue, you have to go thru a series of attempts to dereference and unbox until you get to the “real” underlying CorValue object. From there, converting the value to a string I can print depends on the value’s element type. For primitive types like ints and floats, you can call CastToGenericValue to get a CorGenericValue “view” of the same CorValue object [1]. A CorGenericValue can read and write the raw bytes from memory in the target process of the value. The GetValue method reads the data from target process then does an unsafe cast to appropriate managed type. For example, an ELEMENT_TYPE_R4 CorValue gets cast into a System.Single. For CorValue strings, I call CastToStringValue and then access the String property. For classes, value types and objects, there’s no simple or standard approach to retrieving the data, so for now I return the result of calling CastToObjectValue. Eventually, I’ll want to provide a mechanism to read the specific fields of a class or value type.

Unfortunately, the mechanism above to read primitive types doesn’t work with IronPython. GetValue needs to know the correct element type in order to do the unsafe cast. For value types (aka any struct other than the basic primitives), GetValue will return a data as a byte array. The problem is that when you box a primitive, the original element types gets overwritten by ELEMENT_TYPE_VALUETYPE. You can’t get the original element type back, even after unboxing. So for boxed primitives, you can only retrieve the data as a raw byte array or as a CorObjectValue, neither of which is very useful.

Luckily, I was able to work around this. Under the hood, GetValue calls UnsafeGetValueAsType to do the actual work of reading the data from the target process and casting it to the right managed type. UnsafeGetValueAsType It accepts the an element type value as a method parameter. If your know the right element type value, you could call UnsafeGetValueAsType directly if instead of going thru GetValue. While boxing overwrites the original element type value, an unboxed CorValue still has the CLR type metadata available. So I was able to map CLR Types to element types (e.g. System.Single –> ELEMENT_TYPE_R4) in order to retrieve the underlying value of boxed primitive types.

_type_map = { 'System.Boolean': ELEMENT_TYPE_BOOLEAN,   
  'System.SByte'  : ELEMENT_TYPE_I1, 'System.Byte'   : ELEMENT_TYPE_U1,   
  'System.Int16'  : ELEMENT_TYPE_I2, 'System.UInt16' : ELEMENT_TYPE_U2,   
  'System.Int32'  : ELEMENT_TYPE_I4, 'System.UInt32' : ELEMENT_TYPE_U4,   
  'System.IntPtr' : ELEMENT_TYPE_I,  'System.UIntPtr': ELEMENT_TYPE_U,  
  'System.Int64'  : ELEMENT_TYPE_I8, 'System.UInt64' : ELEMENT_TYPE_U8,   
  'System.Single' : ELEMENT_TYPE_R4, 'System.Double' : ELEMENT_TYPE_R8,   
  'System.Char'   : ELEMENT_TYPE_CHAR, }   
     
_generic_element_types = _type_map.values()   

class NullCorValue(object):  
  def __init__(self, typename):  
    self.typename = typename  

def extract_value(value):  
    rv = value.CastToReferenceValue()  
    if rv != None:  
      if rv.IsNull:   
        typename = rv.ExactType.Class.GetTypeInfo().Name  
        return NullCorValue(typename)  
      return extract_value(rv.Dereference())  
    bv = value.CastToBoxValue()  
    if bv != None:  
      return extract_value(bv.GetObject())   

    if value.Type in _generic_element_types:  
      return value.CastToGenericValue().GetValue()  
    elif value.Type == ELEMENT_TYPE_STRING:  
      return value.CastToStringValue().String  
    elif value.Type == ELEMENT_TYPE_VALUETYPE:  
      typename = value.ExactType.Class.GetTypeInfo().Name   
      if typename in _type_map:  
        gv = value.CastToGenericValue()  
        return gv.UnsafeGetValueAsType(_type_map[typename])  
      else:  
        return value.CastToObjectValue()  
    elif value.Type in [ELEMENT_TYPE_CLASS, ELEMENT_TYPE_OBJECT]:  
      return value.CastToObjectValue()  
    else:  
      msg = "CorValue type %s not supported" % str(value.Type)
      raise (Exception, msg)

It’s kinda ugly code and I’m thinking that at least some of really belongs in the CorValue C# classes rather than in ipydbg. However, I’m not that interested in doing the significant refactoring it would take to make the CorValue API developer-friendly, so I did it here.

One thing to note that I didn’t cover earlier is the NullCorValue object. For reference values, there’s a IsNull property that may be set. If it is set, I need a mechanism to indicate the null value, but also includes the type information. So I created a custom type that can store the type name to represent null. Again, something that should be a part of the CorValue API.

Once I have my extracted value, I need to display it in the console. This is much simpler than the extracting the value. As I wrote above, I’m not making any attempt to print a real representation for CorObjectValues. I could look at making a call ToString call to get something useful, but that requires invoking a function in the target process and I haven’t gotten that far with ipydbg yet. So I just print “<…>” if it isn’t a string, primitive or null value.

def display_value(value):
  if type(value) == str:
    return (('"%s"' % value), 'System.String')
  elif type(value) == CorObjectValue:
    return ("<...>", value.ExactType.Class.GetTypeInfo().FullName)
  elif type(value) == NullCorValue:
    return ("<None>", value.typename)
  else:
    return (str(value), value.GetType().FullName)

Now all I need is to iterate thru the list of local variables and call extract_value and display_value on each in turn and print the results. I won’t reproduce that code here, but you can see it in the ipydbg project source on GitHub.

I’m happy with what I’ve gotten working (it took several days of banging my head against the proverbial wall to get it this far) but there’s still room for improvement. First, I’d like to be able to call ToString to get a class-specific generic representation as I described above. Second, I need a way to display the fields of a CorObejctValue object. It’s just a combination of metadata reading and CorObjectValue::GetFieldValue, but that code won’t write itself. Finally, there are other Python primitives - like list, dictionary and tuple – that ipydbg should have specific knowledge of and be able to display without requiring the user to drill into the member variables and the like.


[1] While the CorValue API does certain things very well, I wish it did a better job abstracting away the existence of the various ICorDebugValue interfaces. Hence the need for all the calls to CastToWhatever().

Posted By Harry Pierson at 8:35 AM Pacific Standard Time

Wednesday, March 25, 2009

Writing an IronPython Debugger: Getting Local Variables

I just pushed out a new drop of ipydbg that includes the first cut of support for showing local variables. Getting the value for a local variable is actually pretty simple. The CorFrame object (which hangs off active_thread) includes a method to get a local variable by index as well getting a count of all local variables. The problem with these functions is that they don’t provide the name of the variable. For that, you’ve got to look in debug symbols.

From a CorFrame, you can retrieve the associated CorFunction. Since I added symbol reader support to CorModule, I added support for directly retrieving the ISymbolMethod for a CorFunction. From the method symbols, I can get the root lexical scope of the method. And from the symbol scope, I can get the locals. Scopes can be nested, so to get all the locals for a given function, you need to iterate thru all the child scopes as well.

So here’s my get_locals function:

def get_locals(frame, scope=None, offset=None, show_hidden=False): 
    #if the scope is unspecified, try and get it from the frame
    if scope == None
        symmethod = frame.Function.GetSymbolMethod() 
        if symmethod != None
            scope = symmethod.RootScope 
        #if scope still not available, yield the local variables
        #from the frame, with auto-gen'ed names (local_1, etc)
        else
          for i in range(frame.GetLocalVariablesCount()): 
            yield "local_%d" % i, frame.GetLocalVariable(i) 
          return 

    #if we have a scope, get the locals from the scope 
    #and their values from the frame
    for lv in scope.GetLocals(): 
        #always skip $site locals - they are cached callsites and 
        #not relevant to the ironpython developer
        if lv.Name == "$site": continue 
        if not lv.Name.startswith("$") or show_hidden: 
          v = frame.GetLocalVariable(lv.AddressField1) 
          yield lv.Name, v 

    if offset == None: offset = frame.GetIP()[0

    #recusively call get_locals for all the child scopes
    for s in scope.GetChildren(): 
      if s.StartOffset <= offset and s.EndOffset >= offset: 
        for ret in get_locals(frame, s, offset, show_hidden): 
          yield ret

The function is designed to automatically retrieve the scope and offset, if they’re available. That way, I can simply call get_locals with the frame argument and it does the right thing. For example, if you don’t pass in a symbol scope explicitly get_locals will attempt to retrieve the debug symbols. If debug symbols aren’t available, iterates over the locals in the frame and yields each with a fake name (local_0, local_1, etc). If the debug symbols are available, then it iterates over the locals in the scope, then calls itself for each of the child scopes (skipping child scopes who’s offset range doesn’t overlap with the current offset).

The other feature of get_locals is deciding which locals to include. As you might expect, IronPython emits some local variables that are for internal runtime use. These variables get prefixed with a dollar sign. The dollar sign is not a legal identifier character in C# or Python, but IL has no problem with it. If you pass in False for show_hidden (or use the default value), then get_locals skips over any local variables who’s name starts with the dollar sign.

Even if you pass in True for show_hidden, get_locals still skips over any variable named “$site”. $site variables are dynamic call site caches, a DLR feature that are used to efficiently dispatch dynamic calls by caching the results of previous invocations. Martin Maly’s blog has more details on these caches. As they are part of method dispatch, I never want to show them to the ipydbg user, so they get skipped regardless of the value of show_hidden.

Now that I can get the local variables for a given frame, we need to convert those variables to something you can print on the screen. That turns out to be more complicated that you might expect, so it’ll have to wait for the next post (which may be a while, given that PyCon is this weekend). In the meantime, you can get the latest version of ipydbg from GitHub.

Posted By Harry Pierson at 3:27 PM Pacific Standard Time

Saturday, March 21, 2009

Writing an IronPython Debugger: A Little Hack…err…Cleanup

Yesterday, I pushed out two commits to ipydbg. The first was simple, I removed all of the embedded ConsoleColorMgr code in favor of the separate consolecolor.py module I blogged about Thursday. The second commit…well, let’s just say it’s not quite so simple.

Last weekend, I was experimenting with breakpoints when I discovered that the MoveNext method of BreakpointEnumerator was throwing a NotImplementedException. Up to that point, I hadn’t modified any of the MDbg C# source code except to merge the corapi and raw assemblies into a single assembly. But since I had to fix BreakpointEnumerator, I figured I should make some improvements to the C# code as well. For example, I added helper functions to easily retrieve the metadata for a class or function.

In my latest commit, I’ve added a SymbolReader property to CorModule. Previously, I managed the mapping from CorModules to SymbolReaders in my IPyDebugProcess class via the symbol_readers field. However, since mapping CorModules to SymbolReaders is something pretty much any debugger app would have to do, it made more sense to have that be a part of CorModule directly. So now, you can set and retrieve the SymbolReader directly on the module. Furthermore, I moved the logic to retrieve a SymbolReader from the IStream provided in the OnUpdateModuleSymbols event into the CorModule class as well.

I wouldn’t have bothered to blog this change at all, except that if you look at how the SymbolReader property is implemented under the hood, it’s not what you would expect. Instead of having SymbolReader as an instance variable on CorModule – as you might expect -CorModule has a static dictionary mapping CorModules to SymbolReaders. The instance SymbolReader property simply then access to the underlying static dictionary.

//code taken from CorModule class in CorModule.cs
private static Dictionary<CorModule, ISymbolReader> _symbolsMap =   
                             new Dictionary<CorModule, ISymbolReader>();   

public ISymbolReader SymbolReader    
{   
    get   
    {   
        if (_symbolsMap.ContainsKey(this))   
            return _symbolsMap[this];   
        else   
            return null;   
    }   
    set   
    {   
        _symbolsMap[this] = value;   
    }   
}

Now obviously, this the way you typically implement properties. However, the problem is that there isn’t a 1-to-1 mapping between the underlying debugger COM object instances and the managed objects instances that wrap them. For example, if you look at the CorClass:Module property, it constructs a new managed wrapper for the COM interface it gets back from ICorDebugClass.GetModule. That means that I can’t store the symbol reader as an instance field in the managed wrapper since I probably will never see a given managed wrapper module instance ever again.

All of the debugger API wrapper classes including CorModule inherit from a class named WrapperBase which overrides Equals and GetHashCode. The overridden implementations defer to the wrapped COM interface, which means that two separate managed wrapper instances of the same COM interface will have the same hash code and will evaluate as equal. The upshot is that object uniqueness is determined by the wrapped COM object rather that the managed object instance itself.

Using a static dictionary to store a module instance property provides the necessary “it doesn’t matter what managed object instance you use as long as they all wrap the same COM object underneath” semantics. If I create multiple instances CorModule that all wrap the same underlying COM interface pointer, they’ll all share the same SymbolReader instance from the dictionary.

Yeah, it’s feels kinda hacky, but it works.

Posted By Harry Pierson at 3:27 PM Pacific Standard Time

Thursday, March 19, 2009

Writing an IronPython Debugger: Colorful Console

Now that I’ve added the current source code line to the console output, I wanted to start using color in order to make it clearer to understand the various pieces of data that gets output. Now, the various event handler messages get output in dark grey while the current line of source is in yellow. Here’s what it looks like on my machine (note, the top line with the green [11] is PowerShell and ipy2 is a PowerShell alias to ipy.exe v2.0.1)

ipydbg on the console

Writing color to the windows console is a hassle because of the stateful API it uses. The problem is that I always want to return to the default color after I’ve written out a line of colored text. I wish there was an overload of Console.Write and WriteLine that took the foreground and background colors as arguments. 

Of course, I could easily implement my own write and writeline methods that took color parameters. However, I was loath to do that as Python’s print statement is so convenient. So instead, I build a console color context manager. I got the idea from Luis Fallas’ XmlWriter context manager.

class ConsoleColorMgr(object): 
  def __init__(self, color): 
    self.color = color 

  def __enter__(self): 
    self.temp = Console.ForegroundColor 
    Console.ForegroundColor = self.color 
     
  def __exit__(self, t, v, tr): 
    Console.ForegroundColor = self.temp 

CCDarkGray = ConsoleColorMgr(ConsoleColor.DarkGray)
CCGray     = ConsoleColorMgr(ConsoleColor.Gray)
CCYellow   = ConsoleColorMgr(ConsoleColor.Yellow)

def OnCreateAppDomain(self, sender,e): 
    with CCDarkGray: 
      print "OnCreateAppDomain", e.AppDomain.Name 
    e.AppDomain.Attach()

Python’s with statement is similar to C#’s using statement. However, unlike IDisposable object, Python context managers support both an enter and exit method. This means I don’t have to construct an object in order to get a context (in this case, the console colors) managed. So far, I’ve got three console color context managers defined – Grey, DarkGrey and Yellow. I’m thinking that ConsoleColorMgr is a candidate for my assorted module collection at some point.

Now that I can print in color, I wanted to modify my line printer to use color. Usually, the current sequence point corresponds to an entire line of python source. But as we see below, sometimes only part of a given line of source text is associated with a given sequence point.

image

The other issue I ran into is that there’s a always a sequence point at the very end of a function. Unlike the break at the start of the function I wrote about in my last post, this one I didn’t want to automatically step over. This is the last breakpoint for a given scope, so I should give the user one last chance to inspect the scope (once I add the ability to do that, at any rate) before we step out of it. However, I wanted a way of showing that we’re about to step out in the source code line view. I decided on writing a series of carets ^^^ to indicate that we’re at the end of a function.

image

As you can see in the dark grey line in the screenshot above, the current sequence point starts and ends at line 4 column 23. Column 23 is beyond the end of line 4, so that’s what I look for in order to draw the three carets. Here’s the final version of _print_source_line:

def _print_source_line(self, sp, lines):
  line = lines[sp.start_line-1]
  with CCGray:
    Console.Write("%d: " % sp.start_line)
    Console.Write(line.Substring(0, sp.start_col-1))
    with CCYellow:
      if sp.start_col > len(line):
        Console.Write(" ^^^")
      else:
        Console.Write(line.Substring(sp.start_col-1,
                                     sp.end_col - sp.start_col))
    Console.WriteLine(line.Substring(sp.end_col-1))

So colorizing the current line of source code turned out to be a little harder than I had expected. But hey, I got a start of a reusable module out of it. That’s pretty cool. Anyway, the latest bits are, as always, up on GitHub.

Posted By Harry Pierson at 2:48 PM Pacific Standard Time

Writing an IronPython Debugger: Showing Source Code

It’s been almost a week since my last ipydbg post. I’m not done, I just needed to catch my breath for a few days and get some other work done. Contrary to popular believe, my day job revolves around more than just ipydbg! :)

Actually, I’ve made ten commit since my last post, but it’s been a mostly minor changes. For example, I was hacking around with breakpoints and restored a bunch of commented out code in BreakpointEnumerator. Since I was changing the original C# CorDebug wrapper source, I decided to add a few helper functions to return metadata for functions and classes as well as cleaning up some C# filenames. On the Python side, I added an active_appdomain field to IPyDebugProcess to go along with active_thread.

Today, I added what started as a fairly minor feature – showing the current line of source code at the start of the input loop. The initial code for this was cake, simply getting the sequence point for the current location and mapping that to a source file. In order to avoid hitting the file system over and over, I cache source files the first time they are accessed.

def _get_file(self,filename):
    filename = Path.GetFileName(filename)
    if not filename in self.source_files:
      self.source_files[filename] = File.ReadAllLines(filename)
    return self.source_files[filename] 

def _input(self):
    offset, sp = self._get_location(self.active_thread.ActiveFrame)
    lines = self._get_file(sp.doc.URL)
    print "%d:" % sp.start_line, lines[sp.start_line-1]
    #input loop ommited for clarity   

However, when I did this, I discovered a slight issue. When you step into a Python function, the CLR debugger breaks at the very beginning of the function being stepped into. In C#, the function start is mapped to the opening curly brace of the function. IronPython, on the other hand, doesn’t map the start of the function to anything since there’s a bunch of infrastructure code at the start of every function that has no correlation to the python source. This means _get_location return a null sequence point when I first step into a function and thus I wouldn’t be able to show any source code.

I could make the argument that start of the function should be mapped to the colon that starts the function block. However, I’m not in a position to make changes to how the shipping version of IronPython emits debug symbols. So instead, I decided to insert an automatic step whenever I step into a function by modifying OnStepComplete:

def OnStepComplete(self, sender,e):
    offset, sp = self._get_location(e.Thread.ActiveFrame)
    print "OnStepComplete Reason:", e.StepReason, \
           "Location:", sp if sp != None else "offset %d" % offset
    if e.StepReason == CorDebugStepReason.STEP_CALL:
      self._do_step(e.Thread, False)
    else:
      self._do_break_event(e)

I have this nagging feeling that a simple step won’t suffice and I’ll need to add logic to ensure that I’m only auto-stepping when the start of the function doesn’t have a matching sequence point. But I have tested this with a few different python scripts and it appears to work fine. If I need something more sophisticated, I can always add it later. BTW, notice I modified the signature of _do_step so that it takes the thread as an argument rather than picking it up as an IPyDebugProcess field.

As usual, latest ipydbg (including new compiled version of CorDebug.dll) is available at GitHub.

Posted By Harry Pierson at 1:58 PM Pacific Standard Time

Friday, March 13, 2009

Writing an IronPython Debugger: Debugging Just My Code

As I wrote last time, in order to make debug stepping actually useful in ipydbg I need to avoid stepping into frames that are part of the IronPython infrastructure. I did something similar when I hide infrastructure frames in the stack trace. Originally, I had planned to automatically stepping again if we ended up on a frame that didn’t correspond to a python file. However, Mike Stall showed me a much cleaner and better performing solution: Just My Code. As I mentioned at the start of this series, support for JMC is one of the main reasons I wanted to build my own debugger rather than use MDbg.

Enabling JMC in the stepper object is trivial:

def create_stepper(thread, JMC = True):
  stepper = thread.ActiveFrame.CreateStepper()
  stepper.SetUnmappedStopMask(CorDebugUnmappedStop.STOP_NONE)
  stepper.SetJmcStatus(JMC) 
  return stepper

If I make that single change and run ipydbg, any step effectively turns into a full continue since none of the code has been marked as “My Code” yet. As you see, the tricky part of JMC isn’t enabling it on the stepper, it’s “painting” the parts of the code where you want JMC stepping to work. You can set JMC status at the module, class or the method level. In the case of ipdbg, it’s easiest to work at the class level:

infrastructure_methods =  ['TryGetExtraValue',     
    'TrySetExtraValue',     
    '.cctor',     
    '.ctor',     
    'CustomSymbolDictionary.GetExtraKeys',     
    'IModuleDictionaryInitialization.InitializeModuleDictionary']    

def OnClassLoad(self, sender, e):
    cmi = CorMetadataImport(e.Class.Module)
    mt = cmi.GetType(e.Class.Token)
    print "OnClassLoad", mt.Name

    if not e.Class.Module.IsDynamic:
      e.Class.JMCStatus = False
    elif mt.Name.startswith('IronPython.NewTypes'):
      e.Class.JMCStatus = False
    else:
      e.Class.JMCStatus = True
      for mmi in mt.GetMethods():
        if mmi.Name in infrastructure_methods:
          f = e.Class.Module.GetFunctionFromToken(mmi.MetadataToken)
          f.JMCStatus = False

OnClassLoad is where the action is. This event handler is responsible for enabling JMC for all class methods that map to python code. To understand how the logic in OnClassLoad works, you need to understand a little about the .NET types and code that IronPython generates. Note, the following description is for the IronPython 2.0 branch. Code generation evolves from release to release and I know for a fact there are changes in the upcoming 2.6 version. I assume that I’ll eventually have to sniff the IronPython version in order to set JMC correctly.

Today, IronPython generates all code into dynamic modules and methods. Since I want to limit stepping to python code only, I automatically disable JMC for non-dynamic modules. I can imagine a scenario where I want to step into non-dynamically generated code, but I think the best way to handle that would be to disable JMC at the stepper rather than widening the amount of code marked as JMC enabled.

For every module that gets loaded, IronPython generates a type. At a minimum you’re going to load two modules: site.py and whatever python script you ran. If you have the python standard library installed, site.py loads a bunch of other modules as well. Each of these module types have a bunch of standard methods that always get generated. For example, the global scope code in the module is placed in a static method on the module type called Initialize. Any python functions you define get generated static methods with mangled names on the module type [1]. All these methods have corresponding python code and should be JMC enabled. The other standard methods on a module type should not be JMC enabled. So in my debugger, I mark the class as JMC enabled but then iterate over the list of methods and mark any in the list of standard methods (except for Initialize) as JMC disabled.

Of course, you can also create classes in Python. As you might expect, classes in Python are generated as .NET types. However, the semantics of Python classes are very different than .NET types. For example, you can change the inheritance hierarchy of python classes at runtime. That’s obviously not allowed for .NET types. So the .NET types we generate have all the logic to implement Python class semantics. As it turns out, these .NET types *only* have the logic to implement Python class semantics, which is to say they have *none* of Python class methods code. This makes sense when you think about it – since Python can add and remove methods from a class at runtime, IronPython can’t put the method code in the .NET type itself. Instead, Python class methods are generated as static methods on the module type, just like top-level functions are. Since Python class types only contain Python class semantics logic, we never want to enable JMC for Python class types. Python class types get generated in the IronPython.NewTypes namespace, so it’s fairly easy to check the class name in OnClassLoad and automatically disable JMC for classes any in that namespace.

Adding JMC support makes ipydbg significantly more usable. It’s almost like a real tool now, isn’t it? Latest bits are up on GitHub.


[1] FYI, IronPython generates python functions as dynamic methods in release mode and static module class methods in debug mode since you can’t step into dynamic methods. The description above is specific to debug mode since ipydbg exclusively runs in debug mode.

Posted By Harry Pierson at 3:43 PM Pacific Standard Time

Writing an IronPython Debugger: Stepping Thru Code

So far, I’ve written seven posts about my IronPython debugger, but frankly it isn’t very functional yet. It runs, breaks on the first line and can show a stack trace. Not exactly Jolt award material. In this post, I’m going to add one of the core functions of any debugger: stepping. Where previously I’ve written a bunch of code but had little to show in terms of features, now I’m getting three new features (basic step, step in and step out) at once!

def _input(self):
  #remaining _input code omitted for clarity
  elif k.Key == ConsoleKey.S:
      print "\nStepping"
      self._do_step(False)
      return
  elif k.Key == ConsoleKey.I:
      print "\nStepping In"
      self._do_step(True)
      return                
  elif k.Key == ConsoleKey.O:
      print "\nStepping Out"
      stepper = create_stepper(self.active_thread)
      stepper.StepOut()

def _do_step(self, step_in):
  stepper = create_stepper(self.active_thread)
  mod = self.active_thread.ActiveFrame.Function.Module
  if mod not in self.symbol_readers:
      stepper.Step(step_in)
  else:
    range = get_step_ranges(self.active_thread, self.symbol_readers[mod])
    stepper.StepRange(step_in, range)

Here you can see the _input clauses for step, step in and step out. Of the three, step out is the simplest to implement: create the stepper object and call StepOut. For step and step in, I could simply call Step (the boolean argument indicates if you want to step into or over functions) but that only steps a single IL statement. The vast majority of the time there are multiple IL instructions for every line of source code, so IL statement stepping is very tedious. As we learned when setting a breakpoint, debug symbols contain sequence points that map between source and IL locations. If they’re available, I use the sequence points to determine the range of IL statements to step over so that I can step single source statements instead.

The stepping code above depends on three helper functions defined at global scope.

def create_stepper(thread):
  stepper = thread.ActiveFrame.CreateStepper()
  stepper.SetUnmappedStopMask(CorDebugUnmappedStop.STOP_NONE)
  return stepper 
  
def create_step_range(start, end):
  range = Array.CreateInstance(COR_DEBUG_STEP_RANGE, 1)
  range[0] = COR_DEBUG_STEP_RANGE(startOffset = UInt32(start),
                                  endOffset = UInt32(end))
  return range
  
def get_step_ranges(thread, reader):
    frame = thread.ActiveFrame
    offset, mapResult = frame.GetIP()
    method = reader.GetMethod(SymbolToken(frame.FunctionToken))
    for sp in get_sequence_points(method):
        if sp.offset > offset:
            return create_step_range(offset, sp.offset)
    return create_step_range(offset, frame.Function.ILCode.Size)          

The first function, create_stepper, simply constructs and configures the stepper object. The call to SetUnmappedStopMask tells the debugger not to stop if it encounters code that can’t be mapped to IL. If you need to debug at that level, ipydbg is *not* for you.

Next is create_step_range, which exists purely for .NET interop purposes. There are three interop warts hidden in this function. First is creating a .NET array of COR_DEBUG_STEP_RANGE structs. Every time I write Array code like this, I wish for a CreateFromCollection static method on Array. However, in this case it isn’t that big a deal since it’s a one element array. Second wart is having to set the values of COR_DEBUG_STEP_RANGE via constructor keyword arguments. It turns out that IronPython disallows direct updates to value type fields (read this for the reason why). Instead, I pass in the field values into the constructor as keyword arguments. Finally, you have to explicitly convert the start and end offsets to a unsigned int in order to set the offset fields in the COR_DEBUG_STEP_RANGE struct constructor.

Finally is get_step_ranges, which iterates thru the list of sequence points in the current method looking for the one with the smallest offset that is larger than the current offset position. If it can’t find a matching sequence point, it sets the range to the end of the current function. The start range offset is always the current offset. I did make a significant change to get_sequence_points – it no longer yields sequence points that have a start line of 0xfeefee. By convention, that indicates a sequence point to be skipped. Originally, the logic to ignore 0xfeefee sequence points was in get_location. But when I originally wrote get_step_ranges, it had essentially the same sequence point skipping logic, so I moved it to get_location instead.

Technically, I’ve built three new features but the reality is that if you end up in IronPython infrastructure code it’s really hard to find your way back to python code. Step in is particularly useless right now. Luckily, the .NET debugger API supports a feature called “Just My Code” that will make stepping much more useful. In the meantime, the latest version of ipydbg is up on GitHub as usual.

Posted By Harry Pierson at 9:31 AM Pacific Standard Time

Wednesday, March 11, 2009

Writing an IronPython Debugger: Refactoring

When we last left ipydbg, it was up to about 200 lines of code. Not bad in terms of overall length, but I started to detect some code smell. I was relying pretty heavily on global variables and the structure of my code made it difficult to control how the debugger was run. I wanted to change ipydbg so it would automatically spin up an MTA thread if I forgot to add the –X:MTA command line parameter. But since by debugger and process objects were global, they’d get created on the main thread of ipydbg, regardless if it was STA or MTA. So for this “release” (I’d say I’m almost to version 0.0.0.1), I decided on focusing on enginering and refactoring rather than new features.

The big new addition is the IPyDebugProcess class, which is clearly the workhorse of the application. All of the previously global variables are now class instance variables on IPyDebugProcess. Input and run along with all the event handlers as well as do_break_event and get_location are now class methods, as they need to access instance variables (setting the break event, accessing the symbol reader dictionary, etc.). Functions that didn’t need to access instance variables (get_sequence_points, create_breakpoint, get_dynamic_frames and get_method_info_for_frame) I left as top-level functions. If they get more complex, I may break them out into their own modules, but for now I left them in ipydbg.py.

The conversion process was fairly trivial. I had to add “self.” lots of places and change the indention level all over but that was pretty much it. Once I finished the conversion, I was able to add the run_debugger function to handle the thread creation, if necessary.

def run_debugger(py_file):
    if Thread.CurrentThread.GetApartmentState() == ApartmentState.STA:
        t = Thread(ParameterizedThreadStart(run_debugger))
        t.SetApartmentState(ApartmentState.MTA)
        t.Start(py_file)
        t.Join()   
    else:
        p = IPyDebugProcess()
        p.run(py_file)

if __name__ == "__main__":        
    run_debugger(sys.argv[1])        

Originally, I tried to put this logic in IPyDebugProcess.run. However, since I’m creating the debugger object in the __init__ function, that meant it would be created on the wrong thread. I could have moved the debugger creation to the run method or move the thread management code to __init__, but I decided to factor that logic into a separate function completely. Felt cleaner that way.

Posted By Harry Pierson at 7:42 PM Pacific Standard Time

Monday, March 09, 2009

Writing an IronPython Debugger: Dynamic Stack Trace

Now that I can interact with my debugger, it’s time to add a command. I decided to start with something simple – or at least something I thought would be simple - printing a stack trace.

In the unmanaged debugger API, threads have the concept of both stack chains and stack frames. A stack chain represents a segment of the physical stack. In a typical managed app, you’ll have at least two stack chains: the unmanaged stack chain and the managed stack chain. You can interate through the stack chains for a given thread via the Chains property. However, ipydbg is a managed only debugger, so I can ignore the unmanaged stack chain. Instead, I just retrieve the current (managed) chain via the thread’s ActiveChain property.

Within a managed stack chain, there is a collection of stack frames. This is the call stack that managed developers are typically used to working with. It turns out that printing a raw stack trace is very easy to do. Here was my first stab at it:

elif k.Key == ConsoleKey.T:
  print "\nManaged Stack Trace"
  for f in active_thread.ActiveChain.Frames:
    offset, sp = get_location(f)
    metadata_import = CorMetadataImport(f.Function.Module)
    method_info = metadata_import.GetMethodInfo(f.FunctionToken)
    print "  ", \
      "%s::%s --" % (method_info.DeclaringType.Name, method_info.Name), \
      sp if sp != None else "(offset %d)" % offset

This elif block is part of the input method I showed last time. It loops thru the frames in the Active Chain of the active thread and prints some data to the console. As I said, pretty easy. Of course, the devil is in the details.

First detail I should call out is that active_thread variable. As per Mike Stall, “there is no notion of "active thread" in the underlying debug APIs. It's purely a construct in a debugger UI to make it easier for end-users.” My console based UI may be rudimentary, but it’s still a UI. Events like OnBreakpoint include the active thread as a event argument, so I stash that away in a variable so it’ll be available to the input loop.

Second detail is the call to get_location. When we last saw get_location, it was returning a formatted string. Since my last post, I’ve refactored the code so it returns the raw location data – a tuple of the raw IP offset and the associated sequence point, if available. I’ve also added a __str__ method to my sequence point object, so when I print it to the console, I get the filename and line nicely formatted.

Finally, there’s all CorMetadataImport code. In addition to wrapping the unmanaged debugger API, CorDebug also wraps the unmanaged metadata API. This code lets me get MethodInfo compatible view of the function metadata for a given stack frame. I use it here to get the type and function name for each frame on the stack.

The end result looks something like this. Note, I’ve replaced “Microsoft.Scripting” with “MS.Scripting” to avoid word wrapping.

OnBreakpoint Initialize Location: simpletest.py:1 (offset: 84)
» t
Managed Stack Trace
   S$2::Initialize simpletest.py:1 (offset: 84)
   MS.Scripting.Runtime.OptimizedScriptCode::InvokeTarget (offset 72)
   MS.Scripting.ScriptCode::Run (offset 0)
   IronPython.Hosting.PythonCommandLine::RunFileWorker (offset 77)
   IronPython.Hosting.PythonCommandLine::RunFile (offset 15)
   MS.Scripting.Hosting.Shell.CommandLine::Run (offset 46)
   IronPython.Hosting.PythonCommandLine::Run (offset 240)
   MS.Scripting.Hosting.Shell.CommandLine::Run (offset 74)
   MS.Scripting.Hosting.Shell.ConsoleHost::RunCommandLine (offset 158)
   MS.Scripting.Hosting.Shell.ConsoleHost::ExecuteInternal (offset 32)
   MS.Scripting.Hosting.Shell.ConsoleHost::Execute (offset 63)
   MS.Scripting.Hosting.Shell.ConsoleHost::Run (offset 390)
   PythonConsoleHost::Main -- (offset 125)

As we can see, we may be on the first line of the python script, but we’ve got a pretty deep stack trace already. Everything but the top-most frame are from the underlying IronPython implementation. Those extra frames obscure the stack frames I actually care about, so it would be nice to hide any stack frames from IronPython or the DLR. It’s easy enough to write a python generator function that filters out frames that from the DLR or IronPython namespaces. In order to get the type name, we need the method_info like we did above. I’ve factored that code into a separate function in order to avoid code duplication.

def get_method_info_for_frame(frame)
    if frame.FrameType != CorFrameType.ILFrame:
      return None
    metadata_import = CorMetadataImport(frame.Function.Module)
    return metadata_import.GetMethodInfo(frame.FunctionToken)
    
def get_dynamic_frames(chain):
  for f in chain.Frames:
    method_info = get_method_info_for_frame(f)
    if method_info == None:
      continue
    typename = method_info.DeclaringType.Name
    if typename.startswith("Microsoft.Scripting.") \
      or typename.startswith("IronPython.") \
      or typename == "PythonConsoleHost":
        continue
    yield f

You’ll notice I’ve added a guard to get_method_info_for_frame in order to ensure that the frame argument is an IL Frame. There are three types of stack frames in the debugger API: IL, native and internal. Most of the frames we’re dealing with are IL frames, but you do run into the occasional lightweight function (i.e. DynamicMethod) frame when debugging IronPython code. Typically, IronPython generates DynamicMethods for all python code except for a few cases related to .NET interop. However, you can’t debug DynamicMethods, so when you run with –D, we generate normal non-dynamic methods instead. However, even when running with –D, we still use DynamicMethods for call site dispatch. Since they’re an implementation detail, we want to filter those out in get_dynamic_frames too.

This gives us a much more manageable stack trace:

OnBreakpoint Initialize Location: simpletest.py:1 (offset: 84)
» t
Stack Trace
   S$2::Initialize -- simpletest.py:1 (offset: 84)

As usual, the latest ipydbg source is up on GitHub.

Posted By Harry Pierson at 2:10 PM Pacific Standard Time

Wednesday, March 04, 2009

Writing and IronPython Debugger: Adding Interactivity

Now that ipydbg can set a breakpoint, it’s time to add some interactivity to the app. MDbg supports dozens of commands and currently ipydbg supports none. I’d love for ipydbg to support a wide range of commands like MDbg does, but for now let’s keep it simple and start with two: Continue and Quit. These aren’t very interesting as commands go, but that lets me focus this blog post on adding basic interactivity and future posts on specific commands.

First off, we have to understand how the CorDebug managed API supports interactivity. As we’ve seen, callbacks into the debugger are surfaced as managed events. If we look at the base class for all the debugger event arguments, we see that it exposes a Continue property. If you want the debugger to automatically continue after the event handler finishes running, you set the Continue property to true (which is the default). If you want the debugger to stay paused while you provide the developer a chance to poke around, you set Continue to false. In that case, the debugger stays paused until call process.Continue explicitly.

Once we set the Continue property to false, we need a mechanism to signal the main thread of execution that it’s time to wake up and ask the user what they want to do next. Of course, that’s what WaitHandle and it’s descendents are for. In fact, we’re already using an AutoResetEvent in OnProcessExit to signal that the debugged app has exited so we should exit the debugger. However, now we have two different signals that we want to send: exit the debugger or enter the input loop. I decided to differentiate by using two separate AutoResetEvents:

terminate_event = AutoResetEvent(False
break_event = AutoResetEvent(False

def OnProcessExit(s,e): 
  print "OnProcessExit" 
  terminate_event.Set() 

def OnBreakpoint(s,e): 
  print "OnBreakpoint", get_location( 
    symbol_readers[e.Thread.ActiveFrame.Function.Module], e.Thread) 
  e.Continue = False 
  break_event.Set() 

#code to create debugger and process omitted for clarity

handles = Array.CreateInstance(WaitHandle, 2
handles[0] = terminate_event 
handles[1] = break_event 

while True
  process.Continue(False

  i = WaitHandle.WaitAny(handles) 
  if i == 0
    break 

  input()

Instead of a single call to process.Continue I had before, I’ve created an infinite “while True” loop that calls Continue, waits for one of the events to signal, then either exits the loop of enters the input loop (via the input function). Since there are two AutoResetEvents, I need to use the WaitAny method to wait for one of them to signal. WaitAny takes an array, which is kind of clunky to use from IronPython since the array has to be strongly typed. It would be much more pythonic if I could call WaitHandle.WaitAny([terminate_event, break_event]). WaitAny then returns an index into the array indicating which one received the signal. If it was the terminate_event that signaled, I exit the loop (and the application). Otherwise, I enter the input loop. Notice, by the way, in OnBreakpoint that I’m both setting Continue to false and signaling the break_event.

The “input loop” needs to be a loop because the user may want to type in multiple commands before letting the debugged app continue to execute. This means that the input function is implemented as another “while True” loop. When the user does chooses a command that implies the process should continue, I simply exit out of the input function and the outer “while True” loop above executes the continue and waits for a signal.

Here’s what the input function looks like right now with our two basic commands:

def input():
  while True:
    Console.Write("» ")
    k = Console.ReadKey()
    
    if k.Key == ConsoleKey.Spacebar:
      Console.WriteLine("\nContinuing")
      return 
    elif k.Key == ConsoleKey.Q:
      Console.WriteLine("\nQuitting")
      process.Stop(0)
      process.Terminate(255)
      return
    else:
      Console.WriteLine("\n Please enter a valid command")

I’ve mapped “q” to quit the debugger and spacebar to continue. Since I’m using Console ReadKey, you only have to type the key in question – no return needed. For continue, we don’t do anything but exit the input loop by returning. Continue gets called as part of the other loop and since we haven’t/can’t add additional breakpoints the debugged app will run until it ends. For quit, I call the Terminate method on process, hard coding the return value to 255. However, Terminate implicitly continues the debugged process. Since you can’t continue a running process, the call to Continue in the outer loop throws an exception. I avoid this exception by adding the call to Stop before Terminate. As per the Stop docs, the debugger maintains a “stop counter” and only resumes the debugged process when the counter reaches zero.  Calling Stop increases the stop counter by one, calling Terminate decreases it by one, then the outer loop Continue  call decreases it to zero and the process continues, terminates and fires the OnProcessExit event handler as usual.

Now that we have a basic interactive loop, I’ll be able to add more interesting commands. I’m guessing at some point, I’ll need to refactor input a bit – I’m guessing a huge if/elif/else statement is going to get ugly fast, but I’ll worry about that when it gets out of hand. As usual, the latest ipydbg source is up on GitHub.

Posted By Harry Pierson at 2:06 PM Pacific Standard Time

Monday, March 02, 2009

Writing an IronPython Debugger: Setting a Breakpoint

Now that we have a debugger process up and running, let start adding some actual features. First up, we want to be able to set breakpoints. One of the nice things MDbg does is auto-set a breakpoint on the entrypoint function. For ipydbg, we’re going to auto-set a breakpoint on the first line of the python file being debugged.

In order to set a breakpoint, we need debugger symbols. They allow us to translate between “line one of simpletest.py” and the actual location in the code and back. We’re all used to seeing the PDB files that are produced when we compile a C# assembly. Unsurprisingly, the symbol store binder provides a method to load these PDB files from disk. But where do IronPython debug symbols come from? I know from my extensive reading of the ipy.exe command line parameters that you pass –D to enable application debugging, but since all the IL is being generated in memory, how does the debugger get access to the PDB files?

It turns out the debugger API includes a UpdateModuleSymbols callback method that the runtime uses to notify the debugger when the symbols change. The debugger symbols are provided in an IStream, and then you use the symbol binder to get a symbol reader. The .NET Framework already provides a managed API for reading and writing debug symbols. However, that API doesn’t support loading symbols from a stream, so the MDbg code includes it’s own wrapper around the symbol binder API to include that functionality. Here’s some code to get the debug symbol reader for an updated module and iterate through the associated files:

sym_binder = SymbolBinder()  
    
def OnUpdateModuleSymbols(s,e):  
  print "OnUpdateModuleSymbols"  
    
  metadata_import = e.Module.GetMetaDataInterface[IMetadataImport]()  
  reader = sym_binder.GetReaderFromStream(metadata_import, e.Stream)  

  for doc in reader.GetDocuments():   
    print "\t", doc.URL

process.OnUpdateModuleSymbols += OnUpdateModuleSymbols

If we run this version of ipydbg on simpletest.py with the IPy 2.0.1 release and the Python standard library installed, OnUpdatedModuleSymbols gets called six times, once for each python file that gets loaded when simpletest runs. (site.py, os.py, ntpath.py, stat.py, UserDict.py and simpletest.py). BTW, I tried running this code on the latest build of IPy (changeset 47624) and I’m getting a COM Interop exception. So for now, stick with 2.0.1.

Now that we can get these dynamically generated debug symbols, we can use them to create a breakpoint on the first line of the script being debugged. Everytime OnUpdateModuleSympols is called, I try to bind the initial breakpoint (unless it’s already been bound of course) by calling the following create_breakpoint function.

def create_breakpoint(doc, line, module, reader):
  line = doc.FindClosestLine(line)
  method = reader.GetMethodFromDocumentPosition(doc, line, 0)
  function = module.GetFunctionFromToken(method.Token.GetToken())
  
  for sp in get_sequence_points(method):
    if sp.doc.URL == doc.URL and sp.start_line == line:
      bp = function.ILCode.CreateBreakpoint(sp.offset)
      bp.Activate(True)
      return bp
      
  bp = function.CreateBreakpoint()
  bp.Activate(True)
  return bp

This code translates a given document/line into a function/offset where we can set a breakpoint. To do this, we use sequence points which as per Rick Byers are “used to mark a spot in the IL code that corresponds to a specific location in the original source”. So once we find the function that corresponds to a given line of code, we iterate over the sequence points until we find the one that matches the line we want to break on. If we find a matching sequence point, we set the breakpoint there. If we don’t, we set the breakpoint on the function itself. get_sequence_points is a simple wrapper around ISymbolMethod GetSequencePoints. The original API is pretty ugly to use – managing six separate arrays of information – so get_sequence_points turns it into a generator function you can iterate over.

Now that the breakpoint is set, we want to trap the breakpoint event as well. That’s easy enough, we create an event handler for process.OnBreakpoint similar to the OnUpdateModuleSymbols event above. Eventually, we’ll have the ability to step when we break, but for now I’m just going to print out the current location when the breakpoint is hit. This is kind of the reverse of the operation above. Setting a breakpoint means going from a source location to an IL offset within a function. Printing the current location means going from an IL offset in a function back to the source location. Here’s the function to do that:

def get_location(reader, thread): 
  frame = thread.ActiveFrame 
  function = frame.Function 
   
  offset, mapping_result = frame.GetIP() 
  method = reader.GetMethod(SymbolToken(frame.Function.Token)) 
   
  real_sp = None 
  for sp in get_sequence_points(method): 
    if sp.offset > offset:  
      break 
    if sp.start_line != 0xfeefee:  
      real_sp = sp 
       
  if real_sp == None
    return "Location (offset %d)" % (offset) 
   
  return "Location %s:%d (offset %d)" % ( 
    Path.GetFileName(real_sp.doc.URL), real_sp.start_line, offset) 

def OnBreakpoint(s,e):
  print "OnBreakpoint", get_location(
    symbol_readers[e.Thread.ActiveFrame.Function.Module], e.Thread)

Given a symbol reader and a debug thread, get_location returns a location string. It loops thru the sequence points, similar to create_breakpoint, in order to find the closest corresponding line of python code to the current offset (check out Mike Stall’s blog as for why I’m checking for 0xfeefee). In order to make this work, I need the symbol reader for the module that I retrieved in OnUpdateModuleSymbols. For now, I’m stashing the reader in a global dictionary keyed by the module named symbol_readers where OnBreakpoint can access it.

Ipydbg isn’t interactive yet, but it is now running, setting a breakpoint and successfully breaking at that breakpoint. As usual, the latest version of ipydbg is up on GitHub.

Posted By Harry Pierson at 3:59 PM Pacific Standard Time

Friday, February 27, 2009

Writing an IronPython Debugger: Hello, Debugger!

Since I’m guessing most of my readers have never build a debugger before (I certainly hadn’t), let’s start with the debugger equivalent of Hello, World!

import clr  
clr.AddReference('CorDebug')  

import sys  
from System.Reflection import Assembly  
from System.Threading import AutoResetEvent  
from Microsoft.Samples.Debugging.CorDebug import CorDebugger  

ipy = Assembly.GetEntryAssembly().Location  
py_file = sys.argv[1]  
cmd_line = "\"%s\" -D \"%s\"" % (ipy, py_file)  

evt = AutoResetEvent(False)  

def OnCreateAppDomain(s,e):  
  print "OnCreateAppDomain", e.AppDomain.Name  
  e.AppDomain.Attach()  

def OnProcessExit(s,e):  
  print "OnProcessExit"  
  evt.Set()  

debugger = CorDebugger(CorDebugger.GetDefaultDebuggerVersion())  
process = debugger.CreateProcess(ipy, cmd_line)  

process.OnCreateAppDomain += OnCreateAppDomain  
process.OnProcessExit += OnProcessExit  

process.Continue(False)  

evt.WaitOne()

I start by adding a reference to the CorDebug library I discussed at the end of my last post (that’s the low level managed debugger API plus the C# definitions of the various COM APIs). Then I need both the path to the IPy executable as well as the script to be run, which is passed in on the command line (sys.argv). For now, I just use Reflection to find the path to the current ipy.exe and use that. I use those to build a command line – you’ll notice I’m adding the –D on the command line to generate debugger symbols.

Next, I define two event handlers: OnCreateAppDomain and OnProcessExit. When the AppDomain is created, the debugger needs to explicitly attach to it. When the process exits, we signal an AutoResetEvent to indicate our program can exit.

Then it’s a simple process of creating the CorDebugger object, creating a process, setting up the process event handlers and then running the process via the call to Continue. We then wait on the AutoResetEvent for the debugged process to exit. And voila, you have the worlds simplest debugger in about 30 lines of code.

To run it, you run the ipy.exe interpreter and pass in the ipydbg script above and the python script to be debugged. You also have to pass –X:MTA on the command line, as the ICorDebug objects only work from a multi-threaded apartment. When you run it, you get something that looks like this:

» ipy -X:MTA ipydbg.py simpletest.py
OnCreateAppDomain DefaultDomain
35
OnProcessExit

Simpletest.py is a very simple script that prints the results of adding two numbers together. Here, you see the event handlers fire by writing text out to the console.

For those of you who’d like to see this code actually run on your machine, I’ve created an ipydbg project up on GitHub. The tree version that goes with this blog post is here. If you’re not running Git, you can download a tar or zip of the project via the “download” button at the top of the page. It includes both the CorDebug source as well as the ipydbg.py file (shown above) and the simpletest.py file. It also has a compiled version of CorDebug.dll, so you don’t have to compile it yourself (for those IPy only coders who don’t have VS on their machine).

Posted By Harry Pierson at 5:41 PM Pacific Standard Time

Writing an IronPython Debugger: MDbg 101

Before I start writing any debugger code, I thought it would help to quickly review the .NET debugger infrastructure that is available as well as the design of the MDbg command line debugger. Please note, my understanding of this stuff is fairly rudimentary – Mike Stall is “da man” if you’re looking for a .NET debugger blogger to read.

The CLR provides a series of unmanaged APIs for things like hosting the CLR, reading and writing CLR metadata and – more relevant to our current discussion – debugging as well as reading and writing debugger symbols. These APIs are exposed as COM objects. The CLR Debugging API allows you to do those all the things you would expect to be able to do in a debugger: attach to processes (actually, app domains), create breakpoints, step thru code, etc. Of course, being an unmanaged API, it’s pretty much unavailable to be used from IronPython. Luckily, MDbg wraps this unmanaged API for us, making it available to any managed language, including IronPython.

The basic design of MDbg looks like this:

image

At the bottom is the “raw” assembly, which contains the C# definitions of the unmanaged debugger API – basically anything that starts with ICorDebug and ICorPublish. Raw also defines some of the metadata API, since that’s how type information is exposed to the debugger.

The next level up is the “corapi” assembly, which I refer to as the low-level managed debugger API. This is a fairly thin layer that translates the unmanaged paradigm into something more palatable to managed code developers. For example, COM enumerators such as ICorDebugAppDomainEnum are exposed as IEnumerable types. Also, the managed callback interface gets exposed as .NET events. It’s not perfect – the code is written in C# 1.0 style so there are no generics or yields.

Where corapi is the low-level API, “mdbgeng” is the high-level managed debugger API. As you would expect, it wraps the low-level API and provides automatic implementations of common operations. For example, this layer maintains a list of breakpoints so you can create them before the relevant assembly has been loaded. Then when assemblies are loaded, it goes thru the list of unbound breakpoints to see if any can be bound. It’s also this layer that automatically creates the main entrypoint breakpoint.

Finally, at the top we have the MDbg application itself, as well as any MDbg extensions (represented by the … in the diagram above). The mdbgext assembly defines the types shared between MDbg.exe and the extension assemblies. MDbg has some cool extensions – including an IronPython extension – but for now I’m focused on building something as lightweight as possible, so I’m going to forgo an extensibility mechanism, at least for now.

My initial prototype was written against the high-level API. There were two problems with this approach. The first is that there’s no support for Just My Code in the high-level API. As I mentioned in my last post, JMC support is critical for this project. Adding JMC support isn’t hard, but I’m trying to make as few changes as possible to the MDbg source, since I’m not interested in forking and maintaining that code. Second, while the low-level API provides an event-based API (OnModuleLoad, OnBreakpoint, OnStepComplete, etc), the high-level API provides a more console-oriented looping API. I found the event-driven API to be cleaner to work with and I’m thinking it will work better if I ever build a GUI version of ipydbg. So I’ve decided to work against the low-level API (aka corapi).

I mentioned above that I didn’t want to change the MDbg source, but I did make one small change. The separation of corapi and raw into two separate assemblies is an outdated artifact of an earlier version of MDbg. So I decided to combine these two into a single assembly called CorDebug. Other than some simple cleanup to assembly level attributes to make a single assembly possible, I haven’t changed the source code at all.

Posted By Harry Pierson at 3:33 PM Pacific Standard Time

Writing an IronPython Debugger: Introduction

A while back I showed how you can use Visual Studio to debug IronPython scripts. While that works great, it’s lots of steps and lots of mouse work. I yearned for something lighter weight and that I could drive from the command line.

The .NET framework includes a command line debugger called MDbg, but after using it for a bit, I found it didn’t like it very much for IronPython debugging. Mdbg automatically sets a breakpoint on the main entrypoint function, but only if it can find the debugging symbols. So when you use Mdbg with the released version of IPy, the breakpoint never gets set. Instead, you have to trap the module load event, set a breakpoint in the python file you’re debugging, then stop trapping the module load event. Every Time. That gets tedious.

Another problem with MDbg is that it’s not Just-My-Code (aka JMC) aware. JMC is this awesome debugging feature that was introduced in .NET 2.0 that lets the debugger “paint” the parts of the code that you want to step thru (aka “My Code”). By default, Visual Studio marks code with symbols as “my code” and code without symbols as “not my code”. [1] We don’t ship symbols with IronPython releases, so Visual Studio does only steps thru the python code. MDbg doesn’t support JMC, so I often found myself stepping into random parts of the IronPython implementation. That’s even more tedious.

Luckily, the source code to MDbg is available. So I got the wacky idea to build a debugger specifically for IronPython. CPython includes pdb (aka Python Debugger, not Program Database) but we don’t support it because we haven’t implemented settrace. Thus, ipydbg was born.

Over the course of this series of blog posts, I’m going to build out ipydbg. I have built out a series of prototypes so I fairly confident that I know how to build it. However, I’m not sure what it will look like at the end. If you’ve got any strong opinions on it one way or the other, be sure to email me or leave me comments.

BTW, major thanks to my VSL teammate Mike Stall (of Mike Stall's .NET Debugging Blog). Without his help, I would probably still be trying to make heads or tails of the MDbg source.


[1] VS uses the DebuggerNonUserCode attribute to provide fine grained control of what is considered “my code” and should be stepped thru.

Posted By Harry Pierson at 2:21 PM Pacific Standard Time
Change Congress
Recent Bookmarks
Tags .NET Framework (2) __clrtype__ (9) ADO.NET (5) Agile (7) AJAX (3) Architecture (288) Guidance (6) Interop (2) Modelling (61) Patterns (7) Process (4) SOA (94) Web Services (5) ASP.NET (25) Async Messaging (2) Azure (1) Battlestar Galactica (3) BI (2) BizTalk (4) Blogging (117) dasBlog (11) Podcasting (4) BPM (1) C# (11) C++ (4) Capitals (5) CardSpace (3) CLR (2) CodePlex (1) College Football (10) Comedy Central (1) Community (81) Concurrency (6) Consumer Electronics (1) Database (13) Debugger (23) Dependency Injection (2) Development (122) C Plus Plus (1) Embedded (5) Lanugages (42) Media (2) P2P (11) Rotor (1) SharePoint (6) SOP (3) DIY (1) DLR (25) Domain Specific Languages (15) Durable Messaging (5) Dynamic Languages (12) Dynamic Silverlight (1) Education (3) Enterprise 2.0 (1) Entertainment (14) ETech (15) F# (51) Functional Programming (17) Game Development (2) Guidance Automation (3) Hardware (8) HawkCodeBox (1) HawkEye (3) Health (1) Hockey (31) Home Electronics (1) Home Network (5) Hosting API (1) Humor (5) IASA (1) Idempotence (3) infrastructure (5) Instrumentation (4) Integration (2) IronPython (112) IronRuby (16) Java (2) Job (3) Kodu (1) LangNET (2) Lightweight Debugger (5) LINQ (23) Live Framework (3) Live Mesh (2) Lost (1) Master Data Management (1) Media 2.0 (6) Microsoft (31) MIX06 (2) Mobile Phone (1) Monads (5) Morning Coffee (172) Object Oriented (4) Office (5) Open Source (8) Open Space (2) Operations (3) Other (135) Art (1) Books (1) Family (33) Games (18) General Geekery (27) Home Theater (1) Movies (23) Music (20) Politics (3) Society (1) Sports (37) Working at MSFT (19) Parallel Programming (3) Parsing Expression Grammar (16) patterns & practices (2) PDC08 (5) Politics (48) Polyglot (3) PowerPoint (2) PowerShell (39) Presentation (7) Projects (1) HawkWiki (1) Pygments (5) Python (6) Quote of the Day (4) Refactoring (1) Research (2) REST (18) Reuse (5) Robotics (2) Rock Band (4) Rome (5) Ruby (23) Ruby on Rails (1) Sci-Fi (2) Scripting (4) Security (3) Service Broker (14) SharePoint (2) Silverlight (20) Social Software (1) Software + Services (2) Software Design (2) Software Engineering (1) Software Factories (11) Software Industry (1) Space Elevator (1) Spark (1) SQL Server (2) Stephen Colbert (1) TechEd (7) TechEd06 (1) TechRec League (1) Television (6) Travel (7) Unified Client (1) Unit Testing (4) USC (1) UX (1) Virtual PC (2) Visual Basic (3) Visual Studio (20) Volta (2) Washington Capitals (37) WCF (31) Web 2.0 (67) Web Services (7) WF (21) Windows (3) Windows Live (29) Windows Live Writer (3) WPF (8) Xbox (1) Xbox 360 (54) XML (11) XNA (15) Zune (4)
Disclaimer: The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the authors discretion.