Writing an IronPython Debugger: MDbg 101

Before I start writing any debugger code, I thought it would help to quickly review the .NET debugger infrastructure that is available as well as the design of the MDbg command line debugger. Please note, my understanding of this stuff is fairly rudimentary – Mike Stall is “da man” if you’re looking for a .NET debugger blogger to read.

The CLR provides a series of unmanaged APIs for things like hosting the CLR, reading and writing CLR metadata and – more relevant to our current discussion – debugging as well as reading and writing debugger symbols. These APIs are exposed as COM objects. The CLR Debugging API allows you to do all the things you would expect to be able to do in a debugger: attach to processes (actually, app domains), create breakpoints, step thru code, etc. Of course, being an unmanaged API, it’s pretty much unusable from IronPython. Luckily, MDbg wraps this unmanaged API for us, making it available to any managed language, including IronPython.

The basic design of MDbg looks like this:

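[Diagram: the MDbg assembly layers, from bottom to top: raw, corapi, mdbgeng, then MDbg.exe alongside mdbgext and the extensions (…)]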

At the bottom is the “raw” assembly, which contains the C# definitions of the unmanaged debugger API – basically anything that starts with ICorDebug and ICorPublish. Raw also defines some of the metadata API, since that’s how type information is exposed to the debugger.

The next level up is the “corapi” assembly, which I refer to as the low-level managed debugger API. This is a fairly thin layer that translates the unmanaged paradigm into something more palatable to managed code developers. For example, COM enumerators such as ICorDebugAppDomainEnum are exposed as IEnumerable types. Also, the managed callback interface gets exposed as .NET events. It’s not perfect – the code is written in C# 1.0 style so there are no generics or yields.
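To make the enumerator translation concrete, here is a minimal sketch in plain Python of the same idea: a batch-oriented, COM-style enumerator adapted into something you can loop over directly. FakeComEnum, next_items and iterate are hypothetical names made up for this illustration; none of them come from the MDbg source.

```python
class FakeComEnum(object):
    """Hypothetical stand-in for a COM-style enumerator such as
    ICorDebugAppDomainEnum. The real thing is a COM object, not a Python class."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def next_items(self, count):
        """Return up to `count` items; an empty list means the enumerator is exhausted."""
        chunk = self._items[self._pos:self._pos + count]
        self._pos += len(chunk)
        return chunk

    def reset(self):
        """Rewind to the first item."""
        self._pos = 0


def iterate(com_enum, batch=2):
    """Adapt the batch-oriented enumerator to an ordinary Python iterable."""
    com_enum.reset()
    while True:
        chunk = com_enum.next_items(batch)
        if not chunk:
            return
        for item in chunk:
            yield item


# The caller no longer cares about batches or positions.
domains = list(iterate(FakeComEnum(["DefaultDomain", "Plugins", "Sandbox"])))
```

The generator plays the role that IEnumerable plays in corapi: the fetch-a-batch-and-check-the-count bookkeeping lives in one place instead of in every caller.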

Where corapi is the low-level API, “mdbgeng” is the high-level managed debugger API. As you would expect, it wraps the low-level API and provides automatic implementations of common operations. For example, this layer maintains a list of breakpoints so you can create them before the relevant assembly has been loaded. Then when assemblies are loaded, it goes thru the list of unbound breakpoints to see if any can be bound. It’s also this layer that automatically creates the main entrypoint breakpoint.
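The deferred breakpoint bookkeeping described above can be sketched roughly like this. It is a simplified illustration in plain Python, not the mdbgeng implementation: Breakpoint, BreakpointList and on_module_load are hypothetical names, and a module is modeled as nothing more than the set of source files it contains.

```python
class Breakpoint(object):
    """A breakpoint requested by file name and line; unbound until its module loads."""

    def __init__(self, filename, line):
        self.filename = filename
        self.line = line
        self.bound = False


class BreakpointList(object):
    """Hypothetical stand-in for the list the high-level layer maintains."""

    def __init__(self):
        self._breakpoints = []

    def create(self, filename, line):
        # Creating a breakpoint succeeds even if the code is not loaded yet.
        bp = Breakpoint(filename, line)
        self._breakpoints.append(bp)
        return bp

    def on_module_load(self, source_files):
        """Bind every pending breakpoint whose file belongs to the new module."""
        newly_bound = []
        for bp in self._breakpoints:
            if not bp.bound and bp.filename in source_files:
                bp.bound = True  # a real debugger would create the CLR breakpoint here
                newly_bound.append(bp)
        return newly_bound


breakpoints = BreakpointList()
bp = breakpoints.create("hello.py", 3)               # nothing is loaded yet
first = breakpoints.on_module_load({"site.py"})      # unrelated module: stays pending
second = breakpoints.on_module_load({"hello.py"})    # matching module: gets bound
```

Working against the low-level API means taking on this kind of bookkeeping yourself.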

Finally, at the top we have the MDbg application itself, as well as any MDbg extensions (represented by the … in the diagram above). The mdbgext assembly defines the types shared between MDbg.exe and the extension assemblies. MDbg has some cool extensions – including an IronPython extension – but I’m focused on building something as lightweight as possible, so I’m going to forgo an extensibility mechanism, at least for now.

My initial prototype was written against the high-level API. There were two problems with this approach. The first is that there’s no support for Just My Code in the high-level API. As I mentioned in my last post, JMC support is critical for this project. Adding JMC support isn’t hard, but I’m trying to make as few changes as possible to the MDbg source, since I’m not interested in forking and maintaining that code. Second, while the low-level API provides an event-based API (OnModuleLoad, OnBreakpoint, OnStepComplete, etc), the high-level API provides a more console-oriented looping API. I found the event-driven API to be cleaner to work with and I’m thinking it will work better if I ever build a GUI version of ipydbg. So I’ve decided to work against the low-level API (aka corapi).
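As a rough illustration of why the event style is pleasant to work with, here is a small plain-Python imitation of it. The event names OnModuleLoad, OnBreakpoint and OnStepComplete are the ones mentioned above; Event, FakeProcess and the handler signatures are assumptions invented for this sketch and are not the corapi types.

```python
class Event(object):
    """Tiny imitation of a .NET event: handlers are attached with +=."""

    def __init__(self):
        self._handlers = []

    def __iadd__(self, handler):
        self._handlers.append(handler)
        return self

    def fire(self, sender, args):
        for handler in list(self._handlers):
            handler(sender, args)


class FakeProcess(object):
    """Hypothetical debuggee object that exposes debugger callbacks as events."""

    def __init__(self):
        self.OnModuleLoad = Event()
        self.OnBreakpoint = Event()
        self.OnStepComplete = Event()


log = []

def on_module_load(sender, args):
    log.append("module: " + args)

def on_breakpoint(sender, args):
    log.append("breakpoint: " + args)

def on_step_complete(sender, args):
    log.append("step complete")

process = FakeProcess()
process.OnModuleLoad += on_module_load
process.OnBreakpoint += on_breakpoint
process.OnStepComplete += on_step_complete

# Simulate the callbacks a debugging session might deliver.
process.OnModuleLoad.fire(process, "hello.py")
process.OnBreakpoint.fire(process, "hello.py:3")
process.OnStepComplete.fire(process, None)
```

Each handler is independent of the others, so a console front end and a GUI front end could attach different handlers to the same events.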

I mentioned above that I didn’t want to change the MDbg source, but I did make one small change. The separation of corapi and raw into two separate assemblies is an outdated artifact of an earlier version of MDbg. So I decided to combine these two into a single assembly called CorDebug. Other than some simple cleanup to assembly level attributes to make a single assembly possible, I haven’t changed the source code at all.

Writing an IronPython Debugger: Introduction

A while back I showed how you can use Visual Studio to debug IronPython scripts. While that works great, it’s lots of steps and lots of mouse work. I yearned for something lighter weight and that I could drive from the command line.

The .NET framework includes a command line debugger called MDbg, but after using it for a bit, I found I didn’t like it very much for IronPython debugging. MDbg automatically sets a breakpoint on the main entrypoint function, but only if it can find the debugging symbols. So when you use MDbg with the released version of IPy, the breakpoint never gets set. Instead, you have to trap the module load event, set a breakpoint in the Python file you’re debugging, then stop trapping the module load event. Every Time. That gets tedious.

Another problem with MDbg is that it’s not Just-My-Code (aka JMC) aware. JMC is this awesome debugging feature that was introduced in .NET 2.0 that lets the debugger “paint” the parts of the code that you want to step thru (aka “My Code”). By default, Visual Studio marks code with symbols as “my code” and code without symbols as “not my code”. [1] We don’t ship symbols with IronPython releases, so Visual Studio only steps thru the Python code. MDbg doesn’t support JMC, so I often found myself stepping into random parts of the IronPython implementation. That’s even more tedious.

Luckily, the source code to MDbg is available. So I got the wacky idea to build a debugger specifically for IronPython. CPython includes pdb (aka Python Debugger, not Program Database) but we don’t support it because we haven’t implemented sys.settrace. Thus, ipydbg was born.
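For context, this is roughly what the CPython tracing hook that pdb builds on looks like. The sketch below is meant for CPython, not IronPython, and uses only sys.settrace and sys.gettrace from the standard library; run_with_line_trace and sample are hypothetical helpers written for this illustration.

```python
import sys


def run_with_line_trace(func):
    """Call func() and return the lines it executed, as offsets from its def line."""
    offsets = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Ignore frames that belong to any other function.
        if frame.f_code is not code:
            return None
        if event == "line":
            offsets.append(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func()
    finally:
        sys.settrace(previous)  # always restore whatever hook was installed before
    return offsets


def sample():
    total = 0
    total += 1
    return total


offsets = run_with_line_trace(sample)
```

A debugger built on this hook runs inside the process it is debugging, which is a very different model from the out-of-process CLR Debugging API that MDbg wraps.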

Over the course of this series of blog posts, I’m going to build out ipydbg. I have built out a series of prototypes so I’m fairly confident that I know how to build it. However, I’m not sure what it will look like at the end. If you’ve got any strong opinions on it one way or the other, be sure to email me or leave me comments.

BTW, major thanks to my VSL teammate Mike Stall (of Mike Stall’s .NET Debugging Blog). Without his help, I would probably still be trying to make heads or tails of the MDbg source.


  1. VS uses the DebuggerNonUserCode attribute to provide fine grained control of what is considered “my code” and should be stepped thru.