Code is Model

In the foreword to Architecture Journal 3, I wrote the following:

Abstraction is the architect’s key tool for dealing with complexity. We’ve seen the evolution of architectural abstraction techniques such as models and patterns; however, we have yet to realize much in the way of measurable value from them. Models and patterns are useful for communicating effectively with our peers, but so far they haven’t helped drastically reduce the amount of resources it takes to build and operate a system. In order to continue to deal with increasing complexity, we need to get much more pragmatic about using abstraction to solve problems.

Because of the lack of measurable value to date, the IT industry at large has come to view models at best as “pretty pictures” and at worst as a pointless waste of time and resources. But the reality is, we use models in the IT industry all the time. I don’t know what your favorite modeling tool is, but my current favorite is C#. Before that it was VB, and before that it was C++. I realize some of my readers might be more partial to Java, Ruby, or even Assembly. The point is that all of these so-called “higher level” programming languages are simply models of the CPU execution environment.

The only code that the CPU can understand is machine code. But nobody wants to write and debug all their code using 0’s and 1’s. So we move up a level of abstraction and use a language that humans can read more easily and that can be automatically translated (i.e. compiled) into machine code the CPU can execute. The simplest step above machine code is assembly language. But ASM isn’t particularly productive to work with, so the industry has continuously raised the level of abstraction in the languages it uses. C is a higher level of abstraction than ASM, adding concepts like types and functions. C++ is a higher level of abstraction than C, adding concepts like classes and inheritance. Each of these levels of abstraction presents a new model of the execution environment with new features that make programming more productive (and sometimes more portable). Different languages offer different models of the execution environment. For example, the Ruby model of the execution environment allows for the manipulation of class instances at runtime, while the C++ model allows for multiple inheritance. This isn’t to say one is better than the other – they are just different.

In the past decade, we’ve seen the rise in popularity of VM-based programming environments – primarily Java and the CLR. In these environments, there are multiple models at work. CLR languages and Java are models above the underlying VM execution environment. The VM execution environment is, in turn, a model of the physical execution environment. As an example, a C#/Java program is translated into IL/bytecode at compile time and then from IL/bytecode to machine code at runtime. So in these VMs, two model translations have to occur in order to go from programming language to machine code. It turns out that this multi-step translation approach is also useful in non-VM environments. For example, the original C++ compiler output vanilla C code which was, in turn, compiled with a vanilla C compiler. C# and Java use a similar approach, except that the second translation occurs at runtime, not compile time.
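The two-step translation can be seen in miniature in any VM-based language. The sketch below uses Python as a stand-in for C# or Java, which is an assumption made purely for illustration: CPython compiles source text to bytecode, and its VM then executes that bytecode (by interpreting it, rather than by JIT-compiling it to machine code as the CLR and JVM typically do). The names `SOURCE`, `module_code`, and `add` are hypothetical.

```python
import dis

# The higher-level model: source text a human can read.
SOURCE = "def add(a, b):\n    return a + b\n"

# Translation 1: source text -> a code object holding VM bytecode.
module_code = compile(SOURCE, "<model>", "exec")

# Translation 2 happens inside the VM when the bytecode is executed.
# Running the module-level bytecode defines the function `add`.
namespace = {}
exec(module_code, namespace)
add = namespace["add"]

# The function's bytecode is the lower-level model of the same program.
# The exact instruction names vary between Python versions.
instructions = [ins.opname for ins in dis.get_instructions(add)]

print(instructions)
print(add(2, 3))
```

The instruction list is the intermediate model that sits between the source language and the physical machine, in roughly the position IL occupies for C#.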

So if Code is Model, what can we learn from the success of mainstream text-based programming languages that would help us develop higher abstraction modeling languages that are actually useful? This isn’t an exhaustive list, but here are a few things (tenets?) I’ve thought of:

  • Models must be Precise
    There must be no ambiguity in the meaning of the elements in a model. In C#, every statement and keyword has an exact, well-defined meaning. There is never a question as to what any given piece of C# code means. There may be context-sensitive meanings, such as how the keyword “using” has different meanings in C# depending on where it is used, but each of those meanings is still exactly defined. If you don’t have a similar level of precision in your model, there’s no way to transform it to lower abstraction models in a deterministic fashion. Models that can’t be transformed into lower abstraction models are nothing more than pretty pictures – perhaps useful for communication with other people on the project, but useless as development artifacts.
  • Model Transformation must be Deterministic
    By definition (or at least by convention), models are at a higher level of abstraction than both your execution domain and mainstream programming languages – perhaps significantly higher. In order to derive value from a model, you must be able to transform it into the execution domain. Like the C# to IL to machine code example, the model transformation may comprise multiple steps. But each transformation between models must be as precise as the models themselves. When you compile a given piece of C# code, you get the same IL output every time. However, this transformation can vary across target models. For example, when you run a managed app on an x86 machine you get different machine code than when you run it on an x64 machine.
  • Models must be Intrinsic to the Development Process
    Even if you have precise models and deterministic transformations, you have to make them first class citizens of the development process or they will become outdated quickly. How often have you blueprinted your classes with UML at the start of the project, only to have that class diagram be horribly out of date by the end of the project? In order to keep models up to date, they must be used throughout the development process. If you need to make a change, make the change to the model and then retransform into the execution domain. Think of C# and IL – do we use C# as a blueprint, transform once to IL and then hand edit the IL? No! We change the C# directly and retransform into IL. We need to have the same process even as we move into higher levels of abstraction.
  • Models Aren’t Always Graphical
    Some things are best visualized as pictures, some things aren’t. To date, we’re much better at graphically modeling static structure than dynamic behavior. That’s changing – for example, check out the BizTalk Server (BTS) or Windows Workflow Foundation (WF) tools. But generally, it’s easier to model structure than behavior graphically. Don’t try to put a square peg in a round hole. If a text-based language is the best choice, that’s fine. Think about the Windows Forms Designer in VS – you use a graphical “language” to lay out your user interface, but you implement event handlers using a text-based language.
  • Explicitly Call Out Models vs. Views
    One of the areas that I get easily confused about is model views. If I’m looking at two different model visualizations (text or graphical), when are they different models and when are they views into the same model? People don’t seem to care much one way or the other, but I think the difference is critical. For example, a UML class model and a C# class are two separate models – you need a transformation to go back and forth between them. However, the VS Class Designer is a graphical view into the model described by the C# class definitions. Changes in one view are immediately visible in the other – no transformation required. If you look at the Class Designer file format, you’ll notice that only diagram rendering information is stored (ShowAsAssociation, Position, Collapsed, etc.). I guess this could fall under “Models must be Precise” – i.e. you should precisely define if a given visualization is a view or a model – but I think this area is muddy enough to warrant its own tenet.
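To make a few of these tenets concrete, here is a minimal sketch in Python (chosen only for illustration; nothing here comes from a real modeling tool). Every name in it (`Field`, `EntityModel`, `ALLOWED_TYPES`, `validate`, `transform`, `render_view`) is hypothetical. It assumes a toy model that describes an entity and its typed fields, and shows a precision check, a deterministic transformation into lower-level source code, and a view that is derived from the model rather than stored alongside it.

```python
from dataclasses import dataclass

# Precision: only these type names have a defined meaning in this toy model.
ALLOWED_TYPES = {"int", "str", "float"}

@dataclass(frozen=True)
class Field:
    name: str
    type_name: str

@dataclass(frozen=True)
class EntityModel:
    name: str
    fields: tuple  # tuple of Field, in declared order

def validate(model):
    """Reject any element whose meaning is not exactly defined."""
    if not model.name.isidentifier():
        raise ValueError(f"invalid entity name: {model.name!r}")
    for field in model.fields:
        if not field.name.isidentifier():
            raise ValueError(f"invalid field name: {field.name!r}")
        if field.type_name not in ALLOWED_TYPES:
            raise ValueError(f"unknown type: {field.type_name!r}")

def transform(model):
    """Deterministic transformation: the same model always yields the same source."""
    validate(model)
    lines = ["from dataclasses import dataclass", "", "@dataclass",
             f"class {model.name}:"]
    for field in model.fields:  # declared order, no randomness
        lines.append(f"    {field.name}: {field.type_name}")
    if not model.fields:
        lines.append("    pass")
    return "\n".join(lines) + "\n"

def render_view(model):
    """A view: computed from the model on demand, holding no data of its own."""
    columns = ", ".join(f"{f.name}: {f.type_name}" for f in model.fields)
    return f"[{model.name} | {columns}]"

customer = EntityModel("Customer", (Field("id", "int"), Field("name", "str")))

# Retransform whenever the model changes; never hand-edit the generated code.
generated = transform(customer)
namespace = {}
exec(generated, namespace)
Customer = namespace["Customer"]

print(render_view(customer))
print(generated)
```

In this sketch `generated` is a separate, lower-level model that needs a transformation to produce, whereas `render_view` is only a view: change `customer` and the view changes with it, with no transformation step in between.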

I’m sure there are more thoughts for this list, but that’s a good start. Please feel free to leave your opinion on these tenets and suggestions for new ones in my comments.