The Irrelevant Semantics of Compiled XAML

From newly converted ARC MVP Sam Gentile, I found this interesting post from Drew Marsh about how XAML is compiled:

XAML is indeed a language, but it is never compiled into C# or IL… The truth is, it’s not “compiled” at all. If anything you can say it is “compacted” and that only happens in scenarios where it is turned into a BAML stream. That, however, is an optimization and not a necessity for XAML to work. XAML can be interpreted purely in it’s raw XML text form, using Parser::LoadXml. Even BAML is interpreted, it’s just a far more compact and efficient representation of the object graph than raw XML text.
[Drew Marsh – The XAML Experience]

Given that I just wrote about compiling, I wanted to weigh in with a couple of points:

  • First, by definition compilation is a translation from one format to another. Therefore, converting XAML to BAML is a compilation step. The SDM folks have a command line tool for compiling deployment reports from the models in the Architect edition of VSTS. However, I assume what Drew meant here was that XAML isn’t compiled into a directly executable format, so in reality I’m just being picky about the use of the word “compiled”.

  • Second, the fact that the XAML is compiled into an efficient binary representation and then embedded as a resource (as per Rob Relyea) is fascinating from an implementation perspective, but somewhat irrelevant semantically. Drew points out that the BAML is interpreted. With VM environments like CLR, the line between interpreted and compiled blurs considerably. Rob’s post referenced above is based on the PDC 03 XAML bits, when XAML could be compiled into either BAML or IL. Even then (20 months ago), Rob guessed that the IL compilation would be cut because the BAML perf was just as good or better, the file size was smaller and localization was easier. In the end, the XAML file is converted into a format the machine can execute – the specific choice of compilers and transformations isn’t particularly interesting from a modeling perspective since it happens automatically.

XAML isn’t the only place where the traditional compiling-to-executable model is being stretched. In Windows Workflow Foundation, though your workflow is defined as a type, you can actually modify running instances of the workflow. Given that WF supports declaring workflows as C# or XOML (soon to be XAML), I wonder if they are going to go the same route as WPF and eliminate the C#/IL way of declaring workflows. Another interesting example is LINQ and C# 3.0. This is interesting because you can use LINQ directly on in-memory data, but when you apply it to a database (via DLinq) the LINQ statements are parsed into expression trees and then converted into SQL. (Check out this post from Ian Griffiths for deeper coverage of expression trees).
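To make the expression tree idea a little more concrete, here is a minimal sketch of the general technique. It is written in Python rather than C#, and it is not how DLinq is implemented: the Field, Const and Compare node types and the evaluate and to_sql functions are hypothetical names made up for this illustration. The point is only that a single tree describing a query can either be run against in-memory data or be translated into SQL text.

```python
import operator
from dataclasses import dataclass
from typing import Union


# Hypothetical node types for a tiny expression tree (illustration only).
@dataclass(frozen=True)
class Field:
    name: str


@dataclass(frozen=True)
class Const:
    value: Union[int, str]


@dataclass(frozen=True)
class Compare:
    op: str  # must be one of the keys in _OPS
    left: "Expr"
    right: "Expr"


Expr = Union[Field, Const, Compare]

_OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def evaluate(expr, row):
    """Run the tree directly against one in-memory row (a dict)."""
    if isinstance(expr, Field):
        return row[expr.name]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, Compare):
        return _OPS[expr.op](evaluate(expr.left, row), evaluate(expr.right, row))
    raise TypeError(f"unknown node: {expr!r}")


def to_sql(expr):
    """Translate the same tree into a SQL fragment instead of running it."""
    if isinstance(expr, Field):
        return expr.name
    if isinstance(expr, Const):
        if isinstance(expr.value, str):
            # Double any single quotes so the literal stays well formed.
            return "'" + expr.value.replace("'", "''") + "'"
        return str(expr.value)
    if isinstance(expr, Compare):
        return f"({to_sql(expr.left)} {expr.op} {to_sql(expr.right)})"
    raise TypeError(f"unknown node: {expr!r}")


# One tree, two "execution environments".
predicate = Compare(">", Field("age"), Const(30))

rows = [{"name": "Ann", "age": 41}, {"name": "Bob", "age": 25}]
matches = [r for r in rows if evaluate(predicate, r)]

sql = "SELECT * FROM people WHERE " + to_sql(predicate)
```

Here matches holds only Ann's row, and sql holds a WHERE clause built from the same predicate. A real query provider would pass values as parameters rather than splicing literals into the SQL text; the string building above is just the shortest way to show the translation.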

Anyway, it’s late and I realize I’ve written quite a bit to basically say that the definition of “compiled” is pretty blurry at this point and getting blurrier going forward. In the end, it’s much more interesting IMO to focus on the model environment you’re working in (XAML in this case) rather than the details of how that model is translated into the execution environment, unless you’re the one building those translation tools.

Update: I had one other thought on all this. It’s interesting that computing power (CPU + IO bandwidth) has improved to a point where interpreting BAML at runtime is as fast as executing XAML compiled directly to IL. I certainly wouldn’t have assumed that.

Code is Model

In the foreword to Architecture Journal 3, I wrote the following:

Abstraction is the architect’s key tool for dealing with complexity. We’ve seen the evolution of architectural abstraction techniques such as models and patterns; however, we have yet to realize much in the way of measurable value from them. Models and patterns are useful for communicating effectively with our peers, but so far they haven’t helped drastically reduce the amount of resources it takes to build and operate a system. In order to continue to deal with increasing complexity, we need to get much more pragmatic about using abstraction to solve problems.

Because of the lack of measurable value to date, the IT industry at large has come to view models at best as “pretty pictures” and at worst as a pointless waste of time and resources. But the reality is, we use models in the IT industry all the time. I don’t know what your favorite modeling tool is, but my current favorite is C#. Before that it was VB, and before that it was C++. I realize some of my readers might be more partial to Java, Ruby, or even Assembly. But the reality is: all of these so-called “higher level” programming languages are simply models of the CPU execution environment.

The only code that the CPU can understand is machine code. But nobody wants to write and debug all their code using 0’s and 1’s. So we move up a level of abstraction and use a language that humans can read more easily and that can be automatically translated (i.e. compiled) into machine code the CPU can execute. The simplest step above machine code is assembly language. But ASM isn’t particularly productive to work with, so the industry has continuously raised the level of abstraction in the languages it uses. C is a higher level of abstraction than ASM, adding concepts like types and functions. C++ is a higher level of abstraction than C, adding concepts like classes and inheritance. Each of these levels of abstraction presents a new model of the execution environment with new features that make programming more productive (and sometimes more portable). Different languages offer different models of the execution environment. For example, the Ruby model of the execution environment allows for the manipulation of class instances while the C++ model allows for multiple inheritance. This isn’t to say one is better than the other – they are just different.

In the past decade, we’ve seen the rise in popularity of VM-based programming environments – primarily Java and CLR. In these environments, there are multiple models at work. CLR languages and Java are models above the underlying VM execution environment. The VM execution environment is, in turn, a model of the physical execution environment. As an example, a C#/Java program is translated into IL/bytecode at compile time and then from IL/bytecode to machine code at runtime. So in these VMs, two model translations have to occur in order to go from programming language to machine code. It turns out that this multiple-step translation approach is also useful in non-VM environments. For example, the original C++ compiler output vanilla C code which was, in turn, compiled with a vanilla C compiler. C# and Java use a similar approach, except that the second translation occurs at runtime, not compile time.
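The two-step translation is easy to see in any VM-based language. The sketch below uses Python instead of C# or Java, purely because its standard library exposes both steps directly; the add function and the "<model>" file name are placeholders for this example. One caveat: the standard CPython VM interprets its bytecode rather than JIT-compiling it to machine code the way the CLR and JVM do, so only the first translation is a close analogue.

```python
import dis

# The "model": source text in a high-level language.
source = "def add(a, b):\n    return a + b\n"

# Translation 1 (compile time): source text -> bytecode for the VM.
module_code = compile(source, "<model>", "exec")

# Translation 2 (run time): the VM executes the bytecode.
namespace = {}
exec(module_code, namespace)
add = namespace["add"]

# Inspect the intermediate model: the VM instructions behind add().
# The exact instruction names differ between Python versions.
opnames = [instruction.opname for instruction in dis.Bytecode(add)]

result = add(2, 3)
```

Printing opnames shows the intermediate representation that sits between the source text and the hardware, which is the role IL and Java bytecode play in their respective VMs.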

So if Code is Model, what can we learn from the success of mainstream text-based programming languages to help us develop higher-abstraction modeling languages that are actually useful? This isn’t an exhaustive list, but here are a few things (tenets?) I’ve thought of:

  • Models must be Precise
    There must be no ambiguity in the meaning of the elements in a model. In C#, every statement and keyword has an exact well-defined meaning. There is never a question as to what any given piece of C# code means. There may be context-sensitive meanings, such as how the keyword “using” has different meanings in C# depending on where it is used. If you don’t have a similar level of precision in your model, there’s no way to transform it to lower abstraction models in a deterministic fashion. Models that can’t be transformed into lower abstraction models are nothing more than pretty pictures – perhaps useful for communication with other people on the project, but useless as development artifacts.
  • Model Transformation must be Deterministic
    By definition (or at least by convention), models are at a higher level of abstraction than both your execution domain and mainstream programming languages – perhaps significantly higher. In order to derive value from a model, you must be able to transform it into the execution domain. Like the C# to IL to machine code example, the model transformation may comprise multiple steps. But each transformation between models must be as precise as the models themselves. When you compile a given piece of C# code, you get the same IL output every time. However, this transformation can vary across target models. For example, when you run a managed app on an x86 machine you get different machine code than if you ran it on an x64 machine.
  • Models must be Intrinsic to the Development Process
    Even if you have precise models and deterministic transformations, you have to make them first class citizens of the development process or they will become outdated quickly. How often have you blueprinted your classes with UML at the start of the project, only to have that class diagram be horribly out of date by the end of the project? In order to keep models up to date, they must be used throughout the development process. If you need to make a change, make the change to the model and then retransform into the execution domain. Think of C# and IL – do we use C# as a blueprint, transform once to IL and then hand edit the IL? No! We change the C# directly and retransform into IL. We need to have the same process even as we move into higher levels of abstraction.
  • Models Aren’t Always Graphical
    Some things are best visualized as pictures, some things aren’t. To date, we’re much better at graphically modeling static structure than dynamic behavior. That’s changing – for example, check out the BTS or WF tools. But generally, it’s easier to model structure than behavior graphically. Don’t try and put a square peg in a round hole. If a text based language is the best choice, that’s fine. Think about the Windows Forms Designer in VS – you use a graphical “language” to lay out your user interface, but you implement event handlers using a text-based language.
  • Explicitly Call Out Models vs. Views
    One of the areas that I get easily confused about is model views. If I’m looking at two different model visualizations (text or graphical), when are they different models and when are they views into the same model? People don’t seem to care much one way or the other, but I think the difference is critical. For example, a UML class model and a C# class are two separate models – you need a transformation to go back and forth between them. However, the VS Class Designer is a graphical view into the model described by the C# class definitions. Changes in one view are immediately visible in the other – no transformation required. If you look at the Class Designer file format, you’ll notice only diagram rendering specific information is stored (ShowAsAssociation, Position, Collapsed, etc.). I guess this could fall under “Models must be Precise” – i.e. you should precisely define if a given visualization is a view or a model – but I think this area is muddy enough to warrant its own tenet.
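The model vs. view distinction in the last tenet can be sketched in a few lines. This is a toy illustration in Python, not the Class Designer's actual design or file format: ClassModel, DiagramView, text_view and saved_state are hypothetical names, and the position and collapsed fields only echo the kind of rendering information mentioned above. The idea is that both views read from one shared model, so neither needs a transformation, and the graphical view persists nothing except its own layout.

```python
from dataclasses import dataclass, field


@dataclass
class ClassModel:
    """The single underlying model: a class name and its members."""
    name: str
    members: list = field(default_factory=list)


def text_view(model):
    """Text view: renders the model as C#-like source text."""
    body = "".join(f"    {member};\n" for member in model.members)
    return f"class {model.name}\n{{\n{body}}}\n"


@dataclass
class DiagramView:
    """Graphical view: owns layout information only, never model data."""
    model: ClassModel
    position: tuple = (0, 0)
    collapsed: bool = False

    def render(self):
        header = f"[{self.model.name}] at {self.position}"
        if self.collapsed:
            return header
        return "\n".join([header] + [f"  - {m}" for m in self.model.members])

    def saved_state(self):
        # Only rendering details are persisted; the model lives elsewhere.
        return {"position": self.position, "collapsed": self.collapsed}


order = ClassModel("Order", ["int Id"])
diagram = DiagramView(order, position=(10, 20))

# Change the model once; both views reflect it with no transformation step.
order.members.append("decimal Total")
```

After the append, both text_view(order) and diagram.render() show the new member, while diagram.saved_state() still contains nothing but layout. A UML class model, by contrast, would hold its own copy of the members and would need an explicit transformation to stay in sync.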

I’m sure there are more thoughts for this list, but that’s a good start. Please feel free to leave your opinion on these tenets and suggestions for new ones in my comments.

Ted on C# 3.0

I just discovered Ted Neward’s blog has moved. In catching up, I found this great post on the new features of C# 3.0. Even though I had read thru the C# 3.0 spec, Ted’s explanation was much easier to read.

FYI, speaking of Ted, I’ll be speaking at his No Fluff Just Stuff .NET software symposium. Still working w/ Ted on the abstracts, but basically I’m talking about patterns, GAT and DSLs.

MVP Summit Wrap Up Thoughts

It’s hard to believe it’s October already. The last three weeks have been jam packed, starting with PDC 05, then a variety of meetings culminating with the company meeting the following week, then the 2005 MVP Global Summit last week. This was the first MVP Summit to include Architect MVPs so it was pretty stressful. Of course, there were things we could have done better, but all-in-all I was happy with the event. A year ago, we had just awarded our first 14 Architect MVPs. Now we’re 100 strong between our solution and infrastructure Architect MVPs and we had better than half of them in Redmond for the summit. I swear, it will take us the rest of the fiscal year to implement even half of their suggestions.

I’m sure each of the various groups that have MVPs think that their MVPs are the best, so I guess I’m no different in that regard. Our Architect MVPs are an amazing group and I am already looking forward to the next opportunity to get a bunch of them in a room together again.

DevHawk on C9

I’m sure it will get lost in the massive tide of C9 videos sure to come out of PDC, but Scoble posted an interview he did with me a few months ago. Check it out.