Greatest Kids Ever

I’m sure my wife will blog this in more detail, but we’re on vacation so blogging isn’t top of her mind. Today was my brother’s wedding – I was the best man with the whole speech giving and everything. We left the kids with my wife’s sister tonight, but last night we brought them to the rehearsal dinner. They were the only two kids there but they were straight up amazingly well behaved. All night. Long past their bedtime (though to be honest, we are in Washington DC and still sort of on west coast time). I mean, I expect my kids to behave – but this was above and beyond. We got to the restaurant around 6pm and didn’t leave until 9:30! Tonight, at the wedding, several people told us how impressed they were with our kids. I am too.

Just wanted to blog how incredibly proud I am of my kids.

Language Features I Wish C# Had – Symbols

Ruby’s symbols are often talked about in terms of efficiency – taking up less memory and executing faster. While these are both laudable goals, symbols are more than just performance improvers. The ability to name things is valuable semantically. Take a look thru p&p’s Composite UI AppBlock and you’ll see strings used as names for things all over the place. It’s great for loose coupling, you see. But how do you tell the difference between a string used as a name and a string used for some other reason like user input? You can’t.
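To make the semantic point concrete, here is a minimal sketch in Python (which, like C#, has no built-in symbol type). The Symbol class and the register_service function are hypothetical names invented for this illustration; they are not part of Ruby, Rails, or the Composite UI AppBlock. The idea is simply that once names have their own type, an API can refuse a plain string.

```python
class Symbol:
    """An interned name: two symbols with the same text are the same object."""

    _table = {}  # text -> the single Symbol instance for that text

    def __new__(cls, name):
        # Reuse the existing instance so that identity comparison works.
        existing = cls._table.get(name)
        if existing is None:
            existing = super().__new__(cls)
            existing.name = name
            cls._table[name] = existing
        return existing

    def __repr__(self):
        return ":" + self.name


def register_service(registry, key, service):
    # Hypothetical registry API: only names are accepted as keys,
    # so a raw string (for example, user input) is rejected.
    if not isinstance(key, Symbol):
        raise TypeError("service keys must be Symbol, not " + type(key).__name__)
    registry[key] = service


registry = {}
register_service(registry, Symbol("recipe"), "RecipeController")
```

Calling register_service(registry, "recipe", ...) with a plain string raises a TypeError, which is exactly the distinction a string-keyed API cannot make.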

Rails makes extensive use of symbols – anyone who has Rolled on Rails has seen “scaffold :recipe”. That’s just the tip of the iceberg. Rails uses symbols extensively across both ActionPack and ActiveRecord (and probably others I’m not familiar with). It’s a great approach, but one that’s unique to Ruby as far as I’m aware.

Language Features I Wish C# Had – Tuples

Several languages, such as Python, have the concept of a Tuple built into the language. One of the things it’s used for in Python is multiple return values. So you can call “return x,y” to return two values. Of course, C# can only return one. If you need to return more values, you have to define out parameters.
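For readers who haven’t seen it, this is roughly what the Python side looks like. The min_max function is just an illustrative name, not something from a library.

```python
def min_max(values):
    """Return the smallest and largest item as a two-element tuple."""
    lowest = highest = values[0]
    for v in values[1:]:
        if v < lowest:
            lowest = v
        elif v > highest:
            highest = v
    return lowest, highest  # packs both values into a single tuple


low, high = min_max([3, 9, 1, 7])  # unpacks the tuple into two names
result = min_max([3, 9, 1, 7])     # or keep the tuple as one value
```

Strictly speaking the function still returns one value, a tuple; the language just makes packing and unpacking it nearly invisible.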

LINQ / C# 3.0 / VB 9 support the idea of anonymous types, which is similar to a tuple. The big difference is that, because they’re anonymous, they can’t leave the scope they’re defined in. In other words, they’re great within a function, but if you want to pass them out of your function type-safely, you have to define a non-anonymous type for them.

Interestingly enough, F# supports tuples, though it’s a bit of a hack. Since the CLR doesn’t support tuples, F# basically defines different Tuple classes for up to seven tuple parameters (i.e. Tuple<t1,t2,t3,t4,t5,t6,t7>). For .NET 1.x, it’s even worse – they have to define different type names (Tuple2, Tuple3, etc). Ugh.

Update: Robert Pickering pointed out that F#’s tuple implementation is entirely transparent inside of F#. He’s right – I was writing from the perspective of a C# developer using F#’s implementation of tuples. Maybe I need to be looking closer at F#?

Business Processes Are Services Too

I’ve been having a conversation with Piyush Pant over on his blog that started as a comment he left on my Services Aren’t Stateless post. He thinks that I’m “missing the crucial point here by implicitly conflating business process and service state”. While Piyush hasn’t really defined what he means by these terms, I think I understand what he’s getting at. Yes, process and service state are different in many ways, but they are also similar in that they are both service private data.

Pat Helland (side note – I wish Pat would start blogging again) wrote an article some time ago titled Data on the Outside vs. Data on the Inside where he talked about the differences between service private data and data in the space between the services. For example, data on the outside is immutable, requires an open schema for interop, doesn’t need encapsulation and is representable in XML. Service private data is not immutable, doesn’t need an open schema for interop, requires encapsulation and is typically stored in a SQL RDBMS. So on this front, process and service state are both service private data so conflating them makes some sense.

However, what’s not in the article is the idea of Resource and Activity data. Not sure why Pat didn’t include this in the article, but he was talking about it as far back as PDC 2003. Stu Charlton described the difference between resource and activity data in his Autonomous Services article:

Activity Data – This is “work in progress” data for any long-running business operation, and is usually encapsulated by business logic. A classic example is a shopping cart in any e-commerce system. This data is mutable, but typically has low concurrency conflicts, as it is not widely shared. Typically activity data retires after a long running operation completes, and may be archived in a decision support system for later analysis.

Resource Data – This is “state of the business” data, which represents the resources of an organization, and is usually encapsulated by business logic. Examples are: room availability in a hotel, inventory levels in a warehouse, account statuses, employee and customer information. Some resources have a small life span, others may last a very long time (years). Resource data is usually volatile with potential for high concurrency conflicts.

So I’m fairly sure that when Piyush says “process state” I should hear “activity data”. Similarly “service state” is “resource data”. The differences between activity and resource data lead to some interesting implementation artifacts, which I assume he’s getting at when he says I’m conflating the two. For example, since activity data like a shopping cart has low or no concurrency issues, using an optimistic concurrency scheme is entirely appropriate, which you would never use for highly volatile resource data like warehouse inventory levels. In fact, since activity data doesn’t have concurrency issues, you could even store it inside an instance of workflow or orchestration, which gets serialized to a persistent store when it’s in an idle state.
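As a rough sketch of what an optimistic concurrency scheme means for something like a shopping cart: every write states which version it read, and the store rejects the write if someone else got there first. VersionedStore and StaleWriteError are hypothetical names, and the in-memory dictionary is only a stand-in for whatever persistent store a real service would use.

```python
class StaleWriteError(Exception):
    """Raised when a write is based on an out-of-date read."""


class VersionedStore:
    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        # A key that was never written reads as version 0 with no value.
        return self._rows.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self._rows.get(key, (0, None))
        if current_version != expected_version:
            # Someone else wrote since we read: no lock was held, so we
            # detect the conflict at write time and let the caller retry.
            raise StaleWriteError(key)
        self._rows[key] = (current_version + 1, value)
        return current_version + 1


store = VersionedStore()
version, cart = store.read("cart-42")
new_version = store.write("cart-42", version, ["book"])
```

Because a cart is rarely shared, the conflict branch almost never fires, so the scheme costs next to nothing. With highly contended data such as inventory levels, the same check would fail constantly and callers would spend their time retrying.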

However, the fact that activity and resource data are handled differently doesn’t mean that most services won’t have activity data. When Thomas Erl says that stateless services are a “common principle of service orientation”, I think what he’s essentially saying is that services should only have resource data. And as I said before, this seems wrong to me. Sure, some services will be stateless. But all services? Services implement business capabilities. Most business capabilities are long running processes. Doesn’t that imply that most services in the enterprise will need to be long running workflows or orchestrations?

So for the most part, Piyush and I just seem to have different names for the same concepts. The one issue I have with Piyush’s description of process and service state is that he seems to implicitly assume that processes aren’t services. Why not? Again, not all services will be processes, but if you’re not exposing processes as services, how exactly are you exposing them?

Modular Compilers

During Lang.NET, I ended up sitting next to Hua Ming, who’s been working on the .NET Classbox project I wrote about previously. .NET Classbox introduces a new syntax for “using” to C# – basically, you can use individual classes as well as whole namespaces, and you can extend the individual classes you use. Obviously, that meant having a custom compiler that was 99% vanilla C# + the extra classbox syntax. Rather than building a C# compiler from scratch, the Classbox project extended the Mono Project C# compiler. Hua described the process as taking a “huge amount of time” and he described the compiler as “a monster”. Now, I’m not trying to knock Mono here; I imagine our C# compiler is just as hard to work with. SSCLI’s C# compiler directory is 5.5MB of source code alone spread across 126 .h and 68 .cpp files.

Is it just me, or does it seem crazy to have to muck about with such a large code base in order to add a relatively simple language feature? What I’d like to see is a more modular way of building compilers, so that integrating a small language feature like classbox would be a small amount of effort.

Of course, there is some work that’s been done in this space. MS Research had a Research C# compiler paper, but it’s three years old and one of the two authors has moved on to a cool product group job. I also discovered SUIF and the National Compiler Infrastructure Project, but these don’t look like they’ve been updated in a while.

I like the model that the Research C# compiler proposes. Basically, it looks like this:

  1. Specify the grammar in a modular way. In the paper, the grammar is specified in an Excel file, and you can use multiple files in a modular fashion. i.e. have one file for the core language and another for the extensions.
  2. Late bind a grammar production to an action. Typically, in a lex/yacc style scenario, you embed the action code for a given production directly into the grammar, which makes it extremely hard to extend the existing syntax. In the paper, each production is linked with an instance of a type, so swapping out a new type would seem to be possible.
  3. Generate an abstract syntax tree that gets processed by multiple visitors. From the paper, the compiler has broken the “traditional” compiler steps – bind, typecheck, rewrite and generate binary (in this case IL) – into separate visitors. That makes adding extra steps or changing existing steps fairly straightforward.
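The second and third steps above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the Research C# compiler’s design: the node types, the actions table, and the two passes (ConstantFolder and Emitter) are all made up for illustration, and the grammar itself is skipped by calling the production actions directly.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: object
    right: object


# Late binding: each production name maps to an action that lives outside
# the grammar, so an extension can replace an entry without touching it.
actions = {
    "num": lambda text: Num(int(text)),
    "add": lambda left, right: Add(left, right),
}


def build(production, *args):
    return actions[production](*args)


class Visitor:
    def visit(self, node):
        # Dispatch on the node's class name, e.g. visit_Add for Add nodes.
        return getattr(self, "visit_" + type(node).__name__)(node)


class ConstantFolder(Visitor):
    """A 'rewrite' pass: collapses additions of two constants."""

    def visit_Num(self, node):
        return node

    def visit_Add(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            return Num(left.value + right.value)
        return Add(left, right)


class Emitter(Visitor):
    """A 'generate' pass: emits instructions for a toy stack machine."""

    def visit_Num(self, node):
        return [("push", node.value)]

    def visit_Add(self, node):
        return self.visit(node.left) + self.visit(node.right) + [("add",)]


def compile_tree(tree, passes, emitter):
    for rewrite_pass in passes:
        tree = rewrite_pass.visit(tree)
    return emitter.visit(tree)


tree = build("add", build("num", "1"), build("num", "2"))
plain = compile_tree(tree, [], Emitter())
folded = compile_tree(tree, [ConstantFolder()], Emitter())
```

Adding a step is just adding an element to the passes list, and adding a language feature means a new node type, a new entry in the actions table, and a visit method on the passes that care about it.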

The only thing I don’t like about this specific approach is their Excel file based parser generator. It’s a huge step beyond the LEX/YACC approach as it is scanner-less (having separate scanner and parser steps kills any chance of modularity) but it still has to deal with ambiguous grammars. Personally, I’ve been looking at Parsing Expression Grammars in part because they aren’t ambiguous. For programming languages, supporting ambiguity in the grammar is a bug, not a feature.
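The reason a PEG can’t be ambiguous is its ordered choice: alternatives are tried in order and the first one that matches wins. Here is a minimal sketch using hand-rolled combinators (literal, sequence and choice are illustrative helpers written for this post, not a real parsing library), applied to the classic if/else case. It also works directly on characters, with no separate scanner.

```python
def literal(text):
    def parse(source, pos):
        # Match the exact text at pos; return (value, new position) or None.
        if source.startswith(text, pos):
            return text, pos + len(text)
        return None
    return parse


def sequence(*parsers):
    def parse(source, pos):
        results = []
        for p in parsers:
            outcome = p(source, pos)
            if outcome is None:
                return None  # one part failed, so the whole sequence fails
            value, pos = outcome
            results.append(value)
        return results, pos
    return parse


def choice(*parsers):
    def parse(source, pos):
        # Ordered choice: the first alternative that matches is the answer,
        # so there is never more than one parse to pick from.
        for p in parsers:
            outcome = p(source, pos)
            if outcome is not None:
                return outcome
        return None
    return parse


# The if-with-else form is listed first, so it is preferred when both match.
if_stmt = choice(
    sequence(literal("if c then s"), literal(" else s")),
    literal("if c then s"),
)

with_else = if_stmt("if c then s else s", 0)
without_else = if_stmt("if c then s", 0)
```

The trade-off is that the order of alternatives now carries meaning: swap the two branches of if_stmt and the else clause would never be consumed.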