Lang.NET 2006

Erik Meijer just posted details about Lang.NET 2006 over on Lambda the Ultimate. It looks to be the next generation of the Compiler Dev Lab I attended last month. They appear to have opened up the program significantly, and are asking for abstracts for both 30 minute talks and 10 minute “lightning” talks. If you’re interested in submitting, here’s the list of topics they are most interested in:

  • Dynamic languages and scripting
  • AJAX and ATLAS
  • Domain specific languages
  • Functional languages
  • Object-oriented and aspect-oriented programming
  • Web-services and mobile code
  • Libraries
  • Language-Integrated Query (LINQ)
  • Compiler frameworks
  • Garbage collection
  • JIT compilation
  • Visual Programming
  • Success and failure stories
  • Non-standard language features and implementation techniques
  • Tools and IDE support

Ruby.NET Project

Having written about Ruby in the scope of the Compiler Dev Lab and the Dual Schema Problem, I was interested to come across the Ruby.NET project from Queensland University of Technology. From the Ruby.NET home page:

Our goal is to create a compiler for the Ruby language that targets the .NET CLR. We aim to support 100% of Ruby language semantics, including all dynamic constructs such as closures and continuations. We plan to generate 100% managed and verifiable CIL code.

Sweet!

The Dual Schema Problem

A few months ago, Ted Neward wrote a great article about the history of the Object Relational Impedance Mismatch problem and how LINQ is addressing it in a new way. Basically, LINQ is introducing new language abstractions and complementary libraries to enable queries as a first class concept within the language. However, I don’t believe that O/R Impedance Mismatch is the whole problem. More specifically, it’s a follow-on problem to what I would call the Dual Schema problem.

In a nutshell, the Dual Schema problem is that you have to design and implement two separate versions of your persistent entities. There’s the in memory version, typically written in an OO language like C# or Java. Then there’s the on disk version, typically written in SQL. Regardless of the difficulties translating between the two versions (i.e. the aforementioned impedance mismatch), you have to first deal with the complexity of keeping the two versions in sync. While LINQ does a great job eliminating much of the friction translating between on disk and in memory formats, it could go much farther by eliminating the need for translation in the first place.

A variety of solutions to the Dual Schema problem have evolved, primarily outside the hallowed halls of enterprise vendors (i.e. MS and others like us). One such solution is Ruby on Rails. In a Rails environment, I simply declare the existence of a given persistent entity:

class Person < ActiveRecord::Base
end

The ActiveRecord base class (a standard part of Rails) will dynamically create methods and attributes on the Person object at runtime, based on the schema of the People table in the database. (Rails is smart enough to understand English plurals, hence the automatic connection of Person and People.) So technically there are still two schemas, but the in-memory version is automatically derived from the on-disk version.

(Note, DLinq provides a conceptually similar tool – SqlMetal – that can generate the static types from a given database schema. However, as static types they have to be defined at compile time. So while SqlMetal reduces the effort to keep schemas in sync, it doesn’t eliminate it the way Rails does.)

By slaving the object schema to the database schema, Rails essentially solves the Dual Schema problem. The problem with the Rails approach is that defining a database schema requires a significant amount of skill and effort. Defining classes is typically trivial in comparison. The fact that Rails allows you to implement a persistent entity with almost no code doesn’t help you much if you have to write and maintain a ton of SQL code to define your database schema.

I believe the Rails model is actually backwards. It would be much better for the developer if they could define their persistent entity in code and slave the database schema to the object model instead of the other way around.

Of course, this approach isn’t exactly news. In his article, Ted writes of the rise and fall of OO database management systems, which were supposed to solve the Dual Schema and Impedance Mismatch problems. I’m certainly not suggesting a return to the heyday of OODBMS. However, one of the reasons Ted gives for the failure of OODBMS is that big companies were already wedded to RDBMS. But those big companies are the short head. As you move down the long tail of software, a relational database as the primary storage paradigm makes less and less sense. For the vast majority of applications, relational databases are overkill.

Ted’s other point about OODBMS is that loose coupling between the data store and the in memory representation is a feature, not a flaw. He’s totally right. But can’t we advance the state of the art in database typing to the level of modern day OO languages? How about eliminating anachronisms like fixed length strings? What if we derived the database schema from the object model (Rails in reverse, if you will) but kept the two loosely coupled enough to allow for schema evolution?
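To make the “Rails in reverse” idea concrete, here is a minimal sketch of deriving a table definition from a class using Java reflection. The SchemaGenerator helper and its type mapping are assumptions made up for illustration (they are not part of Consus or any other library), and the column type names are simply borrowed from the kind of DDL a code-centric store might accept.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// The persistent entity is defined once, in code. No SQL is written by hand.
class Person {
    String name;
    String address;
    long salary;
}

// Hypothetical helper, for illustration only: derives a "create table"
// statement from a class definition.
class SchemaGenerator {
    // Assumed mapping from Java types to column types. A real mapping
    // would need to cover many more types.
    static String columnType(Class<?> javaType) {
        if (javaType == String.class) return "string";
        if (javaType == long.class || javaType == Long.class) return "bigint";
        throw new IllegalArgumentException("No column mapping for " + javaType.getName());
    }

    static String createTable(Class<?> entity) {
        Field[] fields = entity.getDeclaredFields();
        // getDeclaredFields() makes no ordering guarantee, so sort by name
        // to keep the generated statement deterministic.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        List<String> columns = new ArrayList<>();
        for (Field f : fields) {
            int mods = f.getModifiers();
            // Skip fields that are not part of the persistent state.
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || f.isSynthetic()) continue;
            columns.add(f.getName() + " " + columnType(f.getType()));
        }
        return "create table " + entity.getSimpleName()
                + " (" + String.join(", ", columns) + ")";
    }

    public static void main(String[] args) {
        // prints: create table Person (address string, name string, salary bigint)
        System.out.println(createTable(Person.class));
    }
}
```

Schema evolution is the hard part this sketch ignores: adding a field to Person would also require some way to migrate the existing table rather than recreate it.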

An example of this code-centric model for data storage is Consus. It’s written by Konstantin Knizhnik, who has written a bunch of open source, object-oriented and object-relational databases across a wide variety of languages and execution environments, including the CLR. Consus is actually written in Java, but he provides a version compiled for .NET using Visual J#. Consus lets you define your data either as tables or objects. So you can do this:

Statement st = db.createStatement();
st.executeUpdate("create table Person (name string, address string, salary bigint)");
st.executeUpdate("insert into Person values ('John Smith', '1 Guildhall St.', 75000)");
ResultSet rs = st.executeQuery(
    "select name, address, salary from Person where salary > 100000");

Or you can do this:

class Person {
    String name;
    String address;
    long salary;
    Person(String aName, long aSalary, String aAddress) {
        name = aName;
        salary = aSalary;
        address = aAddress;
    }
};

Person p = new Person("John Smith", 75000, "1 Guildhall St.");
ConsusStatement st = db.createStatement();
st.insert(p);
ConsusResultSet cursor = (ConsusResultSet)st.executeQuery(
    "select from Person where salary > 100000");

Consus also handles OO concepts like derivation and containment. Of course, the embedded queries are ugly, but you could imagine DLinq style support for Consus. In fact, one of the primary issues with Consus is that it supports both object and tuple style queries. When you explicitly request columns (i.e. “select name, address, salary from Person”), you’ve got a tuple style query. When you don’t (i.e. “select from Person”) you’ve got an object style query. Of course, the issues with tuple style queries are well documented in Ted’s article and are exactly the problem that LINQ is designed to solve.
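The difference between the two query styles is easiest to see from the caller’s side. The sketch below uses a hypothetical in-memory PersonStore (not the Consus API, and the second row is made-up sample data) that answers the same question both ways: once as untyped rows, once as typed objects.

```java
import java.util.ArrayList;
import java.util.List;

// Same shape as the Person class in the Consus example above.
class Person {
    final String name;
    final String address;
    final long salary;

    Person(String name, long salary, String address) {
        this.name = name;
        this.salary = salary;
        this.address = address;
    }
}

// Hypothetical in-memory store, for illustration only; not the Consus API.
class PersonStore {
    private final List<Person> people = new ArrayList<>();

    void insert(Person p) {
        people.add(p);
    }

    // Tuple style: "select name, address, salary from Person where salary > ?"
    // Each row is an untyped array. The caller must know the column order
    // and cast every value back to its real type.
    List<Object[]> selectTuples(long minSalary) {
        List<Object[]> rows = new ArrayList<>();
        for (Person p : people) {
            if (p.salary > minSalary) {
                rows.add(new Object[] { p.name, p.address, p.salary });
            }
        }
        return rows;
    }

    // Object style: "select from Person where salary > ?"
    // Each result is already a typed Person, so no casts are needed.
    List<Person> selectObjects(long minSalary) {
        List<Person> result = new ArrayList<>();
        for (Person p : people) {
            if (p.salary > minSalary) {
                result.add(p);
            }
        }
        return result;
    }
}

class QueryStylesDemo {
    public static void main(String[] args) {
        PersonStore store = new PersonStore();
        store.insert(new Person("John Smith", 75000, "1 Guildhall St."));
        store.insert(new Person("Jane Doe", 120000, "2 Example Rd.")); // made-up row

        Object[] row = store.selectTuples(100000).get(0);
        long salaryFromTuple = (Long) row[2]; // column position and cast are the caller's problem
        long salaryFromObject = store.selectObjects(100000).get(0).salary;
        System.out.println(salaryFromTuple + " " + salaryFromObject); // prints "120000 120000"
    }
}
```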

(Konstantin, if you’re reading this, drop me a line and I’ll look into getting you hooked up with the LINQ folks if you’re interested in adding LINQ support to Consus.NET.)

The tradeoff between the Rails approach and the Consus approach is one of performance. I have a ton of respect for Konstantin and the work he’s done on Consus and other OO and OR databases available from his site. However, I’m sure the combined developer forces at major database vendors like Microsoft (and other DB companies) mean SQL Server (and the like) will outperform Consus by a significant margin, especially on large scale databases. So if execution performance is your primary criterion, the Ruby on Rails approach is better (leaving aside discussion of the Ruby runtime itself). However, in the long run execution performance is much less important than developer productivity. So for all the current interest in Rails, I think a Consus-style model will become dominant.

Compiler Dev Lab – Scripting

Day Two of the Compiler Dev Lab was all about scripting. IronPython was the primary focus of the day, but the Phalanger (Managed PHP) and Monad folks were there as well.

  • I hadn’t realized just how performant these dynamic languages are on the CLR when compared to their native versions. The original version of IronPython was 1.7x faster than the standard C implementation back in the summer of ’04. With CLR 2.0, that same version is 2x faster without any code changes. The Phalanger folks said they are 2.5x faster than the native version of PHP (1.7x faster than PHP + the Zend Optimizer). That’s pretty impressive performance.
  • The IronPython folks are heavy users of the new DynamicMethod class from .NET 2.0. Otherwise known as Lightweight Code Generation, DynamicMethod allows you to emit a static function but have it get garbage collected when it’s no longer needed. IP almost never generates new classes, since new types can’t be garbage collected. The only times they generate actual classes are when you inherit from an existing .NET class or when you generate a new delegate type.
  • It’s really hard to serve the dual masters of both the existing language community and the .NET community. Jim Hugunin used the example of String.Trim(). A .NET developer would expect String.Trim() to “just work”. A Python developer would expect that to throw an AttributeError exception (the Python equivalent of Trim is strip). How do you handle this? In IP, it defaults to pure Python mode, but if you enter “import clr”, you move into .NET hybrid mode.
  • One of the typical features of dynamic languages is the ability to change the base class of an object on the fly. Jim demoed this with WPF. He created a class that inherited from one type of panel and then set the __class__ property of the object to a different panel and the display changed immediately. Freaky, but cool.
  • Jim showed a demo of a WPF app that hosted Python for extensibility. One of the scripts in turn hosted Python to create an interactive console for the app. Having a scripting engine that can host itself is awesome.
  • The VSIP SDK CTP (reg required) includes a sample language integration project for IronPython. So you can get both the source to the IP language itself and the source to the integration into Visual Studio.
  • I got an email yesterday from someone asking about the possibility of Visual Ruby.NET. I haven’t heard anything about it, but it would be cool to see Ruby on Rails running under the CLR. John Lam is working on RubyCLR, but my understanding is that it is a bridge between the CLR and the Ruby runtime, not a CLR implementation of the Ruby runtime. (IP is a CLR implementation of the Python runtime.) I’m thinking that there are some similarities between Ruby and Python, so having the source of IronPython would be a huge help in building a Visual Ruby implementation. For example, both Ruby and Python have closures. IP has a FunctionEnvironment class which is used to lift stack variables onto the heap in a variety of scenarios, including closures. So if I was building Visual Ruby, having access to the FunctionEnvironment class would be a good start.
  • I said yesterday that I need to learn more about F#. They showed a video of an internal F# presentation, but I spent most of my time cracking jokes with Sam Gentile, who’s in town for an SC-BAT workshop.
  • I didn’t pay enough attention to the Monad presentation. 😦
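
For anyone wondering what “lifting stack variables onto the heap” means for closures, here is a rough sketch of the idea. The CounterEnvironment class is a hypothetical name invented for this example. It is not IronPython’s FunctionEnvironment, whose actual design I’m not reproducing here, just the same trick in miniature.

```java
import java.util.function.IntSupplier;

// Hypothetical environment object, for illustration only. A variable that
// would normally live in a stack frame is stored in a heap object instead,
// so it can outlive the call that created it.
class CounterEnvironment {
    int count;
}

class ClosureDemo {
    // Returns a closure that increments and returns its own private counter.
    static IntSupplier makeCounter() {
        // Java lambdas can only capture effectively final locals, so the
        // mutable state goes into a heap object and the lambda captures
        // the (never reassigned) reference to it.
        CounterEnvironment env = new CounterEnvironment();
        return () -> ++env.count;
    }

    public static void main(String[] args) {
        IntSupplier first = makeCounter();
        IntSupplier second = makeCounter();
        System.out.println(first.getAsInt());  // prints 1
        System.out.println(first.getAsInt());  // prints 2
        System.out.println(second.getAsInt()); // prints 1 (separate environment)
    }
}
```

Each call to makeCounter gets its own environment, which is why the two counters don’t interfere with each other.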