I mentioned on the first day of the Compiler Dev Lab that Brian Beckman is a hoot. He’s also wicked smart. He posted about his demo from Monday where he demonstrates building indexes for use in LINQ queries. In his words:
In the terminology of relational databases, a “join” is, semantically, like a nested loop over a pair of lists (or tables) of records, saving only those where some certain fields match. Unless we do something smart, this could be very expensive. Imagine searching a database of a million DNA profiles for the closest match to a sample that has 10,000 DNA features (I have no idea whether those are real numbers: I just made them up, but they sound ballpark to me). A dumb join would search all 1 million profiles for each of the 10,000 features, resulting in 10 billion match tests, almost all of which will fail – by design, of course. That’s going to hurt.
The “something smart” is to build an index and search through that. Your database doesn’t have to be large at all for this to pay off. In fact, even with just a few records, it’s cheaper to build an index, use it, and throw it away than it is to do a nested loop.
He goes on to prove out his point about building an index. For his full dataset (joining 3053 cities with 195 countries) it is literally 65x slower not to build a one-off index. Even for smaller datasets, the time difference is less dramatic but still significant. For example, with 89 cities instead of 3053, it’s 3x slower not to build the index.
The reason I’m so interested in Brian’s post is because of my experiments with Ning. As you might recall, in trying to build a .NET version of Partisan Hacks, I found ASP.NET 2.0 to be significantly simpler than PHP (which Ning uses). However, building even the trivial SQL Express database for Partisan Hacks was a non-trivial exercise. Sure, I’ve done it many times before, but it seems strange that ASP.NET makes it so easy to build a site while SQL Server makes it so complex to build a database. If I was a novice user, I would never be able to build a database for my web site.
Why is this? I think that the simple app or amateur developer is simply not the target audience for SQL Server (even SQL Express). If you don’t know the difference between nvarchar(100) and varchar(max) you’re pretty much out in the cold when it comes to SQL Server. Their target audience appears to be enterprise databases that are cared for by enterprise database administrators. Databases with scores of tables and millions of rows. Great for them, bad for novice users who just want to persist their data somewhere quickly and easily.
Why can’t building my database be as simple as building my site?
Ning makes it easy to use their Content Store. You create an instance of a content object, you set properties (dynamic ones), you hit save. No fuss, no muss, no db schema. Sure is an easier model to understand and program to. In that regard, it blows away everything, even Ruby on Rails. RoR is pretty sweet, but it needs a real database schema on the back end in order to drive RoR’s guiding principle of “convention over configuration*“*. If there’s no DB schema to discover, I think much of the RoR model would break down. (but that may just be my lack of RoR experience talking)
I not sure what a simpler database system would look like, but one idea of mine is to use a schemaless database. Much of the complexity comes from having to define both an in memory as well as perseistant schema, as well as the translation between them. If you just stored managed .NET objects, you would eliminate the redundant schema specification. It’s not a fully fleshed out concept, but it is a start of an idea.
What other ideas would make persistant data significantly easier to work with?