Passion * Technology * Ruthless Competence

Thursday, April 24, 2008

Morning Coffee 164

  • Big news since my last Morning Coffee post was the announcement of Live Mesh. I've been running it for about a month, and I'm really digging it. Make sure you check out the team blog and watch the developer tour video (be on the lookout for IPy about halfway thru the video).

ALT.NET

  • I had a great time @ the ALT.NET open space conference last weekend. I was somewhat distracted on Saturday because, due to a family communication mixup, I had to bring my son Patrick with me. Jeffrey Palermo shot a cute video of him (3 minutes in) where he explains that he's at the conference "to be with my dad". Having a five year old is a little distracting, but everyone was amazingly cool with having him around. When he gets a little older I have no doubt he'll be attending conferences and leading open sessions.
  • I did a session on F#, but it felt kinda all over the place. I hadn't touched F# in a few months and it showed IMO. Matt Podwysocki was there to help keep the session from devolving into mass chaos. Thanks Matt.
  • My favorite session of the conference was Scott Hanselman's "Are We Innovating?" talk, which I think originated from a question I asked him: There are many examples of large OSS projects in other dev communities that get ported to .NET (NHibernate, NAnt, MonoRail, etc). Can you name one that's gone the other way? I can't.
  • I took Matt's advice and joined the local ALT.NET Seattle group.

DyLang Stuff

  • Martin Maly posts about how dynamic method dispatches are cached in three different layers by the DLR. You shouldn't care about this stuff if you're a DLR language user, but you will certainly care about it if you're a DLR language builder.
  • I'm really excited to see Phil Haack (whom I met F2F @ ALT.NET) is experimenting with IronRuby & ASP.NET MVC. True, I'd rather it was IPy, but his Routes.LoadFromRuby would work with Python with very little code change.
  • Note to self, take a deeper look at Twining, the IPy database DSL by David Seruyange.
  • Daily Michael Foord - Ironclad 0.2 Released. Ironclad is a project to implement Python's C extension API in C# so that IronPython could load standard Python C modules like SciPy and NumPy. So far, they're able to load the bz2 module.

Other Stuff

  • Congrats to Brad and Jim for shipping xUnit.net 1.0.
  • Everyone seems to be jumping on the functional C# coding bandwagon. Bart De Smet's series on pattern matching in C# is currently at eight posts. Now Luca Bolognese is in on the action, with three posts so far on functional code in C#. I like how Luca keeps writing that the C# syntax is "not terrible" for functional programming. Again, why suffer thru the "not terrible" syntax when you could be using F# instead? (via Charlie Calvert)
  • I need to take a look at VLinq. Charlie and Scott Hanselman both mentioned it recently.
  • I would like to have been in the conversation with Ted Neward, Neal Ford, Venkat Subramaniam, Don Box and Amanda Silver.
  • I haven't had any time to play with XNA of late, which means the great list of GDC videos Dave Weller posted on the XNA team blog will remain beyond my ability to invest time for now.
  • There's a new drop of Spec# from MS Research. IronRuby is using Spec# heavily as I recall.
Posted By Harry Pierson at 10:53 AM Pacific Daylight Time

Monday, December 17, 2007

Morning Coffee 131

  • On a recommendation from my mother-in-law, I've been watching Torchwood. It's sort of Men in Black: The Series, set in Cardiff. Since it's made in the UK, it'll be one of the few shows still running in the new year due to the WGA strike.
  • A while back I pointed out that many DotNetKicks articles were submitted by their authors. I submitted a few of my own, just for kicks (har har), with mixed results. Today, I discovered that the parse buffer post from my Practical Parsing in F# series was submitted, picked up some kicks, and made it to the home page. That's pretty cool. I guess writing more dev-focused articles is the way to go to get attention on DNK.
  • Amazon has rolled out a limited beta of SimpleDB, which appears to be S3 + query support. Cost is based on usage: 14¢/hour for machine utilization, 10¢/GB upload, 13-18¢/GB download and $1.50/GB storage/month. I'd love to see SimpleDB software that I could download and install, rather than hosted only. Even if I was going to use the hosted service, I'd like to develop against a non-hosted instance.
  • Research for sale! I was checking out the MS Research download feed and discovered a link to the Automatic Graph Layout (MSAGL) library. This was previously called GLEE (Graph Layout Execution Engine) and was "free for non-commercial use". Now, you can buy it for $295 from Windows Marketplace (though the previous free version is still available). The idea of directly commercializing research like this strikes me as pretty unusual. It must be a really good library.
  • Scott Guthrie shows off the new Dynamic Data Support that will ship as part of the ASP.NET Extensions. I'm like, whatever. Scaffolding wasn't that interesting to me in RoR, so it's no surprise that it's not that interesting in ASP.NET.
  • Jeff "Party With" Palermo blogs about the IoC support in the new MVC Contrib project. Also looks like they're porting RoR's simply_restful. (via Scott Guthrie)
  • I need to try out some of Tomas Restrepo's VS color schemes (also via Scott Guthrie).
Posted By Harry Pierson at 11:13 AM Pacific Standard Time

Friday, August 17, 2007

DataReaders, LINQ to XML and Range Generation

I'm doing a bunch of database / XML stuff @ work, so I decided to move to VS08 beta 2 so I can use LINQ. For reasons I don't want to get into, I needed a way to convert arbitrary database rows, read using a SqlDataReader, into XML. LINQ to SQL was out, since the code has to work against arbitrary tables (i.e. I have no compile time schema knowledge). But LINQ to XML helped me out a ton. Check out this example:

const string ns = "{http://some.sample.namespace.schema}";

while (dr.Read())
{
    XElement rowXml = new XElement(ns + tableName,
        from i in GetRange(0, dr.FieldCount)
        select
            new XElement(ns + dr.GetName(i), dr.GetValue(i)));
}

That's pretty cool. The only strange thing in there is the GetRange method. I needed an easy way to build a range of integers from zero to the number of fields in the data reader. I wasn't sure of any standard way, so I wrote this little two line function:

IEnumerable<int> GetRange(int min, int max)
{
    for (int i = min; i < max; i++)
        yield return i;
}

It's simple enough, but I found it strange that I couldn't find a standard way to generate a range with a more elegant syntax. Ruby has standard range syntax that looks like (1..10), but I couldn't find the equivalent C#. Did I miss something, or am I really on my own to write a GetRange function?

Update - As expected, I missed something. John Lewicki pointed me to the static Enumerable.Range method that does exactly what I needed.
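
For comparison, here's a rough sketch of the same row-to-XML trick in Python rather than C#. It isn't the code from work: the in-memory SQLite table, the rows_to_xml helper and the sample row are all made up for illustration. ElementTree happens to accept the same "{namespace}name" notation used above, and the built-in range plays the role of Enumerable.Range.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Same "{namespace}" prefix trick as the C# sample.
NS = "{http://some.sample.namespace.schema}"


def rows_to_xml(cursor, table_name):
    """Turn every row of an executed DB-API cursor into one XML element."""
    # cursor.description has one tuple per column; item 0 is the column name.
    names = [col[0] for col in cursor.description]
    elements = []
    for row in cursor:
        row_xml = ET.Element(NS + table_name)
        # range(0, n) yields 0..n-1, the same half-open range GetRange produces.
        for i in range(0, len(names)):
            field = ET.SubElement(row_xml, NS + names[i])
            field.text = str(row[i])
        elements.append(row_xml)
    return elements


# Hypothetical sample data, only here to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("create table Person (name text, salary integer)")
conn.execute("insert into Person values ('John Smith', 75000)")
people = rows_to_xml(conn.execute("select * from Person"), "Person")
```

As in the C# version, nothing here knows the table's schema ahead of time; the column names come from the reader at runtime.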

Posted By Harry Pierson at 4:55 PM Pacific Daylight Time
ADO.NET | Database | Development | LINQ | Ruby | XML

Wednesday, July 25, 2007

Early Afternoon Coffee 105

  • My two sessions on Rome went very well. Sort of like what I did @ TechEd last month, but with a bit more kimono opening since it was an internal audience. The best thing about doing these types of talks is the questions and post-session conversation. I've missed that since moving over to MSIT.
  • Late last week, I got my phone switched over to the new Office Communications Server 2007 beta. In my old office, I used the Office Communicator PBX phone integration features extensively. However, when we moved we got new IP phones that didn't integrate with Communicator. So when a chance to get on the beta came along, I jumped. I'll let you know my impressions after a few weeks; in the meantime you can read about Mark Deakin's experience.
  • Matevz Gacnik figures out how to build a transactional web service that interacts with the new transactional file system in Vista and Server 08. Interesting, but personally I don't believe in using transactional web services. The whole point of service orientation is to reduce the coupling between services. Tying two services (technically, a service consumer and provider) together in an atomic transaction seems like going in the wrong direction. Still, good on Matevz for digging into the transactional file system.
  • Udi Dahan gives us 6 simple steps to being a "top" IT consultant. I notice that getting well known, speaking and publishing are at the top of the list but actually being good at what you're well known for comes in at #5 on the list. I'm sure Udi thinks that's implicit in becoming a "top" consultant, but I'm not so sure.
  • Pat Helland thinks Normalization is for Sissies. Slide #6 has the key take away: "For God's Sake, Don't Normalize Immutable Data".
  • Larry O'Brien bashes the new binary efficient XML working group and working draft. I agree 100% w/ Larry. These aren't the droids we're looking for.
  • John Evdemon points to a new e-book from my old team called SOA in the Real World. I flipped thru it (figuratively) and it appears to drill into the Foundations of Solution Architecture as well as provide real-world case studies for each of the recurring logical capabilities. Need to give it a deeper read.
Posted By Harry Pierson at 12:36 PM Pacific Daylight Time

Wednesday, June 13, 2007

Morning Coffee 89

  • Akira in HD from XBL Video Marketplace? Coolness.
  • Omar Shahine has the WL Hotmail + Outlook scoop. Download it here. I've used this product off and on over the past few years. Typically, I would use it, love it, but then never get around to reinstalling it after a repave since it was a subscription-only product.
  • Microsoft releases eScrum project management tool. I've seen this internally but haven't used it yet. However, I have no doubt that the cool kids will deem it "not hot" in favor of Mingle. (via Larkware)
  • Ted Neward writes at length about relational databases, object databases and OR mapping. Ted may be Switzerland when it comes to platform, but he has no problem taking sides and mixing it up when it comes to data & object persistence. He makes some interesting points that mostly boil down to "different tools for different jobs". Also, has the dual schema problem entered the general vernacular, or just Ted's?
  • Nick Malik survives his trip to Nashville and has some thoughts on Ruby, Microsoft and alpha geeks. His point about the alpha geek track record (he cites PowerBuilder, Delphi and EJB) is spot on. This is something I've been thinking about since ETech last year. How good are alpha geeks at trendspotting? For every technology they adopt that makes the mainstream, how many don't? I'm guessing quite a few more than the three Nick mentions.
  • Speaking of alpha geeks, this whole ALT.NET silliness reminds me of the famous Groucho Marx quote: "I don't want to belong to any club that will accept me as a member." Though maybe I'm just bitter because "Working at MS" has been deemed "not hot". :)
Posted By Harry Pierson at 9:57 AM Pacific Daylight Time

Wednesday, March 21, 2007

Morning Coffee 49

  • The eBay Architecture SD Forum presentation that spawned the whole Transactionless meme is available here. As I reported yesterday, it doesn't call for going completely transactionless as Martin suggested. It calls for going without distributed transactions, which I agree with 100%.
  • More interesting than the transactional aspects, I found the data tier functional segmentation information fascinating. Too bad those guys aren't using our platform; SSB was expressly designed for exactly this sort of segmentation. I also liked that step 1 for "massively scaling J2EE" is to "throw out most of J2EE".
  • After going mostly dark since last August, the manager of my old team John deVadoss has been blogging up a storm since the beginning of March. So has my old boss Mike Platt. I wonder what happened at the beginning of March? Here's hoping this blogging fever spreads on my old team.
  • Joe McKendrick: "The bottom line is that ROI on SOA is an enterprise challenge, not an IT challenge." Truer words are rarely spoken.
  • The rumors about the black Xbox 360 "Elite" are coming fast and furious. I don't care about the HDMI port (my HDTV is five years old and doesn't have one) but I would like a bigger hard drive...
Posted By Harry Pierson at 10:25 AM Pacific Standard Time

Monday, March 19, 2007

VSTDB, Where Have You Been All My Life?

Honestly, this post started off as a rant entitled "Is it Me, or is DB Development a Pain in the Ass?" about the sorry state of database development tools in Visual Studio. But in searching around on MSDN for information about the built-in "Database Project" type (which could more accurately be called "just a bunch of SQL scripts"), I stumbled across information about the Database Professionals edition of Visual Studio Team System. That's right, I had forgotten that we shipped this late last year.

In short, VSTDB (or whatever the "official" acronym is) is 90% of what I was looking for in a DB dev tool. Sure, it's not perfect, but it's a massive improvement over the previous state of the art.

The primary feature of VSTDB is the ability to "build" your database the same way you build your code. You use lots of small scripts that get compiled together into a model (for lack of a better term) of the database as you've defined it. That model can be deployed to a new database instance or used to update an existing instance. You can also compare that model against an existing database in order to determine what's changed and automatically build update SQL scripts for the DBAs to run in the production environment (since you don't want your developers doing that).
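
To make the "compare the model against an existing database" idea concrete, here's a toy sketch in Python. It is not how VSTDB works internally, which I haven't looked into. The diff_schema function and both sample schemas are hypothetical, and it only handles new tables and new columns.

```python
def diff_schema(model, deployed):
    """Return SQL statements that would bring `deployed` up to `model`.

    Both arguments map table name -> {column name: SQL type}. Only additions
    are handled; drops and type changes are left out of this sketch.
    """
    statements = []
    for table, columns in sorted(model.items()):
        if table not in deployed:
            cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        for name, sql_type in columns.items():
            if name not in deployed[table]:
                statements.append(f"ALTER TABLE {table} ADD {name} {sql_type};")
    return statements


# Hypothetical model (what the project defines) and deployed schema (what production has).
model = {
    "Person": {"Id": "int", "Name": "nvarchar(100)", "Email": "nvarchar(256)"},
    "Orders": {"Id": "int", "PersonId": "int"},
}
deployed = {"Person": {"Id": "int", "Name": "nvarchar(100)"}}
script = diff_schema(model, deployed)
```

The output is just a list of statements, which is the point: the developer maintains the model, and the DBA gets a script to review and run.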

It takes a little getting used to, but the "lots of small scripts" approach has a lot of upside. If you have a table with a primary key, you're supposed to define the primary key, indexes, constraints, triggers, etc. in separate scripts from the table creation script. This makes things much easier when you're trying to figure out what's changed in your source control system.

VSTDB has a variety of other cool looking features like data generation and unit testing, but I haven't really dug into them much yet. One thing that VSTDB supports that I wasn't expecting was Service Broker! SQL Management Studio has limited SSB support - if you want to create new SSB objects you have to write the DDL directly. VSTDB requires you to write the SSB DDL also (it makes you write DDL for everything, see below) but it at least has templates for all the SSB object types. Very Cool!

Of course, there are always things that could be improved. The T-SQL editor does syntax highlighting but not IntelliSense. It doesn't support the existing visual database tools like the Table Designer. And while you can build T-SQL stored procs, functions, types, etc., VSTDB doesn't support the development of managed SQLCLR stored procs, et al. Things to work on for v2, I suppose.

If you're using VS Team Suite, you can download an add-on that adds VSTDB functionality to your existing VSTS installation. It's only 8MB, so it's definitely the way to go for Team Suite users.

Posted By Harry Pierson at 4:15 PM Pacific Standard Time

Wednesday, August 16, 2006

Business Processes Are Services Too

I've been having a conversation with Piyush Pant over on his blog that started as a comment he left on my Services Aren't Stateless post. He thinks that I'm "missing the crucial point here by implicitly conflating business process and service state". While Piyush hasn't really defined what he means by these terms, I think I understand what he's getting at. Yes, process and service state are different in many ways, but they are also similar in that they are both service private data.

Pat Helland (side note - I wish Pat would start blogging again) wrote an article some time ago titled Data on the Outside vs. Data on the Inside where he talked about the differences between service private data and data in the space between the services. For example, data on the outside is immutable, requires an open schema for interop, doesn't need encapsulation and is representable in XML. Service private data is not immutable, doesn't need an open schema for interop, requires encapsulation and is typically stored in a SQL RDBMS. So on this front, process and service state are both service private data so conflating them makes some sense.

However, what's not in the article is the idea of Resource and Activity data. Not sure why Pat didn't include this in the article, but he was talking about it as far back as PDC 2003. Stu Charlton described the difference between resource and activity data in his Autonomous Services article:

Activity Data - This is "work in progress" data for any long-running business operation, and is usually encapsulated by business logic. A classic example is a shopping cart in any e-commerce system. This data is mutable, but typically has low concurrency conflicts, as it is not widely shared. Typically activity data retires after a long running operation completes, and may be archived in a decision support system for later analysis.

Resource Data - This is "state of the business" data, which represents the resources of an organization, and is usually encapsulated by business logic. Examples are: room availability in a hotel, inventory levels in a warehouse, account statuses, employee and customer information. Some resources have a small life span, others may last a very long time (years). Resource data is usually volatile with potential for high concurrency conflicts.

So I'm fairly sure that when Piyush says "process state" I should hear "activity data". Similarly "service state" is "resource data". The differences between activity and resource data lead to some interesting implementation artifacts, which I assume he's getting at when he says I'm conflating the two. For example, since activity data like a shopping cart has low or no concurrency issues, using an optimistic concurrency scheme is entirely appropriate, which you would never use for highly volatile resource data like warehouse inventory levels. In fact, since activity data doesn't have concurrency issues, you could even store it inside an instance of workflow or orchestration, which gets serialized to a persistent store when it's in an idle state.
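
Here's a minimal sketch of what an optimistic concurrency scheme for shopping cart data could look like, written in Python for brevity. The CartStore class, its in-memory dictionary and the version counter are all assumptions for illustration; a real service would keep this in its private database.

```python
class StaleWriteError(Exception):
    """Raised when a writer's copy of the cart is older than the stored one."""


class CartStore:
    def __init__(self):
        self._rows = {}  # cart id -> (version, items)

    def load(self, cart_id):
        # An unknown cart reads as version 0 with no items.
        return self._rows.get(cart_id, (0, ()))

    def save(self, cart_id, expected_version, items):
        current_version, _ = self._rows.get(cart_id, (0, ()))
        # No locks are held between load and save. We only check at write
        # time that nobody else saved in between.
        if current_version != expected_version:
            raise StaleWriteError(f"cart {cart_id} is at version {current_version}")
        self._rows[cart_id] = (current_version + 1, tuple(items))
        return current_version + 1
```

Because a cart is rarely shared, the version check almost never fails and the cost of the occasional retry is low. With warehouse inventory, many writers would hit the same row and most of them would be sent back to retry.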

However, the fact that activity and resource data is handled differently doesn't mean that most services won't have activity data. When Thomas Erl says that stateless services are a "common principle of service orientation", essentially what I think he's saying is that services should only have resource data. And as I said before, this seems wrong to me. Sure, some services will be stateless. But all services? Services implement business capabilities. Most business capabilities are long running processes. Doesn't that imply that most services in the enterprise will need to be long running workflows or orchestrations?

So for the most part, Piyush and I just seem to have different names for the same concepts. The one issue I have with Piyush's description of process and service state is that he seems to implicitly assume that processes aren't services. Why not? Again, not all services will be processes, but if you're not exposing processes as services, how exactly are you exposing them?

Posted By Harry Pierson at 3:21 PM Pacific Daylight Time

Monday, June 26, 2006

HawkEye on Entity Data Model Announcement

My pal Tim dropped me an email last week to let me know they (the ADO.NET team) were publishing their vNext vision around entities. Of course, they picked the week when I'm in San Diego! So I didn't get a chance to look at it until today. In a nutshell, they are raising the level of abstraction for databases. Regular DevHawk readers know I talk about abstraction a lot around here. In fact, one of my earliest posts on this blog - 1 house, 2 kids and 5 jobs ago - was on Disruptive Programming Language Technologies. So this is a topic near and dear to my heart.

This is an amazingly good thing. Think of the impact VB had on the development industry, but bigger. The abstraction level of databases hasn't been raised in decades. It's about freaking time we did.

My only problem with the article is that it's pretty obtuse. Referring to this as "Making the Conceptual Level Real" makes it sound much less exciting than it really is. Nobody refers to C# as a "conceptual" programming language. But if you use the terminology from the vision article, that's exactly what it is. Machine code is the physical level, IL is the logical layer and C# would then be the conceptual layer. But let's say you build a compiler that compiles C# directly to machine code. Would it suddenly become the logical layer? Who knows? Who cares? Let's just raise the level of abstraction and not get all caught up naming the level we're currently at.

VB was introduced 15 years ago in 1991. Most developers in the industry are aware of and remember the impact VB had (if you don't, check out Billy Hollis' History of BASIC). The relational model was introduced 36 years ago. The first RDBMS was introduced 28 years ago. I'd bet the majority of developers in the industry today don't remember a time before databases. Hell, I was introduced 36 years ago myself. (I'm sure my dad remembers programming before databases, but he doesn't code much these days.)

As I said, this is going to be big and it's about freaking time. So hats off to the ADO.NET team. Can't wait to see this running. According to this, first CTP drop is August, so you don't even have to wait too long.

Posted By Harry Pierson at 2:35 PM Pacific Daylight Time

Tuesday, April 18, 2006

ActiveRecord::Migration

When I wrote about the Dual Schema problem a few weeks ago, I specifically wrote that the Rails model is backwards because it derives the in-memory schema from the database schema. While I still believe that, Rails' ActiveRecord::Migration library does make it significantly easier to manage the database from Ruby code. For those not familiar, ActiveRecord::Migration is a series of Ruby script files that define the database schema. Inside each migration script is an up and down method, so you can migrate forward and backward in the history of your project. And it provides easy to use abstractions such as create_table and add_column so you don't have to geek out on SQL syntax (unless you want to). Once you have a collection of these scripts, simply calling "rake migrate" will bring your database instance up to the current schema (rake is Ruby's equivalent of make). Or, you can set your database to a specific version of the schema by running "rake migrate VERSION=X".
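
For those who haven't seen the up/down pattern, here's a rough sketch of the idea in Python against an in-memory SQLite database. It is not the Rails implementation: real migrations are Ruby classes using create_table and friends. The MIGRATIONS list, the migrate function and the use of SQLite's user_version pragma to track the current version are my own assumptions.

```python
import sqlite3

# Each entry is (up statement, down statement). Hypothetical sample schema.
MIGRATIONS = [
    ("create table people (id integer primary key, name text)", "drop table people"),
    ("create table orders (id integer primary key, person_id integer)", "drop table orders"),
]


def current_version(conn):
    return conn.execute("pragma user_version").fetchone()[0]


def migrate(conn, target=None):
    """Run up or down steps until the database is at `target` (default: latest)."""
    if target is None:
        target = len(MIGRATIONS)
    if not 0 <= target <= len(MIGRATIONS):
        raise ValueError(f"no such schema version: {target}")
    version = current_version(conn)
    while version < target:  # migrate forward
        conn.execute(MIGRATIONS[version][0])
        version += 1
    while version > target:  # migrate backward
        conn.execute(MIGRATIONS[version - 1][1])
        version -= 1
    conn.execute(f"pragma user_version = {int(version)}")
    return version


def table_names(conn):
    rows = conn.execute("select name from sqlite_master where type = 'table' order by name")
    return [row[0] for row in rows]
```

Calling migrate(conn) plays the part of "rake migrate", and migrate(conn, 1) plays the part of "rake migrate VERSION=1".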

I wonder why the Rolling on Rails tutorial uses the database tools directly instead of ActiveRecord::Migration? I'm thinking it wasn't available when the tutorial was written. Whatever the reason, they really should update the tutorial to reflect the current state of Rails.

Posted By Harry Pierson at 3:22 PM Pacific Daylight Time

Tuesday, March 28, 2006

The Dual Schema Problem

A few months ago, Ted Neward wrote a great article about the history of the Object Relational Impedance Mismatch problem and how LINQ is addressing it in a new way. Basically, LINQ is introducing new language abstractions and complementary libraries to enable queries as a first class concept within the language. However, I don't believe that O/R Impedance Mismatch is the whole problem. More specifically, it's a follow-on problem to what I would call the Dual Schema problem.

In a nutshell, the Dual Schema problem is that you have to design and implement two separate versions of your persistent entities. There's the in memory version, typically written in an OO language like C# or Java. Then there's the on disk version, typically written in SQL. Regardless of the difficulties translating between the two versions (i.e. the aforementioned impedance mismatch), you have to first deal with the complexity of keeping the two versions in sync. While LINQ does a great job eliminating much of the friction translating between on disk and in memory formats, it could go much farther by eliminating the need for translation in the first place.

A variety of solutions to the Dual Schema problem have evolved, primarily outside the hallowed halls of enterprise vendors (i.e. MS and others like us). One such solution is Ruby on Rails. In a Rails environment, I simply declare the existence of a given persistent entity:

class Person < ActiveRecord::Base
end

The ActiveRecord base class (a standard part of Rails) will dynamically create methods and attributes on the Person object at runtime, based on the schema of the People table in the database. (Rails is smart enough to understand English plurals, hence the automatic connection of Person and People.) So technically there are still two schemas, but the in-memory version is automatically derived from the on-disk version.
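
To show the general trick of deriving the in-memory shape from the on-disk schema at runtime, here's a small Python sketch. It is not how ActiveRecord is implemented. The record_class helper is hypothetical, and it reads column names from SQLite's table_info pragma rather than doing anything as clever as pluralization.

```python
import sqlite3


def record_class(conn, class_name, table):
    """Build a class at runtime whose attributes mirror the table's columns."""
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    columns = [row[1] for row in conn.execute(f"pragma table_info({table})")]

    def __init__(self, **values):
        # Every column becomes an attribute; columns not supplied default to None.
        for column in columns:
            setattr(self, column, values.get(column))

    return type(class_name, (), {"__init__": __init__, "columns": columns})


# Hypothetical table, standing in for the People table.
conn = sqlite3.connect(":memory:")
conn.execute("create table people (id integer primary key, name text, salary integer)")
Person = record_class(conn, "Person", "people")
john = Person(name="John Smith", salary=75000)
```

Note that nothing in the Python source says Person has a name or a salary. Add a column to the table and the class picks it up the next time it's built.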

(Note, DLinq provides a conceptually similar tool - SqlMetal - that can generate the static types from a given database schema. However, as static types they have to be defined at compile time. So while SqlMetal reduces the effort to keep schemas in sync, it doesn't eliminate it the way Rails does.)

By slaving the object schema to the database schema, Rails essentially solves the Dual Schema problem. The problem with the Rails approach is that defining a database schema requires a significant amount of skill and effort. Defining classes is typically trivial in comparison. The fact Rails allows you to implement a persistent entity with almost no code doesn’t help you much if you have to write and maintain a ton of SQL code to define your database schema.

I believe the Rails model is actually backwards. It would be much better for the developer if they could define their persistent entity in code and slave the database schema to the object model instead of the other way around.

Of course, this approach isn't exactly news. In his article, Ted writes of the rise and fall of OO database management systems, which were supposed to solve the Dual Schema and Impedance Mismatch problems. I'm certainly not suggesting a return to the heyday of OODBMS. However, one of the reasons Ted points out OODBMS failed was because big companies were already wedded to RDBMS. But those big companies are the short head. As you move down the long tail of software, relational database as the primary storage paradigm makes less and less sense. For the vast majority of applications, relational databases are overkill.

Ted's other point about OODBMS is that loose coupling between the data store and the in memory representation is a feature, not a flaw. He's totally right. But can't we advance the state of the art in database typing to the level of modern day OO languages? How about eliminating anachronisms like fixed length strings? What if we derive the database schema from the object model - Rails in reverse if you will - in a way that's still loosely coupled enough to allow for schema evolution?
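
As a toy illustration of "Rails in reverse", here's a Python sketch that derives a create table statement from a class definition. The SQL_TYPES mapping and the create_table_sql helper are assumptions of mine, and the sketch ignores keys, nullability and schema evolution entirely.

```python
from dataclasses import dataclass, fields

# Hypothetical mapping from in-memory types to column types. Strings are
# unbounded, so there's no fixed length to pick.
SQL_TYPES = {str: "nvarchar(max)", int: "bigint", float: "float", bool: "bit"}


@dataclass
class Person:
    name: str
    address: str
    salary: int


def create_table_sql(entity):
    """Derive the on-disk schema from the class, not the other way around."""
    columns = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(entity))
    return f"create table {entity.__name__} ({columns})"


ddl = create_table_sql(Person)
```

The class is the only schema the developer writes; the SQL is a build artifact.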

An example of this code-centric model for data storage is Consus. It’s written by Konstantin Knizhnik, who has written a bunch of open source, object-oriented and object-relational databases across a wide variety of languages and execution environments, including CLR. Consus is actually written in Java, but he provides a version compiled for .NET using Visual J#. Consus lets you define your data either as tables or objects. So you can do this:

Statement st = db.createStatement();
st.executeUpdate(
    "create table Person (name string, address string, salary bigint)");
st.executeUpdate(
    "insert into Person values ('John Smith', '1 Guildhall St.', 75000)");
ResultSet rs = st.executeQuery(
    "select name, address, salary from Person where salary > 100000");

Or you can do this:

class Person {
    String name;
    String address;
    long salary;
    Person(String aName, long aSalary, String aAddress) {
        name = aName;
        salary = aSalary;
        address = aAddress;
    }
};

Person p = new Person("John Smith", 75000, "1 Guildhall St.");
ConsusStatement st = db.createStatement();
st.insert(p);
ConsusResultSet cursor = (ConsusResultSet)st.executeQuery(
    "select from Person where salary > 100000");

Consus also handles OO concepts like derivation and containment. Of course, the embedded queries are ugly, but you could imagine DLinq style support for Consus. In fact, one of the primary issues with Consus is that it supports both object and tuple style queries. When you explicitly request tables (i.e. "select name, address, salary from Person"), you’ve got a tuple style query. When you don’t (i.e. "select from Person") you’ve got an object style query. Of course, the issues with tuple style queries are well documented in Ted’s article and are exactly the problem that LINQ is designed to solve.

(Konstantin, if you’re reading this, drop me a line and I’ll look into getting you hooked up with the LINQ folks if you’re interested in adding LINQ support to Consus.NET.)

The tradeoff between the Rails approach and the Consus approach is one of performance. I have a ton of respect for Konstantin and the work he’s done on Consus and other OO and OR databases available from his site. However, I'm sure the combined developer forces at major database vendors like Microsoft (and other DB companies) mean SQL Server (and the like) will outperform Consus by a significant margin, especially on large scale databases. So if execution performance is your primary criterion, the Ruby on Rails approach is better (leaving aside discussion of the Ruby runtime itself). However, in the long run execution performance is much less important than developer productivity. So for all the current interest in Rails, I think a Consus-style model will become dominant.

Posted By Harry Pierson at 2:35 PM Pacific Standard Time

Wednesday, March 15, 2006

The SQL Complexity Problem

I mentioned on the first day of the Compiler Dev Lab that Brian Beckman is a hoot. He's also wicked smart. He posted about his demo from Monday where he demonstrates building indexes for use in LINQ queries. In his words:

In the terminology of relational databases, a “join” is, semantically, like a nested loop over a pair of lists (or tables) of records, saving only those where some certain fields match. Unless we do something smart, this could be very expensive. Imagine searching a database of a million DNA profiles for the closest match to a sample that has 10,000 DNA features (I have no idea whether those are real numbers: I just made them up, but they sound ballpark to me). A dumb join would search all 1 million profiles for each of the 10,000 features, resulting in 10 billion match tests, almost all of which will fail – by design, of course. That’s going to hurt.

The “something smart” is to build an index and search through that. Your database doesn’t have to be large at all for this to pay off. In fact, even with just a few records, it’s cheaper to build an index, use it, and throw it away than it is to do a nested loop.

He goes on to prove out his point about building an index. For his full dataset (joining 3053 cities with 195 countries) it is literally 65x slower not to build a one-off index. For smaller datasets, the time difference is less dramatic but still significant. For example, with 89 cities instead of 3053, it's 3x slower not to build the index.
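
Here's a minimal sketch of the two approaches in Python. It's not Brian's LINQ code. The function names and the tuple layout are my own, and it assumes each country code appears only once.

```python
def nested_loop_join(cities, countries):
    """The dumb join: test every city against every country."""
    return [
        (city, country_name)
        for city, city_code in cities
        for country_code, country_name in countries
        if country_code == city_code
    ]


def indexed_join(cities, countries):
    """Build a throwaway index on country code, then do one lookup per city."""
    index = {code: name for code, name in countries}
    return [(city, index[code]) for city, code in cities if code in index]


# Hypothetical sample data: (city, country code) and (country code, country name).
cities = [("Seattle", "US"), ("Cardiff", "GB"), ("Rome", "IT")]
countries = [("US", "United States"), ("GB", "United Kingdom")]
```

The nested loop does len(cities) * len(countries) comparisons. The indexed version does one pass over the countries to build the dictionary plus one lookup per city, which is why it pays off even when the index is thrown away afterward.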

The reason I'm so interested in Brian's post is because of my experiments with Ning. As you might recall, in trying to build a .NET version of Partisan Hacks, I found ASP.NET 2.0 to be significantly simpler than PHP (which Ning uses). However, building even the trivial SQL Express database for Partisan Hacks was a non-trivial exercise. Sure, I've done it many times before, but it seems strange that ASP.NET makes it so easy to build a site while SQL Server makes it so complex to build a database. If I was a novice user, I would never be able to build a database for my web site.

Why is this? I think that the simple app or amateur developer is simply not the target audience for SQL Server (even SQL Express). If you don't know the difference between nvarchar(100) and varchar(max) you're pretty much out in the cold when it comes to SQL Server. Their target audience appears to be enterprise databases that are cared for by enterprise database administrators. Databases with scores of tables and millions of rows. Great for them, bad for novice users who just want to persist their data somewhere quickly and easily.

Why can't building my database be as simple as building my site?

Ning makes it easy to use their Content Store. You create an instance of a content object, you set properties (dynamic ones), you hit save. No fuss, no muss, no db schema. Sure is an easier model to understand and program to. In that regard, it blows away everything, even Ruby on Rails. RoR is pretty sweet, but it needs a real database schema on the back end in order to drive RoR's guiding principle of "convention over configuration". If there's no DB schema to discover, I think much of the RoR model would break down. (but that may just be my lack of RoR experience talking)

I'm not sure what a simpler database system would look like, but one idea of mine is to use a schemaless database. Much of the complexity comes from having to define both an in-memory and a persistent schema, as well as the translation between them. If you just stored managed .NET objects, you would eliminate the redundant schema specification. It's not a fully fleshed out concept, but it is a start of an idea.
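Here's a minimal sketch of what "create an object, set properties, hit save" might look like. It's in Python rather than .NET purely to keep it short, and everything in it is an assumption: `ContentStore`, `save`, `load` and `find` are hypothetical names, not Ning's API, and an in-memory dictionary of JSON blobs stands in for real persistence.

```python
import json
from types import SimpleNamespace


class ContentStore:
    """Hypothetical schemaless store: each record is a JSON blob keyed by id."""

    def __init__(self):
        self._rows = {}    # id -> JSON text (stand-in for durable storage)
        self._next_id = 1

    def save(self, obj):
        # Persist whatever attributes the object happens to have.
        # There is no table definition and no object/relational mapping.
        if not hasattr(obj, "id"):
            obj.id = self._next_id
            self._next_id += 1
        self._rows[obj.id] = json.dumps(vars(obj))
        return obj.id

    def load(self, obj_id):
        return SimpleNamespace(**json.loads(self._rows[obj_id]))

    def find(self, **criteria):
        # Linear scan over every record, matching on whichever properties exist.
        matches = []
        for blob in self._rows.values():
            record = json.loads(blob)
            if all(record.get(key) == value for key, value in criteria.items()):
                matches.append(SimpleNamespace(**record))
        return matches


store = ContentStore()
hack = SimpleNamespace(title="Partisan Hacks", votes=3)
hack.tags = ["politics"]      # a property added on the fly, no schema change
hack_id = store.save(hack)
loaded = store.load(hack_id)
```

The object's attributes are the only schema there is. The cost shows up in `find`, which has to scan every record because nothing tells the store which properties are worth indexing.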

What other ideas would make persistent data significantly easier to work with?

Posted By Harry Pierson at 11:51 PM Pacific Standard Time
Disclaimer: The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the author's discretion.