Service Factory Customization Workshop Day One

No morning coffee posts for the first half of this week, because I’m in training thru Wednesday. Day one was mostly an overview of GAT and DSL, which was review for me. Today we’re starting to dig into some of the new stuff they’ve built for the new version of WSSF, so I’m paying much closer attention.

This isn’t your typical workshop in that the content is sort of being generated on the fly. As I type, we’re voting on what we’re going to cover for the next two days. Most classes I’ve been in are pre-programmed; the teacher doesn’t ask the class what topics should be covered and in what order. There isn’t even one “teacher” - there are five folks from p&p, including the architect, dev lead and PM of WSSF, who are tag-teaming. Even the hands-on labs aren’t completely ironed out – they’re evolving the lab directions as we do the labs. It’s atypical, but it works.

Afternoon Coffee 106

Lots of meetings today, so my coffee post is late…

  • The Big News™: Visual Studio 2008 and .NET Framework 3.5 Beta 2 are available for download. Soma and Scott have more. Silverlight 1.0 RC and the Silverlight Add-in for VS08 will apparently be available in a couple of days. Finally, there’s a go-live license for the framework, so you can get a head start deploying apps before VS08 and NETFX 3.5 RTM. Time to build out a new VPC image.
  • Next week, I’m attending the p&p Service Factory v3 Customization Workshop. I’m looking forward to playing with the new Service Factory drop, but I’m really interested in learning more about building factories. I wonder if they’re going to discuss their VS08 plans.
  • Nick Malik recently wrote about making “middle out SOA” work. I hate the term “middle-out”. It feels like we’re pinning our hopes on middle-out because we know top-down and bottom-up don’t work. My old boss John DeVadoss (who assures me he’ll be blogging regularly again “soon”) talks about big vs. little SOA, with big SOA being “dead”. I like the term “little SOA” better than “middle-out SOA”, but just because big SOA is a big failure doesn’t mean little SOA will make any headway.
  • There’s a new F# drop available. Don Syme has the details. Looks like they’ve got some interesting new constructs for async and parallel programming.
  • ABC announced yesterday that they are streaming HD on their website, so you can check out the season finale of Lost in HD for free. They embed commercials, so it’s not really “free”, but you don’t have to pay $3 an episode like you do on XBLM. I wonder if XBLM might offer this capability in the future? It certainly would increase my use of XBLM (as would an all-you-can-eat pricing scheme).

The Durable Messaging Debate Continues

Last week, Nick Malik responded to Libor Soucek’s advice to avoid durable messaging. Nick points out that while both durable and non-durable messaging require some type of compensation logic (nothing is 100% foolproof because fools are so ingenious), the durable messaging compensation logic is significantly simpler.

This led to a very long conversation over on Libor’s blog. Libor started by clarifying his original point, and then the two of them went back and forth in the comments. It’s been very respectful; Libor calls both Nick and me “clever and influential”, though he also thinks we’re wrong on this durable messaging thing. In my private emails with Libor, he’s been equally respectful and his opinion is very well thought out, though obviously I think he’s the one who’s wrong. 😄

I’m not sure how much is clear from Libor’s public posts, but it looks like most of his recent experience comes from building trading exchanges. According to his about page, he’s been building electronic trading systems since 2002. While I have very little experience in that domain, I can see very clearly how the highly redundant, reliable multicast approach he describes would be a very good, if not the best, solution.

But there is no system inside Microsoft IT that looks even remotely like a trading exchange. Furthermore, I don’t think approaches for building a trading exchange generalize well. So that means Nick and I have very different priorities than Libor, something that seems to materialize as a significant amount of talking past each other. As much as I respect Libor, I can’t shake the feeling that he doesn’t “get” my priorities and I wouldn’t be at all surprised if he felt the same way about me.

The biggest problem with his highly redundant approach is the sheer cost when you consider the large number of systems involved. According to Nick, MSIT has “over 2700 applications in 82 solution domains”. When you consider the cost for taking a highly redundant approach across that many applications, the cost gets out of control very quickly. Nick estimates that the support staff cost alone for tripling our hardware infrastructure to make it highly redundant would be around half a billion dollars a year. And that doesn’t include hardware acquisition costs, electricity costs, real-estate costs (gotta put all those servers somewhere) or any other costs. The impact to Microsoft’s bottom line would be enormous, for what Nick calls “negligible or invisible” benefit.

There’s no question that high availability costs big money. I just asked Dale about it, and he said that in his opinion going above 99.9% availability increases costs “nearly exponentially”. He estimates just going from 99% to 99.9% doubles the cost. 99% availability is almost 15 minutes of downtime per day (on average). 99.9% is about 90 seconds downtime per day (again, on average).
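
For reference, here’s the back-of-the-envelope arithmetic behind those numbers, as a quick Python sketch (just the per-day averages, nothing more):

```python
# Average downtime per day implied by a given availability level.
SECONDS_PER_DAY = 24 * 60 * 60

for availability in (0.99, 0.999, 0.9999):
    downtime_seconds = SECONDS_PER_DAY * (1 - availability)
    print(f"{availability:.2%} availability -> "
          f"{downtime_seconds / 60:.1f} min ({downtime_seconds:.0f} sec) of downtime per day")
```

That works out to roughly 14.4 minutes a day at 99% and 86 seconds a day at 99.9%, which is where the “almost 15 minutes” and “about 90 seconds” figures come from.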

How much is that 13 extra minutes of uptime per day worth? I would say “depends on the application”. How many of the 2700 applications Nick mentions need even 99% availability? Certainly some do, but I would guess that less than 10% of those systems need better than 99% availability. What pretty much all of them actually need is high reliability, which is to say they need to work even in the face of “hostile or unexpected circumstances” (like system failures and downtime).

High availability implies high reliability. However, the reverse is not true. You can build systems to gracefully handle failures without the cost overhead of highly redundant infrastructure intended to avoid failures. Personally, I think the best way to build such highly reliable yet not highly available systems is to use durable messaging, though I’m sure there are other ways.
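
To make that concrete, here’s a minimal sketch of the durable messaging idea. It uses sqlite3 purely as a stand-in for a real durable queue (MSMQ, SQL Service Broker, etc.), and the names are mine, not any product’s API: the sender is done as soon as the message is durably recorded, and a background pump worries about actually getting it delivered.

```python
import sqlite3

# sqlite3 standing in for a durable message store; schema and names are illustrative.
db = sqlite3.connect("outbox.db")
db.execute("CREATE TABLE IF NOT EXISTS outbox "
           "(id INTEGER PRIMARY KEY, body TEXT, sent INTEGER DEFAULT 0)")

def send_durable(body: str) -> None:
    # The sender is finished once the message is durably recorded. It needs no
    # retry/timeout bookkeeping of its own for receiver outages, which is where
    # the "simpler compensation logic" argument comes from.
    with db:
        db.execute("INSERT INTO outbox (body) VALUES (?)", (body,))

def deliver(body: str) -> None:
    # Placeholder for the actual call to the receiving service.
    print("delivered:", body)

def deliver_pending() -> None:
    # A background pump retries until delivery succeeds. Undelivered messages
    # survive crashes and downtime because they live in durable storage - that's
    # reliability without a highly available receiver.
    for msg_id, body in db.execute("SELECT id, body FROM outbox WHERE sent = 0").fetchall():
        try:
            deliver(body)
        except Exception:
            continue  # leave it queued; try again on the next pass
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (msg_id,))

send_durable("order #42 placed")
deliver_pending()
```

The real thing would use an actual queueing product and deal with poison messages, ordering and idempotency, but the shape is the same: reliability comes from the durable store plus the retry loop, not from the receiver never being down.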

This is probably the biggest difference between Libor and me. I am actively looking to trade away availability (not reliability) in return for lowering the cost of building and running a system. To someone who builds electronic trading systems like Libor, that probably sounds completely wrongheaded. But an electronic trading system falls into the minority of systems that need high availability (ultra-high, five-nines availability in this case). For the systems that actually do need high availability, you have to invest in redundancy to get it. But for the rest of the systems, there’s a less costly way to get the reliability you need: Durable Messaging.

*Now* How Much Would You Pay For This Code?

Via Larkware and InfoQ, I discovered a great post on code reuse by Dennis Forbes: Internal Code Reuse Considered Dangerous. I’ve written about reuse before, albeit in the context of services. But where I wrote about the impact of context on reuse (high context == low or no reuse), Dennis focused on the idea of accidental reuse. Here’s the money quote from Dennis:

Code reuse doesn’t happen by accident, or as an incidental – reusable code is designed and developed to be generalized reusable code. Code reuse as a by-product of project development, usually the way organizations attempt to pursue it, is almost always detrimental to both the project and anyone tasked with reusing the code in the future. [Emphasis in original]

I’ve seen many initiatives of varying “officialness” to identify and produce reusable code assets over the years, both inside and outside Microsoft. Dennis’ point that code has to be specifically designed to be reusable is right on the money. Accidental code (or service) reuse just doesn’t happen. Dennis goes so far as to describe such efforts as “almost always detrimental to both the project and anyone tasked with reusing the code in the future”.

One of the more recent code reuse efforts I’ve seen went so far as to identify a reusable asset lifecycle model. While it was fairly detailed about the lifecycle steps that came after said asset came into existence, it was maddeningly vague as to how these reusable assets got built in the first place. The lifecycle said that a reusable asset “comes into existence during the planning phases”. That’s EA-speak for “and then a miracle happens”.

Obviously, the hard part about reusable assets is designing and building them in the first place. So the fact that they skimped on this part of the lifecycle made it very clear they had no chance of success with the project. I shot back the following questions, but never got a response. If you are attempting such a reuse effort, I’d strongly advise answering these questions first:

  • How does a project know a given asset is reusable?
  • How does a project design a given asset to be reusable?
  • How do you incent (incentivize?) a project to invest the extra resources (time, people, money) it takes to generalize an asset to be reusable?

And to steal one from Dennis:

  • What, realistically, would competitors and new entrants in the field offer for a given reusable asset?

Carl Lewis wonders, “Is your code worthless?” As a reusable asset, probably yes.

Early Afternoon Coffee 105

  • My two sessions on Rome went very well. Sort of like what I did @ TechEd last month, but with a bit more kimono opening since it was an internal audience. The best thing about doing these types of talks is the questions and post-session conversation. I’ve missed that since moving over to MSIT.
  • Late last week, I got my phone switched over to the new Office Communications Server 2007 beta. In my old office, I used the Office Communicator PBX phone integration features extensively. However, when we moved we got new IP phones that didn’t integrate with Communicator. So when a chance to get on the beta came along, I jumped. I’ll let you know my impressions after a few weeks, in the meantime you can read about Mark Deakin’s experience.
  • Matevz Gacnik figures out how to build a transactional web service that interacts with the new transactional file system in Vista and Server 08. Interesting, but personally I don’t believe in using transactional web services. The whole point of service orientation is to reduce the coupling between services. Tying two services (technically, a service consumer and provider) together in an atomic transaction seems like going in the wrong direction. Still, good on Matevz for digging into the transactional file system.
  • Udi Dahan gives us 6 simple steps to being a “top” IT consultant. I notice that getting well known, speaking and publishing are at the top of the list, but actually being good at what you’re well known for comes in at #5. I’m sure Udi thinks that’s implicit in becoming a “top” consultant, but I’m not so sure.
  • Pat Helland thinks Normalization is for Sissies. Slide #6 has the key take away: “For God’s Sake, Don’t Normalize Immutable Data”.
  • Larry O’Brien bashes the new binary efficient XML working group and working draft. I agree 100% w/ Larry. These aren’t the droids we’re looking for.
  • John Evdemon points to a new e-book from my old team called SOA in the Real World. I flipped thru it (figuratively) and it appears to drill into the Foundations of Solution Architecture as well as provide real-world case studies for each of the pillars’ recurring logical capabilities. Need to give it a deeper read.