The Other Foundation Technology

I mentioned last week that WF “is one of two foundation technologies that my project absolutely depends on”. Sam Gentile assumes the other foundation technology is WCF. It’s not.

As a quick reminder, my day job these days is to architect and deliver shared service-oriented infrastructure for Microsoft’s IT division. These services will be automating long running business operations. And when I say long running, I mean days, weeks or longer. While there will surely be some atomic or stateless services, I expect most of the services we build will be long running. Thus, the infrastructure I’m responsible for has to enable and support long running services.

The other foundation technology my project depends on is Service Broker. Service Broker was expressly designed for building these types of long running services. It supports several capabilities that I consider absolutely critical for long running services:

  • Service Initiated Interaction. Polling for changes is inefficient. Long running operations need support for the Solicit-Response and/or Notification message exchange patterns.
  • Durable Messaging. The first fallacy of distributed computing is that the network is reliable. If you need to be 100% sure the message gets delivered, you have to write it to disk on both sides.
  • Service Instance Dehydration. It’s both dangerous and inefficient to keep an instance of a long running service in memory when it’s idle. In order to maximize integrity (i.e. service instances survive a system crash) as well as resource utilization (i.e. we’re not wasting memory/CPU/etc on idle service instances), service instances must be dehydrated to disk (see the sketch after this list).
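
For the dehydration bullet in particular, here’s a minimal sketch of the WF side of the story, using WF’s out-of-the-box SQL persistence service (the connection string and timeouts are placeholders, and error handling is omitted):

    using System;
    using System.Workflow.Runtime;
    using System.Workflow.Runtime.Hosting;

    class DehydrationHost
    {
        static void Main()
        {
            WorkflowRuntime runtime = new WorkflowRuntime();

            // unloadOnIdle = true tells the runtime to dehydrate idle
            // workflow instances to SQL Server instead of pinning them
            // in memory.
            runtime.AddService(new SqlWorkflowPersistenceService(
                "Server=.;Database=WorkflowStore;Integrated Security=SSPI",
                true,                       // unload instances when idle
                TimeSpan.FromMinutes(5),    // instance ownership duration
                TimeSpan.FromSeconds(10))); // polling interval for expired timers

            runtime.StartRuntime();
            // ... create and run workflows; idle instances survive a
            // crash because their state is on disk, not in memory.
        }
    }

Service Broker itself gets this for free: conversations live in the database, so an idle service instance is nothing more than rows on disk.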

In addition to these capabilities, Service Broker supports something called Conversation Group Locking, which turns out to be important when building highly scalable long running services. Furthermore, my understanding is that Conversation Group Locking is a feature unique to Service Broker, not only across Microsoft’s products but across the industry. Basically, it means that inbound messages for a specific long running service instance are locked so they can’t be processed on more than one thread at a time.

Here’s an example: let’s say I’m processing a Cancel Order message for a specific order when the Ready to Ship message for that order arrives. With Conversation Group Locking, the Ready to Ship message stays locked in the queue until the Cancel Order message transaction is complete, regardless of how many service threads there are. Without Conversation Group Locking, the Ready to Ship message might get processed by another service thread at the same time the Cancel Order message is being processed. The customer might get notified that the cancellation succeeded while the shipping service gets notified to ship the product. Oops.
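
To make that concrete, here’s a rough sketch of a receive loop (the queue name OrderQueue is invented; the column names come straight from Service Broker’s RECEIVE statement). The key point is that RECEIVE locks the conversation group for the life of the transaction, so a second thread running this same loop can’t touch messages for the same order until the first thread commits:

    using System;
    using System.Data.SqlClient;

    class OrderServiceWorker
    {
        const string ReceiveSql = @"
            WAITFOR (
                RECEIVE TOP(1)
                    conversation_group_id,
                    message_type_name,
                    message_body
                FROM OrderQueue
            ), TIMEOUT 5000;";

        static void ProcessNextMessage(string connectionString)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                conn.Open();

                // The transaction scopes the conversation group lock: while
                // it's open, no other thread can RECEIVE messages from the
                // same conversation group (i.e. the same order).
                using (SqlTransaction tx = conn.BeginTransaction())
                using (SqlCommand cmd = new SqlCommand(ReceiveSql, conn, tx))
                {
                    using (SqlDataReader reader = cmd.ExecuteReader())
                    {
                        if (reader.Read())
                        {
                            Guid groupId = reader.GetGuid(0);
                            string messageType = reader.GetString(1);
                            // ... dispatch on messageType: Cancel Order,
                            // Ready to Ship, etc.
                        }
                    }
                    tx.Commit(); // releases the conversation group lock
                }
            }
        }
    }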

There’s also an almost-natural fit between Service Broker and Windows Workflow. For example, a Service Broker Conversation Group and a WorkflowInstance are roughly analogous. They even both use a Guid for identification, making mapping between Conversation Group and WF Instance simple and direct. I was able to get prototype Service Broker / WF integration up and running in about a day. I’ll post more on that integration later this week.
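
As a preview, the heart of that mapping in sketch form is something like this (the WF queue name “incomingMessage” is my own convention, and I’m hand-waving the message pump, which looks like the receive loop above):

    using System;
    using System.Workflow.Runtime;

    class ServiceBrokerWorkflowBridge
    {
        readonly WorkflowRuntime runtime;

        public ServiceBrokerWorkflowBridge(WorkflowRuntime runtime)
        {
            this.runtime = runtime;
        }

        // Called after a message has been RECEIVEd from the queue.
        public void Dispatch(Guid conversationGroupId, byte[] messageBody)
        {
            // The conversation group Guid doubles as the workflow instance
            // id, so the mapping is a straight lookup. GetWorkflow will
            // rehydrate the instance from the persistence store if needed.
            // (For a brand new conversation, you'd call CreateWorkflow
            // with the same Guid instead.)
            WorkflowInstance instance = runtime.GetWorkflow(conversationGroupId);

            // Hand the message to the workflow via a WF queue.
            instance.EnqueueItem("incomingMessage", messageBody, null, null);
        }
    }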

Last but not least, Service Broker is wicked fast. Unfortunately, I don’t have any public benchmarks to point to, but the Service Broker team told me about a private customer benchmark that handled almost 9,000 messages per second! One of the reasons Service Broker is so fast is that it’s integrated into SQL Server 2005, which is pretty fast in its own right. Since Service Broker is baked right in, you can do all your messaging work and your data manipulation within the scope of a local transaction.
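
Here’s a sketch of what that buys you (the table, service, contract and message type names are all invented): the business data change and the outbound message commit or roll back together, with no distributed transaction coordinator in sight.

    using System;
    using System.Data.SqlClient;

    class OrderShipper
    {
        const string ShipSql = @"
            UPDATE Orders SET Status = 'Shipped' WHERE OrderId = @orderId;

            DECLARE @handle uniqueidentifier;
            BEGIN DIALOG CONVERSATION @handle
                FROM SERVICE [OrderService]
                TO SERVICE 'ShippingService'
                ON CONTRACT [OrderContract]
                WITH ENCRYPTION = OFF;
            SEND ON CONVERSATION @handle
                MESSAGE TYPE [ReadyToShip] (@body);";

        static void ShipOrder(string connectionString, int orderId, byte[] body)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (SqlTransaction tx = conn.BeginTransaction())
                using (SqlCommand cmd = new SqlCommand(ShipSql, conn, tx))
                {
                    cmd.Parameters.AddWithValue("@orderId", orderId);
                    cmd.Parameters.AddWithValue("@body", body);
                    cmd.ExecuteNonQuery();

                    // One local transaction covers both the data change and
                    // the durable message; they succeed or fail together.
                    tx.Commit();
                }
            }
        }
    }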

Service Broker has a few rough areas and it lacks a supported managed API (though there is a sample managed API available). Probably the biggest issue is that Service Broker has almost no interop story. If you need to interop with a Service Broker service, you can use SQL Server’s native Web Service support or the BizTalk adapter for Service Broker from AdapterWORX. However, I’m not sure how many of Service Broker’s native capabilities are exposed if you use these interop mechanisms. You would probably have to write a bunch of application code to make these capabilities work in an interop scenario.

Still, I feel Service Broker’s unique set of capabilities, its natural fit with WF and its high performance make it the best choice for building my project’s long running services. Is it the best choice for your project? I have no idea. One of the benefits of working for MSIT is that I get to focus on solving a specific problem and not on solving general problems. I would say that if you’re doing exclusively atomic or stateless services, Service Broker is probably overkill. If you’re doing any long running services at all, I would at least give Service Broker a serious look.

Thoughts on the SOA Workshop

Last week, I attended an SOA workshop presented by SOA Systems and delivered by “top-selling SOA author” Thomas Erl. It was two SOA-packed days, plus the drive to Vancouver and back spent primarily discussing SOA with Dale. In other words, it was a lot of SOA. I went up expecting to take Erl to task for his “Services are Stateless” principle. That turned out to be a misunderstanding on my part about how Erl uses the term stateless. But while Erl and I agreed on optimizing memory utilization (which is what he means by stateless), there wasn’t much else when it came to common ground. As I wrote last week, Erl’s vision of service-orientation is predicated on unrealistic organizational behavior and offers at best unproven promises of cost and time savings in the long run via black box reuse.

Erl spends a lot of time talking about service reuse. I think it’s safe to say, in Erl’s mind, reuse is the primary value of service orientation. However, he didn’t offer any reason to believe we can reuse services any more successfully than we were able to reuse objects. Furthermore, his predictions about the amount of reuse you can achieve are completely made up. At one point, he was giving actual reuse numbers (i.e. 35% new code, 65% existing code). When I asked him where those numbers came from, Erl admitted that they were “estimates” because “there hasn’t been enough activity in serious SOA projects to provide accurate metrics” and that there is “no short term way of proving” the amount of service reuse. In other words, Erl made those numbers up out of thin air.

This whole “serious” or “real” SOA is a major theme with Erl. On the one hand, I agree that SOA is a horribly overused term. Many projects labeled SOA have little or nothing to do with SO. On the other hand, it seems pretty convenient to chalk up failed projects as not being “real” SOA so you can continue to spout attractive yet completely fictional reuse numbers. I asked about Gartner’s 20% service reuse prediction, and Erl responded that the low reuse number was because the WS-* specs are still in process. While I agree that the WS-* specs are critical to the advancement of SO, I fail to see how the lack of security, reliable messaging and transactions is holding back reuse. If anything, I would expect those specs to impede reuse, as they add further contextual requirements to the service.

While I think Erl is mistaken when it comes to the potential for service reuse, he’s absolutely dreaming when it comes to the organizational structure and behavior that has to be in place for this potential service reuse to happen in the first place. I’m not sure what Erl was doing before he became a “top-selling SOA author,” but I find it hard to believe it included any time in any significantly sized IT shop.

Erl expects services – “real” services, anyway – to take around 30% more time and money than the traditional siloed approach. The upside for spending this extra time and money is the potential service reuse. The obvious problem with this is that we don’t know how much reuse we’re going to see for this extra time and money. If you spend 30% more but can only reuse 20% of your services (as Gartner predicts), is it worth it? If you end up spending 50% more but are only able to reuse 10% of your services, is it worth it? Where’s the line where it’s no longer worth it to do SOA? Given that there’s no real way to know how much reuse you’re going to see, Erl’s vision of SOA requires a huge leap of faith on the part of the implementer. “Huge leap of faith” doesn’t go so well with “corporate IT department”.
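
To put the break-even math in concrete terms, here’s a toy model (my numbers and assumptions, not Erl’s): say a “reusable” service costs 1.3C to build instead of C, and every service that actually gets reused avoids building a replacement at cost C. Across N services you spend an extra 0.3·N·C and save r·N·C, where r is the reuse rate, so you break even at r = 30%. That’s already above Gartner’s 20% prediction, and it ignores the cost of discovering, evaluating and integrating someone else’s service.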

Furthermore, the next IT project I encounter that is willing to invest any additional time and money – much less 30% – in order to achieve some theoretical organizational benefit down the road will be the first. Most projects I’ve encountered (including inside MSIT) sacrifice long-term time and money in return for short-term gain. When asked how to make this 30% investment happen, Erl suggested that the CIO has to have a “dictatorial” relationship with the projects in the IT shop. I’m thinking that CIOs who adopt a dictatorial stance won’t get much cooperation from the IT department and will soon be ex-CIOs.

In the end, I got a lot less out of this workshop than I was hoping to. As long as SO takes 30% more time and money and the primary benefit is the same retread promises of reuse that OO failed to deliver on, I have a hard time imagining SO making much headway.

PS – I have a barely used copy of “Service-Oriented Architecture: Concepts, Technology, and Design” if anyone wants to trade for it. It’s not a red paperclip, but it’s like new – only flipped through once. 😄

Stateless != Stateless

A while back, I blogged that Services Aren’t Stateless, in response to some stuff in Thomas Erl’s latest book. At the time, I mentioned that I was looking forward to discussing my opinions with Erl when I attended his workshop. I’ve spent the last two days at said workshop. I’ll have a full write-up on the workshop later this week, but I wanted to blog the resolution to this stateless issue right away.

At the time, I wrote “I assume Erl means that service should be stateless in the same way HTTP is stateless.” Turns out, my assumption was way, way wrong. When he addressed this topic in his workshop, he started by talking about dealing with concurrency and scalability, which got me confused at first. Apparently, when Erl says stateless, he’s referring to minimizing memory usage. That is, don’t keep service state in memory longer than you need to. So all the stuff about activity data is fine as per Erl’s principles, as long as you write it out to the database instead of keeping it in memory. In his book, he talks about the service being “temporarily stateful” while processing a message. When I read that, I didn’t get it – because I was thinking of the HTTP definition of stateless & stateful. But if we’re just talking about raw memory usage, it suddenly makes sense.
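
In code terms, Erl’s “temporarily stateful” boils down to something like this sketch (the table and column names are invented for illustration):

    using System;
    using System.Data.SqlClient;

    class StatelessOrderService
    {
        // "Stateless" in Erl's sense: the state lives in the database and
        // is only held in memory for the duration of one message.
        static void HandleMessage(string connectionString, Guid orderId,
                                  string message)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                conn.Open();

                // 1. Temporarily stateful: hydrate the activity data.
                string status;
                using (SqlCommand read = new SqlCommand(
                    "SELECT Status FROM OrderState WHERE OrderId = @id", conn))
                {
                    read.Parameters.AddWithValue("@id", orderId);
                    status = (string)read.ExecuteScalar();
                }

                // 2. Do the work while the state is in memory.
                string newStatus = ApplyMessage(status, message);

                // 3. Write the state back out and let the memory go.
                using (SqlCommand write = new SqlCommand(
                    "UPDATE OrderState SET Status = @status WHERE OrderId = @id",
                    conn))
                {
                    write.Parameters.AddWithValue("@status", newStatus);
                    write.Parameters.AddWithValue("@id", orderId);
                    write.ExecuteNonQuery();
                }
            } // nothing about this order remains in memory past this point
        }

        static string ApplyMessage(string status, string message)
        {
            // placeholder for the real state transition logic
            return message == "Cancel" ? "Cancelled" : status;
        }
    }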

On the one hand, I’m happy to agree 100% with Erl on another of his principles. Frankly, much of what he talked about in his workshop seems predicated on unrealistic organizational behavior and offers at best unproven promises of cost and time savings in the long run via black box reuse. So to be in complete agreement with him on something was a nice change of pace. Thomas is much more interested in long-running and async services than I originally expected when I first flipped thru his book.

On the other hand, calling this out as a “principle of service orientation” hardly seems warranted. I mean, large scale web sites have been doing this for a long time and SQL Session State support has been a part of ASP.NET since v1. Furthermore, using the term “stateless” in this way is fundamentally different from the way HTTP and the industry at large uses it, which was the source of my confusion. So while I agree with the concept, I really wish Erl hadn’t chosen an overloaded term to refer to it.

Feasible Service Reuse

Yesterday, I posted about services and reuse. More to the point, I posted why I don’t believe that business services will be reusable, any more than business objects were reusable. However, “can’t reuse business services” isn’t the whole story, because I believe in different kinds of reuse.

The kind of reuse I was writing about yesterday is typically referred to as “black box reuse”. The idea being that I can reuse the item (object or service) with little or no understanding of how it works. Thomas Beck wrote about colored boxes on his blog yesterday: context impacts reuse, in that the environments in which you plan to reuse an item must be compatible with what the item expects. However, those contextual requirements aren’t written down anywhere – at least, they’re not encoded anywhere in the item’s interface. Those contextual requirements are buried in the code – the code you’re not supposed to look at because we’re striving for black box reuse. Opaque Requirements == No Possibility of Reuse.

As I wrote yesterday, David Chappell tears this type of reuse apart fairly adeptly. Money quote: “Creating services that can be reused requires predicting the future”. But black box reuse isn’t the only kind of reuse. It’s attractive, since it’s simple. At least it would be, if it actually worked. So what kind of reuse doesn’t require predicting the future?

Refactoring.

I realize most people probably don’t consider refactoring to be reuse. But let’s take a look at the official definition from refactoring.com:

Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior. Its heart is a series of small behavior preserving transformations. Each transformation (called a ‘refactoring’) does little, but a sequence of transformations can produce a significant restructuring. Since each refactoring is small, it’s less likely to go wrong. The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.

Two things about this definition imply reuse. First, refactoring is “restructuring an existing body of code”. It’s not rewriting that existing body of code. You may be making changes to the code – this certainly isn’t black box reuse – but you’re not scrapping the code completely and starting over. Second, refactoring is making changes to the code “without changing its external behavior”. You care about the code’s external behavior because somewhere, some other code is calling the code you’re refactoring. Some other existing piece of code that you don’t want to change – i.e. that you want to reuse.
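
Here’s a trivial example of a behavior-preserving transformation (my example, not refactoring.com’s). The Extract Method refactoring restructures Total’s internals, but every existing caller of Total is reused untouched:

    using System.Collections.Generic;

    class OrderLine { public int Quantity; public decimal UnitPrice; }
    class Order { public List<OrderLine> Lines = new List<OrderLine>(); }

    class OrderCalculator
    {
        // Before: the line math was buried inline.
        //
        //   decimal total = 0;
        //   foreach (OrderLine line in order.Lines)
        //       total += line.Quantity * line.UnitPrice;
        //   return total;

        // After Extract Method: same external behavior, so callers of
        // Total don't change, i.e. they're reused.
        public decimal Total(Order order)
        {
            decimal total = 0;
            foreach (OrderLine line in order.Lines)
                total += LineTotal(line);
            return total;
        }

        decimal LineTotal(OrderLine line)
        {
            return line.Quantity * line.UnitPrice;
        }
    }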

When you refactor, you still reuse a significant amount of the code, but you’re not having to predict the future to do it. Refactoring is the kind of reuse I believe in.

In his article, David talks about types of reuse such as business agility, adaptability and easily changeable orchestration. These look a lot more like refactoring than black box reuse to me. Unfortunately, David waves these away, saying “Still, isn’t this just another form of reuse?”. Reconfiguration hardly qualifies as the “predict the future” style of reuse that he spends the rest of the article arguing against. It’s just one paragraph in an otherwise splendid article, so I’ll give him a pass this time. (I’m sure he’s relieved.)

A Question of Context

A couple of weeks ago, David Chappell posted a great article on SOA and the Reality of Reuse. When someone mentions the idea of using SOA for reuse, I cringe. David does a great job blowing the “SOA for Reuse” argument out of the water. In the future, I will just send that link rather than spending the time arguing the point.

But something nagged at the back of my brain about that post. David starts by talking about object reuse before making the parallel to services. The problem with that comparison is that object reuse hasn’t been a failure. When was the last time you wrote a String class? A Linked List? A Button? There’s been support for Buttons in Windows since at least the 3.x days (probably longer, but that’s before my time). Whatever your OO language of choice, there’s a big collection of reusable objects to go with it.

Given his position as “famous technology author”, I’m assuming David is well aware of the successes of object reuse. Furthermore, I doubt it was an accident that in his article he writes that “reuse of business objects failed” (emphasis added). While there has been success around object reuse, essentially none of those successes have been in a business scenario. In fact, there have been some high profile projects, such as Microsoft’s Business Framework and IBM’s San Francisco Project, that have been significantly less than successful.

So here’s the question: given that general object reuse has seen some success, what’s so different about business objects that causes reuse to fail utterly? Since we’re really interested in service reuse, knowing why some object reuse succeeds and other reuse fails will help us understand which services are likely to be reusable and which won’t. I would say that the success of object reuse hinges on context.

Wikipedia gives this definition of context: “The context of an event, word, paradigm, change or other reality includes the circumstances and conditions which surround it.” (emphasis in original) For example, the word “order” is ambiguous. If you’re using a procurement system for the military, you could conceivably be given an order to place an order. (OK, that’s silly. But you get the idea.) The word “order” has two different meanings. However, the words that surround the ambiguous term make the meaning clear. An order that you place is different than an order that you give. That’s context.

A string or a linked list or a button has very little in the way of contextual needs. That is to say you can use it the exact same way in a wide variety of environments. A business object on the other hand has significant contextual requirements, which makes reuse difficult or impossible. For example, a Purchase Order object from the above-mentioned military procurement system sounds like it might be reusable. At least until you take into account the differences between branches of the military, between ordering tanks and ordering uniforms, between active units and reserve units, etc. Once the generic Purchase Order has been specialized to the point of practical usability for a given scenario, it’s no longer reusable in the general case.
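
Here’s the specialization problem in code form, as a contrived sketch rather than anything from a real procurement system:

    // Watch the context accumulate inside a "generic" business object.
    enum ServiceBranch { Army, Navy, AirForce }
    enum UnitType { Active, Reserve }

    class PurchaseOrder
    {
        public ServiceBranch Branch;
        public UnitType Unit;
        public string Item;

        public void Submit()
        {
            // None of these rules show up in the interface - Submit()
            // still looks perfectly generic - but each one welds the
            // object a little more firmly to a single context.
            if (Branch == ServiceBranch.Army && Item == "Uniform")
                RouteToQuartermaster();
            if (Unit == UnitType.Reserve)
                RequireReserveSignOff();
            if (Item == "Tank")
                RouteToAppropriations();
        }

        void RouteToQuartermaster() { /* branch-specific routing */ }
        void RequireReserveSignOff() { /* reserve-unit approval */ }
        void RouteToAppropriations() { /* big-ticket process */ }
    }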

Taking this back to the service realm, I figure the reusable services will likewise be the ones with little or no contextual needs. A good example is the identity and directory services such as Active Directory and its competitors. Sure, you use LDAP rather than SOAP to access it, but AD is certainly both reusable and a service, and it’s in wide use. Other candidates for reusable services my team is looking at are service directory, management and operations, business activity monitoring and provisioning.

I actually think there will be less reuse of services than there was of objects. For a service to be worth reusing, the value has to outweigh not only the contextual issues but also the overhead of distributed access. Calling across the network is an expensive operation – whatever’s on the other side had better be worth the drive. I’m guessing that for services, more often than not, reuse won’t be worth the trip (if it’s possible at all).

Update: David pointed out to me that the last paragraph of his article begins:

Object reuse has been successful in some areas. The large class libraries in J2EE and the .NET Framework are perhaps the most important examples of this.

Doh! I guess my “assumption” that David is aware of successful object reuse was correct.