The One Business Case for Integration

Nick Malik lays out what he thinks are the four business cases for integration:

Assume we succeeded, and our systems are all optimally integrated.  What has changed?

  1. We have better business intelligence.  We have better understanding of our customers, our partners, our products, and our business.  And from that understanding, we make better decisions.  Those decisions are made in a federated manner using self-apparent information.
  2. We have end-to-end business processes that cross multiple systems, multiple roles, multiple geographies, and multiple data stores, all aware of and supporting the needs of the customer.
  3. We have end-to-end awareness of the metrics that drive both dissatisfaction and cost, and we can take that knowledge and apply it to making our business better.
  4. We have a more efficient enterprise, more able to grow to a larger size, at an accelerated rate, and still respond with agility to changing business opportunities.

I put to you that, in fact, we only have one business case for integration: better business intelligence. The other reasons Nick lists are either redundant or not as important to the business – at least in the general case – as you might think.

First off, #3 from Nick’s list sounds suspiciously like #1. If there’s a difference between “better understanding driving better decisions” and “applying awareness of metrics to making our business better”, I don’t know what it is. We’ll send one of them off to the Dept. of Redundancy Dept. and be done with it.

Second, I don’t think the business cares that IT has multiple systems or multiple data stores. If the business could run on one big centralized system that could meet the needs of the customer (aka the ERP fantasy), they’d be fine with that. The fact that the realities of modern enterprise IT require splitting capabilities across many systems is an implementation detail that frankly isn’t a concern of the business.

Besides, what’s the business benefit here? News flash: the enterprise already has end-to-end business processes that cross multiple systems, multiple roles, blah blah blah. They’re just not automated end-to-end. Does the business care that they’re not automated? Not a bit. Sure, they care that processes are slow, costly and error-prone, which manual processes tend to be. But it’s those negative characteristics that the business cares about, not integration. Besides, making processes quick, cheap and error-free sounds a lot like making them efficient. In other words, more work for the Dept. of Redundancy Dept.

Finally, I don’t think efficiency and agility are as important to the enterprise as Nick makes them out to be. I mean, the enterprise will say it cares about efficiency – especially in front of the stockholders. But when it comes to putting its money where its mouth is, the enterprise doesn’t, more often than not. Think about how success is measured in the typical IT project. Is efficiency one of the criteria for judging success? Not really. Will your project stakeholders let you run over budget and ship a few months late, just to improve efficiency? Probably not, unless that efficiency gain is both demonstrable and dramatic.

Of course, there are certainly specific examples where an automation or efficiency business case for integration can be made. For example, if replacing a specific manual process with an automated one has a large and measurable ROI, the business will likely be interested in making that investment. If you have a certain process that you do over and over that’s core to the business, the business will probably be interested in optimizing the frak out of it. I would guess a delivery company like UPS or FedEx has spent a lot of time and money on optimizing their delivery processes.

But what it sounds like Nick’s talking about here is making a general case for making all our systems “optimally integrated”. Given that our current systems aren’t, this would take significant time and money. Yet the tangible benefit to the business is at best nebulous. Nick thinks improved integration will allow the business to “respond with agility to changing business opportunities.” He’s probably right. But how do you quantify this agility? How much will we save in the future for what we’re spending today? For the general case, the answer is “it depends”. It’s really hard to fund a project when its projected ROI is “it depends”.

However, business intelligence is a no-brainer for the enterprise to invest in. Giving decision makers better and more up-to-date information is a tangible benefit that the organization can quantify now. If you can quantify the value of a project, you’ve got the start of a budget. Of course, all that juicy data is smeared across a variety of systems, which means integration. Again, the enterprise doesn’t really care about said multiple systems or integration, but they care about the outcome.

Nick recommends to SOA folks that “if you aren’t already working with your BI team, pick up the phone. Their mature processes and practices are able to address many of your issues, and the natural synergy between BI and SOA can make them a strong ally in the fight for a better, faster, cheaper, and more intelligent enterprise.” Good advice. Otherwise, selling integration to the business isn’t much different than selling them SOA. In other words, don’t sell it – just do it.

Retire the Tenets

John Heintz and I continue to be in mostly violent agreement. It’s kinda like me saying “You da architect! Look at my massive scale EAI Mashup!” and having him respond “No, you da architect! The SOA tenets drive me bonkers!” Makes you wonder what would happen after a few beers. What’s the architect version of Tastes Great, Less Filling? ^[Not that you would catch me drinking Miller Lite. Ever.]

Speaking of the tenets, John gives them a good shredding:

Tenet 1: Boundaries are Explicit
(Sure, but isn’t everything? Ok, so SQL based integration strategies don’t fall into this category. How do I build a good boundary? What will version better? What has a lower barrier to mashup/integration?)

Tenet 2: Services are Autonomous
(Right. This is a great goal, but provides no guidance or boundaries to achieve it.)

Tenet 3: Services share schema and contract, not class
(So do all of my OO programs with interface and classes. What is different from OO design that makes SOA something else?)

Tenet 4: Service compatibility is based upon policy
(This is a good start: the types and scope of policy can shape an architecture. The policies are the constraints in a system. They’re not really defined though, just a statement that they should be there.)

Ah, I feel better getting that out.

As John points out, the four tenets aren’t particularly useful as guidance. They’re too high level (like Mt. Rainier high) to be really actionable. They’re like knowing a pattern’s name but not understanding how and when to use the actual pattern. However, I don’t think the tenets were ever intended to be guidance. Instead, they were used to shift the conversation on how to build distributed applications just as Microsoft was introducing the new distributed application stack @ PDC03.

John’s response to the first tenet makes it sound like having explicit boundaries is obvious. And today, maybe it is. But back in 2003, mainstream platforms typically used a distributed object approach to building distributed apps. Distributed objects were widely implemented and fairly well understood. You created an object like normal, but the underlying platform would create the actual object on a remote machine. You’d call functions on your local proxy and the platform would marshal the call across the network to the real object. The network hop would still be there, but the platform abstracted away the mechanics of making it. Examples of distributed object platforms include CORBA via IOR, Java RMI, COM via DCOM and .NET Remoting.

The (now well documented and understood) problem with this approach is that distributed objects can’t be designed like other objects. For performance reasons, distributed objects have to have what Martin Fowler called a “coarse-grained interface”, a design which sacrifices flexibility and extensibility in return for minimizing the number of cross-network calls. Because the network overhead can’t be abstracted away, distributed objects are a very leaky abstraction.
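To make the chatty-vs-coarse-grained point concrete, here’s a minimal sketch – in Python rather than any of the platforms above, with the network faked by a call counter, and the Customer names invented purely for illustration. The object-like interface racks up a round trip per accessor, while the coarse-grained one gets everything in a single call:

```python
# Hypothetical sketch: why remote interfaces end up coarse-grained.
# The "network" is faked with a counter so the chattiness is visible.

remote_calls = 0

def remote(fn):
    """Pretend each call crosses the network; count the round trips."""
    def wrapper(*args, **kwargs):
        global remote_calls
        remote_calls += 1
        return fn(*args, **kwargs)
    return wrapper

class CustomerProxy:
    # Fine-grained, object-like interface: natural to code against,
    # but every accessor is a separate network round trip.
    @remote
    def get_name(self): return "Contoso"
    @remote
    def get_city(self): return "Redmond"
    @remote
    def get_balance(self): return 42.0

class CustomerServiceProxy:
    # Coarse-grained interface: one round trip returns a data transfer
    # object with everything the caller needs, trading flexibility and
    # extensibility for fewer cross-network calls.
    @remote
    def get_customer_summary(self, customer_id):
        return {"name": "Contoso", "city": "Redmond", "balance": 42.0}

c = CustomerProxy()
_ = (c.get_name(), c.get_city(), c.get_balance())    # 3 round trips
s = CustomerServiceProxy().get_customer_summary(1)   # 1 round trip
print(remote_calls)  # prints 4: three chatty calls plus one coarse call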

So in 2003, the Indigo folks came along and basically said “You know the distributed object paradigm? The one we’ve been shipping in our platform since 1996? Yeah, turns out we think that’s the wrong approach.” Go back and check out this interview with Don Box from early 2004. The interviewer asks Don if WCF will “declare the death of distributed objects”. Don hems and haws at first, saying “that’s probably too strong of a statement”, but then later says that the “contract, protocol, messaging oriented style will win out” over distributed objects because of natural selection.

The tenets, IMHO, were really designed to help the Windows developer community wrap their heads around some of the implications of messaging and service orientation. These ideas weren’t really new – the four tenets apply to EDI, which has been around for decades. But for a generation of Windows developers who had cut their teeth on DCOM, MTS and VB, it was a significant paradigm shift.

These days, with the tenets going on four years old, the conversation has shifted. Platform vendors are falling over themselves to ship service/messaging stacks like WCF and most developers are looking to these stacks for the next systems they build. Did the tenets do that? In part, I think. Mainstream adoption of RSS was probably the single biggest driver of this paradigm shift, but the tenets certainly helped. Either way, now that service orientation is mainstream, I would say that the tenets’ job is done and it’s time to retire them. Once you accept the service-oriented paradigm, what further guidance do the tenets provide? Not much, if any.

Where Have All the SOA Mashups Gone?

John Heintz responded to my serendipitous reuse post. Nice to see I misunderstood his opinions about how easy RESTful systems are to integrate:

I didn’t mean to imply that building RESTful system would lead to magical integration without any hard work. I can see how that came across in my post, and I guess I got the reaction I asked for 😉

I get the feeling that John would be a good guy to have a beer with.

John spends most of his post writing about the SOA in the Real World book. I’ve flipped thru it and I’m familiar with the model (it is my old team, after all), but I haven’t read it, so I don’t really want to comment on the book specifically. But there were two things John mentioned that I did want to comment on.

First, at the end of his post John writes:

Can some of the constraints of REST be applied to SOA? Absolutely. I think an asynchronous, message-passing architecture with a uniform interface would be astoundingly interesting! I’m not the only one: see MEST, AMQP, and Erlang.

This goes back to a REST question I asked two months ago: is it still REST if you don’t use HTTP? I’m guessing John would say yes.

I might be going out on a limb here, but I’ll bet the core of John’s problem with SOA is how toolkits like WCF all but force you to build RPC-style services that can easily be modeled as method calls. That’s certainly one of my problems with SOA. Tim Ewald said it best:

It’s depressing to think that SOAP started just about 10 years ago and that now that everything is said and done, we built RPC again. I know SOAP is really an XML messaging protocol, you can do oneway async stuff, etc, etc, but let’s face it. The tools make the technology and the tools (and the examples and the advice you get) point at RPC. And we know what the problems with RPC are. If you want to build something that is genuinely loosely-coupled, RPC is a pretty hard path to take.

If SOA == RPC and REST == loosely coupled messages, then I’ll start growing dreadlocks right now. Frankly, as Tim says, I think it’s a problem with the tools (I’m looking at you, WCF) and not the underlying architecture, but how many people can distinguish the architecture from the tools? Not many, I’m afraid.
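To show what I mean by the tools pointing at RPC, here’s a hypothetical sketch – Python instead of WCF, all names invented – of the same operation done RPC style and done as a one-way message. The RPC version couples the caller to the callee’s signature and timing; the message version just hands off a self-describing message and moves on:

```python
import queue

# RPC style: the caller invokes a method and blocks for the answer.
class OrderServiceRpc:
    def place_order(self, sku: str, qty: int) -> str:
        return f"order for {qty} x {sku} accepted"

# Message style: the caller drops a self-describing message on a queue
# and moves on; the service processes it whenever it runs its pump.
inbox: "queue.Queue[dict]" = queue.Queue()

def send(msg: dict) -> None:
    inbox.put(msg)  # one-way; nothing to block on

def order_service_pump() -> None:
    while not inbox.empty():
        msg = inbox.get()
        if msg["action"] == "place_order":
            print(f"processing order for {msg['qty']} x {msg['sku']}")

print(OrderServiceRpc().place_order("widget", 3))           # synchronous
send({"action": "place_order", "sku": "widget", "qty": 3})  # fire and forget
order_service_pump()
```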

Second, John asks an interesting question:

Where are the SOA mashups?

That’s easy! They’re inside the firewall where you can’t see them! 😉

Seriously, I’m not sure about “SOA” mashups, but I’m working with what you might call a huge “enterprise” mashup system inside Microsoft. Our Enterprise Data Integration Services push around massive amounts of data to downstream systems. There are over fifty datasets in production, each with scores of tables, millions of rows and hundreds of subscribing systems. One example, our Products dataset, has over 100 tables and nearly 300 subscribing systems.

Is it “service oriented”? No, but then again it was originally developed ten years ago on SQL 6.5. But is it a mashup? Is it an “application that combines content from more than one source into an integrated experience”? Yep. Is it easy to work with? No, but guess why I’m involved? We’re looking at ways to “modernize” the system. Am I going to build RPC-style services as part of this modernization? Hell, no.

So John, am I right or wrong about that beer?

Is Serendipity the Heart of the WS-*/REST Debate?

Thanks to Technorati, I found this post by John Heintz. He’s checking out John Evdemon’s e-book on SOA and has a problem with this overview:

SOA is an architectural approach to creating systems built from autonomous services. With SOA, integration becomes forethought rather than afterthought. This book introduces a set of architectural capabilities, and explores them in subsequent chapters.

To which John H. responds:

I, for one, would rather build on an architecture that promotes integration as an afterthought, so I don’t have to think about it before hand!!!

Yeah, I’d rather not have to think about integration beforehand either. On the other hand, I want integration that actually works. It sounds like John H. is suggesting here that REST somehow eliminates the need to consider integration up front. It doesn’t. Consider this: if you’re building a Web 2.0 site, you are expected to expose everything in your site via APP, RSS and/or RESTful POX services. In other words, the Web 2.0 community expects you to have the forethought to enable integration. If you don’t, Marc Canter will call you out in front of Bill Gates and Tim O’Reilly.

This integration-by-afterthought approach seems to be big among RESTifarians. John H. links to a REST discussion post by Nick Gall advocating the principle of generality, “unexpected reuse” and “design for serendipity”. Money quote:

The Internet and the Web are paradigms of Serendipity-Oriented Architectures. Why? Largely because of their simple generality. It is my belief that generality is one of the major enablers of serendipity. So here I immodestly offer Gall’s General Principle of Serendipity: “Just as generality of knowledge is the key to serendipitous discovery, generality of purpose is the key to serendipitous (re)use.”

Serendipity means “the accidental discovery of something pleasant, valuable, or useful”. “Serendipitous reuse” sounds an awful lot like accidental reuse. Most enterprises have been there, done that and have nothing to show for their efforts or $$$ except the team t-shirt. Yet Tim Berners-Lee believes “Unexpected reuse is the value of the web” and Roy Fielding tells us to “Engineer for serendipity”. What gives?

First off, enterprises aren’t interested in unexpected or serendipitous reuse. They want their reuse to be systematic and predictable. The likelihood of serendipitous reuse is directly related to the number of potential reusers. But the number of potential reusers inside the enterprise is dramatically smaller than out on the public Internet. That brings the chance for serendipitous reuse inside the enterprise to nearly zero.
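A back-of-the-envelope way to see it (numbers invented purely for illustration): if each potential reuser independently has some small chance of stumbling onto your service and adopting it, the expected number of serendipitous reuses scales linearly with the size of the audience – and enterprise audiences are tiny:

```python
# Invented numbers, purely illustrative: expected serendipitous reuses
# if each potential reuser has a 0.1% chance of finding and adopting
# your service on their own.
p_adopt = 0.001
for audience, label in ((200, "inside the enterprise"),
                        (2_000_000, "on the public Internet")):
    print(f"{label}: ~{audience * p_adopt:.1f} expected reuses")
# inside the enterprise: ~0.2 expected reuses
# on the public Internet: ~2000.0 expected reuses
```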

Second, enterprise systems aren’t exactly known for their “simple generality”. If Nick’s right that “generality of purpose is the key to serendipitous (re)use”, then enterprises might as well give up on serendipitous reuse right now. As I said last year, it’s a question of context. Context is specifics, the opposite of generality. Different business units have different business practices, different geographies have different laws, different markets have different competitors, etc. If an enterprise operates in multiple contexts – and most do – enterprise solutions have to take them into account. Those different contexts prevent you from building usable – much less reusable – general solutions.

Finally, I think the amount of serendipitous reuse in REST is overstated. If you build an app on the Facebook Platform, can you use it on MySpace? Nope. If you build an app that uses the Flickr services, will it work with Picasa Web Albums? Nope. Of course, there are exceptions – pretty much everyone supports the MetaWeblog API – but those exceptions seem few and far between to me. Furthermore, the bits that are getting reused – such as identifier, format and protocol – are infrastructure capabilities more suitable to reuse anyway. Serendipitously reusing infrastructure capabilities is much easier than serendipitously reusing business capabilities, REST or not.

The problems that stand in the way of reuse aren’t technology ones. Furthermore, the reuse problems faced by enterprises are very different from the ones faced by Web 2.0 companies. REST is a great approach, but it isn’t a one-size-fits-all technology solution that magically relegates integration and reuse to “afterthought” status. Serendipity is nice, when it happens. However, by definition it’s not something you can depend on.

The Durable Messaging Debate Continues

Last week, Nick Malik responded to Libor Soucek’s advice to avoid durable messaging. Nick points out that while both durable and non-durable messaging require some type of compensation logic (nothing is 100% foolproof because fools are so ingenious), the durable messaging compensation logic is significantly simpler.

This led to a very long conversation over on Libor’s blog. Libor started by clarifying his original point, and then the two of them went back and forth chatting in the comments. It’s been very respectful; Libor calls both Nick and me “clever and influential”, though he also thinks we’re wrong on this durable messaging thing. In my private emails with Libor, he’s been equally respectful and his opinion is very well thought out, though obviously I think he’s the one who’s wrong. 😄

I’m not sure how much is clear from Libor’s public posts, but it looks like most of his recent experience comes from building trading exchanges. According to his about page, he’s been building electronic trading systems since 2002. While I have very little experience in that domain, I can see very clearly how the highly redundant, reliable multicast approach that he describes would be a very good, if not the best, solution.

But there is no system inside Microsoft IT that looks even remotely like a trading exchange. Furthermore, I don’t think approaches for building a trading exchange generalize well. So that means Nick and I have very different priorities than Libor, something that seems to materialize as a significant amount of talking past each other. As much as I respect Libor, I can’t shake the feeling that he doesn’t “get” my priorities and I wouldn’t be at all surprised if he felt the same way about me.

The biggest problem with his highly redundant approach is the sheer cost when you consider the large number of systems involved. According to Nick, MSIT has “over 2700 applications in 82 solution domains”. When you consider taking a highly redundant approach across that many applications, the cost gets out of control very quickly. Nick estimates that the support staff cost alone for tripling our hardware infrastructure to make it highly redundant would be around half a billion dollars a year. And that doesn’t include hardware acquisition costs, electricity costs, real-estate costs (gotta put all those servers somewhere) or any other costs. The impact to Microsoft’s bottom line would be enormous, for what Nick calls “negligible or invisible” benefit.

There’s no question that high availability costs big money. I just asked Dale about it, and he said that in his opinion going above 99.9% availability increases costs “nearly exponentially”. He estimates just going from 99% to 99.9% doubles the cost. 99% availability is almost 15 minutes of downtime per day (on average). 99.9% is about 90 seconds downtime per day (again, on average).
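The downtime arithmetic is easy to check – the daily downtime budget is just (1 − availability) × 24 hours. Here’s a quick sketch, using the availability figures from Dale’s estimates above:

```python
# Daily downtime budget for a given availability percentage.
MINUTES_PER_DAY = 24 * 60

for availability in (0.99, 0.999):
    downtime = (1 - availability) * MINUTES_PER_DAY
    print(f"{availability:.1%} available -> {downtime:.1f} min/day down")

# 99.0% available -> 14.4 min/day down  (the "almost 15 minutes" above)
# 99.9% available -> 1.4 min/day down   (about 86 seconds, call it 90)
```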

How much is that 13 extra minutes of uptime per day worth? I would say “depends on the application”. How many of the 2700 applications Nick mentions need even 99% availability? Certainly some do, but I would guess that less than 10% of those systems need better than 99% availability. What pretty much all of them actually need is high reliability, which is to say they need to work even in the face of “hostile or unexpected circumstances” (like system failures and downtime).

High availability implies high reliability. However, the reverse is not true. You can build systems to gracefully handle failures without the cost overhead of highly redundant infrastructure intended to avoid failures. Personally, I think the best way to build such highly reliable yet not highly available systems is to use durable messaging, though I’m sure there are other ways.
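To illustrate what I mean, here’s a toy durable messaging sketch – Python with SQLite standing in for a real message store like MSMQ, all names invented. The point is that the message hits disk before the sender moves on, and is only marked done after the handler succeeds, so a crashed consumer just picks up where it left off – which is exactly why the compensation logic Nick mentions stays simple:

```python
import sqlite3

# Toy durable queue: SQLite stands in for a real message store. A message
# is committed to disk before send() returns, and it's only marked done
# after the handler succeeds, so crashes on either side just mean a retry.

class DurableQueue:
    def __init__(self, path: str = "messages.db"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS msgs "
            "(id INTEGER PRIMARY KEY, body TEXT, done INTEGER DEFAULT 0)")
        self.db.commit()

    def send(self, body: str) -> None:
        self.db.execute("INSERT INTO msgs (body) VALUES (?)", (body,))
        self.db.commit()  # durable once this returns

    def process_pending(self, handler) -> None:
        pending = self.db.execute(
            "SELECT id, body FROM msgs WHERE done = 0").fetchall()
        for msg_id, body in pending:
            handler(body)  # if we die here, the message stays pending
            self.db.execute(
                "UPDATE msgs SET done = 1 WHERE id = ?", (msg_id,))
            self.db.commit()

q = DurableQueue()
q.send("update inventory for order 1234")
q.process_pending(lambda body: print("handled:", body))
```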

This is probably the biggest difference between Libor and me. I am actively looking to trade away availability (not reliability) in return for lowering the cost of building and running a system. To someone who builds electronic trading systems like Libor, that probably sounds completely wrongheaded. But an electronic trading system would fall into the minority of systems that need high availability (ultra-high, five-nines availability in this case). For the systems that actually do need high availability, you have to invest in redundancy to get it. But for the rest of the systems, there’s a less costly way to get the reliability you need: durable messaging.