Custom Authentication with WCF is Top Shelf

I’ve spent the last three days heads down in WCF security, and color me massively impressed. I just checked in a prototype that provides customized authentication for a business service. The idea that you can bang out a custom authentication service fairly easily blows my mind.

The cornerstone of this support in WCF is the standard WSFederationHttpBinding. While the binding name implies support for WS-Federation, which in turn implies infrastructure like Active Directory Federation Services, the binding also scales down to simple federation scenarios with a single Security Token Service (aka STS) as defined by WS-Trust. WS-Trust looks a lot like Kerberos: if you want to access a service using the federation binding, you first obtain a security token from the associated STS. Tokens contain SAML assertions, which can be standard – such as Name and Windows SID – or entirely custom, which opens up very interesting and flexible security scenarios.
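
To make that concrete, here’s roughly what it looks like to point the federation binding at a single STS from code – a minimal sketch, where the STS and service addresses and the IBookstore contract are placeholders of mine, not from the sample:

    using System;
    using System.ServiceModel;

    // client-side federation binding pointed at a single STS
    WSFederationHttpBinding binding =
        new WSFederationHttpBinding(WSFederationHttpSecurityMode.Message);

    // the kind of token the STS issues - SAML 1.1 assertions here
    binding.Security.Message.IssuedTokenType =
        "http://docs.oasis-open.org/wss/oasis-wss-saml-token-profile-1.1#SAMLV1.1";

    // where and how to reach the STS (placeholder address)
    binding.Security.Message.IssuerAddress =
        new EndpointAddress("http://localhost:8000/sts");
    binding.Security.Message.IssuerBinding = new WSHttpBinding();

    // WCF obtains a token from the STS automatically before calling the service
    ChannelFactory<IBookstore> factory = new ChannelFactory<IBookstore>(
        binding, new EndpointAddress("http://localhost:8080/bookstore"));
    IBookstore proxy = factory.CreateChannel();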

If you want to support multiple authentication systems (Windows, certificates, CardSpace, Passport/Windows Live ID, etc.), an STS is perfect because you can centralize the multiple authentication schemes at the STS, which then hands out a standard token the business service understands. Adding a new auth scheme happens centrally at the STS rather than in each and every service. Support for multiple authentication schemes was the focus of our current prototype, and it worked extremely well.

WCF includes a federation sample which is where you should start if you’re interested in this stuff. That scenario includes a chain of two STS’s. Accessing the secure bookstore service requires authenticating against the bookstore STS which in turn requires authenticating against a generic “HomeRealm” STS. Since there are two STS’s, they factored the common STS code into a shared assembly. You can use that common code to build an STS of your own.

For our prototype, we made only minor changes to the common STS code from the sample. In fact, the only significant change we made was to support programmatic selection of the proof key encryption token. In the sample, both the issuer token and the proof key encryption token are hard coded (passed into the base class constructor). The issuer token is used to sign the custom security token so the target service knows it came from the STS. The encryption token is used to – you guessed it – encrypt the token so it can only be used by the target service. Hard-coding the encryption token means you can only use your STS with a single target service. We changed that so the encryption token can be chosen based on the incoming service token request.
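
Here’s the shape of that change as a sketch. The method name and the certificate lookup convention are mine – the sample’s actual base class hooks differ – but the idea is to resolve the encryption certificate from the incoming request’s AppliesTo instead of taking a single token in the constructor:

    using System;
    using System.IdentityModel.Tokens;
    using System.Security.Cryptography.X509Certificates;

    // hypothetical hook: pick the proof key encryption token per request,
    // based on the target service named in the incoming RST's AppliesTo
    X509SecurityToken GetScopeEncryptionToken(string appliesTo)
    {
        // mapping the AppliesTo host to a certificate subject name is
        // a convention of ours, not something the sample prescribes
        X509Store store = new X509Store(
            StoreName.TrustedPeople, StoreLocation.LocalMachine);
        store.Open(OpenFlags.ReadOnly);
        try
        {
            X509Certificate2Collection matches = store.Certificates.Find(
                X509FindType.FindBySubjectName, new Uri(appliesTo).Host, true);
            if (matches.Count == 0)
                throw new SecurityTokenException(
                    "No encryption certificate for " + appliesTo);
            return new X509SecurityToken(matches[0]);
        }
        finally
        {
            store.Close();
        }
    }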

Of course, it wasn’t all puppy dogs and ice cream. While I like the config system of WCF, anyone who calls it “easy” is full of it. I’ve spent most of the last three days looking at config files. The funny thing about config files is that they’re hard to debug, so most of my effort over the last few days has been a cycle of run app / app throws exception / tweak config / repeat. Ugh.

Also, while the federation sample is comprehensive, I wonder why this functionality isn’t in the base WCF platform. For example, the sample includes implementations of RequestSecurityToken and RequestSecurityTokenResponse, the input and output messages of the STS. But WCF has to have its own implementations of RST and RSTR as well, since it has to send the RST to the STS and process the RSTR it gets in response. A little spelunking revealed the presence of an official WCF implementation of RST and RSTR, both marked internal. I normally fall on the pragmatic side of the internal/public debate, but this one makes little sense to me.

Otherwise, the prototype went smooth as silk and my project teammates were very impressed at how quickly this came together. Several of the project teams we’re working with have identified multiple authentication as the “killer” capability they’re looking to us to provide, so it’s good to know we’re making progress in the right direction.

FeedFlare Finally Fixed

I moved over to FeedBurner a while back. DasBlog has great support for FeedBurner – all you do is set your FeedBurner feed name in the DasBlog config and it handles the rest, including permanently redirecting your readers to the new feed.

However, I hadn’t been able to make FeedFlares work until today. FeedFlares “build interactivity into each post” with little links like “Digg this”, “Email this” or “Add to del.icio.us”. Since FeedBurner is serving the XML feed, it’s no big deal for them to add those links into the RSS feed. But to get those same flares to work on the web site, you have to embed a little script at the end of each item. Scott shows how to do this with DasBlog, except that it didn’t work for me. I’ve tried off and on, but for some reason the FeedBurner script file I was including was always empty.

Then I noticed the other day that my post WorkflowQueueNames had the flares on it. Hmm, why would that post work when none of the rest of mine did? Turns out it works because there are no spaces in the title. Unlike most of the rest of the DasBlog community, I use ‘+’ for spaces in my permalinks instead of removing them, so I get http://devhawk.net/FeedFlare+Finally+Fixed.aspx as the permalink URL instead of http://devhawk.net/FeedFlareFinallyFixed.aspx. In fact, that feature is in DasBlog because I pushed for it (a fact Scott reminded me of while I was troubleshooting this last night). And it was breaking the FeedFlares.

The solution is to URL encode the ‘+’ as %2B in the FeedFlare script link. I created a custom macro, since I already had a few custom macros powering this site anyway, and now I get FeedFlares on all my blog entries. I’ll also go update the DasBlog source, but creating a custom macro was both easier and less risky than patching the tree and upgrading everything.
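
For the curious, the macro’s core is one line of string work. This is a simplified sketch – the exact FeedFlare script URL format is from memory and the feed name is a placeholder – but it shows the fix:

    // simplified core of the custom macro: emit the FeedFlare script tag
    // with the '+' characters in the permalink escaped as %2B
    public string FeedFlare(string permalinkUrl)
    {
        string escaped = permalinkUrl.Replace("+", "%2B");
        return string.Format(
            "<script src=\"http://feeds.feedburner.com/~s/DevHawk?i={0}\" " +
            "type=\"text/javascript\" charset=\"utf-8\"></script>",
            escaped);
    }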

The Other Foundation Technology

I mentioned last week that WF “is one of two foundation technologies that my project absolutely depends on”. Sam Gentile assumes the other foundation technology is WCF. It’s not.

As a quick reminder, my day job these days is to architect and deliver shared service-oriented infrastructure for Microsoft’s IT division. These services will be automating long running business operations. And when I say long running, I mean days, weeks or longer. While there will surely be some atomic or stateless services, I expect most of the services we build will be long running. Thus, the infrastructure I’m responsible for has to enable and support long running services.

The other foundation technology my project depends on is Service Broker. Service Broker was expressly designed for building these types of long running services. It supports several capabilities that I consider absolutely critical for long running services:

  • Service Initiated Interaction. Polling for changes is inefficient. Long running operations need support for the Solicit-Response and/or Notification message exchange patterns.
  • Durable Messaging. The first fallacy of distributed computing is that the network is reliable. If you need to be 100% sure the message gets delivered, you have to write it to disk on both sides (see the send sketch just after this list).
  • Service Instance Dehydration. It’s both dangerous and inefficient to keep an instance of a long running service in memory when it’s idle. In order to maximize integrity (i.e. service instances survive a system crash) as well as resource utilization (i.e. we’re not wasting memory/CPU/etc on idle service instances), service instances must be dehydrated to disk.
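
To show what durable messaging looks like in practice, here’s a minimal send-side sketch. The connection string, message body, and the service, contract and message type names are all placeholders of mine:

    using System.Data.SqlClient;

    // durable, transactional send over Service Broker; the message is
    // written to disk as part of the same local transaction
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (SqlTransaction tx = conn.BeginTransaction())
        {
            SqlCommand cmd = conn.CreateCommand();
            cmd.Transaction = tx;
            cmd.CommandText = @"
                DECLARE @dialog UNIQUEIDENTIFIER;
                BEGIN DIALOG @dialog
                    FROM SERVICE [OrderClientService]
                    TO SERVICE 'OrderService'
                    ON CONTRACT [OrderContract];
                SEND ON CONVERSATION @dialog
                    MESSAGE TYPE [CancelOrder] (@body);";
            cmd.Parameters.AddWithValue("@body", cancelOrderXml);
            cmd.ExecuteNonQuery();
            tx.Commit();  // message and data changes commit together
        }
    }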

In addition to these capabilities, Service Broker supports something called Conversation Group Locking, which turns out to be important when building highly scalable long running services. Furthermore, my understanding is that Conversation Group Locking is a feature unique to Service Broker, not only across Microsoft’s products but across the industry. Basically, it means that inbound messages for a specific long running service instance are locked so they can’t be processed on more than one thread at a time.

Here’s an example: let’s say I’m processing a Cancel Order message for a specific order when the Ready to Ship message for that order arrives. With Conversation Group Locking, the Ready to Ship message stays locked in the queue until the Cancel Order transaction completes, regardless of how many service threads there are. Without Conversation Group Locking, the Ready to Ship message might get processed by another service thread at the same time the Cancel Order message is being processed. The customer might get notified that the cancellation succeeded while the shipping service gets notified to ship the product. Oops.
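
Here’s roughly what the receive side looks like – a sketch with placeholder names. The key is that RECEIVE inside a transaction locks the conversation group, so another thread running the same loop can’t touch the Ready to Ship message until the Cancel Order transaction commits:

    using System.Data.SqlClient;

    // each service thread runs a loop like this; RECEIVE locks the
    // conversation group for the duration of the transaction
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        while (true)
        {
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                SqlCommand cmd = conn.CreateCommand();
                cmd.Transaction = tx;
                // WAITFOR blocks until a message on an *unlocked*
                // conversation group is available
                cmd.CommandText = @"
                    WAITFOR (
                        RECEIVE TOP(1) conversation_group_id,
                            message_type_name, message_body
                        FROM [OrderQueue]
                    ), TIMEOUT 5000;";
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    if (reader.Read())
                        ProcessMessage(reader);  // hypothetical handler
                }
                tx.Commit();  // releases the conversation group lock
            }
        }
    }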

There’s also an almost-natural fit between Service Broker and Windows Workflow. For example, a Service Broker Conversation Group and a WorkflowInstance are roughly analogous. They even both use a Guid for identification, making mapping between Conversation Group and WF Instance simple and direct. I was able to get prototype Service Broker / WF integration up and running in about a day. I’ll post more on that integration later this week.
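
Here’s the gist of that mapping – a sketch where the receive plumbing is elided and the queue name is a convention of mine:

    using System;
    using System.Workflow.Runtime;

    // route an inbound message to the workflow instance whose id is
    // the Service Broker conversation group id; 'runtime' is an
    // already-started WorkflowRuntime
    void DispatchToWorkflow(WorkflowRuntime runtime,
        Guid conversationGroupId, object messageBody)
    {
        // the conversation group id doubles as the WF instance id
        WorkflowInstance instance = runtime.GetWorkflow(conversationGroupId);

        // hand the message to the instance's queue
        // ("OrderMessages" is a naming convention of mine)
        instance.EnqueueItem("OrderMessages", messageBody, null, null);
    }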

Last but not least, Service Broker is wicked fast. Unfortunately, I don’t have any public benchmarks to point to, but the Service Broker team told me about a private customer benchmark that handled almost 9,000 messages per second! One of the reasons Service Broker is so fast is that it’s integrated into SQL Server 2005, which is pretty fast in its own right. Since Service Broker is baked right in, you can do all your messaging work and your data manipulation within the scope of a local transaction.

Service Broker has a few rough areas, and it lacks a supported managed API (though there is a sample managed API available). Probably the biggest issue is that Service Broker has almost no interop story. If you need to interop with a Service Broker service, you can use SQL Server’s native Web Service support or the BizTalk adapter for Service Broker from AdapterWORX. However, I’m not sure how many of Service Broker’s native capabilities are exposed through these interop mechanisms. You would probably have to write a bunch of application code to make these capabilities work in an interop scenario.

Still, I feel Service Broker’s unique set of capabilities, its natural fit with WF and its high performance make it the best choice for building my project’s long running services. Is it the best choice for your project? I have no idea. One of the benefits of working for MSIT is that I get to focus on solving a specific problem and not on solving general problems. I would say that if you’re doing exclusively atomic or stateless services, Service Broker is probably overkill. If you’re doing any long running services at all, I would at least give Service Broker a serious look.

QOTD – Rick Barnes

As usual, I’m behind on blogging. This quote is actually from last Tuesday.

“Sunshine is a terrific bleach”
Rick Barnes

Rick, by the way, is my manager.

WorkflowQueueNames

As I wrote in my last post, I’m doing a bunch of WF related work. I’m close to releasing some WF related stuff I started building last week in Jon’s class. But I discovered something cool about the way WF’s queuing system works, and wanted to blog about it.

Side note – speaking of Jon, he’s joined the “WF is not a toy” conversation. He had an interesting point about the persistence service that I hadn’t thought of: if you use the SQL persistence service and you have a TransactionScope in your workflow, you end up with a distributed transaction, even if the persistence service and your transaction are writing against the same SQL instance. That’s a good enough reason to write your own persistence service right there.

In the WF stuff I’m building, I need a way for the WF runtime to notify a given workflow instance when something happens. WF has a low-level queuing system as well as a higher-abstraction data exchange system. I’m more interested in low-level knowledge of WF, so I decided to use the queuing system.

In my implementation, the workflow instances only need to be notified when specific events happen. That is, I’m not passing any real data on the queue – the arrival of the data is what’s important, not the data itself. Queues are identified by name, and I started by using a simple string as my queue name. However, the queue name isn’t limited to being a string; it can be any IComparable class. This turned out to be a huge advantage for me.

Things worked fine when I was building a simple sequence, but when I moved to a parallel activity things went south. Since I was using a simple string, I ended up creating two queues with the same name, which didn’t work out well. On top of that, I have two different notification situations. So I needed queue names that were unique for the same activity type across parallel branches of the workflow and that also distinguished the two notification situations.

Because the queue name is IComparable instead of a string, I was able to create two queue name types – one for each notification situation. Each of these queue name types includes a string that I initialize to the activity’s qualified name, which per the docs is “always unique in a workflow instance”. So I killed two birds with one stone: supporting multiple parallel activities as well as multiple notification scenarios. That’s pretty cool. If queue names were simple strings, I would have needed a naming convention like “notificationscenario:notificationdata:activityname” and then had to parse the queue name apart. In fact, I started down that path before I remembered that queue names are IComparable. Using IComparable is much, much cleaner.
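
Here’s the shape of one of those queue name types – a simplified sketch of what I built, with illustrative names:

    using System;
    using System.Workflow.ComponentModel;
    using System.Workflow.Runtime;

    // one queue name type per notification scenario
    [Serializable]
    public class StatusChangedQueueName : IComparable
    {
        private readonly string activityQualifiedName;

        public StatusChangedQueueName(Activity activity)
        {
            // QualifiedName is unique within a workflow instance, so
            // parallel branches get distinct queue names for free
            this.activityQualifiedName = activity.QualifiedName;
        }

        public int CompareTo(object obj)
        {
            StatusChangedQueueName other = obj as StatusChangedQueueName;
            if (other == null) return -1;  // different type, different queue
            return string.Compare(activityQualifiedName,
                other.activityQualifiedName, StringComparison.Ordinal);
        }

        // queue names also get used as dictionary keys, so keep
        // Equals/GetHashCode consistent with CompareTo
        public override bool Equals(object obj)
        {
            return CompareTo(obj) == 0;
        }

        public override int GetHashCode()
        {
            return activityQualifiedName.GetHashCode();
        }
    }

Creating the queue inside the activity then looks something like this (context is the ActivityExecutionContext passed to Execute):

    WorkflowQueuingService qs = context.GetService<WorkflowQueuingService>();
    if (!qs.Exists(new StatusChangedQueueName(this)))
        qs.CreateWorkflowQueue(new StatusChangedQueueName(this), true);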