Hawk Notes, Volume 4

I rolled out a small update to my blog: It's now reading the blog posts from a GitHub repo instead of an Azure DevOps repo.

The only reason for the swap is to get programmatic public access to the repo. Azure DevOps supports public projects, but REST access still appears to require authentication. Personal Access Tokens work just fine, but they expire, which is a pain in the backside.

Maybe anonymous access to Azure Repos REST services gets enabled in the future, but I didn't want to wait. It was pretty easy to port the Azure DevOps REST Wrapper to Octokit. My code was already using TPL Dataflow to load all the post metadata from the Azure Git repo. All I really needed to do was change the client initialization code and the URL construction scheme and I was good to go.
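
For future reference, the Octokit version of the loading code starts out roughly like this. It's just a minimal sketch: the owner, repo, and folder names are placeholders, and anonymous access only works because the repo is public.

public async Task<IReadOnlyList<RepositoryContent>> ListPostFilesAsync()
{
    // Octokit talks to the GitHub REST API; no token needed for a public repo
    var github = new GitHubClient(new ProductHeaderValue("Hawk"));

    // list the files under the posts folder
    // ("devhawk", "blog-posts", and "posts" are placeholder names)
    var contents = await github.Repository.Content
        .GetAllContents("devhawk", "blog-posts", "posts");

    // each item exposes a Name and a raw DownloadUrl for fetching the file body
    return contents;
}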

Hawk Notes, Volume 3

I just rolled out a new feature for my blog: Series.

Back when I was working for the IronPython team, I would write series of posts on a single topic, such as writing an IronPython debugger or using IronPython's __clrtype__ metaclass feature. I've also written series on building parsers in F#, a step-by-step guide to brokered WinRT components and the home grown engine I use for this blog. And with the new job, I suspect I'll be writing even more.

So I decided to make series a first class construct in my blog. You can go to a top level page to see all my series. Or you can go to a specific series page, such as the one for Hawk Notes. One thing I really like about Series is that they display in chronological order. That makes reading a series like the IronPython Debugger much easier to follow.

Implementing Series was fairly straightforward...or at least it would have been if I hadn't decided to significantly refactor the service code. I didn't like how I was handling configuration - too much direct config reading that was better handled by the options pattern. I had made some poor service design decisions that limited the use of dependency injection. Most of all, I wanted to change the way memory caching worked so that more data loads on demand instead of ahead of time. I also took the opportunity to use newer language constructs like value tuples and local functions.
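
As a rough sketch of the options change, assuming a "Blog" section in appsettings.json (the section name is illustrative), the binding and consumption look something like this:

// in Startup.ConfigureServices: bind a config section to a strongly typed options class
services.Configure<BlogOptions>(Configuration.GetSection("Blog"));

// consumers take IOptions<BlogOptions> via constructor injection
// instead of reading IConfiguration directly
public class PostLoaderSelector : PostLoader
{
    readonly BlogOptions options;
    readonly IServiceProvider serviceProvider;

    public PostLoaderSelector(IOptions<BlogOptions> options, IServiceProvider serviceProvider)
    {
        this.options = options.Value;
        this.serviceProvider = serviceProvider;
    }

    // LoadPostsAsync (shown below) uses options.PostStorage to pick a loader
}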

I still have two different sources for posts - the local file system and an Azure Repo. I used to use Azure Storage as well, but I got rid of that source as part of this refactoring. I have a simple interface, PostLoader [1], which the AzureGitLoader and FileSystemLoader classes implement. In order to select between them at run time, I have a third PostLoader implementation named PostLoaderSelector. PostLoaderSelector chooses between the sources based on configuration and uses IServiceProvider to activate the specified type from the DI container. PostLoaderSelector gets the IServiceProvider instance via constructor injection. I couldn't find a good example of how to manually activate types from ASP.NET's DI container, so for future reference it looks like this:

public Task<IEnumerable<Post>> LoadPostsAsync()
{
    // Look Ma, a Local Function!
    PostLoader GetLoader()
    {
        switch (options.PostStorage)
        {
            case BlogOptions.PostStorageType.FileSystem:
                return serviceProvider.GetService<FileSystemLoader>();
            case BlogOptions.PostStorageType.AzureGit:
                return serviceProvider.GetService<AzureGitLoader>();
            default:
                throw new NotImplementedException();
        }
    }

    var loader = GetLoader();
    return loader.LoadPostsAsync();
}
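
For this to work, the concrete loader types need to be registered with the container alongside PostLoaderSelector itself. A minimal sketch of the registrations in ConfigureServices (the lifetimes here are illustrative, not necessarily what Hawk uses):

// register both concrete loaders so IServiceProvider can activate them,
// and expose PostLoaderSelector as the PostLoader the rest of the app sees
services.AddSingleton<FileSystemLoader>();
services.AddSingleton<AzureGitLoader>();
services.AddSingleton<PostLoader, PostLoaderSelector>();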

  1. Note the lack of the "I" prefix for my interface type names. Death to this final vestigial Hungarian notation!

Hawk Notes, Volume 2

Obviously, I've been pretty bad about keeping this blog up to date. That applies to the content of the blog as well as the blog engine itself. Frankly, the last time I updated my blog code before last weekend was four years ago. At the time, ASP.NET Core was still called ASP.NET 5 and the latest version was Beta 7. Before posting my recent announcement, I thought it might be a good idea to get this blog running on a released and supported version of ASP.NET Core.

Once I got my code updated to the latest version of ASP.NET, I decided to actually add some functionality. When I built Hawk, I focused primarily on how the site was going to render my existing content. I didn't give much thought to how I would write new blog posts. That turned out to be a bad decision. Getting new posts published turned out to be such a huge pain in the butt that I rarely bothered to write new posts. With the new job, I want to go back to blogging much more often. So I figured I should update the engine to make publishing new posts easier.

As I wrote in my previous Hawk Note, Hawk loads all the post metadata into memory on startup. It supports loading posts from either the file system or Azure Storage. The master copy of my posts is stored in its own repo in Azure Repos (formerly Visual Studio Team Services, formerly Visual Studio Online). Moving posts from the git repo into Azure Storage turned out to be the biggest publishing obstacle. So I decided to eliminate that step. If you're reading this, I guess it worked!

Since Hawk already supported loading posts from two different repositories, it was pretty straightforward to add a third that reads directly from the Azure Repo. I also added code to render the markdown to HTML using Markdig, eliminating the need to use Edge.js [1].
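
The Markdig part is only a couple of lines. Roughly something like this - the pipeline extensions shown are a typical setup, not necessarily exactly what Hawk enables:

// build a Markdig pipeline once and reuse it for every post
var pipeline = new MarkdownPipelineBuilder()
    .UseAdvancedExtensions()   // tables, footnotes, auto-identifiers, etc.
    .Build();

// render a post body from markdown to HTML
string html = Markdown.ToHtml(postMarkdown, pipeline);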

Originally, I decided to store my blog posts in what is now called Azure Repos because it offered free private repos for personal use. While GitHub now offers free private repos, I've decided to keep my posts in Azure Repos because I find the Azure DevOps REST interface much easier to wrap my head around than GitHub's GraphQL-based approach. Yeah, I'm sure GraphQL is the better approach. But I was able to add the ability to load posts from Azure Repos in roughly 150 lines of code. For now anyway, easy and proven beats newer and more powerful.
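
For context, the heart of that loader is a GET against the Git items endpoint of the Azure DevOps REST API. Here's a rough sketch with HttpClient - the organization, project, and repo names are placeholders, and the PAT comes from wherever you keep secrets:

// authenticate with a Personal Access Token via basic auth (empty user name)
var client = new HttpClient();
var pat = Environment.GetEnvironmentVariable("AZURE_DEVOPS_PAT");
client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "Basic", Convert.ToBase64String(Encoding.ASCII.GetBytes(":" + pat)));

// list everything under the posts folder of the repo
var url = "https://dev.azure.com/myorg/myproject/_apis/git/repositories/blog-posts/items"
        + "?scopePath=/posts&recursionLevel=Full&api-version=5.0";
var json = await client.GetStringAsync(url);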

Hawk pulling from the Azure Repo is the first step for creating a simple publishing process. Next, I will set up a service hook to automatically trigger a Hawk refresh when I push to the post repo's master branch. Eventually, I also want to add image publishing and maybe some kind of tool to help author the post metadata (auto-set the date, auto-slugify the title, validate categories and tags, etc.). I also want to re-enable comments, though social media makes that somewhat less important than it was back in the day. Regardless, I figure it's best to tackle improvements incrementally. The last thing I want is another long silence on this blog.


  1. Note, I might not use Edge.js in Hawk anymore, but I still think Tomasz Janczuk must be some kind of a wizard.

Hawk Notes, Volume 1

This is the first in a series of blog posts about Hawk, the engine that powers this site. My plan is to make a post like this for every significant update to the site. We'll see how well that plan works.

  • I just pushed out a new version of Hawk on my website. The primary feature of this release is support for ASP.NET 5 Beta 7. I also published the source code up on GitHub. Feedback welcome!
  • As I mentioned in my post on Edge.js, the publishing tools for Hawk are little more than duct tape and baling wire at this point. Eventually, I'd like to have a dedicated tool, but for now it's a manual three-step process:
    1. Run the PublishDraft command to publish a post from my draft directory to a local git repo of all my content. As part of this, I update some of the metadata and render the markdown to HTML.
    2. Run my WritePostsToAzure Custom Command to publish posts from my local git repo to Azure. I have a blog post on my custom command infrastructure in the works.
    3. Trigger a content refresh via an unpublished URL.
  • I need to trigger a content refresh because Hawk loads all of the post metadata from Azure on startup. The combined metadata for all my posts is pretty small - about 2/3 of a megabyte stored on disk as JSON. Having the data in memory makes it easy to query as well as to support multiple post repositories (Azure Storage and the file system).
  • I felt comfortable publishing the Hawk source code now because I added a secret key to the data refresh URL. Previously, the refresh URL was unsecured. I didn't think giving random, anonymous people on the Internet the ability to kick off a data refresh was a good idea, so I held off publishing source until I secured that endpoint.
  • Hawk caches blog post content and legacy comments in memory. This release also adds cache invalidation logic so that everything gets reloaded from storage on data refresh, not just the blog post metadata.
  • I don't understand what the ASP.NET team is doing with the BufferedHtmlContent class. In beta 7, it's been moved to the Common repo and published as source. However, I couldn't get it to compile because it depends on an internal [NotNull] attribute. I decided to scrap my use of BufferedHtmlContent and built out several classes that implement IHtmlContent directly instead (see the sketch below). For example, the links at the bottom of my master layout are rendered by the SocialLink class. Frankly, I'm not sure if rolling your own IHtmlContent class for a snippet of HTML you want to automate is a best practice. It seems like it's harder than it should be. It feels like ASP.NET needs a built-in class like BufferedHtmlContent, so I'm not sure why it's been removed.
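
For reference, a hand-rolled IHtmlContent class along the lines of SocialLink looks roughly like this. The property names and markup are illustrative, not Hawk's actual code, and it's shown against the interface as it ships in released ASP.NET Core (Microsoft.AspNetCore.Html.IHtmlContent):

public class SocialLink : IHtmlContent
{
    public string Url { get; set; }
    public string Title { get; set; }

    // IHtmlContent has a single method that writes the markup directly to the output
    public void WriteTo(TextWriter writer, HtmlEncoder encoder)
    {
        writer.Write("<a href=\"");
        encoder.Encode(writer, Url);      // encode the attribute value
        writer.Write("\">");
        encoder.Encode(writer, Title);    // encode the link text
        writer.Write("</a>");
    }
}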

Using Task in ASP.NET MVC Today

I’ve been experimenting with the new async support coming in the next version of C# (and VB). I must say, I’m very impressed. Async is one of those things you know you’re supposed to be doing. However, traditionally it has taken a lot of code and been hard to get right. The new await keyword changes all that.

For example, here’s an async function to download the Twitter public timeline:

public async Task<XDocument> PublicTimelineAsync()
{
  var url = "http://api.twitter.com/1/statuses/public_timeline.xml";
  var xml = await new WebClient().DownloadStringTaskAsync(url);
  return XDocument.Parse(xml);
}

That’s not much more difficult than writing the synchronous version. By using the new async and await keywords, all the ugly async CPS code you’re supposed to write is generated for you automatically by the compiler. That’s a huge win.

The only downside to async is that support for it is spotty in the .NET Framework today. Each major release of .NET to date has introduced a new async API pattern. .NET 1.0 had the Async Programming Model (APM). .NET 2.0 introduced the Event-based Async Pattern (EAP). Finally, .NET 4.0 gave us the Task Parallel Library (TPL). The await keyword only works with APIs written using the TPL pattern. APIs using older async patterns have to be wrapped as TPL APIs to work with await. The Async CTP includes a bunch of extension methods that wrap common async APIs, such as DownloadStringTaskAsync from the code above.
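
If an API you need hasn’t been wrapped yet, you can usually do it yourself with Task.Factory.FromAsync. For example, here’s a sketch of wrapping the classic APM Begin/EndRead pair on Stream (the extension method name is mine):

// wrap an APM (Begin/End) API as a Task so it can be awaited
public static Task<int> ReadTaskAsync(
    this Stream stream, byte[] buffer, int offset, int count)
{
    return Task<int>.Factory.FromAsync(
        stream.BeginRead, stream.EndRead,
        buffer, offset, count, null);
}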

The async wrappers are nice, but there are a few places where we really need the TPL pattern plumbed deeper. For example, ASP.NET MVC supports AsyncControllers. AsyncControllers are used to avoid blocking IIS threads waiting on long running I/O operations – such as getting the public timeline from Twitter. Now that I’ve been bitten by the async zombie virus, I want to write my async controller methods using await:

public async Task<ActionResult> Index()
{
    var t = new Twitter();
    var timeline = await t.PublicTimelineAsync();
    var data = timeline.Root.Elements("status")
        .Elements("text").Select(e => e.Value);
    return View(data);
}

Unfortunately, neither the main trunk of MVC nor the MVC futures project has support for the TPL model [1]. Instead, I have to manually write some semblance of the async code that await would have emitted on my behalf. In particular, I have to manage the outstanding operations, implement a continuation method and map the parameters in my controller manually.

public void IndexAsync()
{
    var twitter = new Twitter();

    AsyncManager.OutstandingOperations.Increment();
    twitter
        .PublicTimelineAsync()
        .ContinueWith(task =>
        {
            AsyncManager.Parameters["timeline"] = task.Result;
            AsyncManager.OutstandingOperations.Decrement();
        });
}

public ActionResult IndexCompleted(XDocument timeline)
{
    var data = timeline.Root.Elements("status")
        .Elements("text").Select(e => e.Value);
    return View(data);
}

I promise you, writing that boilerplate code over and over gets old pretty darn quick. So I wrote the following helper function to eliminate as much boilerplate code as I could.

public static void RegisterTask<T>(
    this AsyncManager asyncManager,
    Task<T> task,
    Func<T, object> func)
{
    asyncManager.OutstandingOperations.Increment();
    task.ContinueWith(task2 =>
    {
        //invoke the provided function with the
        //result of running the task
        var o = func(task2.Result);

        //use reflection to set asyncManager.Parameters
        //for the returned object's fields and properties
        var ty = o.GetType();
        foreach (var f in ty.GetFields())
        {
            asyncManager.Parameters[f.Name] = f.GetValue(o);
        }
        foreach (var p in ty.GetProperties())
        {
            var v = p.GetGetMethod().Invoke(o, null);
            asyncManager.Parameters[p.Name] = v;
        }

        asyncManager.OutstandingOperations.Decrement();
    });
}

With this helper function, you pass in the Task<T> that you are waiting on as well as a delegate to invoke when the task completes. RegisterTask takes care of incrementing and decrementing the outstanding operations count as appropriate. It also registers a continuation that reflects over the object returned from the invoked delegate to populate the Parameters collection.

With this helper function, you can write the async controller method like this:

public void IndexAsync()
{
    var twitter = new Twitter();

    AsyncManager.RegisterTask(
        twitter.PublicTimelineAsync(),
        data => new { timeline = data });
}

//IndexCompleted hasn't changed
public ActionResult IndexCompleted(XDocument timeline)
{
    var data = timeline.Root.Elements("status")
        .Elements("text").Select(e => e.Value);
    return View(data);
}

It’s not as clean as the purely TPL-based version. In particular, you still need to write separate Async and Completed methods for each controller method. You also need to build an object to map values from the completed tasks into parameters in the completed method. Mapping parameters is a pain, but the anonymous object syntax is terser than setting values in the AsyncManager Parameters collection.

It’s not full TPL support, but it’ll do for now. Here’s hoping that the MVC team has async controller methods with TPL on their backlog.


[1] I’m familiar with Craig Cavalier’s Async MVC with TPL post, but a fork of the MVC Futures project is a bit too bleeding edge for my needs at this point.