Passion * Technology * Ruthless Competence

Friday, September 12, 2008

Afternoon Coffee 174

You know, this gets pretty long when I go a week between morning coffee posts.

Dynamic Language Stuff

Other Stuff

  • Don Syme blogs about an update to the F# CTP, a mere week after the original release. One week? That's more often than even IPy releases. I can't wait to see what they ship in next week's release! :) Seriously, I hope they can keep the release sprints short, but every week would be a bit crazy!
  • Speaking of F#, Matt Podwysocki updates FsTest for the F# CTP and posts about Extension Everything in F#. Unlike C#, which only supports extension methods, F# also supports extensions properties, static methods and events, though like Matt I can't think of a good use for extension events.
  • Still speaking about F#, Andrew Kennedy has a three part series on the new units of measure feature of F#. If you were going to use F# to build the physics engine of a game, I would suspect UoM would be extremely useful. (via Don Syme)
  • Oh look, Chris Smith built an F# version of artillery game that uses Units of Measure for the physics code. I'll bet UoM was extremely useful. :)
  • Talking about Live Mesh at TechEd Australia - where much to my surprise frankly they were demoing Live Mesh Apps - I pointed out to Scott Hanselman that Mesh is running an embedded CoreCLR (aka the same CLR from Silverlight 2). Scott went poking around and posted what he discovered. Looking forward to finding out what he digs up on using CoreCLR outside the browser.
  • Speaking of Scott, I need to set up a family video conference solution like Scott's before my next trip.
  • Congrats to Glenn Block and the MEF team for their initial CodePlex source drop! I've been hearing about this possibility since Glenn joined the team, so I'm really excited to see it happen. I need to take a look at it in detail (in my copious spare time) because I want to find out how to make it work with IPy.
  • Bart de Smet has a whole series (starting here) on Dynamic Expression Trees. However, given that he specifically writes "This blog series is not about DLR itself" makes it seem pretty conceptual to me. Why not talk about DLR expression trees instead Bart?
  • I'm sure you noticed ASP.NET MVC preview 5 dropped last week. I really liked Brad Wilson's discussion of the new view engine design.
  • Tomas Restrepo has started publishing his source code on GitHub. Personally, I haven't published any source code lately but I am using Git for all of my non IPy core work (which is stored in TFS). Like Tomas, I'm still getting the hang of Git but I'm really digging it's speed, it's branching and the fact that there's zero infrastructure requirements. SVN provides the lightweight svnserve, but Git is even lighter weight than that.
  • I liked Steve Yegge's post on typing. I am a touch typer, but I doubt I type 70 words a minutes. I do know where the number keys are without looking though, so I guess that's pretty good. I remember seeing Chris Anderson demo Avalon WPF long before it was public and being impressed at how fast he could type.
Posted By Harry Pierson at 1:20 PM Pacific Daylight Time

Friday, August 29, 2008

Morning Coffee 173

I'm on my way out the door for New Zealand and Australia, but I wanted to push out a few things.

  • F# August September CTP is out! Don Syme has the announcement, Jomo Fisher has the link roundup and details are on the brand-spanking-new MSDN F# Dev Center. Major congrats to the F# team. I've been running a pre-release version of these bits, and they are a huge step forward if you're an F# developer.
  • I've got an article on IronPython in the latest issue of CoDe magazine. Also check out Brad Wilson's IronRuby article, Ted Neward's F# article and Neil Ford's Polygot Programming article.
  • Via Michael Foord I discovered that IronPython tester Dave Fugate is back on the blog. He starts with a couple of posts about measuring IronPython performance.
  • Speaking of blogging teammates, I think the dynamic languages team has the highest percentage of bloggers in any group at MSFT. All four Program Managers (Dave (lead), John, Jimmy and me), four of five developers (Shri (lead), Dino, Curt and Oleg) and all three Testers (Jim, Dave and Srivatsn). The only non blogger right now is Tomas - who at least has a home page - and  the lead tester which is an open position right now. 11 bloggers out of 12 team members equals 91.67% team blogger coverage.
  • I was really impressed with Newspeak when I saw it at Lang.NET so I'm very excited to see they have a new website. No public bits yet, but I like the part where they point out Newspeak "can be implemented independently of Squeak, Smalltalk or any particular VM or IDE". How about implementing a version on DLR guys?
  • Maurice de Beijer shows off embedding IronPython inside a WF application. Kinda cool, but he's primarily showing off implementing a CLR interface in IPy. How about a WF activity that execute arbitrary IPy code. That would be cool. (via IronPython URLs)
  • Ironclad has reached their 0.5 milestone, being able to import numpy from IronPython. BTW, guys - I'm not sure commenting out one line that appears to be unreferenced qualifies as a "monstrous caveat". Congrats guys! (via IronPython URLs)
Posted By Harry Pierson at 1:29 PM Pacific Daylight Time

Monday, August 25, 2008

Morning Coffee 172

  • I took the kids to see Fly Me To The Moon recently. We had to trek to Monroe (about 30 minutes away) because it's a special 3D movie, and it was only playing there and in downtown Seattle. The movie's story is insipid - three flies stow away on Apollo 11 - but all the space shots were actually kinda cool. It sure felt like they wanted to be scientifically and historically accurate about the the actual mission (well, other than the part about the flies). Patrick really liked it (he wants to build a rocket in the back yard) and Riley sat thru the whole thing with a minimum of fussing.
  • I'm a big fan of Joe Biden, so I'm really happy Obama picked him to be his running mate.
  • I know it's old news but what the frak was John Edwards thinking? I like his policies, but the arrogance it takes to run for president when you know you've got that skeleton in your closet is mind-boggling.
  • On the other hand, watching the Sean Hannity and guest's hypocrisy on Edwards' affair, only to watch them scramble like cockroaches when Colmes points out McCain had admitted to having an affair was frakking hilarious.

OK, onto geek stuff:

  • My new boss Dave Remy has moved to a new blog. If you're curious what he was up to for the 10 months he was away from Microsoft, he's happy to share.
  • IPy and IRuby developer Curt Hagenlocher (aka Iron Curt) is blogging. Cue the Ozzy...I AM IRON CURT. Or don't. Anyway, he dives in the deep end of the pool - no "hello world" lollyblogging for Iron Curt - digging into the stack implications of rethrowing exceptions and debugging emitted IL.
  • Srivatsn writes about static compilation of IPy scripts. Note, we're not talking about static typing - it's still the same good-old dynamically typed IronPython, just packaged up as an assembly, rather than as a bunch of .py files. Note, if you're interested in compiling IronPython, you should check out the PYC sample we published as part of Beta 4.
  • Speaking of IPy Beta 4, Shri Borde posts about the COM dispatch support which is enabled by default as of Beta 4. If you're driving COM automation clients (like Office) from IPy, this is a huge improvement over the old mechanism.
  • Jeff Hardy has released a new version of NWSGI, a managed version of Python's Web Service Server Gateway Interface. My understanding is that this would allow any Python web stack written against WSGI to run in IIS with IronPython (subject to IronPython's compatibility with CPython). Jeff's been documenting his efforts getting Django running with NWSGI on his blog. Awesome work Jeff! (Thanks for the correction Seo!)
  • I never really bought into the "Attention Economy", but Chris Anderson's economic analysis of his DIY Drones site traffic was fascinating.
  • Lutz announces "it is time to move on" from Reflector and there was a collective horrified scream in the .NET community. He's handing it over to Red Gate, who promised they "will continue to offer the tool for free to the community".
  • I missed this when he posted it in June, but I really liked Nikhil Kothari use of the DLR in Silverlight to cut down on the XAML verbosity in his ViewModel action binding.
  • Brian McNamara previews the new Add Reference and file ordering support in the upcoming F# CTP. I'm really looking forward to the project-to-project reference support. I can't tell you how many times I've gotten burned because my main project recompiled but my test project didn't. You just get used to hitting Rebuild All instead of Build. As for file ordering, it's a bit of a bummer that F# requires it, but the new experience is hella better than editing the project file by hand. I'm really looking forward to the new CTP.
Posted By Harry Pierson at 9:56 AM Pacific Daylight Time

Friday, August 08, 2008

Monadic Philosophy Part 5 - Reader Comments

Barry Kelly thinks that "programmers would understand monads better if they were described as a design pattern". I agree 100% and would love to see a monad design pattern written out using p&p's pattern form. The one thing I would note on this is that certain language constructs can make working with certain design patterns easier. For example, C# obviously has great language level support for the Iterator design pattern. Once you've got language level support, it doesn't really feel like a design pattern anymore, it feels like a language feature. I mean, given that you can write OO code in a language like C, does that mean technically OO is a "design pattern". I don't think so.

A commenter named atp warned me not to "fall into the newbie trap of thinking that monads are about sequencing operations. They aren't. A large number of monads (for example, Reader) are commutative and do not enforce any sort of statement ordering." Fair enough. For example, you switch the order of some LINQ operators and still end up with the same result. If you switch Where and Select, you should end up with the same output (assuming the where clause isn't invalidated by the select projection). But from a C#/F# perspective, I don't really care about monads for enforcing order anyway - the language has that natively. I care much more about the context flow aspect of monads, which it sounds like atp thinks we should be focused on anyway. Works for me.

Finally, Yuri K. pointed out that we aren't really stuck with the nested lambda expression syntax in C#. In Luke Hoban's Monadic Parser Combinators using C# 3.0 post, he implements a Where, Select and SelectMany extension method for his Parser delegate type, which allows him to plug into C#'s query comprehension syntax. He's 100% correct and I considered including this fact in my post. However, the mapping between query comprehension and the Bind and Result functions is a little murky, so I skipped it.

For C# query comprehensions, basically SelectMany does double duty, not only binding the parser and the parser generating function (which Luke called 'selector'), but also taking the two parse values and calling to a projector function and returning the projection return value in a Result. By implementing SelectMany, you can rewrite the TwoValues parser like this:

static Parser<string> QueryTwoItems() 

    return from v1 in Item()  
           from v2 in Item()  
           select string.Format("{0}{1}", v1, v2); 
}

which looks pretty much identical to the F# monadic syntax version. Luke also implements Where, which I have in my F# parser library as Satisfy. Where takes a parser and only returns the parser result if the provided boolean predicate returns true. Select is a projection, similar to SelectMany but only used with a single parser. I have a couple of specific projectors in my F# library (Ignore which tosses the parse result and Listify which turns a single result into a single item list) but I haven't had any need for a generic projector like Select. I'm assuming Luke only implemented Select to make the query comprehension work when you don't have multiple from statements.

Posted By Harry Pierson at 5:20 PM Pacific Daylight Time

Friday, August 01, 2008

Monadic Philosophy Part 4 - The Parser Monad in F#

In the last post, I built out a basic parser monad in C#. While the approach worked OK, the syntax is still a little foreign to your typical .NET programmer, what with it's nested anonymous functions and all. Now, I'm going to translate that code to F# and take a look at the special monadic syntax F# supports that makes using monads as easy any sequential code.

First, let's translate our Parser delegate, Bind, Result and Item functions over to F#. Just for kicks, let's also port over the final version of TwoItems too.

type Parser<'input, 'result> = 'input-> ('result * 'input) option

// the Bind function, defined as a custom operator
let (>>=) p f : Parser<'i,'r> = 
    fun input ->
        match p input with
        | Some(value, input) -> (f value) input
        | None -> None

let Result v : Parser<'i,'r> = fun input -> Some(v, input)

let Item : Parser<string, char> = 
    fun input ->
        if string.IsNullOrEmpty(input) 
            then None
            else Some(input.[0], input.Substring(1))

let BestTwoItems = 
    Item >>= (fun v1 -> 
    Item >>= (fun v2 -> 
    Result (sprintf "%c%c" v1 v2)))

First, we start with the declaration of the Parser type. Unlike C#, F# has built in support for tuples, so I didn't bother to define a Result type (just the Result function). A Parser is declared to be a function that takes in some generic input type and returns an optional tuple pairing the result with the remaining input to be parsed. As I've blogged before, F#'s option type is kinda like C#'s Nullable type, so a parser that returns None is considered to have failed to parse the input.

Next up is are the monad functions Bind and Result. The only significant change from the C# version is that I used the custom operator >>= for the Bind function. So instead of calling "Item().Bind(some_function)", we can call "Item >>= some_function". F# functions aren't attached to a type like C# extension methods are, so this is the only way to get the more readable infix notation. I'm using >>= as the bind operator because that's the operator Haskell uses for their monad function. Other than the custom operator name, Bind and Result work identically to their C# counterparts. Note, I explicitly specified the return type of Bind, Result and Item, but I didn't have to. F# can infer the types of all the parameters from usage just fine. I added the type specifications for the reader, in case you're not familiar with F#'s syntax.

Likewise, Item is identical to the C# version including using strings as the parse input, except for than the F# syntax. Typically, in a real parsing app you would use an intrinsic list of chars instead of strings, since F#s list is a much more efficient data structure than strings for operations that strip characters off the head of the list (like parsers are wont to do). However, I wanted to make this code as similar to the previous code, so I stuck with strings.

Finally, we have BestTwoItems. Again, syntax aside, it's exactly like it's C# cousin though I did use the slightly more compact sprintf function instead of string.Format. Again, while BestTwoItems it works well, it uses the same nested anonymous function syntax from the C# version. Maybe I shouldn't have called it "BestTwoItems"!

However, in F# it's possible to define a custom syntax for your monad that let's you write the function this way:

let VeryBestTwoItems =
    parse {
        let! v1 = Item
        let! v2 = Item
        return sprintf "%c%c" v1 v2 }

With this monadic syntax, we've now completely eliminated not only the Parser delegate and the input string, but also the nested anonymous functions needed by the Bind function, making the code appear completely sequential.

The secret to making this work is the parse monad object. It the code above, the word parse almost feels like a language keyword, but it's not. It's actually an instance of an parse monad object with a specific signature. F# knows how to take the syntax above and combine it with the parse monad object to produce the right code. Here's the parse monad:

type ParseMonad() =
    member w.Delay(f) = fun input -> f () input 
    member w.Return(v) = Result v 
    member w.Bind(p, f) = p >>= f
    
let parse = ParseMonad()

As you can see, there's an obvious direct correlation of Result and Bind functions we defined last time and the Return and Bind methods in the ParseMonad. The only thing we haven't seen before is the Delay method. Monads are one of of F#'s many delayed expressions. F# wraps the entire monad in a call to Delay to ensure the monad isn't executed prematurely.

As per the F# grammar spec, there are several other functions you can define on your monad if you so choose. My "real" parser monad also implements Zero and Combine. Zero returns a parser that unconditionally fails. By defining Zero on my monad object, I can write ifs without elses, the parser monad will implicitly inject Zero clause as your else statement. Combine combines results (I know, a shocker!). I use it as a prioritized choice. In other words, when you Combine two parsers, you only call the second parser if calling the first parsers fails. Prioritized Choice is used very often in PEGs, which is why I chose to define it this way.

F# monadic syntax also support For, Let, While, Using, TryFinally and TryWith. Frankly, I haven't spent much time thinking about scenarios where you'd use these other syntax elements. The only one that's obvious to me is Using for deterministic finalization, which you could see using anywhere you access IDispoasble objects. Here's hoping the F# folks document in detail how to use this powerful syntax.

So that's it for basic monads in F#. I've gotten some great comments (and one less than great comments) as I've written this series. In my last post on monads (in this series at least) I'll repost some of those comments as well as provide some concluding thoughts.

Posted By Harry Pierson at 3:42 PM Pacific Daylight Time

Thursday, July 31, 2008

Monadic Philosophy Part 3 - The Parser Monad in C#

(If you disregarded my advice and read the previous version of this post, please note I rewrote this post significantly so you'll probably want to read it again.)

In the last post, we looked at how LINQ is a monad and how IEnumerable is a pseudo-functional construct. However, C#'s intrinsic collection support - aka foreach and yield return - really obscure how you might go about building your own monad. So for this post, we're going to take a look at a parsing monad instead. Just as LINQ broke the big problem of queries into a collection of standard query operators that were composable, we want to take the same approach for parsers.

Note, I'm going to stick with C# for now, and get into F# monads in my next post. Quick shout out to Luke Hoban and Brian McNamara, from whom I stole obtained some of the code below.

Quick refresher: I've described a monad as a sequence of computations with a context flow. Since C# has explicit sequencing, we want to focus on the context flow. For LINQ, the context was IEnumerable. For parsers, we could define an similar IParser interface like this:

class Tuple<T1, T2>
{
    public readonly T1 Item1;
    public readonly T2 Item2;
    public Tuple(T1 val1, T2 val2) { Item1 = val1; Item2 = val2; }
}

class Result<T> : Tuple<T, string>
{
    public Result(T val, string rest) : base(val, rest) { }
}

interface IParser<T>
{
    Result<T> Parse(string input);
}

The Parse function takes a string to be parsed as input and returns the parsing result which pairs the semantic value with with the remaining string input to be parsed. I've built out a simple generic tuple class because I know I'll use it again later. I've long wished C# would support intrinsic tuples like F# does. For convenience, I've also created a strongly typed subclass of Tuple to represent parse results where the second item is a string, to save some typing. Since Result is a class, it can be null which means the the Parser failed to parse the input.

The problem with this approach is that unlike IEnumerable, the C# compiler has no built-in knowledge of this interface. That means there are no easy-to-use keywords like foreach and yield return that can do our heavy lifting of consuming or creating these IParser types for us. Instead, we would have to explicitly declare classes to implement the interface. As we add more and more parsers, that additional overhead of creating types would become more and more unwieldy. Instead, let's redefine Parser as a delegate.

delegate Result<T> Parser<T>(string input);

The benefit of this approach is that you can create Parser delegates inside functions, using C#'s anonymous delegate syntax, without the overhead of creating a type. For example, here's a function to create a simple primitive parser that strips the first character off the parse string and returns it as the parse result:

static Parser<char> Item()
{
    return input =>
        {
            return string.IsNullOrEmpty(input)
                ? null
                : new Result<char>(input[0], input.Substring(1));
        };
}

That's a lot more convenient than building a type just to implement a single method.

Now that we have our Parser type, we need to think about how to compose Parsers so that we can flow context between them. Much as LINQ provides a collection of primitive query operators (Select, Where, OrderBy, etc), you would expect a monadic parser library to provide a collection of primitive parsers (Item, Satisfy, AnyOf, ItemEqual, etc), that you could combine into higher-order parsers along with some language specific lower-order parsers. Here's an example from the the PEG grammar:

    Primary <- Identifier !LEFTARROW / OPEN Expression CLOSE / Literal / Class / DOT

The Primary parser depends on some high-order language specific parsers (Identifier, Expression, Literal and Class) as well as some language specific low-order tokenizer style parsers (LEFTARROW, OPEN, CLOSE and DOT) and finally some language-independent primitive parsers (the failure predicate ! and the prioritized choice operator /).

So how should we compose these various Parsers? LINQ query operators were fairly easy to compose because they all take in and return the same type (IEnumerable) so you can simply chain them together. Parsers are a little trickier because the inputs and outputs are asymmetric - i.e. they take a string, but return a Result - so simple chaining won't work.

We could combine the parsers sequentially, taking the parse string returned from first parser and feed it into the second. Then we could combine the two parse values in a Tuple to return them (you see why I created a generic Tuple class?) resulting in a function that looks like this:

static Parser<Tuple<T1,T2>> Join<T1,T2>(this Parser<T1> p1, Parser<T2> p2) 

    return input => 
        { 
            var ret1 = p1(input); 
            if (ret1 == null
                return null

            var ret2 = p2(ret1.Item2); 
            if (ret2 == null
                return null

            return new Result<Tuple<T1,T2>>( 
                new Tuple<T1, T2>(ret1.Item1, ret2.Item1), 
                ret2.Item2); 
        }; 
}

Note this is an extension method so we can call Parser1.Join(Parser2) rather than the less fluent Join(Parser1, Parser2). I was going to call this function Combine, but there's already a static Combine method on the Delegate type that caused a conflict, so I used Join instead.

The Join approach works, but it's a bit unwieldy to return the parsing values in a tuple. Every set of joined parsers will result in another level of tuple nesting in the Result that's returned. That gets pretty ugly pretty fast. For example, lets say we want to create a parser that combines two instances of Item. It looks like this:

static Parser<Tuple<char, char>> TwoItems()
{
    return Item().Plus(Item());
}

That's not so bad. But now look what happens if we combine the TwoItems parser with another instance of Item:

static Parser<Tuple<Tuple<char, char>, char>> ThreeItems()
{
    return TwoItems().Plus(Item());
}

The result is a nested tuple. Yuck. We need a better way. Enter the monadic bind. The code looks like this:

static Parser<U> Bind<T, U>(this Parser<T> p1, Func<T, Parser<U>> fun)
{
    return input =>
        {
            var ret1 = p1(input);
            if (ret1 == null)
                return null;

            var p2 = fun(ret1.Item1);
            if (p2 == null)
                return null;

            return p2(ret1.Item2);
        };
}

Like the Join function above, Bind starts by calling the first parser function, returning null if the parse fails. However, instead of calling to the second parser directly, it calls to the provided function that generates the second parser. This function acts as a closure, packaging up the parse value from the first parser for later processing. Finally, Bind calls to the generated second parser, feeding in the remaining text from the first parser result.

This approach allows you to inject code that combines the parsing values however we like rather than always pairing them up in a tuple. Here's a version of TwoItems that binds a call to Item with a custom function that calls Item again and returns the two characters as a string rather than a tuple:

static Parser<string> BetterTwoItems()
{
    return Item().Bind<char, string>(
        val => 
        {
            return input =>
            {
                var result = Item()(input);
                return new Result<string>(
                    string.Format("{0}{1}", val, result.Item1),
                    result.Item2);
            };
        });
}

It's kinda strange to see a lambda expression that returns a lambda expression in C#, but that's what this code does. The first lambda expression (val =>) defines the custom function, the second lambda expression (input =>) defines the Parser delegate. Val is the parse value from calling Item() the first time - ret1.Item1 in the Bind function above. Input is the remainder of the parse string - ret1.Item2 from the Bind function.

Unfortunately, while this approach avoids nested tuples for parse values, we've had to give up quite a bit of simplicity. The original TwoItems method was a single line of code. BetterTwoItems is significantly more complex. Furthermore, the double lambda expression syntax confuses C#'s type inference, forcing you to explicitly specify the generic types on the Bind method. Luckily there's a better way to write this. However, let's start by rewriting the function like this:

static Parser<string> SlightlyBetterTwoItems()
{
    return Item().Bind(
        v1 => Item().Bind<char, string>(
            v2 =>
            {
                return input =>
                {
                    return new Result<string>(
                        string.Format("{0}{1}", v1, v2),
                        input);
                };
            }));
}

SlightlyBetterTwoItems pulls the second call to Item out into a second Bind operation. The point of this refactoring is to make it clear that we can view this function as a call to Item, bound to a second call to Item, bound to custom function to return a Parser that returns the two parse value chars formatted as a string. You'll notice that by eliminating the the double lambda expression on the first call to Bind, we were able to drop out the explicit generic type specification.

This version is a little clearer, but we can make it clearer yet. It turns out that wrapping up a parse value in a Parser that unconditionally returns the parse value and the parse text input in a Result is a very common operation. So let's create a primitive function Result to wrap up a parse value in a Parser delegate and build our final version of TwoItems that uses it.

static Parser<T> Result<T>(T val) 

    return input => new Result<T>(val, input); 


static Parser<string> BestTwoItems()
{
    return Item().Bind(
        v1 => Item().Bind(
        v2 => Result(string.Format("{0}{1}", v1, v2))));
}

Now it's very clear that we have a call to Item, bound to a second call to item, which is in turn bound to a call to Result. We've now dropped all use of double lambdas, which means C# can infer the types to each of our Bind calls implicitly. But more importantly, do you see any reference to Parser<T> delegates or input strings in this code? Only in the return type specification. Just as LINQ hides the specifics of flowing IEnumerable or enumerator objects between standard query operators, the parser monad hides the specifics of flowing Parser delegates or input strings between parse operations.

The Parser delegate plus the Bind and Result methods are all there are to our basic parser monad. Seriously, all that worry that monad "is a bit obscure and sounds a little daunting" and it's really just two functions and a delegate type.

While this code is fairly straight forward, the whole nested lambdas expressions is fairly atypical syntax that some developers might have a hard time understanding. Unfortunately, if we're writing our parsers in C#, we're kinda stuck with this syntax. However, F# has a special syntax that lets you write what looks like normal sequential code, while still flowing the Parser context between parse operations exactly like the code above does. We'll take a look at that syntax in the next post.

Posted By Harry Pierson at 8:47 PM Pacific Daylight Time

Wednesday, July 30, 2008

Monadic Philosophy Part 2 - The LINQ Monad

If you don't come from a math or philosophy background (and I don't) "monad" sounds like a made-up word. Of course, understanding OO's use of terms like "class" and "object" can be hard to grok at first too. But at least those terms have some grounding in real-world concepts that non-math geeks come across. Because I couldn't draw an analogy of monads to anything at first, it made grasping the concept of monads very hard for me.

It's such a unfamiliar word that the F# team doesn't like it either:

"[W]hen the designers of F# talked with the designers of Haskell about this, they agreed that the word monad is a bit obscure and sounds a little daunting and that using other names might be wise."
[F# Workflows and Haskell Monads, Expert F#, p232]

The F# team thought about calling them workflows, but settled on computation expression. Frankly, I don't like these names much better. Workflow is too easily confused with WF and if the term computation expression is way to generic. Isn't everything in programming a computation expression? I think I'll just stick with monad.

Of course, if there was a short, pithy way of describing a monad, I'm sure that's what we'd call them. It's a kinda complicated idea, so there's no simple two or three word phrase that accurately describes it. "Sequential computation with context flow" is the best I could come up with. It's a crappy description, but here's an elegant example that most .NET programmers are probably already familiar with.

var Orders = new List<Order>()

//code to populate orders omitted
var q = Orders 
    .Where(x => x.OrderDate < DateTime.Now) 
    .OrderBy(x => x.OrderDate) 
    .Select(x => new {ID = x.OrderID, Date = x.OrderDate})

Yes it's true: LINQ is a monad. The two basic concepts about a monad from my description above is that it's a) a sequence of operations and b) there's a context that flows from operation to operation. We see both here in this simple LINQ query. I realize I'm using what looks like a LINQ to SQL query here, but for the sake of argument let's assume that this is all happening in memory.

The query is a sequence of three operations: Where, OrderBy and Select. LINQ has a set of standard query operators that you can mix and match in whatever order you need to. Part of the monad's job is to enforce the sequence of actions. For C#, that's not really a big deal, since it has explicit sequencing already. However, other languages like Haskell use lazy evaluation, meaning there is no explicit order of execution. Many lazy evaluation languages use monads in areas, such as I/O, where order of execution matters.

While C# doesn't need any help to enforce execution order, monads are very useful in the way they flow context between the operations. In the case of LINQ, all the standard query operators take an IEnumerable<T> as their first parameter and return an IEnumerable<T>. Since they have the same inputs and outputs, they can be plugged together in whatever order is required. Yet, you don't see any reference to GetEnumerator or the enumerator objects they return in the LINQ code above. All that code is hidden inside the LINQ query operators so the LINQ developer doesn't have to look at it.

If you squint hard enough, IEnumerable kinda looks like a functional construct. It exposes a single method (GetEnumerator) and can be passed around much the same way functional languages like F# pass around first-order functions. Furthermore, the result of calling GetEnumerator is an IEnumerator object that likewise exposes one main function (MoveNext). In other words, you can think of IEnumerable sort of like a function that returns a function that you call to iterate the collection.

So to sum up, a monad is a sequence of operations in a specific order that automatically flows context from one operation to the next. In the LINQ example, C# has built-in constructs - IEnumerable<T>, foreach and yield return - that makes the monad seem less foreign (which is why I used it as my first example!) However, as we'll see, the concepts of sequence and context flow in a monad still hold even if we're not using built in features of C# to implement them.

Posted By Harry Pierson at 9:53 AM Pacific Daylight Time

Tuesday, July 29, 2008

Monadic Philosophy

(Since I accidentally published part one of this series a few minutes ago, I figured I might as well start publishing the series.)

If you start learning functional programming, eventually you'll come across the idea of a monad. Coming from the object/imperative world of languages like C#, I've had a hard time wrapping my head around this concept. There's no shortage of monad tutorials out there, but most use Haskell's IO as the prototypical example of a monad. Given that I don't know Haskell very well, I found it hard to separate the Haskell stuff from monad stuff. So I set monads on the back burner and decided not to worry about them.

However, all that changed when Stephan Tolksdorf alerted me to his very cool monadic parser combinator library FParsec. I found the FParsec parsers much easier to read my F# parser efforts, so I became very interested in monadic parser combinators. As you might guess, a "monadic parser combinator library" makes heavy use of monads. Time to switch burners.

The problem with learning monads with FParsec is that it's really designed for production use. I needed to break monads down to first principles, so I rolled my own monadic parser library. Make no mistake, if I were looking to build a production parser in F# right now, I'd use with FParsec. My monadic parser library might "get there" eventually, but right now it's a toy.

Over a series of posts, I'm going to describe what I know about monads. I didn't set out to write a tutorial on monads - as I said, there are plenty of them out there. However, I found most of the the many monad tutorials I read lacking because the did a good job explaining the "how", but not such a good job on the "why". Coming from an imperative world, I wanted to understand the philosophy better. That being said, there's a lot of tutorial in and around the philosophy. Hopefully, you'll find both useful.

Posted By Harry Pierson at 5:10 PM Pacific Daylight Time

Friday, July 25, 2008

Morning Coffee 171

  • Big news for IronRuby out of OSCON. John and Jim have the details. Congrats to the IronRuby folks on reaching these milestones and paving the way for others (i.e. IPy) to follow some of the same paths.
  • One of those OSCON announcements, is a project my teammate Jimmy Schementi has been working on: Silverline, which "let's you write Rails code that can run on the client".
  • Shri Borde - the dev manager for IPy, IRuby and F# - tackles a tricky subject of static compilation of dynamic Python code. This came up on the mailing list recently as one of the outstanding requests for IPy to do is support custom attributes, which requires static compilation. Shri lays out some of the big issues with this approach. However, the community has been fairly clear on this, so it's obviously something we need to look at.
  • I met someone from MS Research at the MS Product Fair who pointed me to the Institute for Personal Robots in Education, a joint effort between Georgia Tech and Bryn Mawr College and sponsored by Microsoft Research. Their Myro software (myro == my robot) is written in CPython, but there's an effort underway (aka Miro 3.0) to build a .NET version that uses IronPython. Must investigate.
  • Seshadri shows how easy it is to extend C# types in IronPython. It's also shows how simple it is to host DLR code in your app - it's like 6 lines of code!
  • Early reviews of IronPython in Action are good.  
  • If you want to run an IronPython IDE in your browser with Silverlight, check out SilverShell from Dan Eloff.
  • The XNA team has announced their business plans for community games. Basically, you set a price point between 200 and 800 points (aka between $2.50 and $10) and receive a "baseline" of 70% of the revenue the game generates. More details are available in the FAQ. This is pretty excited. I'd like to build some co-op kids games.
  • Speaking of XNA, Caligari is now offering TrueSpace 7.6 for free . David Weller and Glenn Wilson provide an XNA viewpoint on the announcement, Chris Pendleton shows how to upload your models to VirtualEarth.
  • Congrats to the CodePlex team on their latest drop, which features that a cool new feature - Mailing Lists! IronPython has had a Mailman mailing list for years, so I'm not sure we'll use this feature on IPy, but I'll investigate it
  • Two PDC notes: First, Rick Rashid - VP of MS Research - will be delivering a PDC keynote. Second, the PDC team has put up a video podcast on Producing a Ginormous Conference in 10 Minutes or Less! It's the "inaugural episode" so watch for more Countdown to PDC video podcast episodes in the future.
  • I recently discovered Chris Smith's F# blog. He's got recent posts on Mastering F# Lists and Guidelines for Readable F# code. For the F# novice, check out his F# in 20 Minutes posts (part one, part two)
  • Pat Helland is moving to the SQL team. Good luck Pat!
  • I like Nick Malik's formal definition of use cases, but I can't help be reminded of Charlie Alfred's Value-Driven Architecture article in Architecture Journal 5 where he said use cases were "easy to teach and explain" but that "if simplicity were the only goal that counted, we'd all still be walking or riding horses to get from one place to another."
Posted By Harry Pierson at 11:41 AM Pacific Daylight Time

Monday, July 21, 2008

Five Minutes Past Noon Coffee 170

  • Ben Hall announces IronEditor, a simple dev tool for IronPython and IronRuby. Pretty nice, though fairly simplistic (as Ben readily admits). For example, it doesn't have an interactive mode, only the ability to execute scripts and direct the output to IronEditor's output window. However, it is a good start and I'm sure it'll just get better. One thing he's apparently considering is a Silverlight version. (via Michael Foord)
  • Speaking of "Iron" tools, Sapphire Steel have had an IronRuby version (in alpha) of their Ruby in Steel product for several months now. I wonder if John's had a chance to play with it.
  • Speaking of John, the ASP.NET MVC / IronRuby prototype he talked about @ TechEd is now available on ASP.NET MVC Preview 4 via Phil Haack.
  • Ted Neward has an article exploring the IronPython VS Integration sample that ships in the VS SDK. As I mentioned the other day, we're starting working on a production quality implementation of VS Integration for IPy.
  • Ophir Kra-Oz (aka Evil Fish) blogs Python for Executives. I like his "Risk, Recruiting, Performance and Maturity" model - four boxes, perfect for keeping an executive's attention! :) Plus Ophir has some nice things to say about IronPython. (via Michael Foord)
  • Ronnie Maor blogs an extension method for PythonEngine to make Eval simpler. I especially like how he uses string format syntax so you can dynamically generate the code to eval. I wonder what this would look like in IPy 2.0 with DLR Hosting API. (via IronPython URLs)
  • Speaking of DLR Hosting, Seshadri has another great DLR hosting post, this time hosting IPy inside of VS08 so you can script VS08 events (document saved, window created, etc) with Python.
  • Justin Etheredge has a bunch of IronRuby posts - Getting IronRuby Up and Running, Running Applications in IronRuby, Learning Ruby via IronRuby and C# Part 1. (via Sam Gentile)
  • Don Syme links to several F# related posts by Ray Vernagus, though he's apparently also experimenting with IronRuby. I'm really interested in his Purely Functional Data Structures port to F#.
  • Speaking of F#, Brian has a teaser screenshot of F# upcoming CTP. However, he chooses the New Item dialog to tease, which looks pretty much like the current new item dialog (the new one does have fewer F# templates). However, if you look in the Solution Explorer, you'll notice a real "References" node. No more #I/#R! Yeah!
  • The interactive graphic in Kevin Kelly's One Machine article is fascinating. It really highlights that the vast vast vast majority of power, storage, CPU cycles and RAM come from personal computers on the edge. Even in bandwidth, where PC's still have the highest share but it looks to be around 1/3rd, the aggregate of all edge devices (PCs, mobile phones, PDAs, etc.) still dominates the data centers.
Posted By Harry Pierson at 12:05 PM Pacific Daylight Time

Monday, July 14, 2008

Morning Coffee 167

  • If you're a gamer, you're probably already well aware that E3 is this week. The Too Human demo has already been released. I have a friend who's been working on "something" that will be announced today (I think).
  • Live Mesh folks pushed out an update Friday. Among the new features is the ability to sync folders among peers but NOT up to the cloud. This is cool because it means I can sync my many many GB of pictures and music on my home machine backed up with Carbonite. This means I can sync them without blowing thru my 5GB Mesh storage limit.
  • It looks like there's a new F# drop - 1.9.4.19 - but as usual there is no announcement or details as to what's new. Release notes guys, look into it.  UPDATE - Don Syme blogged the release, and it's pretty minor. a .NET FX 3.5 SP1 bug fix, a fix for Mono, and they removed WebRequest.GetResponseAsync to make F# work on Silverlight. And the release notes are in the readme. My bad.
  • Speaking of F#, it was "partially inspired" by OCaml, so when I see papers related to OCaml, I immediately wonder if I an apply the described techniques to F#. "Catch me if you can, Towards type-safe, hierarchical, lightweight, polymorphic and efficient error management in OCaml" is one such paper. (via LtU)
  • Speaking of functional programming, Matthew Podwysocki posted a bunch of FP links as well as a Code Gallery Sample on FP in C#. Good stuff.
  • As per Scott Guthrie, it looks like there's a new ASP.NET MVC drop coming this week.
  • Based on posts by Ted Neward, Dare Obasanjo and Steve Vinoski, Google Protocol Buffers sounds like it's going to be a dud. Note, I haven't looked at it depth personally, I'm just passing on opinions of some folks I read and trust.
  • Speaking of Dare, both he and James Hamilton take a look at Cassandra and come away impressed. I wonder how easy it is to code against from Python and/or .NET?
  • Bart de Smet has a cool sample of calling out to PowerShell from IronRuby via the backtick command. Pretty cool, but it would even cooler to show how to call out to PS and return .NET objects to Ruby (though that would probably not be spec compliant for the backtick command).
  • Here's a MS code name I had never heard before - Zermatt. It's "a framework for implementing claims-based identity in your applications." (via Steve Gilham)
Posted By Harry Pierson at 9:30 AM Pacific Daylight Time

Monday, July 07, 2008

Morning Coffee 166

Yes, I realize it’s been a while. I tried in vain to catch up with my blog reading after my Hawaii vacation and finally just gave up and hit “mark all as read”.

Dynamic Languages

  • There's a new version of the DLR hosting spec available (doc, pdf). The DLR implementation is still in motion, so there are some inconsistencies between the spec and the code, but the spec should give you the high level overview you need if you want to host DLR languages inside your app.
  • Oleg Tkachenko recently joined the dynamic languages team. He's the creator of the Interactive IronRuby Web Shell, an IronRuby version of Try Ruby. Of course, it’s not as cool as using SL2to execute the code directly in the browser. Michael Foord has his Python in the Browser and my teammates John and Jimmy demoed a Silverlight version of Try Ruby @ TechEd.
  • Jim Deville, also of the dynamic languages team, recently started blogging.
  • I have a new boss, Dave Remy. He doesn't have a blog - yet - but you can follow him on Twitter as daveremy. When Twitter is actually working that is.
  • There's a new homepage/wiki for IronRuby though I’m not sure why there's a picture of Matz wearing a Python shirt on the home page.
  • My teammate Jimmy Schementi provides some "continued hope" for a better (heck, I'll take current) ASP.NET and ASP.NET MVC story for DLR languages.
  • Via Michael Foord, sounds like IronClad is making good progress. V0.4 can run the bz2 module "in its entirity" (maybe run a spellcheck on your site, guys?) and now apparently, it's now able to load numpy.core. Very exciting!

Other Stuff

  • Pat Helland, who has blogged even less than me for the past few months, has a post up about controller and doers in the IT department. After 18 months in MSIT, put me in the doer camp, please.
  • The F# team has pushed out a spec for v1.9.4 of the language. Don Syme says it's not official, but it's a huge improvement over the old informal spec
  • Speaking of F#, my friend Matthew Podwysocki recently published FsTest, a testing DSL for F#. I wrote about F# unit testing as part of my PEG parsing series, and I really like the direction Matthew has taken this project. You can pull it down from CodePlex.
  • When I did my PEG talk @ Lang.NET, Gilad Bracha mentioned I should check out oMeta. It looks really cool, though with the job change I haven’t had the time to play with it. Now I discover that Jeff Moser is working on a version for CLR called oMeta# that I’ve got to spend some time with. And in the comments to that post, I discovered pyMeta from Allen Short, though it apparently doesn’t work on IronPython (must investigate why).
  • James Kovacs introduces psake, a PowerShell based build automation tool which uses a rake-inspired internal DSL syntax similar to one I blogged last year. I'd love to see this take off, but given MSBuild's tool integration, I wonder if that's feasible.
  • I upgraded my home wireless network almost exactly a year ago. I've been happy with the range and coverage, but not so happy with the Buffalo Tech firmware. The built-in DHCP server is pretty flaky. So I upgraded to the open-source Tomato firmware. Upgrade was smooth, though I did need to reset my cable modem. But even that was smooth - Comcast has an automated service for that now,
Posted By Harry Pierson at 9:30 AM Pacific Daylight Time

Tuesday, July 01, 2008

Programming Languages @ PDC08

The PDC folks pushed out a bunch of new sessions yesterday, including a bunch from my part of DevDiv. You can see the full list of sessions that have been published (so far) at the PDC site.

An Introduction to F#
Learn about Microsoft's new language, F#, a typed functional programming language for the .NET Framework. F# combines functional programming with the runtime support, libraries, tools, and object model of .Net. Understand how F# asynchronous workflows help tame the complexity of parallel and asynchronous I/O programming and how to use F# in conjunction with tools such as Parallel Extensions for .NET.

Deep Dive: Dynamic Languages in .NET
The CLR has great support for dynamic languages like IronPython. Learn how the new Dynamic Language Runtime (DLR) adds a shared dynamic type system, a standard hosting model, and support for generating fast dynamic code. Hear how these features enable languages that use the DLR to share code with other dynamic and static languages like VB.NET and C#.

Future Directions for Visual Basic
Come learn about the new capabilities in the next version of the language, including: extensions to LINQ, syntax simplifications, and improvements to the IDE. We'll provide insight into the direction of the language, including dynamic binding, meta-programming, and scripting.

The Future of C#
In this talk Microsoft Technical fellow and C# Chief Architect Anders Hejlsberg outlines the future of C#. He will describe the many forces that influence and shape the future of programming languages and explain how they fit into C#.

Visual C++: 10 is the New 6
Get more done. The next version of Visual C++ is all about improving developer productivity for large-scale applications. Learn about the IntelliSense and browsing experiences, changes to the project and build system, project-less browsing, collaboration through remote symbol indexing, and custom visualization of symbolic information.

Posted By Harry Pierson at 10:37 AM Pacific Daylight Time
C# | C++ | Lanugages | DLR | F# | IronPython | IronRuby | Microsoft | PDC08 | Visual Basic

Thursday, April 24, 2008

Morning Coffee 164

  • Big news since my last Morning Coffee post was the announcement of Live Mesh. I've been running it for about a month, and I'm really digging it. Make sure you check out the team blog and watch the developer tour video (be on the lookout for IPy about half way thru the video)

ALT.NET

  • I had a great time @ the ALT.NET open space conference last weekend. I was somewhat distracted on Saturday as due to a family communication mixup, I had to bring my son Patrick with me. Jeffrey Palermo shot a cute video of him (3 minutes in) where he explains that he's at the conference "to be with my dad". Having a five year old is a little distracting, but everyone was amazingly cool with having him around. When he gets a little older I have no doubt he'll be attending conferences and leading open sessions.
  • I did a session on F#, but it felt kinda all over the place. I hadn't touched F# in a few months and it showed IMO. Matt Podwysocki was there to help keep the session from devolving into mass chaos. Thanks Matt.
  • My favorite session of the conference was Scott Hanselman's "Are We Innovating?" talk, which I think originated from a question I asked him: There are many examples of large OSS projects in other dev communities that get ported to .NET (NHibernate, NAnt, MonoRail, etc). Can you name one that's gone the other way? I can't.
  • I took Matt's advice and joined the local ALT.NET Seattle group.

DyLang Stuff

  • Martin Maly posts about how dynamic method dispatches are cached in three different layers by the DLR. You shouldn't care about this stuff if you're a DLR language user, but you will certainly care about it if you're a DLR language builder.
  • I'm really excited to see Phil Haack (whom I met F2F @ ALT.NET) is experimenting with IronRuby & ASP.NET MVC. True, I'd rather it was IPy, but his Routes.LoadFromRuby would work with Python with very little code change.
  • Note to self, take a deeper look at Twining, the IPy database DSL by David Seruyange.
  • Daily Michael Foord - Ironclad 0.2 Released. Ironclad is a project to implement Python's C extension API in C# so that IronPython could load standard Python C modules like SciPy and NumPy. So far, they're able to load the bz2 module

Other Stuff

  • Congrats to Brad and Jim for shipping xUnit.net 1.0.
  • Everyone seems to be jumping on the functional C# coding bandwagon. Bart De Smet's series on pattern matching in C# is currently at eight posts. Now Luca Bolognese is in on the action, with three posts so far on functional code in C#. I like how Luca keeps writing that the C# syntax is "not terrible" for functional programming. Again, why suffer thru the "not terrible" syntax when you could be using F# instead? (via Charlie Calvert)
  • I need to take a look at VLinq. Charlie and Scott Hanselman both mentioned it recently.
  • I would like to have been in the conversation with Ted Neward, Neal Ford, Venkat Subramaniam, Don Box and Amanda Silver.
  • I haven't had any time to play with XNA of late, which means the great list of GDC videos Dave Weller posted on the XNA team blog will remain beyond my ability to invest time for now.
  • There's a new drop of Spec# from MS Research. IronRuby is using Spec# heavily as I recall.
Posted By Harry Pierson at 10:53 AM Pacific Daylight Time

Monday, April 07, 2008

Morning Coffee 161

  • Huge perk of the new job: new hardware. I had to give up my Dell workstation but I got a Lenovo T61p dual core widescreen laptop, an HP dc7800 dual monitor quad core desktop and a Polycom CX700 IP phone. I'm really digging the Lenovo's integrated fingerprint reader - no more password login - but I'm most impressed with their integrated driver management software. Sure beats the heck out of hunting for dozens of updated drivers all over the place like most vendors for you to.
  • Minor downside to all my new toys: I spent most of my first week on the job installing and configuring said new toys.
  • Caps will face the Flyers in the first round of the playoffs which starts Friday. I have a feeling that I'll be feeling poorly Friday around 3pm and have to head home early. :)

DyLang Stuff

  • Apparently, Michael Foord isn't getting enough exposure on this blog. :) He left a comment to remind me to mention the IronPython URLs link blog he writes along with Mark Rees and Seo Sanghyeon.
  • Speaking of Michael, his employer Resolver Systems just launched a new product: Resolver One Quant.
  • Still speaking of Michael, he's quoted in the InternetNews article Python Fans Take Aim at the Enterprise.
  • My teammate Jimmy Schementi posts a preview of his spare time project "Silverlight on Rails". This RoR plugin lets you declaratively specify if you want your RoR controller code to be accessed remotely via AJAX and run on the server or if you want that code to be downloaded to the client and run in SilverLight. Very cool stuff.

Other Stuff

  • Don Syme provides some insight into the F# producization process. There's going to be an update to the "Research release" later this month and a CTP of the "Product release" later this summer (Brian McNamara has the CTP details). I am looking forward to these releases, though I'll probably be too busy w/ IPy to experiment much with them.
  • Speaking of F#, Matt Podwysocki continues his adventures with F# with a look at tuples, records and discriminated unions. Of the three, I find discriminated unions the most interesting since there isn't anything like it in other languages I've used.
  • Gregori and Chris both announce the release of Unity 1.0. Congrats guys! But if I don't have time to hack around with the latest F# release, you can imagine I won't be getting to Unity any time soon...
  • Jeff Atwood recommends you build your application UI first. Furthermore, he does a good job selling the value of paper prototyping as well as introducing the concept of PowerPoint prototyping. Money quote: "You don't want something too powerful."
  • Via LiveSide I discovered James Hamilton's blog. Normally, hardware infrastructure isn't really my bag, but I find his ideas around using ISO standard shipping containers as modular data center building blocks fascinating. For example, check out this post that suggests sticking modular data centers in condos would be cheaper than building data centers!Subscribed.
  • Speaking of ISO, you may have heard Open Office XML was ratified as an ISO standard. Obviously, there was a lot of controversy around this, but Miguel de Icaza lists of what he considers major community wins from the standardization process. Anything that "pushed Microsoft into more open directions" is a good thing IMO.
Posted By Harry Pierson at 9:39 AM Pacific Daylight Time

Monday, March 17, 2008

DLR Resources & Jobs

I'm back home from PyCon, but between digging out my inbox, finishing transition reports and doing my mid-year career discussion I'm a little busy. But I did want to point at a couple of recent posts from the IPy team blog:

Posted By Harry Pierson at 11:25 AM Pacific Standard Time

Monday, March 10, 2008

Morning Coffee 157

  • My Xbox 360 started flashing the dreaded Red Ring of Death on Friday. <sigh> I'm not going to have much time to play in the next week, so it's not the end of the universe, but I did have to dig an old DVD player out of the garage for interim duty.
  • My Caps really stepped in it over the weekend dropping two games they had to have and by most reports (aka according to my dad) that they dominated most of the way. Caps Playoff Math isn't as dire as say Clinton's Nomination Math, but they are three games back of the Hurricanes with twelve to play.
  • Ted Neward has a pretty good F# overview article in the most recent MSDN Magazine. I say pretty good because I wonder if someone with no functional programming experience will "get it". As much as I like F# and functional programming, I think some of the basic concepts don't pass Don Box's two beer test.
  • Speaking of Ted, somehow his feed fell off my radar (bad DevHawk!) and I missed several great posts like Modular Toolchains (note to Ted, check out A Research C# Compiler), Why we need both static and dynamic in the same language (note to self, check out Cobra) and The Fallacies Remain.... (recently, I'm the guy shouting about risks).
  • Speaking of MSDN Magazine, have you seen their new site redesign? I can't find any announcement of it, but man the site looks great.
  • If you missed MIX, the sessions are all online already. That was fast.
  • John Lam blogs about the availability of the Dynamic Silverlight bits. Apparently, Dynamic Silverlight includes more recent bits than the Silverlight 2 SDK, which does includes binaries and tools for IronPython, IronRuby and Managed JScript (quickstart). So you can get started with dynamic languages on Silverlight using the SL SDK alone, but I expect that the Dynamic Silverlight bits will be updated more regularly than the SDK.
Posted By Harry Pierson at 8:59 AM Pacific Standard Time

Tuesday, March 04, 2008

Play Ball Script in F#

Scott Hanselman wanted an F# version of this Play Ball! round-robin scheduling PowerShell script. Here's what I came up with:

let randomize list =
    let random = new System.Random()
    let list'= 
        list 
        |> List.map (fun i -> (random.Next(), i))
        |> List.sort (fun (i1,_) (i2,_) -> Int32.compare i1 i2) 
    let (_,list'') = List.unzip list'
    list''

let rec makeGames teams =
    match teams with
    | first :: rest ->
        [for team in rest -> (first, team)] @ (makeGames rest)
    | [] ->  []

let teams = ['a'..'f']
let games = teams |> makeGames |> randomize

MakeGames uses pattern matching to see if the list of teams is empty or not. If the list is empty, we simply return an empty list of games. If not, we use F#'s list comprehension syntax to generate a list of games between the first team in the list and each of the remaining teams. This list is combined (via the '@' operator) with the results of calling makeGames recursively. This generates the un-randomized list of games.

Once we have the list of games, we need to randomize it. I ported the randomize function above over from a version I found in Erlang. Basically, it attaches a random number to each element in the list to be randomized, sorts by those randomly generated numbers, then throws the numbers away and returns the just the randomized list. Note, the Erlang version I referenced runs randomize log(length of list) times to ensure a fair shuffle, but I thought once would be enough for this simple script.

(Note to F# team: perhaps List.randomize should be a part of the standard F# library?)

Posted By Harry Pierson at 12:58 PM Pacific Standard Time

Friday, February 22, 2008

Morning Coffee 149

Posted By Harry Pierson at 10:34 AM Pacific Standard Time

Wednesday, February 13, 2008

Morning Coffee 146

  • The writers strike is officially over. Everyone goes back to work today. Thomas Cleaver has what I thought was the best post summarizing how the writers won. TV Guide has a rundown of how and when various shows will resume. I can't wait to see Daily Show and Colbert Report tonight. Lost - aka the best show on TV - looks like it will be getting five more episodes (in addition to the eight shot before the strike).
  • Speaking of TV, Battlestar Galactica Fans: circle April 4th on your calendar.
  • Obama won all three "Potomac Primaries" yesterday, and is now the Democratic front-runner, though there's a long way to go before the convention. Scott Adams of Dilbert fame has a great take on presidential experience - I'm guessing he's an Obama fan.
  • In minor acquisition news, Microsoft is acquiring Caligari, makers of 3D modeling tool trueSpace. The Caligari folks are joining the Virtual Earth team, though I wonder what the XNA folks think of the acquisition. This isn't the first 3D modeling product Microsoft ever acquired - we owned Softimage for four years in the '90s.
  • Scott Hanselman and Tomas Resprepo both write about PowerShellPlus, which I saw week before last @ Lang.NET. Scott really likes it, for both PS novices and gurus, but Tomas thinks the UI is busy, based on the screenshots. Personally, I'm not doing much PS work lately - occasional one off stuff, but that's it - so it doesn't seem worth the effort.
  • Speaking of Scott & Tomas, Scott also has a nice gallery of VS themes. I'm partial to Tomas' Ragnarok Grey. Is there a VSThemesGallery.com site somewhere?
  • Still speaking of Scott, he points to the new ASP.NET Developer Wiki (beta). I poked around, but didn't find anything shiny. I was very surprised that searching for "MVC" returned no results.
  • Speaking of MVC, Scott Guthrie has a rundown on what's coming in the MIX preview release of ASP.NET MVC. Biggest news IMO is that it's /bin deployable - i.e. you don't need your hoster to do anything special to support MVC (assuming they already support ASP.NET 3.5). Also big news, they're releasing the source so you can build and patch (and enhance?) it yourself.
  • Chris Taveres continues is ObjectBuilder series and Tomas continues is DLR Notes series. BTW, my F# based DLR experimentation continues, albeit slowly (frakking day job). Hope to be able to post on this soon.
  • One of the things driving my interest in F# is manycore. An interesting tangent to manycore is general purpose programming on graphics processing units (aka GPGPU). MS Research just released a new version of Accelerator, just such a GPGPU system. I personally haven't played with it - I've been focused on writing parsers, not parallel code.
  • Is XQuery really "a promising technology of the future" as Don Box suggests? I see exactly zero demand or use for it in my day-to-day work. Of course, Don's paid to build future platform goo, so maybe it is promising and Don's afore-mentioned goo will leverage it, though I remain skeptical. As for XML being "Done like a well-cooked steak", I'd say XML is like a great steak cooked perfectly, except it's done exactly how you don't like it. You can appreciate its quality, but you don't really enjoy it as much as you could have.
Posted By Harry Pierson at 10:04 AM Pacific Standard Time

Wednesday, February 06, 2008

Morning Coffee 143

  • I've been sick for three days, hence the lack of posting around here.
  • As a Redskins fan, it's hard to root for any other NFC East team. On the other hand, it sure was easy to root against the Patriots. Congrats to the Giants on their Super Bowl victory. Favorite headline: 18 and uh-oh!
  • Sounds like there's cause for optimism regarding the writer's strike. But is it already too late? Will the 9% drop in viewers ever come back? Personally, I think the studios have hastened their own irrelevance.
  • With last night's win, the Caps are one game above .500. In and of itself, that's nothing to be proud of - Coach Boudreau remarked when we reached .500 that the Caps had "officially reached mediocrity". However, the Caps are the only team in the SE conference that's above .500. If hockey used baseball standings, Carolina, Atlanta and Florida would each be 1/2 game back of the Caps. It's going to be a fight to the finish.
  • In fairly big managed Ruby news, Wayne Kelly has decided to contribute to the IronRuby effort, effectively walking away from the Ruby.NET which helped get off the ground. One the one hand, obviously this is great for IronRuby. On the other hand, I liked the idea of multiple managed implementations of Ruby, so here's hoping Ruby.NET doesn't fade away.
  • Speaking of the DLR, I know I mentioned Martin Maly's blog in my Lang.NET Morning Coffee Post, but I didn't actually get to read his posts on targeting the DLR until I unexpectedly had several days off sick. If you are at all interested in writing your own language for the .NET platform: Go. Read. Now. You should also check out Tomas Restrepo's blog, he has also started writing about targeting the DLR.
  • Larry O'Brien's blog is currently offline, but he commented that he doubted my ToyScript F# parser would be more than 600 lines of code. Currently, the parser is clocking in at 287 lines of code plus about 50 more for the AST. It's not done yet - see earlier statement about being sick - but I'm fixing bugs not writing additional code at this point. To be completely accurate, that's 287 lines of FParsec code. It's taken me a little bit to learn FParsec, but so far I'm pretty happy with it.
  • Scott Hanselman points to the new MS Deploy project, a tool for managing content and configuration on web servers. I've never understood why this wasn't a standard part of IIS. It seems every hosting company I've used has rolled their own web-based management tool like DotNetPanel.
  • Oh yeah, Vista SP1 and Windows Server 2008 shipped Monday. Congrats!
  • I fired up Inside Xbox the other day, and there was a page about the new Disney Channel show "Phineas and Ferb". Of course, with two kids under five, anything new on the Disney Channel is notable in my house. What made this blog-worthy is the fact that it's directed and written by Dan Povenmire, who I knew from my USC days. I used to go see his band Keep Left and groan loudly at the bad puns in their song "PSA". Dan, if you found this searching for yourself online: Awesome work, my kids love the show!
Posted By Harry Pierson at 11:41 AM Pacific Standard Time

Thursday, January 31, 2008

Morning Coffee 141 - Lang.NET '08 Edition

header I was hoping to blog my thoughts on Lang.NET as the event went along. Obviously, that didn't happen, though I was pretty good about dumping links into my del.icio.us feed. The talks were all recorded, and should be up on the website in a week or two. Rather than provide a detailed summary of everything that happened, here are my highlights:

  • The coolest thing about conferences like this is what John Rose called "N3" aka "Nerd-to-Nerd Networking". It was great to meet in person, drink with and geek out with folks who's blogs I read like Tomas Petricek, Wesner Moise and Larry O'Brien. Plus, I got to meet a bunch of other cool folks like Gilad Bracha, Stefan Wenig and Wez Furlong. That's worth the price of admission (which was admittedly nothing) right there.
  • Coolest MSFT talk: Martin Maly "Targeting DLR". I was wholly unaware that the DLR includes an entire compiler back end. Martin summarized the idea of DLR trees on his blog, but the short version is "you parse the language, DLR generates the code". That's pretty cool, and should dramatically lower the bar for language development. Of course, I want to write my parser in F#, so I'm going to port the DLR ToyScript sample to F#.
  • Runner-up, Coolest MSFT talk: Erik Meijer "Democratizing the Cloud with Volta". Erik is a great speaker and he really set the tone of his session with the comment "Division by zero is the goal, not an error". He was referring to an idea from The Change Function that user's measure of success is a function of perceived crisis divided by perceived pain of adoption. Erik wants to drive that adoption pain to zero. It's a laudable goal, but I remain unconvinced on Volta.
  • Coolest Non-MSFT talk: Gilad Bracha "Newspeak". Newspeak is a new language from one of the co-authors of Java. It's heavily smalltalk influenced, and runs on Squeak. He showed developing PEGs in Newspeak, and they were very compact and easy to read, easier even than F#. He calls them Executable grammar, and you can read his research paper or review his slides on the topic. Unfortunately, Newspeak isn't generally available at this time.
  • Runner-up, Coolest Non-MSFT talk: Miguel de Icaza "Moonlight and Mono". The talk was kinda all-over-the-place, but It's great to see how far Mono has come. Second Life just started beta testing a Mono-based script runner for their LSL language (apparently, Mono breaks many LSL scripts because it runs them so fast). He also showed off Unity, a 3D game development tool, also running on Mono.
  • Resolver One is a product that bridges the gap between spreadsheets and applications, entirely written in IronPython (around 30,000 lines of app code and 110,000 lines of test code, all in IPy). Creating a spread-sheet based app development environment is one of those ideas that seems obvious in retrospect, at least to me. If you do any kind of complicated spreadsheet based analysis, you should check out their product.
  • If you're a PowerShell user, you should check out PowerShell+. It's a free console environment designed for PowerShell and a damn sight better than CMD.exe. If you're not a PowerShell user, what the heck is wrong with you?
  • Other projects to take a deeper look at: C# Mixins and Cobra Language.
  • I thought my talk went pretty well. It's was a 15 minute version of my Practical Parsing in F# series. Several folks were surprised I've been coding F# for less than a year.
Posted By Harry Pierson at 10:25 AM Pacific Standard Time

Tuesday, January 29, 2008

Practical F# Parsing: Recursion and Predicate Functions

To prep for my Lang.NET talk, I went back and reviewed my PEG parser. One thing I was not happy with was that all the recursion was handled in a one-off manner. When I needed to match multiple characters in the comment rule, I wrote a special one-off function to recursively process the comment until it reached an EOL. When I needed to parse a series of ranges, characters or definitions, I wrote special one-off functions to handle that recursion. Obviously, that's not the best approach. So, I wrote the following active pattern functions to handle recursion.

//ZOM == Zero Or More   
let rec (|ZOM|) f input = 
    match f input with 
    | Some(i,input) -> 
        let j,input = (|ZOM|) f input
        (i :: j, input)
    | None -> [], input

//OOM == One Or More       
let (|OOM|_|) f input = 
    match (|ZOM|) f input with
    | [], input -> None
    | v, input -> Some(v,input)
    
//ZOO == Zero Or One
let (|ZOO|) f input = 
    match f input with 
    | Some(i,input) -> Some(i), input
    | None -> None,input

With these functions at the ready, I can stop writing one-off recursion functions. Instead, I write a function that matches a single item, which I pass as an argument to one of the three functions above. For example, here is the original and new version of the top level Grammar function.

//Original version
let (|Grammar|_|) input =
    let rec ParseDefinitions dl input =
        match input with
        | Definition (d, input) -> 
            ParseDefinitions (dl @ [d]) input
        | _ -> Some(dl, input)
    let (|OneOrMoreDefintions|_|) input = 
        match input with
        | Definition (d, input) -> ParseDefinitions [d] input
        | _ -> None
    match input with
    | Spacing (OneOrMoreDefintions (dl, EndOfFile)) -> 
        Some(List.to_array dl)
    | _ -> None
    
//New Version   
let (|Grammar|_|) = function
    | Spacing (OOM (|Definition|_|) (dl, EndOfFile)) ->
        Some(List.to_array dl)
    | _ -> None

The new version is much shorter, because there's already a function to match a single definition, which we can pass into OneOrMore (aka OOM). Note, when I pass an active pattern function as a parameter, I have to use it's real name (with the pipes and parameters). Having to use the real name is pretty ugly, but F# need to be able to differentiate between using a function as an active pattern vs using it as a function parameter. If you could just call "OOM Definition (dl, EndOfFile)", would F# realize Definition is a parameter?

I also defined syntactic predicate functions. If you'll recall, these syntactic predicates will try to match but automatically backtrack, returning success or failure depending on which function you called.

//FP == Failure Predicate
let (|FP|_|) f input =
    match f input with
    | Some(_) -> None
    | None -> Some(input)

//SP == Success Predicate
let (|SP|_|) f input =
    match f input with
    | Some(_) -> Some(input)
    | None -> None

To see this in action, here's the original and updated Primary function. Only the first rule is relevant, so I've omitted the others.

//Original version
let (|Primary|_|) input = 
    let (|NotLEFTARROW|_|) input = 
        match input with 
        | LEFTARROW (_) -> None 
        | _ -> Some(input) 
    match input with 
    | Identifier (id, NotLEFTARROW (input)) ->  
        Some(Primary.Identifier(id), input) 
    //rest of function omitted for clarity

//new version
let (|Primary|_|) = function 
    | Identifier (id, FP (|LEFTARROW|_|) (input)) -> 
        Some(Primary.Identifier(id), input) 
    //rest of function omitted for clarity

Instead of writing a special function to match "not left arrow", I just pass the left arrow function as a parameter to Failure Predicate (aka FP). With these recursion and syntactic predicate functions, I was able to remove all the one-off recursion functions from my parser. (Note, I posted an updated version of PegParser on my SkyDrive so you can see this in action.)

These five functions significantly reduced the complexity of the code. Unfortunately, I'm not sure it's much easier to read. The conciseness is offset IMO by the ugliness of using the active pattern's true names. Also, I would have liked to use custom operators for these five functions, but operators aren't allowed to be active pattern functions. Hopefully, that will change at some point in the future, though if we're going to dream of better syntax, can we do something about all the parens? Personally, I'd love to be able to write the following:

//This doesn't work, but I can dream, can't I?
let (|Primary|_|) = function  
    | Identifier (id) !!LEFTARROW (input) -> 
        Some(Primary.Identifier(id), input)  
    //rest of function omitted for clarity
   
let (|Grammar|_|) = function 
    | Spacing ++Definition (dl) EndOfFile -> 
        Some(List.to_array dl) 
    | _ -> None

Note to self, talk to F# team members who come to LangNET about this...

Posted By Harry Pierson at 9:40 AM Pacific Standard Time

Monday, January 28, 2008

Morning Coffee 140

  • I only posted one Morning Coffee post last week. It wasn't a lack of content, it was a lack of drive on my part. I had 20-30 items flagged in my news reader, but for some reason I couldn't work up the interest in posting them. So some of these are a bit old.
  • I'm at the Language.NET Symposium this week, so look for lots of language blogging. I've already chatted with Tomáš Petříček and John Lam. If someone kicks Ted Neward's ass because he hates Perl, I'll try and liveblog it.
  • Speaking of Ted Neward, he discusses the question "Can Dynamic Languages Scale?" without devolving into a flame-fest. I agree 100% with his point about the difference between performance scaling and complexity scaling. Personally, I tend to err on the side of better complexity scaling, since buying hardware is easier than hiring developers.
  • Nick Malik responds to me calling his shared global integration vision flawed. He points to NGOSS/eTOM as an example of a shared iterative model that works. I know squat about that shared model, so I'll refrain from commenting until I do a little homework on the telco industry.
  • Speaking of shared interop models, Microsoft is joining DataPortability.org. Dare Obasanjo and Marc Canter are skeptical that so far this effort is all hype and no substance. Reminds me a bit of AttentionTrust.org. But if DataPortability.org can get off the ground, maybe there's hope for Nick's vision (or vis-versa).
  • Don Syme lists what's new in the latest F# release. As I said, this release is pretty light on features. Hopefully, I'll get some details
  • Tomas Restrepo shows how to change your home folder in PowerShell. I need to do this.
Posted By Harry Pierson at 9:10 AM Pacific Standard Time

Monday, January 21, 2008

Morning Coffee 139

  • Big news on the WGA strike front: the AMPTP reached a deal with the Directors Guild last weeks. Initial reaction from United Hollywood is mixed, but I'm hopeful this will at least get the AMPTP / WGA talks started again.
  • Speaking of new media, Xbox 360 Fanboy has a rundown of 45 short films from Sundance that are getting released on Xbox Live Marketplace. That's pretty a-typical content for XBLM. Typically, new content on XBLM has been from "Hollywood Heavyweights". I'm pretty excited to see them branch out content wise.
  • Speaking of Xbox 360, seems they had a good year. Congrats!
  • Still speaking of Xbox 360, everyone gets a free copy of Undertow this week.
  • Scott Guthrie announces the availability of the .NET Framework Source Code. Shawn Burke has instructions for how to use it with VS08. So far, they've made the core base class libraries, ASP.NET, Windows Forms, WPF, ADO.NET and XML available. LINQ, WCF and WF are expected to become available "in the weeks and months ahead".
  • Ted Neward wonders if Java is "Done" like the Patriots, or "Done" like the Dolphins? If you want my opinion (I'm guessing yes, since you're reading my blog), definitely done like the Dolphins. OpenJDK was a desperation move to make Java "cool" again, but it won't work. People who want an open source stack are using LAMP and language wonks who saw Java as mainstream SmallTalk have moved on to Ruby. The question will be if Sun buying MySQL will make Sun cool or MySQL uncool by association. I'm guessing the latter.
  • Speaking of Ted, he's got a great post about the relevance of game programming to the mainstream or enterprise developer.
  • Speaking of game development, David Weller points to all the new XNA GS 2.0 content up on Creators Club Online.
  • There's a new version (1.9.3.14) of F# out, but no announcement from Don regarding what's new. I reviewed the release notes, seems like this is primarily a bug-fix release with only very minor feature additions.
  • Speaking of F#, Don points to Greg Neverov's implementation of Software Transactional Memory in F#. This immediately reminded me of Tim Sweeney's Next Mainstream Programming Language talk. Tim suggested said language would need to support a combination of side-effect free functional code and software transactional memory. F# is looking to be closer to that language all the time.
  • Still speaking of F#, Don Syme's Expert F# book is out. I read the draft version - it rocks - but I'm still going to get my own real copy. You should too.
  • With their win Saturday, the Caps are back to .500 for the first time since late October. Since Thanksgiving, the Caps are 15-7-4. Only four teams in the league have a better record over that time span. We play one of them tonight - the Penguins - and it's on Versus, so I'll even get to see it. In HD no less.
Posted By Harry Pierson at 9:34 AM Pacific Standard Time

Wednesday, January 16, 2008

Morning Coffee 138

  • In writers strike news, the WGA has made side deals with Worldwide Pants (aka Dave Letterman's company), United Artists (aka Tom Cruise's company) and The Weinstein Company (previously known as Miramax). The WGA strategy of divide and conquer seems to me making slow progress. Update: The Weinstein Company was founded by Miramax's founders Harvey and Bob Weinstein after they left Miramax. But Miramax is still around. Thanks to GrantC for the correction.
  • They're still two games under .500, but the Caps completed a season sweep of the Eastern Conference leading Ottawa Senators last night. They're only 3 games out of the top spot in the (admittedly very weak) Southeast division
  • Big tech news today isn't coming from MSFT-land. Sun is buying MySQL and Oracle is (finally) buying BEA. Both deals seem like pretty significant culture clashes, though Sun/MySQL seems like the better fit of the two.
  • There's a new draft of Service Modeling Language 1.1 available. If you'll recall, this used to be called the System Definition Model, part of the Dynamic Systems Initiative. Hadn't heard anything from those folks in a while, good to see they're making progress.
  • Stephan Tolksdorf dropped me a line to tell me he was able to "vastly simplify" FParsec, and as a result it now runs on the current version of F#. Awesome!
  • Speaking of F#, Scott Hanselman has a new F# podcast, this time interviewing Dustin Campbell. Check out all of Dustin's F# posts.
  • I didn't know about the "Copy as Path" feature in Vista. Why is it hidden?
  • I was a big fan of the WDS deskbar shortcut feature - a feature that is missing in Vista. Enter Start++ by Brandon Paddock, which adds shortcuts to Vista's search box. It also supports "iPhone apps" and scripting. But JScript? Where's the PowerShell love, Brandon?
  • EA released the source code to the original SimCity under the GPL. Bil Simser is digging into the code and it looks like he's going to port it to XNA. (via Ozymandias)
  • Wes Haggard has published the source code to CodeHTMLer on CodePlex. He took two updates from me: the F# language definition as well as the ability to choose the font when not using PRE tags.
Posted By Harry Pierson at 11:14 AM Pacific Standard Time

Monday, January 07, 2008

Morning Coffee 135

  • Bill Gates does his last CES Keynote, and we announce a PC that looks like a purse?
  • News that Warner Brothers is going exclusively Blu-Ray is disappointing. However, I'm convinced that neither side will win this format war but that online downloads will trump both. Obviously, XBLM is a significant player in this space, but the market is crowding up quickly. Netflix apparently will unveil a new set-top box @ CES to let you watch HD movies via the Internet.
  • Don Syme has a roundup of posts by John Liao about F#. Mostly, WPF + F# with a couple of ASP.NET 2.0 posts and one on XML .
  • Speaking of F#, Stephan Tolksdorf has been working on an F# port of MS Research's Parsec library called FParsec. Parsec is a "monadic parser combinator library", something I have little experience with, so I've gone back to some source research on the topic, which I hope to blog at length about soon.
  • Steve Vinoski talks about serendipitous reuse in his latest Internet Computing article. I'm not a believer in reuse in the enterprise, serendipitous or otherwise, but I liked the conclusion to Steve's article when he wrote "It's highly ironic that many enterprise architects seek to impose centralized control over their distributed organizations. In many cases, such centralization is a sure recipe for failure." Also, his point that "control without controlling" works sounds vaguely familiar.
  • Update - This is really Morning Coffee 136, but I don't want to change the title since it's part of the URL
Posted By Harry Pierson at 10:29 AM Pacific Standard Time

Thursday, December 20, 2007

Morning Eggnog 132

  • My parents are coming into town tomorrow so I'm off for the remaining week or so of the year. Blogging will likely be non-existent, unless I blog something I come up with while geeking out with my dad.
  • Juergen van Gael demonstrates how to use TPL from F#. He wrote this once before using F#'s async workflows feature. I like the TPL version, though the "new Action<int>(RowTask)" is a little wordy. I'm guessing the eventual F# syntax will probably become something compact like "action RowTask". (via Don Syme)
  • Andrew Peter ported RoR's Haml view engine to ASP.NET MVC, calling the result NHaml. I haven't played around with the new MVC stuff much, but I'm guessing ASP.NET's control-based approach doesn't work well when you separate out the controller code. If I'm manually authoring view templates, I'd much rather type NHaml's syntax than the standard ASP.NET <% ...%> syntax. On the other hand, there aren't any design tools out there today handle the NHaml syntax. Also, I wonder if Andrew is working on a Sass port. (via DNK)
Posted By Harry Pierson at 12:18 PM Pacific Standard Time

F# PEG Parser Next Steps

There are still a couple of posts to go in my Practical Parsing in F# series. But with Christmas and my parents on their way, I'm taking the rest of the year off.

I've stuck the code as it currently stands up on my SkyDrive. Conveniently enough, xUnit.net released their RC1 build yesterday, which includes supports for static test methods. I've included the RC1 build in the zip file on SkyDrive, as well as simple batch file so you can run the tests yourself.

Taking a break from this project will give me a good opportunity to figure out where to take it next. As the code stands, it's not very useful - it simply builds a PEG AST from a PEG grammar. That's just the first phase of a typical compiler. Without those other phases (you know, like "generate binary code") this is just an interesting sample.

Since I'm in the "future pondering" phase, now's the time to make your opinion known. What do you, dear reader, think I should do with this code? Bonus points for wanting to get involved.

Posted By Harry Pierson at 10:03 AM Pacific Standard Time

Practical F# Parsing: Semantic Productions (2)

Now that I've explained the AST, there are several more semantic productions to go. I'm not going to describe them all in detail, just hit a few important highlights.

Many of the semantic productions return lists of other productions. Class returns a list of Ranges, Literal and Identifier returns lists of characters, etc. As you would expect, these multiples are encoded in the grammar. For example, here's the implementation of Literal:

///Literal <- ['] (!['] Char)* ['] Spacing
///         / ["] (!["] Char)* ["] Spacing
let (|Literal|_|) input =

    let rec (|LitChars|_|) delimiter chars input =
        match input with
        | TOKEN delimiter (_) -> Some(L2S chars, input)
        | Char (c, input) -> 
            (|LitChars|_|) delimiter (chars @ [c]) input
        | _ -> None
    
    match input with
    | TOKEN "'"  (LitChars "'"  [] (lit, TOKEN "'"  (Spacing(input)))) -> 
        Some(lit, input)
    | TOKEN "\"" (LitChars "\"" [] (lit, TOKEN "\"" (Spacing(input)))) -> 
        Some(lit, input)
    | _ -> None

I'm using a local recursive function LitChars to retrieve the characters between the quote delimiters. The quote parameter - i.e. single or double quote - is passed in as a parameter. I also pass in an empty list of chars as a parameter. Remember that functional programs keep their data on the stack, a list parameter is a common way to keep state in a recursive function. When I match a single non-delimiter character, I add it to the list with the chars @ [c] expression. [c] converts a single value c into a list of one element while the @ operator concatenates to lists. I'm not sure adding the value to he end like that is a good idea perf wise. Programming Erlang recommends only adding items to the head then reversing the list when you're done matching. But F# isn't Erlang, so I'm not sure what the guidance is here.

Another thing you find in productions is the backtracking syntactic predicates. We saw an example of them in the implementation of Comment. Often, their used to indicate the end of a list of other productions, such as Literal, above. However, sometimes, they're used to ensure the correct production is matched. For example, a Primary can be an Identifier, as long as it's not followed by a left arrow. An identifier followed by a left arrow indicates a Definition.

///Primary <- Identifier !LEFTARROW
///         / OPEN Expression CLOSE
///         / Literal / Class / DOT
let rec (|Primary|_|) input =

    let (|NotLEFTARROW|_|) input =
        match input with
        | LEFTARROW (_) -> None
        | _ -> Some(input)

    match input with
    | Identifier (id, NotLEFTARROW (input)) -> 
        Some(Primary.Identifier(id), input)
    | OPEN ( Expression (exp, CLOSE (input))) -> 
        Some(Primary.Expression(exp), input)
    | Literal (lit, input) -> Some(Primary.Literal(lit), input)
    | Class (cls, input) -> Some(Primary.Class(cls), input)
    | DOT (input) -> Some(Primary.Dot, input)
    | _ -> None

Here, I need a way to match the absence of LEFTARROW, so I've build a simple local function called NotLEFTARROW. This isn't very clean IMO - I'd rather have a used a custom operator like !!! and &&& for my backtracking predicates. But I haven't figured out how to use custom operators as Active Patterns. I was able to write a standard non-operator AP function, but then I have to use the full AP function name. Here's a version of Primary written that way:

///Backtracking failure predicate
let (|NotPred|_|) f input =
    match f input with
    | Some (_) -> None
    | _ -> Some(input) 
    
let rec (|Primary|_|) input =
    match input with
    | Identifier (id, NotPred (|LEFTARROW|_|) (input)) -> 
        Some(Primary.Identifier(id), input) 
    //Other matches omited

Frankly, I don't think that's very readable, so I didn't implement it that way. If I can figure out how to use custom operators and pass around AP functions without using their full ugly name, I'll change it.

Finally, there are a few things about F#'s scoping rules that you need to understand. F# uses linear scoping, which is to say there's no way to use a type or function that hasn't been declared, sort of like C/C++. The difference is that while C/C++ have a way to declare a type or function separately from its implementation, F# has no such capacity. This becomes an issue when you have circular references. For example, Primary can be an Expression, which is a list of SequenceItems, each of which is a Primary with an optional prefix and suffix. In order to declare those in F#, you have to use a special "and" syntax to link the types/functions together.

//ToString and Exp2Str omitted for clarity
type Primary =
| Identifier of string
| Expression of Expression
| Literal of string
| Class of Range list
| Dot 

//ToString omitted for clarity
and SequenceItem =
    { 
        primaryItem: Primary;
        itemPrefix: Prefix option;     
        itemSuffix: Suffix option;
    }

and Sequence = SequenceItem list

and Expression = Sequence list

Likewise, the AP functions to recognize Primary, SequenceItem, Sequence and Expression are anded together. For me, this is one of the hardest things to get used to about F#. But as you can see from the expressiveness of the code, it's well worth the trouble

Posted By Harry Pierson at 10:00 AM Pacific Standard Time

Wednesday, December 19, 2007

Practical F# Parsing: The Abstract Syntax Tree

In the last post, I showed two semantic productions, Char and Range. Char returns an option tuple of a native char and the parse buffer. Range returns a tuple of either a single character or a character range and the parse buffer. Certainly, I could have written Range to always return a char * char tuple, passing in the same character for both in the case of a single character range. However, this provides an opportunity to introduce F#'s discriminated unions (or simply union for short).

The F# Manual describes a discriminated union as "a new type composed of a fixed number of distinct alternatives". Many of the semantic productions return "a fixed number of distinct alternatives" so I find a union is a good way to model the return value of semantic production functions. Here's the definition of Range:

///AST Type for Range production
type Range =
| Single of char
| Dual of char * char
    with
    override this.ToString()
        match this with
        | Single x -> sprintf "Range.Single (%A)" x
        | Dual (x,y) -> sprintf "Range.Dual (%A,%A)" x y

So Range is either a single character, or a tuple of two characters. As you saw in the last post, you create an instance of a union with the type.alternative syntax. You can also use simply the alternative name, assuming F# can determine the correct union type. Personally, I like using the full name - it helps me remember what the type really is.

Notice that the AP function and the union type appear to have the same name. Actually, they don't since the name of the AP function's name includes the bananas - i.e. (|Range|_). However, if you want you can define a function called simply Range and still have a type named Range as well - as long as you're not interested in language interop. F# can tell the difference between the Range function and the Range union, but C# can't. So I'd say we're best off avoiding overloading the names entirely.

If you look at the compiled union in Reflector, you'll see the Range type, with public internal classes named _Single and _Dual that inherit from Range. In other words, F# implements union types as an inheritance tree.  Range also provides static constructors for the various disparate types in the union.

One last thing I want to point out about the Range type is how I overrode ToString. This is primarily for unit testing - if you don't override ToString, you only get the type name which isn't very useful when trying to figure out why a given unit test failed. I'm using the F# native sprintf function rather than string.Format, so the format string is a little different.

The other major F# type we'll use in the AST are record types. These are similar conceptually to structs in C#. Basically, they're a tuple with names. For example, here's the Definition record type (though we haven't seen any functions that use this type yet).

///AST Type for Definition production
type Definition =  
    { 
        name: string
        exp: Expression; 
    } 
    with 
    override this.ToString() =  
        sprintf "Definition (name: %A, exp: %A)" this.name (Primary.Exp2Str this.exp)

I could have simply defined this type as (string * Expression), but having the fields named makes it crystal clear what the semantic meaning of each field is. The only place where I used an anonymous tuple in the AST instead of a record is in the Range union above - I figured that was simple enough not to warrant named fields.

I also have a couple of type aliases. For example, I have a record type called SequenceItem. An array of SequenceItems is a Sequence and an array of Sequences is an Expression (which we saw in the Definition type above).

///AST Type for Sequence Item production
let SequenceItem =
    { 
        primaryItem: Primary;
        itemPrefix: Prefix option;     
        itemSuffix: Suffix option;
    }

///AST Type for Sequence production
let Sequence = SequenceItem list

///AST Type for Expression production
let Expression = Sequence list

Note, unlike unions and records, type aliases can't override base class methods like ToString. This is because there is no actual Sequence or Expression types in the compiled code - F# compiles away type aliases completely. Looking at the implementaiton of Definition in reflector confirms that the exp member is of type List<List<SequenceItem>>. Since I need to convert Expressions to strings in two different places, I wrote a static Exp2Str method on the Primary type (not shown). It feels a bit hacky to stick Expression's ToString implementation on the Primary type, but I had little choice given F#'s scoping rules.

Technically, since they get compiled away anyway, I could have skipped the Sequence and Expression declarations and simply defined the exp field of Definition as "SequenceItem list list". But the "list list" syntax throws me a bit. I mean, I understand it, but I found using the terms Sequence and Expression far more readable. Also, I used the definition of Expression in the definition of Primary, so it makes sense for it to have it's own name.

Posted By Harry Pierson at 11:00 AM Pacific Standard Time

Tuesday, December 18, 2007

Practical F# Parsing: Semantic Productions (1)

All the syntactic productions in my PEG parser, save one, have the exact same signature. They take in a char list and return a char list option. Which is to say, they take a parse buffer in and return either the remaining parse buffer on a successful match or nothing on a failed match. The only exception is EndOfFile which doesn't return the remaining parse buffer because there isn't any buffer left to parse.

Now we're moving on to look at the productions with semantic implications. In Parsing Expression Grammars, there are eleven: Char, Range, Class, Literal, Identifier, Primary, Sequence Item, Sequence, Expression, Definition and Grammar. Like their syntactic brethren, these semantic productions will all have a single char list input parameter. However, they will all return some semantic value along with the remaining parse buffer.

We'll start with Char, since it's the only semantic production that doesn't return a custom type:

///Char <- '\\' [nrt'"\[\]\\]
/// / '\\' [0-2][0-7][0-7]
/// / '\\' [0-7][0-7]
/// / '\\' [0-7]
/// / !'\\' .   
let (|Char|_|) input = 
       
    let (|InRange|_|) upper input =
        let i2c value = Char.chr(Char.code '0' + value)
        let c2i value = Char.code value - Char.code '0'
        
        match input with
        | NC (c, input) when (i2c 0) <= c && c <= (i2c upper) ->
            Some((c2i c), input)
        | _ -> None
        
    match input with
    | TOKEN @"\" (NC(c, input)) 
    when List.exists (fun x -> x=c) ['n';'r';'t';'\'';'"';'[';']';'\\'] -> 
        match c with
        | 'n' -> Some('\n', input)
        | 'r' -> Some('\r', input)
        | 't' -> Some('\t', input)
        | _ -> Some(c, input)
    | TOKEN @"\" (InRange 2 (i1, InRange 7 (i2, InRange 7 (i3, input)))) ->
        Some(Char.chr (i1 * 64 + i2 * 8 + i3), input)
    | TOKEN @"\" (InRange 7 (i1, InRange 7 (i2, input))) ->
        Some(Char.chr (i1 * 8 + i2), input)
    | TOKEN @"\" (InRange 7 (i1, input)) ->
        Some(Char.chr (i1), input)

    | NC(c, input) when c <> '\\' -> Some(c, input)
    | _ -> None 

Note, this production is slightly different from the one in the PEG whitepaper. This way was easier to pattern match. Also, I typically don't wrap my when guards onto the next line, but this way it doesn't wrap funny on my blog.

While long, Char is fairly straight-forward. There are five ordered choices that can match this production. The first is for escaped characters, the next three are for character codes, and the last one is matching any character except the backslash escape character. Note, tracking F#'s escape characters and PEG's escape characters can get tricky. I've used verbatim strings for all my TOKEN parameters in order to help try and keep it straight.

The escape character match clause uses a when guard to narrow down the selection criteria. I use the built-in List.exists method to see if the character is in a hard-coded list of special characters. List.exists takes in a function parameter, and returns true if that function returns true for any of the value is the list. Since I'm just matching a value, my function parameter is a trivial equality test. If List.exists returns true, I return that special character as part of the return tuple. Of all the escape characters in PEG, only three are also escape characters in F#, so I use a second match clause to return the correct char value. There's probably a way to do that more elegantly, but since there were just three clauses, I figured it was easier to type them out manually.

For the character code clauses, I wrote a special local AP function called InRange to determine if the specified character was within a specified range and to convert it from a char to an int. Note, the way the production is written, the largest character code you can specify is 277, which means you can encode slightly more than the standard UTF-8 character set. Honestly, this should be updated to support full UTF-16, but I'm not here to critique the grammar, so I didn't try to fix this issue.

Note, all the results (save None) return a tuple of the matched character value and the remaining input buffer. Again, all the remaining productions will work like that. For example, here's the Range production:

///Range <- Char '-' Char / Char
let (|Range|_|) input =
    match input with
    | Char (c1, TOKEN "-" (Char (c2, input))) -> 
        Some(Range.Dual (c1, c2), input)
    | Char (c1, input) -> 
        Some(Range.Single (c1), input)
    | _ -> None

Compared to Char, Range is fairly simple. It's either two chars, separated by a hyphen (for example: a-z) or it's a single char. Again, being able to use Active Patterns to build on lower level productions is a huge helper.

But what does this function return? What does Range.Single and Range.Dual mean? Those are refer to a special F# construct called a discriminated union. Before we can continue writing semantic productions, we need to define these types to hold the results of these productions.

Posted By Harry Pierson at 12:53 PM Pacific Standard Time

Monday, December 17, 2007

Practical F# Parsing : Syntactical Productions (2)

Now that I've moved over to Active Patterns, I want to go back and finish the syntactic productions for my PEG parser. Most of the syntactic productions are very straightforward when implemented in AP. We've seen EndOfFile, EndOfLine and Space already. There is also a series of symbol identifiers that have only a single match clause. For example, here's DOT:

///DOT <- '.' Spacing
let (|DOT|_|) input =
    match input with
    | TOKEN "." (Spacing(input)) -> Some(input)
    | _ -> None

I'm not going to go thru all the symbol AP functions since their all basically like this one. However, you'll notice that this function references an AP we haven't seen yet - Spacing. I want to close out the section on Syntactical Productions by looking at the Spacing and Comment productions. Since Spacing depends on Comment, I'll start with Comment.

Comments in PEG grammars are single lines that start with a # symbol, similar to the // line comments in F# and C#. This is the PEG grammar rule for Comment:

///Comment <- '#' (!EndOfLine .)* EndOfLine

Basically, this says that a comment starts with a #, then matches zero or more characters that are not EndOfLine, and ends with an EndOfLine. The exclamation point is a syntactic predicate, which means that we unconditionally backtrack after attempting to match. PEG has both a success and failure syntactic predicate - the ! is the failure predicate while & is the success predicate. So inside the parens, this production rule says to test the current point in the parse buffer for EndOfLine. If it finds it, the match fails and we exit out of the parens (where we match EndOfLine again without backtracking it this time). If it doesn't find it, the parser backtracks, consumes the next character regardless what it is, then repeats.

Unfortunately, there's a bug in this production. If the parse buffer ends in a comment, the production will fail since it hasn't reached the EndOfLine and there are no more characters to consume. So I changed the production to:

///Comment <- '#' ((!EndOfLine / !EndOfFile) .)* EndOfLine?

This rule now ends the comment if it reaches an EndOfLine or EndOfFile. Additionally, it makes the final EndOfLine match optional. So if the comment ends with a new line, the new line is consumed as part of the grammar production. If the comment ends with the end of file, the EndOfFile is not consumed as part of the production. If you'll recall, EndOfFile returns Some(unit) rather than Some(char list). In F#, the various branches of a match clause have to have the same return type, so you can't return Some() from one branch and Some(input) from another. It's no big deal - you use the EndOfFile production at the top-level grammar to ensure you've consumed the entire file anyway.

Here's the F# implementation of Comment:

Comment defines a local AP function called CommentContent, which implements the part of the grammar production inside the parens.

///Comment <- '#' ((!EndOfLine / !EndOfFile) .)* EndOfLine?
let (|Comment|_|) input = 
    let rec (|CommentContent|_|) input = 
        match input with
        | EndOfLine (input) -> Some(input)
        | EndOfFile -> Some(input)
        | NC (_,input) -> (|CommentContent|_|) input
        | _ -> None
    match input with
    | TOKEN "#" (CommentContent (input)) -> Some(input)
    | _ -> None

Local AP function CommentContent recurses thru the input buffer after the pound sign, looking for  EndOfLine or EndOfFile. This function should never match the final default clause, but I put it in to keep the compiler from complaining. Notice that I use symbol redefinition here so both EndOf match clauses return Some(input). For EndOfLine, I'm re-defining input to mean what is returned by EndOfLine. For EndOfFile, I'm not re-defining, so input still means the list that is passed into the pattern match statement.

Compared to Comment, Spacing is pretty trivial:

///Spacing <- (Space / Comment)*
let rec (|Spacing|) input = 
    match input with
    | Space (input) -> (|Spacing|) input
    | Comment (input) -> (|Spacing|) input
    | _ -> input

There are two things I want to call out about spacing. First, it's a recursive function, so it's defined with let rec. AP functions can be recursive, just like normal functions. Also, note the lack of an underscore in the name of this AP function. Spacing is defined as zero or more spaces or comments, so it's perfectly valid to match nothing. Thus, Spacing is always a successful match. In this case, we don't put the underscore in the AP function name and we don't wrap the return result in Some(). You'll notice the last match clause simply returns input, rather than Some(input).

That's all the syntactic predicates. Next up, the meat of the grammar: semantic predicates.

Posted By Harry Pierson at 11:38 AM Pacific Standard Time

Friday, December 14, 2007

Practical F# Parsing: Active Patterns

In the last post, I gave you a sneak preview of what the EndOfLine production would look like using Active Patterns. But before we get to how to build that, let me give you a little background on why. If you want the full explanation, check out the whitepaper.

Basically, Active Patterns (aka AP) are a way to use the pattern matching of functional languages with abstractions rather than native language types. If you'll recall, I built functions to abstract the parse buffer so I could later change it's implementation if I needed to. The problem is that since the parse buffer is an abstraction, you can't use it in the match clauses. For example, here's a version of EndOfLine that uses a native char list.

///EndOfLine <- '\r\n' / '\n' / '\r'   
let EndOfLine input = 
    match input with
    | '\r' :: '\n' :: input -> Some(input)
    | '\n' :: input -> Some(input)
    | '\r' :: input -> Some(input)
    | _ -> None

That's straightforward like the AP preview at the end of the last post, but I've lost the benefit of the parse buffer abstraction. In other words, if I wanted to change the implementaiton of the parse buffer to a string, or some other type, I'd be screwed if I wrote EndOfLine this way. Traditionally, functional language developers had an either/or choice when it came to abstractions vs. pattern matching. AP let's you use both.

Using a special syntax, you can indicate that an F# function is an AP by surrounding the name in what Don calls "bananas". Here's the AP version of NC:

let (|NC|_|) input = 
    match input with 
    | i :: input -> Some(i, input)
    | [] -> None

This function is identical to the one defined in the first post, except for the name. By surrounding the actual name in paren/pipe "bananas", you're indicating the function can be used in match clauses, not just the match input. The trailing underscore in the name indicates this is a partial pattern, which means it returns an option value (aka Some(_) or None).

There's no reason why you can't use an AP function like any other function. I find I do this often in my unit tests. Here's an updated version of an NC unit test.

[<Fact>]  
let test_NC_empty_string () =   
    let ret = (|NC|_|) !!"" 
    Assert.Equal(None, ret)

While you can still call the function like this, the primary benefit of using Active Patterns is so you can use the function in pattern match clauses directly. This allows the production clauses to mirror the actual grammar rules directly. For simple productions like EndOfFile and EndOfLine, the AP F# implementation isn't much more complex than the grammar rule itself:

///EndOfFile <- !.
let (|EndOfFile|_|) input =  
    match input with 
    | NC (_) -> None 
    | _ -> Some() 

///EndOfLine <- '\r\n' / '\n' / '\r'    
let (|EndOfLine|_|) input =  
    match input with 
    | TOKEN "\r\n" (input) -> Some(input) 
    | TOKEN "\n" (input) -> Some(input) 
    | TOKEN "\r" (input) -> Some(input) 
    | _ -> None

You see in these functions, the calls to NC and TOKEN are used in the match clauses (i.e. after the pipe) rather than the match input (i.e. between match and with). Note, when used in a match clause, you just use the name directly without the bananas.

You'll notice that for TOKEN, the token string to match goes outside the parentheses. This is because "\r\n" is an input parameter to the TOKEN AP function. Alternatively, I could have written EndOfLine using only the NC function, though I find TOKEN version easier to read.

///EndOfLine <- '\r\n' / '\n' / '\r'   
let (|EndOfLine|_|) input = 
    match input with
    | NC ('\r', NC ('\n', (input))) -> Some(input)
    | NC ('\n', input) -> Some(input)
    | NC ('\r', input) -> Some(input)
    | _ -> None

In this version, the values of '\r' and '\n' are pattern matched against the result of calling NC, so they go inside the parentheses. In other words, the TOKEN clauses are matched if TOKEN returns some value. However, the NC clauses are only matched if the returned result matches the value specified in the match clause. inside the parentheses. TOKEN has two parameters, the token string and the parse buffer, while NC only has the parse buffer. When you write an AP function, the last parameter gets bound to the match clause input. Additional parameters, like TOKEN's token, much be specified in the match clause.

Notice I've defined these grammar productions as active patterns as well, which will make them compose nicely with higher-order productions. For example, here's the Space grammar production, which reuses EndOfLine:

///Space <- ' ' / '\t' / EndOfLine
let (|Space|_|) input = 
    match input with
    | TOKEN " " (input) -> Some(input)
    | TOKEN "\t" (input) -> Some(input)
    | EndOfLine (input) -> Some(input)
    | _ -> None

It's DSL-esque, wouldn't you say? Active Patterns is a little parens-heavy - the NC version of EndOfLine has three nested APs which isn't exactly easy on the eyes. However, the concept is very solid and it make the parsing code almost easier to write by hand than it would be to use a parser generator like yacc. Almost.

Posted By Harry Pierson at 10:11 AM Pacific Standard Time

Thursday, December 13, 2007

Practical F# Parsing: Syntactical Productions (1)

Now that I have my parse buffer functions and unit test framework taken care of, it's time to write some parsing expression grammar productions [1]. In any grammar, there are productions that have semantic meaning and productions that are only used to specify syntax. For example, in C# statements end with a semicolon and code blocks are delineated with curly braces. Syntactical elements don't end up in the abstract syntax tree, so they're easier to implement.

Most of PEG's grammar productions are syntactical in nature. Let's start with the simplest production, EndOfFile.

///EndOfFile <- !.
let EndOfFile input = 
    match NC input with
    | None -> Some()
    | _ -> None

Similar to the underscore in F#, the period represents any character In the PEG grammar and the exclamation point is a backtracking not. In other words, EndOfFile succeeds if NC fails, which only happens when the parse buffer is empty. Since the parse buffer is empty on a successful match, there's no value to return so Ireturn Some(). Some() is kinda like Nullable<void>, which doesn't make much sense in C#. But here it means I have a successful match but no data to pass back to the caller.

Here's a slightly more complex grammar production, EndOfLine:

///EndOfLine <- '\r\n' / '\n' / '\r'   
let EndOfLine input = 
    match NC input with
    | Some('\r', input) ->
        match NC input with
        | Some ('\n', input) -> Some(input)
        | _ -> Some(input)
    | Some('\n', input) -> Some(input)
    | _ -> None

EndOfLine has three possible matches: \r\n, \n and \r. Here, you see the use of nested match statements in the \r match clause. If the top character is \r, the function checks the top of the remaining text buffer for \n. F#'s ability to reuse symbols comes in very handy here.

Unfortunately, this isn't code very readable. In particular, the match priority order, which matters in PEGs, is totally lost. For this production, it's no big deal, but that won't always be the case. I'm also not using the handy TOKEN function I wrote in a few posts ago. Here's an alternative version that uses TOKEN and preserves the match priority order.

///EndOfLine <- '\r\n' / '\n' / '\r'   
let EndOfLine input = 
    match TOKEN "\r\n" input with
    | Some(input) -> Some(input)
    | _ -> 
        match TOKEN "\n" input with
        | Some(input) -> Some(input)
        | _ -> 
            match TOKEN "\r" input with
            | Some(input) -> Some(input)
            | _ -> None

This time, I use the TOKEN function and nested match statements to chain the results together. The previous method is probably faster, but this approach is more readable and explicitly preserves the match order. However, it sure is a pain to type. Wouldn't it be great if we could write something like this:

///EndOfLine <- '\r\n' / '\n' / '\r'   
let EndOfLine input =  
    match input with 
    | TOKEN "\r\n" (input) -> Some(input) 
    | TOKEN "\n" (input) -> Some(input) 
    | TOKEN "\r" (input) -> Some(input) 
    | _ -> None 

Turns out, F#'s Active Patterns feature let's me implement EndOfLine exactly like this. We'll look at how it works in the next post.


[1] The full PEG grammar is listed in "Parsing Expression Grammars: A Recognition-Based Syntactic Foundation" by Brian Ford.

Posted By Harry Pierson at 11:08 AM Pacific Standard Time

Wednesday, December 12, 2007

Practical F# Parsing: Unit Testing

Now that I have functions to access the parse buffer, I better write some tests for them. Yes, I realize I should have written the tests first, but the articles flow better this way.

I've written before that one of the benefits to side-effect free functional programming is that it makes unit testing a breeze. But I still need some type of xUnit testing framework. I could write my own native F# xUnit framework, but given the availability of mature xUnit frameworks on .NET, I'd really rather just use one of them.

Traditionally, I've used NUnit or Visual Studio's unit testing framework, but they're designed to work with OO languages like C#. In order to use them from F#, we have to use the OO features of F#. Here's an example of some F# unit tests using NUnit.

type [<TestFixture>] parser_tests =   
    class   
        new () = {}   
          
        [<Test>]  
        member this.test_NC() =   
            let Some(c,text) = NC !!"test" 
            Assert.Equal(c, 't')   
            Assert.Equal(text, !!"est")

        [<Test>]  
        member this.test_NC_empty_string() =   
            let ret = NC !!"" 
            Assert.Equal(None, ret)
    end

While this works, there's an awful lot of extraneous text needed to make this work. Test functions need to be methods on a Test Fixture class (note, F# uses [< >] to indicate attributes) and that class needs a default constructor. F# doesn't add one by default, so we have to do so manually. And every test function needs to be marked with "member this".

I'd really rather write tests that looks like this:

[<Test>]   
let test_NC =    
    let Some(c,text) = NC !!"test" 
    Assert.Equal(c, 't')   
    Assert.Equal(text, !!"est")

[<Test>]   
let test_NC_empty_string =    
    let ret = NC !!"" 
    Assert.Equal(None, ret)

That's a lot more straightforward. If only I could write my test code like that...

It turns out there's a new kid on the .NET unit testing block. xUnit.net is the brainchild of Jim Newkirk (one of the original NUnit developers) and Brad Wilson (aka the .NET Guy). Among other things, xUnit.net does away with the TestFixture attribute. All public methods in all public classes are checked for tests in xUnit.net.

Since we don't need the TestFixture, does that mean I can write the tests as F# functions if I use xUnit.net? Not quite. xUnit.net only checks for public instance methods on the public classes in a test assembly. But F# functions get compiled as static methods. Luckily, xUnit.net is simple enough to change. I submitted a patch to xUnit.net that enables it to find both static and instance test methods (and to skip creating and disposing an object instance for static test methods). I'm hoping it will be accepted and ship as part of their next release. Until then, I'm running my own private build with those changes included.

Now that I've settled on a unit test framework, let's look at some tests. For my parser solution, I have two projects: PegParser and PegParser.Tests. The tests project depends both on the PegParser assembly as well as xunit.dll, so I need to set a reference to both in my project. F# projects in VS don't have the References node in the project tree, you have to either add the references on the project property page or directly within the code. Not sure which is better, but it's easier to show the code syntax:

#R @"..\..\xUnit.net\Main\xunit\bin\Debug\xunit.dll"
#R @"..\PegParser\pegparser.dll"

open Xunit
open Parser

The #R compiler directive is used to reference an external assembly. F#'s open statement acts like C#'s using statement, so I can reference types without specifying their full namespace. You'll notice that the parser is implemented in a dll called pegparser.dll while the namespace is simply Parser. Actually, it's not really a namespace. If you open PegParser.dll in Reflector, you'll notice that Parser is actually a type, and the functions are all implemented as static methods. F# hides that from you, though you'd have to know that if you wanted to invoke the parser from C# or VB.net. By default, F# uses the filename as the namespace/class name and I haven't changed that default in my parser code (though I probably should).

Once we've referenced the correct assemblies, I need to write the tests. Here are two tests for NC (aka Next Char) function I wrote in the last post.

[<Fact>]   
let test_NC_empty_string () =    
    let ret = NC !!""  
    Assert.Equal(None, ret) 

[<Fact>]   
let test_NC_simple_string () =    
    let Some(c,text) = NC !!"test"  
    Assert.Equal(c, 't')    
    Assert.Equal(text, !!"est") 

You'll notice this code is almost identical to my wish test code above. Almost. There are a few syntax changes - In xUnit.net, tests are called facts and Assert.AreEqual is simply Assert.Equal. I've also had to add empty parentheses after each test function name. Remember that functions in FP are like math functions. If there's no independent value to pass in, the result of a math function is is constant. F# compiles parameter-less functions as static properties instead of a static methods. In order to make the test functions compile as static methods, we have to pass in at least one parameter. In F#, the unit type is the equivalent of the void type in C#. Unit has exactly one valid value - the empty parens. Adding the empty parens to the parameter list of the test functions ensures they get compiled into static methods.

Note, it's really really easy to forget to add those empty parens. If you don't add them, the code will still compile, but the tests won't be found or run. I've been bit by that once already, so if you have a bunch of tests not running, make sure they have the empty parens!

So each test method feeds a parse buffer (converted from a string with the custom !! operator) into the NC function and makes assertions about the result. In the first test, NC should return None if the parse buffer is empty, so I simply compare the function result to None via Assert.Equals. In the second test, I use F#'s pattern matching capability to assign the result of NC to the value Some(c,text). Basically, this is doing simple pattern matching to bind the two value names to the two parts of the tuple returned from NC. Those two values are then asserted to be a specific values as you would expect.

Note, in the current version of F#, the line let Some(c,text) = NC !!"test" yields two warnings. The first (FS0062) warns that a future version of the language will require parens around Some(c,text). I sure hope they change their minds on this, since active patterns are already so parens-heavy. The second (FS0025) warns that this is an incomplete pattern match. If NC returns None, the pattern wont match and the function will throw a MatchFailureException. Of course, since these are unit tests, that's exactly the behavior I want! Given the nature of these warnings, personally, I turn them both off (only in my unit tests, mind you). This is done via the #nowarn compiler directives at the top of the file.

#nowarn "25" //Turn off Incomplete Pattern Match warning
#nowarn "62" //Turn off Some contruct needs parens warning

Obviously, there are more tests than just these (though my total code coverage is pretty poor, shame on me) but they're all pretty similar. As you can see, there's tests are very straight forward. The nature of FP functions makes them fairly simple to test, and xUnit.net (with a couple of minor changes) makes it easy to write your unit tests in F#.

Posted By Harry Pierson at 11:40 AM Pacific Standard Time

Tuesday, December 11, 2007

Practical F# Parsing: The Parse Buffer

The first thing I need for my F# parser is a type to represent the buffer of characters being parsed. For now, I'm using the simplest thing that could possibly work: an intrinsic F# character list. Lists are heavily used in functional languages, so they tend to have very efficient native list types. F# is no exception. Along with a native list type, F# has a native operation for retrieving the head of a list. For example, If you execute the following code, head will be bound to '1' and tail will be bound to the list ['2';'3';'4']

let head :: tail = ['1';'2';'3';'4']

The problem using the native list head operator is that my parsing code will be explicitly bound to work on character lists only. However, I'm not sure an in-memory character list is the best way to read long files of text off the disk for processing, so I'd rather not limit future options like this. So instead, I'll define my own function to retrieve the head of the parsing buffer.

let NC input =  
    match input with     
    | head :: tail -> Some(head, tail)    
    | _ -> None

Note, in all my F# code I use the #light syntax which means code blocks are indicated by significant whitespace indentation similar to Python.

NC stands for Next Character, though I gave it a short name since it's used often. It's basically wraps the native list head operator. If there's at least one element in the list, the first clause matches and the function returns Some(head,tail). If the list is empty, the second clause matches and the function returns None. The use of Some and None means this function returns an F# option type, which is similar to .NET's Nullable type. (head,tail) is a tuple - in this case, combining the head and the tail of the list together. Finally, the underscore in the second match clause is a wildcard, so that clause matches anything. I'm using it here like the default clause of a switch statement in C#.

The F# type for this function is 'a list -> ('a * 'a list) option. The 'a is a generic type parameter and the asterisk indicates a tuple. So NC takes a generic list, and returns an option tuple with a single generic element and a generic list. Even though the function is named Next Character, it would work with a list of any type.

Now that I have my own list head function, the rest of my parsing code can it. If I later decide to change the type of the parse buffer, I can change the implementation of NC - including the input and return types - without having to change the parsing functions that use NC. For example, here's a different implementation where the input parameter is a .NET string instead of a char list.

let NC (input : string)
    if input.Length > 0     
        then Some(input.Chars(0), input.Substring(1))     
        else None

Since I'm calling methods on the input parameter, I need to explicitly tell F# what type it is. The F# type for this function is string -> (char * string) option, which is obviously different from the type definition of the original NC version. But F#'s type inference automatically adjusts to handle the change in the type so functions that call NC don't have to change. FP languages like F# handle list operations extremely efficiently, so this version of NC is probably a step in the wrong direction. However, it's good to know I can easily encapsulate the details our the parse buffer type away from the rest of my parsing code.

Here's another function I'll use in parsing, defined entirely in terms of NC.

let TOKEN token input = 
    let rec ParseToken token input = 
        match token, NC input with  
        | t :: [], Some(i, input) when i = t ->
            Some(input) 
        | t :: token, Some(i, input) when i = t ->
            ParseToken token input 
        | _ -> None 
    ParseToken (List.of_seq token) input

The TOKEN function checks to see if the token string is at the front of the input parse buffer. If so, it returns the remainder of the buffer after the token. If not, it returns None. It's defined entirely in terms of NC, so it works the same with both the versions of NC I've written so far. However, depending on the implementation of NC, I might rewrite TOKEN. For example, if I was using the string version of NC, I'd probably rewrite this function to use String.StartsWith instead of recursion.

TOKEN defines a local recursive function called ParseToken. It's very common to see local functions defined inside other functions in FP, similar to how classes can define local classes in OO languages like C#. ParseToken recursively compares the characters in the token string with the characters in the input buffer, finishing when either it reaches the end of the token string or there's a mismatch. By default, functions in F# can't call themselves recursively by default, so ParseToken is declared to be recursive by using "let rec" instead of simply "let".

ParseToken shows off something interesting about the way symbols are used in F#. Remember that F# values are immutable by default. Yet, the symbols "token" and "input" appear to change. In the match statement, token and input represent the values passed into the function. However, in the match clauses, token and input represent the tail of those two lists. Technically, the values didn't change, F# allows you to reuse the symbols. If I wanted to avoid reusing symbols, I could define ParseToken this way (though I find this much less readable):

let rec ParseToken token input = 
    match token, NC input with  
    | t :: [], Some(i, itail) when i = t ->  
        Some(itail) 
    | t :: ttail, Some(i, itail) when i = t ->  
        ParseToken ttail itail 
    | _ -> None

Other than declaring ParseToken, the TOKEN function is a single line of code. It simply calls ParseToken, converting the token parameter into a char list along the way. While lists are very efficient, which would you rather type?

let token = TOKEN ['f';'o';'o'] input
let token = TOKEN "foo" input

Personally, I like the second version. I'm sure there's a slight perf hit to convert from a string to a list, but frankly I value readability over performance so I used strings for tokens. TOKEN uses the List.of_seq function to do the conversion. Seq is F#'s name for IEnumerable. Technically, TOKEN would work with any IEnumerable<char> type. However, in my source code, it's always going to be a string.

I also used List.of_seq to define a helper function String to Parse Buffer (aka S2PB) that converts a string into a character list. I use function often in the test code.

let S2PB input = List.of_seq input

If I were to change the input type of NC, I'd also change the implementation of S2PB so that it still took in a string but returned whatever the correct input type for NC.

The one problem with S2PB function is that you have to use it with parentheses almost all the time. If I write NC S2PB "foo", F# can't tell if I'm calling NC with two parameters or passing the result of S2PB "foo" to NC. Since NC is strongly typed to have only one input parameter, you might have thought it could automatically determine the calling order, but it doesn't.

I could make the function calls explicit with parenthesis by writing NC (S2PB "foo"). F# also provides a pipe operator, so I could pipe the result of S2PB into NC by writing S2PB "foo" |> NC. I didn't like either of those solutions, so instead I defined a custom unary operator !! as an alias. The parameter binding rules are different for custom operators because I can write NC !! "foo" without piping or parenthesis.

let (!!) input = S2PB input

So now I've got three functions and a custom operator that completely abstract away the parse buffer type. As long a my parsing functions only uses these functions to access the parse buffer, I'll be able to later change the buffer type without affecting my parsing code at all.

Posted By Harry Pierson at 1:18 PM Pacific Standard Time

Monday, December 10, 2007

Practical Parsing in F#

I'm interested in parsing because I'm interested in Domain Specific Languages. F# is pretty good for internal DSLs, but internal DSLs are obviously limited by the syntax of the host language. If you want complete control over the language, you've got to build your own parser.

The defacto standard for parser development is Yet Another Compiler Compiler, or yacc. There's a version of yacc for .NET as well as one specifically for F#. However, I'm not a fan of yacc. Yacc parsers are specified using context-free grammar (aka CFG). But CFG's can be ambiguous - actually, it's nearly impossible to build an unambiguous CFG. Personally, I'm a big fan of Parsing Expression Grammars (or PEGs) which among other advantages makes it impossible to develop ambiguous grammars. Furthermore, PEGs don't require a separate lexical analyzer like lex, so I think they're more suitable for building modular compilers.

Since I like PEGs and F# so much, I developed a parser for the PEG grammar from the original PEG whitepaper using F#. The grammar is much simpler than a language like C#, but with twenty nine grammar productions it's certainly not trivial. The F# implementation is fairly straightforward backtracking recursive decent parser, which makes it easy to understand even if you're not a parser guru. It's also small - around 400 lines of code including comments. But I think the code illustrates both the general value of Functional Programming as well as the specific value of F#. Here's how the series is shaping up (though this is subject to change):

I was originally planning to post the code for the parser itself with this post. However, i find that I'm revising the code as I write the articles in this series, so I'm going to hold off for now. If you're really desperate, drop me a line and I'll see what I can do.

Update - Almost forgot, if you're going to follow along at home, I'm using the latest version of F#, v1.9.3.7. Note, the F# Downloads page on the MS Research is woefully out of date, so go to the MS Research Downloads page. Currently, it's the most recent release. It snaps into VS 2005 and 2008 plus has command line tools. If you're an VS Express user, Douglas Stockwell explained how to roll your own F# Express.

Much Later Update - The code is now available on my Skydrive.

Posted By Harry Pierson at 2:40 PM Pacific Standard Time

Friday, December 07, 2007

Blogging F# Code

I'm going to start posting about my F# parsing code soon. Obviously, I'll make the code directly available, but I'm also going to be writing about it quite a bit. Since I'll be posting lots of F# code snippets, I took the time to build an F# language syntax definition for CodeHTMLer. Of all the various WL Writer Insert Code plug-ins, CodeHTMLer is my favorite because it can be configured not to use <pre> tags, which many RSS readers handle poorly (in my experience).

In case anyone else wants it, I've stuck the CodeHTMLer F# language definition up on my SkyDrive. If you using the CodeHTMLer WL Writer Plug-in, you can easily add this to your machine. Once you've installed CodeHTMLer and run it once, go to the command line and type "cd %appdata%\WindowsLiveWriter" and you'll find the LanguageDefinitions.xml file. Edit that file to insert the add the contents of my F# language definition after the <CodeLanguages> tag and you're all set.

BTW, the first language in the file will be the default language in the plug-in, so if you're an occasional F# user, you might want to add the F# definition to the end rather than the beginning of the file. If you don't want to further edit the XML file manually, you can select "Edit Languages" in the plug-in and edit the order of the languages to your heart's content.

Posted By Harry Pierson at 1:47 PM Pacific Standard Time

Tuesday, December 04, 2007

Functional Understanding

I was showing some of my cool (well, I think it's cool) F# parsing code to some folks @ DevTeach. I realized very quickly that a) most mainstream developers are fairly unaware of functional programming and b) I suck at explaining why functional programming matters. So I decided to take another stab at it. I probably should have posted this before my recent series on F#, but better late than never I suppose.

Right off the bat, the term "functional" is confusing. When you say "function" to a mainstream developer, they hear "subroutine". But when you say "function" to a mathematician, they hear "calculation". Functions in functional programming (aka FP) are closer to the mathematic concept. If you think about math functions, they're very different than subroutines. In particular, math functions have no intrinsic mutable data. If you have a math function like f(x) = x3, f(7) always equals 343, no matter how many times you call it. This is very different then a function like String.Length() where the value returned depends on the value of the string.

Another interesting aspect of math-style functions is that they have no side-effects. When you call StringBuilder.Append(), you're changing the internal state of the StringBuilder object. But FP functions don't work like that. Providing the same input always provides the same output (i.e. the same independent variable always yields the same dependent value).

If you're a .NET developer, this may sound strange, but you're probably very familiar with the String class which works exactly the same way.

A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable.

A String object is called immutable (read-only) because its value cannot be modified once it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification.

In other words, all variables in FP are a lot like .NET Strings. In fact, in many FP languages, variables are actually called "values" because they don't, in fact, vary.

It turns out that this approach to programming has significant upside for unit testing and concurrency. Unit tests typically spend a significant effort getting the objects they're testing into the right state to invoke the function under test. In FP, the result of a function is purely dependent on the values passed into it, which makes unit testing very straight forward. For concurrency, since functions don't share mutable state, there's no need to do complicated locking across multiple processors.

But if values don't vary, how to we managed application state? FP apps typically maintain their state on the stack. For example, my F# parser starts with a string input and return an abstract syntax tree. All the data is passed between functions on the stack. However, for most user-oriented non-console applications, keeping all state on the stack isn't realistic.  As Simon Peyton Jones points out, "The ultimate purpose of running a program is invariably to cause some side effect: a changed file, some new pixels on the screen, a message sent, or whatever." So all FP languages provide some mechanism for purposefully implementing side effects, some (like Haskell) stricter in their syntax than others.

One of the nice things about F#'s multi-paradigm nature is that side effects is a breeze. Of course, that's both a blessing and a curse, since the much of the aforementioned upside comes from purposefully building side-effect free functions. But the more I work with F#, the more I appreciate the ability to do both functional as well as imperative object-oriented operations in the same language. For example, my parsing code so far is purely functional - it takes in a string to be parsed and returns an AST. But the logical next step would be to generate output based on that AST. Since F# supports non-functional code - not to mention the rich Base Class Library - generating output should be straightforward.

Posted By Harry Pierson at 5:25 PM Pacific Standard Time

Monday, December 03, 2007

Morning Coffee 127

Posted By Harry Pierson at 10:12 AM Pacific Standard Time

Friday, November 30, 2007

F# Hawkeye : Assorted Not-So-Goodness

(Harry is @ DevTeach in Vancounver with his family this week. He was hoping to still do Morning Coffee posts, but that's turned out to be infeasible. So instead, you get a series of pre-written posts about F#.)

It's not all puppy dogs and ice cream with F#. Here are a few things that I didn't like about the lanugage.

Linear Scoping

In C#, a given piece of code is able to call any function it wants to (limited by CAS and visibility of course). If I define two functions, the first can call the second even thought the compiler has never seen the second function when it's parsing the first.

F# has linear scoping like C++ does. You can't call functions that haven't already been defined in the file (or a previous file that's already been fed to the compiler). This makes writing mutually recursive functions (A calls B, B calls C, C calls A) fairly annoying. Typically, in F# we declare functions with "let". But in the Additive function above, we're declaring the function with "and". By using "and", we're basically chaining together function declarations into a logical unit. Then, you mark the first function declaration in the chain as recursive, and now those methods are enabled for mutual recursion. Not exactly intuitive.

Frankly, I like C#'s ability to bind to methods that haven't been declared yet. I wonder if this is an intrinsic issue with FP or F# scoping rules, or if it's something they could fix if they took the time.

No Method Overloading

In my CheckForToken method, I use a string type to hold the token I'm looking for, since tokens can often be multi-character. However, for one character tokens, this is over kill. Not just in terms of creating a string object to contain just one character, but also in how I pattern match the token against the input string. If we're only looking for one character, we can skip recursion entirely. Yet there's no way to define two functions called Token that have different signatures.

Given type inference, this isn't surprising, but it's still a little annoying for folks coming from C# land.

Limited VS support

More evidence of a rotted brain. F#'s integration into VS is limited at best. It does syntax highlighting and debugging, but that's about it. The problem I keep running into is that I have two projects - my main project and my test project. Even though I've define the test project as being dependent on the main project, it doesn't automatically compile the test project when I change the main project. So I keep doing something like fix a bug, recompile, then run NUnit to see if the light turned green. It doesn't, because I haven't rebuilt the test project and it's still referencing the old version of the main project. Now that it's a "real" product, I'm hoping to see better integration into VS08. Maybe even an F# Express that leverages the new VS08 shell?

Posted By Harry Pierson at 8:57 AM Pacific Standard Time

F# Hawkeye : Assorted Goodness

(Harry is @ DevTeach in Vancounver with his family this week. He was hoping to still do Morning Coffee posts, but that's turned out to be infeasible. So instead, you get a series of pre-written posts about F#.)

Significant Whitespace

If you're a Python programmer, you already know this one. Instead of delineating code blocks explicit with curly braces or begin/end keywords, F# uses whitespace. Code blocks are indented relative to their parent. This enforces readability standards as well as conciseness. You can see that in the code Additive function above. Technically, this is optional in F# if you specify the #light compiler option, but pretty much all the docs and books assume this by default.

Custom Operators

This is minor, but cool nonetheless. Many languages let you overload existing operators like + and *. However, F# goes a step further and also lets you create custom operators. You just pick a combination of symbols that isn't already being used and define a function for it. For example, in my parsing code I wanted a simple way to adorn my input parse strings in my tests so that I could later easily change their type if I changed the type of NextChar and CheckForToken as described above. I defined the "double bang" operator !!. Currently, double bang converts a string into a character list, but originally it simply returned the string since I had written my Char and Token classes in terms of string.

Posted By Harry Pierson at 8:55 AM Pacific Standard Time

Thursday, November 29, 2007

F# Hawkeye : Type Inference

(Harry is @ DevTeach in Vancounver with his family this week. He was hoping to still do Morning Coffee posts, but that's turned out to be infeasible. So instead, you get a series of pre-written posts about F#.)

For you LINQ early adopters, you may think you know everything about type inference, but F#'s uses it much more extensively. In C#3 , you can write "var o = new SomeObject()" and the compiler is smart enough to figure out the variable o is of type SomeObject. Saves some typing, but it's not exactly brain surgery. F# can not only infer the type of local variables like C#, but it can also infer type of a function's input parameters and return value based on how those variables are used in the function. For example, in the Additive function, F# can infer that the "input" parameter is a char list because Token takes a generic list and '+' is a char.

F# automatically "generisizes" the functions you write. So if you write a function for traversing a list, by default it will work on a list of any type. You don't have to explicitly declare the generics, F# automatically makes your code as generic as possible, based on your usage of the variables.

What's really interesting about this approach is that changes to parameter or return types in a low-level function can have a rippling effect up the stack. In my parsing code, I haven't settled on the type I'm going to use to represent the string to be parsed. My tests are all short strings, so F#'s intrinsic char list type is fine. However, I don't know how well that will work for longer strings like a typical input file. F#'s native parsing tools (based on lex & yacc which I dislike) have a special LexBuffer class to represent the parse string. However, I've written my code so I can change the type of the lowest-level functions (NextChar and CheckForToken) and not affect the rest of my code. That's pretty wicked.

Type inference does have a downside. I guess VS has rotted my mind, but I'm hooked on Intellisense. The BCL is too big to remember all the classes and all the method parameters. Intellisense is kinda like Google a web search engine. If you sorta know what you're looking for, Intellisense helps close the gap to find it. Otherwise, it's time to break out the docs. However, if you're inferring type based on usage, Intellisense is out of luck. Honestly, there have been times where I've put in an explicit type declaration to get Intellisense to work, written the code, then removed the type declaration.

Posted By Harry Pierson at 7:41 AM Pacific Standard Time

F# Hawkeye : Pattern Matching

(Harry is @ DevTeach in Vancounver with his family this week. He was hoping to still do Morning Coffee posts, but that's turned out to be infeasible. So instead, you get a series of pre-written posts about F#.)

Most FP languages include some type of pattern matching, and F# is no exception. At first blush, pattern matching looks a little like a switch statement, but it's much more powerful. Where switch statements typically only do simple matches such as "does this variable equal this constant?". In F#, you can break apart types, use wildcards even pass the potential match data into a custom function to determines if there's a match. Of course, you can also do your more run-of-the-mill "does value equal constant" comparisons as well.

The problem with most functional language is that while pattern matching is powerful, it's not particularly extensible. As Don pointed out in a recent paper, this becomes a real problem when trying to integrate FP with OO. OO is designed to hide details behind abstractions. Yet those abstractions can't be used in pattern matching. Luckily, Don and the F# Guys (sounds like a band) have invented an extensible pattern matching syntax called Active Patterns to deal with this problem. Basically, in more recent versions of F#, you can adorn functions with special syntax so that you can use them in your pattern matching clauses.

This turns out to be wicked cool for writing parsers. I'm building recursive descent parsers, so each grammar production is implemented as a function. Yet, since I'm using the active pattern syntax, I can use them in pattern matching clauses. This allows me to chain together functions in a single match clause rather than having multiple match statements. And it's very readable. For example, the function to recognize the grammar production "Additive  <- Multitive ‘+’ Additive | Multitive" is translated into the following F#:

and (|Additive|_|) input = 
    match input with 
    | Multitive(v1,Token '+' (Additive(v2, input))) ->
          Some(v1+v2,input)
    | Multitive(v,input) -> Some(v,input)
    | _ -> None

The weird "bananas" around Additive on the first line indicate this is an active pattern. Multitive and Token are also an active patterns. This syntax is a little parens heavy, but otherwise, that translation from grammar to F# is nearly declarative. It almost defeats the need for having a parser generator when building a parser is this straightforward. Almost.

Posted By Harry Pierson at 7:40 AM Pacific Standard Time

Wednesday, November 28, 2007

F# Hawkeye : Functional Programming

(Harry is @ DevTeach in Vancounver with his family this week. He was hoping to still do Morning Coffee posts, but that's turned out to be infeasible. So instead, you get a series of pre-written posts about F#.)

If you're coming from the imperative object-oriented world of C#, Java or VB, functional programming just seems odd at first. But you might as well start getting used to it - functional aspects have been leaking into C# and VB with every version. The way LINQ chains together iterators (Where, Select, Order By, etc) is heavily influenced by FP (aka functional programming).

It takes a while to get the hang of FP, but I've found that some problems that are hard to solve in C# are easy in F#. For example, in my parsing code, I have a bunch of boilerplate caching and tracing code that I need to execute in each parsing function. How do I do that in C# without having to cut and paste the boilerplate code over and over again?

In F#, I wrote a function called CacheAndTrace that takes a parsing function as a parameter and returns a new function that adds the caching and tracing code. Since the new function has the same signature as the original parsing function, the caller can't tell the difference but it saves me having to write the boilerplate code over and over again.

Actually, you can do this type of functional composition with C#. In fact, I first wrote my version of the CacheAndTrace function in C#. But F#'s syntax is much better for functional programming - probably since it was designed for FP, rather than having FP tacked on later. 

My father once compared functional programming to Aspect Oriented Programming. AOP is about factoring apart cross cutting concerns, which tends to be hard to do in a straight-up OO language. But in an FP language, AOP-esque separation of concern is fairly straightforward.

F# isn't a pure functional language - it's actually a multi-paradigm language, so you can do imperative and OO stuff if you want to. For example, variables are typically immutable in functional programming. So how do you do something like cache the most recent result from a given function - which is what my CacheAndTrace function does? Frankly, I don't know. But in F#, I can easily mark the cache value as mutable so I can update it on every call. Sort of the best of both worlds, though mixing and matching can get a little tricky.

F#'s OO support is really useful when you interop with other .NET languages, since they're mostly OO themselves. For example, I'm using NUnit test cases for all my parsing code. My parsing code is F#, so I wanted to write my tests in F# as well. NUnit requires test methods to be grouped into classes known as test fixtures. Frankly, if I were designing a native F# xUnit library, I wouldn't require all the test methods to be grouped into a class. But it's easier to just define an test fixture object in F# rather than build my own xUnit Framework for F#

Posted By Harry Pierson at 8:22 AM Pacific Standard Time

F# as a Second .NET Language

(Harry is @ DevTeach in Vancounver with his family this week. He was hoping to still do Morning Coffee posts, but that's turned out to be infeasible. So instead, you get a series of pre-written posts about F#.)

I've been spending some real quality time with F# of late. I've been getting into parsing again, and it turns out that functional pattern matching languages like F# are really good at text processing. After claiming I'd learn F# this year, then abandoning the effort to learn Powershell, I went back to F#. Nothing against PowerShell - I've moved over to using it as my primary command line shell and have tricked out my startup script and everything. But I haven't found much need to code in it lately.

If you're a .NET ninja guru, you owe it to yourself to take a long look at F#, if for no other reason to expand your mind. It takes a while to get used to. Don Syme (aka father of F#) can attest I've been peppering him with questions (Thanks, Don!). I've also been writing code in C# and F#, and looking at the result in Reflector so I can understand what's happening. I certainly am not an F# expert by any stretch, but I do think I'm getting the hang of it.

Since I've reached this first plateau of getting it, I thought I'd write out some of the things I like and don't like about the language. This is by no means an introduction to F#. I'd recommend Robert Pickering's Foundations of F# book as well as Don's Expert F# book (when it comes out). You should also be reading Luke Hoban's and Jomo Fisher's blogs - they both just joined the F# team. C# MVP Tomas Petricek has written several blog posts introducing F#, which he's brought together in a single post. For an general overview of functional programming (the primary programming paradigm of F#), check out Slava Akhmechet's Functional Programming For The Rest of Us.

This turned into a fairly long post, so I split it out into a series that I'll post thru the end of the week.

UPDATE - added link to Tomas Petricek's F# Introduction

Posted By Harry Pierson at 8:20 AM Pacific Standard Time

Thursday, October 25, 2007

Morning Coffee 120

  • Doing these morning coffee posts is a lot tougher since I cut back my blog reading. Where I used to have no trouble finding 4-5 coffee-worthy items every day, these days I seem to only get 1-2, if that.
  • After starting off 3-0 and 100% on the PK, the Caps dropped four in a row and have been miserable on special teams. The special teams woes continued last night against the Lightning, but they still won. Caps went 0-4 on the powerplay, and coughed up a short handed goal. But they also went 3-3 on the PK, so I guess it wasn't all bad. Maybe my mother will stop calling for Hanlon's job now. It's a long season and as Peerless Prognosticator points out, the rebuild isn't over.
  • Jomo Fisher, who helped Scott Hanselman auto-merge assemblies, has been digging around in F# of late. As it turns out, he's joining the F# team so I'm thinking it's not a huge stretch for him. If you're a C# developer trying interested in getting a handle on this new F# thing, his blog is a good place to start.
  • Speaking of F#, Don Syme posts about yet another new F# feature: Async Workflows. Workflow is a bad term here IMO since it can be easily confused with WF. Regardless of it's name, Async Workflows is about making .NET's Async Programming model a first class citizen in F#. Robert Pickering has a good post explaining how this new feature works.
  • Microsoft sure has a lot of multi-threading / async-programming tools coming out. In addition to F# Async Workflows, there's the Concurrency and Coordination Runtime, Parallel LINQ and the Task Parallel Library. I would hope all this work eventually coalesces as a coherent product offering.
  • Now that F# is being "producized", I wonder if the language evolution will slow down. Async workflows were introduced in F# 1.9.2.9. Other recent changes include Computation Expressions (v1.9.2), Use Bindings (v1.9.2) and Active Patterns (v1.9.1). F# seems to churn more in minor releases than C# does in major releases. Of course, that's because F# was a research project, not a "real" product. Now that it's going to be a product, will the rate of innovation slow?
Posted By Harry Pierson at 9:50 AM Pacific Daylight Time

Friday, October 19, 2007

Morning Coffee 119

  • The biggest news of the week IMHO is Soma announcing the formation of an F# product team. Specifically, they will "fully integrate the F# language into Visual Studio and continue innovating and evolving F#." Though Soma calls F# "another first-class programming language on the CLR", I get the feeling there won't be a "Visual F#" sku. Don Syme has more on the news.
  • In other Soma announcement news, Popfly is now in beta. More details on what's new on the Popfly Team Blog. I haven't played with Popfly in depth, but I think it's got huge potential.
  • Scott Guthrie details the upcoming ASP.NET MVC Framework. Personally, I'm not building web apps much these days, so I'm not really invested one way or the other. Given the interest in this approach, it's nice to see the ASP.NET team respond to the market, though I'm sure someone will complain that we're trying to kill off the various open-source MVC Web frameworks that have sprung up.
  • Over in Windows Live, they shipped a new version of Live Search Maps, upgraded WL Photo Gallery (which I've been digging) to support Flickr and shipped an update to WL Accounts which allows you to link accounts.
  • The Clarius folks keep churning out great tools for software factory developers. The latest is the T4 editor, which brings intellisense, color syntax highlighting and property inspector support for Text Templating Transformation Toolkit (aka T4) files. T4 files are used for code generation in both DSL Toolkit and GAT.
  • David Pallman (again via Sam Gentile) suggests there are only three choices for infrastructure architecture: None/Point-to-point, Centralized/Hub-and-Spoke and Thin/Bus. I get the first two, but his explanation of the third goes to far into the "magic framework" category for my taste. "Physically distributed but logically centralized"? That doesn't make any sense to me at all.
  • Fellowship of the Ring makes its way onto XBLM. Alas, not in HD so I'll stick w/ my extended four hour DVD version thankyouverymuch.
Posted By Harry Pierson at 10:27 AM Pacific Daylight Time

Wednesday, August 08, 2007

Morning Coffee 109

  • I forgot to add a number to my last morning coffee post. However, after extensive research, I have determined that it was #108. So thing are continuing as usual today with #109. On the other hand, do you really want development and architecture opinions from a guy who can barely count? :)
  • The finalists in the Dream-Build-Play contest have been announced. I haven't played any of them yet (some are available for download) but they several of them sure look good.
  • And speaking of gaming, MS announced an Xbox 360 price drop yesterday. So if you want to get in on some of the XNA action, here's your chance (or you could just build for your PC - take your pick).
  • Finally on the gaming front, if you're not busy Monday you can watch the first day of Gamefest 2007 online. Get the scoop on XNA 2.0 as well as the new XNA networking support. I, alas, am busy Monday so I'll have to catch it on demand.
  • On to, you know, actual geek stuff things. Scott Guthrie seems to have retired his LINQ to SQL series and moved on to LINQ to XML. He shows how to build an RSS reader application with LINQ to XML. An oldie demo, but a goodie.
  • Wanna learn F#, there's a whole site of samples up on CodePlex. (via Don Syme)
  • Jeff Atwood is annoyed at how many different products you have to install to get a current & complete setup of VS 2005. Of course, MS shipped two parts of that stack since VS05 shipped (TFS & DBPro), three service packs (VS05 SP1, SQL 05 SP2 and DBPro SR1) and a major OS upgrade (VS Vista update). Doesn't the same thing happen with any shipping product after a few years? BTW, if this is such a huge hassle, I wonder why Jeff doesn't create a slipstreamed VS installer?
  • Udi Dahan has a great post on estimation where he claims "Developers don’t know how to estimate." No argument, but the way he phrases it sounds like it's the developer's fault they suck at estimation. It's not. Developing - by definition - is building something you've never built before. Is it any surprise we suck at estimating how long it will take us to do something we've never done before?
Posted By Harry Pierson at 10:15 AM Pacific Daylight Time

Thursday, June 07, 2007

TechEd 2007 Day Four

Yesterday was another day of talking primarily to people I know, inside and outside of Microsoft. Got into a long conversation with Gareth Jones and Peter Provost about combining test-driven and model-driven development. Having done evangelism for five of the last six years, I haven't been an agile practitioner. I'm getting to the point where I feel dirty when I don't write tests or don't check in, but not dirty enough to actually do anything about it (yet). But practicing or not, it was fascinating to hear Gareth and Peter brainstorm on this topic.

Speaking of storms, we had a downpour here yesterday. Thunderstorm moved right over the convention center - you could tell by how loud the thunder and rain were. I hit a seam in the storm heading back to my hotel, but I did get drenched heading to the influencer party @ Margaritaville. The party was fun, after I dried off, though I seem to remember knowing more of the influencers last time I was @ TechEd. Ended up sharing a cab back to the hotel with Ted Neward and Mark Miller. Ted's like the IT Industry's Switzerland, so I took the opportunity to pick his brain on the goings on in other communities - primarily the Ruby community.

I did get a chance to hack a little code yesterday. As a side effect of my interest in programming language design, I'm also interested in parser development. Towards that end, I've been learning about Parsing Expression Grammars. The original PEG parser was built in Haskell, but I decided to write mine in F#. Even though I had never worked in F# before, I got my parser up and running fairly easily the first time. I did hit one syntax snag that Don Syme helped me with. I'll blog this more in detail later, but I ported a simple arithmetic grammar packrat parser written in 120 lines of Haskell to about 90 lines of F#. Not bad for a first timer. (Don got it down to 25 lines, using F#'s new Active Patterns feature.)

I gearing up for my second talk, which happens right after lunch. I recorded a Virtual TechEd session this morning with the help of my friend Jon Flanders. It's an 8 minute overview of the Rome project, so it is VERY high level. But anything that helps get the word out I see as a good thing, right?

Posted By Harry Pierson at 8:45 AM Pacific Daylight Time
F# | Microsoft | Rome | TechEd
Change Congress
Recent Bookmarks
Tags .NET Framework (2) __clrtype__ (9) ADO.NET (5) Agile (7) AJAX (3) Architecture (288) Guidance (6) Interop (2) Modelling (61) Patterns (7) Process (4) SOA (94) Web Services (5) ASP.NET (25) Async Messaging (2) Azure (1) Battlestar Galactica (3) BI (2) BizTalk (4) Blogging (117) dasBlog (11) Podcasting (4) BPM (1) C# (11) C++ (4) Capitals (5) CardSpace (3) CLR (2) CodePlex (1) College Football (10) Comedy Central (1) Community (81) Concurrency (6) Consumer Electronics (1) Database (13) Debugger (23) Dependency Injection (2) Development (122) C Plus Plus (1) Embedded (5) Lanugages (42) Media (2) P2P (11) Rotor (1) SharePoint (6) SOP (3) DIY (1) DLR (25) Domain Specific Languages (15) Durable Messaging (5) Dynamic Languages (12) Dynamic Silverlight (1) Education (3) Enterprise 2.0 (1) Entertainment (14) ETech (15) F# (51) Functional Programming (17) Game Development (2) Guidance Automation (3) Hardware (8) HawkCodeBox (1) HawkEye (3) Health (1) Hockey (31) Home Electronics (1) Home Network (5) Hosting API (1) Humor (5) IASA (1) Idempotence (3) infrastructure (5) Instrumentation (4) Integration (2) IronPython (112) IronRuby (16) Java (2) Job (3) Kodu (1) LangNET (2) Lightweight Debugger (5) LINQ (23) Live Framework (3) Live Mesh (2) Lost (1) Master Data Management (1) Media 2.0 (6) Microsoft (31) MIX06 (2) Mobile Phone (1) Monads (5) Morning Coffee (172) Object Oriented (4) Office (5) Open Source (8) Open Space (2) Operations (3) Other (135) Art (1) Books (1) Family (33) Games (18) General Geekery (27) Home Theater (1) Movies (23) Music (20) Politics (3) Society (1) Sports (37) Working at MSFT (19) Parallel Programming (3) Parsing Expression Grammar (16) patterns & practices (2) PDC08 (5) Politics (48) Polyglot (3) PowerPoint (2) PowerShell (39) Presentation (7) Projects (1) HawkWiki (1) Pygments (5) Python (6) Quote of the Day (4) Refactoring (1) Research (2) REST (18) Reuse (5) Robotics (2) Rock Band (4) Rome (5) Ruby (23) Ruby on Rails (1) Sci-Fi (2) Scripting (4) Security (3) Service Broker (14) SharePoint (2) Silverlight (20) Social Software (1) Software + Services (2) Software Design (2) Software Engineering (1) Software Factories (11) Software Industry (1) Space Elevator (1) Spark (1) SQL Server (2) Stephen Colbert (1) TechEd (7) TechEd06 (1) TechRec League (1) Television (6) Travel (7) Unified Client (1) Unit Testing (4) USC (1) UX (1) Virtual PC (2) Visual Basic (3) Visual Studio (20) Volta (2) Washington Capitals (37) WCF (31) Web 2.0 (67) Web Services (7) WF (21) Windows (3) Windows Live (29) Windows Live Writer (3) WPF (8) Xbox (1) Xbox 360 (54) XML (11) XNA (15) Zune (4)
Disclaimer: The information in this weblog is provided "AS IS" with no warranties, and confers no rights. This weblog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. Inappropriate comments will be deleted at the authors discretion.