Passion * Technology * Ruthless Competence

Wednesday, April 09, 2008

Morning Coffee 162

  • Another nice thing about the new job: I'm working in the vicinity of some good friends. I was over in building 42 yesterday and made it a point to stop by Pat Helland's office yesterday and spend an hour or so chatting about the new gig. Pat is down the hall from David Hill, whom I worked with on Architecture Strategy. Back in my building, we're down the hall from the VSX folks including my friends Ken Levy and Gareth Jones. I'm sure there are more folks I know around, but hey it's only my second week!
  • I'm a big fan of Carbonite, which I use to back up all the digital media on my home computer. With two little kids, we have lots of digital photos as you might imagine . However, one thing that bugs me about Carbonite is that it doesn't back up video files by default, you have to go in on a folder by folder basis and select "'Back up Video files in this folder" from the context menu. Given how much trouble this "feature" has given me, I imagine less techie folks don't even realize their video files aren't getting backed up. However, I will say the latest version of the Carbonite Software at least makes it easy to find files that aren't backed up. A quick sweep revealed around a dozen folders that had un-backed-up video files in them, which I promptly fixed.
  • The big news yesterday was the new Google App Engine, which looks to give you access to virtualized infrastructure that sounds similar to what GOOG is rumored to use internally. I like Dave Winer's comment that this enables "shrinkwrap net apps that scale that can be deployed by civillians." Given Google's history w/ Python - Python's BDFL Guido van Rossum works there - it's no surprise that Google App Engine (GAE?) runs on Python, though apparently they "look forward to supporting more languages in the future". I'm guessing "more languages" == Ruby, maybe Erlang too.
  • I wonder if/how Google App Engine will affect Ruby on Rails momentum? If there's a significant lag before App Engine supports Ruby, will that drive developers to Python web stacks like Django? (Django is included in "the box" with App Engine)?@ PyCon, I was surprised at the intra-language animosity I observed. I wonder how many Python developers are secretly hoping Google never ships Ruby support. I highly doubt Google would do that - they want to tap the exploding RoR market like everyone else - but I'd bet it would really take the wind out of Rails' sails if they did.
  • Today's Michael Foord Link: Embedding IronPython 2, Examples of the DLR Hosting API. You can read the DLR Hosting spec, but it's pretty out of date so Michael's article helps fill in some of the gaps.
  • Looks like PowerShell has gotten the open source community treatment in a project called Pash. While I'm sure others are excited about PS on Linux or Mac, I'm excited to see PS running on Compact Framework. I wonder if it would work with XNA?
  • Speaking of XNA, XNA Console is a new CodePlex project that provides an IPy console to manipulate your XNA based game on the fly. Python is no stranger to game development - Civ IV for example provided mod capabilities via python. Alas, the compact framework can't run IPy today, so neither can XNA on Xbox. But wouldn't it be cool to hack your game in IPy running on a 360 using the messenger kit? (via IPy URLs)
  • Bart De Smet gets functional, writing type switch and pattern matching in C# 3.0. I guess it works, but it sure is ugly. Why not just use F# and be done with it?
  • Soma announces that the VC++ Feature Pack has shipped. Somewhere, I assume, there is much (some?) rejoicing.
Posted By Harry Pierson at 10:20 AM Pacific Daylight Time

Monday, March 10, 2008

Morning Coffee 157

  • My Xbox 360 started flashing the dreaded Red Ring of Death on Friday. <sigh> I'm not going to have much time to play in the next week, so it's not the end of the universe, but I did have to dig an old DVD player out of the garage for interim duty.
  • My Caps really stepped in it over the weekend dropping two games they had to have and by most reports (aka according to my dad) that they dominated most of the way. Caps Playoff Math isn't as dire as say Clinton's Nomination Math, but they are three games back of the Hurricanes with twelve to play.
  • Ted Neward has a pretty good F# overview article in the most recent MSDN Magazine. I say pretty good because I wonder if someone with no functional programming experience will "get it". As much as I like F# and functional programming, I think some of the basic concepts don't pass Don Box's two beer test.
  • Speaking of Ted, somehow his feed fell off my radar (bad DevHawk!) and I missed several great posts like Modular Toolchains (note to Ted, check out A Research C# Compiler), Why we need both static and dynamic in the same language (note to self, check out Cobra) and The Fallacies Remain.... (recently, I'm the guy shouting about risks).
  • Speaking of MSDN Magazine, have you seen their new site redesign? I can't find any announcement of it, but man the site looks great.
  • If you missed MIX, the sessions are all online already. That was fast.
  • John Lam blogs about the availability of the Dynamic Silverlight bits. Apparently, Dynamic Silverlight includes more recent bits than the Silverlight 2 SDK, which does includes binaries and tools for IronPython, IronRuby and Managed JScript (quickstart). So you can get started with dynamic languages on Silverlight using the SL SDK alone, but I expect that the Dynamic Silverlight bits will be updated more regularly than the SDK.
Posted By Harry Pierson at 8:59 AM Pacific Standard Time

Tuesday, March 04, 2008

Play Ball Script in F#

Scott Hanselman wanted an F# version of this Play Ball! round-robin scheduling PowerShell script. Here's what I came up with:

let randomize list =
    let random = new System.Random()
    let list'= 
        list 
        |> List.map (fun i -> (random.Next(), i))
        |> List.sort (fun (i1,_) (i2,_) -> Int32.compare i1 i2) 
    let (_,list'') = List.unzip list'
    list''

let rec makeGames teams =
    match teams with
    | first :: rest ->
        [for team in rest -> (first, team)] @ (makeGames rest)
    | [] ->  []

let teams = ['a'..'f']
let games = teams |> makeGames |> randomize

MakeGames uses pattern matching to see if the list of teams is empty or not. If the list is empty, we simply return an empty list of games. If not, we use F#'s list comprehension syntax to generate a list of games between the first team in the list and each of the remaining teams. This list is combined (via the '@' operator) with the results of calling makeGames recursively. This generates the un-randomized list of games.

Once we have the list of games, we need to randomize it. I ported the randomize function above over from a version I found in Erlang. Basically, it attaches a random number to each element in the list to be randomized, sorts by those randomly generated numbers, then throws the numbers away and returns the just the randomized list. Note, the Erlang version I referenced runs randomize log(length of list) times to ensure a fair shuffle, but I thought once would be enough for this simple script.

(Note to F# team: perhaps List.randomize should be a part of the standard F# library?)

Posted By Harry Pierson at 12:58 PM Pacific Standard Time

Tuesday, January 29, 2008

Practical F# Parsing: Recursion and Predicate Functions

To prep for my Lang.NET talk, I went back and reviewed my PEG parser. One thing I was not happy with was that all the recursion was handled in a one-off manner. When I needed to match multiple characters in the comment rule, I wrote a special one-off function to recursively process the comment until it reached an EOL. When I needed to parse a series of ranges, characters or definitions, I wrote special one-off functions to handle that recursion. Obviously, that's not the best approach. So, I wrote the following active pattern functions to handle recursion.

//ZOM == Zero Or More   
let rec (|ZOM|) f input = 
    match f input with 
    | Some(i,input) -> 
        let j,input = (|ZOM|) f input
        (i :: j, input)
    | None -> [], input

//OOM == One Or More       
let (|OOM|_|) f input = 
    match (|ZOM|) f input with
    | [], input -> None
    | v, input -> Some(v,input)
    
//ZOO == Zero Or One
let (|ZOO|) f input = 
    match f input with 
    | Some(i,input) -> Some(i), input
    | None -> None,input

With these functions at the ready, I can stop writing one-off recursion functions. Instead, I write a function that matches a single item, which I pass as an argument to one of the three functions above. For example, here is the original and new version of the top level Grammar function.

//Original version
let (|Grammar|_|) input =
    let rec ParseDefinitions dl input =
        match input with
        | Definition (d, input) -> 
            ParseDefinitions (dl @ [d]) input
        | _ -> Some(dl, input)
    let (|OneOrMoreDefintions|_|) input = 
        match input with
        | Definition (d, input) -> ParseDefinitions [d] input
        | _ -> None
    match input with
    | Spacing (OneOrMoreDefintions (dl, EndOfFile)) -> 
        Some(List.to_array dl)
    | _ -> None
    
//New Version   
let (|Grammar|_|) = function
    | Spacing (OOM (|Definition|_|) (dl, EndOfFile)) ->
        Some(List.to_array dl)
    | _ -> None

The new version is much shorter, because there's already a function to match a single definition, which we can pass into OneOrMore (aka OOM). Note, when I pass an active pattern function as a parameter, I have to use it's real name (with the pipes and parameters). Having to use the real name is pretty ugly, but F# need to be able to differentiate between using a function as an active pattern vs using it as a function parameter. If you could just call "OOM Definition (dl, EndOfFile)", would F# realize Definition is a parameter?

I also defined syntactic predicate functions. If you'll recall, these syntactic predicates will try to match but automatically backtrack, returning success or failure depending on which function you called.

//FP == Failure Predicate
let (|FP|_|) f input =
    match f input with
    | Some(_) -> None
    | None -> Some(input)

//SP == Success Predicate
let (|SP|_|) f input =
    match f input with
    | Some(_) -> Some(input)
    | None -> None

To see this in action, here's the original and updated Primary function. Only the first rule is relevant, so I've omitted the others.

//Original version
let (|Primary|_|) input = 
    let (|NotLEFTARROW|_|) input = 
        match input with 
        | LEFTARROW (_) -> None 
        | _ -> Some(input) 
    match input with 
    | Identifier (id, NotLEFTARROW (input)) ->  
        Some(Primary.Identifier(id), input) 
    //rest of function omitted for clarity

//new version
let (|Primary|_|) = function 
    | Identifier (id, FP (|LEFTARROW|_|) (input)) -> 
        Some(Primary.Identifier(id), input) 
    //rest of function omitted for clarity

Instead of writing a special function to match "not left arrow", I just pass the left arrow function as a parameter to Failure Predicate (aka FP). With these recursion and syntactic predicate functions, I was able to remove all the one-off recursion functions from my parser. (Note, I posted an updated version of PegParser on my SkyDrive so you can see this in action.)

These five functions significantly reduced the complexity of the code. Unfortunately, I'm not sure it's much easier to read. The conciseness is offset IMO by the ugliness of using the active pattern's true names. Also, I would have liked to use custom operators for these five functions, but operators aren't allowed to be active pattern functions. Hopefully, that will change at some point in the future, though if we're going to dream of better syntax, can we do something about all the parens? Personally, I'd love to be able to write the following:

//This doesn't work, but I can dream, can't I?
let (|Primary|_|) = function  
    | Identifier (id) !!LEFTARROW (input) -> 
        Some(Primary.Identifier(id), input)  
    //rest of function omitted for clarity
   
let (|Grammar|_|) = function 
    | Spacing ++Definition (dl) EndOfFile -> 
        Some(List.to_array dl) 
    | _ -> None

Note to self, talk to F# team members who come to LangNET about this...

Posted By Harry Pierson at 9:40 AM Pacific Standard Time

Wednesday, January 09, 2008

Morning Coffee 137

  • Note, I somehow duplicated Morning Coffee 135. So I've skipped 136 to make up for it.
  • Congrats to Hillary Clinton for her unexpected win in the New Hampshire primary. As I said last week, I think Obama has a better chance of winning in November, but I've got nothing against Clinton or her politics.
  • Speaking of winning, congrats to LSU on winning the BCS. Are they the best team in college football? Personally, I don't think so - there are at least three other teams (Georgia, West VA and of course USC) who can make a persuasive argument that they should be #1. But losing to teams like Penn Pitt and Stanford, neither WVA and USC have an argument they should have been in the championship game. But that's what makes the BCS such BS. If nothing else, at least the "we need a playoff" meme is picking up steam.
  • This is sort of cool: Eye-fi is a wireless enabled SD card so you can wirelessly upload pictures from your camera to your PC or favorite photo service. However, I think the price needs to come down a bit. I recently bought a 2GB SD card for my wife's new camera for $20. A 2GB Eye-fi card is $99. Not sure wireless upload is worth 5x per card.
  • With all the focus on LINQ providing type-safe queries, it's easy to forget that some apps do need to build their queries at run time. Scott Guthrie points at a Dynamic LINQ C# sample (also available for VB) that builds LINQ expression trees from strings. It kinda takes you back to the bad-old-days of embedding SQL strings in your code, but there are scenarios - especially BI scenarios - where you need this capability.
  • Soma announces the VC++ 2008 Feature Pack Beta. This is the long-awaited (by who?) MFC update as well as support for the C++ TR1. TR1 provides some FP-esque support like function objects and tuples, so maybe this is worth a look. On the other hand, given that much (all?) of TR1 is lifted from Boost, maybe we should just use that.
  • Speaking of cool libraries, check out C5 (aka the Copenhagen Comprehensive Collection Classes for C#). It's basically a complete redesign of System.Collections.Generic (or SCG as they call it). I've read thru their online book and I'm very impressed. Of course, with me focused on F# of late, I'm primarily using immutable collections, so I'm not sure how much use I have for C5 right now.
  • There was a free CoDe magazine in my DevTeach bag back in November with a fascinating article on where LINQ goes from here - LINQ 2.0 if you will. One of things the article discusses is tier-splitting, which has seen the light of day in Volta. Will Volta also deliver External Relationships, Reshaping Combinators and Join Patterns or will those come from different projects?
  • I had to pave my workstation yesterday. I was running an interim build of Vista x64 SP1 and I couldn't make Virtual Server work with it. As part of the repave, I discovered I needed to update the firmware of my SCSI controller, but the update had to run under DOS. Freaking DOS? My workstation doesn't even have a floppy drive to boot DOS from! However, I was able to boot from a USB thumb disk instead. That's damn useful.

Friday, January 04, 2008

Morning Coffee 135

  • Congrats to Barack Obama for walking away with the Iowa Democratic Caucus, which set turnout records. Frankly, I'm pretty cool with any of the democratic front runners but I think Obama has the best chance of winning in November. I'm not sure Edwards second time around will be any more successful than the last and I believe Clinton would drive the GOP GOTV campaign better than any of the actual GOP candidates would.
  • Obviously, I like to play M-rated games like Bioshock and Mass Effect. But I also like games I can play with my kids like Lego Star Wars. There are two new Lego games coming out this year: Lego Indiana Jones and Lego Batman. I can't wait.
  • Speaking of gaming, Xbox LIVE had some issues over the holiday break, due to record setting sign-ups and concurrent users. Record setting numbers is a nice problem to have if you're on the business side, but a not-so-nice if you're a customer or work in operations. The XBL GM announced they're offering a "token of appreciation" for everyone's patience - a free XBLA game. Assuming it's not a crappy game, it's a classy move.
  • I watched Transformers on HD-DVD last night. Fun movie with lots of action, but man is it dumb. John Turturro is the only real stand-out.
  • Dustin Campbell implements cons, cdr and car from Scheme in C# and VB. While of limited production value (Dustin specifically warns readers not to use any of his code), it really demonstrates how different the functional world is from the object/imperative one, right down to the concept of type. Cons doesn't return a tuple, it returns function with two bound variables. (via DNK)
Posted By Harry Pierson at 10:00 AM Pacific Standard Time

Thursday, December 13, 2007

Practical F# Parsing: Syntactical Productions (1)

Now that I have my parse buffer functions and unit test framework taken care of, it's time to write some parsing expression grammar productions [1]. In any grammar, there are productions that have semantic meaning and productions that are only used to specify syntax. For example, in C# statements end with a semicolon and code blocks are delineated with curly braces. Syntactical elements don't end up in the abstract syntax tree, so they're easier to implement.

Most of PEG's grammar productions are syntactical in nature. Let's start with the simplest production, EndOfFile.

///EndOfFile <- !.
let EndOfFile input = 
    match NC input with
    | None -> Some()
    | _ -> None

Similar to the underscore in F#, the period represents any character In the PEG grammar and the exclamation point is a backtracking not. In other words, EndOfFile succeeds if NC fails, which only happens when the parse buffer is empty. Since the parse buffer is empty on a successful match, there's no value to return so Ireturn Some(). Some() is kinda like Nullable<void>, which doesn't make much sense in C#. But here it means I have a successful match but no data to pass back to the caller.

Here's a slightly more complex grammar production, EndOfLine:

///EndOfLine <- '\r\n' / '\n' / '\r'   
let EndOfLine input = 
    match NC input with
    | Some('\r', input) ->
        match NC input with
        | Some ('\n', input) -> Some(input)
        | _ -> Some(input)
    | Some('\n', input) -> Some(input)
    | _ -> None

EndOfLine has three possible matches: \r\n, \n and \r. Here, you see the use of nested match statements in the \r match clause. If the top character is \r, the function checks the top of the remaining text buffer for \n. F#'s ability to reuse symbols comes in very handy here.

Unfortunately, this isn't code very readable. In particular, the match priority order, which matters in PEGs, is totally lost. For this production, it's no big deal, but that won't always be the case. I'm also not using the handy TOKEN function I wrote in a few posts ago. Here's an alternative version that uses TOKEN and preserves the match priority order.

///EndOfLine <- '\r\n' / '\n' / '\r'   
let EndOfLine input = 
    match TOKEN "\r\n" input with
    | Some(input) -> Some(input)
    | _ -> 
        match TOKEN "\n" input with
        | Some(input) -> Some(input)
        | _ -> 
            match TOKEN "\r" input with
            | Some(input) -> Some(input)
            | _ -> None

This time, I use the TOKEN function and nested match statements to chain the results together. The previous method is probably faster, but this approach is more readable and explicitly preserves the match order. However, it sure is a pain to type. Wouldn't it be great if we could write something like this:

///EndOfLine <- '\r\n' / '\n' / '\r'   
let EndOfLine input =  
    match input with 
    | TOKEN "\r\n" (input) -> Some(input) 
    | TOKEN "\n" (input) -> Some(input) 
    | TOKEN "\r" (input) -> Some(input) 
    | _ -> None 

Turns out, F#'s Active Patterns feature let's me implement EndOfLine exactly like this. We'll look at how it works in the next post.


[1] The full PEG grammar is listed in "Parsing Expression Grammars: A Recognition-Based Syntactic Foundation" by Brian Ford.

Posted By Harry Pierson at 11:08 AM Pacific Standard Time

Morning Coffee 130

  • Michael Klucher announces the release of XNA Game Studio 2.0 and Major Nelson points to the press release announcing the release. You can get the bits from XNA Creators Club Online (the XNA dev center has yet to be updated).
  • Speaking of XNA, David Weller points out the warm-up challenge for Dream-Build-Play 2008. I assume networking will be a big part of this years' entries, but the warm-up challenge is to "Create a new and innovative use of Artificial Intelligence in a game".
  • Still speaking of XNA, Gamasutra has an interview with XNA GM Chris Satchell where he hints at a publishing channel for XNA games on the Xbox 360, with "full details" coming sometime in the new year.
  • The Capitals beat the Rangers in overtime last night. Since changing coaches on Thanksgiving, they're 6-3-1. That's great, but they're still five games under .500. The good news is that even though the Caps tied for last in the league, they're only six points out of a playoff spot with about fifty games left in the season.
  • My old team puts on an event every year called the Strategic Architects Forum. It's invite-only, but they've posted some of the videos, slides and transcripts from this year's event.
  • J.D. Meier discusses the new Guidance Explorer release. They're now up to 3,000 "nuggets" of guidance and they've moved the guidance store to MSDN. (via Sam Gentle)
  • Arnon Rotem-Gal-Oz explains further why arbitrary tier-splitting is bad. I'd also suggest reading Chapter 7 of PoEAA which provides another version of the same story: You can't take an object that's designed for fine-grained local access and make it remote without really screwing yourself up.
  • Eric Lippert thinks immutable data structures are "the way of the future in C#" so he's written a series on immutability. Posts include kinds of immutability, an immutable stack, an immutable covariant stack and an immutable queue. As I've discussed, immutable data structures are HUGE in functional programming. Eric's immutable stacks and queues are similar to F#'s native list type. (via Jomo Fisher)

Wednesday, December 12, 2007

Practical F# Parsing: Unit Testing

Now that I have functions to access the parse buffer, I better write some tests for them. Yes, I realize I should have written the tests first, but the articles flow better this way.

I've written before that one of the benefits to side-effect free functional programming is that it makes unit testing a breeze. But I still need some type of xUnit testing framework. I could write my own native F# xUnit framework, but given the availability of mature xUnit frameworks on .NET, I'd really rather just use one of them.

Traditionally, I've used NUnit or Visual Studio's unit testing framework, but they're designed to work with OO languages like C#. In order to use them from F#, we have to use the OO features of F#. Here's an example of some F# unit tests using NUnit.

type [<TestFixture>] parser_tests =   
    class   
        new () = {}   
          
        [<Test>]  
        member this.test_NC() =   
            let Some(c,text) = NC !!"test" 
            Assert.Equal(c, 't')   
            Assert.Equal(text, !!"est")

        [<Test>]  
        member this.test_NC_empty_string() =   
            let ret = NC !!"" 
            Assert.Equal(None, ret)
    end

While this works, there's an awful lot of extraneous text needed to make this work. Test functions need to be methods on a Test Fixture class (note, F# uses [< >] to indicate attributes) and that class needs a default constructor. F# doesn't add one by default, so we have to do so manually. And every test function needs to be marked with "member this".

I'd really rather write tests that looks like this:

[<Test>]   
let test_NC =    
    let Some(c,text) = NC !!"test" 
    Assert.Equal(c, 't')   
    Assert.Equal(text, !!"est")

[<Test>]   
let test_NC_empty_string =    
    let ret = NC !!"" 
    Assert.Equal(None, ret)

That's a lot more straightforward. If only I could write my test code like that...

It turns out there's a new kid on the .NET unit testing block. xUnit.net is the brainchild of Jim Newkirk (one of the original NUnit developers) and Brad Wilson (aka the .NET Guy). Among other things, xUnit.net does away with the TestFixture attribute. All public methods in all public classes are checked for tests in xUnit.net.

Since we don't need the TestFixture, does that mean I can write the tests as F# functions if I use xUnit.net? Not quite. xUnit.net only checks for public instance methods on the public classes in a test assembly. But F# functions get compiled as static methods. Luckily, xUnit.net is simple enough to change. I submitted a patch to xUnit.net that enables it to find both static and instance test methods (and to skip creating and disposing an object instance for static test methods). I'm hoping it will be accepted and ship as part of their next release. Until then, I'm running my own private build with those changes included.

Now that I've settled on a unit test framework, let's look at some tests. For my parser solution, I have two projects: PegParser and PegParser.Tests. The tests project depends both on the PegParser assembly as well as xunit.dll, so I need to set a reference to both in my project. F# projects in VS don't have the References node in the project tree, you have to either add the references on the project property page or directly within the code. Not sure which is better, but it's easier to show the code syntax:

#R @"..\..\xUnit.net\Main\xunit\bin\Debug\xunit.dll"
#R @"..\PegParser\pegparser.dll"

open Xunit
open Parser

The #R compiler directive is used to reference an external assembly. F#'s open statement acts like C#'s using statement, so I can reference types without specifying their full namespace. You'll notice that the parser is implemented in a dll called pegparser.dll while the namespace is simply Parser. Actually, it's not really a namespace. If you open PegParser.dll in Reflector, you'll notice that Parser is actually a type, and the functions are all implemented as static methods. F# hides that from you, though you'd have to know that if you wanted to invoke the parser from C# or VB.net. By default, F# uses the filename as the namespace/class name and I haven't changed that default in my parser code (though I probably should).

Once we've referenced the correct assemblies, I need to write the tests. Here are two tests for NC (aka Next Char) function I wrote in the last post.

[<Fact>]   
let test_NC_empty_string () =    
    let ret = NC !!""  
    Assert.Equal(None, ret) 

[<Fact>]   
let test_NC_simple_string () =    
    let Some(c,text) = NC !!"test"  
    Assert.Equal(c, 't')    
    Assert.Equal(text, !!"est") 

You'll notice this code is almost identical to my wish test code above. Almost. There are a few syntax changes - In xUnit.net, tests are called facts and Assert.AreEqual is simply Assert.Equal. I've also had to add empty parentheses after each test function name. Remember that functions in FP are like math functions. If there's no independent value to pass in, the result of a math function is is constant. F# compiles parameter-less functions as static properties instead of a static methods. In order to make the test functions compile as static methods, we have to pass in at least one parameter. In F#, the unit type is the equivalent of the void type in C#. Unit has exactly one valid value - the empty parens. Adding the empty parens to the parameter list of the test functions ensures they get compiled into static methods.

Note, it's really really easy to forget to add those empty parens. If you don't add them, the code will still compile, but the tests won't be found or run. I've been bit by that once already, so if you have a bunch of tests not running, make sure they have the empty parens!

So each test method feeds a parse buffer (converted from a string with the custom !! operator) into the NC function and makes assertions about the result. In the first test, NC should return None if the parse buffer is empty, so I simply compare the function result to None via Assert.Equals. In the second test, I use F#'s pattern matching capability to assign the result of NC to the value Some(c,text). Basically, this is doing simple pattern matching to bind the two value names to the two parts of the tuple returned from NC. Those two values are then asserted to be a specific values as you would expect.

Note, in the current version of F#, the line let Some(c,text) = NC !!"test" yields two warnings. The first (FS0062) warns that a future version of the language will require parens around Some(c,text). I sure hope they change their minds on this, since active patterns are already so parens-heavy. The second (FS0025) warns that this is an incomplete pattern match. If NC returns None, the pattern wont match and the function will throw a MatchFailureException. Of course, since these are unit tests, that's exactly the behavior I want! Given the nature of these warnings, personally, I turn them both off (only in my unit tests, mind you). This is done via the #nowarn compiler directives at the top of the file.

#nowarn "25" //Turn off Incomplete Pattern Match warning
#nowarn "62" //Turn off Some contruct needs parens warning

Obviously, there are more tests than just these (though my total code coverage is pretty poor, shame on me) but they're all pretty similar. As you can see, there's tests are very straight forward. The nature of FP functions makes them fairly simple to test, and xUnit.net (with a couple of minor changes) makes it easy to write your unit tests in F#.

Posted By Harry Pierson at 11:40 AM Pacific Standard Time

Tuesday, December 11, 2007

Practical F# Parsing: The Parse Buffer

The first thing I need for my F# parser is a type to represent the buffer of characters being parsed. For now, I'm using the simplest thing that could possibly work: an intrinsic F# character list. Lists are heavily used in functional languages, so they tend to have very efficient native list types. F# is no exception. Along with a native list type, F# has a native operation for retrieving the head of a list. For example, If you execute the following code, head will be bound to '1' and tail will be bound to the list ['2';'3';'4']

let head :: tail = ['1';'2';'3';'4']

The problem using the native list head operator is that my parsing code will be explicitly bound to work on character lists only. However, I'm not sure an in-memory character list is the best way to read long files of text off the disk for processing, so I'd rather not limit future options like this. So instead, I'll define my own function to retrieve the head of the parsing buffer.

let NC input =  
    match input with     
    | head :: tail -> Some(head, tail)    
    | _ -> None

Note, in all my F# code I use the #light syntax which means code blocks are indicated by significant whitespace indentation similar to Python.

NC stands for Next Character, though I gave it a short name since it's used often. It's basically wraps the native list head operator. If there's at least one element in the list, the first clause matches and the function returns Some(head,tail). If the list is empty, the second clause matches and the function returns None. The use of Some and None means this function returns an F# option type, which is similar to .NET's Nullable type. (head,tail) is a tuple - in this case, combining the head and the tail of the list together. Finally, the underscore in the second match clause is a wildcard, so that clause matches anything. I'm using it here like the default clause of a switch statement in C#.

The F# type for this function is 'a list -> ('a * 'a list) option. The 'a is a generic type parameter and the asterisk indicates a tuple. So NC takes a generic list, and returns an option tuple with a single generic element and a generic list. Even though the function is named Next Character, it would work with a list of any type.

Now that I have my own list head function, the rest of my parsing code can it. If I later decide to change the type of the parse buffer, I can change the implementation of NC - including the input and return types - without having to change the parsing functions that use NC. For example, here's a different implementation where the input parameter is a .NET string instead of a char list.

let NC (input : string)
    if input.Length > 0     
        then Some(input.Chars(0), input.Substring(1))     
        else None

Since I'm calling methods on the input parameter, I need to explicitly tell F# what type it is. The F# type for this function is string -> (char * string) option, which is obviously different from the type definition of the original NC version. But F#'s type inference automatically adjusts to handle the change in the type so functions that call NC don't have to change. FP languages like F# handle list operations extremely efficiently, so this version of NC is probably a step in the wrong direction. However, it's good to know I can easily encapsulate the details our the parse buffer type away from the rest of my parsing code.

Here's another function I'll use in parsing, defined entirely in terms of NC.

let TOKEN token input = 
    let rec ParseToken token input = 
        match token, NC input with  
        | t :: [], Some(i, input) when i = t ->
            Some(input) 
        | t :: token, Some(i, input) when i = t ->
            ParseToken token input 
        | _ -> None 
    ParseToken (List.of_seq token) input

The TOKEN function checks to see if the token string is at the front of the input parse buffer. If so, it returns the remainder of the buffer after the token. If not, it returns None. It's defined entirely in terms of NC, so it works the same with both the versions of NC I've written so far. However, depending on the implementation of NC, I might rewrite TOKEN. For example, if I was using the string version of NC, I'd probably rewrite this function to use String.StartsWith instead of recursion.

TOKEN defines a local recursive function called ParseToken. It's very common to see local functions defined inside other functions in FP, similar to how classes can define local classes in OO languages like C#. ParseToken recursively compares the characters in the token string with the characters in the input buffer, finishing when either it reaches the end of the token string or there's a mismatch. By default, functions in F# can't call themselves recursively by default, so ParseToken is declared to be recursive by using "let rec" instead of simply "let".

ParseToken shows off something interesting about the way symbols are used in F#. Remember that F# values are immutable by default. Yet, the symbols "token" and "input" appear to change. In the match statement, token and input represent the values passed into the function. However, in the match clauses, token and input represent the tail of those two lists. Technically, the values didn't change, F# allows you to reuse the symbols. If I wanted to avoid reusing symbols, I could define ParseToken this way (though I find this much less readable):

let rec ParseToken token input = 
    match token, NC input with  
    | t :: [], Some(i, itail) when i = t ->  
        Some(itail) 
    | t :: ttail, Some(i, itail) when i = t ->  
        ParseToken ttail itail 
    | _ -> None

Other than declaring ParseToken, the TOKEN function is a single line of code. It simply calls ParseToken, converting the token parameter into a char list along the way. While lists are very efficient, which would you rather type?

let token = TOKEN ['f';'o';'o'] input
let token = TOKEN "foo" input

Personally, I like the second version. I'm sure there's a slight perf hit to convert from a string to a list, but frankly I value readability over performance so I used strings for tokens. TOKEN uses the List.of_seq function to do the conversion. Seq is F#'s name for IEnumerable. Technically, TOKEN would work with any IEnumerable<char> type. However, in my source code, it's always going to be a string.

I also used List.of_seq to define a helper function String to Parse Buffer (aka S2PB) that converts a string into a character list. I use function often in the test code.

let S2PB input = List.of_seq input

If I were to change the input type of NC, I'd also change the implementation of S2PB so that it still took in a string but returned whatever the correct input type for NC.

The one problem with S2PB function is that you have to use it with parentheses almost all the time. If I write NC S2PB "foo", F# can't tell if I'm calling NC with two parameters or passing the result of S2PB "foo" to NC. Since NC is strongly typed to have only one input parameter, you might have thought it could automatically determine the calling order, but it doesn't.

I could make the function calls explicit with parenthesis by writing NC (S2PB "foo"). F# also provides a pipe operator, so I could pipe the result of S2PB into NC by writing S2PB "foo" |> NC. I didn't like either of those solutions, so instead I defined a custom unary operator !! as an alias. The parameter binding rules are different for custom operators because I can write NC !! "foo" without piping or parenthesis.

let (!!) input = S2PB input

So now I've got three functions and a custom operator that completely abstract away the parse buffer type. As long a my parsing functions only uses these functions to access the parse buffer, I'll be able to later change the buffer type without affecting my parsing code at all.

Posted By Harry Pierson at 1:18 PM Pacific Standard Time

Tuesday, December 04, 2007

Functional Understanding

I was showing some of my cool (well, I think it's cool) F# parsing code to some folks @ DevTeach. I realized very quickly that a) most mainstream developers are fairly unaware of functional programming and b) I suck at explaining why functional programming matters. So I decided to take another stab at it. I probably should have posted this before my recent series on F#, but better late than never I suppose.

Right off the bat, the term "functional" is confusing. When you say "function" to a mainstream developer, they hear "subroutine". But when you say "function" to a mathematician, they hear "calculation". Functions in functional programming (aka FP) are closer to the mathematic concept. If you think about math functions, they're very different than subroutines. In particular, math functions have no intrinsic mutable data. If you have a math function like f(x) = x3, f(7) always equals 343, no matter how many times you call it. This is very different then a function like String.Length() where the value returned depends on the value of the string.

Another interesting aspect of math-style functions is that they have no side-effects. When you call StringBuilder.Append(), you're changing the internal state of the StringBuilder object. But FP functions don't work like that. Providing the same input always provides the same output (i.e. the same independent variable always yields the same dependent value).

If you're a .NET developer, this may sound strange, but you're probably very familiar with the String class which works exactly the same way.

A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable.

A String object is called immutable (read-only) because its value cannot be modified once it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification.

In other words, all variables in FP are a lot like .NET Strings. In fact, in many FP languages, variables are actually called "values" because they don't, in fact, vary.

It turns out that this approach to programming has significant upside for unit testing and concurrency. Unit tests typically spend a significant effort getting the objects they're testing into the right state to invoke the function under test. In FP, the result of a function is purely dependent on the values passed into it, which makes unit testing very straight forward. For concurrency, since functions don't share mutable state, there's no need to do complicated locking across multiple processors.

But if values don't vary, how to we managed application state? FP apps typically maintain their state on the stack. For example, my F# parser starts with a string input and return an abstract syntax tree. All the data is passed between functions on the stack. However, for most user-oriented non-console applications, keeping all state on the stack isn't realistic.  As Simon Peyton Jones points out, "The ultimate purpose of running a program is invariably to cause some side effect: a changed file, some new pixels on the screen, a message sent, or whatever." So all FP languages provide some mechanism for purposefully implementing side effects, some (like Haskell) stricter in their syntax than others.

One of the nice things about F#'s multi-paradigm nature is that side effects is a breeze. Of course, that's both a blessing and a curse, since the much of the aforementioned upside comes from purposefully building side-effect free functions. But the more I work with F#, the more I appreciate the ability to do both functional as well as imperative object-oriented operations in the same language. For example, my parsing code so far is purely functional - it takes in a string to be parsed and returns an AST. But the logical next step would be to generate output based on that AST. Since F# supports non-functional code - not to mention the rich Base Class Library - generating output should be straightforward.

Posted By Harry Pierson at 5:25 PM Pacific Standard Time
DevHawk
World Tour 2008
DevDays 2008

Change Congress
Recent Bookmarks