Practical F# Parsing: Syntactical Productions (1)

Now that I have my parse buffer functions and unit test framework taken care of, it’s time to write some parsing expression grammar productions 1. In any grammar, there are productions that have semantic meaning and productions that are only used to specify syntax. For example, in C# statements end with a semicolon and code blocks are delineated with curly braces. Syntactical elements don’t end up in the abstract syntax tree, so they’re easier to implement.

Most of PEG’s grammar productions are syntactical in nature. Let’s start with the simplest production, EndOfFile.

///EndOfFile <- !.
let EndOfFile input =
    match NC input with
    | None -> Some()
    | _ -> None

Similar to the underscore in F#, the period represents any character In the PEG grammar and the exclamation point is a backtracking not. In other words, EndOfFile succeeds if NC fails, which only happens when the parse buffer is empty. Since the parse buffer is empty on a successful match, there’s no value to return so I return Some(). Some() is kinda like Nullable, which doesn’t make much sense in C#. But here it means I have a successful match but no data to pass back to the caller.

Here’s a slightly more complex grammar production, EndOfLine:

///EndOfLine <- ‘rn’ / ‘n’ / ‘r’
let EndOfLine input =
    match NC input with
    | Some(‘r’, input) ->
        match NC input with
        | Some (‘n’, input) -> Some(input)
        | _ -> Some(input)
    | Some(‘n’, input) -> Some(input)
    | _ -> None

EndOfLine has three possible matches: rn, n and r. Here, you see the use of nested match statements in the r match clause. If the top character is r, the function checks the top of the remaining text buffer for n. F#’s ability to reuse symbols comes in very handy here.

Unfortunately, this isn’t code very readable. In particular, the match priority order, which matters in PEGs, is totally lost. For this production, it’s no big deal, but that won’t always be the case. I’m also not using the handy TOKEN function I wrote in a few posts ago. Here’s an alternative version that uses TOKEN and preserves the match priority order.

///EndOfLine <- ‘rn’ / ‘n’ / ‘r’
let EndOfLine input =
    match TOKEN “rn” input with
    | Some(input) -> Some(input)
    | _ ->
        match TOKEN “n” input with
        | Some(input) -> Some(input)
        | _ ->
            match TOKEN “r” input with
            | Some(input) -> Some(input)
            | _ -> None

This time, I use the TOKEN function and nested match statements to chain the results together. The previous method is probably faster, but this approach is more readable and explicitly preserves the match order. However, it sure is a pain to type. Wouldn’t it be great if we could write something like this:

///EndOfLine <- 'rn' / 'n' / 'r'
let EndOfLine input =
    match input with
    | TOKEN "rn" (input) -> Some(input)
    | TOKEN "n" (input) -> Some(input)
    | TOKEN "r" (input) -> Some(input)
    | _ -> None

Turns out, F#’s Active Patterns feature let’s me implement EndOfLine exactly like this. We’ll look at how it works in the next post.

Morning Coffee 130

  • Michael Klucher announces the release of XNA Game Studio 2.0 and Major Nelson points to the press release announcing the release. You can get the bits from XNA Creators Club Online (the XNA dev center has yet to be updated).
  • Speaking of XNA, David Weller points out the warm-up challenge for Dream-Build-Play 2008. I assume networking will be a big part of this years’ entries, but the warm-up challenge is to “Create a new and innovative use of Artificial Intelligence in a game”.
  • Still speaking of XNA, Gamasutra has an interview with XNA GM Chris Satchell where he hints at a publishing channel for XNA games on the Xbox 360, with “full details” coming sometime in the new year.
  • The Capitals beat the Rangers in overtime last night. Since changing coaches on Thanksgiving, they’re 6-3-1. That’s great, but they’re still five games under .500. The good news is that even though the Caps tied for last in the league, they’re only six points out of a playoff spot with about fifty games left in the season.
  • My old team puts on an event every year called the Strategic Architects Forum. It’s invite-only, but they’ve posted some of the videos, slides and transcripts from this year’s event.
  • J.D. Meier discusses the new Guidance Explorer release. They’re now up to 3,000 “nuggets” of guidance and they’ve moved the guidance store to MSDN. (via Sam Gentle)
  • Arnon Rotem-Gal-Oz explains further why arbitrary tier-splitting is bad. I’d also suggest reading Chapter 7 of PoEAA which provides another version of the same story: You can’t take an object that’s designed for fine-grained local access and make it remote without really screwing yourself up.
  • Eric Lippert thinks immutable data structures are “the way of the future in C#” so he’s written a series on immutability. Posts include kinds of immutability, an immutable stack, an immutable covariant stack and an immutable queue. As I’ve discussed, immutable data structures are HUGE in functional programming. Eric’s immutable stacks and queues are similar to F#’s native list type. (via Jomo Fisher)