Practical F# Parsing : Syntactical Productions (2)

Now that I’ve moved over to Active Patterns, I want to go back and finish the syntactic productions for my PEG parser. Most of the syntactic productions are very straightforward when implemented in AP. We’ve seen EndOfFile, EndOfLine and Space already. There is also a series of symbol identifiers that have only a single match clause. For example, here’s DOT:

///DOT <- '.' Spacing
let (|DOT|_|) input =
    match input with
    | TOKEN "." (Spacing(input)) -> Some(input)
    | _ -> None

I’m not going to go thru all the symbol AP functions since their all basically like this one. However, you’ll notice that this function references an AP we haven’t seen yet – Spacing. I want to close out the section on Syntactical Productions by looking at the Spacing and Comment productions. Since Spacing depends on Comment, I’ll start with Comment.

Comments in PEG grammars are single lines that start with a # symbol, similar to the // line comments in F# and C#. This is the PEG grammar rule for Comment:

///Comment <- '#' (!EndOfLine .)* EndOfLine

Basically, this says that a comment starts with a #, then matches zero or more characters that are not EndOfLine, and ends with an EndOfLine. The exclamation point is a syntactic predicate, which means that we unconditionally backtrack after attempting to match. PEG has both a success and failure syntactic predicate – the ! is the failure predicate while & is the success predicate. So inside the parens, this production rule says to test the current point in the parse buffer for EndOfLine. If it finds it, the match fails and we exit out of the parens (where we match EndOfLine again without backtracking it this time). If it doesn’t find it, the parser backtracks, consumes the next character regardless what it is, then repeats.

Unfortunately, there’s a bug in this production. If the parse buffer ends in a comment, the production will fail since it hasn’t reached the EndOfLine and there are no more characters to consume. So I changed the production to:

///Comment <- '#' ((!EndOfLine / !EndOfFile) .)* EndOfLine?

This rule now ends the comment if it reaches an EndOfLine or EndOfFile. Additionally, it makes the final EndOfLine match optional. So if the comment ends with a new line, the new line is consumed as part of the grammar production. If the comment ends with the end of file, the EndOfFile is not consumed as part of the production. If you’ll recall, EndOfFile returns Some(unit) rather than Some(char list). In F#, the various branches of a match clause have to have the same return type, so you can’t return Some() from one branch and Some(input) from another. It’s no big deal – you use the EndOfFile production at the top-level grammar to ensure you’ve consumed the entire file anyway.

Here’s the F# implementation of Comment:

Comment defines a local AP function called CommentContent, which implements the part of the grammar production inside the parens.

///Comment <- '#' ((!EndOfLine / !EndOfFile) .)* EndOfLine?
let (|Comment|_|) input =  
    let rec (|CommentContent|_|) input =  
        match input with
        | EndOfLine (input) -> Some(input)
        | EndOfFile -> Some(input)
        | NC (_,input) -> (|CommentContent|_|) input
        | _ -> None
    match input with
    | TOKEN "#" (CommentContent (input)) -> Some(input)
    | _ -> None

Local AP function CommentContent recurses thru the input buffer after the pound sign, looking for  EndOfLine or EndOfFile. This function should never match the final default clause, but I put it in to keep the compiler from complaining. Notice that I use symbol redefinition here so both EndOf match clauses return Some(input). For EndOfLine, I’m re-defining input to mean what is returned by EndOfLine. For EndOfFile, I’m not re-defining, so input still means the list that is passed into the pattern match statement.

Compared to Comment, Spacing is pretty trivial:

///Spacing <- (Space / Comment)*
let rec (|Spacing|) input =  
    match input with
    | Space (input) -> (|Spacing|) input
    | Comment (input) -> (|Spacing|) input
    | _ -> input

There are two things I want to call out about spacing. First, it’s a recursive function, so it’s defined with let rec. AP functions can be recursive, just like normal functions. Also, note the lack of an underscore in the name of this AP function. Spacing is defined as zero or more spaces or comments, so it’s perfectly valid to match nothing. Thus, Spacing is always a successful match. In this case, we don’t put the underscore in the AP function name and we don’t wrap the return result in Some(). You’ll notice the last match clause simply returns input, rather than Some(input).

That’s all the syntactic predicates. Next up, the meat of the grammar: semantic predicates.

Morning Coffee 131

  • On a recommendation from my mother-in-law, I’ve been watching Torchwood. Sort of Men in Black, the series and set in Cardiff. Since it’s made in England, it’ll be one of the few shows still running in the new year due to the WGA strike.
  • A while back I pointed out that many DotNetKicks articles were submitted by their authors. I submitted a few of my own, just for kicks (har har), with mixed results. Today, I discovered that the parse buffer post from my Practical Parsing in F# series was submitted, picked up some kicks, and made it to the home page. That’s pretty cool. I guess writing more dev-focused articles is the way to go to get attention on DNK.
  • Amazon has rolled out a limited beta of SimpleDB, which appears to be S3 + query support. Cost is based on usage: 14¢/hour for machine utilization, 10¢/GB upload, 13-18¢/GB download and $1.50/GB storage/month. I’d love to see SimpleDB software that I could download and install, rather than hosted only. Even if I was going to use the hosted service, I’d like to develop against a non-hosted instance.
  • Research for sale! I was checking out the MS Research download feed and discovered a link to the Automatic Graph Layout (MSAGL) library. This was previously called GLEE (Graph Layout Execution Engine) and was “free for non-commercial use”. Now, you can buy it for $295 from Windows Marketplace (though the previous free version is still available). The idea of directly commercializing research like this strikes me as pretty unusual. It must be a really good library.
  • Scott Guthrie shows off the new Dynamic Data Support that will ship as part of the ASP.NET Extensions. I’m like, whatever. Scaffolding wasn’t that that interesting to me in RoR, so it’s no surprise that it’s not that interesting in ASP.NET.
  • Jeff “Party With” Palermo blogs about the IoC support in the new MVC Contrib project. Also looks like they’re porting RoR’s simply_restful. (via Scott Guthrie)
  • I need to try out some of Tomas Respro’s VS color schemes (also via Scott Guthrie)