Practical F# Parsing: Recursion and Predicate Functions

To prep for my Lang.NET talk, I went back and reviewed my PEG parser. One thing I wasn’t happy with was that all the recursion was handled in a one-off manner. When I needed to match multiple characters in the comment rule, I wrote a special function to recursively process the comment until it reached an EOL. When I needed to parse a series of ranges, characters or definitions, I wrote more special-purpose functions to handle that recursion. Obviously, that’s not the best approach, so I wrote the following active pattern functions to handle recursion generically.

//ZOM == Zero Or More
let rec (|ZOM|) f input =
    match f input with
    | Some(i,input) ->
        let j,input = (|ZOM|) f input
        (i :: j, input)
    | None -> [], input

//OOM == One Or More
let (|OOM|_|) f input =
    match (|ZOM|) f input with
    | [], input -> None
    | v, input -> Some(v,input)

//ZOO == Zero Or One
let (|ZOO|) f input =
    match f input with
    | Some(i,input) -> Some(i), input
    | None -> None,input
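To see the three functions on their own, here is a minimal, self-contained sketch that runs over a char list. It is not code from PegParser: the Digit, Minus and Integer patterns and the parse helper are hypothetical names made up for this example, and it assumes a current F# compiler (hence List.toArray). The point is that each single-item matcher returns Some(value, remainingInput) or None, which is exactly the shape ZOM, OOM and ZOO expect.

```fsharp
// ZOM == Zero Or More: apply f repeatedly, collecting results until it fails.
// f must consume input whenever it succeeds, or ZOM never terminates.
let rec (|ZOM|) f input =
    match f input with
    | Some(i, input) ->
        let j, input = (|ZOM|) f input
        (i :: j, input)
    | None -> [], input

// OOM == One Or More: like ZOM, but fails unless f matched at least once.
let (|OOM|_|) f input =
    match (|ZOM|) f input with
    | [], _ -> None
    | v, input -> Some(v, input)

// ZOO == Zero Or One: an optional match that never fails.
let (|ZOO|) f input =
    match f input with
    | Some(i, input) -> Some(i), input
    | None -> None, input

// Hypothetical single-item matcher: one ASCII digit at the head of the input.
let (|Digit|_|) (input: char list) =
    match input with
    | c :: rest when c >= '0' && c <= '9' -> Some(c, rest)
    | _ -> None

// Hypothetical single-item matcher: a leading minus sign.
let (|Minus|_|) (input: char list) =
    match input with
    | '-' :: rest -> Some('-', rest)
    | _ -> None

// Hypothetical integer rule: optional '-' (via ZOO), then one or more digits (via OOM).
let (|Integer|_|) = function
    | ZOO (|Minus|_|) (sign, OOM (|Digit|_|) (digits, rest)) ->
        let value = int (System.String(List.toArray digits))
        Some((if Option.isSome sign then -value else value), rest)
    | _ -> None

// Hypothetical helper: parse an integer at the start of a string,
// returning the number and the leftover text.
let parse (s: string) =
    match List.ofSeq s with
    | Integer(n, rest) -> Some(n, System.String(List.toArray rest))
    | _ -> None
```

With these definitions, parse "-42abc" returns Some(-42, "abc"), while parse "abc" and parse "-" both return None because OOM requires at least one digit.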

With these functions at the ready, I can stop writing one-off recursion functions. Instead, I write a function that matches a single item and pass it as an argument to one of the three functions above. For example, here are the original and new versions of the top-level Grammar function.

//Original version
let (|Grammar|_|) input =
    let rec ParseDefinitions dl input =
        match input with
        | Definition (d, input) -> ParseDefinitions (dl @ [d]) input
        | _ -> Some(dl, input)
    let (|OneOrMoreDefinitions|_|) input =
        match input with
        | Definition (d, input) -> ParseDefinitions [d] input
        | _ -> None
    match input with
    | Spacing (OneOrMoreDefinitions (dl, EndOfFile)) ->
          Some(List.toArray dl)
    | _ -> None

//New version
let (|Grammar|_|) = function
    | Spacing (OOM (|Definition|_|) (dl, EndOfFile)) ->
          Some(List.toArray dl)
    | _ -> None

The new version is much shorter because there’s already a function to match a single definition, which I can pass to One Or More (aka OOM). Note that when I pass an active pattern function as a parameter, I have to use its real name, complete with the parentheses, pipes and underscore. Having to use the real name is pretty ugly, but F# needs to be able to tell the difference between using a function as an active pattern and passing it as a function parameter. If you could just write OOM Definition (dl, EndOfFile), would F# realize that Definition is a parameter?
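One possible workaround for the ugly call site, sketched here as an assumption rather than anything PegParser does: bind the active pattern function to an ordinary lowercase value once, then pass that value. Arguments to a parameterized active pattern are treated as expressions, so OOM letter (cs, rest) passes the value letter to OOM instead of binding a new variable. The Letter and Word patterns and the letter and firstWord values are hypothetical names for this example; ZOM and OOM are repeated so the block stands alone.

```fsharp
// ZOM and OOM as defined in the post, repeated so this block stands alone.
let rec (|ZOM|) f input =
    match f input with
    | Some(i, input) ->
        let j, input = (|ZOM|) f input
        (i :: j, input)
    | None -> [], input

let (|OOM|_|) f input =
    match (|ZOM|) f input with
    | [], _ -> None
    | v, input -> Some(v, input)

// Hypothetical single-item matcher: one letter at the head of the input.
let (|Letter|_|) (input: char list) =
    match input with
    | c :: rest when System.Char.IsLetter c -> Some(c, rest)
    | _ -> None

// Hypothetical alias: the real name appears once, bound to a lowercase value...
let letter = (|Letter|_|)

// ...so this hypothetical Word pattern can say "OOM letter" instead of
// "OOM (|Letter|_|)". The first argument to a parameterized active pattern
// is evaluated as an expression, so `letter` refers to the value above.
let (|Word|_|) = function
    | OOM letter (cs, rest) -> Some(System.String(List.toArray cs), rest)
    | _ -> None

// Hypothetical helper: the first word of a string, if it starts with a letter.
let firstWord (s: string) =
    match List.ofSeq s with
    | Word(w, _) -> Some w
    | _ -> None
```

This doesn’t eliminate the real name; it just confines it to the one line that defines the alias, so the patterns that use it read a little more like the grammar.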

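I also defined syntactic predicate functions. As you may recall, a syntactic predicate tries to match but always backtracks, so it never consumes input; whether it succeeds depends on the result of the match and on which predicate you called. The Failure Predicate succeeds only when the match fails (PEG’s ! operator), and the Success Predicate succeeds only when the match succeeds (PEG’s & operator).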

//FP == Failure Predicate
let (|FP|_|) f input =
    match f input with
    | Some(_) -> None
    | None -> Some(input)

//SP == Success Predicate
let (|SP|_|) f input =
    match f input with
    | Some(_) -> Some(input)
    | None -> None
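The following sketch shows that both predicates hand back the input unchanged: they only decide whether matching may continue. Arrow (a matcher for the two characters <-) and check are hypothetical helpers made up for this example, working over a char list rather than PegParser’s real input type.

```fsharp
// FP == Failure Predicate (PEG's !e): succeeds, consuming nothing, when f fails.
let (|FP|_|) f input =
    match f input with
    | Some(_) -> None
    | None -> Some(input)

// SP == Success Predicate (PEG's &e): succeeds, consuming nothing, when f succeeds.
let (|SP|_|) f input =
    match f input with
    | Some(_) -> Some(input)
    | None -> None

// Hypothetical matcher for "<-"; on its own it would consume both characters.
let (|Arrow|_|) (input: char list) =
    match input with
    | '<' :: '-' :: rest -> Some((), rest)
    | _ -> None

// Hypothetical helper: run both predicates over the same input and
// return what each one hands back (None means the predicate failed).
let check (s: string) =
    let input = List.ofSeq s
    let afterFP =
        match input with
        | FP (|Arrow|_|) rest -> Some rest
        | _ -> None
    let afterSP =
        match input with
        | SP (|Arrow|_|) rest -> Some rest
        | _ -> None
    afterFP, afterSP
```

For "<- x", FP fails and SP succeeds with the full, unconsumed input; for "abc" it’s the other way around.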

To see this in action, here are the original and updated versions of the Primary function. Only the first rule is relevant, so I’ve omitted the others.

//Original version
let (|Primary|_|) input =
    let (|NotLEFTARROW|_|) input =
        match input with
        | LEFTARROW (_) -> None
        | _ -> Some(input)
    match input with
    | Identifier (id, NotLEFTARROW (input)) ->
        Some(Primary.Identifier(id), input)
    //rest of function omitted for clarity

//New version
let (|Primary|_|) = function
    | Identifier (id, FP (|LEFTARROW|_|) (input)) ->
          Some(Primary.Identifier(id), input)
    //rest of function omitted for clarity

Instead of writing a special function to match “not left arrow,” I just pass the left arrow function as a parameter to Failure Predicate (aka FP). With these recursion and syntactic predicate functions, I was able to remove all the one-off recursion and predicate functions from my parser. (I’ve posted an updated version of PegParser on my SkyDrive so you can see this in action.)
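Putting OOM and FP together, here is a toy illustration of why the first Primary rule needs the predicate. The Spacing, Letter, Identifier, LEFTARROW, Primary and Definition patterns below are simplified, hypothetical stand-ins (identifiers are runs of letters, spacing is just blanks), not the real PegParser rules, and parseGrammar is a made-up entry point. In “a <- b c d <- e f”, the body of a must stop before d, because d is followed by <- and starts the next definition; FP (|LEFTARROW|_|) is what makes Primary reject d without consuming the arrow.

```fsharp
// ZOM, OOM and FP as defined in the post, repeated so this block stands alone.
let rec (|ZOM|) f input =
    match f input with
    | Some(i, input) ->
        let j, input = (|ZOM|) f input
        (i :: j, input)
    | None -> [], input

let (|OOM|_|) f input =
    match (|ZOM|) f input with
    | [], _ -> None
    | v, input -> Some(v, input)

let (|FP|_|) f input =
    match f input with
    | Some(_) -> None
    | None -> Some(input)

// Hypothetical, simplified rules: spacing is blanks only, identifiers are letters only.
let rec (|Spacing|) (input: char list) =
    match input with
    | ' ' :: rest -> (|Spacing|) rest
    | _ -> input

let (|Letter|_|) (input: char list) =
    match input with
    | c :: rest when System.Char.IsLetter c -> Some(c, rest)
    | _ -> None

let (|Identifier|_|) = function
    | OOM (|Letter|_|) (cs, Spacing rest) -> Some(System.String(List.toArray cs), rest)
    | _ -> None

let (|LEFTARROW|_|) = function
    | '<' :: '-' :: Spacing rest -> Some rest
    | _ -> None

// Hypothetical Primary: an identifier NOT followed by "<-".
// FP checks for the arrow but never consumes it.
let (|Primary|_|) = function
    | Identifier (id, FP (|LEFTARROW|_|) rest) -> Some(id, rest)
    | _ -> None

// Hypothetical Definition: Identifier LEFTARROW Primary+
let (|Definition|_|) = function
    | Identifier (name, LEFTARROW (OOM (|Primary|_|) (body, rest))) -> Some((name, body), rest)
    | _ -> None

// Hypothetical entry point: leading spacing, one or more definitions, end of input.
let parseGrammar (s: string) =
    match List.ofSeq s with
    | Spacing (OOM (|Definition|_|) (defs, [])) -> Some defs
    | _ -> None
```

Here parseGrammar "a <- b c d <- e f" returns Some [("a", ["b"; "c"]); ("d", ["e"; "f"])]. Without the FP check, Primary would also accept d, the first body would swallow it, and the leftover “<- e f” would make parseGrammar return None.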

These five functions significantly reduced the complexity of the code. Unfortunately, I’m not sure the result is much easier to read; IMO, the conciseness is offset by the ugliness of using the active patterns’ real names. I would also have liked to use custom operators for these five functions, but operators aren’t allowed to be active pattern functions. Hopefully that will change at some point. And if we’re going to dream of better syntax, can we do something about all the parens? Personally, I’d love to be able to write the following:

//This doesn't work, but I can dream, can't I?
let (|Primary|_|) = function
    | Identifier (id) !!LEFTARROW (input) ->
        Some(Primary.Identifier(id), input)
    //rest of function omitted for clarity

let (|Grammar|_|) = function
    | Spacing ++Definition (dl) EndOfFile ->
        Some(List.toArray dl)
    | _ -> None

Note to self: talk to the F# team members who come to Lang.NET about this…