Now that I’ve explained the AST, there are several more semantic productions to go. I’m not going to describe them all in detail, just hit a few important highlights.
Many of the semantic productions return lists of other productions. Class returns a list of Ranges, Literal and Identifier returns lists of characters, etc. As you would expect, these multiples are encoded in the grammar. For example, here’s the implementation of Literal:
///Literal <- ['] (!['] Char)* ['] Spacing /// / ["] (!["] Char)* ["] Spacing let (|Literal|_|) input = let rec (|LitChars|_|) delimiter chars input = match input with | TOKEN delimiter (_) -> Some(L2S chars, input) | Char (c, input) -> (|LitChars|_|) delimiter (chars @ [c]) input | _ -> None match input with | TOKEN "'" (LitChars "'" [] (lit, TOKEN "'" (Spacing(input)))) -> Some(lit, input) | TOKEN """ (LitChars """ [] (lit, TOKEN """ (Spacing(input)))) -> Some(lit, input) | _ -> None
I’m using a local recursive function LitChars to retrieve the characters
between the quote delimiters. The quote parameter – i.e. single or
double quote – is passed in as a parameter. I also pass in an empty list
of chars as a parameter. Remember that functional programs keep their
data on the stack, a list parameter is a common way to keep state in a
recursive function. When I match a single non-delimiter character, I add
it to the list with the chars @ [c]
expression. [c] converts a single
value c into a list of one element while the @ operator concatenates to
lists. I’m not sure adding the value to he end like that is a good idea
perf wise. Programming
Erlang recommends only adding
items to the head then reversing the list when you’re done matching. But
F# isn’t Erlang, so I’m not sure what the guidance is here.
Another thing you find in productions is the backtracking syntactic predicates. We saw an example of them in the implementation of Comment. Often, their used to indicate the end of a list of other productions, such as Literal, above. However, sometimes, they’re used to ensure the correct production is matched. For example, a Primary can be an Identifier, as long as it’s not followed by a left arrow. An identifier followed by a left arrow indicates a Definition.
///Primary <- Identifier !LEFTARROW /// / OPEN Expression CLOSE /// / Literal / Class / DOT let rec (|Primary|_|) input = let (|NotLEFTARROW|_|) input = match input with | LEFTARROW (_) -> None | _ -> Some(input) match input with | Identifier (id, NotLEFTARROW (input)) -> Some(Primary.Identifier(id), input) | OPEN ( Expression (exp, CLOSE (input))) -> Some(Primary.Expression(exp), input) | Literal (lit, input) -> Some(Primary.Literal(lit), input) | Class (cls, input) -> Some(Primary.Class(cls), input) | DOT (input) -> Some(Primary.Dot, input) | _ -> None
Here, I need a way to match the absence of LEFTARROW, so I’ve build a simple local function called NotLEFTARROW. This isn’t very clean IMO – I’d rather have a used a custom operator like !!! and &&& for my backtracking predicates. But I haven’t figured out how to use custom operators as Active Patterns. I was able to write a standard non-operator AP function, but then I have to use the full AP function name. Here’s a version of Primary written that way:
///Backtracking failure predicate let (|NotPred|_|) f input = match f input with | Some (_) -> None | _ -> Some(input) let rec (|Primary|_|) input = match input with | Identifier (id, NotPred (|LEFTARROW|_|) (input)) -> Some(Primary.Identifier(id), input) //Other matches omited
Frankly, I don’t think that’s very readable, so I didn’t implement it that way. If I can figure out how to use custom operators and pass around AP functions without using their full ugly name, I’ll change it.
Finally, there are a few things about F#’s scoping rules that you need to understand. F# uses linear scoping, which is to say there’s no way to use a type or function that hasn’t been declared, sort of like C/C++. The difference is that while C/C++ have a way to declare a type or function separately from its implementation, F# has no such capacity. This becomes an issue when you have circular references. For example, Primary can be an Expression, which is a list of SequenceItems, each of which is a Primary with an optional prefix and suffix. In order to declare those in F#, you have to use a special “and” syntax to link the types/functions together.
//ToString and Exp2Str omitted for clarity type Primary = | Identifier of string | Expression of Expression | Literal of string | Class of Range list | Dot //ToString omitted for clarity and SequenceItem = { primaryItem: Primary; itemPrefix: Prefix option; itemSuffix: Suffix option; } and Sequence = SequenceItem list and Expression = Sequence list
Likewise, the AP functions to recognize Primary, SequenceItem, Sequence and Expression are anded together. For me, this is one of the hardest things to get used to about F#. But as you can see from the expressiveness of the code, it’s well worth the trouble