As mentioned in my previous post, I’ve recently been solving problems from Exercism with a bunch of my colleagues.
In this post, I’m discussing my solution to the Wordy exercise.
It’s embarrassing, but this post completely ignores the advice in the last one.
The last post talked about the importance of introducing concepts incrementally. In contrast, here I go all out, and implement a Parser Combinator Framework from scratch.
I’m really only publishing this because I’d already gone to the effort of adding fairly detailed “literate” comments.
The aim of the game in “Wordy” is to parse expressions that look like the following:
What is 2 plus 8 multiplied by 3?
You’re supposed to ignore BIDMAS,
and evaluate the expressions from left to right. So in that case, the correct result would be 30
.
I decided to slightly overreach the problem description, and also support parsing some “postfix” operations too, such as:
What is 2 squared?
I’m also going to be fairly religious about only accepting solutions that are well formed.
Parser Combinators
As mentioned above, this solution to “Wordy” starts by building a ((relatively) simple) “Parser Combinator” framework.
To quote a paper on Monadic Parser Combinators, by Graham Hutton and Erik Meijer:
In functional programming, a popular approach to building recursive descent parsers is to model parsers as functions, and to define higher-order functions (or combinators) that implement grammar constructions such as sequencing, choice, and repetition.
In a nutshell, we’re aiming to write a framework that consists of:
- Very simple primitive parsers
- Functions for composing those parsers together
Once we’ve got those two things, we will be able to compose our simple parsers together to form complicated ones, eventually solving the kata.
This pattern, starting with very simple primitives that are easy to compose, is very common in Functional Programming.
I’d go as far as saying that this principle is central to good Functional Programming Design.
The Framework
Parser[A]
A Parser[A]
is a “Single Abstract Method” class
“Single Abstract Method” classes are isomorphic to a function.
In this case, the function is parse
with type String => Option[(String, A)]
You can think of a parser as taking a String, and either:
- consuming part of that string, and returning a value
- failing, and returning no value
The +A
in the type signature means that our parser is “covariant”
If we have a Parser[A]
and A extends B,
we can use our Parser[A]
as if it were a Parser[B]
.
This doesn’t come up very much, so you can mostly ignore it.
trait Parser[+A] { self =>
parse
the “Single Abstract Method”.
Every “concrete” Parser must implement parse
.
If we were doing this “in production”, we’d probably want to do something more efficient than creating a new string in every single Parser
.
For example, we could model this by updating an “index” into the string.
Fortunately, in our case, none of the strings are going to be particularly large or complex, so we can get away with it.
def parse(str: String): Option[(String, A)]
flatMap
flatMap
takes a function from A => Parser[B]
and produces a Parser[B]
which:
- runs
this
Parser, to produce an A, and consume part of the string - calls the function
f(a)
to produce aParser[B]
- runs that parser on the remainder of the string
Implementing flatMap
on a trait implements the “Monad” pattern.
Amongst other things this lets us write “for comprehensions”,
these “de-sugar” into chains of flatMap
s
def flatMap[B](f: A => Parser[B]): Parser[B] =
new Parser[B] {
override def parse(str: String): Option[(String, B)] =
for {
(s1, a) <- self.parse(str)
(s2, b) <- f(a).parse(s1)
} yield (s2, b)
}
orElse
this mimics Scala’s Option.orElse
.
It returns the result of “this parser”, if there is one, otherwise it returns the result of the parser in the argument.
def orElse[AA >: A](p2: Parser[AA]): Parser[AA] =
new Parser[AA] {
override def parse(str: String): Option[(String, AA)] =
.parse(str).orElse(p2.parse(str))
self}
map
The famous “map” function.
This transforms our Parser[A]
into a Parser[B]
using a function A => B
.
We can write it, in terms of flatMap
seen above,
and Parser.pure
(which we’ll get to later).
Parser.pure
always returns a constant value,
and doesn't
consume any part of the string.
def map[B](f: A => B): Parser[B] = self.flatMap(a => Parser.pure(f(a)))
replace
This is really just a special case of map, we’re passing a constant value, rather than a function.
This doesn’t let us do anything that we couldn’t do with parser.map(_ => ...
,
except that it makes it slightly clearer that we’re discarding the parsed value.
def replace[B](b: B): Parser[B] = self.map(_ => b)
emap
emap
is like map, except that our function f
can fail, hence it returns an Option
.
Much like map
we can implement this in terms of flatMap
and pure
.
We also need Parser.failure
, a parser which always fails to parse it’s argument.
def emap[B](f: A => Option[B]): Parser[B] =
.flatMap(f(_) match {
selfcase Some(value) => Parser.pure(value)
case None => Parser.failure
})
many
many
returns the result of running this parser over and over again
until it eventually fails.
Because the parser may be run multiple times, this produces a Parser[List[A]]
.
In “strict FP” terms, many
comes from the Alternative
typeclass.
def many: Parser[List[A]] =
(for {
<- self
h <- many
t } yield (h :: t))
.orElse(Parser.pure(List.empty))
}
object Parser
At this point, we’ve finished with the Parser[+A]
trait, and we’re now working on it’s companion object.
We’ve now defined all of the methods that are available on a Parser
: our “combinators”.
Next we want to build some simple, concrete, parsers.
object Parser {
import scala.util.matching.Regex
import scala.util.Try
Parser.pure
As mentioned above, Parser.pure
always returns a constant value,
and doesn’t consume any part of the string.
In this particular case, we only use this in order to implement map
in terms of flatMap
.
But in general, Monad
s should have an implementation of both flatMap
and pure
.
def pure[A](a: A): Parser[A] =
new Parser[A] {
override def parse(str: String): Option[(String, A)] = Some((str, a))
}
Parser.failure
A parser that always fails, and never returns anything.
We only really use this in order to implement emap
in terms of flatMap,
so you can ignore it from here on out.
In Scala, Nothing
is a subtype of every type, so we can use a Parser[Nothing]
in place of any other Parser, because our Parser is covariant, as mentioned above`.
def failure: Parser[Nothing] =
new Parser[Nothing] {
override def parse(str: String): Option[(String, Nothing)] = None
}
Parser.exactly
A parser that succeeds only if the string starts with a specific sequence.
It always returns that sequence, consuming it from the input.
This may seem trivial, but it can be composed with other Parsers to solve difficult problems.
def exactly(start: String): Parser[String] =
new Parser[String] {
override def parse(str: String): Option[(String, String)] =
if (str.startsWith(start)) {
Some((str.drop(start.length()), start))
} else {
None
}
}
Parser.matches
This is the same as exactly
except it takes a Regex
instead of a string.
I could probably have written exactly
in terms of matches
,
but this felt simpler, and more efficient.
Even when doing FP, we can still find a use for Regex
.
def matches(regexp: Regex): Parser[String] =
new Parser[String] {
override def parse(str: String): Option[(String, String)] =
regexp.findPrefixMatchOf(str)
.map(aMatch => (aMatch.after.toString(), aMatch.matched))
}
Parser.int
A Parser that will parse and return an integer.
This will fail if our input doesn’t match the regexp or if the conversion to an int fails.
Finally this is looking like it’s approaching a solution to our problem and not just “miscellaneous category theory”
def int: Parser[Int] =
matches("-?[0-9]+".r).emap(s => Try { s.toInt }.toOption)
Parser.spaces
A parser that matches whitespace.
We’ll generally be discarding the result of this, but it’s useful to indicate that the content of 2 parsers may be separated by whitespace.
def spaces: Parser[String] = matches("\\s?".r)
Parser.end
Parse a string, and fail if the string isn’t empty.
Returns Unit
; we can’t get any information from an empty string.
This is useful for running as the last step in a parser. It guarantees that we don’t return success if passed a valid string, followed by garbage.
def end: Parser[Unit] =
new Parser[Unit] {
override def parse(str: String): Option[(String, Unit)] =
match {
str case "" => Some(("", ()))
case _ => None
}
}
}
Solving the Actual Problem
From here on out, the code is less “frameworky” and more “targeted to the problem”.
Hence I’m moving from the Parser
trait, and companion object
and starting to write code in a Wordy
object.
There are a number of good Scala Parser Combinator frameworks, and in the wild, I probably wouldn’t have written any of the code up until this point.
object Wordy {
Wordy.infixOperatorParser
This is a parser that takes any of the operators, “multiplied by”, “divided by”, “plus” or “minus”.
And parses it into the corresponding “binary” function,
in this case a function from Int => Int => Int
(note that this is written in “curried form”
(you can probably ignore what that means)).
lazy val infixOperatorParser: Parser[Int => Int => Int] = {
val multiplied =
Parser
.exactly("multiplied by")
.replace[Int => Int => Int](a => b => a * b)
val divided =
Parser.exactly("divided by").replace[Int => Int => Int](a => b => a / b)
val plus =
Parser.exactly("plus").replace[Int => Int => Int](a => b => a + b)
val minus =
Parser.exactly("minus").replace[Int => Int => Int](a => b => a - b)
List(multiplied, divided, plus, minus).reduce(_ orElse _)
}
Wordy.infixFnParser
First runs infixOperatorParser
to get a binary function.
Then it parses the following int,
and returns a function Int => Int
obtained by running operator with the int as the second argument.
For example “multiplied by 4” would be parsed into a function _ * 4
.
and “plus 3” would be parsed into a function _ + 3
.
lazy val infixFnParser: Parser[Int => Int] = for {
<- infixOperatorParser
operator <- Parser.spaces
_ <- Parser.int
value } yield (a => operator(a)(value))
Wordy.postfixFnParser
This is very similar to infixOperatorParser
.
Except that it implements postfix functions,
such as “3 cubed”
lazy val postfixFnParser: Parser[Int => Int] = List(
Parser.exactly("squared").replace[Int => Int](a => a * a),
Parser.exactly("cubed").replace[Int => Int](a => a * a * a)
).reduce(_ orElse _)
Wordy.fnParser
This matches either “partially applied” infix operators, such as “plus 3”, or postfix operators, such as “squared”. It also consumes any leading spaces
lazy val fnParser: Parser[Int => Int] =
Parser.spaces.flatMap(_ => infixFnParser.orElse(postfixFnParser))
eqnParser
Parse an initial Int, followed by
a sequence of Operators and Ints as Functions Int => Int
.
Then apply the functions to the initial value from Left to Right.
It should be noted that if there are no functions, this returns the initial value.
This parses the “-12 divided by 2 divided by -3” part of the problem
lazy val eqnParser: Parser[Int] = for {
<- Parser.int
first <- fnParser.many
functions } yield functions.foldLeft(first) { (acc, f) => f(acc) }
Wordy.wordyParser
Wrap eqnParser
, first applying a parser that looks for the “What is”
then apply eqnParser
.
And finally, consumes the trailing “?”,
before validating that the entire string has been matched.
val wordyParser: Parser[Int] = for {
<- Parser.exactly("What is ")
_ <- eqnParser
res <- Parser.exactly("?")
_ <- Parser.end
_ } yield res
Wordy.answer
Finally, our implementation of answer
,
just call wordyParser.parse
, and get the result.
def answer(str: String): Option[Int] = wordyParser.parse(str).map(_._2)
}
This blog post is Literate Scala, view the raw source-code here.