Wordy

Scala Programming Literate

As mentioned in my previous post, I’ve recently been solving problems from Exercism with a bunch of my colleagues.

In this post, I’m discussing my solution to the Wordy exercise.

It’s embarrassing, but this post completely ignores the advice in the last one.

The last post talked about the importance of introducing concepts incrementally. In contrast, here I go all out, and implement a Parser Combinator Framework from scratch.

I’m really only publishing this because I’d already gone to the effort of adding fairly detailed “literate” comments.

The aim of the game in “Wordy” is to parse expressions that look like the following:

What is 2 plus 8 multiplied by 3?

You’re supposed to ignore BIDMAS, and evaluate the expressions from left to right. So in that case, the correct result would be 30.

I decided to slightly overreach the problem description, and also support parsing some “postfix” operations too, such as:

What is 2 squared?

I’m also going to be fairly religious about only accepting solutions that are well formed.

Parser Combinators

As mentioned above, this solution to “Wordy” starts by building a ((relatively) simple) “Parser Combinator” framework.

To quote a paper on Monadic Parser Combinators, by Graham Hutton and Erik Meijer:

In functional programming, a popular approach to building recursive descent parsers is to model parsers as functions, and to define higher-order functions (or combinators) that implement grammar constructions such as sequencing, choice, and repetition.

In a nutshell, we’re aiming to write a framework that consists of:

Very simple primitive parsers
Functions for composing those parsers together

Once we’ve got those two things, we will be able to compose our simple parsers together to form complicated ones, eventually solving the kata.

This pattern, starting with very simple primitives that are easy to compose, is very common in Functional Programming.

I’d go as far as saying that this principle is central to good Functional Programming Design.

The Framework

`Parser[A]`

A Parser[A] is a “Single Abstract Method” class

“Single Abstract Method” classes are isomorphic to a function. In this case, the function is parse with type String => Option[(String, A)]

You can think of a parser as taking a String, and either:

consuming part of that string, and returning a value
failing, and returning no value

The +A in the type signature means that our parser is “covariant” If we have a Parser[A] and A extends B, we can use our Parser[A] as if it were a Parser[B]. This doesn’t come up very much, so you can mostly ignore it.

trait Parser[+A] { self =>

`parse`

the “Single Abstract Method”.

Every “concrete” Parser must implement parse.

If we were doing this “in production”, we’d probably want to do something more efficient than creating a new string in every single Parser. For example, we could model this by updating an “index” into the string. Fortunately, in our case, none of the strings are going to be particularly large or complex, so we can get away with it.

  def parse(str: String): Option[(String, A)]

`flatMap`

flatMap takes a function from A => Parser[B] and produces a Parser[B] which:

runs this Parser, to produce an A, and consume part of the string
calls the function f(a) to produce a Parser[B]
runs that parser on the remainder of the string

Implementing flatMap on a trait implements the “Monad” pattern.

Amongst other things this lets us write “for comprehensions”, these “de-sugar” into chains of flatMaps

  def flatMap[B](f: A => Parser[B]): Parser[B] =
    new Parser[B] {
      override def parse(str: String): Option[(String, B)] =
        for {
          (s1, a) <- self.parse(str)
          (s2, b) <- f(a).parse(s1)
        } yield (s2, b)
    }

`orElse`

this mimics Scala’s Option.orElse.

It returns the result of “this parser”, if there is one, otherwise it returns the result of the parser in the argument.

  def orElse[AA >: A](p2: Parser[AA]): Parser[AA] =
    new Parser[AA] {
      override def parse(str: String): Option[(String, AA)] =
        self.parse(str).orElse(p2.parse(str))
    }

`map`

The famous “map” function.

This transforms our Parser[A] into a Parser[B] using a function A => B.

We can write it, in terms of flatMap seen above, and Parser.pure (which we’ll get to later).

Parser.pure always returns a constant value, and doesn't consume any part of the string.

  def map[B](f: A => B): Parser[B] = self.flatMap(a => Parser.pure(f(a)))

`replace`

This is really just a special case of map, we’re passing a constant value, rather than a function.

This doesn’t let us do anything that we couldn’t do with parser.map(_ => ..., except that it makes it slightly clearer that we’re discarding the parsed value.

  def replace[B](b: B): Parser[B] = self.map(_ => b)

`emap`

emap is like map, except that our function f can fail, hence it returns an Option.

Much like map we can implement this in terms of flatMap and pure.

We also need Parser.failure, a parser which always fails to parse it’s argument.

  def emap[B](f: A => Option[B]): Parser[B] =
    self.flatMap(f(_) match {
      case Some(value) => Parser.pure(value)
      case None        => Parser.failure
    })

`many`

many returns the result of running this parser over and over again until it eventually fails.

Because the parser may be run multiple times, this produces a Parser[List[A]].

In “strict FP” terms, many comes from the Alternative typeclass.

  def many: Parser[List[A]] =
    (for {
      h <- self
      t <- many
    } yield (h :: t))
      .orElse(Parser.pure(List.empty))
}

`object Parser`

At this point, we’ve finished with the Parser[+A] trait, and we’re now working on it’s companion object.

We’ve now defined all of the methods that are available on a Parser: our “combinators”.

Next we want to build some simple, concrete, parsers.

object Parser {
  import scala.util.matching.Regex
  import scala.util.Try

`Parser.pure`

As mentioned above, Parser.pure always returns a constant value, and doesn’t consume any part of the string.

In this particular case, we only use this in order to implement map in terms of flatMap.

But in general, Monads should have an implementation of both flatMap and pure.

  def pure[A](a: A): Parser[A] =
    new Parser[A] {
      override def parse(str: String): Option[(String, A)] = Some((str, a))
    }

`Parser.failure`

A parser that always fails, and never returns anything.

We only really use this in order to implement emap in terms of flatMap, so you can ignore it from here on out.

In Scala, Nothing is a subtype of every type, so we can use a Parser[Nothing] in place of any other Parser, because our Parser is covariant, as mentioned above`.

  def failure: Parser[Nothing] =
    new Parser[Nothing] {
      override def parse(str: String): Option[(String, Nothing)] = None
    }

`Parser.exactly`

A parser that succeeds only if the string starts with a specific sequence.

It always returns that sequence, consuming it from the input.

This may seem trivial, but it can be composed with other Parsers to solve difficult problems.

  def exactly(start: String): Parser[String] =
    new Parser[String] {
      override def parse(str: String): Option[(String, String)] =
        if (str.startsWith(start)) {
          Some((str.drop(start.length()), start))
        } else {
          None
        }
    }

`Parser.matches`

This is the same as exactly except it takes a Regex instead of a string.

I could probably have written exactly in terms of matches, but this felt simpler, and more efficient.

Even when doing FP, we can still find a use for Regex.

  def matches(regexp: Regex): Parser[String] =
    new Parser[String] {
      override def parse(str: String): Option[(String, String)] =
        regexp
          .findPrefixMatchOf(str)
          .map(aMatch => (aMatch.after.toString(), aMatch.matched))
    }

`Parser.int`

A Parser that will parse and return an integer.

This will fail if our input doesn’t match the regexp or if the conversion to an int fails.

Finally this is looking like it’s approaching a solution to our problem and not just “miscellaneous category theory”

  def int: Parser[Int] =
    matches("-?[0-9]+".r).emap(s => Try { s.toInt }.toOption)

`Parser.spaces`

A parser that matches whitespace.

We’ll generally be discarding the result of this, but it’s useful to indicate that the content of 2 parsers may be separated by whitespace.

  def spaces: Parser[String] = matches("\\s?".r)

`Parser.end`

Parse a string, and fail if the string isn’t empty.

Returns Unit; we can’t get any information from an empty string.

This is useful for running as the last step in a parser. It guarantees that we don’t return success if passed a valid string, followed by garbage.

  def end: Parser[Unit] =
    new Parser[Unit] {
      override def parse(str: String): Option[(String, Unit)] =
        str match {
          case "" => Some(("", ()))
          case _  => None
        }
    }
}

Solving the Actual Problem

From here on out, the code is less “frameworky” and more “targeted to the problem”.

Hence I’m moving from the Parser trait, and companion object and starting to write code in a Wordy object.

There are a number of good Scala Parser Combinator frameworks, and in the wild, I probably wouldn’t have written any of the code up until this point.

object Wordy {

`Wordy.infixOperatorParser`

This is a parser that takes any of the operators, “multiplied by”, “divided by”, “plus” or “minus”.

And parses it into the corresponding “binary” function, in this case a function from Int => Int => Int (note that this is written in “curried form” (you can probably ignore what that means)).

  lazy val infixOperatorParser: Parser[Int => Int => Int] = {
    val multiplied =
      Parser
        .exactly("multiplied by")
        .replace[Int => Int => Int](a => b => a * b)
    val divided =
      Parser.exactly("divided by").replace[Int => Int => Int](a => b => a / b)
    val plus =
      Parser.exactly("plus").replace[Int => Int => Int](a => b => a + b)
    val minus =
      Parser.exactly("minus").replace[Int => Int => Int](a => b => a - b)
    List(multiplied, divided, plus, minus).reduce(_ orElse _)
  }

`Wordy.infixFnParser`

First runs infixOperatorParser to get a binary function.

Then it parses the following int, and returns a function Int => Int obtained by running operator with the int as the second argument.

For example “multiplied by 4” would be parsed into a function _ * 4. and “plus 3” would be parsed into a function _ + 3.

  lazy val infixFnParser: Parser[Int => Int] = for {
    operator <- infixOperatorParser
    _ <- Parser.spaces
    value <- Parser.int
  } yield (a => operator(a)(value))

`Wordy.postfixFnParser`

This is very similar to infixOperatorParser. Except that it implements postfix functions, such as “3 cubed”

  lazy val postfixFnParser: Parser[Int => Int] = List(
    Parser.exactly("squared").replace[Int => Int](a => a * a),
    Parser.exactly("cubed").replace[Int => Int](a => a * a * a)
  ).reduce(_ orElse _)

`Wordy.fnParser`

This matches either “partially applied” infix operators, such as “plus 3”, or postfix operators, such as “squared”. It also consumes any leading spaces

  lazy val fnParser: Parser[Int => Int] =
    Parser.spaces.flatMap(_ => infixFnParser.orElse(postfixFnParser))

`eqnParser`

Parse an initial Int, followed by a sequence of Operators and Ints as Functions Int => Int.

Then apply the functions to the initial value from Left to Right.

It should be noted that if there are no functions, this returns the initial value.

This parses the “-12 divided by 2 divided by -3” part of the problem

  lazy val eqnParser: Parser[Int] = for {
    first <- Parser.int
    functions <- fnParser.many
  } yield functions.foldLeft(first) { (acc, f) => f(acc) }

`Wordy.wordyParser`

Wrap eqnParser, first applying a parser that looks for the “What is” then apply eqnParser. And finally, consumes the trailing “?”, before validating that the entire string has been matched.

  val wordyParser: Parser[Int] = for {
    _ <- Parser.exactly("What is ")
    res <- eqnParser
    _ <- Parser.exactly("?")
    _ <- Parser.end
  } yield res

`Wordy.answer`

Finally, our implementation of answer, just call wordyParser.parse, and get the result.

  def answer(str: String): Option[Int] = wordyParser.parse(str).map(_._2)
}

This blog post is Literate Scala, view the raw source-code here.