Friday, May 5, 2017

A puzzle about game review scoring in Haskell

Everyone knows that video game reviewers tend to hype up new releases, leading to score inflation over time. To fight that, we could ask reviewers to estimate the ordinal rank of each new game (e.g. "this game will be my GOTY" or "this game will be in my top 5 this year") instead of meaningless scores like "8/10". That makes sense because people want to know how a game compares to the best, not to the worst, and whether they should spend money now rather than next month. That way, reviewers can't rank too many games too highly, scores can be compared across years, and everyone's happy.

That leads to an interesting algorithmic problem. Let's say Bob the reviewer played five games in a year, and had this to say about them at the time:

  • "Game A will be in my top 3 games released this year."
  • "Game B will be in my top 2 games released this year."
  • "Game C will be my GOTY."
  • "Game D will be in my top 3 games released this year."
  • "Game E will be in my top 5 games released this year."

That translates to giving games A,B,C,D,E scores 3,2,1,3,5 (lower means better). But somewhere around game D, it seems like Bob's reviews lost credibility, because there are four games in his top 3!

Our problem will be taking a list of such scores, and finding the length of its longest "credible" prefix. For example, for a list like [3,2,1,3,5] above, the answer should be 3.

(Try solving the puzzle yourself if you like. Spoilers below.)

First of all, what does it mean for a list to be "credible"? [1,2,3] is clearly credible, and so is [3,3,3]. But [1,2,2] isn't, because there are three games in the top 2. How do we make that precise?

After thinking for a minute, you realize that a list is credible if its smallest element is at least 1, the next smallest is at least 2, and so on. In other words, if its sorted version is elementwise greater than [0,1,2,...]. So we can easily check credibility in O(n log n):

import Data.List (sort)

credible :: [Int] -> Bool
credible xs = and (zipWith (<) [0..] (sort xs))
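
For example, on prefixes of Bob's list:

credible [3,2,1]    -- True
credible [3,2,1,3]  -- False: the fourth smallest score would need to be at least 4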

That might be optimal for a pure language, though proving that seems like a research problem. If mutation is allowed, we can build a histogram instead of sorting, leading to O(n):

import Control.Monad (forM_)
import Data.Array (elems)
import Data.Array.ST (runSTArray, newArray, readArray, writeArray)

credible2 :: [Int] -> Bool
credible2 xs = and $ zipWith (>=) [0..] $ scanl1 (+) $ elems $ runSTArray $ do
    let n = length xs
    a <- newArray (0, n) 0
    forM_ xs $ \x -> do
      -- clamp into [0, n]: for a list of length n, a score above n acts the
      -- same as n, and a score of 0 or less can never be credible
      let i = max 0 (min n x)
      y <- readArray a i
      writeArray a i (y + 1)
    return a

To find the length of the longest credible prefix, we could just call the previous function for each prefix, leading to O(n^2 log n) without mutation and O(n^2) with mutation:

credibleLength :: [Int] -> Int
credibleLength xs = maximum [n | n <- [0..length xs], credible (take n xs)]

But that does a lot of wasted work, processing the same data over and over again. It's better to find the right prefix length using binary search, leading to O(n log^2 n) without mutation and O(n log n) with mutation:

credibleLength2 :: [Int] -> Int
credibleLength2 xs = grow 1 where
  check n = credible (take n (xs ++ repeat 0))
  grow n = if check n then grow (n * 2) else shrink (div n 2) n
  shrink m n = if m + 1 == n then m else
    let k = div (m + n) 2 in if check k then shrink k n else shrink m k

Note how the binary search has "grow" and "shrink" phases, to keep the complexity dependent on the length of the longest credible prefix instead of the whole list.

But even that's still not optimal. Without mutation, the grow phase takes O(n log n) time, but the shrink phase takes O(n log^2 n) because we're sorting prefixes of the same data over and over. Instead we could sort once and then, on each check, simply ignore the elements whose indices fall outside the prefix being tested. That requires a slightly more complicated algorithm, pairing the elements with their indices before sorting, but lets us achieve O(n log n) without mutation:

credibleLength3 :: [Int] -> Int
credibleLength3 xs = grow 1 where
  grow n = if check n then grow (2 * n) else shrink (div n 2) n where
    cache = sort (take n (zipWith (,) (xs ++ repeat 0) [0..]))
    check k = and (zipWith (<) [0..] (map fst (filter ((<k) . snd) cache)))
    shrink m n = if m + 1 == n then m else
      let k = div (m + n) 2 in if check k then shrink k n else shrink m k

But even that can be improved! If mutation is allowed, we can get very close to O(n) by using a completely different approach, keeping track of occupied intervals in a union-find data structure. The only problem is that it won't work as well with streaming, because we need enough memory for the whole list. But otherwise this ugly imperative-looking code should have the best asymptotic time so far:

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleContexts #-}

import Control.Monad (when)
import Control.Monad.ST (runST)
import Data.Array.ST (STArray, newArray, readArray, writeArray)
import Data.UnionFind.ST (Point, descriptor, fresh, repr, union)

credibleLength4 :: [Int] -> Int
credibleLength4 xs = runST $ do
  let n = length xs
  (arr :: STArray s Int (Maybe (Point s Int))) <- newArray (1, n) Nothing
  -- Invariant: slot i of arr holds a point iff position i is occupied, and
  -- each maximal block of occupied positions forms one union-find class
  -- whose descriptor is the leftmost position of that block.
  let
    update x nothing just = do
      val <- readArray arr x
      maybe nothing just val
    insert x = do
      point <- fresh x
      writeArray arr x (Just point)
      when (x > 1) (update (x - 1) (return ()) (\p -> union point p))
      when (x < n) (update (x + 1) (return ()) (\p -> union p point))
      return True
    -- check x: try to occupy the largest free slot at or below x (clamped
    -- to n); return False if no such slot exists.
    check x = if x < 1 then return False else update y (insert y) just where
      y = min x n
      just point = do
        z <- descriptor point
        if z == 1 then return False else insert (z - 1)
  ys <- mapM check xs
  return (length (takeWhile id ys))

To summarize, without mutation both problems seem to require O(n log n) time, but mutation allows us to get linear time on one and quasilinear on the other. Fun!

UPDATE: in the Reddit discussion users djfletch and foBrowsing have proposed using IntSet to track unoccupied positions, leading to another O(n log n) solution that's very fast in practice. Here's the code, slightly adapted from their proposals:

import Data.IntSet (empty, delete, lookupLE, union, fromList)

credibleLength5 :: [Int] -> Int
credibleLength5 xs = go xs 1 empty where
  go [] _ _ = 0
  go (x:xs) nextHole holes = if x < nextHole
    then maybe 0 (\y -> 1 + go xs nextHole (delete y holes)) (lookupLE x holes)
    else 1 + go xs (x + 1) (union holes (fromList [nextHole .. x - 1]))

It's impressively short, and I still need to learn a lot about Haskell to write like that!
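
For what it's worth, here's the kind of sanity check I'd run, assuming all the snippets above are pasted into one module (the two clashing "union" imports need qualifying first):

main :: IO ()
main = do
  let example = [3, 2, 1, 3, 5]
  -- every implementation should report a longest credible prefix of 3
  mapM_ (\f -> print (f example))
    [ credibleLength, credibleLength2, credibleLength3
    , credibleLength4, credibleLength5 ]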

Saturday, September 20, 2014

A proof of Löb's theorem in Haskell

I just wrote a post about programming on LW: A proof of Löb's theorem in Haskell. There's some discussion there, and also in these two threads on /r/haskell. Many thanks to gelisam, who deserves at least as much credit as I do :-)

Wednesday, September 10, 2014

Even after all these years, writing quines still feels like I'm hacking the universe somehow

Quine's paradox is fascinating:
"Yields falsehood when preceded by its quotation" yields falsehood when preceded by its quotation.
It's a variant of the liar paradox "this sentence is false", but doesn't use explicit self-reference anywhere, only simple text substitution! That reflects the fact that you can write a quine in any programming language, and a program doesn't need any special "self-referential instructions" to print its own source code.
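
For example, here's a classic one-line Haskell quine, which needs nothing fancier than show:

main = putStrLn (s ++ show s) where s = "main = putStrLn (s ++ show s) where s = "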

I just made a simple generator for English sentences that are similar to Quine's paradox. Given an input like this:
I believe in the sentence X
The generator assumes that X stands for "this sentence", and gives you an equivalent sentence without explicit self-reference:
I believe in the sentence "I believe in the sentence X, with the first X replaced by the previous quoted sentence", with the first X replaced by the previous quoted sentence.
Or it could take an input like this:
I don't think X means what you think it means
And give you this:
I don't think "I don't think X, with the first X replaced by the previous quoted sentence, means what you think it means", with the first X replaced by the previous quoted sentence, means what you think it means.
See?

Here's the JavaScript code for the generator; it's very simple:

function quine(phrase) {
  if (phrase.split("X").length != 2) {
    throw "Input phrase must contain exactly one occurrence of X";
  }
  var inner = phrase.replace(
    "X",
    "X, with the first X replaced by the previous quoted sentence" +
    ((phrase.indexOf("X ") == -1) ? "" : ","));
  return inner.replace("X", "\"" + inner + "\"") + ".";
}

Saturday, September 6, 2014

Cylinders in Spheres, Revisited

A couple of months ago, a post called Cylinders in Spheres made the rounds on HN. It asks the question "what's the biggest cylinder that can fit inside a sphere?", and then solves it by working through a whole bunch of equations. I'm not satisfied with that solution, because you can actually solve the problem with geometrical imagination alone. You can do it in your head, in five minutes. Here's how:

1) Imagine a square inscribed in a circle. It's easy to see that the area of the square has a constant ratio to the area of the circle, regardless of the size of the circle. If we project that shape out into the 3rd dimension, we see that the volume of a cylinder has a constant ratio to the volume of the square-based box inscribed in it.

2) That means the biggest cylinder that fits inside a sphere corresponds to the biggest square-based box that can fit inside a sphere. We just need to figure out what that box looks like. From symmetry, it seems likely that the box must be a cube. But how do we prove that?

3) Let's tackle a simpler problem first. What's the biggest rectangle that can fit inside a circle? From symmetry, it's got to be a square, but how do we prove that? Well, the rectangle's diagonal is the diameter of the circle, and cuts the rectangle into two equal triangles. The area of a triangle is proportional to the product of base and height, so the area is maximized when the height is maximized. The height is maximized if the triangle touches the top of the half-circle arc, so we see that the rectangle with maximum area is indeed a square.

4) Now it's easy to tackle the problem with the sphere and the box. If the base of the box is a non-square rectangle, we can note that it's inscribed in a certain circle. We can change it to a square inscribed in the same circle, without affecting the height of the box, and thus increase the volume. That means if the box already has maximum volume, two of its dimensions must be equal. Since we can do the same with every pair of dimensions, we see that all three are equal. Our intuitions were right, the biggest box that fits inside a sphere is a cube! And since a cube has a square base, the biggest cylinder is the one wrapped around that cube.

5) For the finishing touch, let's mentally calculate the cylinder's volume. If the sphere has radius 1, the inscribed cube has vertices at coordinates (±1/√3, ±1/√3, ±1/√3). That means the cylinder has height 2/√3 and the base is a circle with radius √2/√3, which gives a volume of 4π/(3√3). Whee!
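
If you don't trust mental arithmetic, a few lines of Haskell can double-check the answer numerically. Here's a throwaway sketch that just maximizes over the cylinder's half-height on a grid:

-- volume of a cylinder inscribed in a unit sphere, as a function of its
-- half-height h; the base radius is then sqrt (1 - h^2)
cylinderVolume :: Double -> Double
cylinderVolume h = pi * (1 - h ^ 2) * (2 * h)

-- both components come out as roughly 2.4184
checkAnswer :: (Double, Double)
checkAnswer = ( maximum (map cylinderVolume [0.0001, 0.0002 .. 0.9999])
              , 4 * pi / (3 * sqrt 3) )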

Wednesday, July 16, 2014

Codata, co-totality, co-complexity

I've been trying to figure out complexity analysis in Haskell for some time, and now I think I finally understand why it's difficult.

As sigfpe described, there are two universes out there. One is about inductive types and always finite data, and the other is about coinductive types and potentially infinite data. Roughly, the list type in ML is a good example of data, and the list type in Haskell is a good example of codata.

The difference between data and codata affects most things you can say about programs. For example, ML has a concept of totality. The length function on lists is total, because lists are guaranteed to be finite. Haskell, on the other hand, has no useful concept of totality, and the length function is partial. But Haskell has a concept of "co-totality" instead, and the zip function on lists is "co-total", meaning that its result can be pattern-matched to arbitrary depth if the arguments satisfy the same condition. (The idea of "guarded" or "productive" recursion is related to that.)
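
Here's a tiny example of what that buys you:

naturals, squares :: [Integer]
naturals = [0..]               -- infinite, but productive
squares  = map (^ 2) naturals  -- also infinite, also productive

-- zipping two productive lists is productive: any finite prefix of the
-- result can be forced, even though the inputs never end
firstFew :: [(Integer, Integer)]
firstFew = take 5 (zip naturals squares)   -- [(0,0),(1,1),(2,4),(3,9),(4,16)]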

So if the universe and co-universe are duals, there must be a natural concept of "co-complexity", right? Well, yeah. In a genuinely "co-universe" language, it would be easy to reason about time and space complexity, just like in ML and other languages from the "data universe". But it turns out that Haskell is not quite a "co-universe" language!

Giving away the punchline, data and codata don't correspond naturally to eager and lazy evaluation, as you might have guessed they would. Instead, data and codata correspond to call-by-value and call-by-name! There's a big difference between call-by-name and lazy evaluation (call-by-need), namely that the former will evaluate the same expression several times if needed, while the latter uses memoization and graph reduction all over the place.

If Haskell were a pure call-by-name language, we'd be able to define "co-complexity" for individual functions, in terms of how hard it is to evaluate their result to a given depth. That way, complexity analysis would work compositionally, as you'd expect. But with lazy evaluation, accessing the head of a list becomes cheaper the second time you do it, so referential transparency of costs is lost.
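
Here's a tiny GHC-specific illustration, using Debug.Trace purely as a probe:

import Debug.Trace (trace)

expensive :: Integer
expensive = trace "evaluating..." (sum [1 .. 1000000])

main :: IO ()
main = print (expensive + expensive)
-- Under call-by-need this prints "evaluating..." only once: the second use
-- of 'expensive' reuses the memoized result, so the cost of forcing it
-- depends on whether someone forced it before. Under pure call-by-name it
-- would be evaluated (and traced) twice.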

Of course, programming in a pure call-by-name language would most likely be unpleasant, and there are good reasons for lazy evaluation to exist. But it was interesting for me to realize that Haskell is not the "co-universe" twin of a strict language like ML, but rather some sort of hybrid, which is harder to reason about than either of the two "pure" approaches.

In summary, it's not codata that makes it hard to reason about complexity, it's the pervasive memoization.

Friday, June 13, 2014

Yet another lousy monad tutorial

I'm not a big fan of monads, but I understand them. They're not rocket science. Let me try to write a monad tutorial that would've helped my past self understand what the fuss was about. I like concrete explanations that start with practical examples, without any annoying metaphors, and especially without any Haskell code. So here are five examples that have something in common:

1) If a function has type A → B, and another function has type B → C, then we can "compose" them into a new function of type A → C.

2) Let's talk about functions that can return multiple values. We can model these as functions of type A → List<B>. There is a natural way to "compose" two such functions. If one function has type A → List<B>, and another function has type B → List<C>, then we can "compose" them into a new function of type A → List<C>. The "composition" works by joining together all the intermediate lists of values into one. This is similar to MapReduce, which also collects together lists of results returned by individual workers.

3) Let's talk about functions that either return a value, or fail somewhere along the way. We can model these as functions of type A → Option<B>, where Option<B> is a type that contains either a value of type B, or a special value None. There is a natural way to "compose" two such functions. If one function has type A → Option<B>, and another function has type B → Option<C>, then we can "compose" them into a new function of type A → Option<C>. The "composition" works just like regular function composition, except if the first function returns None, then it gets passed along and the second function doesn't get called. This way you can have a "happy path" in your code, and check for None only at the end.

4) Let's talk about functions that call a remote machine asynchronously. We can model these as functions of type A → Promise<B>, where Promise<B> is a type that will eventually contain a value of type B, which you can wait for. There is a natural way to "compose" two such functions. If one function has type A → Promise<B>, and another function has type B → Promise<C>, then we can "compose" them into a new function of type A → Promise<C>. The "composition" is an asynchronous operation that waits for the first promise to return a value, then calls the second function on that value. This is known in some languages as "promise pipelining". It can sometimes make remote calls faster, because you can send both calls to the remote machine in the same request.

5) Let's talk about functions that do input or output in a pure functional language, like Haskell. We can define IO<A> as the type of opaque "IO instructions" that describe how to do some IO and return a value of type A. These "instructions" might eventually be executed by the runtime, but can also be freely passed around and manipulated before that. For example, to create instructions for reading a String from standard input, we'd have a function of type Void → IO<String>, and to create instructions for writing a String to standard output, we'd have String → IO<Void>. There is a natural way to "compose" two such functions. If one function has type A → IO<B>, and another function has type B → IO<C>, then we can "compose" them into a new function of type A → IO<C>. The "composition" works by just doing the IO in sequence. Eventually the whole program returns one huge complicated IO instruction with explicit sequencing inside, which is then passed to the runtime for execution. That's how Haskell works.

Another thing to note is that each of the examples above also has a natural "identity" function, such that "composing" it with any other function F gives you F again. For ordinary function composition, it's the ordinary identity function A → A. For lists, it's the function A → List<A> that creates a single-element list. For options, it's the function A → Option<A> that takes a value and returns an option containing that value. For promises, it's the function A → Promise<A> that takes a value and makes an immediately fulfilled promise out of it. And for IO, it's the function A → IO<A> that doesn't actually do any IO.

At this point we could go all mathematical and talk about how "compose" is like number multiplication, and "identity" is like the number 1, and then go off into monoids and categories and functors and other things that are frankly boring to me. So let's not go there! Whew!

Instead, to stay more on the programming track, let's use a Java-like syntax to define an interface Monad<T> with two methods "identity" and "compose". The five examples above will correspond to five different concrete classes that implement that interface, for different choices of the type parameter T.

The main complication is that the type parameter T must not be a simple type, like String. Instead it must be itself a generic type, like List, Option or Promise. The reason is that we want to have a single implementation of Monad<Option>, not separate implementations like Monad<Option<Integer>>, Monad<Option<String>> and so on. Java and C# don't support generic types whose parameters are themselves generic types (the technical term is "higher-kinded types"), but C++ has some support for them, called "template template parameters". Some functional languages have higher-kinded types, like Haskell, while others don't have them, like ML.

Anyway, here's what it would look like in Java, if Java supported such things:

interface Monad<T> {
  <A> Function<A, T<A>> identity();
  <A, B, C> Function<A, T<C>> compose(
    Function<A, T<B>> first,
    Function<B, T<C>> second
  );
} 

class OptionMonad implements Monad<Option> {
  public <A> Function<A, Option<A>> identity() {
    // Implementation omitted, figure it out
  }
  public <A, B, C> Function<A, Option<C>> compose(
    Function<A, Option<B>> first,
    Function<B, Option<C>> second
  ) {
    // Implementation omitted, do it yourself
  }
}

Defining Monad as an interface allows us to implement some general functionality that will work on all monads. For example, there's a well-known function "liftM" that converts a function of type A → B into a function of type List<A> → List<B>, or Promise<A> → Promise<B>, or anything else along these lines. For different monads, liftM will do different useful things, e.g. liftM on lists is just the familiar "map" function in disguise. The implementation of liftM with lambda expressions would be very short, though a little abstract:

<T, A, B> Function<T<A>, T<B>> liftM(
  Function<A, B> func,
  Monad<T> monad
) {
  return (T<A> ta) -> monad.compose(
    (Void v) -> ta,
    (A a) -> monad.identity().apply(func.apply(a))
  ).apply(null);
}

Or if you don't like lambda expressions, here's a version without them:

<T, A, B> Function<T<A>, T<B>> liftM(
  Function<A, B> func,
  Monad<T> monad
) {
  return new Function<T<A>, T<B>>() {
    public T<B> apply(T<A> ta) {
      return monad.compose(
        new Function<Void, T<A>>() {
          public T<A> apply(Void v) {
            return ta;
          }
        },
        new Function<A, T<B>>() {
          public T<B> apply(A a) {
            return monad.identity().apply(func.apply(a));
          }
        }
      ).apply(null);
    }
  };
}

Monads first became popular as a nice way to deal with IO instructions in a pure language, treating them as first-class values that can be passed around, like in my example 5. Imperative languages don't need monads for IO, but they often face a similar problem of "option chaining" or "error chaining", like in my example 3. For example, the option types in Rust or Swift would benefit from having a "happy path" through the code and a centralized place to handle errors, which is exactly the thing that monads would give you.

Some people think that monads are about side effects, like some sort of escape hatch for Haskell. That's wrong, because you could always do IO without monads, and in fact the first version of Haskell didn't use them. In any case, only two of my five examples involve side effects. Monads are more like a design pattern for composing functions, which shows up in many places. I think the jury is still out on whether imperative/OO languages need to use monads explicitly, but it might be useful to notice them when they appear in your code implicitly.

Thursday, June 12, 2014

Why I don't like laziness

Some people think lazy functional programming is nicer than strict. At first I thought so as well, but now I think strict is nicer. Not for any engineering reasons, but from a purely mathematical point of view.

The reason has to do with equality. I think about the equality operation a lot; it really brings out the important issues in language design. It's become something of a litmus test for me: if some language feature is dragging down the equality operation, then it probably sucks in other ways too. For example, in Haskell you can do this:

data Natural = Zero | Successor Natural deriving Eq
omega :: Natural
omega = Successor omega

So you have this bizarre number "omega" that is not bottom, and you can pattern-match on it to your heart's content. You can compare it for equality with Zero, Successor Zero and so on, and all those comparisons will happily return False. But try to compare omega to itself, and you get an infinite loop!
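
Concretely, a GHCi session with that definition looks something like this:

ghci> omega == Zero
False
ghci> omega == Successor Zero
False
ghci> omega == omega
(hangs forever)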

That's the problem in a nutshell. Since Haskell is lazy, there are no types that support decidable equality at all! So you might as well allow "deriving Eq" for many more types, like Bool -> Bool. Sure, comparing two functions of that type is undecidable, but it's exactly as undecidable as comparing two lazy Bools in the first place :-)
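
For example, here's roughly what an instance for Bool -> Bool could look like (it needs FlexibleInstances):

{-# LANGUAGE FlexibleInstances #-}

-- extensional equality: two functions are equal iff they agree on both
-- inputs; this diverges only if one of the four applications diverges,
-- which is no worse than comparing two lazy Bools
instance Eq (Bool -> Bool) where
  f == g = f True == g True && f False == g False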

Imagine for a moment that Haskell were strict. Then we would know, truly irrevocably and forever, that the types that allow "deriving Eq" are exactly the types that have an always-terminating equality operation, nothing more and nothing less. In a lazy language, "deriving Eq" can't make such a powerful statement and has to settle for something wishy-washy, and that's why I think strict is nicer.