Saturday, September 20, 2014

A proof of Löb's theorem in Haskell

I just wrote a post about programming on LW: A proof of Löb's theorem in Haskell. There's some discussion there, and also in these two threads on /r/haskell. Many thanks to gelisam, who deserves at least as much credit as I do :-)

Wednesday, September 10, 2014

Even after all these years, writing quines still feels like I'm hacking the universe somehow

Quine's paradox is fascinating:
"Yields falsehood when preceded by its quotation" yields falsehood when preceded by its quotation.
It's a variant of the liar paradox "this sentence is false", but doesn't use explicit self-reference anywhere, only simple text substitution! That reflects the fact that you can write a quine in any programming language, and a program doesn't need any special "self-referential instructions" to print its own source code.

I just made a simple generator for English sentences that are similar to Quine's paradox. Given an input like this:
I believe in the sentence X
The generator assumes that X stands for "this sentence", and gives you an equivalent sentence without explicit self-reference:
I believe in the sentence "I believe in the sentence X, with the first X replaced by the previous quoted sentence", with the first X replaced by the previous quoted sentence.
Or it could take an input like this:
I don't think X means what you think it means
And give you this:
I don't think "I don't think X, with the first X replaced by the previous quoted sentence, means what you think it means", with the first X replaced by the previous quoted sentence, means what you think it means.
See?

Here's the JavaScript code for the generator, it's very simple:

function quine(phrase) {
  if (phrase.split("X").length != 2) {
    throw "Input phrase must contain exactly one occurrence of X";
  }
  var inner = phrase.replace(
    "X",
    "X, with the first X replaced by the previous quoted sentence" +
    ((phrase.indexOf("X ") == -1) ? "" : ","));
  return inner.replace("X", "\"" + inner + "\"") + ".";
}

Saturday, September 6, 2014

Cylinders in Spheres, Revisited

A couple months ago, a post called Cylinders in Spheres made the rounds on HN. It asks the question "what's the biggest cylinder that can fit inside a sphere?", and then solves it by doing a whole bunch of equations. I'm not satisfied with that solution, because in truth you can solve the problem by using geometrical imagination alone. You can do it in your head, in five minutes. Here's how:

1) Imagine a square inscribed in a circle. It's easy to see that the area of the square has a constant ratio to the area of the circle, regardless of the size of the circle. If we project out that shape into the 3rd dimension, we see that the volume of a cylinder has a constant ratio to the volume of a box inside the cylinder.

2) That means the biggest cylinder that fits inside a sphere corresponds to the biggest box that can fit inside a sphere. We just need to figure out how that box looks like.  From symmetry, it seems likely that the box must be a cube. But how do we prove that?

3) Let's tackle a simpler problem first. What's the biggest rectangle that can fit inside a circle? From symmetry, it's got to be a square, but how do we prove that? Well, the rectangle's diagonal is the diameter of the circle, and cuts the rectangle into two equal triangles. The area of a triangle is proportional to the product of base and height, so the area is maximized when the height is maximized. The height is maximized if the triangle touches the top of the half-circle arc, so we see that the rectangle with maximum area is indeed a square.

4) Now it's easy to tackle the problem with the sphere and the box. If the base of the box is a non-square rectangle, we can note that it's inscribed in a certain circle. We can change it to a square inscribed in the same circle, without affecting the height of the box, and thus increase the volume. That means if the box already has maximum volume, two of its dimensions must be equal. Since we can do the same with every pair of dimensions, we see that all three are equal. Our intuitions were right, the biggest box that fits inside a sphere is a cube! And the biggest cylinder is the cylinder that's wrapped around that cube.

5) For the finishing touch, let's mentally calculate the cylinder's volume. If the sphere has radius 1, the inscribed cube has vertices at coordinates ±1/√3, ±1/√3, ±1/√3. That means the cylinder has height 2/√3 and the base is a circle with radius √2/√3, which gives a volume of 4π/3√3. Whee!

Wednesday, July 16, 2014

Codata, co-totality, co-complexity

I've been trying to figure out complexity analysis in Haskell for some time, and now I think I finally understand why it's difficult.

As sigfpe described, there are two universes out there. One is about inductive types and always finite data, and the other is about coinductive types and potentially infinite data. Roughly, the list type in ML is a good example of data, and the list type in Haskell is a good example of codata.

The difference between data and codata affects most things you can say about programs. For example, ML has a concept of totality. The length function on lists is total, because lists are guaranteed to be finite. Haskell, on the other hand, has no useful concept of totality, and the length function is partial. But Haskell has a concept of "co-totality" instead, and the zip function on lists is "co-total", meaning that its result can be pattern-matched to arbitrary depth if the arguments satisfy the same condition. (The idea of "guarded" or "productive" recursion is related to that.)

So if the universe and co-universe are duals, there must be a natural concept of "co-complexity", right? Well, yeah. In a genuinely "co-universe" language, it would be easy to reason about time and space complexity, just like in ML and other languages from the "data universe". But it turns out that Haskell is not quite a "co-universe" language!

Giving away the punchline, data and codata don't correspond naturally to eager and lazy evaluation, as you might have guessed. Instead, data and codata correspond to call-by-value and call-by name! There's a big difference between call-by-name and lazy evaluation (call-by-need), namely that the former will evaluate the same expression several times if needed, while the latter uses memoization and graph reduction all over the place.

If Haskell was a pure call-by-name language, we'd be able to define "co-complexity" for individual functions, in terms of how hard it is to evaluate their result to a given depth. This way, complexity analysis would work compositionally, as you'd expect. But with lazy evaluation, accessing the head of a list becomes cheaper the second time you do it. This way, referential transparency of costs is lost.

Of course, programming in a pure call-by-name language would most likely be unpleasant, and there are good reasons for lazy evaluation to exist. But it was interesting for me to realize that Haskell is not the "co-universe" twin of a strict language like ML, but rather some sort of hybrid, which is harder to reason about than either of the two "pure" approaches.

In summary, it's not codata that makes it hard to reason about complexity, it's the pervasive memoization.

Friday, June 13, 2014

Yet another lousy monad tutorial

I'm not a big fan of monads, but I understand them. They're not rocket science. Let me try to write a monad tutorial that would've helped my past self understand what the fuss was about. I like concrete explanations that start with practical examples, without any annoying metaphors, and especially without any Haskell code. So here's five examples that have something in common:

1) If a function has type A → B, and another function has type B → C, then we can "compose" them into a new function of type A → C.

2) Let's talk about functions that can return multiple values. We can model these as functions of type A → List<B>. There is a natural way to "compose" two such functions. If one function has type A → List<B>, and another function has type B → List<C>, then we can "compose" them into a new function of type A → List<C>. The "composition" works by joining together all the intermediate lists of values into one. This is similar to MapReduce, which also collects together lists of results returned by individual workers.

3) Let's talk about functions that either return a value, or fail somewhere along the way. We can model these as functions of type A → Option<B>, where Option<B> is a type that contains either a value of type B, or a special value None. There is a natural way to "compose" two such functions. If one function has type A → Option<B>, and another function has type B → Option<C>, then we can "compose" them into a new function of type A → Option<C>. The "composition" works just like regular function composition, except if the first function returns None, then it gets passed along and the second function doesn't get called. This way you can have a "happy path" in your code, and check for None only at the end.

4) Let's talk about functions that call a remote machine asynchronously. We can model these as functions of type A → Promise<B>, where Promise<B> is a type that will eventually contain a value of type B, which you can wait for. There is a natural way to "compose" two such functions. If one function has type A → Promise<B>, and another function has type B → Promise<C>, then we can "compose" them into a new function of type A → Promise<C>. The "composition" is an asynchronous operation that waits for the first promise to return a value, then calls the second function on that value. This is known in some languages as "promise pipelining". It can sometimes make remote calls faster, because you can send both calls to the remote machine in the same request.

5) Let's talk about functions that do input or output in a pure functional language, like Haskell. We can define IO<A> as the type of opaque "IO instructions" that describe how to do some IO and return a value of type A. These "instructions" might eventually be executed by the runtime, but can also be freely passed around and manipulated before that. For example, to create instructions for reading a String from standard input, we'd have a function of type Void → IO<String>, and to create instructions for writing a String to standard output, we'd have String → IO<Void>. There is a natural way to "compose" two such functions. If one function has type A → IO<B>, and another function has type B → IO<C>, then we can "compose" them into a new function of type A → IO<C>. The "composition" works by just doing the IO in sequence. Eventually the whole program returns one huge complicated IO instruction with explicit sequencing inside, which is then passed to the runtime for execution. That's how Haskell works.

Another thing to note is that each of the examples above also has a natural "identity" function, such that "composing" it with any other function F gives you F again. For ordinary function composition, it's the ordinary identity function A → A. For lists, it's the function A → List<A> that creates a single-element list. For options, it's the function A → Option<A> that takes a value and returns an option containing that value. For promises, it's the function A → Promise<A> that takes a value and makes an immediately fulfilled promise out of it. And for IO, it's the function A → IO<A> that doesn't actually do any IO.

At this point we could go all mathematical and talk about how "compose" is like number multiplication, and "identity" is like the number 1, and then go off into monoids and categories and functors and other things that are frankly boring to me. So let's not go there! Whew!

Instead, to stay more on the programming track, let's use a Java-like syntax to define an interface Monad<T> with two methods "identity" and "compose". The five examples above will correspond to five different concrete classes that implement that interface, for different choices of the type parameter T.

The main complication is that the type parameter T must not be a simple type, like String. Instead it must be itself a generic type, like List, Option or Promise. The reason is that we want to have a single implementation of Monad<Option>, not separate implementations like Monad<Option<Integer>>, Monad<Option<String>> and so on. Java and C# don't support generic types whose parameters are themselves generic types (the technical term is "higher-kinded types"), but C++ has some support for them, called "template template parameters". Some functional languages have higher-kinded types, like Haskell, while others don't have them, like ML.

Anyway, here's what it would look like in Java, if Java supported such things:

interface Monad<T> {
  <A> Function<A, T<A>> identity();
  <A, B, C> Function<A, T<C>> compose(
    Function<A, T<B>> first,
    Function<B, T<C>> second
  );
} 

class OptionMonad implements Monad<Option> {
  public <A> Function<A, Option<A>> identity() {
    // Implementation omitted, figure it out
  }
  public <A, B, C> Function<A, Option<C>> compose(
    Function<A, Option<B>> first,
    Function<B, Option<C>> second
  ) {
    // Implementation omitted, do it yourself
  }
}

Defining Monad as an interface allows us to implement some general functionality that will work on all monads. For example, there's a well known function "liftM" that converts a function of type A → B into a function of type List<A> → List<B>, or Promise<A> → Promise<B>, or anything else along these lines. For different monads, liftM will do different useful things, e.g. liftM on lists is just the familiar "map" function in disguise. The implementation of liftM with lambda expressions would be very short, though a little abstract:

<T, A, B> Function<T<A>, T<B>> liftM(
  Function<A, B> func,
  Monad<T> monad
) {
  return (T<A> ta) -> monad.compose(
    () -> ta,
    (A a) -> monad.identity().apply(func.apply(a))
  ).apply(void);
}

Or if you don't like lambda expressions, here's a version without them:

<T, A, B> Function<T<A>, T<B>> liftM(
  Function<A, B> func,
  Monad<T> monad
) {
  return new Function<T<A>, T<B>>() {
    public T<B> apply (T<A> ta) {
      return monad.compose(
        new Function<Void, T<A>>() {
          public T<A> apply() {
            return ta;
          }
        },
        new Function<A, T<B>>() {
          public T<B> apply(A a) {
            return monad.identity().apply(func.apply(a));
          }
        }
      ).apply(void);
    }
  }
}

Monads first became popular as a nice way to deal with IO instructions in a pure language, treating them as first-class values that can be passed around, like in my example 5. Imperative languages don't need monads for IO, but they often face a similar problem of "option chaining" or "error chaining", like in my example 3. For example, the option types in Rust or Swift would benefit from having a "happy path" through the code and a centralized place to handle errors, which is exactly the thing that monads would give you.

Some people think that monads are about side effects, like some sort of escape hatch for Haskell. That's wrong, because you could always do IO without monads, and in fact the first version of Haskell didn't use them. In any case, only two of my five examples involve side effects. Monads are more like a design pattern for composing functions, which shows up in many places. I think the jury is still out on whether imperative/OO languages need to use monads explicitly, but it might be useful to notice them when they appear in your code implicitly.

Thursday, June 12, 2014

Why I don't like laziness

Some ppl think lazy functional programming is nicer than strict. At first I thought so as well, but now I think strict is nicer. Not for any engineering reasons, but from a purely mathematical point of view.

The reason has to do with equality. I think about the equality operation a lot, it really brings out the important issues in language design. It's become something of a litmus test for me, if some language feature is dragging down the equality operation, then it probably sucks in other ways too. For example, in Haskell you can do this:

data Natural = Zero | Successor Natural deriving Eq
omega :: Natural
omega = Successor omega

So you have this bizarre number "omega" that is not bottom, and you can pattern-match on it to your heart's content. You can compare it for equality with Zero, Successor Zero and so on, and all those comparisons will happily return False. But try to compare omega to itself, and you get an infinite loop!

That's the problem in a nutshell. Since Haskell is lazy, there are no types that support decidable equality at all! So you might as well allow "deriving Eq" for many more types, like Bool -> Bool. Sure, comparing two functions of that type is undecidable, but it's exactly as undecidable as comparing two lazy Bools in the first place :-)

Imagine for a moment that Haskell was strict. Then we would know, truly irrevocably and forever, that the types that allow "deriving Eq" are exactly the types that have an always terminating equality operation, nothing more and nothing less. In a lazy language, "deriving Eq" can't make such a powerful statement and replaces it with something wishy-washy, and that's why I think strict is nicer.

Friday, June 6, 2014

Developing Web applications using structured HTML diffs

The current trend in Web development is toward single-page applications, whose prevailing architecture I described in an earlier post. However, a lot of the functionality of modern webapps is conceptually pretty simple, and it's easy to imagine that the Web could have supported such applications from the start, with just one small architectural tweak to HTML.

What if clicking on a hyperlink didn't navigate to a new page by default, but instead loaded a set of changes from the server in some structured text format, like "append such-and-such HTML to the element with such-and-such ID", and applied these changes to the HTML of the current page? A subset of that functionality has always been possible with IFrames, where clicking a link would reload a separate region of the page. But in full generality, it seems to me that such an "HTML diff" protocol could be simple enough and general enough to support many use cases of current web apps.

For example, such functions as adding a product to a shopping cart, or switching from logged-out to logged-in state, or loading an additional set of blog posts at the bottom of the page, can all be expressed as HTML diffs that leave most of the page intact, and only add/delete/update the contents of specific elements. With built-in browser support for HTML diffs, many Web applications could be implemented without using JavaScript at all, using only plain hyperlinks, and everyone would be able to use their favorite server-side languages to generate the diffs on demand.

Another nice feature is that HTML diffs would work together well with CSS transitions, which can play a smooth animation whenever an element's contents change, without the need for JavaScript. For example, clicking a hyperlink on the page could smoothly expand a list of items, by loading its contents from the server and applying a diff to the current page.

To increase responsiveness, we could use ordinary HTTP caching to prefetch and cache certain diffs. In fact, some of the diffs could even be static files on the Web host, so you can implement some dynamic-looking features without any client-side or server-side coding at all! I don't expect that to be very useful, but it's certainly a nice feature to have.

Overall I'm not convinced that the approach can remove all uses of client-side scripting, because it wouldn't be able to support very interactive applications like Google Maps. But at least it might help reduce the complexity of some business webapps, by removing the need for huge stateful client-side frameworks like Angular or Ember. Also surprisingly, I haven't been able to find any previous descriptions of this idea, though I'm sure someone must have come up with it before.

Tuesday, June 3, 2014

Programming languages should have at most one equality operation per type

In many programming languages there are several ways to check things for equality. For example, Java has == for primitive equality and object identity, and equals() as an overridable equality operation. Other industrial languages like Python or C# have the same approach, and Common Lisp and Scheme have four equality operations each (!)

I think it's a mistake to have so many equality operations. In this post I'll try to argue that most languages would do well by following the example of Haskell, where some types have a single equality operation, and others have none at all.

The idea is that the equality operation should always correspond to observational equality. If it's possible to tell the two values apart by calling various functions on them, then the values are different, otherwise they are equal. For example, the integer 3 is always equal to any other integer 3, and there's no point in distinguishing their identity. On the other hand, a mutable array is not equal to another array with the same contents, because changing an element in one array doesn't affect the other one.

More generally, observational equality means that immutable types like integers, strings and immutable containers should have equality based on their contents, while mutable types like references or mutable containers should have equality based on identity. That approach is easy to understand and does the right thing in most cases.

As an additional complication, if a type contains functions, that means observational equality is undecidable in general. Such types should have no equality operation defined by default, but the user can provide one.

(It goes without saying that equality operations should be defined per type, because it makes no sense to compare values of different types. If the user needs to compare the integer 3 and the floating point number 3, they should use an explicit conversion first.)

At this point some people are saying, why shouldn't we compare mutable containers based on their current contents? There's no harm in having a "compareContents" operation, but it shouldn't be called equality, because when a container of mutable containers cannot assume stable equality of the individual mutable containers, you end up with weird bugs.

As a daring thought experiment, you could implement that approach in Java by doing the following changes:

1) Remove all primitive types like int. All types should be classes and inherit from Object. Primitive types cause problems everywhere in Java. With identity comparisons gone, the compiler will be free to unbox Integers anyway.

2) Remove null from the language, and replace it with a generic class Maybe<T>. Null also causes problems everywhere, not just for equality.

3) Remove equals() from Object and move it into the generic interface Comparable<T>. Note that equals() should accept an argument of type T, not any Comparable.

4) Make == be a synonym for equals() in all cases. Remove the default implementation of ==. In all cases where using equals() would lead to a compile error, due to a type not implementing the appropriate Comparable<T>, using == should be a compile error as well.

5) Add a standard class IdentityComparable that implements Comparable<IdentityComparable>, with a final implementation of equals() based on identity. That class should be used as the base for all mutable classes, including containers and file handles.

6) For all standard immutable classes like Integer and String, implement equals() based on contents, to make == do the right thing.

7) Do all the above for hashCode() and compareTo() as well. The declarations should be in Comparable<T>, and IdentityComparable should contain final implementations based on identity.

To be clear, I don't think Java will ever adopt any approach like that. But it's interesting to see just what changes to Java are needed to fix the equality mess, and how many of these changes also make sense for other reasons.

Monday, June 2, 2014

If OO languages don't need variants, they don't need objects either

The usual story is that object-oriented languages don't need sum types (a.k.a. variants) because they can use the Visitor pattern instead. For example, this Haskell type:

data Either a b = Left a | Right b

can be converted to a couple of Java interfaces:

interface EitherVisitor<A, B, Result> {
   Result visitLeft(A a);
   Result visitRight(B b);
}

interface Either<A, B> {
  <Result> Result accept(EitherVisitor<A, B, Result> visitor);
}

(Note that I'm using logji's corrected Visitor pattern, which is generic over the Result type. That's the right way to implement visitors and should be in every textbook.)

I think the above argument is right, and visitors can be used to replace sum types in most situations, but I also think it proves too much. Namely, you can make a similar argument to prove that object-oriented languages don't need product types (a.k.a. objects with fields) either! For example, this Haskell type:

data Pair a b = Pair a b

can be expressed with Java interfaces that contain only one method each (a.k.a. function types):

interface PairVisitor<A, B, Result> {
  Result visit(A a, B b);
}

interface Pair<A, B> {
  <Result> Result accept(PairVisitor<A, B, Result> visitor);
}

(With currying you could go even further, and obtain interfaces that contain a single method receiving a single argument each. I won't do it here because it makes the code less clear.)

More generally, if you have only function types, you can use Church encoding to simulate pairs, lists, numbers and just about everything else. The resulting code will have reasonable types in System F, which is a subset of many existing type systems, including Java's and Haskell's.

There is, however, a subtlety. The Java interfaces shown above don't actually contain any data of types A and B, they just have methods to return that data on demand. In other words, they correspond to lazy Pair and Either. That has a couple of nontrivial consequences.

First, the thing that promises to be a Pair or Either can go into an infinite loop or throw an unchecked exception when asked to return the data. That corresponds to Haskell's bottom value, which is an inhabitant of every type.

Second, lazy types can lead to infinite data structures. For example, this Haskell type:

data List a = Empty | Append a (List a)

can be expressed with Java interfaces like this:

interface ListVisitor<A, Result> {
  Result visitEmpty();
  Result visitAppend(A a, List<A> tail);
}

interface List<A> {
  <Result> Result accept(ListVisitor<A, Result> visitor);
}

If we encode lists in this way, there's nothing stopping a list from passing itself as its own tail into visitAppend. That means the type system cannot guarantee that the list is finite, as it can in eager functional languages like ML. Robert Harper has a well-known post arguing against lazy data structures, but Haskell programmers seem to like them.

On the other hand, using interfaces for sum and product types leads to an unexpected benefit for Java programmers, because they can write their own implementations that will be compatible with existing libraries. If we had a language where all types are interfaces "under the hood", we could do nice things like replacing fields with getter methods transparently, though the current Java style seems to be too clunky for that.

I guess this post turned out a little more rambling and educational than I intended. Oh well. A short summary is that both sum types and product types can be viewed as a layer of syntactic sugar over function types, and peeking below that layer can have some benefits for understanding programs.

Monday, May 19, 2014

Combining array bounds checking and loop condition checking into a single operation

In a memory-safe language, the naive way to loop over an array involves too many runtime checks:

int i = 0;
while (i < array.length) {
    doSomething(array[i]);
    i = i + 1;
}

The variable i is checked against array.length in the loop condition, and then checked against 0 and array.length again when array[i] is dereferenced. That's three comparisons for every element of the array. A clever compiler can remove some of them, but that requires complicated mathematical reasoning. For example, to remove the check against 0, the compiler must build a proof by mathematical induction that i always stays non-negative.

In this post I'll try to make it easier for the compiler to remove all checks except one. First, let's note that looping over a linked list, implemented as a sum type, requires only one check per element to be completely memory-safe:

data List a = Empty | Append a (List a)

product :: List Int -> Int
product (Empty) = 1
product (Append value tail) = value * sum tail

We can see that looping over a linked list requires only one check, which serves double purpose as a bounds check (for memory safety) and a loop condition check (for loop termination). Then how come looping over an array requires three checks? Can we reduce the number of checks without losing memory-safety?

The answer is that we can have only one check per element for arrays too, if we treat them like lists. We just need to use an Iterator class which is guaranteed to contain a valid index into the array, so that dereferencing the iterator doesn't require a bound check, and moving the iterator forward or backward requires exactly one bound check:

#include <iostream>

template <typename Type> class NonEmptyArray;  // defined below

template <typename Type> class NonEmptyIterator {
    friend class NonEmptyArray<Type>;

    private:
    Type* array;
    unsigned int maxIndex;
    unsigned int index;  // 0 <= index <= maxIndex
    NonEmptyIterator(Type* array_, unsigned int maxIndex_):
        array(array_), maxIndex(maxIndex_), index_(0) {}

    public:
    Type get() {
        return array[index];
    }
    
    void set(Type value) {
        array[index] = value;
    }

    bool increment() {
        if (index < maxIndex) {
            index++;
            return true;
        } else {
            return false;
        }
    } 

    bool decrement() {
        if (index > 0) {
            index--;
            return true;
        } else {
            return false;
        }
    } 
};

template <typename Type> class NonEmptyArray {
    private:  
    Type* array;
    unsigned int maxIndex;  // this means length - 1

    public:
    NonEmptyArray(unsigned int maxIndex_): maxIndex(maxIndex_) {
        array = new Type[maxIndex + 1];
    }
    
    ~NonEmptyArray() {
        delete [] array;
    }

    // Noncopyable boilerplate omitted...

    NonEmptyIterator<Type> iterator() {
        return NonEmptyIterator<Type>(array, maxIndex);
    }
};

int main() {
    NonEmptyArray<double> array(2);  // that means length is 3
    NonEmptyIterator<double> it(array.iterator());

    // Fill the array
    double x = 1.0;
    do {
        it.set(x);
        x += 1.0;
    } while (it.increment());

    // Compute the sum of the array elements
    double sum = 0.0;
    do {
        sum += it.get();
    } while (it.decrement());
    std::cout << sum << std::endl;  // prints 6
}

The above iterator class works only for non-empty arrays, because it needs to maintain the invariant that get() always returns a valid value. We could work around that by simulating sum types in C++, but I don't want to overcomplicate the code. The important thing is that once the iterator is constructed, calling the methods get(), set(), increment() and decrement() in any order is completely memory-safe for the lifetime of the array, while requiring only one bounds check for the typical case of traversing the array in a loop.

(Clever readers might point out that the check inside increment() and decrement() returns a boolean, which is then inspected in the loop condition, so that's technically two checks. But if the method calls are inlined, the compiler can trivially combine these two checks into one without doing any fancy mathematical reasoning.)

It's an interesting exercise to adapt the above approach to more complicated cases, like traversing two arrays of the same length at once. But I feel that's not really the point. Unlike approaches based on theorem provers or dependent types (or even "non-dependent types"), this approach to eliminating bounds checking probably doesn't scale to more complicated use cases. But it does have the advantage of being simple and understandable to most programmers, without using any type-theoretic tricks.

Using real arithmetic to eliminate bounds checking

This post is a repost of a question I asked on Lambda the Ultimate.
The usual problem with eliminating array bounds checking is that integer arithmetic is undecidable. For example, the compiler can't figure out that you can index into an array of m·n elements using an expression like i·n+j, even if it's known statically that 0≤i<m and 0≤j<n. Or rather, that particular problem is simple, but it's hard to come up with a general algorithm that would work for many different programs. To solve that problem, some people are trying to use dependent types and machine-checkable proofs, like Hongwei Xi, or restrict their attention to Presburger arithmetic (which is decidable but doesn't support multiplication), like Corneliu Popeea.
To take a different approach to the problem, we can note that statements like "if i ∈ [0,m-1] and j ∈ [0,n-1], then i·n+j ∈ [0,m·n-1]" (all segments are closed) can be interpreted and proved in real arithmetic, instead of integer arithmetic. The nice thing is that real arithmetic is decidable. If that approach works, we can have a language with integer-parameterized types like "array of n elements" and "integer between m and n", and have a simple definition of what the type checker can and cannot automatically prove about programs using such types, without having to write any proofs explicitly.
Of course the system will be limited by the fact that it's talking about reals rather than integers. For example, there will be problems with integer-specific operations on type parameters, like division and remainder, but I don't know how often these will be required.
In the comments to my post on LtU, Tom Primožič has carried out some experiments with several SMT solvers, including Microsoft's Z3. Indeed, it turns out that integer solvers can't handle the above example with matrix addressing, but a real arithmetic solver does the job easily. Here's a translation of the matrix addressing example, which you can try out in Z3's online interface:
(declare-const i Int)
(declare-const j Int)
(declare-const m Int)
(declare-const n Int)
(assert (and (<= 0 i) (<= i (- m 1))))
(assert (and (<= 0 j) (<= j (- n 1))))
(assert (and (<= 0 n) (<= 0 m)))

(assert (not (<= (+ (* i n) j) (- (* m n) 1))))
(check-sat-using qfnra-nlsat)

Note that the last line invokes the nonlinear real arithmetic solver NLSat. If you replace the last line with (check-sat) to use the default integer solver instead, it will timeout and fail to find a proof.

Tuesday, May 6, 2014

Functional equivalence in value-level programming

When you construct a container like Set or Map, you need to either explicitly pass the functions used for comparing or hashing elements, or automatically pick them up based on the types of the elements. "Value-level programming" tries to discourage type-directed dispatch, so we need to pass the functions explicitly. That leads to a problem when you need to quickly merge two Sets or Maps. The merging function cannot assume that the same hashing function was used in both containers, so it might need to rehash all elements, while with type-directed dispatch it could assume that the hashing function for the same type is always the same and skip the rehashing.

That's a genuine problem with value-level programming. There are several possible solutions:

1) If two containers have pointers to the same hashing function, that means it's safe to skip rehashing. This approach is guaranteed to be safe, but it's sometimes too conservative.

2) Each hashing function could contain an explicit ID, so the merging function can skip rehashing if the IDs match. This approach is not guaranteed to be safe, but it seems to discourage programmer errors pretty well.

3) The compiler could enforce the fact that there's only one value of a specific type, like HashFunction<String>. That can be done with private constructors in Java, or with abstract data types in functional languages. This approach is guaranteed to be safe, but it needs an escape hatch, possibly in the form of (1) or (2).

Overall it seems to me that value-level solutions to the above problem are not worse than type-directed solutions. I don't know if there are other, more severe variants of that problem.

Simulating template specialization with value-level polymorphism

When I showed my last post about translating a generic algorithm from C++ to Java to my friend who's an expert in C++, his main question was, what about template specialization? The simplest example of specialization is std::advance, which advances an iterator by n steps. It works in O(n) time for iterators that only support "it++", and in O(1) time for iterators that support "it+=n".

In value-level programming, that can be simulated by having a method in the ForwardIteratorOperations interface that returns Maybe<RandomAccessIteratorOperations>. For iterators that don't support fast random access, that method would return Maybe.Nothing, and for those that do, it would return a Maybe.Something containing a valid RandomAccessIteratorOperations. Since Java doesn't have sum types like Maybe, we can simulate it with a nullable return value instead. I've written a translation of std::advance into Java using that approach, you can find the code at the end of this post.

The problem is that the above approach doesn't scale to all uses of template specialization in C++, because if there are multiple interrelated concepts that you can specialize on, the number of possible conversions grows quadratically or worse. Another complication is that some concepts can have several type parameters, like multi-parameter type classes in Haskell. I'm still trying to figure out if that's an important use case.

In fact, I'm still trying to figure out if template specialization is worth the trouble at all. The C++ STL seems to have few uses of specialization. The examples I've found are std::advance, std::distance, vector<bool> which is known to be broken, and some specializations on trivially destructible types like pointers that allow you to skip calling destructors. The Boost libraries use specialization more, but it's often to compensate for the drawbacks of C++, like the lack of lambdas or garbage collection. I haven't found a really convincing use case for specialization yet, but I'll change my view if I find one.

Why am I so against specialization? Well, part of the reason is that I'm trying to explore "value-level programming", which means having no type-directed overloading or dispatch anywhere. But the bigger problem is that specialization in C++ seems to be incompatible with modularity and separate compilation, and would stay incompatible even if templates supported separate compilation. If you specialize someone else's templates, and their headers don't include yours, then everything will compile without errors, but the runtime behavior is undefined. That violates the principle that "well-typed programs don't have type errors", which feels very important to me. With the approach I outlined above, such errors are provably impossible, but at the cost of spelling out many things explicitly.

APPENDIX: the translation of std::advance into Java.

public class Main {

    static interface Forward<Iterator> {
        Iterator increment(Iterator iterator);
        boolean equal(Iterator first, Iterator second);
        /* Nullable */ RandomAccess<Iterator> getFastRandomAccess();
    }
   
    static interface RandomAccess<Iterator> {
        Iterator add(Iterator iterator, Integer delta);
        Integer subtract(Iterator first, Iterator second);
    }

    static interface Input<Iterator, Value> {
        Value get(Iterator iterator);
    }
       
    static <Iterator> Iterator advance(
        Iterator iterator,
        Integer delta,
        Forward<Iterator> forward
    ) {
        if (forward.getFastRandomAccess() != null) {
            return forward.getFastRandomAccess().add(iterator, delta);
        } else {
            while (delta > 0) {
                iterator = forward.increment(iterator);
                delta = delta - 1;
            }
            return iterator;
        }
    }
   
    static class List<Value> {
        public final Value value;
        /* Nullable */ public final List<Value> next;
        public List(
            Value value,
            List<Value> next
        ) {
            this.value = value;
            this.next = next;
        }
    }
   
    static class ArrayRandomAccess implements RandomAccess<Integer> {
        public Integer add(Integer iterator, Integer delta) {
            System.out.println("Using integer add");
            return iterator + delta;
        }
        public Integer subtract(Integer first, Integer second) {
            return first - second;
        }
    }
   
    static class ArrayForward implements Forward<Integer> {
        public Integer increment(Integer iterator) {
            System.out.println("Using integer increment");
            return iterator + 1;
        }
        public boolean equal(Integer first, Integer second) {
            return first == second;
        }
        public RandomAccess<Integer> getFastRandomAccess() {
            return new ArrayRandomAccess();
        }
    }
   
    static class ArrayInput<Value> implements Input<Integer, Value> {
        private final Value [] array;
        public ArrayInput(Value [] array) {
            this.array = array;
        }
        public Value get(Integer index) {
            return array[index];
        }
    }

    static class ListForward<Value> implements Forward<List<Value>> {
        public List<Value> increment(List<Value> iterator) {
            System.out.println("Using list increment");
            return iterator.next;
        }
        public boolean equal(List<Value> first, List<Value> second) {
            return first == second;
        }
        public RandomAccess<List<Value>> getFastRandomAccess() {
            return null;
        }               
    }
   
    static class ListInput<Value> implements Input<List<Value>,Value> {
        public Value get(List<Value> list) {
            return list.value;
        }
    }
   
    static <Iterator, Value> Value peekAhead(
        Iterator iterator,
        Integer delta,
        Forward<Iterator> forward,
        Input<Iterator, Value> input
    ) {
        return input.get(advance(iterator, delta, forward));
    }

    public static void main(String [] args) {
        Double [] array = new Double [] { 1.0, 2.0, 3.0 };
        List<Double> list =

            new List<Double>(1.0,
                new List<Double>(2.0,
                    new List<Double>(3.0, null)));
        System.out.println("" + peekAhead(

            0, 2, new ArrayForward(), new ArrayInput(array)));
        System.out.println("" + peekAhead(

            list, 2, new ListForward(), new ListInput()));
    }
}


And the output is this:

Using integer add
3.0
Using list increment
Using list increment
3.0

Sunday, May 4, 2014

An example of value-level programming in Java

In this post, which is just a big listing of code, I translate a C++ generic algorithm from the <algorithm> STL header into Java, using only basic features of Java. To prove that the generic algorithm can be used on any third-party data structure, I give an example of using it on a Java built-in array. I hope the example is nontrivial enough to show that many STL algorithms can be translated in the same way. More generally, I want to explore the possibility that "value-level programming" might be a good solution to the problems that C++ generic programming was intended to solve, as described in this interview with Alex Stepanov.

You can paste the code into http://www.compileonline.com/compile_java_online.php to try it out:

public class Main {

    // http://www.sgi.com/tech/stl/BidirectionalIterator.html
    static interface BidirectionalIteratorOperations<Iterator> {
        Iterator increment(Iterator iterator);
        Iterator decrement(Iterator iterator);
        boolean equal(Iterator first, Iterator second);
    }

    // This part factored out for convenience.
    static interface Dereference<Iterator, Value> {
        Value get(Iterator iterator);
        void set(Iterator iterator, Value value);
    }

    static interface UnaryPredicate<Value> {
        boolean test(Value value);
    }

    // http://www.cplusplus.com/reference/algorithm/partition/
    // Line-by-line translation.
    // Note that Iterator and Value can be _any_ types.
    static <Iterator, Value> Iterator partition(
        Iterator first,
        Iterator last,
        UnaryPredicate<Value> predicate,
        Dereference<Iterator, Value> dereference,
        BidirectionalIteratorOperations<Iterator> operations
    ) {
        while (!operations.equal(first, last)) {
            while (predicate.test(dereference.get(first))) {
                first = operations.increment(first);
                if (operations.equal(first, last)) {
                    return first;
                }
            }
            do {
                last = operations.decrement(last);
                if (operations.equal(first, last)) {
                    return first;
                }
            } while (!predicate.test(dereference.get(last)));
            swap(first, last, dereference);
            operations.increment(first);
        }
        return first;
    }
    
    // This isn't the true swap from C++, which swaps values.
    static <Iterator, Value> void swap(
        Iterator first,
        Iterator second,
        Dereference<Iterator, Value> dereference
    ) {
        Value temp = dereference.get(first);
        dereference.set(first, dereference.get(second));
        dereference.set(second, temp);
    }
    
    // Let's use the above to partition a Java built-in array,
    // using Integers as iterators.
    public static void main(String [] args) {
        final Double [] data =
            new Double [] { 1.0, 2.0, 3.0, 4.0, 5.0 };
        UnaryPredicate<Double> predicate =
            new UnaryPredicate<Double>() {
                public boolean test(Double value) {
                    return (value < 1.5) || (value > 4.5);
                }
            }
        ;
        Dereference<Integer, Double> dereference =
            new Dereference<Integer, Double>() {
                public Double get(Integer index) {
                    return data[index];
                }
                public void set(Integer index, Double value) {
                    data[index] = value;
                }
            }
        ;
        BidirectionalIteratorOperations<Integer> operations =
            new BidirectionalIteratorOperations<Integer>() {
                public Integer increment(Integer index) {
                    return index + 1;
                }
                public Integer decrement(Integer index) {
                    return index - 1;
                }
                public boolean equal(Integer first, Integer second) {
                    return first == second;
                }
            }
        ;
        partition(0, data.length, predicate, dereference, operations);
        System.out.println(java.util.Arrays.toString(data));
    }
}

Exception handling is for bugs

Should we use exceptions or return values to indicate failure? If we take the usual advice, "don't use exceptions for control flow", then we should never use exceptions at all, because control flow in unusual situations is still control flow. Maybe the advice is intended to mean something like "use exceptions only in unusual situations", but how do different programmers agree on which situations are unusual? For example, is "file not found" unusual? Or how about division by zero?

To decide whether to use exceptions or return values, my rule of thumb is that exceptions should be used for situations that indicate bugs. In other words, you should arrange your code so that it throws an exception only if there's a bug somewhere. That way, if you see an exception in your logs, you should find and fix the bug that caused it to be thrown. For example, "file not found" should not be an exception, because you cannot write a program that never gets "file not found". But division by zero should be an exception, because you can and should write programs that never divide by zero.

Using exceptions only to indicate bugs doesn't mean you should avoid exceptions. On the contrary, it's useful to have many assertions in the code and treat assertion failures as exceptions. If you use exception handling only for bugs, as I suggest, then all exceptions become assertion failures in some sense.

Saturday, May 3, 2014

Code readability is about how, not what

I often see people calling some piece of code "readable" if reads like an English-language description of the task, and uses complex programming techniques to make it actually work as a program. For example, many programs in Ruby are like that, or you can check out this post for another example in Haskell.

To me, such code doesn't look readable at all. I mostly read code in order to make changes or find bugs, or figure out something about the implementation. Complex programming techniques often make these tasks more difficult, regardless of their other benefits.

A piece of code looks readable to me if I can understand how it works, not just what it does. Using readable English identifiers is good, but it is also important to have straightforward control flow. That makes it easier to build a mental model of how the code executes. I wish more programmers paid attention to straightforward control flow, instead of using complex programming techniques to achieve surface readability.

Friday, May 2, 2014

A snippet from my new programming language

In the last few days I'm thinking about a hypothetical programming language called MVL, which is short for "Manifestly typed Value-level Language" or maybe "Verbose ML".

Here's some example code in that language:

GENERIC(Type) RECORD Group(
  operation: FUNCTION(Type, Type): Type,
  inverse: FUNCTION(Type): Type,
  identity: Type
)

GENERIC(Type) RECORD Field(
  addition: Group(Type),
  multiplication: Group(Type)
)

GENERIC(Scalar, Vector) RECORD VectorSpace(
  scalars: Field(Scalar),
  vectors: Group(Vector),
  scale: FUNCTION(Vector, Scalar): Vector
)

(Yes, I know that multiplication is not a group. But we can throw exceptions on divide by zero, so it's close enough.)

The idea of "value-level programming" (which I got from of Gabriel Gonzalez) is to pass these records around as arguments to functions, instead of using something like OO interfaces, Haskell typeclasses or concepts in C++ templates. The resulting code is similar to Haskell after translating typeclasses to dictionaries. Since it's annoying to always pass the records around, I'm thinking of a system like Agda's instance arguments or Scala's implicit arguments to make it easier:

GENERIC(Type) FUNCTION power(
  x: Type,
  y: Integer,
  IMPLICIT group: Group(Type)
): Type
  IF y > 0
    RETURN group.operation(x, power(x, y - 1))
  ELSE IF y = 0
    RETURN group.identity
  ELSE
    RETURN group.inverse(power(x, -y))

FUNCTION floatMultiply(x: Float, y: Float): Float
  RETURN x * y

FUNCTION floatInverse(x: Float): Float
  RETURN 1 / x

VAR floatMultiplicationGroup: Group(Float) = (
  operation: floatMultiply,
  inverse: floatInverse,
  identity: 1.0
)

VAR x: Float = power(0.5, -2)
# Returns 4.0

When you call the "power" function, you don't need to explicitly pass the group as an argument, because it's marked as implicit and picked up from the closest surrounding scope. The top level call picks up the value of floatMultiplicationGroup, and the recursive calls pick up the value from the enclosing call. If there are multiple matches in the same scope, that's a compile-time error, and you should pass the value explicitly.

Now all that remains is to make that idea into a workable language :-)

Friday, April 18, 2014

The difference between code and data is Turing completeness

In principle there seems to be no difference between code and data, because the same sequence of bits can be treated as a sequence of numbers or a piece of code. It is more accurate to say that the same sequence of bits can be treated as code or data by different programs.

For example, we could have a program A that treats its input as a sequence of numbers and computes the sum of these numbers, and program B that treats the input as a piece of code and executes it. The important difference between programs A and B is that we can make program B carry out any possible computation by giving it a suitable input, while program A can only compute sums.

On the spectrum between these two extremes, we can find programs whose behavior is more influenced by their input than program A, but easier to analyze than program B. These programs correspond to models of computation that are stronger than finite state machines, but weaker than Turing machines.

For example, if we run a syntax highlighting program on a piece of code, we probably can't make it carry out an arbitrary computation. In this way, a syntax highlighting program treats the input as data. If we give the same piece of code to a compiler, is is treated less like data and more like code, especially if the programming language it's written in allows some compile-time computation. And when we finally execute the program, it is treated as code by the interpreter or the operating system.

Now you can tell the difference between code and data in any software system. If the data can strongly influence the program used to process it, and make it carry out an arbitrary computation, then it's code.