Tuesday, June 3, 2014

Programming languages should have at most one equality operation per type

In many programming languages there are several ways to check things for equality. For example, Java has == for primitive equality and object identity, and equals() as an overridable equality operation. Other industrial languages like Python or C# have the same approach, and Common Lisp and Scheme have four equality operations each (!)

I think it's a mistake to have so many equality operations. In this post I'll try to argue that most languages would do well by following the example of Haskell, where some types have a single equality operation, and others have none at all.

The idea is that the equality operation should always correspond to observational equality. If it's possible to tell the two values apart by calling various functions on them, then the values are different, otherwise they are equal. For example, the integer 3 is always equal to any other integer 3, and there's no point in distinguishing their identity. On the other hand, a mutable array is not equal to another array with the same contents, because changing an element in one array doesn't affect the other one.

More generally, observational equality means that immutable types like integers, strings and immutable containers should have equality based on their contents, while mutable types like references or mutable containers should have equality based on identity. That approach is easy to understand and does the right thing in most cases.

As an additional complication, if a type contains functions, that means observational equality is undecidable in general. Such types should have no equality operation defined by default, but the user can provide one.

(It goes without saying that equality operations should be defined per type, because it makes no sense to compare values of different types. If the user needs to compare the integer 3 and the floating point number 3, they should use an explicit conversion first.)

At this point some people are saying, why shouldn't we compare mutable containers based on their current contents? There's no harm in having a "compareContents" operation, but it shouldn't be called equality, because when a container of mutable containers cannot assume stable equality of the individual mutable containers, you end up with weird bugs.

As a daring thought experiment, you could implement that approach in Java by doing the following changes:

1) Remove all primitive types like int. All types should be classes and inherit from Object. Primitive types cause problems everywhere in Java. With identity comparisons gone, the compiler will be free to unbox Integers anyway.

2) Remove null from the language, and replace it with a generic class Maybe<T>. Null also causes problems everywhere, not just for equality.

3) Remove equals() from Object and move it into the generic interface Comparable<T>. Note that equals() should accept an argument of type T, not any Comparable.

4) Make == be a synonym for equals() in all cases. Remove the default implementation of ==. In all cases where using equals() would lead to a compile error, due to a type not implementing the appropriate Comparable<T>, using == should be a compile error as well.

5) Add a standard class IdentityComparable that implements Comparable<IdentityComparable>, with a final implementation of equals() based on identity. That class should be used as the base for all mutable classes, including containers and file handles.

6) For all standard immutable classes like Integer and String, implement equals() based on contents, to make == do the right thing.

7) Do all the above for hashCode() and compareTo() as well. The declarations should be in Comparable<T>, and IdentityComparable should contain final implementations based on identity.

To be clear, I don't think Java will ever adopt any approach like that. But it's interesting to see just what changes to Java are needed to fix the equality mess, and how many of these changes also make sense for other reasons.

No comments:

Post a Comment