Nascent GHC Proposal: Source Rewrite Rules and Optional Constraints

Lately, I’ve been thinking more and more about something that would be nice to have in GHC.

Motivation #1: HMock Predicates

Consider the Predicate type from HMock:

data Predicate a = Predicate
  { showPredicate :: String,  -- describes what the predicate tests
    showNegation :: String,   -- describes what its negation tests
    accept :: a -> Bool,      -- whether a given value satisfies the predicate
    explain :: a -> String    -- why a given value does or doesn't satisfy it
  }

The purpose of this type is to be a function that can also explain itself. That is, you can get a Predicate to explain what it’s testing, and why a particular argument matches or not. There are three important things to know about Predicate:

  1. It’s important that Predicate works on arbitrary types, including those without Show or Eq instances. Indeed, using predicates is the only way to match function calls in HMock whose arguments don’t have Eq and Show instances.
  2. The exact strings produced by showPredicate, showNegation, and explain are entirely unspecified, by design. It’s nice for users if they are as descriptive as possible, but they are only used in error messages and diagnostics, so no real correctness depends on their exact values.
  3. Predicates are often polymorphic. For instance, there is eq, which checks for equality with a specific value. There are even more complex predicates, too, such as an unordered-elements predicate that uses a Ford-Fulkerson max-flow algorithm to match collection elements against child predicates, then reports the best available matching and whether any elements or child predicates were left unmatched.

Here’s the fundamental problem: what do I do with Show and Eq instances? Even in the description above, there are some unsatisfying choices:

  • The eq predicate requires a Show constraint. This is used to implement showPredicate, showNegation, and explain. However, if it weren’t available, I could still provide the core functionality of eq at the cost of somewhat poorer descriptive text. Leaving out the constraint would have doomed all users to receive poor descriptions, so in the end I left it in, making eq less useful on types without Show. I could also define a separate predicate, identical to eq in all ways except that it lacks the Show constraint (see the sketch after this list). This is ugly, and doing this everywhere would double the exposed symbols in this module (from 50 to 100 exports!)
  • On the other hand, a combinator that matches the elements of a collection doesn’t have a Show constraint on its element type. If it did, I could provide better explanatory text by listing which specific element values in the collection were unexpected. Yet this seems too limiting on such a useful combinator, which is often useful for higher-order types. Therefore, I left the constraint out. This is a real loss of functionality. Again, I could define a Show-requiring variant, but at the cost of exploding the API if I did this everywhere.
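
To make the first bullet concrete, here is a rough sketch (not HMock’s actual code) of the two variants of eq; the Show-free name eqWithoutShow is invented purely for illustration:

eq :: (Show a, Eq a) => a -> Predicate a
eq x =
  Predicate
    { showPredicate = "eq " ++ show x,
      showNegation = "not (eq " ++ show x ++ ")",
      accept = (== x),
      explain = \y -> show y ++ (if y == x then " == " else " /= ") ++ show x
    }

-- The same functionality without Show, at the cost of vague descriptions.
eqWithoutShow :: Eq a => a -> Predicate a
eqWithoutShow x =
  Predicate
    { showPredicate = "eq _",
      showNegation = "not (eq _)",
      accept = (== x),
      explain = \y -> if y == x then "value matched" else "value did not match"
    }

Doubling every predicate this way is exactly the API explosion described above.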

What I really want is for GHC to check the types, looking for a Show constraint, and then use one or the other implementation depending on whether the constraint is available. This might seem suspect. Indeed, it is! It’s only reasonable because the results of the two implementations are equivalent up to the unspecified explanatory text.

Motivation #2: Improving efficiency

This isn’t the only time something like this comes up. A similar situation often arises with polymorphic algorithms. With only an Eq constraint, the best you can do for Haskell’s nub function (which removes duplicates from a list) is O(n²). Add an Ord constraint, and you can do the same thing in O(n log n). It requires a bit of care to do it in a way that preserves the original order, but it’s still possible. Add a Hashable constraint, and you can do it in O(n) expected time.
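
For instance, here is one common way to write the order-preserving Ord-based and Hashable-based versions (these are sketches; libraries such as containers and extra export equivalents under names like nubOrd):

import Data.Hashable (Hashable)
import qualified Data.HashSet as HashSet
import qualified Data.Set as Set

-- O(n log n): keep the first occurrence of each element, remembering what
-- has already been seen in a Set.
ordNub :: Ord a => [a] -> [a]
ordNub = go Set.empty
  where
    go _ [] = []
    go seen (x : xs)
      | x `Set.member` seen = go seen xs
      | otherwise = x : go (Set.insert x seen) xs

-- O(n) expected time: the same idea with a HashSet.
hashNub :: (Eq a, Hashable a) => [a] -> [a]
hashNub = go HashSet.empty
  where
    go _ [] = []
    go seen (x : xs)
      | x `HashSet.member` seen = go seen xs
      | otherwise = x : go (HashSet.insert x seen) xs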

In practice, Haskell offers nub, and common libraries offer ordNub to do the same thing more efficiently. That’s again an extra API for the same function, just with better performance at the cost of more assumptions about its input. It’s ugly that this is necessary.

Existing (and partial) solutions

It’s not surprising that people have thought about this in the past. There are several bits of prior work worth mentioning.

  • In C++, there’s an approach known as SFINAE: “Substitution Failure Is Not An Error”. What this means is that you can write two implementations of the same template, and if substituting the type arguments into the more specific one fails, you don’t get an error. Instead, the compiler ignores it and falls back to the more general implementation. Since C++ monomorphizes all templates at compile time, it all comes down to trying the specialization and dropping it when it doesn’t apply. This means C++ programs can make these trade-offs without exposing different names and APIs.
  • Mike Izbicki has written a Haskell library called ifcxt, which partially solves the problem. The good news is that you can use the library to define optional constraints, with different implementations depending on whether the constraint is satisfied (see the sketch after this list). The bad news is that it requires Template Haskell to write a bunch of boilerplate. Unfortunately, new boilerplate is needed for every new type, so this Template Haskell cannot be kept internal to the implementation and must be invoked from client code.
  • GHC already has a mechanism called rewrite rules, which allows certain expressions to be matched and rewritten by the compiler into a more efficient form. However, this happens during the Core stage of compilation, after type class resolution has already occurred, so rewrite rules cannot introduce additional type class constraints.
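
The trick behind ifcxt looks roughly like the sketch below (simplified from the actual package; the Show Int instance stands in for the per-type boilerplate that the library’s Template Haskell generates):

{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Proxy (Proxy (..))

-- Dispatch on whether a constraint holds: use the first argument if it
-- does, and the fallback if it doesn't.
class IfCxt cxt where
  ifCxt :: proxy cxt -> (cxt => a) -> a -> a

-- By default, assume the constraint is absent.
instance {-# OVERLAPPABLE #-} IfCxt cxt where
  ifCxt _ _ fallback = fallback

-- Instances like this one are the boilerplate that must exist for every
-- type/class pair; ifcxt generates them with Template Haskell.
instance IfCxt (Show Int) where
  ifCxt _ t _ = t

-- Show the value if a Show instance is known, otherwise use a placeholder.
cxtShow :: forall a. IfCxt (Show a) => a -> String
cxtShow a = ifCxt (Proxy :: Proxy (Show a)) (show a) "<<unshowable>>"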

It seems that what’s needed here is a GHC plugin or extension.

Proposed solution #1: Source rewrite rules

This seems very similar to a rewrite rule, but it needs to fire at a different stage of compilation. Therefore, if I were to propose this, I would propose a new kind of rewrite rule, something like this:

{-# SOURCE_RULES
"nub/Ord" forall x. nub x = ordNub x
#-}

When encountering a rule like this during type checking, GHC would attempt to perform the given substitution. If the resulting program type checks, fine; otherwise, it would fall back to not applying the rule rather than failing the compile. This is very much like SFINAE in C++.
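
To illustrate the intended behavior (hypothetically, since no such extension exists today), consider a type that has Eq but deliberately no Ord instance:

import Data.List (nub)

data Color = Red | Green | Blue deriving (Eq)  -- Eq, but no Ord

dedupInts :: [Int] -> [Int]
dedupInts xs = nub xs    -- Int has an Ord instance, so the "nub/Ord" rule
                         -- fires and this compiles as ordNub xs

dedupColors :: [Color] -> [Color]
dedupColors xs = nub xs  -- rewriting to ordNub xs would require Ord Color
                         -- and fail to type check, so GHC keeps plain nub xs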

Proposed solution #2: Optional constraints

There’s a slight problem with the solution above: it’s not very compositional when we have separate compilation. If I write this:

foo :: a -> a
foo = id

fooNum :: Num a => a -> a
fooNum a = a + 1

{-# SOURCE_RULES
"fooNum" forall x. foo x = fooNum x
#-}

bar x = foo x

What is the type of bar? If bar has type a -> a, then the rule never fires, because no Num instance is available inside bar. On the other hand, if bar has type Num a => a -> a, then it is insufficiently general.

I propose that it should infer bar :: a -> a, but with an explicit type signature, one could mark the Num constraint as optional. This would inform GHC that it should generate two versions of bar: one specialized implementation with a Num constraint, and one general implementation without it. It should then automatically generate a new source rule that tries to rewrite the more general implementation into the specialized one when possible. (If there are no source rules that depend on the constraint, GHC could realize that both implementations are the same, and decline to generate a specialization at all, possibly with a warning about a redundant optional constraint.)
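
Concretely, GHC would elaborate bar into roughly the following (the internal name bar_num is made up, and I’m leaving open how the optional constraint is spelled in the signature):

-- The general version, which plain uses of bar see: no Num dictionary is
-- available, so the "fooNum" rule cannot apply inside it.
bar :: a -> a
bar x = foo x

-- The specialized version: Num a is in scope, so the "fooNum" source rule
-- fires and foo x rewrites to fooNum x.
bar_num :: Num a => a -> a
bar_num x = foo x

-- An automatically generated source rule: uses of bar at a type with a
-- Num instance are rewritten to the specialized version.
{-# SOURCE_RULES
"bar/Num" forall x. bar x = bar_num x
#-}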

Conclusion

What do you think? Is this a direction you’d like to see GHC move in? Would something like this be enough?
