Nascent GHC Proposal: Source Rewrite Rules and Optional Constraints

Chris Smith
5 min readAug 27, 2021

--

Lately, I’ve been thinking more and more about something that would be nice to have in GHC.

Motivation #1: HMock Predicates

Consider the Predicate type from HMock:

data Predicate a = Predicate
{ showPredicate :: String,
showNegation :: String,
accept :: a -> Bool,
explain :: a -> String
}

The purpose of this type is to be a function a -> Bool that can also explain itself. That is, you can get Strings to explain what it’s testing, and why a particular argument matches or not. There are three important things to know about Predicate:

  1. It’s important that Predicate works on arbitrary types, including those without Show instances. Indeed, using predicates is the only way to match function calls in HMock whose arguments don’t have Eq and Show instances.
  2. The exact strings produced by showPredicate, showNegation, and explain are entirely unspecified, by design. It’s nice for users if they are as descriptive as possible, but they only get used in error messages and diagnostics, so no kind of real correctness depends upon their exact value.
  3. Predicates are often polymorphic. For instance, there is eq :: (Show a, Eq a) => a -> Predicate a, which compares checks for equality with a specific value. There are even more complex predicates, too, like unorderedElemsAre :: MonoFoldable t => [Predicate (Element t)] -> Predicate t, which uses a Ford-Fulkerson max-flow algorithm to match elements and child predicates, then reports on the best available matching, and whether there were either elements or child predicates that couldn’t be matched.

Here’s the fundamental problem: what do I do with Show instances? Even in the description above, there are some unsatisfying choices:

  • The eq predicate requires a Show constraint. This is used to implement showPredicate, showNegation, and explain. However, if it weren’t available, I could still provide the core functionality of eq at the cost of somewhat poorer descriptive text. Leaving out the Show constraint would have doomed all users to receive poor descriptions. In the end, I left it in, making eq less useful. I could also define a separate nonShowableEq predicate, which is identical to eq in all ways except that it lacks Show. This is ugly, and doing this everywhere would double the exposed symbols in this module (from 50 to 100 exports!)
  • On the other hand, unorderedElemsAre doesn’t have an Eq constraint on Element t. If it did, I could provide better explanatory text by listing which specific element values in the collection were unexpected. Yet this seems too limiting on such a useful combinator, which is often useful for higher-order types. Therefore, I left the constraint out. This is a real loss of functionality. Again, I could define showableUnorderedElemsAre, but at the cost of exploding the API if I did this everywhere.

What I really want is for GHC to check the types looking for a Show constraint, then use one or the other implementation depending on whether a Show constraint exists. This might seem suspect. Indeed, it is! It’s only reasonable because the results of the two implementations are equivalent up to the unspecified explanatory text.

Motivation #2: Improving efficiency

This isn’t the only time something like this comes up. A similar situation often arises with polymorphic algorithms. With only an Eq constraint, the best you can do for Haskell’s nub function (which removes duplicates from a list) is O(n²). Add an Ord constraint, and you can do the same thing in O(n log n). It requires a bit of care to do it in a way that preserves the original order, but it’s still possible. Add a Hashable constraint, and you can do it in O(n) expected time.

In practice, Haskell offers nub, and common libraries offer orbNub to do the same thing more efficiently. That’s again an extra API for the same function, just with better performance at the cost of more assumptions about its input. It’s ugly that this is necessary.

Existing (and partial) solutions

It’s not surprising that people have thought about this in the past. There are several bits of prior work worth mentioning.

  • In C++, there’s an approach known as SFINAE: “Specialization Failure Is Not An Error”. What this means is that you can just write two implementations of the same template, and if the more specific one doesn’t compile, you don’t get an error! Instead, the compiler ignores it and substitutes the more general implementation. Since C++ monomorphizes all templates at compile time, it all comes down to trying, and dropping the specialization if possible. This means C++ programs can make these trade-offs without exposing different names and APIs.
  • Mike Izbicki has written a Haskell library called ifcxt for Haskell, which partially solves the problem. The good news is that you can use the library to define optional constraints, and different implementations based on whether the constraint is satisfied or not. The bad news is that it requires Template Haskell to write a bunch of boilerplate. Unfortunately, new boilerplate is needed for every new type, so this Template Haskell cannot be internal to the implementation and needs to be invoked from the client code.
  • GHC already has a mechanism called rewrite rules, which allow certain expressions to be matched and rewritten by the compiler into a more efficient form. However, this happens during the Core stage of GHC, which is after type class resolution has occurred. This means that rewrite rules are not able to add additional type classes.

It seems that what’s needed here is a GHC plugin or extension.

Proposed solution #1: Source rewrite rules

This seems very similar to a rewrite rule, but just needs to fire at a different stage of the compilation. Therefore, if I were proposing this, I would propose a new kind of rewrite rule. Something like this:

{-# SOURCE_RULES
"nub/Ord" forall x. nub x = ordNub x
#-}

When encountering a rule like this during type checking, GHC would attempt to perform the given substitution. If the resulting program type checks, fine. Otherwise, it would fall back to not applying the rule rather than failing the compile. This is very like the SFINAE rule in C++.

Proposed solution #2: Optional constraints

There’s a slight problem with the solution above: it’s not very compositional when we have separate compilation. If I write this:

foo :: a -> a
foo = id
fooNum :: Num a => a -> a
fooNum a = a + 1
{-# SOURCE_RULES
"fooNum" forall x. foo x = fooNum x
#-}
bar x = foo x

What is the type of bar? If bar has type a -> a, then the fooNum rule never fires. On the other hand, if bar has type Num a => a -> a, then it is insufficiently general.

I propose that it should infer a -> a, but with an explicit type signature, one could write bar :: ?Num a => a -> a. This would inform GHC that it should generate two versions of bar, one specialized implementation with a Num constraint, and one general implementation without. It should then automatically generate a new source rule forall x. bar x = barNum x, that tries to rewrite the more general implementation into the specialized one when possible. (If there are no source rules that depend on the constraint, GHC could realize that both implementations are the same, and decline to generate a specialization at all, possibly with a warning about a redundant optional constraint.)

Conclusion

What do you think? Is this a direction you’d like to see GHC move? Is ifCxt enough?

--

--

Chris Smith

Software engineer, volunteer K-12 math and computer science teacher, author of the CodeWorld platform, amateur ring theorist, and Haskell enthusiast.