The Semigroup of Exponentially Weighted Moving Averages

Chris Smith

This is just a quick note about some more interesting algebraic structure, and how that structure can help with generalizing an idea. None of this is new, but the perspective it offers on the problem-solving process was interesting to me.

Defining the EWMA

An exponentially weighted moving average (EWMA) is a technique often used for keeping a running estimate of some quantity as you make more observations. The idea is that each time you make a new observation, you take a weighted average of the new observation with the old average.

For a quick example, suppose you’re checking the price of gasoline every day at a nearby store.

  1. On the first day it’s $3.10.
  2. The second day it’s $3.20. You take the average of $3.10 and $3.20, and get $3.15 for your estimate.
  3. The next day, it’s $3.30. You take the average of $3.15 and $3.30, and get $3.225 for your estimate.

This is not just the average of the three data points, which would be $3.20. Instead, the EWMA is biased in favor of more recent data. In fact, if you took these observations for a month, the first day of the month would contribute only a couple of billionths of the final result (its weight would be (1/2)²⁹, roughly 2 × 10⁻⁹), so it's basically ignored entirely, while the last day would account for fully half. That's the point: you're looking for an estimate of the current value, so older historical data is less and less important. But you still want to smooth out the sudden jumps up and down.

In the example above, I was taking a straight average of the previous EWMA and the new data point. That gives historical data a very short-lived influence, though. More generally, there's a parameter, normally called α, that tells you how to weight the average, giving this formula:

EWMAₙ₊₁ = xₙ₊₁ α + EWMAₙ (1-α)

In the extreme, if α = 1, you ignore the historical data and just look at the most recent value. For smaller values of α, the EWMA takes longer to adjust to the current value, but smooths out the noise in the data.
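To make the recurrence concrete, here is a minimal sketch of the sequential computation in Haskell (the name ewmaStep is mine, purely for illustration):

-- One step of the recurrence: a weighted average of the new observation
-- and the previous estimate.
ewmaStep :: Double -> Double -> Double -> Double
ewmaStep alpha prev x = alpha * x + (1 - alpha) * prev

-- With alpha = 0.5 and the gas prices from the example, a left fold
-- reproduces the running estimates (up to floating-point rounding):
--   foldl (ewmaStep 0.5) 3.10 [3.20, 3.30]  is about  3.225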

The EWMA Semigroup

Traditionally, you compute an EWMA one observation at a time, because you only make one observation at a time and keep a running average as you go. But when analyzing such a process, you may want to compute the EWMA of a large set of data in a parallel or distributed way. To do that, you’re going to look for a way to map the observations from the list into a semigroup. (See my earlier article on the relationship between monoids and list processing.)

We’ll start by looking at what the EWMA looks like on a sequence of data points x₁, x₂, …, xₙ.

  • EWMA₁ = x₁
  • EWMA₂ = x₁ (1-α) + x₂ α
  • EWMA₃ = x₁ (1-α)² + x₂ α (1-α) + x₃ α
  • EWMA₄ = x₁ (1-α)³ + x₂ α (1-α)² + x₃ α (1-α) + x₄ α

The first term of this sequence is funky and doesn’t follow the same pattern as the rest. But that’s because it was always different: since you don’t have a prior average at the first data point, you can only take the first data point itself as an expectation.

That’s a problem if we want to generate an entire semigroup from values that represent a single data point, since they would seem to all fall into this special case. We can instead treat the initial value as acting in both ways: as a prior expectation, and as an update to that expectation, giving something like this:

  • EWMA₁ = x (1-α) + x₁ α
  • EWMA₂ = x (1-α)² + x₁ α (1-α) + x₂ α
  • EWMA₃ = x (1-α)³ + x₁ α (1-α)² + x₂ α (1-α) + x₃ α
  • EWMA₄ = x (1-α)⁴ + x₁ α (1-α)³ + x₂ α (1-α)² + x₃ α (1-α) + x₄ α

The first term is the contribution of the prior expectation, but even the single observation case now has update terms, as well.

There are now two parts to the semigroup: a prior expectation, and an update rule. There’s an obvious and trivial semigroup structure to prior expectations: just take the first value and discard the later ones.

The semigroup structure on the update rules is a little more subtle. The key is to realize that the rule for updating an exponentially weighted moving average always looks a bit like the formula for EWMA₁ above: a weighted average of the prior expectation and a new target value. While a single observation uses the fixed weight α, composing multiple observations amounts to adjusting toward a combined target value with a larger weight.

It takes a little algebra to derive the new weight and target value for a composition, but it amounts to this. If f₁ is the update for an earlier observation and f₂ is the update for a later one:

  • f₁(x) = x (1-α₁) + x₁ α₁
  • f₂(x) = x (1-α₂) + x₂ α₂

Then applying f₁ first and f₂ second gives:

  • (f₂ ∘ f₁)(x) = x (1-α₃) + x₃ α₃
  • α₃ = α₁ + α₂ - α₁ α₂
  • x₃ = (x₁ α₁ + x₂ α₂ - x₁ α₁ α₂) / α₃
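As a quick numeric sanity check (the names and values here are purely illustrative, reusing α = ½ and the gas prices from the example), composing the two updates as ordinary functions agrees with this closed form:

-- f1 is the earlier update and f2 the later one; composed applies f1 first.
f1, f2, composed, closedForm :: Double -> Double
f1 x = x * (1 - a1) + x1 * a1
f2 x = x * (1 - a2) + x2 * a2
composed = f2 . f1
closedForm x = x * (1 - a3) + x3 * a3

a1, a2, a3, x1, x2, x3 :: Double
a1 = 0.5
x1 = 3.10
a2 = 0.5
x2 = 3.20
a3 = a1 + a2 - a1 * a2
x3 = (x1 * a1 + x2 * a2 - x1 * a1 * a2) / a3

-- composed p and closedForm p agree (up to floating point) for any prior p.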

This can be defunctionalized to only store the weight and target values. In Haskell, it looks like this:

import Data.Foldable1 (Foldable1, foldMap1)
import Data.Semigroup (First (..))

-- A defunctionalized update rule: "adjust toward ewmaTarget with weight
-- ewmaAlpha", i.e. the function \x -> x * (1 - ewmaAlpha) + ewmaTarget * ewmaAlpha.
data EWMAUpdate = EWMAUpdate
  { ewmaAlpha :: Double,
    ewmaTarget :: Double
  }
  deriving (Eq, Show)

-- Compose two update rules, applying the left one first.
instance Semigroup EWMAUpdate where
  EWMAUpdate a1 v1 <> EWMAUpdate a2 v2 =
    EWMAUpdate newAlpha newTarget
    where
      newAlpha = a1 + a2 - a1 * a2
      newTarget
        | newAlpha == 0 = 0
        | otherwise = (a1 * v1 + a2 * v2 - a1 * a2 * v1) / newAlpha

-- The identity update: adjust nowhere, with weight zero.
instance Monoid EWMAUpdate where
  mempty = EWMAUpdate 0 0

-- A prior expectation (the first observation wins) plus an update rule.
data EWMA = EWMA
  { ewmaPrior :: First Double,
    ewmaUpdate :: EWMAUpdate
  }
  deriving (Eq, Show)

instance Semigroup EWMA where
  EWMA p1 u1 <> EWMA p2 u2 = EWMA (p1 <> p2) (u1 <> u2)

-- A single observation acts both as the prior expectation and as an update.
singleEWMA :: Double -> Double -> EWMA
singleEWMA alpha x = EWMA (First x) (EWMAUpdate alpha x)

-- Apply the accumulated update to the prior to produce the current estimate.
ewmaValue :: EWMA -> Double
ewmaValue (EWMA (First x) (EWMAUpdate a v)) = (1 - a) * x + a * v

-- The EWMA of any non-empty sequence of observations.
ewma :: (Foldable1 t) => Double -> t Double -> Double
ewma alpha = ewmaValue . foldMap1 (singleEWMA alpha)

An EWMAUpdate is effectively a function of the form above, and its Semigroup instance is reversed function composition: the left-hand update is applied first, then the right-hand one. However, it is stored in a defunctionalized form so that the composition can be computed in advance. Then we add a few helper functions: singleEWMA to compute the EWMA for a single data point (with a given α value), ewmaValue to apply the update function to the prior and produce an estimate, and ewma to use this semigroup to compute the EWMA of any non-empty sequence.
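For example (the bindings twoDays and threeDays are just for this illustration, and the results are approximate since these are Doubles), feeding in the gas prices from the running example reproduces the step-by-step estimates:

import Data.List.NonEmpty (NonEmpty (..))

-- NonEmpty has a Foldable1 instance, so ewma can consume it directly.
twoDays, threeDays :: Double
twoDays = ewma 0.5 (3.10 :| [3.20])         -- about 3.15
threeDays = ewma 0.5 (3.10 :| [3.20, 3.30]) -- about 3.225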

Why?

What have we gained by looking at the problem this way?

The standard answer is that we’ve gained flexibility in how we perform the computation. In addition to computing an EWMA sequentially from left to right, we can do things like:

  • Compute the EWMA in a parallel or distributed way, farming out parts of the work to different threads or machines (see the sketch after this list).
  • Compute an EWMA on the fly in the intermediate nodes of a balanced tree, such as one can do with Haskell’s fingertree package.
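Here is a rough sketch of the first point, built on the definitions above (splitAndCombine and the half-and-half split are my own illustrative choices, and this assumes at least two observations so both chunks are non-empty). Each chunk can be summarized independently, perhaps on a different thread or machine, and the partial results combined with <> afterwards:

import Data.Foldable1 (foldMap1)
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Summarize two halves of the data independently, then combine them.
-- Because <> is associative, the combined summary agrees with the
-- sequential one (up to floating-point rounding).
splitAndCombine :: Double -> NonEmpty Double -> (EWMA, EWMA)
splitAndCombine alpha xs = (sequential, combined)
  where
    half = NE.length xs `div` 2
    (front, back) = NE.splitAt half xs
    summarize = foldMap1 (singleEWMA alpha) . NE.fromList
    sequential = foldMap1 (singleEWMA alpha) xs
    combined = summarize front <> summarize back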

But neither of these is the reason I worked this out today. Instead, I was interested in a continuous-time analogue of the EWMA. The traditional form of the EWMA is recursive, which gives you a value after each whole step of recursion, but doesn't help much with continuous time. By working out this semigroup structure, the duration of each observation, which was previously implicit in the depth of the recursion, turns into an explicit weight that accumulates as time intervals are concatenated. It's then easy to see how you might incorporate a continuous observation with a non-unit duration, or with greater or lesser weight for some other reason.
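One natural way to do that (my own extrapolation here, not something derived above; observeFor is hypothetical) is to give each observation a duration Δt and derive its weight from a decay rate λ as α = 1 - e^(-λ Δt). Two consecutive holds then compose to a weight of 1 - e^(-λ(Δt₁+Δt₂)), which is exactly what the semigroup's weight rule α₃ = α₁ + α₂ - α₁ α₂ produces.

-- Hypothetical continuous-time observation: holding the value x for a
-- duration dt contributes an update whose weight approaches 1 for long
-- holds and 0 for very short ones.
observeFor :: Double -> Double -> Double -> EWMA
observeFor lambda dt x = EWMA (First x) (EWMAUpdate (1 - exp (-lambda * dt)) x)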

That’s not new, of course, but it’s nice to be able to work it out simply by searching for algebraic structure.


Chris Smith

Software engineer, volunteer K-12 math and computer science teacher, author of the CodeWorld platform, amateur ring theorist, and Haskell enthusiast.