This notebook is motivated by a question posted on Operations Research Stack Exchange regarding the Holt formula for exponential smoothing of a univariate time series with linear trend, often referred to as “double exponential smoothing”.

Double exponential smoothing

I will use the notation from the Wikipedia page rather than the notation from the OR SE question. We start with a univariate time series \(x_t\), beginning at time \(t=0\) and presumed to contain linear trend. So the underlying generator of the time series is assumed to look like \[x_t = c_0 + c_1 t + \epsilon_t\] for some constants \(c_0\) and \(c_1,\) where \(\epsilon_t\) is independent and identically distributed noise. We do not assume that the noise is necessarily normally distributed, but we do assume zero mean for the noise (which is harmless, since we could otherwise just combine the mean of the noise with \(c_0\)).

The Holt double exponential smoothing formula uses smoothing weights \(\alpha, \beta \in (0,1)\) to produce estimates \(s_t\) and \(b_t\) for the current level (\(c_0 + c_1 t\)) at time \(t\) and the slope (\(c_1\)), respectively. The Wikipedia page starts with \(s_0=x_0\) and \(b_0=x_1 -x_0,\) which is fine. I think you could plug in other unbiased initials estimates of level and slope without violating the spirit of the method. Once you have initial values, the smoothing formulas are \[s_t = \alpha x_t + (1-\alpha)(s_{t-1} + b_{t-1})\] and \[b_t = \beta(s_t - s_{t-1}) + (1-\beta)b_{t-1}.\]

Question

The question on OR SE asked why the update to the slope is not computed as \[b_t = \beta(x_t - x_{t-1}) + (1-\beta)b_{t-1}.\] In other words, rather than blending the change in level estimates with the previous slope estimate, why not blend the change in actual observed value with the previous slope? The answer is that while you could do it that way, it would make the estimates a bit noisier. I did a bit of hand-waving in my answer on OR SE … OK, I flapped my arms hard enough to achieve liftoff … so I will try to provide experimental support here.

Experiment

We will generate observations of a time series containing linear trend and i.i.d. noise, compute level and slope estimates using both the Holt formula and the alternative raised in the OR SE question, and see how the estimates perform. To evaluate performance, we will generate multiple time series with the same parameters (including slope and intercept).

Setup

The following model parameters are utterly arbitrary.

horizon <- 100       # number of observations to generate per series
replications <- 100  # number of independent replications (time series) to generate
intercept <- 10      # $c_0$
slope <- 7           # $c_1$
sigma <- 4           # standard deviation of the noise in $x_t$
seed <- 1312         # random number seed

Generate the sample of \(x\) observations.

# Seed the random number generator.
set.seed(seed)
# Generate the x observations.
x <- intercept + slope * (0 : (horizon - 1)) + rnorm(horizon, mean = 0, sd = sigma)
for (r in 2:replications) {
  x <- rbind(x, intercept + slope * (0 : (horizon - 1)) + rnorm(horizon, mean = 0, sd = sigma))
}

Holt model

First we set the smoothing coefficients (again arbitrarily).

alpha <- 0.2  # weight for smoothing the level estimate
beta <- 0.15  # weight for smoothing the trend estimate

We generate smoothed estimates \(s_t\) of level and \(b_t\) of slope using the Holt formula.

# We first allocate vectors to hold the estimates.
hLevel <- matrix(data = NA, nrow = replications, ncol = horizon)  # level estimates ($s$)
hTrend <- hLevel                                                  # trend estimates ($b$)
# Next, initialize the estimates at time 1.
for (r in 1:replications) {
  hLevel[r, 1] <- x[r, 1]
  hTrend[r, 1] <- x[r, 2] - x[r, 1]
}
# Now use the smoothing formulas to fill in the remaining estimates.
for (t in 2:horizon) {
  for (r in 1:replications) {
    hLevel[r, t] <- alpha * x[r, t] + (1 - alpha) * (hLevel[r, t - 1] + hTrend[r, t - 1])
    hTrend[r, t] <- beta * (hLevel[r, t] - hLevel[r, t - 1]) + (1 - beta) * hTrend[r, t - 1]
  }
}

Variant

Using the same weights, we generate smoothed estimates using the variant in the trend update formula proposed in the OR SE question.

# We again allocate vectors to hold the estimates.
vLevel <- matrix(data = NA, nrow = replications, ncol = horizon)  # level estimates ($s$)
vTrend <- vLevel                                                  # trend estimates ($b$)
# Next, initialize the estimates at time 1, the same way as before.
for (r in 1:replications) {
  vLevel[r, 1] <- x[r, 1]
  vTrend[r, 1] <- x[r, 2] - x[r, 1]
}
# Now use the smoothing formulas to fill in the remaining estimates.
for (t in 2:horizon) {
  for (r in 1:replications) {
    # Level updates proceed as before.
    vLevel[r, t] <- alpha * x[r, t] + (1 - alpha) * (vLevel[r, t - 1] + vTrend[r, t - 1])
    # Trend updates replace the change in level estimate with the change in observed value.
    vTrend[r, t] <- beta * (x[r, t] - x[r, t - 1]) + (1 - beta) * vTrend[r, t - 1]
  }
}

Results.

We will use dplyr to manipulate the data and ggplot2 to plot results.

library(dplyr)
library(ggplot2)

For plotting purposes, we need to turn the level and trend estimates into a dataframe.

xdata <- data.frame(Time = rep.int(1:horizon, replications),
                    Method = "Holt",
                    Level = as.vector(t(hLevel)),
                    Trend = as.vector(t(hTrend)))
xdata2 <- data.frame(Time = rep.int(1:horizon, replications),
                     Method = "Variant",
                    Level = as.vector(t(vLevel)),
                    Trend = as.vector(t(vTrend)))
xdata <- rbind(xdata, xdata2)
rm(xdata2)

Next we convert individual observations into 95% confidence intervals.

xdata <- xdata %>% 
           group_by(Time, Method) %>% 
           summarize(Level_lo = t.test(Level, mu = mean(Level))$conf.int[1],
                     Level_hi = t.test(Level, mu = mean(Level))$conf.int[2],
                     Trend_lo = t.test(Trend, mu = mean(Trend))$conf.int[1],
                     Trend_hi = t.test(Trend, mu = mean(Trend))$conf.int[2])
`summarise()` has grouped output by 'Time'. You can override using the `.groups` argument.

Because the trend swamps the width of the level confidence intervals, making them hard to see, we convert both level and trend values into errors relative to the true level and trend.

xdata <- xdata %>% mutate(Level_lo = Level_lo - (intercept + slope*(Time - 1)),
                          Level_hi = Level_hi - (intercept + slope*(Time - 1)),
                          Trend_lo = Trend_lo - slope,
                          Trend_hi = Trend_hi - slope)

Level

Here we plot the confidence intervals for the error in estimated level over time, by method.

ggplot(data = xdata, mapping = aes(x = Time)) +
       labs(title = "95% C. I. for Error in Level Estimate", x = "Time", y = "Level Error") +
       geom_hline(mapping = aes(yintercept = 0)) +
       geom_errorbar(mapping = aes(color = Method, ymin = Level_lo, ymax = Level_hi)) +
       theme(plot.title = element_text(hjust = 0.5))

Initially the confidence intervals are large (and the proposed variation may actually do a bit better than the Holt method), but over time the width of the confidence intervals stabilizes and the two methods are pretty close in terms of estimate variability.

Slope

Here we plot the confidence intervals for the error in estimated slope over time, by method.

ggplot(data = xdata, mapping = aes(x = Time)) +
       labs(title = "95% C. I. for Error in Trend Estimate", x = "Time", y = "Trend Error") +
       geom_hline(mapping = aes(yintercept = 0)) +
       geom_errorbar(mapping = aes(color = Method, ymin = Trend_lo, ymax = Trend_hi)) +
       theme(plot.title = element_text(hjust = 0.5))

As with the error in level, the confidence intervals for the error in the trend estimates start large and stabilize (somewhat) in width, but here the Holt method has a clearly visible advantage in variability.

---
title: "Holt Exponential Smoothing"
author: Paul A. Rubin (rubin@msu.edu)
date: November 1, 2022
output: html_notebook
---

This notebook is motivated by a [question](https://or.stackexchange.com/questions/9217/forecasting-with-holts-linear-trend-method/) posted on Operations Research Stack Exchange regarding the Holt formula for [exponential smoothing](https://en.wikipedia.org/wiki/Exponential_smoothing#Double_exponential_smoothing_(Holt_linear)) of a univariate time series with linear trend, often referred to as "double exponential smoothing".

# Double exponential smoothing

I will use the notation from the Wikipedia page rather than the notation from the OR SE question. We start with a univariate time series $x_t$, beginning at time $t=0$ and presumed to contain linear trend. So the underlying generator of the time series is assumed to look like $$x_t = c_0 + c_1 t + \epsilon_t$$ for some constants $c_0$ and $c_1,$ where $\epsilon_t$ is independent and identically distributed noise. We do not assume that the noise is necessarily normally distributed, but we do assume zero mean for the noise (which is harmless, since we could otherwise just combine the mean of the noise with $c_0$).

The Holt double exponential smoothing formula uses smoothing weights $\alpha, \beta \in (0,1)$ to produce estimates $s_t$ and $b_t$ for the current level ($c_0 + c_1 t$) at time $t$ and the slope ($c_1$), respectively. The Wikipedia page starts with $s_0=x_0$ and $b_0=x_1 -x_0,$ which is fine. I think you could plug in other unbiased initials estimates of level and slope without violating the spirit of the method. Once you have initial values, the smoothing formulas are $$s_t = \alpha x_t + (1-\alpha)(s_{t-1} + b_{t-1})$$ and $$b_t = \beta(s_t - s_{t-1}) + (1-\beta)b_{t-1}.$$

# Question

The question on OR SE asked why the update to the slope is not computed as $$b_t = \beta(x_t - x_{t-1}) + (1-\beta)b_{t-1}.$$ In other words, rather than blending the change in level estimates with the previous slope estimate, why not blend the change in actual observed value with the previous slope? The answer is that while you could do it that way, it would make the estimates a bit noisier. I did a bit of hand-waving in my answer on OR SE ... OK, I flapped my arms hard enough to achieve liftoff ... so I will try to provide experimental support here.

# Experiment

We will generate observations of a time series containing linear trend and i.i.d. noise, compute level and slope estimates using both the Holt formula and the alternative raised in the OR SE question, and see how the estimates perform. To evaluate performance, we will generate multiple time series with the same parameters (including slope and intercept).

## Setup

The following model parameters are utterly arbitrary.

```{r}
horizon <- 100       # number of observations to generate per series
replications <- 100  # number of independent replications (time series) to generate
intercept <- 10      # $c_0$
slope <- 7           # $c_1$
sigma <- 4           # standard deviation of the noise in $x_t$
seed <- 1312         # random number seed
```

Generate the sample of $x$ observations.

```{r}
# Seed the random number generator.
set.seed(seed)
# Generate the x observations.
x <- intercept + slope * (0 : (horizon - 1)) + rnorm(horizon, mean = 0, sd = sigma)
for (r in 2:replications) {
  x <- rbind(x, intercept + slope * (0 : (horizon - 1)) + rnorm(horizon, mean = 0, sd = sigma))
}
```

## Holt model

First we set the smoothing coefficients (again arbitrarily).

```{r}
alpha <- 0.2  # weight for smoothing the level estimate
beta <- 0.15  # weight for smoothing the trend estimate
```

We generate smoothed estimates $s_t$ of level and $b_t$ of slope using the Holt formula.

```{r}
# We first allocate vectors to hold the estimates.
hLevel <- matrix(data = NA, nrow = replications, ncol = horizon)  # level estimates ($s$)
hTrend <- hLevel                                                  # trend estimates ($b$)
# Next, initialize the estimates at time 1.
for (r in 1:replications) {
  hLevel[r, 1] <- x[r, 1]
  hTrend[r, 1] <- x[r, 2] - x[r, 1]
}
# Now use the smoothing formulas to fill in the remaining estimates.
for (t in 2:horizon) {
  for (r in 1:replications) {
    hLevel[r, t] <- alpha * x[r, t] + (1 - alpha) * (hLevel[r, t - 1] + hTrend[r, t - 1])
    hTrend[r, t] <- beta * (hLevel[r, t] - hLevel[r, t - 1]) + (1 - beta) * hTrend[r, t - 1]
  }
}
```

## Variant

Using the same weights, we generate smoothed estimates using the variant in the trend update formula proposed in the OR SE question.

```{r}
# We again allocate vectors to hold the estimates.
vLevel <- matrix(data = NA, nrow = replications, ncol = horizon)  # level estimates ($s$)
vTrend <- vLevel                                                  # trend estimates ($b$)
# Next, initialize the estimates at time 1, the same way as before.
for (r in 1:replications) {
  vLevel[r, 1] <- x[r, 1]
  vTrend[r, 1] <- x[r, 2] - x[r, 1]
}
# Now use the smoothing formulas to fill in the remaining estimates.
for (t in 2:horizon) {
  for (r in 1:replications) {
    # Level updates proceed as before.
    vLevel[r, t] <- alpha * x[r, t] + (1 - alpha) * (vLevel[r, t - 1] + vTrend[r, t - 1])
    # Trend updates replace the change in level estimate with the change in observed value.
    vTrend[r, t] <- beta * (x[r, t] - x[r, t - 1]) + (1 - beta) * vTrend[r, t - 1]
  }
}
```

## Results.

We will use `dplyr` to manipulate the data and `ggplot2` to plot results.

```{r}
library(dplyr)
library(ggplot2)
```

For plotting purposes, we need to turn the level and trend estimates into a dataframe.

```{r}
xdata <- data.frame(Time = rep.int(1:horizon, replications),
                    Method = "Holt",
                    Level = as.vector(t(hLevel)),
                    Trend = as.vector(t(hTrend)))
xdata2 <- data.frame(Time = rep.int(1:horizon, replications),
                     Method = "Variant",
                    Level = as.vector(t(vLevel)),
                    Trend = as.vector(t(vTrend)))
xdata <- rbind(xdata, xdata2)
rm(xdata2)
```

Next we convert individual observations into 95% confidence intervals.

```{r}
xdata <- xdata %>% 
           group_by(Time, Method) %>% 
           summarize(Level_lo = t.test(Level, mu = mean(Level))$conf.int[1],
                     Level_hi = t.test(Level, mu = mean(Level))$conf.int[2],
                     Trend_lo = t.test(Trend, mu = mean(Trend))$conf.int[1],
                     Trend_hi = t.test(Trend, mu = mean(Trend))$conf.int[2])
```
Because the trend swamps the width of the level confidence intervals, making them hard to see, we convert both level and trend values into errors relative to the true level and trend.

```{r}
xdata <- xdata %>% mutate(Level_lo = Level_lo - (intercept + slope*(Time - 1)),
                          Level_hi = Level_hi - (intercept + slope*(Time - 1)),
                          Trend_lo = Trend_lo - slope,
                          Trend_hi = Trend_hi - slope)
```

### Level

Here we plot the confidence intervals for the error in estimated level over time, by method.

```{r}
ggplot(data = xdata, mapping = aes(x = Time)) +
       labs(title = "95% C. I. for Error in Level Estimate", x = "Time", y = "Level Error") +
       geom_hline(mapping = aes(yintercept = 0)) +
       geom_errorbar(mapping = aes(color = Method, ymin = Level_lo, ymax = Level_hi)) +
       theme(plot.title = element_text(hjust = 0.5))
```

Initially the confidence intervals are large (and the proposed variation may actually do a bit better than the Holt method), but over time the width of the confidence intervals stabilizes and the two methods are pretty close in terms of estimate variability.

### Slope

Here we plot the confidence intervals for the error in estimated slope over time, by method.

```{r}
ggplot(data = xdata, mapping = aes(x = Time)) +
       labs(title = "95% C. I. for Error in Trend Estimate", x = "Time", y = "Trend Error") +
       geom_hline(mapping = aes(yintercept = 0)) +
       geom_errorbar(mapping = aes(color = Method, ymin = Trend_lo, ymax = Trend_hi)) +
       theme(plot.title = element_text(hjust = 0.5))
```

As with the error in level, the confidence intervals for the error in the trend estimates start large and stabilize (somewhat) in width, but here the Holt method has a clearly visible advantage in variability.
