Introducing: fable.gam, a new forecasting option
Another time-series decomposition rant
Readers of this blog know that boy do I ever froth a structural time-series analysis approach. Basically, this involves decomposing a time series into the sum of simpler parts. Such a model might take the following form:
\[ y(t) = f_{1}(t) + f_{2}(t) + \dots + f_{n}(t) + \epsilon_{t} \] where \(f_{1}(t) \dots f_{n}(t)\) are the different parts of the time series we are explicitly modelling, and \(\epsilon_{t}\) is the error term. Examples of these parts may include:
- Trend - Is there an increase/decrease over time?
- Autocorrelation structure - How correlated are points with those that came before them? At which lag is this relationship meaningful (e.g., one point back? Five points back?)?
- Seasonality - Are there recurring periodicities in the data (e.g., cyclic patterns observed at a weekly and monthly level for retail sales)?
- Holidays - Are there infrequent holiday periods which are important to capture?
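If you want to see this kind of decomposition in action before any modelling, here is a minimal sketch using STL from the feasts package (loaded by fpp3) on the quarterly cement production data we will meet later in this post:

library(fpp3)

# Decompose quarterly Australian cement production into
# trend, seasonal, and remainder components via STL
aus_production |>
  model(stl = STL(Cement)) |>
  components() |>
  autoplot()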
I really like this structural time series framework because it is highly interpretable and forces the analyst to make active and appropriate modelling decisions. One of the other benefits is that because the framework is additive, it can be exploited through powerful models such as generalised additive models and Gaussian processes. I have previously blogged about the latter here and here, but in this post we are focusing on the former: GAMs.
Generalised additive models
A GAM is a generalised linear model (GLM) in which the response variable depends linearly on unknown smooth functions of some predictor variables. This is powerful because these smooth functions are splines constructed from local basis functions, meaning that all manner of nonlinear relationships can be handled. Moreover, there are different ways to construct these smooth functions, such as cyclic cubic splines, which ensure that the end of each seasonal cycle connects smoothly back to the start of the next - an essential property for modelling things like seasonality.
We can write a GAM as follows:
\[ g(E(Y)) = \beta_{0} + f_{1}(x_{1}) + f_{2}(x_{2}) + \cdots + f_{n}(x_{n}) \] where \(g\) is the link function used to connect the expected value to the target distribution, such as the log-link for the Poisson distribution, \(\beta_{0}\) is the intercept, and \(f_{1}(x_{1}) \dots f_{n}(x_{n})\) are the unknown smooth functions to estimate. Note the similarity to the structural time series above!
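To make this concrete, here is a minimal standalone sketch of fitting such a model with mgcv; the simulated data-generating process is made up purely for illustration:

library(mgcv)

# Simulate a monthly series: linear trend + annual seasonality + noise
set.seed(123)
n <- 120
time_idx <- seq_len(n)
month <- rep(1:12, length.out = n)
y <- 0.05 * time_idx + 2 * sin(2 * pi * month / 12) + rnorm(n, sd = 0.5)

# Smooth trend plus a cyclic cubic spline for the seasonal term so
# that the fitted seasonal effect joins December smoothly onto January
mod <- gam(y ~ s(time_idx) + s(month, bs = "cc", k = 12),
           knots = list(month = c(0.5, 12.5)))
summary(mod)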
Now, at this point, you might be wondering about the title of this post, and rightly so. How do GAMs fit into the forecasting software landscape? Well, they currently don’t really…
GAMs have been used to successfully model a range of time series, including daily Central England Temperature data (which are highly seasonal) and paleoecological time series, among many others. Despite this success, GAMs remain a relatively underutilised tool in statistical analysis generally, and especially in forecasting, where classic methods such as exponential smoothing and the autoregressive integrated moving average (ARIMA), along with dynamic harmonic regression and more modern methods such as neural networks, continue to dominate practice and the literature.
The closest existing forecasting framework to GAMs is the Prophet model developed by Facebook. Prophet follows the aforementioned decomposition time series approach, and can be written as:
\[ y(t) = g(t) + s(t) + h(t) + \epsilon_{t} \] where \(g(t)\) is the (nonlinear) trend function, \(s(t)\) represents periodic changes (e.g., seasonality), and \(h(t)\) represents the effects of holidays (Facebook developed Prophet for business time series, where holidays matter). The error term \(\epsilon_{t}\) represents any variance not captured by the other components, though this is commonly reduced to a parametric Gaussian assumption.
While Prophet is basically a GAM, it is written in the probabilistic programming language Stan, and interfacing with its priors is not something analysts are necessarily familiar with. Moreover, there exists a gold-standard package for fitting GAMs in R called mgcv which is not currently leveraged within the forecasting space, and, as a giant mgcv fan, I just wanted to see if I could do something with it!
The fable tidy forecasting ecosystem for R
fable is a framework for evaluating, visualising, and combining models in a workflow consistent with the tidyverse. It provides an incredibly intuitive ecosystem and interface for doing real-world forecasting, and removes a lot of the opportunities for user error through its automatic handling of predictive distributions and other key aspects of the forecasting workflow.
fable provides easy access to a wide range of models right off the rip, including exponential smoothing, ARIMA, time series linear models, dynamic harmonic regression, and many more. Extension packages, such as fable.prophet, have also been developed to bring additional models into the framework. With all that in mind, the title of this post has likely given the game away: I have developed fable.gam, an extension to fable that enables the fitting of GAM forecast models.
Enough waffle, let’s load some R packages and dive right in!
library(dplyr)
library(lubridate)
library(ggplot2)
library(tsibbledata)
library(fable)
library(fable.gam)
library(fable.prophet)
library(fpp3)

A practical case study
To demonstrate fable.gam, we are going to use a case study from the incredible Forecasting: Principles and Practice 3rd Edition (fpp3) which examines quarterly cement production. Because the series is relatively long, we can split the data into a training and a test set to evaluate predictive performance:
cement <- aus_production |>
filter(year(Quarter) >= 1988)
train <- cement |>
filter(year(Quarter) <= 2007)

The training data is plotted below, where we can see some cyclic patterns and a nonlinear trend (in the raw, non-differenced data).
train |>
autoplot(vars(Cement))

The code below uses the incredible fable forecasting workflow to fit the same three models as were compared in fpp3 (exponential smoothing, ARIMA, and Prophet), as well as a GAM from fable.gam. fable.gam makes use of the fabletools 'special' functions trend and season, which have been modified in fable.gam for the purposes of GAMs. Under the hood, fable.gam parses a trend() call as a smooth function over time to capture temporal effects, and a season() call as a smooth function using a cyclic cubic basis spline, ensuring that the start and end of a seasonal term connect over the duration of the time series (e.g., for data measured on a monthly basis, the end of the final month, December, needs to connect continuously to the start of the following January). In addition to the trend() and season() special functions, fable.gam also includes the special trend2(), which lets users control the smooth term fitted to the temporal trend, just as if you were modifying the parameters of s() in raw mgcv. trend2() takes arguments k (the dimension of the basis used to represent the smooth term, i.e., the number of knots) and bs (a two-letter character string indicating the smoothing basis to use).
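For intuition, my mental model of what a call like GAM(Cement ~ trend2(k = 4, bs = "gp") + season(4)) constructs is roughly the raw mgcv call below. This is a sketch of the idea rather than the actual fable.gam internals (which may differ), and it omits the AR error component, which would need separate handling:

library(mgcv)

# Derive the covariates a GAM special would presumably build from the
# tsibble index: a numeric time index and the quarter-of-year position
train_df <- train |>
  as_tibble() |>
  mutate(time_idx = row_number(), qtr = quarter(as.Date(Quarter)))

gam(Cement ~ s(time_idx, k = 4, bs = "gp") +  # ~ trend2(k = 4, bs = "gp")
      s(qtr, k = 4, bs = "cc"),               # ~ season(4): cyclic, Q4 joins Q1
    knots = list(qtr = c(0.5, 4.5)),
    data = train_df)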
Note that in the code below, I am making some very deliberate modelling decisions (i.e., this is not a hands-off process):
- I am constraining the degree to which the trend term "overfits" to local trends by specifying a basis dimension of k = 4. As I will show later, this prevents the model from forecasting out the (erroneous) upward tick near the end of the training data
- I am using a Gaussian process smooth function. The default smooth in mgcv is a thin plate regression spline, but in my testing of this example, the GP produces better forecasts
- I am specifying both quarterly and annual seasonality to capture both sources of variance
- I am capturing residual serial correlation at lag one
fit <- train |>
model(
arima = ARIMA(Cement),
ets = ETS(Cement),
prophet = prophet(Cement ~ season(period = 4, order = 2,
type = "multiplicative")),
mygam = GAM(Cement ~ trend2(k = 4, bs = "gp") + season(4) + season(12) + errors(ar = 1))
)

We can then easily use fable to produce forecasts for each model and draw a plot:
fc <- fit |>
forecast(h = "2 years 6 months")
fc |>
autoplot(cement) +
facet_wrap(~.model) +
theme(legend.position = "bottom")

The forecast distributions look reasonably similar across the models, but we can explicitly quantify performance by computing accuracy metrics:
fc |>
accuracy(cement)

## # A tibble: 4 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima Test -161. 216. 186. -7.71 8.68 1.27 1.26 0.387
## 2 ets Test -171. 222. 191. -8.07 8.85 1.30 1.29 0.579
## 3 mygam Test -29.4 181. 154. -1.99 6.97 1.05 1.05 0.555
## 4 prophet Test -175. 248. 215. -8.33 9.86 1.47 1.44 0.703

We see that according to each error metric, the GAM is the most accurate model!
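Since fable keeps the full forecast distributions, we can also score those directly rather than only the point forecasts; here is a quick sketch using the CRPS measure built into fabletools:

# Score the full forecast distributions via the continuous ranked
# probability score (CRPS); lower values indicate better forecasts
fc |>
  accuracy(cement, list(crps = CRPS))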
Why choosing an appropriate trend matters (a cautionary tale for GAMs in forecasting work)
As I mentioned earlier, the choice of trend can completely change the results. Let's set k to a higher value (which means the smooth function will more closely match local trends in the training data) and see how we perform:
train |>
model(
arima = ARIMA(Cement),
ets = ETS(Cement),
prophet = prophet(Cement ~ season(period = 4, order = 2,
type = "multiplicative")),
mygam = GAM(Cement ~ trend2(k = 5, bs = "gp") + season(4) + season(12) + errors(ar = 1))
) |>
forecast(h = "2 years 6 months") |>
accuracy(cement)

## # A tibble: 4 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima Test -161. 216. 186. -7.71 8.68 1.27 1.26 0.387
## 2 ets Test -171. 222. 191. -8.07 8.85 1.30 1.29 0.579
## 3 mygam Test -172. 258. 222. -8.36 10.3 1.52 1.50 0.598
## 4 prophet Test -176. 248. 215. -8.35 9.88 1.47 1.44 0.705

We are now worse! How about if we also use the default thin plate regression spline instead of a Gaussian process?
train |>
model(
arima = ARIMA(Cement),
ets = ETS(Cement),
prophet = prophet(Cement ~ season(period = 4, order = 2,
type = "multiplicative")),
mygam = GAM(Cement ~ trend2(k = 5) + season(4) + season(12) + errors(ar = 1))
) |>
forecast(h = "2 years 6 months") |>
accuracy(cement)

## # A tibble: 4 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima Test -161. 216. 186. -7.71 8.68 1.27 1.26 0.387
## 2 ets Test -171. 222. 191. -8.07 8.85 1.30 1.29 0.579
## 3 mygam Test -232. 308. 263. -11.0 12.2 1.80 1.79 0.628
## 4 prophet Test -176. 248. 216. -8.35 9.90 1.47 1.44 0.704

The negative performance differential continues to climb! This really highlights why the analyst-in-the-loop approach to forecasting is critical, especially when using a curve-fitting approach instead of modelling an underlying data generating process (as ARIMA does) or applying some form of damping. It also highlights that GAMs are not an appropriate tool for all forecasting applications (though no tool ever is, as per the no free lunch theorem). In fact, I would say that at this stage they are likely only useful for short-range forecasting, given the spline nature of the trend term.
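As a quick aside on damping: fable's ETS lets you request a damped trend explicitly, which is one standard way to rein in long-horizon trend extrapolation. A sketch:

# Request an additive damped trend ("Ad") explicitly in ETS
train |>
  model(damped_ets = ETS(Cement ~ trend("Ad"))) |>
  forecast(h = "2 years 6 months") |>
  accuracy(cement)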
A second example: Corticosteroid drug sales in Australia
Let’s take a look at one more example. Again, we will use a case study from fpp3 so we have a nice reference point against traditional forecasting methods. This time, we are going to forecast monthly corticosteroid drug sales in Australia, which are known as H02 drugs under the Anatomical Therapeutic Chemical classification scheme. The data looks like this:
h02 <- PBS |>
filter(ATC2 == "H02") |>
reframe(Cost = sum(Cost) / 1e6, .by = "Month") |>
as_tsibble(index = "Month")
train2 <- h02 |>
filter(year(Month) < 2007)
train2 |>
autoplot(vars(Cost))

The seasonality is very evident. We can easily fit the same seasonal ARIMA as is included in fpp3, as well as a GAM specification I created which generates very similar results:
fit2 <- train2 |>
model(
mygam = GAM(log(Cost) ~ trend2(k = 8, bs = "gp") + season(12) + errors(ar = 2)),
arima = ARIMA(log(Cost) ~ 0 + pdq(3, 0, 1) + PDQ(0, 1, 2))
)
fc2 <- fit2 |>
forecast(h = "1 year 6 months")
fc2 |>
autoplot(h02) +
theme(legend.position = "bottom")

Examination of accuracy metrics further highlights the similarity in forecast quality between the two methods:
fc2 |>
accuracy(h02)

## # A tibble: 2 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima Test 0.0201 0.0694 0.0541 1.63 6.62 0.903 0.963 -0.104
## 2 mygam Test 0.00602 0.0706 0.0578 -0.456 7.30 0.966 0.979 0.0456

Additional functionality in fable.gam: Interpolating missing data
Since fable.gam integrates directly into fable, all of the other functionality is readily available to use with GAMs, such as bootstrapping (forecast(x, bootstrap = TRUE)), generating simulated future paths (generate()), inspecting and understanding fitted model parameters through tidy(), glance(), and report(), and interpolating missing data (interpolate()). I'll demonstrate the latter below.
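Before we get to interpolation, here is a quick taste of generate(), reusing the mable from the cement example; the number of simulated paths is arbitrary:

# Simulate 50 future sample paths from the cement GAM
# (h = 10 quarters, matching the earlier 2.5-year horizon)
fit |>
  select(mygam) |>
  generate(h = 10, times = 50) |>
  ggplot(aes(x = Quarter, y = .sim, group = .rep)) +
  geom_line(alpha = 0.2) +
  labs(y = "Cement")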
Let’s take the Olympic running dataset in tsibbledata:
olympic_running |>
ggplot(aes(x = Year, y = Time, colour = Sex)) +
geom_line() +
facet_wrap(~Length, scales = "free_y") +
theme(legend.position = "bottom")

We can see a bunch of missing data points by group. Using the same modelling workflow as above, we can fill these gaps statistically by fitting a GAM and then calling interpolate(), a special fable operation designed to work with models fit within the framework:
intp <- olympic_running |>
model(lm = GAM(Time ~ trend2(k = 3))) |>
interpolate(olympic_running)
intp |>
ggplot(aes(x = Year, y = Time, colour = Sex)) +
geom_line() +
facet_wrap(~Length, scales = "free_y") +
theme(legend.position = "bottom")

Looks like the interpolations are pretty sensible from this basic model!
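As a quick sanity check, we can confirm the interpolated series no longer contains gaps:

# Count missing values before and after interpolation
sum(is.na(olympic_running$Time))  # original series contains gaps
sum(is.na(intp$Time))             # should now be 0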
A final example: Scenario forecasting with exogenous regressors
Another common task in real-world forecasting is to forecast different scenarios comprising various values of the predictor variables. One application of this methodology is cost-benefit analysis, where sensitivity analysis typically tests a variety of scenarios against a base case. fable.gam can handle the inclusion of exogenous regressors just like the TSLM linear regression model in fable, only in this case we can also model nonlinearities with smooths.
Take the us_change dataset from fpp3 for example. In the textbook, the following scenario is presented:
A US policy maker is interested in comparing the predicted change in consumption when there is a constant growth of 1% and 0.5% respectively for income and savings with no change in the employment rate, versus a respective decline of 1% and 0.5%, for each of the four quarters following the end of the sample.
Let’s inspect the relationships between each of the predictors/exogenous regressors analysed in fpp3 with the outcome of change in consumption:
us_change |>
dplyr::select(-c(Unemployment)) |>
pivot_longer(cols = Income:Savings, names_to = "names", values_to = "values") |>
ggplot(aes(x = values, y = Consumption)) +
geom_point() +
facet_wrap(~names, scales = "free", nrow = 3)

There are visible increasing relationships between all three variables and consumption, but there appears to be some nonlinearity (if we ignore transformations for now). What might a GAM fit through each of them look like?
us_change |>
dplyr::select(-c(Unemployment)) |>
pivot_longer(cols = Income:Savings, names_to = "names", values_to = "values") |>
ggplot(aes(x = values, y = Consumption)) +
geom_point() +
geom_smooth(formula = y ~ s(x, k = 5), method = "gam", alpha = 0.3) +
facet_wrap(~names, scales = "free", nrow = 3)

Looks like we might be onto something. Let's now model the data formally in fable and set up our scenarios as per the text above (note that since the relationship between Income and Consumption appears to be linear, I'll include it as a linear predictor rather than a smooth, which demonstrates how flexibly GAMs can mix linear and smooth terms):
fit_consBest <- us_change |>
model(
lm = TSLM(Consumption ~ Income + Savings + Unemployment),
mygam = GAM(Consumption ~ Income + xreg(Savings, smooth = TRUE, k = 5) + xreg(Unemployment, smooth = TRUE, k = 5))
)

We can now set up our forecast scenarios according to the text above using the scenarios() functionality in fable, compute the forecasts, and produce a plot to visualise the TSLM and GAM predictions:
future_scenarios <- scenarios(
Increase = new_data(us_change, 4) |>
mutate(Income = 1, Savings = 0.5, Unemployment = 0),
Decrease = new_data(us_change, 4) |>
mutate(Income = -1, Savings = -0.5, Unemployment = 0),
names_to = "Scenario")
fc3 <- forecast(fit_consBest, new_data = future_scenarios)
us_change |>
autoplot(Consumption) +
autolayer(fc3) +
labs(title = "US consumption", y = "% change") +
facet_wrap(~.model) +
theme(legend.position = "bottom", legend.box = "vertical",
legend.margin = margin())

Pretty nice. Hopefully this shows why the fable tidy forecasting framework is so powerful: you can go from a scientific modelling exercise to results that can inform policy making within one intuitive workflow.
Parting thoughts
Thanks for making it through another time series rant! There is a lot more work and thinking for me to do in this space, but the findings presented here at least suggest that it is an interesting research direction.
