Let’s run a simple experiment to see how well the `stl()`

function of the R statistical programming language decomposes time-series data.

### An Example

First, we plot some sales data:

```
sales<-c(39, 73, 41, 76, 75, 47, 4, 53, 40, 47, 31, 33,
58, 85, 61, 98, 90, 59, 34, 74, 78, 74, 56, 55,
91, 125, 96, 135, 131, 103, 86, 116, 117, 128, 113, 123)
time.series <- ts(data=sales, frequency = 12, start=c(2000, 1), end=c(2002, 12))
plot(time.series, xlab="Time", ylab="Sales (USD)", main="Widget Sales Over Time")
```

Observe the annual seasonality in the data:

We apply R's `stl()`

function ("seasonal and trend decomposition using Loess") to the sales data:

```
decomposed <- stl(time.series, s.window="periodic")
plot(decomposed)
```

This decomposes the sales data as the sum of a seasonal, a trend and a noise/remainder time-series:

We may easily extract the component time series:

```
decomposed <- stl(time.series, s.window="periodic")
seasonal <- decomposed$time.series[,1]
trend <- decomposed$time.series[,2]
remainder <- decomposed$time.series[,3]
```

This allows us to plot the seasonally-adjusted sales:

```
plot(trend+remainder,
main="Widget Sales over Time, Seasonally Adjusted",
ylab="Sales (USD)")
```

### An Experiment

How well does `stl()`

extract trend and seasonality from data? We run three simple graphical investigations.

#### Case 1: Strong seasonality and low, normally-distributed homoskedastic noise

The left side of each of the above images shows, from top to bottom:

- Generated sales data.
- The trend component from which the data was generated.
- The seasonal component from which the data was generated.
- The noise/remainder component from which the data was generated.

The right side shows:

- Generated sales data.
- The trend component identified by
`stl()`

. - The seasonal component identified by
`stl()`

. - The noise/remainder component identified by
`stl()`

.

Note the close match between the two trend components and between the two seasonal components. This indicates that `stl()`

works well in this instance.

#### Case 2: Weak seasonality and high, normally-distributed homoskedastic noise

Again, `stl()`

appears to work quite well.

#### Case 3: Weak seasonality and high, normally-distributed heteroskedastic noise

And `stl()`

still seems to work fairly well. This is heartening, as it's common for the variance in a time series to increase as its mean rises—as is the case here.

### How `stl()`

Works

When calling `stl()`

with `s.window="periodic"`

, the seasonal component for January is simply the mean of all January values. Similarly, the seasonal component for February is simply the mean of all February

values, etc. Otherwise, the seasonal component is calculated using loess smoothing (discussed below).

Having calculated the seasonal component, the seasonally-adjusted data (the original data minus the seasonal component) is loess-smoothed to determine the trend.

The remainder/noise is then the original data minus the seasonal and trend components.

The `stl()`

function is quite flexible:

- The seasonality does not have to run across a year. Any period may be used for this.
- The decomposition process can accommodate seasonality that changes over time.
- A robust decomposition process is available that is less affected by outliers than is the default.

### An Introduction to Loess Smoothing

Loess ("locally-weighted scatterplot smoothing") uses local regression to remove "jaggedness" from data.

- A window of a specified width is placed over the data. The wider the window, the smoother the resulting loess curve.
- A regression line (or curve) is fitted to the observations that fall within the window, the points closest to the centre of the window being weighted to have the greatest effect on the calculation of the regression line.
- The weighting is reduced on those points within the window that are furthest from the regression line. The regression is re-run and weights are again re-calculated. This process is repeated several times.
- We thereby obtain a point on the loess curve. This is the point on the regression line at the centre of the window.
- The loess curve is calculated by moving the window across the data. Each point on the resulting loess curve is the intersection of a regression line and a vertical line at the centre of such a window.

To calculate a loess curve using R:

```
plot(cars$speed, cars$dist, main="Car Speed and Stopping Distance", xlab="Speed (mph)", ylab="Stopping Distance (ft)")
lines(lowess(cars$speed, cars$dist), col="red")
```

### Generating Test Data

Here, for completeness, is the code I used to generate the graphs I used in my tests:

```
# Parameters
start.ym <- c(2000, 1)
end.ym <- c(2012,12)
n.years <- 13
# Set the seed for the randomisation
set.seed(5)
# Create the 2nd derivative of the sales trend
ddtrend.sales <- qnorm(runif(12*n.years, 0.1, 0.90), mean=0, sd=0.4)
# Create the 1st derivative of the sales trend from the 2nd derivative
dtrend.sales <- rep(NA, 12*n.years)
dtrend.sales[1] <- 0
for (i in 2:(12*n.years)) dtrend.sales[i] <- dtrend.sales[i-1] + ddtrend.sales[i]
# Create the sales trend from the 1st derivative
trend.sales <- rep(NA, 12*n.years)
trend.sales[1] <- 30
for (i in 2:(12*n.years)){
trend.sales[i] <- trend.sales[i-1] + dtrend.sales[i]
if (trend.sales[i] < 0) trend.sales[i] = 0
}
# Create the seasonality
seasonality <- rep(c(10, 30, 22, 32, 26, 14, 2, -15, -14, -13, -16, -2), 13)
# Create the random noise, normally distributed
noise <- qnorm(runif(12*n.years, 0.01, 0.99), mean=0, sd=18)
# To make the noise heteroskedastic, uncomment the following line
# noise <- noise * seq(1, 10, (10-1)/(12*n.years-1))
# Create the sales
sales <- trend.sales + seasonality + noise
# Put everything into a data frame
df.sales <- data.frame(sales, trend.sales, dtrend.sales, ddtrend.sales, seasonality, noise)
# Set graphical parameters and the layout
par(mar = c(0, 4, 0, 2)) # bottom, left, top, right
layout(matrix(c(1,5,2,6,3,7,4,8), 4, 2, byrow = TRUE), widths=c(1,1,1,1,1,1,1,1), heights=c(1,1,1,1,1,1,1,1))
# Plot sales
tseries <- ts(data=df.sales$sales, frequency = 12, start=start.ym, end=end.ym)
plot(tseries, ylab="Sales (USD, 1000's)", main="", xaxt="n")
# Plot the trend
tseries <- ts(data=df.sales$trend.sales, frequency = 12, start=start.ym, end=end.ym)
plot(tseries, ylab="Actual Sales Trend (USD, 1000's)", main="", xaxt="n")
# Plot the seasonality
tseries <- ts(data=df.sales$seasonality, frequency = 12, start=start.ym, end=end.ym)
plot(tseries, ylab="Actual Sales Seasonality (USD, 1000's)", main="", xaxt="n")
# Plot the noise
tseries <- ts(data=df.sales$noise, frequency = 12, start=start.ym, end=end.ym)
plot(tseries, ylab="Actual Sales Noise (USD, 1000's)", main="", xaxt="n")
# Decompose the sales time series
undecomposed <- ts(data=df.sales$sales, frequency = 12, start=start.ym, end=end.ym)
decomposed <- stl(undecomposed, s.window="periodic")
seasonal <- decomposed$time.series[,1]
trend <- decomposed$time.series[,2]
remainder <- decomposed$time.series[,3]
# Plot sales
tseries <- ts(data=df.sales$sales, frequency = 12, start=start.ym, end=end.ym)
plot(tseries, ylab="Sales (USD, 1000's)", main="", xaxt="n")
# Plot the decomposed trend
tseries <- ts(data=trend, frequency = 12, start=start.ym, end=end.ym)
plot(tseries, ylab="Est. Sales Trend (USD, 1000's)", main="", xaxt="n")
# Plot the decomposed seasonality
tseries <- ts(data=seasonal, frequency = 12, start=start.ym, end=end.ym)
plot(tseries, ylab="Est. Sales Seasonality (USD, 1000's)", main="", xaxt="n")
# Plot the decomposed noise
tseries <- ts(data=remainder, frequency = 12, start=start.ym, end=end.ym)
plot(tseries, ylab="Est. Sales Noise (USD, 1000's)", main="", xaxt="n")
```

Author: Peter Rosenmai