Unobserved Components Time Series Methodology
- 14 Statistical Treatment of Models
- 14.1 Model definitions
- 14.1.1 Univariate time series models
- 14.1.2 Explanatory variables and interventions
- 14.1.3 Multivariate time series models
- 14.1.4 Common factors
- 14.1.4.1 Common levels
- 14.1.4.2 Smooth trends with common slopes
- 14.1.4.3 Common trends: level and slopes
- 14.1.4.4 Common seasonals
- 14.1.4.5 Common cycles
- 14.1.5 Explanatory variables
- 14.2 State space form
- 14.2.1 Structural time series models in SSF
- 14.2.1.1 Univariate models in SSF
- 14.2.1.2 Multivariate model in SSF
- 14.2.1.3 Deterministic trend and seasonal components
- 14.3 Kalman filter
- 14.3.1 The augmented Kalman filter
- 14.3.2 The likelihood function
- 14.3.2.1 Univariate models: the concentrated likelihood
- 14.3.2.2 Homogeneous models
- 14.3.2.3 Multivariate models
- 14.3.3 Prediction error variance
- 14.3.4 The final state and regression estimates
- 14.3.5 Filtered components
- 14.3.6 Residuals
- 14.3.6.1 Generalised least squares residuals
- 14.3.6.2 Generalised recursive residuals
- 14.4 Disturbance smoother
- 14.4.1 The augmented disturbance smoother
- 14.4.2 The EM algorithm and exact score
- 14.4.3 Smoothed components
- 14.4.4 Auxiliary residuals
- 14.5 Forecasting
- 14.5.1 Forecasts of series and components
- 14.5.2 Extrapolative residuals
- 14.6 Parameter estimation
- 14.6.1 The parameters of STAMP
- 14.6.1.1 Variance matrices in a multivariate model
- 14.6.1.2 Vector autoregressive components
- 14.6.2 BFGS Estimation procedure*
- 14.6.3 Estimation of univariate models*
- 14.6.3.1 Initial values
- 14.6.3.2 Strategy for setting parameters fixed
- 14.6.3.3 Strategy for determining the concentrated parameter
- 14.6.4 Estimation of multivariate models*
- 14.6.4.1 Initial values
- 14.6.4.2 Strategy for fixing parameters
- 14.7 Appendix 1: Diffuse distributions*
- 14.7.1 Collapse of the augmented KF
- 14.7.2 Likelihood calculation
- 14.7.3 Recursive regressions
- 14.8 Appendix 2: Numerical optimisation*
- 14.8.1 Newton type methods
- 14.8.2 Numerical score vector and diagonal Hessian matrix
- 15 Model Output
- 15.1 Output from STAMP
- 15.1.1 Model estimation
- 15.1.2 Selected model and estimation output
- 15.1.3 Summary statistics
- 15.1.4 The sample period
- 15.1.5 Test
- 15.2 Parameters
- 15.2.1 Variances and standard deviations
- 15.2.2 Cycle and AR(1)
- 15.2.3 Variance matrices (multivariate models)
- 15.2.4 Factor loading matrices (multivariate models)
- 15.2.5 Transformed parameters and standard errors
- 15.3 Final state
- 15.3.1 Analysis of state
- 15.3.2 Regression analysis
- 15.3.3 Seasonal tests
- 15.3.4 Cycle tests
- 15.3.5 Data in logs
- 15.4 Goodness of fit
- 15.4.1 Prediction error variance
- 15.4.2 Prediction error mean deviation
- 15.4.3 Coefficients of determination
- 15.4.4 Information criteria : AIC and BIC
- 15.5 Components
- 15.5.1 Series with components
- 15.5.2 Detrended
- 15.5.3 Seasonally adjusted
- 15.5.4 Individual seasonals
- 15.5.5 Data in logs
- 15.6 Residuals
- 15.6.1 Correlogram
- 15.6.2 Periodogram and spectrum
- 15.6.3 Cumulative statistics and graphs
- 15.6.4 Distribution statistics
- 15.6.5 Heteroskedasticity
- 15.7 Auxiliary residuals
- 15.8 Predictive testing
- 15.9 Forecast
- Part:4 Statistical Output
- 13 Descriptive Statistics
- 13.1 Transformations
- 13.2 Correlogram
- 13.3 Periodogram
- 13.4 Spectrum
- 13.5 CUSUM
- 13.6 CUSUM of squares
- 13.7 Histogram
- 13.8 Density estimation
- 13.9 Distribution function
- 13.10 Write report
- 13.10.1 Serial correlation
- 13.10.2 Normality test
- List of tables
- Table:13.1 Empirical size of normality tests
- Table:14.1 Some special level and trend specifications
- Table:14.2 Parameters and transformations
Part:4 Statistical Output
Chapter 13 Descriptive Statistics
The Transform entry in the Data menu involves transformation of series and calculation of statistics. This chapter provides the formulae underlying the computations of descriptive statistics for individual time series. Model-related diagnostic statistics are presented in Chapter 15.
The Describe dialog, to be accessed from the Transform dialog, produces two types of output. The first, and most useful, is graphical output. Each of the options we discuss below produces graphs of one sort or another. However, the last entry in the Describe dialog, `Write report', generates output to be written to the Results window. By default this is turned off, but it is often useful to mark it so that precise summary statistics can be recorded for later use in a research report.
13.1 Transformations
This option provides two different kinds of transformations: a transform function for the actual values and a time series transform function. These different transformations are described below.
The very common and basic transformations are the logarithm, the exponential and the absolute value. The `Box--Cox' and the `Stochastic volatility' transformations require some explanation. The transformation of Box and Cox (1964) is given by

$$y_t^{(\lambda)} = \begin{cases} \left(y_t^{\lambda} - 1\right)/\lambda, & \lambda \neq 0, \\ \log y_t, & \lambda = 0, \end{cases}$$

where λ usually ranges between −2 and 2 (STAMP does not accept values outside this range, to ensure numerical stability). Note that the Box--Cox transformation with the default λ = 1/2 is very similar to the `square root' transformation. The following stochastic volatility transformation (see also Chapter ??) is suggested by Professor Wayne Fuller; see Fuller (1996, pp. 494--497); that is,

$$x_t = \log\left(y_t^2 + \lambda s_y^2\right) - \frac{\lambda s_y^2}{y_t^2 + \lambda s_y^2},$$

where $s_y^2$ is the unconditional variance of yt, and λ is a small constant, with default value 0.02.
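As a worked illustration, the following Python sketch applies both transformations to a series. It is not part of STAMP; in particular, the exact form of the Fuller transformation coded here is an assumption based on the description above.

```python
import numpy as np

def box_cox(y, lam=0.5):
    """Box--Cox transform; lam = 0 gives the log transform."""
    y = np.asarray(y, dtype=float)
    if lam == 0.0:
        return np.log(y)
    return (y**lam - 1.0) / lam

def sv_transform(y, lam=0.02):
    """Stochastic volatility (Fuller) transform -- a sketch assuming the
    form log(y^2 + lam*s^2) - lam*s^2/(y^2 + lam*s^2)."""
    y = np.asarray(y, dtype=float)
    c = lam * y.var()                 # lam times unconditional variance
    return np.log(y**2 + c) - c / (y**2 + c)
```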
The second type of transformation focuses on differencing or integrating time series. If differencing is required, the following operators are available:

$$\Delta y_t = (1 - L)\,y_t, \qquad \Delta_s y_t = (1 - L^s)\,y_t,$$

where L is the lag operator (so L yt = yt−1) and s is the number of seasons in a year. When the differencing involves missing data, such as y0 and y−1, the resulting differenced series records a missing value; for example, Δy1 is regarded as missing. If integration is required, the resulting series, called zt, is defined as

$$z_t = y_t + z_{t-1}, \qquad t = 1, \dots, T,$$

where zt is set to zero for any t ≤ 0. This operation does not generate extra missing values.
Finally, the series can be detrended. This is carried out using the following trend model:

$$y_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \beta_{t-1}, \qquad \beta_t = \beta_{t-1} + \zeta_t, \qquad \zeta_t \sim NID(0, q\,\sigma_\varepsilon^2),$$

which is discussed in §14.1. A forwards and backwards recursive algorithm is used to estimate μt, written μ̂t, allowing the `detrended' series to be computed as yt − μ̂t. The default value of q is 0.000625 = 1/1600, as suggested by Hodrick and Prescott. Increasing this number fits a more responsive trend line, while setting it to zero gives a deterministic time trend.
13.2 Correlogram
The correlogram is the series rτ, for τ = 1, ..., p, where rτ is the sample autocorrelation between yt and yt−τ. The length of the correlogram p is pre-specified, leading to a figure which shows r1, r2, ..., rp plotted against 1, 2, ..., p. The autocorrelation for lag τ is defined by

$$r_\tau = \frac{\sum_{t=\tau+1}^{T} (y_t - \bar y)(y_{t-\tau} - \bar y)}{\sum_{t=1}^{T} (y_t - \bar y)^2},$$

where $\bar y = \frac{1}{T}\sum_{t=1}^{T} y_t$ is the sample mean of yt. The approximate standard error of rτ is $1/\sqrt{T}$.
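For concreteness, a minimal Python sketch of this calculation (not the STAMP implementation) is:

```python
import numpy as np

def correlogram(y, p):
    """Sample autocorrelations r_1, ..., r_p about the sample mean."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    denom = np.sum(d**2)
    return np.array([np.sum(d[tau:] * d[:-tau]) / denom
                     for tau in range(1, p + 1)])
# Approximate standard error under white noise: 1/sqrt(T).
```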
13.3 Periodogram
The information in the correlogram can also be expressed in the frequency domain. The ordinates of the periodogram are

$$p(\lambda_j) = \frac{2}{T}\left\{\left(\sum_{t=1}^{T} y_t\, c_{j,t}\right)^2 + \left(\sum_{t=1}^{T} y_t\, s_{j,t}\right)^2\right\}, \qquad \lambda_j = \frac{2\pi j}{T},$$

with cj,t = cos λj t and sj,t = sin λj t, for j = 1, ..., n−1, where n = [T/2]. The sine and cosine terms are evaluated recursively, using the trigonometric addition rule; that is,

$$c_{j,t+1} = c_{j,t}\cos\lambda_j - s_{j,t}\sin\lambda_j, \qquad s_{j,t+1} = c_{j,t}\sin\lambda_j + s_{j,t}\cos\lambda_j,$$

for t = 0, ..., T−1, starting from cj,0 = 1 and sj,0 = 0. Note that the terms cos λj and sin λj, for j = 0, ..., n, can also be evaluated recursively using a similar scheme. These recursions speed up the calculations by a significant amount. The periodogram is a plot of the ordinates p(λj) against λj/π, for j = 1, ..., n. Thus the X-axis ranges between 0 and 1.
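The recursive evaluation can be sketched in Python as follows; this is an illustration only, and the 2/T scaling follows the reconstruction above.

```python
import numpy as np

def periodogram(y):
    """Periodogram ordinates p(lambda_j) at lambda_j = 2*pi*j/T, with the
    cos/sin terms built recursively via the trigonometric addition rule."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    n = T // 2
    ords = []
    for j in range(1, n):
        cl, sl = np.cos(2 * np.pi * j / T), np.sin(2 * np.pi * j / T)
        c, s = np.empty(T), np.empty(T)
        c[0], s[0] = 1.0, 0.0                 # cos(0), sin(0)
        for t in range(T - 1):                # addition rule
            c[t + 1] = c[t] * cl - s[t] * sl
            s[t + 1] = c[t] * sl + s[t] * cl
        ords.append((2.0 / T) * (np.dot(y, c)**2 + np.dot(y, s)**2))
    return np.array(ords)
```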
The periodogram can also be written in terms of the sample autocovariances; that is,

$$p(\lambda_j) = 2\left\{c(0) + 2\sum_{\tau=1}^{T-1} c(\tau)\cos(\lambda_j\tau)\right\},$$

where $c(\tau) = \frac{1}{T}\sum_{t=\tau+1}^{T}(y_t - \bar y)(y_{t-\tau} - \bar y)$.
In the literature on the spectral analysis of stationary time series, the plot of p(λj) against λj is sometimes called the spectrogram, but the term periodogram is more widely used. For more information on this topic see Priestley (1981).
13.4 Spectrum
The periodogram is an inconsistent estimator of the spectrum; a consistent estimator of the spectrum is given by

$$\hat f(\lambda_j) = \sum_{i=-m}^{m} w_i\, p(\lambda_{j+i}),$$

where m is called the window length and the wj's are normalised weights which are constrained by w−j = wj, for j = 0, ..., m. The weights in STAMP are directly associated with the odd rows of the Pascal triangular scheme, for example,

$$1, \qquad 1\;\;2\;\;1, \qquad 1\;\;4\;\;6\;\;4\;\;1, \qquad \dots$$

and $w_j = c_j\,4^{1-m}$. Note that, for this Pascal scheme, the normalisation factor is based on $\sum_{j=-m}^{m} c_j = 4^{m-1}$, so that the weights sum to one. At the beginning and at the end of the estimated spectrum the weighting scheme is simply truncated and the normalisation factor is adjusted appropriately so that ∑j wj = 1.
13.5 CUSUM
The CUSUM graph is designed to assess whether the mean of a process yt changes over time; see Brown, Durbin and Evans (1975) and Harvey (1990, pp. 153--5). In STAMP the CUSUMs are defined with respect to the standardised series

$$z_t = \frac{y_t - \bar y}{\hat\sigma_y}, \qquad \mathrm{CUSUM}(t) = \sum_{j=1}^{t} z_j,$$

where $\bar y = \frac{1}{T}\sum_{j=1}^{T} y_j$ and $\hat\sigma_y = \left\{\frac{1}{T}\sum_{j=1}^{T}(y_j - \bar y)^2\right\}^{1/2}$. The motivation for the CUSUM is that if the mean changes through the standardised series, the series will have runs of positive or negative observations. Thus the CUSUM will drift up (or down), before going back to zero at CUSUM(T). A formal check on the CUSUM is based on assessing whether it crosses two `boundary' lines, given by
$$\pm\left(a\sqrt{T} + 2a\,\frac{t}{\sqrt{T}}\right), \qquad a = 0.850.$$

These lines are based on a significance level of 10%.
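A Python sketch of the CUSUM and its boundary lines follows; the value a = 0.850 for the 10% level is an assumption, following Brown, Durbin and Evans.

```python
import numpy as np

def cusum(y, a=0.850):
    """CUSUM of the standardised series with straight-line boundaries
    +/- a*(sqrt(T) + 2*t/sqrt(T))."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    z = (y - y.mean()) / y.std()
    cs = np.cumsum(z)
    t = np.arange(1, T + 1)
    upper = a * (np.sqrt(T) + 2.0 * t / np.sqrt(T))
    return cs, upper, -upper
```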
13.6 CUSUM of squares
A similar device to the CUSUM is available for detecting a change in variance. Attention is now focused on the squares of the data, so we work with

$$\mathrm{CUSUMSQ}(t) = \frac{\sum_{j=1}^{t} z_j^2}{\sum_{j=1}^{T} z_j^2}, \qquad t = 1, \dots, T.$$

As t increases, the CUSUMSQ increases. If the model is homoskedastic, the increase should be linear when plotted against t/T. Departures from the 45° line in the CUSUMSQ graph suggest a breakdown of this assumption.
13.7 Histogram
Let zt denote the standardised observations (yt-y)/σ̂ y, for t=1,...,T, where y and σ̂ y are defined in §13.5. The range of zt is divided into N intervals of length h with h defined in §13.8. Then the proportion of zt in each interval constitutes the histogram.
13.8 Density estimation
The density can be estimated as a smoothed function of the histogram using a normal or Gaussian kernel. This can then be summed (`integrated') to obtain the estimated cumulative distribution function (CDF).
Denote the actual density of Z at z by fz(z). A non-parametric estimate of the density is obtained from the sample by

$$\hat f_z(z) = \frac{1}{Th}\sum_{t=1}^{T} K\!\left(\frac{z - z_t}{h}\right),$$

where h is the window width or smoothing parameter, and K(·) is a kernel such that

$$K(x) \ge 0, \qquad \int_{-\infty}^{\infty} K(x)\,dx = 1.$$

STAMP sets $h = 1.06\,\hat\sigma_z/T^{0.2}$ ($= 1.06/T^{0.2}$, as σ̂z = 1) as a default and uses the standard normal density for K(·):

$$K(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\tfrac{1}{2}x^2\right).$$
We thank Professor Bernard Silverman for permission to use his algorithm for calculating f̂z(z). A general treatment of density function estimation can be found in Silverman (1986).
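A minimal Python version of this estimator follows (a direct O(T²) evaluation, not Silverman's algorithm as used in STAMP):

```python
import numpy as np

def kde(z, grid, h=None):
    """Gaussian-kernel density estimate at the points in `grid`.
    Default bandwidth h = 1.06*sigma/T^0.2."""
    z = np.asarray(z, dtype=float)
    grid = np.asarray(grid, dtype=float)
    T = len(z)
    if h is None:
        h = 1.06 * z.std() / T**0.2
    u = (grid[:, None] - z[None, :]) / h          # (z - z_t)/h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)  # standard normal kernel
    return K.sum(axis=1) / (T * h)
```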
13.9 Distribution function
The estimated CDF of Z can be derived from f̂z(z); this is shown with a standard normal CDF for comparison. When a single graph is selected, a non-parametric density estimation menu allows the choices determining h to be set; the menu does not appear when multiple graphs are shown on one screen.
13.10 Write report
When the `Write report' option is active in the Describe dialog, some descriptive statistics will be sent to the Results window. Which are selected to be reported will depend on the other options which are activated in the Describe dialog. This can be broken down into two types of output: measures of serial dependence and normality.
13.10.1 Serial correlation
The correlogram values rτ, for τ = 1, ..., p, are defined in §13.2. Their standard errors, under the null hypothesis of white noise, are equal to $1/\sqrt{T}$. The Box--Ljung portmanteau statistic is given by

$$Q(p) = T(T+2)\sum_{\tau=1}^{p} \frac{r_\tau^2}{T - \tau};$$

see Ljung and Box (1978). This is tested against a χq² distribution. When no model is fitted, the degrees of freedom q is set equal to p. However, when the Q-statistic is applied to the residuals of an estimated structural time series model, q is set equal to p+1 minus the number of parameters. Chapter 15 gives the details of how the Q-statistic is used in the context of fitting structural time series models. The Box--Ljung Q-statistic is a modification of the earlier Box and Pierce (1970) statistic, corrected to have better size behaviour in moderate samples. It is reported for a variety of values of p, along with its Prob. value computed using a χq² distribution.
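A Python sketch of the statistic, given precomputed autocorrelations:

```python
import numpy as np

def box_ljung(r, T, n_par=0):
    """Box--Ljung Q(p) = T(T+2) * sum_{tau=1..p} r_tau^2/(T-tau).
    Degrees of freedom: p when no model is fitted, else p+1-n_par."""
    r = np.asarray(r, dtype=float)
    p = len(r)
    tau = np.arange(1, p + 1)
    Q = T * (T + 2) * np.sum(r**2 / (T - tau))
    dof = p if n_par == 0 else p + 1 - n_par
    return Q, dof
```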
13.10.2 Normality test
Let μ and σ² denote the mean and variance of y, and write μi = E[(y − μ)^i], so that σ² = μ2. The skewness and kurtosis are defined as

$$\sqrt{\beta_1} = \frac{\mu_3}{\mu_2^{3/2}}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2}.$$
Sample counterparts are defined by

$$\sqrt{b_1} = \frac{m_3}{m_2^{3/2}}, \qquad b_2 = \frac{m_4}{m_2^2}, \qquad m_i = \frac{1}{T}\sum_{t=1}^{T}(y_t - \bar y)^i.$$
A normal variate will have √β1 = 0 and β2 = 3. STAMP reports

$$s = \frac{T\left(\sqrt{b_1}\right)^2}{6}, \qquad k = \frac{T\left(b_2 - 3\right)^2}{24},$$
which are both tested against a χ1² distribution. The Bowman--Shenton test for normality is given by

$$N_{BS} = s + k = \frac{T\left(\sqrt{b_1}\right)^2}{6} + \frac{T\left(b_2 - 3\right)^2}{24}, \qquad (eq:13.23)$$
which is tested against a χ2² distribution. As Bowman and Shenton (1975) note, these tests are unsuitable unless used in very large samples. The statistics √b1 and b2 are not independently distributed in small samples, and the sample kurtosis especially approaches normality very slowly. Another normality test reported by STAMP is described in Doornik and Hansen (1994). It derives from Shenton and Bowman (1977), who give b2 (conditional on b2 > 1 + b1) a gamma distribution, and D'Agostino (1970), who approximates the distribution of √b1 by the Johnson Su system. Let s* and k* denote the transformed skewness and kurtosis, respectively, where the transformations create statistics which are much closer to standard normal. The alternative normality test is

$$N_{DH} = s^{*2} + k^{*2}, \qquad (eq:13.24)$$
which is also tested against a χ2² distribution. Table 13.1 compares (eq:13.24) with its asymptotic form (eq:13.23). It gives the rejection frequencies under the null of normality, using χ2² critical values. The experiments are based on 10,000 replications and common random numbers.
Table:13.1 Empirical size of normality tests
| nominal probabilities of NDH | nominal probabilities of NBS
T | 20% | 10% | 5% | 1% | 20% | 10% | 5% | 1%
50 | 0.1734 | 0.0869 | 0.0450 | 0.0113 | 0.0939 | 0.0547 | 0.0346 | 0.0175 |
100 | 0.1771 | 0.0922 | 0.0484 | 0.0111 | 0.1258 | 0.0637 | 0.0391 | 0.0183 |
150 | 0.1845 | 0.0937 | 0.0495 | 0.0131 | 0.1456 | 0.0703 | 0.0449 | 0.0188 |
250 | 0.1889 | 0.0948 | 0.0498 | 0.0133 | 0.1583 | 0.0788 | 0.0460 | 0.0180 |
STAMP reports the following statistics under this option:
mean | ȳ | maximum |
standard deviation | σ̂y = (m2)½ | skewness χ1² | s
skewness | √b1 | kurtosis χ1² | k
excess kurtosis | b2 − 3 | normality χ2² | NBS
minimum | | normality χ2² | NDH
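The moment-based statistics are easily computed; the sketch below covers √b1, b2 and the asymptotic tests (the transformed Doornik--Hansen statistics s* and k* are omitted, as their transformations are not reproduced here):

```python
import numpy as np

def normality_stats(y):
    """Sample skewness/kurtosis and the chi-squared statistics s, k
    and the Bowman--Shenton test N_BS."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()
    m2, m3, m4 = (np.mean(d**i) for i in (2, 3, 4))
    sqrt_b1 = m3 / m2**1.5
    b2 = m4 / m2**2
    s = T * sqrt_b1**2 / 6.0          # skewness test, chi^2(1)
    k = T * (b2 - 3.0)**2 / 24.0      # kurtosis test, chi^2(1)
    return sqrt_b1, b2, s, k, s + k   # last entry is N_BS, chi^2(2)
```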
Chapter 14 Statistical Treatment of Models
To understand fully the output of STAMP it is useful to understand the way in which the models are handled statistically. The algorithms used to carry out the computations are based on the state space form. The interested reader may find explanations of the underlying ideas and proofs of the algorithms in Anderson and Moore (1979), Harvey (1989, Chs. 3 and 4), de Jong (1991) and Koopman (1993). This chapter simply sets out the algorithms and gives the exact technical details of implementation. We first formally define the models which were used in Chapters ??--??.
14.1 Model definitions
14.1.1 Univariate time series models
A univariate model may be written as

$$y_t = \mu_t + \gamma_t + \psi_t + \nu_t + \varepsilon_t, \qquad \varepsilon_t \sim NID(0, \sigma_\varepsilon^2), \qquad t = 1, \dots, T, \qquad (eq:14.1)$$

where μt is the trend, γt is the seasonal, ψt is the cycle, νt is a first-order autoregressive component and εt is the irregular. The model can be extended with two similar cycle components; see below.
The stochastic trend component is specified as

$$\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t, \qquad \eta_t \sim NID(0, \sigma_\eta^2),$$
$$\beta_t = \beta_{t-1} + \zeta_t, \qquad \zeta_t \sim NID(0, \sigma_\zeta^2),$$

where βt is the slope or gradient of the trend μt. The irregular εt, the level disturbance ηt and the slope disturbance ζt are mutually uncorrelated. The slope component βt can be excluded from the trend specification when appropriate. Some special trend specifications are listed in Table 14.1.
Table:14.1 Some special level and trend specifications
Level | σε | ση | |
constant term | * | 0 | |
local level (LL) | * | * | |
random walk (RW) | 0 | * | |
Trend | σε | ση | σζ |
deterministic | * | 0 | 0 |
LL with fixed slope | * | * | 0 |
RW with fixed drift | 0 | * | 0 |
local linear (LLT) | * | * | * |
smooth trend | * | 0 | * |
second differencing | 0 | 0 | * |
Hodrick-Prescott | * | 0 | .025 σε |
The seasonal component may have the dummy variable form or the trigonometric form. The number of seasons in a period (e.g. a year) is given by the integer s; the number of seasonal frequencies is [s/2], where [s/2] = s/2 when s is even and [s/2] = (s−1)/2 when s is odd. The seasonal dummy form is given by

$$\gamma_t = -\gamma_{t-1} - \dots - \gamma_{t-s+1} + \omega_t, \qquad \omega_t \sim NID(0, \sigma_\omega^2).$$
The trigonometric seasonal form is

$$\gamma_t = \sum_{j=1}^{[s/2]}\gamma_{j,t},$$

where each γj,t is generated by

$$\begin{bmatrix}\gamma_{j,t}\\ \gamma_{j,t}^{*}\end{bmatrix} = \begin{bmatrix}\cos\lambda_j & \sin\lambda_j\\ -\sin\lambda_j & \cos\lambda_j\end{bmatrix}\begin{bmatrix}\gamma_{j,t-1}\\ \gamma_{j,t-1}^{*}\end{bmatrix} + \begin{bmatrix}\omega_{j,t}\\ \omega_{j,t}^{*}\end{bmatrix}, \qquad j = 1, \dots, [s/2],$$

where λj = 2πj/s is the frequency, in radians, and the seasonal disturbances ωj,t and ωj,t* are two mutually uncorrelated NID disturbances with zero mean and common variance σω². For s even, the component at j = s/2 collapses to

$$\gamma_{j,t} = \gamma_{j,t-1}\cos\lambda_j + \omega_{j,t}.$$
The statistical specification of a cycle, ψt, is given by

$$\begin{bmatrix}\psi_t\\ \psi_t^{*}\end{bmatrix} = \rho_\psi\begin{bmatrix}\cos\lambda_c & \sin\lambda_c\\ -\sin\lambda_c & \cos\lambda_c\end{bmatrix}\begin{bmatrix}\psi_{t-1}\\ \psi_{t-1}^{*}\end{bmatrix} + \begin{bmatrix}\kappa_t\\ \kappa_t^{*}\end{bmatrix}, \qquad t = 1, \dots, T,$$

where ρψ, in the range 0 < ρψ ≤ 1, is a damping factor; λc is the frequency, in radians, in the range 0 ≤ λc ≤ π; and κt and κt* are two mutually uncorrelated NID disturbances with zero mean and common variance σκ². Note that the period of the cycle is equal to 2π/λc. Up to two additional cycles of this form may be incorporated in the model.
A first-order autoregressive, AR(1), process is given by
νt=ρν νt-1+ξt, ξt~NID(0,σξ 2), |
with ρν in the range 0<ρν <1. The AR(1) component is actually a limiting case of the stochastic cycle when λc is equal to 0 or π, though it is specified separately in STAMP, partly to avoid confounding and partly because it is not a limiting case in multivariate models.
Finally, the disturbances driving each of the components in the model are mutually uncorrelated.
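To make the component definitions concrete, the following Python sketch simulates a series from (eq:14.1) with a local linear trend, dummy seasonal and a single cycle. The parameter values are illustrative only.

```python
import numpy as np

def simulate_sts(T, s=4, sig_e=1.0, sig_eta=0.1, sig_zeta=0.01,
                 sig_omega=0.05, sig_kappa=0.1, rho=0.9,
                 lam_c=np.pi / 8, rng=None):
    """Simulate y_t = mu_t + gamma_t + psi_t + eps_t with a local linear
    trend, dummy seasonal and stochastic cycle."""
    rng = rng or np.random.default_rng(0)
    mu, beta = 0.0, 0.0
    gamma = np.zeros(s - 1)           # (gamma_t, ..., gamma_{t-s+2})
    psi = psis = 0.0
    c, sn = np.cos(lam_c), np.sin(lam_c)
    y = np.empty(T)
    for t in range(T):
        y[t] = mu + gamma[0] + psi + sig_e * rng.standard_normal()
        # trend: mu_{t+1} = mu_t + beta_t + eta, beta_{t+1} = beta_t + zeta
        mu, beta = (mu + beta + sig_eta * rng.standard_normal(),
                    beta + sig_zeta * rng.standard_normal())
        # dummy seasonal: gamma_{t+1} = -(sum of last s-1 seasonals) + omega
        g_new = -gamma.sum() + sig_omega * rng.standard_normal()
        gamma = np.concatenate(([g_new], gamma[:-1]))
        # cycle with damping rho and frequency lam_c
        psi, psis = (rho * (c * psi + sn * psis)
                     + sig_kappa * rng.standard_normal(),
                     rho * (-sn * psi + c * psis)
                     + sig_kappa * rng.standard_normal())
    return y
```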
14.1.2 Explanatory variables and interventions
A single equation model may include exogenous explanatory variables, lagged values of the dependent variable and intervention variables, as well as unobserved components such as trend, seasonal and cycle. Thus (eq:14.1) can be extended as
$$y_t = \mu_t + \gamma_t + \psi_t + \nu_t + \sum_{\tau=1}^{r}\varphi_\tau\, y_{t-\tau} + \sum_{i}\sum_{\tau=0}^{s}\delta_{i\tau}\, x_{i,t-\tau} + \sum_{j}\lambda_j\, w_{jt} + \varepsilon_t, \qquad t = 1, \dots, T, \qquad (eq:14.2)$$

where xit is an exogenous variable, wjt is an intervention (dummy) variable, and φτ, δiτ and λj are unknown parameters.
14.1.3 Multivariate time series models
Multivariate models have a similar form to univariate models, except that yt is now an N×1 vector of observations, which depends on unobserved components which are also vectors. Thus, in the special case of a multivariate local level model,

$$y_t = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim NID(0, \Sigma_\varepsilon),$$
$$\mu_t = \mu_{t-1} + \eta_t, \qquad \eta_t \sim NID(0, \Sigma_\eta), \qquad (eq:14.3)$$

where Σε and Ση are both N×N variance matrices, and ηt and εt are mutually uncorrelated in all time periods. The other disturbances in the general model (eq:14.1) similarly become vectors which have N×N variance matrices. Models of this kind are called seemingly unrelated time series equations (SUTSE).
In a homogeneous model the variance matrices are proportional. In the case of a multivariate local level model this implies Σ η =qΣ ε , where q is a non-negative scalar.
The other components are incorporated in a multivariate model in a similar manner, by letting the components and disturbances become N×1 vectors and the variances become N×N variance matrices. In the case of the cycle, the parameters ρψ and λc are the same for all series. As regards the disturbances,

$$\mathrm{Var}\begin{bmatrix}\kappa_t\\ \kappa_t^{*}\end{bmatrix} = I_2\otimes\Sigma_\kappa.$$

The specification of the cycle model can also be written as

$$\begin{bmatrix}\psi_t\\ \psi_t^{*}\end{bmatrix} = \left(\rho_\psi\begin{bmatrix}\cos\lambda_c & \sin\lambda_c\\ -\sin\lambda_c & \cos\lambda_c\end{bmatrix}\otimes I_N\right)\begin{bmatrix}\psi_{t-1}\\ \psi_{t-1}^{*}\end{bmatrix} + \begin{bmatrix}\kappa_t\\ \kappa_t^{*}\end{bmatrix},$$

where ψt and ψt* are N×1 vectors.
A stationary first-order vector autoregressive process may be included in a multivariate model as an alternative to, or even as well as, a cycle. Thus

$$\nu_t = \Phi\nu_{t-1} + \xi_t, \qquad \xi_t \sim NID(0, \Sigma_\xi),$$

where Φ is an N×N matrix of coefficients. The condition for νt to be stationary is that the roots of the matrix polynomial I − ΦL should all lie outside the unit circle.
14.1.4 Common factors
In a common factor model, some or all of the components are driven by disturbance vectors with less than N elements. In terms of a SUTSE model, the presence of common factors means that the variance matrices of the relevant disturbances are less than full rank.
Common factors may appear in all components, including the irregular. Common trends arise through common levels, common slopes or a combination of the two.
14.1.4.1 Common levels
Consider the local level model, (eq:14.3), but suppose that the rank of Ση is K < N. The model then contains K common levels or common trends and may be written as

$$y_t = \Theta\mu_t^{\dagger} + \mu_\theta + \varepsilon_t, \qquad \mu_t^{\dagger} = \mu_{t-1}^{\dagger} + \eta_t^{\dagger}, \qquad \eta_t^{\dagger} \sim NID(0, D_\eta), \qquad (eq:14.7)$$

where ηt† is a K×1 vector, Θ is an N×K matrix of standardised factor loadings, Dη is a diagonal matrix and μθ is an N×1 vector in which the first K elements are zeros and the last N−K elements are contained in a vector μ̄. The standardised factor loading matrix contains ones in the `diagonal' positions, i.e. θii = 1, i = 1, ..., K, while θij = 0 for j > i.

The model may be recast in the original SUTSE form (eq:14.3) by writing μt = Θμt† + μθ and noting that Ση = ΘDηΘ' is a singular matrix of rank K. When there are no common trends, so N = K, the factor loading matrix is the Cholesky decomposition of Ση. As regards μ̄, partition Θ = (Θ1', Θ2')', where Θ1 consists of the first K rows, and partition μt similarly. Then

$$\bar\mu = \mu_{2t} - \Theta_2\Theta_1^{-1}\mu_{1t}. \qquad (eq:14.8)$$

The vector μ̄ is estimated from the above equation using the estimated states at t = T. The computations are carried out by back-substitution, since the loading matrix is lower triangular.
14.1.4.2 Smooth trends with common slopes
Common slopes may be formulated along similar lines. We first consider the case where the variance matrix of the slope disturbances is of rank Kβ, but the variance matrix of the levels is null, so that the estimated trends are relatively smooth. The model is therefore

$$y_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \beta_{t-1}, \qquad \beta_t = \beta_{t-1} + \zeta_t, \qquad \zeta_t \sim NID(0, \Sigma_\zeta),$$

but, following (eq:14.7) above, it may be re-formulated as

$$y_t = \Theta\mu_t^{\dagger} + \mu_{\theta t} + \varepsilon_t, \qquad \mu_t^{\dagger} = \mu_{t-1}^{\dagger} + \beta_{t-1}^{\dagger}, \qquad \beta_t^{\dagger} = \beta_{t-1}^{\dagger} + \zeta_t^{\dagger}, \qquad \zeta_t^{\dagger} \sim NID(0, D_\zeta),$$

where Σζ = ΘDζΘ', and the first Kβ elements in μθt are zeros while the remainder are contained in a vector μ̄ + β̄t. The (N−Kβ)×1 vectors μ̄ and β̄ may both be calculated by expressions like (eq:14.8).
14.1.4.3 Common trends: level and slopes
A general multivariate local linear trend model, in which the level variance matrix is of rank Kμ while the slope variance matrix is of rank Kβ, may be written as

$$y_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \beta_{t-1} + \Theta_\mu\eta_t^{\dagger}, \qquad \beta_t = \Theta_\beta\beta_t^{\dagger} + \beta_\theta, \qquad \beta_t^{\dagger} = \beta_{t-1}^{\dagger} + \zeta_t^{\dagger},$$

with ηt† ~ NID(0, Dη) and ζt† ~ NID(0, Dζ), where the N×Kβ matrix Θβ is such that Σζ = ΘβDζΘβ', and βθ = (0', β̄')' with β̄ a vector of length N−Kβ.

If Kβ = 1, setting Θβ equal to a vector of ones and letting β̄ = 0 would imply that all series had the same underlying growth rate (when modelling in logs). This might be plausible even if there are no common levels. The implication is that the trends in the forecast function remain parallel; in other words, the long-run growth paths are the same. However, unless there are similar restrictions on the levels, the growth paths within the sample will not necessarily have kept together.
The following interpretation can be made if Kβ ≤ Kμ and the common stochastic slopes only affect the observations via the common stochastic trends:

$$y_t = \Theta_\mu\mu_t^{\dagger} + \mu_{\theta t} + \varepsilon_t, \qquad \mu_t^{\dagger} = \mu_{t-1}^{\dagger} + \Theta_{1\mu}^{-1}\Theta_{\mu\beta}\,\beta_{t-1}^{\dagger} + \eta_t^{\dagger}, \qquad \beta_t^{\dagger} = \beta_{t-1}^{\dagger} + \zeta_t^{\dagger},$$

where Θ1μ is the first Kμ rows of Θμ, Θμβ is the first Kμ rows of Θβ, and μθt = (0', tβ̄1', μ̄' + tβ̄2')', with β̄1 a vector of length Kμ−Kβ, β̄2 a vector of length N−Kμ and β̄ = (β̄1', β̄2')'. The fixed components satisfy, for any 1 ≤ t ≤ T,

$$\bar\mu = -\Theta_{2\mu}\Theta_{1\mu}^{-1}\mu_{1t} + \mu_{2t} - t\bar\beta_2 \qquad \text{and} \qquad \bar\beta = -\Theta_{2\beta}\Theta_{1\beta}^{-1}\beta_{1t} + \beta_{2t},$$

where Θ1β is the first Kβ rows of Θβ and Θ2β is the remaining rows.
At present STAMP does not offer the option of imposing the restrictions implied by the above formulation. However, if Kμ = Kβ = 1, the above restriction implies Θ2β = Θ2μ. More generally, for Kμ = Kβ we have

$$\mu_t^{\dagger} = \mu_{t-1}^{\dagger} + \Theta_{1\mu}^{-1}\Theta_{1\beta}\,\beta_{t-1}^{\dagger} + \eta_t^{\dagger}, \qquad \eta_t^{\dagger} \sim NID(0, D_\eta),$$

and setting Θ1β = Θ1μ implies the further restriction that the slopes in the common trends are mutually independent. Thus the common trends are fully independent in both level and slope disturbances. However, for this to be true for all rotations, the further condition Dζ = qζDη is needed. This means that Σζ = qζΣη, so the model is what we call trend homogeneous.

To summarise: trend homogeneity means that Kμ = Kβ, the slope and level disturbances enter through the same common trends, and, when there is more than one common trend, these trends are mutually independent. When the trend homogeneity constraint is applied, any restrictions made to Θ1μ are automatically carried over to Θ1β.
14.1.4.4 Common seasonals
Common factors in seasonality imply a reduction in the number of disturbances driving changes in the seasonal patterns. They do not imply any similarity in seasonal patterns unless the deterministic seasonal components outside the common seasonals are set to zero. Thus suppose, for simplicity, that the series contain only seasonals and irregular. Then

$$y_t = \Theta\gamma_t^{\dagger} + \gamma_{\theta t} + \varepsilon_t,$$

where the last N−K elements of γθt contain fixed seasonal effects.
14.1.4.5 Common cycles
A model with common trends and common cycles may be written as

$$y_t = \Theta_\mu\mu_t^{\dagger} + \mu_\theta + \Theta_\psi\psi_t^{\dagger} + \varepsilon_t,$$

where Θψ is N×Kψ. Since the expectation of a cycle is zero, it is not necessary to include a vector of constant terms corresponding to the vector μθ which is needed for common levels.

As with other components, the common cycle can be re-formulated as a SUTSE model in which the disturbance variance matrix is Σκ = ΘψDκΘψ'. This is only possible because ρψ and λc are the same in all the common cycles.
14.1.5 Explanatory variables
Explanatory variables and interventions may be included in multivariate models. Thus (eq:14.2) generalises to

$$y_t = \mu_t + \gamma_t + \psi_t + \nu_t + \sum_{\tau=1}^{r}\Phi_\tau\, y_{t-\tau} + \sum_{\tau=0}^{s}\delta_\tau\, x_{t-\tau} + \Lambda w_t + \varepsilon_t, \qquad t = 1, \dots, T,$$

where xt is a K×1 vector of explanatory variables and wt is a K*×1 vector of interventions. Elements in the parameter matrices Φτ, δτ and Λ may be specified to be zero, thereby excluding certain variables from particular equations. In addition, the unobserved components may be subject to common factor restrictions, as described in the previous subsection.
14.2 State space form
The statistical treatment of structural time series models is based on the state space form (SSF). We use a particular SSF which is close to the `state space model' of de Jong (1991). It joins a measurement equation

$$y_t = Z_t\alpha_t + X_t b + G_t u_t, \qquad t = 1, \dots, T, \qquad (eq:14.9)$$

with a transition equation, which allows the states αt to evolve according to a first-order vector autoregressive process. We write this as

$$\alpha_{t+1} = T_t\alpha_t + W_t b + H_t u_t. \qquad (eq:14.10)$$

The transition equation is initialised at the first time point by setting

$$\alpha_1 = W_0 b + H_0 u_0. \qquad (eq:14.11)$$

The model is completed by making assumptions about the error process ut and the vector of regressors b. We assume they are independently and normally distributed with

$$u_t \sim NID(0, \sigma^2 I), \qquad b = c + B\delta, \qquad \delta \sim N(0, \sigma^2\Lambda). \qquad (eq:14.12)$$
This formulation of the SSF is slightly different from that exploited in books such as Harvey (1989) and West and Harrison (1989). It is used to produce rather more elegant treatments of diffuse initial conditions and fixed effects; see de Jong (1991).
The state system matrices Zt and Tt are fixed matrices which contain known values. The regression system matrices Xt and Wt are always known. The error system matrices Gt and Ht are also sparse, but most of their non-zero values are unknown and are regarded as hyperparameters. The disturbance ut is transformed into noise for the SSF by the error system matrices Gt and Ht. All system matrices can be regarded as selection-type matrices.
At first sight, the role for b in the SSF looks quite complicated. However, the SSF is written in this way to allow a unified treatment of a variety of features for time series models. There are three particularly important ones on which to focus. First, the measurement equation allows b to influence the observations directly through the regressors Xt. In general Xt, like Gt and Ht, may be a sparse selection type matrix. Second, the states are directly affected by b through Wt. This feature is particularly useful for handling interventions like structural changes (changes in trend) and growth changes (changes in slope). Third, the prior distribution of the initial state vector is (partly) defined via b.
The random vector b allows the analysis of models where diffuse distributions are placed on the regressors and on the non-stationary components of the initial state. The vector b is a linear combination of the known constant vector c and the random vector δ. The distribution of δ is regarded as diffuse, i.e. the variance matrix Λ converges to infinity. Basically, this means that a proper prior distribution for the state cannot be given when components are non-stationary. For a technical discussion on these issues, we refer to de Jong and Chu-Chun-Lin (1994). The connection between diffuse initial conditions and marginal or restricted likelihoods is discussed in Tunnicliffe-Wilson (1989) and Shephard (1993b). The unconditional distributions for stationary components, such as cycles, are handled via the second part of (eq:14.11); that is, H0u0.
The general SSF can be simplified for structural time series models in three different ways.
- The constant vector c of the specification for b can be set to a zero vector, such that b = Bδ. The matrix B is a square selection matrix of zeros and ones. The vector δ can be partitioned so that the regression effects and the initial effects are separated and δ enters the equations of the SSF directly. This is achieved by δ = (δx', δi')' and B = (Bx, Bi), where the k×1 vector δx contains the regression effects, the d×1 vector δi contains the initial effects, and the dimensions of the matrices B, Bx and Bi are (k+d)×(k+d), (k+d)×k and (k+d)×d, respectively. This brings the specification for b to b = Bxδx + Biδi, and
$$X_t b = X_t^{*}\delta_x, \qquad X_t^{*} = X_tB_x, \qquad X_tB_i = 0,$$
$$W_t b = W_t^{*}\delta_x, \qquad W_t^{*} = W_tB_x, \qquad W_tB_i = 0,$$
$$W_0 b = W_0^{*}\delta_i, \qquad W_0^{*} = W_0B_i, \qquad W_0B_x = 0.$$
- For all models in STAMP, Gt and Ht are orthogonal; that is, Gt'Ht = 0, so the noise terms in the two equations, Gtut and Htut, are independently distributed. This feature of a model simplifies the statistical analysis of the SSF.
- All models in STAMP can be handled by a time-invariant SSF. This implies that the t subscripts of the state and error system matrices can be dropped, i.e. Zt = Z, Tt = T, Gt = G and Ht = H. A time-invariant SSF also simplifies the statistical analysis of the SSF.
14.2.1 Structural time series models in SSF
Structural time series models fit nicely in the state space form. However, some care should be taken with the initial state vector specification.
14.2.1.1 Univariate models in SSF
Some examples are given to give a flavour of how structural time series models are put in the SSF.
- Seasonal model. The first example of a structural time series model in SSF is a model with smooth trend and dummy stochastic seasonals with s = 4 (quarterly observations). The SSF representation is
$$y_t = (1\;\;0\;\;1\;\;0\;\;0)\,\alpha_t + (\sigma_\varepsilon\;\;0\;\;0)\,u_t,$$
$$\alpha_t = \begin{bmatrix}\mu_t\\ \beta_t\\ \gamma_{1,t}\\ \gamma_{2,t}\\ \gamma_{3,t}\end{bmatrix} = \begin{bmatrix}1&1&0&0&0\\ 0&1&0&0&0\\ 0&0&-1&-1&-1\\ 0&0&1&0&0\\ 0&0&0&1&0\end{bmatrix}\alpha_{t-1} + \begin{bmatrix}0&0&0\\ 0&\sigma_\zeta&0\\ 0&0&\sigma_\omega\\ 0&0&0\\ 0&0&0\end{bmatrix}u_t,$$
with vector ut = (εt, ζt, ωt)' and Bx = 0. This illustrates the fact that the matrices G and H play the role of selectors and scaling factors for the model, often being very sparse. The initial state vector is given by α1 = δi, such that B = Bi = I and H0 = 0.
- Cycle model. Another example is a univariate model which consists of a level and a cycle component. The SSF representation of this model is
$$y_t = (1\;\;1\;\;0)\,\alpha_t + (\sigma_\varepsilon\;\;0\;\;0\;\;0)\,u_t, \qquad \alpha_t = (\mu_t, \psi_t, \psi_t^{*})',$$
$$\alpha_t = \begin{bmatrix}1&0&0\\ 0&\rho\cos\lambda_c&\rho\sin\lambda_c\\ 0&-\rho\sin\lambda_c&\rho\cos\lambda_c\end{bmatrix}\alpha_{t-1} + \begin{bmatrix}0&\sigma_\eta&0&0\\ 0&0&\sigma_\kappa&0\\ 0&0&0&\sigma_\kappa\end{bmatrix}u_t,$$
with vector ut = (εt, ηt, κt, κt*)'. The unknown values in the transition matrix T are the damping factor ρ and the frequency λc. The initial state is given by
$$\alpha_1 = \begin{bmatrix}1\\ 0\\ 0\end{bmatrix}\delta_i + \begin{bmatrix}0&0\\ \sigma_\kappa(1-\rho^2)^{-1/2}&0\\ 0&\sigma_\kappa(1-\rho^2)^{-1/2}\end{bmatrix}u_0,$$
with vector u0 = (κ0, κ0*)'. Notice that δi is a scalar here.
- Explanatory variables. A univariate structural time series model may include explanatory variables. Consider a local linear trend model with two explanatory variables and a level intervention at some time point t = j. The SSF representation of this model is given by
$$y_t = (1\;\;0)\,\alpha_t + (x_{1,t}\;\;x_{2,t}\;\;0)\,\delta_x + (\sigma_\varepsilon\;\;0\;\;0)\,u_t,$$
$$\alpha_t = \begin{bmatrix}\mu_t\\ \beta_t\end{bmatrix} = \begin{bmatrix}1&1\\ 0&1\end{bmatrix}\alpha_{t-1} + \begin{bmatrix}0&0&w_t\\ 0&0&0\end{bmatrix}\delta_x + \begin{bmatrix}0&\sigma_\eta&0\\ 0&0&\sigma_\zeta\end{bmatrix}u_t,$$
with vector ut = (εt, ηt, ζt)'. The initial state vector is given by α1 = δi, such that B = I and H0 = 0. The series x1,t and x2,t are explanatory variables, and wt is a series of zeros except at time point t = j, where its value is unity.
14.2.1.2 Multivariate model in SSF
Any multivariate model in STAMP with common factors can be written in the SUTSE form with lower-rank variance matrices; see §14.1.4. The SUTSE form retains the sparse structure of the system matrices Z and T, which is important for computational issues related to the estimation of these models. Therefore, this general formulation is adopted to specify a multivariate model in the SSF. Since the SUTSE form is a minor generalisation of the univariate model, only one example is given. The multivariate model is a smooth trend plus cycle model, which can be represented in the SSF as

$$y_t = \begin{bmatrix}I_N&0&I_N&0\end{bmatrix}\alpha_t + \begin{bmatrix}\Gamma_\varepsilon&0&0&0\end{bmatrix}u_t,$$
$$\alpha_t = \begin{bmatrix}I_N&I_N&0&0\\ 0&I_N&0&0\\ 0&0&\rho\cos\lambda_c\,I_N&\rho\sin\lambda_c\,I_N\\ 0&0&-\rho\sin\lambda_c\,I_N&\rho\cos\lambda_c\,I_N\end{bmatrix}\alpha_{t-1} + \begin{bmatrix}0&0&0&0\\ 0&\Gamma_\zeta&0&0\\ 0&0&\Gamma_\kappa&0\\ 0&0&0&\Gamma_\kappa\end{bmatrix}u_t,$$

where the state vector αt = (μt', βt', ψt', ψt*')', the disturbance vector ut = (εt', ζt', κt', κt*')', and each variance matrix is decomposed as Σ = ΓΓ' with the appropriate subscript. The unknown values in the transition matrix T rely on the damping factor ρ and the frequency λc. The initial state is given by

$$\alpha_1 = \begin{bmatrix}I_{2N}\\ 0\\ 0\end{bmatrix}\delta_i + \begin{bmatrix}0&0\\ 0&0\\ (1-\rho^2)^{-1/2}\,\Gamma_\kappa&0\\ 0&(1-\rho^2)^{-1/2}\,\Gamma_\kappa\end{bmatrix}u_0,$$

with vector u0 = (κ0', κ0*')' and δi containing the initial effects for μt and βt.
14.2.1.3 Deterministic trend and seasonal components
The deterministic seasonal component is the special case of the stochastic seasonal model with σω =0. Alternatively, the fixed seasonal can be incorporated within Xt as it is usually done in regression models. The latter one is adopted in STAMP. In a similar way (partly) deterministic trends can be incorporated in Xt or Wt. However, for practical reasons such as the implementation of trend interventions within Wt, the trend components μt and βt are always part of the state vector. The inclusion of a fixed trend in the state vector has some consequences for calculation and interpretation of residuals and the prediction error variance.
14.3 Kalman filter
The Kalman filter (KF) plays the same role for time series models in SSF as least squares computations for a regression model. The KF is primarily a set of vector and matrix recursions. The importance of the KF is based on
- computation of one-step ahead predictions of observation and state
vectors, and the corresponding mean square errors;
- diagnostic checking by means of one-step
ahead prediction errors;
- computation of the likelihood function via the one-step ahead
prediction error decomposition;
- smoothing which uses the output of the KF.
The KF has a variety of forms, but the one exploited in STAMP computes

$$a_{t+1|t} = E(\alpha_{t+1}\,|\,Y_t, \delta = 0), \qquad \sigma^2 P_{t+1|t} = MSE(a_{t+1|t}),$$

which are referred to as the mean and mean square error (MSE) of the state, respectively, given the past information and setting δ to zero. Generally, we need to perform estimation with δ not zero, so these terms will be corrected by the use of an augmented Kalman filter. This is discussed in the next subsection.
The recursive equations of the Kalman filter are given by

$$v_t = y_t - Za_{t|t-1}, \qquad F_t = ZP_{t|t-1}Z' + GG', \qquad K_t = \left(TP_{t|t-1}Z' + HG'\right)F_t^{-1},$$
$$a_{t+1|t} = Ta_{t|t-1} + K_tv_t, \qquad P_{t+1|t} = TP_{t|t-1}(T - K_tZ)' + H(H - K_tG)', \qquad q_t = q_{t-1} + v_t'F_t^{-1}v_t,$$

for t = 1, ..., T, starting with a1|0 = W0b, P1|0 = H0H0' and q0 = 0. Here Kt is called the Kalman gain, while vt and σ²Ft are the one-step ahead prediction error (or innovation) and its mean square error, respectively. The scaled innovations $F_t^{-1/2}v_t$ (or generalised least squares residuals) are approximately NID with zero mean and identity variance matrix in a correctly specified model. Note that the estimate of σ² is given by σ̂² = qT/(NT). The proof of the KF is simple and relies on standard results of linear estimation. Although the KF is presented in terms of time-invariant system and error matrices (Z, T, G and H), time indices can be added to these matrices without any problem.
In the special case of a univariate model, N=1, the innovation vector and the corresponding mean square error matrix reduce to the scalars vt and σ2ft, respectively. Also, the Kalman gain matrix reduces to the vector kt. This notation is only used for univariate models.
The KF requires a positive Ft (for univariate models) or a non-singular positive definite matrix Ft (for multivariate models), which may not occur at the start of the KF when, for example, the column rank of G1 is less than N and P1|0 = H0H0' = 0. An example is when no irregular is present in a structural time series model. In these circumstances, the KF is warmed up by a number of prediction updates; that is, aj+1|0 = Taj|0 and Pj+1|0 = TPj|0T' + HH'. The initialisation process continues until Fi = ZPi|0Z' + GG' is positive definite. The KF is then re-initialised with a1|0 = ai|0, P1|0 = Pi|0 and q0 = 0, where i is the number of required prediction updates.
When Ft becomes non-positive definite during the KF updating, it is generally assumed that a numerical problem has occurred. Available remedies like a square-root version of the KF are not considered. However, the implementation of the KF in STAMP ensures for each time-point t that the MSE matrices Pt+1|t and Ft are symmetric, which is crucial in avoiding numerical problems. Also, avoiding unnecessary computations, such as adding zeros and multiplying zeros or unity values, helps in this respect.
In the case of a time-invariant SSF, the KF may reach a steady-state solution at some time s; that is, when Ps+1|s = Ps|s-1, for p < s ≤ T, where p is the dimension of the state vector. This also implies that Fj and Kj are time-invariant for j = s, ..., T. The KF then reduces to the equations for at+1|t, vt and qt, which are computationally very cheap. Finally, the steady state can be interpreted as the solution of the Riccati equation; see Anderson and Moore (1979).
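The following Python sketch implements the KF recursions above for a time-invariant SSF with δ = 0; it is an illustration only (no augmentation, and no warm-up for a singular Ft), not the STAMP implementation.

```python
import numpy as np

def kalman_filter(y, Z, Tm, G, H, a1, P1):
    """Time-invariant Kalman filter with delta = 0: returns innovations
    v_t, their MSEs F_t, q_T and sum log|F_t| for the prediction error
    decomposition."""
    n = len(y)
    a, P = a1.copy(), P1.copy()
    GG, HH, HG = G @ G.T, H @ H.T, H @ G.T
    v_out, F_out, q, logdetF = [], [], 0.0, 0.0
    for t in range(n):
        v = y[t] - Z @ a                       # one-step prediction error
        F = Z @ P @ Z.T + GG                   # its MSE (times sigma^2)
        Fi = np.linalg.inv(F)
        K = (Tm @ P @ Z.T + HG) @ Fi           # Kalman gain
        L = Tm - K @ Z
        a = Tm @ a + K @ v                     # a_{t+1|t}
        P = Tm @ P @ L.T + HH - HG @ K.T       # P_{t+1|t}
        P = 0.5 * (P + P.T)                    # keep the MSE symmetric
        q += float(v @ Fi @ v)
        logdetF += np.linalg.slogdet(F)[1]
        v_out.append(v); F_out.append(F)
    return np.array(v_out), np.array(F_out), q, logdetF
```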
14.3.1 The augmented Kalman filter
The KF evaluates the one-step ahead prediction of the state vector conditional on δ = 0. When non-stationary components and/or fixed regression effects are present in the model, the unknown random vector δ is treated as diffuse; see §14.2. In these circumstances an augmented Kalman filter (AKF) is applied:

$$V_t = X_tB + ZA_{t|t-1}, \qquad A_{t+1|t} = TA_{t|t-1} + W_tB - K_tV_t,$$
$$s_t = s_{t-1} + V_t'F_t^{-1}v_t, \qquad S_t = S_{t-1} + V_t'F_t^{-1}V_t,$$

for t = 1, ..., T, with A1|0 = W0B, s0 = 0 and S0 = 0. The AKF must be regarded as a supplement to the KF equations; the expressions for vt, Ft, qt, Kt, at+1|t and Pt+1|t remain as they are. The number of columns of Vt and At+1|t is k+d, the same number of columns as in matrix B of the SSF. The one-step ahead prediction of the state vector and the corresponding MSE matrix are given by

$$\hat a_{t+1|t} = a_{t+1|t} + A_{t+1|t}S_t^{-1}s_t, \qquad \hat P_{t+1|t} = P_{t+1|t} + A_{t+1|t}S_t^{-1}A_{t+1|t}', \qquad (eq:14.15)$$

respectively. The one-step ahead prediction error and the corresponding MSE matrix (or variance matrix) are given by

$$\hat v_t = v_t - V_tS_{t-1}^{-1}s_{t-1}, \qquad \hat F_t = F_t + V_tS_{t-1}^{-1}V_t', \qquad (eq:14.16)$$

respectively. The matrix inversions for St, t = 1, ..., T, can be evaluated recursively, using methods similar to recursive regressions; see Appendix 14.7 of this chapter. The estimate of the scalar variance σ² is

$$\hat\sigma^2 = \frac{q_T - s_T'S_T^{-1}s_T}{NT - k - d}. \qquad (eq:14.17)$$

The full-sample generalised least squares estimate of δ and its MSE matrix are given by

$$\hat\delta = S_T^{-1}s_T, \qquad MSE(\hat\delta) = \hat\sigma^2 S_T^{-1}, \qquad (eq:14.18)$$

respectively. The MSE matrix is used to obtain standard errors and t-statistics for δ̂. In the same way as the vector δ, δ̂ is partitioned as

$$\hat\delta = (\hat\delta_x', \hat\delta_i')'.$$
For a proof and a full exposition of the augmented Kalman filter, see de Jong (1991). When the SSF does not contain values for Xt and Wt, the KF can evaluate ât+1|t from t = d+1 onwards. This implies that the AKF is only used for the first d updates. Then the AKF is collapsed to the KF, mainly using equation (eq:14.15) for t = d. On the other hand, when Xt and Wt are present in the SSF, a partial collapse with respect to δi is possible. The exact details of the full and the partial collapse are given in Appendix 14.7 of this chapter. Formal proofs on collapsing are given by de Jong and Chu-Chun-Lin (1994).
14.3.2 The likelihood function
The likelihood function can be obtained from the KF by using the innovations and their mean square errors. This is known as the prediction error decomposition. For example, the likelihood function conditional on δ = 0 is defined, apart from a constant, as

$$\log L(\delta = 0) = -\frac{1}{2}\left\{NT\log\sigma^2 + \sum_{t=1}^{T}\log|F_t| + \frac{q_T}{\sigma^2}\right\}.$$

To limit the number of logarithmic operations in computing the likelihood, it is important to check whether a steady state has been reached, such that log|Ft| is constant for the remaining t.
The unknown parameters in the system matrices of the SSF are stacked in vector θ. For a given vector θ=θ*, the KF calculates the likelihood and, therefore, it provides means to obtain maximum likelihood estimates of the parameters. These matters are discussed in §14.6.
14.3.2.1 Univariate models: the concentrated likelihood
The diffuse likelihood function (DL) is defined as the likelihood when δ is taken to be diffuse; see §14.2. For a given vector θ = θ*, the DL is given by

$$\log DL = -\frac{1}{2}\left\{T^{*}\log\sigma^2 + \sum_{t=1}^{T}\log f_t + \log|S_T| + \frac{q_T - s_T'S_T^{-1}s_T}{\sigma^2}\right\},$$
with T* = T − k − d. When a (partial) collapse of the AKF has taken place at some time t = m, the DL is calculated as

$$\log DL = -\frac{1}{2}\left\{T^{*}\log\sigma^2 + \sum_{t=1}^{T}\log f_t + \log|S^{*}| + \frac{q_T}{\sigma^2}\right\}.$$
Appendix 14.7 of this chapter defines S* and gives more details of the likelihood calculation when a full or a partial collapse has taken place. The scalar variance σ² can be concentrated out of the DL. The resulting concentrated diffuse likelihood function (CDL) is given by

$$\log CDL = -\frac{1}{2}\left\{T^{*}\left(\log\hat\sigma^2 + 1\right) + \sum_{t=1}^{T}\log f_t + \log|S^{*}|\right\}.$$

To calculate the CDL, the variance corresponding to one particular unobserved component is set to unity, so that this variance equals σ² of (eq:14.12). The maximum likelihood estimate of this `concentrated out' variance is σ̂². The other variances of the components in the model are then measured as ratios to the concentrated variance. For example, in a local level model where σε² is concentrated out, we have ση² = σ²qη. These variance ratios are referred to as q-ratios. Parameter estimation is carried out with respect to the q-ratios; see §14.6. The CDL is used for parameter estimation of univariate models in STAMP. For a full discussion of diffuse likelihoods, we refer to de Jong (1988).
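As an illustration of concentrating σ² out of the likelihood, the sketch below evaluates a concentrated log-likelihood for the local level model as a function of the q-ratio qη and maximises it. It uses a crude large-variance initialisation instead of the exact augmented treatment, so it only approximates the CDL above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ll_concentrated(q, y):
    """Approximate concentrated log-likelihood of the local level model
    with q = sig_eta^2/sig_eps^2 (sig_eps^2 concentrated out)."""
    Tn = len(y)
    a, p = y[0], 1.0e7                  # crude diffuse initialisation
    ssq, logf = 0.0, 0.0
    for t in range(1, Tn):
        v, f = y[t] - a, p + 1.0
        a += (p / f) * v
        p += -p * p / f + q
        ssq += v * v / f
        logf += np.log(f)
    s2 = ssq / (Tn - 1)                 # MLE of sig_eps^2
    return -0.5 * ((Tn - 1) * (np.log(s2) + 1.0) + logf)

# maximise over q, parameterised as q = exp(2*theta) as in Section 14.6:
y = np.cumsum(np.random.default_rng(1).standard_normal(200))
res = minimize_scalar(lambda th: -ll_concentrated(np.exp(2 * th), y),
                      bounds=(-10.0, 5.0), method="bounded")
q_hat = np.exp(2 * res.x)
```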
14.3.2.2 Homogeneous models
The homogeneous model is introduced in §14.1; for a multivariate local level model the restriction reduces to

$$\Sigma_\eta = q_\eta\,\Sigma_\varepsilon,$$

where the q-ratio qη plays the same role as set out in the previous subsection for univariate models. The homogeneity restrictions imposed on multivariate models allow the decoupling of the KF; see Fernandez (1990). The N different time series in yt are subjected to the same KF, which corresponds to a univariate SSF where the matrices G and H contain the q-ratio values of the homogeneous model. So the covariance equations of the (univariate) KF, i.e. ft, kt and Pt+1|t, are the same for all N time series. The set of N innovation series, stacked in vt, is used to construct the diffuse likelihood. The full variance matrix of a particular component, say the irregular εt, that is Σ = Σε, can be concentrated out of the likelihood function, leading to

$$\log CDL = -\frac{1}{2}\left\{T^{*}\log|\hat{\Sigma}| + N\sum_{t=1}^{T}\log f_t + N\log|S^{*}| + NT^{*}\right\},$$

where

$$\hat{\Sigma} = \frac{1}{T^{*}}\sum_{t=k+d+1}^{T}\frac{\hat v_t\hat v_t'}{\hat f_t}.$$
The univariate AKF is used for homogeneous models and, if possible, a (partial) collapse is made at some time t = m. Therefore, the same technical details as mentioned in the previous subsection apply; these are set out in Appendix 14.7 of this chapter.
14.3.2.3 Multivariate models
The diffuse likelihood is used for parameter estimation of multivariate models. Thus, no parameter is concentrated out of the likelihood, and σ² is fixed at unity. The AKF is used for the calculation, but it can be (partially) collapsed at some time t = m. The multivariate diffuse likelihood is defined as

$$\log DL = -\frac{1}{2}\left\{\sum_{t=1}^{T}\log|F_t| + \log|S^{*}| + q_T\right\},$$

where S* is defined in Appendix 14.7 of this chapter.
14.3.3 Prediction error variance
The prediction error variance (PEV) of a univariate structural time series model is the variance of the one-step ahead prediction errors in the steady state. As discussed at the start of this section, the steady state may require more updates of the KF than the number of available observations. The formal definition of the estimated PEV is
PEV=σ̂2f, |
where f is the steady state value of ft; that is, f= lim t→∞ft. For a multivariate model, the prediction error variance matrix can be evaluated. When a steady state is not obtained, it may be useful to report a `finite' PEV; that is,
PEVn=σ̂2f̂n, |
where n is some large finite number. The `finite' PEV at the end of the sample; that is, at n=T, might also be of interest.
14.3.4 The final state and regression estimates
The final state vector âT|T contains, together with δ̂x, all the information from the estimated SSF required to forecast future observations. Therefore, the final state is of special interest. The estimate of the state vector and its MSE matrix at the end of the sample are given by

$$\hat a_{T|T} = E(\alpha_T\,|\,Y_T), \qquad \hat\sigma^2\hat P_{T|T} = MSE(\hat a_{T|T}),$$

respectively. The diagonal of the MSE matrix is used to form standard errors and t-statistics for the final state âT|T. Under certain circumstances, some of these diagonal elements may be equal to zero, in which case the t-statistic cannot be calculated. Also, for stationary parts of the state vector it is not useful to calculate the t-statistic, since stationary components are not persistent throughout the series.
The regression estimates are obtained from

$$\hat\delta_x = \mathbf{S}_T^{-1}\mathbf{s}_T, \qquad MSE(\hat\delta_x) = \hat\sigma^2\,\mathbf{S}_T^{-1},$$

where $\mathbf{S}_T$ and $\mathbf{s}_T$ are defined in Appendix 14.7 of this chapter. The diagonal of the MSE matrix is used to form standard errors and t-statistics.
14.3.5 Filtered components
The filtered components are based on the one-step ahead predictions of the state vector; see equation (eq:14.15). When k = 0, the AKF is applied until t = d. Then the AKF collapses to the KF, which evaluates the filtered states directly. However, when k > 0, a partial collapse takes place and filtering is based on (eq:14.15), with the matrix $\mathbf{S}_t^{-1}$ calculated recursively; see Appendix 14.7 of this chapter. Note that the filtered states are only defined for t = d+k+1, ..., T.
14.3.6 Residuals
The residuals in STAMP for univariate models are defined as the standardised innovations

$$\tilde v_t = \frac{\hat v_t}{\hat\sigma\,\hat f_t^{1/2}}, \qquad t = d+1, \dots, T, \qquad (eq:14.26)$$

where v̂t and σ̂²f̂t are defined as in (eq:14.16). When k = 0, the residuals are obtained from the KF directly after the collapse of the AKF at t = d. In a multivariate model the residuals of the j-th equation are given by

$$\tilde v_{j,t} = \frac{\hat v_{j,t}}{\hat\sigma\,\hat F_{jj,t}^{1/2}}, \qquad t = d+1, \dots, T,$$

where v̂j,t is the j-th element of the vector v̂t and F̂jj,t is the (j,j) element of the matrix F̂t.
When an SSF contains fixed effects, that is, when k > 0, two sets of residuals are distinguished inside STAMP. These residuals are defined in the next two subsections. Note that the two sets are the same when k = 0.
14.3.6.1 Generalised least squares residuals
The generalised least squares (GLS) residuals are standardised innovations computed using (eq:14.26), t = d+1, ..., T, but where v̂t and f̂t are obtained from the AKF (eq:14.15) applied to a SSF with Bxδx replaced by Bxδ̂x. The calculation of these residuals requires two steps. First, the estimates δ̂x and σ̂ are obtained from the AKF with d+k columns for At+1|t. Then a second AKF is applied, with d columns for At+1|t, from which v̂t and f̂t are obtained for t = d+1, ..., T. Of course, the first AKF may be partially collapsed after t = d, and the second AKF may be fully collapsed to a KF after t = d. The GLS residuals are not independently distributed with constant variance in small samples. Tests for normality, heteroskedasticity and serial correlation are applied to the GLS residuals.
14.3.6.2 Generalised recursive residuals
The generalised recursive residuals are the standardised residuals (eq:14.26), but now obtained from an AKF, with d+k columns for At+1|t, accompanied by a set of recursive regressions for the inversion calculations of St; see Appendix 14.7 of this chapter. A partial collapse of this AKF may take place at t = d. These residuals are denoted by wt, for t = d+k+1, ..., T, and they are commonly used in predictive tests and diagnostics.
14.4 Disturbance smoother
Smoothing refers to the estimation of the state vector αt and the disturbance vector ut using information in the whole sample rather than just past data. Smoothing is an important feature because it is the basis for
- signal extraction, detrending and seasonal adjustment;
- diagnostic checking for detecting and
distinguishing between outliers and structural changes using
auxiliary residuals;
- EM algorithm for initial estimation of parameters;
- calculation of the score, defined as the first derivative of the likelihood with respect to the vector of parameters.
We consider the disturbance smoother (DS) of Koopman (1993) which directly estimates the disturbance vector of the SSF, ut, and the MSE matrix; that is,
E(ut|YT,δ=0) = ũt and MSE(ũt) = σ2 Ct, |
respectively, with t=1,...,T. Note that in the context of structural time series models, the interest in ut is limited to elements of Gut and Hut and their corresponding MSE matrices.
The DS is based on the same backwards recursions as found by Bryson and Ho (1969), de Jong (1989) and Kohn and Ansley (1989), which are given by

$$e_t = F_t^{-1}v_t - K_t'r_t, \qquad D_t = F_t^{-1} + K_t'N_tK_t,$$
$$r_{t-1} = Z'e_t + T'r_t, \qquad N_{t-1} = Z'F_t^{-1}Z + L_t'N_tL_t,$$

where Lt = T − KtZ and t = T, ..., 1. These backwards recursions are initialised with rT = 0 and NT = 0. The storage requirements for the KF are limited to the quantities vt, Ft and Kt for t = 1, ..., T. The smoothed estimates of the disturbances of the measurement equation and the corresponding MSE matrices are

$$G\tilde u_t = GG'e_t, \qquad MSE(G\tilde u_t) = \sigma^2\left(GG' - GG'D_tGG'\right), \qquad (eq:14.27)$$

respectively. The smoothed estimates of the disturbances of the transition equation and the corresponding MSE matrices are

$$H\tilde u_t = HH'r_t, \qquad MSE(H\tilde u_t) = \sigma^2\left(HH' - HH'N_tHH'\right), \qquad (eq:14.28)$$

respectively. The proof of the disturbance smoother is given by Koopman (1993). The DS is the computationally most efficient smoother available.
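A Python sketch of the backward recursions follows (the δ = 0 case, with the MSE quantities Dt and Nt omitted for brevity); v, F and K are the stored Kalman filter outputs, for example from the filter sketch in §14.3.

```python
import numpy as np

def disturbance_smoother(v, F, K, Z, Tm, G, H):
    """Backward disturbance smoothing with delta = 0: returns the
    smoothed disturbances G*u_t, H*u_t and r_0 (for state smoothing)."""
    n = len(v)
    p = Tm.shape[0]
    GG, HH = G @ G.T, H @ H.T
    r = np.zeros(p)                     # r_T = 0
    Gu = np.empty((n, GG.shape[0]))
    Hu = np.empty((n, p))
    for t in range(n - 1, -1, -1):
        Fi = np.linalg.inv(F[t])
        e = Fi @ v[t] - K[t].T @ r      # e_t
        Gu[t] = GG @ e                  # smoothed measurement disturbance
        Hu[t] = HH @ r                  # smoothed transition disturbance
        r = Z.T @ e + Tm.T @ r          # r_{t-1} = Z'e_t + T'r_t
    return Gu, Hu, r
```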
14.4.1 The augmented disturbance smoother
In the same way as the KF, the DS must be augmented to allow for initial effects with diffuse distributions and for fixed components. The supplementary recursions for the DS are

$$E_t = F_t^{-1}V_t - K_t'R_t, \qquad R_{t-1} = Z'E_t + T'R_t,$$

and they are referred to as the augmented disturbance smoother (ADS). The number of columns of Et and Rt is equal to d+k. When the AKF is not collapsed at all, the full-sample estimates of the disturbance vectors E(Gut|YT), and the corresponding MSE matrices MSE(Gut|YT), are given by (eq:14.27) with et and Dt replaced by

$$\hat e_t = e_t - E_t\tilde\delta, \qquad \hat D_t = D_t - E_tS^{-1}E_t', \qquad (eq:14.29)$$

respectively, for t = 1, ..., T. In the same way, E(Hut|YT) and MSE(Hut|YT) are given by (eq:14.28), with rt and Nt replaced by

$$\hat r_t = r_t - R_t\tilde\delta, \qquad \hat N_t = N_t - R_tS^{-1}R_t', \qquad (eq:14.30)$$

respectively, for the transition disturbances. Note that δ̃ = S_T^{-1}s_T and S = S_T. For a proof of the ADS, the reader is referred to Koopman (1993). When the AKF is fully collapsed to the KF at t = m, the DS can be used for t = T, ..., m+1. The ADS should be applied for t = m, ..., 1, but when a collapse has taken place, the matrices S_T and R_m are not available. Therefore, smoothing is not straightforward when a collapse has occurred during the AKF. However, a suggestion of Chu-Chun-Lin and de Jong (1993) can be used to recover these missing matrices.
14.4.2 The EM algorithm and exact score
The EM algorithm is a recursive method to obtain maximum likelihood estimates for unknown elements in the state and error system matrices of the SSF; see Shumway and Stoffer (1982) and Watson and Engle (1983). This method requires a huge amount of computation and is not easy to implement. Koopman (1993) proposes a simple and computationally efficient EM algorithm for unknown values inside the variance matrices ΩG = GG' and ΩH = HH', where the error system matrices are assumed to be time-invariant. Let θ be the stack of these unknown values. Given a set of values for θ, say θ*, the EM step calculates new variance matrices via

$$\Omega_G^{new} = \Omega_G + \frac{1}{T}\,\Omega_G\Lambda_e\Omega_G, \qquad \Omega_H^{new} = \Omega_H + \frac{1}{T}\,\Omega_H\Lambda_r\Omega_H,$$

where

$$\Lambda_e = \sum_{t=1}^{T}\left(\hat e_t\hat e_t' - \hat D_t\right) \qquad \text{and} \qquad \Lambda_r = \sum_{t=1}^{T}\left(\hat r_t\hat r_t' - \hat N_t\right).$$

The quantities êt and D̂t are defined in (eq:14.29), and the quantities r̂t and N̂t are defined in (eq:14.30). The log-likelihood value of the SSF with the new variance matrices is always larger than the log-likelihood value of the SSF with the old variance matrices. This EM algorithm only requires one pass through the KF and one pass back through the disturbance smoother. Note that the EM maximises the diffuse log-likelihood (eq:14.21). For a proof and a discussion of this EM algorithm, the reader is referred to Koopman (1993).
The score vector is defined as the first derivative of the log-likelihood with respect to the vector of parameters θ. The vector of first derivatives is usually evaluated numerically. Koopman and Shephard (1992) present a method to calculate the exact score for any parameter in the state and error system matrices. However, the computational gain is only achieved for exact score calculations with respect to parameters in the error system matrices Gt = G and Ht = H. Again, let θ be the stack of unknown values in the variance matrices of the SSF, and let the vector θ* contain specific values for θ; then the exact score for the i-th element of θ is given by

$$\left.\frac{\partial\log L}{\partial\theta_i}\right|_{\theta=\theta^*} = \frac{1}{2}\,\mathrm{tr}\left\{\Lambda_e\frac{\partial\Omega_G}{\partial\theta_i} + \Lambda_r\frac{\partial\Omega_H}{\partial\theta_i}\right\},$$

and the definitions of the EM algorithm apply here as well.
14.4.3 Smoothed components
The smoothed state vector is defined as the estimate of the state vector using the full data-set; that is,

$$\tilde\alpha_t = E(\alpha_t\,|\,Y_T), \qquad t = 1, \dots, T. \qquad (eq:14.32)$$

The smoothed state vector forms the basis for detrending and seasonal adjustment procedures in the SSF. Graphical inspection of a subset or a linear combination of the smoothed state vector is an important feature of STAMP. The actual values for (eq:14.32) are obtained by the forward recursion

$$\tilde\alpha_{t+1} = T\tilde\alpha_t + H\tilde u_t, \qquad t = 1, \dots, T-1, \qquad (eq:14.33)$$

with α̃1 = â1|0 + P̂1|0(r0 + R0δ̃). Hence, the smoothed state vector is calculated using the forward recursion (eq:14.33) after the DS is applied and the vector Hũt = HH'rt is stored for t = 1, ..., T. The MSE matrix of the smoothed state vector is not considered; see de Jong (1989) and Kohn and Ansley (1989). It is shown by Koopman (1993) that this method of state smoothing is computationally very fast. Also note that no extra storage space is required, since the storage space for the Kalman gain matrices Kt can be used to store the vectors Hũt, t = 1, ..., T.
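The forward recursion (eq:14.33) then takes one line per time point. This sketch again treats the δ = 0 case, with r0 and Hu returned by the disturbance smoother sketch above:

```python
import numpy as np

def smooth_states(a1, P1, Tm, Hu, r0):
    """Fast state smoothing: alpha_1 = a_{1|0} + P_{1|0} r_0, then
    alpha_{t+1} = T alpha_t + (H u~)_t."""
    n = len(Hu)
    alpha = np.empty((n, len(a1)))
    alpha[0] = a1 + P1 @ r0
    for t in range(n - 1):
        alpha[t + 1] = Tm @ alpha[t] + Hu[t]
    return alpha
```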
14.4.4 Auxiliary residuals
The standardised smoothed estimates of the disturbances are referred to as auxiliary residuals. Graphs of these residuals, in conjunction with normality tests, are a valuable tool for detecting data irregularities such as outliers, level changes and slope changes. Note that the auxiliary residuals are serially correlated even when the model is correctly specified. Therefore, the normality test should be adjusted for the implied serial correlation. The technical details and some interesting applications can be found in Harvey and Koopman (1992).
The auxiliary residuals are obtained directly from (eq:14.27) and (eq:14.28). The residuals are standardised using the diagonal elements of the appropriate MSE matrix. The variance σ2 must be replaced by its estimate σ̂2.
14.5 Forecasting
An important feature of STAMP is its extensive forecast facility. The model-based extrapolations can be calculated for the actual series but also for any element of the state vector.
14.5.1 Forecasts of series and components
The forecast of the state vector l steps ahead, for l = 1, ..., L, and its MSE matrix are given by

$$\hat a_{T+l|T} = E(\alpha_{T+l}\,|\,Y_T), \qquad \sigma^2\hat P_{T+l|T} = MSE(\hat a_{T+l|T}),$$

respectively. The forecasts and MSE matrices are generated recursively by

$$\hat a_{T+l|T} = T\,\hat a_{T+l-1|T}, \qquad \hat P_{T+l|T} = T\hat P_{T+l-1|T}T' + HH',$$

respectively. The initialisation is obtained directly from the AKF at time T. The forecast of yT+l and the corresponding MSE matrix are given by

$$\hat y_{T+l|T} = Z\hat a_{T+l|T} + X_{T+l}B_x\hat\delta_x, \qquad \hat F_{T+l|T} = Z\hat P_{T+l|T}Z' + GG', \qquad (eq:14.35)$$

respectively. Graphical inspection of the forecast values can be valuable for checking model validity.
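A sketch of the forecast recursions (with the regression term XT+l Bx δ̂x omitted for simplicity):

```python
import numpy as np

def forecast(aT, PT, Z, Tm, G, H, L_steps):
    """l-step-ahead forecasts of state and series with their MSE
    matrices (times sigma^2), started from the filtered state at T."""
    a, P = aT.copy(), PT.copy()
    GG, HH = G @ G.T, H @ H.T
    yhat, Fhat = [], []
    for _ in range(L_steps):
        a = Tm @ a                      # a_{T+l|T}
        P = Tm @ P @ Tm.T + HH          # P_{T+l|T}
        yhat.append(Z @ a)              # y_{T+l|T}
        Fhat.append(Z @ P @ Z.T + GG)   # its MSE
    return np.array(yhat), np.array(Fhat)
```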
14.5.2 Extrapolative residuals
The extrapolative residuals can be calculated when future values of yt become available. They are defined as

$$\hat v_{T+l} = y_{T+l} - \hat y_{T+l|T}, \qquad l = 1, \dots, L,$$

where ŷT+l|T is defined in (eq:14.35). The variance matrices of the extrapolative residuals are equal to the MSE matrices of ŷT+l|T; that is, σ̂²F̂T+l|T as in (eq:14.35). The extrapolative residuals can be standardised using the corresponding diagonal elements of the variance matrix.
14.6 Parameter estimation
The model definitions are given in §14.1. It is shown in §14.2 how the models of STAMP can be put in SSF. The filtering and smoothing algorithms of §14.3 and §14.4, respectively, are associated with the SSF and can be applied conditional on known state and error system matrices. The unknown values inside these matrices are treated as parameters which need to be estimated. The main task of STAMP is to estimate these parameters using maximum likelihood estimation (MLE) methods. This subsection sets out the strategy for solving this problem.
14.6.1 The parameters of STAMP
The state and error system matrices of the SSF are time-invariant and contain many zero and unity values. Therefore, they can be regarded as selection matrices. For a univariate model, the unknown values in the error system matrices are the standard deviations of the unobserved components. The time-invariant transition matrix T of the SSF contains the unknown first-order autoregressive parameter of stationary components such as the cycle and (optionally) the slope. In the case of a cycle, the state system matrix T also contains cosine and sine terms which rely on the unknown frequency λ. Table 14.2 gives an overview of the set of possible parameters for a univariate model.
The variances are always positive, the cycle damping factors must be in the range 0<ρ<1, and the frequencies can only take values in the range 0<λ<π. Since the parameters are to be estimated via a numerical method, some special transformations are used to enforce these restrictions on the parameters; see Table 14.2. The set of transformed parameters, relevant for a specific model, are stacked in the vector θ.
As can be seen from Table 14.2, the variance of the cycle, σψ², is taken as the parameter instead of the variance of its disturbance, σκ². In this case, σκ² = σψ²(1 − ρψ²), so that σκ² → 0 as ρψ → 1. This allows the estimation of deterministic, but stationary, cycles, which arise in the extreme case ρψ = 1.
Table:14.2 Parameters and transformations

Variances | Parameter | Transformation |
Irregular | σε2 | exp(2θε) |
Level | ση2 | exp(2θη) |
Slope | σζ2 | exp(2θζ) |
Seasonal | σω2 | exp(2θω) |
Cycle | σψ2 | exp(2θψ) |
AR(1) | σξ2 | exp(2θξ) |

Autoregressive | Parameter | Transformation |
Slope | ρβ | |θβ|(1+θβ2)−1/2 |
Cycle | ρψ | |θρ|(1+θρ2)−1/2 |
AR(1) | ρν | θν(1+θν2)−1/2 |

Frequency | Parameter | Transformation |
Cycle | λ | 2π/{2+exp(θλ)} |
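The transformations in Table 14.2 map an unrestricted real θ onto the admissible parameter ranges. A minimal sketch (function names are illustrative, not STAMP's):

```python
import numpy as np

# Transformations of Table 14.2: each maps an unrestricted real theta
# onto the admissible range of the underlying parameter.
def variance(theta):
    return np.exp(2.0 * theta)                    # sigma^2 > 0

def damping(theta):
    return abs(theta) / np.sqrt(1.0 + theta**2)   # 0 < rho < 1

def frequency(theta):
    return 2.0 * np.pi / (2.0 + np.exp(theta))    # 0 < lambda < pi
```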
14.6.1.1 Variance matrices in a multivariate model
The same parameters exist for a multivariate model as for a univariate model, except that the variances are replaced by variance matrices; see §14.1. A special case is the vector autoregressive component, which is discussed in §14.6.1.2. Since all variance matrices are treated in the same way, we consider an N×N matrix Σ; that is, a non-negative definite symmetric matrix, not necessarily of full rank. A Cholesky-type decomposition of Σ enforces these restrictions. The variance matrix is decomposed as Σ=ΘDΘ', where the N×p matrix Θ is lower triangular with unity values on the leading diagonal, 1≤p≤N, and the p×p matrix D is a non-negative diagonal matrix. The lower-triangular elements of Θ, that is, θij with i=2,...,N and j=1,...,min(i−1,p), may take any value, including zero. The diagonal elements of D are transformed as
di = exp(2θii),
where di is the i-th diagonal element of D and i=1,...,p. Indeed, matrix Θ is the same factor loadings matrix as in the model specifications for common components; see §14.1.4. Since matrix Θ has p columns, this would suggest that variance matrix Σ has rank p. In most cases this will be true. However, as any diagonal element of D can be zero, the rank of Σ can be less than p. Note that when di=0, the i-th column of Θ does not affect variance matrix Σ and can be set to zero.
The variance matrices are placed into the SSF via the error system matrices. For example, the variance matrix of the irregular enters the SSF via G=(ΘD1/2, 0, ..., 0), where Θ and D correspond to the decomposition of the irregular variance matrix Σε and 0 denotes a zero matrix.
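For a full-rank Σ, the decomposition Σ=ΘDΘ' can be recovered from the standard Cholesky factor by rescaling its columns. A minimal sketch, assuming NumPy and ignoring the reduced-rank case (which requires a pivoted variant):

```python
import numpy as np

def theta_d_decomposition(Sigma):
    """Decompose a full-rank variance matrix as Sigma = Theta D Theta'."""
    C = np.linalg.cholesky(Sigma)   # Sigma = C C', C lower triangular
    d = np.diag(C)                  # positive diagonal of C
    Theta = C / d                   # divide column j by d[j]: unit diagonal
    D = np.diag(d**2)               # non-negative diagonal scaling matrix
    return Theta, D                 # Sigma == Theta @ D @ Theta.T
```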
14.6.1.2 Vector autoregressive components
The multivariate model may include a vector autoregression of order 1. The parameters inside the matrix Φ appear in the SSF transition matrix T. All roots of the matrix polynomial I−ΦL must lie outside the unit circle. Within a numerical optimisation procedure, the constraints on Φ needed to keep the process stationary are imposed by means of an algorithm given by Ansley and Kohn (1986). In the special case of a VAR(1) model, the transformations are:

- The parameters of the VAR component are put into the N×N unrestricted matrix A and the N×N positive definite symmetric matrix Γ.
- Cholesky decompositions are computed for the matrices I+AA'=BB' and Γ=LL'.
- The VAR matrices are defined by Φ=LA'(B')−1L−1 and Συ=Γ−ΦΓΦ'.

This method ensures a stationary VAR component. However, imposing further restrictions on the VAR matrices may be difficult.
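The mapping in the list above is easy to express in code. A minimal sketch, assuming NumPy; explicit inverses are kept for readability:

```python
import numpy as np

def var1_from_unrestricted(A, Gamma):
    """Map an unrestricted A and a positive definite Gamma to a
    stationary VAR(1) coefficient Phi and disturbance variance Sigma_v."""
    N = A.shape[0]
    B = np.linalg.cholesky(np.eye(N) + A @ A.T)   # I + AA' = BB'
    L = np.linalg.cholesky(Gamma)                 # Gamma = LL'
    Phi = L @ A.T @ np.linalg.inv(B.T) @ np.linalg.inv(L)
    Sigma_v = Gamma - Phi @ Gamma @ Phi.T         # stationary variance condition
    return Phi, Sigma_v
```

By construction the eigenvalues of Φ lie inside the unit circle, so Γ is the stationary variance of the component and Συ is positive definite.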
14.6.2 BFGS Estimation procedure*
The optimisation method is based on the BFGS quasi-Newton method and aims to maximise the likelihood criterion l(θ). At every call to the updating scheme, it delivers a new estimate θi+1, given a current estimate θi, which brings the objective function l(θ) closer to its maximum. The updating scheme can be written as

θi+1 = θi + siδi, (eq:14.37)
where si is a step size scalar and δi is the search direction vector. Appendix 14.8 of this chapter gives some basic principles on numerical optimisation and in particular the BFGS updating scheme. Some practical details on initialisation, updating, linear search and termination, are given below.
- At the start of the estimation process an initial parameter vector θ0 and an initial search direction δ0 must be given. The procedure for setting the initial parameter vector is different for univariate and multivariate models; see §14.6.3 and §14.6.4, respectively. The initial direction vector is calculated as δ0=Hd(θ0)q(θ0), (eq:14.38) where q(θ0) is the score vector and Hd(θ0) is the Hessian matrix restricted to its diagonal entries; the off-diagonal entries are zero. These values are evaluated numerically at the parameter vector θ0. The initial score vector is evaluated numerically. For multivariate models, the score vector is evaluated analytically in every subsequent update; see §14.4.2.
- The maximisation process is terminated when all three convergence criteria below hold, based on a small tolerance ε, or when the number of BFGS updates reaches the integer M, or when irregularities occur. In the latter case, the estimation procedure is stopped with an error code attached to it. The three convergence criteria are

1. Likelihood: crit1 = |l(θi)−l(θi+1)| / |l(θi)| < ε;
2. Gradient: crit2 = (1/m)∑j=1m |qj(θi)| < 10ε;
3. Parameter: crit3 = (1/m)∑j=1m |θi+1,j−θi,j| / |θi,j| < 100ε;

where θi,j denotes the j-th element of the parameter vector θi and qj(θi) denotes the j-th element of the score vector q(θi). When all three convergence criteria hold, the estimation procedure terminates and returns one of the following messages:

Very strong | crit1<ε | crit2<ε | crit3<ε |
Strong | crit1<ε | crit2<ε | crit3<10Nε |
Weak | crit1<ε | crit2<10Nε | crit3<10Nε |
Very weak | crit1<10Nε | crit2<10Nε | crit3<10Nε |

where N is the number of elements in the vector yt of the state space form; see §14.2. For univariate models, N is equal to unity. The default values are ε=10−7 and M=100.
- The direction vector δi is obtained via the BFGS scheme; see Appendix 14.8 of this chapter. To ensure stability in the optimisation routine, the BFGS updating of (eq:14.47) does not take place when d'g in (eq:14.47) is smaller than ε.
- The linear search strategy is as follows. The step length si is given by

si = minj {(c−|θi,j|) / |δi,j|},

where δi,j denotes the j-th element of the direction vector δi. If si<0.0001, si is set to 0.0001, and if si>0.9, si is set to 1. The constant c is set to 7 for parameters related to standard deviations, but c is set to 28 and 8 for parameters related to ρ and λ, respectively. The constants determine the parameter range for the optimisation procedure; that is, θi,j ∈ [−c,c]. Also, when the maximum value in the direction vector δi, δmax, exceeds the value 3, si is overwritten by min(1/δmax, si). Subsequently, if l(θi+1) is not larger than l(θi), the scalar si is divided by 2. This is repeated until si is smaller than ε. When si<ε, the Hessian matrix H(θi) is set equal to its diagonal matrix Hd(θi). This re-setting of the Hessian matrix is only allowed twice; at the third occasion, the estimation procedure is terminated with an error message. In most cases si equals unity. A sketch of this step-length rule is given after this list.
- When the likelihood function, the score or the diagonal elements of the Hessian matrix cannot be evaluated for some reason, the estimation procedure terminates with an error message.
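A minimal sketch of the step-length rule just described, assuming NumPy; c_j holds the range constant (7, 28 or 8) for each parameter, the halving loop against the likelihood is omitted, and a small guard avoids division by zero:

```python
import numpy as np

def initial_step(theta, delta, c_j, eps=1e-12):
    """Step length keeping each theta_j inside [-c_j, c_j] (a sketch)."""
    ratios = (c_j - np.abs(theta)) / (np.abs(delta) + eps)
    s = float(np.min(ratios))
    s = max(s, 0.0001)          # lower bound on the step
    if s > 0.9:
        s = 1.0                 # snap near-unit steps to unity
    d_max = float(np.max(delta))
    if d_max > 3.0:
        s = min(1.0 / d_max, s) # damp steps with very large direction elements
    return s
```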
More specific details of implementation for univariate models and multivariate models are given below.
14.6.3 Estimation of univariate models*
For univariate models, STAMP maximises the concentrated diffuse log-likelihood log Lc(y|θ) which is treated as an unconstrained non-linear function of θ, denoted by l(θ). The likelihood kernel l(θ) is appropriately scaled by a function of T, the number of observations.
14.6.3.1 Initial values
The elements of the vector of transformed parameters θ are given in Table 14.2. The elements related to standard deviations, θσ, are set equal to −0.5, except for θη (level), whose initial value is −1, while θζ (slope) and θω (seasonal) are started off at −1.5 and −2, respectively.
One element of θσ needs to be concentrated out, which effectively means it must be restricted to zero (that is, a unity standard deviation). The concentrated parameter must be chosen appropriately. At the start of the optimisation routine, the element corresponding to the irregular standard deviation is concentrated out. When no irregular is present in the model, we choose from the level, slope, cycle, seasonal and autoregressive components, consecutively.
The initial values for ρψ and λ, corresponding to a particular cycle, are provided by the user and STAMP transforms them automatically to θ0. The initial values for θβ and θυ are set to 2 (which corresponds to ρ values of approximately 0.9).
Before the quasi-Newton optimisation is started, a simple method is used to get θσ into the neighbourhood of the optimal solution in a stable but quick way. The initialisation method is similar to the quasi-Newton updating scheme (eq:14.37). The j-th element of the direction vector δi is given by

δi,j = Hj,j(θi)qj(θi),

where qj(θi) denotes the j-th element of the score vector q(θi) and Hj,j(θi) denotes the (j,j) element of the Hessian matrix H(θi). The score and the diagonal elements of the Hessian matrix are evaluated numerically at θi. The initialisation updating steps are repeated five times, but stopped earlier if the criterion

∑j δi,j qj(θi) < 0.01

holds.
During this simple estimation procedure, a choice is made of which parameter should be concentrated out of the log-likelihood. At every step i, the parameter related to the largest value in vector θi is concentrated out when its corresponding value in δi is larger than 4. This rule reflects the desire to concentrate the largest standard deviation out of the likelihood. When the choice of concentrated parameter is changed, the other standard deviations must be adjusted so that they have the correct ratios with respect to the new concentrated parameter. This might create instabilities in the search process. Therefore, the difference between the old and the new adjusted value is constrained to lie between −1 and 2. This strategy is the result of `fine tuning' after applying the method to a wide range of different models and data-sets.
14.6.3.2 Strategy for setting parameters fixed
During the estimation procedure, certain parameters might take values close to their boundaries. When a parameter moves closer and closer to its boundary, it may slow down the optimisation procedure as a whole. In these circumstances, it is advisable to fix such parameters at their boundary values but to allow estimation to continue for the remaining parameters. The strategy for setting parameters fixed is as follows. In the case of standard deviations, when the transformed parameter value is smaller than −5 and its gradient value is smaller than 0.0001 (in absolute terms), it is set to a constant implying a zero standard deviation.

Similarly, the damping factor ρ of a cycle is set to unity when the element corresponding to θρ in vector θi is larger than 25. If the value related to θλ (corresponding to the period of a cycle) is larger than 7 or less than −7, the program reduces the cycle component to an autoregressive component. In both cases, the corresponding gradient value must also be smaller than 0.0001 (in absolute terms). This strategy does not apply to the transformed parameters related to ρβ (slope) and ρυ (autoregressive).
14.6.3.3 Strategy for determining the concentrated parameter
The initial estimation procedure has determined a concentrated standard deviation from θσ . However, during the main estimation process, it might be more appropriate to concentrate out another parameter. When the largest transformed parameter value (corresponding to θσ ) exceeds the value 1.5, this particular parameter will be the new concentrated parameter. Then, the other parameters corresponding to θσ obtain values as ratios to the new concentrated parameter, the gradient vector q( θi) is re-calculated and the Hessian matrix is replaced by the diagonal matrix Hd( θi) .
14.6.4 Estimation of multivariate models*
The estimation procedure for multivariate models largely follows the same route as for univariate models, but there are some differences. The likelihood kernel l(θ) is the diffuse log-likelihood log L(y|θ), properly scaled by a function of T, so there is no need to find an appropriate concentrated parameter. The EM algorithm plays a prominent role in the procedure to obtain initial variance matrices for the disturbances; see §14.4.2. The strategy of setting elements to zero in the Cholesky decompositions of the variance matrices is set out below. The parameters of the VAR component cannot be fixed during the estimation procedure.
14.6.4.1 Initial values
The initial settings for the multivariate model are discussed in terms of the variance matrix, the Cholesky decomposition and the VAR coefficient matrix, rather than directly in terms of the parameter vector θ. The initialisation procedure automatically transforms the elements of these matrices into the vector θ0 .
The initial parameter settings of the variance matrices are similar to the univariate settings of the q-ratios, except that a given q-ratio now applies to all N diagonal entries of the corresponding D matrix. The factor loading matrix Θ is set to the identity matrix. At this stage, the rank conditions of the variance matrices are ignored. The VAR matrix is set equal to a diagonal matrix with all entries equal to 0.9. The extra initial parameters for the cycle component are set manually by the user.
The Kalman filter is applied to the multivariate model and the sample variance matrix of the innovations is calculated. The variance matrices of the disturbances are set equal to this matrix but properly scaled by their appropriate q-ratios.
Then, the EM algorithm is applied; see §14.4.2. The number of EM steps is set to five, which has proved appropriate in most cases. The VAR matrix and the other stationary parameters (ρ and λ) remain fixed during the EM steps. After the EM steps the parameters are transformed as indicated in §14.6.1. For example, the Cholesky decomposition is used to obtain the diagonal matrix D and the factor loading matrix Θ from each variance matrix. At this stage, the rank conditions are taken into account. Finally, the parameters are stacked into θ0.
14.6.4.2 Strategy for fixing parameters
Restricting parameters to their boundary values is important to speed up the estimation process. This is even more relevant for multivariate models, since estimation is likely to take much more time than for univariate models: filtering and smoothing involve matrices with larger dimensions and more parameters need to be estimated.

The diagonal entries of D are treated in the same way as the standard deviations in a univariate model. The restrictions for the damping factor and the frequency parameter are imposed in a similar fashion to univariate models. No restrictions can be placed on the parameters related to the VAR component. The factor loading matrix elements enter the parameter vector without transformation.
When the j-th diagonal element of D is restricted to zero, the rank of the variance matrix Σ is reduced by one. All elements of the corresponding j-th column of the factor loading matrix can also be restricted to zero since these elements don't have any effect on the variance matrix Σ. Any specific element of the factor loading matrix Θ is set to zero when its value is smaller than 0.0001 and its corresponding gradient is also smaller than 0.0001 (both in absolute terms).
14.7 Appendix 1: Diffuse distributions*
This appendix discusses technical details about the way we treat initial and regression effects for the SSF. Most components in STAMP are initialised using diffuse priors and so we must treat these priors carefully in our calculations to ensure stable and exact results. This is carried out using the idea of augmented filtering and smoothing. To ensure computational efficiency the Kalman filter (KF) needs to be collapsed at some point.
14.7.1 Collapse of the augmented KF
An SSF with k=0, that is, with Xt=0 and Wt=0 for t=1,...,T, is treated by the AKF for t=1,...,m, and the AKF is collapsed to the KF using (eq:14.15) at t=m. Thereafter, the KF evaluates ât+1|t and P̂t+1|t directly for t=m,...,T. The collapse also contains the operation

q̂m = qm − sm'Sm−1sm.

Indeed, the KF also evaluates the residual vector v̂t and its (unscaled) variance matrix F̂t directly. The value for m is restricted by T≥m≥d and |Sm|>0, but usually it is set to m=d.
For a SSF with k>0, a full collapse cannot be made since Xt and Wt are time-varying. However, a partial collapse can take place with respect to δi. This means that the AKF is still applied but with a reduced number of columns. Therefore, we partition the AKF matrices
|
|
When Sii,t is invertible at t=m, the AKF can be partially collapsed to
|
and
|
Note that matrix B of the (partially collapsed) AKF is replaced by B=Bx. The matrices At+1|t, Vt=−ZtAt+1|t−XtB and B of the partially collapsed AKF have k columns.
14.7.2 Likelihood calculation
At the end of a fully collapsed AKF, the estimate of σ2 is given by σ̂2=q̂T/(NT−d−k) where k=0 and q̂T is obtained from the collapsed KF. When k>0, q̂T is given by

q̂T = qT − sT'ST−1sT.

Likelihood calculation also requires the log value of the term |S*| where S*=ST. If k=0 and a full collapse has taken place at t=m, then |S*|=|Sm|. If k>0 and a partial collapse has taken place at t=m, then |S*|=|Sii,m||ST|, where ST is taken from the partially collapsed AKF. Note that the determinants are obtained without any extra computational cost since the inverses of these matrices are calculated earlier.
14.7.3 Recursive regressions
The recursive evaluation of least squares estimates is common practice in standard regression analysis since it is a way to get the recursive residuals. The same technique can be applied to the recursive evaluation of St which consists of the equations
|
|
where ct=St−1st and Ct=St−1, with St−1 denoting the inverse of St.
14.8 Appendix 2: Numerical optimisation*
Numerical optimisation methods are used to obtain the maximum likelihood estimates for the parameter vector θ. Here we outline some of the basic principles behind the estimation procedure of STAMP.
There is a vast literature on non-linear optimisation techniques; see, among many others, Gill, Murray and Wright (1981), Cramer (1986) and Thisted (1988). Note that many texts on optimisation focus on minimisation rather than maximisation, but of course max l(θ) = −min{−l(θ)}.
A first approach to obtaining the MLE θ from l(θ) is to consider solving the score equations, assuming the relevant partial derivatives exist:

q(θ) = ∂l(θ) / ∂θ.

Then q(θ)=0 defines the necessary conditions for a local maximum of l(θ) at θ. A sufficient condition is that the Hessian matrix

Q(θ) = ∂2l(θ) / ∂θ∂θ'

also exists and is negative definite at θ. If Q(.) is negative definite for all parameter values, the likelihood is concave, and hence has a unique maximum; if not, there could be local optima or singularities.
14.8.1 Newton type methods
When q(θ) is a set of equations that are linear in θ, q(θ)=0 can be solved explicitly for θ. More generally, q(θ) is non-linear, and locating a root is no more tractable than maximising l(θ) directly. Thus, we consider iterative approaches in which a sequence of values θi, i=0,1,...,M, is obtained from an initial value θ0, approximately solving q(θi)=0 and corresponding to non-decreasing values of the criterion function: l(θi+1)≥l(θi). Here M is a terminal maximum number of steps. A convergence criterion is used to terminate the iteration, such as q(θi+1)≈0 or |l(θi+1)−l(θi)|≤ε. These implementation-specific aspects are discussed in §14.6.
Expand q(θ)=0 in a first-order Taylor series around θi:

0 = q(θ) ≈ q(θi) + Q(θi)(θ−θi),

written as the iterative rule:

θi+1 = θi + H(θi)q(θi),

where H(.)=−Q−1(.) is minus the inverse Hessian matrix. The gradient, q(θi), determines the direction of the step to be taken, and H(θi) modifies the size of the step, which determines the metric: this algorithm is the Newton-Raphson technique, or Newton's method. Even when the direction is uphill, it is possible to overstep the maximum in that direction. In that case it is essential to add a line search to determine a step length si∈[0,1]:
θi+1 = θi + siδi,

where δi=H(θi)q(θi) and si is chosen to ensure that l(θi+1) ≥ l(θi).
STAMP uses the so-called variable metric, or Broyden-Fletcher-Goldfarb-Shanno (BFGS), quasi-Newton updating method. This approximates H(θi) by a symmetric positive definite matrix Ki which is updated at every iteration and converges on H(.).
Let d=siδi=θi−θi−1 and g=q(θi)−q(θi−1); then the BFGS update is:

Ki+1 = Ki + (1 + g'Kig / d'g)(dd' / d'g) − (dg'Ki + Kigd') / d'g. (eq:14.47)
This satisfies the quasi-Newton condition Ki+1g=d, and possesses the properties of hereditary symmetry (Ki+1 is symmetric if Ki is), hereditary positive definiteness, and super-linear convergence.
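The update is a rank-two correction and is straightforward to code. A minimal sketch, assuming NumPy:

```python
import numpy as np

def bfgs_update(K, d, g):
    """Rank-two BFGS update of K_i, with d = theta_i - theta_{i-1} and
    g = q(theta_i) - q(theta_{i-1}); satisfies K_{i+1} @ g == d."""
    dg = d @ g                        # the scalar d'g
    Kg = K @ g
    return (K + (1.0 + (g @ Kg) / dg) * np.outer(d, d) / dg
              - (np.outer(d, Kg) + np.outer(Kg, d)) / dg)
```

As noted in §14.6.2, the update is skipped when d'g is too small, which also protects the divisions by d'g here.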
14.8.2 Numerical score vector and diagonal Hessian matrix
When the analytic formula for q(θ) cannot be obtained easily, STAMP uses BFGS with a numerical approximation to q(.) based on finite difference approximations. The numerical derivatives are calculated using:

qj(θ) ≈ { l(θ+ειj) − l(θ−ειj) } / 2ε,

where ιj is a unit vector (for example, (0 1 0 ... 0)' for j=2) and ε is a suitably chosen step length. Thus, ε represents a compromise between round-off error (cancellation of leading digits when subtracting nearly equal numbers) and truncation error (ignoring terms of higher order than ε in the approximation).
The diagonal entries of the Hessian matrix are numerically evaluated as

Hj,jd(θ) = −1 / Qj,j(θ),

where

Qj,j(θ) = ∂2l(θ)/∂θj2 ≈ { l(θ+ειj) − 2l(θ) + l(θ−ειj) } / ε2,

with the same step-length ε.
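A minimal sketch of these finite-difference evaluations, assuming the likelihood kernel is available as a Python callable ell (an illustrative name) and theta is a NumPy vector:

```python
import numpy as np

def numerical_score(ell, theta, eps=1e-5):
    """Central-difference approximation to the score vector q(theta)."""
    q = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta); e[j] = eps
        q[j] = (ell(theta + e) - ell(theta - e)) / (2.0 * eps)
    return q

def diagonal_hessian(ell, theta, eps=1e-5):
    """Second-difference approximation to the diagonal entries Q_jj."""
    Q = np.zeros_like(theta)
    f0 = ell(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta); e[j] = eps
        Q[j] = (ell(theta + e) - 2.0 * f0 + ell(theta - e)) / eps**2
    return Q   # the diagonal matrix H_d has entries -1/Q[j]
```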
Chapter 15 Model Output
This chapter defines the output of STAMP. The first section explains when and where the various parts of the output appear. The remaining sections explain how the output is presented for every option of the Test menu.
15.1 Output from STAMP
This section details the output which occurs when a model is fitted to data. There are three different forms of output. Firstly, the iterations of the numerical optimisation algorithm are displayed in the Message window. Secondly, once the estimation algorithm has stopped (hopefully having converged!), the Results window reports details of the model specification and the results of the optimisation algorithm, and gives a summary of diagnostic statistics. Finally, the Test menu can be used to prompt STAMP to output specific details of the parameter estimates, state variables, model fit and diagnostics.
15.1.1 Model estimation
In order to compute the maximum likelihood estimates, STAMP first computes initial estimates and then it moves on to maximise the likelihood function directly by numerical optimisation; see Chapter 14. During this second phase various statistics are reported in the Message window:
- Parameters - the current values of the transformed parameters;
- Gradients - the derivatives of the likelihood function with respect to the transformed parameters;
- Function value - the kernel of the log-likelihood function;
- Initial value - the value of the function after the initial phase;
- %change - the percentage increase of the current likelihood kernel over the initial value.
15.1.2 Selected model and estimation output
The initial output, written to the Results window, gives details of the selected model and various results from the numerical optimisation procedure; an example is reproduced in §??. In particular, the selected components are listed and the number of parameters is given. The sample period used for parameter estimation is also given.
A variety of statistics emerge from the numerical optimisation. The likelihood kernel evaluated at θ̂ is reported together with the number of iterations required for convergence. Three different convergence measures are reported and these are categorised by some term like STRONG CONVERGENCE; see §14.6.
15.1.3 Summary statistics
An example of the diagnostic summary report, which is printed to the Results window after fitting, is given in §??. Broadly this summary gives the basic diagnostics and goodness-of-fit statistics. The precise definitions are given in §15.4 and §15.7.
- Log-likelihood - the actual log-likelihood function at its maximum;
- Prediction error variance (PEV) - the variance, or variance matrix, of the one-step ahead prediction errors in the steady state, or the finite PEV if accompanied by a message of `No steady state';
- Std. Error - the equation standard error, the square root of the PEV;
- Normality - the Doornik--Hansen statistic, which is the Bowman--Shenton statistic with the correction of Doornik and Hansen (1994), distributed approximately as χ22 under the null hypothesis;
- H(h) - a test for heteroskedasticity, distributed approximately as F(h,h);
- r(τ) - the residual autocorrelation at lag τ, distributed approximately as N(0,1/T);
- DW - the Durbin-Watson statistic, distributed approximately as N(2,4/T);
- Q(P,d) - the Box--Ljung Q-statistic based on the first P residual autocorrelations, distributed approximately as χd2;
- R2, RD2 or RS2 - the most suitable coefficient of determination.
Notice that the loss in the degrees of freedom d of the Box--Ljung Q-statistic takes account of the number of relative parameters in θ. It does not take account of any lagged dependent variables and their presence necessitates further adjustment.
15.1.4 The sample period
The parameters are estimated using the model sample as recorded from the Estimation dialog in the Model menu. Diagnostics and goodness-of-fit measures are calculated using the state sample. By default the model sample and the state sample are the same. However, the state sample may be changed to check sub-samples in isolation from the rest of the sample. This can be done in the Final state option. Of course, both samples must lie within the boundaries of the data sample.
15.1.5 Test
The nine items of the Test menu are self-explanatory. The headings of the following sections correspond to them. Note that the output for Goodness of Fit is requested by the Final state option and that many of the conventional time series diagnostics (built out of the innovations, or one-step ahead prediction errors) will come up in the Residual option.
15.2 Parameters
The variance parameters (and variance matrix parameters for multivariate models), together with frequencies and damping factors for cycles and coefficients for autoregressive components are referred to as parameters. The maximum likelihood estimates of the parameters can be requested from the Further output dialog. In a multivariate model, the same output is produced for each equation in turn. In addition, the full variance matrices and their decomposition in factor loading matrices and diagonal scaling matrices can be produced. This option also outputs the transformed parameters with their scores and approximate standard errors.
15.2.1 Variances and standard deviations
This gives the variances of the disturbances driving the various components. Thus, in terms of §14.1.1, the level variance is ση 2, the slope variance is σζ 2 and so on. The standard deviations are the square roots of the variances. The q-ratios are the ratios of each variance or standard deviation to the largest.
For multivariate models, the variances and standard deviations are obtained from the diagonal entries of the variance matrices. To make the q-ratios comparable across equations, the largest variance across all series is identified, and this component is set to one for all series, irrespective of whether or not it is the largest in a particular series. However, when this variance is zero in a particular series, all associated q-ratios are set to zero. This also applies to q-ratios for standard deviations.
15.2.2 Cycle and AR(1)
For a stochastic cycle component, this option outputs the damping factor ρ, the frequency λ, the period 2π/λ, and the period in terms of `years'; that is, 2π/sλ, where s is the number of `seasons'. In addition it gives the variance of the cycle, σψ 2=σκ 2/(1-ρ2). For a multivariate model, the variance matrix of the cycle is given. This is
Σψ = (1−ρ2)−1Σκ,
where Σκ is the variance matrix of the cycle disturbances.
For the AR(1) component in a univariate model, the autoregressive coefficient is reported. For a multivariate model the N×N matrix of coefficients is presented. The variance matrix of the disturbance vector ξt is produced.
15.2.3 Variance matrices (multivariate models)
The N×N variance matrices of the disturbances correspond to the variance parameters in a single equation model; see §14.1.3. In the STAMP output, the variance matrix is given by the lower triangular part of an N×N matrix, with the variances corresponding to the disturbance terms in each series given on the main diagonal (these are also produced by `Variances' above). The upper triangular part of the matrix gives the off-diagonal entries in the corresponding correlation matrix.
In a common factor model, some, or all, variance matrices will be of reduced rank; see §14.1.4. This will be reflected by values of unity in specific entries of the correlation matrix.
15.2.4 Factor loading matrices (multivariate models)
The N×K factor loading matrix Θ is defined in §14.1.4. When no common factor restrictions are imposed, so that N=K, the factor loading matrix is the lower triangular matrix associated with the Cholesky decomposition of variance matrix Σ; that is, Σ=ΘDΘ'. This option outputs the lower triangular factor matrix Θ and the diagonal scaling matrix D.
The vectors of constants, such as those for the level and slope components, are defined in §14.1.4 and only exist when common factors are specified in the model. The dimension of these vectors is N−K. The constant vector does not apply to the irregular component.
15.2.5 Transformed parameters and standard errors
The vector of estimated parameters, which are transformed as in §14.6, can be written to the Results window using this option. It also reports the score vector, associated with the estimated vector of parameters, and the numerical standard errors of the transformed parameters.
The scores are calculated as described in §14.6. The standard errors are obtained from the diagonal entries of the numerically evaluated Hessian matrix Q. Formally, the standard error of the i-th estimated parameter is defined as

sei = 1 / √(−Qii),

where Qii is the i-th diagonal entry of matrix Q.
15.3 Final state
The estimate of the final state vector contains all the latest information on the components in the model. Together with the regression estimates of the explanatory variables and the interventions, the final state contains all the information needed to make forecasts.
The Final state dialog also provides useful information about the estimated seasonal pattern and the amplitude of the cycle. Some goodness-of-fit statistics can be generated, but these are discussed in §15.4.
15.3.1 Analysis of state
In terms of §14.3.4, the final state is the filtered estimate at time T; that is, âT|T. The square roots of the diagonal elements of the corresponding MSE matrix, σ̂2P̂T|T, are the root mean square errors (RMSEs) of the corresponding state elements. The t-value is the estimate of the state divided by the RMSE. Applied to the i-th element of the state, this is
t-valuei = ai / RMSEi,

where ai is the i-th element of vector âT|T.
The Prob.value shows the probability of the absolute value of a standard normal variable exceeding the absolute value of the t-value. In a hypothesis testing situation, a Prob.value of less than 0.05 corresponds to rejection of the hypothesis of a zero value on a two-sided test at the 5% level of significance. A Prob.value of less than 0.05 is indicated by *, while a value less than 0.01 is denoted by **.

The various unobserved components which make up the structural time series model all appear in the final state.
- Trend - `Lvl' is the estimate of the level of the trend μT, while `Slp' is the estimate of the slope βT. The t-value might not appear for the level when no irregular is specified in the model or when the irregular variance is estimated as zero.
- Dummy variable seasonal - the estimate of the seasonal, γT, is part of the final state, together with the previous s−2 seasonals. These estimated elements of the state are denoted in the output by `Sea_1',...,`Sea_s-1'. Since the summing-to-zero constraint is used, the latest seasonal pattern can be obtained. It is given explicitly by the program if `Seasonal tests' is active.
- Trigonometric seasonal - the final state contains the estimates for γ1,T, γ1,T*, γ2,T, γ2,T*, etc., denoted in the output by `Sea_1', `Sea_2' and so on. Note that the meaning of `Sea_1', `Sea_2', etc., is different from the dummy variable seasonal case, so the final estimated seasonal pattern cannot be identified directly. It must be requested in `Seasonal tests'.
- Cycle - the terms `Cyk_1' and `Cyk_2' give the estimates of ψT and ψT* for cycle k, where k is 1, 2 and/or 3. Since a cycle component is not persistent throughout the series, a t-value is not appropriate here.
- Autoregressive - the estimate of υT is denoted as `Ar1'.
The same output appears for multivariate models, but it is repeated for each individual series.
15.3.2 Regression analysis
The estimates of the regression parameters, the parameters of the intervention variables and the fixed seasonal dummies are based on the matrix (sT,σ̂2ST); see §14.3.4. These estimates are reported upon request (if part of the specified model) and they may be interpreted in much the same way as in a standard regression model. Because they are deterministic (not time-varying), their RMSEs are standard deviations. The t-values would have t-distributions if the (relative) parameters were known. Although they are asymptotically normal, a t-distribution may provide a better approximation to small sample properties. As was noted earlier, the Prob. values are based on the normal distribution.
15.3.3 Seasonal tests
If the seasonal component is deterministic, either because it is specified to be `fixed' at the outset or its disturbance variance is estimated to be zero, a joint test of significance can be carried out on the s-1 seasonal effects. The test is essentially the same as a test for the joint significance of a set of explanatory variables in regression. Under the null hypothesis of no seasonality, the large sample distribution of the test statistic, denoted by `Seasonal Chi2(s-1)' in the output, is χs-12. The Prob. value is the probability of a χs-12 variable exceeding the value of the test statistic. In the case of a stochastic seasonal, the joint seasonal test is also produced although a formal joint test of significance of the seasonal effect is inappropriate. However, the seasonal pattern is persistent throughout the series and when the seasonal pattern changes relatively slowly, which is usually the case, the test statistic can provide a useful guide to the relative importance of the seasonal.
The formal definition of the test statistic is
a'P−1a,
where a contains the estimates of the s-1 seasonal effects at time T and P is the corresponding MSE matrix. In the case of a fixed seasonal, a and P are obtained from the matrix (sT,σ̂2ST). For a stochastic seasonal, a and P come from the matrix (âT|T,σ̂2P̂T|T); see §14.3.4.
The final estimate of the seasonal effect for each season; that is, the estimated seasonal pattern at time T, is given here as well.
15.3.4 Cycle tests
The amplitude of the cycle is

(a2 + a*2)1/2,

where a is the element of âT|T corresponding to ψT and a* corresponds to ψT*. The ratio of the amplitude of the cycle to the level of the trend is also reported; this gives an indication of the relative importance of the cycle. When the cycle is deterministic, but stationary, a joint significance χ2 test (of the same form as the seasonal test) is produced as well.
15.3.5 Data in logs
When it is indicated that a model has been estimated in logarithms, some additional output for specific components appears in the Results window.
- Trend - the value exp(a) is reported, where a is the element of âT|T corresponding to μT. When a slope component is specified, the estimated annual growth rate is reported as well; it is the estimate of βT multiplied by 100s.
- Seasonal - the multiplicative effect of the seasonal pattern on the level of the series is given by exp(aj), where aj is the estimate of the seasonal effect in season j. The percentage effect, 100{exp(aj)−1}, is also calculated. This output is added to the output generated for the option `Seasonal tests'.
- Cycle - in addition to Cycle tests, the amplitude as a percentage of the level is computed; this is simply the estimated amplitude multiplied by 100.
15.4 Goodness of fit
The goodness-of-fit statistics in STAMP all relate to the residuals, which are the standardised innovations; see §14.3.6. When explanatory variables are specified in the model, the generalised least squares residuals are used; see §14.3.6.1. The diagnostic checking facilities for these residuals are provided in the Residuals option; see §15.6.
15.4.1 Prediction error variance
The basic measure of goodness-of-fit is the PEV as defined in §14.3.3. The PEV is the variance of the residuals in the steady state and so it corresponds to the variance of the disturbance of the reduced form ARIMA model. When the Kalman filter does not converge to the steady state, the finite PEV is used. For a univariate model, the PEV is denoted by σ̃2, irrespective of whether or not it is the finite PEV. For a multivariate model, σ̃2 denotes the diagonal entry of the prediction error variance matrix Σ̃ corresponding to a specific series in yt. The output includes a message if the Kalman filter is not in the steady state at t=T. The Final state option includes the facility to re-calculate the PEV using a pre-set number of updates.
15.4.2 Prediction error mean deviation
Another measure of fit is the mean deviation of the residuals, vt, t=d+1,...,T, defined by

md = ( σ̃ / (T−d) ) ∑t=d+1T |vt|,

where the PEV σ̃2 is defined in §14.3.3. In a correctly specified model the reported ratio

(π/2) md2 / σ̃2

should be approximately equal to unity.

When it is indicated that the series are modelled in logs, the relative measure of error is also given. This measure is easily computed as 100md.
15.4.3 Coefficients of determination
In econometric modelling the traditional measure of goodness-of-fit is the coefficient of determination, R2. For a structural time series model this measure is defined by

R2 = 1 − (T−d)σ̃2 / ∑t=d+1T (yt−ȳ)2,

where yt is the original series and ȳ is its sample mean. Note that for a multivariate model, yt is a specific series of the vector yt and σ̃2 is the diagonal element of Σ̃ corresponding to that series. This coefficient of determination is most useful when the series appears to be stationary, with no trend or seasonal.
When the series yt shows trend movements, it is better to compare the PEV with the variance of first differences, Δyt=yt−yt−1. This leads to

RD2 = 1 − (T−d)σ̃2 / ∑t=d+1T (Δyt−Δȳ)2,

where Δȳ is the sample mean of Δyt. It should be noted that RD2 may be negative, indicating a worse fit than a simple random walk plus drift model.
For seasonal data with a trend, it is more appropriate to measure the fit against a random walk plus drift and fixed seasonals. This requires the sum of squares SSDSM, obtained by subtracting the seasonal mean from Δyt. The coefficient of determination is then

RS2 = 1 − (T−d)σ̃2 / SSDSM,

which can again be negative.
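A minimal sketch of the three measures, assuming NumPy; pev is the estimated prediction error variance σ̃2, d the number of diffuse initial elements and s the seasonal length, and the exact summation ranges are simplified relative to the text:

```python
import numpy as np

def r2_measures(y, pev, d, s):
    """Return (R2, RD2, RS2) for a univariate series y (a sketch)."""
    n = len(y) - d
    ssy = np.sum((y - y.mean())**2)            # deviations from the mean
    dy = np.diff(y)
    ssd = np.sum((dy - dy.mean())**2)          # first differences about their mean
    season = np.arange(len(dy)) % s
    ssdsm = sum(np.sum((dy[season == j] - dy[season == j].mean())**2)
                for j in range(s))             # differences about seasonal means
    return 1 - n*pev/ssy, 1 - n*pev/ssd, 1 - n*pev/ssdsm
```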
15.4.4 Information criteria : AIC and BIC
The fit of different models can be compared on the basis of the PEV. However, when models have different numbers of parameters, it is more appropriate to compare them using the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). These criteria are based on the formula

log PEV + cm/T,

where c is a constant and m is the number of parameters in θ plus the number of non-stationary components in the state vector. The AIC takes c as 2 and the BIC takes c as log T. For multivariate models, m is defined as for a univariate model, since the criteria are reported for each series in yt separately.
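The corresponding computation, as a small sketch:

```python
import numpy as np

def information_criteria(pev, m, T):
    """AIC and BIC on the log-PEV scale defined above."""
    return np.log(pev) + 2.0 * m / T, np.log(pev) + np.log(T) * m / T
```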
15.5 Components
The `Components' option facilitates the computation of filtered or smoothed estimates of the components specified in the model. The filtered estimates are based on past observations and can be calculated as described in §14.3.5. The smoothed estimates are based on all the observations; see §14.4.3. In what follows, the estimates are denoted by x̃, where x is some component which can be estimated either way. When the model is multivariate, the component is associated with a specific series in yt.
15.5.1 Series with components
This option plots the original series, yt, with one of the following (as requested):
- the estimated trend, μ̃t (which includes interventions on the level and slope);
- the estimated trend plus cycle(s) plus AR(1), μ̃t+ψ̃t+υ̃t;
- the estimated trend plus cycle(s) plus AR(1) plus explanatory variables, μ̃t+ψ̃t+υ̃t+xt'δ̃x, where the last part includes all the explanatory variables in the equation.
15.5.2 Detrended
The detrended series is yt-μ̃t for t=1,...,T. The smoothed estimates are usually most appropriate for detrending. Note that level and slope interventions are included in the trend.
15.5.3 Seasonally adjusted
The seasonally adjusted series is yt-γ̃t for t=1,...,T. The smoothed estimate of the seasonal component is normally used for seasonal adjustment.
15.5.4 Individual seasonals
The estimated seasonal component γ̃t is re-arranged into s different series, each corresponding to one of the seasons. The graph can be interpreted as a set of annual plots for the individual seasonal effects.
15.5.5 Data in logs
If the observations are in logarithms, the output of the various components may be amended in two ways: (1) `Anti-logs' can be calculated; or (2) `Modified anti-logs' can be calculated. These amendments do not apply in a similar way for all components. The following list gives the formal amendments:
- Trend - (1) exp(μ̃t) and (2) exp(μ̃t);
- Growth rate (Slope) - (1) 100s β̃t and (2) 100s β̃t;
- Seasonal, Cycle, AR(1) and Irregular - (1) exp(ã) and (2) exp(μ̃t+ã) − exp(μ̃t), where ã is the specific component;
- Detrended - (1) exp(yt−μ̃t) and (2) exp(yt) − exp(μ̃t);
- Seasonally adjusted - (1) exp(yt−γ̃t) and (2) exp(yt−γ̃t).
15.6 Residuals
The residuals are the standardised one-step-ahead prediction errors or innovations and they are defined in §14.3.6. The residual series are simply denoted by vt for t=d+1,...,T. For multivariate models, the series vt are associated with a specific series in yt. For a correctly specified model with known variance parameters, the residuals are assumed to be NID(0,1). Diagnostic statistics and graphs are the tools to validate this proposition.
15.6.1 Correlogram
The residual autocorrelation at lag τ is defined by

rτ = ∑t=d+1+τT (vt−v̄)(vt−τ−v̄) / ∑t=d+1T (vt−v̄)2,

where v̄ is the sample mean of the residuals. The approximate standard error for rτ is 1/(T−d)1/2. The Durbin and Watson (1950) statistic is given by

DW = ∑t=d+2T (vt−vt−1)2 / ∑t=d+1T vt2 ≈ 2{1−r(1)}.

In a correctly specified model, DW is approximately N(2,4/T).

A general test for serial dependence is the portmanteau Box--Ljung Q-statistic, which is based on the first P residual autocorrelations; see Ljung and Box (1978). It is given by

Q(P,d) = T(T+2) ∑j=1P rj2 / (T−j),

and is distributed approximately as χd2 under the null, where d is equal to P−n+1 and n is the number of parameters. For a multivariate model, n is determined as in a univariate model, so a variance matrix counts as one parameter. Notice that the loss in the degrees of freedom d of Q(P,d) takes account of the number of relative variance parameters. It does not take account of any lagged dependent variables, and their presence necessitates further adjustment.
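A minimal sketch of these three diagnostics, assuming NumPy; v holds the residuals with the d diffuse initial values already dropped, so T below counts the residuals rather than the full sample:

```python
import numpy as np

def residual_diagnostics(v, P, n_par):
    """Return autocorrelations r(1..P), DW, the Box-Ljung Q and its df."""
    T = len(v)
    c = v - v.mean()
    r = np.array([np.sum(c[tau:] * c[:-tau]) for tau in range(1, P + 1)])
    r /= np.sum(c**2)                           # residual autocorrelations
    dw = np.sum(np.diff(v)**2) / np.sum(v**2)   # approx. 2 * (1 - r(1))
    q = T * (T + 2) * np.sum(r**2 / (T - np.arange(1, P + 1)))
    return r, dw, q, P - n_par + 1              # df of the Q-statistic
```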
15.6.2 Periodogram and spectrum
These diagnostic graphs are defined in §13.3 and §??. They are applied to vt in exactly the same way. Note that the theoretical spectrum is a horizontal straight line for a white-noise series.
15.6.3 Cumulative statistics and graphs
The `Cusum' and `Cusum of squares' are as given in §13.5 and §13.6 respectively, but are now calculated using the residuals vt for t=d+1,...,T.
The `Cumulative periodogram' is another cumulative graph which is useful since it smooths out the periodogram. The cumulative periodogram cpj is defined as

cpj = ∑i=1j pi / ∑i=1n pi,   j=1,...,n,

where pi is the periodogram ordinate at i for i=1,...,n and n=[(T−d)/2]. The cumulative periodogram cpj is plotted against j=0,...,n, with cp0=0. A useful diagnostic statistic is the maximum deviation of cpj from the diagonal line connecting cp0=0 and cpn=1; that is, max|cpj−j/n| over j=1,...,n−1; see Durbin (1969). This measure is output on request in the Results window. Some typical critical values for this two-sided test are .44 for T≈80, .40 for T≈100 and .30 for T≈200.
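A minimal sketch of the cumulative periodogram and the maximum-deviation statistic, assuming NumPy; the periodogram ordinates are taken here from the FFT of the mean-corrected residuals, which is an assumption about the ordinate convention:

```python
import numpy as np

def cumulative_periodogram(v):
    """Return cp_0..cp_n and max |cp_j - j/n| over j=1..n-1."""
    n = len(v) // 2
    p = np.abs(np.fft.fft(v - v.mean())[1:n + 1])**2     # periodogram ordinates
    cp = np.concatenate(([0.0], np.cumsum(p) / p.sum())) # cp_0=0, ..., cp_n=1
    j = np.arange(n + 1)
    return cp, np.max(np.abs(cp[1:-1] - j[1:-1] / n))
```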
15.6.4 Distribution statistics
The Bowman and Shenton (1975) statistic for normality is calculated using the residuals vt for t=d+1,...,T. As for the other statistics, and the graphs on the density, histogram and distribution, the method of calculation is as presented in §13.10.
15.6.5 Heteroskedasticity
A basic non-parametric test for heteroskedasticity is the two-sided F(h,h) test,

H(h) = ∑t=T−h+1T vt2 / ∑t=d+1d+h vt2,

where h is the closest integer to (T−d)/3. An increase of the variance in the last third of the sample will cause H(h) to be large, while the reverse will cause it to be small. The heteroskedasticity test appears as part of the Summary statistics.
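A minimal sketch, assuming NumPy and that v holds the residuals vd+1,...,vT:

```python
import numpy as np

def h_test(v):
    """Ratio of the last h squared residuals to the first h; F(h,h) under the null."""
    h = round(len(v) / 3)                         # closest integer to (T-d)/3
    return np.sum(v[-h:]**2) / np.sum(v[:h]**2)
```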
15.7 Auxiliary residuals
Auxiliary residuals are designed to detect unusual movements in a fitted time series model. These residuals are the smoothed estimates of the disturbances associated with the components irregular, level and slope. Section 14.4.4 sets out how to calculate the standardised auxiliary residuals. It is very useful to plot them and look for any irregularities. If requested, all residuals which have absolute value exceeding two are written to the Results window.
As well as graphical representations of the auxiliary residuals, STAMP reports the standard distribution statistics for these residuals; see §13.10. Of course, these statistics cannot be used as standard moment tests because of the serial correlation in the auxiliary residuals. However, these statistics can be corrected for serial correlation as in Harvey and Koopman (1992). In STAMP, the correction factors are calculated from the first max(T1/2, 20) sample autocorrelations of the auxiliary residuals.
15.8 Predictive testing
When the final date of the state sample is smaller than the final date of the data sample; that is, there are L observations, yt, t=T+1,...,T+L `outside the sample', post-sample predictive testing may be carried out using all or some of these observations. However, when these final dates are the same, predictive testing within the sample is the only option; that is, predicting yt for t=T-l+1,...,T. Of course, in the first situation, predictive testing within the sample is also possible.
When explanatory variables and/or interventions are specified in the model, the residuals used in predictive testing within the sample are the generalised recursive residuals defined in §14.3.6.2. These residuals are denoted by wt for t=d*+1,...,T where d*=d+k and k is the number of explanatory variables in the model (excluding any variables which do not appear after time T-L). Note that wt is zero when an intervention enters into the model. The error bands for the residuals are based on twice the RMSE. The CUSUM is defined in §13.5, and it is calculated for these residuals if requested.
The Chow test within the sample is constructed as
Chow = ( (T−l−d*) / l ) ∑t=T−l+1T wt2 / ∑t=d*+1T−l wt2,
which is approximately distributed as F(l,T-l-d*).
When post-sample predictive testing is carried out, the estimates of the coefficients of the explanatory variables and/or interventions remain the same as they were at the end of the state sample. The corresponding standardised residuals are denoted by vt for t=T+1,...,T+L. The post-sample predictive `failure' test statistic is then

pft = ∑j=1L vT+j2,

which is approximately distributed as χL2.
The CUSUM t-test is also output in the form

cusumt = L−1/2 ∑j=1L vT+j,

which is approximately distributed as Student's t with T−L−d* degrees of freedom.
The extrapolative residuals, which are defined in §14.5.2, can be used to assess the predictive performance of the model and to make comparisons with rival models. If v̂T+j|T, j=1,...,L, denotes the j-step ahead extrapolative residual in the post-sample period, the extrapolative sum of squares is given by

ess(T,L) = ∑j=1L v̂T+j|T2.

This statistic can also be generated using residuals within the sample. A similar statistic based on the sum of absolute values is defined as

esa(T,L) = ∑j=1L |v̂T+j|T|.
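A minimal sketch of the post-sample statistics above, assuming NumPy; note that the failure and CUSUM tests use the standardised post-sample residuals while ess and esa use the extrapolative residuals, so the two inputs are different residual series:

```python
import numpy as np

def post_sample_tests(v_std, v_extrap):
    """v_std: standardised post-sample residuals; v_extrap: extrapolative residuals."""
    L = len(v_std)
    pft = np.sum(v_std**2)                 # failure test, approx. chi-squared(L)
    cusum_t = np.sum(v_std) / np.sqrt(L)   # CUSUM t-test
    ess = np.sum(v_extrap**2)              # extrapolative sum of squares
    esa = np.sum(np.abs(v_extrap))         # extrapolative sum of absolute values
    return pft, cusum_t, ess, esa
```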
15.9 Forecast
The forecasts of the series yt and the components are generated as described in §14.5. The forecast error bounds are based on one RMSE. For example, the forecasts of a specific series ŷT+j are obtained from a particular element of ŷT+j|T, for j=1,...,L, and the RMSE r̂T+j is the square root of the corresponding diagonal element of the matrix σ̂2F̂T+j|T. When the data are in logs, the anti-log transformation ( exp ) can be calculated for the forecasts. Their corresponding RMSEs are then based on
r̂T+j* = exp(ŷT+j + r̂T+j) − exp(ŷT+j),   j=1,...,L.
The anti-log forecasts adjusted for bias are calculated by
ŷT+j* = exp(ŷT+j + 0.5r̂T+j2),   j=1,...,L.
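A minimal sketch of these anti-log adjustments, assuming NumPy; y_hat and rmse hold the log-scale forecasts ŷT+j and their RMSEs r̂T+j:

```python
import numpy as np

def antilog_forecasts(y_hat, rmse):
    """Bias-adjusted anti-log forecasts and the corresponding RMSE measure."""
    rmse_star = np.exp(y_hat + rmse) - np.exp(y_hat)   # anti-log RMSE
    y_star = np.exp(y_hat + 0.5 * rmse**2)             # bias-adjusted anti-log
    return y_star, rmse_star
```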