Unobserved Components Time Series Methodology
 Click on one of the following topics
 14 Statistical Treatment of Models
 14.1 Model definitions
 14.1.1 Univariate time series models
 14.1.2 Explanatory variables and interventions
 14.1.3 Multivariate time series models
 14.1.4 Common factors
 14.1.4.1 Common levels
 14.1.4.2 Smooth trends with common slopes
 14.1.4.3 Common trends: level and slopes
 14.1.4.4 Common seasonals
 14.1.4.5 Common cycles
 14.1.5 Explanatory variables
 14.2 State space form
 14.2.1 Structural time series models in SSF
 14.2.1.1 Univariate models in SSF
 14.2.1.2 Multivariate model in SSF
 14.2.1.3 Deterministic trend and seasonal components
 14.3 Kalman filter
 14.3.1 The augmented Kalman filter
 14.3.2 The likelihood function
 14.3.2.1 Univariate models: the concentrated likelihood
 14.3.2.2 Homogeneous models
 14.3.2.3 Multivariate models
 14.3.3 Prediction error variance
 14.3.4 The final state and regression estimates
 14.3.5 Filtered components
 14.3.6 Residuals
 14.3.6.1 Generalised least squares residuals
 14.3.6.2 Generalised recursive residuals
 14.4 Disturbance smoother
 14.4.1 The augmented disturbance smoother
 14.4.2 The EM algorithm and exact score
 14.4.3 Smoothed components
 14.4.4 Auxiliary residuals
 14.5 Forecasting
 14.5.1 Forecasts of series and components
 14.5.2 Extrapolative residuals
 14.6 Parameter estimation
 14.6.1 The parameters of STAMP
 14.6.1.1 Variance matrices in a multivariate model
 14.6.1.2 Vector autoregressive components
 14.6.2 BFGS Estimation procedure^{*}
 14.6.3 Estimation of univariate models^{*}
 14.6.3.1 Initial values
 14.6.3.2 Strategy for setting parameters fixed
 14.6.3.3 Strategy for determining the concentrated parameter
 14.6.4 Estimation of multivariate models^{*}
 14.6.4.1 Initial values
 14.6.4.2 Strategy for fixing parameters
 14.7 Appendix 1: Diffuse distributions^{*}
 14.7.1 Collapse of the augmented KF
 14.7.2 Likelihood calculation
 14.7.3 Recursive regressions
 14.8 Appendix 2: Numerical optimisation^{*}
 14.8.1 Newton type methods
 14.8.2 Numerical score vector and diagonal Hessian matrix
 15 Model Output
 15.1 Output from STAMP
 15.1.1 Model estimation
 15.1.2 Selected model and estimation output
 15.1.3 Summary statistics
 15.1.4 The sample period
 15.1.5 Test
 15.2 Parameters
 15.2.1 Variances and standard deviations
 15.2.2 Cycle and AR(1)
 15.2.3 Variance matrices (multivariate models)
 15.2.4 Factor loading matrices (multivariate models)
 15.2.5 Transformed parameters and standard errors
 15.3 Final state
 15.3.1 Analysis of state
 15.3.2 Regression analysis
 15.3.3 Seasonal tests
 15.3.4 Cycle tests
 15.3.5 Data in logs
 15.4 Goodness of fit
 15.4.1 Prediction error variance
 15.4.2 Prediction error mean deviation
 15.4.3 Coefficients of determination
 15.4.4 Information criteria : AIC and BIC
 15.5 Components
 15.5.1 Series with components
 15.5.2 Detrended
 15.5.3 Seasonally adjusted
 15.5.4 Individual seasonals
 15.5.5 Data in logs
 15.6 Residuals
 15.6.1 Correlogram
 15.6.2 Periodogram and spectrum
 15.6.3 Cumulative statistics and graphs
 15.6.4 Distribution statistics
 15.6.5 Heteroskedasticity
 15.7 Auxiliary residuals
 15.8 Predictive testing
 15.9 Forecast
 Part:4 Statistical Output
 13 Descriptive Statistics
 13.1 Transformations
 13.2 Correlogram
 13.3 Periodogram
 13.4 Spectrum
 13.5 CUSUM
 13.6 CUSUM of squares
 13.7 Histogram
 13.8 Density estimation
 13.9 Distribution function
 13.10 Write report
 13.10.1 Serial correlation
 13.10.2 Normality test
 List of tables
 Table:13.1 Empirical size of normality tests
 Table:14.1 Some special level and trend specifications
 Table:14.2 Parameters and transformations
Part:4 Statistical Output
Chapter 13 Descriptive Statistics
The Transform entry in the Data menu involves transformation of series and calculation of statistics. This chapter provides the formulae underlying the computations of descriptive statistics for individual time series. Modelrelated diagnostic statistics are presented in Chapter 15.
The Describe dialog, to be accessed from the Transform dialog, produces two types of output. The first, and most useful, is graphical output. Each of the options we discuss below produces graphs of one sort or another. However, the last entry in the Describe dialog, `Write report', generates output to be written to the Results window. By default this is turned off, but it is often useful to mark it so that precise summary statistics can be recorded for later use in a research report.
13.1 Transformations
This option provides two different kinds of transformations: a transform function for the actual values and a time series transform function. These different transformations are described below.
The very common and basic transformations are the logarithm, the exponent and the absolute values. The `BoxCox' and the `Stochastic volatility' transformations require some explanation. The transformation of Box and Cox (1964) is given by


where λ usually ranges between 2 and 2 (STAMP does not accept values outside this range to ensure numerical stability). Note that the BoxCox transformation with the default λ= 1/2 is very similar to the `square root' transformation. The following stochastic volatility transformation (see also Chapter ??) is suggested by Professor Wayne Fuller, see Fuller (1996, p.494497); that is,


where s_{y}^{2} is the unconditional variance of the y_{t}, and λ is a small constant value, the default value 0.02.
The second type of transformations focuses on differencing or integrating time series. If differencing is required, then the following is optional:


where L is the lag operator (so Ly_{t}=y_{t1}) and s is the number of seasonals in a year. When the differencing involves missing data, such as y_{0} and y_{1}, then the resulting differenced series will record a missing value. An example of this is that Δy_{1} is regarded as a missing value. If integration is required, then the resulting series, called z_{t}, are defined as


where z_{t} is set to zero for any t≤0. This operation does not generate extra missing values.
Finally, the series can be detrended. This is carried out using the following trend model:


which is discussed in §14.1. A forwards and backwards recursive algorithm is used to estimate μ_{t}, written by \mu _{t}, allowing the `detrended' series to be computed as y_{t}\mu _{t}. The default parameter for q is 0.000625=1/1600, as suggested by Hodrick and Prescott. By increasing this number, a more responsive trend line is fitted, while if it is set to zero, a deterministic time trend is obtained.
13.2 Correlogram
The correlogram is defined as the series r_{τ} , for 1,...,p, where r_{τ} is the sample correlation between y_{t} and y_{tτ}. The length of the correlogram p is prespecified, leading to a figure which shows r_{1},r_{2},...,r_{p} plotted against 1,2,...,p. The autocorrelation for lag τ is defined by


where y= 1/T ∑_{t=1}^{T}y_{t} is the sample mean of y_{t}. The approximate standard error for r_{τ} is 1/(T)^{½}.
13.3 Periodogram
The information in the correlogram can also be expressed in the frequency domain. The ordinates of the periodogram are


where a_{j} and b_{j} are given by


with c_{j,t}= cos λ_{j}t and s_{j,t}= sin λ_{j}t, for j=1,...,n1. The sine and cosine terms are evaluated recursively, using the trigonometric addition rule; that is,


for t=0,...,T1. Note that the terms cos λ_{j} and sin λ_{j}, for j=0,...,n, can also be evaluated recursively using a similar scheme. These recursions speed up the calculations by a significant amount. The periodogram is a plot of the ordinates p(λ_{j}) against λ_{j}/π, for j=1,...,n. Thus the Xaxis ranges between 0 and 1.
The periodogram can also be written in terms of the sample autocorrelations; that is,


where c(τ)= 1/T ∑_{t=1}^{T}(y_{t}y)(y_{tτ}y).
In the literature on the spectral analysis of stationary time series, the plot of p(λ_{j}) against λ_{j} is sometimes called the spectrogram, but the term periodogram is more widely used. For more information on this topic see Priestley (1981).
13.4 Spectrum
The periodogram is an inconsistent estimator of the spectrum; a consistent estimator of the spectrum is given by


where m is called the window length and the w_{j}'s are normalised weights which are constrained by w_{j}=w_{j}, for j=0,...,m. The weights in STAMP are directly associated with the odd rows of the Pascal's triangular scheme, for example,


and w_{j}=c_{j}4^{1m}. Note that, for this Pascal's scheme, the normalisation factor is based on ∑_{j=m}^{m}c_{j}=4^{m1}. At the beginning and at the end of the estimated spectrum the weighting scheme is simply truncated and the normalisation factor is adjusted appropriately so that ∑_{j} w_{j}=1.
13.5 CUSUM
The CUSUM graph is designed to assess whether the mean of a process y_{t} changes over time; see Brown, Durbin and Evans (1975) and Harvey (1990, p. 1535). In STAMP the CUSUMs are defined with respect to the standardised series


where y= 1/T ∑_{j=1}^{T}y_{j} and σ̂ _{y}=( 1/T ∑_{j=1}^{T}( y_{j}y) )^{½}. The motivation for the CUSUM is that if the mean changes through the standardised series, it will have runs of positive or negative observations. Thus the CUSUM will drift up (or down), before going back to zero at CUSUM(T). A formal check on the CUSUM is based on assessing whether it crosses two `boundary' lines as given by


These lines are based on a significance level of 10%.
13.6 CUSUM of squares
A similar device to the CUSUM is available for detecting a change in variance. Attention is now focused on the squares of the data and so we work with,


As t increases, the CUSUMSQ increases. If the model is homoskedastic, the increase should be linear. Departures from the 45^{°} line in the CUSUMSQ graph suggest a breakdown of this assumption.
13.7 Histogram
Let z_{t} denote the standardised observations (y_{t}y)/σ̂ _{y}, for t=1,...,T, where y and σ̂ _{y} are defined in §13.5. The range of z_{t} is divided into N intervals of length h with h defined in §13.8. Then the proportion of z_{t} in each interval constitutes the histogram.
13.8 Density estimation
The density can be estimated as a smoothed function of the histogram using a normal or Gaussian kernel. This can then be summed (`integrated') to obtain the estimated cumulative distribution function (CDF).
Denote the actual density of Z at z by f_{z}( z) . A nonparametric estimate of the density is obtained from the sample by:


where h is the window width or smoothing parameter, and K( ^{.}) is a kernel such that:


STAMP sets h=1.06σ̂ _{z}/T^{0.2} (=1.06/T^{0.2} as σ̂ _{z}=1) as a default and uses the standard normal density for K( ^{.}) :


We thank Professor Bernard Silverman for permission to use his algorithm for calculating f_{z}( z) ̂. A general treatment of density function estimation can be found in Silverman (1986).
13.9 Distribution function
The estimated CDF of Z can be derived from f_{z}( z) ̂; this is shown with a standard normal CDF for comparison. When selecting one graph, a nonparametric density estimation menu allows the choices determining h to be set. The menu does not appear for multiple graphs on one screen.
13.10 Write report
When the `Write report' option is active in the Describe dialog, some descriptive statistics will be sent to the Results window. Which are selected to be reported will depend on the other options which are activated in the Describe dialog. This can be broken down into two types of output: measures of serial dependence and normality.
13.10.1 Serial correlation
The correlogram values r_{τ} , for τ=1,...,p, are defined in §??. Their standard errors, under the null hypothesis of white noise, are equal to 1/(T)^{½}. The BoxLjung portmanteau statistic is given by


see Ljung and Box (1978). This is tested against a χ_{q}^{2} distribution. When no model is fitted, the degrees of freedom, q, is set equal to p. However, when the Qstatistic is applied to the residuals of an estimated structural time series model, q is set equal to p+1 minus the number of parameters. Chapter 15 gives the details on how the Q statististic is used in the context of fitting structural time series models. The BoxLjung Qstatistic is a modification of the earlier Box and Pierce (1970) statistic, corrected to have better size behaviour in moderate samples. It is reported for a variety of values of p along with its Prob. value computed using a χ_{q}^{2} distribution.
13.10.2 Normality test
Let μ, σ^{2} denote the mean and variance of y, and write μ_{i}=E[ yμ] ^{i}, so that σ^{2}=μ_{2}. The skewness and kurtosis are defined as


Sample counterparts are defined by


A normal variate will have √β_{1}=0 and β_{2}=3. STAMP reports


which are both tested against a χ_{1}^{2} distribution. The BowmanShenton test for normality is given by


which is tested against a χ_{2}^{2} distribution. As Bowman and Shenton (1975) note, the above tests are unsuitable unless used in very large samples. The statistics √b_{1} and b_{2} are not independently distributed in small samples, and the sample kurtosis especially approaches normality very slowly. Another normality test reported by STAMP is described in Doornik and Hansen (1994). It derives from Shenton and Bowman (1977), who give b_{2} (conditional on b_{2}>1+b_{1}) a gamma distribution, and D'Agostino (1970), who approximates the distribution of √b_{1} by the Johnson S_{u} system. Let s^{*} and k^{*} denote the transformed skewness and kurtosis, respectively, where the transformation creates statistics which are much closer to standard normal. The alternative normality test is


which is also tested against a χ_{2}^{2} distribution.Table Table:13.1 compares (eq:13.24) with its asymptotic form (eq:13.23). It gives the rejection frequencies under the null of normality, using χ_{2}^{2} critical values. The experiments are based on 10,000 replications and common random numbers.
Table:13.1 Empirical size of normality tests
nominal probabilities of N_{DH}  nominal probabilities of N_{BS}  
T  20%  10%  5%  1%  20%  10%  5%  1% 
50  0.1734  0.0869  0.0450  0.0113  0.0939  0.0547  0.0346  0.0175 
100  0.1771  0.0922  0.0484  0.0111  0.1258  0.0637  0.0391  0.0183 
150  0.1845  0.0937  0.0495  0.0131  0.1456  0.0703  0.0449  0.0188 
250  0.1889  0.0948  0.0498  0.0133  0.1583  0.0788  0.0460  0.0180 
STAMP reports the following statistics under this option:
mean  y  maximum  
standard deviation  σ_{y}= (m_{2})^{½}  skewness χ_{1} ^{2}  s  
skewness  √b_{1}  kurtosis χ_{1} ^{2}  k  
excess kurtosis  b_{2}3  normality χ_{2} ^{2}  N_{BS}  
minimum  normality χ_{2} ^{2}  N_{DH} 
Chapter 14 Statistical Treatment of Models
To understand fully the output of STAMP it is useful to understand the way in which the models are handled statistically. The algorithms used to carry out the computations are based on the state space form. The interested reader may find explanations of the underlying ideas and proofs of the algorithms in Anderson and Moore (1979), Harvey (1989, Chs. 3 and 4), de Jong (1991) and Koopman (1993). This chapter simply sets out the algorithms and gives the exact technical details of implementation. We first formally define the models which were used in Chapters ????.
14.1 Model definitions
14.1.1 Univariate time series models
A univariate model may be written as


where μ_{t} is the trend, γ_{t} is the seasonal, ψ_{t} is the cycle, ν_{t} is a firstorder autoregressive component and ε_{t} is the irregular. The model can be extended with two similar cycle components; see below.
The stochastic trend component is specified as

where β_{t} is the slope or gradient of the trend μ_{t}. The irregular ε_{t}, the level disturbance η_{t} and the slope disturbance ζ_{t} are mutually uncorrelated. The slope component β_{t} can be excluded from the trend specification when this is appropriate. Some special trend specifications are listed in Table Table:14.1.
Table:14.1 Some special level and trend specifications
Level  σ_{ε}  σ_{η}  
constant term  *  0  
local level (LL)  *  *  
random walk (RW)  0  *  
Trend  σ_{ε}  σ_{η}  σ_{ζ} 
deterministic  *  0  0 
LL with fixed slope  *  *  0 
RW with fixed drift  0  *  0 
local linear (LLT)  *  *  * 
smooth trend  *  0  * 
second differencing  0  0  * 
HodrickPrescott  *  0  .025 σ_{ε} 
The seasonal component may have the dummy variable form or the trigonometric form. The number of seasonal frequencies in a period (e.g. a year) is given by integer s. When s is even, [s/2]=s/2, and when s is odd, [s/2]=(s1)/2. The seasonal dummy is given by
γ_{t}=γ_{t1}...γ_{ts+1}+ω_{t}, ω_{t}~NID(0,σ_{ω} ^{2}). 
The trigonometric seasonal form is
γ_{t}=∑_{j=1}^{[s/2]}γ_{j,t} 
where each γ_{j,t} is generated by
[ 
 ] =[ 
 ] [ 
 ] +[ 
 ] , 

where λ_{j}=2πj/s is the frequency, in radians, and the seasonal disturbances ω_{t} and ω_{t}^{*} are two mutually uncorrelated NID disturbances with zero mean and common variance σ_{ω} ^{2}. For s even, the component at j=s/2 collapses to
γ_{j,t}=γ_{j,t1} cos λ_{j}+ω_{j,t}. 
The statistical specification of a cycle, ψ_{t}, is given by
[ 
 ] =ρ_{ψ} [ 
 ] [ 
 ] +[ 
 ] , t=1,...,T, 
where ρ_{ψ} , in the range 0<ρ_{ψ} ≤1, is a damping factor; λ_{c} is the frequency, in radians, in the range 0 ≤λ_{c} ≤π; κ_{t} and κ_{t}^{*} are two mutually uncorrelated NID disturbances with zero mean and common variance σ_{κ} ^{2}. Note that the period of the cycle is equal to 2π/λ_{c}. There may be two additional cycles of this form incorporated in the model.
A firstorder autoregressive, AR(1), process is given by
ν_{t}=ρ_{ν} ν_{t1}+ξ_{t}, ξ_{t}~NID(0,σ_{ξ} ^{2}), 
with ρ_{ν} in the range 0<ρ_{ν} <1. The AR(1) component is actually a limiting case of the stochastic cycle when λ_{c} is equal to 0 or π, though it is specified separately in STAMP, partly to avoid confounding and partly because it is not a limiting case in multivariate models.
Finally, the disturbances driving each of the components in the model are mutually uncorrelated.
14.1.2 Explanatory variables and interventions
A single equation model may include exogenous explanatory variables, lagged values of the dependent variable and intervention variables, as well as unobserved components such as trend, seasonal and cycle. Thus (eq:14.1) can be extended as


where x_{it} is an exogenous variable, w_{jt} is an intervention (dummy) variable and φ_{τ} ,Δ_{iτ} and λ_{j} are unknown parameters.
14.1.3 Multivariate time series models
Multivariate models have a similar form to univariate models, except that y_{t} is now an N×1 vector of observations, which depends on unobserved components which are also vectors. Thus in the special case of a multivariate local level model


where Σ_{ε} and Σ_{η} are both N×N variance matrices, and η_{t} and ε_{t} are mutually uncorrelated in all time periods. The other disturbances in the general model (eq:14.1) similarly become vectors which have N×N variance matrices. Models of this kind are called seemingly unrelated time series equations (SUTSE).
In a homogeneous model the variance matrices are proportional. In the case of a multivariate local level model this implies Σ _{η} =qΣ _{ε} , where q is a nonnegative scalar.
The other components are incorporated in a multivariate model in a similar manner by letting the components and disturbances become N×1 vectors and letting the variances become N×N variance matrices. In the case of the cycle, the parameters ρ and λ are the same for all series. As regards the disturbances,


The specification of the cycle model can also be written as


where
Var[ 
 ] =I_{2}⊗Σ_{κ} , ψ_{t} and ψ_{t1}^{*}are N ×1 vectors. 
A stationary firstorder vector autoregressive may be included in a multivariate model as an alternative to, or even as well as, a cycle. Thus


where Φ is a N×N matrix of coefficients. The condition for ν_{t} to be stationary is that the roots of the matrix polynomial IΦL should all lie outside the unit circle.
14.1.4 Common factors
In a common factor model, some or all of the components are driven by disturbance vectors with less than N elements. In terms of a SUTSE model, the presence of common factors means that the variance matrices of the relevant disturbances are less than full rank.
Common factors may appear in all components, including the irregular. Common trends arise through common levels, common slopes or a combination of the two.
14.1.4.1 Common levels
Consider the local level model, (eq:14.3), but suppose that the rank of Σ_{η} is K<N. The model then contains K common levels or common trends and may be written as


where η_{t}^{†} is a K×1 vector, Θ is a (N×K) matrix of standardised factor loadings, D_{η} is a diagonal matrix and μ_{θ} is a (N×1) vector in which the first NK elements are zeros and the last K elements are contained in a vector {\bf \mu } . The standardised factor loading matrix contains ones in the `diagonal' positions, i.e. θ_{ii}=1,i=1,...,K, while θ_{ij}=0 for j>i.
The model may be recast in the original SUTSE form (eq:14.3) by writing μ_{t}=Θμ_{t}^{†}+μ_{θ} and noting that Σ_{η} =ΘD_{η} Θ' is a singular matrix of rank K. When there are no common trends so N=K, the factor loading matrix is the Cholesky decomposition of Σ_{η} . As regards {\bf \mu }, partition Θ=(Θ_{1}',Θ_{2}')' where Θ_{1} consists of the first K rows, and partition μ_{t} similarly. Then


The vector {\bf \mu } is estimated from the above set of equations using the estimated states at t=T. The computations are carried out by backsubstitution since the load matrix is lower triangular.
14.1.4.2 Smooth trends with common slopes
Common slopes may be formulated along similar lines. We first consider the case where the variance matrix of the slope disturbances is of rank K_{β} but the variance matrix of levels is null so that the estimated trends are relatively smooth. The model is therefore

but, following (eq:14.7) above , it may be reformulated as

where Σ_{ζ} =ΘDΘ' and the first K_{β} elements in μ_{θt} are zeros and the remainder are contained in a vector {\bf \mu }+{\bf \beta }% t. The (NK)×1 vectors, {\bf \mu } and % {\bf \beta }, may both be calculated by expressions like (eq:14.8).
14.1.4.3 Common trends: level and slopes
A general multivariate local linear trend model in which the level variance matrix is of rank K_{η} while the slope variance matrix is of rank K_{β} may be written

where the N×K_{β} matrix Θ_{β} is such that Σ_{ζ} =Θ_{β} D_{ζ} Θ_{β} ' ,and β_{θ} =(0',{\bf % \beta }')' with {\bf \beta } a vector of length NK_{β} .
If K_{β} =1, setting Θ_{β} equal to a vector of ones and letting {\bf \beta }=0 would imply that all series had the same underlying growth rate (when modelling in logs). This might be plausible even if there are no common levels. The implication is that the trends in the forecast function remain parallel, in other words the longrun growth paths are the same. However, unless there are similar restrictions on the levels, the growth paths within the sample will not necessarily have kept together.
The following interpretation can be made if K_{β} ≤K_{μ} and the common stochastic slopes only affect the observations via the common stochastic trends:

where Θ_{1μ} is the first K_{μ} rows of Θ_{μ} ,Θ_{μβ} is the first K_{μ} rows of Θ_{β} ,μ_{θt}=(0',β_{1}'t,{\bf \mu }'+β_{2}'t)' with {\bf \beta }_{1} is a vector of length K_{μ} K_{β} , {\bf \beta }_{2} is a vector of length NK_{μ} and % {\bf \beta }=({\bf \beta }_{1}', {\bf \beta }_{2}')' . The fixed components satisfy, for any 1 ≤t ≤T,
\mu =Θ_{2μ}Θ_{1μ}^{1}μ_{1t}+μ_{2t}{\bf \beta }_{2}t and \beta =Θ_{2β}Θ_{1β}^{1}β_{1t}+β_{2t} 
where Θ_{1β} is the first K_{β} rows of Θ_{β} and Θ_{2β} is the last NK_{μ} rows.
At present STAMP does not offer the option of making the restrictions implied by the above formulation. However, if K_{μ} =K_{β} =1, it must be the case that the above restriction implies Θ_{2β}=Θ_{2μ}. More generally for K_{μ} =K_{β} we have
μ_{t}^{†}=μ_{t1}^{†}+Θ_{1μ}^{1}Θ_{1β}β_{t1}^{†}+η_{t}^{†}, η_{t}^{†}~NID(0,D_{η} ) 
and setting Θ_{1β}=Θ_{1μ} implies the further restriction that the slopes in the common trends are mutually independent. Thus the common trends are fully independent in both level and slope disturbances. However, for this to be true for all rotations the further condition that D_{ζ} =q_{ζ} D_{η} is needed. This means that Σ_{ζ} =q_{ζ} Σ_{η} so the model is what we call trend homogeneous .
To summarise trend homogeneity means that K_{μ} =K_{β} and the slope and level disturbances enter through the same common trends, and when there is more than one common trend these trends are mutually independent. When the trend homogeneous constraint is applied, any restrictions made to Θ_{1μ} will automatically be carried over to Θ_{1β}.
14.1.4.4 Common seasonals
Common factors in seasonality implies a reduction in the number of disturbances driving changes in the seasonal patterns. It does not imply any similarity in seasonal patterns unless the deterministic seasonal components outside the common seasonals are set to zero. Thus suppose, for simplicity, that the series contain only seasonals and irregular. Then
y_{t}=Θγ_{t}^{†}+γ_{θt}+ε_{t}, 
where the last NK elements of γ_{θt} contain fixed seasonal effects.
14.1.4.5 Common cycles
A model with common trends and common cycles could be written
y_{t}=Θ_{μ} μ_{t}+μ_{θ} +Θ_{ψ} ψ_{t}+ε_{t}, 
where Θ_{ψ} is N×K_{ψ} . Since the expectation of a cycle is zero, it is not necessary to include a vector of constant terms corresponding to the vector μ_{θ} which is needed for common levels.
As with other components, the common cycle can be reformulated as a SUTSE model in which the disturbance variance matrices are Σ_{κ} = Θ_{ψ} D_{κ} Θ_{ψ} '. This is only possible because ρ_{ψ} and λ are the same in all the common cycles.
14.1.5 Explanatory variables
Explanatory variables and interventions may be included in multivariate models. Thus (eq:14.2) generalises to
y_{t}=μ_{t}+γ_{t}+ψ_{t}+r_{t}+∑_{τ=1}^{r}Φ_{τ} y_{tτ}+∑_{τ=0}^{s}δ_{τ} x_{tτ}+Λw_{t}+ε_{t}, t=1,...,T, 
where x_{t} is a K×1 vector of explanatory variables and w_{t} is a K^{*}×1 vector of interventions. Elements in the parameters matrices, Φ, δ_{τ} and Λ may be specified to be zero, thereby excluding certain variables from particular equations. In addition, the unobserved components may be subject to common factor restrictions as described in the previous subsection.
14.2 State space form
The statistical treatment of structural time series models is based on the state space form (SSF). We use a particular SSF which is close to the `state space model' of de Jong (1991). It joins a measurement equation


with a transition equation, which allows the states α_{t} to evolve according to a firstorder vector autoregressive process. We write this as


The transition equation is initialised at the first time point by setting


The model is completed by making assumptions about the error process u_{t} and the vector of regressors b. We assume they are independently and normally distributed with


This formulation of the SSF is slightly different from that exploited in books such as Harvey (1989) and West and Harrison (1989). It is used to produce rather more elegant treatments of diffuse initial conditions and fixed effects; see de Jong (1991).
The state system matrices Z_{t} and T_{t} are fixed matrices which merely contain known values. The regression system matrices X_{t} and W_{t} are always known. The error system matrices G_{t} and H_{t} are also sparse but most nonzero values are unknown and are regarded as "hyper" parameters. The disturbance u_{t} is transformed into noise for the SSF by the error system matrices G_{t} and H_{t}. All system matrices can be regarded as selection type matrices.
At first sight, the role for b in the SSF looks quite complicated. However, the SSF is written in this way to allow a unified treatment of a variety of features for time series models. There are three particularly important ones on which to focus. First, the measurement equation allows b to influence the observations directly through the regressors X_{t}. In general X_{t}, like G_{t} and H_{t}, may be a sparse selection type matrix. Second, the states are directly affected by b through W_{t}. This feature is particularly useful for handling interventions like structural changes (changes in trend) and growth changes (changes in slope). Third, the prior distribution of the initial state vector is (partly) defined via b.
The random vector b allows the analysis of models where diffuse distributions are placed on the regressors and on the nonstationary components of the initial state. The vector b is a linear combination of the known constant vector c and the random vector δ. The distribution of δ is regarded as diffuse, i.e. the variance matrix Λ converges to infinity. Basically, this means that a proper prior distribution for the state cannot be given when components are nonstationary. For a technical discussion on these issues, we refer to de Jong and ChuChunLin (1994). The connection between diffuse initial conditions and marginal or restricted likelihoods is discussed in TunnicliffeWilson (1989) and Shephard (1993b). The unconditional distributions for stationary components, such as cycles, are handled via the second part of (eq:14.11); that is, H_{0}u_{0}.
The generality of the SSF can be simplified for structural time series models in three different ways.
 The constant vector c of the specification for b can
be set to a zero vector such that b=Bδ. The matrix B is a square selection matrix of zeros and ones. The vector δ
can be partitioned such that the regression effects and
the initial effects are separated and that δ enter the
equations of the SSF directly. This is achieved by
δ=( δ_{x} δ_{i} ) , and B=(B_{x},B_{i}), where the (k×1) vector δ_{x} contains the regression effects, the (d×1) vector δ_{i} contains the initial effects and the dimensions of the matrices B, B_{x} and B_{i} are (k+d)×(k+d), (k+d)×k and (k+d)×d, respectively. This brings the specification for b to
b=B_{x}δ_{x}+B_{i}δ_{i} and
X_{t}b = X_{t}^{*}δ_{x} where X_{t}^{*} = X_{t}B_{x} and X_{t}B_{i} = 0, W_{t}b = W_{t}^{*}δ_{x} where W_{t}^{*} = W_{t}B_{x} and W_{t}B_{i} = 0, W_{0}b = W_{0}^{*}δ_{i} where W_{0}^{*} = W_{0}B_{i} and W_{0}B_{x} = 0.  For all models in STAMP, G_{t} and H_{t} will be
orthogonal; that is, G_{t}'H_{t}=0, so the noise
terms in the two equations, G_{t}u_{t} and H_{t}u_{t},
are independently distributed. This feature of a model simplifies the
statistical analysis of the SSF.
 All models in STAMP can be handled by a timeinvariant SSF. This implies that the t subscripts of the state and error system matrices can be dropped, i.e. Z_{t}=Z, T_{t}=T, G_{t}=G and H_{t}=H. A timeinvariant SSF also simplifies the statistical analysis of the SSF.
14.2.1 Structural time series models in SSF
Structural time series models fit nicely in the state space form. However, some care should be taken with the initial state vector specification.
14.2.1.1 Univariate models in SSF
Some examples are given to get a flavour of it.
 Seasonal model. The first example of a structural time series
model in SSF is a model with
smooth trend and dummy
stochastic seasonals with s=4 (quarterly
observations). The SSF representation is
y_{t} = ( 1 0 1 0 0 ) α_{t}+( σ_{ε} 0 0 ) u_{t} α_{t} = ( μ_{t} β_{t} γ_{1,t} γ_{2,t} γ_{3,t} ) =( 1 1 0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0 1 0 ) α_{t1}+( 0 0 0 0 σ_{ζ} 0 0 0 σ_{ω} 0 0 0 0 0 0 ) u_{t} with vector u_{t}=(ε_{t},ζ_{t},ω_{t})' and B_{x}=0. This illustrates the fact that the matrices G and H play the role of selectors and scaling factors for the model, often being very sparse. The initial state vector is given by α_{1} = δ_{i} such that B=B_{i}=I and H_{0}=0.
 Cycle model. Another example is a univariate model which
consists of a level and cycle component. The SSF representation of this model is
y_{t} = ( 1 1 0 ) α_{t}+( σ_{ε} 0 0 0 ) u_{t}, where α_{t}=( μ_{t},ψ_{t},ψ_{t}) ', α_{t} = ( 1 0 0 0 ρ cos λ_{c} ρ sin λ_{c} 0 ρ sin λ_{c} ρ cos λ_{c} ) α_{t1}+( 0 σ_{η} 0 0 0 0 σ_{κ} 0 0 0 0 σ_{κ} ) u_{t} with vector u_{t}=(ε_{t},η_{t},κ_{t},κ_{t}^{*})^{'}. The unknown values in transition matrix T are the damping factor ρ and frequency λ_{c}. The initial state is given by
α_{1} = ( 1 0 0 ) δ_{i}+( 0 0 σ_{κ} (1ρ^{2})^{ 1/2 } 0 0 σ_{κ} (1ρ^{2})^{ 1/2 } ) u_{0} with vector u_{0}=(κ_{0},κ_{0}^{*})'. Notice that δ_{i} is a scalar here.
 Explanatory variables. A univariate structural time series model may include
explanatory variables. Consider a local linear trend model with two explanatory variables and a level intervention at some
time point t=j. The SSF representation of this model is given by
y_{t} = ( 1 0 ) α_{t}+( x_{1,t} x_{2,t} 0 ) δ_{x}+( σ_{ε} 0 0 ) u_{t} α_{t} = ( μ_{t} β_{t} ) =( 1 1 0 1 ) α_{t1}+( 0 0 w_{t} 0 0 0 ) δ_{x}+( 0 σ_{η} 0 0 0 σ_{ζ} ) u_{t} with vector u_{t}=(ε_{t},η_{t},ζ_{t})'. The initial state vector is given by α_{1}=δ_{i} such that B=I and H_{0}=0. The series x_{1,t} and x_{2,t} are explanatory variables and w_{t} is a series of zeros but at some time point t=j its value is unity.
14.2.1.2 Multivariate model in SSF
Any multivariate model in STAMP with common factors can be written in the SUTSE form with lower rank variance matrices; see §14.1.4. The SUTSE form retains the sparse structure of the system matrices Z and T which is important for computational issues related to the estimation of these models. Therefore, this general formulation is adopted to specify a multivariate model in the SSF. Since the SUTSE form is a minor generalisation to the univariate model, only one example will be given. The multivariate model is a smooth trend plus cycle model which can be represented by the SSF as

where state vector α_{t}=( μ_{t}',β_{t}',ψ_{t}',ψ_{t}^{*'}) , disturbance vector u_{t}=(ε_{t},ζ_{t},κ_{t},κ_{t}^{*})' and each variance matrix is decomposed as Σ=ΓΓ' with the appropriate subscript. The unknown values in transition matrix T rely on the damping factor ρ and the frequency λ_{c}. The initial state is given by

with vector u_{0}=(κ_{0},κ_{0}^{*})' and δ_{i} contains the initial effects for μ_{t} and β_{t}.
14.2.1.3 Deterministic trend and seasonal components
The deterministic seasonal component is the special case of the stochastic seasonal model with σ_{ω} =0. Alternatively, the fixed seasonal can be incorporated within X_{t} as it is usually done in regression models. The latter one is adopted in STAMP. In a similar way (partly) deterministic trends can be incorporated in X_{t} or W_{t}. However, for practical reasons such as the implementation of trend interventions within W_{t}, the trend components μ_{t} and β_{t} are always part of the state vector. The inclusion of a fixed trend in the state vector has some consequences for calculation and interpretation of residuals and the prediction error variance.
14.3 Kalman filter
The Kalman filter (KF) plays the same role for time series models in SSF as least squares computations for a regression model. The KF is primarily a set of vector and matrix recursions. The importance of the KF is based on
 computation of onestep ahead predictions of observation and state
vectors, and the corresponding mean square errors;
 diagnostic checking by means of onestep
ahead prediction errors;
 computation of the likelihood function via the onestep ahead
prediction error decomposition;
 smoothing which uses the output of the KF.
The KF has a variety of forms, but the one exploited in STAMP computes

which are referred to as the mean and mean square error (MSE) of the state, respectively, given the past information and setting δ to zero. Generally, we need to perform estimation with δ not zero, so these terms will be corrected by the use of an augmented Kalman filter. This will be discussed in the next subsection.
The recursive equations of the Kalman filter are given by


starting with a_{10}=W_{0}b, P_{10}=H_{0}H_{0}' and q_{0}=0. Here K_{t} is called the Kalman gain, while v_{t} and σ^{2}F_{t} are the onestep ahead prediction error (or innovation) and its mean square error, respectively. The scaled innovations F_{t}^{ 1/2 }v_{t} (or generalised least squares residuals) are approximately NID with zero mean and scale identity matrix as its variance matrix in a correctly specified model. Note that the estimate of σ^{2} is given by σ̂^{2}=q_{T}/NT. The proof of the KF is simple and it relies on standard results of linear estimation. Although the KF is presented in terms of timeinvariant system and error matrices (Z, T, G and H), the time indices can be added to these matrices without any problem.
In the special case of a univariate model, N=1, the innovation vector and the corresponding mean square error matrix reduce to the scalars v_{t} and σ^{2}f_{t}, respectively. Also, the Kalman gain matrix reduces to the vector k_{t}. This notation is only used for univariate models.
The KF requires a positive (for univariate models) or a nonsingular positive definite matrix (for multivariate models) F_{t} which may not occur at the start of the KF when, for example, the column rank of G_{1} is less N and P_{10}=H_{0}H_{0}'=0. An example is when no irregular is present in a structural time series model. In these circumstances, the KF is warmed up by a number of prediction updates; that is, a_{j+10}=Ta_{j0} and P_{j+10}=TP_{j0}T'+HH'. The initialisation process continues until F_{j}=ZP_{j}Z'+GG' is positive definite. The KF is then reinitialised by a_{10}=a_{i0}, P_{10}=P_{i0} and q_{0}=0 where i is the number of required prediction updates.
When F_{t} becomes nonpositive definite during the KF updating, it is generally assumed that a numerical problem has occurred. Available remedies like a squareroot version of the KF are not considered. However, the implementation of the KF in STAMP ensures for each timepoint t that the MSE matrices P_{t+1t} and F_{t} are symmetric, which is crucial in avoiding numerical problems. Also, avoiding unnecessary computations, such as adding zeros and multiplying zeros or unity values, helps in this respect.
In case of a timeinvariant SSF, the KF may reach a steady state solution at some time s; that is, when P_{s+1s}=P_{ss1} for p<s≤T where p is the dimension of the state vector. This also implies that F_{j} and K_{j} are timeinvariant for j=s,...,T. The KF reduces in these circumstances to the equations for a_{t+1t}, v_{t} and q_{t} which are computationally very cheap. Finally, the steady state can be interpreted as the solution of the Riccati equation; see Anderson and Moore (1979).
14.3.1 The augmented Kalman filter
The KF evaluates the onestep ahead prediction of the state vector conditional on δ=0. When nonstationary components and/or fixed regression effects are present in the model, the unknown random vector δ is treated as diffuse; see [14.2]. In these circumstances an augmented Kalman filter (AKF) is applied:


for t=1,...,T and with A_{10}=W_{0}B. The AKF must be regarded as a supplement to the KF equations; the expressions for v_{t}, F_{t}, q_{t}, K_{t}, a_{t+1t} and P_{t+1t} remain as they are. The number of columns for V_{t} and A_{t+1t} is k+d, the same number of columns as in matrix B of the SSF. The onestep ahead prediction of the state vector and the corresponding MSE matrix are given by


respectively. The onestep ahead prediction error and the corresponding MSE matrix (or variance matrix) are given by


respectively. The matrix inversions for S_{t}, t=1,...,T, can be evaluated recursively using similar methods as recursive regressions; see Appendix 14.7 of this chapter. The estimate of the scalar variance σ^{2} is


The full sample generalised least squares estimate of δ and its MSE matrix are given by


respectively. The MSE matrix is used to obtain standard errors and tstatistics for δ̂. In a similar way as vector δ, δ̂ is partitioned as
δ̂=( 
 ) . 
For a proof and a full exposition of the augmented Kalman filter; see de Jong (1991). When the SSF does not contain values for X_{t} and W_{t}, the KF can evaluate â_{t+1t} from t=d+1 onwards. This implies that the AKF is only used for the first d updates. Then, the AKF is collapsed to the KF, mainly using equation (eq:14.15) for t=d. On the other hand, when X_{t} and W_{t} are present in the SSF, a partial collapse with respect to δ_{i} is possible. The exact details of the full and the partial collapse are given in Appendix 14.7 of this chapter. Formal proofs on collapsing are given by de Jong and ChuChunLin (1994).
14.3.2 The likelihood function
The likelihood function can be obtained from the KF by using the innovations and their mean square errors. This is known as the prediction error decomposition. For example, the likelihood function, conditional on δ=0, is defined as


apart from some constant. To limit the number of logarithmic operations in computing the likelihood, it is important to check whether a steadystate has been reached such that log F_{t} is constant for all t.
The unknown parameters in the system matrices of the SSF are stacked in vector θ. For a given vector θ=θ^{*}, the KF calculates the likelihood and, therefore, it provides means to obtain maximum likelihood estimates of the parameters. These matters are discussed in §14.6.
14.3.2.1 Univariate models: the concentrated likelihood
The diffuse likelihood function (DL) is defined as the likelihood when δ is supposed to be diffuse; see §14.2. For a given vector θ=θ^{*}, the DL is given by


with T^{*}=Tkd. When a (partial) collapse of the AKF has taken place at some time t=m, the DL is calculated by


Appendix 14.7 of this chapter defines S_{*} and it gives more details of likelihood calculation when a full or a partial collapse has taken place. The scalar variance σ^{2} can be concentrated out of the likelihood DL. This concentrated diffuse likelihood function (CDL) is given by


To calculate the CDL, one variance corresponding to a particular unobserved component is set to a unity value such that this variance is set equal to σ^{2} of (eq:14.12). The maximum likelihood estimate of this `concentrated out' variance is σ̂^{2}. The other variances of the components in the model are now measured as a ratio to the concentrated variance. For example, in a local level model, where σ_{ε} ^{2} is concentrated out, we have σ_{η} ^{2}=σ^{2}q_{η} . These variance ratios are referred to as qratios. Parameter estimation is carried out with respect to the qratios; see §14.6. The CDL is used for parameter estimation of univariate models in STAMP. For a full discussion on diffuse likelihoods, we refer to de Jong (1988).
14.3.2.2 Homogeneous models
The homogeneous model is introduced in §14.1 and its restriction for a multivariate local level model reduces to


where the qratio q_{η} plays the same role as set out in the previous subsection for univariate models. The homogeneity restrictions imposed on multivariate models allow the decoupling of the KF; see Fernandez (1990). The N different time series in y_{t} are subjected to the same KF which corresponds to a univariate SSF where the matrices G and H contain the qratio values of the homogeneous model. So, the covariance equations of the (univariate) KF, i.e. f_{t}, k_{t} and P_{t+1t}, are the same for all N time series. The set of N innovation series, stacked in v_{t}, are used to construct the diffuse likelihood. The full variance matrix of a particular component, say the irregular ε_{t}; that is, Σ=Σ_{ε} , can be concentrated out of the likelihood function leading to


where
{\bf \Sigma }= 
 ∑_{t=k+d+1}^{T} 
 . 
The univariate AKF is used for homogeneous models and, if possible, a (partial) collapse is made at some time t=m. Therefore, the same technical details apply as mentioned in the previous subsection which are set out in Appendix 14.7 of this chapter.
14.3.2.3 Multivariate models
The diffuse likelihood is used for parameter estimation of multivariate models. Thus, no parameter is concentrated out from the likelihood. The AKF is used for calculation but it can be (partially) collapsed at some time t=m. The multivariate diffuse likelihood is defined as


where S_{*} is defined in Appendix 14.7 of this chapter.
14.3.3 Prediction error variance
The prediction error variance (PEV) of a univariate structural time series model is the variance of the onestep ahead prediction errors in the steady state. As discussed at the start of this section, the steady state may require more updates of the KF than the number of available observations. The formal definition of the estimated PEV is
PEV=σ̂^{2}f, 
where f is the steady state value of f_{t}; that is, f= lim _{t→∞}f_{t}. For a multivariate model, the prediction error variance matrix can be evaluated. When a steady state is not obtained, it may be useful to report a `finite' PEV; that is,
PEV_{n}=σ̂^{2}f̂_{n}, 
where n is some large finite number. The `finite' PEV at the end of the sample; that is, at n=T, might also be of interest.
14.3.4 The final state and regression estimates
The final state vector â_{TT} contains, together with δ̂_{x}, all the information from the estimated SSF required to forecast future observations. Therefore, the final state is of special interest. The estimate of the state vector and its MSE matrix at the end of the sample are given by

respectively. The diagonal part of the MSE matrix is used to form standard errors and tstatistics of the final state â_{TT}. Under certain circumstances, some elements of this diagonal part may be equal to zero and calculation of the tstatistic is not possible then. Also, for stationary parts of the state vector, it is not useful to calculate the tstatistic since stationary components are not persistent throughout the series.
The regression estimates are obtained from
δ̂_{x}={\bf S}_{T}^{1}{\bf s}_{T} and MSE(δ̂_{x})=σ̂^{2}{\bf S}_{T}^{1}, 
where {\bf S}_{T} and {\bf s}_{T} are defined in Appendix 14.7 of this chapter. The diagonal part of the MSE matrix is used to form standard errors and tstatistics.
14.3.5 Filtered components
The filtered components are based on the onestep ahead predictions of the state vector; see equation (eq:14.15). When k=0, the AKF is applied until t=d. Then, the AKF collapses to the KF which evaluates the filtered states directly. However, when k>0, a partial collapse takes place and filtering is based on (eq:14.15) with matrix % {\bf S}_{T}^{1} calculated recursively; see Appendix 14.7 of this chapter. Note that the filtered states are only defined for t=d+k+1,...,T.
14.3.6 Residuals
The residuals in STAMP for univariate models are defined as the standardised innovations


where v̂_{t} and σ̂ ^{2}f̂_{t} are defined as in (eq:14.16). When k=0, the residuals are obtained from the KF directly after the collapse of the AKF at t=d. In a multivariate model the residuals of the jth equation are given by
v_{j,t}=v̂_{j,t}/σ̂F̂_{jj,t}^{ 1/2 } 
for t=d+1,...,T where a_{i,t} is the ith element of vector a_{t} and A_{ij,t} is the (i,j) element of matrix A_{t}.
When a SSF contains fixed effects; that is, when k>0, two sets of residuals are distinguished inside STAMP. These residuals are defined in the next two subsections. Note that the two sets are the same when k=0.
14.3.6.1 Generalised least squares residuals
The generalised least squares (GLS) residuals are standardised innovations computed using (eq:14.26), t=d+1,...,T, but where v̂_{t} and f̂_{t} are obtained from the AKF (eq:14.15) applied to a SSF with b_{x} being replaced by B_{x}δ̂_{x}. The calculation of these residuals requires two steps. Firstly, the estimates δ̂_{x} and σ̂ are obtained from the AKF with d+k columns for A_{t+1t}. Then, a second AKF is applied, with d columns for A_{t+1t}, from which v̂_{t} and f̂_{t} are obtained for t=d+1,...,T. Of course, the first AKF may be partially collapsed after t=d and the second AKF may be fully collapsed to a KF after t=d. The GLS residuals are not independently distributed with constant variance in small samples. Tests for normality, heteroskedasticity and serial correlation are applied to the GLS residuals.
14.3.6.2 Generalised recursive residuals
The generalised recursive residuals are the standardised residuals (eq:14.26), but now obtained from an AKF, with d+k columns for A_{t+1t}, accompanied with a set of recursive regressions for the inversion calculations of S_{t}; see Appendix 14.7 of this chapter. A partial collapse of this AKF may take place at t=d. These residuals are denoted by w_{t}, for t=d+k+1,...,T, and they are commonly used in predictive tests and diagnostics.
14.4 Disturbance smoother
Smoothing refers to the estimation of the state vector α_{t} and the disturbance vector u_{t} using information in the whole sample rather than just past data. Smoothing is an important feature because it is the basis for
 signal extraction, detrending and seasonal adjustment;
 diagnostic checking for detecting and
distinguishing between outliers and structural changes using
auxiliary residuals;
 EM algorithm for initial estimation of parameters;
 calculation of the score, defined as the first derivative of the likelihood with respect to the vector of parameters.
We consider the disturbance smoother (DS) of Koopman (1993) which directly estimates the disturbance vector of the SSF, u_{t}, and the MSE matrix; that is,
E(u_{t}Y_{T},δ=0) = ũ_{t} and MSE(ũ_{t}) = σ^{2} C_{t}, 
respectively, with t=1,...,T. Note that in the context of structural time series models, the interest in u_{t} is limited to elements of Gu_{t} and Hu_{t} and their corresponding MSE matrices.
The DS is based on the same backwards recursions as found by Bryson and Ho (1969), de Jong (1989) and Kohn and Ansley (1989) which are given by

where L_{t}= TK_{t}Z and t=T,...,1. These backwards recursions are initialised with r_{T}=0 and N_{T}=0. The storage requirements for the KF are limited to the quantities v_{t}, F_{t} and K_{t} for t=1,...,T. The smooth estimates for the disturbances of the measurement equation and the MSE matrices are


respectively. The smooth estimates for the disturbances of the transition equation and the mean square errors are


respectively. The proof of the disturbance smoother is given by Koopman (1993). The DS is the most computationally efficient smoother available yet.
14.4.1 The augmented disturbance smoother
In a similar way as the KF, the DS must be augmented to allow for initial effects with diffuse distributions and fixed components. The supplement recursions for the DS are
E_{t}=F_{t}^{1}V_{t}K_{t}'R_{t} and R_{t1}=Z'E_{t}+T'R_{t}, 
and they will be referred to as the augmented disturbance smoother (ADS). The number of columns for E_{t} and R_{t} is equal to d+k. When the AKF is not collapsed at all, the full sample estimates of the disturbance vectors E(Gu_{t}Y_{T}), and the corresponding MSE matrices MSE(Gu_{t}Y_{T}), are given by (eq:14.27) with e_{t} and D_{t} replaced by


respectively, for t=1,...,T. In a similar way, E(Hu_{t}Y_{T}) and MSE(Hu_{t}Y_{T}) are given by (eq:14.28) with r_{t} and N_{t} replaced by


respectively. for the transition disturbances with Note that δ̃=S_{T}^{1}s_{T} and S=S_{T}. For a proof of the ADS, the reader is referred to Koopman (1993). When the AKF is fully collapsed to the KF at t=m, the DS can be used for t=T,...,m+1. The ADS should be applied for t=m,...,1 but when a collapse has taken place, the matrices S_{T} and R_{m} are not available. Therefore, smoothing is not straightforward when a collapse has occurred during the AKF. However, a suggestion of ChuChunLin and de Jong (1993) can be used to recover these missing matrices.
14.4.2 The EM algorithm and exact score
The EM algorithm is a recursive method to obtain maximum likelihood estimates for unknown elements in the state and error system matrices of the SSF; see Shumway and Stoffer (1982) and Watson and Engle (1983). This method requires a huge amount of computations and is not easy to implement. Koopman (1993) proposes a simple and computationally efficient EM algorithm for unknown values inside the variance matrices Ω_{G}=GG' and Ω_{H}=HH' where the error system matrices are supposed to be timeinvariant. Let θ be the stack of these unknown values. Given a set of values for θ, say θ^{*}, the EM step calculates new variance matrices via

where
Λ_{e}=∑_{t=1}^{T}ê_{t}ê_{t}^{'}D̂_{t} and Λ_{r}=∑_{t=1}^{T}r̂_{t}r̂_{t}'N̂_{t}. 
The quantities ê_{t} and D̂_{t} are defined in (eq:14.29) and the quantities r̂_{t} and N̂_{t} are defined in (eq:14.30). The loglikelihood value of the SSF with these new variance matrices is always larger compared to the loglikelihood value of the SSF with the old variance matrices. This EM algorithm only requires one pass through the KF and one pass back through the disturbance smoother. Note that the EM maximises the diffuse loglikelihood (eq:14.21). For a proof and a discussion of this EM algorithm, the reader is referred to Koopman (1993).
The score vector is defined as the first derivative of the loglikelihood with respect to the vector of parameters θ. The vector of first derivatives is usually evaluated numerically. Koopman and Shephard (1992) present a method to calculate the exact score for any parameter in the state and error system matrices. However, the computational gain is only achieved for exact score calculations with respect to parameters in the error system matrices G_{t}=G and H_{t}=H. Again, let θ be the stack of unknown values in the variance matrices of the SSF and let vector θ^{*} contain specific values for θ, then the exact score is given by


and the definitions of the EM algorithm apply here as well.
14.4.3 Smoothed components
The smoothed state vector is defined as the estimated state vector using the full dataset; that is,


The smoothed state vector forms the basis for detrending and seasonal adjustment procedures for the SSF. Graphical inspection of a subset or a linear combination of the smoothed state vector is an important feature in STAMP. The actual values for (eq:14.32) are obtained by


with α̃_{1}=â_{10}+P̂_{10}(r_{0}+R_{0}δ̃). Hence, the smoothed state vector is calculated using the forward recursion (eq:14.33) after the DS is applied and the vector Hũ_{t}=HH'r_{t} is stored for t=1,...,T. The MSE matrix of the smoothed state vector is not considered; see de Jong (1989) and Kohn and Ansley (1989). It is shown by Koopman (1993) that this method of state smoothing is computationally very fast. Also note that no extra storage space is required since the storage space of the Kalman gain matrix K_{t} can be used to store the vector Hũ_{t} for t=1,...,T.
14.4.4 Auxiliary residuals
The standardised smoothed estimates of the disturbances are referred to as auxiliary residuals. Graphs of these residuals, in conjunction with normality tests, are a valuable tool for detecting data irregularities such as outliers, level changes and slope changes. Note that the auxiliary residuals are serially correlated even when the model is correctly specified. Therefore, the normality test should be adjusted for the implied serial correlation. The technical details and some interesting applications can be found in Harvey and Koopman (1992).
The auxiliary residuals are obtained directly from (eq:14.27) and (eq:14.28). The residuals are standardised using the diagonal elements of the appropriate MSE matrix. The variance σ^{2} must be replaced by its estimate σ̂^{2}.
14.5 Forecasting
An important feature of STAMP is its extensive forecast facility. The modelbased extrapolations can be calculated for the actual series but also for any element of the state vector.
14.5.1 Forecasts of series and components
The forecast of the state vector l multisteps ahead, for l=1,...,L, and its MSE matrix are given by
â_{T+lT}=E(α_{T+l}Y_{T}), σ^{2}P̂_{T+lT}=MSE(α_{T+l}Y_{T}), 
respectively. The forecasts and MSE matrices are generated recursively by


respectively. The initialisation is directly obtained from the AKF at time T . The forecast for y_{T+l} and the corresponding MSE matrices are given by


respectively. Graphical inspection of the forecasted values can be valuable for checking model validity.
14.5.2 Extrapolative residuals
The extrapolative residuals can be calculated when future values of y_{t} become available. They are defined as


where ŷ_{T+lT} is defined in (eq:14.35). The variance matrices of the extrapolative residuals are equal to the MSE matrices of ŷ_{T+lT}; that is, σ̂^{2}F̂_{T+lT} as in (eq:14.35). The extrapolative residuals can be standardised by using the corresponding diagonal elements of the variance matrix.
14.6 Parameter estimation
The model definitions are given in §14.1. It is shown in §14.2 how the models of STAMP can be put in SSF. The filtering and smoothing algorithms of §14.3 and §14.4, respectively, are associated with the SSF and can be applied conditional on known state and error system matrices. The unknown values inside these matrices are treated as parameters which need to be estimated. The main task of STAMP is to estimate these parameters using maximum likelihood estimation (MLE) methods. This subsection sets out the strategy for solving this problem.
14.6.1 The parameters of STAMP
The state and error system matrices of the SSF are timeinvariant and contain many zero and unity values. Therefore, they can be regarded as selection matrices. For a univariate model, the unknown values in the error system matrices are the standard deviations of the unobserved components. The timeinvariant transition matrix T of the SSF contains the unknown firstorder autoregressive parameter of stationary components such as cycle and (optional) slope. In the case of a cycle, the state system matrix T also contains cosine and sine terms which rely on the unknown frequency λ. Table 14.2 overviews the set of possible parameters for a univariate model.
The variances are always positive, the cycle damping factors must be in the range 0<ρ<1, and the frequencies can only take values in the range 0<λ<π. Since the parameters are to be estimated via a numerical method, some special transformations are used to enforce these restrictions on the parameters; see Table 14.2. The set of transformed parameters, relevant for a specific model, are stacked in the vector θ.
As observed from Table 14.2, the variance of the cycle, σ_{ψ} ^{2}, is taken as the parameter instead of the variance of its disturbance σ_{κ} ^{2}. In this case, σ_{κ} ^{2}=σ_{ψ} ^{2}(1ρ_{ψ} ^{2})^{ 1/2 } and σ_{κ} ^{2}→0 as ρ_{ψ} →1. This allows the estimation of deterministic, but stationary, cycles which is the extreme case of ρ_{ψ} =1.
Table:14.2 Parameters and transformations
Variances  Parameter  Transformation 
Irregular  σ_{ε} ^{2}  exp (2θ_{ε}) 
Level  σ_{η} ^{2}  exp (2θ_{η}) 
Slope  σ_{ζ} ^{2}  exp (2θ_{ζ}) 
Seasonal  σ_{ω} ^{2}  exp (2θ_{ω}) 
Cycle  σ_{ψ} ^{2}  exp (2θ_{ψ}) 
AR(1)  σ_{ξ} ^{2}  exp (2θ_{ξ}) 
Autoregressive  Parameter  Transformation 
Slope  ρ_{β}  θ_{β}(1 + θ_{β}^{2})^{ 1/2 } 
Cycle  ρ_{ψ}  θ_{ρ} (1 + θ_{ρ}^{2})^{ 1/2 } 
AR(1)  ρ_{ν}  θ_{ν} (1 + θ_{ν}^{2})^{ 1/2 } 
Frequency  Parameter  Transformation 
Cycle  λ  2π/ 2 + exp (θ_{λ}) 
14.6.1.1 Variance matrices in a multivariate model
The same parameters exist for a multivariate model as for a univariate model, except that the variances are replaced by variance matrices; see §14.1. A special case is the vector autoregressive component which is discussed in §14.5.2. Since all variance matrices are treated the same, we consider a N×N matrix Σ, i.e. a nonnegative symmetric matrix but not necessarily full rank. The Cholesky decomposition of Σ enforces these restrictions. The variance matrix is decomposed as Σ=ΘDΘ' where N×p matrix Θ is a lowertriangular matrix with unity values on the leading diagonal, 1≤p≤N, and p×p matrix D is a nonnegative diagonal matrix. The lowertriangular elements of Θ; that is, θ_{ij} with i=2,...,N and j=1,...,min(i1,p), may take any value, including zero values. The diagonal elements of D are transformed as
d_{i}= exp (2θ_{ii}) 
where d_{i} is the ith diagonal element of D and i=1,...,p. Indeed, matrix Θ is the same factor loadings matrix as in the model specifications for common components; see §14.1.4. Since matrix Θ has p columns, this would suggest that variance matrix Σ has rank p. In most cases this will be true. However, as any diagonal element of D can be zero, the rank of Σ can be less than p. Note that when d_{i}=0, the ith column of Θ does not affect variance matrix Σ and can be set to zero.
The variance matrices are placed into the SSF via the error system matrices. For example, the variance matrix of the irregular enters the SSF via G=(ΘD^{ 1/2 },0,...,0) where Θ and D correspond to the decomposition of the irregular variance matrix Σ_{ε} and 0 denotes a zero matrix.
14.6.1.2 Vector autoregressive components
The multivariate model may include a vector autoregression of order 1. The parameters inside the matrix Φ appear in the SSF transition matrix T. All roots of the matrix polynomial IΦL must lie outside the unit circle. Within a numerical optimisation procedure, the constraints on Φ needed to keep the process stationary are imposed by means of an algorithm given by Ansley and Kohn (1986). In the special case of a VAR(1) model, the transformations are:
 The parameters of the VAR component are put into the N×N
unrestricted matrix A and the N×N positive definite
symmetric matrix Γ.
 Cholesky transformations are applied on the matrices I+AA'=BB' and Γ=LL' .
 The VAR matrices are defined by Φ=LA'B^{'1}L^{1} and Σ_{υ} =ΓΦΓΦ'.
This method ensure a stationary VAR component. However, imposing restrictions on the VAR matrices may be difficult.
14.6.2 BFGS Estimation procedure^{*}
The optimisation method is based on the BFGS quasiNewton method and it aims to maximise the likelihood criterion l( θ) . At every call to this updating scheme, it delivers a new estimate for θ, denoted by θ_{i+1}, given a current estimate, θ_{i}, which brings the object function l( θ) closer to its maximum. The updating scheme can be written as


where s_{i} is a step size scalar and δ_{i} is the search direction vector. Appendix 14.8 of this chapter gives some basic principles on numerical optimisation and in particular the BFGS updating scheme. Some practical details on initialisation, updating, linear search and termination, are given below.
 At the start of the estimation process an initial parameter vector θ_{0} and an initial search direction δ_{0} must be
given. The procedure of setting the initial parameter vector is different
for univariate models and for multivariate models; see §14.6.3 and
§14.6.4, respectively. The initial direction vector is calculated as
δ_{0}=H^{d}(θ_{0})q(θ_{0}) (eq:14.38) where q(θ_{0}) is the score vector and H^{d}(θ_{0}) is the Hessian matrix which only contains values for the diagonal entries. These values are evaluated numerically and are based on parameter vector θ_{0}. The offdiagonal entries are zero. The initial score vector is evaluated numerically. For multivariate models, the score vector is evaluated analytically for any next update; see §14.4.2.
 The maximisation process is terminated when all three convergence
criteria hold depending on a small value ε or when the number
of BFGS updates is equal to the integer M or when irregularities occur. In
the latter case, the estimation procedure is stopped with an error code attached to it.
The three convergence criteria are
1. Likelihood: crit_{1}= l( θ_{i}) l( θ_{i+1})  / l( θ_{i})  < ε 2. Gradient: crit_{2}= 1/m ∑_{j=1}^{m}q_{j}( θ_{i})  < 10ε 3. Parameter: crit_{3}= 1/m ∑_{j=1}^{m}θ_{i+1,j}θ_{i,j} / θ_{i,j} < 100ε where θ_{i,j} denotes the jth element of the parameter vector θ_{i} and q_{j}( θ_{i}) denotes the jth element of the score vector q( θ_{i}) . When all three convergence criteria hold, the estimation procedure terminates and returns one of the following messages
Very strong crit_{1}<ε crit_{2}<ε crit_{3}<ε Strong crit_{1}<ε crit_{2}<ε crit_{3}<10^{N}ε Weak crit_{1}<ε crit_{2}<10^{N}ε crit_{3}<10^{N}ε Very weak crit_{1}<10^{N}ε crit_{2}<10^{N}ε crit_{3}<10^{N}ε where N is the number of elements in the vector y_{t} of the state space form; see §14.2. For univariate models, N is equal to unity. The default values are ε=10^{7} and M=100.
 The direction vector δ_{i} is obtained via the BFGS
scheme; see Appendix 14.8 of this chapter. To ensure stability in the
optimisation routine: the BFGS updating of (eq:14.47) does not take place
when d'g in (eq:14.47) is smaller than ε.
 The linear search strategy is as follows. The step length s_{i} is
given by
s_{i}= j min {(cθ_{i,j})~/~δ_{i,j}} where θ_{i,j} denotes the jth element of the parameter vector θ_{i} and q_{j}( θ_{i}) denotes the jth element of the score vector q( θ_{i}) . If s_{i}<0.0001, s_{i} is set to 0.0001, and if s_{i}>0.9, s_{i} is set to 1. The constant c is set to 7 for parameters related to standard deviations, but c is set to 28 and 8 for parameters related to ρ and λ, respectively. The constants determine the parameter range for the optimisation procedure; that is, θ_{i,j}∈ [c,c]. Also, when the maximum value in the direction vector δ_{i}, δ_{ max }, exceeds the value 3, then s_{i} is overwritten by min (δ_{ max }^{1},s_{i}). Subsequently, if l(θ_{i+1}) is not larger than l(θ_{i}), the scalar s_{i} is divided by 2. This is repeated until s_{i} is smaller than ε. When s_{i}<ε, the Hessian matrix H( θ_{i}) is set equal its diagonal matrix H^{d}( θ_{i}) . The resetting of the Hessian matrix is only allowed twice. At the third occasion, the estimation procedure is terminated with an error message. Notice that in most cases the value for s_{i} is the unity value.
 When the likelihood function, the score or the diagonal elements of the Hessian matrix cannot be evaluated for some reason, the estimation procedure terminates with an error message.
More specific details of implementation for univariate models and multivariate models are given below.
14.6.3 Estimation of univariate models^{*}
For univariate models, STAMP maximises the concentrated diffuse loglikelihood log L_{c}(yθ) which is treated as an unconstrained nonlinear function of θ, denoted by l(θ). The likelihood kernel l(θ) is appropriately scaled by a function of T, the number of observations.
14.6.3.1 Initial values
The elements of the vector of transformed parameters θ are given in Table 14.2. The elements related to standard deviations, θ^{σ} , are set equal to 0.5, except for θ_{η} (level) which initial value is 1 whereas θ_{ζ} (slope) and θ_{ω} (seasonal) are started off with 1.5 and 2, respectively.
One element of θ^{σ} needs to be concentrated out, which effectively means it must be restricted to zero (that is, a unity standard deviation). The choice of the concentrated parameter must be chosen appropriately. At the start of the optimisation routine, the element corresponding to the irregular standard deviation is concentrated out. When no irregular is present in the model we choose from level, slope, cycles, seasonal and autoregressive component, consecutively.
The initial values for ρ_{ψ} and λ, corresponding to a particular cycle, are provided by the user and STAMP transforms them automatically to θ_{0}. The initial values for θ_{β} and θ_{υ} are set to 2 (which corresponds to ρ values of approximately 0.9).
Before the quasiNewton optimisation is started, a simple method is constructed to get θ^{σ} to the neighborhood of the optimal solution in a stable but quickly way. The initialisation method is similar to the quasiNewton updating scheme (eq:14.37). The jth element of the direction vector δ_{i} is given by
δ_{i,j}=H_{j,j}(θ_{i})q_{j}(θ_{i}) 
where q_{j}( θ_{i}) denotes the jth element of the score vector q( θ_{i}) and H_{j,j}(θ_{i}) denotes the (j,j) element of the Hessian matrix H( θ_{i}) . The score and the diagonal elements of the Hessian matrix are numerically evaluated at θ_{i}. The initialisation updating steps are repeated five times but it is stopped earlier if the stopping criterion
j∑δ_{i,j}∇_{j}(θ_{i})<0.01 
holds.
During this simple estimation procedure, a choice is made of which parameter should be concentrated out of the loglikelihood. At every step i, the parameter related to the largest value in vector θ_{i}, is concentrated out when its corresponding value in δ_{i} is larger than 4. This rule reflects the desire to concentrate out the largest standard deviation of the likelihood. When the choice of concentrated parameter is changed, the other standard deviations should be adjusted, so they have the correct ratios with respect to the new concentrated parameter. This might create unstabilities in the search process. Therefore, the difference between the old and the new adjusted value is enforced to lie between 1 and 2. This strategy is a result of `fine tuning' after it has been applied to a wide range of different models and datasets.
14.6.3.2 Strategy for setting parameters fixed
During the estimation procedure, certain parameters might take values close to their boundaries. When a parameter moves closer and closer to its boundary, it may slow down the optimisation procedure as a whole. In these circumstances, it is advised to fix parameters at their boundary values but to allow estimation to be continued for the remaining parameters. The strategy for setting parameters fixed is as follows. In the case of standard deviations, when the transformed parameter value is smaller that 5 and its gradient value is smaller than 0.0001 (in absolute terms), it is set to a constant implying a zero standard deviation.
Similarly, the damping factor ρ of a cycle is set to unity, when the element corresponding to θ_{ρ} in vector θ_{i} is larger than 25. If the value related to θ_{λ} (corresponding to the period of a cycle) is larger than 7 or less than 7, the program reduces the cycle component to an autoregressive component. In both cases, it is assumed that the corresponding gradient value is smaller than 0.0001 (in absolute terms). The transformed parameters related to ρ_{β} (slope) and ρ_{υ} (autoregressive) don't apply to this strategy.
14.6.3.3 Strategy for determining the concentrated parameter
The initial estimation procedure has determined a concentrated standard deviation from θ^{σ} . However, during the main estimation process, it might be more appropriate to concentrate out another parameter. When the largest transformed parameter value (corresponding to θ^{σ} ) exceeds the value 1.5, this particular parameter will be the new concentrated parameter. Then, the other parameters corresponding to θ^{σ} obtain values as ratios to the new concentrated parameter, the gradient vector q( θ_{i}) is recalculated and the Hessian matrix is replaced by the diagonal matrix H^{d}( θ_{i}) .
14.6.4 Estimation of multivariate models^{*}
The estimation procedure for multivariate models follows mainly the same route as for univariate models but there are some differences. The likelihood kernel l( θ) is the diffuse loglikelihood log L(yθ) but properly scaled by a function of T. So there is no need to find an appropriate concentrated parameter. The EM algorithm plays a prominent role in the procedure to obtain initial variance matrices for the disturbances; see §14.4.2. The strategy of setting elements to zero in the Cholesky decompositions of the variance matrices is set out below. The parameters for the VAR component cannot be set fixed during the estimation procedure.
14.6.4.1 Initial values
The initial settings for the multivariate model are discussed in terms of the variance matrix, the Cholesky decomposition and the VAR coefficient matrix, rather than directly in terms of the parameter vector θ. The initialisation procedure automatically transforms the elements of these matrices into the vector θ_{0} .
The initial parameter settings of the variance matrices is similar to the univariate settings of the qratios, except that now a specific qratio applies to all N diagonal entries of the D matrix. The factor loading matrix Θ is set to the identity matrix. At this stage, the rank conditions of the variance matrices are ignored. The VAR matrix is set equal to a diagonal matrix with all entries equal to 0.9. The extra initial parameters for the cycle component are set manually by the user.
The Kalman filter is applied to the multivariate model and the sample variance matrix of the innovations is calculated. The variance matrices of the disturbances are set equal to this matrix but properly scaled by their appropriate qratios.
Then, the EM algorithm is applied; see §14.4.2. The number of EM steps is equal to 5 which has proved to be appropriate in most cases. The VAR matrix and the other stationary parameters (ρ and λ) remain fixed during the EM. After the EM steps the parameters are transformed as indicated in §14.6.1. For example, the Cholesky decomposition is used to get the diagonal matrix D and the factor loading matrix Θ from the variance matrix. At this stage, the rank conditions are taken into account. Finally, the parameters are stacked into θ_{0}.
14.6.4.2 Strategy for fixing parameters
The need to restrict parameters to their boundary values is important to speed up the process of estimation. This is even more relevant for multivariate models since the estimation process is likely to take much more time than univariate models because filtering and smoothing involve matrices with larger dimensions and more parameters need to be estimated.
The diagonal entries of D are treated in the same way as the standard deviations in a univariate model. The restrictions for the damping factor and the frequency parameter are also done in a similar fashion as for univariate models. No restrictions can be made with respect to the parameters related to the VAR component. The factor loading matrix elements enter the parameter vector without transformation.
When the jth diagonal element of D is restricted to zero, the rank of the variance matrix Σ is reduced by one. All elements of the corresponding jth column of the factor loading matrix can also be restricted to zero since these elements don't have any effect on the variance matrix Σ. Any specific element of the factor loading matrix Θ is set to zero when its value is smaller than 0.0001 and its corresponding gradient is also smaller than 0.0001 (both in absolute terms).
14.7 Appendix 1: Diffuse distributions^{*}
This appendix discusses technical details about the way we treat initial and regression effects for the SSF. Most components in STAMP are initialised using diffuse priors and so we must treat these priors carefully in our calculations to ensure stable and exact results. This is carried out using the idea of augmented filtering and smoothing. To ensure computational efficiency the Kalman filter (KF) needs to be collapsed at some point.
14.7.1 Collapse of the augmented KF
For a SSF with k=0; that is, X_{t}=0 and W_{t}=0, t=1,...,T , is treated by the AKF for t=1,...,m and the AKF is collapsed to the KF using (eq:14.15) at t=m. Then, the KF evaluates â_{t+1t} and P̂_{t+1t} directly, t=1,...,T . The collapse also contains the operation
q̂_{m}=q_{m}s_{m}'S_{m}^{1}s_{m}. 
Indeed, the KF evaluates the residual vector v̂_{t} and its (unscaled) variance matrix F̂_{t} directly as well. The value for m is restricted by T≥m≥d and S_{m}>0 but usually it is set to m=d.
For a SSF with k>0, a full collapse cannot be made since X_{t} and W_{t} are timevarying. However, a partial collapse can take place with respect to δ_{i}. This means that the AKF is still applied but with a reduced number of columns. Therefore, we partition the AKF matrices


When S_{ii,t} is invertible at t=m, the AKF can be partially collapsed to

and

Note that matrix B of the (partially collapsed) AKF is replaced by {\bf B}=B_{x}. The matrices {\bf A}_{t+1t}, {\bf V}_{t}=Z_{t}{\bf A}_{t+1t}X_{t}% {\bf B} and {\bf B} have k columns.
14.7.2 Likelihood calculation
At the end of a fully collapsed AKF, the estimate of σ^{2} is given by σ̂ ^{2}=q̂_{T}/( NTdk) where k=0 and q̂_{T} is obtained from the collapsed KF. When k>0, q̂_{T} is given by
q̂_{T}=q_{T}s_{T}'S_{T}^{1}s_{T}. 
Likelihood calculation also requires the log value of the term S_{*} where S_{*}=S_{T}. If k=0 and a full collapse has taken place at t=m, then S_{*}=S_{m}. If k>0 and a partial collapse has taken place at t=m, then S_{*}=S_{ii,m}% {\bf S}_{T}. Note that the determinants are obtained without any extra computational costs since the inverses of these matrices are calculated earlier.
14.7.3 Recursive regressions
The recursive evaluation of least squares estimates is common practice in standard regression analysis since it is a way to get the recursive residuals. The same technique can be applied to the recursive evaluation of S_{t} which consists of the equations


where c_{t}=S_{t}^{1}s_{t} and C_{t}=S_{t}^{1}.
14.8 Appendix 2: Numerical optimisation^{*}
Numerical optimisation methods are used to obtain the maximum likelihood estimates for the parameter vector θ. Here we outline some of the basic principles behind the estimation procedure of STAMP.
There is a vast literature on nonlinear optimisation techniques (see, among many others, Gill, Murray and Wright (1981), Cramer (1986) and Thisted (1988)). Note that many texts on optimisation focus on minimisation, rather than maximisation, but of course: max l( θ) = min l( θ) .
A first approach to obtaining the MLE θ from l( θ) is to consider solving the score equations, assuming the relevant partial derivatives exist:


Then q( θ) =0 defines the necessary conditions for a local maximum of l( θ) at θ. A sufficient condition is that


also exists and is negative definite at θ. If Q( ^{.}) is positive definite for all parameter values, the likelihood is concave, and hence has a unique maximum; if not, there could be local optima or singularities.
14.8.1 Newton type methods
When q(θ) is a set of equations that are linear in θ, q(θ)=0 can be solved explicitly for θ. More generally, q(θ) is nonlinear, yielding a problem of locating θ which is no more tractable than maximising l(θ). Thus, we consider iterative approaches in which a sequence of values of θ (denoted θ_{i} at the i^{th} iteration) is obtained approximating q(θ_{i})=0, and corresponding to nondecreasing values of the criterion function l(^{.}) : l(θ_{i+1})≥l(θ_{i}):


where M is a terminal maximum number of steps, from an initial value θ_{0}. A convergence criterion is used to terminate the iteration, such as q(θ_{i+1})~=0 or l(θ_{i+1})l(θ_{i})≤ε. These implementationspecific aspects are discussed in §14.6.
Expand q( θ) =0 in a firstorder Taylor's series:


written as the iterative rule:


where H( ^{.}) =Q^{1}( ^{.}) is the Hessian matrix. The gradient, q(θ_{i}), determines the direction of the step to be taken, and H(θ_{i}) modifies the size of the step which determines the metric: this algorithm is the NewtonRaphson technique, or Newton's method. Even when the direction is uphill, it is possible to overstep the maximum in that direction. In that case it is essential to add a line search to determine a step length s_{i}∈[ 0,1] :


where δ_{i}=H( θ_{i}) q( θ_{i}) and s_{i} is chosen to ensure that l( θ_{i+1}) ≥l( θ_{i}) .
STAMP uses the socalled variable metric or BroydenFletcherGoldfarbShanno (BFGS) update quasiNewton methods. These approximate the Hessian matrix H( θ_{i}) by a symmetric positive definite matrix K_{i} which is updated at every iteration but converges on H( ^{.}) .
Let d=s_{i}δ_{i}=θ_{i}θ_{i1} and g=q( θ_{i}) q( θ_{i1}) ; then the BFGS update is:


This satisfies the quasiNewton condition K_{i+1}g=d, and possesses the properties of hereditary symmetry (K_{i+1} is symmetric if K_{i} is), hereditary positive definiteness, and superlinear convergence.
14.8.2 Numerical score vector and diagonal Hessian matrix
When the analytic formula for q( θ) cannot be obtained easily, STAMP uses BFGS with a numerical approximation to q( ^{.}) based on finite difference approximations. The numerical derivatives are calculated using:


where ι_{j} is a unit vector (for example, ( 0 1 0 ... 0) ' for j=2), ε is a suitably chosen step length. Thus, ε represents a compromise between roundoff error (cancellation of leading digits when subtracting nearly equal numbers) and truncation error (ignoring terms of higher order than ε in the approximation).
The diagonal entries of the Hessian matrix are numerically evaluated as
H_{j,j}^{d}( θ) = 

where
Q_{j,j}( θ) = 
 = 
 ~= 

with the same steplength ε.
Chapter 15 Model Output
This chapter defines the output of STAMP. The first section explains when and where the various parts of the output appear. The remaining sections explain the output is presented for every option of the Test menu.
15.1 Output from STAMP
This section details the output which occurs when a model is fitted to data. There are three different forms of output. Firstly, the iterations of the numerical optimisation algorithm are displayed in the Message window. Secondly, once the estimation algorithm has stopped (hopefully having converged!), the Results window reports details of the model specification and the results of the optimisation algorithm, and gives a summary of diagnostic statistics. Finally, the Test menu can be used to prompt STAMP to output specific details of the parameter estimates, state variables, model fit and diagnostics.
15.1.1 Model estimation
In order to compute the maximum likelihood estimates, STAMP first computes initial estimates and then it moves on to maximise the likelihood function directly by numerical optimisation; see Chapter 14. During this second phase various statistics are reported in the Message window:
 Parameters  the current values of
the transformed parameters;
 Gradients  the derivatives of the likelihood function with
respect to the transformed parameters;
 Function value  the kernel of the loglikelihood function;
 Initial value  the value of the function after the initial
phase;
 %change  the percentage increase of the current likelihood kernel over the initial value.
15.1.2 Selected model and estimation output
The initial output, written to the Results window, gives details of the selected model and various results from the numerical optimisation procedure; an example is reproduced in §??. In particular, the selected components are listed and the number of parameters is given. The sample period used for parameter estimation is also given.
A variety of statistics emerge from the numerical optimisation. The likelihood kernel evaluated at θ̂ is reported together with the number of iterations required for convergence. Three different convergence measures are reported and these are categorised by some term like STRONG CONVERGENCE; see §14.6.
15.1.3 Summary statistics
An example of the diagnostic summary report, which is printed to the Results window after fitting, is given in §??. Broadly this summary gives the basic diagnostics and goodnessoffit statistics. The precise definitions are given in §15.4 and §15.7.
 Loglikelihood  this is the actual loglikelihood function at
its maximum;
 Prediction error variance (PEV)  the
variance, or variance matrix, of the onestep ahead prediction errors in
the steady state, or the finite PEV if
accompanied by a message of `No steady state';
 Std. Error  the equation standard error is the square root of
the PEV;
 Normality  the DoornikHansen statistic, which is the BowmanShenton statistic
with the correction of Doornik and Hansen (1994), distributed
approximately as χ_{2}^{2} under the null hypothesis;
 H(h)  a test for heteroskedasticity, distributed
approximately as F(h,h);
 r(τ)  the residual autocorrelation at lag τ,
distributed approximately as N(0,1/T);
 DW  DurbinWatson statistic,
distributed approximately as N(2,4/T);
 Q(P,d)  BoxLjung Qstatistic based on the first P residual
autocorrelations and distributed approximately as χ_{d}^{2};
 R^{2}, R_{D}^{2} or R_{S}^{2}  the most suitable coefficient of determination.
Notice that the loss in the degrees of freedom d of the BoxLjung Qstatistic takes account of the number of relative parameters in θ. It does not take account of any lagged dependent variables and their presence necessitates further adjustment.
15.1.4 The sample period
The parameters are estimated using the model sample as recorded from the Estimation dialog in the Model menu. The calculation of diagnostics and goodnessoffit measures are calculated using the state sample. The default is that the model sample and the state sample are the same. However, the state sample might be changed to check subsamples isolated from the rest of the sample. This can be done in the option Final state. Of course, both samples must be within the boundaries of the data sample.
15.1.5 Test
The nine items of the Test menu are selfexplanatory. The headings of the following sections correspond to them. Note that the output for Goodness of Fit is requested by the Final state option and that many of the conventional time series diagnostics (built out of the innovations, or onestep ahead prediction errors) will come up in the Residual option.
15.2 Parameters
The variance parameters (and variance matrix parameters for multivariate models), together with frequencies and damping factors for cycles and coefficients for autoregressive components are referred to as parameters. The maximum likelihood estimates of the parameters can be requested from the Further output dialog. In a multivariate model, the same output is produced for each equation in turn. In addition, the full variance matrices and their decomposition in factor loading matrices and diagonal scaling matrices can be produced. This option also outputs the transformed parameters with their scores and approximate standard errors.
15.2.1 Variances and standard deviations
This gives the variances of the disturbances driving the various components. Thus, in terms of §14.1.1, the level variance is σ_{η} ^{2}, the slope variance is σ_{ζ} ^{2} and so on. The standard deviations are the square roots of the variances. The qratios are the ratios of each variance or standard deviation to the largest.
For multivariate models, the variances and standard deviations are obtained from the diagonal entries of the variance matrices. In order to verify the qratios for multivariate models, the largest variance across all series is identified, and this component is set to one for all series, irrespective of whether or not it is the largest in a particular series. However, when this variance in a particular series is zero, all associated qratios are set to zero. This also applies to qratios for standard deviations.
15.2.2 Cycle and AR(1)
For a stochastic cycle component, this option outputs the damping factor ρ, the frequency λ, the period 2π/λ, and the period in terms of `years'; that is, 2π/sλ, where s is the number of `seasons'. In addition it gives the variance of the cycle, σ_{ψ} ^{2}=σ_{κ} ^{2}/(1ρ^{2}). For a multivariate model, the variance matrix of the cycle is given. This is
Σ_{ψ} =(1ρ^{2})^{1}Σ_{κ} , 
where Σ_{κ} is the variance matrix of the cycle disturbances.
For the AR(1) component in a univariate model, the autoregressive coefficient is reported. For a multivariate model the N×N matrix of coefficients is presented. The variance matrix of the disturbance vector ξ_{t} is produced.
15.2.3 Variance matrices (multivariate models)
The N×N variance matrices of the disturbances correspond to the variance parameters in a single equation model; see §14.1.3. In the STAMP output, the variance matrix is given by the lower triangular part of an N×N matrix, with the variances corresponding to the disturbance terms in each series given on the main diagonal (these are also produced by `Variances' above). The upper triangular part of the matrix gives the offdiagonal entries in the corresponding correlation matrix.
In a common factor model, some, or all, variance matrices will be of reduced rank; see §14.1.4. This will be reflected by values of unity in specific entries of the correlation matrix.
15.2.4 Factor loading matrices (multivariate models)
The N×K factor loading matrix Θ is defined in §14.1.4. When no common factor restrictions are imposed, so that N=K, the factor loading matrix is the lower triangular matrix associated with the Cholesky decomposition of variance matrix Σ; that is, Σ=ΘDΘ'. This option outputs the lower triangular factor matrix Θ and the diagonal scaling matrix D.
The vectors of constants, such as {\bf \mu } and % {\bf \beta }, are defined in §14.1.4 and only exist when common factors are specified in the model. The dimension of these vectors is NK. The constant vector does not apply to the irregular component.
15.2.5 Transformed parameters and standard errors
The vector of estimated parameters, which are transformed as in §14.6, can be written to the Results window using this option. It also reports the score vector, associated with the estimated vector of parameters, and the numerical standard errors of the transformed parameters.
The scores are calculated as described in §14.6. The standard errors are obtained from the diagonal entries of the numerically evaluated Hessian matrix Q. Formally, the standard error of the ith estimated parameter is defined as
se_{i}=1/√Q_{i} 
where Q_{i} is the ith diagonal entry of matrix Q.
15.3 Final state
The estimate of the final state vector contains all the latest information on the components in the model. Together with the regression estimates of the explanatory variables and the interventions, the final state contains all the information needed to make forecasts.
The Final state dialog also provides useful information about the estimated seasonal pattern and the amplitude of the cycle. Some goodnessoffit statistics can be generated, but these are discussed in §15.4.
15.3.1 Analysis of state
In terms of §14.3.4, the final state is the filtered estimate at time T; that is, â_{TT}. The square roots of the diagonal elements of the corresponding MSE matrix, σ̂^{2}P̂_{TT}, are the root mean square errors (RMSEs) of the corresponding state elements. The tvalue is the estimate of the state divided by the RMSE. Applied to the ith element of the state, this is
tvalue_{i}=a_{i}~/~RMSE_{i}. 
where a_{i} is the ith element of vector â_{TT}.
The Prob.values shows the probability of the absolute value of a standard normal variable exceeding the absolute value of the tvalue. In a hypothesis testing situation, a Prob.value of less than 0.05 corresponds to rejection of the hypothesis of a zero value on a twosided test at the 5%level of significance. A Prob.value of less than 0.05 is indicated by *, while a value less than 0.01 is denoted **.
The various unobserved components which make up the structural time series model all appear in the final state .
 Trend  `Lvl' is the estimate of the level of the trend μ_{T}
while `Slp' is the estimate of the slope β_{T}. The tvalue might
not appear for the level when no irregular is specified in the model or when
the irregular variance is estimated as zero.
 Dummy variable seasonal  the estimate of the seasonal, γ_{T}, is part of the final state, together with the previous s2 seasonals.
These estimated elements of the state are denoted in the output by
`Sea_1',...,`Sea_s1'. Since the summing to zero constraint is used,
the latest seasonal pattern can be obtained. It is given explicitly by the
program if `Seasonal tests' is active.
 Trigonometric seasonal  the
final state contains the estimates for γ_{1,T}, γ_{1,T}^{*},
γ_{2,T}, γ_{2,T}^{*}, etc. and are denoted in the output by
`Sea_1', `Sea_2' and so on. Note that the meaning of `Sea_1', `Sea_2,
etc., is different to what it is in dummy variable seasonality, and so
the final estimated seasonal pattern cannot be identified directly. It must
be requested in `Seasonal tests'.
 Cycle  the
terms `Cyk_1' and `Cyk_2' give the estimates of ψ_{T} and ψ_{T}^{*} for cycle k, where k is 1, 2 and/or 3. Since a cycle component
is not persistent throughout the series, a tvalue is not appropriate
here.
 Autoregressive  the estimate of υ_{T} is denoted as `Ar1'.
The same output appears for multivariate models, but it is repeated for each individual series.
15.3.2 Regression analysis
The estimates of the regression parameters, the parameters of the intervention variables and the fixed seasonal dummies are based on the matrix (s_{T},σ̂^{2}S_{T}); see §14.3.4. These estimates are reported upon request (if part of the specified model) and they may be interpreted in much the same way as in a standard regression model. Because they are deterministic (not timevarying), their RMSEs are standard deviations. The tvalues would have tdistributions if the (relative) parameters were known. Although they are asymptotically normal, a tdistribution may provide a better approximation to small sample properties. As was noted earlier, the Prob. values are based on the normal distribution.
15.3.3 Seasonal tests
If the seasonal component is deterministic, either because it is specified to be `fixed' at the outset or its disturbance variance is estimated to be zero, a joint test of significance can be carried out on the s1 seasonal effects. The test is essentially the same as a test for the joint significance of a set of explanatory variables in regression. Under the null hypothesis of no seasonality, the large sample distribution of the test statistic, denoted by `Seasonal Chi^{2}(s1)' in the output, is χ_{s1}^{2}. The Prob. value is the probability of a χ_{s1}^{2} variable exceeding the value of the test statistic. In the case of a stochastic seasonal, the joint seasonal test is also produced although a formal joint test of significance of the seasonal effect is inappropriate. However, the seasonal pattern is persistent throughout the series and when the seasonal pattern changes relatively slowly, which is usually the case, the test statistic can provide a useful guide to the relative importance of the seasonal.
The formal definition of the test statistic is
a'P^{1}a 
where a contains the estimates of the s1 seasonal effects at time T and P is the corresponding MSE matrix. In the case of a fixed seasonal, a and P are obtained from the matrix (s_{T},σ̂^{2}S_{T}). For a stochastic seasonal, a and P come from the matrix (â_{TT},σ̂^{2}P̂_{TT}); see §14.3.4.
The final estimate of the seasonal effect for each season; that is, the estimated seasonal pattern at time T, is given here as well.
15.3.4 Cycle tests
The amplitude of the cycle is
(a_{}^{2}+a_{}^{*2})^{½} 
where a is the element of â_{TT} corresponding to ψ_{T} and a^{*} corresponds to ψ_{T}^{*}. The reported ratio is the comparison of the amplitude of the cycle with the level of the trend, and is also reported. This gives an indication of its relative importance. When the cycle is deterministic, but stationary, a joint significance χ^{2} test (the same as the seasonal test) is produced as well.
15.3.5 Data in logs
When it is indicated that a model has been estimated in logarithms, some additional output for specific components appears in the Results window.
 Trend  The value exp (a) is reported where a is the
element of â_{TT} corresponding to μ_{T}. When a slope
component is specified, the estimated annual growth rate is reported as
well. The annual growth rate is the estimate of β_{T }
multiplied by 100s.
 Seasonal  The multiplicative effect of the seasonal pattern on
the level of the series is given by exp (a_{j}), where a_{j}
is the estimate of the seasonal effect in season j. Also the percentage
effect is calculated; that is, 100{ exp (a_{j})1}. This output is added to
the output generated for the option `Seasonal tests'.
 Cycle  Additional to Cycle tests, the amplitude as a percentage of the level is computed. This is simply multiplying the estimated amplitude by 100.
15.4 Goodness of fit
The goodnessoffit statistics in STAMP all relate to the residuals, which are the standardised innovations; see §14.3.6. When explanatory variables are specified in the model, the generalised least squares residuals are used; see §14.3.6.1. The diagnostic checking facility for these residuals are provided in the Residuals option; see §15.7.
15.4.1 Prediction error variance
The basic measure of goodnessoffit is the PEV as defined in §14.3.3. The PEV is the variance of the residuals in the steady state and so it corresponds to the variance of the disturbance of the reduced form ARIMA model. When the Kalman filter does not converge to the steady state, the finite PEV is used. For a univariate model, the PEV is denoted by σ̃^{2}, irrespective of whether or not it is the finite PEV. For a multivariate model, σ̃^{2} denotes the diagonal entry of the prediction error variance matrix Σ̃ corresponding to a specific series in y_{t}. The output includes a message if the Kalman filter is not in the steady state at t=T. The Final state option includes the facility to recalculate the PEV using a preset number of updates.
15.4.2 Prediction error mean deviation
Another measure of fit is the mean deviation of the residuals, v_{t},t=d+1,...,T, defined by
md= 
 ∑_{t=d+1}^{T}v_{t} 
where the PEV σ̃^{2} is defined in §14.3.3. In a correctly specified model the reported ratio

should be approximately equal to unity.
When it is indicated that the the series are modelled in logs, the relative measure of error is also given. This measure is easily computed as 100md.
15.4.3 Coefficients of determination
In econometric modelling the traditional measure of goodnessoffit is the coefficient of determination R^{2}. For a structural time series model this measure is defined by
R^{2}=1 
 , 
where y_{t} is the original series and y is its sample mean. Note that for a multivariate model, y_{t} is a specific series of the vector y_{t} and σ̃^{2} is the diagonal element of Σ̃ corresponding to that series. This coefficient of determination is most useful when the series appears to be stationary with no trend or seasonal.
When the series y_{t} shows trend movements, it is better to compare the PEV with the variance of first differences, Δy_{t}=y_{t}y_{t1}. This leads to
R_{D}^{2}=1 
 . 
where \Delta y is the sample mean of Δy_{t}. It should be noted that R_{D}^{2} may be negative, indicating a worse fit than a simple random walk plus drift model.
For seasonal data with a trend, it is more appropriate to measure the fit against a random walk plus drift and fixed seasonals. This requires the sum of squares, SSDSM, obtained by subtracting the seasonal mean from Δy_{t}. The coefficient of determination is then
R_{S}^{2}=1 
 , 
which can again be negative.
15.4.4 Information criteria : AIC and BIC
The fit of different models can be compared on the basis of the PEV. However when models have different number of parameters, it is more appropriate to compare them using the Akaike information criterion (AIC) or the Bayes information criterion (BIC). These criteria are based on the formula
log PEV+cm/T 
where c is a constant and m is the number of parameters in θ plus the number of nonstationary components in the state vector. The AIC takes c as 2 and the BIC takes c as log T. For multivariate models, m is defined as for a univariate model, since the criteria are reported for each series in y_{t} separately.
15.5 Components
The `Components' option facilitates the computation of filtered or smoothed estimates of the specified components in the model. The filtered estimates are based on the past observation and can be calculated as described in §14.3.5. The smoothed estimates are based on all the observations, see §14.4.2. In what follows, the estimates are denoted by x̃, where x is some component which can be estimated either way. When the model is multivariate, the component is associated with a specific series in y_{t}.
15.5.1 Series with components
This option plots the original series, y_{t}, with one of the following (as requested):
 the estimated trend, μ̃_{t} (which includes
interventions on the level and slope);
 the estimated trend plus cycle(s) plus AR(1), μ̃_{t}+ψ̃_{t}+υ̃_{t};
 the estimated trend plus cycle(s) plus AR(1) plus explanatory variables, μ̃_{t}+ψ̃_{t}+υ̃_{t}+x_{t}^{'}δ̃_{x}, where the last part includes all the explanatory variables in the equation.
15.5.2 Detrended
The detrended series is y_{t}μ̃_{t} for t=1,...,T. The smoothed estimates are usually most appropriate for detrending. Note that level and slope interventions are included in the trend.
15.5.3 Seasonally adjusted
The seasonally adjusted series is y_{t}γ̃_{t} for t=1,...,T. The smoothed estimate of the seasonal component is normally used for seasonal adjustment.
15.5.4 Individual seasonals
The estimated seasonal component γ̃_{t} is rearranged into s different series, each corresponding to one of the seasons. The graph can be interpreted as a set of annual plots for the individual seasonal effects.
15.5.5 Data in logs
If the observations are in logarithms, the output of the various components may be amended in two ways: (1) `Antilogs' can be calculated; or (2) `Modified antilogs' can be calculated. These amendments do not apply in a similar way for all components. The following list gives the formal amendments:
 Trend  (1) exp (μ̃_{t}) and (2) exp (μ̃_{t});
 Growth rate (Slope)  (1) 100s~β̃_{t} and (2) 100s~β̃_{t};
 Seasonal, Cycle, AR(1) and Irregular  (1) exp (ã) and (2) exp (μ̃_{t}+ã) exp (μ̃_{t}), where ã is a specific component;
 Detrended  (1) exp (y_{t}μ̃_{t}) and (2) exp (y_{t}) exp (μ̃_{t});
 Seasonally adjusted  (1) exp (y_{t}γ̃_{t}) and (2) exp (y_{t}γ̃_{t}).
15.6 Residuals
The residuals are the standardised onestepahead prediction errors or innovations and they are defined in §14.3.6. The residual series are simply denoted by v_{t} for t=d+1,...,T. For multivariate models, the series v_{t} are associated with a specific series in y_{t}. For a correctly specified model with known variance parameters, the residuals are assumed to be NID(0,1). Diagnostic statistics and graphs are the tools to validate this proposition.
15.6.1 Correlogram
The residual autocorrelation at lag τ is defined by


The approximate standard error for r_{τ} is 1/(Td)^{½}. The Durbin and Watson (1950) statistic, is given by
DW= 
 ~~=~2{1r(1)}. 
In a correctly specified model, DW is approximately N(2,4/T).
A general test for serial dependence is the portmanteau BoxLjung Qstatistic, which is based on the first s residual autocorrelations; see Ljung and Box (1978). It is given by
Q( P,d) =T(T+2)∑_{j=1}^{P} 
 . 
and distributed approximately as χ_{d}^{2} under the null, where d is equal to Pn+1 and n is the number of parameters. For a multivariate model, n is determined as in a univariate model, so a variance matrix counts as one parameter. Notice that the loss in the degrees of freedom d of Q(P,d) takes account of the number of relative variance parameters. It does not take account of any lagged dependent variables and their presence necessitates further adjustment.
15.6.2 Periodogram and spectrum
These diagnostic graphs are defined in §13.3 and §??. They are applied to v_{t} in exactly the same way. Note that the theoretical spectrum is a horizontal straight line for a whitenoise series.
15.6.3 Cumulative statistics and graphs
The `Cusum' and `Cusum of squares' are as given in §13.5 and §13.6 respectively, but are now calculated using the residuals v_{t} for t=d+1,...,T.
The `Cumulative periodogram' is another cumulative graph which is useful since it smooths out the periodogram. The cumulative periodogram cp_{j} is defined as
cp_{j}= 

where p_{i} is the periodogram ordinate at i for i=1,...,n and n=[(Td)/2]. The cumulative periodogram cp_{j} is plotted against j=0,...,n. A useful diagnostic statistic is to measure the maximum deviation of the cp_{j} from the diagonal line connecting cp_{0}=0 and cp_{n}=1; that is, max cp_{j}j/n over j=1,...,n1; see Durbin (1969). This measure is outputted on request in the Results window. Some typical critical values for this twosided test are .44 for T ~=80, .40 for T ~=100 and .30 for T ~=200.
15.6.4 Distribution statistics
The Bowman and Shenton (1975) statistic for normality is calculated using the residuals v_{t} for t=d+1,...,T. As for the other statistics, and the graphs on the density, histogram and distribution, the method of calculation is as presented in §13.10.
15.6.5 Heteroskedasticity
A basic nonparametric test of heteroskedasticity is the twosided F_{h,h}.test,
H(h)=∑_{t=Th+1}^{T}v_{t}^{2}/∑_{t=d+1}^{d+1+h}v_{t}^{2}~ 
where h is the closest integer to (Td)/3. An increase of the variance in the last third of the sample will cause H(h) to be large, while the reverse will cause it to be small. The heteroskedasticity test appears as part of the Summary statistics.
15.7 Auxiliary residuals
Auxiliary residuals are designed to detect unusual movements in a fitted time series model. These residuals are the smoothed estimates of the disturbances associated with the components irregular, level and slope. Section 14.4.4 sets out how to calculate the standardised auxiliary residuals. It is very useful to plot them and look for any irregularities. If requested, all residuals which have absolute value exceeding two are written to the Results window.
As well as graphical representations of the auxiliary residuals, STAMP reports the standard distribution statistics for these residuals; see §13.10. Of course, these statistics cannot be used as standard moment tests because of the serial correlation in the auxiliary residuals. However these statistics can be corrected for serial correlation as in Harvey and Koopman (1992). In STAMP, the correction factors are calculated from the first max ((T)^{½}, 20) sample autocorrelations of the auxiliary residuals.
15.8 Predictive testing
When the final date of the state sample is smaller than the final date of the data sample; that is, there are L observations, y_{t}, t=T+1,...,T+L `outside the sample', postsample predictive testing may be carried out using all or some of these observations. However, when these final dates are the same, predictive testing within the sample is the only option; that is, predicting y_{t} for t=Tl+1,...,T. Of course, in the first situation, predictive testing within the sample is also possible.
When explanatory variables and/or interventions are specified in the model, the residuals used in predictive testing within the sample are the generalised recursive residuals defined in §14.3.6.2. These residuals are denoted by w_{t} for t=d^{*}+1,...,T where d^{*}=d+k and k is the number of explanatory variables in the model (excluding any variables which do not appear after time TL). Note that w_{t} is zero when an intervention enters into the model. The error bands for the residuals are based on twice the RMSE. The CUSUM is defined in §13.5, and it is calculated for these residuals if requested.
The Chow test within the sample is constructed as
Chow= 
 ∑_{t=Tl+1}^{T}w_{t}^{2}~/~∑_{t=d*+1}^{Tl}w_{t}^{2}. 
which is approximately distributed as F(l,Tld^{*}).
When postsample predictive testing is carried out, the estimates of the coefficients of the explanatory variables and/or interventions remain the same as they were at the end of the state sample. The corresponding standardised residuals are denoted by v_{t} for t=T+1,...,L . The postsample predictive `failure' test statistic is then
pft=∑_{j=1}^{L}v_{T+j}^{2}, 
which is approximately distributed as χ_{L}^{2}.
The CUSUM ttest is also outputted in the form
cusumt=L^{1/2}∑_{j=1}^{L}v_{T+j}, 
which is approximately distributed as a t distribution with TLd^{*} degrees of freedom.
The extrapolative residuals, which are defined in §14.5.2, can be used to assess the predictive performance of the model and make comparisons with rival models. If v̂_{T+jT}, j=1,...,L, denotes the jstep ahead extrapolative residual in the postsample period, the extrapolative sum of squares is given by
ess(T,L)=∑_{j=1}^{L}v̂_{T+jT}^{2}. 
This statistic can also be generated using residuals within the sample. A similar statistic based on the sum of absolute values is defined as
esa(T,L)=∑_{j=1}^{L}v̂_{T+jT}. 
15.9 Forecast
The forecasts of the series y_{t} and the components are generated as described in §14.5. The forecast error bounds are based on one RMSE. For example, the forecasts of a specific series ŷ_{T+j} are obtained from a particular element of ŷ_{T+jT}, for j=1,...,L, and the RMSE r̂_{T+j} is the square root of the corresponding diagonal element of the matrix σ̂^{2}F̂_{T+jT}. When the data are in logs, the antilog transformation ( exp ) can be calculated for the forecasts. Their corresponding RMSEs are then based on
r̂_{T+j}^{*}= exp (ŷ_{T+j}+r̂_{T+j}) exp (ŷ_{T+j}), j=1,...,L. 
The antilog forecasts adjusted for bias are calculated by
ŷ_{T+j}^{*}= exp (ŷ_{T+j}+0.5r̂_{T+j}^{2}), j=1,...,L. 