Maximum Likelihood Estimation¶
Classical estimation of parameters in state space models is facilitated by the availability of the likelihood as a byproduct of the filtering recursions. Given a set of initial parameters, numerical maximization techniques, often quasi-Newton methods, can be applied to find the set of parameters that maximize (locally) the likelihood function, \(\mathcal{L}(Y_n \mid \psi)\). In this section we describe how to apply maximum likelihood estimation (MLE) to state space models in Python. First we describe a direct approach using the classes defined in the previous section. Second, we show how integration with the Python package Statsmodels ([27]) can be used to great effect to streamline estimation.
Statsmodels is a library for statistical and econometric modeling using classical techniques. Its goals are comparable in many ways to those of commercial econometric software such as Stata, in that it provides a range of out-of-the-box models that require little technical knowledge to apply to data. It includes least squares regression models, generalized linear models (GLM), robust linear models, discrete choice models, contrast analysis, and time-series analysis. Recent and forthcoming additions include mixed generalized linear models (mixed GLM) and general estimating equations (GEE). In addition, a version of the state space implementation described in this paper specialized to the classical estimation setting is forthcoming in Statsmodels.
We introduce a new parent class dp.ssm.MLEModel (hereafter simply MLEModel) that transparently handles the integration. Models extending from the MLEModel class can painlessly perform maximum likelihood estimation via a new fit method. In addition, summary tables, postestimation results, and model diagnostics are available.
Direct approach¶
Numerical optimization routines in Python are available through the Python package SciPy ([13]). Generically, these are in the form of minimizers that accept a function and a set of starting parameters and return the set of parameters that (locally) minimize the function. There are a number of available algorithms, including the popular BFGS (Broyden–Fletcher–Goldfarb–Shanno) method. As is usual when only minimization routines are available, in order to maximize the (log) likelihood we minimize its negative.
The code below demonstrates how to apply maximum likelihood estimation to the LocalLevel class defined in the previous section for the Nile dataset. In this case, because we have not bothered to define good starting parameters, we use the Nelder-Mead algorithm, which can be more robust than BFGS although it may converge more slowly.
# Import the generic minimization function from scipy
from scipy.optimize import minimize
# Create a new function to return the negative of the loglikelihood
nile_model_4 = LocalLevel(nile)
def func(params):
    nile_model_4.update(params)
    return -nile_model_4.loglike()
# Perform numerical optimization
output = minimize(func, nile_model_4.start_params, method='Nelder-Mead')
print(output.x) # [ 15108.31 1463.55]
nile_model_4.update(output.x)
print(nile_model_4.loglike()) # -632.537685587
The maximizing parameters are very close to those reported by [10] and achieve a negligibly higher loglikelihood (-632.53769 versus -632.53770).
Integration with Statsmodels¶
While likelihood maximization itself can be performed relatively easily, in practice there are often many other desired quantities, for example measures of parameter uncertainty (standard errors and confidence intervals). Another issue that arises in practice is that it is most convenient to allow the numerical optimizer to choose parameters across the entire real line. This means that some combinations of parameters chosen by the optimizer may lead to an invalid model specification. It is possible in some cases to use an optimization procedure with constraints or bounds, but it is almost always easier to allow the optimizer to choose in an unconstrained way and then to transform the parameters to fit the model. The implementation of parameter transformations will be discussed at greater length below.
While there is no barrier to calculating those quantities or implementing these transformations, the calculations are standard and there is no reason for each user to implement them separately. Again we turn to the principle of separation of concerns made possible through the object-oriented programming approach, this time by making use of the tools available in the Python package Statsmodels.
Creating a model that features integration with the Statsmodels code base is as simple as using dp.ssm.MLEModel as the parent class instead of dp.ssm.Model. As the name suggests, it contains additional features specific to maximum likelihood estimation while at the same time retaining all of the features described above. [1] In particular, a new method, fit, is available which automatically performs maximum likelihood estimation using the starting parameters defined in the start_params attribute (see above) and returns a results object.
Because the emphasis has shifted to evaluating the model in terms of parameters, it is important to note that the loglike, filter, smooth, and simulation_smoother methods now require as their first argument the parameter vector at which to perform the operation.
The following code redefines the local level model to use the MLEModel base class; it is identical to the previous definition except for three changes: (1) references to Model have been replaced by references to MLEModel, (2) the update method has been augmented to allow for parameter transformations, and (3) a new attribute param_names has been added, which augments output with descriptive parameter names. A brief explanation of the changes to the update method follows.
First, a new argument, transformed, has been added, describing whether the values in the params vector are already transformed (i.e. in the space of parameters valid for the model) or not (i.e. in the unconstrained space of parameters used by the optimizer). Second, a new line replaces the params vector with the output from the update method of the parent class (MLEModel). If the parameters were not transformed, the parent update method calls the appropriate transformation functions (see below for details) and returns the transformed parameters.
class BaseMLELocalLevel(dp.ssm.MLEModel):
    start_params = [1.0, 1.0]
    param_names = ['obs.var', 'level.var']

    def __init__(self, endog):
        super(BaseMLELocalLevel, self).__init__(endog, k_states=1)

        self['design', 0, 0] = 1.0
        self['transition', 0, 0] = 1.0
        self['selection', 0, 0] = 1.0

        self.initialize_approximate_diffuse()
        self.loglikelihood_burn = 1

    def update(self, params, transformed=True):
        # Transform the parameters if they are not yet transformed
        params = super(BaseMLELocalLevel, self).update(params, transformed)

        self['obs_cov', 0, 0] = params[0]
        self['state_cov', 0, 0] = params[1]
With this new definition, we can instantiate our model and perform maximum likelihood estimation. As one feature of the integration with Statsmodels, the result object has a summary method that prints a table of results:
nile_mlemodel_1 = BaseMLELocalLevel(nile)
# Notice that to call loglike, we now need to specify parameters, but we
# no longer need to call `update` beforehand.
print(nile_mlemodel_1.loglike([15099.0, 1469.1])) # -632.537695048
# Again we use Nelder-Mead; now specified as method='nm'
nile_mleresults_1 = nile_mlemodel_1.fit(method='nm', maxiter=1000)
print(nile_mleresults_1.summary())
==============================================================================
Dep. Variable:                 volume   No. Observations:                  100
Model:                  MLELocalLevel   Log Likelihood                -632.538
Date:                Sat, 12 Sep 2015   AIC                           1269.075
Time:                        16:07:24   BIC                           1274.286
Sample:                    01-01-1871   HQIC                          1271.184
                         - 01-01-1970
Covariance Type:                  opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
obs.var     1.511e+04   2586.966      5.840      0.000       1e+04    2.02e+04
level.var   1463.5474    843.718      1.735      0.083    -190.109    3117.203
===================================================================================
Ljung-Box (Q):                       36.00   Jarque-Bera (JB):                 0.05
Prob(Q):                              0.65   Prob(JB):                         0.98
Heteroskedasticity (H):               0.61   Skew:                            -0.03
Prob(H) (two-sided):                  0.16   Kurtosis:                         3.08
===================================================================================
Warnings:
[1] Covariance matrix calculated using the outer product of gradients.
A second feature is the availability of model diagnostics. Test statistics for tests of the standardized residuals for normality, heteroskedasticity, and serial correlation are reported at the bottom of the summary output. Diagnostic plots can also be produced using the plot_diagnostics method, illustrated in Fig. 6. [2] Notice that Statsmodels is aware of the date index of the Nile dataset and uses that information in the summary table and diagnostic plots.
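For example, a figure like Fig. 6 can be produced directly from the results object obtained above. This is a minimal sketch: the figsize value and output filename are illustrative assumptions, and we assume plot_diagnostics returns a Matplotlib figure as it does in Statsmodels.
# Produce the standardized residual diagnostic plots (as in Fig. 6)
fig = nile_mleresults_1.plot_diagnostics(figsize=(13, 5))
# Save the figure to disk; the filename is hypothetical
fig.savefig('nile-diagnostics.png')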
A third feature is the availability of forecasting (through the get_forecasts method) and impulse response functions (through the impulse_responses method). Due to the nature of the local level model these are uninteresting here, but will be exhibited in the ARMA(1,1) and real business cycle examples below.
[1] | An alternative is to use statsmodels.tsa.statespace.MLEModel as the parent class. This has the benefit that state space models can be estimated using the Statsmodels package alone but has the downside that some advanced features described in this paper are not yet available in that package. However, if all one wants to do is perform maximum likelihood estimation it is perfectly sufficient, and all of the code provided in this section will work equally well with this parent class. |
[2] | See sections 2.12 and 7.5 of [10] for a description of the standardized residuals and the definitions of the provided diagnostic tests. |
Parameter transformations¶
As mentioned above, parameter transformations are an important component of maximum likelihood estimation in a wide variety of cases. For example, in the local level model above the two estimated parameters are variances, which cannot theoretically be negative. Although the optimizer avoided the problematic regions in the above example, that will not always be the case. As another example, ARMA models are typically assumed to be stationary. This requires coefficients that permit inversion of the associated lag polynomial. Parameter transformations can be used to enforce these and other restrictions.
If an unconstrained variance parameter is squared, the transformed variance parameter will always be non-negative. [23] and [2] describe transformations sufficient to induce stationarity in the univariate and multivariate cases, respectively, by taking advantage of the one-to-one correspondence between lag polynomial coefficients and partial autocorrelations. [3]
For practical reasons, it is strongly preferred that the transformation function have a well-defined inverse so that starting parameters can be specified in terms of the model space and then “untransformed” to the appropriate values in the unconstrained space.
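As a brief illustration of these ideas before turning to the model class, the following sketch (with arbitrary example values, using the constraint functions provided by this package and described in footnote [3]) verifies the round-trip property of the variance and stationarity transformations:
import numpy as np
from dismalpy.ssm.tools import (
    constrain_stationary_univariate, unconstrain_stationary_univariate)

# Variance parameter: squaring maps any real number to a non-negative value;
# the square root recovers the magnitude of the unconstrained value
unconstrained_var = -2.5
constrained_var = unconstrained_var**2
print(constrained_var**0.5)  # 2.5, the magnitude of the unconstrained value

# AR coefficient: map an arbitrary real value into the stationary region for
# a univariate AR(1) and verify that the inverse transformation undoes it
constrained_ar = constrain_stationary_univariate(np.array([1.7]))
unconstrained_ar = unconstrain_stationary_univariate(constrained_ar)
print(np.allclose(constrain_stationary_univariate(unconstrained_ar),
                  constrained_ar))  # True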
Implementing parameter transformations when using MLEModel as the base class is as simple as adding two new methods: transform_params and untransform_params (if no parameter transformations are required, these methods can simply be omitted from the class definition). The following code redefines the local level model again, this time to include parameter transformations to ensure positive variance parameters. [4]
class MLELocalLevel(dp.ssm.MLEModel):
    start_params = [1.0, 1.0]
    param_names = ['obs.var', 'level.var']

    def __init__(self, endog):
        super(MLELocalLevel, self).__init__(endog, k_states=1)

        self['design', 0, 0] = 1.0
        self['transition', 0, 0] = 1.0
        self['selection', 0, 0] = 1.0

        self.initialize_approximate_diffuse()
        self.loglikelihood_burn = 1

    def transform_params(self, params):
        return params**2

    def untransform_params(self, params):
        return params**0.5

    def update(self, params, transformed=True):
        # Transform the parameters if they are not yet transformed
        params = super(MLELocalLevel, self).update(params, transformed)

        self['obs_cov', 0, 0] = params[0]
        self['state_cov', 0, 0] = params[1]
All of the code given above then applies equally to this new model, except that this class is robust to the optimizer selecting negative parameters.
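For example (a minimal usage sketch; the variable names are illustrative), estimation proceeds exactly as before:
nile_mlemodel_2 = MLELocalLevel(nile)
nile_mleresults_2 = nile_mlemodel_2.fit(method='nm', maxiter=1000)
print(nile_mleresults_2.summary())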
[3] | The transformations to induce stationarity are made available in this package as the functions dp.ssm.tools.constrain_stationary_univariate and dp.ssm.tools.constrain_stationary_multivariate. Their inverses are also available, with the obvious name changes. |
[4] | Note that in Python, the exponentiation operator is ** . |
Example models¶
In this section, we extend the code from Representation in Python to allow for maximum likelihood estimation through Statsmodels integration.
ARMA(1, 1) model¶
from dismalpy.ssm.tools import (
    constrain_stationary_univariate, unconstrain_stationary_univariate)

class ARMA11(dp.ssm.MLEModel):
    start_params = [0, 0, 1]
    param_names = ['phi', 'theta', 'sigma2']

    def __init__(self, endog):
        super(ARMA11, self).__init__(
            endog, k_states=2, k_posdef=1, initialization='stationary')

        self['design', 0, 0] = 1.
        self['transition', 1, 0] = 1.
        self['selection', 0, 0] = 1.

    def transform_params(self, params):
        phi = constrain_stationary_univariate(params[0:1])
        theta = constrain_stationary_univariate(params[1:2])
        sigma2 = params[2]**2
        return np.r_[phi, theta, sigma2]

    def untransform_params(self, params):
        phi = unconstrain_stationary_univariate(params[0:1])
        theta = unconstrain_stationary_univariate(params[1:2])
        sigma2 = params[2]**0.5
        return np.r_[phi, theta, sigma2]

    def update(self, params, transformed=True):
        # Transform the parameters if they are not yet transformed
        params = super(ARMA11, self).update(params, transformed)

        self['design', 0, 1] = params[1]
        self['transition', 0, 0] = params[0]
        self['state_cov', 0, 0] = params[2]
The parameters can now be easily estimated via maximum likelihood as above. This model also allows us to demonstrate the prediction and forecasting features provided by the Statsmodels integration. In particular, we use the get_prediction method to retrieve a prediction object that gives in-sample one-step-ahead predictions and out-of-sample forecasts, as well as confidence intervals. Fig. 7 shows a graph of the output.
inf_model = ARMA11(inf)
inf_results = inf_model.fit()
inf_forecast = inf_results.get_prediction(start='2010-01-01', end='2017-01-01')
print(inf_forecast.predicted_mean) # [2010-01-01 -0.000747 ...
print(inf_forecast.conf_int()) # [2010-01-01 -0.006891 0.005398 ...
If only out-of-sample forecasts had been desired, the get_forecasts method could have been used instead, and if only the forecasted values had been desired (and not additional results like confidence intervals), the methods predict or forecast could have been used.
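For instance, the following sketch (with illustrative argument values, assuming the usual Statsmodels signatures for these methods) retrieves point forecasts only:
# Out-of-sample point forecasts only, without confidence intervals
print(inf_results.forecast(steps=8))
# In-sample predictions and out-of-sample forecasts over a date range
print(inf_results.predict(start='2010-01-01', end='2017-01-01'))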
Local level model¶
See the previous sections for the Python implementation of the local level model.
Real business cycle model¶
Due to the complexity of the model, the full code for the model is too long to display inline, but it is provided in the appendices. This code implements the real business cycle model in a class named SimpleRBC, and allows some of the structural parameters to be estimated and some to be calibrated (set to specific values).
Often in structural models one of the outcomes of interest is the time paths of the observed variables following a hypothetical structural shock; these time paths are called impulse response functions, and they can be generated for any state space model.
In the first application, we will calibrate all of the structural parameters to the values suggested in [26] and simply estimate the measurement error variances (these do not affect the model dynamics or the impulse responses). Once the model has been estimated, the impulse_responses method can be used to generate the time paths.
# Calibrate everything except measurement variances
calibrated = {
    'discount_rate': 0.95,
    'disutility_labor': 3.0,
    'capital_share': 0.36,
    'depreciation_rate': 0.025,
    'technology_shock_persistence': 0.85,
    'technology_shock_var': 0.04**2
}
calibrated_mod = SimpleRBC(rbc_data, calibrated=calibrated)
calibrated_res = calibrated_mod.fit()

calibrated_irfs = calibrated_res.impulse_responses(40, orthogonalized=True)
The calculated impulse responses are displayed in Fig. 8. By calibrating fewer parameters we can expand estimation to include some of the structural parameters. For example, we may consider also estimating the two parameters describing the technology shock. Implementing this requires only the obvious changes to the above code. The impulse responses corresponding to this second exercise are displayed in Fig. 9. [5]
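A minimal sketch of the changes just described follows (the variable names are illustrative): the two technology shock parameters are simply left out of the calibration dictionary so that they are estimated along with the measurement error variances.
# Calibrate everything except the measurement variances and the
# technology shock parameters, which are now estimated
partially_calibrated = {
    'discount_rate': 0.95,
    'disutility_labor': 3.0,
    'capital_share': 0.36,
    'depreciation_rate': 0.025,
}
partial_mod = SimpleRBC(rbc_data, calibrated=partially_calibrated)
partial_res = partial_mod.fit(maxiter=1000)
partial_irfs = partial_res.impulse_responses(40, orthogonalized=True)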
Recall that the RBC model has three observables, output, labor, and consumption, and two unobserved states, capital and the technology process. The Kalman filter provides optimal estimates of these unobserved series at time \(t\) based on all data up to time \(t\), and the state smoother provides optimal estimates based on the full dataset. These can be retrieved from the results object. Fig. 10 displays the smoothed state values and confidence intervals for the partially estimated case.
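A sketch of retrieving these estimates follows, using the hypothetical results object partial_res from the sketch above and assuming the Statsmodels-style smoothed_state and smoothed_state_cov attributes on the results object:
import numpy as np

# Smoothed estimates of the unobserved states (capital and the technology
# process): one row per state, one column per time period
smoothed_states = partial_res.smoothed_state

# Approximate 95% confidence intervals from the smoothed state variances
smoothed_se = np.sqrt(partial_res.smoothed_state_cov.diagonal().T)
lower = smoothed_states - 1.96 * smoothed_se
upper = smoothed_states + 1.96 * smoothed_se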
[5] | We note again that this example is merely by way of illustration; it does not represent best practices for careful RBC estimation. |