Maximum Likelihood Estimation
-----------------------------

Classical estimation of parameters in state space models is facilitated by the
availability of the likelihood as a byproduct of the filtering recursions.
Given a set of initial parameters, numerical maximization techniques, often
quasi-Newton methods, can be applied to find the set of parameters that
maximize (locally) the likelihood function, :math:`\mathcal{L}(Y_n \mid \psi)`.

In this section we describe how to apply maximum likelihood estimation (MLE)
to state space models in Python. First we describe a direct approach using the
classes defined in the previous section. Second, we show how integration with
the Python package Statsmodels (:cite:`seabold_statsmodels:_2010`) can be used
to great effect to streamline estimation.

Statsmodels is a library for statistical and econometric modeling using
classical techniques. Its goals are comparable in many ways to those of
commercial econometric software like Stata in that it provides a range of
out-of-the-box models that require little technical knowledge to apply to
data. It includes least squares regression models, generalized linear models
(GLM), robust linear models, discrete choice models, contrast analysis, and
time-series analysis. Recent and forthcoming additions include mixed
generalized linear models (mixed GLM) and generalized estimating equations
(GEE). In addition, a version of the state space implementation described in
this paper, specialized to the classical estimation setting, is forthcoming in
Statsmodels.

.. todo::

    Give a link / citation to the Statsmodels-specific version of this paper,
    and also reference in the footnote below.

We introduce a new parent class ``dp.ssm.MLEModel`` (hereafter simply
``MLEModel``) that transparently handles the integration. Models extending
from the ``MLEModel`` class can painlessly perform maximum likelihood
estimation via a new ``fit`` method. In addition, summary tables,
postestimation results, and model diagnostics are available.

Direct approach
^^^^^^^^^^^^^^^

Numerical optimization routines in Python are available through the Python
package SciPy (:cite:`jones_scipy:_2001`). Generically, these are in the form
of minimizers that accept a function and a set of starting parameters and
return the set of parameters that (locally) minimize the function. There are a
number of available algorithms, including the popular BFGS
(Broyden–Fletcher–Goldfarb–Shanno) method. As is usual when minimization
routines are available, in order to maximize the (log) likelihood, we minimize
its negative.
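To illustrate the generic interface, the following minimal sketch (unrelated
to any state space model) minimizes a simple quadratic function:

::

    from scipy.optimize import minimize

    # Minimize f(x) = (x - 3)^2; the unique minimizer is x = 3
    result = minimize(lambda x: (x[0] - 3)**2, [0.], method='BFGS')
    print(result.x)  # Approximately [ 3.]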
The code below demonstrates how to apply maximum likelihood estimation to the
``LocalLevel`` class defined in the previous section for the Nile dataset. In
this case, because we have not bothered to define good starting parameters, we
use the Nelder-Mead algorithm, which can be more robust than BFGS although it
may converge more slowly.

::

    # Import the generic minimization function from scipy
    from scipy.optimize import minimize

    # Create a new function to return the negative of the loglikelihood
    nile_model_4 = LocalLevel(nile)
    def func(params):
        nile_model_4.update(params)
        return -nile_model_4.loglike()

    # Perform numerical optimization
    output = minimize(func, nile_model_4.start_params, method='Nelder-Mead')

    print(output.x)  # [ 15108.31   1463.55]
    nile_model_4.update(output.x)
    print(nile_model_4.loglike())  # -632.537685587

The maximizing parameters are very close to those reported by
:cite:`durbin_time_2012` and achieve a negligibly higher loglikelihood
(-632.53769 versus -632.53770).

Integration with Statsmodels
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

While likelihood maximization itself can be performed relatively easily, in
practice there are often many other desired quantities, for example measures
of parameter uncertainty (standard errors and confidence intervals).
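For example, standard errors could be obtained from a numerical approximation
to the Hessian of the negative log likelihood at the optimum. The following is
a rough sketch reusing the ``func`` and ``output`` objects from the direct
approach above; the finite-difference helper ``numerical_hessian`` and its
step size are illustrative choices only, not features of either package:

::

    import numpy as np

    def numerical_hessian(f, x, eps=1e-4):
        # Approximate the Hessian of f at x by central finite differences
        x = np.asarray(x, dtype=float)
        k = len(x)
        hessian = np.zeros((k, k))
        h = eps * np.maximum(1., np.abs(x))
        for i in range(k):
            for j in range(k):
                ei = np.zeros(k)
                ej = np.zeros(k)
                ei[i] = h[i]
                ej[j] = h[j]
                hessian[i, j] = (
                    f(x + ei + ej) - f(x + ei - ej)
                    - f(x - ei + ej) + f(x - ei - ej)
                ) / (4 * h[i] * h[j])
        return hessian

    # Since func is the negative loglikelihood, its Hessian at the MLE
    # estimates the observed information matrix; its inverse estimates the
    # covariance matrix of the parameters
    information = numerical_hessian(func, output.x)
    std_errors = np.sqrt(np.diag(np.linalg.inv(information)))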
Another issue that arises in practice is that it is most convenient to allow
the numerical optimizer to choose parameters across the entire real line. This
means that some combinations of parameters chosen by the optimizer may lead to
an invalid model specification. It is possible in some cases to use an
optimization procedure with constraints or bounds, but it is almost always
easier to allow the optimizer to choose in an unconstrained way and then to
transform the parameters to fit the model. The implementation of parameter
transformations will be discussed at greater length below.

While there is no barrier to calculating those quantities or implementing
these transformations, the calculations are standard and there is no reason
for each user to implement them separately. Again we turn to the principle of
separation of concerns made possible through the object oriented programming
approach, this time by making use of the tools available in the Python package
Statsmodels.

Creating a model that features integration with the Statsmodels code base is
as simple as using ``dp.ssm.MLEModel`` as the parent class instead of
``dp.ssm.Model``. As the name suggests, it contains additional features
specific to maximum likelihood estimation, while at the same time retaining
all of the features described above. [#]_ In particular, a new method,
``fit``, is available which automatically performs maximum likelihood
estimation using the starting parameters defined in the ``start_params``
attribute (see above) and returns a results object. Because the emphasis has
shifted to evaluating the model in terms of parameters, it is important to
note that the ``loglike``, ``filter``, ``smooth``, and ``simulation_smoother``
methods now require as their first argument the parameter vector at which to
perform the operation.

The following code redefines the local level model to use the ``MLEModel``
base class; it is identical to the previous definition except for three
changes: (1) references to ``Model`` have been replaced by references to
``MLEModel``, (2) the ``update`` method has been augmented to allow for
parameter transformations, and (3) a new attribute ``param_names`` has been
added which augments output with descriptive parameter names.

A brief explanation about the changes to the ``update`` method follows. First,
a new argument has been added, describing whether or not the values in the
``params`` vector are considered to be *transformed* (i.e. in the space of
parameters valid for the model) or not (i.e. in the unconstrained space of
parameters used by the optimizer). Second, a new line replaces the ``params``
vector with the output from the ``update`` method of the parent class
(``MLEModel``). If the parameters were *not* transformed, the parent
``update`` method calls the appropriate transformation functions (see below
for details) and returns the transformed parameters.

::

    class BaseMLELocalLevel(dp.ssm.MLEModel):
        start_params = [1.0, 1.0]
        param_names = ['obs.var', 'level.var']

        def __init__(self, endog):
            super(BaseMLELocalLevel, self).__init__(endog, k_states=1)

            self['design', 0, 0] = 1.0
            self['transition', 0, 0] = 1.0
            self['selection', 0, 0] = 1.0

            self.initialize_approximate_diffuse()
            self.loglikelihood_burn = 1

        def update(self, params, transformed=True):
            # Transform the parameters if they are not yet transformed
            params = super(BaseMLELocalLevel, self).update(params, transformed)

            self['obs_cov', 0, 0] = params[0]
            self['state_cov', 0, 0] = params[1]

With this new definition, we can instantiate our model and perform maximum
likelihood estimation. As one feature of the integration with Statsmodels, the
result object has a ``summary`` method that prints a table of results:

::

    nile_mlemodel_1 = BaseMLELocalLevel(nile)

    # Notice that to call loglike, we now need to specify parameters, but we
    # no longer need to call `update` beforehand.
    print(nile_mlemodel_1.loglike([15099.0, 1469.1]))  # -632.537695048

    # Again we use Nelder-Mead; now specified as method='nm'
    nile_mleresults_1 = nile_mlemodel_1.fit(method='nm', maxiter=1000)
    print(nile_mleresults_1.summary())

::

    ==============================================================================
    Dep. Variable:                 volume   No. Observations:                  100
    Model:                  MLELocalLevel   Log Likelihood                -632.538
    Date:                Sat, 12 Sep 2015   AIC                           1269.075
    Time:                        16:07:24   BIC                           1274.286
    Sample:                    01-01-1871   HQIC                          1271.184
                             - 01-01-1970
    Covariance Type:                  opg
    ==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    obs.var     1.511e+04   2586.966      5.840      0.000       1e+04    2.02e+04
    level.var   1463.5474    843.718      1.735      0.083    -190.109    3117.203
    ===================================================================================
    Ljung-Box (Q):                       36.00   Jarque-Bera (JB):                 0.05
    Prob(Q):                              0.65   Prob(JB):                         0.98
    Heteroskedasticity (H):               0.61   Skew:                            -0.03
    Prob(H) (two-sided):                  0.16   Kurtosis:                         3.08
    ===================================================================================

    Warnings:
    [1] Covariance matrix calculated using the outer product of gradients.

A second feature is the availability of model diagnostics. Test statistics for
tests of the standardized residuals for normality, heteroskedasticity, and
serial correlation are reported at the bottom of the summary output.
Diagnostic plots can also be produced using the ``plot_diagnostics`` method,
illustrated in :numref:`figure_4-diag-nile`. [#]_ Notice that Statsmodels is
aware of the date index of the Nile dataset and uses that information in the
summary table and diagnostic plots.

.. _figure_4-diag-nile:

.. figure:: images/fig_4-diag-nile.png

   Diagnostic plots for standardized residuals after maximum likelihood
   estimation on Nile data.
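A figure along these lines can be produced directly from the results object; a
minimal sketch (assuming Matplotlib is installed and that the ``figsize``
argument behaves as it does in Statsmodels):

::

    import matplotlib.pyplot as plt

    # Plot the standardized-residual diagnostics for the fitted model
    fig = nile_mleresults_1.plot_diagnostics(figsize=(13, 5))
    plt.show()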
A third feature is the availability of forecasting (through the
``get_forecasts`` method) and impulse response functions (through the
``impulse_responses`` method). Due to the nature of the local level model
these are uninteresting here, but they will be exhibited in the ARMA(1,1) and
real business cycle examples below.

.. [#] An alternative is to use ``statsmodels.tsa.statespace.MLEModel`` as the
   parent class. This has the benefit that state space models can be estimated
   using the Statsmodels package alone, but has the downside that some
   advanced features described in this paper are not yet available in that
   package. However, if all one wants to do is perform maximum likelihood
   estimation, it is perfectly sufficient, and all of the code provided in
   this section will work equally well with this parent class.

.. [#] See sections 2.12 and 7.5 of :cite:`durbin_time_2012` for a description
   of the standardized residuals and the definitions of the provided
   diagnostic tests.

Parameter transformations
"""""""""""""""""""""""""

As mentioned above, parameter transformations are an important component of
maximum likelihood estimation in a wide variety of cases. For example, in the
local level model above the two estimated parameters are variances, which
cannot theoretically be negative. Although the optimizer avoided the
problematic regions in the above example, that will not always be the case. As
another example, ARMA models are typically assumed to be stationary. This
requires coefficients that permit inversion of the associated lag polynomial.
Parameter transformations can be used to enforce these and other restrictions.
For example, if an unconstrained variance parameter is squared, the
transformed variance parameter will always be positive.
:cite:`monahan_note_1984` and :cite:`ansley_note_1986` describe
transformations sufficient to induce stationarity in the univariate and
multivariate cases, respectively, by taking advantage of the one-to-one
correspondence between lag polynomial coefficients and partial
autocorrelations. [#]_

For practical reasons, it is strongly preferred that the transformation
function have a well-defined inverse, so that starting parameters can be
specified in terms of the model space and then "untransformed" to the
appropriate values in the unconstrained space.

Implementing parameter transformations when using ``MLEModel`` as the base
class is as simple as adding two new methods: ``transform_params`` and
``untransform_params`` (if no parameter transformations are required, these
methods can simply be omitted from the class definition). The following code
redefines the local level model again, this time to include parameter
transformations to ensure positive variance parameters. [#]_

::

    class MLELocalLevel(dp.ssm.MLEModel):
        start_params = [1.0, 1.0]
        param_names = ['obs.var', 'level.var']

        def __init__(self, endog):
            super(MLELocalLevel, self).__init__(endog, k_states=1)

            self['design', 0, 0] = 1.0
            self['transition', 0, 0] = 1.0
            self['selection', 0, 0] = 1.0

            self.initialize_approximate_diffuse()
            self.loglikelihood_burn = 1

        def transform_params(self, params):
            return params**2

        def untransform_params(self, params):
            return params**0.5

        def update(self, params, transformed=True):
            # Transform the parameters if they are not yet transformed
            params = super(MLELocalLevel, self).update(params, transformed)

            self['obs_cov', 0, 0] = params[0]
            self['state_cov', 0, 0] = params[1]

All of the code given above then applies equally to this new model, except
that this class is robust to the optimizer selecting negative parameters.
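As a quick check of the inverse requirement discussed above, the two methods
can be verified to round-trip a set of valid parameters; a minimal sketch,
using the parameter values from the earlier ``loglike`` call:

::

    import numpy as np

    model = MLELocalLevel(nile)
    params = np.array([15099.0, 1469.1])

    # Untransforming and then transforming should recover the original values
    unconstrained = model.untransform_params(params)
    assert np.allclose(model.transform_params(unconstrained), params)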
.. [#] The transformations to induce stationarity are made available in this
   package as the functions ``dp.ssm.tools.constrain_stationary_univariate``
   and ``dp.ssm.tools.constrain_stationary_multivariate``. Their inverses are
   also available, with the obvious name changes.

.. [#] Note that in Python, the exponentiation operator is ``**``.

Example models
^^^^^^^^^^^^^^

In this section, we extend the code from :ref:`representation-in-python` to
allow for maximum likelihood estimation through Statsmodels integration.

.. todo::

    Put a reference in here to the out-of-the-box models / section.

ARMA(1, 1) model
""""""""""""""""

::

    from dismalpy.ssm.tools import (
        constrain_stationary_univariate, unconstrain_stationary_univariate)

    class ARMA11(dp.ssm.MLEModel):
        start_params = [0, 0, 1]
        param_names = ['phi', 'theta', 'sigma2']

        def __init__(self, endog):
            super(ARMA11, self).__init__(
                endog, k_states=2, k_posdef=1, initialization='stationary')

            self['design', 0, 0] = 1.
            self['transition', 1, 0] = 1.
            self['selection', 0, 0] = 1.

        def transform_params(self, params):
            phi = constrain_stationary_univariate(params[0:1])
            theta = constrain_stationary_univariate(params[1:2])
            sigma2 = params[2]**2
            return np.r_[phi, theta, sigma2]

        def untransform_params(self, params):
            phi = unconstrain_stationary_univariate(params[0:1])
            theta = unconstrain_stationary_univariate(params[1:2])
            sigma2 = params[2]**0.5
            return np.r_[phi, theta, sigma2]

        def update(self, params, transformed=True):
            # Transform the parameters if they are not yet transformed
            params = super(ARMA11, self).update(params, transformed)

            self['design', 0, 1] = params[1]
            self['transition', 0, 0] = params[0]
            self['state_cov', 0, 0] = params[2]

The parameters can now be easily estimated via maximum likelihood as above.
This model also allows us to demonstrate the prediction and forecasting
features provided by the Statsmodels integration. In particular, we use the
``get_prediction`` method to retrieve a prediction object that gives in-sample
one-step-ahead predictions and out-of-sample forecasts, as well as confidence
intervals. :numref:`figure_4-forecast-inf` shows a graph of the output.

::

    inf_model = ARMA11(inf)
    inf_results = inf_model.fit()

    inf_forecast = inf_results.get_prediction(
        start='2010-01-01', end='2017-01-01')
    print(inf_forecast.predicted_mean)  # [2010-01-01   -0.000747 ...
    print(inf_forecast.conf_int())      # [2010-01-01   -0.006891   0.005398 ...

.. _figure_4-forecast-inf:

.. figure:: images/fig_4-forecast-inf.png

   In-sample one-step-ahead predictions and out-of-sample forecasts for the
   ARMA(1,1) model on US CPI inflation data.

If only out-of-sample forecasts had been desired, the ``get_forecasts`` method
could have been used instead, and if only the forecasted values had been
desired (and not additional results like confidence intervals), the methods
``predict`` or ``forecast`` could have been used.
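For instance, the point forecasts alone could be retrieved with something like
the following sketch (assuming the ``inf_results`` object from above and that
the ``steps``, ``start``, and ``end`` arguments follow the Statsmodels
conventions):

::

    # Out-of-sample point forecasts only, without confidence intervals
    print(inf_results.forecast(steps=12))

    # In-sample predicted values (and forecasts) alone, by date range
    print(inf_results.predict(start='2010-01-01', end='2017-01-01'))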
Local level model
"""""""""""""""""

See the previous sections for the Python implementation of the local level
model.

Real business cycle model
"""""""""""""""""""""""""

Due to the complexity of the model, the full code for the model is too long to
display inline, but it is provided in the appendices. This code implements the
real business cycle model in a class named ``SimpleRBC``, and allows some of
the structural parameters to be estimated and some to be calibrated (set to
specific values).

Often in structural models one of the outcomes of interest is the time paths
of the observed variables following a hypothetical structural shock; these
time paths are called impulse response functions, and they can be generated
for any state space model. In the first application, we will calibrate all of
the structural parameters to the values suggested in
:cite:`ruge-murcia_methods_2007` and simply estimate the measurement error
variances (these do not affect the model dynamics or the impulse responses).
Once the model has been estimated, the ``impulse_responses`` method can be
used to generate the time paths.

.. todo::

    Put in exactly where the code is.

::

    # Calibrate everything except measurement variances
    calibrated = {
        'discount_rate': 0.95,
        'disutility_labor': 3.0,
        'capital_share': 0.36,
        'depreciation_rate': 0.025,
        'technology_shock_persistence': 0.85,
        'technology_shock_var': 0.04**2
    }
    calibrated_mod = SimpleRBC(rbc_data, calibrated=calibrated)
    calibrated_res = calibrated_mod.fit()

    calibrated_irfs = calibrated_res.impulse_responses(40, orthogonalized=True)

The calculated impulse responses are displayed in
:numref:`figure_4-calibrated-irf`. By calibrating fewer parameters we can
expand estimation to include some of the structural parameters. For example,
we may consider also estimating the two parameters describing the technology
shock. Implementing this requires only the obvious changes to the above code.
The impulse responses corresponding to this second exercise are displayed in
:numref:`figure_4-estimated-irf`. [#]_

.. [#] We note again that this example is merely by way of illustration; it
   does not represent best practices for careful RBC estimation.

.. _figure_4-calibrated-irf:

.. figure:: images/fig_4-calibrated-irf.png

   Impulse response functions corresponding to a fully calibrated RBC model.

.. _figure_4-estimated-irf:

.. figure:: images/fig_4-estimated-irf.png

   Impulse response functions corresponding to a partially estimated RBC
   model.

Recall that the RBC model has three observables, output, labor, and
consumption, and two unobserved states, capital and the technology process.
The Kalman filter provides optimal estimates of these unobserved series at
time :math:`t` based on all data up to time :math:`t`, and the state smoother
provides optimal estimates based on the full dataset. These can be retrieved
from the results object, as sketched below.
:numref:`figure_4-estimated-states` displays the smoothed state values and
confidence intervals for the partially estimated case.

.. _figure_4-estimated-states:

.. figure:: images/fig_4-estimated-states.png

   Smoothed estimates of capital and the technology process from the partially
   estimated RBC model.
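The values shown in this figure can be retrieved from the results object; a
sketch, assuming the attribute names mirror those used by Statsmodels
(``smoothed_state`` and ``smoothed_state_cov``) and using the
``calibrated_res`` object from above (a partially estimated results object
would be used identically):

::

    import numpy as np

    # Smoothed state estimates, with shape (k_states, nobs)
    smoothed = calibrated_res.smoothed_state

    # Pointwise 95% confidence intervals from the smoothed state covariances,
    # which have shape (k_states, k_states, nobs)
    std_err = np.sqrt(np.array(
        [calibrated_res.smoothed_state_cov[i, i, :] for i in range(2)]))
    lower = smoothed - 1.96 * std_err
    upper = smoothed + 1.96 * std_err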