SARIMAX: Introduction
=====================

This notebook replicates examples from the Stata ARIMA time series
estimation and postestimation documentation.

First, we replicate the four estimation examples
http://www.stata.com/manuals13/tsarima.pdf:

1. ARIMA(1,1,1) model on the U.S. Wholesale Price Index (WPI) dataset.
2. Variation of example 1 which adds an MA(4) term to the ARIMA(1,1,1)
   specification to allow for an additive seasonal effect.
3. ARIMA(2,1,0) x (1,1,0,12) model of monthly airline data. This example
   allows a multiplicative seasonal effect.
4. ARMA(1,1) model with exogenous regressors; describes consumption as
   an autoregressive process on which also the money supply is assumed
   to be an explanatory variable.

Second, we demonstrate postestimation capabilitites to replicate
http://www.stata.com/manuals13/tsarimapostestimation.pdf. The model from
example 4 is used to demonstrate:

1. One-step-ahead in-sample prediction
2. n-step-ahead out-of-sample forecasting
3. n-step-ahead in-sample dynamic prediction

.. code:: python

    %matplotlib inline

.. code:: python

    import numpy as np
    import pandas as pd
    from dismalpy import ssm
    import matplotlib.pyplot as plt
    from datetime import datetime

ARIMA Example 1: Arima
~~~~~~~~~~~~~~~~~~~~~~

As can be seen in the graphs from Example 2, the Wholesale price index
(WPI) is growing over time (i.e. is not stationary). Therefore an ARMA
model is not a good specification. In this first example, we consider a
model where the original time series is assumed to be integrated of
order 1, so that the difference is assumed to be stationary, and fit a
model with one autoregressive lag and one moving average lag, as well as
an intercept term.

The postulated data process is then:

.. math::


   \Delta y_t = c + \phi_1 \Delta y_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_{t}

where :math:`c` is the intercept of the ARMA model, :math:`\Delta` is
the first-difference operator, and we assume
:math:`\epsilon_{t} \sim N(0, \sigma^2)`. This can be rewritten to
emphasize lag polynomials as (this will be useful in example 2, below):

.. math::


   (1 - \phi_1 L ) \Delta y_t = c + (1 + \theta_1 L) \epsilon_{t}

where :math:`L` is the lag operator.

Notice that one difference between the Stata output and the output below
is that Stata estimates the following model:

.. math::


   (\Delta y_t - \beta_0) = \phi_1 ( \Delta y_{t-1} - \beta_0) + \theta_1 \epsilon_{t-1} + \epsilon_{t}

where :math:`\beta_0` is the mean of the process :math:`y_t`. This model
is equivalent to the one estimated in the SARIMAX class, but the
interpretation is different. To see the equivalence, note that:

.. math::


   (\Delta y_t - \beta_0) = \phi_1 ( \Delta y_{t-1} - \beta_0) + \theta_1 \epsilon_{t-1} + \epsilon_{t} \\
   \Delta y_t = (1 - \phi_1) \beta_0 + \phi_1 \Delta y_{t-1} + \theta_1 \epsilon_{t-1} + \epsilon_{t}

so that :math:`c = (1 - \phi_1) \beta_0`.

.. code:: python

    # Dataset
    data = pd.read_stata('data/wpi1.dta')
    data.index = data.t
    
    # Fit the model
    mod = ssm.SARIMAX(data['wpi'], trend='c', order=(1,1,1))
    res = mod.fit()
    print(res.summary())


.. parsed-literal::

                               Statespace Model Results                           
    ==============================================================================
    Dep. Variable:                    wpi   No. Observations:                  124
    Model:               SARIMAX(1, 1, 1)   Log Likelihood                -134.983
    Date:                Mon, 21 Sep 2015   AIC                            277.965
    Time:                        14:49:48   BIC                            289.246
    Sample:                    01-01-1960   HQIC                           282.548
                             - 10-01-1990                                         
    Covariance Type:                  opg                                         
    ==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    intercept      0.1050      0.068      1.541      0.123      -0.029       0.239
    ar.L1          0.8740      0.054     16.124      0.000       0.768       0.980
    ma.L1         -0.4206      0.100     -4.204      0.000      -0.617      -0.225
    sigma2         0.5226      0.053      9.921      0.000       0.419       0.626
    ===================================================================================
    Ljung-Box (Q):                       36.96   Jarque-Bera (JB):                10.15
    Prob(Q):                              0.61   Prob(JB):                         0.01
    Heteroskedasticity (H):              17.00   Skew:                             0.27
    Prob(H) (two-sided):                  0.00   Kurtosis:                         4.30
    ===================================================================================
    
    Warnings:
    [1] Covariance matrix calculated using the outer product of gradients.


Thus the maximum likelihood estimates imply that for the process above,
we have:

.. math::


   \Delta y_t = 0.1050 + 0.8740 \Delta y_{t-1} - 0.4206 \epsilon_{t-1} + \epsilon_{t}

where :math:`\epsilon_{t} \sim N(0, 0.5226)`. Finally, recall that
:math:`c = (1 - \phi_1) \beta_0`, and here :math:`c = 0.1050` and
:math:`\phi_1 = 0.8740`. To compare with the output from Stata, we could
calculate the mean:

.. math:: \beta_0 = \frac{c}{1 - \phi_1} = \frac{0.1050}{1 - 0.8740} = 0.83

**Note**: these values are slightly different from the values in the
Stata documentation because the optimizer here has found parameters here
that yield a higher likelihood. Nonetheless, they are very close.

ARIMA Example 2: Arima with additive seasonal effects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This model is an extension of that from example 1. Here the data is
assumed to follow the process:

.. math::


   \Delta y_t = c + \phi_1 \Delta y_{t-1} + \theta_1 \epsilon_{t-1} + \theta_4 \epsilon_{t-4} + \epsilon_{t}

The new part of this model is that there is allowed to be a annual
seasonal effect (it is annual even though the periodicity is 4 because
the dataset is quarterly). The second difference is that this model uses
the log of the data rather than the level.

Before estimating the dataset, graphs showing:

1. The time series (in logs)
2. The first difference of the time series (in logs)
3. The autocorrelation function
4. The partial autocorrelation function.

From the first two graphs, we note that the original time series does
not appear to be stationary, whereas the first-difference does. This
supports either estimating an ARMA model on the first-difference of the
data, or estimating an ARIMA model with 1 order of integration (recall
that we are taking the latter approach). The last two graphs support the
use of an ARMA(1,1,1) model.

.. code:: python

    # Dataset
    data = pd.read_stata('data/wpi1.dta')
    data.index = data.t
    data['ln_wpi'] = np.log(data['wpi'])
    data['D.ln_wpi'] = data['ln_wpi'].diff()

.. code:: python

    # Graph data
    fig, axes = plt.subplots(1, 2, figsize=(15,4))
    
    # Levels
    axes[0].plot(data.index._mpl_repr(), data['wpi'], '-')
    axes[0].set(title='US Wholesale Price Index')
    
    # Log difference
    axes[1].plot(data.index._mpl_repr(), data['D.ln_wpi'], '-')
    axes[1].hlines(0, data.index[0], data.index[-1], 'r')
    axes[1].set(title='US Wholesale Price Index - difference of logs');


.. image:: sarimax_stata.ipynb_9_0.png


.. code:: python

    # Graph data
    fig, axes = plt.subplots(1, 2, figsize=(15,4))
    
    fig = sm.graphics.tsa.plot_acf(data.ix[1:, 'D.ln_wpi'], lags=40, ax=axes[0])
    fig = sm.graphics.tsa.plot_pacf(data.ix[1:, 'D.ln_wpi'], lags=40, ax=axes[1])


::


    ---------------------------------------------------------------------------

    NameError                                 Traceback (most recent call last)

    <ipython-input-6-7935fe6395f0> in <module>()
          2 fig, axes = plt.subplots(1, 2, figsize=(15,4))
          3 
    ----> 4 fig = sm.graphics.tsa.plot_acf(data.ix[1:, 'D.ln_wpi'], lags=40, ax=axes[0])
          5 fig = sm.graphics.tsa.plot_pacf(data.ix[1:, 'D.ln_wpi'], lags=40, ax=axes[1])


    NameError: name 'sm' is not defined


.. image:: sarimax_stata.ipynb_10_1.png


To understand how to specify this model here, first recall that from
example 1 we used the following code to specify the ARIMA(1,1,1) model:

.. code:: python

    mod = sm.tsa.statespace.SARIMAX(data['wpi'], trend='c', order=(1,1,1))

The ``order`` argument is a tuple of the form
``(AR specification, Integration order, MA specification)``. The
integration order must be an integer (for example, here we assumed one
order of integration, so it was specified as 1. In a pure ARMA model
where the underlying data is already stationary, it would be 0).

For the AR specification and MA specification components, there are two
possiblities. The first is to specify the **maximum degree** of the
corresponding lag polynomial, in which case the component is an integer.
For example, if we wanted to specify an ARIMA(1,1,4) process, we would
use:

.. code:: python

    mod = sm.tsa.statespace.SARIMAX(data['wpi'], trend='c', order=(1,1,4))

and the corresponding data process would be:

.. math::


   y_t = c + \phi_1 y_{t-1} + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \theta_3 \epsilon_{t-3} + \theta_4 \epsilon_{t-4} + \epsilon_{t}

or

.. math::


   (1 - \phi_1 L)\Delta y_t = c + (1 + \theta_1 L + \theta_2 L^2 + \theta_3 L^3 + \theta_4 L^4) \epsilon_{t}

When the specification parameter is given as a maximum degree of the lag
polynomial, it implies that all polynomial terms up to that degree are
included. Notice that this is *not* the model we want to use, because it
would include terms for :math:`\epsilon_{t-2}` and
:math:`\epsilon_{t-3}`, which we don't want here.

What we want is a polynomial that has terms for the 1st and 4th degrees,
but leaves out the 2nd and 3rd terms. To do that, we need to provide a
tuple for the specifiation parameter, where the tuple describes **the
lag polynomial itself**. In particular, here we would want to use:

.. code:: python

    ar = 1          # this is the maximum degree specification
    ma = (1,0,0,1)  # this is the lag polynomial specification
    mod = sm.tsa.statespace.SARIMAX(data['wpi'], trend='c', order=(ar,1,ma)))

This gives the following form for the process of the data:

.. math::


   \Delta y_t = c + \phi_1 \Delta y_{t-1} + \theta_1 \epsilon_{t-1} + \theta_4 \epsilon_{t-4} + \epsilon_{t} \\
   (1 - \phi_1 L)\Delta y_t = c + (1 + \theta_1 L + \theta_4 L^4) \epsilon_{t}

which is what we want.

.. code:: python

    # Fit the model
    mod = ssm.SARIMAX(data['ln_wpi'], trend='c', order=(1,1,(1,0,0,1)))
    res = mod.fit(method='powell', disp=False)
    print(res.summary())


.. parsed-literal::

                                 Statespace Model Results                            
    =================================================================================
    Dep. Variable:                    ln_wpi   No. Observations:                  124
    Model:             SARIMAX(1, 1, (1, 4))   Log Likelihood                 386.129
    Date:                   Mon, 21 Sep 2015   AIC                           -762.258
    Time:                           14:49:49   BIC                           -748.157
    Sample:                       01-01-1960   HQIC                          -756.530
                                - 10-01-1990                                         
    Covariance Type:                     opg                                         
    ==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    intercept      0.0027      0.002      1.670      0.095      -0.000       0.006
    ar.L1          0.7717      0.095      8.147      0.000       0.586       0.957
    ma.L1         -0.4021      0.126     -3.194      0.001      -0.649      -0.155
    ma.L4          0.3158      0.119      2.647      0.008       0.082       0.550
    sigma2         0.0001   9.74e-06     11.175      0.000    8.97e-05       0.000
    ===================================================================================
    Ljung-Box (Q):                       29.37   Jarque-Bera (JB):                44.36
    Prob(Q):                              0.89   Prob(JB):                         0.00
    Heteroskedasticity (H):               2.56   Skew:                             0.29
    Prob(H) (two-sided):                  0.00   Kurtosis:                         5.88
    ===================================================================================
    
    Warnings:
    [1] Covariance matrix calculated using the outer product of gradients.


ARIMA Example 3: Airline Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the previous example, we included a seasonal effect in an *additive*
way, meaning that we added a term allowing the process to depend on the
4th MA lag. It may be instead that we want to model a seasonal effect in
a multiplicative way. We often write the model then as an ARIMA
:math:`(p,d,q) \times (P,D,Q)_s`, where the lowercast letters indicate
the specification for the non-seasonal component, and the uppercase
letters indicate the specification for the seasonal component; :math:`s`
is the periodicity of the seasons (e.g. it is often 4 for quarterly data
or 12 for monthly data). The data process can be written generically as:

.. math::


   \phi_p (L) \tilde \phi_P (L^s) \Delta^d \Delta_s^D y_t = A(t) + \theta_q (L) \tilde \theta_Q (L^s) \epsilon_t

where:

-  :math:`\phi_p (L)` is the non-seasonal autoregressive lag polynomial
-  :math:`\tilde \phi_P (L^s)` is the seasonal autoregressive lag
   polynomial
-  :math:`\Delta^d \Delta_s^D y_t` is the time series, differenced
   :math:`d` times, and seasonally differenced :math:`D` times.
-  :math:`A(t)` is the trend polynomial (including the intercept)
-  :math:`\theta_q (L)` is the non-seasonal moving average lag
   polynomial
-  :math:`\tilde \theta_Q (L^s)` is the seasonal moving average lag
   polynomial

sometimes we rewrite this as:

.. math::


   \phi_p (L) \tilde \phi_P (L^s) y_t^* = A(t) + \theta_q (L) \tilde \theta_Q (L^s) \epsilon_t

where :math:`y_t^* = \Delta^d \Delta_s^D y_t`. This emphasizes that just
as in the simple case, after we take differences (here both non-seasonal
and seasonal) to make the data stationary, the resulting model is just
an ARMA model.

As an example, consider the airline model ARIMA
:math:`(2,1,0) \times (1,1,0)_{12}`, with an intercept. The data process
can be written in the form above as:

.. math::


   (1 - \phi_1 L - \phi_2 L^2) (1 - \tilde \phi_1 L^{12}) \Delta \Delta_{12} y_t = c + \epsilon_t

Here, we have:

-  :math:`\phi_p (L) = (1 - \phi_1 L - \phi_2 L^2)`
-  :math:`\tilde \phi_P (L^s) = (1 - \phi_1 L^12)`
-  :math:`d = 1, D = 1, s=12` indicating that :math:`y_t^*` is derived
   from :math:`y_t` by taking first-differences and then taking 12-th
   differences.
-  :math:`A(t) = c` is the *constant* trend polynomial (i.e. just an
   intercept)
-  :math:`\theta_q (L) = \tilde \theta_Q (L^s) = 1` (i.e. there is no
   moving average effect)

It may still be confusing to see the two lag polynomials in front of the
time-series variable, but notice that we can multiply the lag
polynomials together to get the following model:

.. math::


   (1 - \phi_1 L - \phi_2 L^2 - \tilde \phi_1 L^{12} + \phi_1 \tilde \phi_1 L^{13} + \phi_2 \tilde \phi_1 L^{14} ) y_t^* = c + \epsilon_t

which can be rewritten as:

.. math::


   y_t^* = c + \phi_1 y_{t-1}^* + \phi_2 y_{t-2}^* + \tilde \phi_1 y_{t-12}^* - \phi_1 \tilde \phi_1 y_{t-13}^* - \phi_2 \tilde \phi_1 y_{t-14}^* + \epsilon_t

This is similar to the additively seasonal model from example 2, but the
coefficients in front of the autoregressive lags are actually
combinations of the underlying seasonal and non-seasonal parameters.

Specifying the model here is done simply by adding the
``seasonal_order`` argument, which accepts a tuple of the form
``(Seasonal AR specification, Seasonal Integration order, Seasonal MA, Seasonal periodicity)``.
The seasonal AR and MA specifications, as before, can be expressed as a
maximum polynomial degree or as the lag polynomial itself. Seasonal
periodicity is an integer.

For the airline model ARIMA :math:`(2,1,0) \times (1,1,0)_{12}` with an
intercept, the command is:

.. code:: python

    mod = sm.tsa.statespace.SARIMAX(data['lnair'], order=(2,1,0), seasonal_order=(1,1,0,12))

.. code:: python

    # Dataset
    data = pd.read_stata('data/air2.dta')
    data.index = pd.date_range(start=datetime(data.time[0], 1, 1), periods=len(data), freq='MS')
    data['lnair'] = np.log(data['air'])
    
    # Fit the model
    mod = ssm.SARIMAX(data['lnair'], order=(2,1,0), seasonal_order=(1,1,0,12), simple_differencing=True)
    res = mod.fit()
    print(res.summary())


.. parsed-literal::

                                     Statespace Model Results                                 
    ==========================================================================================
    Dep. Variable:                       D.DS12.lnair   No. Observations:                  131
    Model:             SARIMAX(2, 0, 0)x(1, 0, 0, 12)   Log Likelihood                 240.821
    Date:                            Mon, 21 Sep 2015   AIC                           -473.643
    Time:                                    14:49:49   BIC                           -462.142
    Sample:                                01-01-1949   HQIC                          -468.970
                                         - 12-01-1960                                         
    Covariance Type:                              opg                                         
    ==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    ar.L1         -0.4057      0.066     -6.141      0.000      -0.535      -0.276
    ar.L2         -0.0799      0.023     -3.468      0.001      -0.125      -0.035
    ar.S.L12      -0.4723      0.064     -7.369      0.000      -0.598      -0.347
    sigma2         0.0014      0.000      8.535      0.000       0.001       0.002
    ===================================================================================
    Ljung-Box (Q):                       49.89   Jarque-Bera (JB):                 0.72
    Prob(Q):                              0.14   Prob(JB):                         0.70
    Heteroskedasticity (H):               0.54   Skew:                             0.14
    Prob(H) (two-sided):                  0.04   Kurtosis:                         3.23
    ===================================================================================
    
    Warnings:
    [1] Covariance matrix calculated using the outer product of gradients.


Notice that here we used an additional argument
``simple_differencing=True``. This controls how the order of integration
is handled in ARIMA models. If ``simple_differencing=True``, then the
time series provided as ``endog`` is literatlly differenced and an ARMA
model is fit to the resulting new time series. This implies that a
number of initial periods are lost to the differencing process, however
it may be necessary either to compare results to other packages (e.g.
Stata's ``arima`` always uses simple differencing) or if the seasonal
periodicity is large.

The default is ``simple_differencing=False``, in which case the
integration component is implemented as part of the state space
formulation, and all of the original data can be used in estimation.

ARIMA Example 4: ARMAX (Friedman)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This model demonstrates the use of explanatory variables (the X part of
ARMAX). When exogenous regressors are included, the SARIMAX module uses
the concept of "regression with SARIMA errors" (see
http://robjhyndman.com/hyndsight/arimax/ for details of regression with
ARIMA errors versus alternative specifications), so that the model is
specified as:

.. math::


   y_t = \beta_t x_t + u_t \\
           \phi_p (L) \tilde \phi_P (L^s) \Delta^d \Delta_s^D u_t = A(t) +
               \theta_q (L) \tilde \theta_Q (L^s) \epsilon_t

Notice that the first equation is just a linear regression, and the
second equation just describes the process followed by the error
component as SARIMA (as was described in example 3). One reason for this
specification is that the estimated parameters have their natural
interpretations.

This specification nests many simpler specifications. For example,
regression with AR(2) errors is:

.. math::


   y_t = \beta_t x_t + u_t \\
   (1 - \phi_1 L - \phi_2 L^2) u_t = A(t) + \epsilon_t

The model considered in this example is regression with ARMA(1,1)
errors. The process is then written:

.. math::


   \text{consump}_t = \beta_0 + \beta_1 \text{m2}_t + u_t \\
   (1 - \phi_1 L) u_t = (1 - \theta_1 L) \epsilon_t

Notice that :math:`\beta_0` is, as described in example 1 above, *not*
the same thing as an intercept specified by ``trend='c'``. Whereas in
the examples above we estimated the intercept of the model via the trend
polynomial, here, we demonstrate how to estimate :math:`\beta_0` itself
by adding a constant to the exogenous dataset. In the output, the
:math:`beta_0` is called ``const``, whereas above the intercept
:math:`c` was called ``intercept`` in the output.

.. code:: python

    # Dataset
    data = pd.read_stata('data/friedman2.dta')
    data.index = data.time
    
    # Variables
    endog = data.ix['1959':'1981', 'consump']
    exog = sm.add_constant(data.ix['1959':'1981', 'm2'])
    
    # Fit the model
    mod = ssm.SARIMAX(endog, exog, order=(1,0,1))
    res = mod.fit()
    print(res.summary())


::


    ---------------------------------------------------------------------------

    NameError                                 Traceback (most recent call last)

    <ipython-input-9-ee4d9b9ad44e> in <module>()
          5 # Variables
          6 endog = data.ix['1959':'1981', 'consump']
    ----> 7 exog = sm.add_constant(data.ix['1959':'1981', 'm2'])
          8 
          9 # Fit the model


    NameError: name 'sm' is not defined


ARIMA Postestimation: Example 1 - Dynamic Forecasting
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here we describe some of the post-estimation capabilities of SARIMAX.

First, using the model from example, we estimate the parameters using
data that *excludes the last few observations* (this is a little
artificial as an example, but it allows considering performance of
out-of-sample forecasting and facilitates comparison to Stata's
documentation).

.. code:: python

    # Dataset
    raw = pd.read_stata('data/friedman2.dta')
    raw.index = raw.time
    data = raw.ix[:'1981']
    
    # Variables
    endog = data.ix['1959':, 'consump']
    exog = sm.add_constant(data.ix['1959':, 'm2'])
    nobs = endog.shape[0]
    
    # Fit the model
    mod = ssm.SARIMAX(endog.ix[:'1978-01-01'], exog=exog.ix[:'1978-01-01'], order=(1,0,1))
    fit_res = mod.fit()
    print(fit_res.summary())


::


    ---------------------------------------------------------------------------

    NameError                                 Traceback (most recent call last)

    <ipython-input-10-ed021f3dfdac> in <module>()
          6 # Variables
          7 endog = data.ix['1959':, 'consump']
    ----> 8 exog = sm.add_constant(data.ix['1959':, 'm2'])
          9 nobs = endog.shape[0]
         10 


    NameError: name 'sm' is not defined


Next, we want to get results for the full dataset but using the
estimated parameters (on a subset of the data).

.. code:: python

    mod = sm.tsa.statespace.SARIMAX(endog, exog=exog, order=(1,0,1))
    res = mod.filter(fit_res.params)


::


    ---------------------------------------------------------------------------

    NameError                                 Traceback (most recent call last)

    <ipython-input-11-404177711e7b> in <module>()
    ----> 1 mod = sm.tsa.statespace.SARIMAX(endog, exog=exog, order=(1,0,1))
          2 res = mod.filter(fit_res.params)


    NameError: name 'sm' is not defined


The ``predict`` command is first applied here to get in-sample
predictions. We use the ``full_results=True`` argument to allow us to
calculate confidence intervals (the default output of ``predict`` is
just the predicted values).

With no other arguments, ``predict`` returns the one-step-ahead
in-sample predictions for the entire sample.

.. code:: python

    # In-sample one-step-ahead predictions
    predict = res.get_prediction()
    predict_ci = predict.conf_int()

We can also get *dynamic predictions*. One-step-ahead prediction uses
the true values of the endogenous values at each step to predict the
next in-sample value. Dynamic predictions use one-step-ahead prediction
up to some point in the dataset (specified by the ``dynamic`` argument);
after that, the previous *predicted* endogenous values are used in place
of the true endogenous values for each new predicted element.

The ``dynamic`` argument is specified to be an *offset* relative to the
``start`` argument. If ``start`` is not specified, it is assumed to be
``0``.

Here we perform dynamic prediction starting in the first quarter of
1978.

.. code:: python

    # Dynamic predictions
    predict_dy = res.get_prediction(dynamic='1978-01-01')
    predict_dy_ci = predict_dy.conf_int()


::


    ---------------------------------------------------------------------------

    ValueError                                Traceback (most recent call last)

    <ipython-input-13-1bf0a0656a56> in <module>()
          1 # Dynamic predictions
    ----> 2 predict_dy = res.get_prediction(dynamic='1978-01-01')
          3 predict_dy_ci = predict_dy.conf_int()


    /Users/fulton/projects/statsmodels/statsmodels/tsa/statespace/mlemodel.pyc in get_prediction(self, start, end, dynamic, **kwargs)
       1795             except KeyError:
       1796                 raise ValueError("Dynamic must be in dates. Got %s | %s" %
    -> 1797                                  (str(dynamic), str(dtdynamic)))
       1798 
       1799         # Perform the prediction


    ValueError: Dynamic must be in dates. Got 1978-01-01 | 1978-01-01 00:00:00


We can graph the one-step-ahead and dynamic predictions (and the
corresponding confidence intervals) to see their relative performance.
Notice that up to the point where dynamic prediction begins (1978:Q1),
the two are the same.

.. code:: python

    # Graph
    fig, ax = plt.subplots(figsize=(9,4))
    npre = 4
    ax.set(title='Personal consumption', xlabel='Date', ylabel='Billions of dollars')
    
    # Plot data points
    data.ix['1977-07-01':, 'consump'].plot(ax=ax, style='o', label='Observed')
    
    # Plot predictions
    predict.predicted_mean.ix['1977-07-01':].plot(ax=ax, style='r--', label='One-step-ahead forecast')
    ci = predict_ci.ix['1977-07-01':]
    ax.fill_between(ci.index, ci.ix[:,0], ci.ix[:,1], color='r', alpha=0.1)
    predict_dy.predicted_mean.ix['1977-07-01':].plot(ax=ax, style='g', label='Dynamic forecast (1978)')
    ci = predict_dy_ci.ix['1977-07-01':]
    ax.fill_between(ci.index, ci.ix[:,0], ci.ix[:,1], color='g', alpha=0.1)
    
    legend = ax.legend(loc='lower right')


::


    ---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)

    <ipython-input-14-95a397aa477f> in <module>()
          8 
          9 # Plot predictions
    ---> 10 predict.predicted_mean.ix['1977-07-01':].plot(ax=ax, style='r--', label='One-step-ahead forecast')
         11 ci = predict_ci.ix['1977-07-01':]
         12 ax.fill_between(ci.index, ci.ix[:,0], ci.ix[:,1], color='r', alpha=0.1)


    /Users/fulton/.virtualenvs/default/lib/python2.7/site-packages/pandas-0.16.2-py2.7-macosx-10.9-x86_64.egg/pandas/tools/plotting.pyc in plot_series(data, kind, ax, figsize, use_index, title, grid, legend, style, logx, logy, loglog, xticks, yticks, xlim, ylim, rot, fontsize, colormap, table, yerr, xerr, label, secondary_y, **kwds)
       2517                  yerr=yerr, xerr=xerr,
       2518                  label=label, secondary_y=secondary_y,
    -> 2519                  **kwds)
       2520 
       2521 


    /Users/fulton/.virtualenvs/default/lib/python2.7/site-packages/pandas-0.16.2-py2.7-macosx-10.9-x86_64.egg/pandas/tools/plotting.pyc in _plot(data, x, y, subplots, ax, kind, **kwds)
       2322         plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds)
       2323 
    -> 2324     plot_obj.generate()
       2325     plot_obj.draw()
       2326     return plot_obj.result


    /Users/fulton/.virtualenvs/default/lib/python2.7/site-packages/pandas-0.16.2-py2.7-macosx-10.9-x86_64.egg/pandas/tools/plotting.pyc in generate(self)
        910     def generate(self):
        911         self._args_adjust()
    --> 912         self._compute_plot_data()
        913         self._setup_subplots()
        914         self._make_plot()


    /Users/fulton/.virtualenvs/default/lib/python2.7/site-packages/pandas-0.16.2-py2.7-macosx-10.9-x86_64.egg/pandas/tools/plotting.pyc in _compute_plot_data(self)
       1015         if is_empty:
       1016             raise TypeError('Empty {0!r}: no numeric data to '
    -> 1017                             'plot'.format(numeric_data.__class__.__name__))
       1018 
       1019         self.data = numeric_data


    TypeError: Empty 'DataFrame': no numeric data to plot


.. image:: sarimax_stata.ipynb_27_1.png


Finally, graph the prediction *error*. It is obvious that, as one would
suspect, one-step-ahead prediction is considerably better.

.. code:: python

    # Prediction error
    
    # Graph
    fig, ax = plt.subplots(figsize=(9,4))
    npre = 4
    ax.set(title='Forecast error', xlabel='Date', ylabel='Forecast - Actual')
    
    # In-sample one-step-ahead predictions and 95% confidence intervals
    predict_error = predict.predicted_mean - endog
    predict_error.ix['1977-10-01':].plot(ax=ax, label='One-step-ahead forecast')
    ci = predict_ci.ix['1977-10-01':].copy()
    ci.iloc[:,0] -= endog.loc['1977-10-01':]
    ci.iloc[:,1] -= endog.loc['1977-10-01':]
    ax.fill_between(ci.index, ci.ix[:,0], ci.ix[:,1], alpha=0.1)
    
    # Dynamic predictions and 95% confidence intervals
    predict_dy_error = predict_dy.predicted_mean - endog
    predict_dy_error.ix['1977-10-01':].plot(ax=ax, style='r', label='Dynamic forecast (1978)')
    ci = predict_dy_ci.ix['1977-10-01':].copy()
    ci.iloc[:,0] -= endog.loc['1977-10-01':]
    ci.iloc[:,1] -= endog.loc['1977-10-01':]
    ax.fill_between(ci.index, ci.ix[:,0], ci.ix[:,1], color='r', alpha=0.1)
    
    legend = ax.legend(loc='lower left');
    legend.get_frame().set_facecolor('w')


::


    ---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)

    <ipython-input-15-bb71ad6772cd> in <module>()
         12 ci.iloc[:,0] -= endog.loc['1977-10-01':]
         13 ci.iloc[:,1] -= endog.loc['1977-10-01':]
    ---> 14 ax.fill_between(ci.index, ci.ix[:,0], ci.ix[:,1], alpha=0.1)
         15 
         16 # Dynamic predictions and 95% confidence intervals


    /Users/fulton/.virtualenvs/default/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in fill_between(self, x, y1, y2, where, interpolate, **kwargs)
       4329 
       4330         # Convert the arrays so we can work with them
    -> 4331         x = ma.masked_invalid(self.convert_xunits(x))
       4332         y1 = ma.masked_invalid(self.convert_yunits(y1))
       4333         y2 = ma.masked_invalid(self.convert_yunits(y2))


    /Users/fulton/.virtualenvs/default/lib/python2.7/site-packages/numpy/ma/core.pyc in masked_invalid(a, copy)
       2242         cls = type(a)
       2243     else:
    -> 2244         condition = ~(np.isfinite(a))
       2245         cls = MaskedArray
       2246     result = a.view(cls)


    TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''


.. image:: sarimax_stata.ipynb_29_1.png