自迴歸

此筆記本介紹使用 AutoReg 模型進行自迴歸建模。它還涵蓋了 ar_select_order 在選擇使諸如 AIC 等訊息準則最小化的模型方面的輔助。自迴歸模型的動態由下式給出

\[y_t = \delta + \phi_1 y_{t-1} + \ldots + \phi_p y_{t-p} + \epsilon_t.\]

AutoReg 也允許具有以下項的模型

  • 確定性項 (trend)

    • n:無確定性項

    • c:常數(預設)

    • ct:常數和時間趨勢

    • t:僅時間趨勢

  • 季節虛擬變數 (seasonal)

    • True 包含 \(s-1\) 個虛擬變數,其中 \(s\) 是時間序列的週期(例如,每月為 12)

  • 自訂確定性項 (deterministic)

    • 接受 DeterministicProcess

  • 外生變數 (exog)

    • 要包含在模型中的外生變數的 DataFramearray

  • 省略選定的落後期數 (lags)

    • 如果 lags 是整數的可迭代物件,則僅將這些包含在模型中。

完整規範為

\[y_t = \delta_0 + \delta_1 t + \phi_1 y_{t-1} + \ldots + \phi_p y_{t-p} + \sum_{i=1}^{s-1} \gamma_i d_i + \sum_{j=1}^{m} \kappa_j x_{t,j} + \epsilon_t.\]

其中

  • \(d_i\) 是一個季節虛擬變數,如果 \(mod(t, period) = i\),則為 1。如果模型包含常數(trend 中有 c),則排除週期 0。

  • \(t\) 是一個時間趨勢 (\(1,2,\ldots\)),在第一個觀察值中從 1 開始。

  • \(x_{t,j}\) 是外生迴歸量。注意,在定義模型時,這些在外生變數時間上與左側變數對齊。

  • \(\epsilon_t\) 假設為白雜訊過程。

第一個單元格匯入標準套件,並設定繪圖以內嵌顯示。

[1]:
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import pandas_datareader as pdr
import seaborn as sns
from statsmodels.tsa.api import acf, graphics, pacf
from statsmodels.tsa.ar_model import AutoReg, ar_select_order

此單元格設定繪圖樣式,為 matplotlib 註冊 pandas 日期轉換器,並設定預設圖形大小。

[2]:
sns.set_style("darkgrid")
pd.plotting.register_matplotlib_converters()
# Default figure size
sns.mpl.rc("figure", figsize=(16, 6))
sns.mpl.rc("font", size=14)

第一組範例使用美國房屋開工率的月度成長率,該成長率未經季節性調整。季節性可以從規律的峰谷模式中明顯看出。我們將時間序列的頻率設定為「MS」(月份開始),以避免在使用 AutoReg 時出現警告。

[3]:
data = pdr.get_data_fred("HOUSTNSA", "1959-01-01", "2019-06-01")
housing = data.HOUSTNSA.pct_change().dropna()
# Scale by 100 to get percentages
housing = 100 * housing.asfreq("MS")
fig, ax = plt.subplots()
ax = housing.plot(ax=ax)
../../../_images/examples_notebooks_generated_autoregressions_6_0.png

我們可以從 AR(3) 開始。雖然這不是此數據的好模型,但它演示了 API 的基本用法。

[4]:
mod = AutoReg(housing, 3, old_names=False)
res = mod.fit()
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:                     AutoReg(3)   Log Likelihood               -2993.442
Method:               Conditional MLE   S.D. of innovations             15.289
Date:                Thu, 03 Oct 2024   AIC                           5996.884
Time:                        15:55:21   BIC                           6019.794
Sample:                    05-01-1959   HQIC                          6005.727
                         - 06-01-2019
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const           1.1228      0.573      1.961      0.050       0.000       2.245
HOUSTNSA.L1     0.1910      0.036      5.235      0.000       0.120       0.263
HOUSTNSA.L2     0.0058      0.037      0.155      0.877      -0.067       0.079
HOUSTNSA.L3    -0.1939      0.036     -5.319      0.000      -0.265      -0.122
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            0.9680           -1.3298j            1.6448           -0.1499
AR.2            0.9680           +1.3298j            1.6448            0.1499
AR.3           -1.9064           -0.0000j            1.9064           -0.5000
-----------------------------------------------------------------------------

AutoReg 支援與 OLS 相同的共變異數估計器。以下,我們使用 cov_type="HC0",它是 White 的共變異數估計器。雖然參數估計值相同,但所有取決於標準誤差的量都會發生變化。

[5]:
res = mod.fit(cov_type="HC0")
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:                     AutoReg(3)   Log Likelihood               -2993.442
Method:               Conditional MLE   S.D. of innovations             15.289
Date:                Thu, 03 Oct 2024   AIC                           5996.884
Time:                        15:55:21   BIC                           6019.794
Sample:                    05-01-1959   HQIC                          6005.727
                         - 06-01-2019
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const           1.1228      0.601      1.869      0.062      -0.055       2.300
HOUSTNSA.L1     0.1910      0.035      5.499      0.000       0.123       0.259
HOUSTNSA.L2     0.0058      0.039      0.150      0.881      -0.070       0.081
HOUSTNSA.L3    -0.1939      0.036     -5.448      0.000      -0.264      -0.124
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            0.9680           -1.3298j            1.6448           -0.1499
AR.2            0.9680           +1.3298j            1.6448            0.1499
AR.3           -1.9064           -0.0000j            1.9064           -0.5000
-----------------------------------------------------------------------------
[6]:
sel = ar_select_order(housing, 13, old_names=False)
sel.ar_lags
res = sel.model.fit()
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:                    AutoReg(13)   Log Likelihood               -2676.157
Method:               Conditional MLE   S.D. of innovations             10.378
Date:                Thu, 03 Oct 2024   AIC                           5382.314
Time:                        15:55:21   BIC                           5450.835
Sample:                    03-01-1960   HQIC                          5408.781
                         - 06-01-2019
================================================================================
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
const            1.3615      0.458      2.970      0.003       0.463       2.260
HOUSTNSA.L1     -0.2900      0.036     -8.161      0.000      -0.360      -0.220
HOUSTNSA.L2     -0.0828      0.031     -2.652      0.008      -0.144      -0.022
HOUSTNSA.L3     -0.0654      0.031     -2.106      0.035      -0.126      -0.005
HOUSTNSA.L4     -0.1596      0.031     -5.166      0.000      -0.220      -0.099
HOUSTNSA.L5     -0.0434      0.031     -1.387      0.165      -0.105       0.018
HOUSTNSA.L6     -0.0884      0.031     -2.867      0.004      -0.149      -0.028
HOUSTNSA.L7     -0.0556      0.031     -1.797      0.072      -0.116       0.005
HOUSTNSA.L8     -0.1482      0.031     -4.803      0.000      -0.209      -0.088
HOUSTNSA.L9     -0.0926      0.031     -2.960      0.003      -0.154      -0.031
HOUSTNSA.L10    -0.1133      0.031     -3.665      0.000      -0.174      -0.053
HOUSTNSA.L11     0.1151      0.031      3.699      0.000       0.054       0.176
HOUSTNSA.L12     0.5352      0.031     17.133      0.000       0.474       0.596
HOUSTNSA.L13     0.3178      0.036      8.937      0.000       0.248       0.388
                                    Roots
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1             1.0913           -0.0000j            1.0913           -0.0000
AR.2             0.8743           -0.5018j            1.0080           -0.0829
AR.3             0.8743           +0.5018j            1.0080            0.0829
AR.4             0.5041           -0.8765j            1.0111           -0.1669
AR.5             0.5041           +0.8765j            1.0111            0.1669
AR.6             0.0056           -1.0530j            1.0530           -0.2491
AR.7             0.0056           +1.0530j            1.0530            0.2491
AR.8            -0.5263           -0.9335j            1.0716           -0.3317
AR.9            -0.5263           +0.9335j            1.0716            0.3317
AR.10           -0.9525           -0.5880j            1.1194           -0.4120
AR.11           -0.9525           +0.5880j            1.1194            0.4120
AR.12           -1.2928           -0.2608j            1.3189           -0.4683
AR.13           -1.2928           +0.2608j            1.3189            0.4683
------------------------------------------------------------------------------

plot_predict 可視覺化預測。這裡我們產生大量的預測,顯示模型捕獲的字串季節性。

[7]:
fig = res.plot_predict(720, 840)
../../../_images/examples_notebooks_generated_autoregressions_13_0.png

plot_diagnositcs 表明該模型捕獲了數據中的關鍵特徵。

[8]:
fig = plt.figure(figsize=(16, 9))
fig = res.plot_diagnostics(fig=fig, lags=30)
../../../_images/examples_notebooks_generated_autoregressions_15_0.png

季節虛擬變數

AutoReg 支援季節虛擬變數,這是對季節性建模的替代方法。包含虛擬變數可將動態縮短為僅 AR(2)。

[9]:
sel = ar_select_order(housing, 13, seasonal=True, old_names=False)
sel.ar_lags
res = sel.model.fit()
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:               Seas. AutoReg(2)   Log Likelihood               -2652.556
Method:               Conditional MLE   S.D. of innovations              9.487
Date:                Thu, 03 Oct 2024   AIC                           5335.112
Time:                        15:55:27   BIC                           5403.863
Sample:                    04-01-1959   HQIC                          5361.648
                         - 06-01-2019
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const           1.2726      1.373      0.927      0.354      -1.418       3.963
s(2,12)        32.6477      1.824     17.901      0.000      29.073      36.222
s(3,12)        23.0685      2.435      9.472      0.000      18.295      27.842
s(4,12)        10.7267      2.693      3.983      0.000       5.449      16.005
s(5,12)         1.6792      2.100      0.799      0.424      -2.437       5.796
s(6,12)        -4.4229      1.896     -2.333      0.020      -8.138      -0.707
s(7,12)        -4.2113      1.824     -2.309      0.021      -7.786      -0.636
s(8,12)        -6.4124      1.791     -3.581      0.000      -9.922      -2.902
s(9,12)         0.1095      1.800      0.061      0.952      -3.419       3.638
s(10,12)      -16.7511      1.814     -9.234      0.000     -20.307     -13.196
s(11,12)      -20.7023      1.862    -11.117      0.000     -24.352     -17.053
s(12,12)      -11.9554      1.778     -6.724      0.000     -15.440      -8.470
HOUSTNSA.L1    -0.2953      0.037     -7.994      0.000      -0.368      -0.223
HOUSTNSA.L2    -0.1148      0.037     -3.107      0.002      -0.187      -0.042
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1           -1.2862           -2.6564j            2.9514           -0.3218
AR.2           -1.2862           +2.6564j            2.9514            0.3218
-----------------------------------------------------------------------------

季節虛擬變數在預測中很明顯,預測在未來 10 年的所有期間都具有顯著的季節性成分。

[10]:
fig = res.plot_predict(720, 840)
../../../_images/examples_notebooks_generated_autoregressions_20_0.png
[11]:
fig = plt.figure(figsize=(16, 9))
fig = res.plot_diagnostics(lags=30, fig=fig)
../../../_images/examples_notebooks_generated_autoregressions_21_0.png

季節動態

雖然 AutoReg 不直接支援季節成分,因為它使用 OLS 來估計參數,但可以使用過參數化的季節 AR 來捕獲季節動態,該季節 AR 不會強加季節 AR 中的限制。

[12]:
yoy_housing = data.HOUSTNSA.pct_change(12).resample("MS").last().dropna()
_, ax = plt.subplots()
ax = yoy_housing.plot(ax=ax)
../../../_images/examples_notebooks_generated_autoregressions_24_0.png

我們首先使用僅選擇最大落後期的簡單方法來選擇模型。所有較低的落後期都會自動包含。要檢查的最大落後期設定為 13,因為這允許模型嵌套一個具有短期 AR(1) 成分和季節性 AR(1) 成分的季節性 AR,因此

\[(1-\phi_s L^{12})(1-\phi_1 L)y_t = \epsilon_t\]

變成

\[y_t = \phi_1 y_{t-1} +\phi_s Y_{t-12} - \phi_1\phi_s Y_{t-13} + \epsilon_t\]

展開時。AutoReg 不強制執行結構,但可以估計嵌套模型

\[y_t = \phi_1 y_{t-1} +\phi_{12} Y_{t-12} - \phi_{13} Y_{t-13} + \epsilon_t.\]

我們看到選擇了所有 13 個落後期。

[13]:
sel = ar_select_order(yoy_housing, 13, old_names=False)
sel.ar_lags
[13]:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

似乎不太可能需要所有 13 個落後期。我們可以設定 glob=True 來搜尋所有包含最多 13 個落後期的 \(2^{13}\) 個模型。

在這裡,我們看到選擇了前三個,第 7 個也被選擇,最後選擇了第 12 個和第 13 個。這在表面上與上述結構相似。

在擬合模型後,我們看一下診斷圖,該圖表明此規範似乎足以捕獲數據中的動態。

[14]:
sel = ar_select_order(yoy_housing, 13, glob=True, old_names=False)
sel.ar_lags
res = sel.model.fit()
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  714
Model:             Restr. AutoReg(13)   Log Likelihood                 589.177
Method:               Conditional MLE   S.D. of innovations              0.104
Date:                Thu, 03 Oct 2024   AIC                          -1162.353
Time:                        15:55:45   BIC                          -1125.933
Sample:                    02-01-1961   HQIC                         -1148.276
                         - 06-01-2019
================================================================================
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
const            0.0035      0.004      0.875      0.382      -0.004       0.011
HOUSTNSA.L1      0.5640      0.035     16.167      0.000       0.496       0.632
HOUSTNSA.L2      0.2347      0.038      6.238      0.000       0.161       0.308
HOUSTNSA.L3      0.2051      0.037      5.560      0.000       0.133       0.277
HOUSTNSA.L7     -0.0903      0.030     -2.976      0.003      -0.150      -0.031
HOUSTNSA.L12    -0.3791      0.034    -11.075      0.000      -0.446      -0.312
HOUSTNSA.L13     0.3354      0.033     10.254      0.000       0.271       0.400
                                    Roots
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            -1.0309           -0.2682j            1.0652           -0.4595
AR.2            -1.0309           +0.2682j            1.0652            0.4595
AR.3            -0.7454           -0.7417j            1.0515           -0.3754
AR.4            -0.7454           +0.7417j            1.0515            0.3754
AR.5            -0.3172           -1.0221j            1.0702           -0.2979
AR.6            -0.3172           +1.0221j            1.0702            0.2979
AR.7             0.2419           -1.0573j            1.0846           -0.2142
AR.8             0.2419           +1.0573j            1.0846            0.2142
AR.9             0.7840           -0.8303j            1.1420           -0.1296
AR.10            0.7840           +0.8303j            1.1420            0.1296
AR.11            1.0730           -0.2386j            1.0992           -0.0348
AR.12            1.0730           +0.2386j            1.0992            0.0348
AR.13            1.1193           -0.0000j            1.1193           -0.0000
------------------------------------------------------------------------------
[15]:
fig = plt.figure(figsize=(16, 9))
fig = res.plot_diagnostics(fig=fig, lags=30)
../../../_images/examples_notebooks_generated_autoregressions_29_0.png

我們還可以包含季節虛擬變數。由於模型使用的是年同比變化,因此這些都是不顯著的。

[16]:
sel = ar_select_order(yoy_housing, 13, glob=True, seasonal=True, old_names=False)
sel.ar_lags
res = sel.model.fit()
print(res.summary())
                               AutoReg Model Results
====================================================================================
Dep. Variable:                     HOUSTNSA   No. Observations:                  714
Model:             Restr. Seas. AutoReg(13)   Log Likelihood                 590.875
Method:                     Conditional MLE   S.D. of innovations              0.104
Date:                      Thu, 03 Oct 2024   AIC                          -1143.751
Time:                              16:08:11   BIC                          -1057.253
Sample:                          02-01-1961   HQIC                         -1110.317
                               - 06-01-2019
================================================================================
                   coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------
const            0.0167      0.014      1.215      0.224      -0.010       0.044
s(2,12)         -0.0179      0.019     -0.931      0.352      -0.056       0.020
s(3,12)         -0.0121      0.019     -0.630      0.528      -0.050       0.026
s(4,12)         -0.0210      0.019     -1.089      0.276      -0.059       0.017
s(5,12)         -0.0223      0.019     -1.157      0.247      -0.060       0.015
s(6,12)         -0.0224      0.019     -1.160      0.246      -0.060       0.015
s(7,12)         -0.0212      0.019     -1.096      0.273      -0.059       0.017
s(8,12)         -0.0101      0.019     -0.520      0.603      -0.048       0.028
s(9,12)         -0.0095      0.019     -0.491      0.623      -0.047       0.028
s(10,12)        -0.0049      0.019     -0.252      0.801      -0.043       0.033
s(11,12)        -0.0084      0.019     -0.435      0.664      -0.046       0.030
s(12,12)        -0.0077      0.019     -0.400      0.689      -0.046       0.030
HOUSTNSA.L1      0.5630      0.035     16.160      0.000       0.495       0.631
HOUSTNSA.L2      0.2347      0.038      6.248      0.000       0.161       0.308
HOUSTNSA.L3      0.2075      0.037      5.634      0.000       0.135       0.280
HOUSTNSA.L7     -0.0916      0.030     -3.013      0.003      -0.151      -0.032
HOUSTNSA.L12    -0.3810      0.034    -11.149      0.000      -0.448      -0.314
HOUSTNSA.L13     0.3373      0.033     10.327      0.000       0.273       0.401
                                    Roots
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            -1.0305           -0.2681j            1.0648           -0.4595
AR.2            -1.0305           +0.2681j            1.0648            0.4595
AR.3            -0.7447           -0.7414j            1.0509           -0.3754
AR.4            -0.7447           +0.7414j            1.0509            0.3754
AR.5            -0.3172           -1.0215j            1.0696           -0.2979
AR.6            -0.3172           +1.0215j            1.0696            0.2979
AR.7             0.2416           -1.0568j            1.0841           -0.2142
AR.8             0.2416           +1.0568j            1.0841            0.2142
AR.9             0.7837           -0.8304j            1.1418           -0.1296
AR.10            0.7837           +0.8304j            1.1418            0.1296
AR.11            1.0724           -0.2383j            1.0986           -0.0348
AR.12            1.0724           +0.2383j            1.0986            0.0348
AR.13            1.1192           -0.0000j            1.1192           -0.0000
------------------------------------------------------------------------------

工業生產

我們將使用工業生產指數數據來檢查預測。

[17]:
data = pdr.get_data_fred("INDPRO", "1959-01-01", "2019-06-01")
ind_prod = data.INDPRO.pct_change(12).dropna().asfreq("MS")
_, ax = plt.subplots(figsize=(16, 9))
ind_prod.plot(ax=ax)
[17]:
<Axes: xlabel='DATE'>
../../../_images/examples_notebooks_generated_autoregressions_33_1.png

我們將首先使用最多 12 個落後期來選擇模型。儘管許多係數不顯著,但 AR(13) 使 BIC 標準最小化。

[18]:
sel = ar_select_order(ind_prod, 13, "bic", old_names=False)
res = sel.model.fit()
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:                 INDPRO   No. Observations:                  714
Model:                    AutoReg(13)   Log Likelihood                2321.382
Method:               Conditional MLE   S.D. of innovations              0.009
Date:                Thu, 03 Oct 2024   AIC                          -4612.763
Time:                        16:08:12   BIC                          -4544.476
Sample:                    02-01-1961   HQIC                         -4586.368
                         - 06-01-2019
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0012      0.000      2.783      0.005       0.000       0.002
INDPRO.L1      1.1577      0.035     33.179      0.000       1.089       1.226
INDPRO.L2     -0.0822      0.053     -1.543      0.123      -0.187       0.022
INDPRO.L3   7.041e-05      0.053      0.001      0.999      -0.104       0.104
INDPRO.L4      0.0076      0.053      0.143      0.886      -0.096       0.111
INDPRO.L5     -0.1329      0.053     -2.529      0.011      -0.236      -0.030
INDPRO.L6     -0.0080      0.052     -0.153      0.879      -0.111       0.095
INDPRO.L7      0.0556      0.052      1.064      0.287      -0.047       0.158
INDPRO.L8     -0.0296      0.052     -0.568      0.570      -0.132       0.073
INDPRO.L9      0.0920      0.052      1.770      0.077      -0.010       0.194
INDPRO.L10    -0.0808      0.052     -1.552      0.121      -0.183       0.021
INDPRO.L11    -0.0003      0.052     -0.005      0.996      -0.102       0.102
INDPRO.L12    -0.3820      0.052     -7.364      0.000      -0.484      -0.280
INDPRO.L13     0.3613      0.033     10.996      0.000       0.297       0.426
                                    Roots
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            -1.0408           -0.2913j            1.0808           -0.4566
AR.2            -1.0408           +0.2913j            1.0808            0.4566
AR.3            -0.7803           -0.8039j            1.1204           -0.3726
AR.4            -0.7803           +0.8039j            1.1204            0.3726
AR.5            -0.2728           -1.0533j            1.0880           -0.2903
AR.6            -0.2728           +1.0533j            1.0880            0.2903
AR.7             0.2716           -1.0507j            1.0852           -0.2097
AR.8             0.2716           +1.0507j            1.0852            0.2097
AR.9             0.8010           -0.7285j            1.0827           -0.1175
AR.10            0.8010           +0.7285j            1.0827            0.1175
AR.11            1.0219           -0.2219j            1.0457           -0.0340
AR.12            1.0219           +0.2219j            1.0457            0.0340
AR.13            1.0560           -0.0000j            1.0560           -0.0000
------------------------------------------------------------------------------

我們也可以使用全域搜尋,如果需要,它允許輸入較長的落後期,而不需要較短的落後期。在這裡,我們看到許多落後期被刪除。該模型表明數據中可能存在一些季節性。

[19]:
sel = ar_select_order(ind_prod, 13, "bic", glob=True, old_names=False)
sel.ar_lags
res_glob = sel.model.fit()
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:                 INDPRO   No. Observations:                  714
Model:                    AutoReg(13)   Log Likelihood                2321.382
Method:               Conditional MLE   S.D. of innovations              0.009
Date:                Thu, 03 Oct 2024   AIC                          -4612.763
Time:                        16:08:14   BIC                          -4544.476
Sample:                    02-01-1961   HQIC                         -4586.368
                         - 06-01-2019
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0012      0.000      2.783      0.005       0.000       0.002
INDPRO.L1      1.1577      0.035     33.179      0.000       1.089       1.226
INDPRO.L2     -0.0822      0.053     -1.543      0.123      -0.187       0.022
INDPRO.L3   7.041e-05      0.053      0.001      0.999      -0.104       0.104
INDPRO.L4      0.0076      0.053      0.143      0.886      -0.096       0.111
INDPRO.L5     -0.1329      0.053     -2.529      0.011      -0.236      -0.030
INDPRO.L6     -0.0080      0.052     -0.153      0.879      -0.111       0.095
INDPRO.L7      0.0556      0.052      1.064      0.287      -0.047       0.158
INDPRO.L8     -0.0296      0.052     -0.568      0.570      -0.132       0.073
INDPRO.L9      0.0920      0.052      1.770      0.077      -0.010       0.194
INDPRO.L10    -0.0808      0.052     -1.552      0.121      -0.183       0.021
INDPRO.L11    -0.0003      0.052     -0.005      0.996      -0.102       0.102
INDPRO.L12    -0.3820      0.052     -7.364      0.000      -0.484      -0.280
INDPRO.L13     0.3613      0.033     10.996      0.000       0.297       0.426
                                    Roots
==============================================================================
                   Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            -1.0408           -0.2913j            1.0808           -0.4566
AR.2            -1.0408           +0.2913j            1.0808            0.4566
AR.3            -0.7803           -0.8039j            1.1204           -0.3726
AR.4            -0.7803           +0.8039j            1.1204            0.3726
AR.5            -0.2728           -1.0533j            1.0880           -0.2903
AR.6            -0.2728           +1.0533j            1.0880            0.2903
AR.7             0.2716           -1.0507j            1.0852           -0.2097
AR.8             0.2716           +1.0507j            1.0852            0.2097
AR.9             0.8010           -0.7285j            1.0827           -0.1175
AR.10            0.8010           +0.7285j            1.0827            0.1175
AR.11            1.0219           -0.2219j            1.0457           -0.0340
AR.12            1.0219           +0.2219j            1.0457            0.0340
AR.13            1.0560           -0.0000j            1.0560           -0.0000
------------------------------------------------------------------------------

plot_predict 可用於產生預測圖以及信賴區間。在這裡,我們產生從最後一個觀察值開始並持續 18 個月的預測。

[20]:
ind_prod.shape
[20]:
(714,)
[21]:
fig = res_glob.plot_predict(start=714, end=732)
../../../_images/examples_notebooks_generated_autoregressions_40_0.png

完整模型和限制模型的預測非常相似。我還包含了一個 AR(5),它具有非常不同的動態

[22]:
res_ar5 = AutoReg(ind_prod, 5, old_names=False).fit()
predictions = pd.DataFrame(
    {
        "AR(5)": res_ar5.predict(start=714, end=726),
        "AR(13)": res.predict(start=714, end=726),
        "Restr. AR(13)": res_glob.predict(start=714, end=726),
    }
)
_, ax = plt.subplots()
ax = predictions.plot(ax=ax)
../../../_images/examples_notebooks_generated_autoregressions_42_0.png

診斷表明該模型捕獲了數據中的大部分動態。ACF 顯示季節性頻率的模式,因此可能需要更完整的季節性模型 (SARIMAX)。

[23]:
fig = plt.figure(figsize=(16, 9))
fig = res_glob.plot_diagnostics(fig=fig, lags=30)
../../../_images/examples_notebooks_generated_autoregressions_44_0.png

預測

預測是使用結果實例的 predict 方法產生的。預設產生靜態預測,這是一步預測。產生多步預測需要使用 dynamic=True

在下一個單元格中,我們產生樣本中最後 24 個期間的 12 步向前預測。這需要一個迴圈。

注意:這些技術上是樣本內預測,因為我們預測的數據用於估計參數。產生 OOS 預測需要兩個模型。第一個必須排除 OOS 期間。第二個使用完整樣本模型的 predict 方法,其中包含來自排除 OOS 期間的較短樣本模型的參數。

[24]:
import numpy as np

start = ind_prod.index[-24]
forecast_index = pd.date_range(start, freq=ind_prod.index.freq, periods=36)
cols = ["-".join(str(val) for val in (idx.year, idx.month)) for idx in forecast_index]
forecasts = pd.DataFrame(index=forecast_index, columns=cols)
for i in range(1, 24):
    fcast = res_glob.predict(
        start=forecast_index[i], end=forecast_index[i + 12], dynamic=True
    )
    forecasts.loc[fcast.index, cols[i]] = fcast
_, ax = plt.subplots(figsize=(16, 10))
ind_prod.iloc[-24:].plot(ax=ax, color="black", linestyle="--")
ax = forecasts.plot(ax=ax)
../../../_images/examples_notebooks_generated_autoregressions_46_0.png

與 SARIMAX 比較

SARIMAX 是具有外生迴歸量模型的季節性自迴歸積分移動平均的實作。它支援

  • 指定季節性和非季節性 AR 和 MA 成分

  • 包含外生變數

  • 使用卡爾曼濾波器的完整最大概似估計

這個模型比 AutoReg 具有更豐富的功能。與 SARIMAX 不同,AutoReg 使用 OLS 估計參數。這樣速度更快,且問題是全局凸的,因此沒有局部最小值問題。當比較 AR(P) 模型時,封閉形式的估計器及其性能是 AutoReg 相較於 SARIMAX 的主要優勢。AutoReg 也支援季節性虛擬變數,如果使用者將它們作為外生迴歸變數包含在內,則可以與 SARIMAX 一起使用。

[25]:
from statsmodels.tsa.api import SARIMAX

sarimax_mod = SARIMAX(ind_prod, order=((1, 5, 12, 13), 0, 0), trend="c")
sarimax_res = sarimax_mod.fit()
print(sarimax_res.summary())
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            6     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f= -3.21907D+00    |proj g|=  1.78480D+01
 This problem is unconstrained.
At iterate    5    f= -3.22490D+00    |proj g|=  1.52242D-01

At iterate   10    f= -3.22526D+00    |proj g|=  1.46006D+00

At iterate   15    f= -3.22550D+00    |proj g|=  7.46711D-01

At iterate   20    f= -3.22610D+00    |proj g|=  2.52890D-01

At iterate   25    f= -3.22611D+00    |proj g|=  1.80088D-01

At iterate   30    f= -3.22646D+00    |proj g|=  1.46895D+00

At iterate   35    f= -3.22677D+00    |proj g|=  1.05336D-01

At iterate   40    f= -3.22678D+00    |proj g|=  3.20093D-02

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
    6     41     64      1     0     0   3.201D-02  -3.227D+00
  F =  -3.2267786140514940

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
                                     SARIMAX Results
=========================================================================================
Dep. Variable:                            INDPRO   No. Observations:                  714
Model:             SARIMAX([1, 5, 12, 13], 0, 0)   Log Likelihood                2303.920
Date:                           Thu, 03 Oct 2024   AIC                          -4595.840
Time:                                   16:08:18   BIC                          -4568.415
Sample:                               01-01-1960   HQIC                         -4585.248
                                    - 06-01-2019
Covariance Type:                             opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
intercept      0.0011      0.000      2.534      0.011       0.000       0.002
ar.L1          1.0801      0.010    107.331      0.000       1.060       1.100
ar.L5         -0.0847      0.011     -7.592      0.000      -0.107      -0.063
ar.L12        -0.4428      0.026    -17.302      0.000      -0.493      -0.393
ar.L13         0.4073      0.025     16.213      0.000       0.358       0.457
sigma2      9.136e-05   3.08e-06     29.630      0.000    8.53e-05    9.74e-05
===================================================================================
Ljung-Box (L1) (Q):                  21.77   Jarque-Bera (JB):               959.65
Prob(Q):                              0.00   Prob(JB):                         0.00
Heteroskedasticity (H):               0.37   Skew:                            -0.63
Prob(H) (two-sided):                  0.00   Kurtosis:                         8.54
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
 Warning:  more than 10 function and gradient
   evaluations in the last line search.  Termination
   may possibly be caused by a bad search direction.
[26]:
sarimax_params = sarimax_res.params.iloc[:-1].copy()
sarimax_params.index = res_glob.params.index
params = pd.concat([res_glob.params, sarimax_params], axis=1, sort=False)
params.columns = ["AutoReg", "SARIMAX"]
params
[26]:
AutoReg SARIMAX
常數項 0.001234 0.001085
INDPRO.L1 1.088761 1.080116
INDPRO.L5 -0.105665 -0.084738
INDPRO.L12 -0.388374 -0.442793
INDPRO.L13 0.362319 0.407338

自定義確定性過程

deterministic 參數允許使用自定義的 DeterministicProcess。這允許構建更複雜的確定性項,例如包含兩個週期的季節性分量,或者如下一個範例所示,使用傅立葉級數而不是季節性虛擬變數。

[27]:
from statsmodels.tsa.deterministic import DeterministicProcess

dp = DeterministicProcess(housing.index, constant=True, period=12, fourier=2)
mod = AutoReg(housing, 2, trend="n", seasonal=False, deterministic=dp)
res = mod.fit()
print(res.summary())
                            AutoReg Model Results
==============================================================================
Dep. Variable:               HOUSTNSA   No. Observations:                  725
Model:                     AutoReg(2)   Log Likelihood               -2716.505
Method:               Conditional MLE   S.D. of innovations             10.364
Date:                Thu, 03 Oct 2024   AIC                           5449.010
Time:                        16:08:18   BIC                           5485.677
Sample:                    04-01-1959   HQIC                          5463.163
                         - 06-01-2019
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
const           1.7550      0.391      4.485      0.000       0.988       2.522
sin(1,12)      16.7443      0.860     19.478      0.000      15.059      18.429
cos(1,12)       4.9409      0.588      8.409      0.000       3.789       6.093
sin(2,12)      12.9364      0.619     20.889      0.000      11.723      14.150
cos(2,12)      -0.4738      0.754     -0.628      0.530      -1.952       1.004
HOUSTNSA.L1    -0.3905      0.037    -10.664      0.000      -0.462      -0.319
HOUSTNSA.L2    -0.1746      0.037     -4.769      0.000      -0.246      -0.103
                                    Roots
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1           -1.1182           -2.1159j            2.3932           -0.3274
AR.2           -1.1182           +2.1159j            2.3932            0.3274
-----------------------------------------------------------------------------
[28]:
fig = res.plot_predict(720, 840)
../../../_images/examples_notebooks_generated_autoregressions_52_0.png

上次更新:2024 年 10 月 03 日