class: center, middle, inverse, title-slide

.title[
# Econometrics - Lecture 7
]
.subtitle[
## Time Series Regression and Forecasting
]
.author[
### Jonas Björnerstedt
]
.date[
### 2024-11-19
]

---

## Lecture Content

- Chapter 15.1 - 15.5
- Appendix 15.3 and 15.4
- Autoregressive models
- Dealing with autocorrelation
- Introduction of concepts
- See
  - [Forecasting: Principles and Practice](https://otexts.com/fpp2/)
  - [Introduction to Econometrics in R](https://www.econometrics-with-r.org)

---

## Real GDP in the United States

```r
library(tidyverse)  # provides read_rds(), ggplot() and the dplyr verbs used later
USMacro = read_rds("us_macro_quarterly.rds")
```

- Changes matter - unemployment etc
- Growth rate and recessions
- GDP is related over time
- Open dataset: [us_macro_quarterly.rds](https://rstudio.sh.se/ts/us_macro_quarterly.rds)
  - **Date** is date
  - **GDPC96** is GDP
  - Tspread = GS10 - TB3MS
- Plot GDP and log(GDP)

---

## Studying real GDP in R

```r
ggplot(USMacro) +
  aes(Date, GDPGR) +
  geom_line() +
  geom_smooth(color = "red", se = FALSE, span = 0.2) +
  geom_hline(aes(yintercept = mean(GDPGR)), color = "red", linetype = 2)
```

![](time_series07_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

---

## Autoregressive `\(u_t\)` or `\(Y_t\)`

- Let `\(Y_t = \beta_0 + u_t\)` with autocorrelated errors `\(u_t\)`
- If errors are AR(1), then: `\(u_t = \phi u_{t-1} + v_t\)`
- Rewrite
  `$$Y_t = \beta_0 + u_t =\beta_0 + \phi u_{t-1} + v_t$$`
- Since `\(Y_{t-1} = \beta_0 + u_{t-1}\)`:
  `$$Y_t = \beta_0 + \phi u_{t-1} + v_t = \beta_0 + \phi (Y_{t-1} - \beta_0 ) + v_t$$`
- We can thus write it as an autoregressive equation in `\(Y_t\)` instead:
  `$$Y_t = (1-\phi) \beta_0 + \phi Y_{t-1} + v_t$$`

---

## Autocorrelation and misspecification

- Misspecified model can lead to autocorrelation
- It has been said that "Autocorrelation is always functional misspecification"
- There is a relationship that is not in the model
- Incorporating the lag of `\(Y_t\)` is a way to account for this
- Think about _why_ there is autocorrelation
- Include regressors `\(X\)`
- Perhaps
`\(Y\)` depends on `\(X\)` that changes slowly over time

---

class: inverse, center, middle

# Autoregressive Distributed Lag (ADL) model

---

## Autoregressive Distributed Lag (ADL) model

- First order ADL, denoted ADL(1,1):
  `$$Y_t = \beta_0 + \beta_1 Y_{t-1} \color{red}{+ \delta_1 X_{t-1}} + u_t$$`
- Note that `\(X_t\)` is not in the regression
  - Model should forecast `\(Y_t\)` given what we know at `\(t-1\)`
  - `\(X_t\)` will not be observed at time `\(t-1\)`
- We assume that
  `$$\mathrm{E}(Y_t |Y_{t-1}, X_{t-1}) = \beta_0 + \beta_1 Y_{t-1} + \delta_1 X_{t-1}$$`
  or in other words that `\(\mathrm{E}(u_t|Y_{t-1}, X_{t-1}) = 0\)`.

---

## Transforming AR errors to ADL

* Let:
  `$$Y_t = \beta_0 + \beta_1 X_{t-1} + u_t$$`
- with: `\(u_t = \phi u_{t-1} + v_t\)`
- Rewrite by adding and subtracting `\(\phi Y_{t-1}\)`:
  `$$Y_t = \beta_0 + \beta_1 X_{t-1} + u_t \color{red}{+ \phi Y_{t-1} - \phi \left(\beta_0 + \beta_1 X_{t-2} + u_{t-1} \right)}$$`
- Collect terms
  `$$Y_t = \phi Y_{t-1} + (1-\phi ) \beta_0 + \beta_1 X_{t-1} - \phi \beta_1 X_{t-2} + u_t - \phi u_{t-1}$$`
- Use the fact that `\(u_t - \phi u_{t-1} = v_t\)`:
  `$$Y_t = \phi Y_{t-1} + (1-\phi ) \beta_0 + \beta_1 X_{t-1} - \phi \beta_1 X_{t-2} + v_t$$`
- _ADL representation_.
Note that the coefficients are different

---

class: inverse, center, middle

# AR and MA

---

## Autocorrelation function ACF

* The autocorrelation function shows autocorrelations for different lag lengths
- The first two (and almost four) lags are significant

```r
library(forecast)
ggAcf(USMacro$GDPGR)
```

![](time_series07_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

---

## Moving Average (MA)

- Autoregressive AR(1) error `\(u_t\)`:
  `$$Y_t = u_t + \beta \color{red}{Y_{t-1}}$$`
- Moving average MA(1) error `\(u_t\)`:
  `$$Y_t = u_t + \gamma \color{red}{u_{t-1}}$$`
- Different consequences:
  - MA: has finite "memory"
  - AR: depends on _all_ previous errors

---

## AR(1) autocorrelations

- AR(1) autocorrelations
  `$$E[Y_tY_{t-1}] = E\left[(u_t + \beta Y_{t-1})Y_{t-1}\right]=\beta E[ Y_{t}^2]$$`
- Thus
  `$$\rho_1 = \frac{E[Y_tY_{t-1}] }{Var(Y_t)} = \frac{\beta E[ Y_{t}^2]}{E[Y_{t}^2]}=\beta$$`
- Similarly `\(\rho_2 =\beta^2\)` and `\(\rho_p =\beta^p\)`

---

## MA autocorrelations

`$$E[Y_tY_{t-1}] = E\left[(u_t + \gamma u_{t-1})(u_{t-1} + \gamma u_{t-2})\right]=E[\gamma u_{t-1}^2]$$`

`$$\rho_1 = \frac{E[Y_tY_{t-1}] }{Var(Y_t)}=\frac{\gamma E[u_{t-1}^2]}{E[u^2_t + \gamma^2 u^2_{t-1}+ 2\gamma u_{t} u_{t-1}]}=\frac{\gamma}{1 + \gamma^2 }$$`

- For a MA(1) process we have `\(\rho_2 = 0\)`

---

## MA autocorrelations

```r
Y = arima.sim(list( ma = c(.5)), n = 5000)
ggAcf(Y)
```

![](time_series07_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

???
Do Arima sim exercise

---

## AR process as MA process

- Repeated substitution of `\(Y_{t-i}\)`:
  `$$Y_t = u_t + \gamma \color{red}{Y_{t-1}}$$`
  `$$Y_t = u_t + \gamma (u_{t-1} + \gamma \color{red}{Y_{t-2}})$$`
  `$$Y_t = u_t + \gamma u_{t-1} + \gamma^2 (u_{t-2} + \gamma \color{red}{Y_{t-3}})$$`
  `$$Y_t = u_t + \gamma u_{t-1} + \gamma^2 u_{t-2} + \gamma^3 u_{t-3} + \ldots = \sum_{i=0}^\infty \gamma^i u_{t-i}$$`
- AR(1) process can be written as a MA( `\(\infty\)` ) process
- _Wold decomposition theorem_: Any stationary process can be written in MA form

---

## Autocorrelations of AR(1)

The autocorrelations of `\(Y_t = \beta Y_{t-1}+u_t\)` where `\(u_t\)` has sd `\(\sigma_u\)`

```r
Y = arima.sim(list( ar = c(.9)), n = 5000, sd=.1)
ggAcf(Y)
```

![](time_series07_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---

## [Autocorrelations of AR(1)<sup> 🔗 </sup>](http://192.121.208.72:3939/time_series07-figs.Rmd)

- The autocorrelations of `\(Y_t = \beta Y_{t-1}+u_t\)` where `\(u_t\)` has sd `\(\sigma_u\)`
- 200 observations - uncertain estimation of autocorrelations

![](time_series07_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

---

## Partial autocorrelation function

- Look at the estimate of the last lag coefficient
- The partial autocorrelation at lag p is the coefficient on the last lag, `\(Y_{t-p}\)`, in an estimation with lags 1 to p

```r
ggPacf(USMacro$GDPGR)
```

![](time_series07_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

---

## Lag notation

`$$Y_t + a_1 Y_{t-1} + a_2 Y_{t-2} + a_3 Y_{t-3} + a_4 Y_{t-4} = a(L)Y_t$$`

where `\(L\)` is the _lag operator_ with `\(Y_{t-s} = L^s Y_t\)`, and

`$$a(L) = 1 + a_1 L^1 + a_2 L^2 + a_3 L^3 + a_4 L^4$$`

- The relationship between lag operators, polynomials and unit roots is beyond this course

---

## ARMA

`$$Y_t = a_1 Y_{t-1} + a_2 Y_{t-2} + u_t + b_1 u_{t-1} + b_2 u_{t-2}$$`

- ARMA(2, 2) process:
  `$$Y_t - a_1 Y_{t-1} - a_2 Y_{t-2} = u_t + b_1 u_{t-1} + b_2 u_{t-2}$$`
- ARMA(p, q) process:
  `$$a(L)Y_t = b(L) u_{t}$$`
  where
  `$$a(L) = 1 - a_1 L^1 - a_2 L^2$$`
- ARIMA uses
differencing to get stationarity, then ARMA(p, q)

---

class: inverse, center, middle

# Time series in R

---

## Dates

- Dates and times are complicated (messy)
- Different types
  - yearly
  - quarterly
  - monthly
  - weekly
  - daily
  - with time - different precisions
- Various difficulties, for example:
  - specify date ranges
  - axis tick mark labels in plots
  - handle missing dates in plots and with lags

---

## Dates in R

- Dates are complicated in all statistics programs
- Messier in R than in Stata
- Several different solutions
  - In general they do not use data frames
  - Each variable is an object in the environment rather than in a data frame
- The `tsibble` package is a newer solution that uses data frames

---

## Lags and differences in R

- Old methods using time series objects (zoo, ts, mts,...)
  - Can be useful, but more complicated
- New method (tidyverse) using dataframes
  - Tools improving quickly
  - Packages: forecast, tsibble, fable
- The length of time between time periods does not matter
  - Dataframe has to be sorted by time!
  - Lags can be in years or seconds
  - Length of time between observations is assumed constant

---

## Data management in R with dplyr

The `dplyr` package in `tidyverse` is for data management and analysis.
- `select(data, colnames)` - Select columns
- `filter(data, condition)` - Select rows on condition
- `rename(data, newname = oldname)` - Rename column(s)
- `mutate(data, formulas)` - Modify or add columns

---

## Time series libraries in R

* `library(forecast)`
  - `Acf()` and `Pacf()` - slightly better than `acf()` and `pacf()`
  - `ggAcf()` and `ggPacf()` - ACF using ggplot
  - Estimation of ARMA
  - Forecasts
* `library(tseries)`
  - `adf.test()` - slightly better than `adf()`
* `library(AER)` - dataset
* `library(dynlm)` - dynamic linear models - lags, trends, seasons

---

### Estimate GDP growth (S&W p 578)

```r
library(lubridate)
library(huxtable)
library(estimatr)

# Keep the sample 1962-2012 (filter() is the dplyr verb, year() is from lubridate)
USMacro2 = filter(USMacro, year(Date) >= 1962, year(Date) <= 2012 )

# AR(1) and AR(2) with robust standard errors.
# lag() is dplyr::lag(), which shifts rows, so the data must be sorted by Date.
ar1 = lm_robust(GDPGR ~ lag(GDPGR), data = USMacro2)
ar2 = lm_robust(GDPGR ~ lag(GDPGR) + lag(GDPGR, 2), data = USMacro2)

# Side-by-side regression table
huxreg(AR1 = ar1, AR2 = ar2)
```
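
|               | AR1       | AR2       |
|---------------|-----------|-----------|
| (Intercept)   | 1.995 *** | 1.632 *** |
|               | (0.353)   | (0.408)   |
| lag(GDPGR)    | 0.338 *** | 0.278 *** |
|               | (0.077)   | (0.081)   |
| lag(GDPGR, 2) |           | 0.179 *   |
|               |           | (0.081)   |
| N             | 203       | 202       |
| R2            | 0.115     | 0.143     |

*** p < 0.001; ** p < 0.01; * p < 0.05.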
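
As a minimal sketch of how the AR(2) estimates can be used, the code below computes a one-step-ahead forecast and the long-run mean implied by the coefficients in the AR2 column of the output above. The two most recent growth rates (`gdpgr_lag1`, `gdpgr_lag2`) are hypothetical placeholders, not values taken from the dataset, and the implied mean assumes the estimated process is stationary.

```r
# Coefficients copied from the AR2 column of the regression output above
b0 = 1.632   # intercept
b1 = 0.278   # coefficient on lag(GDPGR)
b2 = 0.179   # coefficient on lag(GDPGR, 2)

# Hypothetical growth rates in the two most recent quarters (not from the data)
gdpgr_lag1 = 2.0
gdpgr_lag2 = 1.0

# One-step-ahead forecast: E(Y_t | Y_{t-1}, Y_{t-2})
forecast_next = b0 + b1 * gdpgr_lag1 + b2 * gdpgr_lag2

# Long-run mean implied by a stationary AR(2): b0 / (1 - b1 - b2)
implied_mean = b0 / (1 - b1 - b2)

c(forecast = forecast_next, implied_mean = implied_mean)
```

With these hypothetical inputs the forecast is 2.367, and the implied long-run mean is roughly 3.01. Forecasts further ahead move toward that mean because both lag coefficients are positive and sum to less than one.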
---

## Plot based on estimate of GDP growth

```r
# Simulate one AR(2) path from the estimated lag coefficients and residual sd,
# then shift it by the sample mean. This is a simulated series, not fitted values.
USMacro$GDPGR_p = arima.sim(n = length(USMacro$GDPGR),
                            model = list(ar = ar2$coefficients[2:3]),
                            sd = sqrt(ar2[["res_var"]])) + mean(USMacro$GDPGR)

ggplot(USMacro) +
  aes(Date, GDPGR) +
  geom_line(aes(color="Actual")) +
  geom_line(aes(y=GDPGR_p, color="Predicted"))
```

![](time_series07_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

---

## Next Lecture

15. Chapter 16: Estimation of Dynamic Causal Effects