Econometrics - Lecture 7

class: center, middle, inverse, title-slide

.title[
# Econometrics - Lecture 7
]
.subtitle[
## Time Series Regression and Forecasting
]
.author[
### Jonas Björnerstedt
]
.date[
### 2024-11-19
]

---

## Lecture Content

- Chapter 15.1 - 15.5

- Appendix 15.3 and 15.4

- Autoregressive models

- Dealing with autocorrelation

- Introduction of concepts

- See 
  - [Forecasting: Principles and Practice](https://otexts.com/fpp2/)
  - [Introduction to Econometrics with R](https://www.econometrics-with-r.org)

---
## Real GDP in the United States

```r
library(tidyverse)  # provides read_rds(), ggplot() and the dplyr verbs used later
USMacro = read_rds("us_macro_quarterly.rds") 
```

- Changes matter - unemployment etc

- Growth rate and recessions

- GDP is related over time

- Open dataset: [us_macro_quarterly.rds](https://rstudio.sh.se/ts/us_macro_quarterly.rds)

- **Date** is the date of the observation

- **GDPC96** is real GDP

- Tspread = GS10 - TB3MS

- Plot GDP and log(GDP)

---
## Studying real GDP in R

```r
ggplot(USMacro) + aes(Date,GDPGR) +
    geom_line() + 
    geom_smooth(color = "red", se = FALSE, span = 0.2) + 
    geom_hline(aes(yintercept = mean(GDPGR)), color = "red", linetype = 2)
```

![](time_series07_files/figure-html/unnamed-chunk-3-1.png)

---
## Autoregressive `$u_t$` or `$Y_t$`

- Let `$Y_t = \beta_0 + u_t$` with autocorrelated errors `$u_t$`

- If errors are AR(1), then: `$u_t = \phi u_{t-1} + v_t$`
  
- Rewrite
`$$Y_t = \beta_0 + u_t =\beta_0 + \phi u_{t-1} + v_t$$`
- Since `$Y_{t-1} = \beta_0 + u_{t-1}$`:
`$$Y_t = \beta_0 + \phi u_{t-1} + v_t = \beta_0 + \phi (Y_{t-1} - \beta_0 ) + v_t$$`
- We can thus write it as an autoregressive equation in `$Y_t$` instead:
`$$Y_t = (1-\phi) \beta_0 + \phi Y_{t-1} + v_t$$`
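
The rewrite above can be checked by simulation. The following is a minimal sketch, assuming illustrative values for `beta0`, `phi`, the sample size and the seed (none of them come from the lecture): it generates `$Y_t = \beta_0 + u_t$` with AR(1) errors and regresses `$Y_t$` on `$Y_{t-1}$`.

```r
# Simulation check: Y_t = beta0 + u_t with AR(1) errors u_t = phi * u_{t-1} + v_t.
# beta0, phi, n and the seed are illustrative assumptions.
set.seed(1)
beta0 = 2
phi   = 0.6
n     = 5000

u = as.numeric(arima.sim(model = list(ar = phi), n = n))  # AR(1) errors
Y = beta0 + u

# Regress Y_t on Y_{t-1}, building the lag by hand
Y_t   = Y[-1]
Y_lag = Y[-n]
fit = lm(Y_t ~ Y_lag)

coef(fit)            # estimated intercept and slope
(1 - phi) * beta0    # intercept implied by the derivation
phi                  # slope implied by the derivation
```

The estimated intercept should be close to `$(1-\phi)\beta_0$` rather than `$\beta_0$`, and the slope close to `$\phi$`.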

---
## Autocorrelation and misspecification

- Misspecified model can lead to autocorrelation

- It has been said that "Autocorrelation is always functional misspecification"

- There is a relationship that is not in the model

- Including lags of `$Y_t$` is one way to account for this

- Think about _why_ there is autocorrelation
  - Include regressors `$X$`
  
  - Perhaps `$Y$` depends on `$X$` that changes slowly over time

---
class: inverse, center, middle

# Autoregressive Distributed Lag (ADL) model

---
## Autoregressive Distributed Lag (ADL) model

- First order ADL, denoted ADL(1,1):
`$$Y_t = \beta_0 + \beta_1 Y_{t-1} \color{red}{+ \delta_1 X_{t-1}} + u_t$$`

- Note that `$X_t$` is not in the regression

- Model should forecast `$Y_t$` given what we know at `$t-1$`

- `$X_t$` will not be observed at time `$t-1$`

- We assume that 
`$$\mathrm{E}(Y_t |Y_{t-1}, X_{t-1}) = \beta_0 + \beta_1 Y_{t-1} + \delta_1 X_{t-1}$$`
or in other words that `$\mathrm{E}(u_t|Y_{t-1}, X_{t-1}) = 0$`.
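
As a rough illustration, an ADL(1,1) can be estimated with `lm()` once the lags are built. This sketch uses simulated data; the parameter values, the AR(1) process chosen for `X` and the variable names are all assumptions made for the example.

```r
# Sketch: estimate an ADL(1,1) on simulated data with lm().
# All parameter values are illustrative assumptions.
set.seed(2)
n = 2000
beta0 = 1; beta1 = 0.5; delta1 = 0.3

X = as.numeric(arima.sim(model = list(ar = 0.5), n = n))  # a persistent regressor
e = rnorm(n)
Y = numeric(n)
for (t in 2:n) {
  Y[t] = beta0 + beta1 * Y[t - 1] + delta1 * X[t - 1] + e[t]
}

# Each row pairs Y_t with Y_{t-1} and X_{t-1}
d = data.frame(Y = Y[-1], Y_lag1 = Y[-n], X_lag1 = X[-n])
adl11 = lm(Y ~ Y_lag1 + X_lag1, data = d)
coef(adl11)

# One-step-ahead forecast of Y_{n+1}: uses only values known at time n
forecast_next = sum(coef(adl11) * c(1, Y[n], X[n]))
forecast_next
```

The forecast needs only `Y[n]` and `X[n]`, which is the reason for excluding `$X_t$` from the regression.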

---
## Transforming AR errors to ADL

* Let: 
`$$Y_t = \beta_0 + \beta_1 X_{t-1} + u_t$$`
  - with: `$u_t = \phi u_{t-1} + v_t$`

- Rewrite by adding and subtracting `$\phi Y_{t-1}$`:

`$$Y_t = \beta_0 + \beta_1 X_{t-1} + u_t \color{red}{+ \phi Y_{t-1} - \phi \left(\beta_0 + \beta_1 X_{t-2} + u_{t-1} \right)}$$`

- Collect terms 
`$$Y_t = \phi Y_{t-1} + (1-\phi ) \beta_0 + \beta_1 X_{t-1} -  \phi \beta_1 X_{t-2} + u_t - \phi u_{t-1}$$`

- Use the fact that `$u_t - \phi u_{t-1} = v_t$`:
`$$Y_t = \phi Y_{t-1} + (1-\phi ) \beta_0 + \beta_1 X_{t-1} - \phi \beta_1 X_{t-2} + v_t$$`

- This is the _ADL representation_. Note that the coefficients differ from those of the original model: they are functions of `$\beta_0$`, `$\beta_1$` and `$\phi$`
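
The coefficients of the ADL representation can be checked numerically. The sketch below assumes illustrative values `beta0 = 1`, `beta1 = 2`, `phi = 0.5` and a white-noise `X`; it simulates the model with AR(1) errors and estimates the regression on `$Y_{t-1}$`, `$X_{t-1}$` and `$X_{t-2}$`.

```r
# Simulate Y_t = beta0 + beta1 * X_{t-1} + u_t with AR(1) errors,
# then estimate the ADL form. Parameter values are illustrative assumptions.
set.seed(3)
n = 5000
beta0 = 1; beta1 = 2; phi = 0.5

X = rnorm(n)
u = as.numeric(arima.sim(model = list(ar = phi), n = n))
Y = rep(NA_real_, n)
Y[2:n] = beta0 + beta1 * X[1:(n - 1)] + u[2:n]

# Rows t = 3..n, so that Y_{t-1} and X_{t-2} both exist
idx = 3:n
d = data.frame(Y = Y[idx], Y_lag1 = Y[idx - 1],
               X_lag1 = X[idx - 1], X_lag2 = X[idx - 2])
adl = lm(Y ~ Y_lag1 + X_lag1 + X_lag2, data = d)
round(coef(adl), 2)

# Values implied by the derivation, in the same order
c((1 - phi) * beta0, phi, beta1, -phi * beta1)
```

The coefficient on `$X_{t-2}$` should be close to minus the product of the coefficients on `$Y_{t-1}$` and `$X_{t-1}$`.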

---
class: inverse, center, middle

# AR and MA

---
## Autocorrelation function ACF

* The autocorrelation function shows autocorrelations for different lag lengths

- The first two (and almost four) lags are significant

```r
library(forecast)
ggAcf(USMacro$GDPGR)
```

![](time_series07_files/figure-html/unnamed-chunk-4-1.png)

---
## Moving Average (MA)

- Autoregressive AR(1) process with error `$u_t$`:
`$$Y_t = u_t + \beta \color{red}{Y_{t-1}}$$`

- Moving average MA(1) process with error `$u_t$`:
`$$Y_t = u_t + \gamma \color{red}{u_{t-1}}$$`

- Different consequences:

  - MA: has finite "memory"

  - AR: depends on _all_ previous errors

---
## AR(1) autocorrelations

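- AR(1) autocovariance, using that `$Y_t$` has mean zero, that `$u_t$` is uncorrelated with `$Y_{t-1}$`, and stationarity (`$E[Y_{t-1}^2] = E[Y_t^2]$`):
`$$E[Y_tY_{t-1}] = E\left[(u_t + \beta Y_{t-1})Y_{t-1}\right]=\beta E[ Y_{t-1}^2]=\beta E[ Y_{t}^2]$$`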
- Thus
`$$\rho_1 = \frac{E[Y_tY_{t-1}] }{Var(Y_t)} = \frac{\beta E[ Y_{t}^2]}{E[Y_{t}^2]}=\beta$$`

- Similarly `$\rho_2 =\beta^2$` and `$\rho_p =\beta^p$`
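
The geometric decay can be compared with the theoretical autocorrelations that base R computes. This is a small sketch using `ARMAacf()` from the `stats` package; the value `0.9` is an illustrative choice.

```r
# Theoretical AR(1) autocorrelations versus the formula rho_p = beta^p
beta = 0.9                                    # illustrative AR(1) coefficient
rho_theory  = ARMAacf(ar = beta, lag.max = 5) # lags 0, 1, ..., 5
rho_formula = beta^(0:5)

cbind(lag = 0:5, rho_theory, rho_formula)     # the two columns should agree
```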

---
## MA autocorrelations

`$$E[Y_tY_{t-1}] = E\left[(u_t + \gamma u_{t-1})(u_{t-1} + \gamma u_{t-2})\right]=E[\gamma u_{t-1}^2]$$`

`$$\rho_1 = \frac{E[Y_tY_{t-1}] }{Var(Y_t)}=\frac{\gamma E[u_{t-1}^2]}{E[u^2_t + \gamma^2 u^2_{t-1}+ 2\gamma u_{t} u_{t-1}]}=\frac{\gamma}{1 + \gamma^2 }$$`

- For an MA(1) process we have `$\rho_2 = 0$`, and likewise for all higher lags
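
The same check works for the MA(1) formula. A minimal sketch with `ARMAacf()` from `stats`, where the MA coefficient `0.5` is an illustrative value:

```r
# Theoretical MA(1) autocorrelations versus gamma / (1 + gamma^2)
gamma_ma = 0.5                                # illustrative MA(1) coefficient
rho = ARMAacf(ma = gamma_ma, lag.max = 3)     # lags 0, 1, 2, 3

rho[["1"]]                                    # first autocorrelation
gamma_ma / (1 + gamma_ma^2)                   # formula from the slide
rho[["2"]]                                    # zero beyond lag 1
```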

---
## MA autocorrelations

```r
Y = arima.sim(list( ma = c(.5)),  n = 5000)
ggAcf(Y)
```

![](time_series07_files/figure-html/unnamed-chunk-5-1.png)

???

Do Arima sim exercise

---
## AR process as MA process

- Repeated substitution of `$Y_{t-i}$` in the AR(1) process with coefficient `$\beta$`:
`$$Y_t = u_t + \beta \color{red}{Y_{t-1}}$$`
`$$Y_t = u_t + \beta (u_{t-1} + \beta \color{red}{Y_{t-2}})$$`
`$$Y_t = u_t + \beta u_{t-1} + \beta^2 (u_{t-2} + \beta \color{red}{Y_{t-3}})$$`
`$$Y_t = u_t + \beta u_{t-1} + \beta^2 u_{t-2} + \beta^3 u_{t-3} + \ldots = \sum_{i=0}^\infty \beta^i u_{t-i}$$`

- A stationary AR(1) process (`$|\beta| < 1$`) can be written as a MA( `$\infty$` ) process

- _Wold decomposition theorem_: Any covariance-stationary process can be written in MA( `$\infty$` ) form (plus a deterministic component)
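
The MA( `$\infty$` ) weights can also be obtained numerically. A short sketch with `ARMAtoMA()` from `stats`, where `a = 0.9` is an illustrative AR(1) coefficient:

```r
# MA(infinity) weights of an AR(1): the weight on u_{t-i} is a^i
a   = 0.9                                  # illustrative AR(1) coefficient
psi = ARMAtoMA(ar = a, lag.max = 5)        # weights on u_{t-1}, ..., u_{t-5}

cbind(lag = 1:5, psi, power = a^(1:5))     # the two columns should agree
```

The weights shrink geometrically but never reach zero, which is the sense in which an AR process depends on all previous errors.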

---
## Autocorrelations of AR(1)

The autocorrelations of `$Y_t = \beta Y_{t-1}+u_t$` where `$u_t$` has sd `$\sigma_u$`, simulated here with `$\beta = 0.9$` and `$\sigma_u = 0.1$`

```r
Y = arima.sim(list( ar = c(.9)),  n = 5000, sd=.1)
ggAcf(Y)
```

![](time_series07_files/figure-html/unnamed-chunk-6-1.png)

---
## [Autocorrelations of AR(1)<sup> 🔗 </sup>](http://192.121.208.72:3939/time_series07-figs.Rmd)

- The autocorrelations of `$Y_t = \beta Y_{t-1}+u_t$` where `$u_t$` has sd `$\sigma_u$`
- 200 observations - uncertain estimation of autocorrelations

![](time_series07_files/figure-html/unnamed-chunk-7-1.png)

---
## Partial autocorrelation function

- The partial autocorrelation at lag p is the estimate of the last coefficient in an autoregression
  - It is the coefficient on `$Y_{t-p}$` in a regression of `$Y_t$` on lags 1 to p

```r
ggPacf(USMacro$GDPGR)
```

![](time_series07_files/figure-html/unnamed-chunk-8-1.png)
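
The regression interpretation of the PACF can be sketched as follows. The helper `pacf_by_ols()` is hypothetical (written for this example), and the data are simulated from an AR(2) with illustrative coefficients. The built-in `pacf()` uses a slightly different estimator, so the two agree only approximately.

```r
# PACF at lag p = last coefficient in an OLS regression of Y_t on lags 1..p
set.seed(4)
Y = as.numeric(arima.sim(model = list(ar = c(0.5, 0.3)), n = 5000))

pacf_by_ols = function(y, p) {
  lags = embed(y, p + 1)            # column 1 is y_t, column k + 1 is y_{t-k}
  fit  = lm(lags[, 1] ~ lags[, -1])
  unname(coef(fit)[p + 1])          # coefficient on y_{t-p}
}

ols     = sapply(1:4, function(p) pacf_by_ols(Y, p))
builtin = as.numeric(pacf(Y, lag.max = 4, plot = FALSE)$acf)
round(cbind(lag = 1:4, ols, builtin), 3)
```

For an AR(2), the partial autocorrelations beyond lag 2 should be close to zero.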

---
## Lag notation

`$$Y_t + a_1 Y_{t-1} + a_2 Y_{t-2} + a_3 Y_{t-3} + a_4 Y_{t-4} = a(L)Y_t$$`
where `$L^s$` is the _lag operator_ with `$Y_{t-s} = L^s Y_t$`, and 
`$$a(L) = 1 + a_1 L^1 + a_2 L^2 + a_3 L^3 + a_4 L^4$$`

- The relationship between lag operators, polynomials and unit roots is beyond this course

---
## ARMA

`$$Y_t  = a_1 Y_{t-1} + a_2 Y_{t-2} + u_t + b_1 u_{t-1} + b_2 u_{t-2}$$`

- ARMA(2, 2) process:

`$$Y_t - a_1 Y_{t-1} - a_2 Y_{t-2}  = u_t + b_1 u_{t-1} + b_2 u_{t-2}$$`

- ARMA(p, q) process:

`$$a(L)Y_t = b(L) u_{t}$$`
where, for the ARMA(2, 2) example,
`$$a(L) = 1 - a_1 L^1 - a_2 L^2, \qquad b(L) = 1 + b_1 L^1 + b_2 L^2$$`

- ARIMA uses differencing to get stationarity, then ARMA(p, q)
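
A minimal sketch of ARMA and ARIMA estimation with `arima()` from base R, on simulated data. The ARMA(1,1) coefficients, sample size and seed are illustrative assumptions. `arima()` writes the MA part with plus signs, as in the equations above.

```r
# Simulate an ARMA(1,1) and estimate it; then integrate once and fit an ARIMA(1,1,1)
set.seed(5)
Y = as.numeric(arima.sim(model = list(ar = 0.7, ma = 0.4), n = 5000))

fit_arma = arima(Y, order = c(1, 0, 1))   # order = (p, d, q); d = 0 means no differencing
coef(fit_arma)                            # ar1, ma1 and the intercept (mean)

Z = cumsum(Y)                             # nonstationary: diff(Z) is the ARMA(1,1) series
fit_arima = arima(Z, order = c(1, 1, 1))  # d = 1: difference once, then fit ARMA(1,1)
coef(fit_arima)                           # ar1 and ma1
```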

---
class: inverse, center, middle

# Time series in R

---
## Dates

- Dates and times are complicated (messy)
- Different types
  - yearly
  - quarterly
  - monthly
  - weekly
  - daily
  - with time - different precisions

- Various difficulties, for example:
  - specify date ranges
  - axis tick mark labels in plots
  - handle missing dates in plots and with lags

---
## Dates in R

- Dates are complicated in all statistics programs

- Messier in R than in Stata

- Several different solutions
  
  - In general they do not use data frames
  
  - Each variable is an object in the environment rather than in a data frame
  
    - The `tsibble` package is a newer solution that uses data frames

---
## Lags and differences in R

- Old methods using time series objects (zoo, ts, mts,...)

- Can be useful, but more complicated
    
- New method (tidyverse) using dataframes

- Tools improving quickly 
    
    - Packages: forecast, tsibble, fable

- The length of time between time periods does not matter

    - Lags can be in years or seconds
    
    - Length of time between observations is assumed constant

- Dataframe has to be sorted by time!
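
A small sketch of the data frame approach with `dplyr`. The data frame `d` and its values are hypothetical, entered out of order on purpose to show why sorting matters.

```r
library(dplyr)

# Hypothetical yearly data, deliberately out of order
d = data.frame(year = c(2003, 2001, 2002, 2004),
               y    = c(30, 10, 20, 45))

d = arrange(d, year)              # sort by time first
d = mutate(d,
           y_lag1 = lag(y),       # previous row's value; NA in the first row
           y_diff = y - lag(y))   # first difference
d
```

`lag()` simply shifts rows by one position. It does not look at `year`, so a missing year would silently produce a lag over a longer interval.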

---
## Data management in R with dplyr

The `dplyr` package in `tidyverse` is for data management and analysis.

- `select(data, colnames)` - Select columns

- `filter(data, condition)` - Select rows on condition

- `rename(data, newname = oldname)` - Rename column(s)

- `mutate(data, formulas)` - Modify or add columns
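
The four verbs can be tried on a small example. The data frame below is hypothetical; only the column names `Date` and `GDPC96` are borrowed from the lecture dataset.

```r
library(dplyr)

# Hypothetical quarterly data with one column we do not need
d = data.frame(Date   = as.Date(c("2000-01-01", "2000-04-01", "2000-07-01")),
               GDPC96 = c(100, 102, 101),
               Other  = 1:3)

d = select(d, Date, GDPC96)                   # keep two columns
d = rename(d, GDP = GDPC96)                   # newname = oldname
d = filter(d, Date >= as.Date("2000-04-01"))  # keep rows from the second quarter on
d = mutate(d, logGDP = log(GDP))              # add a column
d
```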

---
## Time series libraries in R

* `library(forecast)`
    - `Acf()` and `Pacf()` - slightly better than `acf()` and  `pacf()`
    - `ggAcf()` and `ggPacf()` - ACF using ggplot
    - Estimation of ARMA 
    - Forecasts

* `library(tseries)`
    - `adf.test()` - slightly better than `adf()`
    
* `library(AER)` - datasets
* `library(dynlm)` - dynamic linear models
    - lags, trends, seasons

---
### Estimate GDP growth (S&W p 578)

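```r
library(tidyverse)  # filter() and lag() below are the dplyr versions
library(lubridate)  # year()
library(huxtable)   # huxreg()
library(estimatr)   # lm_robust(): OLS with robust standard errors
# Restrict the sample to 1962-2012
USMacro2 = filter(USMacro, year(Date) >= 1962, year(Date) <= 2012 ) 
# lag() shifts rows, so the data must be sorted by Date
ar1 = lm_robust(GDPGR ~ lag(GDPGR), data = USMacro2)
ar2 = lm_robust(GDPGR ~ lag(GDPGR) + lag(GDPGR, 2), data = USMacro2)
huxreg(AR1 = ar1, AR2 = ar2)
```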

<table class="huxtable" style="border-collapse: collapse; border: 0px; margin-bottom: 2em; margin-top: 2em; ; margin-left: auto; margin-right: auto;  " id="tab:unnamed-chunk-9">
<col><col><col><tr>
<th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.8pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.8pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">AR1</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.8pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">AR2</th></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(Intercept)</th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">1.995 ***</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">1.632 ***</td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.353)   </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.408)   </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">lag(GDPGR)</th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.338 ***</td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.278 ***</td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.077)   </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.081)   </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">lag(GDPGR, 2)</th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">        </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.179 *  </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">        </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.081)   </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">N</th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">203        </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">202        </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.8pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">R2</th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.8pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.115    </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.8pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.143    </td></tr>
<tr>
<th colspan="3" style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0.8pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;"> *** p < 0.001;  ** p < 0.01;  * p < 0.05.</th></tr>
</table>
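
The table can be used for simple arithmetic. In this sketch the coefficients are copied from the table above, while the two recent growth rates used for the forecast are hypothetical values chosen for illustration.

```r
# Coefficients copied from the AR1 and AR2 columns of the table
b_ar1 = c(intercept = 1.995, lag1 = 0.338)
b_ar2 = c(intercept = 1.632, lag1 = 0.278, lag2 = 0.179)

# Implied long-run mean: intercept / (1 - sum of the AR coefficients)
mean_ar1 = b_ar1[["intercept"]] / (1 - b_ar1[["lag1"]])
mean_ar2 = b_ar2[["intercept"]] / (1 - b_ar2[["lag1"]] - b_ar2[["lag2"]])
c(AR1 = mean_ar1, AR2 = mean_ar2)

# One-step-ahead AR(2) forecast from two hypothetical recent growth rates
gdpgr_last = 2.0   # hypothetical GDPGR in period t
gdpgr_prev = 1.0   # hypothetical GDPGR in period t-1
forecast_ar2 = b_ar2[["intercept"]] +
  b_ar2[["lag1"]] * gdpgr_last + b_ar2[["lag2"]] * gdpgr_prev
forecast_ar2
```

Both models imply a long-run mean growth rate of about 3, even though their intercepts differ.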

---
## Plot based on estimate of GDP growth

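```r
# Simulate one path from the estimated AR(2): slope coefficients and residual sd from ar2.
# This is a random draw with the same dynamics, not a fitted or forecast series.
USMacro$GDPGR_p = arima.sim(n = length(USMacro$GDPGR) , 
                            model =list(ar = ar2$coefficients[2:3]),  
                            sd = sqrt(ar2[["res_var"]])
                  ) + mean(USMacro$GDPGR)  # shift the zero-mean simulation to the sample mean
ggplot(USMacro) + aes(Date, GDPGR) +
    geom_line(aes(color="Actual")) + geom_line(aes(y=GDPGR_p, color="Predicted")) 
```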

![](time_series07_files/figure-html/unnamed-chunk-10-1.png)

---
## Next Lecture

- Chapter 16: Estimation of Dynamic Causal Effects