Microeconometrics - Lecture 8

class: center, middle, inverse, title-slide

# Microeconometrics - Lecture 8
## Panel data
### Jonas Björnerstedt
### 2022-03-15

---

## Lecture Content

- Chapter 10. Panel data

1.  Fixed Effects

-  Endogeneity - unobserved individual effects

-  Least Squares Dummy Variable (LSDV) estimation

-  Fixed Effects estimation

-  Robust standard errors
    
2.  Random Effects

-  Efficiency: observations for the same individual are correlated
    
3.  Long panels

---
class: inverse, center, middle
# Panel data

---
## Panel data = multiple observations

- **Cross section** - observations over `$n$` individuals in one
    time period

- **Time series** - observations of one individual
    over `$T$` time periods

- **Panel data** - observations of `$n$` individuals
    over `$T$` time periods

-  Also called *longitudinal* data

-  Observations have two subscripts: `$X_{it}$` and `$Y_{it}$`

-  Can be

-  **balanced**: exactly `$T$` observations per `$i$`

-  **unbalanced**: `$T_i \le T$` observations per `$i$`, with fewer than `$T$` for some `$i$`

-  Handled the same way.

-  Think about endogeneity: Why is the data unbalanced?!
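
Whether a panel is balanced can be checked by counting observations per individual. A minimal sketch in base R on a made-up toy panel (the data frame `panel` and its columns `id`, `t`, `y` are hypothetical, not from the crime data):

```r
# Hypothetical toy panel: individual 3 is missing the observation for t = 2
panel <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3),
  t  = c(1, 2, 3, 1, 2, 3, 1, 3),
  y  = c(2.1, 2.4, 2.2, 3.0, 3.3, 3.1, 1.5, 1.9)
)

# Number of observations per individual
obs_per_id <- table(panel$id)

# Balanced if every individual is observed in every distinct time period
n_periods   <- length(unique(panel$t))
is_balanced <- all(obs_per_id == n_periods)

print(obs_per_id)
print(is_balanced)
```

The count table also shows which individuals have gaps, which is the starting point for asking why they are missing.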
        
---
## Panel data - Advantages

Many observations of same individual `$i$` over time `$t$` gives:

- More data

- Can estimate individual coefficients

- Can handle unobserved individual characteristics

- Can handle autocorrelation and heteroskedasticity

---
## Short and long panels

- Short panel

-  Extension of cross section analysis

-  Independently sampled individuals

-  Arbitrary heteroskedasticity and autocorrelation structures

-  Properties derived as `$n \rightarrow \infty$`

- Long (or dynamic) panel

-  Time series structure becomes important

-  Extension of time series analysis

-  Properties derived as `$T \rightarrow \infty$`

---
class: inverse, center, middle
# Fixed effects

---
## Crime dataset

* In `wooldridge` package

* Relationship between law enforcement and crime

* `prbarr` - Probability of Arrest

* `crmrte` - Crime Rate

* Focus on 4 counties

```r
library(wooldridge)
library(dplyr)  # provides filter()
data(crime4)
css = filter(crime4, county %in% c(1, 3, 23, 145))  # subset to 4 counties
```

---
## Crime plot

```r
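library(ggplot2)         # ggplot(), geom_point(), geom_smooth()
library(xaringanthemer)  # theme_xaringan()
ggplot(css, aes(x = prbarr, y = crmrte)) + 
  geom_point() + 
  geom_smooth(method = "lm", se = FALSE) +   # pooled OLS line, no confidence band
  theme_xaringan() +  
  labs(x = 'prbarr - Probability of Arrest', y = 'crmrte - Crime Rate')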
```

![](me08_files/figure-html/crime1-1.png)

---
## Effect of change in variable

* How much higher do we expect crime to be if the probability of arrest goes from 0.2 to 0.3? (or 20% to 30% in other words)

```r
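library(estimatr)  # lm_robust()
library(knitr)     # kable()
xsection = lm_robust(crmrte ~ prbarr, data = css)  # pooled regression, robust standard errors
xsection_p = predict(xsection, newdata = data.frame(prbarr = c(0.2, 0.3)))
kable(xsection_p)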
```

|         x|
|---------:|
| 0.0214952|
| 0.0279753|

* `predict` returns predictions for the actual data (fitted values) or, with `newdata`, for hypothetical values
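
Because the pooled model is linear, the gap between the two predictions equals the slope times the 0.1 change in `prbarr`. A quick check using the two rounded values from the table above (copied by hand; `pred`, `change` and `slope` are names chosen for this sketch):

```r
# Predicted crime rates copied from the table above (rounded as printed)
pred   <- c(0.0214952, 0.0279753)
prbarr <- c(0.2, 0.3)

change <- diff(pred)            # difference in predicted crime rate
slope  <- change / diff(prbarr) # implied pooled slope on prbarr

print(change)
print(slope)
```

The implied pooled slope is positive: in the pooled data, a higher arrest probability goes together with a higher crime rate.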

---
## Panel data

* Different areas have different crime rates

![](me08_files/figure-html/unnamed-chunk-4-1.png)

---
## District relationships

* Looks like they all have similar slopes

![](me08_files/figure-html/unnamed-chunk-5-1.png)

---
## Only different intercept - _Fixed Effect_

* With the same slope, only different intercepts

![](me08_files/figure-html/dummy-1.png)

---
## Calculating the slope

* Now we will use three different methods for estimating the relationship

1) Use dummy variables

2) Subtract off mean (demean)

3) Fixed effects estimator

* In practice we use the last method

---
### Dummy Variable Regression

```r
library(broom)  # tidy() returns regression results as a data frame
# "+ 0" drops the common intercept so that every county gets its own dummy
dvreg = lm(crmrte ~ prbarr + factor(county) + 0, data = css)
tidy(dvreg)
```

---
## Demeaning

* subtract off mean in each county from observations in that county

```r
css2 = group_by(css, county) 
cdata = mutate(css2, 
    crmrte = crmrte - mean(crmrte),
    prbarr = prbarr - mean(prbarr)
    )
```

Estimation using demeaned variables:

```r
demeanreg = lm_robust(crmrte ~ prbarr + 0, data = cdata)
tidy(demeanreg) # pretty print regression results
```

* Negative relationship within counties

---
## Demeaning illustration

![Animation of a fixed effects panel data estimator: we remove *between group* variation and concentrate on *within group* variation only](me08_files/figure-html/anim-1.gif)

---
### Using a package

* Several packages offer Fixed Effects estimation. Here we use `lm_robust` with the `fixed_effects` argument

```r
fe_reg = lm_robust(crmrte ~ prbarr, data = css, fixed_effects = county)
tidy(fe_reg)
```

---
## Comparing results

* Same estimated coefficient

* `huxreg` with a title for each regression

```r
huxreg('Dummy' = dvreg, 'Demeaned' = demeanreg, 'FE' = fe_reg)
```

<table class="huxtable" style="border-collapse: collapse; border: 0px; margin-bottom: 2em; margin-top: 2em; ; margin-left: auto; margin-right: auto;  " id="tab:unnamed-chunk-11">
<col><col><col><col><tr>
<th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.8pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.8pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">Dummy</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.8pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">Demeaned</th><th style="vertical-align: top; text-align: center; white-space: normal; border-style: solid solid solid solid; border-width: 0.8pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">FE</th></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">prbarr</th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">-0.028 *  </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">-0.028 </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">-0.028 </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></th><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.014)   </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.018)</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.020)</td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">factor(county)1</th><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.045 ***</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></th><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.005)   </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">factor(county)3</th><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.020 ***</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></th><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.003)   </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">factor(county)23</th><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.036 ***</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></th><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.004)   </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">factor(county)145</th><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.038 ***</td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;"></th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">(0.005)   </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.4pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">N</th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">28        </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">28     </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0.4pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">28     </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">R2</th><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.991    </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.159 </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">0.893 </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">logLik</th><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">126.516    </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td><td style="vertical-align: top; text-align: right; white-space: normal; padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td></tr>
<tr>
<th style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.8pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">AIC</th><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.8pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">-241.032    </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.8pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td><td style="vertical-align: top; text-align: right; white-space: normal; border-style: solid solid solid solid; border-width: 0pt 0pt 0.8pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;">     </td></tr>
<tr>
<th colspan="4" style="vertical-align: top; text-align: left; white-space: normal; border-style: solid solid solid solid; border-width: 0.8pt 0pt 0pt 0pt;    padding: 6pt 6pt 6pt 6pt; font-weight: normal;"> *** p < 0.001;  ** p < 0.01;  * p < 0.05.</th></tr>
</table>
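
The slope is the same in all three columns, but the standard errors differ. Part of this is that the Dummy column comes from `lm` while the other two come from `lm_robust`. A separate, purely mechanical reason is degrees of freedom: a regression on demeaned data does not know that one mean per individual was estimated. The sketch below isolates that second point with classical standard errors on simulated data (hypothetical numbers, not `crime4`; all variable names are chosen for this illustration):

```r
set.seed(1)

# Simulated balanced panel: 4 individuals, 7 periods (hypothetical data)
n_id <- 4; n_t <- 7
id    <- rep(1:n_id, each = n_t)
alpha <- rep(c(0.045, 0.020, 0.036, 0.038), each = n_t)  # individual intercepts
x     <- runif(n_id * n_t, 0.1, 0.5)
y     <- alpha - 0.03 * x + rnorm(n_id * n_t, sd = 0.005)

# LSDV: one dummy per individual, no common intercept
lsdv <- lm(y ~ x + factor(id) + 0)

# Demeaned regression: ave() returns the individual mean for each row
y_dm <- y - ave(y, id)
x_dm <- x - ave(x, id)
dm   <- lm(y_dm ~ x_dm + 0)

b_lsdv  <- unname(coef(lsdv)["x"])
b_dm    <- unname(coef(dm)["x_dm"])
se_lsdv <- coef(summary(lsdv))["x", "Std. Error"]
se_dm   <- coef(summary(dm))["x_dm", "Std. Error"]

# Rescale by the ratio of residual degrees of freedom
# (demeaned: nT - 1, LSDV: nT - n - 1)
se_dm_adj <- se_dm * sqrt(df.residual(dm) / df.residual(lsdv))

print(c(b_lsdv = b_lsdv, b_dm = b_dm))
print(c(se_lsdv = se_lsdv, se_dm = se_dm, se_dm_adj = se_dm_adj))
```

The unadjusted demeaned standard error is too small; after the rescaling it matches LSDV. Dedicated fixed effects routines typically make this adjustment themselves.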

---
## Individual intercept - LSDV estimate

- Assume that the model has only one regressor `$X$`
    `$$Y_{it} = \beta_{0} + \beta_{1}X_{it} + \alpha_{i} + u_{it}$$`

- Then each individual has their own intercept
    `$$Y_{it} = \left(\beta_{0} + \alpha_{i}\right) + \beta_{1}X_{it} + u_{it}$$`

- Individual dummies capture everything that is constant within an individual

-  Dummies must be included if `$\alpha_{i}$` is correlated with the regressors

-  Otherwise `$\alpha_{i}$` is an omitted variable: `$W_{it} = \alpha_i$`

-  Without the dummies, the estimate is then biased and inconsistent

-  Compare with the pooled estimate

- Called the Least Squares Dummy Variable (LSDV) estimate
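
The omitted variable problem can be illustrated with a small simulation. This is a sketch under assumed values: the data are simulated so that `$\alpha_{i}$` is correlated with `$X_{it}$`, the true slope is set to -1, and all names are hypothetical.

```r
set.seed(42)

# Simulated panel (hypothetical): alpha_i is correlated with X_it
n_id  <- 200; n_t <- 5
id    <- rep(1:n_id, each = n_t)
alpha <- rep(rnorm(n_id), each = n_t)   # individual effect, constant over t
x     <- alpha + rnorm(n_id * n_t)      # regressor loads on alpha_i
beta1 <- -1                             # true slope
y     <- 2 + beta1 * x + alpha + rnorm(n_id * n_t)

pooled <- lm(y ~ x)                     # ignores alpha_i
lsdv   <- lm(y ~ x + factor(id))        # one dummy per individual

b_pooled <- unname(coef(pooled)["x"])
b_lsdv   <- unname(coef(lsdv)["x"])
print(c(true = beta1, pooled = b_pooled, lsdv = b_lsdv))
```

In this design `$\mathrm{Cov}(X_{it}, \alpha_{i}) / \mathrm{Var}(X_{it}) = 1/2$`, so the pooled slope is pulled toward -0.5 while LSDV stays close to the true value.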

---
## Bias of pooled OLS and panel estimate

- Bias depends on whether the individual effect `$\alpha_{i}$` is correlated with some regressor `$X_{it}$`

![](figures/randomeffects.png)

---
## Bias of pooled OLS and panel estimate

- Bias depends on whether the individual effect `$\alpha_{i}$` is correlated with some regressor `$X_{it}$`

![](figures/fixedeffects.png)

---
## Fixed Effects (FE) Model

- For each individual, take the difference from the individual's average over time

-  Assume that the model has only one regressor `$X$`:
        `$$Y_{it} = \beta_{0} + \beta_{1}X_{it} + u_{it} + \alpha_{i}$$`

-  Let `$\bar{X}_{i}$` be the mean value of `$X$` for individual `$i$` over time `$t$`:
        `$$\tilde{X}_{it} = X_{it} - \bar{X}_{i}$$`

- Problems with LSDV

-  Output is not concise: one coefficient per individual (we don’t care about individuals)

-  Computational limitations: huge variance-covariance matrix

-  Statistically identical to FE, so nothing is lost by using FE instead
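
Why demeaning removes the individual effect can be written out for the one-regressor model. Averaging the model over `$t$` for each `$i$` (with `$\bar{Y}_{i}$` and `$\bar{u}_{i}$` defined like `$\bar{X}_{i}$`) gives

`$$\bar{Y}_{i} = \beta_{0} + \beta_{1}\bar{X}_{i} + \bar{u}_{i} + \alpha_{i}$$`

Subtracting this from the original equation, both `$\beta_{0}$` and `$\alpha_{i}$` drop out because they do not vary over time:

`$$\tilde{Y}_{it} = \beta_{1}\tilde{X}_{it} + \tilde{u}_{it}, \qquad \tilde{Y}_{it} = Y_{it} - \bar{Y}_{i}, \quad \tilde{u}_{it} = u_{it} - \bar{u}_{i}$$`

OLS on the demeaned variables therefore estimates `$\beta_{1}$` using only within-individual variation.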

---
## Serial correlation, Heteroskedasticity and data

- In time series data, the structure of autocorrelation is limited by data

-  For sample size `$T$`, the correlation matrix has on the order of `$T^{2}$` elements

-  More parameters than observations!

- With many individuals relative to time periods, `$n \gg T$`

-  Arbitrary serial correlation can be accounted for

- Short panel allows estimation of any form of autocorrelation and
    heteroskedasticity

- Clustered standard errors

- Also called HAC standard errors

-  Heteroskedasticity and Autocorrelation Consistent
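
To make the idea concrete, the cluster-robust variance can be computed by hand: scores are summed within each individual before forming the middle of the sandwich, so any correlation pattern within an individual is allowed. A minimal base R sketch on simulated data, without the small-sample corrections that packages usually apply (`cluster_vcov` is a hypothetical helper written for this illustration):

```r
# Cluster-robust sandwich variance for OLS, no small-sample correction
cluster_vcov <- function(X, u, cluster) {
  bread  <- solve(crossprod(X))              # (X'X)^{-1}
  scores <- rowsum(X * u, group = cluster)   # sum of x_it * u_it within each cluster
  meat   <- crossprod(scores)                # sum over clusters of s_g s_g'
  bread %*% meat %*% bread
}

set.seed(7)
n_id <- 50; n_t <- 4
id <- rep(1:n_id, each = n_t)
x  <- rnorm(n_id * n_t)
# The error has an individual component, so it is correlated within id
err <- rep(rnorm(n_id), each = n_t) + rnorm(n_id * n_t)
y   <- 1 + 0.5 * x + err

fit <- lm(y ~ x)
X   <- model.matrix(fit)
u   <- residuals(fit)

V_cluster <- cluster_vcov(X, u, id)
# With every observation as its own cluster, the formula reduces to the
# heteroskedasticity-robust (HC0) variance
V_hc0 <- cluster_vcov(X, u, seq_along(u))

print(sqrt(diag(V_cluster)))  # clustered standard errors
print(sqrt(diag(V_hc0)))      # heteroskedasticity-robust standard errors
```

Because packages add finite-sample adjustments, their clustered standard errors will differ slightly from this uncorrected version.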

---
class: inverse, center, middle
# Random effects

---
## Correlated errors

- The composite error is correlated over time for the same individual `$$v_{it} = u_{it} + \alpha_{i}$$`

- But it is not correlated with any regressor

![](figures/charity.png)
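
The correlation follows directly from the shared term `$\alpha_{i}$`. As a sketch, assume `$u_{it}$` has variance `$\sigma_{u}^{2}$` and is uncorrelated over time and with `$\alpha_{i}$`, and that `$\alpha_{i}$` has variance `$\sigma_{\alpha}^{2}$` (these variance symbols are introduced here for illustration). Then for two periods `$t \neq s$`:

`$$\mathrm{Cov}(v_{it}, v_{is}) = \sigma_{\alpha}^{2}, \qquad \mathrm{Corr}(v_{it}, v_{is}) = \frac{\sigma_{\alpha}^{2}}{\sigma_{\alpha}^{2} + \sigma_{u}^{2}}$$`

The larger the share of the individual component in the total error variance, the stronger the correlation within an individual.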

---
## Random effects (RE)

- Also called the Error Components model `$$v_{it} = u_{it} + \alpha_{i}$$`

- A Generalized Least Squares (GLS) model

-  Use knowledge of error structure to increase efficiency of estimate

- Strong assumptions required to treat this as a pure correlation problem

-  Individual unobservables `$\alpha_{i}$` are uncorrelated with `$X_{it}$`
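
In a balanced panel, the GLS estimator can be computed as OLS on quasi-demeaned data, subtracting only a fraction `$\theta$` of the individual mean, where `$\theta = 1 - \sqrt{\sigma_{u}^{2} / (\sigma_{u}^{2} + T\sigma_{\alpha}^{2})}$`. The sketch below assumes the two variances are known, which they are not in practice, and uses simulated data with hypothetical names:

```r
set.seed(3)
n_id <- 100; n_t <- 5
id <- rep(1:n_id, each = n_t)
sigma_a <- 1; sigma_u <- 1          # assumed known for this sketch
x <- rnorm(n_id * n_t)              # uncorrelated with alpha_i by construction
y <- 1 + 0.5 * x +
  rep(rnorm(n_id, sd = sigma_a), each = n_t) +   # alpha_i
  rnorm(n_id * n_t, sd = sigma_u)                # u_it

# Slope from OLS on quasi-demeaned y and x for a given theta
re_slope <- function(theta) {
  y_q <- y - theta * ave(y, id)
  x_q <- x - theta * ave(x, id)
  unname(coef(lm(y_q ~ x_q))["x_q"])
}

theta <- 1 - sqrt(sigma_u^2 / (sigma_u^2 + n_t * sigma_a^2))

print(c(theta  = theta,
        pooled = re_slope(0),       # theta = 0: pooled OLS
        re     = re_slope(theta),   # random effects (GLS)
        fe     = re_slope(1)))      # theta = 1: fixed effects
```

RE thus sits between pooled OLS and FE, moving toward FE as `$T$` or the variance of `$\alpha_{i}$` grows.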

---
## Random effects (RE)

- If the assumptions hold, RE is more efficient than

-  the pooled estimate if unobserved heterogeneity exists

-  the FE estimate if the unobserved effect is uncorrelated with the observed regressors

- Hausman test

-  Is the FE estimate significantly different from the RE estimate?

-  Assumes homoskedastic error
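
In its standard textbook form, with `$k$` time-varying regressors, the test statistic compares the two coefficient vectors relative to the difference in their estimated variances:

`$$H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)'\left[\widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE})\right]^{-1}\left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)$$`

Under the null hypothesis that the individual effect is uncorrelated with the regressors, `$H$` is asymptotically `$\chi^{2}$` distributed with `$k$` degrees of freedom. Using the difference of the two variances relies on RE being efficient under the null, which is where the homoskedasticity assumption enters.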

---
## Comparison Fixed and Random effects

- RE is more efficient than FE

-  Variation between individuals is also used

-  Between variation assumed to be uncorrelated with regressors

- Strong assumptions required for RE to be consistent

-  Individual unobservables `$W_{it}$` are uncorrelated with
        observables `$X_{it}$`

-  Major point of FE is to solve problem of unobservables

-  Increased efficiency is often a lesser issue

---
## Next lecture

- Chapter 8. Nonlinear models