class: center, middle, inverse, title-slide

# B/C Econometrics - Lecture 4
## Dummy variables and Panel data
### Jonas Björnerstedt
### 2021-10-25

---

## Lecture Content

1. Dummy variables
1. Panel data
    - Endogeneity - unobserved individual effects
    - Least Square Dummy Variables (LSDV) estimation
    - Fixed Effects estimation
1. Robust standard errors
1. Some R - pipes

---
class: inverse, center, middle

# [Dummy variables](https://rstudio.sh.se/content/statistics04-figs/)

---

# Regression and dummy variables

* [Dummy variable exercise](https://rstudio.sh.se/content/statistics04-figs/) <sup> 🔗 </sup>

---

## Relationship differs by gender

```r
ggplot(lengths) + aes(length, weight, color = gender) +
  geom_point() + geom_smooth(method = "lm")
```

```
## `geom_smooth()` using formula 'y ~ x'
```

![](lecture04_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

---

## Regression with dummy

* Only the intercept differs by gender

```r
reg = lm(weight ~ length + gender, data = lengths)
ggplot(lengths) + aes(length, weight, color = gender) + geom_point() +
  geom_line(aes(y = predict(reg)))
```

![](lecture04_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

---
class: inverse, center, middle

# Panel data and Fixed effects

---

## Panel data = multiple observations

- **Cross section** - observations on `\(n\)` individuals in one time period
- **Time series** - observations on one individual over `\(T\)` time periods
- **Panel data** - observations on `\(n\)` individuals over `\(T\)` time periods
    - Also called *longitudinal* data
- Can be
    - **balanced**: exactly `\(T\)` observations per individual `\(i\)`
    - **unbalanced**: `\(t \le T\)` observations per individual `\(i\)`
    - Both are handled the same way
    - Think about endogeneity: why is the data unbalanced?
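---

## Panel structure in code

The balanced/unbalanced distinction can be illustrated with a minimal base R sketch (toy data, not the course datasets):

```r
# Hypothetical panel with n = 2 individuals and T = 3 time periods
balanced   <- data.frame(id = c(1, 1, 1, 2, 2, 2), t = c(1, 2, 3, 1, 2, 3))
unbalanced <- data.frame(id = c(1, 1, 1, 2, 2),    t = c(1, 2, 3, 1, 2))
table(balanced$id)    # every individual is observed exactly T = 3 times
table(unbalanced$id)  # individual 2 is observed only twice
```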
---

## Panel data - Advantages

Many observations of the same individual `\(i\)` over time `\(t\)` give:

- More data
- Can handle unobserved individual characteristics
- Can handle autocorrelation and heteroskedasticity
    - not discussed in this course

---

## Crime dataset

* In the `wooldridge` package
* Relationship between law enforcement and crime
* `prbarr` - Probability of Arrest
* `crmrte` - Crime Rate
* Focus on 4 counties

```r
library(wooldridge)
data(crime4)
css = filter(crime4, county %in% c(1, 3, 145, 23)) # subset to 4 counties
```

---

## Crime plot

```r
ggplot(css, aes(x = prbarr, y = crmrte)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_xaringan() +
  labs(x = 'prbarr - Probability of Arrest', y = 'crmrte - Crime Rate')
```

![](lecture04_files/figure-html/crime1-1.png)<!-- -->

---

## Effect of change in variable

* How much higher do we expect crime to be if the probability of arrest goes from 0.2 to 0.3 (in other words, from 20% to 30%)?

```r
xsection = lm_robust(crmrte ~ prbarr, data = css)
xsection_p = predict(xsection, newdata = data.frame(prbarr = c(0.2, 0.3)))
kable(xsection_p)
```

|         x|
|---------:|
| 0.0214952|
| 0.0279753|

* `predict()` is used to obtain predictions for actual data (fitted values) or for hypothetical values

---

## Panel data

* Different areas have different crime rates

![](lecture04_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

---

## District relationships

* Looks like they all have similar slopes

![](lecture04_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---

## Only different intercept - _Fixed Effect_

* With the same slope, only the intercepts differ

![](lecture04_files/figure-html/dummy-1.png)<!-- -->

---

## Calculating the slope

* We will estimate the relationship with three different methods:

1) Use dummy variables
2) Subtract off the mean (demean)
3) Fixed effects estimator

* In practice we use the last method

---

### Dummy Variable Regression

```r
library(broom) # pretty print regression results
dvreg = lm(crmrte ~ prbarr + factor(county) + 0, css)
tidy(dvreg)
```
| term              | estimate | std.error | statistic |  p.value |
|:------------------|---------:|----------:|----------:|---------:|
| prbarr            |  -0.0284 |    0.0136 |     -2.08 |   0.0486 |
| factor(county)1   |   0.0449 |   0.00456 |      9.87 | 9.85e-10 |
| factor(county)3   |   0.0199 |   0.00265 |      7.54 | 1.18e-07 |
| factor(county)23  |   0.0364 |     0.004 |      9.10 | 4.37e-09 |
| factor(county)145 |   0.0384 |    0.0049 |      7.85 | 5.98e-08 |
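---

### Reading the dummy regression

Each `factor(county)` coefficient is that county's intercept, and `prbarr` is the common slope. A quick check with the rounded estimates from the table, at a hypothetical arrest probability of 0.3:

```r
# Predicted crime rate in county 1 at prbarr = 0.3, using the rounded
# coefficients from the table above (an illustration, not new estimation)
b_prbarr  <- -0.0284  # common slope
a_county1 <-  0.0449  # intercept for county 1
a_county1 + b_prbarr * 0.3  # about 0.036
```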
---

## Demeaning

* Subtract off the mean in each county from the observations in that county

```r
css2 = group_by(css, county)
cdata = mutate(css2,
    crmrte = crmrte - mean(crmrte),
    prbarr = prbarr - mean(prbarr))
```

Estimation using the demeaned variables:

```r
demeanreg = lm_robust(crmrte ~ prbarr + 0, data = cdata)
tidy(demeanreg) # pretty print regression results
```
| term   | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
|:-------|---------:|----------:|----------:|--------:|---------:|----------:|---:|:--------|
| prbarr |  -0.0284 |    0.0177 |     -1.61 |    0.12 |  -0.0646 |   0.00785 | 27 | crmrte  |
* Negative relationship

---

## Demeaning illustration

![Animation of a fixed effects panel data estimator: we remove *between group* variation and concentrate on *within group* variation only](lecture04_files/figure-html/anim-1.gif)

---

### Using a package

* Different packages are available for Fixed Effects estimation. Here we use `lm_robust`:

```r
fe_reg = lm_robust(crmrte ~ prbarr, data = css, fixed_effects = county)
tidy(fe_reg)
```
| term   | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
|:-------|---------:|----------:|----------:|--------:|---------:|----------:|---:|:--------|
| prbarr |  -0.0284 |    0.0202 |     -1.41 |   0.173 |  -0.0701 |    0.0134 | 23 | crmrte  |
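---

### Dummy and within slopes coincide

The dummy-variable and demeaned (within) estimates are not equal by accident. A minimal base R sketch with simulated data (not the `crime4` dataset) showing that the two slopes are identical:

```r
set.seed(1)
d <- data.frame(id = rep(1:4, each = 7), x = runif(28))
d$y <- 0.04 - 0.03 * d$x + 0.01 * d$id + rnorm(28, sd = 0.005)
# 1) dummy variables: one intercept per individual
b_dummy <- coef(lm(y ~ x + factor(id) + 0, d))["x"]
# 2) demeaning: subtract group means with ave(), then regress without intercept
dm <- transform(d, x = x - ave(x, id), y = y - ave(y, id))
b_within <- coef(lm(y ~ x + 0, dm))["x"]
all.equal(unname(b_dummy), unname(b_within))  # TRUE
```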
---

## Comparing results

* Same estimated coefficient on `prbarr` in all three methods
* `huxreg` with a title for each regression

```r
huxreg('Dummy' = dvreg, 'Demeaned' = demeanreg, 'FE' = fe_reg)
```
|                   | Dummy     | Demeaned | FE      |
|:------------------|:----------|:---------|:--------|
| prbarr            | -0.028 *  | -0.028   | -0.028  |
|                   | (0.014)   | (0.018)  | (0.020) |
| factor(county)1   | 0.045 *** |          |         |
|                   | (0.005)   |          |         |
| factor(county)3   | 0.020 *** |          |         |
|                   | (0.003)   |          |         |
| factor(county)23  | 0.036 *** |          |         |
|                   | (0.004)   |          |         |
| factor(county)145 | 0.038 *** |          |         |
|                   | (0.005)   |          |         |
| N                 | 28        | 28       | 28      |
| R2                | 0.991     | 0.159    | 0.893   |
| logLik            | 126.516   |          |         |
| AIC               | -241.032  |          |         |

*** p < 0.001; ** p < 0.01; * p < 0.05.
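---

### Robust standard errors by hand

The `lm_robust()` calls above report heteroskedasticity-robust standard errors. As a sketch of what happens underneath, the HC0 "sandwich" variance can be computed by hand in base R (simulated data; `estimatr` defaults to HC2, so its numbers differ slightly):

```r
set.seed(2)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = abs(x))  # error variance grows with |x|
m <- lm(y ~ x)
X <- model.matrix(m)
u <- resid(m)
XtXinv <- solve(crossprod(X))
# sandwich formula: (X'X)^-1 X' diag(u^2) X (X'X)^-1
V_hc0 <- XtXinv %*% crossprod(X * u) %*% XtXinv
se_robust    <- sqrt(diag(V_hc0))
se_classical <- sqrt(diag(vcov(m)))  # assumes constant error variance
```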
---
class: inverse, center, middle

# Heteroskedasticity

---

## What is Heteroskedasticity?

- Variability is often higher at higher values
    - Same percentage variability
- Does not affect the estimate
- Confidence intervals change
    - The estimated variance-covariance matrix is usually too low
- Often increased variability for `\(X_i\)` far from the mean `\(\bar X\)`
    - Variability at the extremes results in more uncertainty than variability at the mean

---

## Heteroskedasticity plots

- Same data in both examples
    - First and second half switch places
- Notice the larger uncertainty (gray area) in the second figure
- Observations with `\(X_{i}\)` far from the mean `\(\bar X\)` are more influential
    - Variability far from the mean increases uncertainty more

---

## Variability at the center

![](figures/hetcenter.png)

---

## Variability at the edges

![](figures/hetedges.png)

---

## Dealing with heteroskedasticity

Estimate with _robust_ standard errors

- Tends to give larger standard errors
    - Better to be cautious...
- Unfortunately `lm()` does not calculate robust SE
- Use `lm_robust()` in the `estimatr` package

---
class: inverse, center, middle

# Pipes

---

## _Pipes_ in the tidyverse

* Not really needed, but makes code simpler - used in documentation
* Often we want to take a dataset and perform several steps in order
* The pipe operator ` %>% ` facilitates this

```r
select(lengths, length, weight)
```

can be written as

```r
lengths %>% select(length, weight)
```

* A pipe means: put the left hand side as the first argument of the function on the right hand side
* With several steps, the code is much easier to read:

```r
lengths %>%
    filter(gender == "Female") %>%
    select(length, weight)
```

---

## Selecting, correlating and formatting with pipe

```r
len2 = select(lengths, length, weight)
len3 = correlate(len2)
kable(len3)
```

* Same code - hard to read:

```r
kable(correlate(select(lengths, length, weight)))
```

* Same code - with pipes: `%>%`

```r
lengths %>%
    select(length, weight) %>%
    correlate() %>%
    kable()
```
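---

## Aside: the base R pipe

Since R 4.1, base R also has a native pipe `|>`, which works the same way for simple first-argument piping. A small sketch with toy data (not the course's `lengths` dataset):

```r
d <- data.frame(length = c(10, 12, 14, 16), weight = c(1.1, 1.5, 2.0, 2.6))
# cor() of the two selected columns, written as a pipeline
d |> subset(select = c(length, weight)) |> cor()
```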