class: center, middle, inverse, title-slide

.title[
# Econometrics - Lecture 4
]
.subtitle[
## Ordinary Least Squares
]
.author[
### Jonas Björnerstedt
]
.date[
### 2024-11-11
]

---

## Lecture Content

1. OLS derivations
2. Dummy variables
3. Heteroscedasticity

---

## [Transform a random variable<sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics01-figs.Rmd#section-correlation)

- From a random variable `\(Y\)` we can create new random variables:
    * `\(2Y\)` stretches
    * `\(Y + 1\)` moves

---

## Properties of expectations

`$$E(aY) = aE(Y)$$`

- To show this for a discrete random variable, use the definition

`$$E(aY) = \sum_{i=1}^k (aY_i) p_i = a \sum_{i=1}^k Y_i p_i = aE(Y)$$`

- Similarly, one can show that

`$$E(X + Y) = E(X) + E(Y)$$`

---

## Properties of variance

`$$Var(aX) = a^2 Var(X)$$`

- We can show this using the definitions:

`$$Var(aX) = E[(aX-E(aX))^2] = E[(aX-aE(X))^2] = E[a^2(X-E(X))^2]$$`

- Thus

`$$Var(aX) = E[a^2(X-E(X))^2] = a^2E[(X-E(X))^2] = a^2 Var(X)$$`

---

## Variance of `\(X + Y\)`

- For independent `\(X\)` and `\(Y\)`:

`$$Var(X + Y) = Var(X) + Var(Y)$$`

- For simplicity, assume that `\(X\)` and `\(Y\)` have zero mean, i.e. `\(\mu_X = \mu_Y = 0\)`
    - The calculations become slightly messier otherwise

`$$Var(X + Y) = E[(X + Y - \mu_X - \mu_Y)^2] = E[(X + Y)^2]$$`

- We know that `\((X + Y)^2 = (X + Y)(X + Y) = X^2 + 2XY + Y^2\)`. Thus

`$$E[(X + Y)^2] = E[X^2 + 2XY + Y^2] = E[X^2] + 2E[XY] + E[Y^2]$$`

- If `\(X\)` and `\(Y\)` are independent, they are uncorrelated, so `\(E[XY] = 0\)`. Thus

`$$Var(X + Y) = E[X^2] + E[Y^2] = Var(X) + Var(Y)$$`

---

## Expected value and variance of mean

- What is the variance of the mean of two observations?

`$$\bar Y = \frac{Y_1+Y_2}{2}$$`

`$$E[\bar Y] = E\left[ \frac{Y_1+Y_2}{2}\right] = \frac{1}{2}E[Y_1+Y_2] = \frac{1}{2}(E[Y_1]+E[Y_2]) = E[Y]$$`

`$$Var\left(\frac{Y_1+Y_2}{2}\right) = \frac{1}{4} Var(Y_1+Y_2) = \frac{1}{4} \left( Var(Y_1)+Var(Y_2)\right)$$`

* Thus with independent sampling we have:

`$$Var(\bar Y) = Var\left(\frac{Y_1+Y_2}{2}\right) = \frac{1}{2} Var(Y_i)$$`

* With `\(n\)` observations we have:

`$$Var(\bar Y) = Var \left( \frac{Y_1+Y_2+ \ldots +Y_n}{n} \right) = \frac{1}{n} Var(Y_i)$$`
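A short simulation (a sketch, not part of the derivation; the sample size and distribution are made up for illustration) showing the `\(1/n\)` scaling:

```r
# Sketch: the variance of the sample mean is approximately Var(Y)/n
set.seed(1)
n <- 25
ybar <- replicate(10000, mean(rnorm(n, mean = 0, sd = 2)))
var(ybar)  # close to 4/25 = 0.16
```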
---

## Units and size

- [Download lengthdata dataset](https://rstudio.sh.se/ts/lengthdata.rds)
- Regression coefficients depend on the scale of variables

```r
len = readRDS("lengthdata.rds")
len$m_length = len$length/100   # Length in meters
len$g_weight = len$weight*1000  # Weight in grams
```

| length| weight|gender | female| m_length| g_weight|
|------:|------:|:------|------:|--------:|--------:|
|    186|     82|Male   |      0|     1.86|    82000|
|    173|     54|Female |      1|     1.73|    54000|
|    168|     52|Female |      1|     1.68|    52000|
|    175|     79|Male   |      0|     1.75|    79000|
|    192|     85|Male   |      0|     1.92|    85000|
|    174|     75|Female |      1|     1.74|    75000|

---

## Rescaling regression

```r
r = lm(weight ~ length, data = len)
```

|term        | estimate| std.error| statistic| p.value|
|:-----------|--------:|---------:|---------:|-------:|
|(Intercept) |   -97.78|     19.37|     -5.05|       0|
|length      |     0.97|      0.11|      8.56|       0|

```r
r = lm(g_weight ~ m_length, data = len)
```

|term        |  estimate| std.error| statistic| p.value|
|:-----------|---------:|---------:|---------:|-------:|
|(Intercept) | -97775.98|  19368.85|     -5.05|       0|
|m_length    |  96506.20|  11269.98|      8.56|       0|

* The estimated increase in weight (in grams) for a one-unit change in length (in meters) is __100 000 times larger__

---

## Summary statistics

* Summary statistics are essential in order to understand data
* The `vtable` package has a summary statistics function

```r
library(vtable)
st(len)
```

|Variable   |N  |Mean |Std. Dev. |Min |Pctl. 25 |Pctl. 75 |Max |
|:----------|:--|:----|:---------|:---|:--------|:--------|:---|
|length     |43 |171  |11        |150 |163      |178      |192 |
|weight     |43 |68   |14        |47  |55       |78       |95  |
|gender     |43 |     |          |    |         |         |    |
|... Female |21 |49%  |          |    |         |         |    |
|... Male   |22 |51%  |          |    |         |         |    |

---

## Standardizing variables

- Subtract the mean to get a zero-mean variable (demeaning)
- Let `\(W = Y - \mu_Y\)`

`$$E[W] = E[Y - \mu_Y] = E[Y] - E[\mu_Y] = \mu_Y - \mu_Y = 0$$`

- Divide by the standard deviation to get a variable with unit variance
- Let `\(U = \frac{Y}{\sigma_Y}\)`. As `\(Var[aY] = a^2 Var[Y]\)` for any number `\(a\)`, we have:

`$$Var[U] = Var\left[ \frac{Y}{\sigma_Y}\right] = Var\left[ \frac{1}{\sigma_Y}Y\right] = \frac{1}{\sigma^2_Y} Var[Y] = 1$$`

- Thus for any random variable `\(Y\)`, the _standardized_ variable

`$$\frac{Y-\mu_Y}{\sigma_Y}$$`

has mean 0 and variance 1.
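Base R's `scale()` performs the same standardization; a minimal check on the length data (a sketch, assuming `len` from above):

```r
# Sketch: standardize by hand and with scale(); both give mean 0 and sd 1
z <- (len$length - mean(len$length)) / sd(len$length)
all.equal(z, as.numeric(scale(len$length)))  # TRUE
c(mean = mean(z), sd = sd(z))                # approximately 0, exactly 1
```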
---

## Standardized regression

```r
len$s_length = (len$length - mean(len$length))/sd(len$length)
len$s_weight = (len$weight - mean(len$weight))/sd(len$weight)
sr = lm(s_weight ~ s_length, data = len)
library(broom)
tidy(sr)
```

|term        |  estimate| std.error| statistic|  p.value|
|:-----------|---------:|---------:|---------:|--------:|
|(Intercept) | -3.57e-16|    0.0924| -3.86e-15|        1|
|s_length    |     0.801|    0.0935|      8.56| 1.13e-10|
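Two features of this output are worth flagging: the intercept is zero up to floating-point rounding (both variables are demeaned), and the slope equals the sample correlation, which the next slides develop. A one-line check (sketch):

```r
# Sketch: the standardized slope equals the sample correlation
cor(len$weight, len$length)  # 0.801, matching the s_length estimate
```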
---

## Standardizing and the correlation coefficient

- How variability in `\(X\)` relates to variability in `\(Y\)`
|term        |  estimate| std.error| statistic|  p.value|
|:-----------|---------:|---------:|---------:|--------:|
|(Intercept) | -3.57e-16|    0.0924| -3.86e-15|        1|
|s_length    |     0.801|    0.0935|      8.56| 1.13e-10|
```r
library(corrr)
library(dplyr)  # for select()
df = select(len, length, weight)
correlate(df)
```
|term   | length| weight|
|:------|------:|------:|
|length |       |  0.801|
|weight |  0.801|       |
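The same number can be computed from the definition of the correlation coefficient used on the slides below; a sketch:

```r
# Sketch: rho_XY = cov(X, Y) / (sd(X) * sd(Y))
with(len, cov(length, weight) / (sd(length) * sd(weight)))  # 0.801
```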
---
class: inverse, center, middle

# OLS

---

## The least squares assumptions

- Assumption 1: The conditional distribution of `\(u\)` has mean zero
    - The relationship between `\(X_i\)` and `\(u_i\)` has to be specified
- Assumption 2: Observations are IID
    - `\(X_i\)` and `\(Y_i\)` are independent of `\(X_j\)` and `\(Y_j\)`
- Assumption 3: Large outliers are unlikely
    - Ensures that the variance can be estimated
- Use of the OLS assumptions
    - Estimation of coefficients and their variance
    - Unbiased and consistent estimates

---

## Properties of estimators

1. _Unbiased_
    - Corresponds to the population parameter in expectation
    - Both `\(\bar Y\)` and `\(Y_1\)` are unbiased estimates of `\(\mu_Y\)`
2. _Consistent_
    - Converges in probability to `\(\mu_Y\)` as the sample size increases to infinity
    - `\(\bar Y\)` is consistent but `\(Y_1\)` is not
        - Law of large numbers
    - Note that a consistent estimator can be biased
3. _Efficient_
    - Uncertainty (variance) of the estimate is lower than for alternatives
    - `\(\bar Y\)` is efficient but `\(Y_1\)` is not

---

## Best Linear Unbiased Estimator (BLUE)

- Unbiased: `\(E\left(\hat\beta_{0}\right)=\beta_{0}\)` and `\(E\left(\hat\beta_{1}\right)=\beta_{1}\)`
- Consistency (no asymptotic bias)
    - Convergence (in probability) to the true value as sample size `\(N\rightarrow\infty\)`
- Efficiency / best linear estimator
    - No other unbiased linear estimator has lower variance
- Unbiased variance estimate: `\(E\left(s^{2}\right)=\sigma^{2}\)`
    - Unbiased standard errors

---

## Zero conditional mean

- To ensure that we have a linear model, we assume that `\(E(u_{i}|X_{i}) = 0\)`
    - The expected value of `\(u_i\)` does not depend on `\(X_i\)`
    - But other aspects of the distribution of `\(u\)` could depend on `\(X\)`
- Horizontal variation `\(X_i\)` and vertical variation `\(u_i\)` are not too related

---

## OLS calculations

- Population equation: `\(Y_i = \beta_0 + \beta_1 X_i + u_i\)`
- We have `\(E(Y_i) = E(\beta_0 +\beta_1 X_i + u_i) = \beta_0 +\beta_1 E(X_i) + E(u_i)\)`
- We can assume that `\(E(u_i) = 0\)`, as `\(\beta_0\)` captures the constant part of the conditional expectation
- We thus have `\(E(Y_i) = \beta_0 +\beta_1 E(X_i)\)`
- OLS assumption: `\(X_i\)` is uncorrelated with `\(u_i\)`, i.e. `\(E(X_iu_i) = 0\)`
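A small simulation (a sketch; the coefficients 2 and 3 and the sample size are made up for illustration) showing that OLS recovers the population parameters when these assumptions hold:

```r
# Sketch: Y = 2 + 3X + u with u independent of X
set.seed(2)
x <- rnorm(500)
y <- 2 + 3 * x + rnorm(500)
coef(lm(y ~ x))     # estimates close to (2, 3)
cov(x, y) / var(x)  # the slope as cov/var, derived on the next slides
```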
---

## Simplest regression

- We now consider variables with `\(E(X_i) = 0\)` and `\(E(Y_i) = 0\)`
    - This simplifies the calculations
- As `\(E(Y_i) = \beta_0 +\beta_1 E(X_i)\)`, we get `\(0 = \beta_0 +\beta_1 \cdot 0 = \beta_0\)`
- In this case the population equation has only one parameter, `\(\beta_1\)`:

`$$Y_i = \beta_1X_i + u_i$$`

---

## Estimator

- Assume that `\(X_i\)` and `\(u_i\)` are uncorrelated. Then

`$$E[X_i u_i] = 0 = E[X_i(Y_i - \beta_1 X_i)] = E[X_iY_i] - \beta_1 E[X_i X_i]$$`

- Solving for `\(\beta_1\)`:

`$$\beta_1 = \frac{E[X_i Y_i]}{E[X_i X_i]} = \frac{\sigma_{XY}}{\sigma^2_{X}}$$`

- In the sample we can derive the corresponding equation, with zero correlation in the sample:

`$$\hat\beta_1 = \frac{s_{XY}}{s^2_{X}} = \frac{\sum_i^n X_i Y_i }{\sum_i^n (X_i)^2}$$`

* It can be shown that `\(\hat\beta_1 \rightarrow \beta_1\)` as the sample size increases

---

## Regression and correlation

- The correlation coefficient is just a rescaling of `\(\beta_1\)`
- Consider `\(V = \frac{Y}{\sigma_{Y}}\)` and `\(Z = \frac{X}{\sigma_{X}}\)`
    - A simple rescaling of `\(X\)` and `\(Y\)` to variables with unit variance

Rewrite `\(Y = \beta_0 + \beta_1 X + u\)` as:

`$$\frac{Y}{\sigma_{Y}} = \frac{\beta_0}{\sigma_{Y}} + \beta_1 \frac{\sigma_{X}}{\sigma_{Y}} \frac{X}{\sigma_{X}} + \frac{u}{\sigma_{Y}}$$`

In terms of `\(V\)` and `\(Z\)` this means that:

`$$V = \frac{\beta_0}{\sigma_{Y}} + \beta_1 \frac{\sigma_{X}}{\sigma_{Y}} Z + \frac{u}{\sigma_{Y}}$$`

Renaming the coefficients and error term, the relationship is:

`$$V = \alpha_0 + \alpha_1 Z + v$$`

---

## Correlation as a linear relationship

We have

`$$V = \alpha_0 + \alpha_1 Z + v$$`

with `\(\alpha_1 = \beta_1 \frac{\sigma_{X}}{\sigma_{Y}}\)`. But `\(\beta_1 = \frac{\sigma_{XY}}{\sigma^2_{X}}\)` and thus

`$$\alpha_1 = \beta_1 \frac{\sigma_{X}}{\sigma_{Y}} = \frac{\sigma_{XY}}{\sigma^2_{X}}\frac{\sigma_{X}}{\sigma_{Y}} = \frac{\sigma_{XY}}{\sigma_{X}\sigma_{Y}} = \rho_{XY}$$`

Thus in terms of the rescaled variables, the population relationship is:

`$$V = \alpha_0 + \rho_{XY} Z + v$$`

- The coefficient `\(\beta_1\)` is a rescaling of the correlation coefficient `\(\rho_{XY}\)`
- `\(\beta_1 = 0\)` if and only if `\(X\)` and `\(Y\)` are uncorrelated, i.e. `\(\rho_{XY} = 0\)`

---

## Dummy variable regression

- [Employment data](https://rstudio.sh.se/ts/employment_06_07.rds): Line from the mean for `\(female = 0\)` to the mean for `\(female = 1\)`
- The slope corresponds to the lower average wage for women

![](statistics04_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

---

## Dummy variable regression

- Same plot using _jitter_ (moving points slightly horizontally)
- Data points overlap less when displayed slightly moved

![](statistics04_files/figure-html/unnamed-chunk-13-1.png)<!-- -->

---

## Empirical exercise

- Open [employment_06_07.rds](https://rstudio.sh.se/ts/employment_06_07.rds)
- Compare the average earnings for the whole sample and for women

```r
library(tidyverse)  # for read_rds() and filter()
employment = read_rds("../data/employment_06_07.rds")
lm(earnwke ~ 1, data = employment)
summary(employment$earnwke)
empf = filter(employment, female == 1)
lm(earnwke ~ female, data = employment)
lm(earnwke ~ female, data = empf)
summary(empf$earnwke)
```

---
class: inverse, center, middle

# Heteroscedasticity

---

## Heteroscedasticity

- Variability is often higher at higher values
    - Same percentage variability
- Does not affect the estimate
- Confidence intervals change
    - The usual variance estimate is usually too low
- Often increased variability for `\(X_i\)` far from the mean `\(\bar X\)`
    - Variability at the extremes results in more uncertainty than variability at the mean

---

## Heteroscedasticity plots

- Same data in both examples
    - First and second half switch places
- Notice the larger uncertainty (gray area) in the second figure
- Observations with `\(X_{i}\)` far from the mean `\(\bar X\)` are more influential
    - Variability far from the mean increases uncertainty more

---

## Variability at the center

![](figures/hetcenter.png)

---

## Variability at the edges

![](figures/hetedges.png)

---

## Breusch-Pagan specification test

- Estimate the model
    - The squared residuals `\(\hat u_{i}^2\)` are generated
- Regress to see if the errors are linearly dependent on the variables

`$$\hat u_{i}^{2}=\alpha_{0}+\alpha_{1}X_{i}+v_{i}$$`

- Test whether `\(\hat \alpha_{1}=0\)`
    - The null hypothesis is homoscedasticity

---

## Dealing with heteroscedasticity

Estimate with _robust_ standard errors

- Tends to give larger standard errors
    - Better to be cautious...
- Unfortunately robust standard errors are _not the default in R_
- The `estimatr` package can be used
- We will also use the `huxtable` package for regression tables
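As an illustration (a sketch on the length data; the slides do not show this call), `estimatr` reports robust standard errors by default:

```r
# Sketch: robust standard errors via estimatr (HC2 by default)
library(estimatr)
lm_robust(weight ~ length, data = len)
```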
---

## Next lecture

Chapter 6 - multivariate regression