class: center, middle, inverse, title-slide

# Econometrics - Lecture 6
## Hypothesis Tests and Confidence Intervals
### Jonas Björnerstedt
### 2022-03-02

---

## Lecture Content

Uncertainty

- A little repetition of Probability and statistics
- Chapter 5 - Hypothesis tests
- Chapter 7 - Tests

---

class: inverse, center, middle

# Tests
## A little repetition

---

## Coin toss example

- A _fair coin_ has equal probability of heads and tails
- How do we determine if a coin is fair?

--

1. Toss the coin many times
1. Assign 1 if the outcome is heads and -1 if tails
1. Take the average
1. Check if the average is close to zero

--

- But how can we be sure it is fair if, for example, the average is 0.1 instead of zero?
- How sure are we?

---

## Expected value and average

- What is the relationship between the expected value in the population and sample means taken from the population?
- Can we say how close the average height of 100 randomly sampled people is to the mean value in the population (the expected value)?
- Two approaches:
    - Calculate the exact distribution of the mean
    - The Central Limit Theorem (CLT)

---

## Design of a test

- Select a _null hypothesis_ `\(H_0\)`
- Create a statistic with a known distribution under `\(H_0\)`
- Calculate the value of this statistic for the data
- Determine the probability of the actual outcome (or a more extreme one) occurring, assuming `\(H_0\)`
- Reject `\(H_0\)` if this probability is below a chosen critical level

---

## Exact distribution

- If we know the distribution of the random variable `\(X\)`, we can calculate the distribution of `\(\bar X\)`
    - Binomial distribution of a fair coin
    - Normal distribution
- Given this calculated distribution, we can see how likely the mean in our sample is
- Reject `\(H_0\)` if it is _very_ unlikely (given some choice of threshold)

---

## [Coin toss - distribution](http://rstudio.sh.se/content/statistics02-figs#section-distribution) <sup> 🔗 </sup>

.pull-left[

Outcome | Probability |
---------: | -------------: |
1 | 1/2
-1 | 1/2

* Expected value:
`$$E(Y) = -1*0.5 + 1*0.5 = 0$$`
* Variance:
`$$Var(Y) = (-1)^2*0.5 + (1)^2*0.5 = 1$$`

]
.pull-right[
![](statistics06_files/figure-html/unnamed-chunk-1-1.png)<!-- -->
]

---

## [Distribution of mean `\(\bar Y\)` - 2 obs](http://rstudio.sh.se/content/statistics02-figs#section-distribution) <sup> 🔗 </sup>

.pull-left[

First | Second | Mean |
---------: | -------------: | ------:|
1 | 1 | 1
1 | -1 | 0
-1 | 1 | 0
-1 | -1 | -1

* Expected value:
`$$E(\bar Y) = -1*\frac{1}{4} + 1*\frac{1}{4} + 0*\frac{1}{2} = 0$$`

]
.pull-right[
![](statistics06_files/figure-html/unnamed-chunk-2-1.png)<!-- -->
]

* Variance:
`$$Var(\bar Y) = (-1-0)^2 \frac{1}{4} + (1-0)^2*\frac{1}{4} + (0-0)^2\frac{1}{2} = \frac{1}{2}$$`

---

## Central Limit Theorem (CLT)

- Properties of the mean `\(\bar Y\)` of random samples of `\(Y\)`
- Central Limit Theorem (CLT)
    - Take the mean of independent random draws of `\(Y\)`
    - Draws can have (almost) any distribution

1. A large sample implies that the mean is almost normally distributed
    - The normal distribution is characterized by its mean and variance
2. The mean will be close to the expected value
3. The mean will vary due to sampling
    - _Standard error_ of `\(\bar Y\)`
    - Variance proportional to the variance of `\(Y\)`
    - Variance inversely proportional to the sample size `\(n\)`

`$$Var(\bar Y) = \frac{Var(Y)}{n}$$`
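---

## Simulating the coin toss average (sketch)

A minimal simulation sketch of the results above, using only base R (the variable names are illustrative, not from the lecture code): draw many samples of fair coin tosses coded 1 and -1, and compare the spread of the sample means with `\(SD(Y)/\sqrt{n} = 1/\sqrt{n}\)`.

```r
set.seed(1)        # any seed, just for reproducibility
n_tosses  = 100    # tosses per experiment
n_samples = 10000  # number of repeated experiments

# each column is one experiment of n_tosses fair coin tosses coded +1/-1
tosses = matrix(sample(c(-1, 1), n_tosses * n_samples, replace = TRUE),
                nrow = n_tosses)
means = colMeans(tosses)  # one sample mean per experiment

mean(means)  # close to E(Y) = 0
sd(means)    # close to 1/sqrt(100) = 0.1
hist(means)  # roughly bell-shaped, as the CLT predicts
```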
---

## [Coin toss average - find distribution <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics02-figs.Rmd#section-Find-distribution)

![](statistics06_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

---

## Central limit theorem

- Assume that `\(Y\)` has mean `\(\mu_Y\)` and variance `\(\sigma^2_Y\)`
- Then for large `\(n\)` the sample average `\(\bar Y\)` has almost a normal distribution

`$$N(\mu_Y, \sigma_Y^2/n)$$`

- As the sample variance `\(s_Y^2\)` converges to the population variance `\(\sigma_Y^2\)`, we know approximately the distribution of the sample mean for large `\(n\)`.

---

## Standard deviation and standard error

- We can calculate the _standard error_
    - the square root of `\(Var(\bar Y)\)`

`$$SE(\bar Y) = \sqrt{Var(\bar Y)} = \sqrt{\frac{Var(Y)}{n} } = \frac{SD(Y)}{\sqrt{n} }$$`

* As `\(\bar Y\)` has an almost normal distribution
* our estimate will vary due to sampling
* but 95% of the draws will be within `\(2SE(\bar Y)\)` of the expected value
* Confidence interval!

---

## Fair coin plot

_p-value_ given by green areas

![](statistics06_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

---

## Is a coin fair?

- Assign 1 to heads and -1 to tails
- Under the null hypothesis `\(H_0\)` the coin is fair: `\(E(Y)=\mu_Y=0\)` (blue line)
- The variance is known: `\(Var(Y)=1\)`
- Under the alternative hypothesis `\(H_1\)` we have `\(E(Y)<0\)` or `\(E(Y)>0\)`
- What is the probability that a sample average `\(\bar Y\)` is as far away from zero as `\(\hat Y\)` or further?
    - Here the estimate is `\(\hat Y=0.4\)` (red line)
    - If `\(\hat Y=0.4\)` is a possible outcome, then so is `\(\hat Y=-0.4\)`
- This probability is called the _p-value_ (green areas)

---

class: inverse, center, middle

# Hypothesis tests in regressions
## With a single regressor

---

## Test scores regression

```r
caschool = read_rds("files/caschool.rds")
regschool = lm_robust(testscr ~ str, data = caschool)
summary(regschool)
```

```
Call:
lm_robust(formula = testscr ~ str, data = caschool)

Standard error type:  HC2 

Coefficients:
            Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper  DF
(Intercept)   698.93    10.3998  67.206 3.503e-226  678.491  719.375 418
str            -2.28     0.5213  -4.373  1.546e-05   -3.304   -1.255 418

Multiple R-squared:  0.05124 ,  Adjusted R-squared:  0.04897 
F-statistic: 19.13 on 1 and 418 DF,  p-value: 1.546e-05
```

---

## The variance covariance matrix

- The standard error of str is: 0.5212891
- The variance covariance matrix is given by the `vcov` function

```r
vcov(regschool)
```

```
            (Intercept)        str
(Intercept)  108.155628 -5.4010252
str           -5.401025  0.2717424
```

- `\(\sqrt{0.2717424}=\hat\sigma_{\beta_1}=\)` 0.5212891 is the std. err. for **str**
- the off-diagonal terms are the covariances between `\(\hat\beta_0\)` and `\(\hat\beta_1\)`
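---

## Standard errors from `vcov` (sketch)

A small sketch of the connection above, assuming the `regschool` object estimated on the previous slides: the standard errors are the square roots of the diagonal elements of the variance covariance matrix.

```r
# variances of the coefficient estimates (diagonal of the vcov matrix)
diag(vcov(regschool))

# square roots give the standard errors reported by summary()
sqrt(diag(vcov(regschool)))   # for str: sqrt(0.2717424) = 0.5212891
```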
---

## Standard error of `\(\hat\beta_1\)`

- Uncertainty in the estimate `\(\hat\beta_1\)` depends on three things
    1. Variance `\(\sigma_u^2\)` of the unexplained variation `\(u\)`
    2. Variance `\(\sigma_X^2\)` of the regressor `\(X\)`
    3. Number of observations `\(n\)`
- Under homoscedasticity the relationship is simple

`$$\sigma_{\hat\beta_1}^2 = \frac{1}{n}\frac{\sigma_u^2}{\sigma_X^2}$$`

- More complicated under heteroscedasticity
    - the variation in `\(u\)` depends on `\(X\)`
- Skip Key concept 4.4 p 177 and eq (5.4) p 194

---

## p-value and t-test

- Individual significance
- Standard errors `\(\hat\sigma_{\beta_1}\)`
- Confidence intervals: `\(\hat\beta_1 \pm t_{\alpha/2}\hat \sigma_{\beta_1}\)`
    - `\(t_{\alpha/2}\)` is the critical value of the normal distribution at significance level `\(\alpha\)`
    - Usually 95% CI are calculated with `\(t_{\alpha/2}\approx 1.96\)`
- p-values
    - significantly different from zero at what level?
    - the `\(\alpha\)` at which `\(\hat\beta_1\)` is just significant
- t-statistic
    - `\(\frac{\hat\beta_1 - 0}{\hat \sigma_{\beta_1}}\)`, compared with the critical value to determine significance

_A numerical sketch of these formulas follows after the confidence set slides below._

---

class: inverse, center, middle

# Hypothesis tests and confidence intervals
## With multiple regressors

---

## Correlation and OLS estimators

- Let `\(Y = \beta_0 + \beta_1 X + \beta_2 W + u\)`
- A higher slope means a lower intercept
- If `\(X\)` and `\(W\)` are _positively_ correlated then
    - `\(\hat\beta_1\)` and `\(\hat\beta_2\)` are negatively correlated
    - if a high `\(X\)` explains more of `\(Y\)`, then the high value of `\(W\)` explains less
- If `\(X\)` and `\(W\)` are _negatively_ correlated then
    - `\(\hat\beta_1\)` and `\(\hat\beta_2\)` are positively correlated

---

## Standard errors for the OLS estimators

.pull-left[
- `\(\hat\beta_0\)` and `\(\hat\beta_1\)` have a jointly normal distribution
- the joint variance is specified by the _variance covariance matrix_
]
.pull-right[
![](figures/pdf.png)
]

---

## Hypothesis tests for a single coefficient

- Test the null hypothesis `\(H_0\)` that the coefficient of str is 2.1
- Compute

`$$t^{act}= \frac{\hat\beta_j-2.1}{SE(\hat\beta_j)}$$`

- Reject at the 5 percent level if
    - `\(|t^{act}| > 1.96\)`
- Alternatively calculate the p-value

`$$2\Phi(-|t^{act}|)$$`

---

## Confidence interval

.pull-left[
- 90% confidence interval for `\(\hat\beta_0\)`
- `\(\hat\beta_1\)` can take any value
]
.pull-right[
![](figures/pdf_interval.png)
]

---

## Confidence sets

.pull-left[
- Most likely combinations of slopes and intercepts
]
.pull-right[
![](figures/pdf_region.png)
]

---

## Geometry of confidence sets

.pull-left[
- If the (standardized) estimates were independent, the set would be a circle
- The set consists of the parameter pairs within a certain distance (a circle)
]
.pull-right[
![](figures/pdf_iid.png)
]
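---

## Computing the t-test by hand (sketch)

A minimal numerical sketch of the formulas above, plugging in the estimate and standard error for `str` copied from the earlier `lm_robust` output (the normal approximation is used here, so the p-value differs slightly from the exact t-based value in that output):

```r
beta_hat = -2.28    # estimated coefficient on str
se_hat   = 0.5213   # its standard error

# test H0: beta = 0
t_act   = (beta_hat - 0) / se_hat
p_value = 2 * pnorm(-abs(t_act))   # two-sided p-value, normal approximation

# approximate 95% confidence interval
ci = beta_hat + c(-1.96, 1.96) * se_hat

t_act; p_value; ci
```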
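---

## Plotting a confidence set (sketch)

One way to draw a joint confidence set like the one above is `confidenceEllipse()` from the **car** package. This is a sketch under the assumption that an ordinary `lm` fit of the test score regression is used, since the function expects an `lm` object rather than an `lm_robust` one:

```r
library(car)

# refit the test score regression with lm() so confidenceEllipse() applies
reg_lm = lm(testscr ~ str, data = caschool)

# 95% confidence ellipse for the intercept and the slope on str
confidenceEllipse(reg_lm, levels = 0.95)
```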
---

## Test of joint hypothesis on two independent variables

- Testing hypotheses on two or more coefficients
- What is the distribution of the distance?
    - For independent standard normal t-statistics the squared distance is `\(t_1^2+t_2^2\)`
    - It has a `\(\chi^2_2\)` distribution (dividing by 2 gives an `\(F_{2,\infty}\)` distribution)
- The F-statistic
    - Set the critical level to exclude the 5% of outcomes with the greatest distance
    - The F-value is a "distance measure"
    - We can reject `\(H_0\)` if the F-value is large

---

## F-test

- Test if the squared sum is close to 0
    - after rescaling by the variance
- The estimates `\(\hat\beta_1\)` and `\(\hat\beta_2\)` are approximately normal
- The squared sum of the standardized estimates has a `\(\chi^2\)` distribution with 2 degrees of freedom
- Rescaling with the estimated variance gives an F distribution
    - rescaling means dividing by a `\(\chi^2\)`-distributed variable with `\(n - k\)` degrees of freedom

---

## Testing several parameters

- Test `\(q\)` restrictions
    - For example `\(H_0: \beta_1 = \beta_2=0\)`
- We can construct a test statistic with
    - an F distribution with `\(q\)` and `\(n-k\)` degrees of freedom
- It measures a 'distance' from `\(H_0\)`
    - a higher F-value makes `\(H_0\)` less likely
    - the p-value measures the probability of a test value at least this large, given `\(H_0\)`

---

## F-test of regression

- Test of the hypothesis that **all** estimated slopes are zero
- A high F-value is desirable in this case, as we want to reject the null hypothesis
    - Corresponds to a low p-value
- Joint significance is different from individual significance
- The F-test is closely related to the `\(R^2\)` statistic
    - a high `\(R^2\)` corresponds to a high F-value (see the sketch at the end of the lecture)

---

## Joint significance of correlated regressors

- Population equation: `\(E(Y_i|X_i,W_i) = 1 + X_i + W_i\)`

```r
library(AER)
obs = 100
X = rnorm(obs)
W = X + 0.1 * rnorm(obs)
Y = 1 + X + W + rnorm(obs)
model = lm(Y ~ X + W)
model
```

```
## 
## Call:
## lm(formula = Y ~ X + W)
## 
## Coefficients:
## (Intercept)            X            W  
##      0.9163       1.7462       0.2991  
```

---

## Joint test

```r
# summary(model)
linearHypothesis(model, c("X=0", "W=0"))
```

```
## Linear hypothesis test
## 
## Hypothesis:
## X = 0
## W = 0
## 
## Model 1: restricted model
## Model 2: Y ~ X + W
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     99 446.49                                  
## 2     97  96.92  2    349.57 174.92 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
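---

## F-statistic by hand (sketch)

A hedged check of the F-value above, computed from the restricted and unrestricted residual sums of squares reported by `linearHypothesis()` (the numbers are copied from the output and depend on the random draw, since no seed was set):

```r
rss_r = 446.49   # restricted model (X and W dropped), 99 df
rss_u = 96.92    # unrestricted model Y ~ X + W, 97 df
q     = 2        # number of restrictions
df_u  = 97       # residual degrees of freedom, unrestricted model

F_val = ((rss_r - rss_u) / q) / (rss_u / df_u)
F_val                              # approximately 174.9, as in the output

# the same test written in terms of R^2 of the unrestricted regression
R2 = 1 - rss_u / rss_r
(R2 / q) / ((1 - R2) / df_u)       # identical F-value
```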