class: center, middle, inverse, title-slide

# Econometrics - Lecture 6
## Hypothesis Tests and Confidence Intervals
### Jonas Björnerstedt
### 2022-03-02

---

## Lecture Content

Uncertainty

- A little repetition of Probability and statistics
- Chapter 5 - Hypothesis tests
- Chapter 7 - Tests

---

class: inverse, center, middle

# Tests
## A little repetition

---

## Coin toss example

- A _fair coin_ has equal probability of heads and tails
- How do we determine if a coin is fair?

--

1. Toss the coin many times
1. Assign 1 if the outcome is heads and -1 if tails
1. Take the average
1. Check if the average is close to zero

--

- But how can we be sure it is fair if, for example, the average is 0.1 instead of zero?
- How sure are we?

---

## Expected value and average

- What is the relationship between the expected value in the population and sample means taken from the population?
- Can we say how close the average height of 100 randomly sampled people is to the mean value in the population (the expected value)?
- Two approaches:
    - Calculate the exact distribution of the mean
    - The Central Limit Theorem (CLT)

---

## Design of a test

- Select a _null hypothesis_ `\(H_0\)`
- Create a statistic with a known distribution under `\(H_0\)`
- Calculate the value of this statistic for the data
- Determine the probability of the actual outcome (or a more extreme one) occurring, assuming `\(H_0\)`
- Reject `\(H_0\)` if this probability is below a chosen critical level

---

## Exact distribution

- If we know the distribution of the random variable `\(X\)`, we can calculate the distribution of `\(\bar X\)`
    - Binomial distribution of a fair coin
    - Normal distribution
- Given this calculated distribution, we can see how likely the mean in our sample is
- Reject `\(H_0\)` if it is _very_ unlikely (given some choice of threshold)

---

## [Coin toss - distribution](http://rstudio.sh.se/content/statistics02-figs#section-distribution) <sup> 🔗 </sup>

.pull-left[

Outcome | Probability |
---------: | -------------: |
1 | 1/2
-1 | 1/2

* Expected value:
`$$E(Y) = -1*0.5 + 1*0.5 = 0$$`
* Variance:
`$$Var(Y) = (-1)^2*0.5 + (1)^2*0.5 = 1$$`

]
.pull-right[
![](statistics06_files/figure-html/unnamed-chunk-1-1.png)<!-- -->
]

---

## [Distribution of mean `\(\bar Y\)` - 2 obs](http://rstudio.sh.se/content/statistics02-figs#section-distribution) <sup> 🔗 </sup>

.pull-left[

First | Second | Mean |
---------: | -------------: | ------:|
1 | 1 | 1
1 | -1 | 0
-1 | 1 | 0
-1 | -1 | -1

* Expected value:
`$$E(\bar Y) = -1*\frac{1}{4} + 1*\frac{1}{4} + 0*\frac{1}{2} = 0$$`

]
.pull-right[
![](statistics06_files/figure-html/unnamed-chunk-2-1.png)<!-- -->
]

* Variance:
`$$Var(\bar Y) = (-1-0)^2 \frac{1}{4} + (1-0)^2*\frac{1}{4} + (0-0)^2\frac{1}{2} = \frac{1}{2}$$`

---

## Central Limit Theorem (CLT)

- Properties of the mean `\(\bar Y\)` of random samples of `\(Y\)`
- Central Limit Theorem (CLT)
    - Take the mean of independent random draws of `\(Y\)`
    - Draws can have (almost) any distribution

1. A large sample implies that the mean is almost normally distributed
    - The normal distribution is characterized by its mean and variance
2. The mean will be close to the expected value
3. The mean will vary due to sampling
    - _Standard error_ of `\(\bar Y\)`
    - Variance proportional to the variance of `\(Y\)`
    - Variance inversely proportional to the sample size `\(n\)`

`$$Var(\bar Y) = \frac{Var(Y)}{n}$$`
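---

## Simulating the coin toss average (sketch)

A minimal simulation sketch of the results above, using only base R (the variable names are illustrative, not from the lecture code): draw many samples of fair coin tosses coded 1 and -1, and compare the spread of the sample means with `\(SD(Y)/\sqrt{n} = 1/\sqrt{n}\)`.

```r
set.seed(1)        # any seed, just for reproducibility
n_tosses  = 100    # tosses per experiment
n_samples = 10000  # number of repeated experiments

# each column is one experiment of n_tosses fair coin tosses coded +1/-1
tosses = matrix(sample(c(-1, 1), n_tosses * n_samples, replace = TRUE),
                nrow = n_tosses)
means = colMeans(tosses)  # one sample mean per experiment

mean(means)  # close to E(Y) = 0
sd(means)    # close to 1/sqrt(100) = 0.1
hist(means)  # roughly bell-shaped, as the CLT predicts
```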
---

## [Coin toss average - find distribution <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics02-figs.Rmd#section-Find-distribution)

![](statistics06_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

---

## Central limit theorem

- Assume that `\(Y\)` has mean `\(\mu_Y\)` and variance `\(\sigma^2_Y\)`
- Then for large `\(n\)` the sample average `\(\bar Y\)` has almost a normal distribution

`$$N(\mu_Y, \sigma_Y^2/n)$$`

- As the sample variance `\(s_Y^2\)` converges to the population variance `\(\sigma_Y^2\)`, we know approximately the distribution of the sample mean for large `\(n\)`.

---

## Standard deviation and standard error

- We can calculate the _standard error_
    - the square root of `\(Var(\bar Y)\)`

`$$SE(\bar Y) = \sqrt{Var(\bar Y)} = \sqrt{\frac{Var(Y)}{n} } = \frac{SD(Y)}{\sqrt{n} }$$`

* As `\(\bar Y\)` has an almost normal distribution
* our estimate will vary due to sampling
* but 95% of the draws will be within `\(2SE(\bar Y)\)` of the expected value
* Confidence interval!

---

## Fair coin plot

_p-value_ given by green areas

![](statistics06_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

---

## Is a coin fair?

- Assign 1 to heads and -1 to tails
- Under the null hypothesis `\(H_0\)` the coin is fair: `\(E(Y)=\mu_Y=0\)` (blue line)
- The variance is known: `\(Var(Y)=1\)`
- Under the alternative hypothesis `\(H_1\)` we have `\(E(Y)<0\)` or `\(E(Y)>0\)`
- What is the probability that a sample average `\(\bar Y\)` is as far away from zero as `\(\hat Y\)` or further?
    - Here the estimate is `\(\hat Y=0.4\)` (red line)
    - If `\(\hat Y=0.4\)` is a possible outcome, then so is `\(\hat Y=-0.4\)`
- This probability is called the _p-value_ (green areas)

---

class: inverse, center, middle

# Hypothesis tests in regressions
## With a single regressor

---

## Test scores regression

```r
caschool = read_rds("files/caschool.rds")
regschool = lm_robust(testscr ~ str, data = caschool)
summary(regschool)
```

```
Call:
lm_robust(formula = testscr ~ str, data = caschool)

Standard error type:  HC2 

Coefficients:
            Estimate Std. Error t value   Pr(>|t|) CI Lower CI Upper  DF
(Intercept)   698.93    10.3998  67.206 3.503e-226  678.491  719.375 418
str            -2.28     0.5213  -4.373  1.546e-05   -3.304   -1.255 418

Multiple R-squared:  0.05124 ,  Adjusted R-squared:  0.04897 
F-statistic: 19.13 on 1 and 418 DF,  p-value: 1.546e-05
```

---

## The variance covariance matrix

- The standard error of str is: 0.5212891
- The variance covariance matrix is given by the `vcov` function

```r
vcov(regschool)
```

```
            (Intercept)        str
(Intercept)  108.155628 -5.4010252
str           -5.401025  0.2717424
```

- `\(\sqrt{0.2717424}=\hat\sigma_{\beta_1}=\)` 0.5212891 is the std. err. for **str**
- the off-diagonal terms are the covariances between `\(\hat\beta_0\)` and `\(\hat\beta_1\)`
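---

## Standard errors from `vcov` (sketch)

A small sketch of the connection above, assuming the `regschool` object estimated on the previous slides: the standard errors are the square roots of the diagonal elements of the variance covariance matrix.

```r
# variances of the coefficient estimates (diagonal of the vcov matrix)
diag(vcov(regschool))

# square roots give the standard errors reported by summary()
sqrt(diag(vcov(regschool)))   # for str: sqrt(0.2717424) = 0.5212891
```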
---

## Standard error of `\(\hat\beta_1\)`

- Uncertainty in the estimate `\(\hat\beta_1\)` depends on three things
    1. Variance `\(\sigma_u^2\)` of the unexplained variation `\(u\)`
    2. Variance `\(\sigma_X^2\)` of the regressor `\(X\)`
    3. Number of observations `\(n\)`
- Under homoscedasticity the relationship is simple

`$$\sigma_{\hat\beta_1}^2 = \frac{1}{n}\frac{\sigma_u^2}{\sigma_X^2}$$`

- More complicated under heteroscedasticity
    - the variation in `\(u\)` depends on `\(X\)`
- Skip Key concept 4.4 p 177 and eq (5.4) p 194

---

## p-value and t-test

- Individual significance
- Standard errors `\(\hat\sigma_{\beta_1}\)`
- Confidence intervals: `\(\hat\beta_1 \pm t_{\alpha/2}\hat \sigma_{\beta_1}\)`
    - `\(t_{\alpha/2}\)` is the critical value of the normal distribution at significance level `\(\alpha\)`
    - Usually 95% CI are calculated with `\(t_{\alpha/2}\approx 1.96\)`
- p-values
    - significantly different from zero at what level?
    - the `\(\alpha\)` at which `\(\hat\beta_1\)` is just significant
- t-statistic
    - `\(\frac{\hat\beta_1 - 0}{\hat \sigma_{\beta_1}}\)`, compared with the critical value to determine significance

_A numerical sketch of these formulas follows after the confidence set slides below._

---

class: inverse, center, middle

# Hypothesis tests and confidence intervals
## With multiple regressors

---

## Correlation and OLS estimators

- Let `\(Y = \beta_0 + \beta_1 X + \beta_2 W + u\)`
- A higher slope means a lower intercept
- If `\(X\)` and `\(W\)` are _positively_ correlated then
    - `\(\hat\beta_1\)` and `\(\hat\beta_2\)` are negatively correlated
    - if a high `\(X\)` explains more of `\(Y\)`, then the high value of `\(W\)` explains less
- If `\(X\)` and `\(W\)` are _negatively_ correlated then
    - `\(\hat\beta_1\)` and `\(\hat\beta_2\)` are positively correlated

---

## Standard errors for the OLS estimators

.pull-left[
- `\(\hat\beta_0\)` and `\(\hat\beta_1\)` have a jointly normal distribution
- the joint variance is specified by the _variance covariance matrix_
]
.pull-right[
![](figures/pdf.png)
]

---

## Hypothesis tests for a single coefficient

- Test the null hypothesis `\(H_0\)` that the coefficient of str is 2.1
- Compute

`$$t^{act}= \frac{\hat\beta_j-2.1}{SE(\hat\beta_j)}$$`

- Reject at the 5 percent level if
    - `\(|t^{act}| > 1.96\)`
- Alternatively calculate the p-value

`$$2\Phi(-|t^{act}|)$$`

---

## Confidence interval

.pull-left[
- 90% confidence interval for `\(\hat\beta_0\)`
- `\(\hat\beta_1\)` can take any value
]
.pull-right[
![](figures/pdf_interval.png)
]

---

## Confidence sets

.pull-left[
- Most likely combinations of slopes and intercepts
]
.pull-right[
![](figures/pdf_region.png)
]

---

## Geometry of confidence sets

.pull-left[
- If the (standardized) estimates were independent, the set would be a circle
- The set consists of the parameter pairs within a certain distance (a circle)
]
.pull-right[
![](figures/pdf_iid.png)
]
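---

## Computing the t-test by hand (sketch)

A minimal numerical sketch of the formulas above, plugging in the estimate and standard error for `str` copied from the earlier `lm_robust` output (the normal approximation is used here, so the p-value differs slightly from the exact t-based value in that output):

```r
beta_hat = -2.28    # estimated coefficient on str
se_hat   = 0.5213   # its standard error

# test H0: beta = 0
t_act   = (beta_hat - 0) / se_hat
p_value = 2 * pnorm(-abs(t_act))   # two-sided p-value, normal approximation

# approximate 95% confidence interval
ci = beta_hat + c(-1.96, 1.96) * se_hat

t_act; p_value; ci
```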
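---

## Plotting a confidence set (sketch)

One way to draw a joint confidence set like the one above is `confidenceEllipse()` from the **car** package. This is a sketch under the assumption that an ordinary `lm` fit of the test score regression is used, since the function expects an `lm` object rather than an `lm_robust` one:

```r
library(car)

# refit the test score regression with lm() so confidenceEllipse() applies
reg_lm = lm(testscr ~ str, data = caschool)

# 95% confidence ellipse for the intercept and the slope on str
confidenceEllipse(reg_lm, levels = 0.95)
```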
---

## Test of joint hypothesis on two independent variables

- Testing hypotheses on two or more coefficients
- What is the distribution of the distance?
    - For independent standard normal t-statistics the squared distance is `\(t_1^2+t_2^2\)`
    - It has a `\(\chi^2_2\)` distribution (dividing by 2 gives an `\(F_{2,\infty}\)` distribution)
- The F-statistic
    - Set the critical level to exclude the 5% of outcomes with the greatest distance
    - The F-value is a "distance measure"
    - We can reject `\(H_0\)` if the F-value is large

---

## F-test

- Test if the squared sum is close to 0
    - after rescaling by the variance
- The estimates `\(\hat\beta_1\)` and `\(\hat\beta_2\)` are approximately normal
- The squared sum of the standardized estimates has a `\(\chi^2\)` distribution with 2 degrees of freedom
- Rescaling with the estimated variance gives an F distribution
    - rescaling means dividing by a `\(\chi^2\)`-distributed variable with `\(n - k\)` degrees of freedom

---

## Testing several parameters

- Test `\(q\)` restrictions
    - For example `\(H_0: \beta_1 = \beta_2=0\)`
- We can construct a test statistic with
    - an F distribution with `\(q\)` and `\(n-k\)` degrees of freedom
- It measures a 'distance' from `\(H_0\)`
    - a higher F-value makes `\(H_0\)` less likely
    - the p-value measures the probability of a test value at least this large, given `\(H_0\)`

---

## F-test of regression

- Test of the hypothesis that **all** estimated slopes are zero
- A high F-value is desirable in this case, as we want to reject the null hypothesis
    - Corresponds to a low p-value
- Joint significance is different from individual significance
- The F-test is closely related to the `\(R^2\)` statistic
    - a high `\(R^2\)` corresponds to a high F-value (see the sketch at the end of the lecture)

---

## Joint significance of correlated regressors

- Population equation: `\(E(Y_i|X_i,W_i) = 1 + X_i + W_i\)`

```r
library(AER)
obs = 100
X = rnorm(obs)
W = X + 0.1 * rnorm(obs)
Y = 1 + X + W + rnorm(obs)
model = lm(Y ~ X + W)
model
```

```
## 
## Call:
## lm(formula = Y ~ X + W)
## 
## Coefficients:
## (Intercept)            X            W  
##      0.9163       1.7462       0.2991  
```

---

## Joint test

```r
# summary(model)
linearHypothesis(model, c("X=0", "W=0"))
```

```
## Linear hypothesis test
## 
## Hypothesis:
## X = 0
## W = 0
## 
## Model 1: restricted model
## Model 2: Y ~ X + W
## 
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     99 446.49                                  
## 2     97  96.92  2    349.57 174.92 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
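---

## F-statistic by hand (sketch)

A hedged check of the F-value above, computed from the restricted and unrestricted residual sums of squares reported by `linearHypothesis()` (the numbers are copied from the output and depend on the random draw, since no seed was set):

```r
rss_r = 446.49   # restricted model (X and W dropped), 99 df
rss_u = 96.92    # unrestricted model Y ~ X + W, 97 df
q     = 2        # number of restrictions
df_u  = 97       # residual degrees of freedom, unrestricted model

F_val = ((rss_r - rss_u) / q) / (rss_u / df_u)
F_val                              # approximately 174.9, as in the output

# the same test written in terms of R^2 of the unrestricted regression
R2 = 1 - rss_u / rss_r
(R2 / q) / ((1 - R2) / df_u)       # identical F-value
```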