class: center, middle, inverse, title-slide

# Econometrics - Lecture 5
## Linear Regression with Multiple Regressors
### Jonas Björnerstedt
### 2022-03-03

---

## Lecture Content

Chapter 6. Linear Regression with Multiple Regressors

---

## Multivariate regression

- Linear Regression with Multiple Regressors
- Allow `\(k\)` regressors `\(X_1, X_2, \ldots, X_k\)`
- Estimate `\(k+1\)` parameters `\(\beta_0, \beta_1, \beta_2, \ldots, \beta_k\)`
- Two subscripts are now needed for the sample: `\(X_{1i}\)`, `\(X_{2i}\)`, ..., `\(X_{ki}\)`
- In this lecture we focus on two regressors `\(X\)` and `\(W\)`
    - See the textbook for `\(k\)` regressors

---

## Why control variables?

If we just want to see how `\(Y_i\)` depends on `\(X_i\)`, why add the variable `\(W_i\)` to the regression?

1. Correlation between `\(X_i\)` and omitted variables
    - reduce _omitted variable bias_
2. Reduce uncertainty
    - Reduce unexplained variation `\(u_i\)`
    - Tighter confidence intervals on parameters of interest

Variables `\(X_2, \ldots, X_k\)` are _control variables_

---

## Specification

- Linear model
`$$E(Y_i|X_i, W_i) = \beta_{0}+\beta_{X}X_i + \beta_{W}W_i$$`
- Sample data
`$$Y_i=\beta_{0}+\beta_{X}X_{i}+ \beta_{W}W_{i}+u_i$$`
- Estimation gives `\(\widehat\beta_{0},\widehat\beta_{X},\widehat\beta_{W}\)` and `\(\widehat u_i\)`
`$$Y_i=\widehat\beta_{0}+\widehat\beta_{X}X_{i}+ \widehat\beta_{W}W_{i}+\widehat u_i$$`

---

## Linear relationship with 2 vars

.pull-left[
![](figures/linrel2.png)
]

.pull-right[
- With two independent vars, the following relationship is a plane
`$$Y_i = -0.1 X_i + 0.5 W_i$$`
- For every `\(X_i\)` and `\(W_i\)` there is a unique `\(Y_i\)`
- `\(\beta_0\)` is where the plane crosses the `\(Y\)` axis
- `\(\beta_{X}\)` and `\(\beta_{W}\)` are the slopes in the `\(X\)` and `\(W\)` directions
]

---

## The OLS estimator in multiple regression

- The OLS estimator:
`$$Y_i = \widehat\beta_{0} + \widehat\beta_{X} X_{i} + \widehat\beta_{W} W_{i} + \widehat u_i$$`
- Find `\(\widehat\beta_0,\widehat\beta_{X},\widehat\beta_{W}\)` that minimize
`$$SSR = \sum_{i=1}^n \widehat u_i^2$$`

---

## Regression residual

- The residual variance is estimated by
`$$\widehat\sigma_u^{2}=\frac{1}{n-3}\sum_{i=1}^{n} \widehat u_{i}^{2}=\frac{SSR}{n-3}$$`
- Degrees of freedom: `\(n-3\)`
    - We have estimated 3 parameters `\(\hat\beta_0, \hat\beta_{X}, \hat\beta_{W}\)`

---

## Degrees of freedom

- Estimating the average `\(\bar Y\)` with one observation `\((n=1)\)` gives zero variance
    - The average `\(\bar Y = Y_1\)`
- Estimating `\(E(Y|X) = \beta_0 + \beta_{X} X\)` with two observations also gives a perfect fit
- With `\(X, W\)` and 3 parameters `\(\beta_0\)`, `\(\beta_{X}\)` and `\(\beta_{W}\)`, three observations `\((n = 3)\)` are fit perfectly
- The degrees of freedom adjustment compensates for this

---

## Adjusted `\(R^{2}\)`

- Adding regressors always increases `\(R^{2}\)`
- A better measure is the adjusted `\(\bar{R}^{2}\)`
`$$\bar{R}^{2}=1-\frac{SSR/\left(n-k\right)}{TSS/\left(n-1\right)}$$`
- Adjusted by degrees of freedom `\(n-k\)`, where `\(k\)` is the number of estimated parameters `\(\beta\)` (including the intercept)
- Adding regressors can decrease `\(\bar{R}^{2}\)` if `\(SSR\)` decreases only a little
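---

## Multiple regression in R (sketch)

The estimator and the degrees-of-freedom adjustment above can be illustrated directly with `lm()`. A minimal sketch on simulated data (the variable names and coefficient values here are only assumptions for illustration):


```r
set.seed(1)
n <- 100
X <- rnorm(n)
W <- rnorm(n)
Y <- 1 + 2 * X - 0.5 * W + rnorm(n)   # illustrative coefficients

fit <- lm(Y ~ X + W)          # minimizes SSR over beta_0, beta_X, beta_W
summary(fit)$coefficients     # estimates and standard errors
summary(fit)$sigma            # residual std. error, sqrt(SSR / (n - 3))
summary(fit)$adj.r.squared    # adjusted R-squared
```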
src="figures/linrel.png" alt="Drawing" style="width: 400px;"/> ] .pull-right[ - With two independent vars, the following relationship is a plane `$$Y=-0.1 X + 0.5 W$$` - For every `\(X\)` and `\(W\)` there is a unique `\(Y\)` - `\(\beta_0\)` is intercept and `\(\beta_{X},\beta_{W}\)` are the slopes ] --- ## Standard error with 2 vars - Small variance in `\(u_i\)` and large in `\(X_i\)` and `\(W_i\)` ![](figures/nomulticollin2.png) --- ## [Multicollinearity <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics05-figs.Rmd#section-multicollinearity) - Many planes fit data almost as well ![](figures/highmulticollin2.png) --- ## Near perfect Multicollinearity - Detection - low individual significance - despite high joint significance - More data needed! - Does not cause any problems except for identifying single parameters - Do not ’solve’ by dropping a parameter if it should be included - Omitted variable bias - next section - Conceptual problem in model? - Are the variables capturing the same effect? - How do we interpret the coefficients? - Not a technical problem --- ## Perfect Multicollinearity - _Dummy variable trap_ - Regress on constant variable - Impossible to separate the effect of intercept from variable - Stata automatically drops a variable - Intercept is calculating by adding variable `\(X_0 = 1\)` - Makes algebra for solving simpler - Also facilitates understanding perfect multicollinearity - A column cannot be just a linear combination of other columns --- class: inverse, center, middle # Omitted variable bias --- ## Omitted variables - Assume `$$E(Y_i|X_i,W_i) = \beta_{0}+\beta_{X}X_i+ \beta_{W}W_i$$` - What happens if only one variable is included in the regression?: `$$Y_i = \alpha_{0} + \alpha_{X}X_{i} + v_i$$` - Estimating the conditional expectation `$$E(Y_i|X_i) = \alpha_{0}+\alpha_{X}X_i$$` - `\(u_i\)` can be thought of as the sum of all variables affecting `\(Y_i\)` - The effect of variation in `\(W_i\)` will be in the error term `\(v_i\)` - Note that if `\(W\)` does not vary, it will be incorporated in `\(\alpha_0\)` - Thus both the intercept and the error term contain the effect of _everything else_ on `\(Y\)` --- ## Conditions for omitted variable bias If `\(W_i\)` is not included, we get _omitted variable bias_ if 2. `\(W_i\)` is a determinant of `\(Y_i\)` 1. `\(X_i\)` and `\(W_i\)` are correlated - Equation (6.1) on page 231 is not very intuitive --- ## Omitted variable bias - If `\(X_i\)` and `\(W_i\)` are correlated, then `\(\omega_{X} \neq 0\)` in `$$W_i = \omega_{0} + \omega_{X} X_i + w_i$$` - Substitute `\(W_i\)` in the regression `$$Y_i = \beta_{0} + \beta_{X}X_{i} + \beta_{W}\overset{W_i}{\overbrace{\big(\omega_{0} +\omega_{X} X_i + w_i\big)}}+ u_i$$` - Rearrange `$$Y_i = (\beta_{0} + \beta_{W}\omega_{0}) + (\beta_{X} +\beta_{W}\omega_{X}) X_i + (\beta_{W}w_i+ u_i)$$` `$$Y_i = \alpha_{0} + \alpha_{X}X_{i} + v_i$$` --- ## Omitted variable bias - Estimating the relationship between only `\(X\)` and `\(Y\)` does not estimate `\(\beta_{X}\)`! `$$Y_i = (\beta_{0} + \beta_{W}\omega_{0}) + (\beta_{X} +\beta_{W}\omega_{X}) X_i + (\beta_{W}w_i+ u_i)$$` We get a bias `$$\alpha_X = \beta_{X} + \beta_{W}\omega_{X} \neq \beta_{X}$$` - The bias of this estimate depends on the sign and magnitudes of `\(\omega_{X}\)` and `\(\beta_{W}\)`. 
---

## Application to the test scores data

- Omitted variable


```r
library(estimatr)
library(tidyverse)
caschool = read_rds("caschool.rds")

# Regression with both regressors
rboth = lm( testscr ~ str + el_pct, data = caschool)

# Regression omitting el_pct
rstr = lm( testscr ~ str, data = caschool )

# How does el_pct depend on str?
re = lm( el_pct ~ str, data = caschool )
```
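As a quick check, the `str` coefficients of the three fits can, for example, be collected in one vector:


```r
c(both      = coef(rboth)["str"],   # testscr on str and el_pct
  only_str  = coef(rstr)["str"],    # testscr on str only
  el_on_str = coef(re)["str"])      # el_pct on str
```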
---

## Test scores - Omitted variable equation

- Regressions of `testscr` with and without `el_pct`, and the regression of `el_pct` on `str`

Regressor | testscr (rboth) | testscr (rstr) | el_pct (re)
--------- | --------------- | -------------- | -----------
str       | -1.101          | -2.280         | 1.814
el_pct    | -0.650          |                |
- Omitted variable equation `\(E(\hat\beta_{X}) = \beta_{X} + \omega_{X}\beta_{W}\)`


```r
rboth$coefficients["str"] + re$coefficients["str"]*rboth$coefficients["el_pct"]
```

```
      str 
-2.279808 
```

---

## [Tradeoff bias and precision <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics05-figs.Rmd#section-omitted)

```
===================================================
X                        0.994***         0.995*** 
                        (0.027)          (0.027)   
W                        0.020                     
                        (0.028)                    
Constant                 0.973***         0.973*** 
                        (0.025)          (0.025)   
---------------------------------------------------
Observations               50               50     
R2                        0.967            0.967   
Adjusted R2               0.966            0.966   
Residual Std. Error  0.177 (df = 47)  0.176 (df = 48)
===================================================
Note:                *p<0.1; **p<0.05; ***p<0.01
```

---

## Omitted variable - Correlation

Inclusion/omission of `\(W\)` depends on correlation and on whether it is in the population equation.

Correlation `\(X_i\)` and `\(W_i\)` | `\(\beta_W\)` | Included | Omitted
------------ | ---| ----------- | --------------
Uncorrelated | `\(\beta_W = 0\)` | |
Correlated | `\(\beta_W = 0\)` | More uncertain |
Uncorrelated | `\(\beta_W \neq 0\)` | | More uncertain
Correlated | `\(\beta_W \neq 0\)` | | __Biased and Inconsistent__
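---

## Including an irrelevant control - sketch

A sketch of the "Correlated, `\(\beta_W = 0\)`" row of the table: including `\(W\)` leaves `\(\hat\beta_X\)` unbiased but makes it less precise (the data-generating process below is only an assumption for illustration).


```r
set.seed(3)
n <- 50
X <- rnorm(n)
W <- 0.9 * X + rnorm(n)   # W correlated with X, but beta_W = 0
Y <- 1 + X + rnorm(n)     # W does not enter the population equation

summary(lm(Y ~ X + W))$coefficients["X", ]   # unbiased; standard error typically larger
summary(lm(Y ~ X))$coefficients["X", ]       # unbiased; standard error smaller
```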