class: center, middle, inverse, title-slide

# Microeconometrics - Lecture 6
## Endogeneity
### Jonas Björnerstedt
### 2022-03-09

---

## Lecture Content

- Chapter 12 - Instrumental Variable Regression
  - Continuation next lecture
- Chapter 7 - Tests

---
class: inverse, center, middle

# Endogeneity

---

## Endogeneity bias

- Assume that the regressor `\(X_i\)` and `\(u_i\)` are positively correlated
`$$E\left(u_i\left|X_i\right.\right) \neq 0$$`
- Least squares estimation is biased upward!

![](figures/ovarbias.png)

---

## Bias with negative correlation

- Assume that the regressor `\(X_i\)` and `\(u_i\)` are negatively correlated
- Least squares estimation is biased downward!

![](figures/ovarbiasneg.png)

---

## Types of endogeneity

- Correlation between `\(X_i\)` and `\(u_i\)` can arise in different ways
1. Omitted variable
  - Price is correlated with unobserved variables that also affect demand
2. Simultaneity
  - Specify the other equation
3. Measurement error
  - Bias towards zero
- Endogeneity of one variable affects the estimates of all!

---

## Omitted variable bias

- Assume that
  - the population model is: `\(Y_i = \beta_{0} + \beta_{1}X_i + \beta_{2}W_i + u_i\)`
  - `\(\mathrm{Cov}\left(X_i,W_i\right) > 0\)`
- Estimating the model
`$$Y_i = \gamma_{0} + \gamma_{1}X_i + v_i$$`
results in a biased estimate, as `\(X_i\)` is correlated with `\(v_i\)`:
`$$\mathrm{Cov}\left(X_i,v_i\right) = \mathrm{Cov}\left(X_i,\beta_{2} W_i + u_i\right) = \beta_{2}\mathrm{Cov}\left(X_i, W_i\right)$$`
- The sign of the bias `\(\mathrm{E}\left(\hat{\beta_{1}}\right) - \beta_{1}\)` depends on the sign of `\(\beta_{2}\)`
- Omission of uncorrelated variables does not cause bias!
  - But the standard errors of the estimates will increase

---

## Illustration: Omitted variable bias

- Correlation between `\(X\)` and `\(W\)`
- The parameter for `\(X\)` includes the effect due to `\(W\)`

![](figures/omitted_variable_bias.png)

---

## Measurement error

- Assume that `\(X_i = X^{*}_i + W_i\)` is observed but `\(Y_i\)` is a function of `\(X^{*}_i\)`
  - Measurement error `\(W_i\)`
- True relationship: `\(Y_i = \beta_{0} + \beta_{1}X^{*}_i + u_i\)`
- Rewriting:
`$$Y_i = \beta_{0} + \beta_{1}(X_i - W_i) + u_i = \beta_{0} + \beta_{1} X_i + (-\beta_{1} W_i + u_i)$$`
- Estimated equation: `\(Y_i = \beta_{0} + \beta_{1} X_i + v_i\)` with `\(v_i = -\beta_{1} W_i + u_i\)`
- With `\(W_i\)` mean zero and uncorrelated with `\(X^{*}_i\)`:
`$$\mathrm{Cov}(X_i,W_i) = \mathrm{Cov}(X^{*}_i + W_i,\, W_i) = E[W_i^2] > 0$$`
- Bias towards zero
  - `\(W_i\)` is positively correlated with `\(X_i\)` but has a negative effect `\(-\beta_{1}\)` on `\(Y_i\)`
- With large measurement error, noise will dominate the effect of `\(X_i\)` on `\(Y_i\)`
- Measurement error spreads the data horizontally

---

## [Measurement error plot](http://rstudio.sh.se/content/me06-figs/)

![](me06_files/figure-html/unnamed-chunk-2-1.png)<!-- -->
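---

## Measurement error: a simulation sketch

- A minimal simulation sketch of attenuation bias, assuming classical measurement error: the error `w` below is independent of the true regressor and of the outcome

```r
obs = 1000
x_true = rnorm(obs)
w = rnorm(obs)                 # measurement error, independent of x_true
y = 1 + x_true + rnorm(obs)    # true slope is 1
x_obs = x_true + w             # observed, error-ridden regressor
lm(y ~ x_obs)                  # estimated slope is biased towards zero
```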
---

## Simultaneity and bias

- Demand
`$$q = \beta_{0} + \beta_{1}p + u$$`
- Supply
`$$p = \pi_{0} + \pi_{1}q + e$$`
- If `\(\beta_{1} < 0\)` and `\(\pi_{1} > 0\)`, simultaneous equations
- There is a unique equilibrium `\(\bar p\)` and `\(\bar q\)`
- All variation from equilibrium is unexplained!
- Cannot estimate the relationships

---

## Supply and demand shifts

- Solution: other parameters that explain demand or supply
- Demand and cost shifters
  - A cost shifter can identify demand
  - A demand shifter can identify costs

---

## Selection bias

- Existence of an observation depends on the outcome `\(Y_i\)`
  - Self selection
- Examples
  - Labor force participation

---

## Solution 1: Proxy variables

- Non-central variables
  - ability in explaining wages
  - hard to measure ability (measurement error)
  - important but perhaps not central
- Use another variable as a *proxy* instead: IQ test scores
  - highly correlated with ability
  - smaller measurement errors
- The estimate measures the association between test scores and wages
  - Not the effect of different abilities
  - But it reduces the problem of omitted variable bias
- A proxy variable should
1. be *redundant* - not directly explain the outcome
2. be correlated with the endogenous variable

---

## Regression with correlated regressor

- Population equation: `\(E(wage|educ, ability) = educ + ability\)`
- Let education depend on ability: `\(educ = iq + v\)`
- Let the IQ score be a part of ability: `\(ability = 0.6\, iq + e\)`

```r
obs = 1000
iq = 2 + rnorm(obs)
ability = 0.6*iq + 0.3 * rnorm(obs)
educ = iq + 0.5 * rnorm(obs)
wage = educ + ability + rnorm(obs)
lm( wage ~ educ )
```

```
## 
## Call:
## lm(formula = wage ~ educ)
## 
## Coefficients:
## (Intercept)         educ  
##      0.1713       1.5138
```

---

## Regression with correlated regressor

- IQ does not explain wages
  - the employer does not know the IQ score
- IQ is a _proxy variable_ (or _control variable_)
  - the IQ coefficient has no interpretation
  - it helps to account for the correlation between educ and ability

---

## Proxy variable estimation

```r
m1 = lm( wage ~ educ)
m2 = lm( wage ~ educ + iq)
m3 = lm( wage ~ educ + ability)
m4 = lm( wage ~ educ + ability + iq)
huxreg(m1, m2, m3, m4)
```
|             | (1)        | (2)        | (3)        | (4)        |
|-------------|------------|------------|------------|------------|
| (Intercept) | 0.171 *    | -0.093     | -0.067     | -0.059     |
|             | (0.070)    | (0.073)    | (0.066)    | (0.069)    |
| educ        | 1.514 ***  | 0.960 ***  | 0.974 ***  | 0.991 ***  |
|             | (0.031)    | (0.067)    | (0.046)    | (0.063)    |
| iq          |            | 0.689 ***  |            | -0.037     |
|             |            | (0.075)    |            | (0.096)    |
| ability     |            |            | 1.105 ***  | 1.131 ***  |
|             |            |            | (0.074)    | (0.101)    |
| N           | 1000       | 1000       | 1000       | 1000       |
| R2          | 0.711      | 0.734      | 0.764      | 0.764      |
| logLik      | -1482.327  | -1441.830  | -1382.230  | -1382.156  |
| AIC         | 2970.653   | 2891.660   | 2772.460   | 2774.312   |

*** p < 0.001; ** p < 0.01; * p < 0.05.
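- A quick check (sketch, using the simulated data above): the upward bias in column (1) reflects the strong correlation between `educ` and `ability`

```r
cor(educ, ability)   # strongly positive by construction in the simulation above
```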
---

## Definition of omitted variable bias

If `\(W_i\)` is not included, we get _omitted variable bias_ if

1. `\(X_i\)` and `\(W_i\)` are correlated
2. `\(W_i\)` is a determinant of `\(Y_i\)`

- Equation (6.1) on page 231 is not very intuitive

---

## A formula for omitted variable bias

- If `\(X\)` and `\(W\)` are correlated, then `\(\delta_1 \neq 0\)` in
`$$W_i = \delta_0 + \delta_1 X_i + v_i$$`

Omitting `\(W_i\)` in the regression
`$$Y_i = \beta_{0} + \beta_{1}X_{i} + u_i$$`
- gives a biased estimate
`$$E(\hat\beta_1) = \beta_1 + \beta_2 \delta_1$$`

---

## Omitted variable bias in education

- Let:
`$$wage_i = \beta_0 + \beta_1 educ_i + \beta_2 ability_i + u_i$$`
- If ability is correlated with educ, we have
`$$ability_i = \delta_0 + \delta_1 educ_i + v_i$$`
- and thus
`$$E(\hat\beta_1) = \beta_1 + \beta_2 \delta_1$$`

---

## Omitted variable experiment

```r
obs = 1000
educ = 2 + rnorm(obs)
ability = 1 + 0.5 * educ + rnorm(obs)
wage = educ + ability + rnorm(obs)
lm( wage ~ educ)
```

```
## 
## Call:
## lm(formula = wage ~ educ)
## 
## Coefficients:
## (Intercept)         educ  
##       1.083        1.484
```

---

## Good and bad proxy variables

- If
`$$ability_i = \pi_0 + \pi_1 iq_i + v_i$$`
then
`$$wage_i = \beta_0 + \beta_1 educ_i + \beta_2 (\pi_0 + \pi_1 iq_i + v_i) + u_i$$`
- Rewriting:
`$$wage_i = (\beta_0 + \beta_2 \pi_0) + \beta_1 educ_i + \beta_2 \pi_1 iq_i + (\beta_2 v_i + u_i)$$`
- If `\(educ\)` is correlated with `\(iq\)`, but not with other aspects of ability `\(v\)`, then it is uncorrelated with the error term `\(\beta_2 v + u\)`
- Is this assumption valid?

---
class: inverse, center, middle

# Hypothesis tests and confidence intervals

---

## Correlation and OLS estimators

- Let `\(Y = \beta_0 + \beta_1 X + \beta_2 W + u\)`
- A higher slope means a lower intercept
- If `\(X\)` and `\(W\)` are _positively_ correlated then
  - `\(\hat\beta_1\)` and `\(\hat\beta_2\)` are negatively correlated
  - if a high `\(X\)` explains more of `\(Y\)`, then the high value of `\(W\)` explains less
- If `\(X\)` and `\(W\)` are _negatively_ correlated then
  - `\(\hat\beta_1\)` and `\(\hat\beta_2\)` are positively correlated

---

## Standard errors for the OLS estimators

.pull-left[
- Jointly normal distribution with means `\(\hat\beta_0\)`, `\(\hat\beta_1\)`
- Joint variance specified by the _variance covariance matrix_
]
.pull-right[
![](figures/pdf.png)
]

---

## Hypothesis tests for a single coefficient

- Test the null hypothesis `\(H_0\)` that the coefficient on `str` (the student-teacher ratio) is 2.1
- Compute
`$$t^{act}= \frac{\hat\beta_j-2.1}{SE(\hat\beta_j)}$$`
- Reject at the 5 percent level if
  - `\(|t^{act}| > 1.96\)`
- Alternatively, calculate the p-value
`$$2\Phi(-|t^{act}|)$$`

---

## Confidence interval

.pull-left[
- 90% confidence interval for `\(\hat\beta_0\)`
- `\(\hat\beta_1\)` can take any value
]
.pull-right[
![](figures/pdf_interval.png)
]

---

## Confidence sets

.pull-left[
- Most likely combinations of slopes and intercepts
]
.pull-right[
![](figures/pdf_region.png)
]

---

## Geometry of confidence sets

.pull-left[
- If the parameters were independent, the set would be a circle
- The set is the parameter pairs within a certain distance (circle)
]
.pull-right[
![](figures/pdf_iid.png)
]
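---

## Single coefficient tests in R

- Before turning to joint tests: a minimal sketch of the t-test and confidence interval for a single coefficient, reusing the simulated wage data from the omitted variable experiment

```r
m = lm(wage ~ educ + ability)
b_hat  = coef(summary(m))["educ", "Estimate"]
se_hat = coef(summary(m))["educ", "Std. Error"]
t_act = (b_hat - 1) / se_hat       # test H0: the coefficient on educ equals 1
2 * pnorm(-abs(t_act))             # p-value
confint(m, "educ", level = 0.90)   # 90% confidence interval
```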
---

## Test of joint hypothesis on two independent variables

- Testing hypotheses on two or more coefficients
- What is the distribution of the distance?
- For standard normal variables the distance is given by `\(t_1^2+t_2^2\)`
  - has a `\(\chi^2_2\)` (or `\(F_{2,\infty}\)`) distribution
  - this is the basis of the F-statistic
- Set the critical level to exclude the 5% of outcomes with the greatest distance
- The F-value is a "distance measure"
- We can reject `\(H_0\)` if the F-value is large

---

## F-test

- Test if the squared sum is close to 0
  - after rescaling by the variance
- The estimates `\(\hat\beta_1\)` and `\(\hat\beta_2\)` are approximately normal
- The squared sum of the standardized estimates has a `\(\chi^2\)` distribution with 2 degrees of freedom
- Rescaling with the estimated variance gives the F distribution
  - rescaling implies dividing by a `\(\chi^2\)`-distributed variable with `\(n - k\)` degrees of freedom

---

## Testing several parameters

- Test `\(q\)` restrictions
  - for example `\(H_0: \beta_1 = \beta_2=0\)`
- Can construct a test statistic with
  - an F distribution with `\(q\)` and `\(n-k\)` degrees of freedom
- Measures a 'distance' from `\(H_0\)`
  - a higher F-value makes `\(H_0\)` less likely
- The p-value measures the probability of the test value given `\(H_0\)`

---

## F-test of regression

- Test of the hypothesis that **all** estimated slopes are zero
- A high F-value is desirable in this case, as we want to reject the null hypothesis
  - corresponds to a low p-value
- Joint significance is different from individual significance
- The regression F-statistic is closely related to `\(R^2\)`
  - a high `\(R^2\)` corresponds to a high F-value (and a low p-value)

---

## Joint significance of correlated regressor

- Population equation: `\(E(Y_i|X_i,W_i) = 1 + X_i + W_i\)`

```r
library(AER)
obs = 100
X = rnorm(obs)
W = X + 0.1 * rnorm(obs)
Y = 1 + X + W + rnorm(obs)
model = lm( Y ~ X + W)
model
```

```
## 
## Call:
## lm(formula = Y ~ X + W)
## 
## Coefficients:
## (Intercept)            X            W  
##       1.123        2.920       -1.017
```

---

## Joint test

* Can specify a more complex test with `linearHypothesis()`

```r
linearHypothesis(model, c("X=0", "W=0"))
```
| Res.Df | RSS  | Df | Sum of Sq | F   | Pr(>F)   |
|-------:|-----:|---:|----------:|----:|---------:|
| 99     | 457  |    |           |     |          |
| 97     | 94.7 | 2  | 363       | 186 | 6.92e-34 |
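* The same joint test can be computed by comparing the restricted and unrestricted models - a minimal sketch, assuming the simulated data above:

```r
restricted = lm(Y ~ 1)      # model with both slopes restricted to zero
anova(restricted, model)    # gives the same F-test of the two restrictions
```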
---

## Regression statistics

* The regression summary includes a test that all slope coefficients are zero

```r
summary(model)
```

```
## 
## Call:
## lm(formula = Y ~ X + W)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3209 -0.6792  0.0605  0.5844  2.6535 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.12288    0.09921  11.319  < 2e-16 ***
## X            2.91952    0.91304   3.198  0.00187 ** 
## W           -1.01723    0.89701  -1.134  0.25958    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9883 on 97 degrees of freedom
## Multiple R-squared:  0.7928, Adjusted R-squared:  0.7886 
## F-statistic: 185.6 on 2 and 97 DF,  p-value: < 2.2e-16
```

---

## Regression statistics with broom package

```r
library(broom)
tidy(model)
```
| term        | estimate | std.error | statistic | p.value  |
|-------------|---------:|----------:|----------:|---------:|
| (Intercept) | 1.12     | 0.0992    | 11.3      | 1.97e-19 |
| X           | 2.92     | 0.913     | 3.2       | 0.00187  |
| W           | -1.02    | 0.897     | -1.13     | 0.26     |
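- `glance()` collects the model-level statistics in a single row: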
```r
glance(model)
```
| r.squared | adj.r.squared | sigma | statistic | p.value  | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|----------:|--------------:|------:|----------:|---------:|---:|-------:|----:|----:|---------:|------------:|-----:|
| 0.793     | 0.789         | 0.988 | 186       | 6.92e-34 | 2  | -139   | 286 | 297 | 94.7     | 97          | 100  |
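- Both `tidy()` and `glance()` return data frames (tibbles), so individual statistics can be extracted directly - a small sketch:

```r
glance(model)$r.squared   # overall R-squared
tidy(model)$p.value       # p-values for each coefficient
```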
---

## Next Lecture

- Chapter 12 - Instrumental Variable Regression