class: center, middle, inverse, title-slide

# Econometrics - Lecture 3
## Linear Regression with One Regressor
### Jonas Björnerstedt
### 2022-02-24

---

## Lecture Content

Part II. Fundamentals of Regression Analysis

Chapter 4. Linear Regression with One Regressor

- Generalize the concept of _mean_
- _Conditional mean_ `\(E(Y|X)\)` of `\(Y\)`
  - Given `\(X\)`, what is the mean of `\(Y\)`?

---

## Variance and correlation

- Relationship between different random variables
- Variance: `\(\sigma^2_Y = Var(Y)=E[(Y - \mu_Y)^2]\)`
  - Expected square distance from the mean
  - Average square distance in the sample
- Standard deviation: `\(\sigma_Y = \sqrt{Var(Y)}\)`
- Covariance:
`$$Cov(X, Y) = E[(X - \mu_X)(Y- \mu_Y)]$$`
  - Expected product of deviations from the mean for `\(X\)` and `\(Y\)`
  - Average product of deviations in the sample

---

## Correlation coefficient

- Normalize the covariance with the standard deviations of `\(X\)` and `\(Y\)`
`$$\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y }$$`
- We then have
`$$-1 \le\rho_{XY}\le 1$$`

???

## Scatterplots

- Scatterplots illustrate the sample covariance and the sample correlation
- Sample covariance and correlation
- find data cps12.dta

---

## Sample covariance and correlation

- The sample variance `\(s_{Y}^2\)` is given by
`$$s_{Y}^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar Y)^2$$`
- The sample covariance `\(s_{XY}\)` is given by
`$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)$$`
- The sample correlation coefficient `\(r_{XY}\)` is given by
`$$r_{XY} = \frac{s_{XY}}{s_{X} s_{Y}}$$`

---

## Consistency of Sample Covariance <sup>1</sup>

- By the law of large numbers, the average of independent random variables `\(Y_i\)` converges to the expected value `\(\mu_Y\)`
  - Requires that the variance `\(\sigma_Y^2\)` of `\(Y_i\)` is finite
- The sample covariance is essentially also an average of random variables
`$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)$$`
- Thus it converges in probability to the true value
  - Presupposes that each term has finite variance
  - This is why finite fourth moments of `\(X_i\)` and `\(Y_i\)` are required

.footnote[<sup>1</sup> Advanced topic]

---

## [Correlation - linear relationship <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-correlation)

![](statistics03_files/figure-html/unnamed-chunk-1-1.png)<!-- -->

---

## [Correlation and regression <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-population)

![](statistics03_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

---

## Population equation

- How does `\(Y\)` depend on `\(X\)`?
  - We cannot hope to fully describe the relationship
- Focus on the *conditional expectation*:
  - How does the *expected value* of `\(Y\)` depend on `\(X\)`?
- To do this, we want a function `\(f(X)\)` such that
`$$E(Y|X)=f(X)$$`
- In linear models `\(f\)` is given by the _population regression_ line
`$$E(Y|X) = \beta_{0}+\beta_{1} X$$`
- `\(\beta_{0}\)` and `\(\beta_{1}\)` are _parameters_ in the model
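
---

## Sample moments in R (illustration)

A minimal sketch, not from the original slides, tying the sample formulas and the population regression line together: simulate from a linear model with arbitrary illustrative parameters, then check the hand-computed sample covariance and correlation against R's built-in `cov()` and `cor()`.

```r
set.seed(42)                       # reproducible draws
n <- 1000
x <- runif(n, 0, 10)               # regressor
u <- rnorm(n)                      # error term with E(u|X) = 0
y <- 2 + 0.5 * x + u               # population line: beta0 = 2, beta1 = 0.5

# Sample covariance and correlation straight from the definitions
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
r_xy <- s_xy / (sd(x) * sd(y))

c(manual = s_xy, builtin = cov(x, y))   # the two entries agree
c(manual = r_xy, builtin = cor(x, y))
```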

---

## Some notation

There is some more or less standard notation used in the book.
If `\(X\)` and `\(Y\)` are random variables:

- `\(\mu_X = E(X)\)` - Expected value of `\(X\)`
- `\(\sigma^2_X = Var(X) = E((X - \mu_X)^2)\)` - Variance of `\(X\)`
- `\(\sigma_{XY} = Cov(X, Y) = E((X - \mu_X)(Y - \mu_Y))\)` - Covariance of `\(X\)` and `\(Y\)`
- `\(corr(X, Y) = \rho_{XY} = \frac{\sigma_{XY} }{\sigma_{X}\sigma_{Y}}\)` - Correlation coefficient of `\(X\)` and `\(Y\)`

---

## Notation - population and sample

Concept | Population | Sample
------------------------ | ---------------- | -----------
Expected value of `\(X\)` | `\(\mu_X = E(X)\)` | `\(\bar X\)`
Variance of `\(X\)` | `\(\sigma^2_X = Var(X)\)` | `\(s^{2}_{X}\)`
Standard deviation of `\(X\)` | `\(\sigma_X = \sqrt{Var(X)}\)` | `\(s_{X}\)`
Covariance of `\(X\)` and `\(Y\)` | `\(\sigma_{XY} = Cov(X, Y)\)` | `\(s_{XY}\)`
Correlation coefficient of `\(X\)` and `\(Y\)` | `\(corr(X, Y) = \rho_{XY}\)` | `\(r_{XY}\)`

- `\(Y_i\)` - Random draw of `\(Y\)`
- `\(\bar Y\)` - Mean of the `\(Y_i\)`

---

## Notation - population and sample

Population and sample definitions

Concept | Population | Sample
------------------ | ---------------- | -----------
Expected value | `\(\mu_X = E(X)\)` | `\(\bar X = \frac{1}{n}\sum_i X_i\)`
Variance | `\(\sigma^2_X = E[(X - \mu_X)^2]\)` | `\(s^{2}_{X} = \frac{1}{n-1}\sum_i (X_i - \bar X)^2\)`
Covariance | `\(\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]\)` | `\(s_{XY} = \frac{1}{n-1}\sum_i (X_i - \bar X) (Y_i - \bar Y)\)`
Correlation coefficient | `\(\rho_{XY}= \frac{\sigma_{XY} }{\sigma_{X}\sigma_{Y}}\)` | `\(r_{XY} = \frac{s_{XY} }{s_{X} s_{Y}}\)`

---

## Linear regression model

- Given a _population regression_ function
`$$E(Y|X) = \beta_{0}+\beta_{1} X$$`
- A sample will consist of `\(n\)` observations `\(X_{i}\)` and `\(Y_{i}\)`
- Each pair independently and identically distributed with
`$$Y_{i} = \beta_{0} + \beta_{1} X_{i} + u_{i}$$`
- `\(u_{i}\)` is the _error term_ with
`$$E(u_{i}|X_{i}) = 0$$`

---

## Mean, median and minimum distance

- How do the mean and the median arise?
- Minimizing the sum of absolute distances gives the _median_
- Minimizing the sum of squared distances gives the _mean_

---

## [Linear distance - find median <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-median)

![](statistics03_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

---

## [Square distance - find mean <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-mean)

![](statistics03_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

---

### Mean and OLS <sup>1</sup>

- Find the `\(\mu\)` that minimizes the sum of squared distances to the observations `\(Y_i\)`:
`$$\sum_{i=1}^n (Y_i - \mu)^2$$`
- To find the minimum, take the derivative with respect to `\(\mu\)` and set it to zero:
`$$\sum_{i=1}^n -2(Y_i - \mu) = -2 \sum_{i=1}^n (Y_i - \mu) = 0$$`
- This is zero when the sum is zero:
`$$\sum_{i=1}^n (Y_i - \mu) = \sum_{i=1}^n Y_i - \sum_{i=1}^n \mu = \sum_{i=1}^n Y_i - n \mu = 0$$`
- Solving for `\(\mu\)`:
`$$\mu = \frac{1}{n}\sum_{i=1}^n Y_i$$`
- The next slide checks this numerically
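
---

## Mean and median as minimizers (illustration)

A small numerical check, not from the original slides: over a grid of candidate values `\(\mu\)`, the sum of squared distances is minimized at the sample mean, and the sum of absolute distances at the sample median. The sample itself is arbitrary.

```r
set.seed(1)
y  <- rnorm(50, mean = 3)             # any sample works
mu <- seq(0, 6, by = 0.001)           # grid of candidate values

ssq <- sapply(mu, function(m) sum((y - m)^2))   # sum of squared distances
sad <- sapply(mu, function(m) sum(abs(y - m)))  # sum of absolute distances

c(argmin_squared  = mu[which.min(ssq)], sample_mean   = mean(y))
c(argmin_absolute = mu[which.min(sad)], sample_median = median(y))
```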

---

## The error term `\(u\)` <sup>1</sup>

- For _any_ two random variables `\(X\)` and `\(Y\)` we can write
`$$Y = \beta_{0}+\beta_{1} X+u$$`
- The question is only what properties `\(u\)` has! To see this:
- If the true relationship is nonlinear, we can rewrite
`$$Y = f(X) + e$$`
as
`$$Y = \beta_{0} + \beta_{1} X + (f(X) + e - \beta_{0} - \beta_{1} X)$$`
`$$Y = \beta_{0} + \beta_{1} X + u$$`
- where `\(u\)` will depend on `\(X\)`:
`$$u = f(X) + e - \beta_{0} - \beta_{1} X$$`

---

## Ordinary Least Squares (OLS) estimation

- Given observations `\(X_{i}\)` and `\(Y_{i}\)`
- Find the `\(\hat\beta_{0}\)` and `\(\hat\beta_{1}\)` minimizing
`$$\sum_{i=1}^n \hat u_{i}^2$$`
- where the residuals `\(\hat u_{i}\)` are defined as
`$$\hat u_{i} = Y_{i} - \hat\beta_{0} - \hat\beta_{1} X_{i}$$`
- This defines the line with the minimal sum of squared distances to the observations

---

## [Line with minimum square distance <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-ols)

![](statistics03_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

## [Linear regression <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-regression)

![](statistics03_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---

## Galton's regression

- 'Regression to the mean'
  - tall parents tend to have shorter children
  - short parents tend to have taller children
- Can regress in either direction
  - tall children tend to have shorter parents
- A regression is a means of expressing correlation
- The regressors do not **cause** the dependent variable to change!
  - No causation even if the relationship is strong

---

## Predicted values

- The estimates `\(\hat\beta_{0}\)` and `\(\hat\beta_{1}\)` together with the residuals `\(\hat u_i\)` fit the data exactly. For all `\(i\)` we have
`$$Y_{i} = \hat\beta_{0} + \hat\beta_{1} X_{i} + \hat u_i$$`
- The predicted value `\(\hat Y_{i}\)` of the linear model is given by
`$$\hat Y_{i} = \hat\beta_{0} + \hat\beta_{1} X_{i}$$`
- An out-of-sample prediction `\(\hat Y\)` is obtained by inserting values of `\(X\)` not in the sample:
`$$\hat Y = \hat\beta_{0} + \hat\beta_{1} X$$`

---

## Measures of fit

- The `\(R^2\)` statistic measures how much of the variation in `\(Y\)` is explained by `\(X\)`
- The _total sum of squares_ measures the variation of `\(Y_{i}\)` around the sample mean
`$$TSS=\sum_{i=1}^{n} (Y_{i} - \bar Y)^2$$`
- The _explained sum of squares_ measures the variation of the predicted values `\(\hat Y_{i}\)`
`$$ESS=\sum_{i=1}^{n} (\hat Y_{i} - \bar Y)^2$$`
- The `\(R^2\)` is the ratio of the two
`$$R^2 = \frac{ESS}{TSS}$$`
- The _residual standard error_ (Root MSE) measures the typical size of the residuals `\(\hat u_i\)`

---

## [Data and regressions <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-uncertainty)

- Regression with `\(\beta=\left(0,1\right)\)`, `\(\sigma_u^{2}=1\)` and `\(0<x<10\)`
- The grey area shows the possible linear relationships within a 95% confidence interval

![](figures/ldisp.jpg)

---

## Large variance increases uncertainty

- Regression with `\(\sigma_u^{2}=4\)`

![](figures/lvar.jpg)

---

## Small variability in regressors

- Same `\(\beta\)` and `\(\sigma_u^{2}\)`, but `\(4<x<6\)`
- Note that a high estimated slope implies a small estimated intercept: the two estimates are negatively correlated

![](figures/sdisp.jpg)

---

## Distribution of error `\(u\)` and of `\(\hat\beta\)`

- Error term takes only the two values `\(u_i = \pm 1\)`
- With a large sample `\(\hat\beta\)` will nevertheless be close to normally distributed
- Knowledge of the error distribution can increase efficiency
  - Here we can see the *exact* relationship

![](figures/nonnormal.jpg)

---

## Non-normal residuals

- Residuals are gathered around `\(\hat u_i = -1\)` and `\(\hat u_i = 1\)`
  - Not normally distributed!

![](figures/nnresid.jpg)

---

## Next lecture

- Look at Appendices 4.2 and 4.3
- Chapter 4 - OLS theory
- Chapter 5 - Hypothesis tests
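
---

## Appendix: OLS by hand and with `lm()` (illustration)

An end-to-end sketch, not part of the original lecture, with arbitrary illustrative parameters: simulate from `\(Y_i = \beta_0 + \beta_1 X_i + u_i\)`, estimate the line both with `lm()` and with the moment formula `\(\hat\beta_1 = s_{XY}/s_X^2\)`, and compute `\(R^2 = ESS/TSS\)`.

```r
set.seed(7)
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 2)   # true beta0 = 1, beta1 = 2

# OLS estimates from sample moments
b1 <- cov(x, y) / var(x)            # slope: s_XY / s_X^2
b0 <- mean(y) - b1 * mean(x)        # the fitted line passes through the means

fit <- lm(y ~ x)                    # the same estimates via lm()
rbind(manual = c(b0, b1), lm = coef(fit))

# Measures of fit
y_hat <- b0 + b1 * x                # predicted values
tss   <- sum((y - mean(y))^2)       # total sum of squares
ess   <- sum((y_hat - mean(y))^2)   # explained sum of squares
c(R2 = ess / tss, lm_R2 = summary(fit)$r.squared)
```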