class: center, middle, inverse, title-slide

# Econometrics - Lecture 3
## Linear Regression with One Regressor
### Jonas Björnerstedt
### 2022-02-24

---

## Lecture Content

Part II. Fundamentals of Regression Analysis

Chapter 4. Linear Regression with One Regressor

- Generalize the concept of _mean_
- _Conditional mean_ `\(E(Y|X)\)` of `\(Y\)`
  - Given `\(X\)`, what is the mean of `\(Y\)`?

---

## Variance and correlation

- Relationship between different random variables
- Variance: `\(\sigma^2_Y = Var(Y)=E[(Y - \mu_Y)^2]\)`
  - Expected square distance from the mean
  - Average square distance in the sample
- Standard deviation: `\(\sigma_Y = \sqrt{Var(Y)}\)`
- Covariance:
`$$Cov(X, Y) = E[(X - \mu_X)(Y- \mu_Y)]$$`
  - Expected product of deviations from the mean for `\(X\)` and `\(Y\)`
  - Average product of deviations in the sample

---

## Correlation coefficient

- Normalize the covariance with the standard deviations of `\(X\)` and `\(Y\)`
`$$\rho_{XY} = \frac{Cov(X, Y)}{\sigma_X \sigma_Y }$$`
- We then have
`$$-1 \le\rho_{XY}\le 1$$`

???

## Scatterplots

- Scatterplots illustrate the sample covariance and the sample correlation
- Sample covariance and correlation
- find data cps12.dta

---

## Sample covariance and correlation

- The sample variance `\(s_{Y}^2\)` is given by
`$$s_{Y}^2 = \frac{1}{n-1} \sum_{i=1}^n (Y_i - \bar Y)^2$$`
- The sample covariance `\(s_{XY}\)` is given by
`$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)$$`
- The sample correlation coefficient `\(r_{XY}\)` is given by
`$$r_{XY} = \frac{s_{XY}}{s_{X} s_{Y}}$$`

---

## Consistency of Sample Covariance <sup>1</sup>

- By the law of large numbers, the average of independent random variables `\(Y_i\)` converges to the expected value `\(\mu_Y\)`
  - Requires that the variance `\(\sigma_Y^2\)` of `\(Y_i\)` is finite
- The sample covariance is essentially also an average of random variables
`$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)(Y_i - \bar Y)$$`
- Thus it converges in probability to the true value
  - Presupposes that each term has finite variance
  - This is why finite fourth moments of `\(X_i\)` and `\(Y_i\)` are required

.footnote[<sup>1</sup> Advanced topic]

---

## [Correlation - linear relationship <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-correlation)

![](statistics03_files/figure-html/unnamed-chunk-1-1.png)<!-- -->

---

## [Correlation and regression <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-population)

![](statistics03_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

---

## Population equation

- How does `\(Y\)` depend on `\(X\)`?
  - We cannot hope to fully describe the relationship
- Focus on the *conditional expectation*:
  - How does the *expected value* of `\(Y\)` depend on `\(X\)`?
- To do this, we want a function `\(f(X)\)` such that
`$$E(Y|X)=f(X)$$`
- In linear models `\(f\)` is given by the _population regression_ line
`$$E(Y|X) = \beta_{0}+\beta_{1} X$$`
- `\(\beta_{0}\)` and `\(\beta_{1}\)` are _parameters_ in the model
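
---

## Sample moments in R (illustration)

A minimal sketch, not from the original slides, tying the sample formulas and the population regression line together: simulate from a linear model with arbitrary illustrative parameters, then check the hand-computed sample covariance and correlation against R's built-in `cov()` and `cor()`.

```r
set.seed(42)                       # reproducible draws
n <- 1000
x <- runif(n, 0, 10)               # regressor
u <- rnorm(n)                      # error term with E(u|X) = 0
y <- 2 + 0.5 * x + u               # population line: beta0 = 2, beta1 = 0.5

# Sample covariance and correlation straight from the definitions
s_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
r_xy <- s_xy / (sd(x) * sd(y))

c(manual = s_xy, builtin = cov(x, y))   # the two entries agree
c(manual = r_xy, builtin = cor(x, y))
```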

---

## Some notation

There is some more or less standard notation used in the book.
If `\(X\)` and `\(Y\)` are random variables:

- `\(\mu_X = E(X)\)` - Expected value of `\(X\)`
- `\(\sigma^2_X = Var(X) = E((X - \mu_X)^2)\)` - Variance of `\(X\)`
- `\(\sigma_{XY} = Cov(X, Y) = E((X - \mu_X)(Y - \mu_Y))\)` - Covariance of `\(X\)` and `\(Y\)`
- `\(corr(X, Y) = \rho_{XY} = \frac{\sigma_{XY} }{\sigma_{X}\sigma_{Y}}\)` - Correlation coefficient of `\(X\)` and `\(Y\)`

---

## Notation - population and sample

Concept | Population | Sample
------------------------ | ---------------- | -----------
Expected value of `\(X\)` | `\(\mu_X = E(X)\)` | `\(\bar X\)`
Variance of `\(X\)` | `\(\sigma^2_X = Var(X)\)` | `\(s^{2}_{X}\)`
Standard deviation of `\(X\)` | `\(\sigma_X = \sqrt{Var(X)}\)` | `\(s_{X}\)`
Covariance of `\(X\)` and `\(Y\)` | `\(\sigma_{XY} = Cov(X, Y)\)` | `\(s_{XY}\)`
Correlation coefficient of `\(X\)` and `\(Y\)` | `\(corr(X, Y) = \rho_{XY}\)` | `\(r_{XY}\)`

- `\(Y_i\)` - Random draw of `\(Y\)`
- `\(\bar Y\)` - Mean of the `\(Y_i\)`

---

## Notation - population and sample

Population and sample definitions

Concept | Population | Sample
------------------ | ---------------- | -----------
Expected value | `\(\mu_X = E(X)\)` | `\(\bar X = \frac{1}{n}\sum_i X_i\)`
Variance | `\(\sigma^2_X = E[(X - \mu_X)^2]\)` | `\(s^{2}_{X} = \frac{1}{n-1}\sum_i (X_i - \bar X)^2\)`
Covariance | `\(\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]\)` | `\(s_{XY} = \frac{1}{n-1}\sum_i (X_i - \bar X) (Y_i - \bar Y)\)`
Correlation coefficient | `\(\rho_{XY}= \frac{\sigma_{XY} }{\sigma_{X}\sigma_{Y}}\)` | `\(r_{XY} = \frac{s_{XY} }{s_{X} s_{Y}}\)`

---

## Linear regression model

- Given a _population regression_ function
`$$E(Y|X) = \beta_{0}+\beta_{1} X$$`
- A sample will consist of `\(n\)` observations `\(X_{i}\)` and `\(Y_{i}\)`
- Each pair independently and identically distributed with
`$$Y_{i} = \beta_{0} + \beta_{1} X_{i} + u_{i}$$`
- `\(u_{i}\)` is the _error term_ with
`$$E(u_{i}|X_{i}) = 0$$`

---

## Mean, median and minimum distance

- How do the mean and the median arise?
- Minimizing the sum of absolute distances gives the _median_
- Minimizing the sum of squared distances gives the _mean_

---

## [Linear distance - find median <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-median)

![](statistics03_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

---

## [Square distance - find mean <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-mean)

![](statistics03_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

---

### Mean and OLS <sup>1</sup>

- Find the `\(\mu\)` that minimizes the sum of squared distances to the observations `\(Y_i\)`:
`$$\sum_{i=1}^n (Y_i - \mu)^2$$`
- To find the minimum, take the derivative with respect to `\(\mu\)` and set it to zero:
`$$\sum_{i=1}^n -2(Y_i - \mu) = -2 \sum_{i=1}^n (Y_i - \mu) = 0$$`
- This is zero when the sum is zero:
`$$\sum_{i=1}^n (Y_i - \mu) = \sum_{i=1}^n Y_i - \sum_{i=1}^n \mu = \sum_{i=1}^n Y_i - n \mu = 0$$`
- Solving for `\(\mu\)`:
`$$\mu = \frac{1}{n}\sum_{i=1}^n Y_i$$`
- The next slide checks this numerically
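
---

## Mean and median as minimizers (illustration)

A small numerical check, not from the original slides: over a grid of candidate values `\(\mu\)`, the sum of squared distances is minimized at the sample mean, and the sum of absolute distances at the sample median. The sample itself is arbitrary.

```r
set.seed(1)
y  <- rnorm(50, mean = 3)             # any sample works
mu <- seq(0, 6, by = 0.001)           # grid of candidate values

ssq <- sapply(mu, function(m) sum((y - m)^2))   # sum of squared distances
sad <- sapply(mu, function(m) sum(abs(y - m)))  # sum of absolute distances

c(argmin_squared  = mu[which.min(ssq)], sample_mean   = mean(y))
c(argmin_absolute = mu[which.min(sad)], sample_median = median(y))
```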

---

## The error term `\(u\)` <sup>1</sup>

- For _any_ two random variables `\(X\)` and `\(Y\)` we can write
`$$Y = \beta_{0}+\beta_{1} X+u$$`
- The question is only what properties `\(u\)` has! To see this:
- If the true relationship is nonlinear, we can rewrite
`$$Y = f(X) + e$$`
as
`$$Y = \beta_{0} + \beta_{1} X + (f(X) + e - \beta_{0} - \beta_{1} X)$$`
`$$Y = \beta_{0} + \beta_{1} X + u$$`
- where `\(u\)` will depend on `\(X\)`:
`$$u = f(X) + e - \beta_{0} - \beta_{1} X$$`

---

## Ordinary Least Squares (OLS) estimation

- Given observations `\(X_{i}\)` and `\(Y_{i}\)`
- Find the `\(\hat\beta_{0}\)` and `\(\hat\beta_{1}\)` minimizing
`$$\sum_{i=1}^n \hat u_{i}^2$$`
- where the residuals `\(\hat u_{i}\)` are defined as
`$$\hat u_{i} = Y_{i} - \hat\beta_{0} - \hat\beta_{1} X_{i}$$`
- This defines the line with the minimal sum of squared distances to the observations

---

## [Line with minimum square distance <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-ols)

![](statistics03_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

## [Linear regression <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-regression)

![](statistics03_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---

## Galton's regression

- 'Regression to the mean'
  - tall parents tend to have shorter children
  - short parents tend to have taller children
- Can regress in either direction
  - tall children tend to have shorter parents
- A regression is a means of expressing correlation
- The regressors do not **cause** the dependent variable to change!
  - No causation even if the relationship is strong

---

## Predicted values

- The estimates `\(\hat\beta_{0}\)` and `\(\hat\beta_{1}\)` together with the residuals `\(\hat u_i\)` fit the data exactly. For all `\(i\)` we have
`$$Y_{i} = \hat\beta_{0} + \hat\beta_{1} X_{i} + \hat u_i$$`
- The predicted value `\(\hat Y_{i}\)` of the linear model is given by
`$$\hat Y_{i} = \hat\beta_{0} + \hat\beta_{1} X_{i}$$`
- An out-of-sample prediction `\(\hat Y\)` is obtained by inserting values of `\(X\)` not in the sample:
`$$\hat Y = \hat\beta_{0} + \hat\beta_{1} X$$`

---

## Measures of fit

- The `\(R^2\)` statistic measures how much of the variation in `\(Y\)` is explained by `\(X\)`
- The _total sum of squares_ measures the variation of `\(Y_{i}\)` around the sample mean
`$$TSS=\sum_{i=1}^{n} (Y_{i} - \bar Y)^2$$`
- The _explained sum of squares_ measures the variation of the predicted values `\(\hat Y_{i}\)`
`$$ESS=\sum_{i=1}^{n} (\hat Y_{i} - \bar Y)^2$$`
- The `\(R^2\)` is the ratio of the two
`$$R^2 = \frac{ESS}{TSS}$$`
- The _residual standard error_ (Root MSE) measures the typical size of the residuals `\(\hat u_i\)`

---

## [Data and regressions <sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics03-figs.Rmd#section-uncertainty)

- Regression with `\(\beta=\left(0,1\right)\)`, `\(\sigma_u^{2}=1\)` and `\(0<x<10\)`
- The grey area shows the possible linear relationships within a 95% confidence interval

![](figures/ldisp.jpg)

---

## Large variance increases uncertainty

- Regression with `\(\sigma_u^{2}=4\)`

![](figures/lvar.jpg)

---

## Small variability in regressors

- Same `\(\beta\)` and `\(\sigma_u^{2}\)`, but `\(4<x<6\)`
- Note that a high estimated slope implies a small estimated intercept: the two estimates are negatively correlated

![](figures/sdisp.jpg)

---

## Distribution of error `\(u\)` and of `\(\hat\beta\)`

- Error term takes only the two values `\(u_i = \pm 1\)`
- With a large sample `\(\hat\beta\)` will nevertheless be close to normally distributed
- Knowledge of the error distribution can increase efficiency
  - Here we can see the *exact* relationship

![](figures/nonnormal.jpg)

---

## Non-normal residuals

- Residuals are gathered around `\(\hat u_i = -1\)` and `\(\hat u_i = 1\)`
  - Not normally distributed!

![](figures/nnresid.jpg)

---

## Next lecture

- Look at Appendices 4.2 and 4.3
- Chapter 4 - OLS theory
- Chapter 5 - Hypothesis tests
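
---

## Appendix: OLS by hand and with `lm()` (illustration)

An end-to-end sketch, not part of the original lecture, with arbitrary illustrative parameters: simulate from `\(Y_i = \beta_0 + \beta_1 X_i + u_i\)`, estimate the line both with `lm()` and with the moment formula `\(\hat\beta_1 = s_{XY}/s_X^2\)`, and compute `\(R^2 = ESS/TSS\)`.

```r
set.seed(7)
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 2)   # true beta0 = 1, beta1 = 2

# OLS estimates from sample moments
b1 <- cov(x, y) / var(x)            # slope: s_XY / s_X^2
b0 <- mean(y) - b1 * mean(x)        # the fitted line passes through the means

fit <- lm(y ~ x)                    # the same estimates via lm()
rbind(manual = c(b0, b1), lm = coef(fit))

# Measures of fit
y_hat <- b0 + b1 * x                # predicted values
tss   <- sum((y - mean(y))^2)       # total sum of squares
ess   <- sum((y_hat - mean(y))^2)   # explained sum of squares
c(R2 = ess / tss, lm_R2 = summary(fit)$r.squared)
```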