class: center, middle, inverse, title-slide

.title[
# Probability and statistics
]
.subtitle[
## Probability theory
]
.author[
### Jonas Björnerstedt
]
.date[
### 2024-02-20
]

---

## Intro to Statistics overview

- Three seminars
  - Probability, statistics and basic programming
- Not in exam
  - But in econometrics exam!
- Empirical exercise
  - Based on exercises we go through in class

---

## Textbook

.pull-left[
- Stock and Watson, Introduction to Econometrics, fourth edition (or updated third edition)
  - Ch. 2. Review of Probability
  - Ch. 3. Review of Statistics
- Used in both econometrics courses
]

.pull-right[
<img src="figures/StockWatson_cover4.jpeg" width="100%" />
]

---

## Resources

- RStudio resources:
  - https://rstudio.cloud/learn/primers
- The internet!
  - Wikipedia is very good on probability and statistics
  - [Khan Academy](https://www.khanacademy.org/math/statistics-probability)
    - From basic to advanced with [an app with videos and exercises](https://itunes.apple.com/us/app/khan-academy-you-can-learn/id469863705?mt=8)

---

## Today's lecture

- Chapter 2 Introduction to R...
---

## Sampling Probability

- Probability
  - Share of population with property
  - Share of random sample: frequency of event
- Population
  - Example: Individuals in Sweden
  - Can be abstract set of states
    - States of the world where a coin toss gives heads
- Sample
  - Draws of individuals from population
  - Example: Class

---

## Discrete random variable

- Finite discrete variable takes `\(k\)` different values
  - Age or height of individuals in a class
- Distribution can be characterized by the frequencies:
  - Relative frequency of each age or height

---

## Coin toss

- Coin toss has two outcomes (heads or tails)
- Assign a numerical value to each: -1, 1
- Equal probability of each outcome with a _fair coin_

![](statistics01_files/figure-html/fig.width==1-1.png)<!-- -->

---

## Discrete random variable - dice

- A toss of a die can have various outcomes
  - Sample space: {1, 2, 3, 4, 5, 6}
- Each outcome occurs with equal probability
  - Frequency with which we expect outcome
- _Probability Mass Function (PMF)_
  - Function that assigns a _probability_ to each outcome in the sample space

![](statistics01_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

---

## Random draws from population

.pull-left[
- Population of men and women (not real data!)
`$$P(man)=P(woman)=1/2$$`
]

.pull-right[
![](statistics01_files/figure-html/unnamed-chunk-3-1.png)<!-- -->
]

---

## Joint distributions

* Let `\(X\)` and `\(Y\)` be random variables
* The probabilities of `\(X\)` and `\(Y\)` taking different values can be related
* `\(X\)` and `\(Y\)` are said to be independent if for all values `\(x\)` and `\(y\)` that they can take, we have:

`$$Pr(X=x, Y=y) = Pr(X=x) \cdot Pr(Y=y)$$`

---

## Full time work

.pull-left[
- Men are more likely to work full time

`$$P(work | man) = 0.6$$`
`$$P(work | woman) = 0.4$$`

- The probability of work is the sum of the two bottom rectangles

`$$P(work) = 0.6 \cdot 0.5 + 0.4 \cdot 0.5 = 0.5$$`
]

.pull-right[
![](statistics01_files/figure-html/unnamed-chunk-4-1.png)<!-- -->
]

`$$P(work) = P(work | man) \cdot P(man) + P(work | woman) \cdot P(woman)$$`

---

## School

- Conditional probability is the same

`$$P(school) = P(school | man) = P(school | woman) = 0.2$$`

- Gender gives no additional information about schooling: the two are independent

![](statistics01_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---

## Continuous random variable

- Many random variables are best seen as continuous
  - Example: exact position where a thrown dart lands
  - Probability of an exact outcome is zero: `\(Pr(X = 0.5)=0\)`
- _Probability Density Function (PDF)_ describes probabilities
  - Probability of a range of outcomes is the _area_ under the PDF over that range
  - Probability of `\(X < 1\)` is given by the red area
  - In this case `\(Pr(X < 1) \approx 0.68\)`

![](statistics01_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

---

## Probability and statistics

- Random variables
  - Numerical properties of individuals
  - Examples: height, weight and gender
- Characterized by *probability distribution*
  - Probability of each value that variable can take
  - Example: frequencies of all heights in population
- Summarize with a *statistic*
  - Real or vector valued _function_ of sample
  - Random variable (it depends on a random sample)
  - Sampling distribution of statistic?
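---

## Checking the examples in R

The numbers from the "Full time work" and "School" slides can be checked with a few lines of R. This is only a minimal sketch: the variable names (`p_gender`, `p_work_given`, `p_school_given`) are made up for illustration, and the probabilities are the invented ones from the slides, not real data.

```r
# Probabilities from the slides (not real data)
p_gender <- c(man = 0.5, woman = 0.5)        # P(man), P(woman)
p_work_given <- c(man = 0.6, woman = 0.4)    # P(work | gender)
p_school_given <- c(man = 0.2, woman = 0.2)  # P(school | gender)

# Law of total probability: P(A) = sum over g of P(A | g) * P(g)
p_work <- sum(p_work_given * p_gender)
p_school <- sum(p_school_given * p_gender)

# Joint probabilities: P(A, g) = P(A | g) * P(g)
joint_work <- p_work_given * p_gender
joint_school <- p_school_given * p_gender

# Independence requires P(A, g) = P(A) * P(g) for every g.
# For a yes/no event it is enough to check the "yes" case.
work_independent <- isTRUE(all.equal(joint_work, p_work * p_gender))
school_independent <- isTRUE(all.equal(joint_school, p_school * p_gender))

print(p_work)
print(work_independent)
print(school_independent)
```

Work depends on gender, so the joint probabilities do not factor into the product of the marginals. Schooling has the same conditional probability for both groups, so they do.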
---

## Expected value

- A *Statistic* summarizes properties of distributions
  - A real valued function of the probability distribution
- If `\(Y\)` has a discrete distribution:

`$$E(Y)= \sum_{i=1}^{k} Y_i p_i =\mu_Y$$`

- For dice:

`$$E(Y)=1 \cdot \frac{1}{6}+2 \cdot \frac{1}{6}+3 \cdot \frac{1}{6}+4 \cdot \frac{1}{6}+5 \cdot \frac{1}{6}+6 \cdot \frac{1}{6}=3.5$$`

- Populations often have equal weights
  - Ex: The mean height of the Swedish population is just the average
  - Sum the heights of everybody and divide by the number of people

---

## Expected value - formula

- Expected value:

`$$E[Y] = Y_{1}p_{1}+Y_{2}p_{2}+ \ldots +Y_{n}p_{n} = \sum_{i=1}^{n}Y_{i}p_{i}$$`

- Average population value if all have the same probability

`$$E[Y] = \sum_{i=1}^{n}Y_{i}p_{i} = \sum_{i=1}^{n}Y_{i}\frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n}Y_{i}$$`

---

## Variance

- The variance is a measure of the spread around the expected value
  - How big is the squared deviation on average?
- Let `\(r\)` be the squared deviation:

`$$r = (Y - E[Y])^2$$`

- Then the variance is the expected value of the squared deviation:

`$$Var(Y) = E[r] = E\left[(Y - E[Y])^2 \right]$$`

---

## Variance in sample

The variance in a sample is the corresponding expression with means instead of expected values

`$$s_Y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_{i} - \bar Y)^2$$`

* We take the average squared distance from the mean `\(\bar Y\)`
* Technical detail: there are only `\(n-1\)` independent observations of deviations from the mean
  * If we have two observations, we have only _one_ difference.
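---

## Expected value and variance in R

The formulas for the expected value, the variance and the sample variance can be tried out on a fair die. The sketch below is illustrative only: the names `y`, `p` and `tosses`, the seed and the sample size of 1000 are arbitrary choices, not part of the course material.

```r
# Fair die: outcomes and their probabilities (the PMF)
y <- 1:6
p <- rep(1 / 6, 6)

# Expected value: probability-weighted sum of the outcomes
mu <- sum(y * p)

# Variance: expected value of the squared deviation r = (Y - E[Y])^2
r <- (y - mu)^2
sigma2 <- sum(r * p)

# Sample variance of simulated tosses, dividing by n - 1
set.seed(1)  # arbitrary seed, only for reproducibility
tosses <- sample(y, size = 1000, replace = TRUE)
n <- length(tosses)
s2_manual <- sum((tosses - mean(tosses))^2) / (n - 1)
s2_builtin <- var(tosses)  # R's var() also divides by n - 1

print(c(mu = mu, sigma2 = sigma2))
print(c(s2_manual = s2_manual, s2_builtin = s2_builtin))
```

The population values are `mu = 3.5` and `sigma2 = 35/12`. The two sample variances agree with each other, and with a large sample they should be close to `sigma2`.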
---

## Conditional expectation

* Conditional expectation - expected value given something
* Two population variables `\(height\)` and `\(woman\)`
* Expected height of women: `\(E(height | woman = 1 )\)`
  * Population average for women
  * `\(woman\)` is a dummy variable, with values 0 and 1

---

## [Transform a random variable<sup> 🔗 </sup>](http://rstudio.sh.se/content/statistics01-figs.Rmd#section-correlation)

- From a random variable `\(Y\)` we can create new random variables:
  * `\(2Y\)` stretches
  * `\(Y + 1\)` moves
  * Squared deviation `\(r\)` is a random variable
  * [Illustrate with coin toss]

---

## Exercise

- Coin toss has two outcomes (heads or tails)
- Random variable taking values: -1, 1
  - What is the expected value and variance?
- Consider a random variable that assigns 0, 1 to outcomes
  - What is the expected value and variance?

---

## Mean and median of distribution

- _Mean_ - Average `\(X\)` value
- _Median_ - `\(X\)` with half of the density (area) on each side
- The two differ when the distribution is not symmetric (skewed)
  - Example: income

![](statistics01_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

---

## Some notation

There is some more or less standard notation used in the book. If `\(Y\)` is a random variable

- `\(\mu_Y = E(Y)\)`
  - Expected value of `\(Y\)`
- `\(\sigma^2_Y = Var(Y) = E((Y - \mu_Y)^2)\)`
  - Variance of `\(Y\)`
- `\(\sigma_Y = std.dev(Y) = \sqrt{Var(Y)}\)`
  - Standard deviation of `\(Y\)`
  - Square root of the variance
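---

## Transformations in R

The effect of stretching (`\(2Y\)`) and moving (`\(Y + 1\)`) a random variable can be seen by computing the expected value and variance before and after. The sketch below uses a fair die rather than the coin, so the exercise is left open. The helper functions `expected_value()` and `variance()` are hypothetical names written for this example, not functions from a package.

```r
# Hypothetical helpers for a discrete distribution
expected_value <- function(values, probs) sum(values * probs)
variance <- function(values, probs) {
  mu <- expected_value(values, probs)
  sum((values - mu)^2 * probs)
}

# Fair die
y <- 1:6
p <- rep(1 / 6, 6)

# Transforming the outcomes leaves the probabilities unchanged
results <- rbind(
  "Y"     = c(mean = expected_value(y, p),     var = variance(y, p)),
  "2Y"    = c(mean = expected_value(2 * y, p), var = variance(2 * y, p)),
  "Y + 1" = c(mean = expected_value(y + 1, p), var = variance(y + 1, p))
)

print(results)
```

Stretching by 2 doubles the expected value and multiplies the variance by 4. Moving by 1 shifts the expected value by 1 and leaves the variance unchanged.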