class: center, middle, inverse, title-slide

# Introduction to R
## Seminar 1
### Jonas Björnerstedt
### 2021-02-18

---
class: inverse, center, middle

# Introduction

---

## Overview

- Presentation of R and RStudio
- Focus on practical knowledge
- Data management
  - Create a dataset
  - Transform data
- Data analysis
  - Exploratory data analysis

---

## Prerequisites

- From the beginning
- Starting point: Excel spreadsheet

---

## Zoom

* Microphone off - but do ask questions!
* Chat message - to me or everyone
* Switching from Zoom to RStudio quickly
  * Use Alt-Tab (Cmd-Tab on Mac)
  * Switches to previous program
  * Easier to hop back and forth

---
class: inverse, center, middle

# Intro to RStudio

---

## Why R?

- Open source
  - R is free
    - Old argument
    - More and more common that programs are free
    - Google: Android and Google Docs etc.
  - R is open source
    - Supported by the industry: [R Consortium](https://www.r-consortium.org/members)
    - [Google Colab](https://colab.research.google.com/#create=true&language=r) can be used for R
- Best environment for data management and analysis
  - Can be used for this only
- Popular, a lot on the net: [The Popularity of Data Science Software](http://r4stats.com/articles/popularity/)

---

## Why learn R?!

- R is best for data management and data analysis
  - You can use R for this and another program for regressions
- Data management takes a lot of time
  - Important to do it efficiently
  - Have more time to do _Exploratory Data Analysis_
- Better than Excel
  - Excel is error prone
- Tools are more 'compatible' now
  - Use the best tool for the task

---
class: inverse, center, middle

# R in practice

---

## R versions and installation

* Cloud version
  * Used in seminars and exercises
  * RStudio Cloud - version run by RStudio
    * Time limit before paying
* Installation on own computer
  * Good idea!
Independence
  * Not a good idea in teaching

---

## Log in at

### https://rstudio.sh.se/datalab

.pull-left[

* Go to site
* Click on Datalab link
* User name: your full email address provided in _Contact info_
  * Carl.Lund@gmail.com has username __Carl.Lund@gmail.com__
* Password: the one you provide in _Contact info_

]

.pull-right[

![](figures/login.png)

]

---

## RStudio overview

- RStudio is an Integrated Development Environment (IDE) for the statistical language R
  - Used by programmers, analysts and statisticians
- You can do _a lot_ of different things in RStudio
  - Ignore menus, windows and symbols that you are not acquainted with
- Three windows (_panes_) are visible in the environment
  - Size can be changed, and they can be minimized

---

## Windows

Common windows:

- Console - execute code
- Environment - see defined variables
- Files - manage files
- Help - help text
- Tutorial - step by step instructions

---

## Variables

- Numbers: `a = 2`
- Text string: `k = "Hej"`
- _Environment_ window shows defined variables
- With assignment `=` no result is displayed
- Without assignment the content is displayed
  - Output is short and a little cryptic

```r
a = 2
a
```

```
## [1] 2
```

---

## Expressions and lines

- R continues to parse code until a complete expression is found

```r
a = 2 +
  2
```

- Used when expressions get long - for example when plotting
- Everything that follows a `#` character is a comment

```r
2+2
```

```
## [1] 4
```

```r
# This is a comment, not executed
```

---

## Functions

- Provide input arguments
- Often results in an output value
- Functions can return a value: `round(3.21)` is 3
  - The value can be put in a new variable

```r
x = 3.21
y = round(x)
y
```

```
## [1] 3
```

- Functions can also be used to _do something_
  - They are commonly used for their _side effects_ rather than for returning a value
  - For example, saving a dataset or plotting a figure

---

## Function syntax

- R is a _functional_ language
  - Everything is done with functions
- Functions sometimes take
several arguments
  - In which order should they be provided?
  - Can use _named arguments_
- Ex: round to one decimal

```r
round(x, digits = 1)
```

```
## [1] 3.2
```

- Help file - _code completion_ and hover text boxes help

---

## Console and script

- Console
  - Execute line by line
  - Up arrow - recall previous commands
- Script
  - Ctrl-Enter to execute line
  - Ctrl-Enter on selected text executes selected code
  - Output in console
- To create script
  - See menu: File > New File
  - The first alternative:
    - R Script

---

## Vectors

- A _vector_ contains several different values of the same type
- Created with the `c()` function

```r
v = c(2,4,6)
v
```

```
## [1] 2 4 6
```

```r
v[2]
```

```
## [1] 4
```

---

## Packages

- A package defines a set of functions
  - By loading a package the set of functions that can be used is changed
  - R is a language that you can modify
- Overview of packages in __Packages__ window
- You have to _load_ a package before using a function defined in it

```r
library(ggplot2)
```

---

## Help

- R is the statistics program with by far the most help available on the net
- Help menu, cheat sheets
- Help files in the _Help_ pane - look at the examples at the end
- Videos on the net
- Courses in data analysis on the net
- Best help tool: Google
  - Questions on forums like _stackexchange.com_
  - Many people searching implies good matches in Google searches

---
class: inverse, center, middle

# Program scripts and notebooks

---

## Files

- RStudio can handle many different types of files
  - See menu: File > New File
- The first three are most important:
  - R Script
  - R Notebook
  - R Markdown...
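---

## Example: a small R script

A minimal sketch of what an R script file might contain, combining the variables, functions and vectors from the earlier slides. The variable names (`rounded`, `average`, `second`) are made up for illustration and are not part of the course material.

```r
# Illustrative script: all names below are hypothetical examples
x = 3.21                         # a number
v = c(2, 4, 6)                   # a vector of three numbers

rounded = round(x, digits = 1)   # named argument: round to one decimal
average = mean(v)                # mean() returns the average of the vector
second  = v[2]                   # square brackets pick out the second element

rounded                          # no assignment, so the value is displayed
average
second
```

Running the script line by line with Ctrl-Enter shows each displayed value in the console.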
---

## Scripts and markdown

* Scripts are code based
  * Text that is not code has to be marked as comments with a #
* Notebooks are text based
  * Code is put within code chunks, with backticks indicating that it is code
* Both types can have code
  * Should be self contained
  * Has to be self contained in order to generate reports

---

## Markup and Markdown

- A _markup language_ is a way to use code symbols to indicate formatting
  - HTML stands for HyperText Markup Language
  - LaTeX is another example
- _Markdown_ was created as a simple way to create common formats
  - Text symbols used to create: italic, boldface, headings etc.
- For help check the menu: Help > Markdown Quick Reference

---

## Notebook structure

- Two philosophies: text with code, or code with text
- Code based - from Mathematica
  - Jupyter and Colab are examples
  - Easier to learn
  - Hard to fix broken files
- Text document
  - A little harder to learn - document has text codes
  - More fragile - if the codes are wrong the document will not build
  - Easier to fix when broken
  - More flexible

---

## Notebooks and markdown

- With __knit__ you can create documents in different formats
  - Overview: `Help > Markdown Quick Reference`
- In the toolbar, click on __Knit__ (or __Preview__)

![](figures/knit.png)

---

## Notebooks and Code chunks

- Markdown documents can be combinations of text and code
  - Can be used to document your code and ideas together
- Different languages can be included
  - Ex: R, Python or Stata

![](figures/insert_chunk.png)

---

## Notebooks and Code chunks

- Markers separate text from code - three backticks
  - Within curly braces, language and options are specified
- To insert a chunk, click on the __Insert__ icon in the toolbar
  - Alternative: Ctrl-Alt-I (Command-Option-I on Mac)
- Run a chunk by clicking on the play button
  - Or key combination: Ctrl-Shift-Enter

![](figures/run_chunk.png)

- Result shown in document below code
  - Results not saved in markdown document - saved in separate file

---

## Order of calculation

- As
with R scripts, results depend on the order in which chunks are run
- When the file is knit, the output is created in a new environment
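
A minimal sketch of why order matters, using the hypothetical variables `a` and `b`. A value is computed from whatever its inputs contain at that moment, so changing an input afterwards does not update results that were already calculated.

```r
a = 2
b = a * 10   # b is computed from the current value of a, which is 2
a = 5        # changing a afterwards does not recompute b
b
```

```
## [1] 20
```

If the lines (or the chunks containing them) are run interactively in a different order, `b` can end up with a different value. Knitting runs the chunks from top to bottom in a fresh environment, so the knitted output always reflects the order in the file.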