class: center, top, title-slide # STAT 302, Lecture Slides 1 ## Introduction and Getting Started with R ### Bryan Martin --- # Outline 1. Course Overview 2. Introduction to R, RStudio, and R Markdown 3. Getting started with data .middler[**Goal:** Create a functional R Markdown document utilizing basic R functionality (Short Lab 1)] --- # Syllabus .middler[.large[ [Link to syllabus](https://bryandmartin.github.io/STAT302/syllabus.html) ]] --- # Expectations What you should expect from me... * your learning will be my priority * you will be treated like an adult and with respect * your feedback will be valued * timely feedback on assignments * understandable and well-paced lectures (tell me if they are not!) * an attempt at making statistical computation as fun for you as it is for me! --- # Expectations What I will expect from you... * regular attendance * timely assignments that represent your best work * a respectful and engaged classroom * a desire and effort to learn challenging material --- # Canvas Discussion * Worth up to 2% extra credit on your final grade! * Substantive and helpful questions and answers -- .pull-left[ ### Bad questions: * How do you do problem 2? * Here's my code and it's broken. How do I fix it? ] .pull-right[ ### Good questions: * Here's a snippet of code I used for problem 2: <br/>`formatted code snippet` <br/>It returned the following error: <br/>`formatted error message` <br/>Does anyone know why? I already tried... * I don't understand the concept from Slide 18 today. Could anyone elaborate on why... ] --- # Canvas Discussion * Worth up to 2% extra credit on your final grade! * Substantive and helpful questions and answers .pull-left[ ### Bad answers: * Here's my solution * Fool! You should already know the answer to this! Your trivial question is no match for my superior intellect! ] .pull-right[ ### Good answers: * This error message occurs because your variable is a string instead of a numeric. Have you tried checking... 
* I think you have a bug in line 3 of the code you posted. You have more left parentheses than right parentheses so the line is incomplete. ] --- # Why R? R is a programming language designed for statistical analysis. -- * open-source * free * large and active community of developers and users * great analysis tools * great visualization tools -- * great user interface... --- # Why RStudio? RStudio is an integrated development environment (IDE) designed to make your life easier. -- * Organizes scripts, files, plots, code console, ... * Highlights syntax * Helpful interactive graphical interface * Will make an efficient, reproducible workflow *much* easier -- * R Markdown integration... --- # Why R Markdown? * Interface between code, output, and writing * Self-contained analyses * Creates HTML, PDF, slides (like these!), webpages, ... -- * Required for your labs! --- class: inverse .middler[.huge[Part 1: Introduction to R Utilities]] --- # Operators ```r # Addition 6 + 3 ``` ``` ## [1] 9 ``` ```r # Subtraction 6 - 3 ``` ``` ## [1] 3 ``` ```r # Multiplication 6 * 3 ``` ``` ## [1] 18 ``` ```r # Division 6 / 3 ``` ``` ## [1] 2 ``` --- # Comparison Operators ```r # Greater than 6 > 3 ``` ``` ## [1] TRUE ``` ```r # Less than 6 < 3 ``` ``` ## [1] FALSE ``` ```r # Equal to 6 == 3 ``` ``` ## [1] FALSE ``` ```r 6 == 3 + 3 ``` ``` ## [1] TRUE ``` --- # Comparison Operators ```r # Not equal to 6 != 3 ``` ``` ## [1] TRUE ``` ```r 6 < 6 ``` ``` ## [1] FALSE ``` ```r # Less than or equal to 6 <= 6 ``` ``` ## [1] TRUE ``` --- # Logical Operators ```r # and (6 < 3) & (1 < 3) ``` -- ``` ## [1] FALSE ``` -- ```r # and (2 < 3) & (1 < 3) ``` -- ``` ## [1] TRUE ``` -- ```r # or (6 < 3) | (1 < 3) ``` -- ``` ## [1] TRUE ``` -- ```r # a bit harder... 
(6 < 3) | (1 < 3) & (6 < 3) ``` -- ``` ## [1] FALSE ``` --- # Object Types ```r class(7) ``` ``` ## [1] "numeric" ``` ```r class("7") ``` ``` ## [1] "character" ``` ```r is.numeric(7) ``` ``` ## [1] TRUE ``` ```r is.numeric("7") ``` ``` ## [1] FALSE ``` --- # Object Types ```r is.character(7) ``` ``` ## [1] FALSE ``` ```r is.character("7") ``` ``` ## [1] TRUE ``` ```r is.na(7) ``` ``` ## [1] FALSE ``` ```r is.na(0/0) ``` ``` ## [1] TRUE ``` --- # Object Types ```r as.character(7) ``` ``` ## [1] "7" ``` ```r as.numeric("7") ``` ``` ## [1] 7 ``` ```r as.numeric("7") + 3 == 10 ``` ``` ## [1] TRUE ``` ```r "7" + 3 == 10 ``` ``` ## Error in "7" + 3: non-numeric argument to binary operator ``` --- # Assigning Variables ```r x <- 7 x ``` ``` ## [1] 7 ``` ```r x + 3 ``` ``` ## [1] 10 ``` ```r x == 7 ``` ``` ## [1] TRUE ``` ```r as.character(x) ``` ``` ## [1] "7" ``` ```r y <- 3 x + y ``` ``` ## [1] 10 ``` --- # Workspaces ```r # List all defined objects ls() ``` ``` ## [1] "x" "y" ``` ```r # Remove an object rm("x") ls() ``` ``` ## [1] "y" ``` ```r x ``` ``` ## Error in eval(expr, envir, enclos): object 'x' not found ``` --- # Workspaces ```r x <- 7 ls() ``` ``` ## [1] "x" "y" ``` ```r # Use with caution! This erases everything! rm(list = ls()) ls() ``` ``` ## character(0) ``` --- layout:true # Commenting Code --- ## What is a comment? * Computers completely ignore comments (in R, everything on a line after a `#`) * Comments do not impact the functionality of your code at all. -- ### So why write them... -- * Commenting code allows you to write notes that only readers of your code will see * Usually, that reader is you! * Coding without comments is ill-advised, bordering on impossible -- * Sneak peek at functions... --- ```r #' Wald-type t test #' @param mod an object of class \code{bbdml} #' @return Matrix with wald test statistics and p-values. Univariate tests only. 
waldt <- function(mod) { # Covariance matrix covMat <- try(chol2inv(chol(hessian(mod))), silent = TRUE) if (inherits(covMat, "try-error")) { warning("Singular Hessian! Cannot calculate p-values in this setting.") np <- length(mod$param) se <- tvalue <- pvalue <- rep(NA, np) } else { # Standard errors se <- sqrt(diag(covMat)) # test statistic tvalue <- mod$param/se # P-value pvalue <- 2*stats::pt(-abs(tvalue), mod$df.residual) } # make table coef.table <- cbind(mod$param, se, tvalue, pvalue) dimnames(coef.table) <- list(names(mod$param), c("Estimate", "Std. Error", "t value", "Pr(>|t|)")) return(coef.table) } ``` --- ## Comment Style Guide * When starting out, you should comment most lines * Frequent use of comments should allow most comments to be restricted to one line for readability * A comment should go above its corresponding line, be indented equally with the next line, and start with a single `#` * Use a string of `-` or `=` to break your code into easily noticeable chunks * Example: `# Data Manipulation -----------` * RStudio allows you to collapse chunks marked like this to help with clutter -- * There are exceptions to every rule! Usually, comments are to help **you**! --- ## Example of when I break my own rules * Here's a snippet of a long mathematical function I wrote (lots of code omitted with ellipses for space). * In order to help myself read through it later, I divided the function into major steps marked by easily visible comments I can see when scanning through ```r objfun <- function(theta, W, M, X, X_star, np, npstar, link, phi.link) { ### STEP 1 - Negative Log-likelihood # extract matrix of betas (np x 1), first np entries b <- utils::head(theta, np) # extract matrix of beta stars (npstar x 1), last npstar entries b_star <- utils::tail(theta, npstar) ... 
### STEP 2 - Gradient # define gam gam <- phi/(1 - phi) ``` --- ## A final plea * Being a successful programmer *requires* commenting your code * Want to understand code you wrote >24 hours ago without comments? -- .center[] .center[.small[I just learned you can add gifs to R Markdown slides. Expect a lot of these]] -- * If you still aren't convinced... -- * Clear commenting is required for this course --- layout:false class: inverse .middler[.huge[Part 2: Using RStudio and R Markdown]] --- # RStudio Interface By default... * *Top left*: Editor pane. Browse and edit scripts and data with tabs * *Top right*: List of objects in your Environment (recall `ls()`), code History * *Bottom left*: Console for running R code line-by-line (`>` prompt) * *Bottom right*: Files, plots, packages, help files --- # Editor * Your workflow should be contained here (**not** your console) * Primarily used for writing and editing .R scripts -- * Try opening a file now using *File > New File > R Script*, write two lines of simple code * Click `Run` in the bar above your script. What happens? * Click on one of the lines of code. Press `Ctrl`/`⌘` + `Enter`. What happens? -- .center[**Important:** Every part of your R workflow belongs in this window!] --- layout: true # Environment & History * If you didn't already, define a variable in your R Script and run it * What happens in your Environment tab? -- * Type `install.packages("palmerpenguins")` in your Console. * Now add `library(palmerpenguins)` and `data(penguins)` to your script and run it. * What happens if you click on this in your Environment tab? * Note: We will delve deeper into data later! -- * Remove one of your variables and see what happens. --- * Click on the History tab to see what it contains. Try searching! -- * Select a line from your history and click `To Source`. What happens? 
-- * Useful for adding lines that you tested in your Console to your scripts -- .pushdown[.center[**Summary:** Useful to quickly browse what you have defined in your environment]] --- layout: false layout: true # Console --- * The quick and easy way to run individual lines of code * Nothing you do here is saved as part of your workflow! -- * Useful for debugging, testing code, iterating a plot until you like it ... * Once you get what you were looking for, add it to your script files! * **Never** manipulate your data in the console. Your workflow should always be **reproducible!** --- ## Incomplete Code What if we start a command, but do not finish it? ```r > 5 - + ``` Two options: * Press `Esc` to exit and *not* execute the line * Complete the command --- layout: false # Files, Plots, Packages, Help * We will explore this tab more as we get into functions and visualization * Files is used to browse the files on your computer * Useful for opening files/data, moving files you are working with * *Use caution!* Changing files here is the same as changing them on your computer. If you delete something, it's gone! * Plots are used to display plots you create in R * Help is used to browse help files of functions. You can explore these by preceding a function name with `?`. Try `?sqrt` to see. * Packages shows all the packages you currently have installed (we will get more into this later!) 
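---

# Help Files

Here is a minimal sketch of browsing help from code, assuming a standard R installation. It only uses base R, and it is just one way to follow the `?sqrt` suggestion.

```r
# Open the help page for sqrt() (shown in the Help tab in RStudio)
?sqrt

# Equivalent long form, handy when the topic name is stored as a string
help("sqrt")

# After reading the documentation, try the function out
sqrt(16)
```

The first two lines do the same thing, so use whichever you find easier to remember.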
--- class: inverse .middler[.huge[Brief Intermission: File Organization]] --- layout:true # File Names Matter --- .middler[] --- .pull-left[ ## Bad * `newfinal2actualFINALnew.docx` * `asdfasdf.R` * `analy$i$ functions!.R` * `stuff.R` * Cluttered * Uninformative * Spaces * Special characters other than `_` and `-` ] .pull-right[ ## Good * `stat302_lab1.Rmd` * `analysis_functions.R` * `analysisFunctions.R` * `2020-01-08_labWriteup.Rmd` * Meaningful * Concise * camelCase or using `_` to distinguish words * Machine sortable ] --- ## Summary * Machine readable * Human readable * Plays well with default ordering -- * `01_draft.Rmd`, `02_draft.Rmd`, ..., `11_draft.Rmd` * `2018-05-05_resume.docx`, `2019-02-17_resume.docx`, `2020-01-08_resume.docx` --- layout: false layout: true # File Organization Matters --- .middler[] --- It is easier to start with best practices than to fix things later! -- 1. Somewhere on your computer, create the folder `STAT302` 2. Within that folder, create the subfolders `short_labs`<sup>1</sup>, `labs`, `projects` 3. Within your Short Labs folder, create a subfolder `short_lab_1`<sup>2</sup> 4. Put both of your short lab files from Monday into that folder 5. Within your Labs folder, create a folder for Lab 1 that follows the filename guide .footnote[[1] or `shortLabs`, `ShortLabs`, `Short_Labs`, ... (just follow the rules for file names!) [2] or `shortLab1`, `short_lab1`, ... ] -- .pushdown[May seem excessive for now, but this will come in handy when labs start including extra files such as data and figures!] --- layout: false # All done! For now... .middler[] --- layout: true # R Markdown --- Let's try making an R Markdown file: 1. Choose *File > New File > R Markdown...* 2. Make sure *HTML Output* is selected and click OK 3. Save the file in your new folder, call it `stat302_Lab1.Rmd` * *Hint:* Follow along, because this will become your Lab 1 submission! 4. 
Click the *Knit HTML* button * After it is done, browse to the file location using the `Files` tab. What do you notice? * Click *Open in Browser* to view the full HTML --- ## R Markdown Headers The header of .Rmd files is YAML (YAML Ain't Markup Language) code 5. Change `title` to "Lab 1" 6. Change `author` to your name in quotes 7. Change `date` to the due date in quotes -- Congrats! You have a functional .Rmd that will soon be your Lab 1 submission! --- ## R Markdown Syntax (Thanks to Charles Lanfear, UW Sociology, for this very concise summary) --- .pull-left[ ## Output **bold/strong emphasis** *italic/normal emphasis* .forcehead[Header] ## Subheader ### Subsubheader ] .pull-right[ ## Syntax <pre> **bold/strong emphasis** *italic/normal emphasis* # Header ## Subheader ### Subsubheader </pre> ] --- .pull-left[ ## Output 1. Ordered lists 1. Are real easy 1. Even with sublists 1. Or when lazy with numbering * Unordered lists * Are also real easy + Also even with sublists [URLs are trivial](http://www.uw.edu)  ] .pull-right[ ## Syntax <div style="width:400px;overflow:auto"> <pre> 1. Ordered lists 1. Are real easy 1. Even with sublists 1. Or when lazy with numbering * Unordered lists * Are also real easy + Also even with sublists [URLs are trivial](http://www.uw.edu)  </pre> </div> ] --- .pull-left[ ## Output You can put some math `\(y= \left( \frac{2}{3} \right)^2\)` right up in there. `$$\frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}_n$$` Or a sentence with `code-looking font`. Or a block of code: ``` y <- 1:5 z <- y^2 ``` ] .pull-right[ ## Syntax <div style="width:400px;overflow:auto"> <pre> You can put some math $y= \left(\frac{2}{3} \right)^2$ right up in there `$$\frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}_n$$` Or a sentence with `code-looking font`. 
Or a block of code: ``` y <- 1:5 z <- y^2 ``` </pre> </div> ] --- ## Helpful Links * [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/) * [R Markdown Cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf) --- ## R Code within R Markdown As you saw in Short Lab 1, we can run R code within R Markdown. To do so, enclose your code in a chunk as follows. ```{r, eval = TRUE, echo = TRUE} # Your code goes here! ``` You can click the green triangle in the corner to evaluate that code chunk and preview the results without compiling the entire document. --- ## Useful Code Chunk Parameters Parameters go inside the opening braces `{r}` and are separated by commas. Here are some you might find useful (check out the guide links above for more): * `echo=FALSE`: Hide R code but keep results * `eval=FALSE`: Do not execute the R code * `include=FALSE`: Runs the code but hides both the code and all output (useful to load packages at the beginning of your document) * `cache=TRUE`: Stores the results of the chunk, and only re-runs if the chunk is changed. Useful for files that take a while to compile * `fig.height=5, fig.width=5`: modify the dimensions of any plots that are generated in the chunk (units are in inches) --- ## In-Line R Code You can also include and execute R code directly in the text of your .Rmd! For example, say we define a variable ```r x <- 7 ``` If I want to reference this variable in text, I can do so directly by wrapping it in backticks and starting with `r`. So if I type: The variable I want to reference is `r x`. what will appear is: The variable I want to reference is 7. --- ## In-Line R Code * This allows you to easily see where your values came from! * This prevents any typos in translating coding results to text! * This allows you to modify your analysis without needing to copy and paste updated results into your text! 
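---

## In-Line R Code

As a minimal sketch of this workflow (the `scores` data and the `avg_score` name are made up for illustration), you might compute a value in a code chunk and then reference it in your text:

```r
# Hypothetical data, for illustration only
scores <- c(82, 91, 77)

# Compute the value once, in code
avg_score <- round(mean(scores), 1)

# In the text of your .Rmd, you could then write:
#   The average score was `r avg_score`.
```

If `scores` ever changes, the sentence updates itself the next time you knit.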
--- layout: false class: inverse .middler[.huge[Part 3: Data Types]] --- layout: true # Vectors --- * A **vector** is a set of values of the same type * We create vectors using the function `c()` ```r c(16, 3, 0, 7, -2) ``` ``` ## [1] 16 3 0 7 -2 ``` * We can shorthand vectors counting up (or down) using `:` ```r 1:5 ``` ``` ## [1] 1 2 3 4 5 ``` --- * We can also generate vectors using functions such as `rep()` and `seq()` ```r # Sequence from 1 to 20, incrementing by 5 seq(1, 20, by = 5) ``` ``` ## [1] 1 6 11 16 ``` ```r # Repeat each element of a vector 3 times each rep(c(1, 2), each = 3) ``` ``` ## [1] 1 1 1 2 2 2 ``` ```r # Repeat an entire vector 3 times rep(c(1, 2), 3) ``` ``` ## [1] 1 2 1 2 1 2 ``` --- * We index vectors using `[index]` after the vector name ```r x <- 1:5 x[3] ``` ``` ## [1] 3 ``` * If we use a negative index, we return the vector with that element removed ```r x[-4] ``` ``` ## [1] 1 2 3 5 ``` --- ## Vector Arithmetic **Vectorization**, or applying functions across vectors/arrays, is one of R's most powerful capabilities ```r y <- -5:-1 y ``` ``` ## [1] -5 -4 -3 -2 -1 ``` ```r x + y ``` ``` ## [1] -4 -2 0 2 4 ``` ```r x * y ``` ``` ## [1] -5 -8 -9 -8 -5 ``` --- Be careful! R **recycles**, repeating elements of shorter vectors to match longer vectors. This is incredibly useful when done on purpose, but can also easily lead to hard-to-catch bugs in your code! ```r 2 * x ``` ``` ## [1] 2 4 6 8 10 ``` ```r c(1, -1) * x ``` ``` ## Warning in c(1, -1) * x: longer object length is not a multiple of shorter ## object length ``` ``` ## [1] 1 -2 3 -4 5 ``` ```r c(1, -1) + x ``` ``` ## Warning in c(1, -1) + x: longer object length is not a multiple of shorter ## object length ``` ``` ## [1] 2 1 4 3 6 ``` --- We can apply many functions component-wise to vectors, including comparison operators. 
```r x >= 3 ``` ``` ## [1] FALSE FALSE TRUE TRUE TRUE ``` ```r y < -2 ``` ``` ## [1] TRUE TRUE TRUE FALSE FALSE ``` ```r (x >= 3) & (y < -2) ``` ``` ## [1] FALSE FALSE TRUE FALSE FALSE ``` ```r x == c(1, 3, 2, 4, 5) ``` ``` ## [1] TRUE FALSE FALSE TRUE TRUE ``` --- ## Boolean Vectors In code, entries that are `TRUE` or `FALSE` are called **booleans** (R calls this type `logical`). These are incredibly important, because they can be used to give your computer conditions. What will the following code do? ```r x[x > 3] <- 3 x ``` -- ``` ## [1] 1 2 3 3 3 ``` --- ## Boolean Vectors We can also do basic arithmetic with booleans. `TRUE` is encoded as `1` and `FALSE` is encoded as `0`. ```r # First reset x x <- 1:5 sum(x >= 3) ``` -- ``` ## [1] 3 ``` -- ```r mean(x >= 3) ``` -- ``` ## [1] 0.6 ``` -- What is this last quantity telling us? -- By taking the mean, we are looking at the **proportion** of our vector that is `TRUE`! --- We can also get more complicated with our indexing. ```r # Return the second and third elements of y y[c(2, 3)] ``` ``` ## [1] -4 -3 ``` ```r # Return the values of x greater than or equal to 3 x[x >= 3] ``` ``` ## [1] 3 4 5 ``` ```r # Values of x at the positions where y is less than -2 x[y < -2] ``` ``` ## [1] 1 2 3 ``` ```r # which() returns the index of entries that are TRUE which(y < -2) ``` ``` ## [1] 1 2 3 ``` --- We can compare entire vectors using `identical()` ```r identical(x, -rev(y)) ``` ``` ## [1] TRUE ``` What do you think the function `rev()` is doing in the code above? *Hint:* Use `?rev` to read the help files for the function --- ## Vector Data Types Note that vectors can only have one type of data. So we can do ```r c(1, 2, 3) ``` ``` ## [1] 1 2 3 ``` ```r c("a", "b", "c") ``` ``` ## [1] "a" "b" "c" ``` but when we try ```r c(1, "b", 3) ``` ``` ## [1] "1" "b" "3" ``` R will force (coerce) the entries of the vector to be of the same type! This is a common source of bugs. --- ## Names We can assign names to the entries of our vectors using `names()`. 
This can be useful to label our data. Note that arithmetic doesn't change the names of our elements. ```r my_vec <- c(1, 2, 3) names(my_vec) <- c("a", "b", "c") my_vec ``` ``` ## a b c ## 1 2 3 ``` ```r my_vec + 1 ``` ``` ## a b c ## 2 3 4 ``` We can then access the names as their own vector by calling `names()` again. ```r names(my_vec) ``` ``` ## [1] "a" "b" "c" ``` --- ## Useful functions for vectors * `max()`, `min()`, `mean()`, `median()`, `sum()`, `sd()`, `var()` * `length()` returns the number of elements in the vector * `head()` and `tail()` return the beginning and end of a vector * `sort()` will sort * `summary()` returns the 5-number summary plus the mean * `any()` and `all()` to check conditions on Boolean vectors * `hist()` will draw a crude histogram (we'll learn how to make this nicer later) You will need some of these for Lab 1! If you are unclear about what any of them do, use `?` before the function name to read the documentation. You should get in the habit of checking function documentation a lot! --- layout: false layout: true # Matrices --- * **Matrices** are two-dimensional extensions of vectors: they have **rows** and **columns** * We can create a matrix using the function `matrix()` ```r my_matrix <- matrix(c(x, y), nrow = 2, ncol = 5, byrow = TRUE) my_matrix ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] 1 2 3 4 5 ## [2,] -5 -4 -3 -2 -1 ``` ```r # Note: byrow = FALSE is the default my_matrix2 <- matrix(c(x, y), nrow = 2, ncol = 5) my_matrix2 ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] 1 3 5 -4 -2 ## [2,] 2 4 -5 -3 -1 ``` .center[*Warning:* be careful not to call your matrix `matrix`! Why not?] 
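---

To see how `byrow` changes the fill order, here is a small sketch using a fresh vector (the names `by_row` and `by_col` are only for illustration):

```r
# Fill the same six values by row and by column
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(1:6, nrow = 2, ncol = 3)

# Same dimensions, different arrangement of the entries
dim(by_row)
identical(by_row, by_col)

# Filling a 3 x 2 matrix by column and transposing recovers the by-row fill
identical(by_row, t(matrix(1:6, nrow = 3, ncol = 2)))
```

Printing `by_row` and `by_col` side by side is a quick way to check which fill order you actually got.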
--- We can also generate matrices by column binding (`cbind()`) and row binding (`rbind()`) vectors ```r cbind(x, y) ``` ``` ## x y ## [1,] 1 -5 ## [2,] 2 -4 ## [3,] 3 -3 ## [4,] 4 -2 ## [5,] 5 -1 ``` ```r rbind(x, y) ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## x 1 2 3 4 5 ## y -5 -4 -3 -2 -1 ``` --- ## Indexing and Subsetting Matrices Indexing a matrix is similar to indexing a vector, except we must index both the row and column, in that order. ```r my_matrix ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] 1 2 3 4 5 ## [2,] -5 -4 -3 -2 -1 ``` ```r my_matrix[2, 3] ``` -- ``` ## [1] -3 ``` -- ```r my_matrix[2, c(1, 3, 5)] ``` -- ``` ## [1] -5 -3 -1 ``` --- Also similarly to vectors, we can subset using a negative index. ```r my_matrix ``` ``` ## [,1] [,2] [,3] [,4] [,5] ## [1,] 1 2 3 4 5 ## [2,] -5 -4 -3 -2 -1 ``` ```r my_matrix[-2, -4] ``` ``` ## [1] 1 2 3 5 ``` ```r # Note: Leaving an index blank includes all indices my_matrix[, -c(1, 3, 4, 5)] ``` ``` ## [1] 2 -4 ``` --- ```r my_matrix[, -c(1, 3, 4, 5)] ``` ``` ## [1] 2 -4 ``` ```r is.matrix(my_matrix[, -c(1, 3, 4, 5)]) ``` ``` ## [1] FALSE ``` What happened here? When subsetting a matrix reduces one dimension to length 1, R automatically coerces it into a vector. We can prevent this by including `drop = FALSE`. ```r my_matrix[, -c(1, 3, 4, 5), drop = FALSE] ``` ``` ## [,1] ## [1,] 2 ## [2,] -4 ``` ```r is.matrix(my_matrix[, -c(1, 3, 4, 5), drop = FALSE]) ``` ``` ## [1] TRUE ``` --- ## Filling in a Matrix We can fill in a matrix using indices. This is commonly done in statistical computing. In R, you should always start by initializing an empty matrix of the right size. ```r my_results <- matrix(NA, nrow = 3, ncol = 3) my_results ``` ``` ## [,1] [,2] [,3] ## [1,] NA NA NA ## [2,] NA NA NA ## [3,] NA NA NA ``` --- Then I can replace a single row (or column) using indices as follows. 
```r my_results[2, ] <- c(2, 4, 3) my_results ``` ``` ## [,1] [,2] [,3] ## [1,] NA NA NA ## [2,] 2 4 3 ## [3,] NA NA NA ``` We can also fill in multiple rows (or columns) at once. (Likewise, we can also do subsets of rows/columns, or unique entries). Note that **recycling** applies here. ```r my_results[c(1, 3), ] <- 7 my_results ``` ``` ## [,1] [,2] [,3] ## [1,] 7 7 7 ## [2,] 2 4 3 ## [3,] 7 7 7 ``` --- ## Matrix Entry Types Matrices, like vectors, can only have entries of one type. ```r rbind(c(1, 2, 3), c("a", "b", "c")) ``` ``` ## [,1] [,2] [,3] ## [1,] "1" "2" "3" ## [2,] "a" "b" "c" ``` --- ## Functions on Matrices Let's create 3 matrices for the purposes of demonstrating matrix functions. ```r mat1 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE) mat1 ``` ``` ## [,1] [,2] [,3] ## [1,] 1 2 3 ## [2,] 4 5 6 ``` ```r mat2 <- matrix(1:6, nrow = 3, ncol = 2) mat2 ``` ``` ## [,1] [,2] ## [1,] 1 4 ## [2,] 2 5 ## [3,] 3 6 ``` --- ```r mat3 <- matrix(5:10, nrow = 2, ncol = 3, byrow = TRUE) mat3 ``` ``` ## [,1] [,2] [,3] ## [1,] 5 6 7 ## [2,] 8 9 10 ``` --- ### Matrix Sums `+` ```r mat1 + mat3 ``` ``` ## [,1] [,2] [,3] ## [1,] 6 8 10 ## [2,] 12 14 16 ``` ### Element-wise Matrix Multiplication `*` ```r mat1 * mat3 ``` ``` ## [,1] [,2] [,3] ## [1,] 5 12 21 ## [2,] 32 45 60 ``` --- ### Matrix Multiplication `%*%` ```r mat_square <- mat1 %*% mat2 mat_square ``` ``` ## [,1] [,2] ## [1,] 14 32 ## [2,] 32 77 ``` ### Column Bind Matrices `cbind()` ```r cbind(mat1, mat3) ``` ``` ## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] 1 2 3 5 6 7 ## [2,] 4 5 6 8 9 10 ``` --- ### Transpose `t()` ```r t(mat1) ``` ``` ## [,1] [,2] ## [1,] 1 4 ## [2,] 2 5 ## [3,] 3 6 ``` ### Column Sums `colSums()` ```r colSums(mat1) ``` ``` ## [1] 5 7 9 ``` ### Row Sums `rowSums()` ```r rowSums(mat1) ``` ``` ## [1] 6 15 ``` --- ### Column Means `colMeans()` ```r colMeans(mat1) ``` ``` ## [1] 2.5 3.5 4.5 ``` ### Row Means `rowMeans()` ```r rowMeans(mat1) ``` ``` ## [1] 2 5 ``` ### Dimensions `dim()` ```r dim(mat1) 
``` ``` ## [1] 2 3 ``` --- ### Determinant `det()` ```r det(mat_square) ``` ``` ## [1] 54 ``` ### Matrix Inverse `solve()` ```r solve(mat_square) ``` ``` ## [,1] [,2] ## [1,] 1.4259259 -0.5925926 ## [2,] -0.5925926 0.2592593 ``` ### Matrix Diagonal `diag()` ```r diag(mat_square) ``` ``` ## [1] 14 77 ``` --- ## A note on `diag()` `diag()` can also be used to generate diagonal matrices by supplying a vector ```r diag(c(1, 2, 3)) ``` ``` ## [,1] [,2] [,3] ## [1,] 1 0 0 ## [2,] 0 2 0 ## [3,] 0 0 3 ``` Supplying a single positive integer will produce an identity matrix of that dimension ```r diag(3) ``` ``` ## [,1] [,2] [,3] ## [1,] 1 0 0 ## [2,] 0 1 0 ## [3,] 0 0 1 ``` --- ## Names We can assign names to the rows and columns, using `rownames()` and `colnames()`, respectively. Similarly to `names()` for vectors, we then access them by calling the function again. ```r colnames(mat1) <- c("var1", "var2", "var3") rownames(mat1) <- c("sample1", "sample2") mat1 ``` ``` ## var1 var2 var3 ## sample1 1 2 3 ## sample2 4 5 6 ``` ```r mat1 * 2 ``` ``` ## var1 var2 var3 ## sample1 2 4 6 ## sample2 8 10 12 ``` --- ## Names We can assign names to the rows and columns, using `rownames()` and `colnames()`, respectively. Similarly to `names()` for vectors, we then access them by calling the function again. ```r rownames(mat1) ``` ``` ## [1] "sample1" "sample2" ``` ```r colnames(mat1) ``` ``` ## [1] "var1" "var2" "var3" ``` --- ## Tables in R Markdown It is easy to go from matrices to tables using R Markdown. There are several methods (check the cheatsheet link and Google for alternatives). I will present one easy method here, but what you use is up to you! 
```r # We need to load the knitr and kableExtra packages library(knitr) library(kableExtra) ``` ```r my_tab <- data.frame(mat1) kable_styling(kable(my_tab)) ``` <table class="table" style="margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> var1 </th> <th style="text-align:right;"> var2 </th> <th style="text-align:right;"> var3 </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> sample1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3 </td> </tr> <tr> <td style="text-align:left;"> sample2 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 5 </td> <td style="text-align:right;"> 6 </td> </tr> </tbody> </table> What happened with the row and column names? --- layout: false class: inverse .middler[.huge[Part 4: R Coding Style Guide]] --- layout: true # R Coding Style Guide --- ## Who are you to tell me how to type? We will be using a mix of the [Tidyverse Style Guide](https://style.tidyverse.org/) by Hadley Wickham and the [Google Style Guide](https://google.github.io/styleguide/Rguide.html). Please see the links for details, but I will summarize some main points here and throughout the class as we learn more functionality, such as functions and packages. You will be graded on following good code style! --- ## Object Names Use either underscores (`_`) or big camel case (`BigCamelCase`) to separate words within an object name. Do not use dots (`.`) to separate words in object or function names! ```r # Good day_one day_1 DayOne # Bad dayone ``` --- ## Object Names Names should be concise, meaningful, and (generally) nouns. ```r # Good day_one # Bad first_day_of_the_month djm1 ``` --- ## Object Names It is *very* important that object names do not overwrite common functions! 
```r # Very extra super bad c <- 7 t <- 23 T <- FALSE mean <- "something" ``` Note: `T` and `F` are R shorthand for `TRUE` and `FALSE`, respectively. In general, spell them out to be as clear as possible. --- ## Spacing Put a space after every comma, just like in English writing. ```r # Good x[, 1] # Bad x[,1] x[ ,1] x[ , 1] ``` Do not put spaces inside or outside parentheses for regular function calls. ```r # Good mean(x, na.rm = TRUE) # Bad mean (x, na.rm = TRUE) mean( x, na.rm = TRUE ) ``` --- ## Spacing with Operators Most of the time when you are doing math, conditionals, logicals, or assignment, your operators should be surrounded by spaces (e.g., `==`, `+`, `-`, `<-`). ```r # Good height <- (feet * 12) + inches mean(x, na.rm = TRUE) # Bad height<-feet*12+inches mean(x, na.rm=TRUE) ``` There are some exceptions we will learn more about later, such as the power symbol `^`. See the [Tidyverse Style Guide](https://style.tidyverse.org/) for more details! --- ## Extra Spacing Adding extra spaces is OK if it improves alignment of `=` or `<-`. ```r # Good list( total = a + b + c, mean  = (a + b + c) / n ) # Also fine list( total = a + b + c, mean = (a + b + c) / n ) ``` --- ## Long Lines of Code Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. If a function call is too long to fit on a single line, use one line each for the function name, each argument, and the closing `)`. This makes the code easier to read and to change later. ```r # Good do_something_very_complicated( something = "that", requires = many, arguments = "some of which may be long" ) # Bad do_something_very_complicated("that", requires, many, arguments, "some of which may be long" ) ``` *Tip! Try RStudio > Preferences > Code > Display > Show Margin with Margin column 80 to give yourself a visual cue!* --- ## Assignment We use `<-` instead of `=` for assignment. 
This is moderately controversial if you find yourself in the right (wrong?) communities. ```r # Good x <- 5 # Bad x = 5 ``` --- ## Semicolons In R, semicolons (`;`) can be used to execute multiple pieces of R code on a single line. In general, this is bad practice and should be avoided. Also, you never need to end lines of code with semicolons! ```r # Bad a <- 2; b <- 3 # Also bad a <- 2; b <- 3; # Good a <- 2 b <- 3 ``` --- ## Quotes and Strings Use `"`, not `'`, for quoting text. The only exception is when the text already contains double quotes and no single quotes. ```r # Bad 'Text' 'Text with "double" and \'single\' quotes' # Good "Text" 'Text with "quotes"' '<a href="http://style.tidyverse.org">A link</a>' ``` --- Phew! All done for now. Follow these rules and your code will be looking .middler[]