If you collaborated with anyone, you must include “Collaborated with: FIRSTNAME LASTNAME” at the top of your lab!

For this lab, note that there are tidyverse methods to perform cross-validation in R (see the rsample package). However, your goal is to understand and be able to implement the algorithm “by hand”, meaning that automated procedures from the rsample package, or similar packages, will not be accepted.

To begin, load in the popular penguins data set from the package palmerpenguins.

library(palmerpenguins)
# Load the penguins data frame into the workspace
data("penguins", package = "palmerpenguins")

Part 1. k-Nearest Neighbors Cross-Validation (10 points)

Our goal here is to predict output class species using covariates bill_length_mm, bill_depth_mm, flipper_length_mm, and body_mass_g. All your code should be within a function my_knn_cv.

Input:

Please note the distinction between k_nn and k_cv!
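
To make the distinction concrete, here is a minimal sketch rather than a reference solution: k_cv controls how the rows are split into folds, while k_nn is passed to the classifier as the number of neighbors. It uses synthetic data and knn() from the class package; the names toy, fold, and fold_err, the value k_nn = 3, and the choice to hold out fold 1 are illustrative assumptions. Note that knn() does not accept missing values, so rows containing NA need to be handled before this step.

```r
library(class)  # provides knn()

set.seed(302)
# Synthetic stand-in data (hypothetical): two numeric covariates, two classes
n <- 60
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$cl <- factor(ifelse(toy$x1 + toy$x2 > 0, "a", "b"))

k_cv <- 5  # number of folds used for cross-validation
k_nn <- 3  # number of neighbors used by the classifier

# Randomly assign each row to one of k_cv folds of (nearly) equal size
fold <- sample(rep(1:k_cv, length.out = n))

# Hold out fold 1: train on the other folds, predict the held-out rows
is_test <- fold == 1
pred <- knn(train = toy[!is_test, c("x1", "x2")],
            test  = toy[is_test, c("x1", "x2")],
            cl    = toy$cl[!is_test],
            k     = k_nn)

# Misclassification rate on the held-out fold
fold_err <- mean(pred != toy$cl[is_test])
```

Repeating the hold-out step for each fold in 1:k_cv and averaging the fold errors gives the cross-validation misclassification error.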

Output: a list with objects

You will need to include the following steps:

Submission: To prove your function works, apply it to the penguins data. Predict output class species using covariates bill_length_mm, bill_depth_mm, flipper_length_mm, and body_mass_g. Use 5-fold cross-validation (k_cv = 5). Use a table to show the cv_err values for 1-nearest neighbor and 5-nearest neighbors (k_nn = 1 and k_nn = 5). Comment on which value of k_nn had the lower CV misclassification error and which had the lower training set error (to get the training set error, compare your output class to the true class, penguins$species).
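
Training set error here means predicting on the same rows the classifier was trained on. A minimal sketch of that computation and of the table layout, on synthetic data with knn() from the class package; the helper name train_err_sketch, the toy data, and the err_table layout are illustrative assumptions, not part of the assignment:

```r
library(class)  # provides knn()

set.seed(302)
# Synthetic stand-in data (hypothetical): two numeric covariates, two classes
n <- 60
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$cl <- factor(ifelse(toy$x1 + toy$x2 > 0, "a", "b"))

# Training error: use the full data as both the training and the test set
train_err_sketch <- function(k_nn) {
  pred <- knn(train = toy[, c("x1", "x2")],
              test  = toy[, c("x1", "x2")],
              cl    = toy$cl,
              k     = k_nn)
  mean(pred != toy$cl)
}

# One row per value of k_nn; a cv_err column from your own function
# could be added alongside train_err
err_table <- data.frame(k_nn = c(1, 5),
                        train_err = sapply(c(1, 5), train_err_sketch))
print(err_table)
```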

Part 2. Random Forest Cross-Validation (10 points)

Now, we will predict output body_mass_g using covariates bill_length_mm, bill_depth_mm, and flipper_length_mm. All your code should be within a function my_rf_cv.

Input:

Output:

Your code will look very similar to Part 1! You will need the following steps:

Submission: To prove your function works, apply it to the penguins data. Predict body_mass_g using covariates bill_length_mm, bill_depth_mm, and flipper_length_mm. Run your function with 5-fold cross-validation (k = 5) and report the CV MSE.
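
The per-fold computation behind the CV MSE can be sketched as follows. This is a hedged illustration, not the expected solution: it assumes the randomForest package is installed, and the helper name fold_mse_sketch and the value ntree = 100 are illustrative choices.

```r
# Hypothetical helper: fit a random forest on the training rows and
# return the mean squared error on the held-out rows.
# Assumes the randomForest package is installed; ntree = 100 is an
# illustrative choice.
fold_mse_sketch <- function(train, test) {
  model <- randomForest::randomForest(
    body_mass_g ~ bill_length_mm + bill_depth_mm + flipper_length_mm,
    data = train, ntree = 100
  )
  # Predict body mass for the held-out rows
  pred <- predict(model, newdata = test)
  # Average squared difference between predicted and observed values
  mean((pred - test$body_mass_g)^2)
}
```

Calling this once per fold, with test set to the rows in that fold and train to the remaining rows, and then averaging the k results gives the CV MSE. Rows with missing values would need to be removed first, since randomForest() stops on NA by default.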