For this portion of Project 3, you are being asked to develop a well-documented, well-tested, and well-explained R package. Follow the instructions in Lecture Slides 9 to set up the skeleton of your package. This R package should be adjusted to include the functions we’ve written throughout the class:
- `my_t.test`
- `my_lm`
- `my_knn_cv`
- `my_rf_cv`

Your package should include a detailed and thorough vignette demonstrating use of all of these functions using the gapminder data from the gapminder package. However, you must add and document the gapminder data within your own package and export it as the object `my_gapminder` (with proper credit in the documentation!).
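For reference, here is a minimal sketch of one way to copy and document the data, assuming you use `usethis` and `roxygen2`; the exact documentation wording is up to you:

```r
## One-off set-up script (not part of the package source): copy the data
## over and store it under data/ via usethis.
my_gapminder <- gapminder::gapminder
usethis::use_data(my_gapminder, overwrite = TRUE)

## In R/data.R: roxygen2 documentation for the data object, with credit
## to the gapminder package as the source.
#' Gapminder data
#'
#' Country-level life expectancy, population, and GDP per capita data,
#'   copied from the gapminder package for use in this package's examples.
#'
#' @format A data frame with columns \code{country}, \code{continent},
#'   \code{year}, \code{lifeExp}, \code{pop}, and \code{gdpPercap}.
#' @source Jennifer Bryan's gapminder package,
#'   \url{https://github.com/jennybc/gapminder}.
"my_gapminder"
```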
Specifically, the vignette should have 5 parts:
1. A brief introduction to the package, including installation instructions (in a code chunk with `eval = FALSE`) and a call to `library()` for your package.
2. A tutorial for `my_t.test` using the `lifeExp` data from `my_gapminder`.
3. A tutorial for `my_lm`:
   - Demonstrate a regression using `lifeExp` as your response variable and `gdpPercap` and `continent` as explanatory variables.
   - Interpret the `gdpPercap` coefficient.
   - Write the hypothesis test associated with the `gdpPercap` coefficient.
   - Interpret the results of the `gdpPercap` hypothesis test using a p-value cut-off of \(\alpha = 0.05\).
   - Use `ggplot2` to plot the Actual vs. Fitted values.
4. A tutorial for `my_knn_cv` using `my_penguins`:
   - Predict output class `species` using covariates `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, and `body_mass_g`.
   - Use 5-fold cross-validation (`k_cv = 5`).
   - Iterate from `k_nn` \(= 1, \ldots, 10\):
     - For each value of `k_nn`, record the training misclassification rate and the CV misclassification rate (output from your function).
5. A tutorial for `my_rf_cv`:
   - Predict `body_mass_g` using covariates `bill_length_mm`, `bill_depth_mm`, and `flipper_length_mm`.
   - Iterate through `k` in `c(2, 5, 10)`:
     - For each value of `k`, run your function \(30\) times.
   - Use `ggplot2` with 3 boxplots to display these data in an informative way. There should be a boxplot associated with each value of `k`, representing \(30\) simulations each.

In addition:

- Your `README.md` file should include badges for R-CMD-check automated testing and codecov code coverage. To add a code coverage badge, see the instructions on codecov.io after you link your repository.
- Your `README.md` should include an Installation section with installation instructions and a Use section demonstrating how to view the package vignettes. See my package for an example.
- Your package should pass `devtools::check()`.
- Your functions should include inference/prediction as `@keywords`, depending on what the function is primarily used for.
- Your function documentation should include `@examples` (see the sketch below).
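As an illustration of the last two requirements, here is a hedged sketch of roxygen2 documentation using `@keywords` and `@examples`; the wording and the function's interface are placeholders, and the body is whatever you wrote earlier in the class:

```r
#' Random forest cross-validation
#'
#' Computes a cross-validation estimate of prediction error using a
#'   random forest model.
#'
#' @param k Number of folds to use in cross-validation.
#'
#' @return Numeric: the cross-validation estimate of the MSE.
#'
#' @keywords prediction
#'
#' @examples
#' my_rf_cv(k = 5)
#'
#' @export
my_rf_cv <- function(k) {
  # function body as written during the class
}
```

Running `devtools::document()` regenerates the `.Rd` files and `NAMESPACE` from these comments before you run `devtools::check()`.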
For this portion of Project 3, you are being asked to set up a GitHub repository demonstrating your ability to organize a systematic data analysis project workflow. For this part, we are pretending we don’t have a package and instead using code and analyses you have already generated for Part 1.

Your analysis should be contained in a GitHub repository and include:
- A `.Rproj` file with the name of the project.
- A `Data` subfolder with the raw, unprocessed data.
  - Within `Data`, save the `my_gapminder` and `my_penguins` data as raw `.csv` files.
- A `Code` subfolder with code to be loaded by your analysis files.
  - Within `Code`, include `my_rf_cv.R` from your package in Part 1. You can include it exactly as it appears in your package, including documentation. Good roxygen2-style documentation is not limited to packages!
- An `Analysis` subfolder.
  - Within `Analysis`, include a `.Rmd` file. This file can, for the most part, be a copy of part 5 from your package vignette. However, this R Markdown document must:
    - load the data from the `Data` subfolder,
    - use `source()` to source code directly from the `Code` subfolder (your `.Rmd` should not include code generating the function `my_rf_cv`; it should load that function from your script!),
    - use `ggsave()` to save all your figures within your analysis scripts (remember, your relative path from files in `Analysis` will look like `"../Output/Figures"`), and
    - use `saveRDS()` and `write_csv()` to save your table of summary statistics and your simulation results, respectively (see `Results` description). A sketch of such a chunk follows this list.
- An `Output` subfolder with:
  - A `Figures` sub-subfolder with all the figures you generated in `Analysis`.
  - A `Results` sub-subfolder that contains (a) your table of 8 summary statistics saved as a `.rds` file and (b) a `.csv` with your 90 simulated results, with 3 columns for each value of \(k\) and 30 rows for each simulation.
- A `.gitignore` file that includes `.Rproj.user` and `.Rhistory`.
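A rough sketch of what a code chunk in the `Analysis` `.Rmd` might contain, assuming `my_rf_cv(k)` takes the number of folds and returns a single CV-estimated MSE; file names and plot details are illustrative:

```r
# Sketch of a chunk inside Analysis/analysis.Rmd; all paths are relative
# to the Analysis/ folder.
library(readr)
library(ggplot2)

# Load the raw data from Data/ and source the function from Code/.
my_penguins <- read_csv("../Data/my_penguins.csv")
source("../Code/my_rf_cv.R")

# 30 runs of my_rf_cv() for each k in c(2, 5, 10): a 30-row data frame
# with one column per value of k (90 simulated results in total).
sim_results <- data.frame(
  k_2  = replicate(30, my_rf_cv(2)),
  k_5  = replicate(30, my_rf_cv(5)),
  k_10 = replicate(30, my_rf_cv(10))
)

# Reshape to long format for plotting, then save the boxplot figure.
sim_long <- data.frame(
  k   = rep(c(2, 5, 10), each = 30),
  mse = c(sim_results$k_2, sim_results$k_5, sim_results$k_10)
)
cv_boxplot <- ggplot(sim_long, aes(x = factor(k), y = mse)) +
  geom_boxplot() +
  labs(x = "Number of folds (k)", y = "CV-estimated MSE")
ggsave("../Output/Figures/cv_mse_boxplot.png", plot = cv_boxplot)

# Save a table of summary statistics (.rds) and the raw simulation
# results (.csv) to Output/Results.
summary_stats <- data.frame(
  k        = c(2, 5, 10),
  mean_mse = colMeans(sim_results),
  sd_mse   = apply(sim_results, 2, sd)
)
saveRDS(summary_stats, "../Output/Results/summary_stats.rds")
write_csv(sim_results, "../Output/Results/simulation_results.csv")
```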
When you knit the `.Rmd` in your `Analysis` folder, it should re-load the `Data` and `Code` files and re-generate all the results in `Output`. If your results in `Output` aren’t systematically re-generated when you run your `Analysis`, something in your pipeline is broken!
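For example, one quick check of the pipeline (assuming your analysis file is named `analysis.Rmd`; by default the chunks are evaluated with the working directory set to `Analysis/`, so the relative paths above resolve correctly):

```r
# Re-run the full analysis from the project root and confirm that
# everything in Output/ is re-generated.
rmarkdown::render("Analysis/analysis.Rmd")
```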