For this portion of Project 3, you are being asked to develop a well-documented, well-tested, and well-explained R package. Follow the instructions in Lecture Slides 9 to set up the skeleton of your package. This R package should be adjusted to include functions we’ve written throughout the class:
my_t.test
my_lm
my_knn_cv
my_rf_cv
Your package should include a detailed and thorough vignette, demonstrating use of all of these functions using the gapminder data from the gapminder package. However, you must add and document the gapminder data to your own package and export it as the object my_gapminder (with proper credit in the documentation!).
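One common way to bundle a dataset with a package is sketched below. It assumes the gapminder and usethis packages are installed and that the first two lines are run once from the package root. The file names data-raw/my_gapminder.R and R/data.R are conventions, not requirements of this assignment, and the credit line is a placeholder to adapt.

```r
# data-raw/my_gapminder.R (assumed file name): run once from the package root.
# Copies the data out of the gapminder package and stores it in data/.
my_gapminder <- gapminder::gapminder
usethis::use_data(my_gapminder, overwrite = TRUE)

# R/data.R (assumed file name): roxygen2 documentation for the dataset.
# The quoted name on the last line tells roxygen2 which object is documented.

#' Gapminder data
#'
#' Excerpt of the Gapminder data on life expectancy, GDP per capita, and
#' population by country, copied unchanged from the gapminder package.
#'
#' @format A data frame with 1704 rows and 6 variables:
#'   country, continent, year, lifeExp, pop, gdpPercap.
#' @source The gapminder R package by Jennifer Bryan (placeholder credit line;
#'   check that package's documentation for the citation it requests).
#' @keywords datasets
"my_gapminder"
```

With LazyData: true in the DESCRIPTION file, my_gapminder is available to users as soon as they call library() on your package.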
Specifically, the vignette should have 5 parts:
1. An introduction.
  - Show how to install the package, in a code chunk that is not evaluated (eval = FALSE).
  - Include a call to library() for your package.
2. A tutorial for my_t.test.
  - Use the lifeExp data from my_gapminder.
3. A tutorial for my_lm.
  - Fit a regression using lifeExp as your response variable and gdpPercap and continent as explanatory variables.
  - Interpret the gdpPercap coefficient.
  - Write the hypothesis test associated with the gdpPercap coefficient.
  - Interpret the results of the gdpPercap hypothesis test using a p-value cut-off of \(\alpha = 0.05\).
  - Use ggplot2 to plot the Actual vs. Fitted values.
4. A tutorial for my_knn_cv using my_penguins.
  - Predict species using covariates bill_length_mm, bill_depth_mm, flipper_length_mm, and body_mass_g.
  - Use 5-fold cross-validation (k_cv = 5).
  - Iterate over k_nn \(= 1,\ldots, 10\):
    - For each value of k_nn, record the training misclassification rate and the CV misclassification rate (output from your function).
5. A tutorial for my_rf_cv.
  - Predict body_mass_g using covariates bill_length_mm, bill_depth_mm, and flipper_length_mm.
  - Iterate over k in c(2, 5, 10):
    - For each value of k, run your function \(30\) times.
  - Use ggplot2 with 3 boxplots to display these data in an informative way. There should be a boxplot associated with each value of k, representing \(30\) simulations each.

- Your README.md file should include badges for R-CMD-check automated testing and codecov code coverage. To add a code coverage badge, see the instructions on codecov.io after you link your repository.
- Your README.md should include an Installation section with installation instructions and a Use section demonstrating how to view the package vignettes. See my package for an example.
- Your package should pass devtools::check().
- Every function should list inference / prediction as @keywords, depending on what the function is primarily used for.
- Every function should include @examples
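The simulation in part 5 of the vignette could be organized roughly as follows. This is only a sketch: it assumes my_rf_cv(k) takes the number of folds and returns a single numeric CV error, which may not match your own function. The stand-in defined at the top returns made-up numbers so the code runs by itself, and the seed and column names are arbitrary choices.

```r
library(ggplot2)

# Stand-in so this sketch runs on its own. It returns made-up numbers and is
# NOT a random forest; in the vignette, load your package and drop this block.
# Assumed interface: my_rf_cv(k) returns one numeric CV error.
if (!exists("my_rf_cv")) {
  my_rf_cv <- function(k) mean(rnorm(k, mean = 100, sd = 10))
}

set.seed(302)  # arbitrary seed so the simulation is reproducible
k_values <- c(2, 5, 10)
n_sims <- 30

# One row per simulation: 3 values of k x 30 runs = 90 rows.
sim_results <- do.call(rbind, lapply(k_values, function(k) {
  data.frame(k = k, cv_error = replicate(n_sims, my_rf_cv(k)))
}))

# One boxplot per value of k.
sim_plot <- ggplot(sim_results, aes(x = factor(k), y = cv_error)) +
  geom_boxplot() +
  labs(x = "Number of folds (k)", y = "CV estimated error") +
  theme_bw()
sim_plot
```

Keeping the results in long format (one row per run) makes the boxplots a single geom_boxplot() call. The same data frame can be summarized by k afterwards.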
For this portion of Project 3, you are being asked to set up a GitHub repository demonstrating your ability to set up a systematic data analysis project workflow. For this part, we are pretending we don’t have a package and using code and analyses you have already generated for Part 1.
Your analysis should be contained on a GitHub repository and include:
- A .Rproj file with the name of the project.
- A Data subfolder with the raw, unprocessed data.
  - Within Data, save the my_gapminder and my_penguins data as raw .csv files.
- A Code subfolder with code to be loaded by your analysis files.
  - Include my_rf_cv.R from your package in Part 1. You can include it exactly as it appears in your package, including documentation. Good roxygen2 style documentation is not limited to packages!
- An Analysis subfolder.
  - Include a .Rmd file. This file can, for the most part, be a copy of part 5 from your package vignette. However, this R Markdown document must
    - load data directly from the Data subfolder,
    - use source() to source code directly from the Code subfolder (your .Rmd should not include code generating the function my_rf_cv; it should load that function from your script!),
    - use ggsave() to save all your figures within your analysis scripts (remember, your relative path from files in Analysis will look like "../Output/Figures"),
    - use saveRDS() and write_csv() to save your table of summary statistics and your simulation results, respectively (see the Results description).
- An Output subfolder with:
  - a Figures sub-subfolder with all the figures you generated in Analysis,
  - a Results sub-subfolder that contains (a) your table of 8 summary statistics saved as a .rds file and (b) a .csv with your 90 simulated results, with 3 columns (one for each value of \(k\)) and 30 rows (one for each simulation).
- A .gitignore file that excludes .Rproj.user and .Rhistory.

When the .Rmd in your Analysis folder is run, it should re-load the Data and Code files and re-generate all the results in Output. If your results in Output aren’t systematically re-generated when you run your Analysis, something in your pipeline is broken!
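The code chunks of the analysis .Rmd might follow the outline below. Treat it as a sketch under several assumptions: the file names my_penguins.csv, rf_cv_boxplots.png, rf_cv_summary.rds, and rf_cv_simulations.csv are invented for illustration, my_rf_cv(k) is assumed to return a single numeric CV error, and how the function receives the data depends on your own implementation. The summary table here holds only a mean and standard deviation per value of k as an illustration; use whichever statistics the assignment asks for.

```r
library(readr)
library(ggplot2)

# Load raw data and code by relative path (this file lives in Analysis/).
my_penguins <- read_csv("../Data/my_penguins.csv")  # assumed file name
source("../Code/my_rf_cv.R")

# Assumed interface: my_rf_cv(k) returns one numeric CV error.
# Result: 30 rows (simulations) by 3 columns (values of k).
set.seed(302)  # arbitrary seed for reproducibility
sim_results <- data.frame(
  k_2  = replicate(30, my_rf_cv(2)),
  k_5  = replicate(30, my_rf_cv(5)),
  k_10 = replicate(30, my_rf_cv(10))
)

# Illustrative summary table: one row per value of k.
sim_summary <- data.frame(
  k = c(2, 5, 10),
  mean_cv = sapply(sim_results, mean),
  sd_cv = sapply(sim_results, sd)
)

# stack() reshapes the three columns into long format (values, ind).
sim_plot <- ggplot(stack(sim_results), aes(x = ind, y = values)) +
  geom_boxplot() +
  labs(x = "Value of k", y = "CV estimated error")

# Create the output folders if needed, then save everything.
dir.create("../Output/Figures", recursive = TRUE, showWarnings = FALSE)
dir.create("../Output/Results", recursive = TRUE, showWarnings = FALSE)
ggsave("rf_cv_boxplots.png", plot = sim_plot, path = "../Output/Figures")
saveRDS(sim_summary, "../Output/Results/rf_cv_summary.rds")
write_csv(sim_results, "../Output/Results/rf_cv_simulations.csv")
```

Because knitr evaluates chunks from the directory containing the .Rmd by default, the "../" paths resolve relative to Analysis, and re-knitting the file overwrites everything in Output.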