class: center, middle, inverse, title-slide

# Review of the third day’s material

---
layout: true

<div class="my-footer">
<span>
<img src="../img/au_logo_black.png" alt="Aarhus University" width="140">
</span>
</div>

---
class: center, middle

# Reproducible documents

---

## Markdown syntax

.pull-left[
```markdown
---
title: "Document title"
author: Your Name
---

# Header 1
## Header 2
### Header 3

Text **bold**, *italics*

- list 1
- list 2
```
]

--

.pull-right[
```markdown
1. number 1
2. number 2

Footnote[^1]

[^1]: Footnote content

![image caption](path/to/image.png)

[Link](https://google.com)

|       | Column 1 | Column 2 |
|:------|---------:|---------:|
| Row 1 | Cell     | Cell     |
| Row 2 | Cell     | Cell     |
```
]

---

## R Markdown code chunks

````markdown
---
title: "My report"
author: "Me!"
bibliography: my_references.bib
output:
  html_document:
    theme: sandstone
---

Cite: [@Hoejsgaard2006a]

Code chunk.

```{r chunk-label, chunk.options}
# R Code here
plot(iris)
library(knitr)
# Table
kable(iris)
```
````

---
class: center, middle

# Efficient coding

---

## read_csv repeats twice, wrangling four times

```r
*nhanes_2009 <- read_csv(here::here("data/nhanes-2009_10.csv"))
nhanes_2009 %>%
* mutate(ProblemBMIs = !between(BMI, 18.5, 40)) %>%
* filter(!is.na(ProblemBMIs)) %>%
* select(Age, Poverty, Pulse, BPSysAve, BPDiaAve, TotChol,
*        SleepHrsNight, PhysActiveDays, ProblemBMIs) %>%
* gather(Measurement, Value, -ProblemBMIs) %>%
* na.omit() %>%
  ggplot(aes(y = Value, x = Measurement, colour = ProblemBMIs)) +
    geom_jitter(position = position_dodge(width = 0.6)) +
    scale_color_viridis_d(end = 0.8) +
    labs(y = "", x = "") +
    theme_minimal() +
    theme(legend.position = c(0.85, 0.85)) +
    coord_flip()
```

---

## jitter repeats twice

```r
nhanes_2009 <- read_csv(here::here("data/nhanes-2009_10.csv"))
nhanes_2009 %>%
  mutate(ProblemBMIs = !between(BMI, 18.5, 40)) %>%
  filter(!is.na(ProblemBMIs)) %>%
  select(Age, Poverty, Pulse, BPSysAve, BPDiaAve, TotChol,
         SleepHrsNight, PhysActiveDays, ProblemBMIs) %>%
  gather(Measurement, Value, -ProblemBMIs) %>%
  na.omit() %>%
* ggplot(aes(y = Value, x = Measurement, colour = ProblemBMIs)) +
*   geom_jitter(position = position_dodge(width = 0.6)) +
*   scale_color_viridis_d(end = 0.8) +
*   labs(y = "", x = "") +
*   theme_minimal() +
*   theme(legend.position = c(0.85, 0.85)) +
*   coord_flip()
```

---

## Move code into functions

```r
*read_mutate_gather <- function(.file_path) {
* .file_path %>%
    read_csv() %>%
    mutate(ProblemBMIs = !between(BMI, 18.5, 40)) %>%
    filter(!is.na(ProblemBMIs)) %>%
    select(Age, Poverty, Pulse, BPSysAve, BPDiaAve, TotChol,
           SleepHrsNight, PhysActiveDays, ProblemBMIs) %>%
    gather(Measurement, Value, -ProblemBMIs) %>%
    na.omit()
*}

*plot_jitter <- function(.dataset) {
* .dataset %>%
    ggplot(aes(y = Value, x = Measurement, colour = ProblemBMIs)) +
      geom_jitter(position = position_dodge(width = 0.6)) +
      scale_color_viridis_d(end = 0.8) +
      labs(y = "", x = "") +
      theme_minimal() +
      theme(legend.position = c(0.85, 0.85)) +
      coord_flip()
*}
```

---

## density repeats twice

```r
nhanes_2009 %>%
  mutate(ProblemBMIs = !between(BMI, 18.5, 40)) %>%
  filter(!is.na(ProblemBMIs)) %>%
  select(Age, Poverty, Pulse, BPSysAve, BPDiaAve, TotChol,
         SleepHrsNight, PhysActiveDays, ProblemBMIs) %>%
  gather(Measurement, Value, -ProblemBMIs) %>%
  na.omit() %>%
* ggplot(aes(x = Value, fill = ProblemBMIs)) +
*   geom_density(alpha = 0.35) +
*   facet_wrap(~Measurement, scales = "free") +
*   scale_fill_viridis_d(end = 0.8) +
*   labs(y = "", x = "") +
*   theme_minimal() +
*   theme(legend.position = c(0.85, 0.15),
*         strip.text = element_text(face = "bold"))
```

---

## Move code into functions

```r
*plot_density <- function(.dataset) {
* .dataset %>%
    ggplot(aes(x = Value, fill = ProblemBMIs)) +
      geom_density(alpha = 0.35) +
      facet_wrap(~Measurement, scales = "free") +
      scale_fill_viridis_d(end = 0.8) +
      labs(y = "", x = "") +
      theme_minimal() +
      theme(legend.position = c(0.85, 0.15),
            strip.text = element_text(face = "bold"))
*}
```

---

## Two dataframes, two figures each
```r
# Start with file paths:
files <- c(here::here("data/nhanes-2009_10.csv"),
           here::here("data/nhanes-2011_12.csv"))
```

--

```r
# Apply wrangling to each data file:
data_list <- files %>%
  map(read_mutate_gather)
```

--

```r
# Apply each figure function to each dataset:
# Plot the jitters
map(data_list, plot_jitter)
# Plot the density
map(data_list, plot_density)
```

---

## Parallel processing

```r
# Start with file paths:
files <- c(here::here("data/nhanes-2009_10.csv"),
           here::here("data/nhanes-2011_12.csv"))
*library(furrr)
*plan(multisession) # background R sessions; plan(multiprocess) is deprecated
# Apply wrangling to each data file:
data_list <- files %>%
* future_map(read_mutate_gather)
```

```r
# Apply each figure function to each dataset:
# Plot the jitters
*future_map(data_list, plot_jitter)
# Plot the density
*future_map(data_list, plot_density)
```
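---

## Sketch: `map()` vs `future_map()`

A minimal, self-contained sketch of the "file paths in, function applied to each" pattern above, runnable without the NHANES data. The two toy CSV files and the `summarise_file()` helper are hypothetical stand-ins for the NHANES files and `read_mutate_gather()`; it assumes purrr, future and furrr are installed.

```r
# Sketch only: assumes purrr, future and furrr are installed.
# The toy files and summarise_file() are hypothetical stand-ins.
library(purrr)
library(furrr)  # also attaches future, which provides plan()

# Two small toy CSV files in a temporary folder
files <- file.path(tempdir(), c("toy_a.csv", "toy_b.csv"))
write.csv(data.frame(BMI = c(17, 22, 41, NA)), files[1], row.names = FALSE)
write.csv(data.frame(BMI = c(25, 30, 45)), files[2], row.names = FALSE)

# Hypothetical helper: one file path in, one small summary data frame out
summarise_file <- function(.file_path) {
  bmi <- read.csv(.file_path)$BMI
  # Same rule as !between(BMI, 18.5, 40): outside 18.5 to 40 is a problem
  problem <- !(bmi >= 18.5 & bmi <= 40)
  data.frame(n = sum(!is.na(problem)),
             n_problem = sum(problem, na.rm = TRUE))
}

# Sequential: one file after another
seq_results <- map(files, summarise_file)

# Parallel: files can be sent to separate background R sessions
plan(multisession, workers = 2)
par_results <- future_map(files, summarise_file)
plan(sequential)  # switch back to sequential processing

all.equal(seq_results, par_results)
```

Only the mapping function changes between the two versions; the helper itself stays the same, which is what makes moving code into functions pay off when switching to parallel processing.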