Course Syllabus

Reproducibility and open scientific practices are increasingly demanded of scientists and researchers. Training on how to apply these practices in data analysis is still limited and has not kept up with demand. This course is aimed at researchers conducting quantitative analyses (ranging from lab-based research to epidemiology). By the end of the course, students will have:

An understanding of why an open and reproducible data workflow is important.
Practical experience in setting up and carrying out an open and reproducible data analysis workflow.
Knowledge of how to continue learning methods and applications in this field.

Students will develop proficiency in the R statistical computing language and improve their data and code literacy. Throughout the course we will focus on a general quantitative analytical workflow, using R and other modern tools. The course places particular emphasis on research in diabetes and metabolism; it will be taught by instructors working in this field and will use relevant examples where possible. It will not teach statistical techniques, as these are already covered in university curricula.

Prerequisites and installation instructions

No experience in data analysis or programming is assumed or required. However, there are a few prerequisites to complete before attending the workshop:

Install the latest version of R
Install the latest version of RStudio
Install the packages listed in the Course Materials
Install Git
Read or scan through Chapter 1 of the online book “R for Data Science”
Read and abide by the Code of Conduct

Instructors and helpers

Lead instructor and organizer:
- Luke Johnston
Other instructors: TBD
Helpers:
- Clemens Wittenbecher

Recommended resources

R for Data Science: Excellent open and online resource for using R for data analysis and data science.
Fundamentals of Data Visualization: Excellent online resource for using ggplot2 and R graphics.
RStudio cheat sheets: Great quick reference.

Course Schedule

The workshop is structured as a series of participatory live-coding sessions (instructor and learner coding together) interspersed with hands-on exercises, using either a practice dataset or the participants’ own datasets. Some short lectures will be given throughout.

Date and time	Session topic
Tuesday, May 21
9:00-9:30	Introduction to the workshop, to reproducibility, and to open science
9:30-12:30	Project management and best practices (with coffee break)
12:30-13:30	Lunch (not provided)
13:30-17:00	Data management, wrangling, and best practices (with coffee break)
17:00-17:15	End of day remarks and short survey
Wednesday, May 22
9:00-9:30	Review of the previous day’s topics
9:30-12:30	Version control and collaborative practices (with coffee break)
12:30-13:30	Lunch (not provided)
13:30-16:30	Data visualization and best practices (with coffee break)
16:30-16:45	End of day remarks and short survey
Thursday, May 23
9:00-9:30	Review of the previous day’s topics
9:30-12:30	Creating reproducible documents (with coffee break)
12:30-13:30	Lunch (not provided)
13:30-16:30	Efficiency in data analysis and best practices (with coffee break)
16:30-16:45	Concluding remarks and short survey

Reproducible Quantitative Analyses and Workflows using R
