Reproducible Quantitative Analyses and Workflows using R

Reproducibility and open scientific practices are increasingly being requested or required of scientists and researchers, but training on these practices has not kept pace. This course intends to help bridge that gap.

Course Syllabus

Reproducibility and open scientific practices are increasingly demanded of scientists and researchers. Training on how to apply these practices in data analysis is still limited and has not kept up with demand. This course is aimed at researchers conducting quantitative analyses (ranging from lab-based research to epidemiology). By the end of the course, students will have:

  1. An understanding of why an open and reproducible data workflow is important.
  2. Practical experience in setting up and carrying out an open and reproducible data analysis workflow.
  3. Knowledge of how to continue learning methods and applications in this field.

Students will develop proficiency in the R statistical computing language and improve their data and code literacy. Throughout the course we will focus on a general quantitative analytical workflow, using R and other modern tools. The course places particular emphasis on research in diabetes and metabolism: it will be taught by instructors working in this field and will use relevant examples where possible. The course will not teach statistical techniques, as these are already covered in university curricula.

Prerequisites and installation instructions

No experience in data analysis or programming is assumed or required. However, there are a few prerequisites to complete before attending the workshop:

  1. Install the latest version of R
  2. Install the latest version of RStudio
  3. Install the packages listed in the Course Materials
  4. Install Git
  5. Read or scan through Chapter 1 of the online book “R for Data Science”
  6. Read and abide by the Code of Conduct

Instructors and helpers

  • Lead instructor and organizer:
  • Other instructors: TBD
  • Helpers:
    • Clemens Wittenbecher

Course Schedule

The workshop is structured as a series of participatory live-coding sessions (instructor and learner coding together) interspersed with hands-on exercises, using either a practice dataset or the participants’ own datasets. Some short lectures will be given throughout.

Date and time Session topic
Tuesday, May 21
9:00-9:30 Introduction to the workshop, to reproducibility, and to open science
9:30-12:30 Project management and best practices (with coffee break)
12:30-13:30 Lunch (not provided)
13:30-17:00 Data management, wrangling, and best practices (with coffee break)
17:00-17:15 End of day remarks and short survey
Wednesday, May 22
9:00-9:30 Review of the previous day’s topics
9:30-12:30 Version control and collaborative practices (with coffee break)
12:30-13:30 Lunch (not provided)
13:30-16:30 Data visualization and best practices (with coffee break)
16:30-16:45 End of day remarks and short survey
Thursday, May 23
9:00-9:30 Review of the previous day’s topics
9:30-12:30 Creating reproducible documents (with coffee break)
12:30-13:30 Lunch (not provided)
13:30-16:30 Efficiency in data analysis and best practices (with coffee break)
16:30-16:45 Concluding remarks and short survey

Contact

Institutions and Sponsors

  • Aarhus University
  • German Institute of Human Nutrition