Final remarks: Data analysis in the era of reproducible and open science

  • Openness
  • Transparency
  • Quality: Reproducibility
  • Collaboration / Team work
  • Communication

Many rapid changes to key infrastructures of science

We are in the middle of an exponential growth curve:

  • Data production
  • Data storage and transfer
  • Computing power
  • Published research
  • Complexity of methods

Good modern example: UK Biobank


But we scientists still think from a "cottage industry" perspective

  • Industrialisation of the research workflow
  • Specialisation in research tasks


Current scientific culture is not prepared for the analytic and computational era


Why? Still very strong barriers to change

  • Tools needed
  • Tradition, culture and common practices
  • Researchers don't yet see value in adopting an open, reproducible workflow
  • Training and reward systems need to be adapted:
    • Publication
    • Academic recognition / careers
    • Research funding mechanisms
  • Law: privacy concerns about sharing data, IP protection, patents, etc.

Open science debates and initiatives don't recognize role of software

E.g. EU H2020 Open Science Mandate only mentions data and publications.


Little to no training in software or programming

Image source: xkcd.


What does it mean for you?

  • Find and collaborate with those familiar with these concepts (online and/or in real life)

  • Cite research that is or tries to be more reproducible

  • Keep the principles of reproducibility in mind, then find the tools

  • Practice reproducible and open science

    • More on this later in session

Recognize importance of code and data: Cite them!

# Example:
citation("dplyr")
##
## To cite package 'dplyr' in publications use:
##
## Hadley Wickham, Romain François, Lionel Henry and Kirill Müller
## (2019). dplyr: A Grammar of Data Manipulation. R package version
## 0.8.1. https://CRAN.R-project.org/package=dplyr
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {dplyr: A Grammar of Data Manipulation},
## author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller},
## year = {2019},
## note = {R package version 0.8.1},
## url = {https://CRAN.R-project.org/package=dplyr},
## }
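
The same idea applies to R itself and can be scripted. The sketch below is one possible approach, not part of the original slides: it uses `citation()` and `toBibtex()` from base R to collect a BibTeX entry for R and write it to a file. The file name `software.bib` and the use of `tempdir()` are assumptions for illustration; in a real project you would point this at your own bibliography file.

```r
# citation() with no arguments returns the citation for R itself,
# so this runs without any additional packages installed.
r_cite <- citation()

# Convert the citation object into lines of BibTeX.
bib <- toBibtex(r_cite)
print(bib)

# Write the entry to a .bib file.
# The file name and location are hypothetical; adjust them to your project.
bib_file <- file.path(tempdir(), "software.bib")
writeLines(as.character(bib), bib_file)
```

Passing a package name, as in `citation("dplyr")` above, works the same way for any installed package.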

Comment: True reproducibility is very difficult

  • Requires self-contained virtual environment
  • With exact package versions and operating system used
  • Tools to do this include:
    • Docker virtual containers [1]
    • Continuous integration with Travis [2]

...but this is not the goal, nor should it be

  • Tools to simplify this are being developed
    • Keep an eye out

[1] For more information, see this tutorial.
[2] For easier integration and use of Travis in R, see the travis package.
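
Short of a full container, a lightweight first step toward recording the "exact package versions and operating system used" is to save the session details alongside the analysis. The following is a minimal sketch, not a method from the original slides: it relies on `sessionInfo()` from base R, and the output file name `session-info.txt` and the use of `tempdir()` are assumptions for illustration.

```r
# Capture the R version, operating system, and loaded package versions.
info <- sessionInfo()
print(info)

# Individual fields can be pulled out if needed.
r_version <- info$R.version$version.string  # R version string
os_name <- info$running                     # operating system description

# Save a plain-text record next to the analysis output.
# The file name and location are hypothetical; adjust them to your project.
record_file <- file.path(tempdir(), "session-info.txt")
writeLines(capture.output(print(info)), record_file)
```

This does not recreate the environment the way a container does; it only documents it, so that someone else knows which versions to install.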


This is or will be the future. Be prepared.

(...I hope...)
