Week 5 - Reproducibility#

Today’s learning objectives

Topics for today

  • Set up a reproducible computing environment

  • Get feedback on environment reproducibility

What to expect: After going through the assignments, you will have reproduced someone else’s environment.

📆 Seminar schedule#

  • Welcome!

  • Reminder of the Code of Conduct

  • Discuss previous assignments and share progress [10 min]

  • Introduce challenge and explain assignments [15 min]

  • Work on assignments [90 min]

🙋 Discussion and questions#

👉 Go to collaborative document.

Assignments for Friday 21st of April#

The goal of this week’s challenges and assignments is to make sure your project is reproducible. The best test is to get someone else to reproduce your computational environment.

A reproducible computational environment requires certain things to be in place, including dependency documentation and basic instructions for users. Some setups may also include running tests or providing examples to reproduce. Today we would like to help you find out and define which specific considerations apply to your project.

Consider adding the FAIR card below as an issue in your project board for future reference.

Getting started with the essentials#

Tip

Add the FAIR card to your GitHub project board to keep practicing with project management tools.

FAIR card: Dependency management

_Essential_
- [ ] Document your project dependencies
- [ ] Provide instructions for replicating the computational environment

_Recommended_
- [ ] Use a [dependency manager](https://the-turing-way.netlify.app/reproducible-research/renv/renv-package.html)
- [ ] Pin [exact versions](https://github.com/conda/conda-lock) used to generate your environment

_Optional_
- [ ] [Containerized workflow](https://the-turing-way.netlify.app/reproducible-research/renv/renv-containers.html)

Some examples to get inspired#

Python example using containerized solution

C++ reproducible example:

Low-end project example, not meant to be maintained but still reproducible:

Assignment 1#

  1. Think about what defines your computational environment:

    • Hardware requirements (GPU, CPU cores, memory)

    • Operating system

    • Third-party (proprietary) software

    • Programming language

    • Packages and libraries (and their versions)

    • Local relative folder structure

  2. Choose the method for capturing your computational environment that best suits your project, and create a dependency file.

    • How tightly do you need to control the version of the dependencies?
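As a starting point for the two steps above, here is a minimal sketch, assuming a Python project, that records a few of the listed properties (operating system, CPU cores, language version, installed packages) and turns the package list into exact version pins. The function names `describe_environment` and `pinned_requirements` are hypothetical and only serve the illustration; tools such as `pip freeze` or `conda env export` do a more complete job.

```python
import json
import os
import platform
from importlib import metadata


def describe_environment():
    """Collect a basic, machine-readable description of the current environment."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "operating_system": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "cpu_cores": os.cpu_count(),
        "python_version": platform.python_version(),
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def pinned_requirements(packages):
    """Turn {name: version} into exact pins such as 'numpy==1.26.4'."""
    return [f"{name}=={version}" for name, version in packages.items()]


if __name__ == "__main__":
    env = describe_environment()
    # Print everything except the (potentially long) package list as JSON.
    summary = {key: value for key, value in env.items() if key != "packages"}
    print(json.dumps(summary, indent=2))
    # Print the packages as exact pins, one per line.
    print("\n".join(pinned_requirements(env["packages"])))
```

Exact pins (`==`) are the tightest form of control. If your project can tolerate a range of versions, looser constraints may be easier for others to install, at the cost of weaker reproducibility.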

Tip

💡 Tips:

  • Although Conda is often associated with Python, it also works well with other languages. For example, the base version of Conda includes the C++ standard library.

  • Does Conda take a long time to resolve your dependencies? Have a look at Mamba!

  • MATLAB toolbox requirements can be found with this function or with the Dependency Analyzer.

  • For complex dependencies or strict OS requirements, it might be easier to create a snapshot of the entire computing environment including the OS and all dependencies using a container.

Assignment 2#

Get someone to test your project’s reproducibility. In order to achieve this milestone, consider the following questions:

  • What is your users’ background and level of experience?

  • How much time is acceptable for reproducing your environment?

  • How do you know the environment is successfully recreated?

  1. Brainstorm and write down the steps and checkpoints you need to consider to achieve the above for your project.

  2. Commit any modifications you make to your repo during the process, including changes based on the feedback you get from the person who tried to reproduce your environment.
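One possible checkpoint for the question "How do you know the environment is successfully recreated?" is an automated comparison between the pinned versions and what is actually installed. The sketch below assumes a Python project whose dependency file contains `name==version` lines; the helper names `parse_pins` and `check_environment` and the demo package name are hypothetical.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines, ignoring blank lines and '#' comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"not an exact pin: {line!r}")
        pins[name.strip()] = version.strip()
    return pins


def check_environment(pins, installed_version=metadata.version):
    """Return a list of human-readable problems; an empty list means success."""
    problems = []
    for name, expected in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {expected})")
            continue
        if found != expected:
            problems.append(f"{name}: found {found}, expected {expected}")
    return problems


if __name__ == "__main__":
    # Hypothetical pins for illustration; in practice, read your own dependency file.
    example = ["# demo pins", "surely-not-a-real-package==1.0"]
    for problem in check_environment(parse_pins(example)):
        print(problem)
```

A check like this only covers package versions. Whether the project actually runs is better confirmed by a small test or example that the person reproducing your environment can execute.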

Tip

Don’t forget to:

  • Share progress in our channel

  • Use your journal to collect notes, links, and milestones that you can use to showcase and celebrate your progress

Materials#

Reproducible environments

Containers

Matlab