Week 5 - Reproducibility#
Today’s learning objectives
Topics for today
Set up a reproducible computing environment
Get feedback on environment reproducibility
What to expect: After going through the assignments, you will have reproduced someone else’s environment.
📆 Seminar schedule#
Welcome!
Reminder of the Code of Conduct
Discuss previous assignments and share progress [10 min]
Introduce challenge and explain assignments [15 min]
Work on assignments [90 min]
🙋 Discussion and questions#
👉 Go to collaborative document.
Assignments for Friday 21st of April#
The goal of this week’s challenges and assignments is to make sure your project is reproducible. The best test is to have someone else reproduce your computational environment.
A reproducible computational environment requires certain things to be in place, including dependency documentation and basic instructions for users. Some setups may also include running tests or providing examples to reproduce. Today we would like to help you identify the specific considerations that apply to your project.
Consider adding the FAIR card below as an issue in your project board for future reference.
Getting started with the essentials#
Tip
Add the FAIR card to your GitHub project board to keep practicing with project management tools.
FAIR card: Dependency management
_Essential_
- [ ] Document your project dependencies
- [ ] Provide instructions for replicating the computational environment
_Recommended_
- [ ] Use a [dependency manager](https://the-turing-way.netlify.app/reproducible-research/renv/renv-package.html)
- [ ] Pin [exact versions](https://github.com/conda/conda-lock) used to generate your environment
_Optional_
- [ ] [Containerized workflow](https://the-turing-way.netlify.app/reproducible-research/renv/renv-containers.html)
Some examples to get inspired#
Python examples using a containerized solution:
TU Delft: SATAY-LL/Transposonmapper
eScience Center: matchms/matchms
C++ reproducible example:
GitHub: MurTree/murtree
Low-end project example, not meant to be maintained but still reproducible:
Assignment 1#
Think about what defines your computational environment:
Hardware requirements (GPU, CPU cores, memory)
Operating system
Third-party (proprietary) software
Programming language
Packages and libraries (and their versions)
Local relative folder structure
Choose the method best suited to your project for capturing your computational environment, and create a dependency file.
How tightly do you need to control the version of the dependencies?
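As one possible starting point for a Python project, the sketch below records the operating system, the interpreter version, and the exact version of every installed package in a single text file. It uses only the standard library (`platform` and `importlib.metadata`). The function names and the output file name `requirements-snapshot.txt` are hypothetical choices made for this illustration, and the approach is an assumption about what a minimal dependency file could look like, not a replacement for a dependency manager.

```python
import platform
from importlib import metadata


def describe_environment():
    """Collect basic facts about the machine and the Python interpreter."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_snapshot(path="requirements-snapshot.txt"):
    """Write the environment facts (as comments) and the pins to `path`.

    The file name is a hypothetical default chosen for this example.
    """
    env = describe_environment()
    lines = [f"# {key}: {value}" for key, value in env.items()]
    lines += pinned_requirements()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines
```

Calling `write_snapshot()` from inside the project's environment produces a file whose comment lines describe the platform and whose remaining lines pin each package exactly. Exact pins answer the "how tightly" question at the strict end; looser constraints are a reasonable choice when exact reproduction is not required.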
Tip
💡 Tips:
While Conda is Python-centric to a degree, it is also well-integrated for use with other languages. For example, the base version of Conda includes the C++ standard library.
Does your conda take a long time to resolve the dependencies? Have a look at mamba!
Matlab toolbox requirements can be found with this function or with the Dependency Analyzer.
For complex dependencies or strict OS requirements, it might be easier to create a snapshot of the entire computing environment including the OS and all dependencies using a container.
Assignment 2#
Get someone to test your project’s reproducibility. In order to achieve this milestone, consider the following questions:
What is your user’s background and skill level?
What is an acceptable amount of time to reproduce your environment?
How do you know the environment is successfully recreated?
Brainstorm and write down the steps and checkpoints you need to consider to achieve the above for your project.
Commit any modifications to your repo during the process, including changes based on the feedback you get from someone who tried to reproduce your environment.
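One way to make "successfully recreated" checkable is to give the person testing your project a small script that compares the installed package versions against the pinned ones. The sketch below assumes a Python project whose dependency file contains `name==version` lines; the function names are hypothetical, and the `installed_version` parameter exists only so the check can be exercised without installing anything.

```python
from importlib import metadata


def parse_pins(lines):
    """Turn 'name==version' lines into a dict, ignoring comments and blanks."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"not an exact pin: {line!r}")
        pins[name.strip()] = version.strip()
    return pins


def check_environment(pins, installed_version=metadata.version):
    """Return a list of human-readable problems; an empty list means success."""
    problems = []
    for name, wanted in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (expected {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems
```

A tester could read the dependency file, pass its lines to `parse_pins`, and treat an empty result from `check_environment` as the checkpoint that the environment matches. This only covers package versions; running the project's tests or examples is still the stronger confirmation.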
Read this if you get stuck
Document the steps to recreate the computational environment in your README.md
Share your installation instructions with a colleague and invite them to install and test your environment.
Get feedback.
Tip
Don’t forget to:
Share progress in our channel
Use your journal to collect notes, links and progress that can be used to showcase progress and celebrate it
Materials#
Reproducible environments
Containers
Matlab