Comparison of Reproduction Platforms for Continuous Science
The idea of Continuous Integration (CI), i.e. the execution of test and build workflows on every commit, has an analogy in science:
The reproduction of scientific insights becomes "continuous" in the sense that
an update to the data leads to a recalculation of statistical hypotheses, just as
an update to the source code leads to a rebuild of software assets in CI.
This idea can be taken a step further if the scientific data themselves are
created entirely in silico, e.g. by a simulation.
We can then speak of replication (for a differentiation between reproduction/reproducibility and replication, see Barba, 2018).
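To make the analogy concrete, the following sketch shows what a single "continuous reproduction" step might look like: the input data are fingerprinted (so a change can trigger a new run, like a commit does in CI), a statistic is recomputed, and the result is checked against the previously published value. All function and variable names here are illustrative, not taken from any particular platform.

```python
# Hypothetical sketch of one "continuous reproduction" step.
# A changed data fingerprint plays the role of a new commit in CI;
# the recomputed statistic plays the role of the rebuilt software asset.
import hashlib
import json
import statistics


def data_fingerprint(samples):
    """Hash the data so a change in it can trigger a recomputation."""
    return hashlib.sha256(json.dumps(samples).encode()).hexdigest()


def recompute_result(samples):
    """Recalculate the statistic under study; here simply the sample mean."""
    return statistics.mean(samples)


def reproduction_step(samples, published_result, tolerance=1e-9):
    """Return True if the recomputed result still matches the published one."""
    return abs(recompute_result(samples) - published_result) <= tolerance


data = [2.0, 4.0, 6.0]
print(data_fingerprint(data)[:8])    # short id of this data version
print(reproduction_step(data, 4.0))  # prints True: the published mean of 4.0 is reproduced
```

A real RPCS would wrap such a step in a workflow engine and run it automatically on every data update; the sketch only illustrates the core check.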
There are various platforms supporting such continuous science workflows.
These Reproduction Platforms for Continuous Science (RPCS) differ, amongst other things, in the set of features they support, their performance, their license, their installation procedure, and the protocols and standards they support.
The goal of this thesis is to survey all relevant platforms, develop a comparison scheme (such as a capability model), and evaluate them against that scheme.
This master's thesis can also be worked on by a group of motivated bachelor students.
- Know-how of, or motivation to learn:
  - Continuous Integration tools (Jenkins, Bamboo, GitLab CI)
  - Linux system administration (setting up and operating services on Linux machines)
- Interest in issues of reproducibility of scientific findings (a nice read to start is Ioannidis, 2005)
- Good written English and sociability, since you will probably need to contact the developers of several platforms to ask for code, documentation, or support
- Survey of all relevant RPCS (if necessary, define a catalogue of criteria for including/excluding RPCS)
- Test installation of all chosen RPCS
- Development of a comparison scheme between those platforms
- Data collection necessary for the comparison
- Presentation of the collected data
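As a rough illustration of what such a comparison scheme could look like, a simple capability matrix might be modelled as below. The platforms, criteria, and values are placeholders invented for this sketch, not actual evaluation results.

```python
# Illustrative only: a capability matrix as one possible shape for the
# comparison scheme. All platforms, criteria, and values are placeholders.
CRITERIA = ["open_source", "data_triggered_runs", "provenance_tracking"]

capabilities = {
    "PlatformA": {"open_source": True, "data_triggered_runs": True, "provenance_tracking": False},
    "PlatformB": {"open_source": True, "data_triggered_runs": False, "provenance_tracking": True},
}


def score(platform):
    """Count how many criteria a platform fulfils (a naive, unweighted aggregation)."""
    return sum(capabilities[platform][c] for c in CRITERIA)


# Rank platforms by the number of fulfilled criteria.
ranking = sorted(capabilities, key=score, reverse=True)
print(ranking)  # both placeholder platforms fulfil 2 of 3 criteria here
```

A real capability model would likely use graded levels and weighted criteria rather than booleans; the sketch only shows how collected data and a comparison scheme might fit together.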
Installing the RPCS might require computing resources exceeding those of a normal desktop computer. In that case, cloud computing capacity provided by the LRZ can be used.
Prof. Dr. Dieter Kranzlmüller
Number of students: 1 master's student or several bachelor students (min. 3)
- Barba, L. A.: Terminologies for Reproducible Research. CoRR, 2018
- Jimenez, I. et al.: The Popper Convention: Making Reproducible Systems Evaluation Practical. 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2017, 1561-1570
- Ayer et al.: Conquaire: Towards an architecture supporting continuous quality control to ensure reproducibility of research. D-Lib Magazine, 23, 2017
- Ioannidis, J. P. A.: Why Most Published Research Findings Are False. PLOS Medicine, 2, 2005