Aim
To ensure that the analyses can be properly reproduced.
Requirements
Clear documentation of the data analysis in a log file (for example an SPSS syntax file, a Stata do-file, an R script or a Word file), so that the relevant data analyses can be reproduced.
Documentation
Log file including:
- Specific research questions or purpose of the analysis;
- Databases used for the analyses (for example, the ‘get file’ statement in SPSS syntax);
- All statistical analyses that are executed;
- A ‘README’ tab in your data files and/or a separate descriptive document, so that you can reproduce the results when needed (e.g. in case of an audit, inspection or journal review) and to promote interoperability of your data files.
Responsibilities
- Executing researcher: To document all steps that are taken throughout the data analysis in a log file.
- Project leaders: To regularly check and discuss the data analysis, by using the documentation in a log file.
- Research assistant: N/A.
How To
Clear documentation of the data analysis is important for both reproducibility and efficiency. One way to achieve this is to create a log file for all relevant analyses. This file should start with the research question to be answered and the date of the analysis, and should end with a (provisional) answer to the question.
A log file (e.g. SPSS syntax) can be used to document your analyses (e.g. for an article) so that you and others can easily retrieve and reproduce everything. Always include the name and location of the data file (e.g. ‘get file’ in SPSS), so you know which file belongs to your analysis and where it is stored. Log files should include the code for all statistical tests conducted, so that they serve as an analysis logbook. Place your code in a logical order and distinguish between variable definitions and analyses (e.g. first all variable definitions, then the analyses for table 1, then table 2, etc.). A Dutch example of this can be found here.
Tip: annotate your log files (e.g. by using * followed by text in SPSS syntax). Annotations are an important part of the documentation of your data analyses and facilitate reproduction of your results and reuse of your code.
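As an illustration of the layout described above, the following is a minimal sketch of an annotated analysis log written as a Python script. It is not a prescribed template. The research question, the file path `data/bp_study_v1.csv`, the variable names and the toy values are all hypothetical and serve only to make the sketch runnable on its own; the same structure (header, data file location, variable definitions, analyses per table, provisional answer) applies equally to SPSS syntax, a Stata do-file or an R script.

```python
# ANALYSIS LOG (illustrative sketch; all names and values are hypothetical)
# Research question: does mean systolic blood pressure (SBP) differ
#                    between group A and group B?
# Date of analysis:  printed at run time, see run_log()
# Data file:         see DATA_FILE below

import csv
import io
import statistics
from datetime import date

# Name and location of the data file (hypothetical path).
DATA_FILE = "data/bp_study_v1.csv"

# Toy stand-in for the contents of DATA_FILE so the sketch runs by itself.
# In a real log you would open DATA_FILE instead of using this string.
TOY_CSV = """id,group,sbp
1,A,120
2,A,130
3,B,140
4,B,150
"""


def load_rows(handle):
    """Read the data set from an open file handle into a list of dicts."""
    return list(csv.DictReader(handle))


# --- 1. Variable definitions ---------------------------------------------
def define_variables(rows):
    """Convert raw text fields into the variables used in the analyses."""
    return [{"group": r["group"], "sbp": float(r["sbp"])} for r in rows]


# --- 2. Analyses for table 1 ---------------------------------------------
def table1_mean_sbp(records):
    """Return the mean SBP per group."""
    groups = sorted({r["group"] for r in records})
    return {
        g: statistics.mean(r["sbp"] for r in records if r["group"] == g)
        for g in groups
    }


def run_log():
    """Run all steps in order and print what was used and what was found."""
    records = define_variables(load_rows(io.StringIO(TOY_CSV)))
    table1 = table1_mean_sbp(records)
    print(f"Date of analysis: {date.today().isoformat()}")
    print(f"Data file: {DATA_FILE}")
    print(f"Table 1, mean SBP per group: {table1}")
    return table1


if __name__ == "__main__":
    run_log()

# --- 3. Provisional answer -----------------------------------------------
# To be written by the researcher after reviewing the output above.
```

Keeping the variable definitions and each table's analyses in separate, commented blocks means a reviewer can rerun a single table without reading the whole file.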