As we all work toward greater Rigor, Reproducibility, and Transparency (RRT) in scientific work, one service that BCC is providing more and more often is replicating the analyses in a manuscript before it is submitted.
We recommend that all projects have two statisticians: a primary statistician to perform all data analyses, and a second statistician to review and replicate everything.
We perform 3 tasks when we review and replicate a paper:
- Verify that we can precisely replicate every number in the text, tables, and figures by running the final analysis code on the data. (Reproducibility)
- Verify that the methods used to produce those numbers are accurately described in the paper. (Transparency)
- Assess whether the methods used are appropriate or could be improved. (Rigor)
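As a rough illustration of the reproducibility check, the sketch below compares numbers as printed in a manuscript against values recomputed from the data. The function names, labels, and values are all made up for this example, and it assumes the manuscript rounds half up; statistical packages differ in their rounding rules, so treat this as one possible approach rather than a description of our actual workflow.

```python
from decimal import Decimal, ROUND_HALF_UP


def matches_reported(reported: str, recomputed: float) -> bool:
    """True if the recomputed value, rounded to the number of decimals
    printed in the manuscript, equals the reported value."""
    reported_dec = Decimal(reported)
    # The exponent of the reported value gives its number of decimal places,
    # e.g. "54.3" -> exponent -1 -> rounding unit 0.1.
    unit = Decimal(1).scaleb(reported_dec.as_tuple().exponent)
    rounded = Decimal(str(recomputed)).quantize(unit, rounding=ROUND_HALF_UP)
    return rounded == reported_dec


def check_numbers(reported: dict, recomputed: dict) -> list:
    """Return the labels that are missing or do not match after rounding."""
    return [
        label
        for label, value in reported.items()
        if label not in recomputed or not matches_reported(value, recomputed[label])
    ]


# Hypothetical values: strings as printed in the paper, floats from rerunning the code.
reported = {"Table 1: mean age": "54.3", "Table 2: odds ratio": "1.87"}
recomputed = {"Table 1: mean age": 54.2961, "Table 2: odds ratio": 1.8812}

print(check_numbers(reported, recomputed))  # lists only the odds ratio as a mismatch
```

Keeping the reported values as strings preserves the number of decimals shown in the paper, which is what determines how the recomputed value should be rounded.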
What is the process, and what files should I send to facilitate this review and replication?
Data. Best case is to have 1 (or 2-3) clean final analysis dataset(s) (.csv, .xlsx) with all the variables needed to replicate the analyses in the paper.
- Data will usually have 1 row of data per person/subject and 1 column for each variable.
- Longitudinal (repeated measures) data will usually have multiple rows of data per person/subject.
- Sometimes 2-3 datasets are needed if data are stored in different structures. For example, longitudinal data often have multiple rows per person, while demographic data have only one row per person.
- If more than 1 file is used, a unique ID should be used to link them together.
- NOTE: Please de-identify data before sending to BCC or anyone else.
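To make the linking point concrete, here is a minimal sketch that joins a one-row-per-subject demographic file to a multiple-rows-per-subject longitudinal file using a shared ID. The column names (`study_id`, `visit`, `sbp`), the inlined data, and the helper names are hypothetical, chosen only for this illustration; a real project would read its own de-identified .csv files.

```python
import csv
import io

# Hypothetical de-identified files, inlined so the sketch is self-contained.
DEMOGRAPHICS_CSV = """study_id,age,group
001,54,treatment
002,61,control
"""
VISITS_CSV = """study_id,visit,sbp
001,1,132
001,2,128
002,1,141
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def link_by_id(demographics, visits, id_col="study_id"):
    """Attach each subject's demographic row to every one of their visit rows."""
    by_id = {}
    for row in demographics:
        if row[id_col] in by_id:
            raise ValueError(f"duplicate ID in one-row-per-subject file: {row[id_col]}")
        by_id[row[id_col]] = row
    # Every visit must belong to a subject we have demographics for.
    unmatched = sorted({v[id_col] for v in visits} - set(by_id))
    if unmatched:
        raise ValueError(f"visit IDs with no demographic row: {unmatched}")
    return [{**by_id[v[id_col]], **v} for v in visits]


linked = link_by_id(read_rows(DEMOGRAPHICS_CSV), read_rows(VISITS_CSV))
for row in linked:
    print(row)
```

The two checks (no duplicate IDs in the one-row-per-subject file, no visit without a matching subject) are the kind of problem that is much easier to catch before the files are sent than during replication.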
Code. Please provide the syntax file (code) that reads in the 1 (or 2-3) final analysis dataset(s), performs all data analyses, and produces all tables and figures.
- Please add comments in the syntax file to indicate where each table and figure is being created.
- SAS, R, SPSS, or Stata are best.
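The layout below is one way a commented syntax file might look. It is written in Python purely for illustration; the same idea of one clearly labeled section per table or figure carries over to SAS, R, SPSS, or Stata. The dataset, column names, and the `table1` function are invented for this sketch.

```python
import csv
import io
import statistics

# ---- Read the final analysis dataset ----
# Hypothetical data inlined here; a real script would open the .csv file instead.
ANALYSIS_CSV = """study_id,group,age
001,treatment,54
002,control,61
003,treatment,47
004,control,58
"""
rows = list(csv.DictReader(io.StringIO(ANALYSIS_CSV)))


# ---- Table 1: mean age by group ----
def table1(rows):
    """Return the mean age within each group."""
    ages = {}
    for row in rows:
        ages.setdefault(row["group"], []).append(float(row["age"]))
    return {group: statistics.mean(values) for group, values in ages.items()}


print("Table 1:", table1(rows))

# ---- Figure 1: age by group ----
# Plotting code would go here, saving the output under a name that matches the figure.
```

With section comments that match the manuscript's table and figure numbering, a reviewer can go straight from a number in the paper to the lines of code that produced it.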
Manuscript. Please provide a draft of the manuscript or report including methods, study design, analysis description, results, interpretations, tables and figures.
We will then perform the 3 RRT steps described above.
Data sharing. A related service we provide is to advise researchers on how to publish their data along with their publication. We recommend publishing this same analysis data file, and possibly the syntax file as well. The goal is that any reasonable statistician could reproduce all results using just the data provided and methods described in the paper. (We note that this is not always 100% possible if random numbers are involved, or if there are small details that couldn’t be included in the paper. But this is the goal.)
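When random numbers are involved, fixing and reporting the seed usually makes results repeatable within the same software and version. The sketch below shows this with a percentile bootstrap interval for a mean; the function name, seed value, and measurements are hypothetical, and the same seed is not guaranteed to give the same results in a different package or release.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, seed=20240101):
    """Percentile bootstrap 95% interval for the mean, using a fixed seed."""
    # A local generator keeps the seed from being disturbed by other code.
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[round(0.025 * n_resamples)]
    upper = means[round(0.975 * n_resamples) - 1]
    return lower, upper


values = [132, 128, 141, 119, 137, 125, 130, 144]  # hypothetical measurements

first = bootstrap_mean_ci(values)
second = bootstrap_mean_ci(values)
print(first == second)  # the same seed draws the same resamples on every run
```

Recording the seed in the syntax file, along with the software name and version, gives a replicating statistician the best chance of matching simulation-based results exactly.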