As we all work toward greater Rigor, Reproducibility, and Transparency (RRT) in scientific work, one service that BCC provides with increasing frequency is replicating the analyses in a manuscript before it is submitted.
We recommend that all projects have two statisticians: a primary statistician to perform all data analyses, and a second statistician to review and replicate everything.
We perform three tasks when we review and replicate a paper:
- Ensure that we can precisely replicate every number in the text, tables, and figures.
- Ensure that the methods used to produce those numbers are accurately described in the paper, and that the results are interpreted correctly.
- Assess whether the methods used are reasonable for addressing the research questions in the paper.
What is the process, and what files should I send to facilitate this review and replication?
Data. The best case is one clean final analysis dataset (or two to three), in .csv or .xlsx format, containing all the variables needed to replicate the analyses in the paper.
- Data will usually have 1 row of data per person/subject and 1 column for each variable.
- Sometimes two or three datasets are needed because the data have different structures. For example, longitudinal data often have multiple rows per person, while demographic data have only one row per person.
- If more than one file is used, include a unique ID in each file so that they can be linked together.
- NOTE: Please de-identify data before sending to BCC or anyone else.
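As an illustration of the data layout described above, here is a minimal sketch of linking a one-row-per-person demographics file to a multiple-rows-per-person longitudinal file through a shared ID. The column names (`subject_id`, `age_group`, `arm`, `visit`, `sbp`), the values, and the helper functions are all hypothetical, and the data are embedded as text only so the example runs on its own; a real project would read the de-identified files themselves.

```python
import csv
import io

# Hypothetical de-identified data: study IDs only, no names or birth dates.
# One row per subject.
DEMOGRAPHICS_CSV = """\
subject_id,age_group,arm
S001,40-49,treatment
S002,50-59,control
"""

# Longitudinal data: several rows for the same subject.
VISITS_CSV = """\
subject_id,visit,sbp
S001,1,132
S001,2,128
S002,1,141
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def link_by_id(demographics, visits, key="subject_id"):
    """Attach each subject's demographic columns to every visit row."""
    by_id = {}
    for row in demographics:
        # The ID must be unique in the one-row-per-subject file.
        if row[key] in by_id:
            raise ValueError(f"duplicate {key} in demographics: {row[key]}")
        by_id[row[key]] = row

    linked = []
    for visit in visits:
        # Every longitudinal row must match a known subject.
        if visit[key] not in by_id:
            raise ValueError(f"{key} {visit[key]} has no demographics row")
        linked.append({**by_id[visit[key]], **visit})
    return linked


linked = link_by_id(read_rows(DEMOGRAPHICS_CSV), read_rows(VISITS_CSV))
for row in linked:
    print(row)
```

Raising an error on a duplicate or unmatched ID is a design choice for this sketch: it surfaces linkage problems before any analysis is run, rather than silently dropping rows.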
Syntax. Please provide the syntax that reads in the final analysis dataset(s) and produces all analyses, tables, and figures.
- Please add comments in the syntax file to indicate where each table and figure is being created.
- Syntax written in SAS, R, or Stata is fine.
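To show what a commented syntax file might look like, here is a small sketch of the layout. It is written in Python purely for illustration; the same structure applies to SAS, R, or Stata syntax. The dataset, the variable names, and the contents of "Table 1" and "Figure 1" are assumptions invented for the example, not taken from any real manuscript.

```python
import csv
import io
import statistics

# Hypothetical stand-in for the final analysis dataset. A real script would
# read the CSV file that is sent along with the manuscript.
ANALYSIS_CSV = """\
subject_id,arm,sbp
S001,treatment,128
S002,control,141
S003,treatment,134
S004,control,139
"""

rows = list(csv.DictReader(io.StringIO(ANALYSIS_CSV)))


# ---- Table 1: mean systolic blood pressure by study arm ----
def mean_by_arm(data):
    """Return the mean of the sbp column within each study arm."""
    groups = {}
    for row in data:
        groups.setdefault(row["arm"], []).append(float(row["sbp"]))
    return {arm: statistics.mean(values) for arm, values in groups.items()}


table1 = mean_by_arm(rows)
for arm, mean_sbp in sorted(table1.items()):
    print(f"Table 1 | {arm}: mean SBP = {mean_sbp:.1f}")

# ---- Figure 1: individual SBP values by arm (data behind the plot) ----
figure1_points = [(row["arm"], float(row["sbp"])) for row in rows]
print("Figure 1 points:", figure1_points)
```

The banner comments are the important part: a reviewer can search for "Table 1" or "Figure 1" and land directly on the code that produced it.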
Manuscript. Please provide a solid draft of the manuscript including methods, study design, analysis description, results, interpretations, tables and figures.
A related service we provide is advising researchers on how to publish their data along with their publication. We recommend publishing this same analysis data file, and possibly the syntax file as well. The goal is that any reasonable statistician could reproduce all results using only the data provided and the methods described in the paper. (This is not always fully achievable, for example when random numbers are involved or when small details couldn’t be included in the paper, but it remains the goal.)
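When random numbers are involved, one common way to make results repeatable is to fix and record the random seed. The sketch below assumes a simple percentile bootstrap for a mean; the function name, the seed value, the number of resamples, and the data are all hypothetical choices for illustration. Note that a fixed seed guarantees identical results only within the same software and version; other packages will generally produce different random streams.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, seed=20240101):
    """Simple percentile bootstrap interval (about 95%) for the mean.

    The seed is an explicit argument so that it can be reported in the
    methods section or the syntax file and reused by a replicator.
    """
    rng = random.Random(seed)  # private generator; global state untouched
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements.
sbp = [128.0, 141.0, 134.0, 139.0, 132.0]

first_run = bootstrap_mean_ci(sbp)
second_run = bootstrap_mean_ci(sbp)
print("Run 1:", first_run)
print("Run 2:", second_run)
print("Identical:", first_run == second_run)
```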