For more complex studies, make a data analysis plan before you start the analysis, and preferably before you start collecting data. The plan should address at least the following topics:
- the research question in terms of population, intervention, comparison, and outcomes;
- a description of the (subgroup of the) population to be included in the analyses (inclusion and exclusion criteria);
- which datasets are used and, if applicable, how they are merged;
- data from which time point (T1, T2, etc.) will be used, if applicable;
- variables to be used in the analyses and how these will be analysed (e.g., continuous or categorical);
- variables to be investigated as confounders or effect modifiers and how these will be analysed;
- missing value treatment;
- which analyses are to be carried out, and in which order;
- how folders and files are structured, and how file versions are managed.
You may need to consult a statistician about the choice of statistical methods.
Frequently Asked Questions
Who can help at my UMC?
Use the Toolbox to find data analysis support at your UMC. Check with your institute's data governance authority, clinical trial office or statistical support group whether there is a template for the statistical analysis plan.
What statistical method should I use?
Your choice of statistical methods may affect the conclusions that you can draw from your data. Think carefully about the hypothesis and its alternatives before running all kinds of statistical analyses, and be open to unexpected outcomes. Do not hesitate to seek expert knowledge. Be aware of the risk of finding spurious relations if you perform multiple statistical analyses (p-hacking).
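To see why multiple analyses increase the risk of spurious findings, it can help to compare uncorrected results with results after a multiple-testing correction. The sketch below uses the Holm-Bonferroni step-down procedure, which is only one of several possible corrections and is not prescribed here. The p-values and the function name `holm_bonferroni` are made up for illustration; ask a statistician which correction, if any, suits your study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected."""
    m = len(p_values)
    # Visit the tests from the smallest to the largest p-value,
    # while remembering each test's original position.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Step-down threshold: alpha / (m - rank), with rank starting at 0.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all remaining (larger) p-values are not rejected either
    return reject


# Illustrative p-values from four hypothetical analyses (not real data).
p_values = [0.001, 0.04, 0.03, 0.20]

uncorrected = [p <= 0.05 for p in p_values]
corrected = holm_bonferroni(p_values)

print("uncorrected:", uncorrected)
print("corrected:  ", corrected)
```

In this made-up example, three of the four analyses look significant at the 0.05 level without correction, but only one remains significant after correction.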
The larger the study, the more its procedures need to be automated. Consider using a workflow system rather than running each analysis step by hand. Workflow systems can automatically keep track of the exact processing steps, and they can exactly reproduce the same sequence of steps on a series of input files. For very large studies, you may need a system that can automatically recreate and validate workflows on multiple computer infrastructures.
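The idea of tracking processing steps and repeating them on a series of inputs can be sketched in a few lines. This is a toy illustration under stated assumptions, not a substitute for a real workflow system: the step functions (`drop_missing`, `mean`), the helper names (`fingerprint`, `run_workflow`) and the input values are all hypothetical.

```python
import hashlib
import json


def fingerprint(obj):
    """Short SHA-256 digest of a JSON-serialisable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def run_workflow(steps, data):
    """Apply each (name, function) step in order and log what was done."""
    log = []
    for name, func in steps:
        before = fingerprint(data)
        data = func(data)
        log.append({"step": name, "input": before, "output": fingerprint(data)})
    return data, log


# Hypothetical processing steps, defined once and reused for every input.
def drop_missing(values):
    return [v for v in values if v is not None]


def mean(values):
    return sum(values) / len(values)


STEPS = [("drop_missing", drop_missing), ("mean", mean)]

# The same sequence of steps is applied to a series of (made-up) inputs.
inputs = {"site_a": [1.0, None, 3.0], "site_b": [2.0, 4.0, None, 6.0]}
results = {}
for label, values in inputs.items():
    result, log = run_workflow(STEPS, values)
    results[label] = (result, log)
    print(label, result, json.dumps(log))
```

Because each log entry records a digest of the step's input and output, rerunning the workflow on the same input should yield an identical log, which gives a simple check that the processing was reproduced.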