Analyzing clustered data: why and how to account for multiple observations nested within a study participant?

EL Moen, CJ Fricano-Kugler, BW Luikart, AJ O'Malley - PLOS ONE, 2016 - journals.plos.org
A conventional study design among medical and biological experimentalists involves collecting multiple measurements from each study subject. For example, experiments using mouse models in neuroscience often collect multiple neuron measurements per mouse to increase the number of observations without requiring a large number of mice. This leads to a form of statistical dependence referred to as clustering. Inappropriate analyses of clustered data have prompted several recent critiques of neuroscience research suggesting that the bar for statistical analyses within the field is set too low. We compare naïve analytical approaches to marginal, fixed-effect, and mixed-effect models and provide guidelines for when each of these models is most appropriate based on study design. We demonstrate the influence of clustering on a between-mouse treatment effect, a within-mouse treatment effect, and an interaction effect between the two. Our analyses demonstrate that these statistical approaches can give substantially different results, primarily when the analyses include a between-mouse treatment effect. In a novel analysis from a neuroscience perspective, we also refine the mixed-effect approach by adapting an advanced modeling technique that has been used in social science research: alongside a within-mouse (neuron-level) treatment, we include its aggregate mouse-level counterpart as an additional predictor, and we show that this yields more informative results. Based on these findings, we emphasize the importance of appropriate analyses of clustered data, and we intend this work to serve as a resource when deciding which approach will work best for a given study.
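The "why" and the "how" above can be illustrated with a small Python sketch. This is not the authors' code, and the function names, column names, ICC value, and toy data are all hypothetical. It makes two assumptions: first, that the standard Kish design effect for equal-sized clusters, 1 + (m - 1) * rho (m neurons per mouse, rho the intraclass correlation), is a fair way to show how clustering shrinks the effective number of independent observations for a between-mouse comparison; second, that the "aggregate mouse-level counterpart" of a neuron-level treatment can be taken as its per-mouse mean (a Mundlak-style within/between decomposition). The resulting columns would then be entered as predictors in a mixed-effect model fit with standard statistical software, which this sketch does not do.

```python
"""Sketch: why clustering matters, and how to build mouse-level predictors.

All names and numbers here are hypothetical illustrations, not from the paper.
"""


def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * icc."""
    return 1 + (cluster_size - 1) * icc


def effective_n(total_obs, cluster_size, icc):
    """Approximate number of independent observations the clustered sample is worth."""
    return total_obs / design_effect(cluster_size, icc)


def add_mouse_level_terms(rows, cluster_key="mouse", x_key="treated"):
    """Return copies of rows with the per-mouse mean of x and the within-mouse deviation.

    Assumption (hypothetical): the mouse-level counterpart of a neuron-level
    predictor is its per-mouse mean, e.g. the proportion of treated neurons.
    The input rows are not modified.
    """
    totals, counts = {}, {}
    for row in rows:
        c = row[cluster_key]
        totals[c] = totals.get(c, 0.0) + row[x_key]
        counts[c] = counts.get(c, 0) + 1
    means = {c: totals[c] / counts[c] for c in totals}

    out = []
    for row in rows:
        m = means[row[cluster_key]]
        new_row = dict(row)                          # shallow copy of the neuron's record
        new_row[x_key + "_mouse_mean"] = m           # between-mouse component
        new_row[x_key + "_within"] = row[x_key] - m  # within-mouse component
        out.append(new_row)
    return out


if __name__ == "__main__":
    # Hypothetical design: 10 mice x 20 neurons, ICC of 0.5 (not from the paper).
    print(design_effect(20, 0.5))                # 10.5
    print(round(effective_n(200, 20, 0.5), 2))   # 19.05

    # Hypothetical toy data: one row per neuron.
    neurons = [
        {"mouse": "m1", "treated": 1, "y": 2.0},
        {"mouse": "m1", "treated": 0, "y": 1.0},
        {"mouse": "m1", "treated": 1, "y": 2.5},
        {"mouse": "m2", "treated": 0, "y": 0.5},
        {"mouse": "m2", "treated": 1, "y": 1.2},
    ]
    for r in add_mouse_level_terms(neurons):
        print(r["mouse"], round(r["treated_mouse_mean"], 3), round(r["treated_within"], 3))
```

With an ICC of 0 the design effect is 1, which is exactly the independence assumption a naïve neuron-level analysis makes. As the within-mouse correlation or the number of neurons per mouse grows, that analysis increasingly overstates the information available for a between-mouse effect. Entering both the mouse-mean and within-mouse columns lets a model estimate separate within-mouse and between-mouse associations for the same treatment.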