Sample size estimation for genomics experiments with dependent end points
In typical genomics studies involving numerous association tests of gene mutations with a disease, error rate control via multiplicity adjustment is paramount because, even if no gene were differentially associated, some tests would still produce false positives. Many methods incorporate multiplicity control into sample size estimation for normally distributed endpoints, but none addresses correlated endpoints that are not normally distributed. One common practice in the literature is to assume an equal correlation among all differentially associated or expressed genes and to use the generalized binomial or beta-binomial model to compute the comparison-wise power of detecting these genes.
We present a fast and simple novel approach for estimating sample size that controls the family-wise error rate using Hunter and Worsley's method for normal, t, and chi-square distributed endpoints of any correlation structure. The required sample size is computed using either a two-sample z-test or a chi-square test formula, depending on whether the response variable is continuous or binary, at the desired comparison-wise power (using the binomial model), adjusted family-wise error rate, and standardized effect size. These modifications would provide sample size estimates that are close to their exact values under more general correlation structures, where the generalized binomial or beta-binomial model may fail to perform.
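As a rough illustration of the continuous-response case described above, the sketch below computes a per-group sample size from the standard two-sided, two-sample z-test formula, n = 2((z_{1-α*/2} + z_{power}) / d)^2, where α* is the per-comparison significance level and d is the standardized effect size. The function name `n_per_group_z` and the input values are hypothetical. The per-comparison level is obtained here by a plain Bonferroni split of the family-wise error rate, which is only a conservative stand-in: it is not Hunter and Worsley's adjustment, which the thesis uses and which accounts for the correlation structure.

```python
import math
from statistics import NormalDist


def n_per_group_z(effect_size, alpha_per_test, power):
    """Per-group n for a two-sided, two-sample z-test (equal group sizes).

    effect_size    : standardized mean difference d (assumed known)
    alpha_per_test : per-comparison significance level after adjustment
    power          : desired comparison-wise power
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha_per_test / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Illustrative inputs only (not taken from the thesis).
fwer = 0.05        # target family-wise error rate
num_tests = 1000   # number of genes tested
alpha_star = fwer / num_tests  # Bonferroni stand-in for the adjusted level

n_unadjusted = n_per_group_z(0.5, fwer, 0.80)
n_adjusted = n_per_group_z(0.5, alpha_star, 0.80)
```

Because a stricter per-comparison level always raises the required n, any adjustment that exploits correlation to relax that level relative to Bonferroni would yield a smaller sample size from the same formula.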
Koomson, Desmond, "Sample size estimation for genomics experiments with dependent end points" (2016). ETD Collection for University of Texas, El Paso. AAI10151106.