Tuesday, January 16, 2018

Dealing with low sample size significance testing


Recently I had to analyze data with very few data points per group, in the range of 3-15. The data consisted of 3 groups and multiple subgroups. The most obvious choice in this case was a non-parametric statistical test such as the Wilcoxon test. The problem with the Wilcoxon test is that it loses power/sensitivity at such small sample sizes. A t-test, on the other hand, may give us false positives, especially with a sample size of 3, where normality cannot be verified. How do we deal with this? The issue is especially acute for p-value calculation. P-values seem to be a necessary "evil", but below are some options for addressing the problem.

These are some of the links that helped me understand this issue:
https://stats.stackexchange.com/questions/14434/appropriateness-of-wilcoxon-signed-rank-test
https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless/2498#2498
https://stats.stackexchange.com/questions/121852/how-to-choose-between-t-test-or-non-parametric-test-e-g-wilcoxon-in-small-sampl

I was intrigued enough to carry out some informal "research" and found the following possible options.

  • If we cannot determine normality, should we just use the t-test anyway?
    • The idea is that this experiment is meant to detect potential vaccine candidates at a very preliminary stage, so we can be slightly "lenient" and err on the side of allowing a few false positives.

  • Just use the Wilcoxon test (wilcox.test in R), since it is appropriate for non-parametric data and may still have enough power to detect a difference:
    • https://stats.stackexchange.com/a/66235/124490

  • Use bootstrapped values: 
    • http://biostat.mc.vanderbilt.edu/wiki/pub/Main/JenniferThompson/ms_mtg_18oct07.pdf
    • Requires more than 8 samples.
      • https://stats.stackexchange.com/questions/33300/determining-sample-size-necessary-for-bootstrap-method-proposed-method
    • However, some say we may need more than 20:
      • https://speakerdeck.com/jakevdp/statistics-for-hackers
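
As a rough illustration of the bootstrap option, here is a minimal sketch of a percentile bootstrap confidence interval for the difference in means between two groups. It uses only the Python standard library. The function name bootstrap_ci_mean_diff, the number of resamples, the seed, and the data are all assumptions made up for this example, not something taken from the links above.

```python
import random


def bootstrap_ci_mean_diff(group_a, group_b, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        mean_a = sum(resample_a) / len(resample_a)
        mean_b = sum(resample_b) / len(resample_b)
        diffs.append(mean_a - mean_b)
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled differences.
    lower_index = int(round((alpha / 2) * n_resamples))
    upper_index = int(round((1 - alpha / 2) * n_resamples)) - 1
    return diffs[lower_index], diffs[upper_index]


# Hypothetical measurements, invented purely for illustration.
treated = [12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 14.8, 13.1, 15.5]
control = [10.2, 11.5, 9.8, 12.1, 10.9, 11.2, 10.4, 12.6, 11.0]

lower, upper = bootstrap_ci_mean_diff(treated, control)
print(f"95% percentile bootstrap CI for the mean difference: [{lower:.2f}, {upper:.2f}]")
```

With only a handful of points per group, the resamples contain very few distinct values, which is one way to see why the minimum sample sizes mentioned above matter.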

  • Use a permutation test:
    • It works with smaller sample sizes than bootstrapping, but it does not directly produce a confidence interval.
    • In fact, the Wilcoxon test is a special case of a permutation test (one applied to ranks).
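
To make the permutation idea concrete, here is a minimal sketch of an exact two-sided permutation test on the difference in means, using only the Python standard library. The function name exact_permutation_test and the data are hypothetical. It enumerates every relabeling, so it is only practical for the very small samples discussed here.

```python
from itertools import combinations


def exact_permutation_test(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for indices in combinations(range(len(pooled)), n_a):
        chosen = set(indices)
        relabeled_a = [pooled[i] for i in indices]
        relabeled_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(relabeled_a) / len(relabeled_a) - sum(relabeled_b) / len(relabeled_b))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical data: two completely separated groups of 3 points each.
p_value = exact_permutation_test([1, 2, 3], [4, 5, 6])
print(p_value)  # 2 of the 20 possible relabelings are this extreme -> 0.1
```

Even with perfectly separated groups, 3 points per group cannot give a two-sided p-value below 0.1, because there are only 20 possible relabelings. This is the power problem with a sample size of 3 in its starkest form.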

  • Plainly display the data points with a confidence interval.

  • Use an effect size to illustrate the "significance". Source: https://garstats.wordpress.com/2016/05/02/robust-effect-sizes-for-2-independent-groups/
    • Some of the recommendations include:
      • Cohen's d (not to be used for non-normal/non-parametric data)
      • Cliff's delta (Non-parametric ordinal data)
      • Mutual information (MI)
      • Kolmogorov-Smirnov
      • Wilcox & Muska’s Q
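
Of the effect sizes listed, Cliff's delta is simple enough to compute by hand: it is the proportion of pairs where a value from the first group exceeds a value from the second, minus the proportion where it is smaller. The sketch below is one possible plain-Python version; the function name cliffs_delta and the data are assumptions for illustration only.

```python
def cliffs_delta(group_a, group_b):
    """Cliff's delta: (#pairs with a > b  -  #pairs with a < b) / (n_a * n_b)."""
    greater = 0
    less = 0
    for a in group_a:
        for b in group_b:
            if a > b:
                greater += 1
            elif a < b:
                less += 1
            # Ties count toward neither side.
    return (greater - less) / (len(group_a) * len(group_b))


# Hypothetical ordinal scores, invented purely for illustration.
delta = cliffs_delta([1, 2, 3, 4], [2, 3, 4, 5])
print(delta)  # -0.4375
```

The value ranges from -1 (every value in the first group is below every value in the second) to +1 (the reverse), with 0 meaning the two groups overlap completely.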

  • Use an equivalence test. Source: https://support.minitab.com/en-us/minitab/18/help-and-how-to/statistics/equivalence-tests/supporting-topics/why-use-an-equivalence-test/
    • This option requires knowing the size of the "difference" that has some biological/clinical significance.

Overall, this gives us many options, but none of them is a panacea. A small sample size is a very difficult issue, and these "solutions" can only help to minimize the pain.

