Thoughts and ideas: Dealing with low sample size significance testing

Recently I had to analyze data with very few data points, in the range of 3-15. The data consisted of 3 groups and multiple subgroups. The most obvious choice in this case was a non-parametric statistical test such as the Wilcoxon test. The problem with the Wilcoxon test is that it loses power/sensitivity at such small sample sizes. A t-test, on the other hand, may give false positives if its normality assumption does not hold, and with a sample size of 3 that assumption cannot really be checked. How do we deal with this? The issue is especially acute for p-value calculation. P-values seem to be a necessary "evil", but below are some options for addressing the problem.
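
To make the loss of power concrete: an exact rank-sum test can never return a p-value below a floor set by the group sizes alone. The sketch below computes that floor, assuming two independent groups, no ties, and a two-sided test. The function name is made up for illustration.

```python
from math import comb

def min_two_sided_p(n1: int, n2: int) -> float:
    """Smallest two-sided p-value an exact rank-sum test can produce (no ties)."""
    # Under the null, each choice of n1 ranks for group 1 is equally likely.
    # Complete separation of the groups matches only 2 of those choices
    # (group 1 all lowest, or group 1 all highest).
    return min(1.0, 2 / comb(n1 + n2, n1))

# Hypothetical equal group sizes, for illustration only.
for n in (3, 4, 5, 8):
    print(n, min_two_sided_p(n, n))
```

With 3 vs 3 there are only 20 possible rank assignments, so the floor is 2/20 = 0.1, and the test cannot reach p < 0.05 no matter how large the difference is.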

These are some of the links which helped me to understand this issue:
https://stats.stackexchange.com/questions/14434/appropriateness-of-wilcoxon-signed-rank-test
https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless/2498#2498
https://stats.stackexchange.com/questions/121852/how-to-choose-between-t-test-or-non-parametric-test-e-g-wilcoxon-in-small-sampl

Intrigued, I carried out some informal "research" and found the following possible options.

If we cannot determine normality, should we just use a t-test anyway?

The idea is that this experiment aims to detect potential vaccine candidates at a very preliminary stage, so we can afford to be slightly "lenient" and err on the side of allowing a few false positives.

Just use wilcox.test, since it is appropriate for non-normal data and may still have enough power to detect a difference:

https://stats.stackexchange.com/a/66235/124490

Use bootstrapped values:

http://biostat.mc.vanderbilt.edu/wiki/pub/Main/JenniferThompson/ms_mtg_18oct07.pdf
This requires more than 8 samples.

https://stats.stackexchange.com/questions/33300/determining-sample-size-necessary-for-bootstrap-method-proposed-method

However, some say we may need more than 20:

https://speakerdeck.com/jakevdp/statistics-for-hackers
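
As a rough illustration of the bootstrap option, here is a minimal sketch of a percentile bootstrap confidence interval for the difference in group means. It assumes two independent groups and uses only the Python standard library. The function name, the data, and the number of resamples are invented for the example and are not taken from the links above.

```python
import random
from statistics import fmean

def bootstrap_ci_mean_diff(a, b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resampled_a = rng.choices(a, k=len(a))
        resampled_b = rng.choices(b, k=len(b))
        diffs.append(fmean(resampled_a) - fmean(resampled_b))
    diffs.sort()
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles.
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical measurements, for illustration only.
group_a = [10, 11, 12, 13, 14, 15, 16, 17, 18]
group_b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(bootstrap_ci_mean_diff(group_a, group_b))
```

With very small groups the resamples contain few distinct values, which is one reason the sample-size thresholds quoted above matter.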

Use a permutation test:

It works with smaller sample sizes than bootstrapping, but it does not directly produce a confidence interval.
In fact, the Wilcoxon test is a special case of a permutation test (one applied to the ranks of the data).
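
A minimal sketch of an exact two-sample permutation test on the difference in means, assuming two independent groups small enough that every split of the pooled data can be enumerated. The function name and data are invented for the example.

```python
from itertools import combinations
from statistics import fmean

def exact_permutation_test(a, b):
    """Two-sided exact permutation p-value for the difference in means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(fmean(a) - fmean(b))
    as_extreme = 0
    n_splits = 0
    # Enumerate every way of relabelling n_a of the pooled values as group a.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            as_extreme += 1
        n_splits += 1
    return as_extreme / n_splits

# Hypothetical data: two fully separated groups of 3.
print(exact_permutation_test([1, 2, 3], [4, 5, 6]))
```

For these two fully separated groups of 3, only 2 of the 20 possible splits are as extreme as the observed one, so the p-value is 0.1. The number of possible splits limits how small the p-value can get.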

Plainly displaying the data points with a confidence interval.

Use effect size to illustrate the "significance". See: https://garstats.wordpress.com/2016/05/02/robust-effect-sizes-for-2-independent-groups/

Some of the recommendations include:

Cohen's d (not to be used for non-normal data)
Cliff's delta (non-parametric, suitable for ordinal data)
Mutual information (MI)
Kolmogorov-Smirnov statistic
Wilcox & Muska's Q
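
Of these, Cliff's delta is the simplest to compute by hand: it is the proportion of pairs where a value from the first group exceeds a value from the second, minus the proportion where it is smaller. A minimal sketch, with an invented function name and data:

```python
def cliffs_delta(a, b):
    """Cliff's delta: P(x > y) - P(x < y) over all pairs (x in a, y in b)."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    # Ties count in the denominator but toward neither numerator term.
    return (greater - less) / (len(a) * len(b))

# Hypothetical data, for illustration only.
print(cliffs_delta([4, 5, 6], [1, 2, 3]))
print(cliffs_delta([1, 2, 3], [1, 2, 3]))
```

The value ranges from -1 to 1, with 0 meaning neither group tends to have larger values. It depends only on the ordering of the values, not on their distribution.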

Equivalence test. See: https://support.minitab.com/en-us/minitab/18/help-and-how-to/statistics/equivalence-tests/supporting-topics/why-use-an-equivalence-test/

This option requires knowing in advance the size of "difference" that has biological/clinical significance.

Overall, this gives us many options, but none of them is a panacea. A small sample size is a very difficult issue, and these "solutions" can only help to minimize the pain.

Tuesday, January 16, 2018
