FairSubset: A tool to choose representative subsets of data for use with replicates or groups of different sample sizes

Katherine K Ortell
Pawel M Switonski
Joe R Delaney

Keywords

statistics, normalization, automation, microscopy

Abstract

High-impact journals are promoting transparency of data. Modern scientific methods can be automated and can produce disparate sample sizes. In many cases, it is desirable to retain identical or pre-defined sample sizes between replicates or groups. However, choosing the subset of the originally acquired data that best matches the entire data set, without introducing bias, is not trivial. Here, we released a free online tool, FairSubset, and its constituent Shiny App R code to subset data in an unbiased fashion. Subsets were set to the same N across samples and retained representative average and standard deviation information. The method can be used to quantify entire fields of view or other replicates without biasing the data pool toward large-N samples. We showed examples of the tool’s use with fluorescence data and DNA damage-related Comet tail quantitation. The FairSubset tool, and its method of retaining distribution information at the single-datum level, may be considered for standardized use in fair publishing practices.
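The subsetting idea described above can be illustrated with a short sketch. This is not the FairSubset source code, which is written in R as a Shiny App; it is a minimal Python illustration under stated assumptions: candidate subsets are drawn at random, each is scored by how far its mean and standard deviation fall from those of the full sample, and the closest candidate is kept. The function names `fair_subset` and `fair_subsets`, the `trials` and `seed` parameters, the scoring rule, and the choice of the smallest group size as the common N are all hypothetical.

```python
import random
import statistics


def fair_subset(values, n, trials=1000, seed=0):
    """Return the random size-n subset whose mean and SD best match `values`.

    Assumed scoring rule (illustrative only): sum of the absolute differences
    in mean and in sample standard deviation between subset and full sample.
    """
    if not 2 <= n <= len(values):
        raise ValueError("n must be between 2 and len(values)")
    rng = random.Random(seed)
    target_mean = statistics.mean(values)
    target_sd = statistics.stdev(values)
    best, best_score = None, float("inf")
    for _ in range(trials):
        # Draw n values without replacement from the original sample.
        candidate = rng.sample(values, n)
        score = (abs(statistics.mean(candidate) - target_mean)
                 + abs(statistics.stdev(candidate) - target_sd))
        if score < best_score:
            best, best_score = candidate, score
    return best


def fair_subsets(groups, trials=1000, seed=0):
    """Subset every group to the size of the smallest group (assumed choice of N)."""
    n = min(len(v) for v in groups.values())
    return {name: fair_subset(v, n, trials, seed) for name, v in groups.items()}


if __name__ == "__main__":
    # Synthetic data standing in for per-cell measurements from two fields of view.
    data_rng = random.Random(1)
    groups = {
        "field_1": [data_rng.gauss(10, 2) for _ in range(200)],
        "field_2": [data_rng.gauss(12, 3) for _ in range(25)],
    }
    for name, subset in fair_subsets(groups).items():
        print(name, len(subset), statistics.mean(subset), statistics.stdev(subset))
```

Because every group ends up with the same N, a pooled analysis is not dominated by the group that happened to yield the most measurements, while each subset still consists of individual data points from the original sample.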
