FairSubset: A tool to choose representative subsets of data for use with replicates or groups of different sample sizes


  • Katherine K Ortell, Medical University of South Carolina
  • Pawel M Switonski, Duke University
  • Joe R Delaney, Medical University of South Carolina




statistics, normalization, automation, microscopy


High-impact journals are promoting transparency of data. Modern scientific methods can be automated and produce disparate sample sizes. In many cases, it is desirable to retain identical or pre-defined sample sizes between replicates or groups. However, choosing the subset of the originally acquired data that best matches the entire data set without introducing bias is not trivial. Here, we released a free online tool, FairSubset, and its constituent Shiny App R code to subset data in an unbiased fashion. Subsets were set at the same N across samples and retained representative average and standard deviation information. The method can be used for quantitation of entire fields of view or other replicates without biasing the data pool toward large-N samples. We showed examples of the tool's use with fluorescence data and DNA damage-related comet tail quantitation. The FairSubset tool, and its method of retaining distribution information at the single-datum level, may be considered for standardized use in fair publishing practices.
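As a rough illustration of the idea described above, the following Python sketch reduces each group to a common N and, among many random candidate subsets, keeps the one whose mean and standard deviation lie closest to those of the full sample. The published tool is written in R. The random-search strategy, the scoring function, and the names `representative_subset` and `fair_subsets` are assumptions made for this example and are not taken from the FairSubset source.

```python
import random
import statistics


def representative_subset(values, n, trials=1000, seed=0):
    """Pick n values whose mean and SD are close to those of all values.

    Hypothetical helper (not the FairSubset implementation): draws
    `trials` random subsets without replacement and keeps the one with
    the smallest combined deviation in mean and standard deviation.
    """
    values = list(values)
    if n < 2:
        raise ValueError("n must be at least 2 to compute a standard deviation")
    if n >= len(values):
        return values  # nothing to remove
    rng = random.Random(seed)
    target_mean = statistics.mean(values)
    target_sd = statistics.stdev(values)
    best, best_score = None, float("inf")
    for _ in range(trials):
        candidate = rng.sample(values, n)
        # Assumed score: absolute error in mean plus absolute error in SD.
        score = (abs(statistics.mean(candidate) - target_mean)
                 + abs(statistics.stdev(candidate) - target_sd))
        if score < best_score:
            best, best_score = candidate, score
    return best


def fair_subsets(groups, trials=1000, seed=0):
    """Subset every group to the size of the smallest group."""
    n = min(len(v) for v in groups.values())
    return {name: representative_subset(v, n, trials, seed)
            for name, v in groups.items()}
```

Calling `fair_subsets({"control": control_values, "treated": treated_values})` would return one list per group, all of equal length. Because each candidate is scored against its own group's full-sample statistics, larger groups do not dominate the pooled result.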

Author Biography

Joe R Delaney, Medical University of South Carolina

Assistant Professor, Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC 29425, USA






How to Cite

Ortell KK, Switonski PM, Delaney JR. FairSubset: A tool to choose representative subsets of data for use with replicates or groups of different sample sizes. J Biol Methods [Internet]. 2019 Sep 3 [cited 2022 May 27];6(3):e118. Available from: https://jbmethods.org/jbm/article/view/299