FairSubset: A tool to choose representative subsets of data for use with replicates or groups of different sample sizes

Katherine K Ortell; Pawel M Switonski; Joe R Delaney

doi:10.14440/jbm.2019.299

Katherine K Ortell

Medical University of South Carolina

Pawel M Switonski

Duke University

Joe R Delaney

Medical University of South Carolina

Keywords

statistics, normalization, automation, microscopy

Abstract

High-impact journals are promoting transparency of data. Modern scientific methods are often automated and can produce disparate sample sizes. In many cases, it is desirable to retain identical or pre-defined sample sizes between replicates or groups. However, choosing the subset of the originally acquired data that best matches the entire data set, without introducing bias, is not trivial. Here, we released a free online tool, FairSubset, and its constituent Shiny App R code, to subset data in an unbiased fashion. Subsets were set at the same N across samples and retained representative average and standard deviation information. The method can be used for quantitation of entire fields of view or other replicates without biasing the data pool toward large-N samples. We showed examples of the tool's use with fluorescence data and DNA damage-related comet tail quantitation. The FairSubset tool, and the method of retaining distribution information at the single-datum level, may be considered for standardized use in fair publishing practices.
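The idea of trimming every replicate to the same N while keeping its average and standard deviation representative can be illustrated with a short sketch. This is not the FairSubset implementation (the published tool is a Shiny App written in R, and the abstract does not spell out its algorithm); it is a minimal Python illustration that assumes a simple random-search strategy. The function name `fair_subset`, the parameters `trials` and `seed`, the scoring rule, and the example data are all hypothetical.

```python
import random
import statistics


def fair_subset(values, n, trials=1000, seed=0):
    """Return n of the given values whose mean and SD are close to the full sample's.

    Assumed strategy (not taken from the paper): draw `trials` random subsets
    without replacement and keep the one with the smallest combined deviation.
    """
    if not 2 <= n <= len(values):
        raise ValueError("n must be between 2 and len(values)")
    rng = random.Random(seed)
    target_mean = statistics.fmean(values)
    target_sd = statistics.stdev(values)

    def score(subset):
        # Combined absolute deviation of the subset's mean and SD from the full sample's
        return (abs(statistics.fmean(subset) - target_mean)
                + abs(statistics.stdev(subset) - target_sd))

    candidates = (rng.sample(values, n) for _ in range(trials))
    return min(candidates, key=score)


# Hypothetical replicates (e.g. per-cell intensities from three fields of view)
data_rng = random.Random(42)
replicates = {
    "field_1": [data_rng.gauss(100, 15) for _ in range(30)],
    "field_2": [data_rng.gauss(100, 15) for _ in range(75)],
    "field_3": [data_rng.gauss(100, 15) for _ in range(200)],
}

# Use the smallest replicate's size as the common N
n = min(len(v) for v in replicates.values())
subsets = {name: fair_subset(v, n) for name, v in replicates.items()}

for name, sub in subsets.items():
    print(name, len(sub),
          round(statistics.fmean(sub), 2), round(statistics.stdev(sub), 2))
```

Pooling the resulting subsets gives each replicate equal weight, so a field of view with many cells no longer dominates the combined data. Other selection criteria (for example, matching the median or the full distribution shape) would fit the same structure by changing `score`.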

Published

Sep 3, 2019

DOI https://doi.org/10.14440/jbm.2019.299

How to Cite

Ortell KK, Switonski PM, Delaney JR. FairSubset: A tool to choose representative subsets of data for use with replicates or groups of different sample sizes. J Biol Methods [Internet]. 2019 Sep. 3 [cited 2024 Apr. 19];6(3):e118. Available from: https://jbmethods.org/index.php/jbm/article/view/299

Issue

Vol. 6 No. 3 (2019)

Section

Resources

Authors who publish with JBM agree to the following terms:

Authors retain copyright and grant JBM right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).
