THE SIMILARITY STRUCTURE: A NOVEL CHARACTERIZATION OF EFFECT SIZE FOR LARGE SAMPLES
  • Abel Sánchez-Jiménez,
  • Gonzalo Aparicio-Rodríguez,
  • Paloma Manubens,
  • Carlos Calvo-Tapia,
  • José Antonio Villacorta Atienza
Abel Sánchez-Jiménez
Unit of Biomathematics. Department of Biodiversity, Ecology and Evolution. Faculty of Biological Sciences, Complutense University of Madrid
Gonzalo Aparicio-Rodríguez
Unit of Biomathematics. Department of Biodiversity, Ecology and Evolution. Faculty of Biological Sciences, Complutense University of Madrid
Paloma Manubens
Unit of Biomathematics. Department of Biodiversity, Ecology and Evolution. Faculty of Biological Sciences, Complutense University of Madrid
Carlos Calvo-Tapia
Unit of Biomathematics. Department of Biodiversity, Ecology and Evolution. Faculty of Biological Sciences, Complutense University of Madrid
José Antonio Villacorta Atienza
Unit of Applied Mathematics. School of Optics and Optometry, Complutense University of Madrid, Unit of Biomathematics. Department of Biodiversity, Ecology and Evolution. Faculty of Biological Sciences, Complutense University of Madrid

Corresponding Author: [email protected]

Abstract

Statistical inference traditionally relies on p-values, which assess how compatible the data are with the absence of an effect. In large datasets, however, p-values lose relevance because even minor differences become statistically significant. For large data it is therefore imperative to evaluate the magnitude of the practical, clinical, or biological effect. Non-dimensional metrics such as Cohen's d allow general comparisons but can obscure practical meaning. Dimensional metrics, such as confidence intervals, lack standardization and may complicate practical interpretation. We propose a novel approach, termed the similarity structure, for characterizing differences in large samples. It focuses on the probability distribution of two subsamples being of size N, given that they are similar (statistically non-different). This quantifies the effect size as the expected sample size when similarity exists, irrespective of data nature, dimensionality, or hypothesis testing. Additionally, it can be translated into common measures such as Cohen's d and the sample size required for a statistical power of 0.9. Furthermore, the similarity structure allows effect sizes to be compared statistically, assessing the importance of the factors involved in sample differences. It thus supports the transparent and versatile assessment, interpretation, and comparison of effect sizes, contributing to more comprehensible and reproducible scientific research. The approach is demonstrated with real-world examples.
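As a point of reference for the two common measures the abstract mentions, the following minimal sketch computes Cohen's d with a pooled standard deviation and the approximate per-group sample size for a power of 0.9. It does not implement the similarity structure itself, whose definition is not given on this page. The function names `cohens_d` and `n_per_group`, the two-sided significance level of 0.05, and the normal-approximation formula n = 2(z_{1-α/2} + z_{power})² / d² for a two-sample comparison are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def n_per_group(d, alpha=0.05, power=0.9):
    """Approximate per-group sample size to detect effect size d.

    Assumption: two-sided two-sample test under the normal approximation,
    n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


if __name__ == "__main__":
    # Conventional small, medium, and large effect sizes.
    for d in (0.2, 0.5, 0.8):
        print(d, n_per_group(d))
```

Under these assumptions, a smaller d calls for a much larger sample to reach the same power (roughly 526, 85, and 33 per group for d = 0.2, 0.5, and 0.8), which is the kind of relationship between effect size and sample size that the abstract refers to.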
06 May 2024: Submitted to TechRxiv
09 May 2024: Published in TechRxiv