Abstract

This paper introduces a new approach to classifying gene expression data with the k-nearest neighbors (KNN) algorithm. High dimensionality and small sample sizes continue to pose significant challenges for conventional classification techniques, including KNN. In response, we propose the Relative Relation Metric (RRM), a novel metric that departs from traditional distances, which typically rely on direct numerical or spatial comparisons. RRM is instead based on counting relational changes between pairs of data points. It draws conceptual inspiration from Kendall's Tau and from Relative Expression Analysis, which identifies the most discriminating gene pairs between classes. We applied RRM to real gene expression datasets for disease classification and compared it with established metrics. The results of this preliminary study suggest that RRM has potential as an effective alternative for high-dimensional data classification, especially where robustness to methodological variations and to transformations of biological data is required.
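The abstract describes RRM only conceptually. As a hedged illustration, the sketch below assumes that a "relational change" is a gene pair (i, j) whose ordering in one sample differs from its ordering in the other. Under that assumption the count is essentially the Kendall tau distance between two expression profiles. The names `relative_relation_distance` and `knn_predict` are hypothetical, and the paper's actual definition (tie handling, normalization, restriction to selected gene pairs) may differ.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def _sign(v: float) -> int:
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def relative_relation_distance(x: Sequence[float], y: Sequence[float]) -> int:
    """Hypothetical RRM-style distance (assumption, not the paper's definition).

    Counts gene pairs (i, j), i < j, whose relation changes between samples,
    i.e. sign(x[i] - x[j]) != sign(y[i] - y[j]). Ties count as their own relation.
    """
    if len(x) != len(y):
        raise ValueError("samples must have the same number of genes")
    return sum(
        _sign(x[i] - x[j]) != _sign(y[i] - y[j])
        for i, j in combinations(range(len(x)), 2)
    )


def knn_predict(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    query: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Classify query by majority vote among its k nearest training samples."""
    # Sort training samples by distance only, so labels are never compared.
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: relative_relation_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Usage: four toy samples with four genes each (made-up illustrative values).
train_X = [
    [1.0, 2.0, 3.0, 4.0],
    [1.5, 2.5, 3.5, 4.5],
    [4.0, 3.0, 2.0, 1.0],
    [4.5, 3.5, 2.5, 1.0],
]
train_y = ["A", "A", "B", "B"]
query = [10.0, 20.0, 30.0, 40.0]  # same gene ordering as class A, different scale
print(knn_predict(train_X, train_y, query, k=3))  # prints A
```

Because only the ordering of expression values enters the count, applying any strictly increasing transformation to a sample (for example rescaling, or taking logarithms of positive values) leaves its distances unchanged. This is one way to read the abstract's point about robustness to data transformations. Note that this naive version examines every gene pair, so its cost grows quadratically with the number of genes.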

Recommended Citation

Kartowicz-Stolarska, I.J. & Czajkowski, M. (2024). Relative Relation in KNN Classification for Gene Expression Data. A Preliminary Study. In B. Marcinkowski, A. Przybylek, A. Jarzębowicz, N. Iivari, E. Insfran, M. Lang, H. Linger, & C. Schneider (Eds.), Harnessing Opportunities: Reshaping ISD in the post-COVID-19 and Generative AI Era (ISD2024 Proceedings). Gdańsk, Poland: University of Gdańsk. ISBN: 978-83-972632-0-8. https://doi.org/10.62036/ISD.2024.94

Paper Type

Full Paper

DOI

10.62036/ISD.2024.94


Relative Relation in KNN Classification for Gene Expression Data. A Preliminary Study