TY - JOUR
T1 - Integrative analysis of RNA expression data unveils distinct cancer types through machine learning techniques
AU - Alanazi, Saad Awadh
AU - Alshammari, Nasser
AU - Alruwaili, Maddalah
AU - Junaid, Kashaf
AU - Abid, Muhammad Rizwan
AU - Ahmad, Fahad
N1 - © 2023 Published by Elsevier B.V. on behalf of King Saud University.
PY - 2024/3/1
Y1 - 2024/3/1
N2 - Cancer is a highly complex and heterogeneous disease. Traditional methods of cancer classification based on histopathology have limitations in guiding personalized prognosis and therapy. Gene expression profiling provides a powerful approach to unraveling molecular intricacies and better-stratifying cancer subtypes. In this study, we performed an integrative analysis of RNA sequencing data from five cancer types - BRCA, KIRC, COAD, LUAD, and PRAD. A machine learning workflow consisting of dataset identification, normalization, feature selection, dimensionality reduction, clustering, and classification was implemented. The k-means algorithm was applied to categorize samples into distinct clusters based solely on gene expression patterns. Five unique clusters emerged from the unsupervised machine learning based analysis, significantly correlating with the known cancer types. BRCA aligned predominantly with one cluster, while COAD spanned three clusters. KIRC was represented within two main clusters. LUAD is associated strongly with a single cluster and PRAD with another cluster. This demonstrates the ability of machine learning approaches to unravel complex signatures within transcriptomic profiles that can delineate cancer subtypes. The proposed study highlights the potential of integrative analytics to derive meaningful biological insights from high-dimensional omics datasets. Molecular subtyping through machine learning clustering enhances our understanding of the intrinsic heterogeneities and pathways dysregulated in different cancers. Overall, this study exemplifies a powerful computational framework to classify gene expressions of patients having different types of cancers and guide personalized therapeutic decisions. Finally, Wide Neural Network demonstrates a significantly higher accuracy, achieving 99.834% on the validation set and an even more impressive 99.995% on the test set.
AB - Cancer is a highly complex and heterogeneous disease. Traditional methods of cancer classification based on histopathology have limitations in guiding personalized prognosis and therapy. Gene expression profiling provides a powerful approach to unraveling molecular intricacies and better-stratifying cancer subtypes. In this study, we performed an integrative analysis of RNA sequencing data from five cancer types - BRCA, KIRC, COAD, LUAD, and PRAD. A machine learning workflow consisting of dataset identification, normalization, feature selection, dimensionality reduction, clustering, and classification was implemented. The k-means algorithm was applied to categorize samples into distinct clusters based solely on gene expression patterns. Five unique clusters emerged from the unsupervised machine learning based analysis, significantly correlating with the known cancer types. BRCA aligned predominantly with one cluster, while COAD spanned three clusters. KIRC was represented within two main clusters. LUAD is associated strongly with a single cluster and PRAD with another cluster. This demonstrates the ability of machine learning approaches to unravel complex signatures within transcriptomic profiles that can delineate cancer subtypes. The proposed study highlights the potential of integrative analytics to derive meaningful biological insights from high-dimensional omics datasets. Molecular subtyping through machine learning clustering enhances our understanding of the intrinsic heterogeneities and pathways dysregulated in different cancers. Overall, this study exemplifies a powerful computational framework to classify gene expressions of patients having different types of cancers and guide personalized therapeutic decisions. Finally, Wide Neural Network demonstrates a significantly higher accuracy, achieving 99.834% on the validation set and an even more impressive 99.995% on the test set.
KW - gene
KW - RNA expression, cancer
KW - tumour, diagnosis
KW - machine learning
KW - clustering
KW - classification
U2 - 10.1016/j.sjbs.2023.103918
DO - 10.1016/j.sjbs.2023.103918
M3 - Article
C2 - 38283772
SN - 1319-562X
VL - 31
JO - Saudi Journal of Biological Sciences
JF - Saudi Journal of Biological Sciences
IS - 3
M1 - 103918
ER -