Abstract
Accurate classification of cancer-related biomedical abstracts is critical for advancing cancer informatics and supporting decision-making in healthcare research. Yet progress in this domain is often constrained by limited availability of labeled corpora and the high computational demands of transformer-based approaches. To address these challenges, we propose a Residual Graph Attention Network (R-GAT) that integrates multi-head attention with residual connections to capture semantic and relational dependencies in biomedical texts. Evaluated on a curated dataset of 1,875 PubMed abstracts spanning thyroid, colon, lung, and generic cancer topics, R-GAT achieves stable and competitive performance (macro-F1: 0.96 ± 0.01), comparable to transformer-based models such as BioBERT and BioClinicalBERT and strong classical baselines like Logistic Regression, while requiring significantly fewer computational resources. Ablation studies confirm the importance of attention and residual connections in ensuring robustness under limited-data conditions. To support reproducibility and facilitate future research, we also release the curated dataset. Together, these contributions demonstrate the value of lightweight graph-based architectures as reliable and resource-efficient alternatives to computationally intensive transformers in biomedical NLP.
| Original language | English |
|---|---|
| Article number | 6582 |
| Number of pages | 10 |
| Journal | Scientific Reports |
| Volume | 26 |
| DOIs | |
| Publication status | Published - 17 Feb 2026 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
-
SDG 3 Good Health and Well-being
Fingerprint
Dive into the research topics of 'R-GAT: cancer document classification leveraging graph-based residual network for scenarios with limited data'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver