Localised Ensemble Learning (LEL) - A localised approach to class imbalance

Research output: Contribution to journal › Article › peer-review

Abstract

Class imbalance is a persistent challenge in machine learning, often degrading model performance by skewing predictions toward majority classes. Traditional approaches typically focus on global correction strategies, which may overlook important localised irregularities within the data. These global methods may fail to adapt to the varying characteristics of individual samples, limiting their effectiveness in complex real-world scenarios. To address these limitations, we propose Localised Ensemble Learning (LEL), a novel framework that incorporates local structural information into the learning process. LEL begins by applying K-Nearest Neighbours (KNN) to assign each sample a Sample Type based on specific rules that capture neighbourhood distribution, distance-based imbalance, and sample quality. This Sample Type feature is then used to partition the dataset into distinct subsets, each of which is treated with a tailored imbalance mitigation strategy. Individual models are trained on these subsets, and their predictions are integrated into a final ensemble, allowing LEL to address different forms of localised imbalance in a principled and modular way. The effectiveness of LEL is validated through a comprehensive quantitative evaluation against global correction strategies, namely SMOTE, NOGAN, Cost-Sensitive Learning (CSL), and ensemble methods, across multiple metrics including Recall, Precision, F1 Score, Kappa, and G-Mean. Statistical significance of the results is assessed using paired t-tests and the Wilcoxon signed-rank test. SHAP (SHapley Additive exPlanations) values are employed to analyse feature contributions, revealing the Sample Type feature as a critical determinant of model performance. Additionally, an ablation study highlights the impact of key parameters, such as the k value in KNN, providing further insight into the robustness and adaptability of the LEL framework. Experimental results demonstrate that LEL consistently outperforms global approaches across all tested classifiers, including Random Forest, Decision Tree, XGBoost, and Naive Bayes. LEL achieves statistically significant improvements in recall and precision, underscoring its ability to handle localised forms of imbalance effectively, which in turn translates into gains on global imbalance. The findings emphasise the importance of addressing localised data defects and of leveraging features such as Sample Type, which capture complex relationships and enhance predictive accuracy.
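The neighbourhood-based Sample Type step described above can be sketched as follows. This is a minimal illustrative implementation only: the paper's exact typing rules (which also incorporate distance-based imbalance and sample quality) are not given in the abstract, so the fraction-of-same-class thresholds and the type labels ("safe", "borderline", "rare") used here are assumptions for illustration.

```python
import math

def knn_indices(X, i, k):
    # Indices of the k nearest neighbours of sample i (Euclidean distance,
    # excluding the sample itself).
    d = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in d[:k]]

def sample_types(X, y, k=5):
    """Assign each sample a coarse Sample Type from its k-NN label mix.
    Thresholds are illustrative, not the paper's exact rules."""
    types = []
    for i in range(len(X)):
        same = sum(1 for j in knn_indices(X, i, k) if y[j] == y[i])
        frac = same / k
        if frac >= 0.8:
            types.append("safe")        # neighbourhood dominated by own class
        elif frac >= 0.4:
            types.append("borderline")  # mixed neighbourhood
        else:
            types.append("rare")        # mostly surrounded by the other class
    return types

# Tiny synthetic example: two clusters plus one minority point
# inside the opposite cluster.
X = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
     (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
     (0.05, 0.05)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
types = sample_types(X, y, k=3)
# The intruding class-1 point is typed "rare"; the pure cluster is "safe".
```

In the full framework, these type labels would then drive the partitioning: each subset receives its own imbalance treatment and model, and the per-subset predictions are combined into the final ensemble.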
Original language: English
Journal: Pattern Analysis and Applications
Publication status: Accepted for publication - 29 Aug 2025

Keywords

  • Class Imbalance
  • Classification
  • Random Forest
  • Nearest Neighbour
