Enhancing text classification through grammar-based feature engineering and learning models

Alaa Mohasseb, Andreas Kanavos, Eslam Amer

Research output: Contribution to journal › Article › peer-review


Abstract

Text classification remains a challenging task in natural language processing (NLP) due to linguistic complexity and data imbalance. This study proposes a hybrid approach that integrates grammar-based feature engineering with deep learning and transformer models to enhance classification performance. A dataset of factoid and non-factoid questions, further categorised into causal, choice, confirmation, hypothetical, and list types, is used to evaluate several models, including CNNs, BiLSTMs, MLPs, BERT, DistilBERT, Electra, and GPT-2. Grammatical and domain-specific features are explicitly extracted and leveraged to improve multi-class classification. To address class imbalance, the SMOTE algorithm is applied, significantly boosting the recall and F1-score for minority classes. Experimental results show that DistilBERT achieves the highest binary classification accuracy (94%), while BiLSTM and CNN outperform transformers in multi-class settings, reaching up to 92% accuracy. These findings confirm that grammar-based features provide critical syntactic and semantic insights, enhancing model robustness and interpretability beyond conventional embeddings.
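As a rough illustration of the kind of pipeline the abstract describes, the sketch below pairs grammar-based features (part-of-speech tag n-grams via NLTK) with SMOTE oversampling from imbalanced-learn and a simple linear classifier as a stand-in model. This is a minimal toy example, not the authors' implementation: the pos_features helper, the library choices, and the tiny question set are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code): grammar-based features + SMOTE.
# Assumes nltk, scikit-learn, and imbalanced-learn are installed.
import nltk
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE

# Fetch tokenizer/tagger resources; both old and new NLTK resource names
# are requested so the sketch works across NLTK versions.
for pkg in ("punkt", "punkt_tab",
            "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(pkg, quiet=True)

def pos_features(text: str) -> str:
    # Replace each token with its part-of-speech tag so the vectorizer
    # captures grammatical structure rather than lexical content.
    tokens = nltk.word_tokenize(text)
    return " ".join(tag for _, tag in nltk.pos_tag(tokens))

# Imbalanced toy set covering the five question types from the abstract.
questions = [
    "Why does ice float on water?",                # causal
    "Why do leaves change colour in autumn?",      # causal
    "Should I take the bus or the train?",         # choice
    "Would you prefer tea or coffee?",             # choice
    "Is Paris the capital of France?",             # confirmation
    "Is the sky blue?",                            # confirmation
    "Does water boil at 100 degrees Celsius?",     # confirmation
    "Is Mount Everest the tallest mountain?",      # confirmation
    "What if humans could photosynthesize?",       # hypothetical
    "What would happen if the moon disappeared?",  # hypothetical
    "List the planets of the solar system.",       # list
    "Name the countries that border France.",      # list
]
labels = ["causal", "causal", "choice", "choice",
          "confirmation", "confirmation", "confirmation", "confirmation",
          "hypothetical", "hypothetical", "list", "list"]

# Vectorize POS-tag unigrams and bigrams (the grammar-based features).
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(pos_features(q) for q in questions)

# Oversample minority classes toward the majority ("confirmation");
# k_neighbors must be smaller than the smallest class size.
X_res, y_res = SMOTE(k_neighbors=1, random_state=0).fit_resample(X, labels)

clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print(clf.predict(vectorizer.transform([pos_features("Why do cats purr?")])))
```

In the study itself, the classifier slot is filled by the stronger models the abstract lists (CNNs, BiLSTMs, MLPs, and the transformer variants); the logistic regression here only keeps the sketch self-contained.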
Original language: English
Article number: 424
Number of pages: 26
Journal: Information
Volume: 16
Issue number: 6
Early online date: 22 May 2025
DOIs
Publication status: Published - 1 Jun 2025

Keywords

  • Text Classification
  • Deep Learning
  • Transformer Models
  • Grammar-Based Feature Engineering
  • Natural Language Processing (NLP)
  • SMOTE
  • Question Classification

