Abstract
In the era of widespread Internet use and extensive social media interaction, the digital realm is accumulating vast amounts of unstructured text data. This unstructured data often contains undesirable information, necessitating time-consuming manual classification efforts. An intelligent text classification system capable of automatically categorizing digitized texts based on semantic meaning is therefore crucial. However, this task is particularly challenging for low-resource languages like Bengali due to a shortage of annotated corpora, issues with out-of-vocabulary words, a lack of domain-specific hyperparameter tuning, limited ability to extract generalized text features, and class imbalances within the corpus. This work proposes AFuNet, an attention-based fusion network for classifying texts in a resource-constrained language. AFuNet undergoes a comprehensive four-phase experimental process comprising baseline model evaluation and hyperparameter tuning, late fusion and model selection, attention-based early fusion and model identification, and an ablation study with impact analysis. Fine-tuned on five Bengali text classification corpora, AFuNet achieves accuracies of 96.60 ± 0.2% (BTCC11), 85.37 ± 0.2% (OSBC), 97.35 ± 0.2% (BARD), 93.74 ± 0.2% (IndicNLP), and 96.51 ± 0.2% (ProthomAlo). Compared with previous state-of-the-art models on these corpora, AFuNet delivers accuracy improvements ranging from 0.54% to 4.49%, demonstrating its effectiveness in advancing text classification for the Bengali language.
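To make the "attention-based early fusion" idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual architecture; class names, dimensions, and pooling choice are all assumptions): token-level features from two backbone branches are concatenated and passed through multi-head self-attention before a classification head.

```python
# Hypothetical sketch of attention-based early fusion; all names and
# hyperparameters here are illustrative assumptions, not AFuNet's.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dim=64, heads=4, num_classes=5):
        super().__init__()
        # Multi-head self-attention over the fused feature sequence
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, feats_a, feats_b):
        # Early fusion: concatenate token-level features from both branches
        fused = torch.cat([feats_a, feats_b], dim=1)   # (B, La+Lb, dim)
        attended, _ = self.attn(fused, fused, fused)   # attend across the fused sequence
        pooled = attended.mean(dim=1)                  # mean-pool to one vector per example
        return self.cls(pooled)                        # class logits

model = AttentionFusion()
a = torch.randn(2, 10, 64)   # branch-A features: batch of 2, 10 tokens each
b = torch.randn(2, 12, 64)   # branch-B features: 12 tokens each
logits = model(a, b)         # shape (2, num_classes)
```

In a late-fusion variant, by contrast, each branch would produce its own logits and only those outputs would be combined, e.g. by averaging.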
| Original language | English |
|---|---|
| Pages (from-to) | 6725-6748 |
| Number of pages | 24 |
| Journal | Neural Computing and Applications |
| Volume | 37 |
| Issue number | 9 |
| Early online date | 23 Jan 2025 |
| DOIs | |
| Publication status | Published (in print/issue) - 1 Mar 2025 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2025.
Keywords
- Early fusion
- Low-resource languages
- Multi-head attention
- Natural language processing
- Text classification
- Transformer-based learning