Abstract
Multilabel aggressive text on social media has a detrimental societal impact, prompting government agencies and technology companies to take measures against its spread. Research to date has focused on high-resource languages such as English, leaving low-resource languages such as Bengali underexplored. To aid research in this area, this work presents a transformer-based technique for classifying multilabel aggressive Bengali texts by their targets. A dataset (EM-BAD) of 13,728 texts is developed and labeled with five target classes: Religious Aggression (ReAG), Political Aggression (PoAG), Verbal Aggression (VeAG), Gender Aggression (GeAG), and Racial Aggression (RaAG). Experimental results demonstrate that Bangla-BERT with an adjusted pooling layer and fine-tuning outperforms all ML, DL, and transformer-based baselines as well as existing techniques, achieving the highest weighted F1-score of 0.89 on the multilabel aggressive text classification task.
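To make the evaluation setting concrete, the following is a minimal sketch of how multilabel targets over the five classes might be encoded and how a weighted F1-score could be computed. It assumes that "weighted" means the support-weighted mean of per-class F1; the abstract does not spell out the exact computation. The helper names `to_multi_hot` and `weighted_f1` are hypothetical, and the toy annotations are made up rather than drawn from EM-BAD.

```python
# Class codes come from the abstract; everything else is illustrative.
LABELS = ["ReAG", "PoAG", "VeAG", "GeAG", "RaAG"]


def to_multi_hot(tags):
    """Encode a set of class codes as a 0/1 vector ordered like LABELS."""
    return [1 if label in tags else 0 for label in LABELS]


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 over multi-hot rows (assumed definition)."""
    total_support = 0
    weighted_sum = 0.0
    for j in range(len(LABELS)):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        support = tp + fn  # number of true instances of this class
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        weighted_sum += support * f1
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Made-up toy annotations: a text may carry more than one target class.
y_true = [
    to_multi_hot({"ReAG", "VeAG"}),
    to_multi_hot({"PoAG"}),
    to_multi_hot({"VeAG", "GeAG"}),
]
y_pred = [
    to_multi_hot({"ReAG"}),
    to_multi_hot({"PoAG", "VeAG"}),
    to_multi_hot({"VeAG", "GeAG"}),
]
score = weighted_f1(y_true, y_pred)
print(round(score, 3))
```

Weighting by support means frequent classes dominate the score, which is worth keeping in mind when class sizes in a corpus are imbalanced.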
Original language | English |
---|---|
Title of host publication | 2023 26th International Conference on Computer and Information Technology (ICCIT) |
Publisher | IEEE |
Pages | 1-6 |
Number of pages | 6 |
ISBN (Electronic) | 979-8-3503-5901-5 |
ISBN (Print) | 979-8-3503-5902-2 |
Publication status | Published online - 27 Feb 2024 |
Publication series
Name | 2023 26th International Conference on Computer and Information Technology (ICCIT) |
---|---|
Publisher | IEEE Control Society |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Keywords
- Natural language processing
- Aggressive text classification
- Deep learning
- Text processing
- Text corpora