Abstract
Authorship Attribution (AA) identifies the author of a given text from a pool of suspects, a task that has grown in importance with the widespread use of the internet and electronic devices. However, most AA research has focused on high-resource languages such as English, leaving low-resource languages such as Bengali relatively unexplored. Challenges in this domain include the absence of benchmark corpora, a lack of context-aware feature extractors, limited availability of tuned hyperparameters, and out-of-vocabulary (OOV) issues. To address these challenges, this study introduces AuthorNet, a model for authorship attribution using attention-based early fusion of transformer-based language models, i.e., the concatenation of the embedding outputs of two fine-tuned existing models. AuthorNet consists of three key modules: feature extraction; fine-tuning and selection of the best-performing models; and attention-based early fusion. To evaluate the performance of AuthorNet, a series of experiments was conducted on four benchmark corpora. The results demonstrate exceptional accuracy: 98.86 ± 0.01%, 99.49 ± 0.01%, 97.91 ± 0.01%, and 99.87 ± 0.01% on the four corpora. Notably, AuthorNet outperformed all foundation models, with accuracy improvements ranging from 0.24% to 2.92% across the four corpora.
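The early-fusion idea described in the abstract can be illustrated with a minimal sketch: token embeddings from two (fine-tuned) encoders are concatenated along the feature axis, and attention is applied over the fused representation. This is not the paper's implementation; it assumes single-head scaled dot-product attention, toy dimensions, and random matrices standing in for real transformer outputs and learned projection weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_embeddings(emb_a, emb_b, w_q, w_k, w_v):
    """Early fusion: concatenate per-token embeddings from two encoders,
    then apply single-head scaled dot-product self-attention.
    (Hypothetical sketch; the paper uses multi-head attention.)"""
    fused = np.concatenate([emb_a, emb_b], axis=-1)   # (seq_len, d_a + d_b)
    q, k, v = fused @ w_q, fused @ w_k, fused @ w_v   # project to d_model
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (seq_len, seq_len)
    return scores @ v                                  # attended fused features

# Toy dimensions and random inputs (stand-ins for real encoder outputs).
rng = np.random.default_rng(0)
seq, d_a, d_b, d_model = 8, 16, 16, 32
emb_a = rng.standard_normal((seq, d_a))  # e.g. embeddings from model A
emb_b = rng.standard_normal((seq, d_b))  # e.g. embeddings from model B
w_q = rng.standard_normal((d_a + d_b, d_model))
w_k = rng.standard_normal((d_a + d_b, d_model))
w_v = rng.standard_normal((d_a + d_b, d_model))

out = fuse_embeddings(emb_a, emb_b, w_q, w_k, w_v)
print(out.shape)  # (8, 32)
```

In a full pipeline, the attended representation would typically be pooled and passed to a classification head over the candidate authors.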
| Original language | English |
|---|---|
| Article number | 125643 |
| Pages (from-to) | 1-16 |
| Number of pages | 16 |
| Journal | Expert Systems with Applications |
| Volume | 262 |
| Early online date | 4 Nov 2024 |
| DOIs | |
| Publication status | Published (in print/issue) - 1 Mar 2025 |
Bibliographical note
Publisher Copyright: © 2024 Elsevier Ltd
Data Access Statement
Data will be made available on request.
Funding
This work is supported by the Directorate of Research and Extension (DRE) of Chittagong University of Engineering & Technology (CUET), Chittagong, Bangladesh.
Keywords
- Natural language processing
- Fine-tuning
- Hyperparameter tuning
- Early fusion
- Multi-head attention
- Low-resource language
- Authorship attribution