AuthorNet: Leveraging attention-based early fusion of transformers for low-resource authorship attribution

Md. Rajib Hossain, Mohammed Moshiul Hoque, M. Ali Akber Dewan, Enamul Hoque, Nazmul Siddique

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)
1 Downloads (Pure)

Abstract

Authorship Attribution (AA) is crucial for identifying the author of a given text from a pool of suspects, especially with the widespread use of the internet and electronic devices. However, most AA research has focused primarily on high-resource languages such as English, leaving low-resource languages such as Bengali relatively unexplored. Challenges in this domain include the absence of benchmark corpora, a lack of context-aware feature extractors, the limited availability of tuned hyperparameters, and out-of-vocabulary (OOV) issues. To address these challenges, this study introduces AuthorNet, which performs authorship attribution using attention-based early fusion of transformer-based language models, i.e., concatenating the embedding outputs of two fine-tuned pre-trained models. AuthorNet consists of three key modules: feature extraction, fine-tuning and selection of the best-performing models, and attention-based early fusion. To evaluate the performance of AuthorNet, a number of experiments using four benchmark corpora have been conducted. The results demonstrated exceptional accuracy: 98.86 ± 0.01%, 99.49 ± 0.01%, 97.91 ± 0.01%, and 99.87 ± 0.01% on the four corpora, respectively. Notably, AuthorNet outperformed all foundation models, achieving accuracy improvements ranging from 0.24% to 2.92% across the four corpora.
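The early-fusion idea described in the abstract — concatenating token-level embedding outputs of two fine-tuned encoders and applying attention over the fused representation — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the choice of single-head scaled dot-product attention (the paper uses multi-head attention), mean pooling, and the encoder names in the comments are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def early_fusion_attention(emb_a, emb_b):
    """Concatenate token embeddings from two encoders (early fusion),
    then apply scaled dot-product self-attention over the fused sequence
    and mean-pool into a single document vector."""
    # Fuse along the feature dimension: (seq_len, d_a + d_b)
    fused = np.concatenate([emb_a, emb_b], axis=-1)
    d = fused.shape[-1]
    # Self-attention weights: (seq_len, seq_len)
    scores = softmax(fused @ fused.T / np.sqrt(d))
    # Attended token representations: (seq_len, d_a + d_b)
    attended = scores @ fused
    # Pool over tokens to get one document-level vector
    return attended.mean(axis=0)

# Hypothetical token embeddings from two fine-tuned transformers
# (e.g. two Bengali-capable encoders, each producing 768-dim tokens).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(8, 768))
emb_b = rng.normal(size=(8, 768))
doc_vec = early_fusion_attention(emb_a, emb_b)
print(doc_vec.shape)  # (1536,)
```

In a full pipeline, the pooled vector would feed a softmax classification head over the candidate authors; here the random inputs only illustrate the fusion shapes.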
Original language: English
Article number: 125643
Pages (from-to): 1-16
Number of pages: 16
Journal: Expert Systems with Applications
Volume: 262
Early online date: 4 Nov 2024
DOIs
Publication status: Published (in print/issue) - 1 Mar 2025

Bibliographical note

Publisher Copyright:
© 2024 Elsevier Ltd

Data Access Statement

Data will be made available on request.

Funding

This work is supported by the Directorate of Research and Extension (DRE) of Chittagong University of Engineering & Technology (CUET), Chittagong, Bangladesh.

Keywords

  • Natural language processing
  • Fine-tuning
  • Hyperparameter tuning
  • Early fusion
  • Multi-head attention
  • Low-resource language
  • Authorship attribution

