Abstract
Homophobic speech is a form of hate speech. Social media enables hate speech to spread rapidly and widely through the internet, and, unlike offline hate speech, online hate speech can persist indefinitely, thereby prolonging its impact. Due to this adverse impact, policymakers have called for greater action from online platforms to moderate and remove hate speech, including homophobic content. While homophobic hate speech is prevalent in online soccer discourse, there are few studies on this empirical context in general, and specifically on the use of Large Language Models (LLMs) for detecting such speech. This study addresses this gap by proposing a homophobic speech text classification pipeline. We introduce H-DICT, a new general dictionary for identifying potential homophobic content in documents, and leverage this dictionary to curate and manually label a dataset of homophobic and non-homophobic samples from the UEFA European Football Championships (the Euros) discourse on Twitter. We fine-tune and evaluate five LLMs based on the BERT architecture (BERT, DistilBERT, RoBERTa, BERT Hate, and RoBERTa Offensive) and use Integrated Gradients, an explainable AI technique, to explain each model’s predictions. RoBERTa Offensive, an LLM fine-tuned specifically for detecting offensive language, achieved the best performance of the five models.
Original language | English |
---|---|
Title of host publication | Social Networks Analysis and Mining |
Subtitle of host publication | 16th International Conference, ASONAM 2024, Rende, Italy, September 2–5, 2024, Proceedings, Part I |
Editors | Luca Maria Aiello, Tanmoy Chakraborty, Sabrina Gaito |
Pages | 489-504 |
Number of pages | 16 |
Volume | 1 |
ISBN (Electronic) | 978-3-031-78541-2 |
Publication status | Published (in print/issue) - 24 Jan 2025 |
Event | 16th International Conference 2024, Rende, Italy; Duration: 2 Sept 2024 → 5 Sept 2024; Conference number: 16 |
Publication series
Name | Lecture Notes in Computer Science (LNCS, volume 15211) |
---|---|
Publisher | Springer |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 16th International Conference 2024 |
---|---|
Abbreviated title | ASONAM |
Country/Territory | Italy |
City | Rende |
Period | 2/09/24 → 5/09/24 |
Keywords
- soccer
- hate speech classification
- homophobic speech
- large language models
- explainable AI