Detecting Homophobic Speech in Soccer Tweets using Large Language Models and Explainable AI

Guto Leoni Santos, Vitor Gaboardi dos Santos, Colm Kearns, Gary Sinclair, Jack Black, Mark Doidge, Thomas Fletcher, Daniel Kilvington, Katie Liston, Patricia Takako Endo, Theo Lynn

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Homophobic speech is a form of hate speech. Social media enables hate speech to spread rapidly and widely, and unlike offline hate speech, online hate speech can persist indefinitely, thereby prolonging its impact. Due to this adverse impact, policymakers have called for greater action from online platforms to moderate and remove hate speech, including homophobic content. While homophobic hate speech is prevalent in online soccer discourse, few studies examine this empirical context in general, and fewer still examine the use of Large Language Models (LLMs) for detecting such speech. This study addresses this gap by proposing a homophobic speech text classification pipeline. We introduce H-DICT, a new general dictionary for identifying potential homophobic content in documents, and leverage this dictionary to curate and manually label a dataset of homophobic and non-homophobic samples from the UEFA European Football Championships (the Euros) discourse on Twitter. We fine-tune and evaluate five LLMs based on the BERT architecture - BERT, DistilBERT, RoBERTa, BERT Hate, and RoBERTa Offensive - and use Integrated Gradients, an explainable AI technique, to explain each model’s predictions. RoBERTa Offensive, an LLM fine-tuned specifically for detecting offensive language, achieved the best performance of the five models.
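
As an illustration of the Integrated Gradients technique named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a toy logistic-regression classifier in place of a fine-tuned BERT model; the weights, bias, input features, and all-zero baseline are hypothetical values chosen only for demonstration. The attribution for each feature is the difference between the input and the baseline, multiplied by the average gradient along the straight path between them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, weights, bias):
    """Toy stand-in for a classifier: logistic regression over feature values."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def gradient(x, weights, bias):
    """Analytic gradient of the toy model's output with respect to each input."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th interval on [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grads = gradient(point, weights, bias)
        totals = [t + g for t, g in zip(totals, grads)]
    # Scale the average gradient by the input-minus-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


# Hypothetical parameters and input, for illustration only.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.5
x = [1.0, 1.0, 0.0]
baseline = [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, WEIGHTS, BIAS)
score_change = model(x, WEIGHTS, BIAS) - model(baseline, WEIGHTS, BIAS)
print(attributions, score_change)
```

In this sketch the attributions sum approximately to the change in model output between the baseline and the input, which is the completeness property of Integrated Gradients. For a transformer classifier, the same path integral would typically be taken over token embeddings rather than raw feature values.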
Original language: English
Title of host publication: Social Networks Analysis and Mining
Subtitle of host publication: 16th International Conference, ASONAM 2024, Rende, Italy, September 2–5, 2024, Proceedings, Part I
Editors: Luca Maria Aiello, Tanmoy Chakraborty, Sabrina Gaito
Pages: 489-504
Number of pages: 16
Volume: 1
ISBN (Electronic): 978-3-031-78541-2
Publication status: Published (in print/issue) - 24 Jan 2025
Event: 16th International Conference 2024 - Rende, Italy
Duration: 2 Sept 2024 – 5 Sept 2024
Conference number: 16

Publication series

Name: Lecture Notes in Computer Science (LNCS, volume 15211)
Publisher: Springer
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 16th International Conference 2024
Abbreviated title: ASONAM
Country/Territory: Italy
City: Rende
Period: 2/09/24 – 5/09/24

Keywords

  • soccer
  • hate speech classification
  • homophobic speech
  • large language models
  • explainable AI
