
Unsupervised Video Anomaly Detection with Swin Transformer and Temporal-Context Modeling

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Video anomaly detection has broad applications in public safety, health monitoring, and emergency response. Existing transformer-based methods often struggle to capture multi-scale spatiotemporal patterns effectively. In this paper, we propose an unsupervised video anomaly detection framework that enhances feature representation by combining a 3D Swin Transformer with temporal shift modules and dynamic large kernel (DLK) convolutions. The 3D encoder-decoder structure models normal behavior from video sequences, and anomalies are identified through reconstruction errors. We validate our method on three public datasets: ShanghaiTech, Avenue, and Ped2. Our model achieves competitive accuracy, demonstrating improved capacity for capturing temporal dynamics and contextual details.
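The abstract states that anomalies are identified through reconstruction errors but does not give the scoring rule. The following is a minimal sketch of one common convention in reconstruction-based video anomaly detection, not the authors' implementation: compute a per-frame mean squared error, convert it to PSNR, and min-max normalize across the video so that poorly reconstructed frames score close to 1. The function names, the grayscale list-of-lists frame format, and the pixel range [0, 1] are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

# Assumption: a frame is a 2D grid of grayscale pixel values in [0, 1].
Frame = Sequence[Sequence[float]]


def frame_mse(frame: Frame, recon: Frame) -> float:
    """Mean squared error between a frame and its reconstruction."""
    total = 0.0
    count = 0
    for row, recon_row in zip(frame, recon):
        for p, q in zip(row, recon_row):
            total += (p - q) ** 2
            count += 1
    return total / count


def psnr(mse: float, peak: float = 1.0, eps: float = 1e-12) -> float:
    """Peak signal-to-noise ratio in dB; eps avoids division by zero."""
    return 10.0 * math.log10(peak ** 2 / (mse + eps))


def anomaly_scores(frames: Sequence[Frame], recons: Sequence[Frame]) -> List[float]:
    """Per-frame scores in [0, 1]; higher means worse reconstruction."""
    values = [psnr(frame_mse(f, r)) for f, r in zip(frames, recons)]
    lo, hi = min(values), max(values)
    if hi == lo:
        # All frames reconstructed equally well: no frame stands out.
        return [0.0 for _ in values]
    # Low PSNR (large error) maps to a score near 1.
    return [1.0 - (v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    flat = [[0.5, 0.5], [0.5, 0.5]]
    frames = [flat, flat, flat]
    recons = [
        [[0.5, 0.5], [0.5, 0.5]],  # exact reconstruction
        [[0.5, 0.5], [0.5, 0.6]],  # small error
        [[0.9, 0.1], [0.9, 0.1]],  # large error, treated as anomalous
    ]
    print(anomaly_scores(frames, recons))
```

In this sketch the normalization is done per video, so scores are only comparable within one sequence; a threshold on the score would then separate normal from anomalous frames.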
Original language: English
Title of host publication: 2025 Asian Conference on Artificial Intelligence Technology (ACAIT)
Publisher: IEEE
Pages: 1468-1472
Number of pages: 5
ISBN (Electronic): 979-8-3315-8787-1
ISBN (Print): 979-8-3315-8788-8
Publication status: Published online - 20 May 2026
Event: 2025 Asian Conference on Artificial Intelligence Technology (ACAIT) - Ordos, China
Duration: 12 Sept 2025 to 14 Sept 2025

Publication series

Name: 2025 Asian Conference on Artificial Intelligence Technology (ACAIT)
Publisher: IEEE Control Society

Conference

Conference: 2025 Asian Conference on Artificial Intelligence Technology (ACAIT)
Country/Territory: China
City: Ordos
Period: 12/09/25 to 14/09/25

Keywords

  • unsupervised learning
  • Swin Transformer
  • Spatio-Temporal Modeling
  • multi-scale representation
