Abstract
Surveillance cameras are widely used in both the private and public sectors for security and monitoring. Closed-circuit television (CCTV) systems generate large amounts of video data that cannot be monitored manually around the clock, and traditional manual analysis is time-consuming and inefficient. There is therefore a growing need for automated surveillance systems that can recognize and classify anomalies. Anomaly detection (AD), which identifies data that do not conform to normal patterns, remains one of the most challenging research areas. Recurrent neural networks (RNNs) are slow and struggle to identify road anomalies that occur across multiple frames simultaneously, whereas convolutional neural networks (CNNs) are limited in extracting temporal features of objects and generally fail to account for background noise in video frames. In this study, a new background-removal framework is presented that discards irrelevant background elements during object recognition. The framework preserves temporal and spatial information across frames and combines YOLOv8 and a spatial-temporal adaptive fusion method with an end-to-end model, built on a CNN encoder and a Transformer decoder, for parallel video analysis. The proposed method was evaluated on the UCF Crime dataset and a custom Road Anomaly Dataset (RAD), achieving an accuracy of 89.90% on UCF Crime and 98.28% on RAD.
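The abstract does not specify how background removal is implemented. As a minimal sketch, assuming an object detector (for example YOLOv8) has already produced per-frame bounding boxes in pixel coordinates `(x1, y1, x2, y2)`, detection-guided background masking could zero out every pixel outside the detected boxes while keeping the frame order intact for a downstream spatio-temporal model. The helper `mask_background` is hypothetical and uses plain NumPy, not any YOLOv8 API; it is not the authors' code.

```python
import numpy as np


def mask_background(clip: np.ndarray, boxes_per_frame: list) -> np.ndarray:
    """Hypothetical helper: keep only pixels inside detected object boxes.

    clip: array of shape (T, H, W, C) holding T consecutive video frames.
    boxes_per_frame: list of length T; entry t is a list of (x1, y1, x2, y2)
        pixel boxes for frame t (assumed to come from an object detector,
        which is not shown here).
    Returns a new array with the same shape and dtype. The temporal axis is
    preserved so later stages still see all T frames in order.
    """
    if clip.ndim != 4:
        raise ValueError("clip must have shape (T, H, W, C)")
    if len(boxes_per_frame) != clip.shape[0]:
        raise ValueError("need exactly one box list per frame")

    T, H, W, _ = clip.shape
    keep = np.zeros((T, H, W), dtype=bool)
    for t, boxes in enumerate(boxes_per_frame):
        for x1, y1, x2, y2 in boxes:
            # Clamp coordinates to the frame so out-of-range boxes are safe.
            x1, x2 = max(0, int(x1)), min(W, int(x2))
            y1, y2 = max(0, int(y1)), min(H, int(y2))
            if x1 < x2 and y1 < y2:
                keep[t, y1:y2, x1:x2] = True

    # Broadcast the (T, H, W) mask over the channel axis; background becomes 0.
    return np.where(keep[..., None], clip, 0).astype(clip.dtype)
```

For example, with a white 2-frame clip and one box only in the first frame, the first frame retains the boxed region and the second frame is fully masked. Masking rather than cropping is a design choice here: it keeps every frame the same size, so the clip can still be batched as a single tensor.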
| Original language | English |
|---|---|
| Article number | 45341 |
| Pages (from-to) | 1-14 |
| Number of pages | 14 |
| Journal | Scientific Reports |
| Volume | 15 |
| Issue number | 1 |
| Early online date | 25 Nov 2025 |
| DOIs | |
| Publication status | Published (in print/issue) - 30 Dec 2025 |
Bibliographical note
© 2025. The Author(s).
Funding
This work was supported by the Ongoing Research Funding program (ORF-2025-893), King Saud University, Riyadh, Saudi Arabia.
Keywords
- Smart transportation system
- CNN
- Deep learning
- Transfer learning
- YOLO
- Road anomaly detection