Computational analysis of visual social signals

  • Shane Reid

Student thesis: Doctoral Thesis

Abstract

The ability to automatically understand and predict human behaviour is increasingly important in the modern world. One approach is social signal processing, a computing domain aimed at understanding and interpreting the observable behaviours that we display every day. By understanding these social signals, it is possible to make predictions about an individual’s future behaviour. Recent advances in computer vision and machine learning mean that such analysis and prediction can now be automated.

The aim of this thesis is to explore and demonstrate how a social signal processing approach can be used in video surveillance contexts. This involves exploring the use of these social signals not just in a carefully controlled setting, but in realistic surveillance environments. In order to demonstrate the viability of social signal modelling in these contexts, we explore the problem of shoplifting detection and demonstrate that extraction of social signal features related to shoplifting could be used to predict such behaviour.

The automated detection and classification of human activities from video has been an open problem in computer vision for over two decades. However, most past attempts to use computer vision to detect social signals have had a number of drawbacks. The recent emergence of deep learning techniques has enabled the development of methods for the fast extraction of skeletal keypoint features, which (alongside continued increases in computing power) has been a breakthrough in enabling real-time methods for human activity recognition that can readily scale to multiple people. Therefore, we investigated the use of keypoint features to detect human-activity-based social signals from a low-resolution video dataset.

As the outcome of this work, a social signal model for the problem of shoplifting detection has been developed. A model for head pose estimation has been presented and explored for difficult contexts, such as where there are facial obfuscations. Finally, models for human activity have been developed and explored with low-resolution, low-frame-rate surveillance footage.

Date of Award: 2025
Original language: English
Supervisor: Dermot Kerr (Supervisor), Sonya Coleman (Supervisor), Philip Vance (Supervisor) & Siobhan O'Neill (Supervisor)

Keywords

  • social signal processing
