Using Computer Vision to Apply Activity Recognition Techniques to the Monitoring of Emotional Wellbeing

Research output: Contribution to conference › Abstract › peer-review


Abstract

The World Health Organization predicts that by 2050 those over the age of 60 will account for 22% of the world's population, up from 12% in 2015. Accompanying this demographic change will be an increase in age-related health conditions such as frailty, dementia and Parkinson’s disease, which require long-term monitoring and treatment. Dementia and Parkinson’s disease are both progressive neurological conditions. Dementia commonly presents as changes in mood and behaviour and as memory problems. The symptoms of Parkinson’s disease include body tremors and slow, impaired movement, as well as depression and anxiety. The ability to monitor the emotions and emotional wellbeing of those affected by these conditions on a long-term basis would allow clinicians, carers and family members to observe trends, supporting diagnosis, management, medication and the provision of care.

A variety of methods are used to recognise emotion, including video analysis to observe visual and audio signals (if present), and wearable devices to observe physiological signals such as ECG, pulse and temperature. Taking inspiration from a technique used in human activity recognition, our study employs a pose estimation model generated by the open-source tool OpenPose to extract face, body and hand key-points from video, and generates features from these key-points as input to algorithms that classify emotion and/or wellbeing state rather than activity. Our study aims to demonstrate that combining information about facial expression, body posture and hand gestures improves the categorisation of emotion.
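As a minimal illustration of the kind of features that can be derived from 2-D key-points, the sketch below computes a distance and a joint angle between points; the specific key-points (shoulder/elbow/wrist) and helper names are hypothetical examples, not the study's actual feature set:

```python
import math

def distance(p, q):
    """Euclidean distance between two (x, y) key-points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def angle(a, b, c):
    """Angle in degrees at vertex b, formed by key-points a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate: coincident key-points
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

# Hypothetical arm key-points from one frame of pose output
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
features = [distance(shoulder, wrist), angle(shoulder, elbow, wrist)]
# e.g. shoulder-wrist reach and elbow flexion angle (here 90 degrees)
```

In practice one such feature vector would be computed per frame, so that a clip becomes a sequence of vectors suitable for a sequence classifier.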

For this study we use the Multimodal EmotionLines Dataset (MELD), an acted dataset containing 13,848 video clips of utterances from the TV show “Friends”, labelled with 7 emotion categories. This dataset presents challenges similar to those encountered in a real-life dataset, including multiple people in the frame and occlusion.

To address the challenge of multiple people in frame, we have incorporated a mechanism called pyppbox that enables us to track and identify the person whose emotional state we are observing, and to extract information for just that person. This is accomplished by training the facial recognition model FaceNet on images of the 6 main Friends characters before running pyppbox on the dataset. The output gives bounding-box coordinates with an identity label for each person in each frame, which we use to isolate the OpenPose key-points for the person of interest. From these key-points we create a set of pose features, including distances and angles, as input to the algorithms used to categorise the emotions, starting with an LSTM.
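The isolation step could be sketched as below: match each OpenPose key-point set to the identified bounding box that contains most of its points. The box format `(x1, y1, x2, y2)`, the function names, and the majority-containment rule are illustrative assumptions, not pyppbox's actual API:

```python
def in_box(point, box):
    """True if an (x, y) key-point lies inside box = (x1, y1, x2, y2)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def keypoints_for(person_boxes, keypoint_sets, target_id):
    """Return the key-point set whose points fall mostly inside the
    bounding box labelled with target_id (e.g. a character name)."""
    box = person_boxes[target_id]
    best, best_score = None, -1.0
    for kps in keypoint_sets:
        score = sum(in_box(p, box) for p in kps) / len(kps)
        if score > best_score:
            best, best_score = kps, score
    return best

# Hypothetical frame: two detected skeletons, one identity of interest
boxes = {"ross": (0, 0, 10, 10)}
skeletons = [[(20, 20), (30, 30)], [(1, 1), (2, 2), (3, 3)]]
target = keypoints_for(boxes, skeletons, "ross")  # the second skeleton
```

A fractional score rather than a strict all-points test helps tolerate occlusion, where some key-points of the person of interest may fall outside their detected box.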

We present the results of our study, discussing the performance of the combinations of features for categorising emotion, the effectiveness of the tools used for feature extraction and the future development of this approach for long-term observation of emotional wellbeing in age-related health conditions.
Original language: English
Pages: 94-95
DOIs
Publication status: Published (in print/issue) - 30 Sept 2025
Event: International Digital Mental Health & Wellbeing Conference - Granada, Granada, Spain
Duration: 21 May 2025 - 23 May 2025
Conference number: 3rd
https://granada-en.congresoseci.com/dmhw2025/programme

