Feature learning for Human Activity Recognition using Convolutional Neural Networks: A case study for Inertial Measurement Unit and Audio data

Federico Cruciani, Anastasios Vafeiadis, CD Nugent, I Cleland, P McCullagh, Konstantinos Votis, Dimitrios Giakoumis, Dimitrios Tzovaras, Liming (Luke) Chen, Raouf Hamzaoui

Research output: Contribution to journal (Article)

Abstract

The use of Convolutional Neural Networks (CNNs) as a feature learning method for Human Activity Recognition (HAR) is becoming more and more common. Unlike conventional machine learning methods, which require domain-specific expertise, CNNs can extract features automatically. On the other hand, CNNs require a training phase, making them prone to the cold-start problem. In this work, a case study is presented where the use of a pre-trained CNN feature extractor is evaluated under realistic conditions. The case study consists of two main steps: (i) different topologies and parameters are assessed to identify the best candidate models for HAR, thus obtaining a pre-trained CNN model; (ii) the pre-trained model is then employed as a feature extractor, evaluating its use with a large-scale real-world dataset. Two CNN applications were considered: Inertial Measurement Unit (IMU) and audio-based HAR. For the IMU data, balanced accuracy was 91.98% on the UCI-HAR dataset and 67.51% on the real-world Extrasensory dataset. For the audio data, the balanced accuracy was 92.30% on the DCASE 2017 dataset, and 35.24% on the Extrasensory dataset.
Language: English
Number of pages: 14
Journal: CCF Transactions on Pervasive Computing and Interaction
DOI: 10.1007/s42486-020-00026-2
Publication status: Published - 24 Jan 2020

Fingerprint

Units of measurement
Neural networks
Learning systems
Topology

Cite this

@article{b5c0ec0016c74fe9ad3a48dfec19cd12,
title = "Feature learning for Human Activity Recognition using Convolutional Neural Networks: A case study for Inertial Measurement Unit and Audio data",
abstract = "The use of Convolutional Neural Networks (CNNs) as a feature learning method for Human Activity Recognition (HAR) is becoming more and more common. Unlike conventional machine learning methods, which require domain-specific expertise, CNNs can extract features automatically. On the other hand, CNNs require a training phase, making them prone to the cold-start problem. In this work, a case study is presented where the use of a pre-trained CNN feature extractor is evaluated under realistic conditions. The case study consists of two main steps: (i) different topologies and parameters are assessed to identify the best candidate models for HAR, thus obtaining a pre-trained CNN model; (ii) the pre-trained model is then employed as a feature extractor, evaluating its use with a large-scale real-world dataset. Two CNN applications were considered: Inertial Measurement Unit (IMU) and audio-based HAR. For the IMU data, balanced accuracy was 91.98{\%} on the UCI-HAR dataset and 67.51{\%} on the real-world Extrasensory dataset. For the audio data, the balanced accuracy was 92.30{\%} on the DCASE 2017 dataset, and 35.24{\%} on the Extrasensory dataset.",
author = "Federico Cruciani and Anastasios Vafeiadis and CD Nugent and I Cleland and P McCullagh and Konstantinos Votis and Dimitrios Giakoumis and Dimitrios Tzovaras and Chen, {Liming (Luke)} and Raouf Hamzaoui",
year = "2020",
month = "1",
day = "24",
doi = "10.1007/s42486-020-00026-2",
language = "English",
journal = "CCF Transactions on Pervasive Computing and Interaction",
issn = "2524-5228",

}

TY - JOUR

T1 - Feature learning for Human Activity Recognition using Convolutional Neural Networks: A case study for Inertial Measurement Unit and Audio data

T2 - CCF Transactions on Pervasive Computing and Interaction

AU - Cruciani, Federico

AU - Vafeiadis, Anastasios

AU - Nugent, CD

AU - Cleland, I

AU - McCullagh, P

AU - Votis, Konstantinos

AU - Giakoumis, Dimitrios

AU - Tzovaras, Dimitrios

AU - Chen, Liming (Luke)

AU - Hamzaoui, Raouf

PY - 2020/1/24

Y1 - 2020/1/24

N2 - The use of Convolutional Neural Networks (CNNs) as a feature learning method for Human Activity Recognition (HAR) is becoming more and more common. Unlike conventional machine learning methods, which require domain-specific expertise, CNNs can extract features automatically. On the other hand, CNNs require a training phase, making them prone to the cold-start problem. In this work, a case study is presented where the use of a pre-trained CNN feature extractor is evaluated under realistic conditions. The case study consists of two main steps: (i) different topologies and parameters are assessed to identify the best candidate models for HAR, thus obtaining a pre-trained CNN model; (ii) the pre-trained model is then employed as a feature extractor, evaluating its use with a large-scale real-world dataset. Two CNN applications were considered: Inertial Measurement Unit (IMU) and audio-based HAR. For the IMU data, balanced accuracy was 91.98% on the UCI-HAR dataset and 67.51% on the real-world Extrasensory dataset. For the audio data, the balanced accuracy was 92.30% on the DCASE 2017 dataset, and 35.24% on the Extrasensory dataset.

AB - The use of Convolutional Neural Networks (CNNs) as a feature learning method for Human Activity Recognition (HAR) is becoming more and more common. Unlike conventional machine learning methods, which require domain-specific expertise, CNNs can extract features automatically. On the other hand, CNNs require a training phase, making them prone to the cold-start problem. In this work, a case study is presented where the use of a pre-trained CNN feature extractor is evaluated under realistic conditions. The case study consists of two main steps: (i) different topologies and parameters are assessed to identify the best candidate models for HAR, thus obtaining a pre-trained CNN model; (ii) the pre-trained model is then employed as a feature extractor, evaluating its use with a large-scale real-world dataset. Two CNN applications were considered: Inertial Measurement Unit (IMU) and audio-based HAR. For the IMU data, balanced accuracy was 91.98% on the UCI-HAR dataset and 67.51% on the real-world Extrasensory dataset. For the audio data, the balanced accuracy was 92.30% on the DCASE 2017 dataset, and 35.24% on the Extrasensory dataset.

U2 - 10.1007/s42486-020-00026-2

DO - 10.1007/s42486-020-00026-2

M3 - Article

JO - CCF Transactions on Pervasive Computing and Interaction

JF - CCF Transactions on Pervasive Computing and Interaction

SN - 2524-5228

ER -