Abstract
This paper discusses the opportunities and
challenges associated with the collection of a large-scale, diverse
dataset for Activity Recognition. The dataset was collected by 141
undergraduate students in a controlled environment. Each student
collected triaxial acceleration data from a wearable
accelerometer whilst carrying out 3 of the 18 investigated
activities, categorized into 6 scenarios of daily living. This data was
subsequently labelled, anonymized and uploaded to a shared
repository. This paper presents an analysis of data quality
through outlier detection, and assesses the suitability of the dataset
for the creation and validation of Activity Recognition models.
This is achieved through the application of a range of common
data-driven machine learning approaches. Finally, the paper
describes challenges identified during the data collection process
and discusses how these could be addressed. Issues surrounding
data quality were identified, in particular the need to identify and
address poor calibration of the data. Results highlight the
potential of harnessing these diverse data for Activity Recognition.
Based on a comparison of six classification approaches, a Random
Forest provided the best classification (F-measure: 0.88). In future
data collection cycles, participants will be encouraged to collect a
set of “common” activities, to support generation of a larger
homogeneous dataset. Future work will seek to refine the
methodology further and to evaluate the model on new, unseen data.
Original language | English |
---|---|
Pages | 522-527 |
Publication status | Published (in print/issue) - 23 Mar 2018 |
Event | 2018 IEEE International Conference on Pervasive Computing and Communications (PerCom Workshops) - Athens, Greece; Duration: 19 Mar 2018 → 23 Mar 2018 |
Conference
Conference | 2018 IEEE International Conference on Pervasive Computing and Communications (PerCom Workshops) |
---|---|
Country/Territory | Greece |
City | Athens |
Period | 19/03/18 → 23/03/18 |
Keywords
- Activity Recognition
- Crowd Sourcing
- Data Annotation
- Data Collection
- Data Quality
- Data Sharing