Abstract
Open data and data sets are emerging in human activity recognition (HAR) because of their importance in application areas such as improving people's lives, enabling informed care decisions, solving real-world problems, and choosing the best HAR approaches. Curating and sharing open data sets remains challenging because shared data often lack metadata and complete descriptions. Properly curated data sets are easier to recognize, obtain, and reuse, which helps advance HAR research. In this paper, we propose a conceptual framework for understanding the open data set lifecycle as consisting of four phases: construction, sharing, finding, and using. We also explore open issues and challenges related to HAR data sets in the published literature. On this basis, we present an approach that automatically extracts metadata through web scraping of the HAR data sets and then applies a natural language processing (NLP) pipeline to detect the metadata of each data set. Using the retrieved metadata, we show how comparisons can be performed under different scenarios, which can help evaluate data set quality and identify areas for improvement in data set curation. This work will assist the HAR research community in better understanding the open data set lifecycle and how data set quality can be improved.
Original language | English |
---|---|
Publication status | Accepted/In press - 31 Aug 2022 |
Event | International Conference on Ubiquitous Computing and Ambient Intelligence - Hotel Hesperia, CÓRDOBA, Spain; Duration: 29 Nov 2022 → 2 Dec 2022; Conference number: 14 |
Conference
Conference | International Conference on Ubiquitous Computing and Ambient Intelligence |
---|---|
Abbreviated title | UCAmI |
Country/Territory | Spain |
City | CÓRDOBA |
Period | 29/11/22 → 2/12/22 |
Keywords
- Open Data
- Data Set Lifecycle
- Named Entity Recognition
- Data Quality
- Human Activity Recognition