A Deep Clustering via Automatic Feature Embedded Learning for Human Activity Recognition

Ting Wang, Wing W. Y. Ng, Jinde Li, Qiuxiu Wu, Shuai Zhang, CD Nugent, Colin Shewell

Research output: Contribution to journal, article, peer-reviewed



Traditional clustering algorithms are widely used to build bag-of-words (BOW) models that aggregate spatio-temporal feature points extracted from a video for human activity recognition. Their performance is restricted by computational complexity, which limits the number of feature points that can be used. In contrast, deep clustering yields good clustering performance without limiting the number of feature points. This work therefore proposes a dual stacked autoencoders features embedded clustering (DSAFEC) method and a BOW construction method based on DSAFEC (B-DSAFEC), which reduce the computational complexity and remove the restriction on how many feature points can be selected. DSAFEC first transforms the feature points extracted from a video into a learned feature space and then predicts the cluster-assignment probabilities of the feature points, which are used to build BOWs for human activity recognition. Soft clustering is used: each feature point is assigned to the multiple clusters with the largest probabilities, instead of to a single cluster as in hard clustering. Experimental results on three benchmark human activity datasets show that B-DSAFEC outperforms five reference methods based on either traditional clustering or deep clustering.
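The soft-assignment step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `soft_bow_histogram`, the `top_k` parameter, the choice to weight each retained cluster by its probability, and the final normalization are all assumptions made for illustration. The cluster probabilities are toy values standing in for the output of a trained model.

```python
from typing import List, Sequence


def soft_bow_histogram(probs: Sequence[Sequence[float]], top_k: int = 2) -> List[float]:
    """Build a normalized bag-of-words histogram from soft cluster assignments.

    probs[i][j] is the predicted probability that feature point i belongs
    to cluster j. Only the top_k most probable clusters of each point
    contribute, weighted by their probability (an illustrative assumption).
    """
    if not probs:
        return []
    n_clusters = len(probs[0])
    if not 1 <= top_k <= n_clusters:
        raise ValueError("top_k must be between 1 and the number of clusters")
    hist = [0.0] * n_clusters
    for p in probs:
        # Indices of the top_k largest probabilities for this feature point.
        top = sorted(range(n_clusters), key=lambda j: p[j], reverse=True)[:top_k]
        for j in top:
            hist[j] += p[j]
    total = sum(hist)
    # Normalize so videos with different numbers of feature points are comparable.
    return [h / total for h in hist] if total > 0 else hist


# Toy assignment probabilities for three feature points and three clusters.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
]
soft = soft_bow_histogram(probs, top_k=2)  # each point contributes to two clusters
hard = soft_bow_histogram(probs, top_k=1)  # each point contributes to one cluster
```

With `top_k=1` each point contributes only to its single most probable cluster, which approximates hard clustering (here still weighted by probability rather than counted). In this toy data the third cluster receives no mass in that case, while `top_k=2` lets the second feature point contribute to it.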
Original language: English
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Early online date: 5 Feb 2021
Publication status: E-pub ahead of print - 5 Feb 2021


  • Bag-of-words (BOW)
  • deep clustering
  • human activity recognition
  • autoencoder

