Understanding a happiness dataset: How the machine learning classification accuracy changes with different demographic groups

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


In this paper, we use HappyDB, a corpus of more than 100,000 happy moments (happiness statements), to train machine learning classifiers that assign each statement to a category, for example Achievement or Affection. Having identified the best performing classifier, we then assessed whether its performance varied when tested on happiness statements from different demographic groups: married or single, female or male, young or old, and parent or non-parent. Three different classifiers were initially compared on classification accuracy. The best performing model, a convolutional neural network (CNN, a deep learning algorithm), was then used for further analysis of results per cross-sectional demographic group. The CNN achieved an F1 score of 0.897 but had variable performance when tested on different demographic groups. Generally, we found that prediction accuracy within this dataset declines with age: results for certain sub-groups declined or flatlined with increasing age, except for the single parents' sub-group. This may be due to the smaller number of statements in these sub-groups, which left the algorithm too little training data to learn the patterns in their happiness statements. The results suggest that word patterns in happiness statements likely differ between demographic groups.
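The per-group evaluation described in the abstract can be illustrated with a minimal sketch. It assumes predictions are already available as (group, true label, predicted label) triples and uses macro-averaged F1; the abstract does not state which F1 averaging was used. The function names `macro_f1` and `f1_by_group` and the toy records are hypothetical and are not taken from the paper or from HappyDB.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over every label seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def f1_by_group(records):
    """Map each group to (macro F1, number of statements in the group)."""
    grouped = defaultdict(lambda: ([], []))
    for group, true_label, pred_label in records:
        grouped[group][0].append(true_label)
        grouped[group][1].append(pred_label)
    return {g: (macro_f1(t, p), len(t)) for g, (t, p) in grouped.items()}


# Made-up toy records for illustration only (not real HappyDB data).
records = [
    ("married", "achievement", "achievement"),
    ("married", "affection", "affection"),
    ("single", "achievement", "affection"),
    ("single", "affection", "affection"),
]

results = f1_by_group(records)
for group, (score, n) in sorted(results.items()):
    print(f"{group}: macro F1 = {score:.3f} over {n} statements")
```

Reporting the group size next to each score matters here, because the abstract attributes the weaker results in some sub-groups to sparse training data.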
Original language: English
Title of host publication: 2021 IEEE Symposium on Computers and Communications (ISCC)
Subtitle of host publication: ICTS4eHealth-2021
Publisher: IEEE Xplore
Number of pages: 4
ISBN (Electronic): 978-1-6654-2744-9/21
ISBN (Print): 978-1-6654-2745-6
Publication status: Published - 15 Dec 2021
Event: IEEE International Conference on ICT Solutions for e-Health - Athens, Greece
Duration: 5 Sep 2021 → 8 Sep 2021


Conference: IEEE International Conference on ICT Solutions for e-Health
Abbreviated title: ICTS4eHealth 2021


  • Deep learning
  • Analytical models
  • Machine learning algorithms
  • Training data
  • Mental health
  • Prediction algorithms
  • Classification algorithms
  • Machine learning
  • Positive psychology

