Evaluating Fidelity in Synthetic Tabular Data Generation: A Comparative Study of CTGAN and TVAE for Human Activity Recognition Datasets

Research output: Chapter in Book/Report/Conference proceeding, Chapter, peer-review

Abstract

Generating Synthetic Data (SD) addresses many of the challenges of collecting real-world data across various domains. Utility, fidelity, and privacy are the key challenges in synthetic tabular data generation, and each offers a unique perspective. In this research, we focused on the fidelity of generated tabular data relative to real data, using four main metrics recommended in previous literature: Hellinger Distance (HD), Pairwise Correlation Differences (PCD), R-squared Depth vs. Depth (R2DD) Plot, and Area Under the Receiver Operating Characteristic Curve (AUC-ROC). We used two Human Activity Recognition (HAR) datasets, Mobile Health (mHealth) and HAR Using Smartphones (HARUS), which differ in the number of activities and in sample size. We generated data using two generative methods: Conditional Tabular Generative Adversarial Networks (CTGAN) and Tabular Variational Autoencoders (TVAE).
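To make two of the metrics named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes HD is computed per feature on histograms over shared bins, and that PCD is the Frobenius norm of the difference between the real and synthetic Pearson correlation matrices, which is one common definition in the literature. The abstract does not state which variant was used, and all function names here are hypothetical.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are sequences of non-negative weights (e.g. histogram counts)
    over the same bins; each is normalised to sum to 1 before comparison.
    The result lies in [0, 1], where 0 means identical distributions.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    p_total, q_total = sum(p), sum(q)
    if p_total <= 0 or q_total <= 0:
        raise ValueError("each distribution needs positive total weight")
    squared = sum(
        (math.sqrt(a / p_total) - math.sqrt(b / q_total)) ** 2
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / 2)


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a list of feature columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def pairwise_correlation_difference(real_columns, synthetic_columns):
    """Frobenius norm of (real correlation matrix - synthetic one).

    Assumed definition of PCD; lower values mean the synthetic data
    preserves pairwise feature correlations more closely.
    """
    real_corr = correlation_matrix(real_columns)
    synth_corr = correlation_matrix(synthetic_columns)
    return math.sqrt(sum(
        (r - s) ** 2
        for real_row, synth_row in zip(real_corr, synth_corr)
        for r, s in zip(real_row, synth_row)
    ))


# Illustrative toy values only; these are not data from the study.
real = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.2]]
synthetic = [[1.1, 1.9, 3.2, 3.8], [2.3, 3.9, 6.1, 7.9]]
print(hellinger_distance([10, 20, 30], [12, 18, 30]))
print(pairwise_correlation_difference(real, synthetic))
```

For both metrics, lower values indicate that the synthetic data is closer to the real data.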

The early results indicate that CTGAN achieved a 45% lower HD than TVAE on mHealth (0.0608 vs. 0.1100), while TVAE achieved a 32% lower HD than CTGAN on HARUS (0.0825 vs. 0.1212). CTGAN excelled on mHealth at 1500/500 epochs (PCD 0.0295, R2DD Plot 0.9855, AUC-ROC 0.5591), whereas TVAE excelled on HARUS at 1800/700 epochs (PCD 0.0639, R2DD Plot 0.5213, AUC-ROC 0.6424). These findings suggest that it is useful to match the generative technique to dataset characteristics such as sample size and feature complexity. Future work will expand this analysis by integrating additional generative methods and datasets, and by exploring the utility and privacy of synthetic data alongside fidelity.
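The reported percentages can be reproduced from the HD values quoted above. This short check assumes each percentage is the relative reduction of the lower HD with respect to the higher one; the helper name is hypothetical.

```python
def relative_reduction(lower, higher):
    """Fraction by which `lower` falls below `higher`."""
    return (higher - lower) / higher


# HD values quoted in the abstract.
mhealth = relative_reduction(0.0608, 0.1100)
harus = relative_reduction(0.0825, 0.1212)

print(round(100 * mhealth))  # 45
print(round(100 * harus))    # 32
```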
Original language: English
Title of host publication: Proceedings of the International Conference on Ubiquitous Computing and Ambient Intelligence (UCAmI 2025)
Publisher: Springer Cham
Chapter: 1
Pages: 15-26
Number of pages: 12
Volume: 1
ISBN (Electronic): 978-3-032-16992-1
ISBN (Print): 978-3-032-16991-4
Publication status: Published (in print/issue) - 1 Apr 2026

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Keywords

  • synthetic data
  • Synthetic Data Fidelity
  • Synthetic data generation
  • Human Activity Recognition
