IMPROVING STROKE PREDICTION ON IMBALANCED CLINICAL DATA USING CTGAN AND TVAE: A SYNTHETIC DATA APPROACH

Research output: Contribution to conference › Abstract


Abstract

Synthetic data (SD) have been evaluated and adopted in many domains, especially in health. For this study, we chose tabular data on stroke prediction, available in [3]. The dataset contains 11 clinical features, including the last column, which labels stroke as positive = 1 or negative = 0. We chose this dataset because of its imbalance, which makes it a good fit for applying the generation techniques and for assessing how closely the generated data resemble the real data. To generate SD, we use two techniques, conditional tabular GAN (CTGAN) and tabular variational autoencoder (TVAE), each trained with different numbers of epochs and batch sizes. We then evaluated the results with three machine learning (ML) models, using the real data as a benchmark. The results show that data generated with CTGAN (epochs = 1500, batch size = 500) perform better, with an accuracy score of 0.995 on random forest (RF) and support vector machine (SVM).
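
The imbalance that motivated the choice of dataset can be quantified before any generator is trained. The following is a minimal sketch, not the authors' code: the function name `imbalance_summary` and the toy labels are hypothetical, and it assumes that synthetic rows would be used to top up the minority class until both classes are equal in size, which the abstract does not state explicitly.

```python
from collections import Counter


def imbalance_summary(labels):
    """Summarise class imbalance for binary stroke labels (1 = positive, 0 = negative).

    Returns the class counts, the minority share of all rows, and the number
    of extra minority rows needed for a 50/50 split (an assumed target).
    """
    counts = Counter(labels)
    negatives = counts.get(0, 0)
    positives = counts.get(1, 0)
    total = negatives + positives
    if total == 0:
        raise ValueError("labels must contain at least one 0 or 1")
    minority = min(negatives, positives)
    majority = max(negatives, positives)
    return {
        "negatives": negatives,
        "positives": positives,
        "minority_share": minority / total,
        "synthetic_rows_needed": majority - minority,
    }


# Toy labels for illustration only; these are NOT the real dataset's counts.
toy_labels = [0] * 95 + [1] * 5
summary = imbalance_summary(toy_labels)
print(summary)
```

Under the same assumption, `synthetic_rows_needed` is the number of minority-class rows one would request from a fitted generator such as CTGAN or TVAE.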
Original language: English
Pages: 28
Number of pages: 28
Publication status: Published (in print/issue) - 1 May 2025
Event: Northern Ireland Biomedical Engineering Society Annual Symposium (NIBES) 2025 - Queens University, Belfast, Northern Ireland - Queens University, Belfast, United Kingdom
Duration: 1 May 2025 → 1 May 2025

Conference

Conference: Northern Ireland Biomedical Engineering Society Annual Symposium (NIBES) 2025 - Queens University, Belfast, Northern Ireland
Country/Territory: United Kingdom
City: Belfast
Period: 1/05/25 → 1/05/25

Keywords

  • Synthetic Data generation
  • Imbalance data
  • Clinical data
  • Data Accuracy
