Machine learning using synthetic and real data: Similarity of evaluation metrics for different healthcare datasets and for different algorithms

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Sharing data is often a risk in terms of security and privacy especially if the data is sensitive. Algorithms can be used to generate synthetic data from an original raw dataset in order to share data that are considered more ‘privacy preserving’, and that increase the level of anonymity. In this paper, we carry out an experiment to study the validity of conducting machine learning on synthetic data. We compare the evaluation metrics produced from machine learning models that were trained using synthetic data with metrics yielded from machine learning models that were trained using the corresponding real data.
Language: English
Title of host publication: Data Science and Knowledge Engineering for Sensing Decision Support
Pages: 1281-1291
Number of pages: 11
Volume: 11
ISBN (Electronic): 978-981-3273-24-5
DOI: 10.1142/9789813273238_0160
Publication status: Published - 24 Aug 2018

Fingerprint

Learning systems
Experiments

Cite this

@inproceedings{d39c146e61a74dbc85fe942dd8c56532,
title = "Machine learning using synthetic and real data: Similarity of evaluation metrics for different healthcare datasets and for different algorithms",
abstract = "Sharing data is often a risk in terms of security and privacy especially if the data is sensitive. Algorithms can be used to generate synthetic data from an original raw dataset in order to share data that are considered more ‘privacy preserving’, and that increase the level of anonymity. In this paper, we carry out an experiment to study the validity of conducting machine learning on synthetic data. We compare the evaluation metrics produced from machine learning models that were trained using synthetic data with metrics yielded from machine learning models that were trained using the corresponding real data.",
author = "Rachael Heyburn and RR Bond and Michaela Black and Maurice Mulvenna and Wallace, {J. G.} and Debbie Rankin and Brian Cleland",
year = "2018",
month = "8",
day = "24",
doi = "10.1142/9789813273238_0160",
language = "English",
isbn = "978-981-3273-22-1",
volume = "11",
pages = "1281--1291",
booktitle = "Data Science and Knowledge Engineering for Sensing Decision Support",

}

Machine learning using synthetic and real data: Similarity of evaluation metrics for different healthcare datasets and for different algorithms. / Heyburn, Rachael; Bond, RR; Black, Michaela; Mulvenna, Maurice; Wallace, J. G.; Rankin, Debbie; Cleland, Brian.

Data Science and Knowledge Engineering for Sensing Decision Support. Vol. 11 2018. p. 1281-1291.

TY  - GEN
T1  - Machine learning using synthetic and real data: Similarity of evaluation metrics for different healthcare datasets and for different algorithms
AU  - Heyburn, Rachael
AU  - Bond, RR
AU  - Black, Michaela
AU  - Mulvenna, Maurice
AU  - Wallace, J. G.
AU  - Rankin, Debbie
AU  - Cleland, Brian
PY  - 2018/8/24
Y1  - 2018/8/24
N2  - Sharing data is often a risk in terms of security and privacy especially if the data is sensitive. Algorithms can be used to generate synthetic data from an original raw dataset in order to share data that are considered more ‘privacy preserving’, and that increase the level of anonymity. In this paper, we carry out an experiment to study the validity of conducting machine learning on synthetic data. We compare the evaluation metrics produced from machine learning models that were trained using synthetic data with metrics yielded from machine learning models that were trained using the corresponding real data.
AB  - Sharing data is often a risk in terms of security and privacy especially if the data is sensitive. Algorithms can be used to generate synthetic data from an original raw dataset in order to share data that are considered more ‘privacy preserving’, and that increase the level of anonymity. In this paper, we carry out an experiment to study the validity of conducting machine learning on synthetic data. We compare the evaluation metrics produced from machine learning models that were trained using synthetic data with metrics yielded from machine learning models that were trained using the corresponding real data.
U2  - 10.1142/9789813273238_0160
DO  - 10.1142/9789813273238_0160
M3  - Conference contribution
SN  - 978-981-3273-22-1
VL  - 11
SP  - 1281
EP  - 1291
BT  - Data Science and Knowledge Engineering for Sensing Decision Support
ER  -