Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data

Che-Lun Hung, Wen-Pei Chen, Guan-Jie Hua, Huiru Zheng, Suh-Jen Tsai, Yaw-Ling Lin

    Research output: Contribution to journal (Article)

    6 Citations (Scopus)

    Abstract

    Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human traits. Haplotypes are sets of linked genetic variants that lie close together on the genome and tend to be inherited together. Genetics research has shown that, within certain haplotype blocks, the SNPs give rise to only a few distinct common haplotypes, which account for most of the population. Association-based methods use these haplotype block structures to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. On chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To improve its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The resulting MapReduce-parallelized combinatorial algorithm performed well on real-world data from the HapMap dataset, and the improvement in computational efficiency was proportional to the number of processors used.
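The idea of a haplotype block with few common haplotypes, and of tagSNPs that identify them, can be made concrete with a small sketch. This is our illustration, not the authors' algorithm, and it rests on assumptions not stated in the abstract: haplotypes are equal-length strings of 0/1 alleles; a window of consecutive SNPs counts as a block when the haplotypes occurring at frequency `min_freq` or higher together cover at least a `coverage` fraction of the sample (an assumed criterion with assumed defaults); blocks are found greedily from left to right; and the tagSNPs of a block are the smallest set of SNP positions that still tells its common haplotypes apart. All function names and parameters are hypothetical.

```python
from collections import Counter
from itertools import combinations


def common_haplotypes(haps, start, end, min_freq):
    """Return {segment: count} for segments in [start, end) with frequency >= min_freq."""
    counts = Counter(h[start:end] for h in haps)
    n = len(haps)
    return {seg: c for seg, c in counts.items() if c / n >= min_freq}


def is_block(haps, start, end, min_freq=0.1, coverage=0.8):
    """Assumed criterion: common haplotypes must cover >= `coverage` of the sample."""
    common = common_haplotypes(haps, start, end, min_freq)
    return sum(common.values()) / len(haps) >= coverage


def partition_blocks(haps, min_freq=0.1, coverage=0.8):
    """Greedy partition: from each start, keep the longest window that passes is_block."""
    m = len(haps[0])
    blocks, start = [], 0
    while start < m:
        end = start + 1  # fall back to a single-SNP block if no longer window passes
        for e in range(start + 2, m + 1):
            if is_block(haps, start, e, min_freq, coverage):
                end = e
        blocks.append((start, end))
        start = end
    return blocks


def tag_snps(haps, start, end, min_freq=0.1):
    """Smallest set of SNP indices in [start, end) that distinguishes all common haplotypes."""
    common = list(common_haplotypes(haps, start, end, min_freq))
    for k in range(end - start + 1):  # try subsets of increasing size
        for subset in combinations(range(end - start), k):
            keys = {tuple(seg[i] for i in subset) for seg in common}
            if len(keys) == len(common):
                return [start + i for i in subset]
    return list(range(start, end))
```

The exhaustive tag search is exponential in block length, so it only suits the short blocks used here for illustration; greedy extension likewise keeps the sketch short rather than guaranteeing the fewest or longest blocks.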
    Language: English
    Pages: 1096-1110
    Journal: International Journal of Molecular Sciences
    Volume: 16
    Issue number: 1
    DOI: 10.3390/ijms16011096
    Publication status: Published - 5 Jan 2015
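The abstract says only that the parallel version copies data in parallel via Hadoop MapReduce; it does not describe how the work is split. As a hedged illustration of the MapReduce pattern itself, the sketch below expresses the haplotype-counting step of a block test as map and reduce functions run in-process. Each mapper counts segments for one window over one shard of haplotype rows, and the reducer sums the partial counts. Sharding by rows, the coverage criterion, and the names `mapper`, `reducer`, `window_counts`, and `valid_windows` are all assumptions; in a real Hadoop job the map tasks would run as distributed tasks rather than in a Python list comprehension.

```python
from collections import Counter
from functools import reduce


def mapper(shard, window):
    """Map phase: emit counts keyed by (start, end, segment) for one window over one shard."""
    start, end = window
    return Counter((start, end, h[start:end]) for h in shard)


def reducer(left, right):
    """Reduce phase: Counter addition sums the counts for matching keys."""
    return left + right


def window_counts(haps, windows, n_shards=2):
    """Split rows into shards, run one map task per (shard, window), then reduce."""
    shards = [haps[i::n_shards] for i in range(n_shards)]
    partials = [mapper(s, w) for s in shards for w in windows]  # independent tasks
    return reduce(reducer, partials, Counter())


def valid_windows(haps, windows, min_freq=0.3, coverage=0.8, n_shards=2):
    """Apply an assumed common-haplotype coverage test to the merged counts."""
    counts = window_counts(haps, windows, n_shards)
    n = len(haps)
    covered = Counter()
    for (start, end, _segment), c in counts.items():
        if c / n >= min_freq:  # segment is common within its window
            covered[(start, end)] += c
    return sorted(w for w in windows if covered[w] / n >= coverage)
```

Because each (shard, window) task is independent and the reducer is associative, the merged counts do not depend on how many shards are used, which is what allows this kind of counting to be spread across more processors.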

    Fingerprint

    genome
    Human Genome
    Cloud computing
    Haplotypes
    Genes
    Nucleotides
    Polymorphism
    HapMap Project
    Single Nucleotide Polymorphism
    Computational efficiency
    websites
    Parallel algorithms
    Genetic Research
    Drug Design
    Medical Genetics

    Cite this

    Hung, Che-Lun ; Chen, Wen-Pei ; Hua, Guan-Jie ; Zheng, Huiru ; Tsai, Suh-Jen ; Lin, Yaw-Ling. / Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data. In: International Journal of Molecular Sciences. 2015 ; Vol. 16, No. 1. pp. 1096-1110.
    @article{91a8493a242441659b7d2800b0ab4027,
    title = "Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data",
    author = "Che-Lun Hung and Wen-Pei Chen and Guan-Jie Hua and Huiru Zheng and Suh-Jen Tsai and Yaw-Ling Lin",
    year = "2015",
    month = "1",
    day = "5",
    doi = "10.3390/ijms16011096",
    language = "English",
    volume = "16",
    pages = "1096--1110",
    journal = "International Journal of Molecular Sciences",
    issn = "1661-6596",
    publisher = "MDPI",
    number = "1",

    }

    Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data. / Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen; Lin, Yaw-Ling.

    In: International Journal of Molecular Sciences, Vol. 16, No. 1, 05.01.2015, p. 1096-1110.


    TY - JOUR

    T1 - Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data

    AU - Hung, Che-Lun

    AU - Chen, Wen-Pei

    AU - Hua, Guan-Jie

    AU - Zheng, Huiru

    AU - Tsai, Suh-Jen

    AU - Lin, Yaw-Ling

    PY - 2015/1/5

    Y1 - 2015/1/5

    U2 - 10.3390/ijms16011096

    DO - 10.3390/ijms16011096

    M3 - Article

    VL - 16

    SP - 1096

    EP - 1110

    JO - International Journal of Molecular Sciences

    T2 - International Journal of Molecular Sciences

    JF - International Journal of Molecular Sciences

    SN - 1661-6596

    IS - 1

    ER -