Poisson-based self-organizing feature maps and hierarchical clustering for serial analysis of gene expression data

Haiying Wang, Huiru Zheng, Francisco Azuaje

    Research output: Contribution to journal › Article

    16 Citations (Scopus)

    Abstract

    Serial analysis of gene expression (SAGE) is a powerful technique for global gene expression profiling, allowing simultaneous analysis of thousands of transcripts without prior structural and functional knowledge. Pattern discovery and visualization have become fundamental approaches to analyzing such large-scale gene expression data. From the pattern discovery perspective, clustering techniques have received great attention. However, due to the statistical nature of SAGE data (i.e., underlying distribution), traditional clustering techniques may not be suitable for SAGE data analysis. Based on the adaptation and improvement of Self-Organizing Maps and hierarchical clustering techniques, this paper presents two new clustering algorithms, namely, PoissonS and PoissonHC, for SAGE data analysis. Tested on synthetic and experimental SAGE data, these algorithms demonstrate several advantages over traditional pattern discovery techniques. The results indicate that, by incorporating statistical properties of SAGE data, PoissonS and PoissonHC, as well as a hybrid approach (neuro-hierarchical approach) based on the combination of PoissonS and PoissonHC, offer significant improvements in pattern discovery and visualization for SAGE data. Moreover, a user-friendly platform, which may improve and accelerate SAGE data mining, was implemented. The system is freely available on request from the authors for nonprofit use.
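    One way a Poisson assumption can be built into hierarchical clustering is shown in the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' PoissonHC implementation. It assumes that a tag's counts across libraries follow a Poisson distribution with mean lambda_i * theta_t, where lambda_i is the tag's total count and theta_t is an expression profile shared by every tag in the same cluster, and it scores a cluster with a chi-square statistic under that model. The function names poisson_chi2 and poisson_hc, the merge rule (join the pair of clusters whose union increases the chi-square least) and the toy counts are all hypothetical.

```python
import numpy as np


def poisson_chi2(counts: np.ndarray) -> float:
    """Chi-square statistic of one cluster under an ASSUMED Poisson model.

    counts: (n_tags, n_libraries) non-negative tag counts; every row must
    have a positive total. Hypothetical model: y_it ~ Poisson(lambda_i * theta_t),
    with lambda_i the tag's total count and theta the cluster's shared profile.
    """
    lam = counts.sum(axis=1, keepdims=True)       # lambda_i, shape (n_tags, 1)
    theta = counts.sum(axis=0) / counts.sum()     # theta_t, sums to 1
    expected = lam * theta                        # E[y_it] under the model
    mask = expected > 0                           # skip 0/0 cells
    resid = (counts - expected) ** 2
    return float((resid[mask] / expected[mask]).sum())


def poisson_hc(counts: np.ndarray, n_clusters: int) -> list[list[int]]:
    """Hypothetical greedy agglomerative clustering: repeatedly merge the
    pair of clusters whose union raises the total chi-square the least."""
    clusters = [[i] for i in range(counts.shape[0])]
    scores = [0.0] * len(clusters)  # a singleton fits its own profile exactly
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = poisson_chi2(counts[clusters[a] + clusters[b]])
                cost = merged - scores[a] - scores[b]
                if best is None or cost < best[0]:
                    best = (cost, a, b, merged)
        _, a, b, merged = best
        clusters[a] = clusters[a] + clusters[b]
        scores[a] = merged
        del clusters[b], scores[b]  # b > a, so index a is unaffected
    return clusters


# Hypothetical toy data: 4 tags x 3 libraries. Tags 0 and 1 rise across the
# libraries, tags 2 and 3 fall, at different overall abundance levels.
toy = np.array([
    [10, 20, 40],
    [5, 11, 19],
    [40, 20, 10],
    [21, 9, 6],
])
print(poisson_hc(toy, 2))  # expected: [[0, 1], [2, 3]]
```

    Because the model scales each tag's expected counts by its own total, tags with the same shape but different abundance (tags 0 and 1 above) can still fall into one cluster. The sketch scans every pair of clusters at each merge, so it is only meant for small illustrative inputs.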

    Language: English
    Pages: 163-175
    Number of pages: 13
    Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
    Volume: 4
    Issue number: 2
    DOI: 10.1109/TCBB.2007.070204
    Publication status: Published - 7 May 2007

    Keywords

    • Hybrid machine learning
    • Pattern discovery and visualization
    • Poisson distribution
    • Self-organizing maps
    • Serial analysis of gene expression

    Cite this

    @article{17a8841783aa48978a835aa862ddd943,
    title = "Poisson-based self-organizing feature maps and hierarchical clustering for serial analysis of gene expression data",
    abstract = "Serial analysis of gene expression (SAGE) is a powerful technique for global gene expression profiling, allowing simultaneous analysis of thousands of transcripts without prior structural and functional knowledge. Pattern discovery and visualization have become fundamental approaches to analyzing such large-scale gene expression data. From the pattern discovery perspective, clustering techniques have received great attention. However, due to the statistical nature of SAGE data (i.e., underlying distribution), traditional clustering techniques may not be suitable for SAGE data analysis. Based on the adaptation and improvement of Self-Organizing Maps and hierarchical clustering techniques, this paper presents two new clustering algorithms, namely, PoissonS and PoissonHC, for SAGE data analysis. Tested on synthetic and experimental SAGE data, these algorithms demonstrate several advantages over traditional pattern discovery techniques. The results indicate that, by incorporating statistical properties of SAGE data, PoissonS and PoissonHC, as well as a hybrid approach (neuro-hierarchical approach) based on the combination of PoissonS and PoissonHC, offer significant improvements in pattern discovery and visualization for SAGE data. Moreover, a user-friendly platform, which may improve and accelerate SAGE data mining, was implemented. The system is freely available on request from the authors for nonprofit use.",
    keywords = "Hybrid machine learning, Pattern discovery and visualization, Poisson distribution, Self-organizing maps, Serial analysis of gene expression",
    author = "Haiying Wang and Huiru Zheng and Francisco Azuaje",
    year = "2007",
    month = "5",
    day = "7",
    doi = "10.1109/TCBB.2007.070204",
    language = "English",
    volume = "4",
    pages = "163--175",
    journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
    issn = "1545-5963",
    number = "2",

    }

    TY - JOUR

    T1 - Poisson-based self-organizing feature maps and hierarchical clustering for serial analysis of gene expression data

    AU - Wang, Haiying

    AU - Zheng, Huiru

    AU - Azuaje, Francisco

    PY - 2007/5/7

    Y1 - 2007/5/7

    AB - Serial analysis of gene expression (SAGE) is a powerful technique for global gene expression profiling, allowing simultaneous analysis of thousands of transcripts without prior structural and functional knowledge. Pattern discovery and visualization have become fundamental approaches to analyzing such large-scale gene expression data. From the pattern discovery perspective, clustering techniques have received great attention. However, due to the statistical nature of SAGE data (i.e., underlying distribution), traditional clustering techniques may not be suitable for SAGE data analysis. Based on the adaptation and improvement of Self-Organizing Maps and hierarchical clustering techniques, this paper presents two new clustering algorithms, namely, PoissonS and PoissonHC, for SAGE data analysis. Tested on synthetic and experimental SAGE data, these algorithms demonstrate several advantages over traditional pattern discovery techniques. The results indicate that, by incorporating statistical properties of SAGE data, PoissonS and PoissonHC, as well as a hybrid approach (neuro-hierarchical approach) based on the combination of PoissonS and PoissonHC, offer significant improvements in pattern discovery and visualization for SAGE data. Moreover, a user-friendly platform, which may improve and accelerate SAGE data mining, was implemented. The system is freely available on request from the authors for nonprofit use.

    KW - Hybrid machine learning

    KW - Pattern discovery and visualization

    KW - Poisson distribution

    KW - Self-organizing maps

    KW - Serial analysis of gene expression

    UR - http://www.scopus.com/inward/record.url?scp=34248386878&partnerID=8YFLogxK

    U2 - 10.1109/TCBB.2007.070204

    DO - 10.1109/TCBB.2007.070204

    M3 - Article

    VL - 4

    SP - 163

    EP - 175

    JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

    T2 - IEEE/ACM Transactions on Computational Biology and Bioinformatics

    JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

    SN - 1545-5963

    IS - 2

    ER -