Poisson-based self-organizing feature maps and hierarchical clustering for serial analysis of gene expression data

Haiying Wang, Huiru Zheng, Francisco Azuaje

    Research output: Contribution to journal (Article)

    16 Citations (Scopus)

    Abstract

    Serial analysis of gene expression (SAGE) is a powerful technique for global gene expression profiling. It allows thousands of transcripts to be analyzed simultaneously without prior knowledge of their structure or function. Pattern discovery and visualization have become fundamental approaches to analyzing such large-scale gene expression data, and clustering techniques in particular have received great attention. However, because SAGE data have a specific statistical nature (the distribution underlying the tag counts), traditional clustering techniques may not be suitable for analyzing them. By adapting and improving Self-Organizing Maps and hierarchical clustering, this paper presents two new clustering algorithms for SAGE data, PoissonS and PoissonHC. Tested on synthetic and experimental SAGE data, these algorithms show several advantages over traditional pattern discovery techniques. The results indicate that, by incorporating the statistical properties of SAGE data, PoissonS and PoissonHC, as well as a hybrid neuro-hierarchical approach that combines the two, offer significant improvements in pattern discovery and visualization for SAGE data. A user-friendly platform that may improve and accelerate SAGE data mining was also implemented. The system is freely available on request from the authors for nonprofit use.
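    The abstract does not spell out how Poisson statistics enter the two algorithms. What follows is a minimal sketch under stated assumptions and should not be read as the authors' implementation. It assumes that a tag's count in library t is Poisson with mean lambda_i * theta_t. Here lambda_i is the tag's total count and theta is a cluster's expression profile, normalized to sum to 1. Under that assumption, a SOM-style step (in the spirit of PoissonS) could assign a tag to the node with the lowest negative log-likelihood. A hierarchical step (in the spirit of PoissonHC) could then merge the pair of clusters whose union loses the least likelihood. The function names cluster_shape, poisson_nll, assign and merge_cost are hypothetical, and library-size normalization is omitted for brevity.

```python
import math
from typing import Sequence


def cluster_shape(members: Sequence[Sequence[int]]) -> list[float]:
    """Pooled profile of a cluster: per-library count totals, normalized to sum to 1.

    Under the assumed model this is the maximum-likelihood estimate of theta.
    """
    totals = [sum(col) for col in zip(*members)]
    grand = sum(totals)
    if grand == 0:
        # All-zero cluster: fall back to a flat profile (assumption).
        return [1.0 / len(totals)] * len(totals)
    return [t / grand for t in totals]


def poisson_nll(counts: Sequence[int], shape: Sequence[float]) -> float:
    """Negative Poisson log-likelihood of one tag's counts given a cluster profile.

    The expected count in library t is lambda_t = (tag total) * shape[t]. The tag
    keeps its own overall abundance, and only the profile shape is shared.
    """
    total = sum(counts)
    nll = 0.0
    for y, p in zip(counts, shape):
        lam = total * p
        if lam == 0:
            if y > 0:
                return math.inf  # a nonzero count is impossible when the mean is 0
            continue  # P(Y = 0 | lambda = 0) = 1 contributes nothing
        nll -= y * math.log(lam) - lam - math.lgamma(y + 1)
    return nll


def assign(counts: Sequence[int], shapes: Sequence[Sequence[float]]) -> int:
    """SOM-style winner selection: index of the best-fitting profile."""
    return min(range(len(shapes)), key=lambda k: poisson_nll(counts, shapes[k]))


def cluster_nll(members: Sequence[Sequence[int]]) -> float:
    """Total negative log-likelihood of a cluster under its own pooled profile."""
    shape = cluster_shape(members)
    return sum(poisson_nll(m, shape) for m in members)


def merge_cost(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """Likelihood lost by merging clusters a and b (agglomerative linkage)."""
    merged = list(a) + list(b)
    return cluster_nll(merged) - cluster_nll(a) - cluster_nll(b)


# Toy counts: rows are tags, columns are three SAGE libraries (illustrative only).
up = [[1, 5, 20], [2, 8, 30]]
also_up = [[2, 6, 25]]
down = [[25, 6, 1], [40, 9, 2]]
print(assign([3, 10, 40], [cluster_shape(up), cluster_shape(down)]))
print(merge_cost(up, also_up) < merge_cost(up, down))
```

    Each cluster's theta is its maximum-likelihood profile, so merge_cost is never negative. It therefore behaves like a Ward-style linkage, where merging clusters with dissimilar count profiles costs more than merging similar ones.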

    Original language: English
    Pages (from-to): 163-175
    Number of pages: 13
    Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
    Volume: 4
    Issue number: 2
    Publication status: Published (7 May 2007)


    Keywords

    • Hybrid machine learning
    • Pattern discovery and visualization
    • Poisson distribution
    • Self-organizing maps
    • Serial analysis of gene expression
