Abstract

The Latent Block Model (LBM) is a prominent model-based co-clustering method that returns a parametric representation of each block-cluster and allows the use of well-grounded model selection methods. Although the LBM has been adapted to accommodate various feature types, it cannot be applied to datasets consisting of multiple distinct sets of features, termed views, for a common set of observations. The multi-view LBM is introduced herein, extending the LBM to multi-view data, where each view marginally follows an LBM. For any pair of views, the dependence between them is captured by a row-cluster membership matrix. A likelihood-based approach is formulated for parameter estimation, combining a stochastic EM algorithm with a Gibbs sampler, while an ICL criterion is formulated to determine the number of row- and column-clusters in each view. To justify the application of the multi-view approach, hypothesis tests are formulated to evaluate the independence of row-clusters across views, with the testing procedure integrated into the estimation framework. A penalty scheme is also introduced to induce sparsity in row-clusterings. The algorithm's performance is validated using synthetic and real-world datasets, accompanied by recommendations for optimal parameter selection. Finally, the multi-view co-clustering method is applied to a complex genomics dataset and is shown to provide new insights for high-dimensional multi-view problems.
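
To make the data structure described in the abstract concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes Gaussian blocks, exactly two views, and that the cross-view dependence can be represented as a joint probability matrix over pairs of row-cluster labels; the function name `simulate_two_view_lbm` and all parameter names are hypothetical. Under these assumptions each view marginally follows an LBM whose row-cluster proportions are the row or column sums of the joint matrix.

```python
import numpy as np

def simulate_two_view_lbm(n_rows, n_cols, joint_row_probs, col_probs, means,
                          sd=1.0, seed=0):
    """Simulate two views over the same rows (illustrative assumptions only).

    joint_row_probs : (K1, K2) matrix; entry (a, b) is the assumed probability
                      that a row is in cluster a in view 1 and b in view 2.
    col_probs       : two vectors of column-cluster proportions, one per view.
    means           : two arrays; means[v] has shape (K_v, L_v), block means.
    """
    rng = np.random.default_rng(seed)
    joint = np.asarray(joint_row_probs, dtype=float)
    k1, k2 = joint.shape

    # Draw one joint (view-1, view-2) row-cluster label pair per row.
    flat = rng.choice(k1 * k2, size=n_rows, p=joint.ravel())
    row_labels = np.unravel_index(flat, (k1, k2))

    views, col_labels = [], []
    for v in range(2):
        # Column clusters are drawn independently within each view.
        w = rng.choice(len(col_probs[v]), size=n_cols[v], p=col_probs[v])
        # Look up the block mean for every (row, column) cell, then add noise.
        mu = np.asarray(means[v], dtype=float)[row_labels[v][:, None], w[None, :]]
        views.append(mu + sd * rng.standard_normal(mu.shape))
        col_labels.append(w)
    return views, row_labels, col_labels

# Example: row clusters in the two views agree with probability 0.8.
views, row_labels, col_labels = simulate_two_view_lbm(
    n_rows=200,
    n_cols=(30, 20),
    joint_row_probs=[[0.4, 0.1], [0.1, 0.4]],
    col_probs=[[0.5, 0.5], [0.3, 0.7]],
    means=[[[0.0, 3.0], [3.0, 0.0]], [[-2.0, 2.0], [2.0, -2.0]]],
)
```

If the joint matrix factorises into the outer product of its margins, the row-clusterings of the two views are independent, which is the situation the hypothesis tests mentioned in the abstract are designed to detect.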
Original language: English
Article number: 108188
Pages (from-to): 1-22
Number of pages: 22
Journal: Computational Statistics & Data Analysis
Volume: 210
Early online date: 10 Apr 2025
Publication status: Published (in print/issue) - 31 Oct 2025

Bibliographical note

Publisher Copyright:
© 2025 The Authors

Funding

This work was funded in part by the HEA, DFHERIS and the Shared Island Fund, and by the Research Ireland grant 21/RC/10295_P2.

Funders | Funder number
Higher Education Authority
21/RC/10295_P2

    UN SDGs

    This output contributes to the following UN Sustainable Development Goals (SDGs)

    1. SDG 3 - Good Health and Well-being
    2. SDG 10 - Reduced Inequalities

    Keywords

    • Co-clustering
    • Latent Block Model
    • Multi-View Data
    • High-dimensional Data
    • Gene Expression

    Title: Co-Clustering Multi-View Data Using the Latent Block Model