TY - JOUR
T1 - Co-Clustering Multi-View Data Using the Latent Block Model
AU - Tobin, Joshua
AU - Black, Michaela
AU - Ng, James
AU - Rankin, Debbie
AU - Wallace, Jonathan
AU - Hughes, Catherine
AU - Hoey, Leane
AU - Moore, AJ
AU - Wang, Jinling
AU - Horigan, Geraldine
AU - Carlin, Paul
AU - McNulty, Helene
AU - Molloy, Anne
AU - Zhang, Mimi
PY - 2025
Y1 - 2025/04/10
N2 - The Latent Block Model (LBM) is a prominent model-based co-clustering method, returning parametric representations of each block-cluster and allowing the use of well-grounded model selection methods. Although the LBM has been adapted to accommodate various feature types, it cannot be applied to datasets consisting of multiple distinct sets of features, termed views, for a common set of observations. The multi-view LBM is introduced herein, extending the LBM method to multi-view data, where each view marginally follows an LBM. For any pair of views, the dependence between them is captured by a row-cluster membership matrix. A likelihood-based approach is formulated for parameter estimation, combining a stochastic EM algorithm with a Gibbs sampler, while an ICL criterion is formulated to determine the number of row- and column-clusters in each view. To justify the application of the multi-view approach, hypothesis tests are formulated to evaluate the independence of row-clusters across views, with the testing procedure integrated into the estimation framework. A penalty scheme is also introduced to induce sparsity in row-clusterings. The algorithm's performance is validated using synthetic and real-world datasets, accompanied by recommendations for optimal parameter selection. Finally, the multi-view co-clustering method is applied to a complex genomics dataset, and is shown to provide new insights for high-dimensional multi-view problems.
KW - Co-clustering
KW - Latent Block Model
KW - Multi-View Data
KW - High-dimensional Data
KW - Gene Expression
UR - https://pure.ulster.ac.uk/en/publications/ad6fe251-df90-4a53-9c83-c9935b754957
U2 - 10.1016/j.csda.2025.108188
DO - 10.1016/j.csda.2025.108188
M3 - Article
SN - 0167-9473
VL - 210
SP - 1
EP - 22
JO - Computational Statistics & Data Analysis
JF - Computational Statistics & Data Analysis
M1 - 108188
ER -