MMPL-Net: Multi-modal prototype learning for one-shot RGB-D segmentation

Dexing Shan, Yunzhou Zhang, Xiaozheng Liu, Shitong Liu, Sonya A. Coleman, Dermot Kerr

    Research output: Contribution to journal › Article › peer-review


    Abstract

    Prototype learning is widely used for one-shot segmentation. However, using only one RGB prototype to represent all information in the support image may lead to ambiguities. To address this, we propose a one-shot segmentation network based on multi-modal prototype learning that uses depth information to complement RGB information. Specifically, we propose a multi-modal fusion and refinement block (MFRB) and a multi-modal prototype learning block (MPLB). MFRB fuses RGB and depth features to generate multi-modal features and refined depth features, which MPLB uses to generate multi-modal information prototypes, depth information prototypes, and global information prototypes. Furthermore, we introduce self-attention to capture global context information in RGB and depth images. By integrating self-attention, MFRB, and MPLB, we propose the multi-modal prototype learning network (MMPL-Net), which adapts to the ambiguity of visual information in the scene. Finally, we construct a one-shot RGB-D segmentation dataset called OSS-RGB-D-5i. Experiments on OSS-RGB-D-5i show that our proposed method outperforms several state-of-the-art techniques with fewer labeled images and generalizes well to previously unseen objects.
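The prototype idea in the abstract can be illustrated with a minimal sketch. This is not the MMPL-Net architecture: MFRB, MPLB, and self-attention are not reproduced here. It only shows the generic prototype-learning baseline (masked average pooling over the support features, then cosine similarity against the query features), computed separately for RGB and depth features. The function names, the plain averaging of the two similarity maps, and the 0.5 threshold are assumptions made for illustration.

```python
import numpy as np


def masked_average_pooling(features, mask):
    """Pool (C, H, W) features over the foreground of an (H, W) binary mask.

    Returns a (C,) prototype vector.
    """
    mask = mask.astype(features.dtype)
    area = mask.sum()
    if area == 0:
        raise ValueError("support mask has no foreground pixels")
    return (features * mask[None]).sum(axis=(1, 2)) / area


def cosine_similarity_map(features, prototype, eps=1e-8):
    """Cosine similarity between a (C,) prototype and each pixel of (C, H, W)."""
    dot = np.tensordot(prototype, features, axes=([0], [0]))  # (H, W)
    norms = np.linalg.norm(features, axis=0) * np.linalg.norm(prototype)
    return dot / (norms + eps)


def one_shot_segment(support_rgb, support_depth, support_mask,
                     query_rgb, query_depth, threshold=0.5):
    """Segment a query from one labeled support example.

    Assumption: the RGB and depth similarity maps are simply averaged.
    The paper instead learns the fusion (MFRB) and the prototypes (MPLB).
    """
    rgb_proto = masked_average_pooling(support_rgb, support_mask)
    depth_proto = masked_average_pooling(support_depth, support_mask)
    sim = 0.5 * (cosine_similarity_map(query_rgb, rgb_proto)
                 + cosine_similarity_map(query_depth, depth_proto))
    return sim > threshold


def toy_features(mask):
    """Hypothetical 2-channel features: channel 0 fires on foreground,
    channel 1 on background."""
    feats = np.zeros((2,) + mask.shape)
    feats[0] = mask
    feats[1] = 1 - mask
    return feats
```

With these toy features, a support mask covering a single pixel yields a prototype aligned with the foreground channel, so thresholding the similarity map recovers the query foreground. Real feature maps would come from a trained backbone, and a single averaged prototype is exactly the kind of representation the abstract describes as potentially ambiguous.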

    Original language: English
    Pages (from-to): 1-14
    Number of pages: 14
    Journal: Neural Computing and Applications
    Volume: 35
    Issue number: 14
    Early online date: 28 Feb 2023
    Publication status: Published online - 28 Feb 2023

    Bibliographical note

    Funding Information:
    This work is supported by the National Natural Science Foundation of China (No. 61973066), the Major Science and Technology Projects of Liaoning Province (No. 2021JH1/10400049), the Foundation of Key Laboratory of Equipment Reliability (No. D2C20205500306), and the Foundation of Key Laboratory of Aerospace System Simulation (No. 6142002200301).

    Publisher Copyright:
    © 2023, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.

    Keywords

    • Deep learning
    • Multi-modal prototype
    • One-shot segmentation
    • RGB-D semantic segmentation
