MMPL-Net: Multi-modal prototype learning for one-shot RGB-D segmentation

Dexing Shan, Yunzhou Zhang, Xiaozheng Liu, Shitong Liu, Sonya A. Coleman, Dermot Kerr

Research output: Contribution to journal › Article › peer-review

Abstract

Prototype learning is widely used for one-shot segmentation. However, representing all the information in the support image with a single RGB prototype can lead to ambiguities. To address this, we propose a one-shot segmentation network based on multi-modal prototype learning that uses depth information to complement RGB information. Specifically, we propose a multi-modal fusion and refinement block (MFRB) and a multi-modal prototype learning block (MPLB). MFRB fuses RGB and depth features to generate multi-modal features and refined depth features, which MPLB then uses to generate multi-modal information prototypes, depth information prototypes, and global information prototypes. Furthermore, we introduce self-attention to capture global context information in RGB and depth images. By integrating self-attention, MFRB, and MPLB, we obtain the multi-modal prototype learning network (MMPL-Net), which adapts to the ambiguity of visual information in the scene. Finally, we construct a one-shot RGB-D segmentation dataset called OSS-RGB-D-5i. Experiments on OSS-RGB-D-5i show that the proposed method outperforms several state-of-the-art techniques with fewer labeled images and generalizes well to previously unseen objects.
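
The general prototype-matching idea that the abstract builds on can be illustrated with a minimal NumPy sketch. It is not the MMPL-Net implementation: MFRB, MPLB, and self-attention are not reproduced here. Plain channel concatenation stands in for multi-modal fusion, masked average pooling is assumed as the way a prototype is extracted, and every function name, the equal weighting of the two similarity maps, and the 0.5 threshold are hypothetical choices made only for illustration.

```python
import numpy as np


def masked_average_pooling(features, mask):
    """Average the feature vectors that fall inside the support mask.

    features: (C, H, W) feature map; mask: (H, W) binary foreground mask.
    Returns a (C,) prototype vector.
    """
    mask = mask.astype(features.dtype)
    area = mask.sum() + 1e-8  # guard against an empty mask
    return (features * mask[None]).sum(axis=(1, 2)) / area


def cosine_similarity_map(features, prototype):
    """Cosine similarity between one prototype and every spatial location."""
    feat_norm = np.linalg.norm(features, axis=0) + 1e-8           # (H, W)
    proto_norm = np.linalg.norm(prototype) + 1e-8
    dots = np.tensordot(prototype, features, axes=([0], [0]))     # (H, W)
    return dots / (feat_norm * proto_norm)


def segment_query(support_rgb, support_depth, support_mask,
                  query_rgb, query_depth, threshold=0.5):
    """Segment a query from one labeled support pair (illustrative only).

    All feature maps have shape (C, H, W). Fusion by concatenation is an
    assumption of this sketch, not the paper's MFRB.
    """
    support_fused = np.concatenate([support_rgb, support_depth], axis=0)
    query_fused = np.concatenate([query_rgb, query_depth], axis=0)

    # One prototype from the fused features, one from depth alone.
    fused_proto = masked_average_pooling(support_fused, support_mask)
    depth_proto = masked_average_pooling(support_depth, support_mask)

    sim_fused = cosine_similarity_map(query_fused, fused_proto)
    sim_depth = cosine_similarity_map(query_depth, depth_proto)

    # Equal weighting of the two maps is a hypothetical choice.
    score = 0.5 * (sim_fused + sim_depth)
    return score > threshold, score
```

Calling `segment_query` with feature maps from any backbone returns a boolean mask and the underlying score map. Using more than one prototype, as sketched here with a fused and a depth-only prototype, is the part that loosely mirrors the motivation stated in the abstract: a single RGB prototype may not capture enough of the support image.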

Original language: English
Pages (from-to): 1-14
Number of pages: 14
Journal: Neural Computing and Applications
Volume: 35
Issue number: 14
Early online date: 28 Feb 2023
Publication status: Published online - 28 Feb 2023

Bibliographical note

Funding Information:
This work is supported by the National Natural Science Foundation of China (No. 61973066), the Major Science and Technology Projects of Liaoning Province (No. 2021JH1/10400049), the Foundation of Key Laboratory of Equipment Reliability (No. D2C20205500306), and the Foundation of Key Laboratory of Aerospace System Simulation (No. 6142002200301).

Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.

Keywords

  • Deep learning
  • Multi-modal prototype
  • One-shot segmentation
  • RGB-D semantic segmentation
