For one-shot segmentation, prototype learning is extensively used. However, representing all the information in a support image with a single RGB prototype can lead to ambiguity. To this end, we propose a one-shot segmentation network based on multi-modal prototype learning that uses depth information to complement RGB information. Specifically, we propose a multi-modal fusion and refinement block (MFRB) and a multi-modal prototype learning block (MPLB). MFRB fuses RGB and depth features to generate multi-modal features and refined depth features, which MPLB then uses to generate multi-modal information prototypes, depth information prototypes, and global information prototypes. Furthermore, we introduce self-attention to capture global context in the RGB and depth images. By integrating self-attention, MFRB, and MPLB, we propose the multi-modal prototype learning network (MMPL-Net), which adapts to ambiguity in the scene's visual information. Finally, we construct a one-shot RGB-D segmentation dataset called OSS-RGB-D-5i. Experiments on OSS-RGB-D-5i show that our proposed method outperforms several state-of-the-art techniques with fewer labeled images and generalizes well to previously unseen objects.
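To illustrate the prototype-learning idea the abstract builds on, the sketch below uses masked average pooling, a common way to extract a class prototype from a support image in one-shot segmentation, and naively concatenates RGB and depth prototypes into a multi-modal one. This is a hypothetical minimal illustration of the general technique; the paper's MFRB and MPLB blocks perform a learned fusion and refinement rather than simple concatenation.

```python
import numpy as np

def masked_average_prototype(features, mask):
    """Masked average pooling: average the feature vectors that fall
    inside the support mask, yielding one C-dimensional prototype.

    features: (H, W, C) feature map from a backbone.
    mask:     (H, W) binary support mask for the target class.
    """
    m = mask.astype(features.dtype)[..., None]              # (H, W, 1)
    # Sum masked features over the spatial dims, divide by the mask area
    # (guarded against an empty mask).
    return (features * m).sum(axis=(0, 1)) / max(m.sum(), 1e-6)

def multimodal_prototype(rgb_feat, depth_feat, mask):
    """Hypothetical multi-modal prototype: concatenate the RGB and depth
    prototypes extracted with the same support mask."""
    return np.concatenate([masked_average_prototype(rgb_feat, mask),
                           masked_average_prototype(depth_feat, mask)])
```

A query pixel can then be labeled by comparing its (RGB, depth) feature vector to this prototype, e.g. with cosine similarity; using both modalities is what lets depth disambiguate regions that look alike in RGB alone.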
- Number of pages: 14
- Journal: Neural Computing and Applications
- Early online date: 28 Feb 2023
- Publication status: Published online - 28 Feb 2023
Bibliographical note (Funding Information):
This work is supported by the National Natural Science Foundation of China (No. 61973066), the Major Science and Technology Projects of Liaoning Province (No. 2021JH1/10400049), the Foundation of the Key Laboratory of Equipment Reliability (No. D2C20205500306), and the Foundation of the Key Laboratory of Aerospace System Simulation (No. 6142002200301).
© 2023, The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature.
- Deep learning
- Multi-modal prototype
- One-shot segmentation
- RGB-D semantic segmentation