Abstract
Automatic segmentation of polyps in endoscopic images plays a critical role in the early diagnosis of colorectal cancer. In recent years, vision transformers, especially pyramid vision transformers, have made remarkable strides and become the dominant methods in polyp segmentation. However, because polyps closely resemble normal tissue in size, appearance, color, and other aspects, pyramid vision transformer methods still struggle to represent fine-grained details and to identify highly disguised polyps, both of which can be pivotal for precise segmentation of colorectal polyps. To address these challenges, we propose a novel Contextual Information Flow Guided Transformer (CIFFormer) for colorectal polyp segmentation, which restructures the architecture of a pyramid vision transformer through a contextual information flow design. Our proposed method uses a pyramid-structured encoder to obtain multi-resolution feature maps. To effectively capture the target object’s features at various levels of detail, from coarse-grained global information to fine-grained local information, we design a Global-Local Feature Synergy Fusion module (GLFS). GLFS adopts a progressive fusion strategy, first fusing the features of adjacent hierarchy levels and then gradually fusing across the hierarchy. This allows the model to make better use of features at different semantic levels and to avoid the information loss caused by direct fusion. In addition, we introduce a Multi-Density Global-Local Residual Module (MDGL). The multi-density residual units within MDGL improve feature propagation and information flow. By employing both local and global residual learning, the model is better able to capture detailed information at both global and local scales. The experimental results demonstrate that our CIFFormer model surpasses 17 benchmark models and achieves state-of-the-art performance on five popular datasets.
Furthermore, our model exhibits remarkable performance on two video datasets. The source code of this work is available at https://github.com/lonlin404/CIFFormer
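The progressive fusion strategy described for GLFS (adjacent levels first, then across the hierarchy) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nearest-neighbour upsampling, and the element-wise averaging used as the fusion operation are all assumptions chosen for readability, and feature maps are reduced to single-channel 2D lists.

```python
# Illustrative sketch only; NOT the CIFFormer/GLFS code.
# Assumptions: single-channel maps as 2D lists, each pyramid level is
# half the resolution of the previous one, fusion = element-wise mean.

def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def fuse(fine, coarse):
    """Fuse two adjacent levels: upsample the coarse map to the fine
    resolution, then average element-wise (assumed fusion operator)."""
    up = upsample2x(coarse)
    return [[(a + b) / 2 for a, b in zip(r_fine, r_up)]
            for r_fine, r_up in zip(fine, up)]


def progressive_fusion(pyramid):
    """pyramid: list of maps ordered fine -> coarse.
    Each round fuses neighbouring levels only, so distant levels are
    combined gradually rather than in one direct step."""
    levels = list(pyramid)
    while len(levels) > 1:
        levels = [fuse(levels[i], levels[i + 1])
                  for i in range(len(levels) - 1)]
    return levels[0]


def const_map(size, value):
    """Hypothetical helper: a size x size map filled with one value."""
    return [[value] * size for _ in range(size)]


# Three-level pyramid at resolutions 4x4, 2x2 and 1x1.
fused = progressive_fusion([const_map(4, 1.0), const_map(2, 3.0), const_map(1, 5.0)])
```

In this sketch the coarsest level only reaches the finest resolution after passing through the intermediate level, which is one way to read the abstract's contrast between progressive and direct fusion.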
Original language | English |
---|---|
Article number | 130413 |
Pages (from-to) | 1-12 |
Number of pages | 12 |
Journal | Neurocomputing |
Volume | 644 |
Early online date | 15 May 2025 |
DOIs | |
Publication status | Published online - 15 May 2025 |
Bibliographical note
Publisher Copyright: © 2025 Elsevier B.V.
Data Access Statement
I have shared the link to the data and code in the manuscript.
Keywords
- Contextual information flow
- Multi density residual
- Polyp segmentation
- Pyramid vision transformer