TY - JOUR
T1 - Polyp-LVT: Polyp segmentation with lightweight vision transformers
AU - Lin, Long
AU - Lv, Guangzu
AU - Wang, Bin
AU - Xu, Cunlu
AU - Liu, Jun
N1 - Publisher Copyright:
© 2024 Elsevier B.V.
PY - 2024/9/27
Y1 - 2024/9/27
N2 - Automatic segmentation of polyps in endoscopic images is crucial for early diagnosis and surgical planning of colorectal cancer. However, polyps closely resemble the surrounding mucosal tissue in texture, have indistinct borders, and vary in size, appearance, and location, which poses great challenges to polyp segmentation. Although some recent attempts have been made to apply Vision Transformers (ViTs) to polyp segmentation with promising performance, their application in clinical scenarios is still limited by high computational complexity, large model size, redundant dependencies, and significant training costs. To address these limitations, we propose a novel ViT-based approach named Polyp-LVT, which strategically replaces the attention layer in the encoder with a global max pooling layer, significantly reducing the model's parameter count and computational cost while keeping performance undegraded. Furthermore, we introduce a network block, named the Inter-block Feature Fusion Module (IFFM), into the decoder, aiming to offer streamlined yet highly efficient feature extraction. We conduct extensive experiments on three public polyp image benchmarks to evaluate our method. The experimental results show that, compared with the baseline models, our Polyp-LVT network achieves a nearly 44% reduction in model parameters while attaining comparable segmentation performance.
AB - Automatic segmentation of polyps in endoscopic images is crucial for early diagnosis and surgical planning of colorectal cancer. However, polyps closely resemble the surrounding mucosal tissue in texture, have indistinct borders, and vary in size, appearance, and location, which poses great challenges to polyp segmentation. Although some recent attempts have been made to apply Vision Transformers (ViTs) to polyp segmentation with promising performance, their application in clinical scenarios is still limited by high computational complexity, large model size, redundant dependencies, and significant training costs. To address these limitations, we propose a novel ViT-based approach named Polyp-LVT, which strategically replaces the attention layer in the encoder with a global max pooling layer, significantly reducing the model's parameter count and computational cost while keeping performance undegraded. Furthermore, we introduce a network block, named the Inter-block Feature Fusion Module (IFFM), into the decoder, aiming to offer streamlined yet highly efficient feature extraction. We conduct extensive experiments on three public polyp image benchmarks to evaluate our method. The experimental results show that, compared with the baseline models, our Polyp-LVT network achieves a nearly 44% reduction in model parameters while attaining comparable segmentation performance.
KW - Polyp segmentation
KW - Lightweight vision transformer
KW - Pooling layer
KW - Colorectal cancer
UR - https://www.scopus.com/pages/publications/85197746163
U2 - 10.1016/j.knosys.2024.112181
DO - 10.1016/j.knosys.2024.112181
M3 - Article
SN - 0950-7051
VL - 300
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 112181
ER -