TF-SOD: A Novel Transformer Framework for Salient Object Detection

Zhenyu Wang, Yunzhou Zhang, Yan Liu, Zhuo Wang, Sonya Coleman, Dermot Kerr

Research output: Contribution to journal › Article › peer-review

Abstract

Most existing salient object detection models are based on fully convolutional networks (FCNs), which learn multi-scale and multi-level semantic information through convolutional layers to produce high-quality saliency maps. However, convolution interacts only locally, which makes long-range dependencies difficult to capture. In addition, FCN-based methods suffer from coarse object boundaries. To address these problems, we propose a novel transformer framework for salient object detection, named TF-SOD, which consists of the encoder part of an FCN, a fusion module (FM), a transformer module (TM), and a feature decoder module (FDM). Specifically, the FM bridges the encoder and the TM and provides some foresight for the TM's non-local interaction. The FDM efficiently decodes the non-local features output by the TM and fuses them deeply with local features. This architecture tightly integrates local and non-local interactions, so that the two kinds of information complement each other and the associations between features are mined in depth. Furthermore, we propose a novel edge reinforcement learning strategy, which effectively suppresses edge blurring from both local and global aspects by means of the powerful network architecture. Extensive experiments on five datasets demonstrate that the proposed method outperforms 19 state-of-the-art methods.
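
The contrast the abstract draws between local convolution and the non-local interaction of a transformer can be illustrated with a small sketch. This is not the TF-SOD transformer module. It is a minimal, standard-library-only toy, assuming a 1-D scalar sequence, a 3-tap smoothing kernel, and single-head self-attention with identity projections (query = key = value = input). All function names here are hypothetical and chosen only for illustration.

```python
import math


def conv1d_same(x, kernel):
    """Zero-padded 1-D convolution: output i only sees inputs near i."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out


def self_attention(x):
    """Toy single-head self-attention with identity projections (assumption)."""
    out = []
    for q in x:
        scores = [q * k for k in x]           # dot-product scores against every position
        m = max(scores)                       # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum(e / z * v for e, v in zip(exps, x)))
    return out


def changed_positions(f, x, idx, delta=1.0):
    """Return the output indices that change when x[idx] is perturbed by delta."""
    base = f(x)
    perturbed_input = list(x)
    perturbed_input[idx] += delta
    perturbed = f(perturbed_input)
    return [i for i, (a, b) in enumerate(zip(base, perturbed)) if abs(a - b) > 1e-12]


x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
smooth = lambda s: conv1d_same(s, [0.25, 0.5, 0.25])

print(changed_positions(smooth, x, 0))          # [0, 1]
print(changed_positions(self_attention, x, 0))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Perturbing the first input changes only the two nearest outputs of the convolution, whereas every output of the attention layer changes. A single convolutional layer therefore cannot relate distant positions, while a single attention layer can, which is the motivation the abstract gives for combining the two kinds of interaction.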
Original language: English
Journal: Neural Computing and Applications
Publication status: Accepted/In press - 7 Feb 2022

Keywords

  • Salient object detection
  • Fusion module
  • Transformer module
  • Feature decoder
  • Edge reinforcement learning strategy
