Transformer-based Decoupled Modality Feature Learning for Visible-Infrared Person Re-Identification

  • Hu Lu
  • Tingting Qin
  • Yuxin Li
  • Zhansheng Liu
  • Yingquan Wang
  • Shengli Wu
  • ShaoHua Wan

Research output: Contribution to journal · Article · Peer-review


Abstract

Visible-infrared person re-identification (VI-ReID) aims to match pedestrian images across the visible and infrared modalities, which requires handling intra-modality and cross-modality discrepancies simultaneously. Existing dual-stream networks extract modality-specific features but often suffer from over-coupling and insufficient modeling of shared identity information, and simple feature-fusion strategies do not adequately close the modality gap. We propose a ViT-based framework, termed Transformer-based Decoupled Modality Feature Learning (TDMFL), that learns both modality-specific and modality-shared features and uses modality-invariant identity information to decouple the representations of different modalities. Specifically, we first introduce an identity-modality decoupling learning (IMDL) strategy that learns reliable modality-shared features while preserving essential modality-specific information. We further design a novel Identity-Modality Aggregation (IMA) loss that integrates modality-specific and modality-shared features, helping the model learn more modality-invariant representations from the perspectives of both identity consistency and modality adaptation. Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets show that our method significantly outperforms existing state-of-the-art approaches. Code: https://github.com/hulu88/TDMFL.
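The abstract describes the decoupling only at a high level. The sketch below is a toy illustration under stated assumptions, not the TDMFL method. It swaps in two generic, commonly used ideas: an orthogonality penalty between the shared and specific parts of a feature (as in domain-separation-style models), and a loss that pulls one identity's visible and infrared shared-feature centers together. All function names and the fixed linear projections are hypothetical; the authors' actual IMDL strategy and IMA loss are defined in the paper and the code repository listed in the abstract.

```python
"""Toy sketch: shared/specific feature decoupling for VI-ReID.

Assumption: this is NOT the TDMFL, IMDL, or IMA code. It only illustrates two
generic stand-in ideas with plain Python lists instead of a real ViT backbone.
"""
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (one row per output dimension) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def decouple(feature: Vector, w_shared: Matrix, w_specific: Matrix) -> Tuple[Vector, Vector]:
    """Hypothetical split of one backbone feature into shared and specific
    parts via two linear projections with the same output size."""
    return matvec(w_shared, feature), matvec(w_specific, feature)


def orthogonality_loss(shared: Vector, specific: Vector) -> float:
    """Squared inner product: zero when the two parts are orthogonal,
    a common proxy for keeping them decoupled."""
    return dot(shared, specific) ** 2


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def modality_center_loss(visible: List[Vector], infrared: List[Vector]) -> float:
    """Squared distance between the visible and infrared centers of ONE
    identity's shared features; zero when the two centers coincide."""
    cv, ci = mean(visible), mean(infrared)
    return sum((a - b) ** 2 for a, b in zip(cv, ci))


if __name__ == "__main__":
    # Toy projections: first two dims -> "shared", last two -> "specific".
    w_sh = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    w_sp = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    shared, specific = decouple([1.0, 2.0, 3.0, 4.0], w_sh, w_sp)
    print(shared, specific, orthogonality_loss(shared, specific))
    # prints [1.0, 2.0] [3.0, 4.0] 121.0
    print(modality_center_loss([[1.0, 2.0], [3.0, 4.0]], [[2.0, 3.0]]))
    # prints 0.0
```

In a real model the projections would be learned layers on top of ViT features, and both penalties would be weighted terms added to the identity losses; how TDMFL actually combines such terms is specified by the authors, not by this sketch.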
Original language: English
Pages (from-to): 1-12
Number of pages: 12
Journal: IEEE MultiMedia
Early online date: 12 Jan 2026
Publication status: Published online - 12 Jan 2026

Bibliographical note

Publisher Copyright:
© 1994-2012 IEEE.

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 16 - Peace, Justice and Strong Institutions

Keywords

  • Transformers
  • Representation learning
  • Training
  • Feature extraction
  • Image color analysis
  • Computer vision
  • Colored noise
  • Optimization
  • Interference
  • Identification of persons

