Abstract
Visible-Infrared person re-identification (VI-ReID) aims to match pedestrian images across visible and infrared modalities. This requires handling intra-modality and cross-modality discrepancies at the same time. Existing dual-stream networks extract modality-specific features, but they often suffer from over-coupling and model shared identity information poorly. Simple feature fusion strategies do not adequately bridge the modality gap. We propose a ViT-based framework, termed Transformer-based Decoupled Modality Feature Learning (TDMFL). It learns both modality-specific and modality-shared features and uses modality-invariant identity information to decouple the representations of different modalities. Specifically, we first introduce an Identity-Modality Decoupling Learning (IMDL) strategy that promotes the learning of reliable modality-shared features while preserving essential modality-specific information. We then design a novel Identity-Modality Aggregation (IMA) loss that integrates modality-specific and modality-shared features. This loss helps the model learn more modality-invariant representations from two perspectives: identity consistency and modality adaptation. Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that our method significantly outperforms existing state-of-the-art approaches. Code: https://github.com/hulu88/TDMFL.
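The abstract does not give the formulations of IMDL or the IMA loss. The Python below is therefore only a hypothetical illustration of what "decoupling" modality-shared from modality-specific features commonly means in VI-ReID. It is not the TDMFL method. It assumes two generic penalties. The first pulls the shared features of paired visible/infrared images of the same identity together. The second keeps each modality's specific features orthogonal to its shared features. All function names and the `weight` value are invented for this sketch.

```python
import math

# Hypothetical illustration only: NOT the IMDL strategy or IMA loss from TDMFL.
# A feature is a list of floats; a batch is a list of such feature vectors.

def _normalize(v):
    """Return v scaled to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonality_penalty(shared, specific):
    """Mean squared cosine similarity between each sample's shared and specific
    features; 0 when every pair is orthogonal (fully decoupled)."""
    total = 0.0
    for s, p in zip(shared, specific):
        total += _dot(_normalize(s), _normalize(p)) ** 2
    return total / len(shared)

def shared_consistency_penalty(shared_vis, shared_ir):
    """Mean squared Euclidean distance between L2-normalized shared features of
    paired visible/infrared images assumed to show the same identity."""
    total = 0.0
    for v, r in zip(shared_vis, shared_ir):
        nv, nr = _normalize(v), _normalize(r)
        total += sum((a - b) ** 2 for a, b in zip(nv, nr))
    return total / len(shared_vis)

def decoupling_loss(shared_vis, specific_vis, shared_ir, specific_ir, weight=0.1):
    """Pull shared features of the two modalities together while keeping each
    modality's specific features orthogonal to its shared features.
    `weight` is an arbitrary illustrative value, not taken from the paper."""
    ortho = (orthogonality_penalty(shared_vis, specific_vis)
             + orthogonality_penalty(shared_ir, specific_ir))
    return shared_consistency_penalty(shared_vis, shared_ir) + weight * ortho


if __name__ == "__main__":
    # Shared features point the same way; specific features are orthogonal to them.
    print(decoupling_loss([[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]],
                          [[2.0, 0.0, 0.0]], [[0.0, 0.0, 3.0]]))  # prints 0.0
```

In a real training loop, penalties like these would usually be computed on batched tensors and added to the identity classification and metric losses. How TDMFL actually combines its terms is described in the paper and code, not here.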
| Original language | English |
|---|---|
| Pages (from-to) | 1-12 |
| Number of pages | 12 |
| Journal | IEEE MultiMedia |
| Early online date | 12 Jan 2026 |
| DOIs | |
| Publication status | Published online - 12 Jan 2026 |
Bibliographical note
Publisher Copyright: © 1994-2012 IEEE.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 16 Peace, Justice and Strong Institutions
Keywords
- Transformers
- Representation learning
- Training
- Feature extraction
- Image color analysis
- Computer vision
- Colored noise
- Optimization
- Interference
- Identification of persons
Fingerprint
Dive into the research topics of 'Transformer-based Decoupled Modality Feature Learning for Visible-Infrared Person Re-Identification'. Together they form a unique fingerprint.