Abstract
Owing to seasonal and illumination variation, long-term visual localization in dynamic environments is a crucial problem in autonomous driving and robotics. Image-based retrieval is currently an effective approach to this problem; however, content information alone cannot fully distinguish changes at the same location over time. To address these problems, a double-domain network model that combines semantic and content information is proposed for the visual localization task. Moreover, the approach requires only the Virtual KITTI 2 dataset for training. To reduce the domain gap between real scenes and virtual images, a cross-predictive semantic segmentation mechanism is introduced. In addition, by introducing a domain loss function and a triplet semantic loss function, the resulting model achieves good domain adaptation and generalizes well to other real datasets. A series of experiments on the Extended CMU-Seasons and Oxford RobotCar-Seasons datasets demonstrates that the proposed network model outperforms state-of-the-art baselines for retrieval-based visual localization in challenging environments.
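The abstract names a triplet semantic loss for training the retrieval embedding but does not give its exact form. Below is a minimal, generic triplet margin loss sketch in plain Python/NumPy, assuming the standard formulation (anchor, positive, negative embeddings with a Euclidean margin); the paper's actual loss, weights, and margin value may differ.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Generic triplet margin loss: pull the anchor embedding toward the
    positive (same place, different condition) and push it away from the
    negative (different place). Illustrative sketch only; the paper's
    exact triplet semantic loss is not specified in the abstract."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(0.0, d_pos - d_neg + margin)    # hinge on the margin

# Toy 2-D embeddings (hypothetical values, for illustration only).
anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])    # same place under a different season
negative = np.array([-1.0, 0.0])   # clearly different place

easy_loss = triplet_loss(anchor, positive, negative)   # margin satisfied -> 0

hard_negative = np.array([0.8, -0.2])                  # a look-alike place
hard_loss = triplet_loss(anchor, positive, hard_negative)  # positive loss
```

The hinge means well-separated triplets contribute nothing, so training focuses on hard cases such as visually similar but distinct places under seasonal change.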
Original language | English |
---|---|
Pages (from-to) | 1-15 |
Number of pages | 16 |
Journal | IEEE Transactions on Multimedia |
Volume | 26 |
Early online date | 20 Dec 2023 |
DOIs | |
Publication status | Published (in print/issue) - 4 Apr 2024 |
Bibliographical note
Publisher Copyright:© 1999-2012 IEEE.
Keywords
- Electrical and Electronic Engineering
- Computer Science Applications
- Media Technology
- Signal Processing
- Visualization
- Visual localization
- Semantic segmentation
- Task analysis
- Location awareness
- Training
- Semantics
- Image retrieval
- Feature extraction
- Changing environment
- Domain adaptation
- Semantic information