A Novel Visuo-Tactile Object Recognition Pipeline using Transformers with Feature Level Fusion

Research output: Contribution to conference › Paper › peer-review


Abstract

Visuo-tactile object recognition is key to enabling robots to interact efficiently and effectively with humans and their environment. The differing statistical properties of visual images and tactile time-series data make visuo-tactile fusion non-trivial. This work investigates the use of Transformers to perform feature-level fusion of visuo-tactile data, using the Transformer's self-attention structure to generate temporal relationships between the visual and tactile data. The proposed pipeline is tested on the PHAC-2 dataset, and an ablation experiment is carried out across a collection of leading activation functions. The pipeline is demonstrated to achieve state-of-the-art accuracy for visuo-tactile object recognition on the PHAC-2 dataset, reaching 94.3% accuracy when data from two tactile actions are considered.
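
As a rough illustration of what feature-level fusion through self-attention can look like, the sketch below concatenates visual and tactile feature tokens into one sequence and applies single-head scaled dot-product self-attention, so that every token can attend to tokens from both modalities. It is not the paper's architecture, which the abstract does not specify: the function names, the token dimension, the toy feature values, the identity query/key/value projections (a real Transformer learns these) and the mean pooling step are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a trained Transformer would learn them.
    """
    d = len(tokens[0])
    scale = math.sqrt(d)
    weights = []
    for q in tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


def fuse(visual_tokens, tactile_tokens):
    """Hypothetical feature-level fusion: join both modalities into one
    sequence, attend across it, then mean-pool into a single vector."""
    sequence = visual_tokens + tactile_tokens
    attended, weights = self_attention(sequence)
    d = len(sequence[0])
    fused = [sum(tok[j] for tok in attended) / len(attended) for j in range(d)]
    return fused, weights


# Toy inputs (made-up values): 2 visual tokens and 3 tactile time steps,
# each already embedded into the same 4-dimensional feature space.
visual = [[1.0, 0.0, 0.5, 0.2], [0.3, 0.8, 0.1, 0.0]]
tactile = [[0.0, 0.2, 0.9, 0.4], [0.1, 0.1, 0.7, 0.6], [0.2, 0.0, 0.6, 0.9]]

fused, weights = fuse(visual, tactile)
```

Here `weights` is a 5 × 5 attention matrix whose rows each sum to 1. Its off-diagonal blocks hold the attention between visual and tactile tokens, which is where cross-modal relationships would be expressed in a scheme like this. The pooled `fused` vector could then be passed to a classifier.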
Original language: English
Pages: 1-8
Number of pages: 8
DOIs
Publication status: Published online - 9 Sept 2024
Event: IEEE World Congress on Computational Intelligence - Japan
Duration: 30 Jun 2024 to 5 Jul 2024

Conference

Conference: IEEE World Congress on Computational Intelligence
Abbreviated title: WCCI
Period: 30/06/24 to 5/07/24

Data Access Statement

no info found

Keywords

  • Object Recognition
  • Multimodal Fusion
  • Transformers
  • Visuo-tactile Data
