Autonomous object recognition for robots

Student thesis: Doctoral Thesis

Abstract

The field of object recognition for robots has seen significant advancements, driven by developments in humanoid robotic platforms from multinational companies. These innovations have propelled research, especially as Industry 4.0 approaches, highlighting the importance of intelligent robotic systems in manufacturing. While vision has commonly been a leading area of research for object recognition for robotics, there are attributes of objects that cannot easily be determined by vision sensing alone, such as thermal conductivity and compressibility.

This thesis focuses on enhancing object recognition capabilities through advanced Deep Learning techniques, specifically Convolutional Neural Networks and Transformers, across various sensory input modalities. The research improves the precision and efficiency of robotic systems in recognising everyday objects, which is crucial for their integration into the human environment. It makes three key contributions. The first contribution is a visual object detection method capable of detecting a vast range of objects in extremely cluttered scenes. The developed model has been shown to outperform leading real-time object detection models, achieving an accuracy of 65.9% on the leading visual object recognition benchmark dataset. The second contribution focuses on the use of tactile data for touch-driven object recognition. The proposed method applies a Convolutional Neural Network with fixed kernels to the rich time-series tactile data available from the BioTac sensor. When evaluated on a dataset of 60 tactile objects, it outperformed existing tactile object recognition methods, improving accuracy by over 10% at a greatly reduced computational cost. The final contribution combines the visual and tactile systems, using a novel Transformer-based classification system alongside mid-fusion to achieve true visuo-tactile object recognition. The Transformer-based visuo-tactile pipeline demonstrated an improvement of up to 4% in accuracy over both single-modality object recognition systems.

The contributions presented in this thesis demonstrate that a robotic system can simultaneously use information from modalities such as vision and touch to determine an object's identity more reliably and accurately. This research has the potential to play a key role in enabling true human-robot interaction and in advancing key robot-assisted applications such as assisted home living.

Thesis is embargoed until 31st January 2027
Date of Award: Jan 2025
Original language: English
Supervisor: Bryan Gardiner (Supervisor) & Nazmul Siddique (Supervisor)

Keywords

  • multimodal sensor fusion
  • object detection
  • deep learning
  • computer vision
  • tactile object recognition
  • transformers
  • convolutional neural networks
