Ablation Study on Convolutional Neural Network-Transformer Fusion

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

This paper presents an ablation study on combining Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) for image classification. Features extracted from both architectures are reduced using two dimensionality reduction techniques, Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP), and then fused using multiple fusion strategies. Support Vector Machine, k-Nearest Neighbour, and Random Forest classifiers are used for image classification. Results indicate that, in this study, UMAP is more effective for CNN features, while PCA better suits ViT features. The best performance was achieved using ResNet50 and ViT features, reduced with UMAP and PCA respectively, fused via concatenation, and classified with k-Nearest Neighbour.
Original language: English
Title of host publication: Irish Machine Vision and Image Processing Conference
Number of pages: 4
Publication status: Accepted/In press - 30 Jun 2025
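
The last two stages of the pipeline described in the abstract, fusion via concatenation followed by k-Nearest Neighbour classification, can be sketched as below. This is a minimal illustration, not the authors' implementation: the feature vectors are synthetic stand-ins for already-reduced CNN and ViT features, and the reduced dimensions (3 and 2), the value k=3, and all function names are assumptions chosen for the example. The reduction step itself (UMAP or PCA) is omitted.

```python
import math
import random
from collections import Counter


def concat_fuse(cnn_feats, vit_feats):
    """Fuse two per-image feature lists by concatenating them sample by sample."""
    if len(cnn_feats) != len(vit_feats):
        raise ValueError("both branches must describe the same images")
    return [list(c) + list(v) for c, v in zip(cnn_feats, vit_feats)]


def knn_predict(train_x, train_y, query, k=3):
    """Classify one fused vector by majority vote among its k nearest neighbours."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def make_branch(labels, dim, rng):
    """Synthetic stand-in for reduced features: one Gaussian cluster per class."""
    return [[rng.gauss(5.0 * y, 0.5) for _ in range(dim)] for y in labels]


rng = random.Random(0)
train_y = [0, 1] * 20
test_y = [0, 1] * 5

# Hypothetical reduced dimensions: 3 for the CNN branch, 2 for the ViT branch.
train_x = concat_fuse(make_branch(train_y, 3, rng), make_branch(train_y, 2, rng))
test_x = concat_fuse(make_branch(test_y, 3, rng), make_branch(test_y, 2, rng))

preds = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
print(f"fused dimension: {len(train_x[0])}, accuracy: {accuracy:.2f}")
```

In a real experiment the two `make_branch` calls would be replaced by features extracted from the networks and reduced separately per branch, so that each branch can use the reduction technique that suits it before the vectors are concatenated.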

Keywords

  • Computer Vision
  • Machine Learning
  • Image Classification
