Machine learning and patterning recognition in a new sensor system for chemical-biological detection and biomedical applications

Student thesis: Doctoral Thesis


Gas and vapour analysis has typically been performed with spectroscopic techniques. These methods often produce excellent results, but human breath poses a particularly difficult challenge for elemental analysis, and one on which spectroscopic methods have generally failed. To address these limitations, optical emission spectroscopy (OES) from a small-volume (5 μL) atmospheric-pressure RF-driven helium plasma was used in conjunction with the Partial Least Squares Discriminant Analysis (PLS-DA) algorithm to identify gases, and determine their concentrations, from the OES spectra obtained from the RF plasma. Our proposed analysis strategy is based on subjecting the complete datasets to rigorous mathematical interrogation by developing suitable algorithms that are sufficiently rapid, accurate and robust. Analysing spectral data raises four main issues: the data are complex, high-dimensional, collinear and noisy, owing to the sensitivity of spectrometry-based detection methods. PLS-DA is a recognized technique for handling high dimensionality via latent variables in binomial and multinomial classification of spectral data. However, because collinearity in this type of data is higher than usual, PLS-DA as a standalone algorithm may not cope with it. To mitigate this in the present study, we first confirm which of the aforementioned issues is the most significant by manipulating the data via pre-processing algorithms, spectra segmentation and VIP selection. Second, we utilise a regularization method to generalise the main algorithm. Third, we introduce an innovative approach called ‘peak merging’ to determine concentration. The final and most important finding of this work is that three different gases (methane, ethane and acetylene) can be identified separately, along with their concentrations.
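
As a rough illustration of the PLS-DA and VIP ideas mentioned in the abstract, the sketch below fits a two-class PLS model with the NIPALS algorithm and ranks spectral channels by their VIP scores. It is not the thesis's implementation: the spectra are synthetic, and the channel count, peak positions, noise level and function names (`pls1_fit`, `vip_scores`) are all assumptions made for the example. Regularization and peak merging are not shown.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean            # centred working copies
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_components))   # unit-norm weight vectors
    P = np.zeros((n_features, n_components))   # X loadings
    T = np.zeros((n_samples, n_components))    # latent-variable scores
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        tt = t @ t
        p = Xr.T @ t / tt
        q[a] = yr @ t / tt
        Xr = Xr - np.outer(t, p)               # deflate X
        yr = yr - q[a] * t                     # deflate y
        W[:, a], P[:, a], T[:, a] = w, p, t
    coef = W @ np.linalg.solve(P.T @ W, q)     # regression vector in X space
    return {"x_mean": x_mean, "y_mean": y_mean,
            "W": W, "T": T, "q": q, "coef": coef}


def vip_scores(model):
    """Variable Importance in Projection for each input channel."""
    W, T, q = model["W"], model["T"], model["q"]
    ss = q ** 2 * np.sum(T ** 2, axis=0)       # y variance explained per component
    return np.sqrt(W.shape[0] * (W ** 2 @ ss) / ss.sum())


# Synthetic "spectra" (assumed): a shared background line in both classes and
# an extra emission peak at channel 20 that appears only in class 1.
rng = np.random.default_rng(0)
n_per_class, n_channels = 40, 60
channels = np.arange(n_channels)
background = np.exp(-0.5 * ((channels - 45) / 4.0) ** 2)
peak = np.exp(-0.5 * ((channels - 20) / 2.0) ** 2)

X = rng.uniform(0.8, 1.2, size=(2 * n_per_class, 1)) * background
X = X + rng.normal(0.0, 0.05, size=X.shape)
X[n_per_class:] += 0.5 * peak
y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)])

model = pls1_fit(X, y, n_components=2)
y_hat = (X - model["x_mean"]) @ model["coef"] + model["y_mean"]
accuracy = np.mean((y_hat > 0.5) == (y > 0.5))  # discriminant step: threshold at 0.5
vip = vip_scores(model)

print("training accuracy:", accuracy)
print("channel with highest VIP:", int(np.argmax(vip)))
```

Channels with a VIP score above 1 are commonly treated as candidates to keep, because the squared VIP scores average to 1 across all channels. The accuracy here is measured on the training data only, so a real study would use cross-validation to check for overfitting.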
Date of Award: Jun 2022
Original language: English
Sponsors: Department for the Economy
Supervisor: Paul Maguire (Supervisor), Hui Wang (Supervisor) & Davide Mariotti (Supervisor)


  • Machine learning
  • Methane detection
  • Hydrocarbon identification
  • Variables important in projection (VIP)
  • PLSDA algorithm
  • Preprocessing algorithm
  • Gas detection
  • Data segmentation
  • Regularization
  • Regression
  • Model overfitting
  • Collinearity
