Hard Disk Drive Reliability: A Comparative Study of Supervised Machine Learning Algorithms for Predicting Drive Failure

Alistair McLean, Roy Sterritt

Research output: Contribution to conference › Paper › peer-review


Abstract

Unexpected downtime and IT system outages can cost organisations millions of dollars in lost revenue, lost opportunity, and reputational damage. Third-party cloud services and infrastructure are commonly used by individuals and organisations because they offer the ability to create highly scalable applications without the high cost of purchasing and maintaining their own hardware. Consequently, cloud service providers are challenged with ensuring that their data centres are reliable, as they share responsibility for the applications deployed in them. One of the most common causes of IT system failure in data centres is failing Hard Disk Drives (HDDs). It is proposed that if data centres were able to accurately predict imminent HDD failures, then appropriate action could be taken to prevent potential outages. This paper investigates the relationship between Self-Monitoring, Analysis, and Reporting Technology (SMART) attributes and HDD failure, implementing supervised machine learning methods to predict drive failure at various prediction horizons. Random Forest and XGBoost classifiers are observed to achieve the best prediction performance, with the Area Under the Receiver Operating Characteristic Curve (AUROC) calculated at 0.9185±0.0066 and 0.9162±0.0066 respectively at the shortest prediction horizon (0-24 hours prior to failure). Reallocated sectors count (SMART 5), reported uncorrectable errors (SMART 187), current pending sector count (SMART 197), and uncorrectable sector count (SMART 198) were found to be the most important SMART attributes for HDD failure prediction.
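As a rough illustration of two ideas named in the abstract, the sketch below labels drive snapshots against a prediction horizon and computes AUROC from classifier scores. It is not the authors' pipeline: the function names, the toy hours-to-failure values, and the scores are all hypothetical, and the AUROC is computed with the standard pairwise (Mann-Whitney) formulation rather than any particular library.

```python
def label_within_horizon(hours_to_failure, horizon_hours=24):
    """Return 1 if the drive fails within the horizon, else 0.

    hours_to_failure is None for a drive that never failed (assumed convention).
    """
    return int(hours_to_failure is not None and 0 <= hours_to_failure <= horizon_hours)


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: hours until failure for six drive snapshots
# (None = healthy drive) and made-up classifier scores for each snapshot.
hours = [5, 20, 100, None, None, 12]
scores = [0.9, 0.6, 0.7, 0.1, 0.3, 0.8]

labels = [label_within_horizon(h, horizon_hours=24) for h in hours]
print(f"labels = {labels}, AUROC = {auroc(labels, scores):.3f}")
```

Changing `horizon_hours` relabels the same snapshots for a longer prediction horizon, which is one simple way to compare performance across horizons under these assumptions.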
Original language: English
Pages: 8-14
Number of pages: 7
Publication status: Published (in print/issue) - 9 Mar 2025
Event: The Twenty First International Conference on Autonomic and Autonomous Systems - Mercure Lisboa Hotel, Lisbon, Portugal
Duration: 9 Mar 2025 to 13 Mar 2025
Conference number: 21st
https://www.iaria.org/conferences2025/CfPICAS25.html

Conference

Conference: The Twenty First International Conference on Autonomic and Autonomous Systems
Abbreviated title: ICAS 2025
Country/Territory: Portugal
City: Lisbon
Period: 9/03/25 to 13/03/25

Keywords

  • Hard Disk Drive
  • HDD Reliability
  • Machine Learning
  • Failure Prediction
  • Autonomic Computing
  • Artificial Intelligence (AI)

Fingerprint

  • AI and Autonomic Computing

    Sterritt, R., 9 Mar 2025. 3 p.

    Research output: Contribution to conference › Paper

