Self-optimised cost-sensitive classifiers for early field failure prediction in storage systems

Research output: Contribution to journal › Article › peer-review

Abstract

Data storage systems such as disk arrays (DAs) go through rigorous testing in the production phase; however, a few of these DAs fail in the field and are returned to the manufacturer. Although such failures occur in a relatively small percentage of the manufactured DAs, they result in a significant loss of data, time and money. This paper is motivated by the hypothesis that many of these failures could be predicted at the testing stage through data mining and machine learning. Field failure is modelled as a classification problem; however, as in many real-world problems, it suffers from significant class imbalance. Several approaches have been proposed that attempt to improve the performance of imbalanced classification by either modifying the dataset (resampling) or assigning classification costs to the classes through a cost matrix. These methods have been shown to improve performance, but they come with many parameters that need to be set, which usually requires a lengthy exhaustive search, especially on problems with several classes. This paper presents a new scalable genetic algorithm approach for automating the design of the cost matrix (CM) along with the algorithm parameters. The proposed algorithms are tested on a real-world manufacturing dataset from Seagate disk arrays; the target is to predict, from the devices’ testing data, those that are likely to fail in the field. To demonstrate its performance, the proposed approach is evaluated on a number of standard datasets and compared with other state-of-the-art methods.
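
The idea of searching for a cost matrix with a genetic algorithm can be illustrated with a minimal sketch. It is not the algorithm from the paper: the toy failure probabilities, the balanced-accuracy fitness, the search range for the cost, and the function names `predict_min_cost`, `balanced_accuracy` and `evolve_fn_cost` are all assumptions made for illustration, and only a single cost (that of missing a failure) is evolved rather than a full matrix together with classifier parameters.

```python
import random


def predict_min_cost(probs, cost_matrix):
    """Pick the class with the lowest expected cost.

    cost_matrix[i][j] is the cost of predicting class j when the true class is i.
    """
    n = len(cost_matrix)
    expected = [sum(probs[i] * cost_matrix[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=lambda j: expected[j])


def balanced_accuracy(fn_cost, data):
    """Mean per-class recall when a missed failure costs fn_cost (assumed fitness)."""
    cost_matrix = [[0.0, 1.0], [fn_cost, 0.0]]  # class 0 = healthy, class 1 = fails
    hits, totals = [0, 0], [0, 0]
    for p_fail, label in data:
        pred = predict_min_cost([1.0 - p_fail, p_fail], cost_matrix)
        totals[label] += 1
        hits[label] += int(pred == label)
    return sum(h / t for h, t in zip(hits, totals)) / 2


def evolve_fn_cost(data, pop_size=20, generations=30, low=1.0, high=20.0, seed=0):
    """Tiny genetic algorithm over one cost value (illustrative settings only)."""
    rng = random.Random(seed)

    def fitness(cost):
        return balanced_accuracy(cost, data)

    pop = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=fitness)]  # elitism: keep the current best
        while len(children) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(pop, 3), key=fitness)
            b = max(rng.sample(pop, 3), key=fitness)
            # Blend crossover followed by Gaussian mutation, clamped to the range.
            w = rng.random()
            child = w * a + (1.0 - w) * b + rng.gauss(0.0, 0.5)
            children.append(min(high, max(low, child)))
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


# Hypothetical (failure probability, true label) pairs from an imbalanced test set.
TOY_DATA = [
    (0.02, 0), (0.05, 0), (0.08, 0), (0.10, 0),
    (0.12, 0), (0.15, 0), (0.20, 0), (0.30, 0),
    (0.25, 1), (0.40, 1),
]

best_cost, best_score = evolve_fn_cost(TOY_DATA)
print(f"evolved false-negative cost: {best_cost:.2f}, balanced accuracy: {best_score:.3f}")
```

With equal costs every toy sample is labelled healthy, so no failure is caught; raising the cost of a missed failure lowers the probability at which a device is flagged. In a multi-class setting each chromosome would hold all off-diagonal entries of the matrix, which is where an exhaustive search becomes expensive.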
Original language: English
Article number: 101388
Number of pages: 11
Journal: Swarm and Evolutionary Computation
Volume: 83
Early online date: 20 Sept 2023
Publication status: Published - 1 Dec 2023

Keywords

  • machine learning
  • failure analysis
  • classification algorithm
  • data storage systems
  • evolutionary algorithms
  • genetic algorithm
  • random forest
