WO2021238258A1 - Disk failure prediction method and system


Info

Publication number
WO2021238258A1
Authority
WO
WIPO (PCT)
Prior art keywords
disk
sample
failure prediction
positive
disks
Prior art date
Application number
PCT/CN2021/073440
Other languages
French (fr)
Chinese (zh)
Inventor
王团结
梁鑫辉
曹琪
Original Assignee
苏州浪潮智能科技有限公司
Priority date
2020-05-28
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2021238258A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2221 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test input/output devices or peripheral units
    • G06F11/2273 Test methods

Definitions

  • The invention relates to the technical field of intelligent operation and maintenance, and in particular to a disk failure prediction method and system.
  • In large-scale data centers, the number of hard drives in use has reached the millions.
  • Frequent disk failures degrade the stability and reliability of the storage system, and even of the entire IT infrastructure, and can negatively affect business service level agreements (SLAs).
  • The disk is also the component with the highest failure rate in the data center. Whether the symptom is abnormal read/write speed or data loss, the consequences for any enterprise are severe. If disk failures can be predicted before they occur, so that data on disks likely to fail can be backed up, or those disks replaced, in time, the losses caused by disk failures are greatly reduced, storage operations become much easier, and data-center reliability improves effectively.
  • SMART (Self-Monitoring, Analysis and Reporting Technology) is an automatic hard-disk status monitoring and early-warning system and specification.
  • The traditional failure prediction method compares the sample feature values obtained by SMART monitoring with preset safety values set by the manufacturer. If a monitored value is about to exceed, or has already exceeded, the safe range, the host's monitoring hardware or software automatically warns the user and initiates data recovery.
  • However, this method triggers a large number of disk I/O processes and disrupts users' normal business.
  • To improve on it, related technologies use machine learning to predict disk failures, which lets users process their data during off-peak hours and is more valuable than after-the-fact data recovery.
  • The present invention provides a disk failure prediction method and system intended to solve two problems of existing disk failure prediction techniques: low prediction accuracy when few failed-disk samples are available, and the difficulty of predicting positive samples.
  • To this end, the present invention provides a disk failure prediction method, including:
  • taking the time-series features as input and the positive and negative samples as output, and importing them into the improved XGBoost algorithm so that it performs machine learning on the disk data set to obtain a disk failure prediction model.
  • The disk failure prediction method further includes:
  • The step of sampling the disk data set using SMART and labeling positive samples corresponding to failed disks and negative samples corresponding to normal disks includes:
  • The method further includes: performing range and jump analysis on the disk data set using the SMART algorithm to obtain multiple SMART features for disk failure analysis.
  • The step of extracting the SMART features of each positive and negative sample according to a preset time sequence to obtain the time-series feature of each sample includes:
  • calculating the time-series feature from the SMART features of each positive and negative sample, where S is the time series, t is the time, diff is the difference between consecutive samples, Y is the exponential smoothing series, and α is the smoothing coefficient.
  • In the custom loss function, w is the weighting factor for positive and negative samples;
  • y_i is the true value of the i-th sample;
  • the loss also depends on the predicted value of the i-th sample and on the predicted probability of the i-th sample derived from it.
  • The present invention also provides a disk failure prediction system, including:
  • an extraction module, configured to extract the SMART features of each positive and negative sample according to the preset time sequence to obtain the time-series feature of each sample;
  • an import module, configured to import a custom loss function into the extreme gradient boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, where in the custom loss function the loss caused by misclassifying a positive sample is greater than that caused by misclassifying a negative sample;
  • a machine learning module, configured to import the time-series features as input and the positive and negative samples as output into the improved XGBoost algorithm, so that it performs machine learning on the disk data set to obtain a disk failure prediction model.
  • The disk failure prediction system further includes:
  • a failure prediction module, configured to use the disk failure prediction model to predict failures of the disks in the disk test set and obtain the failure prediction probability of each disk;
  • The sampling module includes:
  • a disk sampling sub-module, configured to sample the disks in the disk data set according to a preset sampling ratio to obtain multiple failed disks and multiple normal disks for labeling;
  • a feature labeling sub-module, configured to label each failed disk, together with its SMART features within a predetermined period near the failure, as a positive sample.
  • The sampling module further includes:
  • a feature analysis sub-module, configured to perform range and jump analysis on the disk data set using the SMART algorithm to obtain multiple SMART features for disk failure analysis.
  • The disk failure prediction solution provided by this application introduces a custom loss function into the extreme gradient boosting (XGBoost) algorithm.
  • In this custom loss function, the loss caused by misclassifying a positive sample is greater than the loss caused by misclassifying a negative sample.
  • The disk data set is sampled using SMART, positive samples corresponding to failed disks and negative samples corresponding to normal disks are labeled, and the time-series features of each positive and negative sample are extracted; the time-series features are then used as input, and the positive and negative samples as output, for the XGBoost algorithm containing the custom loss function.
  • Guided by the custom loss function, the XGBoost algorithm learns from the input time-series features a classification boundary between positive and negative samples.
  • This boundary assigns each sample a probability of being positive or negative, which yields the trained disk failure prediction model.
  • Because the custom loss function penalizes misclassified positive samples more heavily than misclassified negative samples, training XGBoost on the time-series features and the labeled samples yields a model that predicts disk failures accurately, which solves the prior-art problem that positive samples corresponding to failed disks are difficult to predict because SMART features are sparse.
  • FIG. 1-A is a schematic diagram of a first type of disk failure in the prior art;
  • FIG. 2 is a schematic flowchart of a first disk failure prediction method provided by an embodiment of the present invention;
  • FIG. 3 is a schematic flowchart of a method for labeling positive and negative samples provided by the embodiment shown in FIG. 2;
  • FIG. 4 is a schematic flowchart of a method for importing a custom loss function provided by the embodiment shown in FIG. 2;
  • FIG. 5 is a schematic flowchart of a second disk failure prediction method provided by an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of a first disk failure prediction system provided by an embodiment of the present invention;
  • FIG. 7 is a schematic structural diagram of a second disk failure prediction system provided by an embodiment of the present invention;
  • FIG. 8 is a schematic structural diagram of a sampling module provided by the embodiment shown in FIG. 6.
  • Because disk failures are usually few, disk failure prediction is technically very challenging. A disk failure that brings a system down is a low-probability event, and in small-scale storage systems, or systems whose disks have been in service for only a short time, failed disks are very rare. At the same time, SMART features are sparse and change abruptly only when a disk is close to failure, so most failure-related SMART feature values are zero. This sparsity makes the many negative samples corresponding to normal disks easy to predict, while the positive samples corresponding to failed disks are difficult to predict.
  • FIG. 2 is a schematic flowchart of a disk failure prediction method provided by an embodiment of the present invention. As shown in FIG. 2, the disk failure prediction method includes the following steps:
  • S110: Sample the disk data set using Self-Monitoring, Analysis and Reporting Technology (SMART), and label positive samples corresponding to failed disks and negative samples corresponding to normal disks.
  • The positive and negative samples serve as the output (labels) of machine learning and are imported into the machine learning model, so that the algorithm learns to predict each disk's failure probability from the sample type and thereby predict disk failures.
  • Step S110 specifically includes the following sub-steps:
  • S111: Perform range and jump analysis on the disk data set using the SMART algorithm to obtain multiple SMART features for disk failure analysis.
  • The selected SMART features must be related to failures and have a large information divergence.
  • In total, 7 SMART features are selected: attributes 5 (Reallocated Sectors Count), 187 (Reported Uncorrectable Errors), 192 (Power-off Retract Count), 193 (Load/Unload Cycle Count), 197 (Current Pending Sector Count), 198 (Offline Uncorrectable Sector Count), and 199 (UltraDMA CRC Error Count).
  • S112: Sample the disks in the disk data set according to a preset sampling ratio to obtain multiple failed disks and multiple normal disks for labeling.
  • The disk data set can be divided into a disk training set for training the machine learning algorithm, a disk validation set for validating it, and a disk test set on which disk failures are predicted.
  • Down-sampling can be performed according to the preset sampling ratio.
  • S113: Label each failed disk, together with its SMART features within a predetermined period near the failure, as a positive sample.
  • The disk failure prediction method further includes:
  • S120: Extract the SMART features of each positive and negative sample according to the preset time sequence to obtain the time-series feature of each sample.
  • The sliding window can be 3, 5, or 7 days.
  • Specifically, the extracted feature is the exponentially weighted average of the differences between consecutive samples within a window period.
  • This extraction proceeds as follows:
  • The time-series feature is calculated from the SMART features of each positive and negative sample, where S is the time series, t is the time, diff is the difference between consecutive samples, Y is the exponential smoothing series, and α is the smoothing coefficient.
  • The exponential smoothing value for the first day is the mean of the time-series values of the preceding three days.
  • The exponential smoothing value at the last sample point in window period W is used as the extracted SMART feature value.
  • This weighted average of the differences between consecutive samples measures the cumulative rate of change of the raw SMART values over a period of time, compensating for the sparsity of SMART features.
  • The disk failure prediction method shown in FIG. 2 further includes the following steps:
  • Step S130, importing a custom loss function into the extreme gradient boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, specifically includes the following sub-steps:
  • w is the weighting factor for positive and negative samples;
  • y_i is the true value of the i-th sample;
  • y_i,pred is obtained by mapping the predicted value of the i-th sample into the range 0 to 1, so that y_i,pred represents the predicted probability of the i-th sample.
  • The embodiment of this application replaces the default loss function of the XGBoost algorithm with the custom loss function above, which imports the custom loss function and yields the improved XGBoost algorithm.
  • The custom loss function adds a positive/negative sample weighting factor w, which adjusts the relative weight of positive and negative samples in the loss function.
  • The ratio of positive to negative samples is about 1:10, so w is set to 0.9.
  • As a result, the XGBoost training process is biased toward positive samples.
  • The custom loss function also adds a prediction-difficulty adjustment factor, which distinguishes how difficult each sample is to predict.
  • When a sample is easy to predict, that is, the predicted probability of a positive sample is close to 1 or that of a negative sample is close to 0, the prediction-difficulty adjustment factor approaches 0 and the loss decays exponentially toward 0. When a sample is difficult to predict, that is, the predicted probability of a positive sample is close to 0 or that of a negative sample is close to 1, the factor approaches 1 and the loss is essentially unchanged.
  • In this way, the XGBoost training process is adjusted to focus more on samples that are difficult to predict.
  • S132: Take the first-order and second-order derivatives of the custom loss function to obtain its first derivative and second derivative, which XGBoost uses when fitting each new tree.
  • The improved XGBoost algorithm can then be used to predict disk failure probabilities. Because the custom loss function adds both the positive/negative sample weighting factor and the prediction-difficulty adjustment factor, training the improved algorithm is biased toward positive samples and hard-to-predict samples, which addresses the prior-art problems that positive samples are too few and difficult to predict.
  • S140: Take the time-series features as input and the positive and negative samples as output, and import them into the improved XGBoost algorithm so that it performs machine learning on the disk data set to obtain a disk failure prediction model.
  • the disk failure prediction method further includes:
  • S210: Use the disk failure prediction model to predict failures of the disks in the disk test set and obtain the failure prediction probability of each disk.
  • S220: Sort the disks by failure prediction probability to obtain a preset number of predicted failed disks.
  • The disk failure prediction model comprises the improved XGBoost algorithm described above, the classification boundary it learns through machine learning, and the probability ranges corresponding to positive and negative samples.
  • Applying the model to the disk test set gives the failure prediction probability of each disk at a specific time; the disks are then sorted by this probability and the preset number of most likely failed disks is taken.
  • For example, the embodiment can count the average number N of failed disks per day in the disk training set and select the N samples with the highest probability as the disks predicted to fail in this round.
  • The weighted average of the differences between consecutive samples obtained in the embodiments of this application measures the cumulative rate of change of the raw SMART values over a period of time.
  • Compared with using only the default loss function of XGBoost, the custom loss function makes model training focus more on the small positive class and on samples that are difficult to predict, which can effectively improve the accuracy and recall of disk failure prediction.
  • An embodiment of the present invention also provides a disk failure prediction system for implementing the above method. Because the system solves the problem on the same principle as the method, it has at least all the beneficial effects of the above method embodiments, which are not repeated here.
  • FIG. 6 is a schematic structural diagram of a disk failure prediction system provided by an embodiment of the present invention. As shown in FIG. 6, the disk failure prediction system includes:
  • the sampling module 101, configured to sample the disk data set using Self-Monitoring, Analysis and Reporting Technology (SMART) and label positive samples corresponding to failed disks and negative samples corresponding to normal disks;
  • the extraction module 102, configured to extract the SMART features of each positive and negative sample according to a preset time sequence to obtain the time-series feature of each sample;
  • the import module 103, configured to import a custom loss function into the extreme gradient boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, where in the custom loss function the loss caused by misclassifying a positive sample is greater than that caused by misclassifying a negative sample;
  • the machine learning module 104, configured to import the time-series features as input and the positive and negative samples as output into the improved XGBoost algorithm, so that it performs machine learning on the disk data set to obtain a disk failure prediction model.
  • the disk failure prediction system also includes:
  • the failure prediction module 105, configured to use the disk failure prediction model to predict failures of the disks in the disk test set and obtain the failure prediction probability of each disk;
  • the disk sorting module 106, configured to sort the disks by failure prediction probability to obtain a preset number of predicted failed disks.
  • The sampling module 101 in the embodiments shown in FIG. 6 and FIG. 7 includes:
  • the disk sampling sub-module 1011, configured to sample the disks in the disk data set according to a preset sampling ratio to obtain multiple failed disks and multiple normal disks for labeling;
  • the feature labeling sub-module 1012, configured to label each failed disk, together with its SMART features within a predetermined period near the failure, as a positive sample.
  • The sampling module 101 also includes a feature analysis sub-module 1013, configured to perform range and jump analysis on the disk data set using the SMART algorithm to obtain multiple SMART features for disk failure analysis.
  • The specific embodiments of the computer-readable storage medium of the present invention are substantially the same as the above embodiments of the disk failure prediction method and are not described in detail here.
  • The embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • Any reference signs placed between parentheses shall not be construed as limiting the claims.
  • The word “comprising” does not exclude the presence of parts or steps not listed in the claims.
  • The word “a” or “an” preceding a component does not exclude the presence of multiple such components.
  • The invention can be implemented by means of hardware comprising several distinct components and by means of a suitably programmed computer. In a unit claim listing several devices, several of these devices may be embodied by the same item of hardware.
  • The use of the words first, second, third, etc. does not indicate any order; these words can be interpreted as names.
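Steps S210 and S220 above score every disk in the test set and keep the N most probable failures, where N is the average number of disks that failed per day in the training set. A minimal sketch follows, assuming the per-day failure counts and one day's predicted probabilities are already available as plain Python lists and dicts; the function names `daily_budget` and `top_n_failures` are hypothetical and do not come from the patent.

```python
import heapq
from statistics import mean


def daily_budget(failures_per_day):
    """Average number of failed disks per day in the training set.

    failures_per_day: one count per calendar day, including days with zero
    failures (hypothetical input format). The mean is rounded to an integer
    and never drops below 1, so at least one disk is always flagged.
    """
    if not failures_per_day:
        raise ValueError("need at least one day of training data")
    return max(1, round(mean(failures_per_day)))


def top_n_failures(probabilities, n):
    """Return the ids of the n disks with the highest failure probability.

    probabilities: mapping disk_id -> predicted failure probability for one day.
    The result is ordered from most to least likely to fail.
    """
    ranked = heapq.nlargest(n, probabilities.items(), key=lambda item: item[1])
    return [disk_id for disk_id, _ in ranked]


if __name__ == "__main__":
    n = daily_budget([1, 3, 2, 2])
    probs = {"disk-a": 0.91, "disk-b": 0.12, "disk-c": 0.67, "disk-d": 0.05}
    print(n, top_n_failures(probs, n))  # prints: 2 ['disk-a', 'disk-c']
```

`heapq.nlargest` avoids sorting every disk when only N of many are needed; `sorted(..., reverse=True)[:n]` would give the same ranking.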

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A disk failure prediction method and system. The disk failure prediction method comprises: sampling disk data sets using SMART and labeling them to obtain positive samples corresponding to failed disks and negative samples corresponding to normal disks; extracting the SMART features of each positive and negative sample according to a preset time sequence to obtain a time-series feature for each sample; importing a custom loss function into the extreme gradient boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, wherein in the custom loss function the loss caused by misclassifying a positive sample is greater than that caused by misclassifying a negative sample; and importing the time-series features as input and the positive and negative samples as output into the improved XGBoost algorithm, so that it performs machine learning on the disk data sets to obtain a disk failure prediction model. The technical solution of the present invention addresses the prior-art problems that positive samples corresponding to failed disks are difficult to predict and that failed disks are predicted with low accuracy.
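The labeling step summarized in the abstract is detailed in sub-steps S112 and S113 above: failed disks contribute their SMART records from a predetermined period near the failure as positives, and normal disks are down-sampled by a preset ratio to provide negatives. The sketch below is illustrative only. The record layout, the function names, the decision to drop a failed disk's records that fall outside the window, and the 7-day window used in the test are all assumptions; the 10-to-1 negative-to-positive ratio mirrors the roughly 1:10 ratio the text mentions.

```python
import random
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SmartRecord:
    """One daily SMART snapshot of one disk (hypothetical layout)."""
    disk_id: str
    day: date
    features: dict = field(default_factory=dict)  # SMART attribute id -> raw value


def label_records(records, failure_dates, window_days):
    """Label SMART records as positive (1) or negative (0).

    failure_dates maps disk_id -> failure date, for failed disks only.
    A failed disk's records from the window_days days ending on its failure
    date are positives. Assumption: its older records are dropped.
    Every record of a normal disk is a negative.
    """
    labeled = []
    for rec in records:
        failed_on = failure_dates.get(rec.disk_id)
        if failed_on is None:
            labeled.append((rec, 0))
            continue
        days_before_failure = (failed_on - rec.day).days
        if 0 <= days_before_failure < window_days:
            labeled.append((rec, 1))
    return labeled


def downsample_normal_disks(normal_ids, failed_ids, neg_per_pos, seed=0):
    """Keep at most neg_per_pos normal disks per failed disk, chosen at random."""
    k = min(len(normal_ids), neg_per_pos * len(failed_ids))
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return rng.sample(sorted(normal_ids), k)
```

Down-sampling happens at the disk level rather than the record level, so that all records of a chosen normal disk stay together, which matches the wording of S112 ("sample the disks").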

Description

Disk failure prediction method and system
This application claims the priority of Chinese patent application No. 202010471262.1, entitled "Disk failure prediction method and system" and filed with the Chinese Patent Office on May 28, 2020, the entire content of which is incorporated into this application by reference.
Technical field
The invention relates to the technical field of intelligent operation and maintenance, and in particular to a disk failure prediction method and system.
Background art
In large-scale data centers, the number of hard drives in use has reached the millions. Frequent disk failures degrade the stability and reliability of the storage system, and even of the entire IT infrastructure, and can negatively affect business service level agreements (SLAs). In addition, the disk is the component with the highest failure rate in the data center, and whether the symptom is abnormal read/write speed or data loss, the consequences for any enterprise are severe. If disk failures can be predicted before they occur, so that data on disks likely to fail can be backed up, or those disks replaced, in time, the losses caused by disk failures are greatly reduced, storage operations become much easier, and data-center reliability improves effectively.
SMART (Self-Monitoring, Analysis and Reporting Technology) is an automatic hard-disk status monitoring and early-warning system and specification. Detection instructions preset in the disk hardware monitor the operation of hardware components such as the heads, platters, motor, and circuitry. The traditional failure prediction method compares the sample feature values obtained by SMART monitoring with preset safety values set by the manufacturer; if a monitored value is about to exceed, or has already exceeded, the safe range, the host's monitoring hardware or software automatically warns the user and initiates data recovery. However, this method triggers a large number of disk I/O processes and disrupts users' normal business. To improve on it, related technologies use machine learning to predict disk failures, which lets users process their data during off-peak hours and is more valuable than after-the-fact data recovery.
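The traditional threshold check described above can be sketched as follows. It assumes the common SMART convention that each attribute reports a normalized value that falls as the disk degrades, and that an attribute is failing once that value drops to or below the vendor threshold. The `margin` used to flag attributes that are "about to" cross the threshold, and all names here, are illustrative assumptions rather than anything specified in the patent.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int     # SMART attribute id, e.g. 5 or 197
    value: int       # normalized current value (higher means healthier)
    threshold: int   # vendor-defined threshold for this attribute


def threshold_alerts(attributes, margin=0):
    """Return (attr_id, status) for attributes at or near their threshold.

    status is "failing" when value <= threshold, and "warning" when the value
    is at most `margin` points above the threshold (assumed "about to" rule).
    """
    alerts = []
    for attr in attributes:
        if attr.value <= attr.threshold:
            alerts.append((attr.attr_id, "failing"))
        elif attr.value <= attr.threshold + margin:
            alerts.append((attr.attr_id, "warning"))
    return alerts
```

Each attribute is judged in isolation against a fixed line, which is the key difference from the learned, multi-feature model the rest of the document builds.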
However, because disk failures are usually few, disk failure prediction is technically very challenging. A disk failure that brings a system down is a low-probability event, and in small-scale storage systems, or systems whose disks have been in service for only a short time, failed disks are very rare. At the same time, SMART features are sparse and change abruptly only when a disk is close to failure, so most failure-related SMART feature values are zero. This sparsity makes the many negative samples corresponding to normal disks easy to predict, while the positive samples corresponding to failed disks are difficult to predict.
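Step S120 of the method addresses this sparsity by turning each raw SMART attribute into a smoothed rate of change: within a sliding window of 3, 5, or 7 days, the day-over-day differences (diff) are exponentially smoothed with coefficient α, the first smoothed value comes from a three-day mean, and the last smoothed value in the window is kept as the feature. The formula itself is not reproduced in this text, so the sketch below assumes the standard recurrence Y_t = α·diff_t + (1 - α)·Y_{t-1} and seeds Y with the mean of the first three differences in the window. The function name is hypothetical.

```python
def smoothed_diff_feature(series, window, alpha):
    """Exponentially smoothed day-over-day change of one SMART attribute.

    series: daily raw values of one SMART attribute for one disk, oldest first.
    window: window length W in days (the text uses 3, 5 or 7).
    alpha:  smoothing coefficient in (0, 1].
    Returns the smoothed value at the last point of the window.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if window < 3:
        raise ValueError("window must cover at least three differences")
    if len(series) < window + 1:
        raise ValueError("need window + 1 values to form window differences")
    values = series[-(window + 1):]
    # diff_t = S_t - S_{t-1}: change between consecutive daily samples
    diffs = [cur - prev for prev, cur in zip(values, values[1:])]
    # Assumed seed: mean of the first three differences in the window
    smoothed = sum(diffs[:3]) / 3
    # Assumed recurrence: Y_t = alpha * diff_t + (1 - alpha) * Y_{t-1}
    for d in diffs[3:]:
        smoothed = alpha * d + (1 - alpha) * smoothed
    return smoothed
```

For a sparse attribute that stays flat and then jumps, the raw value alone says little until the jump, whereas this feature is zero while nothing changes and becomes clearly non-zero once the attribute starts moving; with α < 1 the change also decays gradually over the following days instead of vanishing immediately.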
Summary of the invention
The present invention provides a disk failure prediction method and system, aiming to solve two problems of existing disk failure prediction techniques: low prediction accuracy when only a small number of failure samples are available, and the difficulty of predicting positive samples.
To achieve the above objective, the present invention provides a disk failure prediction method, comprising:
sampling a disk data set using Self-Monitoring, Analysis and Reporting Technology (SMART), and labeling positive samples corresponding to failed disks and negative samples corresponding to normal disks;
extracting the SMART features of each positive and negative sample according to a preset time sequence to obtain the time-series features of each sample;
importing a custom loss function into the eXtreme Gradient Boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, wherein, under the custom loss function, misclassifying a positive sample incurs a greater loss than misclassifying a negative sample;
feeding the time-series features as inputs, and the positive and negative sample labels as outputs, into the improved XGBoost algorithm, so that it learns from the disk data set and yields a disk failure prediction model.
Preferably, after the disk failure prediction model is obtained, the disk failure prediction method further comprises:
using the disk failure prediction model to predict failures for the disks in a disk test set, obtaining a predicted failure probability for each disk;
sorting the disks by predicted failure probability to obtain a preset number of disks predicted to fail.
Preferably, the step of sampling the disk data set using SMART and labeling the positive samples corresponding to failed disks and the negative samples corresponding to normal disks comprises:
sampling the disks in the disk data set according to a preset disk sampling ratio to obtain a plurality of failed disks and a plurality of normal disks for labeling;
labeling each failed disk, together with its SMART features within a predetermined period immediately preceding the failure, as a positive sample.
Preferably, before the step of sampling the disks according to the preset disk sampling ratio, the method further comprises: performing value-range and jump analysis on the SMART data of the disk data set to obtain a plurality of SMART features for disk failure analysis.
Preferably, the step of extracting the SMART features of each positive and negative sample according to the preset time sequence to obtain the time-series features of each sample comprises:
computing the SMART features of each positive and negative sample according to the formulas:
diff(t) = S(t) - S(t-1);
[Formula image not reproduced: initial smoothed value Y[1]]
Y[t] = alpha*diff[t] + (1-alpha)*Y[t-1];
where S is the time series, t is time, diff is the difference between consecutive samples, Y is the exponentially smoothed series, and alpha is the smoothing coefficient.
Preferably, the step of importing the custom loss function into the XGBoost algorithm to obtain the improved XGBoost algorithm comprises:
setting the custom loss function:
[Formula images not reproduced: the custom loss function]
where w is the positive/negative sample weight factor, $y_i$ is the true label of the i-th sample, $\hat{y}_i$ is the predicted value (raw score) of the i-th sample, and $y_{i,\mathrm{pred}}$ is the predicted probability of the i-th sample;
taking the first and second derivatives of the custom loss function;
importing the first and second derivatives of the custom loss function into the XGBoost algorithm to obtain the improved XGBoost algorithm.
According to a second aspect, the present invention further provides a disk failure prediction system, comprising:
a sampling module, configured to sample a disk data set using SMART and to label positive samples corresponding to failed disks and negative samples corresponding to normal disks;
an extraction module, configured to extract the SMART features of each positive and negative sample according to a preset time sequence to obtain the time-series features of each sample;
an import module, configured to import a custom loss function into the XGBoost algorithm to obtain an improved XGBoost algorithm, wherein, under the custom loss function, misclassifying a positive sample incurs a greater loss than misclassifying a negative sample;
a machine learning module, configured to feed the time-series features as inputs and the positive and negative sample labels as outputs into the improved XGBoost algorithm, so that it learns from the disk data set and yields a disk failure prediction model.
Preferably, the disk failure prediction system further comprises:
a failure prediction module, configured to use the disk failure prediction model to predict failures for the disks in a disk test set and obtain a predicted failure probability for each disk;
a disk sorting module, configured to sort the disks by predicted failure probability to obtain a preset number of disks predicted to fail.
Preferably, the sampling module comprises:
a disk sampling submodule, configured to sample the disks in the disk data set according to a preset disk sampling ratio to obtain a plurality of failed disks and a plurality of normal disks for labeling;
a feature labeling submodule, configured to label each failed disk, together with its SMART features within a predetermined period immediately preceding the failure, as a positive sample.
Preferably, the sampling module further comprises:
a feature analysis submodule, configured to perform value-range and jump analysis on the SMART data of the disk data set to obtain a plurality of SMART features for disk failure analysis.
The disk failure prediction scheme provided by the technical solution of this application imports a custom loss function into the XGBoost algorithm. Under this loss function, misclassifying a positive sample incurs a greater loss than misclassifying a negative sample, so once it replaces XGBoost's default loss function, training is biased toward positive samples. Specifically, the disk data set is sampled using SMART, and positive samples corresponding to failed disks and negative samples corresponding to normal disks are labeled; the time-series features of each sample are then extracted. With the time-series features as inputs and the positive and negative labels as outputs, the data are fed into the XGBoost algorithm containing the custom loss function. XGBoost learns a classification boundary between positive and negative samples from the input features under the custom loss function, uses this boundary to assign each sample a probability of belonging to the positive or negative class, and thereby yields a trained disk failure prediction model.
In summary, the disk failure prediction method provided by the technical solution of this application imports a custom loss function into the XGBoost algorithm. Because misclassifying a positive sample incurs a greater loss than misclassifying a negative sample, training XGBoost on the time-series features and the positive and negative samples yields a model that predicts disk failures accurately, addressing the prior-art problem that positive samples corresponding to failed disks are hard to predict owing to the sparsity of SMART features.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed to describe them are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from the structures they show without creative effort.
FIG. 1-A is a schematic diagram of a first disk failure in the prior art;
FIG. 1-B is a schematic diagram of a second disk failure in the prior art;
FIG. 2 is a schematic flowchart of a first disk failure prediction method provided by an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a method for labeling positive and negative samples in the embodiment shown in FIG. 2;
FIG. 4 is a schematic flowchart of a method for importing a custom loss function in the embodiment shown in FIG. 2;
FIG. 5 is a schematic flowchart of a second disk failure prediction method provided by an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a first disk failure prediction system provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a second disk failure prediction system provided by an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a sampling module in the embodiment shown in FIG. 6.
The realization of the objectives, functional features and advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.
Detailed description of the embodiments
It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
The main problem addressed by the embodiments of the present invention is as follows:
Disk failures are rare, which makes disk failure prediction technically very challenging. A disk failure that brings a system down is a low-probability event, and in small-scale storage systems, or systems that have been deployed only briefly, very few disks ever fail. In addition, SMART features are sparse and change abruptly only when a disk is close to failure, so most failure-related SMART values are zero, and this sparsity makes the positive samples corresponding to failed disks hard to predict. As shown in FIG. 1-A and FIG. 1-B, statistical analysis found that even in the last 7 days of a failing disk, 50% to 75% of the values of SMART features such as SMART 5 and SMART 187 are 0. Moreover, the SMART values of a failing disk do not change significantly until the last 1 to 15 days of its remaining life. In FIG. 1-A, SMART 5 of this disk did not change until the last 10 days and did not rise sharply until the last 4 days; in FIG. 1-B, SMART 187 did not change until the very last day. This pattern is common among failing disks: the closer a disk is to the end of its life, the more likely its SMART values are to change abruptly.
To solve the above problems, refer to FIG. 2, which is a schematic flowchart of a disk failure prediction method provided by an embodiment of the present invention. As shown in FIG. 2, the method comprises the following steps:
S110: Sample the disk data set using SMART, and label positive samples corresponding to failed disks and negative samples corresponding to normal disks.
The positive and negative samples serve as the outputs (labels) for machine learning and are fed into the machine learning model, so that the algorithm can learn from these labels to predict each disk's failure probability.
Specifically, as shown in FIG. 3, step S110 (sampling the disk data set using SMART and labeling positive samples corresponding to failed disks and negative samples corresponding to normal disks) comprises the following sub-steps:
S111: Perform value-range and jump analysis on the SMART data of the disk data set to obtain a plurality of SMART features for disk failure analysis. The selected SMART features must be failure-related and carry large information divergence; this embodiment selects seven SMART attributes: 5, 187, 192, 193, 197, 198 and 199.
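The text does not define "value-range and jump analysis" precisely. A minimal sketch under one reading, assuming each SMART attribute is available as one raw daily series per disk: an attribute's value range is its maximum minus its minimum over all observations, a "jump" is any day-to-day change, and attributes that never vary are discarded. The function names, the data layout and the `min_jumps` threshold are hypothetical; the failure-relatedness check that S111 also requires is not shown.

```python
import numpy as np


def range_and_jump_stats(series_by_disk):
    """Value range and jump count of one SMART attribute.

    series_by_disk: list of 1-D sequences, one raw daily series per disk
    (hypothetical data layout).
    """
    arrays = [np.asarray(s, dtype=float) for s in series_by_disk]
    all_values = np.concatenate(arrays)
    value_range = float(all_values.max() - all_values.min())
    # A "jump" is counted as any day-to-day change in the raw value.
    jumps = sum(int(np.count_nonzero(np.diff(a))) for a in arrays)
    return value_range, jumps


def select_attributes(data, min_jumps=1):
    """Keep attributes whose values actually vary (hypothetical criterion).

    data: {smart_attribute_id: [daily series per disk]}
    """
    selected = []
    for attr_id in sorted(data):
        value_range, jumps = range_and_jump_stats(data[attr_id])
        if value_range > 0 and jumps >= min_jumps:
            selected.append(attr_id)
    return selected
```

Constant attributes carry no signal for a tree model, so filtering them first shrinks the search space before any failure-correlation analysis.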
S112: Sample the disks in the disk data set according to a preset disk sampling ratio to obtain a plurality of failed disks and a plurality of normal disks for labeling.
In this sampling, the disk data set may be divided into a disk training set for training the machine learning algorithm, a disk validation set for validating it, and a disk test set for failure prediction. The preset disk sampling ratio may be set to normal disks : failed disks = 5 : 1; when sampling the training set, the normal disks can be downsampled to this ratio.
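A minimal sketch of the split and the 5:1 downsampling, assuming disks are identified by ID and split at the disk level so that all samples of one disk land in the same set. The 60/20/20 split fractions, the fixed seed and the function names are assumptions; only the 5:1 ratio comes from the text.

```python
import random


def split_disks(disk_ids, train_frac=0.6, val_frac=0.2, seed=0):
    """Split disk IDs into training, validation and test sets (fractions are assumptions)."""
    ids = list(disk_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def downsample_normal(failed_ids, normal_ids, ratio=5, seed=0):
    """Keep every failed disk and at most `ratio` normal disks per failed disk."""
    n_keep = min(len(normal_ids), ratio * len(failed_ids))
    kept_normal = random.Random(seed).sample(list(normal_ids), n_keep)
    return list(failed_ids), kept_normal
```

Downsampling is applied to the training set only; the validation and test sets keep their natural class ratio so that the evaluation reflects real deployment.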
S113: Label each failed disk, together with its SMART features within a predetermined period immediately preceding the failure, as a positive sample. In addition, a failed disk together with its SMART features within a predetermined period far from the failure may be labeled as a negative sample, and the SMART features of a normal disk within the last predetermined period before its cutoff date, or within a predetermined period far from the cutoff date, may likewise be labeled as negative samples.
For example, the daily SMART features of a failed disk within the 7 days before its failure date, together with the disk's ID, are labeled as positive samples; its daily SMART features within the 7 days preceding the point 30 days before the failure date, together with its ID, are labeled as negative samples; and the SMART features of a normal disk within its last 7 days and within the 7 days preceding the point 30 days before its cutoff date, together with its ID, are labeled as negative samples.
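The labeling rule in the example can be sketched per disk as below, assuming days are integer indices and that "the 7 days preceding the point 30 days before the failure" means the 7-day window ending exactly 30 days earlier. That boundary reading, the parameter names and the function names are assumptions; each labeled day would then be paired with the disk ID and that day's SMART features.

```python
def day_window(end_day, length=7):
    """Integer day indices of the `length` days ending at `end_day` (inclusive)."""
    return range(end_day - length + 1, end_day + 1)


def label_disk_days(last_day, failed, horizon=7, gap=30):
    """Map day index -> label (1 = positive, 0 = negative) for one disk.

    last_day: failure day for a failed disk, or cutoff day for a normal disk.
    horizon and gap follow the 7-day / 30-day example in the text.
    """
    labels = {}
    # Window ending `gap` days before the failure/cutoff: negative for every disk.
    for day in day_window(last_day - gap, horizon):
        labels[day] = 0
    # Window ending at the failure/cutoff: positive only if the disk failed.
    for day in day_window(last_day, horizon):
        labels[day] = 1 if failed else 0
    return labels
```

Taking negatives from failed disks far before their failure teaches the model that the same disk looked "healthy" earlier, which sharpens the boundary around the pre-failure window.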
As shown in FIG. 2, the disk failure prediction method further comprises:
S120: Extract the SMART features of each positive and negative sample according to a preset time sequence to obtain the time-series features of each sample.
Time-series features are extracted from the selected SMART features using a sliding window, which may be 3, 5 or 7 days long. The extracted feature is the exponentially weighted average of the differences between consecutive samples within the window.
Specifically, the step of extracting the SMART features of each positive and negative sample according to the preset time sequence to obtain the time-series features is as follows:
The SMART features of each positive and negative sample are computed according to the formulas:
diff(t) = S(t) - S(t-1);
[Formula image not reproduced: initial smoothed value Y[1]]
Y[t] = alpha*diff[t] + (1-alpha)*Y[t-1];
where S is the time series, t is time, diff is the difference between consecutive samples, Y is the exponentially smoothed series, and alpha is the smoothing coefficient.
Let S be the SMART time series and W the time window; the difference between consecutive samples on day t is then diff(t) = S(t) - S(t-1);
Let Y be the exponentially smoothed series and alpha the smoothing coefficient (alpha = 0.8 in this embodiment). The smoothed value for the first day is the mean of the time-series values over the first three days:
[Formula image not reproduced: initial smoothed value Y[1]]
From the second day onward, the smoothed value at the current time is: Y[t] = alpha*diff[t] + (1-alpha)*Y[t-1].
In this embodiment, the smoothed value at the last sample point within the window W is used as the feature value extracted for that SMART attribute.
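A minimal sketch of this feature for one SMART attribute, assuming the raw values arrive as consecutive daily readings. Two details are an interpretation rather than something the text pins down: the window is taken as the last W differences (which needs W + 1 raw readings), and Y[1] is the mean of the first three differences in the window, because the initialization formula is an image not reproduced here. The function and argument names are hypothetical; alpha = 0.8 is from the text.

```python
import numpy as np


def ewma_diff_feature(series, window=7, alpha=0.8, n_init=3):
    """Exponentially smoothed day-to-day difference at the end of the window."""
    s = np.asarray(series, dtype=float)
    if len(s) < window + 1:
        raise ValueError("need at least window + 1 daily values")
    # diff(t) = S(t) - S(t-1) for the last `window` days
    diff = np.diff(s[-(window + 1):])
    # Y[1]: mean of the first n_init differences (assumed reading of the initialization)
    y = float(diff[:n_init].mean())
    # Y[t] = alpha * diff[t] + (1 - alpha) * Y[t-1], applied from the second day onward
    for d in diff[1:]:
        y = alpha * float(d) + (1 - alpha) * y
    # The value at the last point of the window is the extracted feature.
    return y
```

For example, `ewma_diff_feature([0, 0, 0, 0, 4], window=3)` gives about 3.25: a SMART value that jumps only on the final day still yields a large feature, whereas the raw value would be zero on most days.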
Compared with the raw SMART features mentioned in the background, the weighted average of consecutive-sample differences measures the cumulative rate of change of the raw SMART values over a recent period, compensating for the sparsity of the raw SMART features.
After the SMART features of each positive and negative sample are extracted and their time-series features are obtained, the disk failure prediction method shown in FIG. 2 further comprises the following steps:
S130: Import a custom loss function into the XGBoost algorithm to obtain an improved XGBoost algorithm, wherein, under the custom loss function, misclassifying a positive sample incurs a greater loss than misclassifying a negative sample.
Specifically, as shown in FIG. 4, step S130 comprises the following sub-steps:
S131: Set the custom loss function, specifically:
[Formula images not reproduced: the custom loss function]
where w is the positive/negative sample weight factor, $y_i$ is the true label of the i-th sample, $\hat{y}_i$ is the predicted value (raw score) of the i-th sample, and $y_{i,\mathrm{pred}}$ is the predicted probability of the i-th sample.
$y_{i,\mathrm{pred}}$ is obtained by mapping the predicted value $\hat{y}_i$ into the range 0 to 1 [mapping formula image not reproduced], so that $y_{i,\mathrm{pred}}$ reflects the predicted probability of the i-th sample derived from its predicted value.
Here, XGBoost's default loss function is:
[Formula images not reproduced: XGBoost's default loss function]
This embodiment replaces XGBoost's default loss function with the above custom loss function, thereby importing the custom loss function and obtaining the improved XGBoost algorithm. Compared with the default loss function, the custom loss function adds a positive/negative sample weight factor w, which adjusts the relative contribution of positive and negative samples to the loss. In this embodiment the ratio of positive to negative samples is about 1:10, so w is set to 0.9. Because of w, misclassifying a positive sample as negative incurs a greater loss than misclassifying a negative sample as positive; adjusting XGBoost through w therefore biases its training toward positive samples.
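As a working assumption (the loss-function image is not reproduced), suppose positive samples are weighted by w and negative samples by 1 - w. The choice w = 0.9 is then close to inverse-frequency weighting for a 1:10 class ratio, and it fixes how much more a confident mistake on a positive sample costs than the same mistake on a negative one:

```latex
w \approx \frac{N_{\mathrm{neg}}}{N_{\mathrm{pos}} + N_{\mathrm{neg}}} = \frac{10}{1 + 10} \approx 0.91,
\qquad
\frac{w}{1 - w} = \frac{0.9}{0.1} = 9
```

Under this reading, misclassifying a positive sample with a given confidence contributes nine times the loss of misclassifying a negative sample with the same confidence.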
In addition, the custom loss function adds a prediction-difficulty adjustment factor [formula image not reproduced], which distinguishes how hard a sample is to predict. When a sample is easy to predict, that is, a positive sample's predicted probability is close to 1 or a negative sample's is close to 0, the adjustment factor approaches 0 and the loss decays exponentially toward 0. When a sample is hard to predict, that is, a positive sample's predicted probability is close to 0 or a negative sample's is close to 1, the factor approaches 1 and the loss is left nearly unchanged. Setting this prediction-difficulty adjustment factor [formula image not reproduced] adjusts XGBoost's training so that it focuses more on hard-to-predict samples.
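The loss formula itself is an image that is not reproduced here. A minimal sketch of one form that matches the behavior described above, assuming a weighted focal-style binary loss: positives are weighted by w and negatives by 1 - w, the difficulty factor is (1 - p)^gamma for positives and p^gamma for negatives, and p is the sigmoid of the raw score. The exponent `gamma` and its default of 2 are assumptions, as are the function names; with `gamma=0` the function reduces to a class-weighted log loss.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def weighted_focal_loss(raw_score, y, w=0.9, gamma=2.0, eps=1e-12):
    """Per-sample loss of an assumed weighted focal-style objective."""
    p = np.clip(sigmoid(np.asarray(raw_score, dtype=float)), eps, 1 - eps)
    y = np.asarray(y, dtype=float)
    # Positive samples: weight w, difficulty factor (1 - p)**gamma -> 0 when p -> 1 (easy).
    pos = -w * y * (1 - p) ** gamma * np.log(p)
    # Negative samples: weight 1 - w, difficulty factor p**gamma -> 0 when p -> 0 (easy).
    neg = -(1 - w) * (1 - y) * p ** gamma * np.log(1 - p)
    return pos + neg  # sum or average over samples for the total loss
```

For instance, with the default settings a positive sample scored at -2 costs roughly nine times as much as a negative sample scored at +2, and a confidently correct positive (score 4) costs far less than a confidently wrong one (score -4).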
S132: Take the first and second derivatives of the custom loss function.
Differentiating both sides of the original custom loss function
[Formula images not reproduced: the custom loss function]
with respect to $\hat{y}_i$ gives the first derivative:
[Formula image not reproduced: first derivative]
Differentiating the first derivative again gives the second derivative:
[Formula images not reproduced: second derivative]
S133: Import the first and second derivatives of the custom loss function into the XGBoost algorithm to obtain the improved XGBoost algorithm.
By importing the first and second derivatives of the custom loss function into XGBoost, the improved XGBoost algorithm can be used to predict disk failure probabilities. Because the custom loss function incorporates the positive/negative sample weight factor and the prediction-difficulty adjustment factor, training of the improved algorithm focuses more on positive samples and on hard-to-predict samples, addressing the prior-art problem that positive samples are too few and difficult to predict.
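XGBoost consumes a custom objective as per-sample first and second derivatives with respect to the raw score. A minimal sketch for the same assumed weighted focal-style loss (positives weighted by w with factor (1 - p)^gamma, negatives by 1 - w with factor p^gamma, p = sigmoid of the raw score). The closed forms below are derived from that assumed form by the chain rule with dp/dz = p(1 - p); they are not transcribed from the patent's formula images, and `gamma` and `min_hess` are assumptions.

```python
import numpy as np


def focal_grad_hess(raw_score, y, w=0.9, gamma=2.0, min_hess=1e-6):
    """Gradient and Hessian of the assumed weighted focal loss w.r.t. the raw score.

    Assumed loss per sample, with p = sigmoid(raw_score) and q = 1 - p:
        y = 1: -w * q**gamma * log(p)
        y = 0: -(1 - w) * p**gamma * log(q)
    """
    z = np.asarray(raw_score, dtype=float)
    y = np.asarray(y, dtype=float)
    p = np.clip(1.0 / (1.0 + np.exp(-z)), 1e-12, 1.0 - 1e-12)
    q = 1.0 - p
    log_p, log_q = np.log(p), np.log(q)

    # Positive-sample term.
    a = gamma * p * log_p - q
    g_pos = w * q**gamma * a
    h_pos = w * p * q**gamma * (-gamma * a + q * (gamma * log_p + gamma + 1.0))

    # Negative-sample term.
    b = p - gamma * q * log_q
    g_neg = (1.0 - w) * p**gamma * b
    h_neg = (1.0 - w) * q * p**gamma * (gamma * b + p * (1.0 + gamma * log_q + gamma))

    grad = y * g_pos + (1.0 - y) * g_neg
    hess = y * h_pos + (1.0 - y) * h_neg
    if min_hess is not None:
        # Focal-style losses are not convex everywhere, so the raw Hessian can be
        # negative; Newton-style leaf weights assume it is positive, hence the floor.
        hess = np.maximum(hess, min_hess)
    return grad, hess
```

With `gamma=0` this reduces to the familiar weighted log-loss derivatives, `-w*(1-p)` / `(1-w)*p` for the gradient and `w*p*(1-p)` / `(1-w)*p*(1-p)` for the Hessian. In XGBoost's Python API, a callable such as `lambda predt, dtrain: focal_grad_hess(predt, dtrain.get_label())` can be passed as the `obj` argument of `xgboost.train`; the scores it receives are raw margins, so probabilities are recovered by applying a sigmoid (for example to `Booster.predict(..., output_margin=True)`).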
S140: With the time-series features as inputs and the positive and negative sample labels as outputs, feed the data into the improved XGBoost algorithm so that it learns from the disk data set and yields a disk failure prediction model.
The disk failure prediction method provided by the technical solution of this application imports a custom loss function into the XGBoost algorithm. Under this loss function, misclassifying a positive sample incurs a greater loss than misclassifying a negative sample, so once it replaces XGBoost's default loss function, training is biased toward positive samples. Specifically, the disk data set is sampled using SMART, and positive samples corresponding to failed disks and negative samples corresponding to normal disks are labeled; the time-series features of each sample are then extracted. With the time-series features as inputs and the positive and negative labels as outputs, the data are fed into the XGBoost algorithm containing the custom loss function. XGBoost learns a classification boundary between positive and negative samples from the input features under the custom loss function, uses this boundary to assign each sample a probability of belonging to the positive or negative class, and thereby yields a trained disk failure prediction model.
In summary, the disk failure prediction method provided by the technical solution of this application imports a custom loss function into the XGBoost algorithm. Because misclassifying a positive sample incurs a greater loss than misclassifying a negative sample, training XGBoost on the time-series features and the positive and negative samples yields a model that predicts disk failures accurately, addressing the prior-art problem that positive samples corresponding to failed disks are hard to predict owing to the sparsity of SMART features.
In addition, as shown in FIG. 5, compared with the method shown in FIG. 2, the disk failure prediction method of FIG. 5 further comprises, after the disk failure prediction model is obtained:
S210:使用磁盘故障预测模型对磁盘测试集中的磁盘进行故障预测;得到各个磁盘的故障预测概率。S210: Use the disk failure prediction model to predict the failure of the disks in the disk test set; obtain the failure prediction probability of each disk.
S220:根据故障预测概率对故障磁盘进行排序,得到预设数量的故障磁盘。S220: Sort the failed disks according to the failure prediction probability to obtain a preset number of failed disks.
The disk failure prediction model comprises the above improved XGBoost algorithm, the classification boundary that the algorithm learned through machine learning, and the probability ranges corresponding to the positive and negative samples. Using the model to predict failures of the disks in the disk test set gives each disk's failure prediction probability at a specific time; sorting the predicted failed disks by this probability then yields the top preset number of failed disks. Specifically, the embodiment of this application can use a disk training set to compute the mean number N of disks that fail per day, and select the N samples with the highest probability as the disks predicted to fail in this round.
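Steps S210 and S220 amount to scoring every disk and keeping the N highest-scoring ones, with N set to the mean daily number of failed disks in the training set. A minimal sketch, assuming the per-disk probabilities are already available in a dict keyed by a hypothetical disk ID, and assuming a fractional mean is rounded to the nearest integer (the text does not say how it is handled):

```python
from statistics import mean
from typing import Dict, List, Sequence


def daily_failure_mean(daily_failure_counts: Sequence[int]) -> int:
    """N = mean number of failed disks per day in the training set, rounded (assumption)."""
    return max(1, round(mean(daily_failure_counts)))  # always flag at least one disk


def top_n_disks(probabilities: Dict[str, float], n: int) -> List[str]:
    """S220: rank disks by predicted failure probability and keep the top n."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [disk_id for disk_id, _ in ranked[:n]]


# S210 output (hypothetical probabilities from the trained model)
probs = {"disk-01": 0.10, "disk-02": 0.93, "disk-03": 0.55, "disk-04": 0.71}
n = daily_failure_mean([2, 4, 3])  # mean of 3 failures per day in training
print(top_n_disks(probs, n))  # ['disk-02', 'disk-04', 'disk-03']
```

Tying N to the historical failure rate keeps the number of alarms per round roughly proportional to how many disks actually fail, instead of depending on a fixed probability threshold.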
In summary, compared with raw SMART features, the weighted average of the differences between consecutive samples obtained in the embodiments of this application measures the cumulative rate of change of the original SMART values over a past period. Compared with using only the loss function of XGBoost's default configuration, the custom loss function makes the training process lean toward the minority samples and the hard-to-predict samples, and can therefore effectively improve the precision and recall of disk failure prediction.
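The weighted average of consecutive differences corresponds to the diff and exponential smoothing formulas in claim 5. The sketch below implements those two formulas for one SMART attribute series of one disk; the initialisation Y[0] = diff[0] and the example alpha values are assumptions, and the middle formula of claim 5 (available only as an image) is not reproduced.

```python
from typing import List, Sequence


def first_difference(s: Sequence[float]) -> List[float]:
    """diff(t) = S(t) - S(t-1) for t = 1 .. len(s) - 1."""
    return [s[t] - s[t - 1] for t in range(1, len(s))]


def exponential_smoothing(diff: Sequence[float], alpha: float) -> List[float]:
    """Y[t] = alpha*diff[t] + (1 - alpha)*Y[t-1], initialised with Y[0] = diff[0] (assumption)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    y: List[float] = []
    for t, d in enumerate(diff):
        y.append(float(d) if t == 0 else alpha * d + (1.0 - alpha) * y[t - 1])
    return y


def smart_change_feature(series: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Time series feature for one SMART attribute; alpha=0.3 is a placeholder value."""
    return exponential_smoothing(first_difference(series), alpha)


# Hypothetical daily raw values of one SMART counter
print(smart_change_feature([10, 10, 12, 18], alpha=0.5))  # [0.0, 1.0, 3.5]
```

Because each Y[t] blends the newest difference with the smoothed history, a SMART counter that starts climbing shortly before failure produces a steadily rising feature, whereas a large but static raw value contributes nothing.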
In addition, based on the same concept as the above method embodiments, an embodiment of the present invention further provides a disk failure prediction system for implementing the above method of the present invention. Since the system embodiment solves the problem on a principle similar to that of the above method, it has at least all the beneficial effects brought by the technical solutions of the above embodiments, which are not repeated here.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a disk failure prediction system provided by an embodiment of the present invention. As shown in FIG. 6, the disk failure prediction system includes:
a sampling module 101, configured to sample the disk data set using Self-Monitoring, Analysis and Reporting Technology (SMART), and label positive samples corresponding to failed disks and negative samples corresponding to normal disks;
an extraction module 102, configured to extract the SMART features of each positive and negative sample according to a preset time sequence, obtaining the time series features of each positive and negative sample;
an import module 103, configured to import a custom loss function into the eXtreme Gradient Boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, where, in the custom loss function, the loss caused by misclassifying a positive sample is greater than that caused by misclassifying a negative sample;
a machine learning module 104, configured to feed the time series features as input and the positive and negative samples as output into the improved XGBoost algorithm, so that the improved XGBoost algorithm performs machine learning on the disk data set to obtain a disk failure prediction model.
In addition, as shown in FIG. 7, the disk failure prediction system further includes:
a failure prediction module 105, configured to use the disk failure prediction model to predict failures of the disks in a disk test set, obtaining a failure prediction probability for each disk;
a disk sorting module 106, configured to sort the predicted failed disks by failure prediction probability to obtain a preset number of failed disks.
As shown in FIG. 8, the sampling module 101 in the embodiments shown in FIG. 6 and FIG. 7 includes:
a disk sampling submodule 1011, configured to sample the disks in the disk data set according to a preset disk sampling ratio, obtaining multiple failed disks and multiple normal disks for labeling;
a feature labeling submodule 1012, configured to label each failed disk, together with its SMART features within a predetermined period before the failure, as a positive sample.
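A minimal sketch of submodules 1011 and 1012, assuming that the "preset disk sampling ratio" means the number of normal disks drawn per failed disk, that each record is a hypothetical `(disk_id, day, features)` tuple, and that records of a failed disk outside the pre-failure window are discarded; the text fixes none of these details.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: (disk_id, day index, SMART feature values)
Record = Tuple[str, int, Sequence[float]]


def sample_disks(
    failed: Sequence[str], normal: Sequence[str], ratio: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Submodule 1011: keep every failed disk, draw `ratio` normal disks per failed disk."""
    rng = random.Random(seed)
    k = min(len(normal), ratio * len(failed))
    return list(failed), rng.sample(list(normal), k)


def label_records(
    records: Sequence[Record],
    failure_day: Dict[str, int],
    normal_ids: Sequence[str],
    window_days: int,
) -> List[Tuple[Sequence[float], int]]:
    """Submodule 1012: label 1 for a failed disk's records in the last `window_days`
    before its failure, 0 for sampled normal disks; everything else is dropped."""
    normal_set = set(normal_ids)
    labeled = []
    for disk_id, day, features in records:
        if disk_id in failure_day:
            if 0 <= failure_day[disk_id] - day < window_days:
                labeled.append((features, 1))
        elif disk_id in normal_set:
            labeled.append((features, 0))
    return labeled


records = [("f1", 10, [5.0]), ("f1", 7, [2.0]), ("f1", 3, [0.0]), ("n1", 5, [0.0])]
print(label_records(records, {"f1": 10}, ["n1"], window_days=5))
# [([5.0], 1), ([2.0], 1), ([0.0], 0)]
```

Limiting positives to a short window before failure keeps the positive class focused on the degradation period; the window length is a tuning choice the text leaves open.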
The sampling module 101 further includes a feature analysis submodule 1013, configured to perform value range and jump analysis on the disk data set using the SMART algorithm, obtaining multiple SMART features for disk failure analysis.
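The text does not define "value range and jump analysis". One plausible reading, sketched below purely as an assumption, is to discard SMART attributes whose value never varies (zero range) and attributes whose value rarely changes between consecutive samples of the same disk; the thresholds, the per-disk data layout, and the attribute names are hypothetical.

```python
from typing import Dict, List, Sequence


def value_range(values: Sequence[float]) -> float:
    """Spread of an attribute over the whole data set; 0 means it never varies."""
    return max(values) - min(values) if values else 0.0


def jump_rate(per_disk_series: Sequence[Sequence[float]]) -> float:
    """Fraction of consecutive sample pairs, within each disk, whose value changes."""
    changes = pairs = 0
    for series in per_disk_series:
        for prev, cur in zip(series, series[1:]):
            pairs += 1
            changes += int(prev != cur)
    return changes / pairs if pairs else 0.0


def select_smart_attributes(
    data: Dict[str, Sequence[Sequence[float]]],
    min_range: float = 0.0,
    min_jump_rate: float = 0.0,
) -> List[str]:
    """Keep attributes whose range and jump rate both exceed the (hypothetical) thresholds."""
    selected = []
    for attr, per_disk in data.items():
        flat = [v for series in per_disk for v in series]
        if value_range(flat) > min_range and jump_rate(per_disk) > min_jump_rate:
            selected.append(attr)
    return selected


data = {
    "smart_5_raw": [[0, 0, 3], [0, 0, 0]],  # rarely jumps
    "smart_3_raw": [[7, 7, 7], [7, 7, 7]],  # constant: zero range
    "smart_9_raw": [[1, 2, 3], [4, 5, 6]],  # changes every sample
}
print(select_smart_attributes(data))  # ['smart_5_raw', 'smart_9_raw']
```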
The specific embodiments of the computer-readable storage medium of the present invention are substantially the same as the embodiments of the above disk failure prediction method, and are not described in detail here.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing equipment to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is performed on the computer or other programmable equipment to produce computer-implemented processing; the instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
It should be noted that, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of components or steps not listed in a claim. The word "a" or "an" preceding a component does not exclude the presence of a plurality of such components. The invention can be implemented by means of hardware comprising several distinct components and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
Although the preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.

Claims (10)

  1. A disk failure prediction method, characterized by comprising:
    sampling a disk data set using Self-Monitoring, Analysis and Reporting Technology (SMART), and labeling positive samples corresponding to failed disks and negative samples corresponding to normal disks;
    extracting the SMART features of each positive and negative sample according to a preset time sequence to obtain the time series features of each positive and negative sample;
    importing a custom loss function into the eXtreme Gradient Boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, wherein, in the custom loss function, the loss caused by misclassifying a positive sample is greater than that caused by misclassifying a negative sample; and
    feeding the time series features as input and the positive and negative samples as output into the improved XGBoost algorithm, so that the improved XGBoost algorithm performs machine learning on the disk data set to obtain a disk failure prediction model.
  2. The disk failure prediction method according to claim 1, wherein after the disk failure prediction model is obtained, the disk failure prediction method further comprises:
    using the disk failure prediction model to predict failures of the disks in a disk test set to obtain a failure prediction probability for each disk; and
    sorting the failed disks according to the failure prediction probability to obtain a preset number of failed disks.
  3. The disk failure prediction method according to claim 1, wherein the step of sampling the disk data set using SMART and labeling positive samples corresponding to failed disks and negative samples corresponding to normal disks comprises:
    sampling the disks in the disk data set according to a preset disk sampling ratio to obtain multiple failed disks and multiple normal disks for labeling; and
    labeling each failed disk, together with its SMART features within a predetermined period before the failure, as the positive sample.
  4. The disk failure prediction method according to claim 3, wherein, before the step of sampling the disks in the disk data set according to the preset disk sampling ratio, the method further comprises:
    performing value range and jump analysis on the disk data set using the SMART algorithm to obtain multiple SMART features for disk failure analysis.
  5. The disk failure prediction method according to claim 1, wherein the step of extracting the SMART features of each positive and negative sample according to a preset time sequence to obtain the time series features of each positive and negative sample comprises:
    calculating the SMART features of each positive and negative sample according to the formulas:
    diff(t) = S(t) - S(t-1);
    Figure PCTCN2021073440-appb-100001
    Y[t] = alpha*diff[t] + (1 - alpha)*Y[t-1];
    wherein S is the time series, t is the time, diff is the difference between consecutive samples, Y is the exponentially smoothed series, and alpha is the smoothing coefficient.
  6. The disk failure prediction method according to claim 1, wherein the step of importing a custom loss function into the XGBoost algorithm to obtain the improved XGBoost algorithm comprises:
    setting a custom loss function:
    Figure PCTCN2021073440-appb-100002
    Figure PCTCN2021073440-appb-100003
    wherein w is the weighting factor for positive and negative samples, y_i is the true value of the i-th sample,
    Figure PCTCN2021073440-appb-100004
    is the predicted value of the i-th sample, and
    Figure PCTCN2021073440-appb-100005
    is the predicted probability value of the i-th sample;
    taking the first-order and second-order derivatives of the custom loss function to obtain its first derivative and second derivative; and
    importing the first derivative and the second derivative of the custom loss function into the XGBoost algorithm to obtain the improved XGBoost algorithm.
  7. A disk failure prediction system, characterized by comprising:
    a sampling module, configured to sample a disk data set using Self-Monitoring, Analysis and Reporting Technology (SMART), and label positive samples corresponding to failed disks and negative samples corresponding to normal disks;
    an extraction module, configured to extract the SMART features of each positive and negative sample according to a preset time sequence to obtain the time series features of each positive and negative sample;
    an import module, configured to import a custom loss function into the eXtreme Gradient Boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, wherein, in the custom loss function, the loss caused by misclassifying a positive sample is greater than that caused by misclassifying a negative sample; and
    a machine learning module, configured to feed the time series features as input and the positive and negative samples as output into the improved XGBoost algorithm, so that the improved XGBoost algorithm performs machine learning on the disk data set to obtain a disk failure prediction model.
  8. The disk failure prediction system according to claim 7, further comprising:
    a failure prediction module, configured to use the disk failure prediction model to predict failures of the disks in a disk test set to obtain a failure prediction probability for each disk; and
    a disk sorting module, configured to sort the failed disks according to the failure prediction probability to obtain a preset number of failed disks.
  9. The disk failure prediction system according to claim 7, wherein the sampling module comprises:
    a disk sampling submodule, configured to sample the disks in the disk data set according to a preset disk sampling ratio to obtain multiple failed disks and multiple normal disks for labeling; and
    a feature labeling submodule, configured to label each failed disk, together with its SMART features within a predetermined period before the failure, as the positive sample.
  10. The disk failure prediction system according to claim 9, wherein the sampling module further comprises:
    a feature analysis submodule, configured to perform value range and jump analysis on the disk data set using the SMART algorithm to obtain multiple SMART features for disk failure analysis.
PCT/CN2021/073440 2020-05-28 2021-01-23 Disk failure prediction method and system WO2021238258A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010471262.1A CN111752775B (en) 2020-05-28 2020-05-28 Disk fault prediction method and system
CN202010471262.1 2020-05-28

Publications (1)

Publication Number Publication Date
WO2021238258A1 true WO2021238258A1 (en) 2021-12-02

Family

ID=72674169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073440 WO2021238258A1 (en) 2020-05-28 2021-01-23 Disk failure prediction method and system

Country Status (2)

Country Link
CN (1) CN111752775B (en)
WO (1) WO2021238258A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114169603A (en) * 2021-12-04 2022-03-11 湖北第二师范学院 XGboost-based regional primary school entrance academic degree prediction method and system
CN115410638A (en) * 2022-07-28 2022-11-29 南京航空航天大学 Magnetic disk fault detection system based on contrast clustering

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111752775B (en) * 2020-05-28 2022-11-18 苏州浪潮智能科技有限公司 Disk fault prediction method and system
CN112308126A (en) * 2020-10-27 2021-02-02 深圳前海微众银行股份有限公司 Fault recognition model training method, fault recognition device and electronic equipment
CN112395179B (en) * 2020-11-24 2023-03-10 创新奇智(西安)科技有限公司 Model training method, disk prediction method, device and electronic equipment
CN113095390A (en) * 2021-04-02 2021-07-09 东北大学 Walking stick motion analysis system and method based on cloud database and improved ensemble learning
CN113722130A (en) * 2021-08-16 2021-11-30 华中科技大学 Disk fault prediction method and system
CN116383016B (en) * 2023-06-06 2023-10-10 天翼云科技有限公司 Method, device and equipment for monitoring state of magnetic disk and predicting fault

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6574754B1 (en) * 2000-02-14 2003-06-03 International Business Machines Corporation Self-monitoring storage device using neural networks
CN105589795A (en) * 2014-12-31 2016-05-18 中国银联股份有限公司 Disk failure prediction method and device based on prediction model
CN108986869A (en) * 2018-07-26 2018-12-11 南京群顶科技有限公司 A kind of disk failure detection method predicted using multi-model
CN109491850A (en) * 2018-11-21 2019-03-19 北京北信源软件股份有限公司 A kind of disk failure prediction technique and device
CN111752775A (en) * 2020-05-28 2020-10-09 苏州浪潮智能科技有限公司 Disk fault prediction method and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114169603A (en) * 2021-12-04 2022-03-11 湖北第二师范学院 XGboost-based regional primary school entrance academic degree prediction method and system
CN115410638A (en) * 2022-07-28 2022-11-29 南京航空航天大学 Magnetic disk fault detection system based on contrast clustering
CN115410638B (en) * 2022-07-28 2023-11-07 南京航空航天大学 Disk fault detection system based on contrast clustering

Also Published As

Publication number Publication date
CN111752775B (en) 2022-11-18
CN111752775A (en) 2020-10-09

Similar Documents

Publication Publication Date Title
WO2021238258A1 (en) Disk failure prediction method and system
CN108986869B (en) Disk fault detection method using multi-model prediction
CN108052528B (en) A kind of storage equipment timing classification method for early warning
CN108647136B (en) Hard disk damage prediction method and device based on SMART information and deep learning
EP3910571A1 (en) Methods and systems for server failure prediction using server logs
De Santo et al. Deep Learning for HDD health assessment: An application based on LSTM
CN109828869B (en) Method, device and storage medium for predicting hard disk fault occurrence time
US10216558B1 (en) Predicting drive failures
CN110164501B (en) Hard disk detection method, device, storage medium and equipment
CN111984511B (en) Multi-model disk fault prediction method and system based on two-classification
WO2022001125A1 (en) Method, system and device for predicting storage failure in storage system
CN111767162B (en) Fault prediction method for hard disks of different models and electronic device
CN112951311A (en) Hard disk fault prediction method and system based on variable weight random forest
CN116457802A (en) Automatic real-time detection, prediction and prevention of rare faults in industrial systems using unlabeled sensor data
CN111414289A (en) Disk failure prediction method and device based on transfer learning
CN111858108B (en) Hard disk fault prediction method and device, electronic equipment and storage medium
CN117251114A (en) Model training method, disk life prediction method, related device and equipment
CN115794451A (en) Execution strategy prediction method, device and system based on health state of storage equipment
CN115410638A (en) Magnetic disk fault detection system based on contrast clustering
CN111381990B (en) Disk fault prediction method and device based on flow characteristics
Zhou et al. Asldp: An active semi-supervised learning method for disk failure prediction
CN112737834A (en) Cloud hard disk fault prediction method, device, equipment and storage medium
CN115982622B (en) Nuclear reactor coolant system operation transient state rapid identification method, device and system
Xu et al. Classification Based Hard Disk Drive Failure Prediction: Methodologies, Performance Evaluation and Comparison
CN117093433B (en) Fault detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21812827

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21812827

Country of ref document: EP

Kind code of ref document: A1