WO2021238258A1

WO2021238258A1 - Disk failure prediction method and system

Info

Publication number: WO2021238258A1
Application number: PCT/CN2021/073440
Authority: WO
Inventors: 王团结; 梁鑫辉; 曹琪
Original assignee: 苏州浪潮智能科技有限公司
Priority date: 2020-05-28
Filing date: 2021-01-23
Publication date: 2021-12-02
Also published as: CN111752775B; CN111752775A

Abstract

A disk failure prediction method and system. The disk failure prediction method comprises: sampling disk data sets by using SMART, and performing marking to obtain positive samples corresponding to failed disks and negative samples corresponding to normal disks; extracting SMART features of each of the positive samples and negative samples according to a preset time sequence, so as to obtain a time sequence feature of each of the positive samples and negative samples; importing a custom loss function into an extreme gradient boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, wherein in the custom loss function, losses caused by misclassification of the positive samples are greater than those caused by misclassification of the negative samples; and taking the time sequence features as an input and taking the positive samples and the negative samples as an output, and importing the time sequence features and the positive and negative samples into the improved XGBoost algorithm, such that machine learning is performed on the disk data sets by means of the improved XGBoost algorithm, so as to obtain a disk failure prediction model. According to the technical solution of the present invention, the problems in the prior art of it being difficult to predict positive samples corresponding to failed disks, and the prediction accuracy of the failed disks not being high can thus be solved.

Description

Disk failure prediction method and system

This application claims priority to Chinese patent application No. 202010471262.1, filed with the Chinese Patent Office on May 28, 2020 and entitled "A method and system for disk failure prediction", the entire content of which is incorporated herein by reference.

Technical field

The invention relates to the technical field of intelligent operation and maintenance, and in particular to a disk failure prediction method and system.

Background art

In large-scale data centers, the number of hard drives in use has reached the millions. Frequent disk failures reduce the stability and reliability of the storage system, and even of the entire IT infrastructure, and can negatively affect business service level agreements. The disk is also the component with the highest failure rate in the data center. Whether the symptom is abnormal disk read/write speed or data loss, the consequences for any enterprise are serious. If a disk failure can be predicted before it occurs, and the disks that may become abnormal can be backed up or replaced in time, the loss caused by disk failures is greatly reduced, operation of the storage system becomes more convenient, and the reliability of the data center is effectively improved.

SMART (Self-Monitoring, Analysis and Reporting Technology) is a system and specification for automatic hard disk status detection and early warning. Detection instructions preset in the hard disk hardware monitor the operation of the hard disk hardware (such as magnetic heads, platters, motors and circuits). The traditional failure prediction method compares the feature values of the samples obtained by SMART monitoring with the preset safety values set by the manufacturer. If a monitored feature value is about to exceed, or has already exceeded, the safe range defined by the preset safety value, the monitoring hardware or software of the host automatically warns the user and initiates data recovery. However, this failure prediction method triggers a large number of disk IO processes and affects the user's normal business. To improve on it, related technologies use machine learning methods to predict disk failures, which allows user data to be processed during off-peak business hours; this is more meaningful and valuable than recovering data after the event.

However, because the number of disk failures is often small, disk failure prediction is technically very challenging. A disk failure that causes system downtime is a low-probability event. For disk storage systems that are small in scale or have been under load for only a short time, the number of failed disks is very small. At the same time, the SMART features of a disk are sparse and change abruptly only when the disk is close to failure, so the values of most failure-related SMART features are zero. The sparsity of SMART features therefore means that the large number of negative samples corresponding to normal disks are easy to predict, while the positive samples corresponding to failed disks are difficult to predict.

Summary of the invention

The present invention provides a disk failure prediction method and system, which aim to solve the problems that existing disk failure prediction technology has low accuracy when only a small number of failure samples is available and that positive samples are difficult to predict.

To achieve the above objective, the present invention provides a disk failure prediction method, including:

Use self-monitoring, analysis and reporting SMART technology to sample the disk data set, and mark the positive samples corresponding to the failed disk and the negative samples corresponding to the normal disk;

Extract the SMART feature of each positive sample and negative sample according to the preset time sequence, and obtain the time sequence feature of each positive sample and negative sample;

Import a custom loss function into the extreme gradient boosting XGBoost algorithm to obtain an improved XGBoost algorithm, wherein, in the custom loss function, the loss caused by misclassifying positive samples is greater than the loss caused by misclassifying negative samples;

Taking time series features as input and positive samples and negative samples as output, they are imported into the improved XGBoost algorithm, so that the improved XGBoost algorithm performs machine learning on the disk data set to obtain a disk failure prediction model.

Preferably, after the disk failure prediction model is obtained, the disk failure prediction method further includes:

Use the disk failure prediction model to predict the failure of the disks in the disk test set; obtain the failure prediction probability of each disk;

Sort the failed disks according to the predicted probability of failure to obtain a preset number of failed disks.

Preferably, the steps of using self-monitoring, analysis and reporting SMART technology to sample the disk data set, and marking the positive samples corresponding to the failed disk and the negative samples corresponding to the normal disk include:

Sampling the disks in the disk data set according to the preset sampling disk ratio to obtain multiple failed disks and multiple normal disks for marking;

Mark the faulty disk and the SMART characteristics of each faulty disk within a predetermined period of time near the failure, as a positive sample.

Preferably, before the step of sampling the disks in the disk data set according to the preset sampling disk ratio, the method further includes: using the SMART algorithm to perform range and jump analysis on the disk data set to obtain multiple SMART features for disk failure analysis.

Preferably, the step of extracting the SMART feature of each positive sample and negative sample according to a preset time sequence, and obtaining the time sequence feature of each positive sample and negative sample, includes:

According to the formulas:

diff(t) = S(t) - S(t-1);

Y(t) = alpha * diff(t) + (1 - alpha) * Y(t-1);

the SMART features of each positive sample and negative sample are calculated, where S is the time series, t is the time, diff is the difference between consecutive samples, Y is the exponential smoothing sequence, and alpha is the smoothing coefficient.

Preferably, the steps of importing a custom loss function into the extreme gradient boosting XGBoost algorithm to obtain the improved XGBoost algorithm include:

Set a custom loss function:

[formula not reproduced in the source text]

where w is the weighting factor of positive and negative samples, y_i is the true value of the i-th sample, [symbol not reproduced] is the predicted value of the i-th sample, and y_i,pred is the predicted probability value of the i-th sample;

Take the first and second derivatives of the custom loss function to obtain its first derivative and second derivative;

Import the first derivative and the second derivative of the custom loss function into the XGBoost algorithm to obtain the improved XGBoost algorithm.

According to the second aspect of the present invention, the present invention also provides a disk failure prediction system, including:

The sampling module is used to sample the disk data set using self-monitoring, analysis and reporting SMART technology, and mark the positive samples corresponding to the failed disk and the negative samples corresponding to the normal disk;

The extraction module is used to extract the SMART features of each positive sample and negative sample according to the preset time sequence, and obtain the time sequence characteristics of each positive sample and negative sample;

The import module is used to import a custom loss function into the extreme gradient boosting XGBoost algorithm to obtain an improved XGBoost algorithm, wherein, in the custom loss function, the loss caused by misclassifying positive samples is greater than the loss caused by misclassifying negative samples;

The machine learning module is used to import time series features as input and positive samples and negative samples as output to the improved XGBoost algorithm, so that the improved XGBoost algorithm performs machine learning on the disk data set to obtain a disk failure prediction model.

Preferably, the disk failure prediction system further includes:

The failure prediction module is used to use the disk failure prediction model to predict the failure of the disks in the disk test set; obtain the failure prediction probability of each disk;

The disk sorting module is used to sort the faulty disks according to the failure prediction probability to obtain a preset number of faulty disks.

Preferably, the sampling module includes:

The disk sampling sub-module is used to sample the disks in the disk data set according to the preset sampling disk ratio to obtain multiple failed disks and multiple normal disks for marking;

The feature marking sub-module is used to mark the faulty disk and the SMART feature of each faulty disk within a predetermined period of time near the failure, as a positive sample.

Preferably, the sampling module further includes:

The feature analysis sub-module is used to perform range and jump analysis on the disk data set using the SMART algorithm to obtain multiple SMART features for disk failure analysis.

The disk failure prediction solution provided by the technical solution of this application imports a custom loss function into the extreme gradient boosting XGBoost algorithm. In the custom loss function, the loss caused by misclassifying a positive sample is greater than the loss caused by misclassifying a negative sample. Once the custom loss function has been imported into the XGBoost algorithm and has replaced the original loss function of XGBoost, the XGBoost algorithm is more inclined toward positive samples during training. Specifically, SMART technology is used to sample the disk data set and to mark the positive samples corresponding to failed disks and the negative samples corresponding to normal disks; the time series features of each positive and negative sample are then extracted and, with the time series features as input and the positive and negative samples as output, imported into the XGBoost algorithm that contains the custom loss function. The XGBoost algorithm can perform machine learning on the input time series features according to the custom loss function to obtain the classification boundary between positive and negative samples. The classification boundary determines the probability that each sample belongs to the positive or the negative class, thereby training a disk failure prediction model.

To sum up, the disk failure prediction method provided by the technical solution of this application imports a custom loss function into the extreme gradient boosting XGBoost algorithm. Because the loss caused by misclassifying positive samples in the custom loss function is greater than that caused by misclassifying negative samples, training the XGBoost algorithm with the time series features and the positive and negative samples yields a disk failure prediction model that predicts disk failures accurately, thereby solving the problem in the prior art that the positive samples corresponding to failed disks are difficult to predict because of the sparsity of SMART features.

Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention. Those of ordinary skill in the art can obtain other drawings based on the structures shown in these drawings without creative work.

FIG. 1-A is a schematic diagram of a first type of disk failure provided by the prior art;

FIG. 1-B is a schematic diagram of a second type of disk failure provided by the prior art;

FIG. 2 is a schematic flowchart of a first disk failure prediction method provided by an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a method for labeling positive and negative samples provided by the embodiment shown in FIG. 2;

FIG. 4 is a schematic flowchart of a method for importing a custom loss function provided by the embodiment shown in FIG. 2;

FIG. 5 is a schematic flowchart of a second disk failure prediction method provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a first disk failure prediction system provided by an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of a second disk failure prediction system provided by an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a sampling module provided by the embodiment shown in FIG. 6.

The realization of the objectives, functional characteristics and advantages of the present invention will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed description

It should be understood that the specific embodiments described here are only used to explain the present invention, but not used to limit the present invention.

The main problem to be solved by the embodiment of the present invention is:

Because the number of disk failures is often small, disk failure prediction is technically very challenging. A disk failure that causes system downtime is a low-probability event. For disk storage systems that are small in scale or have been under load for only a short time, the number of failed disks is very small. At the same time, the SMART features of a disk are sparse and change abruptly only when the disk is close to failure, so the values of most failure-related SMART features are zero. Furthermore, the sparsity of SMART features makes the positive samples corresponding to failed disks difficult to predict. As shown in FIG. 1-A and FIG. 1-B, statistical analysis found that even in the last 7 days of a bad disk, the values of 50%-75% of SMART features such as SMART5 and SMART187 are all 0. Moreover, the SMART features do not change significantly until the last 1-15 days of the remaining life of the failed disk. As shown in FIG. 1-A, the SMART5 value of this disk did not change until the last 10 days and did not increase significantly until the last 4 days; as shown in FIG. 1-B, SMART187 did not change until the last day. This phenomenon is common on bad disks: the closer a disk is to the end of its life, the more likely abrupt changes are to occur.

To solve the above problems, please refer to FIG. 2. FIG. 2 is a schematic flowchart of a disk failure prediction method provided by an embodiment of the present invention. As shown in FIG. 2, the disk failure prediction method includes the following steps:

S110: Use self-monitoring, analysis and reporting SMART technology to sample the disk data set, and mark the positive samples corresponding to the failed disk and the negative samples corresponding to the normal disk.

The positive sample and the negative sample can be used as the output of machine learning and imported into the machine learning model, so that the relevant algorithm can predict the failure probability of the disk according to the type of the positive sample and the negative sample, and predict the failure of the disk.

Specifically, as shown in FIG. 3, step S110 (use self-monitoring, analysis and reporting SMART technology to sample the disk data set, and mark the positive samples corresponding to failed disks and the negative samples corresponding to normal disks) includes the following sub-steps:

S111: Use the SMART algorithm to perform range and jump analysis on the disk data set to obtain multiple SMART features for disk failure analysis. The selected SMART features need to be related to failures and have a large information divergence. In the embodiment of this application, a total of 7 SMART features are selected: 5, 187, 192, 193, 197, 198, and 199.
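The text does not define the range and jump analysis in detail. The following is a minimal sketch of one possible reading, assuming that "range" means the spread between the largest and smallest value of a feature and that "jump" means its largest day-to-day change. The function names and the sample values are hypothetical and are not taken from the embodiment.

```python
def value_range(series):
    """Spread of a SMART feature over the observed period (assumed meaning of 'range')."""
    return max(series) - min(series)


def max_jump(series):
    """Largest day-to-day change of a SMART feature (assumed meaning of 'jump')."""
    return max((abs(b - a) for a, b in zip(series, series[1:])), default=0)


def summarize_features(history):
    """history maps a SMART feature ID to its list of daily values."""
    return {
        feature_id: {"range": value_range(values), "max_jump": max_jump(values)}
        for feature_id, values in history.items()
    }


# Hypothetical daily values for one failed disk (illustrative numbers only).
failed_disk_history = {
    5: [0, 0, 0, 0, 8, 24, 96],
    187: [0, 0, 0, 0, 0, 0, 3],
}
summary = summarize_features(failed_disk_history)
print(summary)
```

Features whose range and jump statistics differ clearly between failed and normal disks would then be candidates for selection; the selection criterion itself is left open here.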

S112: Sampling the disks in the disk data set according to the preset sampling disk ratio to obtain multiple failed disks and multiple normal disks for marking.

In this sampling, the disk data set can be divided into a disk training set for training the relevant machine learning algorithm, a disk validation set for validating it, and a disk test set for predicting failures of the relevant disks. The preset sampling disk ratio can be set as normal disks : failed disks = 5:1. When the training set is sampled, down-sampling can be performed according to the preset sampling disk ratio.
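The down-sampling at a 5:1 ratio can be sketched as follows. This is only an illustration under the assumption that all failed disks are kept and normal disks are drawn at random without replacement; the function name, the fixed seed, and the disk identifiers are hypothetical.

```python
import random


def downsample_normal_disks(normal_ids, failed_ids, ratio=5, seed=0):
    """Keep every failed disk and at most `ratio` normal disks per failed disk."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    normal_ids = list(normal_ids)
    n_keep = min(len(normal_ids), ratio * len(failed_ids))
    return rng.sample(normal_ids, n_keep), list(failed_ids)


# Hypothetical disk identifiers.
normal = [f"normal-{i}" for i in range(100)]
failed = ["failed-0", "failed-1", "failed-2"]
kept_normal, kept_failed = downsample_normal_disks(normal, failed)
print(len(kept_normal), len(kept_failed))
```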

S113: Mark each failed disk, together with its SMART features within a predetermined time period near the failure, as a positive sample. In addition, a failed disk together with its SMART features within a predetermined time period far from the failure can be marked as a negative sample; and a normal disk together with its SMART features within the last predetermined time period before its cut-off date, or within a predetermined time period far from the cut-off date, can also be marked as a negative sample.

For example, the daily SMART features of a failed disk within 7 days of the failure date, together with the number of the failed disk, are marked as a positive example; the daily SMART features of the failed disk within the 7 days before the 30 days preceding the failure date, together with the number of the failed disk, are marked as a negative example; and the SMART features of a normal disk in its last 7 days and in the 7 days before its last 30 days, together with the number of the normal disk, are marked as negative examples.
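The labeling windows of this example can be sketched as follows. The sketch assumes that days are counted backwards from the failure date (or from the last recorded day of a normal disk), with day 0 being that date itself, and that "the 7 days before 30 days" refers to days 30 to 36. Both the counting convention and the function names are assumptions made for illustration.

```python
def label_failed_disk_day(days_before_failure, window=7, gap=30):
    """Return 1 (positive), 0 (negative) or None (day not used) for a failed disk."""
    if 0 <= days_before_failure < window:
        return 1  # close to the failure: positive sample
    if gap <= days_before_failure < gap + window:
        return 0  # far from the failure: negative sample
    return None


def label_normal_disk_day(days_before_last_record, window=7, gap=30):
    """Return 0 (negative) or None (day not used) for a normal disk."""
    near = 0 <= days_before_last_record < window
    far = gap <= days_before_last_record < gap + window
    return 0 if (near or far) else None


labels = [label_failed_disk_day(d) for d in (0, 6, 7, 30, 36, 37)]
print(labels)
```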

As shown in Figure 2, the disk failure prediction method further includes:

S120: Extract the SMART feature of each positive sample and the negative sample according to the preset time sequence, and obtain the time sequence feature of each positive sample and the negative sample.

To extract time series features from the multiple SMART features, a sliding window needs to be set. The sliding window can be 3 days, 5 days, or 7 days. Specifically, the extraction method takes the exponentially weighted average of the differences between consecutive samples within a window period.

Specifically, the steps of extracting the SMART features of each positive sample and negative sample according to a preset time sequence to obtain the time sequence characteristics of each positive sample and negative sample are as follows:

According to the formulas:

diff(t) = S(t) - S(t-1);

Y(t) = alpha * diff(t) + (1 - alpha) * Y(t-1);

Let the SMART time series be S and the time window be W; the difference between consecutive samples on day t is then diff(t) = S(t) - S(t-1).

Let the exponential smoothing sequence be Y and the smoothing coefficient be alpha. In this embodiment, alpha = 0.8. The exponential smoothing value of the first day is the mean of the time series values of the previous three days, given by the following formula:

[formula not reproduced in the source text]

From the second day onward, the exponential smoothing value at the current time is Y(t) = alpha * diff(t) + (1 - alpha) * Y(t-1).

In the embodiment of the present application, the exponential smoothing value of the last sample point in the window period W is used as the feature value extracted by the SMART technology.

Compared with the original SMART features mentioned in the background art, the weighted average of the differences between consecutive samples measures the cumulative rate of change of the original SMART values over a period of time and compensates for the defect caused by the sparsity of the SMART features.
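The differencing and exponential smoothing described above can be sketched in Python as follows. The initialization formula is not reproduced in this text, so the sketch assumes that the first smoothing value is the mean of the first three differences; that choice, like the function name and the sample series, is an assumption for illustration rather than the embodiment's definition.

```python
def smoothed_diff_feature(series, alpha=0.8, init_days=3):
    """Exponentially smoothed day-to-day difference at the last point of a window.

    series: raw daily values of one SMART feature inside the window period.
    """
    if len(series) < 2:
        raise ValueError("need at least two samples to form a difference")
    # diff(t) = S(t) - S(t-1)
    diffs = [b - a for a, b in zip(series, series[1:])]
    # Assumed initialization: mean of the first (up to) three differences.
    head = diffs[:init_days]
    y = sum(head) / len(head)
    # Y(t) = alpha * diff(t) + (1 - alpha) * Y(t-1), from the second difference on.
    for d in diffs[1:]:
        y = alpha * d + (1 - alpha) * y
    return y


# Hypothetical SMART 5 values: flat for a week, then a sudden increase.
feature = smoothed_diff_feature([0, 0, 0, 0, 0, 0, 0, 10])
print(feature)
```

With alpha = 0.8 the most recent change dominates the feature, so a jump on the last day of the window is reflected strongly while a long flat history contributes nothing.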

After extracting the SMART features of each positive sample and negative sample, and obtaining the timing characteristics of the positive and negative samples, the disk failure prediction method shown in Figure 2 further includes the following steps:

S130: Import a custom loss function into the extreme gradient boosting XGBoost algorithm to obtain an improved XGBoost algorithm; wherein, in the custom loss function, the loss caused by the misclassification of positive samples is greater than that of negative samples.

Specifically, as shown in FIG. 4, this step S130: Import a custom loss function into the extreme gradient boosting XGBoost algorithm to obtain an improved XGBoost algorithm, which specifically includes the following sub-steps:

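S131: Set a custom loss function, the custom loss function being specifically:

[formula not reproduced in the source text]

where w is the weighting factor of positive and negative samples, y_i is the true value of the i-th sample, [symbol not reproduced] is the predicted value of the i-th sample, and y_i,pred is the predicted probability value of the i-th sample.

y_i,pred is obtained from the predicted value through the mapping

[formula not reproduced in the source text]

whose range is 0 to 1, so that y_i,pred reflects the predicted probability of the i-th sample obtained from the predicted value.

The default loss function of XGBoost is:

[formula not reproduced in the source text]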

The embodiment of this application replaces the default loss function of the XGBoost algorithm with the above custom loss function, thereby importing the custom loss function and obtaining an improved XGBoost algorithm. Compared with the default loss function, the custom loss function adds a positive and negative sample weight factor w, which adjusts the proportion of positive and negative samples in the loss function. In this embodiment, the ratio of positive to negative samples is about 1:10, so the value of w is 0.9. Because of the weight factor w, the loss for misclassifying a positive sample as negative is greater than the loss for misclassifying a negative sample as positive. Adjusting the XGBoost algorithm through w therefore makes the training process of XGBoost more inclined toward positive samples.
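As a minimal sketch of how a weight factor such as w can make a missed positive cost more than a false alarm, the following Python code assumes a weighted logistic (cross-entropy) loss and a sigmoid mapping from predicted value to probability. The exact custom loss function of the embodiment is not reproduced in this text, so this form, the function names, and the sample scores are assumptions for illustration only; the prediction difficulty adjustment factor is left out here.

```python
import math


def sigmoid(z):
    """Numerically stable mapping of a raw predicted value to the range (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_log_loss(y_true, raw_score, w=0.9, eps=1e-12):
    """Assumed form: -(w*y*ln(p) + (1-w)*(1-y)*ln(1-p)) for a single sample."""
    p = min(max(sigmoid(raw_score), eps), 1.0 - eps)
    return -(w * y_true * math.log(p) + (1 - w) * (1 - y_true) * math.log(1 - p))


# A failed disk scored as "healthy" versus a healthy disk scored as "failed",
# both equally far on the wrong side of the decision boundary.
missed_positive = weighted_log_loss(1, -2.0)
false_alarm = weighted_log_loss(0, 2.0)
print(missed_positive, false_alarm, missed_positive / false_alarm)
```

Under this assumed form and w = 0.9, the two errors differ by a factor of w / (1 - w) = 9.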

In addition, the custom loss function adds a prediction difficulty adjustment factor:

[formula not reproduced in the source text]

The prediction difficulty adjustment factor distinguishes how difficult a sample is to predict. When a sample is easy to predict, that is, the predicted probability of a positive sample is close to 1 or the predicted probability of a negative sample is close to 0, the prediction difficulty adjustment factor approaches 0 and the loss function approaches 0 exponentially. When a sample is difficult to predict, that is, the predicted probability of a positive sample is close to 0 or the predicted probability of a negative sample is close to 1, the prediction difficulty adjustment factor approaches 1 and the loss function is relatively unchanged. Setting the prediction difficulty adjustment factor therefore adjusts the training process of the XGBoost algorithm so that it is more inclined toward samples that are difficult to predict.
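The behavior described for the prediction difficulty adjustment factor resembles the modulating factor used in focal-loss-style objectives. The sketch below assumes such a factor, (1 - p_true) ** gamma, where p_true is the probability assigned to the true class of the sample. The form of the factor and the exponent gamma are assumptions made for illustration; the factor actually used in the embodiment is not reproduced in this text.

```python
def difficulty_factor(y_true, p, gamma=2.0):
    """Assumed focal-style factor: near 0 for easy samples, near 1 for hard ones.

    y_true: 1 for a positive (failed disk) sample, 0 for a negative sample.
    p: predicted probability that the sample is positive.
    gamma: hypothetical exponent controlling how fast easy samples are discounted.
    """
    p_true = p if y_true == 1 else 1.0 - p
    return (1.0 - p_true) ** gamma


easy_positive = difficulty_factor(1, 0.99)  # confidently correct
hard_positive = difficulty_factor(1, 0.01)  # confidently wrong
easy_negative = difficulty_factor(0, 0.01)  # confidently correct
print(easy_positive, hard_positive, easy_negative)
```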

S132: Take the first and second derivatives of the custom loss function to obtain its first derivative and second derivative.

For the original custom loss function:

[formula not reproduced in the source text]

taking the first derivative of both sides with respect to [symbol not reproduced] gives the first derivative:

[formula not reproduced in the source text]

Differentiating the first derivative again gives the second derivative:

[formula not reproduced in the source text]

S133: Import the first derivative and the second derivative of the custom loss function into the XGBoost algorithm respectively to obtain an improved XGBoost algorithm.

By importing the first and second derivatives of the custom loss function into the XGBoost algorithm, the improved XGBoost algorithm can be used to predict the failure probability of a disk. Because the custom loss function adds the positive and negative sample weight factor and the prediction difficulty adjustment factor, the training process of the improved XGBoost algorithm is more inclined toward positive samples and samples that are difficult to predict, which solves the problem in the prior art that positive samples are too few and difficult to predict.
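A custom objective is typically supplied to XGBoost as a function that returns, for every sample, the first and second derivative of the loss with respect to the raw predicted value. The sketch below shows these derivatives for a simplified loss only: a weighted logistic loss, -(w*y*ln(p) + (1-w)*(1-y)*ln(1-p)) with p = sigmoid(z), without the prediction difficulty adjustment factor. This simplified loss and all names are assumptions; the embodiment's own derivatives are not reproduced in this text. Plain Python lists are used to keep the sketch self-contained; XGBoost's Python API passes NumPy arrays and a data matrix to such a callback instead.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_logloss_grad_hess(raw_scores, labels, w=0.9):
    """Per-sample first and second derivatives with respect to the raw score z.

    dL/dz   = w*y*(p - 1) + (1 - w)*(1 - y)*p
    d2L/dz2 = (w*y + (1 - w)*(1 - y)) * p * (1 - p)
    """
    grads, hesses = [], []
    for z, y in zip(raw_scores, labels):
        p = sigmoid(z)
        grads.append(w * y * (p - 1.0) + (1.0 - w) * (1 - y) * p)
        hesses.append((w * y + (1.0 - w) * (1 - y)) * p * (1.0 - p))
    return grads, hesses


# One positive and one negative sample, both at raw score 0 (p = 0.5).
grads, hesses = weighted_logloss_grad_hess([0.0, 0.0], [1, 0])
print(grads, hesses)
```

At the same raw score, the positive sample receives a gradient nine times larger in magnitude than the negative sample, which is how the weighting steers each boosting round toward the failed-disk class.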

S140: Take the time series feature as input and take the positive sample and the negative sample as the output, and import it into the improved XGBoost algorithm, so that the improved XGBoost algorithm performs machine learning on the disk data set to obtain a disk failure prediction model.

The disk failure prediction method provided by the technical solution of this application imports a custom loss function into the extreme gradient boosting XGBoost algorithm. In the custom loss function, the loss caused by misclassifying a positive sample is greater than the loss caused by misclassifying a negative sample. Once the custom loss function has been imported into the XGBoost algorithm and has replaced the original loss function of XGBoost, the XGBoost algorithm is more inclined toward positive samples during training. Specifically, SMART technology is used to sample the disk data set and to mark the positive samples corresponding to failed disks and the negative samples corresponding to normal disks; the time series features of each positive and negative sample are then extracted and, with the time series features as input and the positive and negative samples as output, imported into the XGBoost algorithm that contains the custom loss function. The XGBoost algorithm can perform machine learning on the input time series features according to the custom loss function to obtain the classification boundary between positive and negative samples. The classification boundary determines the probability that each sample belongs to the positive or the negative class, thereby training a disk failure prediction model.

In summary, the disk failure prediction method provided by the technical solution of this application imports a custom loss function into the extreme gradient boosting XGBoost algorithm. Because the loss caused by misclassifying positive samples in the custom loss function is greater than that caused by misclassifying negative samples, training the XGBoost algorithm with the time series features and the positive and negative samples yields a disk failure prediction model that predicts disk failures accurately, thereby solving the problem in the prior art that the positive samples corresponding to failed disks are difficult to predict because of the sparsity of SMART features.

In addition, compared with the disk failure prediction method shown in FIG. 2, the disk failure prediction method shown in FIG. 5 further includes, after the disk failure prediction model is obtained:

S210: Use the disk failure prediction model to predict the failure of the disks in the disk test set; obtain the failure prediction probability of each disk.

S220: Sort the failed disks according to the failure prediction probability to obtain a preset number of failed disks.

The disk failure prediction model includes the above improved XGBoost algorithm, the classification boundary obtained by the algorithm through machine learning, and the probability ranges corresponding to the positive and negative samples. Using the disk failure prediction model to predict failures of the disks in the disk test set yields the failure prediction probability of each disk at a specific time; the disks predicted to fail can then be sorted by failure prediction probability to obtain the preset number of failed disks. Specifically, in the embodiment of this application, the average number N of disks that fail per day can be counted on the disk training set, and the N samples with the highest probability are selected as the disks predicted to fail this time.
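The ranking step can be sketched as follows, assuming the model has already produced one failure probability per disk. The function names, the rounding of the daily average to an integer N, and the sample data are hypothetical.

```python
def average_daily_failures(daily_failure_counts):
    """N: average number of disks failing per day in the training period (at least 1)."""
    mean = sum(daily_failure_counts) / len(daily_failure_counts)
    return max(1, round(mean))


def select_predicted_failures(probabilities, n):
    """Return the IDs of the n disks with the highest predicted failure probability."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [disk_id for disk_id, _ in ranked[:n]]


# Hypothetical training statistics and model outputs for one day.
n = average_daily_failures([2, 1, 3, 2])
probabilities = {"disk-a": 0.10, "disk-b": 0.90, "disk-c": 0.40, "disk-d": 0.70}
predicted = select_predicted_failures(probabilities, n)
print(n, predicted)
```

Selecting a fixed number of disks per day, rather than applying a probability threshold, keeps the number of alerts in line with the failure rate observed in the training set.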

In summary, compared with the original SMART features, the weighted average of the differences between consecutive samples obtained in the embodiments of this application measures the cumulative rate of change of the original SMART values over a period of time. Compared with using only the default loss function of XGBoost, the custom loss function makes the model training process more inclined toward the small number of positive samples and toward samples that are difficult to predict. Therefore, it can effectively improve the accuracy and recall of disk failure prediction.

In addition, based on the same concept as the above method embodiments, the embodiments of the present invention also provide a disk failure prediction system for implementing the above method. Since the system embodiments solve the problem on a principle similar to that of the method, they have at least all the beneficial effects brought about by the technical solutions of the above embodiments, which are not repeated here.

Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a disk failure prediction system provided by an embodiment of the present invention. As shown in FIG. 6, the disk failure prediction system includes:

The sampling module 101 is used to sample the disk data set using the self-monitoring, analysis and reporting SMART technology, and mark the positive samples corresponding to the failed disks and the negative samples corresponding to the normal disks;

The extraction module 102 is configured to extract the SMART feature of each positive sample and negative sample according to a preset time sequence to obtain the time sequence feature of each positive sample and negative sample;

The import module 103 is used to import a custom loss function into the extreme gradient boosting XGBoost algorithm to obtain an improved XGBoost algorithm, wherein, in the custom loss function, the loss caused by misclassifying positive samples is greater than the loss caused by misclassifying negative samples;

The machine learning module 104 is configured to take the time sequence features as input and the positive samples and negative samples as output, and to import them into the improved XGBoost algorithm, so that the improved XGBoost algorithm performs machine learning on the disk data set to obtain a disk failure prediction model.

In addition, as shown in FIG. 7, the disk failure prediction system further includes:

The failure prediction module 105 is configured to use the disk failure prediction model to predict failures of the disks in the disk test set and obtain the failure prediction probability of each disk;

The disk sorting module 106 is used to sort the faulty disks according to the failure prediction probability to obtain a preset number of faulty disks.
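The sorting step performed by module 106 can be illustrated with a minimal sketch. It assumes the prediction model has already produced one failure probability per disk; the function name, the disk identifiers, and the 0.5 threshold used to decide which disks count as predicted failures are hypothetical and are not specified in the embodiment.

```python
from typing import Dict, List, Tuple


def top_k_predicted_failures(
    failure_prob: Dict[str, float], k: int, threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return up to k disks with the highest predicted failure probability."""
    # Keep only disks flagged as likely to fail (the threshold is an assumption).
    flagged = [(disk, p) for disk, p in failure_prob.items() if p >= threshold]
    # Highest probability first; ties are broken by disk id for determinism.
    flagged.sort(key=lambda item: (-item[1], item[0]))
    return flagged[:k]


# Hypothetical probabilities for four disks in a test set.
probs = {"disk-a": 0.91, "disk-b": 0.12, "disk-c": 0.67, "disk-d": 0.88}
ranked = top_k_predicted_failures(probs, k=2)
```

Here `k` plays the role of the preset number of faulty disks, so an operator could review only the highest-risk disks first.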

Wherein, as shown in FIG. 8, the sampling module 101 in the embodiment shown in FIG. 6 and FIG. 7 includes:

The disk sampling sub-module 1011 is used to sample the disks in the disk data set according to the preset sampling disk ratio to obtain multiple failed disks and multiple normal disks for marking;

The feature marking sub-module 1012 is used to mark each failed disk, together with the SMART features of that disk within a predetermined time period near the failure, as a positive sample.
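The marking rule can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the function name `label_samples`, the 7-day window, and the choice to drop older records of failed disks are all hypothetical, since the embodiment only states that SMART features within a predetermined period near the failure become positive samples.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# (disk id, sampling day, SMART attribute values); layout is an assumption.
Record = Tuple[str, date, Dict[str, float]]


def label_samples(
    records: List[Record], failure_day: Dict[str, date], window_days: int = 7
) -> List[Tuple[str, date, Dict[str, float], int]]:
    """Attach label 1 (positive) or 0 (negative) to each SMART record."""
    window = timedelta(days=window_days)
    labeled = []
    for disk_id, day, smart in records:
        failed_on = failure_day.get(disk_id)
        if failed_on is None:
            # Disk never failed: negative sample.
            labeled.append((disk_id, day, smart, 0))
        elif timedelta(0) <= failed_on - day <= window:
            # Record falls inside the window before the failure: positive sample.
            labeled.append((disk_id, day, smart, 1))
        # Older records of failed disks are skipped (an assumption of this sketch).
    return labeled


records = [
    ("d1", date(2020, 5, 1), {"attr": 0.0}),
    ("d1", date(2020, 5, 20), {"attr": 8.0}),
    ("d2", date(2020, 5, 20), {"attr": 0.0}),
]
labeled = label_samples(records, {"d1": date(2020, 5, 22)}, window_days=7)
```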

The sampling module 101 also includes a feature analysis sub-module 1013, which is used to perform range and jump analysis on the disk data set using the SMART algorithm to obtain multiple SMART features for disk failure analysis.
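The embodiment does not define range and jump analysis in detail. The sketch below shows one possible reading, purely as an assumption: the range of an attribute is taken as its maximum minus its minimum over the observation period, the jump as the largest absolute change between consecutive samples, and an attribute is kept for failure analysis only if both exceed chosen thresholds. The function names, attribute names, and thresholds are hypothetical.

```python
from typing import Dict, List, Sequence


def value_range(series: Sequence[float]) -> float:
    """Spread of an attribute over the observation period."""
    return max(series) - min(series)


def max_jump(series: Sequence[float]) -> float:
    """Largest absolute change between two consecutive samples."""
    return max((abs(b - a) for a, b in zip(series, series[1:])), default=0.0)


def select_features(
    history: Dict[str, List[float]], min_range: float, min_jump: float
) -> List[str]:
    """Keep attributes whose range and jump both exceed the thresholds."""
    selected = []
    for name, series in history.items():
        if len(series) < 2:
            continue  # not enough samples to measure a change
        if value_range(series) > min_range and max_jump(series) > min_jump:
            selected.append(name)
    return selected


history = {
    "attr_a": [0.0, 0.0, 2.0, 10.0],         # changes abruptly
    "attr_b": [100.0, 101.0, 102.0, 103.0],  # drifts steadily
    "attr_c": [3.0, 3.0, 3.0, 3.0],          # constant
}
selected = select_features(history, min_range=0.0, min_jump=1.0)
```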

The specific embodiments of the computer-readable storage medium of the present invention are basically the same as the above-mentioned embodiments of the disk failure prediction method, and will not be described in detail here.

Those skilled in the art should understand that the embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program codes.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowchart and/or block diagram, and each combination of processes and/or blocks in the flowchart and/or block diagram, can be realized by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device for realizing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

It should be noted that in the claims, any reference signs located between parentheses should not be construed as limitations on the claims. The word "comprising" does not exclude the presence of parts or steps not listed in the claims. The word "a" or "an" preceding a component does not exclude the presence of multiple such components. The invention can be implemented by means of hardware comprising several different components and by means of a suitably programmed computer. In the unit claims that list several devices, several of these devices may be embodied in the same hardware item. The use of the words first, second, third, etc. does not indicate any order. These words can be interpreted as names.

Although the preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic creative concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.

Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from the spirit and scope of the present invention. In this way, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include these modifications and variations.

Claims

1. A disk failure prediction method, characterized by comprising:

sampling a disk data set using Self-Monitoring, Analysis and Reporting Technology (SMART), and marking the samples to obtain positive samples corresponding to failed disks and negative samples corresponding to normal disks;

extracting the SMART features of each positive sample and each negative sample according to a preset time sequence to obtain a time sequence feature of each positive sample and each negative sample;

importing a custom loss function into the extreme gradient boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, wherein, in the custom loss function, the loss caused by misclassifying positive samples is greater than the loss caused by misclassifying negative samples; and

taking the time sequence features as input and the positive samples and negative samples as output, and importing them into the improved XGBoost algorithm, so that the improved XGBoost algorithm performs machine learning on the disk data set to obtain a disk failure prediction model.
2. The disk failure prediction method according to claim 1, wherein after the disk failure prediction model is obtained, the disk failure prediction method further comprises:

using the disk failure prediction model to predict failures of the disks in a disk test set and obtain the failure prediction probability of each disk; and

sorting the faulty disks according to the failure prediction probability to obtain a preset number of faulty disks.
3. The disk failure prediction method according to claim 1, wherein the step of sampling the disk data set using SMART and marking the samples to obtain positive samples corresponding to failed disks and negative samples corresponding to normal disks comprises:

sampling the disks in the disk data set according to a preset sampling disk ratio to obtain multiple failed disks and multiple normal disks for marking; and

marking each failed disk, together with the SMART features of that disk within a predetermined period of time near the failure, as a positive sample.
4. The disk failure prediction method according to claim 3, wherein before the step of sampling the disks in the disk data set according to a preset sampling disk ratio, the method further comprises:

performing range and jump analysis on the disk data set using the SMART algorithm to obtain multiple SMART features for disk failure analysis.
5. The disk failure prediction method according to claim 1, wherein the step of extracting the SMART features of each positive sample and each negative sample according to a preset time sequence to obtain the time sequence feature of each positive sample and each negative sample comprises:

calculating the SMART features of each positive sample and each negative sample according to the formulas:

diff(t) = S(t) - S(t-1);

Y(t) = alpha * diff(t) + (1 - alpha) * Y(t-1);

wherein S is the time series, t is the time, diff is the difference between consecutive samples, Y is the exponentially smoothed series, and alpha is the smoothing coefficient.
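The two formulas above can be traced with a short sketch. The symbols `s`, `alpha`, and the returned series correspond to S, alpha, and Y in the formulas; the function name and the starting value Y = 0 before the first difference are assumptions, since the source does not state an initial value.

```python
from typing import List, Sequence


def smoothed_differences(s: Sequence[float], alpha: float) -> List[float]:
    """Exponentially smooth the first differences of a SMART time series.

    diff(t) = S(t) - S(t-1)
    Y(t)    = alpha * diff(t) + (1 - alpha) * Y(t-1)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    y: List[float] = []
    prev = 0.0  # assumed initial value of Y; not specified in the source
    for t in range(1, len(s)):
        diff = s[t] - s[t - 1]
        prev = alpha * diff + (1.0 - alpha) * prev
        y.append(prev)
    return y


# Hypothetical raw values of one SMART attribute on four consecutive days.
raw = [0.0, 0.0, 2.0, 10.0]
feature = smoothed_differences(raw, alpha=0.5)  # [0.0, 1.0, 4.5]
```

A larger alpha makes the feature react faster to the most recent change, while a smaller alpha gives more weight to the accumulated history.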
6. The disk failure prediction method according to claim 1, wherein the step of importing a custom loss function into the extreme gradient boosting (XGBoost) algorithm to obtain the improved XGBoost algorithm comprises:

setting a custom loss function:

[formula not reproduced in this text]

wherein w is the weighting factor of positive and negative samples, y_i is the true value of the i-th sample, and the two remaining symbols (not reproduced in this text) denote the predicted value of the i-th sample and the predicted probability value of the i-th sample, respectively;

taking the first-order derivative and the second-order derivative of the custom loss function, respectively, to obtain the first derivative and the second derivative of the custom loss function; and

importing the first derivative and the second derivative of the custom loss function into the XGBoost algorithm to obtain the improved XGBoost algorithm.
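The exact loss formula is not reproduced in this text, so the following sketch substitutes a class-weighted logistic loss as a stand-in. That choice, the function names, and the weight value are assumptions and not the claimed loss. It only illustrates how a first and second derivative are obtained per sample when positive samples are penalized w times more than negative ones. For a raw score z with p = sigmoid(z), the per-sample loss is -w*y*log(p) - (1-y)*log(1-p).

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_logloss_grad_hess(
    margins: Sequence[float], labels: Sequence[int], w: float
) -> Tuple[List[float], List[float]]:
    """First and second derivatives of the assumed weighted logistic loss
    with respect to the raw score z of each sample."""
    grads: List[float] = []
    hess: List[float] = []
    for z, y in zip(margins, labels):
        p = sigmoid(z)
        scale = w if y == 1 else 1.0  # positives cost w times more
        grads.append(scale * (p - y))         # dL/dz
        hess.append(scale * p * (1.0 - p))    # d2L/dz2
    return grads, hess


# One positive and one negative sample, both with raw score 0 (p = 0.5).
g, h = weighted_logloss_grad_hess([0.0, 0.0], [1, 0], w=5.0)
```

XGBoost's custom-objective interface expects these two per-sample quantities; to use the sketch there, the same arithmetic would have to be wrapped in a callable that operates on the library's array inputs.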
7. A disk failure prediction system, characterized by comprising:

a sampling module, configured to sample a disk data set using Self-Monitoring, Analysis and Reporting Technology (SMART), and to mark the samples to obtain positive samples corresponding to failed disks and negative samples corresponding to normal disks;

an extraction module, configured to extract the SMART features of each positive sample and each negative sample according to a preset time sequence to obtain a time sequence feature of each positive sample and each negative sample;

an import module, configured to import a custom loss function into the extreme gradient boosting (XGBoost) algorithm to obtain an improved XGBoost algorithm, wherein, in the custom loss function, the loss caused by misclassifying positive samples is greater than the loss caused by misclassifying negative samples; and

a machine learning module, configured to take the time sequence features as input and the positive samples and negative samples as output, and to import them into the improved XGBoost algorithm, so that the improved XGBoost algorithm performs machine learning on the disk data set to obtain a disk failure prediction model.
8. The disk failure prediction system according to claim 7, further comprising:

a failure prediction module, configured to use the disk failure prediction model to predict failures of the disks in a disk test set and obtain the failure prediction probability of each disk; and

a disk sorting module, configured to sort the faulty disks according to the failure prediction probability to obtain a preset number of faulty disks.
9. The disk failure prediction system according to claim 7, wherein the sampling module comprises:

a disk sampling sub-module, configured to sample the disks in the disk data set according to a preset sampling disk ratio to obtain multiple failed disks and multiple normal disks for marking; and

a feature marking sub-module, configured to mark each failed disk, together with the SMART features of that disk within a predetermined period of time near the failure, as a positive sample.
10. The disk failure prediction system according to claim 9, wherein the sampling module further comprises:

a feature analysis sub-module, configured to perform range and jump analysis on the disk data set using the SMART algorithm to obtain multiple SMART features for disk failure analysis.