WO2017129030A1 - Disk failure prediction method and apparatus

Info

Publication number: WO2017129030A1
Application number: PCT/CN2017/071695
Authority: WO
Inventors: 丁永明; 周俊; 崔卿; 瞿神全
Original assignee: 阿里巴巴集团控股有限公司; 丁永明; 周俊; 崔卿; 瞿神全
Priority date: 2016-01-29
Filing date: 2017-01-19
Publication date: 2017-08-03
Also published as: TW201732591A; CN107025154A; CN107025154B

Abstract

Disclosed are a disk failure prediction method and apparatus. The method comprises: acquiring sample disk data of a disk through disk monitoring technology, the sample disk data comprising sample data on a plurality of dimensions (S21); performing sample training on the sample disk data by using a GBDT algorithm, to obtain a disk prediction model consisting of a plurality of decision-making trees (S23); and after disk data of a disk to be predicted is received, processing the disk data of the disk to be predicted by using the disk prediction model consisting of the plurality of decision-making trees, and determining whether the disk to be predicted is a failed disk (S25). The method solves the technical problem in the prior art of an inaccurate prediction result caused by the fact that some factors resulting in hard disk failures cannot be collected or quantized in a hard disk failure prediction system.

Description

Disk failure prediction method and device

Technical field

The present invention relates to the field of magnetic disks, and in particular to a method and apparatus for predicting failure of a magnetic disk.

Background technique

At present, the hard disk is the main medium for storing data, and a hard disk failure can cause huge data loss. Ensuring the stability of hard disks is therefore very important. Under normal conditions, the probability of a hard disk error within 24 hours is about one in ten thousand. When a server has ten hard disks, the probability of a hard disk error on that server rises to about one in a thousand. As website business grows, the number of hard disks that servers need will increase, and so will the probability of multiple hard disks failing at the same time.
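The figures above can be checked with a short calculation. The sketch below assumes that the ten disks fail independently and with the same 24-hour probability, which the text does not state explicitly; the variable names are illustrative only.

```python
# Assumption: each disk fails independently with the same 24-hour probability.
p_single = 1 / 10_000   # per-disk error probability in 24 hours (from the text)
n_disks = 10            # disks per server (from the text)

# Probability that at least one of the disks has an error within 24 hours.
p_server = 1 - (1 - p_single) ** n_disks

print(f"{p_server:.6f}")
```

Under these assumptions the result is very close to one in a thousand, in line with the figure quoted in the text.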

Data is usually stored with multiple backups: for example, MySQL uses primary and standby databases, and GFS keeps three copies of each file by default. On a large data storage platform, if multiple hard disks fail at the same time, there is a high probability that those disks hold copies of the same file; that is, some files will be lost. Most online services depend on the huge amount of data stored on servers, so a hard disk failure can make such a service behave abnormally or even suspend it.

For the above reasons, a system is needed that can tell in advance which hard disks will fail. There are many causes of hard disk failure and data loss. The most common are external vibration, temperature and humidity, damage to electrical components, sound, and dust. Some of these factors can be collected, such as temperature, humidity, and some component data, but more of them cannot be collected or quantified, which leads to inaccurate prediction results.

No effective solution has yet been proposed for the problem that, in prior-art hard disk failure prediction systems, some factors that easily cause hard disk failure cannot be collected or quantified, which makes the prediction results inaccurate.

Summary of the invention

The embodiments of the invention provide a method and an apparatus for predicting disk failure, so as to at least solve the technical problem that, in prior-art hard disk failure prediction systems, prediction results are inaccurate because some factors that easily cause hard disk failure cannot be collected or quantified.

According to one aspect of the embodiments of the present invention, a method for predicting disk failure is provided, including: acquiring sample disk data of a disk by using a disk monitoring technology, where the sample disk data includes sample data in multiple dimensions; performing sample training on the sample disk data by using a GBDT algorithm to obtain a disk prediction model composed of a plurality of decision trees; and, after receiving disk data of a disk to be tested, processing the disk data of the disk to be tested by using the disk prediction model composed of the plurality of decision trees to determine whether the disk to be tested is a failed disk.

According to another aspect of the embodiments of the present invention, an apparatus for predicting disk failure is also provided. The apparatus acquires sample disk data of a disk by using a disk monitoring technology, where the sample disk data includes sample data in multiple dimensions; performs sample training on the sample disk data by using the GBDT algorithm to obtain a disk prediction model composed of multiple decision trees; and, after receiving disk data of a disk to be tested, processes that data by using the disk prediction model composed of multiple decision trees to determine whether the disk to be tested is a failed disk.

In the embodiments of the present invention, sample disk data of a disk is acquired by using a disk monitoring technology, where the sample disk data includes sample data in multiple dimensions; the sample disk data is sample-trained by using the GBDT algorithm to obtain a disk prediction model composed of multiple decision trees; and, after disk data of a disk to be tested is received, that data is processed by using the disk prediction model to determine whether the disk to be tested is a failed disk. This achieves the technical effect of predicting the failure state of a disk, and thereby solves the technical problem that, in prior-art hard disk failure prediction systems, prediction results are inaccurate because some factors that easily cause hard disk failure cannot be collected or quantified.

DRAWINGS

The drawings described herein are intended to provide a further understanding of the invention and form a part of this application. In the drawings:

FIG. 1 is a block diagram showing the hardware structure of a computer terminal for predicting a failure of a magnetic disk according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for predicting a failure of a magnetic disk according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of training sample disk data using the GBDT algorithm according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of calculating a predicted value of a disk using a GBDT algorithm according to an embodiment of the present invention;

FIG. 5 is a flowchart of an optional disk fault prediction method according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a fault prediction apparatus for a magnetic disk according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an optional disk fault prediction apparatus according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of an optional disk fault prediction apparatus according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of an optional disk fault prediction apparatus according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of an optional disk fault prediction apparatus according to an embodiment of the present invention;

FIG. 11 is a block diagram showing the structure of a computer terminal according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

It is to be understood that the terms "first", "second" and the like in the specification and claims of the present invention are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. Data so used may be interchanged where appropriate, so that the embodiments of the invention described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "comprise" and "have", and any variations of them, are intended to cover a non-exclusive inclusion: for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product, or device.

Example 1

According to an embodiment of the present invention, an embodiment of a method for predicting disk failure is provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order from the one described herein.

The method embodiment provided in Embodiment 1 of the present application can be executed in a mobile terminal, a computer terminal, or a similar device. Taking a computer terminal as an example, FIG. 1 is a hardware block diagram of a computer terminal for a method for predicting disk failure according to an embodiment of the present invention. As shown in FIG. 1, computer terminal 10 may include one or more (only one is shown) processors 102 (processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. It will be understood by those skilled in the art that the structure shown in FIG. 1 is merely illustrative and does not limit the structure of the above electronic device. For example, computer terminal 10 may include more or fewer components than those shown in FIG. 1, or have a different configuration from that shown in FIG. 1.

The memory 104 can be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the disk failure prediction method in the embodiments of the present invention. By running the software programs and modules stored in the memory 104, the processor 102 executes various functional applications and data processing, that is, implements the disk failure prediction method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to computer terminal 10 via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

Transmission device 106 is used to receive or transmit data via a network. Specific examples of the network may include a wireless network provided by the communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network Interface Controller (NIC) that can be connected to other network devices through a base station so as to communicate with the Internet. In another example, the transmission device 106 can be a Radio Frequency (RF) module that communicates with the Internet wirelessly.

In the above operating environment, the present application provides a method for predicting a failure of a magnetic disk as shown in FIG. 2. FIG. 2 is a flowchart of a method for predicting a failure of a magnetic disk according to an embodiment of the present invention.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because certain steps may be performed in other orders or concurrently in accordance with the present invention. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

Through the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present invention that is essential, or that contributes to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, or a network device) to perform the methods described in the various embodiments of the present invention.

As shown in FIG. 2, the method for predicting a failure of a magnetic disk according to Embodiment 1 of the present invention includes:

Step S21: Obtain sample disk data of the disk by using a disk monitoring technology, where the sample disk data includes sample data in multiple dimensions.

In the above step, disk monitoring technology is used to monitor the various disk data generated while the disk is in use after it leaves the factory, in order to predict the failure state of the disk. The disk user can thus learn that the disk is about to fail before it actually fails, and can copy and store the data on the disk to avoid data loss.

In an optional embodiment, the sample disk data may include: the underlying data read error rate, the start/stop count, the number of remapped sectors, the accumulated power-on time, the number of spindle spin-up retries, the number of disk calibration retries, the number of disk power-on cycles, the temperature, and the write error rate. The sample disk data can be obtained based on historical disk failure records. For example, samples can be collected at a ratio of 1:5 between positive and negative samples, where a positive sample is a failed disk and a negative sample is a disk with no failure.
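As an illustration of the 1:5 sampling ratio, the following minimal sketch draws five non-failed disks for every failed disk. The function name, the disk identifiers, and the use of random sampling are assumptions made for this example; the text does not say how the negative samples are chosen.

```python
import random

def sample_training_set(failed, healthy, neg_per_pos=5, seed=0):
    """Return (disk, label) pairs with about 1 positive per neg_per_pos negatives.

    Label 1 marks a failed disk (positive sample), label 0 a disk with no failure.
    """
    rng = random.Random(seed)
    n_neg = min(len(healthy), neg_per_pos * len(failed))
    negatives = rng.sample(healthy, n_neg)   # assumption: uniform random choice
    return [(d, 1) for d in failed] + [(d, 0) for d in negatives]

failed_disks = [f"failed-{i}" for i in range(3)]       # hypothetical disk IDs
healthy_disks = [f"healthy-{i}" for i in range(100)]   # hypothetical disk IDs
train = sample_training_set(failed_disks, healthy_disks)
```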

It should be noted that, when disk data is acquired through disk monitoring technology, the disks used by the various organizations that predict disk failure are not necessarily the same, and environmental factors such as temperature and humidity differ between organizations and affect their disks to different degrees. To provide more reliable data for training, the sample disk data can also be acquired according to the actual disk damage records of the organization concerned.

Step S23: Perform sample training on the sample disk data by using a GBDT algorithm to obtain a disk prediction model composed of multiple decision trees.

In the above step, GBDT (Gradient Boosting Decision Tree) is an iterative decision tree algorithm. It consists of multiple decision trees and accumulates the conclusions of all the trees to obtain the final result. Each decision tree serves as a predictive model in which each decision builds on the result of the previous one; a tree includes decision points, state nodes, and result nodes. Each node in the tree represents the object being predicted, and each branch path represents an attribute of that object.

In an optional embodiment, take the case where the sample disk data is the original (raw) SMART value of the disk. During sample training, a sample disk whose original value is greater than or equal to a preset original value is considered to have a high probability of failure, and a sample disk whose original value is less than the preset value is considered to have a low probability of failure. Therefore, when the disk prediction model is determined, the attribute of a sample disk is confirmed as faulty if its original value is greater than or equal to the preset original value, and as non-faulty otherwise. A disk prediction model with this decision-making capability is thus established. When the data of a disk to be tested is input to the decision tree, the tree automatically confirms the disk as faulty if its original value is greater than or equal to the preset original value, and as non-faulty if its original value is less than the preset original value.
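The single-threshold decision described above can be sketched as follows. The function name and the preset value of 50 are hypothetical and serve only to make the comparison concrete; the text does not give an actual preset value.

```python
def predict_by_threshold(raw_value, preset_raw_value):
    """Label a disk as faulty when its raw SMART value reaches the preset value."""
    return "faulty" if raw_value >= preset_raw_value else "non-faulty"

# Hypothetical preset value, e.g. for a reallocated-sector count.
PRESET = 50

for raw in (12, 50, 73):
    print(raw, predict_by_threshold(raw, PRESET))
```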

Step S25: After receiving the disk data of the disk to be tested, the disk data of the disk to be tested is processed by using the disk prediction model consisting of multiple decision trees to determine whether the disk to be tested is a faulty disk.

In an optional embodiment, the values of the sample disk in multiple dimensions are used as evaluation indicators to obtain a plurality of decision trees, and the decision trees together form a disk prediction model that is used to evaluate the disk to be tested.

It is worth noting that the decision trees obtained from the individual dimensions of the disk may or may not be the same. Therefore, when multiple decision trees are combined into a disk prediction model, the weight of each decision tree needs to be determined according to its importance in the evaluation system in order to obtain the disk prediction model.

It should be noted that acquiring the sample disk data through disk monitoring technology makes the acquisition process simpler and the acquired data more comprehensive, which provides rich sample data for training. In the above steps, the sample training performed on the sample disk data with the GBDT algorithm may be carried out one or more times (for example, split into two rounds) to improve the accuracy and recall rate of the disk prediction model composed of the decision trees obtained from training.
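For reference, the two quality measures mentioned above can be computed as in the sketch below. It reads "accuracy" as precision on the positive (failed-disk) class, which is an assumption about the intended meaning; the function name and the sample labels are illustrative.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (1 = failed disk)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of predicted failures that are real
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of real failures that are found
    return precision, recall

# Hypothetical ground truth and predictions for five disks.
precision, recall = precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
```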

Therefore, the solution of Embodiment 1 provided by the present application solves the technical problem that, in prior-art hard disk failure prediction systems, prediction results are inaccurate because some factors that easily cause hard disk failure cannot be collected or quantified.

According to the above embodiment of the present application, in a preferred embodiment, the sample disk data includes at least sample data in four dimensions: original value, standard value, worst value, and cumulative value.

The original value is the current value of a parameter while the disk is running; the standard value is the value of each parameter when the disk operates normally; the worst value is the value, observed while the disk is running, at which a monitored parameter deviates most from its normal value; and the cumulative value is the accumulated result of each monitored parameter from the time the disk was first used up to the current time.

In an optional embodiment, the parameters of the disk may be information describing various attributes of the disk. They may include one or more of the read error rate, the number of power-on cycles, the number of reallocated sectors, the number of spin retries, the number of disk calibration retries, and the parity error rate, and may also include other attribute information of the disk.

The above steps of the present application can respectively obtain a plurality of different decision trees by using the sample data in the above four dimensions.

In an optional embodiment, the sample disk data can be obtained by using software such as HDTune or CrystalDiskInfo.

According to the above embodiment of the present application, in a preferred solution, after acquiring the sample disk data of the disk by using the disk monitoring technology, the method further includes:

Step S211, performing any one or more of the following operations on the sample data in each dimension: a difference operation, a square operation, and a distribution sum operation, so that the sample data in any one dimension is expanded into sample data in a new dimension.

In the above step, the sample data is processed further: each operation result extends the data into a new dimension, and the result is the sample data in that dimension.

It is worth noting that several operations can be applied to the sample data in each dimension to obtain sample data in further dimensions derived from it. Starting from the four dimensions, performing the difference operation, the square operation, and the distribution sum operation on each dimension separately yields sample data in sixteen dimensions, and decisions made with the sample data of each dimension have a different focus.

In an optional embodiment, taking the sample data of the original value as an example again, the difference operation, the square operation, and the distribution sum operation are performed on it to obtain sample data in four new dimensions. The sample data in these four new dimensions is used as decision indicators for training, which yields four new decision trees.
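The expansion of one dimension can be sketched as follows. Two assumptions are made here: the readings are taken 24 hours apart, so that the difference is a day-over-day change, and the "distribution sum", which the text does not define, is read as a running sum of the differences. The function and key names are illustrative.

```python
from itertools import accumulate

def expand_dimension(daily_values):
    """Expand one dimension's daily readings into derived series.

    Assumptions: readings are 24 hours apart, and the 'distribution sum'
    is interpreted as a running sum of the differences.
    """
    diffs = [b - a for a, b in zip(daily_values, daily_values[1:])]
    squares = [d * d for d in diffs]
    running_sum = list(accumulate(diffs))
    return {"difference": diffs, "square": squares, "running_sum": running_sum}

raw = [3, 3, 5, 9]   # hypothetical daily raw values of one SMART attribute
features = expand_dimension(raw)
```

Applying the same function to each of the four dimensions gives, together with the unmodified series, four series per dimension, which matches the sixteen dimensions mentioned in the text.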

According to the above embodiment of the present application, in a preferred solution, performing sample training on the sample disk data by using the GBDT algorithm to obtain a disk prediction model composed of multiple decision trees includes:

Step S231, using the sample disk data of all disks as training data, and initializing the classification model parameters of the training data with default values.

In the above step, initializing the classification model parameters may mean presetting the number of decision trees and the number of layers of each decision tree, that is, performing the initial setting of the decision tree attributes.

Step S233, extracting a plurality of feature data from the training data, creating a decision tree with each feature data as a root node, and using the feature value corresponding to each feature data as a leaf node of the corresponding decision tree.

Step S235: Calculate an optimal partition of all current leaf nodes and a gain thereof, and perform splitting with the leaf node with the largest gain and the corresponding split point, so that the sample disk data is divided into the child nodes.

In the above step, the gain may be based on minimizing the mean square error of the label values: the difference between the label value of each sample and its predicted label value is squared, and the squares of all the differences are summed. The more samples are predicted incorrectly, the larger the mean square error, so the best basis for branching can be found by minimizing the mean square error.

The decision tree may be a binary tree with a feature data as its root node. Each feature data corresponds to a feature value, and the feature value is a leaf node of the decision tree rooted at that feature data. After the leaf nodes of the decision tree are determined, they are divided further. It is worth noting that, when the gains of several leaf nodes differ, the leaf node with the largest gain is split, so that all sample data can be divided into the corresponding leaf nodes.
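A minimal sketch of choosing a split point by minimizing the squared error is given below. It assumes that each side of a split predicts the mean label of its samples; the function names and the feature values for disks A to D are hypothetical, while the labels 0, 0, 1, 1 follow the example in the text.

```python
def squared_error(labels):
    """Sum of squared differences between each label and the mean label."""
    if not labels:
        return 0.0
    mean = sum(labels) / len(labels)
    return sum((y - mean) ** 2 for y in labels)

def best_split(values, labels):
    """Return (threshold, error) with the lowest total squared error."""
    best = (None, squared_error(labels))         # baseline: no split at all
    for t in sorted(set(values))[:-1]:           # candidate thresholds
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        err = squared_error(left) + squared_error(right)
        if err < best[1]:
            best = (t, err)
    return best

feature = [1.0, 2.0, 8.0, 9.0]   # hypothetical feature values of disks A-D
label = [0, 0, 1, 1]             # 0 = normal disk, 1 = failed disk
threshold, error = best_split(feature, label)
```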

In an optional embodiment, the sample disks are disks A, B, C, and D, where disks A and B are normal disks and disks C and D are failed disks. In one example, a normal disk corresponds to 0 and a failed disk to 1, so the four disks A, B, C, and D correspond to 0, 0, 1, and 1, respectively. The feature value of each disk in the first dimension, feature A, is obtained, and the sample disk data is trained by using the GBDT algorithm. FIG. 3 is a schematic diagram of training the sample disk data by using the GBDT algorithm according to an embodiment of the present invention. As shown in FIG. 3, the default initial value is set to 0.5, that is, the probability that each disk is a failed disk is 0.5. The threshold of the first dimension is A0. Disks whose feature value in the first dimension is greater than A0 are assigned to one child node, and disks whose feature value is less than or equal to A0 are assigned to the other child node; the probability that a disk in either child node is a failed disk is 0.5.

It should be noted that, for convenience of description, the above embodiment uses only four samples, so only two leaf nodes are obtained. In practical applications, after the root node has been divided into two leaf nodes, the division can continue; the more sample data there is, the more levels of division there are.

According to the above embodiment of the present application, in a preferred solution, extracting multiple feature data from the training data, creating a decision tree with each feature data as a root node, and using the feature value corresponding to each feature data as a leaf node of the corresponding decision tree includes:

Step S2331, reading the threshold corresponding to any one of the feature data.

Step S2333, comparing the feature value of that feature data with the threshold, and obtaining the entropy of the two branches according to the comparison result.

Step S2335, determining two new nodes as the two leaf nodes of that feature data according to the entropy of the two branches.

Step S2337, processing each feature data with the above steps until each feature data has obtained its predetermined two unique leaf nodes.

In the above steps, every threshold of every feature is tried exhaustively to find the feature and threshold for which the two branches (feature less than or equal to the threshold, and feature greater than the threshold) have the minimum entropy. Branching on that criterion yields two new nodes. The same method is used to continue branching until all samples have been split into leaf nodes that contain only normal disks or only failed disks, or until a preset termination condition is reached. If a final leaf node contains both normal and failed disks, the average tag value of all samples on that node is used as the predicted tag value of the leaf node.

It should be noted here that the tag value is the probability that the disk is a failed disk.

It should be noted here that minimizing the entropy means making the ratio of positive to negative samples in each branch as far from 1:1 as possible. The entropy is at its minimum when a branch contains only positive samples or only negative samples, that is, only normal disks or only failed disks.
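The exhaustive threshold search by entropy can be sketched for a single feature as follows. Weighting the entropy of each branch by its number of samples is an assumption of this sketch; the text only says that the entropy of the two branches is minimized. The function names and sample values are illustrative.

```python
import math

def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):          # only normal disks or only failed disks
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def min_entropy_threshold(values, labels):
    """Try every threshold of one feature; return (threshold, entropy) for
    the lowest size-weighted entropy of the branches <= t and > t."""
    n = len(labels)
    best_t, best_h = None, float("inf")
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if h < best_h:
            best_t, best_h = t, h
    return best_t, best_h

# Hypothetical feature values for disks A-D; labels as in the text.
t_best, h_best = min_entropy_threshold([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])
```

A branch with a 1:1 mix of normal and failed disks has the maximum entropy of 1 bit, while a pure branch has entropy 0, which is the case the text describes as the minimum.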

In an optional embodiment, where the decision tree is a regression tree, each node obtains a predicted value equal to the average of the tag values of all samples belonging to that node. When a node is divided, every threshold of every feature is tried exhaustively to find the best split point, until the tag value of the samples on each leaf node is unique or a preset termination condition is reached. If the tag value of the samples on a final leaf node is not unique, the average tag value of all samples on that node is used as the predicted tag value of the leaf node.

It should be noted here that, in the above embodiment, the criterion for the best split is no longer minimizing the entropy but minimizing the mean square error: the difference between the label value of each sample and its predicted label value is squared, and the squares of all the differences are summed. The more samples are predicted incorrectly, the larger the mean square error, so the best basis for branching can be found by minimizing the mean square error.

It should also be noted here that, in practice, it is difficult to reach a state in which the label value of the samples on every leaf node is unique, so a termination condition can be preset in order to obtain the prediction result closest to the real situation. The termination condition can be an upper limit on the number of leaves.
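The regression-tree procedure described above can be sketched as a small recursive function. For brevity this sketch uses a maximum depth as the preset termination condition instead of an upper limit on the number of leaves; that substitution, the dictionary layout of the tree, and the sample data are all assumptions made for illustration.

```python
def _sse(ys):
    """Sum of squared differences from the mean tag value."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a regression tree; each leaf stores the average tag value."""
    mean = sum(labels) / len(labels)
    if depth >= max_depth or len(set(labels)) == 1:
        return {"leaf": mean}                     # termination or unique tag value
    best = None
    for f in range(len(rows[0])):                 # every feature
        for t in sorted({r[f] for r in rows})[:-1]:   # every threshold
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            err = sum(_sse([labels[i] for i in part]) for part in (left, right))
            if best is None or err < best[0]:
                best = (err, f, t, left, right)
    if best is None:                              # no feature can be split
        return {"leaf": mean}
    _, f, t, left, right = best
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return its average tag value."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

rows = [[1, 5], [2, 7], [8, 6], [9, 4]]   # hypothetical features of disks A-D
labels = [0, 0, 1, 1]                     # 0 = normal, 1 = failed
tree = build_tree(rows, labels)
```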

According to the above embodiment of the present application, in a preferred solution, after the disk prediction model composed of a plurality of decision trees is obtained, the method further comprises: adjusting the classification model parameters, where the classification model parameters include the proportion of failed-disk samples and non-failed-disk samples. In this case, when determining whether the disk to be tested is a failed disk, the proportion of failed-disk samples in the classification model parameters is increased.

According to the above embodiment of the present application, in a preferred solution, processing the disk data of the disk to be tested by using the disk prediction model composed of a plurality of decision trees to determine whether the disk to be tested is a failed disk includes:

Step S251: After receiving the disk data of the disk to be tested, assign an initial value to the disk data of the disk to be tested.

Step S253, traversing each decision tree starting from the initial value of the disk to be tested: calculating the prediction result determined by the first decision tree and a first residual, and assigning the first residual to the initial value to obtain an updated initial value.

Step S255, using the updated initial value to calculate the prediction result determined by the second decision tree and a second residual, and assigning the second residual to the updated initial value, and so on until all the decision trees have been traversed, to obtain the result of predicting whether the disk to be tested is a failed disk.

Step S257, each tree learns the residual of the conclusions of all previous trees, where the residual is the amount that, when added to the predicted value, gives the true value.

In an optional embodiment, the four disks A, B, C, and D are again taken as an example. Using feature A, the four disks can be divided into two parts, A, B and C, D, and each part uses its average tag value as the predicted value. The residual is then calculated, where the residual is the difference between the predicted value of a disk and its actual value. The residual of A is thus 1-0.5=0.5, and the residuals of A, B, C, and D are 0.5, -0.5, 0.5, and -0.5, respectively. Next, as shown in FIG. 4, which is a schematic diagram of calculating a predicted value of a disk using the GBDT algorithm according to an embodiment of the present invention, the residuals replace the original values of A, B, C, and D and are input to the second decision tree for training, which divides the disks into two leaf nodes according to the comparison with feature B. If the predicted values are equal to the residuals, simply adding the conclusion of the second tree to that of the first tree gives the actual value of the disk. The second tree sees only the two values 0.5 and -0.5, so it splits directly into two nodes. At this point every disk's residual is 0, that is, every disk obtains its true value as the prediction.

It should be noted here that the above embodiment is for the purpose of explanation, so there are only two decision trees. In practical applications, the number of decision trees can be determined according to the amount of sample data, and the predicted value refers to the sum of the conclusions of all the preceding trees. Since in this embodiment only one decision tree precedes the second one, the predicted value is directly 0.5. If there are other preceding decision trees, their conclusions need to be added together as the predicted value of A.
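The arithmetic of the two-tree example above can be reproduced with a short Python sketch. The label values (1 for A and C, 0 for B and D) are an assumption chosen to match the residuals quoted in the text, and the leaf memberships are written out by hand rather than learned from feature values.

```python
# Assumed labels, consistent with the quoted residuals: 1 = failed, 0 = healthy.
labels = {"A": 1.0, "B": 0.0, "C": 1.0, "D": 0.0}

# Hypothetical leaf memberships of the two trees.
tree1_leaves = [["A", "B"], ["C", "D"]]  # split by the first feature
tree2_leaves = [["A", "C"], ["B", "D"]]  # split by the second feature


def fit_leaf_values(leaves, targets):
    """Give every disk the mean target value of the leaf it falls into."""
    values = {}
    for leaf in leaves:
        mean = sum(targets[disk] for disk in leaf) / len(leaf)
        for disk in leaf:
            values[disk] = mean
    return values


# First tree: fit the labels, then compute residual = actual - predicted.
pred1 = fit_leaf_values(tree1_leaves, labels)
residual1 = {disk: labels[disk] - pred1[disk] for disk in labels}

# Second tree: fit the residuals of the first tree.
pred2 = fit_leaf_values(tree2_leaves, residual1)

# The final prediction is the sum of the conclusions of both trees.
final = {disk: pred1[disk] + pred2[disk] for disk in labels}
residual2 = {disk: labels[disk] - final[disk] for disk in labels}
```

After the second tree, `final` equals the assumed labels and every entry of `residual2` is zero, which matches the statement that the residual of every disk is 0.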

FIG. 5 is a flowchart of an optional fault prediction method for a magnetic disk according to an embodiment of the present invention. A preferred embodiment of the present application is described in detail below with reference to FIG. 5.

As shown in FIG. 5, a method for predicting a fault of a magnetic disk is provided. The method may include the following steps S51 to S57:

S51. Obtain sample data of the sample disk.

Specifically, in the above steps, the sample disk data can be obtained by using software such as HDTune or CrystalDiskInfo.

S52, performing a difference operation on the sample data.

Specifically, in the above step, the difference operation refers to calculating the difference between the feature data of the disk at a certain time and the feature data of the disk 24 hours earlier.

S53, performing a distribution summation and/or a square operation on the result obtained by the difference operation.
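Steps S52 and S53 can be sketched in Python as follows, assuming that each feature is stored as a mapping from timestamp to value. The function name, the sample readings, and the exact-lag lookup are hypothetical. The distribution summation is left out because the text does not define it.

```python
from datetime import datetime, timedelta


def difference_features(readings, lag=timedelta(hours=24)):
    """Map each timestamp to (difference, squared difference) against the
    reading taken exactly `lag` earlier; timestamps without one are skipped."""
    expanded = {}
    for timestamp, value in readings.items():
        earlier = readings.get(timestamp - lag)
        if earlier is None:
            continue
        diff = value - earlier
        expanded[timestamp] = (diff, diff * diff)
    return expanded


# Hypothetical raw values of one SMART attribute, sampled once a day.
day0 = datetime(2016, 1, 1)
readings = {
    day0: 10,
    day0 + timedelta(days=1): 12,
    day0 + timedelta(days=2): 17,
}
expanded = difference_features(readings)
```

The first reading has no counterpart 24 hours earlier, so only the second and third timestamps receive expanded features.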

S54, obtaining training and prediction data.

S55, performing the first step of training and prediction, which increases the recall rate.

S56, performing the second step of training and prediction, which balances the recall rate and the accuracy.

Specifically, in the above steps, the proportion of negative samples in the training data is large and the proportion of positive samples is small, for example, a ratio of 1000:1. If all the training data is used for training, few positive samples can be predicted accurately. Moreover, because there are few positive samples in the training data, many samples whose true value is negative may be misjudged as positive. Therefore, the first step raises the recall rate of positive samples during training. The training data predicted as positive in the first step is then used as the training data of the second step; that is, the samples that are close to the positive samples are selected as training samples, so that the trained model predicts positive samples better. As a result, the accuracy for positive samples in the second-step prediction is greatly improved over the first step, so that the accuracy and the recall rate reach a certain balance.
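The two-step scheme can be illustrated with a minimal Python sketch. Single-feature threshold rules stand in for the GBDT models, the toy data is invented, and "accuracy" is read here as precision; all three are assumptions made for illustration only.

```python
def precision_recall(samples, predict):
    """Return (precision, recall) of `predict` on (features, label) pairs."""
    tp = sum(1 for x, y in samples if y == 1 and predict(x))
    fp = sum(1 for x, y in samples if y == 0 and predict(x))
    fn = sum(1 for x, y in samples if y == 1 and not predict(x))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def fit_high_recall(samples, feature, min_recall):
    """Step 1: largest threshold whose recall is still >= min_recall."""
    best = None
    for t in sorted({x[feature] for x, _ in samples}):
        _, recall = precision_recall(samples, lambda x: x[feature] >= t)
        if recall >= min_recall:
            best = t
    return best


def fit_best_f1(samples, feature):
    """Step 2: threshold with the best F1 (balance of precision and recall)."""
    best_t, best_f1 = None, -1.0
    for t in sorted({x[feature] for x, _ in samples}):
        p, r = precision_recall(samples, lambda x: x[feature] >= t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


# Hypothetical (feature vector, label) pairs; label 1 marks a failed disk.
data = [((5, 9), 1), ((6, 8), 1), ((1, 1), 0), ((2, 7), 0),
        ((5, 2), 0), ((7, 3), 0), ((1, 9), 0), ((0, 0), 0)]

t1 = fit_high_recall(data, feature=0, min_recall=1.0)
stage1 = lambda x: x[0] >= t1
# Only the samples predicted positive in step 1 are used to train step 2.
survivors = [(x, y) for x, y in data if stage1(x)]
t2 = fit_best_f1(survivors, feature=1)
stage2 = lambda x: stage1(x) and x[1] >= t2

stage1_scores = precision_recall(data, stage1)
final_scores = precision_recall(data, stage2)
```

On this toy data the first step keeps every failed disk at the cost of false alarms, and the second step removes the false alarms among the survivors without losing a failed disk.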

Example 2

According to an embodiment of the present invention, there is also provided a fault prediction apparatus for a magnetic disk for implementing the above fault prediction method for a magnetic disk. FIG. 6 is a schematic structural diagram of a fault prediction apparatus for a magnetic disk according to an embodiment of the present invention. As shown in FIG. 6, the apparatus includes an acquisition module 60, a training module 62, and a processing module 64.

The acquisition module 60 is configured to acquire sample disk data of the disk by using a disk monitoring technology, where the sample disk data includes sample data in multiple dimensions;

The training module 62 is configured to perform sample training on the sample disk data by using a GBDT algorithm to obtain a disk prediction model composed of multiple decision trees;

The processing module 64 is configured to, after receiving the disk data of the disk to be tested, process the disk data of the disk to be tested by using the disk prediction model composed of multiple decision trees, and determine whether the disk to be tested is a failed disk.

It should be noted that the above-mentioned acquisition module 60, training module 62, and processing module 64 correspond to steps S21 to S25 in the first embodiment. The examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the content disclosed in the first embodiment. It should be noted that the above modules can run in the computer terminal 10 provided in the first embodiment as part of the apparatus.

According to the above embodiment of the present application, in a preferred solution, the sample disk data is SMART disk data, wherein the sample disk data includes at least sample data in the following four dimensions: original value, standard value, worst value, and cumulative value.

According to the above embodiment of the present application, in a preferred solution, as shown in FIG. 7, the device further includes:

The operation module 70 is configured to perform any one or more of the following operations on the sample data in each dimension: a difference operation, a square operation, and a distribution sum operation, so that sample data in any one dimension is expanded into sample data in a new dimension.

It should be noted that the examples and application scenarios implemented by the foregoing operation module 70 are the same as those of steps S21 to S25 in the first embodiment, but are not limited to the content disclosed in the first embodiment. It should be noted that the above module can run in the computer terminal 10 provided in the first embodiment as part of the apparatus.

According to the above embodiment of the present application, in a preferred solution, as shown in FIG. 8, the training module 62 further includes:

The initial module 80 is configured to use sample disk data of all disks as training data, and initialize a classification model parameter of the training data by using a default value;

The extraction module 82 is configured to extract a plurality of feature data in the training data, create the plurality of decision trees with each feature data as a root node, and use the feature value corresponding to each feature data as a leaf node of the corresponding decision tree;

The first calculation module 84 is configured to calculate an optimal partition of all current leaf nodes and the gain thereof, and split the leaf node with the largest gain at the corresponding split point, so that the sample disk data is divided into the child nodes.

It should be noted that the initial module 80, the extraction module 82, and the first calculation module 84 correspond to steps S231 to S235 in the first embodiment. The examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the content disclosed in the first embodiment. It should be noted that the above modules can run in the computer terminal 10 provided in the first embodiment as part of the apparatus.
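The behaviour attributed to the first calculation module, namely finding the best split of every current leaf and then splitting only the leaf with the largest gain, can be sketched in Python as follows. The gain is taken to be the reduction in squared error, which is an assumption because this passage does not define the gain; the sample leaves and function names are likewise hypothetical.

```python
def sse(labels):
    """Sum of squared errors of the labels around their mean."""
    if not labels:
        return 0.0
    mean = sum(labels) / len(labels)
    return sum((y - mean) ** 2 for y in labels)


def best_split(leaf):
    """Return (gain, threshold) of the best split of one leaf.

    `leaf` is a list of (feature_value, label) pairs; the threshold is None
    when no split improves on the unsplit leaf."""
    best_gain, best_t = 0.0, None
    parent = sse([y for _, y in leaf])
    for t in sorted({x for x, _ in leaf}):
        left = [y for x, y in leaf if x < t]
        right = [y for x, y in leaf if x >= t]
        if not left or not right:
            continue
        gain = parent - sse(left) - sse(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t


def split_best_leaf(leaves):
    """Split only the leaf whose best split has the largest gain."""
    idx = max(range(len(leaves)), key=lambda i: best_split(leaves[i])[0])
    _, t = best_split(leaves[idx])
    if t is None:
        return leaves
    left = [s for s in leaves[idx] if s[0] < t]
    right = [s for s in leaves[idx] if s[0] >= t]
    return leaves[:idx] + [left, right] + leaves[idx + 1:]


# Two hypothetical leaves of (feature value, label) pairs.
leaves = [
    [(1, 0), (2, 0), (3, 0), (4, 1)],
    [(1, 0), (9, 1), (10, 1), (2, 0)],
]
new_leaves = split_best_leaf(leaves)
```

In this example the second leaf has the larger gain, so it is the one replaced by its two child nodes while the first leaf is left unchanged.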

According to the above embodiment of the present application, in a preferred solution, as shown in FIG. 9, the extraction module 82 includes:

The reading module 90 is configured to read a threshold corresponding to any one of the feature data;

The comparing module 92 is configured to compare the feature value of that feature data with the threshold, and obtain the entropy of the two branches according to the comparison result;

The determining module 94 is configured to determine, according to the entropy of the two branches, two new nodes as two leaf nodes of that feature data;

The processing sub-module 96 is configured to process each feature data by using the above steps until each feature data obtains two predetermined unique leaf nodes.

It should be noted that the foregoing reading module 90, comparing module 92, determining module 94, and processing sub-module 96 correspond to steps S2331 to S2337 in the first embodiment. The examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the content disclosed in the first embodiment. It should be noted that the above modules can run in the computer terminal 10 provided in the first embodiment as part of the apparatus.
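The comparison with a threshold and the entropy of the two resulting branches can be sketched in Python as follows. Shannon entropy in bits, the rule of sending values below the threshold to the first branch, and the sample data are all assumptions, since the text does not specify them.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        h -= p * math.log2(p)
    return h


def branch_entropies(samples, threshold):
    """Compare each feature value with the threshold and return the entropy
    of the labels in the two resulting branches."""
    below = [y for x, y in samples if x < threshold]
    above = [y for x, y in samples if x >= threshold]
    return entropy(below), entropy(above)


# Hypothetical (feature value, label) pairs; label 1 marks a failed disk.
samples = [(1, 0), (2, 0), (8, 1), (9, 0)]
h_below, h_above = branch_entropies(samples, threshold=5)
```

A branch containing only one class has zero entropy, while an evenly mixed branch has an entropy of one bit, so a lower value indicates a purer leaf node.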

According to the above embodiment of the present application, in a preferred solution, after obtaining a disk prediction model composed of a plurality of decision trees, the method further comprises: adjusting the classification model parameters, wherein, in the case where the classification model parameters include failed disk samples and non-failed disk samples, if it is to be determined whether the disk to be tested is a failed disk, the proportion of the failed disk samples in the classification model parameters is increased.
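The text does not state how the proportion of failed disk samples is increased. One possible approach, shown here purely as an assumption, is to replicate the failed disk samples until they reach a target share of the training data. The function name and the `target_ratio` parameter are hypothetical.

```python
import math


def oversample_failed(samples, target_ratio):
    """Replicate failed-disk samples (label 1) until they make up at least
    `target_ratio` of the training data; healthy samples are kept once."""
    if not 0 < target_ratio < 1:
        raise ValueError("target_ratio must be between 0 and 1")
    failed = [s for s in samples if s[1] == 1]
    healthy = [s for s in samples if s[1] != 1]
    if not failed:
        return list(samples)
    # Smallest whole number of copies k with k*f / (k*f + h) >= target_ratio.
    copies = math.ceil(
        target_ratio * len(healthy) / ((1 - target_ratio) * len(failed))
    )
    return healthy + failed * max(1, copies)


# Hypothetical data: six healthy disks and two failed disks.
samples = [((i,), 0) for i in range(6)] + [((9,), 1), ((8,), 1)]
balanced = oversample_failed(samples, target_ratio=0.5)
failed_share = sum(1 for _, y in balanced if y == 1) / len(balanced)
```

Assigning a larger weight to the failed disk samples during training would be an alternative with a similar effect; the choice between the two is not addressed by the text.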

According to the above embodiment of the present application, in a preferred solution, as shown in FIG. 10, the processing module 64 includes:

The receiving module 100 is configured to: after receiving the disk data of the disk to be tested, assign an initial value to the disk data of the disk to be tested;

The second calculating module 102 is configured to traverse each decision tree according to the initial value of the disk to be tested, calculate a prediction result and a first residual determined by the first decision tree, and assign the first residual to the initial value to obtain an updated initial value;

The traversing module 104 is configured to calculate, by using the updated initial value, a prediction result and a second residual determined by the second decision tree, and assign the second residual to the updated initial value, and so on until all the decision trees have been traversed, to obtain a result of predicting whether the disk to be tested is a failed disk.

It should be noted that the receiving module 100, the second calculating module 102, and the traversing module 104 correspond to steps S251 to S255 in the first embodiment. The examples and application scenarios implemented by these modules are the same as those of the corresponding steps, but are not limited to the content disclosed in the first embodiment. It should be noted that the above modules can run in the computer terminal 10 provided in the first embodiment as part of the apparatus.

Example 3

Embodiments of the present invention may provide a computer terminal, which may be any one of computer terminal groups. Optionally, in this embodiment, the foregoing computer terminal may also be replaced with a terminal device such as a mobile terminal.

Optionally, in this embodiment, the computer terminal may be located in at least one network device of the plurality of network devices of the computer network.

In this embodiment, the computer terminal may execute the program code of the following steps in the fault prediction method of the disk: acquiring sample disk data of the disk by using a disk monitoring technology, wherein the sample disk data includes sample data in multiple dimensions; performing sample training on the sample disk data by using the GBDT algorithm to obtain a disk prediction model composed of multiple decision trees; and, after receiving the disk data of the disk to be tested, processing the disk data of the disk to be tested by using the disk prediction model composed of multiple decision trees to determine whether the disk to be tested is a failed disk.

Optionally, FIG. 11 is a structural block diagram of a computer terminal according to an embodiment of the present invention. As shown in FIG. 11, the computer terminal A may include one or more (only one shown in the figure) processor 111, memory 113, and transmission device 115.

The memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the fault prediction method and apparatus for a disk in the embodiments of the present invention. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, implements the above-described fault prediction method for a disk. The memory may include a high speed random access memory, and may also include non-volatile memory such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory. In some examples, the memory can further include memory remotely located relative to the processor, which can be connected to terminal A via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The processor may call, through the transmission device, the information and the application stored in the memory to perform the following step: the sample disk data is SMART disk data, wherein the sample disk data includes at least sample data in the following four dimensions: original value, standard value, worst value, and cumulative value.

Optionally, the foregoing processor may further execute the following program code: perform any one or more of the following operations on the sample data in each dimension: a difference operation, a square operation, and a distribution sum operation, so that the sample data in any one dimension is expanded into sample data in a new dimension.

Optionally, the foregoing processor may further execute the following program code: use sample disk data of all disks as training data, and initialize a classification model parameter of the training data by using a default value; extract multiple feature data in the training data, create multiple decision trees with each feature data as a root node, and use the feature value corresponding to each feature data as a leaf node of the corresponding decision tree; calculate the optimal partition of all current leaf nodes and its gain, and split the leaf node with the largest gain at the corresponding split point, so that the sample disk data is divided into the child nodes.

Optionally, the processor may further execute the following program code: read a threshold corresponding to any one of the feature data; compare the feature value of that feature data with the threshold, and obtain the entropy of the two branches according to the comparison result; determine two new nodes as two leaf nodes of that feature data according to the entropy of the two branches; process each feature data by using the above steps until each feature data obtains two predetermined unique leaf nodes.

Optionally, the foregoing processor may further execute the following program code: after obtaining the disk prediction model composed of multiple decision trees, adjust the classification model parameters, wherein, in the case where the classification model parameters include failed disk samples and non-failed disk samples, if it is to be determined whether the disk to be tested is a failed disk, the proportion of the failed disk samples in the classification model parameters is increased.

Optionally, the foregoing processor may further execute the following program code: after receiving the disk data of the disk to be tested, assign an initial value to the disk data of the disk to be tested; traverse each decision tree according to the initial value of the disk to be tested, calculate the prediction result and the first residual determined by the first decision tree, and assign the first residual to the initial value to obtain the updated initial value; calculate, by using the updated initial value, the prediction result and the second residual determined by the second decision tree, and assign the second residual to the updated initial value, and so on until all the decision trees have been traversed, to obtain a result of predicting whether the disk to be tested is a failed disk.

A person skilled in the art can understand that the structure shown in FIG. 11 is only an illustration, and the computer terminal can also be a smart phone (such as an Android mobile phone, an iOS mobile phone, etc.), a tablet computer, a palmtop computer, a mobile Internet device (MID), a PAD, or another terminal device. FIG. 11 does not limit the structure of the above electronic device. For example, computer terminal A may also include more or fewer components (such as a network interface, display device, etc.) than shown in FIG. 11, or have a different configuration than that shown in FIG. 11.

A person of ordinary skill in the art may understand that all or part of the steps of the foregoing embodiments may be completed by a program instructing hardware related to the terminal device. The program may be stored in a computer readable storage medium, and the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Example 4

Embodiments of the present invention also provide a storage medium. Optionally, in the embodiment, the foregoing storage medium may be used to save the program code executed by the fault prediction method of the disk provided in the first embodiment.

Optionally, in this embodiment, the foregoing storage medium may be located in any one of the computer terminal groups in the computer network, or in any one of the mobile terminal groups.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring sample disk data of the disk by using a disk monitoring technology, wherein the sample disk data includes sample data in multiple dimensions; performing sample training on the sample disk data by using the GBDT algorithm to obtain a disk prediction model composed of multiple decision trees; and, after receiving the disk data of the disk to be tested, processing the disk data of the disk to be tested by using the disk prediction model composed of multiple decision trees to determine whether the disk to be tested is a failed disk.

Optionally, the storage medium is further configured to store program code for performing the following steps: performing any one or more of the following operations on the sample data in each dimension: a difference operation, a square operation, and a distribution sum operation, so that the sample data in any one dimension is expanded into sample data in a new dimension.

Optionally, the storage medium is further configured to store program code for performing the following steps: using the sample disk data of all disks as training data, and initializing the classification model parameters of the training data by default values; extracting multiple feature data in the training data, creating multiple decision trees with each feature data as a root node, and using the feature value corresponding to each feature data as a leaf node of the corresponding decision tree; calculating the optimal partition of all current leaf nodes and its gain, and splitting the leaf node with the largest gain at the corresponding split point, so that the sample disk data is divided into the child nodes.

Optionally, the storage medium is further configured to store program code for performing the following steps: reading a threshold corresponding to any one of the feature data; comparing the feature value of that feature data with the threshold, and obtaining the entropy of the two branches according to the comparison result; determining two new nodes as two leaf nodes of that feature data according to the entropy of the two branches; processing each feature data by using the above steps until each feature data obtains two predetermined unique leaf nodes.

Optionally, the storage medium is further configured to store program code for performing the following steps: after obtaining a disk prediction model composed of a plurality of decision trees, adjusting the classification model parameters, wherein, in the case where the classification model parameters include failed disk samples and non-failed disk samples, if it is to be determined whether the disk to be tested is a failed disk, the proportion of the failed disk samples in the classification model parameters is increased.

Optionally, the foregoing storage medium is further configured to store program code for performing the following steps: after receiving the disk data of the disk to be tested, assigning an initial value to the disk data of the disk to be tested; traversing each decision tree according to the initial value of the disk to be tested, calculating the prediction result and the first residual determined by the first decision tree, and assigning the first residual to the initial value to obtain the updated initial value; calculating, by using the updated initial value, the prediction result and the second residual determined by the second decision tree, and assigning the second residual to the updated initial value, and so on until all the decision trees have been traversed, to obtain a result of predicting whether the disk to be tested is a failed disk.

The serial numbers of the embodiments of the present invention are merely for the description, and do not represent the advantages and disadvantages of the embodiments.

In the above-mentioned embodiments of the present invention, the description of each embodiment has its own emphasis, and for parts that are not detailed in a certain embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided by the present application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division; in actual implementation, there may be another division manner. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit or module, and may be electrical or otherwise.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions to cause a computer device (which may be a personal computer, server or network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present invention. The foregoing storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and the like.

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can also make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements should also be considered as falling within the scope of protection of the present invention.

Claims

A method for predicting a fault of a disk, comprising:

Obtaining sample disk data of the disk by using a disk monitoring technology, wherein the sample disk data includes sample data in multiple dimensions;

Performing sample training on the sample disk data by using the GBDT algorithm to obtain a disk prediction model composed of multiple decision trees;

After receiving the disk data of the disk to be tested, the disk data of the disk to be tested is processed by using the disk prediction model consisting of multiple decision trees to determine whether the disk to be tested is a faulty disk.
The method according to claim 1, wherein the sample disk data is SMART disk data, and the sample disk data includes at least sample data in the following four dimensions: original value, standard value, worst value, and cumulative value.
The method according to claim 2, wherein after obtaining the sample disk data of the disk by using the disk monitoring technology, the method further comprises:

Performing any one or more of the following operations on the sample data in each dimension: a difference operation, a square operation, and a distribution sum operation, so that the sample data in any one dimension is expanded into sample data in a new dimension.
The method according to any one of claims 1 to 3, wherein performing sample training on the sample disk data by using the GBDT algorithm to obtain a disk prediction model composed of multiple decision trees comprises:

Taking sample disk data of all disks as training data, and initializing the classification model parameters of the training data by default values;

Extracting a plurality of feature data in the training data, creating the plurality of decision trees with each feature data as a root node, and using the feature value corresponding to each feature data as a leaf node of the corresponding decision tree;

Calculating the optimal partition of all current leaf nodes and its gain, and splitting the leaf node with the largest gain at the corresponding split point, so that the sample disk data is divided into the child nodes.
The method according to claim 4, wherein extracting a plurality of feature data in the training data, creating the plurality of decision trees with each feature data as a root node, and using the feature value corresponding to each feature data as a leaf node of the corresponding decision tree comprises:

Reading a threshold corresponding to any one of the feature data;

Comparing the feature value of any one of the feature data with the threshold, and obtaining the entropy of the two branches according to the comparison result;

Determining two new nodes as two leaf nodes of that feature data according to the entropy of the two branches;

Processing each feature data by using the above steps until each feature data obtains two predetermined unique leaf nodes.
The method according to claim 4, wherein after obtaining the disk prediction model composed of multiple decision trees, the method further comprises: adjusting the classification model parameters, wherein, in the case where the classification model parameters include failed disk samples and non-failed disk samples, if it is to be determined whether the disk to be tested is a failed disk, the proportion of the failed disk samples in the classification model parameters is increased.
The method according to claim 1, wherein processing the disk data of the disk to be tested by using the disk prediction model composed of multiple decision trees and determining whether the disk to be tested is a failed disk comprises:

After receiving the disk data of the disk to be tested, assigning an initial value to the disk data of the disk to be tested;

Traversing each decision tree according to the initial value of the disk to be tested, calculating a prediction result and a first residual determined by the first decision tree, and assigning the first residual to the initial value to obtain an updated initial value;

Calculating, by using the updated initial value, a prediction result and a second residual determined by the second decision tree, and assigning the second residual to the updated initial value, and so on until all the decision trees have been traversed, to obtain a result of predicting whether the disk to be tested is a failed disk.
A fault prediction device for a magnetic disk, comprising:

An obtaining module, configured to acquire sample disk data of a disk by using a disk monitoring technology, where the sample disk data includes sample data in multiple dimensions;

a training module, configured to perform sample training on the sample disk data by using a GBDT algorithm, to obtain a disk prediction model composed of multiple decision trees;

A processing module, configured to, after receiving the disk data of the disk to be tested, process the disk data of the disk to be tested by using the disk prediction model composed of multiple decision trees, and determine whether the disk to be tested is a failed disk.
The apparatus according to claim 8, wherein the sample disk data is SMART disk data, and the sample disk data includes at least sample data in the following four dimensions: original value, standard value, worst value, and cumulative value.
The device according to claim 9, wherein the device further comprises:

An operation module, configured to perform any one or more of the following operations on the sample data in each dimension: a difference operation, a square operation, and a distribution sum operation, so that the sample data in any one dimension is expanded into sample data in a new dimension.
The apparatus according to any one of claims 8 to 10, wherein the training module further comprises:

An initial module, configured to use sample disk data of all disks as training data, and initialize a classification model parameter of the training data by using a default value;

An extraction module, configured to extract a plurality of feature data in the training data, create the plurality of decision trees with each feature data as a root node, and use the feature value corresponding to each feature data as a leaf node of the corresponding decision tree;

A first calculation module, configured to calculate the optimal partition of all current leaf nodes and its gain, and split the leaf node with the largest gain at the corresponding split point, so that the sample disk data is divided into the child nodes.
The apparatus according to claim 11, wherein the extraction module comprises:

a reading module, configured to read a threshold corresponding to any one of the feature data;

a comparison module, configured to compare the feature value of that feature data with the threshold, and obtain the entropy of the two branches according to the comparison result;

a determining module, configured to determine, according to the entropy of the two branches, two new nodes as two leaf nodes of that feature data;

a processing submodule, configured to process each feature data by using the above steps until each feature data obtains two predetermined unique leaf nodes.
The apparatus according to claim 11, wherein after the disk prediction model composed of multiple decision trees is obtained, the apparatus is further configured to adjust the classification model parameters, wherein, in the case where the classification model parameters include failed disk samples and non-failed disk samples, if it is to be determined whether the disk to be tested is a failed disk, the proportion of the failed disk samples in the classification model parameters is increased.
The device according to claim 8, wherein the processing module comprises:

The receiving module is configured to: after receiving the disk data of the disk to be tested, assign an initial value to the disk data of the disk to be tested;

a second calculating module, configured to traverse each decision tree according to the initial value of the disk to be tested, calculate a prediction result and a first residual determined by the first decision tree, and assign the first residual to the initial value to obtain an updated initial value;

a traversing module, configured to calculate, by using the updated initial value, a prediction result and a second residual determined by the second decision tree, and assign the second residual to the updated initial value, and so on until all the decision trees have been traversed, to obtain a result of predicting whether the disk to be tested is a failed disk.