CN114064366A

CN114064366A - Fault prediction method, device, equipment and storage medium

Info

Publication number: CN114064366A
Application number: CN202010780193.2A
Authority: CN
Inventors: 梁双春; 张安兵; 纪春芳; 张子豪; 刘童桐
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date: 2022-02-18

Abstract

The application discloses a fault prediction method, apparatus, device and storage medium. The method comprises: determining the mutation point position corresponding to each mutated SMART attribute value among at least one Self-Monitoring, Analysis and Reporting Technology (SMART) attribute value of a hard disk to be detected; determining the median of the mutation point positions corresponding to the mutated SMART attribute values; constructing at least one type of characteristic value based on that median, the at least one type of characteristic value characterizing the degree of mutation of the mutated SMART attribute values; and determining a fault prediction result for the hard disk to be detected based on the constructed characteristic values in combination with a first model, the first model being obtained by training with each characteristic value included in the at least one type of characteristic value as training sample data.

Description

Fault prediction method, device, equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for fault prediction.

Background

At present, hard disk failures account for over 70% of all server component failures. If hard disk failures can be predicted, the pressure on data operation and maintenance is greatly reduced and data migration can be carried out in good time, thereby reducing operating costs.

In the related art, the commonly adopted hard disk failure prediction method cleans periodically acquired hard disk Self-Monitoring, Analysis and Reporting Technology (SMART) data and then directly applies clustering, random forest, logistic regression or similar methods for failure analysis. Because the characteristics of SMART data are not taken into account, this approach suffers from low prediction accuracy and poor generality.

Disclosure of Invention

In order to solve technical problems in the related art, embodiments of the present application provide a failure prediction method, apparatus, device, and storage medium.

The technical scheme of the embodiment of the application is realized as follows:

the embodiment of the application provides a fault prediction method, which comprises the following steps:

determining a mutation point position corresponding to a SMART attribute value which is mutated in at least one SMART attribute value of a hard disk to be detected;

determining the median of the position of the mutation point corresponding to the SMART attribute value with mutation;

constructing at least one type of characteristic value based on the median value of the mutation point position corresponding to the SMART attribute value with mutation; the at least one type of characteristic value is used for characterizing the mutation degree of the SMART attribute value with mutation;

determining a fault prediction result of the hard disk to be detected based on the constructed at least one type of characteristic value and by combining a first model; the first model is obtained by training the training sample data by taking each feature value included in at least one type of feature value as the training sample data.

In the above scheme, the determining a mutation point position corresponding to a SMART attribute value that is mutated in at least one SMART attribute value of a hard disk to be detected includes:

determining historical failed hard disks and SMART attribute values of the historical failed hard disks based on historical sample data;

based on each SMART attribute value of the hard disk with the historical fault, detecting a mutation point corresponding to a SMART attribute value with a mutation in at least one SMART attribute value of the hard disk to be detected so as to determine the mutation point corresponding to the SMART attribute value with the mutation;

and determining the position of a mutation point corresponding to the SMART attribute value with mutation based on the length of the time sequence of the SMART attribute values of the hard disks with historical faults, the number of the hard disks with historical faults and the number of the SMART attribute values of the hard disks with historical faults.

In the above scheme, when at least one type of feature value is constructed based on a median of positions of mutation points corresponding to the SMART attribute values in which mutations occur, the method includes:

determining the dominant frequency of a Ricker (Ricker) wavelet based on the least common multiple of the median of the positions of the mutation points corresponding to the SMART attribute values with mutation;

determining a corresponding Ricker wavelet based on the dominant frequency of the Ricker wavelet;

and determining the number of Ricker wavelet peak values of each SMART attribute value time sequence with the sudden change based on the convolution result of each SMART attribute value time sequence with the Ricker wavelet.

grouping the SMART attribute value time sequence with mutation based on the least common multiple of the median value of the mutation point position corresponding to the SMART attribute value with mutation to obtain at least two data groups; the length of each data packet is less than or equal to the least common multiple;

for each data packet, determining a square of each data in the corresponding data packet; summing the determined squares of the data to obtain the square sum of the corresponding data packet;

the square sum occupancy of the respective data packet is determined based on the square sum of the respective data packet and the square sum of all data packets.

grouping the first interval in equal proportion based on the ratio of the least common multiple of the median value of the mutation point position corresponding to the mutated SMART attribute value to the length of the mutated SMART attribute value time sequence to obtain at least two subintervals;

intercepting data from the mutated SMART attribute value time sequence according to each subinterval to obtain the mutated SMART attribute value time subsequence corresponding to each subinterval;

carrying out mean value processing on the data in the mutated SMART attribute value time subsequence to obtain a mean value of a corresponding data packet;

and carrying out variance processing on the mean value of the data packet to obtain the variance of the corresponding data packet.

determining the length of a time sequence of the SMART attribute value with mutation and the result of integral division of the least common multiple of the median of the position of the mutation point corresponding to the SMART attribute value with mutation;

dividing the second interval in equal proportion based on the integer division result to obtain at least two equal division results;

and determining a data variance fluctuation mark based on the at least two equally dividing results and the time sequence of the SMART attribute value with mutation.

dividing the third interval in equal proportion based on the integer division result to obtain at least two equal division results;

and determining a data symmetry mark based on the at least two equally dividing results and the time sequence of the SMART attribute value with mutation.

performing linear regression processing on the SMART attribute value time sequence with mutation based on a least square estimation strategy to obtain a corresponding linear regression result;

and determining the overall linear trend characteristic of the data based on the linear regression result.

In the above scheme, the method further comprises:

determining the length of the time sequence of the SMART attribute values with mutation;

and optimizing the determined length of the time sequence of the SMART attribute value with the mutation to obtain the optimal length value of the time sequence of the SMART attribute value with the mutation.

In the foregoing scheme, the optimizing the determined length of the time series of the mutated SMART attribute values to obtain the optimal length of the time series of the mutated SMART attribute values includes:

performing interpolation processing on the determined length of the time sequence of the SMART attribute value with the mutation, and determining a fraction value of a fault prediction result of the hard disk to be detected;

and determining the length of the SMART attribute value time sequence corresponding to the maximum value of the fraction value of the fault prediction result of the hard disk to be detected as the length optimal value of the mutated SMART attribute value time sequence.

In the above scheme, the method further comprises: normalizing the at least one type of characteristic value to obtain at least one type of normalized characteristic value;

the determining the fault prediction result of the hard disk to be detected based on the constructed at least one type of characteristic value and by combining with the first model comprises the following steps:

inputting the normalized at least one type of characteristic value serving as input data into the first model to obtain a fault prediction result of the hard disk to be detected, which is output by the first model; wherein the first model is a model for classification.

An embodiment of the present application further provides a failure prediction apparatus, where the apparatus includes:

the first determining unit is used for determining a mutation point position corresponding to a SMART attribute value which is mutated in at least one SMART attribute value of the hard disk to be detected;

a second determining unit, configured to determine a median of positions of mutation points corresponding to the SMART attribute values in which mutations occur;

the construction unit is used for constructing at least one type of characteristic value based on the median value of the mutation point position corresponding to the SMART attribute value with mutation; the at least one type of characteristic value is used for characterizing the mutation degree of the SMART attribute value with mutation;

the third determining unit is used for determining a fault prediction result of the hard disk to be detected based on the constructed at least one type of characteristic value and by combining the first model; the first model is obtained by training the training sample data by taking each feature value included in at least one type of feature value as the training sample data.

An embodiment of the present application further provides a failure prediction device, where the device includes:

the processor is used for determining a mutation point position corresponding to a SMART attribute value which is mutated in at least one SMART attribute value of the hard disk to be detected;

An embodiment of the present application further provides another failure prediction device, where the device includes: a processor and a memory for storing a computer program operable on the processor;

when the processor is used for running the computer program, the steps of the failure prediction method provided by the embodiment of the application are executed.

The embodiment of the present application further provides a storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the failure prediction method provided by the embodiment of the present application.

According to the fault prediction method, the fault prediction device, the fault prediction equipment and the storage medium, the mutation point position corresponding to the SMART attribute value which is mutated in at least one SMART attribute value of the hard disk to be detected is determined; determining the median of the position of the mutation point corresponding to the SMART attribute value with mutation; constructing at least one type of characteristic value based on the median value of the mutation point position corresponding to the SMART attribute value with mutation; the at least one type of characteristic value is used for characterizing the mutation degree of the SMART attribute value with mutation; determining a fault prediction result of the hard disk to be detected based on the constructed at least one type of characteristic value and by combining a first model; the first model is obtained by training the training sample data by taking each feature value included in at least one type of feature value as the training sample data. By adopting the scheme of the embodiment of the application, the characteristic of the mutation point of the SMART attribute value is fully considered, the characteristic value of at least one type is reconstructed, and then the fault of the hard disk to be detected is predicted based on the constructed characteristic value of at least one type and by combining the first model, so that the fault prediction result of the hard disk to be detected is obtained, the characteristic of the fault disk can be accurately reflected, and the accuracy and the universality of hard disk fault prediction can be effectively improved.

Drawings

Fig. 1 is a schematic flowchart of a fault prediction method according to an embodiment of the present disclosure;

fig. 2 is a schematic flowchart of another fault prediction method provided in the embodiment of the present application;

fig. 3 is a schematic structural diagram of a failure prediction apparatus according to an embodiment of the present disclosure;

fig. 4 is a schematic diagram of a hardware composition structure of the failure prediction device according to the embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, the described embodiments should not be construed as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without making creative efforts shall fall within the protection scope of the present application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and that the technical solutions described in the embodiments of the present application may be combined with each other without conflict.

The embodiments of the present application will be described in further detail with reference to the drawings and examples.

An embodiment of the present application provides a failure prediction method, where the method is applied to a server, and fig. 1 is a schematic flow diagram of the failure prediction method provided in the embodiment of the present application, and as shown in fig. 1, the failure prediction method includes:

Step 101: determining a mutation point position corresponding to a SMART attribute value which has a mutation in at least one SMART attribute value of a hard disk to be detected.

Step 102: determining a median value of the position of the mutation point corresponding to the SMART attribute value with the mutation.

In practical application, SMART is a standard that hard disk manufacturers are required to follow, and it is a data security technology commonly adopted in current hard disks.

In practical applications, the number and meaning of SMART attribute values of hard disks may vary and are not completely consistent for different brands and models of hard disks, but the SMART attribute values as shown in table 1 below are usually required to be included in each brand and model of hard disk.

TABLE 1

Here, the SMART attribute value of the hard disk to be detected includes an original value of the SMART attribute value and a normalized value of the SMART attribute value, but in practical application, since normalization methods of different SMART attribute values by different hard disk device vendors are not uniform, in order to simplify an implementation process of the failure prediction method in the embodiment of the present application, the original value of the SMART attribute value is used for analysis in the embodiment of the present application.

It should be noted that, the brand and the model of the hard disk to be detected are not limited in the embodiment of the present application, that is, the embodiment of the present application may be applied to hard disk failure prediction of multiple brands and models.

In practical application, the server may first detect, based on each SMART attribute value of the hard disk with a historical failure, a mutation point corresponding to a SMART attribute value with a mutation in at least one SMART attribute value of the hard disk to be detected by a mutation point detection method, and then determine a position of the mutation point corresponding to the SMART attribute value with the mutation.

Based on this, in some embodiments, the determining a mutation point position corresponding to a SMART attribute value having a mutation in at least one SMART attribute value of the hard disk to be detected may be implemented as follows:

In practical application, by analyzing the SMART attribute values of the hard disks with failures in history, the situation that part of SMART attribute values suddenly rise or fall in a certain period of time before the failure occurs in the hard disks is found, which shows that the performance of the hard disks is obviously changed and is a precursor of the failure of the hard disks. In addition, for a hard disk in a good state, the analysis of the sudden change of the SMART attribute value is carried out in the same way, and the fact that the sudden change exists in part of the SMART attribute values is found, but the degree of the sudden change is different from that of a failed hard disk. Based on this, the embodiment of the application analyzes the mutation points of the SMART attribute values, extracts at least one type of characteristic value, and then performs classification prediction on the fault of the hard disk.

And matching each SMART attribute value of the hard disk with which the fault occurs historically with at least one SMART attribute value of the hard disk to be detected, and determining the mutation point of the SMART attribute value with high matching degree as the mutation point corresponding to the SMART attribute value with the mutation.

In practical application, the SMART attribute values of the hard disks are divided into training set data and test set data. For the training set data, subsequences are formed by adding the elements of a SMART attribute value time series one at a time in positional order; each subsequence is then ranked to determine a weighted summation result over all subsequences, and finally the mutation point position corresponding to the mutated SMART attribute value is determined from that weighted summation result. The specific code implementation process is as follows:

Here, the significance value (Pvalue) of the mutation point corresponding to the mutated SMART attribute value may be determined, for example, by the following equation: $\mathrm{Pvalue} = \exp\left(-\dfrac{6Y^{2}}{L^{3} + L^{2}}\right)$. The determined Pvalue is then compared with a set threshold, and the comparison result decides whether the determined mutation point is valid. For example, assuming the threshold is set to 0.05, if Pvalue is less than 0.05 the determined mutation point is judged to be a valid mutation point; if Pvalue is greater than 0.05 it is judged to be an invalid mutation point.
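The ranking and validity check described above can be sketched as follows. This is only an illustration under stated assumptions: a Pettitt-style rank statistic stands in for the ranking and weighted summation step, Y in the Pvalue formula is assumed to be the largest absolute value of that statistic, and L is assumed to be the series length. The function name `detect_mutation_point` is hypothetical.

```python
import math

def detect_mutation_point(series, threshold=0.05):
    """Return (position, pvalue, is_valid) for the most likely mutation point."""
    L = len(series)
    u = 0                      # running rank statistic U_t
    best_u, best_pos = 0, 0
    for t in range(L - 1):
        # Pettitt recurrence: U_t = U_{t-1} + sum over j of sign(x_t - x_j)
        u += sum((series[t] > x) - (series[t] < x) for x in series)
        if abs(u) > abs(best_u):
            best_u, best_pos = u, t + 1   # the series splits before index t + 1
    # Pvalue formula as given in the text, with Y assumed to be best_u
    pvalue = math.exp(-6 * best_u ** 2 / (L ** 3 + L ** 2))
    return best_pos, pvalue, pvalue < threshold
```

A series that jumps halfway through yields a small Pvalue and a valid mutation point, while a constant series yields a Pvalue of 1 and is rejected.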

It should be noted that, by repeating the above process for each SMART attribute value of each historically failed hard disk, the mutation point positions of all SMART attribute values of all historically failed hard disks in the training set can be obtained. Then, for each mutated SMART attribute value, the median of the array formed by its mutation point positions is taken, that is, $Y_j = \mathrm{Median}(Y_{ij})$. If a SMART attribute value has no mutation point, it is deleted: a SMART attribute value without a mutation point does not enter the subsequent process of constructing characteristic values.
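A minimal sketch of the median step, assuming the per-disk mutation point positions are held in a dictionary keyed by attribute name, with `None` marking a disk on which that attribute showed no mutation point. The data layout and the name `median_positions` are illustrative assumptions.

```python
from statistics import median

def median_positions(positions):
    """Map each attribute j to Y_j = Median(Y_ij) over the disks i."""
    result = {}
    for attr, per_disk in positions.items():
        found = [p for p in per_disk if p is not None]
        if found:                      # attributes with no mutation point are dropped
            result[attr] = median(found)
    return result
```

Attributes for which no disk shows a mutation point are simply absent from the result, so they never reach the characteristic value construction.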

Step 103: constructing at least one type of characteristic value based on the median value of the mutation point position corresponding to the SMART attribute value with mutation.

In the embodiment of the present application, the median of the mutation point positions corresponding to the mutated SMART attribute values can be used as an important parameter for constructing at least one type of characteristic value, which may include but is not limited to: the number of Ricker wavelet peaks, the sum-of-squares ratio of data groups, the mean and variance of data groups, a data variance fluctuation flag, a data symmetry flag, the overall linear trend characteristics of the data, and the like. Here, the at least one type of characteristic value is used to characterize the degree of mutation of the mutated SMART attribute value.

In practical application, because the SMART attribute values of the failed hard disk and the hard disk in a good state are all likely to have a sudden change, in order to realize the fault prediction of the hard disk to be detected, the embodiment of the application adopts a method of reconstructing the characteristic value to further distinguish the sudden change degree of the SMART attribute value with the sudden change.

Here, whether a mutation in a SMART attribute value belongs to a failed hard disk or to a hard disk in a good state can be determined according to the degree and duration of the mutation.

It should be noted that, in order to ensure the effectiveness of the SMART attribute value mutation analysis, in practical applications, the length of the SMART attribute value time sequence used for constructing the feature value needs to satisfy the following condition: the length of the time series of the SMART attribute values used for constructing the characteristic value must be larger than the median of the positions of the mutation points corresponding to the SMART attribute values in which the mutation occurs.

In some embodiments, when constructing at least one type of feature value based on a median value of the location of the mutation point corresponding to the SMART attribute value at which the mutation occurs, the method includes:

determining the dominant frequency of the Ricker wavelet based on the least common multiple of the median value of the mutation point position corresponding to the SMART attribute value with mutation;

Here, the number of Ricker wavelet peak values of each time series of SMART attribute values in which a mutation occurs is constructed as a type of characteristic value by taking into account Ricker wavelets commonly used in seismic wave analysis, so that the degree of influence after the mutation of the SMART attribute values can be better analyzed.

Specifically, the least common multiple of the medians of the mutation point positions corresponding to the mutated SMART attribute values is computed. Denote this least common multiple by $y$ and the dominant frequency of the Ricker wavelet by $f_0$. The dominant frequency is then obtained from $f_0 = \dfrac{1}{n \cdot y}$, where $n$ is a positive integer, $n \cdot y < L$, and $L$ is the total length of the mutated SMART attribute value time series. Next, the corresponding Ricker wavelet is determined from the dominant frequency; its expression is $R(t) = \left[1 - 2(\pi f_0 t)^2\right]\exp\left(-(\pi f_0 t)^2\right)$, where $R(t)$ is the waveform of the Ricker wavelet and $t$ is time. Each mutated SMART attribute value time series is convolved with the Ricker wavelet, and the number of Ricker wavelet peaks is determined from the convolution result: each time a peak (a maximum) is found, the peak count is incremented by 1. In this way a peak count is obtained for each mutated SMART attribute value time series and used as its characteristic value. Each value of the positive integer $n$ yields one peak count, and hence one characteristic value, so there are $n$ characteristic values in total.
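The peak count can be sketched in plain Python as below. Several details are assumptions, since the text does not fix them: the wavelet is sampled at integer time steps over the support [-n·y, n·y], the convolution is zero-padded and truncated to the length of the series, and a peak is a strict local maximum. The names `ricker` and `count_ricker_peaks` are hypothetical.

```python
import math

def ricker(f0, half_width):
    """Sample R(t) = [1 - 2(pi f0 t)^2] exp(-(pi f0 t)^2) at integer t."""
    out = []
    for t in range(-half_width, half_width + 1):
        a = (math.pi * f0 * t) ** 2
        out.append((1 - 2 * a) * math.exp(-a))
    return out

def count_ricker_peaks(series, y, n):
    """Count strict local maxima of the series convolved with the wavelet."""
    f0 = 1.0 / (n * y)          # dominant frequency f0 = 1 / (n * y)
    half = n * y                # assumed wavelet support
    w = ricker(f0, half)
    conv = []
    for i in range(len(series)):
        acc = 0.0
        for k, wk in enumerate(w):
            j = i - (k - half)  # zero-padded convolution, same length as series
            if 0 <= j < len(series):
                acc += series[j] * wk
        conv.append(acc)
    return sum(1 for i in range(1, len(conv) - 1)
               if conv[i - 1] < conv[i] > conv[i + 1])
```

Calling `count_ricker_peaks(series, y, n)` once for each admissible n (those with n·y smaller than the series length) gives one characteristic value per n.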

In some embodiments, when constructing at least one type of feature value based on a median value of a mutation point position corresponding to the SMART attribute value at which a mutation occurs, the method includes:

Specifically, the least common multiple of the medians of the mutation point positions corresponding to the mutated SMART attribute values is computed and denoted y. The mutated SMART attribute value time series is divided into at least two data groups, each of length less than or equal to y; that is, the number of full data groups is the total length of the time series integer-divided by y, and when the remainder is not 0, the data corresponding to the remainder form one more data group. Assuming n data groups are obtained, the square of each datum in a data group is computed and the squares are summed to give the sum of squares of that data group; the ratio of the sum of squares of each data group to the sum of squares of all data groups is then computed. For example, if the mutated SMART attribute value time series is divided into n data groups, n characteristic values are formed: the ratio of the sum of squares of the 1st data group to the sum of squares of all data groups is the 1st characteristic value, and so on, up to the ratio for the nth data group, which is the nth characteristic value.
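The grouping and ratio computation above can be sketched as follows. The function name `square_sum_ratios` is hypothetical, and returning zeros for an all-zero series is an assumption made only to avoid division by zero.

```python
def square_sum_ratios(series, y):
    """Split the series into groups of length <= y and return each group's
    share of the total sum of squares."""
    groups = [series[i:i + y] for i in range(0, len(series), y)]
    sums = [sum(v * v for v in g) for g in groups]
    total = sum(sums)
    if total == 0:              # assumption: an all-zero series gives zero ratios
        return [0.0] * len(sums)
    return [s / total for s in sums]
```

For a series of length 5 with y = 2, this produces three groups (two full groups plus the remainder), and the ratios sum to 1.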

Specifically, the least common multiple of the medians of the mutation point positions corresponding to the mutated SMART attribute values is computed and denoted y, and the length of the mutated SMART attribute value time series is denoted L. Assuming the first interval is [0, 1], it is divided into equal parts with step y/L to obtain at least two sub-intervals. For example, if y/L is 0.2, the equal sub-intervals are [0,0.2], [0.2,0.4], [0.4,0.6], [0.6,0.8], [0.8,1], and the intervals are then constructed by traversing the endpoints from small to large, that is:

[0,0.2]，[0,0.4]，[0,0.6]，[0,0.8]，[0,1]

[0.2,0.4]，[0.2,0.6]，[0.2,0.8]，[0.2,1]

[0.4,0.6]，[0.4,0.8]，[0.4,1]

[0.6,0.8]，[0.6,1]

[0.8,1]

Data are then intercepted from the mutated SMART attribute value time series according to each subinterval, giving the time subsequence corresponding to that subinterval; for example, [0,0.2] selects the subsequence from the start of the series to the position at 0.2 of its total length. The data in each time subsequence are averaged to obtain the mean of the corresponding data group as one characteristic value, and variance processing is then applied to obtain the variance of the corresponding data group as another characteristic value. Assuming the number of subintervals is n, the number of characteristic values constructed in this way is 2n.
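A sketch of the interval traversal and the resulting mean and variance values. It assumes the interval endpoints map to series indices in steps of y, that the last sub-interval absorbs any remainder, and that the variance is the population variance of the data in each subsequence. The name `interval_features` is hypothetical.

```python
from statistics import fmean, pvariance

def interval_features(series, y):
    """Return (start, end, mean, variance) for every interval [a, b] with
    a < b on the grid 0, y/L, 2y/L, ..., 1."""
    L = len(series)
    edges = list(range(0, L, y)) + [L]      # grid points expressed as indices
    feats = []
    for a in range(len(edges) - 1):
        for b in range(a + 1, len(edges)):
            sub = [float(v) for v in series[edges[a]:edges[b]]]
            feats.append((edges[a] / L, edges[b] / L, fmean(sub), pvariance(sub)))
    return feats
```

With L = 10 and y = 2 this yields the 15 intervals listed above, and therefore 30 characteristic values.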

Specifically, the least common multiple of the medians of the mutation point positions corresponding to the mutated SMART attribute values is computed and denoted y, and the length of the mutated SMART attribute value time series is denoted L. Assuming the second interval is [0, 0.5], it is divided into equal parts according to the result of integer-dividing L by y, and the division points are the values of r; for example, if the equal division gives a step of 0.1, r takes the values 0.1, 0.2, 0.3, 0.4, 0.5. The data variance fluctuation flag is determined by the formula $\mathrm{std}(S) > r \cdot (\max(S) - \min(S))$ and is of bool type, where S is the mutated SMART attribute value time series, std(S) is its standard deviation, max(S) is its maximum value and min(S) is its minimum value. Under normal conditions, the number of data variance fluctuation flags is the same as the number of values of r.
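The flag computation can be sketched as below, assuming std(S) is the population standard deviation and that r takes the division points i · 0.5 / k for i = 1..k, where k is L integer-divided by y. The name `variance_flags` and the `upper` parameter are hypothetical.

```python
from statistics import pstdev

def variance_flags(series, y, upper=0.5):
    """One bool per r: std(S) > r * (max(S) - min(S))."""
    k = len(series) // y                    # number of equal parts of [0, upper]
    spread = max(series) - min(series)
    sd = pstdev(series)                     # assumption: population standard deviation
    return [sd > (upper * i / k) * spread for i in range(1, k + 1)]
```

For a series of length 10 with y = 2, r runs over 0.1, 0.2, 0.3, 0.4, 0.5 and five flags are returned.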

Specifically, the least common multiple of the medians of the mutation point positions corresponding to the mutated SMART attribute values is computed and denoted y, and the length of the mutated SMART attribute value time series is denoted L. Assuming the third interval is [0, W], it is divided into equal parts according to the result of integer-dividing L by y, and the division points are the values of r; for example, if the equal division gives a step of 0.1 and W is 0.3, r takes the values 0.1, 0.2 and 0.3. The data symmetry flag is calculated by $|\mathrm{mean}(S) - \mathrm{median}(S)| < r \cdot (\max(S) - \min(S))$, and the result is of bool type, where S is the mutated SMART attribute value time series, max(S) is its maximum value, min(S) is its minimum value, mean(S) is its mean value, and median(S) is its median value. In general, the number of data symmetry flags is the same as the number of values of r. The value of W depends on the brand and model of the hard disk to be detected, and corresponds to the maximum fluctuation value of the difference between the mean value and the median value.
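A sketch of the symmetry flag, assuming r takes the division points i · W / k for i = 1..k with k equal to L integer-divided by y. The name `symmetry_flags` and its parameter names are hypothetical.

```python
from statistics import fmean, median

def symmetry_flags(series, y, w):
    """One bool per r: |mean(S) - median(S)| < r * (max(S) - min(S))."""
    k = len(series) // y                    # number of equal parts of [0, w]
    spread = max(series) - min(series)
    gap = abs(fmean(series) - median(series))
    return [gap < (w * i / k) * spread for i in range(1, k + 1)]
```

A series with a single large outlier has its mean pulled away from its median, so the flags for the smallest values of r come out False.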

Specifically, the overall linear trend characteristics of the data may include the intercept, the slope, rvalue and stderr (standard error). Let Y denote the mutated SMART attribute value time series, and let the independent variable X be the timestamps of that time series. Linear regression of Y on X gives $Y = BX + A + \varepsilon$, where $\varepsilon$ represents the error. Using a least squares estimation strategy, $B = (X^{T}X)^{-1}X^{T}Y$, where B represents the slope and A is the intercept; $\mathrm{rvalue} = \dfrac{\sum (Y' - E(Y))^{2}}{\sum (Y - Y')^{2}}$, where $Y'$ denotes the fitted values and $E(Y)$ represents the mean of Y; and $\mathrm{stderr} = \dfrac{\sum (Y - Y')^{2}}{n - 2}$, where n represents the length of the mutated SMART attribute value time series.
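The four trend values can be sketched as below, following the expressions as written in the text. The timestamps are assumed to be the indices 0, 1, ..., n-1, the slope and intercept use the closed-form least squares solution for a single regressor, and rvalue is set to infinity when the fit is exact, which is an assumption made only to avoid division by zero. The name `linear_trend` is hypothetical.

```python
import math

def linear_trend(series):
    """Return slope, intercept, rvalue and stderr for a series with n > 2."""
    n = len(series)
    xs = range(n)                           # assumed timestamps 0..n-1
    mx = sum(xs) / n
    my = sum(series) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, series))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [slope * x + intercept for x in xs]
    resid = sum((v - f) ** 2 for v, f in zip(series, fitted))    # sum (Y - Y')^2
    explained = sum((f - my) ** 2 for f in fitted)               # sum (Y' - E(Y))^2
    rvalue = explained / resid if resid > 0 else math.inf
    stderr = resid / (n - 2)
    return {"slope": slope, "intercept": intercept,
            "rvalue": rvalue, "stderr": stderr}
```

A steadily rising attribute gives a positive slope and a small stderr, while a noisy attribute with no trend gives a slope near zero and a larger stderr.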

Step 104: determining a fault prediction result of the hard disk to be detected based on the constructed at least one type of characteristic value and in combination with a first model.

Here, the first model is obtained by training, as training sample data, each feature value included in at least one type of feature value.

In some embodiments, the determining the failure prediction result of the hard disk to be detected based on the constructed at least one type of feature value in combination with the first model may be implemented as follows:

inputting the constructed at least one type of characteristic value into the first model as input data to obtain a fault prediction result of the hard disk to be detected, which is output by the first model; wherein the first model is a model for classification.

In practical application, in order to increase the calculation speed of the first model, normalization processing may be performed on at least one type of characteristic value to obtain at least one type of normalized characteristic value, and then fault prediction may be performed on the hard disk to be detected based on the at least one type of normalized characteristic value in combination with the first model.

Based on this, in some embodiments, the method further comprises: normalizing the at least one type of characteristic value to obtain at least one type of normalized characteristic value;

correspondingly, the determining of the fault prediction result of the hard disk to be detected based on the constructed at least one type of characteristic value and by combining the first model can be realized by the following modes:

Here, the at least one type of characteristic value may be normalized as follows: denoting the normalized characteristic value by y, y may be determined by the formula $y = \log_{10}(x + 1)$, where x represents the at least one type of characteristic value determined above.
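A minimal sketch of this normalization in Python follows. The function name `normalize_features` and the rejection of values at or below -1 (where the logarithm is undefined) are assumptions added for illustration.

```python
import math


def normalize_features(values):
    """Apply y = log10(x + 1) to each feature value."""
    normalized = []
    for x in values:
        if x <= -1:
            # log10(x + 1) is undefined here; the text does not say how to handle it.
            raise ValueError(f"feature value {x} is outside the domain of log10(x + 1)")
        normalized.append(math.log10(x + 1))
    return normalized


# Hypothetical raw feature values: a zero count, a small count and a large count.
scaled = normalize_features([0, 9, 99])
```

The shift by 1 maps a zero-valued feature to 0 and compresses large counts, so features with very different ranges end up on a comparable scale.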

In practical application, the first model may be a random forest (RandomForest) model, and the random forest model is used for performing predictive analysis on the constructed at least one type of characteristic value to obtain a result of whether the hard disk to be detected fails in a prediction period.

The following sets of prediction results corresponding to different sequence lengths are given, and the code implementation process for performing fault prediction through the random forest model is as follows:

In practical application, the length of the time series of the mutated SMART attribute values has a certain influence on the prediction result. Therefore, to ensure the accuracy of the fault prediction result, the embodiment of the present application optimizes the length of the time series of the mutated SMART attribute values.

Based on this, in some embodiments, the method further comprises:

In practical application, optimizing the determined length of the time series of the mutated SMART attribute values to obtain the optimal length of that time series comprises:

Here, the interpolation processing performed on the determined length of the time series of the mutated SMART attribute values may be cubic spline interpolation. Specifically, according to the test set results, the score values F_score of multiple prediction results and the corresponding values of the time series length L are each arranged into an array, and F_score is then determined as a function of L by cubic spline interpolation, with the expression $F\_score = aL^{3} + bL^{2} + cL + d$, where a, b, c and d are the coefficients of the cubic spline interpolation formula. The value of L corresponding to the maximum value of F_score is selected as the optimal length of the time series of the mutated SMART attribute values.
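The selection of the optimal length can be sketched as follows. A full cubic spline is piecewise; for brevity this sketch fits the single cubic given by the expression above exactly through four (L, F_score) points, using the Lagrange form, and then scans integer lengths for the maximum. The function names and the four sample points are hypothetical and are not results from the application.

```python
def cubic_through_points(points):
    """Return f(L) for the unique cubic passing through four (L, F_score) points."""
    if len(points) != 4:
        raise ValueError("exactly four points determine a cubic")

    def f(length):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (length - xj) / (xi - xj)  # Lagrange basis factor
            total += term
        return total

    return f


def best_length(points):
    """Pick the integer L in the sampled range that maximizes the interpolated F_score."""
    f = cubic_through_points(points)
    low = min(x for x, _ in points)
    high = max(x for x, _ in points)
    return max(range(low, high + 1), key=f)


# Hypothetical test-set scores for four candidate sequence lengths.
samples = [(10, 0.60), (20, 0.80), (30, 0.85), (40, 0.70)]
score_at = cubic_through_points(samples)
optimal_l = best_length(samples)
```

With more than four measurements, a piecewise cubic spline from a numerical library would be the closer match to the text; the scan over integer lengths would stay the same.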

In practical application, actual failed hard disks account for only a small proportion of all hard disks, and such an imbalance between positive and negative samples tends to lower the prediction precision. The SMART attribute value samples of failed hard disks can therefore be expanded to improve the fault prediction precision, where D is the number of failed-disk samples constructed by expansion. In addition, the lead time with which the hard disk failure state is predicted must be less than or equal to D time units. For example, if D is 20, one failed hard disk sequence can be expanded into 20 sequences of equal length, and the prediction lead time must be less than or equal to 20 time units. When the sampling period is one day, the prediction time unit is a day.
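The text does not state how the D equal-length sequences are derived. One plausible reading, shown here purely as an assumption, is a sliding window over the failed disk's history, where window k ends k samples before the last observation. The function name `expand_failed_disk` and the sample data are hypothetical.

```python
def expand_failed_disk(series, window, d):
    """Return d equal-length windows; window k ends k samples before the last observation."""
    n = len(series)
    if window + d - 1 > n:
        raise ValueError("series too short for d windows of this length")
    return [series[n - window - k: n - k] for k in range(d)]


# Hypothetical daily SMART readings for one failed disk (30 days, failure after the last day).
history = list(range(30))
expanded = expand_failed_disk(history, window=10, d=20)
```

Under this reading, window k corresponds to a prediction made k days before the failure, which is consistent with the lead time being bounded by D time units.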

An embodiment of the present application further provides another fault prediction method, where the method is applied to a server, and fig. 2 is a schematic flowchart of another fault prediction method provided in the embodiment of the present application, and as shown in fig. 2, the fault prediction method includes:

step 201, determining a mutation point position corresponding to a SMART attribute value which has a mutation in at least one SMART attribute value of a hard disk to be detected;

step 202, determining a median value of the mutation point position corresponding to the SMART attribute value with mutation;

step 203, constructing at least one type of characteristic value based on the median value of the mutation point position corresponding to the SMART attribute value with mutation;

here, the at least one type of characteristic value is used to characterize a degree of mutation of the SMART attribute value in which the mutation occurs.

In the embodiment of the present application, the median of the mutation point positions corresponding to the mutated SMART attribute values can be used as an important parameter for constructing the at least one type of characteristic value, which may include but is not limited to: the number of Ricker wavelet peaks, the ratio of the sums of squares of the data groups, the mean and variance of the data groups, a data variance fluctuation flag, a data symmetry flag, overall linear trend characteristics of the data, and the like.

In practical application, to ensure the effectiveness of the SMART attribute value mutation analysis, the length of the SMART attribute value time series used for constructing the characteristic value must be greater than the median of the mutation point positions corresponding to the mutated SMART attribute values.

Step 204, normalizing the at least one type of characteristic value to obtain a normalized at least one type of characteristic value;

step 205, inputting the normalized at least one type of characteristic value as input data to a first model to obtain a fault prediction result of the hard disk to be detected, which is output by the first model.

Here, the first model is a model for classification, and may specifically be a random forest model, and the random forest model is used for performing predictive analysis on at least one type of normalized feature value to obtain a result of whether a hard disk to be detected fails in a prediction period.

It should be noted that, the specific processing procedure of the failure prediction has been described in detail above, and can be understood by referring to the failure prediction method described in detail above, which is not described herein again.

According to the fault prediction scheme provided by the embodiment of the application, the position of a mutation point corresponding to a SMART attribute value which is mutated in at least one SMART attribute value of a hard disk to be detected is determined; determining the median of the position of the mutation point corresponding to the SMART attribute value with mutation; constructing at least one type of characteristic value based on the median value of the mutation point position corresponding to the SMART attribute value with mutation; the at least one type of characteristic value is used for characterizing the mutation degree of the SMART attribute value with mutation; determining a fault prediction result of the hard disk to be detected based on the constructed at least one type of characteristic value and by combining a first model; the first model is obtained by training the training sample data by taking each feature value included in at least one type of feature value as the training sample data.

By adopting the scheme of the embodiment of the application, the characteristic of the mutation point of the SMART attribute value is fully considered, the characteristic value of at least one type is reconstructed, and then the fault of the hard disk to be detected is predicted based on the constructed characteristic value of at least one type and by combining the first model, so that the fault prediction result of the hard disk to be detected is obtained, the characteristic of the fault disk can be accurately reflected, and the accuracy and the universality of hard disk fault prediction can be effectively improved.

In order to implement the fault prediction method according to the embodiment of the present application, an embodiment of the present application further provides a fault prediction apparatus, and fig. 3 is a schematic structural diagram of the fault prediction apparatus according to the embodiment of the present application, and as shown in fig. 3, the fault prediction apparatus includes:

a first determining unit 31, configured to determine a mutation point position corresponding to a SMART attribute value that has a mutation in at least one SMART attribute value of a hard disk to be detected;

a second determination unit 32, configured to determine a median of positions of sudden change points corresponding to the SMART attribute values where sudden changes occur;

a construction unit 33 configured to construct at least one type of feature value based on a median value of a mutation point position corresponding to the SMART attribute value in which a mutation occurs; the at least one type of characteristic value is used for characterizing the mutation degree of the SMART attribute value with mutation;

a third determining unit 34, configured to determine, based on the constructed at least one type of feature value, a failure prediction result of the hard disk to be detected by combining with the first model; the first model is obtained by training the training sample data by taking each feature value included in at least one type of feature value as the training sample data.

In some embodiments, the first determining unit 31 is specifically configured to:

In some embodiments, when the constructing unit 33 constructs at least one type of feature value based on a median value of the location of the mutation point corresponding to the SMART attribute value at which the mutation occurs, the failure prediction apparatus further includes:

a fourth determining unit, configured to determine a dominant frequency of the Ricker wavelet based on a least common multiple of a median value of a mutation point position corresponding to the SMART attribute value having a mutation;

a fifth determining unit, configured to group the time series of the mutated SMART attribute values based on a least common multiple of a median value of the mutation point positions corresponding to the mutated SMART attribute values, so as to obtain at least two data groups; the length of each data packet is less than or equal to the least common multiple;

a sixth determining unit, configured to group the first intervals in an equal proportion based on a ratio of a least common multiple of a median value of a mutation point position corresponding to the mutated SMART attribute value to a length of a time sequence of the mutated SMART attribute value, so as to obtain at least two subintervals;

a seventh determining unit, configured to determine the result of integer division of the length of the time series of the mutated SMART attribute values by the least common multiple of the median of the mutation point positions corresponding to the mutated SMART attribute values;

an eighth determining unit, configured to determine the result of integer division of the length of the time series of the mutated SMART attribute values by the least common multiple of the median of the mutation point positions corresponding to the mutated SMART attribute values;

a ninth determining unit, configured to perform linear regression processing on the mutated SMART attribute value time series based on a least square estimation strategy, so as to obtain a corresponding linear regression result;

In some embodiments, the failure prediction apparatus further comprises:

a tenth determining unit for determining the length of the time series of SMART attribute values in which a mutation occurs;

and the optimizing unit is used for optimizing the determined length of the time sequence of the SMART attribute value with the mutation to obtain the optimal length value of the time sequence of the SMART attribute value with the mutation.

In practical application, the optimization unit is specifically configured to:

In some embodiments, the failure prediction apparatus further comprises:

the normalization unit is used for performing normalization processing on the at least one type of characteristic value to obtain at least one type of normalized characteristic value;

correspondingly, the third determining unit 34 is specifically configured to:

Here, the first determining unit 31, the second determining unit 32, the constructing unit 33, and the third determining unit 34 may be implemented by a processor in the failure prediction apparatus in conjunction with a communication interface at the time of actual application.

It should be noted that, when the failure prediction apparatus provided in the above embodiment performs failure prediction, only the division of each program module is taken as an example, and in practical applications, the processing distribution may be completed by different program modules according to needs, that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the fault prediction apparatus and the fault prediction method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments in detail and are not described herein again.

Based on the implementation of the composition structure of the program module, and in order to implement the failure prediction method according to the embodiment of the present application, an embodiment of the present application further provides a failure prediction device, and fig. 4 is a schematic diagram of a hardware composition structure of the failure prediction device according to the embodiment of the present application, and as shown in fig. 4, the failure prediction device 40 includes:

a communication interface 41, configured to obtain at least one SMART attribute value of a hard disk to be detected;

a processor 42, connected to the communication interface 41 and configured to execute, when running a computer program, the fault prediction method provided by one or more of the above technical solutions. The computer program is stored in the memory 43.

Specifically, the processor 42 is configured to determine a mutation point position corresponding to a SMART attribute value that has a mutation in at least one SMART attribute value of the hard disk to be detected;

In some embodiments, the processor 42 is specifically configured to:

In some embodiments, the processor 42 is further configured to:

when at least one type of characteristic value is constructed based on the median of the mutation point position corresponding to the SMART attribute value with mutation, determining the dominant frequency of the Ricker wavelet based on the least common multiple of the median of the mutation point position corresponding to the SMART attribute value with mutation;

In some embodiments, the processor 42 is further configured to:

grouping the time sequence of the mutated SMART attribute values based on the least common multiple of the median of the mutation point positions corresponding to the mutated SMART attribute values to obtain at least two data groups when at least one type of characteristic value is constructed based on the median of the mutation point positions corresponding to the mutated SMART attribute values; the length of each data packet is less than or equal to the least common multiple;

In some embodiments, the processor 42 is further configured to:

when at least one type of characteristic value is constructed based on the median of the mutation point position corresponding to the mutated SMART attribute value, the first intervals are grouped in equal proportion based on the ratio of the least common multiple of the median of the mutation point position corresponding to the mutated SMART attribute value to the length of the time sequence of the mutated SMART attribute value to obtain at least two subintervals;

In some embodiments, the processor 42 is further configured to:

when at least one type of characteristic value is constructed based on the median of the mutation point position corresponding to the SMART attribute value with mutation, determining the length of the time sequence of the SMART attribute value with mutation and the result of the integral division of the least common multiple of the median of the mutation point position corresponding to the SMART attribute value with mutation;

In some embodiments, the processor 42 is further configured to:

when at least one type of characteristic value is constructed based on the median value of the mutation point position corresponding to the SMART attribute value with mutation, linear regression processing is carried out on the SMART attribute value time sequence with mutation based on a least square estimation strategy, and a corresponding linear regression result is obtained;

In some embodiments, the processor 42 is further configured to:

In practical applications, the processor 42 is specifically configured to:

In some embodiments, the processor 42 is further configured to:

normalizing the at least one type of characteristic value to obtain at least one type of normalized characteristic value;

accordingly, the processor 42 is specifically configured to:

It should be noted that, the specific processing procedures of the communication interface 41 and the processor 42 are described in detail in the method embodiment, and are not described herein again.

Of course, in actual practice, the various components in fault prediction device 40 are coupled together by bus system 44. It will be appreciated that the bus system 44 is used to enable communications among the components. The bus system 44 includes a power bus, a control bus, and a status signal bus in addition to the data bus. For clarity of illustration, however, the various buses are labeled as bus system 44 in fig. 4.

The memory 43 in the embodiment of the present application is used to store various types of data to support the operation of the failure prediction apparatus 40. Examples of such data include: any computer program for operating on the fault prediction device 40.

The failure prediction method disclosed in the embodiment of the present application may be applied to the processor 42, or implemented by the processor 42. The processor 42 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 42 or by instructions in the form of software. The processor 42 may be a general purpose processor, a Digital Signal Processor (DSP), or another programmable logic device, discrete gate or transistor logic device, discrete hardware component, or the like. The processor 42 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 43; the processor 42 reads the information in the memory 43 and performs the steps of the fault prediction method described above in conjunction with its hardware.

In an exemplary embodiment, the failure prediction device 40 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, Micro Controller Units (MCUs), microprocessors, or other electronic components for performing the steps of the aforementioned failure prediction methods.

It will be appreciated that the memory 43 of the embodiments of the present application can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be disk storage or tape storage.

The volatile memory can be Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memories described in the embodiments of the present application are intended to comprise, without being limited to, these and any other suitable types of memory.

In an exemplary embodiment, the present application further provides a storage medium, specifically a computer storage medium, which may be a computer readable storage medium, for example, a memory 43 storing a computer program, which may be executed by a processor 42 in the failure prediction device 40 to complete the steps of the aforementioned failure prediction method. The computer-readable storage medium can be memories such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface Memory, optical disk or CD-ROM; or may be various devices including one or any combination of the above memories.

In the embodiments of the present application, the terms "first", "second", and the like are used only for distinguishing similar objects and do not denote a particular order or sequence of the objects. It is to be understood that objects distinguished by "first", "second", and the like may be interchanged where the context allows, such that the embodiments of the present application described herein can be implemented in orders other than those illustrated or described herein.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of fault prediction, the method comprising:

determining a mutation point position corresponding to a SMART attribute value which has a mutation in at least one Self-Monitoring, Analysis and Reporting Technology (SMART) attribute value of a hard disk to be detected;

2. The method according to claim 1, wherein the determining the position of the mutation point corresponding to the SMART attribute value having a mutation in the at least one SMART attribute value of the hard disk to be detected comprises:

3. The method according to claim 1, wherein when constructing at least one type of feature value based on a median value of a mutation point position corresponding to the SMART attribute value in which a mutation occurs, the method comprises:

determining the dominant frequency of the Ricker wavelet based on the least common multiple of the median value of the position of the mutation point corresponding to the SMART attribute value with mutation;

4. The method according to claim 1, wherein when constructing at least one type of feature value based on a median value of a mutation point position corresponding to the SMART attribute value in which a mutation occurs, the method comprises:

5. The method according to claim 1, wherein when constructing at least one type of feature value based on a median value of a mutation point position corresponding to the SMART attribute value in which a mutation occurs, the method comprises:

6. The method according to claim 1, wherein when constructing at least one type of feature value based on a median value of a mutation point position corresponding to the SMART attribute value in which a mutation occurs, the method comprises:

7. The method according to claim 1, wherein when constructing at least one type of feature value based on a median value of a mutation point position corresponding to the SMART attribute value in which a mutation occurs, the method comprises:

8. The method according to claim 1, wherein when constructing at least one type of feature value based on a median value of a mutation point position corresponding to the SMART attribute value in which a mutation occurs, the method comprises:

9. The method of claim 1, further comprising:

10. The method according to claim 9, wherein the optimizing the determined length of the time series of the SMART attribute values with the sudden change to obtain the optimal length of the time series of the SMART attribute values with the sudden change comprises:

11. The method of claim 1, further comprising: normalizing the at least one type of characteristic value to obtain at least one type of normalized characteristic value;

12. A failure prediction apparatus, characterized in that the apparatus comprises:

a first determining unit, configured to determine the position of a mutation point corresponding to a SMART attribute value which is mutated in at least one Self-Monitoring, Analysis and Reporting Technology (SMART) attribute value of the hard disk to be detected;

13. A failure prediction device, characterized in that the device comprises:

a processor, configured to determine the position of a mutation point corresponding to a SMART attribute value which is mutated in at least one Self-Monitoring, Analysis and Reporting Technology (SMART) attribute value of the hard disk to be detected;

14. A failure prediction device, characterized in that the device comprises: a processor and a memory for storing a computer program operable on the processor;

wherein the processor is adapted to perform the steps of the failure prediction method of any one of claims 1 to 11 when running the computer program.

15. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the failure prediction method of any one of claims 1 to 11.