CN110389866A - Disk failure prediction technique, device, computer equipment and computer storage medium - Google Patents
- Publication number
- CN110389866A (application CN201810359823.1A)
- Authority
- CN
- China
- Prior art keywords
- disk
- classifier
- training
- sample
- data information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/2273—Test methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3037—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3058—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/285—Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computer Hardware Design (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Embodiments of the invention provide a disk failure prediction method and apparatus, a computer device, and a computer storage medium, in the technical field of data storage. The disk failure prediction method includes: acquiring data information of disk samples in multiple dimensions to construct a training set readable by the weak learning algorithm of the Adaboost algorithm; training on the training set with the Adaboost algorithm to generate a strong classifier; and performing failure prediction on a disk under test according to the strong classifier. Based on the Adaboost algorithm, the method predicts disk failures by iteratively training weak classifiers, which weakens the differences introduced by the SMART thresholds of different manufacturers' disks. It offers high prediction accuracy, a simple implementation, and high prediction efficiency.
Description
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a method and an apparatus for predicting disk failures, a computer device, a computer storage medium, and a storage control system.
Background
With the continuous development of computer and network technologies, ever more data is generated, and the disk, an indispensable storage medium for computers, faces increasingly serious challenges. Large volumes of read and write operations wear disks continuously, and although RAID (Redundant Array of Independent Disks) technology and distributed systems are available today, a sudden disk failure still seriously affects data stability.
At present, disk failures are detected mainly by monitoring disks in real time and checking their SMART (Self-Monitoring, Analysis and Reporting Technology) information. SMART information includes attributes such as the read error rate of the underlying disk data, read/write throughput performance, and disk spin-up time. When an error appears in any of these attributes, the disk may be failing and its data may be lost. This passive, alarm-driven approach to real-time failure detection therefore cannot fully guarantee the stability of system services and data, and it is a hidden time bomb for operation and maintenance personnel. To avoid data loss and reduce the workload of operation and maintenance personnel, disk failures need to be predicted.
Existing models for predicting disk failures include Bayesian models, Markov and hidden Markov models, support vector machine models, and neural network models. The Bayesian model is simple and easy to implement, but its prediction accuracy is only 20%-30%. Markov and hidden Markov models raise the accuracy to 52%, which is still unsatisfactory. Support vector machine and neural network models can reach 95% accuracy, but the algorithms are complex and require long training on large numbers of samples; some of that training is even ineffective and produces no learning effect, so prediction efficiency is low.
Disclosure of Invention
Embodiments of the present invention provide a method and an apparatus for predicting a disk failure, a computer device, a computer storage medium, and a storage control system, so as to solve the above technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a disk failure prediction method.
Specifically, the method comprises the following steps:
acquiring data information of a disk sample in multiple dimensions to construct a training set which can be read by a weak learning algorithm in an Adaboost algorithm;
training the training set by an Adaboost algorithm to generate a strong classifier;
and predicting the fault of the disk to be tested according to the strong classifier.
In a second aspect, an embodiment of the present invention provides a disk failure prediction apparatus.
Specifically, the apparatus comprises:
the disk sample data information acquisition module is used for acquiring data information of a disk sample in multiple dimensions to construct a training set which can be read by a weak learning algorithm in an Adaboost algorithm;
the strong classifier generating module is used for training the training set through an Adaboost algorithm to generate a strong classifier;
and the disk fault prediction module is used for predicting the fault of the disk to be tested according to the strong classifier.
In a third aspect, an embodiment of the present invention provides a computer device.
Specifically, the computer device includes:
a processor; and
a memory for storing a computer program,
wherein the processor is configured to execute the computer program stored in the memory to implement the disk failure prediction method of the first aspect.
In a fourth aspect, embodiments of the present invention provide a computer storage medium.
Specifically, the computer storage medium stores therein a computer program, and the computer program, when executed by a processor, implements the disk failure prediction method according to the first aspect.
In a fifth aspect, an embodiment of the present invention provides a storage control system.
Specifically, the storage control system comprises a training set, a data acquisition unit, and a data processing unit, and is configured to: acquire data information of disk samples in multiple dimensions to construct a training set readable by the weak learning algorithm of the Adaboost algorithm; train on the training set with the Adaboost algorithm to generate a strong classifier; and perform failure prediction on the disks in the storage node according to the strong classifier.
Adaboost is a machine learning algorithm widely used in face recognition and image processing. Applying it to the operation and maintenance field for the first time crosses application domains and offers a new option for the important operations problem of disk failure prediction. Based on the data information of disk samples, the Adaboost algorithm iteratively trains weak classifiers to predict disk failures, which weakens the differences caused by the SMART thresholds of different manufacturers' disks. The algorithm is simple to implement, training time is short, prediction efficiency is high, and prediction accuracy reaches 98%. In addition, the method can collect the data information of each disk under test at a fixed interval (for example, every hour); shortening the interval allows disk failures to be predicted and verified in near real time, so that they can be found and handled promptly.
These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.
Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a disk failure prediction method according to the first embodiment of the present invention;
FIG. 2 is a flowchart of generating a strong classifier in the disk failure prediction method according to the second embodiment of the present invention;
FIG. 3 is a flowchart of generating a strong classifier in the disk failure prediction method according to the third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a disk failure prediction apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the detailed structure of the strong classifier generating module in the disk failure prediction apparatus according to the second embodiment of the present invention;
FIG. 6 is a schematic diagram of the detailed structure of the strong classifier generating module in the disk failure prediction apparatus according to the third embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description, the claims, and the drawings of the present invention are used to distinguish between similar elements and not necessarily to describe a particular sequence or chronological order. It should be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the invention can be practiced in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules is not necessarily limited to those steps or modules explicitly listed, but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
[ METHOD EXAMPLE 1 ]
Embodiment 1 of the present invention provides a disk failure prediction method, which may be executed on a mobile terminal, a computer terminal, or a similar computing device. Taking a computer terminal as an example, the computer terminal may include one or more processors (including but not limited to a processing device such as a microcontroller unit (MCU) or a field-programmable gate array (FPGA)), a memory for storing data, and a transmission device for communication functions.
The memory may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the disk failure prediction method in the embodiment of the present invention. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the computer terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor runs the software programs and modules stored in the memory to execute various functional applications and data processing, thereby implementing the disk failure prediction method.
The transmission device is used to receive or send data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal. In one example, the transmission device includes a network interface controller (NIC) that can connect to other network devices through a base station to communicate with the Internet. In another example, the transmission device may be a radio frequency (RF) module that communicates with the Internet wirelessly.
In the foregoing operating environment, as shown in fig. 1, the disk failure prediction method provided in this embodiment includes the following steps:
step S11, acquiring data information of a disk sample in multiple dimensions to construct a training set readable by a weak learning algorithm in an Adaboost algorithm;
In the above step, the data information of a disk sample consists of the various disk data generated while the disk is in use. It is used to predict the failure state of the disk, so that the user learns that a disk is about to fail before it actually fails and can copy and back up its data, avoiding data loss.
In this embodiment, the data information of all disk samples is merged into a training set, denoted $\{(X_1, y_1), (X_2, y_2), \dots, (X_i, y_i), \dots, (X_n, y_n)\}$, where n is the total number of disks, $X_i$ is the data information of disk sample i, and $y_i$ is the state of disk i, with $y_i \in Y = \{-1, +1\}$: $y_i = +1$ denotes a positive sample and $y_i = -1$ a negative sample. The state of a disk sample is determined as follows: if the SMART information of disk i exceeds the SMART threshold, disk i is in the failure state; otherwise it is in the normal state. The SMART threshold is defined by the disk's manufacturer.
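The labeling rule above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `DiskRecord` type and its field names are hypothetical, each disk is reduced to a single SMART value compared against its threshold, and the failure state is assumed to be the positive class (+1), since the text only says that positive and negative samples are labeled +1 and -1.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FAILED, NORMAL = +1, -1  # assumption: the failure state is the positive class


@dataclass
class DiskRecord:
    """Hypothetical record for one disk sample."""
    disk_id: str
    features: Dict[str, float]  # feature value per dimension, e.g. "read_error_rate"
    smart_value: float          # the SMART value checked against the threshold
    smart_threshold: float      # threshold defined by the disk's manufacturer


def label(record: DiskRecord) -> int:
    # Rule from the text: exceeding the SMART threshold means the failure state.
    return FAILED if record.smart_value > record.smart_threshold else NORMAL


def build_training_set(records: List[DiskRecord],
                       dimensions: List[str]) -> List[Tuple[List[float], int]]:
    # Each X_i is the disk's feature vector in a fixed dimension order; y_i is its label.
    return [([r.features[d] for d in dimensions], label(r)) for r in records]
```

Fixing the dimension order once keeps every $X_i$ aligned, so a classifier trained on dimension k always reads the same position.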
Step S12, training a training set through an Adaboost algorithm to generate a strong classifier;
Adaboost is an iterative algorithm whose core idea is as follows: for a specific disk prediction problem, different weak classifiers are trained on the same disk sample training set, and these weak classifiers are then combined into a strong classifier. A weak classifier predicts the state of the target disk better than random guessing; a strong classifier achieves the desired prediction rate for target disks by learning from a given set of disk samples.
In this embodiment, a weak classifier is trained multiple times on the collected disk sample data. After each training round, the weight of each disk sample is set according to whether that sample was classified correctly and according to the overall classification accuracy of the round. This changes the data distribution of the disk samples and determines the weak classifier used in the next round, and the reweighted disk sample data is passed to the next weak classifier for training. Finally, the weak classifiers obtained in all rounds are combined by weighted summation into the strong classifier used for the final decision.
And step S13, performing fault prediction on the disk to be tested according to the strong classifier.
Specifically, after receiving data information of a disk to be tested, extracting a characteristic value from the data information of the disk to be tested according to all dimensions corresponding to the strong classifier, inputting the extracted characteristic value into the strong classifier, and predicting whether the disk to be tested fails according to an output result of the strong classifier.
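Step S13 can be sketched as follows, assuming the strong classifier is stored as a list of decision stumps `(alpha, dimension, threshold, polarity)`. The representation, the polarity convention, and the use of the sign of the weighted sum (with +1 meaning predicted failure) are assumptions for illustration; the text only states that the output of the strong classifier decides whether the disk fails.

```python
from typing import Dict, List, Tuple

# Hypothetical stump representation: (alpha, dimension, threshold, polarity)
Stump = Tuple[float, str, float, int]


def stump_predict(value: float, threshold: float, polarity: int) -> int:
    # polarity = +1: predict +1 (failure) when the value exceeds the threshold
    return polarity if value > threshold else -polarity


def strong_score(features: Dict[str, float], stumps: List[Stump]) -> float:
    # Weighted sum of the best classifiers' outputs: sum_t alpha_t * c_t(x)
    return sum(alpha * stump_predict(features[dim], thr, pol)
               for alpha, dim, thr, pol in stumps)


def predict_failure(features: Dict[str, float], stumps: List[Stump]) -> int:
    # Assumed decision rule: positive score -> +1 (failure), otherwise -1 (normal)
    return 1 if strong_score(features, stumps) > 0 else -1
```

Only the dimensions referenced by the stumps are read from `features`, which mirrors extracting feature values "according to all dimensions corresponding to the strong classifier".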
In this technical solution, the Adaboost algorithm is applied to the operation and maintenance field: weak classifiers are trained iteratively on the data information of disk samples to predict disk failures. This weakens the differences caused by the SMART thresholds of different manufacturers' disks and yields high prediction accuracy, a simple implementation, and high prediction efficiency. In addition, the method can collect the data information of each disk under test at a fixed interval (for example, every hour); shortening the interval allows disk failures to be predicted and verified in near real time, so that they can be found and handled promptly. Using multi-dimensional data information of the disk samples further ensures the accuracy of disk state prediction.
Optionally, in an implementation of this embodiment, the data information of the disk sample may include SMART information, such as the read error rate of the underlying disk data, read/write throughput performance, and disk spin-up time.
Optionally, in an implementation manner of this embodiment, the data information of the disk sample further includes information of at least one dimension of IO state information, environmental temperature information, and humidity information.
Because SMART thresholds, service scenarios, and actual operating environments differ between disk brands, the method considers environmental factors in addition to SMART information alone, which further ensures the accuracy of disk state prediction.
Optionally, in an implementation of this embodiment, step S12 specifically includes:
training, with the Adaboost algorithm, on the disk sample data information grouped by manufacturer and/or service scenario to generate a strong classifier.
Grouping the disk sample data information in this way has practical benefits. For example, a strong classifier trained on disk samples from a single manufacturer predicts failures of that manufacturer's disks, which helps assess the quality of disk brands; a strong classifier trained on disk samples from a single service scenario predicts failures of disks in that scenario, which helps analyze the scenario.
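Grouping by manufacturer or service scenario amounts to partitioning the samples before training and keeping one strong classifier per group. A minimal sketch follows; `train_per_group`, `group_key`, and `train_fn` are hypothetical names, and `train_fn` stands in for whatever Adaboost trainer is used.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List


def train_per_group(samples: Iterable[Any],
                    group_key: Callable[[Any], Hashable],
                    train_fn: Callable[[List[Any]], Any]) -> Dict[Hashable, Any]:
    """Train one strong classifier per manufacturer / service-scenario group."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for sample in samples:
        groups[group_key(sample)].append(sample)
    # train_fn stands in for an Adaboost trainer applied to one group's samples
    return {key: train_fn(members) for key, members in groups.items()}
```

For example, `train_per_group(samples, lambda s: s["vendor"], trainer)` yields a dictionary mapping each vendor to its own model, so a disk is later scored by the model of its own vendor.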
[ METHOD EXAMPLE 2 ]
The method provided by this embodiment includes all the contents of method embodiment 1, and is not described herein again. As shown in fig. 2, in the present embodiment, step S12 is implemented by:
step S121, initializing the weight distribution of each disk sample in a training set;
Optionally, in an implementation of this embodiment, the initial weights of all disk samples may be preset to be equal for the first training round, that is, $\omega_{1,i} = 1/n$ for $i = 1, 2, \dots, n$, where n is the total number of disk samples and $\omega_{1,i}$ is the weight of the i-th disk sample before the 1st training round.
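Step S121 with equal initial weights reduces to a one-line helper; `init_weights` is a hypothetical name used only for illustration.

```python
from typing import List


def init_weights(n: int) -> List[float]:
    # Step S121: every disk sample starts with weight omega_{1,i} = 1/n
    if n <= 0:
        raise ValueError("need at least one disk sample")
    return [1.0 / n] * n
```

With 10 disk samples this gives each sample a weight of 0.1, which is the starting point of the worked example in this embodiment.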
Step S122, presetting at least one classifier for the data information of each dimension of the disk sample;
In this step, all dimensions contained in the disk sample data information are determined to form a dimension set $\{d_1, d_2, \dots, d_e, \dots, d_k\}$, and preset classifiers are defined for the corresponding dimensions to form a preset classifier set $\{h_1(x), h_2(x), \dots, h_k(x), \dots, h_m(x)\}$, where $h_k(x)$ is the preset classifier corresponding to the data information of dimension k, and $m \geq k$ because one dimension can correspond to several preset classifiers.
For example, if the data information of the disk samples includes the read error rate of the underlying disk data and the ambient temperature, the disk sample data information contains two dimensions: the read error rate dimension $d_1$ and the ambient temperature dimension $d_2$. A classifier $h_1(x)$ is then preset for the read error rate dimension and a classifier $h_2(x)$ for the ambient temperature dimension.
In this embodiment, each preset classifier $h_k(x)$ is a single-level decision tree (decision stump). In the preferred implementation of this embodiment, the preset classifier $h_k(x)$ selected for training on a given dimension k is as follows:
where $f(x_i)$ is the feature value of the data information of disk sample i in dimension k, and $q_k$ is the disk state threshold set for dimension k. To set $q_k$, the feature values of all disk samples in dimension k are sorted, and $q_k$ is chosen so that the classification error rate of $h_k(x)$ over all disk samples in dimension k is minimized; this minimum is typically slightly below 50%. The classification error rate is the sum of the current weights of all disk samples mispredicted by $h_k(x)$; mispredicted samples include normal disks predicted as failed and failed disks predicted as normal. For dimension k, the disk state threshold is compared with the feature value of each disk sample in that dimension to determine each sample's predicted state, which is then checked against the sample's real state obtained by monitoring to decide whether the preset classifier mispredicted that sample.
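The threshold search described above can be sketched as an exhaustive search over candidate thresholds between the sorted feature values. The midpoint candidates and the `polarity` parameter (which lets the stump predict failure on either side of $q_k$) are assumptions, since the text fixes neither. With weights that sum to 1, flipping the polarity turns an error rate e into 1 - e, so the best stump never exceeds 0.5, consistent with the remark that the minimum is typically slightly below 50%.

```python
from typing import List, Tuple


def weighted_error(preds: List[int], labels: List[int], weights: List[float]) -> float:
    # Sum of the current weights of the mispredicted disk samples
    return sum(w for p, y, w in zip(preds, labels, weights) if p != y)


def fit_stump(values: List[float], labels: List[int],
              weights: List[float]) -> Tuple[float, float, int]:
    """Return (error, threshold q_k, polarity) minimizing weighted error in one dimension."""
    distinct = sorted(set(values))
    # Candidate thresholds: one below the minimum, then midpoints of neighbouring values
    candidates = [distinct[0] - 1.0] + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for q in candidates:
        for polarity in (1, -1):
            # polarity = +1 predicts +1 above the threshold, -1 predicts +1 below it
            preds = [polarity if v > q else -polarity for v in values]
            err = weighted_error(preds, labels, weights)
            if best is None or err < best[0]:
                best = (err, q, polarity)
    return best
```

For instance, values `[1, 2, 3, 4]` with labels `[-1, -1, 1, 1]` and equal weights give threshold 2.5, polarity +1, and zero error.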
Step S123, training an optimal classifier (best classifier) according to the data information of the disk samples, where the best classifier is, for the current weight distribution of the disk samples, the preset classifier with the minimum current classification error rate among the preset classifiers of all dimensions; at least one training round is performed;
Optionally, in an implementation of this embodiment, the current classification error rate of each preset classifier is determined from the current weight distribution of the disk samples; it may be the sum of the current weights of all disk samples mispredicted by that preset classifier. The weights of the disk samples form a vector $(\omega_{t,1}, \omega_{t,2}, \dots, \omega_{t,i}, \dots, \omega_{t,n})$, where $\omega_{t,i}$ is the weight of the i-th disk sample before the t-th training round.
Optionally, in an implementation of this embodiment, the preset classifier with the smallest current classification error rate is taken as the best classifier, which serves as the weak classifier of the Adaboost algorithm. The classification error rate of the best classifier is therefore the minimum classification error rate, and the dimension of the best classifier is the dimension of the classifier with that minimum error rate. The best classifiers from all pre-training rounds form the set $\{c_{f_1,1}(x), c_{f_2,2}(x), \dots, c_{f_t,t}(x), \dots, c_{f_p,p}(x)\}$, where $c_{f_t,t}(x)$ is the best classifier of the t-th pre-training round and is trained on the disk sample data information in dimension $f_t$; $f_t$ is the dimension of the classifier with the minimum classification error rate among the preset classifiers of all dimensions in the t-th round. The classification error rate of $c_{f_t,t}(x)$ is denoted $\varepsilon_t$.
For example, take 10 disk samples, each containing two dimensions of information, the read error rate of the underlying disk data (from the SMART information) and the ambient temperature, corresponding to dimensions $d_1$ and $d_2$ with classifiers $h_1(x)$ and $h_2(x)$ respectively. The initial weight of every disk sample is set to 0.1.
Suppose that, when classifying along the read error rate dimension, the read error rate of each disk sample is compared with the threshold of classifier $h_1(x)$ and the prediction is checked against the real state of the sample obtained by monitoring, and only the first 3 disk samples are mispredicted by $h_1(x)$. Suppose likewise that, when classifying along the ambient temperature dimension, the ambient temperature of each disk sample is compared with the threshold of classifier $h_2(x)$ and checked against the monitored real state, and only the first 4 disk samples are mispredicted by $h_2(x)$.
With the initial weight distribution before the first training round, $\omega_{1,1} = \omega_{1,2} = \dots = \omega_{1,10} = 0.1$, the classification error rate of $h_1(x)$ is $\omega_{1,1} + \omega_{1,2} + \omega_{1,3} = 0.3$ and that of $h_2(x)$ is $\omega_{1,1} + \omega_{1,2} + \omega_{1,3} + \omega_{1,4} = 0.4$. Of the classifiers $h_1(x)$ and $h_2(x)$, $h_1(x)$ has the smaller classification error rate, so $h_1(x)$ is taken as the best classifier of the first pre-training round, $c_{f_1,1}(x)$. Its dimension $f_1$ is the dimension of $h_1(x)$, namely the read error rate dimension of the underlying disk data, and its classification error rate is $\varepsilon_1 = 0.3$.
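The selection in this example can be reproduced numerically. The dictionary of mispredicted sample indices simply encodes the example's assumptions (first 3 samples wrong for $h_1(x)$, first 4 for $h_2(x)$); the names are illustrative.

```python
# Initial weights of the 10 disk samples (Step S121)
weights = [0.1] * 10

# Indices of samples each preset classifier mispredicts, as assumed in the example
mispredicted = {
    "h1": [0, 1, 2],     # read error rate dimension d1: first 3 samples wrong
    "h2": [0, 1, 2, 3],  # ambient temperature dimension d2: first 4 samples wrong
}

# Classification error rate = sum of the current weights of mispredicted samples
errors = {name: sum(weights[i] for i in idx) for name, idx in mispredicted.items()}

best = min(errors, key=errors.get)  # best classifier c_{f1,1}(x)
eps_1 = errors[best]
print(best, round(eps_1, 2))  # h1 0.3
```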
Step S124, determining the weight of the best classifier in the current training in the strong classifier;
Optionally, in an implementation of this embodiment, the weight $\alpha_t$ that the currently trained best classifier occupies in the strong classifier is computed from its classification error rate $\varepsilon_t$. $\alpha_t$ decreases as $\varepsilon_t$ increases; that is, a classifier with a smaller classification error rate has a greater effect in the final strong classifier. In the above example, the best classifier of the first training round, $c_{f_1,1}(x)$, has classification error rate $\varepsilon_1 = 0.3$, from which its weight $\alpha_1$ in the strong classifier follows.
It should be noted that if the classification error rate of a classifier is 0, computing the weight of that classifier in the final classifier would divide by 0. This boundary case is handled by setting the error rate to a very small value chosen according to the data information of the disks under test.
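Assuming the standard discrete AdaBoost form $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$ (it decreases as $\varepsilon_t$ grows and has $\varepsilon_t$ in a denominator, matching the boundary case above), the weight can be computed as below. `classifier_weight` and `eps_floor` are hypothetical names, and the floor value is an assumed placeholder for the "very small value" mentioned in the text.

```python
import math


def classifier_weight(eps: float, eps_floor: float = 1e-10) -> float:
    """alpha_t = 0.5 * ln((1 - eps) / eps), the standard AdaBoost form (assumed here)."""
    # Boundary handling: clamp eps away from 0 (and 1) so the ratio stays finite
    eps = min(max(eps, eps_floor), 1 - eps_floor)
    return 0.5 * math.log((1 - eps) / eps)
```

Under this assumption, the example's $\varepsilon_1 = 0.3$ gives $\alpha_1 = \frac{1}{2}\ln(7/3) \approx 0.424$.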
Step S125, updating the weight of each disk sample in the training set according to the current minimum error rate:
increasing the weights of the disk samples that the currently trained best classifier predicts wrongly, and decreasing the weights of the disk samples it predicts correctly;
the weights of all disk samples may then be normalized.
Step S126, judging whether the currently trained best classifier satisfies a preset condition; if so, proceeding to the next step, otherwise returning to step S123;
Optionally, in one implementation of this embodiment, the preset condition is:
the classification error rate of the currently trained best classifier is below a preset classification error rate threshold; and/or
the number of training rounds reaches a preset threshold.
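The preset stopping condition, with its "and/or" reading, can be sketched as a predicate. The parameter names and the `require_both` switch are hypothetical; they only model the two readings of "and/or".

```python
def should_stop(error, rounds, max_error, max_rounds, require_both=False):
    """Preset stopping condition for the boosting loop.

    By default either condition suffices ("or"); require_both=True models the
    "and" reading. All parameter names are hypothetical.
    """
    error_ok = error < max_error          # error rate below the preset threshold
    rounds_done = rounds >= max_rounds    # training rounds reached the preset limit
    return (error_ok and rounds_done) if require_both else (error_ok or rounds_done)


print(should_stop(0.30, 1, max_error=0.05, max_rounds=10))   # False: keep training
print(should_stop(0.02, 1, max_error=0.05, max_rounds=10))   # True: error below threshold
print(should_stop(0.30, 10, max_error=0.05, max_rounds=10))  # True: round limit reached
```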
Step S127, determining the strong classifier as a weighted sum of all the best classifiers.
Specifically, the strong classifier is $H(x)=\operatorname{sign}\left(\sum_{t=1}^{T}\alpha_t c_{f_t,t}(x)\right)$, where $T$ is the total number of training rounds.
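Evaluating the strong classifier amounts to an α-weighted vote followed by a sign. In the sketch below the two threshold stumps, their thresholds and the label encoding (+1 healthy, −1 predicted failure) are hypothetical; the weights are α1 = 0.4236 and α2 ≈ 0.458, the latter being ½ ln((1 − 0.2856)/0.2856). Mapping a zero score to +1 is an arbitrary tie-breaking choice.

```python
def strong_classify(x, weak_learners):
    """H(x) = sign(sum_t alpha_t * c_t(x)); a zero score is mapped to +1 here."""
    score = sum(alpha * classifier(x) for alpha, classifier in weak_learners)
    return 1 if score >= 0 else -1


# Hypothetical threshold stumps; +1 = healthy, -1 = predicted failure (assumed encoding).
def c1(x):
    return -1 if x["read_error_rate"] > 50 else 1  # dimension f1: read error rate


def c2(x):
    return -1 if x["temperature"] > 45 else 1  # dimension f2: ambient temperature


learners = [(0.4236, c1), (0.458, c2)]  # (alpha_t, best classifier of round t)
print(strong_classify({"read_error_rate": 80, "temperature": 30}, learners))  # 1
print(strong_classify({"read_error_rate": 10, "temperature": 50}, learners))  # -1
```

When the two stumps disagree, the one with the larger weight (here the ambient temperature stump) decides the outcome.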
As can be seen from the above technical solution, before each round of training the best classifier for that round is selected, from the classifiers over multiple dimensions of the disk sample data information, according to the current weight distribution of the disk samples; after each round, the weight distribution of the disk samples is changed, so that classifiers with smaller classification error rates carry more weight in the final strong classifier. This substantially improves prediction accuracy when the resulting strong classifier is used for disk failure prediction. In addition, predicting from multi-dimensional data information of the disk samples further improves the accuracy of disk state prediction.
Optionally, in one implementation of this embodiment, step S125 may be implemented as follows:
the weight of each disk sample is changed according to
$$\omega_{t+1,i}=\frac{\omega_{t,i}}{Z_t}\exp\left(-\alpha_t y_i c_{f_t,t}(x_i)\right),\quad i=1,2,\dots,N,$$
where $\omega_{t+1,i}$ is the weight of the $i$-th disk sample before the $(t+1)$-th training, $c_{f_t,t}(x)$ is the best classifier of the $t$-th training, $x_i$ is the data information of the $i$-th disk sample in the training dimension $f_t$, $y_i$ is the state of the $i$-th disk sample (taking values $\pm 1$), $\alpha_t$ is the weight of the best classifier $c_{f_t,t}(x)$ in the strong classifier, $Z_t=\sum_{i=1}^{N}\omega_{t,i}\exp\left(-\alpha_t y_i c_{f_t,t}(x_i)\right)$, and $N$ is the total number of disk samples.
Here $Z_t$ is a normalization factor that makes the weights of the disk samples sum to 1.0, so that the vector $D_{t+1}=(\omega_{t+1,1},\omega_{t+1,2},\dots,\omega_{t+1,i},\dots,\omega_{t+1,N})$ is a probability distribution.
For example, in the above example the best classifier $c_{f_1,1}(x)$ of the first training has classification error rate $\varepsilon_1=0.3$ and weight $\alpha_1=0.4236$ in the strong classifier. After the first training the formula gives $Z_1=0.9165$. The disk samples misclassified in the first training, i.e. the first three, then receive the weights $\omega_{2,1}=\omega_{2,2}=\omega_{2,3}=0.1667$, and the correctly classified disk samples, i.e. the last seven, receive the weights $\omega_{2,4}=\omega_{2,5}=\dots=\omega_{2,10}=0.0714$.
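The first update can be reproduced numerically. In this sketch the individual disk states `labels` are hypothetical; only the product $y_i c(x_i)$ matters, and it is −1 for the three misclassified samples and +1 for the other seven, exactly as in the example.

```python
import math


def update_weights(weights, labels, predictions, alpha):
    """w_{t+1,i} = w_{t,i} * exp(-alpha * y_i * c(x_i)) / Z_t; returns (new_weights, Z_t)."""
    raw = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, labels, predictions)]
    z = sum(raw)  # normalisation factor Z_t
    return [r / z for r in raw], z


alpha1 = 0.5 * math.log(0.7 / 0.3)            # about 0.4236 for error rate 0.3
labels = [1, -1, 1, 1, -1, 1, 1, 1, -1, 1]    # hypothetical disk states (+1 / -1)
# The best round-1 classifier gets samples 1-3 wrong and the other seven right.
preds = [-y if i < 3 else y for i, y in enumerate(labels)]

new_w, z1 = update_weights([0.1] * 10, labels, preds, alpha1)
print(round(z1, 4), round(new_w[0], 4), round(new_w[3], 4))  # 0.9165 0.1667 0.0714
```

The rounded values 0.1667 and 0.0714 are in fact exactly 1/6 and 1/14: the misclassified and the correctly classified groups each end up with half of the total weight.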
Thus, after the first training, the weights of the misclassified disk samples increase and those of the correctly classified disk samples decrease, yielding a new weight distribution; inspecting the updated weights shows which disk samples were mispredicted. The next round of training is based on this new distribution, so that the next classifier concentrates on the disk samples that were misclassified.
Before the second training the weight distribution of the disk samples is $\omega_{2,1}=\omega_{2,2}=\omega_{2,3}=0.1667$ and $\omega_{2,4}=\omega_{2,5}=\dots=\omega_{2,10}=0.0714$. Classifier $h_1(x)$ misclassifies the first three disk samples and classifier $h_2(x)$ misclassifies samples 7 to 10, so under the weight distribution before the second training the classification error rate of $h_1(x)$ is $\omega_{2,1}+\omega_{2,2}+\omega_{2,3}=0.5001$ and that of $h_2(x)$ is $\omega_{2,7}+\omega_{2,8}+\omega_{2,9}+\omega_{2,10}=0.2856$. Of the classifiers $h_1(x)$ and $h_2(x)$, $h_2(x)$ now minimizes the classification error rate, so $h_2(x)$ becomes the best classifier $c_{f_2,2}(x)$ of the second training. Its dimension $f_2$ is the dimension of $h_2(x)$, namely ambient temperature, and its classification error rate $\varepsilon_2$ is that of $h_2(x)$, i.e. 0.2856. The weight of the second best classifier $c_{f_2,2}(x)$ in the strong classifier is therefore $\alpha_2=\frac{1}{2}\ln\frac{1-0.2856}{0.2856}\approx 0.458$.
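The second round can be checked the same way. The sketch uses the exact weights 1/6 and 1/14 behind the rounded 0.1667 and 0.0714, so it yields 0.5 and 0.2857; the 0.5001 and 0.2856 in the text come from summing the rounded weights. As before, the misclassified sets are taken as samples 1-3 for $h_1$ and samples 7-10 for $h_2$.

```python
import math

# Weights before the second training: exact values behind 0.1667 and 0.0714.
w2 = [1 / 6] * 3 + [1 / 14] * 7

err_h1 = sum(w2[i] for i in (0, 1, 2))     # h1 misclassifies samples 1-3
err_h2 = sum(w2[i] for i in (6, 7, 8, 9))  # h2 misclassifies samples 7-10
best = "h2" if err_h2 < err_h1 else "h1"
alpha2 = 0.5 * math.log((1 - err_h2) / err_h2)

print(round(err_h1, 4), round(err_h2, 4), best, round(alpha2, 3))  # 0.5 0.2857 h2 0.458
```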
[ METHOD EXAMPLE 3 ]
The method of this embodiment includes all the content of method embodiment 1, which is not repeated here. As shown in Fig. 3, in this embodiment step S12 is implemented as follows:
Step S12a, initializing the weight distribution of each disk sample in the training set;
Step S12b, training, according to the current weight distribution of the disk samples, at least one first classifier on the data information of each dimension of the disk samples, where training is performed at least once;
Optionally, in one implementation of this embodiment, first classifiers are trained on the corresponding data information in each dimension of the disk samples; each dimension may have multiple first classifiers, and the first classifiers of all dimensions are trained simultaneously.
Step S12c, determining the current classification error rate of each first classifier, and selecting the first classifier with the minimum current classification error rate across all dimensions as the second classifier;
Step S12d, determining the weight of the second classifier;
Step S12e, updating the weight of each disk sample in the training set according to the current minimum error rate;
Step S12f, judging whether the currently selected second classifier satisfies the preset condition; if so, proceeding to the next step, otherwise returning to step S12b;
Optionally, in one implementation of this embodiment, the preset condition is:
the classification error rate of the currently selected second classifier is below a preset classification error rate threshold; and/or
the number of training rounds reaches a preset threshold.
Step S12g, determining the strong classifier as a weighted sum of all the second classifiers.
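Steps S12a to S12g can be sketched end to end. This is a sketch under stated assumptions rather than the patented implementation: each "first classifier" is assumed to be a one-dimensional threshold stump fitted by exhaustive search, only one stump per dimension is kept, the stopping rule uses the "or" reading of the preset condition, and the toy data is hypothetical.

```python
import math


def stump_predict(value, threshold, polarity):
    """Decision stump: predict `polarity` above the threshold, `-polarity` otherwise."""
    return polarity if value > threshold else -polarity


def train_stump(values, labels, weights):
    """Fit one first classifier on a single dimension; returns (error, threshold, polarity)."""
    best = (float("inf"), None, 1)
    for threshold in sorted(set(values)):
        for polarity in (1, -1):
            err = sum(w for v, y, w in zip(values, labels, weights)
                      if stump_predict(v, threshold, polarity) != y)
            if err < best[0]:
                best = (err, threshold, polarity)
    return best


def train_strong_classifier(X, y, max_rounds=10, max_error=0.01, eps=1e-10):
    """AdaBoost over per-dimension stumps; returns [(alpha, dim, threshold, polarity), ...]."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n                        # S12a: uniform initial weights
    model = []
    for _ in range(max_rounds):              # round-count threshold
        # S12b/S12c: one first classifier per dimension; keep the least-error one.
        candidates = []
        for j in range(d):
            err, thr, pol = train_stump([row[j] for row in X], y, w)
            candidates.append((err, j, thr, pol))
        err, dim, thr, pol = min(candidates, key=lambda c: c[0])
        # S12d: weight of the second classifier, error clamped away from 0 and 1.
        e = min(max(err, eps), 1.0 - eps)
        alpha = 0.5 * math.log((1.0 - e) / e)
        model.append((alpha, dim, thr, pol))
        # S12e: reweight the samples and normalise by Z_t.
        raw = [wi * math.exp(-alpha * yi * stump_predict(row[dim], thr, pol))
               for wi, yi, row in zip(w, y, X)]
        z = sum(raw)
        w = [r / z for r in raw]
        # S12f: stop early once the error-rate threshold is met.
        if err < max_error:
            break
    return model


def predict(model, x):
    """S12g: sign of the alpha-weighted vote of all second classifiers."""
    score = sum(a * stump_predict(x[j], thr, pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1


# Hypothetical toy data: [read_error_rate, ambient_temperature]; +1 healthy, -1 failing.
X = [[5, 30], [8, 32], [60, 31], [70, 50], [3, 48], [65, 33]]
y = [1, 1, -1, -1, 1, -1]
model = train_strong_classifier(X, y)
print([predict(model, x) for x in X] == y)  # True
```

On this toy data the read-error-rate dimension alone separates the samples, so the loop stops after one round. On real data several rounds, possibly across different dimensions, would normally be needed.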
Note that the details omitted from the above steps follow the same principle as method embodiment 2 and are not repeated here.
As can be seen from the above technical solution, in this embodiment at least one first classifier is trained on the data information of each dimension under the current weight distribution of the disk samples, the first classifier with the minimum current classification error rate across all dimensions is selected as the second classifier, and all the second classifiers are combined into the final strong classifier, which is used for disk failure prediction. Training first classifiers separately in each dimension and selecting the second classifiers of the strong classifier from them keeps the algorithm simple and the prediction efficient.
It should be noted that for simplicity of description, all the above method embodiments are described as a series of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Product example 1
Fig. 4 is a schematic structural diagram of a disk failure prediction apparatus according to embodiment 1 of the present invention. Referring to Fig. 4, in this embodiment the disk failure prediction apparatus includes a disk sample data information obtaining module 41, a strong classifier generating module 42 and a disk failure prediction module 43. The disk sample data information obtaining module 41 is configured to obtain data information of disk samples in multiple dimensions, so as to construct a training set readable by the weak learning algorithm of the Adaboost algorithm; the strong classifier generating module 42 is configured to train on the training set with the Adaboost algorithm to generate a strong classifier; and the disk failure prediction module 43 is configured to predict failures of the disks to be tested according to the strong classifier.
It should be noted that the disk sample data information obtaining module 41, the strong classifier generating module 42 and the disk failure prediction module 43 implement the same examples and application scenarios as steps S11 to S13 of method embodiment 1, but are not limited to what is disclosed in that embodiment. The above modules, as part of the apparatus, may run in the computer terminal provided in method embodiment 1.
As can be seen from the above technical solution, the strong classifier generating module 42 of the disk failure prediction apparatus applies the Adaboost machine learning algorithm to the operations and maintenance field: based on the data information of the disk samples, it iteratively trains weak classifiers to generate a strong classifier, which the disk failure prediction module 43 uses to predict disk failures. In addition, the disk sample data information obtaining module 41 can obtain the data information of each disk to be tested at a fixed time interval (for example, every hour); if the interval is shortened, the apparatus can predict and detect disk failures in near real time, making timely discovery and handling easier.
In a preferred scheme based on the above embodiment, the data information of a disk sample includes only its SMART information, or its SMART information together with other disk state information, where the other disk state information includes one or more of the IO state information, ambient temperature information and humidity information of the disk sample.
Because SMART thresholds, service scenarios and/or actual operating environments differ across disk brands, the disk failure prediction apparatus of this embodiment takes environmental factors into account in addition to SMART information alone, which further improves the accuracy of disk state prediction.
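One way to turn such multi-dimensional records into rows a weak learner can read is sketched below. The record schema, the SMART attribute keys, the 0.0 fill value for missing readings and the ±1 label encoding are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DiskRecord:
    """One disk sample; all field names are hypothetical."""
    serial: str
    smart: Dict[str, float]                 # SMART attributes
    io_latency_ms: Optional[float] = None   # IO state information
    ambient_temp_c: Optional[float] = None  # ambient temperature
    humidity_pct: Optional[float] = None    # humidity
    failed: bool = False


def to_training_row(rec: DiskRecord, smart_keys: List[str],
                    use_env: bool = True) -> Tuple[List[float], int]:
    """Flatten one record into (feature_vector, label) for the weak learner.

    Missing values become 0.0, which is only a placeholder policy.
    """
    features = [float(rec.smart.get(k, 0.0)) for k in smart_keys]
    if use_env:  # SMART plus other disk state information
        for v in (rec.io_latency_ms, rec.ambient_temp_c, rec.humidity_pct):
            features.append(0.0 if v is None else float(v))
    label = -1 if rec.failed else 1  # assumed encoding: -1 = failed disk
    return features, label


rec = DiskRecord("SN-001", {"read_error_rate": 12, "reallocated_sectors": 3},
                 io_latency_ms=4.5, ambient_temp_c=38, humidity_pct=45)
keys = ["read_error_rate", "reallocated_sectors"]
print(to_training_row(rec, keys))                 # ([12.0, 3.0, 4.5, 38.0, 45.0], 1)
print(to_training_row(rec, keys, use_env=False))  # ([12.0, 3.0], 1)
```

The `use_env` flag mirrors the two variants in the text: SMART information only, or SMART information together with other disk state information.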
In a preferred scheme based on the above embodiment, the strong classifier generating module 42 specifically trains, with the Adaboost algorithm, on disk sample data information grouped by brand manufacturer and/or service scene to generate the strong classifier.
Because the disk failure prediction apparatus of this embodiment groups the data information of the sample disks, training on the disk samples of one brand manufacturer yields failure predictions for that manufacturer's disks, which helps in assessing the quality of disk brands; likewise, training on the disk samples of one service scene yields failure predictions for disks in that scene, which helps in analyzing the service scene.
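Grouping the training data by brand manufacturer and/or service scene before training can be sketched as follows. The sample dictionaries and the `vendor`/`scene` keys are hypothetical, and `train` stands for any trainer (for example an Adaboost routine). Here `len` is passed in its place only so that the grouping itself is easy to check.

```python
from collections import defaultdict


def train_per_group(samples, group_key, train):
    """Partition samples by group_key and fit one model per partition.

    samples: dicts with 'features' and 'label' plus metadata such as vendor/scene.
    train:   any function taking a list of (features, label) rows.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[group_key(s)].append((s["features"], s["label"]))
    return {g: train(rows) for g, rows in groups.items()}


samples = [
    {"vendor": "A", "scene": "db", "features": [5, 30], "label": 1},
    {"vendor": "A", "scene": "cache", "features": [60, 31], "label": -1},
    {"vendor": "B", "scene": "db", "features": [7, 40], "label": 1},
]
# len() stands in for a real trainer so the grouping is easy to inspect.
print(train_per_group(samples, lambda s: s["vendor"], len))  # {'A': 2, 'B': 1}
print(train_per_group(samples, lambda s: (s["vendor"], s["scene"]), len))
# {('A', 'db'): 1, ('A', 'cache'): 1, ('B', 'db'): 1}
```

At prediction time, a disk would be scored by the model stored under its own group key.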
Product example 2
Fig. 5 is a schematic structural diagram of a disk failure prediction apparatus according to embodiment 2 of the present invention. Referring to fig. 5, in this embodiment, the apparatus provided in this embodiment includes all the contents of the product embodiment 1, and is not described herein again. As shown in fig. 5, the strong classifier generating module 42 specifically includes a first disk weight setting sub-module 421, a first classifier training sub-module 422, a second classifier determining sub-module 423, a first disk weight updating sub-module 424, and a first strong classifier determining sub-module 425. The first disk weight setting submodule 421 is configured to initialize the weight distribution of each disk sample in the training set; the first classifier training sub-module 422 is configured to train at least one first classifier for the data information of each dimension of each disk sample, respectively, according to the current weight distribution of the disk sample, where the number of times of training is at least one; the second classifier determining sub-module 423 is used for determining the current classification error rate of the first classifier after each training of the first classifier, and selecting the first classifier with the minimum current classification error rate in all dimensions as the second classifier; the first disk weight updating submodule 424 is configured to update the weight of each disk sample in the training set according to the current minimum error rate; the first strong classifier determining sub-module 425 is configured to determine a strong classifier by performing a weighted summation of all the second classifiers when the currently selected second classifier satisfies a preset condition.
It should be noted here that, the first disk weight setting sub-module 421, the first classifier training sub-module 422, the second classifier determining sub-module 423, the first disk weight updating sub-module 424, and the first strong classifier determining sub-module 425 are the same as the example and the application scenario realized in step S12a to step S12g of the third embodiment of the method, but are not limited to the disclosure of the third embodiment. It should be noted that the modules described above as a part of the apparatus may be operated in the computer terminal provided in the third embodiment.
As can be seen from the above technical solution, in this embodiment the first classifier training sub-module 422 trains at least one first classifier on the data information of each dimension under the current weight distribution of the disk samples, the second classifier determining sub-module 423 selects the first classifier with the minimum current classification error rate across all dimensions as the second classifier, and the first disk weight updating sub-module 424 then changes the weight distribution of the disk samples. Classifiers with smaller classification error rates therefore carry more weight in the final strong classifier, which substantially improves prediction accuracy when the strong classifier determined by the first strong classifier determining sub-module 425 is used for disk failure prediction. In addition, because the first classifier training sub-module 422 trains on multi-dimensional data information of the disk samples, the first strong classifier determining sub-module 425 obtains a strong classifier based on multiple dimensions, and predicting the disk state with this multi-dimensional strong classifier raises prediction accuracy further. Moreover, because the first classifier training sub-module 422 trains first classifiers separately for each dimension and the second classifier determining sub-module 423 selects the second classifiers of the final strong classifier from them, the algorithm is simpler and prediction is efficient.
Optionally, in an implementation manner of this embodiment, the apparatus further includes a second classifier weight determining sub-module 426, and the second classifier weight determining sub-module 426 is configured to determine the weight of the currently selected second classifier according to the minimum current classification error rate in all dimensions.
Optionally, in an implementation manner of this embodiment, the second classifier determining sub-module 423 is specifically configured to determine the current classification error rate of the first classifier as a sum of current weights of all disk samples that are predicted to be erroneous by the corresponding first classifier.
Optionally, in one implementation of this embodiment, the first disk weight updating sub-module 424 is specifically configured to update the weight of each disk sample according to the formula $\omega_{t+1,i}=\frac{\omega_{t,i}}{Z_t}\exp\left(-\alpha_t y_i c_{f,t}(x_i)\right)$, where $\omega_{t+1,i}$ is the weight of the $i$-th disk sample before the $(t+1)$-th training, $c_{f,t}(x)$ is the second classifier selected in the $t$-th round, $y_i$ is the state of the $i$-th disk sample, $\alpha_t=\frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$, $\varepsilon_t$ is the classification error rate of the second classifier $c_{f,t}(x)$, $Z_t=\sum_{i=1}^{N}\omega_{t,i}\exp\left(-\alpha_t y_i c_{f,t}(x_i)\right)$, and $N$ is the total number of disk samples.
Product example 3
Fig. 6 is a schematic structural diagram of a disk failure prediction apparatus according to embodiment 3 of the present invention. Referring to Fig. 6, the apparatus of this embodiment includes all the content of product embodiment 1, which is not repeated here. As shown in Fig. 6, the strong classifier generating module 42 specifically includes a second disk weight setting sub-module 42a, an optimal classifier training sub-module 42b, a second disk weight updating sub-module 42c and a second strong classifier determining sub-module 42d. The second disk weight setting sub-module 42a is configured to initialize the weight distribution of each disk sample in the training set; the optimal classifier training sub-module 42b is configured to train a best classifier according to the data information of the disk samples, where the best classifier is the preset classifier, among preset multi-dimensional classifiers, with the minimum current classification error rate under the current weight distribution of the disk samples, and training is performed at least once; the second disk weight updating sub-module 42c updates the weight of each disk sample in the training set according to the current minimum error rate after each training of the best classifier; and the second strong classifier determining sub-module 42d is configured to determine the strong classifier as a weighted sum of all the best classifiers when the currently trained best classifier satisfies the preset condition.
It should be noted here that the second disk weight setting sub-module 42a, the optimal classifier training sub-module 42b, the second disk weight updating sub-module 42c, and the second strong classifier determining sub-module 42d are the same as the example and application scenario realized in steps S121 to S127 of the second embodiment of the method, but are not limited to the disclosure of the second embodiment. It should be noted that the modules described above as a part of the apparatus may be run in the computer terminal provided in the first embodiment.
As can be seen from the above technical solution, in the disk failure prediction apparatus of this embodiment, before the optimal classifier training sub-module 42b trains the best classifier, the best classifier for the current round is selected, from the classifiers over multiple dimensions of the disk sample data information, according to the current weight distribution of the disk samples; after each round of training, the second disk weight updating sub-module 42c changes the weight distribution of the disk samples, so that classifiers with smaller classification error rates carry more weight in the final strong classifier. This substantially improves prediction accuracy when the strong classifier formed by the second strong classifier determining sub-module 42d is used for disk failure prediction. In addition, because the optimal classifier training sub-module 42b trains the best classifier on multi-dimensional data information of the disk samples, the second strong classifier determining sub-module 42d obtains a strong classifier based on multiple dimensions, and predicting the disk state with this multi-dimensional strong classifier raises prediction accuracy further.
Optionally, in an implementation manner of the present embodiment, the apparatus further includes an optimal classifier weight determining sub-module 42e, and the optimal classifier weight determining sub-module 42e is configured to determine the weight of the optimal classifier according to the minimum current classification error rate.
Optionally, in an implementation manner of this embodiment, the current classification error rate of the preset classifier is determined as the sum of the current weights of all the disk samples with prediction errors corresponding to the preset classifier.
Optionally, in one implementation of this embodiment, the second disk weight updating sub-module 42c is specifically configured to update the weight of each disk sample according to the formula $\omega_{t+1,i}=\frac{\omega_{t,i}}{Z_t}\exp\left(-\alpha_t y_i c_{f_t,t}(x_i)\right)$, where $\omega_{t+1,i}$ is the weight of the $i$-th disk sample before the $(t+1)$-th training, $c_{f_t,t}(x)$ is the best classifier of the $t$-th training, $y_i$ is the state of the $i$-th disk sample, $\alpha_t=\frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$, $\varepsilon_t$ is the classification error rate of the best classifier $c_{f_t,t}(x)$, $Z_t=\sum_{i=1}^{N}\omega_{t,i}\exp\left(-\alpha_t y_i c_{f_t,t}(x_i)\right)$, and $N$ is the total number of disk samples.
The embodiment of the application also provides a computer device, which can be any computer terminal device in a computer device group, and can also be replaced by terminal devices such as a mobile terminal and the like. In this embodiment, the computer device may be at least one network device of a plurality of network devices located in a computer network. In this embodiment, the computer device implements the disk failure prediction method according to this embodiment.
The computer device of this embodiment may include one or more processors and memory. The memory may be used to store computer programs and modules, such as program instructions/modules corresponding to the disk failure prediction methods and apparatuses in the above embodiments, and may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, a flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The processor executes various functional applications and data processing by running software programs and modules stored in the memory, that is, the disk failure prediction method is realized. Specifically, the processor is configured to execute a computer program stored in the memory to implement any one of the disk failure prediction methods mentioned above, or to implement processing performed by any one of the disk failure prediction apparatuses mentioned above. It is understood that the processor may call the information stored in the memory and the application program through the transmission device to execute the disk failure prediction method according to the embodiment of the present method.
By implementing the disk failure prediction method, the computer device reduces the differences caused by the differing SMART thresholds of disks from various manufacturers while still predicting disk failures successfully, with high prediction accuracy and efficiency. In addition, predicting from multi-dimensional data information of the disk samples further improves the accuracy of disk state prediction.
It will be understood by those skilled in the art that the computer device may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, and a Mobile Internet Device (MID), a PAD, etc.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The embodiment of the application further provides a storage medium. In this embodiment, the computer storage medium has a computer program stored therein, where the computer program, when executed by a processor, implements the disk failure prediction method mentioned above, or implements the processing performed by any of the disk failure prediction apparatuses mentioned above. Optionally, in this embodiment, the storage medium may be located in any one of computer devices in a computer device group in a computer network, or in any one of mobile terminals in a mobile terminal group.
By implementing the disk failure prediction method, the computer storage medium reduces the differences caused by the differing SMART thresholds of disks from various manufacturers while still predicting disk failures successfully, with high prediction accuracy and efficiency. In addition, predicting from multi-dimensional data information of the disk samples further improves the accuracy of disk state prediction.
The embodiment of the application also provides a storage control system, which comprises a plurality of storage nodes and a monitoring server, wherein each storage node is provided with a disk sample, and the monitoring server is used for acquiring data information of the disk samples in multiple dimensions to construct a training set which can be read by a weak learning algorithm in an Adaboost algorithm; training the training set by an Adaboost algorithm to generate a strong classifier; and performing fault prediction on the disks in the storage nodes according to the strong classifier.
The storage nodes are provided with disks for storing data. In the embodiment of the application, the storage node may obtain current data information of the disk, which represents the current running state of the disk, such as SMART information of a disk sample, or IO state information, environmental temperature, humidity information, and the like of the disk sample, and report the obtained current data information of the disk to the monitoring server. For example, the storage node may obtain current data information of the disk through the disk drive and the interface connected to the disk. For example, a state acquisition module for information acquisition is preset in the storage node, so that current data information of the disk is acquired through the state acquisition module. Of course, besides the current data information of the disk, the storage node may also obtain attribute information of the disk, such as the model, type, and the like of the disk, so that the subsequent monitoring server classifies the relevant data information of the disk with different attributes reported by different storage nodes based on the attributes of the disk. For the current data information of the disk reported by each storage node, the monitoring server may predict the occurrence of the failure of the disk based on the current data information of the disk in sequence for the disk in each storage node.
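The reporting flow between storage nodes and the monitoring server can be sketched as below. The report schema, the per-model classifier table and the toy classifier (standing in for a trained strong classifier) are hypothetical; the sketch keeps the latest report per disk and applies the classifier that matches each disk's model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DiskReport:
    """Current data information reported by a storage node (hypothetical schema)."""
    node_id: str
    disk_id: str
    model: str             # disk attribute used to pick the classifier
    features: List[float]  # e.g. SMART, IO state, temperature, humidity


class MonitoringServer:
    def __init__(self, classifiers: Dict[str, Callable[[List[float]], int]]):
        self.classifiers = classifiers  # one strong classifier per disk model
        self.latest: Dict[Tuple[str, str], DiskReport] = {}

    def receive(self, report: DiskReport) -> None:
        # Keep only the most recent report for each (node, disk) pair.
        self.latest[(report.node_id, report.disk_id)] = report

    def predict_all(self) -> Dict[Tuple[str, str], int]:
        # Apply the classifier matching each disk's model; skip unknown models.
        results = {}
        for key, rep in self.latest.items():
            clf = self.classifiers.get(rep.model)
            if clf is not None:
                results[key] = clf(rep.features)
        return results


server = MonitoringServer({"M1": lambda f: -1 if f[0] > 50 else 1})  # toy stand-in classifier
server.receive(DiskReport("node1", "sda", "M1", [70.0, 35.0]))
server.receive(DiskReport("node2", "sdb", "M1", [5.0, 30.0]))
server.receive(DiskReport("node2", "sdc", "M2", [90.0, 40.0]))  # no classifier for M2 yet
print(server.predict_all())  # {('node1', 'sda'): -1, ('node2', 'sdb'): 1}
```

Whether reports are pushed by the nodes or pulled by the server at an interval, the server-side handling in `receive` stays the same.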
It should be noted that the monitoring server implements the same examples and application scenarios as steps S11 to S13 of method embodiment 1, but is not limited to what is disclosed in that embodiment.
Optionally, in an implementation manner of this embodiment, the storage node is specifically configured to send the obtained current data information of the disk to the monitoring server at a certain time interval.
Optionally, in an implementation manner of this embodiment, the monitoring server requests current data information of the corresponding disk from the storage node at a certain time interval.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. The embodiments in the present specification are described in a progressive manner, and the same and similar parts in the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and device embodiments, since they are substantially similar to the method embodiments, they are described relatively simply, and reference may be made to some descriptions of the method embodiments for relevant points.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art may make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.
Claims (19)
1. A disk failure prediction method, comprising the following steps:
acquiring data information of a disk sample in multiple dimensions to construct a training set which can be read by a weak learning algorithm in an Adaboost algorithm;
training the training set by an Adaboost algorithm to generate a strong classifier;
and performing failure prediction on the disk to be tested according to the strong classifier.
2. The disk failure prediction method of claim 1, wherein training the training set by the Adaboost algorithm to generate a strong classifier comprises:
initializing the weight distribution of each disk sample in the training set;
training, according to the current weight distribution of the disk samples, at least one first classifier for the data information of each dimension of each disk sample, with at least one training round being performed;
after each training round, determining the current classification error rate of each first classifier, selecting the first classifier with the minimum current classification error rate across all dimensions as a second classifier, and updating the weight of each disk sample in the training set according to the current minimum error rate;
and when the currently selected second classifier meets a preset condition, determining the strong classifier as a weighted sum of all the second classifiers.
3. The disk failure prediction method of claim 1, wherein training the training set by the Adaboost algorithm to generate a strong classifier comprises:
initializing the weight distribution of each disk sample in the training set;
training an optimal classifier according to the data information of the disk samples, wherein the optimal classifier is, among preset classifiers for multiple dimensions, the preset classifier with the minimum current classification error rate under the current weight distribution of the disk samples, with at least one training round being performed;
after each training round, updating the weight of each disk sample in the training set according to the current minimum error rate;
and when the currently trained optimal classifier meets a preset condition, determining the strong classifier as a weighted sum of all the optimal classifiers.
4. The disk failure prediction method according to claim 2 or 3, wherein the current classification error rate of a classifier is the sum of the current weights of all disk samples mispredicted by that classifier.
5. The disk failure prediction method according to claim 2 or 3, wherein the weight of each disk sample is updated according to a formula in which $\omega_{t+1,i}$ is the weight of the $i$-th disk sample before the $(t+1)$-th training round; $cf_{f_t,t}(x)$ is the classifier with the smallest classification error rate in the $t$-th round (the best classifier), $f_t$ being the dimension corresponding to the classifier with the minimum classification error rate among the preset classifiers of all dimensions in the $t$-th training round; $y_i$ is the state of the $i$-th disk sample; $\varepsilon_t$ is the classification error rate of the classifier $cf_{f_t,t}(x)$; and $N$ is the total number of disk samples.
6. The disk failure prediction method according to claim 5, wherein the data information of the disk sample includes only SMART information of the disk sample, or includes SMART information of the disk sample and other disk state information, and the other disk state information includes one or more of IO state information, ambient temperature and humidity information of the disk sample.
7. The disk failure prediction method of claim 6, wherein training the data information of the disk samples by the Adaboost algorithm to generate the strong classifier comprises:
training, by the Adaboost algorithm, the disk sample data information grouped by brand manufacturer and/or business scenario to generate a strong classifier.
8. The disk failure prediction method according to claim 5, wherein performing failure prediction on the disk to be tested according to the strong classifier comprises:
after receiving the current data information of the disk to be tested, extracting feature values from that current data information according to all dimensions corresponding to the strong classifier, inputting the extracted feature values into the strong classifier, and predicting whether the disk to be tested will fail according to the output of the strong classifier.
9. A disk failure prediction apparatus comprising:
the disk sample data information acquisition module is used for acquiring data information of a disk sample in multiple dimensions to construct a training set which can be read by a weak learning algorithm in an Adaboost algorithm;
the strong classifier generating module is used for training the training set through an Adaboost algorithm to generate a strong classifier;
and the disk fault prediction module is used for performing failure prediction on the disk to be tested according to the strong classifier.
10. The disk failure prediction apparatus of claim 9, wherein the strong classifier generation module comprises:
the first disk weight setting submodule is used for initializing the weight distribution of each disk sample in the training set;
the first classifier training sub-module is used for training, according to the current weight distribution of the disk samples, at least one first classifier for the data information of each dimension of each disk sample, with at least one training round being performed;
the second classifier determining submodule is used for determining the current classification error rate of each first classifier after each training round, and selecting the first classifier with the minimum current classification error rate across all dimensions as the second classifier;
the first disk weight updating submodule is used for updating the weight of each disk sample in the training set according to the current minimum error rate;
and the first strong classifier determining submodule is used for determining the strong classifier by utilizing a weighted summation mode of all the second classifiers when the currently selected second classifier meets the preset condition.
11. The disk failure prediction apparatus of claim 10, wherein the strong classifier generation module comprises:
the second disk weight setting submodule is used for initializing the weight distribution of each disk sample in the training set;
the optimal classifier training sub-module is used for training an optimal classifier according to the data information of the disk samples, the optimal classifier being, among preset classifiers for multiple dimensions, the preset classifier with the minimum current classification error rate under the current weight distribution of the disk samples, with at least one training round being performed;
the second disk weight updating submodule is used for updating the weight of each disk sample in the training set according to the current minimum error rate after each training round of the optimal classifier;
and the second strong classifier determining submodule is used for determining the strong classifier by utilizing a weighted summation mode of all the optimal classifiers when the currently trained optimal classifier meets the preset condition.
12. The disk failure prediction apparatus according to claim 10 or 11, wherein the current classification error rate of a classifier is the sum of the current weights of all disk samples mispredicted by that classifier.
13. The disk failure prediction apparatus of claim 11, wherein the second disk weight updating submodule is specifically configured to update the weight of each disk sample according to a formula in which $\omega_{t+1,i}$ is the weight of the $i$-th disk sample before the $(t+1)$-th training round; $cf_{f_t,t}(x)$ is the optimal classifier of the $t$-th training round; $y_i$ is the state of the $i$-th disk sample; $\varepsilon_t$ is the classification error rate of the optimal classifier $cf_{f_t,t}(x)$; and $N$ is the total number of disk samples.
14. The disk failure prediction apparatus according to claim 13, wherein the data information of the disk sample includes only SMART information of the disk sample, or includes SMART information of the disk sample and other disk state information, and the other disk state information includes one or more of IO state information, ambient temperature and humidity information of the disk sample.
15. The disk failure prediction apparatus according to claim 14, wherein the strong classifier generating module specifically trains, by the Adaboost algorithm, the disk sample data information grouped by brand manufacturer and/or business scenario to generate the strong classifier.
16. A computer device, comprising:
a processor; and
a memory for storing a computer program,
wherein the processor is configured to execute a computer program stored in the memory to implement the disk failure prediction method according to any one of claims 1 to 8.
17. A computer storage medium, wherein a computer program is stored in the computer storage medium, and when executed by a processor, the computer program implements the disk failure prediction method according to any one of claims 1 to 8.
18. A storage control system, comprising a plurality of storage nodes and a monitoring server, wherein each storage node is provided with disk samples, and the monitoring server is used for acquiring data information of the disk samples in multiple dimensions to construct a training set readable by a weak learning algorithm of the Adaboost algorithm; training the training set by the Adaboost algorithm to generate a strong classifier; and performing failure prediction on the disks in the storage nodes according to the strong classifier.
19. The storage control system according to claim 18, wherein the storage node is specifically configured to obtain current data information of the corresponding disk and send the current data information to the monitoring server at a set time interval; or,
the monitoring server requests the current data information of the corresponding disk from the storage node at a set time interval.
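The training loop of claims 1 and 2 can be sketched in Python as follows. This is a minimal sketch, not the applicant's implementation, and it rests on assumptions the claims leave open: each "first classifier" is a one-dimensional decision stump (a threshold plus a polarity), a failed disk is labelled +1 and a healthy disk -1, the coefficient and weight update follow standard discrete AdaBoost, and the "preset condition" is taken to be a training error at or below `target_error` or reaching `rounds` iterations. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (dimension index, threshold, polarity): a one-dimensional decision stump.
Stump = Tuple[int, float, int]
Model = List[Tuple[float, Stump]]  # (coefficient alpha, stump) per round


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    dim, threshold, polarity = stump
    return polarity if x[dim] > threshold else -polarity


def weighted_error(stump: Stump, X, y, w) -> float:
    # Sum of the weights of the samples this stump gets wrong.
    return sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(stump, xi) != yi)


def best_stump_for_dim(X, y, w, dim: int) -> Tuple[Stump, float]:
    """Train the 'first classifier' for one dimension: the lowest-error stump."""
    values = sorted({row[dim] for row in X})
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates = [(dim, t, p) for t in thresholds for p in (1, -1)]
    errors = [weighted_error(s, X, y, w) for s in candidates]
    best = min(range(len(candidates)), key=errors.__getitem__)
    return candidates[best], errors[best]


def strong_predict(model: Model, x: Sequence[float]) -> int:
    # Strong classifier: sign of the weighted sum of the selected stumps.
    score = sum(alpha * stump_predict(s, x) for alpha, s in model)
    return 1 if score > 0 else -1


def train_adaboost(X, y, rounds: int = 10, target_error: float = 0.0) -> Model:
    n = len(X)
    w = [1.0 / n] * n  # initial weight distribution: uniform
    model: Model = []
    for _ in range(rounds):
        # One first classifier per dimension; the lowest-error one is the second classifier.
        per_dim = [best_stump_for_dim(X, y, w, d) for d in range(len(X[0]))]
        stump, err = min(per_dim, key=lambda c: c[1])
        if err >= 0.5:
            break  # no better than chance on the weighted data
        err = max(err, 1e-10)  # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Misclassified samples gain weight, correct ones lose it; renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        train_err = sum(strong_predict(model, xi) != yi for xi, yi in zip(X, y)) / n
        if train_err <= target_error:
            break  # assumed "preset condition"
    return model
```

On the toy set `X = [[10, 0], [12, 1], [0, 0], [1, 1]]`, `y = [1, 1, -1, -1]`, the first dimension alone separates the classes, so training stops after a single round with one stump on dimension 0.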
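Claim 3 differs from claim 2 in that the candidates are preset classifiers rather than classifiers trained afresh in each round. A minimal sketch, assuming the pool is a list of fixed rule functions returning +1 (failed) or -1 (healthy) and that the rest of the loop follows standard discrete AdaBoost; the names `boost_preset_pool` and `pool_predict` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]  # returns +1 (failed) or -1 (healthy)


def boost_preset_pool(pool: List[Classifier], X, y, rounds: int = 5) -> List[Tuple[float, Classifier]]:
    n = len(X)
    w = [1.0 / n] * n  # uniform initial weights
    chosen: List[Tuple[float, Classifier]] = []
    for _ in range(rounds):
        # Weighted error of every preset classifier under the current weights.
        errors = [sum(wi for xi, yi, wi in zip(X, y, w) if c(xi) != yi) for c in pool]
        best = min(range(len(pool)), key=errors.__getitem__)
        err = errors[best]
        if err >= 0.5:
            break  # even the best preset rule is no better than chance
        err = max(err, 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        clf = pool[best]  # the "optimal classifier" for this round
        chosen.append((alpha, clf))
        w = [wi * math.exp(-alpha * yi * clf(xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return chosen


def pool_predict(chosen: List[Tuple[float, Classifier]], x: Sequence[float]) -> int:
    # Weighted vote of the selected preset classifiers.
    score = sum(alpha * clf(x) for alpha, clf in chosen)
    return 1 if score > 0 else -1
```

With two hypothetical rules, one flagging any reallocated sectors (`x[0] > 0`) and one flagging a temperature above 42 (`x[1] > 42`), three rounds on the toy set `[[5, 40], [0, 60], [0, 30], [0, 45]]` with labels `[1, 1, -1, -1]` select the reallocation rule, then the temperature rule, then the reallocation rule again, because each re-weighting shifts attention to the samples the previous choice got wrong.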
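Claims 4 and 12 define the error rate as a weighted count of mispredictions. Written out, with $h_t(x_i)$ standing for the prediction of the classifier under evaluation on sample $i$ in round $t$ (notation introduced here for illustration) and $\omega_{t,i}$ the sample's current weight as in claim 5:

$$
\varepsilon_t = \sum_{i=1}^{N} \omega_{t,i}\,\mathbf{1}\left[h_t(x_i) \neq y_i\right]
$$

For instance, four samples with uniform weight $0.25$ and a single misprediction give $\varepsilon_t = 0.25$. If the weights are kept normalised to sum to 1, as in standard AdaBoost, $\varepsilon_t$ always lies in $[0, 1]$, and a classifier with $\varepsilon_t < 0.5$ beats random guessing on the weighted data.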
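The text of claims 5 and 13 defines the symbols of the weight update but does not carry the formula itself. The sketch below assumes the standard discrete AdaBoost rule, $\omega_{t+1,i} = \omega_{t,i}\exp\left(-\alpha_t y_i\, cf_{f_t,t}(x_i)\right)/Z_t$ with $\alpha_t = \tfrac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$ and $Z_t$ chosen so that the $N$ new weights sum to 1. It is an illustration of that textbook rule, not a reconstruction of the applicant's exact formula; `update_weights` is a hypothetical name.

```python
import math
from typing import List, Sequence


def update_weights(w: Sequence[float], y: Sequence[int], h: Sequence[int], eps: float) -> List[float]:
    """Standard discrete AdaBoost update (assumed, not taken from the claim).

    w:   current weights omega_{t,i}
    y:   labels y_i in {+1, -1}
    h:   predictions of the round-t best classifier on each sample
    eps: its weighted error rate epsilon_t, assumed to lie in (0, 0.5)
    """
    alpha = 0.5 * math.log((1 - eps) / eps)
    raw = [wi * math.exp(-alpha * yi * hi) for wi, yi, hi in zip(w, y, h)]
    z = sum(raw)  # normalisation factor Z_t
    return [r / z for r in raw]
```

A useful property of this rule: after the update, the samples the round-$t$ classifier got wrong together carry exactly half of the total weight, which is what pushes the next round toward them. With four equally weighted samples and one mistake ($\varepsilon_t = 0.25$), the mispredicted sample moves from weight $0.25$ to $0.5$ and the others drop to $1/6$ each.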
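Claims 6 and 14 allow the sample data to be SMART information alone, or SMART information plus other state such as IO statistics and ambient temperature and humidity. For a boosting algorithm to read it, every sample needs the same fixed column layout. A minimal sketch: the SMART attribute IDs below are an illustrative choice (the claims do not name specific attributes), the extra field names are hypothetical, and filling missing readings with 0.0 is a simplification.

```python
from typing import List, Mapping

# Illustrative choice of SMART attribute IDs (not specified by the claims):
# 5 reallocated sectors, 187 reported uncorrectable errors,
# 197 current pending sectors, 198 offline uncorrectable sectors.
SMART_ATTRS = (5, 187, 197, 198)
# Hypothetical names for the optional "other disk state information".
EXTRA_FIELDS = ("io_await_ms", "ambient_temp_c", "ambient_humidity_pct")


def feature_names(include_extra: bool) -> List[str]:
    names = [f"smart_{a}_raw" for a in SMART_ATTRS]
    return names + list(EXTRA_FIELDS) if include_extra else names


def build_row(smart: Mapping[int, float], extra: Mapping[str, float], include_extra: bool) -> List[float]:
    # Fixed column order so every sample has the same dimensions;
    # missing readings default to 0.0 (a simplification).
    row = [float(smart.get(a, 0.0)) for a in SMART_ATTRS]
    if include_extra:
        row += [float(extra.get(k, 0.0)) for k in EXTRA_FIELDS]
    return row
```

Each column of such a row is one "dimension" in the sense of claim 2, so a per-dimension weak classifier simply indexes one position of the row.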
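Claims 7 and 15 train on data grouped by brand manufacturer and/or business scenario. One reading is one strong classifier per group; the sketch below assumes that reading and assumes hypothetical metadata keys `vendor` and `scenario`. The trainer is passed in as a function, so any AdaBoost routine with the signature `train_fn(X, y)` can be plugged in.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int, Dict[str, str]]  # (features, label, metadata)


def train_per_group(
    samples: List[Sample],
    key_fn: Callable[[Dict[str, str]], Hashable],
    train_fn: Callable[[List[Sequence[float]], List[int]], Any],
) -> Dict[Hashable, Any]:
    groups: Dict[Hashable, Tuple[list, list]] = defaultdict(lambda: ([], []))
    for features, label, meta in samples:
        X, y = groups[key_fn(meta)]
        X.append(features)
        y.append(label)
    # One independently trained strong classifier per group.
    return {key: train_fn(X, y) for key, (X, y) in groups.items()}


def by_vendor_and_scenario(meta: Dict[str, str]) -> Tuple[str, str]:
    # Hypothetical metadata keys.
    return (meta["vendor"], meta["scenario"])
```

At prediction time, the disk to be tested would be routed to the model stored under its own `(vendor, scenario)` key; grouping by vendor only is just a different `key_fn`.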
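Claim 8 extracts from the incoming disk data only the dimensions the strong classifier uses, then reads the prediction from its output. A minimal sketch, assuming a hypothetical model layout of `(alpha, feature name, threshold, polarity)` entries (one per selected weak classifier) and the +1 = failed labelling convention.

```python
from typing import List, Mapping, Tuple

# Assumed model layout: one (alpha, feature name, threshold, polarity)
# entry per selected weak classifier.
WeakRule = Tuple[float, str, float, int]


def required_features(model: List[WeakRule]) -> List[str]:
    # The dimensions the strong classifier actually uses.
    return sorted({name for _, name, _, _ in model})


def predict_failure(model: List[WeakRule], current: Mapping[str, float]) -> bool:
    # Extract only the needed feature values; a missing one raises KeyError.
    feats = {name: float(current[name]) for name in required_features(model)}
    score = sum(alpha * (pol if feats[name] > thr else -pol) for alpha, name, thr, pol in model)
    return score > 0  # True means "predicted to fail" under the +1 = failed convention
```

For example, with rules `(1.0, "smart_5_raw", 0.0, 1)` and `(0.5, "ambient_temp_c", 50.0, 1)`, a disk reporting 3 reallocated sectors at 30 °C scores 1.0 - 0.5 = 0.5 and is flagged, while a disk with no reallocations at 60 °C scores -0.5 and is not. Fields the model does not use are ignored.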
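The two collection modes in claim 19 (storage nodes push data on a timer, or the monitoring server pulls it on a timer) can be sketched with an injected clock value so the logic is testable without sleeping. Class and method names are hypothetical, and a real deployment would sit on an RPC or messaging layer that the claims do not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StorageNode:
    node_id: str
    read_disk_info: Callable[[], Dict[str, float]]  # returns current disk data


@dataclass
class MonitoringServer:
    interval_s: float
    latest: Dict[str, Dict[str, float]] = field(default_factory=dict)
    last_poll: float = float("-inf")

    def receive(self, node_id: str, info: Dict[str, float]) -> None:
        # Push mode: a storage node sends its data at its own interval.
        self.latest[node_id] = info

    def maybe_poll(self, nodes: List[StorageNode], now: float) -> bool:
        # Pull mode: request data from every node at most once per interval.
        if now - self.last_poll < self.interval_s:
            return False
        for node in nodes:
            self.latest[node.node_id] = node.read_disk_info()
        self.last_poll = now
        return True
```

Either path ends in the same `latest` table, from which the server can build the current data information for prediction, so the two modes can coexist in one deployment.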
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810359823.1A CN110389866A (en) | 2018-04-20 | 2018-04-20 | Disk failure prediction technique, device, computer equipment and computer storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810359823.1A CN110389866A (en) | 2018-04-20 | 2018-04-20 | Disk failure prediction technique, device, computer equipment and computer storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110389866A (en) | 2019-10-29 |
Family
ID=68282754
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810359823.1A Withdrawn CN110389866A (en) | 2018-04-20 | 2018-04-20 | Disk failure prediction technique, device, computer equipment and computer storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110389866A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178378A (en) * | 2019-11-07 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Equipment fault prediction method and device, electronic equipment and storage medium |
CN112286469A (en) * | 2020-12-28 | 2021-01-29 | 湖南源科创新科技有限公司 | Solid state disk temperature rise control method and solid state disk |
CN112433928A (en) * | 2020-12-03 | 2021-03-02 | 中国建设银行股份有限公司 | Fault prediction method, device, equipment and storage medium of storage equipment |
CN112596964A (en) * | 2020-12-15 | 2021-04-02 | 中国建设银行股份有限公司 | Disk failure prediction method and device |
CN112988437A (en) * | 2019-12-17 | 2021-06-18 | 深信服科技股份有限公司 | Fault prediction method and device, electronic equipment and storage medium |
CN113094201A (en) * | 2021-06-09 | 2021-07-09 | 神威超算(北京)科技有限公司 | Disk fault notification method, device and computer readable medium |
CN113722130A (en) * | 2021-08-16 | 2021-11-30 | 华中科技大学 | Disk fault prediction method and system |
CN113744892A (en) * | 2021-09-02 | 2021-12-03 | 上海宝藤生物医药科技股份有限公司 | Embryo euploidy prediction method, embryo euploidy prediction device, electronic equipment and storage medium |
WO2022116922A1 (en) * | 2020-12-03 | 2022-06-09 | 中兴通讯股份有限公司 | Magnetic disk failure prediction method, prediction model training method, and electronic device |
CN115938406A (en) * | 2022-12-30 | 2023-04-07 | 天翼云科技有限公司 | Disk failure prediction method and device for processing extremely unbalanced data |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110007337A1 (en) * | 2009-07-10 | 2011-01-13 | Mikiko Imazeki | Apparatus, system, and method of predicting failure of image forming apparatus |
CN107025154A (en) * | 2016-01-29 | 2017-08-08 | 阿里巴巴集团控股有限公司 | The failure prediction method and device of disk |
Non-Patent Citations (2)
Title |
---|
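张军; 胡震波; 朱新山: "Real-time traffic accident prediction based on AdaBoost classifier" (基于AdaBoost分类器的实时交通事故预测) * |
贾润莹; 李静; 王刚; 李忠伟; 刘晓光: "Optimization and selection of hard disk failure prediction models based on Adaboost and genetic algorithm" (基于Adaboost和遗传算法的硬盘故障预测模型优化及选择) * |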
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178378A (en) * | 2019-11-07 | 2020-05-19 | 腾讯科技(深圳)有限公司 | Equipment fault prediction method and device, electronic equipment and storage medium |
CN111178378B (en) * | 2019-11-07 | 2023-05-16 | 腾讯科技(深圳)有限公司 | Equipment fault prediction method and device, electronic equipment and storage medium |
CN112988437A (en) * | 2019-12-17 | 2021-06-18 | 深信服科技股份有限公司 | Fault prediction method and device, electronic equipment and storage medium |
CN112988437B (en) * | 2019-12-17 | 2023-12-29 | 深信服科技股份有限公司 | Fault prediction method and device, electronic equipment and storage medium |
WO2022116922A1 (en) * | 2020-12-03 | 2022-06-09 | 中兴通讯股份有限公司 | Magnetic disk failure prediction method, prediction model training method, and electronic device |
CN112433928A (en) * | 2020-12-03 | 2021-03-02 | 中国建设银行股份有限公司 | Fault prediction method, device, equipment and storage medium of storage equipment |
CN112596964B (en) * | 2020-12-15 | 2024-05-17 | 中国建设银行股份有限公司 | Disk fault prediction method and device |
CN112596964A (en) * | 2020-12-15 | 2021-04-02 | 中国建设银行股份有限公司 | Disk failure prediction method and device |
CN112286469B (en) * | 2020-12-28 | 2021-03-23 | 湖南源科创新科技有限公司 | Solid state disk temperature rise control method and solid state disk |
CN112286469A (en) * | 2020-12-28 | 2021-01-29 | 湖南源科创新科技有限公司 | Solid state disk temperature rise control method and solid state disk |
CN113094201A (en) * | 2021-06-09 | 2021-07-09 | 神威超算(北京)科技有限公司 | Disk fault notification method, device and computer readable medium |
CN113722130A (en) * | 2021-08-16 | 2021-11-30 | 华中科技大学 | Disk fault prediction method and system |
CN113744892A (en) * | 2021-09-02 | 2021-12-03 | 上海宝藤生物医药科技股份有限公司 | Embryo euploidy prediction method, embryo euploidy prediction device, electronic equipment and storage medium |
CN115938406A (en) * | 2022-12-30 | 2023-04-07 | 天翼云科技有限公司 | Disk failure prediction method and device for processing extremely unbalanced data |
Similar Documents
Publication | Title |
---|---|
CN110389866A (en) | Disk failure prediction technique, device, computer equipment and computer storage medium |
CN111523621B (en) | Image recognition method and device, computer equipment and storage medium |
CN108304936B (en) | Machine learning model training method and device, and expression image classification method and device |
CN110427311B (en) | Disk fault prediction method and system based on time sequence characteristic processing and model optimization |
CN109460793A (en) | A kind of method of node-classification, the method and device of model training |
AU2019399664A1 (en) | A network device classification apparatus and process |
JP7268756B2 (en) | Deterioration suppression program, degradation suppression method, and information processing device |
CN109919252A (en) | The method for generating classifier using a small number of mark images |
CN113051486A (en) | Friend-making scene-based recommendation model training method and device, electronic equipment and computer-readable storage medium |
Liu et al. | Board-level functional fault identification using streaming data |
CN111949459B (en) | Hard disk failure prediction method and system based on transfer learning and active learning |
CN111124732A (en) | Disk fault prediction method, system, device and storage medium |
CN112435137A (en) | Cheating information detection method and system based on community mining |
CN112989312B (en) | Verification code identification method and device, electronic equipment and storage medium |
CN115705274A (en) | Hard disk failure prediction method and device, computer readable medium and electronic equipment |
CN117857168A (en) | Network attack detection method, device and processor |
CN113079168B (en) | Network anomaly detection method and device and storage medium |
CN115456481A (en) | Attendance data processing method applied to enterprise management and attendance server |
Zheng et al. | Online GNN Evaluation Under Test-time Graph Distribution Shifts |
CN113569957A (en) | Object type identification method and device of business object and storage medium |
Fan et al. | Noise-suppressing deep tracking |
Bhattacharjee et al. | Active learning for imbalanced domains: the ALOD and ALOD-RE algorithms |
CN113963282A (en) | Video replacement detection and training method and device of video replacement detection model |
CN117421145B (en) | Heterogeneous hard disk system fault early warning method and device |
CN114418036B (en) | Method, device and storage medium for testing and training performance of neural network |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20191029 |