CN114756420A - Fault prediction method and related device - Google Patents

Fault prediction method and related device


Publication number
CN114756420A
Authority
CN
China
Prior art keywords
data
sample data
fault
sub
health
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011596329.0A
Other languages
Chinese (zh)
Inventor
刘冬实
康炳南
纪晓峰
胡崝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XFusion Digital Technologies Co Ltd
Original Assignee
XFusion Digital Technologies Co Ltd
Application filed by XFusion Digital Technologies Co Ltd
Priority to CN202011596329.0A
Publication of CN114756420A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/26 Functional testing
    • G06F11/261 Functional testing by simulating additional hardware, e.g. fault simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The embodiment of the application provides a fault prediction method and a related device, wherein the method comprises the following steps: dividing a plurality of sample data to obtain positive sample data and negative sample data, wherein the positive sample data comprises fault data and a first part of sub-health data in the plurality of sample data, the similarity between the characteristics of the first part of sub-health data and the characteristics of the fault data is greater than a first threshold value, the negative sample data comprises health data and a second part of sub-health data in the plurality of sample data, and the similarity between the characteristics of the second part of sub-health data and the characteristics of the fault data is less than a second threshold value; and training according to the positive sample data and the negative sample data to obtain a fault prediction model, wherein the fault prediction model is used for analyzing the target data to obtain a prediction result. By adopting the embodiment of the application, the accuracy of model prediction can be improved.

Description

Fault prediction method and related device
Technical Field
The present application relates to the field of fault diagnosis technologies, and in particular, to a fault prediction method and a related apparatus.
Background
With the development of science and technology, training machine learning models on sample data has been widely applied in various fields. To train a machine learning model, a large amount of sample data can be collected, and the model is trained on features extracted from the sample data, so that the model's prediction results for input data gradually approach the extracted features. The sample data may include positive sample data, which is data of a category that the machine learning model needs to learn, and negative sample data, which is data that does not belong to that category.
In some application scenarios, the amount of positive sample data may be very small. For example, in the field of hard disk failure prediction, the failure rate of solid state drives (SSDs), commonly called solid state disks, is low, so very little data from failed hard disks is available. This results in low prediction accuracy of the trained prediction model.
Disclosure of Invention
The embodiment of the application discloses a fault prediction method and a related device, which can improve the accuracy of model prediction.
The first aspect of the embodiments of the present application discloses a fault prediction method, including: dividing the multiple sample data to obtain positive sample data and negative sample data, wherein the positive sample data comprises fault data and a first part of sub-health data in the multiple sample data, the similarity between the characteristics of the first part of sub-health data and the characteristics of the fault data is greater than a first threshold value, the negative sample data comprises health data and a second part of sub-health data in the multiple sample data, and the similarity between the characteristics of the second part of sub-health data and the characteristics of the fault data is less than a second threshold value; and training according to the positive sample data and the negative sample data to obtain a fault prediction model, wherein the fault prediction model is used for analyzing the target data.
In the embodiment of the application, the sample data is divided again to obtain positive sample data and negative sample data. The positive sample data comprises all the fault data in the sample data and a part of the sub-health data, namely the sub-health data with higher similarity to the fault data; the negative sample data comprises the health data in the sample data and a part of the sub-health data, namely the sub-health data with lower similarity to the fault data. Compared with the existing dividing mode of taking fault data as positive sample data and non-fault data as negative sample data, dividing the sample data again effectively alleviates the imbalance between positive and negative samples, and the model can be trained better on the resulting positive and negative sample data, thereby improving the accuracy of model prediction.
In a possible implementation manner of the first aspect, before the dividing the multiple sample data into positive sample data and negative sample data, the method further includes: acquiring a differential value of each sample data in a plurality of preset sliding windows according to the preset sliding windows; and respectively summing the differential values of each sample data in the preset sliding windows to obtain a plurality of sample characteristics.
Therefore, the sample characteristics extracted through the preset sliding window can reflect the information in a certain time window, and the sample data can be better divided.
In a possible implementation manner of the first aspect, dividing the plurality of sample data to obtain positive sample data and negative sample data includes: determining target sub-health data from the plurality of sample data, wherein the similarity between the features of the target sub-health data and the features of the health data is smaller than a third threshold; marking the target sub-health data whose feature similarity with the fault data is larger than the first threshold as the first part of sub-health data; marking the target sub-health data whose feature similarity with the fault data is smaller than the second threshold as the second part of sub-health data; marking the fault data and the first part of sub-health data as positive sample data; and marking the health data and the second part of sub-health data as negative sample data.
It can be seen that the target sub-health data is first determined from the sample data. The data in the target sub-health data with higher similarity to the fault data is then marked as the first part of sub-health data and classified, together with the fault data, as positive sample data, which expands the positive sample data that originally contained only the fault data. The data in the target sub-health data with lower similarity to the fault data is marked as the second part of sub-health data and classified, together with the health data, as negative sample data. In this way, the sample data can be better divided, so that the divided positive and negative sample data reach a balanced state.
In one possible implementation of the first aspect, determining the target sub-health data from the plurality of sample data comprises: marking data whose feature value is 0 or tends to 0 in the plurality of sample data as health data; and performing feature similarity analysis between the health data and the plurality of sample data, and marking the data in the plurality of sample data whose feature similarity with the health data is smaller than the third threshold as target sub-health data.
It can be seen that the health data is first determined from the sample data, and then the data in the sample data with lower similarity to the health data is marked as target sub-health data. At this point, the sample data is divided into fault data, target sub-health data, and health data, and the target sub-health data may be regarded as the boundary between the fault data and the health data.
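The division steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the similarity measure (1 / (1 + Euclidean distance) to a group centroid), the function names, the tolerance `eps`, and the threshold values are all hypothetical, and the sketch assumes at least one fault sample and one health sample exist. Samples that fall between the thresholds are simply left unassigned here.

```python
import math


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def similarity(a, b):
    """Hypothetical similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def divide_samples(features, is_fault, t1, t2, t3, eps=1e-6):
    """Return (positive, negative) index lists following the division steps.

    features: list of feature vectors; is_fault: list of bool fault labels.
    t1, t2, t3: first, second and third thresholds (values are assumptions).
    """
    fault = [i for i, f in enumerate(is_fault) if f]
    # Health data: non-fault samples whose feature values are 0 or tend to 0.
    health = [i for i, f in enumerate(is_fault)
              if not f and all(abs(v) <= eps for v in features[i])]
    health_c = centroid([features[i] for i in health])
    fault_c = centroid([features[i] for i in fault])

    positive, negative = list(fault), list(health)
    for i, row in enumerate(features):
        if i in fault or i in health:
            continue
        if similarity(row, health_c) >= t3:
            continue  # not target sub-health data; left unassigned in this sketch
        s = similarity(row, fault_c)
        if s > t1:
            positive.append(i)   # first part of sub-health data
        elif s < t2:
            negative.append(i)   # second part of sub-health data
    return positive, negative


# Hypothetical two-dimensional features for six disks; only disk 2 has failed.
features = [[0, 0], [0, 0], [10, 10], [9, 9], [2, 2], [0.5, 0.5]]
is_fault = [False, False, True, False, False, False]
positive, negative = divide_samples(features, is_fault, t1=0.3, t2=0.2, t3=0.5)
print(positive, negative)
```

In this toy input, disk 3 resembles the failed disk and joins the positive samples, disk 4 is sub-healthy but far from the fault data and joins the negative samples, and disk 5 is too close to the health data to count as target sub-health data.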
In a possible implementation manner of the first aspect, after the fault prediction model is obtained by training according to the positive sample data and the negative sample data, the method further includes: analyzing the target data based on the fault prediction model to obtain a prediction result; determining a plurality of reasons for the prediction result according to the prediction result; and outputting the prediction result, the plurality of reasons, and the importance ratio of each of the reasons.
It can be seen that not only the prediction result but also a plurality of reasons of the prediction result can be output based on the failure prediction model, and the importance ratio of each reason is also output. Therefore, guidance can be provided for operation and maintenance operation according to a plurality of reasons of the prediction result, and targeted maintenance is facilitated.
In a possible implementation manner of the first aspect, determining a cause of the prediction result according to the prediction result includes: selecting a decision tree corresponding to the prediction result from the fault prediction model according to the prediction result; and acquiring splitting characteristics on a decision path corresponding to the decision tree, wherein the splitting characteristics are reasons for causing a prediction result.
Therefore, the splitting characteristics are determined based on the fault prediction model and used as the reasons for the prediction result, so the reasons do not need to be found manually, which is more convenient and more reliable.
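A minimal Python sketch of this idea follows. It uses small hand-built trees stored as dictionaries rather than any library's random forest, and it is only an illustration: the tree layout, the feature names, and the way the importance ratio is computed (the share of each splitting feature among the decision paths of the trees that agree with the overall result) are assumptions, since the text here does not fix a formula.

```python
from collections import Counter


def leaf(label):
    """Leaf node of a hypothetical decision tree (1 = fault, 0 = no fault)."""
    return {"label": label}


def predict_with_path(node, sample):
    """Walk one decision tree; return (label, split features on the decision path)."""
    path = []
    while "label" not in node:
        path.append(node["feature"])
        branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"], path


def explain(forest, sample):
    """Majority-vote prediction plus the share of each split feature among agreeing trees."""
    results = [predict_with_path(tree, sample) for tree in forest]
    votes = Counter(label for label, _ in results)
    prediction = votes.most_common(1)[0][0]
    # Keep only the trees whose own prediction matches the forest's result.
    counts = Counter(f for label, path in results if label == prediction for f in path)
    total = sum(counts.values())
    ratios = {f: c / total for f, c in counts.items()}
    return prediction, ratios


# Hypothetical three-tree forest over made-up sliding-window features.
forest = [
    {"feature": "realloc_sum_7d", "threshold": 5, "left": leaf(0), "right": leaf(1)},
    {"feature": "crc_err_sum_7d", "threshold": 2, "left": leaf(0),
     "right": {"feature": "realloc_sum_7d", "threshold": 3,
               "left": leaf(0), "right": leaf(1)}},
    {"feature": "wear_sum_7d", "threshold": 50, "left": leaf(0), "right": leaf(1)},
]
sample = {"realloc_sum_7d": 8, "crc_err_sum_7d": 4, "wear_sum_7d": 10}
prediction, ratios = explain(forest, sample)
print(prediction, ratios)
```

Two of the three trees vote for a fault, so only their decision paths contribute reasons; the third tree's splitting feature is excluded because that tree disagrees with the overall prediction.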
In one possible implementation of the first aspect, the positive sample data and the negative sample data are partitioned from the plurality of sample data according to one or more of a growth trend analysis, a distance calculation, and a clustering method.
Therefore, the sample data can be classified by different classification methods, and the classification method is richer and has higher selectivity.
In a possible implementation of the first aspect, the fault prediction model is a random forest model.
A second aspect of the embodiments of the present application discloses a failure prediction apparatus, including:
the sample dividing unit is used for dividing the multiple sample data to obtain positive sample data and negative sample data, wherein the positive sample data comprises fault data and a first part of sub-health data in the multiple sample data, the similarity between the characteristics of the first part of sub-health data and the characteristics of the fault data is greater than a first threshold value, the negative sample data comprises the health data and a second part of sub-health data in the multiple sample data, and the similarity between the characteristics of the second part of sub-health data and the characteristics of the fault data is less than a second threshold value;
and the training unit is used for training according to the positive sample data and the negative sample data to obtain a fault prediction model, wherein the fault prediction model is used for analyzing the target data.
In the embodiment of the application, the sample data is divided again to obtain positive sample data and negative sample data. The positive sample data comprises all the fault data in the sample data and a part of the sub-health data, namely the sub-health data with higher similarity to the fault data; the negative sample data comprises the health data in the sample data and a part of the sub-health data, namely the sub-health data with lower similarity to the fault data. Compared with the existing dividing mode of taking fault data as positive sample data and non-fault data as negative sample data, dividing the sample data again effectively alleviates the imbalance between positive and negative samples, and the model can be trained better on the resulting positive and negative sample data, thereby improving the accuracy of model prediction.
In a possible embodiment of the second aspect, the apparatus further comprises a feature unit configured to: acquire a differential value of each sample data in a plurality of preset sliding windows according to the preset sliding windows; and respectively sum the differential values of each sample data in the preset sliding windows to obtain a plurality of sample characteristics.
Therefore, the sample characteristics extracted through the preset sliding window can reflect the information in a certain time window, and the sample data can be better divided.
In a possible implementation manner of the second aspect, the sample dividing unit is specifically configured to: determine target sub-health data from the plurality of sample data, wherein the similarity between the features of the target sub-health data and the features of the health data is smaller than a third threshold; mark the target sub-health data whose feature similarity with the fault data is larger than the first threshold as the first part of sub-health data; mark the target sub-health data whose feature similarity with the fault data is smaller than the second threshold as the second part of sub-health data; mark the fault data and the first part of sub-health data as positive sample data; and mark the health data and the second part of sub-health data as negative sample data.
It can be seen that the target sub-health data is first determined from the sample data. The data in the target sub-health data with higher similarity to the fault data is then marked as the first part of sub-health data and classified, together with the fault data, as positive sample data, which expands the positive sample data that originally contained only the fault data. The data in the target sub-health data with lower similarity to the fault data is marked as the second part of sub-health data and classified, together with the health data, as negative sample data. In this way, the sample data can be better divided, so that the divided positive and negative sample data reach a balanced state.
In a possible implementation manner of the second aspect, the sample dividing unit is specifically configured to: mark data whose feature value is 0 or tends to 0 in the plurality of sample data as health data; and perform feature similarity analysis between the health data and the plurality of sample data, and mark the data in the plurality of sample data whose feature similarity with the health data is smaller than the third threshold as target sub-health data.
It can be seen that the health data is first determined from the sample data, and then the data in the sample data with lower similarity to the health data is marked as target sub-health data. At this point, the sample data is divided into fault data, target sub-health data, and health data, and the target sub-health data may be regarded as the boundary between the fault data and the health data.
In a possible implementation manner of the second aspect, the apparatus further includes a prediction analysis unit configured to: analyze the target data based on the fault prediction model to obtain a prediction result; determine a plurality of reasons for the prediction result according to the prediction result; and output the prediction result, the plurality of reasons, and the importance ratio of each of the reasons.
It can be seen that not only the prediction result but also a plurality of reasons for the prediction result can be output based on the failure prediction model, and the importance ratio of each reason is also output. Therefore, guidance can be provided for operation and maintenance operation according to a plurality of reasons of the prediction result, and targeted maintenance is facilitated.
In a possible implementation manner of the second aspect, the prediction analysis unit is specifically configured to: selecting a decision tree corresponding to the prediction result from the fault prediction model according to the prediction result; and acquiring splitting characteristics on a decision path corresponding to the decision tree, wherein the splitting characteristics are reasons for causing a prediction result.
It can be seen that the splitting characteristics are determined based on the fault prediction model, and the splitting characteristics are used as the reasons of the prediction results, so that the reliability is provided.
In one possible embodiment of the second aspect, the positive sample data and the negative sample data are partitioned from the plurality of sample data according to one or more of a growth trend analysis, a distance calculation, and a clustering method.
Therefore, the sample data can be classified by different classification methods, and the classification method is richer and has higher selectivity.
In a possible embodiment of the second aspect, the fault prediction model is a random forest model.
A third aspect of an embodiment of the present application discloses a failure prediction device, which includes at least one processor, at least one memory, and a communication interface, where the communication interface is configured to send and/or receive data, and the at least one processor is configured to invoke a computer program stored in the at least one memory, so that the apparatus implements the method described in the first aspect or any one of the possible implementation manners of the first aspect.
A fourth aspect of the embodiments of the present application discloses a computer-readable storage medium, in which a computer program is stored, which, when running on one or more processors, performs the method described in the first aspect or any one of the possible implementations of the first aspect.
A fifth aspect of the embodiments of the present application discloses a chip system, which includes at least one processor, a memory, and an interface circuit, where the interface circuit is configured to provide information input/output for the at least one processor, and the memory stores a computer program that, when executed on the one or more processors, performs the method described in the first aspect or any one of the possible implementations of the first aspect.
Drawings
The drawings used in the embodiments of the present application are described below.
FIG. 1 is a schematic structural diagram of a failure prediction system provided in an embodiment of the present application;
FIG. 2 is a schematic view of a fault prediction scenario provided in an embodiment of the present application;
FIG. 3 is a schematic flowchart of a fault prediction method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a failure prediction apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a failure prediction apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings.
In the field of hard disk failure prediction, more and more enterprises have adopted solid state drives (SSDs) for data storage in recent years. However, as these SSDs enter the middle and later stages of their life cycle, the inventors of the present application found that the SSD failure rate in some large data centers is increasing year by year. Hard disk failures may directly affect the continuity of customer services. Compared with passive fault-tolerance techniques, active prediction of hard disk failures allows an operation and maintenance strategy to be executed in a planned way according to the prediction result, avoiding the impact of sudden SSD failures on service availability and customer experience.
Currently, in the field of hard disk failure prediction, the inventors of the present application found that the following problems exist in failure prediction:
Problem 1: Hard disk failure prediction research is mostly directed at mechanical hard disk drives (HDDs). Because the SSD failure rate is lower, less data from failed hard disks is available, which may cause an imbalance between positive and negative samples during model training. Some failed hard disks show no obvious characteristics in information such as S.M.A.R.T. and I/O data, while healthy hard disks show a large number of characteristics; the characteristics of healthy hard disks can therefore confuse the judgment on failed hard disks, making healthy and failed hard disks difficult to distinguish by their characteristics.
Current failure prediction methods randomly extract data from healthy and failed hard disks in a certain proportion as the training set and use the remaining data as the test/validation set, for example by using a sampling method.
Sampling methods include upsampling and downsampling. Downsampling refers to randomly sampling the class with the larger proportion, or selecting samples from it by a clustering method such as K-means; upsampling refers to repeatedly sampling the class with the smaller sample size (for example, naive random upsampling), or generating sample data by a method such as the Synthetic Minority Oversampling Technique (SMOTE).
The inventor of the application finds that the sampling method does not divide the hard disk data according to the characteristics of the hard disk data, so that positive and negative samples cannot be well divided.
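For illustration, naive random upsampling and random downsampling can be sketched in Python as below. The function names, the fixed seed, and the sample identifiers are hypothetical. The sketch makes the limitation noted above visible: samples are chosen purely by random draw, without looking at their features.

```python
import random


def random_upsample(minority, target_size, seed=0):
    """Naive random upsampling: draw minority samples with replacement."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra


def random_downsample(majority, target_size, seed=0):
    """Random downsampling: keep a random subset without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(majority), target_size)


# Hypothetical data: 2 failed-disk samples versus 100 healthy-disk samples.
failed = ["f1", "f2"]
healthy = [f"h{i}" for i in range(100)]
balanced_failed = random_upsample(failed, 6)
balanced_healthy = random_downsample(healthy, 6)
print(len(balanced_failed), len(balanced_healthy))
```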
For another example, an unsupervised and anomaly detection method is used to solve the problem of imbalance between positive and negative samples.
Unsupervised and anomaly detection methods train the model using only the sample class with the larger proportion. Common methods include statistics-based anomaly detection algorithms (e.g., boxplot, 3-sigma, moving average), density-based anomaly detection algorithms (e.g., Local Outlier Factor (LOF)), clustering-based anomaly detection algorithms (K-means, DBSCAN, one-class SVM, iForest), algorithms based on Principal Component Analysis (PCA), algorithms based on sample reconstruction error (AutoEncoder), and so forth.
The inventor of the application finds that the positive and negative samples cannot be well divided due to the lack of utilization of labeling information of the samples in the unsupervised and abnormal detection methods.
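As one example of the statistics-based detectors listed above, the following minimal Python sketch applies the 3-sigma rule to a single series of readings. The function name and the readings are hypothetical. Note that no fault labels enter the computation, which is the limitation pointed out above.

```python
import statistics


def three_sigma_outliers(values):
    """Indices of values farther than 3 standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mean) > 3 * std]


# Hypothetical readings: twenty normal values and one extreme value.
readings = [10] * 20 + [100]
outliers = three_sigma_outliers(readings)
print(outliers)
```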
Problem 2: Current research in the field of failure prediction mainly aims at improving the model recall rate and reducing the model false alarm rate. For example, models such as the Support Vector Machine (SVM), random forest, and Long Short-Term Memory network (LSTM) are used to improve the model recall rate and reduce the model false alarm rate. However, the inventors of the present application found that the random forest provided by the existing sklearn gives feature importance only for the training set data, and that the Treeinterpreter method can only give the contribution of each feature value to the prediction result of the random forest model.
The inventor of the application finds that the model interpretability analysis of the current failure prediction technology is insufficient, and the reason for the prediction result cannot be given according to the prediction result given by the model. It may result in failure to take targeted operation and maintenance processing according to the prediction result.
The Treeinterpreter method lacks the feature value ratio and visual presentation, and is not beneficial for users to read and understand information.
Problem 3: When training a fault prediction model, the inventors of the present application found that the current feature construction method only calculates the first-order/second-order difference values of adjacent sample data.
First-order difference value calculation:
$$\Delta y(x) = y(x+1) - y(x) \qquad \text{(Formula 1)}$$
Second-order difference value calculation:
$$\Delta(\Delta y(x)) = \Delta\big(y(x+1) - y(x)\big) = \Delta y(x+1) - \Delta y(x) \qquad \text{(Formula 2)}$$
The inventor of the application finds that the first-order/second-order difference values of adjacent sample data cannot represent the growth trend of the sample data within a historical time window. Since a disk may fail during a time period in which the feature value is not increasing (a time period in which the first-order/second-order difference value is 0), such failures cannot, to some extent, be predicted by the current feature construction method alone, and the model classification effect may be affected.
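The limitation described above can be seen in a small Python sketch of the first-order and second-order difference calculations. The function names and the counter readings are hypothetical and serve only as an illustration.

```python
def first_difference(y):
    """First-order difference of adjacent samples: y(x+1) - y(x)."""
    return [y[i + 1] - y[i] for i in range(len(y) - 1)]


def second_difference(y):
    """Second-order difference: the first-order difference of the first-order differences."""
    return first_difference(first_difference(y))


# Hypothetical daily readings of one cumulative error counter.
readings = [0, 0, 3, 7, 7, 7, 7]

d1 = first_difference(readings)   # [0, 3, 4, 0, 0, 0]
d2 = second_difference(readings)  # [3, 1, -4, 0, 0]

# On the last day both adjacent-sample features are 0,
# even though the counter grew by 7 over the recorded history.
print(d1[-1], d2[-1], readings[-1] - readings[0])
```

If the disk in this example failed on the last day, features built only from adjacent samples would look identical to those of a disk whose counter never moved.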
In view of the above problems, embodiments of the present application provide a failure prediction method and a related apparatus. With this method, the accumulated sum of the first-order difference values of each sample data in a preset sliding window can be used as a sample feature, so that a plurality of sample features can be obtained. The plurality of sample features are then divided to obtain positive sample features and negative sample features, wherein the positive sample features comprise the features of the fault data and the features of a first part of sub-health data in the plurality of sample data, the similarity between the features of the first part of sub-health data and the features of the fault data being greater than a first threshold, and the negative sample features comprise the features of the health data and the features of a second part of sub-health data in the plurality of sample data, the similarity between the features of the second part of sub-health data and the features of the fault data being less than a second threshold. A fault prediction model is then obtained by training according to the positive sample features and the negative sample features. A prediction result for the target data is obtained based on the fault prediction model; the reasons for the prediction result are determined according to the prediction result; and the prediction result, the reasons for the prediction result, and the importance ratio of each reason are output.
Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a fault prediction system provided in an embodiment of the present application, where the fault prediction system 100 includes a data acquisition module 101, a preprocessing module 102, a feature extraction module 103, a model training module 104, and a fault prediction module 105, where:
the data acquisition module 101 is configured to acquire and count data from the electronic device, where the data is sample data used for model training, and the positive and negative samples in the sample data may be imbalanced. For example, if the sample data is SSD hard disk data, there may be little data from failed hard disks and much data from non-failed hard disks, that is, little positive sample data and much negative sample data. The electronic device may be a device with data storage capability. It may be a physical storage device, such as a memory (including a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and the like), a disk (including a portable random access memory (CD-RAM), a solid state drive (SSD), and the like), or another electronic device with data storage capability, such as a network attached storage (NAS) server; or it may be a virtual storage device, such as a virtual machine or a container.
The preprocessing module 102 is configured to preprocess the sample data acquired by the data acquisition module 101. Further, the pre-processing may include: and sorting the data format of the sample data, screening out a list of fault data, labeling the fault data in the plurality of sample data, labeling the fault data according to the fault time, and the like.
The feature extraction module 103 is configured to construct a sliding window accumulated difference feature, specifically, obtain a difference value of each sample data in a plurality of preset sliding windows according to a preset sliding window; and respectively summing the differential values of each sample data in the preset sliding windows, thereby obtaining a plurality of sample characteristics. It should be noted that the difference value may be a first-order difference value or a second-order difference value, and the embodiment of the present application is not limited in any way.
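The sliding-window accumulated difference feature can be sketched in Python as follows, here using first-order differences. The window lengths, the function names, and the choice to align each difference with the later of its two samples (and to truncate windows at the start of the series) are assumptions made for illustration.

```python
def window_diff_sum(y, window):
    """Sum of first-order differences over the last `window` steps ending at each index.

    The difference at index i is taken as y[i] - y[i-1] (0 at index 0);
    windows are truncated at the start of the series.
    """
    diffs = [0] + [y[i] - y[i - 1] for i in range(1, len(y))]
    features = []
    for i in range(len(y)):
        start = max(0, i - window + 1)
        features.append(sum(diffs[start:i + 1]))
    return features


def extract_features(y, windows=(3, 7)):
    """One feature series per preset window length (lengths are hypothetical)."""
    return {w: window_diff_sum(y, w) for w in windows}


# Hypothetical daily readings of one cumulative error counter.
readings = [0, 0, 3, 7, 7, 7, 7]
features = extract_features(readings)
print(features[3])  # short window: the growth has already dropped out by the last day
print(features[7])  # long window: the growth is still visible on the last day
```

With several preset windows, a counter that grew some time ago still produces a non-zero feature in the longer window even after the adjacent-sample difference has returned to 0.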
The model training module 104 is an electronic device having data processing capability and data transceiving capability, and may be a physical device such as a host, a rack server, or a blade server, or a virtual device such as a virtual machine or a container. It is configured to divide the plurality of sample data to obtain positive sample data and negative sample data, where the positive sample data includes the fault data and a first part of sub-health data in the plurality of sample data, the similarity between the features of the first part of sub-health data and the features of the fault data is greater than a first threshold, the negative sample data includes the health data and a second part of sub-health data in the plurality of sample data, and the similarity between the features of the second part of sub-health data and the features of the fault data is less than a second threshold. The model training module 104 may then train according to the divided positive sample data and negative sample data to obtain a fault prediction model.
The failure prediction module 105 is an electronic device with data processing capability and data transceiving capability, and may be a physical device such as a host, a rack server, a blade server, or the like, or a virtual device such as a virtual machine, a container, or the like, and is configured to predict a prediction result of target data based on a trained failure prediction model, determine multiple reasons of the prediction result according to the prediction result, and output the prediction result, the multiple reasons, and importance ratios of the respective reasons of the prediction result.
Optionally, the fault prediction module 105, the data acquisition module 101, the preprocessing module 102, the feature extraction module 103, and the model training module 104 may be one device, or one module in a certain device, or a device cluster formed by a plurality of devices. For example, the failure prediction module 105 may be deployed in one or more devices that need to perform failure prediction, but the devices that need to perform failure prediction may be devices deployed by the data acquisition module 101, the preprocessing module 102, the feature extraction module 103, and the model training module 104, or may not be devices deployed by the data acquisition module 101, the preprocessing module 102, the feature extraction module 103, and the model training module 104.
Referring to fig. 2, fig. 2 is a schematic view of a scenario of failure prediction according to an embodiment of the present disclosure. As can be seen in FIG. 2, the scenario 20 includes feature extraction 20A, model training 20B, and failure analysis 20C. The feature extraction 20A includes a data collection module 200, a fault labeling module 201, a feature extraction module 202, and a feature selection module 203; the model training 20B includes a sample data partitioning module 204 and a model training module 205; the fault analysis 20C includes a fault prediction module 206 and an analysis module 207.
The data collection module 200 is configured to collect sample data for model training, where the sample data may include daily operating data of one or more electronic devices. The electronic device may be a device with data storage capability, and may be a physical storage device, such as a memory (including a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), and the like), a disk (including a compact disc read-only memory (CD-ROM), a solid state drive (SSD), and the like), or another electronic device with data storage capability, such as a network attached storage (NAS) server, or a virtual storage device, such as a virtual machine or a container.
The fault labeling module 201 is configured to label fault data in the sample data. Furthermore, the fault data in the sample data can be labeled according to the fault time. It can be understood that the fault data comes from a malfunctioning electronic device.
The feature extraction module 202 is configured to construct sample features. Specifically, it obtains a difference value of each sample data in a plurality of preset sliding windows according to the preset sliding window, and sums the difference values of each sample data within each of the preset sliding windows, thereby obtaining a plurality of sample features. It should be noted that the difference value may be a first-order difference value or a second-order difference value, and the embodiment of the present application is not limited in any way.
The feature selection module 203 screens sample features for model training from the plurality of sample features based on feature correlation and feature importance.
The sample data dividing module 204 is configured to divide the multiple sample data to obtain positive sample data and negative sample data, where the positive sample data includes fault data and a first part of sub-health data in the multiple sample data, a similarity between a feature of the first part of sub-health data and a feature of the fault data is greater than a first threshold, the negative sample data includes health data and a second part of sub-health data in the multiple sample data, and a similarity between a feature of the second part of sub-health data and a feature of the fault data is less than a second threshold.
The model training module 205 is configured to train a fault prediction model according to the divided positive sample data and negative sample data and the plurality of sample features. Further, the fault prediction model is trained based on a random forest algorithm.
The fault prediction module 206 is configured to obtain a prediction result for target data based on the trained fault prediction model, where the target data is data of the target device collected every day.
The analysis module 207 is configured to determine a plurality of reasons for the prediction result according to the prediction result, and to output the prediction result, the plurality of reasons, and the importance ratio of each reason.
Optionally, the data collection module 200, the fault labeling module 201, the feature extraction module 202, and the feature selection module 203 may be one device, one module in a certain device, or a device cluster formed by a plurality of devices. The sample data partitioning module 204 and the model training module 205 may be one device, one module in a certain device, or a device cluster formed by a plurality of devices. The fault prediction module 206 and the analysis module 207 may be one device, one module in a certain device, or a device cluster formed by a plurality of devices.
Referring to fig. 3, fig. 3 is a schematic flow chart of a fault prediction method provided in the present embodiment, and further, the method may be implemented based on the framework shown in fig. 1, and the method includes, but is not limited to, the following steps:
Step S301: Extract sample features.
Specifically, before the sample features of the sample data are extracted, the sample data needs to be preprocessed, for example by one or more of data cleaning, data integration, data transformation, and data reduction. The fault data in the sample data also needs to be labeled: data of a faulty device is labeled as fault data, and data of a non-faulty device is labeled as non-fault data. For example, when an electronic device or a functional module in the electronic device fails to work normally, the failed device is recorded in a fault list, which includes information such as the fault time and the failed device. The fault data in the sample data can be found through the fault list and labeled according to the fault time. It should be noted that the fault data includes not only the data on the day of the fault, but also the data before the fault.
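The labeling step can be illustrated with a minimal Python sketch. It assumes that each daily record carries a device identifier and a date, that the fault list maps a failed device to its fault time, and that a hypothetical `lookback_days` parameter decides how many days before the fault are still labeled as fault data; none of these names or values come from the embodiment.

```python
from datetime import date, timedelta


def label_fault_data(records, fault_list, lookback_days=7):
    """Label each daily record as fault data (1) or non-fault data (0).

    records: list of dicts with 'device' and 'day' (datetime.date) keys.
    fault_list: dict mapping a failed device id to its fault date.
    lookback_days: hypothetical parameter; days before the fault that
        are still labeled as fault data.
    """
    labeled = []
    for rec in records:
        fail_day = fault_list.get(rec["device"])
        is_fault = (
            fail_day is not None
            and fail_day - timedelta(days=lookback_days) <= rec["day"] <= fail_day
        )
        labeled.append({**rec, "label": 1 if is_fault else 0})
    return labeled


# Hypothetical records: ssd-1 appears in the fault list, ssd-2 does not.
records = [
    {"device": "ssd-1", "day": date(2020, 12, 1)},
    {"device": "ssd-1", "day": date(2020, 12, 20)},
    {"device": "ssd-2", "day": date(2020, 12, 20)},
]
fault_list = {"ssd-1": date(2020, 12, 24)}
labels = [r["label"] for r in label_fault_data(records, fault_list)]
```

Only the record of ssd-1 that falls within the look-back period before the fault time is labeled as fault data in this sketch.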
Then, a plurality of sample features of the sample data are extracted by the feature extraction module. Further, the feature extraction module obtains a difference value of each sample data in a plurality of preset sliding windows according to the preset sliding window, and then sums the difference values of each sample data within each of the preset sliding windows, so as to obtain a plurality of sample features. Furthermore, the sample features can be sorted by size, and the variation trend within the preset window can be obtained from the sorted sample features; the variation trend of the fault data may be a growth trend. It should be noted that the size of the preset sliding window may be set to a suitable value according to actual requirements, and the embodiment of the present application is not limited in any way.
For example, if the collected sample data is the S.M.A.R.T. data and the I/O data of an SSD, the S.M.A.R.T. data and the I/O data of each day may be regarded as one subset, and if the collected data covers M days, the sample data includes M subsets. First, the difference values of the M subsets are calculated respectively. If the preset sliding window is N days, where N is a positive integer smaller than M, the M subsets are divided using N days as the sliding window to obtain P sets; the difference value of each subset in each of the P sets is obtained, and the difference values of the subsets in each set are summed. The accumulated sum of the difference values can be used as a sample feature, so P sample features are obtained from the P sets.
It can be understood that the S.M.A.R.T. data is warning information that may be issued when the hard disk is abnormal. As time goes on, the more frequently abnormalities occur in the hard disk, the more warning information there is, which may eventually lead to a failure of the hard disk. Thus, the variation trend of the fault data may be a growth trend.
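The accumulated-difference feature described above can be sketched in Python as follows. The sketch assumes a single numeric attribute per day, first-order differences, and windows that advance one day at a time, so that P = M - N + 1; the stride, the treatment of the first day, and the function names are assumptions made for illustration.

```python
def first_order_diff(series):
    """Day-over-day differences; the first day has no predecessor, so it gets 0."""
    return [0] + [b - a for a, b in zip(series, series[1:])]


def window_cumulative_diff(series, window):
    """Sum the first-order differences inside each sliding window of `window` days."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    diffs = first_order_diff(series)
    return [sum(diffs[i:i + window]) for i in range(len(diffs) - window + 1)]


# Hypothetical daily uncorrectable-error counts for one SSD over M = 6 days.
counts = [0, 0, 1, 1, 3, 6]
features = window_cumulative_diff(counts, window=3)  # N = 3 gives P = 4 features
```

For a counter that keeps growing, later windows yield larger sums, which matches the growth trend expected of fault data.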
In a possible implementation manner, after a plurality of sample features are extracted from the plurality of sample data, the sample features required for model training need to be selected according to feature correlation and feature importance. That is, features that may contribute to the quality of the model are selected automatically or manually. Feature selection is the process of culling features that are irrelevant and may reduce the accuracy and quality of the model. Feature correlation is a way of understanding the relationships between multiple variables and attributes in a dataset, and can be used to determine how one or more attributes depend on or are associated with another attribute. Feature importance provides a score for each feature in the data; the higher the score, the more important or relevant the feature is to the output variable.
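One way to combine the two criteria is a greedy selection, sketched below in Python. The embodiment does not specify the procedure; the sketch assumes that importance scores are already available (for example from a previously trained model), uses the Pearson coefficient as the correlation measure, and introduces a hypothetical `max_corr` cutoff.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_features(columns, importance, max_corr=0.9):
    """Visit features from most to least important and keep a feature only
    if it is not strongly correlated with any feature already kept."""
    selected = []
    for name in sorted(columns, key=lambda n: importance[n], reverse=True):
        if all(abs(pearson(columns[name], columns[k])) < max_corr for k in selected):
            selected.append(name)
    return selected


# Hypothetical feature columns and importance scores.
columns = {
    "err_sum_7d": [0, 1, 3, 5, 9],
    "err_sum_14d": [0, 2, 6, 10, 18],  # exactly twice err_sum_7d, so redundant
    "io_latency": [5, 3, 6, 2, 4],
}
importance = {"err_sum_7d": 0.5, "err_sum_14d": 0.3, "io_latency": 0.2}
chosen = select_features(columns, importance)
```

Here the redundant 14-day feature is dropped because it carries the same information as the more important 7-day feature.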
It should be noted that the difference value may be a first-order difference value or a second-order difference value, and the embodiment of the present application is not limited at all.
Step S302: Divide the plurality of sample data to obtain positive sample data and negative sample data.
Specifically, because the collected sample data suffers from an imbalance between positive and negative samples, the model training module needs to divide the sample data again to obtain positive sample data and negative sample data for training the model. The positive sample data obtained by the division includes fault data and a first part of sub-health data in the plurality of sample data, and the similarity between the features of the first part of sub-health data and the features of the fault data is greater than a first threshold; the negative sample data includes health data and a second part of sub-health data in the plurality of sample data, and the similarity between the features of the second part of sub-health data and the features of the fault data is smaller than a second threshold. It should be noted that the first threshold and the second threshold may be equal or unequal. The first threshold and the second threshold may each be a reference value set manually based on experience, or a reference value obtained by training (or learning) from a plurality of historical values. Therefore, the first threshold or the second threshold may differ between scenarios.
In a possible implementation manner, a feature value of 0, or a feature value tending to 0, may indicate that the accumulated sum of the difference values in the preset sliding window has no abnormal change, that is, the data collected each day fluctuates little and is in a steady state. Here, tending to 0 specifically means that the absolute value of the difference from 0 is smaller than a fifth threshold, where the fifth threshold may be a small manually set value, for example, 0.1 or 0.05; the value of the fifth threshold is not limited in this embodiment of the present application.
For example, if the collected sample data is the daily S.M.A.R.T. data of an SSD over a period of time, the S.M.A.R.T. data records information about hard disk abnormalities, such as the uncorrectable error count, the newly corrupted block count, and the block programming error count. It can be seen that when the collected sample data contains no abnormal data within the preset sliding window, or the abnormal data is not growing (the difference value is 0 or tends to 0), the extracted feature value may be 0 or tend to 0, and this part of the data can be marked as health data. Then, the model training module may perform feature similarity analysis between the health data and the plurality of sample data (which may be the collected sample data excluding the fault data and/or the health data), and mark data, among the plurality of sample data, whose feature similarity with the health data is smaller than a third threshold as sub-health data. The model training module may perform the feature similarity analysis through one or more of distance calculation and clustering methods. For example, the Minkowski distance may be employed. The Minkowski distance between two i-dimensional variables $U = (u_1, u_2, u_3, \ldots, u_i)$ and $V = (v_1, v_2, v_3, \ldots, v_i)$ is defined as:
$$\left(\sum \left|w_i (u_i - v_i)\right|^p\right)^{1/p} \qquad \text{(Equation 3)}$$
Here, p is a variable parameter. When p is 1, the Minkowski distance is the Manhattan distance; when p is 2, it is the Euclidean distance; and when p tends to infinity, it is the Chebyshev distance. V represents the health data and U represents the plurality of sample data; U may specifically be the data in the plurality of sample data that excludes the health data and/or the fault data. Feature similarity analysis can be performed on the health data and the plurality of sample data through Equation 3; that is, the distances (feature similarities) between the health data and the plurality of sample data are calculated and ranked, and the data whose distance is smaller than or equal to a third threshold is then selected from the ranked distances and marked as sub-health data. It should be noted that the third threshold is a reference value set manually based on experience, or a reference value obtained by training (or learning) from a plurality of historical values, and the third threshold is not limited in this embodiment of the present application.
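The distance calculation in Equation 3 and the subsequent thresholding can be sketched in Python as follows. The sketch treats the weights as optional and defaults them to 1, represents the health data by a single reference feature vector, and uses a hypothetical threshold value; these are illustrative assumptions.

```python
def minkowski(u, v, p=2, weights=None):
    """Weighted Minkowski distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(u)  # assumption: unweighted unless stated
    terms = [abs(w * (a - b)) for w, a, b in zip(weights, u, v)]
    if p == float("inf"):
        return max(terms)  # Chebyshev distance as the limiting case
    return sum(t ** p for t in terms) ** (1 / p)


def mark_sub_health(samples, health_ref, third_threshold, p=2):
    """Return the samples whose distance to the health reference does not
    exceed the third threshold, following the selection rule in the text."""
    return [s for s in samples if minkowski(s, health_ref, p) <= third_threshold]


manhattan = minkowski([0, 0], [3, 4], p=1)
euclidean = minkowski([0, 0], [3, 4], p=2)
chebyshev = minkowski([0, 0], [3, 4], p=float("inf"))

# Hypothetical candidates that exclude fault data and health data.
candidates = [[0.5, 0.5], [3, 4]]
sub_health = mark_sub_health(candidates, [0, 0], third_threshold=1.0)
```

Switching `p` changes only the distance measure; the thresholding step stays the same.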
In one possible implementation, the model training module may divide the plurality of sample data by a principal component analysis (PCA) method to obtain health data and sub-health data. Specifically, the model training module maps the plurality of sample data (which may be data not including fault data) to a low-dimensional feature space by the PCA method. Denoting the sample data as $X = (x_1, x_2, x_3, \ldots, x_i)$, the covariance matrix $C = X^T X$ is calculated, where $X^T$ denotes the transpose of $X$, and then the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_j$ and the eigenvectors $e_1, e_2, \ldots, e_j$ of the covariance matrix $C$ are solved. The degree of deviation of each data $x_i$ on the principal component $e_j$ is calculated as:
[Formula image BDA0002868062600000091]
The score of each sample data is then calculated by:
[Formula image BDA0002868062600000092]
The scores of the sample data are then sorted by size to form a score set. Sample data whose score in the score set is 0 or tends to 0 is marked as health data, and sample data whose score in the score set is smaller than a third threshold is marked as sub-health data. It should be noted that the third threshold is a reference value set manually based on experience, or a reference value obtained by training (or learning) from a plurality of historical values, and the third threshold is not limited in this embodiment of the present application.
After the target sub-health data is determined, the model training module compares the features of the target sub-health data with those of the fault data, marks the data in the target sub-health data whose feature similarity is greater than a first threshold as the first part of sub-health data, and marks the data in the target sub-health data whose feature similarity is smaller than a second threshold as the second part of sub-health data. For example, the Mann-Kendall growth trend algorithm, a hypothesis testing approach in statistics, may be used. In the Mann-Kendall test, the original hypothesis $Y_0$ is that the target sub-health data $Y = (y_1, y_2, y_3, \ldots, y_i)$ are independent samples of identically distributed random variables; the alternative hypothesis $Y_1$ is a bilateral test. The test statistic $S$ is defined as:
[Formula image BDA0002868062600000101]
where $y_j$ and $y_k$ are data at different time points, and $\operatorname{sgn}(y_j - y_k)$ is the sign function, which takes the value 1, 0, or -1 according to the sign of $y_j - y_k$.
When $S$ is greater than, equal to, or less than zero, the Mann-Kendall statistic $Z_{MK}$ is respectively:
[Formula image BDA0002868062600000102]
In the bilateral test, the measure of the magnitude of the trend is:
[Formula image BDA0002868062600000103]
where $1 < k < j < i$; a positive $\beta$ indicates an "upward trend", and a negative $\beta$ indicates a "downward trend". For a given confidence level $\alpha$, the original hypothesis $Y_0$: $\beta = 0$ is rejected when $|Z_{MK}| > Z_{1-\alpha}$, that is, the data has an upward or downward trend at the confidence level $\alpha$. Because the variation trend of the fault data may be a growth trend, the model training module compares the data with an obvious upward trend in the target sub-health data with the fault data, marks the target sub-health data that has an upward trend and whose feature similarity with the fault data is greater than the first threshold as the first part of sub-health data, and marks the target sub-health data that has an upward trend and whose feature similarity with the fault data is smaller than the second threshold as the second part of sub-health data.
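Because the formula images for S, the Mann-Kendall statistic, and β are not reproduced in this text, the Python sketch below assumes that they correspond to the textbook form of the test: S is the sum of sgn(y_j - y_k) over all pairs with k < j, the statistic uses the normal approximation with variance n(n-1)(2n+5)/18 and no correction for tied values, and β is the median of the pairwise slopes (the Sen slope). The critical value is taken at 1 - α/2, the usual choice for a two-sided test, which is also an assumption.

```python
import math
from statistics import NormalDist, median


def sgn(x):
    """Sign function: 1, 0, or -1."""
    return (x > 0) - (x < 0)


def mann_kendall(y, alpha=0.05):
    """Return S, Z, the Sen slope beta, and the detected trend for series y."""
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 observations")
    pairs = [(k, j) for k in range(n - 1) for j in range(k + 1, n)]
    s = sum(sgn(y[j] - y[k]) for k, j in pairs)
    var_s = n * (n - 1) * (2 * n + 5) / 18  # assumes no tied values
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    beta = median((y[j] - y[k]) / (j - k) for k, j in pairs)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    if abs(z) > z_crit:
        trend = "upward" if z > 0 else "downward"
    else:
        trend = "none"
    return {"s": s, "z": z, "beta": beta, "trend": trend}


# Hypothetical feature series of one target sub-health sample.
rising = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8])
flat = mann_kendall([2, 2, 2, 2, 2])
```

Only series reported as "upward" would then be compared with the fault data against the first and second thresholds.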
Finally, the model training module may mark the fault data and the first part of sub-health data as positive sample data, and mark the health data and the second part of sub-health data as negative sample data.
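The final division into positive and negative sample data can be sketched in Python as follows. The sketch assumes that the similarity of each target sub-health sample to the fault data has already been computed and is supplied as a function, and that the first threshold is not smaller than the second; sub-health samples falling between the two thresholds are simply left unassigned here, which is an assumption, since the embodiment does not state how they are handled.

```python
def divide_samples(fault, health, sub_health, similarity_to_fault,
                   first_threshold, second_threshold):
    """Split samples into positive and negative sample data.

    similarity_to_fault: callable returning the feature similarity between
        a target sub-health sample and the fault data (assumed given).
    """
    if first_threshold < second_threshold:
        raise ValueError("sketch assumes first_threshold >= second_threshold")
    first_part = [s for s in sub_health if similarity_to_fault(s) > first_threshold]
    second_part = [s for s in sub_health if similarity_to_fault(s) < second_threshold]
    positive = list(fault) + first_part    # fault data + first part of sub-health data
    negative = list(health) + second_part  # health data + second part of sub-health data
    return positive, negative


# Hypothetical sample ids and precomputed similarities to the fault data.
sim = {"d": 0.9, "e": 0.5, "f": 0.1}
positive, negative = divide_samples(
    fault=["a"], health=["b", "c"], sub_health=["d", "e", "f"],
    similarity_to_fault=sim.get, first_threshold=0.7, second_threshold=0.3,
)
```

In this example the positive set grows from one fault sample to two samples, which illustrates how the division expands the minority class.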
Step S303: Train a fault prediction model according to the positive sample data and the negative sample data.
Specifically, after the acquired sample data is re-divided to obtain positive sample data and negative sample data, the model training module can train to obtain the fault prediction model based on the constructed sample characteristics, the positive sample data and the negative sample data. Further, training data is selected from positive sample data and negative sample data respectively based on a down-sampling method, and then the residual sample data is used as verification/test data. Further, the fault prediction model may be a random forest model.
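The down-sampling split can be sketched in Python as follows. The embodiment does not give the sampling ratio, so the sketch assumes that the same number of training samples is drawn from each class, that this number is a hypothetical `train_fraction` of the smaller class, and that all remaining samples serve as verification/test data.

```python
import random


def downsample_split(positive, negative, train_fraction=0.8, seed=0):
    """Draw an equal number of training samples from each class and keep
    the remaining samples as verification/test data."""
    rng = random.Random(seed)
    n_train = int(min(len(positive), len(negative)) * train_fraction)
    pos_idx = set(rng.sample(range(len(positive)), n_train))
    neg_idx = set(rng.sample(range(len(negative)), n_train))
    train = [(positive[i], 1) for i in sorted(pos_idx)]
    train += [(negative[i], 0) for i in sorted(neg_idx)]
    held_out = [(x, 1) for i, x in enumerate(positive) if i not in pos_idx]
    held_out += [(x, 0) for i, x in enumerate(negative) if i not in neg_idx]
    return train, held_out


# Hypothetical sample ids: 10 positive samples and 50 negative samples.
train, held_out = downsample_split(list(range(10)), list(range(100, 150)))
```

The resulting training set is balanced between the two classes, while the held-out set keeps the original skew toward negative samples.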
Step S304: Analyze the target data based on the fault prediction model.
Specifically, when a device or a module of a device needs fault analysis, the fault prediction module may analyze target data based on the fault prediction model, where the target data may be data of the device or of the module of the device. Analyzing the target data based on the fault prediction model can determine whether the device or the module of the device will fail and on which day the failure may occur. When the prediction result is that the device is likely to fail on a certain day (for example, after 14 days), the fault prediction module determines a plurality of reasons for the prediction result according to the prediction result, and can then output the prediction result, the plurality of reasons, and the importance ratio of each reason. Further, the fault prediction module may present the prediction result, the plurality of reasons, and the importance ratio of each reason in a visualized manner. Therefore, an early warning can be issued for the device (for example, 14 days in advance), and the operation and maintenance strategy can be executed in a planned way according to the reasons for the prediction result, so as to avoid the impact of a sudden failure of the device or its modules on service availability and customer experience.
In a possible implementation manner, the fault prediction model is a random forest model. When the fault prediction module analyzes the target data based on the fault prediction model and obtains a prediction result, it can select, from the fault prediction model, a decision tree corresponding to the prediction result. In machine learning, a random forest is a classifier that contains multiple decision trees, and its output is the mode of the classes output by the individual trees. In a decision tree model, at each decision node the best feature is selected for splitting, so as to further differentiate the samples arriving at that node. Each split brings the samples closer to the final decision (i.e., a leaf node). Thus, the split features selected at the decision nodes determine the final prediction result. Therefore, the fault prediction module can obtain the split features on the decision path of the decision tree and take them as the reasons for the prediction result. It can be understood that the prediction result may be the joint result of multiple split features. Further, the combination of the split features may be regarded as a failure mode that produces the prediction result. Then, the fault prediction module may count, sort, and normalize the split features to obtain a plurality of feature importances and the corresponding importance ratios, where the importance ratio may be the proportion of a split feature among all the split features. Further, the top M of the feature importances may be selected as the reasons for the prediction result.
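The counting and normalization of split features can be sketched in Python with a toy forest. The sketch represents each tree as nested dictionaries, takes the forest output as the majority class, and collects split features from every tree that voted for that class; the tree layout, the feature names, and the choice to use all agreeing trees are assumptions made for illustration.

```python
from collections import Counter


def decision_path(tree, sample):
    """Walk one tree; return its predicted label and the split features used."""
    path = []
    node = tree
    while "label" not in node:
        path.append(node["feature"])
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"], path


def explain(forest, sample, top_m=3):
    """Return the majority prediction and the importance ratio of each split
    feature on the decision paths of the trees that voted for it."""
    results = [decision_path(tree, sample) for tree in forest]
    prediction = Counter(label for label, _ in results).most_common(1)[0][0]
    counts = Counter(f for label, path in results if label == prediction for f in path)
    total = sum(counts.values())
    ratios = [(f, c / total) for f, c in counts.most_common(top_m)]
    return prediction, ratios


def leaf(label):
    return {"label": label}


# Hypothetical three-tree forest over accumulated-difference features.
forest = [
    {"feature": "realloc_sum", "threshold": 2, "left": leaf(0),
     "right": {"feature": "crc_sum", "threshold": 1, "left": leaf(0), "right": leaf(1)}},
    {"feature": "realloc_sum", "threshold": 3, "left": leaf(0), "right": leaf(1)},
    {"feature": "io_latency", "threshold": 10, "left": leaf(0), "right": leaf(1)},
]
sample = {"realloc_sum": 5, "crc_sum": 4, "io_latency": 8}
prediction, ratios = explain(forest, sample)
```

For this sample, two of the three trees predict a fault, and `realloc_sum` accounts for two of the three splits on their decision paths, so it receives the largest importance ratio.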
Note that the feature value mentioned in the embodiments of the present application may be understood as a feature vector.
In the method described in fig. 3, the sample data is divided again to obtain positive sample data and negative sample data, where the positive sample data includes all the fault data in the sample data and a part of the sub-health data, this part being data with a high similarity to the fault data; the negative sample data includes the health data in the sample data and another part of the sub-health data, this part being data with a low similarity to the fault data. Compared with the existing dividing mode of taking fault data as negative sample data and non-fault data as positive sample data, re-dividing the sample data in this way effectively solves the problem of imbalance between positive and negative samples, so that the model can be trained better based on the positive sample data and the negative sample data, and the accuracy of model prediction is improved.
The method of the embodiments of the present application is set forth above in detail and the apparatus of the embodiments of the present application is provided below.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a failure prediction apparatus 400 according to an embodiment of the present application, where the failure prediction apparatus 400 may be a device node, or may be a module in the device node, such as a chip or an integrated circuit, and the failure prediction apparatus 400 is used to implement the aforementioned failure prediction method, such as the failure prediction method described in the embodiment shown in fig. 3.
Further, the failure prediction apparatus 400 may include a sample division unit 401, a training unit 402, a feature unit 403, and a prediction analysis unit 404. The units are described in detail as follows:
the sample dividing unit 401 is configured to divide the multiple sample data to obtain positive sample data and negative sample data, where the positive sample data includes fault data and a first part of sub-health data in the multiple sample data, a similarity between a feature of the first part of sub-health data and a feature of the fault data is greater than a first threshold, the negative sample data includes health data and a second part of sub-health data in the multiple sample data, and a similarity between a feature of the second part of sub-health data and a feature of the fault data is less than a second threshold;
a training unit 402, configured to obtain a fault prediction model through training according to the positive sample data and the negative sample data, where the fault prediction model is used to analyze the target data.
In the embodiment of the application, the sample data is divided again to obtain positive sample data and negative sample data, where the positive sample data includes all the fault data in the sample data and a part of the sub-health data, this part being data with a high similarity to the fault data; the negative sample data includes the health data in the sample data and another part of the sub-health data, this part being data with a low similarity to the fault data. Compared with the existing dividing mode of taking fault data as negative sample data and non-fault data as positive sample data, re-dividing the sample data in this way effectively solves the problem of imbalance between positive and negative samples, so that the model can be trained better based on the positive sample data and the negative sample data, and the accuracy of model prediction is improved.
In a possible implementation manner, the feature unit 403 is configured to obtain a difference value of each sample data in a plurality of preset sliding windows according to a preset sliding window, and to sum the difference values of each sample data within each of the preset sliding windows to obtain a plurality of sample features.
It can be seen that the sample features extracted through the preset sliding window can reflect information in a certain time window, and sample data can be divided better.
In a possible implementation, the sample dividing unit 401 is specifically configured to: determining target sub-health data from the plurality of sample data, wherein the similarity between the features of the target sub-health data and the features of the health data is smaller than a third threshold; marking the target sub-health data with the feature similarity larger than a first threshold value in the target sub-health data as first partial sub-health data; marking the target sub-health data with the feature similarity smaller than a second threshold value in the target sub-health data as second partial sub-health data; marking the fault data and the first part of sub-health data as positive sample data; the health data and the second portion of sub-health data are marked as negative sample data.
The method comprises the steps of firstly determining target sub-health data from sample data, then marking data with similarity with fault data in the target sub-health data as first part of sub-health data, classifying the first part of sub-health data as positive sample data, and expanding the positive sample data originally only containing the fault data; and marking data which has less similarity with the fault data in the target sub-health data as second part of sub-health data, and classifying the second part of sub-health data and the health data as negative sample data. By the method, the sample data can be better divided, so that the positive and negative sample data obtained by dividing reach a balanced state.
In a possible implementation, the sample dividing unit 401 is specifically configured to: marking data with the characteristic value of 0 or the characteristic value tending to 0 in a plurality of sample data as healthy data; and performing characteristic similarity analysis on the health data and the plurality of sample data, and marking the data, of which the characteristic similarity with the health data is smaller than a third threshold value, in the plurality of sample data as sub-health data.
It can be seen that the health data is first determined from the sample data, and then the data in the sample data that has less similarity to the health data is labeled as target sub-health data. At this point, the sample data is divided into fault data, target sub-health data, and health data. The target sub-health data may be considered as a boundary between the fault data and the health data.
In one possible implementation, the prediction analysis unit 404 is configured to: analyze the target data based on the fault prediction model to obtain a prediction result; determine a plurality of reasons for the prediction result according to the prediction result; and output the prediction result, the plurality of reasons, and the importance ratio of each reason.
It can be seen that not only the prediction result but also a plurality of reasons of the prediction result can be output based on the failure prediction model, and the importance ratio of each reason is also output. Therefore, guidance can be provided for operation and maintenance operation according to a plurality of reasons of the prediction result, and targeted maintenance is facilitated.
In a possible implementation manner, the prediction analysis unit 404 is configured to select a decision tree corresponding to the prediction result from the fault prediction model according to the prediction result; and acquiring splitting characteristics on a decision path corresponding to the decision tree, wherein the splitting characteristics are reasons for causing a prediction result.
It can be seen that the splitting characteristics are determined based on the fault prediction model, and the splitting characteristics are used as the reasons of the prediction results, so that the reliability is high.
In one possible embodiment, the positive sample data and the negative sample data are partitioned from the plurality of sample data according to one or more of a growth trend analysis, a distance calculation, and a clustering method.
Therefore, the sample data can be classified by different classification methods, and the classification method is richer and has higher selectivity.
In one possible embodiment, the fault prediction model is a random forest model.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 3.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a failure prediction apparatus provided in an embodiment of the present application, where the failure prediction apparatus 500 includes at least one processor 501, at least one memory 502, and a communication interface 503. Optionally, a bus 504 may be included, wherein the processor 501, the memory 502, and the communication interface 503 are interconnected via the bus 504.
The memory 502 is used to provide a storage space, and the storage space can store data such as an operating system and a computer program. The memory 502 includes, but is not limited to, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a compact disc read-only memory (CD-ROM), among others.
The processor 501 is a module for performing arithmetic operations and/or logical operations, and may specifically be one or a combination of a plurality of processing modules, such as a central processing unit (CPU), a graphics processing unit (GPU), a microprocessor (MPU), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), and a complex programmable logic device (CPLD).
The communication interface 503 is used for receiving and/or transmitting data from/to the outside, and may be a wired link interface such as an ethernet cable, and may also be a wireless link (Wi-Fi, bluetooth, general wireless transmission, vehicle-mounted short-range communication technology, etc.) interface. Optionally, the communication interface 503 may further include a transmitter (e.g., a radio frequency transmitter, an antenna, etc.) or a receiver, etc. coupled to the interface.
The processor 501 in the failure prediction device 500 is configured to read the computer program code stored in the memory 502 and perform the following operations:
dividing the multiple sample data to obtain positive sample data and negative sample data, wherein the positive sample data comprises fault data and a first part of sub-health data in the multiple sample data, the similarity between the characteristics of the first part of sub-health data and the characteristics of the fault data is greater than a first threshold value, the negative sample data comprises health data and a second part of sub-health data in the multiple sample data, and the similarity between the characteristics of the second part of sub-health data and the characteristics of the fault data is less than a second threshold value; and training according to the positive sample data and the negative sample data to obtain a fault prediction model, wherein the fault prediction model is used for analyzing the target data.
In the embodiment of the application, the sample data is divided again to obtain positive sample data and negative sample data. The positive sample data includes all of the fault data in the sample data and a part of the sub-health data, namely the sub-health data with a high similarity to the fault data; the negative sample data includes the health data in the sample data and another part of the sub-health data, namely the sub-health data with a low similarity to the fault data. Compared with the existing dividing mode of taking fault data as positive sample data and all non-fault data as negative sample data, re-dividing the sample data in this way effectively alleviates the imbalance between positive and negative samples, so that the model can be trained better on the positive sample data and the negative sample data, thereby improving the accuracy of model prediction.
In one possible implementation, the processor 501 is further configured to: acquiring, according to a plurality of preset sliding windows, a differential value of each sample data in each of the preset sliding windows; and summing the differential values of each sample data in each of the preset sliding windows respectively to obtain a plurality of sample features.
Therefore, the sample characteristics extracted through the preset sliding window can reflect the information in a certain time window, and the sample data can be better divided.
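For illustration only, the sliding-window feature extraction described above may be sketched in Python as follows. The function name `window_diff_features`, the window lengths, and the counter readings are hypothetical and are not taken from the embodiment; the sketch assumes that each sample is a time-ordered series of readings and that the differential value is the first-order difference between adjacent readings.

```python
from typing import Sequence


def window_diff_features(series: Sequence[float], windows: Sequence[int]) -> list:
    """Return one feature per window: the sum of the last `w` first-order differences."""
    # First-order differences between adjacent readings (assumed meaning of "differential value").
    diffs = [later - earlier for earlier, later in zip(series, series[1:])]
    features = []
    for w in windows:
        # A window longer than the series simply covers every available difference.
        features.append(sum(diffs[-w:]) if w > 0 else 0)
    return features


# Hypothetical cumulative error counter, sampled once per day.
counter = [0, 0, 1, 1, 4, 9, 9, 15]
features = window_diff_features(counter, windows=(1, 3, 7))
```

Under these assumptions a short window reflects the most recent change and a long window reflects the accumulated change, so each sample is described by one feature per preset window.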
In one possible implementation, the processor 501 is specifically configured to: determining target sub-health data from the plurality of sample data, wherein the similarity between the features of the target sub-health data and the features of the health data is smaller than a third threshold; marking the target sub-health data whose feature similarity with the fault data is greater than the first threshold as the first part of sub-health data; marking the target sub-health data whose feature similarity with the fault data is smaller than the second threshold as the second part of sub-health data; marking the fault data and the first part of sub-health data as positive sample data; and marking the health data and the second part of sub-health data as negative sample data.
That is, the target sub-health data is first determined from the sample data. Data in the target sub-health data that has a high similarity to the fault data is then marked as the first part of sub-health data and classified as positive sample data, which expands the positive sample data that originally contained only the fault data. Data in the target sub-health data that has a low similarity to the fault data is marked as the second part of sub-health data and is classified, together with the health data, as negative sample data. In this way, the sample data can be better divided, so that the positive and negative sample data obtained by the division reach a balanced state.
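The division described above may be sketched as follows, purely as an example. The similarity measure (an inverse of the Euclidean distance to the nearest fault sample), the threshold values, and the feature vectors are assumptions chosen for the sketch; this passage does not fix a particular similarity measure, and it does not state what happens to target sub-health data that falls between the two thresholds, so the sketch leaves such data unlabeled.

```python
import math


def similarity(a, b):
    """Assumed similarity measure: 1 for identical vectors, tending to 0 as they move apart."""
    return 1.0 / (1.0 + math.dist(a, b))


def divide_samples(fault, health, target_sub_health, first_threshold, second_threshold):
    """Build positive and negative sample sets from fault, health and target sub-health data."""
    positive = list(fault)    # all fault data is positive
    negative = list(health)   # all health data is negative
    for sample in target_sub_health:
        # Similarity to the fault data, taken here as similarity to the closest fault sample.
        score = max(similarity(sample, f) for f in fault)
        if score > first_threshold:
            positive.append(sample)   # first part of sub-health data
        elif score < second_threshold:
            negative.append(sample)   # second part of sub-health data
        # Samples between the two thresholds are left out (assumption of this sketch).
    return positive, negative


# Hypothetical two-dimensional feature vectors.
fault = [(10.0, 8.0)]
health = [(0.0, 0.0)]
target_sub_health = [(9.0, 8.0), (2.0, 1.0), (6.0, 4.0)]
positive, negative = divide_samples(fault, health, target_sub_health, 0.4, 0.1)
```

With these hypothetical values, the sample close to the fault data joins the positive samples, the sample far from it joins the negative samples, and the intermediate sample is not used for training.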
In a possible implementation, the processor 501 is specifically configured to: marking data whose feature value is 0 or tends to 0 in the plurality of sample data as health data; and performing feature similarity analysis on the health data and the plurality of sample data, and marking the data, of the plurality of sample data, whose feature similarity with the health data is smaller than the third threshold as target sub-health data.
It can be seen that the health data is first determined from the sample data, and the data in the sample data that has a low similarity to the health data is then marked as target sub-health data. At this point, the sample data is divided into fault data, target sub-health data, and health data, and the target sub-health data may be regarded as a boundary that separates the fault data from the health data.
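As a non-limiting example, the two marking steps above may be sketched as follows. The tolerance `eps` used to decide that a feature value "tends to 0", the similarity measure, the threshold value, and the sample vectors are all assumptions of the sketch. The input is assumed to contain only samples that are not already known to be fault data.

```python
import math


def mark_health_and_target(samples, eps, third_threshold):
    """Split non-fault samples into health data and target sub-health data."""
    # Health data: every feature value is 0 or close to 0 (within the assumed tolerance eps).
    health = [s for s in samples if max(abs(v) for v in s) <= eps]
    others = [s for s in samples if max(abs(v) for v in s) > eps]

    # Represent the health data by its centroid; fall back to the origin if none was found.
    dims = len(samples[0])
    if health:
        centroid = [sum(column) / len(health) for column in zip(*health)]
    else:
        centroid = [0.0] * dims

    # Target sub-health data: similarity to the health data is below the third threshold.
    target = [
        s for s in others
        if 1.0 / (1.0 + math.dist(s, centroid)) < third_threshold
    ]
    return health, target


# Hypothetical feature vectors of samples that are not known faults.
samples = [(0.0, 0.0), (0.0, 0.0), (0.2, 0.1), (3.0, 4.0)]
health, target = mark_health_and_target(samples, eps=1e-9, third_threshold=0.5)
```

In this sketch the sample `(0.2, 0.1)` is neither exactly zero nor dissimilar enough from the health data to be marked as target sub-health data; how such samples are treated is left open here.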
In one possible implementation, the processor 501 is further configured to: analyzing the target data based on the fault prediction model to obtain a prediction result; determining a plurality of reasons of the prediction result according to the prediction result; and outputting the prediction result, the plurality of reasons, and the importance ratio of each reason of the prediction result.
It can be seen that not only the prediction result but also a plurality of reasons of the prediction result can be output based on the failure prediction model, and the importance ratio of each reason is also output. Therefore, guidance can be provided for operation and maintenance operation according to a plurality of reasons of the prediction result, and targeted maintenance is facilitated.
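One possible way of producing such an output is sketched below. This passage does not state how the importance ratio is calculated, so the sketch assumes, as an illustration only, that the ratio of a reason is its share of all occurrences among the features collected for the prediction result. The feature names and the report layout are hypothetical.

```python
from collections import Counter


def importance_ratios(reason_lists):
    """Map each reason to its share of all occurrences across the given lists."""
    counts = Counter(reason for reasons in reason_lists for reason in reasons)
    total = sum(counts.values())
    # most_common() orders the reasons from most to least frequent.
    return {reason: count / total for reason, count in counts.most_common()}


# Hypothetical reasons collected for one "fault" prediction.
reason_lists = [
    ["ce_sum_7d", "temp_sum_1d"],
    ["ce_sum_7d"],
    ["ce_sum_7d", "realloc_sum_3d"],
]
ratios = importance_ratios(reason_lists)
report = {"prediction": "fault", "reasons": ratios}
```

A report of this kind lists the reasons in descending order of their ratio, which is the form of guidance for operation and maintenance that the paragraph above refers to.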
In one possible implementation, the processor 501 is further configured to: selecting a decision tree corresponding to the prediction result from the fault prediction model according to the prediction result; and acquiring splitting characteristics on a decision path corresponding to the decision tree, wherein the splitting characteristics are reasons for causing a prediction result.
Therefore, the splitting features are determined based on the fault prediction model and are used as the reasons of the prediction result, so that the reasons do not need to be found manually, which is more convenient and more reliable.
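The selection of decision trees and the collection of splitting features may be sketched as follows. The `Node` structure, the hand-built trees, the feature names, and the majority-vote rule are hypothetical stand-ins for a trained random forest and are given only to make the traversal concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[str] = None     # splitting feature (internal nodes only)
    threshold: float = 0.0
    left: Optional["Node"] = None     # branch taken when value <= threshold
    right: Optional["Node"] = None    # branch taken when value > threshold
    label: Optional[int] = None       # class label (leaves only): 1 = fault, 0 = no fault


def decision_path(tree, sample):
    """Walk one tree and return its vote and the splitting features met on the way."""
    features, node = [], tree
    while node.label is None:
        features.append(node.feature)
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.label, features


def explain(forest, sample):
    """Return the majority prediction and the paths of the trees that agree with it."""
    results = [decision_path(tree, sample) for tree in forest]
    votes = [label for label, _ in results]
    prediction = max(set(votes), key=votes.count)
    reasons = [path for label, path in results if label == prediction]
    return prediction, reasons


forest = [
    Node("ce_sum_7d", 5, Node(label=0),
         Node("temp_sum_1d", 2, Node(label=0), Node(label=1))),
    Node("ce_sum_7d", 8, Node(label=0), Node(label=1)),
    Node("realloc_sum_3d", 1, Node(label=0), Node(label=1)),
]
sample = {"ce_sum_7d": 11, "temp_sum_1d": 3, "realloc_sum_3d": 0}
prediction, reasons = explain(forest, sample)
```

Only the trees whose vote matches the final prediction contribute their paths, so the collected splitting features are the ones that actually led to that result.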
In one possible implementation, the processor 501 is further configured to: dividing the plurality of sample data into the positive sample data and the negative sample data according to one or more of a growth trend analysis method, a distance calculation method, and a clustering method.
Therefore, the sample data can be divided by several different methods, which gives a wider choice of division methods.
In one possible embodiment, the fault prediction model is a random forest model.
It should be noted that the implementation of each operation may also correspond to the corresponding description of the method embodiment shown in fig. 3.
The embodiment of the present application further provides a chip system, where the chip system includes at least one processor, a memory, and an interface circuit, where the memory, the interface circuit, and the at least one processor are interconnected by a line, and the memory stores a computer program; when the computer program is executed by the processor, the method flow shown in fig. 3 is implemented.
Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed on one or more processors, the method flow shown in fig. 3 is implemented.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs. The procedures or functions described in the embodiments of the present application are implemented in whole or in part when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer program may be stored in or transmitted through a computer-readable storage medium. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.
The steps in the method embodiments of the present application may be reordered, combined, and deleted according to actual needs.
The modules in the device embodiment of the application can be combined, divided and deleted according to actual needs.

Claims (18)

1. A method of fault prediction, comprising:
dividing a plurality of sample data to obtain positive sample data and negative sample data, wherein the positive sample data comprises fault data and a first part of sub-health data in the plurality of sample data, the similarity between the characteristics of the first part of sub-health data and the characteristics of the fault data is greater than a first threshold, the negative sample data comprises the health data and a second part of sub-health data in the plurality of sample data, and the similarity between the characteristics of the second part of sub-health data and the characteristics of the fault data is less than a second threshold;
and training according to the positive sample data and the negative sample data to obtain a fault prediction model, wherein the fault prediction model is used for analyzing target data to obtain a prediction result.
2. The method of claim 1, wherein before the partitioning the plurality of sample data into positive sample data and negative sample data, further comprising:
acquiring a differential value of each sample data in a plurality of preset sliding windows according to the preset sliding windows;
and summing the differential values of each sample data in the preset sliding windows respectively to obtain the plurality of sample characteristics.
3. The method according to claim 1 or 2, wherein the dividing of the plurality of sample data into positive sample data and negative sample data comprises:
determining target sub-health data from the plurality of sample data, wherein a similarity of a feature of the target sub-health data to a feature of the health data is less than a third threshold;
marking the target sub-health data with the feature similarity with the fault data larger than the first threshold value in the target sub-health data as the first part of sub-health data;
marking the target sub-health data with the feature similarity with the fault data smaller than the second threshold value in the target sub-health data as the second part of sub-health data;
marking the fault data and the first portion of sub-health data as positive sample data;
marking the health data and the second portion of sub-health data as negative sample data.
4. The method of claim 3, wherein the determining target sub-health data from the plurality of sample data comprises:
marking data with the characteristic value of 0 or the characteristic value tending to be 0 in the plurality of sample data as the health data;
and performing feature similarity analysis on the health data and the plurality of sample data, and marking the data, in the plurality of sample data, whose feature similarity with the health data is smaller than the third threshold as the target sub-health data.
5. The method according to any one of claims 1-4, wherein after training a fault prediction model based on said positive sample data and said negative sample data, further comprising:
analyzing target data based on the fault prediction model to obtain a prediction result;
determining a plurality of reasons of the prediction result according to the prediction result;
outputting the prediction result, the plurality of reasons, and the importance ratio of each reason of the prediction result.
6. The method of claim 5, wherein the determining a plurality of reasons of the prediction result according to the prediction result comprises:
selecting a decision tree corresponding to the prediction result from the fault prediction model according to the prediction result;
and acquiring splitting characteristics on a decision path corresponding to the decision tree, wherein the splitting characteristics are reasons for causing the prediction result.
7. The method according to any of claims 1-4, wherein said positive sample data and said negative sample data are partitioned from a plurality of sample data according to one or more of growth trend analysis, distance calculation and clustering methods.
8. A method according to any one of claims 1-7, characterized in that the fault prediction model is a random forest model.
9. A failure prediction apparatus, comprising:
the sample dividing unit is used for dividing a plurality of sample data to obtain positive sample data and negative sample data, wherein the positive sample data comprises fault data and a first part of sub-health data in the plurality of sample data, the similarity between the characteristics of the first part of sub-health data and the characteristics of the fault data is greater than a first threshold, the negative sample data comprises health data and a second part of sub-health data in the plurality of sample data, and the similarity between the characteristics of the second part of sub-health data and the characteristics of the fault data is smaller than a second threshold;
and the training unit is used for training according to the positive sample data and the negative sample data to obtain a fault prediction model, wherein the fault prediction model is used for analyzing target data.
10. The apparatus of claim 9, further comprising a characterization unit to:
acquiring a differential value of each sample data in a plurality of preset sliding windows according to the preset sliding windows;
and summing the differential values of each sample data in the preset sliding windows respectively to obtain the plurality of sample characteristics.
11. The apparatus according to claim 9 or 10, wherein the sample dividing unit is specifically configured to:
determining target sub-health data from the plurality of sample data, wherein a similarity of a feature of the target sub-health data to a feature of the health data is less than a third threshold;
marking the target sub-health data with the feature similarity with the fault data larger than the first threshold value in the target sub-health data as the first part of sub-health data;
marking the target sub-health data with the feature similarity with the fault data smaller than the second threshold value in the target sub-health data as the second part of sub-health data;
marking the fault data and the first portion of sub-health data as positive sample data;
marking the health data and the second portion of sub-health data as negative sample data.
12. The apparatus according to claim 11, wherein the sample dividing unit is specifically configured to:
marking data with the characteristic value of 0 or the characteristic value tending to be 0 in the plurality of sample data as the health data;
and performing feature similarity analysis on the health data and the plurality of sample data, and marking the data, in the plurality of sample data, whose feature similarity with the health data is smaller than the third threshold as the target sub-health data.
13. The apparatus according to any one of claims 9-12, further comprising a prediction analysis unit configured to:
analyzing target data based on the fault prediction model to obtain a prediction result;
determining a plurality of reasons of the prediction result according to the prediction result;
and outputting the prediction result, the plurality of reasons and the importance ratio of each reason of the prediction result.
14. The apparatus according to claim 13, wherein the prediction analysis unit is configured to:
selecting a decision tree corresponding to the prediction result from the fault prediction model according to the prediction result;
and acquiring splitting characteristics on a decision path corresponding to the decision tree, wherein the splitting characteristics are reasons for causing the prediction result.
15. The apparatus according to any of claims 9-12, wherein said positive sample data and said negative sample data are partitioned from a plurality of sample data according to one or more of growth trend analysis, distance calculation and clustering methods.
16. The apparatus according to any of claims 9-15, characterized in that the fault prediction model is a random forest model.
17. A failure prediction device, characterized in that the failure prediction device comprises a processor and a memory; the processor is configured to execute a computer program stored in the memory to cause the failure prediction device to implement the method of any one of claims 1-8.
18. A computer-readable storage medium, in which a computer program is stored which, when run on a processor, is adapted to carry out the method of any one of claims 1-8.
CN202011596329.0A 2020-12-29 2020-12-29 Fault prediction method and related device Pending CN114756420A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011596329.0A CN114756420A (en) 2020-12-29 2020-12-29 Fault prediction method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011596329.0A CN114756420A (en) 2020-12-29 2020-12-29 Fault prediction method and related device

Publications (1)

Publication Number Publication Date
CN114756420A true CN114756420A (en) 2022-07-15

Family

ID=82324333

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011596329.0A Pending CN114756420A (en) 2020-12-29 2020-12-29 Fault prediction method and related device

Country Status (1)

Country Link
CN (1) CN114756420A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115238837A (en) * 2022-09-23 2022-10-25 荣耀终端有限公司 Data processing method and device, electronic equipment and storage medium
CN116910006A (en) * 2023-07-24 2023-10-20 深圳市盛弘新能源设备有限公司 New energy battery-based data compression storage processing method and system
CN116910006B (en) * 2023-07-24 2024-03-29 深圳市盛弘新能源设备有限公司 New energy battery-based data compression storage processing method and system

Similar Documents

Publication Publication Date Title
US10311368B2 (en) Analytic system for graphical interpretability of and improvement of machine learning models
CN108986869B (en) Disk fault detection method using multi-model prediction
CN105488539B (en) The predictor method and device of the generation method and device of disaggregated model, power system capacity
KR102068715B1 (en) Outlier detection device and method which weights are applied according to feature importance degree
CN112633601B (en) Method, device, equipment and computer medium for predicting disease event occurrence probability
EP1958034B1 (en) Use of sequential clustering for instance selection in machine condition monitoring
CN111612041A (en) Abnormal user identification method and device, storage medium and electronic equipment
CN108847022B (en) Abnormal value detection method of microwave traffic data acquisition equipment
US10394631B2 (en) Anomaly detection and automated analysis using weighted directed graphs
CN114756420A (en) Fault prediction method and related device
CN112437053B (en) Intrusion detection method and device
CN113871009A (en) Sepsis prediction system, storage medium and apparatus in intensive care unit
CN112951311A (en) Hard disk fault prediction method and system based on variable weight random forest
CN112733146A (en) Penetration testing method, device and equipment based on machine learning and storage medium
CN115112372A (en) Bearing fault diagnosis method and device, electronic equipment and storage medium
CN111275101A (en) Fault identification method and device for aircraft hydraulic system and readable storage medium
CN117094184B (en) Modeling method, system and medium of risk prediction model based on intranet platform
CN112685374B (en) Log classification method and device and electronic equipment
CN112749003A (en) Method, apparatus and computer-readable storage medium for system optimization
CN115392351A (en) Risk user identification method and device, electronic equipment and storage medium
US11640558B2 (en) Unbalanced sample classification method and apparatus
WO2022183019A9 (en) Methods for mitigation of algorithmic bias discrimination, proxy discrimination and disparate impact
ZUBEDI et al. Implementation of Winsorizing and random oversampling on data containing outliers and unbalanced data with the random forest classification method
CN113435655B (en) Sector dynamic management decision method, server and system
KR102572192B1 (en) Auto Encoder Ensemble Based Anomaly Detection Method and System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination