CN115860243A - Fault prediction method and system based on industrial Internet of things data - Google Patents

Fault prediction method and system based on industrial Internet of things data

Info

Publication number
CN115860243A
Authority
CN
China
Prior art keywords
prediction method
industrial internet
time series
multivariate time
fault
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211607140.6A
Other languages
Chinese (zh)
Inventor
王常玺
王婷
李康
李真林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN202211607140.6A priority Critical patent/CN115860243A/en
Publication of CN115860243A publication Critical patent/CN115860243A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of artificial intelligence, and particularly relates to a fault prediction method and system based on industrial Internet of Things data. The method comprises the following steps: step 1, inputting a multivariate time series collected by the industrial Internet of Things; step 2, converting the multivariate time series into a pattern bag by using a discretization method; and step 3, applying a classifier to the pattern bag to obtain a fault prediction result. The invention also provides a system for implementing the method. By optimizing the algorithm and parameters of the method, the invention achieves good prediction performance and therefore has good application prospects.

Description

Fault prediction method and system based on industrial Internet of things data
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a fault prediction method and system based on industrial Internet of things data.
Background
A multivariate time series (MTS) is generated when multiple interconnected data streams are recorded over a period of time. MTS are widely used in many fields, such as speech recognition, EEG/ECG signal anomaly detection, smart homes, machine monitoring, energy prediction (smart grids), and location tracking. On a modern industrial production line, the equipment operation data acquired through the Internet of Things is MTS data.
MTS are difficult to handle because an MTS sample contains multiple observations at each instant in time. To classify MTS, the contributions of several features therefore need to be considered simultaneously. A variety of approaches have been developed to address this problem, for example pattern mining (Batal et al. 2012, Batal et al. 2009, Kadous et al. 2005), classification methods (Chandrakala et al. 2010, Nguyen et al. 2011, Orsenigo et al. 2010), and similarity measures (Chen et al. 2013, Yang et al. 2005, Yoon et al. 2005).
At the same time, early classification of MTS is also an important problem. For example, analyzing the MTS generated by sensors monitoring a pulp and paper making process can identify anomalies as early as possible and alert workers before a paper break occurs. As another example, analyzing the MTS generated by patient monitoring and identifying abnormalities can provide emergency alerts to the physician. To date, there has been little research on early classification of MTS data, except for Ghalwash et al (2012, doi. They propose a shapelet consisting of multiple segments, all of which are extracted simultaneously in the same sliding time window. However, the development of multivariate time series classification models is still at an early stage, their application scenarios remain limited, and predictive maintenance strategies for industrial equipment face several problems; for example, an unsuitable core data representation leads to poor model performance. There is therefore still a need to develop new multivariate time series classification models for the predictive maintenance of industrial equipment.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method for predicting faults by using MTS.
A fault prediction method based on industrial Internet of things data comprises the following steps:
step 1, inputting a multivariate time series collected by the industrial Internet of Things;
step 2, converting the multivariate time series into a pattern bag by using a discretization method;
step 3, applying a classifier to the pattern bag to obtain a fault prediction result.
Preferably, in step 2, the discretization method is SAX.
Preferably, in step 2, each univariate time series included in the multivariate time series is converted into words by SAX using a sliding window, and the words are counted to form a pattern bag.
Preferably, the size of the sliding window is 100 to 1000.
In step 3, the calculation process of the classifier includes: the features are selected using chi-square test, and then the fault is predicted using a logistic regression model.
Preferably, in step 3, the size of the feature set in the chi-square test is 20 to 2000.
Preferably, the multivariate time series is collected from a papermaking plant.
The invention also provides a system for implementing the above fault prediction method based on industrial Internet of Things data, which comprises:
an input module for inputting a multivariate time series collected by the industrial Internet of Things;
a multivariate time series conversion module for converting the multivariate time series into a pattern bag by using a discretization method;
a prediction module for applying a classifier to the pattern bag to obtain a fault prediction result;
and an output module for outputting the fault prediction result.
The present invention also provides a computer-readable storage medium having stored thereon a computer program for implementing the above-described industrial internet of things data-based failure prediction method.
The invention provides a method and a system for predicting faults by using MTS. The development of multivariate time series classification models is still at an early stage, and their application scenarios remain limited. The invention is the first to apply a multivariate time series classification model to failure prediction (a predictive maintenance model) for the industrial Internet of Things; it realizes fault diagnosis for the industrial Internet of Things and has good application prospects.
In the preferred scheme, the invention optimizes the discretization method for converting the MTS and the classifier for obtaining the prediction result, can achieve good prediction performance and has good application prospect.
Obviously, many modifications, substitutions, and variations are possible in light of the above teachings of the invention, without departing from the basic technical spirit of the invention, as defined by the following claims.
The present invention will be described in further detail with reference to the following examples. This should not be understood as limiting the scope of the above-described subject matter of the present invention to the following examples. All the technologies realized based on the above contents of the present invention belong to the scope of the present invention.
Drawings
FIG. 1 is a graph showing the correlation between variables in Experimental example 1;
FIG. 2 visualizes FPR, FNR, recall, and precision against feature set size for the training set and the test set in Experimental example 1;
FIG. 3 visualizes FPR, FNR, recall, and precision against window size for the training set and the test set in Experimental example 1.
Detailed Description
It should be noted that, in the embodiment, the algorithm of the steps of data acquisition, transmission, storage, processing, etc. which are not specifically described, as well as the hardware structure, circuit connection, etc. which are not specifically described, can be implemented by the contents disclosed in the prior art.
Embodiment 1 fault prediction method and system based on industrial Internet of things data
The system provided by the embodiment comprises:
an input module for inputting a multivariate time series collected by the industrial Internet of Things;
a multivariate time series conversion module for converting the multivariate time series into a pattern bag (bag of patterns, BOP) by using a discretization method;
a prediction module for applying a classifier to the pattern bag to obtain a fault prediction result;
and an output module for outputting the fault prediction result.
The method for predicting equipment failure with this system is as follows. A pattern bag is generated using Symbolic Aggregate approXimation (SAX) (Schäfer, P. and Leser, U. 2017. Multivariate time series classification with WEASEL+MUSE. arXiv preprint arXiv:1711.11343), and a chi-square test is carried out for feature selection with the bag of words as input. The time series is then classified using logistic regression (based on Schäfer, P. and Högqvist, M. SFA: a symbolic Fourier approximation and index for similarity search in high dimensional datasets. Proceedings of the 15th International Conference on Extending Database Technology, 2012. ACM, 516-527) to obtain a fault prediction result.
The method specifically comprises the following steps:
Step 1, inputting a multivariate time series collected by the industrial Internet of Things; the multivariate time series is generated by the operation of an industrial manufacturing facility, such as a papermaking facility of a pulp and paper mill.
Step 2, converting the multivariate time series into a pattern bag by using a discretization method;
In particular, the BOP extracts subsequences of the multivariate time series and discretizes each real-valued subsequence into a word, which is a sequence of symbols over a predefined alphabet. A classification model can be built on the BOP representation by constructing feature vectors from word counts and then applying a classifier to the selected features. Two discretization methods recently used for converting time series into pattern bags are SAX (Senin et al, 2013) and the Symbolic Fourier Approximation (SFA). They differ in that SAX discretizes segment means, whereas SFA discretizes Fourier transform coefficients. SFA is considered to adapt to the data better than SAX. In this embodiment the SAX method is adopted, because it can be applied more simply and accurately to non-uniformly sampled time series data.
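As a minimal sketch of how one subsequence could be turned into a word, the following follows the commonly described SAX procedure: z-normalize the window, average it over w segments, and map each segment mean to a letter using breakpoints that split the standard normal distribution into a equally probable regions. The source does not spell out these sub-steps, so they are an assumption here, and the function name `sax_word` is hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def sax_word(window, w, a):
    """Convert one real-valued subsequence into a word of w letters
    over an alphabet of size a (a <= 26, lower-case letters)."""
    assert len(window) >= w and 2 <= a <= 26
    mu, sigma = mean(window), pstdev(window)
    # z-normalise; a constant window is mapped to all zeros
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in window]
    # segment boundaries for w nearly equal-length segments
    n = len(z)
    bounds = [k * n // w for k in range(w + 1)]
    seg_means = [mean(z[bounds[k]:bounds[k + 1]]) for k in range(w)]
    # breakpoints cutting N(0, 1) into a equiprobable regions
    cuts = [NormalDist().inv_cdf(j / a) for j in range(1, a)]
    letters = "abcdefghijklmnopqrstuvwxyz"
    # a segment mean gets the letter indexed by the number of breakpoints below it
    return "".join(letters[sum(c <= m for c in cuts)] for m in seg_means)

print(sax_word([0, 0, 0, 9, 9, 9, 9, 9, 9], w=3, a=3))
```

With w = 3 and a = 3 the toy window above yields the word "acc": the first segment lies well below the mean and the other two lie above it.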
To apply the SAX method, $L$ is first defined as the size of the sliding window that extracts time series subsequences. A sliding time window is used here, i.e. the window moves forward one time point at a time. In addition, the SAX parameters, word size $w$ and alphabet size $a$, are defined. For an MTS, each univariate time series it contains is converted into words using SAX. Assume that the MTS is expressed as $X = [x_1, x_2, \ldots, x_n]$, $x_i = [x_{1i}, x_{2i}, \ldots, x_{ti}]^T$, $i = 1, 2, \ldots, n$.
The $t$-th window extracted from the univariate time series $x_1$, namely $x_{t1}, x_{(t+1)1}, \ldots, x_{(t+L-1)1}$, is converted into a word of $w$ letters. For example, if $w = 3$, the data in the window may be converted to "acc". The SAX procedure is repeated for each univariate time series. The resulting BOP representations of the univariate time series are then combined, so that for each window there is one word per series ($w$ letters per word). These words form a word vector of length $n$, denoted $Z_t = [z_{t1}, z_{t2}, \ldots, z_{tn}]$. By sliding the window forward, a word design matrix is constructed for classification. To match the design matrix with the response $Y = [y_1, y_2, \ldots, y_t, \ldots]^T$, $Z_{t-1}$ is paired with $y_t$ to form one instance. The features used for prediction therefore precede the response in time.
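The windowing and the one-step lag between features and response can be sketched as below. This is an illustration under stated assumptions: `to_word` is a hypothetical stand-in for the SAX discretizer, and the labels are assumed to be indexed by window, which the source does not state explicitly.

```python
def word_design_matrix(series, L, to_word):
    """series: list of n univariate series of equal length T.
    Returns one row Z_t = [z_t1, ..., z_tn] per window position,
    moving the window forward one time point at a time."""
    T = len(series[0])
    return [[to_word(s[t:t + L]) for s in series] for t in range(T - L + 1)]

def lagged_instances(Z, y):
    """Pair Z_{t-1} with y_t so the features precede the response in time."""
    return [(Z[t - 1], y[t]) for t in range(1, min(len(Z), len(y)))]

# Stand-in discretizer (NOT SAX): "u" if the window ends at or above its start.
def up_down(window):
    return "u" if window[-1] >= window[0] else "d"

series = [[1, 2, 3, 2, 1], [5, 4, 3, 4, 5]]
Z = word_design_matrix(series, L=3, to_word=up_down)
instances = lagged_instances(Z, y=[0, 0, 1])
print(Z)
print(instances)
```

The first row of the design matrix has no earlier window to borrow features from, so the number of instances is one fewer than the number of windows.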
Step 3, applying a classifier to the pattern bag to obtain a fault prediction result.
Specifically, once the time series is represented with the BOP, the classifier works on words rather than numerical values; the words become the features. For feature screening, a chi-square test is performed on these words. All words appearing in the design matrix are grouped into a bag $B$; all words belonging to negative instances (category 0) are placed in bag $B_0$, and all words belonging to positive instances (category 1) are placed in bag $B_1$. This grouping yields the information required to compute the chi-squared score of each feature: the total number of positive instances containing the feature ($A$), the total number of instances containing the feature ($M$), the total number of positive instances ($P$), and the total number of instances ($N$). The chi-squared score is calculated as follows:
[Equation image BDA0003999035420000051: chi-squared score expressed in terms of A, M, P and N]
As a statistical test of independence, the chi-square test retains the null hypothesis that the two events are independent when the chi-squared score is small. Conversely, the larger $\chi^2$ is, the stronger the association between the two events, i.e. between the feature and the label. This step therefore calculates the chi-squared score of each feature and selects the highest-scoring features for the subsequent logistic classification. The size $S$ of the feature set (the number of words it contains) is a user-defined value.
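The formula itself is only available as an image in this text. As a hedged sketch, the standard chi-squared statistic for a 2×2 contingency table, written with the four counts defined above, is χ² = N(AN − MP)² / (P·M·(N − P)·(N − M)); whether the patent uses exactly this form is an assumption. The function names are hypothetical, and words are treated as plain strings that count once per instance containing them.

```python
def chi2_score(A, M, P, N):
    """2x2 chi-squared statistic from the counts A, M, P, N."""
    denom = P * M * (N - P) * (N - M)
    if denom == 0:
        return 0.0  # degenerate table: the feature or the label is constant
    return N * (A * N - M * P) ** 2 / denom

def select_top_features(instances, S):
    """instances: list of (words, label) pairs with label in {0, 1}.
    Returns the S highest-scoring words and all scores."""
    N = len(instances)
    P = sum(label for _, label in instances)
    M, A = {}, {}
    for words, label in instances:
        for wd in set(words):  # count each word once per instance
            M[wd] = M.get(wd, 0) + 1
            A[wd] = A.get(wd, 0) + label
    scores = {wd: chi2_score(A[wd], M[wd], P, N) for wd in M}
    # highest score first; ties broken alphabetically for reproducibility
    ranked = sorted(scores, key=lambda wd: (-scores[wd], wd))
    return ranked[:S], scores

toy = [(["ab", "cc"], 1), (["ab", "ca"], 1), (["bb", "cc"], 0), (["bb", "ca"], 0)]
top, scores = select_top_features(toy, S=2)
print(top, scores)
```

In the toy data, "ab" occurs only in positive instances and "bb" only in negative ones, so both receive the maximum score, while "cc" and "ca" are evenly split and score 0.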
In the logistic regression algorithm, the occurrence frequency of each feature is counted, and a feature vector is constructed in the form of a histogram, so that the design matrix of words is further converted into a discrete count value matrix. Then, a conventional logistic regression model is applied to the constructed feature matrix.
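The counting and classification step could look like the sketch below. It is not the patent's implementation: the logistic model is fitted here with plain batch gradient descent to keep the example dependency-free, and the function names, learning rate and epoch count are illustrative assumptions. The 0.5 decision threshold follows the convention mentioned in the experimental section.

```python
import math

def count_vector(words, features):
    """Histogram of how often each selected feature word occurs in one instance."""
    return [words.count(f) for f in features]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b, threshold=0.5):
    """Label an instance 1 (fault) when its predicted probability reaches the threshold."""
    return [int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= threshold)
            for xi in X]

features = ["ab", "bb"]  # e.g. the words kept by feature selection
docs = [["ab", "cc"], ["ab", "ca"], ["bb", "cc"], ["bb", "ca"]]
X = [count_vector(d, features) for d in docs]
w, b = fit_logistic(X, [1, 1, 0, 0])
print(X, predict(X, w, b))
```

In practice a library implementation of logistic regression would normally replace the hand-written fitting loop; the part specific to this method is the conversion of words into count vectors.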
The technical solution of the present invention will be further described by experiments.
Experimental example 1 comparison of Performance of different classifiers
1. Experimental methods
This experimental example has two groups. The first group uses the method and system of Embodiment 1;
the second group replaces the classifier of Embodiment 1 with SAX-VSM, as proposed by Senin and Malinchik (2013) (Senin, P. and Malinchik, S. SAX-VSM: Interpretable time series classification using SAX and vector space model. 2013 IEEE 13th International Conference on Data Mining, 2013. IEEE, 1175-1180). The BOP representation from SAX is fed into a weighting scheme that assigns each term t a tf·idf weight, the product of the term frequency (tf) and the inverse document frequency (idf). Under the SAX-VSM scheme, all words in $B$ are treated as features, so feature selection is not needed. A weight vector covering all words is assigned to each of the bags $B_0$ and $B_1$ according to the frequency of occurrence of each word. Note that a word that appears in $B$ but not in $B_0$ (or $B_1$) is treated as having frequency 0 in $B_0$ (or $B_1$).
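The tf·idf weighting over the two bags can be sketched as follows. The particular variant used here, tf = log(1 + f) for f > 0 and idf = log10(number of bags / number of bags containing the word), with cosine similarity for classification, is an assumption; the text above only states that the weight is the product of tf and idf. Function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(bags):
    """bags: {class_label: list of words}. Returns {class_label: {word: weight}}
    over the union vocabulary; a word absent from a bag has frequency 0 there."""
    counts = {c: Counter(ws) for c, ws in bags.items()}
    vocab = set().union(*counts.values())
    weights = {c: {} for c in bags}
    for wd in vocab:
        df = sum(1 for c in counts if counts[c][wd] > 0)
        idf = math.log10(len(bags) / df)
        for c in counts:
            f = counts[c][wd]
            tf = math.log(1 + f) if f > 0 else 0.0
            weights[c][wd] = tf * idf
    return weights

def classify(words, weights):
    """Pick the class whose weight vector is most cosine-similar to the word counts."""
    q = Counter(words)
    def cosine(wv):
        dot = sum(q[wd] * wv.get(wd, 0.0) for wd in q)
        nq = math.sqrt(sum(v * v for v in q.values()))
        nw = math.sqrt(sum(v * v for v in wv.values()))
        return dot / (nq * nw) if nq and nw else 0.0
    return max(weights, key=lambda c: cosine(weights[c]))

weights = tfidf_vectors({0: ["aa", "aa", "bc"], 1: ["cc", "bc"]})
print(weights)
print(classify(["aa", "bc"], weights), classify(["cc"], weights))
```

Under this variant, with only two bags, any word that occurs in both receives idf = log10(2/2) = 0 and therefore does not contribute to the classification.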
2. Experimental data
This example uses a "paper making" data set generated by a papermaking facility of a pulp and paper mill. FIG. 1 is a correlation plot of the MTS variables in the data set. The dark circles scattered across the plot indicate widespread correlation between pairs of variables, so the variables in this MTS must be analyzed jointly rather than as separate univariate time series.
3. Results of the experiment
For the discretization, the window size is set to L = [100, 500, 1000]. For feature selection with logistic regression, the size of the feature set is set to S = [20, 200, 2000]; by convention, the threshold for positive events is 0.5. For SAX-VSM, the word size is set to w = 10 and the alphabet size to a = 20. The first 8000 observations of the "paper making" data set are reserved for training and the rest are used for testing. The BOP representation computed with SAX is built on the original variables and their derivatives, namely $x'_{ti} = |x_{ti} - x_{(t-1)i}|$, $i = 1, 2, \ldots$
Thus, the experimental example has 122 variables as inputs for SAX.
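A minimal sketch of how the derivative variables could be appended to the originals is given below. How the first time point is handled is not stated in the source; dropping it so that all series keep the same length is an assumption, and the function name is hypothetical.

```python
def add_abs_differences(series):
    """series: list of univariate series.
    Returns the originals (first time point dropped) followed by
    the absolute first differences |x_t - x_(t-1)| of each series."""
    originals = [s[1:] for s in series]
    diffs = [[abs(s[t] - s[t - 1]) for t in range(1, len(s))] for s in series]
    return originals + diffs

augmented = add_abs_differences([[1, 4, 2], [0, 0, 5]])
print(augmented)
```

The number of input variables doubles, since each original variable contributes one derivative variable.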
To measure classification performance, a series of metrics is used. FP denotes false positives, FN false negatives, TP true positives and TN true negatives. The metrics are calculated according to the definitions in Table 1. On real data, TP and FN may both be zero, or TP, FP and FN may all be zero. In such cases, the corresponding metric is reported as "NA" rather than a numerical value.
TABLE 1 Performance metrics and definitions
Performance metric    Calculation
FPR                   FP/(FP+TN)
FNR                   FN/(FN+TP)
Accuracy              (TP+TN)/(TP+TN+FP+FN)
Recall                TP/(TP+FN)
Precision             TP/(TP+FP)
F1-score              2TP/(2TP+FP+FN)
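The definitions in Table 1, including the "NA" case for a zero denominator, can be sketched as follows. Representing "NA" with Python's `None` and the function names are illustrative choices, not part of the source.

```python
def ratio(num, den):
    """Return num/den, or None (standing for "NA") when the denominator is zero."""
    return num / den if den else None

def classification_metrics(tp, fp, tn, fn):
    """Compute the Table 1 metrics from confusion-matrix counts."""
    return {
        "FPR": ratio(fp, fp + tn),
        "FNR": ratio(fn, fn + tp),
        "Accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "Recall": ratio(tp, tp + fn),
        "Precision": ratio(tp, tp + fp),
        "F1-score": ratio(2 * tp, 2 * tp + fp + fn),
    }

# A classifier that predicts "normal" for every sample (illustrative counts):
m = classification_metrics(tp=0, fp=0, tn=90, fn=10)
print(m)
```

For a model that never predicts the positive class, Precision is "NA", Recall and F1-score are 0, and FNR is 1, even though Accuracy can still look high on imbalanced data.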
Table 2 shows the performance of the method of Embodiment 1 (feature selection + logistic regression) during the training and testing phases:
TABLE 2 Classification performance of feature selection + logistic regression
Figure BDA0003999035420000061
Table 3 shows the metrics of the SAX-VSM method used in the comparison group:
TABLE 3 Classification Performance of SAX-VSM
Figure BDA0003999035420000071
Comparing Table 2 and Table 3: for the SAX-VSM model of the comparison group (Table 3), the Recall, Precision and F1-score values are NA or 0 and the FNR is 1. These values indicate that, on the test set, every prediction of the model is category 0, i.e. normal; the model therefore has no ability to identify anomalies (category 1). For the method of Embodiment 1 (Table 2), the Recall, Precision and F1-score values on the test set are all greater than 0 when the feature set size is 200 or 2000. Larger Accuracy, Recall, Precision and F1-score are better, and smaller FPR and FNR are better. The method of Embodiment 1 (feature selection + logistic regression) therefore performs significantly better than the SAX-VSM method of the comparison group.
In addition, FIG. 2 shows the effect of the feature set size on the classification performance of the method. As S increases, nearly all FNR values decrease, while Recall and Precision increase. The classification performance of the method thus improves as the feature set grows.
FIG. 3 shows the effect of the sliding window size on the classification performance of the method. As can be seen from Table 2 and FIG. 3, the window size affects the individual metrics differently, but the window sizes selected in this experimental example, in the range of 100 to 1000, give good classification performance.
The embodiment and the experimental example show that the method and the system for predicting the equipment fault by using the MTS are realized, and the algorithm and the parameters of the method are optimized to achieve good prediction performance, so that the method and the system have good application prospects.

Claims (9)

1. A fault prediction method based on industrial Internet of things data is characterized by comprising the following steps:
step 1, inputting a multivariate time sequence collected by an industrial Internet of things;
step 2, converting the multivariate time sequence into a pattern bag by using a discretization method;
step 3, applying a classifier to the pattern bag to obtain a fault prediction result.
2. The failure prediction method of claim 1, characterized in that: in step 2, the discretization method is SAX.
3. The failure prediction method of claim 2, characterized in that: in step 2, each univariate time series contained in the multivariate time series is converted into words through SAX by using a sliding window, and the words are counted to form a pattern bag.
4. A failure prediction method as claimed in claim 3, characterized in that: the size of the sliding window is 100-1000.
5. The failure prediction method of claim 1, characterized in that: in step 3, the calculation process of the classifier includes: the features are selected using chi-square test, and then the fault is predicted using a logistic regression model.
6. The failure prediction method of claim 5, characterized in that: in step 3, in the chi-square test, the size of the feature set is 20-2000.
7. The failure prediction method according to any one of claims 1 to 6, characterized in that: the multivariate time series is collected from a papermaking plant.
8. A system for implementing the fault prediction method based on the industrial Internet of things data as claimed in any one of claims 1 to 7, the system is characterized by comprising:
the input module is used for inputting a multivariate time sequence acquired by the industrial Internet of things;
a multivariate time series conversion module for converting the multivariate time series into a pattern bag by using a discretization method;
the prediction module is used for applying a classifier to the pattern bag to obtain a fault prediction result;
and the output module is used for outputting a fault prediction result.
9. A computer-readable storage medium, characterized in that: a computer program for implementing the industrial internet of things data-based fault prediction method according to any one of claims 1 to 7 is stored thereon.
CN202211607140.6A 2022-12-14 2022-12-14 Fault prediction method and system based on industrial Internet of things data Pending CN115860243A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211607140.6A CN115860243A (en) 2022-12-14 2022-12-14 Fault prediction method and system based on industrial Internet of things data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211607140.6A CN115860243A (en) 2022-12-14 2022-12-14 Fault prediction method and system based on industrial Internet of things data

Publications (1)

Publication Number Publication Date
CN115860243A true CN115860243A (en) 2023-03-28

Family

ID=85672897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211607140.6A Pending CN115860243A (en) 2022-12-14 2022-12-14 Fault prediction method and system based on industrial Internet of things data

Country Status (1)

Country Link
CN (1) CN115860243A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117170303A (en) * 2023-11-03 2023-12-05 傲拓科技股份有限公司 PLC fault intelligent diagnosis maintenance system based on multivariate time sequence prediction

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112418482A (en) * 2020-10-26 2021-02-26 南京邮电大学 Cloud computing energy consumption prediction method based on time series clustering
CN113836820A (en) * 2021-10-20 2021-12-24 联想新视界(江西)智能科技有限公司 Equipment health assessment and fault diagnosis algorithm based on autocorrelation model and multivariate monitoring method
CN114722950A (en) * 2022-04-14 2022-07-08 武汉大学 Multi-modal multivariate time sequence automatic classification method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PENIBLAST: "多变量时间序列分类综述 (一)", pages 1 - 6, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/423417458> *
张杰;赵峰;孙曰瑶;: "基于基序及其时序关系的多变量数据流分类研究", 情报杂志, no. 09 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117170303A (en) * 2023-11-03 2023-12-05 傲拓科技股份有限公司 PLC fault intelligent diagnosis maintenance system based on multivariate time sequence prediction
CN117170303B (en) * 2023-11-03 2024-01-26 傲拓科技股份有限公司 PLC fault intelligent diagnosis maintenance system based on multivariate time sequence prediction


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination