CN110008253B - Industrial data association rule mining and abnormal working condition prediction method - Google Patents

Publication number
CN110008253B
Authority
CN
China
Prior art keywords
data
sequence
fitting
association
line segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910244856.6A
Other languages
Chinese (zh)
Other versions
CN110008253A (en)
Inventor
徐正国
王豆
陈积明
程鹏
孙优贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201910244856.6A
Publication of CN110008253A
Application granted
Publication of CN110008253B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for industrial data association rule mining and abnormal working condition prediction, applicable to fault prediction and health management of industrial processes. The method introduces association rule mining into industrial equipment fault prediction and uses an association rule mining algorithm to find the associations between operating parameters. Reflecting the characteristics of industrial data, it starts from the variation trends of the equipment's operating parameters: a transaction set is generated with the variation trend of the operating parameters as the primary index, association rules between the parameters are mined from that transaction set, and the mined rules are introduced into the prediction of abnormal working conditions of industrial equipment to obtain a more accurate prediction result. The method has great application value for fault prediction and health management in engineering.

Description

Industrial data association rule mining and abnormal working condition prediction method
Technical Field
The invention belongs to the technical field of reliability maintenance engineering, and relates to an industrial data association rule mining and abnormal working condition prediction method based on a two-stage frequent item set generation strategy.
Background
With the continuous emergence of complex systems and the growing demand for real-time monitoring of industrial processes, modern industrial equipment is often fitted with multiple sensors that monitor its operating state during operation. At the same time, multiple fault modes may occur while the equipment runs, and a single fault may correspond to several symptoms. In this situation the information from a single sensor cannot fully reflect the operating state of the equipment, which has given rise to fault prediction based on multi-sensor information. Such prediction aims to analyze the operating state of the equipment using comprehensive sensor information and thereby make equipment diagnosis and prediction more reliable. With the continuous development of sensing technology, the use of multiple sensors for condition monitoring, fault diagnosis and prediction of equipment has become a trend.
In the field of fault prediction, work that combines association rule mining with fault prediction is still rare. For time series data, an equipment fault or failure is usually reflected in the parameters or in features extracted from them, and prediction is usually carried out on the variation trend of those parameters or features. Mining the association rules among the parameters therefore yields more complete parameter information, that is, more complete information about the equipment's operating state, and provides a basis for subsequent prediction.
Disclosure of Invention
Aiming at the current situation of the prior art, the invention aims to solve the problem that the association rule of sensor data is rarely considered in the existing data-driven prediction technology, provides an equipment abnormal working condition prediction method based on the operation parameter association rule, and constructs a more applicable wavelet neural network to perform abnormal working condition prediction (fault prediction).
The concept of the present invention will now be explained as follows:
The invention uses association rules to describe the associations between the operating parameters of an industrial process, and studies abnormal working condition prediction based on association rule mining of time series data. To mine association rules at the sequence level for time series data, the invention provides a time series association rule mining algorithm with a two-stage frequent item set generation process. In the first stage, the trend information of the time series is extracted as the basic pattern for association rule mining, and the frequent item sets of time series change patterns are found. In the second stage, building on those frequent item sets, the frequent item sets whose basic pattern is an entire sequence are found, and association rules are mined for every pair of sequences. The system variables involved in the mined association rules are then used to predict abnormal working conditions, and the association rules are introduced into a wavelet neural network to improve prediction accuracy. Because the method takes the association rules between operating parameters into account, it can obtain a more accurate fault prediction result.
According to the invention concept, the invention provides an industrial data association rule mining and predicting method based on a two-stage frequent item set generation strategy, which comprises the following specific steps:
Step 1: perform piecewise linear representation and symbolization of the time series data, and construct a discrete data set suitable for association rule mining;
Step 2: generate the frequent item sets of the data set using a two-stage frequent item set mining algorithm;
Step 3: generate association rules from the frequent item sets, and extract the association rules that meet the minimum support and minimum confidence thresholds;
Step 4: introduce the association rule mining results into a wavelet neural network and predict the abnormal working conditions of the industrial equipment.
Based on the above scheme, each step can be implemented as follows:
Preferably, step 1 comprises the following substeps:
Step 1.1: the measurement time series of the sensors are
Figure GDA0002883649050000021
where N is the number of sensors and k is the length of the time series; the initial fitting start point is
Figure GDA0002883649050000022
the initial fitting end point is
Figure GDA0002883649050000023
the fitting start point is denoted
Figure GDA0002883649050000024
the fitting end point is denoted
Figure GDA0002883649050000025
and the fitting error threshold is ω_E;
Step 1.2: for each
Figure GDA0002883649050000026
perform piecewise fitting as follows:
1.2.1 initialize the segmentation point counter count = 1;
1.2.2 for each fitting start point
Figure GDA0002883649050000027
in turn, perform steps 1) to 4):
1) first, compute end = start + h;
2) fit the data
Figure GDA0002883649050000028
by the least squares method, and compute the fitting error ERR;
3) if the fitting error ERR does not exceed the fitting error threshold ω_E, set h = h + 1 and return to step 1);
4) if the fitting error ERR exceeds the fitting error threshold ω_E, obtain, for
Figure GDA0002883649050000029
the line segment fitting sequence
Figure GDA00028836490500000210
set start = start + h, record the segmentation point
Figure GDA00028836490500000211
reset h = 2, and set count = count + 1;
1.2.3 repeat step 1.2.2 until end > k, obtaining the fitted piecewise-linear time series
Figure GDA0002883649050000031
and, from the segmentation points
Figure GDA0002883649050000032
the segmentation point sequence P_i;
Step 1.3: time series after fitting any sensor
Figure GDA0002883649050000033
Is marked as Yk={y1,y2,…,ykAnd extracting trend and numerical value information of each fitting line segment, and representing one fitting line segment s in the following triple modei
Figure GDA0002883649050000034
Wherein k isiWhich represents the slope of the line segment,
Figure GDA0002883649050000035
represents the span of the line segment on the time axis, riData { y } representing the growth rate of the line segment data corresponding to the line segmentj,yj+1,…,yj+h},
Figure GDA0002883649050000036
j is the starting point of the line segment;
for the line segmented time sequence YkAll the line segments in the sequence are subjected to triple representation to obtain a triple sequence Sn={s1,s2,…,snIn which n represents the time series XkThe number of segments after segmentation;
Step 1.4: cluster the line segments in the triple sequence and symbolize them, so that the symbols represent the different change patterns of the equipment or system. The similarity d_ij between line segments s_i and s_j is described by the Euclidean distance:
Figure GDA0002883649050000037
where d_ij is the similarity between line segments s_i and s_j (the smaller d_ij, the more similar the change patterns of the two segments), and ω_k and ω_r are weights;
then, using the similarity index d_ij, cluster S_n with the K-means clustering algorithm and assign the same symbol to line segments of the same class to represent a change pattern of the operating parameter, giving the symbolized sequence F_n = {f_1, f_2, …, f_n}, where f_1, f_2, …, f_n are the symbols assigned to line segments 1, 2, …, n;
step 1.5: measuring time sequence for every two sensors
Figure GDA0002883649050000038
And
Figure GDA0002883649050000039
merging its segment point sequence PiAnd PjIs denoted by Pij,nij-1 is PiAnd PjThe number of the combined segmentation points; and symbolizing the sequence according to the combined segmentation point pair
Figure GDA00028836490500000310
And
Figure GDA00028836490500000311
performing segmentation reconstruction to obtain reconstructed symbolic sequence
Figure GDA00028836490500000312
And
Figure GDA00028836490500000313
Preferably, step 2 comprises the following substeps:
Step 2.1: for the measurement time series
Figure GDA0002883649050000041
and
Figure GDA0002883649050000042
corresponding to operating parameters V_i and V_j respectively, the symbolized measurement sequences obtained from step 1 are
Figure GDA0002883649050000043
and
Figure GDA0002883649050000044
from which the transaction set is formed, each transaction being recorded as
Figure GDA0002883649050000045
The line segment class symbols contained in
Figure GDA0002883649050000046
and
Figure GDA0002883649050000047
are denoted respectively
Figure GDA0002883649050000048
and
Figure GDA0002883649050000049
The minimum support thresholds of the two stages are denoted minsup_1 and minsup_2;
Step 2.2: compute the support of each item in a single scan of the data set to obtain the frequent 1-item sets, following 2.2.1 to 2.2.3:
2.2.1: let σ(·) be the support count of an item or item set, initially 0; let the class symbols in
Figure GDA00028836490500000418
be denoted t_k, where t stands for a or b;
2.2.2: for each transaction
Figure GDA00028836490500000410
update σ(t_k) = σ(t_k) + 1;
2.2.3: for each t_k, if
Figure GDA00028836490500000411
is not less than the minimum support threshold minsup_1, t_k is regarded as a frequent 1-item set; retain t_k and record its support count; if
Figure GDA00028836490500000412
is less than the minimum support threshold minsup_1, t_k is not a frequent 1-item set;
Step 2.3: use the frequent 1-item sets t_k obtained in step 2.2 to form 2-item sets, and compute their support to find the frequent 2-item sets, as follows:
2.3.1: let a_p and b_q be the items retained after step 2.2 from the original line segment class symbols
Figure GDA00028836490500000413
and
Figure GDA00028836490500000414
respectively;
2.3.2: for each {a_p, b_q}, perform the following steps:
1) for each {a_p, b_q} present in
Figure GDA00028836490500000415
update σ({a_p, b_q}) = σ({a_p, b_q}) + 1;
2) if
Figure GDA00028836490500000416
is not less than minsup_1, {a_p, b_q} is regarded as a frequent 2-item set; retain {a_p, b_q} and record its support count;
Step 2.4: use the frequent 2-item sets {a_p, b_q} obtained in step 2.3 to compute the support of every two operating parameters over the whole data set and obtain the parameter-level frequent item sets, as follows: for the item set {V_i, V_j} formed by every two operating parameters V_i and V_j, compute σ({V_i, V_j}) = sum(σ({a_p, b_q})); if
Figure GDA00028836490500000417
is not less than the minimum support threshold minsup_2, retain {V_i, V_j}, record the corresponding support, and compute σ(V_i) = sum(σ(a_p)) and σ(V_j) = sum(σ(b_q)).
Preferably, step 3 comprises the following substeps:
Step 3.1: for each item set {V_i, V_j} obtained in step 2 that satisfies the support threshold, generate the association rules V_j → V_i and V_i → V_j; denote the minimum confidence threshold minconf;
Step 3.2: compute the confidence of each generated association rule and extract the association rules as follows: for each association rule V_i → V_j, compute
Figure GDA0002883649050000051
if conf(V_i → V_j) is not less than the minimum confidence threshold minconf, retain the association rule V_i → V_j and record its support and its confidence ω_i.
Preferably, step 4 comprises the following substeps:
Step 4.1: denote any group of associated parameters extracted by the association rules as {V_1, V_2, …, V_u}, where u is the number of associated parameters and V_u is the rule consequent, i.e. the target parameter; each association rule V_i → V_u, i = 1, 2, …, u−1, has a confidence, denoted ω_i; abnormal working conditions are predicted for the target parameter V_u using a wavelet neural network;
Step 4.2: construct the training samples: denote the preset prediction step l, and let the group of associated parameters extracted by association rule mining be {V_1, V_2, …, V_u}; the complete training data set they form is denoted
Figure GDA0002883649050000052
Construct the following matrix I_train as the training input of the neural network:
Figure GDA0002883649050000053
where each column of I_train is one training input sample; the training output O_train is:
Figure GDA0002883649050000054
Step 4.3: train the wavelet neural network with the constructed training samples: the input parameters are V_i, i = 1, 2, …, u−1, and the output parameter is V_u; at network initialization, the confidences ω_i, i = 1, 2, …, u−1, obtained from the association rules are used to set the initial weights between the network input layer and the hidden layer;
Step 4.4: predict on new data: denote the preset threshold for the occurrence of an abnormal working condition ω_p; for newly acquired sensor measurement data, use the model trained in step 4.3 to perform l-step prediction; if the drift of the predicted target parameter value relative to its initial normal value exceeds the threshold ω_p, an abnormal working condition is judged to occur.
Preferably, before the equipment fails, the model is rebuilt and retrained each time a predetermined number of new measurements has been collected, so as to obtain a more accurate prediction result.
The industrial data association rule mining and prediction method based on the two-stage frequent item set generation strategy can be used for complex industrial systems monitored by sensors. Mining the association rules of the operating parameters of industrial equipment yields the corresponding parameter associations, and introducing these associations into wavelet neural network prediction gives a more accurate prediction. The method provides firm support for subsequent equipment maintenance planning, benefits the maintenance management of equipment with strict reliability requirements, and has broad prospects in practical engineering applications.
Drawings
FIG. 1 shows the predicted result of variable 7 of IDV (13) in the example and the comparison with the actual value;
FIG. 2 shows the predicted result of the variable 11 of IDV (13) in the example and the comparison with the actual value;
FIG. 3 shows the predicted error rate of IDV (13) variable 7 in the example;
FIG. 4 shows the predicted error rate of the IDV (13) variable 11 in the example.
Detailed Description
The embodiments of the present invention will now be further described with reference to the accompanying drawings.
The following example illustrates the specific operating steps and verifies the effectiveness of the method on Tennessee Eastman (TE) process simulation data.
The data set is sampled at 3-minute intervals and records the variable measurements taken by each sensor at each sampling instant. For each operating condition (the normal operating state and the fault states under 21 preset faults), the simulation produces two types of data set: a training set and a test set. Each training set contains the measured values of all 52 variables from a 25-hour simulation run. Except for the training set collected in the normal operating state, the other 21 training sets introduce the fault after 1 hour of simulation and record only the following 24 hours of measurements. The training set for the normal operating state therefore has 500 observation samples, and each of the 21 fault training sets has 480. The 22 test sets each contain the variable measurements from a 48-hour simulation run, i.e. 960 samples per test set. In the simulation of the 21 process faults for the test sets, the fault is introduced after 8 hours, so in each fault test set the first 160 observation samples are normal data and the last 800 are fault data. In the TE process simulation model, only IDV(13) is a slowly varying fault, so this example uses the IDV(13) data for the experiments. The specific process of the industrial data association rule mining and abnormal working condition prediction method is as follows:
Step 1: perform piecewise linear representation and symbolization of the time series data, and construct a discrete data set suitable for association rule mining. This step comprises the following substeps:
Step 1.1: the measurement time series of the sensors are
Figure GDA0002883649050000071
where N is the number of sensors and k is the length of the time series; the initial fitting start point is
Figure GDA0002883649050000072
the initial fitting end point is
Figure GDA0002883649050000073
the fitting start point is denoted
Figure GDA0002883649050000074
the fitting end point is denoted
Figure GDA0002883649050000075
and the fitting error threshold is ω_E. Note that throughout this description, i and j denote sensor numbers when used as superscripts, and denote only ordinal indices, unrelated to sensor numbers, when used as subscripts.
Step 1.2: for each
Figure GDA0002883649050000076
perform piecewise fitting as follows:
1.2.1 initialize the segmentation point counter count = 1;
1.2.2 for each fitting start point
Figure GDA0002883649050000077
in turn, perform steps 1) to 4):
1) first, compute end = start + h;
2) fit the data
Figure GDA0002883649050000078
by the least squares method, and compute the fitting error ERR;
3) if the fitting error ERR does not exceed the fitting error threshold ω_E, set h = h + 1 and return to step 1);
4) if the fitting error ERR exceeds the fitting error threshold ω_E, obtain, for
Figure GDA0002883649050000079
the line segment fitting sequence
Figure GDA00028836490500000710
set start = start + h, record the segmentation point
Figure GDA00028836490500000711
reset h = 2, and set count = count + 1;
1.2.3 repeat step 1.2.2 until end > k, obtaining the piecewise-linear time series after least squares fitting
Figure GDA00028836490500000712
and, from the segmentation points
Figure GDA00028836490500000713
the segmentation point sequence P_i;
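The growing-window fitting of step 1.2 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation. It assumes that ERR is the root-mean-square residual of the least squares line (the text does not define ERR), that a segment is closed at the last window whose error stays within the threshold, and that indices are 0-based. The names `fit_line`, `segment_series` and `min_span` are hypothetical.

```python
def fit_line(ys, start, end):
    """Least squares line through (t, ys[t]) for t = start..end inclusive.

    Returns (slope, intercept, rms_error). Using the RMS residual as the
    fitting error ERR is an assumption; the source does not define ERR.
    """
    ts = range(start, end + 1)
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys[t] for t in ts) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    cov = sum((t - mean_t) * (ys[t] - mean_y) for t in ts)
    slope = cov / var_t
    intercept = mean_y - slope * mean_t
    sq_err = sum((ys[t] - (slope * t + intercept)) ** 2 for t in ts)
    return slope, intercept, (sq_err / n) ** 0.5


def segment_series(ys, err_threshold, min_span=2):
    """Return 0-based segmentation points, including both end points.

    The window [start, start + h] grows one sample at a time while the
    fit error stays within err_threshold. The initial window of span
    min_span (h = 2 in the text) is accepted without an error check.
    """
    last = len(ys) - 1
    breakpoints = [0]
    start = 0
    while start < last:
        h = min(min_span, last - start)
        while start + h + 1 <= last:
            _, _, err = fit_line(ys, start, start + h + 1)
            if err > err_threshold:
                break          # the longer window no longer fits
            h += 1
        start += h             # the next segment starts where this one ends
        breakpoints.append(start)
    return breakpoints


# A triangular signal splits at its peak (index 4).
print(segment_series([0, 1, 2, 3, 4, 3, 2, 1, 0], err_threshold=1e-6))
```

A stricter threshold produces more, shorter segments; a looser one merges neighbouring trends into a single line.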
Step 1.3: time series after fitting any sensor
Figure GDA00028836490500000714
Is marked as Yk={y1,y2,…,ykWith a plurality of line segments fitted by the least squares method described above. Extracting trend and numerical information of each fitting line segment, and representing one fitting line segment s in the following triple modei
Figure GDA0002883649050000081
Wherein k isiWhich represents the slope of the line segment,
Figure GDA0002883649050000082
represents the span of the line segment on the time axis, riData { y } representing the growth rate of the line segment data corresponding to the line segmentj,yj+1,…,yj+h},
Figure GDA0002883649050000083
j is the starting point of the line segment;
for the line segmented time sequence YkAll the line segments in the sequence are subjected to triple representation to obtain a triple sequence Sn={s1,s2,…,snIn which n represents the time series XkThe number of segments after segmentation;
Step 1.4: cluster the line segments in the triple sequence and symbolize them to represent the different change patterns of the equipment or system, in preparation for the subsequent association rule mining. The similarity d_ij between line segments s_i and s_j is described by the Euclidean distance:
Figure GDA0002883649050000084
where d_ij is the similarity between line segments s_i and s_j (the smaller d_ij, the more similar the change patterns of the two segments), and ω_k and ω_r are weights;
then, using the similarity index d_ij, cluster S_n with the K-means clustering algorithm and assign the same symbol to line segments of the same class to represent a change pattern of the operating parameter, giving the symbolized sequence F_n = {f_1, f_2, …, f_n}, where f_1, f_2, …, f_n are the symbols assigned to line segments 1, 2, …, n;
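Steps 1.3 and 1.4 can be illustrated with the sketch below. The formulas for the growth rate r_i and the distance d_ij are given only as figures in the source, so both are assumptions here: the growth rate is taken as the relative change between the first and last sample of the segment, and the distance as a weighted Euclidean distance over slope and growth rate with weights ω_k and ω_r. A plain Lloyd iteration with a deterministic start stands in for a library K-means; all function names are hypothetical.

```python
def segment_triple(ys, j, end):
    """Triple (slope, span, growth rate) for the segment covering ys[j..end]."""
    n = end - j + 1
    mean_t = (j + end) / 2
    mean_y = sum(ys[j:end + 1]) / n
    cov = sum((t - mean_t) * (ys[t] - mean_y) for t in range(j, end + 1))
    var_t = sum((t - mean_t) ** 2 for t in range(j, end + 1))
    slope = cov / var_t
    span = end - j
    # Assumed growth-rate definition: relative change over the segment.
    growth = (ys[end] - ys[j]) / abs(ys[j]) if ys[j] != 0 else 0.0
    return slope, span, growth


def weighted_distance(s, c, w_k=1.0, w_r=1.0):
    """Assumed form of d_ij: weighted Euclidean distance on slope and growth."""
    return (w_k * (s[0] - c[0]) ** 2 + w_r * (s[2] - c[2]) ** 2) ** 0.5


def kmeans_symbols(triples, n_clusters, w_k=1.0, w_r=1.0, n_iter=50):
    """Cluster triples and return one symbol ('a', 'b', ...) per segment."""
    # Deterministic start: centres spread evenly over the slope-sorted triples.
    order = sorted(triples, key=lambda s: s[0])
    step = (len(order) - 1) / max(n_clusters - 1, 1)
    centres = [order[round(i * step)] for i in range(n_clusters)]
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(n_clusters),
                key=lambda c: weighted_distance(s, centres[c], w_k, w_r))
            for s in triples
        ]
        new_centres = []
        for c in range(n_clusters):
            members = [s for s, lab in zip(triples, labels) if lab == c]
            if not members:            # keep the centre of an empty cluster
                new_centres.append(centres[c])
                continue
            new_centres.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(3))
            )
        if new_centres == centres:     # assignments have stabilised
            break
        centres = new_centres
    return [chr(ord("a") + lab) for lab in labels]


triples = [(1.0, 4, 0.5), (1.1, 3, 0.55), (-1.0, 4, -0.5),
           (-0.9, 5, -0.45), (1.05, 2, 0.5)]
print(kmeans_symbols(triples, n_clusters=2))
```

With two clusters, rising and falling segments receive different symbols, which is the property the later transaction construction relies on.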
step 1.5: measuring time sequence for every two sensors
Figure GDA0002883649050000085
And
Figure GDA0002883649050000086
merging its segment point sequence PiAnd PjIs denoted by Pij,nij-1 is PiAnd PjThe number of the combined segmentation points; and respectively symbolize the sequences according to the combined segmentation points
Figure GDA0002883649050000087
And
Figure GDA0002883649050000088
performing segmentation reconstruction to obtain reconstructed symbolic sequence
Figure GDA0002883649050000089
And
Figure GDA00028836490500000810
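The merging and re-segmentation of step 1.5 can be sketched as below. The sketch assumes that both segmentation point sequences share the same first and last point, and that each merged interval inherits the symbol of the original segment that contains it; the function names are hypothetical.

```python
from bisect import bisect_right


def merge_breakpoints(p_i, p_j):
    """Sorted union of two segmentation point sequences (P_ij)."""
    return sorted(set(p_i) | set(p_j))


def resegment(symbols, breakpoints, merged):
    """Re-cut a symbol sequence on the merged segmentation points.

    symbols[m] labels the segment between breakpoints[m] and
    breakpoints[m + 1]; every merged interval takes the symbol of the
    original segment in which it starts.
    """
    out = []
    for left in merged[:-1]:
        m = bisect_right(breakpoints, left) - 1
        out.append(symbols[m])
    return out


p_i, f_i = [0, 4, 8], ["a1", "a2"]
p_j, f_j = [0, 2, 8], ["b1", "b2"]
p_ij = merge_breakpoints(p_i, p_j)
# Aligned sequences have equal length, so they can be zipped into transactions.
transactions = list(zip(resegment(f_i, p_i, p_ij), resegment(f_j, p_j, p_ij)))
print(p_ij, transactions)
```

After re-segmentation the two symbol sequences have one symbol per merged interval, which is what lets each interval be treated as one transaction in step 2.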
Step 2: generate the frequent item sets of the data set using a two-stage frequent item set mining algorithm. This step comprises the following substeps:
Step 2.1: for the measurement time series
Figure GDA00028836490500000811
and
Figure GDA00028836490500000812
corresponding to operating parameters V_i and V_j respectively, the symbolized measurement sequences obtained from step 1 are
Figure GDA00028836490500000813
and
Figure GDA00028836490500000814
from which the transaction set is formed, each transaction being recorded as
Figure GDA00028836490500000815
The line segment class symbols contained in
Figure GDA00028836490500000816
and
Figure GDA00028836490500000817
are denoted respectively
Figure GDA0002883649050000091
and
Figure GDA0002883649050000092
The minimum support thresholds of the two stages are denoted minsup_1 and minsup_2. In this example they are set to minsup_1 = 0.2 and minsup_2 = 0.2.
Step 2.2: compute the support of each item in a single scan of the data set to obtain the frequent 1-item sets, following 2.2.1 to 2.2.3:
2.2.1: let σ(·) be the support count of an item or item set, initially 0; let the class symbols in
Figure GDA0002883649050000093
be denoted t_k, where t stands for a or b;
2.2.2: for each transaction
Figure GDA0002883649050000094
update σ(t_k) = σ(t_k) + 1;
2.2.3: for each t_k, if
Figure GDA0002883649050000095
is not less than the minimum support threshold minsup_1, t_k is regarded as a frequent 1-item set; retain t_k and record its support count; if
Figure GDA0002883649050000096
is less than the minimum support threshold minsup_1, t_k is not a frequent 1-item set;
Step 2.3: use the frequent 1-item sets t_k obtained in step 2.2 to form 2-item sets, and compute their support to find the frequent 2-item sets, as follows:
2.3.1: let a_p and b_q be the items retained after step 2.2 from the original line segment class symbols
Figure GDA0002883649050000097
and
Figure GDA0002883649050000098
respectively;
2.3.2: for each {a_p, b_q}, perform the following steps:
1) for each {a_p, b_q} present in
Figure GDA0002883649050000099
update σ({a_p, b_q}) = σ({a_p, b_q}) + 1;
2) if
Figure GDA00028836490500000910
is not less than minsup_1, {a_p, b_q} is regarded as a frequent 2-item set; retain {a_p, b_q} and record its support count;
Step 2.4: use the frequent 2-item sets {a_p, b_q} obtained in step 2.3 to compute the support of every two operating parameters over the whole data set and obtain the parameter-level frequent item sets, as follows: for the item set {V_i, V_j} formed by every two operating parameters V_i and V_j, compute σ({V_i, V_j}) = sum(σ({a_p, b_q})); if
Figure GDA00028836490500000911
is not less than the minimum support threshold minsup_2, retain {V_i, V_j}, record the corresponding support, and compute σ(V_i) = sum(σ(a_p)) and σ(V_j) = sum(σ(b_q)).
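The two-stage support computation of steps 2.2 to 2.4 can be sketched as follows. The support ratios appear only as figures in the source, so the sketch assumes that support is the count divided by the number of transactions, and that σ(V_i) and σ(V_j) sum over the symbols retained in step 2.2. The function name and the sample transactions are made up for illustration.

```python
from collections import Counter


def two_stage_support(transactions, minsup1, minsup2):
    """Parameter-level support counts for one pair (V_i, V_j).

    transactions is a list of (a_symbol, b_symbol) pairs. Returns None if
    {V_i, V_j} is not frequent at the parameter level.
    """
    n = len(transactions)
    # Stage 1a: frequent 1-item sets for each parameter's symbols.
    count_a = Counter(a for a, _ in transactions)
    count_b = Counter(b for _, b in transactions)
    freq_a = {s: c for s, c in count_a.items() if c / n >= minsup1}
    freq_b = {s: c for s, c in count_b.items() if c / n >= minsup1}
    # Stage 1b: frequent 2-item sets built only from retained symbols.
    pair_counts = Counter(
        (a, b) for a, b in transactions if a in freq_a and b in freq_b
    )
    freq_pairs = {p: c for p, c in pair_counts.items() if c / n >= minsup1}
    # Stage 2: aggregate the symbol-level counts to the parameter level.
    sigma_ij = sum(freq_pairs.values())
    if sigma_ij / n < minsup2:
        return None
    return {
        "sigma_ij": sigma_ij,
        "sigma_i": sum(freq_a.values()),
        "sigma_j": sum(freq_b.values()),
        "n": n,
    }


# Made-up transactions, not data from the patent.
demo = ([("a1", "b1")] * 5 + [("a1", "b2")] * 2
        + [("a2", "b1")] * 2 + [("a3", "b3")])
print(two_stage_support(demo, minsup1=0.2, minsup2=0.2))
```

Rare symbol patterns such as ("a3", "b3") are filtered out in the first stage and therefore never contribute to the parameter-level counts.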
Step 3: generate association rules from the frequent item sets, and extract the association rules that meet the minimum support and minimum confidence thresholds. This step comprises the following substeps:
Step 3.1: for each item set {V_i, V_j} obtained in step 2 that satisfies the support threshold, generate the association rules V_j → V_i and V_i → V_j; denote the minimum confidence threshold minconf. In this example the minimum confidence threshold is set to minconf = 0.7;
Step 3.2: compute the confidence of each generated association rule and extract the association rules as follows: for each association rule V_i → V_j, compute
Figure GDA0002883649050000101
if conf(V_i → V_j) is not less than the minimum confidence threshold minconf, retain the association rule V_i → V_j and record its support and its confidence ω_i.
This step generates the association rules that satisfy the threshold conditions; some of the extracted associated parameters and their confidence values are shown in Table 1. Based on the results in Table 1, this example performs prediction with variable 7 and variable 11 as the target parameters.
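The confidence test of step 3.2 can be sketched as below. The confidence formula is given only as a figure in the source; the sketch assumes the standard definition conf(V_i → V_j) = σ({V_i, V_j}) / σ(V_i), which matches the counts computed in step 2.4. The function name and the counts in the usage line are hypothetical.

```python
def rule_confidences(sigma_ij, sigma_i, sigma_j, minconf):
    """Keep the rules between V_i and V_j whose confidence reaches minconf.

    Assumes the standard definition: the confidence of X -> Y is the
    support count of {X, Y} divided by the support count of X.
    """
    rules = {}
    for name, antecedent_count in (("Vi->Vj", sigma_i), ("Vj->Vi", sigma_j)):
        conf = sigma_ij / antecedent_count
        if conf >= minconf:
            rules[name] = conf
    return rules


# Made-up counts: only the rule with the rarer antecedent is confident enough.
print(rule_confidences(sigma_ij=9, sigma_i=10, sigma_j=15, minconf=0.7))
```

Because the two directions divide by different antecedent counts, a frequent pair can yield one rule, both rules, or neither.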
Step 4: introduce the association rule mining results into a wavelet neural network and predict the abnormal working conditions of the industrial equipment. This step comprises the following substeps:
Step 4.1: denote any group of associated parameters extracted by the association rules as {V_1, V_2, …, V_u}, where u is the number of associated parameters and V_u is the rule consequent, i.e. the target parameter; each association rule V_i → V_u, i = 1, 2, …, u−1, has a confidence, denoted ω_i; abnormal working conditions are predicted for the target parameter V_u using a wavelet neural network;
Step 4.2: construct the training samples: denote the preset prediction step l, which is set to 10 in this example. Let the group of associated parameters extracted by association rule mining be {V_1, V_2, …, V_u}; the complete training data set they form is denoted
Figure GDA0002883649050000102
Construct the following matrix I_train as the training input of the neural network:
Figure GDA0002883649050000103
where each column of I_train is one training input sample; the training output O_train is:
Figure GDA0002883649050000104
In particular, the training set here uses not only the fault data of the IDV(13)-related variables but also the data of those variables under normal operating conditions.
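One possible way to build l-step-ahead training pairs is sketched below. The actual layouts of I_train and O_train are given only as figures in the source, so this is an assumption: each input column holds the antecedent values V_1 … V_{u−1} at one time instant, and the matching output is the target V_u taken l samples later. The function name and the sample values are hypothetical.

```python
def build_training_samples(series, horizon):
    """Pair antecedent values at time t with the target at time t + horizon.

    series is a list of u equal-length lists; the last one is the target
    V_u. Returns (i_train, o_train), where i_train has u - 1 rows and each
    column is one input sample, matching the column convention in the text.
    """
    *inputs, target = series
    n = len(target) - horizon            # number of usable samples
    i_train = [row[:n] for row in inputs]
    o_train = target[horizon:horizon + n]
    return i_train, o_train


# Made-up values for two antecedents and one target.
v1 = [1, 2, 3, 4, 5]
v2 = [10, 20, 30, 40, 50]
vu = [100, 200, 300, 400, 500]
i_train, o_train = build_training_samples([v1, v2, vu], horizon=2)
print(i_train, o_train)
```

With the example's prediction step l = 10 and 480 fault-state training samples, this layout would give 470 input columns per fault training set.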
Step 4.3: train the wavelet neural network with the constructed training samples: the input parameters are V_i, i = 1, 2, …, u−1, and the output parameter is V_u; at network initialization, the confidences ω_i, i = 1, 2, …, u−1, obtained from the association rules are used to set the initial weights between the network input layer and the hidden layer. In this example, the network for variable 7 has 4 input nodes and 8 hidden nodes, and the network for variable 11 has 3 input nodes and 6 hidden nodes; both networks have 1 output node and use the Morlet mother wavelet as the wavelet basis function, and the corresponding confidence values in Table 1 are used as the initial weights between the input layer and the hidden layer of the neural network;
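The confidence-based initialization of step 4.3 can be sketched as a forward pass. This is an illustration under several assumptions: the Morlet wavelet takes the form cos(1.75 t)·exp(−t²/2) that is commonly used in wavelet neural networks, every hidden node starts with the same input weights ω_i, and the dilations, translations and output weights are arbitrary starting values. Training is omitted. Only the three confidences listed in Table 1 for variable 7 are used, although the example states four input nodes.

```python
import math


def morlet(t):
    """Real Morlet mother wavelet in the form common for wavelet networks."""
    return math.cos(1.75 * t) * math.exp(-0.5 * t * t)


def init_network(confidences, n_hidden):
    """Initial parameters; input-to-hidden weights come from rule confidences."""
    return {
        # Each hidden node starts with weight omega_i on input i.
        "w": [list(confidences) for _ in range(n_hidden)],
        # Assumed starting values for the remaining parameters.
        "a": [1.0] * n_hidden,                          # dilations
        "b": [j / n_hidden for j in range(n_hidden)],   # translations
        "v": [1.0 / n_hidden] * n_hidden,               # hidden-to-output weights
    }


def forward(net, x):
    """Network output: sum_j v_j * morlet((w_j . x - b_j) / a_j)."""
    out = 0.0
    for w_j, a_j, b_j, v_j in zip(net["w"], net["a"], net["b"], net["v"]):
        z = sum(w * xi for w, xi in zip(w_j, x))
        out += v_j * morlet((z - b_j) / a_j)
    return out


# Confidences of the three rules with consequent variable 7 in Table 1.
net = init_network([0.7527, 0.7446, 0.7017], n_hidden=8)
print(forward(net, [0.1, 0.2, 0.3]))
```

The distinct translations keep the hidden nodes from being identical at the start even though they share the same input weights.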
Step 4.4: predict on new data: denote the preset threshold for the occurrence of an abnormal working condition ω_p; for newly acquired sensor measurement data, use the model trained in step 4.3 to perform l-step prediction; if the drift of the predicted target parameter value relative to its initial normal value exceeds the threshold ω_p, an abnormal working condition is judged to occur. Before the equipment fails, the model is rebuilt and retrained each time a predetermined number N_l of new measurements has been collected, to obtain a more accurate prediction result; N_l depends on the sensor sampling frequency and on the requirements of the actual industrial site. This example uses the first 300 samples of the test set (960 sample points in total) to verify the prediction effect and updates the neural network every 10 data points. The threshold for the occurrence of an abnormal working condition (fault), i.e. the fraction by which a parameter deviates from its normal value, is set to ω_p = 0.015.
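The threshold test of step 4.4 can be sketched as follows. The sketch assumes that the drift is measured as the relative deviation of a predicted value from a fixed normal baseline; the function name, the baseline and the predicted values are made up for illustration.

```python
def first_abnormal_index(predictions, baseline, threshold=0.015):
    """Index of the first prediction whose relative drift exceeds the threshold.

    The drift is taken as |prediction - baseline| / |baseline|, an assumed
    reading of the deviation from the normal value. Returns None if no
    prediction crosses the threshold.
    """
    for idx, value in enumerate(predictions):
        if abs(value - baseline) / abs(baseline) > threshold:
            return idx
    return None


# Made-up predictions drifting away from a normal value of 100.
print(first_abnormal_index([100.0, 100.5, 101.0, 101.6, 103.0], baseline=100.0))
```

With ω_p = 0.015 a 1 % drift is still treated as normal, and the first sample flagged is the one that drifts by 1.6 %.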
Table 1 Association rules
Rule antecedent    Rule consequent    Confidence
Variable 13        Variable 7         0.7527
Variable 16        Variable 7         0.7446
Variable 36        Variable 7         0.7017
Variable 35        Variable 11        0.7513
Variable 36        Variable 11        0.7390
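One way the confidences of Table 1 could seed the input-to-hidden weights is sketched below. This is an assumption-laden illustration, not the patented initialization: the function names are hypothetical, the random base weights and their range are invented for the sketch, and only the rule-related inputs are covered. The Morlet wavelet is written in the real-valued form commonly used in wavelet networks.

```python
import numpy as np


def morlet(t):
    """Real-valued Morlet mother wavelet: cos(1.75 t) * exp(-t^2 / 2)."""
    return np.cos(1.75 * t) * np.exp(-0.5 * t ** 2)


def init_input_hidden_weights(confidences, n_hidden, seed=0):
    """Scale random base weights column-wise by each input's rule confidence.

    Returns an array of shape (n_hidden, n_inputs); column i belongs to input i.
    """
    rng = np.random.default_rng(seed)
    conf = np.asarray(confidences, dtype=float)
    # Assumed base weights in [0.5, 1.0); the source does not specify them.
    base = rng.uniform(0.5, 1.0, size=(n_hidden, conf.size))
    return base * conf  # broadcasting multiplies column i by conf[i]


# Confidences of the three rules whose consequent is variable 7 (Table 1).
weights_var7 = init_input_hidden_weights([0.7527, 0.7446, 0.7017], n_hidden=8)
```

Inputs backed by a stronger rule start with proportionally larger weights, which is the intent described in step 4.3.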
Table 2 Total prediction error rate
Target variable    With association rules    Without association rules
Variable 7         1.0482                    1.8548
Variable 11        0.8536                    1.2135
Fig. 1 and Fig. 2 show the prediction results for variable 7 and variable 11, respectively. To verify the benefit of introducing the association rules, the results are compared with the predictions of a neural network trained without them. In Fig. 1 and Fig. 2, the vertical solid line marks the actual occurrence time of the abnormal working condition under the chosen threshold, and the vertical dashed line and the vertical dotted line mark the occurrence times predicted with and without the association rules, respectively. As Fig. 1 and Fig. 2 show, the predictions obtained by the method of the present invention track the true values more closely. The agreement is particularly good on the first half of the test data, because that half consists of operating data in the normal state, for which the training set is relatively complete and the values are relatively concentrated. The method also predicts the failure time well: the predicted occurrence time lags the actual one by 8 sampling points in Fig. 1 and by 5 sampling points in Fig. 2, which is more accurate than the prediction obtained without the association rules. The prediction error rates for variables 7 and 11 are shown in Fig. 3 and Fig. 4. To quantify the results further, the total prediction error rate was calculated and is listed in Table 2: for both variables, introducing the association rules reduces the total prediction error of the neural network (from 1.8548 to 1.0482 for variable 7 and from 1.2135 to 0.8536 for variable 11).
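The size of the improvement can be read off Table 2 with a short calculation. The sketch below uses only the four values in the table; the helper name is hypothetical.

```python
# Total prediction error rates from Table 2: (with rules, without rules).
errors = {
    "Variable 7": (1.0482, 1.8548),
    "Variable 11": (0.8536, 1.2135),
}


def relative_reduction(with_rules: float, without_rules: float) -> float:
    """Fraction by which the error drops when association rules are introduced."""
    return (without_rules - with_rules) / without_rules


reductions = {name: relative_reduction(*pair) for name, pair in errors.items()}
```

This gives a relative reduction of roughly 43% for variable 7 and roughly 30% for variable 11.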

Claims (5)

1. A method for industrial data association rule mining and abnormal working condition prediction, characterized by comprising the following steps:
Step 1: performing piecewise linear representation and symbolization of the time series data, and constructing a discrete data set suitable for association rule mining;
Step 2: generating the frequent item sets of the data set with a two-stage frequent item set mining algorithm;
Step 3: generating association rules from the frequent item sets, and extracting the association rules that meet the minimum support and minimum confidence thresholds;
Step 4: introducing the association rule mining result into a wavelet neural network and predicting the abnormal working condition of the industrial equipment;
Step 1 comprises the following sub-steps:
Step 1.1: The measurement time sequence of the sensors is [Figure FDA0002883649040000011], where N is the number of sensors and k is the length of the time sequence. The initial fitting start point is [Figure FDA0002883649040000012], the initial fitting end point is [Figure FDA0002883649040000013], and h = 2. The fitting start point is denoted [Figure FDA0002883649040000014], the fitting end point is [Figure FDA0002883649040000015], and the fitting error threshold is ω_E;
Step 1.2: For each [Figure FDA0002883649040000016], piecewise fitting is performed as follows:
1.2.1 Initialize the segmentation point counter count = 1;
1.2.2 For each fitting start point [Figure FDA0002883649040000017] in turn, perform steps 1) to 4):
1) compute end = start + h;
2) fit the data [Figure FDA0002883649040000018] by the least squares method and compute the fitting error ERR;
3) if the fitting error ERR is not greater than the fitting error threshold ω_E, set h = h + 1 and return to step 1);
4) if the fitting error ERR is greater than the fitting error threshold ω_E, obtain the line segment fitting sequence [Figure FDA00028836490400000110] of [Figure FDA0002883649040000019], set start = start + h, record the segmentation point [Figure FDA00028836490400000111], reset h = 2, and set count = count + 1;
1.2.3 Repeat step 1.2.2 until end is greater than k, obtaining the fitted linear time sequence [Figure FDA0002883649040000021] and the segmentation point sequence P_i composed of the segmentation points [Figure FDA0002883649040000022];
Step 1.3: time series after fitting any sensor
Figure FDA0002883649040000023
Is marked as Yk={y1,y2,…,ykAnd extracting trend and numerical value information of each fitting line segment, and representing one fitting line segment s in the following triple modei
Figure FDA0002883649040000024
Wherein k isiWhich represents the slope of the line segment,
Figure FDA0002883649040000025
represents the span of the line segment on the time axis, riData { y } representing the growth rate of the line segment data corresponding to the line segmentj,yj+1,…,yj+h},
Figure FDA0002883649040000026
j is the starting point of the line segment;
for the line segmented time sequence YkAll the line segments in the sequence are subjected to triple representation to obtain a triple sequence Sn={s1,s2,…,snIn which n represents the time series XkThe number of segments after segmentation;
Step 1.4: Cluster the line segments in the triple sequence and symbolize them, so that the symbols represent the different change patterns of the equipment or system. The similarity d_ij of line segments s_i and s_j is described by the Euclidean distance:
[Figure FDA0002883649040000027]
where d_ij is the similarity of line segments s_i and s_j (the smaller d_ij, the more similar the change patterns of the two line segments), and ω_k and ω_r are weights;
Then, based on the similarity index d_ij, S_n is clustered with the K-means clustering algorithm, and line segments of the same class are assigned the same symbol representing a change pattern of the operating parameter, giving the symbolized sequence F_n = {f_1, f_2, …, f_n}, where f_1, f_2, …, f_n are the symbols assigned to the 1st, 2nd, …, n-th line segments;
Step 1.5: For every two sensor measurement time sequences [Figure FDA0002883649040000028] and [Figure FDA0002883649040000029], merge their segmentation point sequences P_i and P_j into P_ij, where n_ij − 1 is the number of segmentation points after merging; then, according to the merged segmentation points, re-segment the symbolized sequences [Figure FDA00028836490400000210] and [Figure FDA00028836490400000211] to obtain the reconstructed symbolized sequences [Figure FDA00028836490400000212] and [Figure FDA00028836490400000213].
2. The method for industrial data association rule mining and abnormal working condition prediction according to claim 1, wherein step 2 comprises the following sub-steps:
Step 2.1: For the measurement time sequences [Figure FDA00028836490400000214] and [Figure FDA00028836490400000215], the corresponding operating parameters are V_i and V_j, and the symbolized sequences of the measurement time sequences obtained in step 1 are [Figure FDA0002883649040000031] and [Figure FDA0002883649040000032]. A transaction set is formed from them, each transaction being recorded as [Figure FDA0002883649040000033]. The line segment class symbols contained in [Figure FDA0002883649040000034] and [Figure FDA0002883649040000035] are denoted [Figure FDA0002883649040000036] and [Figure FDA0002883649040000037], respectively, and the minimum support thresholds of the two stages are denoted minsup_1 and minsup_2;
Step 2.2: Compute the support of each item in a single scan of the data set to obtain the frequent 1-item sets, according to 2.2.1 to 2.2.3:
2.2.1: Let σ(·) be the support count of an item or item set, initially 0, and denote the class symbols in [Figure FDA0002883649040000038] as t_k, where t stands for a or b;
2.2.2: For each transaction [Figure FDA0002883649040000039], compute σ(t_k) = σ(t_k) + 1;
2.2.3: For each t_k, if [Figure FDA00028836490400000310] is not less than the minimum support threshold minsup_1, t_k is considered a frequent 1-item set, and t_k is retained together with its support count; if [Figure FDA00028836490400000311] is less than the minimum support threshold minsup_1, t_k is not considered a frequent 1-item set;
Step 2.3: Use the frequent 1-item sets t_k obtained in step 2.2 to form 2-item sets, and compute their support to find the frequent 2-item sets, as follows:
2.3.1: Let a_p and b_q denote the items retained after step 2.2 from the original line segment class symbols [Figure FDA00028836490400000312] and [Figure FDA00028836490400000313], respectively;
2.3.2: For each {a_p, b_q}, perform the following steps:
1) for each occurrence of {a_p, b_q} in [Figure FDA00028836490400000314], compute σ({a_p, b_q}) = σ({a_p, b_q}) + 1;
2) if [Figure FDA00028836490400000315] is not less than minsup_1, {a_p, b_q} is considered a frequent 2-item set, and {a_p, b_q} is retained together with its support count;
Step 2.4: Use the frequent 2-item sets {a_p, b_q} obtained in step 2.3 to compute the support of every two operating parameters over the whole data set and obtain the parameter-level frequent item sets, as follows: for the item set {V_i, V_j} formed by every two operating parameters V_i and V_j, compute σ({V_i, V_j}) = sum(σ({a_p, b_q})); if [Figure FDA00028836490400000316] is not less than the minimum support threshold minsup_2, retain {V_i, V_j}, record the corresponding support, and compute σ(V_i) = sum(σ(a_p)) and σ(V_j) = sum(σ(b_q)).
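The two-stage counting of steps 2.2 to 2.4 can be sketched as follows. The sketch assumes that each transaction is a pair (symbol of V_i, symbol of V_j), that support is the count divided by the number of transactions (the exact expressions are in images not reproduced here), and that σ(V_i) and σ(V_j) are summed over the frequent 1-items; the function name and return format are hypothetical.

```python
from collections import Counter


def two_stage_support(transactions, minsup1, minsup2):
    """Return parameter-level support counts, or None if {V_i, V_j} is not frequent."""
    n = len(transactions)
    # Stage 1a: frequent 1-items, counted separately for each parameter.
    count_a = Counter(a for a, _ in transactions)
    count_b = Counter(b for _, b in transactions)
    freq_a = {s: c for s, c in count_a.items() if c / n >= minsup1}
    freq_b = {s: c for s, c in count_b.items() if c / n >= minsup1}
    # Stage 1b: frequent 2-item sets built only from frequent 1-items.
    pair_counts = Counter(
        (a, b) for a, b in transactions if a in freq_a and b in freq_b
    )
    freq_pairs = {p: c for p, c in pair_counts.items() if c / n >= minsup1}
    # Stage 2: aggregate symbol-level counts to the parameter level.
    sigma_ij = sum(freq_pairs.values())
    if sigma_ij / n < minsup2:
        return None
    return {
        "sigma_ij": sigma_ij,
        "sigma_i": sum(freq_a.values()),
        "sigma_j": sum(freq_b.values()),
        "n": n,
    }
```

Counting symbols first and parameters second lets rare change patterns be pruned at the symbol level before the parameter pair is judged against minsup_2.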
3. The method for industrial data association rule mining and abnormal working condition prediction according to claim 2, wherein step 3 comprises the following sub-steps:
Step 3.1: For each item set {V_i, V_j} obtained in step 2 that satisfies the support threshold, generate the following association rules: V_j → V_i and V_i → V_j; denote the minimum confidence threshold as minconf;
Step 3.2: Compute the confidence of each generated association rule and extract the association rules as follows: for each association rule V_i → V_j, compute [Figure FDA0002883649040000041]; if conf(V_i → V_j) is not less than the minimum confidence threshold minconf, retain the association rule V_i → V_j and record the corresponding support and confidence ω_i.
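The rule extraction of step 3.2 can be sketched as follows. The confidence formula itself is in an image not reproduced here, so the sketch assumes the standard definition conf(V_i → V_j) = σ({V_i, V_j}) / σ(V_i); the function name and the rule labels are hypothetical.

```python
def rule_confidences(sigma_ij, sigma_i, sigma_j, minconf):
    """Return the rules between V_i and V_j whose confidence reaches minconf."""
    # Assumed standard definition: support count of the pair over that of the antecedent.
    candidates = {
        "Vi->Vj": sigma_ij / sigma_i,
        "Vj->Vi": sigma_ij / sigma_j,
    }
    return {rule: conf for rule, conf in candidates.items() if conf >= minconf}
```

Because the two directions divide by different antecedent counts, one direction of a pair can pass the threshold while the other fails.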
4. The method for industrial data association rule mining and abnormal working condition prediction according to claim 3, wherein step 4 comprises the following sub-steps:
Step 4.1: Denote any group of associated parameters extracted by the association rules as {V_1, V_2, …, V_u}, where u is the number of associated parameters and V_u is the rule consequent, i.e. the target parameter. Each association rule V_i → V_u, i = 1, 2, …, u−1, has a confidence, denoted ω_i. For the target parameter V_u, abnormal working conditions are predicted with a wavelet neural network;
Step 4.2: Construct the training samples. Denote the preset prediction step length as l, let a group of associated parameters extracted by association rule mining be V_1, V_2, …, V_u, and denote the complete training data set formed by them as [Figure FDA0002883649040000042]. Construct the following matrix I_train as the training input of the neural network:
[Figure FDA0002883649040000043]
where each column of I_train is one training input sample. The training output O_train is:
[Figure FDA0002883649040000044]
Step 4.3: Train the wavelet neural network with the constructed training samples. The input parameters are V_i, i = 1, 2, …, u−1, and the output parameter is V_u. At network initialization, the confidences ω_i, i = 1, 2, …, u−1, derived from the association rules are used to set the initial weights between the network input layer and the hidden layer;
Step 4.4: Prediction on new data. Denote the preset threshold for the occurrence of an abnormal working condition as ω_p. For newly acquired sensor measurement data, the model trained in step 4.3 is used to perform l-step prediction; if the drift of the predicted target parameter value relative to its initial normal value exceeds the threshold ω_p, an abnormal working condition is judged to have occurred.
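The construction of training samples for l-step prediction in step 4.2 can be sketched as follows. The exact layout of I_train and O_train is given in images not reproduced here, so this sketch assumes the simplest arrangement: each input column holds V_1, …, V_{u−1} at one time index, and the matching output is V_u taken l steps later. The function name is hypothetical.

```python
import numpy as np


def build_training_samples(data, l):
    """data has shape (u, T): rows are V_1..V_u and the last row is the target V_u.

    Returns (inputs, outputs); inputs has shape (u-1, T-l) with one sample per
    column, and outputs[t] is the target value l steps after input column t.
    """
    data = np.asarray(data, dtype=float)
    n_steps = data.shape[1]
    inputs = data[:-1, :n_steps - l]
    outputs = data[-1, l:]
    return inputs, outputs
```

Pairing each input column with a target value l steps ahead is what lets the trained network forecast the target parameter before the abnormal condition occurs.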
5. The method according to claim 1, wherein, as long as the equipment has not failed, the model is rebuilt and retrained each time a predetermined number of new measurement data have been acquired, so as to obtain a more accurate prediction result.
CN201910244856.6A 2019-03-28 2019-03-28 Industrial data association rule mining and abnormal working condition prediction method Active CN110008253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910244856.6A CN110008253B (en) 2019-03-28 2019-03-28 Industrial data association rule mining and abnormal working condition prediction method

Publications (2)

Publication Number    Publication Date
CN110008253A (en)     2019-07-12
CN110008253B (en)     2021-02-23

Family

ID=67168723

Country Status (1)

Country Link
CN (1) CN110008253B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant