CN106384128A - Method for mining time series data state correlation

Method for mining time series data state correlation

Info

Publication number
CN106384128A
Authority
CN
China
Prior art keywords
cluster
window
data
rule
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610814387.3A
Other languages
Chinese (zh)
Inventor
王文青
王徐华
杨天社
鲍军鹏
赵静
李辉
张海龙
齐勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
China Xian Satellite Control Center
Original Assignee
Xian Jiaotong University
China Xian Satellite Control Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University, China Xian Satellite Control Center filed Critical Xian Jiaotong University
Priority to CN201610814387.3A priority Critical patent/CN106384128A/en
Publication of CN106384128A publication Critical patent/CN106384128A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method for mining state correlations in time series data. The method comprises the following steps: the time series variables are preprocessed, including outlier removal, equal-interval interpolation and normalization; states are mined for each individual variable by applying a dynamic partitioning clustering method to the combined feature vectors of all windows of that variable, where windows in different clusters represent different states; the clusters are ranked by size and each window is represented by the character of the cluster it belongs to, so that the original numeric data is converted into string form, i.e. the state string of the variable; the state strings of all variables are aligned in time to obtain a multivariable state matrix; the Apriori algorithm is used to mine association rules between the states of different variables and to provide their formal representation and association strength; finally, the association rules are reduced to remove redundant information. The method is robust against noise disturbance, is suitable for carefully analysing the state-value correlations of a small parameter set, and can mine state-value mapping relations.

Description

A method for mining time series data state correlations
Technical field
The invention belongs to the fields of intelligent information processing and computer technology, and relates in particular to a method for mining state correlations in time series data.
Background technology
In a large-scale complex system, certain association relations exist between the states of variables. These associations are governed by the inherent laws of the system and are therefore also reflected in abnormal data. The associations may appear as spatio-temporal co-occurrence relations, causal relations, trend relations, correlations and the like. When the system state changes, different variables change accordingly. The system follows different laws of motion under normal and abnormal conditions, which is reflected in different patterns of the variables in the abnormal data. Analysing the change patterns of the abnormal data of multiple variables and mining the associations between the states of different variables therefore plays an important role in summarising the laws governing the system and in discovering knowledge about incipient faults.
Content of the invention
It is an object of the invention to provide a method for mining time series data state correlations. The method combines feature extraction techniques and clustering theory to mine the states of individual variables, then uses the Apriori algorithm to mine state association rules between different variables and to give their formal representation and association strength, and finally reduces the association rules to eliminate redundancy. The invention takes the ambiguity or uncertainty of variable values into account, is robust against noise disturbance, and is suitable for carefully analysing the state-value correlations of a small parameter set and mining state-value mapping relations.
To achieve the above objective, the present invention adopts the following technical solution:
A method for mining time series data state correlations, wherein the system implementing the method comprises a data preprocessing module, a feature extraction module, a dynamic partitioning clustering module, a multivariable state matrix generation module, an Apriori state association mining module and an association rule reduction module, and the concrete steps are:
1) First, the data preprocessing module performs outlier removal, equal-interval interpolation and normalization on the original time series data to obtain data in a valid format;
2) Secondly, the feature extraction module divides the valid data of each time series variable into windows of equal length and extracts features from each window, including Fourier features, statistical features and wavelet features, which together form the feature vector;
3) Then, the dynamic partitioning clustering module performs dynamic partitioning clustering on the feature vectors of all windows of a single variable. The resulting clusters are ordered by size: the largest cluster is represented by the character 'a', the second largest by 'b', and so on; clusters smaller than the given threshold of 2 are regarded as noise and represented by '?'. Each window is represented by the character of the cluster it belongs to, so that the original numeric data is converted into string form, i.e. the state string of the variable;
4) The multivariable state matrix generation module aligns the state strings of all variables in time to form the state matrix;
5) The Apriori state association mining module applies the Apriori algorithm to the multivariable state matrix to mine frequent itemsets and association rules;
6) Finally, the association rule reduction module reduces the detected association rules and removes redundant information, yielding the final multivariable state association rules.
The outlier removal step of the data preprocessing module is as follows: calculate the mean and standard deviation of each window, and check whether the difference between each data point and the mean of the window it belongs to exceeds 5 times the standard deviation of that window; if so, the point is rejected as an outlier. Equal-interval interpolation is then applied to the time series after outlier removal: if the interval is Δt and the initial time is T, the set of time points after interpolation is {T + n·Δt | n = 0, 1, 2, 3, ...}, and the value assigned to T + i·Δt is the value of the original series at the time point closest to, but not later than, T + i·Δt, i.e. the observation at the time point immediately preceding the first time point in the original series that exceeds T + i·Δt. Linear normalization is then applied to the interpolated data: the time series is scanned once to obtain the maximum (max) and minimum (min) of the observations, the normalized value of each observation point is computed by the following formula, and the range of the original time series is thereby mapped onto the interval [0, 1];
$$x_i = \frac{x_i - \min}{\Delta}$$
where x_i denotes the value of the i-th observation point and Δ = max − min.
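A minimal sketch of the preprocessing described above, assuming the raw observations are held in a pandas Series indexed by timestamps; the function name `preprocess`, the use of pandas and the parameter names are illustrative and not part of the patent:

```python
import numpy as np
import pandas as pd

def preprocess(series: pd.Series, window: int, interval: pd.Timedelta) -> np.ndarray:
    """Outlier removal, equal-interval interpolation and min-max normalization.
    `series` is assumed to be indexed by timestamps; all names are illustrative."""
    # 1) Outlier removal: within each window of `window` points, drop values
    #    lying more than 5 standard deviations away from the window mean.
    outliers = []
    for start in range(0, len(series), window):
        chunk = series.iloc[start:start + window]
        mean, std = chunk.mean(), chunk.std()
        if std > 0:
            outliers.extend(chunk[(chunk - mean).abs() > 5 * std].index)
    values = series.drop(outliers)

    # 2) Equal-interval interpolation: at T + i*interval take the last original
    #    observation not later than that instant (previous-value interpolation).
    grid = pd.date_range(values.index[0], values.index[-1], freq=interval)
    resampled = values.reindex(values.index.union(grid)).ffill().reindex(grid)

    # 3) Linear normalization onto [0, 1]: x_i = (x_i - min) / (max - min).
    lo, hi = resampled.min(), resampled.max()
    span = (hi - lo) or 1.0
    return ((resampled - lo) / span).to_numpy()
```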
Feature extraction module: first, the univariate data is cut with the set window; secondly, features are extracted from the data in each window, including statistical features, Fourier features and wavelet features. v1 = [mean, variance]; v1 represents the statistical features, where the mean reflects the average level of the data in a window and the variance describes the degree of fluctuation of the data in the window. v2 = [Fourier coefficient 1, Fourier frequency 1, Fourier coefficient 2, Fourier frequency 2]; v2 represents the frequency-domain features: a series of Fourier coefficients is obtained by the Fourier transform, the coefficients are sorted by absolute value in descending order, and the two largest Fourier coefficients and their corresponding frequencies are selected. v3 = [wavelet detail coefficient 1, ..., wavelet detail coefficient n]; v3 represents the time-frequency features: a discrete wavelet transform is applied to each window to obtain n wavelet detail coefficients. These three groups of features are combined into the window feature vector v, v = [v1, v2, v3].
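The window feature vector v = [v1, v2, v3] could be assembled as in the following sketch; the patent does not specify the wavelet family, the number of detail coefficients or how they are selected, so the one-level Haar transform, the `n_detail` parameter and the choice of the largest-magnitude coefficients are assumptions:

```python
import numpy as np

def window_features(window: np.ndarray, n_detail: int = 4) -> np.ndarray:
    """Build v = [v1, v2, v3] for one window (sketch; the wavelet choice is assumed)."""
    # v1: statistical features -- mean (average level) and variance (fluctuation).
    v1 = [window.mean(), window.var()]

    # v2: frequency-domain features -- the two Fourier coefficients with the
    #     largest absolute value and their corresponding frequencies.
    coeffs = np.fft.rfft(window)
    freqs = np.fft.rfftfreq(len(window))
    top = np.argsort(np.abs(coeffs))[::-1][:2]
    v2 = [np.abs(coeffs[top[0]]), freqs[top[0]],
          np.abs(coeffs[top[1]]), freqs[top[1]]]

    # v3: time-frequency features -- n detail coefficients from a one-level
    #     Haar wavelet transform (the largest in magnitude are kept here).
    even, odd = window[0::2], window[1::2]
    m = min(len(even), len(odd))
    detail = (even[:m] - odd[:m]) / np.sqrt(2.0)
    order = np.argsort(np.abs(detail))[::-1][:n_detail]
    v3 = list(detail[order])

    return np.asarray(v1 + v2 + v3)
```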
The dynamic partitioning clustering module clusters the feature vectors of all windows of a single variable with a dynamic partitioning clustering method, the process of which is as follows:
1) The first window forms a cluster of its own, and the cluster centre is the combined feature vector of this window;
2) Initial partitioning of clusters: the similarity between the 2nd window and the centre of the 1st cluster is calculated by the following formula:
$$\cos(v_{ik}, v_{jk}) = \frac{v_{ik} \cdot v_{jk}}{\lvert v_{ik} \rvert \times \lvert v_{jk} \rvert}$$
where cos(v_ik, v_jk) denotes the cosine similarity between the v_k vector (k = 1, 2, 3) of window i and the v_k vector of window j, with cos(v_ik, v_jk) ∈ [−1, +1]; the distance between two windows is then calculated with the following formulas;
$$d(v_{ik}, v_{jk}) = 1 - \frac{\cos(v_{ik}, v_{jk}) + 1}{2}$$
$$\mathrm{dist}(v_i, v_j) = \frac{1}{3}\sum_{k=1}^{3} d(v_{ik}, v_{jk})$$
where dist(v_i, v_j) denotes the distance between windows i and j, dist(v_i, v_j) ∈ [0, 1];
If dist < d (d = 0.2), window 2 is merged into the first cluster and the cluster centre is immediately updated according to the following formula:
$$cv_k = \frac{1}{n}\sum_{i=1}^{n} v_{ik}, \qquad k = 1, 2, 3$$
where cv_k denotes the k-th feature vector of the cluster centre c, equal to the mean of the k-th feature vectors of all windows in the cluster; the centre c of a cluster is the combination of cv_1, cv_2 and cv_3, i.e. the mean of the combined feature vectors of all windows in the cluster. If dist ≥ d, window 2 forms a cluster of its own. The remaining windows are processed in turn: the distance between the i-th window and the centres of all clusters produced so far is calculated, and the nearest cluster c with distance dist is selected; if dist < d, window i is merged into cluster c, otherwise it forms a cluster of its own;
3) Adjustment of clusters: take window i (i = 1, 2, ..., m), calculate its distance dist to the centres of all clusters and pick the minimum dist and the corresponding cluster c. If dist ≤ d and window i is not in cluster c, move window i from its original cluster into cluster c; if dist ≤ d and window i is already in cluster c, leave window i unchanged; if dist > d, remove window i from its original cluster and let it form a cluster of its own. Repeat this process until all windows have been processed, then recompute the centres of all clusters. If any cluster centre has changed, repeat the adjustment process, i.e. step 3), until no cluster centre changes any more; if all cluster centres are unchanged, go to step 4);
4) Merging of clusters: calculate the distance between the centres of every two clusters and select the two closest clusters c_i, c_j with distance dist. If dist ≤ α (α = 0.3), merge c_i and c_j, compute the centre of the merged cluster and repeat the merging process 4); if dist > α, no two clusters are close enough to be merged, the merging process ends and the clustering algorithm terminates. Windows whose features fall in the same cluster of the clustering result are considered to be in the same state, and windows in different clusters represent different states. All clusters are ordered by size: the largest cluster is represented by the character 'a', the second largest by 'b', and so on; clusters smaller than the given threshold are regarded as noise and represented by '?'. Each window is represented by the character of the cluster it belongs to, and the original numeric data is thus converted into string form, yielding the state string of each variable. An illustrative sketch of this procedure follows this list.
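The following Python sketch illustrates the dynamic partitioning clustering just described (initial partitioning, adjustment, merging, and labelling by cluster size), with the thresholds d = 0.2 and α = 0.3 taken from the text. The adjustment loop here re-assigns all windows in a batch against the current centres rather than moving them one at a time, so it is a simplification, and every function name is illustrative:

```python
import numpy as np

def sub_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine-based distance on one sub-vector, mapped into [0, 1]."""
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return 1.0 - (cos + 1.0) / 2.0

def window_distance(vi, vj) -> float:
    """dist(v_i, v_j): average of the three sub-vector distances, in [0, 1]."""
    return sum(sub_distance(a, b) for a, b in zip(vi, vj)) / 3.0

def dynamic_partition_clustering(windows, d=0.2, alpha=0.3, noise_threshold=2):
    """windows: list of (v1, v2, v3) sub-vector tuples, one per window.
    Returns the state string of the variable ('a', 'b', ..., '?' for noise)."""
    def centre(members):
        return tuple(np.mean([windows[i][k] for i in members], axis=0) for k in range(3))

    # Initial partitioning: each window joins the nearest existing cluster
    # if its distance is below d, otherwise it opens a new cluster.
    clusters, centres = [[0]], [centre([0])]
    for i in range(1, len(windows)):
        dists = [window_distance(windows[i], c) for c in centres]
        j = int(np.argmin(dists))
        if dists[j] < d:
            clusters[j].append(i)
            centres[j] = centre(clusters[j])
        else:
            clusters.append([i])
            centres.append(centre([i]))

    # Adjustment (simplified): re-assign all windows against the current
    # centres, recompute the centres, and repeat until they stop changing.
    while True:
        new_clusters = [[] for _ in centres]
        for i in range(len(windows)):
            dists = [window_distance(windows[i], c) for c in centres]
            j = int(np.argmin(dists))
            if dists[j] <= d:
                new_clusters[j].append(i)
            else:
                new_clusters.append([i])   # window i becomes its own cluster
        new_clusters = [c for c in new_clusters if c]
        new_centres = [centre(c) for c in new_clusters]
        stable = len(new_centres) == len(centres) and all(
            all(np.allclose(x[k], y[k]) for k in range(3))
            for x, y in zip(new_centres, centres))
        clusters, centres = new_clusters, new_centres
        if stable:
            break

    # Merging: repeatedly merge the two closest clusters while their
    # centre distance does not exceed alpha.
    while len(clusters) > 1:
        pairs = [(window_distance(centres[a], centres[b]), a, b)
                 for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        dist, a, b = min(pairs)
        if dist > alpha:
            break
        clusters[a] += clusters.pop(b)
        centres[a] = centre(clusters[a])
        centres.pop(b)

    # Label clusters by size: 'a' for the largest, 'b' for the next, and so on;
    # clusters smaller than the noise threshold are mapped to '?'.
    labels = ['?'] * len(windows)
    for rank, members in enumerate(sorted(clusters, key=len, reverse=True)):
        ch = '?' if len(members) < noise_threshold else chr(ord('a') + rank)
        for i in members:
            labels[i] = ch
    return ''.join(labels)
```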
The multivariable state matrix generation module aligns the state strings of all variables in time. Assuming there are n variables and each variable has the same initial observation time and end time, their numbers of windows are necessarily identical and their state strings have the same length; assuming the state strings have length m, an n×m multivariable state matrix is generated.
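Because all variables are assumed to share the same observation period and window length, the alignment step amounts to stacking the state strings row by row; a minimal sketch, with all names illustrative:

```python
import numpy as np

def state_matrix(state_strings: dict[str, str]):
    """Stack equally long state strings into an n x m character matrix.
    Keys are variable names; the strings are assumed to be time-aligned."""
    names = sorted(state_strings)
    lengths = {len(state_strings[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all state strings must have the same length m")
    matrix = np.array([list(state_strings[name]) for name in names])
    return names, matrix  # names: n variable names, matrix: n x m array of state characters
```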
The Apriori state association mining module applies the Apriori algorithm to the multivariable state matrix to mine frequent itemsets and association rules. The frequent itemset mining process of the Apriori algorithm is as follows: first, all frequent 1-itemsets are found by scanning the transaction records; this set is denoted L1. L1 is then used to find the set of frequent 2-itemsets L2, and so on, until no further frequent k-itemset can be found. Each iteration consists of two steps: first, candidate itemsets are generated by a join step and a prune step; second, the support of each candidate itemset is computed, and the itemsets whose support exceeds the minimum support threshold of 0.001 are regarded as frequent. Association rules are then mined on the basis of the frequent itemsets, as follows: first, for each frequent itemset L, all non-empty subsets of L are generated; then, for each non-empty subset s of L, a candidate rule "s → (L − s)" is produced, where (L − s) denotes the content of L remaining after removing s. Finally, if the confidence of the candidate rule exceeds the given threshold of 0.5 it is output as a rule, otherwise it is discarded. The confidence of a rule is calculated as follows:
$$Cf(L, s) = \frac{Sp(L)}{Sp(s)}$$
where Cf(L, s) denotes the confidence of the rule "s → (L − s)", Sp(L) denotes the support of L, and Sp(s) denotes the support of s.
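A compact, pure-Python sketch of this frequent itemset and rule mining step, treating each time window (a column of a state matrix like the one returned by the previous sketch) as a transaction whose items are "variable=state" pairs. The thresholds follow the text (minimum support 0.001, minimum confidence 0.5); the skipping of '?' windows and all names are assumptions:

```python
from itertools import combinations

def mine_rules(names, matrix, min_sup=0.001, min_conf=0.5):
    """Minimal Apriori over a state matrix (sketch). Each time window (column)
    is a transaction of items like 'P002=b'; '?' states are skipped."""
    n_vars, m = matrix.shape
    transactions = [frozenset(f"{names[i]}={matrix[i, j]}"
                              for i in range(n_vars) if matrix[i, j] != '?')
                    for j in range(m)]

    def support(itemset):
        return sum(itemset <= t for t in transactions) / m

    # Level-wise frequent itemset generation (L1, L2, ...).
    items = {i for t in transactions for i in t}
    frequent, level = [], [frozenset([i]) for i in items if support({i}) >= min_sup]
    while level:
        frequent += level
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = [c for c in candidates
                 if support(c) >= min_sup
                 and all(frozenset(s) in set(frequent) for s in combinations(c, len(c) - 1))]

    # Rules s -> (L - s), kept when confidence Sp(L) / Sp(s) >= min_conf.
    rules = []
    for L in frequent:
        if len(L) < 2:
            continue
        for r in range(1, len(L)):
            for s in map(frozenset, combinations(L, r)):
                conf = support(L) / support(s)
                if conf >= min_conf:
                    rules.append((s, L - s, conf))
    return rules
```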
The association rule reduction module merges or deletes the redundant rules produced; the reduction steps are as follows (an illustrative sketch follows this list):
1) The obtained association rules are sorted by confidence in descending order;
2) For each frequent itemset of order K (K > 1), only the association rule with the highest confidence is retained;
3) If two association rules have identical antecedents, their consequents are compared; if one consequent contains the other and the difference in confidence is very small, the rule whose consequent is contained is deleted;
4) If two association rules have identical consequents, their antecedents are compared; if one antecedent contains the other and the difference in confidence is very small, the rule with the larger antecedent is deleted and the rule with the smaller antecedent is retained;
5) To ensure the consistency of the knowledge and avoid circular reasoning, the association rules must be checked for cycles. Cycle detection is realised with a directed acyclic graph: one node represents an antecedent, another node represents a consequent, the two are connected by a directed edge, and the association rules are checked one by one in descending order of confidence.
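The sketch below illustrates reduction steps 1)–4); the threshold δ for "the confidence difference is very small" is not fixed in the text, so the default value here is an assumption, and the rules are represented as (antecedent, consequent, confidence) triples of frozensets as in the previous sketch. Cycle detection, step 5), is sketched later in the detailed description:

```python
def reduce_rules(rules, delta=0.01):
    """Apply reduction steps 1)-4) (sketch); delta is an assumed tolerance for
    'the confidence difference is very small'. rules: (antecedent, consequent, conf)."""
    # 1) Sort by confidence, descending.
    rules = sorted(rules, key=lambda r: r[2], reverse=True)

    # 2) For every frequent itemset of order K > 1 keep only the rule
    #    with the highest confidence.
    best = {}
    for s, c, conf in rules:
        key = frozenset(s | c)
        if key not in best:            # rules are sorted, so the first one wins
            best[key] = (s, c, conf)
    rules = list(best.values())

    kept = []
    for s, c, conf in rules:
        redundant = False
        for s2, c2, conf2 in rules:
            if (s2, c2) == (s, c) or abs(conf - conf2) >= delta:
                continue
            # 3) Same antecedent: drop the rule whose consequent is contained
            #    in a larger consequent.
            if s == s2 and c < c2:
                redundant = True
            # 4) Same consequent: drop the rule whose antecedent strictly
            #    contains a smaller antecedent.
            if c == c2 and s > s2:
                redundant = True
        if not redundant:
            kept.append((s, c, conf))
    return kept
```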
Compared with the prior art, the present invention first detects the states of each variable by clustering all window feature vectors with the dynamic partitioning clustering method; the window features within the same cluster are similar and represent the same state, while windows in different clusters represent different states. The state strings of all variables are aligned in time to obtain a multivariable state matrix. The Apriori algorithm is used to mine frequent co-occurrence relations between the state values of different variables, thereby obtaining the associations between different values of multiple variables together with their formal representation and association strength. Finally, the produced association rules are reduced to remove redundant information. The invention takes the ambiguity or uncertainty of variable values into account, is robust against noise disturbance, and is suitable for carefully analysing the state-value correlations of a small parameter set and mining state-value mapping relations.
Brief description of the drawings
Fig. 1 is a block diagram of the modules of the system of the present invention.
Fig. 2 is a flow chart of the dynamic partitioning clustering of the present invention.
Table 1 shows the state strings of the example time series variables of the present invention.
Table 2 shows the state association rule mining results for the example time series variables of the present invention.
Fig. 3 is a schematic diagram of a state association rule between example time series variables of the present invention.
Fig. 4 is a schematic diagram of another state association rule between example time series variables of the present invention.
Specific embodiments
The present invention is described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1, the system implementing the present invention comprises a data preprocessing module 1-1, a feature extraction module 1-2, a dynamic partitioning clustering module 1-3, a multivariable state matrix generation module 1-4, an Apriori state association mining module 1-5 and an association rule reduction module 1-6. The concrete steps of the method of the invention are:
1) First, the data preprocessing module 1-1 performs outlier removal, equal-interval interpolation and normalization on the original time series data to obtain data in a valid format;
The outlier removal step is as follows: calculate the mean and standard deviation of each window, and check whether the difference between each data point and the mean of the window it belongs to exceeds 5 times the standard deviation of that window; if so, the point is rejected as an outlier. Equal-interval interpolation is then applied to the time series after outlier removal: if the interval is Δt and the initial time is T, the set of time points after interpolation is {T + n·Δt | n = 0, 1, 2, 3, ...}, and the value assigned to T + i·Δt is the value of the original series at the time point closest to, but not later than, T + i·Δt, i.e. the observation at the time point immediately preceding the first time point in the original series that exceeds T + i·Δt. Linear normalization is then applied to the interpolated data: the time series is scanned once to obtain the maximum (max) and minimum (min) of the observations, the normalized value of each observation point is computed by the following formula, and the range of the original time series is thereby mapped onto the interval [0, 1];
$$x_i = \frac{x_i - \min}{\Delta}$$
where x_i denotes the value of the i-th observation point and Δ = max − min.
2) Secondly, the feature extraction module 1-2 divides the valid data of each time series variable into windows of equal length and extracts features from each window, including Fourier features, statistical features and wavelet features, which together form the feature vector;
First, the univariate data is cut with the set window; secondly, features are extracted from the data in each window, including statistical features, Fourier features and wavelet features. v1 = [mean, variance]; v1 represents the statistical features, where the mean reflects the average level of the data in a window and the variance describes the degree of fluctuation of the data in the window. v2 = [Fourier coefficient 1, Fourier frequency 1, Fourier coefficient 2, Fourier frequency 2]; v2 represents the frequency-domain features: a series of Fourier coefficients is obtained by the Fourier transform, the coefficients are sorted by absolute value in descending order, and the two largest Fourier coefficients and their corresponding frequencies are selected. v3 = [wavelet detail coefficient 1, ..., wavelet detail coefficient n]; v3 represents the time-frequency features: a discrete wavelet transform is applied to each window to obtain n wavelet detail coefficients. These three groups of features are combined into the window feature vector v, v = [v1, v2, v3].
3) Then, the dynamic partitioning clustering module 1-3 performs dynamic partitioning clustering on the feature vectors of all windows of a single variable. The resulting clusters are ordered by size: the largest cluster is represented by the character 'a', the second largest by 'b', and so on; clusters smaller than the given threshold of 2 are regarded as noise and represented by '?'. Each window is represented by the character of the cluster it belongs to, so that the original numeric data is converted into string form, i.e. the state string of the variable;
Step 2-1 is executed first: the first window forms a cluster of its own, and the cluster centre is the combined feature vector of this window. Execute step 2-2: take the next data item. Execute step 2-3: calculate the distance between this data item and all cluster centres. Execute step 2-4: pick the cluster c nearest to it; the distance between them is denoted dist:
$$\cos(v_{ik}, v_{jk}) = \frac{v_{ik} \cdot v_{jk}}{\lvert v_{ik} \rvert \times \lvert v_{jk} \rvert}$$
where cos(v_ik, v_jk) denotes the cosine similarity between the v_k vector (k = 1, 2, 3) of window i and the v_k vector of window j, with cos(v_ik, v_jk) ∈ [−1, +1]. The distance between two windows is calculated with the following formulas.
$$d(v_{ik}, v_{jk}) = 1 - \frac{\cos(v_{ik}, v_{jk}) + 1}{2}$$
$$\mathrm{dist}(v_i, v_j) = \frac{1}{3}\sum_{k=1}^{3} d(v_{ik}, v_{jk})$$
where dist(v_i, v_j) denotes the distance between windows i and j, dist(v_i, v_j) ∈ [0, 1].
Execute step 2-5: compare dist with the given threshold d (d = 0.2). If dist ≤ d, execute step 2-6: merge this data item into cluster c and immediately update the cluster centre;
$$cv_k = \frac{1}{n}\sum_{i=1}^{n} v_{ik}, \qquad k = 1, 2, 3$$
where cv_k denotes the k-th feature vector of the cluster centre c, equal to the mean of the k-th feature vectors of all windows in the cluster; the centre c of a cluster is the combination of cv_1, cv_2 and cv_3, i.e. the mean of the combined feature vectors of all windows in the cluster;
If dist > d, execute step 2-7: the data item forms a cluster of its own and becomes its cluster centre. Execute step 2-8: check whether all data items have been processed. If not, return to step 2-2 and take the next data item; otherwise execute step 2-9 and take the first data item. Execute step 2-10: calculate the distance dist between this data item and the centres of all clusters. Execute step 2-11: pick the nearest cluster c; the distance between them is denoted dist. Execute step 2-12: compare dist with the given threshold d. If dist ≤ d, execute step 2-13 and check whether the data item is already in cluster c; if it is not, execute step 2-14 and move it into cluster c; otherwise execute step 2-15 and leave it unchanged. If dist > d, execute step 2-16: the data item forms a cluster of its own. Execute step 2-17: check whether all data items have been processed. If not, execute step 2-18 and take the next data item; otherwise execute step 2-19 and recompute the centres of all clusters. Execute step 2-20: check whether any cluster centre has changed, i.e. whether the clustering result has changed. If it has changed, return to step 2-9; otherwise execute step 2-21 and select the two closest clusters. Execute step 2-22: compare the distance dist between these two cluster centres with the given threshold α (α = 0.3). If dist < α, merge the two clusters, compute the centre of the merged cluster, and return to step 2-21; if dist ≥ α, exit the clustering process. Windows whose features fall in the same cluster of the clustering result are considered to be in the same state, and windows in different clusters represent different states. All clusters are ordered by size: the largest cluster is represented by the character 'a', the second largest by 'b', and so on; clusters smaller than the given threshold of 2 are regarded as noise and represented by '?'. Each window is represented by the character of the cluster it belongs to, and the original numeric data is thus converted into string form, yielding the state string of each variable.
Referring to Table 1, which shows the state string mining results of the dynamic partitioning clustering module for all example variables: for each variable, windows whose features fall in the same cluster of the clustering result are considered to be in the same state, and windows in different clusters represent different states; all clusters are ordered by size, the largest cluster is represented by the character 'a', the second largest by 'b', and so on, clusters smaller than the given threshold are regarded as noise and represented by '?', and each window is represented by the character of the cluster it belongs to, so that the original numeric data is converted into string form, yielding the state string of each variable.
Table 1
4) The multivariable state matrix generation module 1-4 aligns the state strings of all variables in time to form the state matrix;
The multivariable state matrix generation module 1-4 aligns the state strings of all variables in time. Assuming there are n variables and each variable has the same initial observation time and end time, their numbers of windows are necessarily identical and their state strings have the same length; assuming the state strings have length m, an n×m multivariable state matrix is generated.
5) The Apriori state association mining module 1-5 applies the Apriori algorithm to the multivariable state matrix to mine frequent itemsets and association rules;
The frequent itemset mining process of the Apriori algorithm is as follows: first, all frequent 1-itemsets are found by scanning the transaction records; this set is denoted L1. L1 is then used to find the set of frequent 2-itemsets L2, and so on, until no further frequent k-itemset can be found. Each iteration consists of two steps: first, candidate itemsets are generated by a join step and a prune step; second, the support of each candidate itemset is computed, and the itemsets whose support exceeds the minimum support threshold of 0.001 are regarded as frequent. Association rules are then mined on the basis of the frequent itemsets, as follows: first, for each frequent itemset L, all non-empty subsets of L are generated; then, for each non-empty subset s of L, a candidate rule "s → (L − s)" is produced, where (L − s) denotes the content of L remaining after removing s. Finally, if the confidence of the candidate rule exceeds the given threshold of 0.5 it is output as a rule, otherwise it is discarded. The confidence of a rule is calculated as follows:
$$Cf(L, s) = \frac{Sp(L)}{Sp(s)}$$
where Cf(L, s) denotes the confidence of the rule "s → (L − s)", Sp(L) denotes the support of L, and Sp(s) denotes the support of s.
6) Finally, the association rule reduction module 1-6 reduces the detected association rules and removes redundant information, yielding the final multivariable state association rules.
1) The obtained association rules are sorted by confidence in descending order;
2) For each frequent itemset of order K (K > 1), only the association rule with the highest confidence is retained; for example, if the second-order frequent itemset (A, B) produces the association rules A → B and B → A, only the one with the higher confidence is kept;
3) If two association rules have identical antecedents, their consequents are compared; if one consequent contains the other and the difference in confidence is very small, the rule whose consequent is contained is deleted; for example, for A → B and A → B, C, if |Cf(A → B) − Cf(A → B, C)| < δ, then A → B is deleted and A → B, C is retained;
4) If two association rules have identical consequents, their antecedents are compared; if one antecedent contains the other and the difference in confidence is very small, the rule with the larger antecedent is deleted and the rule with the smaller antecedent is retained; for example, for A → B and A, C → B, if |Cf(A → B) − Cf(A, C → B)| < δ, then A, C → B is deleted and A → B is retained;
5) To ensure the consistency of the knowledge and avoid circular reasoning, the association rules must be checked for cycles. Cycle detection is realised with a directed acyclic graph: one node represents an antecedent, another node represents a consequent, the two are connected by a directed edge, and the association rules are checked one by one in descending order of confidence. Suppose the rule A → B is currently being considered: first check in the directed acyclic graph whether A can be reached from B, i.e. whether A appears among the descendant nodes of B. If the descendant nodes of B contain A, the directed graph already contains a path from B to A, and A → B is deleted; if the descendant nodes of B do not contain A, A → B is inserted into the directed graph.
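Cycle detection over the antecedent → consequent graph can be sketched as below, keeping rules in descending order of confidence and skipping any rule whose addition would create a cycle; the representation of rules as (antecedent, consequent, confidence) triples is an assumption carried over from the earlier sketches:

```python
def drop_cyclic_rules(rules):
    """Keep rules in descending-confidence order, skipping any rule whose
    addition would create a cycle in the antecedent -> consequent graph (sketch).
    rules: (antecedent, consequent, confidence) triples with hashable parts."""
    edges = {}                      # node -> set of successor nodes

    def reachable(src, dst):
        # Depth-first search over the rule graph built so far.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(edges.get(node, ()))
        return False

    kept = []
    for s, c, conf in sorted(rules, key=lambda r: r[2], reverse=True):
        # Rule s -> c is circular if c already reaches s in the graph.
        if reachable(c, s):
            continue
        edges.setdefault(s, set()).add(c)
        kept.append((s, c, conf))
    return kept
```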
Referring to Table 2, which shows the result after the reduction of the variable state association rules.
Table 2
Referring to Fig. 3, which is a schematic diagram of the association rule "P002=b → P004=b": the dotted line (upper part) in the figure represents the 'b' state of variable P002. Some segments are blank; a blank does not mean that there is no data, but that the corresponding window is in a state other than 'b'. The solid line (lower part) represents the 'b' state of variable P004. The confidence of this rule is 47/50 = 0.940, which means that variables P002 and P004 are in state 'b' simultaneously 47 times in the 146 records, while variable P002 alone is in state 'b' 50 times.
Referring to Fig. 4, which is a schematic diagram of the association rule "P073=b → P075=b": the dotted line (upper half) represents the 'b' state of variable P075, and the solid line (lower half) represents the 'b' state of variable P073. It can be seen that the 'b' states of P073 are not entirely identical: some segments show spikes at the top, while others are parallel lines. In fact, when the states of a variable are mined, all window feature vectors of the variable are clustered; the window feature vectors within the same cluster are similar but not necessarily identical, and the windows in the same cluster represent one state, so the features and shapes of the data in the windows corresponding to one state are similar but not exactly the same. The confidence of this rule is 44/44 = 1.0, which means that variables P073 and P075 are in state 'b' simultaneously 44 times in the 146 records, while variable P073 alone is in state 'b' also 44 times; this shows that the 'b' state of P073 is always accompanied by the 'b' state of P075.

Claims (7)

1. A method for mining time series data state correlations, characterised in that the system implementing the method comprises a data preprocessing module (1-1), a feature extraction module (1-2), a dynamic partitioning clustering module (1-3), a multivariable state matrix generation module (1-4), an Apriori state association mining module (1-5) and an association rule reduction module (1-6), and in that the concrete steps are:
1) first, the data preprocessing module (1-1) performs outlier removal, equal-interval interpolation and normalization on the original time series data to obtain data in a valid format;
2) secondly, the feature extraction module (1-2) divides the valid data of each time series variable into windows of equal length and extracts features from each window, including Fourier features, statistical features and wavelet features, which together form the feature vector;
3) then, the dynamic partitioning clustering module (1-3) performs dynamic partitioning clustering on the feature vectors of all windows of a single variable; the resulting clusters are ordered by size, the largest cluster is represented by the character 'a', the second largest by 'b', and so on, clusters smaller than the given threshold of 2 are regarded as noise and represented by '?', and each window is represented by the character of the cluster it belongs to, so that the original numeric data is converted into string form, i.e. the state string of the variable;
4) the multivariable state matrix generation module (1-4) aligns the state strings of all variables in time to form the state matrix;
5) the Apriori state association mining module (1-5) applies the Apriori algorithm to the multivariable state matrix to mine frequent itemsets and association rules;
6) finally, the association rule reduction module (1-6) reduces the detected association rules and removes redundant information, yielding the final multivariable state association rules.
2. The method for mining time series data state correlations according to claim 1, characterised in that the outlier removal step of the data preprocessing module (1-1) comprises: calculating the mean and standard deviation of each window, and checking whether the difference between each data point and the mean of the window it belongs to exceeds 5 times the standard deviation of that window; if so, the point is rejected as an outlier; applying equal-interval interpolation to the time series after outlier removal, wherein, if the interval is Δt and the initial time is T, the set of time points after interpolation is {T + n·Δt | n = 0, 1, 2, 3, ...}, and the value assigned to T + i·Δt is the value of the original series at the time point closest to, but not later than, T + i·Δt, i.e. the observation at the time point immediately preceding the first time point in the original series that exceeds T + i·Δt; and applying linear normalization to the interpolated data, wherein the time series is scanned once to obtain the maximum (max) and minimum (min) of the observations, the normalized value of each observation point is computed by the following formula, and the range of the original time series is thereby mapped onto the interval [0, 1];
$$x_i = \frac{x_i - \min}{\Delta}$$
where x_i denotes the value of the i-th observation point and Δ = max − min.
3. The method for mining time series data state correlations according to claim 1, characterised in that, in the feature extraction module (1-2): first, the univariate data is cut with the set window; secondly, features are extracted from the data in each window, including statistical features, Fourier features and wavelet features; v1 = [mean, variance], where v1 represents the statistical features, the mean reflects the average level of the data in a window and the variance describes the degree of fluctuation of the data in the window; v2 = [Fourier coefficient 1, Fourier frequency 1, Fourier coefficient 2, Fourier frequency 2], where v2 represents the frequency-domain features, a series of Fourier coefficients is obtained by the Fourier transform, the coefficients are sorted by absolute value in descending order, and the two largest Fourier coefficients and their corresponding frequencies are selected; v3 = [wavelet detail coefficient 1, ..., wavelet detail coefficient n], where v3 represents the time-frequency features and a discrete wavelet transform is applied to each window to obtain n wavelet detail coefficients; these three groups of features are combined into the window feature vector v, v = [v1, v2, v3].
4. The method for mining time series data state correlations according to claim 1, characterised in that the dynamic partitioning clustering module (1-3) clusters the feature vectors of all windows of a single variable with a dynamic partitioning clustering method, the process of which is as follows:
1) the first window forms a cluster of its own, and the cluster centre is the combined feature vector of this window;
2) initial partitioning of clusters: the similarity between the 2nd window and the centre of the 1st cluster is calculated by the following formula:
$$\cos(v_{ik}, v_{jk}) = \frac{v_{ik} \cdot v_{jk}}{\lvert v_{ik} \rvert \times \lvert v_{jk} \rvert}$$
where cos(v_ik, v_jk) denotes the cosine similarity between the v_k vector (k = 1, 2, 3) of window i and the v_k vector of window j, with cos(v_ik, v_jk) ∈ [−1, +1]; the distance between two windows is then calculated with the following formulas;
$$d(v_{ik}, v_{jk}) = 1 - \frac{\cos(v_{ik}, v_{jk}) + 1}{2}$$
$$\mathrm{dist}(v_i, v_j) = \frac{1}{3}\sum_{k=1}^{3} d(v_{ik}, v_{jk})$$
where dist(v_i, v_j) denotes the distance between windows i and j, dist(v_i, v_j) ∈ [0, 1];
if dist < d (d = 0.2), window 2 is merged into the first cluster and the cluster centre is immediately updated according to the following formula:
$$cv_k = \frac{1}{n}\sum_{i=1}^{n} v_{ik}, \qquad k = 1, 2, 3$$
where cv_k denotes the k-th feature vector of the cluster centre c, equal to the mean of the k-th feature vectors of all windows in the cluster; the centre c of a cluster is the combination of cv_1, cv_2 and cv_3, i.e. the mean of the combined feature vectors of all windows in the cluster; if dist ≥ d, window 2 forms a cluster of its own; the remaining windows are processed in turn: the distance between the i-th window and the centres of all clusters produced so far is calculated and the nearest cluster c with distance dist is selected; if dist < d, window i is merged into cluster c, otherwise it forms a cluster of its own;
3) adjustment of clusters: take window i (i = 1, 2, ..., m), calculate its distance dist to the centres of all clusters and pick the minimum dist and the corresponding cluster c; if dist ≤ d and window i is not in cluster c, move window i from its original cluster into cluster c; if dist ≤ d and window i is already in cluster c, leave window i unchanged; if dist > d, remove window i from its original cluster and let it form a cluster of its own; repeat this process until all windows have been processed, then recompute the centres of all clusters; if any cluster centre has changed, repeat the adjustment process, i.e. step 3), until no cluster centre changes any more; if all cluster centres are unchanged, go to step 4);
4) merging of clusters: calculate the distance between the centres of every two clusters and select the two closest clusters c_i, c_j with distance dist; if dist ≤ α (α = 0.3), merge c_i and c_j, compute the centre of the merged cluster and repeat the merging process 4); if dist > α, no two clusters are close enough to be merged, the merging process ends and the clustering algorithm terminates; windows whose features fall in the same cluster of the clustering result are considered to be in the same state, and windows in different clusters represent different states; all clusters are ordered by size, the largest cluster is represented by the character 'a', the second largest by 'b', and so on, clusters smaller than the given threshold are regarded as noise and represented by '?', and each window is represented by the character of the cluster it belongs to, so that the original numeric data is converted into string form, yielding the state string of each variable.
5. The method for mining time series data state correlations according to claim 1, characterised in that the multivariable state matrix generation module (1-4) aligns the state strings of all variables in time; assuming there are n variables and each variable has the same initial observation time and end time, their numbers of windows are necessarily identical and their state strings have the same length; assuming the state strings have length m, an n×m multivariable state matrix is generated.
6. The method for mining time series data state correlations according to claim 1, characterised in that the Apriori state association mining module (1-5) applies the Apriori algorithm to the multivariable state matrix to mine frequent itemsets and association rules; the frequent itemset mining process of the Apriori algorithm is as follows: first, all frequent 1-itemsets are found by scanning the transaction records, and this set is denoted L1; L1 is then used to find the set of frequent 2-itemsets L2, and so on, until no further frequent k-itemset can be found; each iteration consists of two steps: first, candidate itemsets are generated by a join step and a prune step; second, the support of each candidate itemset is computed, and the itemsets whose support exceeds the minimum support threshold of 0.001 are regarded as frequent; association rules are then mined on the basis of the frequent itemsets, as follows: first, for each frequent itemset L, all non-empty subsets of L are generated; then, for each non-empty subset s of L, a candidate rule "s → (L − s)" is produced, where (L − s) denotes the content of L remaining after removing s; finally, if the confidence of the candidate rule exceeds the given threshold of 0.5 it is output as a rule, otherwise it is discarded; the confidence of a rule is calculated as follows:
$$Cf(L, s) = \frac{Sp(L)}{Sp(s)}$$
where Cf(L, s) denotes the confidence of the rule "s → (L − s)", Sp(L) denotes the support of L, and Sp(s) denotes the support of s.
7. The method for mining time series data state correlations according to claim 1, characterised in that the association rule reduction module (1-6) merges or deletes the redundant rules produced, the reduction steps being:
1) the obtained association rules are sorted by confidence in descending order;
2) for each frequent itemset of order K (K > 1), only the association rule with the highest confidence is retained;
3) if two association rules have identical antecedents, their consequents are compared; if one consequent contains the other and the difference in confidence is very small, the rule whose consequent is contained is deleted;
4) if two association rules have identical consequents, their antecedents are compared; if one antecedent contains the other and the difference in confidence is very small, the rule with the larger antecedent is deleted and the rule with the smaller antecedent is retained;
5) to ensure the consistency of the knowledge and avoid circular reasoning, the association rules are checked for cycles; cycle detection is realised with a directed acyclic graph, in which one node represents an antecedent, another node represents a consequent, the two are connected by a directed edge, and the association rules are checked one by one in descending order of confidence.
CN201610814387.3A 2016-09-09 2016-09-09 Method for mining time series data state correlation Pending CN106384128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610814387.3A CN106384128A (en) 2016-09-09 2016-09-09 Method for mining time series data state correlation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610814387.3A CN106384128A (en) 2016-09-09 2016-09-09 Method for mining time series data state correlation

Publications (1)

Publication Number Publication Date
CN106384128A true CN106384128A (en) 2017-02-08

Family

ID=57936358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610814387.3A Pending CN106384128A (en) 2016-09-09 2016-09-09 Method for mining time series data state correlation

Country Status (1)

Country Link
CN (1) CN106384128A (en)


Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220483A (en) * 2017-05-09 2017-09-29 西北大学 A kind of mode prediction method of polynary time series data
CN107454089A (en) * 2017-08-16 2017-12-08 北京科技大学 A kind of network safety situation diagnostic method based on multinode relevance
WO2019041628A1 (en) * 2017-08-30 2019-03-07 哈尔滨工业大学深圳研究生院 Method for mining multivariate time series association rule based on eclat
CN107562865A (en) * 2017-08-30 2018-01-09 哈尔滨工业大学深圳研究生院 Multivariate time series association rule mining method based on Eclat
CN107609107B (en) * 2017-09-13 2020-07-14 大连理工大学 Travel co-occurrence phenomenon visual analysis method based on multi-source city data
CN107609107A (en) * 2017-09-13 2018-01-19 大连理工大学 A kind of trip co-occurrence phenomenon visual analysis method based on multi-source Urban Data
CN108182178A (en) * 2018-01-25 2018-06-19 刘广泽 Groundwater level analysis method and system based on event text data mining
CN108577804A (en) * 2018-02-02 2018-09-28 西北工业大学 A kind of BCG signal analysis methods and system towards hypertensive patient's monitoring
CN109409695A (en) * 2018-09-30 2019-03-01 上海机电工程研究所 System Effectiveness evaluation index system construction method and system based on association analysis
CN109409695B (en) * 2018-09-30 2020-02-04 上海机电工程研究所 System efficiency evaluation index system construction method and system based on correlation analysis
CN109300502A (en) * 2018-10-10 2019-02-01 汕头大学医学院 A kind of system and method for the analyzing and associating changing pattern from multiple groups data
CN109614491A (en) * 2018-12-21 2019-04-12 成都康赛信息技术有限公司 Further method for digging based on data quality checking rule digging result
CN109614491B (en) * 2018-12-21 2023-06-30 成都康赛信息技术有限公司 Further mining method based on mining result of data quality detection rule
CN109815857A (en) * 2019-01-09 2019-05-28 浙江工业大学 A kind of clustering method of ionising radiation time series
CN109815857B (en) * 2019-01-09 2021-05-18 浙江工业大学 Clustering method of ionizing radiation time sequence
CN109886098A (en) * 2019-01-11 2019-06-14 中国船舶重工集团公司第七二四研究所 A kind of AESA radar frequency agility mode excavation method across sorting interval
CN110188566A (en) * 2019-05-19 2019-08-30 复旦大学 A method of the test access behavior based on sequence analysis damages data equity
CN111163053B (en) * 2019-11-29 2022-05-03 深圳市任子行科技开发有限公司 Malicious URL detection method and system
CN111163053A (en) * 2019-11-29 2020-05-15 深圳市任子行科技开发有限公司 Malicious URL detection method and system
CN111177216B (en) * 2019-12-23 2024-01-05 国网天津市电力公司电力科学研究院 Association rule generation method and device for comprehensive energy consumer behavior characteristics
CN111177216A (en) * 2019-12-23 2020-05-19 国网天津市电力公司电力科学研究院 Association rule generation method and device for behavior characteristics of comprehensive energy consumer
CN111428198A (en) * 2020-03-23 2020-07-17 平安医疗健康管理股份有限公司 Method, device, equipment and storage medium for determining abnormal medical list
CN111428198B (en) * 2020-03-23 2023-02-07 平安医疗健康管理股份有限公司 Method, device, equipment and storage medium for determining abnormal medical list
CN112284469A (en) * 2020-10-20 2021-01-29 重庆智慧水务有限公司 Zero drift processing method for ultrasonic water meter
CN112284469B (en) * 2020-10-20 2024-03-19 重庆智慧水务有限公司 Zero drift processing method of ultrasonic water meter
CN112861364B (en) * 2021-02-23 2022-08-26 哈尔滨工业大学(威海) Method for realizing anomaly detection by modeling industrial control system equipment behavior based on secondary annotation of state delay transition diagram
CN112861364A (en) * 2021-02-23 2021-05-28 哈尔滨工业大学(威海) Industrial control system equipment behavior modeling method and device based on state delay transition diagram secondary annotation
CN113535796A (en) * 2021-05-31 2021-10-22 国家电网有限公司大数据中心 Method and system for determining reason of high-age inventory of electric energy meters
CN116451792A (en) * 2023-06-14 2023-07-18 北京理想信息科技有限公司 Method, system, device and storage medium for solving large-scale fault prediction problem
CN116451792B (en) * 2023-06-14 2023-08-29 北京理想信息科技有限公司 Method, system, device and storage medium for solving large-scale fault prediction problem
CN117272398A (en) * 2023-11-23 2023-12-22 聊城金恒智慧城市运营有限公司 Data mining safety protection method and system based on artificial intelligence
CN117272398B (en) * 2023-11-23 2024-01-26 聊城金恒智慧城市运营有限公司 Data mining safety protection method and system based on artificial intelligence
CN117974211A (en) * 2024-02-02 2024-05-03 广东工业大学 Vegetable sales early warning method and system

Similar Documents

Publication Publication Date Title
CN106384128A (en) Method for mining time series data state correlation
CN108415789B (en) Node fault prediction system and method for large-scale hybrid heterogeneous storage system
CN104573740B (en) A kind of equipment fault diagnosis method based on svm classifier model
CN105631596B (en) Equipment fault diagnosis method based on multi-dimensional piecewise fitting
CN109102005A (en) Small sample deep learning method based on shallow Model knowledge migration
CN109612513B (en) Online anomaly detection method for large-scale high-dimensional sensor data
CN104462184B (en) A kind of large-scale data abnormality recognition method based on two-way sampling combination
CN105607631B (en) The weak fault model control limit method for building up of batch process and weak fault monitoring method
CN111401573B (en) Working condition state modeling and model correcting method
CN109058771B (en) The pipeline method for detecting abnormality of Markov feature is generated and is spaced based on sample
CN105335752A (en) Principal component analysis multivariable decision-making tree-based connection manner identification method
CN109633369B (en) Power grid fault diagnosis method based on multi-dimensional data similarity matching
CN103592587A (en) Partial discharge diagnosis method based on data mining
CN108241873A (en) A kind of intelligent failure diagnosis method towards pumping plant main equipment
CN113723452A (en) Large-scale anomaly detection system based on KPI clustering
CN113268370B (en) Root cause alarm analysis method, system, equipment and storage medium
CN113327632B (en) Unsupervised abnormal sound detection method and device based on dictionary learning
CN112380274A (en) Control process-oriented anomaly detection system
CN106327323A (en) Bank frequent item mode mining method and bank frequent item mode mining system
CN110263944A (en) A kind of multivariable failure prediction method and device
CN102509001A (en) Method for automatically removing time sequence data outlier point
CN110245692A (en) A kind of hierarchy clustering method for Ensemble Numerical Weather Prediction member
Cai et al. An efficient outlier detection approach on weighted data stream based on minimal rare pattern mining
CN114021620B (en) BP neural network feature extraction-based electric submersible pump fault diagnosis method
CN114443338A (en) Sparse negative sample-oriented anomaly detection method, model construction method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170208

WD01 Invention patent application deemed withdrawn after publication