CN106384128A - Method for mining time series data state correlation - Google Patents
- Publication number
- CN106384128A (application CN201610814387.3A)
- Authority
- CN
- China
- Prior art keywords
- cluster
- window
- data
- rule
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
Abstract
The invention relates to a method for mining state associations in time series data. The method comprises the following steps. The time series variables are preprocessed by outlier removal, equal-interval interpolation and normalization. States are then mined for each single variable: a dynamic partition clustering method clusters the combined feature vectors of all windows of the variable, so that windows in different clusters represent different states; the clusters are ranked by size and each window is represented by the character of the cluster it belongs to, which converts the original numeric data into a character string, namely the state string of the variable. The state strings of all variables are aligned to obtain a multivariable state matrix. The Apriori algorithm is used to mine association rules between the states of different variables and to give their formal representation and association strength. Finally, the association rules are reduced to remove redundant information. The method is resistant to noise, is suited to detailed analysis of state-value associations within a small set of parameters, and can mine state-value mapping relations.
Description
Technical field
The invention belongs to the fields of intelligent information processing and computer technology, and in particular relates to a method for mining state associations in time series data.
Background art
In a large, complex system, the states of the variables are related to one another. These relationships are governed by the inherent laws of the system and are reflected to some extent in the abnormal data. An association can take the form of spatio-temporal co-occurrence, causality, trend relationships, correlation and so on. When the system state changes, the different variables change accordingly. The system follows different operating laws in its normal and abnormal states, which shows up in the abnormal data as different patterns of change in the variables. Analysing how the abnormal data of multiple variables change, and mining the associations between the states of different variables, therefore plays an important role in summarizing the operating laws of the system and in discovering knowledge about latent faults.
Summary of the invention
The object of the invention is to provide a method for mining state associations in time series data. The method combines feature extraction and cluster learning to mine the states of each single variable, then uses the Apriori algorithm to mine state association rules between different variables and gives their formal representation and association strength, and finally reduces the association rules to eliminate redundancy. The invention takes the fuzziness or uncertainty of variable values into account, is resistant to noise, and is suited to detailed analysis of state-value associations within a small set of parameters and to mining state-value mapping relations.
To achieve the above object, the invention adopts the following technical solution:
A method for mining state associations in time series data, in which the system implementing the method comprises a data preprocessing module, a feature extraction module, a dynamic partition clustering module, a multivariable state matrix generation module, an Apriori state association mining module and an association rule reduction module. The specific steps are:
1) First, the data preprocessing module applies outlier removal, equal-interval interpolation and normalization to the original time series data to obtain data in a valid form;
2) Second, the feature extraction module divides the valid data of a time series variable into windows of equal length and extracts features from the data of each window, including Fourier features, statistical features and wavelet features, which together form a feature vector;
3) Then, the dynamic partition clustering module applies dynamic partition clustering to the feature vectors of all windows of a single variable. The resulting clusters are sorted by size in descending order: the largest cluster is denoted by the character 'a', the second largest by 'b', and so on, while clusters smaller than the given threshold of 2 are regarded as noise and denoted by '?'. Each window is represented by the character of the cluster it belongs to, so that the original numeric data is converted into a character string, namely the state string of the variable;
4) The multivariable state matrix generation module aligns the state strings of all variables in time to form the state matrix;
5) The Apriori state association mining module uses the Apriori algorithm to mine frequent itemsets and association rules from the multivariable state matrix;
6) Finally, the association rule reduction module reduces the mined association rules and eliminates redundancy to obtain the final multivariable state association rules.
The outlier removal step of the data preprocessing module is as follows: compute the mean and standard deviation of each window, and test whether the difference between each data point and the mean of its observation window exceeds 5 times the standard deviation of that window; if it does, the data point is an outlier and is removed. Equal-interval interpolation is then applied to the time series after outlier removal. If the interval is Δt and the start time is T, the set of times after interpolation is {T + n·Δt | n = 0, 1, 2, 3, …}, and the value at T + i·Δt is the original observation closest to, and not later than, T + i·Δt, that is, the observation immediately before the first original timestamp greater than T + i·Δt. Linear normalization is applied to the interpolated data: the time series is scanned once to obtain the maximum (max) and minimum (min) of the observations, and the normalized value of each observation is computed with the following formula, which maps the range of the original time series onto the interval [0, 1]:
$$x_i' = \frac{x_i - \min}{\Delta}$$
where x_i denotes the value of the i-th observation and Δ = max − min.
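The three preprocessing operations described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names, the `window` size of 50 points and the use of the population standard deviation are assumptions, since the text fixes only the factor 5, the hold-the-previous-value rule and the [0, 1] target range.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def remove_outliers(times, values, window=50, k=5.0):
    """Drop points farther than k standard deviations from their window mean.

    The window size is a hypothetical parameter; the text only fixes k = 5.
    """
    kept_t, kept_v = [], []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        mu, sigma = mean(chunk), pstdev(chunk)
        for t, v in zip(times[start:start + window], chunk):
            if abs(v - mu) <= k * sigma:
                kept_t.append(t)
                kept_v.append(v)
    return kept_t, kept_v


def resample_hold(times, values, dt):
    """Equal-interval resampling: each grid time T + n*dt takes the latest
    original observation at or before it. `times` must be sorted."""
    start, end = times[0], times[-1]
    grid, out = [], []
    n = 0
    while start + n * dt <= end:
        t = start + n * dt
        idx = bisect_right(times, t) - 1  # last original timestamp <= t
        grid.append(t)
        out.append(values[idx])
        n += 1
    return grid, out


def normalize(values):
    """Linear normalization x' = (x - min) / (max - min), mapping onto [0, 1]."""
    lo, hi = min(values), max(values)
    delta = hi - lo
    if delta == 0:  # constant series: assumed convention, not from the text
        return [0.0 for _ in values]
    return [(v - lo) / delta for v in values]
```

For example, `resample_hold([0, 1, 4], [10, 20, 30], 2)` produces the grid `[0, 2, 4]`, and the value at time 2 is taken from the observation at time 1, the latest one not after the grid point.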
Feature extraction module: first, the data of a single variable is cut into windows of the configured size; second, features are extracted from the data in each window, including statistical, Fourier and wavelet features. v1 = [mean, variance] is the statistical feature, where the mean reflects the average level of the data in a window and the variance describes how much the data in the window fluctuates. v2 = [Fourier coefficient 1, Fourier frequency 1, Fourier coefficient 2, Fourier frequency 2] is the frequency-domain feature: a Fourier transform yields a series of Fourier coefficients, which are sorted by absolute value in descending order, and the two largest coefficients and their corresponding frequencies are selected. v3 = [wavelet detail coefficient 1, …, wavelet detail coefficient n] is the time-frequency feature: a discrete wavelet transform of each window yields n wavelet detail coefficients. The three groups of features are combined into the window feature vector v = [v1, v2, v3].
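A rough sketch of the window feature vector v = [v1, v2, v3] follows, using only the Python standard library. Several choices here are assumptions that the text does not fix: the trailing partial window is dropped, the Fourier entries store coefficient magnitudes over non-negative frequencies, and a single-level Haar transform stands in for the unspecified wavelet.

```python
import cmath
from statistics import mean, pvariance


def split_windows(values, size):
    """Cut a series into consecutive equal-length windows (remainder dropped)."""
    return [values[i:i + size] for i in range(0, len(values) - size + 1, size)]


def fourier_top2(window):
    """Return [coef1, freq1, coef2, freq2] for the two largest-magnitude
    DFT coefficients; frequencies are in cycles per sample."""
    n = len(window)
    spectrum = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        c = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(window))
        spectrum.append((abs(c), k / n))
    spectrum.sort(key=lambda p: p[0], reverse=True)
    (a1, f1), (a2, f2) = spectrum[0], spectrum[1]
    return [a1, f1, a2, f2]


def haar_details(window):
    """Single-level Haar detail coefficients (assumed wavelet choice)."""
    return [(window[i] - window[i + 1]) / 2 ** 0.5
            for i in range(0, len(window) - 1, 2)]


def window_features(window):
    """Build the three feature groups v1 (statistics), v2 (Fourier), v3 (wavelet)."""
    v1 = [mean(window), pvariance(window)]
    v2 = fourier_top2(window)
    v3 = haar_details(window)
    return v1, v2, v3
```

Keeping the three groups separate rather than concatenating them matters later, because the clustering step compares windows group by group.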
The dynamic partition clustering module clusters the feature vectors of all windows of a single variable with the dynamic partition clustering method, as follows:
1) The first window forms a cluster on its own; its cluster center is the combined feature vector of that window;
2) Initial partition of the clusters: the similarity between window 2 and the center of cluster 1 is computed with the following formula:
[similarity formula image missing from the source text]
where cos(v_ik, v_jk) denotes the cosine similarity between the v_k vector (k = 1, 2, 3) of window i and the v_k vector of window j, with cos(v_ik, v_jk) ∈ [−1, +1]. The distance between two windows is computed with the following formula:
[distance formula image missing from the source text]
where dist(v_i, v_j) denotes the distance between windows i and j, with dist(v_i, v_j) ∈ [0, 1];
If dist < d (d = 0.2), window 2 is merged into the first cluster and the cluster center is updated immediately with the following formula:
$$cv_k = \frac{1}{|c|}\sum_{i \in c} v_{ik}$$
where cv_k denotes the k-th feature vector of cluster center c, equal to the mean of the k-th feature vectors of all windows in the cluster (|c| being the number of windows in the cluster); the center c of a cluster is the combination of cv_1, cv_2 and cv_3, that is, the mean of the combined feature vectors of all windows in the cluster. If dist ≥ d, window 2 forms a cluster on its own. The remaining windows are processed in turn: compute the distance between window i and the centers of all existing clusters and pick the nearest cluster c, at distance dist; if dist < d, window i is merged into cluster c, otherwise it forms a cluster on its own;
3) Adjustment of the clusters: take window i (i = 1, 2, …, m), compute its distance dist to the centers of all clusters, and pick the minimum dist and the corresponding cluster c. If dist ≤ d and window i is not in cluster c, move window i from its original cluster to cluster c; if dist ≤ d and window i is already in cluster c, do nothing; if dist > d, remove window i from its original cluster and let it form a cluster on its own. Repeat until all windows have been processed, then compute the centers of all clusters. If any cluster center has changed, repeat the adjustment process, i.e. step 3), until no cluster center changes; once all cluster centers are unchanged, go to step 4);
4) Merging of the clusters: compute the center distance of every pair of clusters and select the two closest clusters c_i and c_j together with their distance dist. If dist ≤ α (α = 0.3), merge c_i and c_j, compute the center of the new cluster, and repeat the merging process 4). If dist > α, no two clusters are close enough to merge, so the merging process exits and the clustering algorithm ends. In the clustering result, windows in the same cluster have similar features and are regarded as one state, while windows in different clusters represent different states. All clusters are sorted by size in descending order: the largest is denoted by the character 'a', the second largest by 'b', and so on, while clusters smaller than the given threshold are regarded as noise and denoted by '?'. Each window is represented by the character of its cluster, so that the original numeric data is converted into a character string, giving the state string of each variable.
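The partition, adjustment and merging phases can be sketched as below. Because the similarity and distance formulas are not legible in this text, the `dist` function is an assumed form: it averages the three per-group cosine similarities and maps the result from [−1, 1] to [0, 1], which is consistent with the stated ranges but is not confirmed by the source. The `max_rounds` guard and the zero-vector convention in `cosine` are also additions for safety.

```python
from math import sqrt


def cosine(u, v):
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:  # zero-vector convention (assumption)
        return 1.0 if nu == nv else 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def dist(w1, w2):
    """ASSUMED form: mean of per-group cosine similarities, mapped to [0, 1]."""
    sim = sum(cosine(a, b) for a, b in zip(w1, w2)) / len(w1)
    return (1.0 - sim) / 2.0


def center(windows, members):
    """Cluster center: per-group mean of the member windows' feature vectors."""
    groups = len(windows[members[0]])
    return [[sum(col) / len(members)
             for col in zip(*(windows[i][k] for i in members))]
            for k in range(groups)]


def nearest(window, centers):
    return min((dist(window, ctr), idx) for idx, ctr in enumerate(centers))


def dynamic_partition_cluster(windows, d=0.2, alpha=0.3, max_rounds=100):
    # Steps 1-2: initial partition, centers recomputed after every insertion.
    clusters = [[0]]
    for i in range(1, len(windows)):
        centers = [center(windows, m) for m in clusters]
        dmin, c = nearest(windows[i], centers)
        if dmin < d:
            clusters[c].append(i)
        else:
            clusters.append([i])
    # Step 3: adjustment, repeated until memberships (hence centers) are stable.
    for _ in range(max_rounds):
        centers = [center(windows, m) for m in clusters]
        new_clusters = [[] for _ in clusters]
        for i in range(len(windows)):
            dmin, c = nearest(windows[i], centers)
            if dmin <= d:
                new_clusters[c].append(i)
            else:
                new_clusters.append([i])  # split off as its own cluster
        new_clusters = [m for m in new_clusters if m]
        if sorted(new_clusters) == sorted(clusters):
            break
        clusters = new_clusters
    # Step 4: merge the two closest clusters while their distance is <= alpha.
    while len(clusters) > 1:
        centers = [center(windows, m) for m in clusters]
        dmin, a, b = min((dist(centers[a], centers[b]), a, b)
                         for a in range(len(clusters))
                         for b in range(a + 1, len(clusters)))
        if dmin > alpha:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Each window is expected as a triple of feature groups (v1, v2, v3), and the function returns clusters as lists of window indices. Unlike k-means, the number of clusters is not chosen in advance; it emerges from the thresholds d and α.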
The multivariable state matrix generation module aligns the state strings of all variables in time. Suppose there are n variables with the same initial observation time and the same end time; they then necessarily have the same number of windows and state strings of the same length. If the state string length is m, an n×m multivariable state matrix is generated.
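As a small illustration of the alignment step, the sketch below stacks per-variable state strings into an n×m matrix. Reading each column (one window) as a transaction of "variable=state" items is an assumption about how the matrix is fed to Apriori; it matches the rule notation used in the examples, such as "P002=b → P004=b". The function names and the sample strings are hypothetical.

```python
def build_state_matrix(state_strings):
    """Stack per-variable state strings (all of length m) into an n x m matrix.

    `state_strings` maps a variable name to its state string.
    """
    names = sorted(state_strings)
    lengths = {len(state_strings[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("state strings must share one length to be time-aligned")
    matrix = [list(state_strings[name]) for name in names]
    return names, matrix


def to_transactions(names, matrix):
    """One transaction per window (matrix column): a set of 'variable=state' items."""
    return [{f"{name}={row[t]}" for name, row in zip(names, matrix)}
            for t in range(len(matrix[0]))]


# Made-up example strings for two variables observed over three windows.
names, matrix = build_state_matrix({"P002": "aab", "P004": "abb"})
transactions = to_transactions(names, matrix)
```

Here the third window yields the transaction `{"P002=b", "P004=b"}`.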
The Apriori state association mining module uses the Apriori algorithm to mine frequent itemsets and association rules from the multivariable state matrix. Frequent itemsets are mined as follows. First, the transaction records are scanned to find all frequent 1-itemsets, denoted L1; L1 is then used to find the frequent 2-itemsets L2, and so on, until no further frequent k-itemsets can be found. Each iteration has two steps: first, candidates are generated by a join step and a prune step; second, the support of each candidate is computed, and candidates whose support exceeds the minimum support threshold of 0.001 are regarded as frequent. Association rules are then mined from the frequent itemsets as follows. First, for each frequent itemset L, all non-empty subsets of L are generated. Second, for each non-empty subset s of L, a candidate rule "s → (L − s)" is generated, where (L − s) denotes what remains of L after s is removed. Finally, if the confidence of the candidate rule exceeds the given threshold of 0.5, the rule is output; otherwise it is discarded. The confidence of a rule is computed as:
$$Cf(L, s) = \frac{Sp(L)}{Sp(s)}$$
where Cf(L, s) denotes the confidence of the rule "s → (L − s)", Sp(L) denotes the support of L, and Sp(s) denotes the support of s.
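The level-wise search and the rule generation described above can be sketched as follows. This is a plain textbook Apriori written for readability rather than speed; the function names are illustrative, and only proper subsets s are used so that the consequent L − s is never empty.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support=0.001):
    """Level-wise Apriori: L1, then L2, ... until no frequent k-itemset remains."""
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s, transactions) > min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c, transactions) > min_support}
    return frequent


def rules(frequent, min_conf=0.5):
    """Emit (s, L - s, Cf) with Cf = Sp(L) / Sp(s) whenever Cf exceeds min_conf."""
    out = []
    for itemset, sp_l in frequent.items():
        for r in range(1, len(itemset)):  # proper, non-empty subsets only
            for s in map(frozenset, combinations(itemset, r)):
                cf = sp_l / frequent[s]
                if cf > min_conf:
                    out.append((s, itemset - s, cf))
    return out
```

With the transactions `[{"A", "B"}, {"A", "B"}, {"A"}, {"C"}]` and a minimum support of 0.3, the frequent itemsets are {A}, {B} and {A, B}; the rule B → A has confidence 0.5 / 0.5 = 1.0 and A → B has confidence 0.5 / 0.75 ≈ 0.67.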
The association rule reduction module merges or deletes the redundant rules that have been generated. The reduction steps are:
1) Sort the association rules by confidence in descending order;
2) For each K-th order frequent itemset (K > 1), keep only the association rule with the highest confidence;
3) If two association rules have the same antecedent, compare their consequents; if one consequent contains the other and the confidences differ only slightly, delete the rule whose consequent is contained;
4) If two association rules have the same consequent, compare their antecedents; if one antecedent contains the other and the confidences differ only slightly, delete the rule with the larger antecedent and keep the rule with the smaller antecedent;
5) To keep the knowledge consistent and avoid circular reasoning, the association rules must be checked for cycles. Cycle detection uses a directed acyclic graph: one node represents the antecedent, one node represents the consequent, and the two are connected by a directed edge. The association rules are checked one by one in descending order of confidence.
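Reduction steps 1, 3 and 4 can be illustrated with the sketch below, which works on (antecedent, consequent, confidence) triples. The tolerance `delta=0.05` is a hypothetical value standing in for "the confidences differ only slightly", and steps 2 and 5 are left out to keep the example short.

```python
def reduce_by_inclusion(rules, delta=0.05):
    """Apply reduction steps 1, 3 and 4 to (antecedent, consequent, confidence)
    triples, where antecedent and consequent are frozensets of items."""
    ranked = sorted(rules, key=lambda r: r[2], reverse=True)  # step 1
    dropped = set()
    for i, (a1, c1, f1) in enumerate(ranked):
        for j, (a2, c2, f2) in enumerate(ranked):
            if i == j or abs(f1 - f2) >= delta:
                continue
            # Step 3: same antecedent, c1 strictly inside c2: drop the contained rule.
            if a1 == a2 and c1 < c2:
                dropped.add(i)
            # Step 4: same consequent, a2 strictly inside a1: drop the longer rule.
            if c1 == c2 and a2 < a1:
                dropped.add(i)
    return [r for k, r in enumerate(ranked) if k not in dropped]


A, B = frozenset({"A"}), frozenset({"B"})
AC, BC = frozenset({"A", "C"}), frozenset({"B", "C"})
reduced = reduce_by_inclusion([(A, B, 0.90), (A, BC, 0.88), (AC, B, 0.91)])
```

In this made-up example, A → B is dropped in favour of A → B, C (step 3) and A, C → B is dropped in favour of the shorter A → B (step 4), so only A → B, C remains.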
Compared with the prior art, the invention first detects the states of each variable: the dynamic partition clustering method clusters the feature vectors of all windows, so that windows in the same cluster have similar features and represent the same state, while windows in different clusters represent different states. The state strings of all variables are aligned in time to obtain the multivariable state matrix. The Apriori algorithm mines frequent co-occurrence relations between the state values of different variables, which yields the associations between the different values of multiple variables together with their formal representation and association strength. Finally, the generated association rules are reduced to remove redundant information. The invention takes the fuzziness or uncertainty of variable values into account, is resistant to noise, and is suited to detailed analysis of state-value associations within a small set of parameters and to mining state-value mapping relations.
Brief description of the drawings
Fig. 1 is a block diagram of the modules of the system of the invention.
Fig. 2 is a flow chart of the dynamic partition clustering module of the invention.
Table 1 lists the state strings of the example time series variables.
Table 2 lists the state association rule mining results for the example time series variables.
Fig. 3 is a schematic diagram of a state association rule for some of the example time series variables.
Fig. 4 is a schematic diagram of a state association rule for some of the example time series variables.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1, the system implementing the invention comprises a data preprocessing module 1-1, a feature extraction module 1-2, a dynamic partition clustering module 1-3, a multivariable state matrix generation module 1-4, an Apriori state association mining module 1-5 and an association rule reduction module 1-6. The specific steps of the method are:
1) First, the data preprocessing module 1-1 applies outlier removal, equal-interval interpolation and normalization to the original time series data to obtain data in a valid form;
The outlier removal step is as follows: compute the mean and standard deviation of each window, and test whether the difference between each data point and the mean of its observation window exceeds 5 times the standard deviation of that window; if it does, the data point is an outlier and is removed. Equal-interval interpolation is then applied to the time series after outlier removal. If the interval is Δt and the start time is T, the set of times after interpolation is {T + n·Δt | n = 0, 1, 2, 3, …}, and the value at T + i·Δt is the original observation closest to, and not later than, T + i·Δt, that is, the observation immediately before the first original timestamp greater than T + i·Δt. Linear normalization is applied to the interpolated data: the time series is scanned once to obtain the maximum (max) and minimum (min) of the observations, and the normalized value of each observation is computed with the following formula, which maps the range of the original time series onto the interval [0, 1]:
$$x_i' = \frac{x_i - \min}{\Delta}$$
where x_i denotes the value of the i-th observation and Δ = max − min.
2) Second, the feature extraction module 1-2 divides the valid data of a time series variable into windows of equal length and extracts features from the data of each window, including Fourier features, statistical features and wavelet features, which together form a feature vector;
First, the data of a single variable is cut into windows of the configured size; second, features are extracted from the data in each window, including statistical, Fourier and wavelet features. v1 = [mean, variance] is the statistical feature, where the mean reflects the average level of the data in a window and the variance describes how much the data in the window fluctuates. v2 = [Fourier coefficient 1, Fourier frequency 1, Fourier coefficient 2, Fourier frequency 2] is the frequency-domain feature: a Fourier transform yields a series of Fourier coefficients, which are sorted by absolute value in descending order, and the two largest coefficients and their corresponding frequencies are selected. v3 = [wavelet detail coefficient 1, …, wavelet detail coefficient n] is the time-frequency feature: a discrete wavelet transform of each window yields n wavelet detail coefficients. The three groups of features are combined into the window feature vector v = [v1, v2, v3].
3) Then, the dynamic partition clustering module 1-3 applies dynamic partition clustering to the feature vectors of all windows of a single variable. The resulting clusters are sorted by size in descending order: the largest cluster is denoted by the character 'a', the second largest by 'b', and so on, while clusters smaller than the given threshold of 2 are regarded as noise and denoted by '?'. Each window is represented by the character of the cluster it belongs to, so that the original numeric data is converted into a character string, namely the state string of the variable;
Step 2-1 is executed first: the first window forms a cluster on its own, and its cluster center is the combined feature vector of that window. Step 2-2: take the next data item. Step 2-3: compute the distance between this item and all cluster centers. Step 2-4: pick the nearest cluster c and denote the distance between them as dist:
[similarity formula image missing from the source text]
where cos(v_ik, v_jk) denotes the cosine similarity between the v_k vector (k = 1, 2, 3) of window i and the v_k vector of window j, with cos(v_ik, v_jk) ∈ [−1, +1]. The distance between two windows is computed with the following formula:
[distance formula image missing from the source text]
where dist(v_i, v_j) denotes the distance between windows i and j, with dist(v_i, v_j) ∈ [0, 1].
Step 2-5: compare dist with the given threshold d (d = 0.2). If dist ≤ d, execute step 2-6: merge the item into cluster c and update the cluster center immediately:
$$cv_k = \frac{1}{|c|}\sum_{i \in c} v_{ik}$$
where cv_k denotes the k-th feature vector of cluster center c, equal to the mean of the k-th feature vectors of all windows in the cluster (|c| being the number of windows in the cluster); the center c of a cluster is the combination of cv_1, cv_2 and cv_3, that is, the mean of the combined feature vectors of all windows in the cluster;
If dist > d, execute step 2-7: the item forms a cluster on its own and serves as its cluster center. Step 2-8: check whether all data have been processed. If not, execute step 2-2 and take the next item; otherwise execute step 2-9 and take the first item. Step 2-10: compute the distance dist between this item and the centers of all clusters. Step 2-11: pick the nearest cluster c and denote the distance between them as dist. Step 2-12: compare dist with the given threshold d. If dist ≤ d, execute step 2-13: check whether the item is in cluster c; if it is not, execute step 2-14 and move the item into cluster c; otherwise execute step 2-15 and do nothing. If dist > d, execute step 2-16: the item forms a cluster on its own. Step 2-17: check whether all data have been processed. If not, execute step 2-18 and take the next item; otherwise execute step 2-19 and compute the centers of all clusters. Step 2-20: check whether any cluster center has changed, i.e. whether the clustering result has changed. If it has, execute step 2-9; otherwise execute step 2-21 and select the two closest clusters. Step 2-22: compare the distance dist between the two cluster centers with the given threshold α (α = 0.3). If dist < α, merge the two clusters, compute the center of the new cluster, and execute step 2-21 again; if dist ≥ α, exit the clustering process. In the clustering result, windows in the same cluster have similar features and are regarded as one state, while windows in different clusters represent different states. All clusters are sorted by size in descending order: the largest is denoted by the character 'a', the second largest by 'b', and so on, while clusters smaller than the given threshold of 2 are regarded as noise and denoted by '?'. Each window is represented by the character of its cluster, so that the original numeric data is converted into a character string, giving the state string of each variable.
Table 1 shows the state strings that the dynamic partition clustering module mined for all example variables. For each variable, windows in the same cluster of the clustering result have similar features and are regarded as one state, while windows in different clusters represent different states. All clusters are sorted by size in descending order: the largest is denoted by the character 'a', the second largest by 'b', and so on, while clusters smaller than the given threshold are regarded as noise and denoted by '?'. Each window is represented by the character of its cluster, so that the original numeric data is converted into a character string, giving the state string of each variable.
Table 1
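The conversion from a clustering result to a state string can be sketched as follows. Clusters are assumed to be given as lists of window indices, the function name is illustrative, and `min_size=2` reflects the noise threshold of 2 stated in the text; the sketch does not handle more than 26 non-noise clusters.

```python
def state_string(clusters, n_windows, min_size=2):
    """Label windows by cluster rank: largest cluster 'a', next 'b', and so on.

    Clusters with fewer than `min_size` windows are treated as noise ('?').
    """
    ranked = sorted(clusters, key=len, reverse=True)
    labels = ["?"] * n_windows
    letter = 0
    for members in ranked:
        if len(members) < min_size:
            continue  # noise cluster: its windows keep '?'
        ch = chr(ord("a") + letter)
        letter += 1
        for i in members:
            labels[i] = ch
    return "".join(labels)


# Made-up clustering of six windows into three clusters.
example = state_string([[2], [0, 1, 3], [4, 5]], 6)
```

For this made-up input the result is `"aa?abb"`: windows 0, 1 and 3 belong to the largest cluster, windows 4 and 5 to the second largest, and window 2 is noise.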
4) The multivariable state matrix generation module 1-4 aligns the state strings of all variables in time to form the state matrix;
The multivariable state matrix generation module 1-4 aligns the state strings of all variables in time. Suppose there are n variables with the same initial observation time and the same end time; they then necessarily have the same number of windows and state strings of the same length. If the state string length is m, an n×m multivariable state matrix is generated.
5) The Apriori state association mining module 1-5 uses the Apriori algorithm to mine frequent itemsets and association rules from the multivariable state matrix;
Frequent itemsets are mined as follows. First, the transaction records are scanned to find all frequent 1-itemsets, denoted L1; L1 is then used to find the frequent 2-itemsets L2, and so on, until no further frequent k-itemsets can be found. Each iteration has two steps: first, candidates are generated by a join step and a prune step; second, the support of each candidate is computed, and candidates whose support exceeds the minimum support threshold of 0.001 are regarded as frequent. Association rules are then mined from the frequent itemsets as follows. First, for each frequent itemset L, all non-empty subsets of L are generated. Second, for each non-empty subset s of L, a candidate rule "s → (L − s)" is generated, where (L − s) denotes what remains of L after s is removed. Finally, if the confidence of the candidate rule exceeds the given threshold of 0.5, the rule is output; otherwise it is discarded. The confidence of a rule is computed as:
$$Cf(L, s) = \frac{Sp(L)}{Sp(s)}$$
where Cf(L, s) denotes the confidence of the rule "s → (L − s)", Sp(L) denotes the support of L, and Sp(s) denotes the support of s.
6) Finally, the association rule reduction module 1-6 reduces the mined association rules and eliminates redundancy to obtain the final multivariable state association rules. The reduction steps are:
1) Sort the association rules by confidence in descending order;
2) For each K-th order frequent itemset (K > 1), keep only the association rule with the highest confidence. For example, if the second-order frequent itemset (A, B) produces the association rules A → B and B → A, only the one with the higher confidence is kept;
3) If two association rules have the same antecedent, compare their consequents; if one consequent contains the other and the confidences differ only slightly, delete the rule whose consequent is contained. For example, given A → B and A → B, C, if |Cf(A → B) − Cf(A → B, C)| < δ, delete A → B and keep A → B, C;
4) If two association rules have the same consequent, compare their antecedents; if one antecedent contains the other and the confidences differ only slightly, delete the rule with the larger antecedent and keep the rule with the smaller antecedent. For example, given A → B and A, C → B, if |Cf(A → B) − Cf(A, C → B)| < δ, delete A, C → B and keep A → B;
5) To keep the knowledge consistent and avoid circular reasoning, the association rules must be checked for cycles. Cycle detection uses a directed acyclic graph: one node represents the antecedent, one node represents the consequent, and the two are connected by a directed edge. The association rules are checked one by one in descending order of confidence. Suppose the rule currently under consideration is A → B. First check whether A can be reached from B in the directed acyclic graph, that is, whether A appears among the descendant nodes of B. If it does, the graph already implies B → A, so A → B is deleted. If A does not appear among the descendants of B, A → B is inserted into the directed graph.
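The cycle check in step 5) amounts to a reachability test before each insertion. The sketch below is one way to write it, treating each antecedent and consequent as a single hashable node; the function name and the triple layout (antecedent, consequent, confidence) are illustrative choices.

```python
def drop_cyclic_rules(rules):
    """Insert rules into a directed graph in descending order of confidence,
    skipping any rule whose edge would close a cycle."""
    graph = {}  # node -> set of successor nodes
    kept = []

    def reachable(src, dst):
        """Depth-first search: can dst be reached from src along existing edges?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
        return False

    for ante, cons, cf in sorted(rules, key=lambda r: r[2], reverse=True):
        if reachable(cons, ante):
            continue  # A is already reachable from B, so A -> B would form a ring
        graph.setdefault(ante, set()).add(cons)
        kept.append((ante, cons, cf))
    return kept


kept = drop_cyclic_rules([("A", "B", 0.9), ("B", "C", 0.8), ("C", "A", 0.7)])
```

In this example A → B and B → C are inserted first, so C → A is rejected because A already leads to C. Processing rules in descending order of confidence means the weakest rule of a cycle is the one that is discarded.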
Table 2 shows the variable state association rules after reduction.
Table 2
Fig. 3 is a schematic diagram of the association rule "P002=b → P004=b". The dotted line (upper part) shows the 'b' state of variable P002. Some segments are blank; a blank does not mean that data is missing, but that the window corresponds to another state rather than 'b'. The solid line (lower part) shows the 'b' state of variable P004. The confidence of this rule is 47/50 = 0.940: among the 146 records, variables P002 and P004 are in state 'b' at the same time 47 times, while state 'b' occurs in variable P002 on its own 50 times.
Fig. 4 is a schematic diagram of the association rule "P073=b → P075=b". The dotted line (upper half) shows the 'b' state of variable P075, and the solid line (lower half) shows the 'b' state of variable P073. The 'b' states of P073 are visibly not all alike: some have a spike at the top, while others have a flat top. This is because, when the states of a variable are mined, all of its window feature vectors are clustered; the window feature vectors in the same cluster are similar but not necessarily identical, and the windows in one cluster represent one state. The features and shape of the data in the windows of a state are therefore similar, but not exactly the same. The confidence of this rule is 44/44 = 1.0: among the 146 records, variables P073 and P075 are in state 'b' at the same time 44 times, and state 'b' occurs in variable P073 on its own 44 times as well. This shows that state 'b' of P073 is always accompanied by state 'b' of P075.
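The confidence figures quoted for Fig. 3 and Fig. 4 are ratios of window counts, which can be checked directly on two aligned state strings. The sketch below uses synthetic strings that reproduce only the counts reported for "P002=b → P004=b" (47 joint occurrences, 50 occurrences of the antecedent, 146 records); the real state strings are not given in this text.

```python
def rule_confidence(x_states, y_states, x_state="b", y_state="b"):
    """Confidence of 'X = x_state -> Y = y_state' over two aligned state strings."""
    both = sum(1 for x, y in zip(x_states, y_states)
               if x == x_state and y == y_state)
    alone = x_states.count(x_state)
    return both / alone if alone else 0.0


# Synthetic strings of length 146 matching the reported counts only.
p002 = "b" * 50 + "a" * 96
p004 = "b" * 47 + "a" * 99
confidence = rule_confidence(p002, p004)  # 47 / 50
```

The same function applied to strings with 44 joint occurrences out of 44 antecedent occurrences gives 1.0, the value reported for "P073=b → P075=b".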
Claims (7)
1. A method for mining state associations in time series data, characterized in that the system implementing the method comprises a data preprocessing module (1-1), a feature extraction module (1-2), a dynamic partition clustering module (1-3), a multivariate state matrix generation module (1-4), an Apriori state association mining module (1-5) and an association rule reduction module (1-6), the specific steps being:
1) first, the data preprocessing module (1-1) performs outlier removal, equal-interval interpolation and normalization on the raw time series data to obtain data in a valid form;
2) next, the feature extraction module (1-2) divides the valid data of each time series variable into windows of equal length and extracts features from the data of each window, including Fourier features, statistical features and wavelet features, which together form a feature vector;
3) then, the dynamic partition clustering module (1-3) performs dynamic partition clustering on the feature vectors of all windows of a single variable; the resulting clusters are sorted by size, the largest cluster is denoted by the character 'a', the second largest by 'b', and so on, while clusters smaller than the given threshold 2 are regarded as noise and denoted by '?'; each window is represented by the character of the cluster it belongs to, so that the original numeric data is converted into a character string, i.e. the state string of the variable;
4) the multivariate state matrix generation module (1-4) aligns the state strings of all variables in time to form a state matrix;
5) the Apriori state association mining module (1-5) applies the Apriori algorithm to the multivariate state matrix to mine frequent itemsets and association rules;
6) finally, the association rule reduction module (1-6) reduces the mined association rules to eliminate redundancy, yielding the final multivariate state association rules.
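The conversion from cluster membership to a state string in step 3) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `encode_states`, the tie-breaking rule for clusters of equal size, and the cap of 26 letters are assumptions.

```python
from collections import Counter


def encode_states(labels, min_size=2, noise_char="?"):
    """Turn per-window cluster ids into a state string.

    The largest cluster becomes 'a', the next 'b', and so on.
    Clusters with fewer than `min_size` windows are marked as noise.
    """
    counts = Counter(labels)
    # Assumption: ties in cluster size are broken by first appearance.
    first_seen = {}
    for idx, lab in enumerate(labels):
        first_seen.setdefault(lab, idx)
    ranked = sorted(counts, key=lambda lab: (-counts[lab], first_seen[lab]))

    char_of = {}
    next_code = ord("a")
    for lab in ranked:
        if counts[lab] < min_size:
            char_of[lab] = noise_char
        else:
            if next_code > ord("z"):
                raise ValueError("more than 26 non-noise clusters")
            char_of[lab] = chr(next_code)
            next_code += 1
    return "".join(char_of[lab] for lab in labels)


# Seven windows assigned to three clusters (ids are arbitrary).
state_string = encode_states([0, 0, 1, 0, 2, 1, 0])
```

Here cluster 0 has four windows, cluster 1 has two, and cluster 2 has one, so cluster 2 falls below the threshold and is encoded as noise.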
2. The method for mining state associations in time series data according to claim 1, characterized in that the outlier removal performed by the data preprocessing module (1-1) comprises: computing the mean and standard deviation of each window, and judging whether the difference between each data point and the mean of the observation window it belongs to exceeds 5 times the standard deviation of that window; if it does, the data point is an outlier and is removed. Equal-interval interpolation is then applied to the time series after outlier removal: if the interval is Δt and the start time is T, the set of time points after interpolation is {T + n·Δt | n = 0, 1, 2, 3, …}, and the value assigned to T + i·Δt is the value at the nearest moment in the original series that is earlier than T + i·Δt, i.e. the observation at the moment immediately preceding the first moment in the original series that is later than T + i·Δt. Linear normalization is applied to the interpolated data: the time series is first scanned once to obtain the maximum (max) and minimum (min) of the observations, and the normalized value of each observation point is then computed by the following formula, which maps the value range of the original time series onto the interval [0, 1]:
$$x_i' = \frac{x_i - \min}{\Delta}$$
where x_i is the value of the i-th observation point and Δ = max − min.
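The three preprocessing operations of this claim can be sketched in plain Python as below. This is only an illustration under stated assumptions: windows are taken as fixed-size runs of points (the claim does not say how observation windows are formed), the default window size of 50 is hypothetical, and a constant series is mapped to zeros to avoid dividing by Δ = 0.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def remove_outliers(times, values, window=50, k=5.0):
    """Drop points farther than k standard deviations from their window mean."""
    kept_t, kept_v = [], []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        mu, sigma = mean(chunk), pstdev(chunk)
        for t, x in zip(times[start:start + window], chunk):
            if abs(x - mu) <= k * sigma:
                kept_t.append(t)
                kept_v.append(x)
    return kept_t, kept_v


def resample_previous(times, values, dt):
    """Previous-value hold on the grid T, T+dt, T+2dt, ... (T = first timestamp)."""
    t0, t_end = times[0], times[-1]
    out = []
    n = 0
    while t0 + n * dt <= t_end:
        # Index of the last original sample whose time is not later than the grid point.
        j = bisect_right(times, t0 + n * dt) - 1
        out.append(values[j])
        n += 1
    return out


def normalize(values):
    """Linear (min-max) normalization onto [0, 1]."""
    lo, hi = min(values), max(values)
    delta = hi - lo
    if delta == 0:
        return [0.0 for _ in values]  # assumption: constant series maps to 0
    return [(x - lo) / delta for x in values]


# Irregularly sampled toy series, resampled every 2 time units.
grid_values = resample_previous([0, 1, 4, 5], [10, 20, 30, 40], dt=2)
scaled = normalize(grid_values)
```

Note that with a 5-sigma rule computed inside each window, a window needs a few dozen points before any single point can exceed the threshold, which is one reason the window size matters in practice.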
3. The method for mining state associations in time series data according to claim 1, characterized in that, in the feature extraction module (1-2): first, the data of a single variable is cut into windows of the configured length; next, features are extracted from the data in each window, including statistical features, Fourier features and wavelet features. v1 = [mean, variance] is the statistical feature, where the mean reflects the average level of the data in a window and the variance describes how much the data in the window fluctuates. v2 = [Fourier coefficient 1, Fourier frequency 1, Fourier coefficient 2, Fourier frequency 2] is the frequency-domain feature: the Fourier transform yields a series of Fourier coefficients, which are sorted by absolute value in descending order, and the two largest coefficients and their corresponding frequencies are selected. v3 = [wavelet detail coefficient 1, …, wavelet detail coefficient n] is the time-frequency-domain feature: a wavelet transform is applied to each window to obtain n wavelet detail coefficients. These three kinds of features are combined to form the window feature vector v = [v1, v2, v3].
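A rough sketch of the per-window feature vector v = [v1, v2, v3] follows. Several choices are assumptions not fixed by the claim: the DC term of the Fourier transform is skipped (the mean is already in v1), coefficient magnitudes stand in for the coefficients, a single-level Haar transform is used as the wavelet, and a trailing incomplete window is dropped. All function names are illustrative.

```python
import cmath
import math
from statistics import mean, pvariance


def split_windows(series, size):
    """Cut a series into equal-length windows (incomplete tail dropped)."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]


def statistical_features(w):
    """v1 = [mean, variance]."""
    return [mean(w), pvariance(w)]


def fourier_features(w, top=2):
    """v2 = [|coef 1|, freq 1, |coef 2|, freq 2] for the largest DFT terms."""
    n = len(w)
    terms = []
    for k in range(1, n // 2 + 1):  # assumption: skip the DC term k = 0
        c = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(w))
        terms.append((abs(c), k / n))  # frequency in cycles per sample
    terms.sort(key=lambda p: -p[0])
    feats = []
    for magnitude, freq in terms[:top]:
        feats += [magnitude, freq]
    return feats


def haar_detail(w):
    """v3: single-level Haar detail coefficients (assumed wavelet)."""
    return [(w[i] - w[i + 1]) / math.sqrt(2) for i in range(0, len(w) - 1, 2)]


def window_features(w):
    """Combined feature vector v = [v1, v2, v3], kept as three sub-vectors."""
    return [statistical_features(w), fourier_features(w), haar_detail(w)]


# One window holding two periods of a sampled sine wave.
v1, v2, v3 = window_features([0, 1, 0, -1, 0, 1, 0, -1])
```

Keeping v1, v2 and v3 as separate sub-vectors, rather than one flat list, makes it straightforward to compare windows one feature group at a time.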
4. The method for mining state associations in time series data according to claim 1, characterized in that the dynamic partition clustering module (1-3) clusters the feature vectors of all windows of a single variable using the dynamic partition clustering method, the process being as follows:
1) the first window forms a cluster on its own, and the cluster center is the combined feature vector of this window;
2) initial division of clusters: the similarity between window 2 and the center of cluster 1 is computed by a formula over cos(v_ik, v_jk), the cosine similarity between the v_k vector (k = 1, 2, 3) of window i and the v_k vector of window j, with cos(v_ik, v_jk) ∈ [−1, +1]; the distance between two windows, dist(v_i, v_j) ∈ [0, 1], is then computed from this similarity. If dist < d (d = 0.2), window 2 is merged into the first cluster and the cluster center is updated immediately according to
$$cv_k = \frac{1}{|c|} \sum_{i \in c} v_{ik}$$
where cv_k is the k-th feature vector of cluster center c, equal to the mean of the k-th feature vectors of all windows in the cluster, and |c| is the number of windows in the cluster; the center c of a cluster is the combination of cv_1, cv_2, cv_3, i.e. the mean of the combined feature vectors of all windows in the cluster. If dist ≥ d, window 2 forms a cluster on its own. The remaining windows are processed in turn: compute the distance between the i-th window and the centers of all clusters produced so far, and pick the nearest cluster c and its distance dist; if dist < d, window i is merged into cluster c, otherwise it forms a cluster on its own;
3) adjustment of clusters: take window i (i = 1, 2, …, m), compute its distance dist to the centers of all clusters, and select the minimum dist and its corresponding cluster c; if dist ≤ d and window i is not in cluster c, move window i from its original cluster to cluster c; if dist ≤ d and window i is already in cluster c, do nothing; if dist > d, remove window i from its original cluster and let it form a cluster on its own. Repeat this until all windows have been processed, then recompute the centers of all clusters. If any cluster center has changed, repeat the adjustment process, i.e. step 3), until no cluster center changes any more; once all cluster centers are unchanged, go to step 4);
4) merging of clusters: compute the distance between the centers of every pair of clusters and select the two closest clusters c_i, c_j and their distance dist; if dist ≤ α (α = 0.3), merge c_i and c_j, compute the center of the new merged cluster, and repeat merging step 4); if dist > α, no two clusters are close enough to merge, so the merging process exits and the clustering algorithm ends. In the clustering result, the windows in the same cluster have similar features and are regarded as one state, while windows of different clusters represent different states. All clusters are sorted by size; the largest cluster is denoted by the character 'a', the second largest by 'b', and so on, while clusters smaller than the given threshold are regarded as noise and denoted by '?'. Each window is represented by the character of the cluster it belongs to, so that the original numeric data is converted into a character string, giving the state string of each variable.
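The initial division and merging steps of this claim can be sketched as follows. The similarity and distance formulas are not reproduced in this text, so the sketch assumes one plausible form: similarity is the average of the three cosine similarities, and dist = (1 − similarity) / 2, which lies in [0, 1]. That choice, the handling of zero vectors, and all function names are assumptions; the adjustment step 3) is omitted for brevity.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0 if nu == nv else 0.0  # assumption for zero vectors
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def distance(x, y):
    """Assumed form: mean cosine over sub-vectors, mapped onto [0, 1]."""
    sim = sum(cosine(a, b) for a, b in zip(x, y)) / len(x)
    return min(1.0, max(0.0, (1.0 - sim) / 2.0))


def centroid(members):
    """Mean of the k-th sub-vector over all windows in the cluster, per k."""
    n = len(members)
    return [[sum(col) / n for col in zip(*(m[k] for m in members))]
            for k in range(len(members[0]))]


def initial_division(windows, d=0.2):
    """Step 2): assign each window to the nearest cluster or open a new one."""
    clusters = [[0]]  # each cluster is a list of window indices
    centers = [centroid([windows[0]])]
    for i in range(1, len(windows)):
        dists = [distance(windows[i], c) for c in centers]
        best = min(range(len(centers)), key=dists.__getitem__)
        if dists[best] < d:
            clusters[best].append(i)
            centers[best] = centroid([windows[j] for j in clusters[best]])
        else:
            clusters.append([i])
            centers.append(centroid([windows[i]]))
    return clusters


def merge_clusters(windows, clusters, alpha=0.3):
    """Step 4): repeatedly merge the two closest clusters while dist <= alpha."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        centers = [centroid([windows[j] for j in c]) for c in clusters]
        dist, a, b = min((distance(centers[a], centers[b]), a, b)
                         for a in range(len(clusters))
                         for b in range(a + 1, len(clusters)))
        if dist > alpha:
            break
        clusters[a].extend(clusters.pop(b))
    return clusters


# Toy windows: each has three 2-dimensional sub-vectors (v1, v2, v3).
A = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
A2 = [[1.0, 0.1], [1.0, 0.0], [1.0, 0.0]]
B = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
toy_windows = [A, A2, B, A, B]
toy_clusters = merge_clusters(toy_windows, initial_division(toy_windows))
```

Because the number of clusters is not fixed in advance, the thresholds d and α alone decide how many states a variable ends up with.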
5. The method for mining state associations in time series data according to claim 1, characterized in that the multivariate state matrix generation module (1-4) aligns the state strings of all variables in time: assuming there are n variables whose observations start and end at the same times, their numbers of windows are necessarily equal and so are the lengths of their state strings; assuming the state string length is m, an n×m multivariate state matrix is generated.
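A minimal sketch of building the n×m state matrix and turning its columns into records for rule mining is given below. The item format "variable=state" mirrors the rule notation used in the description, while skipping noise windows ('?') when forming records is an assumption, as are the function names.

```python
def build_state_matrix(state_strings):
    """Stack equal-length state strings into an n x m matrix (one row per variable)."""
    lengths = {len(s) for s in state_strings.values()}
    if len(lengths) != 1:
        raise ValueError("state strings must have the same length")
    names = list(state_strings)
    matrix = [list(state_strings[name]) for name in names]
    return names, matrix


def to_transactions(names, matrix, noise_char="?"):
    """One record per window (matrix column): the set of 'variable=state' items."""
    m = len(matrix[0])
    return [frozenset(f"{name}={row[t]}"
                      for name, row in zip(names, matrix)
                      if row[t] != noise_char)  # assumption: ignore noise
            for t in range(m)]


names, matrix = build_state_matrix({"P002": "aab", "P004": "ab?"})
records = to_transactions(names, matrix)
```

Each column of the matrix describes the states of all variables in the same time window, which is what allows co-occurring states to be counted.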
6. The method for mining state associations in time series data according to claim 1, characterized in that the Apriori state association mining module (1-5) mines frequent itemsets and association rules from the multivariate state matrix using the Apriori algorithm. The Apriori procedure for mining frequent itemsets is as follows: first, the transaction records are scanned to find all frequent 1-itemsets, denoted L1; L1 is then used to find the frequent 2-itemsets L2, and so on, until no further frequent k-itemsets can be found. Each iteration has two steps: first, candidate itemsets are generated by a join step and a prune step; second, the support of each candidate is computed, and candidates whose support exceeds the minimum support threshold 0.001 are regarded as frequent. Association rules are mined from the frequent itemsets as follows: first, for each frequent itemset L, all non-empty subsets of L are generated; next, for each non-empty subset s of L, a candidate rule "s → (L−s)" is generated, where (L−s) denotes what remains of L after s is removed; finally, if the confidence of the candidate rule exceeds the given threshold 0.5, the rule is output, otherwise it is discarded. The confidence of a rule is computed as:
$$Cf(L, s) = \frac{Sp(L)}{Sp(s)}$$
where Cf(L, s) is the confidence of the rule "s → (L−s)", Sp(L) is the support of L, and Sp(s) is the support of s.
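The frequent-itemset search and rule generation described in this claim can be sketched compactly as below. It is an unoptimized illustration: records are assumed to be sets of "variable=state" items, only proper subsets are used as antecedents so that every consequent is non-empty, and the function names are hypothetical. The toy data reuses the counts quoted for the rule "P002=b → P004=b" (47 co-occurrences, 50 occurrences of the antecedent, 146 records); the remaining records are invented filler.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of records that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support=0.001):
    """Level-wise Apriori search; returns {itemset: support}."""
    singles = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in singles if support(s, transactions) > min_support}
    freq = {}
    k = 1
    while level:
        for s in level:
            freq[s] = support(s, transactions)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in freq
                             for sub in combinations(c, k - 1))}
        level = {c for c in candidates
                 if support(c, transactions) > min_support}
    return freq


def generate_rules(freq, min_conf=0.5):
    """Emit (antecedent, consequent, confidence) with Cf = Sp(L) / Sp(s)."""
    rules = []
    for itemset, sp_l in freq.items():
        for r in range(1, len(itemset)):
            for s in map(frozenset, combinations(itemset, r)):
                conf = sp_l / freq[s]
                if conf > min_conf:
                    rules.append((s, itemset - s, conf))
    return rules


# 146 toy records built to match the counts quoted for P002/P004.
toy = ([frozenset({"P002=b", "P004=b"})] * 47
       + [frozenset({"P002=b", "P004=a"})] * 3
       + [frozenset({"P002=a", "P004=a"})] * 96)
toy_freq = frequent_itemsets(toy)
toy_rules = {(s, c): conf for s, c, conf in generate_rules(toy_freq)}
```

With these records, the confidence of "P002=b → P004=b" works out to 47/50, matching the figure in the description.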
7. The method for mining state associations in time series data according to claim 1, characterized in that the association rule reduction module (1-6) merges or deletes the redundant rules produced, the reduction steps being:
1) sort the association rules obtained by confidence in descending order;
2) for each K-order frequent item (K > 1), keep only the association rule with the highest confidence;
3) if two association rules have the same antecedent, compare their consequents; if one consequent contains the other and the difference in confidence is very small, delete the rule whose consequent is the contained one;
4) if two association rules have the same consequent, compare their antecedents; if one antecedent contains the other and the difference in confidence is very small, delete the rule with the larger antecedent and keep the rule with the smaller antecedent;
5) to ensure the consistency of the knowledge and avoid circular reasoning, the association rules must be checked for cycles; cycle detection is implemented with a directed acyclic graph, in which one node represents the antecedent and another node represents the consequent, the two being connected by a directed edge, and the association rules are checked one by one in descending order of confidence.
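Step 5) can be illustrated with a small sketch that adds rules to a directed graph in descending order of confidence and rejects any rule whose edge would close a cycle. The claim only says that cycles are detected; discarding the offending rule is an assumption made here, as are the function names and the use of plain strings as nodes.

```python
def reachable(graph, src, dst):
    """Depth-first search: is there a directed path from src to dst?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def drop_cyclic_rules(rules):
    """Keep rules in descending confidence, skipping any that would form a cycle."""
    graph = {}
    kept = []
    for antecedent, consequent, conf in sorted(rules, key=lambda r: -r[2]):
        # Adding antecedent -> consequent closes a cycle if the antecedent
        # can already be reached from the consequent.
        if reachable(graph, consequent, antecedent):
            continue  # assumption: the lower-confidence rule is discarded
        graph.setdefault(antecedent, set()).add(consequent)
        kept.append((antecedent, consequent, conf))
    return kept


kept_rules = drop_cyclic_rules([("C", "A", 0.7), ("A", "B", 0.9), ("B", "C", 0.8)])
```

Processing rules from highest to lowest confidence means that, when a cycle appears, it is always the weakest rule in the loop that is left out.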
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610814387.3A CN106384128A (en) | 2016-09-09 | 2016-09-09 | Method for mining time series data state correlation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610814387.3A CN106384128A (en) | 2016-09-09 | 2016-09-09 | Method for mining time series data state correlation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106384128A (en) | 2017-02-08 |
Family
ID=57936358
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610814387.3A Pending CN106384128A (en) | 2016-09-09 | 2016-09-09 | Method for mining time series data state correlation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106384128A (en) |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107220483A (en) * | 2017-05-09 | 2017-09-29 | 西北大学 | A kind of mode prediction method of polynary time series data |
CN107454089A (en) * | 2017-08-16 | 2017-12-08 | 北京科技大学 | A kind of network safety situation diagnostic method based on multinode relevance |
WO2019041628A1 (en) * | 2017-08-30 | 2019-03-07 | 哈尔滨工业大学深圳研究生院 | Method for mining multivariate time series association rule based on eclat |
CN107562865A (en) * | 2017-08-30 | 2018-01-09 | 哈尔滨工业大学深圳研究生院 | Multivariate time series association rule mining method based on Eclat |
CN107609107B (en) * | 2017-09-13 | 2020-07-14 | 大连理工大学 | Travel co-occurrence phenomenon visual analysis method based on multi-source city data |
CN107609107A (en) * | 2017-09-13 | 2018-01-19 | 大连理工大学 | A kind of trip co-occurrence phenomenon visual analysis method based on multi-source Urban Data |
CN108182178A (en) * | 2018-01-25 | 2018-06-19 | 刘广泽 | Groundwater level analysis method and system based on event text data mining |
CN108577804A (en) * | 2018-02-02 | 2018-09-28 | 西北工业大学 | A kind of BCG signal analysis methods and system towards hypertensive patient's monitoring |
CN109409695A (en) * | 2018-09-30 | 2019-03-01 | 上海机电工程研究所 | System Effectiveness evaluation index system construction method and system based on association analysis |
CN109409695B (en) * | 2018-09-30 | 2020-02-04 | 上海机电工程研究所 | System efficiency evaluation index system construction method and system based on correlation analysis |
CN109300502A (en) * | 2018-10-10 | 2019-02-01 | 汕头大学医学院 | A kind of system and method for the analyzing and associating changing pattern from multiple groups data |
CN109614491A (en) * | 2018-12-21 | 2019-04-12 | 成都康赛信息技术有限公司 | Further method for digging based on data quality checking rule digging result |
CN109614491B (en) * | 2018-12-21 | 2023-06-30 | 成都康赛信息技术有限公司 | Further mining method based on mining result of data quality detection rule |
CN109815857A (en) * | 2019-01-09 | 2019-05-28 | 浙江工业大学 | A kind of clustering method of ionising radiation time series |
CN109815857B (en) * | 2019-01-09 | 2021-05-18 | 浙江工业大学 | Clustering method of ionizing radiation time sequence |
CN109886098A (en) * | 2019-01-11 | 2019-06-14 | 中国船舶重工集团公司第七二四研究所 | A kind of AESA radar frequency agility mode excavation method across sorting interval |
CN110188566A (en) * | 2019-05-19 | 2019-08-30 | 复旦大学 | A method of the test access behavior based on sequence analysis damages data equity |
CN111163053B (en) * | 2019-11-29 | 2022-05-03 | 深圳市任子行科技开发有限公司 | Malicious URL detection method and system |
CN111163053A (en) * | 2019-11-29 | 2020-05-15 | 深圳市任子行科技开发有限公司 | Malicious URL detection method and system |
CN111177216B (en) * | 2019-12-23 | 2024-01-05 | 国网天津市电力公司电力科学研究院 | Association rule generation method and device for comprehensive energy consumer behavior characteristics |
CN111177216A (en) * | 2019-12-23 | 2020-05-19 | 国网天津市电力公司电力科学研究院 | Association rule generation method and device for behavior characteristics of comprehensive energy consumer |
CN111428198A (en) * | 2020-03-23 | 2020-07-17 | 平安医疗健康管理股份有限公司 | Method, device, equipment and storage medium for determining abnormal medical list |
CN111428198B (en) * | 2020-03-23 | 2023-02-07 | 平安医疗健康管理股份有限公司 | Method, device, equipment and storage medium for determining abnormal medical list |
CN112284469A (en) * | 2020-10-20 | 2021-01-29 | 重庆智慧水务有限公司 | Zero drift processing method for ultrasonic water meter |
CN112284469B (en) * | 2020-10-20 | 2024-03-19 | 重庆智慧水务有限公司 | Zero drift processing method of ultrasonic water meter |
CN112861364B (en) * | 2021-02-23 | 2022-08-26 | 哈尔滨工业大学(威海) | Method for realizing anomaly detection by modeling industrial control system equipment behavior based on secondary annotation of state delay transition diagram |
CN112861364A (en) * | 2021-02-23 | 2021-05-28 | 哈尔滨工业大学(威海) | Industrial control system equipment behavior modeling method and device based on state delay transition diagram secondary annotation |
CN113535796A (en) * | 2021-05-31 | 2021-10-22 | 国家电网有限公司大数据中心 | Method and system for determining reason of high-age inventory of electric energy meters |
CN116451792A (en) * | 2023-06-14 | 2023-07-18 | 北京理想信息科技有限公司 | Method, system, device and storage medium for solving large-scale fault prediction problem |
CN116451792B (en) * | 2023-06-14 | 2023-08-29 | 北京理想信息科技有限公司 | Method, system, device and storage medium for solving large-scale fault prediction problem |
CN117272398A (en) * | 2023-11-23 | 2023-12-22 | 聊城金恒智慧城市运营有限公司 | Data mining safety protection method and system based on artificial intelligence |
CN117272398B (en) * | 2023-11-23 | 2024-01-26 | 聊城金恒智慧城市运营有限公司 | Data mining safety protection method and system based on artificial intelligence |
CN117974211A (en) * | 2024-02-02 | 2024-05-03 | 广东工业大学 | Vegetable sales early warning method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106384128A (en) | Method for mining time series data state correlation | |
CN108415789B (en) | Node fault prediction system and method for large-scale hybrid heterogeneous storage system | |
CN104573740B (en) | A kind of equipment fault diagnosis method based on svm classifier model | |
CN105631596B (en) | Equipment fault diagnosis method based on multi-dimensional piecewise fitting | |
CN109102005A (en) | Small sample deep learning method based on shallow Model knowledge migration | |
CN109612513B (en) | Online anomaly detection method for large-scale high-dimensional sensor data | |
CN104462184B (en) | A kind of large-scale data abnormality recognition method based on two-way sampling combination | |
CN105607631B (en) | The weak fault model control limit method for building up of batch process and weak fault monitoring method | |
CN111401573B (en) | Working condition state modeling and model correcting method | |
CN109058771B (en) | The pipeline method for detecting abnormality of Markov feature is generated and is spaced based on sample | |
CN105335752A (en) | Principal component analysis multivariable decision-making tree-based connection manner identification method | |
CN109633369B (en) | Power grid fault diagnosis method based on multi-dimensional data similarity matching | |
CN103592587A (en) | Partial discharge diagnosis method based on data mining | |
CN108241873A (en) | A kind of intelligent failure diagnosis method towards pumping plant main equipment | |
CN113723452A (en) | Large-scale anomaly detection system based on KPI clustering | |
CN113268370B (en) | Root cause alarm analysis method, system, equipment and storage medium | |
CN113327632B (en) | Unsupervised abnormal sound detection method and device based on dictionary learning | |
CN112380274A (en) | Control process-oriented anomaly detection system | |
CN106327323A (en) | Bank frequent item mode mining method and bank frequent item mode mining system | |
CN110263944A (en) | A kind of multivariable failure prediction method and device | |
CN102509001A (en) | Method for automatically removing time sequence data outlier point | |
CN110245692A (en) | A kind of hierarchy clustering method for Ensemble Numerical Weather Prediction member | |
Cai et al. | An efficient outlier detection approach on weighted data stream based on minimal rare pattern mining | |
CN114021620B (en) | BP neural network feature extraction-based electric submersible pump fault diagnosis method | |
CN114443338A (en) | Sparse negative sample-oriented anomaly detection method, model construction method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170208 |