CN106446081B - Method for mining association relationships in time-series data based on variation consistency - Google Patents

Info

Publication number
CN106446081B
Authority
CN
China
Prior art keywords
cluster
window
variable
variation
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201610814069.7A
Other languages
Chinese (zh)
Other versions
CN106446081A (en)
Inventor
王文青
杨天社
鲍军鹏
张海龙
吴冠
李方正
王超
齐勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
China Xian Satellite Control Center
Original Assignee
Xian Jiaotong University
China Xian Satellite Control Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University, China Xian Satellite Control Center
Priority to CN201610814069.7A
Publication of CN106446081A
Application granted
Publication of CN106446081B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G06F16/287 Visualization; Browsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Complex Calculations (AREA)

Abstract

A method for mining association relationships in time-series data based on variation consistency. First, the time-series variables are preprocessed. Then a wavelet transform is applied to each single variable: the original time series is divided into several windows with a sliding window, a discrete wavelet transform is applied to each window, and the maximum wavelet detail coefficient is extracted. Next, WDC clustering is performed on the maximum wavelet detail coefficients of all windows of a single variable in order to single out the windows whose wavelet features differ from those of most windows; these windows correspond to the change points of the variable. Finally, CCP clustering is performed on the change points of all variables. Variables in the same cluster of the clustering result have approximately the same change points, so they exhibit variation consistency and are considered to have a potential association relationship. By examining the variation consistency between variables, the invention can find not only variables with a linear correlation but also variables with complex nonlinear association relationships, which is important for association analysis among the variables of large-scale complex systems.

Description

Method for mining association relationships in time-series data based on variation consistency
Technical field
The invention belongs to the fields of intelligent information processing and computer technology, and in particular relates to a method for mining association relationships in time-series data based on variation consistency.
Background art
In large-scale complex systems it is often necessary to detect the association relationships among multiple variables, which is of great significance for summarizing the operating rules of the system and for early warning. Complex association relationships may exist between the variables of a system, and these relationships are usually governed by the internal rules of the system. An association can manifest itself in space and time as a co-occurrence relationship, a causal relationship, a trend relationship, and so on. When one variable changes, it causes other variables to change correspondingly.
Summary of the invention
The purpose of the invention is to provide a method for mining association relationships in time-series data based on variation consistency. The method combines wavelet transform theory, used to detect the change points of a single variable, with cluster learning theory, used to examine the similarity between the change-point vectors of multiple variables, and thereby discovers potential association relationships between time-series variables.
To achieve the above purpose, the technical solution of the invention is as follows:
A method for mining association relationships in time-series data based on variation consistency, wherein the system implementing the method comprises a data preprocessing module, a feature extraction module, a WDC clustering module and a CCP clustering module, and the specific steps are:
1) first, using the data preprocessing module 1-1, perform outlier removal, equal-interval interpolation and normalization on the original time-series data to obtain the valid data form of the time-series variables;
2) second, using the feature extraction module 1-2, apply a discrete wavelet transform to each window of the valid data form of a time-series variable and extract the maximum wavelet detail coefficient;
3) then, using the WDC clustering module 1-3, perform WDC clustering on the maximum wavelet detail coefficients of all windows of a single variable; the windows in the clusters of the clustering result that are smaller than a threshold are the change points;
4) finally, using the CCP clustering module 1-4, perform CCP clustering on the change-point vectors of all variables; the variables in the same cluster of the clustering result are associated, and the association relationships of the variables in each cluster and their strengths are output.
The outlier removal, equal-interval interpolation and normalization performed on the original time-series data by the data preprocessing module comprise the following steps:
First, calculate the mean and standard deviation of each window, and judge whether the difference between each data point and the mean of the observation window containing it is greater than 5 times the standard deviation of that observation window; if it is greater, the data point is an outlier and is removed;
Then, perform equal-interval interpolation on the time series after outlier removal. Let the sampling interval be Δt and the start time be T; the set of equally spaced times after interpolation is {T+n*Δt | n = 0, 1, 2, 3, ...}, and the value assigned to time T+i*Δt is the value of the observation in the original series that is nearest to and earlier than T+i*Δt, i.e. the observation at the time immediately preceding the first time in the original series that is greater than T+i*Δt;
Finally, apply linear normalization to the data after equal-interval interpolation. First scan a time series to obtain the maximum (max) and minimum (min) of the observations, then compute the normalized value of each observation point according to the formula $x_i' = (x_i - \min)/\Delta$, which maps the value range of the original time series onto the interval [0, 1], where $x_i$ denotes the value of the i-th observation point and Δ = max - min.
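The three preprocessing steps above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patented implementation: the function names, the non-overlapping observation windows and the default window length of 50 samples are assumptions, since the text does not specify how the observation window is chosen.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def remove_outliers(times, values, window=50, k=5.0):
    """Drop points farther than k standard deviations from their window mean.

    ASSUMPTION: observation windows are consecutive, non-overlapping chunks.
    """
    kept_t, kept_v = [], []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        mu, sigma = mean(chunk), pstdev(chunk)
        for t, v in zip(times[start:start + window], chunk):
            if abs(v - mu) <= k * sigma:
                kept_t.append(t)
                kept_v.append(v)
    return kept_t, kept_v


def resample_previous(times, values, t0, dt, n):
    """Value at t0 + i*dt = observation just before the first timestamp
    that is greater than t0 + i*dt (times must be sorted ascending)."""
    out = []
    for i in range(n):
        idx = bisect_right(times, t0 + i * dt) - 1
        out.append(values[max(idx, 0)])  # ASSUMPTION: clamp before first sample
    return out


def normalize(values):
    """Linear normalization x' = (x - min) / (max - min) onto [0, 1]."""
    lo, hi = min(values), max(values)
    delta = hi - lo
    if delta == 0:
        return [0.0 for _ in values]  # constant series: no spread to rescale
    return [(v - lo) / delta for v in values]
```

Note that with a 5-sigma rule a single spike can only be rejected when the observation window holds more than 26 points, because one point cannot deviate from the window mean by more than sqrt(N-1) population standard deviations; the window length therefore has to be chosen with that in mind.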
The feature extraction steps of the feature extraction module comprise: first, cut the data of a single variable with a sliding window. Let the sampling start point of the original data be time t, the sampling interval be n seconds, the window size be m and the sliding distance be l; then the first window covers the period [t, t+n*m], and the second window starts l after the start of the first window, so it covers the period [t+l, t+l+n*m]; proceeding in the same way gives N windows;
Second, apply a discrete wavelet transform to the data in each window, setting the number of wavelet decomposition levels L according to the window size, and select the maximum wavelet detail coefficient $cD_i$ in the window as the feature of that window; $[i, cD_i]$ denotes the wavelet feature of the i-th window of the original data.
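A small sketch of this feature extraction step is given below. Several choices are assumptions made only for illustration: the text does not name a wavelet, so the Haar wavelet is used; the window size m and sliding distance l are expressed in samples rather than seconds; and "maximum" is taken to mean largest in absolute value across all decomposition levels. The function names are hypothetical.

```python
import math


def sliding_windows(series, m, l):
    """Split a series into windows of m samples, advancing l samples each time."""
    return [series[s:s + m] for s in range(0, len(series) - m + 1, l)]


def haar_detail_coeffs(window, levels):
    """Detail coefficients of a multi-level Haar decomposition (ASSUMED wavelet)."""
    approx = list(window)
    details = []
    for _ in range(levels):
        if len(approx) < 2:
            break
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx) - 1, 2)]
        details.extend((a - b) / math.sqrt(2) for a, b in pairs)
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details


def max_detail_feature(series, m, l, levels=2):
    """Return one feature (i, cD_i) per window; requires m >= 2."""
    features = []
    for i, w in enumerate(sliding_windows(series, m, l)):
        cD = max(abs(d) for d in haar_detail_coeffs(w, levels))
        features.append((i, cD))
    return features
```

For a flat window every detail coefficient is zero, while a window containing a jump yields a clearly larger coefficient, which is what the subsequent clustering step relies on.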
The WDC clustering steps of the WDC clustering module are:
1) initialize the clusters: each window forms its own cluster, whose center is the wavelet feature $[i, cD_i]$ of that window; the number of windows is denoted m and the number of clusters n, so that initially n = m;
2) calculate the sum of squared errors $SSE_n$ of the clustering result according to the following formula:
$SSE_n = \sum_{i=1}^{n} \sum_{j=1}^{w} (cD_j - c_i)^2$
where n denotes the number of clusters; w denotes the number of windows in a cluster; j denotes the index of a window in cluster i; and $c_i$ denotes the center of cluster i;
3) calculate the distance between the centers of any two clusters according to the following formula:
$dist(c_i, c_j) = |c_i - c_j|, \quad i \neq j$
where $dist(c_i, c_j)$ denotes the Manhattan distance between cluster i and cluster j, and $c_i$, $c_j$ denote the centers of the two clusters;
4) merge the two closest clusters and update the cluster center according to the following formula:
$c = \frac{1}{w} \sum_{i=1}^{w} cD_i$
where c denotes the cluster center; w denotes the number of windows in the cluster; and $cD_i$ denotes the maximum wavelet detail coefficient of window i;
5) decrease n by 1;
6) repeat steps 2) to 5) until n = 1;
7) select the clustering result at which the SSE declines fastest, denoted result = {$c_1, c_2, \ldots, c_k$}, where k is the number of clusters at that level of the clustering; in the selection formula, i denotes the clustering level and m is the number of windows, i.e. the maximum number of levels;
8) calculate the distance between any two clusters in result and select the two closest clusters, denoted $c_i$, $c_j$;
9) if $dist(c_i, c_j) \le d$, with d = 0.2, merge the two clusters, calculate the center of the new cluster, and repeat step 8);
10) if $dist(c_i, c_j) > d$, exit the clustering process;
11) the windows contained in the smaller clusters of the clustering result are the change points of the parameter, a smaller cluster being a cluster in which the ratio of its number of windows to the total number of windows is less than the given threshold 0.2; the labels of all windows in the smaller clusters form the change-point set of the parameter, i.e. cpv = {$cp_1, cp_2, \ldots, cp_m$}, where $cp_i$ is a window label.
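The WDC procedure can be sketched as below on the scalar coefficients $cD_i$. This is an illustrative reading, not the patented implementation. In particular, the rule used in step 7 is an assumption: the level k is chosen so that the SSE drop from k-1 clusters to k clusters is largest, because the original selection formula is not available in this text. The function name `wdc_change_points` and its parameter names are hypothetical.

```python
def wdc_change_points(cD, d=0.2, small_ratio=0.2):
    """Return sorted window labels regarded as change points."""
    m = len(cD)
    if m < 2:
        return []

    def center(c):
        # Cluster center: mean of the coefficients of its windows.
        return sum(cD[i] for i in c) / len(c)

    def sse(cs):
        return sum((cD[i] - center(c)) ** 2 for c in cs for i in c)

    def closest_pair(cs):
        # (distance, index a, index b) of the two clusters with nearest centers.
        return min((abs(center(cs[a]) - center(cs[b])), a, b)
                   for a in range(len(cs)) for b in range(a + 1, len(cs)))

    def merge(cs, a, b):
        return [c for k, c in enumerate(cs) if k not in (a, b)] + [cs[a] + cs[b]]

    # Steps 1-6: build every level of the hierarchy, from m clusters down to 1.
    clusters = [[i] for i in range(m)]
    levels = {m: (clusters, sse(clusters))}
    while len(clusters) > 1:
        _, a, b = closest_pair(clusters)
        clusters = merge(clusters, a, b)
        levels[len(clusters)] = (clusters, sse(clusters))

    # Step 7 (ASSUMED rule): largest SSE drop when going from k-1 to k clusters.
    k = max(range(2, m + 1), key=lambda n: levels[n - 1][1] - levels[n][1])
    result = levels[k][0]

    # Steps 8-10: keep merging while the closest centers are within d.
    while len(result) > 1:
        dist, a, b = closest_pair(result)
        if dist > d:
            break
        result = merge(result, a, b)

    # Step 11: windows of clusters holding less than small_ratio of all windows.
    return sorted(i for c in result if len(c) / m < small_ratio for i in c)
```

The fixed thresholds d = 0.2 and 0.2 for the small-cluster ratio come from the text; they presuppose coefficients computed on data normalized to [0, 1].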
The CCP clustering steps of the CCP clustering module comprise:
1) each single variable forms its own cluster; given n variables and denoting the number of clusters by k, initially k = n;
2) calculate the variation consistency coefficient CoC of any two clusters according to the following formula:
$CoC(c) = \frac{2}{z(z-1)} \sum_{\{x, y\} \subseteq c} CoC(x, y)$
where CoC(c) denotes the variation consistency coefficient of cluster c (the new cluster obtained by merging $c_i$ and $c_j$); x and y are any two variables in cluster c; z is the number of variables in the cluster, so there are z(z-1)/2 combinations of two variables; the variation consistency coefficient of a cluster equals the average of the variation consistency coefficients of all pairs of variables in the cluster;
CoC(x, y) denotes the variation consistency coefficient of the two variables x and y and is computed from the following quantities: $|cpv_x|$ denotes the number of change points of variable x, i.e. the size of the change-point set of that parameter; $|cpv_y|$ denotes the number of change points of variable y; $|cpv_{xy}|$ denotes the number of change points common to variables x and y;
$cpv_{xy} = cpv_x \cap cpv_y$
where $cpv_x$, $cpv_y$ denote the change-point sets of variables x and y respectively;
3) select the two clusters $c_i$, $c_j$ with the strongest variation consistency, and denote the variation consistency coefficient between them max_CoC;
4) if max_CoC is greater than or equal to the given threshold 0.8, merge clusters $c_i$ and $c_j$, decrease k by 1, and go to step 2);
5) if max_CoC is less than the given threshold, exit the clustering process; in the final clustering result, the variables in the same cluster have an association relationship, and the strength of the association between them is the variation consistency coefficient CoC of the corresponding cluster.
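The CCP steps can be illustrated with the following sketch. The cluster-level coefficient follows the description above (the mean of all pairwise coefficients). The pairwise coefficient, however, is an assumption: the original formula for CoC(x, y) is not available in this text, so a Dice-style ratio of shared change points to the mean set size is used purely as a stand-in. The function names and the dictionary input format are hypothetical.

```python
from itertools import combinations


def coc_pair(cpv_x, cpv_y):
    """ASSUMED form of CoC(x, y): 2*|cpv_xy| / (|cpv_x| + |cpv_y|)."""
    total = len(cpv_x) + len(cpv_y)
    return 2 * len(cpv_x & cpv_y) / total if total else 0.0


def coc_cluster(cluster, cpv):
    """CoC(c): mean pairwise CoC over the z(z-1)/2 variable pairs of cluster c."""
    pairs = list(combinations(cluster, 2))
    return sum(coc_pair(cpv[x], cpv[y]) for x, y in pairs) / len(pairs)


def ccp_cluster(cpv, threshold=0.8):
    """cpv maps a variable name to its set of change-point window labels.

    Returns a list of (cluster, strength); strength is None for singletons.
    """
    clusters = [(name,) for name in cpv]  # step 1: one cluster per variable
    while len(clusters) > 1:
        # Steps 2-3: score every candidate merge and keep the best one.
        best, a, b = max(
            (coc_cluster(ca + cb, cpv), ia, ib)
            for (ia, ca), (ib, cb) in combinations(enumerate(clusters), 2)
        )
        if best < threshold:  # step 5: stop when no merge is consistent enough
            break
        merged = clusters[a] + clusters[b]  # step 4: merge and iterate
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [(c, coc_cluster(c, cpv) if len(c) > 1 else None) for c in clusters]
```

For example, three variables whose change points are {1, 5, 9}, {1, 5, 9} and {1, 5, 9, 12} end up in one cluster under this stand-in coefficient, while a variable with change points {3, 7} stays alone.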
Variation consistency means that several time-series variables always change at nearly the same times. That is, if over a long period multiple variables almost always either change together or remain unchanged together, these variables have a potential association relationship. The invention uses the variation consistency of variables as the basis for mining, from a large set of variables, the subsets of variables that are associated. Compared with the prior art, the invention has the following advantages. The invention examines the association relationships among multiple variables from the angle of variation consistency, and these relationships may be nonlinear, such as exponential, logarithmic or polynomial functional relationships. The invention focuses on the association that variables exhibit when they change, whereas general association-rule mining methods only mine frequent patterns under normal conditions. Compared with the traditional association-rule mining methods Apriori and FP-Tree, the invention is suited to association analysis over a large number of variables and to finding the potential associations among parameters.
Brief description of the drawings
Fig. 1 is a block diagram of the modules of the system of the invention.
Fig. 2 is a flowchart of the WDC clustering module of the invention.
Fig. 3 shows the CCP clustering module of the invention.
Table 1 lists the data simulation functions of the example time-series variables of the invention.
Fig. 4 shows segments of the simulated data of some of the example time-series variables of the invention.
Table 2 gives the association relationship mining results of the CCP clustering module for the example time-series variables.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings and embodiments.
Referring to Fig. 1, the system implementing the invention comprises a data preprocessing module 1-1, a feature extraction module 1-2, a WDC clustering module 1-3 and a CCP clustering module 1-4. The specific technical solution of the invention is:
Step 1: using the data preprocessing module 1-1, perform outlier removal, equal-interval interpolation and normalization on the original time-series data to obtain the valid data form of the time-series variables;
First, calculate the mean and standard deviation of each window, and judge whether the difference between each data point and the mean of the observation window containing it is greater than 5 times the standard deviation of that observation window; if it is greater, the data point is an outlier and is removed;
Then, perform equal-interval interpolation on the time series after outlier removal. Let the sampling interval be Δt and the start time be T; the set of equally spaced times after interpolation is {T+n*Δt | n = 0, 1, 2, 3, ...}, and the value assigned to time T+i*Δt is the value of the observation in the original series that is nearest to and earlier than T+i*Δt, i.e. the observation at the time immediately preceding the first time in the original series that is greater than T+i*Δt;
Finally, apply linear normalization to the data after equal-interval interpolation. First scan a time series to obtain the maximum (max) and minimum (min) of the observations, then compute the normalized value of each observation point according to the formula $x_i' = (x_i - \min)/\Delta$, which maps the value range of the original time series onto the interval [0, 1], where $x_i$ denotes the value of the i-th observation point and Δ = max - min;
Step 2: using the feature extraction module 1-2, apply a discrete wavelet transform to each window of the valid data form of a time-series variable and extract the maximum wavelet detail coefficient;
First, cut the data of a single variable with a sliding window. Let the sampling start point of the original data be time t, the sampling interval be n seconds, the window size be m and the sliding distance be l; then the first window covers the period [t, t+n*m], and the second window starts l after the start of the first window, so it covers the period [t+l, t+l+n*m]; proceeding in the same way gives N windows;
Second, apply a discrete wavelet transform to the data in each window, setting the number of wavelet decomposition levels L according to the window size, and select the maximum wavelet detail coefficient $cD_i$ in the window as the feature of that window; $[i, cD_i]$ denotes the wavelet feature of the i-th window of the original data;
Step 3: referring to Fig. 2, using the WDC (Wavelet Detail Coefficient) clustering module 1-3, perform WDC clustering on the maximum wavelet detail coefficients of all windows of a single variable; the windows in the clusters of the clustering result that are smaller than a threshold are the change points;
1) first perform step 2-1, the initialization of the clusters: each window forms its own cluster, whose center is the wavelet feature $cD_i$ of that window; the number of windows is denoted m and the number of clusters n, so that initially n = m;
2) then perform step 2-2 and calculate the sum of squared errors $SSE_n$ (Sum of Squared Error) of the clustering result according to the following formula:
$SSE_n = \sum_{i=1}^{n} \sum_{j=1}^{w} (cD_j - c_i)^2$
where n denotes the number of clusters; w denotes the number of windows in a cluster; j denotes the index of a window in cluster i; and $c_i$ denotes the center of cluster i;
3) perform step 2-3 and calculate the distance between the centers of any two clusters according to the following formula:
$dist(c_i, c_j) = |c_i - c_j|, \quad i \neq j$
where $dist(c_i, c_j)$ denotes the Manhattan distance between cluster i and cluster j, and $c_i$, $c_j$ denote the centers of the two clusters;
4) perform step 2-4: merge the two closest clusters and update the cluster center according to the following formula:
$c = \frac{1}{w} \sum_{i=1}^{w} cD_i$
where c denotes the cluster center; w denotes the number of windows in the cluster; and $cD_i$ denotes the maximum wavelet detail coefficient of window i;
5) perform step 2-5: decrease n by 1;
6) perform step 2-6: repeat steps 2) to 5) until n = 1;
7) perform step 2-7: select the clustering result at which the SSE declines fastest, denoted result = {$c_1, c_2, \ldots, c_k$}, where k is the number of clusters at that level of the clustering; in the selection formula, i denotes the clustering level and m is the number of windows, i.e. the maximum number of levels;
8) perform step 2-8: calculate the distance between any two clusters in result and select the two closest clusters, denoted $c_i$, $c_j$;
9) perform step 2-9: if $dist(c_i, c_j) \le d$, with d = 0.2, merge the two clusters, calculate the center of the new cluster, and repeat step 8);
10) perform step 2-10: if $dist(c_i, c_j) > d$, exit the clustering process;
11) the windows contained in the smaller clusters of the clustering result are the change points of the parameter, a smaller cluster being a cluster in which the ratio of its number of windows to the total number of windows is less than the given threshold 0.2; the labels of all windows in the smaller clusters form the change-point set of the parameter, i.e. cpv = {$cp_1, cp_2, \ldots, cp_m$}, where $cp_i$ is a window label.
Step 4: referring to Fig. 3, using the CCP (Clustering based on Change Point) clustering module 1-4, perform CCP clustering on the change-point vectors of all variables; the variables in the same cluster of the clustering result are associated, and the association relationships of the variables in each cluster and their strengths are output;
1) first perform step 3-1: each single variable forms its own cluster; given n variables and denoting the number of clusters by k, initially k = n;
2) perform step 3-2 and calculate the variation consistency coefficient CoC of any two clusters according to the following formula:
$CoC(c) = \frac{2}{z(z-1)} \sum_{\{x, y\} \subseteq c} CoC(x, y)$
where CoC(c) denotes the variation consistency coefficient of cluster c (the new cluster obtained by merging $c_i$ and $c_j$); x and y are any two variables in cluster c; z is the number of variables in the cluster, so there are z(z-1)/2 combinations of two variables; the variation consistency coefficient of a cluster equals the average of the variation consistency coefficients of all pairs of variables in the cluster;
CoC(x, y) denotes the variation consistency coefficient of the two variables x and y and is computed from the following quantities: $|cpv_x|$ denotes the number of change points of variable x (i.e. the size of the change-point set of that parameter); $|cpv_y|$ denotes the number of change points of variable y; $|cpv_{xy}|$ denotes the number of change points common to variables x and y;
$cpv_{xy} = cpv_x \cap cpv_y$
where $cpv_x$, $cpv_y$ denote the change-point sets of variables x and y respectively;
3) perform step 3-3: select the two clusters $c_i$, $c_j$ with the strongest variation consistency, and denote the variation consistency coefficient between them max_CoC;
4) perform step 3-4: if max_CoC is greater than or equal to the given threshold 0.8, merge clusters $c_i$ and $c_j$, decrease k by 1, and go to step 2);
5) perform step 3-5: if max_CoC is less than the given threshold, exit the clustering process; in the final clustering result, the variables in the same cluster have an association relationship, and the strength of the association between them is the variation consistency coefficient CoC of the corresponding cluster.
Table 1 lists the simulation functions of the example time-series variables. Twenty days of data were simulated for each variable according to its simulation function, with a sampling interval of 20 minutes. There are three groups of correlated variables, each containing 11 variables: the variables of group A are related to $g_1(x)$, those of group B to $g_2(x)$, and those of group C to $g_3(x)$. The formulas are as follows:
Table 1
Fig. 4 shows segments of the simulated data of some of the example time-series variables. The parts marked with yellow and white bars in the figure indicate windows, and "cDi" denotes the maximum wavelet detail coefficient of the i-th window.
Table 2 gives the association relationship mining results of the CCP clustering module for the example time-series variables. Variables in the same cluster are considered to have an association relationship, and the strength of the association between them is the variation consistency coefficient CoC of the corresponding cluster.
Table 2

Claims (5)

1. A method for mining association relationships in time-series data based on variation consistency, characterized in that the system implementing the method comprises a data preprocessing module (1-1), a feature extraction module (1-2), a WDC clustering module (1-3) and a CCP clustering module (1-4), and the specific steps are:
1) first, using the data preprocessing module (1-1), perform outlier removal, equal-interval interpolation and normalization on the original time-series data to obtain the valid data form of the time-series variables;
2) second, using the feature extraction module (1-2), apply a discrete wavelet transform to each window of the valid data form of a time-series variable and extract the maximum wavelet detail coefficient;
3) then, using the WDC clustering module (1-3), perform WDC clustering on the maximum wavelet detail coefficients of all windows of a single variable; the windows in the clusters of the clustering result that are smaller than a threshold are the change points;
4) finally, using the CCP clustering module (1-4), perform CCP clustering on the change-point vectors of all variables; the variables in the same cluster of the clustering result are associated, and the association relationships of the variables in each cluster and their strengths are output.
2. The method for mining association relationships in time-series data based on variation consistency according to claim 1, characterized in that the outlier removal, equal-interval interpolation and normalization performed on the original time-series data by the data preprocessing module (1-1) comprise the following steps:
First, calculate the mean and standard deviation of each window, and judge whether the difference between each data point and the mean of the observation window containing it is greater than 5 times the standard deviation of that observation window; if it is greater, the data point is an outlier and is removed;
Then, perform equal-interval interpolation on the time series after outlier removal. Let the sampling interval be Δt and the start time be T; the set of equally spaced times after interpolation is {T+n*Δt | n = 0, 1, 2, 3, ...}, and the value assigned to time T+i*Δt is the value of the observation in the original series that is nearest to and earlier than T+i*Δt, i.e. the observation at the time immediately preceding the first time in the original series that is greater than T+i*Δt;
Finally, apply linear normalization to the data after equal-interval interpolation. First scan a time series to obtain the maximum (max) and minimum (min) of the observations, then compute the normalized value of each observation point according to the formula $x_i' = (x_i - \min)/\Delta$, which maps the value range of the original time series onto the interval [0, 1], where $x_i$ denotes the value of the i-th observation point and Δ = max - min.
3. The method for mining association relationships in time-series data based on variation consistency according to claim 1, characterized in that the feature extraction steps of the feature extraction module (1-2) comprise: first, cut the data of a single variable with a sliding window. Let the sampling start point of the original data be time t, the sampling interval be n seconds, the window size be m and the sliding distance be l; then the first window covers the period [t, t+n*m], and the second window starts l after the start of the first window, so it covers the period [t+l, t+l+n*m]; proceeding in the same way gives N windows;
Second, apply a discrete wavelet transform to the data in each window, setting the number of wavelet decomposition levels L according to the window size, and select the maximum wavelet detail coefficient $cD_i$ in the window as the feature of that window; $[i, cD_i]$ denotes the wavelet feature of the i-th window of the original data.
4. The method for mining association relationships in time-series data based on variation consistency according to claim 3, characterized in that the WDC clustering steps of the WDC clustering module (1-3) comprise:
1) initialize the clusters: each window forms its own cluster, whose center is the wavelet feature $[i, cD_i]$ of that window; the number of windows is denoted m and the number of clusters n, so that initially n = m;
2) calculate the sum of squared errors $SSE_n$ of the clustering result according to the following formula:
$SSE_n = \sum_{i=1}^{n} \sum_{j=1}^{w} (cD_j - c_i)^2$
where n denotes the number of clusters; w denotes the number of windows in a cluster; j denotes the index of a window in cluster i; and $c_i$ denotes the center of cluster i;
3) calculate the distance between the centers of any two clusters according to the following formula:
$dist(c_i, c_j) = |c_i - c_j|, \quad i \neq j$
where $dist(c_i, c_j)$ denotes the Manhattan distance between cluster i and cluster j, and $c_i$, $c_j$ denote the centers of the two clusters;
4) merge the two closest clusters and update the cluster center according to the following formula:
$c = \frac{1}{w} \sum_{i=1}^{w} cD_i$
where c denotes the cluster center; w denotes the number of windows in the cluster; and $cD_i$ denotes the maximum wavelet detail coefficient of window i;
5) decrease n by 1;
6) repeat steps 2) to 5) until n = 1;
7) select the clustering result at which the SSE declines fastest, denoted result = {$c_1, c_2, \ldots, c_k$}, where k is the number of clusters of that clustering result; in the selection formula, i denotes the clustering level and m is the number of windows, i.e. the maximum number of levels;
8) calculate the distance between any two clusters in result and select the two closest clusters, denoted $c_i$, $c_j$;
9) if $dist(c_i, c_j) \le d$, with d = 0.2, merge the two clusters, calculate the center of the new cluster, and repeat step 8);
10) if $dist(c_i, c_j) > d$, exit the clustering process;
11) the windows contained in the smaller clusters of the clustering result are the change points of the parameter, a smaller cluster being a cluster in which the ratio of its number of windows to the total number of windows is less than the given threshold 0.2; the labels of all windows in the smaller clusters form the change-point set of the parameter, i.e. cpv = {$cp_1, cp_2, \ldots, cp_m$}, where $cp_i$ is a window label.
5. The method for mining association relationships in time-series data based on variation consistency according to claim 1, characterized in that the CCP clustering steps of the CCP clustering module (1-4) comprise:
1) each single variable forms its own cluster; given n variables and denoting the number of clusters by k, initially k = n;
2) calculate the variation consistency coefficient CoC of any two clusters according to the following formula:
$CoC(c) = \frac{2}{z(z-1)} \sum_{\{x, y\} \subseteq c} CoC(x, y)$
where CoC(c) denotes the variation consistency coefficient of cluster c (the new cluster obtained by merging $c_i$ and $c_j$); x and y are any two variables in cluster c; z is the number of variables in the cluster, so there are z(z-1)/2 combinations of two variables; the variation consistency coefficient of a cluster equals the average of the variation consistency coefficients of all pairs of variables in the cluster;
CoC(x, y) denotes the variation consistency coefficient of the two variables x and y and is computed from the following quantities: $|cpv_x|$ denotes the number of change points of variable x, i.e. the size of the change-point set of that parameter; $|cpv_y|$ denotes the number of change points of variable y; $|cpv_{xy}|$ denotes the number of change points common to variables x and y;
$cpv_{xy} = cpv_x \cap cpv_y$
where $cpv_x$, $cpv_y$ denote the change-point sets of variables x and y respectively;
3) select the two clusters $c_i$, $c_j$ with the strongest variation consistency, and denote the variation consistency coefficient between them max_CoC;
4) if max_CoC is greater than or equal to the given threshold 0.8, merge clusters $c_i$ and $c_j$, decrease k by 1, and go to step 2);
5) if max_CoC is less than the given threshold, exit the clustering process; in the final clustering result, the variables in the same cluster have an association relationship, and the strength of the association between them is the variation consistency coefficient CoC of the corresponding cluster.
CN201610814069.7A 2016-09-09 2016-09-09 The method for excavating time series data incidence relation based on variation consistency Expired - Fee Related CN106446081B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610814069.7A CN106446081B (en) 2016-09-09 2016-09-09 The method for excavating time series data incidence relation based on variation consistency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610814069.7A CN106446081B (en) 2016-09-09 2016-09-09 The method for excavating time series data incidence relation based on variation consistency

Publications (2)

Publication Number Publication Date
CN106446081A CN106446081A (en) 2017-02-22
CN106446081B true CN106446081B (en) 2019-08-13

Family

ID=58169070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610814069.7A Expired - Fee Related CN106446081B (en) 2016-09-09 2016-09-09 The method for excavating time series data incidence relation based on variation consistency

Country Status (1)

Country Link
CN (1) CN106446081B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948646A (en) * 2019-01-24 2019-06-28 Xian Jiaotong University A kind of time series data method for measuring similarity and gauging system
CN112231326B (en) * 2020-09-30 2022-08-30 New H3C Big Data Technologies Co., Ltd. Method and server for detecting Ceph object
CN113282645A (en) * 2021-07-23 2021-08-20 Guangdong-Hong Kong-Macao Greater Bay Area Hard Technology Innovation Research Institute Satellite time sequence parameter analysis method, system, terminal and storage medium
CN116340796B (en) * 2023-05-22 2023-12-22 Ping An Technology (Shenzhen) Co., Ltd. Time sequence data analysis method, device, equipment and storage medium
CN117472915B (en) * 2023-12-27 2024-03-15 China Xian Satellite Control Center Hierarchical storage method of time sequence data oriented to multiple Key values

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105205111A (en) * 2015-09-01 2015-12-30 Xian Jiaotong University System and method for mining failure modes of time series data
CN105843919A (en) * 2016-03-24 2016-08-10 Yunnan University Moving object track clustering method based on multi-feature fusion and clustering ensemble

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20160189183A1 (en) * 2014-12-31 2016-06-30 Flytxt BV System and method for automatic discovery, annotation and visualization of customer segments and migration characteristics

Also Published As

Publication number Publication date
CN106446081A (en) 2017-02-22

Similar Documents

Publication Publication Date Title
CN106446081B (en) The method for excavating time series data incidence relation based on variation consistency
CN106384128A (en) Method for mining time series data state correlation
CN107272655B (en) Batch process fault monitoring method based on multistage ICA-SVDD
CN105974265B (en) A kind of electric network fault cause diagnosis method based on svm classifier technology
KR101232945B1 (en) Two-class classifying/predicting model making method, computer readable recording medium recording classifying/predicting model making program, and two-class classifying/predicting model making device
CN104142918A (en) Short text clustering and hotspot theme extraction method based on TF-IDF characteristics
CN105893208A (en) Cloud computing platform system fault prediction method based on hidden semi-Markov models
CN103279123A (en) Method of monitoring faults in sections for intermittent control system
CN108491886A (en) A kind of sorting technique of the polynary time series data based on convolutional neural networks
CN106709509B (en) Satellite telemetry data clustering method based on time series special points
CN114281864A (en) Correlation analysis method for power network alarm information
CN106682835B (en) Data-driven complex electromechanical system service quality state evaluation method
CN105205113A (en) System and method for excavating abnormal change process of time series data
CN106649438A (en) Time series data unexpected fault detection method
CN112738014A (en) Industrial control flow abnormity detection method and system based on convolution time sequence network
CN110059126B (en) LKJ abnormal value data-based complex correlation network analysis method and system
CN110348510B (en) Data preprocessing method based on staged characteristics of deepwater oil and gas drilling process
CN109558436A (en) Air station flight delay causality method for digging based on entropy of transition
CN109597901B (en) Data analysis method based on biological data
CN115935144A (en) Denoising and reconstructing method for operation and maintenance data
CN110059938B (en) Power distribution network planning method based on association rule driving
CN103729197B (en) Multi-granularity layer software clustering method based on LDA (latent dirichlet allocation) model
CN108427753A (en) A kind of new data digging method
CN114020811A (en) Data anomaly detection method and device and electronic equipment
Zhang et al. Similarity analysis of industrial alarm floods based on word embedding and move-split-merge distance

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190813