CN106384128A - Method for mining time series data state correlation - Google Patents

Publication number
CN106384128A
Authority
CN
China
Prior art keywords
cluster
window
state
data
clusters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610814387.3A
Other languages
Chinese (zh)
Inventor
王文青
王徐华
杨天社
鲍军鹏
赵静
李辉
张海龙
齐勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
China Xian Satellite Control Center
Original Assignee
Xian Jiaotong University
China Xian Satellite Control Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University, China Xian Satellite Control Center filed Critical Xian Jiaotong University
Priority to CN201610814387.3A priority Critical patent/CN106384128A/en
Publication of CN106384128A publication Critical patent/CN106384128A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for mining state associations in time series data. First, the time series variables are preprocessed by outlier removal, equal-interval interpolation, and normalization. Next, states are mined for each variable individually: a dynamic partition clustering method clusters the combined feature vectors of all windows of the variable, and windows in different clusters represent different states. The clusters are sorted by size and each window is represented by the character assigned to its cluster, so the original numerical data is converted into a string, which is the state string of the variable. The state strings of all variables are then aligned to obtain a multivariate state matrix. The Apriori algorithm is used to mine association rules between the states of different variables, giving a formal expression of each rule and its association strength. Finally, the association rules are reduced to remove redundant information. The invention is robust to noise and is suited to fine-grained analysis of the associations between the state values of a small set of parameters, yielding the mapping between state values.

Description

A Method for Mining State Associations in Time Series Data

Technical field

The invention belongs to the field of intelligent information processing and computer technology, and in particular relates to a method for mining state associations in time series data.

Background

In a large complex system, the states of different variables are related to one another. These relationships are governed by the internal laws of the system and are reflected in some way in anomalous data. In time and space, an association can appear as a co-occurrence relationship, a causal relationship, a precursor relationship, a correlation, and so on. When the state of the system changes, the variables change accordingly. The system behaves differently in normal and abnormal states, and in anomalous data this shows up as different patterns of variation in the variables. Analyzing how the anomalous data of multiple variables changes, and mining the associations between the states of different variables, therefore plays an important role in summarizing how the system operates and in discovering knowledge about potential faults.

Summary of the invention

The purpose of the present invention is to provide a method for mining state associations in time series data. The method combines feature extraction with clustering to mine the states of each individual variable, then uses the Apriori algorithm to mine state association rules between different variables, giving a formal representation of each rule and its association strength, and finally reduces the association rules to eliminate redundant information. The invention takes into account the fuzziness or uncertainty of variable values, is robust to noise, and is suited to fine-grained analysis of the associations between the state values of a small set of parameters, yielding the mapping between state values.

To achieve the above purpose, the technical scheme adopted by the present invention is as follows.

A method for mining state associations in time series data. The system that implements the method comprises a data preprocessing module, a feature extraction module, a dynamic partition clustering module, a multivariate state matrix generation module, an Apriori state association mining module, and an association rule reduction module. The specific steps are:

1) First, the data preprocessing module applies outlier removal, equal-interval interpolation, and normalization to the original time series data to obtain data in a usable form;

2) Second, the feature extraction module divides the usable data of each time series variable into windows of equal length and extracts features from each window, including Fourier features, statistical features, and wavelet features, which together form a feature vector;

3) Then, the dynamic partition clustering module performs dynamic partition clustering on the feature vectors of all windows of a single variable. The resulting clusters are sorted by size: the largest cluster is denoted by the character 'a', the second largest by 'b', and so on. Clusters smaller than the given threshold of 2 are regarded as noise and denoted by '?'. Each window is represented by the character of its cluster, so the original numerical data is converted into a string, which is the state string of the variable;

4) The multivariate state matrix generation module aligns the state strings of all variables in time to form a state matrix;

5) The Apriori state association mining module uses the Apriori algorithm to mine frequent itemsets and association rules from the multivariate state matrix;

6) Finally, the association rule reduction module reduces the mined association rules, eliminating redundant information, to obtain the final multivariate state association rules.

The outlier removal step of the data preprocessing module is as follows: compute the mean and standard deviation of each window, and check whether the difference between each data point and the mean of its observation window exceeds 5 times the standard deviation of that window; if it does, the data point is an outlier and is removed. Equal-interval interpolation is then applied to the outlier-free time series. Let the interval be Δt and the start time be T; the set of times after interpolation is {T+n·Δt | n=0,1,2,3,…}. The value at time T+i·Δt is the value at the nearest original time earlier than T+i·Δt, that is, the observation at the time immediately preceding the first original time greater than T+i·Δt. The interpolated data is then linearly normalized: the time series is scanned once to obtain the maximum (max) and minimum (min) of the observations, and the normalized value of each observation is computed by the following formula, which maps the value range of the original time series onto the interval [0,1]:

$$x_i' = \frac{x_i - \min}{\Delta}$$

where $x_i$ is the value of the i-th observation, $x_i'$ is its normalized value, and Δ = max − min.
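The three preprocessing operations above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the fixed window length of 50 points, and the use of NumPy are assumptions. The grid point at the start time T takes the observation at T itself, since no earlier observation exists there.

```python
import numpy as np

def remove_outliers(t, x, window=50, k=5.0):
    """Drop points farther than k window standard deviations from their window mean."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    keep = np.ones(len(x), dtype=bool)
    for start in range(0, len(x), window):  # window length is an assumed parameter
        seg = x[start:start + window]
        keep[start:start + window] = np.abs(seg - seg.mean()) <= k * seg.std()
    return t[keep], x[keep]

def resample_previous(t, x, dt):
    """Equal-interval grid; each grid time takes the latest observation at or before it."""
    t, x = np.asarray(t, float), np.asarray(x, float)
    grid = np.arange(t[0], t[-1] + dt / 2, dt)
    idx = np.searchsorted(t, grid, side="right") - 1
    return grid, x[idx]

def min_max_normalize(x):
    """Linear normalization onto [0, 1] using delta = max - min."""
    x = np.asarray(x, float)
    delta = x.max() - x.min()
    if delta == 0:  # constant series: assumed to map to all zeros
        return np.zeros_like(x)
    return (x - x.min()) / delta
```

Note that a single extreme point in a window of n points lies at most sqrt(n − 1) standard deviations from the window mean, so the 5-sigma test can only fire when the window holds more than 26 points.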

Feature extraction module: first, the univariate data is cut into windows of the configured length; second, features are extracted from the data in each window, including statistical features, Fourier features, and wavelet features. v1 = [mean, variance] holds the statistical features, where the mean reflects the average level of the data in a window and the variance describes how much the data in the window fluctuates. v2 = [Fourier coefficient 1, Fourier frequency 1, Fourier coefficient 2, Fourier frequency 2] holds the frequency-domain features: a Fourier transform yields a series of Fourier coefficients, which are sorted by absolute value in descending order, and the two largest coefficients and their corresponding frequencies are selected. v3 = [wavelet detail coefficient 1, …, wavelet detail coefficient n] holds the time-frequency features: a discrete wavelet transform is applied to each window to obtain n wavelet detail coefficients. These three groups of features are combined to form the window feature vector v = [v1, v2, v3].
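A possible sketch of the window feature extraction follows. Several choices here are assumptions that the text does not fix: the wavelet is taken to be a single-level Haar transform, the "Fourier coefficient" is taken to be the magnitude of the complex coefficient, the DC term is not excluded from the ranking, and all function names are hypothetical.

```python
import numpy as np

def split_windows(x, length):
    """Cut a series into consecutive windows of equal length, dropping any remainder."""
    x = np.asarray(x, float)
    n = len(x) // length
    return x[: n * length].reshape(n, length)

def haar_detail_coefficients(window):
    """Single-level Haar detail coefficients: scaled differences of adjacent pairs."""
    w = np.asarray(window, float)
    w = w[: len(w) // 2 * 2]  # drop a trailing sample if the length is odd
    return (w[0::2] - w[1::2]) / np.sqrt(2.0)

def window_features(window, sample_spacing=1.0):
    """Return (v1, v2, v3): statistical, Fourier, and wavelet features of one window."""
    w = np.asarray(window, float)
    v1 = np.array([w.mean(), w.var()])
    spectrum = np.fft.rfft(w)
    freqs = np.fft.rfftfreq(len(w), d=sample_spacing)
    top = np.argsort(-np.abs(spectrum))[:2]  # indices of the two largest magnitudes
    v2 = np.array([np.abs(spectrum[top[0]]), freqs[top[0]],
                   np.abs(spectrum[top[1]]), freqs[top[1]]])
    v3 = haar_detail_coefficients(w)
    return v1, v2, v3
```

Keeping v1, v2, and v3 as separate vectors, rather than concatenating them, matches the way the distance between windows is later computed per component.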

The dynamic partition clustering module clusters the feature vectors of all windows of a single variable using the dynamic partition clustering method, as follows:

1) The first window forms a cluster by itself; the cluster center is the combined feature vector of that window;

2) Initial partition of clusters: compute the similarity between the 2nd window and the center of the 1st cluster by the following formula:

$$\cos(v_{ik}, v_{jk}) = \frac{v_{ik} \cdot v_{jk}}{|v_{ik}| \times |v_{jk}|}$$

where $\cos(v_{ik}, v_{jk})$ is the cosine similarity between the $v_k$ (k = 1, 2, 3) vector of window i and the $v_k$ vector of window j, with $\cos(v_{ik}, v_{jk}) \in [-1, +1]$. The distance between two windows is computed by the following formulas:

$$d(v_{ik}, v_{jk}) = 1 - \frac{\cos(v_{ik}, v_{jk}) + 1}{2}$$

$$dist(v_i, v_j) = \frac{\sum_{k=1}^{3} d(v_{ik}, v_{jk})}{3}$$

where $dist(v_i, v_j)$ is the distance between windows i and j, with $dist(v_i, v_j) \in [0, 1]$;

If dist < d (d = 0.2), window 2 is merged into the first cluster and the cluster center is updated immediately by the following formula:

$$cv_k = \frac{\sum_{i=1}^{n} v_{ik}}{n}, \quad (k = 1, 2, 3)$$

where $cv_k$ is the k-th feature vector of cluster center c, equal to the mean of the k-th feature vectors of all windows in the cluster. The center c of a cluster is the combination of $cv_1$, $cv_2$, $cv_3$, that is, the mean of the combined feature vectors of all windows in the cluster. If dist ≥ d, window 2 forms a cluster by itself. The remaining windows are processed in turn: compute the distance between the i-th window and the centers of all clusters created so far, and take the distance dist to the nearest cluster c; if dist < d, window i is merged into cluster c, otherwise it forms a cluster by itself;

3) Cluster adjustment: take the i-th window (i = 1, 2, …, m), compute its distance dist to the centers of all clusters, and select the smallest dist and the corresponding cluster c. If dist ≤ d and window i is not in cluster c, move window i from its original cluster to cluster c; if dist ≤ d and window i is already in cluster c, leave window i unchanged; if dist > d, remove window i from its original cluster and make it a cluster by itself. Repeat until all windows have been processed, then compute the centers of all clusters. If any cluster center has changed, repeat the adjustment process, i.e. step 3), until no cluster center changes; if all cluster centers are unchanged, go to step 4);

4) Cluster merging: compute the distance between the centers of every pair of clusters, and select the two closest clusters $c_i$, $c_j$ and their distance dist. If dist ≤ α (α = 0.3), merge clusters $c_i$ and $c_j$, compute the center of the merged cluster, and repeat the merging process 4). If dist > α, no two clusters are close enough to be merged, so the merging process exits and the clustering algorithm ends. In the clustering result, windows in the same cluster have similar features and are regarded as one state, while windows in different clusters represent different states. The clusters are sorted by size: the largest cluster is denoted by the character 'a', the second largest by 'b', and so on; clusters smaller than the given threshold are regarded as noise and denoted by '?'. Each window is represented by the character of its cluster, so the original numerical data is converted into a string, which gives the state string of each variable.
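The four clustering steps above can be sketched as below. This is a minimal illustration under stated assumptions, not the patent's implementation. Each window is assumed to be a tuple (v1, v2, v3) of NumPy vectors. A zero vector is assumed to have cosine similarity 0. Convergence of the adjustment step is detected by "no window moved", a simplification of "no cluster center changed". The `max_rounds` guard and all names are hypothetical.

```python
import numpy as np

def component_distance(a, b):
    """d = 1 - (cos + 1) / 2 for one feature group."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    cos = float(np.dot(a, b) / denom) if denom > 0 else 0.0  # assumption for zero vectors
    return 1.0 - (cos + 1.0) / 2.0

def window_distance(vi, vj):
    """Mean of the per-group distances over the three feature groups."""
    return sum(component_distance(a, b) for a, b in zip(vi, vj)) / len(vi)

def centre(members, windows):
    """Cluster center: per-group mean of the member windows' feature vectors."""
    return tuple(np.mean([windows[i][k] for i in members], axis=0) for k in range(3))

def nearest(window, centres):
    dists = [window_distance(window, c) for c in centres]
    best = int(np.argmin(dists))
    return best, dists[best]

def dynamic_partition_clustering(windows, d=0.2, alpha=0.3, max_rounds=100):
    # Steps 1 and 2: initial partition, updating the center on every merge.
    clusters, centres = [[0]], [centre([0], windows)]
    for i in range(1, len(windows)):
        best, dist = nearest(windows[i], centres)
        if dist < d:
            clusters[best].append(i)
            centres[best] = centre(clusters[best], windows)
        else:
            clusters.append([i])
            centres.append(centre([i], windows))
    # Step 3: adjustment; centers are recomputed only after a full pass.
    for _ in range(max_rounds):
        moved = False
        for i in range(len(windows)):
            best, dist = nearest(windows[i], centres)
            home = next(j for j, m in enumerate(clusters) if i in m)
            if dist <= d and best != home:
                clusters[home].remove(i)
                clusters[best].append(i)
                moved = True
            elif dist > d and len(clusters[home]) > 1:
                clusters[home].remove(i)
                clusters.append([i])
                centres.append(centre([i], windows))
                moved = True
        clusters = [m for m in clusters if m]
        centres = [centre(m, windows) for m in clusters]
        if not moved:
            break
    # Step 4: merge the closest pair while their distance is at most alpha.
    while len(clusters) > 1:
        dist, a, b = min((window_distance(centres[a], centres[b]), a, b)
                         for a in range(len(clusters))
                         for b in range(a + 1, len(clusters)))
        if dist > alpha:
            break
        clusters[a].extend(clusters.pop(b))
        centres.pop(b)
        centres[a] = centre(clusters[a], windows)
    return clusters
```

The function returns a list of clusters, each a list of window indices. Unlike k-means, the number of clusters is not fixed in advance; it follows from the thresholds d and α.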

The multivariate state matrix generation module aligns the state strings of all variables in time. Suppose there are n variables and every variable has the same start and end observation times; then they necessarily have the same number of windows and state strings of the same length. If the state string length is m, an n×m multivariate state matrix is generated.
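As a small illustration, the state matrix can be built by stacking the state strings row by row. The conversion into transactions is an assumption: the text does not spell out how the matrix is fed to Apriori, so this sketch treats each window (column) as one transaction of `variable=state` items and leaves out noise windows marked '?'. The variable names `P1` and `P2` are hypothetical.

```python
def build_state_matrix(state_strings):
    """Stack equal-length state strings into an n x m matrix (one row per variable)."""
    lengths = {len(s) for s in state_strings.values()}
    if len(lengths) != 1:
        raise ValueError("all state strings must have the same length")
    names = list(state_strings)
    matrix = [list(state_strings[name]) for name in names]
    return names, matrix

def to_transactions(names, matrix):
    """One transaction per window (column); '?' entries are skipped (assumption)."""
    return [
        frozenset(f"{name}={row[j]}" for name, row in zip(names, matrix) if row[j] != "?")
        for j in range(len(matrix[0]))
    ]

names, matrix = build_state_matrix({"P1": "aab", "P2": "ab?"})
transactions = to_transactions(names, matrix)
```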

The Apriori state association mining module uses the Apriori algorithm to mine frequent itemsets and association rules from the multivariate state matrix. Frequent itemsets are mined as follows. First, the transaction records are scanned to find all frequent 1-itemsets; this set is denoted $L_1$. Then $L_1$ is used to find the frequent 2-itemsets $L_2$, and so on, until no further frequent k-itemsets can be found. Each iteration has two steps: first, candidate itemsets are generated by a join step and a prune step; second, the support of each candidate is computed, and candidates whose support is greater than the minimum support threshold 0.001 are regarded as frequent. Association rules are then mined from the frequent itemsets as follows. First, for each frequent itemset L, generate all non-empty subsets of L. Second, for each non-empty subset s of L, generate a candidate rule "s→(L-s)", where (L-s) denotes what remains of L after removing s. Finally, if the confidence of the candidate rule is greater than the given threshold 0.5, output the rule; otherwise discard it. The confidence of a rule is computed by the following formula:

$$Cf(L, s) = \frac{Sp(L)}{Sp(s)}$$

where Cf(L, s) is the confidence of the rule "s→(L-s)", Sp(L) is the support of L, and Sp(s) is the support of s.
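The frequent itemset and rule mining described above can be sketched in a few lines. This is a compact, unoptimized illustration. The representation of transactions as frozensets of `variable=state` strings is an assumption, and only proper non-empty subsets s are used so that the consequent L − s is never empty.

```python
from itertools import combinations

def apriori(transactions, min_support=0.001):
    """Return {itemset: support} for all itemsets with support > min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([i]): support(frozenset([i])) for i in items}
    level = {s: sp for s, sp in level.items() if sp > min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: support(c) for c in candidates}
        level = {c: sp for c, sp in level.items() if sp > min_support}
        frequent.update(level)
        k += 1
    return frequent

def association_rules(frequent, min_confidence=0.5):
    """Return (antecedent, consequent, confidence) with Cf = Sp(L) / Sp(s) > threshold."""
    rules = []
    for itemset, sp_l in frequent.items():
        for r in range(1, len(itemset)):
            for s in map(frozenset, combinations(itemset, r)):
                cf = sp_l / frequent[s]  # s is frequent because its superset is
                if cf > min_confidence:
                    rules.append((s, itemset - s, cf))
    return rules
```

For example, with four transactions in which `B=a` occurs twice and always together with `A=a`, the rule `B=a → A=a` has confidence 0.5 / 0.5 = 1.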

The association rule reduction module merges or deletes redundant rules. The reduction steps are:

1) Sort the association rules by confidence in descending order;

2) For each frequent K-itemset (K > 1), keep only the association rule with the highest confidence;

3) If two association rules have the same antecedent, compare their consequents; if one consequent contains the other and the confidences differ only slightly, delete the rule with the contained consequent;

4) If two association rules have the same consequent, compare their antecedents; if one antecedent contains the other and the confidences differ only slightly, delete the rule with more antecedent items and keep the rule with fewer;

5) To keep the knowledge consistent and avoid circular reasoning, the association rules must be checked for cycles. Cycle detection is implemented with a directed acyclic graph: one node represents the antecedent, another node represents the consequent, and a directed edge connects them. The rules, arranged in descending order of confidence, are checked one by one.

Compared with the prior art, the present invention first detects the states of each variable: a dynamic partition clustering method clusters the feature vectors of all windows, so that windows in the same cluster have similar features and represent the same state, while windows in different clusters represent different states. The state strings of all variables are aligned in time to obtain a multivariate state matrix. The Apriori algorithm then mines the frequent co-occurrence relationships between the state values of different variables, which yields the associations between different values of multiple variables together with a formal expression and an association strength. Finally, the generated association rules are reduced to remove redundant information. The invention takes into account the fuzziness or uncertainty of variable values, is robust to noise, and is suited to fine-grained analysis of the associations between the state values of a small set of parameters, yielding the mapping between state values.

Description of drawings

Fig. 1 is a block diagram of the modules of the system of the present invention.

Fig. 2 is a flowchart of the dynamic partition clustering module of the present invention.

Table 1 lists the state strings of the example time series variables of the present invention.

Table 2 lists the state association rules mined for the example time series variables of the present invention.

Fig. 3 is a schematic diagram of some of the state association rules of the example time series variables of the present invention.

Fig. 4 is a schematic diagram of some of the state association rules of the example time series variables of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings.

Referring to Fig. 1, the system that implements the present invention comprises a data preprocessing module 1-1, a feature extraction module 1-2, a dynamic partition clustering module 1-3, a multivariate state matrix generation module 1-4, an Apriori state association mining module 1-5, and an association rule reduction module 1-6. The specific steps of the method are:

1) First, the data preprocessing module 1-1 applies outlier removal, equal-interval interpolation, and normalization to the original time series data to obtain data in a usable form;

Outlier removal proceeds as follows: compute the mean and standard deviation of each window, and check whether the difference between each data point and the mean of its observation window exceeds 5 times the standard deviation of that window; if it does, the data point is an outlier and is removed. Equal-interval interpolation is then applied to the outlier-free time series. Let the interval be Δt and the start time be T; the set of times after interpolation is {T+n·Δt | n=0,1,2,3,…}. The value at time T+i·Δt is the value at the nearest original time earlier than T+i·Δt, that is, the observation at the time immediately preceding the first original time greater than T+i·Δt. The interpolated data is then linearly normalized: the time series is scanned once to obtain the maximum (max) and minimum (min) of the observations, and the normalized value of each observation is computed by the following formula, which maps the value range of the original time series onto the interval [0,1]:

$$x_i' = \frac{x_i - \min}{\Delta}$$

where $x_i$ is the value of the i-th observation, $x_i'$ is its normalized value, and Δ = max − min.

2) Second, the feature extraction module 1-2 divides the usable data of each time series variable into windows of equal length and extracts features from each window, including Fourier features, statistical features, and wavelet features, which together form a feature vector;

First, the univariate data is cut into windows of the configured length; second, features are extracted from the data in each window, including statistical features, Fourier features, and wavelet features. v1 = [mean, variance] holds the statistical features, where the mean reflects the average level of the data in a window and the variance describes how much the data in the window fluctuates. v2 = [Fourier coefficient 1, Fourier frequency 1, Fourier coefficient 2, Fourier frequency 2] holds the frequency-domain features: a Fourier transform yields a series of Fourier coefficients, which are sorted by absolute value in descending order, and the two largest coefficients and their corresponding frequencies are selected. v3 = [wavelet detail coefficient 1, …, wavelet detail coefficient n] holds the time-frequency features: a discrete wavelet transform is applied to each window to obtain n wavelet detail coefficients. These three groups of features are combined to form the window feature vector v = [v1, v2, v3].

3) Then, the dynamic partition clustering module 1-3 performs dynamic partition clustering on the feature vectors of all windows of a single variable. The resulting clusters are sorted by size: the largest cluster is denoted by the character 'a', the second largest by 'b', and so on. Clusters smaller than the given threshold of 2 are regarded as noise and denoted by '?'. Each window is represented by the character of its cluster, so the original numerical data is converted into a string, which is the state string of the variable;

First, step 2-1 is executed: the first window forms a cluster by itself, and the cluster center is the combined feature vector of that window. Step 2-2 takes the next data item. Step 2-3 computes the distance between that item and all cluster centers. Step 2-4 selects the nearest cluster c; the distance between them is denoted dist:

$$\cos(v_{ik}, v_{jk}) = \frac{v_{ik} \cdot v_{jk}}{|v_{ik}| \times |v_{jk}|}$$

where $\cos(v_{ik}, v_{jk})$ is the cosine similarity between the $v_k$ (k = 1, 2, 3) vector of window i and the $v_k$ vector of window j, with $\cos(v_{ik}, v_{jk}) \in [-1, +1]$. The distance between two windows is computed by the following formulas:

$$d(v_{ik}, v_{jk}) = 1 - \frac{\cos(v_{ik}, v_{jk}) + 1}{2}$$

$$dist(v_i, v_j) = \frac{\sum_{k=1}^{3} d(v_{ik}, v_{jk})}{3}$$

where $dist(v_i, v_j)$ is the distance between windows i and j, with $dist(v_i, v_j) \in [0, 1]$.

执行步骤2-5,判断dist和给定阈值d,d=0.2)的关系,若dist≤d,执行步骤2-6,将该数据并入簇c中并且立即更换簇心;Execute step 2-5, judge the relationship between dist and a given threshold d, d=0.2), if dist≤d, execute step 2-6, merge the data into cluster c and replace the cluster center immediately;

cvcv kk == &Sigma;&Sigma; ii == 11 nno vv ii kk nno ,, (( kk == 11 ,, 22 ,, 33 ))

式中:cvk表示簇心c的第k个特征向量,它等于簇内所有窗口第k个特征向量的均值,一个簇的簇心c就是cv1,cv2,cv3的组合,即簇内所有窗口综合特征向量的平均值;In the formula: cv k represents the kth eigenvector of the cluster center c, which is equal to the mean value of the kth eigenvector of all windows in the cluster, and the cluster center c of a cluster is the combination of cv 1 , cv 2 , and cv 3 , that is, the cluster The average value of the comprehensive eigenvectors of all windows in the window;

If dist > d, step 2-7 is executed: the data item forms a cluster of its own and serves as that cluster's center. Step 2-8 checks whether all data have been processed. If not, step 2-2 takes the next data item; otherwise step 2-9 takes the first data item.

Step 2-10 computes the distance between this data item and the center of every cluster. Step 2-11 selects the nearest cluster c and records the distance to it as dist. Step 2-12 compares dist with the given threshold d. If dist ≤ d, step 2-13 checks whether the data item is already in cluster c: if it is not, step 2-14 moves it into cluster c; otherwise step 2-15 leaves it unchanged. If dist > d, step 2-16 makes the data item a cluster of its own. Step 2-17 checks whether all data have been processed. If not, step 2-18 takes the next data item; otherwise step 2-19 computes the centers of all clusters. Step 2-20 checks whether any cluster center has changed, that is, whether the clustering result has changed. If so, execution returns to step 2-9; otherwise step 2-21 selects the two closest clusters.

Step 2-22 compares the distance dist between these two cluster centers with the given threshold α (α = 0.3). If dist < α, the two clusters are merged, the center of the merged cluster is computed, and step 2-21 is executed again; if dist ≥ α, the clustering process ends.

In the clustering result, windows in the same cluster have similar features and are regarded as one state, while windows in different clusters represent different states. All clusters are sorted by size: the largest is denoted by the character 'a', the second largest by 'b', and so on; clusters whose size is below the given threshold of 2 are treated as noise and denoted by '?'. Each window is then represented by the character of its cluster, so that the original numerical data are converted into a string, namely the state string of each variable.
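The three phases described above (initial partition, adjustment, merging) can be sketched in plain Python as follows. This is an illustrative sketch, not the patent's implementation: the function names, the representation of a window as a tuple of three sub-vectors, the `max_passes` safety cap and the handling of zero vectors are all assumptions. The thresholds d = 0.2 and α = 0.3 are the values given in the text; the comparison `dist <= d` follows the description, while the claims word the initial partition slightly differently.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # assumption: zero vectors count as orthogonal

def window_dist(w1, w2):
    """Mean over the sub-vectors of 1 - (cos + 1) / 2; nominally in [0, 1]."""
    return sum(1 - (cosine(a, b) + 1) / 2 for a, b in zip(w1, w2)) / len(w1)

def center(members):
    """Component-wise mean of each sub-vector over the windows of one cluster."""
    n = len(members)
    return [[sum(w[k][j] for w in members) / n for j in range(len(members[0][k]))]
            for k in range(len(members[0]))]

def nearest(w, centers):
    """Return (cluster id, distance) of the center closest to window w."""
    cid = min(centers, key=lambda c: window_dist(w, centers[c]))
    return cid, window_dist(w, centers[cid])

def members_of(windows, labels, cid):
    return [w for w, l in zip(windows, labels) if l == cid]

def dynamic_partition_cluster(windows, d=0.2, alpha=0.3, max_passes=100):
    """Return one cluster id per window (hypothetical helper)."""
    # Phase 1: initial partition; the first window starts cluster 0.
    labels, centers, next_id = [0], {0: center([windows[0]])}, 1
    for w in windows[1:]:
        cid, dist = nearest(w, centers)
        if dist <= d:
            labels.append(cid)
            centers[cid] = center(members_of(windows, labels, cid))  # update at once
        else:
            labels.append(next_id)
            centers[next_id] = center([w])
            next_id += 1
    # Phase 2: adjustment; repeat until no cluster center changes.
    for _ in range(max_passes):  # max_passes is a safety cap, not in the source
        for i, w in enumerate(windows):
            cid, dist = nearest(w, centers)
            if dist <= d:
                labels[i] = cid                      # move (or stay) in nearest cluster
            else:
                labels[i], next_id = next_id, next_id + 1  # split off on its own
        new_centers = {c: center(members_of(windows, labels, c)) for c in set(labels)}
        if new_centers == centers:
            break
        centers = new_centers
    # Phase 3: merge the two closest clusters while their distance is below alpha.
    while len(centers) > 1:
        dist, a, b = min((window_dist(centers[p], centers[q]), p, q)
                         for p in centers for q in centers if p < q)
        if dist >= alpha:
            break
        labels = [a if l == b else l for l in labels]
        del centers[b]
        centers[a] = center(members_of(windows, labels, a))
    return labels
```

The returned ids are arbitrary integers; only which windows share an id matters for the later state labeling.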

Table 1 shows the state strings mined by the dynamic partition clustering module for all example variables. For each variable, every cluster is treated as one state and labeled as described above ('a' for the largest cluster, 'b' for the second largest, and so on, with '?' for noise clusters below the size threshold), and each window is replaced by the character of its cluster to give the variable's state string.
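The labeling rule can be illustrated with a short Python sketch. It assumes cluster ids are already available as one integer per window; the function name `state_string`, the tie-breaking by first appearance, and the behavior beyond 26 non-noise clusters are assumptions not covered by the text. The noise threshold of 2 is the value stated in the description.

```python
from collections import Counter

def state_string(labels, min_size=2):
    """Map per-window cluster ids to a state string.

    Clusters are ranked by size ('a' = largest); clusters with fewer than
    min_size windows are treated as noise and written as '?'.
    """
    sizes = Counter(labels)
    first_seen = {}
    for i, l in enumerate(labels):
        first_seen.setdefault(l, i)
    # Assumption: ties in size are broken by which cluster appears first.
    ranked = sorted(sizes, key=lambda c: (-sizes[c], first_seen[c]))
    symbol, rank = {}, 0
    for c in ranked:
        if sizes[c] < min_size:
            symbol[c] = "?"
        else:
            # Assumption: no more than 26 non-noise clusters are expected.
            symbol[c] = chr(ord("a") + rank)
            rank += 1
    return "".join(symbol[l] for l in labels)
```

For example, the ids `[7, 7, 3, 7, 3, 9]` contain a cluster of three windows, one of two, and a singleton, so the sketch yields `"aabab?"`.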

Table 1

4) The multivariate state matrix generation module 1-4 aligns the state strings of all variables in time to form a state matrix;

The multivariate state matrix generation module 1-4 aligns the state strings of all variables in time. Suppose there are n variables whose observations start and end at the same times; they then have the same number of windows and hence state strings of the same length. If that length is m, an n×m multivariate state matrix is generated.
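A minimal Python sketch of this alignment step is given below. The function names are hypothetical, and treating each column of the matrix (one time window across all variables) as one transaction with items of the form `variable=state` is an assumption, chosen because it matches the rule notation used later in the text (for example "P002=b→P004=b").

```python
def build_state_matrix(state_strings):
    """Stack equal-length state strings into an n x m matrix (list of rows).

    state_strings maps a variable name to its state string.
    """
    names = list(state_strings)
    lengths = {len(s) for s in state_strings.values()}
    if len(lengths) > 1:
        raise ValueError("all state strings must have the same length m")
    matrix = [list(state_strings[name]) for name in names]
    return names, matrix

def column_transactions(names, matrix):
    """Turn each time window (matrix column) into a set of 'variable=state' items."""
    m = len(matrix[0]) if matrix else 0
    return [frozenset(f"{name}={row[t]}" for name, row in zip(names, matrix))
            for t in range(m)]
```

The list returned by `column_transactions` has one entry per window and can be fed directly to a frequent itemset miner.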

5) The Apriori state association mining module 1-5 applies the Apriori algorithm to the multivariate state matrix to mine frequent itemsets and association rules;

The Apriori state association mining module applies the Apriori algorithm to the multivariate state matrix to mine frequent itemsets and association rules. Frequent itemsets are mined as follows. First, the transaction records are scanned to find all frequent 1-itemsets, denoted L_1; L_1 is then used to find the frequent 2-itemsets L_2, and so on, until no further frequent k-itemsets can be found. Each iteration has two steps: first, candidate itemsets are generated by a join step and a prune step; second, the support of each candidate is computed, and candidates whose support exceeds the minimum support threshold of 0.001 are considered frequent. Association rules are then mined from the frequent itemsets as follows. First, for each frequent itemset L, all non-empty subsets of L are generated. Second, for each non-empty subset s of L, a candidate rule "s→(L-s)" is generated, where (L-s) denotes what remains of L after s is removed. Finally, if the confidence of the candidate rule is greater than the given threshold of 0.5, the rule is output; otherwise it is discarded. The confidence of a rule is computed as:

$$Cf(L, s) = \frac{Sp(L)}{Sp(s)}$$

where Cf(L,s) is the confidence of the rule "s→(L-s)", Sp(L) is the support of L, and Sp(s) is the support of s.
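The level-wise search and the confidence formula can be sketched in plain Python as below. This is a compact illustration rather than an optimized Apriori implementation: the function names are hypothetical, support is assumed to be the fraction of transactions containing an itemset, and only proper non-empty subsets s are used as antecedents so that the consequent L-s is never empty. The thresholds 0.001 and 0.5 are those given in the text.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def frequent_itemsets(transactions, min_support=0.001):
    """Return {itemset: support} for all frequent itemsets, level by level."""
    singles = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in singles if support(s, transactions) > min_support}
    freq, k = {}, 1
    while level:
        for s in level:
            freq[s] = support(s, transactions)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k - 1))}
        level = {c for c in candidates if support(c, transactions) > min_support}
    return freq

def association_rules(freq, min_conf=0.5):
    """Return (antecedent, consequent, confidence) with Cf = Sp(L) / Sp(s)."""
    rules = []
    for L, sp_L in freq.items():
        for r in range(1, len(L)):
            for s in map(frozenset, combinations(L, r)):
                conf = sp_L / freq[s]  # s is frequent because L is
                if conf > min_conf:
                    rules.append((s, L - s, conf))
    return rules
```

Recomputing support by rescanning all transactions keeps the sketch short; a production implementation would count candidates in a single pass per level.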

6) Finally, the association rule reduction module 1-6 reduces the mined association rules, eliminating redundant information, to obtain the final multivariate state association rules. The reduction steps are as follows:

1) Sort the obtained association rules by confidence in descending order;

2) For each K-order frequent itemset (K>1), keep only the association rule with the highest confidence. For example, if the second-order frequent itemset (A, B) yields the rules A→B and B→A, only the one with the higher confidence is kept;

3) If two association rules have the same antecedent, compare their consequents. If one consequent contains the other and the confidences differ only slightly, delete the rule whose consequent is contained. For example, given A→B and A→B,C, if |Cf(A→B)-Cf(A→B,C)|<δ, delete A→B and keep A→B,C;

4) If two association rules have the same consequent, compare their antecedents. If one antecedent contains the other and the confidences differ only slightly, delete the rule with the larger antecedent and keep the rule with the smaller one. For example, given A→B and A,C→B, if |Cf(A→B)-Cf(A,C→B)|<δ, delete A,C→B and keep A→B;

5) To keep the knowledge consistent and avoid circular reasoning, the association rules must be checked for cycles. Cycle detection uses a directed acyclic graph in which one node represents the antecedent, another node represents the consequent, and a directed edge connects the two. The rules, sorted by descending confidence, are examined one by one. Suppose A→B is under consideration. First check whether A can be reached from B in the graph, that is, whether A is among the successors of B. If it is, so that the directed graph already contains B→A, then A→B is deleted. If A is not among the successors of B, A→B is added to the graph.
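The five reduction steps can be sketched in Python as follows. The sketch assumes rules are given as (antecedent, consequent, confidence) triples whose two sides are frozensets; the function names are hypothetical, and the value of δ is not given in the text, so the default `delta=0.05` is purely illustrative. "Successors" in step 5 is interpreted as all nodes reachable along directed edges.

```python
def reachable(graph, src, dst):
    """Depth-first search: is dst reachable from src along directed edges?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False

def reduce_rules(rules, delta=0.05):
    """Apply reduction steps 1) to 5) to (antecedent, consequent, confidence) rules."""
    # Step 1: sort by confidence, highest first.
    rules = sorted(rules, key=lambda r: -r[2])
    # Step 2: per frequent itemset (antecedent | consequent), keep the best rule.
    best = {}
    for a, c, cf in rules:
        best.setdefault(a | c, (a, c, cf))
    rules = list(best.values())  # dict order keeps the descending-confidence order
    # Steps 3 and 4: drop rules made redundant by a containing or simpler rule.
    drop = set()
    for i, (a1, c1, f1) in enumerate(rules):
        for j, (a2, c2, f2) in enumerate(rules):
            if i == j or abs(f1 - f2) >= delta:
                continue
            if a1 == a2 and c1 < c2:   # step 3: consequent c1 is contained in c2
                drop.add(i)
            if c1 == c2 and a1 > a2:   # step 4: antecedent a1 has extra items
                drop.add(i)
    rules = [r for k, r in enumerate(rules) if k not in drop]
    # Step 5: add rules to a directed graph, skipping any that would close a cycle.
    graph, kept = {}, []
    for a, c, cf in rules:
        if reachable(graph, c, a):
            continue
        graph.setdefault(a, set()).add(c)
        kept.append((a, c, cf))
    return kept
```

Because rules enter the graph in descending order of confidence, a cycle is always broken by discarding its lowest-confidence rule among those seen so far.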

Table 2 shows the variable state association rules after reduction.

Table 2

Figure 3 illustrates the association rule "P002=b→P004=b". The dashed line (upper part) shows the 'b' state of variable P002. The gaps in the line do not mean that data are missing; they are windows in which the variable is in some state other than 'b'. The solid line (lower part) shows the 'b' state of variable P004. The confidence of this rule is 47/50=0.940: among the 146 records, variables P002 and P004 are both in state 'b' 47 times, while variable P002 by itself is in state 'b' 50 times.
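As a quick check of these figures, and assuming that support is measured as the fraction of the 146 records containing an itemset (an interpretation, since the text quotes only counts), the confidence formula reduces to a ratio of counts:

$$Cf = \frac{Sp(\{P002{=}b,\ P004{=}b\})}{Sp(\{P002{=}b\})} = \frac{47/146}{50/146} = \frac{47}{50} = 0.94$$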

Figure 4 illustrates the association rule "P073=b→P075=b". The dashed line (upper half) shows the 'b' state of variable P075 and the solid line (lower half) shows the 'b' state of variable P073. The 'b' states of P073 are visibly not all alike: some have a spike at the top, others a flat top. This is because, when the states of a variable are mined, all window feature vectors of that variable are clustered; the feature vectors within one cluster are similar but not necessarily identical, and the windows of one cluster represent one state. The data in the windows belonging to one state therefore have similar, but not identical, features and shapes. The confidence of this rule is 44/44=1.0: among the 146 records, variables P073 and P075 are both in state 'b' 44 times, and variable P073 by itself is also in state 'b' 44 times. In other words, whenever P073 is in state 'b', P075 is in state 'b' as well.

Claims (7)

1. A method for mining time series data state correlation, characterized in that the system implementing the method comprises a data preprocessing module (1-1), a feature extraction module (1-2), a dynamic partition clustering module (1-3), a multivariate state matrix generation module (1-4), an Apriori state association mining module (1-5) and an association rule reduction module (1-6), the specific steps being:

1) first, the data preprocessing module (1-1) performs outlier removal, equal-interval interpolation and normalization on the original time series data to obtain valid data;

2) second, the feature extraction module (1-2) divides the valid data of each time series variable into windows of equal length and extracts features from the data of each window, including Fourier features, statistical features and wavelet features, which form a feature vector;

3) then, the dynamic partition clustering module (1-3) performs dynamic partition clustering on the feature vectors of all windows of a single variable; the resulting clusters are sorted by size, the largest cluster being denoted by the character 'a', the second largest by 'b', and so on, while clusters whose size is below the given threshold of 2 are treated as noise and denoted by '?'; each window is represented by the character of its cluster, so that the original numerical data are converted into a string, namely the state string of the variable;

4) the multivariate state matrix generation module (1-4) aligns the state strings of all variables in time to form a state matrix;

5) the Apriori state association mining module (1-5) applies the Apriori algorithm to the multivariate state matrix to mine frequent itemsets and association rules;

6) finally, the association rule reduction module (1-6) reduces the mined association rules, eliminating redundant information, to obtain the final multivariate state association rules.

2. The method for mining time series data state correlation according to claim 1, characterized in that the outlier removal performed by the data preprocessing module (1-1) comprises: computing the mean and standard deviation of each window, and determining whether the difference between each data point and the mean of its observation window is greater than 5 times the standard deviation of that window; if so, the data point is an outlier and is removed. Equal-interval interpolation is then applied to the time series after outlier removal: with interval △t and start time T, the set of times after interpolation is {T+n*△t | n=0, 1, 2, 3, …}, and the value at time T+i*△t is the value of the original sequence at the nearest time earlier than T+i*△t, that is, the observation at the time immediately preceding the first time in the original sequence that is greater than T+i*△t. The interpolated data are then linearly normalized: the time series is first scanned once to obtain the maximum (max) and minimum (min) of the observations, and the normalized value of each observation is computed by the following formula, mapping the range of the original time series onto the interval [0,1];

$$x_i = \frac{x_i - min}{\Delta}$$

where x_i denotes the value of the i-th observation and △=max-min.

3. The method for mining time series data state correlation according to claim 1, characterized in that the feature extraction module (1-2): first, cuts the single-variable data with the set window; second, extracts features from the data in each window, including statistical features, Fourier features and wavelet features. v1=[mean, variance], where v1 denotes the statistical features: the mean reflects the average level of the data in a window, and the variance describes how much the data in the window fluctuate. v2=[Fourier coefficient 1, Fourier frequency 1, Fourier coefficient 2, Fourier frequency 2], where v2 denotes the frequency-domain features: a Fourier transform yields a series of Fourier coefficients, which are sorted by absolute value in descending order, and the two largest coefficients and their corresponding frequencies are selected. v3=[wavelet detail coefficient 1, …, wavelet detail coefficient n], where v3 denotes the time-frequency features: a discrete wavelet transform of each window yields n wavelet detail coefficients. These three groups of features are combined to form the window feature vector v, v=[v1, v2, v3].

4. The method for mining time series data state correlation according to claim 1, characterized in that the dynamic partition clustering module (1-3) clusters the feature vectors of all windows of a single variable by a dynamic partition clustering method, as follows:

1) the first window forms a cluster of its own, whose center is the composite feature vector of that window;

2) initial partition of clusters: the similarity between the 2nd window and the center of the 1st cluster is computed by the following formula:

$$\cos(v_{ik}, v_{jk}) = \frac{v_{ik} \cdot v_{jk}}{|v_{ik}| \times |v_{jk}|}$$

where cos(v_ik, v_jk) denotes the cosine similarity between the v_k (k=1, 2, 3) vector of window i and the v_k vector of window j, with cos(v_ik, v_jk)∈[-1,+1]; the distance between two windows is then computed by the following formulas;

$$d(v_{ik}, v_{jk}) = 1 - \frac{\cos(v_{ik}, v_{jk}) + 1}{2}$$

$$dist(v_i, v_j) = \frac{\sum_{k=1}^{3} d(v_{ik}, v_{jk})}{3}$$

where dist(v_i, v_j) denotes the distance between windows i and j, with dist(v_i, v_j)∈[0,1];

if dist<d (d=0.2), window 2 is merged into the first cluster and the cluster center is updated immediately by the following formula:

$$cv_k = \frac{\sum_{i=1}^{n} v_{ik}}{n}, \quad (k = 1, 2, 3)$$

where cv_k denotes the k-th feature vector of the cluster center c and equals the mean of the k-th feature vectors of all windows in the cluster; the cluster center c is the combination of cv_1, cv_2 and cv_3, that is, the mean of the composite feature vectors of all windows in the cluster; if dist≥d, window 2 forms a cluster of its own; the remaining windows are processed in turn: the distance between the i-th window and the centers of all clusters produced so far is computed, the nearest cluster c and its distance dist are selected, and if dist<d, window i is merged into cluster c, otherwise it forms a cluster of its own;

3) adjustment of clusters: take the i-th window (i=1, 2, …, m), compute its distance dist to the center of every cluster, and select the smallest dist and the corresponding cluster c; if dist≤d and window i is not in cluster c, window i is moved from its original cluster to cluster c; if dist≤d and window i is in cluster c, nothing is done to window i; if dist>d, window i is removed from its original cluster and forms a cluster of its own; this is repeated until all windows have been processed, and the centers of all clusters are then computed. If any cluster center has changed, the adjustment process, namely step 3), is repeated until no cluster center changes; if all cluster centers are unchanged, step 4) is executed;

4) merging of clusters: compute the distance between the centers of every pair of clusters and select the two closest clusters c_i, c_j and their distance dist; if dist≤α (α=0.3), merge clusters c_i and c_j, compute the center of the merged cluster, and repeat the merging process 4); if dist>α, no two clusters are close enough to be merged, so the merging process is exited and the clustering algorithm ends. In the clustering result, windows in the same cluster have similar features and are regarded as one state, while windows in different clusters represent different states; all clusters are sorted by size, the largest cluster being denoted by the character 'a', the second largest by 'b', and so on, while clusters smaller than a given threshold are treated as noise and denoted by '?'; each window is represented by the character of its cluster, so that the original numerical data are converted into a string, namely the state string of each variable.

5. The method for mining time series data state correlation according to claim 1, characterized in that the multivariate state matrix generation module (1-4) aligns the state strings of all variables in time; given n variables whose observations start and end at the same times, the variables have the same number of windows and hence state strings of the same length; if that length is m, an n*m multivariate state matrix is generated.

6. The method for mining time series data state correlation according to claim 1, characterized in that the Apriori state association mining module (1-5) applies the Apriori algorithm to the multivariate state matrix to mine frequent itemsets and association rules. Frequent itemsets are mined as follows: first, the transaction records are scanned to find all frequent 1-itemsets, denoted L_1; L_1 is then used to find the frequent 2-itemsets L_2, and so on, until no further frequent k-itemsets can be found; each iteration has two steps: first, candidate itemsets are generated by a join step and a prune step; second, the support of each candidate is computed, and candidates whose support exceeds the minimum support threshold of 0.001 are considered frequent. Association rules are mined from the frequent itemsets as follows: first, for each frequent itemset L, all non-empty subsets of L are generated; second, for each non-empty subset s of L, a candidate rule "s→(L-s)" is generated, where (L-s) denotes what remains of L after s is removed; finally, if the confidence of the candidate rule is greater than the given threshold of 0.5, the rule is output, otherwise it is discarded. The confidence of a rule is computed as:

$$Cf(L, s) = \frac{Sp(L)}{Sp(s)}$$

where Cf(L,s) is the confidence of the rule "s→(L-s)", Sp(L) is the support of L, and Sp(s) is the support of s.

7. The method for mining time series data state correlation according to claim 1, characterized in that the association rule reduction module (1-6) merges or deletes the redundant rules produced, the reduction steps being:

1) sort the obtained association rules by confidence in descending order;

2) for each K-order frequent itemset (K>1), keep only the association rule with the highest confidence;

3) if two association rules have the same antecedent, compare their consequents; if one consequent contains the other and the confidences differ only slightly, delete the rule whose consequent is contained;

4) if two association rules have the same consequent, compare their antecedents; if one antecedent contains the other and the confidences differ only slightly, delete the rule with the larger antecedent and keep the rule with the smaller one;

5) to keep the knowledge consistent and avoid circular reasoning, check the association rules for cycles; cycle detection uses a directed acyclic graph in which one node represents the antecedent, another node represents the consequent, and a directed edge connects the two, and the rules, sorted by descending confidence, are examined one by one.
CN201610814387.3A 2016-09-09 2016-09-09 Method for mining time series data state correlation Pending CN106384128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610814387.3A CN106384128A (en) 2016-09-09 2016-09-09 Method for mining time series data state correlation

Publications (1)

Publication Number Publication Date
CN106384128A true CN106384128A (en) 2017-02-08

Family

ID=57936358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610814387.3A Pending CN106384128A (en) 2016-09-09 2016-09-09 Method for mining time series data state correlation

Country Status (1)

Country Link
CN (1) CN106384128A (en)

Similar Documents

Publication Publication Date Title
CN106384128A (en) Method for mining time series data state correlation
US11301759B2 (en) Detective method and system for activity-or-behavior model construction and automatic detection of the abnormal activities or behaviors of a subject system without requiring prior domain knowledge
CN103728551B (en) Analog circuit fault diagnosis method based on cascade integrated classifier
CN108241873B (en) Intelligent fault diagnosis method for main equipment of pumping stations
CN105205112A (en) System and method for excavating abnormal features of time series data
CN106909664A (en) Power equipment data stream fault identification method
CN112380274B (en) Abnormality detection method for control process
CN109787958B (en) Network flow real-time detection method, detection terminal and computer readable storage medium
Zhang et al. A new time series representation model and corresponding similarity measure for fast and accurate similarity detection
CN105447464A (en) Electric energy quality disturbance recognition and classification method based on PSO
Badapanda et al. Agriculture data visualization and analysis using data mining techniques: application of unsupervised machine learning
Zhang et al. Fault diagnosis on train brake system based on multi-dimensional feature fusion and GBDT enhanced classification
CN110458071A (en) A Feature Extraction and Classification Method of Optical Fiber Vibration Signal Based on DWT-DFPA-GBDT
Zan et al. Dynamic SAX parameter estimation for time series
Wang et al. An ensemble classification algorithm for text data stream based on feature selection and topic model
Malik et al. A comprehensive approach towards data preprocessing techniques & association rules
Bakhtazad et al. Process trend analysis using wavelet-based de-noising
Nancharaiah et al. Analysis of Visual Communication Effect of Color in Print Advertising Design Based on Decision Tree Classification Algorithm
Abdel-Galil et al. On-line disturbance recognition utilizing vector quantization based fast match
Mahajan Textual data quality at scale for high dimensionality data
Xin et al. Classification for multiple power quality disturbances based on deep forest
CN113900924B (en) Software defect prediction method and system based on TAN half-naive Bayesian network
Siswanto et al. Dimensionality reduction for association rule mining with IST-EFP algorithm
Yang et al. ChatAN: Interval Adaptive Normalization based on ChatGPT Knowledge Augmentation
Sha et al. Mining association rules from dataset containing predetermined decision itemset and rare transactions

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170208