CN105205113A - System and method for excavating abnormal change process of time series data - Google Patents

Info

Publication number
CN105205113A
Authority
CN
China
Prior art keywords
data
window
feature
bunch
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510551876.XA
Other languages
Chinese (zh)
Inventor
鲍军鹏
杨天社
胡绍林
齐勇
高宇
李肖瑛
张海龙
杨冬毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
China Xian Satellite Control Center
Original Assignee
Xian Jiaotong University
China Xian Satellite Control Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University, China Xian Satellite Control Center filed Critical Xian Jiaotong University
Priority to CN201510551876.XA priority Critical patent/CN105205113A/en
Publication of CN105205113A publication Critical patent/CN105205113A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Abstract

The invention discloses a system and a method for mining the abnormal change process of time series data. The system comprises a data preprocessing module, an integrated feature vector extraction module, an SDMC (Similar Density Merge Clustering) module, a feature string generation module and an abnormal change process learning module. The system and method mine, from massive time series data, the change process from normal behavior to abnormal deviation and then to obvious failure, and analyze how the features change during that process. The time series data are abstracted into feature strings, frequent words are mined with a statistical learning method, and consecutive frequent words form frequent patterns. The frequent patterns correspond to ordinary normal processes; a gap between adjacent frequent patterns is an abnormal change process, and the feature string of the gap expresses the features of that process. The system and method can be used to mine and discover abnormal change and failure development processes of a real-time system, help analyze the causes of system failures and improve fault diagnosis efficiency, and are significant for the whole-life health management of complex systems.

Description

A system and method for mining the abnormal change process of time series data
[Technical field]
The invention belongs to the fields of intelligent information processing and computer technology, and specifically relates to a system and method for mining the abnormal change process of time series data.
[Background]
Knowledge of the abnormal change process of a time series plays an important role in understanding the regularities of the time series, analyzing how faults evolve and what causes them, mining fault knowledge, gaining a deeper understanding of time-sequenced systems, predicting system health, and providing early warning of incipient faults.
Changes in a time series usually follow an evolution process. Different abnormal changes evolve differently, each with its own characteristics. Mining the evolution process of abnormal changes and the way their features change means, first, mining from massive abnormal data the process by which the time series state goes from normal to deviating to abnormal, and from mild abnormality to severe abnormality or failure; and then analyzing how the different features change during these evolution processes.
[Summary of the invention]
The object of the invention is to provide a system and method for mining the abnormal change process of time series data which, through data preprocessing, integrated feature vector extraction, SDMC clustering, feature string generation and abnormal change process learning, can mine the change process from normal to abnormal out of massive time series data.
To achieve this object, the invention adopts the following technical scheme:
A system for mining the abnormal change process of time series data comprises a data preprocessing module, an integrated feature vector extraction module, an SDMC clustering module, a feature string generation module and an abnormal change process learning module;
The data preprocessing module cleans and interpolates the original time series data to obtain normalized data;
The integrated feature vector extraction module automatically analyzes the normalized data to obtain the minimum complete period of the data; for periodic data it takes the minimum complete period as one observation window, and then extracts the mean, variance, wavelet features and Fourier features within the window to form an integrated feature vector;
The SDMC clustering module clusters the integrated feature vectors and then merges clusters in the clustering result;
The feature string generation module converts the data into the corresponding feature string according to the clustering result;
The abnormal change process learning module divides the feature string into a sequence of words, classifies the words as frequent or non-frequent according to their frequency, and then obtains the non-frequent patterns as the gaps between frequent patterns; the transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is the abnormal change process.
In a further improvement of the invention, the data preprocessing module performs outlier removal, single-parameter file generation, equal-interval processing and normalization. Outlier removal sets an upper and a lower bound for each data item; values greater than the upper bound are set to the upper bound and values less than the lower bound are set to the lower bound. Equal-interval processing samples the data once every second by default, so that in the processed data every minute starts at second 0 and ends at second 59. After equal-interval processing the data are normalized so that their range is mapped to the interval [0, 1].
In a further improvement of the invention, the integrated feature vector extraction module combines several kinds of features on an observation window into one feature vector, structured as [mean, variance, wavelet features, Fourier features]. The minimum complete period of the time series data is identified automatically as follows. First an initial observation window is set; sliding this window backward by a time Δt gives a new window, and so on, until N windows are obtained, each separated from the previous one by Δt. The parameter values inside each window form that window's vector. The inner products between the window vector at time t+0 and the window vectors at times {t+Δt, t+2Δt, ..., t+NΔt} are then computed, giving an inner product sequence. A Fourier transform is applied to the inner product sequence, the frequency corresponding to the largest Fourier coefficient is found, and the period of the data is computed by the following formula:
$$C = \frac{1}{f} = \frac{NT}{k}$$
where C is the period of the data, N is the number of windows, T is the sampling interval Δt, and k is the frequency index corresponding to the largest Fourier coefficient, so that f = k/(NT). The time series data are then divided into non-overlapping observation windows, and several types of features are extracted on each observation window to form the integrated feature vector. For periodic data, the minimum complete period of the data is used as the window size; for non-periodic data, a fixed value is specified as the window size. The window features comprise the mean, variance, wavelet features and Fourier features of the window, which together form the feature vector. The wavelet features are obtained by wavelet decomposition. The number of wavelet decomposition levels L is determined adaptively from the window size k and a threshold h, where h is the maximum desired length of the wavelet coefficients. L is initially 1; for a window of fixed size, if $k/2^L$ is less than h, the number of levels is L; otherwise L is increased by 1 and the test is repeated until $k/2^L$ is less than h. After L levels of wavelet decomposition the window data yield wavelet approximation coefficients and wavelet detail coefficients of equal length. The Fourier features consist of a fixed number of Fourier coefficients and their corresponding frequencies: the observation window yields a series of Fourier coefficients after the Fourier transform; ignoring the DC component, the n largest Fourier coefficients and their frequencies are selected as the Fourier features, with n = 2.
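The period identification described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `estimate_period` and its parameters are hypothetical, a naive DFT stands in for whatever Fourier transform the authors used, and the DC term (k = 0) is skipped when searching for the largest coefficient, which the text does not state for this step.

```python
import cmath
import math


def estimate_period(series, window, n_windows, step=1, dt=1.0):
    """Estimate the period of `series` as C = N*T/k.

    window    : length (in samples) of the initial observation window
    n_windows : N, the number of shifted windows (at least 2)
    step      : shift between consecutive windows, in samples
    dt        : duration of one sample, so T = step * dt
    """
    if n_windows < 2:
        raise ValueError("need at least two windows")
    if len(series) < window + n_windows * step:
        raise ValueError("series too short for the requested windows")
    base = series[:window]
    # Inner product of the window at t+0 with the windows at t+dt ... t+N*dt.
    inner = []
    for i in range(1, n_windows + 1):
        shifted = series[i * step:i * step + window]
        inner.append(sum(a * b for a, b in zip(base, shifted)))
    # Naive DFT of the inner-product sequence.
    # Assumption: the DC term (k = 0) is skipped.
    best_k, best_mag = 0, -1.0
    for k in range(1, n_windows // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n_windows)
                    for i, x in enumerate(inner))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n_windows * step * dt / best_k
```

For a normalized sine wave that repeats every 20 samples, calling `estimate_period(series, window=20, n_windows=100)` picks k = 5 and returns a period of about 20 samples.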
In a further improvement of the invention, the SDMC clustering module clusters the data using the integrated feature vectors of the observation windows. The clustering method comprises the following steps. The first integrated feature vector is taken as a cluster of its own and as that cluster's center. Each subsequent integrated feature vector is then taken in turn and its distance to every current cluster center is computed. If the distance is not greater than a given threshold, the vector is placed in the nearest cluster and that cluster's center is adjusted; if the distance is greater than the threshold, the vector forms a new cluster of its own and becomes its center. After all vectors have been processed in this way, all vectors are traversed again: each is taken in turn, its distance to every current cluster center is computed, and it is placed in the nearest cluster. Once all vectors have been processed, all cluster centers are adjusted. If any cluster center has changed, this traversal is repeated until the centers no longer change. When the centers no longer change, the pairwise distances between cluster centers are computed; if the distance between two cluster centers is less than the given threshold, the two clusters are merged. This is repeated until the distance between any two cluster centers is greater than the given threshold, at which point the SDMC clustering process ends.
In a further improvement of the invention, the feature string generation module uses the clustering result to find the cluster to which the feature vector of each observation window belongs, and represents the window by that cluster's feature character. A sequence of N observation windows is thus converted into a sequence of N feature characters; that is, the original time series data are converted into a feature string of length N.
In a further improvement of the invention, the abnormal change process learning module is first given the word size to be examined. It then divides the feature string into a sequence of words and counts the probability of occurrence of each word. A word whose probability is greater than a given probability threshold is a frequent word; otherwise it is a non-frequent word. Consecutive frequent words in the feature string form a frequent pattern, and the gap between adjacent frequent patterns is a non-frequent pattern. The transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is the abnormal change process, and the feature string corresponding to the non-frequent pattern is the feature of that abnormal change process.
A method for mining the abnormal change process of time series data comprises the following steps:
Step 1: the data preprocessing module cleans and interpolates the original time series data to obtain normalized data;
Step 2: the integrated feature vector extraction module automatically analyzes the normalized data to obtain the minimum complete period of the data; for periodic data it takes the minimum complete period as one observation window, and then extracts the mean, variance, wavelet features and Fourier features within the window to form an integrated feature vector;
Step 3: the SDMC clustering module clusters the integrated feature vectors and then merges clusters in the clustering result;
Step 4: the feature string generation module converts the data into the corresponding feature string according to the clustering result;
Step 5: the abnormal change process learning module divides the feature string into a sequence of words, classifies the words as frequent or non-frequent according to their frequency, and then obtains the non-frequent patterns as the gaps between frequent patterns; the transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is the abnormal change process.
In a further improvement of the invention, the mining method specifically comprises the following steps:
Step 1: the data preprocessing module performs outlier removal, single-parameter file generation, equal-interval processing and normalization on the original time series data. Outlier removal sets an upper and a lower bound for each data item; values greater than the upper bound are set to the upper bound and values less than the lower bound are set to the lower bound. Equal-interval processing samples the data once every second by default, so that in the processed data every minute starts at second 0 and ends at second 59. After equal-interval processing the data are normalized so that their range is mapped to the interval [0, 1];
Step 2: the integrated feature vector extraction module combines several kinds of features on an observation window into one feature vector, structured as [mean, variance, wavelet features, Fourier features]. The minimum complete period of the time series data is identified automatically as follows. First an initial observation window is set; sliding this window backward by a time Δt gives a new window, and so on, until N windows are obtained, each separated from the previous one by Δt. The parameter values inside each window form that window's vector. The inner products between the window vector at time t+0 and the window vectors at times {t+Δt, t+2Δt, ..., t+NΔt} are then computed, giving an inner product sequence. A Fourier transform is applied to the inner product sequence, the frequency corresponding to the largest Fourier coefficient is found, and the period of the data is computed by the following formula:
$$C = \frac{1}{f} = \frac{NT}{k}$$
where C is the period of the data, N is the number of windows, T is the sampling interval Δt, and k is the frequency index corresponding to the largest Fourier coefficient, so that f = k/(NT). The time series data are then divided into non-overlapping observation windows, and several types of features are extracted on each observation window to form the integrated feature vector. For periodic data, the minimum complete period of the data is used as the window size; for non-periodic data, a fixed value is specified as the window size. The window features comprise the mean, variance, wavelet features and Fourier features of the window, which together form the feature vector. The wavelet features are obtained by wavelet decomposition. The number of wavelet decomposition levels L is determined adaptively from the window size k and a threshold h, where h is the maximum desired length of the wavelet coefficients. L is initially 1; for a window of fixed size, if $k/2^L$ is less than h, the number of levels is L; otherwise L is increased by 1 and the test is repeated until $k/2^L$ is less than h. After L levels of wavelet decomposition the window data yield wavelet approximation coefficients and wavelet detail coefficients of equal length. The Fourier features consist of a fixed number of Fourier coefficients and their corresponding frequencies: the observation window yields a series of Fourier coefficients after the Fourier transform; ignoring the DC component, the n largest Fourier coefficients and their frequencies are selected as the Fourier features, with n = 2;
Step 3: the SDMC clustering module clusters the data using the integrated feature vectors of the observation windows. The clustering method comprises the following steps. The first integrated feature vector is taken as a cluster of its own and as that cluster's center. Each subsequent integrated feature vector is then taken in turn and its distance to every current cluster center is computed. If the distance is not greater than a given threshold, the vector is placed in the nearest cluster and that cluster's center is adjusted; if the distance is greater than the threshold, the vector forms a new cluster of its own and becomes its center. After all vectors have been processed in this way, all vectors are traversed again: each is taken in turn, its distance to every current cluster center is computed, and it is placed in the nearest cluster. Once all vectors have been processed, all cluster centers are adjusted. If any cluster center has changed, this traversal is repeated until the centers no longer change. When the centers no longer change, the pairwise distances between cluster centers are computed; if the distance between two cluster centers is less than the given threshold, the two clusters are merged. This is repeated until the distance between any two cluster centers is greater than the given threshold, at which point the SDMC clustering process ends;
Step 4: the feature string generation module uses the clustering result to find the cluster to which the feature vector of each observation window belongs, and represents the window by that cluster's feature character. A sequence of N observation windows is thus converted into a sequence of N feature characters; that is, the original time series data are converted into a feature string of length N;
Step 5: the abnormal change process learning module is first given the word size to be examined. It then divides the feature string into a sequence of words and counts the probability of occurrence of each word. A word whose probability is greater than a given probability threshold is a frequent word; otherwise it is a non-frequent word. Consecutive frequent words in the feature string form a frequent pattern, and the gap between adjacent frequent patterns is a non-frequent pattern. The transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is the abnormal change process, and the feature string corresponding to the non-frequent pattern is the feature of that abnormal change process.
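The adaptive choice of the wavelet decomposition depth in the second step above can be illustrated with a short Python sketch. The names `wavelet_levels` and `haar_decompose` are hypothetical, and the Haar wavelet is an assumption made only so the example is self-contained; the patent does not say which wavelet family is used.

```python
import math


def wavelet_levels(window_size, max_coeffs):
    """Smallest L >= 1 such that window_size / 2**L < max_coeffs (threshold h)."""
    if max_coeffs <= 0:
        raise ValueError("max_coeffs must be positive")
    level = 1
    while window_size / 2 ** level >= max_coeffs:
        level += 1
    return level


def haar_decompose(data, levels):
    """L-level Haar decomposition (assumed wavelet).

    Returns the approximation and detail coefficients of the last level,
    which have the same length. An odd trailing sample is dropped.
    """
    approx = list(data)
    detail = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, detail
```

With a window of 64 samples and h = 10, the loop tests 32, 16 and 8 and stops at L = 3, so the approximation and detail coefficients each have 8 entries.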
Compared with the prior art, the invention has the following beneficial effects: it combines multiple time series features and improves the clustering method, so that abnormal change processes in time series data are mined more stably; and it gives an abstract representation in the form of feature strings, which handles the uncertainty of time series data better.
[Brief description of the drawings]
Fig. 1 is a block diagram of the modules of the system of the invention.
Fig. 2 is a flowchart of the SDMC clustering module of the invention.
Fig. 3 is a flowchart of the abnormal change process learning module of the invention.
Fig. 4 is a data curve of an example parameter of the invention.
Fig. 5 shows the frequent patterns and non-frequent patterns obtained for the example parameter of the invention.
Fig. 6 is a plot of the abnormal change process mined for the example parameter of the invention.
[Detailed description of the embodiments]
A preferred embodiment of the method is described below.
Referring to Fig. 1, the system of the invention for mining the abnormal change process of time series data comprises a data preprocessing module 1-1, an integrated feature vector extraction module 1-2, an SDMC clustering module 1-3, a feature string generation module 1-4 and an abnormal change process learning module 1-5.
The data preprocessing module cleans and interpolates the original time series data to obtain normalized data.
The data preprocessing module performs outlier removal, single-parameter file generation (cleaning), equal-interval processing (interpolation) and normalization. To remove noise interference and obtain valid data values, the invention eliminates invalid outliers from the original time series data by outlier removal and retains the valid values. Specifically, an upper and a lower bound are set for each data item; values greater than the upper bound are set to the upper bound and values less than the lower bound are set to the lower bound. The invention extracts single-parameter features and does not consider relations among multiple parameters, so each actual parameter is written to its own data file. Equal-interval processing ensures that the time interval between any two adjacent data points within a continuous time period is the same. By default the equal-interval procedure samples the data once every second, so that in the processed data every minute starts at second 0 and ends at second 59. After equal-interval processing the data are normalized so that their range is mapped to the interval [0, 1], which eliminates the effect of dimension (units) on the results. Linear normalization is used; the maximum and minimum are obtained from statistics of the equal-interval data, or can be set manually.
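A minimal Python sketch of this preprocessing chain follows. The function names `clamp`, `resample_1s` and `normalize` are hypothetical. Linear interpolation between neighboring samples is an assumption: the text says only that equal-interval processing is an interpolation step, without naming the method.

```python
def clamp(values, lower, upper):
    """Outlier removal: pull values outside [lower, upper] back to the bound."""
    return [min(max(v, lower), upper) for v in values]


def resample_1s(samples, start, end):
    """Equal-interval processing onto whole seconds start..end (inclusive).

    samples: non-empty list of (time_in_seconds, value), sorted by time.
    Assumption: linear interpolation between neighboring samples; outside
    the sampled range the nearest sample value is held.
    """
    out = []
    j = 0
    for t in range(start, end + 1):
        while j + 1 < len(samples) and samples[j + 1][0] <= t:
            j += 1
        t0, v0 = samples[j]
        if t <= t0 or j + 1 == len(samples):
            out.append(v0)
        else:
            t1, v1 = samples[j + 1]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


def normalize(values, vmin=None, vmax=None):
    """Linear normalization to [0, 1]; bounds default to the data min/max."""
    vmin = min(values) if vmin is None else vmin
    vmax = max(values) if vmax is None else vmax
    span = vmax - vmin
    if span == 0:
        return [0.0 for _ in values]
    return [(v - vmin) / span for v in values]
```

In use, the values of one parameter would be clamped first, then resampled to a one-second grid, then normalized, matching the order given in the text. Passing `vmin` and `vmax` explicitly corresponds to setting the bounds manually.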
The integrated feature vector extraction module automatically analyzes the normalized data to obtain the minimum complete period of the data; for periodic data it takes the minimum complete period as one observation window, and then extracts the mean, variance, wavelet features and Fourier features within the window to form an integrated feature vector.
The integrated feature vector extraction module combines several kinds of features on an observation window into one feature vector, rather than using a single-feature vector. The integrated feature vector is structured as [mean, variance, wavelet features, Fourier features]. The invention identifies the minimum complete period of the time series data automatically, so it need not be calculated by hand for each series. First an initial observation window is set; sliding this window backward by a time Δt gives a new window, and so on, until N windows are obtained, each separated from the previous one by Δt. The parameter values inside each window form that window's vector. The inner products between the window vector at time t+0 and the window vectors at times {t+Δt, t+2Δt, ..., t+NΔt} are then computed, giving an inner product sequence. A Fourier transform is applied to the inner product sequence, the frequency corresponding to the largest Fourier coefficient is found, and the period of the data is computed by the following formula:
$$C = \frac{1}{f} = \frac{NT}{k}$$
where C is the period of the data, N is the number of windows, T is the sampling interval Δt, and k is the frequency index corresponding to the largest Fourier coefficient, so that f = k/(NT). The time series data are then divided into non-overlapping observation windows, and several types of features are extracted on each observation window to form the integrated feature vector. For periodic data, the minimum complete period of the data is used as the window size; for non-periodic data, a fixed value is specified manually as the window size. The window features comprise the mean, variance, wavelet features and Fourier features of the window, which together form the feature vector. The wavelet features are obtained by wavelet decomposition. The invention determines the number of wavelet decomposition levels adaptively from the data, so as to obtain a feature vector of suitable length. The number of levels L is determined from the window size k and a threshold h, where h is the maximum desired length of the wavelet coefficients. L is initially 1; for a window of fixed size, if $k/2^L$ is less than h, the number of levels is L; otherwise L is increased by 1 and the test is repeated until $k/2^L$ is less than h. After L levels of wavelet decomposition the window data yield wavelet approximation coefficients and wavelet detail coefficients of equal length. The Fourier features consist of a fixed number of Fourier coefficients and their corresponding frequencies: the observation window yields a series of Fourier coefficients after the Fourier transform; ignoring the DC component, the n largest Fourier coefficients (n is 2 by default) and their frequencies are selected as the Fourier features.
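The Fourier part of the feature vector can be sketched as below. This is an illustrative Python example, not the patent's code: the function name `fourier_features` is hypothetical, a naive DFT is used for self-containment, and "largest coefficient" is interpreted as largest magnitude, which is an assumption.

```python
import cmath
import math


def fourier_features(window, n=2):
    """Return the n largest non-DC Fourier coefficients of one window.

    Each feature is a (magnitude, frequency_index) pair; the DC term (k = 0)
    is ignored. Assumption: coefficients are ranked by magnitude.
    """
    size = len(window)
    features = []
    for k in range(1, size // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / size)
                    for i, x in enumerate(window))
        features.append((abs(coeff), k))
    # Largest magnitude first; ties broken by the lower frequency index.
    features.sort(key=lambda item: (-item[0], item[1]))
    return features[:n]
```

For a 32-sample window containing a strong component at frequency index 2 and a weaker one at index 5, the function returns those two indices in that order together with their magnitudes.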
The SDMC clustering module clusters the integrated feature vectors and then merges clusters in the clustering result, which improves the clustering quality.
The SDMC clustering module clusters the data using the integrated feature vectors of the observation windows. Traditional K-Means clustering cannot guarantee that the distance between clusters is large enough. When some data points are rather scattered, traditional K-Means either forces a large number of insufficiently similar points into one cluster, making the cluster very loose, or generates many small clusters that are quite similar to one another. Neither result reflects the real structure of the data objectively and accurately. The SDMC (Similar Density Merge Clustering) method proposed by the invention is similar to traditional K-Means but adds a final inter-cluster merging step, which ensures that the points within each cluster are sufficiently similar while similar small clusters are merged appropriately. The SDMC clustering method comprises the following steps. The first integrated feature vector is taken as a cluster of its own and as that cluster's center. Each subsequent integrated feature vector is then taken in turn and its distance to every current cluster center is computed. If the distance is not greater than a given threshold, the vector is placed in the nearest cluster and that cluster's center is adjusted; if the distance is greater than the threshold, the vector forms a new cluster of its own and becomes its center. After all vectors have been processed in this way, all vectors are traversed again: each is taken in turn, its distance to every current cluster center is computed, and it is placed in the nearest cluster. Once all vectors have been processed, all cluster centers are adjusted. If any cluster center has changed, this traversal is repeated until the centers no longer change. When the centers no longer change, the pairwise distances between cluster centers are computed; if the distance between two cluster centers is less than the given threshold, the two clusters are merged. This is repeated until the distance between any two cluster centers is greater than the given threshold, at which point the SDMC clustering process ends.
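The three phases of the SDMC procedure described above (sequential assignment, K-Means style reassignment, and merging of close clusters) can be sketched in Python as follows. The function name `sdmc` is hypothetical. Several details are assumptions not fixed by the text: Euclidean distance, cluster centers computed as the mean of the members, one threshold shared by the assignment and merge phases, and dropping clusters that become empty during reassignment.

```python
import math


def _mean(vectors):
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def sdmc(vectors, threshold):
    """Return (clusters, centers); clusters hold indices into `vectors`."""
    # Phase 1: sequential assignment; a far-away vector opens a new cluster.
    clusters = [[0]]
    centers = [list(vectors[0])]
    for i in range(1, len(vectors)):
        dists = [math.dist(vectors[i], c) for c in centers]
        j = min(range(len(centers)), key=dists.__getitem__)
        if dists[j] <= threshold:
            clusters[j].append(i)
            centers[j] = _mean([vectors[m] for m in clusters[j]])
        else:
            clusters.append([i])
            centers.append(list(vectors[i]))
    # Phase 2: reassign every vector to its nearest center until stable.
    while True:
        clusters = [[] for _ in centers]
        for i, v in enumerate(vectors):
            j = min(range(len(centers)),
                    key=lambda c: math.dist(v, centers[c]))
            clusters[j].append(i)
        clusters = [c for c in clusters if c]  # assumption: drop empty clusters
        new_centers = [_mean([vectors[m] for m in c]) for c in clusters]
        if new_centers == centers:
            break
        centers = new_centers
    # Phase 3: merge the closest pair while it is closer than the threshold.
    while len(centers) > 1:
        d, a, b = min((math.dist(centers[a], centers[b]), a, b)
                      for a in range(len(centers))
                      for b in range(a + 1, len(centers)))
        if d >= threshold:
            break
        clusters[a] += clusters.pop(b)
        centers.pop(b)
        centers[a] = _mean([vectors[m] for m in clusters[a]])
    return clusters, centers
```

The merge phase is what distinguishes this from plain K-Means: two clusters that the first phase opened separately, because of the order in which vectors arrived, are joined afterwards if their final centers end up close together.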
The feature string generation module converts the data into the corresponding feature string according to the clustering result.
The feature string generation module uses the clustering result to find the cluster to which the feature vector of each observation window belongs, and represents the window by that cluster's feature character. A sequence of N observation windows is thus converted into a sequence of N feature characters; that is, the original time series data are converted into a feature string of length N. A larger character denotes a more likely abnormal feature, that is, a feature with a smaller probability of occurrence: the feature with the highest probability is denoted "a", the feature with the second highest probability is denoted "b", and so on. Each original time series is converted into one feature string.
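The mapping from cluster labels to feature characters can be sketched in a few lines of Python. The function name `feature_string` is hypothetical, and two details are assumptions: ties in frequency are broken by the cluster label, and at most 26 clusters are supported because only the lowercase letters are used.

```python
from collections import Counter
import string


def feature_string(labels):
    """Convert per-window cluster labels into a feature string.

    The most frequent cluster becomes "a", the next most frequent "b", etc.
    """
    counts = Counter(labels)
    ranked = sorted(counts, key=lambda label: (-counts[label], label))
    if len(ranked) > len(string.ascii_lowercase):
        raise ValueError("more clusters than available characters")
    symbol = {label: string.ascii_lowercase[rank]
              for rank, label in enumerate(ranked)}
    return "".join(symbol[label] for label in labels)
```

For the label sequence `[2, 2, 0, 2, 1, 0, 2]`, cluster 2 is the most frequent and maps to "a", cluster 0 maps to "b" and cluster 1 to "c".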
The abnormal change process learning module divides the feature string into a sequence of words, classifies the words as frequent or non-frequent according to their frequency, and then obtains the non-frequent patterns as the gaps between frequent patterns; the transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is the abnormal change process.
The abnormal change process learning module is first given the word size to be examined (4 by default; it can also be specified manually). It then divides the feature string into a sequence of words and counts the probability of occurrence of each word. A word whose probability is greater than a given probability threshold is a frequent word; otherwise it is a non-frequent word. Consecutive frequent words in the feature string form a frequent pattern, and the gap between adjacent frequent patterns is a non-frequent pattern. The transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is the abnormal change process, and the feature string corresponding to the non-frequent pattern is the feature of that abnormal change process.
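The word splitting and gap extraction can be sketched as below. The names `segment_patterns` and `gaps` and the default threshold of 0.2 are hypothetical. The text does not say whether words overlap, so the sketch assumes non-overlapping consecutive words; a shorter trailing word is kept as is.

```python
from collections import Counter
from itertools import groupby


def segment_patterns(feature_str, word_size=4, min_prob=0.2):
    """Split a feature string into frequent and non-frequent patterns.

    Returns a list of (is_frequent, start_position, text) segments.
    Assumption: words are non-overlapping chunks of `word_size` characters.
    """
    words = [feature_str[i:i + word_size]
             for i in range(0, len(feature_str), word_size)]
    counts = Counter(words)
    total = len(words)
    frequent = {w for w, c in counts.items() if c / total > min_prob}
    segments = []
    position = 0
    # Runs of consecutive frequent (or non-frequent) words form one pattern.
    for is_freq, run in groupby(words, key=lambda w: w in frequent):
        text = "".join(run)
        segments.append((is_freq, position, text))
        position += len(text)
    return segments


def gaps(segments):
    """Non-frequent patterns lying between two frequent patterns."""
    return [(pos, text)
            for i, (is_freq, pos, text) in enumerate(segments)
            if not is_freq and 0 < i < len(segments) - 1]
```

Each gap returned by `gaps` is a candidate abnormal change process: its start position maps back to an observation window index, and its text is the feature string that characterizes the process.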
The method of the invention for mining the abnormal change process of time series data comprises the following steps:
First, the data preprocessing module 1-1 cleans and interpolates the original time series data to obtain data in a valid format for the subsequent mining work.
Second, the integrated feature vector extraction module 1-2 automatically analyzes the data to obtain the minimum complete period of periodic data; for periodic data it takes the minimum complete period as one observation window, and then extracts the mean, variance, wavelet features and Fourier features within the window to form an integrated feature vector.
Next, the SDMC clustering module 1-3 clusters the integrated feature vectors and then merges clusters in the clustering result.
Next, the feature string generation module 1-4 converts the data into the corresponding feature string according to the clustering result.
Finally, the abnormal change process learning module 1-5 divides the feature string into a sequence of words, classifies the words as frequent or non-frequent according to their frequency, and then obtains the non-frequent patterns as the gaps between frequent patterns; the transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is the abnormal change process.
Referring to Fig. 2, which is the flowchart of the SDMC clustering module of the present invention, the clustering comprises the following steps:
Step 2-1 is performed first: the first multi-feature vector is taken as a cluster by itself and serves as that cluster's center. Step 2-2 then checks whether all multi-feature vectors have been processed. If not, step 2-3 takes the next multi-feature vector, step 2-4 calculates its distance to every current cluster center, and step 2-5 checks whether its distance to some cluster center is less than the specified threshold. If so, step 2-6 places the vector in the cluster at the smallest distance and adjusts that cluster's center, and the flow returns to step 2-2. Otherwise, step 2-7 makes the vector a new cluster by itself, with the vector as its center, and the flow returns to step 2-2.
Once all multi-feature vectors have been processed, step 2-8 takes the first multi-feature vector again. Step 2-9 checks whether all multi-feature vectors have been processed. If not, step 2-10 calculates the distance from the current vector to every current cluster center, step 2-11 places the vector in the nearest cluster, step 2-12 takes the next vector, and the flow returns to step 2-9. Once all vectors have been processed, step 2-13 checks whether the clustering result has changed. If it has, step 2-14 adjusts the centers of the clusters that changed, and the flow returns to step 2-8.
If the clustering result has not changed, step 2-15 calculates the pairwise distances between cluster centers and selects the two clusters whose centers are closest. Step 2-16 checks whether the distance between these two centers is less than a given threshold. If it is, step 2-17 merges the two clusters and the flow returns to step 2-15. If it is not, the SDMC clustering process ends.
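The three phases of the flow above (sequential assignment, iterative reassignment, and merging) can be sketched in a few dozen lines. This is a minimal illustration rather than the patented implementation: the Euclidean distance, the use of the member mean as the adjusted cluster center, the `max_iter` safety limit, and all function and parameter names are assumptions.

```python
import math


def _mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _nearest(v, centers):
    dists = [math.dist(v, c) for c in centers]
    j = min(range(len(centers)), key=dists.__getitem__)
    return j, dists[j]


def sdmc(vectors, assign_threshold, merge_threshold, max_iter=100):
    """Return (labels, centers) for a non-empty list of numeric vectors."""
    # Phase 1 (steps 2-1 to 2-7): one sequential pass over the vectors.
    members, centers = [[0]], [list(vectors[0])]
    for i in range(1, len(vectors)):
        j, d = _nearest(vectors[i], centers)
        if d < assign_threshold:
            members[j].append(i)
            centers[j] = _mean([vectors[m] for m in members[j]])
        else:
            members.append([i])
            centers.append(list(vectors[i]))
    labels = [0] * len(vectors)
    for j, ms in enumerate(members):
        for m in ms:
            labels[m] = j
    # Phase 2 (steps 2-8 to 2-14): reassign until the result stops changing.
    for _ in range(max_iter):
        new_labels = [_nearest(v, centers)[0] for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centers)):
            ms = [v for v, lab in zip(vectors, labels) if lab == j]
            if ms:
                centers[j] = _mean(ms)
    # Phase 3 (steps 2-15 to 2-17): merge the closest pair while it is
    # closer than the merge threshold. Empty clusters are dropped here.
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    clusters = list(groups.values())
    cents = [_mean([vectors[i] for i in g]) for g in clusters]
    while len(clusters) > 1:
        d, a, b = min((math.dist(cents[a], cents[b]), a, b)
                      for a in range(len(cents))
                      for b in range(a + 1, len(cents)))
        if d >= merge_threshold:
            break
        clusters[a] += clusters[b]
        cents[a] = _mean([vectors[i] for i in clusters[a]])
        del clusters[b], cents[b]
    labels = [0] * len(vectors)
    for j, g in enumerate(clusters):
        for i in g:
            labels[i] = j
    return labels, cents


# Hypothetical 2-D feature vectors forming two well-separated groups.
data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
labels, centers = sdmc(data, assign_threshold=1.0, merge_threshold=1.0)
```

Because the number of clusters is driven by the two thresholds rather than fixed in advance, the result depends on how those thresholds compare with the scale of the feature vectors.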
Referring to Fig. 3, which is the flowchart of the change-process learning module of the present invention, the learning comprises the following steps:
Step 3-1 is performed first: the feature string generated by the feature string generation module is obtained. Step 3-2 then counts, within this string, the frequency of occurrence of every word of length L characters (L is 4 by default and can also be specified manually). Step 3-3 checks, for each word, whether its frequency of occurrence is greater than a given threshold. If the frequency is not greater than the threshold, step 3-4 marks the word as an infrequent word; otherwise, step 3-5 marks it as a frequent word.
After all words have been classified, step 3-6 rescans the feature string. Step 3-7 checks whether the current position has reached the end of the string. If it has not, step 3-8 checks whether the L consecutive characters starting at the current position form a frequent word. If they do, step 3-11 advances by L characters and the flow returns to step 3-7. If they do not, step 3-9 checks whether the previous word was a frequent word. If the previous word was frequent, step 3-12 takes the span from the previous position to the current position as one frequent pattern (that is, a run of consecutive frequent words) and puts this pattern into the frequent pattern queue, after which step 3-10 advances by one character. If the previous word was not frequent, step 3-10 is performed directly and advances by one character. In either case the flow then returns to step 3-7.
When the scan has reached the end of the string, step 3-13 finds, from the frequent pattern queue, the substrings corresponding to the gaps between all adjacent frequent patterns; these are the infrequent patterns. Step 3-14 then outputs the abnormal change processes corresponding to all infrequent patterns, including both the transition from a frequent pattern to an infrequent pattern and the transition from an infrequent pattern back to a frequent pattern. At this point the change-process learning ends.
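The scanning loop of steps 3-6 to 3-13 can be sketched as below. The set of frequent words is taken as an input, so the sketch is independent of how that set was obtained. Closing a run that is still open when the scan ends is an assumption (the flow does not say what happens in that case), and the name `find_patterns` and the half-open `(start, end)` span representation are chosen only for illustration.

```python
def find_patterns(feature_string, frequent_words, word_len=4):
    """Return (frequent_patterns, infrequent_patterns).

    frequent_patterns:   list of (start, end) spans, end exclusive.
    infrequent_patterns: list of (start, end, substring) gaps between
                         adjacent frequent patterns.
    """
    patterns = []
    run_start = None  # start of the current run of frequent words, if any
    pos = 0
    while pos + word_len <= len(feature_string):
        if feature_string[pos:pos + word_len] in frequent_words:
            if run_start is None:
                run_start = pos
            pos += word_len              # step 3-11: advance by L characters
        else:
            if run_start is not None:    # steps 3-9 and 3-12: close the run
                patterns.append((run_start, pos))
                run_start = None
            pos += 1                     # step 3-10: advance by one character
    if run_start is not None:            # assumption: close a trailing run
        patterns.append((run_start, pos))
    # Step 3-13: the gaps between adjacent frequent patterns.
    gaps = [(end, start, feature_string[end:start])
            for (_, end), (start, _) in zip(patterns, patterns[1:])]
    return patterns, gaps


# Hypothetical feature string and frequent-word set.
s = "a" * 10 + "bc" + "a" * 10
patterns, gaps = find_patterns(s, {"aaaa"})
```

For this input the scan yields two frequent patterns separated by one gap; the gap substring contains the rare characters together with the leftover characters that did not complete a frequent word.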
Referring to Fig. 4, a data curve of an example parameter processed with this method is shown.
Referring to Fig. 5, the frequent and infrequent patterns obtained from the above example parameter are shown; the numbers indicate the positions at which each pattern occurs in the feature string.
Referring to Fig. 6, the abnormal change processes mined from the above example parameter are shown.

Claims (8)

1. A system for mining abnormal change processes in time series data, characterized in that it comprises a data preprocessing module, a multi-feature vector extraction module, an SDMC clustering module, a feature string generation module, and a change-process learning module;
the data preprocessing module is configured to clean and interpolate the raw time series data to obtain normalized data;
the multi-feature vector extraction module is configured to automatically analyze the normalized data to obtain the minimum complete period of the data, take this minimum complete period as the observation window for periodic data, and then extract the mean, variance, wavelet feature, and Fourier feature within each window to form a multi-feature vector;
the SDMC clustering module is configured to cluster the multi-feature vectors and to merge clusters in the clustering result;
the feature string generation module is configured to convert the data into the corresponding feature string according to the clustering result;
the change-process learning module is configured to divide the feature string into a word sequence, classify the words as frequent or infrequent according to their frequency, and then obtain the infrequent patterns by finding the gaps between frequent patterns; a transition from a frequent pattern to an infrequent pattern, or from an infrequent pattern to a frequent pattern, is an abnormal change process.
2. The system for mining abnormal change processes in time series data according to claim 1, characterized in that the data preprocessing module performs the steps of outlier removal, single-parameter file generation, equal-interval processing, and normalization; the outlier removal step comprises: setting an upper and a lower limit for each data item, replacing any value greater than the upper limit with the upper limit and any value less than the lower limit with the lower limit, thereby removing outliers; in the equal-interval processing step, the data are by default sampled every 1 second, so that after processing each minute of data starts at second 0 and ends at second 59; after equal-interval processing the data are normalized so that their range is mapped onto the interval [0, 1].
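As an illustration only, the outlier removal, equal-interval processing, and normalization steps named in this claim could be sketched as follows. Linear interpolation onto a whole-second grid and min-max scaling are assumptions (the text does not name the interpolation or normalization formula), and the function names and sample values are hypothetical.

```python
import math


def clip(values, lower, upper):
    """Replace values above the upper limit or below the lower limit."""
    return [min(max(v, lower), upper) for v in values]


def resample_1s(times, values):
    """Linearly interpolate onto a 1-second grid.

    Assumes at least two samples and strictly increasing times in seconds.
    """
    grid = list(range(math.ceil(times[0]), math.floor(times[-1]) + 1))
    out, j = [], 0
    for t in grid:
        while times[j + 1] < t:      # find the segment that contains t
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return grid, out


def normalize(values):
    """Min-max scale the values onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                     # constant series: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical irregularly sampled telemetry with one spike.
times = [0.0, 0.7, 2.2, 3.0, 4.5]
raw = [1.0, 2.0, 99.0, 3.0, 2.5]
clipped = clip(raw, lower=0.0, upper=10.0)
grid, resampled = resample_1s(times, clipped)
normalized = normalize(resampled)
```

The order shown (clip, resample, normalize) follows the order in which the claim lists the steps.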
3. The system for mining abnormal change processes in time series data according to claim 1, characterized in that the multi-feature vector extraction module combines several kinds of features on an observation window into a feature vector; the multi-feature vector is specifically composed as: [mean, variance, wavelet feature, Fourier feature]; the minimum complete period of the time series data is identified automatically by the following steps: first an initial observation window is set, then this window is slid forward by a time Δt to obtain a new window, and so on, until N windows are obtained, adjacent windows being separated by the time Δt; the parameter values in each window form the vector of that window; the inner products between the window vector at time t+0 and the window vectors at times {t+Δt, t+2Δt, ..., t+NΔt} are then calculated to obtain a sequence of inner product values; a Fourier transform is applied to this sequence, the frequency corresponding to the largest Fourier coefficient is found, and finally the period of the data is calculated according to the following formula:
$$C = \frac{1}{f} = \frac{N T}{k}$$
where C denotes the period of the data, N denotes the number of windows, T denotes the sampling interval Δt, and k denotes the frequency corresponding to the largest Fourier coefficient; the time series data are then divided into non-overlapping observation windows, and several types of features are extracted on each observation window to form the multi-feature vector; for periodic data, the minimum complete period of the data is taken as the window size; for non-periodic data, a fixed value is specified as the window size; the window features comprise the mean, variance, wavelet feature, and Fourier feature within the window, which together form the feature vector; the wavelet feature is obtained by wavelet decomposition; the number of wavelet decomposition levels L is determined adaptively from the window size k and a threshold h; the threshold h is the maximum desired length of the wavelet coefficients; L is initially 1; for a window of fixed length, if k/2^L is less than the threshold h, the number of decomposition levels is L; otherwise L is incremented by 1 and the above test is repeated until k/2^L is less than the threshold h; after L levels of wavelet decomposition of the window data, wavelet approximation coefficients and wavelet detail coefficients of equal length are obtained; the Fourier feature consists of a fixed number of Fourier coefficients and their corresponding frequencies; a series of Fourier coefficients is obtained from the observation window by Fourier transform; ignoring the DC component, the n largest Fourier coefficients and their corresponding frequencies are selected as the Fourier feature; n is 2.
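As a hedged illustration of the period formula and of the adaptive choice of decomposition levels in this claim, the sketch below estimates the period of a synthetic sine wave and computes the level count for one window size. It uses a naive discrete Fourier transform from the standard library, removes the mean of the inner-product sequence before the transform so that the DC term cannot dominate, and takes Δt equal to one sample; these choices and all names are assumptions made for the example.

```python
import cmath
import math


def estimate_period(series, window_len, n_windows, dt=1.0):
    """Estimate the period as C = N * T / k, with T = dt."""
    if len(series) < window_len + n_windows:
        raise ValueError("series too short for the requested windows")
    base = series[:window_len]
    # Inner products between the first window and N windows shifted by
    # 1, 2, ..., N samples.
    inner = [sum(a * b for a, b in zip(base, series[i:i + window_len]))
             for i in range(1, n_windows + 1)]
    mean = sum(inner) / n_windows
    centered = [p - mean for p in inner]   # assumption: drop the DC term

    def dft_magnitude(k):
        return abs(sum(p * cmath.exp(-2j * math.pi * k * n / n_windows)
                       for n, p in enumerate(centered)))

    # k is the index of the largest non-DC Fourier coefficient.
    k = max(range(1, n_windows // 2 + 1), key=dft_magnitude)
    return n_windows * dt / k


def wavelet_levels(window_size, h):
    """Smallest L >= 1 such that window_size / 2**L is less than h."""
    level = 1
    while window_size / 2 ** level >= h:
        level += 1
    return level


# Synthetic signal with a true period of 20 samples.
signal = [math.sin(2 * math.pi * n / 20) for n in range(200)]
period = estimate_period(signal, window_len=40, n_windows=100)
levels = wavelet_levels(window_size=64, h=8)
```

On real data the number of windows will rarely be an exact multiple of the true period, so the estimate is quantized to values of the form N·T/k and may need rounding or refinement.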
4. The system for mining abnormal change processes in time series data according to claim 1, characterized in that the SDMC clustering module clusters the data using the multi-feature vectors of the observation windows; the clustering method of the SDMC clustering module specifically comprises the following steps: first, the first multi-feature vector is taken as a cluster by itself and serves as the cluster center; then each subsequent multi-feature vector is taken in turn and its distance to every current cluster center is calculated; if this distance is not greater than a given threshold, the multi-feature vector is placed in the cluster at the smallest distance and the center of that cluster is adjusted; if this distance is greater than the given threshold, the multi-feature vector forms a new cluster by itself and serves as its center; after all multi-feature vectors have been processed in this way, all multi-feature vectors are traversed again: each multi-feature vector is taken in turn, its distance to every current cluster center is calculated, and it is placed in the nearest cluster; after all multi-feature vectors have been processed in this way, all current cluster centers are adjusted; if any cluster center has changed, the preceding traversal is repeated until the cluster centers no longer change; when the cluster centers no longer change, the pairwise distances between cluster centers are calculated; if the distance between two cluster centers is less than a given threshold, the two clusters are merged; this process is repeated until the distance between any two cluster centers is greater than the given threshold; at this point the SDMC clustering process ends.
5. The system for mining abnormal change processes in time series data according to claim 1, characterized in that the feature string generation module uses the clustering result to find, for each observation window, the cluster to which its feature vector belongs, and then represents that window by the feature character of that cluster, so that the sequence of N observation windows is converted into a sequence of N feature characters, that is, the raw time series data are converted into a feature string of length N.
6. The system for mining abnormal change processes in time series data according to claim 1, characterized in that the change-process learning module first fixes the size of the words to be examined; then divides the feature string into a word sequence; then counts the probability of occurrence of each word; a word whose probability is greater than a given probability threshold is a frequent word, and otherwise it is an infrequent word; consecutive frequent words in the feature string form a frequent pattern, and the gap between two adjacent frequent patterns is an infrequent pattern; a transition from a frequent pattern to an infrequent pattern, or from an infrequent pattern to a frequent pattern, is an abnormal change process, and the part of the feature string corresponding to the infrequent pattern is the feature of that change process.
7. A method for mining abnormal change processes in time series data, characterized in that it comprises the following steps:
Step 1: the data preprocessing module cleans and interpolates the raw time series data to obtain normalized data;
Step 2: the multi-feature vector extraction module automatically analyzes the normalized data to obtain the minimum complete period of the data, takes this minimum complete period as the observation window for periodic data, and then extracts the mean, variance, wavelet feature, and Fourier feature within each window to form a multi-feature vector;
Step 3: the SDMC clustering module clusters the multi-feature vectors and merges clusters in the clustering result;
Step 4: the feature string generation module converts the data into the corresponding feature string according to the clustering result;
Step 5: the change-process learning module divides the feature string into a word sequence, classifies the words as frequent or infrequent according to their frequency, and then obtains the infrequent patterns by finding the gaps between frequent patterns; a transition from a frequent pattern to an infrequent pattern, or from an infrequent pattern to a frequent pattern, is an abnormal change process.
8. The method for mining abnormal change processes in time series data according to claim 7, characterized in that the mining method specifically comprises the following steps:
Step 1: the data preprocessing module performs outlier removal, single-parameter file generation, equal-interval processing, and normalization on the raw time series data; the outlier removal step comprises: setting an upper and a lower limit for each data item, replacing any value greater than the upper limit with the upper limit and any value less than the lower limit with the lower limit, thereby removing outliers; in the equal-interval processing step, the data are by default sampled every 1 second, so that after processing each minute of data starts at second 0 and ends at second 59; after equal-interval processing the data are normalized so that their range is mapped onto the interval [0, 1];
Step 2: the multi-feature vector extraction module combines several kinds of features on an observation window into a feature vector; the multi-feature vector is specifically composed as: [mean, variance, wavelet feature, Fourier feature]; the minimum complete period of the time series data is identified automatically by the following steps: first an initial observation window is set, then this window is slid forward by a time Δt to obtain a new window, and so on, until N windows are obtained, adjacent windows being separated by the time Δt; the parameter values in each window form the vector of that window; the inner products between the window vector at time t+0 and the window vectors at times {t+Δt, t+2Δt, ..., t+NΔt} are then calculated to obtain a sequence of inner product values; a Fourier transform is applied to this sequence, the frequency corresponding to the largest Fourier coefficient is found, and finally the period of the data is calculated according to the following formula:
$$C = \frac{1}{f} = \frac{N T}{k}$$
where C denotes the period of the data, N denotes the number of windows, T denotes the sampling interval Δt, and k denotes the frequency corresponding to the largest Fourier coefficient; the time series data are then divided into non-overlapping observation windows, and several types of features are extracted on each observation window to form the multi-feature vector; for periodic data, the minimum complete period of the data is taken as the window size; for non-periodic data, a fixed value is specified as the window size; the window features comprise the mean, variance, wavelet feature, and Fourier feature within the window, which together form the feature vector; the wavelet feature is obtained by wavelet decomposition; the number of wavelet decomposition levels L is determined adaptively from the window size k and a threshold h; the threshold h is the maximum desired length of the wavelet coefficients; L is initially 1; for a window of fixed length, if k/2^L is less than the threshold h, the number of decomposition levels is L; otherwise L is incremented by 1 and the above test is repeated until k/2^L is less than the threshold h; after L levels of wavelet decomposition of the window data, wavelet approximation coefficients and wavelet detail coefficients of equal length are obtained; the Fourier feature consists of a fixed number of Fourier coefficients and their corresponding frequencies; a series of Fourier coefficients is obtained from the observation window by Fourier transform; ignoring the DC component, the n largest Fourier coefficients and their corresponding frequencies are selected as the Fourier feature; n is 2;
Step 3: the SDMC clustering module clusters the data using the multi-feature vectors of the observation windows; the clustering method of the SDMC clustering module specifically comprises the following steps: first, the first multi-feature vector is taken as a cluster by itself and serves as the cluster center; then each subsequent multi-feature vector is taken in turn and its distance to every current cluster center is calculated; if this distance is not greater than a given threshold, the multi-feature vector is placed in the cluster at the smallest distance and the center of that cluster is adjusted; if this distance is greater than the given threshold, the multi-feature vector forms a new cluster by itself and serves as its center; after all multi-feature vectors have been processed in this way, all multi-feature vectors are traversed again: each multi-feature vector is taken in turn, its distance to every current cluster center is calculated, and it is placed in the nearest cluster; after all multi-feature vectors have been processed in this way, all current cluster centers are adjusted; if any cluster center has changed, the preceding traversal is repeated until the cluster centers no longer change; when the cluster centers no longer change, the pairwise distances between cluster centers are calculated; if the distance between two cluster centers is less than a given threshold, the two clusters are merged; this process is repeated until the distance between any two cluster centers is greater than the given threshold; at this point the SDMC clustering process ends;
Step 4: the feature string generation module uses the clustering result to find, for each observation window, the cluster to which its feature vector belongs, and then represents that window by the feature character of that cluster, so that the sequence of N observation windows is converted into a sequence of N feature characters, that is, the raw time series data are converted into a feature string of length N;
Step 5: the change-process learning module first fixes the size of the words to be examined; then divides the feature string into a word sequence; then counts the probability of occurrence of each word; a word whose probability is greater than a given probability threshold is a frequent word, and otherwise it is an infrequent word; consecutive frequent words in the feature string form a frequent pattern, and the gap between two adjacent frequent patterns is an infrequent pattern; a transition from a frequent pattern to an infrequent pattern, or from an infrequent pattern to a frequent pattern, is an abnormal change process, and the part of the feature string corresponding to the infrequent pattern is the feature of that change process.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510551876.XA CN105205113A (en) 2015-09-01 2015-09-01 System and method for excavating abnormal change process of time series data

Publications (1)

Publication Number Publication Date
CN105205113A true CN105205113A (en) 2015-12-30

Family

ID=54952796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510551876.XA Pending CN105205113A (en) 2015-09-01 2015-09-01 System and method for excavating abnormal change process of time series data

Country Status (1)

Country Link
CN (1) CN105205113A (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106921599A (en) * 2017-04-26 2017-07-04 中国民用航空总局第二研究所 A kind of overlap signal removing method and system based on cluster
CN106921599B (en) * 2017-04-26 2019-08-13 中国民用航空总局第二研究所 A kind of overlap signal removing method and system based on cluster
WO2019037557A1 (en) * 2017-08-25 2019-02-28 清华大学 Method for learning time sequence characteristics of locomotive operation
CN109582482A (en) * 2017-09-29 2019-04-05 西门子公司 For detecting the abnormal method and device of discrete type production equipment
CN110020190A (en) * 2018-07-05 2019-07-16 中国科学院信息工程研究所 A kind of suspected threat index verification method and system based on multi-instance learning
CN110020190B (en) * 2018-07-05 2021-06-01 中国科学院信息工程研究所 Multi-instance learning-based suspicious threat index verification method and system
CN108960537A (en) * 2018-08-17 2018-12-07 安吉汽车物流股份有限公司 The prediction technique and device of logistics order, readable medium
CN108960537B (en) * 2018-08-17 2020-10-13 安吉汽车物流股份有限公司 Logistics order prediction method and device and readable medium
CN110032490A (en) * 2018-12-28 2019-07-19 中国银联股份有限公司 Method and device thereof for detection system exception
CN113515554A (en) * 2020-04-09 2021-10-19 华晨宝马汽车有限公司 Anomaly detection method and system for irregularly sampled time series
CN111651755A (en) * 2020-05-08 2020-09-11 中国联合网络通信集团有限公司 Intrusion detection method and device
CN111651755B (en) * 2020-05-08 2023-04-18 中国联合网络通信集团有限公司 Intrusion detection method and device
CN112732541A (en) * 2020-12-28 2021-04-30 北京航空航天大学 Intelligent criterion mining system for fault diagnosis of complex equipment
CN112732541B (en) * 2020-12-28 2023-05-09 北京航空航天大学 Intelligent criterion mining system for fault diagnosis of complex equipment

Similar Documents

Publication Publication Date Title
CN105205113A (en) System and method for excavating abnormal change process of time series data
CN108008332B (en) New energy remote testing equipment fault diagnosis method based on data mining
CN105205112A (en) System and method for excavating abnormal features of time series data
CN109489977B (en) KNN-AdaBoost-based bearing fault diagnosis method
CN108875772B (en) Fault classification model and method based on stacked sparse Gaussian Bernoulli limited Boltzmann machine and reinforcement learning
CN110335168B (en) Method and system for optimizing power utilization information acquisition terminal fault prediction model based on GRU
CN108435819B (en) Energy consumption abnormity detection method for aluminum profile extruder
CN105205111A (en) System and method for mining failure modes of time series data
CN106682835B (en) Data-driven complex electromechanical system service quality state evaluation method
CN110287827B (en) Bridge strain data outlier identification method based on data correlation
CN112668105B (en) Helicopter transmission shaft abnormity judgment method based on SAE and Mahalanobis distance
US20220179393A1 (en) Machine tool evaluation method, machine tool evaluation system and medium
CN111426905B (en) Power distribution network common bus transformation relation abnormity diagnosis method, device and system
CN116431966A (en) Reactor core temperature anomaly detection method of incremental characteristic decoupling self-encoder
CN117421684B (en) Abnormal data monitoring and analyzing method based on data mining and neural network
CN112215307B (en) Method for automatically detecting signal abnormality of earthquake instrument by machine learning
CN113485244A (en) Numerical control machine tool control system and method based on cutter wear prediction
CN111191502B (en) Stick-slip and jump drill abnormal working condition identification method based on drill string vibration signal
CN115310499B (en) Industrial equipment fault diagnosis system and method based on data fusion
CN110222390B (en) Gear crack identification method based on wavelet neural network
CN111241145A (en) Self-healing rule mining method and device based on big data
Chen et al. Hass: High accuracy spike sorting with wavelet package decomposition and mutual information
CN116298881B (en) Electrical signal motor health early warning method based on channel attention multi-module LMMD
Zhou et al. Fault variables recognition using improved k-nearest neighbor reconstruction
Liu et al. Research and application of wear prediction method of NC milling cutter based on data-driven

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151230
