CN105205113A

CN105205113A - System and method for mining the abnormal change process of time series data

Info

Publication number: CN105205113A
Application number: CN201510551876.XA
Authority: CN
Inventors: 鲍军鹏; 杨天社; 胡绍林; 齐勇; 高宇; 李肖瑛; 张海龙; 杨冬毅
Original assignee: Xian Jiaotong University; China Xian Satellite Control Center
Current assignee: Xian Jiaotong University; China Xian Satellite Control Center
Priority date: 2015-09-01
Filing date: 2015-09-01
Publication date: 2015-12-30

Abstract

The invention discloses a system and a method for mining the abnormal change process of time series data. The system comprises a data preprocessing module, a comprehensive feature vector extraction module, an SDMC (Similar Density Merge Clustering) module, a feature string generation module and an abnormal change process learning module. The system and method can mine, from massive time series data, the process of change from normal operation to abnormal deviation and then to obvious failure, and can analyze how the features change during that process. The time series data are abstracted into feature strings, frequent words are mined with a statistical learning method, and consecutive frequent words form frequent patterns. The frequent patterns correspond to general normal processes; the gap between adjacent frequent patterns is an abnormal change process, and the feature string of that gap expresses the features of the process. The system and method can be used to mine and discover abnormal change and failure development processes in a real-time system, help analyze the causes of system failures and improve fault diagnosis efficiency, and are significant for the whole-life health management of complex systems.

Description

A system and method for mining the abnormal change process of time series data

[technical field]

The invention belongs to the fields of intelligent information processing and computer technology, and specifically relates to a system and method for mining the abnormal change process of time series data.

[background technology]

Mining the abnormal change process of a time series plays an important role in understanding the characteristic laws of the time series, analyzing fault evolution and fault causes, mining fault knowledge, gaining a deeper understanding of time series systems, predicting system health, and providing early warning of incipient failures.

Changes in a time series usually follow an evolution process. Different abnormal changes evolve according to different laws, each with its own characteristics. Mining the evolution process of abnormal changes and the laws of feature change means mining, from massive abnormal data, the process by which the time series state changes from normal to deviating and then to abnormal, and from mild abnormality to severe abnormality or failure, and then analyzing how the different features change during these evolution processes.

[summary of the invention]

The object of the present invention is to provide a system and method for mining the abnormal change process of time series data. Through data preprocessing, comprehensive feature vector extraction, SDMC clustering, feature string generation and abnormal change process learning, the process of change from normal to abnormal can be mined from massive time series data.

To achieve this object, the present invention adopts the following technical scheme:

A system for mining the abnormal change process of time series data comprises a data preprocessing module, a comprehensive feature vector extraction module, an SDMC clustering module, a feature string generation module and an abnormal change process learning module;

The data preprocessing module is used to clean and interpolate the original time series data to obtain normalized data;

The comprehensive feature vector extraction module is used to automatically analyze the normalized data and obtain the minimum complete period of the data; for periodic data, one minimum complete period is taken as an observation window, and the mean, variance, wavelet features and Fourier features in the window are extracted to form a comprehensive feature vector;

The SDMC clustering module is used to cluster the comprehensive feature vectors and to merge clusters in the clustering result;

The feature string generation module is used to convert the data into the corresponding feature string according to the clustering result;

The abnormal change process learning module is used to divide the feature string into a word sequence, classify the words as frequent or non-frequent according to their frequency, and then obtain the non-frequent patterns by finding the gaps between frequent patterns; the transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is an abnormal change process.

As a further improvement of the present invention, the data preprocessing module performs outlier removal, single-parameter file generation, equal-interval processing and normalization. Outlier removal sets an upper and a lower bound for each data item; values greater than the upper bound are set to the upper bound, and values less than the lower bound are set to the lower bound. In equal-interval processing, the data are sampled every 1 second by default, so that after processing each minute of data starts at second 0 and ends at second 59. After equal-interval processing the data are normalized so that their value range is mapped onto the interval [0,1].

As a further improvement of the present invention, the comprehensive feature vector extraction module combines several kinds of features on an observation window into one feature vector, specifically [mean, variance, wavelet features, Fourier features]. The minimum complete period of the time series data is identified automatically through the following steps: first, an initial observation window is set; the window is then slid along the time axis by Δt to obtain a new window, and so on, until N windows are obtained, with adjacent windows separated by Δt; the parameter values in each window form that window's vector; the inner products between the window vector at time t+0 and the window vectors at times {t+Δt, t+2Δt, ..., t+NΔt} are computed, giving a sequence of inner product values; a Fourier transform is applied to this sequence and the frequency corresponding to the largest Fourier coefficient is found; finally, the period of the data is calculated by the following formula:

C = \frac{1}{f} = \frac{N T}{k}

where C is the period of the data, N is the number of windows, T is the sampling interval Δt, and k is the frequency index corresponding to the largest Fourier coefficient, so that f = k/(NT). The time series data are then divided into non-overlapping observation windows, and several types of features are extracted from each observation window to form a comprehensive feature vector. For periodic data, the minimum complete period of the data is taken as the window size; for non-periodic data, a fixed value is specified as the window size. The window features comprise the mean, the variance, the wavelet features and the Fourier features of the data in the window. The wavelet features are obtained by wavelet decomposition. The number of decomposition levels L is determined adaptively from the window size k (here k denotes the window size, not the frequency index in the formula above) and a threshold h, where h is the maximum desired length of the wavelet coefficients: L is initially 1; for a window of fixed size, if k/2^L is less than h, the number of decomposition levels is L; otherwise L is increased by 1, and this is repeated until k/2^L is less than h. After L levels of wavelet decomposition, the window data yield wavelet approximation coefficients and wavelet detail coefficients of equal length. The Fourier features consist of a fixed number of Fourier coefficients and their corresponding frequencies: the observation window is Fourier transformed to obtain a series of Fourier coefficients; the DC component is ignored, and the n largest Fourier coefficients and their corresponding frequencies are selected as the Fourier features, with n equal to 2.
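The automatic period identification described above (shifted windows, inner products, Fourier transform, then C = NT/k) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of the invention: the function name `estimate_period`, its parameters, the use of NumPy, and the subtraction of the mean before the transform (one way of ignoring the DC term) are all choices made for the example.

```python
import numpy as np

def estimate_period(x, window_len, n_windows, step=1, sample_interval=1.0):
    """Estimate the data period C = N*T/k (hypothetical helper).

    N is n_windows, T is the time shift between adjacent windows
    (step * sample_interval), and k is the index of the largest
    non-DC Fourier coefficient of the inner-product sequence.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < n_windows * step + window_len:
        raise ValueError("not enough samples for the requested windows")
    base = x[:window_len]  # window vector at time t+0
    # Inner products between the first window and each shifted window.
    inner = np.array([
        np.dot(base, x[i * step:i * step + window_len])
        for i in range(1, n_windows + 1)
    ])
    inner = inner - inner.mean()  # assumption: drop the DC offset first
    spectrum = np.abs(np.fft.rfft(inner))
    k = int(np.argmax(spectrum[1:])) + 1  # skip bin 0 (DC)
    return n_windows * step * sample_interval / k

# Usage: a sine wave with a period of 20 samples.
t = np.arange(400)
signal = np.sin(2 * np.pi * t / 20)
period = estimate_period(signal, window_len=40, n_windows=200)
```

In this usage example the inner-product sequence contains exactly 10 cycles over 200 windows, so k is 10 and the estimate is 200 × 1 / 10 = 20 samples. When the true period does not divide NT evenly, the estimate is only as fine as the frequency resolution of the transform.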

As a further improvement of the present invention, the SDMC clustering module clusters the data using the comprehensive feature vectors of the observation windows. The clustering method comprises the following steps: first, the first comprehensive feature vector is taken as a cluster by itself and serves as its cluster center; subsequent comprehensive feature vectors are then taken one by one, and the distance between each vector and every current cluster center is calculated; if the smallest of these distances is not greater than a given threshold, the vector is put into the nearest cluster and that cluster's center is adjusted; if the distance is greater than the given threshold, the vector forms a new cluster by itself and serves as its center. After all comprehensive feature vectors have been processed in this way, all of them are traversed again: each vector is taken in turn, its distance to every current cluster center is calculated, and it is put into the nearest cluster; after all vectors have been processed, all current cluster centers are adjusted. If any cluster center has changed, this traversal is repeated until the cluster centers no longer change. When the cluster centers no longer change, the pairwise distances between cluster centers are calculated; if the distance between two cluster centers is less than a given threshold, the two clusters are merged; this is repeated until the distance between any two cluster centers is greater than the given threshold, at which point the SDMC clustering process ends.

As a further improvement of the present invention, the feature string generation module uses the clustering result to find the cluster to which the feature vector of each observation window belongs, and represents that observation window by the feature character of the cluster; a sequence of N observation windows is thereby converted into a sequence of N feature characters, that is, the original time series data are converted into a feature string of length N.

As a further improvement of the present invention, the abnormal change process learning module is first given the size of the words to be examined; the feature string is then divided into a word sequence, and the occurrence probability of each word is counted. A word whose probability is greater than a given probability threshold is a frequent word; otherwise it is a non-frequent word. Consecutive frequent words in the feature string form a frequent pattern, and the gap between adjacent frequent patterns is a non-frequent pattern. The transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is an abnormal change process, and the feature string corresponding to the non-frequent pattern is the feature of that abnormal change process.

A method for mining the abnormal change process of time series data comprises the following steps:

Step 1: the data preprocessing module cleans and interpolates the original time series data to obtain normalized data;

Step 2: the comprehensive feature vector extraction module automatically analyzes the normalized data and obtains the minimum complete period of the data; for periodic data, one minimum complete period is taken as an observation window, and the mean, variance, wavelet features and Fourier features in the window are extracted to form a comprehensive feature vector;

Step 3: the SDMC clustering module clusters the comprehensive feature vectors and merges clusters in the clustering result;

Step 4: the feature string generation module converts the data into the corresponding feature string according to the clustering result;

Step 5: the abnormal change process learning module divides the feature string into a word sequence, classifies the words as frequent or non-frequent according to their frequency, and then obtains the non-frequent patterns by finding the gaps between frequent patterns; the transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is an abnormal change process.

As a further improvement of the present invention, the mining method specifically comprises the following steps:

Step 1: the data preprocessing module performs outlier removal, single-parameter file generation, equal-interval processing and normalization on the original time series data. Outlier removal sets an upper and a lower bound for each data item; values greater than the upper bound are set to the upper bound, and values less than the lower bound are set to the lower bound. In equal-interval processing, the data are sampled every 1 second by default, so that after processing each minute of data starts at second 0 and ends at second 59. After equal-interval processing the data are normalized so that their value range is mapped onto the interval [0,1];

Step 2: the comprehensive feature vector extraction module combines several kinds of features on an observation window into one feature vector, specifically [mean, variance, wavelet features, Fourier features]. The minimum complete period of the time series data is identified automatically through the following steps: first, an initial observation window is set; the window is then slid along the time axis by Δt to obtain a new window, and so on, until N windows are obtained, with adjacent windows separated by Δt; the parameter values in each window form that window's vector; the inner products between the window vector at time t+0 and the window vectors at times {t+Δt, t+2Δt, ..., t+NΔt} are computed, giving a sequence of inner product values; a Fourier transform is applied to this sequence and the frequency corresponding to the largest Fourier coefficient is found; finally, the period of the data is calculated by the following formula:

C = \frac{1}{f} = \frac{N T}{k}

where C is the period of the data, N is the number of windows, T is the sampling interval Δt, and k is the frequency index corresponding to the largest Fourier coefficient, so that f = k/(NT). The time series data are then divided into non-overlapping observation windows, and several types of features are extracted from each observation window to form a comprehensive feature vector. For periodic data, the minimum complete period of the data is taken as the window size; for non-periodic data, a fixed value is specified as the window size. The window features comprise the mean, the variance, the wavelet features and the Fourier features of the data in the window. The wavelet features are obtained by wavelet decomposition. The number of decomposition levels L is determined adaptively from the window size k (here k denotes the window size, not the frequency index in the formula above) and a threshold h, where h is the maximum desired length of the wavelet coefficients: L is initially 1; for a window of fixed size, if k/2^L is less than h, the number of decomposition levels is L; otherwise L is increased by 1, and this is repeated until k/2^L is less than h. After L levels of wavelet decomposition, the window data yield wavelet approximation coefficients and wavelet detail coefficients of equal length. The Fourier features consist of a fixed number of Fourier coefficients and their corresponding frequencies: the observation window is Fourier transformed to obtain a series of Fourier coefficients; the DC component is ignored, and the n largest Fourier coefficients and their corresponding frequencies are selected as the Fourier features, with n equal to 2;

Step 3: the SDMC clustering module clusters the data using the comprehensive feature vectors of the observation windows. The clustering method comprises the following steps: first, the first comprehensive feature vector is taken as a cluster by itself and serves as its cluster center; subsequent comprehensive feature vectors are then taken one by one, and the distance between each vector and every current cluster center is calculated; if the smallest of these distances is not greater than a given threshold, the vector is put into the nearest cluster and that cluster's center is adjusted; if the distance is greater than the given threshold, the vector forms a new cluster by itself and serves as its center. After all comprehensive feature vectors have been processed in this way, all of them are traversed again: each vector is taken in turn, its distance to every current cluster center is calculated, and it is put into the nearest cluster; after all vectors have been processed, all current cluster centers are adjusted. If any cluster center has changed, this traversal is repeated until the cluster centers no longer change. When the cluster centers no longer change, the pairwise distances between cluster centers are calculated; if the distance between two cluster centers is less than a given threshold, the two clusters are merged; this is repeated until the distance between any two cluster centers is greater than the given threshold, at which point the SDMC clustering process ends;

Step 4: the feature string generation module uses the clustering result to find the cluster to which the feature vector of each observation window belongs, and represents that observation window by the feature character of the cluster; a sequence of N observation windows is thereby converted into a sequence of N feature characters, that is, the original time series data are converted into a feature string of length N;

Step 5: the abnormal change process learning module is first given the size of the words to be examined; the feature string is then divided into a word sequence, and the occurrence probability of each word is counted. A word whose probability is greater than a given probability threshold is a frequent word; otherwise it is a non-frequent word. Consecutive frequent words in the feature string form a frequent pattern, and the gap between adjacent frequent patterns is a non-frequent pattern. The transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is an abnormal change process, and the feature string corresponding to the non-frequent pattern is the feature of that abnormal change process.

Compared with the prior art, the present invention has the following beneficial effects: it combines multiple time series features and improves the clustering method, so that the abnormal change processes of time series data are mined more stably; it also provides an abstract representation in the form of feature strings, which handles the uncertainty of time series data better.

[description of the drawings]

Fig. 1 is a block diagram of the modules of the system of the present invention.

Fig. 2 is a flow chart of the SDMC clustering module of the present invention.

Fig. 3 is a flow chart of the abnormal change process learning module of the present invention.

Fig. 4 is a data curve of an example parameter of the present invention.

Fig. 5 shows the frequent patterns and non-frequent patterns obtained for the example parameter of the present invention.

Fig. 6 is a graph of the abnormal change process mined for the example parameter of the present invention.

[embodiment]

The following is a preferred embodiment of the method.

With reference to Fig. 1, the system for mining the abnormal change process of time series data according to the present invention comprises a data preprocessing module 1-1, a comprehensive feature vector extraction module 1-2, an SDMC clustering module 1-3, a feature string generation module 1-4 and an abnormal change process learning module 1-5.

The data preprocessing module is used to clean and interpolate the original time series data to obtain normalized data.

The data preprocessing module performs outlier removal, single-parameter file generation (cleaning), equal-interval processing (interpolation) and normalization. To remove noise interference and obtain valid data values, the present invention eliminates the invalid outliers in the original time series data through outlier removal and retains the valid values. Specifically, an upper and a lower bound are set for each data item; values greater than the upper bound are set to the upper bound, and values less than the lower bound are set to the lower bound. The present invention extracts features of single parameters and does not consider the relationships among multiple parameters; each valid parameter is therefore written to its own data file. The data are processed at equal intervals to ensure that the time interval between any two adjacent data points in a continuous time period is the same. By default, the equal-interval processing samples the data every 1 second, so that after processing each minute of data starts at second 0 and ends at second 59. After equal-interval processing the data are normalized so that their value range is mapped onto the interval [0,1], which eliminates the influence of dimension (units) on the result. Linear normalization is used; the maximum and minimum values are obtained from statistics of the data after equal-interval processing, or can be set manually.
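The three numerical preprocessing operations described above (clipping to bounds, equal-interval processing, linear normalization) can be sketched as follows. This is a hedged illustration only: the function name `preprocess` and its parameters are hypothetical, linear interpolation via `numpy.interp` is an assumed interpolation scheme (the text does not name one), and the grid starts at the first timestamp rather than being aligned to minute boundaries.

```python
import numpy as np

def preprocess(times, values, lower, upper, interval=1.0):
    """Clip outliers, resample on an even grid, and scale to [0, 1].

    `times` must be increasing and given in seconds (assumption).
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)

    # Outlier removal: values beyond a bound are set to that bound.
    clipped = np.clip(values, lower, upper)

    # Equal-interval processing: one sample every `interval` seconds.
    grid = np.arange(times[0], times[-1] + interval / 2, interval)
    resampled = np.interp(grid, times, clipped)  # assumed: linear interpolation

    # Linear (min-max) normalization using statistics of the resampled data.
    vmin, vmax = resampled.min(), resampled.max()
    if vmax == vmin:
        return grid, np.zeros_like(resampled)  # constant series
    return grid, (resampled - vmin) / (vmax - vmin)

# Usage: the reading of 100 exceeds the upper bound of 10 and is clipped,
# and the missing sample at t = 1 s is filled in by interpolation.
grid, normalized = preprocess([0, 2, 3], [0, 100, 4], lower=0, upper=10)
```

Here the minimum and maximum come from the resampled data; passing manually chosen bounds instead, as the text allows, would only change the last step.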

The comprehensive feature vector extraction module is used to automatically analyze the normalized data and obtain the minimum complete period of the data; for periodic data, one minimum complete period is taken as an observation window, and the mean, variance, wavelet features and Fourier features in the window are extracted to form a comprehensive feature vector.

The comprehensive feature vector extraction module combines several kinds of features on an observation window into one feature vector, rather than using a single-feature vector. The comprehensive feature vector is specifically [mean, variance, wavelet features, Fourier features]. The present invention identifies the minimum complete period of the time series data automatically, so that it need not be calculated manually case by case: first, an initial observation window is set; the window is then slid along the time axis by Δt to obtain a new window, and so on, until N windows are obtained, with adjacent windows separated by Δt; the parameter values in each window form that window's vector; the inner products between the window vector at time t+0 and the window vectors at times {t+Δt, t+2Δt, ..., t+NΔt} are computed, giving a sequence of inner product values; a Fourier transform is applied to this sequence and the frequency corresponding to the largest Fourier coefficient is found; finally, the period of the data is calculated by the following formula:

C = \frac{1}{f} = \frac{N T}{k}

where C is the period of the data, N is the number of windows, T is the sampling interval Δt, and k is the frequency index corresponding to the largest Fourier coefficient, so that f = k/(NT). The time series data are then divided into non-overlapping observation windows, and several types of features are extracted from each observation window to form a comprehensive feature vector. For periodic data, the minimum complete period of the data is taken as the window size; for non-periodic data, a fixed value is specified manually as the window size. The window features comprise the mean, the variance, the wavelet features and the Fourier features of the data in the window. The wavelet features are obtained by wavelet decomposition. The present invention determines the number of wavelet decomposition levels adaptively from the data so as to obtain a feature vector of suitable length. The number of decomposition levels L is determined from the window size k (here k denotes the window size, not the frequency index in the formula above) and a threshold h, where h is the maximum desired length of the wavelet coefficients: L is initially 1; for a window of fixed size, if k/2^L is less than h, the number of decomposition levels is L; otherwise L is increased by 1, and this is repeated until k/2^L is less than h. After L levels of wavelet decomposition, the window data yield wavelet approximation coefficients and wavelet detail coefficients of equal length. The Fourier features consist of a fixed number of Fourier coefficients and their corresponding frequencies: the observation window is Fourier transformed to obtain a series of Fourier coefficients; the DC component is ignored, and the n largest Fourier coefficients (n is 2 by default) and their corresponding frequencies are selected as the Fourier features.
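The adaptive choice of the decomposition level and the assembly of the comprehensive feature vector can be illustrated with the sketch below. It rests on several assumptions that the text does not fix: a Haar wavelet is used (no wavelet family is named), the window length is assumed to be divisible by 2^L, coefficient magnitudes stand in for the complex Fourier coefficients, and all function names are hypothetical.

```python
import numpy as np

def wavelet_levels(window_size, h):
    """Smallest L >= 1 such that window_size / 2**L < h."""
    level = 1
    while window_size / 2 ** level >= h:
        level += 1
    return level

def haar_features(window, level):
    """Approximation and detail coefficients at the given level (Haar, assumed)."""
    approx = np.asarray(window, dtype=float)
    detail = np.empty(0)
    for _ in range(level):
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2)
        approx = (even + odd) / np.sqrt(2)
    return approx, detail

def fourier_features(window, n=2):
    """Magnitudes and frequencies (cycles per sample) of the n largest non-DC terms."""
    coeffs = np.fft.rfft(window)
    magnitudes = np.abs(coeffs[1:])               # ignore the DC component
    top = np.argsort(magnitudes)[::-1][:n] + 1    # bin indices, largest first
    return np.abs(coeffs[top]), top / len(window)

def feature_vector(window, h=8, n=2):
    """[mean, variance, wavelet features, Fourier features] for one window."""
    window = np.asarray(window, dtype=float)
    level = wavelet_levels(len(window), h)
    approx, detail = haar_features(window, level)
    magnitudes, freqs = fourier_features(window, n)
    return np.concatenate(
        ([window.mean(), window.var()], approx, detail, magnitudes, freqs)
    )

# Usage: a 64-sample window containing 4 full cycles of a sine wave.
window = np.sin(2 * np.pi * 4 * np.arange(64) / 64)
vec = feature_vector(window, h=8, n=2)
```

For a window of 64 samples and h = 8, the loop stops at L = 4 because 64/2^4 = 4 is the first value below 8, giving 4 approximation and 4 detail coefficients, so the vector has 2 + 4 + 4 + 2 + 2 = 14 entries.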

The SDMC clustering module is used to cluster the comprehensive feature vectors and to merge clusters in the clustering result, which improves the clustering quality.

The SDMC clustering module clusters the data using the comprehensive feature vectors of the observation windows. Traditional K-Means clustering cannot guarantee that the distance between clusters is large enough. When some data points are rather scattered, traditional K-Means either forces a large number of insufficiently similar points into one cluster, which makes the cluster very loose, or generates many small clusters that are quite similar to one another. Neither result reflects the real structure of the data objectively and accurately. The SDMC (Similar Density Merge Clustering) method proposed by the present invention is similar to the traditional K-Means method, but performs an inter-cluster merging step at the end, which ensures that the points within each cluster are sufficiently similar and that similar small clusters are suitably merged. The SDMC clustering method comprises the following steps: first, the first comprehensive feature vector is taken as a cluster by itself and serves as its cluster center; subsequent comprehensive feature vectors are then taken one by one, and the distance between each vector and every current cluster center is calculated; if the smallest of these distances is not greater than a given threshold, the vector is put into the nearest cluster and that cluster's center is adjusted; if the distance is greater than the given threshold, the vector forms a new cluster by itself and serves as its center. After all comprehensive feature vectors have been processed in this way, all of them are traversed again: each vector is taken in turn, its distance to every current cluster center is calculated, and it is put into the nearest cluster; after all vectors have been processed, all current cluster centers are adjusted. If any cluster center has changed, this traversal is repeated until the cluster centers no longer change. When the cluster centers no longer change, the pairwise distances between cluster centers are calculated; if the distance between two cluster centers is less than a given threshold, the two clusters are merged; this is repeated until the distance between any two cluster centers is greater than the given threshold, at which point the SDMC clustering process ends.

The feature string generation module is used to convert the data into the corresponding feature string according to the clustering result.

The feature string generation module uses the clustering result to find the cluster to which the feature vector of each observation window belongs, and represents that observation window by the feature character of the cluster; a sequence of N observation windows is thereby converted into a sequence of N feature characters, that is, the original time series data are converted into a feature string of length N. A larger character represents a feature that is more likely to be abnormal, that is, a feature with a smaller probability of occurrence: the feature with the highest probability is denoted "a", the feature with the second highest probability is denoted "b", and so on. Each original time series is thus converted into one feature string.
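The mapping from cluster labels to feature characters can be sketched as below. The function name `to_feature_string`, the restriction to the 26 lowercase letters, and the tie-breaking rule (clusters with equal counts keep their order of first appearance) are assumptions made for this example.

```python
from collections import Counter
import string

def to_feature_string(labels):
    """Map per-window cluster labels to characters: 'a' for the most
    frequent cluster, 'b' for the next most frequent, and so on."""
    # most_common() sorts by count, highest first; equal counts keep
    # the order in which the labels were first seen.
    ranked = [label for label, _ in Counter(labels).most_common()]
    if len(ranked) > len(string.ascii_lowercase):
        raise ValueError("more clusters than available characters")
    char_of = {label: string.ascii_lowercase[i] for i, label in enumerate(ranked)}
    return "".join(char_of[label] for label in labels)

# Usage: cluster 2 occurs three times, cluster 0 twice, cluster 1 once.
feature_string = to_feature_string([2, 2, 0, 2, 1, 0])  # "aabacb"
```

Because rarer clusters receive later letters, a character far from "a" in the resulting string marks a window whose feature vector falls in an uncommon cluster.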

The abnormal change process learning module is first given the size of the words to be examined (4 by default; it can be specified manually); the feature string is then divided into a word sequence, and the occurrence probability of each word is counted. A word whose probability is greater than a given probability threshold is a frequent word; otherwise it is a non-frequent word. Consecutive frequent words in the feature string form a frequent pattern, and the gap between adjacent frequent patterns is a non-frequent pattern. The transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is an abnormal change process, and the feature string corresponding to the non-frequent pattern is the feature of that abnormal change process.

The method for mining the abnormal change process of time series data according to the present invention comprises the following steps:

First, the data preprocessing module 1-1 cleans and interpolates the original time series data to obtain data in a valid format for the subsequent mining work.

Second, the comprehensive feature vector extraction module 1-2 automatically analyzes the data and obtains the minimum complete period of periodic data; for periodic data, one minimum complete period is taken as an observation window, and the mean, variance, wavelet features and Fourier features in the window are extracted to form a comprehensive feature vector.

Next, the SDMC clustering module 1-3 clusters the comprehensive feature vectors and merges clusters in the clustering result.

Next, the feature string generation module 1-4 converts the data into the corresponding feature string according to the clustering result.

Finally, the abnormal change process learning module 1-5 divides the feature string into a word sequence, classifies the words as frequent or non-frequent according to their frequency, and then obtains the non-frequent patterns by finding the gaps between frequent patterns; the transition from a frequent pattern to a non-frequent pattern, and from a non-frequent pattern back to a frequent pattern, is an abnormal change process.

With reference to Fig. 2, it is the process flow diagram of SDMC cluster module of the present invention, comprises the following steps:

First, step 2-1 takes the first integrated feature vector, makes it a cluster of its own, and uses it as that cluster's center. Step 2-2 then checks whether all integrated feature vectors have been processed. If not, step 2-3 takes the next integrated feature vector, and step 2-4 calculates the distance between this vector and every current cluster center. Step 2-5 checks whether the distance between this vector and some cluster center is less than the specified threshold. If it is, step 2-6 puts the vector into the cluster whose center is nearest and adjusts that cluster's center, then returns to step 2-2. Otherwise, step 2-7 creates a new cluster containing only this vector, uses the vector as the cluster center, and returns to step 2-2.

Once all integrated feature vectors have been processed, step 2-8 takes the first integrated feature vector again. Step 2-9 checks whether all integrated feature vectors have been processed in this pass. If not, step 2-10 calculates the distance between the current vector and every current cluster center, step 2-11 puts the vector into its nearest cluster, step 2-12 takes the next vector, and the flow returns to step 2-9. Once all vectors have been processed, step 2-13 checks whether the clustering result has changed. If it has, step 2-14 adjusts the centers of the clusters that changed and returns to step 2-8.

If the clustering result is unchanged, step 2-15 calculates the pairwise distances between cluster centers and selects the two clusters whose centers are closest. Step 2-16 checks whether the distance between these two cluster centers is less than the given threshold. If it is, step 2-17 merges the two clusters and returns to step 2-15. If it is not, the SDMC clustering process ends.
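
The clustering flow above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not specify the distance metric or how a cluster center is adjusted, so Euclidean distance and the arithmetic mean of the members are assumed here. The names `sdmc`, `assign_threshold`, `merge_threshold` and `max_iter` are hypothetical.

```python
import math

def _mean(points):
    """Component-wise arithmetic mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def _nearest(v, centers):
    """Index of the center closest to v (Euclidean distance is an assumption)."""
    return min(range(len(centers)), key=lambda j: math.dist(v, centers[j]))

def sdmc(vectors, assign_threshold, merge_threshold, max_iter=100):
    # Steps 2-1 to 2-7: one pass that either joins the nearest cluster
    # or opens a new cluster, depending on the distance threshold.
    clusters = [[vectors[0]]]
    centers = [list(vectors[0])]
    for v in vectors[1:]:
        j = _nearest(v, centers)
        if math.dist(v, centers[j]) < assign_threshold:
            clusters[j].append(v)
            centers[j] = _mean(clusters[j])  # assumed meaning of "adjust the center"
        else:
            clusters.append([v])
            centers.append(list(v))

    # Steps 2-8 to 2-14: reassign every vector to its nearest center and
    # update centers until the assignment stops changing.
    # max_iter is a safety cap added for this sketch only.
    labels = None
    for _ in range(max_iter):
        new_labels = [_nearest(v, centers) for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = _mean(members)

    # Steps 2-15 to 2-17: repeatedly merge the two closest clusters while
    # their center distance is below the merge threshold.
    groups = {}
    for v, lab in zip(vectors, labels):
        groups.setdefault(lab, []).append(v)
    clusters = list(groups.values())
    while len(clusters) > 1:
        cents = [_mean(c) for c in clusters]
        d, a, b = min((math.dist(cents[a], cents[b]), a, b)
                      for a in range(len(cents))
                      for b in range(a + 1, len(cents)))
        if d >= merge_threshold:
            break
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters
```

The function returns the final clusters as lists of the input vectors; a caller that needs per-window cluster labels would have to derive them from this result.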

Fig. 3 is a flowchart of the abnormal change process learning module of the present invention. It comprises the following steps:

First, step 3-1 obtains the feature string sequence generated by the feature string generation module. Step 3-2 then counts, within this string, the frequency of occurrence of every word of length L characters (L defaults to 4 and can be set manually). Step 3-3 checks, for each word, whether its frequency of occurrence is greater than the given threshold. If the frequency is not greater than the threshold, step 3-4 marks the word as a non-frequent word; otherwise, step 3-5 marks it as a frequent word.

After all words have been judged, step 3-6 rescans the feature string sequence. Step 3-7 checks whether the current position has reached the end of the string. If not, step 3-8 checks whether the L consecutive characters starting at the current position form a frequent word. If they do, step 3-11 slides forward by L characters and returns to step 3-7. If they do not, step 3-9 checks whether the previous word was a frequent word. If it was, step 3-12 takes the span from the previous position to the current position as a frequent pattern (that is, a run of consecutive frequent words) and puts this pattern into the frequent pattern queue, after which step 3-10 slides forward by one character. If the previous word was not a frequent word, step 3-10 directly slides forward by one character. In either case the flow then returns to step 3-7.

When the scan reaches the end of the string, step 3-13 finds, from the frequent pattern queue, the substrings corresponding to the gaps between all adjacent frequent patterns; these are the non-frequent patterns. Step 3-14 then outputs the abnormal change processes corresponding to all non-frequent patterns, including both the change from a frequent pattern to a non-frequent pattern and the change from a non-frequent pattern back to a frequent pattern. This completes abnormal change process learning.
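
The scan described above can be sketched in Python as follows. This is an illustrative reading of the flow, not the patented implementation. It assumes that word frequencies are counted over every sliding window of length L, that the threshold is a raw occurrence count, and that a frequent pattern still open at the end of the string is closed there; the text does not state these details. The names `find_patterns` and `threshold` are hypothetical.

```python
from collections import Counter

def find_patterns(s, L=4, threshold=2):
    """Return (frequent_patterns, gaps) as lists of half-open (start, end) spans."""
    # Steps 3-2 to 3-5: count every length-L word and mark the frequent ones.
    counts = Counter(s[i:i + L] for i in range(len(s) - L + 1))
    frequent = {w for w, c in counts.items() if c > threshold}

    # Steps 3-6 to 3-12: scan the string, growing a frequent pattern while
    # consecutive frequent words are found.
    patterns = []
    start = None          # start of the frequent pattern currently being built
    pos = 0
    while pos + L <= len(s):
        if s[pos:pos + L] in frequent:
            if start is None:
                start = pos
            pos += L      # step 3-11: slide by L characters
        else:
            if start is not None:
                patterns.append((start, pos))   # step 3-12
                start = None
            pos += 1      # step 3-10: slide by one character
    if start is not None:                        # assumption: close an open pattern
        patterns.append((start, pos))

    # Step 3-13: the gaps between adjacent frequent patterns are the
    # non-frequent patterns.
    gaps = [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(patterns, patterns[1:])]
    return patterns, gaps
```

For a feature string such as `"abcd" * 3 + "xyzq" + "abcd" * 3`, the single gap between the two frequent patterns covers the substring `"xyzq"`, which under the text's definition would be the feature of the abnormal change process.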

Fig. 4 is the data curve of an example parameter used to illustrate the method.

Fig. 5 shows the frequent patterns and non-frequent patterns obtained from the above example parameter. The numbers indicate the positions at which each pattern occurs in the feature string.

Fig. 6 illustrates the abnormal change process mined from the above example parameter.

Claims

1. A system for mining abnormal change processes in time series data, characterized in that it comprises a data preprocessing module, an integrated feature vector extraction module, an SDMC clustering module, a feature string generation module and an abnormal change process learning module;

2. The system for mining abnormal change processes in time series data according to claim 1, characterized in that the data preprocessing module comprises the steps of outlier removal, single-parameter file generation, equal-interval processing and normalization; the outlier removal step comprises: setting an upper bound and a lower bound for each data item, replacing any value greater than the upper bound with the upper bound and any value less than the lower bound with the lower bound, thereby removing outliers; in the equal-interval processing step, the data are by default sampled once every 1 second, so that after equal-interval processing the data for every minute start at second 0 and end at second 59; after equal-interval processing the data are normalized, transforming their value range onto the interval [0, 1].
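
As an illustration of the bound clipping and normalization steps in this claim, a minimal Python sketch follows. It assumes min-max scaling for the normalization, since the claim only states that the value range is transformed onto [0, 1]; the function names `clip_to_bounds` and `normalize` are hypothetical. Equal-interval resampling is omitted because the claim does not say how values between samples are obtained.

```python
def clip_to_bounds(values, lower, upper):
    """Replace values above the upper bound or below the lower bound by the bound."""
    return [min(max(v, lower), upper) for v in values]

def normalize(values):
    """Min-max scale values onto [0, 1] (min-max scaling is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no range; map it to 0.0 by convention here.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```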

3. The system for mining abnormal change processes in time series data according to claim 1, characterized in that the integrated feature vector extraction module combines multiple features over an observation window into a feature vector; the integrated feature vector is specifically constructed as: [mean, variance, wavelet feature, Fourier feature]; the minimum complete period of the time series data is identified automatically by the following steps: first set an initial observation window, then slide this window forward by a time Δt to obtain a new window, and so on, obtaining N windows with an interval of Δt between consecutive windows; the parameter values within each window form that window's vector; then calculate the inner products between the window vector at time t+0 and the window vectors at times {t+Δt, t+2Δt, ..., t+NΔt}, obtaining a sequence of inner product values; then apply a Fourier transform to the inner product sequence and find the frequency corresponding to the maximum Fourier coefficient; finally calculate the period of the data according to the following formula:

C = \frac{1}{f} = \frac{N T}{k}
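
The period identification in this claim can be sketched in Python as follows. The excerpt does not define N, T or k, so the sketch assumes that N is the number of windows, T is the spacing between consecutive windows (Δt, here expressed in samples as `step`), and k is the index of the largest-magnitude Fourier coefficient excluding the zero-frequency term. The function name `estimate_period`, its parameters, and the mean removal before the transform are hypothetical choices for this illustration.

```python
import cmath
import math

def estimate_period(series, window_len, step, n_windows):
    """Estimate the period (in samples) from inner products of shifted windows."""
    if len(series) < n_windows * step + window_len:
        raise ValueError("series too short for the requested windows")

    # Inner products between the window at t+0 and the windows at
    # t+step, t+2*step, ..., t+n_windows*step.
    base = series[:window_len]
    inner = []
    for i in range(1, n_windows + 1):
        window = series[i * step:i * step + window_len]
        inner.append(sum(a * b for a, b in zip(base, window)))

    # Remove the mean so that the zero-frequency term does not dominate
    # (an assumption of this sketch).
    mean = sum(inner) / len(inner)
    centered = [x - mean for x in inner]

    # Plain discrete Fourier transform; keep the index k with the largest magnitude.
    n = len(centered)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * m / n)
                    for m, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)

    # C = 1 / f = N * T / k, with T taken as the window spacing.
    return n * step / best_k
```

The direct transform costs O(N²) operations, which is acceptable for a sketch; a fast Fourier transform would normally be used for long sequences.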

4. The system for mining abnormal change processes in time series data according to claim 1, characterized in that the SDMC clustering module clusters the data using the integrated feature vectors of the observation windows; the clustering method of the SDMC clustering module specifically comprises the following steps: first take the first integrated feature vector, make it a cluster of its own, and use it as the cluster center; then take each subsequent integrated feature vector in turn and calculate the distance between this vector and every current cluster center; if this distance is not greater than the given threshold, put the vector into the cluster whose center is nearest and adjust that cluster's center; if this distance is greater than the given threshold, create a new cluster containing only this vector and use the vector as the cluster center; after all integrated feature vectors have been processed in this way, traverse all integrated feature vectors again, taking one vector at a time, calculating the distance between this vector and every current cluster center, and putting the vector into its nearest cluster; after all integrated feature vectors have been processed in this way, adjust all current cluster centers; if any cluster center has changed, repeat the preceding traversal until the cluster centers no longer change; when the cluster centers no longer change, calculate the pairwise distances between cluster centers; if the distance between two cluster centers is less than the given threshold, merge these two clusters; then repeat this process until the distance between any two cluster centers is greater than the given threshold; at this point the SDMC clustering process ends.

5. The system for mining abnormal change processes in time series data according to claim 1, characterized in that the feature string generation module uses the clustering result to find the cluster to which the feature vector of each observation window belongs, and then represents that observation window by the feature character of that cluster, so that a sequence of N observation windows is converted into a sequence of N feature characters, that is, the original time series data are converted into a feature string of length N.
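
The conversion in this claim amounts to a lookup from cluster to character. A minimal Python sketch follows, assuming that each observation window already carries an integer cluster label and that clusters are mapped to lowercase letters in label order; the claim does not say how feature characters are chosen, and the name `to_feature_string` is hypothetical.

```python
import string

def to_feature_string(labels, alphabet=string.ascii_lowercase):
    """Map a sequence of N cluster labels to a feature string of length N."""
    if any(lab < 0 or lab >= len(alphabet) for lab in labels):
        raise ValueError("cluster label has no character in the alphabet")
    # One character per observation window, in the original window order.
    return "".join(alphabet[lab] for lab in labels)
```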

6. The system for mining abnormal change processes in time series data according to claim 1, characterized in that the abnormal change process learning module first sets the size of the words to be examined; then divides the feature string into a sequence of words; then counts the probability of occurrence of each word; a word whose probability is greater than the given probability threshold is a frequent word, and otherwise it is a non-frequent word; consecutive frequent words in the feature string then form a frequent pattern, and the gap between adjacent frequent patterns is a non-frequent pattern; the change from a frequent pattern to a non-frequent pattern and the change from a non-frequent pattern back to a frequent pattern constitute an abnormal change process, and the feature string corresponding to the non-frequent pattern is the feature of that abnormal change process.

7. A method for mining abnormal change processes in time series data, characterized in that it comprises the following steps:

8. The method for mining abnormal change processes in time series data according to claim 7, characterized in that the method specifically comprises the following steps:

C = \frac{1}{f} = \frac{N T}{k}