CN104915434B - Multidimensional time series classification method based on Mahalanobis-distance DTW - Google Patents

Info

Publication number
CN104915434B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510351181.7A
Other languages
Chinese (zh)
Other versions
CN104915434A (en)
Inventor
刘大同
陈静
彭宇
彭喜元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Harbin Institute of Technology
Priority to CN201510351181.7A
Publication of CN104915434A
Application granted
Publication of CN104915434B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Arrangements For Transmission Of Measured Signals (AREA)

Abstract

The invention relates to a multidimensional time series classification method based on Mahalanobis-distance DTW. It addresses the following problems: fixed-point segmentation of satellite telemetry data gives unsatisfactory results; and correlation between the dimensions of a multidimensional time series, together with small time offsets between series, makes similarity measurements inaccurate and therefore makes classification results inaccurate. The method comprises: 1. obtaining the multidimensional time series $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$ used for training and the class labels $L=\{l_1,l_2,\ldots,l_n\}$; 2. extracting the multidimensional time series to be classified, $X'=\{x'_1,x'_2,\ldots,x'_m\}$; 3. calculating the DTW distance sequence between $X'$ and $X$; 4. using a KNN classification method with the Mahalanobis-based DTW distance to classify $X'$ according to the set number K of nearest neighbors, thereby determining the class of each series to be classified. The invention is applicable to the field of multidimensional time series classification.

Description

Multidimensional time series classification method based on Mahalanobis-distance DTW
Technical field
The present invention relates to a multidimensional time series classification method based on Mahalanobis-distance DTW.
Background art
Analysis of the yaw attitude angle in satellite telemetry data (overall trend shown in Fig. 1, detail shown in Fig. 2) shows that satellite telemetry data is markedly periodic, and this property has been confirmed by the organization that provided the telemetry data. By analyzing each period of the telemetry data, one can determine whether the satellite's operating state within that period is normal. Segmenting the telemetry data at fixed points, however, gives unsatisfactory results: as shown in Fig. 3, the time series obtained after segmentation do not overlap well, a certain deviation exists between them, and this deviation becomes more pronounced as time passes.
Classifying satellite telemetry data is an important function of data mining on such data; many data mining tasks, such as pattern recognition and anomaly detection, can be completed on the basis of classification. Satellite telemetry data has its own characteristics, such as many parameters, high dimensionality, and drift. Because of these characteristics, classical time series similarity measures, such as the Euclidean distance and the Pearson correlation coefficient, are poorly suited to classifying it. Classical measures cannot exclude the influence of correlation between the dimensions of a multidimensional time series. They also cannot perform asynchronous measurement when small offsets exist between time series, so the measurement results are not accurate enough, and the classification of the satellite telemetry data is in turn not accurate enough.
Summary of the invention
The purpose of the invention is to solve the problems that fixed-point segmentation of satellite telemetry data gives unsatisfactory results, and that correlation between the dimensions of a multidimensional time series and small offsets between time series make measurement results, and therefore classification results, not accurate enough. To this end, a multidimensional time series classification method based on Mahalanobis-distance DTW is proposed.
The above purpose is achieved through the following technical solution:
Step 1: Segment the historical satellite telemetry data Y, recorded under normal satellite operating conditions, using the argument mutation points as markers, to obtain the normal multidimensional time series $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$, where Y is the historical satellite telemetry data matrix with $n_d$ rows and $n_a$ columns, $n_d$ is the number of dimensions of the multidimensional time series, $n_a$ is the number of data points in all the historical satellite telemetry data, $x_j$ is a data matrix with $n_d$ rows and $n_{len}$ columns representing the j-th series of X, j = 1, 2, ..., n, $n_{len}$ is the length of the time series, and n is the number of members of X;
Step 2: Cluster the segmented multidimensional time series $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$ by a hierarchical clustering method with the target number of clusters set to c, thereby obtaining the class labels $L=\{l_1,l_2,\ldots,l_n\}$ of the multidimensional time series, where c is a positive integer greater than 1 and less than n, and $l_s$ denotes the s-th element of L, whose value is determined by the hierarchical clustering result, s = 1, 2, ..., n;
Step 3: From the latest satellite telemetry data, extract the test data within the time points corresponding to m+1 adjacent argument mutation points; this is the multidimensional time series to be classified, $X'=\{x'_1,x'_2,\ldots,x'_m\}$, where m is a positive integer;
Step 4: Calculate the DTW distance sequence between the multidimensional time series to be classified, $X'=\{x'_1,x'_2,\ldots,x'_m\}$, and the class-labeled multidimensional time series $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$, that is, the distances $d_{ij}$ for i = 1, 2, ..., m and j = 1, 2, ..., n,
where $d_{ij}$ is calculated as follows:
$$d_{ij} = DTW_{ma}(x'_i, x_j)$$
$x'_i$ denotes the i-th series of $X'$, i = 1, 2, ..., m; $DTW_{ma}$ denotes the DTW distance algorithm based on the Mahalanobis distance; $d_{ij}$ is the Mahalanobis-based DTW distance between $x'_i$ and $x_j$;
Step 5: Using the KNN classification method with the Mahalanobis-based DTW distance, classify the multidimensional time series to be classified, $X'=\{x'_1,x'_2,\ldots,x'_m\}$, according to the set number K of nearest neighbors, and determine their classes $L'=\{l'_1,l'_2,\ldots,l'_m\}$, where K = 1, 2, ..., n; each class $l'$ is one of the numbers 1, 2, ..., c; KNN is the K-nearest-neighbor classification algorithm. This completes the multidimensional time series classification method based on Mahalanobis-distance DTW.
Effects of the invention
Classifying satellite telemetry data is an important function of data mining on such data; many data mining tasks, such as pattern recognition and anomaly detection, can be completed on the basis of classification. Satellite telemetry data has its own characteristics, such as many parameters, high dimensionality, and drift. Because of these characteristics, classical time series similarity measures, such as the Euclidean distance and the Pearson correlation coefficient, are poorly suited to classifying it. Classical measures cannot exclude the influence of correlation between the dimensions of a multidimensional time series, and they cannot perform asynchronous measurement when small offsets exist between time series, so the measurement results, and in turn the classification of the satellite telemetry data, are not accurate enough. A more reasonable time series similarity measure is therefore needed. For satellite telemetry data that is complex or whose characteristics vary, choosing an appropriate similarity measure helps the corresponding pattern mining achieve better results. The specific effects of each part are as follows:
First, to address the unsatisfactory result of fixed-point segmentation of satellite telemetry data shown in Fig. 3, the invention proposes segmentation using the argument mutation points in the telemetry data as markers; the result is shown in Fig. 4. The segments obtained with the argument as marker are more compact and overlap better with one another, which is more reasonable.
Then, the dynamic time warping (Dynamic Time Warping, DTW) distance based on the Mahalanobis distance is used to measure the distance between multidimensional satellite telemetry time series. This eliminates the influence of correlation between the dimensions of the multidimensional time series, realizes asynchronous measurement, and solves the problem of inaccurate measurement results caused by small offsets between time series.
Finally, the K-nearest-neighbor (K-Nearest Neighbor, KNN) classification algorithm is combined with the historical multidimensional time series of the satellite telemetry data to classify the latest multidimensional telemetry time series, which better discriminates the satellite's current operating state.
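As a brief illustration of why a Mahalanobis-based measure can discount correlation between dimensions, the block below restates the standard Mahalanobis distance between two sample vectors and two of its special cases. The symbols $u$, $v$, and $\sigma_a$ are introduced here for illustration only and do not appear in the original text; $C_{cov}$ is the covariance matrix between dimensions defined later in the description.

```latex
% General form for two n_d-dimensional sample vectors u and v
D_M(u, v) = \sqrt{(u - v)^{T} \, C_{cov}^{-1} \, (u - v)}

% Special case 1: uncorrelated dimensions with unit variance (C_cov = I)
D_M(u, v) = \sqrt{\sum_{a=1}^{n_d} (u_a - v_a)^2}

% Special case 2: uncorrelated dimensions with variances \sigma_a^2 (C_cov diagonal)
D_M(u, v) = \sqrt{\sum_{a=1}^{n_d} \frac{(u_a - v_a)^2}{\sigma_a^2}}
```

Special case 1 is the Euclidean distance. When the dimensions are correlated, the off-diagonal entries of $C_{cov}^{-1}$ additionally down-weight differences that merely follow the shared variation of several dimensions.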
Brief description of the drawings
Fig. 1 is a schematic diagram of an example yaw attitude angle sequence, as described in the background art;
Fig. 2 is a schematic diagram of an example of detailed changes in the yaw attitude angle sequence, as described in the background art;
Fig. 3 shows the result of segmenting satellite telemetry data at fixed points, as described in Embodiment 1;
Fig. 4 shows the result of segmenting satellite telemetry data using the argument mutation points as markers, as described in Embodiment 1;
Fig. 5 is a schematic diagram of an example of Wafer parameter 1 in the example;
Fig. 6 is a schematic diagram of an example of Wafer parameter 2 in the example;
Fig. 7 is a schematic diagram of an example of Wafer parameter 3 in the example;
Fig. 8 is a schematic diagram of an example of Wafer parameter 4 in the example;
Fig. 9 is a schematic diagram of an example of Wafer parameter 5 in the example;
Fig. 10 is a schematic diagram of an example of Wafer parameter 6 in the example;
Fig. 11(a) is a schematic diagram of the class 1 data of satellite telemetry data dimension 1 in the example;
Fig. 11(b) is a schematic diagram of the class 2 data of satellite telemetry data dimension 1 in the example;
Fig. 11(c) is a schematic diagram of the class 3 data of satellite telemetry data dimension 1 in the example;
Fig. 11(d) is a schematic diagram of the class 4 data of satellite telemetry data dimension 1 in the example;
Fig. 12(a) is a schematic diagram of the class 1 data of satellite telemetry data dimension 2 in the example;
Fig. 12(b) is a schematic diagram of the class 2 data of satellite telemetry data dimension 2 in the example;
Fig. 12(c) is a schematic diagram of the class 3 data of satellite telemetry data dimension 2 in the example;
Fig. 12(d) is a schematic diagram of the class 4 data of satellite telemetry data dimension 2 in the example;
Fig. 13(a) is a schematic diagram of the class 1 data of satellite telemetry data dimension 3 in the example;
Fig. 13(b) is a schematic diagram of the class 2 data of satellite telemetry data dimension 3 in the example;
Fig. 13(c) is a schematic diagram of the class 3 data of satellite telemetry data dimension 3 in the example;
Fig. 13(d) is a schematic diagram of the class 4 data of satellite telemetry data dimension 3 in the example;
Fig. 14(a) is a schematic diagram of the class 1 classification result for satellite telemetry data dimension 1 in the example;
Fig. 14(b) is a schematic diagram of the class 2 classification result for satellite telemetry data dimension 1 in the example;
Fig. 14(c) is a schematic diagram of the class 3 classification result for satellite telemetry data dimension 1 in the example;
Fig. 14(d) is a schematic diagram of the class 4 classification result for satellite telemetry data dimension 1 in the example;
Fig. 15(a) is a schematic diagram of the class 1 classification result for satellite telemetry data dimension 2 in the example;
Fig. 15(b) is a schematic diagram of the class 2 classification result for satellite telemetry data dimension 2 in the example;
Fig. 15(c) is a schematic diagram of the class 3 classification result for satellite telemetry data dimension 2 in the example;
Fig. 15(d) is a schematic diagram of the class 4 classification result for satellite telemetry data dimension 2 in the example;
Fig. 16(a) is a schematic diagram of the class 1 classification result for satellite telemetry data dimension 3 in the example;
Fig. 16(b) is a schematic diagram of the class 2 classification result for satellite telemetry data dimension 3 in the example;
Fig. 16(c) is a schematic diagram of the class 3 classification result for satellite telemetry data dimension 3 in the example;
Fig. 16(d) is a schematic diagram of the class 4 classification result for satellite telemetry data dimension 3 in the example.
Detailed description of the embodiments
Embodiment 1: The multidimensional time series classification method based on Mahalanobis-distance DTW of this embodiment is carried out according to the following steps:
Step 1: Segment the historical satellite telemetry data Y, recorded under normal satellite operating conditions, using the argument mutation points as markers, to obtain the normal multidimensional time series $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$, where Y is the historical satellite telemetry data matrix with $n_d$ rows and $n_a$ columns, $n_d$ is the number of dimensions of the multidimensional time series, $n_a$ is the number of data points in all the historical satellite telemetry data, $x_j$ is a data matrix with $n_d$ rows and $n_{len}$ columns representing the j-th series of X, j = 1, 2, ..., n, $n_{len}$ is the length of the time series, and n is the number of members of X;
Step 2: Cluster the segmented multidimensional time series $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$ by a hierarchical clustering method with the target number of clusters set to c, thereby obtaining the class labels $L=\{l_1,l_2,\ldots,l_n\}$ of the multidimensional time series, where c is a positive integer greater than 1 and less than n, and $l_s$ denotes the s-th element of L, whose value is determined by the hierarchical clustering result, s = 1, 2, ..., n. The method used to assign classes here is not fixed: any existing method capable of assigning classes may be used, and hierarchical clustering is one such method;
Step 3: From the latest satellite telemetry data, extract the test data within the time points corresponding to m+1 adjacent argument mutation points; this is the multidimensional time series to be classified, $X'=\{x'_1,x'_2,\ldots,x'_m\}$, where m is a positive integer;
Step 4: Calculate the DTW distance sequence between the multidimensional time series to be classified, $X'=\{x'_1,x'_2,\ldots,x'_m\}$, and the class-labeled multidimensional time series $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$, that is, the distances $d_{ij}$ for i = 1, 2, ..., m and j = 1, 2, ..., n,
where $d_{ij}$ is calculated as follows:
$$d_{ij} = DTW_{ma}(x'_i, x_j)$$
$x'_i$ denotes the i-th series of $X'$, i = 1, 2, ..., m; $DTW_{ma}$ denotes the DTW distance algorithm based on the Mahalanobis distance; DTW (Dynamic Time Warping) is a similarity measure (existing theory) that warps the time axis so that the shapes of time series can be better matched and mapped onto each other; $d_{ij}$ is the Mahalanobis-based DTW distance between $x'_i$ and $x_j$;
Step 5: Using the KNN classification method with the Mahalanobis-based DTW distance, classify the multidimensional time series to be classified, $X'=\{x'_1,x'_2,\ldots,x'_m\}$, according to the set number K of nearest neighbors, and determine their classes $L'=\{l'_1,l'_2,\ldots,l'_m\}$, where K = 1, 2, ..., n; each class $l'$ is one of the numbers 1, 2, ..., c; KNN (K-Nearest Neighbor) is the K-nearest-neighbor classification algorithm. This completes the multidimensional time series classification method based on Mahalanobis-distance DTW.
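The label assignment of Step 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the pairwise distances between the n segmented series have already been computed (the text does not say which distance the clustering uses), it uses average linkage (an assumption), and the function name `agglomerative_labels` is hypothetical.

```python
from typing import List, Sequence


def agglomerative_labels(dist: Sequence[Sequence[float]], c: int) -> List[int]:
    """Bottom-up hierarchical clustering down to c clusters.

    dist is a symmetric n x n matrix of pairwise distances between the
    segmented series. Returns one label in 1..c per series.
    Average linkage is an assumption; the patent does not fix the linkage.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > c:
        # Find the closest pair of clusters (p < q) and merge q into p.
        p, q = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] += clusters.pop(q)

    labels = [0] * len(dist)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


# Toy example: five one-dimensional "series" summarised by a single value each.
values = [0.0, 1.0, 50.0, 52.0, 0.5]
dist = [[abs(a - b) for b in values] for a in values]
labels = agglomerative_labels(dist, c=2)
```

Because the text notes that any label-assignment method may be used, the linkage rule and the input distance are interchangeable design choices here.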
Effects of this embodiment:
Classifying satellite telemetry data is an important function of data mining on such data; many data mining tasks, such as pattern recognition and anomaly detection, can be completed on the basis of classification. Satellite telemetry data has its own characteristics, such as many parameters, high dimensionality, and drift. Because of these characteristics, classical time series similarity measures, such as the Euclidean distance and the Pearson correlation coefficient, are poorly suited to classifying it. Classical measures cannot exclude the influence of correlation between the dimensions of a multidimensional time series, and they cannot perform asynchronous measurement when small offsets exist between time series, so the measurement results, and in turn the classification of the satellite telemetry data, are not accurate enough. A more reasonable time series similarity measure is therefore needed. For satellite telemetry data that is complex or whose characteristics vary, choosing an appropriate similarity measure helps the corresponding pattern mining achieve better results. The specific effects of each part are as follows:
First, to address the unsatisfactory result of fixed-point segmentation of satellite telemetry data shown in Fig. 3, the invention proposes segmentation using the argument mutation points in the telemetry data as markers; the result is shown in Fig. 4. The segments obtained with the argument as marker are more compact and overlap better with one another, which is more reasonable.
Then, the dynamic time warping (Dynamic Time Warping, DTW) distance based on the Mahalanobis distance is used to measure the distance between multidimensional satellite telemetry time series. This eliminates the influence of correlation between the dimensions of the multidimensional time series, realizes asynchronous measurement, and solves the problem of inaccurate measurement results caused by small offsets between time series.
Finally, the K-nearest-neighbor (K-Nearest Neighbor, KNN) classification algorithm is combined with the historical multidimensional time series of the satellite telemetry data to classify the latest multidimensional telemetry time series, which better discriminates the satellite's current operating state.
Embodiment 2: This embodiment differs from Embodiment 1 in that the argument in Step 1 is one of the test parameters of the satellite telemetry data. The argument increases progressively from 0° to 360° and is markedly periodic; the point at which its value changes from 360° to 0° is the argument mutation point. The other steps and parameters are the same as in Embodiment 1.
Embodiment 3: This embodiment differs from Embodiment 1 or 2 in that the detailed process in Step 1 of segmenting the historical satellite telemetry data Y, recorded under normal satellite operating conditions, using the argument mutation points as markers to obtain the normal multidimensional time series $X=\{x_1,x_2,\ldots,x_j,\ldots,x_n\}$ is:
(1) After the argument reaches 360°, it changes to 0° and starts increasing again; the point at which it changes from 360° to 0° is the argument mutation point;
(2) Record the time corresponding to each argument mutation point;
(3) According to the times corresponding to the argument mutation points, extract the test data within the time between two adjacent argument mutation points as a time series. A multidimensional time series consists of several such time series; the test data are the yaw attitude angle, the reaction wheel speed, and the bus voltage. The other steps and parameters are the same as in Embodiment 1 or 2.
Embodiment 4: This embodiment differs from any one of Embodiments 1 to 3 in that the detailed process of calculating $d_{ij}$ in Step 4 is:
(1) Calculate the covariance matrix $C_{cov}$ between the dimensions of the multidimensional time series to be classified, as follows:
$$C_{cov} = E\{[Y - E(Y)][Y - E(Y)]^T\}$$
where Y is the historical satellite telemetry data matrix with $n_d$ rows and $n_a$ columns, and $E$ denotes the expected value;
(2) The Mahalanobis-based DTW distance is obtained by finding, between the two time series $x'_i$ and $x_j$, the optimal warping path that gives the minimum Mahalanobis distance measure $DTW_{ma}(x'_i, x_j)$. The cost $d(p_k)$ is calculated with the Mahalanobis distance as follows:
$$d(p_k) = D_M(x'_{i i'_k}, x_{j j'_k}) = \sqrt{(x'_{i i'_k} - x_{j j'_k})^T \, C_{cov}^{-1} \, (x'_{i i'_k} - x_{j j'_k})}$$
Among the warping paths there is an optimal path that minimizes the total warping cost, i.e.:
$$DTW_{ma}(x'_i, x_j) = \min_{P} \sum_{k=1}^{K'} d(p_k)$$
where $P=\{p_1,p_2,\ldots,p_{K'}\}$ denotes the warping path; $p_k$ denotes the k-th member of P, k = 1, 2, ..., K', and represents the correspondence between the $i'$-th element $x'_{i i'_k}$ of $x'_i$ and the $j'$-th element $x_{j j'_k}$ of $x_j$, with $i' = 1, 2, \ldots, n_{len}$ and $j' = 1, 2, \ldots, n_{len}$; $d(p_k)$ denotes the warping cost between $x'_{i i'_k}$ and $x_{j j'_k}$;
(3) To solve this minimization, a cost matrix $R(i', j')$ is constructed by dynamic programming, i.e.:
$$R(i', j') = d(i', j') + \min\{R(i', j'-1),\ R(i'-1, j'-1),\ R(i'-1, j')\}$$
where $R(0,0) = 0$ and $R(i', 0) = R(0, j') = +\infty$; $R(n_{len}, n_{len})$ is the minimum distance value between the time series $x'_i$ and $x_j$ as measured by DTW, that is, $DTW_{ma}(x'_i, x_j) = R(n_{len}, n_{len})$. The other steps and parameters are the same as in one of Embodiments 1 to 3.
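Sub-steps (1) to (3) above can be sketched in plain Python as follows. This is an illustrative sketch, not the patent's implementation: the covariance uses a 1/n_a normalisation as a plain reading of the expectation, the inverse is computed by Gauss-Jordan elimination, and all function names are hypothetical. The recursion and boundary values follow the cost matrix R described in the text; no warping-window limit is applied.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def covariance(Y: Matrix) -> Matrix:
    """Covariance between the rows (dimensions) of an n_d x n_a matrix.

    The 1/n_a normalisation is an assumption about how E{.} is estimated.
    """
    n_d, n_a = len(Y), len(Y[0])
    mean = [sum(row) / n_a for row in Y]
    return [
        [sum((Y[a][t] - mean[a]) * (Y[b][t] - mean[b]) for t in range(n_a)) / n_a
         for b in range(n_d)]
        for a in range(n_d)
    ]


def inverse(A: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if r == c else 0.0 for c in range(n)]
         for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def mahalanobis(u: Sequence[float], v: Sequence[float], C_inv: Matrix) -> float:
    """d = sqrt((u - v)^T C_inv (u - v)) for two n_d-dimensional samples."""
    diff = [a - b for a, b in zip(u, v)]
    q = sum(diff[a] * C_inv[a][b] * diff[b]
            for a in range(len(diff)) for b in range(len(diff)))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding error


def dtw_mahalanobis(x: Matrix, y: Matrix, C_inv: Matrix) -> float:
    """Fill the cost matrix R and return its last entry.

    x and y are n_d x length matrices (rows are dimensions, columns are time).
    """
    nx, ny = len(x[0]), len(y[0])
    R = [[math.inf] * (ny + 1) for _ in range(nx + 1)]
    R[0][0] = 0.0
    for i in range(1, nx + 1):
        xi = [row[i - 1] for row in x]
        for j in range(1, ny + 1):
            yj = [row[j - 1] for row in y]
            d = mahalanobis(xi, yj, C_inv)
            R[i][j] = d + min(R[i][j - 1], R[i - 1][j - 1], R[i - 1][j])
    return R[nx][ny]


# Toy example: a two-dimensional history Y and a query that repeats Y's first sample.
Y = [[0.0, 1.0, 2.0, 3.0], [1.0, 0.0, 3.0, 2.0]]
C_inv = inverse(covariance(Y))
x_query = [[0.0, 0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 0.0, 3.0, 2.0]]
d = dtw_mahalanobis(x_query, Y, C_inv)
```

In the toy example the query is the history with one sample duplicated, so the warping path absorbs the shift. The sketch also accepts series of unequal length, whereas the text assumes both have length $n_{len}$.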
Embodiment 5: This embodiment differs from any one of Embodiments 1 to 4 in that the process in Step 5 of using the KNN classification method with the Mahalanobis-based DTW distance to classify the multidimensional time series to be classified, $X'=\{x'_1,x'_2,\ldots,x'_m\}$, according to the set number K of nearest neighbors, and of determining their classes $L'=\{l'_1,l'_2,\ldots,l'_m\}$, is:
(1) For each member of the multidimensional time series to be classified, $X'=\{x'_1,x'_2,\ldots,x'_m\}$, determine the K class-labeled multidimensional time series with the smallest Mahalanobis-based DTW distance to it. That is, in the matrix of distances $d_{ij}$, take the K smallest values in each row and determine the class-labeled multidimensional time series corresponding to these K values, together with their class labels;
(2) For each row, find the class that occurs most frequently among these K class labels; this is the class of the corresponding series to be classified, so that the classes of $X'=\{x'_1,x'_2,\ldots,x'_m\}$ are $L'=\{l'_1,l'_2,\ldots,l'_m\}$. The other steps and parameters are the same as in one of Embodiments 1 to 4.
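The two sub-steps above amount to a majority vote over the K smallest entries of each row of the distance matrix. A minimal sketch, assuming the distances $d_{ij}$ have already been computed; the function name `knn_classify` is hypothetical, and ties are resolved in favour of the class encountered first among the nearest neighbors, which is a choice the text does not specify.

```python
from collections import Counter
from typing import List, Sequence


def knn_classify(
    distances: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> List[int]:
    """Classify each query row by majority vote among its k nearest labeled series.

    distances[i][j] is the distance d_ij between query i and labeled series j;
    labels[j] is the class label l_j of labeled series j.
    """
    predicted = []
    for row in distances:
        # Indices of the k labeled series with the smallest distance to this query.
        nearest = sorted(range(len(row)), key=lambda j: row[j])[:k]
        votes = Counter(labels[j] for j in nearest)
        predicted.append(votes.most_common(1)[0][0])
    return predicted


# Toy example: two queries, four labeled series, K = 3.
distances = [
    [0.1, 0.9, 0.2, 0.8],
    [0.9, 0.1, 0.8, 0.2],
]
labels = [1, 2, 1, 2]
predicted = knn_classify(distances, labels, k=3)
```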
The beneficial effects of the present invention are verified by the following example:
Example:
KNN classification simulation experiments based on different time series similarity measures were carried out on the Wafer data set. The Wafer data set contains 6 dimensions in total; the data for each dimension are shown in Fig. 5 to Fig. 10, and the classification results are shown in Table 1.
Table 1. Classification results on the Wafer data set using different similarity measures
The experimental results show that the traditional Euclidean distance performs worst and the Mahalanobis-based DTW distance performs best. With the warping window length limited to 5, the best result is obtained: the accuracy reaches 98.10%, an improvement of 10.85% over the accuracy of the Euclidean distance.
Satellite telemetry data classification experiment:
KNN classification experiments based on different time series similarity measures were carried out on satellite telemetry data. There are 50 training samples, each with three dimensions: the yaw attitude angle corresponds to dimension 1, the reaction wheel speed D to dimension 2, and the bus voltage to dimension 3. The samples are divided into 4 classes; the data for each class are shown in Fig. 11(a)~(d) to Fig. 13(a)~(d). There are 50 test samples. Fig. 14(a)~(d), Fig. 15(a)~(d), and Fig. 16(a)~(d) show the detailed classification obtained with the KNN algorithm using the Mahalanobis-based DTW distance, and the classification results are shown in Table 2.
Table 2. Classification results on satellite telemetry data using different similarity measures
The experimental results show that the traditional Euclidean distance again performs worst and the Mahalanobis-based DTW distance performs best: its accuracy reaches 98.00%, an improvement of 4.35% over the accuracy of the Euclidean distance.
The present invention may also have various other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make various corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the protection scope of the appended claims.

Claims (4)

1. a kind of multidimensional time-series sorting technique based on mahalanobis distance DTW, it is characterised in that one kind is based on mahalanobis distance DTW Multidimensional time-series sorting techniques be specifically to follow the steps below:
Step 1:Historical satellite telemetry Y under satellite normal operating condition is segmented using argument catastrophe point as mark, Obtain normal multidimensional time-series X={ x1,x2,...,xj... xn, wherein, it is width that argument value is changed into 0 ° of this point from 360 ° Cornicult height, Y ndRow naThe historical satellite telemetry matrix of row, ndFor the dimension values of multidimensional time-series, naGone through to be all The data points of history satellite telemetering data, xjFor ndRow nlenColumn data matrix represents X j-th of sequence, j=1,2 ..., n, nlen For length of time series, n is the number of members in X;
Step 2: by the multidimensional time-series X={ x obtained by after segmentation1,x2,...,xj... xn, pass through hierarchy clustering method Set cluster target class number and cluster operation is carried out to sequence as c, so as to obtain the class label L=of multidimensional time-series {l1,l2,…,ln};Wherein, c is the positive integer less than n, l more than 1sS-th of element of L sequences is represented, its value is gathered by level The determination of class result, wherein s=1,2 ..., n;
Step 3:Extract adjacent m+1 argument catastrophe point in newest satellite telemetering data and correspond to test data within time point Multidimensional time-series i.e. to be sorted are X '={ x '1,x′2,...,x′m, wherein, m is the positive integer more than 0;
Step 4: calculate multidimensional time-series X '={ x ' to be sorted1,x′2,...,x′mAnd the multidimensional containing class label Time series X={ x1,x2,...,xj... xnBetween DTW distance sequences
Wherein, dijCalculation it is as follows:
dij=DTWma(x'i,xj)
x'iRepresent X ' i-th of sequence, i=1,2 ..., m;DTWmaRepresent the DTW distance algorithms based on mahalanobis distance;dijFor x'iWith xjBetween the DTW distances based on mahalanobis distance;
Calculate dijDetailed process be:
(1) the covariance matrix C between each dimension of multidimensional time-series to be sorted is calculatedcov, its calculation is:
Ccov=E { [Y-E (Y)] [Y-E (Y)]T}
Wherein, Y ndRow naThe historical satellite telemetry matrix of row, E are to represent to calculate desired value;
(2) the DTW distances based on mahalanobis distance are i.e. in two time seriesesWithBetween find optimal crooked route to obtain minimum mahalanobis distance metric DTWma(x'i,xj);Adopt Carried out that d (p are calculated with mahalanobis distancek), calculation is:
<mrow> <mi>d</mi> <mrow> <mo>(</mo> <msub> <mi>p</mi> <mi>k</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <msub> <mi>D</mi> <mi>M</mi> </msub> <mrow> <mo>(</mo> <msub> <msup> <mi>x</mi> <mo>&amp;prime;</mo> </msup> <mrow> <msup> <mi>ii</mi> <mo>&amp;prime;</mo> </msup> <mi>k</mi> </mrow> </msub> <mo>,</mo> <msub> <mi>x</mi> <mrow> <msup> <mi>jj</mi> <mo>&amp;prime;</mo> </msup> <mi>k</mi> </mrow> </msub> <mo>)</mo> </mrow> <mo>=</mo> <msqrt> <mrow> <msup> <mrow> <mo>(</mo> <msub> <msup> <mi>x</mi> <mo>&amp;prime;</mo> </msup> <mrow> <msup> <mi>ii</mi> <mo>&amp;prime;</mo> </msup> <mi>k</mi> </mrow> </msub> <mo>-</mo> <msub> <mi>x</mi> <mrow> <msup> <mi>jj</mi> <mo>&amp;prime;</mo> </msup> <mi>k</mi> </mrow> </msub> <mo>)</mo> </mrow> <mi>T</mi> </msup> <msubsup> <mi>C</mi> <mi>cov</mi> <mrow> <mo>-</mo> <mn>1</mn> </mrow> </msubsup> <mrow> <mo>(</mo> <msub> <msup> <mi>x</mi> <mo>&amp;prime;</mo> </msup> <mrow> <msup> <mi>ii</mi> <mo>&amp;prime;</mo> </msup> <mi>k</mi> </mrow> </msub> <mo>-</mo> <msub> <mi>x</mi> <mrow> <msup> <mi>jj</mi> <mo>&amp;prime;</mo> </msup> <mi>k</mi> </mrow> </msub> <mo>)</mo> </mrow> </mrow> </msqrt> </mrow>
In crooked route, the total Least-cost of bending that an optimal path causes it be present, i.e.,:
<mrow> <msub> <mi>DTW</mi> <mrow> <mi>m</mi> <mi>a</mi> </mrow> </msub> <mrow> <mo>(</mo> <msub> <msup> <mi>x</mi> <mo>&amp;prime;</mo> </msup> <mi>i</mi> </msub> <mo>,</mo> <msub> <mi>x</mi> <mi>j</mi> </msub> <mo>)</mo> </mrow> <mo>=</mo> <munder> <mrow> <mi>m</mi> <mi>i</mi> <mi>n</mi> </mrow> <mi>p</mi> </munder> <munderover> <mo>&amp;Sigma;</mo> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <msup> <mi>K</mi> <mo>&amp;prime;</mo> </msup> </munderover> <mi>d</mi> <mrow> <mo>(</mo> <msub> <mi>p</mi> <mi>k</mi> </msub> <mo>)</mo> </mrow> </mrow>
where P = {p_1, p_2, …, p_K'} denotes the warping path; p_k denotes the k-th member of P, k = 1, 2, …, K', and represents the correspondence between the i'-th element x'_{ii'k} of x'_i and the j'-th element x_{jj'k} of x_j, with i' = 1, 2, …, n_len and j' = 1, 2, …, n_len; d(p_k) denotes the warping cost between x'_{ii'k} and x_{jj'k};
(3) To solve the above minimization, a cost matrix R(i', j') is constructed by dynamic programming, i.e.:
$$R(i', j') = d(i', j') + \min\{R(i', j'-1),\; R(i'-1, j'-1),\; R(i'-1, j')\}$$
where R(0, 0) = 0 and R(i', 0) = R(0, j') = +∞; R(n_len, n_len) is the minimum distance value of the DTW measure between the time series x'_i and x_j, that is, DTW_ma(x'_i, x_j) = R(n_len, n_len);
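The recurrence in (3) can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `mahalanobis` and `dtw_mahalanobis` are hypothetical, each series element is assumed to be a vector of channel values at one time index, and the inverse covariance matrix C_cov^-1 is assumed to be precomputed and passed in as `inv_cov`. The sketch also accepts series of different lengths, which generalizes the equal-length case (n_len) described above.

```python
import math


def mahalanobis(u, v, inv_cov):
    """Mahalanobis distance between points u and v.

    inv_cov is the inverse covariance matrix (assumed precomputed),
    given as a list of rows.
    """
    diff = [a - b for a, b in zip(u, v)]
    dim = len(diff)
    # Quadratic form diff^T * inv_cov * diff.
    q = sum(diff[r] * inv_cov[r][c] * diff[c]
            for r in range(dim) for c in range(dim))
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(q, 0.0))


def dtw_mahalanobis(x, y, inv_cov):
    """DTW distance between series x and y using Mahalanobis local costs.

    x and y are lists of points (each point is a list of channel values).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # Boundary conditions: R[0][0] = 0, R[i][0] = R[0][j] = +inf.
    R = [[inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = mahalanobis(x[i - 1], y[j - 1], inv_cov)
            R[i][j] = d + min(R[i][j - 1], R[i - 1][j - 1], R[i - 1][j])
    # The bottom-right cell holds the minimum total warping cost.
    return R[n][m]
```

Passing the identity matrix as `inv_cov` reduces the local cost to the Euclidean distance, which is a convenient sanity check before using a covariance estimated from the labeled series.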
Step 5: Using the KNN classification method with the DTW distance based on Mahalanobis distance, classify the multidimensional time series to be classified X' = {x'_1, x'_2, ..., x'_m} according to the set number K of nearest neighbors, and determine the class labels L' = {l'_1, l'_2, …, l'_m} of the multidimensional time series to be classified X' = {x'_1, x'_2, ..., x'_m}, where K = 1, 2, …, n; each class label l' is a definite number in 1, 2, …, c; KNN is the K-nearest-neighbor classification algorithm. This completes the multidimensional time series classification method based on Mahalanobis distance DTW.
2. The multidimensional time series classification method based on Mahalanobis distance DTW according to claim 1, characterized in that: in step 1, the argument is one of the test parameters of the satellite telemetry data; the argument increases successively from 0° to 360° and exhibits obvious periodicity.
3. The multidimensional time series classification method based on Mahalanobis distance DTW according to claim 1, characterized in that: in step 1, the historical satellite telemetry data Y under normal satellite operating conditions is segmented using the argument abrupt-change points as markers, and the specific process of obtaining the normal multidimensional time series X = {x_1, x_2, ..., x_j, ..., x_n} is:
(1) after the argument reaches 360°, it returns to 0° and starts increasing again; the point at which it changes from 360° to 0° is an argument abrupt-change point;
(2) the time corresponding to each argument abrupt-change point is recorded;
(3) according to the times corresponding to the argument abrupt-change points, the test data within the interval between two adjacent argument abrupt-change points are extracted as a time series; a multidimensional time series is composed of multiple such time series; the test data are the yaw attitude angle, the flywheel speed, and the bus voltage.
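The segmentation in (1) to (3) might be sketched in Python as follows. This is an illustrative sketch only: the function name `segment_by_argument`, the parameter `min_drop`, and the input layout (parallel lists of timestamps, argument values in degrees, and per-sample channel tuples) are assumptions, not part of the claim. A wrap is detected as a drop in the argument larger than `min_drop`, so that small measurement noise is not mistaken for an abrupt-change point.

```python
def segment_by_argument(times, argument, channels, min_drop=180.0):
    """Split telemetry into cycles at argument abrupt-change points.

    times:    sample timestamps
    argument: argument value in degrees for each sample
    channels: per-sample tuples, e.g. (yaw_angle, flywheel_speed, bus_voltage)
    min_drop: hypothetical threshold; a decrease larger than this is
              treated as the 360-to-0 wrap
    """
    # Indices where the argument falls back from near 360 to near 0.
    wrap_idx = [i for i in range(1, len(argument))
                if argument[i - 1] - argument[i] > min_drop]
    # Record the time of each abrupt-change point.
    wrap_times = [times[i] for i in wrap_idx]
    # Data between two adjacent abrupt-change points form one segment;
    # incomplete cycles before the first and after the last wrap are dropped.
    segments = [channels[a:b] for a, b in zip(wrap_idx, wrap_idx[1:])]
    return wrap_times, segments
```

Each returned segment is one multidimensional time series whose columns are the telemetry channels.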
4. The multidimensional time series classification method based on Mahalanobis distance DTW according to claim 1, characterized in that: in step 5, the process of using the KNN classification method with the DTW distance based on Mahalanobis distance to classify the multidimensional time series to be classified X' = {x'_1, x'_2, ..., x'_m} according to the set number K of nearest neighbors, and of determining its class labels L' = {l'_1, l'_2, …, l'_m}, is:
(1) for each member of the multidimensional time series to be classified X' = {x'_1, x'_2, ..., x'_m}, determine the K class-labeled multidimensional time series with the smallest DTW distance based on Mahalanobis distance; that is, take the K smallest values in each row of the matrix of these distances, and determine the class-labeled multidimensional time series corresponding to these K smallest values together with their class labels;
(2) for each row of class labels, find the class that occurs most frequently; this gives the class labels L' = {l'_1, l'_2, …, l'_m} of the multidimensional time series to be classified X' = {x'_1, x'_2, ..., x'_m}.
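The two steps of this claim can be sketched in Python as below. The sketch assumes the distances have already been arranged as a matrix with one row per series to be classified and one column per class-labeled series; the function name `knn_classify` and the tie-breaking behavior (a tie goes to the class of the nearer neighbor) are assumptions, since the claim does not specify how ties are resolved.

```python
from collections import Counter


def knn_classify(dist_matrix, train_labels, k):
    """Assign a class to each row of dist_matrix by majority vote.

    dist_matrix[r][c] is the distance between the r-th series to be
    classified and the c-th class-labeled series.
    """
    predictions = []
    for row in dist_matrix:
        # Column indices of the k smallest distances in this row.
        nearest = sorted(range(len(row)), key=lambda c: row[c])[:k]
        # Count class labels among the k nearest neighbors,
        # visiting neighbors from nearest to farthest.
        votes = Counter(train_labels[c] for c in nearest)
        # Most frequent class; equal counts keep first-seen order.
        predictions.append(votes.most_common(1)[0][0])
    return predictions
```

With K = 1 this reduces to assigning each series the class of its single nearest labeled neighbor.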
CN201510351181.7A 2015-06-24 2015-06-24 A kind of multidimensional time-series sorting technique based on mahalanobis distance DTW Active CN104915434B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510351181.7A CN104915434B (en) 2015-06-24 2015-06-24 A kind of multidimensional time-series sorting technique based on mahalanobis distance DTW

Publications (2)

Publication Number Publication Date
CN104915434A CN104915434A (en) 2015-09-16
CN104915434B true CN104915434B (en) 2018-03-27

Family

ID=54084497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510351181.7A Active CN104915434B (en) 2015-06-24 2015-06-24 A kind of multidimensional time-series sorting technique based on mahalanobis distance DTW

Country Status (1)

Country Link
CN (1) CN104915434B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106709509B (en) * 2016-11-30 2021-05-28 哈尔滨工业大学 Satellite telemetry data clustering method based on time series special points
CN107451231A (en) * 2017-07-24 2017-12-08 上海电力学院 Indicator card sorting algorithm based on similarity query
CN108228832B (en) * 2018-01-04 2022-04-22 南京大学 Time series data completion method based on distance matrix
CN109034179B (en) * 2018-05-30 2022-03-22 河南理工大学 Rock stratum classification method based on Mahalanobis distance IDTW
CN109241077A (en) * 2018-08-30 2019-01-18 东北大学 Production target variation tendency visual query system and method based on similitude
CN109362036A (en) * 2018-10-17 2019-02-19 桂林电子科技大学 A kind of multi-modal indoor orientation method combined based on image with WIFI
CN109816211B (en) * 2018-12-29 2023-11-24 北京英视睿达科技股份有限公司 Method and device for judging similarity of pollution areas and improving pollution treatment efficiency
CN109828952B (en) * 2019-01-18 2021-05-11 上海卫星工程研究所 PCM system satellite telemetry data classification extraction method and system
CN110096335B (en) * 2019-04-29 2022-06-21 东北大学 Service concurrency prediction method for different types of virtual machines
CN110289986B (en) * 2019-05-27 2021-05-18 武汉大学 Accuracy quantification method of network simulation data
CN110288003B (en) * 2019-05-29 2022-01-18 北京师范大学 Data change identification method and equipment
CN111104438A (en) * 2019-11-21 2020-05-05 新浪网技术(中国)有限公司 Method and device for determining periodicity of time sequence and electronic equipment
CN112380992B (en) * 2020-11-13 2022-12-20 上海交通大学 Method and device for evaluating and optimizing accuracy of monitoring data in machining process
CN116504416B (en) * 2023-06-27 2023-09-08 福建无止境光学仪器有限公司 Eye degree prediction method based on machine learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101561878A (en) * 2009-05-31 2009-10-21 河海大学 Unsupervised anomaly detection method and system based on improved CURE clustering algorithm
US8005771B2 (en) * 2007-10-04 2011-08-23 Siemens Corporation Segment-based change detection method in multivariate data stream
CN103646167A (en) * 2013-11-22 2014-03-19 北京空间飞行器总体设计部 Satellite abnormal condition detection system based on telemeasuring data
CN104102726A (en) * 2014-07-22 2014-10-15 南昌航空大学 Modified K-means clustering algorithm based on hierarchical clustering
CN104123368A (en) * 2014-07-24 2014-10-29 中国软件与技术服务股份有限公司 Big data attribute significance and recognition degree early warning method and system based on clustering

Also Published As

Publication number Publication date
CN104915434A (en) 2015-09-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant