CN103886195B - Time Series Similarity measure under shortage of data - Google Patents

Time Series Similarity measure under shortage of data

Publication number
CN103886195B
CN103886195B CN201410095671.0A CN201410095671A CN103886195B CN 103886195 B CN103886195 B CN 103886195B CN 201410095671 A CN201410095671 A CN 201410095671A CN 103886195 B CN103886195 B CN 103886195B
Authority
CN
China
Prior art keywords
similarity
data
nan
interval
time series
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410095671.0A
Other languages
Chinese (zh)
Other versions
CN103886195A (en)
Inventor
祁宏生
王殿海
许骏
叶盈
韦薇
郑正非
蔡正义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201410095671.0A
Publication of CN103886195A
Application granted
Publication of CN103886195B
Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a time series similarity measure that tolerates missing data. The method extracts data pairs, two points at a time, from the two original time series, classifies each pair into one of five cases according to which values are missing, and computes a first-order similarity interval for each case. It then takes the first-order similarity intervals two at a time, computes a second-order similarity for each pair of intervals, and collects the results into a second-order similarity vector. Finally, the second-order similarity vector is averaged to give the similarity of the two time series. The method applies to a variety of scenarios, is simple, and places no requirement on data completeness.

Description

Time Series Similarity measure under shortage of data
Technical field
The present invention relates to a method for computing time series similarity in computer information processing, and specifically to a method for computing the similarity between two time series that have one or more missing values and whose values are physically constrained to the range [0, upper limit].
Background art
Time series are abundant in human society and in nature, for example financial, traffic, and temperature time series. Time series similarity analysis can find similar series within a field and so supplies valuable data for analysing physical and social phenomena. Existing time series similarity methods mainly address the case with no missing data. When data are missing, the gaps are filled by mean substitution, trend extrapolation, exponential smoothing, and the like. Such filling requires prior knowledge, so the accuracy of the similarity computed after filling is hard to guarantee. Moreover, in some cases missing data should not be read merely as an absence of information; the missingness itself can reflect further characteristics of the data. A time series similarity measure that works directly on data with missing values is therefore needed.
Summary of the invention
To overcome the inability of existing time series measures to handle missing data, the present invention proposes a method that computes time series similarity under any pattern of missing data. The method places no requirement on data completeness.
The technical solution adopted by the invention is as follows, for two time series:
1) Extract data pairs from the two time series, two points at a time.
2) Classify each data pair into one of five cases according to which values are missing, and compute its first-order similarity interval accordingly.
3) Compute similarities pairwise among the resulting similarity intervals to obtain the second-order similarity vector.
4) Average the second-order similarity vector to obtain the final similarity of the two time series.
Beneficial effects of the invention: because most time series in nature are subject to some constraint (for example, a speed is greater than 0 and less than the speed limit of the road section), the method adapts to a variety of scenarios, is simple, and places no requirement on data completeness.
Brief description of the drawings
Fig. 1 is a schematic diagram of the similarity computation for two two-dimensional vectors containing a missing value.
Detailed description
The present invention is described in further detail below.
Consider two time series $X_i = (x_{i1}, x_{i2}, \ldots)$ and $X_j = (x_{j1}, x_{j2}, \ldots)$, both of length $N$. Each value of the time series has an upper limit $\bar{x}$ and a lower limit of 0. The similarity is computed as follows:
1) Extract data pairs from the two time series, two points at a time. Taking the $m$-th and $n$-th values of each series gives $x_{jm}, x_{jn}$ and $x_{im}, x_{in}$, for a total of $N(N-1)/2$ pairs. Each value is constrained to $0 \le x \le \bar{x}$.
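The pair extraction in step 1 can be sketched as follows. This is a minimal illustration, not the patent's own code: the function name `extract_pairs`, the 0-based indexing, and the use of `math.nan` for missing values are assumptions made for the example.

```python
import math
from itertools import combinations

def extract_pairs(xi, xj):
    """Return [((x_im, x_in), (x_jm, x_jn)), ...] for every index pair m < n.

    Hypothetical helper: indices are 0-based and missing values are NaN.
    """
    if len(xi) != len(xj):
        raise ValueError("both series must have the same length N")
    return [((xi[m], xi[n]), (xj[m], xj[n]))
            for m, n in combinations(range(len(xi)), 2)]

# Two short series with missing values (illustrative data only).
xi = [1.0, math.nan, 3.0, 4.0]
xj = [2.0, 2.5, math.nan, 1.0]
pairs = extract_pairs(xi, xj)
print(len(pairs))  # N(N-1)/2 pairs; N = 4 gives 6
```

Using unordered index pairs keeps the count at $N(N-1)/2$, which matches the number of first-order intervals used later.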
2) For each such pair $\{x_{im}, x_{in}\}$ and $\{x_{jm}, x_{jn}\}$, compute a similarity interval, referred to as the first-order similarity, according to the following five cases:
(1) If no value is missing, compute:
$$ s'_{mn}(\{x_{im}, x_{in}\}, \{x_{jm}, x_{jn}\}) = \frac{x_{im} x_{jm} + x_{in} x_{jn}}{\sqrt{x_{jm}^2 + x_{jn}^2}\,\sqrt{x_{im}^2 + x_{in}^2}} $$
The similarity interval of the data pair is then:
$$ s_{mn} \in \left[ s'_{mn}(\{x_{im}, x_{in}\}, \{x_{jm}, x_{jn}\}),\; s'_{mn}(\{x_{im}, x_{in}\}, \{x_{jm}, x_{jn}\}) \right] $$
(2) If all values are missing, that is, $\{x_{im}, x_{in}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$, then:
$$ s_{mn} \in [1, 1] $$
(3) If exactly one value is missing, assume without loss of generality that $x_{jn} = \mathrm{NaN}$. By the cosine similarity formula, the similarity of two two-dimensional vectors equals the cosine of the angle between them in the plane, as shown in Fig. 1. Although $x_{jn}$ is missing, it is bounded above and below, so the angle between the two vectors has a maximum and a minimum, and the similarity is again an interval:
$$ s_{mn} \in \left[ \min(1, \cos\Theta_1, \cos\Theta_2),\; \max(1, \cos\Theta_1, \cos\Theta_2) \right] $$
where
$$ \cos\Theta_1 = \frac{x_{im}}{\sqrt{x_{im}^2 + x_{in}^2}}, \qquad \cos\Theta_2 = \frac{x_{im} x_{jm} + x_{in} \bar{x}}{\sqrt{x_{jm}^2 + \bar{x}^2}\,\sqrt{x_{im}^2 + x_{in}^2}} $$
(4) If both data pairs have a missing value, in the form $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{x_{jm}, \mathrm{NaN}\}$, then similarly the similarity is an interval:
$$ s_{mn} \in \left[ 0,\; \max\!\left( \frac{x_{im}}{\sqrt{x_{im}^2 + \bar{x}^2}},\; \frac{x_{jm}}{\sqrt{x_{jm}^2 + \bar{x}^2}} \right) \right] $$
(5) If each of the two data pairs has one missing value, in the form $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, x_{jn}\}$, or if the two data pairs have three missing values in total, in the form $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$, then similarly the similarity interval is:
$$ s_{mn} \in [0, 1] $$
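The five cases can be gathered into one function, sketched below under stated assumptions. The names `first_order_interval` and `upper` (standing for $\bar{x}$) are hypothetical. Three choices go beyond the text and are assumptions of this sketch: case (3) is applied by symmetry to whichever of the four values is missing, the cosine involving a zero vector is taken as 0.0, and any pattern the text does not list (for example one pair fully missing and the other complete) falls back to $[0, 1]$.

```python
import math

def _missing(v):
    return v is None or math.isnan(v)

def _cos(u, v):
    """Cosine between 2-D vectors; 0.0 if either is the zero vector (assumption)."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return (u[0] * v[0] + u[1] * v[1]) / (nu * nv)

def first_order_interval(pi, pj, upper):
    """Interval (low, high) for pi = (x_im, x_in) and pj = (x_jm, x_jn)."""
    if upper <= 0:
        raise ValueError("upper limit must be positive")
    mi = [_missing(v) for v in pi]
    mj = [_missing(v) for v in pj]
    n_missing = sum(mi) + sum(mj)

    if n_missing == 0:                      # case (1): degenerate interval
        s = _cos(pi, pj)
        return (s, s)
    if n_missing == 4:                      # case (2)
        return (1.0, 1.0)
    if n_missing == 1:                      # case (3): one bounded unknown
        full, part, mask = (pi, pj, mj) if sum(mj) == 1 else (pj, pi, mi)
        q = mask.index(True)                # position of the missing value
        at_zero, at_upper = list(part), list(part)
        at_zero[q], at_upper[q] = 0.0, upper
        c1, c2 = _cos(full, at_zero), _cos(full, at_upper)
        return (min(1.0, c1, c2), max(1.0, c1, c2))
    if n_missing == 2 and mi == mj:         # case (4): same position missing in both
        k = mi.index(False)
        a, b = pi[k], pj[k]
        return (0.0, max(a / math.hypot(a, upper), b / math.hypot(b, upper)))
    return (0.0, 1.0)                       # case (5) and unlisted patterns

# Case (3) example: x_jn is missing and the upper limit is 10.
print(first_order_interval((3.0, 4.0), (3.0, math.nan), 10.0))
```

In the case (3) example the lower bound comes from setting the missing value to 0, which gives a cosine of about 0.6; the upper bound is 1 because the formula includes 1 inside the `max`.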
3) For the $N(N-1)/2$ similarity intervals (each interval $s_{mn}$ is written uniformly as $[s_{mn}^1, s_{mn}^2]$, where $s_{mn}^1$ is the start of the interval and $s_{mn}^2$ is its end), compute similarities pairwise in turn. Because $s_{mn}^1$ and $s_{mn}^2$ are all known at this stage, each such similarity is a scalar; it is referred to as the second-order similarity. Suppose a pair of similarity intervals is $[s_{mn}^1, s_{mn}^2]$ and $[s_{kj}^1, s_{kj}^2]$; their similarity $s_{mnkj}$ is computed as:
$$ s_{mnkj} = \frac{s_{mn}^1 s_{kj}^1 + s_{mn}^2 s_{kj}^2}{\sqrt{(s_{mn}^1)^2 + (s_{kj}^1)^2}\,\sqrt{(s_{kj}^1)^2 + (s_{kj}^2)^2}}, \qquad \forall\, m \neq n,\; k \neq j $$
The number of values $s_{mnkj}$ is:
$$ C_{\frac{N(N-1)}{2}}^{2} = \frac{\frac{N(N-1)}{2}\left(\frac{N(N-1)}{2} - 1\right)}{2} = \frac{N^4 - 2N^3 - N^2 + 2N}{8} $$
4) Average the second-order similarity vector. The final similarity $s(X_i, X_j)$ of the two time series is:
$$ s(X_i, X_j) = \frac{\sum_{m \neq n,\, k \neq j} s_{mnkj}}{\frac{N^4 - 2N^3 - N^2 + 2N}{8}} = \frac{8 \sum_{m \neq n,\, k \neq j} s_{mnkj}}{N^4 - 2N^3 - N^2 + 2N} $$
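Steps 3 and 4 can be sketched as follows, taking a list of first-order intervals as input. The function names are hypothetical. One assumption should be read carefully: the sketch uses the ordinary cosine between the interval vectors $(s_{mn}^1, s_{mn}^2)$ and $(s_{kj}^1, s_{kj}^2)$, whereas the first radical of the printed formula contains $(s_{kj}^1)^2$ where the ordinary cosine has $(s_{mn}^2)^2$. Treating a zero interval vector as similarity 0.0 is also an assumption.

```python
import math
from itertools import combinations

def second_order_similarity(a, b):
    """Ordinary cosine between interval vectors a = (a1, a2) and b = (b1, b2).

    Assumption: returns 0.0 when either vector is (0, 0).
    """
    den = math.hypot(*a) * math.hypot(*b)
    return (a[0] * b[0] + a[1] * b[1]) / den if den > 0.0 else 0.0

def average_second_order(intervals):
    """Mean of the second-order similarities over all unordered interval pairs."""
    sims = [second_order_similarity(a, b) for a, b in combinations(intervals, 2)]
    if not sims:
        raise ValueError("need at least two intervals, i.e. N >= 3")
    return sum(sims) / len(sims)

def second_order_count(n):
    """Number of interval pairs for series length n: C(n(n-1)/2, 2)."""
    return math.comb(n * (n - 1) // 2, 2)

# The pair count agrees with the closed form (N^4 - 2N^3 - N^2 + 2N) / 8.
for n in range(2, 10):
    assert second_order_count(n) == (n**4 - 2 * n**3 - n**2 + 2 * n) // 8

# Illustrative intervals for N = 3 (three first-order intervals).
intervals = [(0.6, 1.0), (1.0, 1.0), (0.0, 1.0)]
print(second_order_count(3), average_second_order(intervals))
```

Dividing by `len(sims)` is the same as dividing by the closed-form count, since `combinations` enumerates exactly that many pairs.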
This yields the similarity of two time series with missing data.

Claims (1)

1. A time series similarity measure under missing data, characterized in that:
Consider two time series $X_i = (x_{i1}, x_{i2}, \ldots)$ and $X_j = (x_{j1}, x_{j2}, \ldots)$, both of length $N$, with missing values denoted NaN; each value of the time series has an upper limit $\bar{x}$ and a lower limit of 0; the similarity is computed as follows:
1) Extract data pairs from the two time series, two points at a time; taking the $m$-th and $n$-th values of each series gives $x_{jm}, x_{jn}$ and $x_{im}, x_{in}$, for a total of $N(N-1)/2$ pairs; each value is constrained to $0 \le x \le \bar{x}$;
2) For each such pair $\{x_{im}, x_{in}\}$ and $\{x_{jm}, x_{jn}\}$, compute a similarity interval, referred to as the first-order similarity, according to the following five cases:
(1) If no value is missing, compute:
$$ s'_{mn}(\{x_{im}, x_{in}\}, \{x_{jm}, x_{jn}\}) = \frac{x_{im} x_{jm} + x_{in} x_{jn}}{\sqrt{x_{jm}^2 + x_{jn}^2}\,\sqrt{x_{im}^2 + x_{in}^2}}; $$
The similarity interval of the data pair is then:
$$ s_{mn} \in \left[ s'_{mn}(\{x_{im}, x_{in}\}, \{x_{jm}, x_{jn}\}),\; s'_{mn}(\{x_{im}, x_{in}\}, \{x_{jm}, x_{jn}\}) \right]; $$
(2) If all values are missing, that is, $\{x_{im}, x_{in}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$, then:
$$ s_{mn} \in [1, 1]; $$
(3) If exactly one value is missing, assume without loss of generality that $x_{jn} = \mathrm{NaN}$; by the cosine similarity formula, the similarity of two two-dimensional vectors equals the cosine of the angle between them in the plane; although $x_{jn}$ is missing, it is bounded above and below, so the angle between the two vectors has a maximum and a minimum, and the similarity is again an interval:
$$ s_{mn} \in \left[ \min(1, \cos\Theta_1, \cos\Theta_2),\; \max(1, \cos\Theta_1, \cos\Theta_2) \right]; $$
where
$$ \cos\Theta_1 = \frac{x_{im}}{\sqrt{x_{im}^2 + x_{in}^2}}, \qquad \cos\Theta_2 = \frac{x_{im} x_{jm} + x_{in} \bar{x}}{\sqrt{x_{jm}^2 + \bar{x}^2}\,\sqrt{x_{im}^2 + x_{in}^2}}; $$
(4) If both data pairs have a missing value, in the form $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{x_{jm}, \mathrm{NaN}\}$, then the similarity is an interval:
$$ s_{mn} \in \left[ 0,\; \max\!\left( \frac{x_{im}}{\sqrt{x_{im}^2 + \bar{x}^2}},\; \frac{x_{jm}}{\sqrt{x_{jm}^2 + \bar{x}^2}} \right) \right]; $$
(5) If each of the two data pairs has one missing value, in the form $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, x_{jn}\}$, or if the two data pairs have three missing values in total, in the form $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$, the similarity interval is:
$$ s_{mn} \in [0, 1]; $$
3) Write each interval $s_{mn}$ uniformly as $[s_{mn}^1, s_{mn}^2]$, where $s_{mn}^1$ is the start of the interval and $s_{mn}^2$ is its end; for the $N(N-1)/2$ similarity intervals, compute similarities pairwise in turn, referred to as the second-order similarity; suppose a pair of similarity intervals is $[s_{mn}^1, s_{mn}^2]$ and $[s_{kj}^1, s_{kj}^2]$; their similarity $s_{mnkj}$ is:
$$ s_{mnkj} = \frac{s_{mn}^1 s_{kj}^1 + s_{mn}^2 s_{kj}^2}{\sqrt{(s_{mn}^1)^2 + (s_{kj}^1)^2}\,\sqrt{(s_{kj}^1)^2 + (s_{kj}^2)^2}}, \qquad \forall\, m \neq n,\; k \neq j; $$
The number of values $s_{mnkj}$ is:
$$ C_{\frac{N(N-1)}{2}}^{2} = \frac{\frac{N(N-1)}{2}\left(\frac{N(N-1)}{2} - 1\right)}{2} = \frac{N^4 - 2N^3 - N^2 + 2N}{8}; $$
4) Average the second-order similarity vector; the final similarity $s(X_i, X_j)$ of the two time series is:
$$ s(X_i, X_j) = \frac{\sum_{m \neq n,\, k \neq j} s_{mnkj}}{\frac{N^4 - 2N^3 - N^2 + 2N}{8}} = \frac{8 \sum_{m \neq n,\, k \neq j} s_{mnkj}}{N^4 - 2N^3 - N^2 + 2N}; $$
This yields the similarity of two time series with missing data.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410095671.0A CN103886195B (en) 2014-03-14 2014-03-14 Time Series Similarity measure under shortage of data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410095671.0A CN103886195B (en) 2014-03-14 2014-03-14 Time Series Similarity measure under shortage of data

Publications (2)

Publication Number Publication Date
CN103886195A CN103886195A (en) 2014-06-25
CN103886195B true CN103886195B (en) 2015-08-26

Family

ID=50955085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410095671.0A Active CN103886195B (en) 2014-03-14 2014-03-14 Time Series Similarity measure under shortage of data

Country Status (1)

Country Link
CN (1) CN103886195B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885696B (en) * 2017-11-20 2021-09-07 河海大学 Method for realizing missing data restoration by utilizing observation sequence similarity
CN113643538B (en) * 2021-08-11 2023-05-12 昆山轨道交通投资置业有限公司 Bus passenger flow measuring and calculating method integrating IC card historical data and manual investigation data

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2354330B1 (en) * 2009-04-23 2012-01-30 Universitat Pompeu Fabra METHOD FOR CALCULATING MEASUREMENT MEASURES BETWEEN TEMPORARY SIGNS.
CN103020079B (en) * 2011-09-24 2017-03-08 国家电网公司 A kind of industrial data supplementation method
CN103246702B (en) * 2013-04-02 2016-01-06 大连理工大学 A kind of complementing method of the industrial sequence data disappearance based on segmentation Shape Representation
CN103279643B (en) * 2013-04-26 2016-08-24 华北电力大学(保定) A kind of computational methods of time series similarity
CN103577562B (en) * 2013-10-24 2016-08-31 河海大学 A kind of many measuring periods sequence similarity analyzes method
CN103561418A (en) * 2013-11-07 2014-02-05 东南大学 Anomaly detection method based on time series

Also Published As

Publication number Publication date
CN103886195A (en) 2014-06-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant