CN103886195B - Time Series Similarity measure under shortage of data - Google Patents
- Publication number: CN103886195B
- Application number: CN201410095671.0A
- Authority: CN (China)
- Prior art keywords: similarity, data, nan, interval, time series
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a time series similarity measure that tolerates missing data. The method extracts data pairs, two at a time, from the two original time series, classifies each pair into one of five cases according to which values are missing, and computes a first-order similarity interval for each case. It then computes a second-order similarity between every two first-order similarity intervals, which yields a second-order similarity vector. Finally, the second-order similarity vector is averaged to give the similarity of the two time series. The method is simple, applies to a wide range of scenarios, and places no requirement on data completeness.
Description
Technical field
The present invention relates to a time series similarity computation method in computer information processing, and specifically to a method for computing the similarity between two time series when one or more data values are missing and the data are physically constrained to the range [0, upper limit].
Background art
Time series are abundant in human society and in nature, for example financial time series, traffic time series, and temperature time series. Time series similarity makes it possible to find the many similar time series within related fields, and so provides very useful data for the analysis of physical and social phenomena. Current time series similarity methods mainly address the case in which no data are missing. When data are missing, the gaps are filled by mean substitution, trend extrapolation, exponential smoothing, and similar techniques. These fill-in techniques require prior knowledge, so the accuracy of the similarity computed after filling is hard to guarantee. Moreover, in some cases missing data cannot be read simply as a lack of information; sometimes the absence itself reflects additional characteristics of the data. It is therefore necessary to establish a time series similarity measure that works when data are missing.
Summary of the invention
To overcome the inability of existing time series measures to handle missing data, the present invention proposes a method that computes time series similarity under any pattern of missing data. The method places no requirement on data completeness.
The technical solution adopted by the present invention, for two time series, is as follows:
1) Extract data pairs, two at a time, from the two time series.
2) Classify each data pair into one of five cases according to which values are missing, and compute its first-order similarity interval accordingly.
3) Compute a similarity between every two of the resulting similarity intervals to obtain the second-order similarity vector.
4) Average the second-order similarity vector to obtain the final similarity of the two time series.
Beneficial effects of the present invention: because most time series in nature are subject to some constraint (for example, a speed is greater than 0 and below the speed limit of the road section), the method applies to a wide range of scenarios, is simple, and places no requirement on data completeness.
Brief description of the drawings
Fig. 1 is a schematic diagram of the similarity computation for two two-dimensional vectors containing a missing value.
Embodiment
The present invention is described in further detail below.
Consider two time series $X_i = (x_{i1}, x_{i2}, \ldots)$ and $X_j = (x_{j1}, x_{j2}, \ldots)$, both of length $N$, in which every value has an upper limit $\bar{x}$ and a lower limit of 0. The similarity is computed as follows:
1) Extract data pairs, two at a time, from the two time series: taking the $m$-th and $n$-th values of each series gives $x_{jm}, x_{jn}$ and $x_{im}, x_{in}$, for a total of $\binom{N}{2} = N(N-1)/2$ pairs. Each value is constrained to the range $[0, \bar{x}]$, where $\bar{x}$ is the upper limit.
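The pair extraction in step 1) can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `extract_pairs`, the use of `math.nan` for missing values, and the sample series are all assumptions made for the example.

```python
from itertools import combinations
import math


def extract_pairs(xi, xj):
    """Yield ((m, n), (xi[m], xi[n]), (xj[m], xj[n])) for every index pair m < n.

    Hypothetical helper: both series must have the same length N, and
    missing values are assumed to be stored as math.nan.
    """
    if len(xi) != len(xj):
        raise ValueError("both series must have the same length N")
    for m, n in combinations(range(len(xi)), 2):
        yield (m, n), (xi[m], xi[n]), (xj[m], xj[n])


# Two illustrative series of length N = 4, each with one missing value.
xi = [1.0, 2.0, math.nan, 4.0]
xj = [2.0, math.nan, 1.0, 3.0]

pairs = list(extract_pairs(xi, xj))
pair_count = len(pairs)  # N * (N - 1) / 2 index pairs
```

With $N = 4$ the sketch produces six index pairs, matching $N(N-1)/2$.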
2) For each of these data pairs $\{x_{im}, x_{in}\}$ and $\{x_{jm}, x_{jn}\}$, compute a similarity interval, called the first-order similarity, by distinguishing the following 5 cases:
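(1) If no value is missing, compute $s'_{mn}$ according to the following formula:
[formula missing from the source text]
The similarity interval of the data pair is then:
$s_{mn} \in [\, s'_{mn}(\{x_{im}, x_{in}\}, \{x_{jm}, x_{jn}\}),\ s'_{mn}(\{x_{im}, x_{in}\}, \{x_{jm}, x_{jn}\}) \,]$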
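A minimal sketch of case (1) follows. The formula for $s'_{mn}$ is not reproduced in this text, so the sketch assumes it is the cosine similarity of the two two-dimensional vectors, which is the measure the description names for the single-missing-value case. The function names are hypothetical, and rejecting zero vectors is a choice made for the example.

```python
import math


def cosine_2d(a, b):
    """Cosine of the angle between two non-zero 2-D vectors."""
    norm = math.hypot(a[0], a[1]) * math.hypot(b[0], b[1])
    if norm == 0.0:
        # Assumption of this sketch: the zero vector is rejected,
        # because its angle to any other vector is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return (a[0] * b[0] + a[1] * b[1]) / norm


def first_order_complete(pi, pj):
    """Degenerate interval [s', s'] for a pair with no missing values.

    ASSUMPTION: s' is taken to be the cosine similarity; the source
    formula is not available.
    """
    s = cosine_2d(pi, pj)
    return (s, s)


orthogonal = first_order_complete((1.0, 0.0), (0.0, 1.0))
parallel = first_order_complete((3.0, 4.0), (3.0, 4.0))
```

Both ends of the interval coincide because every value is known, so the similarity is a single number.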
(2) If all four values are missing, that is, $\{x_{im}, x_{in}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$, then:
$s_{mn} \in [1, 1]$
(3) If exactly one value is missing, assume without loss of generality that $x_{jn} = \mathrm{NaN}$. Under cosine similarity, the similarity of two two-dimensional vectors equals the cosine of the angle between them in the plane, as shown in Fig. 1. Because the missing $x_{jn}$ is bounded below and above, the angle between the two vectors has a maximum and a minimum, so the similarity is again an interval:
$s_{mn} \in [\min(1, \cos\Theta_1, \cos\Theta_2),\ \max(1, \cos\Theta_1, \cos\Theta_2)]$
where $\cos\Theta_1$ and $\cos\Theta_2$ are obtained as:
[formula missing from the source text]
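The interval in case (3) can be sketched as below. The expressions for $\cos\Theta_1$ and $\cos\Theta_2$ are not reproduced in this text, so the sketch assumes they are the cosines obtained by setting the missing $x_{jn}$ to its two bounds, 0 and the upper limit. The names `interval_one_missing` and `upper` are hypothetical.

```python
import math


def cosine_2d(a, b):
    """Cosine of the angle between two non-zero 2-D vectors."""
    norm = math.hypot(a[0], a[1]) * math.hypot(b[0], b[1])
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return (a[0] * b[0] + a[1] * b[1]) / norm


def interval_one_missing(pi, xjm, upper):
    """First-order interval when only x_jn is missing and 0 <= x_jn <= upper.

    ASSUMPTION: Theta_1 and Theta_2 are the angles at the two extreme
    positions of the missing value (lower bound 0 and upper bound).
    """
    cos1 = cosine_2d(pi, (xjm, 0.0))    # x_jn at its lower bound
    cos2 = cosine_2d(pi, (xjm, upper))  # x_jn at its upper bound
    # Interval exactly as written in the text.
    return (min(1.0, cos1, cos2), max(1.0, cos1, cos2))


low, high = interval_one_missing((1.0, 1.0), 1.0, 2.0)
```

For the known pair $(1, 1)$, $x_{jm} = 1$ and an upper limit of 2, the lower end is $\cos 45^\circ \approx 0.707$. As written in the text, the upper end is $\max(1, \cdot, \cdot)$, which evaluates to 1 whenever both cosines are at most 1.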
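(4) If two values are missing in the same position of both data pairs, that is, $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{x_{jm}, \mathrm{NaN}\}$, then the similarity is likewise an interval:
[formula missing from the source text]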
(5) If two values are missing in different positions, that is, $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, x_{jn}\}$; or if three values are missing, that is, $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$, then the similarity interval is likewise:
$s_{mn} \in [0, 1]$
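The five cases can be told apart from the positions of the missing values alone. The following sketch is an illustration under stated assumptions: `missing_case` is a hypothetical helper, missing values are assumed to be stored as `math.nan`, and the pattern in which both values of one series are missing while the other series is complete is not listed in the text, so the sketch reports it as unclassified (`None`).

```python
import math


def missing_case(pi, pj):
    """Return the case number (1 to 5) for a data pair, or None if unlisted."""
    mi = tuple(math.isnan(v) for v in pi)  # missing flags for series i
    mj = tuple(math.isnan(v) for v in pj)  # missing flags for series j
    total = sum(mi) + sum(mj)
    if total == 0:
        return 1  # nothing missing
    if total == 4:
        return 2  # everything missing, interval [1, 1]
    if total == 1:
        return 3  # exactly one value missing
    if total == 3:
        return 5  # three values missing, interval [0, 1]
    # total == 2 from here on
    if mi == mj:
        return 4  # same position missing in both series
    if sum(mi) == 1 and sum(mj) == 1:
        return 5  # one value missing per series, different positions
    return None   # one series fully missing: not listed in the text


nan = math.nan
cases = [
    missing_case((1.0, 2.0), (3.0, 4.0)),
    missing_case((nan, nan), (nan, nan)),
    missing_case((1.0, 2.0), (3.0, nan)),
    missing_case((1.0, nan), (2.0, nan)),
    missing_case((1.0, nan), (nan, 2.0)),
    missing_case((1.0, nan), (nan, nan)),
]
```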
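3) Write each similarity interval $s_{mn}$ uniformly as $[\underline{s}_{mn}, \overline{s}_{mn}]$, where $\underline{s}_{mn}$ is the start value of the interval and $\overline{s}_{mn}$ is its end value. Then compute the similarity between every two intervals in turn. Because both intervals are fully known at this point, each such similarity is a scalar; it is called the second-order similarity. For a pair of similarity intervals $[\underline{s}_{mn}, \overline{s}_{mn}]$ and $[\underline{s}_{kj}, \overline{s}_{kj}]$, their similarity $s_{mnkj}$ is computed as:
[formula missing from the source text]
With $\binom{N}{2}$ first-order intervals, the number of values $s_{mnkj}$ is $\binom{\binom{N}{2}}{2}$.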
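4) Average the second-order similarity vector. The final similarity $s(X_i, X_j)$ of the two time series is the mean of all second-order similarities:
$s(X_i, X_j) = \frac{1}{K} \sum s_{mnkj}$
where $K$ is the number of values $s_{mnkj}$ and the sum runs over all of them.
This gives the similarity of two time series with missing data.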
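Steps 3) and 4) can be sketched as a pairwise comparison followed by a plain average. The second-order similarity formula is not reproduced in this text, so the sketch takes it as a caller-supplied function. The function `midpoint_closeness` used in the demonstration is a hypothetical stand-in chosen only to make the example runnable; it is not the patent's formula.

```python
from itertools import combinations


def series_similarity(intervals, second_order):
    """Average the second-order similarity over every pair of intervals.

    intervals    : list of (start, end) first-order similarity intervals
    second_order : function mapping two intervals to a scalar similarity
                   (the source formula is unavailable, so it is a parameter)
    """
    scores = [second_order(a, b) for a, b in combinations(intervals, 2)]
    if not scores:
        raise ValueError("need at least two first-order intervals")
    return sum(scores) / len(scores)


def midpoint_closeness(a, b):
    """HYPOTHETICAL stand-in: 1 minus the distance between interval midpoints."""
    return 1.0 - abs((a[0] + a[1]) / 2.0 - (b[0] + b[1]) / 2.0)


# Three illustrative first-order intervals, e.g. from cases (2), (5) and (3).
intervals = [(1.0, 1.0), (0.0, 1.0), (0.5, 1.0)]
result = series_similarity(intervals, midpoint_closeness)
```

Passing the second-order measure as an argument keeps the averaging logic independent of whichever interval similarity is actually used.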
Claims (1)
1. A time series similarity measure under missing data, characterized in that:
for two time series $X_i = (x_{i1}, x_{i2}, \ldots)$ and $X_j = (x_{j1}, x_{j2}, \ldots)$, both of length $N$, in which missing data are expressed as NaN and every value has an upper limit $\bar{x}$ and a lower limit of 0, the similarity is computed as follows:
1) extract data pairs, two at a time, from the two time series: taking the $m$-th and $n$-th values of each series gives $x_{jm}, x_{jn}$ and $x_{im}, x_{in}$, for a total of $\binom{N}{2} = N(N-1)/2$ pairs; each value is constrained to the range $[0, \bar{x}]$;
2) for each of these data pairs $\{x_{im}, x_{in}\}$ and $\{x_{jm}, x_{jn}\}$, compute a similarity interval, called the first-order similarity, by distinguishing the following five cases:
(1) if no value is missing, compute $s'_{mn}$ according to the following formula:
[formula missing from the source text]
the similarity interval of the data pair is then:
$s_{mn} \in [\, s'_{mn}(\{x_{im}, x_{in}\}, \{x_{jm}, x_{jn}\}),\ s'_{mn}(\{x_{im}, x_{in}\}, \{x_{jm}, x_{jn}\}) \,]$;
(2) if all four values are missing, that is, $\{x_{im}, x_{in}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$, then:
$s_{mn} \in [1, 1]$;
(3) if exactly one value is missing, assume without loss of generality that $x_{jn} = \mathrm{NaN}$; under cosine similarity, the similarity of two two-dimensional vectors equals the cosine of the angle between them in the plane; because the missing $x_{jn}$ is bounded below and above, the angle between the two vectors has a maximum and a minimum, so the similarity is again an interval:
$s_{mn} \in [\min(1, \cos\Theta_1, \cos\Theta_2),\ \max(1, \cos\Theta_1, \cos\Theta_2)]$;
where $\cos\Theta_1$ and $\cos\Theta_2$ are obtained as:
[formula missing from the source text]
(4) if two values are missing in the same position of both data pairs, that is, $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{x_{jm}, \mathrm{NaN}\}$, then the similarity is an interval:
[formula missing from the source text]
(5) if two values are missing in different positions, that is, $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, x_{jn}\}$; or if three values are missing, that is, $\{x_{im}, x_{in}\} = \{x_{im}, \mathrm{NaN}\}$ and $\{x_{jm}, x_{jn}\} = \{\mathrm{NaN}, \mathrm{NaN}\}$, then the similarity interval is:
$s_{mn} \in [0, 1]$;
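3) write each similarity interval $s_{mn}$ uniformly as $[\underline{s}_{mn}, \overline{s}_{mn}]$, where $\underline{s}_{mn}$ is the start value of the interval and $\overline{s}_{mn}$ is its end value; then compute the similarity between every two of the similarity intervals in turn, called the second-order similarity; for a pair of similarity intervals $[\underline{s}_{mn}, \overline{s}_{mn}]$ and $[\underline{s}_{kj}, \overline{s}_{kj}]$, their similarity $s_{mnkj}$ is:
[formula missing from the source text]
with $\binom{N}{2}$ first-order intervals, the number of values $s_{mnkj}$ is $\binom{\binom{N}{2}}{2}$;
4) average the second-order similarity vector; the final similarity $s(X_i, X_j)$ of the two time series is the mean of all second-order similarities:
$s(X_i, X_j) = \frac{1}{K} \sum s_{mnkj}$
where $K$ is the number of values $s_{mnkj}$ and the sum runs over all of them.
This gives the similarity of two time series with missing data.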
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410095671.0A CN103886195B (en) | 2014-03-14 | 2014-03-14 | Time Series Similarity measure under shortage of data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410095671.0A CN103886195B (en) | 2014-03-14 | 2014-03-14 | Time Series Similarity measure under shortage of data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103886195A | 2014-06-25 |
CN103886195B | 2015-08-26 |
Family
ID=50955085
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410095671.0A Active CN103886195B (en) | 2014-03-14 | 2014-03-14 | Time Series Similarity measure under shortage of data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103886195B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107885696B (en) * | 2017-11-20 | 2021-09-07 | 河海大学 | Method for realizing missing data restoration by utilizing observation sequence similarity |
CN113643538B (en) * | 2021-08-11 | 2023-05-12 | 昆山轨道交通投资置业有限公司 | Bus passenger flow measuring and calculating method integrating IC card historical data and manual investigation data |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ES2354330B1 (en) * | 2009-04-23 | 2012-01-30 | Universitat Pompeu Fabra | METHOD FOR CALCULATING MEASUREMENT MEASURES BETWEEN TEMPORARY SIGNS. |
CN103020079B (en) * | 2011-09-24 | 2017-03-08 | 国家电网公司 | A kind of industrial data supplementation method |
CN103246702B (en) * | 2013-04-02 | 2016-01-06 | 大连理工大学 | A kind of complementing method of the industrial sequence data disappearance based on segmentation Shape Representation |
CN103279643B (en) * | 2013-04-26 | 2016-08-24 | 华北电力大学(保定) | A kind of computational methods of time series similarity |
CN103577562B (en) * | 2013-10-24 | 2016-08-31 | 河海大学 | A kind of many measuring periods sequence similarity analyzes method |
CN103561418A (en) * | 2013-11-07 | 2014-02-05 | 东南大学 | Anomaly detection method based on time series |
- 2014-03-14: application CN201410095671.0A (patent CN103886195B), status Active
Also Published As
Publication number | Publication date |
---|---|
CN103886195A (en) | 2014-06-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102184414B (en) | Method and system for identifying and judging pump indicator diagram | |
CN102103692B (en) | Fingerprint image enhancing method | |
JP2015528960A5 (en) | ||
CN102184408B (en) | Autoregressive-model-based high range resolution profile radar target recognition method | |
CN102411711B (en) | Finger vein recognition method based on individualized weight | |
MX362373B (en) | Content based image retrieval. | |
CN105389550A (en) | Remote sensing target detection method based on sparse guidance and significant drive | |
CN103605985A (en) | A data dimension reduction method based on a tensor global-local preserving projection | |
CN103345760B (en) | A kind of automatic generation method of medical image object shapes template mark point | |
CN103886195B (en) | Time Series Similarity measure under shortage of data | |
CN104156723B (en) | A kind of extracting method with the most stable extremal region of scale invariability | |
CN102637199B (en) | Image marking method based on semi-supervised subject modeling | |
CN105809182A (en) | Image classification method and device | |
CN102542543A (en) | Block similarity-based interactive image segmenting method | |
CN103793704A (en) | Supervising neighborhood preserving embedding face recognition method and system and face recognizer | |
CN104392234A (en) | Image fast Fourier transformation (FFT) symbol information based unmanned aerial vehicle autonomous landing target detection method | |
CN103177105A (en) | Method and device of image search | |
CN104751630A (en) | Road traffic state acquisition method based on Kernel-KNN matching | |
CN105893723A (en) | Rock mass fault gliding plane occurrence calculation method based on microseism event cluster PCA method | |
CN104851105A (en) | Improved foam image segmentation method based on watershed transformation | |
CN101000651B (en) | Method for recognizing multiple texture image | |
CN104462458A (en) | Data mining method of big data system | |
CN102663040A (en) | Method for obtaining attribute column weights based on KL (Kullback-Leibler) divergence training for positive-pair and negative-pair constrained data | |
CN106529482A (en) | Traffic road sign identification method adopting set distance | |
CN104408335A (en) | Curve shape considered anti-fake method of vector geographic data watermark |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |