CN103279643B - A kind of computational methods of time series similarity - Google Patents

A kind of computational methods of time series similarity

Info

Publication number
CN103279643B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310151558.5A
Other languages
Chinese (zh)
Other versions
CN103279643A (en)
Inventor
李中
张铁峰
张卫华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by North China Electric Power University

Abstract

The invention discloses a method for calculating time series similarity, in the field of computer information processing. The method comprises: dividing the two time series to be compared, $S_1$ and $S_2$, into subsequences $S_1(i)$ and $S_2(i)$ in the same way; setting a weight $w_i$ for each pair of subsequences $S_1(i)$ and $S_2(i)$; calculating the distance between corresponding subsequences; and calculating the similarity of $S_1$ and $S_2$ from the distances between corresponding subsequences and the subsequence weights. The method better reflects how similar the time series are in shape, has low complexity, and gives a fast judgment.

Description

A kind of computational methods of time series similarity
Technical Field
The invention belongs to the field of computer information processing, and in particular relates to a method for calculating time series similarity.
Background
Time series are abundant in application fields such as business, finance, medicine, astronomy, meteorology, aerospace, and electric power and energy. In time series analysis, judging the similarity of time series data is a fundamental problem, with wide application in time series querying, pattern matching, classification, and data mining.
Existing distance-based methods for calculating time series similarity fall into two main kinds: methods based on Euclidean distance (or Euclidean-like distances) and methods based on dynamic time warping (DTW) distance. Methods based on Euclidean distance cannot recognize shape and cannot identify how a time series pattern changes at different resolutions. Methods based on DTW distance align and match the series along the minimum-cost time warping path and therefore support stretching and compression of the time axis, but the DTW distance does not satisfy the triangle inequality. In particular, its time complexity is $O(n^2)$ (where $n$ is the length of the time series), so the amount of computation is very large and in many cases the method cannot be applied in practice.
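To make the quadratic cost concrete, the following is a minimal textbook sketch of the DTW distance. It is not code from the patent: the function name `dtw_distance` and the choice of absolute difference as the local cost are assumptions made for illustration. The nested loops fill a table with one cell per pair of time points, which is where the $O(n^2)$ running time comes from.

```python
import math

def dtw_distance(a, b):
    """Textbook dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):          # n rows ...
        for j in range(1, m + 1):      # ... times m columns: O(n * m) cells
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeats b[j-1]
                                     cost[i][j - 1],      # b[j-1] repeats a[i-1]
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

A series and a time-stretched copy of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, have DTW distance 0, which illustrates the time-axis flexibility the text describes.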
Existing methods for calculating time series similarity suffer from problems such as low accuracy and complex computation. To address these problems, the present invention provides a method for calculating time series similarity based on an improved distance, which allows time series similarity to be judged quickly and accurately.
Summary of the Invention
The object of the invention is to provide a method for calculating time series similarity based on an improved distance, in order to remedy the deficiencies of existing methods.
To achieve this object, the technical solution proposed by the invention is a method for calculating time series similarity, characterized in that the method comprises:
Step 1: divide the two time series to be compared, $S_1$ and $S_2$, into subsequences $S_1(i)$ and $S_2(i)$ in the same way, where $i = 1, 2, \ldots, n$ and $n$ is the number of subsequences;
Step 2: set the weight $w_i$ of each subsequence $S_j(i)$, where $j = 1, 2$ and $i = 1, 2, \ldots, n$;
Step 3: calculate the distance between corresponding subsequences;
Step 4: calculate the similarity of $S_1$ and $S_2$ from the distances between corresponding subsequences and the subsequence weights.
Dividing the two time series to be compared, $S_1$ and $S_2$, into subsequences $S_1(i)$ and $S_2(i)$ is done specifically as follows:
The subsequences are obtained by dividing the time series equally. If the number of elements $L_1$ of time series $S_1$ is an integer multiple of $n$, then $S_1$ is divided into $n$ segments, each segment being one subsequence, and each subsequence has $L_1/n$ elements.
If $L_1$ is not an integer multiple of $n$, then $S_1$ is still divided into $n$ segments, each segment being one subsequence. Either segments 1 to $n-1$ each have $[L_1/n]$ elements, where $[\cdot]$ denotes taking the integer part, and segment $n$ has $L_1 - (n-1)[L_1/n]$ elements; or segments 2 to $n$ each have $[L_1/n]$ elements and segment 1 has $L_1 - (n-1)[L_1/n]$ elements.
If the number of elements $L_2$ of time series $S_2$ is an integer multiple of $n$, then $S_2$ is divided into $n$ segments, each segment being one subsequence, and each subsequence has $L_2/n$ elements.
If $L_2$ is not an integer multiple of $n$, then $S_2$ is still divided into $n$ segments, each segment being one subsequence. Either segments 1 to $n-1$ each have $[L_2/n]$ elements, where $[\cdot]$ denotes taking the integer part, and segment $n$ has $L_2 - (n-1)[L_2/n]$ elements; or segments 2 to $n$ each have $[L_2/n]$ elements and segment 1 has $L_2 - (n-1)[L_2/n]$ elements.
The weight $w_i$ of each subsequence $S_j(i)$ is set according to the formula $w_i = L_j(i)/L_j$, where $L_j(i)$ is the number of elements of subsequence $S_j(i)$, $L_j$ is the number of elements of time series $S_j$, $j = 1, 2$, $i = 1, 2, \ldots, n$, and $n$ is the number of subsequences.
The distance between corresponding subsequences is calculated with the formula
$$d(S_1(i), S_2(i)) = \sqrt{\sum_{p=1}^{m} \left(S_{1p}(i) - S_{2p}(i)\right)^2} + \sigma$$
where:
$d(S_1(i), S_2(i))$ is the distance between the corresponding subsequences $S_1(i)$ and $S_2(i)$;
$S_{1p}(i)$ is the $p$-th element of subsequence $S_1(i)$;
$S_{2p}(i)$ is the $p$-th element of subsequence $S_2(i)$;
$m$ is the number of elements of subsequences $S_1(i)$ and $S_2(i)$, $i = 1, 2, \ldots, n$;
$\sigma$ is the standard deviation of the differences between corresponding elements of $S_1(i)$ and $S_2(i)$, with $\sigma = \sqrt{\frac{1}{m} \sum_{p=1}^{m} \left(S_{1p}(i) - S_{2p}(i) - \mu\right)^2}$;
$\mu$ is the arithmetic mean of the differences between corresponding elements of $S_1(i)$ and $S_2(i)$, with $\mu = \frac{1}{m} \sum_{p=1}^{m} \left(S_{1p}(i) - S_{2p}(i)\right)$.
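As a hedged worked example of these definitions, take one pair of segments with $m = 3$ elements, $S_1(i) = (1, 2, 3)$ and $S_2(i) = (2, 2, 2)$. The numbers are invented for illustration, and the square root over the sum of squared differences is an assumption about the original typeset formula, which the extracted text does not show.

```latex
\begin{aligned}
S_{1p}(i) - S_{2p}(i) &: \; -1,\ 0,\ 1 \\
\mu &= \tfrac{1}{3}\,(-1 + 0 + 1) = 0 \\
\sigma &= \sqrt{\tfrac{1}{3}\left((-1)^2 + 0^2 + 1^2\right)} = \sqrt{2/3} \approx 0.816 \\
d(S_1(i), S_2(i)) &= \sqrt{(-1)^2 + 0^2 + 1^2} + \sigma = \sqrt{2} + \sqrt{2/3} \approx 2.231
\end{aligned}
```

The $\sigma$ term is zero only when all point-wise differences are equal, so it adds a penalty whenever the two segments differ in shape rather than by a constant offset.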
The similarity of time series $S_1$ and $S_2$ is calculated with the formula
$$\mathrm{Sim}(S_1, S_2) = \frac{1}{1 + \sum_{i=1}^{n} w_i \times d(S_1(i), S_2(i))}$$
where:
$\mathrm{Sim}(S_1, S_2)$ is the similarity of time series $S_1$ and $S_2$;
$d(S_1(i), S_2(i))$ is the distance between the corresponding subsequences $S_1(i)$ and $S_2(i)$;
$w_i$ is the weight of subsequences $S_1(i)$ and $S_2(i)$;
$n$ is the number of subsequences.
The invention divides the time series into segments of equal or unequal length and sets a weight for each subsequence, which meets the practical needs of similarity judgment in different situations. In addition, compared with traditional distance calculations, the subsequence distance provided by the invention better reflects how similar the time series are in shape (that is, whether their local trends agree or differ), is closer to everyday human experience and visual comparison, and gives a more accurate judgment. Finally, the invention calculates the subsequence distances segment by segment and then calculates the similarity from those distances; the computational complexity is low, so time series similarity can be judged quickly.
Brief Description of the Drawings
Fig. 1 is a flow chart of the method for calculating time series similarity provided by the invention.
Detailed Description
The preferred embodiment is described in detail below with reference to the accompanying drawing. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its application.
As shown in Fig. 1, the method for calculating time series similarity provided by the invention comprises:
Step 1: divide the two time series to be compared, $S_1$ and $S_2$, into subsequences $S_1(i)$ and $S_2(i)$ in the same way, where $i = 1, 2, \ldots, n$ and $n$ is the number of subsequences.
How the series are segmented depends on what the application requires of the similarity judgment. If the local similarity of every period of the time series affects the overall similarity to the same degree, the subsequences can be obtained by dividing the time series equally: the series is divided by length into $n$ subsequences of equal length (if it cannot be divided evenly, the first or the last subsequence may differ in length from the others). The specific length of each subsequence depends on the characteristics of the time series and is set according to how strongly series in the given field fluctuate: the smaller the fluctuation of the data values, the longer the subsequences; the more violent the fluctuation, the shorter the subsequences. In general, the subsequence length is between 5 and 15.
If the local similarity of different periods affects the overall similarity to different degrees, that is, the data of one (or several) periods strongly influence the similarity of the whole series while the data of the other periods influence it less, the subsequences can be obtained by dividing the time series unequally. For example, the data of each period with the greater influence are made a separate segment, so that the length of the period is the length of that subsequence, and the data of the remaining periods are divided equally into subsequences.
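Unequal division can be sketched as cutting the series at caller-chosen segment lengths. This is an illustrative sketch only: the helper name `split_by_lengths`, the 24-point series, and the choice of which period counts as influential are all hypothetical and do not come from the patent.

```python
def split_by_lengths(series, lengths):
    """Cut `series` into consecutive segments whose sizes are given by `lengths`."""
    if any(length <= 0 for length in lengths):
        raise ValueError("every segment length must be positive")
    if sum(lengths) != len(series):
        raise ValueError("segment lengths must add up to the series length")
    segments, start = [], 0
    for length in lengths:
        segments.append(series[start:start + length])
        start += length
    return segments

# Hypothetical 24-point series: points 17-20 are treated as the influential
# period and isolated as their own segment; the rest is cut into other pieces.
hourly = list(range(24))
segments = split_by_lengths(hourly, [6, 6, 5, 4, 3])
```

Both series being compared would be cut with the same list of lengths, so that corresponding subsequences cover the same time points.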
Equal division is illustrated below. If the number of elements $L_1$ ($L_2$) of time series $S_1$ ($S_2$) is an integer multiple of $n$, then $S_1$ ($S_2$) is divided into $n$ segments. Each segment is one subsequence, and each subsequence has $L_1/n$ ($L_2/n$) elements. For example, if the time series to be divided has 20 elements, that is, data were collected at 20 time points, and it is to be divided into 5 subsequences, then every 4 elements (the data of 4 time points) form one segment, that is, one subsequence, which gives 5 subsequences.
If the number of elements $L_1$ ($L_2$) of time series $S_1$ ($S_2$) is not an integer multiple of $n$, then $S_1$ ($S_2$) is still divided into $n$ segments, each segment being one subsequence. Either segments 1 to $n-1$ each have $[L_1/n]$ ($[L_2/n]$) elements, where $[\cdot]$ denotes taking the integer part, and segment $n$ has $L_1 - (n-1)[L_1/n]$ ($L_2 - (n-1)[L_2/n]$) elements; or segments 2 to $n$ each have $[L_1/n]$ ($[L_2/n]$) elements and segment 1 has $L_1 - (n-1)[L_1/n]$ ($L_2 - (n-1)[L_2/n]$) elements. Taking again a time series of 20 elements, if it is to be divided into 6 subsequences, then segments 1 to 5 can each have 3 elements and segment 6 has 5 elements; or segments 2 to 6 can each have 3 elements and segment 1 has 5 elements.
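The equal-division rule can be sketched as follows. The function names and the `remainder_last` flag are hypothetical, chosen for illustration; the arithmetic follows the text, with Python's integer division `//` standing in for the integer-part operation $[\cdot]$.

```python
def equal_segment_lengths(total, n, remainder_last=True):
    """Return the n segment lengths for a series of `total` elements."""
    if n <= 0 or total < n:
        raise ValueError("need 1 <= n <= total")
    base = total // n                  # [L/n], the integer part
    odd = total - (n - 1) * base       # length of the one possibly unequal segment
    lengths = [base] * (n - 1)
    # The unequal segment goes last (segment n) or first (segment 1).
    return lengths + [odd] if remainder_last else [odd] + lengths

def split_equally(series, n, remainder_last=True):
    """Cut `series` into n consecutive segments using the lengths above."""
    segments, start = [], 0
    for length in equal_segment_lengths(len(series), n, remainder_last):
        segments.append(series[start:start + length])
        start += length
    return segments
```

For the 20-element examples in the text, `equal_segment_lengths(20, 5)` gives five segments of 4 elements, and `equal_segment_lengths(20, 6)` gives five segments of 3 elements followed by one of 5.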
Step 2: set the weight $w_i$ of each subsequence $S_j(i)$.
In the invention, the weight of each subsequence is set by its length: the weight of a subsequence is the ratio of the length of that subsequence to the length of the whole time series. The weight $w_i$ of subsequence $S_j(i)$ is calculated as
$$w_i = \frac{L_j(i)}{L_j} \tag{1}$$
In formula (1), $w_i$ is the weight of subsequences $S_1(i)$ and $S_2(i)$; $L_j(i)$ is the number of elements of subsequence $S_j(i)$, that is, the length of $S_j(i)$; $L_j$ is the number of elements of time series $S_j$, that is, the length of $S_j$; $j = 1, 2$; $i = 1, 2, \ldots, n$; and $n$ is the number of subsequences.
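Formula (1) translates directly into code. In this minimal sketch the function name `length_weights` is hypothetical, and the input is assumed to be the list of segment lengths of one of the two series.

```python
def length_weights(segment_lengths):
    """Formula (1): w_i = L_j(i) / L_j for each segment length L_j(i)."""
    total = sum(segment_lengths)       # L_j, the length of the whole series
    if total == 0:
        raise ValueError("series must contain at least one element")
    return [length / total for length in segment_lengths]
```

Because the segment lengths add up to the series length, weights computed this way always sum to 1.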
For subsequences obtained by unequal division, the weight of each subsequence can be set according to how strongly that subsequence influences the overall similarity of the time series: the greater the influence, the larger the weight; the smaller the influence, the smaller the weight. Every weight must lie in the interval (0, 1), and the weights of all subsequences obtained from the division must sum to 1, that is, satisfy the following constraint:
$$\sum_{i=1}^{n} w_i = 1 \tag{2}$$
In formula (2), $w_i$ is the weight of subsequences $S_1(i)$ and $S_2(i)$, and $n$ is the number of subsequences.
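One way to obtain weights that satisfy these conditions is to start from positive importance scores and normalize them. This is a sketch under that assumption; the patent does not prescribe how the weights are chosen, and both function names are hypothetical.

```python
import math

def normalize_weights(raw_scores):
    """Scale positive importance scores so they lie in (0, 1) and sum to 1."""
    if len(raw_scores) < 2:
        raise ValueError("need at least two subsequences for weights in (0, 1)")
    if any(score <= 0 for score in raw_scores):
        raise ValueError("importance scores must be positive")
    total = sum(raw_scores)
    return [score / total for score in raw_scores]

def check_weights(weights, tol=1e-9):
    """True if every weight is in (0, 1) and the weights satisfy constraint (2)."""
    in_range = all(0 < w < 1 for w in weights)
    return in_range and math.isclose(sum(weights), 1.0, abs_tol=tol)
```

For instance, scores `[1, 1, 2]` give the third subsequence twice the weight of each of the others.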
Step 3: calculate the distance between corresponding subsequences.
The distance between corresponding subsequences is calculated with the formula
$$d(S_1(i), S_2(i)) = \sqrt{\sum_{p=1}^{m} \left(S_{1p}(i) - S_{2p}(i)\right)^2} + \sigma \tag{3}$$
In formula (3), $d(S_1(i), S_2(i))$ is the distance between the corresponding subsequences $S_1(i)$ and $S_2(i)$; $S_{1p}(i)$ is the $p$-th element of subsequence $S_1(i)$, that is, the data of the $p$-th time point of $S_1(i)$; $S_{2p}(i)$ is the $p$-th element of subsequence $S_2(i)$; $m$ is the number of elements of subsequences $S_1(i)$ and $S_2(i)$; and $\sigma$ is the standard deviation of the differences between corresponding elements of $S_1(i)$ and $S_2(i)$, calculated as follows:
$$\sigma = \sqrt{\frac{1}{m} \sum_{p=1}^{m} \left(S_{1p}(i) - S_{2p}(i) - \mu\right)^2} \tag{4}$$
In formula (4), $\mu$ is the arithmetic mean of the differences between corresponding elements of $S_1(i)$ and $S_2(i)$, calculated as follows:
$$\mu = \frac{1}{m} \sum_{p=1}^{m} \left(S_{1p}(i) - S_{2p}(i)\right) \tag{5}$$
In formulas (3) and (4), $i = 1, 2, \ldots, n$, and $n$ is the number of subsequences.
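The distance of Step 3 can be sketched for a single pair of segments as below. The function name `segment_distance` is hypothetical. The standard deviation divides by $m$ as in the text; the square root over the sum of squared differences (that is, a Euclidean distance term) is an assumption about the original typeset formula.

```python
import math

def segment_distance(seg1, seg2):
    """Distance (3): Euclidean term plus the standard deviation of the differences."""
    if len(seg1) != len(seg2) or not seg1:
        raise ValueError("segments must be non-empty and of equal length")
    m = len(seg1)
    diffs = [a - b for a, b in zip(seg1, seg2)]
    mu = sum(diffs) / m                                        # formula (5)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in diffs) / m)   # formula (4)
    euclid = math.sqrt(sum(x * x for x in diffs))              # radical is assumed
    return euclid + sigma
```

When one segment is the other shifted by a constant, every difference equals the mean, so `sigma` is 0 and only the Euclidean term remains.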
Step 4: calculate the similarity of time series $S_1$ and $S_2$ from the distances between corresponding subsequences and the subsequence weights.
The similarity of time series $S_1$ and $S_2$ is calculated with the formula
$$\mathrm{Sim}(S_1, S_2) = \frac{1}{1 + \sum_{i=1}^{n} w_i \times d(S_1(i), S_2(i))} \tag{6}$$
In formula (6), $\mathrm{Sim}(S_1, S_2)$ is the similarity of time series $S_1$ and $S_2$, $d(S_1(i), S_2(i))$ is the distance between the corresponding subsequences $S_1(i)$ and $S_2(i)$, $w_i$ is the weight of subsequences $S_1(i)$ and $S_2(i)$, and $n$ is the number of subsequences.
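Putting the four steps together, a minimal end-to-end sketch might look like the following. It assumes two series of equal length, equal division with the remainder in the last segment, length-based weights from formula (1), and a Euclidean (square-root) term in the distance; the function name `similarity` is hypothetical.

```python
import math

def similarity(s1, s2, n):
    """Sim(S1, S2) = 1 / (1 + sum_i w_i * d(S1(i), S2(i))) with equal division."""
    if len(s1) != len(s2):
        raise ValueError("this sketch assumes series of equal length")
    total = len(s1)
    if not 1 <= n <= total:
        raise ValueError("need 1 <= n <= len(series)")
    base = total // n
    lengths = [base] * (n - 1) + [total - (n - 1) * base]     # Step 1
    weighted, start = 0.0, 0
    for length in lengths:
        seg1 = s1[start:start + length]
        seg2 = s2[start:start + length]
        diffs = [a - b for a, b in zip(seg1, seg2)]
        mu = sum(diffs) / length
        sigma = math.sqrt(sum((x - mu) ** 2 for x in diffs) / length)
        dist = math.sqrt(sum(x * x for x in diffs)) + sigma   # Step 3
        weighted += (length / total) * dist                   # Step 2 weight applied
        start += length
    return 1.0 / (1.0 + weighted)                             # Step 4
```

Identical series give a similarity of exactly 1, and the value decreases toward 0 as the weighted distance grows.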
The invention has the following notable advantages:
(1) The method divides the time series into segments of equal or unequal length according to the relationship between local and overall similarity, and sets the weight of each subsequence, so as to meet the practical similarity judgment needs of different users.
(2) Different segment lengths make it possible to characterize how similar the trends of the time series are at different time scales. This gives the similarity judgment a coarse or fine granularity suited to different needs, and gives the method good temporal multi-resolution characteristics.
(3) A new subsequence distance is designed that takes the shape of the time series into account. Compared with traditional distances, the new method better reflects how similar the time series are in shape (that is, whether their local trends agree or differ), is closer to everyday human experience and visual comparison, and gives a more accurate judgment.
(4) The computational complexity of the method is low, so time series similarity can be judged quickly.
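Advantage (3) can be illustrated with a small, hedged example. The three series below are invented for illustration, and the helper names are hypothetical. Two candidates are equally far from a reference in plain Euclidean distance, but only one of them follows the reference's rising trend; the added standard deviation term separates them. Each distance is a single pass over the data, which is consistent with advantage (4).

```python
import math

def euclidean(x, y):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def improved_distance(x, y):
    """Euclidean distance plus the standard deviation of the point-wise differences."""
    diffs = [a - b for a, b in zip(x, y)]
    mu = sum(diffs) / len(diffs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in diffs) / len(diffs))
    return euclidean(x, y) + sigma

reference = [1, 2, 3, 4]
shifted = [2, 3, 4, 5]   # same rising shape, constant offset of 1
zigzag = [0, 3, 2, 5]    # same point-wise error size, but the local trend differs

print(euclidean(reference, shifted), euclidean(reference, zigzag))                   # 2.0 2.0
print(improved_distance(reference, shifted), improved_distance(reference, zigzag))  # 2.0 3.0
```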
The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of the claims.

Claims (2)

1. A method for calculating time series similarity, characterized in that the method comprises:
Step 1: divide the two time series to be compared, $S_1$ and $S_2$, into subsequences $S_1(i)$ and $S_2(i)$ in the same way, where $i = 1, 2, \ldots, n$ and $n$ is the number of subsequences;
Step 2: set the weight $w_i$ of each subsequence $S_j(i)$, where $j = 1, 2$ and $i = 1, 2, \ldots, n$;
Step 3: calculate the distance between corresponding subsequences;
Step 4: calculate the similarity of $S_1$ and $S_2$ from the distances between corresponding subsequences and the subsequence weights;
wherein dividing the two time series to be compared, $S_1$ and $S_2$, into subsequences $S_1(i)$ and $S_2(i)$ is done specifically as follows:
the subsequences are obtained by dividing the time series equally; if the number of elements $L_1$ of time series $S_1$ is an integer multiple of $n$, then $S_1$ is divided into $n$ segments, each segment being one subsequence, and each subsequence has $L_1/n$ elements;
if $L_1$ is not an integer multiple of $n$, then $S_1$ is divided into $n$ segments, each segment being one subsequence; either segments 1 to $n-1$ each have $[L_1/n]$ elements, where $[\cdot]$ denotes taking the integer part, and segment $n$ has $L_1 - (n-1)[L_1/n]$ elements; or segments 2 to $n$ each have $[L_1/n]$ elements and segment 1 has $L_1 - (n-1)[L_1/n]$ elements;
if the number of elements $L_2$ of time series $S_2$ is an integer multiple of $n$, then $S_2$ is divided into $n$ segments, each segment being one subsequence, and each subsequence has $L_2/n$ elements;
if $L_2$ is not an integer multiple of $n$, then $S_2$ is divided into $n$ segments, each segment being one subsequence; either segments 1 to $n-1$ each have $[L_2/n]$ elements, where $[\cdot]$ denotes taking the integer part, and segment $n$ has $L_2 - (n-1)[L_2/n]$ elements; or segments 2 to $n$ each have $[L_2/n]$ elements and segment 1 has $L_2 - (n-1)[L_2/n]$ elements;
the weight $w_i$ of each subsequence $S_j(i)$ is set according to the formula $w_i = L_j(i)/L_j$, where $L_j(i)$ is the number of elements of subsequence $S_j(i)$, $L_j$ is the number of elements of time series $S_j$, $j = 1, 2$, $i = 1, 2, \ldots, n$, and $n$ is the number of subsequences;
the distance between corresponding subsequences is calculated with the formula $d(S_1(i), S_2(i)) = \sqrt{\sum_{p=1}^{m} \left(S_{1p}(i) - S_{2p}(i)\right)^2} + \sigma$, where:
$d(S_1(i), S_2(i))$ is the distance between the corresponding subsequences $S_1(i)$ and $S_2(i)$;
$S_{1p}(i)$ is the $p$-th element of subsequence $S_1(i)$;
$S_{2p}(i)$ is the $p$-th element of subsequence $S_2(i)$;
$m$ is the number of elements of subsequences $S_1(i)$ and $S_2(i)$, $i = 1, 2, \ldots, n$;
$\sigma$ is the standard deviation of the differences between corresponding elements of $S_1(i)$ and $S_2(i)$, with $\sigma = \sqrt{\frac{1}{m} \sum_{p=1}^{m} \left(S_{1p}(i) - S_{2p}(i) - \mu\right)^2}$;
$\mu$ is the arithmetic mean of the differences between corresponding elements of $S_1(i)$ and $S_2(i)$, with $\mu = \frac{1}{m} \sum_{p=1}^{m} \left(S_{1p}(i) - S_{2p}(i)\right)$.
2. The method according to claim 1, characterized in that the similarity of time series $S_1$ and $S_2$ is calculated with the formula $\mathrm{Sim}(S_1, S_2) = \frac{1}{1 + \sum_{i=1}^{n} w_i \times d(S_1(i), S_2(i))}$, where:
$\mathrm{Sim}(S_1, S_2)$ is the similarity of time series $S_1$ and $S_2$;
$d(S_1(i), S_2(i))$ is the distance between the corresponding subsequences $S_1(i)$ and $S_2(i)$;
$w_i$ is the weight of subsequences $S_1(i)$ and $S_2(i)$;
$n$ is the number of subsequences.
CN201310151558.5A 2013-04-26 2013-04-26 A kind of computational methods of time series similarity Expired - Fee Related CN103279643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310151558.5A CN103279643B (en) 2013-04-26 2013-04-26 A kind of computational methods of time series similarity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310151558.5A CN103279643B (en) 2013-04-26 2013-04-26 A kind of computational methods of time series similarity

Publications (2)

Publication Number Publication Date
CN103279643A CN103279643A (en) 2013-09-04
CN103279643B true CN103279643B (en) 2016-08-24

Family

ID=49062158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310151558.5A Expired - Fee Related CN103279643B (en) 2013-04-26 2013-04-26 A kind of computational methods of time series similarity

Country Status (1)

Country Link
CN (1) CN103279643B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160824

Termination date: 20200426
