CN110032585A - Two-layer time series symbolization method and device - Google Patents
- Publication number
- CN110032585A (application CN201910261214.7A)
- Authority
- CN
- China
- Prior art keywords
- time series
- point
- time
- sequence
- observation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a two-layer time series symbolization method and device that preserve the specific time interval over which each subsequence persists. The method comprises: grouping the time series according to the magnitude of its observations; for the grouped time series, determining the size of the symbol set and the value range corresponding to each symbol by Shannon-entropy adaptive clustering; obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method; determining the split points of the time series on the time axis according to the jump relationship between adjacent feature points across value ranges; and, according to the split points, determining the start and end times of each subsequence and the symbol of the value range in which it lies, thereby converting the time series into a symbolized sequence that contains temporal relationships. The invention relates to the field of data processing.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a two-layer time series symbolization method and device.
Background
A time series consists of a set of observations recorded at specific times, usually at a fixed time interval. In practice, however, time series of continuous values are inconvenient to analyze. Symbolizing a time series is a discretization technique for capturing its internal structure effectively, and it is widely used in engineering, science, sociology, economics and many other fields. In the prior art, however, symbolization is mostly done by simple clustering or by directly fixing the size of the symbol set. This easily loses information in the data and cannot reflect how long each state lasts. For example:
Prior art 1 uses Symbolic Aggregate Approximation (SAX): the size of the symbol set is specified manually, the value range of the time series is divided equally according to the number of symbols, and the mean of the subsequence within each divided interval is used as the representative symbol of that interval, thereby converting the time series into a symbolized sequence.
Prior art 2 performs the conversion with a clustering algorithm. For example, the K-Means clustering algorithm sets K initial cluster centers and updates them iteratively to obtain K clusters; each cluster corresponds to a distinct symbol, and the time series is then converted into the corresponding symbolized sequence.
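As a point of comparison, the equal-width scheme described for prior art 1 can be sketched in a few lines. This is only an illustration of the description above, not canonical SAX; the function name `equal_width_symbolize` and the parameters `n_symbols` and `segment_len` are hypothetical, and both must be chosen by hand, which is exactly the dependence on manual parameters that the text criticizes.

```python
from statistics import mean

def equal_width_symbolize(values, n_symbols, segment_len):
    """Equal-width value bins, one symbol per fixed-length segment mean."""
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 for a constant series to avoid dividing by zero.
    width = (hi - lo) / n_symbols or 1.0
    symbols = []
    for start in range(0, len(values), segment_len):
        seg_mean = mean(values[start:start + segment_len])
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((seg_mean - lo) / width), n_symbols - 1)
        symbols.append(chr(ord("a") + idx))
    return "".join(symbols)

result = equal_width_symbolize([1, 2, 9, 10, 5, 6, 1, 1], n_symbols=3, segment_len=2)
print(result)  # prints "acba"
```

The output keeps only the order of the states; how long each state lasted is not recoverable from it.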
Although prior art 1 and prior art 2 can both discretize a time series into the required symbolized sequence, the discretization requires repeated parameter tuning to reach the best result. Symbolization is an important data preprocessing step: besides cleaning the raw data, it should preserve as much of the information contained in the data as possible, and the method should be generally applicable; only then does a symbolization method show its value. The prior art mostly achieves its purpose through repeated parameter adjustment, and the parameters must be re-tuned for each different time series. More importantly, the resulting symbolized sequence does not reflect how long each state lasts; it only represents the order in which the different states occur. In summary, existing time series symbolization methods rely too heavily on manually set parameters throughout the process and easily lose information.
Summary of the invention
The technical problem to be solved by the present invention is to provide a two-layer time series symbolization method and device, so as to solve the problems in the prior art that parameters must be set manually during time series symbolization and that the duration of each state cannot be preserved.
To solve the above technical problem, an embodiment of the present invention provides a two-layer time series symbolization method, comprising:
Grouping the time series according to the magnitude of the observations in the time series;
For the grouped time series, determining the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon-entropy adaptive clustering;
Obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method;
Determining the split points of the time series on the time axis according to the jump relationship between adjacent feature points of the time series across value ranges;
According to the split points of the time series, determining the start and end times of each subsequence in the time series and the symbol corresponding to the value range in which it lies, and converting the time series into a symbolized sequence containing temporal relationships.
Further, grouping the time series according to the magnitude of the observations in the time series comprises:
Sorting the time series in ascending order of observation value;
Grouping the sorted time series on the principle that each interval contains data of the same kind, to obtain multiple initial intervals.
Further, determining, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol by Shannon-entropy adaptive clustering comprises:
S21, determining the entropy of the entire original time series by the Shannon entropy formula;
S22, merging any two adjacent intervals, determining the sum of the entropies of all intervals after the merge, and determining the difference between this sum and the entropy of the entire original time series;
S23, executing S22 iteratively, merging only one pair per iteration; after all candidate merges have been evaluated, merging the two adjacent intervals whose merge gives the largest difference;
S24, returning to S22 and S23, and terminating the iteration when the difference no longer changes; determining the size of the symbol set from the number of intervals at that point, and determining the value range corresponding to each symbol in the symbol set from the range of observations contained in each interval.
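Steps S21 to S24 can be sketched as a greedy merge loop. The entropy formula is not given in this passage, so the sketch rests on three labeled assumptions: each interval contributes -Σ (c/n)·log2(c/n) over its distinct values (c is a value's count in the interval, n the length of the whole series); the "difference" is the drop in total entropy caused by a merge; and "the difference no longer changes" is read as "no merge changes the total entropy". The names `interval_entropy` and `entropy_merge` are hypothetical.

```python
import math
from collections import Counter

def interval_entropy(interval, n):
    """Entropy contribution of one interval, with probabilities c / n (assumed form)."""
    counts = Counter(interval)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_merge(intervals, tol=1e-12):
    """Greedily merge adjacent intervals until no merge changes the total entropy."""
    intervals = [list(iv) for iv in intervals]
    n = sum(len(iv) for iv in intervals)
    while len(intervals) > 1:
        best_j, best_diff = None, tol
        for j in range(len(intervals) - 1):
            before = (interval_entropy(intervals[j], n)
                      + interval_entropy(intervals[j + 1], n))
            after = interval_entropy(intervals[j] + intervals[j + 1], n)
            diff = before - after  # entropy drop caused by this candidate merge
            if diff > best_diff:
                best_j, best_diff = j, diff
        if best_j is None:  # assumed stopping rule: no merge changes the total
            break
        intervals[best_j:best_j + 2] = [intervals[best_j] + intervals[best_j + 1]]
    return intervals

clusters = entropy_merge([[1, 1], [1, 2], [2, 3], [4, 5]])
ranges = [(min(c), max(c)) for c in clusters]  # one value range per symbol
```

Under these assumptions a merge only pays off when equal values straddle an interval boundary, so the loop ends with intervals that each hold "the same kind of data". The number of remaining intervals is the symbol-set size, and no symbol count is supplied by hand.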
Further, obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method comprises:
Compressing the time series by the minimum description length criterion and extracting the candidate feature points of the time series;
Analyzing, by the gradient method, the change in gradient between observation points in the time series, and taking the observation points whose gradient exceeds the gradient threshold as trend change points of the time series;
Adding the obtained trend change points to the candidate feature points of the time series to obtain the feature point sequence of the time series.
Further, analyzing the change in gradient between observation points by the gradient method and taking the observation points whose gradient exceeds the gradient threshold as trend change points of the time series comprises:
Determining the slope $k_{ij}$ between observation point $x_i$ and observation point $x_j$ in the time series, and the slope $k_{i(j+1)}$ between observation point $x_i$ and observation point $x_{j+1}$;
From the slopes $k_{ij}$ and $k_{i(j+1)}$, determining the gradient $\Delta_{ij}$ corresponding to observation points i and j by the formula $\Delta_{ij} = |k_{ij} - k_{i(j+1)}|$, $1 \le j \le n-1$, where n is the length of the time series;
For observation point i, determining its gradient threshold $\lambda_i$ by the formula [formula not reproduced in the source text], $n \ge 3$;
Judging whether the gradient at observation point i exceeds the gradient threshold $\lambda_i$, and if so, taking observation point i as a trend change point of the time series.
Further, determining the split points of the time series on the time axis according to the jump relationship between adjacent feature points across value ranges comprises:
Determining three trend states from the three jump relationships (rising, falling and steady) that two adjacent feature points of the time series can have across value ranges, wherein:
If two adjacent feature points lie in the same value range, the corresponding subsequence is in the first trend state;
If the value-range level rises between two adjacent feature points, the corresponding subsequence is in the second trend state;
If the value-range level falls between two adjacent feature points, the corresponding subsequence is in the third trend state;
Taking the feature points at which the value-range level jumps as the split points of the time series on the time axis.
Further, determining, according to the split points of the time series, the start and end times of each subsequence and the symbol of its value range, and converting the time series into a symbolized sequence containing temporal relationships, comprises:
Obtaining the middle time of each subsequence in the second trend state and in the third trend state, and, according to the relationship between the observation at that middle time and the adjacent value ranges, updating the split points of the time series on the time axis to obtain the final split points of the time series;
According to the updated split points, merging the observation points that lie in the same value range, and taking the leftmost and rightmost times of all observation points within that value range as the start and end times of the corresponding subsequence;
Obtaining the symbolized time series containing temporal relationships from the start and end times of each subsequence and the symbol corresponding to the value range in which each subsequence lies.
Further, determining, according to the split points of the time series, the start and end times of each subsequence in the time series and the symbol corresponding to its value range, and converting the time series into a symbolized sequence containing temporal relationships, comprises:
If the observation at the middle time lies in the preceding value range, updating the right-hand time-axis split point of the preceding subsequence with the middle time, which becomes that subsequence's new right-hand split point;
If the observation at the middle time lies in the following value range, updating the left-hand time-axis split point of the following subsequence with the middle time, which becomes that subsequence's new left-hand split point;
If the value range of the observation at the middle time does not change, leaving the time-axis split points of the adjacent subsequences unchanged.
An embodiment of the present invention further provides a two-layer time series symbolization device, comprising:
A grouping module, configured to group the time series according to the magnitude of the observations in the time series;
A first determining module, configured to determine, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon-entropy adaptive clustering;
An obtaining module, configured to obtain the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method;
A second determining module, configured to determine the split points of the time series on the time axis according to the jump relationship between adjacent feature points of the time series across value ranges;
A symbolization module, configured to determine, according to the split points of the time series, the start and end times of each subsequence in the time series and the symbol corresponding to the value range in which it lies, and to convert the time series into a symbolized sequence containing temporal relationships.
The above technical solution of the present invention has the following beneficial effects:
In the above solution, the time series is grouped according to the magnitude of its observations; for the grouped time series, the size of the symbol set and the value range corresponding to each symbol are determined by Shannon-entropy adaptive clustering; the feature point sequence of the time series is obtained by the minimum description length criterion and the gradient (slope) method; the split points of the time series on the time axis are determined according to the jump relationship between adjacent feature points across value ranges; and, according to the split points, the start and end times of each subsequence and the symbol corresponding to the value range in which it lies are determined, converting the time series into a symbolized sequence containing temporal relationships. Because the start and end times of each subsequence are preserved, the specific time interval over which each subsequence persists is preserved as well.
Brief description of the drawings
Fig. 1 is a flow diagram of the two-layer time series symbolization method provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a time series provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the calculation of L(H) and L(D|H) provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the principle of the distance measure provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the candidate feature points extracted according to the MDL criterion provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the trend change points extracted by the gradient method provided by an embodiment of the present invention;
Fig. 7 is a detailed schematic diagram of the two-layer symbolization process provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of the symbolization process under the traditional approach provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of the two-layer symbolization process provided by an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of the two-layer time series symbolization device provided by an embodiment of the present invention.
Detailed description
To make the technical problem to be solved, the technical solution and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
The present invention addresses the problems that parameters must be set manually in existing time series symbolization and that the duration of each state cannot be preserved, and provides a two-layer time series symbolization method and device.
Embodiment 1
As shown in Fig. 1, the two-layer time series symbolization method provided by an embodiment of the present invention comprises:
S1, grouping the time series according to the magnitude of the observations in the time series;
S2, for the grouped time series, determining the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon-entropy adaptive clustering;
S3, obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method;
S4, determining the split points of the time series on the time axis according to the jump relationship between adjacent feature points of the time series across value ranges;
S5, according to the split points of the time series, determining the start and end times of each subsequence in the time series and the symbol corresponding to the value range in which it lies, and converting the time series into a symbolized sequence containing temporal relationships.
With steps S1 to S5, the two-layer time series symbolization method of this embodiment preserves the start and end times of each subsequence in the resulting symbolized sequence, and therefore preserves the specific time interval over which each subsequence persists.
In a specific implementation of the foregoing two-layer time series symbolization method, further, grouping the time series according to the magnitude of the observations in the time series comprises:
Sorting the time series in ascending order of observation value;
Grouping the sorted time series on the principle that each interval contains data of the same kind, to obtain multiple initial intervals.
In this embodiment, for example, as shown in Fig. 2, the observations in the continuous-valued time series X are arranged in ascending order and the time series is preliminarily divided into p intervals $I_1, I_2, \ldots, I_j, \ldots, I_p$, where I denotes an interval of the preliminary division and $j \in [1, \ldots, p]$ is the interval index. This may specifically include the following steps:
S11, let the continuous-valued time series be $X = \langle (x_1,t_1),(x_2,t_2),\ldots,(x_i,t_i),\ldots,(x_n,t_n) \rangle$, where $x_i$ is the observation at time $t_i$ and n is the length of the time series. X is arranged in ascending order of the observations $x_i$; assuming $x_1 < \ldots < x_i < \ldots < x_n$, the sorted time series is $\langle (x_1,t_1),(x_2,t_2),\ldots,(x_i,t_i),\ldots,(x_n,t_n) \rangle$.
S12, the sorted time series is preliminarily divided evenly into p intervals, each containing data of the same kind. For ease of presentation, assume each interval has length n/p; the data in the j-th interval are then [expression not reproduced in the source text].
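Steps S11 and S12 amount to a sort followed by an even split. A minimal sketch, assuming the series is held as a list of (value, time) pairs and that p is chosen by the caller; the function name `initial_intervals` is hypothetical:

```python
def initial_intervals(series, p):
    """Sort (value, time) pairs by value and split them into p initial intervals."""
    ordered = sorted(series, key=lambda vt: vt[0])
    n = len(ordered)
    # Integer boundaries keep interval sizes within one element of n / p.
    bounds = [j * n // p for j in range(p + 1)]
    return [ordered[bounds[j]:bounds[j + 1]] for j in range(p)]

series = [(5.0, 0), (1.0, 1), (3.0, 2), (2.0, 3), (4.0, 4), (6.0, 5)]
intervals = initial_intervals(series, p=3)
```

Each observation keeps its time stamp through the sort, which is what later allows the value ranges to be mapped back onto the time axis.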
In a specific implementation of the foregoing two-layer time series symbolization method, further, determining, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol by Shannon-entropy adaptive clustering comprises:
S21, determining the entropy of the entire original time series by the Shannon entropy formula;
S22, merging any two adjacent intervals, determining the sum of the entropies of all intervals after the merge, and determining the difference between this sum and the entropy of the entire original time series;
S23, executing S22 iteratively, merging only one pair per iteration; after all candidate merges have been evaluated, merging the two adjacent intervals whose merge gives the largest difference;
S24, returning to S22 and S23, and terminating the iteration when the difference no longer changes; determining the size of the symbol set from the number of intervals at that point, and determining the value range corresponding to each symbol from the range of observations contained in each interval. The number of symbol classes in the symbol set is determined by the value ranges, so the number of symbol classes equals the number of intervals.
In this embodiment, the intervals I obtained by the preliminary division are merged iteratively, and the change in entropy between the entire original time series and the merged data is computed to obtain the cluster intervals. These determine the size of the symbol set, and the range of observations contained in each interval determines the value range corresponding to each symbol. This may specifically include the following steps:
B1, according to the information entropy formula given by Shannon, for any variable x the information entropy H(I) can be expressed as:
$H(I) = -\sum_{x} P(x) \log P(x)$
where P(x) is the probability that the variable x occurs. For example, given a sequence Y = [1,2,3,4,5,6,1,2], the probability P(1) that the value 1 occurs in Y equals 2/8, i.e. 0.25.
The entropy $H(I_j)$ of each interval is therefore obtained as: [formula not reproduced in the source text]
where $I_j$ denotes an interval of the preliminary division; n denotes the length of the time series; $m_j$ denotes the number of distinct values contained in interval $I_j$; and $n_{ji}$ denotes the number of data items of the i-th kind in interval $I_j$;
From the obtained $H(I_j)$, the entropy h(H) of the entire original time series is determined as: [formula not reproduced in the source text]
B2, merging any two adjacent intervals, for example interval $I_j$ and interval $I_{j+1}$, to obtain the sum h'(H) of the entropies of all intervals after the merge, expressed as: [formula not reproduced in the source text]
where $I'_j = I_j + I_{j+1}$;
and determining the difference between h'(H) and the entropy h(H) of the entire original time series.
B3, executing B2 iteratively, merging only one pair per iteration; after all candidate merges have been evaluated, determining the difference for each candidate merge and merging the two adjacent intervals whose merge gives the largest difference (the expression for the maximum is not reproduced in the source text);
B4, returning to B2 and B3, and terminating the iteration when the difference no longer changes; determining the size of the symbol set from the number of current intervals (also called cluster intervals), and determining the value range corresponding to each symbol from the range of observations contained in each interval.
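The formulas for $H(I_j)$, h(H) and h'(H) do not appear in this text. One reconstruction that is consistent with the symbol definitions given in B1 and B2 ($n$, $m_j$, $n_{ji}$, $I'_j$) is shown below. It is an assumption, not the patent's confirmed notation; in particular the logarithm base and the use of $n$ rather than the interval length in the denominator are inferred from the statement that n is the length of the time series.

```latex
H(I_j) = -\sum_{i=1}^{m_j} \frac{n_{ji}}{n} \log \frac{n_{ji}}{n},
\qquad
h(H) = \sum_{j=1}^{p} H(I_j),
\qquad
h'(H) = \sum_{k \ne j,\, j+1} H(I_k) + H(I'_j),
\quad I'_j = I_j + I_{j+1}
```

With this form, the quantity compared in B3 for each candidate pair would be $|h(H) - h'(H)|$.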
In a specific implementation of the foregoing two-layer time series symbolization method, further, obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method comprises:
Compressing the time series by the minimum description length criterion and extracting the candidate feature points of the time series;
Analyzing, by the gradient method, the change in gradient between observation points in the time series, and taking the observation points whose gradient exceeds the gradient threshold as trend change points of the time series;
Adding the obtained trend change points to the candidate feature points of the time series to obtain the feature point sequence of the time series.
In this embodiment, the continuous time series is compressed using the minimum description length (MDL) criterion and its candidate feature points are extracted. This may specifically include the following steps:
C1, according to the MDL criterion, computing the distance description between two adjacent observation points of the time series (an observation point comprises a time and an observation):
With reference to Fig. 3 and Fig. 4, L(H) is the description length of the hypothesis, and L(D|H) is the description length of the data given that the hypothesis holds; $x_{ck}$ denotes the ck-th candidate feature point in the time series; $len(x_{ck}x_{c(k+1)})$ denotes the length of the line segment $x_{ck}x_{c(k+1)}$; $d_\perp(x_{ck}x_{c(k+1)}, x_jx_{j+1})$ denotes the perpendicular distance between segment $x_{ck}x_{c(k+1)}$ and segment $x_jx_{j+1}$ ($ck \le j \le c(k+1)$); and $d_\theta(x_{ck}x_{c(k+1)}, x_jx_{j+1})$ denotes their angular distance;
The process can be analyzed with reference to Fig. 4, which gives: [formulas not reproduced in the source text]
where, referring to Fig. 4, θ is the angle, during feature point extraction, between the vector $L_j$ formed by connecting two adjacent observation points $x_j$ and $x_{j+1}$ and the vector $L_i$ formed by connecting the two neighboring candidate feature points $x_{ck}$ and $x_{c(k+1)}$; $x'_j$ is the projection of point $x_j$ onto $L_i$; $x'_{j+1}$ is the projection of $x_{j+1}$; $l_{\perp 1}$ is the Euclidean distance between $x_j$ and $x'_j$; and $l_{\perp 2}$ is the Euclidean distance between $x_{j+1}$ and $x'_{j+1}$.
C2, from the results of C1, computing the cost $MDL_{par}(x_{ck}, x_{c(k+1)})$ of treating point $x_{c(k+1)}$ as a candidate feature point and the cost $MDL_{nopar}(x_{ck}, x_{c(k+1)})$ of treating it as a non-candidate point. The point meets the requirement for a candidate feature point when the former is smaller than the latter and, as shown in Fig. 3, can then be taken as a candidate feature point of the time series, where $MDL_{par}(x_{ck}, x_{c(k+1)})$ and $MDL_{nopar}(x_{ck}, x_{c(k+1)})$ are expressed as:
$MDL_{par}(x_{ck}, x_{c(k+1)}) = L(H) + L(D|H)$
[expression for $MDL_{nopar}$ not reproduced in the source text]
When $MDL_{par}(x_{ck}, x_{c(k+1)}) < MDL_{nopar}(x_{ck}, x_{c(k+1)})$ is satisfied, the point is taken as a candidate feature point of the time series; otherwise it is a non-candidate point, and the case in which the next point is a candidate feature point is then considered.
In this embodiment, the candidate feature points extracted according to the MDL criterion are shown in Fig. 5.
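The two costs compared in C2 can be sketched as follows. The formulas for L(H), L(D|H), the perpendicular distance and the angular distance are not reproduced in this text, so the sketch assumes the definitions used by MDL-based trajectory partitioning in the TRACLUS algorithm, which match the quantities named around Fig. 4: $d_\perp = (l_{\perp 1}^2 + l_{\perp 2}^2)/(l_{\perp 1} + l_{\perp 2})$, $d_\theta = \|L_j\| \sin\theta$, L(H) as the log2 length of the representative segment, and $MDL_{nopar}$ as the summed log2 lengths of the original segments. All function names are hypothetical, and treating distances below 1 as costing 0 bits is an assumed convention.

```python
import math

def _log2_clamped(x):
    # Assumed convention: values below 1 cost 0 bits (also avoids log2(0)).
    return math.log2(max(x, 1.0))

def perpendicular_and_angular(a, b, c, d):
    """Distances between representative segment a->b and data segment c->d.

    Points are (time, value) pairs; a and b must be distinct.
    """
    abx, aby = b[0] - a[0], b[1] - a[1]
    cdx, cdy = d[0] - c[0], d[1] - c[1]
    ab_len = math.hypot(abx, aby)
    # Distances of c and d from the line through a and b (l_perp1, l_perp2).
    l1 = abs(abx * (c[1] - a[1]) - aby * (c[0] - a[0])) / ab_len
    l2 = abs(abx * (d[1] - a[1]) - aby * (d[0] - a[0])) / ab_len
    d_perp = 0.0 if l1 + l2 == 0 else (l1 ** 2 + l2 ** 2) / (l1 + l2)
    # |cd| * sin(theta), from the cross product of the two vectors.
    d_theta = abs(abx * cdy - aby * cdx) / ab_len
    return d_perp, d_theta

def mdl_costs(points, start, end):
    """Return (MDL_par, MDL_nopar) for replacing points[start..end] by one segment."""
    a, b = points[start], points[end]
    l_h = _log2_clamped(math.dist(a, b))
    l_dh = 0.0
    nopar = 0.0
    for j in range(start, end):
        c, d = points[j], points[j + 1]
        d_perp, d_theta = perpendicular_and_angular(a, b, c, d)
        l_dh += _log2_clamped(d_perp) + _log2_clamped(d_theta)
        nopar += _log2_clamped(math.dist(c, d))
    return l_h + l_dh, nopar

straight = [(0, 0), (10, 10), (20, 20), (30, 30)]
spike = [(0, 0), (10, 100), (20, 0)]
par_s, nopar_s = mdl_costs(straight, 0, 3)  # one segment is cheaper here
par_z, nopar_z = mdl_costs(spike, 0, 2)     # one segment is more expensive here
```

For the collinear points a single segment describes the data more cheaply than keeping every point, so the end point qualifies under the rule in C2. For the spike the opposite holds, which signals that an intermediate point must be kept.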
In this embodiment, the gradient method computes gradient values from adjacent observation points in the time series. For a given observation point $x_i$, the gradient threshold is determined from the mean and variance of the gradients over the whole time series, and the trend change points of the time series are then filtered out. This may specifically include the following steps:
D1, determining the slope $k_{ij}$ between observation point $x_i$ and observation point $x_j$ in the time series, and the slope $k_{i(j+1)}$ between observation point $x_i$ and observation point $x_{j+1}$;
D2, from the slopes $k_{ij}$ and $k_{i(j+1)}$, determining the gradient $\Delta_{ij}$ corresponding to observation points i and j by the formula $\Delta_{ij} = |k_{ij} - k_{i(j+1)}|$, $1 \le j \le n-1$, where n is the length of the time series;
D3, for observation point i, determining its gradient threshold $\lambda_i$ by the formula [formula not reproduced in the source text], $n \ge 3$;
D4, judging whether the gradient at observation point i exceeds the gradient threshold $\lambda_i$, and if so, taking observation point i as a trend change point of the time series.
In this embodiment, the trend change points extracted by the gradient method are shown in Fig. 6.
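Steps D1 to D4 can be illustrated with a simplified sketch. Two things here are assumptions, since the threshold formula is missing from this text: the gradient at an interior point is taken as the absolute difference between the slopes on either side of it, rather than the pairwise $\Delta_{ij}$ of D2, and a single threshold equal to the mean plus one population standard deviation of all gradients is used. The function name `trend_change_points` is hypothetical.

```python
from statistics import mean, pstdev

def trend_change_points(times, values):
    """Indices of interior points whose slope change exceeds mean + std (assumed rule)."""
    slopes = [(values[i + 1] - values[i]) / (times[i + 1] - times[i])
              for i in range(len(values) - 1)]
    # Gradient at interior point i: change between the slopes on either side.
    grads = {i: abs(slopes[i] - slopes[i - 1]) for i in range(1, len(values) - 1)}
    if not grads:
        return []
    g = list(grads.values())
    lam = mean(g) + pstdev(g)  # assumed threshold built from mean and spread
    return [i for i, grad in grads.items() if grad > lam]

times = [0, 1, 2, 3, 4, 5, 6]
values = [0, 1, 2, 3, 10, 11, 12]
changes = trend_change_points(times, values)  # indices 3 and 4 bracket the jump
```

The returned indices are then added to the MDL candidate points, so sharp local changes that compression alone might smooth over are still kept as feature points.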
In this embodiment, the obtained trend change points are added to the candidate feature points obtained by the MDL criterion, giving the feature point sequence of the time series.
In a specific implementation of the foregoing two-layer time series symbolization method, further, as shown in Fig. 7, the split points of the time series on the time axis are determined according to the jump relationship between adjacent feature points across value ranges. This completes the first layer of symbolization and may specifically include the following steps:
Determining three trend states from the three jump relationships (rising, falling and steady) that two adjacent feature points of the time series can have across value ranges, wherein:
If two adjacent feature points lie in the same value range, the corresponding subsequence is in the first trend state;
If the value-range level rises between two adjacent feature points, the corresponding subsequence is in the second trend state;
If the value-range level falls between two adjacent feature points, the corresponding subsequence is in the third trend state;
Taking the feature points at which the value-range level jumps as the split points of the time series on the time axis.
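The first-layer rule can be sketched as a pass over adjacent feature points. The sketch assumes that the value ranges are given by their ascending inclusive upper bounds and that both feature points bounding a jump are recorded as split points; the text does not say which of the two is meant. The names `level_of` and `trend_states` are hypothetical.

```python
import bisect

def level_of(value, upper_bounds):
    """Index of the value range containing value (upper bounds ascending, inclusive)."""
    return min(bisect.bisect_left(upper_bounds, value), len(upper_bounds) - 1)

def trend_states(feature_points, upper_bounds):
    """feature_points: list of (time, value). Returns (states, split_times)."""
    levels = [level_of(v, upper_bounds) for _, v in feature_points]
    states, splits = [], []
    for k in range(len(levels) - 1):
        if levels[k + 1] == levels[k]:
            states.append("steady")   # first trend state
            continue
        states.append("up" if levels[k + 1] > levels[k] else "down")
        # Assumption: both ends of a level jump become time-axis split points.
        for t in (feature_points[k][0], feature_points[k + 1][0]):
            if t not in splits:
                splits.append(t)
    return states, splits

points = [(0, 5), (1, 8), (2, 15), (3, 18), (4, 7)]
states, splits = trend_states(points, upper_bounds=[10, 20, 30])
```

Here `states` is `["steady", "up", "steady", "down"]` and `splits` is `[1, 2, 3, 4]`; the second layer then refines where, between times 1 and 2 and between times 3 and 4, each boundary actually falls.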
In a specific implementation of the foregoing two-layer time series symbolization method, further, as shown in Fig. 7, the start and end times of each subsequence and the symbol corresponding to the value range in which it lies are determined according to the split points of the time series, and the time series is converted into a symbolized sequence containing temporal relationships. This completes the second layer of symbolization and may specifically include the following steps:
Obtaining the middle time of each subsequence in the second trend state and in the third trend state, and, according to the relationship between the observation at that middle time and the adjacent value ranges, updating the split points of the time series on the time axis to obtain the final split points of the time series;
According to the updated split points, merging the observation points that lie in the same value range, and taking the leftmost and rightmost times of all observation points within that value range as the start and end times of the corresponding subsequence;
Obtaining the symbolized time series containing temporal relationships from the start and end times of each subsequence and the symbol corresponding to the value range in which each subsequence lies.
In this embodiment, the resulting symbolized sequence can be expressed as:
$[a,(t_1,t_2)], [b,(t_3,t_4)], \ldots \quad (t_1 < t_2 < t_3 < t_4 < \ldots)$
where a, b, ... denote symbols; $(t_1,t_2)$ are the start and end times of the state with symbol a; and $(t_3,t_4)$ are the start and end times of the state with symbol b.
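The output form above can be produced by merging consecutive observations that share a value range. A minimal sketch, assuming each observation has already been assigned a value-range index and that indices map to lowercase letters; the function name `to_symbol_intervals` is hypothetical:

```python
import string
from itertools import groupby

def to_symbol_intervals(times, levels):
    """Merge runs of equal value-range index into [symbol, (t_start, t_end)]."""
    out = []
    idx = 0
    for level, run in groupby(levels):
        length = len(list(run))
        # Leftmost and rightmost times of the run are its start and end times.
        out.append([string.ascii_lowercase[level],
                    (times[idx], times[idx + length - 1])])
        idx += length
    return out

sequence = to_symbol_intervals([0, 1, 2, 3, 4, 5], [0, 0, 1, 1, 1, 0])
print(sequence)  # prints [['a', (0, 1)], ['b', (2, 4)], ['a', (5, 5)]]
```

Unlike a plain symbol string, each entry records how long the state lasted, which is the temporal information the method is designed to keep.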
Further, in a specific implementation of the aforementioned time series double-layer symbolization method, determining, according to the cut points of the time series, the start and end time values of each subsequence in the time series and the symbol corresponding to the value range in which it lies, and converting the time series into a symbolized sequence containing temporal relations, further includes:
If the observation at the intermediate time lies in the preceding value range, the right-hand time-axis cut point of the preceding subsequence is updated with the intermediate time value, which becomes the new right-hand cut point of the preceding subsequence;
If the observation at the intermediate time lies in the following value range, the left-hand time-axis cut point of the following subsequence is updated with the intermediate time value, which becomes the new left-hand cut point of the following subsequence;
If the observation at the intermediate time shows no change of value range, the time-axis cut points of the adjacent subsequences are not updated.
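The three update rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: value ranges are modeled as half-open `(low, high)` tuples, and an observation that falls in neither adjacent range is treated as the "no update" case. The function name and all parameter names are hypothetical.

```python
def update_cut_points(prev_right, next_left, t_mid, v_mid, prev_range, next_range):
    """Return the (possibly updated) right cut point of the preceding subsequence
    and the left cut point of the following subsequence.

    prev_range and next_range are (low, high) tuples, assumed to mean low <= v < high.
    """
    def inside(value, value_range):
        low, high = value_range
        return low <= value < high

    if inside(v_mid, prev_range):
        # The intermediate observation still belongs to the preceding range:
        # extend the preceding subsequence up to the intermediate time.
        prev_right = t_mid
    elif inside(v_mid, next_range):
        # The intermediate observation already belongs to the following range:
        # start the following subsequence at the intermediate time.
        next_left = t_mid
    # Otherwise both cut points are left unchanged (assumed reading of the third rule).
    return prev_right, next_left


extended_prev = update_cut_points(10, 14, 12, 33.0, (0, 35), (35, 75))
extended_next = update_cut_points(10, 14, 12, 40.0, (0, 35), (35, 75))
unchanged = update_cut_points(10, 14, 12, 80.0, (0, 35), (35, 75))
```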
In this embodiment, as shown in Fig. 8 and Fig. 9, the time series double-layer symbolization retains the start and end time of each subsequence, and thereby retains the specific time interval over which each subsequence persists.
To better explain the time series double-layer symbolization method of this embodiment of the present invention, it is described in detail below in connection with a specific application:
The time series double-layer symbolization method of this embodiment of the present invention can be applied to air quality data, for example to symbolize air quality data comprising five attributes, namely PM2.5, PM10, NO2, O3 and SO2, where one value of each attribute is recorded every hour. Part of the data is selected for illustration, and the data set has the following form:
Table 1. Partial air quality data set
The observations of each attribute are sorted by value, and the time series of each attribute is preliminarily grouped;
For the grouped time series of each attribute, the value ranges of that attribute (its value grades) are obtained by adaptive clustering based on Shannon entropy, and a symbol is assigned to each value range, so that the number of symbols equals the number of value grades. The value ranges obtained by adaptive clustering are shown in Table 2:
Table 2. Value range division of the air quality data
As Table 2 shows, after adaptive clustering PM2.5, for example, is divided into 7 grades, each with its own value range.
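Once the value ranges are known, assigning a symbol to an observation is a range lookup. The Python sketch below shows one way to do it with the standard `bisect` module. The boundary values in `PM25_EDGES` are hypothetical placeholders, since the contents of Table 2 are not reproduced in this text, and the convention that a value equal to a boundary belongs to the upper grade is likewise an assumption.

```python
from bisect import bisect_right
import string


def symbol_for(value, boundaries, alphabet=string.ascii_lowercase):
    """Map a value to the symbol of its grade.

    boundaries holds the sorted interior edges between grades,
    so k edges define k + 1 grades.
    """
    return alphabet[bisect_right(boundaries, value)]


# Hypothetical edges giving 7 grades; these numbers are NOT taken from Table 2.
PM25_EDGES = [35, 75, 115, 150, 250, 350]

symbols = [symbol_for(v, PM25_EDGES) for v in (12, 35, 80, 400)]
```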
Then, feature points are extracted from the time series of each attribute of the air quality data, and the time series is represented by its feature point sequence;
According to the jump relation of adjacent feature points of the air quality data between value grades, the trend change states of the time series are determined (specifically, the time series is temporarily converted into three trend states); according to the intermediate time values of the subsequences in the second and third trend states, the positions of the cut points on the time axis are determined adaptively, which yields the specific time interval over which each subsequence persists;
Finally, each subsequence is converted into the corresponding symbolized sequence according to the value range in which it lies. The above is the application of the time series double-layer symbolization method of the present invention to air quality data; following the description above, the symbolization result for the data in Table 1 is shown in Table 3:
Table 3. Symbolization of the air quality data
Table 3 clearly shows the characteristics of the proposed method: it accurately discretizes a time series into a symbolized sequence that can be analyzed, while retaining the specific time range over which each state persists. In the subscript of x14, the first digit denotes the attribute and the second digit denotes the value grade of the measurement at that time. More importantly, the whole symbolization process requires no manually set parameters, which avoids information loss and similar problems caused by unsuitable parameters. As for applications, the symbolized time series can, as with conventional symbolization methods, be mined with association rule algorithms such as Apriori and FP-growth to obtain association rules that can guide decisions; in addition, the specific temporal relations between the itemsets within a rule can be obtained, giving more detailed rules. In sequential pattern mining, a series of consecutive state transitions can be obtained, which is well suited to applications such as air quality prediction and public health guidance.
Embodiment two
The present invention also provides a specific implementation of a time series double-layer symbolization device. Because the device provided by the present invention corresponds to the specific implementation of the aforementioned time series double-layer symbolization method, and can achieve the purpose of the present invention by carrying out the process steps of that method implementation, the explanations given for the method implementation also apply to the device implementation provided by the present invention and are not repeated below.
As shown in Fig. 10, an embodiment of the present invention also provides a time series double-layer symbolization device, comprising:
A grouping module 11, configured to group the time series according to the magnitude of the observations in the time series;
A first determining module 12, configured to determine, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon entropy adaptive clustering;
An obtaining module 13, configured to obtain the feature point sequence of the time series by the minimum description length criterion and the gradient slope method;
A second determining module 14, configured to determine the cut points of the time series on the time axis according to the jump relation of adjacent feature points of the time series between value ranges;
A symbolization module 15, configured to determine, according to the cut points of the time series, the start and end time values of each subsequence in the time series and the symbol corresponding to the value range in which it lies, and to convert the time series into a symbolized sequence containing temporal relations.
The time series double-layer symbolization device of this embodiment of the present invention groups the time series according to the magnitude of the observations in the time series; determines, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon entropy adaptive clustering; obtains the feature point sequence of the time series by the minimum description length criterion and the gradient slope method; determines the cut points of the time series on the time axis according to the jump relation of adjacent feature points of the time series between value ranges; and determines, according to the cut points of the time series, the start and end time values of each subsequence and the symbol corresponding to the value range in which it lies, converting the time series into a symbolized sequence containing temporal relations. The device thus retains the start and end time of each subsequence, and thereby the specific time interval over which each subsequence persists.
The above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (9)
1. A time series double-layer symbolization method, characterized by comprising:
grouping the time series according to the magnitude of the observations in the time series;
determining, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon entropy adaptive clustering;
obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient slope method;
determining the cut points of the time series on the time axis according to the jump relation of adjacent feature points of the time series between value ranges;
determining, according to the cut points of the time series, the start and end time values of each subsequence in the time series and the symbol corresponding to the value range in which it lies, and converting the time series into a symbolized sequence containing temporal relations.
2. The time series double-layer symbolization method according to claim 1, characterized in that grouping the time series according to the magnitude of the observations in the time series comprises:
sorting the time series in ascending order of the observations in the time series;
grouping the sorted time series on the principle that each interval contains data of the same kind, to obtain a plurality of initial intervals.
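The sorting and initial grouping of claim 2 can be sketched in Python as follows. The sketch assumes that "data of the same kind" means observations with equal values, which the text does not state explicitly; the function name `initial_intervals` is hypothetical.

```python
from itertools import groupby


def initial_intervals(observations):
    """Sort observations in ascending order and group equal values together."""
    ordered = sorted(observations)
    # After sorting, equal values are adjacent, so groupby yields one group per distinct value.
    return [list(group) for _, group in groupby(ordered)]


intervals = initial_intervals([3, 1, 2, 3, 1, 1])
```

Each initial interval then serves as a starting cluster for the entropy-based merging of claim 3.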
3. The time series double-layer symbolization method according to claim 1, characterized in that determining, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon entropy adaptive clustering comprises:
S21, determining the entropy of the entire original time series by the Shannon entropy formula;
S22, merging any two adjacent intervals, determining the sum of the entropies of all intervals after the merge, and determining the difference between that sum and the entropy of the entire original time series;
S23, executing S22 iteratively, with only one merge per iteration, and, after the iterations are complete, merging the two adjacent intervals whose merge produced the largest difference;
S24, returning to S22 and S23 until the difference no longer changes, then ending the iteration, determining the size of the symbol set from the number of current intervals, and determining the value range corresponding to each symbol in the symbol set from the range of the observations contained in each interval.
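Step S21 relies on the standard Shannon entropy, H = -Σ p log2 p, taken over the relative frequencies of the observed values. The Python sketch below covers only that calculation; the merge loop of S22 to S24 is not implemented here because the text leaves the per-interval entropy definition open. The function name is hypothetical, and the choice of base 2 is an assumption.

```python
from collections import Counter
from math import log2


def shannon_entropy(observations):
    """Shannon entropy (in bits) of the empirical distribution of the observations."""
    total = len(observations)
    if total == 0:
        return 0.0
    counts = Counter(observations)
    # p is the relative frequency of each distinct value.
    return -sum((c / total) * log2(c / total) for c in counts.values())


uniform_two = shannon_entropy([1, 1, 2, 2])
constant = shannon_entropy([5, 5, 5])
```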
4. The time series double-layer symbolization method according to claim 1, characterized in that obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient slope method comprises:
compressing the time series by the minimum description length criterion and extracting the candidate feature points of the time series;
analyzing, by the gradient method, the gradient change between observation points in the time series, and taking the observation points whose gradient is greater than the gradient threshold as trend change points of the time series;
adding the obtained trend change points to the candidate feature points of the time series to obtain the feature point sequence of the time series.
5. The time series double-layer symbolization method according to claim 4, characterized in that analyzing, by the gradient method, the gradient change between observation points in the time series, and taking the observation points whose gradient is greater than the gradient threshold as trend change points of the time series comprises:
determining the slope $k_{ij}$ between observation point $x_i$ and observation point $x_j$ in the time series, and the slope $k_{i(j+1)}$ between observation point $x_i$ and observation point $x_{j+1}$;
determining, from the slopes $k_{ij}$ and $k_{i(j+1)}$, the gradient $\Delta_{ij}$ corresponding to observation points i and j by the formula $\Delta_{ij} = |k_{ij} - k_{i(j+1)}|$, $1 \le j \le n-1$, where n denotes the length of the time series;
determining, for observation point i, the gradient threshold $\lambda_i$ of observation point i by a formula (the formula is not reproduced in this text);
determining whether the gradient corresponding to observation point i is greater than the gradient threshold $\lambda_i$, and if so, taking observation point i as a trend change point of the time series.
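The slope and gradient definitions of claim 5 can be sketched in Python as follows. The sketch makes several assumptions: the slope between two observation points is the difference in value divided by the difference in time, indices are zero-based, and indices j for which a slope would involve the point i itself are skipped. The threshold comparison is left out because the threshold formula is not available in this text. All function and variable names are hypothetical.

```python
def slope(t, x, i, j):
    """Slope k_ij of the line through observation points i and j (time stamps must differ)."""
    return (x[j] - x[i]) / (t[j] - t[i])


def gradients(t, x, i):
    """Return {j: |k_ij - k_i(j+1)|} for every j where both slopes are defined."""
    result = {}
    for j in range(len(x) - 1):
        if i in (j, j + 1):
            continue  # the slope from a point to itself is undefined
        result[j] = abs(slope(t, x, i, j) - slope(t, x, i, j + 1))
    return result


times = [0, 1, 2, 3]
values = [0, 1, 4, 9]
deltas = gradients(times, values, 0)
```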
6. The time series double-layer symbolization method according to claim 1, characterized in that determining the cut points of the time series on the time axis according to the jump relation of adjacent feature points of the time series between value ranges comprises:
determining three trend states from the three jump relations (rise, fall, and steady) of two adjacent feature points of the time series with respect to the value ranges; wherein,
if two adjacent feature points lie in the same value range, the corresponding subsequence is in the first trend state;
if the value grade rises between two adjacent feature points, the corresponding subsequence is in the second trend state;
if the value grade falls between two adjacent feature points, the corresponding subsequence is in the third trend state;
taking the feature points at which the value grade jumps as the cut points of the time series on the time axis.
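The three trend states of claim 6 can be sketched in Python as follows, assuming each feature point has already been mapped to an integer value grade. The function names are hypothetical, and returning both feature points of a jump as an index pair is an assumption, since the text does not say which of the two points serves as the cut point.

```python
def trend_states(feature_grades):
    """Classify each pair of adjacent feature points: 1 = steady, 2 = rise, 3 = fall."""
    states = []
    for prev, curr in zip(feature_grades, feature_grades[1:]):
        if curr == prev:
            states.append(1)
        elif curr > prev:
            states.append(2)
        else:
            states.append(3)
    return states


def jump_pairs(feature_grades):
    """Index pairs (i, i + 1) of adjacent feature points whose value grade differs."""
    return [
        (i, i + 1)
        for i, state in enumerate(trend_states(feature_grades))
        if state != 1
    ]


grades = [1, 1, 2, 2, 1]
states = trend_states(grades)
jumps = jump_pairs(grades)
```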
7. The time series double-layer symbolization method according to claim 6, characterized in that determining, according to the cut points of the time series, the start and end time values of each subsequence and the symbol representing its corresponding value range, and converting the time series into a symbolized sequence containing temporal relations comprises:
obtaining the intermediate time value of each subsequence in the second trend state and in the third trend state, and, according to the relationship between the observation at that intermediate time and the adjacent value ranges, updating the cut points of the time series on the time axis to obtain the final cut points of the time series;
merging, according to the updated cut points of the time series, the observation points that fall within the same value range, and taking the leftmost and rightmost times of all observation points within that value range as the start and end time values of the corresponding subsequence;
obtaining, according to the start and end time values of each subsequence and the symbol corresponding to the value range in which each subsequence lies, the symbolized sequence of the time series, which contains temporal relations.
8. The time series double-layer symbolization method according to claim 7, characterized in that determining, according to the cut points of the time series, the start and end time values of each subsequence in the time series and the symbol corresponding to the value range in which it lies, and converting the time series into a symbolized sequence containing temporal relations further comprises:
if the observation at the intermediate time lies in the preceding value range, updating the right-hand time-axis cut point of the preceding subsequence with the intermediate time value, which becomes the new right-hand cut point of the preceding subsequence;
if the observation at the intermediate time lies in the following value range, updating the left-hand time-axis cut point of the following subsequence with the intermediate time value, which becomes the new left-hand cut point of the following subsequence;
if the observation at the intermediate time shows no change of value range, not updating the time-axis cut points of the adjacent subsequences.
9. A time series double-layer symbolization device, characterized by comprising:
a grouping module, configured to group the time series according to the magnitude of the observations in the time series;
a first determining module, configured to determine, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon entropy adaptive clustering;
an obtaining module, configured to obtain the feature point sequence of the time series by the minimum description length criterion and the gradient slope method;
a second determining module, configured to determine the cut points of the time series on the time axis according to the jump relation of adjacent feature points of the time series between value ranges;
a symbolization module, configured to determine, according to the cut points of the time series, the start and end time values of each subsequence in the time series and the symbol corresponding to the value range in which it lies, and to convert the time series into a symbolized sequence containing temporal relations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910261214.7A CN110032585B (en) | 2019-04-02 | 2019-04-02 | Time sequence double-layer symbolization method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110032585A true CN110032585A (en) | 2019-07-19 |
CN110032585B CN110032585B (en) | 2021-11-30 |
Family
ID=67237225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910261214.7A Active CN110032585B (en) | 2019-04-02 | 2019-04-02 | Time sequence double-layer symbolization method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110032585B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113017628A (en) * | 2021-02-04 | 2021-06-25 | 山东师范大学 | Consciousness and emotion recognition method and system integrating ERP components and nonlinear features |
CN116155426A (en) * | 2023-04-19 | 2023-05-23 | 恩平市奥新电子科技有限公司 | Sound console operation abnormity monitoring method based on historical data |
Citations (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030163057A1 (en) * | 2002-02-22 | 2003-08-28 | Flick James T. | Method for diagnosing heart disease, predicting sudden death, and analyzing treatment response using multifractial analysis |
US20060026152A1 (en) * | 2004-07-13 | 2006-02-02 | Microsoft Corporation | Query-based snippet clustering for search result grouping |
CN101206762A (en) * | 2007-11-16 | 2008-06-25 | 中国科学院光电技术研究所 | Adaptive optical image high-resolution restoration method combining frame selection and blind deconvolution |
CN101496716A (en) * | 2009-02-26 | 2009-08-05 | 周洪建 | Measurement method for detecting sleep apnoea with ECG signal |
CN101655847A (en) * | 2008-08-22 | 2010-02-24 | 山东省计算中心 | Expansive entropy information bottleneck principle based clustering method |
CN101707575A (en) * | 2009-11-09 | 2010-05-12 | 东南大学 | Chaotic noise signal estimating method based on symbolic vector dynamics |
CN101714192A (en) * | 2009-11-13 | 2010-05-26 | 航天东方红卫星有限公司 | Satellite test data processing system |
CN101894125A (en) * | 2010-05-13 | 2010-11-24 | 复旦大学 | Content-based video classification method |
CN101894560A (en) * | 2010-06-29 | 2010-11-24 | 上海大学 | Reference source-free MP3 audio frequency definition objective evaluation method |
CN101916277A (en) * | 2010-08-11 | 2010-12-15 | 武大吉奥信息技术有限公司 | XML format-based geographic tile multi-pyramid temporal dataset generation method and device thereof |
CN102129525A (en) * | 2011-03-24 | 2011-07-20 | 华北电力大学 | Method for searching and analyzing abnormality of signals during vibration and process of steam turbine set |
CN103136327A (en) * | 2012-12-28 | 2013-06-05 | 中国矿业大学 | Time series signifying method based on local feature cluster |
CN103942425A (en) * | 2014-04-14 | 2014-07-23 | 中国人民解放军国防科学技术大学 | Data processing method and device |
US20150379110A1 (en) * | 2014-06-25 | 2015-12-31 | Vmware, Inc. | Automated methods and systems for calculating hard thresholds |
CN105242779A (en) * | 2015-09-23 | 2016-01-13 | 歌尔声学股份有限公司 | Method for identifying user action and intelligent mobile terminal |
US20160142266A1 (en) * | 2014-11-19 | 2016-05-19 | Battelle Memorial Institute | Extracting dependencies between network assets using deep learning |
CN106095787A (en) * | 2016-05-30 | 2016-11-09 | 重庆大学 | A kind of Symbolic Representation method of time series data |
CN107358156A (en) * | 2017-06-06 | 2017-11-17 | 华南理工大学 | The feature extracting method of Ultrasonic tissue characterization based on Hilbert-Huang transform |
CN107991097A (en) * | 2017-11-16 | 2018-05-04 | 西北工业大学 | A kind of Method for Bearing Fault Diagnosis based on multiple dimensioned symbolic dynamics entropy |
CN108595528A (en) * | 2018-03-29 | 2018-09-28 | 重庆大学 | A kind of multivariate time series are based on Fourier coefficient symbolism classification set creation method |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113017628A (en) * | 2021-02-04 | 2021-06-25 | 山东师范大学 | Consciousness and emotion recognition method and system integrating ERP components and nonlinear features |
CN113017628B (en) * | 2021-02-04 | 2022-06-10 | 山东师范大学 | Consciousness and emotion recognition method and system integrating ERP components and nonlinear features |
CN116155426A (en) * | 2023-04-19 | 2023-05-23 | 恩平市奥新电子科技有限公司 | Sound console operation abnormity monitoring method based on historical data |
Also Published As
Publication number | Publication date |
---|---|
CN110032585B (en) | 2021-11-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
RU2648946C2 (en) | Image object category recognition method and device | |
CN110852168A (en) | Pedestrian re-recognition model construction method and device based on neural framework search | |
CN106339416B (en) | Educational data clustering method based on grid fast searching density peaks | |
CN110738647B (en) | Mouse detection method integrating multi-receptive-field feature mapping and Gaussian probability model | |
CN105718960A (en) | Image ordering model based on convolutional neural network and spatial pyramid matching | |
CN107622322B (en) | Forecasting factor identification method of medium-long term runoff and forecasting method of medium-long term runoff | |
CN101196905A (en) | Intelligent pattern searching method | |
CN110032585A (en) | A kind of time series bilayer symbolism method and device | |
CN115270007B (en) | POI recommendation method and system based on mixed graph neural network | |
CN112084373A (en) | Multi-source heterogeneous network user alignment method based on graph embedding | |
CN109325510A (en) | A kind of image characteristic point matching method based on lattice statistical | |
CN110569883A (en) | Air quality index prediction method based on Kohonen network clustering and Relieff feature selection | |
CN113887698B (en) | Integral knowledge distillation method and system based on graph neural network | |
CN112559587B (en) | Track space-time semantic mode extraction method based on urban semantic map | |
CN110070120B (en) | Depth measurement learning method and system based on discrimination sampling strategy | |
CN112685573A (en) | Knowledge graph embedding training method and related device | |
Zheng et al. | Boundary adjusted network based on cosine similarity for temporal action proposal generation | |
CN112488063A (en) | Video statement positioning method based on multi-stage aggregation Transformer model | |
CN109543712B (en) | Method for identifying entities on temporal data set | |
CN115797309A (en) | Surface defect segmentation method based on two-stage incremental learning | |
CN115424012A (en) | Lightweight image semantic segmentation method based on context information | |
CN114647679A (en) | Hydrological time series motif mining method based on numerical characteristic clustering | |
CN114398991A (en) | Electroencephalogram emotion recognition method based on Transformer structure search | |
CN114611668A (en) | Vector representation learning method and system based on heterogeneous information network random walk | |
CN111079089B (en) | Base station data anomaly detection method based on interval division |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||