CN110032585A - Two-layer symbolization method and device for time series - Google Patents

Two-layer symbolization method and device for time series

Publication number
CN110032585A
Authority
CN
China
Prior art keywords
time series
point
time
sequence
observation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910261214.7A
Other languages
Chinese (zh)
Other versions
CN110032585B (en)
Inventor
王玲
李俊飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB
Priority to CN201910261214.7A
Publication of CN110032585A
Application granted
Publication of CN110032585B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a two-layer symbolization method and device for time series that preserve the specific time interval over which each subsequence persists. The method comprises: grouping the time series according to the magnitude of its observations; for the grouped time series, determining the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon-entropy adaptive clustering; obtaining the feature point sequence of the time series by the minimum description length criterion and a gradient (slope) method; determining the segmentation points of the time series on the time axis according to the transitions of adjacent feature points between value ranges; and, according to the segmentation points, determining the start and end times of each subsequence and the symbol of the value range in which it lies, thereby converting the time series into a symbolized sequence that contains temporal relationships. The invention relates to the field of data processing.

Description

Two-layer symbolization method and device for time series
Technical field
The present invention relates to the field of data processing, and in particular to a two-layer symbolization method and device for time series.
Background
A time series consists of a set of observations recorded at specific times, usually at a fixed time interval. In practical applications, however, continuous-valued time series are inconvenient to analyze. Symbolization is a discretization technique for capturing the internal structure of a time series effectively, and it is widely used in engineering, science, sociology, economics and many other fields. In the prior art, however, symbolization is mostly performed by simple clustering or by directly specifying the size of the symbol set. This easily loses information in the data and cannot reflect how long each state of the data lasts. For example:
Prior art 1 uses Symbolic Aggregate Approximation (SAX). The size of the symbol set is specified manually, the value domain of the time series is divided equally according to the number of symbols, and the mean of the subsequence in each divided interval is then used as the representative symbol of that interval, so that the time series is converted into a symbolized sequence.
Prior art 2 performs the symbolic conversion with a clustering algorithm. For example, the K-Means clustering algorithm sets K initial cluster centers and updates them iteratively to obtain K clusters, each of which corresponds to a different symbol, so that the time series is converted into the corresponding symbolized sequence.
Although prior art 1 and prior art 2 can both discretize a time series into the required symbolized sequence, the discretization requires repeated parameter tuning to reach the best result. Symbolization is an important data preprocessing step: besides the initial data cleaning, it should preserve as much of the information contained in the data as possible, and the method should be generally applicable. The prior art mostly achieves its goal by repeated parameter adjustment, and the parameters must be adjusted again for every different time series. More importantly, the resulting symbolized sequence cannot reflect how long each state lasts; it only represents the order in which the different states occur. In summary, the existing time series symbolization methods rely too heavily on manually set parameters throughout and easily lose information in the data.
Summary of the invention
The technical problem to be solved by the present invention is to provide a two-layer symbolization method and device for time series, so as to solve the problems in the prior art that parameters must be set manually during symbolization and that the duration of each state cannot be preserved.
To solve the above technical problem, an embodiment of the present invention provides a two-layer symbolization method for time series, comprising:
grouping the time series according to the magnitude of the observations in the time series;
for the grouped time series, determining the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon-entropy adaptive clustering;
obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method;
determining the segmentation points of the time series on the time axis according to the transitions of adjacent feature points of the time series between value ranges;
according to the segmentation points of the time series, determining the start and end times of each subsequence in the time series and the symbol corresponding to the value range in which it lies, and converting the time series into a symbolized sequence containing temporal relationships.
Further, grouping the time series according to the magnitude of the observations in the time series comprises:
sorting the time series in increasing order of its observations;
grouping the sorted time series on the principle that each interval contains data of the same kind, to obtain multiple initial intervals.
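The two grouping steps above can be sketched as follows. This is a minimal sketch, assuming the equal-size preliminary division that the embodiment uses for ease of presentation; the function name `group_sorted_observations` and the parameter `p` (number of initial intervals) are illustrative and do not come from the patent text.

```python
def group_sorted_observations(series, p):
    """Sort (value, time) pairs by value and split them into p initial intervals.

    Assumption: intervals are as equal in size as possible; the first
    len(series) % p intervals receive one extra element.
    """
    ordered = sorted(series, key=lambda vt: vt[0])  # increasing observation value
    size, extra = divmod(len(ordered), p)
    intervals, start = [], 0
    for j in range(p):
        end = start + size + (1 if j < extra else 0)
        intervals.append(ordered[start:end])
        start = end
    return intervals


# Hypothetical toy series of (observation, time) pairs.
series = [(3.0, 0), (1.0, 1), (2.0, 2), (5.0, 3), (4.0, 4), (6.0, 5)]
initial_intervals = group_sorted_observations(series, p=3)
```

Each observation keeps its time stamp, so the original temporal order can still be recovered after the value-based grouping.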
Further, determining, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon-entropy adaptive clustering comprises:
S21: determining the entropy of the entire original time series using the Shannon entropy formula;
S22: merging any two adjacent intervals, determining the sum of the entropies of all intervals after the merge, and determining the difference between this sum and the entropy of the entire original time series;
S23: executing S22 iteratively, with only one merge per iteration, and, after the iteration is complete, merging the two adjacent intervals whose merge gives the largest difference;
S24: returning to S22 and S23 and terminating the iteration when the difference no longer changes; determining the size of the symbol set from the current number of intervals, and determining the value range corresponding to each symbol in the symbol set from the range of the observations contained in each interval.
Further, obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method comprises:
compressing the time series by the minimum description length criterion and extracting the candidate feature points of the time series;
analyzing, by the gradient method, the change in gradient between observation points in the time series, and taking the observation points whose gradient exceeds the gradient threshold as the trend change points of the time series;
adding the obtained trend change points to the candidate feature points of the time series to obtain the feature point sequence of the time series.
Further, analyzing, by the gradient method, the change in gradient between observation points in the time series and taking the observation points whose gradient exceeds the gradient threshold as the trend change points of the time series comprises:
determining the slope $k_{ij}$ between observation point $x_i$ and observation point $x_j$ in the time series, and determining the slope $k_{i(j+1)}$ between observation point $x_i$ and observation point $x_{j+1}$;
from the obtained slopes $k_{ij}$ and $k_{i(j+1)}$, determining the gradient $\Delta_{ij}$ corresponding to observation points $i$ and $j$ by the formula $\Delta_{ij} = |k_{ij} - k_{i(j+1)}|$, $1 \le j \le n-1$, where $n$ denotes the length of the time series;
for observation point $i$, determining the gradient threshold $\lambda_i$ of observation point $i$ by the threshold formula, where $n \ge 3$;
determining whether the gradient of observation point $i$ is greater than the gradient threshold $\lambda_i$ and, if so, taking observation point $i$ as a trend change point of the time series.
Further, determining the segmentation points of the time series on the time axis according to the transitions of adjacent feature points of the time series between value ranges comprises:
determining three trend states from the three kinds of transition that two adjacent feature points of the time series can make between value ranges, namely rising, falling and steady, wherein:
if two adjacent feature points lie in the same value range, the corresponding subsequence is in the first trend state;
if the value-range level rises between two adjacent feature points, the corresponding subsequence is in the second trend state;
if the value-range level falls between two adjacent feature points, the corresponding subsequence is in the third trend state;
taking the feature points at which the value-range level jumps as the segmentation points of the time series on the time axis.
Further, determining, according to the segmentation points of the time series, the start and end times of each subsequence and the symbol of its value range, and converting the time series into a symbolized sequence containing temporal relationships comprises:
obtaining the middle time value of each subsequence in the second trend state and the third trend state, and updating the segmentation points of the time series on the time axis according to the relation between the observation at the middle time and the adjacent value ranges, to obtain the final segmentation points of the time series;
according to the updated segmentation points, merging the observation points that lie in the same value range, and taking the leftmost and rightmost times of all observation points in the same value range as the start and end times of the corresponding subsequence;
according to the start and end times of each subsequence and the symbol of the value range in which it lies, obtaining the symbolized time series containing temporal relationships.
Further, determining, according to the segmentation points of the time series, the start and end times of each subsequence in the time series and the symbol of the value range in which it lies, and converting the time series into a symbolized sequence containing temporal relationships further comprises:
if the observation at the middle time lies in the preceding value range, updating the right-hand segmentation point of the preceding subsequence with the middle time value, which becomes the new right-hand segmentation point of the preceding subsequence;
if the observation at the middle time lies in the following value range, updating the left-hand segmentation point of the following subsequence with the middle time value, which becomes the new left-hand segmentation point of the following subsequence;
if the observation at the middle time shows no change of value range, leaving the segmentation points of the adjacent subsequences unchanged.
An embodiment of the present invention also provides a two-layer symbolization device for time series, comprising:
a grouping module, configured to group the time series according to the magnitude of the observations in the time series;
a first determining module, configured to determine, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon-entropy adaptive clustering;
an obtaining module, configured to obtain the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method;
a second determining module, configured to determine the segmentation points of the time series on the time axis according to the transitions of adjacent feature points of the time series between value ranges;
a symbolization module, configured to determine, according to the segmentation points of the time series, the start and end times of each subsequence in the time series and the symbol of the value range in which it lies, and to convert the time series into a symbolized sequence containing temporal relationships.
The above technical solution of the present invention has the following beneficial effects:
In the above solution, the time series is grouped according to the magnitude of its observations; the size of the symbol set and the value range of each symbol are determined for the grouped time series by Shannon-entropy adaptive clustering; the feature point sequence is obtained by the minimum description length criterion and the gradient (slope) method; the segmentation points on the time axis are determined from the transitions of adjacent feature points between value ranges; and the start and end times of each subsequence, together with the symbol of its value range, are determined from the segmentation points, so that the time series is converted into a symbolized sequence containing temporal relationships. The start and end times of each subsequence are preserved, and therefore so is the specific time interval over which each subsequence persists.
Brief description of the drawings
Fig. 1 is a flow chart of the two-layer symbolization method for time series provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of a time series provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the calculation of L(H) and L(D|H) provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the distance measures provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the candidate feature points extracted according to the MDL criterion, provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the trend change points extracted by the gradient method, provided by an embodiment of the present invention;
Fig. 7 is a detailed schematic diagram of the two-layer symbolization process provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of the symbolization process under the traditional approach, provided by an embodiment of the present invention;
Fig. 9 is a schematic diagram of the two-layer symbolization process provided by an embodiment of the present invention;
Fig. 10 is a structural diagram of the two-layer symbolization device for time series provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the technical problem to be solved, the technical solution and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.
To address the problems that existing time series symbolization requires manually set parameters and cannot preserve the duration of each state, the present invention provides a two-layer symbolization method and device for time series.
Embodiment 1
As shown in Fig. 1, the two-layer symbolization method for time series provided by an embodiment of the present invention comprises:
S1: grouping the time series according to the magnitude of the observations in the time series;
S2: for the grouped time series, determining the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon-entropy adaptive clustering;
S3: obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method;
S4: determining the segmentation points of the time series on the time axis according to the transitions of adjacent feature points of the time series between value ranges;
S5: according to the segmentation points of the time series, determining the start and end times of each subsequence in the time series and the symbol of the value range in which it lies, and converting the time series into a symbolized sequence containing temporal relationships.
By carrying out steps S1 to S5, the two-layer symbolization method of this embodiment converts the time series into a symbolized sequence containing temporal relationships. It preserves the start and end times of each subsequence, and therefore the specific time interval over which each subsequence persists.
In a specific implementation of the above two-layer symbolization method, further, grouping the time series according to the magnitude of the observations in the time series comprises:
sorting the time series in increasing order of its observations;
grouping the sorted time series on the principle that each interval contains data of the same kind, to obtain multiple initial intervals.
In this embodiment, for example, as shown in Fig. 2, the observations in a continuous-valued time series X are arranged in increasing order, and the time series is preliminarily divided into p intervals $I_1, I_2, \ldots, I_j, \ldots, I_p$, where I denotes a preliminarily divided interval and $j \in [1, \ldots, p]$ is the interval index. This can specifically include the following steps:
S11: assume a continuous-valued time series $X = \langle (x_1,t_1), (x_2,t_2), \ldots, (x_i,t_i), \ldots, (x_n,t_n) \rangle$, where $x_i$ denotes the observation at time $t_i$ and the length of the time series is n. The time series X is arranged in increasing order of the observations $x_i$. Assuming $x_1 < \ldots < x_i < \ldots < x_n$, the sorted time series is $\langle (x_1,t_1), (x_2,t_2), \ldots, (x_i,t_i), \ldots, (x_n,t_n) \rangle$.
S12: the sorted time series is preliminarily divided into p intervals of equal size, each containing data of the same kind. For ease of presentation, assume that the length of each interval is n/p; the data in the j-th interval are then
In a specific implementation of the above two-layer symbolization method, further, determining, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon-entropy adaptive clustering comprises:
S21: determining the entropy of the entire original time series using the Shannon entropy formula;
S22: merging any two adjacent intervals, determining the sum of the entropies of all intervals after the merge, and determining the difference between this sum and the entropy of the entire original time series;
S23: executing S22 iteratively, with only one merge per iteration, and, after the iteration is complete, merging the two adjacent intervals whose merge gives the largest difference;
S24: returning to S22 and S23 and terminating the iteration when the difference no longer changes; determining the size of the symbol set from the current number of intervals, and determining the value range corresponding to each symbol in the symbol set from the range of the observations contained in each interval. The number of symbol classes in the symbol set is determined by the value ranges, so the number of symbol classes equals the number of intervals.
In this embodiment, the preliminarily divided intervals I of the time series are merged iteratively, and the change between the entropy of the entire original time series and the entropy after merging is computed to obtain the cluster intervals, which determine the size of the symbol set. The value range corresponding to each symbol in the symbol set is determined from the range of the observations contained in each interval. This can specifically include the following steps:
B1: according to the formula for information entropy given by Shannon, the information entropy H(I) of an arbitrary variable x can be expressed as:
$$H(I) = -\sum_{x} P(x) \log P(x)$$
where P(x) denotes the probability that the variable x occurs. For example, given a sequence Y = [1,2,3,4,5,6,1,2], the probability P(1) that the value 1 occurs in Y is 2/8, i.e. 0.25.
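The worked example can be checked numerically. The sketch below assumes base-2 logarithms, since the text does not state the base, and the helper names `value_probabilities` and `shannon_entropy` are illustrative.

```python
from collections import Counter
from math import log2


def value_probabilities(seq):
    """Relative frequency P(x) of each distinct value x in seq."""
    n = len(seq)
    return {value: count / n for value, count in Counter(seq).items()}


def shannon_entropy(seq):
    """Shannon entropy -sum P(x) * log2 P(x) over the distinct values of seq."""
    return -sum(p * log2(p) for p in value_probabilities(seq).values())


Y = [1, 2, 3, 4, 5, 6, 1, 2]
p_one = value_probabilities(Y)[1]   # 2 occurrences out of 8 -> 0.25
entropy_Y = shannon_entropy(Y)      # 2.5 bits for this sequence
```

The values 1 and 2 each have probability 0.25 and the other four values 0.125, which gives 2 × 0.5 + 4 × 0.375 = 2.5 bits.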
Therefore, the entropy $H(I_j)$ of each interval is obtained as:
where $I_j$ denotes a preliminarily divided interval; n denotes the length of the time series; $m_j$ denotes the number of distinct values contained in interval $I_j$; and $n_{ji}$ denotes the number of data of the i-th kind in interval $I_j$.
From the obtained $H(I_j)$, the entropy h(H) of the entire original time series is determined as:
B2: merge any two adjacent intervals, for example interval $I_j$ and interval $I_{j+1}$, to obtain the sum h'(H) of the entropies of all intervals after the merge, expressed as:
where $I'_j = I_j + I_{j+1}$.
Determine the difference between h'(H) and the entropy h(H) of the entire original time series.
B3: execute B2 iteratively, with only one merge per iteration. After the iteration is complete, the differences for all possible merges are known, and the two adjacent intervals whose merge gives the largest difference are merged.
B4: return to B2 and B3 and terminate the iteration when the difference no longer changes. The size of the symbol set is determined from the number of current intervals (also called cluster intervals), and the value range corresponding to each symbol is determined from the range of the observations contained in each interval.
In a specific implementation of the above two-layer symbolization method, further, obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient (slope) method comprises:
compressing the time series by the minimum description length criterion and extracting the candidate feature points of the time series;
analyzing, by the gradient method, the change in gradient between observation points in the time series, and taking the observation points whose gradient exceeds the gradient threshold as the trend change points of the time series;
adding the obtained trend change points to the candidate feature points of the time series to obtain the feature point sequence of the time series.
In this embodiment, the continuous-valued time series is compressed using the minimum description length (MDL) criterion and the candidate feature points of the time series are extracted. This can specifically include the following steps:
C1: according to the MDL criterion, compute the distance description between two adjacent observation points of the time series (an observation point comprises a time and an observation):
This is explained with reference to Fig. 3 and Fig. 4. L(H) is the description length of the hypothesis, and L(D|H) is the description length of the data given that the hypothesis holds. $x_{ck}$ denotes the ck-th candidate feature point in the time series; $len(x_{ck}x_{c(k+1)})$ denotes the length of line segment $x_{ck}x_{c(k+1)}$; $d_\perp(x_{ck}x_{c(k+1)}, x_jx_{j+1})$ denotes the perpendicular distance between line segment $x_{ck}x_{c(k+1)}$ and line segment $x_jx_{j+1}$ ($ck \le j \le c(k+1)$); and $d_\theta(x_{ck}x_{c(k+1)}, x_jx_{j+1})$ denotes the angular distance between them.
This process can be analyzed with reference to Fig. 4, which gives:
where, referring to Fig. 4, θ denotes the angle, during feature point extraction, between the vector $L_j$ formed by connecting two adjacent observation points $x_j$ and $x_{j+1}$ and the vector $L_i$ formed by connecting the two neighbouring candidate feature points $x_{ck}$ and $x_{c(k+1)}$; $x'_j$ is the projection of point $x_j$ onto $L_i$; $x'_{j+1}$ is the projection of $x_{j+1}$; $l_{\perp 1}$ denotes the Euclidean distance between $x_j$ and $x'_j$; and $l_{\perp 2}$ denotes the Euclidean distance between $x_{j+1}$ and $x'_{j+1}$.
C2: from the results of C1, compute the MDL cost $MDL_{par}(x_{ck}, x_{c(k+1)})$ when point $x_{c(k+1)}$ is taken as a candidate feature point and the MDL cost $MDL_{nopar}(x_{ck}, x_{c(k+1)})$ when it is not. When the former is less than the latter, the point meets the requirement for a candidate feature point and, as shown in Fig. 3, can be taken as a candidate feature point of the time series. $MDL_{par}(x_{ck}, x_{c(k+1)})$ and $MDL_{nopar}(x_{ck}, x_{c(k+1)})$ are expressed as:
$MDL_{par}(x_{ck}, x_{c(k+1)}) = L(H) + L(D|H)$
When $MDL_{par}(x_{ck}, x_{c(k+1)}) < MDL_{nopar}(x_{ck}, x_{c(k+1)})$, the point is taken as a candidate feature point of the time series; otherwise it is a non-candidate point, and the case in which the next point is a candidate feature point is then considered.
In this embodiment, the candidate feature points extracted according to the MDL criterion are shown in Fig. 5.
In this embodiment, the gradient method computes gradient values from adjacent observation points in the time series. For a given observation point $x_i$, the gradient threshold is determined from the gradient mean and variance of the whole time series, and the trend change points of the time series are selected. This can specifically include the following steps:
D1: determine the slope $k_{ij}$ between observation point $x_i$ and observation point $x_j$ in the time series, and determine the slope $k_{i(j+1)}$ between observation point $x_i$ and observation point $x_{j+1}$;
D2: from the obtained slopes $k_{ij}$ and $k_{i(j+1)}$, determine the gradient $\Delta_{ij}$ corresponding to observation points $i$ and $j$ by the formula $\Delta_{ij} = |k_{ij} - k_{i(j+1)}|$, $1 \le j \le n-1$, where $n$ denotes the length of the time series;
D3: for observation point $i$, determine the gradient threshold $\lambda_i$ of observation point $i$ by the threshold formula, where $n \ge 3$;
D4: determine whether the gradient of observation point $i$ is greater than the gradient threshold $\lambda_i$ and, if so, take observation point $i$ as a trend change point of the time series.
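Steps D1 to D4 can be illustrated with a simplified sketch. Two assumptions are made here that are not taken from the patent: slopes are computed between adjacent observation points only, and the threshold is the mean of all gradients plus one population standard deviation, standing in for the patent's threshold formula, which is not reproduced in this text. The function name `trend_change_points` is illustrative.

```python
from statistics import mean, pstdev


def trend_change_points(values, times):
    """Return indices of interior points whose slope change exceeds a threshold.

    Assumed threshold: mean(gradients) + pstdev(gradients); the patent only
    states that the gradient mean and variance are used.
    """
    # Slope of the segment between each pair of adjacent observation points.
    slopes = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    ]
    # Gradient at interior point i: absolute change between incoming and outgoing slope.
    gradients = [abs(slopes[i] - slopes[i - 1]) for i in range(1, len(slopes))]
    if not gradients:  # fewer than 3 observations, nothing to compare
        return []
    threshold = mean(gradients) + pstdev(gradients)
    # gradients[g] belongs to the interior point with index g + 1.
    return [g + 1 for g, delta in enumerate(gradients) if delta > threshold]


# Hypothetical series with a sharp jump between t = 3 and t = 4.
values = [0, 1, 2, 3, 10, 11, 12]
times = [0, 1, 2, 3, 4, 5, 6]
change_points = trend_change_points(values, times)
```

With this toy input the points on either side of the jump are flagged, because only there does the slope change by more than the assumed threshold.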
In this embodiment, the trend change points extracted by the gradient method are shown in Fig. 6.
In this embodiment, the obtained trend change points of the time series are added to the candidate feature points obtained by the MDL criterion, giving the feature point sequence of the time series.
In a specific implementation of the above two-layer symbolization method, further, as shown in Fig. 7, the segmentation points of the time series on the time axis are determined according to the transitions of adjacent feature points of the time series between value ranges. This completes the first layer of symbolization and can specifically include the following steps:
determining three trend states from the three kinds of transition that two adjacent feature points of the time series can make between value ranges, namely rising, falling and steady, wherein:
if two adjacent feature points lie in the same value range, the corresponding subsequence is in the first trend state;
if the value-range level rises between two adjacent feature points, the corresponding subsequence is in the second trend state;
if the value-range level falls between two adjacent feature points, the corresponding subsequence is in the third trend state;
taking the feature points at which the value-range level jumps as the segmentation points of the time series on the time axis.
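The classification into three trend states can be sketched as follows. This is an illustrative sketch, assuming that the value ranges are described by a sorted list of inner boundaries and that a value equal to a boundary belongs to the higher range; the names `range_level`, `trend_states` and the labels "steady", "rise" and "fall" are not from the patent.

```python
from bisect import bisect_right


def range_level(value, boundaries):
    """Index of the value range containing value, given sorted inner boundaries."""
    return bisect_right(boundaries, value)


def trend_states(feature_points, boundaries):
    """Classify each pair of adjacent (value, time) feature points.

    Returns the list of states and the (start_time, end_time) of every
    pair whose value-range level jumps.
    """
    levels = [range_level(value, boundaries) for value, _ in feature_points]
    states, jump_segments = [], []
    for k in range(len(levels) - 1):
        if levels[k + 1] == levels[k]:
            states.append("steady")      # first trend state
        elif levels[k + 1] > levels[k]:
            states.append("rise")        # second trend state
        else:
            states.append("fall")        # third trend state
        if states[-1] != "steady":
            jump_segments.append((feature_points[k][1], feature_points[k + 1][1]))
    return states, jump_segments


# Hypothetical feature points and value-range boundaries (three ranges).
feature_points = [(1.0, 0), (1.5, 2), (3.2, 5), (3.0, 7), (0.5, 9)]
states, jumps = trend_states(feature_points, boundaries=[2.0, 4.0])
```

The end points of each jump segment are the feature points at which the level changes, so they are the natural candidates for segmentation points on the time axis.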
In the specific embodiment of aforesaid time sequence bilayer symbolism method, further, as shown with 7, according to when Between sequence cut-point, determine the start/stop time value and its corresponding symbol of locating codomain range of each subsequence in time series Number, time series is converted to the symbolism sequence comprising temporal relationship can specifically include for completing second layer symbolism Following steps:
The intermediate time value for obtaining second of trend state subsequence He the third trend state subsequence, in described Between moment value corresponding observation and adjacent value ranges range relationship, renewal time sequence obtains most in the cut-point of time shaft The cut-point of whole time series;
According to the cut-point of updated time series, the observation point within the scope of same codomain is merged, is obtained same Start/stop time value of the leftmost side moment and rightmost side moment of all observation points as corresponding subsequence within the scope of one codomain;
According to the corresponding symbol of codomain range locating for the start/stop time value of each subsequence and each subsequence, when obtaining including The time series symbolism sequence of state relationship.
In this embodiment, the resulting symbolized sequence can be expressed as:
$[a,(t_1,t_2)],[b,(t_3,t_4)],\ldots\ (t_1<t_2<t_3<t_4<\cdots)$
where a, b, ... denote symbols; $(t_1,t_2)$ are the start and end times of the state with symbol a; $(t_3,t_4)$ are the start and end times of the state with symbol b.
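The form of this output can be illustrated with a short sketch. It is a simplification under stated assumptions: it merges consecutive observations that fall in the same value range and ignores the segmentation-point update, and the names `level_of` and `symbolize`, the upper-bound representation of value ranges, and the sample data are hypothetical.

```python
from bisect import bisect_left
from itertools import groupby
from typing import List, Sequence, Tuple


def level_of(value: float, upper_bounds: Sequence[float]) -> int:
    """Index of the value range containing `value` (assumed representation:
    ascending inclusive upper bounds of all ranges except the last)."""
    return bisect_left(upper_bounds, value)


def symbolize(
    observations: Sequence[Tuple[float, float]],
    upper_bounds: Sequence[float],
    symbols: Sequence[str],
) -> List[Tuple[str, Tuple[float, float]]]:
    """Turn (time, value) observations into [symbol, (start, end)] entries."""
    labelled = [(t, symbols[level_of(v, upper_bounds)]) for t, v in observations]
    result = []
    # Consecutive observations with the same symbol form one subsequence.
    for symbol, run in groupby(labelled, key=lambda pair: pair[1]):
        times = [t for t, _ in run]
        # Leftmost and rightmost times are the start and end time values.
        result.append((symbol, (times[0], times[-1])))
    return result


# Hypothetical hourly observations and two value ranges split at 35.
obs = [(1, 10), (2, 12), (3, 40), (4, 45), (5, 20)]
sequence = symbolize(obs, [35], "ab")
```

Each entry pairs a symbol with the interval over which that state persists, which is the temporal information that a plain symbol string would lose.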
In a specific embodiment of the foregoing time series double-layer symbolization method, further, determining, according to the segmentation points of the time series, the start and end time values of each subsequence in the time series and the symbol corresponding to its value range, and converting the time series into a symbolized sequence containing temporal relationships includes:
If the observation at the intermediate time lies in the preceding value range, the right-hand time-axis segmentation point of the preceding subsequence is updated with the intermediate time value, which becomes the new right-hand segmentation point of the preceding subsequence;
If the observation at the intermediate time lies in the following value range, the left-hand time-axis segmentation point of the following subsequence is updated with the intermediate time value, which becomes the new left-hand segmentation point of the following subsequence;
If the observation at the intermediate time shows no change of value range, the time-axis segmentation points of the adjacent subsequences are not updated.
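The three update rules above can be sketched as a single function. This is only one possible reading: the name `refine_transition`, the use of the middle index of a rising or falling subsequence as its intermediate time, and the upper-bound representation of value ranges are assumptions for illustration. The third rule is read here as the case in which the intermediate observation lies in neither adjacent value range.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def level_of(value: float, upper_bounds: Sequence[float]) -> int:
    """Index of the value range containing `value` (assumed representation:
    ascending inclusive upper bounds of all ranges except the last)."""
    return bisect_left(upper_bounds, value)


def refine_transition(
    values: Sequence[float],
    i_left: int,
    i_right: int,
    upper_bounds: Sequence[float],
) -> Tuple[int, int]:
    """Refine the two segmentation points around a rising or falling subsequence.

    `i_left` is the index of the right-hand segmentation point of the preceding
    subsequence and `i_right` the index of the left-hand segmentation point of
    the following one. Returns the (possibly updated) pair of indices.
    """
    mid = (i_left + i_right) // 2  # assumed definition of the intermediate time
    prev_level = level_of(values[i_left], upper_bounds)
    next_level = level_of(values[i_right], upper_bounds)
    mid_level = level_of(values[mid], upper_bounds)
    if mid_level == prev_level:
        return mid, i_right   # extend the preceding subsequence to the midpoint
    if mid_level == next_level:
        return i_left, mid    # start the following subsequence at the midpoint
    return i_left, i_right    # midpoint in neither range: leave both unchanged


# Hypothetical values with two value ranges split at 35.
new_left, new_right = refine_transition([10, 11, 12, 40, 41], 0, 4, [35])
```

In the hypothetical call, the midpoint observation 12 still lies in the preceding value range, so the preceding subsequence is extended to index 2 while the following subsequence keeps its left-hand point.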
In this embodiment, as shown in Fig. 8 and Fig. 9, the time series double-layer symbolization described herein retains the start and end times of each subsequence, and thereby retains the specific time interval over which each subsequence persists.
For a better understanding of the time series double-layer symbolization method described in the embodiment of the present invention, it is described in detail below with reference to a specific application:
The time series double-layer symbolization method described in the embodiment of the present invention can be applied to air quality data. For example, air quality data containing 5 attributes, namely PM2.5, PM10, NO2, O3 and SO2, is symbolized, where the data of each attribute is acquired once every hour. Part of the data is selected for description, and the data set has the following form:
Table 1 Partial air quality data set
The observations of each attribute are sorted by magnitude, and the time series of each attribute is preliminarily grouped;
For the grouped time series of each attribute, adaptive clustering based on Shannon entropy is used to obtain the value ranges (value level division) corresponding to each attribute, and a symbol is assigned to each value range, so that the number of symbols equals the number of value levels. The value ranges obtained by adaptive clustering are shown in Table 2:
Table 2 Value range division of the air quality data
From the results in Table 2 it can be seen that, after adaptive clustering, PM2.5, for example, is divided into 7 levels, each with its own value range.
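The grouping step and a single merge decision of the Shannon entropy clustering (steps S21 to S23 of the method) can be sketched as follows. Several points are assumptions rather than statements of the source: the entropy contribution of an interval is taken as $-p\log_2 p$ with $p$ the fraction of observations it contains, the difference is taken as the merged entropy sum minus the entropy of the original series, and the function names are hypothetical. The stopping rule of S24 is not modeled.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def initial_interval_counts(values: Sequence[float]) -> List[int]:
    """Sort the observations and group identical values into initial intervals.

    Returns the number of observations in each interval, in ascending order
    of the observed value.
    """
    counts = Counter(values)
    return [counts[v] for v in sorted(counts)]


def partition_entropy(counts: Sequence[int]) -> float:
    """Sum of -p*log2(p) over intervals, p = share of observations (assumed)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def best_adjacent_merge(counts: Sequence[int], h_original: float) -> Tuple[int, float]:
    """Try every merge of two adjacent intervals and pick the best one.

    Returns the index i of the pair (i, i + 1) whose merge gives the largest
    difference, and that difference (merged entropy sum minus `h_original`,
    an assumed sign convention).
    """
    best_index, best_diff = -1, -math.inf
    for i in range(len(counts) - 1):
        merged = list(counts[:i]) + [counts[i] + counts[i + 1]] + list(counts[i + 2:])
        diff = partition_entropy(merged) - h_original
        if diff > best_diff:
            best_index, best_diff = i, diff
    return best_index, best_diff


# Hypothetical observations: values 1 and 2 are rare, value 3 is frequent.
counts = initial_interval_counts([3, 1, 3, 2, 3, 3, 3, 3, 3, 3])
index, diff = best_adjacent_merge(counts, partition_entropy(counts))
```

Under these assumptions the two rare neighbouring intervals are merged first, because joining them loses the least entropy. Repeating the step until the stopping condition holds would leave one interval per symbol.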
Then, feature points are extracted from the time series corresponding to each attribute of the air quality data, and the time series is represented by its feature point sequence;
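For the gradient slope part of feature point extraction, the method defines the gradient of observation points i and j as $\Delta_{ij} = |k_{ij} - k_{i(j+1)}|$, where $k_{ij}$ is the slope between observation points $x_i$ and $x_j$. A minimal sketch of that computation is given below. The function names are hypothetical, j is assumed to run over the points after i, and the gradient threshold is not computed because its formula is not reproduced in this text.

```python
from typing import List, Sequence


def slope(times: Sequence[float], values: Sequence[float], i: int, j: int) -> float:
    """Slope k_ij of the line through observation points i and j."""
    return (values[j] - values[i]) / (times[j] - times[i])


def gradients_from(times: Sequence[float], values: Sequence[float], i: int) -> List[float]:
    """Gradients |k_ij - k_i(j+1)| for j = i + 1, ..., n - 2 (0-based indices)."""
    n = len(values)
    return [
        abs(slope(times, values, i, j) - slope(times, values, i, j + 1))
        for j in range(i + 1, n - 1)
    ]


# Hypothetical hourly series with a jump between the third and fourth points.
times = [0, 1, 2, 3, 4]
values = [0, 1, 2, 5, 5]
deltas = gradients_from(times, values, 0)
```

A large gradient indicates that the direction from point i changes sharply between j and j + 1, which is what marks a candidate trend change point once a threshold is applied.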
According to the jump relationships between value levels of adjacent feature points of the air quality data, the trend change states of the time series are determined (specifically, the time series is temporarily converted into the three trend states). According to the intermediate time values of the subsequences in the second trend state and the third trend state, the segmentation point positions on the time axis are determined adaptively, which yields the specific time interval over which each subsequence persists;
Finally, each subsequence is converted into the corresponding symbolized sequence according to the value range in which it lies. The above is the application of the time series double-layer symbolization method of the present invention to air quality data; following this description, the symbolization result for Table 1 is shown in Table 3:
Table 3 Symbolization of the air quality data
The characteristics of the proposed method can be seen clearly from Table 3: the time series is accurately discretized into a symbolized sequence that can be analyzed, while the specific time range over which each state persists is retained. In $x_{14}$, the first digit of the subscript denotes the attribute and the second digit denotes the value level of the measured value at that time. More importantly, the whole symbolization process requires no manually set parameters, which avoids information loss and similar problems caused by unreasonable parameter choices. As for applications, a time series symbolized by this method can, as with conventional symbolization methods, be mined with association rule mining algorithms such as Apriori and FP-growth to obtain association rules that can guide decision-making. In addition, the specific temporal relationships between the itemsets within a rule can be obtained, which yields more detailed rules. Used in sequential pattern mining, the method yields a series of continuous state transitions, which is well suited to applications such as air quality prediction and public health guidance.
Embodiment two
The present invention also provides a specific embodiment of a time series double-layer symbolization device. Since the time series double-layer symbolization device provided by the present invention corresponds to the specific embodiment of the foregoing time series double-layer symbolization method, the device can achieve the purpose of the present invention by executing the process steps of the above method embodiment. The explanations given in the above method embodiment therefore also apply to the specific embodiment of the time series double-layer symbolization device provided by the present invention and are not repeated in the following embodiment.
As shown in Fig. 10, an embodiment of the present invention further provides a time series double-layer symbolization device, comprising:
A grouping module 11, configured to group the time series according to the magnitude of the observations in the time series;
A first determining module 12, configured to determine, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon entropy adaptive clustering;
An obtaining module 13, configured to obtain the feature point sequence of the time series by the minimum description length criterion and the gradient slope method;
A second determining module 14, configured to determine the segmentation points of the time series on the time axis according to the jump relationship of adjacent feature points of the time series across value ranges;
A symbolization module 15, configured to determine, according to the segmentation points of the time series, the start and end time values of each subsequence in the time series and the symbol corresponding to its value range, and to convert the time series into a symbolized sequence containing temporal relationships.
The time series double-layer symbolization device described in the embodiment of the present invention groups the time series according to the magnitude of the observations in the time series; determines, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon entropy adaptive clustering; obtains the feature point sequence of the time series by the minimum description length criterion and the gradient slope method; determines the segmentation points of the time series on the time axis according to the jump relationship of adjacent feature points across value ranges; and determines, according to the segmentation points, the start and end time values of each subsequence and the symbol corresponding to its value range, converting the time series into a symbolized sequence containing temporal relationships. The device thus retains the start and end times of each subsequence, and thereby the specific time interval over which each subsequence persists.
The above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (9)

1. A time series double-layer symbolization method, characterized by comprising:
grouping the time series according to the magnitude of the observations in the time series;
for the grouped time series, determining the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon entropy adaptive clustering;
obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient slope method;
determining the segmentation points of the time series on the time axis according to the jump relationship of adjacent feature points of the time series across value ranges;
determining, according to the segmentation points of the time series, the start and end time values of each subsequence in the time series and the symbol corresponding to its value range, and converting the time series into a symbolized sequence containing temporal relationships.
2. The time series double-layer symbolization method according to claim 1, characterized in that grouping the time series according to the magnitude of the observations in the time series comprises:
sorting the time series in ascending order of the observations in the time series;
grouping the sorted time series on the principle that each interval contains data of the same kind, to obtain a plurality of initial intervals.
3. The time series double-layer symbolization method according to claim 1, characterized in that determining, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon entropy adaptive clustering comprises:
S21, determining the entropy of the entire original time series by the Shannon entropy formula;
S22, merging any two adjacent intervals, determining the sum of the entropies of all intervals after merging, and determining the difference between that sum and the entropy of the entire original time series;
S23, executing S22 iteratively, with only one merge per iteration, and, after the iteration is complete, merging the two adjacent intervals whose merge gave the largest difference;
S24, returning to S22 and S23, ending the iteration when the difference no longer changes, determining the size of the symbol set from the number of current intervals, and determining the value range corresponding to each symbol in the symbol set from the range of the observations contained in each interval.
4. The time series double-layer symbolization method according to claim 1, characterized in that obtaining the feature point sequence of the time series by the minimum description length criterion and the gradient slope method comprises:
compressing the time series by the minimum description length criterion and extracting candidate feature points of the time series;
analyzing, by the gradient method, the gradient changes between observation points in the time series, and taking the observation points whose gradient is greater than a gradient threshold as trend change points of the time series;
adding the obtained trend change points of the time series to the candidate feature points of the time series to obtain the feature point sequence of the time series.
5. The time series double-layer symbolization method according to claim 4, characterized in that analyzing, by the gradient method, the gradient changes between observation points in the time series and taking the observation points whose gradient is greater than a gradient threshold as trend change points of the time series comprises:
determining the slope $k_{ij}$ between observation point $x_i$ and observation point $x_j$ in the time series, and determining the slope $k_{i(j+1)}$ between observation point $x_i$ and observation point $x_{j+1}$;
determining, from the obtained slopes $k_{ij}$ and $k_{i(j+1)}$, the gradient $\Delta_{ij}$ corresponding to observation points i and j by the formula $\Delta_{ij} = |k_{ij} - k_{i(j+1)}|,\ 1 \le j \le n-1$, where n denotes the length of the time series;
for observation point i, determining the gradient threshold $\lambda_i$ of observation point i by the formula [formula missing];
judging whether the gradient corresponding to observation point i is greater than the gradient threshold $\lambda_i$ and, if so, taking observation point i as a trend change point of the time series.
6. The time series double-layer symbolization method according to claim 1, characterized in that determining the segmentation points of the time series on the time axis according to the jump relationship of adjacent feature points of the time series across value ranges comprises:
determining three trend states from the three jump relationships (rising, falling, and steady) that two adjacent feature points of the time series exhibit across value ranges; wherein,
if two adjacent feature points lie in the same value range, the corresponding subsequence is in the first trend state;
if the value level rises between two adjacent feature points, the corresponding subsequence is in the second trend state;
if the value level falls between two adjacent feature points, the corresponding subsequence is in the third trend state;
taking the feature points at which the value level jumps as the segmentation points of the time series on the time axis.
7. The time series double-layer symbolization method according to claim 6, characterized in that determining, according to the segmentation points of the time series, the start and end time values of each subsequence and the symbol of its corresponding value range, and converting the time series into a symbolized sequence containing temporal relationships comprises:
obtaining the intermediate time values of the subsequences in the second trend state and the third trend state and, according to the relationship between the observation at each intermediate time and the adjacent value ranges, updating the segmentation points of the time series on the time axis to obtain the final segmentation points of the time series;
merging, according to the updated segmentation points of the time series, the observation points lying within the same value range, and taking the leftmost time and the rightmost time of all observation points within that value range as the start and end time values of the corresponding subsequence;
obtaining the symbolized time series sequence containing temporal relationships according to the start and end time values of each subsequence and the symbol corresponding to the value range in which each subsequence lies.
8. The time series double-layer symbolization method according to claim 7, characterized in that determining, according to the segmentation points of the time series, the start and end time values of each subsequence in the time series and the symbol corresponding to its value range, and converting the time series into a symbolized sequence containing temporal relationships comprises:
if the observation at the intermediate time lies in the preceding value range, updating the right-hand time-axis segmentation point of the preceding subsequence with the intermediate time value, which becomes the new right-hand segmentation point of the preceding subsequence;
if the observation at the intermediate time lies in the following value range, updating the left-hand time-axis segmentation point of the following subsequence with the intermediate time value, which becomes the new left-hand segmentation point of the following subsequence;
if the observation at the intermediate time shows no change of value range, not updating the time-axis segmentation points of the adjacent subsequences.
9. A time series double-layer symbolization device, characterized by comprising:
a grouping module, configured to group the time series according to the magnitude of the observations in the time series;
a first determining module, configured to determine, for the grouped time series, the size of the symbol set and the value range corresponding to each symbol in the symbol set by Shannon entropy adaptive clustering;
an obtaining module, configured to obtain the feature point sequence of the time series by the minimum description length criterion and the gradient slope method;
a second determining module, configured to determine the segmentation points of the time series on the time axis according to the jump relationship of adjacent feature points of the time series across value ranges;
a symbolization module, configured to determine, according to the segmentation points of the time series, the start and end time values of each subsequence in the time series and the symbol corresponding to its value range, and to convert the time series into a symbolized sequence containing temporal relationships.
CN201910261214.7A 2019-04-02 2019-04-02 Time sequence double-layer symbolization method and device Active CN110032585B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910261214.7A CN110032585B (en) 2019-04-02 2019-04-02 Time sequence double-layer symbolization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910261214.7A CN110032585B (en) 2019-04-02 2019-04-02 Time sequence double-layer symbolization method and device

Publications (2)

Publication Number Publication Date
CN110032585A true CN110032585A (en) 2019-07-19
CN110032585B CN110032585B (en) 2021-11-30

Family

ID=67237225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910261214.7A Active CN110032585B (en) 2019-04-02 2019-04-02 Time sequence double-layer symbolization method and device

Country Status (1)

Country Link
CN (1) CN110032585B (en)

Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030163057A1 (en) * 2002-02-22 2003-08-28 Flick James T. Method for diagnosing heart disease, predicting sudden death, and analyzing treatment response using multifractial analysis
US20060026152A1 (en) * 2004-07-13 2006-02-02 Microsoft Corporation Query-based snippet clustering for search result grouping
CN101206762A (en) * 2007-11-16 2008-06-25 中国科学院光电技术研究所 Adaptive optical image high-resolution restoration method combining frame selection and blind deconvolution
CN101496716A (en) * 2009-02-26 2009-08-05 周洪建 Measurement method for detecting sleep apnoea with ECG signal
CN101655847A (en) * 2008-08-22 2010-02-24 山东省计算中心 Expansive entropy information bottleneck principle based clustering method
CN101707575A (en) * 2009-11-09 2010-05-12 东南大学 Chaotic noise signal estimating method based on symbolic vector dynamics
CN101714192A (en) * 2009-11-13 2010-05-26 航天东方红卫星有限公司 Satellite test data processing system
CN101894125A (en) * 2010-05-13 2010-11-24 复旦大学 Content-based video classification method
CN101894560A (en) * 2010-06-29 2010-11-24 上海大学 Reference source-free MP3 audio frequency definition objective evaluation method
CN101916277A (en) * 2010-08-11 2010-12-15 武大吉奥信息技术有限公司 XML format-based geographic tile multi-pyramid temporal dataset generation method and device thereof
CN102129525A (en) * 2011-03-24 2011-07-20 华北电力大学 Method for searching and analyzing abnormality of signals during vibration and process of steam turbine set
CN103136327A (en) * 2012-12-28 2013-06-05 中国矿业大学 Time series signifying method based on local feature cluster
CN103942425A (en) * 2014-04-14 2014-07-23 中国人民解放军国防科学技术大学 Data processing method and device
US20150379110A1 (en) * 2014-06-25 2015-12-31 Vmware, Inc. Automated methods and systems for calculating hard thresholds
CN105242779A (en) * 2015-09-23 2016-01-13 歌尔声学股份有限公司 Method for identifying user action and intelligent mobile terminal
US20160142266A1 (en) * 2014-11-19 2016-05-19 Battelle Memorial Institute Extracting dependencies between network assets using deep learning
CN106095787A (en) * 2016-05-30 2016-11-09 重庆大学 A kind of Symbolic Representation method of time series data
CN107358156A (en) * 2017-06-06 2017-11-17 华南理工大学 The feature extracting method of Ultrasonic tissue characterization based on Hilbert-Huang transform
CN107991097A (en) * 2017-11-16 2018-05-04 西北工业大学 A kind of Method for Bearing Fault Diagnosis based on multiple dimensioned symbolic dynamics entropy
CN108595528A (en) * 2018-03-29 2018-09-28 重庆大学 A kind of multivariate time series are based on Fourier coefficient symbolism classification set creation method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113017628A (en) * 2021-02-04 2021-06-25 山东师范大学 Consciousness and emotion recognition method and system integrating ERP components and nonlinear features
CN113017628B (en) * 2021-02-04 2022-06-10 山东师范大学 Consciousness and emotion recognition method and system integrating ERP components and nonlinear features
CN116155426A (en) * 2023-04-19 2023-05-23 恩平市奥新电子科技有限公司 Sound console operation abnormity monitoring method based on historical data

Also Published As

Publication number Publication date
CN110032585B (en) 2021-11-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant