CN107220483B - Earth temperature mode prediction method - Google Patents


Info

Publication number
CN107220483B
Authority
CN
China
Legal status
Active
Application number
CN201710324105.6A
Other languages
Chinese (zh)
Other versions
CN107220483A (en)
Inventor
肖云
许震洲
王欣
王选宏
高颢函
陈晓江
房鼎益
Current Assignee
Shaanxi Dahang Wujiang Information Technology Co ltd
Original Assignee
Northwestern University
Application filed by Northwestern University
Priority to CN201710324105.6A
Publication of CN107220483A
Application granted
Publication of CN107220483B


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Z INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
    • G16Z99/00 Subject matter not provided for in other main groups of this subclass

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a soil temperature pattern prediction method comprising three stages. First, a candidate interest pattern set is searched for in the time series formed by each condition variable and by the decision variable, and each candidate interest pattern set is clustered separately. Second, prediction rules between the condition variables and the decision variable are generated. Finally, the interest patterns obtained by running the first stage on the condition variables of the data to be tested are matched against the prediction rules from the second stage, and if the patterns satisfy the antecedent of a prediction rule, the predicted result for the decision variable is output. The pattern prediction method for multivariate time series data requires little computation, effectively reduces the time complexity of pattern prediction, and addresses the excessively high time complexity of conventional methods.

Description

Earth temperature mode prediction method
Technical Field
The invention belongs to the computer field, specifically to data mining, and relates in particular to a pattern prediction method for multivariate time series data.
Background
Time series prediction is an important research direction in fields such as weather forecasting and stock markets. One of the most important approaches in time series prediction is to predict the behavior of some variables from the trends of other variables, which is called multivariate time series prediction. For example, if two variables are considered related, one may want to know whether a 10% increase in temperature in a weather forecast affects the trend of humidity.
The main approaches to multivariate prediction can be divided into mathematical methods and artificial-intelligence methods. Mathematical methods include ARIMA (Autoregressive Integrated Moving Average model, which converts a non-stationary time series into a stationary one and then regresses the dependent variable only on its own lagged values and on the present and lagged values of a random error term) and exponential smoothing; these are unreliable when processing the nonlinear, irregular data found in the real world. Artificial neural networks, support vector machines and k-nearest neighbors are some of the machine learning methods applied to time series prediction. However, these conventional methods fail because many time variables are translated and scaled over time. One solution to this problem is to consider sequences of behaviors rather than individual variable values; for example, some methods perform pattern prediction in time series analysis. These methods all adopt some representation of the data and then try to find the most frequent patterns. Their main problems are, first, that their data representations do not reduce dimensionality, especially for high-dimensional data, and they must additionally process the data with methods such as clustering, which increases time complexity; and second, that they cannot interpret the output rules and relationships. Reducing time complexity and making the output rules and relationships interpretable therefore both need to be addressed effectively.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a pattern prediction method for multivariate time series data that solves the high time complexity of conventional data processing methods.
To achieve this purpose, the invention adopts the following technical solution:
A pattern prediction method for multivariate time series data comprises the following steps:
Stage one: search for a candidate interest pattern set in the time series formed by each condition variable and each decision variable, and cluster each candidate interest pattern set separately;
Step 1: search for the candidate interest pattern set;
Step 1.1: find an available initial subsequence
For a time series $S = \{s_1, \ldots, s_l\}$, search in order, starting from $s_1$, for two adjacent time series values whose slope $m_1 \neq 0$, and take the first such pair as the initial subsequence $S_i = \{s_i, s_{i+1}\}$, where $i = 1, 2, \ldots, l-1$ and $l$ is the length of the time series. The slope $m_1$ is calculated by formula (2):
[Equation image not reproduced: formula (2) for the slope $m_1$]
Step 1.2: calculate the slope of adjacent time series values
Append the next value $s_{i+2}$ to the initial subsequence and calculate the slope $m_2$ between $s_{i+2}$ and $s_{i+1}$;
Step 1.3: obtain the interest pattern
If $m_2 \neq m_1$, the interest pattern $p_\alpha = \{s_i, s_{i+1}, s_{i+2}\}$ is obtained;
If $m_2 = m_1$, repeat step 1.2 until $m_k \neq m_1$, which gives the interest pattern $p_\alpha = \{s_i, s_{i+1}, \ldots, s_{i+k}\}$, where $m_k$ is the slope between $s_{i+k}$ and $s_{i+k-1}$ and $k = 1, 2, \ldots, l-2$;
Step 1.4: obtain the candidate interest pattern set
For the time series $S = \{s_1, \ldots, s_l\}$, repeat steps 1.1 to 1.3 starting from the last value of the interest pattern $p_\alpha$ until all interest patterns in the entire series have been found; these form the candidate interest pattern set $P_c = \{p_1, p_2, \ldots, p_\alpha, \ldots, p_\beta, \ldots, p_n\}$;
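The segmentation in steps 1.1 to 1.4 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: formula (2) for the slope is not reproduced in this text, so the sketch assumes a unit sampling interval (slope = difference of successive values), compares slopes with an optional tolerance `tol` (the text calls for exact equality, i.e. `tol = 0`), and, when the series ends before the slope changes, keeps the run found so far as a pattern. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def slope(a: float, b: float) -> float:
    # Assumption: unit sampling interval, so the slope is simply b - a.
    return b - a


def find_interest_patterns(
    s: Sequence[float], tol: float = 0.0
) -> List[Tuple[int, List[float]]]:
    """Split a series into interest patterns, returned as (start index, values)."""
    patterns: List[Tuple[int, List[float]]] = []
    n = len(s)  # l in the text
    i = 0
    while i < n - 1:
        # Step 1.1: skip flat pairs until a non-zero slope m1 is found.
        m1 = slope(s[i], s[i + 1])
        if abs(m1) <= tol:
            i += 1
            continue
        # Steps 1.2-1.3: keep appending values while the slope equals m1;
        # the first value whose slope differs is included and closes the pattern.
        end = i + 1
        while end + 1 < n:
            end += 1
            if abs(slope(s[end - 1], s[end]) - m1) > tol:
                break
        patterns.append((i, list(s[i : end + 1])))
        # Step 1.4: the next search starts from the last value of this pattern.
        i = end
    return patterns
```

For example, `find_interest_patterns([1, 2, 3, 4, 3, 2])` yields `[(0, [1, 2, 3, 4, 3]), (4, [3, 2])]`: the rising run is closed by the first falling value, and the search resumes from that value.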
Step 2: cluster the candidate interest pattern set;
Step 2.1: use the following pruning rules 1 and 2 to set to infinity the distance of every pattern pair that meets a rule's condition;
Pruning rule 1: if any two interest patterns $p_\alpha, p_\beta$ in the candidate interest pattern set $P_c$ do not occur together in the same region of width $w_s$, set $D_{\alpha\beta}$ in the distance matrix $D$ to infinity, where $w_s$ is a user-specified parameter and $D$ is the distance matrix of the interest patterns,
$D = (D_{\alpha\beta})_{n \times n}$,
$D_{\alpha\beta} = d_{\alpha\beta}(p_\alpha, p_\beta)$, where $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;
Pruning rule 2: if the slope of interest pattern $p_\alpha$ is negative and the slope of interest pattern $p_\beta$ is positive, set $D_{\alpha\beta}$ in the distance matrix $D$ to infinity;
Step 2.2: calculate the distances for the elements $D_{\alpha\beta}$ of the distance matrix that are not infinite and assign them to the corresponding positions in the matrix;
Step 2.3: compare $d_{\alpha\beta}(p_\alpha, p_\beta)$ with the user-specified $d_{\min}$; if $d_{\alpha\beta} \le d_{\min}$, delete from $P_c$ whichever of $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new interest pattern set $P$;
where $d_{\min}$ is a user-specified value between the minimum and maximum Euclidean distance between two adjacent time series values;
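Pruning rules 1 and 2 and the distance computation of step 2.2 can be sketched as below. The text defines neither "the same region of width $w_s$" nor how a Euclidean distance is taken between patterns of different lengths, so this sketch assumes (hypothetically) that two patterns share a region when their start indices differ by less than `ws`, compares unequal-length patterns over their common prefix, and uses the pattern direction $\operatorname{sgn}(s_L - s_1)$ as "the slope" in rule 2. All names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Pattern = Tuple[int, Sequence[float]]  # (start index, values)


def direction(values: Sequence[float]) -> int:
    # sgn(last - first), used here as the sign of the pattern's slope.
    d = values[-1] - values[0]
    return (d > 0) - (d < 0)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumption: unequal-length patterns are compared over their common prefix.
    n = min(len(a), len(b))
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(n)))


def pruned_distance_matrix(patterns: List[Pattern], ws: int) -> List[List[float]]:
    n = len(patterns)
    # Every entry starts at infinity, so "assign infinity" costs nothing.
    D = [[math.inf] * n for _ in range(n)]
    for a in range(n):
        D[a][a] = 0.0
        for b in range(a + 1, n):
            (ia, pa), (ib, pb) = patterns[a], patterns[b]
            # Pruning rule 1 (assumed reading): starts too far apart to share a region.
            if abs(ia - ib) >= ws:
                continue
            # Pruning rule 2: patterns with opposite slope signs cannot be similar.
            if direction(pa) * direction(pb) < 0:
                continue
            # Step 2.2: only surviving pairs pay for a distance computation.
            D[a][b] = D[b][a] = euclidean(pa, pb)
    return D
```

The saving comes from the two `continue` checks: each is constant time, while the Euclidean distance is linear in the pattern length.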
Stage two: generate prediction rules
Step 3: compute association rules using the Apriori algorithm
Merge the interest pattern sets $P$ of all time variables into $P_{all}$ and apply the Apriori algorithm to the interest patterns in $P_{all}$ to mine association rules, obtaining multiple association rules between different time variables;
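The text does not say how the interest patterns in $P_{all}$ are turned into Apriori transactions. Purely as an illustration, the sketch below assumes each transaction is the set of hypothetical item labels (variable name plus slope sign, such as `"air:+"`) of the patterns that start within one time window, and runs textbook Apriori with hypothetical thresholds `min_support` and `min_conf`.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support (fraction of transactions) >= min_support."""
    n = len(transactions)

    def support(itemset: Itemset) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    singletons = {frozenset([item]) for t in transactions for item in t}
    current = {s for s in singletons if support(s) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def association_rules(
    frequent: Dict[Itemset, float], min_conf: float
) -> List[Tuple[Itemset, Itemset, float]]:
    """Derive rules lhs -> rhs with confidence support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules
```

In the setting described here, one would presumably keep only rules whose consequent is a decision-variable pattern, for example a soil temperature item.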
Step 4: generate prediction rules
① The association rules with $m(p_{vM}) = m(p') = 1$ are combined to form the following prediction rule:
If $A_1 \le V(p_{v1}) \le B_1$ and $A_2 \le V(p_{v2}) \le B_2$ and … and $A_j \le V(p_{vj}) \le B_j$ and … and $A_\lambda \le V(p_{v\lambda}) \le B_\lambda$, then $C_1 \le V(p') \le C_2$, with a delay of $\Delta T_1$ time units;
where $p_{vj}$ ($j = 1, 2, \ldots, \lambda$) is the interest pattern formed by a condition variable, $\lambda \ge 1$ is the number of condition variables, and $p'$ is the interest pattern formed by the decision variable;
$m(p_{vM})$ is the slope between $s_L$ and $s_1$, $m(p_{vM}) = \operatorname{sgn}(s_L - s_1)$, where $p_{vM}$ is the interest pattern in the condition variables that has the greatest influence on the decision variable, $s_L$ is the last time series value of $p_{vM}$ and $s_1$ is its first; $m(p')$ is the slope between $s'_L$ and $s'_1$, where $s'_L$ is the last time series value of $p'$ and $s'_1$ is its first;
$A_j$ and $B_j$ are the minimum and maximum of $V(p_{vj})$ over the interest pattern association rules with $m(p_{vM}) = m(p') = 1$, and $C_1$ and $C_2$ are the minimum and maximum of $V(p')$ over the same rules; $A_j$, $B_j$, $C_1$ and $C_2$ are all positive;
$V(p_{vj})$ is the variation of the interest pattern $p_{vj}$ in the condition variables, and $V(p')$ is the variation of the interest pattern $p'$ in the decision variable:
$$V(p_{vj}) = (\max(p_{vj}) - \min(p_{vj})) \times m(p_{vj})$$
$$V(p') = (\max(p') - \min(p')) \times m(p')$$
where $\max(p_{vj})$ and $\min(p_{vj})$ are the maximum and minimum time series values of the interest pattern $p_{vj}$;
The time delay is $\Delta T_1 = \max(\Delta(r_g))$, with $\Delta(r_g) = I_{p_{vM}} - I_{p'}$, where $I_{p_{vM}}$ is the start time of $p_{vM}$ and $I_{p'}$ is the start time of $p'$;
② The association rules with $m(p_{vM}) = m(p') = -1$ are combined to form the following prediction rule:
If $E_1 \le V(p_{v1}) \le F_1$ and $E_2 \le V(p_{v2}) \le F_2$ and … and $E_j \le V(p_{vj}) \le F_j$ and … and $E_\eta \le V(p_{v\eta}) \le F_\eta$, then $G_1 \le V(p') \le G_2$, with a delay of $\Delta T_2$ time units;
where $E_j$ and $F_j$ are the minimum and maximum of $V(p_{vj})$ over the interest pattern association rules with $m(p_{vM}) = m(p') = -1$, and $G_1$ and $G_2$ are the minimum and maximum of $V(p')$ over the same rules; $j = 1, 2, \ldots, \eta$, where $\eta \ge 1$ is the number of condition variables; $E_j$, $F_j$, $G_1$ and $G_2$ are all negative; $\Delta T_2 = \max(\Delta(r_g))$;
Stage three: the interest patterns obtained by running stage one on the condition variables of the data to be tested are matched against the prediction rules from stage two; if the patterns satisfy the antecedent of a prediction rule, the predicted result for the decision variable is output.
Further, the clustering of the candidate interest pattern set in step 2 may be replaced by the following method:
Step 2.1: use an R-tree to build the minimum bounding rectangle (MBR) of each pattern in the candidate pattern set, forming a data structure over the patterns and obtaining an index of the patterns;
Step 2.2: for each pair of child nodes $i$ and $j$ in the R-tree data structure, use the following pruning rules 1 and 2 to set to infinity the distance of every pattern pair that meets a rule's condition.
Pruning rule 1: if two interest patterns $p_\alpha, p_\beta$ do not occur together in the same region of width $w_s$, set $D_{\alpha\beta}$ in the distance matrix to infinity, where $w_s$ is a user-specified parameter and $D$ is the distance matrix of the interest patterns,
$D = (D_{\alpha\beta})_{n \times n}$,
$D_{\alpha\beta} = d_{\alpha\beta}(p_\alpha, p_\beta)$, where $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;
Pruning rule 2: if the slope of interest pattern $p_\alpha$ is negative and the slope of interest pattern $p_\beta$ is positive, set $D_{\alpha\beta}$ in the distance matrix $D$ to infinity;
Step 2.3: for the elements $D_{ij}$ of the distance matrix $D$ that are not infinite, calculate the Euclidean distance and assign it to the corresponding position in the matrix;
Step 2.4: compare $d_{\alpha\beta}(p_\alpha, p_\beta)$ with the user-specified $d_{\min}$; if $d_{\alpha\beta} \le d_{\min}$, delete from $P_c$ whichever of $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new interest pattern set $P$; here $d_{\min}$ is a user-specified value between the minimum and maximum Euclidean distance between two adjacent time series values.
Compared with the prior art, the beneficial effect of the invention is that the pattern prediction method for multivariate time series data requires little computation, effectively reduces the time complexity of pattern prediction, and addresses the excessively high time complexity of conventional methods.
Drawings
FIG. 1 is a time series plot of air temperature and rammed earth temperature.
FIG. 2 is a graph of the relationship between air temperature and rammed earth temperature.
FIG. 3 shows the MBR between patterns.
FIG. 4 is a diagram of the R-tree based data structure used by the invention to retrieve candidate patterns.
FIG. 5 compares, as a function of the number of time series, the performance of Euclidean distance calculation for the distance matrix with and without the pruning strategy.
FIG. 6 compares the time taken to calculate the distance matrix using the pruning rules alone with that using the pruning rules combined with the R-tree.
FIG. 7 shows a performance evaluation of the six prediction rules generated in the example.
The present invention will be explained in further detail with reference to examples.
Detailed Description
In the invention, condition variables are variables that can be used to predict other variables, and decision variables are variables that can be predicted from other variables.
The method of the invention comprises three stages:
Stage one: search for a candidate interest pattern set in the time series formed by each condition variable and each decision variable, and cluster each candidate interest pattern set separately; stage two: generate prediction rules; stage three: predict the data to be tested according to the prediction rules generated in stage two.
In stage three, the interest patterns obtained by passing the data to be tested through stage one are matched against the prediction rules from stage two, and if a prediction rule is satisfied, the predicted result for the decision variable is output.
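Stage three amounts to checking each prediction rule's antecedent against the variations of the condition-variable interest patterns found in the data to be tested. The sketch below is illustrative only: the `PredictionRule` record, the variable name `"air"`, and the choice to return the first rule whose antecedent holds are assumptions, not details given in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PredictionRule:
    cond_bounds: Dict[str, Tuple[float, float]]  # A_j <= V(p_vj) <= B_j per condition variable
    dec_bounds: Tuple[float, float]              # C_1 <= V(p') <= C_2 for the decision variable
    delay: int                                   # predicted lag in time units


def predict(
    rules: List[PredictionRule], observed: Dict[str, float]
) -> Optional[Tuple[Tuple[float, float], int]]:
    """Return (predicted V(p') interval, delay) for the first rule whose antecedent holds."""
    for rule in rules:
        # The antecedent holds only if every condition variable's variation
        # was observed and lies inside that variable's interval.
        antecedent = all(
            name in observed and lo <= observed[name] <= hi
            for name, (lo, hi) in rule.cond_bounds.items()
        )
        if antecedent:
            return rule.dec_bounds, rule.delay
    return None  # no rule fires, so no prediction is output
```

With a rising-temperature rule `{"air": (3.0, 5.0)}` and an observed air variation of `4.0`, the call returns that rule's soil interval and delay.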
In stage one, the method looks for interest patterns that summarize the behavior of the data, namely patterns with positive and negative slopes in the way the data change. Because sequence data may contain repetitive patterns, the algorithm clusters and groups these patterns, specifically:
Step 1: find the candidate interest pattern set
Step 1.1: find an available initial subsequence
For a time series $S = \{s_1, \ldots, s_l\}$, search in order, starting from $s_1$, for two adjacent time series values whose slope $m_1$ is not 0, and use them as the initial subsequence $S_i = \{s_i, s_{i+1}\}$, where $i = 1, 2, \ldots, l-1$ and $l$ is the length of the time series,
[Equation image not reproduced]
The slope $m_1$ is calculated by formula (2):
[Equation image not reproduced: formula (2) for the slope $m_1$]
Step 1.2: calculate the slope of adjacent time series values
Append the next value $s_{i+2}$ to the initial subsequence and calculate the slope $m_2$ between $s_{i+2}$ and $s_{i+1}$;
Step 1.3: obtain the interest pattern
If $m_2 \neq m_1$, the interest pattern $p_\alpha = \{s_i, s_{i+1}, s_{i+2}\}$ is obtained;
If $m_2 = m_1$, repeat step 1.2 until $m_k \neq m_1$, which gives the interest pattern $p_\alpha = \{s_i, s_{i+1}, \ldots, s_{i+k}\}$, where $m_k$ is the slope between $s_{i+k}$ and $s_{i+k-1}$ and $k = 1, 2, \ldots, l-2$;
Step 1.4: obtain the candidate interest pattern set
For the time series $S$, repeat steps 1.1 to 1.3 starting from the last time series value of the interest pattern $p_\alpha$ (that is, $s_{i+k}$) until the $n$ interest patterns of the entire series $S = \{s_1, \ldots, s_l\}$ have been found; they form the candidate interest pattern set $P_c = \{p_1, p_2, \ldots, p_\alpha, \ldots, p_\beta, \ldots, p_n\}$;
Step 2: cluster the candidate interest pattern set;
Similar patterns in the candidate pattern set are grouped together. The first step in finding similar patterns is to generate a distance matrix between the patterns, in which each element gives the distance between one pair of patterns. Computing this matrix with the conventional algorithm is too time-consuming, so the invention proposes two methods to address this: pruning rules alone, and an R-tree combined with the pruning rules.
The pruning rules are as follows:
Step 2.1: use the following pruning rules 1 and 2 to set to infinity the distance of every pattern pair that meets a rule's condition;
Pruning rule 1: if any two interest patterns $p_\alpha, p_\beta$ in the candidate interest pattern set $P_c$ do not occur together in the same region of width $w_s$, the distance between the two patterns is infinite, and $D_{\alpha\beta}$ in the distance matrix $D$ is set to infinity, where $w_s$ is a user-specified parameter and $D$ is the distance matrix of the interest patterns,
$D = (D_{\alpha\beta})_{n \times n}$,
$D_{\alpha\beta} = d_{\alpha\beta}(p_\alpha, p_\beta)$, where $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;
Pruning rule 2 prunes by pattern slope: if the slope of interest pattern $p_\alpha$ is negative and the slope of interest pattern $p_\beta$ is positive, the two patterns cannot be considered similar, and $D_{\alpha\beta}$ in the distance matrix $D$ is set to infinity;
Step 2.2: calculate the Euclidean distances for the elements $D_{\alpha\beta}$ of the distance matrix that are not infinite and assign them to the corresponding positions in the matrix.
Step 2.3: compare $d_{\alpha\beta}(p_\alpha, p_\beta)$ with the user-specified $d_{\min}$; if $d_{\alpha\beta} \le d_{\min}$, delete from $P_c$ whichever of $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new interest pattern set $P$; here $d_{\min}$ is a user-specified value between the minimum and maximum Euclidean distance between two adjacent time series values.
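The deletion in step 2.3, together with the range from which $d_{\min}$ is chosen, can be sketched as below. Assumptions not stated in the text: pattern pairs are visited in index order, a pattern that has already been deleted is not compared again, and a tie in length deletes the later pattern. The helper names `d_min_range` and `drop_near_duplicates` are hypothetical, and the caller supplies the distance matrix `D` with pruned entries already set to infinity.

```python
from typing import List, Sequence, Tuple


def d_min_range(series: Sequence[float]) -> Tuple[float, float]:
    # Range from which the user picks d_min: the smallest and largest distance
    # between adjacent values (for scalars, Euclidean distance is |difference|).
    gaps = [abs(b - a) for a, b in zip(series, series[1:])]
    return min(gaps), max(gaps)


def drop_near_duplicates(
    patterns: List[Sequence[float]], D: List[List[float]], d_min: float
) -> List[Sequence[float]]:
    removed = set()
    n = len(patterns)
    for a in range(n):
        for b in range(a + 1, n):
            # Assumption: a pattern that was already deleted is not compared again.
            if a in removed or b in removed:
                continue
            if D[a][b] <= d_min:
                # Delete the pattern with fewer values (tie: delete the later one).
                removed.add(a if len(patterns[a]) < len(patterns[b]) else b)
    return [p for k, p in enumerate(patterns) if k not in removed]
```

Keeping the longer pattern of each near-duplicate pair preserves more of the observed behavior in the reduced set $P$.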
For the R-tree combined with the pruning rules, an R-tree is used to index the candidate pattern set $P_c$. As shown in FIG. 4, $P_1$ to $P_9$ are candidate patterns, and each leaf node of the R-tree is the MBR of one candidate pattern. The specific steps are as follows:
Step 2.1: use the R-tree to build the MBR of each pattern in the candidate pattern set, forming a data structure over the patterns and obtaining an index of the patterns. Each leaf node is the MBR of one pattern, and the intermediate entries of the R-tree index patterns with nearby MBRs. This data structure reduces the time complexity of the algorithm by reducing the number of patterns processed. FIG. 3 shows the MBR between pattern $p_1$ and pattern $p_2$.
Step 2.2: for each pair of child nodes $i$ and $j$ in the R-tree data structure, use the following pruning rules 1 and 2 to set to infinity the distance of every pattern pair that meets a rule's condition.
Pruning rule 1: if any two interest patterns $p_\alpha, p_\beta$ in the candidate interest pattern set $P_c$ do not occur together in the same region of width $w_s$, the distance between the two patterns is infinite, and $D_{\alpha\beta}$ in the distance matrix $D$ is set to infinity, where $w_s$ is a user-specified parameter and $D$ is the distance matrix of the interest patterns,
$D = (D_{\alpha\beta})_{n \times n}$,
$D_{\alpha\beta} = d_{\alpha\beta}(p_\alpha, p_\beta)$, where $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;
Pruning rule 2 prunes by pattern slope: if the slope of interest pattern $p_\alpha$ is negative and the slope of interest pattern $p_\beta$ is positive, the two patterns cannot be considered similar, and $D_{\alpha\beta}$ in the distance matrix $D$ is set to infinity;
Step 2.3: calculate the Euclidean distances for the elements $D_{\alpha\beta}$ of the distance matrix that are not infinite and assign them to the corresponding positions in the matrix.
Step 2.4: compare $d_{\alpha\beta}(p_\alpha, p_\beta)$ with the user-specified $d_{\min}$; if $d_{\alpha\beta} \le d_{\min}$, delete from $P_c$ whichever of $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new interest pattern set $P$; here $d_{\min}$ is a user-specified value between the minimum and maximum Euclidean distance between two adjacent time series values.
Stage two: generate prediction rules
Step 3: compute association rules using the Apriori algorithm
Merge the interest pattern sets $P$ of all time variables into $P_{all}$ and apply the Apriori algorithm to the interest patterns in $P_{all}$ to mine association rules between the interest patterns of different time variables:
[Equation image not reproduced: form of the association rule $r_g$]
where $g = 1, 2, \ldots, R$, $p_{vj}$ is the interest pattern formed by a condition variable, and $p'$ is the interest pattern formed by the decision variable;
The direction, delay and variation of each association rule $r_g$ are calculated as follows:
Direction of the association rule:
[Equation image not reproduced: direction of the association rule]
where $m(p_{vM})$ is the slope between $s_L$ and $s_1$, $m(p_{vM}) = \operatorname{sgn}(s_L - s_1)$, $p_{vM}$ is the interest pattern in the condition variables that has the greatest influence on the decision variable, and $s_L$ and $s_1$ are the last and first time series values of $p_{vM}$; $m(p')$ is the slope between $s'_L$ and $s'_1$, where $s'_L$ and $s'_1$ are the last and first time series values of $p'$;
Time delay:
$$\Delta(r_g) = I_{p_{vM}} - I_{p'}$$
where $I_{p_{vM}}$ is the start time of $p_{vM}$ and $I_{p'}$ is the start time of $p'$;
Variation of the rule:
$$V(p_{vj}) = (\max(p_{vj}) - \min(p_{vj})) \times m(p_{vj})$$
$$V(p') = (\max(p') - \min(p')) \times m(p') \tag{5}$$
where $V(p_{vj})$ is the variation of the interest pattern $p_{vj}$ in the condition variables, $V(p')$ is the variation of the interest pattern $p'$ in the decision variable, and $\max(p_{vj})$ and $\min(p_{vj})$ are the maximum and minimum time series values of $p_{vj}$.
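The per-rule quantities can be sketched as follows. Only $m(p) = \operatorname{sgn}(s_L - s_1)$, $V(p) = (\max(p) - \min(p)) \times m(p)$ and $\Delta(r_g) = I_{p_{vM}} - I_{p'}$ are taken from the text; because the direction formula itself survives only as an image, the sketch reports the two slopes separately instead of guessing how they are combined. The delay sign follows the text literally, so it is negative when the decision pattern starts later. Function and key names are hypothetical.

```python
from typing import Dict, Sequence


def sgn(x: float) -> int:
    return (x > 0) - (x < 0)


def direction(values: Sequence[float]) -> int:
    # m(p) = sgn(s_L - s_1): +1 rising, -1 falling, 0 flat.
    return sgn(values[-1] - values[0])


def variation(values: Sequence[float]) -> float:
    # V(p) = (max(p) - min(p)) * m(p): the pattern's amplitude, signed by direction.
    return (max(values) - min(values)) * direction(values)


def rule_attributes(
    cond_start: int, cond_values: Sequence[float],
    dec_start: int, dec_values: Sequence[float],
) -> Dict[str, float]:
    return {
        "m_cond": direction(cond_values),       # m(p_vM)
        "m_dec": direction(dec_values),         # m(p')
        "delay": cond_start - dec_start,        # Delta(r_g) = I_{p_vM} - I_{p'}
        "V_cond": variation(cond_values),       # V(p_vj)
        "V_dec": variation(dec_values),         # V(p')
    }
```

For instance, `variation([1, 4, 2])` is `3` (amplitude 3, net rise) and `variation([5, 1, 3])` is `-4` (amplitude 4, net fall).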
Step 4: generate prediction rules
① The association rules with $m(p_{vM}) = m(p') = 1$ are combined to form the following prediction rule:
If $A_1 \le V(p_{v1}) \le B_1$ and $A_2 \le V(p_{v2}) \le B_2$ and … and $A_j \le V(p_{vj}) \le B_j$ and … and $A_\lambda \le V(p_{v\lambda}) \le B_\lambda$, then $C_1 \le V(p') \le C_2$, with a delay of $\Delta T_1$ time units;
where $A_j$ and $B_j$ are the minimum and maximum of $V(p_{vj})$ over the interest pattern association rules $r_g$ with $m(p_{vM}) = m(p') = 1$, $C_1$ and $C_2$ are the minimum and maximum of $V(p')$ over the same rules, $j = 1, 2, \ldots, \lambda$, $\lambda \ge 1$ is the number of condition variables, and $A_j$, $B_j$, $C_1$ and $C_2$ are all positive.
② The association rules with $m(p_{vM}) = m(p') = -1$ are combined to form the following prediction rule:
If $E_1 \le V(p_{v1}) \le F_1$ and $E_2 \le V(p_{v2}) \le F_2$ and … and $E_j \le V(p_{vj}) \le F_j$ and … and $E_\eta \le V(p_{v\eta}) \le F_\eta$, then $G_1 \le V(p') \le G_2$, with a delay of $\Delta T_2$ time units;
where $E_j$ and $F_j$ are the minimum and maximum of $V(p_{vj})$ over the interest pattern association rules with $m(p_{vM}) = m(p') = -1$, $G_1$ and $G_2$ are the minimum and maximum of $V(p')$ over the same rules, $E_j$, $F_j$, $G_1$ and $G_2$ are all negative, $j = 1, 2, \ldots, \eta$, and $\eta \ge 1$ is the number of condition variables; $\Delta T_2 = \max(\Delta(r_g))$, where
$\Delta(r_g) = I_{p_{vM}} - I_{p'}$,
$I_{p_{vM}}$ is the start time of $p_{vM}$, and $I_{p'}$ is the start time of $p'$;
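Step 4 reduces each direction group of association rules to interval bounds plus a worst-case delay. The sketch below assumes each association rule has already been summarized in a hypothetical `AssocRule` record holding $m(p_{vM})$, $m(p')$, the variations $V(p_{vj})$ keyed by condition-variable name, $V(p')$ and $\Delta(r_g)$. With `sign = 1` it yields $[A_j, B_j]$, $[C_1, C_2]$ and $\Delta T_1$; with `sign = -1` it yields $[E_j, F_j]$, $[G_1, G_2]$ and $\Delta T_2$.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AssocRule:
    m_cond: int               # m(p_vM)
    m_dec: int                # m(p')
    v_cond: Dict[str, float]  # V(p_vj) for each condition variable j
    v_dec: float              # V(p')
    delay: int                # Delta(r_g)


@dataclass
class PredictionRule:
    cond_bounds: Dict[str, Tuple[float, float]]  # [A_j, B_j] or [E_j, F_j]
    dec_bounds: Tuple[float, float]              # [C_1, C_2] or [G_1, G_2]
    delay: int                                   # Delta T_1 or Delta T_2


def build_prediction_rule(rules: List[AssocRule], sign: int) -> PredictionRule:
    """Combine the rules with m(p_vM) = m(p') = sign (+1 for case 1, -1 for case 2)."""
    group = [r for r in rules if r.m_cond == sign and r.m_dec == sign]
    if not group:
        raise ValueError("no association rules with the requested direction")
    names = sorted({name for r in group for name in r.v_cond})
    cond_bounds = {}
    for name in names:
        values = [r.v_cond[name] for r in group if name in r.v_cond]
        cond_bounds[name] = (min(values), max(values))  # min and max of V(p_vj)
    dec_values = [r.v_dec for r in group]
    return PredictionRule(
        cond_bounds,
        (min(dec_values), max(dec_values)),  # min and max of V(p')
        max(r.delay for r in group),         # Delta T = max(Delta(r_g))
    )
```

Rules whose condition and decision slopes disagree fall into neither case and are ignored, matching the two cases listed above.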
The invention uses the hit rate $H$ to illustrate the performance of its algorithm, where $H$ is defined as:
[Equation image not reproduced: definition of $H$]
where $N$ is the number of interest patterns in the condition variables that correctly match a prediction rule, and $M$ is the total number of interest patterns in the condition variables.
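Assuming the definition image expresses $H$ as the ratio $N / M$ (the surrounding text defines only $N$ and $M$), the hit rate can be computed as below. The function names are hypothetical.

```python
from typing import Sequence


def hit_rate(n_matched: int, m_total: int) -> float:
    # Assumption: H = N / M; the defining equation is not reproduced in the text.
    if m_total <= 0:
        raise ValueError("M must be positive")
    if not 0 <= n_matched <= m_total:
        raise ValueError("N must lie between 0 and M")
    return n_matched / m_total


def hit_rate_from_outcomes(outcomes: Sequence[bool]) -> float:
    # Each entry records whether one condition-variable interest pattern
    # was correctly matched by a prediction rule.
    return hit_rate(sum(outcomes), len(outcomes))
```

For example, three correct matches out of four interest patterns give `hit_rate(3, 4) == 0.75`.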
Embodiments of the invention are given below. Note that the invention is not limited to these embodiments; all equivalent changes based on its technical solution fall within its scope of protection.
Example 1
This embodiment applies the algorithm of the invention to predicting the soil temperature of the ancient avenue of the open great wall. FIG. 1 and FIG. 2 are, respectively, the time series plot and the variable relationship plot of the air temperature and the rammed earth temperature at this site. The time series data are processed through the three stages described above; in stage one, this embodiment clusters the candidate interest pattern sets using the pruning rules only. The results are shown in FIG. 5, where one curve shows clustering that uses neither the pruning rules nor the R-tree data structure and the other shows clustering that uses the pruning rules. As the number of time series increases, the time needed to calculate Euclidean distances without the pruning rules grows exponentially, whereas the time needed with the pruning rules grows slowly.
Example 2
This embodiment differs from Embodiment 1 in that it uses the R-tree data structure in stage one, clustering the candidate interest pattern sets with the R-tree combined with the pruning rules; the R-tree reduces the time complexity of the algorithm by reducing the number of patterns processed, as shown in FIG. 6. The horizontal axis represents the number of time series: as it increases, the time for distance matrix calculation using the pruning rules combined with the R-tree grows more slowly than the time using the pruning rules alone. In the air temperature time series, $p_1$ and $p_2$ are candidate patterns identified by the algorithm of the invention, and an MBR can be constructed between the two patterns; FIG. 3 shows the MBR between pattern $p_1$ and pattern $p_2$.
Table 1 lists the six prediction rules generated in this example.
Table 1: Prediction rules
[Table image not reproduced]
FIG. 7 shows the prediction results of the six prediction rules, with the monitoring area on the horizontal axis and the hit rate on the vertical axis. Comparing the hit rates $H$ of the six generated rules in regions 1, 2, 3, 4 and 5, rule 3 has the highest average hit rate.

Claims (2)

1. A soil temperature pattern prediction method, characterized by comprising the following steps:
Stage one: search for the air temperature candidate interest pattern set and cluster it;
Step 1.1: search for the air temperature candidate interest pattern set;
Step 1.1.1: find an available initial subsequence of air temperatures
For the air temperature time series $S = \{s_1, \ldots, s_l\}$, search in order, starting from $s_1$, for two adjacent time series values whose slope $m_1 \neq 0$, and take the first such pair of air temperature values as the initial subsequence $S_i = \{s_i, s_{i+1}\}$, where $i = 1, 2, \ldots, l-1$, $l$ is the length of the time series, and $s_i$ is the air temperature at the $i$-th time point; the slope $m_1$ is calculated by formula (2):
[Equation image not reproduced: formula (2) for the slope $m_1$]
Step 1.1.2: calculate the slope of adjacent air temperature time series values
Append the next value $s_{i+2}$ to the initial subsequence and calculate the slope $m_2$ between $s_{i+2}$ and $s_{i+1}$;
Step 1.1.3: obtain the air temperature interest pattern
If $m_2 \neq m_1$, the air temperature interest pattern $p_\alpha = \{s_i, s_{i+1}, s_{i+2}\}$ is obtained;
If $m_2 = m_1$, repeat step 1.1.2 until $m_k \neq m_1$, which gives the air temperature interest pattern $p_\alpha = \{s_i, s_{i+1}, \ldots, s_{i+k}\}$, where $m_k$ is the slope between $s_{i+k}$ and $s_{i+k-1}$ and $k = 1, 2, \ldots, l-2$;
Step 1.1.4: obtain the candidate air temperature interest pattern set
For the air temperature time series $S = \{s_1, \ldots, s_l\}$, repeat steps 1.1.1 to 1.1.3 starting from the interest pattern $p_\alpha$ until all interest patterns in the entire air temperature series have been found; these form the candidate interest pattern set $P_c = \{p_1, p_2, \ldots, p_\alpha, \ldots, p_\beta, \ldots, p_n\}$;
Step 1.2: cluster the air temperature candidate interest patterns;
Step 1.2.1: use the following pruning rules 1 and 2 to set to infinity the distance of every pattern pair that meets a rule's condition;
Pruning rule 1: if any two interest patterns $p_\alpha, p_\beta$ in the candidate air temperature interest pattern set $P_c$ do not occur together in the same region of width $w_s$, set $D_{\alpha\beta}$ in the distance matrix $D$ to infinity, where $w_s$ is a user-specified parameter and $D$ is the distance matrix of the interest patterns,
$D = (D_{\alpha\beta})_{n \times n}$,
$D_{\alpha\beta} = d_{\alpha\beta}(p_\alpha, p_\beta)$, where $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;
Pruning rule 2: if the slope of the air temperature interest pattern $p_\alpha$ is negative and the slope of the air temperature interest pattern $p_\beta$ is positive, set $D_{\alpha\beta}$ in the distance matrix $D$ to infinity;
Step 1.2.2: calculate the distances for the elements $D_{\alpha\beta}$ of the distance matrix that are not infinite and assign them to the corresponding positions in the matrix;
Step 1.2.3: compare $d_{\alpha\beta}(p_\alpha, p_\beta)$ with the user-specified $d_{\min}$; if $d_{\alpha\beta} \le d_{\min}$, delete from the air temperature set $P_c$ whichever of the air temperature interest patterns $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new air temperature interest pattern set $P$;
where $d_{\min}$ is a user-specified value between the minimum and maximum Euclidean distance between two adjacent air temperature time series values;
Stage two: search for the soil temperature candidate interest pattern set and cluster it;
Step 2.1: search for the soil temperature candidate interest pattern set;
Step 2.1.1: find an available initial subsequence of soil temperatures
For the soil temperature time series $S' = \{s'_1, \ldots, s'_l\}$, search in order, starting from $s'_1$, for two adjacent time series values whose slope $m'_1 \neq 0$, and take the first such pair of soil temperature values as the initial subsequence $S'_i = \{s'_i, s'_{i+1}\}$, where $i = 1, 2, \ldots, l-1$, $l$ is the length of the time series, and $s'_i$ is the soil temperature at the $i$-th time point; the slope $m'_1$ is calculated by formula (2):
[Equation image not reproduced: formula (2) for the slope $m'_1$]
Step 2.1.2: calculate the slope of adjacent soil temperature time series values
Append the next value $s'_{i+2}$ to the initial subsequence and calculate the slope $m'_2$ between $s'_{i+2}$ and $s'_{i+1}$;
Step 2.1.3: obtain the soil temperature interest pattern
If $m'_2 \neq m'_1$, the soil temperature interest pattern $p'_\alpha = \{s'_i, s'_{i+1}, s'_{i+2}\}$ is obtained;
If $m'_2 = m'_1$, repeat step 2.1.2 until $m'_k \neq m'_1$, which gives the soil temperature interest pattern $p'_\alpha = \{s'_i, s'_{i+1}, \ldots, s'_{i+k}\}$, where $m'_k$ is the slope between $s'_{i+k}$ and $s'_{i+k-1}$ and $k = 1, 2, \ldots, l-2$;
Step 2.1.4: obtain the candidate soil temperature interest pattern set
For the soil temperature time series $S' = \{s'_1, \ldots, s'_l\}$, repeat steps 2.1.1 to 2.1.3 starting from the interest pattern $p'_\alpha$ until all interest patterns in the entire soil temperature series have been found; these form the candidate interest pattern set $P'_c = \{p'_1, p'_2, \ldots, p'_\alpha, \ldots, p'_\beta, \ldots, p'_n\}$;
Step 2.2: cluster the candidate soil temperature interest pattern set;
Step 2.2.1: using the following pruning rules ① and ②, assign infinity to every pattern distance value that meets a rule condition;
Pruning rule ①: if two interest patterns $p'_\alpha$, $p'_\beta$ in the candidate soil temperature interest pattern set $P'_c$ do not both lie in the same region of width $w'_s$, assign infinity to $D'_{\alpha\beta}$ in the distance matrix $D'$, where $w'_s$ is a user-specified parameter and $D'$ is the distance matrix of the interest patterns:
[Formula image: distance matrix $D'$ of the interest patterns]
$D'_{\alpha\beta} = d'_{\alpha\beta}(p'_\alpha, p'_\beta)$, where $d'_{\alpha\beta}$ is the Euclidean distance between $p'_\alpha$ and $p'_\beta$;
Pruning rule ②: if the slope of soil temperature interest pattern $p'_\alpha$ is negative and the slope of soil temperature interest pattern $p'_\beta$ is positive, assign infinity to $D'_{\alpha\beta}$ in the distance matrix $D'$;
Step 2.2.2: calculate the distances for the non-infinite elements $D'_{\alpha\beta}$ of the distance matrix and assign them to the corresponding positions in the matrix;
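Pruning rules ① and ② together with step 2.2.2 can be sketched as below. Several details are not fixed by the text and are assumptions here: the time axis is split into consecutive regions of width $w'_s$ and a pattern belongs to the region containing its start index, a pattern's slope sign is sgn(last value minus first value), and patterns of unequal length are compared over their common prefix only. Function names are hypothetical.

```python
import math


def slope_sign(values):
    """Trend of a pattern: sgn(last - first), as for m(p) in step 4."""
    return (values[-1] > values[0]) - (values[-1] < values[0])


def pattern_distance(a, b):
    """Euclidean distance over the common prefix (assumption for unequal lengths)."""
    n = min(len(a), len(b))
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in range(n)))


def pruned_distance_matrix(patterns, w_s):
    """patterns: list of (start_index, values); returns the n x n matrix D'."""
    n = len(patterns)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            (start_a, vals_a), (start_b, vals_b) = patterns[a], patterns[b]
            # Rule 1 (assumed region model): same window of width w_s by start index.
            same_region = start_a // w_s == start_b // w_s
            # Rule 2: one pattern falls while the other rises.
            opposite = slope_sign(vals_a) * slope_sign(vals_b) < 0
            if not same_region or opposite:
                dist[a][b] = math.inf
            else:
                # Step 2.2.2: only non-pruned entries get a real distance.
                dist[a][b] = pattern_distance(vals_a, vals_b)
    return dist
```

Iterating over all ordered pairs applies rule ② in both directions, so the resulting matrix is symmetric.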
step 2.2.3: comparison of d'αβ(p′α,p′β) And user-specified d'minOf d'αβ≤d′minFrom soil temperature P'cMedium deleted soil temperature p'αAnd soil body temperature p'βThe interest mode with the smaller number of time sequence values in the two interest modes finally obtains a new soil body temperature interest mode set P';
wherein, d'minTaking a certain value between the minimum value and the maximum value of the Euclidean distance between the temperature time sequence values of two adjacent soil bodies, and specifically designating by a user;
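Step 2.2.3 can be sketched as a greedy pass over the upper triangle of the distance matrix. The pair ordering and the tie-break (when both patterns have the same number of values, the later one is dropped) are assumptions, since the text does not specify them; entries set to infinity by the pruning rules never satisfy the threshold and are therefore skipped automatically. The function name is hypothetical.

```python
import math


def remove_redundant(patterns, dist, d_min):
    """patterns: list of value sequences; dist: n x n distance matrix."""
    n = len(patterns)
    alive = [True] * n
    for a in range(n):
        for b in range(a + 1, n):
            if not (alive[a] and alive[b]):
                continue
            # math.inf (a pruned pair) never satisfies this comparison.
            if dist[a][b] <= d_min:
                # Delete the pattern with fewer values; on a tie drop b (assumption).
                if len(patterns[a]) < len(patterns[b]):
                    alive[a] = False
                else:
                    alive[b] = False
    return [p for p, keep in zip(patterns, alive) if keep]
```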
Stage 3: generate prediction rules
Step 3: compute association rules using the Apriori algorithm
Combine the interest pattern set $P$ of each air temperature time variable with the interest pattern set $P'$ of the soil temperature time variable to obtain $P_{all}$, and apply the Apriori algorithm to the interest patterns in $P_{all}$ to mine multiple association rules between the different time variables;
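A minimal Apriori sketch in Python is shown below. How transactions are formed from $P_{all}$ is not stated; here each transaction is assumed to be the set of pattern labels (for example a variable name plus its trend sign) observed in one time window, and the thresholds are illustrative. The level-wise join-and-prune loop is the standard Apriori procedure; the labels and function names are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori: return {frozenset(itemset): support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for basket in baskets if itemset <= basket) / n

    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-itemsets.
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules
```

For example, with the transactions {air1:+, soil:+} (twice), {air1:-, soil:-} and {air1:+, air2:+, soil:+} and a support threshold of 0.5, the only frequent pair is {air1:+, soil:+} (support 0.75), and both rules between its two items have confidence 1.0.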
Step 4: generate prediction rules
① Combine the multiple association rules with $m(p_{vM}) = m(p') = 1$ into the following prediction rule:
If $A_1 \le V(p_{v1}) \le B_1$, and $A_2 \le V(p_{v2}) \le B_2$, …, and $A_j \le V(p_{vj}) \le B_j$, …, and $A_\lambda \le V(p_{v\lambda}) \le B_\lambda$, then $C_1 \le V(p') \le C_2$, with a delay of $\Delta T_1$ units of time;
where $p_{vj}$ is an interest pattern of air temperature, $j = 1, 2, \ldots, \lambda$, $\lambda \ge 1$ is the number of condition variables, and $p'$ is an interest pattern of soil temperature;
$m(p_{vM})$ is the slope between $s_L$ and $s_1$, $m(p_{vM}) = \operatorname{sgn}(s_L - s_1)$, where $p_{vM}$ is the interest pattern among the air temperature variables that has the greatest influence on the soil temperature variable, $s_L$ is the last air temperature value in $p_{vM}$, and $s_1$ is the first air temperature value in $p_{vM}$; $m(p')$ is the slope between $s'_L$ and $s'_1$, where $s'_L$ is the last soil temperature value in $p'$ and $s'_1$ is the first soil temperature value in $p'$;
$A_j$ and $B_j$ are the minimum and maximum values of $V(p_{vj})$ in the interest pattern association rules with $m(p_{vM}) = m(p') = 1$, and $C_1$ and $C_2$ are the minimum and maximum values of $V(p')$ in those rules; $A_j$, $B_j$, $C_1$ and $C_2$ are all positive numbers;
$V(p_{vj})$ is the variation of the interest pattern $p_{vj}$ in the air temperature, and $V(p')$ is the variation of the interest pattern $p'$ in the soil temperature:
$$V(p_{vj}) = \left(\max(p_{vj}) - \min(p_{vj})\right) \times m(p_{vj})$$
$$V(p') = \left(\max(p') - \min(p')\right) \times m(p')$$
where $\max(p_{vj})$ and $\min(p_{vj})$ are the maximum and minimum air temperature time-series values in the interest pattern $p_{vj}$;
the time delay is $\Delta T_1 = \max(\Delta(r_g))$, with $\Delta(r_g) = I_{p_{vM}} - I_{p'}$, where $I_{p_{vM}}$ is the start time of $p_{vM}$ and $I_{p'}$ is the start time of $p'$;
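The quantities $m(p)$, $V(p)$ and $\Delta T_1$ can be computed directly from their definitions. The sketch below treats a pattern as a list of values and each association rule $r_g$ as a pair of start times $(I_{p_{vM}}, I_{p'})$, which is an assumed representation; $\Delta(r_g)$ is computed as $I_{p_{vM}} - I_{p'}$, as defined above.

```python
def m(pattern):
    """m(p) = sgn(last value - first value)."""
    first, last = pattern[0], pattern[-1]
    return (last > first) - (last < first)


def variation(pattern):
    """V(p) = (max(p) - min(p)) * m(p)."""
    return (max(pattern) - min(pattern)) * m(pattern)


def time_delay(rule_start_times):
    """Delta T = max over rules r_g of Delta(r_g) = I_pvM - I_p'.

    rule_start_times: list of (I_pvM, I_p') pairs, one per association rule.
    """
    return max(i_air - i_soil for i_air, i_soil in rule_start_times)
```

For example, `variation([10, 12, 11])` is 2 (a rise of range 2), while `variation([15, 13, 12])` is -3.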
② Combine the multiple association rules with $m(p_{vM}) = m(p') = -1$ into the following prediction rule:
If $E_1 \le V(p_{v1}) \le F_1$, and $E_2 \le V(p_{v2}) \le F_2$, …, and $E_j \le V(p_{vj}) \le F_j$, …, and $E_\eta \le V(p_{v\eta}) \le F_\eta$, then $G_1 \le V(p') \le G_2$, with a delay of $\Delta T_2$ units of time;
where $E_j$ and $F_j$ are the minimum and maximum values of $V(p_{vj})$ in the interest pattern association rules with $m(p_{vM}) = m(p') = -1$, $j = 1, 2, \ldots, \eta$, $\eta \ge 1$ is the number of condition variables, and $G_1$ and $G_2$ are the minimum and maximum values of $V(p')$ in those rules; $E_j$, $F_j$, $G_1$ and $G_2$ are all negative numbers, and $j$ is a natural number; $\Delta T_2 = \max(\Delta(r_g))$;
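Aggregating the association rules into prediction rules ① and ② amounts to taking, per condition variable, the minimum and maximum of $V(p_{vj})$ over the rules with the chosen direction, plus the range of $V(p')$ and the largest delay. The dictionary layout used for a rule below is a hypothetical representation, not one given in the text.

```python
def build_prediction_rule(rules, direction):
    """Combine association rules sharing m(p_vM) = m(p') = direction (+1 or -1).

    Each rule is a dict (hypothetical layout):
      {"m_air": m(p_vM), "m_soil": m(p'), "cond": {name: V(p_vj)},
       "soil": V(p'), "delta": Delta(r_g)}
    Returns (bounds, (low, high), delay), or None if no rule has that direction.
    """
    selected = [r for r in rules if r["m_air"] == r["m_soil"] == direction]
    if not selected:
        return None
    bounds = {}
    for rule in selected:
        for name, v in rule["cond"].items():
            lo, hi = bounds.get(name, (v, v))
            bounds[name] = (min(lo, v), max(hi, v))  # A_j/B_j or E_j/F_j
    soil = [rule["soil"] for rule in selected]
    delay = max(rule["delta"] for rule in selected)  # Delta T_1 or Delta T_2
    return bounds, (min(soil), max(soil)), delay
```

A condition variable that appears in only some of the selected rules receives bounds from those rules alone.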
Stage 4: input the air temperature of the earthen-site monitoring area, obtain its interest patterns as in stage 1, and match them against the prediction rules of stage 3; if the conditions of a prediction rule are met, output the corresponding soil temperature pattern.
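Stage 4 can be sketched as an interval check: the variations $V(p_{vj})$ computed from newly observed air temperature patterns are tested against each condition of a prediction rule, and the soil temperature range and delay are returned when all conditions hold. The tuple layout of a prediction rule and the treatment of a missing variable as a failed condition are assumptions.

```python
def apply_prediction_rule(rule, observed):
    """Return (low, high, delay) if every condition holds, else None.

    rule: (bounds, (low, high), delay), where bounds maps a condition
          variable name to its (min, max) interval; observed maps the same
          names to V(p_vj) computed from new air temperature patterns.
    """
    bounds, (low, high), delay = rule
    for name, (lo, hi) in bounds.items():
        value = observed.get(name)
        if value is None or not lo <= value <= hi:
            return None  # a missing or out-of-range variable fails the rule
    return low, high, delay
```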
2. The soil temperature pattern prediction method according to claim 1, wherein the candidate interest pattern set clustering method of step 1.2 may be replaced by the following method:
Step 1.2.1: use an R-tree to construct the minimum bounding rectangle (MBR) of each pattern in the candidate pattern set, forming the data structure of the patterns and obtaining an index of the patterns;
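For step 1.2.1, the rectangle an R-tree would index for each pattern can be sketched as its extent in time and in temperature. The sketch below only computes pattern MBRs and the enlarged MBR a parent node would store; it is not a full R-tree (insertion, node splitting and search are omitted), and the names are hypothetical.

```python
from typing import NamedTuple


class MBR(NamedTuple):
    """Axis-aligned rectangle: time extent by temperature extent."""
    t_min: int
    t_max: int
    v_min: float
    v_max: float


def pattern_mbr(start, values):
    """MBR of a pattern that begins at time index `start`."""
    return MBR(start, start + len(values) - 1, min(values), max(values))


def enlarge(a, b):
    """Smallest MBR covering a and b, as an R-tree parent entry would store."""
    return MBR(min(a.t_min, b.t_min), max(a.t_max, b.t_max),
               min(a.v_min, b.v_min), max(a.v_max, b.v_max))
```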
Step 1.2.2: for the child nodes i and j in the R-tree data structure, use the following pruning rules ① and ② to assign infinity to every pattern distance value that meets a rule condition;
Pruning rule ①: if two air temperature interest patterns $p_\alpha$, $p_\beta$ do not both lie in the same region of width $w_s$, assign infinity to $D_{\alpha\beta}$ in the distance matrix $D$, where $w_s$ is a user-specified parameter and $D$ is the distance matrix of the interest patterns:
[Formula image: distance matrix $D$ of the interest patterns]
$D_{\alpha\beta} = d_{\alpha\beta}(p_\alpha, p_\beta)$, where $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;
Pruning rule ②: if the slope of air temperature interest pattern $p_\alpha$ is negative and the slope of interest pattern $p_\beta$ is positive, assign infinity to $D_{\alpha\beta}$ in the distance matrix $D$;
Step 1.2.3: compute the non-infinite elements $D_{\alpha\beta}$ of the distance matrix $D$ as Euclidean distances and assign them to the corresponding positions in the matrix;
Step 1.2.4: compare $d_{\alpha\beta}(p_\alpha, p_\beta)$ with the user-specified $d_{min}$; if $d_{\alpha\beta} \le d_{min}$, delete from $P_c$ whichever of the two air temperature interest patterns $p_\alpha$ and $p_\beta$ contains fewer time-series values, finally obtaining a new interest pattern set $P$; where $d_{min}$ is a value between the minimum and maximum Euclidean distances between two adjacent air temperature time-series values, specified by the user.
CN201710324105.6A 2017-05-09 2017-05-09 Earth temperature mode prediction method Active CN107220483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710324105.6A CN107220483B (en) 2017-05-09 2017-05-09 Earth temperature mode prediction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710324105.6A CN107220483B (en) 2017-05-09 2017-05-09 Earth temperature mode prediction method

Publications (2)

Publication Number Publication Date
CN107220483A CN107220483A (en) 2017-09-29
CN107220483B true CN107220483B (en) 2021-01-01

Family

ID=59944105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710324105.6A Active CN107220483B (en) 2017-05-09 2017-05-09 Earth temperature mode prediction method

Country Status (1)

Country Link
CN (1) CN107220483B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797301A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Activity prediction method, activity prediction device, storage medium and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7305133B2 (en) * 2002-11-01 2007-12-04 Mitsubishi Electric Research Laboratories, Inc. Pattern discovery in video content using association rules on multiple sets of labels
CN102637208B (en) * 2012-03-28 2013-10-30 南京财经大学 Method for filtering noise data based on pattern mining
KR102074734B1 (en) * 2013-02-28 2020-03-02 삼성전자주식회사 Method and apparatus for pattern discoverty in sequence data
US9230015B2 (en) * 2013-07-02 2016-01-05 Hewlett-Packard Development Company, L.P. Deriving an interestingness measure for a cluster
US10210461B2 (en) * 2014-03-21 2019-02-19 International Business Machines Corporation Ranking data analytics results using composite validation
CN105320756B (en) * 2015-10-15 2018-07-10 中通服咨询设计研究院有限公司 A kind of database association rule digging method based on improvement Apriori algorithm
CN106384128A (en) * 2016-09-09 2017-02-08 西安交通大学 Method for mining time series data state correlation

Also Published As

Publication number Publication date
CN107220483A (en) 2017-09-29

Similar Documents

Publication Publication Date Title
CN109754113B (en) Load prediction method based on dynamic time warping and long-and-short time memory
Xuan et al. Multi-model fusion short-term load forecasting based on random forest feature selection and hybrid neural network
Ma et al. A hybrid attention-based deep learning approach for wind power prediction
Khodayar et al. Rough deep neural architecture for short-term wind speed forecasting
Dasgupta et al. Nonlinear dynamic Boltzmann machines for time-series prediction
CN115018021B (en) Machine room abnormity detection method and device based on graph structure and abnormity attention mechanism
Wang et al. Correlation aware multi-step ahead wind speed forecasting with heteroscedastic multi-kernel learning
Yang et al. A novel self-constructing radial basis function neural-fuzzy system
Zhuang et al. Representation learning via semi-supervised autoencoder for multi-task learning
CN110766060B (en) Time series similarity calculation method, system and medium based on deep learning
Suryo et al. Improved time series prediction using LSTM neural network for smart agriculture application
CN110633846A (en) Gas load prediction method and device
CN112289391A (en) Anode aluminum foil performance prediction system based on machine learning
Lei et al. A hybrid regularization semi-supervised extreme learning machine method and its application
CN115766125A (en) Network flow prediction method based on LSTM and generation countermeasure network
CN114037143A (en) Short-term wind power combination prediction method
CN107220483B (en) Earth temperature mode prediction method
Khabusi et al. A deep learning approach to predict dissolved oxygen in aquaculture
Bebarta et al. Polynomial based functional link artificial recurrent neural network adaptive system for predicting Indian stocks
CN116596396A (en) Industrial polyethylene process quality prediction method based on K nearest neighbor interpolation and SLSTM
Ardilla et al. Batch Learning Growing Neural Gas for Sequential Point Cloud Processing
Yusoff et al. Long Term Short Memory with Particle Swarm Optimization for Crude Oil Price Prediction
Li et al. Research on recommendation algorithm based on e-commerce user behavior sequence
CN114169914A (en) Feature construction method, device and storage medium for item sales prediction
Anowar et al. Incremental Learning with Self-labeling of Incoming High-dimensional Data.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221213

Address after: 710075 Room 021, F2003, 20th Floor, Building 4-A, Xixian Financial Port, Fengdong New City Energy Jinmao District, Xixian New District, Xi'an, Shaanxi

Patentee after: Shaanxi Dahang Wujiang Information Technology Co.,Ltd.

Address before: 710069 No. 229 Taibai North Road, Shaanxi, Xi'an

Patentee before: NORTHWEST University