CN107220483B - Earth temperature mode prediction method

Info

Publication number: CN107220483B
Application number: CN201710324105.6A
Authority: CN
Inventors: 肖云; 许震洲; 王欣; 王选宏; 高颢函; 陈晓江; 房鼎益
Original assignee: Northwestern University
Current assignee: Shaanxi Dahang Wujiang Information Technology Co ltd
Priority date: 2017-05-09
Filing date: 2017-05-09
Publication date: 2021-01-01
Anticipated expiration: 2037-05-09
Also published as: CN107220483A

Abstract

The invention discloses a soil temperature pattern prediction method comprising three stages. First, a candidate interest pattern set is found for the time series formed by each condition variable and decision variable, and each candidate interest pattern set is clustered. Second, prediction rules between the condition variables and the decision variable are generated. Finally, the interest patterns obtained by applying the first stage to the condition variables of the data under test are matched against the prediction rules of the second stage, and the prediction result for the decision variable is output if the antecedent of a prediction rule is satisfied. The method requires little computation and effectively reduces the time complexity of pattern prediction for multivariate time series data, addressing the excessive time complexity of traditional methods.

Description

Earth temperature mode prediction method

Technical Field

The invention belongs to the field of computing, in particular to data mining, and specifically relates to a pattern prediction method for multivariate time series data.

Background

Time series prediction is an important research direction in fields such as weather forecasting and stock markets. One of the most important approaches is to predict the behavior of some variables from the trends of related variables, which is called multivariate time series prediction. For example, if two variables are considered to be related, we may want to know whether a 10% increase in temperature in a weather forecast affects the trend of humidity.

In multivariate prediction, the main methods can be divided into mathematical methods and artificial intelligence methods. Mathematical methods such as ARIMA (Autoregressive Integrated Moving Average model, which converts a non-stationary time series into a stationary one and then regresses the dependent variable on its own lagged values and on the present and lagged values of a random error term) and exponential smoothing are unreliable when processing the non-linear, irregular data found in the real world. Artificial neural networks, support vector machines and k-nearest neighbors are machine learning methods that have been applied to time series prediction. However, these conventional methods fail because many time variables shift and scale over time. One solution is to consider a sequence of behaviors rather than individual variable values; for example, some methods perform pattern prediction in time series analysis. These methods all assume a data representation and then try to find the most frequent patterns. Their main problems are as follows. First, the data representations do not reduce the dimensionality of the data, especially for high-dimensional data, and the data must still be processed with methods such as clustering, which increases time complexity. Second, these approaches cannot interpret the output rules and relationships. Reducing time complexity and making the output rules and relationships interpretable therefore remain to be solved.

Disclosure of Invention

Aiming at the defects and shortcomings of the prior art, the invention aims to provide a mode prediction method of multivariate time series data, and solves the problem of high time complexity of the conventional data processing method.

In order to achieve the purpose, the invention adopts the following technical scheme:

a mode prediction method of multivariate time series data comprises the following steps:

stage one: searching candidate interest mode sets for a time sequence formed by each condition variable and each decision variable, and clustering each candidate interest mode set respectively;

step 1: searching a candidate interest mode set;

step 1.1: finding available initial subsequences

For a time series $S=\{s_1,\dots,s_l\}$, starting from $s_1$, search in turn for two adjacent time series values whose slope $m_1$ is not equal to 0, and take the first such pair found as the initial subsequence $S_i=\{s_i,s_{i+1}\}$, where $i=1,2,\dots,l-1$ and $l$ is the length of the time series. The slope $m_1$ is calculated as follows:

step 1.2: calculating the slope of adjacent time series values

Add the next value $s_{i+2}$ to the available initial subsequence and calculate the slope $m_2$ between $s_{i+2}$ and $s_{i+1}$;

Step 1.3: obtaining interest patterns

If $m_2$ is not equal to $m_1$, the interest pattern $p_\alpha=\{s_i,s_{i+1},s_{i+2}\}$ is obtained;

If $m_2$ is equal to $m_1$, continue step 1.2 until $m_k$ is not equal to $m_1$, thus obtaining the interest pattern $p_\alpha=\{s_i,s_{i+1},\dots,s_{i+k}\}$, where $m_k$ is the slope between $s_{i+k}$ and $s_{i+k-1}$, $k=1,2,\dots,l-2$;

step 1.4: obtaining a set of candidate interest patterns

For the time series $S=\{s_1,\dots,s_l\}$, starting from the last time series value of interest pattern $p_\alpha$, repeat steps 1.1 to 1.3 until the entire time series has been searched; all interest patterns found in $S$ form the candidate interest pattern set $P_c=\{p_1,p_2,\dots,p_\alpha,\dots,p_\beta,\dots,p_n\}$;

Step 2: clustering a candidate interest pattern set;

step 2.1: using pruning rules 1 and 2 below, set to infinity every pattern distance value that satisfies a rule condition;

pruning rule 1: if any two interest patterns $p_\alpha$, $p_\beta$ in the candidate interest pattern set $P_c$ do not both occur within the same region of width $w_s$, set $D_{\alpha\beta}$ in the distance matrix $D$ to infinity, where $w_s$ is a user-specified parameter, $D$ is the distance matrix of the interest patterns,

$D_{\alpha\beta}=d_{\alpha\beta}(p_\alpha,p_\beta)$, and $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;

pruning rule 2: if the slope of interest pattern $p_\alpha$ is negative and the slope of interest pattern $p_\beta$ is positive, set $D_{\alpha\beta}$ in the distance matrix $D$ to infinity;

step 2.2: for every element $D_{\alpha\beta}$ of the distance matrix that is not infinite, calculate the distance and assign it to the corresponding position in the distance matrix;

step 2.3: compare $d_{\alpha\beta}(p_\alpha,p_\beta)$ with the user-specified $d_{min}$; if $d_{\alpha\beta}\le d_{min}$, delete from $P_c$ whichever of the two interest patterns $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new interest pattern set $P$;

where $d_{min}$ takes a value between the minimum and the maximum Euclidean distance between two adjacent time series values and is specified by the user;

stage two: generating prediction rules

step 3: computing association rules using the Apriori algorithm

Merge the interest pattern sets $P$ of all time variables to obtain $P_{all}$, and apply the Apriori algorithm to the interest patterns in $P_{all}$ for association rule mining, obtaining a plurality of association rules among different time variables;

step 4: generating prediction rules

① The association rules with $m(p_{vM})=m(p')=1$ are combined to form the following prediction rule:

if $A_1\le V(p_{v1})\le B_1$ and $A_2\le V(p_{v2})\le B_2$ … and $A_j\le V(p_{vj})\le B_j$ … and $A_\lambda\le V(p_{v\lambda})\le B_\lambda$, then $C_1\le V(p')\le C_2$ with a delay of $\Delta T_1$ time units;

where $p_{vj}$ is an interest pattern formed by a condition variable, $j=1,2,\dots,\lambda$, $\lambda\ge 1$, $\lambda$ is the number of condition variables, and $p'$ is an interest pattern formed by the decision variable;

$m(p_{vM})$ is the slope between $s_L$ and $s_1$, $m(p_{vM})=\mathrm{sgn}(s_L-s_1)$; $p_{vM}$ is the interest pattern in the condition variables that has the greatest influence on the decision variable; $s_L$ is the last time series value in $p_{vM}$ and $s_1$ is the first time series value in $p_{vM}$; $m(p')$ is the slope between $s'_L$ and $s'_1$, where $s'_L$ is the last time series value in $p'$ and $s'_1$ is the first time series value in $p'$;

$A_j$ and $B_j$ are respectively the minimum and maximum values of $V(p_{vj})$ in the interest pattern association rules with $m(p_{vM})=m(p')=1$; $C_1$ and $C_2$ are respectively the minimum and maximum values of $V(p')$ in those association rules; $A_j$, $B_j$, $C_1$ and $C_2$ are all positive numbers;

$V(p_{vj})$ is the variation of the interest pattern $p_{vj}$ in the condition variable, and $V(p')$ is the variation of the interest pattern $p'$ in the decision variable,

$V(p_{vj})=(\max(p_{vj})-\min(p_{vj}))\times m(p_{vj})$

$V(p')=(\max(p')-\min(p'))\times m(p')$

where $\max(p_{vj})$ and $\min(p_{vj})$ are respectively the maximum and minimum time series values of the interest pattern $p_{vj}$;

the time delay is $\Delta T_1=\max(\Delta(r_g))$, with $\Delta(r_g)=I_{p_{vM}}-I_{p'}$, where $I_{p_{vM}}$ is the starting time value of $p_{vM}$ and $I_{p'}$ is the starting time value of $p'$;

② The association rules with $m(p_{vM})=m(p')=-1$ are combined to form the following prediction rule:

if $E_1\le V(p_{v1})\le F_1$ and $E_2\le V(p_{v2})\le F_2$ … and $E_j\le V(p_{vj})\le F_j$ … and $E_\eta\le V(p_{v\eta})\le F_\eta$, then $G_1\le V(p')\le G_2$ with a delay of $\Delta T_2$ time units;

where $E_j$ and $F_j$ are respectively the minimum and maximum values of $V(p_{vj})$ in the interest pattern association rules with $m(p_{vM})=m(p')=-1$; $G_1$ and $G_2$ are respectively the minimum and maximum values of $V(p')$ in those association rules; $j=1,2,\dots,\eta$, $\eta\ge 1$, $\eta$ is the number of condition variables; $E_j$, $F_j$, $G_1$ and $G_2$ are all negative numbers, and $j$ is a natural number; $\Delta T_2=\max(\Delta(r_g))$;

stage three: the interest patterns obtained by applying stage one to the condition variables of the data under test are matched against the prediction rules of stage two, and the prediction result for the decision variable is output if the antecedent of a prediction rule is satisfied.

Further, the candidate interest pattern set clustering method in step 2 may be replaced by the following method:

step 2.1: constructing MBR of each mode in the candidate mode set by using the R-tree, forming a data structure of the mode, and obtaining an index of the mode;

step 2.2: for each pair of child nodes i and j in the R-tree data structure, set to infinity every pattern distance value that satisfies a rule condition, using the following pruning rules 1 and 2.

Pruning rule 1: if two interest patterns $p_\alpha$, $p_\beta$ do not both occur within the same region of width $w_s$, set $D_{\alpha\beta}$ in the distance matrix to infinity, where $w_s$ is a user-specified parameter, $D$ is the distance matrix of the interest patterns,

$D_{\alpha\beta}=d_{\alpha\beta}(p_\alpha,p_\beta)$, and $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;

step 2.3: for every element $D_{ij}$ of the distance matrix $D$ that is not infinite, calculate the Euclidean distance and assign it to the corresponding position in the distance matrix;

step 2.4: compare $d_{\alpha\beta}(p_\alpha,p_\beta)$ with the user-specified $d_{min}$; if $d_{\alpha\beta}\le d_{min}$, delete from $P_c$ whichever of the two interest patterns $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new interest pattern set $P$; where $d_{min}$ takes a value between the minimum and the maximum Euclidean distance between two adjacent time series values and is specified by the user.

Compared with the prior art, the invention has the beneficial effects that: the mode prediction method of the multivariate time sequence data has small calculation amount, effectively reduces the time complexity in the mode prediction and solves the problem of overhigh time complexity in the traditional method.

Drawings

FIG. 1 is a timing diagram of air temperature and rammed earth temperature.

FIG. 2 is a graph of the air temperature versus rammed earth temperature variation.

FIG. 3 is an MBR between modes.

FIG. 4 is a diagram of an R-tree based data structure for retrieving candidate patterns according to the present invention.

FIG. 5 is a graph of time performance versus number of sequences for Euclidean distance matrix calculation with and without the pruning strategy.

FIG. 6 is a graph comparing the time performance of distance matrix calculation using the pruning rules alone and using the pruning rules combined with the R-tree.

FIG. 7 is a performance evaluation of the six rules generated in the example.

The present invention will be explained in further detail with reference to examples.

Detailed Description

The condition variables in the invention are variables which can be used for predicting other variables, and the decision variables are variables which can be predicted by other variables.

The method of the invention comprises three stages:

stage one: searching candidate interest pattern sets for the time series formed by each condition variable and each decision variable, and clustering each candidate interest pattern set respectively; stage two: generating prediction rules; stage three: predicting the data under test according to the prediction rules generated in stage two.

In stage three, the interest patterns obtained by passing the data under test through stage one are matched against the prediction rules of stage two, and if a prediction rule is satisfied, the prediction result for the decision variable is output.

In stage one, interest patterns are found and the behavior of the data is summarized by looking for patterns with positive and negative slopes in the changes of the data. Since sequence data may contain repetitive patterns, the algorithm clusters and groups these patterns, specifically:

step 1: finding a set of candidate patterns of interest

Step 1.1: finding available initial subsequences

For a time series $S=\{s_1,\dots,s_l\}$, starting from $s_1$, search in turn for two adjacent time series values whose slope $m_1$ is not 0 and use them as the initial subsequence $S_i=\{s_i,s_{i+1}\}$, where $i=1,2,\dots,l-1$ and $l$ is the length of the time series.

The slope $m_1$ is calculated as follows:

step 1.2: calculating the slope of adjacent time series values

Step 1.3: obtaining interest patterns

Step 1.4: obtaining a set of candidate interest patterns

For the time series $S=\{s_1,\dots,s_l\}$, starting from the last time series value of interest pattern $p_\alpha$ (i.e. $s_{i+k}$), steps 1.1 to 1.3 are repeated until the entire time series has been searched; the $n$ interest patterns found in $S$ form the candidate interest pattern set $P_c=\{p_1,p_2,\dots,p_\alpha,\dots,p_\beta,\dots,p_n\}$;
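Steps 1.1 to 1.4 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the slope formula is not reproduced in this text, so the sketch assumes the slope of two adjacent values is the sign of their difference. The function names `sign` and `find_candidate_patterns`, the `(start_index, values)` representation, and the decision to emit a trailing run that reaches the end of the series are all assumptions made for illustration.

```python
def sign(x):
    """Return +1, -1 or 0 (assumed slope of two adjacent values)."""
    return (x > 0) - (x < 0)


def find_candidate_patterns(series):
    """Return candidate interest patterns as (start_index, values) tuples."""
    patterns = []
    n = len(series)
    i = 0
    while i < n - 1:
        # Step 1.1: find an initial pair whose slope m1 is not 0.
        m1 = sign(series[i + 1] - series[i])
        if m1 == 0:
            i += 1
            continue
        end = i + 1  # index of the last value currently in the pattern
        # Steps 1.2 and 1.3: keep adding the next value while the slope equals m1.
        while end + 1 < n:
            end += 1
            if sign(series[end] - series[end - 1]) != m1:
                # As in step 1.3, the value that breaks the slope closes the pattern.
                break
        patterns.append((i, series[i:end + 1]))
        # Step 1.4: restart from the last value of the pattern just found.
        i = end
    return patterns


if __name__ == "__main__":
    print(find_candidate_patterns([1, 2, 3, 2, 1, 2]))
```

Following the wording of step 1.3, the value at which the slope changes is kept as the last element of the pattern, and the next search starts from that same value.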

Step 2: clustering a candidate interest pattern set;

Similar patterns in the candidate pattern set are grouped together. The first step in finding similar patterns is to generate a distance matrix between the patterns: for each pair of patterns, the corresponding element of the distance matrix gives the distance between the two patterns. However, conventional algorithms are too time-consuming. To solve this problem, the invention proposes two methods: pruning rules alone, and an R-tree combined with the pruning rules.

The pruning rules are specifically as follows:

pruning rule 1: if any two interest patterns $p_\alpha$, $p_\beta$ in the candidate interest pattern set $P_c$ do not both occur within the same region of width $w_s$, the distance between the two patterns is infinite, and $D_{\alpha\beta}$ in the distance matrix $D$ is set to infinity, where $w_s$ is a user-specified parameter, $D$ is the distance matrix of the interest patterns,

$D_{\alpha\beta}=d_{\alpha\beta}(p_\alpha,p_\beta)$, and $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;

pruning rule 2: pruning is performed using the slope of the patterns; if the slope of interest pattern $p_\alpha$ is negative and the slope of interest pattern $p_\beta$ is positive, they cannot be considered similar, and $D_{\alpha\beta}$ in the distance matrix $D$ is set to infinity;

step 2.2: for every element $D_{\alpha\beta}$ of the distance matrix that is not infinite, calculate the Euclidean distance and assign it to the corresponding position in the distance matrix.

Step 2.3: compare $d_{\alpha\beta}(p_\alpha,p_\beta)$ with the user-specified $d_{min}$; if $d_{\alpha\beta}\le d_{min}$, delete from $P_c$ whichever of the two interest patterns $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new interest pattern set $P$; where $d_{min}$ takes a value between the minimum and the maximum Euclidean distance between two adjacent time series values and is specified by the user.
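The pruning rules and the deletion step can be sketched as below. Several details are assumptions, because the text does not fix them: patterns are `(start_index, values)` tuples; "the same region of width $w_s$" is read as both patterns fitting inside one window of $w_s$ time steps; rule 2 is applied to any pair with opposite slope signs; the Euclidean distance between patterns of different length is taken over their common prefix; and on a tie in length the second pattern is deleted. All function names are hypothetical.

```python
import math


def direction(values):
    """Sign of (last value - first value) of a pattern."""
    diff = values[-1] - values[0]
    return (diff > 0) - (diff < 0)


def pattern_distance(a, b):
    """Euclidean distance over the common prefix (an assumption of this sketch)."""
    k = min(len(a), len(b))
    return math.dist(a[:k], b[:k])


def cluster_patterns(patterns, w_s, d_min):
    """Return (distance matrix D, reduced pattern set P)."""
    n = len(patterns)
    # Every entry starts at infinity; only unpruned pairs are computed.
    D = [[math.inf] * n for _ in range(n)]
    for a in range(n):
        start_a, vals_a = patterns[a]
        for b in range(a + 1, n):
            start_b, vals_b = patterns[b]
            last = max(start_a + len(vals_a) - 1, start_b + len(vals_b) - 1)
            # Pruning rule 1: the pair does not fit in one window of width w_s.
            if last - min(start_a, start_b) > w_s:
                continue
            # Pruning rule 2: slopes of opposite sign are never similar.
            if direction(vals_a) * direction(vals_b) < 0:
                continue
            D[a][b] = D[b][a] = pattern_distance(vals_a, vals_b)
    removed = set()
    for a in range(n):
        for b in range(a + 1, n):
            if a in removed or b in removed:
                continue
            if D[a][b] <= d_min:
                # Delete the pattern with fewer time series values.
                shorter = a if len(patterns[a][1]) < len(patterns[b][1]) else b
                removed.add(shorter)
    kept = [p for idx, p in enumerate(patterns) if idx not in removed]
    return D, kept


if __name__ == "__main__":
    pats = [(0, [1, 2, 3]), (2, [3, 2, 1]), (4, [1, 2, 3, 4]), (50, [1, 2, 3])]
    print(cluster_patterns(pats, w_s=10, d_min=0.5)[1])
```

Only pairs that survive both rules reach `pattern_distance`, which is where the saving in distance computations comes from.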

For the R-tree combined with the pruning rules, the R-tree is used to index the candidate pattern set $P_c$. As shown in FIG. 4, $P_1$ to $P_9$ are candidate patterns, and each leaf node in the R-tree structure is the MBR (minimum bounding rectangle) of one candidate pattern. The specific steps are as follows:

step 2.1: construct the MBR of each pattern in the candidate pattern set using the R-tree, forming a data structure of the patterns and obtaining an index of the patterns; each leaf node is the MBR of a pattern, and the intermediate entries of the R-tree index patterns with nearby MBRs. This data structure reduces the time complexity of the algorithm by reducing the number of patterns processed. FIG. 3 illustrates the MBR between pattern $p_1$ and pattern $p_2$.

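step 2.2: for each pair of child nodes i and j in the R-tree data structure, set to infinity every pattern distance value that satisfies a rule condition, using pruning rules 1 and 2, where

$D_{\alpha\beta}=d_{\alpha\beta}(p_\alpha,p_\beta)$, and $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;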

step 2.3: for every element $D_{\alpha\beta}$ of the distance matrix that is not infinite, calculate the Euclidean distance and assign it to the corresponding position in the distance matrix.

Step 2.4: compare $d_{\alpha\beta}(p_\alpha,p_\beta)$ with the user-specified $d_{min}$; if $d_{\alpha\beta}\le d_{min}$, delete from $P_c$ whichever of the two interest patterns $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new interest pattern set $P$; where $d_{min}$ takes a value between the minimum and the maximum Euclidean distance between two adjacent time series values and is specified by the user.
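To make the MBR idea concrete, the sketch below computes the bounding rectangle of a pattern in the (time index, value) plane and the rectangle enclosing two patterns, which is the kind of box an R-tree stores in an intermediate entry. It is not a full R-tree; the `MBR` type, the function names, and the use of the enclosing rectangle's time extent as a pre-check for pruning rule 1 are assumptions made for illustration.

```python
from typing import NamedTuple, Sequence


class MBR(NamedTuple):
    """Minimum bounding rectangle in the (time index, value) plane."""
    t_min: int
    t_max: int
    v_min: float
    v_max: float


def pattern_mbr(start: int, values: Sequence[float]) -> MBR:
    """MBR of one pattern that starts at index `start`."""
    return MBR(start, start + len(values) - 1, min(values), max(values))


def union(a: MBR, b: MBR) -> MBR:
    """Smallest rectangle enclosing both MBRs."""
    return MBR(min(a.t_min, b.t_min), max(a.t_max, b.t_max),
               min(a.v_min, b.v_min), max(a.v_max, b.v_max))


def within_window(a: MBR, b: MBR, w_s: int) -> bool:
    """True if both patterns fit in one window of w_s time steps (assumed reading of rule 1)."""
    u = union(a, b)
    return u.t_max - u.t_min <= w_s


if __name__ == "__main__":
    m1 = pattern_mbr(0, [1, 2, 3, 2])
    m2 = pattern_mbr(3, [2, 1, 2])
    print(union(m1, m2), within_window(m1, m2, 5))
```

Because the check uses only four numbers per pattern, whole groups of patterns under one intermediate entry can be rejected without touching their time series values.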

stage two: generating prediction rules

step 3: computing association rules using the Apriori algorithm

The interest pattern sets $P$ of all time variables are merged into $P_{all}$, and the Apriori algorithm is applied to the interest patterns in $P_{all}$ for association rule mining, obtaining the interest pattern association rules among different time variables:

where $g=1,2,\dots,R$, $p_{vj}$ is an interest pattern formed by a condition variable, and $p'$ is an interest pattern formed by the decision variable;

the direction, delay and variation of the association rule $r_g$ are calculated according to the following formulas:

the direction calculation formula of the association rule is as follows:

where $m(p_{vM})$ is the slope between $s_L$ and $s_1$, $m(p_{vM})=\mathrm{sgn}(s_L-s_1)$; $p_{vM}$ is the interest pattern in the condition variables that has the greatest influence on the decision variable; $s_L$ is the last time series value in $p_{vM}$ and $s_1$ is the first time series value in $p_{vM}$; $m(p')$ is the slope between $s'_L$ and $s'_1$, where $s'_L$ is the last time series value in $p'$ and $s'_1$ is the first time series value in $p'$;

the time delay is calculated as follows:

$\Delta(r_g)=I_{p_{vM}}-I_{p'}$

where $I_{p_{vM}}$ is the starting time value of $p_{vM}$ and $I_{p'}$ is the starting time value of $p'$;

the variation of the rule is calculated as follows:

$V(p_{vj})=(\max(p_{vj})-\min(p_{vj}))\times m(p_{vj})$

$V(p')=(\max(p')-\min(p'))\times m(p')$ (5)

where $V(p_{vj})$ is the variation of the interest pattern $p_{vj}$ in the condition variable, $V(p')$ is the variation of the interest pattern $p'$ in the decision variable, and $\max(p_{vj})$ and $\min(p_{vj})$ are respectively the maximum and minimum time series values of the interest pattern $p_{vj}$.
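As a small worked illustration of the direction, delay and variation quantities, the sketch below applies $m(p)=\mathrm{sgn}(s_L-s_1)$, $V(p)=(\max(p)-\min(p))\times m(p)$ and $\Delta(r_g)=I_{p_{vM}}-I_{p'}$ to plain Python lists. The function names and the use of integer time indices for the starting times are assumptions made for illustration.

```python
def direction(pattern):
    """m(p) = sgn(s_L - s_1): +1 for a rising pattern, -1 for a falling one, 0 if flat."""
    diff = pattern[-1] - pattern[0]
    return (diff > 0) - (diff < 0)


def variation(pattern):
    """V(p) = (max(p) - min(p)) * m(p): signed size of the change within the pattern."""
    return (max(pattern) - min(pattern)) * direction(pattern)


def delay(cond_start, decision_start):
    """Delta(r_g) = I_pvM - I_p': difference of the two starting time values."""
    return cond_start - decision_start


if __name__ == "__main__":
    air = [10.0, 12.0, 15.0]      # hypothetical air temperature pattern
    soil = [8.0, 8.5, 9.5]        # hypothetical soil temperature pattern
    print(direction(air), variation(air), variation(soil), delay(7, 4))
```

A rising pattern therefore has a positive variation and a falling pattern a negative one, which is why the bounds of the rising rule are positive and those of the falling rule negative.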

step 4: generating prediction rules

if $A_1\le V(p_{v1})\le B_1$ and $A_2\le V(p_{v2})\le B_2$ … and $A_j\le V(p_{vj})\le B_j$ … and $A_\lambda\le V(p_{v\lambda})\le B_\lambda$, then $C_1\le V(p')\le C_2$ with a delay of $\Delta T_1$ time units;

where $A_j$ and $B_j$ are respectively the minimum and maximum values of $V(p_{vj})$ in the interest pattern association rules $r_g$ with $m(p_{vM})=m(p')=1$; $C_1$ and $C_2$ are respectively the minimum and maximum values of $V(p')$ in those association rules $r_g$; $j=1,2,\dots,\lambda$, $\lambda\ge 1$, $\lambda$ is the number of condition variables; and $A_j$, $B_j$, $C_1$ and $C_2$ are all positive numbers.

where $E_j$ and $F_j$ are respectively the minimum and maximum values of $V(p_{vj})$ in the interest pattern association rules with $m(p_{vM})=m(p')=-1$; $G_1$ and $G_2$ are respectively the minimum and maximum values of $V(p')$ in those association rules; $E_j$, $F_j$, $G_1$ and $G_2$ are all negative numbers; $j=1,2,\dots,\eta$, $\eta\ge 1$, $\eta$ is the number of condition variables; $\Delta T_2=\max(\Delta(r_g))$, with $\Delta(r_g)=I_{p_{vM}}-I_{p'}$, where $I_{p_{vM}}$ is the starting time value of $p_{vM}$ and $I_{p'}$ is the starting time value of $p'$;
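The way step 4 folds several association rules into one prediction rule can be sketched as follows. The dictionary layout of an association rule (`m_cond`, `m_dec`, `v_cond`, `v_dec`, `delay`) and the function name `build_prediction_rule` are hypothetical; the text does not specify how the Apriori output is stored. The bounds are the minimum and maximum variations over the rules that share the given direction, and the delay is the maximum delay among them.

```python
def build_prediction_rule(assoc_rules, direction):
    """Combine the association rules with m(p_vM) = m(p') = direction (+1 or -1)."""
    group = [r for r in assoc_rules
             if r["m_cond"] == direction and r["m_dec"] == direction]
    if not group:
        return None
    n_cond = len(group[0]["v_cond"])  # number of condition variables
    # Antecedent: one (lower, upper) bound on V(p_vj) per condition variable j.
    antecedent = [
        (min(r["v_cond"][j] for r in group), max(r["v_cond"][j] for r in group))
        for j in range(n_cond)
    ]
    # Consequent: (lower, upper) bound on V(p').
    consequent = (min(r["v_dec"] for r in group), max(r["v_dec"] for r in group))
    # Delay: the largest delay among the combined association rules.
    max_delay = max(r["delay"] for r in group)
    return {"antecedent": antecedent, "consequent": consequent, "delay": max_delay}


if __name__ == "__main__":
    rules = [
        {"m_cond": 1, "m_dec": 1, "v_cond": [2.0], "v_dec": 0.5, "delay": 3},
        {"m_cond": 1, "m_dec": 1, "v_cond": [4.0], "v_dec": 1.5, "delay": 5},
        {"m_cond": -1, "m_dec": -1, "v_cond": [-3.0], "v_dec": -1.0, "delay": 2},
    ]
    print(build_prediction_rule(rules, 1))
    print(build_prediction_rule(rules, -1))
```

Calling the function once with `direction=1` and once with `direction=-1` yields the rising and the falling prediction rule respectively.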

The invention uses the hit rate H to illustrate the performance of the algorithm, where H is defined as:

where N is the number of interest patterns in the condition variables that exactly match a prediction rule, and M is the total number of interest patterns in the condition variables.
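Stage three matching and the hit rate can be sketched as below. The formula for H is not reproduced in this text, so the sketch assumes the plain ratio $H=N/M$ of the two counts just defined; that reading, the rule layout (a list of `(lower, upper)` bounds under the key `antecedent`) and the function names are assumptions made for illustration.

```python
def matches(rule, v_cond):
    """True if every condition variation lies inside its [lower, upper] bound."""
    return all(lo <= v <= hi for (lo, hi), v in zip(rule["antecedent"], v_cond))


def hit_rate(rule, observed):
    """Assumed H = N / M: share of observed condition patterns matching the rule."""
    if not observed:
        return 0.0
    n_hits = sum(1 for v_cond in observed if matches(rule, v_cond))
    return n_hits / len(observed)


if __name__ == "__main__":
    rule = {"antecedent": [(2.0, 4.0)], "consequent": (0.5, 1.5), "delay": 5}
    observed = [[2.5], [5.0], [3.0], [1.0]]  # hypothetical V(p_vj) values
    print(hit_rate(rule, observed))
```

When `matches` returns true for a pattern from the data under test, the rule's consequent bounds and delay are what stage three would report for the decision variable.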

The following embodiments of the present invention are provided, and it should be noted that the present invention is not limited to the following embodiments, and all equivalent changes based on the technical solutions of the present invention are within the protection scope of the present invention.

Example 1

This embodiment uses the algorithm of the invention to predict the soil temperature of the ancient avenue of the Ming Great Wall. FIG. 1 and FIG. 2 are respectively the time series diagram and the variable relationship diagram of the air temperature and the rammed earth temperature at that site. The time series data are processed through the three stages described above; in stage one, this embodiment clusters the candidate interest pattern set using the pruning rules only. The result is shown in FIG. 5, in which one curve represents the result obtained using neither the pruning rules nor the R-tree data structure and the other represents the result obtained using the pruning rules. As the number of time series increases, the time needed to calculate the Euclidean distances without the pruning rules grows exponentially, whereas the time needed with the pruning rules grows slowly.

Example 2

This embodiment differs from embodiment 1 in that stage one uses the R-tree data structure combined with the pruning rules to cluster the candidate interest pattern set, which reduces the time complexity of the algorithm by reducing the number of patterns processed, as shown in FIG. 6. The horizontal axis represents the number of time series. As the number of time series increases, the time for distance matrix calculation using the pruning rules combined with the R-tree increases more slowly than the time using the pruning rules alone. In the time series diagram of the air temperature, $p_1$ and $p_2$ are candidate patterns identified by the algorithm of the invention; an MBR can be constructed between the two patterns, and FIG. 3 shows the MBR between pattern $p_1$ and pattern $p_2$.

Table 1 shows the six specific prediction rules generated in this example.

FIG. 7 shows the prediction results obtained with the six prediction rules, where the horizontal axis represents the monitoring region and the vertical axis represents the hit rate. FIG. 7 compares the hit rates H of the six generated prediction rules in regions 1, 2, 3, 4 and 5; the average hit rate of rule 3 is the highest.

Claims

1. A soil temperature mode prediction method is characterized by comprising the following steps:

stage one: searching an air temperature candidate interest mode set and clustering the air temperature candidate interest mode set;

step 1.1: searching an air temperature candidate interest mode set;

step 1.1.1: finding an initial subsequence of available air temperatures

For the air temperature time series $S=\{s_1,\dots,s_l\}$, starting from $s_1$, search in turn for two adjacent time series values whose slope $m_1$ is not equal to 0, and take the first such pair of adjacent air temperature time series values as the initial subsequence $S_i=\{s_i,s_{i+1}\}$, where $i=1,2,\dots,l-1$, $l$ is the length of the time series, and $s_i$ is the air temperature at the $i$-th time point; the slope $m_1$ is calculated as follows:

step 1.1.2: calculating the slope of the adjacent air temperature time series value

Step 1.1.3: obtaining air temperature interest patterns

If $m_2$ is not equal to $m_1$, the air temperature interest pattern $p_\alpha=\{s_i,s_{i+1},s_{i+2}\}$ is obtained;

If $m_2$ is equal to $m_1$, continue step 1.1.2 until $m_k$ is not equal to $m_1$, thus obtaining the air temperature interest pattern $p_\alpha=\{s_i,s_{i+1},\dots,s_{i+k}\}$, where $m_k$ is the slope between $s_{i+k}$ and $s_{i+k-1}$, $k=1,2,\dots,l-2$;

step 1.1.4: obtaining a candidate air temperature interest pattern set

For the air temperature time series $S=\{s_1,\dots,s_l\}$, starting from interest pattern $p_\alpha$, repeat steps 1.1.1 to 1.1.3 until the entire air temperature time series has been searched; all interest patterns found in $S$ form the candidate interest pattern set $P_c=\{p_1,p_2,\dots,p_\alpha,\dots,p_\beta,\dots,p_n\}$;

Step 1.2: clustering the candidate interest patterns of the air temperature;

step 1.2.1: using pruning rules 1 and 2 below, set to infinity every pattern distance value that satisfies a rule condition;

pruning rule 1: if any two interest patterns $p_\alpha$, $p_\beta$ in the candidate air temperature interest pattern set $P_c$ do not both occur within the same region of width $w_s$, set $D_{\alpha\beta}$ in the distance matrix $D$ to infinity, where $w_s$ is a user-specified parameter, $D$ is the distance matrix of the interest patterns,

$D_{\alpha\beta}=d_{\alpha\beta}(p_\alpha,p_\beta)$, and $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;

pruning rule 2: if the slope of air temperature interest pattern $p_\alpha$ is negative and the slope of air temperature interest pattern $p_\beta$ is positive, set $D_{\alpha\beta}$ in the distance matrix $D$ to infinity;

step 1.2.2: for every element $D_{\alpha\beta}$ of the distance matrix that is not infinite, calculate the distance and assign it to the corresponding position in the distance matrix;

step 1.2.3: compare $d_{\alpha\beta}(p_\alpha,p_\beta)$ with the user-specified $d_{min}$; if $d_{\alpha\beta}\le d_{min}$, delete from the air temperature candidate set $P_c$ whichever of the two air temperature interest patterns $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new air temperature interest pattern set $P$;

where $d_{min}$ takes a value between the minimum and the maximum Euclidean distance between two adjacent air temperature time series values and is specified by the user;

stage two: searching a soil temperature candidate interest pattern set and clustering the soil temperature candidate interest pattern set;

step 2.1: searching a soil body temperature candidate interest mode set;

step 2.1.1: finding an initial subsequence of available soil temperature

For the soil temperature time series $S'=\{s'_1,\dots,s'_l\}$, starting from $s'_1$, search in turn for two adjacent time series values whose slope $m'_1$ is not equal to 0, and take the first such pair of adjacent soil temperature time series values as the initial subsequence $S'_i=\{s'_i,s'_{i+1}\}$, where $i=1,2,\dots,l-1$, $l$ is the length of the time series, and $s'_i$ is the soil temperature at the $i$-th time point; the slope $m'_1$ is calculated as follows:

step 2.1.2: calculating the slope of the adjacent soil temperature time sequence value

Add the next value $s'_{i+2}$ to the available initial subsequence and calculate the slope $m'_2$ between $s'_{i+2}$ and $s'_{i+1}$;

Step 2.1.3: interest mode for obtaining soil temperature

If $m'_2$ is not equal to $m'_1$, the soil temperature interest pattern $p'_\alpha=\{s'_i,s'_{i+1},s'_{i+2}\}$ is obtained;

If $m'_2$ is equal to $m'_1$, continue step 2.1.2 until $m'_k$ is not equal to $m'_1$, thus obtaining the soil temperature interest pattern $p'_\alpha=\{s'_i,s'_{i+1},\dots,s'_{i+k}\}$, where $m'_k$ is the slope between $s'_{i+k}$ and $s'_{i+k-1}$, $k=1,2,\dots,l-2$;

step 2.1.4: obtaining a candidate soil temperature interest pattern set

For the soil temperature time series $S'=\{s'_1,\dots,s'_l\}$, starting from interest pattern $p'_\alpha$, repeat steps 2.1.1 to 2.1.3 until the entire soil temperature time series has been searched; all interest patterns found in $S'$ form the candidate interest pattern set $P'_c=\{p'_1,p'_2,\dots,p'_\alpha,\dots,p'_\beta,\dots,p'_n\}$;

Step 2.2: clustering soil body temperature candidate interest mode sets;

step 2.2.1: using pruning rules 1 and 2 below, set to infinity every pattern distance value that satisfies a rule condition;

pruning rule 1: if any two interest patterns $p'_\alpha$, $p'_\beta$ in the candidate soil temperature interest pattern set $P'_c$ do not both occur within the same region of width $w'_s$, set $D'_{\alpha\beta}$ in the distance matrix $D'$ to infinity, where $w'_s$ is a user-specified parameter, $D'$ is the distance matrix of the interest patterns,

$D'_{\alpha\beta}=d'_{\alpha\beta}(p'_\alpha,p'_\beta)$, and $d'_{\alpha\beta}$ is the Euclidean distance between $p'_\alpha$ and $p'_\beta$;

pruning rule 2: if the slope of soil temperature interest pattern $p'_\alpha$ is negative and the slope of soil temperature interest pattern $p'_\beta$ is positive, set $D'_{\alpha\beta}$ in the distance matrix $D'$ to infinity;

step 2.2.2: for every element $D'_{\alpha\beta}$ of the distance matrix that is not infinite, calculate the distance and assign it to the corresponding position in the distance matrix;

step 2.2.3: compare $d'_{\alpha\beta}(p'_\alpha,p'_\beta)$ with the user-specified $d'_{min}$; if $d'_{\alpha\beta}\le d'_{min}$, delete from the soil temperature candidate set $P'_c$ whichever of the two soil temperature interest patterns $p'_\alpha$ and $p'_\beta$ contains fewer time series values, finally obtaining a new soil temperature interest pattern set $P'$;

where $d'_{min}$ takes a value between the minimum and the maximum Euclidean distance between two adjacent soil temperature time series values and is specified by the user;

stage three: generating prediction rules

step 3: computing association rules using the Apriori algorithm

Merge the interest pattern set $P$ of each air temperature time variable and the interest pattern set $P'$ of the soil temperature time variable to obtain $P_{all}$, and apply the Apriori algorithm to the interest patterns in $P_{all}$ for association rule mining, obtaining a plurality of association rules among different time variables;

step 4: generating prediction rules

where $p_{vj}$ is an air temperature interest pattern, $j=1,2,\dots,\lambda$, $\lambda\ge 1$, $\lambda$ is the number of condition variables, and $p'$ is a soil temperature interest pattern;

$m(p_{vM})$ is the slope between $s_L$ and $s_1$, $m(p_{vM})=\mathrm{sgn}(s_L-s_1)$; $p_{vM}$ is the interest pattern in the air temperature variable that has the greatest influence on the soil temperature variable; $s_L$ is the last air temperature value in $p_{vM}$ and $s_1$ is the first air temperature value in $p_{vM}$; $m(p')$ is the slope between $s'_L$ and $s'_1$, where $s'_L$ is the last soil temperature value in $p'$ and $s'_1$ is the first soil temperature value in $p'$;

$V(p_{vj})$ is the variation of the interest pattern $p_{vj}$ in the air temperature, and $V(p')$ is the variation of the interest pattern $p'$ in the soil temperature,

$V(p_{vj})=(\max(p_{vj})-\min(p_{vj}))\times m(p_{vj})$

$V(p')=(\max(p')-\min(p'))\times m(p')$

where $\max(p_{vj})$ and $\min(p_{vj})$ are respectively the maximum and minimum air temperature time series values of the interest pattern $p_{vj}$;

where $E_j$ and $F_j$ are respectively the minimum and maximum values of $V(p_{vj})$ in the interest pattern association rules with $m(p_{vM})=m(p')=-1$, $j=1,2,\dots,\eta$, $\eta\ge 1$, $\eta$ is the number of condition variables; $G_1$ and $G_2$ are respectively the minimum and maximum values of $V(p')$ in those association rules; $E_j$, $F_j$, $G_1$ and $G_2$ are all negative numbers, and $j$ is a natural number; $\Delta T_2=\max(\Delta(r_g))$;

stage four: the air temperature of the earthen site monitoring area is passed through stage one to obtain interest patterns, which are matched against the prediction rules of stage three; if the conditions of a prediction rule are met, the soil temperature pattern is output.

2. The soil temperature pattern prediction method of claim 1, wherein the candidate interest pattern set clustering method in step 1.2 may be replaced by the following method:

step 1.2.1: construct the MBR of each pattern in the candidate pattern set using the R-tree, forming a data structure of the patterns and obtaining an index of the patterns;

step 1.2.2: for each pair of child nodes i and j in the R-tree data structure, set to infinity every pattern distance value that satisfies a rule condition, using the following pruning rules 1 and 2;

pruning rule 1: if two air temperature interest patterns $p_\alpha$, $p_\beta$ do not both occur within the same region of width $w_s$, set $D_{\alpha\beta}$ in the distance matrix to infinity, where $w_s$ is a user-specified parameter, $D$ is the distance matrix of the interest patterns,

$D_{\alpha\beta}=d_{\alpha\beta}(p_\alpha,p_\beta)$, and $d_{\alpha\beta}$ is the Euclidean distance between $p_\alpha$ and $p_\beta$;

pruning rule 2: if the slope of air temperature interest pattern $p_\alpha$ is negative and the slope of interest pattern $p_\beta$ is positive, set $D_{\alpha\beta}$ in the distance matrix $D$ to infinity;

step 1.2.3: for every element $D_{\alpha\beta}$ of the distance matrix $D$ that is not infinite, calculate the Euclidean distance and assign it to the corresponding position in the distance matrix;

step 1.2.4: compare $d_{\alpha\beta}(p_\alpha,p_\beta)$ with the user-specified $d_{min}$; if $d_{\alpha\beta}\le d_{min}$, delete from $P_c$ whichever of the two air temperature interest patterns $p_\alpha$ and $p_\beta$ contains fewer time series values, finally obtaining a new interest pattern set $P$; where $d_{min}$ takes a value between the minimum and the maximum Euclidean distance between two adjacent air temperature time series values and is specified by the user.