CN117828308A - Time series prediction method based on local segmentation - Google Patents

Time series prediction method based on local segmentation

Info

Publication number: CN117828308A
Application number: CN202410238526.7A
Authority: CN (China)
Prior art keywords: data, training, time series, segmentation, attention
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: Wang Tao (王涛), Yang Bin (杨斌), Zhao Ying (赵影), He Yefeng (贺业凤)
Current and original assignee: Shandong Jerei Digital Technology Co Ltd (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Application filed by: Shandong Jerei Digital Technology Co Ltd
Priority and filing date: 2024-03-04 (the priority date is an assumption and is not a legal conclusion)

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation > G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F18/00 Pattern recognition > G06F18/20 Analysing > G06F18/24 Classification techniques > G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches > G06F18/2415 based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/04 Architecture, e.g. interconnection topology > G06N3/048 Activation functions


Abstract

The application discloses a time series prediction method based on local segmentation, belonging to the technical field of artificial intelligence. The method comprises the following steps: dividing and preprocessing the original time series data to obtain a second training set; inputting the second training set into a constructed original prediction model, comprising a local segmentation module, a Transformer attention module, a segment flattening module and a fully connected layer connected in sequence, and training it to obtain a target prediction model; and inputting the time series data to be predicted into the target prediction model to obtain a target prediction result. The method converts the attention computation from correlations between time series values at single time points into similarities between sequence segments, so it can capture changes in the historical data more accurately while preserving the local semantic information within each time period, thereby reducing the spatial complexity of the attention computation and making the obtained prediction result more accurate.

Description

Time series prediction method based on local segmentation
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a time series prediction method based on local segmentation.
Background
A time series is data arranged in time order, reflecting the variation of one or more variables over a period of time; it exhibits time dependence and correlation. Time series prediction can reveal the change patterns and trends of observed variables and provide data support for fine-grained management and intelligent decision-making. As an important application of artificial intelligence, time series prediction can assist the development of industries such as manufacturing, economics and finance, and resource monitoring.
Currently, long short-term memory (LSTM) networks and other models based on recurrent neural networks (RNN) have shown good multi-step prediction performance in many practical applications. However, limited by their model structure, recurrent neural networks are often unsuited to parallel training and suffer from vanishing gradients, which limits the length of the sequences they can capture. The temporal convolutional network (TCN), based on convolutional neural networks (CNN), solves the parallel-training problem, but its reliance on stacked hidden layers to obtain a larger receptive field makes its memory requirements enormous. Therefore, a time series prediction method is needed to solve the above problems.
Disclosure of Invention
In view of this, the present application provides a time series prediction method based on local segmentation, which converts the attention computation from correlations between time series values at single time points into similarities between sequence segments, so that changes in the historical data can be grasped more accurately while the local semantic information within each time period is preserved, thereby reducing the spatial complexity of the attention computation and making the obtained prediction result more accurate.
Specifically, the technical solution is as follows:
The embodiment of the application provides a time series prediction method based on local segmentation, which comprises the following steps:
performing data division on the original time series data to obtain a first training set, a first validation set and a first test set;
preprocessing the first training set, the first validation set and the first test set to obtain a preprocessed second training set, second validation set and second test set;
constructing an original prediction model, wherein the original prediction model comprises a local segmentation module, a Transformer attention module, a segment flattening module and a fully connected layer connected in sequence;
inputting the second training set into the original prediction model for training to obtain a target prediction model;
and inputting the time series data to be predicted into the target prediction model to obtain a target prediction result.
In some embodiments, preprocessing the first training set, the first validation set and the first test set to obtain a preprocessed second training set, second validation set and second test set includes:
extracting feature maps of the first training set, the first validation set and the first test set;
calculating the per-channel data mean and variance of the first training set, the first validation set and the first test set;
and normalizing the feature maps based on the data mean and variance to obtain the second training set, the second validation set and the second test set.
In some embodiments, inputting the second training set into the original prediction model for training to obtain a target prediction model includes:
performing segmentation of the second training set into overlapping time series segments in the local segmentation module to obtain segmented data;
position-encoding the segmented data to obtain position-encoded data containing position information;
performing self-attention computation on the position-encoded data in the Transformer attention module to obtain decoded output data;
flattening the decoded output data in the segment flattening module to obtain one-dimensional flattened data;
inputting the one-dimensional flattened data into the fully connected layer to obtain an intermediate training result and an intermediate prediction model;
calculating a loss function value based on the intermediate training result, the mean absolute error $MAE$ and the Nash-Sutcliffe efficiency coefficient $NSE$;
and stopping training in response to the loss function value satisfying a condition, to obtain the target prediction model.
In some embodiments, the Transformer attention module has a Transformer architecture and includes an encoder and a decoder; the encoder computes self-attention and feeds its result to the decoder, which computes cross-attention;
the encoder includes a first multi-head self-attention layer, a first residual connection layer and a first normalization layer, and the decoder includes a second multi-head self-attention layer, a multi-head cross-attention layer, a second residual connection layer and a second normalization layer.
In some embodiments, performing segmentation of the second training set into overlapping time series segments in the local segmentation module to obtain segmented data includes:

partitioning the time series data $x \in \mathbb{R}^{L}$ in the second training set into overlapping or non-overlapping local segments $x_i \in \mathbb{R}^{P}$;

calculating the number of segments $N$ according to the following formula:

$$N = \left\lfloor \frac{L - P}{S} \right\rfloor + 2$$

where $i$ identifies an individual time series segment after segmentation and takes values in $(0, N)$, $P$ is the segment length, $N$ is the number of segments, $L$ is the total length of the time series data $x$ in the second training set, and $S$ is the stride, i.e. the non-overlapping step between two consecutive segments.
In some embodiments, the segmented data is position-encoded according to the following formulas, resulting in position-encoded data containing position information:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where $PE_{(pos,2i)}$ denotes the even entries of the position-encoded data, $PE_{(pos,2i+1)}$ the odd entries, $pos$ the position of the position-encoded data in the time series data, and $d_{model}$ the dimension of the position encoding.
In some embodiments, for a fixed offset $k$, the relative positional relationship between $PE_{pos+k}$ and $PE_{pos}$ is obtained according to the following formulas, which follow from the sine and cosine addition theorems:

$$PE_{(pos+k,\,2i)} = PE_{(pos,\,2i)}\,PE_{(k,\,2i+1)} + PE_{(pos,\,2i+1)}\,PE_{(k,\,2i)}$$

$$PE_{(pos+k,\,2i+1)} = PE_{(pos,\,2i+1)}\,PE_{(k,\,2i+1)} - PE_{(pos,\,2i)}\,PE_{(k,\,2i)}$$
in some embodiments, the decoded output data is derived according to the following formula:
wherein,representing said decoded output data, +.>Represented is a normalized exponential function, +.>Representing an initial component of the position-coded data,QKVa query component, a key component and a numeric component, respectively, of the position-coded data, +.>、/>、/>Respectively represent a weight matrix corresponding to the query component, a weight matrix corresponding to the key component and a weight corresponding to the numerical componentMatrix (S)>Denoted by the transposed component of the key component,/->Representing the dimensions of the position-coded data,Linearrepresented as a linear function +.>Representing the position-coded data after a positive normalization process.
In some embodiments, the mean absolute error $MAE$ and the Nash-Sutcliffe efficiency coefficient $NSE$ are calculated according to the following formulas:

$$MAE = \frac{1}{M}\sum_{r=1}^{M}\left|\hat{y}_r - y_r\right|$$

$$NSE = 1 - \frac{\sum_{r=1}^{M}\left(y_r - \hat{y}_r\right)^2}{\sum_{r=1}^{M}\left(y_r - \bar{y}\right)^2}$$

where $M$ is the length of the prediction period in the intermediate training result, $\hat{y}_r$ is the predicted value of the time series data at the $r$-th time step in the intermediate training result, $y_r$ is the corresponding observed value, $\bar{y}$ is the mean of the observed values in the intermediate training result, and $r$ is the prediction time step.
The beneficial effects of the technical solution provided by the embodiments of the application include at least the following:
The embodiments of the application provide a time series prediction method based on local segmentation. The second training set is input into a constructed original prediction model, comprising a local segmentation module, a Transformer attention module, a segment flattening module and a fully connected layer connected in sequence, and trained to obtain a target prediction model. The constructed original prediction model addresses the slow training, slow prediction and low efficiency caused by long time series periods and irregular change trends: the time series data is divided into several equal-length, overlapping time segments, so that the Transformer attention module operates on the sequence segments corresponding to time periods rather than on the time series values at single time points, which strengthens the learning of similar temporal change processes and reduces the spatial complexity of the attention computation. The time series data to be predicted is then input into the target prediction model to obtain a target prediction result, so the obtained result is more accurate. The method converts the attention computation from correlations between time series values at single time points into similarities between sequence segments; it can therefore capture changes in the historical data more accurately while preserving the local semantic information within each time period, reducing the spatial complexity of the attention computation and making the obtained prediction result more accurate.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; other drawings can be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a flowchart of the local segmentation-based time series prediction method provided by an embodiment of the present application;
Fig. 2 (a) shows the prediction results of 30-day runoff prediction with the local segmentation-based time series prediction method provided by an embodiment of the present application;
Fig. 2 (b) shows the prediction results of 90-day runoff prediction with the local segmentation-based time series prediction method provided by an embodiment of the present application;
Fig. 2 (c) shows the prediction results of 180-day runoff prediction with the local segmentation-based time series prediction method provided by an embodiment of the present application.
Detailed Description
The embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art without inventive effort based on the present disclosure fall within the scope of the present disclosure.
The embodiment of the application provides a time series prediction method based on local segmentation, which comprises the following steps.
Step 101: divide the original time series data to obtain a first training set, a first validation set and a first test set.
In some embodiments, the raw time series data is aligned along the time dimension and then divided into the first training set, the first validation set and the first test set.
It should be noted that the first training set is used to train the model, the first validation set is used to tune the model parameters, and the first test set is used to evaluate the model.
Step 102: preprocess the first training set, the first validation set and the first test set to obtain a preprocessed second training set, second validation set and second test set.
The preprocessing reduces the effects of distribution shift between the second training set, the second validation set and the second test set.
In some embodiments, step 102 specifically includes: (1) extracting feature maps of the first training set, the first validation set and the first test set to obtain the feature information of the data; (2) calculating the per-channel data mean and variance of the first training set, the first validation set and the first test set; (3) normalizing the feature maps based on the data mean and variance to obtain the second training set, the second validation set and the second test set. Computing the per-channel mean and variance and using them to normalize the feature maps reduces the distribution shift between the three preprocessed sets.
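As a minimal sketch of this normalization step, assuming the splits are held as NumPy arrays of shape (length, channels); the function name and the small epsilon are illustrative additions, not taken from the patent:

```python
import numpy as np

def channel_normalize(x: np.ndarray) -> np.ndarray:
    """Z-score normalize each channel of a split with shape (length, channels)."""
    mean = x.mean(axis=0, keepdims=True)                 # per-channel mean
    std = np.sqrt(x.var(axis=0, keepdims=True)) + 1e-8   # per-channel std; epsilon avoids division by zero
    return (x - mean) / std

# Applied independently to the three splits described above:
# train2, val2, test2 = map(channel_normalize, (train1, val1, test1))
```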
In some embodiments, training runs for 100 epochs; the initial learning rate of the original prediction model is tuned, and the model is optimized with the Adam algorithm while the initial learning rate is decayed.
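A sketch of this training configuration in PyTorch; the learning rate and the exponential decay factor are assumptions, since the patent only states that the initial learning rate is tuned and decayed under Adam:

```python
import torch
from torch import nn

model = nn.Linear(16, 1)  # stand-in for the prediction model constructed in step 103
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)  # decay factor assumed

for epoch in range(100):  # 100 rounds of epoch training
    # ... forward pass, loss computation and loss.backward() over the second training set ...
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()      # attenuate the learning rate once per epoch
```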
Step 103: construct an original prediction model comprising a local segmentation module, a Transformer attention module, a segment flattening module and a fully connected layer connected in sequence.
The constructed original prediction model addresses the slow training, slow prediction and low efficiency caused by long time series periods and irregular change trends: the local segmentation module divides the time series data into several equal-length segments that may overlap, so that the Transformer attention module operates on the sequence segments corresponding to time periods rather than on the time series values at single time points, which strengthens the learning of similar temporal change processes and reduces the spatial complexity of the attention computation. The Transformer attention module obtains, via attention computation, the time periods with high correlation in the time series data; the segment flattening module flattens the segment-wise outputs back into dimensions usable by the model; and the fully connected layer converts the model's internal dimensions into the dimensions required for prediction.
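To make the module sequence concrete, the following is a structural sketch in PyTorch. It simplifies under stated assumptions: the attention module is encoder-only (the patent's module also contains a decoder with cross-attention), a learned linear embedding stands in for the per-segment projection, position encoding is omitted, and all hyperparameters are illustrative:

```python
import torch
from torch import nn

class LocalPatchPredictor(nn.Module):
    """Sketch of the pipeline: local segmentation -> Transformer attention ->
    segment flattening -> fully connected layer. Sizes are illustrative."""

    def __init__(self, seq_len=336, patch_len=16, stride=8, d_model=128, horizon=96):
        super().__init__()
        self.patch_len, self.stride = patch_len, stride
        # number of segments after padding the last value `stride` times
        self.n_patches = (seq_len - patch_len) // stride + 2
        self.embed = nn.Linear(patch_len, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2)
        self.head = nn.Linear(self.n_patches * d_model, horizon)

    def forward(self, x):                                            # x: (batch, seq_len)
        x = torch.cat([x, x[:, -1:].repeat(1, self.stride)], dim=1)  # pad with last value
        patches = x.unfold(1, self.patch_len, self.stride)           # local segmentation module
        z = self.encoder(self.embed(patches))                        # Transformer attention module
        return self.head(z.flatten(1))                               # segment flattening + fully connected layer
```

For example, LocalPatchPredictor()(torch.randn(8, 336)) returns a tensor of shape (8, 96), i.e. one 96-step forecast per series in the batch.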
Step 104: input the second training set into the original prediction model for training to obtain the target prediction model.
Training the original prediction model yields the target prediction model, so that prediction can be performed more accurately.
In some embodiments, step 104 specifically includes:
(1) Performing segmentation of the second training set into overlapping time series segments in the local segmentation module to obtain segmented data.
The local segmentation module segments the second training set into time segments that may overlap, dividing the time series data into several equal-length, possibly overlapping segments to obtain the segmented data. This preserves the local semantic information within each time period, makes the obtained prediction result more accurate, and benefits the efficiency of the subsequent computation.
In some embodiments, performing segmentation of the second training set into overlapping time series segments in the local segmentation module to obtain segmented data specifically includes:

partitioning the time series data $x \in \mathbb{R}^{L}$ in the second training set into overlapping or non-overlapping local segments $x_i \in \mathbb{R}^{P}$;

calculating the number of segments $N$ according to the following formula:

$$N = \left\lfloor \frac{L - P}{S} \right\rfloor + 2$$

where $i$ identifies an individual time series segment after segmentation and takes values in $(0, N)$, $P$ is the segment length, $N$ is the number of segments, $L$ is the total length of the time series data $x$ in the second training set, and $S$ is the stride, i.e. the non-overlapping step between two consecutive segments.
Before the segmentation, the last value of $x$ is repeated $S$ times and padded to the end of the time series data in the second training set to ensure that the segmentation is well defined. The segmentation performed by the local segmentation module lets the attention computation operate on local time periods and their corresponding time series data, and reduces the number of inputs from $L$ to approximately $L/S$, meaning that the memory usage and computational complexity of the attention map decrease quadratically with the factor $S$. Therefore, when training time and GPU memory are limited, the segmentation allows the model to learn from a longer history, which significantly improves prediction performance.
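A minimal sketch of this padding-and-segmentation step in NumPy; the function name is illustrative:

```python
import numpy as np

def segment(x: np.ndarray, P: int, S: int) -> np.ndarray:
    """Split a 1-D series of length L into length-P segments with stride S.

    The last value is first repeated S times and appended, so the number of
    segments is N = (L - P) // S + 2, matching the formula above."""
    x = np.concatenate([x, np.repeat(x[-1], S)])   # pad with the last value
    N = (len(x) - P) // S + 1                      # segments of the padded series
    return np.stack([x[i * S : i * S + P] for i in range(N)])

x = np.arange(24.0)                  # L = 24
print(segment(x, P=8, S=4).shape)    # (6, 8): N = (24 - 8) // 4 + 2 = 6
```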
(2) Position-encoding the segmented data to obtain position-encoded data containing position information. The attention computation cannot by itself recover the positions of the input sequence; position encoding gives the segmented data a relative positional relationship so that the subsequent computation proceeds in order.
In some embodiments, the segmented data is position-encoded according to the following formulas, resulting in position-encoded data containing position information:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where $PE_{(pos,2i)}$ denotes the even entries of the position-encoded data, $PE_{(pos,2i+1)}$ the odd entries, $pos$ the position of the position-encoded data in the time series data, and $d_{model}$ the dimension of the position encoding.
In some embodiments, for a fixed offset $k$, the relative positional relationship between $PE_{pos+k}$ and $PE_{pos}$ is obtained according to the following formulas, which follow from the sine and cosine addition theorems:

$$PE_{(pos+k,\,2i)} = PE_{(pos,\,2i)}\,PE_{(k,\,2i+1)} + PE_{(pos,\,2i+1)}\,PE_{(k,\,2i)}$$

$$PE_{(pos+k,\,2i+1)} = PE_{(pos,\,2i+1)}\,PE_{(k,\,2i+1)} - PE_{(pos,\,2i)}\,PE_{(k,\,2i)}$$
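A direct transcription of these formulas, assuming an even encoding dimension; in the model the encoding is added to the embedded segments:

```python
import numpy as np

def positional_encoding(n_pos: int, d_model: int) -> np.ndarray:
    """Sinusoidal position encoding: sin on even entries, cos on odd entries."""
    pos = np.arange(n_pos)[:, None]            # position of each segment
    i = np.arange(0, d_model, 2)[None, :]      # paired even/odd dimensions
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((n_pos, d_model))
    pe[:, 0::2] = np.sin(angle)                # even entries
    pe[:, 1::2] = np.cos(angle)                # odd entries
    return pe

pe = positional_encoding(42, 128)              # one row per segment
```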
(3) Performing self-attention computation on the position-encoded data in the Transformer attention module to obtain decoded output data.
The Transformer attention module does not focus on a single time-period input alone but on the whole sequence, assigning a different weight to each time period in the sequence, so end-to-end global optimization can be achieved. The module computes over the time series data corresponding to time periods rather than single time points, which strengthens the learning of similar temporal change processes and reduces the spatial complexity of the attention computation.
In some embodiments, the Transformer attention module includes an encoder and a decoder; the encoder computes self-attention and feeds its result to the decoder, which computes cross-attention. The encoder includes a first multi-head self-attention layer, a first residual connection layer and a first normalization layer; the decoder includes a second multi-head self-attention layer, a multi-head cross-attention layer, a second residual connection layer and a second normalization layer.
In some embodiments, there may be multiple encoders and decoders, and each may be stacked in series over multiple layers.
In some embodiments, the decoded output data is derived according to the following formulas:

$$Q = X_p W_Q, \quad K = X_p W_K, \quad V = X_p W_V$$

$$X_{out} = Linear\left(softmax\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V\right)$$

where $X_{out}$ is the decoded output data; $softmax$ is the normalized exponential function; $X_p$ is the position-encoded data; $Q$, $K$ and $V$ are the query, key and value components of the position-encoded data; $W_Q$, $W_K$ and $W_V$ are the weight matrices corresponding to the query, key and value components respectively; $K^{T}$ is the transpose of the key component; $d_k$ is the dimension of the position-encoded data; and $Linear$ is a linear function. Conventional attention focuses on correlations between time series values at individual time points and ignores the local contextual trend around each point. With the segmentation performed by the local segmentation module, the attention computation operates on local time periods and their corresponding runoff values, and the number of inputs is reduced from $L$ to approximately $L/S$, meaning that the memory usage and computational complexity of the attention map decrease quadratically with the factor $S$.
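A single-head sketch of this computation in PyTorch; the patent's module is multi-head with residual connections and normalization layers, which are omitted here for clarity:

```python
import torch
import torch.nn.functional as F

def attention_output(Xp, Wq, Wk, Wv, linear):
    """Linear(softmax(Q K^T / sqrt(d_k)) V), following the formula above."""
    Q, K, V = Xp @ Wq, Xp @ Wk, Xp @ Wv                   # query, key and value components
    scores = Q @ K.transpose(-2, -1) / Q.size(-1) ** 0.5  # scaled dot-product scores
    return linear(F.softmax(scores, dim=-1) @ V)

# Illustrative sizes: 42 segments, model width 128.
Xp = torch.randn(1, 42, 128)
Wq, Wk, Wv = (torch.randn(128, 128) for _ in range(3))
out = attention_output(Xp, Wq, Wk, Wv, torch.nn.Linear(128, 128))
```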
(4) Flattening the decoded output data in the segment flattening module to obtain one-dimensional flattened data. The flattening is the inverse of the earlier segmentation and restores the data dimension.
(5) Inputting the one-dimensional flattened data into the fully connected layer to obtain an intermediate training result and an intermediate prediction model.
It should be noted that in a fully connected layer every node is connected to all nodes of the previous layer, integrating the features extracted upstream; because of this full connectivity, the fully connected layer generally also holds the most parameters. The number of nodes in the fully connected layer determines the output dimension. The intermediate prediction model is not yet the final target prediction model; its parameters still need tuning and evaluation to improve the prediction performance.
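A sketch of steps (4) and (5) together, with illustrative sizes:

```python
import torch
from torch import nn

# Decoded output of shape (batch, n_patches, d_model) is flattened to one
# dimension per sample and projected to the prediction horizon M.
batch, n_patches, d_model, horizon = 32, 42, 128, 96
decoded = torch.randn(batch, n_patches, d_model)
flat = decoded.flatten(start_dim=1)             # segment flattening: (batch, n_patches * d_model)
head = nn.Linear(n_patches * d_model, horizon)  # fully connected layer sets the output dimension
prediction = head(flat)                         # (batch, M): one value per predicted time step
```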
(6) Calculating the loss function value based on the intermediate training result, the mean absolute error $MAE$ and the Nash-Sutcliffe efficiency coefficient $NSE$.
In some embodiments, the loss functions may be the mean absolute error $MAE$ and the Nash-Sutcliffe efficiency coefficient $NSE$.
In some embodiments, the mean absolute error $MAE$ and the Nash-Sutcliffe efficiency coefficient $NSE$ are calculated according to the following formulas:

$$MAE = \frac{1}{M}\sum_{r=1}^{M}\left|\hat{y}_r - y_r\right|$$

$$NSE = 1 - \frac{\sum_{r=1}^{M}\left(y_r - \hat{y}_r\right)^2}{\sum_{r=1}^{M}\left(y_r - \bar{y}\right)^2}$$

where $M$ is the length of the prediction period in the intermediate training result, $\hat{y}_r$ is the predicted value of the time series data at the $r$-th time step in the intermediate training result, $y_r$ is the corresponding observed value, $\bar{y}$ is the mean of the observed values in the intermediate training result, and $r$ is the prediction time step.
(7) Stopping training in response to the loss function value satisfying the condition, to obtain the target prediction model.
In some embodiments, the condition on the loss function value may be that it decreases five times in succession while $MAE$ approaches 0 and $NSE$ approaches 1.
It is noted that a loss value decreasing five times in succession shows that the model is robust and still improving, while $MAE$ approaching 0 and $NSE$ approaching 1 show that the target prediction model is well trained and the target prediction result is more accurate.
In some embodiments, in response to the loss function value not decreasing five times in succession, the parameters of each neuron in the intermediate prediction model are adjusted and training continues.
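A direct implementation of the two metrics together with a sketch of the stopping test; the helper names are illustrative:

```python
import numpy as np

def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean absolute error over the M-step prediction period."""
    return float(np.mean(np.abs(y_pred - y_true)))

def nse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Nash-Sutcliffe efficiency; values near 1 indicate a good fit."""
    return float(1.0 - np.sum((y_true - y_pred) ** 2)
                 / np.sum((y_true - np.mean(y_true)) ** 2))

def decreased_five_times(losses: list) -> bool:
    """True once the last five steps each lowered the loss value."""
    recent = losses[-6:]
    return len(recent) == 6 and all(a > b for a, b in zip(recent, recent[1:]))
```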
Step 105: input the time series data to be predicted into the target prediction model to obtain the target prediction result.
The time series data to be predicted is input into the target prediction model to obtain the target prediction result, revealing the change pattern and trend of the time series. The target prediction result is one-dimensional time series data, i.e. one value for each day of the time range to be predicted.
The target prediction model and several other prediction models (LSTM, TCN, Transformer, Informer and Autoformer prediction models) were used to predict the runoff (in m³/s) at the Pingshan hydrological station in the Yangtze river basin over different prediction horizons (3, 7, 15, 30, 90, 180 and 360 days), with $MAE$ and $NSE$ as the evaluation indicators of prediction quality; the comparison results are shown in Table 1. Analysis of Table 1 shows that in short-term prediction the target prediction model does not gain a significant advantage, indicating that the local segmentation operation brings no benefit to short-term prediction. As the prediction horizon lengthens, the target prediction model performs better in medium- and long-term prediction, especially at 90 days, where compared with the second-best model, Informer, $NSE$ increases by 7.1% and $MAE$ decreases by 4.9%; this improvement in prediction performance persists up to the 360-day prediction task.
Table 1: Pingshan station daily-scale prediction comparison results
The individual prediction results for the long-term horizons of 30, 90 and 180 days are presented as line charts in Fig. 2 (a), Fig. 2 (b) and Fig. 2 (c). The blue curve is the observation curve, i.e. the true runoff curve provided by the hydrological station; the orange curve is the prediction of the LSTM model; the gray curve that of the TCN model; the yellow curve that of the Transformer model; the light green curve that of the Informer model; the dark green curve that of the Autoformer model; and the brown curve the prediction of the target prediction model provided by the embodiment of the present application. The curve of the target prediction model runs closely along the observation curve, which shows that its judgment of the long-term predicted runoff and of the runoff trend is accurate.
According to the local segmentation-based time series prediction method provided by the embodiments of the application, the second training set is input into the constructed original prediction model, comprising a local segmentation module, a Transformer attention module, a segment flattening module and a fully connected layer connected in sequence, to obtain a target prediction model. The constructed original prediction model addresses the slow training, slow prediction and low efficiency caused by long time series periods and irregular change trends: the time series data is divided into several equal-length, overlapping time segments, so that the Transformer attention module operates on the time series data corresponding to sequence segments rather than single time points, which strengthens the learning of similar temporal change processes and reduces the spatial complexity of the attention computation. The time series data to be predicted is then input into the target prediction model to obtain a target prediction result, so the obtained result is more accurate. The method converts the attention computation from correlations between time series values at single time points into similarities between sequence segments; it can therefore capture changes in the historical data more accurately while preserving the local semantic information within each time period, reducing the spatial complexity of the attention computation and making the obtained prediction result more accurate.
In this application, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" refers to two or more, unless explicitly defined otherwise.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the present application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. The specification and examples are to be regarded in an illustrative manner only.
It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims (9)

1. A time series prediction method based on local segmentation, the method comprising:
performing data division on the original time series data to obtain a first training set, a first validation set and a first test set;
preprocessing the first training set, the first validation set and the first test set to obtain a preprocessed second training set, second validation set and second test set;
constructing an original prediction model, wherein the original prediction model comprises a local segmentation module, a Transformer attention module, a segment flattening module and a fully connected layer connected in sequence;
inputting the second training set into the original prediction model for training to obtain a target prediction model;
and inputting the time series data to be predicted into the target prediction model to obtain a target prediction result.
2. The local segmentation-based time series prediction method according to claim 1, wherein preprocessing the first training set, the first validation set and the first test set to obtain a preprocessed second training set, second validation set and second test set comprises:
extracting feature maps of the first training set, the first validation set and the first test set;
calculating the per-channel data mean and variance of the first training set, the first validation set and the first test set;
and normalizing the feature maps based on the data mean and variance to obtain the second training set, the second validation set and the second test set.
3. The local segmentation-based time series prediction method according to claim 1 or 2, wherein inputting the second training set into the original prediction model for training to obtain a target prediction model comprises:
performing segmentation of the second training set into overlapping time series segments in the local segmentation module to obtain segmented data;
position-encoding the segmented data to obtain position-encoded data containing position information;
performing self-attention computation on the position-encoded data in the Transformer attention module to obtain decoded output data;
flattening the decoded output data in the segment flattening module to obtain one-dimensional flattened data;
inputting the one-dimensional flattened data into the fully connected layer to obtain an intermediate training result and an intermediate prediction model;
calculating a loss function value based on the intermediate training result, the mean absolute error $MAE$ and the Nash-Sutcliffe efficiency coefficient $NSE$;
and stopping training in response to the loss function value satisfying a condition, to obtain the target prediction model.
4. The local segmentation-based time series prediction method according to claim 3, wherein the Transformer attention module comprises an encoder and a decoder, the encoder computing self-attention and feeding its result to the decoder, which computes cross-attention;
the encoder comprises a first multi-head self-attention layer, a first residual connection layer and a first normalization layer, and the decoder comprises a second multi-head self-attention layer, a multi-head cross-attention layer, a second residual connection layer and a second normalization layer.
5. The local segmentation-based time series prediction method according to claim 3, wherein performing segmentation of the second training set into overlapping time series segments in the local segmentation module to obtain the segmented data comprises:

partitioning the time series data $x \in \mathbb{R}^{L}$ in the second training set into overlapping or non-overlapping local segments $x_i \in \mathbb{R}^{P}$;

calculating the number of segments $N$ according to the following formula:

$$N = \left\lfloor \frac{L - P}{S} \right\rfloor + 2$$

where $i$ identifies an individual time series segment after segmentation and takes values in $(0, N)$, $P$ is the segment length, $N$ is the number of segments, $L$ is the total length of the time series data $x$ in the second training set, and $S$ is the stride, i.e. the non-overlapping step between two consecutive segments.
6. The local segmentation-based time series prediction method according to claim 5, wherein the segmented data is position-encoded according to the following formulas to obtain position-encoded data containing position information:

$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{model}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{model}}}\right)$$

where $PE_{(pos,2i)}$ denotes the even entries of the position-encoded data, $PE_{(pos,2i+1)}$ the odd entries, $pos$ the position of the position-encoded data in the time series data, and $d_{model}$ the dimension of the position encoding.
7. The local segmentation-based time series prediction method according to claim 6, wherein for a fixed offset $k$, the relative positional relationship between $PE_{pos+k}$ and $PE_{pos}$ is obtained according to the following formulas:

$$PE_{(pos+k,\,2i)} = PE_{(pos,\,2i)}\,PE_{(k,\,2i+1)} + PE_{(pos,\,2i+1)}\,PE_{(k,\,2i)}$$

$$PE_{(pos+k,\,2i+1)} = PE_{(pos,\,2i+1)}\,PE_{(k,\,2i+1)} - PE_{(pos,\,2i)}\,PE_{(k,\,2i)}$$
8. The local segmentation-based time series prediction method according to claim 6, wherein the decoded output data is derived according to the following formulas:

$$Q = X_p W_Q, \quad K = X_p W_K, \quad V = X_p W_V$$

$$X_{out} = Linear\left(softmax\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V\right)$$

where $X_{out}$ is the decoded output data; $softmax$ is the normalized exponential function; $X_p$ is the position-encoded data; $Q$, $K$ and $V$ are the query, key and value components of the position-encoded data; $W_Q$, $W_K$ and $W_V$ are the weight matrices corresponding to the query, key and value components respectively; $K^{T}$ is the transpose of the key component; $d_k$ is the dimension of the position-encoded data; and $Linear$ is a linear function.
9. The local segmentation-based time series prediction method according to claim 3, wherein the mean absolute error $MAE$ and the Nash-Sutcliffe efficiency coefficient $NSE$ are calculated according to the following formulas:

$$MAE = \frac{1}{M}\sum_{r=1}^{M}\left|\hat{y}_r - y_r\right|$$

$$NSE = 1 - \frac{\sum_{r=1}^{M}\left(y_r - \hat{y}_r\right)^2}{\sum_{r=1}^{M}\left(y_r - \bar{y}\right)^2}$$

where $M$ is the length of the prediction period in the intermediate training result, $\hat{y}_r$ is the predicted value of the time series data at the $r$-th time step in the intermediate training result, $y_r$ is the corresponding observed value, $\bar{y}$ is the mean of the observed values in the intermediate training result, and $r$ is the prediction time step.
CN202410238526.7A 2024-03-04 2024-03-04 Time series prediction method based on local segmentation Pending CN117828308A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202410238526.7A | 2024-03-04 | 2024-03-04 | Time series prediction method based on local segmentation

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202410238526.7A | 2024-03-04 | 2024-03-04 | Time series prediction method based on local segmentation

Publications (1)

Publication Number | Publication Date
CN117828308A

Family

ID=90522891

Family Applications (1)

Application Number | Title | Priority Date | Filing Date | Status
CN202410238526.7A | Time series prediction method based on local segmentation | 2024-03-04 | 2024-03-04 | Pending

Country Status (1)

Country Link
CN (1) CN117828308A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118013866A (en) * 2024-04-09 2024-05-10 西北工业大学 Medium-and-long-term runoff prediction method based on horizontal and vertical attention
CN118296499A (en) * 2024-06-05 2024-07-05 山东电力建设第三工程有限公司 Photo-thermal power station meteorological data prediction method based on self-attention mechanism


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115146700A (en) * 2022-05-21 2022-10-04 西北工业大学 Runoff prediction method based on Transformer sequence-to-sequence model
US20240020527A1 (en) * 2022-07-13 2024-01-18 Home Depot Product Authority, Llc Machine learning modeling of time series with divergent scale
CN115600656A (en) * 2022-11-03 2023-01-13 杭州电子科技大学(Cn) Multi-element time sequence prediction method based on segmentation strategy and multi-component decomposition algorithm
CN116050652A (en) * 2023-02-22 2023-05-02 重庆邮电大学 Runoff prediction method based on local attention enhancement model
CN116596033A (en) * 2023-05-22 2023-08-15 南通大学 Transformer ozone concentration prediction method based on window attention and generator
CN116975782A (en) * 2023-08-10 2023-10-31 浙江大学 Hierarchical time sequence prediction method and system based on multi-level information fusion
CN117094451A (en) * 2023-10-20 2023-11-21 邯郸欣和电力建设有限公司 Power consumption prediction method, device and terminal
CN117494906A (en) * 2023-12-28 2024-02-02 浙江省白马湖实验室有限公司 Natural gas daily load prediction method based on multivariate time series

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DU Shengdong; LI Tianrui; YANG Yan; WANG Hao; XIE Peng; HONG Xijin: "A traffic flow prediction model based on sequence-to-sequence spatio-temporal attention learning", Journal of Computer Research and Development (计算机研究与发展), no. 08, 6 August 2020 (2020-08-06) *


Similar Documents

Publication Publication Date Title
CN117828308A Time series prediction method based on local segmentation
Choi et al. Short-term load forecasting based on ResNet and LSTM
CN111080032B Load prediction method based on Transformer structure
CN113177633B (en) Depth decoupling time sequence prediction method
CN111161535A (en) Attention mechanism-based graph neural network traffic flow prediction method and system
CN110580543A (en) Power load prediction method and system based on deep belief network
Li et al. A new flood forecasting model based on SVM and boosting learning algorithms
CN109784473A (en) A kind of short-term wind power prediction method based on Dual Clocking feature learning
CN115587454A (en) Traffic flow long-term prediction method and system based on improved Transformer model
CN115688579A (en) Basin multi-point water level prediction early warning method based on generation of countermeasure network
Li et al. Deep spatio-temporal wind power forecasting
CN117094451B (en) Power consumption prediction method, device and terminal
CN114399021A (en) Probability wind speed prediction method and system based on multi-scale information
Kwon et al. Weekly peak load forecasting for 104 weeks using deep learning algorithm
CN115630101A (en) Hydrological parameter intelligent monitoring and water resource big data management system
CN116050652A (en) Runoff prediction method based on local attention enhancement model
CN115982567A (en) Refrigerating system load prediction method based on sequence-to-sequence model
CN114154732A (en) Long-term load prediction method and system
CN116089777A (en) Intelligent new energy settlement method and system based on intelligent information matching
Wenjie et al. A NOVEL MODEL FOR STOCK CLOSING PRICE PREDICTION USING CNN-ATTENTION-GRU-ATTENTION.
CN114186412A (en) Hydropower station water turbine top cover long sequence water level prediction system and method based on self-attention mechanism
Vogt et al. Wind power forecasting based on deep neural networks and transfer learning
Sari et al. Daily rainfall prediction using one dimensional convolutional neural networks
CN116743182A (en) Lossless data compression method
CN117409578A (en) Traffic flow prediction method based on combination of empirical mode decomposition and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination