WO2022053064A1 - Method and apparatus for time sequence prediction - Google Patents

Method and apparatus for time sequence prediction

Info

Publication number
WO2022053064A1
Authority
WO
WIPO (PCT)
Prior art keywords
future
historical
data sequence
time series
neural network
Prior art date
Application number
PCT/CN2021/118272
Other languages
French (fr)
Chinese (zh)
Inventor
朱云依
Original Assignee
胜斗士(上海)科技技术发展有限公司
Priority date
Filing date
Publication date
Application filed by 胜斗士(上海)科技技术发展有限公司
Publication of WO2022053064A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the present application relates to time series forecasting, and in particular, to a method, apparatus, and computer-readable storage medium for predicting future data of an object based on historical data of the object.
  • Forecasting sales expectations for future times based on product sales over a past period of time is known as product time series forecasting.
  • the current mainstream technologies for time series forecasting include two categories: one is the traditional statistics-based forecasting algorithm represented by Arima/Prophet, and the other is the deep learning-based forecasting algorithm represented by the LSTM neural network.
  • time series prediction algorithms based on traditional statistics are linear algorithms, and it is difficult for them to capture the nonlinear and long-term patterns in a time series.
  • time series prediction algorithms based on the LSTM neural network are prone to vanishing or exploding gradients when the time series grows large, which distorts the prediction results; they also run inefficiently, with redundant data and computation.
  • the present application proposes a method, an apparatus and a computer-readable storage medium for time series prediction, which address at least one defect in the prior-art solutions by extracting patterns from the historical data of an object and combining them with future influencing factors to predict the object's future data.
  • a method for time series prediction comprising:
  • a predicted feature data sequence is generated based on the regular data sequence, the future dynamic feature sequence of the object corresponding to the future time series, and the future static features of the object, wherein the future dynamic feature sequence includes a future dynamic feature of the object corresponding to each future time in the future time series, the future dynamic feature being associated with the corresponding future time;
  • an apparatus for time series prediction comprising:
  • a historical data acquisition unit configured to acquire a historical data sequence of the object corresponding to a historical time series, the historical data in the historical data sequence including the historical dynamic features and historical values of the object corresponding to the historical times in the historical time series, wherein the historical dynamic features are associated with the corresponding historical times;
  • a regularity extraction unit configured to use a first neural network model to extract the regularity data sequence of the object corresponding to the future time sequence based on the historical data sequence
  • a prediction feature generation unit configured to generate a predicted feature data sequence based on the regular data sequence, the future dynamic feature sequence of the object corresponding to the future time series, and the future static features of the object, wherein the future dynamic feature sequence includes the future dynamic features of the object corresponding to the future times of the future time series, the future dynamic features being associated with the corresponding future times; and
  • a prediction unit configured to use a second neural network model to predict, based on the predicted feature data sequence, a future data sequence of the object corresponding to the future time series, where the future data in the future data sequence includes the predicted future values of the object corresponding to the future times of the future time series.
  • a computer-readable storage medium on which a computer program is stored, the computer program including executable instructions that, when executed by at least one processor, implement the above-mentioned method.
  • an electronic device comprising a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to execute the executable instructions to implement the method described above.
  • the time series prediction method and apparatus can meet the requirements of efficient computation, accurately capture the nonlinear effects of trend factors, seasonal factors, external factors and the like on the predicted object, and perform both short-range and long-range time prediction.
  • FIG. 1 is a schematic diagram of a seq2seq neural network model architecture for time series prediction according to an embodiment of the present application
  • FIG. 2 is an exemplary flowchart of a method for time series forecasting according to an embodiment of the present application
  • FIG. 3 is a schematic block diagram of an apparatus for time series prediction according to an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of an electronic device according to an embodiment of the present application.
  • although the method and apparatus for predicting the future data of an object based on the historical data of the object are introduced hereinafter with a specific neural network model structure according to an embodiment, the solution of the present application is not limited to this example; it can be extended to other neural network structures capable of realizing the time series forecasting concept of the present application, and also to other deep learning-based forecasting model structures.
  • the time series prediction method is introduced below using products sold at catering-industry sales locations as the object, but the method of the present application can be applied to any object and scenario that requires time series prediction.
  • the neural network generally refers to an artificial neural network (ANN).
  • a common convolutional neural network can be used for the neural network, and a fully convolutional neural network can be further used as the case may be.
  • other specific types and structures of neural networks that are not relevant to the time series forecasting method of the present application are not described in detail herein, to avoid confusion.
  • both the traditional statistics-based autoregressive integrated moving average (ARIMA) model forecasting algorithm and the time series model Prophet forecasting algorithm can be used to predict trends, seasonality and other time-related patterns. They first decompose the historical data corresponding to the historical time series into a linear superposition of trend factors, seasonal factors and external influencing factors, predict the impact of each of these factors on the data at future times separately, and finally superimpose the three per-factor prediction results to obtain the final prediction.
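  • as an illustration of this linear superposition (a sketch of the general idea, not code from the patent; the fitted components below are hypothetical stand-ins):

```python
import numpy as np

t = np.arange(52)                               # weekly future times
trend = 100.0 + 0.5 * t                         # fitted trend factor
seasonal = 10.0 * np.sin(2 * np.pi * t / 52.0)  # fitted seasonal factor
external = np.where(t % 13 == 0, 5.0, 0.0)      # fitted external factor, e.g. promotions

# Final prediction = linear superposition of the three per-factor predictions
forecast = trend + seasonal + external
```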
  • however, these approaches have drawbacks: linear algorithms struggle to capture the patterns that exist in time series; different time series are predicted independently, so the relationships between different time series are not considered, each individual forecast is not accurate enough, and their simple linear superposition cannot accurately reflect the real trend of the process; and when a time series is relatively short and the corresponding amount of historical data is small, a linear algorithm can neither capture long-term patterns nor learn from other series.
  • in the deep learning-based category represented by the LSTM (Long Short-Term Memory) network, the historical observations, historical influencing factors and future influencing factors of the variables corresponding to the time series are used as the input of the neural network model structure, and the future predicted values of the variables are used as its output.
  • the LSTM network is a recurrent neural network specially designed to solve the long-term dependency problem of the general RNN (recurrent neural network), and is suitable for processing and predicting events with very long intervals and delays in a time series.
  • LSTM networks generally outperform ordinary recurrent neural networks and hidden Markov models (HMMs).
  • the important structure in the LSTM network is the gate: the input gate determines whether new input is accepted into the block, the forget gate determines whether the existing information in the block memory is retained or discarded, and the output gate determines whether the information in the block memory is passed out.
  • LSTM networks are usually trained using gradient descent.
  • although the LSTM network model can overcome the poor long-term forecasting performance of the Arima/Prophet algorithms, it still cannot meet many requirements of time series forecasting.
  • time series prediction process of the present application is described below with reference to the seq2seq neural network model architecture of FIG. 1 and the method flow for time series prediction of FIG. 2 according to an embodiment of the present application.
  • the basic structure of the neural network model architecture 100 in FIG. 1 is a seq2seq (sequence to sequence) network model.
  • the seq2seq neural network model can be regarded as a transformation model.
  • the basic idea is that the former neural network model of the two neural network models connected in series is used as the encoder network, and the latter neural network model is used as the decoder network.
  • the encoder network converts a sequence of data into a vector or sequence of vectors, and the decoder network generates another sequence of data from that vector or sequence of vectors.
  • a typical usage scenario of the seq2seq network model is sentence translation, in which the encoder network converts an English sentence into semantic data or a semantic sequence, and the decoder network converts that semantic data or semantic sequence into the corresponding Chinese sentence.
  • the optimization of the seq2seq network model can use the maximum likelihood estimation method to maximize the probability of the data sequence generated by the decoder to obtain the optimal conversion effect.
  • the seq2seq neural network model architecture 100 includes a first neural network model 110 as an encoder and a second neural network model 120 as a decoder.
  • the first neural network model 110 is used for extracting information in the historical data, especially regular data reflecting the regularity in the historical data.
  • the first neural network model 110 is a WaveNet neural network.
  • the WaveNet network is designed to predict the value of the n-th element of a data sequence based on its first n-1 elements.
  • WaveNet is particularly suited to high-throughput input of one-dimensional data sequences whose elements are multi-dimensional vectors, which enables fast computation.
  • the standard WaveNet network model is a convolutional neural network in which each convolutional layer convolves the output of the previous layer; the larger the network's convolution kernels and the more layers it has, the stronger its perception ability in the time domain and the larger its receptive field.
  • each time an output node is generated, it can be appended as the last node of the input layer, so that subsequent outputs are generated iteratively.
  • the activation function of the WaveNet network can use gated units, for example.
  • the hidden layers between the input layer and the output layer of the network adopt residual and skip connections, that is, each node of a convolutional layer in a hidden layer adds its original value to the output value of the activation function and passes the sum to the next convolutional layer.
  • the operation of reducing the number of channels can be achieved through a 1x1 convolution kernel; the activation-function outputs of the hidden layers are then summed and finally passed through the output layer.
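  • the following PyTorch sketch (our own illustrative class and parameter names, not code from the patent) shows one such hidden layer, combining a gated activation, a residual connection and a 1x1-convolved skip output:

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """One hidden layer of a WaveNet-style network: gated activation,
    residual connection to the next layer, and a skip output.
    A minimal sketch; the channel width is an assumption."""

    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # Two parallel dilated convolutions feed the gate: tanh(filter) * sigmoid(gate)
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)
        # 1x1 convolutions mix/reduce channels for the residual and skip paths
        self.residual_1x1 = nn.Conv1d(channels, channels, kernel_size=1)
        self.skip_1x1 = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor):
        # Left-pad so the convolution stays causal (no future information leaks in)
        padded = nn.functional.pad(x, (self.filter_conv.dilation[0], 0))
        gated = torch.tanh(self.filter_conv(padded)) * torch.sigmoid(self.gate_conv(padded))
        residual = x + self.residual_1x1(gated)  # original value + activation output
        skip = self.skip_1x1(gated)              # summed across layers before the output layer
        return residual, skip
```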
  • the first neural network model 110 has an input layer (ie, a first convolutional layer) 112 , a hidden layer 113 and an output layer 114 .
  • the number of hidden layers 113 may be zero, one or more.
  • the input layer 112 , the hidden layer 113 and the output layer 114 each have a plurality of nodes 111 .
  • the number of nodes in the input layer 112 should at least correspond to the data length in the historical data sequence to ensure that the neural network can receive information at each historical time.
  • when the first neural network model 110 uses the ordinary WaveNet network, it is a one-dimensional causal convolutional network in which the number of usable nodes decreases by 1 with each convolutional layer. If the historical data is long, either many layers must be added to the first neural network model 110 or a large filter is required; the gradients selected during gradient descent then become too small, the training of the network becomes complicated, and the fit is poor.
  • a dilated convolutional neural network (Dilated CNN) is a convolutional network with "holes".
  • the first convolutional layer (ie, the input layer) of the dilated convolutional neural network may be a one-dimensional causal convolutional network with an expansion coefficient of 1.
  • the dilation coefficient of each convolutional layer is the dilation coefficient of the previous convolutional layer multiplied by the dilation index (Dilation Index), where the dilation index is a value not less than 2 and not greater than the size of the convolutional layer.
  • This dilated convolutional neural network configuration can be employed in both the hidden layer and the output layer of the first neural network model 110 .
  • for example, when the dilation index is 2, the second convolutional layer only uses nodes n, n-2, n-4, ... for convolution, the third convolutional layer only uses nodes n, n-4, n-8, ..., and so on.
  • the dilated neural network structure can significantly speed up the information transfer process in the neural network, avoid gradient vanishing or gradient explosion, and improve the processing speed and prediction accuracy of the first neural network model 110.
  • for example, when the convolution kernel size is 2 and the dilation index is 2, the number of convolutional layers through which information passes from the node of the first historical time to the node of the last historical time is log_2(N), where N is the length of the historical data sequence.
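  • a minimal sketch of such a stack (assuming kernel size 2, dilation index 2, and reusing the GatedResidualBlock sketched above; the channel width is our own choice) shows that log_2(128) = 7 layers already cover a history of N = 128 points:

```python
import math
import torch.nn as nn

N = 128                            # length of the historical data sequence
num_layers = int(math.log2(N))     # 7 layers for kernel size 2, dilation index 2

# Dilation coefficients 1, 2, 4, ..., 64: each layer's coefficient is the
# previous one multiplied by the dilation index (here 2).
dilations = [2 ** i for i in range(num_layers)]

blocks = nn.ModuleList(
    [GatedResidualBlock(channels=32, dilation=d) for d in dilations]
)

# Receptive field of the stack: 1 + sum of (kernel_size - 1) * dilation
receptive_field = 1 + sum((2 - 1) * d for d in dilations)
assert receptive_field >= N  # 1 + (1 + 2 + 4 + ... + 64) = 128, covers all N inputs
```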
  • the second neural network model 120 serving as the decoder may be a multi-layer perceptron (MLP) network.
  • the MLP network also includes an input layer, hidden layers and an output layer, where each neuron node has an activation function (such as a sigmoid function), and the network is trained using a loss function.
  • the MLP network predicts the future values of the object based on the historical patterns extracted by the encoder network and the future influencing factors (including dynamic and static factors).
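  • a hedged sketch of such a decoder (the layer sizes and helper name are illustrative assumptions, not taken from the patent):

```python
import torch.nn as nn

def make_mlp_decoder(in_dim: int, hidden_dim: int, out_dim: int) -> nn.Sequential:
    """Minimal MLP decoder sketch: maps one predicted-feature vector
    (regularity + future dynamic + static features) to a predicted future value."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        nn.Sigmoid(),                 # per-node activation, e.g. sigmoid as in the text
        nn.Linear(hidden_dim, hidden_dim),
        nn.Sigmoid(),
        nn.Linear(hidden_dim, out_dim),
    )

# Applied independently at each future time t_(n+j) of the future time series
decoder = make_mlp_decoder(in_dim=24, hidden_dim=64, out_dim=1)
```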
  • the time series prediction method of the present application can use any other neural network capable of feature extraction and prediction on sequence data, such as, but not limited to, the various types of recurrent neural networks (RNNs) that can implement the time series prediction function of the present application.
  • for example, the encoder network can use an LSTM network while the decoder network uses an MLP network.
  • although the LSTM network has shortcomings, combining it with the MLP network and adjusting the input and output data sequences of the network to a certain extent can still obtain better results than the existing schemes. One can also choose the WaveNet network as the encoder network and an LSTM network or another RNN network as the decoder network, and so on.
  • the unit of historical time and/or future time can be selected from hour, day, month, year, week, quarter, etc. as required.
  • the historical time and/or the future time may be a time point (for example, the t1-th moment, the end of the first quarter, 10 a.m., etc.), or may be a continuous time period (for example, the t2-th period, week 2, month 3, October of the current year, etc.).
  • when the historical times and/or future times are time points, the time intervals between them may be the same, so that the periodic information in the historical data and future data can be extracted and predicted with that common interval as the period.
  • when they are time periods, the lengths of the time periods may likewise be the same, so that the periodic information of the above-mentioned historical data and future data can be extracted and predicted with that common length as the period.
  • the method 200 first acquires the historical data sequence 101 of the object corresponding to the historical time series T1 in step S210.
  • the historical value y_i is the measured value of the object at the historical time t_i, such as the actual sales value of a product.
  • the historical value is caused by the internal factors of the object, so it can also be called the internal factors of the object or the internal characteristic data.
  • the historical dynamic feature x_i is the dynamic feature that affects the historical value y_i of the object, for example including one or more of whether the time is a holiday, the number of working days, the number of days or weeks until a holiday, and so on.
  • historical dynamic features are associated with time, and include, for example, periodic factors that affect the object cyclically with a certain period (also called periodic historical dynamic features) and aperiodic factors that affect the object aperiodically (also called aperiodic historical dynamic features).
  • the period of a periodic factor may be determined by the common time interval between the historical time points in the historical time series, or by the common length of the historical time periods.
  • the way an aperiodic factor affects the object is tied to a specific historical time; it can be said to be random, or triggered by events, and the corresponding aperiodic factors may differ from one historical time to another.
  • the number n of historical times represents the number or length of historical data.
  • the historical value y_i may be a multidimensional variable or vector.
  • since the historical dynamic feature x_i that affects the historical value y_i includes many factors, it can be regarded as a combination of multiple historical sub-dynamic features, so x_i can also be a multidimensional variable or vector.
  • a historical dynamic feature x_i and a historical value y_i can form a two-dimensional vector (x_i, y_i)^T (also called a binary data group; hereinafter referred to uniformly as a two-dimensional vector), in which the sub-vectors x_i and y_i are themselves multidimensional vectors as described above.
  • the historical data sequence 101 can then be represented as a one-dimensional sequence of two-dimensional vectors {(x_1, y_1)^T, (x_2, y_2)^T, ..., (x_n, y_n)^T}.
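  • a small sketch of assembling this encoder input (the shapes are illustrative assumptions; the 23-dimensional dynamic feature echoes the experiment described later):

```python
import torch

# Hypothetical sizes: n historical times, dx-dimensional dynamic features,
# dy-dimensional historical values (e.g. sales of dy products).
n, dx, dy = 128, 23, 1

x_hist = torch.randn(n, dx)   # historical dynamic features x_1 ... x_n
y_hist = torch.randn(n, dy)   # historical values y_1 ... y_n

# Concatenate each (x_i, y_i) pair into one vector; the sequence of these
# vectors is the encoder input, shaped (batch=1, channels=dx+dy, length=n)
# as a 1-D convolutional network such as WaveNet expects.
hist_seq = torch.cat([x_hist, y_hist], dim=1)  # (n, dx + dy)
encoder_input = hist_seq.t().unsqueeze(0)      # (1, dx + dy, n)
```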
  • the first neural network model 110 serving as an encoder does not process historical static data, which can reduce the redundancy of data and calculation, and improve the operation speed of the network model.
  • the first neural network model 110 is used to complete the regularity extraction function of extracting the regularity data sequence 102 of the object corresponding to the future time series T2 based on the historical data sequence 101.
  • the historical data sequence 101 is used as the input of the first neural network model 110, and the extracted regular data sequence 102 is output through the transmission and calculation of each convolutional layer of the WaveNet network.
  • the dilated convolutional neural network described above can speed up the regular information extraction process of the regular data sequence 102 and improve the information extraction accuracy.
  • the historical value of the object may be affected periodically, for example with the month as the period.
  • the time interval between the future times t_(n+j) of the future time series T2 (when the future time is a time point) and/or the length of each future time (when the future time is a time period) is set to be the same as in the historical time series T1, so the periodic regularity feature c_a is the same for every future time t_(n+j).
  • the aperiodic regularity feature c_(n+j) reflects how the future value y_(n+j) of the object is affected by a specific future time, so c_(n+j) may be different for each future time t_(n+j) of the future time series T2.
  • m is the number of future times in the future time series T2, indicating the number or length of future data to be predicted.
  • the periodic regularity feature c_a also includes multiple sub-periodic regularity features, and can be expressed as a multi-dimensional vector.
  • the dimension of the periodic regularity feature c_a may be the same as the number of sub-periodic historical dynamic features in the periodic historical dynamic feature, or smaller than the latter to reduce the amount of computation.
  • likewise, an aperiodic historical dynamic feature may have multiple sub-aperiodic dynamic features, so the aperiodic regularity feature c_(n+j) also includes multiple sub-aperiodic regularity features and can be represented as a multi-dimensional vector.
  • the dimension of the aperiodic regularity feature c_(n+j) may be the same as the number of sub-aperiodic historical dynamic features in the aperiodic historical dynamic feature, or smaller than the latter to reduce the amount of computation.
  • the regular data sequence 102 can be represented as a one-dimensional sequence of two-dimensional vectors whose elements consist of the two multi-dimensional sub-vectors, the periodic regularity feature c_a and the aperiodic regularity feature c_(n+j): {(c_a, c_(n+1))^T, (c_a, c_(n+2))^T, ..., (c_a, c_(n+m))^T}.
  • x_(n+j) in FIG. 1 is the future dynamic feature that affects the future value y_(n+j) of the object corresponding to the future time t_(n+j) in the future time series T2.
  • the future dynamic feature x_(n+j) may include, for example, one or more of promotional activities at a certain future time, whether it is a holiday, the number of working days, the number of days or weeks until a holiday, and so on.
  • future dynamic features are likewise associated with time, and each may be a multi-dimensional vector that includes multiple sub-future dynamic features.
  • the future dynamic features x_(n+j) form a one-dimensional sequence of multi-dimensional vectors {x_(n+1), x_(n+2), ..., x_(n+m)}.
  • the future static features x_s may include properties of the object (which are generally relevant only to the object itself, not to future time) and other features that are not time-dependent.
  • for example, the future static features x_s can be the category of a product, the temperature of a product, the sales location of a product (for example, represented by the location of its distribution center), etc.; these features are associated only with the object and do not vary over time.
  • the future static feature x_s can be a multi-dimensional vector composed of multiple sub-features.
  • the future static feature x_s can be further processed.
  • the future static features x_s can be divided into different types, with different correlations between the types. An embedding operation can transform sparse discrete variables into continuous variables. Embedding the future static features by type, for example dividing them into two groups x_s1 and x_s2 according to location-related features and product-attribute-related features, makes the different groups uncorrelated, that is, keeps them orthogonal. This avoids treating each specific static influencing factor as a separate variable or vector dimension, thereby reducing the overall dimension of the future static features x_s and the computational load of the model (a code sketch of such grouped embeddings follows the static-feature bullets below).
  • the future static feature group x_s1 includes the multi-dimensional future static feature e_1, and the group x_s2 includes the multi-dimensional future static feature e_2; both can affect the object at every future time t_(n+j).
  • the number of future static features or future static feature groups may be 0, 1 or more.
  • the number of specific features included in each future static feature determines its dimension, which can be one or more.
  • the future static features x_s form a one-dimensional sequence of length m whose elements are the (0-, 1- or multi-dimensional) vector x_s: {x_s, x_s, ..., x_s}.
  • with the two feature groups above, this one-dimensional sequence can be expressed as {(e_1, e_2)^T, (e_1, e_2)^T, ..., (e_1, e_2)^T}.
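  • a hedged sketch of the grouped embedding (the cardinalities loosely echo the experiment's ~20 distribution centers and ~200 products, but the ids and embedding sizes are our own assumptions):

```python
import torch
import torch.nn as nn

# x_s1 = location-related features, x_s2 = product-attribute-related features
location_embed = nn.Embedding(num_embeddings=20, embedding_dim=4)   # ~20 distribution centers
product_embed = nn.Embedding(num_embeddings=200, embedding_dim=8)   # ~200 products

location_id = torch.tensor([3])    # sparse discrete id of the sales location
product_id = torch.tensor([57])    # sparse discrete id of the product category

e1 = location_embed(location_id)   # dense, continuous e_1: shape (1, 4)
e2 = product_embed(product_id)     # dense, continuous e_2: shape (1, 8)

m = 4                              # number of future times to predict
# Static features do not vary with time, so tile (e_1, e_2) across all m steps
static_seq = torch.cat([e1, e2], dim=1).expand(m, -1)  # (m, 12)
```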
  • the predicted feature data sequence 103 may be generated based on the regular data sequence 102, the future dynamic feature sequence of the object corresponding to the future time series, and the future static feature of the object.
  • the generating process can be completed by splicing the one-dimensional sequences of the four influencing factors to form a one-dimensional predicted feature data sequence 103 whose elements are vectors spliced from the four factors (which can also be referred to as quaternary data groups), as shown in FIG. 1.
  • the one-dimensional sequence 103 can be represented as {(c_a, c_(n+1), e_1, e_2, x_(n+1))^T, (c_a, c_(n+2), e_1, e_2, x_(n+2))^T, ..., (c_a, c_(n+m), e_1, e_2, x_(n+m))^T}.
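  • the splice itself is a concatenation along the feature dimension; a self-contained sketch (all shapes are illustrative assumptions):

```python
import torch

m = 4                                  # number of future times to predict
c_a = torch.randn(1, 8).expand(m, -1)  # periodic regularity feature, identical at each step
c_nj = torch.randn(m, 8)               # aperiodic regularity features c_(n+1) ... c_(n+m)
e_static = torch.randn(m, 12)          # tiled static embeddings (e_1, e_2), see sketch above
x_future = torch.randn(m, 23)          # future dynamic features x_(n+1) ... x_(n+m)

# Splice the four factor sequences: each row is the quaternary group
# (c_a, c_(n+j), e_1, e_2, x_(n+j)) that is fed to the decoder.
predicted_features = torch.cat([c_a, c_nj, e_static, x_future], dim=1)  # (m, 51)
```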
  • the predicted feature data sequence 103 is input into the second neural network model 120, and the prediction of the future data sequence 104 corresponding to the future time series T2 is completed through the transfer and computation of a decoder such as a multi-layer perceptron (MLP) network.
  • each predicted future value y_(n+j) of the object in the future data sequence 104 is a multi-dimensional vector having the same dimensions as the historical values y_i.
  • the method 200 may also optionally include, prior to using at least one of the first and second neural network models 110 and 120 as the encoder and decoder networks, respectively, a step S250 of training the neural network model with a training data set to determine the optimal parameters of the model.
  • the parameters of the neural network model can remain unchanged during use after training is completed, or they can be updated or adjusted based on a new data set after a period of use or at a predetermined interval; the model parameters can also be updated in real time by means of online supervision.
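  • a hedged sketch of one such training step (the batch keys, shapes and MSE loss are our own assumptions; the patent only specifies training by gradient descent to determine optimal parameters):

```python
import torch
import torch.nn as nn

def train_step(encoder: nn.Module, decoder: nn.Module,
               batch: dict, optimizer: torch.optim.Optimizer) -> float:
    """One gradient-descent update over a training batch (step S250 sketch)."""
    optimizer.zero_grad()
    regularity = encoder(batch["history"])           # regular data sequence 102
    features = torch.cat([regularity,
                          batch["future_dynamic"],   # x_(n+j) sequence
                          batch["static"]], dim=1)   # tiled (e_1, e_2)
    prediction = decoder(features)                   # future data sequence 104
    loss = nn.functional.mse_loss(prediction, batch["future_targets"])
    loss.backward()                                  # gradient descent, as in the text
    optimizer.step()
    return loss.item()

# One optimizer can cover both networks, e.g.:
# optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))
```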
  • FIG. 3 shows an exemplary structure of an apparatus 300 for time series prediction according to an embodiment of the present application.
  • the apparatus 300 includes a historical data acquisition unit 310 , a regularity extraction unit 320 , a prediction feature generation unit 330 and a prediction unit 340 .
  • the historical data acquisition unit 310 is configured to acquire the historical data sequence 101 of the object corresponding to the historical time series T1.
  • the historical data in the historical data sequence 101 includes the time-associated historical dynamic features x_i corresponding to the historical times t_i in the historical time series T1, and the historical values y_i of the object.
  • the regularity extraction unit 320 includes, for example, the first neural network model 110 as an encoder network in the seq2seq neural network model to extract the regularity of historical data. This unit is used to extract the regular data sequence 102 of the object corresponding to the future time series T2 from the historical data sequence 101 provided by the historical data acquisition unit 310 by using the neural network model.
  • the regular data sequence 102 includes the periodic regularity feature c_a of the object corresponding to the future times t_(n+j) of the future time series T2, and the aperiodic regularity features c_(n+j) associated with the corresponding future times.
  • the encoder network can choose a sequence data network model such as the WaveNet network, and can further adopt a structure such as a dilated convolutional network to speed up information transfer and computation.
  • the prediction feature generation unit 330 is configured to combine the regularity data sequence 102 output by the regularity extraction unit 320, the future dynamic feature sequence composed of the future dynamic features x_(n+j) corresponding to the future times t_(n+j) in the future time series T2, and the future static features x_s to generate the predicted feature data sequence 103.
  • each future dynamic feature x_(n+j) is associated with its future time t_(n+j).
  • the prediction feature generation unit 330 may further group the static features x_s to orthogonalize each group of static features, thereby reducing the vector dimension of each data element of the predicted feature data sequence.
  • the prediction unit 340 comprises, for example, the second neural network model 120 as a decoder network in a seq2seq neural network model to predict future values of the object.
  • the unit 340 is used to predict the future data sequence 104 of the object corresponding to the future time series T2, using the second neural network model 120, from the predicted feature data sequence 103 provided by the prediction feature generation unit 330.
  • the second neural network model 120 may use a neural network such as a multi-layer perceptron (MLP) network.
  • the apparatus 300 also optionally includes a model training unit 350 for training the corresponding neural network models to determine the optimal model parameters before the neural network models are used in the above-mentioned extraction unit 320 and prediction unit 340; this unit can also supervise or update the parameters of the models.
  • the experiment is carried out in the scenario of product prediction in the catering industry, and the test task requires predicting the sales volume of each product (object) at each distribution center over the next 1-4 weeks.
  • the test dataset covers about 20 distribution centers, each including on average about 200 products; the historical product sales data ranges from 1 week to 128 weeks in length.
  • the test task considers 23 dynamic influencing factors (such as whether it is a holiday, the number of working days, the number of weeks until the Spring Festival, etc.) and 7 static influencing factors (such as product classification, temperature, location of the distribution center, etc.) in the prediction.
  • Table 1 shows the training time, prediction time and prediction error of the models using different time series forecasting methods.
  • the deep learning methods using the seq2seq neural network model require a large number of floating-point operations and, unlike the traditional statistical algorithm Prophet, additionally use a graphics processing unit (GPU) to accelerate the computation.
  • the prediction accuracy (error) of the scheme according to the embodiment of the present application using the WaveNet-MLP seq2seq structure (WaveNet network as the encoder and MLP network as the decoder) is better than that of the traditional statistical algorithms, and also better than that of the seq2seq neural network model structure in which both the encoder and decoder networks adopt the LSTM network model.
  • the solutions using neural network models are faster than the traditional statistical algorithms; and among the solutions using neural network models, the training time of the WaveNet-MLP seq2seq neural network model structure of the present application is significantly reduced.
  • the advantages of the time series prediction method and apparatus lie in the following aspects: using two neural network models such as the WaveNet network and the MLP network as the encoder and decoder networks, respectively, allows the historical data sequence and the future data sequence to be computed in parallel across the different historical times of the corresponding historical time series and the different future times of the future time series, improving the speed of model training and use; using a neural network model such as the WaveNet network as the encoder, especially with the dilated convolutional network structure, shortens the transmission path of information in the object's historical data sequence from the first historical time to the last historical time, avoiding gradient vanishing and gradient explosion during the training of the neural network, so that long-range time series prediction can be performed; introducing the influencing factors that do not change with time only at the input of the second neural network model serving as the decoder avoids duplicating them and their computation at each time point of the encoder network, thereby reducing the redundancy of data and computation; and embedded grouping of the static influencing factors reduces the overall dimension of the features and the computational load of the model.
  • modules or units of the apparatus for time series prediction are mentioned in the above detailed description, this division is not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided into multiple modules or units to be embodied. Components shown as modules or units may or may not be physical units, ie may be located in one place, or may be distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the present application. Those of ordinary skill in the art can understand and implement it without creative effort.
  • a computer-readable storage medium on which a computer program is stored, the program including executable instructions which, when executed by, for example, a processor, can implement the steps of the method for time series prediction described in any one of the above embodiments.
  • various aspects of the present application can also be implemented in the form of a program product, which includes program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to the various exemplary embodiments of the present application described in the method for time series prediction in this specification.
  • the program product for implementing the above method according to the embodiments of the present application may adopt a portable compact disc read-only memory (CD-ROM), include the program code, and run on a terminal device such as a personal computer.
  • the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • the program product may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code therein. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • a readable signal medium can also be any readable medium other than a readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
  • Program code embodied on a readable storage medium may be transmitted using any suitable medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • program code for carrying out the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
  • an electronic device which may include a processor, and a memory for storing executable instructions of the processor.
  • the processor is configured to perform the steps of the method for time series prediction in any one of the foregoing embodiments by executing the executable instructions.
  • aspects of the present application may be implemented as a system, method or program product. Therefore, various aspects of the present application can be embodied in the following forms: a complete hardware implementation, a complete software implementation (including firmware, microcode, etc.), or a combination of hardware and software aspects, which may be collectively referred to herein as a "circuit", "module" or "system".
  • the electronic device 400 according to this embodiment of the present application is described below with reference to FIG. 4 .
  • the electronic device 400 shown in FIG. 4 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present application.
  • electronic device 400 takes the form of a general-purpose computing device.
  • Components of the electronic device 400 may include, but are not limited to, at least one processing unit 410, at least one storage unit 420, a bus 430 connecting different system components (including the storage unit 420 and the processing unit 410), a display unit 440, and the like.
  • the storage unit stores program code that can be executed by the processing unit 410, so that the processing unit 410 performs the steps of the various exemplary embodiments of the present application described in the method for time series prediction in this specification.
  • the processing unit 410 may perform the steps shown in FIG. 2 .
  • the storage unit 420 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 4201 and/or a cache storage unit 4202 , and may further include a read only storage unit (ROM) 4203 .
  • the storage unit 420 may also include a program/utility 4204 having a set (at least one) of program modules 4205, including but not limited to an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
  • the bus 430 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
  • the electronic device 400 may also communicate with one or more external devices 500 (eg, keyboards, pointing devices, Bluetooth devices, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with Any device (eg, router, modem, etc.) that enables the electronic device 400 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interface 450 . Also, the electronic device 400 may communicate with one or more networks (eg, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 460 . Network adapter 460 may communicate with other modules of electronic device 400 through bus 430 . It should be understood that, although not shown, other hardware and/or software modules may be used in conjunction with electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
  • the exemplary embodiments described herein may be implemented by software, or by software combined with necessary hardware. Therefore, the technical solutions according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the method for time series prediction according to an embodiment of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Biophysics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Development Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method and apparatus for time sequence prediction and a computer-readable storage medium. The method comprises: acquiring a historical data sequence of an object corresponding to a historical time sequence (S210); using a first neural network model to, on the basis of the historical data sequence, extract a regular data sequence corresponding to a future time sequence (S220); generating a predicted feature data sequence on the basis of the regular data sequence, a future dynamic feature sequence corresponding to the future time sequence, and a future static feature (S230); and using a second neural network model to, on the basis of the predicted feature data sequence, predict a future data sequence of an object corresponding to the future time sequence (S240). The described method can meet the requirements of high-efficiency calculation, and accurately capture the nonlinear impact of trend factors, seasonal factors, external factors, and the like on a predicted object, while simultaneously carrying out short-distance and long-distance time prediction.

Description

Method and apparatus for time series forecasting
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the priority of Chinese Patent Application No. 202010959817.7, filed on September 14, 2020, the disclosure of which is hereby incorporated in its entirety as a part of this application.
TECHNICAL FIELD
The present application relates to time series forecasting, and in particular to a method, apparatus, and computer-readable storage medium for predicting future data of an object based on historical data of the object.
BACKGROUND
In industries such as retail and catering, it is necessary to estimate sales data for a future period based on a product's historical sales, for use in stocking, distributing, and updating the product. Accurately predicting future product sales can effectively reduce costs, uncover business opportunities in time, and allow business strategies to be adjusted quickly to improve competitiveness.
Forecasting sales expectations for future times based on product sales over a past period is known as product time series forecasting. The current mainstream technologies for time series forecasting fall into two categories: traditional statistics-based forecasting algorithms represented by Arima/Prophet, and deep learning-based forecasting algorithms represented by the LSTM neural network.
However, time series prediction algorithms based on traditional statistics are linear and struggle to capture nonlinear and long-term patterns in a time series. Prediction algorithms based on the LSTM neural network are prone to vanishing or exploding gradients when the time series grows large, which distorts the prediction results; they also run inefficiently, with redundant data and computation.
Therefore, there is a need for improved time series forecasting methods.
It should be noted that the information disclosed in the Background section above is only for enhancing the understanding of the background of the application, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
SUMMARY
The present application proposes a method, an apparatus, and a computer-readable storage medium for time series prediction, which address at least one defect in the prior-art solutions by extracting patterns from an object's historical data and combining them with future influencing factors to predict the object's future data.
According to one aspect of the present application, a method for time series prediction is proposed, comprising:
acquiring a historical data sequence of an object corresponding to a historical time series, the historical data in the historical data sequence including historical dynamic features and historical values of the object corresponding to historical times in the historical time series, wherein the historical dynamic features are associated with the corresponding historical times;
using a first neural network model to extract, based on the historical data sequence, a regularity data sequence of the object corresponding to a future time series;
generating a predicted feature data sequence based on the regularity data sequence, a future dynamic feature sequence of the object corresponding to the future time series, and future static features of the object, wherein the future dynamic feature sequence includes future dynamic features of the object corresponding to future times in the future time series, the future dynamic features being associated with the corresponding future times; and
using a second neural network model to predict, based on the predicted feature data sequence, a future data sequence of the object corresponding to the future time series, the future data in the future data sequence including predicted future values of the object corresponding to the future times of the future time series.
According to another aspect of the present application, an apparatus for time series prediction is proposed, comprising:
a historical data acquisition unit configured to acquire a historical data sequence of the object corresponding to a historical time series, the historical data in the historical data sequence including historical dynamic features and historical values of the object corresponding to historical times in the historical time series, wherein the historical dynamic features are associated with the corresponding historical times;
a regularity extraction unit configured to use a first neural network model to extract, based on the historical data sequence, a regularity data sequence of the object corresponding to a future time series;
a prediction feature generation unit configured to generate a predicted feature data sequence based on the regularity data sequence, a future dynamic feature sequence of the object corresponding to the future time series, and future static features of the object, wherein the future dynamic feature sequence includes future dynamic features of the object corresponding to future times of the future time series, the future dynamic features being associated with the corresponding future times; and
a prediction unit configured to use a second neural network model to predict, based on the predicted feature data sequence, a future data sequence of the object corresponding to the future time series, the future data in the future data sequence including predicted future values of the object corresponding to the future times of the future time series.
According to yet another aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored, the computer program comprising executable instructions that, when executed by at least one processor, implement the method described above.
According to still another aspect of the present application, an electronic device is provided, comprising a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to execute the executable instructions to implement the method described above.
The time series prediction method and apparatus according to the embodiments of the present application can meet the requirements of efficient computation, accurately capture the nonlinear influence of trend factors, seasonal factors, external factors and the like on the predicted object, and make both short-range and long-range predictions in time.
It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Description of the Drawings
The above and other features and advantages of the present application will become more apparent from the detailed description of exemplary embodiments thereof with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a seq2seq neural network model architecture for time series prediction according to an embodiment of the present application;
FIG. 2 is an exemplary flowchart of a method for time series prediction according to an embodiment of the present application;
FIG. 3 is a schematic block diagram of an apparatus for time series prediction according to an embodiment of the present application; and
FIG. 4 is a schematic block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Exemplary embodiments will now be described more fully with reference to the accompanying drawings. The exemplary embodiments may, however, be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this application will be thorough and complete and will fully convey the concept of the exemplary embodiments to those skilled in the art. In the drawings, the sizes of some elements may be exaggerated or distorted for clarity. The same reference numerals in the drawings denote the same or similar structures, and their detailed descriptions will therefore be omitted.
Furthermore, the described features, structures or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided in order to give a thorough understanding of the embodiments of the present application. Those skilled in the art will appreciate, however, that the technical solutions of the present application may be practiced without one or more of these specific details, or with other methods, elements and the like. In other instances, well-known structures, methods or operations are not shown or described in detail to avoid obscuring aspects of the present application.
Those skilled in the art will understand that, although the method and apparatus for predicting future data of an object based on its historical data are introduced below with a specific neural network model structure according to the embodiments, the solution of the present application is not limited to this exemplary neural network model; it can be extended to other neural network structures capable of realizing the time series prediction concept of the present application, and also to other deep-learning-based prediction model structures. In the exemplary embodiments herein, the time series prediction method is introduced with the products sold by the catering industry at a sales location as the object, but the method of the present application can be applied to any object and scenario that requires time series prediction. Hereinafter, unless otherwise specified, a neural network generally refers to an artificial neural network (ANN). The neural network may adopt a common convolutional neural network (CNN) and, where appropriate, a fully convolutional neural network. Other specific types and structures of neural networks that are not relevant to the time series prediction method of the present application are not described in detail herein to avoid confusion.
Among mainstream time series prediction algorithms, both the Autoregressive Integrated Moving Average (ARIMA) model prediction algorithm based on traditional statistics and the Prophet time series model prediction algorithm can be used to predict time-related regularities such as trend and seasonality. They first decompose the historical data corresponding to the historical time series into a linear superposition of trend factors, seasonal factors and external influencing factors, separately predict the influence of each of these factors on the data corresponding to future times, and finally superimpose the predictions under the three factors to obtain the final prediction result.
The main problems of traditional statistical time series prediction algorithms, however, are the following: linear algorithms struggle to capture the regularities present in a time series; different time series are predicted independently, so the relationships between them are ignored, which makes each individual prediction less accurate, and their simple linear superposition cannot accurately reflect the real trend of the underlying process; and when a time series is short and the corresponding amount of historical data is small, a linear algorithm can neither capture long-term regularities nor borrow regularities from other series.
Deep-learning-based prediction algorithms usually adopt an LSTM (Long Short-Term Memory) network structure. The historical observations of the variable corresponding to the time series, the historical influencing factors and the future influencing factors of the variable are used as the input of the neural network model, and the predicted future values of the variable are used as its output.
An LSTM network is a recurrent neural network specifically designed to solve the long-term dependency problem of general RNNs (recurrent neural networks); it is suited to processing and predicting important events separated by very long intervals and delays in a time series. LSTM networks outperform plain recurrent neural networks and hidden Markov models (HMMs). The important structures in an LSTM network are its gates: the forget gate decides which information is not carried forward in the block, the input gate decides which input is accepted into the block, and the output gate decides whether the information in the block's memory is passed on. LSTM networks are usually trained using gradient descent.
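By way of illustration, a minimal sketch in Python with the PyTorch library of running such an LSTM over a batch of sequences might read as follows; all tensor shapes are assumptions for the example, and the gates described above are internal to the library's LSTM implementation:

```python
import torch
import torch.nn as nn

# A single-layer LSTM over a toy batch of weekly series (shapes assumed).
# The forget/input/output gates described in the text live inside nn.LSTM.
lstm = nn.LSTM(input_size=24, hidden_size=32, batch_first=True)
x = torch.rand(8, 104, 24)      # (batch, time steps, features per step)
out, (h_n, c_n) = lstm(x)       # out: the hidden state at every time step
print(out.shape)                # torch.Size([8, 104, 32])
```

Note that the hidden state at each time step depends on all earlier steps, which is what makes the computation inherently sequential.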
Time series prediction algorithms based on LSTM network models have several defects. First, gradients may vanish or explode as information propagates through the LSTM network model, so prediction results may be distorted in long-range time series prediction. The gates in an LSTM network can alleviate this to some extent but cannot eliminate it. Second, in an LSTM network structure, information is passed node by node through each layer, from front to back and from bottom to top, which makes it difficult to process the many pieces of information in a time series in parallel, so the training process of the network model is slow and inefficient. Third, in order to feed the time-invariant part of the influencing factors contained in the time series into the LSTM network structure, this information has to be replicated at every node, causing redundancy in data and computation and further reducing the processing speed of the network model. Therefore, although the LSTM network model can overcome the poor long-term prediction performance of the ARIMA/Prophet algorithms, it still cannot satisfy many requirements of time series prediction.
The time series prediction process of the present application is described below with reference to the seq2seq neural network model architecture of FIG. 1 and the method flow for time series prediction of FIG. 2 according to embodiments of the present application.
The basic structure of the neural network model architecture 100 in FIG. 1 is a seq2seq (sequence-to-sequence) network model. A seq2seq neural network model can be regarded as a transformation model; its basic idea is that, of two neural network models connected in series, the former serves as an encoder network and the latter as a decoder network. The encoder network converts a data sequence into a vector or a sequence of vectors, and the decoder network generates another data sequence from that vector or vector sequence. One usage scenario of the seq2seq network model is speech recognition, in which the encoder network converts or segments an English sentence into semantic data or a semantic sequence (in English or Chinese), and the decoder network converts that semantic data or semantic sequence into the Chinese sentence corresponding to the English sentence. The seq2seq network model can be optimized by maximum likelihood estimation, maximizing the probability of the data sequence generated by the decoder to obtain the optimal transformation.
According to an embodiment of the present application, the seq2seq neural network model architecture 100 includes a first neural network model 110 serving as the encoder and a second neural network model 120 serving as the decoder.
The first neural network model 110 is used to extract information from the historical data, in particular regular data reflecting the regularities in the historical data. According to one embodiment, the first neural network model 110 is a WaveNet neural network. As a sequence generation model, a WaveNet network is designed to predict the value of the n-th data point of a data sequence from its first n-1 data points. WaveNet is particularly suited to high-throughput input of one-dimensional data sequences whose elements are multi-dimensional vectors, and such a one-dimensional network allows fast computation. The standard WaveNet network model is a convolutional neural network in which each convolutional layer convolves the output of the previous layer. The larger the convolution kernels of the network and the more layers it has, the stronger its perception in the time domain and the larger its receptive field. During generation with a WaveNet network, each newly generated node is appended after the last node of the input layer and generation is iterated. The activation function of the WaveNet network may, for example, use gate units. Recursive (residual) and skip connections are used between the hidden layers lying between the input layer and the output layer of the network; that is, each node of each convolutional layer in the hidden layers adds its original value to the output value of the activation function and passes the sum to the next convolutional layer. The number of channels can be reduced through a 1x1 convolution kernel. Finally, the outputs of the activation functions of all hidden layers are summed and emitted through the output layer.
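For illustration, the block structure just described (a dilated causal convolution with a gated activation, a residual connection to the next layer, a skip output and 1x1 convolutions) can be sketched in Python with PyTorch as follows; all channel counts are assumptions for the sketch, not parameters taken from the application:

```python
import torch
import torch.nn as nn

class GatedResidualBlock(nn.Module):
    """One WaveNet-style block as described above: dilated causal
    convolution, gated activation, residual connection to the next layer,
    and a skip output accumulated toward the output layer."""

    def __init__(self, channels: int, kernel_size: int, dilation: int):
        super().__init__()
        # Left padding of (kernel_size - 1) * dilation keeps the convolution causal.
        self.pad = (kernel_size - 1) * dilation
        self.filter_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.gate_conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.residual_conv = nn.Conv1d(channels, channels, kernel_size=1)  # 1x1 convolution
        self.skip_conv = nn.Conv1d(channels, channels, kernel_size=1)      # 1x1 convolution

    def forward(self, x: torch.Tensor):
        # x: (batch, channels, time)
        h = nn.functional.pad(x, (self.pad, 0))
        gated = torch.tanh(self.filter_conv(h)) * torch.sigmoid(self.gate_conv(h))
        residual = self.residual_conv(gated) + x   # passed to the next convolutional layer
        skip = self.skip_conv(gated)               # summed over all blocks at the output layer
        return residual, skip
```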
As shown in FIG. 1, the first neural network model 110 has an input layer (i.e., a first convolutional layer) 112, hidden layers 113 and an output layer 114. The number of hidden layers 113 may be zero, one or more. Each of the input layer 112, the hidden layers 113 and the output layer 114 has a plurality of nodes 111. The number of nodes of the input layer 112 should correspond at least to the data length of the historical data sequence, to ensure that the neural network can receive the information at every historical time.
When the first neural network model 110 adopting an ordinary WaveNet network is a one-dimensional causal convolutional network, the first n-1 data points of the input data sequence are needed to predict the n-th data point, so the number of usable nodes decreases by one with each convolutional layer. If the historical data is long, many layers must be added to the first neural network model 110 to allow n passes, or very large filters are required; the gradients selected during gradient descent then become too small, making the network complicated to train and the fit poor.
According to the embodiments of the present application, the concept of a dilated convolutional neural network (Dilated CNN) can be introduced. A dilated convolutional neural network is a convolutional network with "holes". According to an embodiment of the present application, the first convolutional layer (i.e., the input layer) of the dilated convolutional neural network may be a one-dimensional causal convolutional network with a dilation factor of 1. Starting from the second convolutional layer, the dilation factor of each convolutional layer is the dilation factor of the previous layer multiplied by a dilation index, where the dilation index is a positive integer not less than 2 and not greater than the convolution kernel size. This dilated convolution configuration can be adopted in both the hidden layers and the output layer of the first neural network model 110. For example, when the dilation index is 2, the second convolutional layer only uses nodes n, n-2, n-4, ... for convolution, the third convolutional layer only uses nodes n, n-4, n-8, ..., and so on.
The dilated network structure can significantly speed up the propagation of information through the neural network, avoid vanishing or exploding gradients, and improve the processing speed and prediction accuracy of the first neural network model 110. For example, when the convolution kernel size is 2 and the dilation index is 2, the number of convolutional layers through which information passes from the node corresponding to the first historical time to the node corresponding to the last historical time is log₂N, where N is the data length of the historical data sequence.
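Continuing the example above (kernel size 2, dilation index 2), the dilation factors of the successive layers and the resulting log₂N layer count can be sketched as follows; the function name is hypothetical and a kernel size of 2 is assumed:

```python
import math

def dilation_schedule(history_length: int, dilation_index: int = 2) -> list:
    """Dilation factor of each convolutional layer, following the rule in
    the text: the first layer has dilation 1 and each later layer
    multiplies the previous dilation by the dilation index."""
    num_layers = math.ceil(math.log(history_length, dilation_index))
    return [dilation_index ** i for i in range(num_layers)]

print(dilation_schedule(128))   # [1, 2, 4, 8, 16, 32, 64] -> 7 = log2(128) layers
```

With kernel size 2, the receptive field of such a stack is 1 plus the sum of the dilation factors, i.e., 128 historical times in this example, consistent with the log₂N layer count stated above.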
According to an embodiment of the present application, the second neural network model 120 serving as the decoder may be a multilayer perceptron (MLP) network. An MLP network likewise comprises an input layer, hidden layers and an output layer, where each neuron node has an activation function (for example a sigmoid function), and the network is trained using a loss function. The MLP network predicts the future values of the object based on the historical regularities extracted by the encoder network and the future influencing factors (including dynamic and static factors).
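A minimal sketch of such a decoder in Python with PyTorch might look as follows; the layer widths and the choice of a single hidden layer are assumptions for the example:

```python
import torch
import torch.nn as nn

class MLPDecoder(nn.Module):
    """Maps the per-future-time feature vector to the predicted future
    value; applied independently at each of the m future times."""

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.Sigmoid(),                    # sigmoid activation, as mentioned above
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, m, in_dim) -> predictions: (batch, m, out_dim)
        return self.net(features)
```

Because the linear layers act only on the last (feature) dimension, all m future times are processed in parallel rather than step by step.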
Although a WaveNet network is used above as the example encoder network of the seq2seq neural network model architecture and an MLP network as the example decoder network, the time series method of the present application may employ any other neural network structure capable of feature extraction and prediction on sequence data, such as, but not limited to, the various types of recurrent neural networks (RNNs) capable of realizing the time series prediction function of the present application. For example, when the seq2seq neural network model architecture is used, the encoder network may be an LSTM network and the decoder network an MLP network; although the LSTM network has shortcomings, combining it with an MLP network and adjusting the input and output data sequences of the network can still, to a certain extent, achieve better results than existing schemes. A WaveNet network may also be chosen as the encoder network with an LSTM network or another RNN as the decoder network, and so on.
In the flowchart of FIG. 2, the method 200 for time series prediction is used to predict, based on the historical data sequence 101 corresponding to a historical time series T1 = {t_1, t_2, t_3, ..., t_n} containing n historical times, the future data sequence 104 corresponding to a future time series T2 = {t_{n+1}, t_{n+2}, t_{n+3}, ..., t_{n+m}} containing m future times. The unit of the historical and/or future times can be chosen as hours, days, weeks, months, quarters, years, etc., as required. For example, for predicting the numbers of passengers boarding and alighting at a bus stop, hours or even minutes or quarter-hours may be used as the time unit or interval. For fast-food restaurants, time units such as days, weeks and months may be used. Given the customer flow of fast-food restaurants, measuring and predicting the sales volume of food products on a weekly basis reflects the historical regularities and future trends of the industry better than other units, and weeks are therefore used as the example below. According to the embodiments of the present application, a historical and/or future time may be a point in time (e.g., the time t_1, the end of the first quarter, 10 a.m., etc.) or a continuous time period (e.g., the t_2-th period, the second week, the third month, October of the current year, etc.). When the historical and future times are points in time, the time intervals between the individual historical and/or future times may be identical, so that this constant interval serves as the period for extracting and predicting the periodic information of the historical and future data. When the historical and future times are time periods, the lengths of the periods may likewise be identical, serving as the period for extracting and predicting the periodic information of the historical and future data.
The method 200 first acquires, in step S210, the historical data sequence 101 of the object corresponding to the historical time series T1.
The historical data of the historical data sequence 101 include the historical dynamic features x_i and the historical values y_i corresponding to the historical times t_i in the historical time series T1, where i = 1, 2, ..., n. The historical value y_i is the measured value of the object at the historical time t_i, for example the actual sales volume of a product. The historical values arise from factors internal to the object and may therefore also be called internal factors or internal feature data of the object. The historical dynamic feature x_i is the dynamic feature in the historical data that influences the historical value y_i of the object, for example one or more of whether the time is a holiday, the number of working days, the number of days or weeks until a holiday, etc. Historical dynamic features are associated with time and include, for example, periodic factors that influence the object cyclically with a certain period (also called periodic historical dynamic features) and aperiodic factors that influence the object non-periodically (also called aperiodic historical dynamic features). The period of the periodic factors may be determined by the length of the identical time intervals between the historical time points in the historical time series, or by the length of the historical times when these are time periods of identical length. The way aperiodic factors influence the object is tied to specific historical times; in other words, it is random or event-triggered. At each historical time t_i in the historical time series T1, the corresponding aperiodic factors may differ. The number n of historical times represents the number, or length, of the historical data.
When the object comprises several parts or sub-objects (for example, when the product is a set of several products), the historical value y_i can be a multi-dimensional variable or vector. Likewise, when the historical dynamic feature x_i influencing the historical value y_i of the object covers several factors, it is regarded as a combination of several historical sub-dynamic features, and x_i can also be a multi-dimensional variable or vector. The historical dynamic feature x_i and the historical value y_i can form a two-dimensional vector (x_i, y_i)^T (which may also be called a two-element data group; hereinafter uniformly referred to as a two-dimensional vector), where each sub-vector x_i and y_i is itself a multi-dimensional vector as described above. The historical data sequence 101 can therefore be represented as a one-dimensional sequence of two-dimensional vectors {(x_1, y_1)^T, (x_2, y_2)^T, ..., (x_n, y_n)^T}.
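For illustration, assuming the historical dynamic features and historical values are held in NumPy arrays (array names and sizes are hypothetical), this one-dimensional sequence of two-dimensional vectors can be assembled by concatenation along the feature axis:

```python
import numpy as np

n, dx, dy = 104, 23, 1           # assumed sizes: 104 weeks, 23 dynamic features, 1 value
x_hist = np.random.rand(n, dx)   # historical dynamic features x_1..x_n (placeholder data)
y_hist = np.random.rand(n, dy)   # historical values y_1..y_n (placeholder data)

# Each element of the historical data sequence is the pair (x_i, y_i);
# stacking the pairs gives an encoder input of shape (n, dx + dy).
hist_seq = np.concatenate([x_hist, y_hist], axis=1)
print(hist_seq.shape)            # (104, 24)
```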
According to the embodiments of the present application, historical static features unrelated to the historical times are not added to the historical data 101. By not processing historical static data, the first neural network model 110 serving as the encoder reduces redundancy in data and computation and increases the operating speed of the network model.
In step S220, the first neural network model 110 performs the regularity extraction function of extracting, based on the historical data sequence 101, the regular data sequence 102 of the object corresponding to the future time series T2. The historical data sequence 101 serves as the input of the first neural network model 110 and, after propagation and computation through the convolutional layers of the WaveNet network, the extracted regular data sequence 102 is output. The dilated convolutional neural network described above can accelerate the extraction of the regular information of the regular data sequence 102 and improve the extraction accuracy.
Since the historical dynamic features x_i in the input historical data sequence 101 include both periodic and aperiodic dynamic features, the extracted regular information represented in the regular data sequence 102 output by the first neural network model 110 includes periodic regular features c_a derived from the periodic historical dynamic features and aperiodic regular features c_{n+j} derived from the aperiodic historical dynamic features, where j = 1, 2, ..., m. The periodic regular feature c_a corresponds to the periodic historical dynamic features and influences the future values y_{n+j}, j = 1, 2, ..., m, of the object cyclically with a certain period; it includes, for example, the periodic regularities by which seasons, days of the week and/or months influence the historical values of the object. Since the time intervals between the future times of the future time series T2 (when the future times are points in time) and/or the lengths of the future times (when they are time periods) are set equal to those of the historical times in the historical time series T1, the periodic regular feature c_a is the same for every future time t_{n+j}. Corresponding to the aperiodic historical feature data, the aperiodic regular feature c_{n+j} influences the future value y_{n+j} of the object depending on the specific future time, so c_{n+j} may differ for each corresponding future time t_{n+j} in the future time series T2. Here m is the number of future times in the future time series T2 and represents the number, or length, of the future data to be predicted.
Since the periodic historical dynamic features included in the historical dynamic features x_i may be a combination of several sub-periodic factors, the periodic regular feature c_a likewise includes several sub-periodic regular features and can be represented as a multi-dimensional vector. According to the embodiments of the present application, the dimension of c_a may equal the number of sub-periodic historical dynamic features among the periodic historical dynamic features, or be smaller to reduce the computational load. Similarly, the aperiodic historical dynamic features may have several sub-aperiodic dynamic features, so the aperiodic regular feature c_{n+j} also includes several sub-aperiodic regular features and can be represented as a multi-dimensional vector. The dimension of c_{n+j} may likewise equal the number of sub-aperiodic historical dynamic features, or be smaller to reduce the computational load. The regular data sequence 102 can thus be represented as a one-dimensional sequence of two-dimensional vectors whose elements consist of the two multi-dimensional sub-vectors c_a and c_{n+j}: {(c_a, c_{n+1})^T, (c_a, c_{n+2})^T, ..., (c_a, c_{n+m})^T}.
When predicting the future values y_{n+j} in the future data sequence 104 of the object, other factors influencing the object at the future times may also need to be considered.
Similar to the historical dynamic features x_i, x_{n+j} in FIG. 1 is the future dynamic feature influencing the future value y_{n+j} of the object, corresponding to the future time t_{n+j} in the future time series T2. The future dynamic feature x_{n+j} may, for example, be one or more of a promotion at some future time, whether the time is a holiday, the number of working days, the number of days or weeks until a holiday, etc. Future dynamic features are likewise associated with time and may be multi-dimensional vectors comprising several sub-future-dynamic features. The future dynamic features x_{n+j} form a one-dimensional sequence of multi-dimensional vectors {x_{n+1}, x_{n+2}, ..., x_{n+m}}.
The other factors may also include future static features x_s that influence the future values y_{n+j} of the object but are unrelated to time. The future static features x_s may include attributes of the object (which usually relate only to the object itself and not to the future times) and other time-independent features. For example, when the object is a product, the future static features x_s may be the category of the product, the temperature of the product, the sales location of the product (for example represented by the location of its distribution center), etc.; these features are associated only with the object and do not vary with time. Depending on the number of time-independent influencing factors, the future static feature x_s can be a multi-dimensional vector combining several sub-features.
According to the embodiments of the present application, the future static features x_s can be processed further. The future static features x_s fall into different types, with different correlations between the types. An embedding operation can transform sparse discrete variables into continuous variables. The future static features are embedded by type; for example, they are divided into two groups of future static features, x_s1 and x_s2, according to location-related features and product-attribute-related features, so that different groups of future static features are uncorrelated, i.e., remain orthogonal. This avoids treating every individual static influencing factor as a variable or as one dimension of a vector, thereby reducing the overall dimension of the future static features x_s and the computational load of the model. In FIG. 1, the future static feature group x_s1 includes the multi-dimensional future static feature e_1, which can influence the object at every future time t_{n+j}, while the future static feature group x_s2 includes the multi-dimensional future static feature e_2, which can likewise influence the object at every future time t_{n+j}. Depending on the specific situation, the number of future static features or future static feature groups may be zero, one or more. The number of specific features included in each future static feature determines its dimension, which may be one or more.
The future static features x_s form a one-dimensional sequence of length m, {x_s, x_s, ..., x_s}, whose elements are zero-, one- or multi-dimensional vectors. Taking the embodiment in FIG. 1 as an example, this one-dimensional sequence can be represented as {(e_1, e_2)^T, (e_1, e_2)^T, ..., (e_1, e_2)^T}.
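A sketch of the grouped embedding of static features in Python with PyTorch follows; the group names, vocabulary sizes and embedding widths are assumptions for the example:

```python
import torch
import torch.nn as nn

class StaticFeatureEmbedding(nn.Module):
    """Embeds each group of static features separately, so that the
    location-related and product-related information stay in independent
    (orthogonal) low-dimensional sub-vectors e1 and e2."""

    def __init__(self, n_locations: int, n_categories: int, dim: int = 4):
        super().__init__()
        self.location_emb = nn.Embedding(n_locations, dim)   # group x_s1
        self.category_emb = nn.Embedding(n_categories, dim)  # group x_s2

    def forward(self, location_id: torch.Tensor, category_id: torch.Tensor, m: int):
        e1 = self.location_emb(location_id)   # (batch, dim)
        e2 = self.category_emb(category_id)   # (batch, dim)
        e = torch.cat([e1, e2], dim=-1)       # (batch, 2 * dim)
        # The static vector does not vary over time, so it is repeated once
        # per future time only at the decoder input, not inside the encoder.
        return e.unsqueeze(1).expand(-1, m, -1)
```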
The four parts of the influencing factors that can influence the future data y_{n+j} of the object are described above. In step S230 of the method 200, the predicted feature data sequence 103 can be generated based on the regular data sequence 102, the future dynamic feature sequence of the object corresponding to the future time series, and the future static features of the object. This generation can be accomplished by splicing the one-dimensional sequences of the four influencing factors to form the one-dimensional predicted feature data sequence 103 whose elements are multi-part (e.g., four-part) vectors (which may also be called four-element data groups). As shown in FIG. 1, this one-dimensional sequence 103 can be represented as {(c_a, c_{n+1}, e_1, e_2, x_{n+1})^T, (c_a, c_{n+2}, e_1, e_2, x_{n+2})^T, ..., (c_a, c_{n+m}, e_1, e_2, x_{n+m})^T}.
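Continuing the sketches above (all tensor shapes are assumptions), the splicing in step S230 amounts to a concatenation along the feature axis:

```python
import torch

batch, m = 8, 4                                   # assumed: 8 series, 4 future weeks
c_a = torch.rand(batch, 1, 6).expand(-1, m, -1)   # periodic regular features, shared by all future times
c_n = torch.rand(batch, m, 6)                     # aperiodic regular features, one per future time
e   = torch.rand(batch, m, 8)                     # embedded static features e1, e2 (repeated over time)
x_f = torch.rand(batch, m, 23)                    # future dynamic features x_{n+1}..x_{n+m}

# Element j of the predicted feature sequence is (c_a, c_{n+j}, e1, e2, x_{n+j}).
pred_features = torch.cat([c_a, c_n, e, x_f], dim=-1)
print(pred_features.shape)                        # torch.Size([8, 4, 43])
```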
In the following step S240, the predicted feature data sequence 103 is input into the second neural network model 120 and, through the propagation and computation of a decoder such as a multilayer perceptron (MLP) network, the prediction function of predicting the future data sequence 104 corresponding to the future time series T2, i.e., {y_{n+1}, y_{n+2}, ..., y_{n+m}}, is accomplished. Each predicted future value y_{n+j} of the object in the future data sequence 104 is a multi-dimensional vector with the same dimension as the historical values y_i.
The method 200 according to the embodiments of the present application optionally further includes a step S250 of training the neural network models on a training data set to determine the optimal model parameters before using at least one of the first and second neural network models 110 and 120 serving as the encoder and decoder networks, respectively. The parameters of a neural network model may remain unchanged during use after training is completed, may be updated or adjusted based on new data sets after a period of use or at a predetermined interval, or may be updated in real time in an online supervised manner.
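As a sketch of the training in step S250 (the loss function, optimizer, stand-in model and placeholder data are all assumptions; the application does not fix these choices):

```python
import torch
import torch.nn as nn

# A stand-in for the full encoder-decoder model sketched above (hypothetical).
model = nn.Sequential(nn.Linear(43, 32), nn.Sigmoid(), nn.Linear(32, 1))

# Placeholder training pairs: predicted feature vectors and observed future values.
features = torch.rand(256, 43)
targets = torch.rand(256, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(10):                       # the number of epochs is an assumption
    optimizer.zero_grad()
    loss = loss_fn(model(features), targets)
    loss.backward()                           # gradient descent, as noted in the text
    optimizer.step()
```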
FIG. 3 shows an exemplary structure of an apparatus 300 for time series prediction according to an embodiment of the present application. The apparatus 300 includes a historical data acquisition unit 310, a regularity extraction unit 320, a predicted feature generation unit 330 and a prediction unit 340.
The historical data acquisition unit 310 is used to acquire the historical data sequence 101 of the object corresponding to the historical time series T1. The historical data in the historical data sequence 101 include the time-associated historical dynamic features x_i corresponding to the historical times t_i in the historical time series T1, as well as the historical values y_i of the object.
The regularity extraction unit 320 includes the first neural network model 110, for example serving as the encoder network of the seq2seq neural network model, to extract the regularities of the historical data. This unit uses the neural network model to extract, from the historical data sequence 101 provided by the historical data acquisition unit 310, the regular data sequence 102 of the object corresponding to the future time series T2. The regular data sequence 102 includes the periodic regular features c_a of the object corresponding to the future times t_{n+j} of the future time series T2 and the aperiodic regular features c_{n+j} associated with the corresponding future times. In the seq2seq neural network model structure, the encoder network may adopt a sequence data network model such as a WaveNet network, and may further adopt a structure such as a dilated convolutional network to speed up information propagation and computation.
The predicted feature generation unit 330 is used to combine the regular data sequence 102 output by the regularity extraction unit 320, the future dynamic feature sequence composed of the future dynamic features x_{n+j} corresponding to the future times t_{n+j} in the future time series T2, and the future static features x_s, to generate the predicted feature data sequence 103. The future dynamic features x_{n+j} are associated with the future times t_{n+j}. In generating the predicted feature data sequence 103, the predicted feature generation unit 330 may further group the static features x_s so that the groups of static features are orthogonal to one another, thereby reducing the vector dimension of each data element of the predicted feature data sequence.
The prediction unit 340 includes the second neural network model 120, for example serving as the decoder network of the seq2seq neural network model, to predict the future values of the object. The unit 340 uses the second neural network model 120 to predict, from the predicted feature data sequence 103 provided by the predicted feature generation unit 330, the future data sequence 104 of the object corresponding to the future time series T2. The second neural network model 120 may use a neural network such as a multilayer perceptron (MLP) network.
The apparatus 300 optionally further includes a model training unit 350 for training the respective neural network models used in the extraction unit 320 and the prediction unit 340 before their use, to determine the optimal model parameters, and for supervising or updating the parameters of the models.
Details of the functions performed by the individual units that are the same as or similar to those in the method 200 for time series prediction described above are not repeated.
For the time series prediction method and apparatus according to the embodiments of the present application, the following experiment was carried out to compare performance with existing time series prediction schemes.
The experiment was conducted in a product prediction scenario in the catering industry; the test task required predicting the sales volume of each product (object) at each distribution center over the next 1-4 weeks. The test data set covered about 20 distribution centers, each including on average about 200 products. The historical product sales data ranged from 1 week at the shortest to 128 weeks at the longest. The test task involved considering 23 dynamic influencing factors (e.g., whether the time is a holiday, the number of working days, the number of weeks until the Spring Festival, etc.) and 7 static influencing factors (e.g., the product category, the temperature, the location of the distribution center, etc.) in the prediction.
Table 1 shows the training time, prediction time and prediction error of the models using the different time series prediction methods. The deep learning methods using the seq2seq neural network model require a large number of floating-point operations and used one additional graphics processing unit (GPU) to accelerate computation compared with the traditional statistical algorithm Prophet.
Table 1 (training time, prediction time and prediction error of the compared models; reproduced only as an image, PCTCN2021118272-appb-000001, in the original publication)
The results show that the prediction accuracy (error) of the scheme according to the embodiments of the present application using the WaveNet-MLP seq2seq architecture (WaveNet network as the encoder, MLP network as the decoder) is better than that of the traditional statistical algorithm, and also better than that of the scheme using the seq2seq neural network model structure with LSTM networks as both the encoder and decoder networks. In terms of prediction time, the schemes using neural network models are faster than the traditional statistical algorithm; and among the neural network schemes, the training time of the WaveNet-MLP seq2seq neural network model structure of the present application is significantly lower.
The advantages of the time series prediction method and apparatus according to the embodiments of the present application therefore lie in the following aspects: using two neural network models, such as a WaveNet network and an MLP network, as the encoder and decoder networks respectively allows the computations on the historical data sequence and the future data sequence at the different historical times of the historical time series and the different future times of the future time series to run in parallel, increasing the speed of model training and use; using a neural network model such as a WaveNet network as the encoder, in particular with a dilated convolutional structure, shortens the propagation path of the information in the object's historical data sequence from the first historical time to the last, avoiding vanishing and exploding gradients during training and thereby enabling long-range time series prediction; the time-invariant influencing factors are introduced only at the input of the second neural network model serving as the decoder, avoiding their replication and computation at every time point of the encoder network and thus reducing redundancy in data and computation; and the time-invariant influencing factors such as static features are embedded in groups, reducing the input data dimension while preserving the orthogonality between influencing factors.
It should be noted that, although several modules or units of the apparatus for time series prediction are mentioned in the detailed description above, this division is not mandatory. Indeed, according to the embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided among several modules or units. Components shown as modules or units may or may not be physical units; that is, they may be located in one place or distributed over several network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present application. Those of ordinary skill in the art can understand and implement this without creative effort.
In an exemplary embodiment of the present application, a computer-readable storage medium is also provided, on which a computer program is stored; the program includes executable instructions which, when executed for example by a processor, can implement the steps of the method for time series prediction described in any one of the embodiments above. In some possible implementations, various aspects of the present application may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to perform the steps according to the various exemplary embodiments of the present application described in the method for time series prediction in this specification.
The program product for implementing the above method according to the embodiments of the present application may adopt a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus or device.
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device. The program code contained on a readable medium may be transmitted by any appropriate medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the above.
The program code for carrying out the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example through the Internet using an Internet service provider).
In an exemplary embodiment of the present application, an electronic device is also provided, which may include a processor and a memory for storing executable instructions of the processor, wherein the processor is configured to perform the steps of the method for time series prediction in any one of the embodiments above by executing the executable instructions.
Those skilled in the art will understand that various aspects of the present application may be implemented as a system, a method or a program product. Accordingly, various aspects of the present application may take the following forms: an entirely hardware implementation, an entirely software implementation (including firmware, microcode, etc.), or an implementation combining hardware and software aspects, which may collectively be referred to herein as a "circuit", "module" or "system".
下面参照图4来描述根据本申请的这种实施方式的电子设备400。图4显示的电子设备400仅仅是一个示例,不应对本申请的实施例的功能和使用范围带来任何限制。The electronic device 400 according to this embodiment of the present application is described below with reference to FIG. 4 . The electronic device 400 shown in FIG. 4 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present application.
如图4所示,电子设备400以通用计算设备的形式表现。电子设备400的组件可以包括但不限于:至少一个处理单元410、至少一个存储单元420、连接不同系统组件(包括存储单元420和处理单元410)的总线430、显示单元440等。As shown in FIG. 4, electronic device 400 takes the form of a general-purpose computing device. Components of the electronic device 400 may include, but are not limited to, at least one processing unit 410, at least one storage unit 420, a bus 430 connecting different system components (including the storage unit 420 and the processing unit 410), a display unit 440, and the like.
其中,所述存储单元存储有程序代码,所述程序代码可以被所述处理单元410执行,使得所述处理单元410执行本说明书用于自动时间序列预 测的方法中描述的根据本申请各种示例性实施方式的步骤。例如,所述处理单元410可以执行如图2中所示的步骤。Wherein, the storage unit stores program codes, and the program codes can be executed by the processing unit 410, so that the processing unit 410 executes various examples according to the present application described in the method for automatic time series prediction in this specification steps of sexual implementation. For example, the processing unit 410 may perform the steps shown in FIG. 2 .
The storage unit 420 may include a readable medium in the form of a volatile storage unit, such as a random access memory (RAM) unit 4201 and/or a cache storage unit 4202, and may further include a read-only memory (ROM) unit 4203.
The storage unit 420 may also include a program/utility 4204 having a set of (at least one) program modules 4205, such program modules 4205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
The bus 430 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 400 may also communicate with one or more external devices 500 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 400 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 450. Furthermore, the electronic device 400 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 460. The network adapter 460 may communicate with other modules of the electronic device 400 through the bus 430. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solutions according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a network device, etc.) to execute the method for time series prediction according to the embodiments of the present application.
Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of what is disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present application that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the application being indicated by the appended claims.
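For orientation, the arrangement recited in the claims below can be illustrated with a minimal sketch: a first neural network model (per claims 3 to 6, a WaveNet-style stack of dilated one-dimensional convolutions acting as a seq2seq encoder) extracts a regularity data sequence from the historical data; that sequence is concatenated, per future time, with future dynamic features and an embedding of a static feature (claims 9 and 11); and a second neural network model (per claim 7, an MLP decoder) maps the concatenated features to predicted future values. The sketch assumes PyTorch; the layer sizes, forecast horizon, feature choices (holiday indicators, sales location), and the convention of taking the last portion of the encoder output as the regularity sequence for the future times are hypothetical choices of this illustration, not specifics of the application.

import torch
import torch.nn as nn

class WaveNetEncoder(nn.Module):
    # Stack of one-dimensional convolutions whose dilation coefficient starts at 1
    # and is multiplied by a fixed dilation exponent at each subsequent layer.
    def __init__(self, in_ch, hid_ch, n_layers=4, dilation_exp=2):
        super().__init__()
        convs, dilation = [], 1
        for i in range(n_layers):
            convs.append(nn.Conv1d(in_ch if i == 0 else hid_ch, hid_ch,
                                   kernel_size=2, dilation=dilation,
                                   padding=dilation))
            dilation *= dilation_exp  # claim-6 rule: previous dilation times the exponent
        self.convs = nn.ModuleList(convs)

    def forward(self, x):  # x: (batch, in_ch, T_hist)
        for conv in self.convs:
            T = x.size(-1)
            x = torch.relu(conv(x))[..., :T]  # trim right padding, keeping the convolution causal
        return x  # (batch, hid_ch, T_hist)

class Seq2SeqForecaster(nn.Module):
    def __init__(self, hist_ch, hid_ch, dyn_ch, n_locations, emb_dim, horizon):
        super().__init__()
        self.horizon = horizon
        self.encoder = WaveNetEncoder(hist_ch, hid_ch)        # first neural network model
        self.static_emb = nn.Embedding(n_locations, emb_dim)  # embedding of a categorical static feature
        self.decoder = nn.Sequential(                         # second neural network model (MLP)
            nn.Linear(hid_ch + dyn_ch + emb_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, hist, future_dyn, location_id):
        # hist: (batch, hist_ch, T_hist) historical values plus historical dynamic features
        # future_dyn: (batch, horizon, dyn_ch), e.g. per-day holiday indicators
        # location_id: (batch,) a static feature such as sales location
        reg = self.encoder(hist)[..., -self.horizon:]  # regularity sequence (assumes T_hist >= horizon)
        reg = reg.transpose(1, 2)                      # (batch, horizon, hid_ch)
        stat = self.static_emb(location_id)            # (batch, emb_dim)
        stat = stat.unsqueeze(1).expand(-1, self.horizon, -1)
        feats = torch.cat([reg, future_dyn, stat], dim=-1)  # per-time concatenation, as in claim 9
        return self.decoder(feats).squeeze(-1)         # (batch, horizon) predicted future values

model = Seq2SeqForecaster(hist_ch=3, hid_ch=16, dyn_ch=4,
                          n_locations=10, emb_dim=5, horizon=14)
preds = model(torch.randn(8, 3, 56),
              torch.randn(8, 14, 4),
              torch.randint(0, 10, (8,)))  # preds: (8, 14)

Because each layer multiplies the dilation coefficient by the exponent, the encoder's receptive field grows geometrically with depth, which is what allows a small number of layers to summarize a long sales history.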

Claims (22)

  1. A method for time series prediction, comprising:
    obtaining a historical data sequence of an object corresponding to a historical time series, the historical data in the historical data sequence comprising historical dynamic features and historical values of the object corresponding to historical times in the historical time series, wherein the historical dynamic features are associated with the corresponding historical times;
    extracting, using a first neural network model and based on the historical data sequence, a regularity data sequence of the object corresponding to a future time series;
    generating a predicted feature data sequence based on the regularity data sequence, a future dynamic feature sequence of the object corresponding to the future time series, and future static features of the object, wherein the future dynamic feature sequence comprises future dynamic features of the object corresponding to future times in the future time series, the future dynamic features being associated with the corresponding future times; and
    predicting, using a second neural network model and based on the predicted feature data sequence, a future data sequence of the object corresponding to the future time series, the future data in the future data sequence comprising predicted future values of the object corresponding to the future times in the future time series.
  2. The method according to claim 1, wherein the regularity data in the regularity data sequence comprises periodic regularity features and aperiodic regularity features of the object corresponding to future times in the future time series, wherein the aperiodic regularity features are associated with the corresponding future times.
  3. The method according to claim 1, wherein the first neural network model constitutes the encoder in a seq2seq network model, and the second neural network model constitutes the decoder in the seq2seq network model.
  4. The method according to any one of claims 1 to 3, wherein the first neural network model is a WaveNet network.
  5. The method according to claim 4, wherein the WaveNet network is a dilated convolutional neural network.
  6. The method according to claim 5, wherein the WaveNet network comprises at least two convolutional layers, the first of the at least two convolutional layers is a one-dimensional convolutional layer with a dilation coefficient of 1, and each convolutional layer after the first among the at least two convolutional layers has a dilation coefficient equal to the dilation coefficient of the preceding convolutional layer multiplied by a dilation exponent.
  7. The method according to any one of claims 1 to 3, wherein the second neural network model is a multilayer perceptron (MLP) network.
  8. The method according to any one of claims 1 to 3, wherein the periodic regularity features of the object are the same for each future time in the future time series.
  9. The method according to any one of claims 1 to 3, wherein generating the predicted feature data sequence based on the regularity data sequence, the future dynamic feature sequence of the object corresponding to the future time series, and the future static features of the object comprises:
    for the corresponding future times in the future time series, concatenating the periodic regularity features and the aperiodic regularity features in the regularity data sequence, the future dynamic features in the future dynamic feature sequence, and the future static features into the predicted feature data sequence.
  10. The method according to any one of claims 1 to 3, wherein the future static features are the same for each future time in the future time series.
  11. The method according to any one of claims 1 to 3 and 9, further comprising:
    performing embedding grouping on the future static features.
  12. The method according to any one of claims 1 to 3, further comprising: before using at least one of the first neural network model and the second neural network model, training the neural network model to be used.
  13. The method according to any one of claims 1 to 3, wherein the object is a product, the historical values and the future values of the object are respectively historical sales volumes and future sales volumes of the product, and a unit of at least one of the historical times and the future times comprises one of the following: hour, day, month, year, week, quarter.
  14. The method according to claim 13, wherein at least one of the historical dynamic features and the future dynamic features of the object comprises at least one of the following: whether a time is a holiday, a number of working days, and a number of days or weeks until a holiday.
  15. The method according to claim 13, wherein the future static features comprise at least one of the following: a category of the product, a temperature of the product, and a sales location of the product.
  16. An apparatus for time series prediction, comprising:
    a historical data obtaining unit configured to obtain a historical data sequence of an object corresponding to a historical time series, the historical data in the historical data sequence comprising historical dynamic features and historical values of the object corresponding to historical times in the historical time series, wherein the historical dynamic features are associated with the corresponding historical times;
    a regularity extraction unit configured to extract, using a first neural network model and based on the historical data sequence, a regularity data sequence of the object corresponding to a future time series;
    a predicted feature generation unit configured to generate a predicted feature data sequence based on the regularity data sequence, a future dynamic feature sequence of the object corresponding to the future time series, and future static features of the object, wherein the future dynamic feature sequence comprises future dynamic features of the object corresponding to future times in the future time series, the future dynamic features being associated with the corresponding future times; and
    a prediction unit configured to predict, using a second neural network model and based on the predicted feature data sequence, a future data sequence of the object corresponding to the future time series, the future data in the future data sequence comprising predicted future values of the object corresponding to the future times in the future time series.
  17. The apparatus according to claim 16, wherein the regularity data in the regularity data sequence comprises periodic regularity features and aperiodic regularity features of the object corresponding to future times in the future time series, wherein the aperiodic regularity features are associated with the corresponding future times.
  18. The apparatus according to claim 16, wherein the first neural network model is an encoder network in a seq2seq network model, and the second neural network model is a decoder network in the seq2seq network model.
  19. The apparatus according to any one of claims 16 to 18, wherein the first neural network model is a WaveNet network.
  20. The apparatus according to any one of claims 16 to 18, wherein the second neural network model is a multilayer perceptron (MLP) network.
  21. A computer-readable storage medium having a computer program stored thereon, the computer program comprising executable instructions which, when executed by at least one processor, implement the method according to any one of claims 1 to 15.
  22. An electronic device, comprising:
    a processor; and
    a memory for storing executable instructions of the processor;
    wherein the processor is configured to execute the executable instructions to implement the method according to any one of claims 1 to 15.
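As a worked illustration of the dilation rule in claims 5 and 6 above (the first layer has a dilation coefficient of 1, and each subsequent layer's coefficient is the preceding layer's coefficient multiplied by a dilation exponent), the short sketch below computes a hypothetical schedule and the resulting receptive field. The kernel size of 2 and dilation exponent of 2 are assumed values that the claims leave open.

def dilation_schedule(n_layers, exponent=2):
    # first layer has dilation 1; each later layer multiplies the previous value by the exponent
    dilations = [1]
    for _ in range(n_layers - 1):
        dilations.append(dilations[-1] * exponent)
    return dilations

def receptive_field(dilations, kernel_size=2):
    # each dilated layer adds (kernel_size - 1) * dilation time steps of visible history
    return 1 + sum((kernel_size - 1) * d for d in dilations)

print(dilation_schedule(5))                   # [1, 2, 4, 8, 16]
print(receptive_field(dilation_schedule(5)))  # 32 past time steps visible to the top layer

With an exponent of 2 the visible history roughly doubles per layer, so even a five-layer stack covers about a month of daily data; a larger exponent would cover the same span with fewer layers at the cost of sparser sampling of the history.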
PCT/CN2021/118272 2020-09-14 2021-09-14 Method and apparatus for time sequence prediction WO2022053064A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010959817.7 2020-09-14
CN202010959817.7A CN112053004A (en) 2020-09-14 2020-09-14 Method and apparatus for time series prediction

Publications (1)

Publication Number Publication Date
WO2022053064A1 WO2022053064A1 (en)

Family

ID=73610632

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118272 WO2022053064A1 (en) 2020-09-14 2021-09-14 Method and apparatus for time sequence prediction

Country Status (2)

Country Link
CN (1) CN112053004A (en)
WO (1) WO2022053064A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112053004A (en) * 2020-09-14 2020-12-08 胜斗士(上海)科技技术发展有限公司 Method and apparatus for time series prediction
CN112232604B (en) * 2020-12-09 2021-06-11 南京信息工程大学 Prediction method for extracting network traffic based on Prophet model
CN112906941B (en) * 2021-01-21 2022-12-06 哈尔滨工程大学 Prediction method and system for dynamic correlation air quality time series
CN112967518B (en) * 2021-02-01 2022-06-21 浙江工业大学 Seq2Seq prediction method for bus track under bus lane condition
CN112801202B (en) * 2021-02-10 2024-03-05 延锋汽车饰件系统有限公司 Vehicle window fogging prediction method, system, electronic equipment and storage medium
CN113313316A (en) * 2021-06-11 2021-08-27 北京明略昭辉科技有限公司 Method and device for outputting prediction data, storage medium and electronic equipment
CN113837858A (en) * 2021-08-19 2021-12-24 同盾科技有限公司 Method, system, electronic device and storage medium for predicting credit risk of user
CN113850418A (en) * 2021-09-02 2021-12-28 支付宝(杭州)信息技术有限公司 Method and device for detecting abnormal data in time sequence
CN113985408B (en) * 2021-09-13 2024-04-05 南京航空航天大学 Inverse synthetic aperture radar imaging method combining gate unit and transfer learning
CN113837487A (en) * 2021-10-13 2021-12-24 国网湖南省电力有限公司 Power system load prediction method based on combined model
CN114580798B (en) * 2022-05-09 2022-09-16 南京安元科技有限公司 Device point location prediction method and system based on transformer

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850891A (en) * 2015-05-29 2015-08-19 厦门大学 Intelligent optimal recursive neural network method of time series prediction
CN106971348A (en) * 2016-01-14 2017-07-21 阿里巴巴集团控股有限公司 A kind of data predication method and device based on time series
US20200074274A1 (en) * 2018-08-28 2020-03-05 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for multi-horizon time series forecasting with dynamic temporal context learning
CN110889560A (en) * 2019-12-06 2020-03-17 西北工业大学 Express delivery sequence prediction method with deep interpretability
CN111612215A (en) * 2020-04-18 2020-09-01 华为技术有限公司 Method for training time sequence prediction model, time sequence prediction method and device
CN112053004A (en) * 2020-09-14 2020-12-08 胜斗士(上海)科技技术发展有限公司 Method and apparatus for time series prediction

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114971057A (en) * 2022-06-09 2022-08-30 支付宝(杭州)信息技术有限公司 Model selection method and device
CN115343621A (en) * 2022-07-27 2022-11-15 山东科技大学 Power battery health state prediction method and device based on data driving
CN115343621B (en) * 2022-07-27 2024-01-26 山东科技大学 Method and equipment for predicting health state of power battery based on data driving
CN115794906A (en) * 2022-12-02 2023-03-14 中电金信软件有限公司 Method, device, equipment and storage medium for determining influence of emergency
CN116307153A (en) * 2023-03-07 2023-06-23 广东热矩智能科技有限公司 Meteorological prediction method and device for energy conservation of refrigeration and heating system and electronic equipment
CN116976956A (en) * 2023-09-22 2023-10-31 通用技术集团机床工程研究院有限公司 CRM system business opportunity deal prediction method, device, equipment and storage medium
CN117252311A (en) * 2023-11-16 2023-12-19 华南理工大学 Rail transit passenger flow prediction method based on improved LSTM network
CN117252311B (en) * 2023-11-16 2024-03-15 华南理工大学 Rail transit passenger flow prediction method based on improved LSTM network

Also Published As

Publication number Publication date
CN112053004A (en) 2020-12-08

Similar Documents

Publication Publication Date Title
WO2022053064A1 (en) Method and apparatus for time sequence prediction
US11928600B2 (en) Sequence-to-sequence prediction using a neural network model
US20180150783A1 (en) Method and system for predicting task completion of a time period based on task completion rates and data trend of prior time periods in view of attributes of tasks using machine learning models
US10540967B2 (en) Machine reading method for dialog state tracking
US11080707B2 (en) Methods and arrangements to detect fraudulent transactions
US20210142181A1 (en) Adversarial training of machine learning models
US20190130249A1 (en) Sequence-to-sequence prediction using a neural network model
US20190138887A1 (en) Systems, methods, and media for gated recurrent neural networks with reduced parameter gating signals and/or memory-cell units
US20210118430A1 (en) Utilizing a dynamic memory network for state tracking
CN110663049B (en) Neural Network Optimizer Search
WO2018175972A1 (en) Device placement optimization with reinforcement learning
US20210303970A1 (en) Processing data using multiple neural networks
US20210374544A1 (en) Leveraging lagging gradients in machine-learning model training
US11651212B2 (en) System and method for generating scores for predicting probabilities of task completion
US20220391706A1 (en) Training neural networks using learned optimizers
CN116091110A (en) Resource demand prediction model training method, prediction method and device
CN108475346B (en) Neural random access machine
EP4009239A1 (en) Method and apparatus with neural architecture search based on hardware performance
CN112243509A (en) System and method for generating data sets from heterogeneous sources for machine learning
US20230289634A1 (en) Non-linear causal modeling based on encoded knowledge
CN116993185A (en) Time sequence prediction method, device, equipment and storage medium
EP4231202A1 (en) Apparatus and method of data processing
US20200302303A1 (en) Optimization of neural network in equivalent class space
US20190065987A1 (en) Capturing knowledge coverage of machine learning models
CN115062769A (en) Knowledge distillation-based model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21866116

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21866116

Country of ref document: EP

Kind code of ref document: A1