CN115983497A - Time series data prediction method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN115983497A
Authority
CN
China
Prior art keywords
time sequence
prediction
data
time
vector
Prior art date
Legal status
Pending
Application number
CN202310164275.8A
Other languages
Chinese (zh)
Inventor
刘雨桐
胡要林
景世青
欧阳葆青
李婉莹
王国勋
Current Assignee
China Resources Digital Technology Co Ltd
Original Assignee
China Resources Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Resources Digital Technology Co Ltd
Priority to CN202310164275.8A
Publication of CN115983497A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application relates to the field of computer technology, and in particular to a time series data prediction method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring historical time series data over historical time steps, the historical time series data comprising time series features; inputting the historical time series data into a preset bucketing neural network for distribution prediction to obtain a distribution probability sequence, the distribution probability sequence comprising the probability of assigning the time series features to each preset bucket; inputting the historical time series data into the preset buckets for feature discretization to obtain bucket embedding vectors; performing a weighted summation over the assignment probabilities and the bucket embedding vectors to obtain discrete feature-value embedding vectors of the historical time series data; and inputting the discrete feature-value embedding vectors into a preset time series prediction model for time series prediction to obtain target predicted time series data at preset prediction time steps. The embodiments of the present application improve the accuracy of time series prediction.

Description

Time series data prediction method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a time series data prediction method and apparatus, a computer device, and a storage medium.
Background
In the related art, research methods for time series prediction of industrial sensor data fall mainly into two classes. The first class comprises classical statistical models, such as the moving average method, exponential smoothing, ARIMA models, and state space models. Because these statistical models rely heavily on assumptions such as stationarity, they place strong requirements on the data; and since modern industrial process variables generally exhibit nonlinearity, operating-state evaluation models based on linear methods cannot achieve satisfactory results. The second class comprises prediction models based on machine learning, such as KNN regression, SVM regression, BP neural networks, and deep neural networks. KNN regression, SVM regression, and BP neural networks have simple structures and stable performance, but their prediction accuracy is limited. With the advent of the cloud computing and big data era, improved computing power and greatly increased training data have provided support for deep learning, and deep networks represented by recurrent neural networks have gradually become a popular research direction for time series prediction owing to advantages such as strong generality and high prediction accuracy. In practical applications, the patterns in sensor data are mostly related to data far back in time; however, because an ordinary RNN suffers from exploding or vanishing gradients as it recurs, such a model can only learn short-range dependencies. In practical industrial production, working conditions are complicated and varied and states change richly, which greatly reduces the expressiveness of these neural networks and degrades prediction accuracy.
Therefore, how to provide a time series data prediction method that can improve the accuracy of time series data prediction is a technical problem to be solved urgently.
Disclosure of Invention
The embodiments of the present application mainly aim to provide a time series data prediction method and apparatus, a computer device, and a storage medium, which can improve the accuracy of time series data prediction.
To achieve the above object, a first aspect of the embodiments of the present application provides a time series data prediction method, the method comprising:
acquiring historical time series data over historical time steps; wherein the historical time series data comprises time series features;
inputting the historical time series data into a preset bucketing neural network for distribution prediction to obtain a distribution probability sequence, the distribution probability sequence comprising the probability of assigning the time series features to each preset bucket; the preset buckets are used for discretizing the time series features;
inputting the historical time series data into the preset buckets for feature discretization to obtain bucket embedding vectors;
performing a weighted summation over the assignment probabilities and the bucket embedding vectors to obtain discrete feature-value embedding vectors of the historical time series data;
and inputting the discrete feature-value embedding vectors into a preset time series prediction model for time series prediction to obtain target predicted time series data at preset prediction time steps.
In some embodiments, the bucketing neural network includes a fully connected sub-network and an activation function sub-network, and inputting the historical time series data into the preset bucketing neural network for distribution prediction to obtain the distribution probability sequence comprises:
inputting the time series features into the fully connected sub-network for vector conversion to obtain time series vectors;
inputting the time series vectors into the activation function sub-network for bucket prediction to obtain the probability of assigning each time series feature to each preset bucket;
and combining the assignment probabilities of the time series features over the preset buckets to obtain the distribution probability sequence.
In some embodiments, inputting the discrete feature-value embedding vectors into the preset time series prediction model for time series prediction to obtain the target predicted time series data at the preset prediction time steps comprises:
screening the discrete feature-value embedding vectors at different time steps according to a preset time window length to obtain a selected discrete feature-value embedding vector sequence;
extracting the required prediction time steps from preset candidate time steps;
and performing time series prediction on the selected discrete feature-value embedding vector sequence through the time series prediction model to obtain the target predicted time series data at the prediction time steps.
In some embodiments, the time series prediction model includes a generator, the generator includes an encoding network, a decoding network, a fully connected network, and an activation function network, and performing time series prediction on the selected discrete feature-value embedding vector sequence through the time series prediction model to obtain the target predicted time series data at the prediction time steps comprises:
performing first attention processing on the selected discrete feature-value embedding vector sequence through the encoding network to obtain an encoded time series feature vector;
performing second attention processing on the encoded time series feature vector through the decoding network to obtain a decoded time series feature vector;
performing feature mapping on the decoded time series feature vector through the fully connected network to obtain a candidate predicted time series feature vector;
and activating the candidate predicted time series feature vector through the activation function network to obtain the target predicted time series data at the prediction time steps.
In some embodiments, the encoding network includes a multi-head self-attention layer, a residual connection layer, a normalization layer, a first fully connected layer, an activation function layer, and a second fully connected layer, and performing the first attention processing on the selected discrete feature-value embedding vector sequence through the encoding network to obtain the encoded time series feature vector comprises:
performing attention calculation on the selected discrete feature-value embedding vectors through the multi-head self-attention layer to obtain a first time series feature vector;
summing the selected discrete feature-value embedding vectors and the first time series feature vector through the residual connection layer to obtain a second time series feature vector;
normalizing the second time series feature vector through the normalization layer to obtain a third time series feature vector;
performing feature mapping on the third time series feature vector through the first fully connected layer to obtain a fourth time series feature vector;
activating the fourth time series feature vector through the activation function layer to obtain a fifth time series feature vector;
and performing feature mapping on the fifth time series feature vector through the second fully connected layer to obtain the encoded time series feature vector.
In some embodiments, before inputting the discrete feature-value embedding vectors into the preset time series prediction model for time series prediction to obtain the target predicted time series data at the preset prediction time steps, the method further comprises:
training the time series prediction model, which specifically comprises:
acquiring historical sample time series data over historical sample time steps; the historical sample time series data comprises historical sample time series features, each historical sample time series feature comprising a historical-sample related-variable feature and a historical-sample target-variable feature;
inputting the historical-sample related-variable features and the historical-sample target-variable features into a preset generator for time series data generation to obtain predicted target-variable features at preset prediction sample time steps;
concatenating the predicted target-variable features and the historical-sample target-variable features to obtain a first feature to be discriminated;
acquiring reference target-variable features at the prediction sample time steps, and concatenating the reference target-variable features and the historical-sample target-variable features to obtain a second feature to be discriminated;
inputting the first feature to be discriminated into a preset discriminator for discrimination to obtain a first discrimination result, and inputting the second feature to be discriminated into the discriminator for discrimination to obtain a second discrimination result;
constructing a discrimination loss function from the first discrimination result and the second discrimination result to obtain discrimination loss data;
calculating a mean square error from the predicted target-variable features and the reference target-variable features to obtain first generation loss data;
constructing a generation loss function from the first discrimination result to obtain second generation loss data;
adjusting the parameters of the discriminator according to the discrimination loss data, and adjusting the parameters of the generator according to the first generation loss data and the second generation loss data;
and obtaining the time series prediction model from the adjusted generator.
In some embodiments, the discriminator includes a forward neural network, a backward neural network, and a classifier, and inputting the first feature to be discriminated into the preset discriminator for discrimination to obtain the first discrimination result comprises:
performing feature processing on the first feature to be discriminated in a preset time order through the forward neural network to obtain a forward output vector;
performing feature processing on the first feature to be discriminated in the time order opposite to the preset time order through the backward neural network to obtain a backward output vector;
and classifying the forward output vector and the backward output vector through the classifier to obtain the first discrimination result.
A second aspect of the embodiments of the present application provides a time series data prediction apparatus, comprising:
a time series data acquisition module, configured to acquire historical time series data over historical time steps; wherein the historical time series data comprises time series features;
a distribution module, configured to input the historical time series data into a preset bucketing neural network for distribution prediction to obtain a distribution probability sequence, the distribution probability sequence comprising the probability of assigning each time series feature to each preset bucket; the preset buckets are used for discretizing the time series features;
a feature discretization module, configured to input the historical time series data into the preset buckets for feature discretization to obtain bucket embedding vectors;
a feature embedding representation module, configured to perform a weighted summation over the assignment probabilities and the bucket embedding vectors to obtain discrete feature-value embedding vectors of the historical time series data;
and a time series prediction module, configured to input the discrete feature-value embedding vectors into a preset time series prediction model for time series prediction to obtain target predicted time series data at preset prediction time steps.
A third aspect of the embodiments of the present application provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, performs the method according to any one of the embodiments of the first aspect of the present application.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a computer, performs the method according to any one of the embodiments of the first aspect of the present application.
According to the time series data prediction method and apparatus, computer device, and storage medium, when the time series features are discretized, they are first assigned to different preset buckets, and the bucket embeddings obtained by discretizing the time series features over the preset buckets are weighted and summed according to the assignment probabilities of the different preset buckets, completing a continuous representation of the time series features and yielding discrete feature-value embedding vectors. Finally, time series prediction is performed on the discrete feature-value embedding vectors through the time series prediction model to obtain target predicted time series data. The embodiments of the present application thereby improve prediction accuracy.
Drawings
FIG. 1 is a schematic diagram of a system architecture for performing a time series data prediction method according to an embodiment of the present application;
FIG. 2 is a flowchart of the steps of a time series data prediction method according to an embodiment of the present application;
FIG. 3 is a flowchart of the sub-steps of step 102 in FIG. 2;
FIG. 4 is a flowchart of the sub-steps of step 105 in FIG. 2;
FIG. 5 is a flowchart of the sub-steps of step 303 in FIG. 4;
FIG. 6 is a flowchart of the steps of a time series data prediction method according to another embodiment of the present application;
FIG. 7 is a block diagram of the module structure of a time series data prediction apparatus according to an embodiment of the present application;
FIG. 8 is a hardware structure diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It is noted that although functional module divisions are shown in the device schematic diagrams and logical orders are shown in the flowcharts, in some cases the steps shown or described may be performed with module divisions different from those in the devices, or in orders different from those in the flowcharts. The terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific sequence or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
First, several terms referred to in the present application are explained:
Artificial Intelligence (AI): a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems, among others. Artificial intelligence can simulate the information processes of human consciousness and thinking.
The embodiments of the present application may acquire and process related data based on artificial intelligence technology, which uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
The time series data prediction method provided by the embodiments of the present application can be applied with artificial intelligence. Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
Neural network: a computational and mathematical model that simulates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. Depending on the complexity of the system, such a network processes information by adjusting the interconnections among a large number of internal nodes.
Back propagation algorithm: the BP (Back Propagation) algorithm is a learning algorithm suitable for multilayer neural networks and is built on gradient descent. The input-output relationship of a BP neural network is essentially a mapping: a BP neural network with n inputs and m outputs performs a continuous mapping from n-dimensional Euclidean space to a finite field in m-dimensional Euclidean space, and this mapping is highly nonlinear. Its information processing capability comes from the repeated composition of simple nonlinear functions, giving it strong representational power.
Loss function: a loss function is used to evaluate the degree of difference between the predicted values and the true values of a model. The loss function is also the objective function optimized in a neural network: training or optimizing a neural network is the process of minimizing the loss function. The smaller the loss function, the closer the model's predictions are to the true values and the better the model's accuracy.
At present, the value of data is receiving increasing attention, and time series data is an indispensable part of data collections. Time series data processing is widely applied to process data acquisition and process control in the Internet of Things, the Internet of Vehicles, and the Industrial Internet, where it establishes data links with process management and has become a new field of industrial data management. The development of emerging industries such as the Internet of Things, big data, and cloud computing has greatly promoted industrial automation. In industrial production, the large amounts of information collected by various sensors over the Internet of Things constitute sensor big data with typical time series characteristics; processing and predicting such time series data can effectively supervise automated production, guard against hidden risks, and improve industrial technology.
For the prediction of time series data, some methods have been disclosed in the related art, but they suffer from low accuracy. The reason is that conventional prediction models neglect the embedding of continuous features (i.e., feature extraction), and therefore cannot reflect the fluctuations in real data and cannot perform long-term sequence prediction.
On this basis, the embodiments of the present application provide a time series data prediction method, namely a time series prediction method based on continuous feature embedding and a generative adversarial network. The method uses continuous feature embedding to achieve high model capacity and end-to-end training, with each feature value having an independent representation. A generative adversarial network based on self-attention layers is adopted: the generator generates a predicted sequence by learning the distribution of historical data, while the discriminator tries to classify whether a sample comes from the real data set or is predicted. Compared with other generative models, a generative adversarial network uses only back propagation and requires no complex Markov chain. Multi-step prediction of long time series is thus achieved.
In addition, the embodiments of the present application also provide a time series data prediction apparatus, a computer device, and a computer storage medium for executing the above time series data prediction method.
The time series data prediction method provided by the embodiments of the present application is applied on a server side and may also be software running on the server side. In some embodiments, the server side may be configured as an independent physical server, as a server cluster or distributed system formed by a plurality of physical servers, or as a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms; the software may be an application implementing the above method, but is not limited to the above forms.
Embodiments of the application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The present application provides a time series data prediction method that can be applied to the system architecture 100 shown in FIG. 1. The system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
Plant terminals or supplier terminals may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, mailbox clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the time series data prediction method provided in the embodiments of the present application is generally executed by the server, and accordingly, the time series data prediction apparatus is generally disposed in the server. It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there may be any number of terminal devices, networks, and servers, as required by the implementation.
The time series data prediction method provided by the embodiments of the present application is specifically explained through the following embodiments.
Referring to FIG. 2, a time series data prediction method according to an embodiment of the present application includes, but is not limited to, steps 101 to 105.
Step 101, acquiring historical time series data over historical time steps; wherein the historical time series data comprises time series features;
Step 102, inputting the historical time series data into a preset bucketing neural network for distribution prediction to obtain a distribution probability sequence, the distribution probability sequence comprising the probability of assigning the time series features to each preset bucket; the preset buckets are used for discretizing the time series features;
Step 103, inputting the historical time series data into the preset buckets for feature discretization to obtain bucket embedding vectors;
Step 104, performing a weighted summation over the assignment probabilities and the bucket embedding vectors to obtain discrete feature-value embedding vectors of the historical time series data;
Step 105, inputting the discrete feature-value embedding vectors into a preset time series prediction model for time series prediction to obtain target predicted time series data at preset prediction time steps.
In steps 101 to 105 of the embodiments of the present application, it is considered that conventional time series prediction methods cannot accurately reflect the temporal fluctuation of continuous features or perform long-range prediction, while the time series features in industrial time series data are exactly continuous features highly correlated with time; how to embed and represent these time series features is therefore one of the key points of the embodiments of the present application. In the embodiments of the present application, when the time series features are discretized, they are first assigned to different preset buckets, and the bucket embeddings obtained by discretizing the time series features over the preset buckets are weighted and summed according to the assignment probabilities, completing a continuous representation of the time series features and yielding discrete feature-value embedding vectors. Finally, time series prediction is performed on the discrete feature-value embedding vectors through the time series prediction model to obtain target predicted time series data. Because prediction is performed on the discrete feature values, the embodiments of the present application improve time series prediction accuracy.
When discretizing the time series features, the following approaches exist. One way to handle the features is to assign a separate embedding module to each feature value, but this approach has a very large number of parameters. Another way to handle continuous features is to assign one embedding module to each feature, which reduces the number of parameters but also somewhat limits model capacity. The approach of steps 102 to 104 embeds each time series feature by bucketing, which controls the parameter count while improving embedding performance.
In step 101 of some embodiments, historical time series data over historical time steps is acquired, the historical time series data comprising time series features.
Time series data refers to a data sequence recorded in chronological order; the data in the same column share the same measurement scope and must be comparable. Time series data may be period values or point-in-time values. Time series data management mainly helps enterprises monitor their production and operation processes in real time through the collection, storage, query, processing, and analysis of time series data. Time series data also has distinctive application characteristics: for example, the data is usually retained only for a certain length of time; operations such as down-sampling, interpolation, real-time computation, and aggregation need to be performed; and the focus is the trend over a period of time rather than the value at one specific moment.
To monitor the running state of equipment, production lines, and the whole system, industrial enterprises deploy sensors at all key points and collect various industrial time series data. Such data is generated periodically or quasi-periodically, at acquisition frequencies that may be high or low, and is generally sent to a server for aggregation and real-time processing, so that the operation of the system can be monitored or warned about in real time.
Next, how the time series data is acquired is described. (1) Data acquisition: acquire historical time series data of the N related-variable time series features X and the target-variable time series feature Y. (2) Data cleaning and processing: along the variable dimension, analyze the time delay and correlation among the variable time series features and remove process parameters with strong pairwise linear correlation for preliminary dimensionality reduction; along the sample dimension, fill null values with the value at the previous moment. (3) Data standardization: process each column of process parameter data (time series features) and convert it to a standard normal distribution by the z-score method, eliminating errors caused by the different scales of the variables; the z-score is a linear transformation that scales and then translates the feature X. (4) Data set construction: set the historical-data time window length h_steps and the prediction step size p_steps according to prediction requirements and experience.
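A minimal Python sketch of the data preparation in items (2) to (4) above, assuming NumPy arrays; the names (fill_nulls, z_score, make_windows) are illustrative rather than the patent's own:

```python
import numpy as np

def fill_nulls(x: np.ndarray) -> np.ndarray:
    # Sample dimension: fill null values with the value at the previous moment.
    for t in range(1, x.shape[0]):
        mask = np.isnan(x[t])
        x[t, mask] = x[t - 1, mask]
    return x

def z_score(x: np.ndarray) -> np.ndarray:
    # Convert each column (process parameter) to zero mean and unit variance.
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

def make_windows(x: np.ndarray, y: np.ndarray, h_steps: int, p_steps: int):
    # Split the series into (history window, future target) training pairs.
    xs, ys = [], []
    for t in range(h_steps, len(x) - p_steps + 1):
        xs.append(x[t - h_steps:t])   # h_steps of related-variable features X
        ys.append(y[t:t + p_steps])   # p_steps of target-variable values Y
    return np.stack(xs), np.stack(ys)
```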
In step 102 of some embodiments, the historical time series data is input into a preset bucketing neural network for distribution prediction to obtain a distribution probability sequence. The distribution probability sequence comprises the probability of assigning the time series features to each preset bucket, and the preset buckets are used for discretizing the time series features.
Referring to FIG. 3, in an embodiment, the bucketing neural network includes a fully connected sub-network and an activation function sub-network, and step 102 specifically includes:
Step 201, inputting the time series features into the fully connected sub-network for vector conversion to obtain time series vectors;
Step 202, inputting the time series vectors into the activation function sub-network for bucket prediction to obtain the probability of assigning each time series feature to each preset bucket;
Step 203, combining the assignment probabilities of the time series features over the preset buckets to obtain the distribution probability sequence.
Steps 201 to 203 are explained in detail next.
In step 201, the fully connected sub-network refers to a neural network composed of fully connected layers. Fully connected (FC) layers act as the "classifier" in a convolutional neural network; here, the fully connected layer maps the time series features to the sample label space. In practice, a fully connected layer can be implemented by a convolution operation: a fully connected layer whose preceding layer is also fully connected can be converted into a convolution with a 1x1 kernel, and a fully connected layer whose preceding layer is a convolutional layer can be converted into a global convolution with an h x w kernel, where h and w are the height and width of the preceding layer's convolution output.
In step 202, the activation function sub-network refers to a neural network composed of activation functions. An activation function is a function added to an artificial neural network to help the network learn complex patterns in the data. In a neuron, the inputs undergo a series of weighted summations and are then passed to another function, the activation function. Similar to neuron-based models of the human brain, the activation function ultimately determines whether a signal is delivered and what is transmitted to the next neuron. In an artificial neural network, the activation function of a node defines the node's output for a given input or set of inputs. Common activation functions include the Sigmoid, ReLU, and Softmax activation functions.
Bucketing discretizes a continuous feature into a series of 0/1 discrete features. When numerical features span different orders of magnitude, a model may be sensitive only to large feature values, and bucketing mitigates this. Moreover, the sparse vectors obtained after bucketing support faster inner-product computation, their results are more convenient to store, and they are very robust to abnormal data.
The number and width of the buckets can be specified according to experience in the relevant business field, but there are also some conventional practices (an illustrative sketch follows):
Equal-width bucketing: the width of each bucket is fixed, i.e., the value range is fixed, such as 0-99, 100-199, 200-299, and so on; this suits uniformly distributed samples and avoids having very few samples in some buckets and too many in others. Equal-frequency bucketing, also known as quantile bucketing: each bucket holds the same number of samples, which may place samples with very different values in the same bucket. Model-based bucketing: a model is used to find the optimal buckets; for example, clustering divides the feature into several categories, or a tree model is used, since nonlinear models naturally have the ability to split continuous features, and the feature split points are used for discretization.
The bucketing in the embodiments of the present application is comparable to model-based bucketing, but unlike the conventional approach, the embodiments of the present application do not search for optimal buckets for the time series features. Instead, a neural network computes, for each time series feature, the probability of each preset bucket, yielding the assignment probabilities. The bucket embeddings obtained by discretizing the time series features over the preset buckets are then weighted and summed according to the assignment probabilities, completing a continuous representation of the time series features and yielding the discrete feature-value embedding vectors.
In step 203, after the assignment probabilities of the time series features over the different preset buckets are obtained, the probabilities are combined to obtain the distribution probability sequence.
In step 103 of some embodiments, the historical time series data is input into the preset buckets for feature discretization to obtain bucket embedding vectors.
The preset buckets perform feature mapping on the time series features in the historical time series data, which can be regarded as discretization of numerical variables, followed by one-hot encoding through binarization, yielding the bucket embedding vectors.
In step 104 of some embodiments, a weighted summation is performed over the assignment probabilities and the bucket embedding vectors to obtain the discrete feature-value embedding vectors of the historical time series data.
Next, steps 102 to 104 are further described with a specific example. Specifically, each time series feature is given H preset buckets, and each preset bucket corresponds to its own d-dimensional bucket embedding; end-to-end bucketing is realized by two fully connected layers and a softmax activation function. More specifically, for a specific value x_i of the i-th time series feature, a two-layer neural network first converts it into a vector of length H:
h = Leaky_ReLU(ωx_i);
p = Softmax(Wh);
where ω and W are the weights of the two fully connected layers and p is the length-H vector of assignment probabilities of x_i over the preset buckets.
After this automatic discretization, the assignment probabilities of the feature value over the different buckets are obtained, and aggregation is then performed as a weighted average of the bucket embeddings based on these probabilities, yielding the embedding corresponding to each feature value. Summing the bucket embeddings weighted by the per-bucket assignment probabilities produces a unique embedded representation that varies continuously with the feature value. Moreover, similar feature values tend to obtain similar bucket probability distributions, and hence similar bucket embeddings.
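A minimal PyTorch-style sketch of this soft bucketing embedding for a single feature; the dimension defaults and the name BucketEmbedding are assumptions, not the patent's notation:

```python
import torch
import torch.nn as nn

class BucketEmbedding(nn.Module):
    # Soft bucketing for one continuous time series feature.
    def __init__(self, n_buckets: int = 8, emb_dim: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(1, n_buckets)           # first fully connected layer (weights w)
        self.fc2 = nn.Linear(n_buckets, n_buckets)   # second fully connected layer (weights W)
        self.bucket_emb = nn.Parameter(torch.randn(n_buckets, emb_dim))  # d-dim embedding per bucket

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1) scalar feature values
        h = nn.functional.leaky_relu(self.fc1(x))    # h = Leaky_ReLU(w x)
        p = torch.softmax(self.fc2(h), dim=-1)       # assignment probabilities over buckets
        return p @ self.bucket_emb                   # weighted sum of bucket embeddings
```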
In step 105 of some embodiments, the discrete feature-value embedding vectors are input into a preset time series prediction model for time series prediction, yielding the target predicted time series data at preset prediction time steps.
A time series prediction model is a model built from time series data to study an object's own law of development and predict its future development accordingly. In a time series prediction model, time t is taken as the independent variable, and the trend of the value Y is studied. In reality, many problems, such as interest rate fluctuations, changes in rates of return, and various indices reflecting stock market conditions, can generally be expressed as time series data, and studying such data reveals the laws by which these economic variables change. (For some variables, too many factors influence their development, or the data on the main influencing factors is difficult to collect, so that a regression model can hardly be established to find their laws of change; here a time series analysis model shows its advantage, because it does not require a causal model and needs only the data of the variable itself.) This kind of modeling belongs to the research category of time series analysis.
Referring to FIG. 4, in an embodiment, step 105 specifically includes:
Step 301, screening the discrete feature-value embedding vectors at different time steps according to a preset time window length to obtain a selected discrete feature-value embedding vector sequence;
Step 302, extracting the required prediction time steps from preset candidate time steps;
Step 303, performing time series prediction on the selected discrete feature-value embedding vector sequence through the time series prediction model to obtain the target predicted time series data at the prediction time steps.
Steps 301 to 303 are described in detail below.
In step 301, the time window length refers to the size of the time window. In many tasks related to time series, a suitable time window must be determined so that the entire series can be split into multiple subsequences for learning in downstream tasks. In this embodiment, the discrete feature-value embedding vectors at different time steps are screened by the time window length to obtain the selected discrete feature-value embedding vector sequence. It will be appreciated that the time window length comprises at least one time step; correspondingly, the selected discrete feature-value embedding vector sequence comprises at least one discrete feature-value embedding vector within the time window length.
In steps 302 and 303, suppose the historical time steps include t-2, t-1, and t, and the candidate time steps include t+1, t+2, and t+3. The time step currently to be predicted, e.g., t+2, can be selected from the candidate time steps; after the time series prediction of step 303, the target predicted time series data at t+2 is obtained.
It will be appreciated that at least two time steps to be predicted may also be selected from the candidate time steps, e.g., t+1 and t+2; after the time series prediction of step 303, the target predicted time series data at t+1 and at t+2 are obtained. The embodiments of the present application can thus predict multiple time steps synchronously (a sketch follows).
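An illustrative sketch of the window screening of step 301 and the prediction-step selection of step 302; all names are assumptions:

```python
import torch

def select_window(embeddings: torch.Tensor, window_len: int) -> torch.Tensor:
    # embeddings: (T, n_features, emb_dim) discrete feature-value embedding
    # vectors at T historical time steps; keep the most recent window_len steps.
    return embeddings[-window_len:]

def select_prediction_steps(candidates: list[int], wanted: list[int]) -> list[int]:
    # Pick the required prediction time steps (e.g., t+1 and t+2) out of the
    # preset candidate steps (e.g., t+1, t+2, t+3); several may be chosen at once.
    return [step for step in candidates if step in wanted]
```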
The time series prediction model of the embodiments of the present application may be a prediction model based on a generative adversarial network, mainly comprising a generator network G and a discriminator network D. The input historical time series data is passed through the generator network G to predict the target values for the next p_steps steps.
The generator G is constructed on a multi-head self-attention mechanism. The so-called self-attention mechanism directly computes an attention weight for each position of a sequence during encoding, then computes an implicit vector representation of the whole sequence as a weighted sum; the Transformer architecture is an Encoder-Decoder model built on such a self-attention mechanism. The generator performs prediction from the historical time series data (including the related-variable features and the target-variable features) and finds the mapping between the target-variable feature sequence Y and the related-variable feature sequence X. Generative adversarial networks are usually applied to image generation tasks in computer vision and are therefore mostly built from stacked convolutional layers, but this is unsuitable for modeling time series problems, mainly because the limited size of the convolution kernel cannot capture long-term dependency information well. Accordingly, the present application adopts a multi-head self-attention mechanism in the generator to mine the embedded vector features within the input time window length (i.e., the selected discrete feature-value embedding vector sequence above) and obtain predicted values Y' over a period of time (i.e., the target predicted time series data).
The generator consists of an encoder-decoder structure based on multi-head attention layers. The encoding part consists of 6 self-attention modules based on the multi-head attention mechanism, and the decoding part likewise consists of 6 such modules; after the decoded output passes through a fully connected layer, the final prediction result is computed and output through softmax.
To balance computational simplicity with model flexibility, the model combines linear operations with nonlinear activation transformations. The normalization layer normalizes the input vectors: in general, model inputs are normalized so that they follow a normal distribution with mean u and variance h, which speeds up model convergence. The fully connected layer maps the output vectors of the previous layer to vectors of a specified dimension.
Encoding structure: the processed input sequence XE (the selected discrete feature-value embedding vector sequence) is fed to the encoding part. In a self-attention module, the input first passes through a multi-head self-attention layer and then undergoes a residual computation, in which the input vector is added to the output vector, changing the output to f(x) + x. Layer normalization is then performed: each sample output by the layer is normalized to mean 0 and variance 1, reducing data deviation and avoiding vanishing or exploding gradients during training. The layer-normalized feature vector is fed into a fully connected neural network, passed through a ReLU activation function, and then through another fully connected network, completing the computation of one self-attention module. The output feature vector is passed to the next self-attention module, and so on through the 6th module, which outputs the final feature vector of the encoding part.
Decoding structure: the processed output sequence Y serves as the input of the decoding part. To ease processing and avoid introducing future information during training, a masking mechanism is adopted: the scores at future positions are set to negative infinity.
Thus, after the subsequent softmax activation computation, these values become 0, i.e., the influence of future events on the current event becomes 0, so that each event is influenced only by historical events. The input vector of the decoding part first passes through a multi-head self-attention layer followed by residual connection and layer normalization, then through another multi-head self-attention layer followed by residual connection and layer normalization, yielding an output H; finally, a fully connected layer produces the generated sequence, i.e., the prediction result Y'.
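A minimal sketch of such a mask as commonly implemented for self-attention (the patent gives no code; this is one standard realization):

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    # Scores at future positions are set to negative infinity so that, after
    # softmax, their attention weights become 0 and each step sees only the past.
    mask = torch.full((seq_len, seq_len), float("-inf"))
    return torch.triu(mask, diagonal=1)  # upper triangle (future) = -inf, rest = 0

# Usage: attention_scores = attention_scores + causal_mask(seq_len)
```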
Referring to FIG. 5, in an embodiment, the time series prediction model includes a generator, the generator includes an encoding network, a decoding network, a fully connected network, and an activation function network, and step 303 specifically includes:
Step 401, performing first attention processing on the selected discrete feature-value embedding vector sequence through the encoding network to obtain an encoded time series feature vector;
Step 402, performing second attention processing on the encoded time series feature vector through the decoding network to obtain a decoded time series feature vector; the decoding network uses the masking mechanism described above, setting the scores at future positions to negative infinity;
Step 403, performing feature mapping on the decoded time series feature vector through the fully connected network to obtain a candidate predicted time series feature vector;
Step 404, activating the candidate predicted time series feature vector through the activation function network to obtain the target predicted time series data at the prediction time steps.
In an embodiment, the encoding network includes a multi-head self-attention layer, a residual connection layer, a normalization layer, a first fully connected layer, an activation function layer, and a second fully connected layer, and step 401 specifically includes: performing attention calculation on the selected discrete feature-value embedding vectors through the multi-head self-attention layer to obtain a first time series feature vector; summing the selected discrete feature-value embedding vectors and the first time series feature vector through the residual connection layer to obtain a second time series feature vector; normalizing the second time series feature vector through the normalization layer to obtain a third time series feature vector; performing feature mapping on the third time series feature vector through the first fully connected layer to obtain a fourth time series feature vector; activating the fourth time series feature vector through the activation function layer to obtain a fifth time series feature vector; and performing feature mapping on the fifth time series feature vector through the second fully connected layer to obtain the encoded time series feature vector.
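A hedged PyTorch sketch of one such encoder layer; the dimension defaults and the name EncoderLayer are assumptions:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.fc1 = nn.Linear(d_model, d_ff)   # first fully connected layer
        self.fc2 = nn.Linear(d_ff, d_model)   # second fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_len, d_model) selected embedding vector sequence
        v1, _ = self.attn(x, x, x)   # multi-head self-attention -> first vector
        v2 = x + v1                  # residual connection       -> second vector
        v3 = self.norm(v2)           # layer normalization       -> third vector
        v4 = self.fc1(v3)            # feature mapping           -> fourth vector
        v5 = torch.relu(v4)          # ReLU activation           -> fifth vector
        return self.fc2(v5)          # feature mapping -> encoded feature vector
```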
In an embodiment, the decoding network includes a multi-head self-attention layer, a residual connection layer, a normalization layer, and a target fully connected layer, and step 402 specifically includes: performing attention calculation on the encoded time series feature vector through the multi-head self-attention layer to obtain a first decoding vector; summing the encoded time series feature vector and the first decoding vector through the residual connection layer to obtain a second decoding vector; normalizing the second decoding vector through the normalization layer to obtain a third decoding vector; and performing feature mapping on the third decoding vector through the target fully connected layer to obtain the decoded time series feature vector.
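The corresponding decoder layer, sketched under the same assumptions:

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        self.fc = nn.Linear(d_model, d_model)  # target fully connected layer

    def forward(self, enc: torch.Tensor) -> torch.Tensor:
        d1, _ = self.attn(enc, enc, enc)  # attention on the encoded feature vector
        d2 = enc + d1                     # residual connection
        d3 = self.norm(d2)                # layer normalization
        return self.fc(d3)                # decoded time series feature vector
```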
The fully connected network in step 403 is composed of fully connected layers, and the activation function network in step 404 is composed of activation functions.
As shown in fig. 6, in one embodiment, the time series prediction model is trained in advance by the following process:
step 501, obtaining historical sample time sequence data of a historical sample time step; the historical sample timing data comprises historical sample timing characteristics, and each historical sample timing characteristic comprises a historical sample related variable characteristic and a historical sample target variable characteristic;
step 502, inputting the historical sample related variable characteristics and the historical sample target variable characteristics into a preset generator to generate time sequence data, and obtaining predicted target variable characteristics at a preset predicted sample time step length;
step 503, splicing the predicted target variable characteristic and the historical sample target variable characteristic to obtain a first characteristic to be identified;
step 504, obtaining a reference target variable characteristic at a time step of a prediction sample, and splicing the reference target variable characteristic and a target variable characteristic of a historical sample to obtain a second characteristic to be identified;
step 505, inputting the first feature to be identified into a preset identifier for identification to obtain a first identification result, and inputting the second feature to be identified into the identifier for identification to obtain a second identification result;
step 506, constructing a discrimination loss function according to the first discrimination result and the second discrimination result to obtain discrimination loss data; performing mean square error calculation according to the predicted target variable characteristics and the reference target variable characteristics to obtain first generation loss data; constructing a generating loss function according to the first identification result to obtain second generating loss data;
step 507, performing parameter adjustment on the discriminator according to the discrimination loss data, performing parameter adjustment on the generator according to the first generation loss data and the second generation loss data, and obtaining a time sequence prediction model according to the adjusted generator.
Steps 501-507 are described in detail below.
The historical sample time sequence data in step 501 is taken at historical sample time steps, which may be, for example, t-2, t-1, and so on. The historical sample time sequence data may include a plurality of historical sample time sequence characteristics, each of which includes a historical sample related variable characteristic and a historical sample target variable characteristic.
It should be noted that time sequence data generally includes related variable data X and target variable data Y, where the target variable data Y changes with the time step and with the related variable data X. In the embodiment of the present application, the related variable data X is expressed as the historical sample related variable characteristic, and the target variable data Y is expressed as the historical sample target variable characteristic.
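As a concrete illustration of this data layout, the following sketch assembles (X, Y) training windows; the function name and the window lengths hist_len and pred_len are assumptions of the example only.

```python
import numpy as np

def make_windows(X, Y, hist_len=24, pred_len=6):
    """Slice related-variable data X and target data Y into
    (history, future-target) training pairs. Lengths are assumed."""
    pairs = []
    for t in range(hist_len, len(Y) - pred_len + 1):
        x_hist = X[t - hist_len:t]      # historical sample related variables
        y_hist = Y[t - hist_len:t]      # historical sample target variable
        y_ref = Y[t:t + pred_len]       # reference target at the prediction steps
        pairs.append((x_hist, y_hist, y_ref))
    return pairs
```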
The generator in step 502 has a structure similar to that of the time sequence prediction model mentioned in the above embodiments, but with different parameters; the time sequence prediction model can be obtained by adjusting the parameters of the generator. The generator performs time sequence data generation on the historical sample related variable characteristics and the historical sample target variable characteristics, with the aim of generating the predicted target variable characteristics at the preset prediction sample time step.
In steps 503 and 504 of the embodiments of the present application, after the predicted target variable characteristics are obtained, the predicted target variable characteristics and the reference target variable characteristics may be input together to the discriminator, so that the discriminator gives a result characterizing whether the predicted target variable characteristics are true or false. In practical application, however, the stability of such an identification result is poor and its accuracy cannot be guaranteed. Therefore, in order to ensure that the predicted trend is correct and the predicted values are more accurate, the predicted target variable characteristic Y_pre is spliced with the historical sample target variable characteristic Y_history to obtain the first feature to be identified Y_{history,pre}, and the reference target variable characteristic Y_real is spliced with the historical sample target variable characteristic Y_history to obtain the second feature to be identified Y_{history,real}.
In step 505, the first feature to be identified Y_{history,pre} and the second feature to be identified Y_{history,real} are input to the discriminator D, which judges whether the predicted target variable characteristic Y_pre is real or generated by the generator.
The discriminator is intended to constitute a differentiable function D for classifying input data: an ideal discriminator outputs 0 when fake data is input and 1 when real data is input. Time sequence problems are conventionally modeled with a Recurrent Neural Network (RNN), because the natural recurrent autoregressive structure of an RNN is a good representation of a time sequence. An RNN processes samples in order: after the sample at time t-1 is processed, a hidden state h_{t-1} is obtained, and h_{t-1} together with the sample at time t is input into the RNN to obtain the hidden state h_t. Both RNN and LSTM can only predict the output at the next time from timing information at previous times; in some cases, however, the output at the current time is related not only to previous states but possibly also to future states. A BiLSTM is formed by stacking two LSTMs running in opposite directions, and the output is determined jointly by the states of the two LSTMs.
In one embodiment, the discriminator comprises a forward neural network, an inverse neural network, and a classifier, and step 505 comprises:
performing feature processing on the first feature to be identified according to a preset time sequence through a forward neural network to obtain a first forward output vector;
performing feature processing on the first feature to be identified according to a time sequence opposite to a preset time sequence through a reverse neural network to obtain a first reverse output vector;
and classifying the first forward output vector and the first backward output vector through a classifier to obtain a first identification result.
In one embodiment, step 505 further comprises:
performing feature processing on the second feature to be identified according to a preset time sequence through a forward neural network to obtain a second forward output vector;
performing feature processing on the second feature to be identified according to a time sequence opposite to the preset time sequence through a reverse neural network to obtain a second reverse output vector;
and classifying the second forward output vector and the second backward output vector through a classifier to obtain a second identification result.
In the above steps, a BiLSTM-based discriminator may be constructed. The BiLSTM includes a forward node link (also called the forward neural network) and a reverse node link (the reverse neural network). The forward node link comprises a plurality of forward BiLSTM nodes h1, h2, …, hn cascaded in sequence, and the reverse node link comprises a plurality of reverse BiLSTM nodes hn', …, h2', h1' cascaded in sequence. Each input vector X1, X2, …, Xn is input to both a forward BiLSTM node and a reverse BiLSTM node, and the output result generated by the forward BiLSTM node together with the output result generated by the reverse BiLSTM node forms the output vector corresponding to that input vector. Finally, the output vectors are classified by the classifier to obtain the first identification result and the second identification result, respectively.
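The following is a minimal PyTorch sketch of such a BiLSTM-based discriminator, in which the built-in bidirectional LSTM provides the forward and reverse node links and a sigmoid classifier produces the identification result; the class name and dimensions are illustrative assumptions.

```python
import torch.nn as nn

class BiLSTMDiscriminator(nn.Module):
    """Sketch of the discriminator: forward and reverse LSTM passes
    followed by a binary classifier (1 = real, 0 = generated)."""
    def __init__(self, d_in=1, d_hidden=32):
        super().__init__()
        self.bilstm = nn.LSTM(d_in, d_hidden, batch_first=True,
                              bidirectional=True)      # forward + reverse links
        self.classifier = nn.Sequential(
            nn.Linear(2 * d_hidden, 1),                # joint fwd/rev output
            nn.Sigmoid())

    def forward(self, y_seq):                          # y_seq: (batch, time, d_in)
        out, _ = self.bilstm(y_seq)                    # (batch, time, 2*d_hidden)
        return self.classifier(out[:, -1])             # score from the last step
```

In use, both the first feature to be identified (history + predicted) and the second feature to be identified (history + reference) would be passed through this same module to produce the first and second identification results.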
In step 506, a discrimination loss function is constructed from the first identification result D(Y_{history,pre}) and the second identification result D(Y_{history,real}) to obtain the discrimination loss data D_loss; a mean square error is calculated from the predicted target variable characteristic Y_pre and the reference target variable characteristic Y_real to obtain the first generation loss data g_MSE; and a generation loss function is constructed from the first identification result to obtain the second generation loss data g_loss. Specifically:

    D_loss = -(1/m) Σ_{i=1}^{m} [ log D(Y_{history,real}^(i)) + log(1 - D(Y_{history,pre}^(i))) ]

    g_MSE = (1/m) Σ_{i=1}^{m} ( Y_pre^(i) - Y_real^(i) )²

    g_loss = -(1/m) Σ_{i=1}^{m} log D(Y_{history,pre}^(i))

    G_loss = β_1 · g_MSE + β_2 · g_loss

where m is the sample size set in the batch and β_1, β_2 are settable weights.
In step 507, based on the loss data obtained above, the parameters of the discriminator are adjusted according to the discrimination loss data D_loss, and the parameters of the generator are adjusted according to the first generation loss data g_MSE and the second generation loss data g_loss. The time sequence prediction model is then obtained from the adjusted generator.
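The following sketch condenses steps 502-507 into a single adversarial training step using the loss functions above; the generator call signature G(x_hist, y_hist), the 1e-8 numerical guard, and the default weights beta1 and beta2 are assumptions of the example, not details fixed by this embodiment.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, x_hist, y_hist, y_real,
               beta1=1.0, beta2=0.1):
    """One adversarial update implementing D_loss, g_MSE and g_loss."""
    # --- discriminator update (step 505/506/507) ---
    y_pre = G(x_hist, y_hist).detach()          # predicted target, no G gradient
    fake = torch.cat([y_hist, y_pre], dim=1)    # first feature to be identified
    real = torch.cat([y_hist, y_real], dim=1)   # second feature to be identified
    d_loss = -(torch.log(D(real) + 1e-8).mean()
               + torch.log(1 - D(fake) + 1e-8).mean())
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # --- generator update (step 506/507) ---
    y_pre = G(x_hist, y_hist)
    fake = torch.cat([y_hist, y_pre], dim=1)
    g_mse = F.mse_loss(y_pre, y_real)           # first generation loss data
    g_adv = -torch.log(D(fake) + 1e-8).mean()   # second generation loss data
    g_total = beta1 * g_mse + beta2 * g_adv     # G_loss
    opt_G.zero_grad()
    g_total.backward()
    opt_G.step()
    return d_loss.item(), g_total.item()
```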
Summarizing the embodiments of the present application: to address the problems that traditional time sequence prediction models neglect the embedding of continuous features (i.e., feature extraction), cannot reflect the fluctuation in real data, and struggle to predict long time sequences, the method processes the acquired data to obtain continuous features and discretizes them, which introduces nonlinearity into the model and improves its expressive capacity. Meanwhile, a generative adversarial method is adopted to capture the distribution of the real data, yielding higher-precision multi-step predictions and trends.
Therefore, the method of this embodiment can be used for multi-step prediction in an industrial system, providing a basis for adjusting the control strategy so that the production system stays in an ideal state for as long as possible and the production efficiency of the system is improved. It can further raise the level of automation and intelligence of monitoring and control systems, and has good application prospects.
Referring to fig. 7, an embodiment of the present application further provides a time series data prediction apparatus capable of implementing the above time series data prediction method; fig. 7 is a block diagram of the module structure of this apparatus, which includes: a time sequence data acquisition module 601, a distribution module 602, a feature discretization module 603, a feature embedding representation module 604, and a time sequence prediction module 605. The time sequence data acquisition module 601 is configured to acquire historical time sequence data at a historical time step, the historical time sequence data comprising a time sequence characteristic. The distribution module 602 is configured to input the historical time sequence data into a preset sub-bucket neural network for distribution prediction to obtain a distribution probability sequence, the distribution probability sequence comprising the distribution probability of each time sequence characteristic in each preset sub-bucket; the preset sub-buckets are used for discretizing the time sequence characteristics. The feature discretization module 603 is configured to input the historical time sequence data into the preset sub-buckets for feature discretization to obtain bucket embedding vectors. The feature embedding representation module 604 is configured to perform weighted summation according to the distribution probabilities and the bucket embedding vectors to obtain a discrete eigenvalue embedding vector of the historical time sequence data. The time sequence prediction module 605 is configured to input the discrete eigenvalue embedding vector into a preset time sequence prediction model for time sequence prediction to obtain target prediction time sequence data at a preset prediction time step.
The time series data prediction device of the embodiment of the application is used for executing the time series data prediction method in the embodiment, and the specific processing procedure is the same as that of the time series data prediction method in the embodiment, and is not repeated here.
In the apparatus of the embodiment of the present application, when the time sequence features are discretized, they are first allocated to different preset sub-buckets; the bucket embeddings obtained by discretizing the time sequence features in the preset sub-buckets are then weighted and summed according to the distribution probabilities, completing a continuous representation of the time sequence features and yielding the discrete eigenvalue embedding vector. Finally, time sequence prediction is performed on the discrete eigenvalue embedding vector through the time sequence prediction model to obtain the target prediction time sequence data, which improves the prediction accuracy.
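As a rough sketch of the allocation, discretization, and weighted-summation pipeline carried out by modules 602-604, the following module predicts a distribution over preset sub-buckets and returns the weighted sum of the bucket embeddings; the bucket count, embedding size, and use of softmax as the activation sub-network are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SoftBucketEmbedding(nn.Module):
    """Sketch of soft bucketing: a small network predicts a distribution
    over preset buckets, each bucket owns an embedding vector, and the
    weighted sum gives the discrete eigenvalue embedding vector."""
    def __init__(self, n_buckets=16, d_embed=32):
        super().__init__()
        self.fc = nn.Linear(1, n_buckets)                 # fully-connected sub-network
        self.buckets = nn.Embedding(n_buckets, d_embed)   # preset bucket embeddings

    def forward(self, feat):                              # feat: (batch, time, 1)
        probs = torch.softmax(self.fc(feat), dim=-1)      # distribution probabilities
        return probs @ self.buckets.weight                # weighted sum of embeddings
```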
An embodiment of the present application further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the time series data prediction method according to any one of the embodiments of the present application.
The hardware structure of the computer apparatus will be described in detail with reference to fig. 8. The computer device includes: a processor 701, a memory 702, an input/output interface 703, a communication interface 704, and a bus 705.
The processor 701 may be implemented by a general-purpose CPU (central processing unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is configured to execute a relevant program to implement the technical solution provided in the embodiments of the present application;
the memory 702 may be implemented in the form of a ROM (read-only memory), a static storage device, a dynamic storage device, or a RAM (random access memory). The memory 702 may store an operating system and other application programs; when the technical solution provided in the embodiments of the present application is implemented in software or firmware, the relevant program code is stored in the memory 702 and called by the processor 701 to execute the time series data prediction method of the embodiments of the present application;
an input/output interface 703 for realizing information input and output;
the communication interface 704 is used for realizing communication interaction between this device and other devices, where communication may be realized in a wired manner (for example, USB or network cable) or in a wireless manner (for example, mobile network, WiFi, or Bluetooth); and the bus 705 transfers information between the various components of the device (such as the processor 701, the memory 702, the input/output interface 703, and the communication interface 704);
wherein the processor 701, the memory 702, the input/output interface 703 and the communication interface 704 are communicatively connected to each other within the device via a bus 705.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a computer, the computer performs the time series data prediction method according to any one of the embodiments of the present application.
The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiments described in the embodiments of the present application are for more clearly illustrating the technical solutions of the embodiments of the present application, and do not constitute a limitation to the technical solutions provided in the embodiments of the present application, and it is obvious to those skilled in the art that the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems with the evolution of technology and the emergence of new application scenarios.
It will be understood by those skilled in the art that the embodiments shown in fig. 2 to 6 do not constitute a limitation of the embodiments of the present application, which may include more or fewer steps than those shown, combine some steps, or use different steps.
The above-described embodiments of the apparatus are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
It should be understood that terms such as "first" and "second" are used to distinguish between similar objects and not necessarily to describe a particular order, and that the data so used may be interchanged under appropriate circumstances such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" is used for describing the association relationship of the associated object, there may be three relationships, for example, "A and/or B" may mean: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes multiple instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing programs, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, and the scope of the claims of the embodiments of the present application is not limited thereto. Any modifications, equivalents, and improvements that may occur to those skilled in the art without departing from the scope and spirit of the embodiments of the present application are intended to be within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A method for predicting time series data, the method comprising:
acquiring historical time sequence data at a historical time step; wherein the historical timing data comprises: a timing characteristic;
inputting the historical time sequence data into a preset sub-bucket neural network for distribution prediction to obtain a distribution probability sequence, wherein the distribution probability sequence comprises the distribution probability of the time sequence characteristic in each preset sub-bucket; the preset sub-buckets are used for discretizing the time sequence characteristics;
inputting the historical time sequence data into the preset sub-buckets to carry out feature discretization to obtain bucket embedding vectors;
carrying out weighted summation according to the distribution probability and the bucket embedding vector to obtain a discrete eigenvalue embedding vector of the historical time sequence data;
and inputting the discrete characteristic value embedded vector to a preset time sequence prediction model for time sequence prediction to obtain target prediction time sequence data in a preset prediction time step.
2. The method of claim 1, wherein the sub-bucket neural network comprises a fully-connected sub-network and an activation function sub-network, and the inputting the historical time sequence data into a preset sub-bucket neural network for distribution prediction to obtain a distribution probability sequence comprises:
inputting the time sequence characteristics into the fully-connected sub-network for vector conversion to obtain a time sequence vector;
inputting the time sequence vector into the activation function sub-network to perform sub-bucket prediction to obtain the distribution probability of the time sequence feature in each preset sub-bucket;
and combining the distribution probability of the time sequence characteristics in each preset sub-bucket to obtain the distribution probability sequence.
3. The method of claim 1, wherein the step of inputting the discrete eigenvalue embedded vector to a preset time sequence prediction model for time sequence prediction to obtain target prediction time sequence data at a preset prediction time step comprises:
screening the discrete eigenvalue embedded vectors in different time step lengths according to a preset time window length to obtain a selected discrete eigenvalue embedded vector sequence;
extracting the required predicted time step according to a preset candidate time step;
and performing time sequence prediction on the selected discrete characteristic value embedded vector sequence through the time sequence prediction model to obtain target prediction time sequence data in the prediction time step.
4. The method of claim 3, wherein the time-series prediction model comprises a generator comprising an encoding network, a decoding network, a fully-connected network, and an activation function network, and wherein the time-series prediction of the selected discrete eigenvalue embedded vector sequence by the time-series prediction model to obtain target prediction time-series data at the prediction time step comprises:
performing first attention processing on the selected discrete eigenvalue embedded vector sequence through the coding network to obtain a coding time sequence eigenvector;
performing second attention processing on the coding time sequence characteristic vector through the decoding network to obtain a decoding time sequence characteristic vector;
performing feature mapping on the decoding time sequence feature vector through the full-connection network to obtain a candidate prediction time sequence feature vector;
and performing activation processing on the candidate prediction time sequence characteristic vector through the activation function network to obtain target prediction time sequence data in a prediction time step.
5. The method of claim 4, wherein the coding network comprises a multi-headed self-attention layer, a residual connection layer, a normalization layer, a first full-connection layer, an activation function layer, and a second full-connection layer, and wherein the first attention processing is performed on the selected discrete eigenvalue embedded vector sequence through the coding network to obtain a coding temporal eigenvector comprises:
performing attention calculation on the selected discrete eigenvalue embedding vector through the multi-head self-attention layer to obtain a first time sequence eigenvector;
summing the selected discrete eigenvalue embedded vector and the first time sequence eigenvector through the residual error connection layer to obtain a second time sequence eigenvector;
normalizing the second time sequence feature vector through the normalization layer to obtain a third time sequence feature vector;
performing feature mapping on the third time sequence feature vector through the first full connection layer to obtain a fourth time sequence feature vector;
activating the fourth time sequence feature vector through the activation function layer to obtain a fifth time sequence feature vector;
and performing feature mapping on the fifth time sequence feature vector through the second full connection layer to obtain the coding time sequence feature vector.
6. The method according to any one of claims 1 to 5, wherein before inputting the discrete eigenvalue embedded vector to a preset time sequence prediction model for time sequence prediction, obtaining target prediction time sequence data at a preset prediction time step, the method further comprises:
training the time sequence prediction model specifically comprises:
acquiring historical sample time sequence data at a historical sample time step; the historical sample timing data comprises historical sample timing characteristics, and each historical sample timing characteristic comprises a historical sample related variable characteristic and a historical sample target variable characteristic;
inputting the historical sample related variable characteristics and the historical sample target variable characteristics into a preset generator to generate time sequence data, and obtaining predicted target variable characteristics at a preset predicted sample time step;
splicing the predicted target variable characteristics and the historical sample target variable characteristics to obtain first characteristics to be identified;
acquiring reference target variable characteristics at the time step of the prediction sample, and splicing the reference target variable characteristics and the historical sample target variable characteristics to obtain second characteristics to be identified;
inputting the first feature to be identified into a preset identifier for identification to obtain a first identification result, and inputting the second feature to be identified into the identifier for identification to obtain a second identification result;
constructing a discrimination loss function according to the first discrimination result and the second discrimination result to obtain discrimination loss data;
calculating the mean square error according to the predicted target variable characteristics and the reference target variable characteristics to obtain first generation loss data;
constructing a generating loss function according to the first identification result to obtain second generating loss data;
performing parameter adjustment on the discriminator according to the discrimination loss data, and performing parameter adjustment on the generator according to the first generation loss data and the second generation loss data;
and obtaining the time sequence prediction model according to the adjusted generator.
7. The method according to claim 6, wherein the discriminator comprises a forward neural network, a reverse neural network, and a classifier, and the inputting the first feature to be identified into a preset discriminator for identification to obtain a first identification result comprises:
performing feature processing on the first feature to be identified according to a preset time sequence through the forward neural network to obtain a forward output vector;
performing feature processing on the first feature to be identified according to a time sequence opposite to a preset time sequence through the reverse neural network to obtain a reverse output vector;
and classifying the forward output vector and the reverse output vector through the classifier to obtain the first identification result.
8. An apparatus for predicting time series data, the apparatus comprising:
the time sequence data acquisition module is used for acquiring historical time sequence data in historical time step length; wherein the historical timing data comprises: a timing characteristic;
the distribution module is used for inputting the historical time sequence data into a preset sub-bucket neural network for distribution prediction to obtain a distribution probability sequence, the distribution probability sequence comprising the distribution probability of each time sequence characteristic in each preset sub-bucket; the preset sub-buckets are used for discretizing the time sequence characteristics;
the characteristic discretization module is used for inputting the historical time sequence data into the preset sub-buckets to carry out characteristic discretization so as to obtain bucket embedding vectors;
the characteristic embedding representation module is used for carrying out weighted summation according to the distribution probability and the barrel embedding vector to obtain a discrete characteristic value embedding vector of the historical time sequence data;
and the time sequence prediction module is used for inputting the discrete eigenvalue embedded vector to a preset time sequence prediction model for time sequence prediction to obtain target prediction time sequence data in a preset prediction time step length.
9. A computer device, characterized in that the computer device comprises a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor performs:
the method of any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a computer, causes the computer to perform:
the method of any one of claims 1 to 7.