CN114519610A - Information prediction method and device

Info

Publication number
CN114519610A
CN114519610A (application CN202210143587.6A)
Authority
CN
China
Prior art keywords
time sequence
information
time
prediction
vector
Prior art date
Legal status
Pending
Application number
CN202210143587.6A
Other languages
Chinese (zh)
Inventor
赵叶宇
范芳芳
方彦明
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210143587.6A
Publication of CN114519610A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06Q — INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 — Commerce
    • G06Q30/02 — Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 — Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 — Market predictions or forecasting for commercial activities


Abstract

Embodiments of this specification provide an information prediction method and apparatus. The information prediction method comprises: first acquiring historical time sequence information of a first historical time period, the information comprising a plurality of time series; then inputting the historical time sequence information into a pre-trained time sequence model composed of a plurality of serially connected time sequence sub-models, with a backcast mechanism added between the sub-models, and obtaining the predicted time sequence vector of a target time period output by each sub-model; and finally aggregating the predicted time sequence vectors output by the sub-models to obtain an aggregation result and determining the prediction information of the target time period based on that result. Because the time sequence model contains a plurality of sub-models, different scenarios can be handled without building multiple models, which saves cost.

Description

Information prediction method and device
Technical Field
The embodiment of the specification relates to the technical field of artificial intelligence, in particular to an information prediction method.
Background
With the development of Internet technology, accurately predicting future information in industries such as offline retail, online e-commerce, and financial credit can better guide enterprises, institutions, and individuals in making correct decisions about current management and operations, thereby yielding greater profits and benefits.
Currently, future information is predicted mainly with neural network prediction models. However, when the information of a future period is predicted from the historical information of a past period, there is generally only a single piece of historical data, so the scenarios that can be handled are limited. For example, when the monthly sales of a storefront in 2022 are predicted from its monthly sales in 2021, only one kind of data is available; a single kind of historical data can therefore only predict a single piece of information for a future period, and the prediction model is greatly limited. An information prediction scheme for complex scenarios is therefore needed.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide an information prediction method. One or more embodiments of the present disclosure also relate to an information prediction apparatus, a computing device, a computer-readable storage medium, and a computer program to solve the technical problems of the prior art.
According to a first aspect of embodiments herein, there is provided an information prediction method, including:
acquiring historical time sequence information of a first historical time period, wherein the historical time sequence information comprises a plurality of time series;
inputting the historical time sequence information into a pre-trained time sequence model, and obtaining, through a plurality of serially connected time sequence sub-models in the time sequence model, a predicted time sequence vector of a target time period output by each sub-model, wherein except that the input of the first sub-model is the historical time sequence information, the input of every other sub-model is the backcast time sequence vector of the first historical time period output by the previous sub-model;
aggregating the predicted time sequence vectors output by each sub-model to obtain an aggregation result; and
determining prediction information of the target time period based on the aggregation result.
According to a second aspect of embodiments herein, there is provided an information prediction apparatus comprising:
a time sequence module configured to acquire historical time sequence information of a first historical time period, wherein the historical time sequence information comprises a plurality of time series, and to input the historical time sequence information into a pre-trained time sequence model and obtain, through a plurality of serially connected time sequence sub-models in the model, a predicted time sequence vector of a target time period output by each sub-model, wherein except that the input of the first sub-model is the historical time sequence information, the input of every other sub-model is the backcast time sequence vector of the first historical time period output by the previous sub-model; and
a prediction module configured to aggregate the predicted time sequence vectors of the target time period output by the sub-models to obtain an aggregation result, and to determine prediction information of the target time period based on the aggregation result.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is for storing computer-executable instructions and the processor is for executing the computer-executable instructions, which when executed by the processor, implement the steps of the information prediction method described above.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the information prediction method described above.
According to a fifth aspect of embodiments herein, there is provided a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the information prediction method described above.
In one or more embodiments of this specification, the acquired historical time sequence information of the first historical time period is input into a pre-trained time sequence model, and the predicted time sequence vector of the target time period output by each time sequence sub-model is obtained through a plurality of serially connected sub-models in the model, where the input of every sub-model other than the first is the backcast time sequence vector of the first historical time period output by the previous sub-model; the predicted time sequence vectors output by the sub-models are then aggregated, and the prediction information of the target time period can be determined based on the aggregation result. Because the time sequence model is a pre-trained multi-layer sub-model structure, it can backcast the time sequence vector of the first historical time period multiple times and predict the time sequence vector of the target time period from each backcast vector. A plurality of time series can therefore be input simultaneously, and information prediction with multiple input time series is realized with a single time sequence model: no separate model needs to be built for each time series, which improves prediction efficiency and suits complex scenarios. Moreover, the time series participate in the computation of each sub-model simultaneously, meaning that they share model parameters, so the model can capture similarity relations among the time series and produce more accurate prediction results.
Drawings
FIG. 1 is a flow chart of a method for information prediction provided in one embodiment of the present description;
FIG. 2 is a flowchart of the processing procedure by which each time sequence sub-model obtains the predicted time sequence vector of the target time period in an information prediction method according to an embodiment of the present specification;
FIG. 3 is a flowchart illustrating training of the time sequence model in an information prediction method according to an embodiment of the present disclosure;
FIG. 4 is a flow diagram of another information prediction method provided by one embodiment of the present description;
FIG. 5 is a flow chart of yet another information prediction method provided by an embodiment of the present description;
FIG. 6 is a diagram of an information prediction architecture for a pluggable multivariate time series provided in one embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an information prediction apparatus according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of the present description, a "first" can also be referred to as a "second" and, similarly, a "second" can also be referred to as a "first". The word "if" as used herein may, depending on the context, be interpreted as "when", "while", or "in response to determining".
First, the noun terms referred to in one or more embodiments of the present specification are explained.
Time sequence prediction: predictive analysis of the future trend of data changes based on the statistical data of a historical time period, arranged in time order.
Unary time series: a sequence of statistics of a single item of data over a historical time period, such as statistics of a region's power usage over the past year.
Multiple time series: a set comprising several unary time series whose sampling time points are identical, for example the sales of several different merchants over the same historical time period.
Univariate time series: data containing only one variable over a historical time period, such as the sales volume of a store.
Multivariate time series: data in which two or more variables are counted in a historical time period; for example, a three-variable time series may consist of a store's sales on each day, the number of people visiting the store that day, and the historical average sales.
Pluggable: the abstraction of general-purpose modules across different information prediction project scenarios, so that the modules can be flexibly combined to build solutions that meet the requirements of user projects.
Prediction of future information has long been an important research direction in industry. In the retail industry, predicting future sales volume and revenue helps merchants manage inventory better; in the e-commerce industry, predicting future gross merchandise volume (GMV) helps stores prepare stock and funds and plan marketing activities; in the financial credit industry, predicting future balance scales helps institutions prepare funds and guard against liquidity risks. Time sequence prediction thus plays a crucial role in the development of many industries.
The data that can be observed and acquired differs greatly across project scenarios: from the simplest univariate unary time series, such as the sales volume of a single store; to multivariate unary time series, such as a single store's historical visitor counts, sales volume, and average sales; and further to multivariate multiple time series, such as several kinds of historical statistics for several stores. As the observable data grows richer, more complex model structures are required to process and use the information. Faced with complex and highly differentiated scenarios, it is very important to abstract and recombine general-purpose modules so as to cover the full range of requirements.
In the single time series problem, only one historical time series is used, for example predicting future daily sales from sales information over a past period. Common methods for this problem include autoregressive integrated moving average (ARIMA), exponential smoothing, and other modeling approaches. Their drawback is that only a single time series can be handled: when several time series must be predicted at the same time, such as the sales volumes of several stores, these methods fall short and each store must be modeled separately, which on the one hand greatly increases modeling time and storage cost, and on the other hand prevents the model from capturing the similarity between the time series of different stores.
In order to address the above-mentioned problems, in an embodiment of the present specification, an information prediction method is provided, and the present specification relates to an information prediction apparatus, a computing device, a computer-readable storage medium, and a computer program, which will be described in detail one by one in the following embodiments.
Referring to fig. 1, fig. 1 shows a flowchart of an information prediction method provided according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 102: historical timing information of a first historical time period is acquired, wherein the historical timing information comprises a plurality of time sequences.
The first historical time period is the historical time interval over which the statistical data is acquired; it may be a past year, quarter, month, day, hour, and so on, which is not limited here.
The historical time sequence information of the first historical time period is the project data information that can be acquired in a project scenario, presented in time order within that period, with the sequence given in units of the period's unit time, for example each month of a year. The historical time sequence information comprises a plurality of time series, namely the time series of the project data acquired in the project scenario during the first historical time period; each time series may be a univariate unary time series or a multivariate unary time series, which is not limited here. For example, the plurality of time series may include, for each of several stores, the monthly visitor count, sales volume, turnover, and so on, over the past year.
By acquiring historical time sequence information containing a plurality of time series, the model can process a large amount of data at once instead of processing each time series in turn, which improves prediction efficiency and provides a data basis for subsequent prediction.
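As a concrete illustration of this step, the following minimal sketch (an assumption of this description, not taken from the original text) stacks several unary time series with identical sampling points into a single batch tensor so that one model pass can process them together; all data values and names are hypothetical:

import torch

# Hypothetical monthly series for three stores over the past year (12 steps).
# All series share the same sampling points, so they can be stacked into a
# single batch of shape (num_series, num_timesteps) and processed together.
store_a = [550., 640., 1060., 970., 820., 760., 690., 880., 930., 1010., 1200., 1150.]
store_b = [320., 410., 530., 490., 450., 470., 520., 600., 640., 700., 810., 790.]
store_c = [90., 120., 150., 140., 130., 160., 170., 210., 220., 260., 300., 280.]

history = torch.tensor([store_a, store_b, store_c])  # one batch, shape (3, 12)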
Step 104: inputting the historical time sequence information into a pre-trained time sequence model, and obtaining the predicted time sequence vector of the target time period output by each time sequence sub-model through a plurality of serially connected time sequence sub-models in the model, wherein except that the input of the first sub-model is the historical time sequence information, the input of every other sub-model is the backcast time sequence vector of the first historical time period output by the previous sub-model.
The time sequence model is a pre-trained time sequence prediction model that obtains predicted time sequence vectors by encoding historical time sequence information. In the embodiments of this specification, the time sequence model is trained with machine learning techniques; unlike traditional modeling methods such as ARIMA and exponential smoothing, it adopts a deep neural network structure of a plurality of serially connected time sequence sub-models.
The sub-models are connected in series inside the time sequence model, and each sub-model obtains a predicted time sequence vector of the target time period and a backcast time sequence vector of the first historical time period by encoding its input. In the embodiments of this specification, the acquired historical time sequence information of the first historical time period is input into the time sequence model, that is, the plurality of time series of the first time period are input together as one batch of data. The first sub-model takes the historical time sequence information as input and outputs a backcast time sequence vector of the first historical time period and a predicted time sequence vector of the target time period; it then passes the backcast vector to the second sub-model, which takes that vector as input and likewise outputs a backcast vector of the first historical time period and a predicted vector of the target time period; and so on, so that the input of each subsequent sub-model is the backcast time sequence vector of the first historical time period output by the previous sub-model, and every sub-model outputs a predicted time sequence vector of the target time period.
During this propagation, the plurality of time series participate in the computation of every sub-model simultaneously, which means the time series share model parameters, so the time sequence model can capture the similarity relations among them.
The predicted time sequence vector is the time sequence vector of the target time period that each sub-model predicts after encoding, where the target time period is a future time segment such as a future day, week, month, or year, which is not limited here.
The backcast time sequence vector of the first historical time period is the time sequence vector of the first historical time period that each sub-model reproduces after encoding; compared with the historical time sequence information of the first historical time period, its historical time span is identical, and the unit times arranged in the period's unit time order are also identical.
With a time sequence model formed in advance of serially connected sub-models, no separate model needs to be built for each time series and complex scenarios can be handled; the introduced backcast mechanism improves each sub-model's accuracy in predicting future information. A minimal sketch of this serial structure follows.
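The sketch below (assumed names; PyTorch is used here only for illustration, not as the patent's implementation) shows the serial chaining described in this step: each sub-model emits both a backcast vector, which becomes the next sub-model's input, and a predicted vector for the target time period.

from torch import nn

class TimeSeriesModel(nn.Module):
    # Serially connected sub-models: each sub-model receives the backcast
    # vector produced by the previous one (the first receives the history),
    # and every sub-model contributes a forecast for the target time period.
    def __init__(self, sub_models):
        super().__init__()
        self.sub_models = nn.ModuleList(sub_models)

    def forward(self, history):              # history: (batch, backcast_len)
        x = history
        forecasts = []
        for block in self.sub_models:
            backcast, forecast = block(x)    # each block outputs both vectors
            forecasts.append(forecast)
            x = backcast                     # backcast feeds the next block
        return forecasts                     # aggregated in step 106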
Step 106: aggregating the predicted time sequence vectors output by each time sequence sub-model to obtain an aggregation result, and determining the prediction information of the target time period based on the aggregation result.
Aggregation integrates the predicted time sequence vectors output by the sub-models into one vector; there are many ways to aggregate vectors, such as taking dot products or cross products of the vectors.
The prediction information of the target time period is the project data information of the target time period predicted in the project scenario, and may be a time series presented in time order. For example, if the historical time sequence information of the historical time period is a combination of the monthly visit, sales, and order data of a certain brand's store in 2021, the prediction information of the target time period may be a combination of the monthly visit, sales, and order data of that store in 2022.
In one implementation of the embodiments of this specification, step 106 may be carried out as follows: performing element-wise operations on the predicted time sequence vectors output by each sub-model to obtain an aggregation result, and obtaining the prediction information of the target time period from the aggregation result through an output layer.
Specifically, to keep the data in the predicted time sequence vectors unified, element-wise operations are used for aggregation: elements at the same position in different predicted vectors are accumulated, multiplied, weighted, and so on, yielding an aggregated vector that integrates the predictions of all sub-models and gives a more complete statistic of the predictions for the plurality of time series. After the aggregation result is obtained, it is input into an output layer (for example, a sigmoid layer), whose computation yields the prediction information of the target time period.
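A minimal sketch of this aggregation and output step is given below, assuming element-wise accumulation and a sigmoid output layer as one possible concrete choice; all sizes and names are hypothetical:

import torch
from torch import nn

# Element-wise accumulation of the sub-model forecasts, followed by an
# output layer; the sigmoid layer mirrors the example given in the text.
def aggregate(forecasts):                     # list of (batch, forecast_len)
    return torch.stack(forecasts).sum(dim=0)  # same-position elements summed

forecast_len = 4                              # e.g. four quarters
output_layer = nn.Sequential(nn.Linear(forecast_len, forecast_len), nn.Sigmoid())

agg = aggregate([torch.randn(3, forecast_len), torch.randn(3, forecast_len)])
prediction = output_layer(agg)                # prediction info, (batch, 4)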
By applying the embodiments of this specification, the acquired historical time sequence information of the first historical time period is input into a pre-trained time sequence model; the predicted time sequence vector of the target time period output by each sub-model is obtained through the serially connected sub-models, where the input of every sub-model other than the first is the backcast time sequence vector of the first historical time period output by the previous sub-model; the predicted vectors are then aggregated, and the prediction information of the target time period is determined from the aggregation result. Because the time sequence model is a pre-trained multi-layer sub-model structure, it can backcast the time sequence vector of the first historical time period several times and predict the time sequence vector of the target time period from each backcast. A plurality of time series can therefore be input simultaneously, and information prediction with multiple input series is achieved with a single time sequence model: no model needs to be built per time series, prediction efficiency improves, and complex scenarios can be handled. Moreover, since the time series participate in each sub-model's computation simultaneously, they share model parameters, so the model captures the similarity relations among the series and the prediction results are more accurate.
In this embodiment, each time sequence sub-model may include a self-attention coding layer, a deep neural network coding layer, and a bidirectional predictive coding layer. The process by which any sub-model obtains the predicted time sequence vector of the target time period is shown in FIG. 2, a flowchart of the processing procedure by which each time sequence sub-model obtains the predicted time sequence vector of the target time period in an information prediction method provided in an embodiment of this specification; it specifically includes the following steps.
Step 202: for the input of the time sequence sub-model, encoding the input with the self-attention coding layer to obtain a first feature vector.
The self-attention coding layer is built on the self-attention mechanism: the processing of each time step in the input sequence can fuse in information from the other time steps, which ensures that the sub-model's subsequent outputs, the predicted time sequence vector of the target time period and the backcast time sequence vector of the historical time period, both have high relevance.
In an implementation manner of the embodiments of the present specification, the above-mentioned processing for each time step in the input sequence to be fused with information of other time steps may be implemented by a coding manner based on a weight matrix. Accordingly, step 202 may be implemented as follows:
for the input of the self-attention coding layer, adopting a preset self-attention operation mechanism to obtain a plurality of weight matrices in parallel, and fusing the plurality of weight matrices to obtain the first feature vector.
The self-attention coding layer includes an embedding layer (Embedding Layer) and a fusion layer (Concat Layer), connected in series.
The embedding layer obtains the plurality of weight matrices with a preset self-attention operation mechanism, i.e., an operation mechanism that can produce several weight matrices in parallel, for example a Query-Key-Value (QKV) self-attention mechanism; any mechanism that obtains several weight matrices in parallel falls within the protection scope of the embodiments of this specification, and no limitation is made here. Taking the QKV mechanism as an example, it includes a query embedding layer (Query Embedding), a key embedding layer (Key Embedding), and a value embedding layer (Value Embedding), connected in parallel. Compared with a traditional serial self-attention mechanism, the parallel QKV structure can efficiently parallelize the embedding operations, improving the operating efficiency of the sub-model and thus the prediction efficiency of the time sequence model.
The input of the embedding layer for one embedding operation is either the historical time sequence information of the first historical time period or the backcast time sequence vector of the first historical time period, determined by the sub-model's position in the time sequence model (the embedding layer of the first sub-model receives the historical time sequence information; the embedding layers of the other sub-models receive the backcast time sequence vector). The query, key, and value embedding layers all receive the same input.
Specifically, the embedding layer operates as follows: the historical time sequence information or backcast time sequence vector of the first historical time period input into the embedding layer is divided into i time steps according to the unit time of the first historical time period; the data of each time step are then fed in parallel into the query, key, and value embedding layers, which produce in parallel a query weight matrix of size i × 1, a key weight matrix of size 1 × i, and a value weight matrix of size i × 1.
Illustratively, the input of the self-attention coding layer is the historical time sequence information of the first historical time period: the sales data (550, 640, 1060, 970) of a certain brand on a certain platform in 2021, with a unit time of one quarter. First, the embedding layer segments the historical time sequence information Ti into 4 quarterly time steps [Ti], whose data (sales) are [(550), (640), (1060), (970)]. Each time step [Ti] is then processed by the query embedding layer, the key embedding layer, and the value embedding layer. Sequence embedding in the query embedding layer gives the query weight matrix (qi) = (q1, q2, q3, q4); sequence embedding in the key embedding layer gives the key weight matrix (ki) = (k1, k2, k3, k4)^T; sequence embedding in the value embedding layer gives the value weight matrix (vi) = (v1, v2, v3, v4).
After the plurality of weight matrices are obtained, the fusion layer in the self-attention coding layer fuses them, yielding the first feature vector.
In the embodiments of this specification, the fusion consists of operations such as dot products and cross products on the obtained weight matrices. Taking the QKV mechanism as an example: first the scalar product (·) of the query weight matrix and the key weight matrix is computed, giving a fused scalar value; then the product (×) of that fused scalar and the value weight matrix is computed, and the resulting fused vector is determined to be the first feature vector.
Illustratively, the fusion layer computes the scalar product of the query weight matrix (qi) and the key weight matrix (ki), obtaining the fused scalar value (qi)·(ki) = q1·k1 + q2·k2 + q3·k3 + q4·k4; it then multiplies this fused scalar by the value weight matrix (vi) to obtain the fused vector [(qi)·(ki)·(vi)], which is determined to be the first feature vector.
Because each weight matrix is embedded from the project data of each time step, fusing the weight matrices lets the project data of each time step absorb the project data information of the other time steps, which increases the time sequence model's capture of the relevance between project data at different time steps and yields more regular prediction information for the target time period.
The input of the self-attention coding layer is thus encoded with a preset self-attention operation mechanism: several weight matrices are obtained in parallel and fused into the first feature vector. The parallel encoding improves the prediction efficiency of the whole time sequence model.
In the self-attention coding layer, besides the parallel QKV self-attention mechanism, serially operating mechanisms such as a long short-term memory network (LSTM) or a gated recurrent unit (GRU) may also be used, which is not limited in the embodiments of this specification. A sketch of the QKV variant follows.
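The sketch below is a generic scaled dot-product formulation of the QKV self-attention coding layer; it is an illustrative assumption of this description (the patent's i × 1 weight-matrix shapes are simplified away), with all names hypothetical:

import torch
from torch import nn

class SelfAttentionCoding(nn.Module):
    # Query, key, and value embeddings computed in parallel, then fused:
    # each time step's output mixes in information from every other step.
    def __init__(self, d_model):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)  # query embedding layer
        self.key = nn.Linear(d_model, d_model)    # key embedding layer
        self.value = nn.Linear(d_model, d_model)  # value embedding layer

    def forward(self, x):                    # x: (batch, i, d_model)
        q, k, v = self.query(x), self.key(x), self.value(x)
        scores = q @ k.transpose(-2, -1) / (x.size(-1) ** 0.5)
        weights = torch.softmax(scores, dim=-1)   # relevance across time steps
        return weights @ v                   # first feature vector per step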
Step 204: inputting the first feature vector into the deep neural network coding layer, and encoding it sequentially through a plurality of fully connected layers in the deep neural network coding layer to obtain a second feature vector.
The deep neural network coding layer is composed of several layers of deep neural networks. It extracts the features in the first feature vector and cross-combines them, generating nonlinear higher-order features of high information density; these higher-order features strengthen the expression of the historical time sequence information of the first historical time period in the subsequent prediction information of the target time period. For example, if the historical time sequence information of the first historical time period is the visit, sales, and order data of a certain brand's store on a certain platform in 2021, the first feature vector contains the three features of visits, sales, and orders. The deep neural network coding layer can cross-combine visits and sales into a dense nonlinear higher-order feature, the visit-to-sale conversion rate, from which one can analyze whether the store's page design is reasonable and guide the brand to adjust it accordingly. This higher-order feature cannot be obtained directly from the visit data or the sales data alone; it can only be obtained through the cross-combination operation, which makes the prediction information of the time sequence model more accurate.
The fully connected layers are part of the multi-layer structure of the deep neural network; they cross-combine the extracted features so that the generated nonlinear higher-order features have higher information density than the underlying features. The fully connected layers are connected in series and encode sequentially, i.e., except for the first fully connected layer, the input feature vector of each layer is the output feature vector of the previous layer.
The second feature vector is the output feature vector of the last fully connected layer, carrying the higher-order features.
Specifically, the first feature vector is input into a deep neural network coding layer, and is sequentially coded through a plurality of fully connected layers in the deep neural network coding layer to obtain a second feature vector with high-order features.
Illustratively, the first feature vector [(qi)·(ki)·(vi)] is input into the deep neural network coding layer, which extracts two original features: the quarterly time steps and the corresponding sales. The fully connected layers in the deep neural network coding layer cross-combine the two extracted features into a higher-order feature, the quarter/sales correlation S, and sequential encoding with reference to this higher-order feature yields the second feature vector [(qi)·(ki)·(vi) | S].
Sequentially encoding the first feature vector through the several fully connected layers of the deep neural network coding layer thus provides a data basis for the subsequent bidirectional prediction. A sketch of such a stack is given below.
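A minimal sketch of the fully connected stack, with hypothetical layer sizes chosen only for illustration:

from torch import nn

# Serially connected fully connected layers: each layer consumes the previous
# layer's output, cross-combining features into higher-order features; the
# final layer emits the second feature vector.
dnn_coding_layer = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 32),                       # second feature vector
)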
Step 206: inputting the second feature vector into the bidirectional predictive coding layer, encoding the second feature vector with the backcast layer in the bidirectional predictive coding layer to obtain the backcast time sequence vector of the first historical time period, and encoding the second feature vector with the prediction layer in the bidirectional predictive coding layer to obtain the predicted time sequence vector of the target time period.
The bidirectional predictive coding layer performs bidirectional prediction based on the original features and the higher-order features in the second feature vector.
It comprises a backcast layer (Backcast Layer) and a prediction layer (Forecast Layer); the output of the backcast layer is the backcast time sequence vector of the first historical time period that is input into the next time sequence sub-model, and the output of the prediction layer is the predicted time sequence vector of the target time period output by the sub-model.
After the backcast layer obtains the backcast time sequence vector of the first historical time period through the backcast operation, the vector is immediately input into the next sub-model. After the prediction layer obtains the predicted time sequence vector of the target time period through the prediction operation, the vector is temporarily stored in the prediction layer, and is output once it is confirmed that the last sub-model has obtained its predicted time sequence vector of the target time period.
Specifically, the second feature vector is input into the bidirectional predictive coding layer; encoding by the backcast layer yields the backcast time sequence vector of the first historical time period, which is input into the next sub-model, and encoding by the prediction layer yields the predicted time sequence vector of the target time period, which is stored in the prediction layer.
Illustratively, the second feature vector [(qi)·(ki)·(vi) | S] is input into the bidirectional predictive coding layer, which performs bidirectional prediction based on the original feature in the second feature vector, the sales at time step Ti, and the higher-order feature, the quarter/sales correlation S. Encoding by the backcast layer yields the backcast time sequence vector of the first historical time period, (Ti') = (560, 630, 1070, 980), which is input into the next time sequence sub-model; encoding by the prediction layer yields the predicted time sequence vector of the target time period, (Ti'') = (590, 670, 1100, 1020), i.e., the 2022 sales data of the brand on the platform, which is stored in the prediction layer.
By applying this embodiment, the information input into the time sequence sub-model is encoded with the self-attention coding layer to obtain the first feature vector, which increases the time sequence model's capture of the relevance between statistics at different time points in the time sequence information and improves the sub-model's prediction accuracy; in combination with the bidirectional prediction mechanism, the relevance between the sub-models is also increased, so the predicted information is more regular. The sketch below assembles the three layers into one sub-model.
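As a synthesis of steps 202-206, the following sketch assembles one hypothetical sub-model from the three layers; it reuses the SelfAttentionCoding sketch from step 202, and every name and size is an assumption of this description:

from torch import nn

class TimeSequenceSubModel(nn.Module):
    # One sub-model: self-attention coding, DNN coding, then the bidirectional
    # predictive coding layer with its backcast and forecast heads.
    def __init__(self, backcast_len, forecast_len, d_model=32):
        super().__init__()
        self.embed = nn.Linear(1, d_model)              # one value per step
        self.attention = SelfAttentionCoding(d_model)   # sketch from step 202
        self.dnn = nn.Sequential(
            nn.Linear(backcast_len * d_model, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.backcast_layer = nn.Linear(64, backcast_len)  # Backcast Layer
        self.forecast_layer = nn.Linear(64, forecast_len)  # Forecast Layer

    def forward(self, x):                    # x: (batch, backcast_len)
        h = self.embed(x.unsqueeze(-1))      # (batch, backcast_len, d_model)
        h = self.attention(h)                # first feature vector
        h = self.dnn(h.flatten(1))           # second feature vector
        return self.backcast_layer(h), self.forecast_layer(h)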
FIG. 3 shows a flowchart of training the time sequence model in an information prediction method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 302: training samples are obtained, wherein the training samples comprise sample timing information, and the sample timing information comprises a plurality of time sequences.
The training samples comprise sample time sequence information for several historical time periods. Training samples are generally preprocessed; a common preprocessing method is standardization, which removes the mutual influence among the project data of the time steps of the time series in the training samples.
Training samples are obtained in batches. Referring to step 102 of the embodiment of FIG. 1, a training sample may be one or a combination of the visit data, sales data, order data, and so on, of a certain brand's stores from 2011 to 2021, collected in batches from the statistics of several platforms.
Specifically, training samples are obtained in batches, wherein the training samples comprise sample timing information, and the sample timing information comprises a plurality of time sequences.
The time sequence model can be quickly trained by acquiring the sample time sequence information of a plurality of time sequences, so that the training efficiency is improved, and a sample basis is provided for subsequent training.
Step 304: inputting a training sample into a preset neural network to obtain backcast time sequence information, wherein the preset neural network comprises a plurality of serially connected time sequence sub-models; except that the input of the first sub-model is the training sample, the input of every other sub-model is the backcast time sequence vector output by the previous sub-model, and the backcast time sequence information is obtained based on the backcast time sequence vector output by the last sub-model.
The preset neural network is formed of a plurality of serially connected time sequence sub-models and has both information prediction and information backcast functions. In the embodiments of this specification, the preset neural network is a machine learning based network model.
Referring to step 104 in the embodiment of FIG. 1, the sub-models are connected in series in the preset neural network. Each sub-model encodes the time series in the input training sample to obtain its backcast time sequence vector, and when encoding the project data of each time step it can fuse in the project data of the other time steps, yielding a backcast vector with high relevance. The serial connection means that, except for the first sub-model, the input of each sub-model is the output of the previous one.
The backcast time sequence vector is the time sequence vector, obtained after each sub-model encodes its input, that is consistent with the historical time period of the training sample.
Training the preset neural network with the introduced backcast mechanism ensures the prediction accuracy of the trained sub-models on future information, improves the accuracy of the aggregated future prediction information, and yields a time sequence model with higher prediction accuracy.
Step 306: calculating a loss value from the backcast time sequence information and the sample time sequence information.
The loss value is the value of a preset loss function for the preset neural network, computed by substituting in the backcast time sequence information and the sample time sequence information. The loss function measures the training effect of the current training iteration, expressed as a loss value; common loss computations include the mean squared error, the mean absolute error, the smoothed mean absolute error, and so on, which are not limited here.
One way to calculate the loss value from the backcast time sequence information and the sample time sequence information is: computing the mean squared error between the encoded project data of each time step in the backcast time sequence information and the project data of each time step in the sample time sequence information, and determining the mean squared error value to be the loss value.
Specifically, according to the return time sequence information and the sample time sequence information, a preset loss function for the preset neural network is adopted to calculate a loss value.
The calculated loss value provides a data-based criterion for the iterative training of the whole time sequence model, guaranteeing training quality.
Step 308: adjusting the network parameters of the preset neural network based on the loss value, and returning to step 302.
Step 310: obtaining the trained time sequence model when a preset training stop condition is met.
The preset training stop condition is a condition, set in advance for the preset neural network, for stopping the training iterations; when it is met, training ends and the preset neural network of the current iteration is determined to be the time sequence model. The condition may be a preset loss threshold, or a preset iteration condition such as the number of training iterations, the training iteration time, or the number of training samples per iteration, which is not limited here.
Specifically, the network parameters of the preset neural network are adjusted based on the loss value, and the step of obtaining training samples is executed again until the preset training stop condition is met; training then ends, and the preset neural network of the current iteration is determined to be the time sequence model.
By applying the embodiment of the specification, the pre-trained time sequence model can be obtained through training of the preset neural network, and a model basis is provided for prediction of future information.
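A minimal sketch of this training procedure follows, combining the earlier model sketches; the random stand-in data, the fixed iteration count used as the stop condition, and the Adam optimizer are all assumptions of this description:

import torch
from torch import nn

# Training per FIG. 3: the loss is the mean squared error between the backcast
# time sequence information (from the last sub-model) and the sample time
# sequence information; a fixed iteration count is the stop condition here.
model = TimeSeriesModel([TimeSequenceSubModel(12, 4) for _ in range(3)])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(1000):                     # preset training stop condition
    samples = torch.randn(8, 12)             # stand-in for real sample batches
    x = samples
    for block in model.sub_models:
        backcast, _ = block(x)
        x = backcast                         # keep the last block's backcast
    loss = loss_fn(x, samples)               # backcast info vs. sample info
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()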
The above embodiments solve information prediction based on multiple time series. In practice, however, additional information such as store traffic, whether a day is a membership day, and whether a promotion is running has a large influence on information prediction, and is especially important for prediction results in the e-commerce field.
In order to address the above problem, fig. 4 shows a flowchart of another information prediction method provided according to an embodiment of the present specification, which specifically includes the following steps.
Step 402: historical covariate information for a second historical time period is obtained.
The second historical time period is a data time interval for acquiring the historical covariate information, and may be a year, a quarter, a month, a day, an hour, and the like, which is not limited herein, and may be the same as or different from the first historical time period in the embodiment of fig. 1.
Covariate information is reference variable information that helps predict the prediction information of the target time period, for example international or domestic trade trend information. The historical covariate information may be historical sliding statistics, i.e., the maximum, minimum, average, exponential moving average, and so on, of the historical time sequence information of the second historical time period; such sliding statistics smooth that information well and help the time sequence model capture the trend of the prediction information of the target time period. It may also be reference variable information that supplements the information obtained by embedding and encoding the univariate time series data among the univariate unary, multivariate unary, and multivariate time series data. No limitation is made here.
And acquiring historical covariate information of the second historical time period, and providing a data basis for subsequent prediction.
Step 404: and inputting the historical covariate information into a pre-trained covariate model to obtain a predicted covariate vector of the target time period.
The pre-trained covariate model is obtained by training a preset neural network; it can encode historical covariate information, yielding a covariate prediction model that predicts covariate vectors. In the embodiments of this specification, the preset neural network is open source and can be obtained directly.
The predicted covariate vector is a covariate vector of a target time period obtained by predicting after the covariate model is coded, wherein the target time period is a future time segment, for example, a day, a week, a month, a year, and the like in the future, and is consistent with the target time period in the embodiment of fig. 1.
Optionally, the covariate model is a multilayer perceptron (MLP).
Because the input of the covariate model may be historical covariate information for several second historical time periods, a multilayer perceptron is used for the encoding, so that the encoding results of the different historical time periods remain independent and do not influence one another.
Specifically, historical covariate information of the second historical time period is input into a pre-trained covariate model, and a prediction covariate vector of the target time period is obtained after coding.
And obtaining the prediction covariate vector of the target time period, and providing a data basis for fusing the prediction covariate information into an aggregation result.
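A minimal sketch of this covariate branch is given below, including the element-wise fusion described in step 406 that follows; the layer sizes, the choice of an element-wise product, and the random stand-in inputs are assumptions of this description:

import torch
from torch import nn

# Covariate branch: an MLP encodes historical covariate information of the
# second historical time period into a predicted covariate vector, which is
# then fused with the aggregation result by an element-wise product.
covariate_model = nn.Sequential(
    nn.Linear(12, 64), nn.ReLU(),
    nn.Linear(64, 4),                        # predicted covariate vector
)

covariates = torch.randn(3, 12)              # e.g. sliding statistics
predicted_covariates = covariate_model(covariates)

aggregation_result = torch.randn(3, 4)       # from step 106
updated_result = aggregation_result * predicted_covariates  # element-wise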
Step 406: and performing element-level operation on the aggregation result and the prediction covariate vector to obtain an updated aggregation result.
The aggregation vector is the aggregation result obtained in step 106 of the embodiment of FIG. 1.
To keep the data in the predicted time sequence vectors unified, element-wise operations are performed on the aggregation result and the predicted covariate vector, yielding an updated aggregation vector. This vector integrates the original aggregation result with the predicted covariate vector output by the covariate model, producing an updated aggregation result into which the covariate information is fused.
The element-wise operation may take the scalar product of each element of the aggregation result and the corresponding element of the predicted covariate vector, and determine the resulting updated product value to be the updated aggregation result.
Alternatively, the element-wise operation may take the vector product of each element of the aggregation result and the corresponding element of the predicted covariate vector, and determine the resulting updated product vector to be the updated aggregation result.
Performing element-wise operations on the aggregation result and the predicted covariate vector keeps the aggregated prediction vectors and the predicted covariate vector unified; moreover, updating the aggregation result ensures that, on top of the time sequence model's prediction, the predicted covariate information is fused in, so the aggregation result carries more information and the subsequent prediction is more accurate.
Step 408: and according to the aggregation result, obtaining the prediction information of the target time period through the output layer.
After the updated aggregation result is obtained, the aggregation result is input into an output layer (for example, a sigmoid layer), and prediction information of the target time period can be obtained through calculation of the output layer.
By applying this embodiment, a predicted covariate vector is obtained with a covariate model on top of the time sequence model's prediction; with more effective information used for prediction, the overall model framework is more widely applicable and the model's prediction results are more accurate.
Besides the above covariate information, date information also has a large influence on information prediction. FIG. 5 shows a flowchart of yet another information prediction method provided according to an embodiment of the present specification, which specifically includes the following steps.
Step 502: date statistics are obtained.
The date statistical information may be historical statistical information obtained by performing statistics on the dates of a historical time period, predicted statistical information obtained by predicting the dates of a future time period, or a combination of the two; no limitation is imposed here.
The date statistical information is reference variable information that contributes to predicting the prediction information of the target time period. It may be statistical information labeled after special dates are extracted; for example, in an e-commerce store scenario on a platform, special dates such as the Double 11 (November 11) promotion, the 618 (June 18) promotion, and the Spring Festival can greatly influence the store's sales data, visit data, issue data, and the like. It may also be statistical information from which no special dates have been extracted, to be extracted and labeled in the subsequent date model; for example, activity days, working days, and non-working days set by the store or the platform can likewise influence the store's sales data, visit data, issue data, and the like.
Acquiring the date statistical information provides a data basis for subsequent prediction.
Step 504: and inputting the date statistical information into a pre-trained date model to obtain a predicted date vector in a target time period.
The pre-trained date model is obtained by training a preset neural network; it can encode the date statistical information, yielding a date prediction model for predicting date vectors. In the embodiment of the present specification, the preset neural network is open source and can be obtained directly.
The predicted date vector is the date vector of the target time period that the date model predicts after encoding, wherein the target time period is a future time segment, such as a future day, week, month, or year, and is consistent with the target time period in the embodiment of fig. 1.
Optionally, the date model is a long short-term memory model (LSTM).
Because a recurrent neural network (RNN) structure such as the long short-term memory model is adopted, the date statistical information can be fused into the finally determined prediction information of the target time period, and the sequential relations within the date statistical information can be introduced, so the prediction information of the target time period has stronger causality and interpretability and the prediction rules can be captured better.
In the embodiment of the present specification, besides the LSTM, the date model may also be a simpler recurrent neural network structure such as a vanilla RNN or a GRU.
Specifically, the date statistical information is input into the pre-trained date model, and the predicted date vector of the target time period is obtained after encoding.
Obtaining the predicted date vector of the target time period provides a data basis for fusing the predicted date information into the aggregation result.
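A minimal sketch of such a date model in PyTorch follows; the feature dimension, hidden size, and the linear head that maps the final hidden state to the horizon are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DateModel(nn.Module):
    """LSTM encoder that turns date statistical information into a predicted date vector."""
    def __init__(self, num_date_features: int, hidden_dim: int = 32, horizon: int = 7):
        super().__init__()
        self.lstm = nn.LSTM(num_date_features, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, horizon)

    def forward(self, date_feats: torch.Tensor) -> torch.Tensor:
        # date_feats: (batch, seq_len, num_date_features)
        _, (h_n, _) = self.lstm(date_feats)
        return self.head(h_n[-1])  # predicted date vector: (batch, horizon)
```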
Step 506: and performing element-level operation on the aggregation result and the prediction date vector to obtain an updated aggregation result.
The aggregation vector is the aggregation result obtained in step 106 in the embodiment of fig. 1, and may also be the updated aggregation result obtained in step 406 in the embodiment of fig. 4.
To keep the data in the individual predicted time sequence vectors unified, an element-level operation is performed on the aggregation result and the predicted date vector to obtain an updated aggregation vector. This vector integrates the original aggregation result with the predicted date vector output by the date model, so the updated aggregation result has the date information fused in.
The element-level operation may be to take the quantity (scalar) product of the elements of each aggregation vector in the aggregation result and the elements of the predicted date vector, obtain an updated predicted product value, and determine that value as the updated aggregation result.
Alternatively, the element-level operation may be to take the cross product of the elements of each aggregation vector in the aggregation result and the elements of the predicted date vector, obtain an updated predicted product vector, and determine that vector as the updated aggregation result.
Performing the element-level operation on the aggregation result and the predicted date vector ensures that the data aggregated from the predicted time sequence vectors and the predicted date vector are unified. In addition, the aggregation result is updated so that, on top of the prediction information produced by the time sequence model, it also fuses in the predicted date information; the aggregation result therefore carries more information, making the subsequent prediction more accurate.
Step 508: and according to the aggregation result, obtaining the prediction information of the target time period through the output layer.
After the updated aggregation result is obtained, it is input into an output layer (for example, a sigmoid layer), and the prediction information of the target time period is obtained through the computation of the output layer.
By applying this embodiment of the specification, a date model is used on top of the time sequence model's prediction to obtain the predicted date vector, so the overall model framework is more widely applicable and, because more effective information is used for prediction, the model's prediction results are more accurate.
The information prediction method according to the embodiments described above with reference to figs. 1, 2, 4, and 5 is described in detail below with reference to fig. 6. Fig. 6 shows a pluggable multivariate time sequence information prediction architecture diagram provided in an embodiment of the present specification, which includes a time sequence model, a date model, and a covariate model. The time sequence model includes multiple time sequence submodels (e.g., time sequence submodel 1, time sequence submodel 2, …, time sequence submodel N in the diagram), and each time sequence submodel includes a self-attention coding layer, a deep neural network coding layer, and a bidirectional predictive coding layer. The execution flow of a time sequence submodel is as follows:
the self-attention coding layer adopts the classical QKV self-attention mechanism. The Q embedding layer, K embedding layer, and V embedding layer respectively embed the input time sequence information (t1, t2, …, ti-1, ti) to obtain a Q embedding vector (q1, q2, …, qi), a K embedding vector (k1, k2, …, ki), and a V embedding vector (v1, v2, …, vi). Adding the attention mechanism lets the time sequence at each time step (timestep) fuse in data from the time sequence at other time steps, which strengthens the model's capture of the correlations between data at different time points in the sequence; in addition, the QKV self-attention mechanism is a parallelizable form of attention, which improves the training efficiency of the model. Through the fusion of the fusion layer, the first feature vector [(q1, q2, …, qi)(k1, k2, …, ki)^T × (v1, v2, …, vi)] is obtained.
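A minimal sketch of the QKV self-attention operation follows; the softmax normalization and the 1/sqrt(d) scaling are standard practice and an assumption here, since the text itself only writes the product (Q K^T) × V.

```python
import math
import torch
import torch.nn as nn

class SelfAttentionCoding(nn.Module):
    def __init__(self, input_dim: int, embed_dim: int = 64):
        super().__init__()
        self.q = nn.Linear(input_dim, embed_dim)  # Q embedding layer
        self.k = nn.Linear(input_dim, embed_dim)  # K embedding layer
        self.v = nn.Linear(input_dim, embed_dim)  # V embedding layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, timesteps, input_dim); all timesteps are processed in parallel.
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        weights = torch.softmax(scores, dim=-1)
        return weights @ v  # first feature vector, fused across timesteps
```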
The first feature vector is input into the deep neural network coding layer, which is composed of several fully connected layers and is mainly responsible for automatically processing higher-order features, applying nonlinear transformations to the original information, and enhancing the expression of the time sequence information; the specific process is shown in the embodiment of fig. 2. The first feature vector is encoded in turn by the fully connected layers of the deep neural network coding layer to obtain the second feature vector.
The second feature vector is then input into the bidirectional predictive coding layer, which has two parts. The left part in the figure is the back-test layer, used to generate an information back-test for the first historical time period: encoding by the back-test layer yields the back-test time sequence vector of the first historical time period. The right part in the figure is the prediction layer, used to generate an information prediction for the target time period: encoding by the prediction layer yields the predicted time sequence vector of the target time period. Adding the back-test layer helps the model decompose the historical time series into information such as trend, period, and random terms. The back-test layer's result is used as the input of the next time sequence submodel, and the prediction layer's result is stored for aggregation with the results of the other time sequence submodels.
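A minimal sketch of one time sequence submodel and of the chaining between submodels follows; the small feed-forward encoder stands in for the self-attention and deep neural network coding layers, and all dimensions are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TimingSubmodel(nn.Module):
    def __init__(self, lookback: int, horizon: int, hidden: int = 64):
        super().__init__()
        # Stand-in for the self-attention + deep neural network coding layers.
        self.encoder = nn.Sequential(nn.Linear(lookback, hidden), nn.ReLU())
        self.backcast = nn.Linear(hidden, lookback)  # back-test layer
        self.forecast = nn.Linear(hidden, horizon)   # prediction layer

    def forward(self, x: torch.Tensor):
        h = self.encoder(x)
        return self.backcast(h), self.forecast(h)

def run_submodels(submodels, history: torch.Tensor) -> torch.Tensor:
    # history: (batch, lookback). Each submodel receives the back-test vector
    # of the previous one; the prediction vectors are stored and summed.
    x, forecasts = history, []
    for block in submodels:
        back, fore = block(x)
        forecasts.append(fore)
        x = back
    return torch.stack(forecasts).sum(dim=0)  # additive aggregation
```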
Covariates are variables, other than the time sequence information of interest, that help with information prediction. Historical sliding statistics are the most common and most used, for example the exponential moving average over the past N days and the mean/maximum/minimum over the past N days; such statistical variables smooth the historical information well and help the model capture trend information. In the covariate model, considering that there may be one or more pieces of covariate information, the information can be encoded with an MLP; that is, the covariate model contains several fully connected layers, and data such as the past-N-day exponential moving averages and past-N-day mean/maximum/minimum values are encoded in turn by the fully connected layers of the covariate model to obtain the predicted covariate vector of the target time period, as shown in the figure.
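For illustration, such sliding statistics might be computed as follows with pandas; the window length N and the example series are assumptions.

```python
import pandas as pd

sales = pd.Series([12, 15, 11, 19, 22, 18, 25, 21], dtype=float)
N = 3
covariates = pd.DataFrame({
    "ema_N":  sales.ewm(span=N).mean(),   # past-N-day exponential moving average
    "mean_N": sales.rolling(N).mean(),    # past-N-day mean
    "max_N":  sales.rolling(N).max(),     # past-N-day maximum
    "min_N":  sales.rolling(N).min(),     # past-N-day minimum
})
```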
In time sequence prediction, date information is very important for distinguishing future changes in the information, and it plays a crucial decision role in predicting special time points. In the e-commerce field, for example, around the Double 11 promotion, the 618 promotion, and the Spring Festival, the sales volume is extremely high or extremely low, and it even shifts from year to year with changes in operational activities and warm-up periods; information such as member days, weekends, and weekdays is also important. Therefore, in the embodiments of the present specification a date model is designed to capture information such as holidays, activity days, working days, and non-working days. In this model, a superposition-style date information encoding is proposed: information such as the date itself (for example, the day of the week), whether the date is a promotion day, and whether the date is a holiday is superposed, the sequence is then encoded with an LSTM, and the superposed predicted date vector of the target time period is finally generated by addition. Adopting the LSTM type of RNN structure allows the date information to be merged into the final prediction and also introduces the sequential relations between dates, such as the promotion warm-up period preceding the formal promotion day, which are inherently in a before-after relationship.
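A minimal sketch of the superposition-style date encoding follows; the particular features (day of week, promotion flag, holiday flag), vocabulary sizes, and dimensions are assumptions for the example.

```python
import torch
import torch.nn as nn

class SuperposedDateEncoder(nn.Module):
    def __init__(self, embed_dim: int = 16, hidden_dim: int = 32):
        super().__init__()
        self.day_of_week = nn.Embedding(7, embed_dim)
        self.is_promo    = nn.Embedding(2, embed_dim)  # promotion day or not (assumed flag)
        self.is_holiday  = nn.Embedding(2, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, dow, promo, holiday):
        # Each input: (batch, seq_len) integer codes; the per-feature
        # embeddings are superposed by addition before sequence encoding.
        x = self.day_of_week(dow) + self.is_promo(promo) + self.is_holiday(holiday)
        out, _ = self.lstm(x)
        return out[:, -1]  # encoded date vector
```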
Considering that in time sequence prediction problems the scenario and the available data sources vary, there are cases where only time sequence data is available but no covariate data, or no date-related information exists; in such cases only part of the data for the three models can be obtained. Therefore, in the aggregation computation part, aggregation is performed by adding the module results, which supports a pluggable design: when only time sequence information is available, the input and output of the covariate and date models are null, and the aggregation result equals the output of the time sequence model; when time sequence information and date information are available, the input and output of the covariate model are null, and the aggregation result equals the sum of the time sequence model's output and the date model's output, and so on. In this way, flexible modular combination and a pluggable design are achieved, the time sequence model can be generalized to all kinds of time sequence prediction project scenarios, and the architecture is more universal and general.
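A minimal sketch of this additive, pluggable aggregation follows; representing an absent module's output as None (a null input/output) is the convention assumed here.

```python
from typing import Optional
import torch

def aggregate(ts_out: torch.Tensor,
              date_out: Optional[torch.Tensor] = None,
              cov_out: Optional[torch.Tensor] = None) -> torch.Tensor:
    # With only time sequence information, the result equals ts_out; each
    # additional module, when available, is simply added in.
    result = ts_out
    if date_out is not None:
        result = result + date_out
    if cov_out is not None:
        result = result + cov_out
    return result
```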
Corresponding to the above method embodiment, the present specification further provides an information prediction apparatus embodiment, and fig. 7 shows a schematic structural diagram of an information prediction apparatus provided in an embodiment of the present specification. As shown in fig. 7, the apparatus includes:
A timing module 702 configured to obtain historical time sequence information of a first historical time period, wherein the historical time sequence information comprises a plurality of time sequences; and to input the historical time sequence information into a pre-trained time sequence model and obtain, through a plurality of time sequence submodels connected in series in the time sequence model, the predicted time sequence vector of the target time period output by each time sequence submodel, wherein except for the first time sequence submodel, whose input is the historical time sequence information, the input of each other time sequence submodel is the back-test time sequence vector of the first historical time period output by the previous time sequence submodel.
The prediction module 704 is configured to aggregate the predicted time sequence vectors of the target time periods output by each time sequence sub-model to obtain a time sequence aggregation result; based on the aggregation result, prediction information of the target time period is determined.
Optionally, the timing submodel comprises: the self-attention coding layer, the deep neural network coding layer and the bidirectional prediction coding layer;
a timing module 702, further configured to encode, for the input of any timing submodel, that input with the self-attention coding layer to obtain a first feature vector; input the first feature vector into the deep neural network coding layer and encode it in turn through several fully connected layers therein to obtain a second feature vector; and input the second feature vector into the bidirectional predictive coding layer, where encoding by the back-test layer yields the back-test time sequence vector of the first historical time period and encoding by the prediction layer yields the predicted time sequence vector of the target time period.
Optionally, the timing module 702 is further configured to obtain a plurality of weight matrices in parallel by using a preset self-attention operation mechanism for the input of the self-attention coding layer; and fusing the plurality of weight matrixes to obtain a first feature vector.
Optionally, the apparatus further comprises a training module;
a training module configured to obtain a training sample, wherein the training sample comprises sample timing information, and the sample timing information comprises a plurality of time sequences; inputting training samples into a preset neural network to obtain time-reversal sequence information, wherein the preset neural network comprises a plurality of time sequence submodels connected in series, except that the input of a first time sequence submodel is the training sample, the input of other time sequence submodels is time-reversal sequence vectors output by the last time sequence submodel, and the time-reversal sequence information is obtained based on the time-reversal sequence vectors output by the last time sequence submodel; calculating a loss value according to the time sequence information of the return time and the time sequence information of the sample; and adjusting the network parameters of the preset neural network based on the loss value, and returning to the step of acquiring the training sample until the training stopping condition is met, so as to obtain the time sequence model for completing the training.
Optionally, the predicting module 704 is further configured to perform element-level operation on the predicted time sequence vector output by each time sequence sub-model to obtain an aggregation result; and according to the aggregation result, obtaining the prediction information of the target time period through the output layer.
Optionally, the apparatus further comprises: a covariate module;
a covariate module configured to obtain historical covariate information for a second historical time period; inputting historical covariate information into a covariate model trained in advance to obtain a predicted covariate vector of a target time period;
a prediction module 704 further configured to perform element-level operations on the aggregated results and the prediction covariate vectors to obtain updated aggregated results; and obtaining the prediction information of the target time period through the output layer according to the aggregation result.
Optionally, the covariate model is a multi-layer neural network model.
Optionally, the apparatus further comprises: a date module;
a date module configured to obtain date statistics; inputting the date statistical information into a pre-trained date model to obtain a predicted date vector in a target time period;
a prediction module 704 further configured to perform element-level operations on the aggregated result and the prediction date vector to obtain an updated aggregated result; and obtaining the prediction information of the target time period through the output layer according to the aggregation result.
Optionally, the date model is a long short-term memory model.
By applying the embodiments of this specification, the obtained historical time sequence information of the first historical time period is input into a pre-trained time sequence model, and the predicted time sequence vector of the target time period output by each time sequence submodel is obtained through the plurality of serially connected time sequence submodels; except for the first time sequence submodel, the input of each time sequence submodel is the back-test time sequence vector of the first historical time period output by the previous submodel. The predicted time sequence vectors output by the submodels are then aggregated, and the prediction information of the target time period can be determined from the aggregation result. Because the time sequence model is a pre-trained multilayer submodel structure that back-tests the time sequence vector of the first historical time period several times and predicts the time sequence vector of the target time period from each back-test, multiple time sequences can be input at the same time, and information prediction with multiple input sequences is realized with a single time sequence model. There is no need to build a separate time sequence model for each time sequence, which improves the efficiency of information prediction and suits complex scenarios; moreover, the time sequences participate in the submodel computation together, meaning they share model parameters, so the time sequence model can capture the similarity relations among the sequences and the information prediction results are more accurate.
The foregoing is a schematic diagram of an information prediction apparatus according to the present embodiment. It should be noted that the technical solution of the information prediction apparatus and the technical solution of the information prediction method belong to the same concept, and for details that are not described in detail in the technical solution of the information prediction apparatus, reference may be made to the description of the technical solution of the information prediction method.
FIG. 8 illustrates a block diagram of a computing device, according to one embodiment of the present description. The components of the computing device 800 include, but are not limited to, a memory 810 and a processor 820. The processor 820 is coupled to the memory 810 via a bus 830, and the database 850 is used to store data.
Computing device 800 also includes an access device 840 that enables computing device 800 to communicate via one or more networks 860. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 840 may include one or more of any type of network interface (e.g., a network interface controller), whether wired or wireless, such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 800, as well as other components not shown in FIG. 8, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 8 is for purposes of example only and is not limiting as to the scope of the description. Those skilled in the art may add or replace other components as desired.
Computing device 800 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 800 may also be a mobile or stationary server.
Wherein the processor 820 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the information prediction method described above.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the information prediction method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the information prediction method.
An embodiment of the present specification also provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor, implement the steps of the information prediction method described above.
The above is an illustrative scheme of a computer-readable storage medium of the embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the information prediction method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the information prediction method.
An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the information prediction method.
The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program is the same concept as the technical solution of the information prediction method, and details that are not described in detail in the technical solution of the computer program can be referred to the description of the technical solution of the information prediction method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims (12)

1. An information prediction method, comprising:
acquiring historical time sequence information of a first historical time period, wherein the historical time sequence information comprises a plurality of time sequences;
inputting the historical time sequence information into a pre-trained time sequence model, and obtaining a predicted time sequence vector of a target time period output by each time sequence submodel through a plurality of series-connected time sequence submodels in the time sequence model, wherein except the input of a first time sequence submodel which is the historical time sequence information, the input of other time sequence submodels is a return time sequence vector of the first historical time period output by the previous time sequence submodel;
And aggregating the predicted time sequence vectors output by each time sequence sub-model to obtain an aggregation result, and determining the prediction information of the target time period based on the aggregation result.
2. The method of claim 1, the timing submodel comprising: the self-attention coding layer, the deep neural network coding layer and the bidirectional prediction coding layer;
the step of obtaining the predicted time sequence vector of the target time period output by each time sequence submodel through a plurality of time sequence submodels connected in series in the time sequence model comprises the following steps:
aiming at the input of any time sequence sub-model, the self-attention coding layer is utilized to code the input to obtain a first feature vector;
inputting the first feature vector into the deep neural network coding layer, and sequentially coding through a plurality of fully-connected layers in the deep neural network coding layer to obtain a second feature vector;
and inputting the second feature vector into the bidirectional predictive coding layer, obtaining a return time sequence vector of the first historical time period through coding of a backward-testing layer in the bidirectional predictive coding layer, and obtaining a predicted time sequence vector of the target time period through coding of a prediction layer in the bidirectional predictive coding layer.
3. The method of claim 2, wherein said encoding the input using the self-attention coding layer to obtain a first feature vector comprises:
aiming at the input of the self-attention coding layer, a preset self-attention operation mechanism is adopted to obtain a plurality of weight matrixes in parallel;
and fusing the plurality of weight matrixes to obtain a first feature vector.
4. The method of any of claims 1-3, further comprising, prior to said inputting said historical timing information into a pre-trained timing model:
acquiring a training sample, wherein the training sample comprises sample timing information, and the sample timing information comprises a plurality of time sequences;
inputting the training sample into a preset neural network to obtain time-reversal sequence information, wherein the preset neural network comprises a plurality of time sequence submodels connected in series, except that the input of the first time sequence submodel is the training sample, the input of other time sequence submodels is time-reversal sequence vectors output by the last time sequence submodel, and the time-reversal sequence information is obtained based on the time-reversal sequence vectors output by the last time sequence submodel;
calculating a loss value according to the time sequence information of the return time and the time sequence information of the sample;
And adjusting the network parameters of the preset neural network based on the loss value, and returning to the step of obtaining the training sample, until a trained time sequence model is obtained when a preset training stop condition is met.
5. The method of any of claims 1-3, wherein aggregating the predicted timing vectors output by the timing submodels to obtain an aggregated result comprises:
performing element-level operation on the predicted time sequence vectors output by each time sequence sub-model to obtain an aggregation result;
the determining the prediction information of the target time period based on the aggregation result comprises:
and obtaining the prediction information of the target time period through an output layer according to the aggregation result.
6. The method of claim 1, further comprising:
acquiring historical covariate information of a second historical time period;
inputting the historical covariate information into a pre-trained covariate model to obtain a predicted covariate vector of the target time period;
the determining the prediction information of the target time period based on the aggregation result comprises:
performing element-level operation on the aggregation result and the prediction covariate vector to obtain an updated aggregation result;
And according to the aggregation result, obtaining the prediction information of the target time period through an output layer.
7. The method of claim 6, the covariate model being a multi-layer neural network model.
8. The method of claim 1 or 6, further comprising:
acquiring date statistical information;
inputting the date statistical information into a pre-trained date model to obtain a predicted date vector in the target time period;
the determining the prediction information of the target time period based on the aggregation result comprises:
performing element-level operation on the aggregation result and the prediction date vector to obtain an updated aggregation result;
and obtaining the prediction information of the target time period through an output layer according to the aggregation result.
9. The method of claim 8, wherein the date model is a long short-term memory model.
10. An information prediction apparatus comprising:
a timing module configured to obtain historical timing information for a first historical time period, wherein the historical timing information comprises a plurality of time series; inputting the historical time sequence information into a pre-trained time sequence model, and obtaining a predicted time sequence vector of a target time period output by each time sequence submodel through a plurality of series-connected time sequence submodels in the time sequence model, wherein except the input of a first time sequence submodel which is the historical time sequence information, the input of other time sequence submodels is a return time sequence vector of the first historical time period output by the previous time sequence submodel;
The prediction module is configured to aggregate the predicted time sequence vectors of the target time periods output by the time sequence submodels to obtain a time sequence aggregation result, and the prediction information of the target time periods is determined based on the aggregation result.
11. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions and the processor is configured to execute the computer-executable instructions, which when executed by the processor implement the steps of the information prediction method of any one of claims 1 to 9.
12. A computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the steps of the information prediction method of any one of claims 1 to 9.