CN112132353A - Time-space two-stage attention-based nonlinear exogenous sequence prediction method - Google Patents

Time-space two-stage attention-based nonlinear exogenous sequence prediction method

Info

Publication number: CN112132353A (application CN202011042266.4A)
Authority: CN (China)
Prior art keywords: attention, exogenous, module, sequences, sequence
Prior art date: 2020-09-28
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202011042266.4A
Other languages: Chinese (zh)
Inventors: 夏树涛, 鲍际刚, 夏智康, 李佳维, 刘鑫吉, 圣亚军, 朱天磊, 孙继丰
Current Assignee: Shenzhen Wukong Investment Management Co ltd; Shenzhen International Graduate School of Tsinghua University (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Shenzhen Wukong Investment Management Co ltd; Shenzhen International Graduate School of Tsinghua University
Priority date: 2020-09-28 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2020-09-28
Publication date: 2020-12-25
Application filed by Shenzhen Wukong Investment Management Co ltd and Shenzhen International Graduate School of Tsinghua University
Priority to CN202011042266.4A
Publication of CN112132353A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/08: Learning methods
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q 40/04: Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Biomedical Technology (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Technology Law (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a nonlinear exogenous sequence prediction method based on spatio-temporal two-stage attention, applied to a prediction network model comprising an encoder configured with a spatio-temporal attention mechanism and a decoder configured with a self-attention mechanism. After a plurality of target sequences to be predicted and a plurality of exogenous sequences are obtained, a plurality of second-order exogenous sequences are determined from the exogenous sequences; the second-order exogenous sequences are input into the encoder, which determines hidden-space features; the hidden-space features and the target sequences are then input into the decoder, which determines the prediction sequences corresponding to the target sequences. With this method, the correlations among exogenous sequences are captured by the spatio-temporal attention mechanism at the encoder, while the self-attention mechanism better captures long-range dependencies of the time series, improving the prediction performance of the model and the accuracy of the predicted sequences.

Description

Time-space two-stage attention-based nonlinear exogenous sequence prediction method
Technical Field
The application relates to the technical field of trend prediction, in particular to a nonlinear exogenous sequence prediction method based on space-time two-stage attention.
Background
The nonlinear autoregressive exogenous (NARX) model is a time-series prediction model widely applied to financial product trend prediction, short-term prediction of climate change, epidemic control trend prediction in public-health and medical scenarios, and the like. However, existing NARX models generally cannot extract the correlations among the input exogenous sequences, and are limited in capturing long-range dependencies of the time series, which degrades the precision of the model and, in turn, the accuracy of the prediction results.
Disclosure of Invention
In view of the defects of the prior art, the technical problem to be solved by the present application is to provide a spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method.
In order to solve the technical problem, a first aspect of the embodiments of the present application provides a spatiotemporal two-stage attention-based nonlinear exogenous sequence prediction method, which is applied to a prediction network model, where the prediction network model includes an encoder and a decoder, the encoder is configured with a spatiotemporal attention mechanism, and the decoder is configured with a self-attention mechanism, and the method includes:
acquiring a plurality of target sequences to be predicted and a plurality of exogenous sequences, and determining a plurality of second-order exogenous sequences according to the plurality of exogenous sequences;
inputting the plurality of second-order exogenous sequences into the encoder, and determining hidden-space features through the encoder; and
inputting the hidden-space features and the plurality of target sequences into the decoder, and determining, through the decoder, the prediction sequences corresponding to the target sequences.
In the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method, the determining a plurality of second-order exogenous sequences according to the plurality of exogenous sequences specifically comprises:
for each exogenous sequence in the plurality of exogenous sequences, determining a correlation sequence between that exogenous sequence and each exogenous sequence in the plurality of exogenous sequences; and
determining the plurality of second-order exogenous sequences according to all of the determined correlation sequences.
In the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method, the encoder comprises a spatio-temporal attention module and an activation module, and the inputting the second-order exogenous sequences into the encoder and determining hidden-space features through the encoder specifically comprises:
inputting the plurality of second-order exogenous sequences into the spatio-temporal attention module, and determining, through the spatio-temporal attention module, the candidate hidden-space feature corresponding to each second-order exogenous sequence;
inputting each candidate hidden-space feature into the activation module, and determining the hidden-space features through the activation module.
In the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method, the inputting the plurality of second-order exogenous sequences into the spatio-temporal attention module and determining, through the spatio-temporal attention module, the candidate hidden-space feature corresponding to each second-order exogenous sequence specifically comprises:
the spatio-temporal attention module determining the spatio-temporal attention feature corresponding to each second-order exogenous sequence;
the spatio-temporal attention module determining, based on the spatio-temporal attention features at each time step, the weight sequence corresponding to each second-order exogenous sequence; and
the spatio-temporal attention module determining, based on each second-order exogenous sequence and its corresponding weight sequence, the candidate hidden-space feature corresponding to that second-order exogenous sequence.
In the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method, the activation module comprises a plurality of long short-term memory (LSTM) units, the LSTM units correspond one-to-one with the candidate hidden-space features, and each candidate hidden-space feature is an input of its corresponding LSTM unit.
In the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method, the decoder comprises a linear module, a plurality of cascaded self-attention modules, and a fusion module; the inputs of the linear module are the plurality of target sequences; the input of the first self-attention module in the cascade comprises the output of the linear module and the hidden-space features; for any two adjacent self-attention modules in the cascade, the input of the latter comprises the output of the former; the input of each self-attention module comprises the hidden-space features; the input of the fusion module is the output of the last self-attention module; and the output of the fusion module is the prediction sequence.
In the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method, each self-attention module comprises a first multi-head attention unit, a first fusion unit, a second multi-head attention unit, a second fusion unit, a linear unit, and a third fusion unit; the input of the first multi-head attention unit comprises the output of the preceding self-attention module; the input of the first fusion unit comprises the output of the first multi-head attention unit and the output of the preceding self-attention module; the input of the second multi-head attention unit comprises the hidden-space features and the output of the first fusion unit; the input of the second fusion unit comprises the output of the second multi-head attention unit and the input of the second multi-head attention unit; the input of the linear unit is the output of the second fusion unit; and the input of the third fusion unit comprises the output of the linear unit and the input of the linear unit.
In the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method, the fusion module comprises a fusion unit and a fully-connected unit; the fusion unit is connected with the fully-connected unit and is used to sum its corresponding inputs along the time dimension.
A second aspect of embodiments of the present application provides a computer readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the spatiotemporal two-stage attention-based nonlinear exogenous sequence prediction method as described in any one of the above.
A third aspect of the embodiments of the present application provides a terminal device, including: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the spatiotemporal two-stage attention-based non-linear exogenous sequence prediction method as described in any of the above.
Advantageous effects: compared with the prior art, the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method of the present application is applied to a prediction network model comprising an encoder configured with a spatio-temporal attention mechanism and a decoder configured with a self-attention mechanism. A plurality of target sequences to be predicted and a plurality of exogenous sequences are obtained, and a plurality of second-order exogenous sequences are determined from the exogenous sequences; the second-order exogenous sequences are input into the encoder, which determines hidden-space features; and the hidden-space features and the target sequences are input into the decoder, which determines the prediction sequences corresponding to the target sequences. In this way, the correlations among exogenous sequences are captured by the spatio-temporal attention mechanism at the encoder, while the self-attention mechanism better captures long-range dependencies of the time series, improving the prediction performance of the model and the accuracy of the predicted sequences.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without any inventive work.
FIG. 1 is a flow chart of a spatiotemporal two-stage attention-based nonlinear exogenous sequence prediction method provided by the present application.
FIG. 2 is a schematic diagram of the principle of the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method provided by the present application.
FIG. 3 is a schematic diagram of a spatiotemporal attention module in the spatiotemporal two-stage attention-based nonlinear exogenous sequence prediction method provided by the present application.
Fig. 4 is a schematic diagram of a decoder in the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method provided by the present application.
Fig. 5 is a schematic structural diagram of a terminal device provided in the present application.
Detailed Description
The present application provides a spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method. To make the purpose, technical scheme and effects of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit it.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The following further describes the content of the application by describing the embodiments with reference to the attached drawings.
The present embodiment provides a spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method which, as shown in figs. 1 and 2, is applied to a prediction network model including an encoder configured with a spatio-temporal attention mechanism and a decoder configured with a self-attention mechanism. The method comprises:
s10, obtaining a plurality of target sequences to be predicted and a plurality of exogenous sequences, and determining a plurality of second-order exogenous sequences according to the exogenous sequences.
Specifically, the target sequences correspond to a plurality of targets to be predicted; each target sequence includes the target data values of one target to be predicted at a preset number of past time steps. The exogenous sequences reflect influence factors of the target sequences, and each exogenous sequence includes exogenous data values at the preset number of past time steps. The plurality of exogenous sequences includes the exogenous sequences corresponding to each of the target sequences; it can be understood that, after the target sequences are obtained, the exogenous data corresponding to each target sequence are determined, and the sequence formed by the exogenous data values of each item of exogenous data at the preset number of past time steps is taken as one exogenous sequence.
The target sequences and the exogenous sequences can be determined according to the application scenario of the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method. The method can be applied to financial product trend prediction, short-term prediction of climate change, epidemic control trend prediction in public-health and medical scenarios, and the like. In one implementation of this embodiment, the method is applied to stock trend prediction for financial products; accordingly, the target sequences may be the prices of a plurality of target stocks and the exogenous sequences a plurality of index data series. Of course, in practical applications, the data represented by the target sequences and exogenous sequences may differ across application scenarios.
In one implementation manner of this embodiment, the determining the second-order exogenous sequences according to the exogenous sequences specifically includes:
for each exogenous sequence in the plurality of exogenous sequences, determining a correlation sequence of the exogenous sequence and each exogenous sequence in the plurality of exogenous sequences;
determining a plurality of second-order exogenous sequences according to all determined correlation sequences.
Specifically, the correlation is used to reflect the second-order relationship between two exogenous sequences, and the second-order exogenous sequences are composed of all the determined correlation sequences, each second-order exogenous sequence containing the second-order relations between exogenous sequences at the preset number of past time steps. Thus, the number of second-order exogenous sequences equals the preset number of past time steps, and each second-order exogenous sequence contains a number of elements equal to the square of the number of exogenous sequences. For example, if there are 4 exogenous sequences and each exogenous sequence includes the exogenous data values of the last 8 time steps, then there are 8 second-order exogenous sequences, each containing 16 elements. In one implementation of this embodiment, the second-order relationship between exogenous sequences may be determined according to a preset relationship function, which may differ across application scenarios. Therefore, after the plurality of exogenous sequences is obtained, the application scenario is determined from them, and the relationship function corresponding to that scenario is selected. The relationship function may be stored in advance locally on the terminal device executing the spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method, or on a remote server, in which case the terminal device connects to the remote server to obtain the relationship function corresponding to the application scenario.
In one implementation of the embodiment, the relationship function is described by taking stock-index trend prediction in a financial scenario as an example. Suppose the prediction targets are the share prices of several target stocks, Y = [Y_1, Y_2, ···, Y_T]^T, where T denotes the number of time steps, Y_t ∈ R^U, U is the number of target stocks, and Y_t^j denotes the price of the j-th stock at time t. The associated index data is X = [X^1, X^2, ···, X^N], where N denotes the number of index series, X^i ∈ R^T, and X_t^i denotes the data value of the i-th index series at time t. Each of the target stocks is included in several indices; assuming that the set of indices containing the i-th stock is Ω_i, the associated index set is defined as Ω_1 ∪ Ω_2 ∪ ··· ∪ Ω_U.
First, each exogenous sequence is converted into relative data in fluctuation form, so that the training set and the test set have data distributions as close as possible. The conversion may be:

x̃_t^i = (x_t^i - x_{t-1}^i) / x_{t-1}^i
ỹ_t^i = (y_t^i - y_{t-1}^i) / y_{t-1}^i

where x̃_t^i and ỹ_t^i denote the values at time t of the converted i-th exogenous sequence and the converted i-th target sequence, respectively.
After obtaining the converted exogenous sequences, the correlation of each exogenous sequence with each of the plurality of exogenous sequences is determined. The correlation of exogenous sequence x̃^i with exogenous sequence x̃^j may be calculated as

x̃_t^{i,j} = x̃_t^i · x̃_t^j

where x̃_t^{i,j} represents the relative second-order relation between the i-th exogenous sequence and the j-th exogenous sequence at time t. The meaning of this correlation is: when the two indices rise or fall together at time t, the correlation superposes the two trends, and when one rises while the other falls at time t, the correlation cancels them.
In addition, after the relative second-order relations corresponding to each exogenous sequence are obtained, all the relative second-order relations corresponding to an exogenous sequence determine the second-order exogenous sequence corresponding to it. The second-order exogenous sequences can thus be represented as

X̃ = [x̃_1, x̃_2, ···, x̃_T]

where x̃_t = (x̃_t^{i,j})_{i,j=1,···,N} ∈ R^{N²} collects the N² second-order relations at time t.
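As a concrete illustration of the construction just described, the following is a minimal sketch assuming the product form of the correlation above; the function names, array shapes, and the example sizes (4 series, 8 fluctuation steps) are illustrative, not taken from the patent.

    import numpy as np

    def fluctuation_form(x: np.ndarray) -> np.ndarray:
        """Convert raw series of shape (T+1, N) to relative fluctuations (T, N)."""
        return (x[1:] - x[:-1]) / x[:-1]

    def second_order_sequences(x_raw: np.ndarray) -> np.ndarray:
        """Return (T, N*N) second-order relations from raw series of shape (T+1, N)."""
        x = fluctuation_form(x_raw)          # (T, N)
        # Outer product per time step: joint rises/falls give positive entries,
        # opposite moves give negative entries, matching the cancellation rule above.
        r = np.einsum('ti,tj->tij', x, x)    # (T, N, N)
        return r.reshape(r.shape[0], -1)     # (T, N*N)

    # 4 exogenous series observed at 9 raw ticks give 8 fluctuation steps,
    # i.e. 8 second-order vectors of 16 elements each, matching the text.
    x_raw = np.random.rand(9, 4) + 1.0
    print(second_order_sequences(x_raw).shape)   # (8, 16)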
and S20, inputting the second-order exogenous sequences into the encoder, and determining the hidden space characteristics through the encoder.
Specifically, the encoder is used for acquiring high-dimensional feature information presented by a plurality of second-order exogenous sequences at past moments, and the encoder focuses on exogenous sequence features related to a target sequence through a space-time attention mechanism. The encoder comprises a space-time attention module and an activation module, wherein the space-time attention module is used for paying attention to exogenous sequence features related to a target sequence, and the activation module is used for outputting high-dimensional feature information presented by a plurality of exogenous sequences at the past moment.
Based on this, the inputting the second-order exogenous sequences into the encoder, and determining the hidden spatial features through the encoder specifically include:
inputting a plurality of second-order exogenous sequences into a space-time attention module, and determining candidate hidden space characteristics corresponding to the second-order exogenous sequences through the space-time attention module;
inputting each candidate hidden space feature into an activation module, and determining the hidden space feature through the activation module.
Specifically, the spatiotemporal attention module is connected with the activation module, the output items of the spatiotemporal attention module are input items of the activation module, and the input items of the spatiotemporal attention module are a plurality of second-order exogenous sequences. The candidate hidden space features correspond to the second-order exogenous sequences one by one, and each candidate hidden space feature is obtained by outputting the corresponding second-order exogenous sequence through a space-time attention module. It can be understood that when the exogenous sequence corresponding to the candidate implicit spatial feature is input into the spatiotemporal attention module, the spatiotemporal attention module outputs the candidate implicit spatial feature. In addition, the activation module comprises a plurality of long and short memory units, the long and short memory units correspond to the candidate implicit spatial features one by one, and each candidate implicit spatial feature is an input item of the corresponding long and short memory LSTM unit.
In one implementation of this embodiment, the encoder may be understood as learning a mapping function:

h_t = f(h_{t-1}, x̃_t)

where h_t ∈ R^m represents the hidden state of the encoder at time t, m represents the dimension of the hidden state, and f represents a nonlinear activation function. Each LSTM unit has a memory cell whose state at time t is s_t, and the update of the LSTM unit is governed by three gate states f_t, i_t and o_t, which we refer to as the forget gate, input gate and output gate, respectively. The update procedure of the LSTM unit may be expressed as:

f_t = σ(W_f [h_{t-1}; x̃_t] + b_f)
i_t = σ(W_i [h_{t-1}; x̃_t] + b_i)
o_t = σ(W_o [h_{t-1}; x̃_t] + b_o)
s_t = f_t ⊙ s_{t-1} + i_t ⊙ tanh(W_s [h_{t-1}; x̃_t] + b_s)
h_t = o_t ⊙ tanh(s_t)

where [h_{t-1}; x̃_t] denotes the concatenation of the hidden state h_{t-1} at the past time step and the input x̃_t at the current time step, and W_f, W_i, W_o, W_s and b_f, b_i, b_o, b_s are parameters that need to be learned. σ and ⊙ denote the sigmoid activation function and element-wise multiplication, respectively.
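The gate updates above are those of a standard LSTM cell. A minimal sketch follows, assuming PyTorch; torch.nn.LSTMCell implements the same equations, and the hand-rolled module is shown only to mirror the formulas one-to-one.

    import torch
    import torch.nn as nn

    class EncoderLSTMCell(nn.Module):
        def __init__(self, input_dim: int, hidden_dim: int):
            super().__init__()
            # One linear map per gate, each acting on the concatenation [h_{t-1}; x_t].
            self.gates = nn.Linear(hidden_dim + input_dim, 4 * hidden_dim)

        def forward(self, x_t, h_prev, s_prev):
            z = self.gates(torch.cat([h_prev, x_t], dim=-1))
            f_t, i_t, o_t, g_t = z.chunk(4, dim=-1)
            f_t, i_t, o_t = torch.sigmoid(f_t), torch.sigmoid(i_t), torch.sigmoid(o_t)
            s_t = f_t * s_prev + i_t * torch.tanh(g_t)   # memory-cell update
            h_t = o_t * torch.tanh(s_t)                  # hidden-state update
            return h_t, s_t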
In an implementation of this embodiment, as shown in fig. 3, the inputting the plurality of second-order exogenous sequences into the spatio-temporal attention module and determining, through the spatio-temporal attention module, the candidate hidden-space feature corresponding to each second-order exogenous sequence specifically includes:

the spatio-temporal attention module determining the spatio-temporal attention feature corresponding to each second-order exogenous sequence;

the spatio-temporal attention module determining, based on the spatio-temporal attention features at each time step, the weight sequence corresponding to each second-order exogenous sequence; and

the spatio-temporal attention module determining, based on each second-order exogenous sequence and its corresponding weight sequence, the candidate hidden-space feature corresponding to that second-order exogenous sequence.
In particular, the spatio-temporal attention feature and the weight sequence may be represented as:

e_t^{i,j} = v_e^T tanh(W_e [h_{t-1}; s_{t-1}] + U_e x̃^{i,j})
α_t^{i,j} = exp(e_t^{i,j}) / Σ_{i'=1}^{N} Σ_{j'=1}^{N} exp(e_t^{i',j'})

where x̃^{i,j} represents the second-order exogenous sequence generated by the i-th exogenous sequence relative to the j-th exogenous sequence, v_e, W_e and U_e are preset parameters, α_t^{i,j} is a weight value in the weight sequence reflecting the weight coefficient assigned to the second-order exogenous sequence x̃^{i,j}, e_t^{i,j} is an element of the spatio-temporal attention feature, and h_{t-1} and s_{t-1} denote the hidden state and the memory-cell state of the encoder at the previous time step.
Further, after the spatio-temporal attention features are obtained, they are normalized to ensure that all weight values in the weight sequence sum to 1. In this embodiment, the normalization uses the softmax function: after obtaining e_t^{i,j}, softmax is applied to normalize the corresponding α_t^{i,j} so that all weight coefficients sum to 1. In addition, after the weight sequence is obtained, each weight value is multiplied by the corresponding element of the second-order exogenous sequence to obtain the candidate hidden-space feature, which may be represented as:

x̃'_t = (α_t^{1,1} x̃_t^{1,1}, α_t^{1,2} x̃_t^{1,2}, ···, α_t^{N,N} x̃_t^{N,N})^T
based on this, after introducing the input end space-time attention mechanism on the basis of the LSTM unit, the hidden state mapping function corresponding to the encoder can be expressed as:
Figure BDA0002706998370000102
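To make the attention equations concrete, here is a minimal PyTorch sketch of the spatio-temporal input attention step; the module name, tensor shapes, and the exact arrangement of W_e, U_e and v_e are assumptions in the spirit of input-attention encoders, not details confirmed by the patent. The weighted vector it returns would replace the raw x̃_t as the LSTM input.

    import torch
    import torch.nn as nn

    class InputAttention(nn.Module):
        """Scores each of the N*N second-order series against the encoder state."""
        def __init__(self, seq_len: int, hidden_dim: int):
            super().__init__()
            self.W_e = nn.Linear(2 * hidden_dim, seq_len)       # acts on [h_{t-1}; s_{t-1}]
            self.U_e = nn.Linear(seq_len, seq_len, bias=False)  # acts on each series over time
            self.v_e = nn.Linear(seq_len, 1, bias=False)

        def forward(self, x2, h_prev, s_prev, t):
            # x2: (B, T, N*N) second-order sequences; h_prev, s_prev: (B, m).
            series = x2.permute(0, 2, 1)                    # (B, N*N, T), one row per (i, j)
            state = torch.cat([h_prev, s_prev], dim=-1)     # (B, 2m)
            e = self.v_e(torch.tanh(
                self.W_e(state).unsqueeze(1) + self.U_e(series)))  # (B, N*N, 1)
            alpha = torch.softmax(e.squeeze(-1), dim=-1)    # weights sum to 1 over all (i, j)
            return alpha * x2[:, t, :]                      # weighted input x̃'_t for time t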
and S30, inputting the implicit spatial features and the target sequences into the decoder, and determining the prediction sequence corresponding to the target sequence through the decoder.
Specifically, as shown in fig. 4, the decoder includes a linear module, a plurality of cascaded self-attention modules, and a fusion module, where the linear module has a plurality of target sequences as input items, the input item of the self-attention module positioned at the top in the cascaded order includes an output item of the linear module and a hidden spatial feature, the input item of the subsequent self-attention module in two adjacent self-attention modules in the cascaded order includes an output item of the previous self-attention module, and each of the input items of the respective attention modules includes a hidden spatial feature, the fusion module has an input item of the last self-attention module, and the fusion module has a prediction sequence as an output item.
In one implementation of this embodiment, each self-attention module is equivalent to learning a mapping function whose input and output dimensions are the same:

c^l = g(H, c^{l-1})

where c^l ∈ R^{T×d} denotes the output of the l-th self-attention module, H = [h_1, h_2, ···, h_T] denotes the hidden-space features produced by the encoder, and c^0 is determined from the input target sequences Y through the linear module. The linear module is configured as a multi-layer perceptron (MLP), and c^0 may be computed as:

c^0 = W_0 Y + b_0

where W_0 and b_0 are preset parameters.
In one implementation of this embodiment, each self-attention module includes a first multi-head attention unit, a first fusion unit, a second multi-head attention unit, a second fusion unit, a linear unit, and a third fusion unit. The input of the first multi-head attention unit comprises the output of the preceding self-attention module; the input of the first fusion unit comprises the output of the first multi-head attention unit and the output of the preceding self-attention module; the input of the second multi-head attention unit comprises the hidden-space features and the output of the first fusion unit; the input of the second fusion unit comprises the output of the second multi-head attention unit and the input of the second multi-head attention unit; the input of the linear unit is the output of the second fusion unit; and the input of the third fusion unit comprises the output of the linear unit and the input of the linear unit.
Further, the multi-head attention mechanisms configured in the first multi-head attention unit and the second multi-head attention unit each correspond to a mapping function, which can be expressed as:

Output = MHAtt(q, k, v)

where q, k, v ∈ R^{T×d} respectively denote the query, key and value matrices input to the module; they are determined from the inputs of the first and second multi-head attention units. The overall computation inside the mapping function can be expressed as:

Q = W_q q^T + b_q
K = W_k k^T + b_k
V = W_v v^T + b_v
Z = softmax(Q^T K / √d) V^T
Output = Z W_z + b_z

where W_q, W_k, W_v, W_z and b_q, b_k, b_v, b_z are preset parameters.
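A single-head rendering of the MHAtt computation may help; the sketch below assumes the usual scaled-dot-product layout with biases omitted. Multi-head attention splits d into parallel heads and concatenates the results, which torch.nn.MultiheadAttention handles internally.

    import math
    import torch

    def single_head_attention(q, k, v, W_q, W_k, W_v, W_z):
        # q, k, v: (T, d) query/key/value inputs; W_*: (d, d) parameter matrices.
        Q, K, V = q @ W_q, k @ W_k, v @ W_v                  # project inputs
        scores = torch.softmax(Q @ K.T / math.sqrt(Q.shape[-1]), dim=-1)  # (T, T)
        return (scores @ V) @ W_z                            # attention output, (T, d)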
Based on this, as shown in fig. 4, the self-attention module can be expressed as:

a^l = MHAtt(c^{l-1}, c^{l-1}, c^{l-1})
b^l = a^l + c^{l-1}
u^l = MHAtt(b^l, H, H)
v^l = u^l + b^l
w^l = Linear(v^l)
c^l = w^l + v^l

where a^l denotes the output of the first multi-head attention unit, b^l that of the first fusion unit, u^l that of the second multi-head attention unit, v^l that of the second fusion unit, w^l that of the linear unit, and c^l that of the third fusion unit.
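Wired exactly as the six equations above, one decoder block might look as follows; this is a sketch assuming the encoder hidden dimension equals the decoder model dimension d, and the class name and head count are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DecoderSelfAttentionBlock(nn.Module):
        """One cascaded block: self-attention, cross-attention over H, linear, residuals."""
        def __init__(self, d_model: int, n_heads: int = 4):
            super().__init__()
            self.self_att = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.cross_att = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.linear = nn.Linear(d_model, d_model)

        def forward(self, c_prev, H):
            # c_prev: (B, T, d) output of the previous block; H: (B, T, d) encoder features.
            a, _ = self.self_att(c_prev, c_prev, c_prev)  # first multi-head attention unit
            b = a + c_prev                                # first fusion unit (residual)
            u, _ = self.cross_att(b, H, H)                # second unit attends to hidden-space features
            v = u + b                                     # second fusion unit
            w = self.linear(v)                            # linear unit
            return w + v                                  # third fusion unit, yielding c^l

Stacking several such blocks, with c^0 supplied by the linear module, reproduces the cascade of fig. 4.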
further, the output item of the last self-attention mechanism module in the plurality of self-attention mechanism modules is acquired
Figure BDA00027069983700001110
Wherein the content of the first and second substances,
Figure BDA00027069983700001111
the predicted sequence is then determined by the fusion module. In a first implementation manner of this embodiment, the fusion module includes a fusion unit and a full-connection unit, the fusion unit is connected to the full-connection unit, and the fusion unit is configured to sum the corresponding input items according to a time dimension. It will be appreciated that upon acquisition
Figure BDA00027069983700001112
Figure BDA0002706998370000121
Then, in the time dimension
Figure BDA0002706998370000122
Summing to obtain the final prediction sequence needed by prediction through the full-connection unit
Figure BDA0002706998370000123
Can be expressed as:
Figure BDA0002706998370000124
Figure BDA0002706998370000125
wherein the content of the first and second substances,
Figure BDA0002706998370000126
is a preset parameter.
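A minimal sketch of the fusion module under these equations follows, assuming the fully-connected unit maps to one value per predicted target; W_c and b_c above correspond to the Linear layer.

    import torch
    import torch.nn as nn

    class FusionModule(nn.Module):
        def __init__(self, d_model: int, n_targets: int):
            super().__init__()
            self.fc = nn.Linear(d_model, n_targets)

        def forward(self, c_last):          # c_last: (B, T, d) output of the last block
            c = c_last.sum(dim=1)           # fusion unit: sum along the time dimension
            return self.fc(c)               # fully-connected unit, giving (B, U)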
In an implementation of this embodiment, the prediction network model may be trained with a mean-square-error loss function, whose expression may be:

L = (1/n) Σ_{i=1}^{n} ||Ŷ_i - Y_i||²

where Ŷ_i represents the predicted sequence of the i-th sample and Y_i represents the true sequence of the i-th sample.
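For completeness, here is a hedged sketch of one training step under this loss; the optimizer choice and the model interface are assumptions, since the patent does not specify them.

    import torch
    import torch.nn as nn

    def train_step(model, optimizer, x2, y_hist, y_true):
        """One MSE training step; model bundles the encoder and decoder described above."""
        optimizer.zero_grad()
        y_pred = model(x2, y_hist)                    # (B, U) predicted sequence
        loss = nn.functional.mse_loss(y_pred, y_true)
        loss.backward()
        optimizer.step()
        return loss.item()

    # Usage sketch (model assumed to be an nn.Module combining the pieces above):
    # optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # loss = train_step(model, optimizer, x2_batch, y_hist_batch, y_true_batch)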
In summary, the present embodiment provides a spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method applied to a prediction network model comprising an encoder configured with a spatio-temporal attention mechanism and a decoder configured with a self-attention mechanism. A plurality of target sequences to be predicted and a plurality of exogenous sequences are obtained, and a plurality of second-order exogenous sequences are determined from the exogenous sequences; the second-order exogenous sequences are input into the encoder, which determines hidden-space features; and the hidden-space features and the target sequences are input into the decoder, which determines the prediction sequences corresponding to the target sequences. In this way, the correlations among exogenous sequences are captured by the spatio-temporal attention mechanism at the encoder, while the self-attention mechanism better captures long-range dependencies of the time series, improving the prediction performance of the model and the accuracy of the predicted sequences.
In this embodiment, the method is executed by a trained prediction network model comprising an encoder and a decoder, which receives the target sequences to be predicted and the past-time information of the related exogenous sequences. At the encoder, the method extracts, through a recurrent neural network, the high-dimensional feature information presented by the related exogenous sequences at past time steps, while an attention mechanism over the input features guides the model to attend to the exogenous-sequence features closest to the prediction target. At the decoder, the method uses stacked self-attention modules to process the input target-sequence information and the high-dimensional features extracted by the encoder, thereby capturing long-range dependencies between sequences and finally producing the confidence of the trend prediction. At the encoder, the method innovatively introduces the concept of a second-order attention mechanism to extract the relations between input sequences. Owing to this attention mechanism, the method can, while predicting the trend, reveal the contribution of each exogenous sequence to the prediction, and thus has a degree of interpretability.
Based on the spatiotemporal two-stage attention-based nonlinear exogenous sequence prediction method, the present embodiment provides a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps in the spatiotemporal two-stage attention-based nonlinear exogenous sequence prediction method according to the above embodiments.
Based on the above spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method, the present application further provides a terminal device, as shown in fig. 5, which includes at least one processor 20, a display screen 21, and a memory 22, and may further include a communication interface 23 and a bus 24. The processor 20, the display screen 21, the memory 22 and the communication interface 23 communicate with one another through the bus 24. The display screen 21 is configured to display a user guidance interface preset in the initial setting mode. The communication interface 23 may transmit information. The processor 20 may call logic instructions in the memory 22 to perform the methods in the embodiments described above.
Furthermore, the logic instructions in the memory 22 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product.
The memory 22, which is a computer-readable storage medium, may be configured to store a software program, a computer-executable program, such as program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes the functional application and data processing, i.e. implements the method in the above-described embodiments, by executing the software program, instructions or modules stored in the memory 22.
The memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application program required for at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 22 may include high-speed random access memory and may also include non-volatile memory, for example, any medium that can store program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk; it may also be a transient storage medium.
In addition, the specific processes loaded and executed by the storage medium and by the instruction processors in the terminal device have been described in detail in the method above and are not restated here.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A spatiotemporal two-stage attention-based non-linear exogenous sequence prediction method applied to a prediction network model, the prediction network model comprising an encoder and a decoder, wherein the encoder is configured with a spatiotemporal attention mechanism and the decoder is configured with a self-attention mechanism, the method comprising:
acquiring a plurality of target sequences to be predicted and a plurality of exogenous sequences, and determining a plurality of second-order exogenous sequences according to the plurality of exogenous sequences;
inputting the plurality of second-order exogenous sequences into the encoder, and determining hidden-space features through the encoder; and
inputting the hidden-space features and the plurality of target sequences into the decoder, and determining, through the decoder, the prediction sequences corresponding to the target sequences.
2. The spatiotemporal two-stage attention-based nonlinear exogenous sequence prediction method according to claim 1, wherein the determining a plurality of second-order exogenous sequences from a plurality of exogenous sequences specifically comprises:
for each exogenous sequence in the plurality of exogenous sequences, determining a correlation sequence of the exogenous sequence and each exogenous sequence in the plurality of exogenous sequences;
determining a plurality of second-order exogenous sequences according to all determined correlation sequences.
3. The spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method according to claim 1, wherein the encoder comprises a spatio-temporal attention module and an activation module, and the inputting the second-order exogenous sequences into the encoder and determining hidden-space features through the encoder specifically comprises:
inputting the plurality of second-order exogenous sequences into the spatio-temporal attention module, and determining, through the spatio-temporal attention module, the candidate hidden-space feature corresponding to each second-order exogenous sequence;
inputting each candidate hidden-space feature into the activation module, and determining the hidden-space features through the activation module.
4. The spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method according to claim 3, wherein the inputting the plurality of second-order exogenous sequences into the spatio-temporal attention module and determining, through the spatio-temporal attention module, the candidate hidden-space feature corresponding to each second-order exogenous sequence specifically comprises:
the spatio-temporal attention module determining the spatio-temporal attention feature corresponding to each second-order exogenous sequence;
the spatio-temporal attention module determining, based on the spatio-temporal attention features at each time step, the weight sequence corresponding to each second-order exogenous sequence; and
the spatio-temporal attention module determining, based on each second-order exogenous sequence and its corresponding weight sequence, the candidate hidden-space feature corresponding to that second-order exogenous sequence.
5. The spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method according to claim 3, wherein the activation module comprises a plurality of long short-term memory (LSTM) units, the LSTM units correspond one-to-one with the candidate hidden-space features, and each candidate hidden-space feature is an input of its corresponding LSTM unit.
6. The spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method according to claim 1, wherein the decoder comprises a linear module, a plurality of cascaded self-attention modules, and a fusion module; the inputs of the linear module are the plurality of target sequences; the input of the first self-attention module comprises the output of the linear module and the hidden-space features; the input of each subsequent self-attention module comprises the output of the preceding self-attention module; the input of each self-attention module comprises the hidden-space features; the input of the fusion module is the output of the last self-attention module; and the output of the fusion module is the prediction sequence.
7. The spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method according to claim 6, wherein each self-attention module comprises a first multi-head attention unit, a first fusion unit, a second multi-head attention unit, a second fusion unit, a linear unit, and a third fusion unit; the input of the first multi-head attention unit comprises the output of the preceding self-attention module; the input of the first fusion unit comprises the output of the first multi-head attention unit and the output of the preceding self-attention module; the input of the second multi-head attention unit comprises the hidden-space features and the output of the first fusion unit; the input of the second fusion unit comprises the output of the second multi-head attention unit and the input of the second multi-head attention unit; the input of the linear unit is the output of the second fusion unit; and the input of the third fusion unit comprises the output of the linear unit and the input of the linear unit.
8. The spatio-temporal two-stage attention-based nonlinear exogenous sequence prediction method according to claim 6, wherein the fusion module comprises a fusion unit and a fully-connected unit, the fusion unit is connected with the fully-connected unit, and the fusion unit is used to sum its corresponding inputs along the time dimension.
9. A computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to perform the steps of the spatiotemporal two-stage attention-based nonlinear exogenous sequence prediction method according to any one of claims 1-8.
10. A terminal device, comprising: a processor, a memory, and a communication bus; the memory has stored thereon a computer readable program executable by the processor;
the communication bus realizes connection communication between the processor and the memory;
the processor, when executing the computer readable program, implements the steps in the spatiotemporal two-stage attention-based non-linear exogenous sequence prediction method of any one of claims 1-8.
CN202011042266.4A (filed 2020-09-28, priority 2020-09-28): Time-space two-stage attention-based nonlinear exogenous sequence prediction method; status: Pending; publication: CN112132353A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202011042266.4A | 2020-09-28 | 2020-09-28 | Time-space two-stage attention-based nonlinear exogenous sequence prediction method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202011042266.4A | 2020-09-28 | 2020-09-28 | Time-space two-stage attention-based nonlinear exogenous sequence prediction method

Publications (1)

Publication Number | Publication Date
CN112132353A | 2020-12-25

Family

ID=73844299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011042266.4A Pending CN112132353A (en) 2020-09-28 2020-09-28 Time-space two-stage attention-based nonlinear exogenous sequence prediction method

Country Status (1)

Country Link
CN (1) CN112132353A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114743072A (en) * 2022-05-24 2022-07-12 中国科学院计算机网络信息中心 Training method of short-term time sequence prediction model


Similar Documents

Publication Publication Date Title
Liu et al. Stock market prediction with deep learning: The case of China
EP4089587A1 (en) Data processing method and related device
EP4209965A1 (en) Data processing method and related device
US20190138887A1 (en) Systems, methods, and media for gated recurrent neural networks with reduced parameter gating signals and/or memory-cell units
CN110264270B (en) Behavior prediction method, behavior prediction device, behavior prediction equipment and storage medium
CN110383299A (en) The generation time model of memory-enhancing effect
CN111898247B (en) Landslide displacement prediction method, landslide displacement prediction equipment and storage medium
CN114358657B (en) Post recommendation method and device based on model fusion
US20240135174A1 (en) Data processing method, and neural network model training method and apparatus
CN110781970A (en) Method, device and equipment for generating classifier and storage medium
CN111695024A (en) Object evaluation value prediction method and system, and recommendation method and system
CN113239702A (en) Intention recognition method and device and electronic equipment
CN112132353A (en) Time-space two-stage attention-based nonlinear exogenous sequence prediction method
CN113869596A (en) Task prediction processing method, device, product and medium
KR102409041B1 (en) portfolio asset allocation reinforcement learning method using actor critic model
Unadkat et al. Deep learning for financial prediction
CN116542673A (en) Fraud identification method and system applied to machine learning
CN115860802A (en) Product value prediction method, device, computer equipment and storage medium
CN113642654B (en) Image feature fusion method and device, electronic equipment and storage medium
CN115841596A (en) Multi-label image classification method and training method and device of multi-label image classification model
CN114612231A (en) Stock quantitative trading method and device, terminal device and readable storage medium
CN113010687B (en) Exercise label prediction method and device, storage medium and computer equipment
CN111179070A (en) Loan risk timeliness prediction system and method based on LSTM
CN110955755A (en) Method and system for determining target standard information
CN111126423A (en) Feature set acquisition method and device, computer equipment and medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20201225)