CN109643387A - Systems and methods for learning and predicting time-series data using deep multiplicative networks - Google Patents

Systems and methods for learning and predicting time-series data using deep multiplicative networks Download PDF

Info

Publication number
CN109643387A
CN109643387A CN201780053794.XA CN201780053794A CN109643387A CN 109643387 A CN109643387 A CN 109643387A CN 201780053794 A CN201780053794 A CN 201780053794A CN 109643387 A CN109643387 A CN 109643387A
Authority
CN
China
Prior art keywords
layer
current
encoder
decoder
feedback information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201780053794.XA
Other languages
Chinese (zh)
Inventor
P. Burchard
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Goldman Sachs and Co LLC
Original Assignee
Goldman Sachs and Co LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US15/666,379 external-priority patent/US10839316B2/en
Priority claimed from US15/681,942 external-priority patent/US11353833B2/en
Application filed by Goldman Sachs and Co LLC filed Critical Goldman Sachs and Co LLC
Publication of CN109643387A publication Critical patent/CN109643387A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

A method includes using a computing network (100) to learn and predict time-series data. The computing network includes one or more layers (102a, 102b, 102c), each layer having an encoder (104a, 104b, 104c) and a decoder (106a, 106b, 106c). The encoder of each layer multiplicatively combines (i) current feed-forward information from a lower layer or a computing network input (112) and (ii) past feedback information from a higher layer or the same layer. The encoder of each layer generates current feed-forward information for a higher layer or the same layer. The decoder of each layer multiplicatively combines (i) current feedback information from a higher layer or the same layer and (ii) at least one of current feed-forward information from a lower layer or the computing network input, or past feed-forward information from the lower layer or the computing network input. The decoder of each layer generates current feedback information for a lower layer or a computing network output (114).

Description

Systems and methods for learning and predicting time-series data using deep multiplicative networks
Technical field
The present disclosure relates generally to machine learning and prediction. More specifically, this disclosure relates to systems and methods for learning and predicting time-series data using deep multiplicative networks.
Background
"Machine learning" generally refers to computing techniques that are designed to learn from data and to perform predictive analyses on data. Neural networks are one example type of machine learning technique based on biological networks, such as the human brain. In a neural network, data processing is performed using artificial neurons, which are coupled together and exchange processed data over various communication links. "Learning" in a neural network can be achieved by altering weights associated with the communication links so that some data is treated as being more important than other data.
"Time-series prediction" refers to predictions made by a machine learning algorithm using time-series data, such as data values collected over time via one or more sensory inputs. Time-series prediction is an important component of intelligence. For example, the ability of an intelligent entity to predict an input time series can allow the intelligent entity to create a model of the world (or some smaller portion of it).
Summary of the invention
This disclosure provides systems and methods for learning and predicting time-series data using deep multiplicative networks.
In a first embodiment, a method includes using a computing network to learn and predict time-series data. The computing network includes one or more layers, and each layer includes an encoder and a decoder. The encoder of each layer multiplicatively combines (i) current feed-forward information from a lower layer or a computing network input and (ii) past feedback information from a higher layer or the same layer. The encoder of each layer generates current feed-forward information for a higher layer or the same layer. The decoder of each layer multiplicatively combines (i) current feedback information from a higher layer or the same layer and (ii) at least one of current feed-forward information from a lower layer or the computing network input, or past feed-forward information from the lower layer or the computing network input. The decoder of each layer generates current feedback information for a lower layer or a computing network output.
In a second embodiment, an apparatus includes at least one processing device and at least one memory storing instructions that, when executed by the at least one processing device, cause the at least one processing device to use a computing network to learn and predict time-series data. The computing network includes one or more layers, and each layer includes an encoder and a decoder. The encoder of each layer is configured to multiplicatively combine (i) current feed-forward information from a lower layer or a computing network input and (ii) past feedback information from a higher layer or the same layer. The encoder of each layer is configured to generate current feed-forward information for a higher layer or the same layer. The decoder of each layer is configured to multiplicatively combine (i) current feedback information from a higher layer or the same layer and (ii) at least one of current feed-forward information from a lower layer or the computing network input, or past feed-forward information from the lower layer or the computing network input. The decoder of each layer is configured to generate current feedback information for a lower layer or a computing network output.
In a third embodiment, a non-transitory computer-readable medium contains instructions that, when executed by at least one processing device, cause the at least one processing device to use a computing network to learn and predict time-series data. The computing network includes one or more layers, and each layer includes an encoder and a decoder. The encoder of each layer is configured to multiplicatively combine (i) current feed-forward information from a lower layer or a computing network input and (ii) past feedback information from a higher layer or the same layer. The encoder of each layer is configured to generate current feed-forward information for a higher layer or the same layer. The decoder of each layer is configured to multiplicatively combine (i) current feedback information from a higher layer or the same layer and (ii) at least one of current feed-forward information from a lower layer or the computing network input, or past feed-forward information from the lower layer or the computing network input. The decoder of each layer is configured to generate current feedback information for a lower layer or a computing network output.
Other technical features may be readily apparent to those skilled in the art from the following figures, descriptions, and claims.
Brief description of the drawings
For a more complete understanding of this disclosure and its features, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
FIG. 1 illustrates an example architecture implementing a deep multiplicative network for learning and predicting time-series data according to this disclosure;
FIG. 2 illustrates an example system for learning and predicting time-series data using a deep multiplicative network according to this disclosure; and
FIG. 3 illustrates an example method for learning and predicting time-series data using a deep multiplicative network according to this disclosure.
Detailed description
FIGS. 1 through 3, discussed below, and the various embodiments used to describe the principles of the present invention in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the invention. Those skilled in the art will understand that the principles of the present invention may be implemented in any type of suitably arranged device or system.
As noted above, time-series prediction is an important component of intelligence, such as when it allows an intelligent entity (such as a person) to create a predictive model of the world around him or her. The motor intentions of an intelligent entity can also naturally form part of a time series. A "motor intention" generally refers to a desired motor action associated with neural signals, such as moving a person's arm or leg or opening/closing a person's hand based on different neural signals. Including past motor intentions in a prediction allows the effects of those motor intentions on the surrounding world to be modeled. Moreover, if the intelligent entity includes a control system that can compute optimal motor intentions with respect to some high-level goal of affecting the world, predictions of future motor intentions can be made more accurately without always having to undergo full optimization, which can provide enormous savings in terms of computation and energy. Specific examples of time-series data that can be learned and predicted include natural language (including text or speech) and video.
Time-series prediction using neural networks has traditionally been accomplished using purely feed-forward neural networks or shallow recurrent neural networks. A recurrent neural network refers to a neural network in which connections between nodes form a directed cycle or closed loop, where no node or connection in the directed cycle or closed loop is repeated other than the starting and ending node (which denote the same node). More recently, deep recurrent networks using long short-term memory have been devised. Although these networks use some multiplicative elements, they are predominantly additive in order to keep backpropagation feasible.
In one aspect of this disclosure, apparatuses, systems, methods, and computer-readable media for learning and predicting time-series data are provided. Learning and prediction are accomplished by (i) extracting high-level information via the multiplicative combination (using learnable weights) of current low-level information and past high-level information and (ii) feeding back predictions of the future of the time series via the multiplicative combination of predicted future high-level information with current and/or past low-level information. In this approach, feed-forward and feedback are combined via the multiplicative combination of high-level and low-level information, and a deep recurrent network is formed.
FIG. 1 illustrates an example architecture 100 implementing a deep multiplicative network for learning and predicting time-series data according to this disclosure. As shown in FIG. 1, the architecture 100 includes one or more layers 102a-102c. In this example, the architecture 100 includes three layers 102a-102c, although other numbers of layers could be used in the architecture 100.
The layers 102a-102c respectively include encoders 104a-104c and respectively include decoders 106a-106c. The encoders 104a-104c are configured to generate and output feed-forward information 108a-108c, respectively. The encoders 104a-104b in the layers 102a-102b are configured to output the feed-forward information 108a-108b to the next higher layers 102b-102c. The encoder 104c of the highest layer 102c is configured to output the feed-forward information 108c for use by the highest layer 102c itself (for both feedback and feed-forward purposes).
The decoders 106a-106c are configured to generate and output feedback information 110a-110c, respectively. The decoders 106b-106c in the layers 102b-102c are configured to output the feedback information 110b-110c to the next lower layers 102a-102b. The decoder 106a of the lowest layer 102a is configured to output the feedback information 110a from the architecture 100.
Feed-forward information is received as an input 112 at the lowest layer 102a of the architecture 100. A single input 112 denotes a current time-series value, and multiple inputs 112 denote a sequence of values provided to the lowest layer 102a, forming a time series of data. Feedback information is provided from the lowest layer 102a of the architecture 100 as a predicted next value 114. A single predicted next value 114 denotes a predicted future time-series value, and multiple predicted next values 114 denote a sequence of values provided from the lowest layer 102a, forming a predicted time series of data. The highest feed-forward (and first feedback) information 108c denotes the highest-level encoding of the input time-series data.
The layers 102a-102c further respectively include delay units 116a-116c for feedback and optionally respectively include delay units 118a-118c for feed-forward. The delay units 116a-116c are configured to receive feedback information and delay that information by one or more time units. In some embodiments, the delay units 116a-116c could provide different information delays, such as when the delay(s) of a higher layer are longer than the delay(s) of a lower layer. The delayed information is then provided from the delay units 116a-116c to the encoders 104a-104c. The delay units 118a-118c are configured to receive the input 112 or feed-forward information and possibly delay that information by zero or more time units. Again, in some embodiments, the delay units 118a-118c could provide different information delays, such as when the delay(s) of a higher layer are longer than the delay(s) of a lower layer. The (possibly) delayed information is provided from the delay units 118a-118c to the decoders 106a-106c.
In some embodiments, the input 112 or feed-forward information 108a-108b provided to the encoders 104a-104c could first be passed through non-linear pooling units 120a-120c, respectively. The pooling units 120a-120c operate to reduce the dimensionality of the data in a manner that increases its invariance to transformations. For example, so-called l2 pooling units can provide invariance to unitary group representations, such as translations and rotations.
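A minimal sketch of what such an l2 pooling operation could look like is given below; the group size, the zero padding, and the use of NumPy are illustrative assumptions and are not taken from the disclosure.
```python
import numpy as np

def l2_pool(x, group_size=4):
    """Illustrative l2 pooling: collapse each group of feed-forward values to its
    Euclidean norm, reducing dimensionality while discarding within-group detail
    (e.g., small shifts of a pattern across the units in a group)."""
    x = np.asarray(x, dtype=float)
    n_groups = -(-x.size // group_size)              # ceiling division
    padded = np.zeros(n_groups * group_size)
    padded[:x.size] = x
    groups = padded.reshape(n_groups, group_size)
    return np.sqrt((groups ** 2).sum(axis=1))        # one pooled value per group
```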
Each of the encoders 104a-104c and decoders 106a-106c is configured to multiplicatively combine its inputs. For example, each encoder 104a-104c is configured to multiplicatively combine (i) current (and possibly pooled) feed-forward information from a lower layer or the input 112 and (ii) delayed feedback information from a higher layer (or, for the highest layer, from its own layer) in order to generate current feed-forward information. The current feed-forward information is then provided to a higher layer (or to the same layer at the top). Each decoder 106a-106c is configured to multiplicatively combine (i) current feedback information from a higher layer (or, for the highest layer, from its own layer) and (ii) current (and possibly delayed) feed-forward information from a lower layer or the input 112 in order to generate current feedback information. The current feedback information is then provided to a lower layer or serves as a predicted next value. As shown in FIG. 1, the feed-forward information 108c from the highest layer 102c is fed back to itself, delayed as appropriate by the delay unit 116c.
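For illustration only, the following sketch shows one possible way a single layer of the architecture 100 could be organized in code. The bilinear (outer-product) weights, the tanh damping, and the one-time-step delay buffers are assumptions made for the sketch; the disclosure only requires that each encoder and decoder multiplicatively combine its two inputs.
```python
import numpy as np

class MultiplicativeLayer:
    """Sketch of one layer of the architecture 100: the encoder multiplicatively
    combines current feed-forward input with delayed (past) feedback, and the
    decoder multiplicatively combines current feedback with the stored
    feed-forward input."""

    def __init__(self, in_dim, state_dim, rng=None):
        rng = rng or np.random.default_rng(0)
        # Bilinear weights: one weight matrix per output unit (an illustrative
        # choice of multiplicative combination, not mandated by the disclosure).
        self.W_enc = rng.normal(0.0, 0.1, (state_dim, in_dim, state_dim))
        self.W_dec = rng.normal(0.0, 0.1, (in_dim, state_dim, in_dim))
        self.past_feedback = np.ones(state_dim)      # delay unit 116 (one step; arbitrary init)
        self.past_feedforward = np.zeros(in_dim)     # delay unit 118 (one step)

    def encode(self, feedforward_in):
        """Current feed-forward info x past feedback info -> current feed-forward out."""
        state = np.tanh(np.einsum('kij,i,j->k', self.W_enc,
                                  feedforward_in, self.past_feedback))
        self.past_feedforward = feedforward_in       # remembered for the decoder
        return state

    def decode(self, feedback_in):
        """Current feedback info x current/past feed-forward info -> current feedback out."""
        prediction = np.einsum('kij,i,j->k', self.W_dec,
                               feedback_in, self.past_feedforward)
        self.past_feedback = feedback_in             # becomes past feedback next step
        return prediction
```
Stacking such layers, with each encoder's output feeding the next higher layer and each decoder's output feeding the next lower layer, mirrors the feed-forward/feedback flow of FIG. 1.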
In some embodiments, the architecture 100 shown in FIG. 1 could be used to implement an auto-encoder. An "auto-encoder" is a neural network or other machine learning algorithm that attempts to generate an encoding for a set of data. The encoding denotes a representation of the set of data but with reduced dimensionality. Ideally, the encoding allows the auto-encoder to predict future values in time-series data based on initial values in the time-series data. The ability to predict time-series data can find use in a wide range of applications.
This can be accomplished by having each decoder 106a-106c multiplicatively combine (i) current feedback information from a higher layer (or, for the highest layer, from its own layer) and (ii) current and past feed-forward information from a lower layer or the input 112. This allows the network to generalize the inertial auto-encoder, which uses an inertial combination of a current feed-forward value, one past feed-forward value, and constant higher-level feedback.
Generally, a network implementing an auto-encoder is typically designed so that its output approximately reproduces its input. When applied to time-series data, an auto-encoder is "causal" in the sense that it reproduces future information using only past information. Iterated, such a causal auto-encoder can reproduce an entire time series from itself, meaning the causal auto-encoder can identify the entire time series based on the initial values of the time series. Ideally, the layers 102a-102c fully encode the input 112 so that the final encoded representation of the input 112 (the information 108c) is highly constrained (such as sparse). The encoded representation of the input 112 can also ideally be used to generate the predicted next values 114 denoting an approximate reproduction of the input 112. For time-series data, a causal auto-encoder would approximately reproduce future inputs as the predicted next values 114 based on past inputs 112, thereby allowing the causal auto-encoder to make predictions of the time-series data.
In some embodiments, a causal auto-encoder may be most useful when the final encoding is as high-level and invariant as possible, so that the same encoding can be used for many time steps. Invariance can be achieved in FIG. 1 via pooling and/or by multiplicatively encoding the time-series data into lower-dimensional encodings. However, in order to approximately reproduce the original input 112 one time step later (as required of a causal auto-encoder), the discarded low-level information needs to be added back into the computation. With this understanding, rather than using a pure feed-forward network for the auto-encoder, the feed-forward information 108a-108b can be used to compute a high-level invariant encoding (the information 108c), and the feedback information 110a-110b through the same network can be used, via multiplicative decoding, to enrich the predicted next values 114 with the non-invariant information.
Each of the layers 102a-102c includes any suitable structure(s) for encoding data, providing dimensionality reduction, or performing any other suitable processing operations. For example, each of the layers 102a-102c could be implemented using hardware or a combination of hardware and software/firmware instructions.
The multiplicative combination in each of the encoders 104a-104c and decoders 106a-106c can take various forms. For example, the multiplicative combination could include numerical multiplication or a Boolean AND function. The multiplicative combination generally forms part of the transfer function of an encoding or decoding node, which may also include or perform other mathematical operations (such as sigmoidal damping of the input signals). As a particular example, the multiplicative combination can provide some approximation of a Boolean AND operation, thereby allowing a node to operate as a general state machine. Thus, a node can check whether an input is x and a state is y and, if so, determine that the new state should be z.
As with other machine learning systems, the architecture 100 can be trained so that the encoders 104a-104c, the decoders 106a-106c, and the delay units 116a-116c, 118a-118c operate as desired. Until now, deep multiplicative networks have generally been avoided because purely feed-forward deep multiplicative networks are difficult to train using standard backpropagation techniques. In some embodiments, a training approach for the architecture 100 (which combines feed-forward and feedback units) would repeatedly apply the following steps given time-series training data. For each time step in the training data, forward-propagate the training data through each encoder 104a-104c and delay unit 116a-116c/118a-118c (the latter constituting forward propagation in time), and back-propagate the training data through each decoder 106a-106c (updating its non-delayed feedback inputs 110b-110c/108c). Then, simultaneously across all time steps, update the weights of the encoders 104a-104c and decoders 106a-106c to better reproduce the current training outputs from the current training inputs. In some embodiments, post-processing could also be performed at each encoder 104a-104c and/or decoder 106a-106c if desired, such as by normalizing and/or sparsifying its weights. This leads to stable convergence to a locally optimal network.
In other embodiments, the encoders 104a-104c could instead be trained using sparse coding techniques adapted to the recurrent and multiplicative nature of the architecture 100. In this unsupervised process, training involves alternately (i) updating the weights of an encoder and (ii) updating the encoding states (which, due to the recurrence of the architecture 100, serve as inputs to the training rather than only as outputs). In each iteration, the output activations of the encoder are normalized across the training set, totaled separately across inputs and states, for each training pair. All of the encoder's weights are then reduced by a fixed amount. The combination of normalization and reduction is intended to keep the weights sparse. Sparsity can be particularly useful for multiplicative networks because the total number of possible weights is very large. Once the encoders have a good representation of the combinations of context and input via sparse coding, the associated decoders in each layer can be trained, such as by using a frequency analysis of how these encoding states combine with the actual future values of the time series. Of course, any other suitable training mechanisms could be used with the components of the architecture 100.
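One possible shape of the alternating update, normalization, and shrinkage steps described above is sketched below; the learning rate, the shrinkage amount, and the bilinear encoder form are assumptions made for illustration and are not values given in the disclosure.
```python
import numpy as np

def sparse_encoder_step(W, inputs, states, targets, lr=0.01, shrink=1e-3):
    """One illustrative pass of the alternating scheme: (i) update the bilinear
    encoder weights toward the target encodings, (ii) normalize the output
    activations across the training set, (iii) shrink all weights by a fixed
    amount to encourage sparsity."""
    # (i) gradient update from the squared error of each encoding
    for x, s, t in zip(inputs, states, targets):
        y = np.einsum('kij,i,j->k', W, x, s)             # multiplicative combination
        err = t - y
        W = W + lr * np.einsum('k,i,j->kij', err, x, s)

    # (ii) normalize the per-unit output activations across the training set
    acts = np.stack([np.einsum('kij,i,j->k', W, x, s) for x, s in zip(inputs, states)])
    scale = np.abs(acts).mean(axis=0) + 1e-8
    W = W / scale[:, None, None]

    # (iii) reduce every weight by a fixed amount (soft threshold) to keep W sparse
    W = np.sign(W) * np.maximum(np.abs(W) - shrink, 0.0)
    return W
```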
The architecture 100 shown in FIG. 1 can find use in a number of applications. For example, the architecture 100 could be applied to natural language understanding and generation. As a particular example, assume the architecture 100 includes four levels. Through feed-forward, the four layers of the architecture 100 (moving upward in the architecture 100) could encode letters into phonemes, encode phonemes into words, encode words into phrases, and encode phrases into sentences. Through feedback, the four layers of the architecture 100 (moving downward in the architecture 100) could combine sentence context with current and/or past phrase information to predict the next phrase, combine phrase context with current and/or past word information to predict the next word, combine word context with current and/or past phoneme information to predict the next phoneme, and combine phoneme context with current and/or past letter information to predict the next letter. In the architecture 100, each layer (other than the lowest layer 102a) transitions its state more slowly than the layer below it because the information at that layer denotes a more invariant encoding state. The less invariant lower-level information is then fed back into the prediction by the decoders used for predicting. Both the encoders and the decoders, because of their multiplicative nature, can be thought of as "state machines" representing the grammar at that particular level of abstraction.
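For the four-level natural-language example, the stack could be configured roughly as below, reusing the hypothetical MultiplicativeLayer class from the earlier sketch; all dimensions are placeholders chosen for illustration, not values from the disclosure.
```python
# Letters -> phonemes -> words -> phrases -> sentences (illustrative dimensions).
layers = [
    MultiplicativeLayer(in_dim=26,  state_dim=48),     # letters into phonemes
    MultiplicativeLayer(in_dim=48,  state_dim=96),     # phonemes into words
    MultiplicativeLayer(in_dim=96,  state_dim=128),    # words into phrases
    MultiplicativeLayer(in_dim=128, state_dim=160),    # phrases into sentences
]
```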
Although FIG. 1 illustrates one example of an architecture 100 implementing a deep multiplicative network for learning and predicting time-series data, various changes may be made to FIG. 1. For example, the architecture 100 need not include three layers and could include any other number of layers in any suitable arrangement (including a single layer).
FIG. 2 illustrates an example system 200 for learning and predicting time-series data using a deep multiplicative network according to this disclosure. As shown in FIG. 2, the system 200 denotes a computing system that includes at least one processing device 202, at least one storage device 204, at least one communications unit 206, and at least one input/output (I/O) unit 208.
The processing device 202 executes instructions that may be loaded into a memory 210. The processing device 202 includes any suitable number(s) and type(s) of processors or other devices in any suitable arrangement. Example types of processing devices 202 include microprocessors, microcontrollers, digital signal processors, field programmable gate arrays, application specific integrated circuits, and discrete circuitry.
The memory device 210 and a persistent storage 212 are examples of storage devices 204, which represent any structure(s) capable of storing and facilitating retrieval of information (such as data, program code, and/or other suitable information on a temporary or permanent basis). The memory device 210 may represent a random access memory or any other suitable volatile or non-volatile storage device(s). The persistent storage 212 may contain one or more components or devices supporting longer-term storage of data, such as a read-only memory, hard drive, Flash memory, or optical disc.
The communications unit 206 supports communications with other systems or devices. For example, the communications unit 206 could include a network interface card or a wireless transceiver facilitating communications over a wired or wireless network. The communications unit 206 may support communications through any suitable physical or wireless communication link(s).
The I/O unit 208 allows for input and output of data. For example, the I/O unit 208 may provide a connection for user input through a keyboard, mouse, keypad, touchscreen, or other suitable input device. The I/O unit 208 may also send output to a display, printer, or other suitable output device.
In some embodiments, the instructions executed by the processing device 202 could include instructions that implement the architecture 100 of FIG. 1. For example, the instructions executed by the processing device 202 could include instructions that implement the various encoders, decoders, and delay units shown in FIG. 1, as well as instructions that support the flow and exchange of data involving those components.
Although FIG. 2 illustrates one example of a system 200 for learning and predicting time-series data using a deep multiplicative network, various changes may be made to FIG. 2. For example, it is assumed here that the architecture 100 of FIG. 1 is implemented using software/firmware executed by the processing device 202. However, any suitable hardware-only implementation or any suitable combination of hardware and software/firmware could be used to implement this functionality. Also, computing devices come in a wide variety of configurations, and FIG. 2 does not limit this disclosure to any particular computing device.
FIG. 3 illustrates an example method 300 for learning and predicting time-series data using a deep multiplicative network according to this disclosure. For ease of explanation, the method 300 is described as being implemented by the device 200 of FIG. 2 using the architecture 100 of FIG. 1. Note, however, that the method 300 could be implemented in any other suitable manner.
As shown in FIG. 3, a computing network is trained at step 302. This could include, for example, the processing device 202 of the device 200 receiving training time-series data and providing the data to the architecture 100 of FIG. 1. As noted above, the architecture 100 includes one or more layers 102a-102c, and each layer includes a respective encoder 104a-104c and a respective decoder 106a-106c. In some embodiments, training could occur by repeating the following operations. For each time step, forward-propagate the training data through the encoders 104a-104c and the delay units 116a-116c/118a-118c. For each time step, back-propagate the training data through the decoders 106a-106c. Update the encoders 104a-104c and the decoders 106a-106c to better reproduce the training outputs from the training inputs across the time steps. Apply any desired post-processing to the encoders 104a-104c and the decoders 106a-106c, such as normalization and/or sparsification.
Input time-series data is received at the computing network at step 304. This could include, for example, the processing device 202 of the device 200 receiving the time-series data from any suitable source, such as one or more sensors or other input devices. This could also include the processing device 202 of the device 200 providing the time-series data to the layer 102a of the architecture 100 as the inputs 112.
At step 306, at each layer of the computing network, current feed-forward information from a lower layer or the computing network input is multiplicatively combined with past feedback information from a higher layer or the same layer. Each encoder generates current feed-forward information for a higher layer or for itself at step 308. This could include, for example, the processing device 202 of the device 200 using the encoder 104a to multiplicatively combine the feed-forward information (the input 112) with past feedback information from the decoder 106b of the layer 102b. This could also include the processing device 202 of the device 200 using the encoder 104b to multiplicatively combine the feed-forward information 108a from the encoder 104a with past feedback information from the decoder 106c of the layer 102c. This could further include the processing device 202 of the device 200 using the encoder 104c to multiplicatively combine the feed-forward information 108b from the encoder 104b with past feedback information from itself.
At step 310, at each layer of the computing network, current feedback information from a higher layer or the same layer is multiplicatively combined with current and/or past feed-forward information from a lower layer or the computing network input. Each decoder generates current feedback information for a lower layer or for itself at step 312. This could include, for example, the processing device 202 of the device 200 using the decoder 106c to multiplicatively combine the feedback information from the encoder 104c (the information 108c) with current/past feed-forward information from the encoder 104b of the layer 102b. This could also include the processing device 202 of the device 200 using the decoder 106b to multiplicatively combine the feedback information 110c from the decoder 106c with current/past feed-forward information from the encoder 104a of the layer 102a. This could further include the processing device 202 of the device 200 using the decoder 106a to multiplicatively combine the feedback information 110b from the decoder 106b with current/past feed-forward information (the input 112).
Note that in steps 306-312, each of the layers 102a-102b other than the highest layer 102c sends its current feed-forward information to the next higher layer 102b-102c, and each of the layers 102b-102c other than the lowest layer 102a sends its current feedback information to the next lower layer 102a-102b. The highest layer 102c uses its current feed-forward information 108c as its current feedback information, and the lowest layer 102a sends its current feedback information as the predicted next values 114 to the computing network output. The current feed-forward information provided to the lowest layer 102a denotes current time-series values, and the current feedback information provided from the lowest layer 102a denotes predicted future time-series values. Note that, for each layer 102a-102c, past feedback information can be generated by delaying the current feedback information from a higher layer or from the layer itself. Also, for each layer 102a-102c, past feed-forward information can be generated by delaying the current feed-forward information from a lower layer. In addition, the current feed-forward information provided to the encoders 104a-104c could first be passed through the pooling units 120a-120c to reduce dimensionality or increase the translation invariance of the time-series data.
In this manner, the computing network is used to predict time-series data at step 314. This could include, for example, the processing device 202 of the device 200 using the computing network to predict an entire sequence of time-series data based on a limited number of inputs 112.
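A rough sketch of this iterative prediction, assuming the hypothetical MultiplicativeLayer class from the earlier sketch, could look as follows; it is illustrative only and omits training, pooling, and multi-step delays.
```python
def predict_sequence(layers, seed_values, horizon):
    """Prime the stack with observed inputs 112, then iterate on the network's
    own predicted next values 114 (illustrative sketch of step 314)."""
    def step(x):
        feed = x
        for layer in layers:                 # feed-forward pass up the stack
            feed = layer.encode(feed)
        back = feed                          # top encoding reused as feedback
        for layer in reversed(layers):       # feedback pass down the stack
            back = layer.decode(back)
        return back

    prediction = None
    for x in seed_values:                    # prime with the observed inputs
        prediction = step(x)
    outputs = []
    for _ in range(horizon):                 # roll forward on the predictions
        outputs.append(prediction)
        prediction = step(prediction)
    return outputs
```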
Although FIG. 3 illustrates one example of a method 300 for learning and predicting time-series data using a deep multiplicative network, various changes may be made to FIG. 3. For example, while shown as a series of steps, various steps could overlap, occur in parallel, occur in a different order, or occur any number of times. As a particular example, steps 306-314 could generally overlap with one another.
In some embodiments, various functions described in this patent document are implemented or supported by a computer program that is formed from computer readable program code and that is embodied in a computer readable medium. The phrase "computer readable program code" includes any type of computer code, including source code, object code, and executable code. The phrase "computer readable medium" includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory. A "non-transitory" computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical or other signals. A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.
It may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The terms "application" and "program" refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in suitable computer code (including source code, object code, or executable code). The term "communicate," as well as derivatives thereof, encompasses both direct and indirect communication. The terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation. The term "or" is inclusive, meaning and/or. The phrase "associated with," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, have a relationship to or with, or the like. The phrase "at least one of," when used with a list of items, means that different combinations of one or more of the listed items may be used, and only one item in the list may be needed. For example, "at least one of: A, B, and C" includes any of the following combinations: A, B, C, A and B, A and C, B and C, and A and B and C.
The description in this patent document should not be read as implying that any particular element, step, or function is an essential or critical element that must be included in the claim scope. Also, none of the claims is intended to invoke 35 U.S.C. § 112(f) with respect to any of the appended claims or claim elements unless the exact words "means for" or "step for" are explicitly used in the particular claim, followed by a participle phrase identifying a function. Use of terms such as (but not limited to) "mechanism," "module," "device," "unit," "component," "element," "member," "apparatus," "machine," "system," "processor," "processing device," or "controller" within a claim is understood and intended to refer to structures known to those skilled in the relevant art, as further modified or enhanced by the features of the claims themselves, and is not intended to invoke 35 U.S.C. § 112(f).
While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.

Claims (27)

1. A method comprising:
using a computing network to learn and predict time-series data, the computing network comprising one or more layers, each layer comprising an encoder and a decoder;
wherein the encoder of each layer multiplicatively combines (i) current feed-forward information from a lower layer or a computing network input and (ii) past feedback information from a higher layer or the same layer, the encoder of each layer generating current feed-forward information for a higher layer or the same layer; and
wherein the decoder of each layer multiplicatively combines (i) current feedback information from a higher layer or the same layer and (ii) at least one of current feed-forward information from a lower layer or the computing network input, or past feed-forward information from the lower layer or the computing network input, the decoder of each layer generating current feedback information for a lower layer or a computing network output.
2. The method of claim 1, wherein:
the computing network comprises multiple layers;
each layer other than a highest layer sends its current feed-forward information to a next higher layer; and
each layer other than a lowest layer sends its current feedback information to a next lower layer.
3. The method of claim 2, wherein:
the highest layer uses its current feed-forward information as its current feedback information; and
the lowest layer sends its current feedback information to the computing network output.
4. The method of claim 2, wherein:
the current feed-forward information provided to the lowest layer denotes current time-series values; and
the current feedback information provided from the lowest layer denotes predicted future time-series values.
5. The method of claim 1, further comprising:
for each layer, generating the past feedback information from the higher layer or that layer by delaying the current feedback information from the higher layer or that layer.
6. The method of claim 1, further comprising:
for each layer, generating the past feed-forward information from the lower layer or the computing network input by delaying the current feed-forward information from the lower layer or the computing network input.
7. The method of claim 1, wherein the current feed-forward information from the lower layer or the computing network input provided to the encoder of each layer is first passed through a pooling unit that reduces dimensionality or increases a translation invariance of the current feed-forward information.
8. The method of claim 1, further comprising:
training the encoder and the decoder of each layer.
9. The method of claim 8, wherein:
the computing network comprises multiple layers that collectively comprise multiple encoders, multiple decoders, and multiple delay units; and
training the encoder and the decoder of each layer comprises:
for each of multiple time steps, forward-propagating training data through the encoders and the delay units;
for each of the time steps, back-propagating the training data through the decoders;
updating the encoders and the decoders to improve their reproduction of the training data across the time steps; and
applying post-processing to the encoders and the decoders.
10. An apparatus comprising:
at least one processing device; and
at least one memory storing instructions that, when executed by the at least one processing device, cause the at least one processing device to use a computing network to learn and predict time-series data, the computing network comprising one or more layers, each layer comprising an encoder and a decoder;
wherein the encoder of each layer is configured to multiplicatively combine (i) current feed-forward information from a lower layer or a computing network input and (ii) past feedback information from a higher layer or the same layer, the encoder of each layer configured to generate current feed-forward information for a higher layer or the same layer; and
wherein the decoder of each layer is configured to multiplicatively combine (i) current feedback information from a higher layer or the same layer and (ii) at least one of current feed-forward information from a lower layer or the computing network input, or past feed-forward information from the lower layer or the computing network input, the decoder of each layer configured to generate current feedback information for a lower layer or a computing network output.
11. The apparatus of claim 10, wherein:
the computing network comprises multiple layers;
each layer other than a highest layer is configured to send its current feed-forward information to a next higher layer; and
each layer other than a lowest layer is configured to send its current feedback information to a next lower layer.
12. The apparatus of claim 11, wherein:
the highest layer is configured to use its current feed-forward information as its current feedback information; and
the lowest layer is configured to send its current feedback information to the computing network output.
13. The apparatus of claim 11, wherein:
the lowest layer is configured to receive current feed-forward information comprising current time-series values; and
the lowest layer is configured to provide current feedback information comprising predicted future time-series values.
14. The apparatus of claim 10, wherein each layer is configured to delay the current feedback information from the higher layer or that layer in order to generate the past feedback information for that layer.
15. The apparatus of claim 10, wherein each layer is configured to delay the current feed-forward information from the lower layer or the computing network input in order to generate the past feed-forward information for that layer.
16. The apparatus of claim 10, wherein the computing network further comprises multiple pooling units, each pooling unit configured to receive the current feed-forward information from the lower layer or the computing network input and to reduce dimensionality or increase a translation invariance of the current feed-forward information.
17. The apparatus of claim 10, wherein the at least one processing device is further configured to train the encoder and the decoder of each layer.
18. The apparatus of claim 17, wherein:
the computing network comprises multiple layers that collectively comprise multiple encoders, multiple decoders, and multiple delay units; and
to train the encoder and the decoder of each layer, the at least one processing device is configured to:
for each of multiple time steps, forward-propagate training data through the encoders and the delay units;
for each of the time steps, back-propagate the training data through the decoders;
update the encoders and the decoders to improve their reproduction of the training data across the time steps; and
apply post-processing to the encoders and the decoders.
19. A non-transitory computer-readable medium containing instructions that, when executed by at least one processing device, cause the at least one processing device to:
use a computing network to learn and predict time-series data, the computing network comprising one or more layers, each layer comprising an encoder and a decoder;
wherein the encoder of each layer is configured to multiplicatively combine (i) current feed-forward information from a lower layer or a computing network input and (ii) past feedback information from a higher layer or the same layer, the encoder of each layer configured to generate current feed-forward information for a higher layer or the same layer; and
wherein the decoder of each layer is configured to multiplicatively combine (i) current feedback information from a higher layer or the same layer and (ii) at least one of current feed-forward information from a lower layer or the computing network input, or past feed-forward information from the lower layer or the computing network input, the decoder of each layer configured to generate current feedback information for a lower layer or a computing network output.
20. The non-transitory computer-readable medium of claim 19, wherein:
the computing network comprises multiple layers;
each layer other than a highest layer is configured to send its current feed-forward information to a next higher layer; and
each layer other than a lowest layer is configured to send its current feedback information to a next lower layer.
21. The non-transitory computer-readable medium of claim 20, wherein:
the highest layer is configured to use its current feed-forward information as its current feedback information; and
the lowest layer is configured to send its current feedback information to the computing network output.
22. The non-transitory computer-readable medium of claim 20, wherein:
the lowest layer is configured to receive current feed-forward information comprising current time-series values; and
the lowest layer is configured to provide current feedback information comprising predicted future time-series values.
23. The non-transitory computer-readable medium of claim 19, wherein each layer is configured to delay the current feedback information from the higher layer or that layer in order to generate the past feedback information for that layer.
24. The non-transitory computer-readable medium of claim 19, wherein each layer is configured to delay the current feed-forward information from the lower layer or the computing network input in order to generate the past feed-forward information for that layer.
25. The non-transitory computer-readable medium of claim 19, wherein the computing network further comprises multiple pooling units, each pooling unit configured to receive the current feed-forward information from the lower layer or the computing network input and to reduce dimensionality or increase a translation invariance of the current feed-forward information.
26. The non-transitory computer-readable medium of claim 19, further containing instructions that, when executed by the at least one processing device, cause the at least one processing device to train the encoder and the decoder of each layer.
27. The non-transitory computer-readable medium of claim 26, wherein:
the computing network comprises multiple layers that collectively comprise multiple encoders, multiple decoders, and multiple delay units; and
the instructions that when executed cause the at least one processing device to train the encoder and the decoder of each layer comprise instructions that when executed cause the at least one processing device to:
for each of multiple time steps, forward-propagate training data through the encoders and the delay units;
for each of the time steps, back-propagate the training data through the decoders;
update the encoders and the decoders to improve their reproduction of the training data across the time steps; and
apply post-processing to the encoders and the decoders.
CN201780053794.XA 2016-09-01 2017-08-30 Systems and methods for learning and predicting time-series data using deep multiplicative networks Pending CN109643387A (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US201662382774P 2016-09-01 2016-09-01
US62/382774 2016-09-01
US15/666379 2017-08-01
US15/666,379 US10839316B2 (en) 2016-08-08 2017-08-01 Systems and methods for learning and predicting time-series data using inertial auto-encoders
US15/681,942 US11353833B2 (en) 2016-08-08 2017-08-21 Systems and methods for learning and predicting time-series data using deep multiplicative networks
US15/681942 2017-08-21
PCT/US2017/049358 WO2018045021A1 (en) 2016-09-01 2017-08-30 Systems and methods for learning and predicting time-series data using deep multiplicative networks

Publications (1)

Publication Number Publication Date
CN109643387A true CN109643387A (en) 2019-04-16

Family

ID=61301606

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780053794.XA 2016-09-01 2017-08-30 Systems and methods for learning and predicting time-series data using deep multiplicative networks Pending CN109643387A (en)

Country Status (5)

Country Link
EP (1) EP3507746A4 (en)
CN (1) CN109643387A (en)
AU (1) AU2017321524B2 (en)
CA (1) CA3033753A1 (en)
WO (1) WO2018045021A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175338A (en) * 2019-05-31 2019-08-27 北京金山数字娱乐科技有限公司 A kind of data processing method and device
CN112581031A (en) * 2020-12-30 2021-03-30 杭州朗阳科技有限公司 Method for realizing real-time monitoring of motor abnormity by Recurrent Neural Network (RNN) through C language

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241688B (en) * 2020-01-15 2023-08-25 北京百度网讯科技有限公司 Method and device for monitoring composite production process
CN111709785B (en) * 2020-06-18 2023-08-22 抖音视界有限公司 Method, apparatus, device and medium for determining user retention time

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6125105A (en) * 1997-06-05 2000-09-26 Nortel Networks Corporation Method and apparatus for forecasting future values of a time series
US9146546B2 (en) * 2012-06-04 2015-09-29 Brain Corporation Systems and apparatus for implementing task-specific learning using spiking neurons
TR201514432T1 (en) * 2013-06-21 2016-11-21 Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi Method for pseudo-recurrent processing of data using a feedforward neural network architecture
US11080587B2 (en) * 2015-02-06 2021-08-03 Deepmind Technologies Limited Recurrent neural networks for data item generation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175338A (en) * 2019-05-31 2019-08-27 北京金山数字娱乐科技有限公司 A kind of data processing method and device
CN110175338B (en) * 2019-05-31 2023-09-26 北京金山数字娱乐科技有限公司 Data processing method and device
CN112581031A (en) * 2020-12-30 2021-03-30 杭州朗阳科技有限公司 Method for realizing real-time monitoring of motor abnormity by Recurrent Neural Network (RNN) through C language
CN112581031B (en) * 2020-12-30 2023-10-17 杭州朗阳科技有限公司 Method for implementing real-time monitoring of motor abnormality by Recurrent Neural Network (RNN) through C language

Also Published As

Publication number Publication date
EP3507746A4 (en) 2020-06-10
AU2017321524A1 (en) 2019-02-28
WO2018045021A1 (en) 2018-03-08
EP3507746A1 (en) 2019-07-10
AU2017321524B2 (en) 2022-03-10
CA3033753A1 (en) 2018-03-08

Similar Documents

Publication Publication Date Title
CN109643387A Systems and methods for learning and predicting time-series data using deep multiplicative networks
US20230419074A1 (en) Methods and systems for neural and cognitive processing
US11353833B2 (en) Systems and methods for learning and predicting time-series data using deep multiplicative networks
Head The Anthropoceneans
CN104662526A (en) Apparatus and methods for efficient updates in spiking neuron networks
Zainuddin et al. A review of crossover methods and problem representation of genetic algorithm in recent engineering applications
Janssen A generative evolutionary design method
Du An English network teaching method supported by artificial intelligence technology and WBIETS system
Rastovic Targeting and synchronization at tokamak with recurrent artificial neural networks
Yu et al. Fast and accurate text classification: Skimming, rereading and early stopping
Sun et al. Chaotic time series prediction of nonlinear systems based on various neural network models
CN109844770A Systems and methods for learning and predicting time-series data using inertial auto-encoders
Modi et al. On the architecture of a human-centered CAD agent system
Cho et al. Parallel parsing in a Gradient Symbolic Computation parser
Elbattah et al. ML-aided simulation: A conceptual framework for integrating simulation models with machine learning
Kulkarni et al. Modelling and enterprises-the past, the present and the future
Liu et al. Cellular automata inspired multistable origami metamaterials for mechanical learning
Wessing et al. Replacing FEA for sheet metal forming by surrogate modeling
Holmberg Designing and prototyping towards anticipatory applications
Cascone et al. Architectural self-fabrication
Junguo et al. A hybrid neural genetic method for load forecasting based on phase space reconstruction
Motzev Statistical learning networks in simulations for business training and education
Hirano et al. Efficient Parameter Tuning for Multi-agent Simulation Using Deep Reinforcement Learning
Cocho-Bermejo et al. Phenotype Variability Mimicking as a Process for the Test and Optimization of Dynamic Facade Systems
Schillaci et al. Re-enacting sensorimotor experience for cognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40001498

Country of ref document: HK

RJ01 Rejection of invention patent application after publication

Application publication date: 20190416