CN107274029A - A future prediction method for interacting agents in a dynamic scene - Google Patents

A future prediction method for interacting agents in a dynamic scene

Info

Publication number
CN107274029A
CN107274029A
Authority
CN
China
Prior art keywords
medium
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710487508.2A
Other languages
Chinese (zh)
Inventor
夏春秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vision Technology Co Ltd filed Critical Shenzhen Vision Technology Co Ltd
Priority to CN201710487508.2A priority Critical patent/CN107274029A/en
Publication of CN107274029A publication Critical patent/CN107274029A/en
Withdrawn legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks

Abstract

The present invention proposes a future prediction method for interacting agents in a dynamic scene. Its main components are multi-sample generation, sample ranking and refinement, and scene fusion. The process is as follows: a conditional variational autoencoder first generates multiple hypothetical prediction samples; a long-term decision-making method based on an inverse optimal control framework then scores, ranks and refines the samples; a recurrent neural network fuses the image context and the agents' activities into the scene representation; and the result is fed back into the decoder for further iteration. The invention can handle foresight of future events for any agent in a dynamic or static scene, provides a set of recurrent neural networks that generate, rank and refine samples, and markedly improves prediction accuracy.

Description

A future prediction method for interacting agents in a dynamic scene
Technical field
The present invention relates to the field of action prediction, and in particular to a future prediction method for interacting agents in a dynamic scene.
Background technology
Predicting how a scene will evolve from past motion trajectories, i.e. predicting future events, has attracted wide attention in recent years. Against the background of increasingly ubiquitous big data, applications such as human-computer interaction and robot navigation encounter many unprecedented scenes, and a future prediction method can to a large extent train intelligent devices to allocate and execute current scenes and tasks in real time. The method also has high application potential in task-heavy, infrastructure-equipped settings such as crowd surveillance and resource exploration, and immeasurable humanitarian and market value in the distributed coordination of multi-task command systems, collision avoidance among multiple vehicles, and early-warning management of natural-disaster escape routes.
Future prediction nevertheless remains a very difficult challenge. The scenes involved in the task include not only the visual dimension but also logical considerations, and endowing the task with intelligent recognition requires training and testing on large amounts of data. On the one hand, spatio-temporal instability blurs the visual input, and the computational burden that grows with data volume degrades real-time performance; on the other hand, the many interacting agents influence one another and cannot be separated for individual quantitative analysis, which makes predicting their actions difficult.
The present invention proposes a new long-term decision-making framework based on inverse optimal control. A conditional variational autoencoder first generates multiple hypothetical prediction samples; a long-term decision-making method based on an inverse optimal control framework then scores, ranks and refines the samples; a recurrent neural network fuses the image context and the agents' activities into the scene; and the result is fed back into the decoder for further iteration. The invention can handle foresight of future events for any agent in a dynamic or static scene, provides a set of recurrent neural networks that generate, rank and refine samples, and markedly improves prediction accuracy.
Summary of the invention
To solve the problem of predicting the future actions of multiple interacting agents in a complex scene, the object of the present invention is to provide a future prediction method for interacting agents in a dynamic scene, for which a new long-term decision-making framework based on inverse optimal control is proposed.
To solve the above problems, the present invention provides a future prediction method for interacting agents in a dynamic scene, whose main components are:
(1) multi-sample generation;
(2) sample ranking and refinement;
(3) scene fusion.
The multi-sample generation comprises a generic model formulation and a conditional variational autoencoder.
Further, in the generic model formulation, any object that plays an interactive role in an event in a scene, whether static or moving, is called an agent. Given n agents with past motion trajectories X = &lt;X1, X2, …, Xn&gt;, the probability of their future trajectories Y = &lt;Y1, Y2, …, Yn&gt; is P(Y | X, I), where I is the input observation of the scene; the maximum-probability trajectory given the input is output as the prediction.
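As a minimal sketch of this formulation (the trajectories and probabilities below are illustrative, not from the patent), outputting the prediction amounts to selecting the hypothesis with the highest probability under P(Y | X, I):

```python
import math

def select_prediction(hypotheses, log_probs):
    """Return the hypothesis with maximum probability P(Y | X, I).

    `hypotheses` is a list of candidate future trajectories and
    `log_probs` their model log-probabilities (both names are illustrative).
    """
    best = max(range(len(hypotheses)), key=lambda k: log_probs[k])
    return hypotheses[best]

paths = [[(0, 0), (1, 0)], [(0, 0), (0, 1)], [(0, 0), (1, 1)]]
scores = [math.log(0.2), math.log(0.5), math.log(0.3)]
print(select_prediction(paths, scores))  # the highest-probability trajectory
```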
Further, the conditional variational autoencoder uses a deep generative model to learn a decision function f that maps the past trajectories X and the input I onto the future trajectories Y, specifically:
(1) A random latent variable z_i is introduced and, conditioned on the input X_i, the probability distribution P(Y_i|X_i) of the output Y_i is learned. This builds three corresponding neural networks: a recognition network Q_φ(z_i|Y_i,X_i), a conditional prior network P_ν(z_i|X_i), and a generation network P_θ(Y_i|X_i,z_i);
(2) In the training stage: ① two recurrent neural networks encode agent i's X_i and Y_i, giving two results; ② the two results are merged and passed to a fully connected layer with a nonlinear activation function; ③ two parallel fully connected networks then generate the mean μ̂ and standard deviation σ̂ of z_i, from which a Gaussian model is built and regularized with the KL divergence; ④ two loss functions adjust the autoencoder: a reconstruction loss and a KL-divergence loss l_KLD = D_KL(Q_φ(z_i|Y_i,X_i) ‖ P_ν(z_i));
(3) In the test stage: ① since the future trajectory Y_i is unavailable it is discarded, and the past trajectory X_i is combined with multiple latent-variable samples z_i^(k); ② in contrast to the training stage, X_i and β(z_i^(k)) are now fed into the recurrent-neural-network decoder to produce a set of multiple prediction hypotheses.
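Two of the CVAE ingredients above, the KL regularizer of step ④ and the test-time latent sampling, can be sketched in plain Python. This is a toy illustration under the usual diagonal-Gaussian assumption; the values of `mu`, `sigma` and the dimensions are made up:

```python
import math
import random

def kl_to_standard_normal(mu, sigma):
    """Closed-form D_KL( N(mu, diag(sigma^2)) || N(0, I) ): the l_KLD
    term that pulls the recognition network toward the prior."""
    return sum(0.5 * (s * s + m * m - 1.0 - math.log(s * s))
               for m, s in zip(mu, sigma))

def sample_latents(mu, sigma, k, seed=0):
    """Draw K latent samples z^(k) = mu + sigma * eps (eps ~ N(0, 1)),
    the reparameterized sampling used to generate multiple hypotheses."""
    rng = random.Random(seed)
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
            for _ in range(k)]

# The KL term vanishes exactly when the posterior equals the prior N(0, I).
print(kl_to_standard_normal([0.0, 0.0], [1.0, 1.0]))     # 0.0
print(len(sample_latents([0.0, 0.0], [1.0, 1.0], k=5)))  # 5 hypotheses
```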
Further, the sample ranking and refinement comprises two parts: sample evaluation and refinement, and iterative feedback.
Further, the sample evaluation and refinement: for future-event prediction, the decision-making approach of the reinforcement-learning framework is adopted: after training, an agent selects the action that maximizes the long-term reward and executes it as the future event. A recurrent-neural-network model is designed to measure the long-term accumulated reward of the predictions Ŷ_i^(k) made in each training pass, specifically:
(1) Scoring of agent i: given K prediction samples Ŷ_i^(k), the evaluation score s of each sample is
$$s\left(\hat{Y}_i^{(k)};\,I,X,\hat{Y}_{j\setminus i}^{(\forall)}\right)=\sum_{t=1}\psi\left(\hat{y}_{i,t}^{(k)};\,I,X,\hat{Y}_{\tau<t}^{(\forall)}\right)\quad(1)$$
where Ŷ_{j\i}^(∀) denotes the prediction samples of all agents other than i, ŷ_{i,t}^(k) is agent i's prediction sample at time t, Ŷ_{τ&lt;t}^(∀) denotes all prediction samples before timestamp t, and ψ is the reward function assigned to each time step t, realized by a fully connected layer attached to the recurrent neural network;
(2) Refinement of agent i: during scoring, the recurrent-neural-network model simultaneously estimates a regression vector ΔŶ_i^(k), using the regression function η:
$$\Delta\hat{Y}_i^{(k)}=\eta\left(\hat{Y}_i^{(k)};\,I,X,\hat{Y}_{j\setminus i}^{(\forall)}\right)\quad(2)$$
where η accumulates all past scene contexts and the dynamics of all interacting agents while estimating the optimal ΔŶ_i^(k) over the whole time dimension.
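Equation (1) accumulates a per-timestep reward into a sample score, so ranking the K hypotheses reduces to sorting by accumulated reward. A toy sketch, where the listed reward values stand in for the outputs of the learned function ψ:

```python
def rank_samples(per_step_rewards):
    """Score K hypotheses by accumulated reward s_k = sum_t psi_t
    (equation (1)) and return their indices ordered best-first."""
    scores = [sum(rewards) for rewards in per_step_rewards]
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    return order, scores

# Three hypotheses with per-timestep rewards standing in for psi outputs:
order, scores = rank_samples([[0.1, 0.2], [0.5, 0.4], [0.3, 0.3]])
print(order)  # [1, 2, 0]: hypothesis 1 accrues the most reward
```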
Further, the iterative feedback: the optimal regression displacement vector ΔŶ_i^(k) obtained from equation (2) is used to gradually refine the set of prediction hypotheses Ŷ_i^(k), specifically:
(1) In each iteration, Ŷ_i^(k) is updated to Ŷ_i^(k) + ΔŶ_i^(k) and then passed to the inverse-optimal-control module;
(2) The inverse-optimal-control module is trained to rank and refine; it has two kinds of loss terms: a cross-entropy loss l_CE = H(p, q), where q is obtained from the sample scores through an activation function, and a regression loss term;
(3) For one training batch of the neural network, the overall multi-task loss combines the above loss terms, where N is the number of agents in the batch.
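The feedback loop of steps (1) and (2) can be illustrated with a toy regressor: each pass adds an estimated displacement ΔŶ, and the ranking module's cross-entropy loss compares a distribution q over sample scores with a target p. Everything below (the step size, the pull-toward-target regressor, the choice of softmax as the activation) is an illustrative assumption, not the patent's exact design:

```python
import math

def refine(prediction, target, iterations=4, step=0.5):
    """Iterative feedback: update Y <- Y + delta_Y each pass. Here
    delta_Y is a toy regressor pulling the hypothesis toward `target`."""
    for _ in range(iterations):
        prediction = [p + step * (t - p) for p, t in zip(prediction, target)]
    return prediction

def softmax(scores):
    """Turn sample scores into a distribution q over the K hypotheses."""
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    """l_CE = H(p, q) = -sum_k p_k log q_k."""
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q) if pk > 0)

print(refine([0.0, 0.0], [8.0, 4.0]))  # [7.5, 3.75], converging on the target
q = softmax([2.0, 1.0, 0.1])
print(round(cross_entropy([1.0, 0.0, 0.0], q), 3))  # loss if sample 0 is best
```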
Further, the scene fusion comprises context embedding and interaction features.
Further, the context embedding: a recurrent neural network is used which receives the following input X_t at each timestamp t:
$$X_t=\left[\gamma(\hat{v}_{i,t}),\;p\!\left(\hat{y}_{i,t};\rho(I)\right),\;r\!\left(\hat{y}_{i,t};\,\hat{y}_{j\setminus i},\,h_{\hat{Y}_{j\setminus i}}\right)\right]\quad(4)$$
where v̂_{i,t} is the velocity of ŷ_{i,t} at time t; γ is a fully connected layer with a nonlinear activation function that maps the velocity into a high-dimensional representation space; p pools the convolutional-neural-network feature ρ(I) located at ŷ_{i,t}; r gathers, through a fusion layer, the hidden vectors over the other agents' spatial domains; and the self-embedding vector serves as the initial hidden-state vector of the recurrent neural network.
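Assembling the per-timestep RNN input of equation (4) is a concatenation of the three feature groups. A sketch with made-up vector contents and dimensions:

```python
def build_step_input(velocity_embed, pooled_cnn_feature, social_feature):
    """Assemble the per-timestep RNN input of equation (4):
    X_t = [gamma(v_hat), pooled image feature, fused social feature].
    All three components are precomputed vectors here (illustrative)."""
    return velocity_embed + pooled_cnn_feature + social_feature

x_t = build_step_input([0.1, 0.2], [0.7], [0.3, 0.4, 0.5])
print(len(x_t))  # 6: the three parts concatenated
```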
Further, the interaction features: spatial grid operations with pooling layers keep the interactions between different samples of different agents from vanishing. Specifically, for agent i at time t, the spatial grid of its sample k is centred on ŷ_{i,t}^(k); then, for each grid cell g, the features of all hidden vectors inside it are average-pooled.
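The interaction feature can be sketched as average pooling over a small spatial grid centred on the sample's position. The cell size, grid extent and hidden vectors below are illustrative assumptions:

```python
def grid_average_pool(center, others, cell_size=1.0, grid=3):
    """Average-pool the hidden vectors of other agents' samples over a
    grid x grid spatial grid centred on `center`. `others` is a list of
    ((x, y), hidden_vector) pairs; agents outside the grid are ignored."""
    half = grid // 2
    dim = len(others[0][1]) if others else 0
    pooled = {}  # cell -> (running sum vector, count)
    for (x, y), h in others:
        gx = int((x - center[0]) // cell_size)
        gy = int((y - center[1]) // cell_size)
        if abs(gx) > half or abs(gy) > half:
            continue  # outside the grid: interaction not pooled
        s, c = pooled.get((gx, gy), ([0.0] * dim, 0))
        pooled[(gx, gy)] = ([a + b for a, b in zip(s, h)], c + 1)
    return {cell: [v / c for v in s] for cell, (s, c) in pooled.items()}

cells = grid_average_pool((0.0, 0.0),
                          [((0.2, 0.3), [1.0, 3.0]),
                           ((0.4, 0.1), [3.0, 1.0]),
                           ((9.0, 9.0), [5.0, 5.0])])  # far away: dropped
print(cells)  # {(0, 0): [2.0, 2.0]}
```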
Brief description of the drawings
Fig. 1 is the system flow chart of the future prediction method for interacting agents in a dynamic scene according to the present invention.
Fig. 2 is an example of the future prediction method for interacting agents in a dynamic scene according to the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of this application and the features therein may be combined with one another. The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is the system flow chart of the future prediction method for interacting agents in a dynamic scene according to the present invention. The method mainly comprises data input; multi-sample generation; sample ranking and refinement; and scene fusion.
The multi-sample generation comprises a generic model formulation and a conditional variational autoencoder.
In the generic model formulation, any object that plays an interactive role in an event in a scene, whether static or moving, is called an agent. Given n agents with past motion trajectories X = &lt;X1, X2, …, Xn&gt;, the probability of their future trajectories Y = &lt;Y1, Y2, …, Yn&gt; is P(Y | X, I), where I is the input observation of the scene; the maximum-probability trajectory given the input is output as the prediction.
The conditional variational autoencoder uses a deep generative model to learn a decision function f that maps the past trajectories X and the input I onto the future trajectories Y, specifically:
(1) A random latent variable z_i is introduced and, conditioned on the input X_i, the probability distribution P(Y_i|X_i) of the output Y_i is learned. This builds three corresponding neural networks: a recognition network Q_φ(z_i|Y_i,X_i), a conditional prior network P_ν(z_i|X_i), and a generation network P_θ(Y_i|X_i,z_i);
(2) In the training stage: ① two recurrent neural networks encode agent i's X_i and Y_i, giving two results; ② the two results are merged and passed to a fully connected layer with a nonlinear activation function; ③ two parallel fully connected networks then generate the mean μ̂ and standard deviation σ̂ of z_i, from which a Gaussian model is built and regularized with the KL divergence; ④ two loss functions adjust the autoencoder: a reconstruction loss and a KL-divergence loss l_KLD = D_KL(Q_φ(z_i|Y_i,X_i) ‖ P_ν(z_i));
(3) In the test stage: ① since the future trajectory Y_i is unavailable it is discarded, and the past trajectory X_i is combined with multiple latent-variable samples z_i^(k); ② in contrast to the training stage, X_i and β(z_i^(k)) are now fed into the recurrent-neural-network decoder to produce a set of multiple prediction hypotheses.
Sample ranking and refinement comprises two parts: sample evaluation and refinement, and iterative feedback.
Sample evaluation and refinement: for future-event prediction, the decision-making approach of the reinforcement-learning framework is adopted: after training, an agent selects the action that maximizes the long-term reward and executes it as the future event. A recurrent-neural-network model is designed to measure the long-term accumulated reward of the predictions Ŷ_i^(k) made in each training pass, specifically:
(1) Scoring of agent i: given K prediction samples Ŷ_i^(k), the evaluation score s of each sample is
$$s\left(\hat{Y}_i^{(k)};\,I,X,\hat{Y}_{j\setminus i}^{(\forall)}\right)=\sum_{t=1}\psi\left(\hat{y}_{i,t}^{(k)};\,I,X,\hat{Y}_{\tau<t}^{(\forall)}\right)\quad(1)$$
where Ŷ_{j\i}^(∀) denotes the prediction samples of all agents other than i, ŷ_{i,t}^(k) is agent i's prediction sample at time t, Ŷ_{τ&lt;t}^(∀) denotes all prediction samples before timestamp t, and ψ is the reward function assigned to each time step t, realized by a fully connected layer attached to the recurrent neural network;
(2) Refinement of agent i: during scoring, the recurrent-neural-network model simultaneously estimates a regression vector ΔŶ_i^(k), using the regression function η:
$$\Delta\hat{Y}_i^{(k)}=\eta\left(\hat{Y}_i^{(k)};\,I,X,\hat{Y}_{j\setminus i}^{(\forall)}\right)\quad(2)$$
where η accumulates all past scene contexts and the dynamics of all interacting agents while estimating the optimal ΔŶ_i^(k) over the whole time dimension.
Iterative feedback: the optimal regression displacement vector ΔŶ_i^(k) obtained from equation (2) is used to gradually refine the set of prediction hypotheses Ŷ_i^(k), specifically:
(1) In each iteration, Ŷ_i^(k) is updated to Ŷ_i^(k) + ΔŶ_i^(k) and then passed to the inverse-optimal-control module;
(2) The inverse-optimal-control module is trained to rank and refine; it has two kinds of loss terms: a cross-entropy loss l_CE = H(p, q), where q is obtained from the sample scores through an activation function, and a regression loss term;
(3) For one training batch of the neural network, the overall multi-task loss combines the above loss terms, where N is the number of agents in the batch.
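The batch-level multi-task loss combines the four named loss terms over the N agents of a batch. The exact formula is not reproduced in the text, so the equal weighting and averaging below are illustrative assumptions:

```python
def multitask_loss(ce, reg, kld, recon):
    """Combine per-agent cross-entropy, regression, KL-divergence and
    reconstruction losses into one batch loss, averaged over N agents.
    Equal weighting of the terms is assumed for illustration."""
    n = len(ce)
    return sum(c + r + k + rc for c, r, k, rc in zip(ce, reg, kld, recon)) / n

# Two agents in the batch (N = 2), one loss value per term per agent:
print(multitask_loss([1.0, 2.0], [0.5, 0.5], [0.1, 0.1], [0.4, 0.9]))
```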
Scene fusion comprises context embedding and interaction features.
Context embedding: a recurrent neural network is used which receives the following input X_t at each timestamp t:
$$X_t=\left[\gamma(\hat{v}_{i,t}),\;p\!\left(\hat{y}_{i,t};\rho(I)\right),\;r\!\left(\hat{y}_{i,t};\,\hat{y}_{j\setminus i},\,h_{\hat{Y}_{j\setminus i}}\right)\right]\quad(4)$$
where v̂_{i,t} is the velocity of ŷ_{i,t} at time t; γ is a fully connected layer with a nonlinear activation function that maps the velocity into a high-dimensional representation space; p pools the convolutional-neural-network feature ρ(I) located at ŷ_{i,t}; r gathers, through a fusion layer, the hidden vectors over the other agents' spatial domains; and the self-embedding vector serves as the initial hidden-state vector of the recurrent neural network.
Interaction features: spatial grid operations with pooling layers keep the interactions between different samples of different agents from vanishing. Specifically, for agent i at time t, the spatial grid of its sample k is centred on ŷ_{i,t}^(k); then, for each grid cell g, the features of all hidden vectors inside it are average-pooled.
Fig. 2 is an example of the future prediction method for interacting agents in a dynamic scene according to the present invention. As shown, with a growing number of iterations the model learns from progressively stronger feedback, and the predicted route approaches the real route, i.e. the dotted thick line, ever more closely.
For those skilled in the art, the present invention is not limited to the details of the above embodiments, and it may be realized in other specific forms without departing from the spirit or scope of the present invention. Moreover, those skilled in the art may make various changes and modifications to the present invention without departing from its spirit and scope, and such improvements and modifications should also be regarded as falling within the protection scope of the invention. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims (10)

1. A future prediction method for interacting agents in a dynamic scene, characterized by mainly comprising multi-sample generation (1); sample ranking and refinement (2); and scene fusion (3).
2. The multi-sample generation (1) according to claim 1, characterized in that it comprises a generic model formulation and a conditional variational autoencoder.
3. The generic model formulation according to claim 2, characterized in that any object that plays an interactive role in an event in a scene, whether static or moving, is called an agent; given n agents with past motion trajectories X = &lt;X1, X2, …, Xn&gt;, the probability of their future trajectories Y = &lt;Y1, Y2, …, Yn&gt; is P(Y | X, I), where I is the input observation of the scene, and the maximum-probability trajectory given the input is output as the prediction.
4. The conditional variational autoencoder according to claim 2, characterized in that a deep generative model is used to learn a decision function f that maps the past trajectories X and the input I onto the future trajectories Y, specifically:
(1) a random latent variable z_i is introduced and, conditioned on the input X_i, the probability distribution P(Y_i|X_i) of the output Y_i is learned; this builds three corresponding neural networks: a recognition network Q_φ(z_i|Y_i,X_i), a conditional prior network P_ν(z_i|X_i), and a generation network P_θ(Y_i|X_i,z_i);
(2) in the training stage: ① two recurrent neural networks encode agent i's X_i and Y_i, giving two results; ② the two results are merged and passed to a fully connected layer with a nonlinear activation function; ③ two parallel fully connected networks then generate the mean μ̂ and standard deviation σ̂ of z_i, from which a Gaussian model is built and regularized with the KL divergence; ④ two loss functions adjust the autoencoder: a reconstruction loss and a KL-divergence loss l_KLD = D_KL(Q_φ(z_i|Y_i,X_i) ‖ P_ν(z_i));
(3) in the test stage: ① since the future trajectory Y_i is unavailable it is discarded, and the past trajectory X_i is combined with multiple latent-variable samples z_i^(k); ② in contrast to the training stage, X_i and β(z_i^(k)) are now fed into the recurrent-neural-network decoder to produce a set of multiple prediction hypotheses.
5. The sample ranking and refinement (2) according to claim 1, characterized by comprising two parts: sample evaluation and refinement, and iterative feedback.
6. The sample evaluation and refinement according to claim 5, characterized in that, for future-event prediction, the decision-making approach of the reinforcement-learning framework is adopted: after training, an agent selects the action that maximizes the long-term reward and executes it as the future event; a recurrent-neural-network model is designed to measure the long-term accumulated reward of the predictions Ŷ_i^(k) made in each training pass, specifically:
(1) scoring of agent i: given K prediction samples Ŷ_i^(k), the evaluation score s of each sample is:
$$s\left(\hat{Y}_i^{(k)};\,I,X,\hat{Y}_{j\setminus i}^{(\forall)}\right)=\sum_{t=1}\psi\left(\hat{y}_{i,t}^{(k)};\,I,X,\hat{Y}_{\tau<t}^{(\forall)}\right)\quad(1)$$
where Ŷ_{j\i}^(∀) denotes the prediction samples of all agents other than i, ŷ_{i,t}^(k) is agent i's prediction sample at time t, Ŷ_{τ&lt;t}^(∀) denotes all prediction samples before timestamp t, and ψ is the reward function assigned to each time step t, realized by a fully connected layer attached to the recurrent neural network;
(2) refinement of agent i: during scoring, the recurrent-neural-network model simultaneously estimates a regression vector ΔŶ_i^(k), using the regression function η:
$$\Delta\hat{Y}_i^{(k)}=\eta\left(\hat{Y}_i^{(k)};\,I,X,\hat{Y}_{j\setminus i}^{(\forall)}\right)\quad(2)$$
where the regression function η accumulates all past scene contexts and the dynamics of all interacting agents while estimating the optimal ΔŶ_i^(k) over the whole time dimension.
7. The iterative feedback according to claim 5, characterized in that the optimal regression displacement vector ΔŶ_i^(k) obtained from equation (2) is used to gradually refine the set of prediction hypotheses Ŷ_i^(k), specifically:
(1) in each iteration, Ŷ_i^(k) is updated to Ŷ_i^(k) + ΔŶ_i^(k) and then passed to the inverse-optimal-control module;
(2) the inverse-optimal-control module is trained to rank and refine; it has two kinds of loss terms: a cross-entropy loss l_CE = H(p, q), where q is obtained from the sample scores through an activation function, and a regression loss term;
(3) for one training batch of the neural network, the overall multi-task loss combines the above loss terms, where N is the number of agents in the batch.
8. The scene fusion (3) according to claim 1, characterized by comprising context embedding and interaction features.
9. The context embedding according to claim 8, characterized in that a recurrent neural network is used which receives the following input X_t at each timestamp t:
$$X_t=\left[\gamma(\hat{v}_{i,t}),\;p\!\left(\hat{y}_{i,t};\rho(I)\right),\;r\!\left(\hat{y}_{i,t};\,\hat{y}_{j\setminus i},\,h_{\hat{Y}_{j\setminus i}}\right)\right]\quad(4)$$
where v̂_{i,t} is the velocity of ŷ_{i,t} at time t; γ is a fully connected layer with a nonlinear activation function that maps the velocity into a high-dimensional representation space; p pools the convolutional-neural-network feature ρ(I) located at ŷ_{i,t}; r gathers, through a fusion layer, the hidden vectors over the other agents' spatial domains; and the self-embedding vector serves as the initial hidden-state vector of the recurrent neural network.
10. The interaction features according to claim 8, characterized in that spatial grid operations with pooling layers keep the interactions between different samples of different agents from vanishing, specifically: for agent i at time t, the spatial grid of its sample k is centred on ŷ_{i,t}^(k); then, for each grid cell g, the features of all hidden vectors inside it are average-pooled.
CN201710487508.2A 2017-06-23 2017-06-23 A future prediction method for interacting agents in a dynamic scene Withdrawn CN107274029A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710487508.2A CN107274029A (en) 2017-06-23 2017-06-23 A future prediction method for interacting agents in a dynamic scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710487508.2A CN107274029A (en) 2017-06-23 2017-06-23 A future prediction method for interacting agents in a dynamic scene

Publications (1)

Publication Number Publication Date
CN107274029A true CN107274029A (en) 2017-10-20

Family

ID=60069234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710487508.2A Withdrawn CN107274029A (en) 2017-06-23 2017-06-23 A future prediction method for interacting agents in a dynamic scene

Country Status (1)

Country Link
CN (1) CN107274029A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108777140A (en) * 2018-04-27 2018-11-09 南京邮电大学 Phonetics transfer method based on VAE under a kind of training of non-parallel corpus
CN109472412A (en) * 2018-11-09 2019-03-15 百度在线网络技术(北京)有限公司 A kind of prediction technique and device of event
CN109710507A (en) * 2017-10-26 2019-05-03 北京京东尚科信息技术有限公司 A kind of method and apparatus of automatic test
CN110414989A (en) * 2019-07-29 2019-11-05 中国工商银行股份有限公司 Method for detecting abnormality and device, electronic equipment and computer readable storage medium
CN110929644A (en) * 2019-11-22 2020-03-27 南京甄视智能科技有限公司 Heuristic algorithm-based multi-model fusion face recognition method and device, computer system and readable medium
CN111563548A (en) * 2020-04-30 2020-08-21 鹏城实验室 Data preprocessing method and system based on reinforcement learning and related equipment
CN111599431A (en) * 2020-03-31 2020-08-28 太原金域临床检验有限公司 Report sheet-based data coding model generation method, system and equipment
CN111626472A (en) * 2020-04-17 2020-09-04 国网甘肃省电力公司信息通信公司 Scene trend judgment index computing system and method based on deep hybrid cloud model
CN111954860A (en) * 2018-01-21 2020-11-17 斯塔特斯公司 System and method for predicting fine-grained antagonistic multi-player movements
CN112418421A (en) * 2020-11-06 2021-02-26 常州大学 Roadside end pedestrian trajectory prediction algorithm based on graph attention self-coding model
CN113015983A (en) * 2018-10-24 2021-06-22 Hrl实验室有限责任公司 Autonomous system including continuous learning world model and related methods

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NAMHOON LEE et al.: "DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents", published online at HTTP://EXPORT.ARXIV.ORG/ABS/1704.04394 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710507A (en) * 2017-10-26 2019-05-03 Beijing Jingdong Shangke Information Technology Co., Ltd. Automatic testing method and apparatus
CN109710507B (en) * 2017-10-26 2022-03-04 Beijing Jingdong Shangke Information Technology Co., Ltd. Automatic testing method and device
CN111954860A (en) * 2018-01-21 2020-11-17 Stats LLC System and method for predicting fine-grained adversarial multi-player movements
CN108777140A (en) * 2018-04-27 2018-11-09 Nanjing University of Posts and Telecommunications Voice conversion method based on VAE under non-parallel corpus training
CN113015983A (en) * 2018-10-24 2021-06-22 HRL Laboratories, LLC Autonomous system including continuous learning world model and related methods
CN109472412A (en) * 2018-11-09 2019-03-15 Baidu Online Network Technology (Beijing) Co., Ltd. Event prediction method and device
CN110414989A (en) * 2019-07-29 2019-11-05 Industrial and Commercial Bank of China Anomaly detection method and device, electronic device, and computer-readable storage medium
CN110929644A (en) * 2019-11-22 2020-03-27 Nanjing Zhenshi Intelligent Technology Co., Ltd. Heuristic-algorithm-based multi-model fusion face recognition method and device, computer system and readable medium
CN111599431A (en) * 2020-03-31 2020-08-28 Taiyuan Kingmed Clinical Laboratory Co., Ltd. Report sheet-based data coding model generation method, system and equipment
CN111626472A (en) * 2020-04-17 2020-09-04 Information and Communication Company of State Grid Gansu Electric Power Company Scene trend judgment index computing system and method based on deep hybrid cloud model
CN111626472B (en) * 2020-04-17 2023-10-27 Information and Communication Company of State Grid Gansu Electric Power Company Scene trend judgment index computing system and method based on deep hybrid cloud model
CN111563548A (en) * 2020-04-30 2020-08-21 Peng Cheng Laboratory Data preprocessing method and system based on reinforcement learning and related equipment
CN111563548B (en) * 2020-04-30 2024-02-02 Peng Cheng Laboratory Data preprocessing method, system and related equipment based on reinforcement learning
CN112418421A (en) * 2020-11-06 2021-02-26 Changzhou University Roadside pedestrian trajectory prediction algorithm based on graph attention autoencoder model
CN112418421B (en) * 2020-11-06 2024-01-23 Changzhou University Roadside pedestrian trajectory prediction algorithm based on graph attention autoencoder model

Similar Documents

Publication Publication Date Title
CN107274029A (en) A future prediction method for interacting agents in dynamic scenes
Mousavi et al. Traffic light control using deep policy‐gradient and value‐function‐based reinforcement learning
CN109635917B (en) Multi-agent cooperation decision and training method
Bhattacharyya et al. Multi-agent imitation learning for driving simulation
Christiano et al. Supervising strong learners by amplifying weak experts
Castillo et al. Hybrid intelligent systems for time series prediction using neural networks, fuzzy logic, and fractal theory
Papageorgiou et al. A review of fuzzy cognitive maps research during the last decade
Papageorgiou Review study on fuzzy cognitive maps and their applications during the last decade
EP3602402B1 (en) Learning visual concepts using neural networks
Foster et al. Structure in the space of value functions
CN110147456A (en) Image classification method, device, readable storage medium, and terminal device
CN110991027A (en) Robot imitation learning method based on virtual scene training
Ghaderi et al. Behavioral simulation and optimization of generation companies in electricity markets by fuzzy cognitive map
Ulhaq et al. Efficient diffusion models for vision: A survey
Zheng et al. Synthetic dynamic PMU data generation: A generative adversarial network approach
Groumpos Modelling business and management systems using fuzzy cognitive maps: A critical overview
Rabe et al. Combining a discrete-event simulation model of a logistics network with deep reinforcement learning
Hu et al. Adaptive exploration strategy with multi-attribute decision-making for reinforcement learning
Andersen et al. The dreaming variational autoencoder for reinforcement learning environments
Daniel et al. A perceptually-validated metric for crowd trajectory quality evaluation
CN114116995A (en) Session recommendation method, system and medium based on enhanced graph neural network
CN117271768A (en) False news detection method and device based on large language model analysis and guidance
Sayama et al. Beyond social fragmentation: Coexistence of cultural diversity and structural connectivity is possible with social constituent diversity
Shafik et al. A reawakening of machine learning application in unmanned aerial vehicle: future research motivation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20171020