CN108537820A

CN108537820A - Dynamic prediction method, system and the equipment being applicable in

Info

Publication number: CN108537820A
Application number: CN201810348528.6A
Authority: CN
Inventors: 张崇洁; 朱广翔
Original assignee: Tsinghua University
Current assignee: Turing Artificial Intelligence Research Institute (Nanjing) Co., Ltd.
Priority date: 2018-04-18
Filing date: 2018-04-18
Publication date: 2018-09-14
Anticipated expiration: 2038-04-18
Also published as: CN108537820B

Abstract

The application provides a kind of dynamic prediction method, system and the equipment being applicable in.The dynamic prediction method includes the following steps：Obtain current frame image；Determine to include the object mask code matrix to be predicted of object to be predicted and the reference object mask code matrix including reference object based on the current frame image；Based between the object mask code matrix to be predicted and the reference object mask code matrix relationship and default behavior prediction described in object to be predicted movement.The dynamic prediction method of the application by acquired current frame image by being divided into object and reference object to be predicted, and based between the reference object mask code matrix and object mask code matrix to be predicted indicated by object mask code matrix relationship and default behavior predict the movement of object to be predicted, make it possible to improve the generalization ability of dynamic prediction, and object is indicated using mask code matrix so that the prediction process is interpretable.

Description

Dynamic prediction method, system and the equipment being applicable in

Technical field

This application involves image analysis technology field, more particularly to a kind of dynamic prediction method, system and it is applicable in Equipment.

Background technology

In recent years, have benefited from universal and computing capability the promotion of big data, the combination of intensified learning and deep learning is Deeply study achieves innovative progress.In practical applications, extensive and interpretation is that deeply study faces Significant challenge.Wherein, extensive refers to adaptation of the model to fresh sample, i.e., trained model was not meeting the table in data It is existing.Interpretation refers to the characteristic for being different from " flight data recorder " and capable of explaining how to solve the problems, such as.Usually, in order to solve State problem, it is important to be from the effect of object level learning agent behavior wherein, intelligent body indicates the object for having capacity Body, such as robot, unmanned vehicle.

However, for intelligent body behavior effect study, prior art generally use using behavior as the dynamic of condition Prediction (dynamics prediction) concentrates on the effect of Pixel-level movement and directly predictive behavior, and which has limited for institute The interpretation and generalization ability of learning dynamics.

Invention content

In view of the foregoing deficiencies of prior art, the application is designed to provide a kind of dynamic prediction method, system And the equipment being applicable in, for solve in the prior art in terms of dynamic prediction existing for generalization ability it is low and unaccountable ask Topic.

In order to achieve the above objects and other related objects, the first aspect of the application provides a kind of dynamic prediction method, packet Include following steps：Obtain current frame image；It is covered based on the object to be predicted that current frame image determination includes object to be predicted Code matrix and the reference object mask code matrix including reference object；Based on the object mask code matrix to be predicted and the reference The movement of object to be predicted described in relationship and default behavior prediction between object mask code matrix.

In the certain embodiments of the first aspect of the application, the object mask code matrix to be predicted is waited for based on described Predict what object determined, the reference object mask code matrix is the pass based on the reference object with the object movement to be predicted What connection relationship or the type of the reference object determined.

It is described pre- including waiting for based on current frame image determination in the certain embodiments of the first aspect of the application It surveys the object mask code matrix to be predicted of object and includes the steps that the reference object mask code matrix of reference object includes：Using pre- The first convolutional neural networks first trained are based on the object mask to be predicted that current frame image determination includes object to be predicted Matrix and reference object mask code matrix including reference object.

It is described to be based on the object mask code matrix to be predicted and institute in the certain embodiments of the first aspect of the application The step of stating the movement of object to be predicted described in the relationship between reference object mask code matrix and default behavior prediction include：With Foundation presets visual field window size to described with reference to right centered on the position of object to be predicted in the object mask code matrix to be predicted As mask code matrix is cut to obtain clipped reference object mask code matrix；The second convolution nerve net based on training in advance Network determines effect of the reference object to the object to be predicted represented by the clipped reference object mask code matrix；It is based on The movement of the object to be predicted is predicted in default behavior and identified effect.

In the certain embodiments of the first aspect of the application, the dynamic prediction method is further comprising the steps of：It is right The clipped reference object mask code matrix point of addition information that is obtained and based on the second convolutional neural networks of training in advance Determine effect of the reference object represented by the clipped reference object mask code matrix to the object to be predicted.

In the certain embodiments of the first aspect of the application, the effect further includes preset object to be predicted itself Effect.

In the certain embodiments of the first aspect of the application, the dynamic prediction method is further comprising the steps of：From Constant background when being extracted in the current frame image；In conjunction with extracted when constant background and the object to be predicted predicted fortune It is dynamic to obtain next frame image.

In the certain embodiments of the first aspect of the application, the dynamic prediction method is further comprising the steps of：Base Constant background when third convolutional neural networks trained in advance are extracted from the current frame image；In conjunction with extracted when not The movement for becoming background and the object to be predicted predicted obtains next frame image.

In the certain embodiments of the first aspect of the application, the third convolutional neural networks are set as convolution warp Product structure.

In the certain embodiments of the first aspect of the application, first convolutional neural networks, second convolution Neural network and the third convolutional neural networks are obtained through unified training according to loss function.

In the certain embodiments of the first aspect of the application, the current frame image is that based on initial data or have What the outer input data of priori obtained.

The second aspect of the application also provides a kind of Dynamic Forecasting System, including：Acquiring unit, for obtaining present frame figure Picture；Subject detecting unit, for including the object mask code matrix to be predicted of object to be predicted based on current frame image determination And the reference object mask code matrix including reference object；Predicting unit, for based on the object mask code matrix to be predicted and The movement of object to be predicted described in relationship and default behavior prediction between the reference object mask code matrix.

In the certain embodiments of the second aspect of the application, the object mask code matrix to be predicted is waited for based on described Predict what object determined, the reference object mask code matrix is the pass based on the reference object with the object movement to be predicted What connection relationship or the type of the reference object determined.

In the certain embodiments of the second aspect of the application, the subject detecting unit is used for using training in advance First convolutional neural networks based on the current frame image determine include object to be predicted object mask code matrix to be predicted and Reference object mask code matrix including reference object.

In the certain embodiments of the second aspect of the application, the predicting unit includes：Module is cut, for institute It states in object mask code matrix to be predicted centered on the position of object to be predicted according to presetting visual field window size to the reference object Mask code matrix is cut to obtain clipped reference object mask code matrix；Determining module is acted on, for based on training in advance The second convolutional neural networks determine the reference object represented by the clipped reference object mask code matrix to it is described wait for it is pre- Survey the effect of object；Prediction module, the movement for predicting the object to be predicted based on default behavior and identified effect.

In the certain embodiments of the second aspect of the application, the effect determining module is used for being obtained through cutting out The reference object mask code matrix point of addition information cut simultaneously is determined described through cutting out based on the second convolutional neural networks of training in advance Effect of the reference object represented by reference object mask code matrix cut to the object to be predicted.

In the certain embodiments of the second aspect of the application, the effect further includes preset object to be predicted itself Effect.

In the certain embodiments of the second aspect of the application, the Dynamic Forecasting System further includes：Extraction unit is used Constant background when being extracted from the current frame image；The predicting unit when being additionally operable to combine extracted constant background and The movement for the object to be predicted predicted obtains next frame image.

In the certain embodiments of the second aspect of the application, the extraction unit is used for based on third trained in advance Convolutional neural networks constant background when being extracted from the current frame image；When the predicting unit is additionally operable to combine extracted The movement of constant background and the object to be predicted predicted obtains next frame image.

In the certain embodiments of the second aspect of the application, the third convolutional neural networks are set as convolution warp Product structure.

In the certain embodiments of the second aspect of the application, first convolutional neural networks, second convolution Neural network and the third convolutional neural networks are obtained through unified training according to loss function.

In the certain embodiments of the second aspect of the application, the current frame image is that based on initial data or have What the outer input data of priori obtained.

The third aspect of the application also provides a kind of computer readable storage medium, is stored at least one program, described At least one program, which is performed, realizes any dynamic prediction method among the above.

The fourth aspect of the application also provides a kind of equipment, including：Storage device, for storing at least one program；Place Device is managed, is connected with the storage device, for calling at least one program to execute as any in claim 1-11 The dynamic prediction method.

In the certain embodiments of the fourth aspect of the application, the equipment further includes display device, the display dress It sets for showing the object mask code matrix to be predicted, the reference object mask code matrix, the predicted object to be predicted At least one of exercise data.

In the certain embodiments of the fourth aspect of the application, the processing unit is additionally operable to be based on the present frame figure Picture, the object mask code matrix to be predicted and the reference object mask code matrix generate object mask image to be predicted and reference Object mask image；The display device is additionally operable to show that the object mask image to be predicted and/or the reference object are covered Code image.

In the certain embodiments of the fourth aspect of the application, the processing unit is additionally operable to based on described to be predicted right As mask image and the reference object mask image generate object mask image；It is described right that the display device is additionally operable to show As mask image.

As described above, the dynamic prediction method of the application, system and the equipment being applicable in, have the advantages that：This The dynamic prediction method of application by the way that acquired current frame image is divided into object and reference object to be predicted, and based on by Relationship between reference object mask code matrix and object mask code matrix to be predicted that object mask code matrix indicates and default behavior To predict the movement of object to be predicted, enabling improve the generalization ability of dynamic prediction, and using mask code matrix expression pair As so that the prediction process is interpretable.

Description of the drawings

Fig. 1 is shown as the exemplary scene schematic diagram in one embodiment using the application dynamic prediction method.

Fig. 2 is shown as the flow chart of the application dynamic prediction method in one embodiment.

Fig. 3 is shown as the structure of the convolutional neural networks of the application dynamic prediction method use in one embodiment and shows It is intended to.

Fig. 4 is shown as the flow charts of step S150 in one embodiment in the application dynamic prediction method.

Fig. 5 is shown as the flow chart of the application dynamic prediction method in another embodiment.

Fig. 6 is shown as the third convolutional neural networks of the application dynamic prediction method use in another embodiment Structural schematic diagram.

Fig. 7 is shown as the structural schematic diagram of the application Dynamic Forecasting System in one embodiment.

Fig. 8 is shown as the structural schematic diagram of predicting unit in one embodiment in the application Dynamic Forecasting System.

Fig. 9 is shown as the structural representation of predicting unit in another embodiment in the application Dynamic Forecasting System Figure.

Figure 10 is shown as the structural schematic diagram of the application Dynamic Forecasting System in another embodiment.

Figure 11 is shown as structural schematic diagram of the application Dynamic Forecasting System in another embodiment.

Figure 12 is shown as the structural schematic diagram of the application equipment in one embodiment.

Figure 13 is shown as the structural schematic diagram of the application equipment in another embodiment.

Specific implementation mode

Illustrate that presently filed embodiment, those skilled in the art can be by this explanations by particular specific embodiment below Content disclosed by book understands other advantages and effect of the application easily.

In described below, refer to the attached drawing, attached drawing describes several embodiments of the application.It should be appreciated that also can be used Other embodiment, and composition can be carried out without departing substantially from spirit and scope of the present disclosure and operational changed Become.Following detailed description should not be considered limiting, and the range of embodiments herein is only by the application's Claims of patent are limited.Term used herein is merely to describe specific embodiment, and be not intended to limit this Application.

Although term first, second etc. are used for describing various elements herein in some instances, these elements It should not be limited by these terms.These terms are only used for distinguishing an element with another element.For example, first waits for Predict that object can be referred to as the second object to be predicted, and similarly, the second object to be predicted can wait for pre- referred to as first Object is surveyed, without departing from the range of various described embodiments.First object to be predicted and the second object to be predicted be One object to be predicted is described, but unless context otherwise explicitly points out, otherwise they are not same to be predicted Object.Similar situation further includes first group of reference object and second group of reference object, etc..

Furthermore as used in herein, singulative " one ", "one" and "the" are intended to also include plural number shape Formula, unless there is opposite instruction in context.It will be further understood that term "comprising", " comprising " show that there are the spies Sign, step, operation, element, component, project, type, and/or group, but it is not excluded for other one or more features, step, behaviour Presence, appearance or the addition of work, element, component, project, type, and/or group.Term "or" used herein and "and/or" quilt It is construed to inclusive, or means any one or any combinations.Therefore, " A, B or C " or " A, B and/or C " mean " with Descend any one：A；B；C；A and B；A and C；B and C；A, B and C ".Only when element, function, step or the combination of operation are in certain sides When inherently mutually exclusive under formula, it just will appear the exception of this definition.

Deeply study is to combine deep learning with intensified learning to realize from the end pair for perceiving action End study, has the potentiality for making intelligent body realize entirely autonomous study.In practical applications, extensive and interpretation is depth The significant challenge that intensified learning faces.The key for solving above-mentioned challenge is the effect from object level learning agent behavior.So And for the study of the effect to intelligent body behavior, the prior art achieves weight in terms of using behavior as the dynamic prediction of condition Big progress, but still it is somewhat limited.First, what the prior art used concentrates merely on pixel by the dynamic prediction of condition of behavior Grade movement rather than follow the normal form of object-oriented, have ignored the basic prototype that object is physical kinetics and be always used as one it is whole Body carries out the mobile fact, thus the mobile object predicted usually has fuzzy profile and texture.Secondly, the prior art is adopted It as condition is predicted as the direct predictive behavior of the dynamic prediction of condition rather than using relationship between object, is limited using behavior For the interpretation and generalization ability of institute's learning dynamics.

In consideration of it, the application provides a kind of dynamic prediction method, the dynamic prediction method is with video frame and intelligent body Behavior is input, and environment is divided into object and is predicted as condition using behavior and object relationship, thus, the dynamic Prediction technique can also be known as the dynamic prediction method of object-oriented.Wherein, the object refers to the things as target, dynamic State predicts in applied environment that the object typically refers to the object in environment, such as the sample application scene being described later on In ladder, intelligent body etc..

In order to clearly describe the dynamic prediction method of the application, in conjunction with sample application scene to the dynamic prediction Method is described in detail.Referring to Fig. 1, Fig. 1 is shown as the exemplary scene using the application dynamic prediction method in a kind of reality The schematic diagram in mode is applied, as shown, the exemplary scene includes ladder A, wall B, space C and intelligent body D.In addition, Based on the exemplary scene, the default behavior of intelligent body A may include it is upward, downward, to the left, to the right and without operation.In reality In, intelligent body D can be moved up or down when encountering ladder A, and intelligent body D can be blocked when meeting wall B cannot Mobile, intelligent body D can be fallen when in space C.

It should be noted that the form of above application scene, each element and its quantity are only for example, rather than the application is answered With the limitation of scene and each element.In fact, the application scenarios can be other scenes for needing to carry out dynamic prediction, this Outside, there may also be having the case where multiple intelligent bodies under same application scene, this is no longer going to repeat them.

Referring to Fig. 2, Fig. 2 is shown as the flow chart of the application dynamic prediction method in one embodiment, as schemed institute Show, the dynamic prediction method includes step S110, step S130 and step S150.This is described below in conjunction with Fig. 1 and Fig. 2 The dynamic prediction method of application.

In step s 110, current frame image is obtained.

Wherein, the current frame image is for the next frame image being described later on.In this example, present frame Image refers to the image I (t) in t moment, and next frame image refers to the image I (t+1) at the t+1 moment to be predicted.Below Description in, current frame image I (t) is indicated with exemplary scene image shown in FIG. 1.

In addition, in certain embodiments, the current frame image can be obtained based on initial data.In other realities It applies in example, the current frame image can be obtained based on the outer input data with priori.Wherein, the priori Knowledge refers to previously known information.In this example, with priori outer input data may include for example, by What foreground detection mode obtained is conducive to determine the dynamic of the object mask code matrix to be predicted including object to be predicted being described later on State area information so that can refer to the dynamic area in the determination that the is described later on object mask code matrix to be predicted the step of Information and concentrate in the dynamic area determining object mask code matrix to be predicted to improve discrimination.

In step s 130, based on current frame image determine include object to be predicted object mask code matrix to be predicted and Reference object mask code matrix including reference object.

Wherein, the object to be predicted refers to the loose impediment that its dynamic needs to be predicted under current scene, such as Intelligent body D shown in Fig. 1, the object to be predicted are also referred to as dynamic object due to its moveable movement properties. The reference object refers to other objects removed under current scene other than the object to be predicted.In certain embodiments, Object under current scene can be divided into static object and dynamic object based on movement properties, then, in this case, institute Reference object is stated to may include static object, remove as other dynamic objects other than the dynamic object of object to be predicted.With For Fig. 1, the reference object may include ladder A, wall B and space C shown in Fig. 1, and wherein ladder A is working as front court Under scape it is stationary and can allow object to be predicted moved up and down at the position overlapped with ladder A and left and right translation, wall B Stationary and object to be predicted can be prevented to be moved to the positions wall B direction under current scene, space C working as front court It is stationary and object to be predicted can be made to be moved along all directions under scape.Wherein, ladder A, wall B and space C are since its is quiet Only motionless movement properties are also referred to as static object.If in addition, for example, further including intelligent body D ', intelligent body D ' in Fig. 1 It is moveable dynamic object similar to intelligent body D, then intelligent body D ' is also corresponding with as the intelligent body D of object to be predicted Reference object.In consideration of it, for object intelligent body D to be predicted, corresponding reference object includes ladder A, wall B, space C And intelligent body D '.

In certain embodiments, in the case where exemplary scene includes an independent dynamic object, the dynamic is right As for object to be predicted, one or more reference objects, one or more of reference objects are corresponding with the object to be predicted One group of reference object referred to as corresponding with the object to be predicted, by taking Fig. 1 as an example, the object to be predicted is an intelligent body D, institute It includes ladder A, wall B and space C to state one group of reference object.In further embodiments, it include two in exemplary scene Or more in the case of independent dynamic object, described two or multiple dynamic objects can be directed to and be predicted respectively, then Two or more objects to be predicted are corresponding with two or more dynamic objects, with the two or more objects to be predicted Be corresponding with two or more groups reference object, wherein every group of reference object include removed in current frame image object to be predicted it Other outer objects.For example, including the movement of two intelligent body D and each intelligent body D in current scene in exemplary scene In the case that mode is similar, two intelligent body D are expressed as the first object to be predicted and the second object to be predicted, then with first Object to be predicted is corresponding with first group of reference object, and first group of reference object includes that ladder A, wall B, space C and second are waited for Predict object.Be corresponding with second group of reference object with the second object to be predicted, second group of reference object include ladder A, wall B, The objects to be predicted of space C and first.

It should be noted that above-mentioned object to be predicted and reference object are only for example, those skilled in the art can be based on Different application scenarios determine corresponding object and reference object to be predicted, and this is no longer going to repeat them.

In addition, the object mask code matrix to be predicted refer to obtained after being blocked to current frame image only include wait for it is pre- The mask code matrix of object is surveyed, the reference object mask code matrix refers to being obtained after being blocked to current frame image only including ginseng According to the mask code matrix of object.Wherein, the mask code matrix of object indicates that each pixel of image belongs to the probability of the object, described general Rate is the number between 0-1, wherein 0 indicates that the probability for belonging to the object be that belong to the probability of the object be 1 for 0,1 expression. For the convenience of description, object mask code matrix and reference object mask code matrix to be predicted are referred to as object mask code matrix.In addition, base In said one object to be predicted is corresponding with one group of reference object the case where, correspondingly, one object mask square to be predicted Battle array is corresponding with one group of reference object mask code matrix.

In certain embodiments, the object mask code matrix to be predicted is determined based on object to be predicted, the reference Object mask code matrix is that the type of the incidence relation or reference object that are moved based on reference object and object to be predicted is determined. That is the object mask code matrix to be predicted is specific for object determination, the reference object mask code matrix is specific for What class determined.

For example, about object mask code matrix to be predicted, generated for object to be predicted such as intelligent body D corresponding to be predicted Object mask code matrix, there are multiple intelligent body D, the multiple objects to be predicted for generating corresponding each intelligent body D are covered Code matrix.

About reference object mask code matrix, according to reference object and the incidence relation of corresponding object movement to be predicted generate with The corresponding all kinds of reference object mask code matrixes of the incidence relation, wherein the incidence relation, that is, reference object is to be predicted right It is influenced caused by the movement of elephant.That is, the incidence relation can be based on reference object to object to be predicted influence To divide.It is described to influence to depend on reference object relative to the motion state of object to be predicted and the movement category of reference object Property.By taking Fig. 1 as an example, exemplary scene shown in FIG. 1 includes the static object ladder A that can make object climbing to be predicted, prevents The static object wall B of the object movement to be predicted and static object space C that object to be predicted can be made to fall, although example Multiple ladder A, wall B and space C are shown in scene, but the movement based on reference object relative to object to be predicted is closed System, can generate ladder class reference object mask code matrix corresponding with the reference object of ladder A one kind, similar, generation and wall The corresponding wall kind reference object mask code matrix of reference object of wall B one kind generates corresponding with the reference object of space C one kind Spatial class reference object mask code matrix.If described in addition, further include the coloured flag as static object in exemplary scene in Fig. 1 Coloured flag only indicates the movement that intelligent body D needs the final destination reached but do not influence intelligent body D, then it is opposite to be based on reference object In the movement relation of object to be predicted, coloured flag class reference object mask code matrix can be generated.If in addition, the exemplary scene in Fig. 1 In further include barrier, the barrier is also that the static object for preventing object to be predicted movement is then based on reference object Relative to the movement relation of object to be predicted, it is right with the reference of the reference object of wall B one kind and barrier one kind to generate As corresponding prevention class reference object mask code matrix.In addition, if scene shown in Fig. 1 includes two intelligent bodies i.e. two dynamics Object then can respectively predict two intelligent bodies that in this case, it is to be predicted that two intelligent bodies are referred to as first Object and the second object to be predicted.It is the second object to be predicted of dynamic object when predicting the first object to be predicted Reference object as the first object to be predicted.When predicting the second object to be predicted, first for dynamic object waits for Predict reference object of the object as the second object to be predicted.In consideration of it, being corresponding with the first object mask code matrix to be predicted, second Object mask code matrix to be predicted, first group of reference object mask code matrix corresponding with the first object mask code matrix to be predicted and with The corresponding second group of reference object mask code matrix of second object mask code matrix to be predicted.Wherein first group of reference object mask code matrix Including ladder class reference object mask code matrix, wall kind reference object mask code matrix (or prevent class reference object mask code matrix), Spatial class reference object mask code matrix, coloured flag class reference object mask code matrix and the second object mask code matrix to be predicted.Second Group reference object mask code matrix include ladder class reference object mask code matrix, wall kind reference object mask code matrix (or prevent class Reference object mask code matrix), spatial class reference object mask code matrix, coloured flag class reference object mask code matrix and first to be predicted Object mask code matrix.

Alternatively, about reference object mask code matrix, reference object mask square can also be determined according to the type of reference object Battle array.For example, in the case that exemplary scene in Fig. 1 includes barrier as described above, the barrier is also for preventing The static object of object to be predicted movement but belong to variety classes with wall B, then the type based on reference object, can give birth to respectively At wall kind reference object mask code matrix corresponding with the reference object of wall B one kind, and it is right with the reference of barrier one kind As corresponding obstacle species reference object mask code matrix.

For simplicity, the application includes that an object to be predicted, the object to be predicted are corresponding with exemplary scene The one group of reference object and reference object mask code matrix for including reference object is transported based on reference object and the object to be predicted Dynamic incidence relation is described for determining, but the application is not limited to this.It will be understood by those skilled in the art that based on answering With the difference of scene, the application can also be applied to include multiple objects to be predicted and multigroup reference corresponding with object to be predicted The case where object, this is no longer going to repeat them.

In one embodiment, such as foreground detection mode may be used to obtain object to be predicted from sequence image, be based on In the application scenarios pre-entered the feature of reference object with by feature recognition come obtain reference object, by mask code matrix pair Mask code matrix processing is carried out to obtain object mask code matrix to be predicted including the current frame image of object to be predicted and reference object With reference object mask code matrix.Wherein, the object mask code matrix to be predicted and reference object mask code matrix respectively include this and cover Location information of the code matrix in current frame image, so that object mask square to be predicted can be determined by the location information Battle array and position of the reference object mask code matrix relative to current frame image.In one example, the reference object mask code matrix and The object mask code matrix to be predicted carries out mask code matrix operation with artwork size based on current frame image and obtains.

In another embodiment, the first convolutional neural networks of training in advance can be used true based on the current frame image Surely include the object mask code matrix to be predicted of object to be predicted and the reference object mask code matrix including reference object.Show one In example, the first convolutional neural networks may include that multiple structures are identical but the convolutional neural networks of weighted.It can will be current Each convolutional neural networks of frame image input training in advance, the output layer of each convolutional neural networks is via channel formation interconnected amongst one another Full connection features figure, be followed by pixel-by-pixel softmax layers of (pixel-wise) to obtain the object mask to be predicted specific to object Matrix and reference object mask code matrix specific to class, wherein the number of the convolutional neural networks can be based on object mask The number of matrix determines.By taking Fig. 1 as an example, exemplary scene according to figure 1, object mask code matrix includes one to be predicted right As mask code matrix and three reference object mask code matrixes, thus, can be obtained by four convolutional neural networks corresponding four it is right As mask code matrix, four convolutional neural networks can be with identical structure but with different weights.If example shown in FIG. 1 Scene includes two dynamic objects, that is, intelligent body D, then relatively, corresponding five objects is obtained by five convolutional neural networks Mask code matrix, five convolutional neural networks can be with identical structures but with different weights.

For example, referring to Fig. 3, Fig. 3 is shown as the convolutional neural networks of the application dynamic prediction method use in a kind of reality The structural schematic diagram in mode is applied, as shown, the structure of the convolutional neural networks, which can be multilayer convolution, adds full convolution knot Structure, wherein I (t) indicates that current frame image, solid arrow indicate that convolution adds activation primitive, dotted arrow to indicate amplification plus connect entirely It connects, length interval dotted arrow is indicated to replicate plus be connected entirely, and in this example, activation primitive selects ReLU.Wherein it is possible to be arranged Conv (F, K, S) is indicated with F filter, the convolutional layer that convolution kernel is K and step-length is S, it is assumed that R () indicates activation primitive Layer i.e. ReLU layers, BN () indicate batch normalization layer, then five convolutional layers shown in Fig. 3 can be expressed as R (BN (Conv (64,5,2))), R (BN (Conv (64,3,2))), R (BN (Conv (64,3,1))), R (BN (Conv (32,1,1))), R (BN (Conv (1,3,1))).

It should be noted that the structure and parameter of above-mentioned convolutional neural networks is only for example, those skilled in the art can be with The structure and parameter of convolutional neural networks is carried out based on object and reference object to be predicted included in different application scene Variants and modifications, this is no longer going to repeat them.

In step S150, based on relationship between object mask code matrix to be predicted and reference object mask code matrix and pre- If the movement of behavior prediction object to be predicted.

Wherein, the default behavior is pre-set based on application scenarios.The default behavior can be for example, by using volume One or more behaviors that the movement of object to be predicted is controlled of the form output Machine oriented of code, such as " behavior 1 ", " behavior 2 " etc..In addition, the behavior may refer to corresponding concrete behavior in specific application scenarios.By taking Fig. 1 as an example, In exemplary scene shown in FIG. 1, default behavior may include behavior 1 to behavior 5, wherein be applied in the scene, behavior 1 to Behavior 5 indicate respectively upwards, downwards, to the left, to the right and without operation.The default behavior can be by encoding such as one-hot The mode of coding is arranged.

In the application, with default relationship between behavior and object mask code matrix to be predicted and reference object mask code matrix To predict the movement of object to be predicted.In some embodiments, object mask code matrix and reference object to be predicted are also based on The fortune of all objects of relationship and default behavior prediction including object to be predicted and reference object between mask code matrix It is dynamic, but the prediction mode is compared to computationally intensive, inefficiency for the movement for only predicting object to be predicted.

By taking Fig. 1 as an example, in application scenarios shown in Fig. 1, object mask code matrix to be predicted be include the first of intelligent body D Object mask code matrix to be predicted, reference object mask code matrix are respectively first kind reference object mask code matrix, the packet for including ladder A Include the second class reference object mask code matrix of wall B and the third class reference object mask code matrix including space C, wherein institute First kind reference object mask code matrix, the second class reference object mask code matrix and third class reference object mask code matrix is stated to be referred to as For first group of reference object mask code matrix corresponding to the first object mask code matrix to be predicted.Object mask code matrix to be predicted and ginseng Include that first kind reference object mask code matrix covers object to be predicted based on default behavior according to the relationship between object mask code matrix The code effect of matrix, the second class reference object mask code matrix based on default behavior to the effect of object mask code matrix to be predicted and Effect of the third class reference object matrix based on default behavior to object mask code matrix to be predicted.

In addition, the movement for the object to be predicted predicted can include but is not limited to the direction of object movement to be predicted, move Dynamic distance, object post exercise location information to be predicted etc..

Referring to Fig. 4, Fig. 4 is shown as the flows of step S150 in one embodiment in the application dynamic prediction method Figure, as shown, step S150 may include step S1501, step S1503 and step S1505.

In step S1501, regarded according to default centered on the position of object to be predicted in object mask code matrix to be predicted Wild window size cuts reference object mask code matrix to obtain clipped reference object mask code matrix.

Wherein, the position of object to be predicted is that the desired locations based on object mask code matrix to be predicted limit.For example, right In j-th of object Dj to be predicted, positionIt can be indicated by following formula (1)：

Wherein, H and W indicates the height and width of image, M respectively_DjIndicate that the object to be predicted of j-th of object to be predicted is covered Code matrix.

In addition, visual field window size is the maximum effective range for referring to indicate object relationship.Wherein, object relationship is Refer to the relationship between object and reference object to be predicted.Visual field window size can be that technical staff is pre-set based on experience. Assuming that visual field window size be w, then withCentered on, size be w visual field window Bw pass through following formula (2) indicate：

That is, above-mentioned formula (1) and formula (2) are based on, with object to be predictedCentered on, root Reference object mask code matrix is cut according to Bw.In one example, it can be realized at cutting by bilinearity sample mode Reason.In addition, in the case where default visual field window size is equal to original input picture size, it can be considered and do not cut.Due in reality In, principle of locality is typically found in object relationship, thus the application introduces principle of locality by cutting to handle, into And object to be predicted dynamically will be influenced to concentrate in the relationship between object to be predicted and other objects adjacent thereto.

In step S1503, clipped reference object mask is determined based on the second convolutional neural networks of training in advance Effect of the reference object to object to be predicted represented by matrix.

In one embodiment, the clipped reference object mask code matrix obtained in step S1501 can be inputted pre- First the second convolutional neural networks of training, wherein the second convolutional neural networks may include having identical structure but different weights Multiple convolutional neural networks.In another embodiment, clipped reference object mask code matrix that can also first to being obtained Then point of addition information inputs the clipped reference object mask code matrix and xy coordinate diagrams obtained in step S1501 Second convolutional neural networks of training in advance.Wherein, the clipped reference object mask code matrix point of addition obtained is believed Breath is so that subsequent processing is more sensitive to location information.For example, clipped reference object mask code matrix and constant xy are sat It marks on a map and is connected so that spatial information to be added in network, and then increase the variation of position, reduce symmetry.

Second convolutional neural networks are used to determine the effect of movement of the reference object to object to be predicted.Assuming that answering With in scene, n_OIndicate the total quantity of object mask code matrix, n_DIt indicates the number of object to be predicted, is then directed to (n_O-1)×n_DIt is right Object, total second convolutional neural networks include (n altogether_O-1)×n_DA convolutional neural networks.By taking Fig. 1 as an example, field shown in Fig. 1 Jing Zhong, object mask code matrix include an object mask code matrix to be predicted and three reference object mask code matrixes, wherein to be predicted Object mask code matrix is the object mask code matrix to be predicted for including intelligent body D, and reference object mask code matrix includes respectively ladder A First kind reference object mask code matrix including wall B the second class reference object mask code matrix and include the third of space C Class reference object mask code matrix, thus, corresponding three classes reference object pair one can be obtained by three convolutional neural networks and waited for Predict the effect of object.In addition, similarly, if including two dynamic objects in application scenarios, that is to say, that if in applied field Scape includes two intelligent body D, then can predict that the dynamic object is waited for by first respectively to two dynamic objects respectively Predict that object and the second object to be predicted indicate, then correspondingly, there are five object mask code matrixes altogether in the application scenarios, respectively For the first prediction object mask code matrix, the second prediction object mask code matrix, first kind reference object mask code matrix, the second class reference Object mask code matrix and third class reference object mask code matrix.Correspondingly, with the first prediction object mask code matrix corresponding the One group of reference object mask code matrix includes：Second prediction object mask code matrix, first kind reference object mask code matrix, the second class ginseng According to object mask code matrix and third class reference object mask code matrix.It is then directed to the first prediction object, needs corresponding four Convolutional neural networks, aforementioned four convolutional neural networks constitute first group of convolutional neural networks corresponding with the first prediction object. In addition, second group of reference object mask code matrix corresponding with the second prediction object mask code matrix includes：First prediction object mask Matrix, first kind reference object mask code matrix, the second class reference object mask code matrix and third class reference object mask code matrix. It is then directed to the second prediction object, needs corresponding four convolutional neural networks, aforementioned four convolutional neural networks are constituted and the The corresponding second group of convolutional neural networks of two prediction objects.To sum up, the second convolutional neural networks are to be predicted including corresponding respectively to Totally eight convolutional neural networks of two groups of object.

For ease of description, include an object to be predicted with exemplary scene, for corresponding one group of convolutional neural networks into Row description, but the application is without being limited thereto, it should be appreciated by those skilled in the art that including two or more objects to be predicted, It, can be with parallel processing to obtain every group of reference object respectively in the case of correspondingly including two or more groups convolutional neural networks Effect to corresponding object to be predicted.

In the example by taking Fig. 1 as an example, object mask code matrix includes an object mask code matrix to be predicted and three references Object mask code matrix, then the second convolutional neural networks include three convolutional neural networks, and three convolutional neural networks can be with With identical structure but with different weights.One in the specific implementation, the structure of the convolutional neural networks is similar to shown in Fig. 3 Structure.The order of connection of convolutional neural networks is R (BN (Conv (16,3,2))), R (BN (Conv (32,3,2))), R (BN (Conv (64,3,2))), R (BN (Conv (128,3,2))), the last one convolutional layer successively by 128 dimension hidden layers and 2 dimension export Layer reconstruct and full connection.

It should be noted that the structure and parameter of above-mentioned convolutional neural networks is only for example, those skilled in the art can be with Default behavior is based on to be predicted based on object to be predicted, reference object and reference object included in different application scene The effect of object to carry out variants and modifications to the structure and parameter of convolutional neural networks, and this is no longer going to repeat them.

Here, the effect is the reference object that is learnt based on convolutional neural networks to object Behavior-based control to be predicted Effect caused by mobile.For example, in the case where object to be predicted is currently located at ladder and the behavior of input is upwards, institute It states effect and indicates that object to be predicted moves up a setting distance along ladder, the effect can for example indicate the effect Vector.For another example, in the case that the behavior for above-mentioned coloured flag left and input being currently located in object to be predicted is to the right, due to coloured silk Flag is on the movement of object to be predicted without influence, then corresponding coloured flag class reference object mask can to the effect of object mask to be predicted To be expressed as 0.

In addition, the effect can also include the preset object to be predicted effect of itself.For example, pre- based on application scenarios The object to be predicted being first arranged all moves right certain distance under any circumstance, thus, in view of reference object treat it is pre- In the case of the effect for surveying object, also need to consider the object to be predicted effect of itself finally to determine object Behavior-based control to be predicted Movement.The effect is for example indicated by vector.

In step S1505, the movement of object to be predicted is predicted based on default behavior and identified effect.

In one embodiment, each ginseng clipped each reference object mask code matrix obtained based on each convolutional neural networks Take reference object represented by object mask code matrix to the effect of object to be predicted and object self-acting picture adduction to be predicted with Default behavior based on such as one-hot codings is multiplied to obtain the dynamic prediction to object to be predicted.

In another embodiment, clipped each reference object mask code matrix is obtained based on each convolutional neural networks each Reference object represented by reference object mask code matrix to the effect of object to be predicted and object self-acting to be predicted respectively with Then default behavior based on such as one-hot codings, which is multiplied, to be added again to obtain the dynamic prediction to object to be predicted.

In conclusion the dynamic prediction method of the application by acquired current frame image by being divided into object to be predicted And reference object, and based between the reference object mask code matrix and object mask code matrix to be predicted indicated by object mask code matrix Relationship and default behavior predict the movement of object to be predicted, enabling improve the generalization ability of dynamic prediction, and Object is indicated using mask code matrix so that the prediction process is interpretable.

In practical applications, in some cases, not only need to predict the movement of object to be predicted, it is also necessary to predict next Frame image.In consideration of it, referring to Fig. 5, Fig. 5 is shown as the flow of the application dynamic prediction method in another embodiment Figure, as shown, dynamic prediction method includes step S510, step S530, step S550, step S570 and step S590.

In step S510, current frame image is obtained.Step S510 is identical as the mode of step S110 in aforementioned citing Or it is similar, this will not be detailed here.

In step S570, constant background when being extracted from current frame image.

Wherein, when described constant background refer in image not over time and change object be formed by image. In some embodiments, image background can be obtained for example, by foreground detection mode.In further embodiments, it can be based on pre- Constant background when first trained third convolutional neural networks are extracted from current frame image.For example, the third convolutional Neural net The structure of network includes but not limited to：Full convolution, convolution deconvolution, residual error network (ResNet), Unet etc..

In one example, the third convolutional neural networks are set as convolution deconvolution structure.Referring to Fig. 6, Fig. 6 is shown For the third convolutional neural networks structural schematic diagram in another embodiment that the application dynamic prediction method uses, such as scheme Shown, the third convolutional neural networks are coder-decoder structure.Wherein, I (t) indicates current frame image, I_bg(t) it indicates Current background image, solid arrow indicate that convolution adds activation primitive, dotted arrow to indicate that reconstruct, the expression of single dotted broken line arrow connect entirely It connects, dash-double-dot arrow indicates that deconvolution adds activation primitive, and in this example, activation primitive selects ReLU.Wherein, for all Convolution sum deconvolution, setting convolution kernel, step-length and port number are respectively 3,2 and 64, are hidden between encoder and decoder The dimension of layer is 128.In addition, for the training of a large amount of environment, the port number of convolution could be provided as 128 to improve background separation Effect.Further, it is also possible to which the activation primitive ReLU of the last one warp lamination is replaced with tanh functions to export -1 to 1 model The value enclosed.

In step S530, based on current frame image determine include object to be predicted object mask code matrix to be predicted and Reference object mask code matrix including reference object.Step S530 is identical as the mode of step S130 in aforementioned citing or phase Seemingly, this will not be detailed here.

In step S550, based on relationship between object mask code matrix to be predicted and reference object mask code matrix and pre- If the movement of behavior prediction object to be predicted.Step S550 and the mode of the step S150 in aforementioned citing are same or similar, This is no longer described in detail.

In step S590, in conjunction with extracted when constant background and the object to be predicted predicted movement obtain it is next Frame image.

Wherein, the image I (t+ at the next frame image, that is, above-mentioned t+1 moment corresponding with current frame image I (t) 1).In one embodiment, based on current frame image, background image, object mask code matrix and the object to be predicted predicted Movement uses spatial alternation network (STN) to carry out spatial alternation processing to obtain next frame image.Specifically, on the one hand, be based on The movement of object mask code matrix to be predicted and the object to be predicted predicted carries out spatial alternation using the first spatial alternation network It handles and executes complementary operation to carry out multiplication operation with the when constant background image extracted and then obtain the back of the body at t+1 moment Scape image, wherein the multiplication operation refers to carrying out array element multiplication algorithm.On the other hand, it is based on object mask to be predicted The movement of matrix, current frame image and the object to be predicted predicted carries out spatial alternation using second space converting network Processing is to obtain the object images at t+1 moment.To the object images of the background image and t+1 moment at above-mentioned t+1 moment Sum operation is carried out to obtain the image i.e. next frame image at t+1 moment, wherein the sum operation refers to carrying out array member Plain phase computation system.Similarly, application scenarios include two objects to be predicted in the case of, for two objects to be predicted respectively into Mobile state is predicted and result is simultaneously displayed on next frame image.

It, can be using i.e. the predicted image of next frame image as new frame image in the case where obtaining next frame image It is supplied to step S510, such circulate operation is to predict the whole process of object movement to be predicted.

In addition, being based on above description, for the neural network that the application uses, can be introduced in training neural network Loss function is adjusted neural network, and can evaluate "current" model according to loss function in application.For example, Include but not limited in the application：Entropy loss function is introduced to limit the entropy of object mask code matrix, introduces and returns loss function The motion vector of optimization object mask code matrix and object to be predicted introduces pixel loss function to limit image prediction error, again Structure present image etc..In one embodiment, first convolutional neural networks in the application, second convolutional neural networks And the third convolutional neural networks are obtained through unified training according to loss function.In one example, for present frame Image is the case where acquisition based on initial data, and introduced loss function is as follows.

1) the step of being directed to constant background when being extracted from current frame image, L_BackgroundBackground loss function is indicated, for making network Meet timeinvariance energy.

Where it is assumed that H and W indicate the height and width of image respectively,Indicate t moment background image,It indicates T+1 moment background images, if current frame image I^(t)∈R^H×W×3, then the corresponding background image of current frame imageWherein, R indicates real number space, and the pixel of background image does not change over time.

2) be directed to based on current frame image determine include object to be predicted object mask code matrix to be predicted and including join According to object reference object mask code matrix the step of, wherein

i)L_EntropyEntropy loss function pixel-by-pixel is indicated, for limiting the entropy of object mask code matrix.It is every in order to reduce in the application The uncertain and incentive object mask code matrix of a pixel I (u, v) relationship obtains more discrete distribution and introduces entropy damage pixel-by-pixel It loses.

Wherein, n_OIndicate that the total quantity of object mask code matrix, c indicate full connection features layer in the first convolutional neural networks The c articles channel, f (u, v, c) indicate that the value at position (u, v), i indicate i-th of reference in the c articles channel of full connection features layer Object mask matrix, p indicate that the pixel I (u, v) of input picture belongs to the probability of the c articles channel object.

ii)L_ShortcutIt indicates to return loss function, for optimizing reference object mask code matrix and motion vector, in reconstruct image Early stage feedback is provided as before.

Wherein, n_OIndicate the total quantity of object mask code matrix, n_DIndicate that the quantity of independent object to be predicted, j indicate jth A object to be predicted, t indicate that t moment, t+1 indicate the t+1 moment,Indicate the motion vector of object Dj to be predicted andE_self(Dj) indicate the effect of object itself to be predicted andE(O_i,D_j) indicate i-th of reference The effect of j-th of object to be predicted of object pair andn_αThe quantity of expression behavior, α^(t)Expression behavior and

3) it is directed to and predicts the movement of the object to be predicted and combine to be carried based on default behavior and identified effect The when constant background and movement of the object to be predicted predicted the step of obtaining next frame image taken, wherein

i)L_PredictionIt indicates image prediction error, in the application, uses l₂Pixel loss limits image prediction error.

Wherein, I^(t+1)Indicate the frame of pixels at t+1 moment,Indicate the frame of pixels at predicted t+1 moment, In,Indicate frame of pixels of the object to be predicted predicted at the t+1 moment,Indicate predicted reference object the The frame of pixels at t+1 moment, STN representation space converting networks indicate that array element is multiplied.

ii)L_ReconstructIt indicates reconstructed error, in the application, uses similar l₂Pixel loss reconstructs current frame image.

iii)L_{Conformity error}The consistency that pixel changes when being moved for description object.

In conclusion the first convolutional neural networks described herein, the second convolutional neural networks and third convolution Neural network is according to through assigning different weights to above-mentioned loss function and combining and the total losses function that obtains carries out Adjusting training.That is, the first convolutional neural networks described herein, the second convolutional neural networks and third volume Product neural network is trained based on the total losses in formula (9).

L_Always=L_Shortcut+λ_pL_Prediction+λ_rL_Reconstruct+λ_cL_{Conformity error}+λ_bgL_Background+λ_eL_Entropy(formula 9)

Wherein, λ p, λ r, λ c, λ bg and λ e indicate weight.

In addition, being the case where acquisition based on the outer input data with priori, in nerve for current frame image Additional candidate region error L is introduced in the training of network_{Candidate region}。

Wherein,It is denoted as candidate dynamic area.

In addition, on the one hand, the application determines packet in appearance level when carrying out object detection based on current frame image The object mask code matrix to be predicted of object to be predicted and the reference object mask code matrix including reference object are included, on the other hand, Carry out dynamic prediction when in relationship level Behavior-based control and object relationship can also determine object mask code matrix to be predicted and Reference object mask code matrix, therefore, it is possible to carry out comparison compensation to result determined by the two to determine more accurately with reference to right As mask code matrix and reference object mask code matrix.

The application also provides a kind of Dynamic Forecasting System.Exist referring to Fig. 7, Fig. 7 is shown as the application Dynamic Forecasting System Structural schematic diagram in a kind of embodiment, as shown, the Dynamic Forecasting System includes acquiring unit 11, object detection list Member 12 and predicting unit 13.

Acquiring unit 11 is for obtaining current frame image.Wherein, the current frame image is relative to being described later on down For one frame image.In this example, current frame image refers to the image I (t) in t moment, and next frame image refers to be predicted The t+1 moment image I (t+1).In the following description, current frame image I is indicated with exemplary scene image shown in FIG. 1 (t)。

Subject detecting unit 12 be used for based on current frame image determine include object to be predicted object mask square to be predicted Battle array and the reference object mask code matrix including reference object.

In one embodiment, subject detecting unit 12 is obtained using such as foreground detection mode from sequence image to be predicted Object, based on the feature of reference object in the application scenarios pre-entered to obtain reference object by feature recognition, pass through It is to be predicted right to obtain that mask code matrix carries out mask code matrix processing to the current frame image including object to be predicted and reference object As mask code matrix and reference object mask code matrix.Wherein, the object mask code matrix to be predicted and reference object mask code matrix point Location information of the mask code matrix in current frame image is not included, so that can be determined by the location information to be predicted The position of object mask code matrix and reference object mask code matrix relative to current frame image.In one example, the reference object Mask code matrix and the object mask code matrix to be predicted are to carry out mask code matrix operation based on current frame image with artwork size to obtain It arrives.

In another embodiment, subject detecting unit 12 is used to be based on institute using the first convolutional neural networks of training in advance It states current frame image and determines the object mask code matrix to be predicted for including object to be predicted and the reference object including reference object Mask code matrix.In one example, the first convolutional neural networks may include that multiple structures are identical but the convolutional Neural of weighted Network.Current frame image can input to each convolutional neural networks of training in advance, the output layers of each convolutional neural networks via Channel is interconnected amongst one another to form full connection features figure, be followed by pixel-by-pixel softmax layers of (pixel-wise) to obtain specific to object Object mask code matrix to be predicted and reference object mask code matrix specific to class, wherein the number of the convolutional neural networks It can be determined based on the number of object mask code matrix.By taking Fig. 1 as an example, exemplary scene according to figure 1, object mask code matrix Thus it can pass through four convolutional neural networks including an object mask code matrix to be predicted and three reference object mask code matrixes Corresponding four object mask code matrixes are obtained, four convolutional neural networks can be with identical structure but with different power Weight.If exemplary scene shown in FIG. 1 relatively passes through five convolutional neural networks including two dynamic objects, that is, intelligent body D Corresponding five object mask code matrixes are obtained, five convolutional neural networks can be with identical structure but with different power Weight.

Predicting unit 13 be used for based between object mask code matrix to be predicted and reference object mask code matrix relationship and The movement of default behavior prediction object to be predicted.

In the application, predicting unit 13 is with default behavior and object mask code matrix to be predicted and reference object mask code matrix Between relationship predict the movement of object to be predicted.In some embodiments, predicting unit 13 is also based on to be predicted right As between mask code matrix and reference object mask code matrix relationship and default behavior prediction include object to be predicted and with reference to right As the movement of all objects inside, but the prediction mode is computationally intensive compared to for the movement for only predicting object to be predicted, Inefficiency.

Referring to Fig. 8, Fig. 8 is shown as the knot of predicting unit in one embodiment in the application Dynamic Forecasting System Structure schematic diagram, as shown, predicting unit 13 may include cutting module 131, effect determining module 132 and prediction module 133。

Module 131 is cut to be used to centered on the position of object to be predicted in object mask code matrix to be predicted regard according to default Wild window size cuts reference object mask code matrix to obtain clipped reference object mask code matrix.

In one embodiment, effect determining module 132 is used to determine warp based on the second convolutional neural networks of training in advance Effect of the reference object to object to be predicted represented by the reference object mask code matrix of cutting.In another embodiment, it acts on Determining module 132 can be also used for the clipped reference object mask code matrix point of addition information to being obtained and based on advance The second trained convolutional neural networks determine the reference object represented by clipped reference object mask code matrix to be predicted right The effect of elephant.

That is, the clipped reference object mask code matrix that can will be obtained via cutting module 131 inputs in advance The second trained convolutional neural networks, wherein the second convolutional neural networks may include having identical structure but different weights Multiple convolutional neural networks.Alternatively, can be first to the clipped reference object mask code matrix point of addition information obtained, so It afterwards will be via cutting the of clipped reference object mask code matrix that module 131 obtains and xy coordinate diagrams input training in advance Two convolutional neural networks.Wherein, to the clipped reference object mask code matrix point of addition information that is obtained so that follow-up Processing is more sensitive to location information.For example, being connected clipped reference object mask code matrix to incite somebody to action with constant xy coordinate diagrams Spatial information is added in network, and then increases the variation of position, reduces symmetry.

Prediction module 133 is used to predict the movement of object to be predicted based on default behavior and identified effect.

In one embodiment, prediction module is used to clipped each reference object mask code matrix being based on each convolutional Neural net Effect and to be predicted object itself of the reference object represented by each reference object mask code matrix that network obtains to object to be predicted Effect phase adduction is multiplied to obtain the dynamic prediction to object to be predicted with the default behavior based on such as one-hot codings.

In another embodiment, prediction module is used to clipped each reference object mask code matrix being based on each convolutional Neural The reference object represented by each reference object mask code matrix that network obtains to the effect of object to be predicted and object to be predicted from Body effect is multiplied with the default behavior encoded based on such as one-hot and then is added again to obtain moving object to be predicted respectively State is predicted.

Referring to Fig. 9, Fig. 9 is shown as predicting unit in the application Dynamic Forecasting System in another embodiment Structural schematic diagram, in conjunction with Fig. 1, as shown in figure 9, one in the specific implementation, the cutting module in predicting unit is received via right After the object mask code matrix including object mask code matrix to be predicted and reference object mask code matrix determined as detection unit, cut out Cut-off-die block determines the position of object to be predicted in object mask code matrix to be predicted and foundation presets the visual field centered on the position Window, which to reference object mask code matrix cut, obtains clipped reference object mask code matrix, and then, effect determining module is logical Xy coordinate diagrams are crossed to the clipped reference object mask code matrix point of addition information that is obtained and based on the second of training in advance Convolutional neural networks determine work of each reference object represented by clipped each reference object mask code matrix to object to be predicted With.Then, prediction module to above-mentioned reference object to itself of the effect of object to be predicted and preset object to be predicted Effect summation simultaneously carries out dot product to predict the movement of object to be predicted with preset behavior.

In conclusion the Dynamic Forecasting System of the application will be acquired in acquiring unit by using subject detecting unit Current frame image is divided into object and reference object to be predicted, and using predicting unit based on the ginseng indicated by object mask code matrix According between object mask code matrix and object mask code matrix to be predicted relationship and default behavior predict the fortune of object to be predicted It is dynamic, enabling to improve the generalization ability of dynamic prediction, and object is indicated using mask code matrix so that the prediction process can It explains.

In practical applications, in some cases, not only need to predict the movement of object to be predicted, it is also necessary to predict next Frame image.In consideration of it, referring to Fig. 10, Figure 10 is shown as the structure of the application Dynamic Forecasting System in another embodiment Schematic diagram, as shown, Dynamic Forecasting System includes acquiring unit 91, subject detecting unit 92, predicting unit 93 and extraction Unit 94.

Acquiring unit 91 is for obtaining current frame image.Acquiring unit 91 it is identical as the acquiring unit 11 in aforementioned citing or It is similar, it is no longer similar herein.

Constant background when being used to extract from current frame image of extraction unit 94.

Wherein, when described constant background refer in image not over time and change object be formed by image. In some embodiments, extraction unit 94 can obtain image background for example, by foreground detection mode.In further embodiments, Constant background when can be used for extracting from current frame image based on third convolutional neural networks trained in advance of extraction unit 94. For example, the structure of the third convolutional neural networks includes but not limited to：Full convolution, convolution deconvolution, residual error network (ResNet), Unet etc..

Subject detecting unit 92 be used for based on current frame image determine include object to be predicted object mask square to be predicted Battle array and the reference object mask code matrix including reference object.Subject detecting unit 92 and the subject detecting unit in aforementioned citing 12 is same or similar, no longer similar herein.

93 one side of predicting unit is used for based on the pass between object mask code matrix to be predicted and reference object mask code matrix The movement of system and default behavior prediction object to be predicted, constant background and institute are pre- when being on the other hand additionally operable to combine extracted The movement for the object to be predicted surveyed obtains next frame image.Wherein, predicting unit 93 is used to be based on object mask code matrix to be predicted The movement of relationship and default behavior prediction object to be predicted between reference object mask code matrix and aforementioned predicting unit 13 For based between object mask code matrix to be predicted and reference object mask code matrix relationship and default behavior prediction it is to be predicted The movement of object is same or similar, no longer similar herein.

In addition, when being used to combine extracted about predicting unit 93 constant background and the object to be predicted predicted fortune The dynamic embodiment for obtaining next frame image, wherein the next frame image, that is, above-mentioned corresponding with current frame image I (t) The image I (t+1) at t+1 moment.In one embodiment, predicting unit 93 is based on current frame image, background image, object mask The movement of matrix and the object to be predicted predicted uses spatial alternation network (STN) to carry out spatial alternation processing to obtain Next frame image.Specifically, on the one hand, predicting unit 93 is based on object mask code matrix to be predicted and the object to be predicted predicted Movement, use the first spatial alternation network to carry out spatial alternation processing and execute complementary operation with the when constant back of the body that is extracted Scape image carries out multiplication operation and then obtains the background image at t+1 moment, wherein the multiplication operation refers to carrying out array member Plain multiplication algorithm.On the other hand, predicting unit 93 is based on object mask code matrix to be predicted, current frame image and that is predicted wait for The movement for predicting object uses second space converting network to carry out spatial alternation processing to obtain the object images at t+1 moment. Background image and the object images at t+1 moment to the above-mentioned t+1 moment carry out sum operation to obtain the figure at t+1 moment Picture i.e. next frame image, wherein the sum operation refers to carrying out array element phase computation system.Similarly, include two in application scenarios In the case of a object to be predicted, predicting unit for two objects to be predicted carries out dynamic prediction and respectively by result while aobvious Show on next frame image.

In the case where predicting unit 93 obtains next frame image, can be predicted image using next frame image as New frame image is supplied to acquiring unit 91, and such circulate operation is to predict the whole process of object movement to be predicted.

It please refers to Fig.1 1, Figure 11 and is shown as structural representation of the application Dynamic Forecasting System in another embodiment Figure, as shown, one in the specific implementation, the Dynamic Forecasting System can be an end-to-end deep neural network, institute It includes multiple convolutional neural networks to state deep neural network, and the deep neural network is defeated with current frame image and behavior Enter, by exporting predicted next frame image after housebroken neural network.In conjunction with Fig. 1, as shown in figure 11, depth nerve net Network is using current frame image I (t) and behavior as input, on the one hand, extraction unit is based on third convolutional neural networks from present frame Extraction background image I in image I (t)_bg(t).On the other hand, subject detecting unit is based on current using the first convolutional neural networks Frame image I (t) determines that object mask code matrix, the object mask code matrix include reference object mask code matrix shown in top under Object mask code matrix to be predicted shown in portion.Then, predicting unit is based on object mask code matrix using the second convolutional neural networks Between relationship and behavior prediction object to be predicted movement.Then, on the one hand, pre- based on waiting for as shown in double dot dash line in figure The movement for surveying object mask code matrix and the object to be predicted predicted carries out spatial alternation processing using STN and executes fortune of negating It calculates to carry out multiplication operation with the when constant background image extracted and then obtain the background image at t+1 moment, wherein described Multiplication operation refers to carrying out array element multiplication algorithm.On the other hand, as shown in phantom in FIG., it is based on object mask square to be predicted Battle array and current frame image carry out then movement that multiplication operation combines the object to be predicted predicted on this basis, using STN Spatial alternation processing is carried out to obtain the object images at t+1 moment.Finally, to the background image and t at above-mentioned t+1 moment The object images at+1 moment carry out sum operation to obtain the image i.e. next frame image I (t+1) at t+1 moment, wherein described Multiplication operation refers to carrying out array element multiplication algorithm, and the sum operation refers to carrying out array element phase computation system.In addition, deep Next frame image I (t+1) can also be predicted the image I (t+2) at t+2 moment by degree neural network as new frame image, Such circulate operation obtains the whole process of object movement to be predicted.The Dynamic Forecasting System of the application uses the end of an entirety Opposite end neural network so that operated as a whole in the training and use to neural network, reduce manual intervention, realized Good estimated performance.

The application also provides a kind of equipment, please refers to Fig.1 2, Figure 12 and is shown as the application equipment in one embodiment Structural schematic diagram, as shown, the equipment includes storage device 21 and processing unit 22.

Storage device 21 is for storing at least one program.Described program includes being called by processing unit 22 of being described later on With execute acquisition, determination, extraction, prediction and etc. corresponding program.The storage device includes but not limited to that high speed is deposited at random Access to memory, nonvolatile memory.Such as one or more disk storage equipments, flash memory device or other nonvolatile solid states Storage device.In certain embodiments, storage device can also include the memory far from one or more processors, for example, Via the network attached storage that RF circuits or outside port and communication network (not shown) access, wherein the communication network Can be internet, one or more intranet, LAN (LAN), wide area network (WLAN), storage area network (SAN) etc. or its It is appropriately combined.Memory Controller can control the other assemblies of such as CPU and Peripheral Interface of robot etc to storage device Access.

Processing unit 22 is connected with storage device 21.The processing unit may include one or more processors.Processing dress Set operationally in storage device volatile memory and/or nonvolatile memory couple.Processing unit can perform The instruction stored in memory and/or non-volatile memory device to execute operation in a device, such as to acquired current Frame is analyzed and predicts the movement etc. of object to be predicted.In this way, processor may include one or more general purpose microprocessors, one A or multiple application specific processors (ASIC), one or more digital signal processors (DSP), one or more field-programmables are patrolled Collect array (FPGA) or any combination of them.In a kind of example, the processing unit connects storage dress by data line It sets.The processing unit is interacted by reading and writing data technology with storage device.Wherein, the reading and writing data technology include but It is not limited to：At a high speed/low speed data interface protocol, data base read-write operation etc..

Processing unit 22 is used to call at least one program to execute any dynamic prediction method above-mentioned. The dynamic prediction method includes：First, it is based on initial data and obtains t moment image I (t).Then, based on training in advance Convolutional neural networks execute operations described below using current frame image I (t) and default behavior as input：1) from image I (t) Constant background when extraction；2) based on current frame image I (t) determinations include object to be predicted object mask code matrix to be predicted and Reference object mask code matrix including reference object.Object mask code matrix wherein to be predicted is determined based on object to be predicted, Reference object mask code matrix is that the incidence relation moved based on reference object and object to be predicted is determined；3) it is to be predicted right to be based on As the movement of relationship and default behavior prediction object to be predicted between mask code matrix and reference object mask code matrix.Show one In example, according to default visual field window size to reference first centered on the position of object to be predicted in object mask code matrix to be predicted Object mask code matrix is cut to obtain clipped reference object mask code matrix, then, to the clipped ginseng obtained According to object mask code matrix point of addition information and determine that the reference object represented by clipped reference object mask code matrix is treated It predicts the effect of object, then, the movement of object to be predicted is predicted based on default behavior and identified effect；4) it combines and is carried Take when constant background and the object to be predicted predicted movement obtain t+1 moment image I (t+1).That is, through pre- T+1 moment image I (t+1) are exported after first trained convolutional neural networks processing.In addition, being repeated based on image I (t+1) above-mentioned Operation is recycled with this with obtaining t+2 moment image I (t+2) to predict the whole process of object movement to be predicted.

It please refers to Fig.1 3, Figure 13 and is shown as the structural schematic diagram of the application equipment in another embodiment, as schemed institute Show, the equipment further includes display device 23.Display device 23 is connected with processing unit 22.In one example, the processing dress It sets and display device is connected by data line.The processing unit is interacted by interface protocol and display device.Wherein, described Interface protocol includes but not limited to：HDMI interface agreement, serial interface protocol etc..

In certain embodiments, display device is for showing object mask code matrix to be predicted, reference object mask code matrix, warp At least one of the exercise data of the object to be predicted of prediction.Wherein, described to be predicted right by taking scene shown in Fig. 1 as an example As mask code matrix is the object mask code matrix to be predicted for indicating intelligent body D.The reference object mask code matrix is to indicate ladder A The spatial class ginseng of ladder class reference object mask code matrix, the wall kind reference object mask code matrix for indicating wall B, representation space C According to object mask code matrix.The exercise data of the predicted object to be predicted includes but not limited to：Object to be predicted moves rail Mark, the object direction of motion to be predicted and numerical value such as move upwards three pixels, next frame image.

In certain embodiments, can also by showing object mask image non-object mask code matrix come just more intuitive Dynamic prediction process is observed on ground.In consideration of it, processing unit be additionally operable to based on current frame image, object mask code matrix to be predicted and Reference object mask code matrix generates object mask image and reference object mask image to be predicted；Display device is additionally operable to display and waits for Predict object mask image and/or reference object mask image.By taking scene shown in Fig. 1 as an example, the object mask figure to be predicted As being that the object to be predicted including intelligent body D that processing unit is generated based on current frame image and object mask code matrix to be predicted is covered Code image.The reference object mask image is the packet that processing unit is generated based on current frame image and reference object mask code matrix Include the ladder class reference object mask image of ladder A including the wall kind reference object mask image of wall B including space C Spatial class reference object mask image.Display device can show prediction object mask image, reference object according to for demand Mask image, partly referring to image mask image or combinations thereof.

In addition, the processing unit is additionally operable to based on object mask image to be predicted and the generation pair of reference object mask image As mask image；The display device is additionally operable to show the object mask image.In one embodiment, the processing unit root Processing is overlapped to generate pair to object mask image to be predicted and corresponding reference object mask image according to user demand It exports as mask image and through display device.By taking scene shown in Fig. 1 as an example, the processing unit is pre- to waiting for including intelligent body D It surveys object mask image including the ladder class reference object mask image of ladder A including the wall kind reference object of wall B is covered Code image is overlapped processing to obtain object mask image, and user can be based on the object phase to be predicted shown by display device Object to be predicted is intuitively observed for the position of ladder and wall.Thus, by display device show object mask code matrix, Object mask image allows user by checking that image and respective value learn the opposite of object and reference object to be predicted The movement of position relationship, object to be predicted can visually and semantically explain dynamic model.

It is further to note that through the above description of the embodiments, those skilled in the art can be clearly Recognize that some or all of the application can be realized by software and in conjunction with required general hardware platform.Based on such reason Solution, the application also provide a kind of computer readable storage medium, and the storage medium is stored at least one program, described program Any dynamic prediction method above-mentioned is realized when executed.

Based on this understanding, substantially the part that contributes to existing technology can in other words for the technical solution of the application To be expressed in the form of software products, which may include be stored thereon with machine-executable instruction one A or multiple machine readable medias, these instructions by computer, computer network or other electronic equipments etc. one or Multiple machines may make the one or more machine to execute operation according to an embodiment of the present application when executing.Such as execute machine Each step in the localization method of people etc..Machine readable media may include, but be not limited to, floppy disk, CD, CD-ROM (compact-discs- Read-only memory), magneto-optic disk, ROM (read-only memory), RAM (random access memory), (erasable programmable is read-only by EPROM Memory), EEPROM (electrically erasable programmable read-only memory), magnetic or optical card, flash memory or executable suitable for storage machine Other kinds of medium/machine readable media of instruction.Wherein, the storage medium can may be alternatively located at third party positioned at robot In server, certain is provided using in the server in store as being located at.Concrete application store is not limited at this, such as millet application Store, Huawei are using store, apple using store etc..

The application can be used in numerous general or special purpose computing system environments or configuration.Such as：Personal computer, service Device computer, handheld device or portable device, laptop device, multicomputer system, microprocessor-based system, top set Box, programmable consumer-elcetronics devices, network PC, minicomputer, mainframe computer including any of the above system or equipment Distributed computing environment etc..

The application can describe in the general context of computer-executable instructions executed by a computer, such as program Module.Usually, program module includes routines performing specific tasks or implementing specific abstract data types, program, object, group Part, data structure etc..The application can also be put into practice in a distributed computing environment, in these distributed computing environments, by Task is executed by the connected remote processing devices of communication network.In a distributed computing environment, program module can be with In the local and remote computer storage media including storage device.

The principles and effects of the application are only illustrated in above-described embodiment, not for limitation the application.It is any ripe Know the personage of this technology all can without prejudice to spirit herein and under the scope of, carry out modifications and changes to above-described embodiment.Cause This, those of ordinary skill in the art is complete without departing from spirit disclosed herein and institute under technological thought such as At all equivalent modifications or change, should be covered by claims hereof.

Claims

1. a kind of dynamic prediction method, which is characterized in that include the following steps：

Obtain current frame image；

Include the object mask code matrix to be predicted of object to be predicted and including reference object based on current frame image determination Reference object mask code matrix；

Based between the object mask code matrix to be predicted and the reference object mask code matrix relationship and default behavior it is pre- Survey the movement of the object to be predicted.

2. dynamic prediction method according to claim 1, which is characterized in that the object mask code matrix to be predicted is to be based on What the object to be predicted determined, the reference object mask code matrix is transported based on the reference object and the object to be predicted What the type of dynamic incidence relation or the reference object determined.

3. dynamic prediction method according to claim 1 or 2, which is characterized in that described true based on the current frame image Surely include the steps that the object mask code matrix to be predicted of object to be predicted and the reference object mask code matrix including reference object Including：Using in advance training the first convolutional neural networks based on the current frame image determine include object to be predicted wait for it is pre- Survey object mask code matrix and the reference object mask code matrix including reference object.

4. dynamic prediction method according to claim 3, which is characterized in that described to be based on the object mask square to be predicted Described in relationship and default behavior prediction between battle array and the reference object mask code matrix the step of movement of object to be predicted Including：

According to default visual field window size to described centered on the position of object to be predicted in the object mask code matrix to be predicted Reference object mask code matrix is cut to obtain clipped reference object mask code matrix；

The ginseng represented by the clipped reference object mask code matrix is determined based on the second convolutional neural networks of training in advance Effect according to object to the object to be predicted；

The movement of the object to be predicted is predicted based on default behavior and identified effect.

5. dynamic prediction method according to claim 4, which is characterized in that further comprising the steps of：To the warp obtained The reference object mask code matrix point of addition information of cutting simultaneously determines the warp based on the second convolutional neural networks of training in advance Effect of the reference object to the object to be predicted represented by the reference object mask code matrix of cutting.

6. dynamic prediction method according to claim 4, which is characterized in that the effect further includes preset to be predicted right As the effect of itself.

7. dynamic prediction method according to claim 1, which is characterized in that further comprising the steps of：

Constant background when being extracted from the current frame image；

In conjunction with extracted when constant background and the object to be predicted predicted movement obtain next frame image.

8. dynamic prediction method according to claim 4, which is characterized in that further comprising the steps of：

Constant background when being extracted from the current frame image based on third convolutional neural networks trained in advance；

9. dynamic prediction method according to claim 8, which is characterized in that the third convolutional neural networks are set as rolling up Product deconvolution structure.

10. dynamic prediction method according to claim 8, which is characterized in that first convolutional neural networks, described Two convolutional neural networks and the third convolutional neural networks are obtained through unified training according to loss function.

11. dynamic prediction method according to claim 1, which is characterized in that the current frame image is to be based on original number According to or with priori outer input data obtain.

12. a kind of Dynamic Forecasting System, which is characterized in that including：

Acquiring unit, for obtaining current frame image；

Subject detecting unit, for including the object mask code matrix to be predicted of object to be predicted based on current frame image determination And the reference object mask code matrix including reference object；

Predicting unit, for based on the relationship between the object mask code matrix to be predicted and the reference object mask code matrix with And the movement of object to be predicted described in default behavior prediction.

13. Dynamic Forecasting System according to claim 12, which is characterized in that the object mask code matrix to be predicted is base It is determined in the object to be predicted, the reference object mask code matrix is based on the reference object and the object to be predicted What the type of the incidence relation of movement or the reference object determined.

14. Dynamic Forecasting System according to claim 12 or 13, which is characterized in that the subject detecting unit is for making With in advance training the first convolutional neural networks based on the current frame image determine include object to be predicted object to be predicted Mask code matrix and reference object mask code matrix including reference object.

15. Dynamic Forecasting System according to claim 14, which is characterized in that the predicting unit includes：

Module is cut, for the default visual field of foundation centered on the position of object to be predicted in the object mask code matrix to be predicted Window size cuts the reference object mask code matrix to obtain clipped reference object mask code matrix；

Determining module is acted on, for determining that the clipped reference object is covered based on the second convolutional neural networks of training in advance Effect of the reference object to the object to be predicted represented by code matrix；

Prediction module, the movement for predicting the object to be predicted based on default behavior and identified effect.

16. Dynamic Forecasting System according to claim 15, which is characterized in that the effect determining module is used for being obtained Clipped reference object mask code matrix point of addition information and based in advance training the second convolutional neural networks determine Effect of the reference object to the object to be predicted represented by the clipped reference object mask code matrix.

17. Dynamic Forecasting System according to claim 15, which is characterized in that the effect further includes preset to be predicted The effect of object itself.

18. Dynamic Forecasting System according to claim 12, which is characterized in that further include：

Extraction unit, constant background when for being extracted from the current frame image；

Under the movement of the predicting unit constant background and the object to be predicted predicted when being additionally operable to combine extracted obtains One frame image.

19. Dynamic Forecasting System according to claim 15, which is characterized in that the extraction unit is used for based on instruction in advance Experienced third convolutional neural networks constant background when being extracted from the current frame image；The predicting unit is additionally operable to combine institute Extraction when constant background and the object to be predicted predicted movement obtain next frame image.

20. Dynamic Forecasting System according to claim 19, which is characterized in that the third convolutional neural networks are set as Convolution deconvolution structure.

21. Dynamic Forecasting System according to claim 19, which is characterized in that first convolutional neural networks, described Second convolutional neural networks and the third convolutional neural networks are obtained through unified training according to loss function.

22. Dynamic Forecasting System according to claim 12, which is characterized in that the current frame image is to be based on original number According to or with priori outer input data obtain.

23. a kind of computer readable storage medium is stored at least one program, which is characterized in that at least one program It is performed and realizes any dynamic prediction method in claim 1-11.

24. a kind of equipment, which is characterized in that including：

Storage device, for storing at least one program；

Processing unit is connected with the storage device, for calling at least one program to execute such as claim 1-11 In any dynamic prediction method.

25. equipment according to claim 24, which is characterized in that further include display device, the display device is for showing Show the movement number of the object mask code matrix to be predicted, the reference object mask code matrix, the predicted object to be predicted At least one of according to.

26. equipment according to claim 24, which is characterized in that the processing unit is additionally operable to be based on the present frame figure Picture, the object mask code matrix to be predicted and the reference object mask code matrix generate object mask image to be predicted and reference Object mask image；The display device is additionally operable to show that the object mask image to be predicted and/or the reference object are covered Code image.

27. equipment according to claim 26, which is characterized in that the processing unit is additionally operable to based on described to be predicted right As mask image and the reference object mask image generate object mask image；It is described right that the display device is additionally operable to show As mask image.