CN110334654A - Video prediction method and apparatus, training method of video prediction model, and vehicle - Google Patents
Video prediction method and apparatus, training method of video prediction model, and vehicle
- Publication number
- CN110334654A (application number CN201910610206.9A)
- Authority
- CN
- China
- Prior art keywords
- frame image
- feature map
- previous
- frame
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/584—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
Abstract
This application discloses a video prediction method and apparatus, a training method of a video prediction model, and a vehicle. The video prediction method comprises: determining an Nth feature map of the previous N frames, wherein the Nth feature map includes spatial features and temporal features of the previous N frames; and generating M future frames according to the Nth feature map, wherein P frames are skipped between the future M frames and the previous N frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1. By determining future M frames that are temporally discontinuous with the previous N frames, the technical solution of this application predicts video frames in a future time period, which shortens computation time, reduces resource occupation and computational burden, and improves prediction efficiency.
Description
Technical field
The present invention relates to the field of computer vision, and in particular to a video prediction method and apparatus, a training method of a video prediction model, and a vehicle.
Background art
Video prediction infers future video from a given video, so that a user can make judgments or decisions in advance based on the predicted video. Existing video prediction methods generally predict the next frame from the previous frame and achieve long-horizon prediction by repeating this operation continuously.
Summary of the invention
To solve the above technical problem, this application is proposed. Embodiments of this application provide a video prediction method and apparatus, a training method of a video prediction model, and a vehicle.
According to one aspect of this application, a video prediction method is provided, comprising: determining an Nth feature map of the previous N frames, wherein the Nth feature map includes spatial features and temporal features of the previous N frames; and generating M future frames according to the Nth feature map, wherein P frames are skipped between the future M frames and the previous N frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
According to another aspect of this application, a video prediction apparatus is provided, comprising: a determining module configured to determine an Nth feature map of the previous N frames, wherein the Nth feature map includes spatial features and temporal features of the previous N frames; and a generation module configured to generate M future frames according to the Nth feature map, wherein P frames are skipped between the future M frames and the previous N frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
According to another aspect of this application, a training method of a video prediction model is provided, comprising: obtaining the video prediction model by training a machine learning model with multiple sample videos, wherein each of the multiple sample videos includes N previous sample frames and M future sample frames, and P sample frames are skipped between the future M sample frames and the previous N sample frames, wherein N is an integer greater than 1, and M and P are integers greater than or equal to 1.
According to another aspect of this application, a computer-readable storage medium is provided. The storage medium stores a computer program for executing the above video prediction method.
According to another aspect of this application, an electronic device is provided, comprising: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to execute the above video prediction method.
According to another aspect of this application, a vehicle is provided, including the above electronic device.
The embodiments of this application provide a video prediction method and apparatus, a training method of a video prediction model, and a vehicle. By using the Nth feature map of the previous N frames in a known video, M future frames that are temporally discontinuous with the previous N frames are determined, thereby predicting video frames in a future time period. Because prediction of the video frames lying between the previous N frames and the future M frames is omitted, the computation time is shortened, resource occupation and computational burden are reduced, and prediction efficiency is improved.
Brief description of the drawings
The above and other objects, features, and advantages of this application will become more apparent from the following detailed description of the embodiments of this application with reference to the accompanying drawings. The drawings are provided for further understanding of the embodiments of this application, form a part of the specification, and serve to explain this application together with the embodiments; they do not limit this application. In the drawings, identical reference labels generally denote the same components or steps.
Fig. 1 is a schematic diagram of the system architecture of a video prediction system provided by an exemplary embodiment of this application.
Fig. 2 is a flow diagram of a video prediction method provided by an exemplary embodiment of this application.
Fig. 3 is a schematic diagram of a scenario of a video prediction method provided by an exemplary embodiment of this application.
Fig. 4 is a flow diagram of determining the Nth feature map of the previous N frames provided by another exemplary embodiment of this application.
Fig. 5 is a flow diagram of generating M future frames according to the Nth feature map provided by another exemplary embodiment of this application.
Fig. 6 is a schematic structural diagram of a video prediction model provided by an exemplary embodiment of this application.
Fig. 7 is a schematic structural diagram of a video prediction apparatus provided by an exemplary embodiment of this application.
Fig. 8 is a block diagram of an electronic device provided by an exemplary embodiment of this application.
Specific embodiment
Hereinafter, example embodiments of this application will be described in detail with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of this application rather than all of them, and it should be understood that this application is not limited by the example embodiments described herein.
Application overview
Video prediction can predict changes in the positions of the entities in a video image, or predict the positional relationship between each entity and the surrounding environment, so that a user can make judgments in advance. Video prediction therefore has broad application prospects in fields such as robot decision-making, autonomous driving, and video understanding.
For example, in the field of autonomous driving, by predicting future video from previous video captured by a camera, the motion states of the vehicles around the current vehicle can be judged, so that the motion state of the current vehicle can be adjusted in time, improving driving safety.
Existing video prediction methods predict the next frame based on the previous frame and perform long-horizon video prediction by repeating this operation continuously. For example, if a user needs the video frames of the future period 95 s-100 s, an existing video prediction method has to predict the future frames for 0 s-100 s from the existing video in order to obtain the frames for 95 s-100 s. However, the frames for 0 s-95 s among the predicted 0 s-100 s frames are not what the user needs, and continuously predicting all frames of the future 0 s-100 s is time-consuming, occupies more computing resources, and increases the computational burden.
Therefore, it is difficult for existing video prediction methods to directly obtain the video frames of the period the user needs according to user demand; they suffer from long computation time and a heavy computational burden.
Exemplary system
Fig. 1 is a schematic diagram of the system architecture of a video prediction system 1 provided by an exemplary embodiment of this application. It illustrates an application scenario in which video in a certain future period is predicted from video captured by an image capture device (for example, a camera). As shown in Fig. 1, the video prediction system 1 includes an electronic device 10 and an image capture device 20. The electronic device 10 predicts video in a certain future period from the video captured by the image capture device, and then performs a corresponding operation on a terminal according to the predicted video. The terminal here may be an autonomous vehicle, and the image capture device 20 may be a vehicle-mounted camera or a camera mounted on other equipment (for example, an unmanned aerial vehicle).
It should be noted that the image capture device 20 in the embodiments of this application may be integrated into the electronic device 10.
It should also be noted that the above application scenario is shown merely for ease of understanding the spirit and principles of this application; the embodiments of this application are not limited to it. On the contrary, the embodiments of this application may be applied to any applicable scenario.
Illustrative methods
Fig. 2 is a flow diagram of a video prediction method provided by an exemplary embodiment of this application. The execution subject of this embodiment may be, for example, the electronic device in Fig. 1. As shown in Fig. 2, the method comprises the following steps:
Step 210: determine the Nth feature map of the previous N frames, wherein the Nth feature map includes spatial features and temporal features of the previous N frames.
The video prediction method provided by the embodiments of this application may be used in the field of autonomous driving. The entities in the video may be the current vehicle and/or the vehicles around the current vehicle, and the electronic device may be the controller of the on-board system of the current vehicle, or another controller independent of the current vehicle. The controller may directly control states such as the speed and steering of the current vehicle according to the predicted video obtained through video prediction, or may play the predicted video on a display screen of the vehicle, so that the driver adjusts states such as the speed and steering of the current vehicle according to the content of the predicted video.
Specifically, the previous N frames may be the images contained in the video captured by the camera in a preset time period before the current point in time, where N may be 2 or an integer greater than 2.
In one embodiment, a neural network model may be used to extract a first feature map from the 1st frame of the previous N frames; this first feature map includes the spatial features and temporal features of the 1st frame, i.e., it can reflect the relationship between the motion states of the entities in the image and time. The neural network model then extracts the spatial features of the 2nd frame of the previous N frames and combines the first feature map with the spatial features of the 2nd frame to extract a second feature map, which includes the spatial features and temporal features of the 1st and 2nd frames. That is, the second feature map can reflect how the motion states of the entities in the image change over time (the time elapsed from the 1st frame to the 2nd frame). Proceeding in this way, the Nth feature map is obtained.
Step 220: generate M future frames according to the Nth feature map, wherein P frames are skipped between the future M frames and the previous N frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
The Nth feature map includes the spatial features and temporal features of the previous N frames. The spatial features can characterize the positions and postures of the entities in each image, the temporal features can reflect the moment to which each image corresponds, and the combination of the spatial and temporal features can reflect the relationship between the changes in position and posture and time.
In one embodiment, in addition to information about the positions and postures of the entities in the image, the spatial features may further include colour information of each position in the image or other information actually required; this is not specifically limited in the embodiments of this application.
M future frames are predicted from the Nth feature map, and these M frames constitute the predicted video. The previous N frames and the future M frames are not temporally continuous.
For example, the period corresponding to the previous N frames is from 9:10:10 to 9:10:20 (10 s), and the period corresponding to the future M frames is from 9:10:50 to 9:10:55 (5 s); that is, there is a 30 s interval between the previous N frames and the future M frames. The 30 s interval between the previous N frames and the future M frames may correspond to a certain number of images (that is, the P frames).
Fig. 3 is a schematic diagram of a scenario of a video prediction method provided by an exemplary embodiment of this application. As shown in Fig. 3, on the time axis, the period corresponding to the existing video frames (the previous N frames) is from time point t=T0 to time point t=0, where t=0 is the current time; the period corresponding to the video frames to be predicted (the future M frames) is from time point t=T1 to time point t=T2, and the intermediate period (the video frames corresponding to t=0 to t=T1) does not need to be predicted.
In this embodiment, the time interval between adjacent frames in the previous N frames and the time interval between adjacent frames in the future M frames may be the same or different, and can be set according to actual needs.
When M is 1, that is, the number of future frames is 1, that image can be generated directly from the Nth feature map.
When M is greater than 1, that is, the number of future frames is greater than 1, the 1st of the future M frames can be generated directly from the Nth feature map, and the remaining M-1 frames are then generated in sequence from the 1st frame; the M frames constitute the predicted video.
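The following short sketch illustrates this control flow only; `generate_first` and `generate_next` are hypothetical callables standing in for the decoder described with Fig. 5 and Fig. 6 below, not functions defined by this application.

```python
def predict_future(nth_feature_map, M, generate_first, generate_next):
    """Return only the M future frames; the P skipped frames are never computed."""
    frames = [generate_first(nth_feature_map)]   # 1st future frame directly from the Nth feature map
    for _ in range(M - 1):                       # remaining M-1 frames, each from the previous one
        frames.append(generate_next(frames[-1]))
    return frames
```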
The embodiments of this application provide a video prediction method. By using the Nth feature map of the previous N frames in a known video, M future frames that are temporally discontinuous with the previous N frames are determined, thereby predicting the video frames in a future time period. Because prediction of the video frames lying between the previous N frames and the future M frames is omitted, the computation time is shortened, resource occupation and computational burden are reduced, and prediction efficiency is improved.
Fig. 4 is a flow diagram of determining the Nth feature map of the previous N frames provided by another exemplary embodiment of this application. The embodiment shown in Fig. 4 extends the embodiment shown in Fig. 2; the following description focuses on the differences between the embodiment of Fig. 4 and that of Fig. 2, and what they have in common is not repeated.
As shown in Fig. 4, in the video prediction method provided by the embodiments of this application, determining the Nth feature map of the previous N frames (i.e., step 210) comprises:
Step 211: determine a first spatial feature map of each of the previous N frames, wherein the first spatial feature map includes the spatial features of that frame.
Taking any one of the previous N frames as an example, the spatial features include information about the positions and postures of the entities in the image. These spatial features can be represented by a first spatial feature map, which may be expressed as a vector, a matrix, or another suitable form. Each frame corresponds to one first spatial feature map.
In one embodiment, the controller that executes the video prediction method may obtain the future M frames by executing a video prediction model, which may be a machine learning model obtained through training and learning. The video prediction model may be composed of one or more of a convolutional neural network, a recurrent neural network, a fully connected neural network, or other neural networks.
In one embodiment, the first spatial feature map in step 211 may be obtained by the controller executing the convolution operation of a convolutional layer of the neural network.
Specifically, referring to Fig. 6, among the input frames, the 1st, 2nd, and 3rd frames are the previous N frames, and each of the three frames passes through the convolution operation of the convolutional layer to obtain its respective first spatial feature map.
Step 212: determine the nth feature map based on the (n-1)th feature map corresponding to the (n-1)th of the previous N frames and the first spatial feature map of the nth frame, wherein n is an integer greater than 1 and less than or equal to N-1.
In chronological order, the previous N frames include the 1st frame, the 2nd frame, ..., the Nth frame. The second feature map is obtained based on the first feature map corresponding to the 1st frame and the first spatial feature map of the 2nd frame. That is, the second feature map reflects both how the spatial features of the images change over time (the relationship between the changes in entity positions and postures and time) and the spatial features of the current image (that is, the positions and postures of the entities in the 2nd frame).
In one embodiment, the parameters in the first feature map that reflect how the spatial features of the image change over time may be assigned by initialization.
The third feature map is obtained based on the second feature map corresponding to the 2nd frame and the first spatial feature map of the 3rd frame. That is, the third feature map reflects how the spatial features of the images change over time (including the change from the 1st frame to the 2nd frame and the change from the 2nd frame to the 3rd frame) and the spatial features of the current image (that is, the positions and postures of the entities in the 3rd frame).
Step 212 is repeated until the (N-1)th feature map corresponding to the (N-1)th frame is determined.
Step 213: determine the Nth feature map based on the (N-1)th feature map corresponding to the (N-1)th of the previous N frames and the first spatial feature map of the Nth frame.
A feature map, like a spatial feature map, may also be expressed as a vector, a matrix, or another suitable form.
In one embodiment, the feature maps in steps 212 and 213 may be obtained by the controller executing the convolution operation of a convolutional long short-term memory (ConvLSTM) layer of the neural network.
The Nth feature map may be determined by performing a convolution operation on the (N-1)th feature map and the first spatial feature map of the Nth frame.
Specifically, referring to Fig. 6, the first spatial feature map of the 1st frame passes through the convolution operation of the ConvLSTM layer to obtain the first feature map; further, the first feature map and the first spatial feature map of the 2nd frame pass through the convolution operation of the ConvLSTM layer to obtain the second feature map; and the second feature map and the first spatial feature map of the 3rd frame pass through the convolution operation of the ConvLSTM layer to obtain the third feature map. In this embodiment, the first feature map may be obtained by passing a 0th feature map and the first spatial feature map of the 1st frame through the convolution operation of the ConvLSTM layer, where the 0th feature map may be assigned by initialization or preset.
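As an illustration of the two options just mentioned for the 0th feature map (the initial ConvLSTM state), the sketch below shows a zero initialization and a preset learnable parameter; the shapes, names, and the learnable variant are assumptions rather than details from this application.

```python
import torch
import torch.nn as nn

B, C, H, W = 4, 64, 64, 64

# Option 1: assign the 0th feature map by initialization (zeros) at every forward pass.
h0 = torch.zeros(B, C, H, W)
c0 = torch.zeros(B, C, H, W)

# Option 2: preset it as a learnable parameter that training can adjust.
class PresetState(nn.Module):
    def __init__(self, ch, h, w):
        super().__init__()
        self.h0 = nn.Parameter(torch.zeros(1, ch, h, w))
        self.c0 = nn.Parameter(torch.zeros(1, ch, h, w))

    def forward(self, batch_size):
        # broadcast the single preset 0th feature map to the whole batch
        return (self.h0.expand(batch_size, -1, -1, -1),
                self.c0.expand(batch_size, -1, -1, -1))
```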
In the video prediction method provided by the embodiments of this application, the future M frames, which are temporally discontinuous with the previous N frames, are determined based on the Nth feature map containing both how the spatial features of the previous N frames change over time and the spatial features of the Nth frame; this can improve the accuracy of the prediction result.
Fig. 5 is a flow diagram of generating M future frames according to the Nth feature map provided by another exemplary embodiment of this application. The embodiment shown in Fig. 5 extends the embodiment shown in Fig. 2; the following description focuses on the differences between the embodiment of Fig. 5 and that of Fig. 2, and what they have in common is not repeated.
As shown in Fig. 5, in the video prediction method provided by the embodiments of this application, generating M future frames according to the Nth feature map (i.e., step 220) comprises:
Step 221: determine a first spatial feature map of a blank frame image according to the blank frame image.
The first spatial feature map of the blank frame image may include the spatial features of the blank frame image. The blank frame image may be a preset image that does not contain the entities present in the previous N frames.
In one embodiment, the first spatial feature map in step 221 may be obtained by the controller executing the convolution operation of a convolutional layer of the neural network.
Specifically, referring to Fig. 6, among the predicted frames, the 6th, 7th, and 8th frames are the future M frames; 3 frames are skipped between the input frames and the predicted frames, and these 3 frames do not need to be predicted. The 6th frame is obtained based on the blank frame image, and the blank frame image passes through the convolution operation of the convolutional layer to obtain the first spatial feature map of the blank frame.
Step 222: determine a second spatial feature map of the 1st of the future M frames according to the Nth feature map and the first spatial feature map of the blank frame image.
The first spatial feature map corresponding to the blank frame image can be regarded as a blank sheet of paper, and the Nth feature map as the shapes and colours of the entities; combining the two yields a picture. That is, combining the Nth feature map with the first spatial feature map of the blank frame image yields the second spatial feature map of the 1st of the future M frames.
In one embodiment, the second spatial feature map in step 222 may be obtained by the controller executing the convolution operation of a ConvLSTM layer of the neural network.
Specifically, referring to Fig. 6, the first spatial feature map of the blank frame image and the third feature map of the input frames pass through the convolution operation of the ConvLSTM layer to obtain the second spatial feature map of the predicted 6th frame.
Step 223: generate the 1st of the future M frames according to the second spatial feature map.
An image can be restored from the second spatial feature map, that is, the 1st of the future M frames is generated. Any of the future M frames may be called a predicted frame; for example, the 1st of the future M frames may be called the 1st predicted frame.
In one embodiment, the 1st frame in step 223 may be obtained by the controller executing the deconvolution operation of a deconvolution layer of the neural network.
Specifically, referring to Fig. 6, the second spatial feature map of the 6th frame passes through the deconvolution operation of the deconvolution layer to obtain the 6th frame.
Step 224: generate the mth frame according to the (m-1)th of the future M frames, wherein m is an integer greater than 1 and less than or equal to M.
Specifically, the first spatial feature map of the 1st predicted frame may be determined in the convolutional layer of the neural network. In the ConvLSTM layer of the neural network, the second spatial feature map corresponding to the 2nd predicted frame is determined according to the feature map corresponding to the 0th predicted frame (which contains the spatial features and temporal features of the previous N frames and the blank frame image) and the first spatial feature map of the 1st predicted frame. In the deconvolution layer of the neural network, the 2nd predicted frame is generated according to the second spatial feature map corresponding to the 2nd predicted frame.
Here, the first spatial feature map includes the spatial features of the 1st predicted frame. The spatial features include information about the positions and postures of the entities in the image, and may further include colour information of each position in the 1st predicted frame or other information actually required. The feature map includes the spatial features and temporal features of the previous N frames and the 0th predicted frame (the blank frame image); the temporal features can reflect the moment to which each image corresponds, and the combination of spatial and temporal features can reflect the relationship between the changes in position and posture and time.
The generation of the 3rd predicted frame may include: determining the first spatial feature map of the 2nd predicted frame in the convolutional layer of the neural network; determining the second spatial feature map corresponding to the 3rd predicted frame in the ConvLSTM layer of the neural network according to the first spatial feature map of the 2nd predicted frame and the feature map corresponding to the 1st predicted frame; and generating the 3rd predicted frame in the deconvolution layer of the neural network according to the second spatial feature map corresponding to the 3rd predicted frame.
Step 224 is repeated until M predicted frames have been generated; the future M frames constitute the predicted video.
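The following is a minimal sketch of this decoding loop (steps 221-224): a blank frame is encoded by the convolutional layer, combined with the accumulated feature map in the ConvLSTM layer, and rendered into a frame by a deconvolution layer, after which each predicted frame is fed back to produce the next one. Layer sizes, names, and the ConvLSTMCell (repeated from the encoder sketch above) are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    # same minimal cell as in the encoder sketch above
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)

conv   = nn.Conv2d(3, 32, 3, padding=1)           # first spatial feature map (steps 221 and 224)
cell   = ConvLSTMCell(32, 64)                     # second spatial feature map (step 222)
deconv = nn.ConvTranspose2d(64, 3, 3, padding=1)  # deconvolution layer restores an image (step 223)

def generate_future(encoder_state, blank, M):
    """encoder_state: ConvLSTM state after the previous N frames (the Nth feature map);
    blank: the preset blank frame image. Returns the M predicted future frames."""
    frames, prev, state = [], blank, encoder_state
    for _ in range(M):
        spatial = conv(prev)                  # spatial features of the blank / previous predicted frame
        second, state = cell(spatial, state)  # combine with the accumulated motion history
        prev = deconv(second)                 # predicted frame, fed back for the next step
        frames.append(prev)
    return frames

blank = torch.zeros(4, 3, 64, 64)                 # blank frame: contains none of the entities
state = (torch.zeros(4, 64, 64, 64), torch.zeros(4, 64, 64, 64))
print(len(generate_future(state, blank, M=3)))    # -> 3 predicted frames
```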
Specifically, referring to Fig. 6, the 6th frame passes through the convolution operation of the convolutional layer to obtain the first spatial feature map of the 6th frame. The first spatial feature map of the blank frame image and the third feature map of the input frames pass through the convolution operation of the ConvLSTM layer and also yield a fifth feature map (containing the spatial and temporal features of the previous 3 frames and the blank frame image). The first spatial feature map of the 6th frame and the fifth feature map pass through the convolution operation of the ConvLSTM layer to obtain the second spatial feature map of the predicted 7th frame and a sixth feature map (which may contain the spatial and temporal features of the previous 3 frames, the blank frame image, and the 6th frame). The second spatial feature map of the 7th frame passes through the deconvolution operation of the deconvolution layer to obtain the 7th frame. The 7th frame passes through the convolution operation of the convolutional layer to obtain the first spatial feature map of the 7th frame; the first spatial feature map of the 7th frame and the sixth feature map pass through the convolution operation of the ConvLSTM layer to obtain the second spatial feature map of the predicted 8th frame; and the second spatial feature map of the 8th frame passes through the deconvolution operation of the deconvolution layer to obtain the 8th frame.
The embodiments of this application provide a video prediction method. By using the Nth feature map of the previous N frames in a known video and the first spatial feature map of a blank frame, M future frames that are temporally discontinuous with the previous N frames are determined, realizing a sequence-to-sequence video prediction process. This avoids the waste of computing resources caused by unnecessary prediction and improves the efficiency of long-horizon video prediction.
The embodiments shown in Fig. 2, Fig. 4, and Fig. 5 can complement and be combined with each other to realize an efficient video prediction process.
The embodiments of this application provide a training method of a video prediction model, which includes: obtaining the video prediction model by training a machine learning model with multiple sample videos, wherein each of the multiple sample videos includes N previous sample frames and M future sample frames, P sample frames are skipped between the future M sample frames and the previous N sample frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
Specifically, the machine learning model may be composed of one or more of a convolutional neural network, a recurrent neural network, a fully connected neural network, or other neural networks. The number of image frames contained in a sample video used to train the machine learning model may be greater than N+M; that is, in a sample video, the previous N sample frames and the future M sample frames are discontinuous, with several sample frames between them.
During training, the network parameters of the machine learning model are reversely updated through the loss function of the machine learning model until convergence, finally obtaining the video prediction model. The loss function is characterized by the difference between the ground-truth future M sample frames and the predicted future M sample frames.
Specifically, when training the machine learning model, the machine learning model predicts the future M sample frames from the previous N sample frames, obtaining predicted future M sample frames. There is a difference between the ground-truth future M sample frames and the predicted ones; the machine learning model obtains the loss function from this difference and then uses the loss function to reversely update its network parameters. By continuously training the machine learning model with the multiple sample videos, the video prediction model is finally obtained. The obtained video prediction model can be used to realize the video prediction methods shown in Fig. 2 to Fig. 4.
In the process of training the machine learning model with the multiple sample videos, the specific process of predicting the future M sample frames from the previous N sample frames may refer to the descriptions of Fig. 2 to Fig. 4 and, to avoid repetition, is not described here again.
The embodiments of this application provide a training method of a video prediction model. By using the Nth feature map of the previous N sample frames in a known video, M future sample frames that are temporally discontinuous with the previous N sample frames are predicted, and the parameters of the machine learning model are then adjusted according to the loss function to obtain the video prediction model. When predicting video in a future time period, the video prediction model can therefore shorten the computation time, reduce resource occupation and computational burden, and improve prediction efficiency.
Exemplary apparatus
Fig. 7 is a schematic structural diagram of a video prediction apparatus 700 provided by an exemplary embodiment of this application. As shown in Fig. 7, the apparatus 700 comprises a determining module 710 and a generation module 720.
The determining module 710 is configured to determine the Nth feature map of the previous N frames, wherein the Nth feature map includes the spatial features and temporal features of the previous N frames. The generation module 720 is configured to generate M future frames according to the Nth feature map, wherein P frames are skipped between the future M frames and the previous N frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
Specifically, the specific working processes and functions of the determining module 710 and the generation module 720 may refer to the description of Fig. 2 above and are not repeated here.
The embodiments of this application provide a video prediction apparatus. By using the Nth feature map of the previous N frames in a known video, M future frames that are temporally discontinuous with the previous N frames are determined, thereby predicting the video frames in a future time period. Because prediction of the video frames lying between the previous N frames and the future M frames is omitted, the computation time is shortened, resource occupation and computational burden are reduced, and prediction efficiency is improved.
According to one embodiment of this application, the determining module 710 is configured to: determine a first spatial feature map of each of the previous N frames, wherein the first spatial feature map includes the spatial features of that frame; determine the nth feature map based on the (n-1)th feature map corresponding to the (n-1)th of the previous N frames and the first spatial feature map of the nth frame, wherein n is an integer greater than 1 and less than or equal to N-1; and determine the Nth feature map based on the (N-1)th feature map corresponding to the (N-1)th of the previous N frames and the first spatial feature map of the Nth frame.
According to one embodiment of this application, the step in which the determining module 710 determines the first spatial feature map of each of the previous N frames is executed by performing the convolution operation of a convolutional layer of the neural network, and the steps in which the determining module 710 determines the nth feature map and the Nth feature map are executed by performing the convolution operation of a ConvLSTM layer of the neural network.
According to one embodiment of this application, the generation module 720 is configured to: determine a first spatial feature map of a blank frame image according to the blank frame image; determine a second spatial feature map of the 1st of the future M frames according to the Nth feature map and the first spatial feature map of the blank frame image; generate the 1st of the future M frames according to the second spatial feature map; and generate the mth frame according to the (m-1)th of the future M frames, wherein m is an integer greater than 1 and less than or equal to M.
According to one embodiment of this application, the step in which the generation module 720 determines the first spatial feature map of the blank frame image according to the blank frame image is executed by performing the convolution operation of a convolutional layer of the neural network; the step in which the generation module 720 determines the second spatial feature map of the 1st of the future M frames according to the Nth feature map and the first spatial feature map of the blank frame image is executed by performing the convolution operation of a ConvLSTM layer of the neural network; and the step in which the generation module 720 generates the 1st of the future M frames according to the second spatial feature map is executed by performing the deconvolution operation of a deconvolution layer of the neural network.
The specific working processes and functions of the modules may refer to the descriptions of Fig. 2 to Fig. 6 above and are not repeated here.
Example electronic device
Hereinafter, an electronic device according to an embodiment of this application is described with reference to Fig. 8. The electronic device 80 can execute the video prediction process described above.
Fig. 8 illustrates a block diagram of the electronic device 80 according to an embodiment of this application.
As shown in Fig. 8, the electronic device 80 includes one or more processors 81 and a memory 82.
The processor 81 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 80 to perform desired functions.
The memory 82 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 81 may run the program instructions to realize the video prediction methods of the embodiments of this application described above and/or other desired functions. Various contents such as input signals, signal components, noise components, and video signals may also be stored on the computer-readable storage medium.
In one example, the electronic device 80 may further include an input device 83 and an output device 84, and these components are interconnected through a bus system and/or another form of connection mechanism (not shown).
For example, the input device 83 may be the above-mentioned camera, whose input signal is the captured video image. When the electronic device is a stand-alone device, the input device 83 may be a communication network connector for receiving the input signal collected from the camera.
In addition, the input device 83 may also include, for example, a keyboard and a mouse.
The output device 84 may output various information to the outside, including the determined video images. The output device 84 may include, for example, a display, a loudspeaker, a printer, a communication network, and the remote output devices connected to it.
Of course, for simplicity, only some of the components of the electronic device 80 related to this application are shown in Fig. 8; components such as buses and input/output interfaces are omitted. Besides, the electronic device 80 may also include any other appropriate components depending on the specific application.
The embodiments of this application provide a vehicle including the above electronic device 80. The electronic device 80 may be a device in the on-board system for executing the above video prediction method, so that the driving state of the vehicle can be adjusted in time to ensure driving safety.
In one embodiment, the vehicle may be an autonomous vehicle or a driverless vehicle.
The electronic device 80 may directly control states such as the speed and steering of the current vehicle according to the predicted video obtained through video prediction, or play the predicted video on a display screen of the vehicle so that the driver adjusts states such as the speed and steering of the current vehicle according to the content of the predicted video.
Illustrative computer program product and computer readable storage medium
In addition to the above methods and devices, embodiments of this application may also be a computer program product comprising computer program instructions that, when run by a processor, cause the processor to execute the steps in the video prediction methods according to the various embodiments of this application described in the "Illustrative methods" section of this specification.
The computer program product may be written in any combination of one or more programming languages to produce program code for carrying out the operations of the embodiments of this application; the programming languages include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Moreover, embodiments of this application may also be a computer-readable storage medium on which computer program instructions are stored; when run by a processor, the computer program instructions cause the processor to execute the steps in the video prediction methods according to the various embodiments of this application described in the "Illustrative methods" section of this specification.
The computer-readable storage medium may use any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, but is not limited to, electric, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any combination of the above. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fibre, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any appropriate combination of the above.
The basic principles of this application have been described above in conjunction with specific embodiments. However, it should be noted that the merits, advantages, and effects mentioned in this application are merely examples and not limitations; it should not be assumed that these merits, advantages, and effects are required by every embodiment of this application. In addition, the specific details disclosed above are only for the purpose of illustration and ease of understanding, not for limitation; they do not mean that this application must be implemented with the above specific details.
The block diagrams of devices, apparatuses, equipment, and systems involved in this application are merely illustrative examples and are not intended to require or imply that connection, arrangement, or configuration must be made in the manner shown in the block diagrams. As those skilled in the art will appreciate, these devices, apparatuses, equipment, and systems may be connected, arranged, or configured in any manner. Words such as "include", "comprise", and "have" are open terms meaning "including but not limited to" and may be used interchangeably with it. The words "or" and "and" used herein refer to "and/or" and may be used interchangeably with it unless the context clearly indicates otherwise. The word "such as" used herein refers to the phrase "such as, but not limited to" and may be used interchangeably with it.
It should also be noted that the components or steps in the devices, apparatuses, and methods of this application may be decomposed and/or recombined; such decompositions and/or recombinations should be regarded as equivalent schemes of this application.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use this application. Various modifications to these aspects are readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of this application. Therefore, this application is not intended to be limited to the aspects shown herein, but accords with the widest scope consistent with the principles and novel features disclosed herein.
The above description has been presented for the purposes of illustration and description. Moreover, this description is not intended to restrict the embodiments of this application to the forms disclosed herein. Although multiple example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.
Claims (10)
1. A video prediction method, comprising:
determining an Nth feature map of the previous N frames, wherein the Nth feature map includes spatial features and temporal features of the previous N frames;
generating M future frames according to the Nth feature map, wherein P frames are skipped between the future M frames and the previous N frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
2. The method according to claim 1, wherein determining the Nth feature map of the previous N frames comprises:
determining a first spatial feature map of each of the previous N frames, wherein the first spatial feature map includes the spatial features of that frame;
determining an nth feature map based on the (n-1)th feature map corresponding to the (n-1)th of the previous N frames and the first spatial feature map of the nth frame, wherein n is an integer greater than 1 and less than or equal to N-1;
determining the Nth feature map based on the (N-1)th feature map corresponding to the (N-1)th of the previous N frames and the first spatial feature map of the Nth frame.
3. The method according to claim 2, wherein the step of determining the first spatial feature map of each of the previous N frames is executed by performing a convolution operation of a convolutional layer of a neural network, and the steps of determining the nth feature map and determining the Nth feature map are executed by performing a convolution operation of a convolutional long short-term memory layer of the neural network.
4. The method according to claim 1, wherein generating the future M frames according to the Nth feature map comprises:
determining a first spatial feature map of a blank frame image according to the blank frame image;
determining a second spatial feature map of the 1st of the future M frames according to the Nth feature map and the first spatial feature map of the blank frame image;
generating the 1st of the future M frames according to the second spatial feature map;
generating the mth frame according to the (m-1)th of the future M frames, wherein m is an integer greater than 1 and less than or equal to M.
5. The method according to claim 4, wherein the step of determining the first spatial feature map of the blank frame image according to the blank frame image is executed by performing a convolution operation of a convolutional layer of a neural network, the step of determining the second spatial feature map of the 1st of the future M frames according to the Nth feature map and the first spatial feature map of the blank frame image is executed by performing a convolution operation of a convolutional long short-term memory layer of the neural network, and the step of generating the 1st of the future M frames according to the second spatial feature map is executed by performing a deconvolution operation of a deconvolution layer of the neural network.
6. A training method of a video prediction model, comprising:
obtaining the video prediction model by training a machine learning model with multiple sample videos, wherein each of the multiple sample videos includes N previous sample frames and M future sample frames, and P sample frames are skipped between the future M sample frames and the previous N sample frames, wherein N is an integer greater than 1, and M and P are integers greater than or equal to 1.
7. A video prediction apparatus, comprising:
a determining module, configured to determine an Nth feature map of the previous N frames, wherein the Nth feature map includes spatial features and temporal features of the previous N frames;
a generation module, configured to generate M future frames according to the Nth feature map, wherein P frames are skipped between the future M frames and the previous N frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
8. A computer-readable storage medium, wherein the storage medium stores a computer program for executing the video prediction method according to any one of claims 1 to 5.
9. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor,
wherein the processor is configured to execute the video prediction method according to any one of claims 1 to 5.
10. A vehicle, comprising the electronic device according to claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910610206.9A CN110334654A (en) | 2019-07-08 | 2019-07-08 | Video prediction method and apparatus, training method of video prediction model, and vehicle |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910610206.9A CN110334654A (en) | 2019-07-08 | 2019-07-08 | Video prediction method and apparatus, training method of video prediction model, and vehicle |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110334654A true CN110334654A (en) | 2019-10-15 |
Family
ID=68144401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910610206.9A Pending CN110334654A (en) | Video prediction method and apparatus, training method of video prediction model, and vehicle | 2019-07-08 | 2019-07-08 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110334654A (en) |
- 2019-07-08: CN application CN201910610206.9A filed; published as CN110334654A (status: Pending)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170255832A1 (en) * | 2016-03-02 | 2017-09-07 | Mitsubishi Electric Research Laboratories, Inc. | Method and System for Detecting Actions in Videos |
CN109891897A (en) * | 2016-10-27 | 2019-06-14 | 诺基亚技术有限公司 | Method for analyzing media content |
US20180144248A1 (en) * | 2016-11-18 | 2018-05-24 | Salesforce.Com, Inc. | SENTINEL LONG SHORT-TERM MEMORY (Sn-LSTM) |
CN107273800A (en) * | 2017-05-17 | 2017-10-20 | 大连理工大学 | A kind of action identification method of the convolution recurrent neural network based on attention mechanism |
CN107492113A (en) * | 2017-06-01 | 2017-12-19 | 南京行者易智能交通科技有限公司 | A kind of moving object in video sequences position prediction model training method, position predicting method and trajectory predictions method |
CN108960160A (en) * | 2018-07-10 | 2018-12-07 | 深圳地平线机器人科技有限公司 | The method and apparatus of structural state amount are predicted based on unstructured prediction model |
CN109841226A (en) * | 2018-08-31 | 2019-06-04 | 大象声科(深圳)科技有限公司 | A kind of single channel real-time noise-reducing method based on convolution recurrent neural network |
CN109829495A (en) * | 2019-01-29 | 2019-05-31 | 南京信息工程大学 | Timing image prediction method based on LSTM and DCGAN |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110839156A (en) * | 2019-11-08 | 2020-02-25 | 北京邮电大学 | Future frame prediction method and model based on video image |
CN111901673A (en) * | 2020-06-24 | 2020-11-06 | 北京大学 | Video prediction method, device, storage medium and terminal |
CN111901673B (en) * | 2020-06-24 | 2021-12-03 | 北京大学 | Video prediction method, device, storage medium and terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11693901B2 (en) | Systems and methods for geolocation prediction | |
CN111386550A (en) | Unsupervised learning of image depth and ego-motion predictive neural networks | |
CN111246091B (en) | Dynamic automatic exposure control method and device and electronic equipment | |
CN109889849B (en) | Video generation method, device, medium and equipment | |
EP3847619B1 (en) | Unsupervised depth prediction neural networks | |
US20210056388A1 (en) | Knowledge Transfer Between Different Deep Learning Architectures | |
CN107920257A (en) | Video Key point real-time processing method, device and computing device | |
CN110334654A (en) | 2019-10-15 | Video prediction method and apparatus, training method of video prediction model, and vehicle |
CN107909638A (en) | Rendering intent, medium, system and the electronic equipment of dummy object | |
CN108648253A (en) | The generation method and device of dynamic picture | |
US11967150B2 (en) | Parallel video processing systems | |
EP3663965A1 (en) | Method for predicting multiple futures | |
CN115512251A (en) | Unmanned aerial vehicle low-illumination target tracking method based on double-branch progressive feature enhancement | |
CN109447141A (en) | Away rust by laser method and device based on machine learning | |
CN109981991A (en) | Model training method, image processing method, device, medium and electronic equipment | |
CN110378250A (en) | Training method, device and the terminal device of neural network for scene cognition | |
CN109903350A (en) | Method for compressing image and relevant apparatus | |
KR20210086583A (en) | Method and apparatus for controlling driverless vehicle and electronic device | |
CN109685805A (en) | A kind of image partition method and device | |
KR20230137991A (en) | Rendering new images of scenes using a geometry-aware neural network adjusted according to latent variables. | |
CN116634638A (en) | Light control strategy generation method, light control method and related device | |
CN110719487B (en) | Video prediction method and device, electronic equipment and vehicle | |
CN112668596B (en) | Three-dimensional object recognition method and device, recognition model training method and device | |
CN109711349B (en) | Method and device for generating control instruction | |
CN108881899A (en) | Based on the pyramidal image prediction method and apparatus of optical flow field and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191015 |