CN110334654A - Video prediction method and apparatus, training method of video prediction model, and vehicle - Google Patents


Info

Publication number
CN110334654A
CN110334654A (application CN201910610206.9A)
Authority
CN
China
Prior art keywords
frame image
feature map
previous
frame
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910610206.9A
Other languages
Chinese (zh)
Inventor
范坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Horizon Robotics Technology Research and Development Co Ltd
Original Assignee
Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority to CN201910610206.9A
Publication of CN110334654A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/584: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads, of vehicle lights or traffic lights

Abstract

This application discloses a video prediction method and apparatus, a training method for a video prediction model, and a vehicle. The video prediction method comprises: determining the N-th feature map of the previous N frames, where the N-th feature map includes the spatial features and temporal features of the previous N frames; and generating M future frames from the N-th feature map, where the M future frames are separated from the previous N frames by P frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1. By determining M future frames that are temporally discontinuous with the previous N frames, the technical solution of this application realizes prediction of the video frames in a future time period, thereby shortening computation time, reducing resource occupation and computational burden, and improving prediction efficiency.

Description

Video prediction method and apparatus, training method of video prediction model, and vehicle
Technical field
The present invention relates to the technical field of computer vision, and in particular to a video prediction method and apparatus, a training method for a video prediction model, and a vehicle.
Background art
Video prediction infers subsequent video from a given video, so that a user can make judgments or decisions in advance based on the predicted video. Existing video prediction methods generally predict the next frame from the previous frame, and achieve long-horizon prediction by repeating this operation continuously.
Summary of the invention
To solve the above technical problem, this application is proposed. Embodiments of this application provide a video prediction method and apparatus, a training method for a video prediction model, and a vehicle.
According to one aspect of this application, a video prediction method is provided, comprising: determining the N-th feature map of the previous N frames, where the N-th feature map includes the spatial features and temporal features of the previous N frames; and generating M future frames from the N-th feature map, where the M future frames are separated from the previous N frames by P frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
According to a further aspect of this application, a video prediction apparatus is provided, comprising: a determining module for determining the N-th feature map of the previous N frames, where the N-th feature map includes the spatial features and temporal features of the previous N frames; and a generating module for generating M future frames from the N-th feature map, where the M future frames are separated from the previous N frames by P frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
According to another aspect of this application, a training method for a video prediction model is provided, comprising: obtaining the video prediction model by training a machine learning model with multiple sample videos, where each sample video includes N previous sample frames and M future sample frames, the M future sample frames being separated from the N previous sample frames by P frames, N being an integer greater than 1, and M and P being integers greater than or equal to 1.
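The sample layout just described (N past frames, a gap of P frames, then M future target frames) can be sketched with plain array slicing. The function name and toy dimensions below are illustrative assumptions, not part of the patent:

```python
import numpy as np

def make_sample(video, start, n_past, gap, m_future):
    """Slice one training sample from a video array of shape (T, H, W, C):
    N past frames, then skip P frames, then M future target frames."""
    past = video[start : start + n_past]
    future_start = start + n_past + gap
    future = video[future_start : future_start + m_future]
    return past, future

# Toy video: 20 frames of 4x4 single-channel images.
video = np.arange(20 * 4 * 4).reshape(20, 4, 4, 1).astype(np.float32)
past, future = make_sample(video, start=0, n_past=3, gap=3, m_future=3)
assert past.shape == (3, 4, 4, 1) and future.shape == (3, 4, 4, 1)
# Frames 3, 4, 5 (the gap P) appear in neither slice; the model never predicts them.
```

With `gap=0` this degenerates to ordinary next-frame training pairs, which illustrates how the gap P is the only structural difference from conventional sample construction.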
According to another aspect of this application, a computer-readable storage medium is provided, the storage medium storing a computer program for executing the above video prediction method.
According to another aspect of this application, an electronic device is provided, comprising: a processor; and a memory for storing processor-executable instructions, where the processor is configured to execute the above video prediction method.
According to another aspect of this application, a vehicle is provided, including the above electronic device.
Embodiments of this application provide a video prediction method and apparatus, a training method for a video prediction model, and a vehicle. By using the N-th feature map of the previous N frames of a known video to determine M future frames that are temporally discontinuous with the previous N frames, prediction of the video frames in a future time period is realized. Since prediction of the video frames lying between the previous N frames and the M future frames is omitted, the computation time can be shortened, resource occupation and computational burden can be reduced, and prediction efficiency can be improved.
Brief description of the drawings
The above and other objects, features, and advantages of this application will become more apparent through the detailed description of its embodiments with reference to the accompanying drawings. The drawings are provided for further understanding of the embodiments, constitute a part of the specification, and together with the embodiments serve to explain the application; they do not limit the application. In the drawings, identical reference labels generally denote the same components or steps.
Fig. 1 is a schematic diagram of the system architecture of a video prediction system provided by an exemplary embodiment of this application.
Fig. 2 is a flow diagram of a video prediction method provided by an exemplary embodiment of this application.
Fig. 3 is a schematic diagram of a scenario of the video prediction method provided by an exemplary embodiment of this application.
Fig. 4 is a flow diagram of determining the N-th feature map of the previous N frames, provided by another exemplary embodiment of this application.
Fig. 5 is a flow diagram of generating M future frames from the N-th feature map, provided by another exemplary embodiment of this application.
Fig. 6 is a schematic structural diagram of a video prediction model provided by an exemplary embodiment of this application.
Fig. 7 is a schematic structural diagram of a video prediction apparatus provided by an exemplary embodiment of this application.
Fig. 8 is a block diagram of an electronic device provided by an exemplary embodiment of this application.
Detailed description of embodiments
Hereinafter, example embodiments of this application are described in detail with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of this application rather than all of them, and it should be understood that this application is not limited by the example embodiments described herein.
Overview of the application
Video prediction can forecast the positional changes of each entity in a video image, or the positional relationship between each entity and the surrounding environment, so that a user can make judgments in advance. Video prediction therefore has broad application prospects in fields such as robot decision-making, autonomous driving, and video understanding.
For example, in the field of autonomous driving, by predicting future video from previous video captured by a camera, the motion states of the vehicles around the current vehicle can be judged, so that the motion state of the current vehicle can be adjusted in time, improving driving safety.
Existing video prediction methods predict the next frame from the previous frame and carry out long-horizon prediction by repeating this operation continuously. For example, suppose a user wants the video frames of the future period 95 s-100 s (a 5 s window). An existing video prediction method must predict the frames for the whole future period 0 s-100 s from the existing video in order to obtain the frames for 95 s-100 s. However, the frames for 0 s-95 s within the predicted 0 s-100 s are not needed by the user, and continuously predicting every frame of the future 0 s-100 s is time-consuming, occupies more computing resources, and increases the computational burden.
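To make the cost difference concrete (the 30 fps frame rate here is an assumed figure for illustration, not stated in the patent), frame-by-frame prediction over 0 s-100 s generates every intermediate frame, while direct prediction generates only the 5 s window the user needs:

```python
fps = 30  # assumed frame rate, for illustration only

frames_frame_by_frame = 100 * fps   # predict every frame from 0 s to 100 s
frames_direct = (100 - 95) * fps    # predict only the 95 s-100 s window

print(frames_frame_by_frame, frames_direct)  # 3000 150
```

Under this assumption the frame-by-frame approach generates twenty times as many frames, which is the waste the claimed method avoids by skipping the P intermediate frames.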
Therefore, existing video prediction methods have difficulty directly obtaining the video frames of the period a user needs according to the user's demand, and suffer from long computation time and heavy computational burden.
Exemplary system
Fig. 1 is a schematic diagram of the system architecture of a video prediction system 1 provided by an exemplary embodiment of this application. It illustrates an application scenario in which the video of a certain future period is predicted from video captured by an image capture device (for example, a camera). As shown in Fig. 1, the video prediction system 1 includes an electronic device 10 and an image capture device 20. The electronic device 10 predicts the video of a certain future period from the video captured by the image capture device, and then performs corresponding operations on a terminal according to the predicted video. The terminal here may be an autonomous vehicle, and the image capture device 20 may be a vehicle-mounted camera or a camera mounted on other equipment (for example, an unmanned aerial vehicle).
It should be noted that the image capture device 20 in the embodiments of this application may be integrated into the electronic device 10.
It should be noted that the above application scenario is shown merely to facilitate understanding of the spirit and principles of this application, and the embodiments of this application are not limited to it. On the contrary, the embodiments of this application may be applied to any applicable scenario.
Exemplary method
Fig. 2 is a flow diagram of a video prediction method provided by an exemplary embodiment of this application. The method may be executed by, for example, the electronic device in Fig. 1. As shown in Fig. 2, the method comprises the following steps:
Step 210: determine the N-th feature map of the previous N frames, where the N-th feature map includes the spatial features and temporal features of the previous N frames.
The video prediction method provided by the embodiments of this application can be used in the field of autonomous driving, where the entities in the video may be the current vehicle and/or the vehicles around it. The electronic device may be a controller of the onboard system of the current vehicle, or another controller independent of the current vehicle. The controller may directly control states of the current vehicle such as speed and steering according to the predicted video obtained by video prediction, or may play the predicted video on a display screen in the vehicle so that the driver adjusts states of the current vehicle such as speed and steering according to the content of the predicted video.
Specifically, the previous N frames may be the images contained in the video captured by the camera during a preset time period before the current point in time; here, N may be 2 or an integer greater than 2.
In one embodiment, a neural network model may be used to extract a first feature map from the 1st frame of the previous N frames; the first feature map includes the spatial features and temporal features of the 1st frame. That is, the first feature map can reflect the relationship between the motion states of the entities in the image and time. The neural network model then extracts the spatial features of the 2nd frame of the previous N frames, and combines the first feature map with the spatial features of the 2nd frame to extract a second feature map, which includes the spatial and temporal features of the 1st and 2nd frames. That is, the second feature map can reflect how the motion states of the entities in the image change over the time elapsed from the 1st frame to the 2nd frame. Continuing in this way, the N-th feature map can be obtained.
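The recurrence just described, in which each feature map fuses the previous feature map with the current frame's spatial features, can be sketched as follows. `spatial_features` and `fuse` are deliberately trivial placeholders for the learned convolutional and recurrent layers, not the patent's actual operators:

```python
import numpy as np

def spatial_features(frame):
    # Placeholder for the convolutional layer: collapse channels to one map.
    return frame.mean(axis=-1, keepdims=True)

def fuse(prev_feat, spat_feat):
    # Placeholder for the recurrent fusion combining temporal state
    # with the current frame's spatial features.
    return 0.5 * prev_feat + 0.5 * spat_feat

def extract_nth_feature_map(frames):
    # "0th feature map": initialized state before any frame is seen.
    feat = np.zeros_like(spatial_features(frames[0]))
    for frame in frames:               # 1st feature map, 2nd feature map, ...
        feat = fuse(feat, spatial_features(frame))
    return feat                        # the N-th feature map

# Three toy 8x8 RGB frames with constant intensities 1, 2, 3.
frames = [np.full((8, 8, 3), float(i)) for i in range(1, 4)]
nth = extract_nth_feature_map(frames)
assert nth.shape == (8, 8, 1)
```

The key property the sketch preserves is that the N-th feature map is a single tensor summarizing all N frames, so the later frames weigh more heavily in the running state, mirroring how a recurrent layer accumulates temporal context.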
Step 220: generate M future frames from the N-th feature map, where the M future frames are separated from the previous N frames by P frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
The N-th feature map includes the spatial features and temporal features of the previous N frames. The spatial features can characterize the positions and poses of the entities in each image, the temporal features can reflect the moment to which each image corresponds, and the combination of spatial and temporal features can reflect how position and pose vary with time.
In one embodiment, in addition to the position and pose information of the entities in the image, the spatial features may further include the color information of each position in the image or other information actually required; the embodiments of this application do not specifically limit this.
The M future frames predicted from the N-th feature map constitute the predicted video frames. The previous N frames and the M future frames are not continuous in time.
For example, suppose the previous N frames correspond to the period from 9:10:10 to 9:10:20 (10 s), and the M future frames correspond to the period from 9:10:50 to 9:10:55 (5 s); that is, the previous N frames and the M future frames are separated by 30 s. The 30 s gap between them corresponds to a certain number of images (that is, the P frames).
Fig. 3 is a schematic diagram of a scenario of the video prediction method provided by an exemplary embodiment of this application. As shown in Fig. 3, on the time axis, the existing video frames (the previous N frames) correspond to the period from time t=T0 to time t=0 (t=0 being the current time). The video frames to be predicted (the M future frames) correspond to the period from t=T1 to t=T2, and the intermediate period (the video frames corresponding to t=0 to t=T1) does not need to be predicted.
In this embodiment, the time interval between adjacent frames in the previous N frames may be the same as or different from the time interval between adjacent frames in the M future frames; this can be set according to actual needs.
When M is 1, that is, there is only one future frame, the image can be generated directly from the N-th feature map.
When M is greater than 1, that is, there is more than one future frame, the 1st of the M future frames can be generated directly from the N-th feature map, and the remaining M-1 frames are then generated in sequence starting from the 1st frame; the M frames constitute the predicted video.
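The two-phase generation just described (the 1st future frame directly from the N-th feature map, each remaining frame from its predecessor) can be sketched as follows; `decode` and `step` are trivial placeholders for the learned decoder and one-step predictor, not the patent's actual layers:

```python
import numpy as np

def generate_future(nth_feature_map, m):
    # Placeholder decoder: the real model uses a deconvolution layer.
    decode = lambda feat: np.clip(feat, 0.0, 1.0)
    # Placeholder one-step predictor advancing from the last predicted frame.
    step = lambda prev: 0.9 * prev

    frames = [decode(nth_feature_map)]   # 1st future frame, from the feature map
    for _ in range(m - 1):               # remaining M-1 frames, each from its predecessor
        frames.append(step(frames[-1]))
    return frames

preds = generate_future(np.full((8, 8), 0.8), m=3)
assert len(preds) == 3
```

Note that the loop is autoregressive only within the future window: the P skipped frames are never generated, which is exactly where the claimed savings come from.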
The embodiments of this application provide a video prediction method that uses the N-th feature map of the previous N frames of a known video to determine M future frames that are temporally discontinuous with the previous N frames, realizing prediction of the video frames in a future time period. Since prediction of the video frames lying between the previous N frames and the M future frames is omitted, the computation time can be shortened, resource occupation and computational burden can be reduced, and prediction efficiency can be improved.
Fig. 4 is a flow diagram of determining the N-th feature map of the previous N frames, provided by another exemplary embodiment of this application. The embodiment shown in Fig. 4 extends the embodiment shown in Fig. 2; the following focuses on the differences between the two embodiments, and common points are not repeated.
As shown in Fig. 4, in the video prediction method provided by the embodiments of this application, determining the N-th feature map of the previous N frames (that is, step 210) comprises:
Step 211: determine the first spatial feature map of each of the previous N frames, where the first spatial feature map includes the spatial features of that frame.
Taking any one of the previous N frames as an example, the spatial features include the position and pose information of the entities in the image. These spatial features can be represented by a first spatial feature map, which may be expressed as a vector, a matrix, or another suitable form. Each frame corresponds to one first spatial feature map.
In one embodiment, the controller executing the video prediction method can obtain the M future frames by executing a video prediction model, which may be a machine learning model obtained through training and learning. The video prediction model may be composed of one or more of convolutional neural networks, recurrent neural networks, fully connected neural networks, or other neural networks.
In one embodiment, the first spatial feature map in step 211 can be obtained by the controller executing the convolution operation of a convolutional layer of the neural network.
Specifically, referring to Fig. 6, the 1st, 2nd, and 3rd frames of the input frames are the previous N frames; each of the three frames passes through the convolution operation of the convolutional layer to obtain its own first spatial feature map.
Step 212: determine the n-th feature map based on the (n-1)-th feature map corresponding to the (n-1)-th frame and the first spatial feature map of the n-th frame of the previous N frames, where n is an integer greater than 1 and less than or equal to N-1.
In chronological order, the previous N frames include the 1st frame, the 2nd frame, ..., the N-th frame. The second feature map is obtained from the first feature map corresponding to the 1st frame and the first spatial feature map of the 2nd frame. That is, the second feature map can reflect both how the spatial features of the image change over time (the variation of entity position and pose with time) and the spatial features of the current image (the position and pose states of the entities in the 2nd frame).
In one embodiment, the parameters in the first feature map that reflect how the spatial features of the image change over time can be assigned by initialization.
The third feature map is obtained from the second feature map corresponding to the 2nd frame and the first spatial feature map of the 3rd frame. That is, the third feature map reflects how the spatial features of the image change over time (including the variation from the 1st frame to the 2nd frame and from the 2nd frame to the 3rd frame) and the spatial features of the current image (the position and pose states of the entities in the 3rd frame).
Step 212 is repeated until the (N-1)-th feature map corresponding to the (N-1)-th frame is determined.
Step 213: determine the N-th feature map based on the (N-1)-th feature map corresponding to the (N-1)-th frame and the first spatial feature map of the N-th frame of the previous N frames.
A feature map is similar to a spatial feature map and can likewise be expressed as a vector, a matrix, or another suitable form.
In one embodiment, the feature maps in steps 212 and 213 can be obtained by the controller executing the convolution operation of a convolutional long short-term memory (ConvLSTM) layer of the neural network.
The N-th feature map can be determined by performing a convolution operation on the (N-1)-th feature map and the first spatial feature map of the N-th frame.
Specifically, referring to Fig. 6, the first spatial feature map of the 1st frame passes through the convolution operation of the ConvLSTM layer to obtain the first feature map. Further, the first feature map and the first spatial feature map of the 2nd frame pass through the convolution operation of the ConvLSTM layer to obtain the second feature map; the second feature map and the first spatial feature map of the 3rd frame pass through the convolution operation of the ConvLSTM layer to obtain the third feature map. In this embodiment, the first feature map can be obtained by passing a 0th feature map and the first spatial feature map of the 1st frame through the convolution operation of the ConvLSTM layer; here, the 0th feature map can be assigned by initialization or preset in advance.
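For readers unfamiliar with the ConvLSTM layer referenced here, the gating arithmetic of one update step can be sketched as follows. To keep the sketch short it uses 1x1 kernels, i.e. per-pixel linear maps with scalar weights; a real ConvLSTM layer uses larger spatial kernels and learned weight tensors, so this is illustrative background rather than the patent's layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def convlstm_step(x, h, c, w):
    """One ConvLSTM update with 1x1 kernels (per-pixel linear maps).
    x, h, c: arrays of shape (H, W); w: dict of scalar weights per gate."""
    i = sigmoid(w["i"] * x + w["hi"] * h)   # input gate
    f = sigmoid(w["f"] * x + w["hf"] * h)   # forget gate
    o = sigmoid(w["o"] * x + w["ho"] * h)   # output gate
    g = np.tanh(w["g"] * x + w["hg"] * h)   # candidate state
    c_new = f * c + i * g                   # cell state carries temporal features
    h_new = o * np.tanh(c_new)              # hidden state: the emitted feature map
    return h_new, c_new

w = {k: 0.5 for k in ("i", "hi", "f", "hf", "o", "ho", "g", "hg")}
h = c = np.zeros((4, 4))
for x in (np.ones((4, 4)), np.ones((4, 4)) * 0.5):  # two "frames" in sequence
    h, c = convlstm_step(x, h, c, w)
assert h.shape == (4, 4) and np.all(np.abs(h) < 1.0)
```

The cell state `c` plays the role of the running feature map in steps 212-213: it is updated once per frame and carries forward what earlier frames contributed.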
In the video prediction method provided by the embodiments of this application, M future frames that are temporally discontinuous with the previous N frames are determined based on an N-th feature map containing both how the spatial features of the previous N frames change over time and the spatial features of the N-th frame, which can improve the accuracy of the prediction results.
Fig. 5 is a flow diagram of generating M future frames from the N-th feature map, provided by another exemplary embodiment of this application. The embodiment shown in Fig. 5 extends the embodiment shown in Fig. 2; the following focuses on the differences between the two embodiments, and common points are not repeated.
As shown in Fig. 5, in the video prediction method provided by the embodiments of this application, generating M future frames from the N-th feature map (that is, step 220) comprises:
Step 221: determine the first spatial feature map of a blank frame from the blank frame.
The first spatial feature map of the blank frame may include the spatial features of the blank frame. The blank frame may be a preset image that contains none of the entities of the previous N frames.
In one embodiment, the first spatial feature map in step 221 can be obtained by the controller executing the convolution operation of a convolutional layer of the neural network.
Specifically, referring to Fig. 6, the 6th, 7th, and 8th frames of the predicted frames are the M future frames; 3 frames, which do not need to be predicted, are spaced between the input frames and the predicted frames. The 6th frame is obtained based on the blank frame. The blank frame passes through the convolution operation of the convolutional layer to obtain the first spatial feature map of the blank frame.
Step 222: determine the second spatial feature map of the 1st of the M future frames from the N-th feature map and the first spatial feature map of the blank frame.
The first spatial feature map corresponding to the blank frame can be regarded as a blank sheet of paper, and the N-th feature map as the shapes and colors of the entities; combining the two yields a picture. That is, combining the N-th feature map with the first spatial feature map of the blank frame yields the second spatial feature map of the 1st of the M future frames.
In one embodiment, the second spatial feature map in step 222 can be obtained by the controller executing the convolution operation of the ConvLSTM layer of the neural network.
Specifically, referring to Fig. 6, the first spatial feature map of the blank frame and the third feature map of the input frames pass through the convolution operation of the ConvLSTM layer to obtain the second spatial feature map of the predicted 6th frame.
Step 223: generate the 1st of the M future frames from the second spatial feature map.
An image can be restored from the second spatial feature map, that is, the 1st of the M future frames is generated. Any image among the M future frames may be called a predicted frame; for example, the 1st of the M future frames may be called the 1st predicted frame.
In one embodiment, the 1st frame in step 223 can be obtained by the controller executing the deconvolution operation of a deconvolution (transposed convolution) layer of the neural network.
Specifically, referring to Fig. 6, the second spatial feature map of the 6th frame passes through the deconvolution operation of the deconvolution layer to obtain the 6th frame.
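The deconvolution (transposed convolution) step upsamples a feature map back toward image resolution. Its output size follows the standard transposed-convolution formula, given here as general background rather than anything specific to this patent:

```python
def deconv_out_size(in_size, kernel, stride=1, padding=0, output_padding=0):
    """Output size of a transposed convolution along one spatial dimension
    (the standard formula used by common deep learning frameworks)."""
    return (in_size - 1) * stride - 2 * padding + kernel + output_padding

# Upsampling a 16x16 second spatial feature map back toward image resolution:
assert deconv_out_size(16, kernel=4, stride=2, padding=1) == 32
assert deconv_out_size(32, kernel=4, stride=2, padding=1) == 64
```

Stacking such layers inverts the downsampling done by the encoder's convolutional layers, which is how a predicted frame at full image resolution is restored from a compact feature map.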
224: according in the following M frame image m-1 frame image generate m frame image, wherein m be greater than 1 and be less than or Integer equal to M.
Specifically, the first space characteristics figure of the 1st prediction frame image can be determined in the convolutional layer of neural network.In mind In convolution through network-shot and long term memory layer, previous N frame image and sky (are included according to the corresponding characteristic pattern of the 0th prediction frame image The space characteristics and temporal characteristics of white frame image) and the 1st prediction frame image the first space characteristics figure, determine with the 2nd prediction The corresponding second space characteristic pattern of frame image.In the warp lamination of neural network, according to the 2nd prediction frame image corresponding the Two space characteristics figures generate the 2nd prediction frame image.
Herein, the first space characteristics figure includes the space characteristics of the 1st prediction frame image, and space characteristics include in image The position of entity and posture information, can further include the 1st prediction frame image in each position colouring information or other Actually required information.Characteristic pattern include previous N frame image and the 0th prediction frame image (blank frame image) space characteristics and when Between feature, at the time of temporal characteristics can reflect every image and correspond to, both space characteristics and temporal characteristics are combined and be can reflect The variation and the relationship of time of position and posture.
The generating process of 3rd prediction frame image may include: to determine the 2nd prediction frame figure in the convolutional layer of neural network First space characteristics figure of picture;It is empty according to the first of the 2nd prediction frame image in convolution-shot and long term memory layer of neural network Between characteristic pattern and the 1st corresponding characteristic pattern of prediction frame image determine and predict the corresponding second space characteristic pattern of frame image with the 3rd; And in the warp lamination of neural network, it is pre- that the 3rd is generated according to second space characteristic pattern corresponding with the 3rd prediction frame image Survey frame image.
Step 224 is repeated until M predicted frames have been generated. These future M frames constitute the predicted video.
Specifically, referring to Fig. 6, the 6th frame passes through the convolution operation of the convolutional layer to yield the first spatial feature map of the 6th frame. The first spatial feature map of the blank frame and the third feature map of the input frames pass through the convolution operation of the ConvLSTM layer to yield the fifth feature map (containing the spatial and temporal features of the previous 3 frames and the blank frame). The first spatial feature map of the 6th frame and the fifth feature map pass through the convolution operation of the ConvLSTM layer to yield the second spatial feature map of the predicted 7th frame and the sixth feature map (which may contain the spatial and temporal features of the previous 3 frames, the blank frame, and the 6th frame). The second spatial feature map of the 7th frame passes through the deconvolution operation of the deconvolution layer to yield the 7th frame. The 7th frame then passes through the convolution operation of the convolutional layer to yield the first spatial feature map of the 7th frame; that map and the sixth feature map pass through the convolution operation of the ConvLSTM layer to yield the second spatial feature map of the predicted 8th frame, and the second spatial feature map of the 8th frame passes through the deconvolution operation of the deconvolution layer to yield the 8th frame.
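The rollout of Fig. 6 follows a simple control flow: absorb the known frames into a spatio-temporal state, absorb one blank frame, then alternately decode a predicted frame and feed it back in. The sketch below keeps only that loop; `encode`, `update`, and `decode` are deliberately trivial stand-ins (a scaling, an exponential moving average, and an inverse scaling) for the convolutional, ConvLSTM, and deconvolution layers, not the patent's actual network:

```python
import numpy as np

def encode(frame):
    """Stand-in for the convolutional layer: frame -> first spatial feature map."""
    return 0.5 * frame

def update(state, feat):
    """Stand-in for the ConvLSTM layer: fold a feature map into the running
    spatio-temporal state (here a simple exponential moving average)."""
    return 0.8 * state + 0.2 * feat

def decode(state):
    """Stand-in for the deconvolution layer: feature map -> image."""
    return 2.0 * state

def predict_future(prev_frames, M):
    state = np.zeros_like(prev_frames[0])
    for frame in prev_frames:                 # absorb the previous N frames
        state = update(state, encode(frame))
    blank = np.zeros_like(prev_frames[0])     # the blank (0th predicted) frame
    state = update(state, encode(blank))
    predicted = []
    for _ in range(M):                        # autoregressive generation of M frames
        frame = decode(state)
        predicted.append(frame)
        state = update(state, encode(frame))  # feed each prediction back in
    return predicted
```

In this reading of the method, the blank frame marks the jump over the P skipped frames; no intermediate frame is ever decoded, which is where the computational saving comes from.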
An embodiment of the present application provides a video prediction method that uses the Nth feature map of the previous N frames of a known video and the first spatial feature map of a blank frame to determine M future frames that are temporally non-contiguous with the previous N frames, realizing sequence-to-sequence video prediction. This avoids the waste of computing resources caused by unnecessary predictions and improves the efficiency of long-horizon video prediction.
The embodiments shown in Fig. 2, Fig. 4, and Fig. 5 can complement and be combined with each other to realize an efficient video prediction process.
An embodiment of the present application provides a training method for a video prediction model, which includes: obtaining the video prediction model by training a machine learning model with multiple sample videos, where each of the multiple sample videos includes N previous sample frames and M future sample frames, with P frames spaced between the future M sample frames and the previous N sample frames, where N is an integer greater than 1 and M and P are integers greater than or equal to 1.
Specifically, the machine learning model may be composed of one or more of convolutional neural networks, recurrent neural networks, fully connected neural networks, or other neural networks. The number of image frames in a sample video used to train the machine learning model may be greater than N+M; that is, in a sample video, the previous N sample frames and the future M sample frames are non-contiguous, with several sample frames in between.
During training, the network parameters of the machine learning model are updated backward through its loss function until convergence, finally yielding the video prediction model. The loss function is characterized by the difference between the M ground-truth future sample frames and the M predicted future frames.
Specifically, when training the machine learning model, the model predicts the M future sample frames from the N previous sample frames, obtaining M predicted future frames. There is a difference between the ground-truth future sample frames and the predicted ones; the machine learning model derives the loss function from this difference and then uses the loss function to update its network parameters backward. The machine learning model is trained repeatedly with multiple sample videos until the video prediction model is obtained. The obtained video prediction model can be used to implement the video prediction methods shown in Fig. 2 to Fig. 4.
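The cycle just described — predict the future frames, take the difference against the ground truth as the loss, and update the parameters backward from its gradient — can be shown end to end with a deliberately tiny stand-in: a single scalar weight in place of the network, and synthetic "videos" whose future frame is simply twice the last input frame. Everything here except the predict/loss/update cycle itself is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
prev = rng.normal(size=(16, 8, 8))   # last input frame of 16 toy sample videos
future = 2.0 * prev                  # ground-truth future frame of each sample

w = 0.0        # single-parameter stand-in for the network parameters
lr = 0.1       # learning rate
losses = []
for _ in range(50):
    pred = w * prev                               # "predicted future frames"
    loss = np.mean((pred - future) ** 2)          # difference-based loss function
    grad = 2.0 * np.mean((pred - future) * prev)  # d(loss)/dw
    w -= lr * grad                                # backward parameter update
    losses.append(loss)
```

The loss shrinks toward zero and `w` converges to the generating value 2 — the toy analogue of obtaining the trained video prediction model.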
For the specific process of predicting the M future sample frames from the N previous sample frames when training the machine learning model with multiple sample videos, refer to the descriptions of Fig. 2 to Fig. 4; to avoid repetition, details are not described here.
An embodiment of the present application provides a training method for a video prediction model that uses the Nth feature map of the N previous sample frames of a known video to predict M future sample frames temporally non-contiguous with those frames, and then adjusts the parameters of the machine learning model according to the loss function to obtain the video prediction model. When predicting video over a future time period, the video prediction model can thereby shorten computation time, reduce resource occupation and computational burden, and improve prediction efficiency.
Exemplary Apparatus
Fig. 7 is a schematic structural diagram of a video prediction apparatus 700 provided by an exemplary embodiment of the present application. As shown in Fig. 7, the apparatus 700 includes a determining module 710 and a generating module 720.
The determining module 710 is configured to determine the Nth feature map of the previous N frames, where the Nth feature map contains the spatial features and temporal features of the previous N frames. The generating module 720 is configured to generate M future frames according to the Nth feature map, where P frames are spaced between the future M frames and the previous N frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
For the specific working processes and functions of the determining module 710 and the generating module 720, refer to the description of Fig. 2 above; details are not repeated here.
An embodiment of the present application provides a video prediction apparatus that uses the Nth feature map of the previous N frames of a known video to determine M future frames temporally non-contiguous with the previous N frames, realizing prediction of video frames in a future time period. Since prediction of the video frames lying between the previous N frames and the future M frames is omitted, computation time can be shortened, resource occupation and computational burden reduced, and prediction efficiency improved.
According to an embodiment of the present application, the determining module 710 is configured to: determine the first spatial feature map of each frame in the previous N frames, where the first spatial feature map contains the spatial features of that frame; determine the nth feature map based on the (n-1)th feature map corresponding to the (n-1)th frame in the previous N frames and the first spatial feature map of the nth frame, where n is an integer greater than 1 and less than or equal to N-1; and determine the Nth feature map based on the (N-1)th feature map corresponding to the (N-1)th frame in the previous N frames and the first spatial feature map of the Nth frame.
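The recursion performed by the determining module — fold each frame's first spatial feature map into the running feature map of the frames seen so far — has the following shape. The mean-removal "convolution" and averaging "fusion" are hypothetical stand-ins for the real convolutional and ConvLSTM layers; only the recursion structure reflects the description above:

```python
import numpy as np

def first_spatial_map(frame):
    """Stand-in for the convolutional layer: one first spatial feature map per frame."""
    return frame - frame.mean()

def fuse(prev_feature_map, spatial_map):
    """Stand-in for the ConvLSTM step that merges the (n-1)th feature map with
    the nth frame's first spatial feature map into the nth feature map."""
    return 0.5 * prev_feature_map + 0.5 * spatial_map

def nth_feature_map(prev_frames):
    """Accumulate the Nth feature map over the previous N frames."""
    feature = first_spatial_map(prev_frames[0])      # 1st feature map
    for frame in prev_frames[1:]:                    # n = 2 .. N
        feature = fuse(feature, first_spatial_map(frame))
    return feature
```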
According to an embodiment of the present application, the determining module 710 performs the step of determining the first spatial feature map of each frame in the previous N frames by performing the convolution operation of the convolutional layer of a neural network, and performs the steps of determining the nth feature map and determining the Nth feature map by performing the convolution operation of the ConvLSTM layer of the neural network.
According to an embodiment of the present application, the generating module 720 is configured to: determine the first spatial feature map of a blank frame from the blank frame; determine the second spatial feature map of the 1st frame in the future M frames from the Nth feature map and the first spatial feature map of the blank frame; generate the 1st frame in the future M frames from the second spatial feature map; and generate the mth frame from the (m-1)th frame in the future M frames, where m is an integer greater than 1 and less than or equal to M.
According to an embodiment of the present application, the generating module 720 performs the step of determining the first spatial feature map of the blank frame from the blank frame by performing the convolution operation of the convolutional layer of the neural network, performs the step of determining the second spatial feature map of the 1st frame in the future M frames from the Nth feature map and the first spatial feature map of the blank frame by performing the convolution operation of the ConvLSTM layer of the neural network, and performs the step of generating the 1st frame in the future M frames from the second spatial feature map by performing the deconvolution operation of the deconvolution layer of the neural network.
For the specific working processes and functions of the modules, refer to the descriptions of Fig. 2 to Fig. 6 above; details are not repeated here.
Example electronic device
An electronic device according to an embodiment of the present application is described below with reference to Fig. 8. The electronic device 80 can perform the video prediction process described above.
Fig. 8 illustrates a block diagram of the electronic device 80 according to an embodiment of the present application.
As shown in Fig. 8, the electronic device 80 includes one or more processors 81 and a memory 82.
The processor 81 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 80 to perform desired functions.
The memory 82 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 81 may run the program instructions to implement the video prediction methods of the embodiments of the present application described above and/or other desired functions. Various contents such as input signals, signal components, noise components, and video signals may also be stored on the computer-readable storage medium.
In one example, the electronic device 80 may further include an input device 83 and an output device 84, which are interconnected by a bus system and/or other forms of connection mechanism (not shown).
For example, the input device 83 may be the above-mentioned video camera, whose input signal is the captured video images. When the electronic device is a stand-alone device, the input device 83 may be a communication network connector for receiving the input signal collected from the video camera.
In addition, the input device 83 may also include, for example, a keyboard, a mouse, and the like.
The output device 84 may output various information to the outside, including the determined video images and the like. The output device 84 may include, for example, a display, a speaker, a printer, a communication network, and remote output devices connected thereto.
Of course, for simplicity, only some of the components of the electronic device 80 related to the present application are shown in Fig. 8; components such as buses and input/output interfaces are omitted. Depending on the specific application, the electronic device 80 may further include any other appropriate components.
An embodiment of the present application provides a vehicle including the above electronic device 80. The electronic device 80 may be a device in an on-board system for performing the above video prediction method, so that the driving state of the vehicle can be adjusted in time to ensure driving safety.
In one embodiment, the vehicle may be an autonomous driving vehicle or a non-autonomous driving vehicle.
The electronic device 80 may directly control states of the current vehicle, such as speed and steering, according to the predicted video obtained through video prediction, or may play the predicted video on a display screen in the vehicle so that the driver adjusts states such as the speed and steering of the current vehicle according to the content of the predicted video.
Illustrative Computer Program Product and Computer-Readable Storage Medium
In addition to the above methods and devices, an embodiment of the present application may also be a computer program product comprising computer program instructions that, when run by a processor, cause the processor to perform the steps of the video prediction methods according to the various embodiments of the present application described in the "Illustrative Methods" section of this specification.
The computer program product may be written in any combination of one or more programming languages as program code for performing the operations of the embodiments of the present application. The programming languages include object-oriented programming languages, such as Java and C++, as well as conventional procedural programming languages, such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
In addition, an embodiment of the present application may also be a computer-readable storage medium on which computer program instructions are stored; when run by a processor, the computer program instructions cause the processor to perform the steps of the video prediction methods according to the various embodiments of the present application described in the "Illustrative Methods" section of this specification.
The computer-readable storage medium may use any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The basic principles of the present application have been described above in conjunction with specific embodiments. However, it should be noted that the merits, advantages, effects, and the like mentioned in this application are merely examples and not limitations; it must not be assumed that these merits, advantages, and effects are required by each embodiment of the present application. In addition, the specific details disclosed above serve only as examples and for ease of understanding, not as limitations; they do not restrict the application to being implemented with those specific details.
The block diagrams of devices, apparatuses, equipment, and systems involved in this application are only illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown. As those skilled in the art will recognize, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "include", "comprise", and "have" are open-ended terms that mean "including but not limited to" and may be used interchangeably therewith. The words "or" and "and" as used herein refer to "and/or" and may be used interchangeably therewith, unless the context clearly indicates otherwise. The word "such as" as used herein refers to the phrase "such as, but not limited to" and may be used interchangeably therewith.
It should also be noted that in the devices, apparatuses, and methods of the present application, each component or step may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present application.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the application. Therefore, the application is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, it is not intended to limit the embodiments of the application to the forms disclosed herein. Although a number of exemplary aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (10)

1. A video prediction method, comprising:
determining an Nth feature map of previous N frames, wherein the Nth feature map contains spatial features and temporal features of the previous N frames; and
generating future M frames according to the Nth feature map, wherein P frames are spaced between the future M frames and the previous N frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
2. The method according to claim 1, wherein the determining an Nth feature map of previous N frames comprises:
determining a first spatial feature map of each frame in the previous N frames, wherein the first spatial feature map contains spatial features of that frame;
determining an nth feature map based on an (n-1)th feature map corresponding to an (n-1)th frame in the previous N frames and the first spatial feature map of an nth frame, wherein n is an integer greater than 1 and less than or equal to N-1; and
determining the Nth feature map based on an (N-1)th feature map corresponding to an (N-1)th frame in the previous N frames and the first spatial feature map of an Nth frame.
3. The method according to claim 2, wherein the step of determining a first spatial feature map of each frame in the previous N frames is performed by performing a convolution operation of a convolutional layer of a neural network, and the steps of determining the nth feature map and determining the Nth feature map are performed by performing a convolution operation of a convolutional long short-term memory (ConvLSTM) layer of the neural network.
4. The method according to claim 1, wherein the generating future M frames according to the Nth feature map comprises:
determining a first spatial feature map of a blank frame according to the blank frame;
determining a second spatial feature map of a 1st frame in the future M frames according to the Nth feature map and the first spatial feature map of the blank frame;
generating the 1st frame in the future M frames according to the second spatial feature map; and
generating an mth frame according to an (m-1)th frame in the future M frames, wherein m is an integer greater than 1 and less than or equal to M.
5. The method according to claim 4, wherein the step of determining the first spatial feature map of the blank frame according to the blank frame is performed by performing a convolution operation of a convolutional layer of a neural network, the step of determining the second spatial feature map of the 1st frame in the future M frames according to the Nth feature map and the first spatial feature map of the blank frame is performed by performing a convolution operation of a ConvLSTM layer of the neural network, and the step of generating the 1st frame in the future M frames according to the second spatial feature map is performed by performing a deconvolution operation of a deconvolution layer of the neural network.
6. A training method of a video prediction model, comprising:
obtaining the video prediction model by training a machine learning model with a plurality of sample videos, wherein each sample video in the plurality of sample videos comprises previous N sample frames and future M sample frames, P frames are spaced between the future M sample frames and the previous N sample frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
7. A video prediction apparatus, comprising:
a determining module configured to determine an Nth feature map of previous N frames, wherein the Nth feature map contains spatial features and temporal features of the previous N frames; and
a generating module configured to generate future M frames according to the Nth feature map, wherein P frames are spaced between the future M frames and the previous N frames, N is an integer greater than 1, and M and P are integers greater than or equal to 1.
8. A computer-readable storage medium storing a computer program for performing the video prediction method according to any one of claims 1 to 5.
9. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor,
wherein the processor is configured to perform the video prediction method according to any one of claims 1 to 5.
10. A vehicle, comprising the electronic device according to claim 9.
CN201910610206.9A 2019-07-08 2019-07-08 Video estimation method and apparatus, the training method of video estimation model and vehicle Pending CN110334654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910610206.9A CN110334654A (en) 2019-07-08 2019-07-08 Video estimation method and apparatus, the training method of video estimation model and vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910610206.9A CN110334654A (en) 2019-07-08 2019-07-08 Video estimation method and apparatus, the training method of video estimation model and vehicle

Publications (1)

Publication Number Publication Date
CN110334654A true CN110334654A (en) 2019-10-15

Family

ID=68144401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910610206.9A Pending CN110334654A (en) 2019-07-08 2019-07-08 Video estimation method and apparatus, the training method of video estimation model and vehicle

Country Status (1)

Country Link
CN (1) CN110334654A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170255832A1 (en) * 2016-03-02 2017-09-07 Mitsubishi Electric Research Laboratories, Inc. Method and System for Detecting Actions in Videos
CN107273800A (en) * 2017-05-17 2017-10-20 大连理工大学 A kind of action identification method of the convolution recurrent neural network based on attention mechanism
CN107492113A (en) * 2017-06-01 2017-12-19 南京行者易智能交通科技有限公司 A kind of moving object in video sequences position prediction model training method, position predicting method and trajectory predictions method
US20180144248A1 (en) * 2016-11-18 2018-05-24 Salesforce.Com, Inc. SENTINEL LONG SHORT-TERM MEMORY (Sn-LSTM)
CN108960160A (en) * 2018-07-10 2018-12-07 深圳地平线机器人科技有限公司 The method and apparatus of structural state amount are predicted based on unstructured prediction model
CN109829495A (en) * 2019-01-29 2019-05-31 南京信息工程大学 Timing image prediction method based on LSTM and DCGAN
CN109841226A (en) * 2018-08-31 2019-06-04 大象声科(深圳)科技有限公司 A kind of single channel real-time noise-reducing method based on convolution recurrent neural network
CN109891897A (en) * 2016-10-27 2019-06-14 诺基亚技术有限公司 Method for analyzing media content

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110839156A (en) * 2019-11-08 2020-02-25 北京邮电大学 Future frame prediction method and model based on video image
CN111901673A (en) * 2020-06-24 2020-11-06 北京大学 Video prediction method, device, storage medium and terminal
CN111901673B (en) * 2020-06-24 2021-12-03 北京大学 Video prediction method, device, storage medium and terminal

Similar Documents

Publication Publication Date Title
US11693901B2 (en) Systems and methods for geolocation prediction
CN111386550A (en) Unsupervised learning of image depth and ego-motion predictive neural networks
CN111246091B (en) Dynamic automatic exposure control method and device and electronic equipment
CN109889849B (en) Video generation method, device, medium and equipment
EP3847619B1 (en) Unsupervised depth prediction neural networks
US20210056388A1 (en) Knowledge Transfer Between Different Deep Learning Architectures
CN107920257A (en) Video Key point real-time processing method, device and computing device
CN110334654A (en) Video estimation method and apparatus, the training method of video estimation model and vehicle
CN107909638A (en) Rendering intent, medium, system and the electronic equipment of dummy object
CN108648253A (en) The generation method and device of dynamic picture
US11967150B2 (en) Parallel video processing systems
EP3663965A1 (en) Method for predicting multiple futures
CN115512251A (en) Unmanned aerial vehicle low-illumination target tracking method based on double-branch progressive feature enhancement
CN109447141A (en) Away rust by laser method and device based on machine learning
CN109981991A (en) Model training method, image processing method, device, medium and electronic equipment
CN110378250A (en) Training method, device and the terminal device of neural network for scene cognition
CN109903350A (en) Method for compressing image and relevant apparatus
KR20210086583A (en) Method and apparatus for controlling driverless vehicle and electronic device
CN109685805A (en) A kind of image partition method and device
KR20230137991A (en) Rendering new images of scenes using a geometry-aware neural network adjusted according to latent variables.
CN116634638A (en) Light control strategy generation method, light control method and related device
CN110719487B (en) Video prediction method and device, electronic equipment and vehicle
CN112668596B (en) Three-dimensional object recognition method and device, recognition model training method and device
CN109711349B (en) Method and device for generating control instruction
CN108881899A (en) Based on the pyramidal image prediction method and apparatus of optical flow field and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191015