CN110719487A - Video prediction method and device, electronic equipment and vehicle - Google Patents

Publication number
CN110719487A
CN110719487A (application CN201810770361.2A; granted as CN110719487B)
Authority
CN
China
Prior art keywords
distribution
encoder
frame
generator
priori
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810770361.2A
Other languages
Chinese (zh)
Other versions
CN110719487B (en)
Inventor
侯鹏飞 (Hou Pengfei)
范坤 (Fan Kun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Horizon Robotics Science and Technology Co Ltd
Original Assignee
Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Horizon Robotics Science and Technology Co Ltd filed Critical Shenzhen Horizon Robotics Science and Technology Co Ltd
Priority to CN201810770361.2A
Publication of CN110719487A
Application granted
Publication of CN110719487B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/142 Detection of scene cut or scene change
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/189 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

A video prediction method, a video prediction apparatus, an electronic device, and a vehicle are disclosed. The video prediction method comprises the following steps: a training step comprising: generating an a priori distribution from a previous frame using an a priori encoder; generating an a posteriori distribution from a previous frame and a subsequent frame using an a posteriori encoder; using the a posteriori distribution as an intermediate variable of a generator to generate a first predicted frame from a previous frame; and optimizing the prior encoder and the generator with the difference between the first predicted frame and the subsequent frame and the KL divergence between the prior distribution and the posterior distribution as a loss function; and a predicting step comprising: generating an a priori distribution from a known frame using an a priori encoder; and using the a priori distribution as an intermediate variable of the generator to generate future frames from the known frames. In this way, the generator and the a priori encoder for predicting video can be optimized by using the a priori distribution of the a priori encoder and the a posteriori distribution of the a posteriori encoder, thereby simplifying the training process of video prediction and improving the prediction effect.

Description

Video prediction method and device, electronic equipment and vehicle
Technical Field
The present application relates generally to the field of driver assistance (ADAS, Advanced Driver Assistance Systems), and more particularly, to a video prediction method, a video prediction apparatus, an electronic device, and a vehicle.
Background
In recent years, automated driving and Advanced Driver Assistance Systems (ADAS) have received extensive attention and intense research. An ADAS system needs to sense various states of the vehicle itself and its surrounding environment using various vehicle-mounted sensors, collect data, perform identification, detection and tracking of static and dynamic entities, and perform systematic calculation and analysis in combination with map data, thereby making driving policy decisions and finally realizing automatic driving functions.
In an automatic driving scene, video obtained by image acquisition devices such as cameras needs to be predicted in order to realize dynamic prediction of entities in the environment, and the prediction results are then used by subsequent modules to realize functions such as vehicle driving control.
In video prediction, a variational auto-encoder (VAE) is used to fit the posterior distribution of a video's future frames by calculating a prior distribution from its previous frames. The predicted images need to be as vivid as possible, and the predicted motion trajectories need to conform as closely as possible to the real motion of objects. During training, the posterior distribution needs to gradually approach the distribution of the data set, and the prior distribution needs to gradually approach the posterior distribution. But since the posterior distribution is random at the beginning, training of the prior is easily disturbed, and the overall effect ends up being unsatisfactory.
Accordingly, there is a need for an improved video prediction scheme.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. Embodiments of the present application provide a video prediction method, a video prediction apparatus, an electronic device, and a vehicle, which obtain a posterior distribution by using real data instead of prediction data in a training stage, and optimize a priori encoder and a prediction generator using the posterior distribution, thereby simplifying a training process of video prediction and improving a prediction effect.
According to an aspect of the present application, there is provided a video prediction method, including: a training step comprising: generating an a priori distribution from a previous frame using an a priori encoder; generating an a posteriori distribution from the previous and subsequent frames using an a posteriori encoder; using the a posteriori distribution as an intermediate variable for a generator, generating a first predicted frame from the previous frame using the generator; and optimizing the prior encoder and the generator with the difference between the first predicted frame and the subsequent frame and the KL divergence between the prior distribution and the posterior distribution as a loss function; and, a predicting step comprising: generating an a priori distribution from a known frame using the a priori encoder; and using the a priori distribution as an intermediate variable for the generator, generating future frames from the known frames using the generator.
In the above video prediction method, the a priori encoder and the a posteriori encoder each comprise a convolutional network, and the generator comprises one of a long-short term memory network, a convolutional network, and an optical flow network.
In the above video prediction method, the previous frame and the subsequent frame for the a priori encoder and the a posteriori encoder in the training step are both real video data.
In the above video prediction method, the previous frame and the subsequent frame are video frames acquired by a driving assistance system of the vehicle.
In the above video prediction method, the predicting step further includes: generating a next future frame using the future frame as a known frame.
In the above video prediction method, generating an a priori distribution from a previous frame using an a priori encoder comprises: generating a first plurality of data pairs of means and variances using the previous frame; and generating, as the prior distribution, a first random number that follows a Gaussian distribution using the first plurality of data pairs of means and variances; and generating a posterior distribution from the previous and subsequent frames using an a posteriori encoder comprises: generating a second plurality of data pairs of means and variances using the previous and subsequent frames; and generating, as the posterior distribution, a second random number that follows a Gaussian distribution using the second plurality of data pairs of means and variances.
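The mean/variance-to-random-number step described above is the standard VAE reparameterization trick. A minimal NumPy sketch (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def sample_latent(mu, sigma, rng):
    # reparameterized Gaussian sample: z = mu + sigma * eps, eps ~ N(0, I);
    # keeping the randomness in eps lets gradients flow through mu and sigma
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

rng = np.random.default_rng(seed=0)
mu = np.array([0.5, -1.0, 0.0])    # per-dimension means from the encoder
sigma = np.array([1.0, 0.5, 2.0])  # per-dimension standard deviations
z = sample_latent(mu, sigma, rng)  # one latent draw with the same shape as mu
```

Sampling this way, rather than drawing directly from N(mu, sigma²), is what allows the encoder parameters to be optimized by gradient descent.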
In the above video prediction method, the training step further includes: using the a priori distribution as an intermediate variable for the generator, generating a second predicted frame from the previous frame using the generator; and optimizing the prior encoder and the generator with the difference between the second predicted frame and the subsequent frame and the KL divergence between the prior distribution and the posterior distribution as a loss function.
According to another aspect of the present application, there is provided a video prediction apparatus comprising an a priori encoder, an a posteriori encoder, a generator, a training unit, and a prediction unit, wherein the training unit is configured to: generating an a priori distribution from a previous frame using the a priori encoder; generating an a posteriori distribution from the previous and subsequent frames using the a posteriori encoder; generating, by the generator, a first predicted frame from the previous frame using the a posteriori distribution as an intermediate variable for the generator; and optimizing the prior encoder and the generator using the difference between the first predicted frame and the subsequent frame and the KL divergence between the prior distribution and the posterior distribution as a loss function, and the prediction unit is configured to: generating an a priori distribution from a known frame using the a priori encoder; and generating, by the generator, future frames from the known frames using the a priori distribution as an intermediate variable of the generator.
In the above video prediction apparatus, the a priori encoder and the a posteriori encoder each include a convolutional network, and the generator includes one of a long-short term memory network, a convolutional network, and an optical flow network.
In the above video prediction apparatus, the previous frame and the subsequent frame for the a priori encoder and the a posteriori encoder in the training unit are both real video data.
In the above-described video prediction apparatus, the previous frame and the subsequent frame are video frames acquired by a driving assistance system of the vehicle.
In the above video prediction apparatus, the prediction unit is further configured to: generating a next future frame using the future frame as a known frame.
In the above video prediction apparatus, the training unit generating an a priori distribution from a previous frame using an a priori encoder comprises: generating a first plurality of data pairs of means and variances using the previous frame; and generating, as the prior distribution, a first random number that follows a Gaussian distribution using the first plurality of data pairs of means and variances; and the training unit generating a posterior distribution from the previous and subsequent frames using an a posteriori encoder comprises: generating a second plurality of data pairs of means and variances using the previous and subsequent frames; and generating, as the posterior distribution, a second random number that follows a Gaussian distribution using the second plurality of data pairs of means and variances.
In the above video prediction apparatus, the training unit is further configured to: using the a priori distribution as an intermediate variable for the generator, generating a second predicted frame from the previous frame using the generator; and optimizing the prior encoder and the generator with the difference between the second predicted frame and the subsequent frame and the KL divergence between the prior distribution and the posterior distribution as a loss function.
According to still another aspect of the present application, there is provided an electronic apparatus including: a processor; and a memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the video prediction method as described above.
According to yet another aspect of the present application, there is provided a vehicle comprising an electronic device as described above.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform a video prediction method as described above.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic diagram illustrating a system architecture to which a video prediction method according to an embodiment of the present application is applied.
Fig. 2 illustrates a flow diagram of a video prediction method according to an embodiment of the present application.
Fig. 3 illustrates a schematic diagram of a training process of a video prediction method according to an embodiment of the present application.
Fig. 4 illustrates a schematic diagram of a prediction process of a video prediction method according to an embodiment of the present application.
Fig. 5 illustrates a schematic diagram of another example of a training process of a video prediction method according to an embodiment of the present application.
Fig. 6 illustrates a block diagram of a video prediction apparatus according to an embodiment of the present application.
FIG. 7 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Summary of the application
As described above, current video prediction mainly uses a variational auto-encoder to calculate a prior distribution from the first few frames of a video and to fit a posterior distribution over the following frames. The prior distribution is usually either estimated with an LSTM (Long Short-Term Memory) network or simply assumed to be a standard normal distribution, and an LSTM is likewise used to encode the posterior. However, assuming the prior is a standard normal distribution is too simple to fit real data. On the other hand, if an LSTM network is adopted, training the LSTM structure itself is relatively difficult, so the prior and posterior are hard to learn and training efficiency is low.
In view of the above technical problems, the basic idea of the present application is to provide a video prediction method, a video prediction apparatus, an electronic device, and a vehicle in which, during the training step, a prior encoder generates a prior distribution using real data, a posterior encoder generates a posterior distribution using real data from more frames, the posterior distribution is used as an intermediate variable of the prediction generator to perform prediction, and the prior encoder and the generator are trained using the KL divergence between the prior and posterior distributions together with the difference between the predicted frame and the real frame, such as the mean square error. Moreover, the prior encoder and the posterior encoder can adopt convolutional networks in place of the commonly used LSTM network, thereby greatly simplifying the training process and improving the prediction effect. In the prediction step, the prior distribution generated by the trained prior encoder can be used as an intermediate variable of the generator to perform video prediction.
Here, the video prediction method, the video prediction apparatus, the electronic device, and the vehicle according to the embodiments of the present application may be directly applied to video prediction, and may also be used for any other prediction task that can be converted into video prediction. For example, the motion prediction of objects such as vehicles and pedestrians in an automatic driving scene can be converted into predicting a sequence of occupancy-grid maps for the objects in a panoramic top view. Furthermore, the predicted image does not only refer to a natural image containing one or three color channels, but may also be any multi-channel three-dimensional data that implicitly expresses other information (e.g., velocity, acceleration).
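As a concrete illustration of the occupancy-grid conversion mentioned above, a toy rasterizer is sketched below; the grid size, extent, and the helper itself are hypothetical choices, not specified by the patent:

```python
import numpy as np

def positions_to_occupancy(positions, grid_size=16, extent=40.0):
    # rasterize ego-centered (x, y) object positions, in meters, into one
    # binary occupancy grid of a top-view map; a sequence of such grids is
    # then a "video" that the prediction model can consume
    grid = np.zeros((grid_size, grid_size), dtype=np.uint8)
    cell = extent / grid_size                # meters per grid cell
    for x, y in positions:
        i = int((y + extent / 2) // cell)    # row index from y
        j = int((x + extent / 2) // cell)    # column index from x
        if 0 <= i < grid_size and 0 <= j < grid_size:
            grid[i, j] = 1                   # mark the cell as occupied
    return grid
```

Objects outside the mapped extent are simply dropped; a real pipeline would also encode velocity or class in extra channels, as the text notes.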
Having described the general principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Exemplary System
Fig. 1 is a schematic diagram illustrating a system architecture to which a video prediction method according to an embodiment of the present application is applied.
As shown in fig. 1, the system 100 may include an a priori encoder 10, an a posteriori encoder 20, and a generator 30. The a priori encoder 10 may receive several known frames of video, including the current frame, and generate an a priori distribution based thereon. The a posteriori encoder 20 may receive more frames than the a priori encoder 10, i.e. the a posteriori encoder 20 receives several subsequent frames in addition to several known frames received by the a priori encoder 10, so that an a posteriori distribution may be generated based on a preceding known frame and a following subsequent frame. During the training phase, both the a priori distribution and the a posteriori distribution may be provided to generator 30 as intermediate variables by which generator 30 may predict future frames from known frames. The prior encoder 10 and generator 30 may be trained using KL divergence between the prior distribution and the posterior distribution and the difference between the predicted future frame and the real frame, e.g., mean square error, as a loss function.
Here, specific implementations of the a priori encoder 10, the a posteriori encoder 20, and the generator 30 will be described in further detail below.
Exemplary method
Fig. 2 illustrates a flow diagram of a video prediction method 200 according to an embodiment of the present application. As shown in fig. 2, a video prediction method 200 according to an embodiment of the present application may include a training step S210 and a prediction step S220, where a variational auto-encoder (VAE) may be trained in the training step S210, and then prediction may be performed using the trained variational auto-encoder in the prediction step S220. It will be appreciated that the a priori encoder 10 and the generator 30 constitute a variational auto-encoder.
The training step S210 and the prediction step S220 shown in fig. 2 will be described in detail below with reference to fig. 3 to 4. As shown in fig. 2, the training step S210 may include a step S211 of generating an a priori distribution from a previous frame using the a priori encoder 10, and a step S212 of generating an a posteriori distribution from the previous frame and a subsequent frame using the a posteriori encoder 20; this process is illustrated in fig. 3. As shown in fig. 3, the a priori encoder 10 may receive multiple previous frames of video, such as frames X_{t-4}:X_t, where frame X_t can be considered the current frame, and generates an a priori distribution P1. In some embodiments, generating the prior distribution P1 may include using the previous frames X_{t-4}:X_t to generate a plurality of data pairs of mean μ and variance σ, and then using these mean-variance pairs to generate random numbers that follow a Gaussian distribution as the prior distribution. The a posteriori encoder 20 receives, in addition to the previous frames X_{t-4}:X_t, the subsequent frames X_{t+1}:X_{t+4}. Here, the previous frames X_{t-4}:X_t and subsequent frames X_{t+1}:X_{t+4} used in the training step are real data rather than predicted data generated by the generator 30. For example, when applied to the driving assistance field, the previous frames X_{t-4}:X_t and subsequent frames X_{t+1}:X_{t+4} may be video frames captured by a sensor, such as a camera, of the vehicle's driving assistance system. The a posteriori encoder 20 uses the previous frames X_{t-4}:X_t and subsequent frames X_{t+1}:X_{t+4} to generate an a posteriori distribution P2.
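A toy stand-in for a convolutional encoder that maps frames to mean/variance pairs might look as follows; the kernels, global pooling, and softplus activation are illustrative assumptions, not the patent's actual architecture:

```python
import numpy as np

def conv2d_valid(img, kernel):
    # minimal 2-D valid cross-correlation (no padding, stride 1)
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def conv_prior_encoder(frames, kernels):
    # toy convolutional prior encoder: one kernel per latent dimension;
    # the global mean of each feature map gives mu, and a softplus of the
    # global standard deviation gives a strictly positive sigma
    feats = [conv2d_valid(frames[-1], k) for k in kernels]
    mu = np.array([f.mean() for f in feats])
    sigma = np.log1p(np.exp(np.array([f.std() for f in feats])))
    return mu, sigma

frames = [np.ones((6, 6))] * 5          # five identical toy "frames"
kernels = [np.ones((3, 3)), np.eye(3)]  # two kernels -> a 2-dim latent
mu, sigma = conv_prior_encoder(frames, kernels)
```

The posterior encoder would have the same shape of output but would consume both the previous and subsequent frames, as described above.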
Specifically, generating the posterior distribution P2 may include using the previous frames X_{t-4}:X_t and subsequent frames X_{t+1}:X_{t+4} to generate a plurality of data pairs of mean μ and variance σ, and then using these mean-variance pairs to generate random numbers that follow a Gaussian distribution as the posterior distribution. The KL divergence between the prior distribution P1 and the posterior distribution P2 may be used as a loss function for the training process.
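Because both distributions are diagonal Gaussians parameterized by mean-variance pairs, the KL term has a cheap closed form. A NumPy sketch (the function name is illustrative):

```python
import numpy as np

def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    # KL( N(mu_q, diag(sigma_q^2)) || N(mu_p, diag(sigma_p^2)) ), summed
    # over latent dimensions; per dimension the closed form is
    #   log(sigma_p/sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2
    return np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p)**2) / (2.0 * sigma_p**2)
        - 0.5
    )

mu = np.array([0.2, -0.3])
sigma = np.array([1.0, 0.7])
kl_same = kl_gaussians(mu, sigma, mu, sigma)         # identical -> exactly 0
kl_shift = kl_gaussians(mu + 1.0, sigma, mu, sigma)  # shifted mean -> positive
```

Minimizing this term pulls the prior P1 toward the posterior P2, which is exactly the convergence behavior the training process relies on.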
With continued reference to fig. 2, the training step S210 further comprises a step S213 of using the posterior distribution P2 as an intermediate variable of the generator 30, and using the generator 30 to generate a first predicted frame from the previous frames X_{t-4}:X_t. Fig. 3 illustrates this process. In the example shown in fig. 3, the generator 30 comprises a Long Short-Term Memory (LSTM) network comprising an input layer 31, an output layer 33, and a plurality of intermediate layers 32 located between them; fig. 3 shows three intermediate layers 32a, 32b, and 32c. The intermediate layers 32 are also referred to as hidden layers. The posterior distribution P2 is provided to the intermediate layers 32 as an intermediate variable, also known as a latent variable. The generator 30 uses the intermediate variable to generate a predicted frame X'_{t+1} from the previous frames X_{t-4}:X_t. It should be understood that the generator 30 may employ other models, such as convolutional networks, optical flow networks, etc., depending on the application scenario.
In step S214, the difference between the predicted frame X'_{t+1} and its true value (i.e., the subsequent frame X_{t+1}), for example the Mean Square Error (MSE), together with the KL divergence between the prior distribution P1 and the posterior distribution P2, may be used as the loss function, i.e., loss = MSE + KL, to train the prior encoder 10 and the generator 30. For example, the trainable network parameters of the a priori encoder 10 and the generator 30 may be trained and optimized by methods such as stochastic gradient descent (SGD) or its variants. The a posteriori encoder 20 may be trained in advance, or may be trained synchronously in the training step S210 together with the a priori encoder 10 and the generator 30.
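The combined objective and a plain SGD update can be sketched as follows; `kl_value` stands for the already-computed divergence, and the dict-of-arrays parameter layout is an illustrative assumption (a real implementation would obtain `grads` from automatic differentiation):

```python
import numpy as np

def total_loss(pred_frame, true_frame, kl_value):
    # loss = MSE + KL, as stated in the text: pixel-wise reconstruction
    # error plus the divergence between prior and posterior distributions
    mse = np.mean((pred_frame - true_frame) ** 2)
    return mse + kl_value

def sgd_step(params, grads, lr=0.01):
    # one plain stochastic-gradient-descent update over named parameters
    return {name: p - lr * grads[name] for name, p in params.items()}
```

Because the same scalar loss trains both the prior encoder and the generator, one update step touches the parameters of both networks.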
The inventors have found that when convolutional networks are used for both the a priori encoder 10 and the a posteriori encoder 20, the KL divergence between the prior distribution P1 and the posterior distribution P2 converges faster during training than with the previously used LSTM, so training efficiency can be significantly improved. Compared with the traditional LSTM network, when both the prior and posterior encoders adopt convolutional networks the training process is simple and stable: no parameter tuning or special tricks are required, the computation load is small, and training is fast.
After the training process is completed, a prediction step S220 may be performed, the prediction step S220 performing prediction using only the a priori encoder 10 and the generator 30. Specifically, in step S221, the prior distribution P1 is generated from the known frame using the prior encoder 10, and then in step S222, the future frame is generated from the known frame using the generator 30 using the prior distribution P1 as an intermediate variable of the generator 30.
Fig. 4 illustrates this process. As shown in fig. 4, the a priori encoder 10 receives known frames X_{t-4}:X_t of the video to be predicted, which include the current frame X_t, and uses the known frames X_{t-4}:X_t to generate a prior distribution P1. The prior distribution P1 is provided to an intermediate layer of the generator 30 as an intermediate variable, which the generator 30 uses to generate a future frame X'_{t+1} from the known frames X_{t-4}:X_t. In the prediction process, the predicted future frame X'_{t+1} may also be provided as a known frame to the a priori encoder 10 and the generator 30 to predict the next future frame X'_{t+2}. According to the video prediction method, after training is completed, the predicted images are clear and relatively consistent with the real laws of object motion. When the video prediction method according to the embodiment of the application is applied to the field of driving assistance, the predicted future frames can be used by a driving assistance system to decide an appropriate driving strategy.
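The feed-back loop described here, where each predicted frame becomes a known frame for the next step, can be sketched as an autoregressive rollout; the callables and the window length are illustrative stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_rollout(known_frames, prior_encoder, generator, n_future, window=5):
    # autoregressive prediction: sample a latent from the prior, generate
    # the next frame, append it, and repeat using the most recent frames
    frames = list(known_frames)
    for _ in range(n_future):
        mu, sigma = prior_encoder(frames[-window:])
        z = mu + sigma * rng.standard_normal(mu.shape)  # sample the prior
        frames.append(generator(frames[-window:], z))
    return frames[len(known_frames):]

# toy stand-ins: a constant prior and a generator that brightens each frame
toy_prior = lambda fs: (np.zeros(2), np.ones(2))
toy_gen = lambda fs, z: fs[-1] + 1.0
known = [np.full((2, 2), float(i)) for i in range(5)]
future = predict_rollout(known, toy_prior, toy_gen, n_future=3)
```

Note that only the prior encoder and the generator run at prediction time; the posterior encoder is not needed once training is done, matching step S220.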
Fig. 5 illustrates a schematic diagram of another example of a training process of a video prediction method according to an embodiment of the present application. For simplicity and clarity, only the differences of the example of fig. 5 from the example of fig. 3 will be described below. As shown in fig. 5, the a priori distribution P1 produced by the a priori encoder 10 is also provided to the generator 30 as an intermediate variable during the training process. The generator 30 generates a predicted frame X'_{t+1} using the posterior-distribution intermediate variable P2, and generates a predicted frame Y'_{t+1} using the prior-distribution intermediate variable P1. On one hand, the difference between the predicted frame X'_{t+1} and its corresponding true value X_{t+1}, for example the mean square error MSE1, together with the KL divergence between the prior distribution P1 and the posterior distribution P2, may be used as a loss function for training; this may be referred to as the first training. On the other hand, the difference between the predicted frame Y'_{t+1} and its corresponding true value X_{t+1}, for example the mean square error MSE2, together with the KL divergence between the prior distribution P1 and the posterior distribution P2, may be used as a loss function for training; this may be referred to as the second training. The first training and the second training can be carried out alternately or synchronously. This training process introduces the idea of adversarial training, can achieve a better training effect, and further improves the accuracy of the prediction results. Finally, through training, the prior distribution P1 converges to agree with the posterior distribution P2.
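The two-branch scheme above, one prediction driven by the posterior latent and one by the prior latent, can be sketched as a single combined step; all networks and the frame loss below are toy stand-ins, not the patent's models:

```python
import numpy as np

rng = np.random.default_rng(2)

def dual_training_losses(prev, nxt, prior_enc, post_enc, gen, frame_loss):
    # first training: predict with the posterior latent z_q;
    # second training: predict with the prior latent z_p;
    # a full implementation would add the shared KL(P2 || P1) term to both
    mu_p, s_p = prior_enc(prev)
    mu_q, s_q = post_enc(prev, nxt)
    z_q = mu_q + s_q * rng.standard_normal(mu_q.shape)
    z_p = mu_p + s_p * rng.standard_normal(mu_p.shape)
    loss1 = frame_loss(gen(prev, z_q), nxt)  # MSE1 branch
    loss2 = frame_loss(gen(prev, z_p), nxt)  # MSE2 branch
    return loss1, loss2

# toy stand-ins to exercise the control flow
enc = lambda *args: (np.zeros(2), np.ones(2))
gen = lambda prev, z: prev[-1]
mse = lambda a, b: float(np.mean((a - b) ** 2))
l1, l2 = dual_training_losses([np.zeros((2, 2))], np.ones((2, 2)), enc, enc, gen, mse)
```

Alternating or summing the two losses forces the prior branch to produce predictions as good as the posterior branch, which is what drives P1 toward P2.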
Exemplary devices
Fig. 6 illustrates a functional block diagram of a video prediction apparatus 300 according to an embodiment of the present application. As shown in fig. 6, the video prediction apparatus 300 according to the embodiment of the present application may include an a priori encoder 310, an a posteriori encoder 320, a generator 330, a training unit 340, and a prediction unit 350.
The training unit 340 may be configured to schedule the other units to perform the training process. In particular, it may use the a priori encoder 310 to generate an a priori distribution P1 from the previous frames X_{t-4}:X_t, use the a posteriori encoder 320 to generate an a posteriori distribution P2 from the previous frames X_{t-4}:X_t and the subsequent frames X_{t+1}:X_{t+4}, and provide the posterior distribution P2 to the generator 330 as an intermediate variable. The training unit 340 may also use the generator 330 to generate a predicted frame X'_{t+1} from the previous frames X_{t-4}:X_t, and use the difference between the predicted frame X'_{t+1} and its corresponding true value, i.e., frame X_{t+1}, for example the Mean Square Error (MSE), together with the KL divergence between the prior distribution P1 and the posterior distribution P2, as the loss function to optimize the a priori encoder 310 and the generator 330. In some embodiments, the training unit 340 may also provide the prior distribution P1 to the generator 330 as an intermediate variable, use the generator 330 to generate a predicted frame Y'_{t+1} from the previous frames X_{t-4}:X_t, and use the difference between the predicted frame Y'_{t+1} and its corresponding true value, i.e., frame X_{t+1}, for example the mean square error, together with the KL divergence between the prior distribution P1 and the posterior distribution P2, as the loss function to optimize the a priori encoder 310 and the generator 330. The training unit 340 may alternately or synchronously perform the training process with the prior distribution P1 and the posterior distribution P2 as intermediate variables until the prior distribution P1 and the posterior distribution P2 converge to be consistent.
The prediction unit 350 may be configured to schedule the other units to perform the prediction process. In particular, it may use the a priori encoder 310 to generate an a priori distribution P1 from the known frames X_{t-4}:X_t, and use the prior distribution P1 as an intermediate variable of the generator 330, generating a future frame X'_{t+1} from the known frames X_{t-4}:X_t by the generator 330.
In one example, the a priori encoder 310 and the a posteriori encoder 320 may each comprise a convolutional network, and the generator 330 may comprise one of a long-short term memory network, a convolutional network, and an optical flow network.
It is to be understood that the specific functions and operations of the respective units and modules in the video prediction apparatus 300 have been described in detail in the video prediction method described above with reference to fig. 1 to 5, and thus, a repetitive description thereof will be omitted.
As described above, the video prediction apparatus 300 according to the embodiment of the present application can be implemented in various terminal devices, for example, in-vehicle devices for driving assistance. In one example, the video prediction apparatus 300 according to the embodiment of the present application may be integrated into the terminal device as a software module and/or a hardware module. For example, the apparatus 300 may be a software module in an operating system of the terminal device, or may be an application program developed for the terminal device, which runs on a CPU (central processing unit) and/or a GPU (graphics processing unit), or runs on a dedicated hardware acceleration chip, such as a dedicated chip adapted to run a deep neural network; of course, the apparatus 300 may also be one of many hardware modules of the terminal device.
Alternatively, in another example, the video prediction apparatus 300 and the terminal device may be separate devices, and the apparatus 300 may be connected to the terminal device through a wired and/or wireless network and exchange interaction information with it in an agreed data format.
Exemplary electronic device
FIG. 7 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in fig. 7, electronic device 400 includes one or more processors 410 and memory 420. The processor 410 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 400 to perform desired functions.
Memory 420 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by the processor 410 to implement the video prediction methods of the various embodiments of the present application described above and/or other desired functions. Various content such as previous frames, subsequent frames, predicted frames, etc. may also be stored in the computer readable storage medium.
In one example, electronic device 400 can also include an input interface 430 and an output interface 440, which are interconnected by a bus system and/or other form of connection mechanism (not shown). The input interface 430 may be connected to, for example, a video capture device such as an in-vehicle camera to receive known video frames that may be used for the training or prediction steps described above. The output interface 440 may output the prediction result to the outside, for example, may output the prediction result to a driving assistance system of the vehicle for use in determining a driving strategy.
Of course, for simplicity, only some of the components of the electronic device 400 relevant to the present application are shown in fig. 7, and components such as buses, input/output interfaces, and the like are omitted. In addition, electronic device 400 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage Medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the video prediction method according to various embodiments of the present application described in the "exemplary methods" section of this specification, supra.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the video prediction method according to various embodiments of the present application described in the "exemplary methods" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments. However, it is noted that the advantages, effects, and the like mentioned in the present application are merely examples, are not limiting, and should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for purposes of illustration and description only, and is not intended to be exhaustive or to limit the application to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by those skilled in the art, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended words that mean "including, but not limited to," and may be used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (13)

1. A video prediction method, comprising:
a training step comprising:
generating an a priori distribution from a previous frame using an a priori encoder;
generating an a posteriori distribution from the previous and subsequent frames using an a posteriori encoder;
using the a posteriori distribution as an intermediate variable for a generator, generating a first predicted frame from the previous frame using the generator; and
optimizing the prior encoder and the generator with a difference between the first predicted frame and the subsequent frame and a KL divergence between the prior distribution and the posterior distribution as a loss function; and
a prediction step comprising:
generating an a priori distribution from a known frame using the a priori encoder; and
using the prior distribution as an intermediate variable for the generator, generating future frames from the known frames using the generator.
2. The method of claim 1, wherein the a priori encoder and the a posteriori encoder each comprise a convolutional network, and the generator comprises one of a long short term memory network, a convolutional network, and an optical flow network.
3. The method of claim 1, wherein the previous and subsequent frames for the a priori encoder and the a posteriori encoder in the training step are both real video data.
4. The video prediction method of claim 3, wherein the previous frame and the subsequent frame are video frames acquired by a driver assistance system of a vehicle.
5. The method of claim 1, wherein the predicting step further comprises:
generating a next future frame using the future frame as a known frame.
6. The method of claim 1, wherein,
generating an a priori distribution from a previous frame using an a priori encoder includes:
generating a plurality of first data pairs of means and variances using the previous frame; and
generating, using the plurality of first data pairs of means and variances, a first random number subject to a Gaussian distribution as the prior distribution, and
Generating an a posteriori distribution from the previous and subsequent frames using an a posteriori encoder comprises:
generating a plurality of second data pairs of means and variances using the previous and subsequent frames; and
generating, using the plurality of second data pairs of means and variances, a second random number subject to a Gaussian distribution as the posterior distribution.
7. The method of claim 1, wherein the training step further comprises:
using the a priori distribution as an intermediate variable for the generator, generating a second predicted frame from the previous frame using the generator; and
optimizing the prior encoder and the generator with a difference between the second predicted frame and the subsequent frame and a KL divergence between the prior distribution and the posterior distribution as a loss function.
8. A video prediction apparatus, comprising an a priori encoder, an a posteriori encoder, a generator, a training unit, and a prediction unit, wherein
the training unit is configured to:
generating an a priori distribution from a previous frame using the a priori encoder;
generating an a posteriori distribution from the previous and subsequent frames using the a posteriori encoder;
generating, by the generator, a first predicted frame from the previous frame using the a posteriori distribution as an intermediate variable for the generator; and
optimizing the prior encoder and the generator using a difference between the first predicted frame and the subsequent frame and a KL divergence between the prior distribution and the posterior distribution as a loss function, and
the prediction unit is configured to:
generating an a priori distribution from a known frame using the a priori encoder; and
generating, by the generator, a future frame from the known frame using the prior distribution as an intermediate variable of the generator.
9. The apparatus of claim 8, wherein the training unit is further configured to:
generating, by the generator, a second predicted frame from the previous frame using the prior distribution as an intermediate variable of the generator; and
optimizing the prior encoder and the generator with a difference between the second predicted frame and the subsequent frame and a KL divergence between the prior distribution and the posterior distribution as a loss function.
10. The apparatus of claim 9, wherein said a priori encoder and said a posteriori encoder each comprise a convolutional network, and said generator comprises one of a long short term memory network, a convolutional network, and an optical flow network.
11. An electronic device, comprising:
a processor; and
memory having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the video prediction method of any one of claims 1-7.
12. A vehicle comprising the electronic device of claim 11.
13. A computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform a video prediction method according to any one of claims 1-7.
CN201810770361.2A 2018-07-13 2018-07-13 Video prediction method and device, electronic equipment and vehicle Active CN110719487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810770361.2A CN110719487B (en) 2018-07-13 2018-07-13 Video prediction method and device, electronic equipment and vehicle

Publications (2)

Publication Number Publication Date
CN110719487A true CN110719487A (en) 2020-01-21
CN110719487B CN110719487B (en) 2021-11-09

Family

ID=69208557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810770361.2A Active CN110719487B (en) 2018-07-13 2018-07-13 Video prediction method and device, electronic equipment and vehicle

Country Status (1)

Country Link
CN (1) CN110719487B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111031351A (en) * 2020-03-11 2020-04-17 北京三快在线科技有限公司 Method and device for predicting target object track
CN113473124A (en) * 2021-05-28 2021-10-01 北京达佳互联信息技术有限公司 Information acquisition method and device, electronic equipment and storage medium
CN113473124B (en) * 2021-05-28 2024-02-06 北京达佳互联信息技术有限公司 Information acquisition method, device, electronic equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN102055977A (en) * 2009-11-06 2011-05-11 三星电子株式会社 Fast motion estimation methods using multiple reference frames
US20170091319A1 (en) * 2014-05-15 2017-03-30 Sentient Technologies (Barbados) Limited Bayesian visual interactive search
CN104866821A (en) * 2015-05-04 2015-08-26 南京大学 Video object tracking method based on machine learning
CN105205783A (en) * 2015-09-14 2015-12-30 河海大学 SAR image blind super-resolution reestablishment method in combination with priori estimation
US20180150728A1 (en) * 2016-11-28 2018-05-31 D-Wave Systems Inc. Machine learning systems and methods for training with noisy labels
CN107679556A (en) * 2017-09-18 2018-02-09 天津大学 The zero sample image sorting technique based on variation autocoder

Also Published As

Publication number Publication date
CN110719487B (en) 2021-11-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant