WO2020252926A1 - Autonomous driving behavior prediction method and apparatus, computer device, and storage medium - Google Patents

Autonomous driving behavior prediction method and apparatus, computer device, and storage medium Download PDF

Info

Publication number
WO2020252926A1
WO2020252926A1 (PCT/CN2019/103467)
Authority
WO
WIPO (PCT)
Prior art keywords
vector
feature vector
result
excitation
convolution
Prior art date
Application number
PCT/CN2019/103467
Other languages
English (en)
French (fr)
Inventor
王健宗
吴天博
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020252926A1

Links

Images

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0088 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • This application relates to the technical field of intelligent decision-making, and in particular to an autonomous driving behavior prediction method and apparatus, a computer device, and a storage medium.
  • An unmanned driving system is a comprehensive system integrating environment perception, planning and decision-making, multi-level driver assistance and other functions. It applies computer, modern sensing, information fusion, communication, artificial intelligence and automatic control technologies, and is a typical high-tech complex.
  • The key technologies of autonomous driving can be divided into four parts: environment perception, behavior decision-making, path planning and motion control.
  • The machine learning systems commonly used in unmanned driving systems are built on supervised learning, which requires a large number of labeled training samples and lacks common sense and the ability to predict independently.
  • In autonomous driving, the complex external environment often deviates from the training samples, causing the model to lose its decision-making ability.
  • The embodiments of the present application provide an autonomous driving behavior prediction method and apparatus, a computer device, and a storage medium, aiming to solve the problem that machine learning systems commonly used in unmanned driving systems in the prior art are built on supervised learning and require a large number of labeled training samples, while the complex external environment in autonomous driving often deviates from the training samples, so that the model loses its decision-making ability and its ability to predict independently.
  • In a first aspect, an embodiment of the present application provides an autonomous driving behavior prediction method, which includes:
  • receiving a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and using the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame;
  • using the compressed abstract representation feature vector as the input of a pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, wherein the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector;
  • inputting both the compressed abstract representation feature vector and the prediction vector into a controller to generate an action vector, wherein the controller is a linear model; and
  • sending the action vector to the autonomous driving terminal.
  • In a second aspect, an embodiment of the present application provides an autonomous driving behavior prediction apparatus, which includes:
  • an image receiving unit, configured to receive a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and use the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame;
  • a prediction vector acquisition unit, configured to use the compressed abstract representation feature vector as the input of a pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, wherein the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector;
  • an action acquisition unit, configured to input both the compressed abstract representation feature vector and the prediction vector into a controller to generate an action vector, wherein the controller is a linear model; and
  • a vector sending unit, configured to send the action vector to the autonomous driving terminal.
  • In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the autonomous driving behavior prediction method described in the first aspect.
  • In a fourth aspect, the embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the autonomous driving behavior prediction method described in the first aspect.
  • FIG. 1 is a schematic flowchart of an autonomous driving behavior prediction method provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of a sub-flow of the autonomous driving behavior prediction method provided by an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of the neural network used for inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions and excitation deconvolutions in the autonomous driving behavior prediction method provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of another sub-flow of the autonomous driving behavior prediction method provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of the data flow in the autonomous driving behavior prediction method provided by an embodiment of the present application;
  • FIG. 6 is a schematic diagram of the mixture density network-recurrent neural network model in the autonomous driving behavior prediction method provided by an embodiment of the present application;
  • FIG. 7 is a schematic block diagram of an autonomous driving behavior prediction apparatus provided by an embodiment of the present application;
  • FIG. 8 is a schematic block diagram of subunits of the autonomous driving behavior prediction apparatus provided by an embodiment of the present application;
  • FIG. 9 is a schematic block diagram of another subunit of the autonomous driving behavior prediction apparatus provided by an embodiment of the present application;
  • FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • Referring to FIG. 1, the autonomous driving behavior prediction method is applied to a smart car capable of driverless driving and is executed by application software installed in the smart car.
  • The method includes steps S110 to S140.
  • S110: Receive a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and use the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame.
  • If the camera installed in the driverless smart car (that is, the autonomous driving terminal) has captured video, one or more frames can be randomly selected after segmenting the video to obtain a 2D image frame.
  • The 2D image frame is input into the variational autoencoder (abbreviated as VAE), and after being processed by the variational autoencoder, the compressed abstract representation feature vector corresponding to the 2D image frame is obtained.
  • The encoding/decoding process in the variational autoencoder is a convolution/deconvolution neural network process; that is, the variational autoencoder serves as the visual processing module, whose task is to learn an abstract compressed representation of each observed input frame, compressing what the model sees (the image frame) at each time frame.
  • Through the VAE model, the observed input image is condensed into a 32-dimensional latent vector (z) that obeys a Gaussian distribution, which means a smaller environment representation and speeds up the learning process.
  • The function of this step is, during driving, to condense the surrounding environment, such as the straightness of the road, upcoming curves and the vehicle's position relative to the road, in order to decide the next behavior.
  • In an embodiment, step S110 includes: obtaining the pixel matrix corresponding to the 2D image frame and inputting it into the variational autoencoder for multiple excitation convolutions to obtain an encoding result; fully connecting the encoding result through the dense layer of the variational autoencoder to obtain a classification result; and performing multiple excitation deconvolutions on the classification result to obtain the compressed abstract representation feature vector corresponding to the 2D image frame.
  • When the pixel matrix corresponding to the 2D image frame is obtained (usually a 64*64*3 image, representing a 64*64 3-channel image), the pixel matrix needs to be input into the variational autoencoder for multiple excitation convolutions and excitation deconvolutions to obtain the compressed abstract representation feature vector.
  • FIG. 3 is a schematic structural diagram of the neural network used for inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions and excitation deconvolutions.
  • After 3 excitation convolutions and 3 excitation deconvolutions, the 2D image frame can be abstractly compressed and represented, thereby obtaining the compressed abstract representation feature vector corresponding to the 2D image frame.
  • Inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions in step S111 to obtain the encoding result includes: performing the first excitation convolution on the 64*64*3 pixel matrix through a 32*4 first convolution kernel to obtain a 31*31*32 first convolution result; performing the second excitation convolution through a 64*4 second convolution kernel to obtain a 14*14*64 second convolution result; and performing the third excitation convolution through a 128*4 third convolution kernel to obtain a 6*6*128 third convolution result as the encoding result.
  • The 6*6*128 third convolution result is input as the encoding result into the dense layer (that is, the fully connected layer in a convolutional neural network) and fully connected to obtain the classification result corresponding to the 2D image frame.
  • Performing multiple excitation deconvolutions on the classification result in step S113 then restores the encoding result: excitation deconvolution is applied the same number of times as the excitation convolution to realize the reconstruction of the image and obtain the compressed abstract representation feature vector corresponding to the 2D image frame.
  • Once the observation of each time frame has been compressed, the other information about everything that changes over time must also be compressed; a mixture density network-recurrent neural network (MDN-RNN) can be used to predict the future, serving as a prediction model for the future z vectors that the variational autoencoder is expected to produce.
  • Since many complex environments in nature are stochastic, the RNN outputs a probability density function p(z) instead of a deterministic prediction z.
  • In an embodiment, step S120 includes: using the compressed abstract representation feature vector as the input of the recurrent neural network model in the pre-trained mixture density network-recurrent neural network model to obtain the corresponding probability density function; and using the probability density function and the control parameter as the input of the mixture density network model to compute the prediction vector.
  • When pre-training the mixture density network-recurrent neural network model, the probability distribution P(z_{t+1} | a_t, z_t, h_t) needs to be modeled, where a_t is the action taken at time t, h_t is the hidden state of the recurrent neural network model at time t, and τ is a parameter used to control the uncertainty of the model.
  • The mixture density network-recurrent neural network model is specifically an LSTM (long short-term memory network) with 256 hidden units.
  • Like the VAE, the recurrent neural network model tries to capture a latent understanding of the current state of the vehicle in the environment, but this time the latent understanding of the current state of the vehicle is based on the previous z (that is, the compressed abstract representation feature vector) and the previous behavior, predicting what the next z might look like and updating its own hidden state.
  • The controller is responsible for the task of behavior selection.
  • Simply put, the controller is a densely connected neural network.
  • The input of this network is the concatenation of z (the latent state obtained from the VAE, of length 32) and h (the hidden state of the RNN, of length 256).
  • Its three output neurons correspond to three behaviors and are scaled to suitable ranges. This behavior is then transmitted to the environment, which returns an updated observation, and the next cycle begins.
  • In an embodiment, step S130 includes: obtaining the linear model a_t = W_c[z_t h_t] + b_c in the controller, where a_t is the action vector, z_t is the compressed abstract representation feature vector, h_t is the prediction vector, W_c is a weight matrix and b_c is a bias vector; and
  • obtaining, according to the linear model in the controller, the action vector corresponding to the compressed abstract representation feature vector and the prediction vector.
  • Given the current state z_t, a probability distribution over z_{t+1} can be generated, and z_{t+1} is then sampled and used as the real-world observation.
  • At each time step (which can also be understood as a time frame), the model is fed an observation (a color image of the road and vehicle environment received by the visual sensor, that is, a 2D image frame) and must return the next set of behavior parameters, namely the steering direction (-1 to 1), the acceleration (0 to 1) and the braking (0 to 1). This behavior is then passed to the environment, which returns the next observation, and the next cycle begins, so that the model learns in real time from preceding time and space, predicts the behavior of the next frame, and adapts better to the environment. A sketch of this cycle follows.
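  • By way of illustration only, the per-time-step cycle described above can be sketched as follows. This is a minimal, non-authoritative sketch assuming PyTorch; the VAEEncoder, MDNRNN and Controller modules are the hypothetical components sketched later in the Description, and env with reset()/step() is a hypothetical environment interface, not an API defined by the patent.

```python
# Hypothetical per-time-step driving loop: observe -> compress (VAE) ->
# update hidden state (MDN-RNN) -> select action (controller) -> act.
import torch

def drive_loop(env, encoder, mdn_rnn, controller, steps=1000):
    obs = env.reset()                                  # 64*64*3 2D image frame
    state, action = None, torch.zeros(1, 1, 3)
    for _ in range(steps):
        frame = torch.as_tensor(obs).permute(2, 0, 1).unsqueeze(0).float() / 255.0
        z, _, _ = encoder(frame)                       # compressed abstract representation (length 32)
        _, _, _, state = mdn_rnn(z.unsqueeze(1), action, state)
        h = state[0].squeeze(0)                        # RNN hidden state (length 256)
        a = controller(z, h)                           # steering, acceleration, braking
        obs = env.step(a.squeeze(0).tolist())          # send action vector, receive next observation
        action = a.unsqueeze(1).detach()
    return obs
```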
  • After the current action vector is obtained, the action vector is sent to the autonomous driving terminal, thereby controlling unmanned driving.
  • The action vector includes at least the following behavior parameters: the steering direction (-1 to 1), the acceleration (0 to 1) and the braking (0 to 1).
  • This method realizes, based on visual perception and learning with a mixture of different neural networks, prediction of the future and increases the accuracy of decision-making.
  • An embodiment of the present application also provides an autonomous driving behavior prediction apparatus, which is configured to execute any embodiment of the foregoing autonomous driving behavior prediction method.
  • FIG. 7 is a schematic block diagram of an autonomous driving behavior prediction apparatus provided by an embodiment of the present application.
  • The autonomous driving behavior prediction apparatus 100 can be configured in a smart car capable of driverless driving.
  • The autonomous driving behavior prediction apparatus 100 includes an image receiving unit 110, a prediction vector acquisition unit 120, an action acquisition unit 130 and a vector sending unit 140.
  • The image receiving unit 110 is configured to receive a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and use the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame.
  • If the camera installed in the driverless smart car (that is, the autonomous driving terminal) has captured video, one or more frames can be randomly selected after segmenting the video to obtain a 2D image frame.
  • The 2D image frame is input into the variational autoencoder (abbreviated as VAE), and after being processed by the variational autoencoder, the compressed abstract representation feature vector corresponding to the 2D image frame is obtained.
  • The encoding/decoding process in the variational autoencoder is a convolution/deconvolution neural network process; that is, the variational autoencoder serves as the visual processing module, whose task is to learn an abstract compressed representation of each observed input frame, compressing what the model sees (the image frame) at each time frame.
  • Through the VAE model, the observed input image is condensed into a 32-dimensional latent vector (z) that obeys a Gaussian distribution, which means a smaller environment representation and speeds up the learning process.
  • The function of this step is, during driving, to condense the surrounding environment, such as the straightness of the road, upcoming curves and the vehicle's position relative to the road, in order to decide the next behavior.
  • In an embodiment, the image receiving unit 110 includes:
  • an encoding unit 111, configured to obtain the pixel matrix corresponding to the 2D image frame, and input the pixel matrix into the variational autoencoder for multiple excitation convolutions to obtain an encoding result;
  • a fully connected unit 112, configured to fully connect the encoding result through the dense layer of the variational autoencoder to obtain a classification result; and
  • a decoding unit 113, configured to perform multiple excitation deconvolutions on the classification result to obtain the compressed abstract representation feature vector corresponding to the 2D image frame, wherein the number of excitation convolutions performed by inputting the pixel matrix into the variational autoencoder is the same as the number of excitation deconvolutions performed on the classification result.
  • When the pixel matrix corresponding to the 2D image frame is obtained (usually a 64*64*3 image, representing a 64*64 3-channel image), the pixel matrix needs to be input into the variational autoencoder for multiple excitation convolutions and excitation deconvolutions to obtain the compressed abstract representation feature vector.
  • FIG. 3 is a schematic structural diagram of the neural network used for inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions and excitation deconvolutions.
  • After 3 excitation convolutions and 3 excitation deconvolutions, the 2D image frame can be abstractly compressed and represented, thereby obtaining the compressed abstract representation feature vector corresponding to the 2D image frame.
  • In an embodiment, the encoding unit 111 includes:
  • a pixel matrix acquisition unit, configured to obtain the 64*64*3 pixel matrix corresponding to the 2D image frame;
  • a first excitation convolution unit, configured to perform the first excitation convolution on the 64*64*3 pixel matrix through a 32*4 first convolution kernel to obtain a 31*31*32 first convolution result;
  • a second excitation convolution unit, configured to perform the second excitation convolution on the 31*31*32 first convolution result through a 64*4 second convolution kernel to obtain a 14*14*64 second convolution result; and
  • a third excitation convolution unit, configured to perform the third excitation convolution on the 14*14*64 second convolution result through a 128*4 third convolution kernel to obtain a 6*6*128 third convolution result as the encoding result.
  • In an embodiment, the decoding unit 113 includes:
  • a convolution result acquisition unit, configured to obtain the 5*5*128 convolution result corresponding to the classification result;
  • a first excitation deconvolution unit, configured to perform the first excitation deconvolution on the 5*5*128 convolution result corresponding to the classification result through a 64*5 fourth convolution kernel to obtain a 13*13*64 first deconvolution result;
  • a second excitation deconvolution unit, configured to perform the second excitation deconvolution on the 13*13*64 first deconvolution result through a 32*6 fifth convolution kernel to obtain a 30*30*32 second deconvolution result; and
  • a third excitation deconvolution unit, configured to perform the third excitation deconvolution on the 30*30*32 second deconvolution result through a 3*6 sixth convolution kernel to obtain a 64*64*3 third deconvolution result as the compressed abstract representation feature vector corresponding to the 2D image frame.
  • The 6*6*128 third convolution result is input as the encoding result into the dense layer (that is, the fully connected layer in a convolutional neural network) and fully connected to obtain the classification result corresponding to the 2D image frame.
  • To restore the classification result to a pixel matrix after classification is completed, excitation deconvolution is applied the same number of times as the excitation convolution to restore the encoding result and realize the reconstruction of the image.
  • The prediction vector acquisition unit 120 is configured to use the compressed abstract representation feature vector as the input of the pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, wherein the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector.
  • Once the observation of each time frame has been compressed, the other information about everything that changes over time must also be compressed; a mixture density network-recurrent neural network (MDN-RNN) can be used to predict the future, serving as a prediction model for the future z vectors that the variational autoencoder is expected to produce.
  • Since many complex environments in nature are stochastic, the RNN outputs a probability density function p(z) instead of a deterministic prediction z.
  • In an embodiment, the prediction vector acquisition unit 120 includes:
  • a first neural network processing unit 121, configured to use the compressed abstract representation feature vector as the input of the recurrent neural network model in the pre-trained mixture density network-recurrent neural network model to obtain the probability density function corresponding to the compressed abstract representation feature vector; and
  • a second neural network processing unit 122, configured to use the probability density function and the control parameter as the input of the mixture density network model in the pre-trained mixture density network-recurrent neural network model to compute the prediction vector.
  • When pre-training the mixture density network-recurrent neural network model, the probability distribution P(z_{t+1} | a_t, z_t, h_t) needs to be modeled, where a_t is the action taken at time t, h_t is the hidden state of the recurrent neural network model at time t, and τ is a parameter used to control the uncertainty of the model.
  • The mixture density network-recurrent neural network model is specifically an LSTM (long short-term memory network) with 256 hidden units.
  • Like the VAE, the recurrent neural network model tries to capture a latent understanding of the current state of the vehicle in the environment, but this time the latent understanding of the current state of the vehicle is based on the previous z (that is, the compressed abstract representation feature vector) and the previous behavior, predicting what the next z might look like and updating its own hidden state.
  • The action acquisition unit 130 is configured to input both the compressed abstract representation feature vector and the prediction vector into the controller to generate an action vector, wherein the controller is a linear model.
  • The controller is responsible for the task of behavior selection.
  • Simply put, the controller is a densely connected neural network.
  • The input of this network is the concatenation of z (the latent state obtained from the VAE, of length 32) and h (the hidden state of the RNN, of length 256).
  • Its three output neurons correspond to three behaviors and are scaled to suitable ranges. This behavior is then transmitted to the environment, which returns an updated observation, and the next cycle begins.
  • In an embodiment, the action acquisition unit 130 includes:
  • a linear model acquisition unit, configured to obtain the linear model a_t = W_c[z_t h_t] + b_c in the controller, where a_t is the action vector, z_t is the compressed abstract representation feature vector, h_t is the prediction vector, W_c is a weight matrix and b_c is a bias vector; and
  • an action vector acquisition unit, configured to obtain, according to the linear model in the controller, the action vector corresponding to the compressed abstract representation feature vector and the prediction vector.
  • Given the current state z_t, a probability distribution over z_{t+1} can be generated, and z_{t+1} is then sampled and used as the real-world observation.
  • At each time step (which can also be understood as a time frame), the model is fed an observation (a color image of the road and vehicle environment received by the visual sensor, that is, a 2D image frame) and must return the next set of behavior parameters, namely the steering direction (-1 to 1), the acceleration (0 to 1) and the braking (0 to 1). This behavior is then passed to the environment, which returns the next observation, and the next cycle begins, so that the model learns in real time from preceding time and space, predicts the behavior of the next frame, and adapts better to the environment.
  • The vector sending unit 140 is configured to send the action vector to the autonomous driving terminal.
  • After the current action vector is obtained, the action vector is sent to the autonomous driving terminal, thereby controlling unmanned driving.
  • The action vector includes at least the following behavior parameters: the steering direction (-1 to 1), the acceleration (0 to 1) and the braking (0 to 1).
  • The apparatus realizes, based on visual perception and learning with a mixture of different neural networks, prediction of the future and increases the accuracy of decision-making.
  • The above autonomous driving behavior prediction apparatus may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 10.
  • FIG. 10 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • The computer device 500 is the in-vehicle intelligent terminal of a smart car capable of driverless driving.
  • The computer device 500 includes a processor 502, a memory and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • When the computer program 5032 is executed, the processor 502 can be caused to perform the autonomous driving behavior prediction method.
  • The processor 502 is configured to provide computing and control capabilities to support the operation of the entire computer device 500.
  • The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503.
  • When the computer program 5032 is executed by the processor 502, the processor 502 can be caused to perform the autonomous driving behavior prediction method.
  • The network interface 505 is used for network communication, such as providing transmission of data information.
  • The structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • A specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
  • The processor 502 is configured to run the computer program 5032 stored in the memory to implement the autonomous driving behavior prediction method of the embodiments of the present application.
  • The embodiment of the computer device shown in FIG. 10 does not constitute a limitation on the specific configuration of the computer device.
  • In other embodiments, the computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
  • For example, the computer device may include only a memory and a processor; in such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 10 and are not repeated here.
  • The processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • The computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • The computer-readable storage medium stores a computer program, where the computer program, when executed by a processor, implements the autonomous driving behavior prediction method of the embodiments of the present application.
  • The storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk or any other physical storage medium that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

This application discloses an autonomous driving behavior prediction method and apparatus, a computer device, and a storage medium. The method includes: receiving a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and using the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame; using the compressed abstract representation feature vector as the input of a pre-trained mixture density network-recurrent neural network model to obtain a prediction vector; inputting both the compressed abstract representation feature vector and the prediction vector into a controller to generate an action vector; and sending the action vector to the autonomous driving terminal.

Description

Autonomous driving behavior prediction method and apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 201910527673.5, filed with the Chinese Patent Office on June 18, 2019 and entitled "Autonomous driving behavior prediction method and apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of intelligent decision-making, and in particular to an autonomous driving behavior prediction method and apparatus, a computer device, and a storage medium.
Background
An unmanned driving system is a comprehensive system integrating environment perception, planning and decision-making, multi-level driver assistance and other functions. It applies computer, modern sensing, information fusion, communication, artificial intelligence and automatic control technologies, and is a typical high-tech complex. The key technologies of autonomous driving can in turn be divided into four parts: environment perception, behavior decision-making, path planning and motion control.
At present, the machine learning systems commonly used in unmanned driving systems are built on supervised learning, which requires a large number of labeled training samples and also lacks common sense and the ability to predict independently. In autonomous driving, the complex external environment often deviates from the training samples, causing the model to lose its decision-making ability.
Summary
The embodiments of the present application provide an autonomous driving behavior prediction method and apparatus, a computer device, and a storage medium, aiming to solve the problem in the prior art that the machine learning systems commonly used in unmanned driving systems are built on supervised learning and require a large number of labeled training samples, while the complex external environment in autonomous driving often deviates from the training samples, so that the model loses its decision-making ability and its ability to predict independently.
In a first aspect, an embodiment of the present application provides an autonomous driving behavior prediction method, which includes:
receiving a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and using the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame;
using the compressed abstract representation feature vector as the input of a pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, where the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector;
inputting both the compressed abstract representation feature vector and the prediction vector into a controller to generate an action vector, where the controller is a linear model; and
sending the action vector to the autonomous driving terminal.
In a second aspect, an embodiment of the present application provides an autonomous driving behavior prediction apparatus, which includes:
an image receiving unit, configured to receive a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and use the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame;
a prediction vector acquisition unit, configured to use the compressed abstract representation feature vector as the input of a pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, where the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector;
an action acquisition unit, configured to input both the compressed abstract representation feature vector and the prediction vector into a controller to generate an action vector, where the controller is a linear model; and
a vector sending unit, configured to send the action vector to the autonomous driving terminal.
In a third aspect, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and runnable on the processor, where the processor, when executing the computer program, implements the autonomous driving behavior prediction method described in the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the autonomous driving behavior prediction method described in the first aspect.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of an autonomous driving behavior prediction method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a sub-flow of the autonomous driving behavior prediction method provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of the neural network used for inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions and excitation deconvolutions in the autonomous driving behavior prediction method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of another sub-flow of the autonomous driving behavior prediction method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of the data flow in the autonomous driving behavior prediction method provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of the mixture density network-recurrent neural network model in the autonomous driving behavior prediction method provided by an embodiment of the present application;
FIG. 7 is a schematic block diagram of an autonomous driving behavior prediction apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of subunits of the autonomous driving behavior prediction apparatus provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of another subunit of the autonomous driving behavior prediction apparatus provided by an embodiment of the present application;
FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some rather than all of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "include" and "comprise" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.
It should also be understood that the terms used in this specification are only for the purpose of describing particular embodiments and are not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of an autonomous driving behavior prediction method provided by an embodiment of the present application. The autonomous driving behavior prediction method is applied to a smart car capable of driverless driving, and is executed by application software installed in the smart car.
As shown in FIG. 1, the method includes steps S110 to S140.
S110: Receive a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and use the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame.
In this embodiment, if the camera installed in the driverless smart car (that is, the autonomous driving terminal) has captured video, one or more frames can be randomly selected after segmenting the video to obtain a 2D image frame. The 2D image frame is input into the variational autoencoder (abbreviated as VAE), and after being processed by the variational autoencoder, the compressed abstract representation feature vector corresponding to the 2D image frame is obtained. The encoding/decoding process in the variational autoencoder is a convolution/deconvolution neural network process; that is, the variational autoencoder serves as the visual processing module, whose task is to learn an abstract compressed representation of each observed input frame, compressing what the model sees (the image frame) at each time frame.
Through the VAE model, the observed input image is condensed into a 32-dimensional latent vector (z) that obeys a Gaussian distribution, which means a smaller environment representation and speeds up the learning process. The function of this step is, during driving, to condense the surrounding environment, such as the straightness of the road, upcoming curves and the vehicle's position relative to the road, in order to decide the next behavior.
In an embodiment, as shown in FIG. 2, step S110 includes:
S111: Obtain the pixel matrix corresponding to the 2D image frame, and input the pixel matrix into the variational autoencoder for multiple excitation convolutions to obtain an encoding result;
S112: Fully connect the encoding result through the dense layer of the variational autoencoder to obtain a classification result;
S113: Perform multiple excitation deconvolutions on the classification result to obtain the compressed abstract representation feature vector corresponding to the 2D image frame, where the number of excitation convolutions performed by inputting the pixel matrix into the variational autoencoder is the same as the number of excitation deconvolutions performed on the classification result.
In this embodiment, when the pixel matrix corresponding to the 2D image frame is obtained (usually a 64*64*3 image, representing a 64*64 3-channel image), the pixel matrix needs to be input into the variational autoencoder for multiple excitation convolutions and excitation deconvolutions to obtain the compressed abstract representation feature vector.
FIG. 3 is a schematic structural diagram of the neural network used for inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions and excitation deconvolutions. After 3 excitation convolutions and 3 excitation deconvolutions, the 2D image frame can be abstractly compressed and represented, thereby obtaining the compressed abstract representation feature vector corresponding to the 2D image frame.
In an embodiment, as shown in FIG. 3, inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions in step S111 to obtain the encoding result includes:
obtaining the 64*64*3 pixel matrix corresponding to the 2D image frame;
performing the first excitation convolution on the 64*64*3 pixel matrix through a 32*4 first convolution kernel to obtain a 31*31*32 first convolution result;
performing the second excitation convolution on the 31*31*32 first convolution result through a 64*4 second convolution kernel to obtain a 14*14*64 second convolution result; and
performing the third excitation convolution on the 14*14*64 second convolution result through a 128*4 third convolution kernel to obtain a 6*6*128 third convolution result as the encoding result.
In this embodiment, after the 64*64*3 pixel matrix corresponding to the 2D image frame is encoded by 3 excitation convolutions, the important features in the pixel matrix are obtained, but many blank pixels are also produced. To restore the encoding result later, excitation deconvolution can be applied the same number of times as the excitation convolution, which not only restores and enlarges the encoding result but also ensures the quality of the image to a certain extent.
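The following is a minimal, non-authoritative sketch of such an encoder, assuming PyTorch. It reads the "32*4" style kernel descriptions above as 32 output channels with a 4*4 kernel; a stride of 2 then reproduces the 31*31*32, 14*14*64 and 6*6*128 shapes stated in the text. The ReLU activations and the Gaussian reparameterization head producing the 32-dimensional latent vector z are assumptions consistent with the text rather than details fixed by the patent.

```python
import torch
import torch.nn as nn

class VAEEncoder(nn.Module):
    """Three excitation convolutions: 64*64*3 -> 31*31*32 -> 14*14*64 -> 6*6*128."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=4, stride=2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=4, stride=2)
        # dense head producing the Gaussian parameters of the 32-dim latent z
        self.fc_mu = nn.Linear(6 * 6 * 128, latent_dim)
        self.fc_logvar = nn.Linear(6 * 6 * 128, latent_dim)

    def forward(self, x):                      # x: (batch, 3, 64, 64)
        h = torch.relu(self.conv1(x))          # (batch, 32, 31, 31)
        h = torch.relu(self.conv2(h))          # (batch, 64, 14, 14)
        h = torch.relu(self.conv3(h))          # (batch, 128, 6, 6)
        h = h.flatten(start_dim=1)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        return z, mu, logvar
```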
In an embodiment, as shown in FIG. 3, performing multiple excitation deconvolutions on the classification result in step S113 to obtain the compressed abstract representation feature vector corresponding to the 2D image frame includes:
obtaining the 5*5*128 convolution result corresponding to the classification result;
performing the first excitation deconvolution on the 5*5*128 convolution result corresponding to the classification result through a 64*5 fourth convolution kernel to obtain a 13*13*64 first deconvolution result;
performing the second excitation deconvolution on the 13*13*64 first deconvolution result through a 32*6 fifth convolution kernel to obtain a 30*30*32 second deconvolution result; and
performing the third excitation deconvolution on the 30*30*32 second deconvolution result through a 3*6 sixth convolution kernel to obtain a 64*64*3 third deconvolution result as the compressed abstract representation feature vector corresponding to the 2D image frame.
In this embodiment, the 6*6*128 third convolution result is input as the encoding result into the dense layer (that is, the fully connected layer in a convolutional neural network) and fully connected to obtain the classification result corresponding to the 2D image frame. To restore the classification result to a pixel matrix after classification is completed, excitation deconvolution can be applied the same number of times as the excitation convolution to restore the encoding result and reconstruct the image.
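A matching decoder can be sketched under the same assumptions (PyTorch; "64*5" read as 64 output channels with a 5*5 kernel). With stride 2, kernel sizes 5, 6 and 6 reproduce the 13*13*64, 30*30*32 and 64*64*3 shapes stated above; the dense layer expanding z back to a 5*5*128 volume and the sigmoid output are assumptions.

```python
import torch
import torch.nn as nn

class VAEDecoder(nn.Module):
    """Dense layer plus three excitation deconvolutions: 5*5*128 -> 13*13*64 -> 30*30*32 -> 64*64*3."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 5 * 5 * 128)
        self.deconv1 = nn.ConvTranspose2d(128, 64, kernel_size=5, stride=2)
        self.deconv2 = nn.ConvTranspose2d(64, 32, kernel_size=6, stride=2)
        self.deconv3 = nn.ConvTranspose2d(32, 3, kernel_size=6, stride=2)

    def forward(self, z):                       # z: (batch, 32)
        h = self.fc(z).view(-1, 128, 5, 5)
        h = torch.relu(self.deconv1(h))         # (batch, 64, 13, 13)
        h = torch.relu(self.deconv2(h))         # (batch, 32, 30, 30)
        return torch.sigmoid(self.deconv3(h))   # (batch, 3, 64, 64), reconstructed image
```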
S120: Use the compressed abstract representation feature vector as the input of a pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, where the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector.
In this embodiment, once the observation of each time frame has been compressed (that is, the compressed abstract representation feature vector corresponding to the 2D image frame has been obtained), other information about everything that changes over time must also be compressed. In a specific implementation, a mixture density network-recurrent neural network (MDN-RNN) can be used to predict the future; the MDN-RNN model can serve as a prediction model for the future z vectors that the variational autoencoder is expected to produce. Since many complex environments in nature are stochastic, the RNN outputs a probability density function p(z) rather than a deterministic prediction z.
In an embodiment, as shown in FIG. 4 to FIG. 6, step S120 includes:
S121: Use the compressed abstract representation feature vector as the input of the recurrent neural network model in the pre-trained mixture density network-recurrent neural network model to obtain the probability density function corresponding to the compressed abstract representation feature vector;
S122: Use the probability density function and the control parameter as the input of the mixture density network model in the pre-trained mixture density network-recurrent neural network model to compute the prediction vector.
In this embodiment, when pre-training the mixture density network-recurrent neural network model, the probability distribution P(z_{t+1} | a_t, z_t, h_t) needs to be modeled, where a_t is the action taken at time t (that is, the action vector), h_t is the hidden state of the recurrent neural network model at time t, and τ is a parameter used to control the uncertainty of the model. The mixture density network-recurrent neural network model is specifically an LSTM (long short-term memory network) with 256 hidden units. Similar to the VAE, the recurrent neural network model tries to capture a latent understanding of the current state of the vehicle in the environment; but this time the latent understanding of the current state of the vehicle is based on the previous z (that is, the compressed abstract representation feature vector) and the previous behavior, predicting what the next z might look like and updating its own hidden state.
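As an illustration only, such an MDN-RNN can be sketched as follows, assuming PyTorch. The 256 hidden units and the uncertainty parameter τ come from the text; the number of mixture components (here 5), the diagonal-Gaussian mixture head, and the exact way τ scales the mixture are implementation assumptions not fixed by the patent.

```python
import torch
import torch.nn as nn

class MDNRNN(nn.Module):
    """LSTM with 256 hidden units whose head parameterizes p(z_{t+1} | a_t, z_t, h_t)."""
    def __init__(self, z_dim=32, action_dim=3, hidden=256, n_mix=5):
        super().__init__()
        self.n_mix, self.z_dim = n_mix, z_dim
        self.lstm = nn.LSTM(z_dim + action_dim, hidden, batch_first=True)
        # per z-dimension: mixture log-weights, means and log-stddevs
        self.head = nn.Linear(hidden, 3 * n_mix * z_dim)

    def forward(self, z, a, state=None, tau=1.0):
        # z: (batch, seq, 32); a: (batch, seq, 3); tau controls uncertainty
        out, state = self.lstm(torch.cat([z, a], dim=-1), state)
        logpi, mu, logsigma = self.head(out).chunk(3, dim=-1)
        shape = out.shape[:2] + (self.n_mix, self.z_dim)
        logpi = torch.log_softmax(logpi.reshape(shape) / tau, dim=-2)
        return logpi, mu.reshape(shape), logsigma.reshape(shape), state

def sample_next_z(logpi, mu, logsigma, tau=1.0):
    """Draw one z_{t+1} from the predicted mixture p(z) at the last step."""
    pi = torch.distributions.Categorical(logits=logpi[:, -1].permute(0, 2, 1))
    k = pi.sample().unsqueeze(-1)                      # chosen component per z-dimension
    mu_k = mu[:, -1].permute(0, 2, 1).gather(-1, k).squeeze(-1)
    sigma_k = logsigma[:, -1].permute(0, 2, 1).gather(-1, k).squeeze(-1).exp()
    return mu_k + sigma_k * (tau ** 0.5) * torch.randn_like(mu_k)
```

Sampling from the mixture rather than taking a point estimate is what lets the model represent the stochasticity of the environment; raising τ widens the predicted distribution.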
S130: Input both the compressed abstract representation feature vector and the prediction vector into the controller to generate an action vector, where the controller is a linear model.
In this embodiment, the controller is responsible for the task of behavior selection. Simply put, the controller is a densely connected neural network whose input is the concatenation of z (the latent state obtained from the VAE, of length 32) and h (the hidden state of the RNN, of length 256). Its three output neurons correspond to three behaviors and are scaled to suitable ranges. This behavior is then transmitted to the environment, which returns an updated observation, and the next cycle begins.
In an embodiment, step S130 includes:
obtaining the linear model a_t = W_c[z_t h_t] + b_c in the controller, where a_t is the action vector, z_t is the compressed abstract representation feature vector, h_t is the prediction vector, W_c is a weight matrix, and b_c is a bias vector; and
obtaining, according to the linear model in the controller, the action vector corresponding to the compressed abstract representation feature vector and the prediction vector.
In this embodiment, given the current state z_t, a probability distribution over z_{t+1} can be generated, and z_{t+1} is then sampled and used as the real-world observation. At each time step (which can also be understood as a time frame), the model is fed an observation (a color image of the road and vehicle environment received by the visual sensor, that is, a 2D image frame) and must return the next set of behavior parameters, namely the steering direction (-1 to 1), the acceleration (0 to 1) and the braking (0 to 1). This behavior is then passed to the environment, which returns the next observation, and the next cycle begins, so that the model learns in real time from preceding time and space, predicts the behavior of the next frame, and adapts better to the environment.
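The linear controller can be written as a single linear layer, as in the following minimal sketch (assuming PyTorch). The tanh/sigmoid squashing used to map the three outputs into the steering (-1 to 1), acceleration (0 to 1) and braking (0 to 1) ranges is an assumption about how the outputs are "scaled to suitable ranges"; the patent only fixes the linear form a_t = W_c[z_t h_t] + b_c.

```python
import torch
import torch.nn as nn

class Controller(nn.Module):
    """a_t = W_c [z_t h_t] + b_c, with outputs squashed into the stated ranges."""
    def __init__(self, z_dim=32, h_dim=256, action_dim=3):
        super().__init__()
        self.linear = nn.Linear(z_dim + h_dim, action_dim)  # W_c and b_c

    def forward(self, z, h):
        a = self.linear(torch.cat([z, h], dim=-1))   # concatenated z (32) and h (256)
        steer = torch.tanh(a[..., 0])                # steering direction: -1 to 1
        accel = torch.sigmoid(a[..., 1])             # acceleration: 0 to 1
        brake = torch.sigmoid(a[..., 2])             # braking: 0 to 1
        return torch.stack([steer, accel, brake], dim=-1)
```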
S140: Send the action vector to the autonomous driving terminal.
In this embodiment, after the current action vector is obtained, the action vector is sent to the autonomous driving terminal, thereby controlling unmanned driving. The action vector includes at least the following behavior parameters: the steering direction (-1 to 1), the acceleration (0 to 1) and the braking (0 to 1).
This method realizes, based on visual perception and learning with a mixture of different neural networks, prediction of the future and increases the accuracy of decision-making.
An embodiment of the present application further provides an autonomous driving behavior prediction apparatus, which is configured to execute any embodiment of the foregoing autonomous driving behavior prediction method. Specifically, referring to FIG. 7, FIG. 7 is a schematic block diagram of an autonomous driving behavior prediction apparatus provided by an embodiment of the present application. The autonomous driving behavior prediction apparatus 100 can be configured in a smart car capable of driverless driving.
As shown in FIG. 7, the autonomous driving behavior prediction apparatus 100 includes an image receiving unit 110, a prediction vector acquisition unit 120, an action acquisition unit 130 and a vector sending unit 140.
The image receiving unit 110 is configured to receive a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and use the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame.
In this embodiment, if the camera installed in the driverless smart car (that is, the autonomous driving terminal) has captured video, one or more frames can be randomly selected after segmenting the video to obtain a 2D image frame. The 2D image frame is input into the variational autoencoder (abbreviated as VAE), and after being processed by the variational autoencoder, the compressed abstract representation feature vector corresponding to the 2D image frame is obtained. The encoding/decoding process in the variational autoencoder is a convolution/deconvolution neural network process; that is, the variational autoencoder serves as the visual processing module, whose task is to learn an abstract compressed representation of each observed input frame, compressing what the model sees (the image frame) at each time frame.
Through the VAE model, the observed input image is condensed into a 32-dimensional latent vector (z) that obeys a Gaussian distribution, which means a smaller environment representation and speeds up the learning process. The function of this step is, during driving, to condense the surrounding environment, such as the straightness of the road, upcoming curves and the vehicle's position relative to the road, in order to decide the next behavior.
In an embodiment, as shown in FIG. 8, the image receiving unit 110 includes:
an encoding unit 111, configured to obtain the pixel matrix corresponding to the 2D image frame, and input the pixel matrix into the variational autoencoder for multiple excitation convolutions to obtain an encoding result;
a fully connected unit 112, configured to fully connect the encoding result through the dense layer of the variational autoencoder to obtain a classification result; and
a decoding unit 113, configured to perform multiple excitation deconvolutions on the classification result to obtain the compressed abstract representation feature vector corresponding to the 2D image frame, where the number of excitation convolutions performed by inputting the pixel matrix into the variational autoencoder is the same as the number of excitation deconvolutions performed on the classification result.
In this embodiment, when the pixel matrix corresponding to the 2D image frame is obtained (usually a 64*64*3 image, representing a 64*64 3-channel image), the pixel matrix needs to be input into the variational autoencoder for multiple excitation convolutions and excitation deconvolutions to obtain the compressed abstract representation feature vector.
FIG. 3 is a schematic structural diagram of the neural network used for inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions and excitation deconvolutions. After 3 excitation convolutions and 3 excitation deconvolutions, the 2D image frame can be abstractly compressed and represented, thereby obtaining the compressed abstract representation feature vector corresponding to the 2D image frame.
In an embodiment, the encoding unit 111 includes:
a pixel matrix acquisition unit, configured to obtain the 64*64*3 pixel matrix corresponding to the 2D image frame;
a first excitation convolution unit, configured to perform the first excitation convolution on the 64*64*3 pixel matrix through a 32*4 first convolution kernel to obtain a 31*31*32 first convolution result;
a second excitation convolution unit, configured to perform the second excitation convolution on the 31*31*32 first convolution result through a 64*4 second convolution kernel to obtain a 14*14*64 second convolution result; and
a third excitation convolution unit, configured to perform the third excitation convolution on the 14*14*64 second convolution result through a 128*4 third convolution kernel to obtain a 6*6*128 third convolution result as the encoding result.
In this embodiment, after the 64*64*3 pixel matrix corresponding to the 2D image frame is encoded by 3 excitation convolutions, the important features in the pixel matrix are obtained, but many blank pixels are also produced. To restore the encoding result later, excitation deconvolution can be applied the same number of times as the excitation convolution, which not only restores and enlarges the encoding result but also ensures the quality of the image to a certain extent.
In an embodiment, the decoding unit 113 includes:
a convolution result acquisition unit, configured to obtain the 5*5*128 convolution result corresponding to the classification result;
a first excitation deconvolution unit, configured to perform the first excitation deconvolution on the 5*5*128 convolution result corresponding to the classification result through a 64*5 fourth convolution kernel to obtain a 13*13*64 first deconvolution result;
a second excitation deconvolution unit, configured to perform the second excitation deconvolution on the 13*13*64 first deconvolution result through a 32*6 fifth convolution kernel to obtain a 30*30*32 second deconvolution result; and
a third excitation deconvolution unit, configured to perform the third excitation deconvolution on the 30*30*32 second deconvolution result through a 3*6 sixth convolution kernel to obtain a 64*64*3 third deconvolution result as the compressed abstract representation feature vector corresponding to the 2D image frame.
In this embodiment, the 6*6*128 third convolution result is input as the encoding result into the dense layer (that is, the fully connected layer in a convolutional neural network) and fully connected to obtain the classification result corresponding to the 2D image frame. To restore the classification result to a pixel matrix after classification is completed, excitation deconvolution can be applied the same number of times as the excitation convolution to restore the encoding result and reconstruct the image.
The prediction vector acquisition unit 120 is configured to use the compressed abstract representation feature vector as the input of the pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, where the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector.
In this embodiment, once the observation of each time frame has been compressed (that is, the compressed abstract representation feature vector corresponding to the 2D image frame has been obtained), other information about everything that changes over time must also be compressed. In a specific implementation, a mixture density network-recurrent neural network (MDN-RNN) can be used to predict the future; the MDN-RNN model can serve as a prediction model for the future z vectors that the variational autoencoder is expected to produce. Since many complex environments in nature are stochastic, the RNN outputs a probability density function p(z) rather than a deterministic prediction z.
In an embodiment, as shown in FIG. 9, the prediction vector acquisition unit 120 includes:
a first neural network processing unit 121, configured to use the compressed abstract representation feature vector as the input of the recurrent neural network model in the pre-trained mixture density network-recurrent neural network model to obtain the probability density function corresponding to the compressed abstract representation feature vector; and
a second neural network processing unit 122, configured to use the probability density function and the control parameter as the input of the mixture density network model in the pre-trained mixture density network-recurrent neural network model to compute the prediction vector.
In this embodiment, when pre-training the mixture density network-recurrent neural network model, the probability distribution P(z_{t+1} | a_t, z_t, h_t) needs to be modeled, where a_t is the action taken at time t (that is, the action vector), h_t is the hidden state of the recurrent neural network model at time t, and τ is a parameter used to control the uncertainty of the model. The mixture density network-recurrent neural network model is specifically an LSTM (long short-term memory network) with 256 hidden units. Similar to the VAE, the recurrent neural network model tries to capture a latent understanding of the current state of the vehicle in the environment; but this time the latent understanding of the current state of the vehicle is based on the previous z (that is, the compressed abstract representation feature vector) and the previous behavior, predicting what the next z might look like and updating its own hidden state.
The action acquisition unit 130 is configured to input both the compressed abstract representation feature vector and the prediction vector into the controller to generate an action vector, where the controller is a linear model.
In this embodiment, the controller is responsible for the task of behavior selection. Simply put, the controller is a densely connected neural network whose input is the concatenation of z (the latent state obtained from the VAE, of length 32) and h (the hidden state of the RNN, of length 256). Its three output neurons correspond to three behaviors and are scaled to suitable ranges. This behavior is then transmitted to the environment, which returns an updated observation, and the next cycle begins.
In an embodiment, the action acquisition unit 130 includes:
a linear model acquisition unit, configured to obtain the linear model a_t = W_c[z_t h_t] + b_c in the controller, where a_t is the action vector, z_t is the compressed abstract representation feature vector, h_t is the prediction vector, W_c is a weight matrix, and b_c is a bias vector; and
an action vector acquisition unit, configured to obtain, according to the linear model in the controller, the action vector corresponding to the compressed abstract representation feature vector and the prediction vector.
In this embodiment, given the current state z_t, a probability distribution over z_{t+1} can be generated, and z_{t+1} is then sampled and used as the real-world observation. At each time step (which can also be understood as a time frame), the model is fed an observation (a color image of the road and vehicle environment received by the visual sensor, that is, a 2D image frame) and must return the next set of behavior parameters, namely the steering direction (-1 to 1), the acceleration (0 to 1) and the braking (0 to 1). This behavior is then passed to the environment, which returns the next observation, and the next cycle begins, so that the model learns in real time from preceding time and space, predicts the behavior of the next frame, and adapts better to the environment.
The vector sending unit 140 is configured to send the action vector to the autonomous driving terminal.
In this embodiment, after the current action vector is obtained, the action vector is sent to the autonomous driving terminal, thereby controlling unmanned driving. The action vector includes at least the following behavior parameters: the steering direction (-1 to 1), the acceleration (0 to 1) and the braking (0 to 1).
The apparatus realizes, based on visual perception and learning with a mixture of different neural networks, prediction of the future and increases the accuracy of decision-making.
The above autonomous driving behavior prediction apparatus can be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 10.
Referring to FIG. 10, FIG. 10 is a schematic block diagram of a computer device provided by an embodiment of the present application. The computer device 500 is the in-vehicle intelligent terminal of a smart car capable of driverless driving.
Referring to FIG. 10, the computer device 500 includes a processor 502, a memory and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 can cause the processor 502 to perform the autonomous driving behavior prediction method.
The processor 502 is configured to provide computing and control capabilities to support the operation of the entire computer device 500.
The internal memory 504 provides an environment for running the computer program 5032 in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, the processor 502 can be caused to perform the autonomous driving behavior prediction method.
The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied; a specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the autonomous driving behavior prediction method of the embodiments of the present application.
Those skilled in the art can understand that the embodiment of the computer device shown in FIG. 10 does not constitute a limitation on the specific configuration of the computer device; in other embodiments, the computer device may include more or fewer components than shown in the figure, combine certain components, or have a different component arrangement. For example, in some embodiments, the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 10 and are not repeated here.
It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Another embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program, where the computer program, when executed by a processor, implements the autonomous driving behavior prediction method of the embodiments of the present application.
The storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disk or any other physical storage medium that can store program code.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily think of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

  1. An autonomous driving behavior prediction method, comprising:
    receiving a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and using the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame;
    using the compressed abstract representation feature vector as the input of a pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, wherein the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector;
    inputting both the compressed abstract representation feature vector and the prediction vector into a controller to generate an action vector, wherein the controller is a linear model; and
    sending the action vector to the autonomous driving terminal.
  2. The autonomous driving behavior prediction method according to claim 1, wherein using the 2D image frame as the input of the variational autoencoder to obtain the compressed abstract representation feature vector corresponding to the 2D image frame comprises:
    obtaining the pixel matrix corresponding to the 2D image frame, and inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions to obtain an encoding result;
    fully connecting the encoding result through the dense layer of the variational autoencoder to obtain a classification result; and
    performing multiple excitation deconvolutions on the classification result to obtain the compressed abstract representation feature vector corresponding to the 2D image frame, wherein the number of excitation convolutions performed by inputting the pixel matrix into the variational autoencoder is the same as the number of excitation deconvolutions performed on the classification result.
  3. The autonomous driving behavior prediction method according to claim 2, wherein obtaining the pixel matrix corresponding to the 2D image frame and inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions to obtain the encoding result comprises:
    obtaining the 64*64*3 pixel matrix corresponding to the 2D image frame;
    performing the first excitation convolution on the 64*64*3 pixel matrix through a 32*4 first convolution kernel to obtain a 31*31*32 first convolution result;
    performing the second excitation convolution on the 31*31*32 first convolution result through a 64*4 second convolution kernel to obtain a 14*14*64 second convolution result; and
    performing the third excitation convolution on the 14*14*64 second convolution result through a 128*4 third convolution kernel to obtain a 6*6*128 third convolution result as the encoding result.
  4. The autonomous driving behavior prediction method according to claim 2, wherein performing multiple excitation deconvolutions on the classification result to obtain the compressed abstract representation feature vector corresponding to the 2D image frame comprises:
    obtaining the 5*5*128 convolution result corresponding to the classification result;
    performing the first excitation deconvolution on the 5*5*128 convolution result corresponding to the classification result through a 64*5 fourth convolution kernel to obtain a 13*13*64 first deconvolution result;
    performing the second excitation deconvolution on the 13*13*64 first deconvolution result through a 32*6 fifth convolution kernel to obtain a 30*30*32 second deconvolution result; and
    performing the third excitation deconvolution on the 30*30*32 second deconvolution result through a 3*6 sixth convolution kernel to obtain a 64*64*3 third deconvolution result as the compressed abstract representation feature vector corresponding to the 2D image frame.
  5. The autonomous driving behavior prediction method according to claim 1, wherein using the compressed abstract representation feature vector as the input of the pre-trained mixture density network-recurrent neural network model to obtain the prediction vector comprises:
    using the compressed abstract representation feature vector as the input of the recurrent neural network model in the pre-trained mixture density network-recurrent neural network model to obtain the probability density function corresponding to the compressed abstract representation feature vector; and
    using the probability density function and the control parameter as the input of the mixture density network model in the pre-trained mixture density network-recurrent neural network model to compute the prediction vector.
  6. The autonomous driving behavior prediction method according to claim 1, wherein inputting both the compressed abstract representation feature vector and the prediction vector into the controller to generate the action vector comprises:
    obtaining the linear model a_t = W_c[z_t h_t] + b_c in the controller, wherein a_t is the action vector, z_t is the compressed abstract representation feature vector, h_t is the prediction vector, W_c is a weight matrix, and b_c is a bias vector; and
    obtaining, according to the linear model in the controller, the action vector corresponding to the compressed abstract representation feature vector and the prediction vector.
  7. The autonomous driving behavior prediction method according to claim 1, wherein receiving the 2D image frame from the video sequence currently captured by the autonomous driving terminal comprises:
    if the camera installed at the autonomous driving terminal has captured video, randomly selecting one or more frames after segmenting the video to obtain the 2D image frame.
  8. The autonomous driving behavior prediction method according to claim 1, wherein, before using the compressed abstract representation feature vector as the input of the pre-trained mixture density network-recurrent neural network model to obtain the prediction vector, the method further comprises:
    modeling P(z_{t+1} | a_t, z_t, h_t), wherein a_t is the action vector at time t, h_t is the hidden state of the recurrent neural network model at time t, z_t is the current state, z_{t+1} is the state at the next time, and τ is a parameter used to control the uncertainty of the model.
  9. An autonomous driving behavior prediction apparatus, comprising:
    an image receiving unit, configured to receive a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and use the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame;
    a prediction vector acquisition unit, configured to use the compressed abstract representation feature vector as the input of a pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, wherein the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector;
    an action acquisition unit, configured to input both the compressed abstract representation feature vector and the prediction vector into a controller to generate an action vector, wherein the controller is a linear model; and
    a vector sending unit, configured to send the action vector to the autonomous driving terminal.
  10. The autonomous driving behavior prediction apparatus according to claim 9, wherein the image receiving unit comprises:
    an encoding unit, configured to obtain the pixel matrix corresponding to the 2D image frame, and input the pixel matrix into the variational autoencoder for multiple excitation convolutions to obtain an encoding result;
    a fully connected unit, configured to fully connect the encoding result through the dense layer of the variational autoencoder to obtain a classification result; and
    a decoding unit, configured to perform multiple excitation deconvolutions on the classification result to obtain the compressed abstract representation feature vector corresponding to the 2D image frame, wherein the number of excitation convolutions performed by inputting the pixel matrix into the variational autoencoder is the same as the number of excitation deconvolutions performed on the classification result.
  11. A computer device, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    receiving a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and using the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame;
    using the compressed abstract representation feature vector as the input of a pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, wherein the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector;
    inputting both the compressed abstract representation feature vector and the prediction vector into a controller to generate an action vector, wherein the controller is a linear model; and
    sending the action vector to the autonomous driving terminal.
  12. The computer device according to claim 11, wherein using the 2D image frame as the input of the variational autoencoder to obtain the compressed abstract representation feature vector corresponding to the 2D image frame comprises:
    obtaining the pixel matrix corresponding to the 2D image frame, and inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions to obtain an encoding result;
    fully connecting the encoding result through the dense layer of the variational autoencoder to obtain a classification result; and
    performing multiple excitation deconvolutions on the classification result to obtain the compressed abstract representation feature vector corresponding to the 2D image frame, wherein the number of excitation convolutions performed by inputting the pixel matrix into the variational autoencoder is the same as the number of excitation deconvolutions performed on the classification result.
  13. The computer device according to claim 12, wherein obtaining the pixel matrix corresponding to the 2D image frame and inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions to obtain the encoding result comprises:
    obtaining the 64*64*3 pixel matrix corresponding to the 2D image frame;
    performing the first excitation convolution on the 64*64*3 pixel matrix through a 32*4 first convolution kernel to obtain a 31*31*32 first convolution result;
    performing the second excitation convolution on the 31*31*32 first convolution result through a 64*4 second convolution kernel to obtain a 14*14*64 second convolution result; and
    performing the third excitation convolution on the 14*14*64 second convolution result through a 128*4 third convolution kernel to obtain a 6*6*128 third convolution result as the encoding result.
  14. The computer device according to claim 12, wherein performing multiple excitation deconvolutions on the classification result to obtain the compressed abstract representation feature vector corresponding to the 2D image frame comprises:
    obtaining the 5*5*128 convolution result corresponding to the classification result;
    performing the first excitation deconvolution on the 5*5*128 convolution result corresponding to the classification result through a 64*5 fourth convolution kernel to obtain a 13*13*64 first deconvolution result;
    performing the second excitation deconvolution on the 13*13*64 first deconvolution result through a 32*6 fifth convolution kernel to obtain a 30*30*32 second deconvolution result; and
    performing the third excitation deconvolution on the 30*30*32 second deconvolution result through a 3*6 sixth convolution kernel to obtain a 64*64*3 third deconvolution result as the compressed abstract representation feature vector corresponding to the 2D image frame.
  15. The computer device according to claim 11, wherein using the compressed abstract representation feature vector as the input of the pre-trained mixture density network-recurrent neural network model to obtain the prediction vector comprises:
    using the compressed abstract representation feature vector as the input of the recurrent neural network model in the pre-trained mixture density network-recurrent neural network model to obtain the probability density function corresponding to the compressed abstract representation feature vector; and
    using the probability density function and the control parameter as the input of the mixture density network model in the pre-trained mixture density network-recurrent neural network model to compute the prediction vector.
  16. The computer device according to claim 11, wherein inputting both the compressed abstract representation feature vector and the prediction vector into the controller to generate the action vector comprises:
    obtaining the linear model a_t = W_c[z_t h_t] + b_c in the controller, wherein a_t is the action vector, z_t is the compressed abstract representation feature vector, h_t is the prediction vector, W_c is a weight matrix, and b_c is a bias vector; and
    obtaining, according to the linear model in the controller, the action vector corresponding to the compressed abstract representation feature vector and the prediction vector.
  17. The computer device according to claim 11, wherein receiving the 2D image frame from the video sequence currently captured by the autonomous driving terminal comprises:
    if the camera installed at the autonomous driving terminal has captured video, randomly selecting one or more frames after segmenting the video to obtain the 2D image frame.
  18. The computer device according to claim 11, wherein, before using the compressed abstract representation feature vector as the input of the pre-trained mixture density network-recurrent neural network model to obtain the prediction vector, the following is further included:
    modeling P(z_{t+1} | a_t, z_t, h_t), wherein a_t is the action vector at time t, h_t is the hidden state of the recurrent neural network model at time t, z_t is the current state, z_{t+1} is the state at the next time, and τ is a parameter used to control the uncertainty of the model.
  19. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the following operations:
    receiving a 2D image frame from the video sequence currently captured by the autonomous driving terminal, and using the 2D image frame as the input of a variational autoencoder to obtain a compressed abstract representation feature vector corresponding to the 2D image frame;
    using the compressed abstract representation feature vector as the input of a pre-trained mixture density network-recurrent neural network model to obtain a prediction vector, wherein the output of the recurrent neural network model in the mixture density network-recurrent neural network model is a probability density function corresponding to the compressed abstract representation feature vector;
    inputting both the compressed abstract representation feature vector and the prediction vector into a controller to generate an action vector, wherein the controller is a linear model; and
    sending the action vector to the autonomous driving terminal.
  20. The computer-readable storage medium according to claim 19, wherein using the 2D image frame as the input of the variational autoencoder to obtain the compressed abstract representation feature vector corresponding to the 2D image frame comprises:
    obtaining the pixel matrix corresponding to the 2D image frame, and inputting the pixel matrix into the variational autoencoder for multiple excitation convolutions to obtain an encoding result;
    fully connecting the encoding result through the dense layer of the variational autoencoder to obtain a classification result; and
    performing multiple excitation deconvolutions on the classification result to obtain the compressed abstract representation feature vector corresponding to the 2D image frame, wherein the number of excitation convolutions performed by inputting the pixel matrix into the variational autoencoder is the same as the number of excitation deconvolutions performed on the classification result.
PCT/CN2019/103467 2019-06-18 2019-08-30 Autonomous driving behavior prediction method and apparatus, computer device and storage medium WO2020252926A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910527673.5A CN110398957A (zh) 2019-06-18 2019-06-18 Autonomous driving behavior prediction method and apparatus, computer device and storage medium
CN201910527673.5 2019-06-18

Publications (1)

Publication Number Publication Date
WO2020252926A1 (zh)

Family

ID=68323246

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103467 WO2020252926A1 (zh) Autonomous driving behavior prediction method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110398957A (zh)
WO (1) WO2020252926A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914482A * 2020-07-27 2020-11-10 武汉中海庭数据技术有限公司 Driving condition generation method and system for autonomous driving testing
CN111988622B * 2020-08-20 2021-12-10 深圳市商汤科技有限公司 Video prediction method and apparatus, electronic device, and storage medium
CN115373380A * 2022-04-13 2022-11-22 杭州电子科技大学 Autonomous exploration method for a mobile robot based on LSTM and variational autoencoding

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590438A * 2017-08-16 2018-01-16 中国地质大学(武汉) Intelligent assisted driving method and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590438A * 2017-08-16 2018-01-16 中国地质大学(武汉) Intelligent assisted driving method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAVID HA ET AL.: "World Models", ARXIV, 9 May 2018 (2018-05-09), XP081232786, DOI: 20200221110652Y *
EDER SANTANA ET AL.: "Learning a Driving Simulator", ARXIV, 3 August 2016 (2016-08-03), XP080718041, DOI: 20200221110440Y *

Also Published As

Publication number Publication date
CN110398957A (zh) 2019-11-01

Similar Documents

Publication Publication Date Title
CN109964237B (zh) Image depth prediction neural networks
EP3673417B1 (en) System and method for distributive training and weight distribution in a neural network
CN110363058B (zh) Three-dimensional object localization for obstacle avoidance using a single-shot convolutional neural network
US10275691B2 (en) Adaptive real-time detection and examination network (ARDEN)
US20210097373A1 (en) Deep reinforcement learning with fast updating recurrent neural networks and slow updating recurrent neural networks
WO2020252926A1 (zh) Autonomous driving behavior prediction method and apparatus, computer device and storage medium
CN110796692A (zh) End-to-end deep generative model for simultaneous localization and mapping
Akan et al. Stretchbev: Stretching future instance prediction spatially and temporally
CN111738037B (zh) Automatic driving method, system therefor, and vehicle
WO2020052480A1 (zh) Unmanned driving behavior decision-making and model training
CN109242003B (zh) Ego-motion determination method for an in-vehicle vision system based on a deep convolutional neural network
US20210097266A1 (en) Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision
CN111191492B (zh) Information estimation, model retrieval and model alignment method and apparatus
US20210103744A1 (en) Spatio-temporal embeddings
CN109584299B (zh) Positioning method, positioning apparatus, terminal and storage medium
CN111709471A (zh) Training method for an object detection model, and object detection method and apparatus
JP2022164640A (ja) System and method for dataset and model management for multi-modal auto-labeling and active learning
CN111242176B (zh) Processing method and apparatus for computer vision tasks, and electronic system
CN114758502A (zh) Joint trajectory prediction method and apparatus for two vehicles, electronic device, and autonomous vehicle
Sadid et al. Dynamic Spatio-temporal Graph Neural Network for Surrounding-aware Trajectory Prediction of Autonomous Vehicles
CN111476062A (zh) Lane line detection method, apparatus, electronic device and driving system
Lange et al. Lopr: Latent occupancy prediction using generative models
CN113591885A (zh) Target detection model training method, device and computer storage medium
WO2019228654A1 (en) Method for training a prediction system and system for sequence prediction
Zhang et al. Prediction of human actions in assembly process by a spatial-temporal end-to-end learning model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19934072

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19934072

Country of ref document: EP

Kind code of ref document: A1