WO2024060558A1 - Feasible region prediction method and apparatus, and system and storage medium

Info

Publication number
WO2024060558A1
Authority
WO
WIPO (PCT)
Prior art keywords
bird
eye view
dimensional image
features
image features
Application number
PCT/CN2023/083769
Inventor
崔霄
Original Assignee
九识(苏州)智能科技有限公司
Application filed by 九识(苏州)智能科技有限公司
Publication of WO2024060558A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588 Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The present application provides a feasible region prediction method and apparatus, and a system and a storage medium, which are applied to autonomous driving or assisted driving of a vehicle. The method comprises: acquiring an around-view image at the current moment, and obtaining a bird's-eye-view feature according to the around-view image, wherein the around-view image comprises images at a plurality of angles of view that are collected by a plurality of cameras on a vehicle; extracting the bird's-eye-view feature to obtain a bird's-eye-view high-dimensional image feature at the current moment; and generating a future feasible region prediction image according to the bird's-eye-view high-dimensional image feature at the current moment and a time sequence queue formed by bird's-eye-view high-dimensional image features at a plurality of historical moments, and outputting the feasible region prediction image. In the present application, the analysis and prediction of a future scenario are realized, a basis for a behavior decision of autonomous driving or assisted driving of a vehicle can be provided, and by combining scenario sensing and behavior prediction, repeated calculation and information accumulation errors caused by dividing feasible region segmentation and obstacle prediction into two modules are avoided, without the need to perform independent behavior prediction.

Description

Feasible region prediction method, apparatus, system and storage medium
Technical field
The present application relates to the technical field of assisted and autonomous driving, and more specifically to a feasible region prediction method, apparatus, system and storage medium.
Background
In applications such as assisted driving and autonomous driving, segmenting the drivable area is an important technical step. The segmentation is generally based on information returned by sensors such as cameras or lidar. A camera mainly returns image information, whose advantages are long visible range, high resolution, and clear expression of the color and texture of the scene; its drawbacks are the lack of distance information and the difficulty of converting from the image coordinate system to the world coordinate system. A lidar mainly returns point cloud information, whose advantages are accurate distances and no need for coordinate system conversion; its drawbacks are lower resolution and the lack of color and texture information. At present, feasible region extraction is generally treated as a static scene segmentation problem.
Cameras have a price advantage over lidar. In the related art, taking an object detection algorithm based on bird's-eye-view features from surround-view cameras as an example, a convolutional neural network (CNN) encoder encodes the input image of each camera, a model such as a Transformer converts the encoded features of each camera from the image coordinate system to the vehicle coordinate system to form a bird's-eye-view feature (BEV feature), and objects are detected from the BEV feature to output detection results in the bird's-eye view.
However, the above related art focuses on describing the environment at the current moment through the BEV feature. Object detection and feasible region segmentation both characterize the current environment and lack any analysis of future scenes.
In view of the above problems, the present application proposes a new feasible region prediction method, apparatus, system and storage medium to at least partially solve them.
Summary of the invention
This summary introduces a series of concepts in simplified form, which are described in further detail in the detailed description. The summary of the present invention is not intended to define the key features or essential technical features of the claimed technical solution, nor to determine the scope of protection of the claimed technical solution.
One aspect of the present application provides a feasible region prediction method applied to autonomous driving or assisted driving of a vehicle, the method comprising: acquiring a surround-view image at the current moment, and obtaining a bird's-eye-view feature from the surround-view image, the surround-view image comprising images from multiple viewing angles captured by multiple cameras on the vehicle; performing extraction on the bird's-eye-view feature to obtain a bird's-eye-view high-dimensional image feature at the current moment; and generating a future feasible region prediction map from a time-series queue formed by the bird's-eye-view high-dimensional image feature at the current moment and the bird's-eye-view high-dimensional image features at multiple historical moments, and outputting the feasible region prediction map.
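The data flow of this method can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the application: the names `fuse_to_bev`, `extract_high_dim`, `predict_future` and `FeasibleRegionPredictor`, the queue length, and the placeholder arithmetic inside each function are all hypothetical stand-ins for the trained networks.

```python
from collections import deque

QUEUE_LEN = 4  # assumed: current moment plus three historical moments


def fuse_to_bev(surround_images):
    # Placeholder for per-view feature extraction and fusion into one BEV
    # feature: here simply the element-wise mean over the camera views.
    n = len(surround_images)
    return [sum(vals) / n for vals in zip(*surround_images)]


def extract_high_dim(bev_feature):
    # Placeholder for the extractor applied to the BEV feature.
    return [2.0 * v for v in bev_feature]


def predict_future(queue):
    # Placeholder predictor: linear extrapolation from the two latest entries.
    prev, cur = queue[-2], queue[-1]
    return [c + (c - p) for p, c in zip(prev, cur)]


class FeasibleRegionPredictor:
    def __init__(self, queue_len=QUEUE_LEN):
        # deque(maxlen=...) discards the oldest feature automatically.
        self.queue = deque(maxlen=queue_len)

    def step(self, surround_images):
        bev = fuse_to_bev(surround_images)
        self.queue.append(extract_high_dim(bev))
        if len(self.queue) < 2:
            return None  # not enough history yet
        return predict_future(self.queue)


predictor = FeasibleRegionPredictor()
first = predictor.step([[0.0, 1.0], [2.0, 3.0]])   # two cameras, two values each
second = predictor.step([[1.0, 2.0], [3.0, 4.0]])
```

The bounded queue is the one structural point taken from the text: each call appends the current feature, and the prediction is computed from the queue as a whole.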
In one example, obtaining the bird's-eye-view feature from the surround-view image comprises: performing feature extraction on the images from the multiple viewing angles to obtain multiple view-image high-dimensional image features; and fusing the multiple view-image high-dimensional image features to obtain the bird's-eye-view feature.
In one example, fusing the multiple view-image high-dimensional image features to obtain the bird's-eye-view feature comprises: converting the multiple view-image high-dimensional image features from the coordinate systems of the images of their respective viewing angles to the vehicle coordinate system, and obtaining the bird's-eye-view feature from the result of the conversion.
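The geometric relation between the vehicle coordinate system and one camera's image can be illustrated with a pinhole projection. This is a sketch under assumptions that the application does not state: a pinhole camera model, a vehicle frame with x forward, y left and z up, a camera frame with z forward, x right and y down, and hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) and extrinsics (`rotation`, `translation`). It only shows how a ground point in the vehicle frame maps to a pixel whose feature could then be sampled.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project_to_image(point_vehicle, rotation, translation, fx, fy, cx, cy):
    """Map a 3-D point in vehicle coordinates to pixel coordinates (u, v).

    Returns None when the point lies behind the camera.
    """
    # Vehicle frame -> camera frame: p_cam = R * p_vehicle + t
    p = mat_vec(rotation, point_vehicle)
    x, y, z = (pi + ti for pi, ti in zip(p, translation))
    if z <= 0:
        return None
    # Pinhole projection onto the image plane
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical front camera mounted 1.5 m above the vehicle origin.
R_FRONT = [[0, -1, 0],   # camera x = -vehicle y (right)
           [0, 0, -1],   # camera y = -vehicle z (down)
           [1, 0, 0]]    # camera z =  vehicle x (forward)
T_FRONT = [0, 1.5, 0]    # equals R * (-camera position in vehicle frame)

ahead = project_to_image([10, 0, 0], R_FRONT, T_FRONT, 1000, 1000, 640, 360)
behind = project_to_image([-5, 0, 0], R_FRONT, T_FRONT, 1000, 1000, 640, 360)
```

A ground point 10 m straight ahead lands on the image's vertical center line and below the principal point, while a point behind the vehicle is not visible to this camera.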
In one example, generating the future feasible region prediction map from the time-series queue formed by the bird's-eye-view high-dimensional image feature at the current moment and the bird's-eye-view high-dimensional image features at multiple historical moments comprises: generating bird's-eye-view high-dimensional image features for multiple future moments from the time-series queue; and upsampling the bird's-eye-view high-dimensional image feature of each of the multiple future moments to generate the feasible region prediction map for that moment, thereby obtaining feasible region prediction maps for the multiple future moments.
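The upsampling step can be illustrated with a small sketch. The application does not specify the upsampling operator; nearest-neighbour upsampling by a factor of 2 is assumed here purely for illustration, and the single-channel feature values and the moment labels `t+1` and `t+2` are hypothetical.

```python
def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2-D grid by an integer factor."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]   # repeat columns
        out.extend([list(wide) for _ in range(factor)])  # repeat rows
    return out


# Hypothetical coarse features for two future moments (one channel each).
future_features = {
    "t+1": [[0.9, 0.2], [0.8, 0.1]],
    "t+2": [[0.7, 0.3], [0.6, 0.2]],
}

# One prediction map per future moment, at twice the feature resolution.
prediction_maps = {t: upsample_nearest(f, 2) for t, f in future_features.items()}
```

In a trained network the upsampling would more likely be learned (for example by transposed convolutions) and would also reduce a multi-channel feature to one value per pixel; neither detail is given in the text.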
In one example, the method is implemented by one trained neural network comprising a first sub-network, a second sub-network, a third sub-network and a fourth sub-network, wherein: the first sub-network is configured to acquire the images from the multiple viewing angles at the current moment and perform feature extraction on them to obtain multiple view-image high-dimensional image features; the second sub-network is configured to fuse the multiple view-image high-dimensional image features to obtain the bird's-eye-view feature; the third sub-network is configured to perform extraction on the bird's-eye-view feature to obtain the bird's-eye-view high-dimensional image feature at the current moment; and the fourth sub-network is configured to generate the future feasible region prediction map from the time-series queue formed by the bird's-eye-view high-dimensional image feature at the current moment and the bird's-eye-view high-dimensional image features at multiple historical moments.
In one example, fusing the multiple view-image high-dimensional image features to obtain the bird's-eye-view feature comprises: inputting the view-image high-dimensional image features as keys and the pixel position coordinates in the bird's-eye-view feature as queries into the second sub-network, and obtaining the bird's-eye-view feature from the output of the second sub-network.
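The key/query formulation corresponds to the attention mechanism used in Transformer-style models. The sketch below shows single-head scaled dot-product attention as one plausible reading. It rests on assumptions not stated in the text: the BEV pixel coordinates are taken to be already embedded as query vectors, the image features are used as both keys and values, and all numbers are made up.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: one vector per BEV grid cell.
    keys, values: one vector per image feature location.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


image_features = [[1.0, 0.0], [0.0, 1.0]]   # keys (and, by assumption, values)
bev_queries = [[10.0, 0.0], [0.0, 0.0]]     # hypothetical embedded BEV positions
bev_feature = cross_attention(bev_queries, image_features, image_features)
```

The first query aligns strongly with the first key and so copies almost all of its value; the second query is neutral and receives the average of both values.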
In one example, the third sub-network comprises multiple convolutional layers, and performing extraction on the bird's-eye-view feature to obtain the bird's-eye-view high-dimensional image feature comprises: performing feature extraction on the bird's-eye-view feature through one of the convolutional layers, and then, layer by layer, performing feature extraction again through each subsequent convolutional layer on the result extracted by the preceding convolutional layer, to obtain the bird's-eye-view high-dimensional image feature.
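The layer-by-layer extraction can be sketched as a chain of convolutions in which each layer consumes the previous layer's output. The example assumes single-channel 3x3 kernels with zero padding and a ReLU activation; the kernel values are arbitrary, whereas a trained network would use learned multi-channel weights.

```python
def conv2d_same(grid, kernel):
    """3x3 convolution (cross-correlation) with zero padding, then ReLU."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += grid[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = max(acc, 0.0)
    return out


def extract(bev_feature, kernels):
    """Apply the layers in order; each layer re-extracts from the previous result."""
    x = bev_feature
    for k in kernels:
        x = conv2d_same(x, k)
    return x


IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
BOX_SUM = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

high_dim = extract([[1.0, 0.0], [0.0, 0.0]], [IDENTITY, BOX_SUM])
```

With these two kernels the first layer passes the input through unchanged and the second spreads the single active cell to all of its neighbours.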
In one example, the fourth sub-network comprises as many branch networks as there are bird's-eye-view high-dimensional image features in the time-series queue, and generating the future feasible region prediction map from the time-series queue formed by the bird's-eye-view high-dimensional image feature at the current moment and the bird's-eye-view high-dimensional image features at multiple historical moments comprises: generating, by each branch network and from the time-series queue, the bird's-eye-view high-dimensional image feature for the corresponding future moment; and upsampling each of these features to generate the feasible region prediction map for the corresponding future moment, thereby obtaining feasible region prediction maps for multiple future moments.
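The one-branch-per-queue-entry structure can be sketched as below. Only the count of branches follows the text; the helper `make_branch` and its linear extrapolation rule are hypothetical placeholders for trained branch networks.

```python
def make_branch(k):
    """Hypothetical branch for the k-th future moment: extrapolate k steps ahead."""
    def branch(queue):
        prev, cur = queue[-2], queue[-1]
        return [c + k * (c - p) for p, c in zip(prev, cur)]
    return branch


# Time-series queue: two historical moments followed by the current moment.
queue = [[0.0, 1.0], [1.0, 1.5], [2.0, 2.0]]

# As many branches as there are features in the queue.
branches = [make_branch(k) for k in range(1, len(queue) + 1)]

# Every branch reads the whole queue and outputs the feature for its own moment.
future = [branch(queue) for branch in branches]
```

Each branch sees the full queue, so the number of predicted future moments equals the queue length.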
In one example, the method is implemented by multiple trained neural networks comprising a first neural network, a second neural network, a third neural network and a fourth neural network, wherein: the first neural network is configured to acquire the images from the multiple viewing angles at the current moment and perform feature extraction on them to obtain multiple view-image high-dimensional image features; the second neural network is configured to fuse the multiple view-image high-dimensional image features to obtain the bird's-eye-view feature; the third neural network is configured to perform extraction on the bird's-eye-view feature to obtain the bird's-eye-view high-dimensional image feature at the current moment; and the fourth neural network is configured to generate the future feasible region prediction map from the time-series queue formed by the bird's-eye-view high-dimensional image feature at the current moment and the bird's-eye-view high-dimensional image features at multiple historical moments.
In one example, fusing the multiple view-image high-dimensional image features to obtain the bird's-eye-view feature comprises: inputting the view-image high-dimensional image features as keys and the pixel position coordinates in the bird's-eye-view feature as queries into the second neural network, and obtaining the bird's-eye-view feature from the output of the second neural network.
In one example, the third neural network comprises multiple convolutional layers, and performing extraction on the bird's-eye-view feature to obtain the bird's-eye-view high-dimensional image feature comprises: performing feature extraction on the bird's-eye-view feature through one of the convolutional layers, and then, layer by layer, performing feature extraction again through each subsequent convolutional layer on the result extracted by the preceding convolutional layer, to obtain the bird's-eye-view high-dimensional image feature.
In one example, the fourth neural network comprises as many sub-networks as there are bird's-eye-view high-dimensional image features in the time-series queue, and generating the future feasible region prediction map from the time-series queue formed by the bird's-eye-view high-dimensional image feature at the current moment and the bird's-eye-view high-dimensional image features at multiple historical moments comprises: generating, by each sub-network and from the time-series queue, the bird's-eye-view high-dimensional image feature for the corresponding future moment; and upsampling each of these features to generate the feasible region prediction map for the corresponding future moment, thereby obtaining feasible region prediction maps for multiple future moments.
In one example, the feasible region prediction map is a probability map, which represents the probability that a pixel in the surround-view image at a future moment belongs to the drivable area.
In one example, when the pixel value of a pixel in the probability map is not greater than a set threshold, the corresponding pixel in the surround-view image at the future moment does not belong to the drivable area; when the pixel value is greater than the set threshold, the corresponding pixel belongs to the drivable area.
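The thresholding rule can be written down directly. In this sketch the function name `to_drivable_mask` and the threshold value of 0.5 are assumptions; the text only requires some set threshold and a strict "greater than" comparison.

```python
def to_drivable_mask(prob_map, threshold=0.5):
    """Convert a probability map into a boolean drivable-area mask.

    A pixel is drivable only if its value is strictly greater than the
    threshold; a value equal to the threshold counts as not drivable.
    """
    return [[p > threshold for p in row] for row in prob_map]


prob_map = [[0.9, 0.5],
            [0.2, 0.51]]
mask = to_drivable_mask(prob_map)
```

Note the boundary case: the pixel whose value equals the threshold is classified as not drivable, matching the "not greater than" wording.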
Another aspect of the present application provides a feasible region prediction apparatus applied to autonomous driving or assisted driving of a vehicle, comprising: a bird's-eye-view feature module configured to acquire a surround-view image at the current moment and obtain a bird's-eye-view feature from the surround-view image, the surround-view image comprising images from multiple viewing angles captured by multiple cameras on the vehicle; a bird's-eye-view high-dimensional image feature module configured to perform extraction on the bird's-eye-view feature to obtain a bird's-eye-view high-dimensional image feature at the current moment; and a feasible region prediction map module configured to generate a future feasible region prediction map from a time-series queue formed by the bird's-eye-view high-dimensional image feature at the current moment and the bird's-eye-view high-dimensional image features at multiple historical moments, and to output the feasible region prediction map.
Yet another aspect of the present application provides a feasible region prediction apparatus applied to autonomous driving or assisted driving of a vehicle, comprising: multiple cameras arranged on the vehicle and configured to capture a surround-view image at the current moment, the surround-view image comprising images from multiple viewing angles; and one or more processors configured to: acquire the surround-view image and obtain a bird's-eye-view feature from it; perform extraction on the bird's-eye-view feature to obtain a bird's-eye-view high-dimensional image feature at the current moment; and generate a future feasible region prediction map from a time-series queue formed by the bird's-eye-view high-dimensional image feature at the current moment and the bird's-eye-view high-dimensional image features at multiple historical moments, and output the feasible region prediction map.
Yet another aspect of the present application provides a feasible region prediction apparatus comprising a memory and a processor, the memory storing a computer program to be run by the processor, wherein the computer program, when run by the processor, causes the processor to perform any one of the feasible region prediction methods described above.
Yet another aspect of the present application provides a system for autonomous driving or assisted driving of a vehicle, the system comprising any one of the feasible region prediction apparatuses described above.
Yet another aspect of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program, when run by a processor, causes the processor to perform any one of the feasible region prediction methods described above.
According to the feasible region prediction method, apparatus, system and storage medium of the embodiments of the present application, a future feasible region prediction map is obtained from the bird's-eye-view high-dimensional image feature at the current moment and the bird's-eye-view high-dimensional image features at multiple historical moments. This realizes analysis and prediction of future scenes and can therefore provide a basis for behavior decisions in autonomous or assisted driving of a vehicle. Moreover, the feasible region prediction map is generated from the acquired surround-view image at the current moment. By combining scene perception with behavior prediction, the feasible region prediction map can directly give the future trajectories of obstacles and thereby divide the scene into drivable and non-drivable areas. This avoids the repeated computation and accumulated information errors that arise in the related art from splitting feasible region segmentation and obstacle prediction into two modules, and no separate behavior prediction is needed.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
In the drawings:
Figure 1 shows a schematic block diagram of an electronic device according to an embodiment of the present application;
Figure 2 shows a schematic flowchart of a feasible region prediction method according to an embodiment of the present application;
Figure 3 shows a flow diagram of obtaining a bird's-eye-view feature from a surround-view image according to an embodiment of the present application;
Figure 4 shows a schematic diagram of converting multiple view-image high-dimensional image features from the coordinate systems of the images of their respective viewing angles to the vehicle coordinate system according to an embodiment of the present application;
Figure 5 shows a flow diagram of performing extraction on a bird's-eye-view feature to obtain a bird's-eye-view high-dimensional image feature according to an embodiment of the present application;
Figure 6 shows a flow diagram of generating a future feasible region prediction map from a time-series queue according to an embodiment of the present application;
Figure 7 shows a schematic block diagram of a feasible region prediction apparatus according to an embodiment of the present application;
Figure 8 shows a schematic block diagram of another feasible region prediction apparatus according to an embodiment of the present application.
Detailed description
To make the objectives, technical solutions and advantages of the present application more apparent, example embodiments according to the present application are described in detail below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application rather than all of them, and it should be understood that the present application is not limited by the example embodiments described here. All other embodiments obtained by a person skilled in the art, based on the embodiments described in the present application and without creative effort, shall fall within the scope of protection of the present application.
In the following description, numerous specific details are given to provide a more thorough understanding of the present application. However, it will be apparent to a person skilled in the art that the present application may be practiced without one or more of these details. In other instances, some technical features well known in the art are not described, to avoid obscuring the present application.
It should be understood that the present application can be implemented in different forms and should not be construed as limited to the embodiments set forth here. Rather, these embodiments are provided so that the disclosure is thorough and complete and fully conveys the scope of the present application to a person skilled in the art.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "consisting of" and/or "comprising", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups. As used herein, the term "and/or" includes any and all combinations of the associated listed items.
For a thorough understanding of the present application, detailed structures are set out in the following description to explain the technical solutions it proposes. Optional embodiments of the present application are described in detail below; however, the present application may also have other implementations beyond these detailed descriptions.
First, an example electronic device 100 for implementing the feasible region prediction method and apparatus of the embodiments of the present invention is described with reference to Figure 1.
As shown in Figure 1, the electronic device 100 includes one or more processors 102, one or more memories 104, an input device 106 and an output device 108, which are interconnected through a bus system 110 and/or another form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in Figure 1 are exemplary rather than limiting; the electronic device may have other components and structures as needed.
The processor 102 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 100 to perform desired functions.
所述存储器104可以包括一个或多个计算机程序产品,所述计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。所述易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。所述非易失性存储器例如可以包括只读存储器(ROM)、硬盘、闪存等。在所述计算机可读存储介质上可以存储一个或多个计算机程序指令,处理器102可以运行所述程序指令,以实现下文所述的本发明实施例中(由处理器实现)的客户端功能以及/或者其它期望的功能。在所述计算机可读存储介质中还可以存储各种应用程序和各种数据,例如所述应用程序使用和/或产生的各种数据等。The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may execute the program instructions to implement the client functions (implemented by the processor) in the embodiments of the present invention described below. and/or other desired functionality. Various application programs and various data, such as various data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
所述输入装置106可以是用户用来输入指令的装置,并且可以包括键盘、鼠标、麦克风和触摸屏等中的一个或多个。The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
所述输出装置108可以向外部(例如用户)输出各种信息(例如图像或声音),并且可以包括显示器、扬声器等中的一个或多个。The output device 108 may output various information (such as images or sounds) to the outside (such as a user), and may include one or more of a display, a speaker, and the like.
Illustratively, the example electronic device for implementing the feasible region prediction method and apparatus according to the embodiment of the present invention may be implemented as a terminal such as a smartphone or a tablet computer.
Next, a feasible region prediction method according to an embodiment of the present invention is described with reference to FIG. 2. FIG. 2 is a schematic flowchart of the feasible region prediction method 200 according to an embodiment of the present application. The method is used in a feasible region prediction apparatus that includes a processor, a memory, an input device, an output device, and the like, and that may be implemented as the electronic device 100 described above. Specifically, the feasible region prediction method 200 may be applied to automatic driving or assisted driving of a vehicle, and includes the following steps:
In step S210, a surround-view image at the current moment is acquired, and bird's-eye view features are obtained from the surround-view image; the surround-view image includes images from multiple perspectives collected by multiple cameras on the vehicle;
In step S220, feature extraction is performed on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment;
In step S230, a future feasible region prediction map is generated from a time-series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, and the feasible region prediction map is output.
According to the feasible region prediction method 200 of the embodiment of the present invention, a future feasible region prediction map is obtained from the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments. This realizes analysis and prediction of future scenes and thereby provides a basis for behavioral decisions in automatic or assisted driving. Moreover, the feasible region prediction map is generated from the surround-view image acquired at the current moment. By combining scene perception with behavior prediction, the map directly reflects the future trajectories of obstacles and thus divides the scene into drivable and non-drivable areas. This avoids the repeated computation and accumulated information error that arise in the related art from splitting feasible region segmentation and obstacle prediction into two modules, and no separate behavior prediction is needed.
In the embodiment of the present invention, when the surround-view image at the current moment is acquired in step S210, the multiple cameras mounted on the vehicle may be exposed simultaneously so as to capture images from multiple perspectives, and these images together constitute the surround-view image.
The surround-view image is then processed by image processing techniques to obtain the bird's-eye view features. In one example, feature extraction may first be performed on the captured images from multiple perspectives to obtain the high-dimensional image features of each perspective image. Here, high-dimensional image features are the multi-dimensional image features obtained by extracting features from a perspective image; preferably, the high-dimensional image features of a perspective image have more than three dimensions. For example, a perspective image may be processed by ResNet50 (a 50-layer residual network), and the output is the high-dimensional image features of that perspective image. The high-dimensional image features of the multiple perspective images may then be fused to obtain the bird's-eye view features. Other image processing approaches may also be used to obtain the bird's-eye view features, and no limitation is imposed in this respect. For example, as shown in FIG. 3, feature extractor 1 performs feature extraction on the image captured by camera 1, feature extractor 2 on the image captured by camera 2, feature extractor 3 on the image captured by camera 3, and so on, yielding the high-dimensional image features of n perspective images. These are then fused by a Transformer network to obtain bird's-eye view features with feature dimensions batchsize*H*W*C, where batchsize is the batch size, H is the feature height, W is the feature width, and C is the number of feature channels.
It is worth noting that fusing the high-dimensional image features of the multiple perspective images into the bird's-eye view features is in essence a coordinate transformation: the high-dimensional image features of each perspective image are transformed from the coordinate system of that image into the vehicle coordinate system, and the bird's-eye view features are obtained from the result. The resulting bird's-eye view features can be regarded as a bird's-eye image in the vehicle coordinate system. For example, as shown in FIG. 4, FIG. 4(a) to FIG. 4(f) on the right are images captured at the same moment by cameras with different perspectives, each in its own image coordinate system, while FIG. 4(g) on the left is an example of a BEV feature that fuses the image information from all the perspectives on the right and is projected into the vehicle coordinate system.
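To make the coordinate relationship concrete, the following is a minimal sketch of how a point in the vehicle coordinate system can be related to a pixel in one camera image using a pinhole model. The source does not say that explicit calibration is used (the fusion there is performed by a network), so the function name, the intrinsic matrix `K`, and the extrinsics `R` and `t` are illustrative assumptions only.

```python
import numpy as np

def project_vehicle_point_to_pixel(p_vehicle, R_cam_from_vehicle, t_cam_from_vehicle, K):
    """Project a 3-D point given in the vehicle frame into one camera's pixel coordinates.

    Returns (u, v), or None when the point lies behind the camera.
    All calibration inputs are hypothetical; a pinhole camera model is assumed.
    """
    # Vehicle frame -> camera frame (rigid transform).
    p_cam = R_cam_from_vehicle @ p_vehicle + t_cam_from_vehicle
    if p_cam[2] <= 0:
        return None
    # Camera frame -> homogeneous pixel coordinates, then perspective divide.
    uvw = K @ p_cam
    return uvw[0] / uvw[2], uvw[1] / uvw[2]

# Hypothetical calibration: camera axes aligned with the vehicle axes, z pointing forward.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)

pixel = project_vehicle_point_to_pixel(np.array([1.0, 0.5, 10.0]), R, t, K)
behind = project_vehicle_point_to_pixel(np.array([0.0, 0.0, -1.0]), R, t, K)
```

Evaluating this mapping for every cell of a bird's-eye grid indicates which image locations could contribute to that cell; a learned fusion network may capture the same relationship implicitly.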
In the embodiment of the present invention, the process in step S220 of obtaining the bird's-eye view high-dimensional image features from the bird's-eye view features may follow the process in step S210 of obtaining the high-dimensional image features of a perspective image from that image, or may be implemented by other image processing methods.
In the embodiment of the present invention, in step S230 a time-series queue is obtained by temporally encoding the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments, and the future feasible region prediction map is then generated from the time-series queue. For example, with F_t denoting the bird's-eye view high-dimensional image features at the current moment and F_{t-n}, F_{t-(n-1)}, ..., F_{t-1} denoting those at multiple historical moments, the resulting time-series queue can be written as {F_{t-n}, F_{t-(n-1)}, ..., F_t}.
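The ordering of the queue {F_{t-n}, ..., F_t} can be sketched with a fixed-length buffer that drops the oldest entry whenever a new feature arrives. This is only an illustration of the bookkeeping: the class name `FeatureQueue` and its methods are hypothetical, strings stand in for feature tensors, and the temporal encoding itself is not modeled.

```python
from collections import deque

class FeatureQueue:
    """Fixed-length queue holding n historical features plus the current one."""

    def __init__(self, num_history):
        # maxlen makes the deque discard the oldest item automatically.
        self._queue = deque(maxlen=num_history + 1)

    def push(self, feature):
        """Append the feature of the newest moment."""
        self._queue.append(feature)

    def is_full(self):
        """True once n historical features and the current feature are present."""
        return len(self._queue) == self._queue.maxlen

    def as_list(self):
        """Return features ordered from oldest (F_{t-n}) to newest (F_t)."""
        return list(self._queue)

queue = FeatureQueue(num_history=3)
for step in range(6):
    queue.push(f"F_{step}")  # placeholder for a real feature tensor
sequence = queue.as_list()
```

After six pushes with three historical slots, the queue holds the four most recent entries, oldest first.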
In another example, generating the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments may include: generating bird's-eye view high-dimensional image features for multiple future moments from the time-series queue; and upsampling the bird's-eye view high-dimensional image features of each of those future moments to generate a feasible region prediction map for each moment, thereby obtaining feasible region prediction maps for the multiple future moments. In this example, feasible region prediction maps are generated for as many future moments as there are current and historical moments, so that the drivable and non-drivable areas in the surround-view image are predicted over a future period.
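The upsampling step can be illustrated as follows. The source does not specify the upsampling operator or how channels are reduced to a single map, so nearest-neighbour repetition, the per-channel weights, and the sigmoid used here are assumptions chosen for brevity.

```python
import numpy as np

def upsample_nearest(feature, scale):
    """Nearest-neighbour upsampling: (H, W, C) -> (H*scale, W*scale, C)."""
    return feature.repeat(scale, axis=0).repeat(scale, axis=1)

def to_probability_map(feature, channel_weights):
    """Collapse channels with a weighted sum, then squash to (0, 1) with a sigmoid.

    The linear projection and the sigmoid are illustrative assumptions.
    """
    logits = feature @ channel_weights          # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
future_feature = rng.normal(size=(4, 4, 8))     # stand-in for one future-moment feature
weights = rng.normal(size=8)                    # hypothetical channel weights

prob_map = to_probability_map(upsample_nearest(future_feature, 4), weights)
```

Each 4x4 block of the output shares one value because nearest-neighbour upsampling copies a source cell; a learned or bilinear upsampler would give smoother boundaries.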
In the embodiment of the present invention, the computations described above (feature extraction on the perspective images to obtain their high-dimensional image features, fusion of those features to obtain the bird's-eye view features, feature extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features, and generation of the feasible region prediction map) may be performed by different parts of a single trained neural network. For example, a neural network includes a first sub-network, a second sub-network, a third sub-network, and a fourth sub-network. The first sub-network acquires the images from multiple perspectives at the current moment and performs feature extraction on them to obtain the high-dimensional image features of the multiple perspective images. The output of the first sub-network is input to the second sub-network, which fuses the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features. The output of the second sub-network is input to the third sub-network, which performs feature extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment. The output of the third sub-network is input to the fourth sub-network, which generates the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments. The neural network may be a convolutional neural network, a deep neural network, or the like; for example, it may be a CNN-based network such as MobileNet or ResNet (residual network), or a Transformer-based network such as Vision Transformer. The specific type of neural network is not limited.
Further, the second sub-network may fuse the high-dimensional image features of the multiple perspective images into the bird's-eye view features as follows: the high-dimensional image features of the perspective images are input to the second sub-network as the Key, the pixel position coordinates in the bird's-eye view features are input as the Query, and the bird's-eye view features are obtained from the output of the second sub-network.
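The Key/Query arrangement above resembles cross-attention, which can be sketched in a few lines. This is a single-head, projection-free illustration rather than the network of the source: the Value is assumed to be the same perspective features as the Key, and the queries are assumed to be embeddings of the bird's-eye grid positions (random placeholders here).

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_to_bev(view_features, bev_queries, bev_h, bev_w):
    """Cross-attention from BEV position queries to perspective-image features.

    view_features: (N_views, h, w, C) features of all cameras (used as Key).
    bev_queries:   (bev_h * bev_w, C) one query per BEV pixel position.
    Returns BEV features of shape (bev_h, bev_w, C).
    """
    c = view_features.shape[-1]
    keys = view_features.reshape(-1, c)          # flatten all views and locations
    values = keys                                # assumption: Value equals Key
    scores = bev_queries @ keys.T / np.sqrt(c)   # scaled dot-product similarity
    attn = softmax(scores, axis=-1)              # each BEV cell attends over all image cells
    return (attn @ values).reshape(bev_h, bev_w, c)

rng = np.random.default_rng(0)
views = rng.normal(size=(6, 4, 5, 16))           # six hypothetical cameras
queries = rng.normal(size=(8 * 8, 16))           # placeholder position embeddings
bev = fuse_to_bev(views, queries, 8, 8)
```

Because each output cell is a convex combination of the Value vectors, every BEV cell aggregates evidence from whichever camera locations its query matches best.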
Further, the third sub-network may include multiple convolutional layers and may extract the bird's-eye view high-dimensional image features from the bird's-eye view features as follows: one convolutional layer performs feature extraction on the bird's-eye view features, and each subsequent convolutional layer in turn performs feature extraction on the output of the preceding layer, yielding the bird's-eye view high-dimensional image features. For example, as shown in FIG. 5, the bird's-eye view features are first input to the first convolutional layer (CONV_1) for feature extraction, the output of CONV_1 is input to the next convolutional layer, and so on, until n rounds of feature extraction have been performed after the n-th convolutional layer (CONV_n), giving the bird's-eye view high-dimensional image features F.
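The chain CONV_1 to CONV_n can be sketched as repeated application of a convolution to the previous layer's output. Kernel size, padding, the ReLU activation, and the channel counts below are assumptions; the source only states that the layers are applied in sequence.

```python
import numpy as np

def conv3x3_same(x, kernels):
    """3x3 convolution (cross-correlation), stride 1, zero padding.

    x: (H, W, C_in); kernels: (3, 3, C_in, C_out); returns (H, W, C_out).
    """
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((h, w, kernels.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            # Shifted window times the channel-mixing matrix for this kernel tap.
            out += padded[dy:dy + h, dx:dx + w, :] @ kernels[dy, dx]
    return out

def extract_bev_features(bev, layer_kernels):
    """Apply the convolutional layers in order, feeding each output to the next."""
    x = bev
    for kernels in layer_kernels:
        x = np.maximum(conv3x3_same(x, kernels), 0.0)  # ReLU is an assumption
    return x

rng = np.random.default_rng(0)
bev = rng.normal(size=(8, 8, 4))
layers = [rng.normal(size=(3, 3, 4, 8)) * 0.1,
          rng.normal(size=(3, 3, 8, 16)) * 0.1]
feature = extract_bev_features(bev, layers)

# Sanity check helper: a kernel that only passes the centre tap through unchanged.
identity = np.zeros((3, 3, 4, 4))
identity[1, 1] = np.eye(4)
```

Here two layers widen the channel dimension from 4 to 16 while keeping the spatial size, which is one plausible reading of "high-dimensional" features.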
Further, the fourth sub-network may include as many branch networks as there are bird's-eye view high-dimensional image features in the time-series queue, and may generate the future feasible region prediction map from the time-series queue as follows: each branch network generates, from the time-series queue, the bird's-eye view high-dimensional image features of its corresponding future moment; the bird's-eye view high-dimensional image features of each future moment are then upsampled to generate the feasible region prediction map of that moment, thereby obtaining feasible region prediction maps for multiple future moments. For example, as shown in FIG. 6, taking a fourth sub-network made up of multiple Transformer branch networks as an example, branch network Transformer_1 generates the bird's-eye view high-dimensional image features F_{t+1} at future moment t+1 from the time-series queue composed of F_{t-n}, F_{t-(...)}, F_t, and the feasible region prediction map at future moment t+1 is obtained by upsampling F_{t+1}. The other branch networks perform a similar process, so that feasible region prediction maps are finally obtained for multiple future moments such as t+1, t+(...), t+q.
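The branch structure (one branch per future moment, each consuming the whole queue) can be sketched as below. The branches here are deliberately trivial stand-ins, fixed weighted sums over the time axis, not Transformers; `make_stub_head` and `predict_future_features` are hypothetical names used only to show the data flow.

```python
import numpy as np

def make_stub_head(temporal_weights):
    """Build a stand-in branch: a fixed weighted sum of the queue over time.

    A real branch would be a trained network; this stub only fixes the interface
    (sequence of shape (T, H, W, C) in, one feature of shape (H, W, C) out).
    """
    w = np.asarray(temporal_weights, dtype=float)

    def head(sequence):
        return np.tensordot(w, sequence, axes=(0, 0))

    return head

def predict_future_features(sequence, heads):
    """Run every branch on the full queue; branch i yields the feature for moment t+i+1."""
    if len(heads) != sequence.shape[0]:
        raise ValueError("expected one branch per feature in the queue")
    return [head(sequence) for head in heads]

# Toy queue of three moments whose features are constant 0, 1 and 2.
sequence = np.stack([np.full((2, 2, 1), float(i)) for i in range(3)])
heads = [
    make_stub_head([0.0, 0.0, 1.0]),      # repeat the current feature
    make_stub_head([0.0, -1.0, 2.0]),     # linear extrapolation from the last two
    make_stub_head([1 / 3, 1 / 3, 1 / 3]) # average of the queue
]
future = predict_future_features(sequence, heads)
```

Each returned feature would then be upsampled separately to give the prediction map for its own future moment.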
In another embodiment of the present invention, the computations described above (feature extraction on the perspective images to obtain their high-dimensional image features, fusion of those features to obtain the bird's-eye view features, feature extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features, and generation of the feasible region prediction map) may instead be performed by multiple different neural networks. For example, the multiple neural networks include a first neural network, a second neural network, a third neural network, and a fourth neural network. The first neural network acquires the images from multiple perspectives at the current moment and performs feature extraction on them to obtain the high-dimensional image features of the multiple perspective images. The output of the first neural network is input to the second neural network, which fuses the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features. The output of the second neural network is input to the third neural network, which performs feature extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment. The output of the third neural network is input to the fourth neural network, which generates the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments. The neural networks may be convolutional neural networks, deep neural networks, or the like. For example, the first neural network may be a CNN-based network such as MobileNet or ResNet, or a Transformer-based network such as Vision Transformer; the second neural network may be a Transformer network; the third neural network may be a convolutional neural network, a Transformer network, or the like; and the fourth neural network may be a Transformer network. The specific type of each neural network is not limited.
Further, the second neural network may fuse the high-dimensional image features of the multiple perspective images into the bird's-eye view features as follows: the high-dimensional image features of the perspective images are input to the second neural network as the Key, the pixel position coordinates in the bird's-eye view features are input as the Query, and the bird's-eye view features are obtained from the output of the second neural network.
Further, the third neural network may include multiple convolutional layers and may extract the bird's-eye view high-dimensional image features from the bird's-eye view features as follows: one convolutional layer performs feature extraction on the bird's-eye view features, and each subsequent convolutional layer in turn performs feature extraction on the output of the preceding layer, yielding the bird's-eye view high-dimensional image features. For example, as shown in FIG. 5, the bird's-eye view features are first input to the first convolutional layer for feature extraction, the output of the first convolutional layer is input to the next convolutional layer, and so on, until n rounds of feature extraction have been performed after the n-th convolutional layer, giving the bird's-eye view high-dimensional image features F.
Further, the fourth neural network may include as many sub-networks as there are bird's-eye view high-dimensional image features in the time-series queue, and may generate the future feasible region prediction map from the time-series queue as follows: each sub-network generates, from the time-series queue, the bird's-eye view high-dimensional image features of its corresponding future moment; the bird's-eye view high-dimensional image features of each future moment are then upsampled to generate the feasible region prediction map of that moment, thereby obtaining feasible region prediction maps for multiple future moments. For example, as shown in FIG. 6, taking a fourth neural network made up of multiple Transformer sub-networks as an example, sub-network Transformer_1 generates the bird's-eye view high-dimensional image features F_{t+1} at future moment t+1 from the time-series queue composed of F_{t-n}, F_{t-(...)}, F_t, and the feasible region prediction map at future moment t+1 is obtained by upsampling F_{t+1}. The other sub-networks perform a similar process, so that feasible region prediction maps are finally obtained for multiple future moments such as t+1, t+(...), t+q.
In the embodiment of the present invention, the resulting feasible region prediction map may be a probability map. The probability map represents, for each pixel of the surround-view image at a future moment, the probability that the pixel belongs to the drivable area, and thus gives a probabilistic description of the drivability of the road over a future period.
Further, the feasible region prediction map may be interpreted probabilistically as follows: when the value of a pixel in the probability map is not greater than a set threshold, the corresponding pixel in the surround-view image at the future moment does not belong to the drivable area; when the value is greater than the set threshold, the corresponding pixel belongs to the drivable area. For example, the value of each pixel in the probability map is a floating-point number in the interval [0, 1]. Let k be the value of the pixel at coordinates (x, y). If that pixel is in a non-drivable area at future moment Q, then k→0 (k tends to 0); if that pixel is in a drivable area at future moment Q, then k→1 (k tends to 1).
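The threshold rule above amounts to a strict comparison per pixel. In the sketch below the default threshold of 0.5 is an assumption, since the source only refers to "a set threshold", and the function name `drivable_mask` is hypothetical.

```python
import numpy as np

def drivable_mask(prob_map, threshold=0.5):
    """Return True where the pixel value is strictly greater than the threshold.

    Values not greater than the threshold are treated as non-drivable,
    matching the rule stated in the text. The default of 0.5 is assumed.
    """
    return prob_map > threshold

prob_map = np.array([[0.05, 0.50, 0.93],
                     [0.49, 0.51, 0.00]])
mask = drivable_mask(prob_map)
```

Note that a value exactly equal to the threshold (0.50 here) is classified as non-drivable, which is the conservative side of the rule.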
Based on the above description, the feasible region prediction method according to the embodiment of the present invention obtains a future feasible region prediction map from the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments. This realizes analysis and prediction of future scenes and thereby provides a basis for behavioral decisions in automatic or assisted driving. Moreover, the feasible region prediction map is generated from the surround-view image acquired at the current moment. By combining scene perception with behavior prediction, the map directly reflects the future trajectories of obstacles and thus divides the scene into drivable and non-drivable areas. This avoids the repeated computation and accumulated information error that arise in the related art from splitting feasible region segmentation and obstacle prediction into two modules, and no separate behavior prediction is needed.
The feasible region prediction method according to the embodiment of the present invention has been described above by way of example. Illustratively, the method may be implemented in a device, apparatus, or system having a memory and a processor.
In addition, the feasible region prediction method according to the embodiment of the present invention can be conveniently deployed on local terminals such as smartphones and tablet computers. Alternatively, the method may be deployed on a server (or in the cloud). Alternatively, the method may be deployed in a distributed manner across a server (or the cloud) and a local terminal.
FIG. 7 shows a schematic block diagram of a feasible region prediction apparatus according to an embodiment of the present invention. As shown in FIG. 7, the feasible region prediction apparatus 700 may be applied to automatic driving or assisted driving of a vehicle, and includes a bird's-eye view feature module 710, a bird's-eye view high-dimensional image feature module 720, and a feasible region prediction map module 730. The bird's-eye view feature module 710 is configured to acquire the surround-view image at the current moment and obtain the bird's-eye view features from it; the surround-view image includes images from multiple perspectives collected by multiple cameras on the vehicle. The bird's-eye view high-dimensional image feature module 720 is configured to perform feature extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment. The feasible region prediction map module 730 is configured to generate a future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments, and to output the feasible region prediction map.
The bird's-eye view feature module 710, the bird's-eye view high-dimensional image feature module 720, and the feasible region prediction map module 730 may be implemented by the processor 102 of the electronic device 100 shown in FIG. 1 executing program instructions stored in the memory 104, and may perform the corresponding steps of the feasible region prediction method 200 according to the embodiment of the present invention. Only the main functions of each module of the feasible region prediction apparatus are described below; details already described above are omitted.
In the embodiment of the present invention, when the bird's-eye view feature module 710 acquires the surround-view image at the current moment, the multiple cameras mounted on the vehicle may be exposed simultaneously so as to capture images from multiple perspectives, and these images together constitute the surround-view image.
The surround-view image is then processed by image processing techniques to obtain the bird's-eye view features. In one example, feature extraction may first be performed on the captured images from multiple perspectives to obtain the high-dimensional image features of each perspective image. Here, high-dimensional image features are the multi-dimensional image features obtained by extracting features from a perspective image, which typically have more than three dimensions. For example, a perspective image may be processed by ResNet50, and the output is the high-dimensional image features of that perspective image. The high-dimensional image features of the multiple perspective images may then be fused to obtain the bird's-eye view features. Other image processing approaches may also be used to obtain the bird's-eye view features, and no limitation is imposed in this respect. For example, as shown in FIG. 3, feature extractor 1 performs feature extraction on the image captured by camera 1, feature extractor 2 on the image captured by camera 2, feature extractor 3 on the image captured by camera 3, and so on, yielding the high-dimensional image features of n perspective images. These are then fused by a Transformer network to obtain bird's-eye view features with feature dimensions batchsize*H*W*C.
It is worth noting that fusing the high-dimensional image features of the multiple view images into the bird's-eye view features is, in essence, a coordinate conversion: the high-dimensional image features of each view image are converted from the coordinate system of that view's image into the vehicle coordinate system, and the bird's-eye view features are obtained from the result of the conversion. The resulting bird's-eye view features can be regarded as a bird's-eye view image in the vehicle coordinate system. For example, as shown in Figure 4, Figures 4(a), 4(b), 4(c), 4(d), 4(e) and 4(f) on the right are images captured at the same moment by cameras with different viewing angles, each in its own image coordinate system. Figure 4(g) on the left is an example of a BEV Feature, which fuses the image information from all viewing angles on the right and projects it into the vehicle coordinate system.
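The coordinate conversion described above can be illustrated geometrically. The following is a minimal sketch, assuming a pinhole camera model with a known rotation and translation from the vehicle frame to the camera frame; the camera model, the function name `project_to_image` and all parameters are illustrative assumptions and are not specified in this document, which performs the conversion through a learned fusion network.

```python
def project_to_image(point_vehicle, rotation, translation, fx, fy, cx, cy):
    """Project a 3D point given in the vehicle frame onto a camera image.

    Assumes a pinhole camera (hypothetical; not specified in the source).
    rotation is a 3x3 nested list and translation a length-3 list that map
    vehicle coordinates to camera coordinates. Returns (u, v) pixel
    coordinates, or None if the point lies behind the camera.
    """
    # Camera-frame coordinates: p_cam = R * p_vehicle + t
    xc, yc, zc = (
        sum(rotation[i][j] * point_vehicle[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if zc <= 0:
        return None  # not visible from this camera
    # Perspective division followed by the intrinsic mapping
    return (fx * xc / zc + cx, fy * yc / zc + cy)
```

A bird's-eye view cell whose projection falls inside a given camera image could draw its feature from that image location; cells visible to several cameras would combine several such samples.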
In an embodiment of the present invention, the process by which the bird's-eye view high-dimensional image feature module 720 obtains the bird's-eye view high-dimensional image features from the bird's-eye view features may follow the process by which the bird's-eye view feature module 710 obtains the high-dimensional image features of a view image from that view image; alternatively, other image processing methods may be used.
In an embodiment of the present invention, the feasible region prediction map module 730 performs temporal encoding on the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments to obtain a time-series queue, and then generates the future feasible region prediction map from that queue. For example, with F_t denoting the bird's-eye view high-dimensional image feature at the current moment and F_{t-n}, F_{t-(n-1)}, ..., F_{t-1} denoting those at multiple historical moments, the generated time-series queue can be written as {F_{t-n}, F_{t-(n-1)}, ..., F_t}.
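The time-series queue {F_{t-n}, ..., F_t} behaves like a fixed-length sliding window over the most recent features. The sketch below shows one possible way to maintain it; the class name `TemporalQueue` and its methods are hypothetical, and the features are treated as opaque objects.

```python
from collections import deque


class TemporalQueue:
    """Sliding window holding n historical features plus the current one."""

    def __init__(self, history_length):
        # history_length is n; the queue keeps n + 1 entries in total
        self._frames = deque(maxlen=history_length + 1)

    def push(self, feature):
        # Appending to a full deque silently drops the oldest entry
        self._frames.append(feature)

    def is_full(self):
        return len(self._frames) == self._frames.maxlen

    def as_list(self):
        # Oldest first, current moment last
        return list(self._frames)
```

Using a bounded `deque` keeps memory constant while the vehicle runs, since the oldest feature is discarded whenever a new one arrives.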
In another example, generating the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments may include: generating bird's-eye view high-dimensional image features for multiple future moments from the time-series queue; and upsampling the bird's-eye view high-dimensional image feature of each of those future moments to generate a feasible region prediction map for that moment, thereby obtaining feasible region prediction maps for the multiple future moments. In this example, prediction maps are generated for as many future moments as there are current and historical moments in the queue, so that the drivable and non-drivable areas in the surround-view image are predicted over a future period.
In an embodiment of the present invention, the bird's-eye view feature module 710, the bird's-eye view high-dimensional image feature module 720 and the feasible region prediction map module 730 may be implemented by different parts of a single trained neural network. For example, the neural network includes a first sub-network, a second sub-network, a third sub-network and a fourth sub-network. The first sub-network acquires the images from multiple viewing angles at the current moment and performs feature extraction on them to obtain high-dimensional image features of the multiple view images. The output of the first sub-network is fed to the second sub-network, which fuses the high-dimensional image features of the multiple view images to obtain the bird's-eye view features. The output of the second sub-network is fed to the third sub-network, which performs extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment. The output of the third sub-network is fed to the fourth sub-network, which generates the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments. The neural network may be a convolutional neural network, a deep neural network, and so on; for example, it may be a CNN-based network such as MobileNet or ResNet, or a Transformer-based network such as Vision Transformer. The specific type of neural network is not limited.
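The data flow through the four sub-networks can be summarized as a simple chain. The following is only a structural sketch: `run_pipeline` and its four callable arguments are hypothetical stand-ins for the trained sub-networks, and the way history is stored is an assumption.

```python
def run_pipeline(view_images, history, extract, fuse, refine, forecast):
    """Chain four stand-in stages in the order described above.

    extract  : first sub-network, applied to each view image
    fuse     : second sub-network, merges per-view features into BEV features
    refine   : third sub-network, yields the current BEV high-dimensional feature
    forecast : fourth sub-network, maps the time-series queue to predictions
    history  : previously computed BEV high-dimensional features, oldest first
    """
    view_features = [extract(image) for image in view_images]
    bev_feature = fuse(view_features)
    current = refine(bev_feature)
    # The queue is the history followed by the current moment
    queue = list(history) + [current]
    return forecast(queue)
```

Passing the stages in as callables keeps the sketch independent of any particular network type, in line with the statement that the specific type of neural network is not limited.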
Further, the second sub-network may fuse the high-dimensional image features of the multiple view images into the bird's-eye view features as follows: the high-dimensional image features of the view images are input to the second sub-network as the Key, the pixel position coordinates in the bird's-eye view features are input as the Query, and the bird's-eye view features are obtained from the output of the second sub-network.
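The Key/Query arrangement corresponds to the cross-attention used in Transformer networks. Below is a minimal single-head sketch in plain Python, assuming that the bird's-eye view position coordinates have already been embedded into vectors of the same length as the keys and that the values are supplied separately (for instance, the same view features); these assumptions and the name `cross_attention` are illustrative, not taken from this document.

```python
import math


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: one embedded vector per BEV position
    keys:    one vector per view-image feature location
    values:  one vector per view-image feature location
    Returns one output vector per query.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this BEV position to every image feature
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        # Numerically stable softmax over the scores
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        # Weighted sum of the values, channel by channel
        outputs.append([
            sum(w / total * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs
```

Each bird's-eye view position thus receives a mixture of image features, weighted by how well its query matches each key.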
Further, the third sub-network may include multiple convolutional layers and may extract the bird's-eye view high-dimensional image features from the bird's-eye view features as follows: one convolutional layer performs feature extraction on the bird's-eye view features, and each subsequent convolutional layer in turn performs feature extraction again on the result of the preceding layer, yielding the bird's-eye view high-dimensional image features. For example, as shown in Figure 5, the bird's-eye view features are first input to the first convolutional layer for feature extraction; the result is then input to the next convolutional layer, and so on, until n rounds of feature extraction have been performed after the n-th convolutional layer, giving the bird's-eye view high-dimensional image feature F.
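The layer-by-layer extraction can be sketched as a loop in which each layer consumes the previous layer's output. The example below assumes a single-channel feature map, square odd-sized kernels, zero padding and a ReLU after each layer; these choices and the function names are illustrative assumptions, since the document does not fix the layer configuration.

```python
def conv2d_same(grid, kernel):
    """Single-channel 2D cross-correlation with zero padding and ReLU."""
    height, width = len(grid), len(grid[0])
    k = len(kernel) // 2  # kernel radius, assuming an odd square kernel
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for dy in range(-k, k + 1):
                for dx in range(-k, k + 1):
                    yy, xx = y + dy, x + dx
                    # Positions outside the grid contribute zero
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += grid[yy][xx] * kernel[dy + k][dx + k]
            out[y][x] = max(acc, 0.0)  # ReLU activation (assumed)
    return out


def extract_features(bev_feature, kernels):
    """Apply the layers in order; layer i reads the output of layer i - 1."""
    result = bev_feature
    for kernel in kernels:
        result = conv2d_same(result, kernel)
    return result
```

With n kernels in `kernels`, the loop performs the n successive extractions described for Figure 5.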
Further, the fourth sub-network may include as many branch networks as there are bird's-eye view high-dimensional image features in the time-series queue, and may generate the future feasible region prediction map from the time-series queue as follows: each branch network generates, from the time-series queue, the bird's-eye view high-dimensional image feature for its corresponding future moment; each such feature is then upsampled to generate the feasible region prediction map for that moment, thereby obtaining feasible region prediction maps for multiple future moments. For example, as shown in Figure 6, taking a fourth sub-network composed of multiple Transformer branch networks as an example, the branch network Transformer_1 generates the bird's-eye view high-dimensional image feature F_{t+1} for future moment t+1 from the time-series queue formed by F_{t-n}, F_{t-(…)}, F_t, and upsampling F_{t+1} yields the feasible region prediction map for future moment t+1. The other branch networks follow a similar process, finally giving feasible region prediction maps for multiple future moments such as t+1, t+(…), t+q.
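The per-moment branches and the upsampling step can be sketched as follows. Nearest-neighbour upsampling is used here purely for illustration, since the document does not name an upsampling method; `heads` is a hypothetical list of callables standing in for the Transformer branch networks.

```python
def upsample_nearest(feature_map, scale):
    """Enlarge a 2D map by an integer factor using nearest-neighbour copying."""
    return [
        [row[x // scale] for x in range(len(row) * scale)]
        for row in feature_map
        for _ in range(scale)  # repeat each source row `scale` times
    ]


def predict_future_maps(queue, heads, scale):
    """Run one branch per future moment, then upsample each branch output.

    queue : list of 2D feature maps, oldest first
    heads : one callable per future moment (stand-ins for the branch networks)
    """
    return [upsample_nearest(head(queue), scale) for head in heads]
```

Each entry of the returned list corresponds to one future moment, in the same order as `heads`.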
In another embodiment of the present invention, the bird's-eye view feature module 710, the bird's-eye view high-dimensional image feature module 720 and the feasible region prediction map module 730 may instead be implemented by multiple different neural networks. For example, the multiple neural networks include a first neural network, a second neural network, a third neural network and a fourth neural network. The first neural network acquires the images from multiple viewing angles at the current moment and performs feature extraction on them to obtain high-dimensional image features of the multiple view images. The output of the first neural network is fed to the second neural network, which fuses the high-dimensional image features of the multiple view images to obtain the bird's-eye view features. The output of the second neural network is fed to the third neural network, which performs extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment. The output of the third neural network is fed to the fourth neural network, which generates the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments. The neural networks may be convolutional neural networks, deep neural networks, and so on. For example, the first neural network may be a CNN-based network such as MobileNet or ResNet, or a Transformer-based network such as Vision Transformer; the second neural network may be a Transformer network; the third neural network may be a convolutional neural network, a Transformer network, and so on; and the fourth neural network may be a Transformer network. The specific type of each neural network is not limited.
Further, the second neural network may fuse the high-dimensional image features of the multiple view images into the bird's-eye view features as follows: the high-dimensional image features of the view images are input to the second neural network as the Key, the pixel position coordinates in the bird's-eye view features are input as the Query, and the bird's-eye view features are obtained from the output of the second neural network.
Further, the third neural network may include multiple convolutional layers and may extract the bird's-eye view high-dimensional image features from the bird's-eye view features as follows: one convolutional layer performs feature extraction on the bird's-eye view features, and each subsequent convolutional layer in turn performs feature extraction again on the result of the preceding layer, yielding the bird's-eye view high-dimensional image features. For example, as shown in Figure 5, the bird's-eye view features are first input to the first convolutional layer for feature extraction; the result is then input to the next convolutional layer, and so on, until n rounds of feature extraction have been performed after the n-th convolutional layer, giving the bird's-eye view high-dimensional image feature F.
Further, the fourth neural network may include as many sub-networks as there are bird's-eye view high-dimensional image features in the time-series queue, and may generate the future feasible region prediction map from the time-series queue as follows: each sub-network generates, from the time-series queue, the bird's-eye view high-dimensional image feature for its corresponding future moment; each such feature is then upsampled to generate the feasible region prediction map for that moment, thereby obtaining feasible region prediction maps for multiple future moments. For example, as shown in Figure 6, taking a fourth neural network composed of multiple Transformer sub-networks as an example, the sub-network Transformer_1 generates the bird's-eye view high-dimensional image feature F_{t+1} for future moment t+1 from the time-series queue formed by F_{t-n}, F_{t-(…)}, F_t, and upsampling F_{t+1} yields the feasible region prediction map for future moment t+1. The other sub-networks follow a similar process, finally giving feasible region prediction maps for multiple future moments such as t+1, t+(…), t+q.
In an embodiment of the present invention, the resulting feasible region prediction map may be a probability map. The probability map represents the probability that each pixel in the surround-view image at a future moment belongs to the drivable area, thereby expressing the drivability of the road over a future period in probabilistic form.
Further, the feasible region prediction map may be expressed probabilistically as follows: when the value of a pixel in the probability map is not greater than a set threshold, the corresponding pixel in the surround-view image at the future moment does not belong to the drivable area; when the value is greater than the set threshold, the corresponding pixel belongs to the drivable area. For example, the value of each pixel in the probability map is a floating-point number in the interval [0, 1]. Let k be the value of the pixel at coordinates (x, y). If that pixel is in a non-drivable area at future moment Q, then k→0 (k tends to 0); if it is in a drivable area at future moment Q, then k→1 (k tends to 1).
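The threshold rule can be written directly as a comparison applied to every pixel. In the sketch below the function name `drivable_mask` and the default threshold of 0.5 are assumptions made for illustration; the document only refers to "a set threshold".

```python
def drivable_mask(probability_map, threshold=0.5):
    """Convert a probability map into a boolean drivable-area mask.

    A pixel is drivable only if its value is strictly greater than the
    threshold; values equal to the threshold count as non-drivable,
    matching the "not greater than" wording of the rule.
    """
    return [[value > threshold for value in row] for row in probability_map]
```

For instance, with the default threshold a pixel valued 0.5 is marked non-drivable and a pixel valued 0.51 is marked drivable.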
Based on the above description, the feasible region prediction device 700 according to an embodiment of the present invention obtains the future feasible region prediction map from the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments. This achieves analysis and prediction of future scenes and can therefore provide a basis for behavioral decisions in autonomous or assisted driving. Moreover, the feasible region prediction map is generated from the surround-view image acquired at the current moment. By combining scene perception with behavior prediction, the map directly reflects the future trajectories of obstacles and thereby separates drivable from non-drivable areas. This avoids the repeated computation and accumulated information error that arise in the related art from splitting feasible region segmentation and obstacle prediction into two modules, and removes the need for separate behavior prediction.
In addition, those of ordinary skill in the art will appreciate that the modules and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are executed in hardware or in software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
According to an embodiment of the present invention, a feasible region prediction device is further provided. The device is applied to autonomous or assisted driving of a vehicle and includes: multiple cameras mounted on the vehicle for capturing the surround-view image at the current moment, the surround-view image including images from multiple viewing angles; and one or more processors configured to: acquire the surround-view image and obtain bird's-eye view features from it; perform extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and generate a future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments, and output the feasible region prediction map.
In one example, obtaining the bird's-eye view features from the surround-view image includes: performing feature extraction on the images from the multiple viewing angles to obtain high-dimensional image features of the multiple view images; and fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features.
In one embodiment, fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes: converting the high-dimensional image features of the multiple view images from the coordinate systems of their respective view images into the vehicle coordinate system, and obtaining the bird's-eye view features from the result of the conversion.
In one embodiment, generating the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments includes: generating bird's-eye view high-dimensional image features for multiple future moments from the time-series queue; and upsampling the bird's-eye view high-dimensional image feature of each of those future moments to generate a feasible region prediction map for that moment, thereby obtaining feasible region prediction maps for the multiple future moments.
In one embodiment, the steps performed by the processor are implemented by a single trained neural network that includes a first sub-network, a second sub-network, a third sub-network and a fourth sub-network, wherein: the first sub-network acquires the images from multiple viewing angles at the current moment and performs feature extraction on them to obtain high-dimensional image features of the multiple view images; the second sub-network fuses the high-dimensional image features of the multiple view images to obtain the bird's-eye view features; the third sub-network performs extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth sub-network generates the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments.
In one embodiment, fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes: inputting the high-dimensional image features of the view images as keys and the pixel position coordinates in the bird's-eye view features as queries into the second sub-network, and obtaining the bird's-eye view features from the output of the second sub-network.
In one embodiment, the third sub-network includes multiple convolutional layers, and performing extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: performing feature extraction on the bird's-eye view features with one of the convolutional layers, and then, with each subsequent convolutional layer in turn, performing feature extraction again on the result of the preceding convolutional layer, to obtain the bird's-eye view high-dimensional image features.
In one embodiment, the fourth sub-network includes as many branch networks as there are bird's-eye view high-dimensional image features in the time-series queue, and generating the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments includes: generating, with each branch network, the bird's-eye view high-dimensional image feature for its corresponding future moment from the time-series queue; and upsampling each such feature to generate the feasible region prediction map for the corresponding future moment, thereby obtaining feasible region prediction maps for multiple future moments.
In one embodiment, the steps performed by the processor are implemented by multiple trained neural networks, including a first neural network, a second neural network, a third neural network and a fourth neural network, wherein: the first neural network acquires the images from multiple viewing angles at the current moment and performs feature extraction on them to obtain high-dimensional image features of the multiple view images; the second neural network fuses the high-dimensional image features of the multiple view images to obtain the bird's-eye view features; the third neural network performs extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth neural network generates the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments.
In one embodiment, fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes: inputting the high-dimensional image features of the view images as keys and the pixel position coordinates in the bird's-eye view features as queries into the second neural network, and obtaining the bird's-eye view features from the output of the second neural network.
In one embodiment, the third neural network includes multiple convolutional layers, and performing extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: performing feature extraction on the bird's-eye view features with one of the convolutional layers, and then, with each subsequent convolutional layer in turn, performing feature extraction again on the result of the preceding convolutional layer, to obtain the bird's-eye view high-dimensional image features.
In one embodiment, the fourth neural network includes as many sub-networks as there are bird's-eye view high-dimensional image features in the time-series queue, and generating the future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments includes: generating, with each sub-network, the bird's-eye view high-dimensional image feature for its corresponding future moment from the time-series queue; and upsampling each such feature to generate the feasible region prediction map for the corresponding future moment, thereby obtaining feasible region prediction maps for multiple future moments.
In one embodiment, the feasible region prediction map is a probability map, which represents the probability that each pixel in the surround-view image at a future moment belongs to the drivable area.
In one embodiment, when the value of a pixel in the probability map is not greater than a set threshold, the corresponding pixel in the surround-view image at the future moment does not belong to the drivable area; when the value of a pixel in the probability map is greater than the set threshold, the corresponding pixel in the surround-view image at the future moment belongs to the drivable area.
Figure 8 shows a schematic block diagram of a feasible region prediction device according to an embodiment of the present invention. The feasible region prediction device 800 includes a memory 810 and a processor 820.
The memory 810 stores a computer program for implementing the corresponding steps of the feasible region prediction method according to an embodiment of the present invention. The processor 820 runs the computer program stored in the memory 810 to perform the corresponding steps of the feasible region prediction method according to an embodiment of the present invention and to implement the corresponding modules of the feasible region prediction device according to an embodiment of the present invention.
In one embodiment, the computer program, when run by the processor 820, causes the feasible region prediction device 800 to perform the following steps: acquiring the surround-view image at the current moment and obtaining bird's-eye view features from it, the surround-view image including images from multiple viewing angles captured by multiple cameras on the vehicle; performing extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and generating a future feasible region prediction map from the time-series queue composed of the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments, and outputting the feasible region prediction map.
In one embodiment, obtaining bird's-eye view features from the surround-view image includes: performing feature extraction on the images from the multiple viewing angles to obtain high-dimensional image features of the multiple view images; and fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features.
In one embodiment, fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes: converting the high-dimensional image features of the multiple view images from the coordinate system of each view's image to the vehicle coordinate system, and obtaining the bird's-eye view features from the result of the conversion.
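As a rough illustration of the coordinate conversion, the sketch below moves a single 3D point from a camera frame into the vehicle frame with a rigid transform and then finds the bird's-eye view grid cell it falls into. It assumes the point has already been lifted from the image into the camera's 3D frame, which the application does not describe; the extrinsics, cell size, and grid extent are invented values, and both function names are hypothetical.

```python
def camera_to_vehicle(point_cam, rotation, translation):
    """Apply p_vehicle = R * p_cam + t for a 3x3 rotation and a 3-vector t."""
    return tuple(
        sum(rotation[i][j] * point_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def bev_cell(point_vehicle, cell_size, grid_half_extent):
    """Map vehicle-frame (x, y) to (row, col) grid indices, or None if outside."""
    x, y, _ = point_vehicle
    if abs(x) >= grid_half_extent or abs(y) >= grid_half_extent:
        return None
    col = int((x + grid_half_extent) // cell_size)
    row = int((y + grid_half_extent) // cell_size)
    return row, col


# Assumed extrinsics: camera rotated 90 degrees about the vertical axis,
# mounted 1.0 m forward and 1.5 m up from the vehicle origin.
rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
translation = (1.0, 0.0, 1.5)

point_vehicle = camera_to_vehicle((2.0, 0.0, 0.0), rotation, translation)
cell = bev_cell(point_vehicle, cell_size=0.5, grid_half_extent=10.0)
print(point_vehicle, cell)
```

Features from every camera that land in the same cell would then be combined into one bird's-eye view feature; how they are combined is left open here.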
In one embodiment, generating a future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments includes: generating bird's-eye view high-dimensional image features at multiple future moments from the time series queue; and upsampling the bird's-eye view high-dimensional image features at each of those future moments to generate a feasible region prediction map for each moment, thereby obtaining feasible region prediction maps at multiple future moments.
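The upsampling step can be sketched with nearest-neighbor interpolation on a single-channel grid. The application does not name an upsampling method, so nearest-neighbor is an assumption chosen for simplicity (a trained network might instead use bilinear interpolation or transposed convolution), and `upsample_nearest` is a hypothetical helper.

```python
def upsample_nearest(grid, factor):
    """Enlarge a 2D grid by an integer factor, repeating each cell."""
    out = []
    for row in grid:
        # Repeat each value horizontally, then repeat the widened row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


low_res = [[1, 2], [3, 4]]
high_res = upsample_nearest(low_res, 2)
for row in high_res:
    print(row)
```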
In one embodiment, the steps that the computer program causes the feasible region prediction device 800 to perform when run by the processor 820 are implemented by a single trained neural network. The neural network includes a first sub-network, a second sub-network, a third sub-network, and a fourth sub-network, where: the first sub-network acquires the images from multiple viewing angles at the current moment and performs feature extraction on them to obtain high-dimensional image features of the multiple view images; the second sub-network fuses the high-dimensional image features of the multiple view images to obtain bird's-eye view features; the third sub-network extracts features from the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth sub-network generates a future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments.
In one embodiment, fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes: inputting the high-dimensional image features of the view images as keys, and the pixel position coordinates in the bird's-eye view features as queries, into the second sub-network, and obtaining the bird's-eye view features from the output of the second sub-network.
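The key/query wording suggests an attention-style fusion. The sketch below shows scaled dot-product attention for one query, as one plausible reading rather than the application's actual network. It assumes the query is an embedding of a bird's-eye view pixel position with the same dimension as the keys, and that the values being mixed are the image features themselves; the application mentions only keys and queries. The function name `attend` is hypothetical.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Returns the softmax-weighted average of the value vectors, where the
    weights come from the similarity between the query and each key.
    """
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]


# One query (assumed position embedding) against two image-feature keys.
query = [1.0, 0.0]
keys = [[0.0, 0.0], [0.0, 0.0]]
values = [[2.0], [4.0]]
fused = attend(query, keys, values)
print(fused)
```

With identical keys the weights are equal, so the output is the plain mean of the values; distinct keys would shift the weights toward the image features most similar to the query.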
In one embodiment, the third sub-network includes multiple convolutional layers, and extracting features from the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: applying one of the convolutional layers to the bird's-eye view features, and then having each subsequent convolutional layer in turn extract features again from the output of the preceding convolutional layer, to obtain the bird's-eye view high-dimensional image features.
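The chained convolutional layers can be sketched on a single-channel grid, with each layer consuming the previous layer's output. The kernels, the absence of padding, and the ReLU activation between layers are illustrative assumptions; the application specifies none of them, and `conv2d` and `extract_features` are hypothetical helpers.

```python
def conv2d(grid, kernel):
    """Valid (no padding) 2D cross-correlation of a grid with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(grid), len(grid[0])
    return [
        [
            sum(
                grid[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(w - kw + 1)
        ]
        for r in range(h - kh + 1)
    ]


def extract_features(grid, kernels):
    """Apply the layers in order; each layer sees the previous layer's output."""
    out = grid
    for kernel in kernels:
        # ReLU after each convolution is an assumption for illustration.
        out = [[max(0.0, v) for v in row] for row in conv2d(out, kernel)]
    return out


bev = [[1.0] * 4 for _ in range(4)]          # hypothetical 4x4 BEV feature map
layers = [[[1.0, 1.0], [1.0, 1.0]]] * 2      # two 2x2 kernels of ones
features = extract_features(bev, layers)
print(features)
```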
In one embodiment, the fourth sub-network includes as many branch networks as there are bird's-eye view high-dimensional image features in the time series queue, and generating a future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments includes: having each branch network generate, from the time series queue, the bird's-eye view high-dimensional image features at its corresponding future moment; and upsampling the bird's-eye view high-dimensional image features at each corresponding future moment to generate the feasible region prediction map for that moment, thereby obtaining feasible region prediction maps at multiple future moments.
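The one-branch-per-future-moment arrangement can be sketched as a list of callables that all read the same queue. Here each branch is a fixed weighted sum standing in for a trained branch network, and the weights are invented so that the branches extrapolate linearly; none of this comes from the application, and `make_branch` is a hypothetical helper.

```python
def make_branch(weights):
    """Build a stand-in branch: an elementwise weighted sum over the queue."""

    def branch(queue):
        return [
            sum(w * feature[i] for w, feature in zip(weights, queue))
            for i in range(len(queue[0]))
        ]

    return branch


# Queue of three feature vectors, oldest first, current moment last.
queue = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]

# One branch per queue entry, each responsible for one future moment.
branches = [
    make_branch([0.0, 0.0, 1.0]),   # holds the current feature
    make_branch([0.0, -1.0, 2.0]),  # extrapolates one step ahead
    make_branch([0.0, -2.0, 3.0]),  # extrapolates two steps ahead
]
assert len(branches) == len(queue)

future_features = [branch(queue) for branch in branches]
print(future_features)
```

Each entry of `future_features` would then be upsampled separately to give the prediction map for its moment.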
In one embodiment, the steps that the computer program causes the feasible region prediction device 800 to perform when run by the processor 820 are implemented by multiple trained neural networks. The multiple neural networks include a first neural network, a second neural network, a third neural network, and a fourth neural network, where: the first neural network acquires the images from multiple viewing angles at the current moment and performs feature extraction on them to obtain high-dimensional image features of the multiple view images; the second neural network fuses the high-dimensional image features of the multiple view images to obtain bird's-eye view features; the third neural network extracts features from the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth neural network generates a future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments.
In one embodiment, fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes: inputting the high-dimensional image features of the view images as keys, and the pixel position coordinates in the bird's-eye view features as queries, into the second neural network, and obtaining the bird's-eye view features from the output of the second neural network.
In one embodiment, the third neural network includes multiple convolutional layers, and extracting features from the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: applying one of the convolutional layers to the bird's-eye view features, and then having each subsequent convolutional layer in turn extract features again from the output of the preceding convolutional layer, to obtain the bird's-eye view high-dimensional image features.
In one embodiment, the fourth neural network includes as many sub-networks as there are bird's-eye view high-dimensional image features in the time series queue, and generating a future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments includes: having each sub-network generate, from the time series queue, the bird's-eye view high-dimensional image features at its corresponding future moment; and upsampling the bird's-eye view high-dimensional image features at each corresponding future moment to generate the feasible region prediction map for that moment, thereby obtaining feasible region prediction maps at multiple future moments.
In one embodiment, the feasible region prediction map is a probability map, which represents the probability that each pixel in the surround-view image at a future moment belongs to the drivable area.
In one embodiment, when the pixel value of a pixel in the probability map is not greater than a set threshold, the corresponding pixel in the surround-view image at the future moment does not belong to the drivable area; when the pixel value is greater than the set threshold, the corresponding pixel belongs to the drivable area.
According to an embodiment of the present invention, a system for automatic driving or assisted driving of a vehicle is also provided, the system including the feasible region prediction device of any of the embodiments above. The feasible region prediction device is described earlier in this document, and the description is not repeated here.
In addition, according to an embodiment of the present invention, a storage medium is provided on which a computer program is stored. When run by a computer or processor, the computer program performs the feasible region prediction method of the embodiments of the present invention and implements the corresponding modules of the feasible region prediction device according to the embodiments of the present invention. The storage medium may include, for example, a memory card of a smartphone, a storage component of a tablet computer, a hard disk of a personal computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, or any combination of these storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media; for example, one computer-readable storage medium contains computer-readable program code for obtaining bird's-eye view features from the surround-view image, and another contains computer-readable program code for extracting features from the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment.
In one embodiment, the computer program, when run by a computer, can implement the functional modules of the feasible region prediction device according to the embodiments of the present invention, and/or can perform the feasible region prediction method according to the embodiments of the present invention.
In one embodiment, the computer program, when run by a computer or processor, causes the computer or processor to perform the following steps: acquiring a surround-view image at the current moment and obtaining bird's-eye view features from the surround-view image, where the surround-view image includes images from multiple viewing angles captured by multiple cameras on the vehicle; extracting features from the bird's-eye view features to obtain bird's-eye view high-dimensional image features at the current moment; and generating a future feasible region prediction map from a time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments, and outputting the feasible region prediction map.
In one embodiment, obtaining bird's-eye view features from the surround-view image includes: performing feature extraction on the images from the multiple viewing angles to obtain high-dimensional image features of the multiple view images; and fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features.
In one embodiment, fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes: converting the high-dimensional image features of the multiple view images from the coordinate system of each view's image to the vehicle coordinate system, and obtaining the bird's-eye view features from the result of the conversion.
In one embodiment, generating a future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments includes: generating bird's-eye view high-dimensional image features at multiple future moments from the time series queue; and upsampling the bird's-eye view high-dimensional image features at each of those future moments to generate a feasible region prediction map for each moment, thereby obtaining feasible region prediction maps at multiple future moments.
In one embodiment, the steps that the computer program causes the computer or processor to perform are implemented by a single trained neural network. The neural network includes a first sub-network, a second sub-network, a third sub-network, and a fourth sub-network, where: the first sub-network acquires the images from multiple viewing angles at the current moment and performs feature extraction on them to obtain high-dimensional image features of the multiple view images; the second sub-network fuses the high-dimensional image features of the multiple view images to obtain bird's-eye view features; the third sub-network extracts features from the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth sub-network generates a future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments.
In one embodiment, fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes: inputting the high-dimensional image features of the view images as keys, and the pixel position coordinates in the bird's-eye view features as queries, into the second sub-network, and obtaining the bird's-eye view features from the output of the second sub-network.
In one embodiment, the third sub-network includes multiple convolutional layers, and extracting features from the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: applying one of the convolutional layers to the bird's-eye view features, and then having each subsequent convolutional layer in turn extract features again from the output of the preceding convolutional layer, to obtain the bird's-eye view high-dimensional image features.
In one embodiment, the fourth sub-network includes as many branch networks as there are bird's-eye view high-dimensional image features in the time series queue, and generating a future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments includes: having each branch network generate, from the time series queue, the bird's-eye view high-dimensional image features at its corresponding future moment; and upsampling the bird's-eye view high-dimensional image features at each corresponding future moment to generate the feasible region prediction map for that moment, thereby obtaining feasible region prediction maps at multiple future moments.
In one embodiment, the steps that the computer program causes the computer or processor to perform are implemented by multiple trained neural networks. The multiple neural networks include a first neural network, a second neural network, a third neural network, and a fourth neural network, where: the first neural network acquires the images from multiple viewing angles at the current moment and performs feature extraction on them to obtain high-dimensional image features of the multiple view images; the second neural network fuses the high-dimensional image features of the multiple view images to obtain bird's-eye view features; the third neural network extracts features from the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth neural network generates a future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments.
In one embodiment, fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes: inputting the high-dimensional image features of the view images as keys, and the pixel position coordinates in the bird's-eye view features as queries, into the second neural network, and obtaining the bird's-eye view features from the output of the second neural network.
In one embodiment, the third neural network includes multiple convolutional layers, and extracting features from the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: applying one of the convolutional layers to the bird's-eye view features, and then having each subsequent convolutional layer in turn extract features again from the output of the preceding convolutional layer, to obtain the bird's-eye view high-dimensional image features.
In one embodiment, the fourth neural network includes as many sub-networks as there are bird's-eye view high-dimensional image features in the time series queue, and generating a future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments includes: having each sub-network generate, from the time series queue, the bird's-eye view high-dimensional image features at its corresponding future moment; and upsampling the bird's-eye view high-dimensional image features at each corresponding future moment to generate the feasible region prediction map for that moment, thereby obtaining feasible region prediction maps at multiple future moments.
In one embodiment, the feasible region prediction map is a probability map, which represents the probability that each pixel in the surround-view image at a future moment belongs to the drivable area.
In one embodiment, when the pixel value of a pixel in the probability map is not greater than a set threshold, the corresponding pixel in the surround-view image at the future moment does not belong to the drivable area; when the pixel value is greater than the set threshold, the corresponding pixel belongs to the drivable area.
Each module in the feasible region prediction device according to an embodiment of the present invention may be implemented by the processor of an electronic device according to an embodiment of the present invention running a computer program stored in memory, or may be implemented when a computer runs a computer program stored in the computer-readable storage medium of a computer program product according to an embodiment of the present invention.
In addition, according to an embodiment of the present invention, a computer program is provided, which may be stored in the cloud or on a local storage medium. When run by a computer or processor, the computer program performs the corresponding steps of the feasible region prediction method of the embodiments of the present invention and implements the corresponding modules of the feasible region prediction device according to the embodiments of the present invention.
Based on the description above, the feasible region prediction method, device, system, and storage medium according to the embodiments of the present invention obtain a future feasible region prediction map from the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments. This provides analysis and prediction of future scenes and therefore a basis for behavioral decisions in automatic or assisted driving. The feasible region prediction map is generated from the surround-view image acquired at the current moment. By combining scene perception and behavior prediction, the map directly reflects the future trajectories of obstacles and so separates drivable from non-drivable areas. This avoids the repeated computation and accumulated information error that arise in the related art from splitting feasible region segmentation and obstacle prediction into two modules, and no separate behavior prediction step is needed.
Although example embodiments have been described here with reference to the accompanying drawings, it should be understood that these example embodiments are merely illustrative and are not intended to limit the scope of this application to them. Those of ordinary skill in the art can make various changes and modifications to them without departing from the scope and spirit of this application. All such changes and modifications are intended to fall within the scope of this application as claimed in the appended claims.
Those of ordinary skill in the art will recognize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of this application.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another device, and some features may be omitted or not performed.
The description provided here sets out numerous specific details. It should be understood, however, that embodiments of this application can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, to streamline this application and aid understanding of one or more of the various inventive aspects, features of this application are sometimes grouped together in a single embodiment, figure, or description thereof in the description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the corresponding claims reflect, the inventive point is that the corresponding technical problem can be solved with fewer than all the features of a single disclosed embodiment. The claims following the Detailed Description are therefore expressly incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will understand that all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination, except where features are mutually exclusive. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described here include certain features that are included in other embodiments and not others, combinations of features from different embodiments are meant to be within the scope of this application and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
本申请的各个部件实施例可以以硬件实现,或者以在一个或者多个处理器上运行的软件模块实现,或者以它们的组合实现。本领域的技术人员应当理解,可以在实践中使用微处理器或者数字信号处理器(DSP)来实现根据本申请实施例的一些模块的一些或者全部功能。本申请还可以实现为用于执行这里所描述的方法的一部分或者全部的装置程序(例如,计算机程序和计算机程序产品)。这样的实现本申请的程序可以存储在计算机可读介质上,或者可以具有一个或者多个信号的形式。这样的信号可以从因特网网站上下载得到,或者在载体信号上提供,或者以任何其他形式提供。Various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some modules according to embodiments of the present application. The present application may also be implemented as a device program (eg, computer program and computer program product) for performing part or all of the methods described herein. Such a program implementing the present application may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, or provided on a carrier signal, or in any other form.
It should be noted that the above embodiments illustrate rather than limit the present application, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The present application may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several apparatuses, several of these apparatuses may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.
The above are merely specific embodiments of the present application or descriptions of specific embodiments, and the protection scope of the present application is not limited thereto. Any change or substitution that a person familiar with this technical field could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. The protection scope of the present application shall be subject to the protection scope of the claims.

Claims (19)

  1. A feasible region prediction method, applied to automatic driving or assisted driving of a vehicle, characterized by comprising:
    acquiring a surround-view image at the current moment, and obtaining bird's-eye view features from the surround-view image, wherein the surround-view image comprises images of multiple viewing angles captured by multiple cameras on the vehicle;
    performing extraction on the bird's-eye view features to obtain bird's-eye view high-dimensional image features at the current moment;
    generating a future feasible region prediction map according to a time-series queue formed by the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, and outputting the feasible region prediction map.
  2. The feasible region prediction method according to claim 1, characterized in that obtaining the bird's-eye view features from the surround-view image comprises:
    performing feature extraction on the images of the multiple viewing angles to obtain high-dimensional image features of multiple view images;
    fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features.
  3. The feasible region prediction method according to claim 2, characterized in that fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features comprises:
    converting the high-dimensional image features of the multiple view images from the coordinate systems of the images of their respective viewing angles into the vehicle coordinate system, and obtaining the bird's-eye view features according to the result of the conversion.
  4. The feasible region prediction method according to claim 1, characterized in that generating a future feasible region prediction map according to the time-series queue formed by the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments comprises:
    generating bird's-eye view high-dimensional image features at multiple future moments according to the time-series queue;
    upsampling the bird's-eye view high-dimensional image features of each moment among the bird's-eye view high-dimensional image features at the multiple future moments to generate a feasible region prediction map for that moment, so as to obtain feasible region prediction maps at the multiple future moments.
  5. The feasible region prediction method according to claim 2, characterized in that the method is implemented by one trained neural network, the neural network comprising a first sub-network, a second sub-network, a third sub-network and a fourth sub-network, wherein:
    the first sub-network is configured to acquire the images of the multiple viewing angles at the current moment and perform feature extraction on the images of the multiple viewing angles to obtain the high-dimensional image features of the multiple view images;
    the second sub-network is configured to fuse the high-dimensional image features of the multiple view images to obtain the bird's-eye view features;
    the third sub-network is configured to perform extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment;
    the fourth sub-network is configured to generate the future feasible region prediction map according to the time-series queue formed by the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments.
  6. The feasible region prediction method according to claim 5, characterized in that fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features comprises:
    inputting the high-dimensional image features of the view images as keys, and the pixel position coordinates in the bird's-eye view features as queries, into the second sub-network, and obtaining the bird's-eye view features according to the output of the second sub-network.
  7. The feasible region prediction method according to claim 5, characterized in that the third sub-network comprises multiple convolutional layers, and performing extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features comprises:
    performing feature extraction on the bird's-eye view features through one of the convolutional layers, and then, in sequence, performing feature extraction again through each subsequent convolutional layer on the result extracted by the preceding convolutional layer, so as to obtain the bird's-eye view high-dimensional image features.
  8. The feasible region prediction method according to claim 5, characterized in that the fourth sub-network comprises branch networks equal in number to the bird's-eye view high-dimensional image features in the time-series queue, and generating a future feasible region prediction map according to the time-series queue formed by the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments comprises:
    generating, by each corresponding branch network and according to the time-series queue, bird's-eye view high-dimensional image features at a corresponding future moment;
    upsampling the bird's-eye view high-dimensional image features at each corresponding future moment to generate a feasible region prediction map at that future moment, so as to obtain feasible region prediction maps at multiple future moments.
  9. The feasible region prediction method according to claim 2, characterized in that the method is implemented by multiple trained neural networks, the multiple neural networks comprising a first neural network, a second neural network, a third neural network and a fourth neural network, wherein:
    the first neural network is configured to acquire the images of the multiple viewing angles at the current moment and perform feature extraction on the images of the multiple viewing angles to obtain the high-dimensional image features of the multiple view images;
    the second neural network is configured to fuse the high-dimensional image features of the multiple view images to obtain the bird's-eye view features;
    the third neural network is configured to perform extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment;
    the fourth neural network is configured to generate the future feasible region prediction map according to the time-series queue formed by the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments.
  10. The feasible region prediction method according to claim 9, characterized in that fusing the high-dimensional image features of the multiple view images to obtain the bird's-eye view features comprises:
    inputting the high-dimensional image features of the view images as keys, and the pixel position coordinates in the bird's-eye view features as queries, into the second neural network, and obtaining the bird's-eye view features according to the output of the second neural network.
  11. The feasible region prediction method according to claim 9, characterized in that the third neural network comprises multiple convolutional layers, and performing extraction on the bird's-eye view features to obtain the bird's-eye view high-dimensional image features comprises:
    performing feature extraction on the bird's-eye view features through one of the convolutional layers, and then, in sequence, performing feature extraction again through each subsequent convolutional layer on the result extracted by the preceding convolutional layer, so as to obtain the bird's-eye view high-dimensional image features.
  12. The feasible region prediction method according to claim 9, characterized in that the fourth neural network comprises sub-networks equal in number to the bird's-eye view high-dimensional image features in the time-series queue, and generating a future feasible region prediction map according to the time-series queue formed by the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments comprises:
    generating, by each corresponding sub-network and according to the time-series queue, bird's-eye view high-dimensional image features at a corresponding future moment;
    upsampling the bird's-eye view high-dimensional image features at each corresponding future moment to generate a feasible region prediction map at that future moment, so as to obtain feasible region prediction maps at multiple future moments.
  13. The feasible region prediction method according to claim 1, characterized in that the feasible region prediction map is a probability map presented in a probabilistic manner, the probability map representing the probability that each pixel in the surround-view image at a future moment belongs to the drivable area.
  14. The feasible region prediction method according to claim 13, characterized in that when the pixel value of a pixel in the probability map is not greater than a set threshold, the corresponding pixel in the surround-view image at the future moment does not belong to the drivable area, and when the pixel value of a pixel in the probability map is greater than the set threshold, the corresponding pixel in the surround-view image at the future moment belongs to the drivable area.
  15. A feasible region prediction apparatus, applied to automatic driving or assisted driving of a vehicle, characterized by comprising:
    a bird's-eye view feature module, configured to acquire a surround-view image at the current moment and obtain bird's-eye view features from the surround-view image, wherein the surround-view image comprises images of multiple viewing angles captured by multiple cameras on the vehicle;
    a bird's-eye view high-dimensional image feature module, configured to perform extraction on the bird's-eye view features to obtain bird's-eye view high-dimensional image features at the current moment;
    a feasible region prediction map module, configured to generate a future feasible region prediction map according to a time-series queue formed by the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, and to output the feasible region prediction map.
  16. A feasible region prediction apparatus, applied to automatic driving or assisted driving of a vehicle, characterized by comprising:
    multiple cameras arranged on the vehicle, configured to capture a surround-view image at the current moment, wherein the surround-view image comprises images of multiple viewing angles;
    one or more processors, configured to:
    acquire the surround-view image, and obtain bird's-eye view features from the surround-view image;
    perform extraction on the bird's-eye view features to obtain bird's-eye view high-dimensional image features at the current moment;
    generate a future feasible region prediction map according to a time-series queue formed by the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, and output the feasible region prediction map.
  17. A feasible region prediction apparatus, characterized by comprising a memory and a processor, wherein the memory stores a computer program to be run by the processor, and the computer program, when run by the processor, causes the processor to perform the feasible region prediction method according to any one of claims 1 to 14.
  18. A system for automatic driving or assisted driving of a vehicle, characterized in that the system comprises the feasible region prediction apparatus according to any one of claims 15 to 17.
  19. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when run by a processor, causes the processor to perform the feasible region prediction method according to any one of claims 1 to 14.
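
As an informal illustration only, and not part of the claims, two of the mechanisms recited above can be sketched in Python: the time-series queue of claim 1, which holds the bird's-eye view high-dimensional image features of the current moment together with those of multiple historical moments, and the threshold rule of claim 14. The names `BevFeatureQueue` and `threshold_probability_map`, the queue length, the default threshold of 0.5, and the use of nested lists in place of real feature tensors are all assumptions made for this sketch; the application does not specify them.

```python
from collections import deque
from typing import Deque, List

Grid = List[List[float]]  # stand-in for a 2-D bird's-eye view feature or probability map


class BevFeatureQueue:
    """Hypothetical fixed-length time-series queue: historical moments plus the current one."""

    def __init__(self, max_len: int) -> None:
        # Claim 1 speaks of the current moment and *multiple* historical moments,
        # so this sketch assumes at least three slots.
        if max_len < 3:
            raise ValueError("max_len must cover the current and at least two historical moments")
        self._frames: Deque[Grid] = deque(maxlen=max_len)

    def push(self, features: Grid) -> None:
        # Appending to a full deque discards the oldest frame automatically.
        self._frames.append(features)

    def is_full(self) -> bool:
        return len(self._frames) == self._frames.maxlen

    def frames(self) -> List[Grid]:
        # Oldest historical moment first, current moment last.
        return list(self._frames)


def threshold_probability_map(prob_map: Grid, threshold: float = 0.5) -> List[List[bool]]:
    """Claim 14 rule: a pixel is drivable only if its value is strictly greater than the threshold."""
    return [[p > threshold for p in row] for row in prob_map]
```

In this sketch a prediction network would be invoked only once `is_full()` returns true, and each future probability map it produces would then be passed through `threshold_probability_map`. A pixel whose value equals the threshold is treated as not drivable, matching the "not greater than" wording of claim 14.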
PCT/CN2023/083769 2022-09-19 2023-03-24 Feasible region prediction method and apparatus, and system and storage medium WO2024060558A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211153320.1 2022-09-19
CN202211153320.1A CN115565154A (en) 2022-09-19 2022-09-19 Feasible region prediction method, device, system and storage medium

Publications (1)

Publication Number Publication Date
WO2024060558A1 true WO2024060558A1 (en) 2024-03-28

Family

ID=84741838

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083769 WO2024060558A1 (en) 2022-09-19 2023-03-24 Feasible region prediction method and apparatus, and system and storage medium

Country Status (2)

Country Link
CN (1) CN115565154A (en)
WO (1) WO2024060558A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565154A (en) * 2022-09-19 2023-01-03 九识(苏州)智能科技有限公司 Feasible region prediction method, device, system and storage medium
CN116168362A (en) * 2023-02-27 2023-05-26 小米汽车科技有限公司 Pre-training method and device for vehicle perception model, electronic equipment and vehicle
CN115965944B (en) * 2023-03-09 2023-05-09 安徽蔚来智驾科技有限公司 Target information detection method, device, driving device and medium
CN116012805B (en) * 2023-03-24 2023-08-29 深圳佑驾创新科技有限公司 Target perception method, device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108876707A (en) * 2018-05-25 2018-11-23 北京市商汤科技开发有限公司 Birds-eye view generates and neural network training method, device, storage medium, equipment
US20210390714A1 (en) * 2020-06-11 2021-12-16 Toyota Research Institute, Inc. Producing a bird's eye view image from a two dimensional image
CN114723955A (en) * 2022-03-30 2022-07-08 上海人工智能创新中心 Image processing method, device, equipment and computer readable storage medium
CN114898315A (en) * 2022-05-05 2022-08-12 北京鉴智科技有限公司 Driving scene information determination method, object information prediction model training method and device
CN115565154A (en) * 2022-09-19 2023-01-03 九识(苏州)智能科技有限公司 Feasible region prediction method, device, system and storage medium

Also Published As

Publication number Publication date
CN115565154A (en) 2023-01-03

Similar Documents

Publication Publication Date Title
WO2024060558A1 (en) Feasible region prediction method and apparatus, and system and storage medium
CN107808111B (en) Method and apparatus for pedestrian detection and attitude estimation
CN108256404B (en) Pedestrian detection method and device
TWI643137B (en) Object recognition method and object recognition system
CN113901909B (en) Video-based target detection method and device, electronic equipment and storage medium
WO2021249114A1 (en) Target tracking method and target tracking device
CN114549369B (en) Data restoration method and device, computer and readable storage medium
CN113378641A (en) Gesture recognition method based on deep neural network and attention mechanism
CN115249304A (en) Training method and device for detecting segmentation model, electronic equipment and storage medium
CN115131281A (en) Method, device and equipment for training change detection model and detecting image change
CN115131634A (en) Image recognition method, device, equipment, storage medium and computer program product
CN114758337A (en) Semantic instance reconstruction method, device, equipment and medium
CN114998667A (en) Multispectral target detection method, multispectral target detection system, computer equipment and storage medium
CN112633074B (en) Pedestrian information detection method and device, storage medium and electronic equipment
CN110796003B (en) Lane line detection method and device and electronic equipment
CN116258756B (en) Self-supervision monocular depth estimation method and system
CN111652181A (en) Target tracking method and device and electronic equipment
CN114332509B (en) Image processing method, model training method, electronic device and automatic driving vehicle
CN114973424A (en) Feature extraction model training method, hand action recognition method, device and electronic equipment
CN115375739A (en) Lane line generation method, apparatus, and medium
CN114067371A (en) Cross-modal pedestrian trajectory generation type prediction framework, method and device
CN114359565A (en) Image detection method, storage medium and computer terminal
CN115272738A (en) Data processing method, model training method and device
CN115909255B (en) Image generation and image segmentation methods, devices, equipment, vehicle-mounted terminal and medium
CN117788833A (en) Image recognition method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23866857

Country of ref document: EP

Kind code of ref document: A1