WO2024060558A1

WO2024060558A1 - Feasible region prediction method and apparatus, and system and storage medium

Info

Publication number: WO2024060558A1
Application number: PCT/CN2023/083769
Authority: WO
Inventors: 崔霄
Original assignee: 九识(苏州)智能科技有限公司
Priority date: 2022-09-19
Filing date: 2023-03-24
Publication date: 2024-03-28
Also published as: CN115565154A

Abstract

The present application provides a feasible region prediction method and apparatus, and a system and a storage medium, which are applied to autonomous driving or assisted driving of a vehicle. The method comprises: acquiring an around-view image at the current moment, and obtaining a bird's-eye-view feature according to the around-view image, wherein the around-view image comprises images at a plurality of angles of view that are collected by a plurality of cameras on a vehicle; extracting the bird's-eye-view feature to obtain a bird's-eye-view high-dimensional image feature at the current moment; and generating a future feasible region prediction image according to the bird's-eye-view high-dimensional image feature at the current moment and a time sequence queue formed by bird's-eye-view high-dimensional image features at a plurality of historical moments, and outputting the feasible region prediction image. In the present application, the analysis and prediction of a future scenario are realized, a basis for a behavior decision of autonomous driving or assisted driving of a vehicle can be provided, and by combining scenario sensing and behavior prediction, repeated calculation and information accumulation errors caused by dividing feasible region segmentation and obstacle prediction into two modules are avoided, without the need to perform independent behavior prediction.

Description

Feasible region prediction method, device, system and storage medium

Technical field

This application relates to the field of assisted/autonomous driving technology, and more specifically to a feasible region prediction method, device, system and storage medium.

Background

In applications such as assisted driving and autonomous driving, the segmentation of drivable areas is an important technical link. The segmentation of drivable areas is generally based on information fed back by sensors such as cameras or lidar. Among them, the camera mainly feeds back image information. The advantages of image information are long visual distance, high resolution, and clear expression of the color and texture characteristics of the scene. The disadvantages are the lack of distance information and the difficulty in converting the image coordinate system to the world coordinate system. Lidar mainly feeds back point cloud information. The advantage of point cloud information is that the distance is accurate and no coordinate system conversion is required. The disadvantage is that the resolution is low and the color texture information is missing. At present, the problem of extracting feasible regions is generally considered as the problem of segmentation of static scenes.

Cameras have a price advantage over lidar. In related technologies, taking a target detection algorithm based on the bird's-eye view features of surround-view cameras as an example, a convolutional neural network (CNN) encoder is used to encode the input image of each camera, and a model such as a transformer is used to convert the encoded features of each camera from the image coordinate system to the vehicle coordinate system to form a bird's-eye view feature (BEV Feature). Targets are then detected from the BEV Feature, and the detection results are output in the bird's-eye view.

However, the above-mentioned related technologies focus on describing the environment at the current moment through BEV Features. Target detection and feasible region segmentation are both descriptions of the current environment and lack analysis of future scenarios.

In view of the existence of the above problems, this application proposes a new feasible region prediction method, device, system and storage medium to at least partially solve the above problems.

Summary of the invention

This summary introduces a series of concepts in a simplified form that are further described in the detailed description. The summary is not intended to define the key features or essential technical features of the claimed technical solution, nor is it intended to determine the protection scope of the claimed technical solution.

In one aspect, the present application provides a feasible region prediction method applied to vehicle automatic driving or assisted driving, including: obtaining a surround image at the current moment, and obtaining bird's-eye view features based on the surround image, where the surround image includes images from multiple perspectives collected by multiple cameras on the vehicle; extracting the bird's-eye view features to obtain bird's-eye view high-dimensional image features at the current moment; and generating a future feasible region prediction map based on a time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, and outputting the feasible region prediction map.

In one example, obtaining bird's-eye view features based on the surround image includes: performing feature extraction on the images from multiple perspectives to obtain high-dimensional image features of the multiple perspective images; and fusing the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features.

In one example, fusing the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features includes: converting the high-dimensional image features of the multiple perspective images from the coordinate system of the image from each perspective to the vehicle coordinate system, and obtaining the bird's-eye view features according to the result of the conversion.

In one example, generating a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments includes: generating bird's-eye view high-dimensional image features at multiple moments in the future according to the time series queue; and upsampling the bird's-eye view high-dimensional image features at each of the multiple moments in the future to generate the feasible region prediction map at each moment, so as to obtain the feasible region prediction maps at the multiple moments in the future.

In one example, the method is implemented by a trained neural network. The neural network includes a first sub-network, a second sub-network, a third sub-network and a fourth sub-network, wherein: the first sub-network is used to obtain the images from multiple perspectives at the current moment, and perform feature extraction on them to obtain high-dimensional image features of the multiple perspective images; the second sub-network is used to fuse the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the third sub-network is used to extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth sub-network is used to generate a future feasible region prediction map based on a time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments.

In one example, fusing the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features includes: inputting the high-dimensional image features of the perspective images as keys, and the pixel position coordinates in the bird's-eye view features as queries, into the second sub-network, and obtaining the bird's-eye view features based on the output of the second sub-network.
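The key/query arrangement described here resembles cross-attention. The following is a minimal sketch only, assuming single-head scaled dot-product attention, assuming the bird's-eye view position coordinates have already been embedded into vectors of the same width as the image features, and assuming the image features are also reused as the values (the source names only keys and queries). The names `softmax` and `cross_attention` are hypothetical.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (illustrative assumption).

    queries: one vector per bird's-eye view cell (embedded position, assumed)
    keys:    one vector per image-feature location
    values:  one vector per image-feature location (assumed equal to keys)
    """
    scale = math.sqrt(len(keys[0]))
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return fused

# Toy data: two image-feature locations, three bird's-eye view cells.
image_features = [[1.0, 0.0], [0.0, 1.0]]
bev_queries = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]]
bev_feature = cross_attention(bev_queries, image_features, image_features)
```

Each bird's-eye view cell receives a weighted mix of image features, with the weights determined by how well its query matches each key. A query that matches no key in particular (the all-zero query in the toy data) receives a uniform average.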

In one example, the third sub-network includes multiple convolutional layers, and extracting the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: performing feature extraction on the bird's-eye view features using one of the convolutional layers, and re-extracting the features extracted by the preceding convolutional layer through each subsequent convolutional layer, to obtain the bird's-eye view high-dimensional image features.

In one example, the fourth sub-network includes the same number of sub-networks as the number of bird's-eye view high-dimensional image features in the time series queue, and generating a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments includes: generating, by each sub-network and based on the time series queue, the bird's-eye view high-dimensional image features at the corresponding moment in the future; and upsampling the bird's-eye view high-dimensional image features at the corresponding moment in the future to generate the feasible region prediction map at that moment, so as to obtain feasible region prediction maps at multiple moments in the future.

In one example, the method is implemented by multiple trained neural networks, including a first neural network, a second neural network, a third neural network and a fourth neural network, wherein: the first neural network is used to obtain the images of the multiple perspectives at the current moment, and perform feature extraction on the images of the multiple perspectives to obtain high-dimensional image features of the multiple perspective images; the second neural network is used to fuse the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the third neural network is used to extract the bird's-eye view features to obtain the high-dimensional image features of the bird's-eye view at the current moment; the fourth neural network is used to generate a future feasible domain prediction map based on a time series queue composed of the high-dimensional image features of the bird's-eye view at the current moment and the high-dimensional image features of the bird's-eye view at multiple historical moments.

In one example, fusing the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features includes: inputting the high-dimensional image features of the perspective images as keys, and the pixel position coordinates in the bird's-eye view features as queries, into the second neural network, and obtaining the bird's-eye view features based on the output of the second neural network.

In one example, the third neural network includes multiple convolutional layers, and extracting the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: performing feature extraction on the bird's-eye view features using one of the convolutional layers, and re-extracting the features extracted by the preceding convolutional layer through each subsequent convolutional layer, to obtain the bird's-eye view high-dimensional image features.

In one example, the fourth neural network includes the same number of sub-networks as the number of bird's-eye view high-dimensional image features in the time series queue, and generating a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments includes: generating, by each sub-network and based on the time series queue, the bird's-eye view high-dimensional image features at the corresponding moment in the future; and upsampling the bird's-eye view high-dimensional image features at the corresponding moment in the future to generate the feasible region prediction map at that moment, so as to obtain feasible region prediction maps at multiple moments in the future.

In one example, the feasible domain prediction map is a probability map presented in a probabilistic manner, and the probability map is used to represent the probability that a pixel point in the surround view image at a future moment belongs to a drivable area.

In one example, when the pixel value of a pixel in the probability map is not greater than a set threshold, the corresponding pixel in the surround image at the future moment does not belong to the drivable area; when the pixel value of a pixel in the probability map is greater than the set threshold, the corresponding pixel in the surround image at the future moment belongs to the drivable area.
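The threshold rule above can be sketched in a few lines. This is an illustration only: the function name `drivable_mask` and the threshold value 0.5 are assumptions, since the source says only that a threshold is set.

```python
def drivable_mask(prob_map, threshold=0.5):
    """Mark a pixel as drivable only if its probability is strictly
    greater than the threshold; values equal to the threshold are
    treated as not drivable, matching the rule in the text.
    The default of 0.5 is a hypothetical choice."""
    return [[p > threshold for p in row] for row in prob_map]

# Toy 2x2 probability map for one future moment.
prob_map = [[0.9, 0.5], [0.2, 0.51]]
mask = drivable_mask(prob_map, threshold=0.5)
```

Note that the comparison is strict, so a pixel whose value equals the threshold is classified as non-drivable.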

In another aspect, the present application provides a feasible region prediction device applied to vehicle automatic driving or assisted driving, including: a bird's-eye view feature module, used to obtain the surround image at the current moment and obtain bird's-eye view features based on the surround image, where the surround image includes images from multiple perspectives collected by multiple cameras on the vehicle; a bird's-eye view high-dimensional image feature module, used to extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and a feasible region prediction map module, used to generate a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, and to output the feasible region prediction map.

On the other hand, the present application provides a feasible domain prediction device, which is applied to vehicle automatic driving or assisted driving, including: multiple cameras arranged on the vehicle, used to collect surround images at the current moment; the surround images include images from multiple perspectives; one or more processors, used to: obtain the surround images, and obtain bird's-eye view features based on the surround images; extract the bird's-eye view features to obtain high-dimensional image features of the bird's-eye view at the current moment; generate a future feasible domain prediction map based on a time series queue composed of the high-dimensional image features of the bird's-eye view at the current moment and the high-dimensional image features of the bird's-eye view at multiple historical moments, and output the feasible domain prediction map.

Another aspect of the present application provides a feasible region prediction device, which includes a memory and a processor. The memory stores a computer program run by the processor. When run by the processor, the computer program causes the processor to execute any one of the above feasible region prediction methods.

Another aspect of the present application provides a system for automatic driving or assisted driving of vehicles, the system including any one of the feasible region prediction devices described above.

In yet another aspect, the present application provides a computer-readable storage medium. A computer program is stored on the computer-readable storage medium. When run by a processor, the computer program causes the processor to execute any one of the feasible region prediction methods described above.

According to the feasible region prediction method, device, system and storage medium of the embodiments of the present application, the future feasible region prediction map is obtained from the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, which realizes the analysis and prediction of future scenes and can provide a basis for behavioral decisions in vehicle automatic driving or assisted driving. Moreover, the feasible region prediction map is generated from the surround image obtained at the current moment. It combines scene perception with behavior prediction and can directly give the future trajectories of obstacles, thereby dividing the drivable area from the non-drivable area. This avoids the repeated calculation and accumulated information errors caused in related technologies by splitting feasible region segmentation and obstacle prediction into two modules, and removes the need to predict behavior independently.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without exerting any creative effort.

In the drawings:

Figure 1 shows a schematic block diagram of an electronic device according to an embodiment of the present application;

Figure 2 shows a schematic flow chart of a feasible region prediction method according to an embodiment of the present application;

Figure 3 shows a flow chart for obtaining bird's-eye view features from a surround-view image according to an embodiment of the present application;

Figure 4 shows a schematic diagram of the transformation of high-dimensional image features of multiple viewing angle images from the coordinate system of the image from each viewing angle to the vehicle coordinate system according to an embodiment of the present application;

Figure 5 shows a flow chart of extracting bird's-eye view features to obtain bird's-eye view high-dimensional image features according to an embodiment of the present application;

Figure 6 shows a flow chart of generating a future feasible region prediction graph based on a timing queue according to an embodiment of the present application;

Figure 7 shows a schematic block diagram of a feasible region prediction device according to an embodiment of the present application;

Figure 8 shows a schematic block diagram of another feasible region prediction device according to an embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the present application more apparent, exemplary embodiments according to the present application will be described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments of the present application. It should be understood that the present application is not limited by the example embodiments described here. Based on the embodiments of the present application described in this application, all other embodiments obtained by those skilled in the art without creative efforts should fall within the protection scope of the present application.

In the following description, numerous specific details are given in order to provide a thorough understanding of the present application. However, it will be apparent to one skilled in the art that the present application may be practiced without one or more of these details. In other examples, some technical features that are well known in the art are not described in order to avoid confusion with the present application.

It will be understood that the application may be embodied in different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It will also be understood that the terms "consisting of" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements and/or parts, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, parts and/or groups. When used herein, the term "and/or" includes any and all combinations of the associated listed items.

In order to fully understand the present application, detailed structures will be provided in the following description to explain the technical solutions proposed in the present application. Optional embodiments of the present application are described in detail below. However, in addition to these detailed descriptions, the present application may also have other implementations.

First, an example electronic device 100 for implementing the feasible region prediction method and apparatus according to an embodiment of the present invention is described with reference to FIG. 1.

As shown in FIG. 1, the electronic device 100 includes one or more processors 102, one or more memories 104, an input device 106 and an output device 108. These components are interconnected through a bus system 110 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structures of the electronic device 100 shown in FIG. 1 are only exemplary and not restrictive. The electronic device may also have other components and structures as needed.

The processor 102 may be a central processing unit (CPU) or other form of processing unit with data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.

The memory 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may execute the program instructions to implement the client functions (implemented by the processor) in the embodiments of the present invention described below and/or other desired functions. Various application programs and various data, such as various data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.

The input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.

The output device 108 may output various information (such as images or sounds) to the outside (such as a user), and may include one or more of a display, a speaker, and the like.

Illustratively, an example electronic device for implementing the feasible region prediction method and apparatus according to the embodiment of the present invention may be implemented as a terminal such as a smart phone, a tablet computer, or the like.

Next, a feasible region prediction method according to an embodiment of the present invention will be described with reference to FIG. 2 . Figure 2 is a schematic flow chart of the feasible region prediction method 200 according to the embodiment of the present application. The feasible region prediction method in the embodiment of the present application is used in a feasible region prediction device. The feasible region prediction device includes a processor, a memory, an input device, an output device, etc. The feasible region prediction device can be implemented as the above electronic device 100. Specifically, the feasible region prediction method 200 in the embodiment of the present application can be applied to vehicle automatic driving or assisted driving, including the following steps:

In step S210, obtain the surround image at the current moment, and obtain bird's-eye view features based on the surround image; the surround image includes images from multiple perspectives collected by multiple cameras on the vehicle;

In step S220, the bird's-eye view features are extracted to obtain the bird's-eye view high-dimensional image features at the current moment;

In step S230, generate a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features of multiple historical moments, and output the feasible region prediction map.

According to the feasible region prediction method 200 of the embodiment of the present invention, the future feasible region prediction map is obtained from the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, which realizes the analysis and prediction of future scenes and can provide a basis for behavioral decisions in vehicle automatic driving or assisted driving. Moreover, the feasible region prediction map is generated from the surround image obtained at the current moment. It combines scene perception with behavior prediction and can directly give the future movement trajectories of obstacles, thereby dividing the drivable area from the non-drivable area. This avoids the repeated calculation and accumulated information errors caused in related technologies by splitting feasible region segmentation and obstacle prediction into two modules, and removes the need to predict behavior independently.

In the embodiment of the present invention, when obtaining the surround image at the current moment in step S210, the multiple cameras installed on the vehicle can be exposed simultaneously to collect images from multiple viewing angles, and the images from the multiple viewing angles together constitute the surround image.

Then, the surround image is processed through image processing technology to obtain bird's-eye view features. In one example, feature extraction can first be performed on the collected images from multiple viewing angles to obtain high-dimensional image features of the multiple viewing angle images. The high-dimensional image features here refer to multi-dimensional image features obtained by extracting features from a viewing angle image. Preferably, the feature dimension of the high-dimensional image features of a viewing angle image is greater than 3. For example, a viewing angle image can be processed by ResNet50 (residual network 50), and the output is the high-dimensional image features of that viewing angle image. Then, the high-dimensional image features of the multiple viewing angle images can be fused to obtain the bird's-eye view features. Of course, other image processing methods can also be used to obtain bird's-eye view features, and this is not limited. For example, as shown in Figure 3, feature extractor 1 can be used to extract features from the images collected by camera 1, feature extractor 2 from the images collected by camera 2, feature extractor 3 from the images collected by camera 3, and so on, to obtain the high-dimensional image features of n viewing angle images. The high-dimensional image features of the n viewing angle images are then fused through a transformer network to obtain bird's-eye view features with dimensions batchsize*H*W*C (where batchsize represents the batch size, H the height of the feature, W the width of the feature, and C the number of channels of the feature).

It is worth noting that fusing the high-dimensional image features of multiple viewing angle images to obtain the bird's-eye view features is, in essence, a coordinate conversion: the high-dimensional image features of the multiple viewing angle images are converted from the coordinate system of the image from each viewing angle to the vehicle coordinate system, and the bird's-eye view features are obtained from the conversion result. The obtained bird's-eye view feature can be regarded as a bird's-eye view image in the vehicle coordinate system. For example, as shown in Figure 4, Figure 4(a), Figure 4(b), Figure 4(c), Figure 4(d), Figure 4(e) and Figure 4(f) on the right are images captured at the same moment by cameras with different viewing angles, each in its own image coordinate system. Figure 4(g) on the left is an example of a BEV Feature, which integrates the image information from all of the viewing angles on the right and projects it into the vehicle coordinate system.
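To make the relationship between the two coordinate systems concrete, the sketch below projects a ground-plane point given in vehicle coordinates into the pixel coordinates of one camera. It is a fixed geometric illustration, not the learned fusion described in the text, and everything in it is an assumption: an ideal pinhole camera with no distortion, mounted above the vehicle origin, looking straight along the vehicle's forward axis, with a hypothetical function name and hypothetical intrinsic values.

```python
def vehicle_ground_point_to_pixel(x_forward, y_left, cam_height,
                                  focal_px, cx, cy):
    """Project a point on the ground (z = 0 in the vehicle frame) into a
    forward-facing pinhole camera (all parameters are assumptions).

    Vehicle frame: x forward, y left, z up.
    Camera frame:  x right, y down, z forward.
    Returns (u, v) in pixels, or None if the point is not in front of
    the camera.
    """
    if x_forward <= 0:
        return None
    x_cam = -y_left       # a point to the left has negative camera x
    y_cam = cam_height    # the ground lies cam_height below the camera
    z_cam = x_forward     # depth along the optical axis
    u = focal_px * x_cam / z_cam + cx
    v = focal_px * y_cam / z_cam + cy
    return (u, v)

# A point 10 m straight ahead, camera 1.5 m above the ground.
pixel = vehicle_ground_point_to_pixel(10.0, 0.0, 1.5, 1000.0, 640.0, 360.0)
```

Running the mapping for every cell of a bird's-eye view grid tells each cell where in an image it would look, which is the geometric intuition behind converting image features into the vehicle coordinate system.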

In the embodiment of the present invention, the process in step S220 of obtaining the bird's-eye view high-dimensional image features from the bird's-eye view features can follow the process in step S210 of obtaining the high-dimensional image features of a viewing angle image from that image, or can be implemented with other image processing methods.

In the embodiment of the present invention, in step S230, a time series queue is obtained by encoding, in time order, the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments, and the future feasible region prediction map is then generated based on the time series queue. For example, taking $F_t$ as the bird's-eye view high-dimensional image feature at the current moment and $F_{t-n}, F_{t-(n-1)}, \ldots, F_{t-1}$ as the bird's-eye view high-dimensional image features at multiple historical moments, the generated time series queue can be expressed as $\{F_{t-n}, F_{t-(n-1)}, \ldots, F_t\}$.
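One simple way to maintain such a queue is a fixed-length buffer that drops the oldest feature whenever a new one arrives. The sketch below is an assumption about bookkeeping only, not the encoding step itself; the class name `FeatureQueue` and its methods are hypothetical, and strings stand in for real feature tensors.

```python
from collections import deque

class FeatureQueue:
    """Holds the current feature plus `history_length` historical ones,
    oldest first (hypothetical helper, not from the source)."""

    def __init__(self, history_length):
        # n historical moments plus the current moment gives n + 1 slots.
        self._frames = deque(maxlen=history_length + 1)

    def push(self, feature):
        # Appending to a full deque silently discards the oldest entry.
        self._frames.append(feature)

    def is_full(self):
        return len(self._frames) == self._frames.maxlen

    def snapshot(self):
        # Ordered from the oldest historical moment to the current moment.
        return list(self._frames)

# With n = 2, after five frames only the last three remain.
queue = FeatureQueue(history_length=2)
for t in range(5):
    queue.push(f"F_{t}")
```

Checking `is_full()` before prediction is one way to avoid running the predictor during the first few frames, when fewer than n historical features exist.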

In another example, the generation of the future feasible domain prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments may include: generating the bird's-eye view high-dimensional image features at multiple future moments based on the time series queue; upsampling the bird's-eye view high-dimensional image features at each moment in the bird's-eye view high-dimensional image features at multiple future moments to generate the feasible domain prediction map at each moment, so as to obtain the feasible domain prediction map at the multiple future moments. In this example, feasible domain prediction maps for the same number of future moments as the current moment and historical moments are generated, so as to predict the drivable and non-drivable areas in the surround image within a period of time in the future.
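The upsampling step can be illustrated as follows. This is a sketch under stated assumptions: the source does not name an upsampling method, so nearest-neighbor repetition is used here as a stand-in, and each future feature is assumed to have already been reduced to a single-channel grid of scores. The function names are hypothetical.

```python
def upsample_nearest(grid, factor):
    """Enlarge a 2D grid by repeating each value `factor` times along
    both axes (nearest-neighbor upsampling, an assumed method)."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def predict_maps(future_features, factor):
    """Upsample the feature grid of each future moment independently,
    giving one prediction map per future moment."""
    return [upsample_nearest(feature, factor) for feature in future_features]

# Two future moments, each a 2x2 grid, upsampled to 4x4.
future_features = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
maps = predict_maps(future_features, factor=2)
```

A trained network would more likely use learned upsampling layers; the point here is only that each future moment yields its own full-resolution map.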

In the embodiment of the present invention, the aforementioned computations, namely extracting features from the perspective images to obtain the high-dimensional image features of the perspective images, fusing those features to obtain the bird's-eye view features, extracting the bird's-eye view features to obtain the bird's-eye view high-dimensional image features, and generating the feasible region prediction map, can be performed by different parts of the same trained neural network. For example, the neural network includes a first sub-network, a second sub-network, a third sub-network and a fourth sub-network, wherein: the first sub-network obtains images from multiple perspectives at the current moment and performs feature extraction on them to obtain the high-dimensional image features of the multiple perspective images; the output of the first sub-network is input to the second sub-network, which fuses the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the output of the second sub-network is input to the third sub-network, which extracts the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; the output of the third sub-network is input to the fourth sub-network, which generates the future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments. The neural network can be a convolutional neural network, a deep neural network, etc. For example, it can be a CNN-based network such as MobileNet (mobile network) or ResNet (residual network), or a Transformer-based network such as Vision Transformer; the specific type of neural network is not limited.

Further, the second sub-network can fuse the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features as follows: the high-dimensional image features of the perspective images are input to the second sub-network as the Key and the pixel position coordinates in the bird's-eye view features are input as the Query, and the bird's-eye view features are obtained from the output of the second sub-network.
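
The Key/Query fusion described above resembles attention between bird's-eye view positions and perspective-image features. The following minimal sketch uses single-head scaled dot-product attention and rests on several assumptions that the application does not state: the Values are taken to be the same features as the Keys, the bird's-eye view pixel coordinates are used directly as query vectors (a real network would embed them), and no learned projection matrices are applied. The function names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: returns one fused vector per query."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return fused


# Queries: one vector per bird's-eye view pixel, here its raw (x, y) position.
bev_h, bev_w = 2, 2
queries = [[float(x), float(y)] for y in range(bev_h) for x in range(bev_w)]
# Keys: flattened perspective-image features (toy 2-channel values).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = keys  # assumption: values share the key features
bev_features = cross_attention(queries, keys, values)
```

Each bird's-eye view position receives a weighted mixture of the perspective-image features, so the output has one feature vector per bird's-eye view pixel.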

Further, the third sub-network can include multiple convolutional layers. The third sub-network can extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features as follows: feature extraction is performed on the bird's-eye view features by one of the convolutional layers, and then each subsequent convolutional layer in turn performs feature extraction again on the result extracted by the previous convolutional layer, to obtain the bird's-eye view high-dimensional image features. For example, as shown in Figure 5, the bird's-eye view features are first input to the first convolutional layer (CONV_1) for feature extraction, the result extracted by CONV_1 is then input to the next convolutional layer, and so on, until n rounds of feature extraction have been performed after the n-th convolutional layer (CONV_n), thereby obtaining the bird's-eye view high-dimensional image feature F.
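
The chaining of CONV_1 through CONV_n can be sketched as repeated application of a convolution to the previous layer's output. The sketch below is a simplification under stated assumptions: a single channel, fixed rather than learned kernels, zero padding, and a ReLU after each layer, none of which is specified in the application. As in most deep learning frameworks, the operation is implemented as cross-correlation.

```python
def conv2d_same(x, kernel):
    """Single-channel 2D convolution with zero padding, followed by ReLU."""
    h, w = len(x), len(x[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, z = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= z < w:  # outside pixels count as zero
                        acc += x[y][z] * kernel[a][b]
            out[i][j] = max(acc, 0.0)  # ReLU (assumed activation)
    return out


def extract(bev_feature, kernels):
    """Feed the output of each layer into the next: CONV_1, ..., CONV_n."""
    feat = bev_feature
    for kernel in kernels:
        feat = conv2d_same(feat, kernel)
    return feat


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
bev = [[1.0, 1.0, 1.0] for _ in range(3)]
F = extract(bev, [identity, box])  # two layers, i.e. n = 2
```

In a trained network each layer would have many input and output channels with learned weights; the loop structure, in which layer k consumes the output of layer k-1, is the part that corresponds to the description.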

Further, the fourth sub-network may include as many sub-networks as there are bird's-eye view high-dimensional image features in the time series queue, and may generate the future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments as follows: each sub-network generates the bird's-eye view high-dimensional image feature of its corresponding future moment based on the time series queue; the bird's-eye view high-dimensional image feature of each corresponding future moment is then upsampled to generate the feasible region prediction map at that moment, so as to obtain feasible region prediction maps at multiple future moments. For example, as shown in Figure 6, taking a fourth sub-network that includes multiple Transformers as an example, the sub-network Transformer_1 generates the bird's-eye view high-dimensional image feature F_{t+1} at future moment t+1 based on the time series queue composed of F_{t-n}, F_{t-(...)}, and F_t, and F_{t+1} is upsampled to obtain the feasible region prediction map at future moment t+1; the other sub-networks implement a similar process, so that feasible region prediction maps at multiple future moments such as t+1, t+(...), t+q can finally be obtained.
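
The one-sub-network-per-future-moment arrangement, followed by upsampling, can be sketched as below. The sub-networks here are placeholders: make_head returns a simple linear extrapolation of the last two queue entries, clamped to [0, 1], which stands in for Transformer_k and is not the method of the application. Nearest-neighbour upsampling and the scale factor of 2 are likewise assumptions.

```python
def upsample_nearest(feat, scale):
    """Repeat every value `scale` times along both axes."""
    return [
        [v for v in row for _ in range(scale)]
        for row in feat
        for _ in range(scale)
    ]


def make_head(step):
    """Placeholder for the sub-network that predicts moment t+step."""
    def head(queue):
        prev, cur = queue[-2], queue[-1]
        # Linear extrapolation, clamped to [0, 1]; a stand-in, not a Transformer.
        return [
            [min(1.0, max(0.0, c + step * (c - p))) for p, c in zip(p_row, c_row)]
            for p_row, c_row in zip(prev, cur)
        ]
    return head


def predict(queue, heads, scale=2):
    """One upsampled prediction map per sub-network, i.e. per future moment."""
    return [upsample_nearest(head(queue), scale) for head in heads]


# Toy queue of three 1x2 features: F_{t-2}, F_{t-1}, F_t.
queue = [[[0.2, 0.8]], [[0.3, 0.7]], [[0.4, 0.6]]]
heads = [make_head(k) for k in range(1, len(queue) + 1)]  # as many heads as queue entries
maps = predict(queue, heads)
```

Every head reads the same queue and is responsible for a single future moment, so the number of output maps equals the number of sub-networks.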

In another embodiment of the present invention, the aforementioned computations, namely extracting features from the perspective images to obtain the high-dimensional image features of the perspective images, fusing those features to obtain the bird's-eye view features, extracting the bird's-eye view features to obtain the bird's-eye view high-dimensional image features, and generating the feasible region prediction map, can also be implemented by multiple different neural networks. For example, the multiple neural networks include a first neural network, a second neural network, a third neural network and a fourth neural network, wherein: the first neural network obtains images from multiple perspectives at the current moment and performs feature extraction on them to obtain the high-dimensional image features of the multiple perspective images; the output of the first neural network is input to the second neural network, which fuses the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the output of the second neural network is input to the third neural network, which extracts the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; the output of the third neural network is input to the fourth neural network, which generates the future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments. The neural networks can be convolutional neural networks, deep neural networks, etc. For example, the first neural network can be a CNN-based network such as MobileNet or ResNet, or a Transformer-based network such as Vision Transformer; the second neural network can be a Transformer network; the third neural network can be a convolutional neural network, a Transformer network, etc.; and the fourth neural network can be a Transformer network. The specific type of each neural network is not limited.

Further, the second neural network can fuse the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features as follows: the high-dimensional image features of the perspective images are input to the second neural network as the Key and the pixel position coordinates in the bird's-eye view features are input as the Query, and the bird's-eye view features are obtained from the output of the second neural network.

Further, the third neural network may include multiple convolutional layers. The third neural network may extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features as follows: feature extraction is performed on the bird's-eye view features by one of the convolutional layers, and then each subsequent convolutional layer in turn performs feature extraction again on the result extracted by the previous convolutional layer, to obtain the bird's-eye view high-dimensional image features. For example, as shown in Figure 5, the bird's-eye view features are first input to the first convolutional layer for feature extraction, the result extracted by the first convolutional layer is then input to the next convolutional layer, and so on, until n rounds of feature extraction have been performed after the n-th convolutional layer, thereby obtaining the bird's-eye view high-dimensional image feature F.

Further, the fourth neural network may include as many sub-networks as there are bird's-eye view high-dimensional image features in the time series queue, and may generate the future feasible region prediction map from the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and at multiple historical moments as follows: each sub-network generates the bird's-eye view high-dimensional image feature of its corresponding future moment based on the time series queue; the bird's-eye view high-dimensional image feature of each corresponding future moment is then upsampled to generate the feasible region prediction map at that moment, so as to obtain feasible region prediction maps at multiple future moments. For example, as shown in Figure 6, taking a fourth neural network whose sub-networks are multiple Transformers as an example, the sub-network Transformer_1 generates the bird's-eye view high-dimensional image feature F_{t+1} at future moment t+1 based on the time series queue composed of F_{t-n}, F_{t-(...)}, and F_t, and F_{t+1} is upsampled to obtain the feasible region prediction map at future moment t+1; the other sub-networks implement a similar process, so that feasible region prediction maps at multiple future moments such as t+1, t+(...), t+q can finally be obtained.

In embodiments of the present invention, the obtained feasible region prediction map can be a probability map. The probability map can represent the probability that each pixel in the future surround image belongs to the drivable area, so as to express probabilistically the drivability of the road in a future time period.

Furthermore, the feasible region prediction map can be expressed probabilistically as follows: when the pixel value of a pixel in the probability map is not greater than a set threshold, the corresponding pixel in the surround image at the future moment does not belong to the drivable area; when the pixel value is greater than the set threshold, the corresponding pixel in the surround image at the future moment belongs to the drivable area. For example, the pixel value of each pixel in the probability map is a floating-point number in the interval [0, 1]. Assume that the pixel value of the pixel with coordinates (x, y) is k; then if the pixel is in a non-drivable area at future moment Q, k→0 (k tends to 0), and if the pixel is in a drivable area at future moment Q, k→1 (k tends to 1).
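
The threshold rule above maps directly onto a per-pixel comparison. In the sketch below, the function name drivable_mask and the default threshold of 0.5 are assumptions; the application only refers to a set threshold without giving its value.

```python
def drivable_mask(prob_map, threshold=0.5):
    """True where the pixel is predicted drivable (value strictly above threshold)."""
    return [[p > threshold for p in row] for row in prob_map]


# Probability map for one future moment; values lie in [0, 1].
prob_map = [
    [0.05, 0.50, 0.92],
    [0.10, 0.51, 0.99],
]
mask = drivable_mask(prob_map)
```

A value exactly equal to the threshold is treated as non-drivable, matching the "not greater than" wording of the rule.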

Based on the above description, according to the feasible region prediction method of the embodiment of the present invention, the future feasible region prediction map is obtained from the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments, which realizes the analysis and prediction of future scenes and can provide a basis for behavioral decision-making in vehicle automatic driving or assisted driving. Moreover, the feasible region prediction map is generated based on the surround image obtained at the current moment and combines scene perception with behavior prediction, so that it can directly give the future trajectories of obstacles and thereby divide the drivable area from the non-drivable area. This avoids the repeated calculation and accumulated information errors caused in related technologies by dividing feasible region segmentation and obstacle prediction into two modules, and eliminates the need for independent behavior prediction.

The above exemplarily describes the feasible region prediction method according to the embodiment of the present invention. Exemplarily, the feasible region prediction method according to the embodiment of the present invention can be implemented in a device, apparatus or system having a memory and a processor.

In addition, the feasible region prediction method according to the embodiment of the present invention can be conveniently deployed on local terminals such as smartphones and tablet computers. Alternatively, the feasible region prediction method according to the embodiment of the present invention can also be deployed on the server side (or cloud). Alternatively, the feasible region prediction method according to the embodiment of the present invention can also be deployed in a distributed manner on the server side (or cloud) and the local terminal.

Figure 7 shows a schematic block diagram of a feasible region prediction device according to an embodiment of the present invention. As shown in Figure 7, the feasible region prediction device 700 according to the embodiment of the present invention can be applied to vehicle automatic driving or assisted driving, and includes a bird's-eye view feature module 710, a bird's-eye view high-dimensional image feature module 720 and a feasible region prediction map module 730. The bird's-eye view feature module 710 is used to obtain the surround image at the current moment and obtain the bird's-eye view features according to the surround image, where the surround image includes images from multiple perspectives collected by multiple cameras on the vehicle; the bird's-eye view high-dimensional image feature module 720 is used to extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; the feasible region prediction map module 730 is used to generate a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, and to output the feasible region prediction map.

The bird's-eye view feature module 710, the bird's-eye view high-dimensional image feature module 720 and the feasible region prediction map module 730 can be implemented by the processor 102 in the electronic device 100 shown in Figure 1 running the program instructions stored in the memory 104, and can perform the corresponding steps of the feasible region prediction method 200 according to the embodiment of the present invention. Only the main functions of each module of the feasible region prediction device are described below, and the details already described above are omitted.

In the embodiment of the present invention, when acquiring the surround image at the current moment, the bird's-eye view feature module 710 can trigger simultaneous exposure of multiple cameras installed on the vehicle to collect images from multiple perspectives, and the images from the multiple perspectives together constitute the surround image.

Then, the surround image is processed through image processing technology to obtain bird's-eye view features. In one example, feature extraction can first be performed on the collected images from multiple perspectives to obtain the high-dimensional image features of the multiple perspective images. The high-dimensional image features here refer to the multi-dimensional image features obtained by feature extraction from a perspective image, and normally have more than three dimensions. For example, a perspective image can be processed through ResNet50, and the output is the high-dimensional image feature of that perspective image. Then, the high-dimensional image features of the multiple perspective images can be fused to obtain the bird's-eye view features. Of course, in addition to the above method, other image processing methods can also be used to obtain the bird's-eye view features, and this is not limited. For example, as shown in Figure 3, feature extractor 1 can be used to extract features from the images collected by camera 1, feature extractor 2 from the images collected by camera 2, feature extractor 3 from the images collected by camera 3, and so on, to obtain the high-dimensional image features of n perspective images; these are then fused through the Transformer network to obtain bird's-eye view features with feature dimensions of batchsize*H*W*C.

In the embodiment of the present invention, the process by which the bird's-eye view high-dimensional image feature module 720 obtains the bird's-eye view high-dimensional image features from the bird's-eye view features may refer to the process by which the bird's-eye view feature module 710 obtains the high-dimensional image features of the perspective images from the perspective images, or other image processing methods can also be used.

In the embodiment of the present invention, the feasible region prediction map module 730 encodes, in time order, the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments to obtain a time series queue, and then generates the future feasible region prediction map according to the time series queue. For example, if F_t is the bird's-eye view high-dimensional image feature at the current moment and F_{t-n}, F_{t-(n-1)}, ..., F_{t-1} are the bird's-eye view high-dimensional image features at multiple historical moments, then the generated time series queue can be expressed as {F_{t-n}, F_{t-(n-1)}, ..., F_t}.

In another example, generating a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments may include: generating bird's-eye view high-dimensional image features at multiple future moments according to the time series queue; and upsampling the bird's-eye view high-dimensional image feature at each of the multiple future moments to generate the feasible region prediction map at that moment, so as to obtain the feasible region prediction maps at the multiple future moments. In this example, feasible region prediction maps are generated for as many future moments as there are moments (current plus historical) in the time series queue, thereby predicting the drivable area and the non-drivable area in the surround image for a period of time in the future.

In the embodiment of the present invention, the bird's-eye view feature module 710, the bird's-eye view high-dimensional image feature module 720 and the feasible region prediction map module 730 can be implemented by different parts of the same trained neural network. For example, the neural network includes a first sub-network, a second sub-network, a third sub-network and a fourth sub-network, wherein: the first sub-network obtains images from multiple perspectives at the current moment and performs feature extraction on them to obtain the high-dimensional image features of the multiple perspective images; the output of the first sub-network is input to the second sub-network, which fuses the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the output of the second sub-network is input to the third sub-network, which extracts the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; the output of the third sub-network is input to the fourth sub-network, which generates the future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments. The neural network can be a convolutional neural network, a deep neural network, etc. For example, it can be a CNN-based network such as MobileNet or ResNet, or a Transformer-based network such as Vision Transformer. The specific type of neural network is not limited.

Further, the third sub-network can include multiple convolutional layers. The third sub-network can extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features as follows: feature extraction is performed on the bird's-eye view features by one of the convolutional layers, and then each subsequent convolutional layer in turn performs feature extraction again on the result extracted by the previous convolutional layer, to obtain the bird's-eye view high-dimensional image features. For example, as shown in Figure 5, the bird's-eye view features are first input to the first convolutional layer for feature extraction, the result extracted by the first convolutional layer is then input to the next convolutional layer, and so on, until n rounds of feature extraction have been performed after the n-th convolutional layer, thereby obtaining the bird's-eye view high-dimensional image feature F.

In another embodiment of the present invention, the bird's-eye view feature module 710, the bird's-eye view high-dimensional image feature module 720 and the feasible region prediction map module 730 can also be implemented through multiple different neural networks. For example, the multiple neural networks include a first neural network, a second neural network, a third neural network and a fourth neural network, wherein: the first neural network obtains images from multiple perspectives at the current moment and performs feature extraction on them to obtain the high-dimensional image features of the multiple perspective images; the output of the first neural network is input to the second neural network, which fuses the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the output of the second neural network is input to the third neural network, which extracts the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; the output of the third neural network is input to the fourth neural network, which generates the future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments. The neural networks can be convolutional neural networks, deep neural networks, etc. For example, the first neural network can be a CNN-based network such as MobileNet or ResNet, or a Transformer-based network such as Vision Transformer; the second neural network can be a Transformer network; the third neural network can be a convolutional neural network, a Transformer network, etc.; and the fourth neural network can be a Transformer network. The specific type of each neural network is not limited.

Based on the above description, according to the feasible region prediction device 700 of the embodiment of the present invention, the future feasible region prediction map is obtained from the bird's-eye view high-dimensional image features of the current moment and of multiple historical moments, which realizes the analysis and prediction of future scenes and can provide a basis for behavioral decisions in vehicle automatic driving or assisted driving. Moreover, the feasible region prediction map is generated based on the surround image obtained at the current moment and combines scene perception with behavior prediction, so that the future trajectories of obstacles can be directly given and the drivable area can thereby be divided from the non-drivable area. This avoids the repeated calculation and accumulated information errors caused in related technologies by dividing feasible region segmentation and obstacle prediction into two modules, and eliminates the need for independent behavior prediction.

In addition, those of ordinary skill in the art can appreciate that the modules and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented with electronic hardware, or with a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered to be beyond the scope of the present invention.

According to an embodiment of the present invention, a feasible region prediction device is also provided. The device is applied to vehicle automatic driving or assisted driving, and includes: a plurality of cameras installed on the vehicle for collecting a surround image at the current moment, the surround image including images from multiple perspectives; and one or more processors used to: obtain the surround image and obtain bird's-eye view features based on the surround image; extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and generate a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments, and output the feasible region prediction map.

In one example, obtaining bird's-eye view features based on the surround image includes: extracting features from the images from multiple perspectives to obtain the high-dimensional image features of the multiple perspective images; and fusing the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features.

In one embodiment, fusing the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features includes: converting the high-dimensional image features of the multiple perspective images from the coordinate system of each perspective image to the vehicle coordinate system, and obtaining the bird's-eye view features according to the conversion result.
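
The application does not say how the conversion from an image coordinate system to the vehicle coordinate system is carried out. One common way, shown here purely as an assumed illustration, is to back-project a pixel through a pinhole camera model using a known or estimated depth and then apply the camera's extrinsic rotation R and translation t. The function name, the intrinsic parameters fx, fy, cx, cy, and the axis conventions in the example are all hypothetical.

```python
def pixel_to_vehicle(u, v, depth, fx, fy, cx, cy, R, t):
    """Map pixel (u, v) at the given depth to a 3D point in the vehicle frame.

    Assumes a pinhole camera; R (3x3) and t (3,) take camera coordinates
    to vehicle coordinates.
    """
    # Back-project into the camera frame (x right, y down, z forward).
    xc = (u - cx) / fx * depth
    yc = (v - cy) / fy * depth
    pc = (xc, yc, depth)
    # Rigid transform into the vehicle frame: p_vehicle = R * p_camera + t.
    return tuple(sum(R[i][j] * pc[j] for j in range(3)) + t[i] for i in range(3))


# Example extrinsics for a forward-facing camera (assumed values):
# vehicle x = camera z (forward), vehicle y = -camera x (left), vehicle z = -camera y (up).
R = [[0.0, 0.0, 1.0],
     [-1.0, 0.0, 0.0],
     [0.0, -1.0, 0.0]]
t = (1.5, 0.0, 1.2)  # camera mounted 1.5 m ahead of the origin and 1.2 m above it
point = pixel_to_vehicle(640.0, 360.0, 10.0, 1000.0, 1000.0, 640.0, 360.0, R, t)
```

A pixel at the principal point with a depth of 10 m lands on the camera's optical axis, 10 m in front of the camera in the vehicle frame.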

In one embodiment, generating a future feasible region prediction map based on a time series queue composed of bird's-eye view high-dimensional image features at the current moment and bird's-eye view high-dimensional image features at multiple historical moments includes: generating bird's-eye view high-dimensional image features at multiple future moments based on the time series queue; and upsampling the bird's-eye view high-dimensional image feature at each future moment to generate a feasible region prediction map at that moment, so as to obtain feasible region prediction maps at multiple future moments.

In one embodiment, the steps performed by the processor are implemented by a trained neural network. The neural network includes a first sub-network, a second sub-network, a third sub-network and a fourth sub-network, wherein: the first sub-network is used to obtain images from multiple perspectives at the current moment and perform feature extraction on them to obtain the high-dimensional image features of the multiple perspective images; the second sub-network is used to fuse the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the third sub-network is used to extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; the fourth sub-network is used to generate a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments.

In one embodiment, high-dimensional image features of multiple perspective images are fused to obtain bird's-eye view features, including: inputting the high-dimensional image features of the perspective images as keys and the pixel position coordinates in the bird's-eye view features as queries into a second sub-network, and obtaining the bird's-eye view features according to an output result of the second sub-network.

In one embodiment, the third sub-network includes multiple convolutional layers, and extracting the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: performing feature extraction on the bird's-eye view features through one of the convolutional layers, and performing feature extraction again, through each subsequent convolutional layer in turn, on the result extracted by the previous convolutional layer, to obtain the bird's-eye view high-dimensional image features.

In one embodiment, the fourth sub-network includes as many sub-networks as there are bird's-eye view high-dimensional image features in the time series queue, and generating a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments includes: generating, by each sub-network and based on the time series queue, the bird's-eye view high-dimensional image feature at the corresponding future moment; and upsampling the bird's-eye view high-dimensional image feature at each corresponding future moment to generate the feasible region prediction map at that moment, so as to obtain feasible region prediction maps at multiple future moments.

In one embodiment, the steps performed by the processor are implemented by a plurality of trained neural networks. The plurality of neural networks include a first neural network, a second neural network, a third neural network and a fourth neural network, wherein: the first neural network is used to obtain images from multiple perspectives at the current moment and perform feature extraction on them to obtain the high-dimensional image features of the multiple perspective images; the second neural network is used to fuse the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the third neural network is used to extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; the fourth neural network is used to generate a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments.

In one embodiment, fusing the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features includes: inputting the high-dimensional image features of the perspective images as keys and the pixel position coordinates in the bird's-eye view features as queries into the second neural network, and obtaining the bird's-eye view features based on the output of the second neural network.
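The key/query fusion described above resembles attention-style weighting, and the following is a minimal sketch under that assumption, not the network of the application. The function names `embed_position` and `fuse_to_bev`, the sinusoidal position embedding, the use of the image features as both keys and values, and the single-head scaled dot-product form are all assumptions introduced for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed_position(x, y, dim):
    """Hypothetical embedding that turns a BEV pixel coordinate into a query vector."""
    out = []
    for i in range(dim):
        freq = 1.0 / (10.0 ** (i // 2))
        coord = x if i % 2 == 0 else y
        out.append(math.sin(coord * freq))
    return out


def fuse_to_bev(view_features, bev_height, bev_width):
    """Fuse per-view feature vectors into a BEV grid.

    view_features: list of feature vectors gathered from all perspective
    images; they serve as keys and (by assumption) also as values.
    Each BEV pixel position is the query.
    """
    dim = len(view_features[0])
    scale = math.sqrt(dim)
    bev = []
    for y in range(bev_height):
        row = []
        for x in range(bev_width):
            query = embed_position(x, y, dim)
            # Similarity between this BEV position and every image feature.
            scores = [sum(q * k for q, k in zip(query, key)) / scale
                      for key in view_features]
            weights = softmax(scores)
            # Weighted sum of the image features gives the BEV feature.
            fused = [sum(w * v[d] for w, v in zip(weights, view_features))
                     for d in range(dim)]
            row.append(fused)
        bev.append(row)
    return bev
```

In this sketch every bird's-eye view pixel receives a convex combination of the image features, so a learned query embedding and learned key/value projections would take the place of the fixed functions used here.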

In one embodiment, the third neural network includes multiple convolutional layers, and extracting the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: performing feature extraction on the bird's-eye view features through one of the convolutional layers, and then, in sequence, performing feature extraction again through each subsequent convolutional layer on the result extracted by the preceding convolutional layer, to obtain the bird's-eye view high-dimensional image features.
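The layer-by-layer extraction can be sketched as a chain in which each convolution consumes the output of the previous one. This is an illustrative simplification only: the single-channel feature map, the zero padding, the ReLU activation, and the names `conv2d_same` and `extract_high_dim_features` are assumptions and are not specified in the application.

```python
def conv2d_same(feature_map, kernel):
    """Single-channel 2D convolution with zero padding, followed by ReLU."""
    h, w = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - ph, j + dj - pw
                    # Positions outside the map contribute zero (padding).
                    if 0 <= y < h and 0 <= x < w:
                        acc += feature_map[y][x] * kernel[di][dj]
            out[i][j] = max(acc, 0.0)  # ReLU (assumed activation)
    return out


def extract_high_dim_features(bev_features, kernels):
    """Apply the layers in order; each layer re-extracts from the previous result."""
    result = bev_features
    for kernel in kernels:
        result = conv2d_same(result, kernel)
    return result
```

For example, `extract_high_dim_features(bev, [k1, k2, k3])` would correspond to three stacked layers; a practical network would use multi-channel learned kernels rather than the fixed ones assumed here.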

In one embodiment, the fourth neural network includes the same number of sub-networks as the number of bird's-eye view high-dimensional image features in the time series queue, and generating the future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments includes: generating, by each sub-network and based on the time series queue, the bird's-eye view high-dimensional image features at the corresponding future time; and upsampling the bird's-eye view high-dimensional image features at each corresponding future time to generate the feasible region prediction map at that time, so as to obtain feasible region prediction maps at multiple future times.
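The one-sub-network-per-future-time arrangement followed by upsampling can be sketched as below. This is a hedged illustration, not the disclosed network: each hypothetical sub-network here is a fixed weighted sum over the frames in the queue, the upsampling is nearest-neighbour, and a sigmoid is assumed to turn the upsampled features into probabilities. The names `make_head`, `upsample_nearest` and `predict_future_maps` are introduced only for this example.

```python
import math


def make_head(weights):
    """Return a hypothetical sub-network: a fixed weighted sum over queue frames."""
    def head(queue):
        h, w = len(queue[0]), len(queue[0][0])
        return [[sum(wt * frame[i][j] for wt, frame in zip(weights, queue))
                 for j in range(w)] for i in range(h)]
    return head


def upsample_nearest(feature_map, factor):
    """Nearest-neighbour upsampling (an assumed stand-in for the upsampling step)."""
    out = []
    for row in feature_map:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict_future_maps(queue, heads, factor=2):
    """Produce one prediction map per sub-network, i.e. per future time."""
    if len(heads) != len(queue):
        raise ValueError("number of sub-networks must equal the queue length")
    maps = []
    for head in heads:
        future_feature = head(queue)                       # feature at one future time
        upsampled = upsample_nearest(future_feature, factor)
        maps.append([[sigmoid(v) for v in row] for row in upsampled])
    return maps
```

The length check mirrors the statement that the number of sub-networks equals the number of features in the time series queue; a trained system would replace the weighted sums with learned sub-networks.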

In one embodiment, the feasible region prediction map is a probability map, which represents the probability that each pixel in the surround image at a future moment belongs to the drivable area.

In one embodiment, when the pixel value of a pixel in the probability map is not greater than a set threshold, the corresponding pixel in the surround image at the future moment does not belong to the drivable area; when the pixel value of a pixel in the probability map is greater than the set threshold, the corresponding pixel in the surround image at the future moment belongs to the drivable area.
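The threshold rule above can be written in a few lines. This is a minimal sketch in which the function name `drivable_mask` and the default threshold of 0.5 are assumptions; the application only refers to a "set threshold".

```python
def drivable_mask(probability_map, threshold=0.5):
    """Mark a pixel drivable only when its probability is strictly greater than the threshold."""
    return [[p > threshold for p in row] for row in probability_map]
```

A pixel whose value exactly equals the threshold is treated as not drivable, matching the "not greater than" wording.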

Figure 8 shows a schematic block diagram of a feasible region prediction device according to an embodiment of the present invention. The feasible region prediction device 800 includes a memory 810 and a processor 820.

The memory 810 stores a computer program for implementing corresponding steps in the feasible region prediction method according to the embodiment of the present invention. The processor 820 is used to run the computer program stored in the memory 810 to perform corresponding steps of the feasible region prediction method according to the embodiment of the present invention, and to implement corresponding modules in the feasible region prediction device according to the embodiment of the present invention.

In one embodiment, when the computer program is run by the processor 820, the feasible region prediction device 800 is caused to perform the following steps: obtaining the surround image at the current moment, and obtaining bird's-eye view features based on the surround image, where the surround image includes images from multiple perspectives collected by multiple cameras on the vehicle; extracting the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and generating a future feasible region prediction map based on a time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, and outputting the feasible region prediction map.
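One possible way to maintain the time series queue of current and historical features is a fixed-length first-in, first-out buffer. The sketch below is an assumption about the bookkeeping only; the class name `FeatureQueue` and the fixed queue length are hypothetical and are not stated in the application.

```python
from collections import deque


class FeatureQueue:
    """Fixed-length queue holding the newest feature plus recent historical ones."""

    def __init__(self, length):
        # deque with maxlen silently drops the oldest entry when full.
        self._frames = deque(maxlen=length)

    def push(self, feature):
        """Append the feature for the current moment."""
        self._frames.append(feature)

    def is_full(self):
        return len(self._frames) == self._frames.maxlen

    def snapshot(self):
        """Return the features ordered from oldest to newest."""
        return list(self._frames)
```

At each time step the newly extracted bird's-eye view high-dimensional image feature would be pushed, and the snapshot passed to the prediction stage once the queue is full.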

In one embodiment, obtaining the bird's-eye view features based on the surround image includes: performing feature extraction on the images from multiple perspectives to obtain high-dimensional image features of the multiple perspective images; and fusing the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features.

In one embodiment, when the computer program is run by the processor 820, the steps performed by the feasible region prediction device 800 are implemented by a trained neural network. The neural network includes a first sub-network, a second sub-network, a third sub-network and a fourth sub-network, wherein: the first sub-network is used to obtain images from multiple perspectives at the current moment, and to perform feature extraction on the images from multiple perspectives to obtain high-dimensional image features of the multiple perspective images; the second sub-network is used to fuse the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the third sub-network is used to extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth sub-network is used to generate a future feasible region prediction map based on a time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments.

In one embodiment, fusing the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features includes: inputting the high-dimensional image features of the perspective images as keys and the pixel position coordinates in the bird's-eye view features as queries into the second sub-network, and obtaining the bird's-eye view features based on the output of the second sub-network.

In one embodiment, when the computer program is run by the processor 820, the steps performed by the feasible region prediction device 800 are implemented by a plurality of trained neural networks. The plurality of neural networks include a first neural network, a second neural network, a third neural network and a fourth neural network, wherein: the first neural network is used to obtain images from multiple perspectives at the current moment, and to perform feature extraction on the images from multiple perspectives to obtain high-dimensional image features of the multiple perspective images; the second neural network is used to fuse the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the third neural network is used to extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth neural network is used to generate a future feasible region prediction map based on a time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments.

In one embodiment, the third neural network includes multiple convolutional layers, and extracting the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes: performing feature extraction on the bird's-eye view features through one of the convolutional layers, and then, in sequence, performing feature extraction again through each subsequent convolutional layer on the result extracted by the preceding convolutional layer, to obtain the bird's-eye view high-dimensional image features.

According to an embodiment of the present invention, a system for automatic driving or assisted driving of a vehicle is also provided. The system includes the feasible region prediction device described in any one of the above embodiments. For a description of the feasible region prediction device, refer to the preceding sections; it is not repeated here.

In addition, according to an embodiment of the present invention, a storage medium is also provided. A computer program is stored on the storage medium. When the computer program is run by a computer or processor, it is used to perform the corresponding steps of the feasible region prediction method according to the embodiment of the present invention, and to implement corresponding modules in the feasible region prediction device according to the embodiment of the present invention. The storage medium may include, for example, a memory card of a smartphone, a storage component of a tablet computer, a hard disk of a personal computer, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disk read-only memory (CD-ROM), a USB memory, or any combination of the above storage media. The computer-readable storage medium may be any combination of one or more computer-readable storage media. For example, one computer-readable storage medium contains computer-readable program code for obtaining bird's-eye view features based on surround images, and another computer-readable storage medium contains computer-readable program code for extracting the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment.

In one embodiment, the computer program, when run by a computer, can implement each functional module of the feasible region prediction apparatus according to the embodiment of the present invention, and/or can execute the feasible region prediction method according to the embodiment of the present invention.

In one embodiment, the computer program, when run by the computer or processor, causes the computer or processor to perform the following steps: obtaining a surround image at the current moment, and obtaining bird's-eye view features based on the surround image, where the surround image includes images from multiple perspectives collected by multiple cameras on the vehicle; extracting the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and generating a future feasible region prediction map based on a time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, and outputting the feasible region prediction map.

In one embodiment, when the computer program is run by the computer or processor, the steps performed by the computer or processor are implemented by a trained neural network. The neural network includes a first sub-network, a second sub-network, a third sub-network and a fourth sub-network, wherein: the first sub-network is used to obtain images from multiple perspectives at the current moment, and to perform feature extraction on the images from multiple perspectives to obtain high-dimensional image features of the multiple perspective images; the second sub-network is used to fuse the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the third sub-network is used to extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth sub-network is used to generate a future feasible region prediction map based on a time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments.

In one embodiment, when the computer program is run by the computer or processor, the steps performed by the computer or processor are implemented by a plurality of trained neural networks. The plurality of neural networks include a first neural network, a second neural network, a third neural network and a fourth neural network, wherein: the first neural network is used to obtain images from multiple perspectives at the current moment, and to perform feature extraction on the images from multiple perspectives to obtain high-dimensional image features of the multiple perspective images; the second neural network is used to fuse the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features; the third neural network is used to extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment; and the fourth neural network is used to generate a future feasible region prediction map based on a time series queue composed of the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments.

In one embodiment, high-dimensional image features of multiple perspective images are fused to obtain bird's-eye view features, including: inputting the high-dimensional image features of the perspective images as keys and the pixel position coordinates in the bird's-eye view features as queries into a second neural network, and obtaining the bird's-eye view features based on the output results of the second neural network.

Each module in the feasible region prediction device according to the embodiment of the present invention can be implemented by the processor of the electronic device according to the embodiment of the present invention running a computer program stored in the memory, or can be implemented when a computer program stored in the computer-readable storage medium of a computer program product according to the embodiment of the present invention is run by a computer.

In addition, according to an embodiment of the present invention, a computer program is also provided, and the computer program can be stored in a cloud or a local storage medium. When the computer program is run by a computer or processor, it is used to perform corresponding steps of the feasible region prediction method according to the embodiment of the present invention, and is used to implement corresponding modules in the feasible region prediction device according to the embodiment of the present invention.

Based on the above description, according to the feasible region prediction method, device, system and storage medium of the embodiments of the present invention, the future feasible region prediction map is obtained from the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments. This realizes the analysis and prediction of future scenarios, thereby providing a basis for behavioral decisions in automatic driving or assisted driving of the vehicle. Moreover, the feasible region prediction map is generated from the surround image obtained at the current moment; by combining scene perception and behavior prediction, it can directly give the future trajectory of obstacles and thereby divide the drivable area from the non-drivable area. This avoids the repeated calculation and accumulated information errors caused in the related art by splitting feasible region segmentation and obstacle prediction into two modules, and independent behavior prediction is no longer required.

Although example embodiments have been described herein with reference to the accompanying drawings, it should be understood that the above-described example embodiments are exemplary only, and are not intended to limit the scope of the application thereby. Various changes and modifications can be made therein by those of ordinary skill in the art without departing from the scope and spirit of the present application. All such changes and modifications are intended to be included within the scope of the application as claimed in the appended claims.

Those skilled in the art will appreciate that the units and algorithm steps of each example described in the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether to implement in hardware or software depends on the specific application and design constraints of the technical solution. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.

In the several embodiments provided in this application, it should be understood that the disclosed devices and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined or can be integrated into another device, or some features can be ignored, or not implemented.

In the description provided here, a number of specific details are set forth. However, it is understood that embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.

Similarly, it should be understood that in the description of the exemplary embodiments of the present application, in order to streamline the present application and aid in the understanding of one or more of the various inventive aspects, various features of the present application are sometimes grouped together into a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the corresponding claims reflect, the inventive concept lies in solving a corresponding technical problem with less than all features of a single disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this application.

It will be understood by those skilled in the art that all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or apparatus so disclosed, may be combined in any combination, except where the features are mutually exclusive. Each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present application and to form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.

Various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some modules according to embodiments of the present application. The present application may also be implemented as a device program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the present application may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from an Internet website, or provided on a carrier signal, or in any other form.

It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The application may be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In the element claim enumerating several means, several of these means may be embodied by the same item of hardware. The use of the words first, second, third, etc. does not indicate any order. These words can be interpreted as names.

The above are only specific implementation modes, or descriptions of specific implementation modes, of the present application, and the protection scope of the present application is not limited thereto. Any changes or substitutions that a person familiar with the technical field can readily conceive of within the technical scope disclosed in this application should be covered by the protection scope of this application. The protection scope of this application shall be subject to the protection scope of the claims.

Claims

A feasible region prediction method, the method is applied to vehicle automatic driving or assisted driving, and is characterized by including:

Obtain a surround image at the current moment, and obtain bird's-eye view features based on the surround image; the surround image includes images from multiple perspectives collected by multiple cameras on the vehicle;

Extract the bird's-eye view features to obtain the bird's-eye view high-dimensional image features at the current moment;

A future feasible region prediction map is generated according to a time series queue formed by the bird's-eye view high-dimensional image features at the current moment and the bird's-eye view high-dimensional image features at multiple historical moments, and the feasible region prediction map is output.
The feasible region prediction method according to claim 1, characterized in that the step of obtaining the bird's-eye view features according to the surround image comprises:

Perform feature extraction on images from multiple viewing angles to obtain high-dimensional image features of images from multiple viewing angles;

The high-dimensional image features of the multiple viewing angle images are fused to obtain the bird's-eye view features.
The feasible region prediction method according to claim 2, characterized in that the step of fusing the high-dimensional image features of the multiple perspective images to obtain the bird's-eye view features comprises:

The high-dimensional image features of the multiple perspective images are converted from the coordinate system of the images of the respective perspectives to the vehicle coordinate system, and the bird's-eye view features are obtained according to the result of the conversion.
The feasible region prediction method according to claim 1, characterized in that generating the future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments includes:

Generate bird's-eye view high-dimensional image features at multiple moments in the future according to the time series queue;

The bird's-eye view high-dimensional image features at each of the multiple future moments are upsampled to generate a feasible region prediction map for each moment, so as to obtain the feasible region prediction maps at the multiple future moments.
The feasible region prediction method according to claim 2, characterized in that the method is implemented by a trained neural network, and the neural network includes a first sub-network, a second sub-network, a third sub-network and a fourth sub-network, wherein:

The first sub-network is used to obtain images from multiple viewing angles at the current moment, and perform feature extraction on the images from multiple viewing angles to obtain high-dimensional image features of images from multiple viewing angles;

The second sub-network is used to fuse the high-dimensional image features of the multiple viewing angle images to obtain the bird's-eye view features;

The third sub-network is used to extract the bird's-eye view features and obtain the bird's-eye view high-dimensional image features at the current moment;

The fourth sub-network is used to generate a future feasible region prediction map based on a time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments.
The feasible region prediction method according to claim 5, wherein the fusion of high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes:

The high-dimensional image features of the perspective images are input into the second sub-network as keys and the pixel position coordinates in the bird's-eye view features are input as queries, and the bird's-eye view features are obtained according to the output results of the second sub-network.
The feasible region prediction method according to claim 5, wherein the third sub-network includes a plurality of convolutional layers, and the extraction of the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes:

Feature extraction is performed on the bird's-eye view features through one of the convolutional layers, and features are then extracted again, in sequence, by each subsequent convolutional layer from the results extracted by the preceding convolutional layer, to obtain the bird's-eye view high-dimensional image features.
The feasible region prediction method according to claim 5, wherein the fourth sub-network includes the same number of sub-networks as the number of bird's-eye view high-dimensional image features in the time series queue, and generating the future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments includes:

Generate bird's-eye view high-dimensional image features at the corresponding time in the future based on the corresponding sub-network and the time series queue respectively;

The bird's-eye view high-dimensional image features at the corresponding time in the future are respectively upsampled to generate feasible region prediction maps at the corresponding time in the future, so as to obtain feasible region prediction maps at multiple times in the future.
The feasible region prediction method according to claim 2, characterized in that the method is implemented by a plurality of trained neural networks, and the plurality of neural networks include a first neural network, a second neural network, a third neural network and a fourth neural network, wherein:

The first neural network is used to obtain images from the multiple viewing angles at the current moment, and perform feature extraction on the images from the multiple viewing angles to obtain high-dimensional image features of images from multiple viewing angles;

The second neural network is used to fuse the high-dimensional image features of the multiple viewing angle images to obtain the bird's-eye view features;

The third neural network is used to extract the bird's-eye view features and obtain the bird's-eye view high-dimensional image features at the current moment;

The fourth neural network is used to generate a future feasible region prediction map based on a time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments.
The feasible region prediction method according to claim 9, wherein the fusion of high-dimensional image features of the multiple view images to obtain the bird's-eye view features includes:

The high-dimensional image features of the perspective image are used as keys and the pixel position coordinates in the bird's-eye view features are used as queries to input into the second neural network, and the bird's-eye view features are obtained according to the output results of the second neural network.
The feasible region prediction method according to claim 9, wherein the third neural network includes a plurality of convolutional layers, and the extraction of the bird's-eye view features to obtain the bird's-eye view high-dimensional image features includes:

Feature extraction is performed on the bird's-eye view features through one of the convolutional layers, and features are then extracted again, in sequence, by each subsequent convolutional layer from the results extracted by the preceding convolutional layer, to obtain the bird's-eye view high-dimensional image features.
The feasible region prediction method according to claim 9, characterized in that the fourth neural network includes the same number of sub-networks as the number of bird's-eye view high-dimensional image features in the time series queue, and generating the future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments includes:

Generate bird's-eye view high-dimensional image features at the corresponding time in the future based on the corresponding sub-network and the time series queue respectively;

The bird's-eye view high-dimensional image features at the corresponding time in the future are respectively upsampled to generate feasible region prediction maps at the corresponding time in the future, so as to obtain feasible region prediction maps at multiple times in the future.
The feasible region prediction method according to claim 1, wherein the feasible region prediction map is a probability map, and the probability map is used to represent the probability that pixels in the surround image at a future time belong to the drivable area.
The feasible region prediction method according to claim 13, characterized in that when the pixel value of a pixel in the probability map is not greater than a set threshold, the pixel in the surround image at a future time corresponding to the pixel does not belong to the drivable area; when the pixel value of a pixel in the probability map is greater than the set threshold, the pixel in the surround image at a future time corresponding to the pixel belongs to the drivable area.
A feasible region prediction device, applied to vehicle automatic driving or assisted driving, is characterized by including:

A bird's-eye view feature module is used to obtain the surround image at the current moment, and obtain the bird's-eye view features based on the surround image; the surround image includes images from multiple perspectives collected by multiple cameras on the vehicle;

A bird's-eye view high-dimensional image feature module is used to extract the bird's-eye view features and obtain the bird's-eye view high-dimensional image features at the current moment;

A feasible region prediction map module is used to generate a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments, and output the feasible region prediction map.
A feasible region prediction device, applied to vehicle automatic driving or assisted driving, is characterized by including:

A plurality of cameras installed on the vehicle are used to collect surround images at the current moment; the surround images include images from multiple perspectives;

One or more processors for:

Obtain the surround image, and obtain bird's-eye view features based on the surround image;

Extracting the features of the bird's-eye view to obtain high-dimensional image features of the bird's-eye view at the current moment;

Generate a future feasible region prediction map based on the time series queue composed of the bird's-eye view high-dimensional image features of the current moment and the bird's-eye view high-dimensional image features of multiple historical moments, and output the feasible region prediction map.
A feasible region prediction device, characterized in that it includes a memory and a processor, the memory stores a computer program to be run by the processor, and when the computer program is run by the processor, the computer program causes the processor to perform the feasible region prediction method according to any one of claims 1 to 14.
A system for automatic driving or assisted driving of vehicles, characterized in that the system includes the feasible region prediction device according to any one of claims 15 to 17.
A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the computer program causes the processor to execute the feasible region prediction method according to any one of claims 1 to 14.