WO2021134325A1

WO2021134325A1 - Obstacle detection method and apparatus based on driverless technology and computer device

Info

Publication number: WO2021134325A1
Application number: PCT/CN2019/130155
Authority: WO
Inventors: 邹晓艺; 何明; 叶茂盛; 吴伟; 许双杰; 许家妙; 曹通易
Original assignee: 深圳元戎启行科技有限公司
Priority date: 2019-12-30
Filing date: 2019-12-30
Publication date: 2021-07-08
Also published as: CN113678136B; CN113678136A

Abstract

An obstacle detection method based on driverless technology, comprising: obtaining current frame point cloud data and current frame image data within a preset angle range (202); projecting the current frame point cloud data from a plurality of viewing angles to obtain two-dimensional planes corresponding to the plurality of viewing angles (204); performing feature extraction on the two-dimensional plane corresponding to each viewing angle to obtain point cloud feature information corresponding to each viewing angle (206); inputting the point cloud feature information corresponding to each viewing angle and the current frame image data into a corresponding feature extraction model, and extracting spatial feature information corresponding to each viewing angle and image feature information corresponding to the current frame image data in parallel by means of the corresponding feature extraction model (208); fusing the spatial feature information corresponding to the plurality of viewing angles and the image feature information to obtain fused feature information (210); and inputting the fused feature information into a trained detection model, performing prediction calculation on the fused feature information by means of the detection model, and outputting an obstacle detection result (212).

Description

Obstacle detection method, device and computer equipment based on unmanned driving technology

Technical field

This application relates to an obstacle detection method, device, computer equipment and storage medium based on unmanned driving technology.

Background technique

The development of artificial intelligence technology has promoted the development of unmanned driving technology. In the process of unmanned driving, it is necessary to detect obstacles in the surrounding environment in real time. For example, pedestrians, vehicles and other traffic participants. By tracking and predicting the detected obstacles, the obstacle trajectory is obtained, which can better plan a reasonable route, avoid obstacles, and abide by traffic rules.

In the traditional way, the point cloud data is projected into the image data to obtain feature information of multiple channels, and then obstacle detection is performed based on the feature information of multiple channels. However, in the process of projecting the point cloud data to the image data for feature extraction, some information may be lost, resulting in the extraction of effective feature information of each source data is not comprehensive enough, resulting in low accuracy of obstacle detection.

Summary of the invention

According to various embodiments disclosed in the present application, an obstacle detection method, device, computer device, and storage medium based on an unmanned driving technology that can improve the accuracy of obstacle detection in an unmanned driving process are provided.

An obstacle detection method based on unmanned driving technology, including:

Obtain current frame point cloud data and current frame image data within a preset angle range;

Projecting the point cloud data of the current frame on multiple viewing angles to obtain two-dimensional planes corresponding to the multiple viewing angles;

Perform feature extraction on the two-dimensional plane corresponding to each perspective, and obtain the point cloud feature information corresponding to each perspective;

The point cloud feature information corresponding to each perspective and the current frame image data are input into the corresponding feature extraction model, and the spatial feature information corresponding to each perspective and the current frame image data are extracted in parallel through the corresponding feature extraction model Image feature information;

Fusing the spatial feature information corresponding to multiple viewing angles with the image feature information to obtain the fused feature information; and

The fused feature information is input into a trained detection model, and the fused feature information is predicted and calculated through the detection model, and an obstacle detection result is output.

An obstacle detection device based on unmanned driving technology, including:

The acquisition module is used to acquire the current frame point cloud data and the current frame image data within a preset angle range;

A projection module, configured to project the point cloud data of the current frame on multiple viewing angles to obtain two-dimensional planes corresponding to the multiple viewing angles;

The first extraction module is used to perform feature extraction on the two-dimensional plane corresponding to each perspective to obtain point cloud feature information corresponding to each perspective;

The second extraction module is used to input the point cloud feature information corresponding to each perspective and the current frame image data into the corresponding feature extraction model, and extract the spatial feature information corresponding to each perspective in parallel through the corresponding feature extraction model. Image feature information corresponding to the current frame of image data;

The fusion module is used to fuse the spatial feature information corresponding to multiple perspectives with the image feature information to obtain the fused feature information; and

The prediction module is used to input the fused feature information into the trained detection model, and perform prediction operations on the fused feature information through the detection model, and output obstacle detection results.

A computer device, including a memory and one or more processors, the memory stores computer readable instructions, and when the computer readable instructions are executed by the processor, the one or more processors execute The following steps:

One or more non-volatile computer-readable storage media storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:

Input the point cloud feature information corresponding to each perspective and the current frame image data into the corresponding feature extraction model, and extract the spatial feature information corresponding to each perspective and the current frame image data in parallel through the corresponding feature extraction model Image feature information;

The details of one or more embodiments of the present application are set forth in the following drawings and description. Other features and advantages of this application will become apparent from the description, drawings and claims.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings needed in the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. A person of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

Fig. 1 is an application environment diagram of an obstacle detection method based on unmanned driving technology in one or more embodiments.

Fig. 2 is a schematic flowchart of an obstacle detection method based on unmanned driving technology in one or more embodiments.

FIG. 3 is a schematic flowchart of the step of fusing spatial feature information and image feature information corresponding to multiple viewing angles to obtain fused feature information in one or more embodiments.

Fig. 4 is a block diagram of an obstacle detection device based on unmanned driving technology in one or more embodiments.

Figure 5 is a block diagram of a computer device in one or more embodiments.

Detailed ways

In order to make the technical solutions and advantages of the present application clearer, the following further describes the present application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, and are not used to limit the present application.

The obstacle detection method based on the unmanned driving technology provided in this application can be applied to the schematic diagram of obstacle detection during the unmanned driving process as shown in FIG. 1. The first vehicle-mounted sensor 102 sends the collected point cloud data of the current frame to the vehicle-mounted computer device 104. For example, the first vehicle-mounted sensor may be a lidar. On-board computer equipment can be referred to as computer equipment. The second vehicle-mounted sensor 106 sends the collected image data of the current frame within the preset angle range to the computer device 104. For example, the second vehicle-mounted sensor may be a vehicle-mounted camera. The computer device 104 projects the point cloud data of the current frame on multiple viewing angles to obtain two-dimensional planes corresponding to the multiple viewing angles. The computer device 104 performs feature extraction on the two-dimensional plane corresponding to each view angle to obtain point cloud feature information corresponding to each view angle. The computer device 104 inputs the point cloud feature information corresponding to each perspective and the current frame image data into the corresponding feature extraction model, and extracts the spatial feature information corresponding to each perspective and the current frame image data in parallel through the corresponding feature extraction model. Image feature information. The computer device 104 fuses the spatial feature information corresponding to the multiple viewing angles with the current frame image data to obtain the fused feature information. The computer device 104 inputs the fused feature information into the trained detection model, and performs a prediction operation on the fused feature information through the detection model, and outputs an obstacle detection result.

In one of the embodiments, as shown in FIG. 2, an obstacle detection method based on unmanned driving technology is provided. Taking the method applied to the computer equipment in FIG. 1 as an example for description, the method includes the following steps:

Step 202: Obtain current frame point cloud data and current frame image data within a preset angle range.

In the process of unmanned driving of the vehicle, the collected current frame point cloud data is transmitted to the computer device through the first on-board sensor installed on the vehicle, and the preset angle range collected by the second on-board sensor installed on the vehicle The image data of the current frame within is sent to the computer device. For example, the first vehicle-mounted sensor may be a lidar. The current frame point cloud data is the current frame point cloud data within a 360-degree range collected by the first vehicle-mounted sensor. For example, the second vehicle-mounted sensor may be a vehicle-mounted camera. The current frame image data within the preset angle range may be the current frame image data within a 360-degree range around the vehicle collected by multiple on-board cameras.

Step 204: Project the point cloud data of the current frame on multiple viewing angles to obtain two-dimensional planes corresponding to the multiple viewing angles.

Step 206: Perform feature extraction on the two-dimensional plane corresponding to each view angle to obtain point cloud feature information corresponding to each view angle.

The point cloud data of the current frame is 3D point cloud data. The computer device projects the acquired point cloud data of the current frame to multiple viewing angles, thereby projecting the 3D point cloud data into the two-dimensional planes corresponding to the multiple viewing angles, and realizes the conversion of the 3D point cloud data into the two-dimensional data in the two-dimensional plane . Multiple viewing angles may include a bird's-eye view and a front view. When the computer device projects the current point cloud data on the bird's-eye view angle, a two-dimensional plane corresponding to the bird's-eye view angle can be obtained. When the computer device projects the point cloud data of the current frame on the orthographic perspective, a two-dimensional plane corresponding to the orthographic perspective can be obtained.

The two-dimensional plane corresponding to each view includes the point cloud data of the current frame after projection. The computer device can extract the point cloud feature information corresponding to each perspective in the two-dimensional plane corresponding to each perspective. The point cloud feature information may be the local feature information of each point in the current frame point cloud data corresponding to each pixel in the two-dimensional plane, and the local feature information may include local depth, point cloud density, and the like. The trained neural network model is pre-stored in the computer equipment. For example, the neural network model can be a pointnet based on the attention layer. The computer device can input the two-dimensional plane corresponding to each perspective into the trained neural network model, and perform prediction operations on the two-dimensional plane corresponding to each perspective through the neural network model to obtain the point cloud feature information corresponding to each perspective .

Step 208: Input the point cloud feature information and current frame image data corresponding to each perspective into the corresponding feature extraction model, and extract the spatial feature information corresponding to each perspective and the current frame image data in parallel through the corresponding feature extraction model. Image feature information.

The computer device converts the point cloud feature information corresponding to each view angle and the current frame image data to obtain the point cloud feature vector corresponding to each view angle and the image matrix corresponding to the current frame image data. A plurality of feature extraction models are pre-stored in the computer equipment. The multiple feature extraction models may be the same type of feature extraction models. The feature extraction model is obtained by training a large amount of sample data. For example, the feature extraction model may be a 2D convolutional neural network model. The computer device inputs the point cloud feature vector corresponding to each perspective and the image matrix corresponding to the current frame data into the corresponding feature extraction model, and performs parallel feature extraction through the feature extraction model to obtain the spatial feature information corresponding to each perspective and the current frame data. Image feature information corresponding to the frame image data. The feature extraction model can include a pooling layer, and the computer device can perform dimensionality reduction processing on the point cloud feature information corresponding to each perspective according to the first resolution through the pooling layer of the corresponding feature extraction model, and then obtain the corresponding point cloud feature information for each perspective. Spatial feature information. The pooling layer of the corresponding feature extraction model performs dimensionality reduction processing on the current frame of image data according to the second resolution, and then obtains the image feature information corresponding to the current frame of image data. The spatial feature information may include information such as the shape of the obstacle. The image feature information may include information such as the shape and color of the obstacle.

Step 210: Fusion of spatial feature information and image feature information corresponding to multiple viewing angles to obtain fused feature information.

Step 212: Input the fused feature information into the trained detection model, and perform a prediction operation on the fused feature information through the detection model, and output an obstacle detection result.

After obtaining the spatial feature information and image feature information corresponding to multiple viewing angles, the computer device can merge the spatial feature information and image feature information corresponding to multiple viewing angles. The way of fusion may be to first stitch the spatial feature information and image feature information corresponding to multiple viewing angles according to preset parameters, and then align the stitched feature information to the preset viewing angles to obtain the fused feature information.

The computer equipment converts the fused feature information to obtain the fused feature vector. The trained detection model is pre-stored in the computer equipment. The detection model is obtained through training with a large amount of sample data. For example, the detection model may be a 2D convolutional neural network. The detection model includes multiple network layers, for example, it may include an input layer, an attention layer, a convolutional layer, a pooling layer, a fully connected layer, and so on. The computer device inputs the fused feature vector into the detection model, calculates the context vector and weight corresponding to the fused feature vector through the attention layer of the detection model, and generates a first extraction result according to the context feature and weight. Furthermore, the convolutional layer extracts the context feature corresponding to the context vector according to the first extraction result to generate the second extraction result. The second extraction result is reduced in dimensionality through the pooling layer of the detection model. The second extraction result after dimensionality reduction is classified by the fully connected layer, and the classification result can be obtained. The classification results are weighted and output through the output layer. The computer equipment obtains the obstacle detection result according to the classification result outputted by the weighting.

In this embodiment, the computer device obtains the current frame point cloud data and the current frame image data within a preset angle range, and projects the current frame point cloud data on multiple viewing angles to obtain two-dimensional planes corresponding to the multiple viewing angles. It is conducive to the subsequent fusion of the point cloud data of the current frame and the image data of the current frame. The computer device performs feature extraction on the two-dimensional plane corresponding to each perspective, obtains the point cloud feature information corresponding to each perspective, and inputs the point cloud feature information corresponding to each perspective and the current frame image data into the corresponding feature extraction model. The spatial feature information corresponding to each view angle and the image feature information corresponding to the current frame image data are extracted in parallel through the corresponding feature extraction model. By performing multiple feature extraction on the two-dimensional plane corresponding to each viewing angle, it is possible to extract more comprehensive and effective feature information from the point cloud data of the current frame. The computer equipment fuses the spatial feature information and the image feature information corresponding to multiple viewing angles to obtain the fused feature information. Based on the data characteristics of multiple source data, multiple source data can be complemented to obtain more comprehensive obstacle feature information. The computer equipment predicts and calculates the fused feature information through the detection model, and outputs the obstacle detection result. Since the fused feature information is comprehensive, and the detection model is pre-trained, the accuracy of obstacle detection is effectively improved.

In one of the embodiments, as shown in FIG. 3, the steps of fusing the spatial feature information and the image feature information corresponding to multiple viewing angles to obtain the fused feature information include:

In step 302, the spatial feature information and the image feature information corresponding to the multiple viewing angles are spliced according to preset parameters to obtain spliced feature information.

Step 304: align the spliced feature information to a preset viewing angle according to the preset parameters to obtain the aligned feature information, and use the aligned feature information as the fused feature information.

The computer device can perform dimensionality reduction processing on the point cloud feature information corresponding to each perspective according to the first resolution through the pooling layer of the corresponding feature extraction model, and obtain the spatial feature information after the dimensionality reduction processing, that is, the space corresponding to multiple perspectives. In the process of feature information. The computer device can perform dimensionality reduction processing on the current frame image data according to the second resolution through the pooling layer of the corresponding feature extraction model to obtain the image feature information after the dimensionality reduction processing, that is, the image feature information corresponding to the current frame image data.

The preset parameter may be the coordinate conversion relationship between the point cloud data and the image data. The computer device splices the spatial feature information corresponding to the bird's-eye view angle and the spatial feature information corresponding to the front view angle with the image feature information respectively according to preset parameters. After the computer device obtains the spliced feature information, it can align the spliced feature information to a preset viewing angle according to preset parameters. The preset viewing angle may be a bird's-eye view. The computer device then obtains the aligned feature information on the preset viewing angle, and uses the aligned feature information as the fused feature information.

In this embodiment, the computer device stitches the spatial feature information and image feature information corresponding to the multiple viewing angles according to preset parameters, and then aligns the stitched feature information to the preset viewing angles according to the preset parameters to obtain the aligned Feature information, using the aligned feature information as the fused feature information. Because the spatial feature information corresponding to multiple viewing angles can improve the accurate 3D information, the lack of color information, and the image feature information includes higher-resolution color information, lacks 3D information, by splicing and aligning the spatial feature information and the image feature information , To achieve the fusion of complementary data, so as to perform obstacle detection based on the fused feature information, which can further improve the accuracy of obstacle detection.

In one of the embodiments, the two-dimensional plane includes the two-dimensional data corresponding to each point in the point cloud data of the current frame, and performing feature extraction on the two-dimensional plane corresponding to each perspective to obtain point cloud feature information includes: Extract multiple data dimensions from the two-dimensional data corresponding to each point in the point cloud data of the current frame; input multiple data dimensions into the trained neural network model, and perform prediction operations on the feature information of multiple dimensions through the neural network model , Get point cloud feature information.

The computer device can extract multiple data dimensions from the two-dimensional data corresponding to each point in the point cloud data of the current frame. Multiple data dimensions may include the coordinates of points, reflectivity and other dimensions. The trained neural network model is pre-stored in the computer equipment. The trained neural network model is obtained by training with a large amount of sample data. For example, the neural network model can be a pointnet based on the attention layer. The neural network model can include multiple network layers. For example, the network layer may include an attention layer, a convolutional layer, and so on. The computer device can input the extracted multiple data dimensions into the trained neural network model, and calculate the context vectors and weights corresponding to the multiple data dimensions through the attention layer of the neural network model. The neural network model takes the context vector and weight as the input of the convolutional layer, and extracts the context features corresponding to the context vector through the convolutional layer. The neural network model takes the context features and weights as the input of the pooling layer, and reduces the dimensionality of the context features through the pooling layer. The output layer of the neural network model outputs the context features and weights after dimensionality reduction, and uses the context features after dimensionality reduction as point cloud feature information.

In this embodiment, the computer device extracts multiple data dimensions from the two-dimensional data corresponding to each point in the point cloud data of the current frame, and performs prediction operations on the multiple data dimensions through a neural network model to obtain point cloud feature information. Since the neural network is pre-trained, the local feature information of each point in the current frame of point cloud data can be accurately extracted through the neural network model, which is beneficial to the subsequent extraction of spatial feature information of the current frame of point cloud data.

In one of the embodiments, projecting the point cloud data of the current frame on multiple viewing angles to obtain a two-dimensional plane corresponding to the multiple viewing angles includes: projecting the point cloud data of the current frame on a bird's-eye view angle to obtain the corresponding bird's-eye view angle. The two-dimensional plane of the current frame; project the point cloud data of the current frame on the orthographic perspective to obtain the two-dimensional plane corresponding to the orthographic perspective.

After obtaining the point cloud data of the current frame, the computer device can project the point cloud data of the current frame to multiple perspectives. The coordinates of the point cloud data of the current frame can be expressed as (x, y, z). Computer equipment can project with preset resolutions. Multiple viewing angles may include a bird's-eye view and a front view. For example, in the process of bird's-eye perspective projection, when the preset resolution is 0.1m per grid, then the point cloud data of the current frame within the range of -60<x<60 and -60<y<60 can be set. Projected into a two-dimensional plane with a size of 1200x 1200. Taking each pixel in the two-dimensional plane as a grid, the point cloud data of the current frame within the range of 0.05m will all fall on the corresponding grid. By projecting the point cloud data of the current frame on the bird's-eye perspective, the computer device can obtain an unobstructed and intuitive obstruction view, avoiding the problem of inaccurate point cloud feature information extracted by obstructing objects. By projecting the point cloud data of the current frame on the front view, the computer device can describe the shape of a smaller target more intuitively, for example, can describe pedestrians more intuitively. This is beneficial for the computer equipment to extract more comprehensive and accurate effective feature information from the two-dimensional planes corresponding to multiple viewing angles.

In one of the embodiments, the detection model includes a plurality of network layers, and performing prediction operations on the fused feature information through the detection model, and outputting obstacle detection results, includes: inputting the fused feature information into the input layer of the detection model; The fused feature information is input to the attention layer of the detection model through the input layer, and the context vector and weight corresponding to the fused feature information are calculated through the attention layer to generate the first extraction result; the first extraction result is input to the convolutional layer , Extract the context feature corresponding to the context vector through the convolutional layer to generate the second extraction result; input the second extraction result into the pooling layer, and perform dimensionality reduction processing on the second extraction result through the pooling layer; Second, the extraction results are input to the fully connected layer, and the second extraction results after the dimensionality reduction process are classified through the fully connected layer to obtain the classification results, and the classification results are weighted through the output layer and output; the weighted classification results are selected from the weighted output The classification result is used as the obstacle detection result.

The trained detection model is pre-stored in the computer equipment. The detection model may be a detection model obtained after pre-training with a large amount of sample data. For example, the detection model may be a 2D convolutional neural network based on the attention layer. The detection model can include multiple network layers. For example, the detection model may include multiple network layers such as an input layer, an attention layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer. The computer device inputs the fused feature information to the input layer of the detection model by calling the trained detection model. The fused feature information is transmitted to the attention layer through the input layer, and the context vector and weight corresponding to the fused feature information are calculated through the attention layer, and the first extraction result is generated according to the context vector and weight. The detection model uses the first extraction result as the input of the convolution layer, extracts the context feature corresponding to the context vector through the convolution layer, and generates the second extraction result according to the context feature and weight. Furthermore, the computer device uses the second extraction result as the input of the pooling layer, and performs dimensionality reduction processing on the second extraction result through the pooling layer to obtain the second extraction result after the dimensionality reduction processing. The computer device uses the second extraction result after the dimensionality reduction processing as the input of the fully connected layer, and classifies the second extraction result after the dimensionality reduction processing to obtain the classification result. The classification result can include multiple categories of obstacles, multiple location information, and so on. Furthermore, the classification results are weighted and output through the output layer. Furthermore, the computer device selects the classification result with the largest weight among the weighted classification results, which is the obstacle detection result. Obstacle detection results may include the location information of the obstacle, the size of the obstacle, the shape of the obstacle, and so on.

In this embodiment, the computer device calculates the context vector and weight corresponding to the fused feature information through the attention layer of the detection model, and generates the first extraction result. It can filter the interference information in the fused feature information, and realize the feature focus processing on the fused feature information. The context feature corresponding to the context vector is extracted through the convolutional layer to generate the second extraction result, and the second extraction result is reduced by the pooling layer, which can extract the main context features and avoid the influence of redundant features. The computer device classifies the second extraction result after dimensionality reduction to obtain the classification result, and then weights the classification result and outputs it. The classification result with the largest weight among the weighted classification results is selected as the obstacle detection result, which can classify The results are normalized to further improve the accuracy of obstacle detection.

In one of the embodiments, the two-dimensional plane includes multiple pixels, and each pixel corresponds to the two-dimensional data of multiple points in the point cloud data of the current frame. Before feature extraction is performed on the two-dimensional plane corresponding to each view angle, it also includes : Perform average processing on the two-dimensional data of multiple points corresponding to each pixel to obtain the average value; perform normalization processing on the points corresponding to the corresponding pixels according to the average value.

Before performing feature extraction on the two-dimensional plane corresponding to each viewing angle, the computer device may also perform normalization processing on multiple points in the current frame point cloud data in the two-dimensional plane. Specifically, the two-dimensional plane includes multiple pixels, and each pixel may be represented by a grid, and each grid includes multiple points in the point cloud data of the current frame. The computer equipment averages the coordinates of multiple points in each grid to obtain the average value. Furthermore, the computer equipment makes the difference between the coordinates of each point in the grid and the average value to realize the normalization of the point cloud data of the current frame in each grid. By normalizing multiple points in the point cloud data of the current frame in the two-dimensional plane, it is beneficial to subsequently use the feature extraction model for feature extraction.

In one of the embodiments, the feature extraction is performed on the two-dimensional plane corresponding to each perspective to obtain the point cloud feature information corresponding to each perspective, and the method further includes: invoking multiple threads to concurrently extract each perspective in the two-dimensional plane corresponding to each perspective. Point cloud feature information corresponding to each perspective; before inputting the point cloud feature information corresponding to each perspective and the current frame image data into the corresponding feature extraction model, it also includes: using multi-threaded point cloud features corresponding to each perspective The information and the current frame image data are converted in parallel to obtain the point cloud feature vector corresponding to each view angle and the image matrix corresponding to the current frame image data.

The computer device uses multiple threads to concurrently extract the point cloud feature information corresponding to each perspective in the two-dimensional plane corresponding to each perspective through multiple threads. Thereby improving the extraction efficiency of point cloud feature information. The computer device can also use the multi-thread to perform the point cloud feature information corresponding to each perspective and the current frame image data before inputting the point cloud feature information corresponding to each perspective and the current frame image data into the corresponding feature extraction model. Parallel conversion effectively reduces the time-consuming feature extraction model for feature extraction.

In one of the embodiments, the computer device may also obtain historical trajectory information of multiple obstacles in the current environment according to the obstacle detection result, and at the same time obtain the current position information of the vehicle. Predict the trajectory of multiple obstacles within a preset time period based on historical trajectory information and current position information.

Specifically, the computer device tracks the movement process of the obstacle in the obstacle detection result, predicts the position information at the current time based on the position information of the obstacle at the previous time, and compares the predicted position information at the current time with the actual position. Information is compared to obtain error information. The computer device corrects the position information at the next moment according to the error information, thereby obtaining historical trajectory information of multiple obstacles. The computer equipment can obtain the current position information sent by the vehicle-mounted locator. Therefore, the computer device can render the acquired historical trajectory information of multiple obstacles into a feature map to obtain a trajectory rendering map. The historical trajectory information may be the trajectory of each frame of the history of multiple obstacles. The computer device renders the historical trajectory information of multiple obstacles in the current frame to obtain a trajectory rendering map. The color of obstacles in each frame in the trajectory rendering diagram changes with the distance from the current frame. The farther away from the current frame, the lighter the color of the obstacle. The obstacle itself and the surrounding environment information can be obtained, and the influence factors of the trajectory can be considered from various aspects, which is more conducive to improving the accuracy of trajectory prediction.

The computer equipment obtains the current position information collected by the vehicle-mounted locator. The current location information may be the location information of the vehicle on the high-precision map at the current moment. The current location information can be expressed in the form of latitude and longitude. The computer equipment extracts map elements from the current location information. Map elements can include information such as lane lines, center lines, sidewalks, and stop lines. The computer device may render the extracted map elements according to multiple channel dimensions, and render the map elements into a map element rendering map corresponding to the channel dimensions. When the map elements are different, the channel dimensions corresponding to the map elements can also be different. Channel dimensions can include color channels, element channels, and so on. The color channel can include three channels of red, green, and blue. Elemental passages can include lane-line passages, center-line passages, and sidewalk passages. The current position of the obstacle can be rendered intuitively and accurately through the channel dimension corresponding to the map element, which is conducive to subsequent trajectory prediction.

After the computer device obtains the trajectory rendering image and the map element rendering image, the trajectory rendering image and the map element rendering image can be stitched together. The computer device determines the corresponding channel dimensions of the trajectory rendering map and the map element rendering map, and performs image stitching on the trajectory rendering map and the map element rendering map in the corresponding channel dimensions to obtain a spliced image matrix. The spliced image matrix may be a complete image including the trajectory rendering map and the map element rendering map.

The computer device has pre-trained a feature extractor before acquiring the historical trajectory information and current position information of multiple obstacles in the current environment. The computer device calls the trained feature extractor, and inputs the spliced image matrix into the trained feature extractor. The computer device extracts the image feature information and context feature information corresponding to the spliced image matrix through the feature extractor, and then outputs the feature extraction result corresponding to the spliced image matrix through the fully connected layer of the feature extractor. It realizes the combination of various influence factors of the obstacle trajectory, and further improves the comprehensiveness of the feature extraction results. The computer equipment can calculate the feature extraction results by means of regression prediction to obtain the trajectories of multiple obstacles within a preset time period. Because the obstacle detection result is more comprehensive and accurate, and the feature extraction result includes the trajectory of multiple obstacles in the history frame, the scope of environmental information is expanded, and the trajectory prediction based on various influencing factors is realized, thereby effectively providing the trajectory The accuracy of the forecast.

In one of the embodiments, as shown in FIG. 4, an obstacle detection device based on unmanned driving technology is provided, including: an acquisition module 402, a projection module 404, a first extraction module 406, a second extraction module 408, The fusion module 410 and the prediction module 412, where:

The acquiring module 402 is used to acquire the point cloud data of the current frame and the image data of the current frame within a preset angle range.

The projection module 404 is configured to project the point cloud data of the current frame on multiple viewing angles to obtain two-dimensional planes corresponding to the multiple viewing angles.

The first extraction module 406 is configured to perform feature extraction on the two-dimensional plane corresponding to each view angle to obtain point cloud feature information corresponding to each view angle.

The second extraction module 408 is used to input the point cloud feature information and current frame image data corresponding to each perspective into the corresponding feature extraction model, and extract the spatial feature information corresponding to each perspective and the current frame through the corresponding feature extraction model in parallel. Image feature information corresponding to the frame image data.

The fusion module 410 is used for fusing spatial feature information and image feature information corresponding to multiple viewing angles to obtain fused feature information.

The prediction module 412 is configured to input the fused feature information into the trained detection model, perform prediction operations on the fused feature information through the detection model, and output obstacle detection results.

In one of the embodiments, the fusion module 410 is further configured to splice the spatial characteristic information and image characteristic information corresponding to the multiple viewing angles according to preset parameters to obtain the spliced characteristic information; according to the preset parameters, the spliced characteristic information The alignment is performed to the preset viewing angle to obtain the aligned feature information, and the aligned feature information is used as the fused feature information.

In one of the embodiments, the first extraction module 406 is also used to extract multiple data dimensions from the two-dimensional data corresponding to each point in the point cloud data of the current frame; input the multiple data dimensions to the trained neural network model In the process, the neural network model is used to perform predictive operations on multiple data dimensions to obtain point cloud feature information.

In one of the embodiments, the projection module 404 is also used to project the point cloud data of the current frame on the bird's-eye view angle to obtain a two-dimensional plane corresponding to the bird's-eye view angle; Two-dimensional plane corresponding to the viewing angle

In one of the embodiments, the prediction module 412 is also used to input the fused feature information to the input layer of the detection model; the fused feature information is input to the attention layer of the detection model through the input layer, and the attention layer is used to calculate The context vector and weight corresponding to the fused feature information are generated to generate the first extraction result; the first extraction result is input to the convolutional layer, and the context feature corresponding to the context vector is extracted through the convolutional layer to generate the second extraction result; the second extraction The result is input to the pooling layer, and the second extraction result is reduced by the pooling layer; the second extraction result after the dimensionality reduction is input into the fully connected layer, and the second extraction result after the dimensionality reduction is processed through the fully connected layer The classification results are obtained by classification, and the classification results are weighted and output through the output layer; the classification result with the largest weight among the weighted classification results is selected as the obstacle detection result.

In one of the embodiments, the above-mentioned device further includes: a normalization processing module for averaging the two-dimensional data of multiple points corresponding to each pixel to obtain an average value; according to the average value, the points corresponding to the corresponding pixels are processed Standardized processing.

In one of the embodiments, the first extraction module 406 is also used to call multiple threads to concurrently extract the point cloud feature information corresponding to each perspective in the two-dimensional plane corresponding to each perspective; the above-mentioned device further includes: a conversion module for The point cloud feature information corresponding to each view angle and the current frame image data are converted in parallel by using multiple threads to obtain the point cloud feature vector corresponding to each view angle and the image matrix corresponding to the current frame image data.

Regarding the specific limitations of the obstacle detection device based on unmanned driving technology, please refer to the above limitation on the trajectory prediction method, which will not be repeated here. The various modules in the above-mentioned obstacle detection device based on unmanned driving technology can be implemented in whole or in part by software, hardware, and a combination thereof. The above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.

In one of the embodiments, a computer device is provided, and its internal structure diagram may be as shown in FIG. 5. The computer equipment includes a processor, a memory, a communication interface and a database connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer readable instructions, and a database. The internal memory provides an environment for the operation of the operating system and computer-readable instructions in the non-volatile storage medium. The database of the computer equipment is used to store obstacle detection results. The communication interface of the computer device is used to connect and communicate with the first vehicle-mounted sensor, the second vehicle-mounted sensor, and the vehicle-mounted positioning sensor. The computer readable instruction is executed by the processor to realize an obstacle detection method.

Those skilled in the art can understand that the structure shown in FIG. 5 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. The specific computer device may Including more or fewer parts than shown in the figure, or combining some parts, or having a different arrangement of parts.

A computer device that includes a memory and one or more processors. The memory stores computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute each of the foregoing method implementations. The steps in the example.

One or more non-volatile computer-readable storage media storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps in each of the foregoing method embodiments. step.

A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned embodiment methods can be implemented by instructing relevant hardware through computer-readable instructions. The computer-readable instructions can be stored in a non-volatile computer. In a readable storage medium, when the computer-readable instructions are executed, they may include the processes of the above-mentioned method embodiments. Wherein, any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. As an illustration and not a limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous chain Channel (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.

The technical features of the above embodiments can be combined arbitrarily. In order to make the description concise, all possible combinations of the technical features in the above embodiments are not described. However, as long as there is no contradiction in the combination of these technical features, they should be It is considered as the range described in this specification.

The above-mentioned embodiments only express several implementation manners of the present application, and the description is relatively specific and detailed, but it should not be understood as a limitation on the scope of the invention patent. It should be pointed out that for those of ordinary skill in the art, without departing from the concept of this application, several modifications and improvements can be made, and these all fall within the protection scope of this application. Therefore, the scope of protection of the patent of this application shall be subject to the appended claims.

Claims

An obstacle detection method based on unmanned driving technology, including:

Obtain current frame point cloud data and current frame image data within a preset angle range;

Projecting the point cloud data of the current frame on multiple viewing angles to obtain two-dimensional planes corresponding to the multiple viewing angles;

Perform feature extraction on the two-dimensional plane corresponding to each perspective, and obtain the point cloud feature information corresponding to each perspective;

The point cloud feature information corresponding to each perspective and the current frame image data are input into the corresponding feature extraction model, and the spatial feature information corresponding to each perspective and the current frame image data are extracted in parallel through the corresponding feature extraction model Image feature information;

Fusing the spatial feature information corresponding to multiple viewing angles with the image feature information to obtain the fused feature information; and

The fused feature information is input into a trained detection model, and the fused feature information is predicted and calculated through the detection model, and an obstacle detection result is output.
The method according to claim 1, wherein the fusing the spatial feature information corresponding to multiple viewing angles with the image feature information to obtain the fused feature information comprises:

Splicing the spatial feature information corresponding to the multiple viewing angles and the image feature information according to preset parameters to obtain the spliced feature information; and

The spliced feature information is aligned to a preset viewing angle according to the preset parameters to obtain the aligned feature information, and the aligned feature information is used as the fused feature information.
The method according to claim 1, wherein the two-dimensional plane includes two-dimensional data corresponding to each point in the point cloud data of the current frame, and the feature extraction is performed on the two-dimensional plane corresponding to each perspective To get point cloud feature information, including:

Extracting multiple data dimensions from the two-dimensional data corresponding to each point in the point cloud data of the current frame; and

Inputting multiple data dimensions into the trained neural network model, and performing prediction operations on the multiple data dimensions through the neural network model to obtain point cloud feature information.
The method according to claim 1, wherein the projecting the point cloud data of the current frame on multiple viewing angles to obtain a two-dimensional plane corresponding to the multiple viewing angles comprises:

Projecting the point cloud data of the current frame on a bird's-eye view angle to obtain a two-dimensional plane corresponding to the bird's-eye view angle; and

The point cloud data of the current frame is projected on the front view angle to obtain a two-dimensional plane corresponding to the front view angle.
The method according to claim 1, wherein the detection model includes a plurality of network layers, and the prediction operation on the fused feature information through the detection model to output an obstacle detection result comprises:

Input the fused feature information to the input layer of the detection model;

The fused feature information is input to the attention layer of the detection model through the input layer, and the context vector and weight corresponding to the fused feature information are calculated through the attention layer to generate a first extraction result ；

Inputting the first extraction result into a convolutional layer, extracting context features corresponding to the context vector through the convolutional layer, and generating a second extraction result;

Inputting the second extraction result into a pooling layer, and performing dimensionality reduction processing on the second extraction result through the pooling layer;

The second extraction result after the dimensionality reduction process is input into the fully connected layer, the second extraction result after the dimensionality reduction process is classified by the fully connected layer to obtain the classification result, and the classification result is weighted by the output layer Output; and

The classification result with the largest weight among the weighted classification results is selected as the obstacle detection result.
The method according to claim 1, wherein the two-dimensional plane includes a plurality of pixels, and each pixel corresponds to the two-dimensional data of a plurality of points in the point cloud data of the current frame, and the Before feature extraction is performed on the two-dimensional plane corresponding to each viewing angle, it also includes:

Perform averaging processing on the two-dimensional data of multiple points corresponding to each pixel to obtain the average value; and

The points corresponding to the corresponding pixels are normalized according to the average value.
The method according to any one of claims 1 to 6, wherein the performing feature extraction on the two-dimensional plane corresponding to each view angle to obtain the point cloud feature information corresponding to each view angle further comprises:

Calling multiple threads to concurrently extract the point cloud feature information corresponding to each viewing angle in the two-dimensional plane corresponding to each viewing angle; and

Before inputting the point cloud feature information corresponding to each view angle and the current frame image data into the corresponding feature extraction model, the method further includes:

The point cloud feature information corresponding to each view angle and the current frame image data are converted in parallel by using the multi-thread to obtain the point cloud feature vector corresponding to each view angle and the image matrix corresponding to the current frame image data.
An obstacle detection device based on unmanned driving technology, including:

The acquisition module is used to acquire the current frame point cloud data and the current frame image data within a preset angle range;

A projection module, configured to project the point cloud data of the current frame on multiple viewing angles to obtain two-dimensional planes corresponding to the multiple viewing angles;

The first extraction module is used to perform feature extraction on the two-dimensional plane corresponding to each perspective to obtain point cloud feature information corresponding to each perspective;

The second extraction module is used to input the point cloud feature information corresponding to each perspective and the current frame image data into the corresponding feature extraction model, and extract the spatial feature information corresponding to each perspective in parallel through the corresponding feature extraction model. Image feature information corresponding to the current frame of image data;

The fusion module is used to fuse the spatial feature information corresponding to multiple perspectives with the image feature information to obtain the fused feature information; and

The prediction module is used to input the fused feature information into a trained detection model, and perform prediction operations on the fused feature information through the detection model, and output obstacle detection results.
The device according to claim 8, wherein the fusion module is further configured to splice the spatial characteristic information corresponding to the multiple viewing angles and the image characteristic information according to preset parameters to obtain the spliced characteristic information; and The spliced feature information is aligned to a preset viewing angle according to the preset parameters to obtain the aligned feature information, and the aligned feature information is used as the fused feature information.
The device according to claim 8, wherein the first extraction module is further configured to extract multiple data dimensions from the two-dimensional data corresponding to each point in the current frame point cloud data; and The data dimensions are input into the trained neural network model, and prediction operations are performed on multiple data dimensions through the neural network model to obtain point cloud feature information.
A computer device includes a memory and one or more processors. The memory stores computer-readable instructions. When the computer-readable instructions are executed by the one or more processors, the one or more processors Each processor performs the following steps:

Obtain current frame point cloud data and current frame image data within a preset angle range;

Projecting the point cloud data of the current frame on multiple viewing angles to obtain two-dimensional planes corresponding to the multiple viewing angles;

Perform feature extraction on the two-dimensional plane corresponding to each perspective, and obtain the point cloud feature information corresponding to each perspective;

The point cloud feature information corresponding to each perspective and the current frame image data are input into the corresponding feature extraction model, and the spatial feature information corresponding to each perspective and the current frame image data are extracted in parallel through the corresponding feature extraction model Image feature information;

Fusing the spatial feature information corresponding to multiple viewing angles with the image feature information to obtain the fused feature information; and

The fused feature information is input into a trained detection model, and the fused feature information is predicted and calculated through the detection model, and an obstacle detection result is output.
The computer device according to claim 11, wherein the processor further executes the following step when executing the computer-readable instruction: according to preset parameters, the spatial feature information corresponding to the multiple viewing angles and the image feature information Perform splicing to obtain spliced feature information; and align the spliced feature information to a preset viewing angle according to the preset parameters to obtain the aligned feature information, and use the aligned feature information as the fused Characteristic information.
The computer device according to claim 11, wherein the processor further executes the following step when executing the computer-readable instructions: extracting the two-dimensional data corresponding to each point in the current frame point cloud data Multiple data dimensions; and input the multiple data dimensions into the trained neural network model, and perform prediction operations on the multiple data dimensions through the neural network model to obtain point cloud feature information.
The computer device according to claim 11, wherein the processor further executes the following steps when executing the computer-readable instructions: inputting the fused feature information into the input layer of the detection model; The input layer inputs the fused feature information to the attention layer of the detection model, and calculates the context vector and weight corresponding to the fused feature information through the attention layer to generate a first extraction result; The first extraction result is input to the convolutional layer, the context feature corresponding to the context vector is extracted through the convolutional layer to generate a second extraction result; the second extraction result is input to the pooling layer, and the second extraction result The transformation layer performs dimensionality reduction processing on the second extraction result; the second extraction result after the dimensionality reduction processing is input to the fully connected layer, and the second extraction result after the dimensionality reduction processing is classified through the fully connected layer to obtain The classification result is output after being weighted by the output layer; and the classification result with the largest weight among the weighted classification results is selected as the obstacle detection result.
The computer device according to any one of claims 11 to 14, wherein the processor further executes the following steps when executing the computer-readable instructions: invoking multiple threads to concurrently execute in the two-dimensional plane corresponding to each perspective Extracting point cloud feature information corresponding to each perspective; and before inputting the point cloud feature information corresponding to each perspective and the current frame image data into the corresponding feature extraction model, the method further includes: using the multithreading The point cloud feature information corresponding to each view angle and the current frame image data are converted in parallel to obtain the point cloud feature vector corresponding to each view angle and the image matrix corresponding to the current frame image data.
One or more non-volatile computer-readable storage media storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:

Obtain current frame point cloud data and current frame image data within a preset angle range;

Projecting the point cloud data of the current frame on multiple viewing angles to obtain two-dimensional planes corresponding to the multiple viewing angles;

Perform feature extraction on the two-dimensional plane corresponding to each perspective, and obtain the point cloud feature information corresponding to each perspective;

The point cloud feature information corresponding to each perspective and the current frame image data are input into the corresponding feature extraction model, and the spatial feature information corresponding to each perspective and the current frame image data are extracted in parallel through the corresponding feature extraction model Image feature information;

Fusing the spatial feature information corresponding to multiple viewing angles with the image feature information to obtain the fused feature information; and

The fused feature information is input into a trained detection model, and the fused feature information is predicted and calculated through the detection model, and an obstacle detection result is output.
The storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor, the following step is further executed: according to preset parameters, the spatial feature information corresponding to the multiple viewing angles and the image feature Information is spliced to obtain spliced feature information; and the spliced feature information is aligned to a preset viewing angle according to the preset parameters to obtain the aligned feature information, and the aligned feature information is used as the fusion After the feature information.
The storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor, the following steps are further executed: in the two-dimensional data corresponding to each point in the current frame point cloud data Extracting multiple data dimensions; and inputting the multiple data dimensions into the trained neural network model, and performing prediction operations on the multiple data dimensions through the neural network model to obtain point cloud feature information.
The storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor, the following step is further executed: inputting the fused feature information into the input layer of the detection model; The fused feature information is input to the attention layer of the detection model through the input layer, and the context vector and weight corresponding to the fused feature information are calculated through the attention layer to generate a first extraction result Input the first extraction result into the convolutional layer, extract the context feature corresponding to the context vector through the convolutional layer, and generate a second extraction result; input the second extraction result into the pooling layer, and pass the The pooling layer performs dimensionality reduction processing on the second extraction result; the second extraction result after the dimensionality reduction processing is input to the fully connected layer, and the second extraction result after the dimensionality reduction processing is classified through the fully connected layer The classification result is obtained, and the classification result is weighted and output through the output layer; and the classification result with the largest weight among the weighted classification results is selected as the obstacle detection result.
The storage medium according to any one of claims 16 to 19, wherein when the computer-readable instructions are executed by the processor, the following steps are further executed: calling multi-threads in a two-dimensional plane corresponding to each view angle Concurrently extracting the point cloud feature information corresponding to each perspective; and before inputting the point cloud feature information corresponding to each perspective and the current frame image data into the corresponding feature extraction model, the method further includes: using the multiple The thread performs parallel conversion of the point cloud feature information corresponding to each view angle and the current frame image data to obtain the point cloud feature vector corresponding to each view angle and the image matrix corresponding to the current frame image data.