WO2022017129A1

WO2022017129A1 - Target object detection method and apparatus, electronic device, and storage medium

Info

Publication number: WO2022017129A1
Application number: PCT/CN2021/102684
Authority: WO
Inventors: 付万增; 王哲; 石建萍
Original assignee: 上海商汤临港智能科技有限公司
Priority date: 2020-07-22
Filing date: 2021-06-28
Publication date: 2022-01-27
Also published as: CN113971734A

Abstract

The present disclosure provides a target object detection method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring target point cloud data of a target scene collected by a radar apparatus; generating, on the basis of the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data, the target sparse matrix being used for representing whether there are target objects at different positions of the target scene; and determining, on the basis of the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target objects included in the target scene.

Description

Target object detection method, device, electronic device and storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application claims the priority of the Chinese patent application filed on July 22, 2020, with the application number of 202010712645.3 and the invention titled "target object detection method, device, electronic device and storage medium", which application is by reference Incorporated herein.

technical field

The present disclosure relates to the technical field of lidar, and in particular, to a target object detection method, device, electronic device, and storage medium.

Background technique

Generally, object detection and segmentation algorithms are the core algorithms for many artificial intelligence applications. For example, object detection and segmentation algorithms can be applied in the field of automatic driving. , pedestrians, obstacles, etc. are detected to avoid collisions.

Convolutional neural network is a kind of feedforward neural network (Feedforward Neural Networks) with deep structure including convolution calculation. It is one of the deep learning algorithms and is widely used in artificial intelligence scenarios. Detection and segmentation algorithms based on convolutional neural networks generally organize a complex computing model through a huge amount of parameters to complete specific tasks. Such computing models often have extremely high requirements on the performance of computing devices, and there are computational The problems of large amount, high power consumption, and high delay lead to the complex detection process of the target object, the large amount of calculation, and the long time-consuming.

SUMMARY OF THE INVENTION

In view of this, the present disclosure provides at least a target object detection method, apparatus, electronic device, and storage medium.

In a first aspect, the present disclosure provides a target object detection method, comprising: acquiring target point cloud data of a target scene collected by a radar device; and generating at least one target corresponding to the target point cloud data based on the target point cloud data sparse matrix; the target sparse matrix is used to represent whether there are target objects at different positions of the target scene; based on the at least one target sparse matrix and the target point cloud data, determine the target included in the target scene 3D inspection data for objects.

Using the above method, at least one corresponding target sparse matrix can be generated for the acquired target point cloud data, and the target sparse matrix is used to represent whether there are target objects at different positions of the target scene; in this way, based on the target sparse matrix and the target point Cloud data, when determining the three-dimensional detection data of the target object, the target position of the existing target object can be determined based on the target sparse matrix, so that the features corresponding to the target position can be processed, and the different positions except the target position can be processed. The features corresponding to other positions are not processed, which reduces the amount of calculation to obtain the three-dimensional detection data of the target object and improves the detection efficiency.

In a possible implementation manner, generating at least one target sparse matrix corresponding to the target point cloud data based on the target point cloud data includes: determining, based on the target point cloud data, a method for detecting the target object. The target sparse matrix corresponding to each layer of convolution module in the neural network.

Under the above-described embodiment, based on the target point cloud data, a corresponding target sparse matrix can be determined for each layer of convolution module of the neural network, so that each layer of convolution module can process the input feature map based on the target sparse matrix.

In a possible implementation manner, determining, based on the target point cloud data, a target sparse matrix corresponding to each layer of convolution modules in the neural network for detecting the target object, including: based on the target point cloud data, generating an initial sparse matrix; based on the initial sparse matrix, determining a target sparse matrix matching the target size of the feature map input to each layer of the convolution module of the neural network.

In the above embodiment, an initial sparse matrix can be generated based on the target point cloud data, and then based on the initial sparse matrix, a corresponding target sparse matrix can be determined for each layer of convolution modules of the neural network, and the target corresponding to each layer of convolution modules. The sparse matrix matches the target size of the feature map input to the convolution module of this layer, so that each layer of the convolution module can process the input feature map based on the target sparse matrix.

In a possible implementation manner, generating an initial sparse matrix based on the target point cloud data includes: determining a target area corresponding to the target point cloud data; dividing the target area into a plurality of grid areas; based on the grid area where the points of the target point cloud corresponding to the target point cloud data are located, determine the matrix element value corresponding to each grid area; based on the matrix element value corresponding to each grid area , and generate the initial sparse matrix corresponding to the target point cloud data.

Here, based on the target point cloud data, it can be judged whether there are points of the target point cloud in each grid area, and based on the judgment result, the matrix element value of each grid area can be determined. For example, if there is a target point cloud in the grid area , then the matrix element value of the grid area is 1, indicating that there is a target object in the grid area, and then based on the matrix element values corresponding to each grid area, an initial sparse matrix is generated, which is used to determine the subsequent 3D detection of the target object. Data provides data support.

In a possible implementation manner, based on the initial sparse matrix, determine a target sparse matrix matching the target size of the feature map input to the convolution module of each layer of the neural network, including any of the following: based on the The initial sparse matrix is to determine the output sparse matrix corresponding to each layer of convolution module in the neural network, and the output sparse matrix is used as the target sparse matrix; based on the initial sparse matrix, it is determined that in the neural network, each an input sparse matrix corresponding to a layer of convolution modules, and the input sparse matrix as the target sparse matrix; based on the initial sparse matrix, determine the input sparse matrix and output corresponding to each layer of convolution modules in the neural network Sparse matrix, the input sparse matrix and the output sparse matrix are fused to obtain a fused sparse matrix, and the fused sparse matrix is used as the target sparse matrix corresponding to the convolution module of this layer.

In the above-mentioned embodiment, a variety of methods are set to generate the target sparse matrix corresponding to each layer of convolution module, that is, the target sparse matrix can be an input sparse matrix, an output sparse matrix, or an input sparse matrix and an output sparse matrix. The fused sparse matrix generated by the matrix.

In a possible implementation manner, the determining, based on the initial sparse matrix, the input sparse matrix corresponding to each layer of convolution modules in the neural network includes: using the initial sparse matrix as the input sparse matrix of the neural network. The input sparse matrix corresponding to the first layer convolution module; based on the input sparse matrix corresponding to the i-1 layer convolution module, determine the feature map corresponding to the i layer convolution module and input to the i layer convolution module The input sparse matrix matching the target size of ; wherein, i is a positive integer greater than 1 and less than n+1, and n is the total number of layers of the convolution module of the neural network.

In the above method, the initial sparse matrix can be used as the input sparse matrix corresponding to the first-layer convolution module, and the input sparse matrix of each layer of convolution modules can be determined in turn, and then the target sparse matrix can be determined based on the input sparse matrix, as Subsequently, based on the target sparse matrix of each layer of convolution module, the 3D detection data of the target object is determined to provide data support.

In a possible implementation manner, the determining, based on the initial sparse matrix, the output sparse matrix corresponding to each layer of convolution modules in the neural network includes: based on the size threshold of the target object and the initial sparse matrix. Sparse matrix, determine the output sparse matrix corresponding to the neural network; based on the output sparse matrix, generate an output corresponding to the nth layer convolution module and matching the target size of the feature map input by the nth layer convolution module sparse matrix; based on the output sparse matrix corresponding to the j+1th layer convolution module, an output sparse matrix corresponding to the jth layer convolution module and matching the target size of the feature map input by the jth layer convolution module is generated, Wherein, j is a positive integer greater than or equal to 1 and less than n, and n is the total number of layers of the convolution module of the neural network.

In the above method, the output sparse matrix can be determined based on the initial sparse matrix, and the output sparse matrix of the n-th layer convolution module, . The output sparse matrix of the layer determines the target sparse matrix, which provides data support for the subsequent determination of the 3D detection data of the target object based on the target sparse matrix of each layer of convolution modules.

In a possible implementation manner, determining the three-dimensional detection data of the target object included in the target scene based on the at least one target sparse matrix and the target point cloud data includes: based on the target point cloud data, generating a target point cloud feature map corresponding to the target point cloud data; based on the target point cloud feature map and the at least one target sparse matrix, using a neural network for detecting target objects to determine the target objects included in the target scene The three-dimensional detection data, wherein, the neural network includes a multi-layer convolution module.

In a possible implementation manner, generating a target point cloud feature map corresponding to the target point cloud data based on the target point cloud data includes: for each grid area, based on the target located in the grid area The coordinate information indicated by the target point cloud data corresponding to the points of the point cloud determines the feature information corresponding to the grid area; wherein, the grid area is the target point cloud data according to a preset number of grids. The corresponding target area is divided and generated; based on the feature information corresponding to each grid area, the target point cloud feature map corresponding to the target point cloud data is generated.

In the above embodiment, based on the feature information corresponding to each grid area, a target point cloud feature map corresponding to the target point cloud data is generated, and the target point cloud feature map includes the position information of each target point cloud point, and then Based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene can be more accurately determined.

In a possible implementation manner, based on the target point cloud feature map and the at least one target sparse matrix, a neural network for detecting target objects is used to determine the three-dimensional detection data of the target objects included in the target scene, including: Based on the target sparse matrix corresponding to the first-layer convolution module in the neural network, determine the feature information to be convolved in the target point cloud feature map, and use the first-layer convolution module to analyze the target point cloud. The feature information to be convolved in the feature map is subjected to convolution processing to generate a feature map input to the second-layer convolution module; based on the target sparse matrix corresponding to the k-th layer convolution module in the neural network, determine the input to The feature information to be convoluted in the feature map of the k-th layer convolution module is to use the k-th layer convolution module of the neural network to perform convolution features in the feature map of the k-th layer convolution module. The information is subjected to convolution processing to generate a feature map input to the convolution module of the k+1 layer, where k is a positive integer greater than 1 and less than n, and n is the total number of layers of the convolution module of the neural network; based on The target sparse matrix corresponding to the nth layer convolution module in the neural network determines the feature information to be convolved in the feature map input to the nth layer convolution module, and uses the nth layer convolution of the neural network. module, which performs convolution processing on the feature information to be convolved in the feature map of the n-th layer convolution module to obtain three-dimensional detection data of the target object included in the target scene.

Here, based on the target sparse matrix of each layer of convolution module and the input feature map, the feature information to be convoluted can be determined, the feature information to be convolved is subjected to convolution processing, and the features in the feature map except the feature information to be convolved can be convoluted. Other feature information is not processed by convolution, which reduces the calculation amount of convolution processing of each layer of convolution module, improves the operation efficiency of each layer of convolution module, which can reduce the calculation volume of neural network and improve the target object's performance. detection efficiency.

In a possible implementation manner, based on the target point cloud feature map and the at least one target sparse matrix, a neural network for detecting target objects is used to determine the three-dimensional detection data of the target objects included in the target scene, including: For each layer of convolution modules in the neural network except the last layer of convolution modules, determine the layer based on the target sparse matrix corresponding to the convolution module of the layer and the feature map input to the convolution module of the layer. The convolution vector corresponding to the convolution module; based on the convolution vector corresponding to the convolution module of this layer, determine the feature map input to the convolution module of the next layer; based on the target sparse matrix and input corresponding to the convolution module of the last layer To the feature map of the convolution module of the last layer, determine the convolution vector corresponding to the convolution module of the last layer; based on the convolution vector corresponding to the convolution module of the last layer, determine the target object included in the target scene. 3D inspection data.

Here, a convolution vector corresponding to each layer of convolution module can be generated based on the target sparse matrix of each layer of convolution module and the input feature map, and the convolution vector includes the feature information to be processed in the feature map. The feature information to be processed is: the feature information in the feature map that matches the position of the three-dimensional detection data of the target object indicated in the target sparse matrix, the generated convolution vector is processed, and the feature map is processed except the pending processing. The feature information other than the feature information is not processed, which reduces the calculation amount of the convolution processing of each layer of convolution module, improves the operation efficiency of each layer of convolution module, and can reduce the calculation amount of the neural network. Improve the detection efficiency of target objects.

For descriptions of the effects of the following apparatuses, electronic devices, etc., reference may be made to the descriptions of the above-mentioned methods, which will not be repeated here.

In a second aspect, the present disclosure provides a target object detection device, comprising: an acquisition module for acquiring target point cloud data of a target scene collected by a radar device; a generation module for generating all target point cloud data based on the target point cloud data at least one target sparse matrix corresponding to the target point cloud data; the target sparse matrix is used to characterize whether there are target objects at different positions of the target scene; a determination module is used for the at least one target sparse matrix, and all The target point cloud data is used to determine the three-dimensional detection data of the target object included in the target scene.

In a third aspect, the present disclosure provides an electronic device, including a processor, a memory and a bus, the memory stores machine-readable instructions executable by the processor, and when the electronic device is running, the processor and the The memories communicate with each other through a bus, and when the machine-readable instructions are executed by the processor, the steps of the target object detection method according to the first aspect or any one of the implementation manners are executed.

In a fourth aspect, the present disclosure provides a computer-readable storage medium, where a computer program stored thereon is executed by a processor to execute the steps of the target object detection method according to the first aspect or any one of the embodiments.

In order to make the above-mentioned objects, features and advantages of the present disclosure more obvious and easy to understand, the preferred embodiments are exemplified below, and are described in detail as follows in conjunction with the accompanying drawings.

Description of drawings

In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the accompanying drawings required in the embodiments will be briefly introduced below. These drawings illustrate embodiments consistent with the present disclosure, and together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings only show some embodiments of the present disclosure, and therefore should not be regarded as limiting the scope. Other related figures are obtained from these figures.

FIG. 1 shows a schematic flowchart of a target object detection method provided by an embodiment of the present disclosure;

2 shows a schematic flowchart of a specific method for determining a target sparse matrix corresponding to each layer of convolution modules in a neural network based on target point cloud data in a target object detection method provided by an embodiment of the present disclosure;

3 shows a schematic diagram of a target area and an initial sparse matrix corresponding to the target area provided by an embodiment of the present disclosure;

FIG. 4 shows a schematic diagram of the architecture of a target object detection apparatus provided by an embodiment of the present disclosure;

FIG. 5 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.

detailed description

In order to make the purposes, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments These are only some of the embodiments of the present disclosure, but not all of the embodiments. The components of the disclosed embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure as claimed, but is merely representative of selected embodiments of the disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present disclosure.

Generally, object detection and segmentation algorithms based on convolutional neural networks organize a complex computing model through a huge amount of parameters to complete specific tasks. Such computing models often have extremely high requirements on the performance of computing devices, and practical applications There are problems such as large amount of calculation, large power consumption and high delay in the detection process of target objects, which lead to the complex, computational and time-consuming process of target object detection.

In order to solve the above problem, an embodiment of the present disclosure provides a target object detection method, which can reduce the computation amount of the neural network and improve the detection efficiency of the target object.

To facilitate understanding of the embodiments of the present disclosure, a target object detection method disclosed in the embodiments of the present disclosure will be described in detail below.

The execution body of the method may be a server or a terminal device, for example, the terminal device may be a mobile phone, a tablet computer, a vehicle-mounted computer, or the like.

FIG. 1 is a schematic flowchart of a target object detection method provided by an embodiment of the present disclosure. As shown in FIG. 1 , the method includes S101 to S103, wherein:

S101 , acquiring target point cloud data of a target scene collected by a radar device.

S102 , based on the target point cloud data, generate at least one target sparse matrix corresponding to the target point cloud data; the target sparse matrix is used to represent whether there are target objects at different positions of the target scene.

S103, based on at least one target sparse matrix and target point cloud data, determine three-dimensional detection data of the target object included in the target scene.

In the above method, at least one corresponding target sparse matrix may be generated for the obtained target point cloud data, and the target sparse matrix is used to represent whether there are target objects corresponding to the target point cloud data at different positions of the target scene, for example, in In the field of autonomous driving, the target objects can be motor vehicles, non-motor vehicles, pedestrians, obstacles, etc. around the autonomous vehicle equipped with radar devices; accordingly, based on the target sparse matrix and target point cloud data, determine the three-dimensional When detecting data, the target position of the target object in the target scene can be determined based on the target sparse matrix, so that the features corresponding to the target position can be processed, while the features corresponding to other positions in the target scene except the target position are not. processing, thereby reducing the amount of calculation required to obtain the three-dimensional detection data of the target object, and improving the detection efficiency.

For S101 : the radar device may be a laser radar, a millimeter-wave radar, or the like, and the embodiment of the present disclosure is described by taking the radar device as a laser radar device as an example. The LiDAR device collects the target point cloud data of the target scene through real-time emission scan lines. The target scene can be any scene. For example, in the field of automatic driving, the target scene can be a real-time scene encountered by a vehicle equipped with a lidar device during driving.

For S102: after acquiring the target point cloud data of the target scene, at least one target sparse matrix corresponding to the target point cloud data may be generated based on the target point cloud data. Among them, the target sparse matrix can represent whether there are target objects at different positions of the target scene.

Here, the sparse matrix may be a matrix including 0 and 1, and the element value of the sparse matrix is 0 or 1. For example, the value of the matrix element corresponding to the position where the target object exists in the target scene can be set to 1, then the position is the target position; the value of the matrix element corresponding to the position where the target object does not exist in the target scene can be set to 0.

In an optional embodiment, generating at least one target sparse matrix corresponding to the target point cloud data based on the target point cloud data may include: determining, based on the target point cloud data, the volume of each layer in the neural network used to detect the target object. The target sparse matrix corresponding to the product module.

Here, the neural network may be a trained neural network for detecting target objects. The neural network may include multiple layers of convolution modules, and each layer of convolution modules may include one layer of convolution layers. During specific implementation, a corresponding target sparse matrix may be determined for each layer of convolution modules, that is, each layer of convolution modules. layer to determine the corresponding target sparse matrix; or, the neural network may include multiple network modules (blocks), and each network module includes a multi-layer convolutional layer. During specific implementation, a corresponding target may be determined for each network module. The sparse matrix is to determine a corresponding target sparse matrix for the multi-layer convolutional layers included in the network module. Wherein, the structure of the neural network for detecting the target object can be set as required, and this is only an exemplary description.

For the trained neural network for detecting target objects, a corresponding target sparse matrix can be determined for each layer of convolution modules in the neural network based on the target point cloud data.

When training the neural network for detecting the target object, the training sample point cloud data can be obtained, and based on the training sample point cloud data, at least one sample sparse matrix corresponding to the training sample point cloud data can be generated, and then the training sample point cloud data can be generated based on the training sample. The point cloud data and the corresponding at least one sample sparse matrix are used to train the neural network, thereby obtaining a trained neural network.

In the above embodiment, based on the target point cloud data, a corresponding target sparse matrix can be determined for each layer of the convolution module of the neural network, so that each layer of the convolution module can be based on the target sparse matrix. Input feature map (feature map) to be processed.

In an optional embodiment, as shown in FIG. 2 , based on the target point cloud data, the target sparse matrix corresponding to each layer of convolution modules in the neural network is determined, which may include:

S201: Based on the target point cloud data, an initial sparse matrix is generated.

S202: Based on the initial sparse matrix, determine a target sparse matrix that matches the target size of the feature map input to the convolution module of each layer of the neural network.

In the above embodiment, an initial sparse matrix can be generated based on the target point cloud data, and then based on the initial sparse matrix, a corresponding target sparse matrix can be determined for each layer of the convolution module of the neural network, and the target corresponding to each layer of the convolution module. The sparse matrix matches the target size of the feature map input to the convolution module of this layer, so that each layer of the convolution module can process the input feature map based on the target sparse matrix.

For S201: as an optional implementation manner, based on the target point cloud data, an initial sparse matrix is generated, including:

A1: Determine a target area corresponding to the target point cloud data, and divide the target area into a plurality of grid areas according to a preset number of grids.

A2: Determine the matrix element value corresponding to each grid area based on the grid area where the point of the target point cloud corresponding to the target point cloud data is located.

A3, based on the matrix element value corresponding to each grid area, generate an initial sparse matrix corresponding to the target point cloud data.

Here, based on the target point cloud data, it can be judged whether there are points of the target point cloud in each grid area, and based on the judgment result, the matrix element value of each grid area can be determined. For example, if there is a target point cloud in the grid area , then the matrix element value of the grid area is 1, indicating that there is a target object at the location of the grid area, and then based on the matrix element values corresponding to each grid area, an initial sparse matrix is generated, which is used for subsequent determination of the target object. 3D inspection data provides data support.

Exemplarily, the target area corresponding to the target point cloud data may be: based on the position when the laser radar device acquires the target point cloud data (for example, take this position as the starting position) and the farthest distance that the laser radar device can detect (for example, , taking the longest distance as the length) to determine the obtained detection area. Among them, the target area can be determined according to the actual situation in combination with the target point cloud data.

During specific implementation, the preset number of grids may be N×M, and the target area may be divided into N×M grid areas, where N and M are positive integers. Among them, the values of N and M can be set according to actual needs.

During specific implementation, the target point cloud data includes the position information of multiple points of the target point cloud, and the grid area where each point is located can be determined based on the position information of the points, and further, for each grid area, in When there is a point corresponding to the target point cloud in the grid area, the value of the matrix element corresponding to the grid area can be 1; when there is no point corresponding to the target point cloud in the grid area, the grid area The value of the matrix element corresponding to the grid area can be 0, so the value of the matrix element corresponding to each grid area is determined.

After the matrix element value corresponding to each grid area is determined, an initial sparse matrix corresponding to the target point cloud data can be generated based on the matrix element value corresponding to each grid area, wherein the number of rows and columns of the initial sparse matrix The number corresponds to the number of grids. For example, if the number of grids is N×M, the number of rows of the initial sparse matrix is N, and the number of columns is M, that is, the initial sparse matrix is an N×M matrix.

Referring to FIG. 3 , the figure includes a laser radar device 31 . Taking the laser radar device as the center, the obtained target area 32 is divided into a plurality of grid areas according to the preset number of grids to obtain The divided grid regions 321 . Then determine the grid area where multiple points of the target point cloud corresponding to the target point cloud data are located, and set the matrix element value of the grid area where the points of the target point cloud exist (that is, the grid area with black shadows in the figure). If it is 1, the matrix element value of the grid area of the point without the target point cloud is set to 0, and the matrix element value of each grid area is obtained. Finally, based on the matrix element values corresponding to each grid region, an initial sparse matrix 33 corresponding to the target point cloud data is generated.

For S202: after the initial sparse matrix is obtained, a target sparse matrix matching the target size of the feature map input to each layer of convolution module of the neural network may be determined based on the initial sparse matrix.

As an optional embodiment, the target sparse matrix matching the target size of the feature map input to the convolution module of each layer of the neural network can be determined in the following manner:

Mode 1: Based on the initial sparse matrix, determine the output sparse matrix corresponding to each layer of convolution module in the neural network, and use the output sparse matrix as the target sparse matrix.

Method 2: Based on the initial sparse matrix, determine the input sparse matrix corresponding to each layer of convolution module in the neural network, and use the input sparse matrix as the target sparse matrix.

Method 3: Based on the initial sparse matrix, determine the input sparse matrix and output sparse matrix corresponding to each layer of convolution module in the neural network, fuse the input sparse matrix and the output sparse matrix to obtain a fused sparse matrix, and use the fused sparse matrix as the The target sparse matrix corresponding to the layer convolution module.

Here, the target sparse matrix may be obtained from the output sparse matrix, may also be obtained from the input sparse matrix, or may also be obtained by fusing the input sparse matrix and the output sparse matrix.

For the first method, this method obtains the target sparse matrix from the output sparse matrix. During specific implementation, the output sparse matrix corresponding to each layer of convolution module in the neural network may be determined based on the initial sparse data, and the output sparse matrix is the target sparse matrix. The output sparse matrix can be used to represent whether there are target objects at different positions corresponding to the target scene in the output results of each layer of convolution modules in the neural network. For example, if the output results of each layer of convolution modules in the neural network When there is a target object at position A corresponding to the target scene, in the output sparse matrix, the value of the matrix element at the position corresponding to this position A can be 1; if there is no target object at position A, the output sparse matrix , the value of the matrix element at the position corresponding to the position A may be 0.

For the second method, this method obtains the target sparse matrix from the input sparse matrix. During specific implementation, the input sparse matrix corresponding to each layer of convolution modules in the neural network may be determined based on the initial sparse data, and the input sparse matrix is the target sparse matrix. The input sparse matrix may represent whether there are target objects at different positions corresponding to the target scene in the input data of each layer of convolution modules in the neural network. For example, if there is a target object at position A corresponding to the target scene in the input data of each layer of convolution module in the neural network, then in the input sparse matrix, the value of the matrix element at the position corresponding to this position A can be 1 ; If there is no target object at the position A, in the input sparse matrix, the value of the matrix element at the position corresponding to the position A can be 0.

For mode 3, the output sparse matrix corresponding to each layer of convolution module can be determined by mode 1, and the input sparse matrix corresponding to each layer of convolution module can be determined by mode 2, and the input sparse matrix corresponding to each layer of convolution module can be determined. It is fused with the output sparse matrix to obtain a fused sparse matrix, and the fused sparse matrix is used as the target sparse matrix corresponding to the convolution module of this layer.

In specific implementation, the intersection of the input sparse matrix and the output sparse matrix can be taken to obtain a fused sparse matrix; the union of the input sparse matrix and the output sparse matrix can also be taken to obtain the fused sparse matrix. For example, if the input sparse matrix is:

If the output sparse matrix is:

Then take the intersection of the input sparse matrix and the output sparse matrix, and the obtained fusion sparse matrix is:

Then take the union of the input sparse matrix and the output sparse matrix, and the obtained fusion sparse matrix is:

In an optional embodiment, based on the initial sparse matrix, the input sparse matrix corresponding to each layer of convolution modules in the neural network is determined, which may include:

B1, take the initial sparse matrix as the input sparse matrix corresponding to the first layer convolution module of the neural network.

B2, based on the input sparse matrix corresponding to the i-1 layer convolution module, determine the input sparse matrix corresponding to the i layer convolution module and matching the target size of the feature map input by the i layer convolution module; wherein, i is a positive integer greater than 1 and less than n+1, where n is the total number of layers of the convolution module of the neural network.

Here, the initial sparse matrix can be used as the input sparse matrix corresponding to the first-layer convolution module of the neural network. The input sparse matrix corresponding to the second layer convolution module can be obtained from the input sparse matrix corresponding to the first layer convolution module, and the number of rows and columns of the input sparse matrix corresponding to the second layer convolution module is the same as the number of rows and columns input to the second layer The target size of the feature maps of the convolution module is consistent.

Exemplarily, an image expansion processing operation or an image erosion processing operation can be used to process the input sparse matrix corresponding to the first-layer convolution module to obtain a processed sparse matrix, and the number of rows and columns of the processed sparse matrix is adjusted to After the target size of the feature map input by the second-layer convolution module is matched, the input sparse matrix of the second-layer convolution module is obtained. By analogy, the input sparse matrix corresponding to the first layer convolution module, the input sparse matrix corresponding to the second layer convolution module, ..., the input sparse matrix corresponding to the nth layer convolution module (that is, the last layer of the neural network) can be obtained. input sparse matrix corresponding to the convolution module).

Exemplarily, a dilation processing range can be predetermined, and image dilation processing is performed on the input sparse matrix based on the dilation processing range to obtain a processed sparse matrix, wherein the dilation processing range can be determined based on the size threshold of the target object, or can be determined according to the size threshold of the target object. actually needs to be determined.

For example, if the input sparse matrix is:

Then the dilated sparse matrix can be:

Wherein, the above-mentioned expansion treatment process is only illustrative.

Exemplarily, the erosion process of the input sparse matrix is the inverse process of the expansion process. Specifically, the erosion process range can be predetermined, and the input sparse matrix is subjected to image erosion processing based on the erosion process range to obtain the processed sparse matrix. The corrosion processing range may be determined based on the size threshold of the target object, or may be determined according to actual needs.

For example, if the input sparse matrix is:

Then the sparse matrix after erosion processing can be:

Wherein, the above-mentioned etching treatment process is only illustrative.

During specific implementation, the number of rows and columns of the processed sparse matrix can be adjusted to a matrix matching the target size of the feature map input by the second-layer convolution module by means of up-sampling or down-sampling to obtain the second The input sparse matrix of the layer convolution module, wherein, there are various processes for adjusting the number of rows and columns of the processed sparse matrix, which are only illustrative here.

During specific implementation, the sparse degree of the sparse matrix can also be adjusted. For example, the sparse degree of the sparse matrix can be adjusted by adjusting the number of grids; or the sparse degree of the sparse matrix can also be adjusted through the erosion process. . The sparse degree of the sparse matrix is: the ratio of the number of matrix elements with a matrix element value of 1 in the sparse matrix to the total number of all matrix elements included in the sparse matrix.

In a possible implementation manner, based on the initial sparse matrix, the output sparse matrix corresponding to each layer of convolution module in the neural network is determined, which may include:

C1, based on the size threshold of the target object and the initial sparse matrix, determine the output sparse matrix corresponding to the neural network.

C2, based on the output sparse matrix, generate an output sparse matrix corresponding to the nth layer convolution module and matching the target size of the feature map input by the nth layer convolution module.

C3, based on the output sparse matrix corresponding to the j+1th layer convolution module, generate an output sparse matrix corresponding to the jth layer convolution module and matching the target size of the feature map input by the jth layer convolution module, where j is a positive integer greater than or equal to 1 and less than n, where n is the total number of layers of the convolution module of the neural network.

Here, you can first determine the expansion processing range according to the size threshold of the target object, and perform expansion processing on the initial sparse matrix based on the expansion processing range to obtain the processed sparse matrix. The processed sparse matrix is the output sparse matrix corresponding to the neural network. . For the expansion process, reference may be made to the above description, which will not be repeated here.

Using the output sparse matrix, determine the output sparse matrix of the nth layer convolution module of the neural network (that is, the last layer of the convolution module of the neural network), and so on to obtain the output sparse matrix of the n-1th layer convolution module, ... , the output sparse matrix of the second-layer convolution module, and the output sparse matrix of the first-layer convolution module.

Exemplarily, the image expansion processing operation or the image erosion processing operation can be used to process the output sparse matrix corresponding to the convolution module of the previous layer to obtain the processed sparse matrix, and the number of rows and columns of the processed sparse matrix can be obtained. After adjusting to match the target size of the feature map input by the current layer convolution module, the output sparse matrix of the current layer convolution module is obtained. For the process of determining the output sparse matrix of the convolution module of each layer, reference may be made to the above-mentioned process of determining the input sparse matrix, which will not be described in detail here.

For the case where the target sparse matrix of each convolution module of the neural network is obtained by the fusion of the input sparse matrix and the output sparse matrix, the output sparse matrix and the input sparse matrix of each convolution module of each layer can be obtained by using the above method, respectively. The obtained output sparse matrix and the input sparse matrix are fused to obtain the target sparse matrix of each convolution module.

In the above method, the output sparse matrix can be determined based on the initial sparse matrix, and the output sparse matrix of the n-th layer convolution module, . The output sparse matrix of the layer determines the target sparse matrix, which provides data support for the subsequent determination of the 3D detection data of the target object based on the target sparse matrix of the convolution module of each layer.

For S103: In specific implementation, three-dimensional detection data of the target object included in the target scene may be determined based on at least one target sparse matrix, target point cloud data, and a neural network for detecting the target object. The three-dimensional detection data includes the coordinates of the center point of the detection frame of the target object, the three-dimensional size of the detection frame, the orientation angle of the detection frame, the type of the detection frame, the confidence level of the detection frame, the ID of the target tracking, the speed and acceleration of the target object and one or more of timestamps, etc.

Here, the position of the 3D detection frame of the target object cannot exceed the position of the target area, that is, if the coordinates of the center point of the 3D detection frame are (X, Y, Z) and the dimensions are length L, width W, and height H, the following conditions are satisfied Condition: 0≤X-2/L, X+2/L<N _max , 0≤YW/2, Y+W/2<M _max , where N _max and M _max are the length and width thresholds of the target area .

In an optional embodiment, the three-dimensional detection data of the target object included in the target scene is determined based on at least one target sparse matrix and target point cloud data, including:

Step 1: Based on the target point cloud data, a target point cloud feature map corresponding to the target point cloud data is generated.

Step 2: Based on the target point cloud feature map and at least one target sparse matrix, use a neural network for detecting the target object to determine the three-dimensional detection data of the target object included in the target scene, wherein the neural network includes a multi-layer convolution module.

In specific implementation, the target point cloud data can be input into the neural network, the target point cloud data can be preprocessed, the target point cloud feature map corresponding to the target point cloud data can be generated, and then the target point cloud feature map, at least one target A sparse matrix, and a neural network, determine the three-dimensional detection data of the target object included in the target scene. Here, the feature map corresponding to the target point cloud data (ie, corresponding to the target point cloud) is simply referred to as the target point cloud feature map.

In step 1, based on the target point cloud data, a target point cloud feature map corresponding to the target point cloud data is generated, which may include:

For each grid area, the feature information corresponding to the grid area is determined based on the coordinate information indicated by the target point cloud data corresponding to the points of the target point cloud located in the grid area; The number of grids, generated by dividing the target area corresponding to the target point cloud data.

Based on the feature information corresponding to each grid area, a target point cloud feature map corresponding to the target point cloud data is generated.

For each grid area, if there are points of the target point cloud in the grid area, the coordinate information indicated by the target point cloud data corresponding to each point constitutes the feature information corresponding to the grid area; When there is no point in the target point cloud, the feature information of the grid area can be 0.

Based on the feature information corresponding to each grid area, the target point cloud feature map corresponding to the target point cloud data is generated. Among them, the size of the target point cloud feature map can be N×M×C, the size of the target point cloud feature map N×M is consistent with the size of the target sparse matrix of the first-layer convolution module, and the C It can be the maximum number of points of the target point cloud included in each grid area. For example, if the grid area A includes the largest number of points in the target point cloud in each grid area, There are 50 points in the target point cloud, and the value of C is 50, that is, the target point cloud feature map includes 50 feature maps with a size of N×M, and each feature map includes at least one point in the target point cloud. Coordinate information .

In the above embodiment, based on the feature information corresponding to each grid area, a target point cloud feature map corresponding to the target point cloud data is generated, and the target point cloud feature map includes the position information of each point of the target point cloud, and then Based on the target point cloud feature map and the at least one target sparse matrix, the three-dimensional detection data of the target object included in the target scene can be more accurately determined.

In step 2, the three-dimensional detection data of the target object included in the target scene may be determined based on the target point cloud feature map, at least one target sparse matrix, and the neural network.

During specific implementation, the three-dimensional detection data of the target object included in the target scene can be determined in the following two ways:

Method 1: Based on the target point cloud feature map and at least one target sparse matrix, determine the three-dimensional detection data of the target object included in the target scene, including:

1. Based on the target sparse matrix corresponding to the first-layer convolution module in the neural network, determine the feature information to be convoluted in the target point cloud feature map, and use the first-layer convolution module to perform convolution on the target point cloud feature map. Convolution processing is performed on the product feature information to generate a feature map that is input to the second-layer convolution module.

2. Based on the target sparse matrix corresponding to the k-th layer convolution module in the neural network, determine the feature information to be convolved in the target point cloud feature map input to the k-th layer convolution module, and use the k-th layer convolution of the neural network. module, which performs convolution processing on the feature information to be convolved in the target point cloud feature map of the kth layer convolution module, and generates a feature map input to the k+1th layer convolution module, where k is greater than 1 and less than A positive integer of n, where n is the total number of layers of the convolution module of the neural network.

3. Based on the target sparse matrix corresponding to the nth layer convolution module in the neural network, determine the feature information to be convolved in the target point cloud feature map input to the nth layer convolution module, and use the nth layer convolution of the neural network. The module performs convolution processing on the feature information to be convolved in the target point cloud feature map of the nth layer convolution module to obtain the three-dimensional detection data of the target object included in the target scene.

In the above embodiment, the target sparse matrix of the first-layer convolution module can be used to determine the feature information to be convolved in the target point cloud feature map input to the first-layer convolution module. Specifically, the target position with the matrix value of 1 in the target sparse matrix may be determined, and the feature information of the position corresponding to the target position in the target point cloud feature map is determined as the feature information to be convolved.

Then, the convolution module of the first layer is used to perform convolution processing on the feature information to be convolved in the feature map of the target point cloud, and the feature map input to the convolution module of the second layer is generated. Then use the target sparse matrix of the second-layer convolution module to determine the information to be convolved in the feature map input to the second-layer convolution module, and use the second-layer convolution module to analyze the features of the second-layer convolution module The feature information to be convolved in the figure is processed by convolution to generate a feature map input to the third-layer convolution module, and so on to obtain the input to the n-th layer convolution module (the last layer of the convolution module in the neural network). In the feature map, the three-dimensional detection data of the target object included in the target scene is obtained by determining the to-be-convolutional information of the n-th layer convolution module, and performing convolution processing on the to-be-convolved information of the n-th layer convolution module.

Here, the feature information to be convoluted can be determined based on the target sparse matrix of each layer of convolution module and the input feature map, and the feature information to be convolved is subjected to convolution processing. Convolution processing is not performed on other feature information of the neural network, which reduces the calculation amount of convolution processing of each layer of convolution module, improves the operation efficiency of each layer of convolution module, which can reduce the calculation amount of the neural network and improve the target object. detection efficiency.

Method 2: Based on the target point cloud feature map and at least one target sparse matrix, determine the three-dimensional detection data of the target object included in the target scene, including:

1. For each layer of convolution modules in the neural network except the last layer of convolution modules, determine the layer based on the target sparse matrix corresponding to the convolution module of this layer and the feature map input to the convolution module of this layer. The convolution vector corresponding to the convolution module; based on the convolution vector corresponding to the convolution module of this layer, the feature map input to the convolution module of the next layer is determined.

2. Based on the target sparse matrix corresponding to the convolution module of the last layer and the feature map input to the convolution module of the last layer, determine the convolution vector corresponding to the convolution module of the last layer; based on the volume corresponding to the convolution module of the last layer The product vector determines the three-dimensional detection data of the target object included in the target scene.

In the above embodiment, the convolution vector corresponding to the convolution module of each layer may also be determined based on the target input matrix corresponding to the convolution module of each layer and the feature map input to the convolution module of the layer. For example, for the first-layer convolution module, the target position with the matrix value of 1 in the target sparse matrix of the first-layer convolution module can be determined, and the feature information of the position corresponding to the target position in the target point cloud feature map can be determined to extract The feature information corresponding to the target position constitutes the convolution vector corresponding to the first-layer convolution module.

Further, the img2col and col2img technologies can be used to perform matrix multiplication operations on the corresponding convolution vectors through the first-layer convolution module to obtain a feature map input to the second convolution module. Based on the same processing process, the feature map input to the convolution module of the last layer can be obtained. Based on the target sparse matrix and feature map corresponding to the convolution module of the last layer, the convolution vector corresponding to the convolution module of the last layer is determined. The convolution vector corresponding to the convolution module of the last layer is processed to determine the three-dimensional detection data of the target object included in the target scene.

Those skilled in the art can understand that in the above method of the specific implementation, the writing order of each step does not mean a strict execution order but constitutes any limitation on the implementation process, and the specific execution order of each step should be based on its function and possible Internal logic is determined.

Based on the same concept, an embodiment of the present disclosure also provides a target object detection apparatus. Referring to FIG. 4 , a schematic diagram of the architecture of the target object detection apparatus provided by the embodiment of the present disclosure includes an acquisition module 401 , a generation module 402 and a determination Module 403.

an acquisition module 401, configured to acquire target point cloud data of a target scene collected by a radar device;

A generating module 402, configured to generate at least one target sparse matrix corresponding to the target point cloud data based on the target point cloud data; the target sparse matrix is used to represent whether there are target objects at different positions of the target scene;

The determining module 403 is configured to determine the three-dimensional detection data of the target object included in the target scene based on the at least one target sparse matrix and the target point cloud data.

In a possible implementation manner, the generating module 402, when generating at least one target sparse matrix corresponding to the target point cloud data based on the target point cloud data, is configured to: based on the target point cloud data, A target sparse matrix corresponding to each layer of convolution modules in the neural network for detecting the target object is determined.

In a possible implementation manner, when the generating module 402 determines, based on the target point cloud data, the target sparse matrix corresponding to each layer of convolution modules in the neural network for detecting the target object, for: generating an initial sparse matrix based on the target point cloud data; and determining a target sparse matrix matching the target size of the feature map input to each layer of convolution module of the neural network based on the initial sparse matrix.

In a possible implementation manner, the generating module 402, when generating an initial sparse matrix based on the target point cloud data, is used for:

determining the target area corresponding to the target point cloud data, and dividing the target area into a plurality of grid areas according to a preset number of grids;

Determine the matrix element value corresponding to each grid area based on the grid area where the point of the target point cloud corresponding to the target point cloud data is located;

Based on the matrix element value corresponding to each grid area, an initial sparse matrix corresponding to the target point cloud data is generated.

In a possible implementation, when the generation module 402 determines a target sparse matrix matching the target size of the feature map input to each layer of the convolution module of the neural network based on the initial sparse matrix, use At:

Based on the initial sparse matrix, determine the output sparse matrix corresponding to each layer of convolution module in the neural network, and use the output sparse matrix as the target sparse matrix; or,

Based on the initial sparse matrix, determine the input sparse matrix corresponding to each layer of convolution module in the neural network, and use the input sparse matrix as the target sparse matrix; or,

Based on the initial sparse matrix, determine the input sparse matrix and the output sparse matrix corresponding to each layer of convolution module in the neural network, and fuse the input sparse matrix and the output sparse matrix to obtain a fused sparse matrix. The fusion sparse matrix is used as the target sparse matrix corresponding to the convolution module of this layer.

In a possible implementation manner, when determining the input sparse matrix corresponding to each layer of convolution module in the neural network based on the initial sparse matrix, the generating module 402 is configured to: generate the initial sparse matrix as the input sparse matrix corresponding to the convolution module of the first layer of the neural network; based on the input sparse matrix corresponding to the convolution module of the i-1 layer, determine the volume corresponding to the convolution module of the i-th layer and the volume of the i-th layer The input sparse matrix matching the target size of the feature map input by the product module; wherein, i is a positive integer greater than 1 and less than n+1, and n is the total number of layers of the convolution module of the neural network.

In a possible implementation, the generation module 402, when determining, based on the initial sparse matrix, a target sparse matrix that matches the target size of the feature map input by the convolution module of each layer of the neural network, uses In: determining the output sparse matrix corresponding to the neural network based on the size threshold of the target object and the initial sparse matrix;

Based on the output sparse matrix, an output sparse matrix corresponding to the nth layer convolution module and matching the target size of the feature map input by the nth layer convolution module is generated;

Based on the output sparse matrix corresponding to the j+1th layer convolution module, an output sparse matrix corresponding to the jth layer convolution module and matching the target size of the feature map input by the jth layer convolution module is generated, where j is a positive integer greater than or equal to 1 and less than n, where n is the total number of layers of the convolution module of the neural network.

In a possible implementation manner, when determining the three-dimensional detection data of the target object included in the target scene based on the at least one target sparse matrix and the target point cloud data, the determining module 403 uses At:

generating, based on the target point cloud data, a target point cloud feature map corresponding to the target point cloud data;

Based on the target point cloud feature map and the at least one target sparse matrix, a neural network for detecting target objects is used to determine three-dimensional detection data of the target objects included in the target scene, wherein the neural network includes multiple layers convolution module.

In a possible implementation manner, the determining module 403, when generating the target point cloud feature map corresponding to the target point cloud data based on the target point cloud data, is used for:

For each grid area, the feature information corresponding to the grid area is determined based on the coordinate information indicated by the target point cloud data corresponding to the points of the target point cloud located in the grid area; wherein, the grid area The area is generated by dividing the target area corresponding to the target point cloud data according to the preset number of grids;

In a possible implementation manner, the determining module 403 determines, based on the target point cloud feature map and the at least one target sparse matrix, the target object included in the target scene by using a neural network for detecting target objects. When the 3D inspection data of , is used for:

Based on the target sparse matrix corresponding to the first-layer convolution module in the neural network, determine the feature information to be convolved in the target point cloud feature map, and use the first-layer convolution module to analyze the target point cloud. The feature information to be convolved in the feature map is subjected to convolution processing to generate a feature map input to the second-layer convolution module;

Based on the target sparse matrix corresponding to the kth layer convolution module in the neural network, determine the feature information to be convolved in the feature map input to the kth layer convolution module, and use the kth layer volume of the neural network. The product module performs convolution processing on the feature information to be convolved in the feature map of the kth layer convolution module, and generates a feature map input to the k+1th layer convolution module, where k is greater than 1 and less than A positive integer of n, n is the total number of layers of the convolution module of the neural network;

Based on the target sparse matrix corresponding to the nth layer convolution module in the neural network, determine the feature information to be convolved in the feature map input to the nth layer convolution module, and use the nth layer volume of the neural network. The product module performs convolution processing on the feature information to be convolved in the feature map of the nth layer convolution module to obtain the three-dimensional detection data of the target object included in the target scene.

For each layer of convolution modules in the neural network except the last layer of convolution modules, determine the layer based on the target sparse matrix corresponding to the convolution module of the layer and the feature map input to the convolution module of the layer. The convolution vector corresponding to the convolution module; based on the convolution vector corresponding to the convolution module of this layer, determine the feature map input to the convolution module of the next layer; based on the target sparse matrix and input corresponding to the convolution module of the last layer To the feature map of the convolution module of the last layer, determine the convolution vector corresponding to the convolution module of the last layer; based on the convolution vector corresponding to the convolution module of the last layer, determine the target object included in the target scene. 3D inspection data.

In some embodiments, the functions or templates included in the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments. For specific implementation, reference may be made to the above method embodiments. For brevity, here No longer.

Based on the same technical concept, referring to FIG. 5 , an embodiment of the present disclosure further provides an electronic device including a processor 501 , a memory 502 and a bus 503 . Among them, the memory 502 is used to store execution instructions, including the memory 5021 and the external memory 5022; the memory 5021 here is also called the internal memory, which is used to temporarily store the operation data in the processor 501 and the data exchanged with the external memory 5022 such as the hard disk, The processor 501 exchanges data with the external memory 5022 through the memory 5021. When the electronic device 500 is running, the processor 501 communicates with the memory 502 through the bus 503, so that the processor 501 executes the following instructions:

Obtain the target point cloud data of the target scene collected by the radar device;

Based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data is generated; the target sparse matrix is used to represent whether there are target objects at different positions of the target scene;

Based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene is determined.

In addition, an embodiment of the present disclosure further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the target object detection method described in the above method embodiments is executed.

The computer program product of the target object detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program codes, and the instructions included in the program codes can be used to execute the target object detection method described in the above method embodiments. For details, refer to the above method embodiments, which will not be repeated here.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, for the specific working process of the system and device described above, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. The apparatus embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some communication interfaces, indirect coupling or communication connection of devices or units, which may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium. Based on such understanding, the technical solutions of the present disclosure can be embodied in the form of software products in essence, or the parts that contribute to the prior art or the parts of the technical solutions. The computer software products are stored in a storage medium, including Several instructions are used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in various embodiments of the present disclosure. The aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other media that can store program codes .

The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art who is familiar with the technical scope of the present disclosure can easily think of changes or substitutions, which should be covered within the scope of the present disclosure. within the scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to the protection scope of the claims.

Claims

A target object detection method, comprising:

Obtain the target point cloud data of the target scene collected by the radar device;

Based on the target point cloud data, at least one target sparse matrix corresponding to the target point cloud data is generated; the target sparse matrix is used to represent whether there are target objects at different positions of the target scene;

Based on the at least one target sparse matrix and the target point cloud data, three-dimensional detection data of the target object included in the target scene is determined.
The method according to claim 1, wherein the generating at least one target sparse matrix corresponding to the target point cloud data based on the target point cloud data comprises:

Based on the target point cloud data, a target sparse matrix corresponding to each layer of convolution modules in the neural network for detecting the target object is determined.
The method according to claim 2, wherein the determining, based on the target point cloud data, a target sparse matrix corresponding to each layer of convolution modules in the neural network for detecting the target object, comprising:

generating an initial sparse matrix based on the target point cloud data;

Based on the initial sparse matrix, a target sparse matrix that matches the target size of the feature map input to each layer of convolution modules of the neural network is determined.
The method according to claim 3, wherein the generating an initial sparse matrix based on the target point cloud data comprises:

determining the target area corresponding to the target point cloud data;

Dividing the target area into a plurality of grid areas according to a preset number of grids;

Determine the matrix element value corresponding to each grid area based on the grid area where the point of the target point cloud corresponding to the target point cloud data is located;

Based on the matrix element value corresponding to each grid area, an initial sparse matrix corresponding to the target point cloud data is generated.
The method according to claim 3 or 4, wherein, based on the initial sparse matrix, the target sparse matrix that matches the target size of the feature map input to each layer of convolution module of the neural network is determined , including any of the following:

Based on the initial sparse matrix, determine the output sparse matrix corresponding to each layer of convolution module in the neural network, and use the output sparse matrix as the target sparse matrix;

Based on the initial sparse matrix, determine the input sparse matrix corresponding to each layer of convolution module in the neural network, and use the input sparse matrix as the target sparse matrix;

Based on the initial sparse matrix, determine the input sparse matrix and the output sparse matrix corresponding to each layer of convolution module in the neural network, and fuse the input sparse matrix and the output sparse matrix to obtain a fused sparse matrix. The fusion sparse matrix is used as the target sparse matrix corresponding to the convolution module of this layer.
The method according to claim 5, wherein the determining, based on the initial sparse matrix, the input sparse matrix corresponding to each layer of convolution modules in the neural network comprises:

Using the initial sparse matrix as the input sparse matrix corresponding to the first-layer convolution module of the neural network;

Based on the input sparse matrix corresponding to the i-1 layer convolution module, determine the input sparse matrix corresponding to the i layer convolution module and matching the target size of the feature map input by the i layer convolution module, where i is a positive integer greater than 1 and less than n+1, where n is the total number of layers of the convolution module of the neural network.
method according to claim 5, is characterized in that, described based on described initial sparse matrix, determine the output sparse matrix corresponding to each layer convolution module in described neural network, including:

Determine the output sparse matrix corresponding to the neural network based on the size threshold of the target object and the initial sparse matrix;

Based on the output sparse matrix, an output sparse matrix corresponding to the nth layer convolution module and matching the target size of the feature map input by the nth layer convolution module is generated;

Based on the output sparse matrix corresponding to the j+1th layer convolution module, an output sparse matrix corresponding to the jth layer convolution module and matching the target size of the feature map input by the jth layer convolution module is generated, where j is a positive integer greater than or equal to 1 and less than n, where n is the total number of layers of the convolution module of the neural network.
The method according to any one of claims 1 to 7, wherein the three-dimensional image of the target object included in the target scene is determined based on the at least one target sparse matrix and the target point cloud data Inspection data, including:

generating, based on the target point cloud data, a target point cloud feature map corresponding to the target point cloud data;

Based on the target point cloud feature map and the at least one target sparse matrix, a neural network for detecting the target object is used to determine three-dimensional detection data of the target object included in the target scene, wherein the neural network It includes a multi-layer convolution module.
The method according to claim 8, wherein the generating a target point cloud feature map corresponding to the target point cloud data based on the target point cloud data comprises:

For each grid area, the feature information corresponding to the grid area is determined based on the coordinate information indicated by the target point cloud data corresponding to the points of the target point cloud located in the grid area; wherein, the grid area The area is generated by dividing the target area corresponding to the target point cloud data according to the preset number of grids;

Based on the feature information corresponding to each grid area, a target point cloud feature map corresponding to the target point cloud data is generated.
The method according to claim 8 or 9, characterized in that, based on the target point cloud feature map and the at least one target sparse matrix, using a neural network for detecting the target object, determine whether the target scene is in the target scene. The three-dimensional detection data of the target object included, including:

Based on the target sparse matrix corresponding to the first-layer convolution module in the neural network, determine the feature information to be convolved in the target point cloud feature map, and use the first-layer convolution module to analyze the target point cloud. The feature information to be convolved in the feature map is subjected to convolution processing to generate a feature map input to the second-layer convolution module;

Based on the target sparse matrix corresponding to the kth layer convolution module in the neural network, determine the feature information to be convolved in the feature map input to the kth layer convolution module, and use the kth layer volume of the neural network. The product module performs convolution processing on the feature information to be convolved in the feature map of the kth layer convolution module, and generates a feature map input to the k+1th layer convolution module, where k is greater than 1 and less than A positive integer of n, n is the total number of layers of the convolution module of the neural network;

Based on the target sparse matrix corresponding to the nth layer convolution module in the neural network, determine the feature information to be convolved in the feature map input to the nth layer convolution module, and use the nth layer volume of the neural network. The product module performs convolution processing on the feature information to be convolved in the feature map of the nth layer convolution module to obtain the three-dimensional detection data of the target object included in the target scene.
The method according to claim 8 or 9, characterized in that, based on the target point cloud feature map and the at least one target sparse matrix, using a neural network for detecting the target object, determine whether the target scene is in the target scene. The three-dimensional detection data of the target object included, including:

For each layer of convolution modules in the neural network except the last layer of convolution modules, determine the layer based on the target sparse matrix corresponding to the convolution module of the layer and the feature map input to the convolution module of the layer. The convolution vector corresponding to the convolution module; based on the convolution vector corresponding to the convolution module of this layer, determine the feature map input to the convolution module of the next layer;

Based on the target sparse matrix corresponding to the convolution module of the last layer and the feature map input to the convolution module of the last layer, the convolution vector corresponding to the convolution module of the last layer is determined; based on the volume corresponding to the convolution module of the last layer The product vector is used to determine the three-dimensional detection data of the target object included in the target scene.
A target object detection device, comprising:

an acquisition module for acquiring the target point cloud data of the target scene collected by the radar device;

A generating module, configured to generate at least one target sparse matrix corresponding to the target point cloud data based on the target point cloud data; the target sparse matrix is used to represent whether there are target objects at different positions of the target scene;

A determination module, configured to determine three-dimensional detection data of the target object included in the target scene based on the at least one target sparse matrix and the target point cloud data.
An electronic device, comprising a processor, a memory and a bus, the memory stores machine-readable instructions executable by the processor, when the electronic device is running, the processor and the memory communicate through the bus, When the machine-readable instructions are executed by the processor, the target object detection method according to any one of claims 1 to 11 is performed.
A computer-readable storage medium on which a computer program is stored, the computer program executes the target object detection method according to any one of claims 1 to 11 when the computer program is run by a processor.