CN116863433A - Target detection method based on point cloud sampling and weighted fusion and related equipment - Google Patents

Target detection method based on point cloud sampling and weighted fusion and related equipment

Info

Publication number: CN116863433A (application CN202311131605.XA; granted as CN116863433B)
Authority: CN (China)
Prior art keywords: point, column, sampling, point cloud, characteristic
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 李强, 钟勋利, 黄磊, 孙维泽, 叶恩福
Current and original assignee: Shenzhen University
Application filed by Shenzhen University


Classifications

    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 2201/07: Target detection
    • Y02A 90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a target detection method based on point cloud sampling and weighted fusion, and related equipment. The method comprises the following steps: selecting valid points in each volume column with an efficient point cloud sampling method; extracting key point information with a volume-column feature learning network that applies weighted fusion; and converting the processed features into a pseudo image and performing target detection with a 2D convolutional neural network and a detection head. By introducing the efficient point cloud sampling method and triple weighted fusion, the invention effectively alleviates the problem of excessive invalid points and significantly improves small-target detection performance without sacrificing detection speed.

Description

Target detection method based on point cloud sampling and weighted fusion and related equipment
Technical Field
The invention relates to the technical field of target detection, in particular to a target detection method based on point cloud sampling and weighted fusion and related equipment.
Background
Autonomous driving is a complex unmanned system that relies on on-board sensors to perceive the surrounding environment and make control decisions. To achieve the decision-making and control required for autonomous driving, processing of the sensor data is critical. Sensors such as lidar and cameras are typically used to perceive the surrounding environment and acquire 3D semantic information about objects; this information is the basis on which an autonomous driving system assists the vehicle in making accurate decisions and controls.
Lidar-based 3D object detection is a key environment-perception technology for autonomous driving: point cloud data are acquired in real time and processed by a neural network to obtain the 3D semantic information of objects, thereby achieving target detection.
In the field of autonomous driving, voxel-based 3D object detection methods are widely adopted. These methods convert unordered point cloud data into an ordered voxel representation and extract voxel features with a three-dimensional convolutional network to realize 3D object detection. The voxel representation preserves the shape information of the point cloud and effectively improves the processing speed of the network. As a representative point-cloud voxelization method, PointPillars is the most widely used 3D object detection algorithm in industry. PointPillars first converts the point cloud into volume columns (i.e., the three-dimensional point cloud is divided on the XY plane by a grid of a certain cell size, and each grid cell, representing a vertical region, is called a volume column, or pillar), then performs feature extraction through a 2D convolutional network with a feature pyramid structure, and finally uses a single-stage detection head (Single Shot MultiBox Detector) to realize 3D target detection. The method effectively reduces the complexity of processing point cloud data while greatly improving detection speed without losing accuracy.
Currently, PointPillars uses random sampling when converting the point cloud into volume columns, ensuring that the number of points in each column is the same, and performs zero-sample filling when points are lacking. In addition, each point is given the same weight during column feature extraction. However, because point cloud density varies and different targets contain different numbers of points, each volume column contains a large number of invalid points, which limits detection performance for small targets (i.e., pedestrians and cyclists).
Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention
The invention mainly aims to provide a target detection method, system, terminal, and computer-readable storage medium based on point cloud sampling and weighted fusion, and aims to solve the prior-art problem that target detection suffers from too many invalid points, so that small targets cannot be effectively detected.
In order to achieve the above object, the present invention provides a target detection method based on point cloud sampling and weighted fusion, which includes the following steps:
inputting point cloud data, dividing the point cloud data into a plurality of volume columns with the same size, randomly sampling or filling zero samples into the point cloud of each volume column, expanding the dimension of each point cloud, counting the number of effective points in each volume column, and selecting a representative point cloud subset in each volume column according to the number and spatial distribution of the effective points;
For any input body column, carrying out point weighting operation on the body column to obtain point weights, encoding channel characteristics to obtain characteristic weights, multiplying the point weights and the characteristic weights to obtain a combined weight matrix, multiplying the body column and the combined weight matrix element by element to obtain a body column subjected to point weighting and characteristic weighting fusion, splicing the original body column, the body column subjected to point weighting and characteristic weighting fusion, the central point coordinates of the body column and the average value of all sampling point coordinates contained in the body column to obtain the input of a body column weighting network, calculating the body column weights of two branches, adding the body column weights calculated by the two branches to obtain final body column weights, carrying out body column weighting on the input of the body column weighting network to obtain triple weighting fusion characteristics, carrying out dimension ascending by using two linear layers, and adding the maximum pooling characteristics and the average pooling characteristics to form final body column characteristic representation;
and converting the body column characteristic representation into a pseudo image characteristic, obtaining characteristic diagrams of different scales by the pseudo image characteristic through a convolution neural network of a characteristic pyramid structure, and inputting the characteristic diagrams into a detection head so as to convert the characteristic diagrams into an output result of target detection.
Optionally, in the target detection method based on point cloud sampling and weighted fusion, the input point cloud data divides the point cloud data into a plurality of volume columns with the same size, randomly samples or fills zero samples into the point cloud of each volume column, expands the dimension of each point cloud, counts the number of effective points in each volume column, and selects a representative point cloud subset in each volume column according to the number and spatial distribution of the effective points, including:
a frame of input point cloud data V consisting of M points is expressed as V = {p_1, p_2, …, p_M}, and each point is denoted p_i = (x_i, y_i, z_i, r_i), where x_i, y_i, z_i represent the coordinates of the point and r_i represents its reflectivity;
dividing the point cloud data V into L volume columns with the same size in a three-dimensional space;
for each column, randomly sampling if the number of the point clouds exceeds N, and filling by using zero samples if the number of the point clouds is less than N;
expanding the dimension of each point to obtain p_i = (x_i, y_i, z_i, r_i, x_c, y_c, z_c, x_p, y_p), where x_c, y_c, z_c represent the offsets of the point from the mean of all points in the volume column and x_p, y_p represent the offsets of the point from the center of the volume column, generating a three-dimensional tensor of dimension L × N × 9;
counting the number of the effective points for N points in each body column, and recording the position index of the effective points;
Based on the Z coordinates of the points, selecting the point whose Z coordinate is non-zero and closest to the median as the initial point, so as to ensure that the initial point is a valid point that best represents the key information of the target;
sampling a point cloud subset containing K points, starting from the initial point, with a farthest point sampling algorithm;
and according to the number of the effective points in the volume column, carrying out validity evaluation on the sampled point cloud subset, and if invalid sampling position indexes appear, replacing the sampled effective points.
Optionally, in the target detection method based on point cloud sampling and weighted fusion, sampling a point cloud subset containing K points from the initial point with the farthest point sampling algorithm specifically includes:
selecting a point as an initial point, and adding the initial point into the sampling point set;
calculating the distance from all other points to the selected sampling point, and finding a point farthest from the selected sampling point set;
adding the furthest point to the set of sampling points;
repeating the calculation of the distances from all remaining points to the selected sampling points and the selection of the farthest point, adding it to the sampling point set, until the sampling point set reaches the required number K.
Optionally, in the target detection method based on point cloud sampling and weighted fusion, the validity evaluation is performed on a point cloud subset obtained by sampling, specifically:
and judging whether the number of sampling points is larger than the number of effective points or not, and judging whether zero sampling points appear in the sampling points or not.
Optionally, in the target detection method based on point cloud sampling and weighted fusion, for any input volume column, performing point weighting operation on the volume column to obtain point weights, encoding channel characteristics to obtain feature weights, multiplying the point weights and the feature weights to obtain a combined weight matrix, multiplying the volume column and the combined weight matrix element by element to obtain a volume column after point weighting and feature weighting fusion, splicing an original volume column, the volume column after point weighting and feature weighting fusion, a center point coordinate of the volume column and a mean value of sampling point coordinates contained in the volume column to obtain an input of a volume column weighting network, calculating the volume column weights of two branches, adding the volume column weights calculated by the two branches to obtain a final volume column weight, performing the volume column weighting on the input of the volume column weighting network to obtain a triple weighted fusion feature, using two linear layers to perform dimension lifting, and adding the maximum pooling feature and the average pooling feature to form a final volume column feature representation, and the method specifically includes:
If the input volume columns are expressed as P ∈ R^{L×K×C}, where L represents the number of volume columns, R represents the set of real numbers, and each column P_i ∈ R^{K×C} is a two-dimensional matrix of K rows and C columns, K representing the number of sampling points of each column and C representing the feature dimension of each point;
given arbitrary body columnK sample point aggregation features in the column obtained by maximum pooling +.>
Global feature encoding is performed using two 1×1 convolutions: U_p = C(δ(C(F_p))), where C represents a 1×1 convolution operation, δ represents the ReLU activation function, and U_p represents the point-by-point weights of P_i;
obtaining the feature of each channel F_c ∈ R^{1×C} through max pooling: for the K sampling points in a volume column, each feature of the points is one channel, and the maximum of that feature over the K sampling points is selected as the feature of the channel;
each channel correlation is encoded using two 1×1 convolutions: U_c = C(δ(C(F_c))), where U_c represents the feature weights of P_i;
multiplying the point weights and the feature weights element by element to obtain the combined weight matrix W: W = σ(U_p) ⊙ σ(U_c), where σ represents the sigmoid function and ⊙ represents element-by-element multiplication;
multiplying P_i and W element by element to weight both the point and feature dimensions, obtaining the volume column P̃_i = P_i ⊙ W after point-weighted and feature-weighted fusion;
Given four input tensors, namely the original volume columns P, the columns P̃ after weighted fusion of points and features, the center-point coordinates of the columns P_ctr, and the mean of the coordinates of the sampling points contained in each column P_mean, splicing the four input tensors in the third dimension to obtain the input Q of the volume-column weighting network;
The volume-column weights are calculated by a first branch and a second branch: the first branch performs global feature aggregation through a max pooling layer and the second branch through an average pooling layer, each aggregated feature is processed through two shared fully connected layers, and the volume-column weight of each branch is learned through a fully connected layer and an activation layer: S_b = σ(FC(δ(FC(Pool_b(Q))))), b ∈ {1, 2}, where FC represents a fully connected layer, Pool_1 = MaxPool represents the max pooling layer, Pool_2 = AvgPool represents the average pooling layer, and S_b represents the volume-column weight of branch b;
adding the volume-column weights calculated by the first branch and the second branch to obtain the final volume-column weight, and performing volume-column weighting on the input Q to obtain the triple weighted fusion feature Z = Q ⊙ (S_1 + S_2), where S_1 and S_2 represent the volume-column weights of the first branch and the second branch, respectively;
the two linear layers are used to raise the feature dimension, and the max pooling feature and the average pooling feature are added to form the final volume-column feature representation.
Optionally, in the target detection method based on point cloud sampling and weighted fusion, the converting the volumetric column feature representation into a pseudo image feature, obtaining feature graphs of different scales by a convolutional neural network of a feature pyramid structure from the pseudo image feature, and inputting the feature graphs to a detection head to convert the feature graphs into an output result of target detection, which specifically includes:
Projecting the volume-column feature representation back to its original position according to the coordinates of the column center point to obtain a tensor of shape C_out × H × W, where C_out represents the feature dimension of a volume column and H and W represent the height and width of the pseudo image, respectively;
the pseudo image features are subjected to 2D convolution neural network of a feature pyramid structure to obtain feature graphs of different scales;
and inputting the characteristic map to a detection head, wherein the detection head consists of a group of convolution layers and full connection layers, and converting the characteristic map into an output result of target detection based on the detection head.
Optionally, the target detection method based on point cloud sampling and weighted fusion, wherein the output result includes prediction of a target class and position information of the target.
In addition, in order to achieve the above object, the present invention further provides a target detection system based on point cloud sampling and weighted fusion, where the target detection system based on point cloud sampling and weighted fusion includes:
the point cloud sampling module is used for inputting point cloud data, dividing the point cloud data into a plurality of volume columns with the same size, randomly sampling or filling zero samples into the point clouds of each volume column, expanding the dimension of each point cloud, counting the number of effective points in each volume column, and selecting a representative point cloud subset in each volume column according to the number and the spatial distribution of the effective points;
The weighting fusion module is used for carrying out point weighting operation on any input body column to obtain point weights, encoding channel characteristics to obtain characteristic weights, multiplying the point weights and the characteristic weights to obtain a combined weight matrix, multiplying the body column and the combined weight matrix element by element to obtain a body column subjected to point weighting and characteristic weighting fusion, splicing an original body column, a body column subjected to point weighting and characteristic weighting fusion, a central point coordinate of the body column and a mean value of sampling point coordinates contained in the body column to obtain an input of a body column weighting network, calculating the body column weights of two branches, adding the body column weights calculated by the two branches to obtain final body column weights, carrying out body column weighting on the input of the body column weighting network to obtain a triple weighting fusion characteristic, carrying out dimension lifting by using two linear layers, and adding the maximum pooling characteristic and the average pooling characteristic to form a final body column characteristic representation;
the object detection module is used for converting the body column characteristic representation into a pseudo image characteristic, obtaining characteristic diagrams with different scales through a convolution neural network with a characteristic pyramid structure by the pseudo image characteristic, and inputting the characteristic diagrams into a detection head so as to convert the characteristic diagrams into an output result of object detection.
In addition, in order to achieve the above object, the present invention also provides a computer readable storage medium storing a target detection program based on point cloud sampling and weighted fusion, which when executed by a processor, implements the steps of the target detection method based on point cloud sampling and weighted fusion as described above.
According to the method, point cloud data are input, the point cloud data are divided into a plurality of volume columns with the same size, point clouds of each volume column are randomly sampled or zero sample filling is carried out, the dimension of each point cloud is expanded, the number of effective points in each volume column is counted, and a representative point cloud subset is selected in each volume column according to the number and the spatial distribution of the effective points; for any input body column, carrying out point weighting operation on the body column to obtain point weights, encoding channel characteristics to obtain characteristic weights, multiplying the point weights and the characteristic weights to obtain a combined weight matrix, multiplying the body column and the combined weight matrix element by element to obtain a body column subjected to point weighting and characteristic weighting fusion, splicing the original body column, the body column subjected to point weighting and characteristic weighting fusion, the central point coordinates of the body column and the average value of all sampling point coordinates contained in the body column to obtain the input of a body column weighting network, calculating the body column weights of two branches, adding the body column weights calculated by the two branches to obtain final body column weights, carrying out body column weighting on the input of the body column weighting network to obtain triple weighting fusion characteristics, carrying out dimension ascending by using two linear layers, and adding the maximum pooling characteristics and the average pooling characteristics to form final body column characteristic representation; and converting the body column characteristic representation into a pseudo image characteristic, obtaining characteristic diagrams of different scales by the pseudo image characteristic through a convolution neural network of a characteristic pyramid structure, and inputting the characteristic diagrams into a detection head so as to convert the characteristic diagrams into an output result of target detection. According to the invention, an effective point is selected for each body column by adopting a high-efficiency point cloud sampling method, and the body column characteristics are learned through a triple weighted fusion network, so that the accuracy of small target detection is remarkably improved on the premise of not losing the detection speed.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the point cloud sampling and weighted fusion based target detection method of the present invention;
FIG. 2 is a schematic flow chart of a laser radar 3D target detection method based on efficient point cloud sampling and weighted fusion in a preferred embodiment of the target detection method based on point cloud sampling and weighted fusion of the present invention;
FIG. 3 is a schematic diagram of a highly efficient point cloud sampling method in a preferred embodiment of the target detection method based on point cloud sampling and weighted fusion of the present invention;
FIG. 4 is a schematic diagram of a triple weighted fusion network in a preferred embodiment of the target detection method based on point cloud sampling and weighted fusion of the present invention;
FIG. 5 is a schematic diagram of BEV view test results in a preferred embodiment of the point cloud sampling and weighted fusion based object detection method of the present invention;
FIG. 6 is a schematic diagram of 3D view test results in a preferred embodiment of the target detection method based on point cloud sampling and weighted fusion of the present invention;
FIG. 7 is a schematic diagram of a visual 3D object detection result in a preferred embodiment of the object detection method based on point cloud sampling and weighted fusion of the present invention;
FIG. 8 is a schematic diagram of a preferred embodiment of the point cloud sampling and weighted fusion based object detection system of the present invention;
FIG. 9 is a schematic diagram of the operating environment of a preferred embodiment of the terminal of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clear and clear, the present invention will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Currently, a 3D object detection algorithm for converting a point cloud into a volume column generally adopts a random sampling method. However, in practical applications, there is a limitation in detecting small objects due to the difference in the density of the point clouds and the difference in the number of points contained in each object. According to the invention, an efficient point cloud sampling method is adopted to select an effective point for each body column, and the body column characteristics are learned through a triple weighted fusion network, so that the accuracy of small target detection is remarkably improved on the premise of not losing the detection speed.
The target detection method based on the point cloud sampling and the weighted fusion according to the preferred embodiment of the present invention, as shown in fig. 1 and fig. 2, comprises the following steps:
step S10, inputting point cloud data, dividing the point cloud data into a plurality of volume columns with the same size, randomly sampling or filling zero samples into the point clouds of each volume column, expanding the dimension of each point cloud, counting the number of effective points in each volume column, and selecting a representative point cloud subset in each volume column according to the number and the spatial distribution of the effective points.
Specifically, one frame of input point cloud data V composed of M points is expressed as V = {p_1, p_2, …, p_M}, and each point is denoted p_i = (x_i, y_i, z_i, r_i), where x_i, y_i, z_i represent the coordinates of the point and r_i represents its reflectivity. The point cloud data V is then divided in three-dimensional space into L equal-sized volume columns. For each volume column, if the number of points exceeds N, random sampling is performed; if the number of points is less than N, zero samples are used for filling (i.e., artificial points whose coordinates and reflectivity are all 0). Thereafter, the dimension of each point is expanded to obtain p_i = (x_i, y_i, z_i, r_i, x_c, y_c, z_c, x_p, y_p), where x_c, y_c, z_c represent the offsets of the point from the mean of all points in the volume column and x_p, y_p represent the offsets of the point from the center of the volume column, generating a three-dimensional tensor of dimension L × N × 9, where L represents the number of volume columns.
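For illustration, this division and decoration step can be sketched as follows. This is a minimal NumPy sketch; the function name build_pillars, the grid size, and the detection ranges are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def build_pillars(points, grid=(0.16, 0.16), x_rng=(0.0, 69.12), y_rng=(-39.68, 39.68),
                  max_pillars=12000, n_max=32):
    """points: (M, 4) array of (x, y, z, reflectivity); returns (max_pillars, n_max, 9)."""
    ix = ((points[:, 0] - x_rng[0]) / grid[0]).astype(int)
    iy = ((points[:, 1] - y_rng[0]) / grid[1]).astype(int)
    cells = {}
    for p, key in zip(points, zip(ix, iy)):
        cells.setdefault(key, []).append(p)
    pillars = np.zeros((max_pillars, n_max, 9), dtype=np.float32)  # zero rows = zero-sample filling
    for n, ((i, j), pts) in enumerate(list(cells.items())[:max_pillars]):
        pts = np.stack(pts)
        if len(pts) > n_max:                                   # random sampling when over-full
            pts = pts[np.random.choice(len(pts), n_max, replace=False)]
        k = len(pts)
        center = np.array([x_rng[0] + (i + 0.5) * grid[0],
                           y_rng[0] + (j + 0.5) * grid[1]])
        pillars[n, :k, :4] = pts                               # x, y, z, r
        pillars[n, :k, 4:7] = pts[:, :3] - pts[:, :3].mean(0)  # offsets to the column mean
        pillars[n, :k, 7:9] = pts[:, :2] - center              # offsets to the column center
    return pillars
```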
The efficient point cloud sampling method aims at selecting key information points from each body pillar to help the subsequent network to effectively extract body pillar features. Fig. 3 shows a schematic diagram of a high-efficiency point cloud sampling method, which is mainly divided into the following four steps:
(1) For N points in each body column, counting the number of the effective points, and recording the position index of the effective points.
(2) Based on the Z coordinates of the points, the point whose Z coordinate is non-zero and closest to the median is selected as the initial point (one and only one), ensuring that the initial point is a valid point that best represents the key information of the target.
(3) A point cloud subset of K points is sampled, starting from the initial point, using the farthest point sampling algorithm (Farthest Point Sampling, FPS). The FPS algorithm comprises the following steps:
(1) selecting a point as an initial point, and adding the initial point into the sampling point set;
(2) calculating the distance from all other points to the selected sampling point, and finding a point farthest from the selected sampling point set;
(3) adding the furthest point to the set of sampling points;
(4) repeating (2) and (3) until the set of sampling points reaches a desired number (e.g., K).
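A minimal NumPy sketch of the FPS loop described above (the function signature is an illustrative assumption):

```python
import numpy as np

def farthest_point_sampling(pts, k, init_idx):
    """pts: (N, 3) coordinates; returns the indices of k points, starting from init_idx."""
    chosen = [init_idx]
    d = np.linalg.norm(pts - pts[init_idx], axis=1)       # distance of every point to the chosen set
    while len(chosen) < k:
        nxt = int(d.argmax())                             # point farthest from all chosen points
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(pts - pts[nxt], axis=1))
    return np.array(chosen)
```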
(4) And according to the number of the effective points in the body column, carrying out validity assessment on the sampled point cloud subset (namely judging whether the number of the sampling points is larger than the number of the effective points and whether zero sample points appear in the sampling points), and if invalid sampling position indexes appear, replacing the sampled effective points.
By the efficient point cloud sampling method, the number and spatial distribution of the effective points are considered to ensure that a representative subset of the point cloud is selected in each column.
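Putting the four steps together, one volume column could be sampled as in the following sketch; it reuses the farthest_point_sampling function above, and the replacement policy for invalid samples (recycling valid points in order) is an assumption for illustration.

```python
import numpy as np

def sample_pillar(pillar_pts, k):
    """pillar_pts: (N, C) points of one volume column; all-zero rows are padding."""
    valid = np.flatnonzero(np.any(pillar_pts[:, :3] != 0, axis=1))   # step 1: valid points
    z_order = valid[np.argsort(pillar_pts[valid, 2])]
    init = int(z_order[len(z_order) // 2])                 # step 2: valid point with median Z
    idx = farthest_point_sampling(pillar_pts[:, :3], k, init)        # step 3: FPS
    for n, i in enumerate(idx):                            # step 4: validity evaluation
        if i not in valid:                                 # a padded (zero) point was sampled
            idx[n] = valid[n % len(valid)]                 # replace it with a valid point
    return pillar_pts[idx]
```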
Step S20, for any input body column, carrying out point weighting operation on the body column to obtain point weights, encoding channel characteristics to obtain characteristic weights, multiplying the point weights and the characteristic weights to obtain a combined weight matrix, multiplying the body column and the combined weight matrix element by element to obtain a body column subjected to point weighting and characteristic weighting fusion, splicing an original body column, the body column subjected to point weighting and characteristic weighting fusion, the center point coordinates of the body column and the average value of all sampling point coordinates contained in the body column to obtain the input of a body column weighting network, calculating the body column weights of two branches, adding the body column weights calculated by the two branches to obtain final body column weights, carrying out body column weighting on the input of the body column weighting network to obtain triple weighting fusion characteristics, carrying out dimension lifting by using two linear layers, and adding the maximum pooling characteristics and the average pooling characteristics to form the final body column characteristic representation.
Specifically, since the number of points in each volume column is reduced from N to K, higher requirements are placed on the subsequent volume-column feature learning network. To this end, the invention designs a triple weighted fusion network to further improve the representation of the point cloud data. Let the input volume columns be expressed as P ∈ R^{L×K×C}, where L represents the number of volume columns, R represents the set of real numbers, and each column P_i ∈ R^{K×C} is a two-dimensional matrix of K rows and C columns, K representing the number of sampling points per column and C representing the feature dimension of each point. The structure of the triple weighted fusion network is shown in Fig. 4 and mainly comprises the following three parts:
1) Point weighting
Given an arbitrary volume column P_i ∈ R^{K×C}, the aggregated features F_p ∈ R^{K×1} of the K sampling points in the column are obtained by max pooling. To better learn the correlation between points, two 1×1 convolutions are used for global feature encoding, namely:

U_p = C(δ(C(F_p)));  (1)

where C represents a 1×1 convolution operation, δ represents the ReLU activation function, and U_p represents the point-by-point weights of P_i.
2) Feature weighting
Similar to point weighting, the feature of each channel F_c ∈ R^{1×C} is obtained through max pooling; that is, for the K sampling points in the volume column, each feature of the points is one channel, and the maximum of that feature over the K sampling points is selected as the feature of the channel. Then, each channel correlation is encoded using two 1×1 convolutions, namely:

U_c = C(δ(C(F_c)));  (2)

where U_c represents the feature weights of P_i.
Then, the point weights and the feature weights are multiplied element by element to obtain the combined weight matrix W ∈ R^{K×C}, namely:

W = σ(U_p) ⊙ σ(U_c);  (3)

where σ represents the sigmoid function and ⊙ represents element-by-element multiplication (the K×1 point weights and the 1×C feature weights broadcast to K×C).
Finally, P_i and W are multiplied element by element to weight both the point and feature dimensions, obtaining the volume column P̃_i after point-weighted and feature-weighted fusion, namely:

P̃_i = P_i ⊙ W;  (4)
to enhance the robustness of the point and feature weighting, the above process is performed in an iterative manner.
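As a sketch, the point and feature weighting of Eqs. (1)-(4) can be written in PyTorch as below; the hidden width of the two 1×1 convolutions is an assumption, and a single (non-iterative) pass is shown.

```python
import torch
import torch.nn as nn

class PointFeatureWeighting(nn.Module):
    def __init__(self, k, c, hidden=16):
        super().__init__()
        # Eq. (1): two 1x1 convolutions with a ReLU in between, applied to per-point features
        self.point_enc = nn.Sequential(nn.Conv1d(k, hidden, 1), nn.ReLU(), nn.Conv1d(hidden, k, 1))
        # Eq. (2): the same structure applied to per-channel features
        self.feat_enc = nn.Sequential(nn.Conv1d(c, hidden, 1), nn.ReLU(), nn.Conv1d(hidden, c, 1))

    def forward(self, p):                                  # p: (L, K, C) volume columns
        f_p = p.max(dim=2, keepdim=True).values            # (L, K, 1) point aggregation by max pooling
        f_c = p.max(dim=1, keepdim=True).values            # (L, 1, C) channel aggregation by max pooling
        u_p = self.point_enc(f_p)                          # point weights
        u_c = self.feat_enc(f_c.transpose(1, 2)).transpose(1, 2)   # feature weights
        w = torch.sigmoid(u_p) * torch.sigmoid(u_c)        # Eq. (3): combined weight matrix (broadcast)
        return p * w                                       # Eq. (4): weighted volume columns
```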
3) Column weighting
In addition, in view of the target differences among different volume columns, the weighting of the volume columns themselves is also important. Given four input tensors, namely the original volume columns P, the columns P̃ after weighted fusion of points and features, the center-point coordinates of the columns P_ctr, and the mean of the coordinates of the sampling points contained in each column P_mean, the four input tensors are spliced in the third dimension to obtain the input Q of the volume-column weighting network.
As shown in Fig. 4, the volume-column weights are calculated by a first branch and a second branch (i.e., an upper branch and a lower branch): the first branch performs global feature aggregation through a max pooling layer and the second branch through an average pooling layer; each aggregated feature is processed through two shared fully connected layers, and the volume-column weight of each branch is learned through a fully connected layer and an activation layer, namely:

S_b = σ(FC(δ(FC(Pool_b(Q))))), b ∈ {1, 2};  (5)

where FC represents a fully connected layer, Pool_1 = MaxPool represents the max pooling layer, Pool_2 = AvgPool represents the average pooling layer, and S_b represents the volume-column weight of branch b.
Then, the volume-column weights calculated by the first branch and the second branch are added to obtain the final volume-column weight, and the input Q is column-weighted to obtain the triple weighted fusion feature Z, namely:

Z = Q ⊙ (S_1 + S_2);  (6)

where S_1 and S_2 represent the volume-column weights of the first branch and the second branch, respectively.
Finally, two linear layers are used to raise the feature dimension, and the max-pooled and average-pooled features are added to form the final volume-column feature representation.
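A corresponding sketch of the column weighting and final pooling (Eqs. (5)-(6)); the hidden width, the up-dimensioning width, and pooling over the point axis are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ColumnWeighting(nn.Module):
    def __init__(self, c_in, c_out, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(c_in, hidden), nn.ReLU(),
                                    nn.Linear(hidden, c_in))          # two shared FC layers
        self.up = nn.Sequential(nn.Linear(c_in, c_out), nn.ReLU(),
                                nn.Linear(c_out, c_out))              # two linear layers, raising dimension

    def forward(self, q):                                  # q: (L, K, C_in) concatenated input Q
        s1 = torch.sigmoid(self.shared(q.max(dim=1).values))  # branch 1: max pooling, Eq. (5)
        s2 = torch.sigmoid(self.shared(q.mean(dim=1)))         # branch 2: average pooling, Eq. (5)
        z = q * (s1 + s2).unsqueeze(1)                     # Eq. (6): triple weighted fusion feature
        z = self.up(z)                                     # raise the feature dimension
        return z.max(dim=1).values + z.mean(dim=1)         # final volume-column feature (max + avg)
```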
And step S30, converting the body column characteristic representation into a pseudo image characteristic, obtaining characteristic diagrams with different scales by the pseudo image characteristic through a convolution neural network with a characteristic pyramid structure, and inputting the characteristic diagrams into a detection head so as to convert the characteristic diagrams into an output result of target detection.
Specifically, as shown in Fig. 2, in order to apply a 2D convolutional neural network to the point cloud data, the volume-column feature representation must be converted into the feature representation of a two-dimensional image, i.e., feature × width × length, called the pseudo-image feature. The conversion process is as follows: the volume-column feature representation is projected back to its original position according to the coordinates of the column center point, obtaining a tensor of shape C_out × H × W, where C_out represents the feature dimension of a volume column and H and W represent the height and width of the pseudo image, respectively.
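The projection itself is a scatter operation; a minimal PyTorch sketch (the name scatter_to_pseudo_image and the (row, col) coordinate convention are assumptions):

```python
import torch

def scatter_to_pseudo_image(feats, coords, h, w):
    """feats: (L, C_out) column features; coords: (L, 2) integer (row, col) grid indices
    of each column center. Returns a (C_out, H, W) pseudo-image tensor."""
    canvas = torch.zeros(feats.shape[1], h, w, device=feats.device)
    canvas[:, coords[:, 0], coords[:, 1]] = feats.t()      # put each column back at its grid cell
    return canvas
```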
Then, the pseudo-image features are passed through a 2D convolutional neural network with a feature pyramid structure to obtain feature maps of different scales. The feature pyramid structure is a network architecture for multi-scale feature extraction; its main idea is to downsample through several convolution layers with different kernel sizes and capture information at different scales in the image. For example, a larger convolution kernel can capture a larger range of information, while a smaller kernel extracts more localized detail features.
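A sketch of such a pyramid backbone, modeled on PointPillars-style 2D networks; the stage widths and upsampling factors are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BEVBackbone(nn.Module):
    def __init__(self, c_in=64):
        super().__init__()
        def block(ci, co, stride):
            return nn.Sequential(nn.Conv2d(ci, co, 3, stride, 1), nn.BatchNorm2d(co), nn.ReLU())
        self.s1, self.s2, self.s3 = block(c_in, 64, 2), block(64, 128, 2), block(128, 256, 2)
        # upsample every scale to a common resolution before concatenation
        self.u1 = nn.ConvTranspose2d(64, 128, 1, 1)
        self.u2 = nn.ConvTranspose2d(128, 128, 2, 2)
        self.u3 = nn.ConvTranspose2d(256, 128, 4, 4)

    def forward(self, x):                                  # x: (B, C_in, H, W) pseudo image
        x1 = self.s1(x)                                    # H/2: fine, local details
        x2 = self.s2(x1)                                   # H/4
        x3 = self.s3(x2)                                   # H/8: coarse, large receptive field
        return torch.cat([self.u1(x1), self.u2(x2), self.u3(x3)], dim=1)
```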
Finally, after the feature maps of different scales are obtained, they are input to the detection head, which consists of a group of convolution layers and fully connected layers; based on the detection head, the feature maps are converted into the output results of target detection. These output results include predictions of the target category (i.e., whether the target is a car, a pedestrian, or a cyclist) and the position information of the target (i.e., the 3D bounding-box coordinates of the detected target).
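For illustration, a minimal single-stage head in the same vein (the anchor count, class count, and 7-parameter box encoding are assumptions; the patent's head also includes fully connected layers, which are omitted here):

```python
import torch.nn as nn

class DetectionHead(nn.Module):
    def __init__(self, c_in, n_classes=3, n_anchors=2, box_dim=7):
        super().__init__()
        self.cls = nn.Conv2d(c_in, n_anchors * n_classes, 1)   # target category scores
        self.box = nn.Conv2d(c_in, n_anchors * box_dim, 1)     # 3D box (x, y, z, w, l, h, theta)

    def forward(self, fmap):                               # fmap: (B, C_in, H', W') feature map
        return self.cls(fmap), self.box(fmap)
```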
In the 3D target detection model based on volume-column feature representation, the invention extracts valid points for each volume column with the efficient point cloud sampling method, and proposes a triple weighted network model over points, features, and volume columns to learn the column features and obtain a finer feature representation.
The method proposed by the present invention was tested as follows. All tests are based on the KITTI dataset, which includes 7481 training samples and 7518 test samples. To evaluate model performance, the training samples were divided into a training set of 3712 frames and a validation set of 3769 frames. Only the lidar point clouds of the training set were used in training, and three categories, namely cars, pedestrians, and cyclists, were trained and validated. The evaluation of each category is divided into three difficulty levels according to the size, truncation, and occlusion state of the target: easy, moderate, and hard. Average Precision (AP) is used to evaluate the detection results of the three categories separately, and the mean of the moderate-difficulty APs over the three categories (mAP) is computed as the overall performance index of the model.
Training and evaluation were performed in a hardware environment equipped with a 3090Ti GPU and an Intel i7-12700K CPU, using the PyTorch framework. The network was trained with the Adam optimizer for 90 epochs, with a mini-batch size of 16. The initial learning rate was 0.006 and was dynamically adjusted by the OneCycle strategy. The maximum number of points per volume column was limited to 32, and the number of sampling points was set to 5.
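Under these hyperparameters, the optimizer and schedule could be configured as in the following sketch (model and train_loader are assumed placeholders, not defined in the patent):

```python
import torch

# model and train_loader are assumed to exist; only the reported hyperparameters are used here
optimizer = torch.optim.Adam(model.parameters(), lr=0.006)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.006, epochs=90, steps_per_epoch=len(train_loader))
```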
Figs. 5 and 6 show the evaluation results of the present invention in Bird's Eye View (BEV) and 3D view, compared with other existing target detection algorithms (e.g., MV3D, AVOD-FPN, F-PointNet, VoxelNet, SECOND, PointRCNN, PointPillars, MPNet, RangeDet, MANet). The results show that the invention has clear advantages over the baseline model PointPillars, improving the mAP from 67.71% and 60.46% of the baseline to 71.30% and 64.79%, respectively. It is particularly notable that the AP values of the present invention improve by more than 3% on small-target detection.
In addition, FIG. 7 shows the visualized detection results of the present invention and the baseline model PointPillars. It can be seen that PointPillars misses one car, one pedestrian, and one cyclist, whereas the present invention successfully detects all targets, further verifying the effectiveness of the proposed method for small-target detection.
Furthermore, the invention can be applied to other 3D target detection algorithms requiring point cloud sampling or body pillar feature learning.
Further, as shown in fig. 8, based on the target detection method based on the point cloud sampling and the weighted fusion, the invention further correspondingly provides a target detection system based on the point cloud sampling and the weighted fusion, wherein the target detection system based on the point cloud sampling and the weighted fusion comprises:
The point cloud sampling module 51 is configured to input point cloud data, divide the point cloud data into a plurality of columns with the same size, randomly sample or fill zero samples for the point clouds of each column, expand the dimension of each point cloud, count the number of effective points in each column, and select a representative point cloud subset in each column according to the number and spatial distribution of the effective points;
the weighted fusion module 52 is configured to perform a point weighting operation on any input volume column to obtain a point weight, encode a channel feature to obtain a feature weight, multiply the point weight and the feature weight to obtain a combined weight matrix, multiply the volume column and the combined weight matrix element by element to obtain a volume column after point weighting and feature weighting fusion, splice the original volume column, the volume column after point weighting and feature weighting fusion, the center point coordinates of the volume column, and the average value of the sampling point coordinates contained in the volume column to obtain an input of a volume column weighted network, calculate the volume column weights of two branches, add the volume column weights calculated by the two branches to obtain a final volume column weight, perform volume column weighting on the input of the volume column weighted network to obtain a triple weighted fusion feature, use two linear layers to perform dimension lifting, and add the maximum pooling feature and the average pooling feature to form a final volume column feature representation;
The target detection module 53 is configured to convert the body pillar feature representation into a pseudo image feature, obtain feature maps of different scales from the pseudo image feature through a convolutional neural network with a feature pyramid structure, and input the feature maps to a detection head to convert the feature maps into an output result of target detection.
Further, as shown in fig. 9, based on the target detection method and system based on the point cloud sampling and weighted fusion, the application further provides a terminal correspondingly, which comprises a processor 10, a memory 20 and a display 30. Fig. 9 shows only some of the components of the terminal, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may alternatively be implemented.
The memory 20 may in some embodiments be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 20 may in other embodiments also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal. Further, the memory 20 may also include both an internal storage unit and an external storage device of the terminal. The memory 20 is used for storing application software installed in the terminal and various data, such as program codes of the installation terminal. The memory 20 may also be used to temporarily store data that has been output or is to be output. In an embodiment, the memory 20 stores a target detection program 40 based on point cloud sampling and weighted fusion, and the target detection program 40 based on point cloud sampling and weighted fusion can be executed by the processor 10, so as to implement the target detection method based on point cloud sampling and weighted fusion in the present application.
The processor 10 may in some embodiments be a central processing unit (Central Processing Unit, CPU), microprocessor or other data processing chip for executing program code or processing data stored in the memory 20, such as performing the point cloud sampling and weighted fusion based object detection method.
The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like in some embodiments. The display 30 is used for displaying information at the terminal and for displaying a visual user interface. The components 10-30 of the terminal communicate with each other via a system bus.
In an embodiment, the steps of the target detection method based on point cloud sampling and weighted fusion as described above are implemented when the processor 10 executes the target detection program 40 based on point cloud sampling and weighted fusion in the memory 20.
The present invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a target detection program based on point cloud sampling and weighted fusion, and the target detection program based on point cloud sampling and weighted fusion implements the steps of the target detection method based on point cloud sampling and weighted fusion as described above when being executed by a processor.
In summary, the present invention provides a target detection method based on point cloud sampling and weighted fusion, and related equipment. The method comprises: inputting point cloud data, dividing it into a plurality of equal-sized volume columns, and selecting a representative point cloud subset in each column according to the number and spatial distribution of the valid points; learning volume-column features through a triple weighted fusion network that weights points, features, and volume columns; and converting the resulting column features into a pseudo image, extracting feature maps of different scales with a 2D convolutional neural network of feature pyramid structure, and converting them into the output results of target detection with a detection head. By selecting valid points for each volume column with the efficient point cloud sampling method and learning column features through triple weighted fusion, the invention effectively alleviates the problem of excessive invalid points and significantly improves the accuracy of small-target detection without losing detection speed.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or terminal that comprises the element.
Of course, those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by a computer program for instructing relevant hardware (e.g., processor, controller, etc.), the program may be stored on a computer readable storage medium, and the program may include the above described methods when executed. The computer readable storage medium may be a memory, a magnetic disk, an optical disk, etc.
It is to be understood that the invention is not limited in its application to the examples described above, but is capable of modification and variation in light of the above teachings by those skilled in the art, and that all such modifications and variations are intended to be included within the scope of the appended claims.

Claims (10)

1. The target detection method based on the point cloud sampling and the weighted fusion is characterized by comprising the following steps of:
inputting point cloud data, dividing the point cloud data into a plurality of volume columns with the same size, randomly sampling or filling zero samples into the point cloud of each volume column, expanding the dimension of each point cloud, counting the number of effective points in each volume column, and selecting a representative point cloud subset in each volume column according to the number and spatial distribution of the effective points;
for any input body column, carrying out point weighting operation on the body column to obtain point weights, encoding channel characteristics to obtain characteristic weights, multiplying the point weights and the characteristic weights to obtain a combined weight matrix, multiplying the body column and the combined weight matrix element by element to obtain a body column subjected to point weighting and characteristic weighting fusion, splicing the original body column, the body column subjected to point weighting and characteristic weighting fusion, the central point coordinates of the body column and the average value of all sampling point coordinates contained in the body column to obtain the input of a body column weighting network, calculating the body column weights of two branches, adding the body column weights calculated by the two branches to obtain final body column weights, carrying out body column weighting on the input of the body column weighting network to obtain triple weighting fusion characteristics, carrying out dimension ascending by using two linear layers, and adding the maximum pooling characteristics and the average pooling characteristics to form final body column characteristic representation;
And converting the body column characteristic representation into a pseudo image characteristic, obtaining characteristic diagrams of different scales by the pseudo image characteristic through a convolution neural network of a characteristic pyramid structure, and inputting the characteristic diagrams into a detection head so as to convert the characteristic diagrams into an output result of target detection.
2. The target detection method based on point cloud sampling and weighted fusion according to claim 1, wherein inputting the point cloud data, dividing the point cloud data into a plurality of equal-sized volume columns, randomly sampling or filling zero samples for the point cloud of each volume column, expanding the dimension of each point cloud, counting the number of effective points in each volume column, and selecting a representative point cloud subset in each volume column according to the number and spatial distribution of the effective points specifically comprises:
a frame of input point cloud data V consisting of M points is expressed as V = {p_1, p_2, …, p_M}, and each point is denoted p_i = (x_i, y_i, z_i, r_i), where x_i, y_i, z_i represent the coordinates of the point and r_i represents its reflectivity;
dividing the point cloud data V into L volume columns with the same size in a three-dimensional space;
for each column, randomly sampling if the number of the point clouds exceeds N, and filling by using zero samples if the number of the point clouds is less than N;
Expanding the dimension of each point to obtain p_i = (x_i, y_i, z_i, r_i, x_c, y_c, z_c, x_p, y_p), where x_c, y_c, z_c represent the offsets of the point from the mean of all points in the volume column and x_p, y_p represent the offsets of the point from the center of the volume column, generating a three-dimensional tensor of dimension L × N × 9;
counting the number of the effective points for N points in each body column, and recording the position index of the effective points;
based on the Z coordinates of the points, selecting the point whose Z coordinate is non-zero and closest to the median as the initial point, so as to ensure that the initial point is a valid point that best represents the key information of the target;
sampling a point cloud subset containing K points, starting from the initial point, with a farthest point sampling algorithm;
and according to the number of the effective points in the volume column, carrying out validity evaluation on the sampled point cloud subset, and if invalid sampling position indexes appear, replacing the sampled effective points.
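For illustration only, the division, sampling, padding, and dimension-expansion steps of this claim can be sketched in a few lines of Python. This is a minimal reading of the claim, not the patented implementation: the grid cell size, the cap `n_max`, and the exact 9-dimensional feature layout are assumptions in the style of pillar-based detectors, and `pillarize` is our hypothetical name.

```python
import numpy as np

def pillarize(points, grid=(0.16, 0.16), n_max=32):
    """Divide a point cloud (M, 4) of (x, y, z, r) into equal-size volume
    columns on an (x, y) grid, randomly sample or zero-pad each column to
    n_max points, and expand each point to 9 dims:
    (x, y, z, r, x_c, y_c, z_c, x_p, y_p)."""
    cell = np.asarray(grid)
    ij = np.floor(points[:, :2] / cell).astype(np.int64)   # column index per point
    pillars, n_valid = [], []
    for key in np.unique(ij, axis=0):
        pts = points[(ij == key).all(axis=1)]
        n_valid.append(min(len(pts), n_max))
        if len(pts) > n_max:                               # too many: random sampling
            pts = pts[np.random.choice(len(pts), n_max, replace=False)]
        mean = pts[:, :3].mean(axis=0)                     # mean of the valid points
        ctr = (key + 0.5) * cell                           # column center in (x, y)
        feat = np.hstack([pts, pts[:, :3] - mean, pts[:, :2] - ctr])
        pad = np.zeros((n_max - len(pts), feat.shape[1]))  # zero samples (trailing)
        pillars.append(np.vstack([feat, pad]))
    return np.stack(pillars), np.array(n_valid)            # (L, N, 9) and (L,)
```

The returned per-column valid-point counts are exactly what claims 2 and 4 use to seed and check the sampling step.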
3. The target detection method based on point cloud sampling and weighted fusion according to claim 2, wherein the sampling a point cloud subset containing K points starting from the initial point by means of a farthest point sampling algorithm specifically comprises:
selecting a point as the initial point and adding it to the sampling point set;
calculating the distances from all other points to the selected sampling points, and finding the point farthest from the selected sampling point set;
adding this farthest point to the sampling point set;
and repeating the steps of calculating the distances from all other points to the selected sampling points, finding the point farthest from the selected sampling point set, and adding it to the sampling point set, until the sampling point set reaches the required number K (a sketch of this loop follows the claim).
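This loop is the classical farthest point sampling procedure, so it admits a direct sketch. Alongside it we show one possible reading of claim 2's "middle-most non-zero point" rule for choosing the initial point; both helper names are ours, not the patent's, and the median-Z interpretation is an assumption.

```python
import numpy as np

def initial_point(points):
    """One reading of claim 2: among the non-zero (valid) rows of points
    (N, >=3), pick the one whose Z coordinate is closest to the median height."""
    valid = np.flatnonzero(np.any(points != 0, axis=1))
    z = points[valid, 2]
    return valid[np.argmin(np.abs(z - np.median(z)))]

def farthest_point_sampling(points, k, init_idx):
    """Iteratively add the point farthest from the already-selected set
    until k indices are chosen (the loop of claim 3)."""
    selected = [init_idx]
    # dist[i] = distance from point i to its nearest selected point
    dist = np.linalg.norm(points - points[init_idx], axis=1)
    while len(selected) < k:
        far = int(np.argmax(dist))            # farthest from the selected set
        selected.append(far)
        dist = np.minimum(dist, np.linalg.norm(points - points[far], axis=1))
    return np.array(selected)
```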
4. The target detection method based on point cloud sampling and weighted fusion according to claim 2, wherein the validity evaluation of the sampled point cloud subset specifically comprises:
judging whether the number of sampling points exceeds the number of valid points, and judging whether any zero (padded) samples appear among the sampling points (see the sketch after this claim).
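Under the padding convention of the earlier `pillarize` sketch (zero samples occupy the trailing slots of each column), this check reduces to two comparisons; `sample_is_valid` is our illustrative name, not the patent's.

```python
def sample_is_valid(sample_idx, n_valid):
    """Claim 4's check: the subset is invalid if it asks for more points than
    the column actually holds, or if any index lands on a zero-padded slot."""
    return len(sample_idx) <= n_valid and all(i < n_valid for i in sample_idx)
```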
5. The target detection method based on point cloud sampling and weighted fusion according to claim 2, wherein the performing, for any input volume column, a point weighting operation on the volume column to obtain point weights, encoding the channel features to obtain feature weights, multiplying the point weights and the feature weights to obtain a combined weight matrix, multiplying the volume column and the combined weight matrix element by element to obtain a volume column fused by point weighting and feature weighting, concatenating the original volume column, the point- and feature-weighted volume column, the center point coordinates of the volume column and the mean of the sampling point coordinates contained in the volume column to obtain the input of a volume column weighting network, calculating the volume column weights of two branches, adding the volume column weights calculated by the two branches to obtain the final volume column weight, weighting the input of the volume column weighting network by the final volume column weight to obtain a triple weighted fusion feature, raising its dimension with two linear layers, and adding the max pooling feature and the average pooling feature to form the final volume column feature representation specifically comprises:
the input volume columns are expressed as $P = \{P_l\}_{l=1}^{L}$, where $L$ represents the number of volume columns, $P_l \in \mathbb{R}^{K \times C}$, $\mathbb{R}$ represents the set of real numbers, $\mathbb{R}^{K \times C}$ represents a two-dimensional matrix of $K$ rows and $C$ columns, $K$ represents the number of sampling points of each volume column, and $C$ represents the feature dimension of each point;
given an arbitrary volume column $P_l$, the aggregated feature $F_p$ of the $K$ sampling points in the column is obtained by max pooling over the feature dimension;
global feature encoding is performed using two $1 \times 1$ convolutions:
$S_p = C_2(\delta(C_1(F_p)))$, where $C_1$ and $C_2$ represent $1 \times 1$ convolution operations, $\delta$ represents the ReLU activation function, and $S_p$ represents the point-by-point weights of $P_l$;
obtaining the feature $F_c$ of each channel through max pooling: for the $K$ sampling points in a volume column, each feature of the points is a channel, and the maximum value of that feature over the $K$ sampling points is selected as the feature of the channel;
channel correlation encoding is performed using two $1 \times 1$ convolutions: $S_c = C_4(\delta(C_3(F_c)))$, where $C_3$ and $C_4$ represent $1 \times 1$ convolution operations and $S_c$ represents the feature weights of $P_l$;
multiplying the point weights and the feature weights to obtain the combined weight matrix $M$:
$M = \sigma(S_p) \odot \sigma(S_c)$, where $\sigma$ represents the sigmoid function and $\odot$ represents element-by-element multiplication (broadcast over the point and feature dimensions);
multiplying $P_l$ and $M$ element by element to weight both the point and the feature dimensions, obtaining the point- and feature-weighted volume column $\tilde{P}_l = P_l \odot M$;
given four input tensors, namely the original volume column $P_l$, the point- and feature-weighted volume column $\tilde{P}_l$, the center point coordinates of the volume column $p^{c}$, and the mean of the coordinates of the sampling points contained in the column $p^{m}$, the four input tensors are concatenated along the third dimension to obtain the input of the volume column weighting network $F_{in} = [P_l; \tilde{P}_l; p^{c}; p^{m}]$;
the volume column weights are calculated with a first branch and a second branch: the two branches aggregate global features through a max pooling layer and an average pooling layer respectively, process the pooled features through two shared fully connected layers, and each learns a volume column weight through a fully connected layer and an activation layer:
$W_1 = \sigma(\mathrm{FC}(\mathrm{FC}_s(\mathrm{MaxPool}(F_{in}))))$, $W_2 = \sigma(\mathrm{FC}(\mathrm{FC}_s(\mathrm{AvgPool}(F_{in}))))$, where $\mathrm{FC}$ represents a fully connected layer, $\mathrm{FC}_s$ represents the two shared fully connected layers, $\mathrm{MaxPool}$ represents the max pooling layer, $\mathrm{AvgPool}$ represents the average pooling layer, and $W_1$, $W_2$ represent the volume column weights of the two branches;
adding the volume column weights calculated by the first branch and the second branch to obtain the final volume column weight, and weighting the input $F_{in}$ by it to obtain the triple weighted fusion feature $F_{tri}$:
$F_{tri} = F_{in} \odot (W_1 + W_2)$, where $W_1$ and $W_2$ represent the volume column weights of the first branch and the second branch respectively;
and the dimension of $F_{tri}$ is raised with two linear layers, and the max pooling feature and the average pooling feature are added to form the final volume column feature representation (a module-level sketch follows this claim).
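The triple weighting of this claim can be expressed as a small PyTorch module. The sketch below is one reading of the claim, not the patented network: the hidden widths, the output dimension `c_out`, and the choice of giving each branch its own weight head on top of the shared layers are all assumptions.

```python
import torch
import torch.nn as nn

class TripleWeighting(nn.Module):
    """Point, feature, and volume-column weighting (a sketch of claim 5).
    P: (L, K, C) columns; ctr: (L, 3) column centers; mean: (L, 3) per-column
    mean of the sampling point coordinates."""

    def __init__(self, c, c_out=64, hidden=8):
        super().__init__()
        d = 2 * c + 6                                 # width after concatenation
        # two 1x1 convolutions each for point-wise and channel-wise weights
        self.point_enc = nn.Sequential(nn.Conv1d(1, hidden, 1), nn.ReLU(),
                                       nn.Conv1d(hidden, 1, 1))
        self.chan_enc = nn.Sequential(nn.Conv1d(1, hidden, 1), nn.ReLU(),
                                      nn.Conv1d(hidden, 1, 1))
        # two shared fully connected layers plus one weight head per branch
        self.shared = nn.Sequential(nn.Linear(d, d // 4), nn.ReLU(),
                                    nn.Linear(d // 4, d))
        self.head1 = nn.Linear(d, 1)
        self.head2 = nn.Linear(d, 1)
        # two linear layers for the final dimension raising
        self.up = nn.Sequential(nn.Linear(d, c_out), nn.ReLU(),
                                nn.Linear(c_out, c_out))

    def forward(self, P, ctr, mean):
        L, K, C = P.shape
        # point weights S_p: max over features, then two 1x1 convolutions
        s_p = self.point_enc(P.max(dim=2).values.unsqueeze(1))   # (L, 1, K)
        # feature weights S_c: max over points, then two 1x1 convolutions
        s_c = self.chan_enc(P.max(dim=1).values.unsqueeze(1))    # (L, 1, C)
        # combined weight matrix M = sigmoid(S_p) * sigmoid(S_c), broadcast
        M = torch.sigmoid(s_p.transpose(1, 2)) * torch.sigmoid(s_c)
        P_w = P * M                                   # point + feature weighted
        # concatenate the four tensors along the feature dimension
        f_in = torch.cat([P, P_w, ctr.unsqueeze(1).expand(L, K, 3),
                          mean.unsqueeze(1).expand(L, K, 3)], dim=2)
        # branch 1: max pooling; branch 2: average pooling; weights added
        w1 = torch.sigmoid(self.head1(self.shared(f_in.max(dim=1).values)))
        w2 = torch.sigmoid(self.head2(self.shared(f_in.mean(dim=1))))
        f_tri = f_in * (w1 + w2).unsqueeze(1)         # column weighting
        f_up = self.up(f_tri)                         # raise the dimension
        # add max-pooled and average-pooled features: final column feature
        return f_up.max(dim=1).values + f_up.mean(dim=1)
```

A toy call such as `TripleWeighting(c=9)(torch.randn(100, 16, 9), torch.randn(100, 3), torch.randn(100, 3))` returns a `(100, 64)` matrix of final volume column features.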
6. The target detection method based on point cloud sampling and weighted fusion according to claim 1 or 5, wherein the converting the volume column feature representation into a pseudo-image feature, passing the pseudo-image feature through a convolutional neural network with a feature pyramid structure to obtain feature maps of different scales, and inputting the feature maps into a detection head so as to convert them into the output result of target detection specifically comprises:
projecting the volume column feature representation back to its original position according to the coordinates of the volume column center point to obtain a tensor of shape $(D, H, W)$, where $D$ represents the feature dimension of a volume column, and $H$ and $W$ represent the height and width of the pseudo image respectively;
passing the pseudo-image features through a 2D convolutional neural network with a feature pyramid structure to obtain feature maps of different scales;
and inputting the feature maps into a detection head, the detection head consisting of a set of convolution layers and fully connected layers, and converting the feature maps into the output result of target detection based on the detection head (see the sketch after this claim).
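The projection step of this claim is a scatter of per-column features onto a 2D canvas. Below is a sketch under the assumption that each column's integer grid position (row, col) is kept alongside its features; `to_pseudo_image` is our name. The feature pyramid backbone and the detection head that follow are standard 2D CNN components and are not sketched here.

```python
import torch

def to_pseudo_image(col_feats, coords, h, w):
    """Scatter per-column features (L, D) back to a (D, H, W) pseudo image;
    coords (L, 2) holds each column's (row, col) grid index."""
    d = col_feats.shape[1]
    canvas = torch.zeros(d, h * w, dtype=col_feats.dtype)
    flat = (coords[:, 0] * w + coords[:, 1]).long()   # linearized positions
    canvas[:, flat] = col_feats.t()                   # empty cells stay zero
    return canvas.view(d, h, w)
```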
7. The target detection method based on point cloud sampling and weighted fusion according to claim 6, wherein the output result comprises the predicted class of the target and the position information of the target.
8. A point cloud sampling and weighted fusion-based target detection system, comprising:
the point cloud sampling module is used for inputting point cloud data, dividing the point cloud data into a plurality of volume columns of the same size, randomly sampling the points of each volume column or padding with zero samples, expanding the dimension of each point, counting the number of valid points in each volume column, and selecting a representative point cloud subset in each volume column according to the number and spatial distribution of the valid points;
the weighted fusion module is used for, for any input volume column, performing a point weighting operation on the volume column to obtain point weights, encoding the channel features to obtain feature weights, multiplying the point weights and the feature weights to obtain a combined weight matrix, multiplying the volume column and the combined weight matrix element by element to obtain a volume column fused by point weighting and feature weighting, concatenating the original volume column, the point- and feature-weighted volume column, the center point coordinates of the volume column and the mean of the sampling point coordinates contained in the volume column to obtain the input of a volume column weighting network, calculating the volume column weights of two branches, adding the volume column weights calculated by the two branches to obtain the final volume column weight, weighting the input of the volume column weighting network by the final volume column weight to obtain a triple weighted fusion feature, raising its dimension with two linear layers, and adding the max pooling feature and the average pooling feature to form the final volume column feature representation;
and the target detection module is used for converting the volume column feature representation into a pseudo-image feature, passing the pseudo-image feature through a convolutional neural network with a feature pyramid structure to obtain feature maps of different scales, and inputting the feature maps into a detection head so as to convert them into the output result of target detection.
9. A terminal, comprising: a memory, a processor, and a target detection program based on point cloud sampling and weighted fusion stored in the memory and executable on the processor, wherein the target detection program based on point cloud sampling and weighted fusion, when executed by the processor, implements the steps of the target detection method based on point cloud sampling and weighted fusion according to any one of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a target detection program based on point cloud sampling and weighted fusion, which, when executed by a processor, implements the steps of the target detection method based on point cloud sampling and weighted fusion according to any one of claims 1-7.
CN202311131605.XA 2023-09-04 2023-09-04 Target detection method based on point cloud sampling and weighted fusion and related equipment Active CN116863433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311131605.XA CN116863433B (en) 2023-09-04 2023-09-04 Target detection method based on point cloud sampling and weighted fusion and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311131605.XA CN116863433B (en) 2023-09-04 2023-09-04 Target detection method based on point cloud sampling and weighted fusion and related equipment

Publications (2)

Publication Number Publication Date
CN116863433A true CN116863433A (en) 2023-10-10
CN116863433B CN116863433B (en) 2024-01-09

Family

ID=88230828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311131605.XA Active CN116863433B (en) 2023-09-04 2023-09-04 Target detection method based on point cloud sampling and weighted fusion and related equipment

Country Status (1)

Country Link
CN (1) CN116863433B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612059A (en) * 2020-05-19 2020-09-01 上海大学 Construction method of multi-plane coding point cloud feature deep learning model based on pointpilars
US20210279443A1 (en) * 2020-03-06 2021-09-09 Carta Inc. Method and apparatus for detecting object in three-dimensional (3d) point cloud
CN115019043A (en) * 2022-06-10 2022-09-06 华南理工大学 Image point cloud fusion three-dimensional target detection method based on cross attention mechanism
CN115512132A (en) * 2022-10-12 2022-12-23 吉林大学 3D target detection method based on point cloud data and multi-view image data fusion
US20230080574A1 (en) * 2021-09-14 2023-03-16 Xinhai Li Systems, methods, and media for semantic segmentation of a point cloud frame
CN115908829A (en) * 2022-09-09 2023-04-04 广州大学 Point column-based two-order multi-attention mechanism 3D point cloud target detection method
CN116030330A (en) * 2023-01-03 2023-04-28 北京亮道智能汽车技术有限公司 Target detection method and device
CN116310368A (en) * 2023-03-13 2023-06-23 西安电子科技大学 Laser radar 3D target detection method
CN116563488A (en) * 2023-02-10 2023-08-08 大连理工大学 Three-dimensional target detection method based on point cloud body column
CN116597264A (en) * 2023-05-17 2023-08-15 南京理工大学 Three-dimensional point cloud target detection method integrating two-dimensional image semantics


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG YH et al.: "PLOT: a 3D point cloud object detection network for autonomous driving", ROBOTICA, vol. 41, no. 5, pages 1483-1499 *

Also Published As

Publication number Publication date
CN116863433B (en) 2024-01-09

Similar Documents

Publication Publication Date Title
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN111126258B (en) Image recognition method and related device
KR20200125731A (en) Neural networks for object detection and characterization
CN111209921A (en) License plate detection model based on improved YOLOv3 network and construction method
CN114565655B (en) Depth estimation method and device based on pyramid segmentation attention
CN112949380B (en) Intelligent underwater target identification system based on laser radar point cloud data
CN110991444A (en) Complex scene-oriented license plate recognition method and device
CN116824413A (en) Aerial image target detection method based on multi-scale cavity convolution
CN115631344B (en) Target detection method based on feature self-adaptive aggregation
CN113267761B (en) Laser radar target detection and identification method, system and computer readable storage medium
CN114463736A (en) Multi-target detection method and device based on multi-mode information fusion
CN114022858A (en) Semantic segmentation method, system, electronic device and medium for automatic driving
CN116824335A (en) YOLOv5 improved algorithm-based fire disaster early warning method and system
CN115424237A (en) Forward vehicle identification and distance detection method based on deep learning
CN115393601A (en) Three-dimensional target detection method based on point cloud data
CN114359709A (en) Target detection method and device for remote sensing image
CN116863433B (en) Target detection method based on point cloud sampling and weighted fusion and related equipment
CN115861595B (en) Multi-scale domain self-adaptive heterogeneous image matching method based on deep learning
CN115588187B (en) Pedestrian detection method, device and equipment based on three-dimensional point cloud and storage medium
CN116704324A (en) Target detection method, system, equipment and storage medium based on underwater image
CN116129234A (en) Attention-based 4D millimeter wave radar and vision fusion method
EP4152274A1 (en) System and method for predicting an occupancy probability of a point in an environment, and training method thereof
JP2021108031A (en) Model generation device, vehicle simulation system, model generation method, vehicle simulation method and computer program
CN117315724B (en) Open scene-oriented three-dimensional pedestrian detection method, system, equipment and medium
CN115082869B (en) Vehicle-road cooperative multi-target detection method and system for serving special vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant