CN114120270A - Point cloud target detection method based on attention and sampling learning - Google Patents

Point cloud target detection method based on attention and sampling learning

Info

Publication number
CN114120270A
CN114120270A
Authority
CN
China
Prior art keywords
point cloud
target
feature
network
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111314134.7A
Other languages
Chinese (zh)
Inventor
田炜
赵晓龙
邓振文
黄禹尧
谭大艺
韩帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202111314134.7A priority Critical patent/CN114120270A/en
Publication of CN114120270A publication Critical patent/CN114120270A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention relates to a point cloud target detection method based on attention and sampling learning, comprising the following steps: 1) collecting point cloud data of the target to be detected; 2) extracting point cloud features from the point cloud data through a point cloud extraction network; 3) screening out target index point features from the point cloud features by a k-nearest value down-sampling method; 4) aggregating the point cloud features into candidate target features through an adaptive point cloud feature aggregation network according to the target index point features; 5) generating the category, position, and size information of the target to be detected from the candidate target features through a multilayer perceptron. Compared with the prior art, the method offers strong reliability and high accuracy.

Description

Point cloud target detection method based on attention and sampling learning
Technical Field
The invention relates to the fields of intelligent vehicles and computer vision, and in particular to a point cloud target detection method based on attention and sampling learning.
Background
With the development of society and rising living standards, car ownership in China is steadily increasing, and road traffic safety has accordingly become an important issue. The development of intelligent vehicles is expected to further improve vehicle safety and reduce the loss of life and property caused by traffic accidents.
An intelligent vehicle must accurately detect targets such as vehicles, pedestrians, non-motorized vehicles, and static obstacles in the surrounding environment through environment perception sensors and corresponding target detection methods. Commonly used environment perception sensors include cameras, lidar, and millimeter-wave radar. Lacking depth information, the image data acquired by a camera makes it difficult to accurately localize a target in space and is easily affected by ambient light. Millimeter-wave radar can acquire three-dimensional position information, but the information it acquires is too sparse and easily leads to missed detections. Lidar can densely sample three-dimensional positions on the surfaces of the surrounding environment, yielding accurate three-dimensional point cloud data; point cloud target detection relies on lidar and can accurately detect the size and spatial position of a target. The technical routes for point cloud target detection currently fall into three types: voxel-based routes (the point cloud is first discretized into regular voxels, and targets are then detected by a neural network), projection-based routes (the point cloud is first projected onto one or more plane views, and targets are then detected by a neural network), and direct point cloud detection routes (targets are detected by a neural network directly, without changing the representation of the point cloud). Information loss during point cloud processing (voxelization, projection) should be avoided as far as possible.
Compared with the other two technical routes, direct point cloud detection involves no voxelization or projection and therefore loses less information in point cloud preprocessing. However, even methods on this route lose point cloud information at the sampling stage during layer-by-layer feature extraction, which urgently needs improvement. In addition, in traffic scenes a target is easily occluded by other targets or obstacles, which has become a bottleneck limiting intelligent driving technology.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a point cloud target detection method based on attention and sampling learning with high accuracy and strong reliability, improving the detection performance for occluded vehicles and the reliability of the intelligent vehicle's environment perception system.
The purpose of the invention can be realized by the following technical scheme:
a point cloud target detection method based on attention and sampling learning comprises the following steps:
1) collecting point cloud data of the target to be detected;
2) extracting point cloud features from the point cloud data through a point cloud extraction network;
3) screening out target index point features from the point cloud features by a k-nearest value down-sampling method;
4) aggregating the point cloud features into candidate target features through an adaptive point cloud feature aggregation network according to the target index point features;
5) generating the category, position, and size information of the target to be detected from the candidate target features through a multilayer perceptron.
Further, the point cloud extraction network comprises a plurality of point cloud attention networks and multiple layers of feature propagation networks; the point cloud attention networks are connected in sequence, and the feature propagation networks are skip-connected to the point cloud attention networks;
the point cloud attention network comprises a first self-attention model, a sampling learning network, two point networks, and a first feed-forward network;
in the point cloud target detection method provided by the invention, point cloud features are extracted through the point cloud extraction network and aggregated into candidate target features, which further strengthens the method's ability to detect occluded targets; during point cloud feature extraction, sampling learning reduces the information loss of the point cloud in sampling.
Further, the process of extracting the point cloud features by each point cloud attention network comprises the following steps:
201) inputting the input features into the sampling learning network to generate a sub-point cloud;
202) inputting the input features and the generated sub-point cloud into one point network to obtain the neighborhood feature vectors F1; the coordinates corresponding to F1 are passed through a multilayer perceptron to generate point cloud position codes;
203) adding F1 to the point cloud position codes generated in step 202) and projecting the sum through a linear projection layer to form the point cloud query sequence Q;
204) extracting a subset F′1 of F1 by the farthest point sampling algorithm; F′1 and F1 are input into the other point network to generate a neighborhood feature vector set, which is added to the point cloud position codes generated in step 202) and then projected through a linear projection layer to generate the point cloud feature sequences, namely the point cloud key feature sequence K and the point cloud value feature sequence V;
205) inputting Q, K, and V into the first self-attention model to generate new point cloud features;
the input features of the sampling learning network of the first point cloud attention network are the point cloud data; for the remaining point cloud attention networks, the input features of the sampling learning network are the point cloud features output by the preceding point cloud attention network;
the point cloud extraction network takes as its core structure a point cloud attention network combining a point network, a first self-attention model, and a sampling learning network. First, a sub-point cloud is sampled from the point cloud through the sampling learning network; then, the neighborhood point cloud features of the input features and of the sub-point cloud, i.e. local features, are extracted through the point network; finally, the first self-attention model exchanges information among all local point cloud features to extract the global features of the point cloud. Multiple point cloud attention networks are stacked to extract, layer by layer, point cloud features containing both local and global feature information, and the extracted point cloud features are then propagated to more points through multiple feature propagation layers, i.e. linear interpolation;
the point cloud extraction network can thus extract both the local and the global features of the point cloud while reducing the information loss of the point cloud in sampling.
Further, the step 201) includes:
211) mapping the input features to a high-dimensional space through a multilayer perceptron to generate a high-dimensional point cloud feature vector matrix;
212) retaining the maximum value on each feature channel of the high-dimensional point cloud feature vector matrix through a max pooling operation to obtain a global feature;
213) concatenating the global feature to each high-dimensional point cloud feature vector to obtain spliced point cloud features;
214) inputting the spliced point cloud features into a multilayer perceptron to obtain a sampling matrix;
215) multiplying the sampling matrix with the input features to obtain the sub-point cloud.
Further, the process by which the point network generates the neighborhood feature vector set in step 204) comprises:
221) taking each element of F′1 as a sphere center and searching within a spherical neighborhood of radius r for elements of F1; the F1 elements within the neighborhood form the neighborhood feature set of that sphere center;
222) inputting the neighborhood feature set of each element of F′1 into a multilayer perceptron to obtain high-dimensional point cloud feature vectors;
223) performing a max pooling operation on the high-dimensional point cloud feature vectors of the neighborhood feature set of each element of F′1; each element neighborhood of F′1 generates one neighborhood feature vector, and these neighborhood feature vectors form the neighborhood feature vector set.
Further, the step 205) comprises:
231) dividing Q, K, and V into n groups along the feature dimension and adding each group to the corresponding point cloud position code group, yielding the point cloud feature sequences with point cloud position information Q′i, K′i, and V′i, i = 1, 2, …, n;
232) computing the attention score matrix Ai as:

$$A_i = \operatorname{softmax}\left(\frac{Q'_i {K'_i}^{\top}}{\sqrt{d}}\right)$$

where d is the number of feature channels;
233) computing the intermediate vector sequences Fi as:

$$F_i = A_i V'_i$$

and concatenating the Fi along the feature channel dimension to obtain the intermediate feature vector sequence F2;
234) adding F2 to the neighborhood feature vectors F1 generated in step 202) and applying layer normalization to generate the feature F3;
235) inputting F3 into the first feed-forward network, adding the features generated by the feed-forward network to F3, and applying layer normalization to generate the new point cloud features.
Further, the adaptive point cloud feature aggregation network comprises a plurality of decoding layers, each comprising a second self-attention model, a mutual-attention model, and a second feed-forward network;
the point cloud features are aggregated into target features layer by layer through the decoding layers;
further, the step 4) comprises:
401) the first decoding layer takes the target index point features generated in step 3) as input; each of the other decoding layers takes the first target features output by the previous decoding layer as input. Each decoding layer generates new first target features and second target features: the new first target features are input into a multilayer perceptron to generate target position information, from which a target position code is generated; the new second target features are obtained by adding the target position code generated by the previous decoding layer to the first target features;
the input first target features first pass through the second self-attention model to exchange information among the first target features; the mutual-attention model then extracts the relations between the first target features and the point cloud features to realize adaptive feature aggregation.
402) inputting the second target feature vector into three linear projection layers to generate the target query feature sequence Qs, key feature sequence Ks, and value feature sequence Vs;
403) with Qs, Ks, and Vs in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fs;
404) adding the intermediate feature vector sequence Fs to the target features input to the decoding layer and applying layer normalization to generate the feature T1;
405) adding the feature T1 to the target position code and applying a linear projection to obtain the target query feature sequence Qc;
406) adding the point cloud features extracted in step 2) to the point cloud position codes and passing the sum through two linear projection layers to obtain the point cloud key feature sequence Kc and the point cloud value feature sequence Vc;
407) with Qc, Kc, and Vc in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fc;
408) adding the intermediate feature vector sequence Fc to the feature T1 and applying layer normalization to generate the feature T2;
409) inputting T2 into the second feed-forward network, adding the features generated by the feed-forward network to T2, and applying layer normalization to obtain the first target feature vector output by the current decoding layer;
410) judging whether the current decoding layer is the last decoding layer; if so, taking the first target feature vector output by the current decoding layer as the candidate target features and ending; otherwise, executing step 401).
Further, point cloud feature propagation is realized between the layers of feature propagation networks through a feature propagation strategy based on distance interpolation together with skip connections;
taking the feature propagation network at the end of the point cloud extraction network as the l-th layer feature propagation network and the (l-1)-th layer feature propagation network as the layer to be propagated to, the specific process of point cloud feature propagation comprises:
241) finding, among the points of the l-th layer feature propagation network, the k nearest neighbors of each point of the (l-1)-th layer;
242) calculating the distance d(x, x′i) between each nearest neighbor and the point to be propagated, where x and x′i are the coordinates of the point to be propagated and of the nearest neighbor, respectively;
243) propagating the point cloud features output by the l-th layer feature propagation network to the (l-1)-th layer feature propagation network according to:

$$F = \frac{\sum_{i=1}^{k} \omega_i F_i}{\sum_{i=1}^{k} \omega_i}, \qquad \omega_i = \frac{1}{d(x, x'_i)^p}$$

where F is the point cloud feature propagated to the (l-1)-th layer feature propagation network, Fi is the feature of the i-th nearest neighbor, and p is the distance exponent;
244) concatenating the point cloud features propagated to the (l-1)-th layer feature propagation network with the point cloud features extracted by the point cloud attention network skip-connected to it, and obtaining the point cloud features output by the (l-1)-th layer feature propagation network through a 1×1 convolution.
Further, the step 3) comprises:
generating, through a multilayer perceptron, a confidence score for each point cloud feature extracted in step 2) that represents how close the feature is to the center of a target to be detected, and selecting in order the features with the highest confidence scores as the target index point features.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention discloses a point cloud extraction network whose core structure is a point cloud attention network combining a point network, a first self-attention model, and a sampling learning network. A sub-point cloud is first sampled from the point cloud through the sampling learning network; the neighborhood point cloud features of the input features and of the sub-point cloud, i.e. local features, are then extracted; finally, the first self-attention model exchanges information among all local point cloud features to extract the global features of the point cloud. Multiple point cloud attention networks are stacked to extract, layer by layer, point cloud features containing both local and global feature information, and the extracted features are then propagated to more points through multiple feature propagation layers, i.e. linear interpolation. The point cloud extraction network can extract both local and global point cloud features, reduces the information loss of the point cloud in sampling, and achieves high detection accuracy.
(2) The invention screens out target index point features from the point cloud features by a k-nearest value down-sampling method, aggregates the point cloud features into candidate target features through an adaptive point cloud feature aggregation network according to the target index point features, and generates the category, position, and size information of the target to be detected from the candidate target features through a multilayer perceptron. The adaptive point cloud feature aggregation network comprises a plurality of decoding layers, each comprising a second self-attention model, a mutual-attention model, and a second feed-forward network. The input first target features first pass through the second self-attention model to exchange information among themselves; the mutual-attention model then extracts the relations between the first target features and the point cloud features to realize adaptive feature aggregation. Aggregating the point cloud features into target features layer by layer through the decoding layers further strengthens the network's ability to detect occluded targets, with high detection accuracy and reliability.
Drawings
FIG. 1 is a block diagram of the detection process of the present invention;
FIG. 2 is a schematic structural diagram of the point cloud extraction network;
FIG. 3 is a schematic structural diagram of a decoding layer.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
As shown in FIG. 1, a point cloud target detection method based on attention and sampling learning comprises the following steps:
1) collecting point cloud data of the target to be detected;
2) extracting point cloud features from the point cloud data through a point cloud extraction network;
3) screening out target index point features from the point cloud features by a k-nearest value down-sampling method;
4) aggregating the point cloud features into candidate target features through an adaptive point cloud feature aggregation network according to the target index point features;
5) generating the category, position, and size information of the target to be detected from the candidate target features through a multilayer perceptron.
The point cloud target detection method provided by this embodiment runs on a point cloud target detection device comprising a controller and a lidar mounted on a vehicle. The lidar collects point cloud data around the vehicle, and the controller receives the collected data over a data line, enabling the intelligent vehicle to accurately detect targets such as vehicles, pedestrians, and static obstacles in the surrounding environment.
As shown in FIG. 2, the point cloud extraction network comprises four point cloud attention networks and two layers of feature propagation networks; the point cloud attention networks are connected in sequence, and the feature propagation networks are skip-connected to the point cloud attention networks;
the point cloud attention network comprises a first self-attention model, a sampling learning network, two point networks (PointNet), and a first feed-forward network;
in the point cloud target detection method provided by this embodiment, point cloud features are extracted through the point cloud extraction network and aggregated into candidate target features, which enhances the method's ability to detect occluded targets; during point cloud feature extraction, sampling learning reduces the information loss of the point cloud in sampling.
The process of extracting the point cloud features by each point cloud attention network comprises the following steps:
201) inputting the input features into the sampling learning network to generate a sub-point cloud;
202) inputting the input features and the generated sub-point cloud into one point network to obtain the neighborhood feature vectors F1; the coordinates corresponding to F1 are passed through a multilayer perceptron to generate point cloud position codes;
203) adding F1 to the point cloud position codes generated in step 202) and projecting the sum through a linear projection layer to form the point cloud query sequence Q;
204) extracting a subset F′1 of F1 by the farthest point sampling algorithm; F′1 and F1 are input into the other point network to generate a neighborhood feature vector set, which is added to the point cloud position codes generated in step 202) and then projected through a linear projection layer to generate the point cloud feature sequences, namely the point cloud key feature sequence K and the point cloud value feature sequence V;
205) inputting Q, K, and V into the first self-attention model to generate new point cloud features;
the input features of the sampling learning network of the first point cloud attention network are the point cloud data; for the remaining point cloud attention networks, the input features of the sampling learning network are the point cloud features output by the preceding point cloud attention network;
the point cloud extraction network takes as its core structure a point cloud attention network combining a point network, a first self-attention model, and a sampling learning network. First, a sub-point cloud is sampled from the point cloud through the sampling learning network; then, the neighborhood point cloud features of the input features and of the sub-point cloud, i.e. local features, are extracted through the point network; finally, the first self-attention model exchanges information among all local point cloud features to extract the global features of the point cloud. Multiple point cloud attention networks are stacked to extract, layer by layer, point cloud features containing both local and global feature information, and the extracted point cloud features are then propagated to more points through multiple feature propagation layers, i.e. linear interpolation;
the point cloud extraction network can thus extract both the local and the global features of the point cloud while reducing the information loss of the point cloud in sampling.
Step 201) comprises:
211) mapping the input features of dimension N × 3 or N × (3 + C) to a high-dimensional space through a multilayer perceptron to generate an N × C1 high-dimensional point cloud feature vector matrix;
212) retaining the maximum value on each feature channel of the N × C1 high-dimensional point cloud feature vector matrix through a max pooling operation to obtain a 1 × C1 global feature;
213) concatenating the global feature to each high-dimensional point cloud feature vector to obtain spliced point cloud features of dimension N × 2C1;
214) inputting the spliced point cloud features into a multilayer perceptron to obtain an N × N1 sampling matrix;
215) multiplying the sampling matrix with the input features to obtain the sub-point cloud.
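To make steps 211)-215) concrete, the following is a minimal PyTorch sketch of the sampling learning network; it is an illustration under assumptions, not the patent's implementation. The layer widths, the softmax normalization of the sampling matrix, and the sub-point-cloud size N1 are assumed values.

```python
# Minimal sketch of the sampling learning network (steps 211-215).
# Layer widths and the softmax normalization are assumptions.
import torch
import torch.nn as nn

class SamplingLearningNet(nn.Module):
    def __init__(self, in_dim=3, c1=64, n1=512):
        super().__init__()
        # 211) per-point MLP mapping input features to a high-dimensional space
        self.point_mlp = nn.Sequential(
            nn.Linear(in_dim, c1), nn.ReLU(),
            nn.Linear(c1, c1), nn.ReLU())
        # 214) MLP producing the N x N1 sampling matrix
        self.sample_mlp = nn.Sequential(
            nn.Linear(2 * c1, c1), nn.ReLU(),
            nn.Linear(c1, n1))

    def forward(self, x):                      # x: (N, 3) or (N, 3 + C)
        h = self.point_mlp(x)                  # (N, C1)
        g = h.max(dim=0, keepdim=True).values  # 212) max pooling -> (1, C1)
        h = torch.cat([h, g.expand_as(h)], dim=-1)   # 213) concat -> (N, 2*C1)
        s = self.sample_mlp(h).softmax(dim=0)        # (N, N1) sampling matrix
        return s.transpose(0, 1) @ x                 # 215) (N1, 3[+C]) sub-point cloud
```

With these assumptions, a call such as `SamplingLearningNet()(torch.rand(1024, 3))` returns a 512-point sub-point cloud in which each output point is a learned weighted combination of all input points, keeping the sampling step differentiable, which is presumably what allows sampling to be learned end to end.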
The process by which the point network generates the neighborhood feature vector set in step 204) comprises:
221) taking each element of F′1 as a sphere center and searching within a spherical neighborhood of radius r for elements of F1; the F1 elements within the neighborhood form the neighborhood feature set of that sphere center;
222) inputting the neighborhood feature set of each element of F′1 into a multilayer perceptron to obtain high-dimensional point cloud feature vectors;
223) performing a max pooling operation on the high-dimensional point cloud feature vectors of the neighborhood feature set of each element of F′1; each element neighborhood of F′1 generates one neighborhood feature vector, and these neighborhood feature vectors form the neighborhood feature vector set.
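The ball-query grouping of steps 221)-223) can be sketched as follows; the radius, the cap of k points per neighborhood, and the shared MLP widths are assumptions. This simplified top-k gather can pull in out-of-ball points when a neighborhood holds fewer than k points, which a full implementation would mask or pad.

```python
# Sketch of neighborhood feature aggregation (steps 221-223):
# ball query around each F'1 center, shared MLP, then max pooling.
import torch
import torch.nn as nn

def ball_query_group(centers, points, feats, radius=0.4, k=16):
    """Gather up to k point features within `radius` of each center."""
    d = torch.cdist(centers, points)               # (M, N) pairwise distances
    d = d.masked_fill(d > radius, float('inf'))    # mask points outside the ball
    idx = d.topk(k, largest=False).indices         # (M, k) nearest in-ball indices
    return feats[idx]                              # (M, k, C) neighborhood feature sets

class PointNetLayer(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # 222) shared multilayer perceptron applied to every neighborhood point
        self.mlp = nn.Sequential(nn.Linear(c_in, c_out), nn.ReLU(),
                                 nn.Linear(c_out, c_out), nn.ReLU())

    def forward(self, centers, points, feats):
        grouped = ball_query_group(centers, points, feats)  # 221)
        h = self.mlp(grouped)                               # (M, k, c_out)
        return h.max(dim=1).values                          # 223) one vector per center
```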
Step 205) comprises:
231) dividing Q, K, and V into n groups along the feature dimension and adding each group to the corresponding point cloud position code group, yielding the point cloud feature sequences with point cloud position information Q′i, K′i, and V′i, i = 1, 2, …, n;
232) computing the attention score matrix Ai as:

$$A_i = \operatorname{softmax}\left(\frac{Q'_i {K'_i}^{\top}}{\sqrt{d}}\right)$$

where d is the number of feature channels;
233) computing the intermediate vector sequences Fi as:

$$F_i = A_i V'_i$$

and concatenating the Fi along the feature channel dimension to obtain the intermediate feature vector sequence F2;
234) adding F2 to the neighborhood feature vectors F1 generated in step 202) and applying layer normalization to generate the feature F3;
235) inputting F3 into the first feed-forward network, adding the features generated by the feed-forward network to F3, and applying layer normalization to generate the new point cloud features.
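Steps 231)-235) amount to multi-head attention with additive position codes, a residual connection, and a feed-forward block. The sketch below assumes the n groups are equal chunks of the feature dimension, that the scaling constant d is the per-group channel count, and that the feed-forward network doubles the width; all of these are illustrative choices.

```python
# Sketch of the grouped self-attention of steps 231-235.
import torch
import torch.nn as nn
import torch.nn.functional as F

def grouped_attention(Q, K, V, n=4):
    # 231) split Q, K, V (position codes already added) into n groups
    outs = []
    for Qi, Ki, Vi in zip(Q.chunk(n, -1), K.chunk(n, -1), V.chunk(n, -1)):
        d = Qi.shape[-1]                                   # channels per group
        Ai = F.softmax(Qi @ Ki.transpose(-2, -1) / d ** 0.5, dim=-1)   # 232)
        outs.append(Ai @ Vi)                               # 233) intermediate Fi
    return torch.cat(outs, dim=-1)                         # concatenation -> F2

class PointCloudSelfAttention(nn.Module):
    def __init__(self, c, n=4):
        super().__init__()
        self.n = n
        self.norm1, self.norm2 = nn.LayerNorm(c), nn.LayerNorm(c)
        # first feed-forward network; the 2x expansion is an assumption
        self.ffn = nn.Sequential(nn.Linear(c, 2 * c), nn.ReLU(), nn.Linear(2 * c, c))

    def forward(self, Q, K, V, F1):
        F2 = grouped_attention(Q, K, V, self.n)
        F3 = self.norm1(F2 + F1)               # 234) residual with F1, layer norm
        return self.norm2(self.ffn(F3) + F3)   # 235) feed-forward, residual, norm
```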
As shown in FIG. 3, the adaptive point cloud feature aggregation network comprises a plurality of decoding layers, each comprising a second self-attention model, a mutual-attention model, and a second feed-forward network;
the point cloud features are aggregated into target features layer by layer through the decoding layers.
the step 3) comprises the following steps:
generating a confidence score representing the central approaching degree of the point cloud features and the target to be detected for each point cloud feature extracted in the step 2) through a trained multilayer perceptron, and sequentially selecting the feature with the highest confidence score as a target index point feature;
the training process of the multi-layer perceptron trained in the step 3) comprises the following steps:
the dot label inside the object truth bounding box and one of the k nearest points to the object center is set to true, otherwise false, and supervised using the focal loss function.
The step 4) comprises the following steps:
401) the first decoding layer takes the target index point features generated in step 3) as input; each of the other decoding layers takes the first target features output by the previous decoding layer as input. Each decoding layer generates new first target features and second target features: the new first target features are input into a multilayer perceptron to generate target position information, from which a target position code is generated; the new second target features are obtained by adding the target position code generated by the previous decoding layer to the first target features;
the input first target features first pass through the second self-attention model to exchange information among the first target features; the mutual-attention model then extracts the relations between the first target features and the point cloud features to realize adaptive feature aggregation.
402) inputting the second target feature vector into three linear projection layers to generate the target query feature sequence Qs, key feature sequence Ks, and value feature sequence Vs;
403) with Qs, Ks, and Vs in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fs;
404) adding the intermediate feature vector sequence Fs to the target features input to the decoding layer and applying layer normalization to generate the feature T1;
405) adding the feature T1 to the target position code and applying a linear projection to obtain the target query feature sequence Qc;
406) adding the point cloud features extracted in step 2) to the point cloud position codes and passing the sum through two linear projection layers to obtain the point cloud key feature sequence Kc and the point cloud value feature sequence Vc;
407) with Qc, Kc, and Vc in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fc;
408) adding the intermediate feature vector sequence Fc to the feature T1 and applying layer normalization to generate the feature T2;
409) inputting T2 into the second feed-forward network, adding the features generated by the feed-forward network to T2, and applying layer normalization to obtain the first target feature vector output by the current decoding layer;
410) judging whether the current decoding layer is the last decoding layer; if so, taking the first target feature vector output by the current decoding layer as the candidate target features and ending; otherwise, executing step 401).
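Structurally, each decoding layer of steps 401)-410) is a transformer decoder block: self-attention among target features, mutual (cross) attention onto the point cloud features, then a feed-forward network, each with a residual connection and layer normalization. The sketch below uses nn.MultiheadAttention for brevity where the patent uses its grouped attention of steps 231)-233); widths and head counts are assumptions.

```python
# Structural sketch of one decoding layer (second self-attention model,
# mutual-attention model, second feed-forward network).
import torch
import torch.nn as nn

class DecodingLayer(nn.Module):
    def __init__(self, c=256, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(c, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(c, heads, batch_first=True)
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(c) for _ in range(3))
        self.ffn = nn.Sequential(nn.Linear(c, 2 * c), nn.ReLU(), nn.Linear(2 * c, c))

    def forward(self, tgt, tgt_pos, pc_feats, pc_pos):
        # 402)-404) information exchange among target features (self-attention)
        q = k = tgt + tgt_pos
        t1 = self.norm1(tgt + self.self_attn(q, k, tgt)[0])
        # 405)-408) adaptive aggregation of point cloud features (mutual attention)
        t2 = self.norm2(t1 + self.cross_attn(t1 + tgt_pos,
                                             pc_feats + pc_pos, pc_feats)[0])
        # 409) second feed-forward network with residual and layer normalization
        return self.norm3(t2 + self.ffn(t2))
```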
Point cloud feature propagation is realized between the layers of feature propagation networks through a feature propagation strategy based on distance interpolation together with skip connections;
taking the feature propagation network at the end of the point cloud extraction network as the l-th layer feature propagation network and the (l-1)-th layer feature propagation network as the layer to be propagated to, the specific process of point cloud feature propagation comprises:
241) finding, among the points of the l-th layer feature propagation network, the k nearest neighbors of each point of the (l-1)-th layer;
242) calculating the distance d(x, x′i) between each nearest neighbor and the point to be propagated, where x and x′i are the coordinates of the point to be propagated and of the nearest neighbor, respectively;
243) propagating the point cloud features output by the l-th layer feature propagation network to the (l-1)-th layer feature propagation network according to:

$$F = \frac{\sum_{i=1}^{k} \omega_i F_i}{\sum_{i=1}^{k} \omega_i}, \qquad \omega_i = \frac{1}{d(x, x'_i)^p}$$

where F is the point cloud feature propagated to the (l-1)-th layer feature propagation network, Fi is the feature of the i-th nearest neighbor, and p is the distance exponent;
244) concatenating the point cloud features propagated to the (l-1)-th layer feature propagation network with the point cloud features extracted by the point cloud attention network skip-connected to it, and obtaining the point cloud features output by the (l-1)-th layer feature propagation network through a 1×1 convolution.
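The distance-interpolation propagation of steps 241)-244) matches the formula above; the sketch below assumes k = 3 and p = 2, common choices for inverse-distance interpolation, and illustrative channel widths for the 1 × 1 convolution.

```python
# Sketch of feature propagation (steps 241-244): inverse-distance interpolation
# from layer-l points onto layer-(l-1) points, then skip-concat and 1x1 conv.
import torch
import torch.nn as nn

def propagate_features(x_dst, x_src, f_src, k=3, p=2, eps=1e-8):
    d = torch.cdist(x_dst, x_src)                 # 241)-242) distances d(x, x_i')
    dk, idx = d.topk(k, largest=False)            # k nearest neighbors per point
    w = 1.0 / (dk ** p + eps)                     # w_i = 1 / d(x, x_i')^p
    w = w / w.sum(dim=-1, keepdim=True)           # normalize by sum of w_i
    return (w.unsqueeze(-1) * f_src[idx]).sum(1)  # 243) interpolated features F

# 244) concatenate with skip-connected attention features, fuse with 1x1 conv
fuse = nn.Conv1d(in_channels=512, out_channels=256, kernel_size=1)
propagated = propagate_features(torch.rand(2048, 3), torch.rand(512, 3),
                                torch.rand(512, 256))
skip = torch.rand(2048, 256)                      # features from the skip connection
out = fuse(torch.cat([propagated, skip], dim=-1).t().unsqueeze(0)).squeeze(0).t()
```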
This embodiment provides a point cloud target detection method based on attention and sampling learning whose entry points are enhancing the detection capability of point cloud target detection algorithms for occluded targets and reducing the point cloud information loss incurred at the sampling stage of point cloud feature extraction. Extracting and aggregating local and global point cloud features further strengthens the network's ability to detect occluded targets, and a sampling learning method at the point cloud sampling stage reduces the information loss of the point cloud in sampling.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (10)

1. A point cloud target detection method based on attention and sampling learning, characterized by comprising the following steps:
1) collecting point cloud data of the target to be detected;
2) extracting point cloud features from the point cloud data through a point cloud extraction network;
3) screening out target index point features from the point cloud features by a k-nearest value down-sampling method;
4) aggregating the point cloud features into candidate target features through an adaptive point cloud feature aggregation network according to the target index point features;
5) generating the category, position, and size information of the target to be detected from the candidate target features through a multilayer perceptron.
2. The point cloud target detection method based on attention and sampling learning according to claim 1, characterized in that the point cloud extraction network comprises a plurality of point cloud attention networks and multiple layers of feature propagation networks; the point cloud attention networks are connected in sequence, and the feature propagation networks are skip-connected to the point cloud attention networks;
the point cloud attention network comprises a first self-attention model, a sampling learning network, two point networks, and a first feed-forward network.
3. The point cloud target detection method based on attention and sampling learning according to claim 2, characterized in that the process of extracting point cloud features by each point cloud attention network comprises the following steps:
201) inputting the input features into the sampling learning network to generate a sub-point cloud;
202) inputting the input features and the generated sub-point cloud into one point network to obtain the neighborhood feature vectors F1; the coordinates corresponding to F1 are passed through a multilayer perceptron to generate point cloud position codes;
203) adding F1 to the point cloud position codes generated in step 202) and projecting the sum through a linear projection layer to form the point cloud query sequence Q;
204) extracting a subset F′1 of F1 by the farthest point sampling algorithm; F′1 and F1 are input into the other point network to generate a neighborhood feature vector set, which is added to the point cloud position codes generated in step 202) and then projected through a linear projection layer to generate the point cloud feature sequences, namely the point cloud key feature sequence K and the point cloud value feature sequence V;
205) inputting Q, K, and V into the first self-attention model to generate new point cloud features;
the input features of the sampling learning network of the first point cloud attention network are the point cloud data; for the remaining point cloud attention networks, the input features of the sampling learning network are the point cloud features output by the preceding point cloud attention network.
4. The point cloud target detection method based on attention and sampling learning according to claim 3, characterized in that the step 201) comprises:
211) mapping the input features to a high-dimensional space through a multilayer perceptron to generate a high-dimensional point cloud feature vector matrix;
212) retaining the maximum value on each feature channel of the high-dimensional point cloud feature vector matrix through a max pooling operation to obtain a global feature;
213) concatenating the global feature to each high-dimensional point cloud feature vector to obtain spliced point cloud features;
214) inputting the spliced point cloud features into a multilayer perceptron to obtain a sampling matrix;
215) multiplying the sampling matrix with the input features to obtain the sub-point cloud.
5. The point cloud target detection method based on attention and sampling learning according to claim 3, characterized in that the process by which the point network generates the neighborhood feature vector set in step 204) comprises:
221) taking each element of F′1 as a sphere center and searching within a spherical neighborhood of radius r for elements of F1; the F1 elements within the neighborhood form the neighborhood feature set of that sphere center;
222) inputting the neighborhood feature set of each element of F′1 into a multilayer perceptron to obtain high-dimensional point cloud feature vectors;
223) performing a max pooling operation on the high-dimensional point cloud feature vectors of the neighborhood feature set of each element of F′1; each element neighborhood of F′1 generates one neighborhood feature vector, and these neighborhood feature vectors form the neighborhood feature vector set.
6. The point cloud target detection method based on attention and sampling learning according to claim 3, characterized in that the step 205) comprises:
231) dividing Q, K, and V into n groups along the feature dimension and adding each group to the corresponding point cloud position code group, yielding the point cloud feature sequences with point cloud position information Q′i, K′i, and V′i, i = 1, 2, …, n;
232) computing the attention score matrix Ai as:

$$A_i = \operatorname{softmax}\left(\frac{Q'_i {K'_i}^{\top}}{\sqrt{d}}\right)$$

where d is the number of feature channels;
233) computing the intermediate vector sequences Fi as:

$$F_i = A_i V'_i$$

and concatenating the Fi along the feature channel dimension to obtain the intermediate feature vector sequence F2;
234) adding F2 to the neighborhood feature vectors F1 generated in step 202) and applying layer normalization to generate the feature F3;
235) inputting F3 into the first feed-forward network, adding the features generated by the feed-forward network to F3, and applying layer normalization to generate the new point cloud features.
7. The point cloud target detection method based on attention and sampling learning according to claim 6, characterized in that the adaptive point cloud feature aggregation network comprises a plurality of decoding layers, each decoding layer comprising a second self-attention model, a mutual-attention model, and a second feed-forward network.
8. The point cloud target detection method based on attention and sampling learning according to claim 7, characterized in that the step 4) comprises:
401) the first decoding layer takes the target index point features generated in step 3) as input; each of the other decoding layers takes the first target features output by the previous decoding layer as input. Each decoding layer generates new first target features and second target features: the new first target features are input into a multilayer perceptron to generate target position information, from which a target position code is generated; the new second target features are obtained by adding the target position code generated by the previous decoding layer to the first target features;
402) inputting the second target feature vector into three linear projection layers to generate the target query feature sequence Qs, key feature sequence Ks, and value feature sequence Vs;
403) with Qs, Ks, and Vs in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fs;
404) adding the intermediate feature vector sequence Fs to the target features input to the decoding layer and applying layer normalization to generate the feature T1;
405) adding the feature T1 to the target position code and applying a linear projection to obtain the target query feature sequence Qc;
406) adding the point cloud features extracted in step 2) to the point cloud position codes and passing the sum through two linear projection layers to obtain the point cloud key feature sequence Kc and the point cloud value feature sequence Vc;
407) with Qc, Kc, and Vc in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fc;
408) adding the intermediate feature vector sequence Fc to the feature T1 and applying layer normalization to generate the feature T2;
409) inputting T2 into the second feed-forward network, adding the features generated by the feed-forward network to T2, and applying layer normalization to obtain the first target feature vector output by the current decoding layer;
410) judging whether the current decoding layer is the last decoding layer; if so, taking the first target feature vector output by the current decoding layer as the candidate target features and ending; otherwise, executing step 401).
9. The point cloud target detection method based on attention and sampling learning according to claim 3, characterized in that point cloud feature propagation is realized between the layers of feature propagation networks through a feature propagation strategy based on distance interpolation together with skip connections;
taking the feature propagation network at the end of the point cloud extraction network as the l-th layer feature propagation network and the (l-1)-th layer feature propagation network as the layer to be propagated to, the specific process of point cloud feature propagation comprises:
241) finding, among the points of the l-th layer feature propagation network, the k nearest neighbors of each point of the (l-1)-th layer;
242) calculating the distance d(x, x′i) between each nearest neighbor and the point to be propagated, where x and x′i are the coordinates of the point to be propagated and of the nearest neighbor, respectively;
243) propagating the point cloud features output by the l-th layer feature propagation network to the (l-1)-th layer feature propagation network according to:

$$F = \frac{\sum_{i=1}^{k} \omega_i F_i}{\sum_{i=1}^{k} \omega_i}, \qquad \omega_i = \frac{1}{d(x, x'_i)^p}$$

where F is the point cloud feature propagated to the (l-1)-th layer feature propagation network, Fi is the feature of the i-th nearest neighbor, and p is the distance exponent;
244) concatenating the point cloud features propagated to the (l-1)-th layer feature propagation network with the point cloud features extracted by the point cloud attention network skip-connected to it, and obtaining the point cloud features output by the (l-1)-th layer feature propagation network through a 1×1 convolution.
10. The point cloud target detection method based on attention and sampling learning according to claim 1, characterized in that the step 3) comprises:
generating, through a multilayer perceptron, a confidence score for each point cloud feature extracted in step 2) that represents how close the feature is to the center of a target to be detected, and selecting in order the features with the highest confidence scores as the target index point features.
CN202111314134.7A 2021-11-08 2021-11-08 Point cloud target detection method based on attention and sampling learning Pending CN114120270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111314134.7A CN114120270A (en) 2021-11-08 2021-11-08 Point cloud target detection method based on attention and sampling learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111314134.7A CN114120270A (en) 2021-11-08 2021-11-08 Point cloud target detection method based on attention and sampling learning

Publications (1)

Publication Number Publication Date
CN114120270A true CN114120270A (en) 2022-03-01

Family

ID=80381357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111314134.7A Pending CN114120270A (en) 2021-11-08 2021-11-08 Point cloud target detection method based on attention and sampling learning

Country Status (1)

Country Link
CN (1) CN114120270A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023202401A1 (en) * 2022-04-19 2023-10-26 京东科技信息技术有限公司 Method and apparatus for detecting target in point cloud data, and computer-readable storage medium
CN115294343A (en) * 2022-07-13 2022-11-04 苏州驾驶宝智能科技有限公司 Point cloud feature enhancement method based on cross-position and channel attention mechanism
CN115294343B (en) * 2022-07-13 2023-04-18 苏州驾驶宝智能科技有限公司 Point cloud feature enhancement method based on cross-position and channel attention mechanism

Similar Documents

Al-qaness et al. An improved YOLO-based road traffic monitoring system
US10733755B2 (en) Learning geometric differentials for matching 3D models to objects in a 2D image
US20220011122A1 (en) Trajectory prediction method and device
WO2021249071A1 (en) Lane line detection method, and related apparatus
US10984659B2 (en) Vehicle parking availability map systems and methods
Ni et al. An improved deep network-based scene classification method for self-driving cars
CN111626217A (en) Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion
EP3822852B1 (en) Method, apparatus, computer storage medium and program for training a trajectory planning model
CN110781927B (en) Target detection and classification method based on deep learning under vehicle-road cooperation
CN116685874A (en) Camera-laser radar fusion object detection system and method
CN114120270A (en) Point cloud target detection method based on attention and sampling learning
US20220261590A1 (en) Apparatus, system and method for fusing sensor data to do sensor translation
CN113095152B (en) Regression-based lane line detection method and system
Zhang et al. Gc-net: Gridding and clustering for traffic object detection with roadside lidar
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
Bai et al. A survey and framework of cooperative perception: From heterogeneous singleton to hierarchical cooperation
Kanchana et al. Computer vision for autonomous driving
Guo et al. Feature‐based detection and classification of moving objects using LiDAR sensor
EP3764335A1 (en) Vehicle parking availability map systems and methods
CN113611008B (en) Vehicle driving scene acquisition method, device, equipment and medium
Bruno et al. A comparison of traffic signs detection methods in 2d and 3d images for the benefit of the navigation of autonomous vehicles
Liu et al. A vehicle detection model based on 5G-V2X for smart city security perception
Sharma et al. Deep Learning-Based Object Detection and Classification for Autonomous Vehicles in Different Weather Scenarios of Quebec, Canada
CN114495050A (en) Multitask integrated detection method for automatic driving forward vision detection
Yao et al. Lane marking detection algorithm based on high‐precision map and multisensor fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination