CN114120270A - Point cloud target detection method based on attention and sampling learning - Google Patents

Point cloud target detection method based on attention and sampling learning

Info

Publication number
CN114120270A
CN114120270A
Authority
CN
China
Prior art keywords
point cloud
target
feature
network
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111314134.7A
Other languages
Chinese (zh)
Inventor
田炜
赵晓龙
邓振文
黄禹尧
谭大艺
韩帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN202111314134.7A priority Critical patent/CN114120270A/en
Publication of CN114120270A publication Critical patent/CN114120270A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention relates to a point cloud target detection method based on attention and sampling learning, comprising the following steps: 1) collecting point cloud data of the target to be detected; 2) extracting point cloud features from the point cloud data through a point cloud extraction network; 3) screening out target index point features from the point cloud features by a k-nearest value down-sampling method; 4) aggregating the point cloud features into candidate target features through an adaptive point cloud feature aggregation network according to the target index point features; 5) generating the category, position, and size information of the target to be detected from the candidate target features through a multilayer perceptron. Compared with the prior art, the method offers strong reliability and high accuracy.

Description

Point cloud target detection method based on attention and sampling learning
Technical Field
The invention relates to the fields of intelligent vehicles and computer vision, and in particular to a point cloud target detection method based on attention and sampling learning.
Background
With the development of society and rising living standards, car ownership in China is steadily increasing, and road traffic safety has accordingly become an important issue. The development of intelligent vehicles is expected to further improve vehicle safety and reduce the loss of life and property caused by traffic accidents.
An intelligent vehicle must accurately detect targets such as vehicles, pedestrians, non-motorized vehicles, and static obstacles in the surrounding environment through environment perception sensors and corresponding target detection methods. Commonly used environment perception sensors include cameras, lidar, and millimeter-wave radar. Lacking depth information, the image data acquired by a camera makes it difficult to accurately localize a target in space and is easily affected by ambient light. Millimeter-wave radar can acquire three-dimensional position information, but the information it acquires is too sparse and easily leads to missed detections. Lidar can densely sample three-dimensional positions on the surfaces of the surrounding environment, yielding accurate three-dimensional point cloud data; point cloud target detection relies on lidar and can accurately detect the size and spatial position of a target. The technical routes for point cloud target detection currently fall into three types: voxel-based routes (the point cloud is first discretized into regular voxels, and targets are then detected by a neural network), projection-based routes (the point cloud is first projected onto one or more plane views, and targets are then detected by a neural network), and direct point cloud detection routes (targets are detected by a neural network directly, without changing the representation of the point cloud). Information loss during point cloud processing (voxelization, projection) should be avoided as far as possible.
Compared with the other two technical routes, direct point cloud detection involves no voxelization or projection and therefore loses less information in point cloud preprocessing. However, even methods on this route lose point cloud information at the sampling stage during layer-by-layer feature extraction, which urgently needs improvement. In addition, in traffic scenes a target is easily occluded by other targets or obstacles, which has become a bottleneck limiting intelligent driving technology.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a point cloud target detection method based on attention and sampling learning with high accuracy and strong reliability, improving the detection performance for occluded vehicles and the reliability of the intelligent vehicle's environment perception system.
The purpose of the invention can be realized by the following technical scheme:
a point cloud target detection method based on attention and sampling learning comprises the following steps:
1) collecting point cloud data of the target to be detected;
2) extracting point cloud features from the point cloud data through a point cloud extraction network;
3) screening out target index point features from the point cloud features by a k-nearest value down-sampling method;
4) aggregating the point cloud features into candidate target features through an adaptive point cloud feature aggregation network according to the target index point features;
5) generating the category, position, and size information of the target to be detected from the candidate target features through a multilayer perceptron.
Further, the point cloud extraction network comprises a plurality of point cloud attention networks and multiple layers of feature propagation networks; the point cloud attention networks are connected in sequence, and the feature propagation networks are skip-connected to the point cloud attention networks;
the point cloud attention network comprises a first self-attention model, a sampling learning network, two point networks, and a first feed-forward network;
in the point cloud target detection method provided by the invention, point cloud features are extracted through the point cloud extraction network and aggregated into candidate target features, which further strengthens the method's ability to detect occluded targets; during point cloud feature extraction, sampling learning reduces the information loss of the point cloud in sampling.
Further, the process of extracting the point cloud features by each point cloud attention network comprises the following steps:
201) inputting the input features into the sampling learning network to generate a sub-point cloud;
202) inputting the input features and the generated sub-point cloud into one point network to obtain the neighborhood feature vectors F1; the coordinates corresponding to F1 are passed through a multilayer perceptron to generate point cloud position codes;
203) adding F1 to the point cloud position codes generated in step 202) and projecting the sum through a linear projection layer to form the point cloud query sequence Q;
204) extracting a subset F′1 of F1 by the farthest point sampling algorithm; F′1 and F1 are input into the other point network to generate a neighborhood feature vector set, which is added to the point cloud position codes generated in step 202) and then projected through a linear projection layer to generate the point cloud feature sequences, namely the point cloud key feature sequence K and the point cloud value feature sequence V;
205) inputting Q, K, and V into the first self-attention model to generate new point cloud features;
the input features of the sampling learning network of the first point cloud attention network are the point cloud data; for the remaining point cloud attention networks, the input features of the sampling learning network are the point cloud features output by the preceding point cloud attention network;
the point cloud extraction network takes as its core structure a point cloud attention network combining a point network, a first self-attention model, and a sampling learning network. First, a sub-point cloud is sampled from the point cloud through the sampling learning network; then, the neighborhood point cloud features of the input features and of the sub-point cloud, i.e. local features, are extracted through the point network; finally, the first self-attention model exchanges information among all local point cloud features to extract the global features of the point cloud. Multiple point cloud attention networks are stacked to extract, layer by layer, point cloud features containing both local and global feature information, and the extracted point cloud features are then propagated to more points through multiple feature propagation layers, i.e. linear interpolation;
the point cloud extraction network can thus extract both the local and the global features of the point cloud while reducing the information loss of the point cloud in sampling.
Further, the step 201) includes:
211) mapping the input features to a high-dimensional space through a multilayer perceptron to generate a high-dimensional point cloud feature vector matrix;
212) retaining the maximum value on each feature channel of the high-dimensional point cloud feature vector matrix through a max pooling operation to obtain a global feature;
213) concatenating the global feature to each high-dimensional point cloud feature vector to obtain spliced point cloud features;
214) inputting the spliced point cloud features into a multilayer perceptron to obtain a sampling matrix;
215) multiplying the sampling matrix with the input features to obtain the sub-point cloud.
Further, the process by which the point network generates the neighborhood feature vector set in step 204) comprises:
221) taking each element of F′1 as a sphere center and searching within a spherical neighborhood of radius r for elements of F1; the F1 elements within the neighborhood form the neighborhood feature set of that sphere center;
222) inputting the neighborhood feature set of each element of F′1 into a multilayer perceptron to obtain high-dimensional point cloud feature vectors;
223) performing a max pooling operation on the high-dimensional point cloud feature vectors of the neighborhood feature set of each element of F′1; each element neighborhood of F′1 generates one neighborhood feature vector, and these neighborhood feature vectors form the neighborhood feature vector set.
Further, the step 205) comprises:
231) dividing Q, K, and V into n groups along the feature dimension and adding each group to the corresponding point cloud position code group, yielding the point cloud feature sequences with point cloud position information Q′i, K′i, and V′i, i = 1, 2, …, n;
232) computing the attention score matrix Ai as:

$$A_i = \operatorname{softmax}\left(\frac{Q'_i {K'_i}^{\top}}{\sqrt{d}}\right)$$

where d is the number of feature channels;
233) computing the intermediate vector sequences Fi as:

$$F_i = A_i V'_i$$

and concatenating the Fi along the feature channel dimension to obtain the intermediate feature vector sequence F2;
234) adding F2 to the neighborhood feature vectors F1 generated in step 202) and applying layer normalization to generate the feature F3;
235) inputting F3 into the first feed-forward network, adding the features generated by the feed-forward network to F3, and applying layer normalization to generate the new point cloud features.
Further, the adaptive point cloud feature aggregation network comprises a plurality of decoding layers, each comprising a second self-attention model, a mutual-attention model, and a second feed-forward network;
the point cloud features are aggregated into target features layer by layer through the decoding layers;
further, the step 4) comprises:
401) the first decoding layer takes the target index point features generated in step 3) as input; each of the other decoding layers takes the first target features output by the previous decoding layer as input. Each decoding layer generates new first target features and second target features: the new first target features are input into a multilayer perceptron to generate target position information, from which a target position code is generated; the new second target features are obtained by adding the target position code generated by the previous decoding layer to the first target features;
the input first target features first pass through the second self-attention model to exchange information among the first target features; the mutual-attention model then extracts the relations between the first target features and the point cloud features to realize adaptive feature aggregation.
402) inputting the second target feature vector into three linear projection layers to generate the target query feature sequence Qs, key feature sequence Ks, and value feature sequence Vs;
403) with Qs, Ks, and Vs in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fs;
404) adding the intermediate feature vector sequence Fs to the target features input to the decoding layer and applying layer normalization to generate the feature T1;
405) adding the feature T1 to the target position code and applying a linear projection to obtain the target query feature sequence Qc;
406) adding the point cloud features extracted in step 2) to the point cloud position codes and passing the sum through two linear projection layers to obtain the point cloud key feature sequence Kc and the point cloud value feature sequence Vc;
407) with Qc, Kc, and Vc in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fc;
408) adding the intermediate feature vector sequence Fc to the feature T1 and applying layer normalization to generate the feature T2;
409) inputting T2 into the second feed-forward network, adding the features generated by the feed-forward network to T2, and applying layer normalization to obtain the first target feature vector output by the current decoding layer;
410) judging whether the current decoding layer is the last decoding layer; if so, taking the first target feature vector output by the current decoding layer as the candidate target features and ending; otherwise, executing step 401).
Further, point cloud feature propagation is realized between the layers of feature propagation networks through a feature propagation strategy based on distance interpolation together with skip connections;
taking the feature propagation network at the end of the point cloud extraction network as the l-th layer feature propagation network and the (l-1)-th layer feature propagation network as the layer to be propagated to, the specific process of point cloud feature propagation comprises:
241) finding, among the points of the l-th layer feature propagation network, the k nearest neighbors of each point of the (l-1)-th layer;
242) calculating the distance d(x, x′i) between each nearest neighbor and the point to be propagated, where x and x′i are the coordinates of the point to be propagated and of the nearest neighbor, respectively;
243) propagating the point cloud features output by the l-th layer feature propagation network to the (l-1)-th layer feature propagation network according to:

$$F = \frac{\sum_{i=1}^{k} \omega_i F_i}{\sum_{i=1}^{k} \omega_i}, \qquad \omega_i = \frac{1}{d(x, x'_i)^p}$$

where F is the point cloud feature propagated to the (l-1)-th layer feature propagation network, Fi is the feature of the i-th nearest neighbor, and p is the distance exponent;
244) concatenating the point cloud features propagated to the (l-1)-th layer feature propagation network with the point cloud features extracted by the point cloud attention network skip-connected to it, and obtaining the point cloud features output by the (l-1)-th layer feature propagation network through a 1×1 convolution.
Further, the step 3) comprises:
generating, through a multilayer perceptron, a confidence score for each point cloud feature extracted in step 2) that represents how close the feature is to the center of a target to be detected, and selecting in order the features with the highest confidence scores as the target index point features.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention discloses a point cloud extraction network whose core structure is a point cloud attention network combining a point network, a first self-attention model, and a sampling learning network. A sub-point cloud is first sampled from the point cloud through the sampling learning network; the neighborhood point cloud features of the input features and of the sub-point cloud, i.e. local features, are then extracted; finally, the first self-attention model exchanges information among all local point cloud features to extract the global features of the point cloud. Multiple point cloud attention networks are stacked to extract, layer by layer, point cloud features containing both local and global feature information, and the extracted features are then propagated to more points through multiple feature propagation layers, i.e. linear interpolation. The point cloud extraction network can extract both local and global point cloud features, reduces the information loss of the point cloud in sampling, and achieves high detection accuracy.
(2) The invention screens out target index point features from the point cloud features by a k-nearest value down-sampling method, aggregates the point cloud features into candidate target features through an adaptive point cloud feature aggregation network according to the target index point features, and generates the category, position, and size information of the target to be detected from the candidate target features through a multilayer perceptron. The adaptive point cloud feature aggregation network comprises a plurality of decoding layers, each comprising a second self-attention model, a mutual-attention model, and a second feed-forward network. The input first target features first pass through the second self-attention model to exchange information among themselves; the mutual-attention model then extracts the relations between the first target features and the point cloud features to realize adaptive feature aggregation. Aggregating the point cloud features into target features layer by layer through the decoding layers further strengthens the network's ability to detect occluded targets, with high detection accuracy and reliability.
Drawings
FIG. 1 is a block diagram of the detection process of the present invention;
FIG. 2 is a schematic structural diagram of the point cloud extraction network;
FIG. 3 is a schematic structural diagram of a decoding layer.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
As shown in FIG. 1, a point cloud target detection method based on attention and sampling learning comprises the following steps:
1) collecting point cloud data of the target to be detected;
2) extracting point cloud features from the point cloud data through a point cloud extraction network;
3) screening out target index point features from the point cloud features by a k-nearest value down-sampling method;
4) aggregating the point cloud features into candidate target features through an adaptive point cloud feature aggregation network according to the target index point features;
5) generating the category, position, and size information of the target to be detected from the candidate target features through a multilayer perceptron.
The point cloud target detection method provided by this embodiment runs on a point cloud target detection device comprising a controller and a lidar mounted on a vehicle. The lidar collects point cloud data around the vehicle, and the controller receives the collected data over a data line, enabling the intelligent vehicle to accurately detect targets such as vehicles, pedestrians, and static obstacles in the surrounding environment.
As shown in FIG. 2, the point cloud extraction network comprises four point cloud attention networks and two layers of feature propagation networks; the point cloud attention networks are connected in sequence, and the feature propagation networks are skip-connected to the point cloud attention networks;
the point cloud attention network comprises a first self-attention model, a sampling learning network, two point networks (PointNet), and a first feed-forward network;
in the point cloud target detection method provided by this embodiment, point cloud features are extracted through the point cloud extraction network and aggregated into candidate target features, which enhances the method's ability to detect occluded targets; during point cloud feature extraction, sampling learning reduces the information loss of the point cloud in sampling.
The process of extracting the point cloud features by each point cloud attention network comprises the following steps:
201) inputting the input features into the sampling learning network to generate a sub-point cloud;
202) inputting the input features and the generated sub-point cloud into one point network to obtain the neighborhood feature vectors F1; the coordinates corresponding to F1 are passed through a multilayer perceptron to generate point cloud position codes;
203) adding F1 to the point cloud position codes generated in step 202) and projecting the sum through a linear projection layer to form the point cloud query sequence Q;
204) extracting a subset F′1 of F1 by the farthest point sampling algorithm; F′1 and F1 are input into the other point network to generate a neighborhood feature vector set, which is added to the point cloud position codes generated in step 202) and then projected through a linear projection layer to generate the point cloud feature sequences, namely the point cloud key feature sequence K and the point cloud value feature sequence V;
205) inputting Q, K, and V into the first self-attention model to generate new point cloud features;
the input features of the sampling learning network of the first point cloud attention network are the point cloud data; for the remaining point cloud attention networks, the input features of the sampling learning network are the point cloud features output by the preceding point cloud attention network;
the point cloud extraction network takes as its core structure a point cloud attention network combining a point network, a first self-attention model, and a sampling learning network. First, a sub-point cloud is sampled from the point cloud through the sampling learning network; then, the neighborhood point cloud features of the input features and of the sub-point cloud, i.e. local features, are extracted through the point network; finally, the first self-attention model exchanges information among all local point cloud features to extract the global features of the point cloud. Multiple point cloud attention networks are stacked to extract, layer by layer, point cloud features containing both local and global feature information, and the extracted point cloud features are then propagated to more points through multiple feature propagation layers, i.e. linear interpolation;
the point cloud extraction network can thus extract both the local and the global features of the point cloud while reducing the information loss of the point cloud in sampling.
Step 201) comprises:
211) mapping the input features of dimension N × 3 or N × (3 + C) to a high-dimensional space through a multilayer perceptron to generate an N × C1 high-dimensional point cloud feature vector matrix;
212) retaining the maximum value on each feature channel of the N × C1 high-dimensional point cloud feature vector matrix through a max pooling operation to obtain a 1 × C1 global feature;
213) concatenating the global feature to each high-dimensional point cloud feature vector to obtain spliced point cloud features of dimension N × 2C1;
214) inputting the spliced point cloud features into a multilayer perceptron to obtain an N × N1 sampling matrix;
215) multiplying the sampling matrix with the input features to obtain the sub-point cloud.
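To make steps 211)-215) concrete, the following is a minimal PyTorch sketch of the sampling learning network; it is an illustration under assumptions, not the patent's implementation. The layer widths, the softmax normalization of the sampling matrix, and the sub-point-cloud size N1 are assumed values.

```python
# Minimal sketch of the sampling learning network (steps 211-215).
# Layer widths and the softmax normalization are assumptions.
import torch
import torch.nn as nn

class SamplingLearningNet(nn.Module):
    def __init__(self, in_dim=3, c1=64, n1=512):
        super().__init__()
        # 211) per-point MLP mapping input features to a high-dimensional space
        self.point_mlp = nn.Sequential(
            nn.Linear(in_dim, c1), nn.ReLU(),
            nn.Linear(c1, c1), nn.ReLU())
        # 214) MLP producing the N x N1 sampling matrix
        self.sample_mlp = nn.Sequential(
            nn.Linear(2 * c1, c1), nn.ReLU(),
            nn.Linear(c1, n1))

    def forward(self, x):                      # x: (N, 3) or (N, 3 + C)
        h = self.point_mlp(x)                  # (N, C1)
        g = h.max(dim=0, keepdim=True).values  # 212) max pooling -> (1, C1)
        h = torch.cat([h, g.expand_as(h)], dim=-1)   # 213) concat -> (N, 2*C1)
        s = self.sample_mlp(h).softmax(dim=0)        # (N, N1) sampling matrix
        return s.transpose(0, 1) @ x                 # 215) (N1, 3[+C]) sub-point cloud
```

With these assumptions, a call such as `SamplingLearningNet()(torch.rand(1024, 3))` returns a 512-point sub-point cloud in which each output point is a learned weighted combination of all input points, keeping the sampling step differentiable, which is presumably what allows sampling to be learned end to end.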
The process by which the point network generates the neighborhood feature vector set in step 204) comprises:
221) taking each element of F′1 as a sphere center and searching within a spherical neighborhood of radius r for elements of F1; the F1 elements within the neighborhood form the neighborhood feature set of that sphere center;
222) inputting the neighborhood feature set of each element of F′1 into a multilayer perceptron to obtain high-dimensional point cloud feature vectors;
223) performing a max pooling operation on the high-dimensional point cloud feature vectors of the neighborhood feature set of each element of F′1; each element neighborhood of F′1 generates one neighborhood feature vector, and these neighborhood feature vectors form the neighborhood feature vector set.
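The ball-query grouping of steps 221)-223) can be sketched as follows; the radius, the cap of k points per neighborhood, and the shared MLP widths are assumptions. This simplified top-k gather can pull in out-of-ball points when a neighborhood holds fewer than k points, which a full implementation would mask or pad.

```python
# Sketch of neighborhood feature aggregation (steps 221-223):
# ball query around each F'1 center, shared MLP, then max pooling.
import torch
import torch.nn as nn

def ball_query_group(centers, points, feats, radius=0.4, k=16):
    """Gather up to k point features within `radius` of each center."""
    d = torch.cdist(centers, points)               # (M, N) pairwise distances
    d = d.masked_fill(d > radius, float('inf'))    # mask points outside the ball
    idx = d.topk(k, largest=False).indices         # (M, k) nearest in-ball indices
    return feats[idx]                              # (M, k, C) neighborhood feature sets

class PointNetLayer(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        # 222) shared multilayer perceptron applied to every neighborhood point
        self.mlp = nn.Sequential(nn.Linear(c_in, c_out), nn.ReLU(),
                                 nn.Linear(c_out, c_out), nn.ReLU())

    def forward(self, centers, points, feats):
        grouped = ball_query_group(centers, points, feats)  # 221)
        h = self.mlp(grouped)                               # (M, k, c_out)
        return h.max(dim=1).values                          # 223) one vector per center
```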
Step 205) comprises:
231) dividing Q, K, and V into n groups along the feature dimension and adding each group to the corresponding point cloud position code group, yielding the point cloud feature sequences with point cloud position information Q′i, K′i, and V′i, i = 1, 2, …, n;
232) computing the attention score matrix Ai as:

$$A_i = \operatorname{softmax}\left(\frac{Q'_i {K'_i}^{\top}}{\sqrt{d}}\right)$$

where d is the number of feature channels;
233) computing the intermediate vector sequences Fi as:

$$F_i = A_i V'_i$$

and concatenating the Fi along the feature channel dimension to obtain the intermediate feature vector sequence F2;
234) adding F2 to the neighborhood feature vectors F1 generated in step 202) and applying layer normalization to generate the feature F3;
235) inputting F3 into the first feed-forward network, adding the features generated by the feed-forward network to F3, and applying layer normalization to generate the new point cloud features.
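Steps 231)-235) amount to multi-head attention with additive position codes, a residual connection, and a feed-forward block. The sketch below assumes the n groups are equal chunks of the feature dimension, that the scaling constant d is the per-group channel count, and that the feed-forward network doubles the width; all of these are illustrative choices.

```python
# Sketch of the grouped self-attention of steps 231-235.
import torch
import torch.nn as nn
import torch.nn.functional as F

def grouped_attention(Q, K, V, n=4):
    # 231) split Q, K, V (position codes already added) into n groups
    outs = []
    for Qi, Ki, Vi in zip(Q.chunk(n, -1), K.chunk(n, -1), V.chunk(n, -1)):
        d = Qi.shape[-1]                                   # channels per group
        Ai = F.softmax(Qi @ Ki.transpose(-2, -1) / d ** 0.5, dim=-1)   # 232)
        outs.append(Ai @ Vi)                               # 233) intermediate Fi
    return torch.cat(outs, dim=-1)                         # concatenation -> F2

class PointCloudSelfAttention(nn.Module):
    def __init__(self, c, n=4):
        super().__init__()
        self.n = n
        self.norm1, self.norm2 = nn.LayerNorm(c), nn.LayerNorm(c)
        # first feed-forward network; the 2x expansion is an assumption
        self.ffn = nn.Sequential(nn.Linear(c, 2 * c), nn.ReLU(), nn.Linear(2 * c, c))

    def forward(self, Q, K, V, F1):
        F2 = grouped_attention(Q, K, V, self.n)
        F3 = self.norm1(F2 + F1)               # 234) residual with F1, layer norm
        return self.norm2(self.ffn(F3) + F3)   # 235) feed-forward, residual, norm
```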
As shown in FIG. 3, the adaptive point cloud feature aggregation network comprises a plurality of decoding layers, each comprising a second self-attention model, a mutual-attention model, and a second feed-forward network;
the point cloud features are aggregated into target features layer by layer through the decoding layers.
the step 3) comprises the following steps:
generating a confidence score representing the central approaching degree of the point cloud features and the target to be detected for each point cloud feature extracted in the step 2) through a trained multilayer perceptron, and sequentially selecting the feature with the highest confidence score as a target index point feature;
the training process of the multi-layer perceptron trained in the step 3) comprises the following steps:
the dot label inside the object truth bounding box and one of the k nearest points to the object center is set to true, otherwise false, and supervised using the focal loss function.
The step 4) comprises the following steps:
401) the first decoding layer takes the target index point features generated in step 3) as input; each of the other decoding layers takes the first target features output by the previous decoding layer as input. Each decoding layer generates new first target features and second target features: the new first target features are input into a multilayer perceptron to generate target position information, from which a target position code is generated; the new second target features are obtained by adding the target position code generated by the previous decoding layer to the first target features;
the input first target features first pass through the second self-attention model to exchange information among the first target features; the mutual-attention model then extracts the relations between the first target features and the point cloud features to realize adaptive feature aggregation.
402) inputting the second target feature vector into three linear projection layers to generate the target query feature sequence Qs, key feature sequence Ks, and value feature sequence Vs;
403) with Qs, Ks, and Vs in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fs;
404) adding the intermediate feature vector sequence Fs to the target features input to the decoding layer and applying layer normalization to generate the feature T1;
405) adding the feature T1 to the target position code and applying a linear projection to obtain the target query feature sequence Qc;
406) adding the point cloud features extracted in step 2) to the point cloud position codes and passing the sum through two linear projection layers to obtain the point cloud key feature sequence Kc and the point cloud value feature sequence Vc;
407) with Qc, Kc, and Vc in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fc;
408) adding the intermediate feature vector sequence Fc to the feature T1 and applying layer normalization to generate the feature T2;
409) inputting T2 into the second feed-forward network, adding the features generated by the feed-forward network to T2, and applying layer normalization to obtain the first target feature vector output by the current decoding layer;
410) judging whether the current decoding layer is the last decoding layer; if so, taking the first target feature vector output by the current decoding layer as the candidate target features and ending; otherwise, executing step 401).
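Structurally, each decoding layer of steps 401)-410) is a transformer decoder block: self-attention among target features, mutual (cross) attention onto the point cloud features, then a feed-forward network, each with a residual connection and layer normalization. The sketch below uses nn.MultiheadAttention for brevity where the patent uses its grouped attention of steps 231)-233); widths and head counts are assumptions.

```python
# Structural sketch of one decoding layer (second self-attention model,
# mutual-attention model, second feed-forward network).
import torch
import torch.nn as nn

class DecodingLayer(nn.Module):
    def __init__(self, c=256, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(c, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(c, heads, batch_first=True)
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(c) for _ in range(3))
        self.ffn = nn.Sequential(nn.Linear(c, 2 * c), nn.ReLU(), nn.Linear(2 * c, c))

    def forward(self, tgt, tgt_pos, pc_feats, pc_pos):
        # 402)-404) information exchange among target features (self-attention)
        q = k = tgt + tgt_pos
        t1 = self.norm1(tgt + self.self_attn(q, k, tgt)[0])
        # 405)-408) adaptive aggregation of point cloud features (mutual attention)
        t2 = self.norm2(t1 + self.cross_attn(t1 + tgt_pos,
                                             pc_feats + pc_pos, pc_feats)[0])
        # 409) second feed-forward network with residual and layer normalization
        return self.norm3(t2 + self.ffn(t2))
```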
Point cloud feature propagation is realized between the layers of feature propagation networks through a feature propagation strategy based on distance interpolation together with skip connections;
taking the feature propagation network at the end of the point cloud extraction network as the l-th layer feature propagation network and the (l-1)-th layer feature propagation network as the layer to be propagated to, the specific process of point cloud feature propagation comprises:
241) finding, among the points of the l-th layer feature propagation network, the k nearest neighbors of each point of the (l-1)-th layer;
242) calculating the distance d(x, x′i) between each nearest neighbor and the point to be propagated, where x and x′i are the coordinates of the point to be propagated and of the nearest neighbor, respectively;
243) propagating the point cloud features output by the l-th layer feature propagation network to the (l-1)-th layer feature propagation network according to:

$$F = \frac{\sum_{i=1}^{k} \omega_i F_i}{\sum_{i=1}^{k} \omega_i}, \qquad \omega_i = \frac{1}{d(x, x'_i)^p}$$

where F is the point cloud feature propagated to the (l-1)-th layer feature propagation network, Fi is the feature of the i-th nearest neighbor, and p is the distance exponent;
244) concatenating the point cloud features propagated to the (l-1)-th layer feature propagation network with the point cloud features extracted by the point cloud attention network skip-connected to it, and obtaining the point cloud features output by the (l-1)-th layer feature propagation network through a 1×1 convolution.
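The distance-interpolation propagation of steps 241)-244) matches the formula above; the sketch below assumes k = 3 and p = 2, common choices for inverse-distance interpolation, and illustrative channel widths for the 1 × 1 convolution.

```python
# Sketch of feature propagation (steps 241-244): inverse-distance interpolation
# from layer-l points onto layer-(l-1) points, then skip-concat and 1x1 conv.
import torch
import torch.nn as nn

def propagate_features(x_dst, x_src, f_src, k=3, p=2, eps=1e-8):
    d = torch.cdist(x_dst, x_src)                 # 241)-242) distances d(x, x_i')
    dk, idx = d.topk(k, largest=False)            # k nearest neighbors per point
    w = 1.0 / (dk ** p + eps)                     # w_i = 1 / d(x, x_i')^p
    w = w / w.sum(dim=-1, keepdim=True)           # normalize by sum of w_i
    return (w.unsqueeze(-1) * f_src[idx]).sum(1)  # 243) interpolated features F

# 244) concatenate with skip-connected attention features, fuse with 1x1 conv
fuse = nn.Conv1d(in_channels=512, out_channels=256, kernel_size=1)
propagated = propagate_features(torch.rand(2048, 3), torch.rand(512, 3),
                                torch.rand(512, 256))
skip = torch.rand(2048, 256)                      # features from the skip connection
out = fuse(torch.cat([propagated, skip], dim=-1).t().unsqueeze(0)).squeeze(0).t()
```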
This embodiment provides a point cloud target detection method based on attention and sampling learning whose entry points are enhancing the detection capability of point cloud target detection algorithms for occluded targets and reducing the point cloud information loss incurred at the sampling stage of point cloud feature extraction. Extracting and aggregating local and global point cloud features further strengthens the network's ability to detect occluded targets, and a sampling learning method at the point cloud sampling stage reduces the information loss of the point cloud in sampling.
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims (10)

1. A point cloud target detection method based on attention and sampling learning, characterized by comprising the following steps:
1) collecting point cloud data of the target to be detected;
2) extracting point cloud features from the point cloud data through a point cloud extraction network;
3) screening out target index point features from the point cloud features by a k-nearest value down-sampling method;
4) aggregating the point cloud features into candidate target features through an adaptive point cloud feature aggregation network according to the target index point features;
5) generating the category, position, and size information of the target to be detected from the candidate target features through a multilayer perceptron.
2. The point cloud target detection method based on attention and sampling learning according to claim 1, characterized in that the point cloud extraction network comprises a plurality of point cloud attention networks and multiple layers of feature propagation networks; the point cloud attention networks are connected in sequence, and the feature propagation networks are skip-connected to the point cloud attention networks;
the point cloud attention network comprises a first self-attention model, a sampling learning network, two point networks, and a first feed-forward network.
3. The point cloud target detection method based on attention and sampling learning according to claim 2, characterized in that the process of extracting point cloud features by each point cloud attention network comprises the following steps:
201) inputting the input features into the sampling learning network to generate a sub-point cloud;
202) inputting the input features and the generated sub-point cloud into one point network to obtain the neighborhood feature vectors F1; the coordinates corresponding to F1 are passed through a multilayer perceptron to generate point cloud position codes;
203) adding F1 to the point cloud position codes generated in step 202) and projecting the sum through a linear projection layer to form the point cloud query sequence Q;
204) extracting a subset F′1 of F1 by the farthest point sampling algorithm; F′1 and F1 are input into the other point network to generate a neighborhood feature vector set, which is added to the point cloud position codes generated in step 202) and then projected through a linear projection layer to generate the point cloud feature sequences, namely the point cloud key feature sequence K and the point cloud value feature sequence V;
205) inputting Q, K, and V into the first self-attention model to generate new point cloud features;
the input features of the sampling learning network of the first point cloud attention network are the point cloud data; for the remaining point cloud attention networks, the input features of the sampling learning network are the point cloud features output by the preceding point cloud attention network.
4. The point cloud target detection method based on attention and sampling learning according to claim 3, characterized in that the step 201) comprises:
211) mapping the input features to a high-dimensional space through a multilayer perceptron to generate a high-dimensional point cloud feature vector matrix;
212) retaining the maximum value on each feature channel of the high-dimensional point cloud feature vector matrix through a max pooling operation to obtain a global feature;
213) concatenating the global feature to each high-dimensional point cloud feature vector to obtain spliced point cloud features;
214) inputting the spliced point cloud features into a multilayer perceptron to obtain a sampling matrix;
215) multiplying the sampling matrix with the input features to obtain the sub-point cloud.
5. The point cloud target detection method based on attention and sampling learning according to claim 3, characterized in that the process by which the point network generates the neighborhood feature vector set in step 204) comprises:
221) taking each element of F′1 as a sphere center and searching within a spherical neighborhood of radius r for elements of F1; the F1 elements within the neighborhood form the neighborhood feature set of that sphere center;
222) inputting the neighborhood feature set of each element of F′1 into a multilayer perceptron to obtain high-dimensional point cloud feature vectors;
223) performing a max pooling operation on the high-dimensional point cloud feature vectors of the neighborhood feature set of each element of F′1; each element neighborhood of F′1 generates one neighborhood feature vector, and these neighborhood feature vectors form the neighborhood feature vector set.
6. The point cloud target detection method based on attention and sampling learning according to claim 3, characterized in that the step 205) comprises:
231) dividing Q, K, and V into n groups along the feature dimension and adding each group to the corresponding point cloud position code group, yielding the point cloud feature sequences with point cloud position information Q′i, K′i, and V′i, i = 1, 2, …, n;
232) computing the attention score matrix Ai as:

$$A_i = \operatorname{softmax}\left(\frac{Q'_i {K'_i}^{\top}}{\sqrt{d}}\right)$$

where d is the number of feature channels;
233) computing the intermediate vector sequences Fi as:

$$F_i = A_i V'_i$$

and concatenating the Fi along the feature channel dimension to obtain the intermediate feature vector sequence F2;
234) adding F2 to the neighborhood feature vectors F1 generated in step 202) and applying layer normalization to generate the feature F3;
235) inputting F3 into the first feed-forward network, adding the features generated by the feed-forward network to F3, and applying layer normalization to generate the new point cloud features.
7. The point cloud target detection method based on attention and sampling learning according to claim 6, characterized in that the adaptive point cloud feature aggregation network comprises a plurality of decoding layers, each decoding layer comprising a second self-attention model, a mutual-attention model, and a second feed-forward network.
8. The point cloud target detection method based on attention and sampling learning according to claim 7, characterized in that the step 4) comprises:
401) the first decoding layer takes the target index point features generated in step 3) as input; each of the other decoding layers takes the first target features output by the previous decoding layer as input. Each decoding layer generates new first target features and second target features: the new first target features are input into a multilayer perceptron to generate target position information, from which a target position code is generated; the new second target features are obtained by adding the target position code generated by the previous decoding layer to the first target features;
402) inputting the second target feature vector into three linear projection layers to generate the target query feature sequence Qs, key feature sequence Ks, and value feature sequence Vs;
403) with Qs, Ks, and Vs in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fs;
404) adding the intermediate feature vector sequence Fs to the target features input to the decoding layer and applying layer normalization to generate the feature T1;
405) adding the feature T1 to the target position code and applying a linear projection to obtain the target query feature sequence Qc;
406) adding the point cloud features extracted in step 2) to the point cloud position codes and passing the sum through two linear projection layers to obtain the point cloud key feature sequence Kc and the point cloud value feature sequence Vc;
407) with Qc, Kc, and Vc in place of Q, K, and V in step 231), performing steps 231)-233) to generate the intermediate feature vector sequence Fc;
408) adding the intermediate feature vector sequence Fc to the feature T1 and applying layer normalization to generate the feature T2;
409) inputting T2 into the second feed-forward network, adding the features generated by the feed-forward network to T2, and applying layer normalization to obtain the first target feature vector output by the current decoding layer;
410) judging whether the current decoding layer is the last decoding layer; if so, taking the first target feature vector output by the current decoding layer as the candidate target features and ending; otherwise, executing step 401).
9. The point cloud target detection method based on attention and sampling learning according to claim 3, characterized in that point cloud feature propagation is realized between the layers of feature propagation networks through a feature propagation strategy based on distance interpolation together with skip connections;
taking the feature propagation network at the end of the point cloud extraction network as the l-th layer feature propagation network and the (l-1)-th layer feature propagation network as the layer to be propagated to, the specific process of point cloud feature propagation comprises:
241) finding, among the points of the l-th layer feature propagation network, the k nearest neighbors of each point of the (l-1)-th layer;
242) calculating the distance d(x, x′i) between each nearest neighbor and the point to be propagated, where x and x′i are the coordinates of the point to be propagated and of the nearest neighbor, respectively;
243) propagating the point cloud features output by the l-th layer feature propagation network to the (l-1)-th layer feature propagation network according to:

$$F = \frac{\sum_{i=1}^{k} \omega_i F_i}{\sum_{i=1}^{k} \omega_i}, \qquad \omega_i = \frac{1}{d(x, x'_i)^p}$$

where F is the point cloud feature propagated to the (l-1)-th layer feature propagation network, Fi is the feature of the i-th nearest neighbor, and p is the distance exponent;
244) concatenating the point cloud features propagated to the (l-1)-th layer feature propagation network with the point cloud features extracted by the point cloud attention network skip-connected to it, and obtaining the point cloud features output by the (l-1)-th layer feature propagation network through a 1×1 convolution.
10. The point cloud target detection method based on attention and sampling learning according to claim 1, characterized in that the step 3) comprises:
generating, through a multilayer perceptron, a confidence score for each point cloud feature extracted in step 2) that represents how close the feature is to the center of a target to be detected, and selecting in order the features with the highest confidence scores as the target index point features.
CN202111314134.7A 2021-11-08 2021-11-08 Point cloud target detection method based on attention and sampling learning Pending CN114120270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111314134.7A CN114120270A (en) 2021-11-08 2021-11-08 Point cloud target detection method based on attention and sampling learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111314134.7A CN114120270A (en) 2021-11-08 2021-11-08 Point cloud target detection method based on attention and sampling learning

Publications (1)

Publication Number Publication Date
CN114120270A true CN114120270A (en) 2022-03-01

Family

ID=80381357

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111314134.7A Pending CN114120270A (en) 2021-11-08 2021-11-08 Point cloud target detection method based on attention and sampling learning

Country Status (1)

Country Link
CN (1) CN114120270A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023202401A1 (en) * 2022-04-19 2023-10-26 京东科技信息技术有限公司 Method and apparatus for detecting target in point cloud data, and computer-readable storage medium
CN115294343A (en) * 2022-07-13 2022-11-04 苏州驾驶宝智能科技有限公司 Point cloud feature enhancement method based on cross-position and channel attention mechanism
CN115294343B (en) * 2022-07-13 2023-04-18 苏州驾驶宝智能科技有限公司 Point cloud feature enhancement method based on cross-position and channel attention mechanism

Similar Documents

Al-qaness et al. An improved YOLO-based road traffic monitoring system
US10733755B2 (en) Learning geometric differentials for matching 3D models to objects in a 2D image
US20220011122A1 (en) Trajectory prediction method and device
WO2021249071A1 (en) Lane line detection method, and related apparatus
US10984659B2 (en) Vehicle parking availability map systems and methods
Ni et al. An improved deep network-based scene classification method for self-driving cars
CN111626217A (en) Target detection and tracking method based on two-dimensional picture and three-dimensional point cloud fusion
EP3822852B1 (en) Method, apparatus, computer storage medium and program for training a trajectory planning model
CN110781927B (en) Target detection and classification method based on deep learning under vehicle-road cooperation
CN116685874A (en) Camera-laser radar fusion object detection system and method
CN114120270A (en) Point cloud target detection method based on attention and sampling learning
US20220261590A1 (en) Apparatus, system and method for fusing sensor data to do sensor translation
CN113095152B (en) Regression-based lane line detection method and system
Zhang et al. Gc-net: Gridding and clustering for traffic object detection with roadside lidar
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
Bai et al. A survey and framework of cooperative perception: From heterogeneous singleton to hierarchical cooperation
Kanchana et al. Computer vision for autonomous driving
Guo et al. Feature‐based detection and classification of moving objects using LiDAR sensor
EP3764335A1 (en) Vehicle parking availability map systems and methods
CN113611008B (en) Vehicle driving scene acquisition method, device, equipment and medium
Bruno et al. A comparison of traffic signs detection methods in 2d and 3d images for the benefit of the navigation of autonomous vehicles
Liu et al. A vehicle detection model based on 5G-V2X for smart city security perception
Sharma et al. Deep Learning-Based Object Detection and Classification for Autonomous Vehicles in Different Weather Scenarios of Quebec, Canada
CN114495050A (en) Multitask integrated detection method for automatic driving forward vision detection
Yao et al. Lane marking detection algorithm based on high‐precision map and multisensor fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination