CN114821033A - Three-dimensional information enhanced detection and identification method and device based on laser point cloud - Google Patents

Three-dimensional information enhanced detection and identification method and device based on laser point cloud

Info

Publication number
CN114821033A
CN114821033A (application CN202210289428.7A)
Authority
CN
China
Prior art keywords
point cloud
point
features
module
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210289428.7A
Other languages
Chinese (zh)
Inventor
秦翰林
朱文锐
延翔
林凯东
许景贤
张天吉
侯本照
代杨
李兵斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202210289428.7A priority Critical patent/CN114821033A/en
Publication of CN114821033A publication Critical patent/CN114821033A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a detection and identification method and device based on three-dimensional information enhancement of laser point cloud. The method comprises the following steps: performing voxelization on the original point cloud data to obtain a plurality of voxels; extracting point cloud voxel features of each non-empty voxel to obtain a plurality of feature maps; inputting the feature maps into a regional suggestion network to generate first-stage candidate boxes; inputting the first-stage candidate boxes into a point cloud spatial shape completion network to obtain a target point set; extracting point cloud structure information from the target point set to obtain global structure information; sampling and combining the non-empty voxel features near each key point from the original point cloud data as key point features; extracting the key point features to obtain grid point features; fusing the grid point features and the global structure information to obtain enhanced features; and performing confidence prediction and candidate box refinement on the enhanced features to obtain the confidence and bounding box parameters. The method enhances the feature representation of the point cloud data and improves the detection precision of laser point cloud targets.

Description

Three-dimensional information enhanced detection and identification method and device based on laser point cloud
Technical Field
The invention belongs to the technical field of artificial intelligence and target detection and identification, and particularly relates to a detection and identification method based on three-dimensional information enhancement of laser point cloud.
Background
An imaging laser radar offers high angular resolution, strong anti-interference capability and high detection precision, and can therefore acquire three-dimensional (3D) point cloud data that reflect the geometric shape, angle and distance information of a target scene. As a result, laser-radar-based 3D point cloud target detection technology has wide application value in the field of unmanned driving.
Most existing 3D point cloud target detection methods learn features from sparse, irregular point cloud data using one of two processing schemes, based either on voxels or on the raw point cloud. Converting the point cloud into a regular grid by voxelization allows 3D object detection to be carried out efficiently with 2D convolution, but the quantization inevitably causes information loss, which reduces localization accuracy. Conversely, methods that learn features directly from the raw point cloud and complete the prediction can retain accurate point position information, but their computational cost is high and their receptive field is limited.
One line of existing research proposes a voxelization-based point cloud characterization method that obtains voxel features of the point cloud, converts them into a high-dimensional volumetric representation by 3D convolution, and finally outputs the detection result through a regional suggestion network. Another line of research takes the raw point cloud produced by the laser radar directly as input and designs an end-to-end point cloud data processing network; however, that network extracts only the feature description of each independent point and of the global point cloud, without considering local features and structural constraints, which degrades its performance in complex scenes.
Therefore, the existing methods suffer from difficulty in extracting point cloud data features and from low point cloud target detection precision.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a detection and identification method and device based on three-dimensional information enhancement of laser point cloud. The technical problem to be solved by the invention is realized by the following technical scheme:
the embodiment of the invention provides a three-dimensional information enhanced detection and identification method based on laser point cloud, which comprises the following steps:
carrying out voxelization processing on the original point cloud data to obtain a plurality of voxels;
extracting point cloud voxel characteristics of each non-empty voxel in the voxels by using a sparse convolution network to obtain a plurality of characteristic maps;
inputting the feature maps into a regional suggestion network to generate a first-stage candidate box;
inputting the candidate frame of the first stage into a point cloud space shape complementing network to complement the point cloud space shape to obtain a target point set;
extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information;
sampling the non-empty voxel characteristic combination near each key point from the original point cloud data as key point characteristics;
extracting the key point features by using a feature extraction module combining a graph neural network and an attention mechanism to obtain grid point features;
fusing the grid point characteristics and the global structure information to obtain enhanced characteristics;
and carrying out confidence prediction and candidate frame refinement on the enhanced features to obtain confidence and boundary frame parameters.
In an embodiment of the present invention, performing voxelization processing on the original point cloud data to obtain a plurality of voxels includes:
dividing the original point cloud data into a three-dimensional grid according to a fixed resolution to obtain the plurality of voxels.
In an embodiment of the present invention, extracting point cloud voxel features of each non-empty voxel in the voxels by using a sparse convolution network to obtain a plurality of feature maps, including:
constructing the sparse convolutional network;
inputting the average coordinates of the points within each non-empty voxel and the reflectivity in the original point cloud data into the sparse convolution network as initial features, and outputting the feature maps.
In an embodiment of the present invention, inputting the first stage candidate box into a point cloud space shape complementing network to complement a point cloud space shape to obtain a target point set, including:
processing the point set inside the first-stage candidate box by using an ROI-aware pooling module to obtain a first matrix;
inputting the first matrix into a multilayer perceptron to obtain a first intermediate characteristic, and inputting the first intermediate characteristic into a maximum pooling layer to obtain a second intermediate characteristic;
combining the second intermediate feature with the first intermediate feature to obtain a third intermediate feature;
inputting the third intermediate features into a multilayer perceptron and a maximum pooling layer in sequence to obtain global features;
and generating the target point set by stacking the global features through full connection.
In one embodiment of the present invention, extracting point cloud structure information from the target point set using a multi-scale grouping strategy to obtain global structure information comprises:
selecting a plurality of points from the target point set to form a target set by using a farthest point sampling algorithm;
extracting the local context of each point in the target set by utilizing a multilayer perceptron and a maximum pooling layer to obtain a second matrix;
extracting structural information from the second matrix by using the multi-scale grouping strategy to obtain a target tensor;
and generating the global structure information by the target tensor through a full connection layer.
In one embodiment of the present invention, sampling the non-empty voxel feature combination near each keypoint from the raw point cloud data as keypoint features comprises:
sampling a plurality of key points from the original point cloud data by using a farthest point sampling algorithm;
calculating a non-empty voxel characteristic set of each key point in a target radius range with the key point as a circle center in a kth sparse convolution module of a sparse convolution network:
S_i^(k) = { [ f_j^(k) ; v_j^(k) − d_i ] : ‖ v_j^(k) − d_i ‖ < r_k, j = 1, …, N_k }
wherein v_j^(k) − d_i represents the relative coordinates with respect to the key point, d_i represents a key point, r_k represents the target radius, v_j^(k) represents the three-dimensional coordinates of the jth non-empty voxel in the kth sparse convolution module, f_j^(k) represents the corresponding feature output by the kth sparse convolution module, and N_k represents the number of non-empty voxels in the kth sparse convolution module;
generating output features of each key point at the kth sparse convolution module by using the non-empty voxel feature set:
f_i^(pv_k) = max{ G( M( S_i^(k) ) ) }
wherein M(·) represents a random sampling operation, and G(·) represents an operation performed by a multilayer perceptron;
and connecting the output characteristics of each key point in each sparse convolution module to obtain the key point characteristics.
In one embodiment of the invention, the grid points are characterized by:
F_g = Σ_{i∈Ω(r)} W( σ*_q · Q_pos + σ*_k · K_i + σ*_{qk} · (Q_pos · K_i) ) · V_i
where Ω(r) represents all points within a fixed radius r of a grid point, W(·) represents the mapping of a graph edge into a scalar or vector weight space, σ* represents a gate function with learning, Q_pos represents the linear projection of the position difference between two nodes, K_i = Linear(f_i) represents the key mapping, and V_i represents the features of node i.
In an embodiment of the present invention, the fusing the grid point feature and the global structure information to obtain an enhanced feature includes:
and fusing the grid point characteristics and the global structure information by using a perspective channel attention module to obtain enhanced characteristics.
In an embodiment of the present invention, performing confidence prediction and candidate frame refinement on the enhanced features to obtain a confidence and a bounding box parameter, includes:
and sequentially inputting the enhanced features into a two-layer multilayer perceptron to extract feature vectors, inputting the feature vectors into a first branch network for confidence prediction, and inputting into a second branch network for candidate frame refinement to obtain the confidence and the boundary frame parameters.
Another embodiment of the present invention provides a detection and recognition apparatus based on three-dimensional information enhancement of laser point cloud, including:
the point cloud voxelization module is used for voxelizing the original point cloud data to obtain a plurality of voxels;
the characteristic map extraction module is used for extracting the point cloud voxel characteristics of each non-empty voxel in the voxels by utilizing a sparse convolution network to obtain a plurality of characteristic maps;
the candidate frame generation module is used for inputting the feature maps into the regional suggestion network to generate a first-stage candidate frame;
a point cloud space shape complementing module, configured to input the first-stage candidate box into a point cloud space shape complementing network to complement a point cloud space shape to obtain a target point set;
the global structure information extraction module is used for extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information;
a key point feature sampling module for sampling the non-empty voxel feature combination near each key point from the original point cloud data as key point features;
the grid point feature extraction module is used for extracting the key point features by utilizing a feature extraction module combining a graph neural network and an attention mechanism to obtain grid point features;
the enhanced feature fusion module is used for fusing the grid point features and the global structure information to obtain enhanced features;
and the confidence coefficient and boundary frame parameter calculation module is used for carrying out confidence coefficient prediction and candidate frame refinement on the enhanced features to obtain confidence coefficient and boundary frame parameters.
Compared with the prior art, the invention has the beneficial effects that:
According to the detection and identification method, the point cloud spatial shape is completed, so that the extracted structural information contains more semantic representations and the feature representation of the point cloud data is enhanced. The feature extraction adopted here has stronger representation capability than the feature extraction used by existing methods, which solves the problem that existing methods struggle to extract point cloud data features and improves the detection precision of laser point cloud targets.
Drawings
Fig. 1 is a schematic flow diagram of a detection and identification method based on three-dimensional information enhancement of laser point cloud according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another method for detecting and identifying three-dimensional information enhancement based on laser point cloud according to an embodiment of the present invention;
fig. 3 is a schematic process diagram of a detection and identification method based on three-dimensional information enhancement of laser point cloud according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a feature extraction method combining a neural network and an attention mechanism according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a detection and identification device based on laser point cloud with enhanced three-dimensional information according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
Example one
Referring to fig. 1, fig. 2 and fig. 3, fig. 1 is a schematic flow chart of a three-dimensional information enhancement detection and identification method based on laser point cloud according to an embodiment of the present invention, fig. 2 is a schematic flow chart of another three-dimensional information enhancement detection and identification method based on laser point cloud according to an embodiment of the present invention, and fig. 3 is a schematic process chart of a three-dimensional information enhancement detection and identification method based on laser point cloud according to an embodiment of the present invention. The detection and identification method comprises the following steps:
and S1, carrying out voxelization processing on the original point cloud data to obtain a plurality of voxels.
Specifically, the original point cloud data is divided into a three-dimensional grid at a fixed resolution to obtain the plurality of voxels. Assume that the point cloud occupies a range of length L, width W and height H along the X, Y and Z axes of 3D space; the length, width and height of each voxel are then correspondingly defined as vL, vW and vH, where v &lt; 1, so that a plurality of voxels is formed in the point cloud region.
And S2, extracting the point cloud voxel characteristics of each non-empty voxel in the voxels by using a sparse convolution network to obtain a plurality of characteristic maps.
And S21, constructing a sparse convolution network.
Specifically, the sparse convolution network is formed by sequentially connecting a plurality of sparse convolution modules, each module is formed by sequentially connecting a plurality of stages, and each stage comprises a plurality of sub-manifold convolution layers and a normal sparse convolution layer. In this embodiment, the sparse convolution network is formed by sequentially connecting three identical sparse convolution modules, each module is formed by sequentially connecting two stages, and each stage includes 2 sub-manifold convolution layers and one normal sparse convolution layer.
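For illustration, a minimal structural sketch of such a backbone is given below, assuming the spconv library (spconv.pytorch) for the submanifold and normal sparse convolutions; the channel widths, strides and input feature dimension are illustrative assumptions rather than values taken from the embodiment.

```python
# Structural sketch of the sparse convolution backbone described above,
# assuming the spconv library (spconv.pytorch). Channel widths, strides and
# the 4-dimensional input feature (x, y, z, R) are illustrative assumptions.
import torch.nn as nn
import spconv.pytorch as spconv

def make_stage(c_in, c_out):
    # one "stage": two submanifold convolutions followed by a normal sparse
    # convolution that downsamples the spatial resolution
    return [
        spconv.SubMConv3d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm1d(c_out), nn.ReLU(),
        spconv.SubMConv3d(c_out, c_out, 3, padding=1, bias=False),
        nn.BatchNorm1d(c_out), nn.ReLU(),
        spconv.SparseConv3d(c_out, c_out, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm1d(c_out), nn.ReLU(),
    ]

class SparseBackbone(nn.Module):
    """Three sparse convolution modules, each built from two stages."""
    def __init__(self, c_in=4, widths=(16, 32, 64)):
        super().__init__()
        blocks, c_prev = [], c_in
        for c in widths:                                   # three modules
            blocks.append(spconv.SparseSequential(
                *make_stage(c_prev, c), *make_stage(c, c)))
            c_prev = c
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):                                  # x: spconv.SparseConvTensor
        feature_maps = []                                  # per-module feature maps F^(k)
        for block in self.blocks:
            x = block(x)
            feature_maps.append(x)
        return feature_maps
```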
And S22, inputting the average coordinates of the points within each non-empty voxel and the reflectivity in the original point cloud data into the sparse convolution network as initial features, and outputting the feature maps.
Specifically, the average coordinates (x, y, z) of the points within each non-empty voxel and the reflectivity R in the original point cloud data are input into the sparse convolution network as the initial features; the sparse convolution network performs down-sampling along the z axis, and the voxel feature vectors output by the network are used as the extracted feature maps.
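For illustration, a minimal NumPy sketch of steps S1 and S22 is given below: points are assigned to a fixed-resolution voxel grid and the mean (x, y, z, R) of the points falling in each non-empty voxel is used as its initial feature. The point cloud range and voxel sizes are illustrative assumptions, not values taken from the embodiment.

```python
# Sketch of S1 and S22: fixed-resolution voxelization followed by per-voxel
# averaging of point coordinates and reflectivity. Range and voxel-size
# values are illustrative assumptions.
import numpy as np

def voxelize_mean(points, pc_range=(0, -40, -3, 70.4, 40, 1),
                  voxel_size=(0.05, 0.05, 0.1)):
    """points: (N, 4) array of x, y, z, reflectivity R."""
    pc_range = np.asarray(pc_range, dtype=np.float32)
    voxel_size = np.asarray(voxel_size, dtype=np.float32)

    # keep only the points inside the configured spatial range
    inside = np.all((points[:, :3] >= pc_range[:3]) &
                    (points[:, :3] < pc_range[3:]), axis=1)
    points = points[inside]

    # integer voxel index of every point along X, Y and Z
    idx = np.floor((points[:, :3] - pc_range[:3]) / voxel_size).astype(np.int64)

    # group points that share a voxel index and average their (x, y, z, R)
    uniq, inverse = np.unique(idx, axis=0, return_inverse=True)
    counts = np.bincount(inverse).astype(np.float32)
    feats = np.zeros((len(uniq), 4), dtype=np.float32)
    for c in range(4):
        feats[:, c] = np.bincount(inverse, weights=points[:, c]) / counts

    return uniq, feats   # voxel grid coordinates and initial per-voxel features

# usage: coords, feats = voxelize_mean(np.random.rand(1000, 4).astype(np.float32))
```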
And S3, inputting the feature maps into the area suggestion network to generate a first-stage candidate box.
Specifically, a regional suggestion network is established, and the extracted feature maps are input into the regional suggestion network to obtain a first-stage candidate frame.
And S4, inputting the first-stage candidate box into a point cloud space shape complementing network to complement the point cloud space shape to obtain a target point set.
Specifically, a point cloud spatial shape completion network is constructed, which comprises an ROI-aware pooling module, a multilayer perceptron, a maximum pooling layer, a feature combination module, a second multilayer perceptron, a second maximum pooling layer and a fully connected layer, connected in sequence.
And S41, processing the point set inside the first-stage candidate box by using the ROI-aware pooling module to obtain a first matrix.
Specifically, assume that a first-stage candidate box is selected and denote its internal point set as {P_i, i = 1, …, N}, where P_i is a point vector with coordinates (x, y, z) and N is the number of points. The point set {P_i} is then processed by the ROI-aware pooling module to obtain a first matrix M of size n × 3, where n = r_p × r_p × r_p and r_p is the size of the pooling grid.
And S42, inputting the first matrix into a multilayer perceptron to obtain a first intermediate feature, and inputting the first intermediate feature into a maximum pooling layer to obtain a second intermediate feature.
Specifically, a first matrix M of n × 3 is input into the multilayer perceptron, and a first intermediate feature v' is output; the first intermediate feature v' is then input into the maximum pooling layer, resulting in a 256-dimensional second intermediate feature v.
And S43, combining the second intermediate characteristic with the first intermediate characteristic to obtain a third intermediate characteristic.
Specifically, the 256-dimensional second intermediate feature v and the first intermediate feature v' are subjected to a vector splicing operation to obtain a 512-dimensional third intermediate feature w'.
And S44, sequentially inputting the third intermediate features into the multilayer perceptron and the maximum pooling layer to obtain global features.
Specifically, the 512-dimensional third intermediate feature w' is input into the multilayer perceptron again, and the output of the multilayer perceptron is input into the maximum pooling layer, finally obtaining the 1024-dimensional global feature w.
And S45, generating the target point set by stacking the global features through full connection.
Specifically, the 1024-dimensional global feature w is passed through stacked fully connected layers (FCN) to generate the target point set, which is a 1024 × 3 matrix representing the completed spatial shape.
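For illustration, a hedged PyTorch sketch of the completion branch of steps S41 to S45 is given below; the 256/512/1024 feature widths follow the text, while the MLP depths and the number of pooled in-box points are assumptions.

```python
# Sketch of the point cloud spatial-shape completion branch (S41-S45):
# pooled in-box points -> MLP -> max-pool (256-d) -> concatenation with the
# point-wise features (512-d) -> MLP -> max-pool (1024-d global feature) ->
# fully connected layers regressing a 1024 x 3 completed point set.
# Widths follow the text; MLP depths are assumptions.
import torch
import torch.nn as nn

class ShapeCompletionNet(nn.Module):
    def __init__(self, num_out_points=1024):
        super().__init__()
        self.mlp1 = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                  nn.Linear(128, 256), nn.ReLU())
        self.mlp2 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(),
                                  nn.Linear(512, 1024), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(),
                                     nn.Linear(1024, num_out_points * 3))
        self.num_out_points = num_out_points

    def forward(self, box_points):              # (B, n, 3) pooled in-box points
        v_prime = self.mlp1(box_points)          # first intermediate feature v'
        v = v_prime.max(dim=1).values            # second intermediate feature v, (B, 256)
        w_prime = torch.cat(                     # third intermediate feature w', (B, n, 512)
            [v_prime, v.unsqueeze(1).expand(-1, v_prime.size(1), -1)], dim=-1)
        w = self.mlp2(w_prime).max(dim=1).values  # global feature w, (B, 1024)
        pts = self.decoder(w)                     # completed target point set
        return pts.view(-1, self.num_out_points, 3)

# usage: ShapeCompletionNet()(torch.rand(2, 216, 3)).shape  ->  (2, 1024, 3)
```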
And S5, extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information.
And S51, selecting a plurality of points from the target point set by using a farthest point sampling algorithm to form a target set.
Specifically, a farthest point sampling (FPS) algorithm is used to select m points from the target point set to form a target set S = {s_i, i = 1, …, m}; the T nearest neighbours of each point in the target set S are then grouped to obtain an m × T × 3 tensor.
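For illustration, the farthest point sampling used in S51 (and again in S61) together with the T-nearest-neighbour grouping is sketched below in plain NumPy; m and T are illustrative values, not values taken from the embodiment.

```python
# Sketch of farthest point sampling (FPS) and T-nearest-neighbour grouping
# yielding the m x T x 3 tensor; m and T below are illustrative.
import numpy as np

def farthest_point_sampling(points, m):
    """points: (N, 3). Returns indices of m mutually far-apart points."""
    n = points.shape[0]
    chosen = np.zeros(m, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = 0                                # start from an arbitrary point
    for k in range(1, m):
        d = np.linalg.norm(points - points[chosen[k - 1]], axis=1)
        dist = np.minimum(dist, d)               # distance to the already-chosen set
        chosen[k] = int(np.argmax(dist))         # pick the farthest remaining point
    return chosen

def group_knn(points, centers, T):
    """For each center, gather its T nearest neighbours -> (m, T, 3) tensor."""
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    nn_idx = np.argsort(d, axis=1)[:, :T]
    return points[nn_idx]

# usage
pts = np.random.rand(1024, 3).astype(np.float32)
centers = pts[farthest_point_sampling(pts, m=128)]
grouped = group_knn(pts, centers, T=16)          # shape (128, 16, 3)
```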
And S52, extracting the local context of each point in the target set by utilizing a multilayer perceptron and a maximum pooling layer to obtain a second matrix.
Specifically, the target set S is input into the multi-layered perceptron, and the output of the multi-layered perceptron is input into the max-pooling layer, which outputs the local context of each point in the target set S, resulting in a second matrix N of size m × C1, where C1 represents the number of channels.
And S53, extracting structural information from the second matrix by using the multi-scale grouping strategy to obtain a target tensor.
Specifically, a multi-scale grouping (MSG) strategy is used to extract structural information from the second matrix N, and a target tensor of size m × (C_1 + C_1) is output.
And S54, generating the global structure information by the target tensor through a full connection layer.
Specifically, the m × (C_1 + C_1) target tensor is input into the fully connected layer to generate the global structure information F_s, a feature of size m × C_1, where C_1 is the number of channels.
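For illustration, a hedged PyTorch sketch of steps S52 to S54 is given below: a shared MLP plus max-pooling extracts the local context of each sampled point at two grouping radii (the multi-scale grouping), the two C_1-channel outputs are concatenated into an m × 2C_1 tensor, and a fully connected layer produces the global structure information F_s. The radii, C_1 and group size are illustrative assumptions.

```python
# Sketch of S52-S54: shared MLP + max-pooling at two grouping radii,
# concatenation to m x 2*C1, and a fully connected layer producing F_s.
# Radii, C1 and the group size are illustrative assumptions.
import torch
import torch.nn as nn

class MultiScaleStructure(nn.Module):
    def __init__(self, c1=64, radii=(0.4, 0.8), max_group=16):
        super().__init__()
        self.radii, self.max_group = radii, max_group
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(3, c1), nn.ReLU(), nn.Linear(c1, c1), nn.ReLU())
            for _ in radii])
        self.fc = nn.Linear(len(radii) * c1, c1)

    def forward(self, centers, points):           # centers: (m, 3), points: (N, 3)
        d = torch.cdist(centers, points)           # (m, N) pairwise distances
        scales = []
        for r, mlp in zip(self.radii, self.mlps):
            # ball query: take the closest points, mask those outside radius r
            idx = d.topk(self.max_group, dim=1, largest=False).indices
            grouped = points[idx] - centers[:, None, :]        # relative coordinates
            feat = mlp(grouped)                                # (m, k, C1)
            mask = (d.gather(1, idx) < r).float().unsqueeze(-1)
            scales.append((feat * mask).max(dim=1).values)     # local context per point
        return self.fc(torch.cat(scales, dim=-1))              # F_s: (m, C1)

# usage: MultiScaleStructure()(torch.rand(128, 3), torch.rand(1024, 3)).shape -> (128, 64)
```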
And S6, sampling the non-empty voxel characteristic combination near each key point from the original point cloud data to be used as key point characteristics.
And S61, adopting a plurality of key points from the original point cloud data by using a furthest point sampling algorithm.
Specifically, a farthest point sampling (FPS) algorithm is first used to sample n key points d_i from the original point cloud data; for example, n = 2048 for the KITTI dataset and n = 4096 for the Waymo dataset.
The non-empty voxel features near each key point are then sampled and combined as the key point features F_m, with the following specific steps:
and S62, calculating a non-empty voxel characteristic set of each key point in a target radius range with the key point as the center in the kth sparse convolution module of the sparse convolution network.
Specifically, first define F^(k) = { f_j^(k), j = 1, …, N_k } as the features output by the kth sparse convolution module of the sparse convolution network in step S2, and V^(k) = { v_j^(k) } as the three-dimensional coordinates of the corresponding non-empty voxels, where N_k is the number of non-empty voxels in the kth sparse convolution module. Then, for each key point d_i, the non-empty voxel feature set in the kth sparse convolution module, within a circular region centred on d_i with target radius r_k, is:
S_i^(k) = { [ f_j^(k) ; v_j^(k) − d_i ] : ‖ v_j^(k) − d_i ‖ < r_k, j = 1, …, N_k }   (1)
wherein v_j^(k) − d_i represents the relative coordinates with respect to the key point, d_i represents the key point, r_k represents the target radius, v_j^(k) represents the three-dimensional coordinates of the jth non-empty voxel in the kth sparse convolution module, and f_j^(k) represents the corresponding feature output by the kth sparse convolution module.
And S63, generating the output feature of each key point in the k-th sparse convolution module by using the non-empty voxel feature set.
Specifically, the non-empty voxel feature set is used to generate the output feature f_i^(pv_k) of key point d_i in the kth sparse convolution module:
f_i^(pv_k) = max{ G( M( S_i^(k) ) ) }   (2)
wherein M(·) represents a random sampling operation, and G(·) represents an operation performed by the multilayer perceptron.
And S64, connecting the output characteristics of each key point in each sparse convolution module to obtain the key point characteristics.
Specifically, after the output feature f_i^(pv_k) has been obtained for each key point d_i in every sparse convolution module, the features from the individual sparse convolution modules are connected to obtain an overall feature, namely the key point features F_m.
And S7, extracting the key point features by using a feature extraction module combining a graph neural network and an attention mechanism to obtain grid point features.
Specifically, a feature extraction module combining a graph neural network and an attention mechanism is constructed, and the key point features F_m are input into the feature extraction module for feature extraction to obtain the grid point features.
Referring to fig. 4, fig. 4 is a schematic flow chart of a feature extraction method combining a graph neural network and an attention mechanism according to an embodiment of the present invention. The characteristic extraction method comprises the following steps:
S71. The graph neural network regards each point of the point cloud as a node of a graph structure, and edges are generated between each point and its adjacent points. The feature of node i in the graph structure is expressed as V_i = MLP(F_i^m), and an edge can be represented as a linear projection of the position difference between two nodes, expressed by the formula Q_pos = Linear(p_j − p_i).
The key point features F_m are input into the feature extraction module, and features are extracted from adjacent nodes by a weighted combination operation, calculated as follows:
f̃_i = Σ_{j∈N(i)} W(Q_pos) · V_j   (3)
where W(·) maps a graph edge into a scalar or vector weight space, '·' indicates the scalar-vector product between the learned weight and the graph node, Q_pos = Linear(p_j − p_i) is the linear projection of the position difference between two nodes, p_i denotes the position of the ith node, p_j denotes the position of the jth node, N(i) denotes the nodes adjacent to node i, V_i = MLP(F_i^m) denotes the features of node i, and F_i^m denotes the key point feature of node i.
S72. From the viewpoint of the attention mechanism, Q_pos can be regarded as the query mapping from the grid point p_grid to point p_i, V_i can be regarded as the value mapping obtained from the feature f_i, and the key mapping K_i can be expressed as K_i = Linear(f_i). The process of extracting features can then be expressed as the following formula:
f̃ = Σ_i W(Q_pos · K_i) · V_i   (4)
wherein W(·) represents the softmax function.
S73. Combining formula (3) and formula (4), the grid point features are obtained:
F_g = Σ_{i∈Ω(r)} W( σ*_q · Q_pos + σ*_k · K_i + σ*_{qk} · (Q_pos · K_i) ) · V_i   (5)
where Ω(r) represents all points within a fixed radius r of a grid point, W(·) represents the mapping of a graph edge into a scalar or vector weight space, σ* represents a gate function with learning, realized by a linear projection and a sigmoid function, Q_pos represents the linear projection of the position difference between two nodes, K_i = Linear(f_i) represents the key mapping, and V_i represents the features of node i.
Formula (5) combines the principles of the graph neural network and the attention mechanism and is a flexible and effective feature extraction method: through the learned gate functions it adaptively learns more meaningful point cloud features from the geometric information Q_pos, the semantic information K and their composition Q_pos·K, yielding the grid point features F_g.
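For illustration, a hedged PyTorch sketch of the grid point feature extraction of steps S71 to S73 is given below, following the reconstruction of formula (5) above; the exact gating arrangement in the original figures is not recoverable from the text, so this should be read as one plausible interpretation rather than the definitive module, and all layer sizes are assumptions.

```python
# Sketch of the gated graph/attention grid-point feature extraction (S71-S73),
# following the reconstruction of formula (5): attention weights over the
# points inside radius r of a grid point are computed from the positional
# term Q_pos, the semantic key K_i and their composition, mixed by learned
# gates, and the weighted node values V_i are accumulated. One plausible
# reading; layer sizes and the gating arrangement are assumptions.
import torch
import torch.nn as nn

class GridPointAttention(nn.Module):
    def __init__(self, c_in, c_out, radius=1.6):
        super().__init__()
        self.radius = radius
        self.q_pos = nn.Linear(3, c_out)      # Q_pos: projection of p_i - p_grid
        self.key = nn.Linear(c_in, c_out)     # K_i = Linear(f_i)
        self.value = nn.Linear(c_in, c_out)   # V_i (a single linear layer here)
        self.gate = nn.Linear(2 * c_out, 3)   # learned gates sigma* over the three terms

    def forward(self, grid_pt, positions, feats):
        """grid_pt: (3,); positions: (N, 3); feats: (N, c_in)."""
        rel = positions - grid_pt
        mask = rel.norm(dim=1) < self.radius           # Omega(r): points near the grid point
        rel, feats = rel[mask], feats[mask]
        q = self.q_pos(rel)                             # geometric term
        k = self.key(feats)                             # semantic term
        v = self.value(feats)
        g = torch.sigmoid(self.gate(torch.cat([q, k], dim=-1)))   # gates in (0, 1)
        score = (g[:, 0:1] * q + g[:, 1:2] * k + g[:, 2:3] * (q * k)).sum(dim=-1)
        w = torch.softmax(score, dim=0)                 # W(.): normalised weights
        return (w.unsqueeze(-1) * v).sum(dim=0)         # grid point feature F_g

# usage: GridPointAttention(32, 64)(torch.zeros(3), torch.rand(50, 3) - 0.5, torch.rand(50, 32))
```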
And S8, fusing the grid point characteristics and the global structure information to obtain enhanced characteristics.
Specifically, as shown in fig. 3, a perspective channel attention module is constructed, the grid point features F_g and the global structure information F_s are input into the perspective channel attention module, and the enhanced features F_e are output. Further, in the perspective channel attention module, the grid point features F_g and the global structure information F_s are first concatenated; the concatenated overall feature is fed into two parallel branches, each consisting of a maximum pooling layer, a linear layer and a ReLU function; the elements of the two branch outputs are multiplied element by element; the product is processed by a sigmoid function; and the concatenated feature is finally multiplied with the sigmoid output by matrix multiplication to obtain the enhanced features F_e.
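For illustration, a hedged PyTorch sketch of the perspective channel attention fusion is given below, following the description above (concatenation, two parallel max-pool + linear + ReLU branches, element-wise product, sigmoid, re-weighting); channel sizes are illustrative assumptions, the two inputs are assumed to be aligned to the same set of locations, and the final multiplication is implemented as channel-wise re-weighting.

```python
# Sketch of the perspective channel attention fusion of S8; channel sizes
# are assumptions and the final product is a channel-wise re-weighting.
import torch
import torch.nn as nn

class PerspectiveChannelAttention(nn.Module):
    def __init__(self, c_total):
        super().__init__()
        self.branch_a = nn.Sequential(nn.AdaptiveMaxPool1d(1), nn.Flatten(),
                                      nn.Linear(c_total, c_total), nn.ReLU())
        self.branch_b = nn.Sequential(nn.AdaptiveMaxPool1d(1), nn.Flatten(),
                                      nn.Linear(c_total, c_total), nn.ReLU())

    def forward(self, f_g, f_s):
        """f_g: (B, N, C_g) grid point features; f_s: (B, N, C_s) structure info."""
        f = torch.cat([f_g, f_s], dim=-1)          # connected overall feature
        x = f.transpose(1, 2)                       # (B, C, N) for channel pooling
        attn = torch.sigmoid(self.branch_a(x) * self.branch_b(x))   # (B, C)
        return f * attn.unsqueeze(1)                # enhanced feature F_e

# usage: PerspectiveChannelAttention(96)(torch.rand(2, 128, 64), torch.rand(2, 128, 32)).shape
```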
And S9, performing confidence prediction and candidate frame refinement on the enhanced features to obtain confidence and bounding box parameters.
Specifically, the enhanced features are sequentially input into a two-layer multilayer perceptron to extract feature vectors, the feature vectors are input into a first branch network to carry out confidence prediction, the feature vectors are input into a second branch network to carry out candidate frame refinement, the confidence and the bounding box parameters are obtained, and therefore the detection recognition result is obtained. Further, the bounding box parameter may determine a target range of detection recognition, and the confidence level is used to determine the accuracy of the bounding box parameter.
A confidence prediction is made first. Specifically, the confidence target is obtained by calculating the intersection over union (IoU) between each 3D prediction candidate region and its corresponding ground-truth box. For the kth 3D prediction candidate region, its confidence value y_k is normalized to lie in [0, 1] as follows:
y_k = min(1, max(0, 2·IoU_k − 0.5))   (6)
wherein IoU_k is the intersection over union between the kth 3D prediction candidate region and its corresponding ground-truth box.
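For illustration, the IoU-guided confidence target of equation (6) can be written as a one-line helper:

```python
# The IoU-guided confidence target of equation (6).
def confidence_target(iou_k: float) -> float:
    """Map the IoU of a 3D proposal with its ground-truth box into [0, 1]."""
    return min(1.0, max(0.0, 2.0 * iou_k - 0.5))

# confidence_target(0.25) == 0.0, confidence_target(0.5) == 0.5, confidence_target(0.8) == 1.0
```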
And then carrying out candidate frame refinement. Specifically, the refinement of the candidate frame is achieved by encoding the points of each 3D prediction candidate frame.
Further, after the confidence and bounding box parameters are obtained, the bounding box regression loss is optimized, where the regression residual is x = t_i − t_i*, t_i is the predicted coordinate vector, and t_i* is the ground-truth box coordinate vector.
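For illustration, a minimal regression loss over the residual x = t_i − t_i* is sketched below; a standard smooth-L1 form is assumed here because the exact loss expression is not recoverable from the text, so this is an illustrative stand-in rather than the formulation of the embodiment.

```python
# Minimal smooth-L1 regression loss over x = t_i - t_i*; the smooth-L1 form
# is an assumption, used as an illustrative stand-in.
import torch

def box_regression_loss(pred_t, gt_t, beta=1.0):
    """pred_t, gt_t: (..., D) encoded box coordinate vectors."""
    x = pred_t - gt_t                              # regression residual
    loss = torch.where(x.abs() < beta,
                       0.5 * x ** 2 / beta,        # quadratic near zero
                       x.abs() - 0.5 * beta)       # linear for large residuals
    return loss.sum(dim=-1).mean()

# usage: box_regression_loss(torch.rand(4, 7), torch.rand(4, 7))
```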
By optimizing the loss function, the agreement between the targets detected by the algorithm and the actual real targets can be measured as a recognition rate.
Further, the present embodiment verifies the recognition rate of the target detection through simulation.
1. Simulation conditions
The computer system adopted in this embodiment is a multi-GPU SCS4880 series supercomputing server. The CPU is an Intel Xeon E5-2630 V4 processor. Eight NVIDIA RTX 2080Ti 11GB graphics cards are integrated, and the hard disk consists of one system disk and two storage disks. The operating system is Ubuntu 16.04 and the runtime environment is .NET Framework 4.5, with support installed for related libraries such as CUDA 9.0, Python 3.5 and cuDNN 7.0.
2. Emulated content
The target detection recognition rate is calculated according to the detection and identification method described above; the calculated target detection recognition rate is 78.9%. This shows that the detection and identification method based on three-dimensional information enhancement of laser point cloud fully mines the information contained in the point cloud data, can achieve higher detection and identification precision, and is therefore advanced.
In conclusion, the detection and identification method of this embodiment completes the point cloud spatial shape, so that the extracted structural information contains more semantic representations and the feature representation of the point cloud data is enhanced. Compared with the feature extraction adopted by existing methods, it has stronger representation capability, solves the problem that existing methods struggle to extract point cloud data features, and improves the detection precision of laser point cloud targets. The method can be used in the field of unmanned driving, for example for obstacle identification and path planning.
Example two
On the basis of the first embodiment, please refer to fig. 5, and fig. 5 is a schematic structural diagram of a detection and identification apparatus for enhancing three-dimensional information based on laser point cloud according to an embodiment of the present invention. The detection and identification device comprises: a point cloud voxelization module, a feature map extraction module, a candidate frame generation module, a point cloud space shape completion module, a global structure information extraction module, a key point feature sampling module, a grid point feature extraction module, an enhanced feature fusion module and a confidence coefficient and bounding box parameter calculation module, wherein,
the point cloud voxelization module is used for voxelizing the original point cloud data to obtain a plurality of voxels. The characteristic map extraction module is connected with the point cloud voxelization module and used for extracting the point cloud voxel characteristics of each non-empty voxel in the voxels by utilizing a sparse convolution network to obtain a plurality of characteristic maps. The candidate frame generation module is connected with the feature map extraction module and used for inputting the feature maps into the regional suggestion network to generate a first-stage candidate frame. And the point cloud space shape complementing module is connected with the candidate frame generating module and used for inputting the candidate frame of the first stage into the point cloud space shape complementing network to complement the point cloud space shape to obtain a target point set. And the global structure information extraction module is connected with the point cloud space shape completion module and used for extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information. And the key point feature sampling module is connected with the point cloud voxelization module and the feature map extraction module and is used for sampling the non-empty voxel features near each key point from the original point cloud data and combining the non-empty voxel features as key point features. And the grid point feature extraction module is connected with the key point feature sampling module and is used for extracting the key point features by utilizing the feature extraction module combining the graph neural network and the attention mechanism to obtain the grid point features. The enhanced feature fusion module is connected with the global structure information extraction module and the grid point feature extraction module and is used for fusing the grid point features and the global structure information to obtain enhanced features. The confidence coefficient and boundary frame parameter calculation module is connected with the enhanced feature fusion module and is used for carrying out confidence coefficient prediction and candidate frame refinement on the enhanced features to obtain confidence coefficient and boundary frame parameters.
Please refer to embodiment one for a specific implementation method of each module, which is not described in detail in this embodiment.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A detection and identification method based on three-dimensional information enhancement of laser point cloud is characterized by comprising the following steps:
carrying out voxelization processing on the original point cloud data to obtain a plurality of voxels;
extracting point cloud voxel characteristics of each non-empty voxel in the voxels by using a sparse convolution network to obtain a plurality of characteristic maps;
inputting the feature maps into a regional suggestion network to generate a first-stage candidate box;
inputting the candidate frame of the first stage into a point cloud space shape complementing network to complement the point cloud space shape to obtain a target point set;
extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information;
sampling the non-empty voxel characteristic combination near each key point from the original point cloud data as key point characteristics;
extracting the key point features by using a feature extraction module combining a graph neural network and an attention mechanism to obtain grid point features;
fusing the grid point characteristics and the global structure information to obtain enhanced characteristics;
and carrying out confidence prediction and candidate frame refinement on the enhanced features to obtain confidence and boundary frame parameters.
2. The method for detecting and identifying the three-dimensional information enhancement based on the laser point cloud as claimed in claim 1, wherein the voxel processing is performed on the original point cloud data to obtain a plurality of voxels, and the method comprises the following steps:
dividing the original point cloud data into a three-dimensional grid according to a fixed resolution to obtain the plurality of voxels.
3. The laser point cloud-based three-dimensional information enhanced detection and identification method according to claim 1, wherein the extracting point cloud voxel features of each non-empty voxel in the voxels by using a sparse convolution network to obtain feature maps comprises:
constructing the sparse convolutional network;
inputting the average coordinates of the points within each non-empty voxel and the reflectivity in the original point cloud data into the sparse convolution network as initial features, and outputting the feature maps.
4. The method of claim 1, wherein inputting the first-stage candidate box into a point cloud space shape complementing network to complement a point cloud space shape to obtain a target point set comprises:
processing the point set inside the first-stage candidate box by using an ROI-aware pooling module to obtain a first matrix;
inputting the first matrix into a multilayer perceptron to obtain a first intermediate characteristic, and inputting the first intermediate characteristic into a maximum pooling layer to obtain a second intermediate characteristic;
combining the second intermediate feature with the first intermediate feature to obtain a third intermediate feature;
inputting the third intermediate features into a multilayer perceptron and a maximum pooling layer in sequence to obtain global features;
and generating the target point set by stacking the global features through full connection.
5. The method for detecting and identifying the three-dimensional information enhancement based on the laser point cloud as claimed in claim 1, wherein extracting the point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain the global structure information comprises:
selecting a plurality of points from the target point set to form a target set by using a furthest point sampling algorithm;
extracting the local context of each point in the target set by utilizing a multilayer perceptron and a maximum pooling layer to obtain a second matrix;
extracting structural information from the second matrix by using the multi-scale grouping strategy to obtain a target tensor;
and generating the global structure information by the target tensor through a full connection layer.
6. The method for detecting and identifying the three-dimensional information enhancement based on the laser point cloud of claim 1, wherein sampling the non-empty voxel features in the vicinity of each key point from the original point cloud data as key point features comprises:
sampling a plurality of key points from the original point cloud data by using a farthest point sampling algorithm;
calculating a non-empty voxel characteristic set of each key point in a target radius range with the key point as a circle center in a kth sparse convolution module of a sparse convolution network:
S_i^(k) = { [ f_j^(k) ; v_j^(k) − d_i ] : ‖ v_j^(k) − d_i ‖ < r_k, j = 1, …, N_k }
wherein v_j^(k) − d_i represents the relative coordinates with respect to the key point, d_i represents a key point, r_k represents the target radius, v_j^(k) represents the three-dimensional coordinates of the jth non-empty voxel in the kth sparse convolution module, f_j^(k) represents the corresponding feature output by the kth sparse convolution module, and N_k represents the number of non-empty voxels in the kth sparse convolution module;
generating output features of each key point at the kth sparse convolution module by using the non-empty voxel feature set:
f_i^(pv_k) = max{ G( M( S_i^(k) ) ) }
wherein M(·) represents a random sampling operation, and G(·) represents an operation performed by a multilayer perceptron;
and connecting the output characteristics of each key point in each sparse convolution module to obtain the key point characteristics.
7. The method for detecting and identifying three-dimensional information enhancement based on laser point cloud of claim 1, wherein the grid point features are as follows:
F_g = Σ_{i∈Ω(r)} W( σ*_q · Q_pos + σ*_k · K_i + σ*_{qk} · (Q_pos · K_i) ) · V_i
where Ω(r) represents all points within a fixed radius r of a grid point, W(·) represents the mapping of a graph edge into a scalar or vector weight space, σ* represents a gate function with learning, Q_pos represents the linear projection of the position difference between two nodes, K_i = Linear(f_i) represents the key mapping, and V_i represents the features of node i.
8. The method for detecting and identifying the three-dimensional information enhancement based on the laser point cloud of claim 1, wherein the step of fusing the grid point features and the global structure information to obtain enhanced features comprises the steps of:
and fusing the grid point characteristics and the global structure information by using a perspective channel attention module to obtain enhanced characteristics.
9. The method for detecting and identifying the three-dimensional information enhancement based on the laser point cloud of claim 1, wherein performing confidence prediction and candidate frame refinement on the enhancement features to obtain confidence and bounding box parameters comprises:
and sequentially inputting the enhanced features into a two-layer multilayer perceptron to extract feature vectors, inputting the feature vectors into a first branch network for confidence prediction, and inputting into a second branch network for candidate frame refinement to obtain the confidence and the boundary frame parameters.
10. The utility model provides a detection recognition device of three-dimensional information reinforcing based on laser point cloud which characterized in that includes:
the point cloud voxelization module is used for voxelizing the original point cloud data to obtain a plurality of voxels;
the characteristic map extraction module is used for extracting the point cloud voxel characteristics of each non-empty voxel in the voxels by utilizing a sparse convolution network to obtain a plurality of characteristic maps;
the candidate frame generation module is used for inputting the feature maps into the regional suggestion network to generate a first-stage candidate frame;
a point cloud space shape complementing module, configured to input the first-stage candidate box into a point cloud space shape complementing network to complement a point cloud space shape to obtain a target point set;
the global structure information extraction module is used for extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information;
a key point feature sampling module for sampling the non-empty voxel feature combination near each key point from the original point cloud data as key point features;
the grid point feature extraction module is used for extracting the key point features by utilizing a feature extraction module combining a graph neural network and an attention mechanism to obtain grid point features;
the enhanced feature fusion module is used for fusing the grid point features and the global structure information to obtain enhanced features;
and the confidence coefficient and bounding box parameter calculation module is used for carrying out confidence coefficient prediction and candidate box refinement on the enhanced features to obtain confidence coefficient and bounding box parameters.
CN202210289428.7A 2022-03-23 2022-03-23 Three-dimensional information enhanced detection and identification method and device based on laser point cloud Pending CN114821033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210289428.7A CN114821033A (en) 2022-03-23 2022-03-23 Three-dimensional information enhanced detection and identification method and device based on laser point cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210289428.7A CN114821033A (en) 2022-03-23 2022-03-23 Three-dimensional information enhanced detection and identification method and device based on laser point cloud

Publications (1)

Publication Number Publication Date
CN114821033A true CN114821033A (en) 2022-07-29

Family

ID=82530057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210289428.7A Pending CN114821033A (en) 2022-03-23 2022-03-23 Three-dimensional information enhanced detection and identification method and device based on laser point cloud

Country Status (1)

Country Link
CN (1) CN114821033A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874900A (en) * 2024-03-12 2024-04-12 中钜(陕西)工程咨询管理有限公司 House construction engineering supervision method based on BIM technology

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170374342A1 (en) * 2016-06-24 2017-12-28 Isee, Inc. Laser-enhanced visual simultaneous localization and mapping (slam) for mobile devices
CN110689008A (en) * 2019-09-17 2020-01-14 大连理工大学 Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN111444811A (en) * 2020-03-23 2020-07-24 复旦大学 Method for detecting three-dimensional point cloud target
CN113378854A (en) * 2021-06-11 2021-09-10 武汉大学 Point cloud target detection method integrating original point cloud and voxel division
CN113468994A (en) * 2021-06-21 2021-10-01 武汉理工大学 Three-dimensional target detection method based on weighted sampling and multi-resolution feature extraction
CN113920499A (en) * 2021-10-27 2022-01-11 江苏大学 Laser point cloud three-dimensional target detection model and method for complex traffic scene

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170374342A1 (en) * 2016-06-24 2017-12-28 Isee, Inc. Laser-enhanced visual simultaneous localization and mapping (slam) for mobile devices
CN110689008A (en) * 2019-09-17 2020-01-14 大连理工大学 Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN111444811A (en) * 2020-03-23 2020-07-24 复旦大学 Method for detecting three-dimensional point cloud target
CN113378854A (en) * 2021-06-11 2021-09-10 武汉大学 Point cloud target detection method integrating original point cloud and voxel division
CN113468994A (en) * 2021-06-21 2021-10-01 武汉理工大学 Three-dimensional target detection method based on weighted sampling and multi-resolution feature extraction
CN113920499A (en) * 2021-10-27 2022-01-11 江苏大学 Laser point cloud three-dimensional target detection model and method for complex traffic scene

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874900A (en) * 2024-03-12 2024-04-12 中钜(陕西)工程咨询管理有限公司 House construction engineering supervision method based on BIM technology
CN117874900B (en) * 2024-03-12 2024-05-24 中钜(陕西)工程咨询管理有限公司 House construction engineering supervision method based on BIM technology

Similar Documents

Publication Publication Date Title
CN111489358B (en) Three-dimensional point cloud semantic segmentation method based on deep learning
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
CN110930454A (en) Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning
WO2016130203A1 (en) Convolution matrix multiply with callback for deep tiling for deep convolutional neural networks
US20200003886A1 (en) Apparatus and method with ego motion estimation
US12056615B2 (en) Icospherical gauge convolutional neural network
Peyghambarzadeh et al. Point-PlaneNet: Plane kernel based convolutional neural network for point clouds analysis
Chen et al. 3D point cloud semantic segmentation toward large-scale unstructured agricultural scene classification
WO2023164933A1 (en) Building modeling method and related apparatus
CN111507222A (en) Three-dimensional object detection framework based on multi-source data knowledge migration
US20220277581A1 (en) Hand pose estimation method, device and storage medium
CN114612660A (en) Three-dimensional modeling method based on multi-feature fusion point cloud segmentation
CN116152800A (en) 3D dynamic multi-target detection method, system and storage medium based on cross-view feature fusion
CN114120045B (en) Target detection method and device based on multi-gate control hybrid expert model
CN116704504A (en) Radar panorama segmentation method based on decoupling dynamic convolution kernel
Zhong et al. Transformer-based models and hardware acceleration analysis in autonomous driving: A survey
CN114821033A (en) Three-dimensional information enhanced detection and identification method and device based on laser point cloud
Tong et al. Learning local contextual features for 3D point clouds semantic segmentation by attentive kernel convolution
CN115147564A (en) Three-dimensional model construction method, neural network training method and device
Ansari et al. Angle-based feature learning in GNN for 3D object detection using point cloud
Singh et al. Deep learning-based semantic segmentation of three-dimensional point cloud: a comprehensive review
Wang et al. A category-contrastive guided-graph convolutional network approach for the semantic segmentation of point clouds
CN117351198A (en) Point cloud semantic segmentation method based on dynamic convolution
CN114819053A (en) Average wave direction forecast deviation correction method based on space-time convolution LSTM
Bastås et al. Outdoor global pose estimation from RGB and 3D data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination