CN114821033A - Three-dimensional information enhanced detection and identification method and device based on laser point cloud - Google Patents

Three-dimensional information enhanced detection and identification method and device based on laser point cloud

Info

Publication number
CN114821033A
CN114821033A (application CN202210289428.7A)
Authority
CN
China
Prior art keywords
point cloud
point
features
module
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210289428.7A
Other languages
Chinese (zh)
Inventor
秦翰林
朱文锐
延翔
林凯东
许景贤
张天吉
侯本照
代杨
李兵斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN202210289428.7A priority Critical patent/CN114821033A/en
Publication of CN114821033A publication Critical patent/CN114821033A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a detection and identification method and device based on three-dimensional information enhancement of laser point cloud. The method comprises the following steps: performing voxelization on the original point cloud data to obtain a plurality of voxels; extracting point cloud voxel features of each non-empty voxel to obtain a plurality of feature maps; inputting the feature maps into a regional suggestion network to generate first-stage candidate boxes; inputting the first-stage candidate boxes into a point cloud spatial shape completion network to obtain a target point set; extracting point cloud structure information from the target point set to obtain global structure information; sampling and combining the non-empty voxel features near each key point from the original point cloud data as key point features; extracting the key point features to obtain grid point features; fusing the grid point features and the global structure information to obtain enhanced features; and performing confidence prediction and candidate box refinement on the enhanced features to obtain the confidence and bounding box parameters. The method enhances the feature representation of the point cloud data and improves the detection precision of laser point cloud targets.

Description

Three-dimensional information enhanced detection and identification method and device based on laser point cloud
Technical Field
The invention belongs to the technical field of artificial intelligence and target detection and identification, and particularly relates to a detection and identification method based on three-dimensional information enhancement of laser point cloud.
Background
An imaging laser radar offers high angular resolution, strong anti-interference capability and high detection precision, and can therefore acquire three-dimensional (3D) point cloud data that reflect the geometric shape, angle and distance information of a target scene. As a result, laser-radar-based 3D point cloud target detection technology has wide application value in the field of unmanned driving.
Most existing 3D point cloud target detection methods learn features from sparse, irregular point cloud data using one of two processing schemes, based either on voxels or on the raw point cloud. Converting the point cloud into a regular grid by voxelization allows 3D object detection to be carried out efficiently with 2D convolution, but the quantization inevitably causes information loss, which reduces localization accuracy. Conversely, methods that learn features directly from the raw point cloud and complete the prediction can retain accurate point position information, but their computational cost is high and their receptive field is limited.
One line of existing research proposes a voxelization-based point cloud characterization method that obtains voxel features of the point cloud, converts them into a high-dimensional volumetric representation by 3D convolution, and finally outputs the detection result through a regional suggestion network. Another line of research takes the raw point cloud produced by the laser radar directly as input and designs an end-to-end point cloud data processing network; however, that network extracts only the feature description of each independent point and of the global point cloud, without considering local features and structural constraints, which degrades its performance in complex scenes.
Therefore, the existing methods suffer from difficulty in extracting point cloud data features and from low point cloud target detection precision.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a detection and identification method and device based on three-dimensional information enhancement of laser point cloud. The technical problem to be solved by the invention is realized by the following technical scheme:
the embodiment of the invention provides a three-dimensional information enhanced detection and identification method based on laser point cloud, which comprises the following steps:
carrying out voxelization processing on the original point cloud data to obtain a plurality of voxels;
extracting point cloud voxel characteristics of each non-empty voxel in the voxels by using a sparse convolution network to obtain a plurality of characteristic maps;
inputting the feature maps into a regional suggestion network to generate a first-stage candidate box;
inputting the candidate frame of the first stage into a point cloud space shape complementing network to complement the point cloud space shape to obtain a target point set;
extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information;
sampling the non-empty voxel characteristic combination near each key point from the original point cloud data as key point characteristics;
extracting the key point features by using a feature extraction module combining a graph neural network and an attention mechanism to obtain grid point features;
fusing the grid point characteristics and the global structure information to obtain enhanced characteristics;
and carrying out confidence prediction and candidate frame refinement on the enhanced features to obtain confidence and boundary frame parameters.
In an embodiment of the present invention, performing voxelization processing on the original point cloud data to obtain a plurality of voxels includes:
dividing the original point cloud data into a three-dimensional grid according to a fixed resolution to obtain the plurality of voxels.
In an embodiment of the present invention, extracting point cloud voxel features of each non-empty voxel in the voxels by using a sparse convolution network to obtain a plurality of feature maps, including:
constructing the sparse convolutional network;
inputting the average coordinates of the points within each non-empty voxel and the reflectivity in the original point cloud data into the sparse convolution network as initial features, and outputting the feature maps.
In an embodiment of the present invention, inputting the first stage candidate box into a point cloud space shape complementing network to complement a point cloud space shape to obtain a target point set, including:
processing the point set inside the first-stage candidate box by using an ROI-aware pooling module to obtain a first matrix;
inputting the first matrix into a multilayer perceptron to obtain a first intermediate characteristic, and inputting the first intermediate characteristic into a maximum pooling layer to obtain a second intermediate characteristic;
combining the second intermediate feature with the first intermediate feature to obtain a third intermediate feature;
inputting the third intermediate features into a multilayer perceptron and a maximum pooling layer in sequence to obtain global features;
and generating the target point set by stacking the global features through full connection.
In one embodiment of the present invention, extracting point cloud structure information from the target point set using a multi-scale grouping strategy to obtain global structure information comprises:
selecting a plurality of points from the target point set to form a target set by using a farthest point sampling algorithm;
extracting the local context of each point in the target set by utilizing a multilayer perceptron and a maximum pooling layer to obtain a second matrix;
extracting structural information from the second matrix by using the multi-scale grouping strategy to obtain a target tensor;
and generating the global structure information by the target tensor through a full connection layer.
In one embodiment of the present invention, sampling the non-empty voxel feature combination near each keypoint from the raw point cloud data as keypoint features comprises:
sampling a plurality of key points from the original point cloud data by using a farthest point sampling algorithm;
calculating a non-empty voxel characteristic set of each key point in a target radius range with the key point as a circle center in a kth sparse convolution module of a sparse convolution network:
S_i^(k) = { [ f_j^(k) ; v_j^(k) − d_i ] : ‖ v_j^(k) − d_i ‖ < r_k, j = 1, …, N_k }
wherein v_j^(k) − d_i represents the relative coordinates with respect to the key point, d_i represents a key point, r_k represents the target radius, v_j^(k) represents the three-dimensional coordinates of the jth non-empty voxel in the kth sparse convolution module, f_j^(k) represents the corresponding feature output by the kth sparse convolution module, and N_k represents the number of non-empty voxels in the kth sparse convolution module;
generating output features of each key point at the kth sparse convolution module by using the non-empty voxel feature set:
f_i^(pv_k) = max{ G( M( S_i^(k) ) ) }
wherein M(·) represents a random sampling operation, and G(·) represents an operation performed by a multilayer perceptron;
and connecting the output characteristics of each key point in each sparse convolution module to obtain the key point characteristics.
In one embodiment of the invention, the grid points are characterized by:
F_g = Σ_{i∈Ω(r)} W( σ*_q · Q_pos + σ*_k · K_i + σ*_{qk} · (Q_pos · K_i) ) · V_i
where Ω(r) represents all points within a fixed radius r of a grid point, W(·) represents the mapping of a graph edge into a scalar or vector weight space, σ* represents a gate function with learning, Q_pos represents the linear projection of the position difference between two nodes, K_i = Linear(f_i) represents the key mapping, and V_i represents the features of node i.
In an embodiment of the present invention, the fusing the grid point feature and the global structure information to obtain an enhanced feature includes:
and fusing the grid point characteristics and the global structure information by using a perspective channel attention module to obtain enhanced characteristics.
In an embodiment of the present invention, performing confidence prediction and candidate frame refinement on the enhanced features to obtain a confidence and a bounding box parameter, includes:
and sequentially inputting the enhanced features into a two-layer multilayer perceptron to extract feature vectors, inputting the feature vectors into a first branch network for confidence prediction, and inputting into a second branch network for candidate frame refinement to obtain the confidence and the boundary frame parameters.
Another embodiment of the present invention provides a detection and recognition apparatus based on three-dimensional information enhancement of laser point cloud, including:
the point cloud voxelization module is used for voxelizing the original point cloud data to obtain a plurality of voxels;
the characteristic map extraction module is used for extracting the point cloud voxel characteristics of each non-empty voxel in the voxels by utilizing a sparse convolution network to obtain a plurality of characteristic maps;
the candidate frame generation module is used for inputting the feature maps into the regional suggestion network to generate a first-stage candidate frame;
a point cloud space shape complementing module, configured to input the first-stage candidate box into a point cloud space shape complementing network to complement a point cloud space shape to obtain a target point set;
the global structure information extraction module is used for extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information;
a key point feature sampling module for sampling the non-empty voxel feature combination near each key point from the original point cloud data as key point features;
the grid point feature extraction module is used for extracting the key point features by utilizing a feature extraction module combining a graph neural network and an attention mechanism to obtain grid point features;
the enhanced feature fusion module is used for fusing the grid point features and the global structure information to obtain enhanced features;
and the confidence coefficient and boundary frame parameter calculation module is used for carrying out confidence coefficient prediction and candidate frame refinement on the enhanced features to obtain confidence coefficient and boundary frame parameters.
Compared with the prior art, the invention has the beneficial effects that:
According to the detection and identification method, the point cloud spatial shape is completed, so that the extracted structural information contains more semantic representations and the feature representation of the point cloud data is enhanced. The feature extraction adopted here has stronger representation capability than the feature extraction used by existing methods, which solves the problem that existing methods struggle to extract point cloud data features and improves the detection precision of laser point cloud targets.
Drawings
Fig. 1 is a schematic flow diagram of a detection and identification method based on three-dimensional information enhancement of laser point cloud according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another method for detecting and identifying three-dimensional information enhancement based on laser point cloud according to an embodiment of the present invention;
fig. 3 is a schematic process diagram of a detection and identification method based on three-dimensional information enhancement of laser point cloud according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a feature extraction method combining a neural network and an attention mechanism according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a detection and identification device based on laser point cloud with enhanced three-dimensional information according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
Example one
Referring to fig. 1, fig. 2 and fig. 3, fig. 1 is a schematic flow chart of a three-dimensional information enhancement detection and identification method based on laser point cloud according to an embodiment of the present invention, fig. 2 is a schematic flow chart of another three-dimensional information enhancement detection and identification method based on laser point cloud according to an embodiment of the present invention, and fig. 3 is a schematic process chart of a three-dimensional information enhancement detection and identification method based on laser point cloud according to an embodiment of the present invention. The detection and identification method comprises the following steps:
and S1, carrying out voxelization processing on the original point cloud data to obtain a plurality of voxels.
Specifically, the original point cloud data is divided into a three-dimensional grid at a fixed resolution to obtain the plurality of voxels. Assume that the point cloud occupies a range of length L, width W and height H along the X, Y and Z axes of 3D space; the length, width and height of each voxel are then correspondingly defined as vL, vW and vH, where v &lt; 1, so that a plurality of voxels is formed in the point cloud region.
And S2, extracting the point cloud voxel characteristics of each non-empty voxel in the voxels by using a sparse convolution network to obtain a plurality of characteristic maps.
And S21, constructing a sparse convolution network.
Specifically, the sparse convolution network is formed by sequentially connecting a plurality of sparse convolution modules, each module is formed by sequentially connecting a plurality of stages, and each stage comprises a plurality of sub-manifold convolution layers and a normal sparse convolution layer. In this embodiment, the sparse convolution network is formed by sequentially connecting three identical sparse convolution modules, each module is formed by sequentially connecting two stages, and each stage includes 2 sub-manifold convolution layers and one normal sparse convolution layer.
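For illustration, a minimal structural sketch of such a backbone is given below, assuming the spconv library (spconv.pytorch) for the submanifold and normal sparse convolutions; the channel widths, strides and input feature dimension are illustrative assumptions rather than values taken from the embodiment.

```python
# Structural sketch of the sparse convolution backbone described above,
# assuming the spconv library (spconv.pytorch). Channel widths, strides and
# the 4-dimensional input feature (x, y, z, R) are illustrative assumptions.
import torch.nn as nn
import spconv.pytorch as spconv

def make_stage(c_in, c_out):
    # one "stage": two submanifold convolutions followed by a normal sparse
    # convolution that downsamples the spatial resolution
    return [
        spconv.SubMConv3d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm1d(c_out), nn.ReLU(),
        spconv.SubMConv3d(c_out, c_out, 3, padding=1, bias=False),
        nn.BatchNorm1d(c_out), nn.ReLU(),
        spconv.SparseConv3d(c_out, c_out, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm1d(c_out), nn.ReLU(),
    ]

class SparseBackbone(nn.Module):
    """Three sparse convolution modules, each built from two stages."""
    def __init__(self, c_in=4, widths=(16, 32, 64)):
        super().__init__()
        blocks, c_prev = [], c_in
        for c in widths:                                   # three modules
            blocks.append(spconv.SparseSequential(
                *make_stage(c_prev, c), *make_stage(c, c)))
            c_prev = c
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):                                  # x: spconv.SparseConvTensor
        feature_maps = []                                  # per-module feature maps F^(k)
        for block in self.blocks:
            x = block(x)
            feature_maps.append(x)
        return feature_maps
```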
And S22, inputting the average coordinates of the points within each non-empty voxel and the reflectivity in the original point cloud data into the sparse convolution network as initial features, and outputting the feature maps.
Specifically, the average coordinates (x, y, z) of the points within each non-empty voxel and the reflectivity R in the original point cloud data are input into the sparse convolution network as the initial features; the sparse convolution network performs down-sampling along the z axis, and the voxel feature vectors output by the network are used as the extracted feature maps.
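For illustration, a minimal NumPy sketch of steps S1 and S22 is given below: points are assigned to a fixed-resolution voxel grid and the mean (x, y, z, R) of the points falling in each non-empty voxel is used as its initial feature. The point cloud range and voxel sizes are illustrative assumptions, not values taken from the embodiment.

```python
# Sketch of S1 and S22: fixed-resolution voxelization followed by per-voxel
# averaging of point coordinates and reflectivity. Range and voxel-size
# values are illustrative assumptions.
import numpy as np

def voxelize_mean(points, pc_range=(0, -40, -3, 70.4, 40, 1),
                  voxel_size=(0.05, 0.05, 0.1)):
    """points: (N, 4) array of x, y, z, reflectivity R."""
    pc_range = np.asarray(pc_range, dtype=np.float32)
    voxel_size = np.asarray(voxel_size, dtype=np.float32)

    # keep only the points inside the configured spatial range
    inside = np.all((points[:, :3] >= pc_range[:3]) &
                    (points[:, :3] < pc_range[3:]), axis=1)
    points = points[inside]

    # integer voxel index of every point along X, Y and Z
    idx = np.floor((points[:, :3] - pc_range[:3]) / voxel_size).astype(np.int64)

    # group points that share a voxel index and average their (x, y, z, R)
    uniq, inverse = np.unique(idx, axis=0, return_inverse=True)
    counts = np.bincount(inverse).astype(np.float32)
    feats = np.zeros((len(uniq), 4), dtype=np.float32)
    for c in range(4):
        feats[:, c] = np.bincount(inverse, weights=points[:, c]) / counts

    return uniq, feats   # voxel grid coordinates and initial per-voxel features

# usage: coords, feats = voxelize_mean(np.random.rand(1000, 4).astype(np.float32))
```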
And S3, inputting the feature maps into the area suggestion network to generate a first-stage candidate box.
Specifically, a regional suggestion network is established, and the extracted feature maps are input into the regional suggestion network to obtain a first-stage candidate frame.
And S4, inputting the first-stage candidate box into a point cloud space shape complementing network to complement the point cloud space shape to obtain a target point set.
Specifically, a point cloud spatial shape completion network is constructed, which comprises an ROI-aware pooling module, a multilayer perceptron, a maximum pooling layer, a feature combination module, a second multilayer perceptron, a second maximum pooling layer and a fully connected layer, connected in sequence.
And S41, processing the point set inside the first-stage candidate box by using the ROI-aware pooling module to obtain a first matrix.
Specifically, assume that a first-stage candidate box is selected and denote its internal point set as {P_i, i = 1, …, N}, where P_i is a point vector with coordinates (x, y, z) and N is the number of points. The point set {P_i} is then processed by the ROI-aware pooling module to obtain a first matrix M of size n × 3, where n = r_p × r_p × r_p and r_p is the size of the pooling grid.
And S42, inputting the first matrix into a multilayer perceptron to obtain a first intermediate feature, and inputting the first intermediate feature into a maximum pooling layer to obtain a second intermediate feature.
Specifically, a first matrix M of n × 3 is input into the multilayer perceptron, and a first intermediate feature v' is output; the first intermediate feature v' is then input into the maximum pooling layer, resulting in a 256-dimensional second intermediate feature v.
And S43, combining the second intermediate characteristic with the first intermediate characteristic to obtain a third intermediate characteristic.
Specifically, the 256-dimensional second intermediate feature v and the first intermediate feature v' are subjected to a vector splicing operation to obtain a 512-dimensional third intermediate feature w'.
And S44, sequentially inputting the third intermediate features into the multilayer perceptron and the maximum pooling layer to obtain global features.
Specifically, the 512-dimensional third intermediate feature w' is input into the multilayer perceptron again, and the output of the multilayer perceptron is input into the maximum pooling layer, finally obtaining the 1024-dimensional global feature w.
And S45, generating the target point set by stacking the global features through full connection.
Specifically, the 1024-dimensional global feature w is passed through stacked fully connected layers (FCN) to generate the target point set, which is a 1024 × 3 matrix representing the completed spatial shape.
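For illustration, a hedged PyTorch sketch of the completion branch of steps S41 to S45 is given below; the 256/512/1024 feature widths follow the text, while the MLP depths and the number of pooled in-box points are assumptions.

```python
# Sketch of the point cloud spatial-shape completion branch (S41-S45):
# pooled in-box points -> MLP -> max-pool (256-d) -> concatenation with the
# point-wise features (512-d) -> MLP -> max-pool (1024-d global feature) ->
# fully connected layers regressing a 1024 x 3 completed point set.
# Widths follow the text; MLP depths are assumptions.
import torch
import torch.nn as nn

class ShapeCompletionNet(nn.Module):
    def __init__(self, num_out_points=1024):
        super().__init__()
        self.mlp1 = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                  nn.Linear(128, 256), nn.ReLU())
        self.mlp2 = nn.Sequential(nn.Linear(512, 512), nn.ReLU(),
                                  nn.Linear(512, 1024), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(),
                                     nn.Linear(1024, num_out_points * 3))
        self.num_out_points = num_out_points

    def forward(self, box_points):              # (B, n, 3) pooled in-box points
        v_prime = self.mlp1(box_points)          # first intermediate feature v'
        v = v_prime.max(dim=1).values            # second intermediate feature v, (B, 256)
        w_prime = torch.cat(                     # third intermediate feature w', (B, n, 512)
            [v_prime, v.unsqueeze(1).expand(-1, v_prime.size(1), -1)], dim=-1)
        w = self.mlp2(w_prime).max(dim=1).values  # global feature w, (B, 1024)
        pts = self.decoder(w)                     # completed target point set
        return pts.view(-1, self.num_out_points, 3)

# usage: ShapeCompletionNet()(torch.rand(2, 216, 3)).shape  ->  (2, 1024, 3)
```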
And S5, extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information.
And S51, selecting a plurality of points from the target point set by using a farthest point sampling algorithm to form a target set.
Specifically, a farthest point sampling (FPS) algorithm is used to select m points from the target point set to form a target set S = {s_i, i = 1, …, m}; the T nearest neighbours of each point in the target set S are then grouped to obtain an m × T × 3 tensor.
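For illustration, the farthest point sampling used in S51 (and again in S61) together with the T-nearest-neighbour grouping is sketched below in plain NumPy; m and T are illustrative values, not values taken from the embodiment.

```python
# Sketch of farthest point sampling (FPS) and T-nearest-neighbour grouping
# yielding the m x T x 3 tensor; m and T below are illustrative.
import numpy as np

def farthest_point_sampling(points, m):
    """points: (N, 3). Returns indices of m mutually far-apart points."""
    n = points.shape[0]
    chosen = np.zeros(m, dtype=np.int64)
    dist = np.full(n, np.inf)
    chosen[0] = 0                                # start from an arbitrary point
    for k in range(1, m):
        d = np.linalg.norm(points - points[chosen[k - 1]], axis=1)
        dist = np.minimum(dist, d)               # distance to the already-chosen set
        chosen[k] = int(np.argmax(dist))         # pick the farthest remaining point
    return chosen

def group_knn(points, centers, T):
    """For each center, gather its T nearest neighbours -> (m, T, 3) tensor."""
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    nn_idx = np.argsort(d, axis=1)[:, :T]
    return points[nn_idx]

# usage
pts = np.random.rand(1024, 3).astype(np.float32)
centers = pts[farthest_point_sampling(pts, m=128)]
grouped = group_knn(pts, centers, T=16)          # shape (128, 16, 3)
```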
And S52, extracting the local context of each point in the target set by utilizing a multilayer perceptron and a maximum pooling layer to obtain a second matrix.
Specifically, the target set S is input into the multi-layered perceptron, and the output of the multi-layered perceptron is input into the max-pooling layer, which outputs the local context of each point in the target set S, resulting in a second matrix N of size m × C1, where C1 represents the number of channels.
And S53, extracting structural information from the second matrix by using the multi-scale grouping strategy to obtain a target tensor.
Specifically, a multi-scale grouping (MSG) strategy is used to extract structural information from the second matrix N, and a target tensor of size m × (C_1 + C_1) is output.
And S54, generating the global structure information by the target tensor through a full connection layer.
Specifically, the m × (C_1 + C_1) target tensor is input into the fully connected layer to generate the global structure information F_s, a feature of size m × C_1, where C_1 is the number of channels.
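For illustration, a hedged PyTorch sketch of steps S52 to S54 is given below: a shared MLP plus max-pooling extracts the local context of each sampled point at two grouping radii (the multi-scale grouping), the two C_1-channel outputs are concatenated into an m × 2C_1 tensor, and a fully connected layer produces the global structure information F_s. The radii, C_1 and group size are illustrative assumptions.

```python
# Sketch of S52-S54: shared MLP + max-pooling at two grouping radii,
# concatenation to m x 2*C1, and a fully connected layer producing F_s.
# Radii, C1 and the group size are illustrative assumptions.
import torch
import torch.nn as nn

class MultiScaleStructure(nn.Module):
    def __init__(self, c1=64, radii=(0.4, 0.8), max_group=16):
        super().__init__()
        self.radii, self.max_group = radii, max_group
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(3, c1), nn.ReLU(), nn.Linear(c1, c1), nn.ReLU())
            for _ in radii])
        self.fc = nn.Linear(len(radii) * c1, c1)

    def forward(self, centers, points):           # centers: (m, 3), points: (N, 3)
        d = torch.cdist(centers, points)           # (m, N) pairwise distances
        scales = []
        for r, mlp in zip(self.radii, self.mlps):
            # ball query: take the closest points, mask those outside radius r
            idx = d.topk(self.max_group, dim=1, largest=False).indices
            grouped = points[idx] - centers[:, None, :]        # relative coordinates
            feat = mlp(grouped)                                # (m, k, C1)
            mask = (d.gather(1, idx) < r).float().unsqueeze(-1)
            scales.append((feat * mask).max(dim=1).values)     # local context per point
        return self.fc(torch.cat(scales, dim=-1))              # F_s: (m, C1)

# usage: MultiScaleStructure()(torch.rand(128, 3), torch.rand(1024, 3)).shape -> (128, 64)
```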
And S6, sampling the non-empty voxel characteristic combination near each key point from the original point cloud data to be used as key point characteristics.
And S61, adopting a plurality of key points from the original point cloud data by using a furthest point sampling algorithm.
Specifically, a farthest point sampling (FPS) algorithm is first used to sample n key points d_i from the original point cloud data; for example, n = 2048 for the KITTI dataset and n = 4096 for the Waymo dataset.
The non-empty voxel features near each key point are then sampled and combined as the key point features F_m, with the following specific steps:
and S62, calculating a non-empty voxel characteristic set of each key point in a target radius range with the key point as the center in the kth sparse convolution module of the sparse convolution network.
Specifically, first define F^(k) = { f_j^(k), j = 1, …, N_k } as the features output by the kth sparse convolution module of the sparse convolution network in step S2, and V^(k) = { v_j^(k) } as the three-dimensional coordinates of the corresponding non-empty voxels, where N_k is the number of non-empty voxels in the kth sparse convolution module. Then, for each key point d_i, the non-empty voxel feature set in the kth sparse convolution module, within a circular region centred on d_i with target radius r_k, is:
S_i^(k) = { [ f_j^(k) ; v_j^(k) − d_i ] : ‖ v_j^(k) − d_i ‖ < r_k, j = 1, …, N_k }   (1)
wherein v_j^(k) − d_i represents the relative coordinates with respect to the key point, d_i represents the key point, r_k represents the target radius, v_j^(k) represents the three-dimensional coordinates of the jth non-empty voxel in the kth sparse convolution module, and f_j^(k) represents the corresponding feature output by the kth sparse convolution module.
And S63, generating the output feature of each key point in the k-th sparse convolution module by using the non-empty voxel feature set.
Specifically, the non-empty voxel feature set is used to generate the output feature f_i^(pv_k) of key point d_i in the kth sparse convolution module:
f_i^(pv_k) = max{ G( M( S_i^(k) ) ) }   (2)
wherein M(·) represents a random sampling operation, and G(·) represents an operation performed by the multilayer perceptron.
And S64, connecting the output characteristics of each key point in each sparse convolution module to obtain the key point characteristics.
Specifically, after the output feature f_i^(pv_k) has been obtained for each key point d_i in every sparse convolution module, the features from the individual sparse convolution modules are connected to obtain an overall feature, namely the key point features F_m.
And S7, extracting the key point features by using a feature extraction module combining a graph neural network and an attention mechanism to obtain grid point features.
Specifically, a feature extraction module combining a graph neural network and an attention mechanism is constructed, and the key point features F_m are input into the feature extraction module for feature extraction to obtain the grid point features.
Referring to fig. 4, fig. 4 is a schematic flow chart of a feature extraction method combining a graph neural network and an attention mechanism according to an embodiment of the present invention. The characteristic extraction method comprises the following steps:
S71. The graph neural network regards each point of the point cloud as a node of a graph structure, and edges are generated between each point and its adjacent points. The feature of node i in the graph structure is expressed as V_i = MLP(F_i^m), and an edge can be represented as a linear projection of the position difference between two nodes, expressed by the formula Q_pos = Linear(p_j − p_i).
The key point features F_m are input into the feature extraction module, and features are extracted from adjacent nodes by a weighted combination operation, calculated as follows:
f̃_i = Σ_{j∈N(i)} W(Q_pos) · V_j   (3)
where W(·) maps a graph edge into a scalar or vector weight space, '·' indicates the scalar-vector product between the learned weight and the graph node, Q_pos = Linear(p_j − p_i) is the linear projection of the position difference between two nodes, p_i denotes the position of the ith node, p_j denotes the position of the jth node, N(i) denotes the nodes adjacent to node i, V_i = MLP(F_i^m) denotes the features of node i, and F_i^m denotes the key point feature of node i.
S72. From the viewpoint of the attention mechanism, Q_pos can be regarded as the query mapping from the grid point p_grid to point p_i, V_i can be regarded as the value mapping obtained from the feature f_i, and the key mapping K_i can be expressed as K_i = Linear(f_i). The process of extracting features can then be expressed as the following formula:
f̃ = Σ_i W(Q_pos · K_i) · V_i   (4)
wherein W(·) represents the softmax function.
S73. Combining formula (3) and formula (4), the grid point features are obtained:
F_g = Σ_{i∈Ω(r)} W( σ*_q · Q_pos + σ*_k · K_i + σ*_{qk} · (Q_pos · K_i) ) · V_i   (5)
where Ω(r) represents all points within a fixed radius r of a grid point, W(·) represents the mapping of a graph edge into a scalar or vector weight space, σ* represents a gate function with learning, realized by a linear projection and a sigmoid function, Q_pos represents the linear projection of the position difference between two nodes, K_i = Linear(f_i) represents the key mapping, and V_i represents the features of node i.
Formula (5) combines the principles of the graph neural network and the attention mechanism and is a flexible and effective feature extraction method: through the learned gate functions it adaptively learns more meaningful point cloud features from the geometric information Q_pos, the semantic information K and their composition Q_pos·K, yielding the grid point features F_g.
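For illustration, a hedged PyTorch sketch of the grid point feature extraction of steps S71 to S73 is given below, following the reconstruction of formula (5) above; the exact gating arrangement in the original figures is not recoverable from the text, so this should be read as one plausible interpretation rather than the definitive module, and all layer sizes are assumptions.

```python
# Sketch of the gated graph/attention grid-point feature extraction (S71-S73),
# following the reconstruction of formula (5): attention weights over the
# points inside radius r of a grid point are computed from the positional
# term Q_pos, the semantic key K_i and their composition, mixed by learned
# gates, and the weighted node values V_i are accumulated. One plausible
# reading; layer sizes and the gating arrangement are assumptions.
import torch
import torch.nn as nn

class GridPointAttention(nn.Module):
    def __init__(self, c_in, c_out, radius=1.6):
        super().__init__()
        self.radius = radius
        self.q_pos = nn.Linear(3, c_out)      # Q_pos: projection of p_i - p_grid
        self.key = nn.Linear(c_in, c_out)     # K_i = Linear(f_i)
        self.value = nn.Linear(c_in, c_out)   # V_i (a single linear layer here)
        self.gate = nn.Linear(2 * c_out, 3)   # learned gates sigma* over the three terms

    def forward(self, grid_pt, positions, feats):
        """grid_pt: (3,); positions: (N, 3); feats: (N, c_in)."""
        rel = positions - grid_pt
        mask = rel.norm(dim=1) < self.radius           # Omega(r): points near the grid point
        rel, feats = rel[mask], feats[mask]
        q = self.q_pos(rel)                             # geometric term
        k = self.key(feats)                             # semantic term
        v = self.value(feats)
        g = torch.sigmoid(self.gate(torch.cat([q, k], dim=-1)))   # gates in (0, 1)
        score = (g[:, 0:1] * q + g[:, 1:2] * k + g[:, 2:3] * (q * k)).sum(dim=-1)
        w = torch.softmax(score, dim=0)                 # W(.): normalised weights
        return (w.unsqueeze(-1) * v).sum(dim=0)         # grid point feature F_g

# usage: GridPointAttention(32, 64)(torch.zeros(3), torch.rand(50, 3) - 0.5, torch.rand(50, 32))
```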
And S8, fusing the grid point characteristics and the global structure information to obtain enhanced characteristics.
Specifically, as shown in fig. 3, a perspective channel attention module is constructed, the grid point features F_g and the global structure information F_s are input into the perspective channel attention module, and the enhanced features F_e are output. Further, in the perspective channel attention module, the grid point features F_g and the global structure information F_s are first concatenated; the concatenated overall feature is fed into two parallel branches, each consisting of a maximum pooling layer, a linear layer and a ReLU function; the elements of the two branch outputs are multiplied element by element; the product is processed by a sigmoid function; and the concatenated feature is finally multiplied with the sigmoid output by matrix multiplication to obtain the enhanced features F_e.
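For illustration, a hedged PyTorch sketch of the perspective channel attention fusion is given below, following the description above (concatenation, two parallel max-pool + linear + ReLU branches, element-wise product, sigmoid, re-weighting); channel sizes are illustrative assumptions, the two inputs are assumed to be aligned to the same set of locations, and the final multiplication is implemented as channel-wise re-weighting.

```python
# Sketch of the perspective channel attention fusion of S8; channel sizes
# are assumptions and the final product is a channel-wise re-weighting.
import torch
import torch.nn as nn

class PerspectiveChannelAttention(nn.Module):
    def __init__(self, c_total):
        super().__init__()
        self.branch_a = nn.Sequential(nn.AdaptiveMaxPool1d(1), nn.Flatten(),
                                      nn.Linear(c_total, c_total), nn.ReLU())
        self.branch_b = nn.Sequential(nn.AdaptiveMaxPool1d(1), nn.Flatten(),
                                      nn.Linear(c_total, c_total), nn.ReLU())

    def forward(self, f_g, f_s):
        """f_g: (B, N, C_g) grid point features; f_s: (B, N, C_s) structure info."""
        f = torch.cat([f_g, f_s], dim=-1)          # connected overall feature
        x = f.transpose(1, 2)                       # (B, C, N) for channel pooling
        attn = torch.sigmoid(self.branch_a(x) * self.branch_b(x))   # (B, C)
        return f * attn.unsqueeze(1)                # enhanced feature F_e

# usage: PerspectiveChannelAttention(96)(torch.rand(2, 128, 64), torch.rand(2, 128, 32)).shape
```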
And S9, performing confidence prediction and candidate frame refinement on the enhanced features to obtain confidence and bounding box parameters.
Specifically, the enhanced features are sequentially input into a two-layer multilayer perceptron to extract feature vectors, the feature vectors are input into a first branch network to carry out confidence prediction, the feature vectors are input into a second branch network to carry out candidate frame refinement, the confidence and the bounding box parameters are obtained, and therefore the detection recognition result is obtained. Further, the bounding box parameter may determine a target range of detection recognition, and the confidence level is used to determine the accuracy of the bounding box parameter.
A confidence prediction is made first. Specifically, the confidence target is obtained by calculating the intersection over union (IoU) between each 3D prediction candidate region and its corresponding ground-truth box. For the kth 3D prediction candidate region, its confidence value y_k is normalized to lie in [0, 1] as follows:
y_k = min(1, max(0, 2·IoU_k − 0.5))   (6)
wherein IoU_k is the intersection over union between the kth 3D prediction candidate region and its corresponding ground-truth box.
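For illustration, the IoU-guided confidence target of equation (6) can be written as a one-line helper:

```python
# The IoU-guided confidence target of equation (6).
def confidence_target(iou_k: float) -> float:
    """Map the IoU of a 3D proposal with its ground-truth box into [0, 1]."""
    return min(1.0, max(0.0, 2.0 * iou_k - 0.5))

# confidence_target(0.25) == 0.0, confidence_target(0.5) == 0.5, confidence_target(0.8) == 1.0
```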
And then carrying out candidate frame refinement. Specifically, the refinement of the candidate frame is achieved by encoding the points of each 3D prediction candidate frame.
Further, after the confidence and bounding box parameters are obtained, the bounding box regression loss is optimized, where the regression residual is x = t_i − t_i*, t_i is the predicted coordinate vector, and t_i* is the ground-truth box coordinate vector.
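For illustration, a minimal regression loss over the residual x = t_i − t_i* is sketched below; a standard smooth-L1 form is assumed here because the exact loss expression is not recoverable from the text, so this is an illustrative stand-in rather than the formulation of the embodiment.

```python
# Minimal smooth-L1 regression loss over x = t_i - t_i*; the smooth-L1 form
# is an assumption, used as an illustrative stand-in.
import torch

def box_regression_loss(pred_t, gt_t, beta=1.0):
    """pred_t, gt_t: (..., D) encoded box coordinate vectors."""
    x = pred_t - gt_t                              # regression residual
    loss = torch.where(x.abs() < beta,
                       0.5 * x ** 2 / beta,        # quadratic near zero
                       x.abs() - 0.5 * beta)       # linear for large residuals
    return loss.sum(dim=-1).mean()

# usage: box_regression_loss(torch.rand(4, 7), torch.rand(4, 7))
```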
By optimizing the loss function, the agreement between the targets detected by the algorithm and the actual real targets can be measured as a recognition rate.
Further, the present embodiment verifies the recognition rate of the target detection through simulation.
1. Simulation conditions
The computer system adopted in this embodiment is a multi-GPU SCS4880 series supercomputing server. The CPU is an Intel Xeon E5-2630 V4 processor. Eight NVIDIA RTX 2080Ti 11GB graphics cards are integrated, and the hard disk consists of one system disk and two storage disks. The operating system is Ubuntu 16.04 and the runtime environment is .NET Framework 4.5, with support installed for related libraries such as CUDA 9.0, Python 3.5 and cuDNN 7.0.
2. Emulated content
The target detection recognition rate is calculated according to the detection and identification method described above; the calculated target detection recognition rate is 78.9%. This shows that the detection and identification method based on three-dimensional information enhancement of laser point cloud fully mines the information contained in the point cloud data, can achieve higher detection and identification precision, and is therefore advanced.
In conclusion, the detection and identification method of this embodiment completes the point cloud spatial shape, so that the extracted structural information contains more semantic representations and the feature representation of the point cloud data is enhanced. Compared with the feature extraction adopted by existing methods, it has stronger representation capability, solves the problem that existing methods struggle to extract point cloud data features, and improves the detection precision of laser point cloud targets. The method can be used in the field of unmanned driving, for example for obstacle identification and path planning.
Example two
On the basis of the first embodiment, please refer to fig. 5, and fig. 5 is a schematic structural diagram of a detection and identification apparatus for enhancing three-dimensional information based on laser point cloud according to an embodiment of the present invention. The detection and identification device comprises: a point cloud voxelization module, a feature map extraction module, a candidate frame generation module, a point cloud space shape completion module, a global structure information extraction module, a key point feature sampling module, a grid point feature extraction module, an enhanced feature fusion module and a confidence coefficient and bounding box parameter calculation module, wherein,
the point cloud voxelization module is used for voxelizing the original point cloud data to obtain a plurality of voxels. The characteristic map extraction module is connected with the point cloud voxelization module and used for extracting the point cloud voxel characteristics of each non-empty voxel in the voxels by utilizing a sparse convolution network to obtain a plurality of characteristic maps. The candidate frame generation module is connected with the feature map extraction module and used for inputting the feature maps into the regional suggestion network to generate a first-stage candidate frame. And the point cloud space shape complementing module is connected with the candidate frame generating module and used for inputting the candidate frame of the first stage into the point cloud space shape complementing network to complement the point cloud space shape to obtain a target point set. And the global structure information extraction module is connected with the point cloud space shape completion module and used for extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information. And the key point feature sampling module is connected with the point cloud voxelization module and the feature map extraction module and is used for sampling the non-empty voxel features near each key point from the original point cloud data and combining the non-empty voxel features as key point features. And the grid point feature extraction module is connected with the key point feature sampling module and is used for extracting the key point features by utilizing the feature extraction module combining the graph neural network and the attention mechanism to obtain the grid point features. The enhanced feature fusion module is connected with the global structure information extraction module and the grid point feature extraction module and is used for fusing the grid point features and the global structure information to obtain enhanced features. The confidence coefficient and boundary frame parameter calculation module is connected with the enhanced feature fusion module and is used for carrying out confidence coefficient prediction and candidate frame refinement on the enhanced features to obtain confidence coefficient and boundary frame parameters.
Please refer to embodiment one for a specific implementation method of each module, which is not described in detail in this embodiment.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A detection and identification method based on three-dimensional information enhancement of laser point cloud is characterized by comprising the following steps:
carrying out voxelization processing on the original point cloud data to obtain a plurality of voxels;
extracting point cloud voxel characteristics of each non-empty voxel in the voxels by using a sparse convolution network to obtain a plurality of characteristic maps;
inputting the feature maps into a regional suggestion network to generate a first-stage candidate box;
inputting the candidate frame of the first stage into a point cloud space shape complementing network to complement the point cloud space shape to obtain a target point set;
extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information;
sampling the non-empty voxel characteristic combination near each key point from the original point cloud data as key point characteristics;
extracting the key point features by using a feature extraction module combining a graph neural network and an attention mechanism to obtain grid point features;
fusing the grid point characteristics and the global structure information to obtain enhanced characteristics;
and carrying out confidence prediction and candidate frame refinement on the enhanced features to obtain confidence and boundary frame parameters.
2. The method for detecting and identifying the three-dimensional information enhancement based on the laser point cloud as claimed in claim 1, wherein the voxel processing is performed on the original point cloud data to obtain a plurality of voxels, and the method comprises the following steps:
dividing the original point cloud data into a three-dimensional grid according to a fixed resolution to obtain the plurality of voxels.
3. The laser point cloud-based three-dimensional information enhanced detection and identification method according to claim 1, wherein the extracting point cloud voxel features of each non-empty voxel in the voxels by using a sparse convolution network to obtain feature maps comprises:
constructing the sparse convolutional network;
inputting the average coordinates of the points within each non-empty voxel and the reflectivity in the original point cloud data into the sparse convolution network as initial features, and outputting the feature maps.
4. The method of claim 1, wherein inputting the first-stage candidate box into a point cloud space shape complementing network to complement a point cloud space shape to obtain a target point set comprises:
processing the point set inside the first-stage candidate box by using an ROI-aware pooling module to obtain a first matrix;
inputting the first matrix into a multilayer perceptron to obtain a first intermediate characteristic, and inputting the first intermediate characteristic into a maximum pooling layer to obtain a second intermediate characteristic;
combining the second intermediate feature with the first intermediate feature to obtain a third intermediate feature;
inputting the third intermediate features into a multilayer perceptron and a maximum pooling layer in sequence to obtain global features;
and generating the target point set by stacking the global features through full connection.
5. The method for detecting and identifying the three-dimensional information enhancement based on the laser point cloud as claimed in claim 1, wherein extracting the point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain the global structure information comprises:
selecting a plurality of points from the target point set to form a target set by using a furthest point sampling algorithm;
extracting the local context of each point in the target set by utilizing a multilayer perceptron and a maximum pooling layer to obtain a second matrix;
extracting structural information from the second matrix by using the multi-scale grouping strategy to obtain a target tensor;
and generating the global structure information by the target tensor through a full connection layer.
6. The method for detecting and identifying the three-dimensional information enhancement based on the laser point cloud of claim 1, wherein sampling the non-empty voxel features in the vicinity of each key point from the original point cloud data as key point features comprises:
sampling a plurality of key points from the original point cloud data by using a farthest point sampling algorithm;
calculating a non-empty voxel characteristic set of each key point in a target radius range with the key point as a circle center in a kth sparse convolution module of a sparse convolution network:
S_i^(k) = { [ f_j^(k) ; v_j^(k) − d_i ] : ‖ v_j^(k) − d_i ‖ < r_k, j = 1, …, N_k }
wherein v_j^(k) − d_i represents the relative coordinates with respect to the key point, d_i represents a key point, r_k represents the target radius, v_j^(k) represents the three-dimensional coordinates of the jth non-empty voxel in the kth sparse convolution module, f_j^(k) represents the corresponding feature output by the kth sparse convolution module, and N_k represents the number of non-empty voxels in the kth sparse convolution module;
generating output features of each key point at the kth sparse convolution module by using the non-empty voxel feature set:
f_i^(pv_k) = max{ G( M( S_i^(k) ) ) }
wherein M(·) represents a random sampling operation, and G(·) represents an operation performed by a multilayer perceptron;
and connecting the output characteristics of each key point in each sparse convolution module to obtain the key point characteristics.
7. The method for detecting and identifying three-dimensional information enhancement based on laser point cloud of claim 1, wherein the grid point features are as follows:
F_g = Σ_{i∈Ω(r)} W( σ*_q · Q_pos + σ*_k · K_i + σ*_{qk} · (Q_pos · K_i) ) · V_i
where Ω(r) represents all points within a fixed radius r of a grid point, W(·) represents the mapping of a graph edge into a scalar or vector weight space, σ* represents a gate function with learning, Q_pos represents the linear projection of the position difference between two nodes, K_i = Linear(f_i) represents the key mapping, and V_i represents the features of node i.
8. The method for detecting and identifying the three-dimensional information enhancement based on the laser point cloud of claim 1, wherein the step of fusing the grid point features and the global structure information to obtain enhanced features comprises the steps of:
and fusing the grid point characteristics and the global structure information by using a perspective channel attention module to obtain enhanced characteristics.
9. The method for detecting and identifying the three-dimensional information enhancement based on the laser point cloud of claim 1, wherein performing confidence prediction and candidate frame refinement on the enhancement features to obtain confidence and bounding box parameters comprises:
and sequentially inputting the enhanced features into a two-layer multilayer perceptron to extract feature vectors, inputting the feature vectors into a first branch network for confidence prediction, and inputting into a second branch network for candidate frame refinement to obtain the confidence and the boundary frame parameters.
10. The utility model provides a detection recognition device of three-dimensional information reinforcing based on laser point cloud which characterized in that includes:
the point cloud voxelization module is used for voxelizing the original point cloud data to obtain a plurality of voxels;
the characteristic map extraction module is used for extracting the point cloud voxel characteristics of each non-empty voxel in the voxels by utilizing a sparse convolution network to obtain a plurality of characteristic maps;
the candidate frame generation module is used for inputting the feature maps into the regional suggestion network to generate a first-stage candidate frame;
a point cloud space shape complementing module, configured to input the first-stage candidate box into a point cloud space shape complementing network to complement a point cloud space shape to obtain a target point set;
the global structure information extraction module is used for extracting point cloud structure information from the target point set by using a multi-scale grouping strategy to obtain global structure information;
a key point feature sampling module for sampling the non-empty voxel feature combination near each key point from the original point cloud data as key point features;
the grid point feature extraction module is used for extracting the key point features by utilizing a feature extraction module combining a graph neural network and an attention mechanism to obtain grid point features;
the enhanced feature fusion module is used for fusing the grid point features and the global structure information to obtain enhanced features;
and the confidence coefficient and bounding box parameter calculation module is used for carrying out confidence coefficient prediction and candidate box refinement on the enhanced features to obtain confidence coefficient and bounding box parameters.
CN202210289428.7A 2022-03-23 2022-03-23 Three-dimensional information enhanced detection and identification method and device based on laser point cloud Pending CN114821033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210289428.7A CN114821033A (en) 2022-03-23 2022-03-23 Three-dimensional information enhanced detection and identification method and device based on laser point cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210289428.7A CN114821033A (en) 2022-03-23 2022-03-23 Three-dimensional information enhanced detection and identification method and device based on laser point cloud

Publications (1)

Publication Number Publication Date
CN114821033A true CN114821033A (en) 2022-07-29

Family

ID=82530057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210289428.7A Pending CN114821033A (en) 2022-03-23 2022-03-23 Three-dimensional information enhanced detection and identification method and device based on laser point cloud

Country Status (1)

Country Link
CN (1) CN114821033A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874900A (en) * 2024-03-12 2024-04-12 中钜(陕西)工程咨询管理有限公司 House construction engineering supervision method based on BIM technology

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170374342A1 (en) * 2016-06-24 2017-12-28 Isee, Inc. Laser-enhanced visual simultaneous localization and mapping (slam) for mobile devices
CN110689008A (en) * 2019-09-17 2020-01-14 大连理工大学 Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN111444811A (en) * 2020-03-23 2020-07-24 复旦大学 Method for detecting three-dimensional point cloud target
CN113378854A (en) * 2021-06-11 2021-09-10 武汉大学 Point cloud target detection method integrating original point cloud and voxel division
CN113468994A (en) * 2021-06-21 2021-10-01 武汉理工大学 Three-dimensional target detection method based on weighted sampling and multi-resolution feature extraction
CN113920499A (en) * 2021-10-27 2022-01-11 江苏大学 Laser point cloud three-dimensional target detection model and method for complex traffic scene

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170374342A1 (en) * 2016-06-24 2017-12-28 Isee, Inc. Laser-enhanced visual simultaneous localization and mapping (slam) for mobile devices
CN110689008A (en) * 2019-09-17 2020-01-14 大连理工大学 Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN111444811A (en) * 2020-03-23 2020-07-24 复旦大学 Method for detecting three-dimensional point cloud target
CN113378854A (en) * 2021-06-11 2021-09-10 武汉大学 Point cloud target detection method integrating original point cloud and voxel division
CN113468994A (en) * 2021-06-21 2021-10-01 武汉理工大学 Three-dimensional target detection method based on weighted sampling and multi-resolution feature extraction
CN113920499A (en) * 2021-10-27 2022-01-11 江苏大学 Laser point cloud three-dimensional target detection model and method for complex traffic scene

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117874900A (en) * 2024-03-12 2024-04-12 中钜(陕西)工程咨询管理有限公司 House construction engineering supervision method based on BIM technology
CN117874900B (en) * 2024-03-12 2024-05-24 中钜(陕西)工程咨询管理有限公司 House construction engineering supervision method based on BIM technology

Similar Documents

Publication Publication Date Title
CN111489358B (en) Three-dimensional point cloud semantic segmentation method based on deep learning
CN112488210A (en) Three-dimensional point cloud automatic classification method based on graph convolution neural network
CN110930454A (en) Six-degree-of-freedom pose estimation algorithm based on boundary box outer key point positioning
WO2016130203A1 (en) Convolution matrix multiply with callback for deep tiling for deep convolutional neural networks
US20200003886A1 (en) Apparatus and method with ego motion estimation
US12056615B2 (en) Icospherical gauge convolutional neural network
Peyghambarzadeh et al. Point-PlaneNet: Plane kernel based convolutional neural network for point clouds analysis
Chen et al. 3D point cloud semantic segmentation toward large-scale unstructured agricultural scene classification
WO2023164933A1 (en) Building modeling method and related apparatus
CN111507222A (en) Three-dimensional object detection framework based on multi-source data knowledge migration
US20220277581A1 (en) Hand pose estimation method, device and storage medium
CN114612660A (en) Three-dimensional modeling method based on multi-feature fusion point cloud segmentation
CN116152800A (en) 3D dynamic multi-target detection method, system and storage medium based on cross-view feature fusion
CN114120045B (en) Target detection method and device based on multi-gate control hybrid expert model
CN116704504A (en) Radar panorama segmentation method based on decoupling dynamic convolution kernel
Zhong et al. Transformer-based models and hardware acceleration analysis in autonomous driving: A survey
CN114821033A (en) Three-dimensional information enhanced detection and identification method and device based on laser point cloud
Tong et al. Learning local contextual features for 3D point clouds semantic segmentation by attentive kernel convolution
CN115147564A (en) Three-dimensional model construction method, neural network training method and device
Ansari et al. Angle-based feature learning in GNN for 3D object detection using point cloud
Singh et al. Deep learning-based semantic segmentation of three-dimensional point cloud: a comprehensive review
Wang et al. A category-contrastive guided-graph convolutional network approach for the semantic segmentation of point clouds
CN117351198A (en) Point cloud semantic segmentation method based on dynamic convolution
CN114819053A (en) Average wave direction forecast deviation correction method based on space-time convolution LSTM
Bastås et al. Outdoor global pose estimation from RGB and 3D data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination