Disclosure of Invention
In view of the above analysis, the invention provides a rapid semantic map construction method for unstructured roads, which solves problems such as the low construction accuracy and low speed of existing high-precision maps for mining-area scenes.
The invention discloses a rapid semantic map construction method for unstructured roads, which comprises the following specific steps:
step 1, constructing a semantic segmentation model;
step 11, obtaining pictures and point cloud data at the same moment;
step 12, rasterizing the point cloud data into a cylinder to obtain point cloud grid units and the cylinder features of the points, and obtaining a feature tensor for each point cloud grid unit based on the cylinder features of the points and an MLP (multi-layer perceptron);
step 13, downsampling the points of the point cloud grid unit;
step 131, inputting tensors of the characteristics of each point cloud grid unit into an asymmetric residual block; outputting tensors of the sparse convolution characteristics;
step 132, multi-modal focus convolution downsampling:
carrying out sub-manifold dilated convolution on each cylindrical grid to obtain the output shape of each cylindrical grid position;
projecting RGB pixel point information of the picture onto point cloud data in each point cloud grid unit, and assigning RGB pixel point information to each point in each point cloud grid unit to obtain an RGB point cloud grid unit;
sparse convolution feature extraction is carried out on the points of the RGB point cloud grid units, importance probability calculation is carried out through a sigmoid function, and the tertiary importance map I_k^p at position k in the convolution kernel and the center I_0^p of the tertiary importance map are obtained;
an important grid is selected based on the tertiary importance map I_k^p at position k in the convolution kernel and the center I_0^p of the tertiary importance map;
a dynamic convolution output position p_out of the convolution kernel is generated based on the output shape of each cylindrical grid position and the position of the convolution kernel of the important grid;
multi-modal focus convolution is performed at the dynamic convolution output position p_out of the convolution kernel to obtain a tensor of downsampled output features;
step 14, up-sampling the points of the point cloud grid unit;
step 141, multi-modal focus convolution upsampling:
inputting the tensor of the downsampled output features, and performing multi-modal focus deconvolution based on the RGB point cloud grid unit to obtain a tensor of deconvolved features;
concatenating the tensor of the downsampled output features with the tensor of the deconvolved features to obtain a tensor of concatenated features; inputting the tensor of concatenated features into an asymmetric residual block to obtain the upsampled output grid features;
step 142, up-sampling the obtained up-sampling output grid characteristics to obtain tensors of down-sampling and up-sampling superposition characteristics of each point;
step 15, refining tensors of the downsampled and upsampled superposition characteristics of each point by using the MLP point characteristics of each point obtained in the step 12; acquiring the category probability of each refined point by using a loss function, and taking the highest category probability of each point as the point label of the point;
step 2, constructing a real-time semantic map;
acquiring real-time GPS-IMU data and real-time lidar point cloud data of the unmanned vehicle;
using the semantic segmentation model constructed in step 1 to obtain the label of each point in each frame of the acquired real-time point cloud data, forming key frame semantic point clouds;
and superposing all the key frame semantic point clouds to obtain a semantic map, and downsampling the semantic map to generate a semantic RGB map.
Optionally, the point tags include obstacle tags, ground tags, and retaining wall tags.
Optionally, the method further comprises a step 3 of correcting the semantic RGB map, and specifically comprises the following steps:
step 31, searching, through a Kd-tree, the labels of all points within radius R around each point classified with the obstacle label in the semantic RGB map, and resetting those points to the obstacle label, except points classified with the ground label; then, for each point reset to the obstacle label, searching the N original labels of its surrounding points through KNN, and taking the most frequent original label category as the updated label of that point; forming all the points after label updating into an obstacle semantic map;
step 32, filtering dynamic obstacles to obtain an output semantic map;
inputting a current frame obstacle semantic map and a previous frame obstacle semantic map;
if the labels of the pixel points at the same position in two adjacent frames of obstacle semantic maps differ, it is assumed that a dynamic obstacle exists in those two frames of obstacle semantic maps;
a penalty term odds(p_penalty) is added to the stability term of the recursive Bayes filter to obtain the stability probability l_s^(t) of the dynamic obstacle at the current moment:

l_s^(t) = l_s^(t-1) + odds(p_stable · exp(-α²/σ_α²) · exp(-d²/σ_d²)) - odds(p_prior) - odds(p_penalty)

wherein l_s^(t-1) is the stability probability of the dynamic obstacle at the previous moment; p_stable and p_prior are respectively the known stable map surfel probability and the known map surfel prior probability; p_penalty is the penalty term for point cloud noise; α is the angle between the normal of the range image and the normal of the depth image, and d is the difference between the obstacle semantic map and the range image; σ_α² is the noise of α; σ_d² is the noise of d;
according to the stability probability l_s^(t) of the dynamic obstacle at the current moment, whether a dynamic obstacle exists in the current frame obstacle semantic map is judged, with l_τ being the set stability threshold: if l_s^(t) ≥ l_τ, no dynamic obstacle exists, and the obstacle semantic maps of the frames without dynamic obstacles are used to construct the output semantic map.
Optionally, when the real-time semantic map is constructed, the calibrated GNSS/IMU integrated navigation positioning system and laser radar of the unmanned vehicle are used to acquire real-time GPS-IMU data and real-time lidar point cloud data of the unmanned vehicle.
Optionally, the calibration method of the GNSS/IMU integrated navigation positioning system and the laser radar of the unmanned vehicle comprises the following steps:
acquiring an initial rotation-translation matrix between the GNSS/IMU integrated navigation positioning system and the laser radar of the unmanned vehicle; the rotation-translation matrix consists of the initial deviations in the XYZ directions and the initial attitude angles between the laser radar coordinate system and the GPS vehicle-body coordinate system, wherein the attitude angles comprise the yaw angle, roll angle and pitch angle;
the laser radar is arranged at the front and/or rear of the automatic driving vehicle, and the GNSS/IMU integrated navigation positioning system is arranged on the vehicle body; the deviations in the XYZ directions between the laser radar position and the position of the vehicle body center are measured; a checkerboard calibration plate is placed in front of the center line in the advancing direction of the vehicle body, perpendicular to the center line; the point features of the corner points of the calibration plate are extracted, the plane of the calibration plate is fitted from the extracted corner-point features, the measured attitude angles between the GPS vehicle-body coordinate system and the laser radar coordinate system are acquired, and the rotation-translation conversion relation between the GPS vehicle-body coordinate system and the laser radar coordinate system of the unmanned vehicle is determined from the measured XYZ deviations and the measured attitude angles;
and calibrating an initial rotation translation matrix of the GNSS/IMU integrated navigation positioning system and the laser radar of the unmanned vehicle based on the rotation translation conversion relation.
Optionally, the edge features and plane features of each input frame of real-time lidar point cloud data are extracted; the real-time position and pose information of the current unmanned vehicle is acquired from the real-time GPS-IMU data, and the semantic point clouds of all key frames are converted into the world coordinate system using the initial rotation-translation matrix; if the NDT matching score between two frames of semantic point clouds is smaller than a threshold, the key frame semantic point clouds are matched using the edge features and plane features, and the parameters of the initial rotation-translation matrix are optimized.
Compared with the prior art, the invention has at least one of the following beneficial effects:
(1) The method extracts the original data features by combining cylindrical grid sampling with point-wise features, overcoming the sparsity and density non-uniformity of the point cloud and balancing the precision and speed of the feature extraction network;
(2) The method inputs the three-dimensional cylinder features ρ, θ and z' into a backbone network for grid point cloud feature extraction. Through four downsamplings and three upsamplings, with an RGB branch added, the features of each downsampling and the corresponding upsampling features are fused through a semantic understanding module based on multidimensional decomposition. During sampling, a multi-modal focus convolution based on RGB information is used: point cloud importance analysis is performed around each point cloud grid using the RGB color information output by the RGB branch, so as to determine the shape of the multi-modal focus convolution. Finally, point feature extraction is performed again on the output features using an MLP, the result is fused with the initial point features to refine the points in the grid, and the accuracy of point cloud identification is improved;
(3) The method processes abnormal points with a boundary erosion algorithm based on semantic segmentation: single-frame point cloud data is classified by the model obtained from the semantic segmentation algorithm, boundary erosion is applied to the classified abnormal points, and the semantic segmentation result is optimized.
Detailed Description
The invention will now be described in detail with reference to the drawings and to specific embodiments thereof.
As shown in figures 1-6, a rapid semantic map construction method for unstructured roads is disclosed, which comprises the following specific steps:
step 1, constructing a semantic segmentation model;
step 11, making the output frequencies of the picture and the point cloud data consistent, and obtaining the picture and the point cloud data at the same moment, thereby synchronizing the picture with the point cloud data in time;
preferably, the picture is obtained by a camera and the point cloud data is obtained by a lidar.
Step 12, rasterizing the point cloud data into a cylinder to obtain point cloud grid units and the cylinder features of the points, and obtaining a feature tensor for each point cloud grid unit based on the cylinder features of the points and an MLP (multi-layer perceptron);
as shown in fig. 2, the point cloud data is divided according to a cylindrical grid in the Cartesian coordinate system to obtain a cylindrical grid, and the points in the cylindrical grid point cloud are converted from Cartesian coordinates to cylindrical coordinates;
optionally, the point-wise features of the point cloud in the cylindrical grid are extracted through an MLP (multi-layer perceptron) following the point-wise target detection approach, with the specific steps as follows:
the point cloud data is divided from the Cartesian coordinate system according to a cylindrical grid to obtain point cloud grid units, and the cylinder features of the points in each point cloud grid unit are acquired, with the expressions:

ρ = √(x² + y²); θ = arctan(y / x); z' = z

wherein x is the X-axis coordinate of the point in the Cartesian coordinate system; y is the Y-axis coordinate; z is the Z-axis coordinate; ρ is the horizontal radius from the point to the Z-axis; θ is the azimuth angle of the point in the X-Y plane; z' is the height difference between the point and the origin of the Cartesian coordinate system; all points are thus mapped into the cylindrical grid.
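The coordinate conversion in the expressions above can be sketched as follows (a minimal NumPy version; `arctan2` is used instead of `arctan(y/x)` so the azimuth is well defined in all quadrants):

```python
import numpy as np

def to_cylinder_features(points):
    """Convert N x 3 Cartesian points (x, y, z) into the cylinder
    features (rho, theta, z'): radius to the Z-axis, azimuth in the
    X-Y plane, and height (z' = z)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    return np.stack([rho, theta, z], axis=1)
```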
Meanwhile, the cylinder features ρ, θ and z' of all points are input into an MLP (multi-layer perceptron) to perform point-based MLP feature extraction on the point cloud data, raising the 3-dimensional point features of the point cloud to C_in dimensions while maintaining the point features; the points in each point cloud grid unit are max-pooled through a max pooling layer to obtain the C_in-dimensional MLP point features, and a feature tensor of size C_in × H × W × L is output, wherein H, W and L are respectively the radial, azimuthal and height resolution of the point cloud grid units, and C_in is the feature dimension of a point;
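The per-cell max pooling described above can be sketched as follows (cell indices and shapes are illustrative; a real implementation would use a GPU scatter-max rather than this Python loop):

```python
import numpy as np

def grid_max_pool(cell_ids, point_feats, num_cells):
    """Max-pool C_in-dimensional point features into their point cloud
    grid units; empty cells are left as zero features.
    cell_ids: (N,) flat cell index per point; point_feats: (N, C_in)."""
    pooled = np.full((num_cells, point_feats.shape[1]), -np.inf)
    for cid, feat in zip(cell_ids, point_feats):
        pooled[cid] = np.maximum(pooled[cid], feat)
    pooled[np.isinf(pooled)] = 0.0  # empty cells -> zero feature
    return pooled
```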
step 13, downsampling the points of the point cloud grid unit;
step 131, the feature tensor C_in × H × W × L is input into an asymmetric residual block, where it is processed by a horizontal three-dimensional convolution and a vertical three-dimensional convolution; the horizontal three-dimensional convolution sequentially performs 3×1×3 and 1×3×3 convolution kernel feature extraction; the vertical three-dimensional convolution sequentially performs 1×3×3 and 3×1×3 convolution kernel feature extraction; meanwhile, the input dimension C_in is increased, and a sparse convolution feature tensor of size 2C_in × H × W × L is output; every point in every point cloud grid unit is traversed.
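The saving from the asymmetric kernel factorization can be checked with a quick weight count (channel count chosen for illustration only):

```python
def conv3d_weights(kernel, c_in, c_out):
    """Number of weights in a 3D convolution with the given kernel shape."""
    kd, kh, kw = kernel
    return kd * kh * kw * c_in * c_out

c = 32
full = conv3d_weights((3, 3, 3), c, c)  # one full 3x3x3 kernel
asym = conv3d_weights((3, 1, 3), c, c) + conv3d_weights((1, 3, 3), c, c)
ratio = asym / full  # the factorized pair needs 18/27 = 2/3 of the weights
```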
Asymmetric residual blocks are adopted to extract the features of the points in the cylindrical grid, which enhances the expressive power of the points in the grid and thus the robustness of the network; at the same time, compared with traditional 3D convolution, extracting features through asymmetric residuals reduces the computational cost.
Step 132, multi-modal focus convolution downsampling:
a sub-manifold dilated convolution is performed on each cylindrical grid, that is, when the center point position p of a convolution kernel coincides with the input point position p_in of a non-empty cylindrical grid, a convolution operation is performed at the center point of that convolution kernel; the sub-manifold dilated convolution Y_p of each cylindrical grid is expressed as:

Y_p = Σ_{k∈K_d} w_k · x_{p+k}

wherein x_p is the input sparse convolution feature at the center point position p of the convolution kernel, x_p = (2C_in, H_p, W_p, L_p); w_k is the weight at position k in the convolution kernel; d is the spatial dimension, preferably 3; K_d is the sub-manifold dilated convolution kernel, preferably K = 3, K_d = 3³.
The sub-manifold dilated convolution is performed at each cylindrical grid position, and the output shape of each cylindrical grid position is:

P(p, K_d) = { p + k | k ∈ K_d }
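A toy two-dimensional sketch of the sub-manifold rule (outputs are computed only at non-empty input sites, so the sparsity pattern is preserved; the dict-based representation is for clarity, not speed):

```python
def submanifold_conv(feats, weights):
    """feats: {(i, j): feature}; weights: {(di, dj): w}. Output is computed
    only where the kernel centre lands on a non-empty site, and absent
    neighbours contribute nothing."""
    out = {}
    for p in feats:
        acc = 0.0
        for k, w in weights.items():
            q = (p[0] + k[0], p[1] + k[1])
            if q in feats:
                acc += w * feats[q]
        out[p] = acc
    return out
```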
Selecting importance of adjacent points:
converting the picture into the coordinate system of the point cloud data through a transformation matrix, projecting the RGB pixel information of the picture onto the point cloud data in each point cloud grid unit, and assigning the RGB pixel information to each point in each point cloud grid unit to obtain an RGB point cloud grid unit, thereby spatially synchronizing the image data and the point cloud data;
importance map I^p: sparse convolution feature extraction is carried out on the points of the RGB point cloud grid units, and importance probability calculation is performed through a sigmoid function, obtaining the tertiary importance map I_k^p at position k in the convolution kernel and the center I_0^p of the tertiary importance map; wherein the sparse convolution kernel weights are the same as the sub-manifold dilated convolution kernel K_d.
The importance map I^p combines the input sparse convolution feature tensor 2C_in × H × W × L at the convolution kernel center position p with the RGB image information, thereby obtaining the importance of the surrounding candidate grid output features; in this way the dilated convolution can balance regular sparse convolution and sub-manifold sparse convolution, reducing the amount of computation while ensuring that connectivity information is not lost.
Selecting an important grid:
when the center I_0^p of the tertiary importance map output by an RGB point cloud grid unit is greater than or equal to the importance threshold τ, that RGB point cloud grid unit is an important grid, and the position of the convolution kernel of the important grid is obtained:

P_im = { p | I_0^p ≥ τ }

wherein P_im is the position of the convolution kernel of the important grid, from which the output shape of the convolution kernel of the important grid and its convolution kernel weights are generated.
Grid importance is divided into important, relatively important and unimportant grids: an unimportant grid contains only point cloud data in its point cloud grid unit; a relatively important grid contains point cloud data whose extracted image RGB features correspond to background point cloud; an important grid contains point cloud data whose extracted RGB features correspond to target point cloud.
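The importance-probability step can be sketched as follows (the threshold τ and the sparse-convolution logits are illustrative values):

```python
import numpy as np

def select_important(center_logits, tau=0.5):
    """Sigmoid turns the sparse-convolution output into an importance
    probability; a grid is 'important' when its centre importance
    I_0^p meets the threshold tau."""
    prob = 1.0 / (1.0 + np.exp(-center_logits))
    return prob, prob >= tau
```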
The selected important grid generates the dynamic output shape P'_out of the convolution kernel, which is determined by the tertiary importance map I_k^p:

P'_out = { p + k | k ∈ K_d, I_k^p ≥ τ }

Finally, the dynamic convolution output position p_out of the convolution kernel is generated based on the output shape P(p, K_d) of each cylindrical grid position and the position P_im of the convolution kernel of the important grid.
The dynamic convolution output position p_out of the convolution kernel is convolved with stride 2 to obtain an output feature tensor of size 2C_in × H/2 × W/2 × L/2 at p_out; four downsampling operations are performed, sequentially obtaining the first downsampled output feature tensor 2C_in × H/2 × W/2 × L/2, the second downsampled output feature tensor 4C_in × H/4 × W/4 × L/4, the third downsampled output feature tensor 8C_in × H/8 × W/8 × L/8, and the fourth downsampled output feature tensor 16C_in × H/16 × W/16 × L/16.
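The channel/resolution progression of the four downsampling stages can be reproduced directly (the initial grid resolution below is an assumed example):

```python
def downsample_shapes(c_in, h, w, l, stages=4):
    """Each stride-2 stage doubles the channels and halves each
    spatial dimension of the C x H x W x L feature tensor."""
    return [(c_in * 2 ** s, h // 2 ** s, w // 2 ** s, l // 2 ** s)
            for s in range(1, stages + 1)]
```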
Step 14, up-sampling the points of the point cloud grid unit;
step 141, multi-modal focus convolution upsampling:
the fourth downsampled output feature tensor 16C_in × H/16 × W/16 × L/16 is input, and multi-modal focus convolution upsampling is performed; a sub-manifold dilated convolution is performed as in step 132, and the final deconvolution shape P'_out is obtained based on the tertiary importance maps obtained from the RGB point cloud grid units; based on the deconvolution shape P'_out, a deconvolution with stride 2 is performed, outputting a deconvolved feature tensor 8C_in' × H/8 × W/8 × L/8;
step 142, the third downsampled output feature tensor 8C_in × H/8 × W/8 × L/8 and the deconvolved feature tensor 8C_in' × H/8 × W/8 × L/8 are concatenated to obtain a concatenated feature tensor 16C_in × H/8 × W/8 × L/8; the concatenated feature tensor is input into an asymmetric residual block, where it is processed by a horizontal three-dimensional convolution and a vertical three-dimensional convolution; the horizontal three-dimensional convolution sequentially performs 3×1×3 and 1×3×3 convolution kernel feature extraction; the vertical three-dimensional convolution sequentially performs 1×3×3 and 3×1×3 convolution kernel feature extraction; meanwhile, the input dimension is reduced, and an upsampled feature tensor 8C_in × H/8 × W/8 × L/8 is output, giving the upsampled output grid features;
the upsampled output grid features are upsampled three times in total to obtain the downsampling-upsampling superposition feature tensor C_in × H × W × L of each point.
By stacking the downsampling and upsampling blocks, an asymmetric three-dimensional multi-modal focus convolution network is established. The invention adopts asymmetric residual blocks to strengthen the horizontal and vertical kernels, thereby enhancing the robustness to the cylindrical grid structure.
Step 15, the downsampling-upsampling superposition features of each point are refined by using the MLP point features of each point obtained in step 12; the category probability of each refined point is acquired with a loss function, and the highest category probability of each point is taken as that point's label; the point-wise refinement module reclassifies the point labels, alleviating the information loss caused by label assignment errors when points are divided into cylindrical grids.
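The point-wise refinement can be sketched as fusing the two feature sources and taking an arg-max over class scores (the linear head `w` below stands in for the refinement MLP and is an assumption):

```python
import numpy as np

def refine_point_labels(voxel_feats, point_feats, w):
    """voxel_feats: (N, C) down/up-sampling superposition features per point;
    point_feats: (N, C) initial MLP point features; w: (2C, num_classes).
    The highest class score of each point becomes its point label."""
    fused = np.concatenate([voxel_feats, point_feats], axis=1)  # (N, 2C)
    return (fused @ w).argmax(axis=1)
```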
Specifically, the spot labels include obstacle labels, ground labels, and retaining wall labels.
Step 2, calibrating a GNSS/IMU integrated navigation positioning system and a laser radar of the unmanned vehicle;
acquiring an initial rotation-translation matrix between the GNSS/IMU integrated navigation positioning system and the laser radar of the unmanned vehicle; the rotation-translation matrix consists of the initial deviations in the XYZ directions and the initial attitude angles between the laser radar coordinate system and the GPS vehicle-body coordinate system, wherein the attitude angles comprise the yaw angle, roll angle and pitch angle;
the laser radar is arranged at the front and/or rear of the automatic driving vehicle, and the GNSS/IMU integrated navigation positioning system is arranged on the vehicle body; the deviations in the XYZ directions between the laser radar position and the position of the vehicle body center are measured; a checkerboard calibration plate is placed 10 meters in front of the center line in the advancing direction of the vehicle body, perpendicular to the center line; the point features of the corner points of the checkerboard calibration plate are extracted, the plane of the checkerboard is fitted from the extracted corner-point features, and the measured attitude angles between the GPS vehicle-body coordinate system and the laser radar coordinate system are acquired; the rotation-translation conversion relation between the GPS vehicle-body coordinate system and the laser radar coordinate system of the unmanned vehicle is then determined from the measured XYZ deviations and the measured attitude angles;
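The plane fit to the extracted checkerboard corner points can be sketched via SVD (the fitted plane normal feeds the attitude-angle computation; the corner coordinates in the test are illustrative):

```python
import numpy as np

def fit_plane_normal(corners):
    """Least-squares plane through N x 3 corner points: the right singular
    vector belonging to the smallest singular value of the centred points
    is the unit normal of the fitted plane."""
    centred = corners - corners.mean(axis=0)
    _, _, vt = np.linalg.svd(centred)
    n = vt[-1]
    return n / np.linalg.norm(n)
```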
and unifying a coordinate system of the GNSS/IMU integrated navigation positioning system of the unmanned vehicle and a laser radar based on the rotation translation conversion relation obtained through calibration.
Step 3, constructing a real-time semantic map
The calibrated GNSS/IMU integrated navigation positioning system and laser radar of the unmanned vehicle are used to respectively acquire real-time GPS-IMU data and real-time lidar point cloud data of the unmanned vehicle;
the edge features and plane features of each input frame of real-time lidar point cloud data are extracted according to roughness, and the edge features and plane features of each frame of real-time lidar point cloud data are saved.
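The roughness measure used to split edge and plane features can be sketched in a LOAM-style form (the window size is an assumption; large values mark edge features, small values plane features):

```python
import numpy as np

def scan_roughness(ranges, half_window=5):
    """Roughness of each point along one scan line: squared difference
    between the windowed range sum and the centre range scaled by the
    window length; boundary points are left at zero."""
    n = len(ranges)
    c = np.zeros(n)
    for i in range(half_window, n - half_window):
        window_sum = ranges[i - half_window:i + half_window + 1].sum()
        c[i] = (window_sum - (2 * half_window + 1) * ranges[i]) ** 2
    return c
```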
Optionally, feature extraction is performed on each frame of real-time lidar point cloud data using a radar odometer.
The semantic segmentation model constructed in step 1 is used to obtain the label of each point in each frame of the acquired real-time point cloud data, forming key frame semantic point clouds; the data between key frames is discarded, and estimation is performed using only the key frame semantic point clouds;
the real-time position and pose information of the current unmanned vehicle is acquired from the real-time GPS-IMU data, and the semantic point clouds of all key frames are converted into the world coordinate system using the initial rotation-translation matrix; the NDT feature matching effect is used to decide whether to optimize the parameters of the initial rotation-translation matrix: if the NDT matching score between two frames of semantic point clouds is smaller than a threshold, the key frame semantic point clouds are matched using the edge features and plane features, thereby optimizing the parameters of the initial rotation-translation matrix.
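Converting a key-frame semantic point cloud into the world coordinate system with the (possibly NDT-refined) rotation-translation can be sketched as:

```python
import numpy as np

def to_world(points, rotation, translation):
    """Apply a rotation matrix R (3x3) and translation t (3,) to an N x 3
    key-frame point cloud, mapping it into the world frame: p' = R p + t."""
    return points @ rotation.T + translation
```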
And superposing all the key frame semantic point clouds to obtain a semantic map, and downsampling the semantic map to generate a semantic RGB map.
Step 4, correcting the semantic RGB map:
after the semantic RGB map is obtained by the above method, in case of false detection of dynamic obstacles or a low mIoU, the dynamic obstacles in the optimized single-frame semantic RGB map are restored using a boundary erosion algorithm, with the following specific steps:
step 41, optimizing obstacle labels in a semantic RGB map:
the labels of all points within radius R around each point classified with the obstacle label in the semantic RGB map are searched through a Kd-tree, and those points are reset to the obstacle label, except points classified with the ground label; after this resetting, the N original labels of the points surrounding each reset point are searched through KNN, and the most frequent original label category is taken as the new label of that point; all points with updated labels are formed into an obstacle semantic map.
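The boundary-erosion relabeling can be sketched as below. This is a brute-force stand-in: a Kd-tree (e.g. `scipy.spatial.cKDTree`) would replace the explicit distance computations in practice, and the label codes, radius and k are illustrative:

```python
import numpy as np
from collections import Counter

def erode_obstacle_boundary(xyz, labels, obstacle, ground, radius, k):
    """Points within `radius` of an obstacle point (ground points excepted)
    are relabeled by the majority original label of their k nearest
    neighbours, combining the reset-then-KNN-vote steps into one pass."""
    orig = labels.copy()
    new = labels.copy()
    obs_idx = np.where(orig == obstacle)[0]
    for i in range(len(xyz)):
        if orig[i] in (obstacle, ground):
            continue
        d_obs = np.linalg.norm(xyz[obs_idx] - xyz[i], axis=1)
        if len(d_obs) and d_obs.min() <= radius:
            # k nearest neighbours, excluding the point itself
            nn = np.argsort(np.linalg.norm(xyz - xyz[i], axis=1))[1:k + 1]
            new[i] = Counter(orig[nn]).most_common(1)[0][0]
    return new
```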
Step 42, filtering dynamic obstacles to obtain the output semantic map:
Inputting a current frame obstacle semantic map and a previous frame obstacle semantic map;
if the labels of the pixel points at the same position in two adjacent frames of obstacle semantic maps differ, it is assumed that a dynamic obstacle exists in those two frames of obstacle semantic maps;
the probability that a dynamic obstacle exists in the assumed two adjacent frames of obstacle semantic maps is judged by a recursive Bayes filter; the specific judging method is as follows:
a penalty term odds(p_penalty) is added to obtain the stability probability l_s^(t) of the dynamic obstacle at the current moment:

l_s^(t) = l_s^(t-1) + odds(p_stable · exp(-α²/σ_α²) · exp(-d²/σ_d²)) - odds(p_prior) - odds(p_penalty)

wherein l_s^(t-1) is the stability probability of the dynamic obstacle at the previous moment; p_stable and p_prior are respectively the known stable map surfel probability and the known map surfel prior probability; p_penalty is the penalty term for point cloud noise; α is the angle between the normal of the range image and the normal of the depth image, and d is the difference between the obstacle semantic map and the range image; σ_α² is the noise of α; σ_d² is the noise of d.
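One step of the stability update can be sketched in log-odds form. The exact form and sign of the penalty term are a reconstruction following surfel-map stability filters, and all parameter values below are illustrative assumptions:

```python
import math

def odds(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def stability_update(l_prev, alpha, d, p_stable=0.6, p_prior=0.5,
                     p_penalty=0.2, sigma_a=1.0, sigma_d=1.0):
    """Recursive-Bayes stability step: the measurement confidence decays
    with the normal angle alpha and the map/range-image difference d,
    and a point-cloud-noise penalty term odds(p_penalty) is applied."""
    meas = p_stable * math.exp(-alpha ** 2 / sigma_a ** 2) \
                    * math.exp(-d ** 2 / sigma_d ** 2)
    return l_prev + odds(meas) - odds(p_prior) - odds(p_penalty)
```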
According to the stability probability l_s^(t) of the dynamic obstacle at the current moment, whether a dynamic obstacle exists in the current frame obstacle semantic map is judged, with l_τ being the set stability threshold: if l_s^(t) < l_τ, a dynamic obstacle exists, and the corresponding obstacle points are deleted; if l_s^(t) ≥ l_τ, no dynamic obstacle exists, and the obstacle semantic maps of the frames without dynamic obstacles are used to construct the output semantic map.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention.