CN114638794A - Crack detection and three-dimensional positioning method based on deep learning and SLAM technology - Google Patents

Crack detection and three-dimensional positioning method based on deep learning and SLAM technology Download PDF

Info

Publication number
CN114638794A
Authority
CN
China
Prior art keywords
crack
crack detection
model
dimensional
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210214242.5A
Other languages
Chinese (zh)
Inventor
周静
宋先义
郭玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202210214242.5A priority Critical patent/CN114638794A/en
Publication of CN114638794A publication Critical patent/CN114638794A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N 3/045 Neural network architectures; combinations of networks
    • G06N 3/084 Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06T 7/0002 Image analysis; inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G06T 7/194 Segmentation; edge detection involving foreground-background segmentation
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/10024 Image acquisition modality: color image
    • G06T 2207/10028 Image acquisition modality: range image; depth image; 3D point clouds
    • G06T 2207/20081 Special algorithmic details: training; learning
    • G06T 2207/20221 Image combination: image fusion; image merging
    • G06T 2207/30204 Subject of image: marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a crack detection and three-dimensional positioning method based on deep learning and SLAM technology, which comprises the following steps: creating a data set, constructing a crack detection network model, and training and saving the optimal model; fusing the data acquired by an inertial measurement unit (IMU) and a KinectV2 camera in a tightly coupled nonlinear optimization manner, and completing pose estimation in the tracking thread of the visual SLAM; judging each image frame acquired in real time in the tracking thread, screening out key frames, performing semantic segmentation on the key frames with the crack detection model, and extracting crack information; and integrating the crack information and the depth information acquired by the KinectV2 into the visual-inertial SLAM framework, completing the construction of a dense point cloud map containing the crack information, and realizing three-dimensional positioning of the cracks. The method detects cracks in real time and locates them in the three-dimensional environment, with high detection accuracy and strong robustness.

Description

Crack detection and three-dimensional positioning method based on deep learning and SLAM technology
Technical Field
The invention relates to a pavement crack detection and positioning method, in particular to a crack detection and three-dimensional positioning method based on deep learning and SLAM technology.
Background
Pavement cracks are the most common road defects, so detecting them is particularly important for road maintenance. At present, traditional pavement inspection is done manually, which consumes a great deal of time and labor and suffers from low efficiency and poor safety, so an automatic pavement crack detection method needs to be studied. Most common automatic detection algorithms are based on deep learning networks. A deep-learning-based crack detection algorithm can improve the efficiency of pavement maintenance and, to a certain extent, of related work such as the maintenance of bridges, tunnels, dams and other infrastructure; it reduces inspection and maintenance costs and automates crack defect inspection, which has great application value and significance.
The invention patent 201911371906.3 discloses a crack detection method and system based on image processing, which connects cracks with a pixel-tracking algorithm and analyzes the shape features of the connected regions to screen out regions that do not meet the requirements. The resulting model requires manual feature selection, is strongly affected by environmental factors, and generalizes poorly.
The invention patent 201911355501.0 discloses a crack detection network based on the Faster R-CNN network, which can classify cracks and background and output crack bounding boxes. However, its image preprocessing ignores the interference of road noise and varying illumination intensity, so the final crack detection accuracy is low and the method is strongly affected by interfering objects.
The invention patent 202010247786.2 discloses a rapid pavement crack identification method based on deep learning, which maps the road image to be detected into a coding space with a feature extraction network to obtain feature images of different sizes; the feature image of each size is fed into a crack detection network for progressive feature fusion, preset convolution operations and attention enhancement, followed by further preset convolution and upsampling, yielding a feature image of the same size as the road image with the crack areas marked. However, the detected cracks are not subsequently processed on the basis of their features, nor quantified.
The invention patent 202010236401.2 discloses a crack detection model based on YOLOv5, which outputs crack monitoring information including the prediction box, detection category and category confidence, and sets the loss function of the model as the weighted sum of a classification loss, an object loss, a regression loss and an angle loss, improving the anti-interference performance and accuracy of crack detection; however, the network cannot obtain the specific position information of the cracks.
In summary, deep-learning-based networks can improve crack detection efficiency, but many crack detection networks place high demands on the size of the training set and on image processing, and their models are complex; moreover, they only obtain the position of a crack in the planar image, not its three-dimensional position in the environment, which is unfavorable for subsequent repair work.
Disclosure of Invention
The invention aims to provide a crack detection and three-dimensional positioning method based on deep learning and SLAM technology, which solves the problems that many crack detection networks have low accuracy and efficiency and cannot simultaneously complete three-dimensional positioning of cracks, and accomplishes real-time pavement crack detection by a mobile robot in its environment.
The technical scheme for realizing the invention is as follows: a crack detection and three-dimensional positioning method based on deep learning and SLAM technology comprises the following steps:
S1, creating a data set for the crack detection network, constructing a crack detection network model, and training and saving the optimal model;
S2, fusing the data acquired by an inertial measurement unit (IMU) and the KinectV2 in a tightly coupled nonlinear optimization manner, and completing pose estimation in the tracking thread of the visual SLAM framework;
S3, judging each image frame acquired in real time in the tracking thread, screening out key frames, performing semantic segmentation on the key frames with the crack detection model, and extracting crack information;
S4, integrating the two-dimensional crack detection results obtained in step S3 and the depth information acquired by the KinectV2 into the visual-inertial SLAM framework, completing construction of the dense point cloud map containing crack information, and realizing three-dimensional positioning of cracks.
Further, the specific operation of step S1 is as follows:
(1) collecting pavement crack images, and preprocessing and marking the pavement crack images;
(2) Based on the construction pattern of a fully convolutional neural network, the coding part consists of four identical coding blocks; each coding block consists of two 3×3 convolution layers and one 1×1 convolution layer, a residual module is added to each layer, and the block then enters a 2×2 max pooling layer. The decoding part of the network consists of four identical decoding blocks; the feature image obtained after deconvolution is connected by a skip connection with the feature image of the corresponding layer of the coding part, and a dual-channel attention mechanism is added in the skip connection. A 1×1 convolution layer is added as the last layer of the decoding part to realize end-to-end crack segmentation and obtain a crack detection result image of the same size as the input image.
further, the specific operation of step S2 is as follows:
(1) Calibrating the KinectV2 camera and the IMU separately to obtain their respective intrinsic parameters, and then jointly calibrating the two to obtain their transformation matrix and time offset;
(2) Deriving the pre-integration model of the IMU between consecutive frames from the error model and motion model of the IMU, which resolves the misalignment between the acquisition frequencies of the camera and the IMU; initializing the gyroscope bias, gravitational acceleration and velocity to complete the joint visual-inertial initialization;
(3) Fusing the IMU pre-integration with the visual information in a tightly coupled manner, estimating the pose by sliding-window-marginalization-based nonlinear optimization over the visual reprojection error and the IMU residual, and obtaining the optimized pose through the objective optimization function.
Further, in step S3 a frame is screened out as a key frame if it satisfies one of the following rules:
1) it is at least 20 frames away from the last reference key frame;
2) the current frame is at least 20 frames away from the last key frame, or the local mapping thread is idle;
3) the current frame tracks at least 50 feature points;
4) the current frame tracks fewer than 80% of the map points of the reference key frame, ensuring a low overlap rate.
further, the specific operation of step S4 is as follows:
(1) Screening out key frames through visual SLAM tracking, BA optimization and loop-closure correction; according to the imaging principle of the KinectV2 camera, computing from the color image and depth image corresponding to each key frame the coordinates of every pixel in the camera coordinate system and the world coordinate system, and constructing a dense point cloud map.
(2) Mapping the crack information obtained by semantic segmentation of all key frames in step S3 into the dense point cloud map through the coordinate transformation relation, and, where the semantic labels of consecutive key frames are inconsistent, applying Bayesian updating in the following manner:

$$P(l_k \mid K_{1:t}, V_d) = \frac{1}{Z}\, P(l_k \mid K_t, V_d)\, P(l_k \mid K_{1:t-1}, V_d)$$

where $V_d$ is the set of three-dimensional points, $K_{1:t}$ represents the current set of all key frames, $l_k$ represents the class of a three-dimensional voxel, and $P(l_k \mid \cdot)$ represents the independently distributed probability of the three-dimensional point over the semantic label set.

Updating the semantic information of the three-dimensional point cloud with the above formula yields a globally consistent dense point cloud map with crack information, and the three-dimensional position information of the cracks is obtained from the point cloud.
Compared with the prior art, the invention has the following advantages:
(1) The deep-learning crack detection network constructed by the invention reduces the difficulty and complexity of crack detection and improves its accuracy and robustness;
(2) The method fuses the measurements of the KinectV2 and the IMU through sliding-window, tightly coupled back-end nonlinear optimization to complete pose estimation, improving positioning accuracy and avoiding the tracking loss and positioning failure that pure vision suffers in scenes with sparse feature points;
(3) The method combines the visual-inertial SLAM technology with the two-dimensional crack detection network to build a dense point cloud map containing crack information, remedying the lack of semantic information in the sparse point cloud map built by the original visual SLAM, and achieves three-dimensional positioning of cracks.
Drawings
FIG. 1 is a flow chart of crack detection and location in accordance with the present invention.
Fig. 2 is a diagram of the deep learning network structure of the present invention.
FIG. 3 is a dual channel attention gate diagram in the deep learning network of the present invention.
FIG. 4 is a flow chart of the construction of the dense point cloud map based on the visual SLAM according to the present invention.
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings and examples.
As shown in FIG. 1, the invention relates to a crack detection and three-dimensional positioning method based on deep learning and SLAM technology, and the crack detection method comprises the following steps:
S1, creating an image training set, constructing a crack detection network model, and training and saving the optimal model.
The specific operation of step S1 is as follows:
(1) Collecting original data, performing data augmentation, and normalizing the images, including cutting and graying. To counter the influence of uneven illumination intensity, image preprocessing is performed with techniques such as histogram equalization and Gaussian bilateral filtering.
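As an illustration, this preprocessing chain could be written as the following OpenCV sketch; the resize dimensions and filter parameters are assumptions for illustration, not values specified by the invention:

```python
import cv2

def preprocess(path, size=(512, 512)):
    """Illustrative pavement-image preprocessing: resize, graying,
    histogram equalization against uneven illumination, and bilateral
    filtering that suppresses noise while keeping crack edges sharp."""
    img = cv2.imread(path)
    img = cv2.resize(img, size)                    # normalize size (cutting/scaling)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # graying
    eq = cv2.equalizeHist(gray)                    # histogram equalization
    return cv2.bilateralFilter(eq, d=9, sigmaColor=75, sigmaSpace=75)
```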
(2) Labeling the preprocessed images: crack areas are marked white and non-crack areas black, and the mask images are obtained and saved. The image data set is divided into a training set and a test set at a ratio of 7:3, with equal numbers of cracked and crack-free images in the training set.
(3) Constructing the crack segmentation network shown in FIG. 2 and FIG. 3, comprising four downsampling and four upsampling stages. Each downsampling stage first applies 3×3 and 1×1 convolution kernels, activates the convolved image with a ReLU activation function and cascades it with the original input information, then downsamples with 2×2 pooling, whose result serves as the input of the next convolution layer; through this operation the feature channels are expanded to twice those of the previous layer. During upsampling, i.e. when the model enters the sixth layer, the output of the fifth layer is used as a gate signal and expanded to 2 times its original size by 1×1 deconvolution; the gate signal and the output of the fourth layer of the model are then respectively fed into the configured AG (attention gate) module. After the AG module processes the data, its output is cascaded with the upsampled gate signal and passed into a convolution layer with a 3×3 kernel; the result is cascaded with the original input signal of the sixth layer and used as the input of the seventh layer, and so on, except that in the last layer of the network each 64-dimensional feature vector is mapped to the output layer through a 1×1 convolution.
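For concreteness, one coding block of the kind described above might look like the following PyTorch sketch; the channel widths and the 1×1 projection shortcut are assumptions where the text is silent:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One coding block: 3x3 and 1x1 convolutions with ReLU, a residual
    connection that cascades the original input, then 2x2 max pooling;
    the feature channels double from one stage to the next."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1)  # match channels for the residual sum
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        feat = torch.relu(self.body(x) + self.skip(x))  # residual fusion
        return self.pool(feat), feat  # pooled output, plus the skip feature for the decoder
```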
The generalized Dice loss (GDL) is selected as the loss function of the deep learning network, because the conventional cross-entropy loss segments the background pixels of an image well but is insensitive to crack pixels, which degrades crack detection accuracy; model parameters are updated with the Adam optimization algorithm.
The crack detection network is implemented in the PyTorch framework and trained with the chosen training strategy and image training set. Training is performed on the GPU; most hyper-parameters of the network, such as the learning rate and number of iterations, are set according to the specific training conditions, and the loss and accuracy curves are monitored to obtain and save the optimal crack detection model. The generalization ability and accuracy of the model are verified on the test set.
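A minimal sketch of the generalized Dice loss in its usual formulation (Sudre et al., 2017) is given below; the tensor layout and the smoothing epsilon are assumptions:

```python
import torch

def generalized_dice_loss(pred, target, eps=1e-6):
    """pred: (N, C, H, W) softmax probabilities; target: (N, C, H, W) one-hot
    masks. Classes covering few pixels (cracks) get large weights, which is
    why GDL is less background-biased than plain cross-entropy."""
    dims = (0, 2, 3)                             # sum over batch and spatial dims
    w = 1.0 / (target.sum(dims) ** 2 + eps)      # per-class weight
    inter = (pred * target).sum(dims)
    union = (pred + target).sum(dims)
    return 1.0 - 2.0 * (w * inter).sum() / ((w * union).sum() + eps)
```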
S2, fusing the data acquired by the inertial measurement unit (IMU) and the KinectV2 in a tightly coupled nonlinear optimization manner, and completing pose estimation in the tracking thread of the visual SLAM.
The specific operation of step S2 is as follows:
(1) Completing the calibration of the KinectV2 and the IMU with the calibration tools iai_kinect2 and kalibr in ROS to obtain their intrinsic parameters, then jointly calibrating the two to obtain their transformation matrix and the time offset caused by their different sampling frequencies.
(2) The error model of the IMU is:

$$\tilde{a}(t) = a(t) + b_a(t) + \eta_a(t), \qquad \tilde{\omega}(t) = \omega(t) + b_g(t) + \eta_g(t) \tag{1}$$

where $\tilde{a}(t)$ and $\tilde{\omega}(t)$ denote the measurements of the accelerometer and the gyroscope, $a(t)$ and $\omega(t)$ denote their true values, $b(t)$ is the slowly varying error produced by the sensor, called random walk noise, and $\eta(t)$ is white noise whose fluctuations vary extremely fast.
Substituting the error model of the IMU into the motion model gives the following complete IMU motion model:

$$R_j = R_i\, \Delta R_{ij}, \qquad v_j = v_i + g_W\, \Delta t_{ij} + R_i\, \Delta v_{ij}, \qquad p_j = p_i + v_i\, \Delta t_{ij} + \tfrac{1}{2}\, g_W\, \Delta t_{ij}^{2} + R_i\, \Delta p_{ij} \tag{2}$$

where $R_{EB}$ denotes the rotation matrix from the world coordinate system to the IMU coordinate system; $a_W$, $v_W$, $p_W$ denote the acceleration, velocity and translation in the world coordinate system; $i$ and $j$ are two adjacent key frames; $\delta\phi_{ij}$, $\delta v_{ij}$, $\delta p_{ij}$ denote the noise of the IMU rotation, velocity and position measurements; $R_i$, $v_i$, $p_i$ denote the rotation matrix, velocity and translation of the $i$th key frame, and $R_j$, $v_j$, $p_j$ those of the $j$th key frame; $\Delta t_{ij}$ denotes the time difference between the two instants $i$ and $j$; $g_W$ is the gravitational acceleration in the world coordinate system.
Pre-integrating over adjacent key frames yields the pre-integration model of equation (3):

$$\Delta R_{ij} = R_i^{T} R_j\, \mathrm{Exp}(\delta\phi_{ij}), \qquad \Delta v_{ij} = R_i^{T}\big(v_j - v_i - g_W\, \Delta t_{ij}\big) + \delta v_{ij}, \qquad \Delta p_{ij} = R_i^{T}\big(p_j - p_i - v_i\, \Delta t_{ij} - \tfrac{1}{2}\, g_W\, \Delta t_{ij}^{2}\big) + \delta p_{ij} \tag{3}$$

where $\delta\phi_{ij}$, $\delta v_{ij}$, $\delta p_{ij}$ respectively represent the noise of the rotation, velocity and position measurements of the IMU.
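To make equation (3) concrete, a noise-free numerical sketch of the pre-integration between two key frames is given below (pure NumPy; the sample loop and bias handling are illustrative assumptions):

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3)
    k = phi / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def preintegrate(gyro, acc, dt, bg, ba):
    """Accumulate (N, 3) gyroscope/accelerometer samples taken between key
    frames i and j into the pre-integrated terms (dR, dv, dp) of equation
    (3), after removing the current bias estimates bg and ba."""
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, acc):
        a_corr = dR @ (a - ba)               # bias-corrected acceleration in frame i
        dp += dv * dt + 0.5 * a_corr * dt**2
        dv += a_corr * dt
        dR = dR @ so3_exp((w - bg) * dt)     # integrate rotation last
    return dR, dv, dp
```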
(3) Initializing the gyroscope bias, gravitational acceleration and velocity to reduce the accumulated error of the IMU.
(4) The residual model of the IMU and the reprojection error model of the camera are shown in equations (4) and (5), from which the nonlinear optimization model of equation (6) is established:

$$e_B = \big[\, e_p^{T},\; e_q^{T},\; e_v^{T},\; e_{ba}^{T},\; e_{bg}^{T} \,\big]^{T} \tag{4}$$

$$e_{proj}(\xi_i, m_j) = p_{ij} - \pi\big(\exp(\xi_i^{\wedge})\, m_j\big) \tag{5}$$

$$\min_{X} \; \sum \big\| e_B \big\|_{\Sigma_B}^{2} + \sum_{i,j} \big\| e_{proj}(\xi_i, m_j) \big\|_{\Sigma_C}^{2} \tag{6}$$

where $e_B$ is the residual model of the inertial measurement unit and $X$ is the variable to be optimized; $e_p$, $e_q$, $e_v$, $e_{ba}$, $e_{bg}$ are respectively the residuals of position, attitude, velocity, gyroscope zero offset and accelerometer zero offset with respect to the pre-integration values of the inertial measurement unit, $b$ is random walk noise and $\beta$ is white noise; $e_{proj}$ is the visual reprojection error of the whole system, $\xi_i$ is the Lie algebra corresponding to the camera pose, $m_j$ is a three-dimensional map point, $p_{ij}$ is the pixel corresponding to that three-dimensional map point in the image, and $\pi(\cdot)$ is the camera projection function.
Since the computation of the optimization model grows rapidly as the system runs, a sliding-window scheme is adopted: instead of directly discarding old key frames, the constraints between a marginalized key frame and the remaining frames are retained in the sliding window and used as a prior, which reduces the computation. Pose optimization is therefore performed with the optimization objective function of equation (7), whose three terms represent, respectively, the prior information, the visual reprojection error and the IMU residual.
$$\min_{X} \Big\{ \big\| r_p - H_p X \big\|^{2} + \sum_{k \in B} \big\| r_B(z_{b_k b_{k+1}}, X) \big\|_{\Sigma_{b_k}}^{2} + \sum_{(k,j) \in C} \big\| r_C(z_{kj}, X) \big\|_{\Sigma_{kj}}^{2} \Big\} \tag{7}$$

where $B$ denotes the set of all IMU measurements, $k$ denotes the $k$th image, $H_p$ denotes the Hessian matrix of the prior information, and $\gamma$ denotes the Jacobian matrix of each optimized variable.
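For intuition, marginalizing an old key frame out of the sliding window amounts to taking a Schur complement on the normal equations; a minimal NumPy sketch, assuming the marginalized states are ordered first, is:

```python
import numpy as np

def marginalize(H, b, m):
    """Schur-complement marginalization of the first m state variables in the
    normal equations H x = b. The returned prior (H', b') retains the
    constraints of the removed key frame on the remaining states instead of
    simply discarding the old key frame."""
    Hmm = H[:m, :m]
    Hmr, Hrm, Hrr = H[:m, m:], H[m:, :m], H[m:, m:]
    Hmm_inv = np.linalg.inv(Hmm + 1e-9 * np.eye(m))  # small regularizer for safety
    H_prior = Hrr - Hrm @ Hmm_inv @ Hmr
    b_prior = b[m:] - Hrm @ Hmm_inv @ b[:m]
    return H_prior, b_prior
```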
S3, judging each frame image collected in real time in a tracking thread, screening out key frames, performing semantic segmentation on the key frames through a crack detection model, and extracting crack information;
the screening rules for key frames are as follows, one of which is satisfied:
1) at least 20 frames away from the last reference key frame;
2) the current frame is at least 20 frames apart from the last key frame or the local thread is in idle state;
3) the current frame can track at least 50 characteristic points;
4) the number of map points tracked by the current frame is more than 80% more than that of the reference key frame, so that the overlapping rate is low;
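A sketch of the screening decision is given below; a frame qualifies when any rule holds, and the field names (.idx, .n_tracked, .n_tracked_map_points, .n_map_points) are illustrative assumptions rather than names from the invention:

```python
from dataclasses import dataclass

@dataclass
class FrameInfo:
    idx: int                   # index of the frame in the video stream
    n_tracked: int             # feature points tracked in this frame
    n_tracked_map_points: int  # map points re-observed in this frame
    n_map_points: int = 0      # map points owned (for the reference key frame)

def is_keyframe(frame, last_kf_idx, ref_kf, local_mapping_idle):
    """Return True when the current frame satisfies at least one of the
    four key-frame screening rules listed above."""
    return any([
        frame.idx - ref_kf.idx >= 20,                             # rule 1
        frame.idx - last_kf_idx >= 20 or local_mapping_idle,      # rule 2
        frame.n_tracked >= 50,                                    # rule 3
        frame.n_tracked_map_points < 0.8 * ref_kf.n_map_points,   # rule 4: low overlap
    ])
```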
S4, integrating the two-dimensional crack detection results obtained in step S3 and the depth information collected by the KinectV2 into the visual-inertial SLAM framework, constructing the dense point cloud map containing crack information, and achieving three-dimensional positioning of cracks.
The specific operation of step S4 is as follows:
(1) according to fig. 4, the color image and the depth image collected by the KinectV2 are used as sensor input of the visual SLAM, and then the color image and the depth image corresponding to the key frame which meets the rule are screened out through tracking, BA optimization and loop detection.
(2) Each pixel in the depth map provides two-dimensional pixel coordinates (u, v) on the color map and a depth value d. Using the camera imaging principle and the parameters obtained from the KinectV2 calibration, the pixel is converted into the camera coordinate system and then into the world coordinate system through equation (8). Processing with functions of the PCL library then yields a dense point cloud map in pcd format.
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = R \Big( d\, C^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \Big) + t \tag{8}$$

where $C$ is the camera intrinsic matrix, $R$ and $t$ are respectively the rotation matrix and translation vector of the camera, and $[x, y, z]^{T}$ are the coordinates of the point cloud in the world coordinate system.
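Equation (8) translates directly into code; the sketch below back-projects a depth pixel and transforms it into the world frame, with hypothetical KinectV2-like intrinsics used purely for illustration:

```python
import numpy as np

def pixel_to_world(u, v, d, C, R, t):
    """Back-project pixel (u, v) with depth d through the intrinsic matrix C,
    then move the camera-frame point into the world frame with pose (R, t)."""
    p_cam = d * np.linalg.inv(C) @ np.array([u, v, 1.0])  # camera coordinates
    return R @ p_cam + t                                  # world coordinates

C = np.array([[525.0, 0.0, 319.5],    # assumed focal lengths and principal point
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])
p_w = pixel_to_world(320, 240, 1.2, C, np.eye(3), np.zeros(3))
```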
(3) Crack detection is performed on the key frames as in step S3, and the coordinates of the map points corresponding to the feature points of the segmented cracks are converted into the world coordinate system through the coordinate transformation relation. Because the data collected by the sensor are unstable, the crack semantic labels of consecutive key frames may be inconsistent, producing conflicts when the semantic labels are fused; incremental semantic label association is therefore performed with Bayesian estimation.
Suppose the key frame currently being processed is $K_t$ and its set of three-dimensional points is $V_d$; the set of all key frames is then $K_{1:t} = \{K_1, \ldots, K_t\}$. Through Bayesian updating, the distribution of the semantic label $l_k$ of a point is obtained as:

$$P(l_k \mid K_{1:t}, V_d) = \frac{P(K_t \mid l_k, K_{1:t-1}, V_d)\, P(l_k \mid K_{1:t-1}, V_d)}{P(K_t \mid K_{1:t-1}, V_d)} \tag{9}$$

Then, with the Markov hypothesis, equation (10) is obtained:

$$P(l_k \mid K_{1:t}, V_d) \propto P(K_t \mid l_k, V_d)\, P(l_k \mid K_{1:t-1}, V_d) \tag{10}$$

Since $P(l_k)$ is a constant that does not change with time, the normalization factor can be absorbed into a constant $Z$, and the semantic information of the three-dimensional point cloud is updated through equation (11) whenever a new key frame arrives:

$$P(l_k \mid K_{1:t}, V_d) = \frac{1}{Z}\, P(l_k \mid K_t, V_d)\, P(l_k \mid K_{1:t-1}, V_d) \tag{11}$$
The two-dimensional semantic labels of multiple key frames are associated through Bayesian updating and transferred to the three-dimensional point cloud via the coordinate transformation relation in the dense reconstruction thread, yielding a three-dimensional crack semantic map with globally consistent semantic labels.
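Equation (11) reduces to an elementwise multiply-and-normalize per 3D point; a minimal sketch over a two-class (crack / background) label set:

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Fuse the stored per-class probabilities of a 3D point (prior, from key
    frames 1..t-1) with the new key frame's observation (likelihood), then
    renormalize; the denominator plays the role of Z in equation (11)."""
    posterior = prior * likelihood
    return posterior / posterior.sum()

# A point first believed 60% crack, then observed again as 90% crack
p = bayes_update(np.array([0.6, 0.4]), np.array([0.9, 0.1]))  # -> [0.931, 0.069]
```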
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative examples and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (10)

1. A crack detection and three-dimensional positioning method based on deep learning and SLAM technology is characterized by comprising the following steps:
s1, constructing a crack detection network model, training the crack detection network model on a data set, and saving the optimal model;
s2, fusing the data acquired by the inertial measurement unit and the camera in a tightly coupled nonlinear optimization manner, and completing pose estimation in the tracking thread of the visual SLAM;
s3, judging each image frame acquired in real time in the tracking thread, screening out key frames, performing semantic segmentation on the key frames with the crack detection network model, and extracting crack information;
s4, integrating the crack information obtained in step S3 and the data acquired by the camera into the visual-inertial SLAM framework, and constructing a dense point cloud map containing the crack information to complete the three-dimensional positioning of cracks.
2. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 1, wherein the crack detection network model is built on a fully convolutional neural network with a residual module and an attention mechanism: a 1×1 convolution layer is added to the convolution group of each layer together with a residual module, and a dual-channel attention mechanism is added at the skip connections between encoding and decoding.
3. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 2, wherein the step S1 specifically comprises:
step S1-1, collecting original data and performing data augmentation to construct a data set, normalizing the data, and preprocessing the images with histogram equalization and Gaussian bilateral filtering;
step S1-2, labeling the preprocessed images, marking crack areas white and non-crack areas black, obtaining and saving the mask images, and dividing the data set into a training set and a test set;
step S1-3, constructing the crack detection network model, comprising four downsampling and four upsampling stages: each downsampling stage first applies 3×3 and 1×1 convolution kernels, activates the convolved image with a ReLU activation function and cascades it with the original input information, then downsamples with 2×2 pooling, whose result serves as the input of the next convolution layer; during upsampling, i.e. when the model enters the sixth layer, the output of the fifth layer is used as a gate signal and expanded to 2 times its original size by 1×1 deconvolution, and the gate signal and the output of the fourth layer of the model are then respectively fed into the configured AG module; after the AG module processes the data, its output is cascaded with the upsampled gate signal and passed into a convolution layer with a 3×3 kernel, whose result is cascaded with the original input signal of the sixth layer as the input of the seventh layer, and so on, except that in the last layer of the network each 64-dimensional feature vector is mapped to the output layer with a 1×1 convolution;
and step S1-4, iteratively training the crack detection network model based on the training set, and verifying through the test set until the optimal model meeting the set value is obtained.
4. The crack detection and three-dimensional positioning method based on the deep learning and SLAM technology as claimed in claim 3, wherein the ratio of the training set to the test set is 7:3, and the number of cracked images and crack-free images contained in the training set is equal.
5. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 1, wherein the step S2 specifically comprises:
s2-1, calibrating the camera and the inertial measurement unit separately to obtain their respective intrinsic parameters, and then jointly calibrating the two to obtain their transformation matrix and time difference;
s2-2, deriving the pre-integration model of the inertial measurement unit between consecutive frames from its error model and motion model, and initializing the gyroscope bias, gravitational acceleration and velocity to complete the joint visual-inertial initialization;
s2-3, performing data fusion of the pre-integration model and the visual information in a tightly coupled manner, performing pose estimation on the visual reprojection error and the inertial measurement unit residual with the sliding-window-marginalization-based nonlinear optimization model, and determining the optimized pose through the objective optimization function.
6. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 3,
the pre-integration model is:

$$\Delta R_{ij} = R_i^{T} R_j\, \mathrm{Exp}(\delta\phi_{ij})$$

$$\Delta v_{ij} = R_i^{T}\big(v_j - v_i - g_W\, \Delta t_{ij}\big) + \delta v_{ij}$$

$$\Delta p_{ij} = R_i^{T}\big(p_j - p_i - v_i\, \Delta t_{ij} - \tfrac{1}{2}\, g_W\, \Delta t_{ij}^{2}\big) + \delta p_{ij}$$

where $i$ and $j$ are two adjacent key frames; $\delta\phi_{ij}$, $\delta v_{ij}$, $\delta p_{ij}$ respectively represent the noise in the rotation, velocity and position measurements of the inertial measurement unit; $R_i$, $v_i$, $p_i$ respectively represent the rotation matrix, velocity and translation of the $i$th key frame; $R_j$, $v_j$, $p_j$ respectively represent those of the $j$th key frame; $\Delta t_{ij}$ represents the time difference between the two instants $i$ and $j$; and $g_W$ is the gravitational acceleration in the world coordinate system.
7. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 3, wherein the nonlinear optimization model is:
$$e_B = \big[\, e_p^{T},\; e_q^{T},\; e_v^{T},\; e_{ba}^{T},\; e_{bg}^{T} \,\big]^{T}$$

$$e_{proj}(\xi_i, m_j) = p_{ij} - \pi\big(\exp(\xi_i^{\wedge})\, m_j\big)$$

$$\min_{X} \; \sum \big\| e_B \big\|_{\Sigma_B}^{2} + \sum_{i,j} \big\| e_{proj}(\xi_i, m_j) \big\|_{\Sigma_C}^{2}$$

where $e_B$ is the residual model of the inertial measurement unit and $X$ is the variable to be optimized; $e_p$, $e_q$, $e_v$, $e_{ba}$, $e_{bg}$ are respectively the residuals of position, attitude, velocity, gyroscope zero offset and accelerometer zero offset with respect to the pre-integration values of the inertial measurement unit, $b$ is random walk noise and $\beta$ is white noise; $e_{proj}$ is the visual reprojection error of the whole system, $\xi_i$ is the Lie algebra corresponding to the camera pose, $m_j$ is a three-dimensional map point, $p_{ij}$ is the pixel corresponding to that three-dimensional map point in the image, and $\pi(\cdot)$ is the camera projection function.
8. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 3, wherein the objective optimization function is:
$$\min_{X} \Big\{ \big\| r_p - H_p X \big\|^{2} + \sum_{k \in B} \big\| r_B(z_{b_k b_{k+1}}, X) \big\|_{\Sigma_{b_k}}^{2} + \sum_{(k,j) \in C} \big\| r_C(z_{kj}, X) \big\|_{\Sigma_{kj}}^{2} \Big\}$$

where $B$ represents the set of all IMU measurements, $k$ denotes the $k$th image, $H_p$ represents the Hessian matrix of the prior information, and $\gamma$ represents the Jacobian matrix of each optimization variable.
9. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 1, wherein in step S3 a frame is screened out as a key frame if it satisfies one of the following rules:
1) it is at least 20 frames away from the last reference key frame;
2) the current frame is at least 20 frames away from the last key frame, or the local mapping thread is idle;
3) the current frame tracks at least 50 feature points;
4) the current frame tracks fewer than 80% of the map points of the reference key frame, ensuring a low overlap rate.
10. The crack detection and three-dimensional positioning method based on the deep learning and SLAM technology as claimed in claim 1, wherein the step S4 specifically comprises:
s4-1, performing dense point cloud mapping on the color maps and depth maps corresponding to the key frames screened out through visual SLAM tracking, local BA optimization and loop-closure correction, to obtain a dense point cloud map;
s4-2, mapping the crack information obtained by semantic segmentation of all key frames in step S3 into the dense point cloud map through the coordinate transformation relation, and performing Bayesian updating where the semantic labels of consecutive key frames are inconsistent, namely:

$$P(l_k \mid K_{1:t}, V_d) = \frac{1}{Z}\, P(l_k \mid K_t, V_d)\, P(l_k \mid K_{1:t-1}, V_d)$$

where $V_d$ is the set of three-dimensional points, $K_{1:t}$ represents the current set of all key frames, $l_k$ represents the class of a three-dimensional voxel, and $P(l_k \mid \cdot)$ represents the independently distributed probability of the three-dimensional point over the semantic label set;
when a new key frame arrives, the semantic information of the three-dimensional point cloud is updated according to the above formula, a globally consistent dense point cloud map with crack information is obtained, and the three-dimensional position information of the cracks is obtained from the dense point cloud map.
CN202210214242.5A 2022-03-04 2022-03-04 Crack detection and three-dimensional positioning method based on deep learning and SLAM technology Pending CN114638794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210214242.5A CN114638794A (en) 2022-03-04 2022-03-04 Crack detection and three-dimensional positioning method based on deep learning and SLAM technology


Publications (1)

Publication Number Publication Date
CN114638794A 2022-06-17

Family

ID=81948078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210214242.5A Pending CN114638794A (en) 2022-03-04 2022-03-04 Crack detection and three-dimensional positioning method based on deep learning and SLAM technology

Country Status (1)

Country Link
CN (1) CN114638794A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972470A (en) * 2022-07-22 2022-08-30 北京中科慧眼科技有限公司 Road surface environment obtaining method and system based on binocular vision
CN115700781A (en) * 2022-11-08 2023-02-07 广东技术师范大学 Visual positioning method and system based on image inpainting in dynamic scene
CN115700781B (en) * 2022-11-08 2023-05-05 广东技术师范大学 Visual positioning method and system based on image complementary painting in dynamic scene
CN115575407A (en) * 2022-12-07 2023-01-06 浙江众合科技股份有限公司 Detection method applied to track and tunnel
CN115797789A (en) * 2023-02-20 2023-03-14 成都东方天呈智能科技有限公司 Cascade detector-based rice pest monitoring system and method and storage medium
CN116363087A (en) * 2023-03-23 2023-06-30 南京航空航天大学 Method for detecting surface defects of automatic composite material laying
CN116310349A (en) * 2023-05-25 2023-06-23 西南交通大学 Large-scale point cloud segmentation method, device, equipment and medium based on deep learning
CN116310349B (en) * 2023-05-25 2023-08-15 西南交通大学 Large-scale point cloud segmentation method, device, equipment and medium based on deep learning
CN118154700A (en) * 2024-05-10 2024-06-07 常州星宇车灯股份有限公司 On-line monitoring method for accuracy of external parameters of vehicle sensor


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination