CN114638794A - Crack detection and three-dimensional positioning method based on deep learning and SLAM technology - Google Patents
- Publication number: CN114638794A
- Application number: CN202210214242.5A
- Authority: CN (China)
- Prior art keywords: crack, crack detection, model, dimensional, deep learning
- Prior art date: 2022-03-04
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0004 — Image analysis; inspection of images; industrial image inspection
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/084 — Neural networks; learning methods; backpropagation, e.g. using gradient descent
- G06T7/194 — Segmentation; edge detection involving foreground-background segmentation
- G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06T2207/10024 — Image acquisition modality; color image
- G06T2207/10028 — Image acquisition modality; range image; depth image; 3D point clouds
- G06T2207/20081 — Special algorithmic details; training; learning
- G06T2207/20221 — Special algorithmic details; image fusion; image merging
- G06T2207/30204 — Subject of image; marker
Abstract
The invention discloses a crack detection and three-dimensional positioning method based on deep learning and SLAM technology, which comprises the following steps: creating a data set, constructing a crack detection network model, and training and saving the optimal model; fusing the data acquired by an inertial measurement unit (IMU) and a KinectV2 camera through tightly coupled nonlinear optimization, and completing pose estimation in the tracking thread of a visual SLAM framework; judging each image frame acquired in real time in the tracking thread, screening out key frames, performing semantic segmentation on the key frames with the crack detection model, and extracting crack information; and integrating the crack information and the depth information acquired by the KinectV2 into the visual-inertial SLAM framework, completing construction of a dense point cloud map containing the crack information, and realizing three-dimensional positioning of the cracks. The method detects cracks in real time and locates them in the three-dimensional environment, with high detection accuracy and strong robustness.
Description
Technical Field
The invention relates to a pavement crack detection and positioning method, in particular to a crack detection and three-dimensional positioning method based on deep learning and SLAM technology.
Background
Pavement cracks are the most common road distress, so detecting them is particularly important for road maintenance. Traditional pavement inspection is performed manually, consuming a great deal of time and effort with low efficiency and safety, so automated pavement crack detection methods need to be studied. Most common automated detection algorithms are based on deep learning networks. Deep-learning-based crack detection can improve the efficiency of pavement maintenance work and, to a certain extent, of related work such as maintaining bridges, tunnels, dams and other infrastructure; it reduces inspection and maintenance costs and automates crack defect inspection, and therefore has great application value and significance.
The invention patent 201911371906.3 discloses a crack detection method and system based on image processing, which connects cracks with a pixel-tracking algorithm and screens out unqualified regions through shape feature analysis of the connected regions. The resulting model requires manual feature selection, is strongly affected by environmental factors, and generalizes poorly.
The invention patent 201911355501.0 discloses a crack detection network based on Faster R-CNN, which classifies cracks versus background and outputs crack bounding-box localizations; however, its early-stage image preprocessing ignores interference from road noise and varying illumination intensity, so the final crack detection accuracy is low and the method is strongly affected by interfering objects.
The invention patent 202010247786.2 discloses a rapid pavement crack identification method based on deep learning, which maps the road image to be detected into a coding space through a feature extraction network to obtain feature maps of different sizes; each feature map is fed into a crack detection network for progressive feature fusion, preset convolution operations and attention enhancement, followed by further preset convolution and upsampling, yielding an image of the same size as the road image with the crack regions marked. However, the detected cracks receive no subsequent feature-based processing and are not quantified.
The invention patent 202010236401.2 discloses a crack detection model based on YOLOv5, which outputs crack detection information including prediction boxes, detection categories and category confidences for the image, and sets the model loss as the weighted sum of a classification loss, an objectness loss, a regression loss and an angle loss, improving the anti-interference capability and accuracy of crack detection; however, the network cannot obtain the specific position information of the cracks.
In summary, deep-learning-based networks can improve crack detection efficiency, but many crack detection networks place high demands on the size of the training set and on image processing, and their models are complex; moreover, they obtain only the position of a crack in the image plane, not its three-dimensional position in the environment, which hinders subsequent repair work.
Disclosure of Invention
The invention aims to provide a crack detection and three-dimensional positioning method based on deep learning and SLAM technology, which solves the problems that many crack detection networks have low accuracy and efficiency and cannot simultaneously perform three-dimensional positioning of cracks, and enables a mobile robot to detect pavement cracks in its environment in real time.
The technical scheme for realizing the invention is as follows: a crack detection and three-dimensional positioning method based on deep learning and SLAM technology comprises the following steps:
S1, creating a data set for the crack detection network, constructing a crack detection network model, and training and saving the optimal model;
S2, fusing data acquired by an inertial measurement unit (IMU) and the KinectV2 through tightly coupled nonlinear optimization, and completing pose estimation in the tracking thread of a visual SLAM framework;
S3, judging each image frame acquired in real time in the tracking thread, screening out key frames, performing semantic segmentation on the key frames with the crack detection model, and extracting crack information;
S4, integrating the two-dimensional crack detection results obtained in step S3 and the depth information acquired by the KinectV2 into the visual-inertial SLAM framework, completing construction of a dense point cloud map containing crack information, and realizing three-dimensional positioning of cracks.
Further, the specific operation of step S1 is as follows:
(1) collecting pavement crack images, and preprocessing and marking the pavement crack images;
(2) the network is built in the manner of a fully convolutional neural network: the encoding part consists of four identical encoding blocks, each composed of two 3×3 convolution layers and one 1×1 convolution layer with a residual module added to each layer, followed by a 2×2 max pooling layer; the decoding part consists of four identical decoding blocks, where the feature map obtained after deconvolution is connected to the feature map of the corresponding encoder layer by a skip connection, into which a dual-channel attention mechanism is added; a 1×1 convolution layer is added as the last layer of the decoding part to realize end-to-end crack segmentation and obtain a crack detection result image of the same size as the input image;
further, the specific operation of step S2 is as follows:
(1) calibrating a KinectV2 camera and an IMU respectively to obtain respective internal parameters, and then carrying out combined calibration on the KinectV2 camera and the IMU to obtain a conversion matrix and a time difference of the KinectV2 camera and the IMU;
(2) calculating a pre-integration model of the IMU between consecutive frames through the error model and motion model of the IMU, which resolves the misalignment between the acquisition frequencies of the camera and the IMU; initializing the gyroscope bias, the gravitational acceleration and the velocity to complete the visual-inertial joint initialization;
(3) fusing the IMU pre-integration with the visual information in a tightly coupled manner; pose estimation is performed over the visual reprojection error and the IMU residual using sliding-window, marginalization-based nonlinear optimization, and the optimized pose is obtained through a target optimization function.
Further, a frame is screened as a key frame in step S3 if it satisfies at least one of the following rules:
1) it is at least 20 frames away from the last reference key frame;
2) the current frame is at least 20 frames away from the last key frame, or the local mapping thread is idle;
3) the current frame tracks at least 50 feature points;
4) the current frame tracks fewer than 80% of the map points of the reference key frame, ensuring a low overlap rate.
further, the specific operation of step S4 is as follows:
(1) Key frames are screened out through visual SLAM tracking, BA optimization and loop-closure correction; from the color image and depth image acquired by the KinectV2 for each key frame, the coordinates of each pixel in the camera coordinate system and the world coordinate system are computed according to the KinectV2 camera imaging model, and a dense point cloud map is constructed.
(2) The crack information obtained by semantic segmentation of all key frames in step S3 is mapped into the dense point cloud map through the coordinate transformation relation, and Bayesian updating is applied where semantic labels are inconsistent between consecutive key frames, in the following manner:

  P(l_k | K_{1:t}, V_d) = (1/Z_t) · P(l_k | K_t, V_d) · P(l_k | K_{1:t−1}, V_d)

where V_d is the set of three-dimensional points, K_{1:t} represents the current set of all key frames, l_k represents the class of a three-dimensional voxel, P(l_k | K_{1:t}, V_d) represents the independently distributed probability of the three-dimensional point over the semantic label set, and Z_t is a normalization factor.

The semantic information of the three-dimensional point cloud is updated with the above formula to obtain a globally consistent dense point cloud map carrying crack information, and the three-dimensional position information of the cracks is obtained from the point cloud.
Compared with the prior art, the invention has the following advantages:
(1) the deep learning crack detection network constructed by the invention reduces the difficulty and complexity of crack detection and improves its accuracy and robustness;
(2) the method fuses the measurement data of the KinectV2 and the IMU through sliding-window, tightly coupled back-end nonlinear optimization to complete pose estimation, improving positioning accuracy and avoiding the tracking loss and positioning failure that purely visual methods suffer in scenes with sparse feature points;
(3) the method combines visual-inertial SLAM with the two-dimensional crack detection network to build a dense point cloud map containing crack information, remedying the lack of semantic information in the sparse point cloud maps built by original visual SLAM, and realizes three-dimensional positioning of cracks.
Drawings
FIG. 1 is a flow chart of crack detection and location in accordance with the present invention.
Fig. 2 is a diagram of the deep learning network structure of the present invention.
FIG. 3 is a dual channel attention gate diagram in the deep learning network of the present invention.
FIG. 4 is a flow chart of the construction of the dense point cloud map based on the visual SLAM according to the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings and accompanying examples.
As shown in FIG. 1, the invention provides a crack detection and three-dimensional positioning method based on deep learning and SLAM technology, comprising the following steps:
S1, creating an image training set, constructing a crack detection network model, and training and saving the optimal model.
The specific operation of step S1 is as follows:
(1) Collect raw data and perform data augmentation; normalize the images, including cropping and graying. To counter uneven illumination intensity, perform image preprocessing with techniques such as histogram equalization and Gaussian bilateral filtering;
(2) and marking the preprocessed image, wherein the crack area is marked as white, and the non-crack area is marked as black, so as to obtain and store a mask image. The image data set was divided into a training set and a test set with a ratio of 7: 3. Wherein the number of cracked images and non-cracked images contained in the training set is equal.
(3) Construct the crack segmentation network shown in FIGS. 2 and 3, which comprises four downsampling and four upsampling stages. Each downsampling stage first applies 3×3 and 1×1 convolution kernels, activates the convolved image with a ReLU activation function, and concatenates it with the original input information; it then downsamples with 2×2 pooling and passes the result as the input of the next convolution layer, doubling the number of feature channels relative to the previous layer. During upsampling, when the model enters the sixth layer, the output of the fifth layer serves as the gate signal and is expanded to twice its original size through 1×1 deconvolution; it is then input, together with the output of the fourth layer of the model, into the configured AG (attention gate) module. After the AG module processes the data, its output is concatenated with the upsampled gate signal and fed into a convolution layer with a 3×3 kernel, and the result is concatenated with the original input of the sixth layer as the input of the seventh layer, and so on; in the last layer of the network, each 64-dimensional feature vector is mapped to the output layer through a 1×1 convolution.
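For illustration, a minimal PyTorch sketch of one encoder block (two 3×3 convolutions plus a 1×1 convolution with a residual shortcut, followed by 2×2 max pooling) and of an additive attention gate of the kind used at the skip connections; the channel widths, the residual projection, and the additive gating formulation are assumptions where the text is silent, and the gate signal is assumed already upsampled to the size of the skip feature:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Two 3x3 convs + one 1x1 conv, with a residual shortcut, then 2x2 max pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 1),
        )
        self.shortcut = nn.Conv2d(in_ch, out_ch, 1)  # residual projection
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        f = torch.relu(self.body(x) + self.shortcut(x))
        return f, self.pool(f)  # (skip feature, downsampled feature)

class AttentionGate(nn.Module):
    """Additive attention gate: the decoder gate signal g re-weights the encoder skip feature x."""
    def __init__(self, g_ch, x_ch, mid_ch):
        super().__init__()
        self.wg = nn.Conv2d(g_ch, mid_ch, 1)
        self.wx = nn.Conv2d(x_ch, mid_ch, 1)
        self.psi = nn.Sequential(nn.Conv2d(mid_ch, 1, 1), nn.Sigmoid())

    def forward(self, g, x):
        # g and x are assumed spatially aligned (g already upsampled)
        a = self.psi(torch.relu(self.wg(g) + self.wx(x)))  # attention map in [0, 1]
        return x * a
```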
The GDL (generalized Dice) loss is selected as the loss function of the deep learning network, because the conventional cross-entropy loss segments the background pixels of an image well but is insensitive to the crack pixels, which degrades crack detection accuracy; the model parameters are updated with the Adam optimization algorithm;
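A sketch of a generalized Dice loss in PyTorch; the inverse-squared-volume class weighting follows the standard GDL formulation, which is an assumption since the text names GDL without spelling out its form:

```python
import torch

def generalized_dice_loss(pred, target, eps=1e-6):
    """pred: (N, C, H, W) softmax probabilities; target: (N, C, H, W) one-hot float masks."""
    dims = (0, 2, 3)                              # sum over batch and pixels
    w = 1.0 / (target.sum(dims) ** 2 + eps)       # inverse squared class volume
    inter = (pred * target).sum(dims)
    union = (pred + target).sum(dims)
    return 1.0 - 2.0 * (w * inter).sum() / ((w * union).sum() + eps)
```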
and realizing the crack detection network in a Pythrch frame, and training according to the selected training strategy and the image training set. And (3) training by adopting a GPU mode, setting most of hyper-parameters of the network, such as learning rate, iteration times and the like according to specific training conditions, observing a Loss curve and an accuracy rate curve, and obtaining and storing an optimal crack detection model. And verifying the generalization capability and accuracy of the model in a test set.
S2, fusing data acquired by an inertial measurement unit (IMU) and the KinectV2 through tightly coupled nonlinear optimization, and completing pose estimation in the tracking thread of the visual SLAM.
The specific operation of step S2 is as follows:
(1) Calibrate the KinectV2 and the IMU with the ROS calibration tools iai_kinect2 and kalibr, respectively, to obtain their intrinsic parameters; then jointly calibrate the two to obtain their transformation matrix and the time difference caused by their sampling frequencies.
(2) The error model of the IMU is:

  ã(t) = a(t) + b_a(t) + η_a(t)
  ω̃(t) = ω(t) + b_g(t) + η_g(t)        (1)

where ã(t) and ω̃(t) denote the accelerometer and gyroscope measurements, a(t) and ω(t) their true values, b(t) the slowly varying error produced by the sensor, called random-walk noise, and η(t) white noise with extremely fast fluctuations.
Substituting the IMU error model into the motion model yields the complete IMU motion model between adjacent key frames:

  R_j = R_i · ΔR_ij · Exp(δφ_ij)
  v_j = v_i + g_W Δt_ij + R_i (Δv_ij + δv_ij)
  p_j = p_i + v_i Δt_ij + ½ g_W Δt_ij² + R_i (Δp_ij + δp_ij)        (2)

where R_EB denotes the rotation matrix from the world coordinate system to the IMU coordinate system; a_W, v_W and p_W respectively denote the acceleration, velocity and translation in the world coordinate system; g_W denotes the gravitational acceleration in the world frame; i and j are two adjacent key frames; δφ_ij, δv_ij and δp_ij respectively represent the noise of the IMU in the rotation, velocity and position measurements; R_i, v_i, p_i respectively represent the relative rotation matrix, velocity and translation of the i-th key frame; R_j, v_j, p_j those of the j-th key frame; and Δt_ij represents the time difference between instants i and j.
Pre-integration over adjacent key frames yields the pre-integration model of equation (3):

  ΔR̃_ij = R_i^T R_j Exp(δφ_ij)
  Δṽ_ij = R_i^T (v_j − v_i − g_W Δt_ij) + δv_ij
  Δp̃_ij = R_i^T (p_j − p_i − v_i Δt_ij − ½ g_W Δt_ij²) + δp_ij        (3)

where δφ_ij, δv_ij and δp_ij respectively represent the noise in the IMU's rotation, velocity and position measurements.
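The accumulation behind the pre-integrated terms ΔR̃_ij, Δṽ_ij, Δp̃_ij can be sketched numerically as below; the IMU sample format and the first-order integration scheme are assumptions for illustration:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def so3_exp(phi):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3) + skew(phi)        # first-order approximation
    a = phi / theta
    K = skew(a)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def preintegrate(samples, b_g, b_a):
    """samples: iterable of (gyro, accel, dt); returns (dR, dv, dp) between key frames.
    Gravity is handled outside, as in equation (3)."""
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for gyro, acc, dt in samples:
        acc_c = acc - b_a                          # bias-corrected specific force
        dp += dv * dt + 0.5 * (dR @ acc_c) * dt * dt
        dv += (dR @ acc_c) * dt
        dR = dR @ so3_exp((gyro - b_g) * dt)       # bias-corrected rotation increment
    return dR, dv, dp
```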
(3) Initialize the gyroscope bias, the gravitational acceleration and the velocity to reduce the accumulated error of the IMU.
(4) The residual model of the IMU and the reprojection error model of the camera are given by equations (4) and (5), from which the nonlinear optimization model of equation (6) is established:

  r_B(z_ij, X) = [e_p, e_v, e_q, e_ba, e_bg]^T        (4)

  r_C(z_ij, X) = p_ij − π(exp(ξ_i^∧) m_j)        (5)

  min_X { Σ ||r_B(z_ij, X)||²_{Σ_B} + Σ ||r_C(z_ij, X)||²_{Σ_C} }        (6)

where r_B(z_ij, X) is the residual model of the inertial measurement unit, X is the variable to be optimized and z_ij is the IMU pre-integration value; e_p, e_q, e_v, e_ba, e_bg are respectively the position, attitude and velocity residuals and the gyroscope and accelerometer zero-bias residuals of the inertial measurement unit pre-integration, b is the random-walk noise and η is the white noise; r_C is the visual reprojection error of the whole system, ξ_i is the Lie algebra corresponding to the camera pose, m_j is a three-dimensional map point, and p_ij is the pixel corresponding to that three-dimensional map point in the image.
Because the computational load of the optimization model grows rapidly as the system runs, optimization is performed with a sliding window: old key frames are not discarded outright; instead, the constraints between the marginalized key frame and the remaining frames are retained in the sliding window as a prior, and the marginalized states themselves are no longer re-optimized, which reduces the amount of computation. Pose optimization is therefore carried out with the optimization objective function of equation (7), whose three terms respectively represent the prior information, the visual reprojection error and the IMU residual:

  min_X { ||r_p − H_p X||² + Σ_{(i,j)∈C} ||r_C(z_ij, X)||²_{Σ_ij} + Σ_{k∈B} ||r_B(z_k, X)||²_{Σ_k} }        (7)

where B represents the set of all IMU measurements, C the set of visual observations, k denotes the k-th image, H_p represents the Hessian matrix of the prior information, r_p the prior residual from marginalization, and Γ represents the Jacobian matrix of the optimized variables.
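Schematically, the objective of equation (7) is the sum of a marginalization prior and Mahalanobis norms of the IMU and reprojection residuals over the window. The sketch below illustrates its evaluation; the interface (residual functions paired with information matrices) is an assumption, and a real system would minimize this with a solver such as g2o or Ceres rather than evaluate it by hand:

```python
import numpy as np

def total_cost(X, r_p, H_p, reproj_residuals, imu_residuals):
    """Evaluate the sliding-window objective of equation (7):
    marginalization prior + visual reprojection errors + IMU residuals.
    Each residual entry is (residual_fn, info_matrix)."""
    cost = np.sum((r_p - H_p @ X) ** 2)   # prior from marginalized key frames
    for r_fn, info in reproj_residuals:
        r = r_fn(X)
        cost += r @ info @ r              # Mahalanobis norm of reprojection error
    for r_fn, info in imu_residuals:
        r = r_fn(X)
        cost += r @ info @ r              # Mahalanobis norm of IMU residual
    return cost
```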
S3, judging each frame image collected in real time in a tracking thread, screening out key frames, performing semantic segmentation on the key frames through a crack detection model, and extracting crack information;
A frame is screened as a key frame if it satisfies one of the following rules (a sketch of this logic follows the list):
1) it is at least 20 frames away from the last reference key frame;
2) the current frame is at least 20 frames away from the last key frame, or the local mapping thread is idle;
3) the current frame tracks at least 50 feature points;
4) the current frame tracks fewer than 80% of the map points of the reference key frame, ensuring a low overlap rate.
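A minimal sketch of the screening logic above; the frame and key-frame field names are assumptions for illustration:

```python
def is_key_frame(frame, last_kf, ref_kf, local_thread_idle):
    """A frame becomes a key frame if it satisfies any one of the four rules."""
    if frame.idx - ref_kf.idx >= 20:                             # rule 1
        return True
    if frame.idx - last_kf.idx >= 20 or local_thread_idle:       # rule 2
        return True
    if frame.n_tracked_features >= 50:                           # rule 3
        return True
    if frame.n_tracked_map_points < 0.8 * ref_kf.n_map_points:   # rule 4: low overlap
        return True
    return False
```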
S4, the two-dimensional crack detection results obtained in step S3 and the depth information collected by the KinectV2 are integrated into the visual-inertial SLAM framework, the dense point cloud map containing crack information is constructed, and three-dimensional positioning of cracks is achieved.
The specific operation of step S4 is as follows:
(1) As shown in FIG. 4, the color image and depth image collected by the KinectV2 serve as the sensor input of the visual SLAM; the color and depth images corresponding to key frames that satisfy the screening rules are then selected through tracking, BA optimization and loop-closure detection.
(2) Each pixel in the depth map carries its two-dimensional pixel coordinates (u, v) on the color map and a depth value d. Using the camera imaging model and the parameters obtained from the KinectV2 calibration, the pixel is first converted into the camera coordinate system and then into the world coordinate system by equation (8):

  [x, y, z]^T = R⁻¹ ( d · C⁻¹ [u, v, 1]^T − t )        (8)

where C is the camera intrinsic matrix, R and t are respectively the rotation matrix and translation vector of the camera, and [x, y, z]^T represents the coordinates of the point cloud in the world coordinate system. The points are processed with functions from the PCL library to obtain a dense point cloud map in pcd format.
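A NumPy sketch of the back-projection of equation (8) applied to a whole depth map, following the intrinsics/extrinsics convention stated above:

```python
import numpy as np

def depth_to_world(depth, C, R, t):
    """Back-project a metric depth map (H, W) to world coordinates, equation (8).
    C: 3x3 intrinsic matrix; R, t: camera rotation matrix and translation vector."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    cam = np.linalg.inv(C) @ (pix * depth.reshape(1, -1))              # camera frame
    world = np.linalg.inv(R) @ (cam - t.reshape(3, 1))                 # world frame
    # Pixels with d == 0 map to the camera center and should be masked in practice.
    return world.T.reshape(H, W, 3)
```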
(3) Crack detection is performed on the key frames as in step S3, and the coordinates of the map points corresponding to the segmented crack feature points are converted into the world coordinate system through the coordinate transformation relation. Because the data collected by the sensors is unstable, the crack semantic labels of consecutive key frames may be inconsistent, producing conflicts when the labels are fused; incremental semantic label association is therefore performed with Bayesian estimation.
Suppose the key frame currently being processed is K_t and its three-dimensional point set is V_d; the set of all key frames is then K_{1:t} = {K_1, …, K_t}. By Bayesian updating, the distribution of the semantic label l_k of a point is:

  P(l_k | K_{1:t}, V_d) = P(K_t | l_k, V_d) · P(l_k | K_{1:t−1}, V_d) / P(K_t | K_{1:t−1}, V_d)        (9)

Applying the Markov assumption then yields equation (10):

  P(l_k | K_{1:t}, V_d) ∝ P(l_k | K_t, V_d) · P(l_k | K_{1:t−1}, V_d) / P(l_k)        (10)

Since P(l_k) is constant and does not change with time, this regularization factor can be ignored, and the semantic information of the three-dimensional point cloud is updated whenever a new key frame arrives through equation (11):

  P(l_k | K_{1:t}, V_d) = (1/Z_t) · P(l_k | K_t, V_d) · P(l_k | K_{1:t−1}, V_d)        (11)

where Z_t is a normalization constant.
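Equation (11) amounts to an element-wise product of the stored label distribution with the new observation, followed by normalization; a minimal sketch (the two-class crack/background layout is an assumption):

```python
import numpy as np

def fuse_labels(prior, likelihood):
    """Recursive Bayesian label update, equation (11).
    prior:      P(l_k | K_{1:t-1}, V_d), shape (num_classes,)
    likelihood: P(l_k | K_t, V_d) from the crack detection network."""
    post = prior * likelihood
    return post / post.sum()   # the 1/Z_t normalization

# Example: a point previously 60% "crack" observed again as 90% "crack".
p = fuse_labels(np.array([0.6, 0.4]), np.array([0.9, 0.1]))
print(p)   # posterior shifts further toward "crack" (~[0.93, 0.07])
```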
The two-dimensional semantic labels of multiple key frames are associated through Bayesian updating, and the labels are transferred onto the three-dimensional point cloud through the coordinate transformation relation in the dense reconstruction thread, yielding a three-dimensional crack semantic map with globally consistent semantic labels.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative examples and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Claims (10)
1. A crack detection and three-dimensional positioning method based on deep learning and SLAM technology is characterized by comprising the following steps:
s1, constructing a crack detection network model, training the crack detection network model through a data set, and storing an optimal model;
s2, fusing data acquired by the inertial measurement unit and the camera in a nonlinear optimization tight coupling mode, and finishing pose estimation in a tracking thread of the visual SLAM;
s3, judging each frame image collected in real time in a tracking thread, screening out key frames, performing semantic segmentation on the key frames through a crack detection network model, and extracting crack information;
and S4, integrating the crack information obtained in the step S3 and the data acquired by the camera into a visual inertia SLAM frame, and constructing a dense point cloud map containing the crack information to complete three-dimensional positioning of the crack.
2. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 1, wherein the crack detection network model is built on the basis of a fully convolutional neural network, residual modules and an attention mechanism: a 1×1 convolution layer and a residual module are added to the convolution group of each layer, and a dual-channel attention mechanism is added at the skip connections between encoding and decoding.
3. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 2, wherein the step S1 specifically comprises:
step S1-1, collecting original data and performing data expansion to construct a data set, performing normalization processing on the data, and performing image preprocessing by histogram equalization and Gaussian bilateral filtering;
step S1-2, labeling the preprocessed image, wherein the crack area is marked as white, the non-crack area is marked as black, obtaining and storing a mask image, and dividing the data set into a training set and a test set;
step S1-3, constructing the crack detection network model, which comprises four downsampling and four upsampling stages: each downsampling stage first applies 3×3 and 1×1 convolution kernels, activates the convolved image with a ReLU activation function and concatenates it with the original input information, then downsamples with 2×2 pooling and takes the result as the input of the next convolution layer; during upsampling, when the model enters the sixth layer, the output of the fifth layer serves as the gate signal and is expanded to 2 times its original size through 1×1 deconvolution, then it and the output of the fourth layer of the model are respectively input into the configured AG module; after the AG module processes the data, the output signal is concatenated with the upsampled gate signal and fed into a convolution layer with a 3×3 kernel, and the output result is concatenated with the original input signal of the sixth layer as the input of the seventh layer, and so on; in the last layer of the network, each 64-dimensional feature vector is mapped to the output layer with a 1×1 convolution;
and step S1-4, iteratively training the crack detection network model based on the training set, and verifying through the test set until the optimal model meeting the set value is obtained.
4. The crack detection and three-dimensional positioning method based on the deep learning and SLAM technology as claimed in claim 3, wherein the ratio of the training set to the test set is 7:3, and the number of cracked images and crack-free images contained in the training set is equal.
5. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 1, wherein the step S2 specifically comprises:
s2-1, calibrating the camera and the inertia measurement unit respectively to obtain respective internal parameters, and then calibrating the camera and the inertia measurement unit jointly to obtain a conversion matrix and a time difference of the camera and the inertia measurement unit;
s2-2, calculating a pre-integration model between continuous frames of the inertial measurement unit through an error model and a motion model of the inertial measurement unit, and initializing the bias, the gravity acceleration and the speed of the gyroscope to complete the visual inertia joint initialization;
and S2-3, performing data fusion on the pre-integrated model and visual information in a tight coupling mode, performing pose estimation on the visual reprojection error and the inertial measurement unit residual error by using a sliding window marginalization-based nonlinear optimization model, and determining the optimized pose by using a target optimization function.
6. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 3,
the pre-integration model is:

  ΔR̃_ij = R_i^T R_j Exp(δφ_ij)
  Δṽ_ij = R_i^T (v_j − v_i − g_W Δt_ij) + δv_ij
  Δp̃_ij = R_i^T (p_j − p_i − v_i Δt_ij − ½ g_W Δt_ij²) + δp_ij

where i and j are two adjacent key frames; δφ_ij, δv_ij and δp_ij respectively represent the noise in the rotation, velocity and position measurements of the inertial measurement unit; R_i, v_i, p_i respectively represent the relative rotation matrix, velocity and translation of the i-th key frame; R_j, v_j, p_j respectively represent those of the j-th key frame; Δt_ij represents the time difference between instants i and j; and g_W denotes the gravitational acceleration in the world coordinate system.
7. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 3, wherein the nonlinear optimization model is:

  min_X { Σ ||r_B(z_ij, X)||²_{Σ_B} + Σ ||p_ij − π(exp(ξ_i^∧) m_j)||²_{Σ_C} }

where r_B(z_ij, X) is the residual model of the inertial measurement unit, X is the variable to be optimized and z_ij is the pre-integration value of the inertial measurement unit; e_p, e_q, e_v, e_ba, e_bg are respectively the position, attitude and velocity residuals and the gyroscope and accelerometer zero-bias residuals in the pre-integration of the inertial measurement unit, b is the random-walk noise and η is the white noise; the second term is the visual reprojection error of the whole system, ξ_i is the Lie algebra corresponding to the camera pose, m_j is a three-dimensional map point, and p_ij is the pixel corresponding to that three-dimensional map point in the image.
8. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 3, wherein the objective optimization function is:

  min_X { ||r_p − H_p X||² + Σ_{(i,j)∈C} ||r_C(z_ij, X)||²_{Σ_ij} + Σ_{k∈B} ||r_B(z_k, X)||²_{Σ_k} }

where B represents the set of all IMU measurements, C the set of visual observations, k denotes the k-th image, H_p represents the Hessian matrix of the prior information, and Γ represents the Jacobian matrix of each optimization variable.
9. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 1, wherein a frame is screened as a key frame in step S3 if it satisfies one of the following rules:
1) it is at least 20 frames away from the last reference key frame;
2) the current frame is at least 20 frames away from the last key frame, or the local mapping thread is idle;
3) the current frame tracks at least 50 feature points;
4) the current frame tracks fewer than 80% of the map points of the reference key frame, ensuring a low overlap rate.
10. The crack detection and three-dimensional positioning method based on the deep learning and SLAM technology as claimed in claim 1, wherein the step S4 specifically comprises:
s4-1, carrying out dense point cloud mapping on the color map and the depth map corresponding to the keyframe screened out through visual SLAM tracking, local BA optimization and loop correction to obtain a dense point cloud map;
s4-2, mapping the crack information obtained by semantically segmenting all key frames in step S3 into the dense point cloud map through the coordinate transformation relation, and applying Bayesian updating to inconsistent semantic labels between consecutive key frames, namely:

  P(l_k | K_{1:t}, V_d) = (1/Z_t) · P(l_k | K_t, V_d) · P(l_k | K_{1:t−1}, V_d)

where V_d is the three-dimensional point set, K_{1:t} represents the current set of all key frames, l_k represents the class of a three-dimensional voxel, P(l_k | K_{1:t}, V_d) represents the independently distributed probability of the three-dimensional point over the semantic label set, and Z_t is a normalization factor;
when a new key frame arrives, the semantic information of the three-dimensional point cloud is updated according to the above formula, obtaining a globally consistent dense point cloud map with crack information, from which the three-dimensional position information of cracks is obtained.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210214242.5A | 2022-03-04 | 2022-03-04 | Crack detection and three-dimensional positioning method based on deep learning and SLAM technology |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN114638794A | 2022-06-17 |

Family ID: 81948078

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210214242.5A | Crack detection and three-dimensional positioning method based on deep learning and SLAM technology | 2022-03-04 | 2022-03-04 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN114638794A (en) |
Cited By (9)

| Publication number | Priority date | Publication date | Title |
|---|---|---|---|
| CN114972470A | 2022-07-22 | 2022-08-30 | Road surface environment obtaining method and system based on binocular vision |
| CN115700781A | 2022-11-08 | 2023-02-07 | Visual positioning method and system based on image inpainting in dynamic scene |
| CN115700781B | 2022-11-08 | 2023-05-05 | Visual positioning method and system based on image inpainting in dynamic scene |
| CN115575407A | 2022-12-07 | 2023-01-06 | Detection method applied to track and tunnel |
| CN115797789A | 2023-02-20 | 2023-03-14 | Cascade detector-based rice pest monitoring system and method and storage medium |
| CN116363087A | 2023-03-23 | 2023-06-30 | Method for detecting surface defects of automatic composite material laying |
| CN116310349A | 2023-05-25 | 2023-06-23 | Large-scale point cloud segmentation method, device, equipment and medium based on deep learning |
| CN116310349B | 2023-05-25 | 2023-08-15 | Large-scale point cloud segmentation method, device, equipment and medium based on deep learning |
| CN118154700A | 2024-05-10 | 2024-06-07 | On-line monitoring method for accuracy of external parameters of vehicle sensor |
Legal Events

| Date | Code | Title |
|---|---|---|
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |