CN114638794A - Crack detection and three-dimensional positioning method based on deep learning and SLAM technology - Google Patents

Crack detection and three-dimensional positioning method based on deep learning and SLAM technology Download PDF

Info

Publication number
CN114638794A
Authority
CN
China
Prior art keywords
crack
crack detection
model
dimensional
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210214242.5A
Other languages
Chinese (zh)
Inventor
周静
宋先义
郭玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202210214242.5A priority Critical patent/CN114638794A/en
Publication of CN114638794A publication Critical patent/CN114638794A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N 3/045 Neural network architectures; combinations of networks
    • G06N 3/084 Neural network learning methods; backpropagation, e.g. using gradient descent
    • G06T 7/0002 Image analysis; inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G06T 7/194 Segmentation; edge detection involving foreground-background segmentation
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 2207/10024 Image acquisition modality: color image
    • G06T 2207/10028 Image acquisition modality: range image; depth image; 3D point clouds
    • G06T 2207/20081 Special algorithmic details: training; learning
    • G06T 2207/20221 Image combination: image fusion; image merging
    • G06T 2207/30204 Subject of image: marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a crack detection and three-dimensional positioning method based on deep learning and SLAM technology, which comprises the following steps: creating a data set, constructing a crack detection network model, and training and saving the optimal model; fusing the data acquired by an inertial measurement unit (IMU) and a KinectV2 camera in a tightly coupled nonlinear optimization manner, and completing pose estimation in the tracking thread of the visual SLAM; judging each image frame acquired in real time in the tracking thread, screening out key frames, performing semantic segmentation on the key frames with the crack detection model, and extracting crack information; and integrating the crack information and the depth information acquired by the KinectV2 into the visual-inertial SLAM framework, completing the construction of a dense point cloud map containing the crack information, and realizing three-dimensional positioning of the cracks. The method detects cracks in real time and locates them in the three-dimensional environment, with high detection accuracy and strong robustness.

Description

Crack detection and three-dimensional positioning method based on deep learning and SLAM technology
Technical Field
The invention relates to a pavement crack detection and positioning method, in particular to a crack detection and three-dimensional positioning method based on deep learning and SLAM technology.
Background
Pavement cracks are the most common road defects, so detecting them is particularly important for road maintenance. At present, traditional pavement inspection is done manually, which consumes a great deal of time and labor and suffers from low efficiency and poor safety, so an automatic pavement crack detection method needs to be studied. Most common automatic detection algorithms are based on deep learning networks. A deep-learning-based crack detection algorithm can improve the efficiency of pavement maintenance and, to a certain extent, of related work such as the maintenance of bridges, tunnels, dams and other infrastructure; it reduces inspection and maintenance costs and automates crack defect inspection, which has great application value and significance.
The invention patent 201911371906.3 discloses a crack detection method and system based on image processing, which connects cracks with a pixel-tracking algorithm and analyzes the shape features of the connected regions to screen out regions that do not meet the requirements. The resulting model requires manual feature selection, is strongly affected by environmental factors, and generalizes poorly.
The invention patent 201911355501.0 discloses a crack detection network based on the Faster R-CNN network, which can classify cracks and background and output crack bounding boxes. However, its image preprocessing ignores the interference of road noise and varying illumination intensity, so the final crack detection accuracy is low and the method is strongly affected by interfering objects.
The invention patent 202010247786.2 discloses a rapid pavement crack identification method based on deep learning, which maps the road image to be detected into a coding space with a feature extraction network to obtain feature images of different sizes; the feature image of each size is fed into a crack detection network for progressive feature fusion, preset convolution operations and attention enhancement, followed by further preset convolution and upsampling, yielding a feature image of the same size as the road image with the crack areas marked. However, the detected cracks are not subsequently processed on the basis of their features, nor quantified.
The invention patent 202010236401.2 discloses a crack detection model based on YOLOv5, which outputs crack monitoring information including the prediction box, detection category and category confidence, and sets the loss function of the model as the weighted sum of a classification loss, an object loss, a regression loss and an angle loss, improving the anti-interference performance and accuracy of crack detection; however, the network cannot obtain the specific position information of the cracks.
In summary, deep-learning-based networks can improve crack detection efficiency, but many crack detection networks place high demands on the size of the training set and on image processing, and their models are complex; moreover, they only obtain the position of a crack in the planar image, not its three-dimensional position in the environment, which is unfavorable for subsequent repair work.
Disclosure of Invention
The invention aims to provide a crack detection and three-dimensional positioning method based on deep learning and SLAM technology, which solves the problems that many crack detection networks have low accuracy and efficiency and cannot simultaneously complete three-dimensional positioning of cracks, and accomplishes real-time pavement crack detection by a mobile robot in its environment.
The technical scheme for realizing the invention is as follows: a crack detection and three-dimensional positioning method based on deep learning and SLAM technology comprises the following steps:
S1, creating a data set for the crack detection network, constructing a crack detection network model, and training and saving the optimal model;
S2, fusing the data acquired by an inertial measurement unit (IMU) and the KinectV2 in a tightly coupled nonlinear optimization manner, and completing pose estimation in the tracking thread of the visual SLAM framework;
S3, judging each image frame acquired in real time in the tracking thread, screening out key frames, performing semantic segmentation on the key frames with the crack detection model, and extracting crack information;
S4, integrating the two-dimensional crack detection results obtained in step S3 and the depth information acquired by the KinectV2 into the visual-inertial SLAM framework, completing construction of the dense point cloud map containing crack information, and realizing three-dimensional positioning of cracks.
Further, the specific operation of step S1 is as follows:
(1) collecting pavement crack images, and preprocessing and marking the pavement crack images;
(2) Based on the construction pattern of a fully convolutional neural network, the coding part consists of four identical coding blocks; each coding block consists of two 3×3 convolution layers and one 1×1 convolution layer, a residual module is added to each layer, and the block then enters a 2×2 max pooling layer. The decoding part of the network consists of four identical decoding blocks; the feature image obtained after deconvolution is connected by a skip connection with the feature image of the corresponding layer of the coding part, and a dual-channel attention mechanism is added in the skip connection. A 1×1 convolution layer is added as the last layer of the decoding part to realize end-to-end crack segmentation and obtain a crack detection result image of the same size as the input image.
further, the specific operation of step S2 is as follows:
(1) Calibrating the KinectV2 camera and the IMU separately to obtain their respective intrinsic parameters, and then jointly calibrating the two to obtain their transformation matrix and time offset;
(2) Deriving the pre-integration model of the IMU between consecutive frames from the error model and motion model of the IMU, which resolves the misalignment between the acquisition frequencies of the camera and the IMU; initializing the gyroscope bias, gravitational acceleration and velocity to complete the joint visual-inertial initialization;
(3) Fusing the IMU pre-integration with the visual information in a tightly coupled manner, estimating the pose by sliding-window-marginalization-based nonlinear optimization over the visual reprojection error and the IMU residual, and obtaining the optimized pose through the objective optimization function.
Further, in step S3 a frame is screened out as a key frame if it satisfies one of the following rules:
1) it is at least 20 frames away from the last reference key frame;
2) the current frame is at least 20 frames away from the last key frame, or the local mapping thread is idle;
3) the current frame tracks at least 50 feature points;
4) the current frame tracks fewer than 80% of the map points of the reference key frame, ensuring a low overlap rate.
further, the specific operation of step S4 is as follows:
(1) Screening out key frames through visual SLAM tracking, BA optimization and loop-closure correction; according to the imaging principle of the KinectV2 camera, computing from the color image and depth image corresponding to each key frame the coordinates of every pixel in the camera coordinate system and the world coordinate system, and constructing a dense point cloud map.
(2) Mapping the crack information obtained by semantic segmentation of all key frames in step S3 into the dense point cloud map through the coordinate transformation relation, and, where the semantic labels of consecutive key frames are inconsistent, applying Bayesian updating in the following manner:

$$P(l_k \mid K_{1:t}, V_d) = \frac{1}{Z}\, P(l_k \mid K_t, V_d)\, P(l_k \mid K_{1:t-1}, V_d)$$

where $V_d$ is the set of three-dimensional points, $K_{1:t}$ represents the current set of all key frames, $l_k$ represents the class of a three-dimensional voxel, and $P(l_k \mid \cdot)$ represents the independently distributed probability of the three-dimensional point over the semantic label set.

Updating the semantic information of the three-dimensional point cloud with the above formula yields a globally consistent dense point cloud map with crack information, and the three-dimensional position information of the cracks is obtained from the point cloud.
Compared with the prior art, the invention has the following advantages:
(1) The deep-learning crack detection network constructed by the invention reduces the difficulty and complexity of crack detection and improves its accuracy and robustness;
(2) The method fuses the measurements of the KinectV2 and the IMU through sliding-window, tightly coupled back-end nonlinear optimization to complete pose estimation, improving positioning accuracy and avoiding the tracking loss and positioning failure that pure vision suffers in scenes with sparse feature points;
(3) The method combines the visual-inertial SLAM technology with the two-dimensional crack detection network to build a dense point cloud map containing crack information, remedying the lack of semantic information in the sparse point cloud map built by the original visual SLAM, and achieves three-dimensional positioning of cracks.
Drawings
FIG. 1 is a flow chart of crack detection and location in accordance with the present invention.
Fig. 2 is a diagram of the deep learning network structure of the present invention.
FIG. 3 is a dual channel attention gate diagram in the deep learning network of the present invention.
FIG. 4 is a flow chart of the construction of the dense point cloud map based on the visual SLAM according to the present invention.
Detailed Description
The following describes embodiments of the invention in detail with reference to the accompanying drawings and examples.
As shown in FIG. 1, the invention relates to a crack detection and three-dimensional positioning method based on deep learning and SLAM technology, and the crack detection method comprises the following steps:
S1, creating an image training set, constructing a crack detection network model, and training and saving the optimal model.
The specific operation of step S1 is as follows:
(1) Collecting original data, performing data augmentation, and normalizing the images, including cutting and graying. To counter the influence of uneven illumination intensity, image preprocessing is performed with techniques such as histogram equalization and Gaussian bilateral filtering.
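As an illustration, this preprocessing chain could be written as the following OpenCV sketch; the resize dimensions and filter parameters are assumptions for illustration, not values specified by the invention:

```python
import cv2

def preprocess(path, size=(512, 512)):
    """Illustrative pavement-image preprocessing: resize, graying,
    histogram equalization against uneven illumination, and bilateral
    filtering that suppresses noise while keeping crack edges sharp."""
    img = cv2.imread(path)
    img = cv2.resize(img, size)                    # normalize size (cutting/scaling)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # graying
    eq = cv2.equalizeHist(gray)                    # histogram equalization
    return cv2.bilateralFilter(eq, d=9, sigmaColor=75, sigmaSpace=75)
```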
(2) Labeling the preprocessed images: crack areas are marked white and non-crack areas black, and the mask images are obtained and saved. The image data set is divided into a training set and a test set at a ratio of 7:3, with equal numbers of cracked and crack-free images in the training set.
(3) Constructing the crack segmentation network shown in FIG. 2 and FIG. 3, comprising four downsampling and four upsampling stages. Each downsampling stage first applies 3×3 and 1×1 convolution kernels, activates the convolved image with a ReLU activation function and cascades it with the original input information, then downsamples with 2×2 pooling, whose result serves as the input of the next convolution layer; through this operation the feature channels are expanded to twice those of the previous layer. During upsampling, i.e. when the model enters the sixth layer, the output of the fifth layer is used as a gate signal and expanded to 2 times its original size by 1×1 deconvolution; the gate signal and the output of the fourth layer of the model are then respectively fed into the configured AG (attention gate) module. After the AG module processes the data, its output is cascaded with the upsampled gate signal and passed into a convolution layer with a 3×3 kernel; the result is cascaded with the original input signal of the sixth layer and used as the input of the seventh layer, and so on, except that in the last layer of the network each 64-dimensional feature vector is mapped to the output layer through a 1×1 convolution.
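For concreteness, one coding block of the kind described above might look like the following PyTorch sketch; the channel widths and the 1×1 projection shortcut are assumptions where the text is silent:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One coding block: 3x3 and 1x1 convolutions with ReLU, a residual
    connection that cascades the original input, then 2x2 max pooling;
    the feature channels double from one stage to the next."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1)  # match channels for the residual sum
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        feat = torch.relu(self.body(x) + self.skip(x))  # residual fusion
        return self.pool(feat), feat  # pooled output, plus the skip feature for the decoder
```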
The generalized Dice loss (GDL) is selected as the loss function of the deep learning network, because the conventional cross-entropy loss segments the background pixels of an image well but is insensitive to crack pixels, which degrades crack detection accuracy; model parameters are updated with the Adam optimization algorithm.
The crack detection network is implemented in the PyTorch framework and trained with the chosen training strategy and image training set. Training is performed on the GPU; most hyper-parameters of the network, such as the learning rate and number of iterations, are set according to the specific training conditions, and the loss and accuracy curves are monitored to obtain and save the optimal crack detection model. The generalization ability and accuracy of the model are verified on the test set.
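A minimal sketch of the generalized Dice loss in its usual formulation (Sudre et al., 2017) is given below; the tensor layout and the smoothing epsilon are assumptions:

```python
import torch

def generalized_dice_loss(pred, target, eps=1e-6):
    """pred: (N, C, H, W) softmax probabilities; target: (N, C, H, W) one-hot
    masks. Classes covering few pixels (cracks) get large weights, which is
    why GDL is less background-biased than plain cross-entropy."""
    dims = (0, 2, 3)                             # sum over batch and spatial dims
    w = 1.0 / (target.sum(dims) ** 2 + eps)      # per-class weight
    inter = (pred * target).sum(dims)
    union = (pred + target).sum(dims)
    return 1.0 - 2.0 * (w * inter).sum() / ((w * union).sum() + eps)
```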
S2, fusing the data acquired by the inertial measurement unit (IMU) and the KinectV2 in a tightly coupled nonlinear optimization manner, and completing pose estimation in the tracking thread of the visual SLAM.
The specific operation of step S2 is as follows:
(1) Completing the calibration of the KinectV2 and the IMU with the calibration tools iai_kinect2 and kalibr in ROS to obtain their intrinsic parameters, then jointly calibrating the two to obtain their transformation matrix and the time offset caused by their different sampling frequencies.
(2) The error model of the IMU is:

$$\tilde{a}(t) = a(t) + b_a(t) + \eta_a(t), \qquad \tilde{\omega}(t) = \omega(t) + b_g(t) + \eta_g(t) \tag{1}$$

where $\tilde{a}(t)$ and $\tilde{\omega}(t)$ denote the measurements of the accelerometer and the gyroscope, $a(t)$ and $\omega(t)$ denote their true values, $b(t)$ is the slowly varying error produced by the sensor, called random walk noise, and $\eta(t)$ is white noise whose fluctuations vary extremely fast.
Substituting the error model of the IMU into the motion model gives the following complete IMU motion model:

$$R_j = R_i\, \Delta R_{ij}, \qquad v_j = v_i + g_W\, \Delta t_{ij} + R_i\, \Delta v_{ij}, \qquad p_j = p_i + v_i\, \Delta t_{ij} + \tfrac{1}{2}\, g_W\, \Delta t_{ij}^{2} + R_i\, \Delta p_{ij} \tag{2}$$

where $R_{EB}$ denotes the rotation matrix from the world coordinate system to the IMU coordinate system; $a_W$, $v_W$, $p_W$ denote the acceleration, velocity and translation in the world coordinate system; $i$ and $j$ are two adjacent key frames; $\delta\phi_{ij}$, $\delta v_{ij}$, $\delta p_{ij}$ denote the noise of the IMU rotation, velocity and position measurements; $R_i$, $v_i$, $p_i$ denote the rotation matrix, velocity and translation of the $i$th key frame, and $R_j$, $v_j$, $p_j$ those of the $j$th key frame; $\Delta t_{ij}$ denotes the time difference between the two instants $i$ and $j$; $g_W$ is the gravitational acceleration in the world coordinate system.
Pre-integrating over adjacent key frames yields the pre-integration model of equation (3):

$$\Delta R_{ij} = R_i^{T} R_j\, \mathrm{Exp}(\delta\phi_{ij}), \qquad \Delta v_{ij} = R_i^{T}\big(v_j - v_i - g_W\, \Delta t_{ij}\big) + \delta v_{ij}, \qquad \Delta p_{ij} = R_i^{T}\big(p_j - p_i - v_i\, \Delta t_{ij} - \tfrac{1}{2}\, g_W\, \Delta t_{ij}^{2}\big) + \delta p_{ij} \tag{3}$$

where $\delta\phi_{ij}$, $\delta v_{ij}$, $\delta p_{ij}$ respectively represent the noise of the rotation, velocity and position measurements of the IMU.
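To make equation (3) concrete, a noise-free numerical sketch of the pre-integration between two key frames is given below (pure NumPy; the sample loop and bias handling are illustrative assumptions):

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3)
    k = phi / theta
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def preintegrate(gyro, acc, dt, bg, ba):
    """Accumulate (N, 3) gyroscope/accelerometer samples taken between key
    frames i and j into the pre-integrated terms (dR, dv, dp) of equation
    (3), after removing the current bias estimates bg and ba."""
    dR, dv, dp = np.eye(3), np.zeros(3), np.zeros(3)
    for w, a in zip(gyro, acc):
        a_corr = dR @ (a - ba)               # bias-corrected acceleration in frame i
        dp += dv * dt + 0.5 * a_corr * dt**2
        dv += a_corr * dt
        dR = dR @ so3_exp((w - bg) * dt)     # integrate rotation last
    return dR, dv, dp
```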
(3) Initializing the gyroscope bias, gravitational acceleration and velocity to reduce the accumulated error of the IMU.
(4) The residual model of the IMU and the reprojection error model of the camera are shown in equations (4) and (5), from which the nonlinear optimization model of equation (6) is established:

$$e_B = \big[\, e_p^{T},\; e_q^{T},\; e_v^{T},\; e_{ba}^{T},\; e_{bg}^{T} \,\big]^{T} \tag{4}$$

$$e_{proj}(\xi_i, m_j) = p_{ij} - \pi\big(\exp(\xi_i^{\wedge})\, m_j\big) \tag{5}$$

$$\min_{X} \; \sum \big\| e_B \big\|_{\Sigma_B}^{2} + \sum_{i,j} \big\| e_{proj}(\xi_i, m_j) \big\|_{\Sigma_C}^{2} \tag{6}$$

where $e_B$ is the residual model of the inertial measurement unit and $X$ is the variable to be optimized; $e_p$, $e_q$, $e_v$, $e_{ba}$, $e_{bg}$ are respectively the residuals of position, attitude, velocity, gyroscope zero offset and accelerometer zero offset with respect to the pre-integration values of the inertial measurement unit, $b$ is random walk noise and $\beta$ is white noise; $e_{proj}$ is the visual reprojection error of the whole system, $\xi_i$ is the Lie algebra corresponding to the camera pose, $m_j$ is a three-dimensional map point, $p_{ij}$ is the pixel corresponding to that three-dimensional map point in the image, and $\pi(\cdot)$ is the camera projection function.
Since the computation of the optimization model grows rapidly as the system runs, a sliding-window scheme is adopted: instead of directly discarding old key frames, the constraints between a marginalized key frame and the remaining frames are retained in the sliding window and used as a prior, which reduces the computation. Pose optimization is therefore performed with the optimization objective function of equation (7), whose three terms represent, respectively, the prior information, the visual reprojection error and the IMU residual.
$$\min_{X} \Big\{ \big\| r_p - H_p X \big\|^{2} + \sum_{k \in B} \big\| r_B(z_{b_k b_{k+1}}, X) \big\|_{\Sigma_{b_k}}^{2} + \sum_{(k,j) \in C} \big\| r_C(z_{kj}, X) \big\|_{\Sigma_{kj}}^{2} \Big\} \tag{7}$$

where $B$ denotes the set of all IMU measurements, $k$ denotes the $k$th image, $H_p$ denotes the Hessian matrix of the prior information, and $\gamma$ denotes the Jacobian matrix of each optimized variable.
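For intuition, marginalizing an old key frame out of the sliding window amounts to taking a Schur complement on the normal equations; a minimal NumPy sketch, assuming the marginalized states are ordered first, is:

```python
import numpy as np

def marginalize(H, b, m):
    """Schur-complement marginalization of the first m state variables in the
    normal equations H x = b. The returned prior (H', b') retains the
    constraints of the removed key frame on the remaining states instead of
    simply discarding the old key frame."""
    Hmm = H[:m, :m]
    Hmr, Hrm, Hrr = H[:m, m:], H[m:, :m], H[m:, m:]
    Hmm_inv = np.linalg.inv(Hmm + 1e-9 * np.eye(m))  # small regularizer for safety
    H_prior = Hrr - Hrm @ Hmm_inv @ Hmr
    b_prior = b[m:] - Hrm @ Hmm_inv @ b[:m]
    return H_prior, b_prior
```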
S3, judging each frame image collected in real time in a tracking thread, screening out key frames, performing semantic segmentation on the key frames through a crack detection model, and extracting crack information;
the screening rules for key frames are as follows, one of which is satisfied:
1) at least 20 frames away from the last reference key frame;
2) the current frame is at least 20 frames apart from the last key frame or the local thread is in idle state;
3) the current frame can track at least 50 characteristic points;
4) the number of map points tracked by the current frame is more than 80% more than that of the reference key frame, so that the overlapping rate is low;
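A sketch of the screening decision is given below; a frame qualifies when any rule holds, and the field names (.idx, .n_tracked, .n_tracked_map_points, .n_map_points) are illustrative assumptions rather than names from the invention:

```python
from dataclasses import dataclass

@dataclass
class FrameInfo:
    idx: int                   # index of the frame in the video stream
    n_tracked: int             # feature points tracked in this frame
    n_tracked_map_points: int  # map points re-observed in this frame
    n_map_points: int = 0      # map points owned (for the reference key frame)

def is_keyframe(frame, last_kf_idx, ref_kf, local_mapping_idle):
    """Return True when the current frame satisfies at least one of the
    four key-frame screening rules listed above."""
    return any([
        frame.idx - ref_kf.idx >= 20,                             # rule 1
        frame.idx - last_kf_idx >= 20 or local_mapping_idle,      # rule 2
        frame.n_tracked >= 50,                                    # rule 3
        frame.n_tracked_map_points < 0.8 * ref_kf.n_map_points,   # rule 4: low overlap
    ])
```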
S4, integrating the two-dimensional crack detection results obtained in step S3 and the depth information collected by the KinectV2 into the visual-inertial SLAM framework, constructing the dense point cloud map containing crack information, and achieving three-dimensional positioning of cracks.
The specific operation of step S4 is as follows:
(1) according to fig. 4, the color image and the depth image collected by the KinectV2 are used as sensor input of the visual SLAM, and then the color image and the depth image corresponding to the key frame which meets the rule are screened out through tracking, BA optimization and loop detection.
(2) Each pixel in the depth map provides two-dimensional pixel coordinates (u, v) on the color map and a depth value d. Using the camera imaging principle and the parameters obtained from the KinectV2 calibration, the pixel is converted into the camera coordinate system and then into the world coordinate system through equation (8). Processing with functions of the PCL library then yields a dense point cloud map in pcd format.
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = R \Big( d\, C^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \Big) + t \tag{8}$$

where $C$ is the camera intrinsic matrix, $R$ and $t$ are respectively the rotation matrix and translation vector of the camera, and $[x, y, z]^{T}$ are the coordinates of the point cloud in the world coordinate system.
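Equation (8) translates directly into code; the sketch below back-projects a depth pixel and transforms it into the world frame, with hypothetical KinectV2-like intrinsics used purely for illustration:

```python
import numpy as np

def pixel_to_world(u, v, d, C, R, t):
    """Back-project pixel (u, v) with depth d through the intrinsic matrix C,
    then move the camera-frame point into the world frame with pose (R, t)."""
    p_cam = d * np.linalg.inv(C) @ np.array([u, v, 1.0])  # camera coordinates
    return R @ p_cam + t                                  # world coordinates

C = np.array([[525.0, 0.0, 319.5],    # assumed focal lengths and principal point
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])
p_w = pixel_to_world(320, 240, 1.2, C, np.eye(3), np.zeros(3))
```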
(3) Crack detection is performed on the key frames as in step S3, and the coordinates of the map points corresponding to the feature points of the segmented cracks are converted into the world coordinate system through the coordinate transformation relation. Because the data collected by the sensor are unstable, the crack semantic labels of consecutive key frames may be inconsistent, producing conflicts when the semantic labels are fused; incremental semantic label association is therefore performed with Bayesian estimation.
Suppose the key frame currently being processed is $K_t$ and its set of three-dimensional points is $V_d$; the set of all key frames is then $K_{1:t} = \{K_1, \ldots, K_t\}$. Through Bayesian updating, the distribution of the semantic label $l_k$ of a point is obtained as:

$$P(l_k \mid K_{1:t}, V_d) = \frac{P(K_t \mid l_k, K_{1:t-1}, V_d)\, P(l_k \mid K_{1:t-1}, V_d)}{P(K_t \mid K_{1:t-1}, V_d)} \tag{9}$$

Then, with the Markov hypothesis, equation (10) is obtained:

$$P(l_k \mid K_{1:t}, V_d) \propto P(K_t \mid l_k, V_d)\, P(l_k \mid K_{1:t-1}, V_d) \tag{10}$$

Since $P(l_k)$ is a constant that does not change with time, the normalization factor can be absorbed into a constant $Z$, and the semantic information of the three-dimensional point cloud is updated through equation (11) whenever a new key frame arrives:

$$P(l_k \mid K_{1:t}, V_d) = \frac{1}{Z}\, P(l_k \mid K_t, V_d)\, P(l_k \mid K_{1:t-1}, V_d) \tag{11}$$
The two-dimensional semantic labels of multiple key frames are associated through Bayesian updating and transferred to the three-dimensional point cloud via the coordinate transformation relation in the dense reconstruction thread, yielding a three-dimensional crack semantic map with globally consistent semantic labels.
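Equation (11) reduces to an elementwise multiply-and-normalize per 3D point; a minimal sketch over a two-class (crack / background) label set:

```python
import numpy as np

def bayes_update(prior, likelihood):
    """Fuse the stored per-class probabilities of a 3D point (prior, from key
    frames 1..t-1) with the new key frame's observation (likelihood), then
    renormalize; the denominator plays the role of Z in equation (11)."""
    posterior = prior * likelihood
    return posterior / posterior.sum()

# A point first believed 60% crack, then observed again as 90% crack
p = bayes_update(np.array([0.6, 0.4]), np.array([0.9, 0.1]))  # -> [0.931, 0.069]
```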
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative examples and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (10)

1. A crack detection and three-dimensional positioning method based on deep learning and SLAM technology is characterized by comprising the following steps:
s1, constructing a crack detection network model, training the crack detection network model on a data set, and saving the optimal model;
s2, fusing the data acquired by the inertial measurement unit and the camera in a tightly coupled nonlinear optimization manner, and completing pose estimation in the tracking thread of the visual SLAM;
s3, judging each image frame acquired in real time in the tracking thread, screening out key frames, performing semantic segmentation on the key frames with the crack detection network model, and extracting crack information;
s4, integrating the crack information obtained in step S3 and the data acquired by the camera into the visual-inertial SLAM framework, and constructing a dense point cloud map containing the crack information to complete the three-dimensional positioning of cracks.
2. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 1, wherein the crack detection network model is built on a fully convolutional neural network with a residual module and an attention mechanism: a 1×1 convolution layer is added to the convolution group of each layer together with a residual module, and a dual-channel attention mechanism is added at the skip connections between encoding and decoding.
3. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 2, wherein the step S1 specifically comprises:
step S1-1, collecting original data and performing data augmentation to construct a data set, normalizing the data, and preprocessing the images with histogram equalization and Gaussian bilateral filtering;
step S1-2, labeling the preprocessed images, marking crack areas white and non-crack areas black, obtaining and saving the mask images, and dividing the data set into a training set and a test set;
step S1-3, constructing the crack detection network model, comprising four downsampling and four upsampling stages: each downsampling stage first applies 3×3 and 1×1 convolution kernels, activates the convolved image with a ReLU activation function and cascades it with the original input information, then downsamples with 2×2 pooling, whose result serves as the input of the next convolution layer; during upsampling, i.e. when the model enters the sixth layer, the output of the fifth layer is used as a gate signal and expanded to 2 times its original size by 1×1 deconvolution, and the gate signal and the output of the fourth layer of the model are then respectively fed into the configured AG module; after the AG module processes the data, its output is cascaded with the upsampled gate signal and passed into a convolution layer with a 3×3 kernel, whose result is cascaded with the original input signal of the sixth layer as the input of the seventh layer, and so on, except that in the last layer of the network each 64-dimensional feature vector is mapped to the output layer with a 1×1 convolution;
and step S1-4, iteratively training the crack detection network model based on the training set, and verifying through the test set until the optimal model meeting the set value is obtained.
4. The crack detection and three-dimensional positioning method based on the deep learning and SLAM technology as claimed in claim 3, wherein the ratio of the training set to the test set is 7:3, and the number of cracked images and crack-free images contained in the training set is equal.
5. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 1, wherein the step S2 specifically comprises:
s2-1, calibrating the camera and the inertial measurement unit separately to obtain their respective intrinsic parameters, and then jointly calibrating the two to obtain their transformation matrix and time difference;
s2-2, deriving the pre-integration model of the inertial measurement unit between consecutive frames from its error model and motion model, and initializing the gyroscope bias, gravitational acceleration and velocity to complete the joint visual-inertial initialization;
s2-3, performing data fusion of the pre-integration model and the visual information in a tightly coupled manner, performing pose estimation on the visual reprojection error and the inertial measurement unit residual with the sliding-window-marginalization-based nonlinear optimization model, and determining the optimized pose through the objective optimization function.
6. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 3,
the pre-integration model is:

$$\Delta R_{ij} = R_i^{T} R_j\, \mathrm{Exp}(\delta\phi_{ij})$$

$$\Delta v_{ij} = R_i^{T}\big(v_j - v_i - g_W\, \Delta t_{ij}\big) + \delta v_{ij}$$

$$\Delta p_{ij} = R_i^{T}\big(p_j - p_i - v_i\, \Delta t_{ij} - \tfrac{1}{2}\, g_W\, \Delta t_{ij}^{2}\big) + \delta p_{ij}$$

where $i$ and $j$ are two adjacent key frames; $\delta\phi_{ij}$, $\delta v_{ij}$, $\delta p_{ij}$ respectively represent the noise in the rotation, velocity and position measurements of the inertial measurement unit; $R_i$, $v_i$, $p_i$ respectively represent the rotation matrix, velocity and translation of the $i$th key frame; $R_j$, $v_j$, $p_j$ respectively represent those of the $j$th key frame; $\Delta t_{ij}$ represents the time difference between the two instants $i$ and $j$; and $g_W$ is the gravitational acceleration in the world coordinate system.
7. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 3, wherein the nonlinear optimization model is:
$$e_B = \big[\, e_p^{T},\; e_q^{T},\; e_v^{T},\; e_{ba}^{T},\; e_{bg}^{T} \,\big]^{T}$$

$$e_{proj}(\xi_i, m_j) = p_{ij} - \pi\big(\exp(\xi_i^{\wedge})\, m_j\big)$$

$$\min_{X} \; \sum \big\| e_B \big\|_{\Sigma_B}^{2} + \sum_{i,j} \big\| e_{proj}(\xi_i, m_j) \big\|_{\Sigma_C}^{2}$$

where $e_B$ is the residual model of the inertial measurement unit and $X$ is the variable to be optimized; $e_p$, $e_q$, $e_v$, $e_{ba}$, $e_{bg}$ are respectively the residuals of position, attitude, velocity, gyroscope zero offset and accelerometer zero offset with respect to the pre-integration values of the inertial measurement unit, $b$ is random walk noise and $\beta$ is white noise; $e_{proj}$ is the visual reprojection error of the whole system, $\xi_i$ is the Lie algebra corresponding to the camera pose, $m_j$ is a three-dimensional map point, $p_{ij}$ is the pixel corresponding to that three-dimensional map point in the image, and $\pi(\cdot)$ is the camera projection function.
8. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 3, wherein the objective optimization function is:
$$\min_{X} \Big\{ \big\| r_p - H_p X \big\|^{2} + \sum_{k \in B} \big\| r_B(z_{b_k b_{k+1}}, X) \big\|_{\Sigma_{b_k}}^{2} + \sum_{(k,j) \in C} \big\| r_C(z_{kj}, X) \big\|_{\Sigma_{kj}}^{2} \Big\}$$

where $B$ represents the set of all IMU measurements, $k$ denotes the $k$th image, $H_p$ represents the Hessian matrix of the prior information, and $\gamma$ represents the Jacobian matrix of each optimization variable.
9. The crack detection and three-dimensional positioning method based on deep learning and SLAM technology as claimed in claim 1, wherein in step S3 a frame is screened out as a key frame if it satisfies one of the following rules:
1) it is at least 20 frames away from the last reference key frame;
2) the current frame is at least 20 frames away from the last key frame, or the local mapping thread is idle;
3) the current frame tracks at least 50 feature points;
4) the current frame tracks fewer than 80% of the map points of the reference key frame, ensuring a low overlap rate.
10. The crack detection and three-dimensional positioning method based on the deep learning and SLAM technology as claimed in claim 1, wherein the step S4 specifically comprises:
s4-1, performing dense point cloud mapping on the color maps and depth maps corresponding to the key frames screened out through visual SLAM tracking, local BA optimization and loop-closure correction, to obtain a dense point cloud map;
s4-2, mapping the crack information obtained by semantic segmentation of all key frames in step S3 into the dense point cloud map through the coordinate transformation relation, and performing Bayesian updating where the semantic labels of consecutive key frames are inconsistent, namely:

$$P(l_k \mid K_{1:t}, V_d) = \frac{1}{Z}\, P(l_k \mid K_t, V_d)\, P(l_k \mid K_{1:t-1}, V_d)$$

where $V_d$ is the set of three-dimensional points, $K_{1:t}$ represents the current set of all key frames, $l_k$ represents the class of a three-dimensional voxel, and $P(l_k \mid \cdot)$ represents the independently distributed probability of the three-dimensional point over the semantic label set;
when a new key frame arrives, the semantic information of the three-dimensional point cloud is updated according to the above formula, a globally consistent dense point cloud map with crack information is obtained, and the three-dimensional position information of the cracks is obtained from the dense point cloud map.
CN202210214242.5A 2022-03-04 2022-03-04 Crack detection and three-dimensional positioning method based on deep learning and SLAM technology Pending CN114638794A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210214242.5A CN114638794A (en) 2022-03-04 2022-03-04 Crack detection and three-dimensional positioning method based on deep learning and SLAM technology


Publications (1)

Publication Number Publication Date
CN114638794A 2022-06-17

Family

ID=81948078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210214242.5A Pending CN114638794A (en) 2022-03-04 2022-03-04 Crack detection and three-dimensional positioning method based on deep learning and SLAM technology

Country Status (1)

Country Link
CN (1) CN114638794A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972470A (en) * 2022-07-22 2022-08-30 北京中科慧眼科技有限公司 Road surface environment obtaining method and system based on binocular vision
CN115700781A (en) * 2022-11-08 2023-02-07 广东技术师范大学 Visual positioning method and system based on image inpainting in dynamic scene
CN115700781B (en) * 2022-11-08 2023-05-05 广东技术师范大学 Visual positioning method and system based on image complementary painting in dynamic scene
CN115575407A (en) * 2022-12-07 2023-01-06 浙江众合科技股份有限公司 Detection method applied to track and tunnel
CN115797789A (en) * 2023-02-20 2023-03-14 成都东方天呈智能科技有限公司 Cascade detector-based rice pest monitoring system and method and storage medium
CN116363087A (en) * 2023-03-23 2023-06-30 南京航空航天大学 Method for detecting surface defects of automatic composite material laying
CN116310349A (en) * 2023-05-25 2023-06-23 西南交通大学 Large-scale point cloud segmentation method, device, equipment and medium based on deep learning
CN116310349B (en) * 2023-05-25 2023-08-15 西南交通大学 Large-scale point cloud segmentation method, device, equipment and medium based on deep learning
CN118154700A (en) * 2024-05-10 2024-06-07 常州星宇车灯股份有限公司 On-line monitoring method for accuracy of external parameters of vehicle sensor


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination