CN116091426A - Pavement crack detection method based on coder-decoder - Google Patents
- Publication number
- CN116091426A (application CN202211700351.4A)
- Authority
- CN
- China
- Prior art keywords
- module
- network
- decoder
- output
- encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06T7/0002, G06T7/0004 — Image analysis; inspection of images, e.g. flaw detection; industrial image inspection
- G06N3/02, G06N3/04 — Neural networks; architecture, e.g. interconnection topology
- G06N3/08 — Neural networks; learning methods
- G06T9/002 — Image coding using neural networks
- G06V10/761 — Proximity, similarity or dissimilarity measures
- G06V10/774 — Generating sets of training patterns; bootstrap methods
- G06V10/80 — Fusion of data at the sensor, preprocessing, feature-extraction or classification level
- G06V10/82 — Image or video recognition using neural networks
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30184 — Infrastructure
- Y02T10/40 — Engine management systems
Abstract
The invention discloses a pavement crack detection method based on an encoder-decoder, which effectively detects road surface cracks by constructing a new end-to-end encoder-decoder residual network. By combining a deep supervision mechanism with a fusion loss, the model can capture fine details, making the network easier to optimize. The model directly outputs a high-quality saliency map that is close to the corresponding ground truth. The resulting saliency map uniformly highlights clearly defined defective objects while effectively filtering out background noise. The model is robust, needs no post-processing, and runs at real-time speed on a single GPU.
Description
Technical Field
The invention belongs to the field of significance detection, and particularly relates to a pavement crack detection method based on a coder-decoder.
Background
China's highway network is the largest in the world. Over the past few decades it has developed rapidly, and the road network now covers all regions of the country, making travel convenient. With the rapid growth of China's economy, the highway network has been further expanded and improved: a large number of highways have been built, pavement quality and roadside facilities have improved, and supervision of highway construction has been strengthened to ensure construction quality and safety.
Road surface cracks are a common pavement defect, usually caused by aging of road materials, climate change, vehicle overloading, construction quality problems, and the like. Cracks not only spoil the visual appearance of the road but can also compromise its safety.
On the one hand, cracks make the road surface uneven, making vehicles harder to control and increasing the risk of traffic accidents. In humid climates in particular, water pooling in cracks can cause vehicles to skid, posing a greater threat to driving safety.
On the other hand, cracks accelerate the aging and deterioration of road materials, damaging the road structure and requiring repair or reconstruction. This increases the cost of maintaining and building roads and shortens their service life.
Timely detection and repair of road cracks is therefore essential for maintaining road safety and quality: once cracks are found, they should be repaired promptly to prevent greater damage.
Common road crack detection methods include the following:
Manual detection: personnel inspect the site and record pavement cracks by hand, judging the type, position, length and other properties of each crack. This method is simple to apply but inefficient, and it is difficult to detect all cracks on a road comprehensively.
Camera-based detection: the road surface is photographed by a camera mounted on a vehicle or an unmanned aerial vehicle, and cracks are identified with a computer-aided analysis system. This can detect pavement cracks quickly but requires some equipment investment and technical expertise.
Laser scanning: the road surface is scanned by a vehicle-mounted laser scanner, and cracks are identified with a computer-aided analysis system. This detects cracks quickly and accurately but likewise requires equipment investment and technical expertise.
Machine learning: a large amount of pavement crack image data is collected and used to train a model, enabling automatic crack identification. This detects cracks quickly and accurately and needs no special equipment; an ordinary camera or mobile phone suffices for image capture.
Compared with traditional manual inspection, automatic surface defect detection offers high precision and high efficiency and is an effective way to reduce labor costs. With the rapid growth in road mileage, automatic detection of pavement cracks plays a vital role in intelligent traffic systems. Pavement crack detection systems typically involve removing non-crack images and quantitatively detecting cracks.
Greater network depth does not necessarily yield better detection accuracy. Experiments show that choosing a network architecture of appropriate depth can preserve detection accuracy while improving detection speed. Although great progress has been made in DCNN-based crack detection, how to obtain more detailed crack characteristics remains to be explored. Road crack detection faces difficulties such as fine cracks, heavy image noise, unclear boundaries and incomplete region information.
Disclosure of Invention
Aiming at the defects existing in the prior art, the invention provides a pavement crack detection method based on a coder-decoder.
The invention proposes a new end-to-end encoder-decoder residual network for saliency detection of defective objects. A multi-level channel weighted fusion module and a residual optimization module, used alternately, progressively recover predicted spatial saliency from the encoded multi-level semantic features, promoting the detection of complete defect objects while suppressing the non-salient background. Compared with existing saliency detection methods, the method accurately segments complete defect objects with well-defined boundaries and effectively filters out irrelevant background noise.
A pavement crack detection method based on a coder-decoder comprises the following steps:
step (1), acquiring a data set;
the data set adopts the public data set Crack500;
step (2), constructing a new end-to-end encoder-decoder residual network;
the new end-to-end encoder-decoder residual network comprises an encoder network and a decoder network;
the encoder network includes an input layer, four residual blocks of ResNet-34, and a bridge module.
In the decoder network, multi-level channel weighted fusion modules (MCW) and residual optimization modules (ORM) are used alternately to progressively recover the saliency information encoded in the preceding multi-scale features. The output of the bridge module passes through an MCW and then an ORM, and this alternation repeats; in total, 5 MCWs and 4 ORMs form the decoder module.
And 3, training the constructed end-to-end encoder-decoder residual network through the data set in the step 1.
And 4, finishing pavement crack detection through the trained end-to-end encoder-decoder residual error network.
Further, for the encoder network, ResNet-34 is selected as the backbone. The entire encoder network contains one input layer, the four residual blocks of ResNet-34, and one bridge module. The input layer has 64 channels with a kernel size of 3×3 and a stride of 1, and a max-pooling operation is added at the tail of the input layer to further enlarge the receptive field. The convolutional output of the input layer is fed to a batch-normalization layer to balance the scale of the features, followed by a ReLU activation function to enhance the nonlinear representation capability.
Formally, given one input image, multi-scale features are extracted at 5 levels. Each ResNet-34 residual block is embedded with a channel attention module and a spatial attention module.
An additional bridge module is designed at the end of the encoder network to further capture global context-aware information, which helps to accurately locate the region of the defective object. The bridge module comprises three 512-channel 3×3 convolution blocks with dilation rates of {1, 2, 4}; the outputs of the three blocks are concatenated and fed to a further 3×3 convolution block, whose output is the output of the bridge module. Notably, each convolution layer is followed by batch normalization and a ReLU activation function.
Further, the inputs of the MCW module are the residual-block output EN of the same level in the encoder and the output features DE of the previous decoder stage. In particular, the EN of the bottommost MCW is the output of the bridge module. The received EN and DE are first concatenated along the channel dimension, and a 1×1 convolution then restores the original number of channels. The result passes through a channel attention module and is finally added to the input DE to obtain the output OUT1 of the MCW module.
Further, the input of the ORM module is the MCW output OUT1 of the current level. A 3×3 convolution, a channel-shuffle operation, another 3×3 convolution and a 1×1 convolution are applied in turn, and the result is added to the input OUT1 to obtain the output OUT2, i.e. the output feature DE of this decoder stage. Notably, each 3×3 convolution is followed by BN and ReLU operations.
Further, the specific method in the step 3 is as follows:
The constructed end-to-end encoder-decoder residual network is trained with the data set from step 1. A deep supervision mechanism is adopted during training: the output of each stage of the decoder network, i.e. the OUT2 of each level, undergoes a 3×3 convolution that reduces the number of channels to 1, and the result is the side output of that level. Bilinear upsampling then matches the resolution of the input image, and a sigmoid activation function maps the predicted values to [0, 1]. The output of each level is supervised, and the topmost side-output saliency map is taken as the final output of the invention.
A fusion loss is constructed to supervise the training of the network so that it learns more detailed information for boundary localization and structure capture.
The fusion loss is an organic combination of three losses, BCE, IoU and SSIM:
L_fuse = α·ℓ_BCE + β·ℓ_IoU + γ·ℓ_SSIM
where ℓ_BCE, ℓ_IoU and ℓ_SSIM denote the BCE loss, IoU loss and SSIM loss, respectively. The weights α, β and γ change with the training stage: in the early stage of training, the BCE weight is amplified to speed convergence, while in the middle stage the weights of the IoU and SSIM losses are gradually increased to accelerate model refinement.
Further, the first 10 epochs are defined as the early stage, with α = 2, β = 0.5, γ = 0.5; later epochs are the middle stage, with α = 1, β = γ = 0.5 + 0.1·(epoch − 10).
Further, the BCE loss is defined as:
ℓ_BCE = −Σ_(x,y) [ G(x,y) log S(x,y) + (1 − G(x,y)) log(1 − S(x,y)) ]
where G is the ground truth and S is the predicted saliency map.
Further, the IoU loss is used to evaluate the similarity of G and S and can be defined as:
ℓ_IoU = 1 − ( Σ_(x,y) S(x,y)·G(x,y) ) / ( Σ_(x,y) [ S(x,y) + G(x,y) − S(x,y)·G(x,y) ] )
further, the structural information is captured by SSIM loss. In particular, the method comprises the steps of, andtwo blocks (size=n×n) cut out from the saliency map S and ground truth G, respectively. SSIM is defined as:
the invention has the following beneficial effects:
the invention is a new end-to-end encoder decoder residual error network, which can effectively detect cracks of the occipital layer. By combining the deep supervision mechanism and fusion loss, the model can capture fine details, so that the network is more easily optimized. The model can directly output a high-quality saliency map, which is almost close to a corresponding ground reality value. The resulting saliency map uniformly highlights clearly defined defective objects while effectively filtering out background noise. Our model is robust and does not require any post-processing, and is faster in real-time on a single GPU.
Drawings
FIG. 1 is a schematic diagram of an end-to-end encoder-decoder residual network architecture according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a ResNet-34 residual block structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a bridge module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a multi-level channel weighted fusion Module (MCW) according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a residual optimization module (ORM) according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the drawings and examples.
A pavement crack detection method based on a coder-decoder comprises the following steps:
step (1), acquiring a data set;
the data set adopts the public data set Crack500;
step (2), constructing a new end-to-end encoder-decoder residual network;
as shown in fig. 1, the new end-to-end encoder-decoder residual network includes an encoder network and a decoder network;
for the encoder network, resNet-34 is selected as the backbone network for the encoder portion. On the one hand, by using layer-jump connections (i.e. shortcut identity mapping) on a simple network (simply stacking convolutional layers) employed by VGG networks, the residual learning framework is easier to optimize. Residual structures, on the other hand, are easy to implement for deeper networks, and still have a low complexity. In this way, the model can obtain accuracy benefits from significantly increased depth, i.e., more context information is covered, due to the expansion of the acceptance field. The entire encoder network contains one input layer, four residual blocks of ResNet-34, and one bridge module. Furthermore, unlike the original ResNet-34, the input layer of the present invention has 64 channels, a kernel size of 3×3 and stride of 1, instead of a kernel size of 7×7 and stride of 2; the maximum pooling operation is then added at the end of the input layer to further expand the size of the acceptance field. The present invention performs such operations, encoding better spatial details, respectively, and capturing better filter responses before and after pooling operations. The convolved output of the input layer is input to a batch normalization layer to balance the scale of the features, followed by a ReLU activation function to enhance the nonlinear representation capability.
Formally, given one input image, multi-scale features are extracted at 5 levels. Attention mechanisms are known to learn accurate, compact features and are widely used in a variety of computer vision tasks owing to their effectiveness and efficiency. Each ResNet-34 residual block is embedded with a channel attention module and a spatial attention module (as shown in fig. 2).
Salient target detection requires complete segmentation of uniform regions, which is harder than edge detection, where simple gradient information suffices. For this purpose, the invention designs an additional bridge module at the end of the encoder network to further capture global context-aware information, which helps to accurately locate the region of the defective object. As shown in fig. 3, the bridge module comprises three 512-channel 3×3 convolution blocks with dilation rates of {1, 2, 4}; the outputs of the three blocks are concatenated and fed to a further 3×3 convolution block, whose output is the output of the bridge module. Notably, each convolution layer is followed by batch normalization and a ReLU activation function.
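A hedged PyTorch sketch of such a bridge module follows; the fused output width of 512 channels and all names are assumptions for illustration. Three parallel 512-channel 3×3 blocks with dilation rates 1, 2 and 4 run side by side, their outputs are concatenated and fused by one more 3×3 block, and every convolution is followed by BN and ReLU.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, dilation=1):
    # 3x3 convolution; padding equals dilation so spatial size is preserved
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True))

class Bridge(nn.Module):
    def __init__(self, ch=512):
        super().__init__()
        # three parallel 512-channel 3x3 blocks with dilation rates {1, 2, 4}
        self.branches = nn.ModuleList(conv_bn_relu(ch, ch, d) for d in (1, 2, 4))
        # concatenated branch outputs pass through one more 3x3 block
        self.fuse = conv_bn_relu(3 * ch, ch)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

out = Bridge()(torch.randn(1, 512, 7, 7))
```

Because padding matches each dilation rate, all three branches keep the spatial size of the input, so they can be concatenated directly.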
In the decoder network, multi-level channel weighted fusion modules (MCW) and residual optimization modules (ORM) are used alternately to progressively recover the saliency information encoded in the preceding multi-scale features. The output of the bridge module passes through an MCW and then an ORM, and this alternation repeats; in total, 5 MCWs and 4 ORMs form the decoder module. Feature maps produced directly by the encoder network tend to focus on insignificant background areas, mainly because global context information is not fully considered, leading to incorrect saliency predictions. To solve this problem, the invention designs the multi-level channel weighted fusion module (MCW) to capture more effective feature areas and filter out background noise interference. After processing by the proposed MCW, the model focuses more on the region of the defect object and its edges.
As shown in fig. 4, the inputs of the MCW module are the residual-block output EN of the same level in the encoder and the output features DE of the previous decoder stage. In particular, the EN of the bottommost MCW is the output of the bridge module. The received EN and DE are first concatenated along the channel dimension, and a 1×1 convolution then restores the original number of channels. The result passes through a channel attention module and is finally added to the input DE to obtain the output OUT1 of the MCW module.
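The MCW data flow can be sketched in PyTorch as follows. The patent does not spell out the internals of the channel attention module, so a squeeze-and-excitation style gate is assumed here; channel counts and names are illustrative.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # assumed squeeze-and-excitation form: global pool -> bottleneck -> sigmoid gate
    def __init__(self, ch, r=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(x)

class MCW(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.reduce = nn.Conv2d(2 * ch, ch, 1)  # 1x1 conv restores channel count
        self.ca = ChannelAttention(ch)

    def forward(self, en, de):
        x = torch.cat([en, de], dim=1)  # concatenate encoder and decoder features
        x = self.ca(self.reduce(x))     # channel attention after reduction
        return x + de                   # residual add with the decoder input DE

en = torch.randn(1, 64, 56, 56)
de = torch.randn(1, 64, 56, 56)
out1 = MCW(64)(en, de)
```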
As shown in fig. 5, the input of the ORM module is the MCW output OUT1 of the current level. A 3×3 convolution, a channel-shuffle operation, another 3×3 convolution and a 1×1 convolution are applied in turn, and the result is added to the input OUT1 to obtain the output OUT2, i.e. the output feature DE of this decoder stage. Notably, each 3×3 convolution is followed by BN and ReLU operations.
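The ORM data flow can likewise be sketched in PyTorch; the number of shuffle groups is an assumption, since the patent only names the operation.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # interleave channels across groups, as in ShuffleNet
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

class ORM(nn.Module):
    def __init__(self, ch, groups=4):
        super().__init__()
        self.groups = groups
        # each 3x3 convolution is followed by BN and ReLU, as in the text
        self.conv1 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                   nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                   nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        self.conv3 = nn.Conv2d(ch, ch, 1)

    def forward(self, out1):
        x = self.conv1(out1)
        x = channel_shuffle(x, self.groups)
        x = self.conv3(self.conv2(x))
        return x + out1  # OUT2 = refinement(OUT1) + OUT1

out2 = ORM(64)(torch.randn(1, 64, 28, 28))
```

The final residual addition keeps the module a pure refinement step: if the convolutions learn nothing, OUT2 falls back to OUT1.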
And 3, training the constructed end-to-end encoder-decoder residual network through the data set in the step 1.
The constructed end-to-end encoder-decoder residual network is trained with the data set from step 1. A deep supervision mechanism is adopted during training: the output of each stage of the decoder network, i.e. the OUT2 of each level, undergoes a 3×3 convolution that reduces the number of channels to 1, and the result is the side output of that level. Bilinear upsampling then matches the resolution of the input image, and a sigmoid activation function maps the predicted values to [0, 1]. The output of each level is supervised, and the topmost side-output saliency map is taken as the final output of the invention.
Since salient object detection is in essence a dense binary classification problem, its output represents the probability that each pixel belongs to a foreground object. Previous approaches therefore typically use cross entropy (commonly applied to classification tasks) as the training loss. However, this simple strategy struggles to direct the network to capture the global structural information of salient targets, resulting in ambiguous boundaries or incomplete detections. To overcome this, the invention constructs a fusion loss to supervise the training process of the network so that it learns more detailed information for boundary localization and structure capture.
The fusion loss is an organic combination of three losses, BCE, IoU and SSIM:
L_fuse = α·ℓ_BCE + β·ℓ_IoU + γ·ℓ_SSIM
where ℓ_BCE, ℓ_IoU and ℓ_SSIM denote the BCE loss, IoU loss and SSIM loss, respectively. The weights α, β and γ vary with the training stage: in the early stage of training, the BCE weight is amplified to speed convergence, while in the middle stage the weights of the IoU and SSIM losses are gradually increased to accelerate model refinement.
Specifically, the first 10 epochs of training are defined as the early stage, with α = 2, β = 0.5, γ = 0.5; later epochs are the middle stage, with α = 1, β = γ = 0.5 + 0.1·(epoch − 10).
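The two-stage weight schedule just described can be written as a small helper (a sketch; the function name is illustrative):

```python
def loss_weights(epoch):
    """Return (alpha, beta, gamma) for a 1-indexed training epoch."""
    if epoch <= 10:               # early stage: amplify BCE to speed convergence
        return 2.0, 0.5, 0.5
    w = 0.5 + 0.1 * (epoch - 10)  # middle stage: IoU/SSIM weights grow per epoch
    return 1.0, w, w

for e in (1, 10, 11, 15):
    print(e, loss_weights(e))
```

By epoch 15 the IoU and SSIM weights have grown to 1.0, matching the BCE weight, so the structural terms dominate refinement from then on.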
More specifically, BCE loss is widely used for binary classification tasks and is defined as:
ℓ_BCE = −Σ_(x,y) [ G(x,y) log S(x,y) + (1 − G(x,y)) log(1 − S(x,y)) ]
where G is the ground truth and S is the predicted saliency map.
IoU loss is used to evaluate the similarity of G and S and can be defined as:
ℓ_IoU = 1 − ( Σ_(x,y) S(x,y)·G(x,y) ) / ( Σ_(x,y) [ S(x,y) + G(x,y) − S(x,y)·G(x,y) ] )
SSIM loss was originally proposed in image quality assessment work and captures structural information. Specifically, P_S and P_G are two patches (size N×N) cropped from the saliency map S and the ground truth G, respectively. SSIM is defined as:
SSIM = ( (2 μ_S μ_G + C1)(2 σ_SG + C2) ) / ( (μ_S² + μ_G² + C1)(σ_S² + σ_G² + C2) )
where μ_S and μ_G denote the means of P_S and P_G, σ_S² and σ_G² their variances, and σ_SG their covariance; C1 and C2 are constants that keep the denominator from being zero (the invention takes C1 = C2 = 0.0001). The SSIM loss is ℓ_SSIM = 1 − SSIM.
And 4, finishing pavement crack detection through the trained end-to-end encoder-decoder residual error network.
The foregoing further describes the invention in connection with specific/preferred embodiments, and the invention is not limited to this description. It will be apparent to those skilled in the art that several alternatives or modifications can be made to the described embodiments without departing from the spirit of the invention, and these alternatives or modifications should be considered within the scope of the invention.
Portions of the invention not described in detail are within the ordinary skill of those in the art.
Claims (9)
1. The pavement crack detection method based on the coder-decoder is characterized by comprising the following steps of:
step (1), acquiring a data set;
the data set adopts the public data set Crack500;
step (2), constructing a new end-to-end encoder-decoder residual network;
the new end-to-end encoder-decoder residual network comprises an encoder network and a decoder network;
the encoder network comprises an input layer, four residual blocks of ResNet-34 and a bridge module;
in the decoder network, multi-level channel weighted fusion modules (MCW) and residual optimization modules (ORM) are alternately used to gradually recover the saliency information encoded in the preceding multi-scale features; the output of the bridge module passes through an MCW and then an ORM, and this alternation repeats; in total, 5 MCWs and 4 ORMs form the decoder module;
step 3, training the constructed end-to-end encoder-decoder residual network through the data set in the step 1;
and 4, finishing pavement crack detection through the trained end-to-end encoder-decoder residual error network.
2. The pavement crack detection method based on the coder-decoder according to claim 1, wherein for the encoder network, ResNet-34 is selected as the backbone network; the whole encoder network comprises an input layer, the four residual blocks of ResNet-34 and a bridge module; the input layer has 64 channels with a kernel size of 3×3 and a stride of 1, and a max-pooling operation is added at the tail of the input layer to further enlarge the receptive field; the convolutional output of the input layer is fed to a batch-normalization layer to balance the scale of the features, followed by a ReLU activation function to enhance the nonlinear representation capability;
formally, given an input image, multi-scale features are extracted at 5 levels; each ResNet-34 residual block is embedded with a channel attention module and a spatial attention module;
an additional bridge module is designed at the end of the encoder network to further capture global context information, which helps accurately locate defect regions; the bridge module comprises three 3×3 convolution blocks with 512 channels and dilation rates of {1, 2, 4} respectively; the outputs of the three blocks are concatenated and fed to a further 3×3 convolution block, whose result serves as the output of the bridge module; notably, each convolution layer is followed by batch normalization and a ReLU activation function.
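The bridge module described in claim 2 can be sketched in PyTorch as follows. This is a minimal illustration, not the patented implementation: the class and helper names are invented, and the padding values (set equal to the dilation rate so that spatial size is preserved) are an assumption the claim does not state.

```python
import torch
import torch.nn as nn


class BridgeModule(nn.Module):
    """Sketch of the dilated-convolution bridge at the end of the encoder.

    Three parallel 3x3 conv branches with dilation rates 1, 2 and 4 are
    applied to the input, their outputs are concatenated along the channel
    dimension, and a final 3x3 conv fuses them back to `channels` channels.
    Per the claim, each conv layer is followed by BatchNorm and ReLU.
    """

    def __init__(self, channels: int = 512):
        super().__init__()

        def conv_bn_relu(in_c: int, out_c: int, dilation: int = 1) -> nn.Sequential:
            # padding == dilation keeps the spatial size unchanged (assumed).
            return nn.Sequential(
                nn.Conv2d(in_c, out_c, kernel_size=3,
                          padding=dilation, dilation=dilation, bias=False),
                nn.BatchNorm2d(out_c),
                nn.ReLU(inplace=True),
            )

        self.branches = nn.ModuleList(
            [conv_bn_relu(channels, channels, d) for d in (1, 2, 4)]
        )
        self.fuse = conv_bn_relu(3 * channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate the three dilated branches, then fuse with a 3x3 conv.
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```

With a 512-channel encoder output the module maps `(N, 512, H, W)` to `(N, 512, H, W)`, so it can sit between the last ResNet-34 block and the first decoder stage without reshaping.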
3. The encoder-decoder-based pavement crack detection method according to claim 2, wherein the inputs of the MCW module are the residual block output EN of the same level in the encoder and the output feature DE of the previous-level decoder stage; in particular, the EN of the bottommost MCW module is the output of the bridge module; the received EN and DE are first concatenated along the channel dimension, and a 1×1 convolution then restores the original channel count; the result passes through a channel attention module and is finally added to the input DE to obtain the output OUT1 of the MCW module.
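A possible PyTorch sketch of the MCW fusion step in claim 3 is given below. The claim names "a channel attention module" without specifying its internals, so an SE-style squeeze-and-excitation block is assumed here; the class names and the reduction ratio are likewise illustrative.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """SE-style channel attention (an assumed form; the claim only says
    'channel attention module' and does not define its internals)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reweight each channel by a learned [0, 1] factor.
        return x * self.fc(self.pool(x))


class MCWModule(nn.Module):
    """Multi-level channel weighted fusion: concatenate the encoder feature
    EN with the decoder feature DE, reduce back to the original channel
    count with a 1x1 conv, apply channel attention, then add DE residually
    to obtain OUT1 (following claim 3)."""

    def __init__(self, channels: int):
        super().__init__()
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.ca = ChannelAttention(channels)

    def forward(self, en: torch.Tensor, de: torch.Tensor) -> torch.Tensor:
        x = self.reduce(torch.cat([en, de], dim=1))
        return self.ca(x) + de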
4. The encoder-decoder-based pavement crack detection method according to claim 3, wherein the input of the ORM module is the output OUT1 of the MCW module at the current level; a 3×3 convolution is applied first, followed by a channel shuffle operation, then a further 3×3 convolution and a 1×1 convolution; the result is finally added to the input OUT1 to obtain the output OUT2, namely the output feature DE of the decoder stage; BN and ReLU operations are performed after each 3×3 convolution.
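The ORM pipeline of claim 4 (3×3 conv, channel shuffle, 3×3 conv, 1×1 conv, residual add) can be sketched as below. The shuffle is implemented in the ShuffleNet style, and the group count is an assumed parameter that the claim does not specify.

```python
import torch
import torch.nn as nn


def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """ShuffleNet-style channel shuffle: split channels into `groups`,
    then interleave them (the group count is an assumption here)."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)
    return x.transpose(1, 2).reshape(n, c, h, w)


class ORMModule(nn.Module):
    """Residual optimization module per claim 4: 3x3 conv -> channel
    shuffle -> 3x3 conv -> 1x1 conv, then a residual add with the input
    OUT1; BN and ReLU follow each 3x3 conv."""

    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        self.groups = groups
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, out1: torch.Tensor) -> torch.Tensor:
        x = self.conv1(out1)
        x = channel_shuffle(x, self.groups)
        x = self.conv3(self.conv2(x))
        return x + out1  # OUT2, the decoder feature DE of this level
```

The shuffle lets the 1×1/3×3 convolutions mix information across channel groups at negligible cost, which is why it sits between the two 3×3 convs.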
5. The encoder-decoder-based pavement crack detection method according to claim 4, wherein the specific procedure of step (3) is as follows:
training the constructed end-to-end encoder-decoder residual network on the data set from step (1); a deep supervision mechanism is adopted during training: a 3×3 convolution is applied to the output of each stage of the decoder network, namely the OUT2 of each level, reducing the number of channels to 1, and the result serves as the side output of that level; bilinear upsampling then restores the resolution to match the input image, and a sigmoid activation function maps the predicted values to [0, 1]; the output of every level is supervised, and the top-level side-output saliency map is taken as the final output of the invention;
a fusion loss is constructed to supervise the training process of the network so that it learns more detailed information when capturing boundary positions and structures;
the fusion loss is an organic fusion of three losses, BCE, IoU and SSIM, i.e. L = α·L_BCE + β·L_IoU + γ·L_SSIM;
wherein L_BCE, L_IoU and L_SSIM denote the BCE loss, the IoU loss and the SSIM loss, respectively; the weights α, β and γ change with the training stage: in the early stage of training the BCE weight is amplified to speed up convergence, and in the middle stage the weights of the IoU and SSIM losses are gradually increased to accelerate model refinement.
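The deep-supervision side-output head of claim 5 (3×3 convolution to one channel, bilinear upsampling to the input resolution, sigmoid) can be sketched as follows; the class name, padding, and `align_corners` setting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SideOutput(nn.Module):
    """Deep-supervision side-output head (a sketch): a 3x3 conv reduces a
    decoder feature OUT2 to a single channel, bilinear upsampling restores
    the input resolution, and a sigmoid maps predictions to [0, 1]."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.score = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)

    def forward(self, out2: torch.Tensor, input_size: tuple) -> torch.Tensor:
        s = self.score(out2)                      # (N, 1, h, w)
        s = F.interpolate(s, size=input_size,     # match input resolution
                          mode="bilinear", align_corners=False)
        return torch.sigmoid(s)                   # saliency map in [0, 1]
```

One such head per decoder level yields the per-level side outputs that are each supervised against the ground truth, with the top-level map serving as the final prediction.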
6. The encoder-decoder-based pavement crack detection method according to claim 5, wherein the first 10 epochs are defined as the early stage, with α=2, β=0.5, γ=0.5; the epochs thereafter are defined as the middle stage, with α=1, β=0.5+0.1×(epoch−10), γ=0.5+0.1×(epoch−10).
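The stage-dependent weight schedule of claim 6 can be written directly as a small helper; the function name and the 1-based epoch convention are assumptions.

```python
def loss_weights(epoch: int) -> tuple:
    """Fusion-loss weights (alpha, beta, gamma) per claim 6.

    Epochs are counted from 1 (assumed). The first 10 epochs form the
    early stage with alpha=2, beta=gamma=0.5; afterwards alpha drops to 1
    and beta, gamma grow by 0.1 for every epoch past epoch 10.
    """
    if epoch <= 10:
        return 2.0, 0.5, 0.5
    extra = 0.1 * (epoch - 10)
    return 1.0, 0.5 + extra, 0.5 + extra
```

The returned triple would weight L_BCE, L_IoU and L_SSIM respectively, so early training is dominated by BCE for fast convergence while IoU and SSIM progressively take over to refine boundaries and structure.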
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211700351.4A CN116091426A (en) | 2022-12-28 | 2022-12-28 | Pavement crack detection method based on coder-decoder |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116091426A true CN116091426A (en) | 2023-05-09 |
Family
ID=86213136
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211700351.4A Withdrawn CN116091426A (en) | 2022-12-28 | 2022-12-28 | Pavement crack detection method based on coder-decoder |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116091426A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116596930A (en) * | 2023-07-18 | 2023-08-15 | 吉林大学 | Semi-supervised multitasking real image crack detection system and method |
CN116596930B (en) * | 2023-07-18 | 2023-09-22 | 吉林大学 | Semi-supervised multitasking real image crack detection system and method |
CN117173182A (en) * | 2023-11-03 | 2023-12-05 | 厦门微亚智能科技股份有限公司 | Defect detection method, system, equipment and medium based on coding and decoding network |
CN117173182B (en) * | 2023-11-03 | 2024-03-19 | 厦门微亚智能科技股份有限公司 | Defect detection method, system, equipment and medium based on coding and decoding network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20230509 |