CN116091426A - Pavement crack detection method based on coder-decoder - Google Patents

Pavement crack detection method based on coder-decoder

Info

Publication number
CN116091426A
Authority
CN
China
Prior art keywords
module
network
decoder
output
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211700351.4A
Other languages
Chinese (zh)
Inventor
颜成钢
陈雨中
杨浩男
张文豪
武松鹤
朱尊杰
高宇涵
孙垚棋
陈楚翘
王鸿奎
王廷宇
殷海兵
张继勇
李宗鹏
赵治栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangdian Lishui Research Institute Co Ltd
Original Assignee
Hangdian Lishui Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangdian Lishui Research Institute Co Ltd filed Critical Hangdian Lishui Research Institute Co Ltd
Priority: CN202211700351.4A
Publication: CN116091426A
Legal status: Withdrawn

Classifications

    • G06T7/0004 Industrial image inspection
    • G06N3/04 Neural networks; architecture, e.g. interconnection topology
    • G06N3/08 Neural networks; learning methods
    • G06T9/002 Image coding using neural networks
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06T2207/20081 Training; learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30184 Infrastructure
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pavement crack detection method based on an encoder-decoder, which can effectively detect pavement cracks by constructing a new end-to-end encoder-decoder residual network. By combining a deep supervision mechanism with a fusion loss, the model can capture fine details, making the network easier to optimize. The model directly outputs a high-quality saliency map that is close to the corresponding ground truth. The resulting saliency map uniformly highlights clearly defined defective objects while effectively filtering out background noise. The model is robust, requires no post-processing, and runs in real time on a single GPU.

Description

Pavement crack detection method based on coder-decoder
Technical Field
The invention belongs to the field of saliency detection, and particularly relates to a pavement crack detection method based on an encoder-decoder.
Background
China's highway network is the largest in the world. Over the past few decades it has developed rapidly, and the road network now covers all regions of the country, making travel convenient. With the rapid development of China's economy, the highway network has been further expanded and improved. At present, China has built a large number of highways, and their pavement quality and facilities have improved. In addition, China has strengthened the supervision of highway construction to ensure its quality and safety.
Pavement cracks are a common road defect, usually caused by aging of road materials, climate change, vehicle overloading, construction quality problems, and the like. Cracks not only spoil the appearance of the road but can also affect its safety.
On the one hand, pavement cracks make the road surface uneven, making vehicles harder to control and increasing the risk of traffic accidents. In humid climates in particular, water collecting in cracks can cause vehicles to skid, posing a greater threat to driving safety.
On the other hand, cracks accelerate the aging and deterioration of road materials, damaging the road structure so that it must be repaired or rebuilt. This not only increases the cost of maintaining and building roads but also shortens their service life.
Therefore, detecting and repairing road cracks in time is very important for maintaining road safety and quality. Once cracks are found, the road should be repaired promptly to avoid greater damage.
The common road crack detection methods include the following:
Manual detection method: personnel inspect the road on site, record the condition of pavement cracks, and judge information such as crack type, position and length. This method is simple and easy to implement, but it is inefficient, and it is difficult to detect all cracks on a road comprehensively.
Camera detection method: the road surface is photographed by a camera mounted on a vehicle or by an unmanned aerial vehicle, and cracks are identified using a computer-aided analysis system. This method can rapidly detect pavement cracks, but it requires some equipment investment and technical expertise.
Laser scanning method: the road surface is scanned by a laser scanner mounted on a vehicle, and cracks are identified using a computer-aided analysis system. This method can detect pavement cracks rapidly and accurately, but it likewise requires equipment investment and technical expertise.
Machine learning method: a large amount of pavement crack image data is collected manually, and a model is trained using machine learning techniques to automatically identify cracks on the road surface. This method can detect pavement cracks rapidly and accurately, needs no special equipment, and can use an ordinary camera or mobile phone for image capture.
Compared with the traditional manual detection, the automatic surface defect detection has the advantages of high precision and high efficiency, and is an effective method for reducing the labor cost. Because of the rapid growth of road network driving mileage, automatic detection of road surface cracks plays a vital role in an intelligent traffic system. Pavement crack detection systems typically include the removal of non-crack images and the quantitative detection of cracks.
Greater depth does not necessarily result in better detection accuracy. Experiments show that selecting a network architecture of appropriate depth can ensure detection accuracy while improving detection speed. Although great progress has been made in DCNN-based crack detection, how to obtain more detailed crack features remains to be explored. Road crack detection faces difficulties such as small cracks, heavy image noise, unclear boundaries, and incomplete region information.
Disclosure of Invention
Aiming at the defects existing in the prior art, the invention provides a pavement crack detection method based on a coder-decoder.
The invention proposes a new end-to-end encoder-decoder residual network for saliency detection of defective objects. The designed multi-level channel weighted fusion module and residual optimization module are used alternately to gradually recover predicted spatial saliency from the encoded multi-level semantic features, promoting the detection of complete defect objects and suppressing the non-salient background. Compared with existing saliency detection methods, the method can accurately segment complete defect objects with well-defined boundaries and effectively filter out irrelevant background noise.
A pavement crack detection method based on a coder-decoder comprises the following steps:
step (1), acquiring a data set;
the data set adopts the public dataset Crack500;
step (2), constructing a new end-to-end encoder-decoder residual network;
the new end-to-end encoder-decoder residual network comprises an encoder network and a decoder network;
the encoder network includes an input layer, four residual blocks of ResNet-34, and a bridge module.
In the decoder network, multi-level channel weighted fusion modules (MCW) and residual optimization modules (ORM) are used alternately to progressively recover the saliency information encoded in the preceding multi-scale features. The output of the bridge module passes through an MCW and then an ORM, and this alternation repeats; in total, 5 MCWs and 4 ORMs form the decoder.
And 3, training the constructed end-to-end encoder-decoder residual network through the data set in the step 1.
And 4, finishing pavement crack detection through the trained end-to-end encoder-decoder residual error network.
Further, for the encoder network, ResNet-34 is selected as the backbone network. The entire encoder network contains one input layer, the four residual blocks of ResNet-34, and one bridge module. The input layer has 64 channels, a kernel size of 3×3 and a stride of 1, and a max-pooling operation is added at the tail of the input layer to further enlarge the receptive field. The convolutional output of the input layer is fed into a batch normalization layer to balance the scale of the features, followed by a ReLU activation function to enhance the nonlinear representation capability.
Formally, given an input image, multi-scale features are extracted at 5 levels. Each ResNet-34 residual block is embedded with a channel attention module and a spatial attention module.
An additional bridge module is designed at the end of the encoder network to further capture global context information, which helps to accurately locate the regions of defective objects. The bridge module comprises three 512-channel 3×3 convolution blocks with dilation rates of {1,2,4}; the outputs of the three blocks are concatenated and then fed to a further 3×3 convolution block, whose output is the output of the bridge module. Notably, each convolution layer is followed by batch normalization and a ReLU activation function.
Further, the inputs of the MCW module are the residual block output EN of the same level in the encoder and the output feature DE of the decoder stage of the previous level. In particular, the EN of the bottommost MCW is the output of the bridge module. The received EN and DE are first concatenated along the channel dimension, then a 1×1 convolution restores the original channel count. The result passes through a channel attention module and is finally added to the input DE to obtain the output OUT1 of the MCW module.
Further, the input of the ORM module is the MCW module output OUT1 of the current level. The input is passed through a 3×3 convolution, then a channel shuffle operation, then a 3×3 convolution and a 1×1 convolution, and is finally added to the input OUT1 to obtain the output OUT2, i.e. the output feature DE of this decoder stage. Notably, each 3×3 convolution is followed by BN and ReLU operations.
Further, the specific method in the step 3 is as follows:
The constructed end-to-end encoder-decoder residual network is trained with the dataset from step 1. A deep supervision mechanism is adopted during training: the output of each decoder stage, i.e. the OUT2 of each level, undergoes a 3×3 convolution that reduces the number of channels to 1, and the result is the side output of that level. Bilinear upsampling then restores the resolution of the input image, and a sigmoid activation function maps the predicted values to [0,1]. The output of every level is supervised, and the top-level side-output saliency map is taken as the final output of the invention.
A fusion loss is constructed to supervise the training process of the network so that it learns more detailed information for boundary localization and structure capture.
The fusion loss is an organic fusion of the BCE, IoU and SSIM losses:

l = α·l_BCE + β·l_IoU + γ·l_SSIM

where l_BCE, l_IoU and l_SSIM denote the BCE loss, IoU loss and SSIM loss, respectively. The weights α, β and γ change with the training stage: in the early stage of training the BCE weight is amplified to speed up convergence, while in the middle stage the weights of the IoU and SSIM losses are gradually increased to accelerate model refinement.
Further, the first 10 epochs are defined as the early stage, with α=2, β=0.5, γ=0.5; the later epochs are defined as the middle stage, with α=1, β=0.5+0.1·(epoch - 10), γ=0.5+0.1·(epoch - 10).
Further, the BCE loss is defined as:

l_BCE = -Σ_(x,y) [ G(x,y)·log S(x,y) + (1 - G(x,y))·log(1 - S(x,y)) ]

where G is the ground truth and S is the predicted saliency map.
Further, the IoU loss is used to evaluate the similarity of G and S and can be defined as:

l_IoU = 1 - [ Σ_(x,y) S(x,y)·G(x,y) ] / [ Σ_(x,y) ( S(x,y) + G(x,y) - S(x,y)·G(x,y) ) ]
further, the structural information is captured by SSIM loss. In particular, the method comprises the steps of,
Figure SMS_6
Figure SMS_7
and
Figure SMS_8
two blocks (size=n×n) cut out from the saliency map S and ground truth G, respectively. SSIM is defined as:
Figure SMS_9
the invention has the following beneficial effects:
the invention is a new end-to-end encoder decoder residual error network, which can effectively detect cracks of the occipital layer. By combining the deep supervision mechanism and fusion loss, the model can capture fine details, so that the network is more easily optimized. The model can directly output a high-quality saliency map, which is almost close to a corresponding ground reality value. The resulting saliency map uniformly highlights clearly defined defective objects while effectively filtering out background noise. Our model is robust and does not require any post-processing, and is faster in real-time on a single GPU.
Drawings
FIG. 1 is a schematic diagram of an end-to-end encoder-decoder residual network architecture according to an embodiment of the present invention;
fig. 2 is a schematic diagram of the residual block structure of ResNet-34 according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a bridge module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a multi-level channel weighted fusion Module (MCW) according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a residual optimization module (ORM) according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the drawings and examples.
A pavement crack detection method based on a coder-decoder comprises the following steps:
step (1), acquiring a data set;
the data set adopts the public dataset Crack500;
step (2), constructing a new end-to-end encoder-decoder residual network;
as shown in fig. 1, the new end-to-end encoder-decoder residual network includes an encoder network and a decoder network;
for the encoder network, resNet-34 is selected as the backbone network for the encoder portion. On the one hand, by using layer-jump connections (i.e. shortcut identity mapping) on a simple network (simply stacking convolutional layers) employed by VGG networks, the residual learning framework is easier to optimize. Residual structures, on the other hand, are easy to implement for deeper networks, and still have a low complexity. In this way, the model can obtain accuracy benefits from significantly increased depth, i.e., more context information is covered, due to the expansion of the acceptance field. The entire encoder network contains one input layer, four residual blocks of ResNet-34, and one bridge module. Furthermore, unlike the original ResNet-34, the input layer of the present invention has 64 channels, a kernel size of 3×3 and stride of 1, instead of a kernel size of 7×7 and stride of 2; the maximum pooling operation is then added at the end of the input layer to further expand the size of the acceptance field. The present invention performs such operations, encoding better spatial details, respectively, and capturing better filter responses before and after pooling operations. The convolved output of the input layer is input to a batch normalization layer to balance the scale of the features, followed by a ReLU activation function to enhance the nonlinear representation capability.
Formally, given an input image, multi-scale features are extracted at 5 levels. Attention mechanisms are considered capable of learning accurate, compact features and are widely used in a variety of computer vision tasks owing to their effectiveness and efficiency. Each ResNet-34 residual block is embedded with a channel attention module and a spatial attention module (as shown in fig. 2).
Salient object detection requires complete segmentation of uniform regions, which is more difficult than edge detection, which needs only simple gradient information. For this purpose, the invention designs an additional bridge module at the end of the encoder network to further capture global context information, which helps to accurately locate the regions of defective objects. As shown in fig. 3, the bridge module comprises three 512-channel 3×3 convolution blocks with dilation rates of {1,2,4}; the outputs of the three blocks are concatenated and then fed to a further 3×3 convolution block, whose output is the output of the bridge module. Notably, each convolution layer is followed by batch normalization and a ReLU activation function.
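A minimal sketch of the bridge module follows: three parallel 3×3 branches at dilation rates 1, 2 and 4, concatenated and fused by one more 3×3 convolution, each convolution followed by BN and ReLU as described. The 14×14 feature resolution and the fused output keeping 512 channels are assumptions.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, dilation=1):
    # 3x3 conv -> BN -> ReLU; padding equals dilation so spatial size is kept
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=dilation, dilation=dilation),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Bridge(nn.Module):
    """Three 512-channel 3x3 branches with dilation {1,2,4}, concatenated
    and fused by a final 3x3 convolution block."""
    def __init__(self, ch=512):
        super().__init__()
        self.b1 = conv_bn_relu(ch, ch, dilation=1)
        self.b2 = conv_bn_relu(ch, ch, dilation=2)
        self.b3 = conv_bn_relu(ch, ch, dilation=4)
        self.fuse = conv_bn_relu(3 * ch, ch)

    def forward(self, x):
        return self.fuse(torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1))

f = torch.randn(1, 512, 14, 14)  # deepest encoder feature (shape assumed)
out = Bridge()(f)
print(tuple(out.shape))          # (1, 512, 14, 14)
```

The increasing dilation rates widen the receptive field without downsampling, which is what lets the bridge gather global context while preserving localization.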
In the decoder network, multi-level channel weighted fusion modules (MCW) and residual optimization modules (ORM) are used alternately to progressively recover the saliency information encoded in the preceding multi-scale features. The output of the bridge module passes through an MCW and then an ORM, and this alternation repeats; in total, 5 MCWs and 4 ORMs form the decoder. Feature maps produced directly by the encoder network tend to focus on non-salient background regions, mainly because global context information is not fully considered, which leads to incorrect saliency predictions. To solve this problem, the invention designs the multi-level channel weighted fusion module (MCW) to capture more effective feature regions and filter out background noise interference. After processing by the proposed MCW, the model focuses more on the regions of defect objects and their edges.
As shown in fig. 4, the inputs of the MCW module are the residual block output EN of the same level in the encoder and the output feature DE of the decoder stage of the previous level. In particular, the EN of the bottommost MCW is the output of the bridge module. The received EN and DE are first concatenated along the channel dimension, then a 1×1 convolution restores the original channel count. The result passes through a channel attention module and is finally added to the input DE to obtain the output OUT1 of the MCW module.
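The MCW data flow (concatenate, 1×1 reduce, channel attention, residual add) can be sketched as below. The internal form of the channel attention is not specified in the text, so an SE-style squeeze-and-excitation block is assumed here, as are the channel and spatial sizes.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    # SE-style channel attention (assumed form; reduction ratio r is a guess)
    def __init__(self, ch, r=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid(),
        )
    def forward(self, x):
        return x * self.fc(x)

class MCW(nn.Module):
    """Concatenate encoder feature EN and previous decoder feature DE,
    restore the channel count with a 1x1 conv, apply channel attention,
    then add DE residually to produce OUT1."""
    def __init__(self, ch):
        super().__init__()
        self.reduce = nn.Conv2d(2 * ch, ch, kernel_size=1)
        self.ca = ChannelAttention(ch)
    def forward(self, en, de):
        return self.ca(self.reduce(torch.cat([en, de], dim=1))) + de

en = torch.randn(1, 64, 56, 56)  # same-level encoder feature (sizes assumed)
de = torch.randn(1, 64, 56, 56)  # previous decoder-stage feature
out1 = MCW(64)(en, de)
print(tuple(out1.shape))         # (1, 64, 56, 56)
```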
As shown in fig. 5, the input of the ORM module is the MCW module output OUT1 of the current level. The input is passed through a 3×3 convolution, then a channel shuffle operation, then a 3×3 convolution and a 1×1 convolution, and is finally added to the input OUT1 to obtain the output OUT2, i.e. the output feature DE of this decoder stage. Notably, each 3×3 convolution is followed by BN and ReLU operations.
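The ORM sequence (3×3 conv, channel shuffle, 3×3 conv, 1×1 conv, residual add) can be sketched as follows. The group count used for the shuffle is not given in the text and is an assumption here.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # Interleave channels across groups (ShuffleNet-style shuffle)
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

class ORM(nn.Module):
    """3x3 conv -> channel shuffle -> 3x3 conv -> 1x1 conv, added residually
    to the input OUT1; each 3x3 conv is followed by BN and ReLU."""
    def __init__(self, ch, groups=4):  # groups assumed
        super().__init__()
        self.groups = groups
        self.conv1 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                   nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                   nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        self.conv3 = nn.Conv2d(ch, ch, 1)
    def forward(self, out1):
        x = channel_shuffle(self.conv1(out1), self.groups)
        return self.conv3(self.conv2(x)) + out1

t = torch.randn(1, 64, 56, 56)  # OUT1 of the current level (sizes assumed)
out2 = ORM(64)(t)
print(tuple(out2.shape))        # (1, 64, 56, 56)
```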
And 3, training the constructed end-to-end encoder-decoder residual network through the data set in the step 1.
The constructed end-to-end encoder-decoder residual network is trained with the dataset from step 1. A deep supervision mechanism is adopted during training: the output of each decoder stage, i.e. the OUT2 of each level, undergoes a 3×3 convolution that reduces the number of channels to 1, and the result is the side output of that level. Bilinear upsampling then restores the resolution of the input image, and a sigmoid activation function maps the predicted values to [0,1]. The output of every level is supervised, and the top-level side-output saliency map is taken as the final output of the invention.
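One deep-supervision head as described (3×3 conv to one channel, bilinear upsampling to the input resolution, sigmoid) can be sketched like this; the weights below are random, so this only illustrates the shapes and value range, not trained behaviour.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def side_output(feature, input_size):
    """3x3 conv reduces the feature to 1 channel, bilinear upsampling matches
    the input resolution, and sigmoid maps predictions into [0, 1]."""
    head = nn.Conv2d(feature.shape[1], 1, kernel_size=3, padding=1)
    s = head(feature)
    s = F.interpolate(s, size=input_size, mode="bilinear", align_corners=False)
    return torch.sigmoid(s)

feat = torch.randn(1, 64, 56, 56)  # one decoder-stage output OUT2 (sizes assumed)
pred = side_output(feat, (224, 224))
print(tuple(pred.shape))           # (1, 1, 224, 224)
print(bool(((pred >= 0) & (pred <= 1)).all()))  # True
```

During training, every such side output would be compared against the (resized) ground truth, while the top-level map serves as the final prediction.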
Since salient object detection is in essence a dense binary classification problem, its output represents the probability that each pixel belongs to a foreground object. Previous approaches therefore always use cross entropy (typically applied to classification tasks) as the training loss. However, it is difficult for this simple strategy to guide the network to capture the global structure of salient targets, resulting in ambiguous boundaries or incomplete detections. To overcome this problem, the present invention constructs a fusion loss to supervise the training process of the network so that it learns more detailed information for boundary localization and structure capture.
The fusion loss is an organic fusion of the BCE, IoU and SSIM losses:

l = α·l_BCE + β·l_IoU + γ·l_SSIM

where l_BCE, l_IoU and l_SSIM denote the BCE loss, IoU loss and SSIM loss, respectively. The weights α, β and γ change with the training stage: in the early stage of training the BCE weight is amplified to speed up convergence, while in the middle stage the weights of the IoU and SSIM losses are gradually increased to accelerate model refinement.
Specifically, the first 10 epochs of training are defined as the early stage, with α=2, β=0.5, γ=0.5; the later epochs are defined as the middle stage, with α=1, β=0.5+0.1·(epoch - 10), γ=0.5+0.1·(epoch - 10).
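The stage-dependent weight schedule above is small enough to state as a function. Whether epochs are counted from 0 or 1 is not specified in the text; 0-based counting is assumed here.

```python
def loss_weights(epoch):
    """Return (alpha, beta, gamma) for the fusion loss: for the first 10
    epochs (early stage) alpha=2, beta=gamma=0.5; afterwards alpha=1 and the
    IoU/SSIM weights grow by 0.1 per epoch past epoch 10."""
    if epoch < 10:                 # epochs 0..9 taken as the early stage
        return 2.0, 0.5, 0.5
    ramp = 0.5 + 0.1 * (epoch - 10)
    return 1.0, ramp, ramp

print(loss_weights(0))    # (2.0, 0.5, 0.5)
print(loss_weights(15))   # (1.0, 1.0, 1.0)
```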
More specifically, the BCE loss is widely used for binary classification and is defined as:

l_BCE = -Σ_(x,y) [ G(x,y)·log S(x,y) + (1 - G(x,y))·log(1 - S(x,y)) ]

where G is the ground truth and S is the predicted saliency map.
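The BCE term can be sketched in a few lines of NumPy. This is an illustrative sketch (mean-reduced, with an epsilon clip to avoid log(0)), not the patent's implementation.

```python
import numpy as np

def bce_loss(S, G, eps=1e-7):
    """Pixel-wise binary cross-entropy between the predicted saliency map S
    and ground truth G (both in [0, 1]), averaged over all pixels."""
    S = np.clip(S, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(G * np.log(S) + (1.0 - G) * np.log(1.0 - S)))

G = np.array([[1.0, 0.0], [1.0, 0.0]])
perfect = bce_loss(G, G)        # near-zero loss for a perfect prediction
poor = bce_loss(1.0 - G, G)     # large loss when the prediction is inverted
print(perfect < 1e-5, poor > 5) # True True
```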
The IoU loss is used to evaluate the similarity of G and S and can be defined as:

l_IoU = 1 - [ Σ_(x,y) S(x,y)·G(x,y) ] / [ Σ_(x,y) ( S(x,y) + G(x,y) - S(x,y)·G(x,y) ) ]
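The soft IoU term admits an equally short sketch; the small epsilon guarding the denominator is an added assumption.

```python
import numpy as np

def iou_loss(S, G, eps=1e-7):
    """Soft IoU loss: one minus the intersection-over-union of the predicted
    saliency map S and ground truth G, so identical maps give loss 0."""
    inter = np.sum(S * G)
    union = np.sum(S) + np.sum(G) - inter
    return float(1.0 - inter / (union + eps))

G = np.zeros((4, 4)); G[1:3, 1:3] = 1.0
print(round(iou_loss(G, G), 6))               # 0.0 for a perfect prediction
print(round(iou_loss(np.ones((4, 4)), G), 4)) # 0.75: IoU = 4/16
```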
The SSIM loss was originally proposed for image quality assessment and captures structural information. Specifically, let P_S and P_G be two patches (size N×N) cut from the saliency map S and the ground truth G, respectively. SSIM is defined as:

SSIM = ( (2·μ_S·μ_G + C1)·(2·σ_SG + C2) ) / ( (μ_S^2 + μ_G^2 + C1)·(σ_S^2 + σ_G^2 + C2) )

where μ_S and μ_G denote the means of the patches P_S and P_G, σ_S^2 and σ_G^2 their variances, and σ_SG their covariance; C1 and C2 are constants that keep the denominator away from zero (in the present invention C1 = C2 = 0.0001).
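The per-patch SSIM formula above translates directly into NumPy; a sketch follows, using the stated constants C1 = C2 = 0.0001. (In practice SSIM would be averaged over many sliding patches; the patch extraction itself is omitted here.)

```python
import numpy as np

def ssim(P_S, P_G, C1=0.0001, C2=0.0001):
    """SSIM between two N x N patches from the saliency map and the ground
    truth, built from their means, variances and covariance."""
    mu_s, mu_g = P_S.mean(), P_G.mean()
    var_s, var_g = P_S.var(), P_G.var()
    cov = ((P_S - mu_s) * (P_G - mu_g)).mean()
    return ((2 * mu_s * mu_g + C1) * (2 * cov + C2)) / (
        (mu_s**2 + mu_g**2 + C1) * (var_s + var_g + C2)
    )

rng = np.random.default_rng(0)
P = rng.random((8, 8))
print(round(float(ssim(P, P)), 6))     # 1.0: identical patches
print(float(ssim(P, 1.0 - P)) < 1.0)   # True: dissimilar structure scores lower
```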
And 4, finishing pavement crack detection through the trained end-to-end encoder-decoder residual error network.
The foregoing is a further detailed description of the invention in connection with specific/preferred embodiments, and it is not intended that the invention be limited to such description. It will be apparent to those skilled in the art that several alternatives or modifications can be made to the described embodiments without departing from the spirit of the invention, and these alternatives or modifications should be considered to be within the scope of the invention.
The invention, in part not described in detail, is within the skill of those skilled in the art.

Claims (9)

1. The pavement crack detection method based on the coder-decoder is characterized by comprising the following steps of:
step (1), acquiring a data set;
the data set adopts the public dataset Crack500;
step (2), constructing a new end-to-end encoder-decoder residual network;
the new end-to-end encoder-decoder residual network comprises an encoder network and a decoder network;
the encoder network comprises an input layer, four residual blocks of ResNet-34 and a bridge module;
in the decoder network, a multi-level channel weighted fusion Module (MCW) and a residual optimization module (ORM) are alternately used to gradually recover significance information encoded in previous multi-scale features; the output of the bridge module passes through the MCW and then the ORM, and the bridge module is circularly reciprocated, and a decoder module is formed by 5 MCWs and 4 ORMs;
step 3, training the constructed end-to-end encoder-decoder residual network through the data set in the step 1;
and 4, finishing pavement crack detection through the trained end-to-end encoder-decoder residual error network.
2. The pavement crack detection method based on the coder-decoder according to claim 1, wherein for the encoder network, ResNet-34 is selected as the backbone network; the entire encoder network comprises one input layer, the four residual blocks of ResNet-34, and one bridge module; the input layer has 64 channels, a kernel size of 3×3 and a stride of 1, and a max-pooling operation is added at the tail of the input layer to further enlarge the receptive field; the convolutional output of the input layer is fed into a batch normalization layer to balance the scale of the features, followed by a ReLU activation function to enhance the nonlinear representation capability;
formally, given an input image, multi-scale features are extracted at 5 levels; each ResNet-34 residual block is embedded with a channel attention module and a spatial attention module;
an additional bridge module is designed at the end of the encoder network to further capture global context information, which helps to accurately locate the regions of defective objects; the bridge module comprises three 512-channel 3×3 convolution blocks with dilation rates of {1,2,4}; the outputs of the three blocks are concatenated and then fed to a further 3×3 convolution block, whose output is the output of the bridge module; each convolution layer is followed by batch normalization and a ReLU activation function.
3. The pavement crack detection method based on the coder-decoder according to claim 2, wherein the inputs of the MCW module are the residual block output EN of the same level in the encoder and the output feature DE of the decoder stage of the previous level; in particular, the EN of the bottommost MCW is the output of the bridge module; the received EN and DE are first concatenated along the channel dimension, then a 1×1 convolution restores the original channel count; the result passes through a channel attention module and is finally added to the input DE to obtain the output OUT1 of the MCW module.
4. The codec-based pavement crack detection method according to claim 3, wherein the input of the ORM module is the MCW module output OUT1 of the current level; a 3×3 convolution is first applied, followed by a channel shuffle operation, then a 3×3 convolution and a 1×1 convolution, and the result is finally added to the input OUT1 to obtain the output OUT2, namely the output feature DE of the decoder stage; BN and ReLU operations are performed after each 3×3 convolution.
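The ORM pipeline can be sketched as below; the group count of the channel shuffle is not stated in the claim, so `groups=4` is an assumption:

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=4):
    # Reorder channels across groups (as in ShuffleNet); group count assumed.
    b, c, h, w = x.shape
    return (x.view(b, groups, c // groups, h, w)
             .transpose(1, 2).reshape(b, c, h, w))

class ORM(nn.Module):
    """Claim 4: 3x3 conv -> channel shuffle -> 3x3 conv -> 1x1 conv, plus a
    residual connection from the input OUT1; BN + ReLU after each 3x3 conv."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                   nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                   nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
        self.conv3 = nn.Conv2d(ch, ch, 1)
    def forward(self, out1):
        x = self.conv1(out1)
        x = channel_shuffle(x)
        return self.conv3(self.conv2(x)) + out1  # residual add gives OUT2

out1 = torch.randn(1, 64, 32, 32)
out2 = ORM(64)(out1)
```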
5. The codec-based pavement crack detection method according to claim 4, wherein the specific method of step 3 is as follows:
training the constructed end-to-end encoder-decoder residual network on the data set of step 1, adopting a deep supervision mechanism during training: a 3×3 convolution is applied to the output of each stage of the decoder network, namely OUT2 of each level, reducing the number of channels to 1, and the result serves as the side output of that level; bilinear upsampling then matches the resolution of the input image, and a sigmoid activation function maps the predicted values to [0,1]; the output of every level is supervised, and the top-level side-output saliency map is taken as the final output of the invention;
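The per-level side-output head described above can be sketched as follows; the 3×3 convolution here is freshly initialised purely for illustration, whereas in training it would be a learned layer per level:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def side_output(out2, target_size):
    """Step 3 sketch: reduce a decoder feature OUT2 to 1 channel with a 3x3
    conv, bilinearly upsample to the input resolution, and map the logits to
    [0, 1] with a sigmoid to obtain a side-output saliency map."""
    head = nn.Conv2d(out2.shape[1], 1, 3, padding=1)  # illustrative weights
    s = F.interpolate(head(out2), size=target_size,
                      mode='bilinear', align_corners=False)
    return torch.sigmoid(s)

# e.g. a 64-channel decoder feature at 56x56 for a 224x224 input image
s = side_output(torch.randn(1, 64, 56, 56), (224, 224))
```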
constructing a fusion loss to supervise the training process of the network, so as to capture more detailed boundary-position and structural information;
the fusion loss is an organic fusion of the BCE, IoU and SSIM losses:
L_fuse = α·L_BCE + β·L_IoU + γ·L_SSIM
wherein L_BCE, L_IoU and L_SSIM are the BCE loss, the IoU loss and the SSIM loss, respectively; the weights α, β and γ vary with the training stage: in the early stage of training, the weight of BCE is amplified to speed up convergence, while in the middle stage the weights of the IoU and SSIM losses are gradually increased to accelerate model refinement.
6. The codec-based pavement crack detection method according to claim 5, wherein the first 10 epochs are defined as the early stage, with α=2, β=0.5, γ=0.5; later epochs are defined as the middle stage, with α=1, β=0.5+0.1·(epoch−10), γ=0.5+0.1·(epoch−10).
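The weight schedule of claim 6 can be written as a small helper; whether epochs are counted from 0 or 1 is not stated in the claim, so zero-based counting is assumed here:

```python
def loss_weights(epoch):
    """Claim 6 schedule (zero-based epoch index assumed): the first 10 epochs
    use alpha=2, beta=gamma=0.5; afterwards alpha=1 while beta and gamma grow
    by 0.1 per epoch past the 10th."""
    if epoch < 10:
        return 2.0, 0.5, 0.5
    return 1.0, 0.5 + 0.1 * (epoch - 10), 0.5 + 0.1 * (epoch - 10)

early = loss_weights(0)    # BCE-dominated stage
mid = loss_weights(15)     # IoU/SSIM weights have grown by 0.5
```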
7. The codec-based pavement crack detection method of claim 5, wherein the BCE loss is defined as:
L_BCE = −Σ_(r,c) [ G(r,c)·log S(r,c) + (1 − G(r,c))·log(1 − S(r,c)) ]
wherein G is the ground truth and S is the predicted saliency map.
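A direct NumPy transcription of the pixel-summed BCE loss; the clipping constant `eps` is an assumption added for numerical stability:

```python
import numpy as np

def bce_loss(S, G, eps=1e-7):
    """Binary cross-entropy between predicted saliency map S and ground
    truth G, summed over all pixels. S is clipped away from 0 and 1 to
    keep the logarithms finite (eps is an added assumption)."""
    S = np.clip(S, eps, 1.0 - eps)
    return -np.sum(G * np.log(S) + (1.0 - G) * np.log(1.0 - S))

G = np.array([[0.0, 1.0], [1.0, 0.0]])
S = np.array([[0.1, 0.9], [0.8, 0.2]])
loss = bce_loss(S, G)  # small positive value for a good prediction
```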
8. The codec-based pavement crack detection method according to claim 5, wherein the IoU loss evaluates the similarity of G and S, and can be defined as:
L_IoU = 1 − ( Σ_(r,c) S(r,c)·G(r,c) ) / ( Σ_(r,c) [ S(r,c) + G(r,c) − S(r,c)·G(r,c) ] )
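The soft-IoU loss in NumPy; the `eps` guard against an empty union is an assumption:

```python
import numpy as np

def iou_loss(S, G, eps=1e-7):
    """IoU loss: 1 - intersection/union, computed on soft maps so it is
    differentiable when used inside a network. A perfect binary prediction
    drives the loss toward 0."""
    inter = np.sum(S * G)
    union = np.sum(S + G - S * G)
    return 1.0 - inter / (union + eps)

G = np.array([[0.0, 1.0], [1.0, 0.0]])
perfect = iou_loss(G, G)   # near 0
S = np.array([[0.1, 0.9], [0.8, 0.2]])
imperfect = iou_loss(S, G)
```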
9. The codec-based pavement crack detection method of claim 5, wherein structural information is captured by the SSIM loss; in particular, let x and y be two patches of size N×N cropped from the saliency map S and the ground truth G, respectively, with means μ_x, μ_y, standard deviations σ_x, σ_y and covariance σ_xy; the SSIM loss is defined as:
L_SSIM = 1 − ( (2·μ_x·μ_y + C1)·(2·σ_xy + C2) ) / ( (μ_x² + μ_y² + C1)·(σ_x² + σ_y² + C2) )
wherein C1 and C2 are small constants that avoid division by zero.
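The patch-wise SSIM loss in NumPy; the stabilising constants C1 and C2 use the conventional values from the SSIM literature, which the patent does not specify and are therefore an assumption:

```python
import numpy as np

def ssim_loss(x, y, C1=0.01 ** 2, C2=0.03 ** 2):
    """SSIM loss between an N x N patch x (from the saliency map S) and the
    corresponding patch y (from the ground truth G): 1 - SSIM(x, y).
    C1, C2 are assumed stabilising constants."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()          # variances (sigma_x^2, sigma_y^2)
    cxy = ((x - mx) * (y - my)).mean() # covariance sigma_xy
    ssim = ((2 * mx * my + C1) * (2 * cxy + C2)) / \
           ((mx ** 2 + my ** 2 + C1) * (vx + vy + C2))
    return 1.0 - ssim

rng = np.random.default_rng(0)
p = rng.random((7, 7))
identical = ssim_loss(p, p)            # identical patches -> loss near 0
different = ssim_loss(p, 1.0 - p)
```

In training, the loss would typically be averaged over all N×N patches tiling the prediction, so it penalises local structural mismatches around crack boundaries.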
CN202211700351.4A 2022-12-28 2022-12-28 Pavement crack detection method based on coder-decoder Withdrawn CN116091426A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211700351.4A CN116091426A (en) 2022-12-28 2022-12-28 Pavement crack detection method based on coder-decoder


Publications (1)

Publication Number Publication Date
CN116091426A true CN116091426A (en) 2023-05-09

Family

ID=86213136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211700351.4A Withdrawn CN116091426A (en) 2022-12-28 2022-12-28 Pavement crack detection method based on coder-decoder

Country Status (1)

Country Link
CN (1) CN116091426A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116596930A (en) * 2023-07-18 2023-08-15 吉林大学 Semi-supervised multitasking real image crack detection system and method
CN116596930B (en) * 2023-07-18 2023-09-22 吉林大学 Semi-supervised multitasking real image crack detection system and method
CN117173182A (en) * 2023-11-03 2023-12-05 厦门微亚智能科技股份有限公司 Defect detection method, system, equipment and medium based on coding and decoding network
CN117173182B (en) * 2023-11-03 2024-03-19 厦门微亚智能科技股份有限公司 Defect detection method, system, equipment and medium based on coding and decoding network

Similar Documents

Publication Publication Date Title
CN107480611B (en) Crack identification method based on deep learning convolutional neural network
CN116091426A (en) Pavement crack detection method based on coder-decoder
CN114677601B (en) Dam crack detection method based on unmanned aerial vehicle inspection and combined with deep learning
CN114359130B (en) Road crack detection method based on unmanned aerial vehicle image
CN111222580A (en) High-precision crack detection method
Li et al. Automatic bridge crack identification from concrete surface using ResNeXt with postprocessing
CN110197505B (en) Remote sensing image binocular stereo matching method based on depth network and semantic information
CN112766136B (en) Space parking space detection method based on deep learning
CN112964712A (en) Method for rapidly detecting state of asphalt pavement
CN107705254B (en) City environment assessment method based on street view
CN114705689A (en) Unmanned aerial vehicle-based method and system for detecting cracks of outer vertical face of building
CN117037105B (en) Pavement crack filling detection method, system, terminal and medium based on deep learning
CN114626445B (en) Dam termite video identification method based on optical flow network and Gaussian background modeling
CN111353396A (en) Concrete crack segmentation method based on SCSEOCUnet
CN111951289B (en) Underwater sonar image data segmentation method based on BA-Unet
Fu et al. Extended efficient convolutional neural network for concrete crack detection with illustrated merits
CN114049538A (en) Airport crack image confrontation generation method based on UDWGAN + + network
CN114419421A (en) Subway tunnel crack identification system and method based on images
CN114596316A (en) Road image detail capturing method based on semantic segmentation
CN116433629A (en) Airport pavement defect identification method based on GA-Unet
CN113869433A (en) Deep learning method for rapidly detecting and classifying concrete damage
CN117952898A (en) Water delivery tunnel crack detection method based on UNet network
CN118072193A (en) Dam crack detection method based on unmanned aerial vehicle image and deep learning
CN111881914B (en) License plate character segmentation method and system based on self-learning threshold
CN113744185A (en) Concrete apparent crack segmentation method based on deep learning and image processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230509