CN112634292A - Asphalt pavement crack image segmentation method based on deep convolutional neural network - Google Patents

Asphalt pavement crack image segmentation method based on deep convolutional neural network Download PDF

Info

Publication number
CN112634292A
CN112634292A (application CN202110012193.2A)
Authority
CN
China
Prior art keywords
layer
crack
output
image
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110012193.2A
Other languages
Chinese (zh)
Other versions
CN112634292B (en)
Inventor
万海峰
李娜
孙启润
黄磊
苑兆迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai University
Original Assignee
Yantai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yantai University filed Critical Yantai University
Priority to CN202110012193.2A priority Critical patent/CN112634292B/en
Publication of CN112634292A publication Critical patent/CN112634292A/en
Application granted granted Critical
Publication of CN112634292B publication Critical patent/CN112634292B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a method for segmenting asphalt pavement crack images based on a deep convolutional neural network, comprising the following steps: preparing a crack picture data set and preprocessing the pictures; determining the structure of the CrackResAttentionNet model, the loss function, and the optimizer; initializing the weight matrix with a normal distribution; reaching the predicted value of the output layer through forward propagation and correcting and updating the parameter gradients through backward propagation, thereby updating the weight matrix; and finally loading the trained CrackResAttentionNet model to predict the segmented asphalt pavement image and accurately output it. The invention fuses the outputs of the two added attention modules in proportion, placing more emphasis on position information; the output of each encoding layer is fused with the attention output and connected to the corresponding decoding layer, and the output of the previous decoding layer serves as the input to the next decoding layer. The decoding layers and their up-sampling operations can therefore make full use of spatial information and improve the segmentation precision of the image.

Description

Asphalt pavement crack image segmentation method based on deep convolutional neural network
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method for segmenting an asphalt pavement crack image based on a deep convolutional neural network.
Background
Cracks are the initial manifestation of early damage and potential degradation of asphalt pavement, and their adverse effects on the performance and function of road engineering have become more and more pronounced with the dramatic increase in traffic volume and traffic load levels. Detecting and identifying cracks promptly and accurately when they first appear, and accurately evaluating their scale, can guide road engineering management and maintenance agencies to adopt scientific preventive-maintenance schemes for the pavement and prevent irreversible large-scale structural damage and a shortened service life. Manual inspection of asphalt pavement cracks requires a great deal of time and labor cost and is not accurate enough; crack detection from camera-captured images is markedly more precise, with better consistency and objectivity. However, pixel-level segmentation of cracks is still not accurate enough under the influence of shadows, uneven illumination, or irregular crack shapes. Segmenting asphalt pavement crack images with a deep convolutional neural network is therefore of great significance for the accurate, automatic, and intelligent detection of asphalt pavement cracks.
By integrating the deep learning and image processing technology of a deep convolutional neural network architecture with an attention mechanism, and embedding the resulting image segmentation method and equipment into pavement inspection equipment, asphalt pavement cracks can be identified accurately and intelligently; with the aid of a cloud platform, crack data can be acquired dynamically, in real time, and around the clock, improving the level and efficiency of intelligent detection, management, and maintenance decision-making for asphalt pavement cracks.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an image segmentation method that is simple in structure, easy to implement, fast to converge, and accurate, and that can identify asphalt pavement cracks accurately and efficiently, thereby providing a scientific method for intelligent nondestructive detection of asphalt pavement and the formulation of maintenance decision schemes.
A method for segmenting an asphalt pavement crack image based on a deep convolutional neural network comprises the following steps:
the method comprises the following steps: preparing a picture data set of the asphalt pavement cracks;
step two: preprocessing a picture; the pre-processing of the picture includes scaling a large image to a uniform size;
step three: setting the model structure of CrackResAttentionNet; the CrackResAttentionNet model adopts an encoder-decoder structure, comprising an encoder and a decoder, with an attention module added between each encoder and decoder, positioned behind the encoder and connected with the corresponding decoder;
step four: determining a loss function; the comparison was made using pixel cross entropy loss (CE), balanced pixel cross entropy loss (BCE), and Dice loss:
step 401: the pixel cross entropy loss CE is shown in equation (6) below:
$$CE = -\frac{1}{n\times n}\sum_{i=1}^{n\times n}\left[p_i\log\hat{p}_i + (1-p_i)\log(1-\hat{p}_i)\right] \tag{6}$$

where i is the pixel index, n × n is the size of the output image, $p_i$ is the true value of the sample (1 for the positive class, 0 for the negative class), and $\hat{p}_i$ is the probability that the sample is predicted as positive;
step 402: the balanced pixel cross entropy loss is similar to the pixel cross entropy loss, but assigns weights to the positive and negative samples, with the weights summing to 1, as shown in equation (7) below:

$$BCE = -\frac{1}{n\times n}\sum_{i=1}^{n\times n}\left[\beta\,p_i\log\hat{p}_i + (1-\beta)(1-p_i)\log(1-\hat{p}_i)\right] \tag{7}$$

where BCE is the balanced pixel cross entropy loss, n × n is the size of the output image, β is the balance coefficient, $p_i$ is the true value of the sample (1 for the positive class, 0 for the negative class), and $\hat{p}_i$ is the probability that the sample is predicted as positive;
step 403: the Dice loss is designed from the perspective of the cross-over ratio IoU, and is shown in equation (8):
$$Dice = 1 - \frac{2\,TP}{2\,TP + FP + FN} \tag{8}$$

in equation (8), TP is the number of pixel true positives, FP the number of pixel false positives, and FN the number of pixel false negatives;
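As a concrete illustration, the three losses can be written in PyTorch roughly as follows. This is a minimal sketch, not the patent's code: the function names, the value of β, and the averaging over the n × n output are assumptions.

```python
import torch

def ce_loss(p_hat, p, eps=1e-7):
    # Pixel cross entropy, eq. (6): mean over all n*n output pixels.
    p_hat = p_hat.clamp(eps, 1 - eps)
    return -(p * torch.log(p_hat) + (1 - p) * torch.log(1 - p_hat)).mean()

def bce_loss(p_hat, p, beta=0.9, eps=1e-7):
    # Balanced pixel cross entropy, eq. (7): the positive/negative terms
    # are weighted by beta and (1 - beta), which sum to 1.
    p_hat = p_hat.clamp(eps, 1 - eps)
    return -(beta * p * torch.log(p_hat)
             + (1 - beta) * (1 - p) * torch.log(1 - p_hat)).mean()

def dice_loss(p_hat, p, eps=1e-7):
    # Dice loss, eq. (8), in soft-counting form: 1 - 2TP/(2TP + FP + FN).
    tp = (p_hat * p).sum()
    fp = (p_hat * (1 - p)).sum()
    fn = ((1 - p_hat) * p).sum()
    return 1 - (2 * tp + eps) / (2 * tp + fp + fn + eps)
```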
step five: determining an optimizer, and adopting an Adam optimizer;
step six: initializing a weight matrix; for the ResNet34 pre-training model part, using the weight of the pre-training model, and for other layers except ResNet34, including an input layer, an output layer, a coding layer 5, a decoding layer 1 to a decoding layer 5, initializing a weight matrix by using normal distribution;
step seven: forward propagation; the input signal obtains the output of each layer with the help of the weight matrix, and finally reaches the predicted value of the output layer;
step eight: backward propagation; after a network prediction result calculated by any group of random parameters is obtained through forward propagation, correcting and updating by utilizing the gradient of a loss function relative to each parameter;
step nine: updating the weight matrix; updating the weight matrix according to the gradient of the parameters obtained by back propagation;
step ten: if the maximum number of training iterations has not been reached, return to step seven and continue forward propagation; otherwise, save the best-performing CrackResAttentionNet binary model;
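Steps seven to ten together form a standard training loop; a minimal PyTorch sketch is given below, in which the model, the data loader, the loss function, and the output file name are placeholders.

```python
import torch

def train(model, loader, loss_fn, max_epochs=60, lr=0.01):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)  # step five: Adam
    best_loss = float("inf")
    for epoch in range(max_epochs):        # step ten: stop at max epochs
        epoch_loss = 0.0
        for images, masks in loader:
            preds = model(images)          # step seven: forward propagation
            loss = loss_fn(preds, masks)
            optimizer.zero_grad()
            loss.backward()                # step eight: backward propagation
            optimizer.step()               # step nine: update weight matrix
            epoch_loss += loss.item()
        if epoch_loss < best_loss:         # keep the best-performing model
            best_loss = epoch_loss
            torch.save(model.state_dict(), "crackresattentionnet_best.pth")
```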
step eleven: inputting a crack image of the asphalt pavement to be segmented; collecting the shot asphalt pavement crack images and using the collected images as the input of a system;
step twelve: preprocessing an image; the pre-processing of the picture includes scaling a large image to a uniform size;
step thirteen: loading the trained CrackResAttentionNet, comprising the following steps:
step 1301: finding out a trained model file according to the transmitted file name;
step 1302: reading the model file to a memory;
step 1303: the prediction model predicts by using parameters in the loaded model file;
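In PyTorch terms, steps 1301 to 1303 amount to reading a saved parameter file from disk into memory and switching the model to evaluation mode; the sketch below assumes the model was saved as a state dict under a placeholder file name.

```python
import torch

def load_model(model, filename="crackresattentionnet_best.pth"):
    # Steps 1301-1302: locate the model file and read it into memory.
    state = torch.load(filename, map_location="cpu")
    # Step 1303: the prediction model uses the loaded parameters.
    model.load_state_dict(state)
    model.eval()
    return model
```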
step fourteen: segmentation and output of the crack image; an asphalt pavement image with cracks is input, and the segmented asphalt pavement image is predicted by the trained CrackResAttentionNet, with crack pixels displayed in white and the remaining background in black;
step fifteen: obtaining the trained CrackResAttentionNet model file, storing it on disk, and loading the model binary file into memory.
In step one above, either step 101 or step 102 is adopted:
step 101: directly using the annotated public fracture segmentation dataset comprising fracture images and the annotated fracture shapes and positions as a fracture picture dataset;
step 102: shooting real pavement crack pictures to form a crack picture data set; each crack photograph was manually annotated with crack shape and location by Labelme software.
The manual labeling of step 102 is realized by the following 4 sub-steps:
step 1021, starting a Labelme software window, and opening a pavement crack picture;
step 1022, drawing a polygon on the outer contour of the crack by using a mouse according to the shape of the crack, so that the polygon just covers the crack;
step 1023, naming the crack as a crack mark and saving the image file;
step 1024, Labelme will automatically generate a json file containing the position and the mark of each coordinate point of the polygon.
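For training, such a Labelme JSON annotation is typically rasterized into a binary mask; the sketch below assumes the standard Labelme file layout (a "shapes" list whose entries hold the polygon "points").

```python
import json
import numpy as np
from PIL import Image, ImageDraw

def labelme_json_to_mask(json_path, height, width):
    # Rasterize every annotated crack polygon into a binary mask:
    # crack pixels become 255 (white), background stays 0 (black).
    with open(json_path) as f:
        ann = json.load(f)
    mask = Image.new("L", (width, height), 0)
    draw = ImageDraw.Draw(mask)
    for shape in ann["shapes"]:
        points = [tuple(p) for p in shape["points"]]
        draw.polygon(points, fill=255)
    return np.array(mask)
```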
In step two, the image is scaled to a uniform size of 448 × 448 pixels; if the image is rectangular, it must first be made square.
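A sketch of this preprocessing with Pillow is shown below, center-cropping to a square before resizing to 448 × 448; the exact crop policy is an assumption.

```python
from PIL import Image

def preprocess(path, size=448):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    side = min(w, h)                       # e.g. 800x600 -> 600x600 center crop
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    return img.resize((size, size), Image.BILINEAR)
```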
In step three, the encoder is composed of an input layer and encoding layer-1 to encoding layer-5, where encoding layer-1 to encoding layer-4 correspond respectively to the first to fourth layers of the pre-trained ResNet34 network, i.e., ResNet34-1 to ResNet34-4; the decoder is composed of decoding layer-1 to decoding layer-5 and an output layer.
In step three, the attention modules take the outputs of encoding layer-1 to encoding layer-4 and produce the corresponding attention outputs-1 to -4 through attention calculation; the output of each attention module is added to the output of the corresponding encoding layer and the output of the previous decoding layer, and the sum is fed directly into the next decoding layer as input. Encoding layer-5 has a structure different from encoding layer-1 to encoding layer-4: it performs a stride-2 convolution with a 2 × 2 kernel and padding 0, halving the size of the output matrix relative to the input; dropout, batch normalization, and an activation function follow the convolution. The output of encoding layer-5 is fed directly into decoding layer-5. Decoding layer-5 contains convolution block-1, convolution block-2, and a deconvolution block, with convolution block-3 as the last part; convolution blocks-1 to -3 use 1 × 1 kernels with stride 1, producing outputs of the same size as the input, each followed in turn by dropout, batch normalization, and an activation function. The deconvolution block first performs deconvolution via the ConvTranspose2d function with a 2 × 2 kernel and stride 2, which doubles the input size, followed immediately by batch normalization and an activation function.
In step three, the attention module comprises a position attention module and a channel attention module. The position attention module extracts a larger range of context information from the local features. Feature maps A, B, C are generated using convolutional layers, where $\{A, B, C, D\} \in \mathbb{R}^{C\times H\times W}$; A, B, C are then reshaped to $\mathbb{R}^{C\times N}$, where $N = H \times W$ is the number of pixels. B is transposed to $\mathbb{R}^{N\times C}$, a matrix multiplication is performed between the transpose of B and C, yielding an $\mathbb{R}^{N\times N}$ matrix, and a softmax layer is applied to compute the spatial attention map $S \in \mathbb{R}^{N\times N}$, as in equation (1):

$$s_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{N}\exp(B_i \cdot C_j)} \tag{1}$$

In equation (1), $s_{ji}$ measures the influence of the i-th position on the j-th position; the more similar the feature representations of two positions, the greater the correlation between them. S is then transposed, a matrix multiplication is performed between A and the transpose of S, the $\mathbb{R}^{C\times N}$ result is reshaped to $\mathbb{R}^{C\times H\times W}$, and finally it is multiplied by the scale parameter α and summed element-wise with the original convolution feature D to obtain the final output $H \in \mathbb{R}^{C\times H\times W}$, as shown in equation (2):

$$H_j = \alpha\sum_{i=1}^{N}\left(s_{ji}A_i\right) + D_j \tag{2}$$

In equation (2), α is initialized to 0 and gradually acquires more weight through learning.

The channel attention module first performs convolutions to extract feature maps E, F, G, H, with $\{E, F, G, H\} \in \mathbb{R}^{C\times H\times W}$. The matrices F and G are reshaped to $\mathbb{R}^{C\times N}$, where $N = H \times W$ is the number of pixels. F is then transposed to $\mathbb{R}^{N\times C}$, a matrix multiplication between the transpose of F and E yields an $\mathbb{R}^{N\times N}$ result matrix, and a softmax layer is applied to compute the attention map $X \in \mathbb{R}^{N\times N}$, as in equation (3):

$$x_{ji} = \frac{\exp(F_i \cdot E_j)}{\sum_{i=1}^{N}\exp(F_i \cdot E_j)} \tag{3}$$

In equation (3), $x_{ji}$ measures the influence of the i-th position on the j-th position. A matrix multiplication is then performed between the softmax result X and the reshaped G, the $\mathbb{R}^{C\times N}$ result is reshaped to $\mathbb{R}^{C\times H\times W}$, and finally it is multiplied by the scale parameter β and summed element-wise with the original convolution feature H to obtain the final output $I \in \mathbb{R}^{C\times H\times W}$, as shown in equation (4):

$$I_j = \beta\sum_{i=1}^{N}\left(x_{ji}G_i\right) + H_j \tag{4}$$

In equation (4), β is initialized to 0 and gradually acquires more weight through learning. The proportional element-wise sum of the two attention outputs is calculated as shown in equation (5):

$$O = \lambda H + (1-\lambda)\, I \tag{5}$$

where O is the fused output and $\lambda$ is a hyper-parameter; setting $\lambda = 0.8$ emphasizes the position attention for crack segmentation.
The invention adopts a deep convolutional neural network architecture incorporating an attention mechanism, with the following advantages: 1. the encoder at the core mainly uses the convolutional layers of ResNet34 to extract image features, with an additional encoding layer added behind them to extract information better; 2. the decoder uses deconvolution layers to perform semantic segmentation of crack and non-crack pixels; 3. an additional position attention module and channel attention module are connected behind each encoder to capture long-range context information; 4. the outputs of the two attention modules are fused in proportion, which places more emphasis on position information. The output of each encoding layer is fused with the attention output and connected with the corresponding decoding layer, and the output of the previous decoding layer is the input to the next decoding layer. The decoding layers and their up-sampling operations can therefore make full use of spatial information and improve prediction precision.
Drawings
FIG. 1 is an overall flow chart of the asphalt pavement crack image segmentation system of the present invention;
FIG. 2 is a CrackResattentionNet network architecture diagram of the present invention;
fig. 3 is a schematic diagram of an encoding block 5 in an embodiment of the present invention;
FIG. 4 is a block diagram of a decoding block 5 according to an embodiment of the present invention;
FIG. 5 is a block diagram of decoding blocks 1, 2, 3, 4 according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an output block in an embodiment of the present invention;
FIG. 7 is a schematic view of an attention module in an embodiment of the invention;
FIG. 8 is a diagram of the crack predictions of the various models (BCE loss) on the public data set in an embodiment of the present invention;
FIG. 9 is a diagram of the crack predictions of the various models (BCE loss) on the Yantai data set in an embodiment of the present invention.
Detailed Description
In order to facilitate an understanding of the invention, the invention is described in more detail below with reference to the accompanying drawings and specific examples. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Example one
The asphalt pavement crack image segmentation method based on the deep convolutional neural network comprises the following steps:
the method comprises the following steps: preparing a picture data set of the asphalt pavement cracks; specifically, the scheme of step 101 or step 102 is adopted:
step 101: directly using the labeled public fracture segmentation data set as a fracture picture data set;
step 102: shooting real asphalt pavement crack photos to form a crack picture data set; for example, 5000 asphalt pavement crack pictures taken by a vehicle-mounted camera on a road section in Yantai; because the shape and position of the cracks are not marked in the taken pictures, the crack shape and position are manually marked in each picture using Labelme software;
preferably, step 102 comprises the following implementation:
step 1021, starting a Labelme software window, and opening a pavement crack picture;
step 1022, drawing a polygon on the outer contour of the crack by using a mouse according to the shape of the crack, so that the polygon just covers the crack;
step 1023, naming the crack as a crack mark and saving the image file;
step 1024, Labelme will automatically generate a json file containing the position and the mark of each coordinate point of the polygon.
Step two: preprocessing the picture; the preprocessing comprises scaling images with large pixel counts to a uniform size, e.g., 448 × 448 pixels; if an image is rectangular, it must first be made square, for example by center-cropping 800 × 600 to 600 × 600;
step three: setting a model structure of CrackResAttentionNet; the CrackResAttentionNet model adopts a structure based on an encoder-decoder, and comprises an encoder and a decoder, wherein an attention module is added between each encoder and each decoder, and is positioned behind each encoder and connected with the corresponding decoder;
On this basis, as shown in fig. 2, the encoder is composed of an input layer and encoding layer-1 to encoding layer-5, where encoding layer-1 to encoding layer-4 correspond respectively to the first to fourth layers of the pre-trained ResNet34 network, i.e., ResNet34-1 to ResNet34-4. As shown in fig. 2, the decoder is composed of decoding layer-1 to decoding layer-5 and an output layer.
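Using torchvision, the four pre-trained stages ResNet34-1 to ResNet34-4 can be taken directly from the library model; the sketch below shows one way such an encoder might be assembled and is an illustration, not the patent's exact code.

```python
import torch.nn as nn
from torchvision import models

resnet = models.resnet34(pretrained=True)

# Input layer: the ResNet stem (conv + BN + ReLU + max-pool).
input_layer = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)

# Encoding layer-1 .. encoding layer-4 = ResNet34-1 .. ResNet34-4.
encoder_layers = [resnet.layer1, resnet.layer2, resnet.layer3, resnet.layer4]
```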
On the basis of the above, as shown in fig. 2, the attention module respectively obtains the outputs of the coding layer-1 to the coding layer-4, and obtains the corresponding attention module outputs-1 to-4 through attention calculation. The output of the attention module is added with the output of the corresponding coding layer and the output of the previous decoding layer, and the sum is directly sent to the next decoding layer as input.
Step four: determining a loss function; the invention uses the following loss function for comparison: namely pixel cross entropy loss (CE), balanced pixel cross entropy loss (BCE), and Dice loss.
Preferably, the loss function adopts balanced pixel cross entropy loss (BCE), test data in the sample is implemented, and higher prediction precision can be obtained by adopting BCE loss, so that a better segmentation effect is obtained;
step five: determining an optimizer, preferably adopting an Adam optimizer, wherein the Adam optimizer has the advantages of high efficiency, small occupied memory, suitability for large-scale data and the like;
step six: initializing the weight matrix; for the ResNet34 pre-trained model part, the weights of the pre-trained model are used, and for the other layers besides ResNet34, including the input layer, the output layer, encoding layer-5, and decoding layer-1 to decoding layer-5, the weight matrices are initialized using a normal distribution. To initialize a weight matrix from a normal distribution, a truncated normal distribution is first constructed from a general normal distribution and a truncation interval; samples are then drawn from the truncated normal distribution by inverse-distribution sampling, and the resulting truncated normal samples serve as the initial values of the weight matrix.
Preferably, the weight initialization value is obtained by sampling from a truncated normal distribution with a variance of 0.01, so that the model can be converged more quickly in the following training process.
Step seven: forward propagation, wherein the input signal obtains the output of each layer with the help of the weight matrix, and finally reaches the predicted value of the output layer;
step eight: backward propagation; after a network prediction result calculated by any group of random parameters is obtained through forward propagation, the parameters are corrected and updated by utilizing the gradient of a loss function relative to each parameter;
step nine: updating the weight matrix, and updating the weight matrix according to the gradient of the parameters obtained by back propagation to achieve the effect of reducing the loss function;
step ten: if the maximum number of training iterations has not been reached, return to step seven and continue forward propagation; otherwise, save the best-performing CrackResAttentionNet binary model.
Step eleven: inputting an asphalt pavement crack image to be segmented, collecting the shot pavement crack image and taking the collected image as the input of a system;
step twelve: preprocessing an image; the pre-processing of the picture includes scaling a large image to a uniform size;
step thirteen: loading the trained CrackResAttentionNet, comprising the following steps:
step 1301: finding out a trained model file according to the transmitted file name;
step 1302: reading the model file to a memory;
step 1303: the prediction model predicts by using parameters in the loaded model file;
preferably, reading the trained CrackResAttentionNet model from disk into memory and then using the trained parameters directly for prediction accelerates segmentation.
Fourteen steps: segmenting and outputting the asphalt pavement crack image;
step fifteen: obtaining the trained CrackResAttentionNet model file, storing it on disk, and loading the model binary file into memory.
Example two
On the basis of the above embodiment, with reference to fig. 1 to 7, it is further explained that the asphalt pavement crack image segmentation method based on the deep convolutional neural network of the present invention includes the following steps:
the method comprises the following steps: preparing a picture data set of the asphalt pavement cracks; adopting the scheme of step 101 or step 102:
step 101: the labeled public asphalt pavement crack segmentation data set is directly used; this embodiment adopts https://www.irit.fr/~Sylvie.Chambon/Crack_Detection_Database.html, where the data set comprises asphalt pavement crack images and the marked crack shapes and positions, and serves as the asphalt pavement crack image data set;
step 102: shooting real pavement crack pictures with a vehicle-mounted camera to form a crack picture data set; for example, 5000 asphalt pavement crack pictures taken on a road section in Yantai; since the taken crack pictures are not labeled with the shape and position of the cracks, each crack picture must be manually labeled with the crack shape and position using Labelme software.
Step two: preprocessing a picture; the preprocessing of the picture comprises the steps of scaling a large image to a uniform size of 448 multiplied by 448 pixels, if the image is rectangular, the image also needs to be uniformly square in size, for example, the 800 multiplied by 600 center is changed to 600 multiplied by 600;
step three: setting the model structure of CrackResAttentionNet; as shown in fig. 2, the CrackResAttentionNet model adopts an encoder-decoder structure, comprising an encoder and a decoder, with an attention module added between each encoder and decoder, positioned behind the encoder and connected with the corresponding decoder.
Specifically, as shown in fig. 2, the encoder comprises an input layer, an encoding layer-1 to an encoding layer-5, wherein the encoding layer-1 to the encoding layer-4 respectively correspond to the first layer to the fourth layer of the resenet 34 network which is pre-trained, and are respectively ResNet34-1 to ResNet 34-4.
Specifically, as shown in fig. 2, the decoder is composed of a decoding layer-1 to a decoding layer-5 and an output layer.
Specifically, as shown in fig. 2, the attention module obtains the outputs from the coding layer-1 to the coding layer-4, and obtains the corresponding attention module outputs-1 to-4 through attention calculation. The output of the attention module is added with the output of the corresponding coding layer and the output of the previous decoding layer, and the sum is directly sent to the next decoding layer as input.
Further, as shown in fig. 3, the coding layer-5 is a coding layer having a structure different from the coding layer-1 to the coding layer-4, and it uses a convolution kernel with a size of 2 × 2 to perform convolution with a step size of 2, and the padding is 0, and the size of the output matrix is divided by 2; discarding, batch normalization processing and activating functions are connected after convolution operation; the output of the coding layer-5 is directly input into the decoding layer-5;
Further, as shown in FIG. 4, decoding layer-5 contains convolution block-1, convolution block-2, and a deconvolution block, with convolution block-3 as the last part. Convolution blocks-1 to -3 use 1 × 1 kernels with stride 1 and zero padding, producing outputs of the same size as the input. After each convolution, dropout, batch normalization, and an activation function follow in turn. The deconvolution block first performs deconvolution via the ConvTranspose2d function with a 2 × 2 kernel and stride 2, which doubles the input size, followed immediately by batch normalization and an activation function.
Further, as shown in fig. 5, the decoding layers-1, -2, -3 to-4 have the same structure, which includes a convolution block-1, an inverse convolution block, and a convolution block-2. Both convolution block-1, convolution block-2 will perform the convolution using a convolution kernel of size 1x1 and step size 1 and fill in 0, which will result in the same size as the input, followed by discard, batch normalization and activation. The deconvolution block will perform deconvolution by the ConvTranspose2d function with a kernel size of 3 x 3 and a step size of 2, which will multiply the input size by 2, immediately followed by the batch normalization process and the activation function.
Further, as shown in fig. 6, the output layer includes a deconvolution block-1, a convolution block-2, and a deconvolution block-2; the deconvolution block-1 will be deconvoluted by the function ConvTranspose2d with a convolution kernel size of 3 x 3 with a step size of 2, which will multiply the input size by 2. The batch normalization process and activation function then follows. The convolution block-1 and convolution block-2 have the same structure, it will perform convolution with step size 1 with a convolution kernel of size 3 x 3 and fill 0, the output matrix is the same size as the input. Discard, batch normalization, and activate function concatenate. Deconvolution block-2 only needs to perform deconvolution by the ConvTranspose2d function, with a kernel size of 2 x 2 and a step size of 2, which will multiply the input size by 2. The output will be the final predicted image, which is the same size as the input image.
On the basis of the above, the attention module is positioned behind each encoder and connected with the corresponding decoder. Cracks to be segmented differ in scale, illumination, and viewpoint, and since the convolution operation introduces more local receptive fields, features corresponding to pixels with the same label may differ, which can lead to intra-class inconsistency and affect accuracy. Therefore, context information is extracted by establishing a correlation mechanism among the global features; this strengthens the crack segmentation capability, effectively captures long-range context information, and improves the feature representation capability.
As shown in fig. 7, two types of attention modules are added to obtain a global context from local features in the network. For the output of each ResNet34 encoding layer, a convolutional layer is first applied to obtain the features of the different layers, which does not change the input size.
The first attention module is the position attention module, which extracts a larger range of context information from the local features. Feature maps A, B, C are generated using convolutional layers, where $\{A, B, C, D\} \in \mathbb{R}^{C\times H\times W}$. A, B, and C are then reshaped to $\mathbb{R}^{C\times N}$, where $N = H \times W$ is the number of pixels.

B is transposed to $\mathbb{R}^{N\times C}$, so that a matrix multiplication can be performed between the transpose of B and C, yielding an $\mathbb{R}^{N\times N}$ matrix; a softmax layer is then applied to compute the spatial attention map $S \in \mathbb{R}^{N\times N}$, as in equation (1):

$$s_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{N}\exp(B_i \cdot C_j)} \tag{1}$$

where $s_{ji}$ measures the influence of the i-th position on the j-th position; the more similar the feature representations of two positions, the greater the correlation between them. S is then transposed, which keeps the shape $\mathbb{R}^{N\times N}$. The invention performs a matrix multiplication between A and the transpose of S, reshapes the $\mathbb{R}^{C\times N}$ result to $\mathbb{R}^{C\times H\times W}$, and finally multiplies it by the scale parameter α and sums it element-wise with the original convolution feature D to obtain the final output $H \in \mathbb{R}^{C\times H\times W}$, as shown in equation (2):

$$H_j = \alpha\sum_{i=1}^{N}\left(s_{ji}A_i\right) + D_j \tag{2}$$

where α is initialized to 0 and gradually acquires more weight through learning.

Likewise, the channel attention module, shown as B in FIG. 7, can emphasize interdependent feature maps and improve the semantics-specific feature representation. Convolutions are first performed to extract feature maps E, F, G, H, with $\{E, F, G, H\} \in \mathbb{R}^{C\times H\times W}$. The matrices F and G are reshaped to $\mathbb{R}^{C\times N}$, where $N = H \times W$ is the number of pixels. F is then transposed to $\mathbb{R}^{N\times C}$, so that a matrix multiplication between the transpose of F and E yields an $\mathbb{R}^{N\times N}$ result matrix, to which a softmax layer is applied to compute the attention map $X \in \mathbb{R}^{N\times N}$, as in equation (3):

$$x_{ji} = \frac{\exp(F_i \cdot E_j)}{\sum_{i=1}^{N}\exp(F_i \cdot E_j)} \tag{3}$$

where $x_{ji}$ measures the influence of the i-th position on the j-th position.

A matrix multiplication is then performed between the softmax result X and the reshaped G, the $\mathbb{R}^{C\times N}$ result is reshaped to $\mathbb{R}^{C\times H\times W}$, and finally it is multiplied by the scale parameter β and summed element-wise with the original convolution feature H to obtain the final output $I \in \mathbb{R}^{C\times H\times W}$, as shown in equation (4):

$$I_j = \beta\sum_{i=1}^{N}\left(x_{ji}G_i\right) + H_j \tag{4}$$

where β is initialized to 0 and gradually acquires more weight through learning.

Thus, the final feature of each channel is a weighted sum of the features of all channels and the original features, which models long-range semantic dependencies between feature maps well. With the position attention output H and the channel attention output I both of size C × H × W, the two attention results are fused: the weight given to position attention is λ and, accordingly, the weight of channel attention is 1 − λ. The proportional element-wise sum is calculated as shown in equation (5):

$$O = \lambda H + (1-\lambda)\, I \tag{5}$$

where O is the fused output and λ is a hyper-parameter; λ = 0.8 emphasizes the position attention for crack segmentation.
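A compact PyTorch sketch of the two attention modules and the λ-weighted fusion of equation (5) is given below, following the dimensions described above; the 1 × 1 convolutions generating A, B, C and E, F, G, and the exact softmax axis, are assumptions.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv_a = nn.Conv2d(channels, channels, 1)
        self.conv_b = nn.Conv2d(channels, channels, 1)
        self.conv_c = nn.Conv2d(channels, channels, 1)
        self.alpha = nn.Parameter(torch.zeros(1))      # initialized to 0, learned

    def forward(self, d):
        nb, c, h, w = d.shape
        a = self.conv_a(d).view(nb, c, -1)             # C x N
        b = self.conv_b(d).view(nb, c, -1)             # C x N
        cc = self.conv_c(d).view(nb, c, -1)            # C x N
        s = torch.softmax(b.transpose(1, 2) @ cc, 1)   # N x N map, eq. (1)
        out = (a @ s).view(nb, c, h, w)                # apply S to A
        return self.alpha * out + d                    # eq. (2)

class ChannelAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv_e = nn.Conv2d(channels, channels, 1)
        self.conv_f = nn.Conv2d(channels, channels, 1)
        self.conv_g = nn.Conv2d(channels, channels, 1)
        self.beta = nn.Parameter(torch.zeros(1))       # initialized to 0

    def forward(self, feat):
        nb, c, h, w = feat.shape
        e = self.conv_e(feat).view(nb, c, -1)
        f = self.conv_f(feat).view(nb, c, -1)
        g = self.conv_g(feat).view(nb, c, -1)
        x = torch.softmax(f.transpose(1, 2) @ e, 1)    # N x N map, eq. (3)
        out = (g @ x).view(nb, c, h, w)                # apply X to G
        return self.beta * out + feat                  # eq. (4)

def fuse_attention(h_pos, i_chan, lam=0.8):
    # Eq. (5): proportional fusion, position attention weighted lam = 0.8.
    return lam * h_pos + (1 - lam) * i_chan
```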
The encoder and decoder are connected by bridges; as shown in fig. 2, a bridge connector links each encoding layer to the decoder and is implemented by merging the output of each encoding layer with the output of the attention module and of the previous decoding layer. By feeding this fused output to the decoding layer, the encoding-layer and corresponding attention information can be captured.
Step four: determining a loss function; the invention uses three loss functions for comparison, namely pixel cross entropy loss (CE), balanced pixel cross entropy loss (BCE) and Dice loss:
wherein the pixel cross entropy loss CE is shown in the following formula (6):
$$CE = -\frac{1}{n\times n}\sum_{i=1}^{n\times n}\left[p_i\log\hat{p}_i + (1-p_i)\log(1-\hat{p}_i)\right] \tag{6}$$

where i is the pixel index, n × n is the size of the output image, $p_i$ is the true value of the sample (1 for the positive class, 0 for the negative class), and $\hat{p}_i$ is the probability that the sample is predicted as positive.
The balanced pixel cross entropy loss is similar to the pixel cross entropy loss, but it assigns weights to the positive and negative samples, with the weights summing to 1. The formula is shown in equation (7) below:

$$BCE = -\frac{1}{n\times n}\sum_{i=1}^{n\times n}\left[\beta\,p_i\log\hat{p}_i + (1-\beta)(1-p_i)\log(1-\hat{p}_i)\right] \tag{7}$$

where BCE is the balanced pixel cross entropy loss, n × n is the size of the output image, β is the balance coefficient, $p_i$ is the true value of the sample (1 for the positive class, 0 for the negative class), and $\hat{p}_i$ is the probability that the sample is predicted as positive. The Dice loss is designed from the perspective of the intersection-over-union (IoU), as shown in equation (8):

$$Dice = 1 - \frac{2\,TP}{2\,TP + FP + FN} \tag{8}$$

where TP is the number of pixel true positives, FP the number of pixel false positives, and FN the number of pixel false negatives;
step five: determining an optimizer; by using the Adam optimizer, the Adam optimizer has the advantages of high efficiency, small occupied memory, suitability for large-scale data and the like.
Step six: initializing the weight matrix; for the ResNet34 pre-trained model part, the weights of the pre-trained model are used, and for the other layers besides ResNet34, including the input layer, the output layer, encoding layer-5, and decoding layer-1 to decoding layer-5, the weight matrices are initialized using a normal distribution.
Step seven: forward propagation; the input signal obtains the output of each layer with the help of the weight matrix, and finally reaches the predicted value of the output layer.
Step eight: backward propagation; after the network prediction result calculated with an arbitrary set of random parameters is obtained through forward propagation, the parameters are corrected and updated using the gradient of the loss function with respect to each parameter.
Step nine: updating the weight matrix; and updating the weight matrix according to the gradient of the parameters obtained by back propagation.
Step ten: if the maximum number of training iterations has not been reached, return to step seven and continue forward propagation; otherwise, save the best-performing CrackResAttentionNet binary model.
Step eleven: inputting a crack image of the asphalt pavement to be segmented; and collecting road surface crack images shot by the vehicle-mounted camera as the input of the system.
Step twelve: preprocessing the image; the preprocessing includes scaling large images to a uniform size of 448 × 448 pixels, and if the image is rectangular, it must first be made square (e.g., center-crop 800 × 600 to 600 × 600).
Step thirteen: loading the trained CrackResAttentionNet, comprising the following steps:
step 1301: finding out a trained model file according to the transmitted file name;
step 1302: reading the model file to a memory;
step 1303: the prediction model predicts by using parameters in the loaded model file;
Step fourteen: segmentation and output of the crack image; a pavement image with cracks is input, and the segmented pavement image is predicted by the trained CrackResAttentionNet, with crack pixels displayed in white and the remaining background in black.
Step fifteen: the trained CrackResAttentionNet model file is stored on disk, and the model binary file is loaded into memory.
EXAMPLE III
Based on the above embodiments, the performance of the CrackResAttentionNet model is evaluated on two sets of test data, one based on the public pavement crack data set and the other based on the Yantai data set.
All tests were performed on a computer of the following specifications:
the software environment is based on ubuntu16.04, python being the primary programming language. The experiments were performed on a Pytorch 1.5 deep learning framework.
The invention adopts mini-batch stochastic gradient descent as the optimization algorithm for training. The values of the hyper-parameters are as follows: the weight decay factor is 0.0002, the momentum is 0.9, the learning rate is 0.01, the mini-batch size is 4, and the number of epochs is 60.
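With these values, the optimizer configuration would look roughly like the sketch below; note that step five of the method mentions Adam, while this example follows the SGD settings quoted here, and the model is a placeholder.

```python
import torch

def make_optimizer(model):
    # Hyper-parameters quoted above: learning rate 0.01, momentum 0.9,
    # weight decay factor 0.0002.
    return torch.optim.SGD(model.parameters(), lr=0.01,
                           momentum=0.9, weight_decay=0.0002)

BATCH_SIZE = 4  # mini-batch size
EPOCHS = 60     # number of epochs
```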
For each type of experiment, the invention runs typical image segmentation models, including ENet, ExFuse, FCN, LinkNet, SegNet, and UNet, in addition to CrackResAttentionNet.
The ENet (Efficient Neural Network) segmentation network is particularly good at low-latency operation because it has fewer parameters; the ExFuse (Enhancing Feature Fusion for Semantic Segmentation) network effectively combines low-level and high-level features, greatly improving segmentation accuracy; FCN (Fully Convolutional Networks) was the first segmentation model to make a major breakthrough by using full convolution instead of fully connected layers; the LinkNet segmentation model is also based on an encoder-decoder framework and achieves better accuracy with fewer parameters; the SegNet segmentation model is specially designed for efficient semantic segmentation; UNet is a symmetric encoder-decoder architecture, shaped like the letter U, that was initially used for medical image segmentation.
For each of the above typical models, the invention trains with three different loss functions, namely pixel cross entropy loss (CE), balanced pixel cross entropy loss (BCE), and Dice loss (Dice).
For the crack segmentation task in the present invention, the following evaluation indices are used: precision (P), average IoU, recall (R), and F1. The F1 score is the harmonic mean of precision and recall. Crack pixels (white pixels in the image) are defined as positive samples, and pixels are classified into four types according to the combination of labeled and predicted results: true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
Precision is defined as the ratio of correctly predicted crack pixels to all predicted crack pixels; recall is defined as the ratio of correctly predicted crack pixels to all real crack pixels; the F1 score is the harmonic mean of precision and recall. Precision is given by equation (9), recall by equation (10), and the F1 score by equation (11).
$$Precision = \frac{TP}{TP + FP} \tag{9}$$

$$Recall = \frac{TP}{TP + FN} \tag{10}$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{11}$$
The intersection over union (IoU) reflects the degree of overlap between two objects. In the invention, the IoU is evaluated on the "crack" category to provide a measure of the overlap between the actual asphalt pavement cracks and the predicted cracks, as shown in equation (12).
$$IoU = \frac{TP}{TP + FP + FN} \tag{12}$$
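A sketch of these pixel-level metrics, computed from binary prediction and ground-truth masks, is shown below; the small epsilon guarding against empty denominators is an addition.

```python
import numpy as np

def crack_metrics(pred, truth, eps=1e-7):
    # pred, truth: boolean arrays where True marks crack (positive) pixels.
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    precision = tp / (tp + fp + eps)                          # eq. (9)
    recall = tp / (tp + fn + eps)                             # eq. (10)
    f1 = 2 * precision * recall / (precision + recall + eps)  # eq. (11)
    iou = tp / (tp + fp + fn + eps)                           # eq. (12)
    return precision, recall, f1, iou
```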
The test results are as follows:
1. public data set
TABLE 1 Public crack data set (CE loss)

| Segmentation model | Precision/% | Average IoU | Recall/% | F1/% |
| --- | --- | --- | --- | --- |
| ENet | 80.03 | 0.7222 | 83.94 | 81.94 |
| ExFuse | 82.22 | 0.7170 | 81.17 | 81.69 |
| FCN | 81.87 | 0.7102 | 77.72 | 79.74 |
| LinkNet | 81.15 | 0.7097 | 82.62 | 81.88 |
| SegNet | 78.00 | 0.6632 | 75.18 | 76.56 |
| UNet | 80.19 | 0.7042 | 82.88 | 81.51 |
| CrackResAttentionNet | 82.58 | 0.7283 | 85.13 | 83.84 |
TABLE 2 Public crack data set (BCE loss)

[Table 2 was rendered as an image in the original and its per-model data is not recoverable; per the results summary below, CrackResAttentionNet achieved precision 89.40%, average IoU 0.7151, recall 81.09%, and F1 85.04% under BCE loss.]
TABLE 3 Public crack data set (Dice loss)

| Segmentation model | Precision/% | Average IoU | Recall/% | F1/% |
| --- | --- | --- | --- | --- |
| ENet | 76.18 | 0.5545 | 56.68 | 65.00 |
| ExFuse | 48.92 | 0.4888 | 50.00 | 49.45 |
| FCN | 80.17 | 0.6783 | 79.11 | 79.64 |
| LinkNet | 86.97 | 0.7076 | 83.00 | 84.94 |
| SegNet | 82.92 | 0.6696 | 84.35 | 83.63 |
| UNet | 80.76 | 0.7002 | 85.20 | 82.92 |
| CrackResAttentionNet | 90.72 | 0.7169 | 81.93 | 86.10 |
2. Yantai data set
TABLE 4 Yantai crack data set (CE loss)

[Table 4 was rendered as an image in the original; its data is not recoverable.]
TABLE 5 Yantai crack data set (BCE loss)

| Segmentation model | Precision/% | Average IoU | Recall/% | F1/% |
| --- | --- | --- | --- | --- |
| ENet | 94.67 | 0.8120 | 92.34 | 93.49 |
| ExFuse | 95.18 | 0.8203 | 91.85 | 93.48 |
| FCN | 93.05 | 0.8295 | 90.64 | 91.83 |
| LinkNet | 95.07 | 0.8253 | 92.08 | 93.55 |
| SegNet | 91.24 | 0.7806 | 83.04 | 86.95 |
| UNet | 94.28 | 0.8161 | 90.26 | 92.23 |
| CrackResAttentionNet | 96.17 | 0.8369 | 93.44 | 94.79 |
TABLE 6 Yantai crack data set (Dice loss)

| Segmentation model | Precision/% | Average IoU | Recall/% | F1/% |
| --- | --- | --- | --- | --- |
| ENet | 94.80 | 0.8217 | 92.10 | 93.43 |
| ExFuse | 92.10 | 0.7412 | 87.66 | 89.87 |
| FCN | 90.23 | 0.7765 | 89.16 | 89.69 |
| LinkNet | 94.45 | 0.8076 | 91.62 | 93.01 |
| SegNet | 91.80 | 0.7486 | 90.23 | 91.01 |
| UNet | 93.76 | 0.8011 | 91.10 | 92.41 |
| CrackResAttentionNet | 95.43 | 0.8275 | 94.20 | 94.81 |
From the test results shown in the tables above, it can be seen that the CrackResAttentionNet proposed by the present invention performs better than the existing typical methods, especially in terms of precision and recall, which directly reflect the location and severity of the cracks.
For the same method, comparing the three different loss functions (CE, BCE, Dice) shows that the balanced pixel cross entropy loss (BCE) performs better than the other two. The BCE-loss segmentation outputs of each model on sample images are shown in fig. 8 and fig. 9, from which it can be seen that the segmentation by CrackResAttentionNet is very close to the ground truth, while typical models such as SegNet, FCN, and ExFuse are visibly misled by noise, segmenting white non-crack regions.
Using CrackResAttentionNet and the typical models (ENet, ExFuse, FCN, LinkNet, SegNet, UNet) under three different loss functions (CE, BCE, Dice), the test results on the public crack data set and the Yantai data set show that CrackResAttentionNet with the BCE loss function achieves precision (89.40%), average IoU (71.51%), recall (81.09%), and F1 (85.04%) on the public data set, and precision (96.17%), average IoU (83.69%), recall (93.44%), and F1 (94.79%) on the Yantai data set.
The invention proposes an encoder-decoder network structure for pixel-level asphalt pavement crack detection and image segmentation, together with its concrete application method. The encoder at the core mainly uses the convolutional layers of ResNet34 to extract image features, with an additional encoding layer added behind them to extract information better. The decoder uses deconvolution layers to perform semantic segmentation of crack and non-crack pixels, and an additional position attention module and channel attention module are connected behind each encoder to capture long-range context information. The outputs of the two attention modules are fused in proportion, which places more emphasis on position information; the output of each encoding layer is fused with the attention output and connected with the corresponding decoding layer, and the output of the previous decoding layer is the input to the next decoding layer. The decoding layers and their up-sampling operations can therefore make full use of spatial information and improve prediction precision. By implementing the technical scheme of the invention, asphalt pavement cracks can be identified accurately and intelligently, improving the level and efficiency of intelligent crack detection, management, and maintenance decision-making.
The technical features mentioned above are combined with each other to form various embodiments which are not listed above, and all of them are regarded as the scope of the present invention described in the specification; also, modifications and variations may be suggested to those skilled in the art in light of the above teachings, and it is intended to cover all such modifications and variations as fall within the true spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A method for segmenting an asphalt pavement crack image based on a deep convolutional neural network is characterized by comprising the following steps:
the method comprises the following steps: preparing a picture data set of the asphalt pavement cracks;
step two: preprocessing a picture; the pre-processing of the picture includes scaling a large image to a uniform size;
step three: setting a model structure of CrackResAttentionNet; the CrackResAttentionNet model adopts a structure based on an encoder-decoder, and comprises an encoder and a decoder, wherein an attention module is added between each encoder and each decoder, and is positioned behind each encoder and connected with the corresponding decoder;
step four: determining a loss function; the comparison was made using pixel cross entropy loss (CE), balanced pixel cross entropy loss (BCE), and Dice loss:
step 401: the pixel cross entropy loss CE is shown in equation (6) below:
Figure FDA0002885593570000011
i represents the index of the pixel, n x n represents the size of the output image, p is the true value of the sample, the positive class is 1, the negative class is 0,
Figure FDA0002885593570000012
a probability of predicting a sample as positive;
step 402: the balanced pixel cross-entropy penalty is similar to the pixel cross-entropy penalty, with a sum of weights of 1, as shown in equation (7) below:
Figure FDA0002885593570000013
wherein BCE is balance pixel cross entropy loss, n x n represents the size of an output pixel, beta is a balance coefficient, p is a real value of a sample, a positive class is 1, a negative class is 0,
Figure FDA0002885593570000014
a probability of predicting a sample as positive;
step 403: the Dice loss is designed from the perspective of the cross-over ratio IoU, and is shown in equation (8):
Figure FDA0002885593570000021
in the formula (8), TP is pixel true positive, FN is pixel false negative;
step five: determining an optimizer, and adopting an Adam optimizer;
step six: initializing a weight matrix; for the ResNet34 pre-training model part, using the weight of the pre-training model, and for other layers except ResNet34, including an input layer, an output layer, a coding layer 5, a decoding layer 1 to a decoding layer 5, initializing a weight matrix by using normal distribution;
step seven: forward propagation; the input signal obtains the output of each layer with the help of the weight matrix, and finally reaches the predicted value of the output layer;
step eight: backward propagation; after a network prediction result calculated by any group of random parameters is obtained through forward propagation, correcting and updating by utilizing the gradient of a loss function relative to each parameter;
step nine: updating the weight matrix; updating the weight matrix according to the gradient of the parameters obtained by back propagation;
step ten: if the maximum number of training iterations has not been reached, return to step seven and continue forward propagation; otherwise, save the best-performing CrackResAttentionNet binary model;
step eleven: inputting a crack image of the asphalt pavement to be segmented; collecting the shot asphalt pavement crack images and using the collected images as the input of a system;
step twelve: preprocessing an image; the pre-processing of the picture includes scaling a large image to a uniform size;
step thirteen: loading the trained CrackResAttentionNet, comprising the following steps:
step 1301: finding out a trained model file according to the transmitted file name;
step 1302: reading the model file to a memory;
step 1303: the prediction model predicts by using parameters in the loaded model file;
step fourteen: segmentation and output of the crack image; an asphalt pavement image with cracks is input, and the segmented asphalt pavement image is predicted by the trained CrackResAttentionNet, with crack pixels displayed in white and the remaining background in black;
step fifteen: obtaining the trained CrackResAttentionNet model file, storing it on disk, and loading the model binary file into memory.
2. The method according to claim 1, wherein in the first step, the scheme of step 101 or step 102 is specifically adopted:
step 101: directly using a marked public crack segmentation data set, wherein the data set comprises an asphalt pavement crack image and a marked crack shape and position as a crack image data set;
step 102: shooting real asphalt pavement crack photos to form a crack picture data set; manually marking the shape and the position of each crack photo by Labelme software;
102, manually marking the label by adopting the following 4 sub-steps:
step 1021, starting a Labelme software window, and opening a picture of the asphalt pavement crack;
step 1022, drawing a polygon on the outer contour of the crack by using a mouse according to the shape of the crack, so that the polygon just covers the crack;
step 1023, naming the crack as a crack mark and saving the image file;
step 1024, Labelme will automatically generate a json file containing the position and the mark of each coordinate point of the polygon.
3. The method of claim 2, wherein in step two, the image is scaled to a uniform size of 448 × 448 pixels, and if the image is rectangular, it must first be made square.
4. The method of claim 3, wherein in the third step, the encoder comprises an input layer, an encoding layer-1 to an encoding layer-5, wherein the encoding layer-1 to the encoding layer-4 respectively correspond to the first layer to the fourth layer of the ResNet34 network which is pre-trained, and are ResNet34-1 to ResNet34-4 respectively; the decoder consists of a decoding layer-1 to a decoding layer-5 and an output layer.
5. The method according to claim 4, wherein in step three, the attention modules take the outputs of encoding layer-1 to encoding layer-4 and produce the corresponding attention module outputs-1 to -4 through attention calculation; the output of each attention module is added to the output of the corresponding encoding layer and the output of the previous decoding layer, and the sum is fed directly into the next decoding layer as input; encoding layer-5 has a structure different from encoding layer-1 to encoding layer-4, performing a stride-2 convolution with a 2 × 2 kernel and padding 0, halving the size of the output matrix relative to the input; dropout, batch normalization, and an activation function follow the convolution; the output of encoding layer-5 is fed directly into decoding layer-5; decoding layer-5 contains convolution block-1, convolution block-2, and a deconvolution block, with convolution block-3 as the last part; convolution blocks-1 to -3 use 1 × 1 kernels with stride 1, producing outputs of the same size as the input, each followed in turn by dropout, batch normalization, and an activation function; the deconvolution block first performs deconvolution via the ConvTranspose2d function with a 2 × 2 kernel and stride 2, which doubles the input size, followed immediately by batch normalization and an activation function.
6. The method of claim 5, wherein in step three, the attention module comprises a position attention module and a channel attention module; the position attention module extracts a wider range of context information from the local features; feature maps A, B, C are generated using convolution layers, where {A, B, C, D} ∈ R^{C×H×W} and D is the original convolution feature; A, B, C are then reshaped to R^{C×N}, where N = H × W is the number of pixels; B is transposed to R^{N×C}, and matrix multiplication between the transpose of B and C gives an R^{N×N} result matrix; a softmax layer is then applied to calculate the spatial attention feature map S ∈ R^{N×N}, as in equation (1):

s_{ji} = \frac{\exp(B_i \cdot C_j)}{\sum_{i=1}^{N} \exp(B_i \cdot C_j)}    (1)

In equation (1), s_{ji} measures the effect of the i-th position on the j-th position; the more similar the feature representations of two positions, the greater the correlation between them. S is then transposed, and matrix multiplication between A and the transpose of S gives an R^{C×N} result, which is reshaped back to R^{C×H×W}; finally, this result is multiplied by the scale parameter α and summed element-wise with the original convolution feature D to obtain the final output H ∈ R^{C×H×W}, as in equation (2):

H_j = \alpha \sum_{i=1}^{N} (s_{ji} A_i) + D_j    (2)

In equation (2), α is initialized to 0 and gradually acquires more weight through learning.

The channel attention module first performs convolutions to extract feature maps E, F, G, H, where {E, F, G, H} ∈ R^{C×H×W}; E, F, G are reshaped to R^{C×N}, where N = H × W is the number of pixels; F is then transposed to R^{N×C}, and matrix multiplication between the transpose of F and E gives an R^{N×N} result matrix; a softmax layer is then applied to calculate the attention map X ∈ R^{N×N}, as in equation (3):

x_{ji} = \frac{\exp(F_i \cdot E_j)}{\sum_{i=1}^{N} \exp(F_i \cdot E_j)}    (3)

In equation (3), x_{ji} measures the effect of the i-th position on the j-th position. Matrix multiplication is then performed between the reshaped G and the softmax result X, giving an R^{C×N} result that is reshaped back to R^{C×H×W}; finally, this result is multiplied by the scale parameter β and summed element-wise with the original convolution feature H to obtain the final output I ∈ R^{C×H×W}, as in equation (4):

I_j = \beta \sum_{i=1}^{N} (x_{ji} G_i) + H_j    (4)
In equation (4), β is initialized to 0 and gradually acquires more weight through learning. The scaled element-wise sum that fuses the two attention outputs is calculated as in equation (5):

F_{out} = \lambda H + (1 - \lambda) I    (5)

In equation (5), λ is a hyper-parameter; it is set to 0.8 so that the position attention is emphasized for crack segmentation.
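For illustration, a compact sketch of the position attention computation of equations (1) and (2) is given below; the 1 x 1 projection convolutions and the channel reduction factor of 8 are assumptions borrowed from common dual-attention implementations, and the channel branch of equations (3) and (4) follows the same pattern.

```python
# Minimal sketch of the position attention module, equations (1)-(2).
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.to_b = nn.Conv2d(ch, ch // 8, 1)      # projection producing B
        self.to_c = nn.Conv2d(ch, ch // 8, 1)      # projection producing C
        self.to_a = nn.Conv2d(ch, ch, 1)           # projection producing A
        self.alpha = nn.Parameter(torch.zeros(1))  # scale parameter, initialized to 0

    def forward(self, d):                          # d = original feature D, shape (n, C, H, W)
        n, c, h, w = d.shape
        b = self.to_b(d).view(n, -1, h * w)            # (n, C', N)
        c_map = self.to_c(d).view(n, -1, h * w)        # (n, C', N)
        energy = torch.bmm(b.transpose(1, 2), c_map)   # (n, N, N), entry [i, j] = B_i . C_j
        s = torch.softmax(energy, dim=1)               # eq. (1): normalize over i
        a = self.to_a(d).view(n, c, h * w)             # (n, C, N)
        out = torch.bmm(a, s).view(n, c, h, w)         # sum_i s_ji * A_i
        return self.alpha * out + d                    # eq. (2): scaled sum with D
```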
CN202110012193.2A 2021-01-06 2021-01-06 Asphalt pavement crack image segmentation method based on deep convolutional neural network Active CN112634292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110012193.2A CN112634292B (en) 2021-01-06 2021-01-06 Asphalt pavement crack image segmentation method based on deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN112634292A (en) 2021-04-09
CN112634292B (en) 2021-08-24

Family

ID=75290771

Country Status (1)

Country Link
CN (1) CN112634292B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130273A1 (en) * 2017-10-27 2019-05-02 Salesforce.Com, Inc. Sequence-to-sequence prediction using a neural network model
CN110009641A (en) * 2019-03-08 2019-07-12 广州视源电子科技股份有限公司 Crystalline lens dividing method, device and storage medium
CN111222580A (en) * 2020-01-13 2020-06-02 西南科技大学 High-precision crack detection method
CN111402259A (en) * 2020-03-23 2020-07-10 杭州健培科技有限公司 Brain tumor segmentation method based on multi-level structure relation learning network
CN111986204A (en) * 2020-07-23 2020-11-24 中山大学 Polyp segmentation method and device and storage medium
CN111915592A (en) * 2020-08-04 2020-11-10 西安电子科技大学 Remote sensing image cloud detection method based on deep learning
CN112233105A (en) * 2020-10-27 2021-01-15 江苏科博空间信息科技有限公司 Road crack detection method based on improved FCN
CN112183507A (en) * 2020-11-30 2021-01-05 北京沃东天骏信息技术有限公司 Image segmentation method, device, equipment and storage medium

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129281A (en) * 2021-04-13 2021-07-16 广西大学 Wheat stem section parameter detection method based on deep learning
CN113129281B (en) * 2021-04-13 2022-06-21 广西大学 Wheat stem section parameter detection method based on deep learning
CN113313669A (en) * 2021-04-23 2021-08-27 石家庄铁道大学 Method for enhancing semantic features of top layer of surface disease image of subway tunnel
CN113284093A (en) * 2021-04-29 2021-08-20 安徽省皖北煤电集团有限责任公司 Satellite image cloud detection method based on improved D-LinkNet
CN113421276A (en) * 2021-07-02 2021-09-21 深圳大学 Image processing method, device and storage medium
CN113421276B (en) * 2021-07-02 2023-07-21 深圳大学 Image processing method, device and storage medium
CN114170232A (en) * 2021-12-02 2022-03-11 匀熵教育科技(无锡)有限公司 Transformer-based X-ray chest radiography automatic diagnosis and COVID-19 infected area segmentation method
CN114170232B (en) * 2021-12-02 2024-01-26 匀熵智能科技(无锡)有限公司 Transformer-based X-ray chest radiography automatic diagnosis and COVID-19 infected area distinguishing method
CN114494868B (en) * 2022-01-19 2022-11-22 安徽大学 Unmanned aerial vehicle remote sensing building extraction method based on multi-feature fusion deep learning
CN114494868A (en) * 2022-01-19 2022-05-13 安徽大学 Unmanned aerial vehicle remote sensing building extraction method based on multi-feature fusion deep learning
CN114596266A (en) * 2022-02-25 2022-06-07 烟台大学 Concrete crack detection method based on ConcreteCrackSegNet model
CN114596266B (en) * 2022-02-25 2023-04-07 烟台大学 Concrete crack detection method based on ConcreteCrackSegNet model
CN114724133B (en) * 2022-04-18 2024-02-02 北京百度网讯科技有限公司 Text detection and model training method, device, equipment and storage medium
CN114724133A (en) * 2022-04-18 2022-07-08 北京百度网讯科技有限公司 Character detection and model training method, device, equipment and storage medium
CN114782405A (en) * 2022-05-20 2022-07-22 盐城工学院 Bridge crack detection method and device based on image recognition and machine vision
CN115147381A (en) * 2022-07-08 2022-10-04 烟台大学 Pavement crack detection method based on image segmentation
CN115147439B (en) * 2022-07-11 2023-12-29 南京工业大学 Concrete crack segmentation method and system based on deep learning and attention mechanism
CN115147439A (en) * 2022-07-11 2022-10-04 南京工业大学 Concrete crack segmentation method and system based on deep learning and attention mechanism
CN114897909A (en) * 2022-07-15 2022-08-12 四川大学 Crankshaft surface crack monitoring method and system based on unsupervised learning
CN115571656B (en) * 2022-09-28 2023-06-02 华能伊敏煤电有限责任公司 Automatic soil discharging control method and system based on material level detection
CN115571656A (en) * 2022-09-28 2023-01-06 华能伊敏煤电有限责任公司 Automatic dumping control method and system based on material level detection
CN115731243A (en) * 2022-11-29 2023-03-03 北京长木谷医疗科技有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism
CN115731243B (en) * 2022-11-29 2024-02-09 北京长木谷医疗科技股份有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism
CN116993730B (en) * 2023-09-26 2023-12-15 四川新视创伟超高清科技有限公司 Crack detection method based on 8K image
CN116993730A (en) * 2023-09-26 2023-11-03 四川新视创伟超高清科技有限公司 Crack detection method based on 8K image
CN117455813A (en) * 2023-11-15 2024-01-26 齐鲁工业大学(山东省科学院) Method for restoring Chinese character images of shielding handwritten medical records based on gating convolution and SCPAM attention module

Similar Documents

Publication Publication Date Title
CN112634292B (en) Asphalt pavement crack image segmentation method based on deep convolutional neural network
CN110188765B (en) Image semantic segmentation model generation method, device, equipment and storage medium
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
CN114092832B (en) High-resolution remote sensing image classification method based on parallel hybrid convolutional network
CN109523013B (en) Air particulate matter pollution degree estimation method based on shallow convolutional neural network
CN113780296A (en) Remote sensing image semantic segmentation method and system based on multi-scale information fusion
CN113642390B (en) Street view image semantic segmentation method based on local attention network
CN112862774B (en) Accurate segmentation method for remote sensing image building
CN113850824A (en) Remote sensing image road network extraction method based on multi-scale feature fusion
CN112418212B (en) YOLOv3 algorithm based on EIoU improvement
CN113034444A (en) Pavement crack detection method based on MobileNet-PSPNet neural network model
CN116310339A (en) Remote sensing image segmentation method based on matrix decomposition enhanced global features
Nakhaee et al. DeepRadiation: An intelligent augmented reality platform for predicting urban energy performance just through 360 panoramic streetscape images utilizing various deep learning models
CN114529552A (en) Remote sensing image building segmentation method based on geometric contour vertex prediction
CN114170446A (en) Temperature and brightness characteristic extraction method based on deep fusion neural network
CN112634174B (en) Image representation learning method and system
CN114187530A (en) Remote sensing image change detection method based on neural network structure search
CN112347531B (en) Method and system for predicting three-dimensional crack propagation paths in brittle marble
CN112651314A (en) Automatic landslide disaster-bearing body identification method based on semantic gate and double-temporal LSTM
CN115601759A (en) End-to-end text recognition method, device, equipment and storage medium
CN115330703A (en) Remote sensing image cloud and cloud shadow detection method based on context information fusion
CN114241470A (en) Natural scene character detection method based on attention mechanism
CN113870341A (en) Blast furnace sintering ore particle size detection method and system based on RGB and laser feature fusion
CN112508441B (en) Urban high-density outdoor thermal comfort evaluation method based on deep learning three-dimensional reconstruction
CN116030347B (en) High-resolution remote sensing image building extraction method based on attention network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant