WO2024092968A1 - Pavement crack detection method, medium and system (一种路面裂缝检测方法、介质及系统)

Info

Publication number
WO2024092968A1
WO2024092968A1 PCT/CN2022/138775 CN2022138775W WO2024092968A1 WO 2024092968 A1 WO2024092968 A1 WO 2024092968A1 CN 2022138775 W CN2022138775 W CN 2022138775W WO 2024092968 A1 WO2024092968 A1 WO 2024092968A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
layer
neural network
convolutional
pooling layer
Prior art date
Application number
PCT/CN2022/138775
Other languages
English (en)
French (fr)
Inventor
王浩仰
潘宗俊
曹建坤
张菁红
弋晓明
孙浩宇
Original Assignee
中公高科养护科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中公高科养护科技股份有限公司
Publication of WO2024092968A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements

Definitions

  • the present application relates to the technical field of pavement crack detection, and in particular to a pavement crack detection method, medium and system.
  • Deep learning algorithms represented by convolutional neural networks have been applied by a large number of researchers to the task of detecting road cracks, and their detection speed and accuracy are far superior to traditional methods.
  • the complexity of the road background such as interference from shadows, stains, markings, repairs, etc.
  • the diversity of crack types such as light-colored cracks, fuzzy cracks, wet cracks, white cracks, etc.
  • shallow convolutional neural network models can roughly locate the location information of cracks, but in many images, it is impossible to effectively distinguish between some noise and crack information.
  • deep models can abstract more advanced features after multiple convolution and pooling operations, but some cracks with unclear features are easily filtered out, and only the crack parts with high responses are screened out, and the recognition accuracy is also low.
  • the embodiments of the present application provide a pavement crack detection method, medium and system to solve the problem of low accuracy in detecting pavement cracks in the prior art.
  • a pavement crack detection method comprising:
  • the pavement crack detection model includes: a convolutional neural network optimized by a residual network, an average pooling module, and an attention mechanism module
  • the convolutional neural network includes: a first convolutional neural network submodule, a second convolutional neural network submodule, a third convolutional neural network submodule, a fourth convolutional neural network submodule, a fifth convolutional neural network submodule, a superposition module, a weighting module, a recognition module, and an activation module connected in sequence
  • the average pooling module includes: a first average pooling layer, a second average pooling layer, a third average pooling layer, and a fourth average pooling layer;
  • the road surface image is input into the convolutional neural network, and the first convolutional neural network submodule, the second convolutional neural network submodule, the third convolutional neural network submodule, the fourth convolutional neural network submodule and the fifth convolutional neural network submodule respectively output a first feature map, a second feature map, a third feature map, a fourth feature map and a fifth feature map;
  • the first feature map, the second feature map, the third feature map and the fourth feature map are input into the first average pooling layer, the second average pooling layer, the third average pooling layer and the fourth average pooling layer respectively, the sixth feature map, the seventh feature map, the eighth feature map and the ninth feature map are output respectively;
  • the tenth feature map and the weight matrix are input into the weighting module, and after the tenth feature map and the weight matrix are dot-multiplied, they are input into the recognition module, and the eleventh feature map is output;
  • the eleventh feature map is input into the activation module, and a prediction matrix is outputted to characterize whether each square sub-image block of the road surface image has a crack, wherein each element of the prediction matrix corresponds to each square sub-image block of the road surface image, and the value of each element of the prediction matrix is 0 or 1.
  • the value of the element is 0, indicating that the corresponding square sub-image block has no crack, and the value of the element is 1, indicating that the corresponding square sub-image block has a crack.
  • a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, a pavement crack detection method as described in the first aspect embodiment is implemented.
  • a pavement crack detection system comprising: a computer-readable storage medium as in the above-mentioned second aspect embodiment.
  • the embodiment of the present application makes full use of the multi-scale feature map structure of the convolutional neural network model, making up for the problem of high missed recognition rate caused by the traditional convolutional neural network model using only single-scale feature maps; the inter-channel attention mechanism is used to process the superimposed multi-scale feature maps, and the multi-scale information is further screened, thereby improving the feature extraction effect of the model; it can significantly improve the accuracy of crack recognition under complex road backgrounds.
  • FIG1 is a flow chart of a pavement crack detection method according to an embodiment of the present application.
  • FIG2 is a schematic diagram of the structure of a pavement crack detection model according to an embodiment of the present application.
  • FIG3 is a schematic diagram of the structure of a convolutional neural network according to an embodiment of the present application.
  • FIG4 is a schematic diagram of the structure of a convolutional neural network and an average pooling module according to an embodiment of the present application.
  • Some embodiments of the present application disclose a road crack detection method. As shown in FIG1 , the method comprises the following steps:
  • Step S101 construct a pavement crack detection model.
  • the pavement crack detection model includes: a convolutional neural network optimized by a residual network, an average pooling module, and an attention mechanism module.
  • the convolutional neural network includes: a first convolutional neural network submodule, a second convolutional neural network submodule, a third convolutional neural network submodule, a fourth convolutional neural network submodule, a fifth convolutional neural network submodule, a superposition module, a weighting module, a recognition module and an activation module, which are connected in sequence.
  • the convolutional neural network used for crack recognition uses a deep convolutional neural network and can combine the information of multi-scale feature maps, so as to accurately locate the crack part while retaining as much original information as possible and reducing the situation of missed recognition.
  • the first convolutional neural network submodule includes a first residual block and a first maximum pooling layer connected in sequence.
  • the first residual block includes a first convolutional layer and a second convolutional layer connected in sequence.
  • the convolution kernel size of the first convolutional layer and the second convolutional layer is 5×5, and both have 32 convolution kernels.
  • the second convolutional neural network submodule includes a second residual block and a second maximum pooling layer connected in sequence.
  • the second residual block includes a third convolutional layer and a fourth convolutional layer connected in sequence.
  • the convolution kernel size of the third convolutional layer and the fourth convolutional layer is 3×3, and both have 64 convolution kernels.
  • the third convolutional neural network submodule includes a third residual block and a third maximum pooling layer connected in sequence.
  • the third residual block includes a fifth convolutional layer and a sixth convolutional layer connected in sequence, and the convolution kernel size of the fifth convolutional layer and the sixth convolutional layer is 3×3, and both have 128 convolution kernels.
  • the fourth convolutional neural network submodule includes a fourth residual block and a fourth maximum pooling layer connected in sequence.
  • the fourth residual block includes a seventh convolutional layer and an eighth convolutional layer connected in sequence.
  • the convolution kernel size of the seventh convolutional layer and the eighth convolutional layer is 3×3, and both have 256 convolution kernels.
  • the fifth convolutional neural network submodule includes a fifth residual block and a fifth maximum pooling layer connected in sequence.
  • the fifth residual block is a ninth convolutional layer.
  • the convolution kernel size of the ninth convolutional layer is 3×3, and the layer has 256 convolution kernels.
  • Convolution kernels can extract features of different spatial scales.
  • a 3×3 convolution kernel covers a 3×3 rectangular window of 9 elements, so it can model the spatial position correlation between all elements in this window.
  • a 5×5 convolution kernel covers a larger 5×5 rectangular window, so it can model the relationship between elements that are farther apart.
  • the window sizes of the first maximum pooling layer, the second maximum pooling layer, the third maximum pooling layer, the fourth maximum pooling layer and the fifth maximum pooling layer are all 2×2.
  • the images output by the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer, the sixth convolutional layer, the seventh convolutional layer, the eighth convolutional layer and the ninth convolutional layer are all processed by the activation function (sigmoid).
  • the images output by the first maximum pooling layer, the second maximum pooling layer, the third maximum pooling layer, and the fourth maximum pooling layer are all processed by batch normalization.
  • the recognition module is the tenth convolutional layer.
  • the convolution kernel size of the tenth convolutional layer is 1×1, and there is one convolution kernel.
  • the convolutional neural network is provided with initialized network parameters, wherein the network parameters include weights and biases.
  • the average pooling module includes: a first average pooling layer, a second average pooling layer, a third average pooling layer and a fourth average pooling layer.
  • the attention mechanism module includes: a global pooling layer, a first fully connected layer, and a second fully connected layer connected in sequence.
  • Step S102 After marking multiple square sub-image blocks in the road surface image, the road surface image is input into the convolutional neural network, and the first convolutional neural network submodule, the second convolutional neural network submodule, the third convolutional neural network submodule, the fourth convolutional neural network submodule and the fifth convolutional neural network submodule respectively output the first feature map, the second feature map, the third feature map, the fourth feature map and the fifth feature map.
  • the road surface image can be acquired by an acquisition device such as a camera.
  • the resolution of the acquired road surface image is set by the acquisition device.
  • the road surface image can be preprocessed in advance.
  • the road surface image is pre-processed by filling pixels on the right and bottom sides of the road surface image so that the pixels of the road surface image meet the first preset pixel resolution, wherein the grayscale value of the filled pixels is 255; in addition, after marking multiple square sub-image blocks in the road surface image, each square sub-image block in the road surface image is compressed to a second preset pixel resolution.
  • the resolution of the original collected road surface image is 3024×2048 pixels
  • pixels are padded on the right and bottom of the road surface image to pad the image to 3400×2200 pixels.
  • each marked square sub-image block is 100×100 pixels.
  • some embodiments of the present application scale the original road surface image by a factor of 32/100, so that each 100×100-pixel square sub-image block of the original road surface crack image is mapped to a 32×32-pixel square sub-image block after compression. In this way, the road surface image is divided into 34×22 square sub-image blocks.
  • the feature maps of different layers can be regarded as three-dimensional matrices, whose dimensions are image length, image width and number of channels. It should be noted that when the input is a large-size pavement crack image, the GPU computing power requirement is relatively high. Therefore, in order to reduce the computing pressure, some embodiments of the present application use a single-channel grayscale image for network calculation, that is, the input is Batch size×1×H×W.
  • Batch size represents the number of samples sent to the network each time, H represents the image length, and W represents the image width; the sizes of the first feature map, the second feature map, the third feature map, the fourth feature map and the fifth feature map are [H/2, W/2, 32], [H/4, W/4, 64], [H/8, W/8, 128], [H/16, W/16, 256], [H/32, W/32, 256], respectively.
  • Step S103 After the first feature map, the second feature map, the third feature map and the fourth feature map are input into the first average pooling layer, the second average pooling layer, the third average pooling layer and the fourth average pooling layer, respectively, the sixth feature map, the seventh feature map, the eighth feature map and the ninth feature map are output respectively.
  • this step is used to perform an average pooling operation on the larger feature maps at the higher level. This operation can both maintain the local features of the model and freely scale the size of the feature map.
  • the window size of the first average pooling layer is 16×16
  • the window size of the second average pooling layer is 8×8
  • the window size of the third average pooling layer is 4×4
  • the window size of the fourth average pooling layer is 2×2.
  • Step S104 input the fifth feature map, the sixth feature map, the seventh feature map, the eighth feature map and the ninth feature map into the superposition module for superposition, and then output the tenth feature map.
  • the length and width of the three-dimensional matrix of the superimposed feature map remain unchanged, and the number of channels is the sum of the number of channels of all multi-scale feature maps. In some embodiments of the present application, the number is 736 channels.
  • Step S105 Input the tenth feature map into the attention mechanism module and output the weight matrix.
  • the global pooling layer is used to perform global pooling on the tenth feature map and extract the feature value of each channel of the tenth feature map.
  • the tenth feature map has a total of 736 channels.
  • the first fully connected layer is used to encode the feature value of each channel of the tenth feature map into a feature vector of a preset length, wherein the preset length is 64.
  • the second fully connected layer is used to learn weights from each feature vector to form a weight matrix, that is, the weights occupied by different feature maps.
  • Step S106 input the tenth feature map and the weight matrix into the weighting module, perform dot multiplication of the tenth feature map and the weight matrix, input the recognition module, and output the eleventh feature map.
  • the weighting method is dot multiplication, that is, all pixel values in the feature map are multiplied by the weight, and the new feature map obtained is the weighted result.
  • the recognition module selects a convolutional layer that does not change the positional relationship. Its specific parameters are as described above and will not be repeated here.
  • Step S107 input the eleventh feature map into the activation module, and output a prediction matrix representing whether each square sub-image block of the road surface image has a crack.
  • the process of this step specifically includes:
  • the feature parameter value of each square sub-image block in the eleventh feature map is mapped to a value between 0 and 1 through an activation function to obtain a pavement crack prediction probability value of each square sub-image block.
  • each element of the prediction matrix corresponds to each square sub-image of the road surface image.
  • the value of each element of the prediction matrix is 0 or 1.
  • the value of the element is 0, which means that the corresponding square sub-image block has no cracks, and the value of the element is 1, which means that the corresponding square sub-image block has cracks.
  • the entire road surface image is used as input, and end-to-end segmentation prediction is performed on whether each square sub-image block contains cracks.
  • the pixel size of the scaled image is H×W
  • after five downsampling operations in the network, the length and width of the input road surface image are each reduced to 1/32 of the original, and the prediction matrix that is finally returned, indicating whether pavement cracks are present, has size H/32×W/32.
  • for the 3400×2200-pixel road surface image described above, a 34×22 prediction matrix is output.
  • Some embodiments of the present application further disclose a computer-readable storage medium, on which computer program instructions are stored; when the computer program instructions are executed by a processor, the pavement crack detection method as described in the above embodiments is implemented.
  • Some embodiments of the present application also disclose a pavement crack detection system, including: a computer-readable storage medium as described in the above embodiments.
  • the original model before optimization was also trained, and the test results were compared with the optimization model of some embodiments of the present application.
  • the main comparison models used include the linear convolutional neural network model CNN-1 as shown in Figure 3 corresponding to the optimization model of some embodiments of the present application and the multi-scale feature map superposition model CNN-2 as shown in Figure 4.
  • various models are trained and tested on an actual pavement crack dataset, and the comparison results are shown in Table 1.
  • the evaluation index uses the positive sample similarity index coefficient, and the calculation formula is as follows:
  • X represents the probability matrix formed by the original image through network feedforward
  • Y represents the labeling matrix formed by the labeling file.
  • ||X||_1 and ||Y||_1 represent the L1 norms of the two matrices respectively;
  • X*Y represents the Hadamard product of the two matrices; smooth is a smoothing factor added to avoid division-by-zero errors, and is usually a small positive real number. In some embodiments of the present application, the value is 10⁻³.
  • some embodiments of the present application make full use of the multi-scale feature map structure of the convolutional neural network model, which makes up for the problem of high missed recognition rate caused by the traditional convolutional neural network model using only single-scale feature maps; the inter-channel attention mechanism is used to process the superimposed multi-scale feature maps, and the multi-scale information is further screened, thereby improving the feature extraction effect of the model; and the accuracy of crack recognition under complex road backgrounds can be significantly improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

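The present application discloses a pavement crack detection method, medium and system. The method comprises: marking a plurality of square sub-image blocks in a road surface image and inputting the image into a convolutional neural network, which outputs a first, second, third, fourth and fifth feature map; inputting the first, second, third and fourth feature maps into a first, second, third and fourth average pooling layer, respectively, which output a sixth, seventh, eighth and ninth feature map; inputting the fifth, sixth, seventh, eighth and ninth feature maps into a superposition module, which outputs a tenth feature map; inputting the tenth feature map into an attention mechanism module, which outputs a weight matrix; inputting the tenth feature map and the weight matrix into a weighting module and then into a recognition module, which outputs an eleventh feature map; and inputting the eleventh feature map into an activation module, which outputs a prediction matrix. The present application improves the accuracy of crack recognition.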

Description

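Pavement crack detection method, medium and system
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202211355412.8, filed with the China Patent Office on November 1, 2022 and entitled "Pavement crack detection method, medium and system" (一种路面裂缝检测方法、介质及系统), the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the technical field of pavement crack detection, and in particular to a pavement crack detection method, medium and system.
Background
Deep learning algorithms represented by convolutional neural networks have been applied by many researchers to pavement crack detection, and their detection speed and accuracy far exceed those of traditional methods. However, in practical engineering scenarios the road background is complex (for example, interference from shadows, stains, road markings and repairs) and crack types are diverse (for example, light-colored cracks, blurred cracks, wet cracks and white cracks). A shallow convolutional neural network model used alone can roughly locate cracks, but in many images it cannot effectively distinguish some of the noise from crack information. By contrast, a deep model can abstract higher-level features after multiple convolution and pooling operations, but cracks with indistinct features are easily filtered out and only the crack portions with high responses are retained, so the recognition accuracy is likewise low.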
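Summary
The embodiments of the present application provide a pavement crack detection method, medium and system to solve the problem of low accuracy in detecting pavement cracks in the prior art.
In a first aspect, a pavement crack detection method is provided, comprising:
constructing a pavement crack detection model, wherein the pavement crack detection model comprises a convolutional neural network optimized by a residual network, an average pooling module and an attention mechanism module; the convolutional neural network comprises a first, second, third, fourth and fifth convolutional neural network submodule, a superposition module, a weighting module, a recognition module and an activation module connected in sequence; and the average pooling module comprises a first, second, third and fourth average pooling layer;
after marking a plurality of square sub-image blocks in a road surface image, inputting the road surface image into the convolutional neural network, wherein the first, second, third, fourth and fifth convolutional neural network submodules output a first, second, third, fourth and fifth feature map, respectively;
inputting the first, second, third and fourth feature maps into the first, second, third and fourth average pooling layers, respectively, which output a sixth, seventh, eighth and ninth feature map, respectively;
inputting the fifth, sixth, seventh, eighth and ninth feature maps into the superposition module for superposition, and outputting a tenth feature map;
inputting the tenth feature map into the attention mechanism module, and outputting a weight matrix;
inputting the tenth feature map and the weight matrix into the weighting module, dot-multiplying the tenth feature map by the weight matrix, inputting the result into the recognition module, and outputting an eleventh feature map;
inputting the eleventh feature map into the activation module, and outputting a prediction matrix that characterizes whether each square sub-image block of the road surface image has a crack, wherein each element of the prediction matrix corresponds to one square sub-image block of the road surface image, the value of each element is 0 or 1, a value of 0 indicates that the corresponding square sub-image block has no crack, and a value of 1 indicates that the corresponding square sub-image block has a crack.
In a second aspect, a computer-readable storage medium is provided, on which computer program instructions are stored; when the computer program instructions are executed by a processor, the pavement crack detection method of the first-aspect embodiment is implemented.
In a third aspect, a pavement crack detection system is provided, comprising the computer-readable storage medium of the second-aspect embodiment.
In this way, the embodiments of the present application make full use of the multi-scale feature map structure of the convolutional neural network model, which compensates for the high missed-recognition rate caused by traditional convolutional neural network models that use only single-scale feature maps; an inter-channel attention mechanism processes the superimposed multi-scale feature maps and further screens the multi-scale information, thereby improving the feature extraction of the model; and the accuracy of crack recognition under complex road backgrounds can be significantly improved.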
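Brief description of the drawings
To explain the technical solutions of some embodiments of the present application more clearly, the drawings needed in the description of those embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a pavement crack detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the structure of a pavement crack detection model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the structure of a convolutional neural network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the structure of a convolutional neural network and an average pooling module according to an embodiment of the present application.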
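Detailed description
The technical solutions in some embodiments of the present application are described clearly and completely below with reference to the drawings of those embodiments. Obviously, the described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Some embodiments of the present application disclose a pavement crack detection method. As shown in FIG. 1, the method comprises the following steps:
Step S101: construct a pavement crack detection model.
Specifically, as shown in FIG. 2, the pavement crack detection model includes a convolutional neural network optimized by a residual network, an average pooling module and an attention mechanism module.
The convolutional neural network includes a first, second, third, fourth and fifth convolutional neural network submodule, a superposition module, a weighting module, a recognition module and an activation module, connected in sequence. This convolutional neural network for crack recognition is a deep convolutional neural network that can combine the information of multi-scale feature maps, so that it locates crack portions accurately while retaining as much of the original information as possible and reducing missed recognition.
Specifically, the first convolutional neural network submodule includes a first residual block and a first maximum pooling layer connected in sequence. The first residual block includes a first convolutional layer and a second convolutional layer connected in sequence. The first and second convolutional layers both have a convolution kernel size of 5×5 and 32 convolution kernels each.
The second convolutional neural network submodule includes a second residual block and a second maximum pooling layer connected in sequence. The second residual block includes a third convolutional layer and a fourth convolutional layer connected in sequence. The third and fourth convolutional layers both have a convolution kernel size of 3×3 and 64 convolution kernels each.
The third convolutional neural network submodule includes a third residual block and a third maximum pooling layer connected in sequence. The third residual block includes a fifth convolutional layer and a sixth convolutional layer connected in sequence. The fifth and sixth convolutional layers both have a convolution kernel size of 3×3 and 128 convolution kernels each.
The fourth convolutional neural network submodule includes a fourth residual block and a fourth maximum pooling layer connected in sequence. The fourth residual block includes a seventh convolutional layer and an eighth convolutional layer connected in sequence. The seventh and eighth convolutional layers both have a convolution kernel size of 3×3 and 256 convolution kernels each.
The fifth convolutional neural network submodule includes a fifth residual block and a fifth maximum pooling layer connected in sequence. The fifth residual block is a ninth convolutional layer. The ninth convolutional layer has a convolution kernel size of 3×3 and 256 convolution kernels.
Convolution kernels can extract features at different spatial scales. A 3×3 convolution kernel covers a 3×3 rectangular window of 9 elements, so it can model the spatial position correlation between all elements in that window. Similarly, a 5×5 convolution kernel covers a larger 5×5 rectangular window, so it can model the relationship between elements that are farther apart.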
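The effect of kernel size can be illustrated with a minimal pure-Python sketch. It assumes zero ("same") padding, which the text does not state explicitly but which is consistent with the feature map sizes given later (only the pooling layers change height and width). The names `conv2d_same` and `box_kernel` are hypothetical helpers, not part of the application.

```python
def conv2d_same(image, kernel):
    """Cross-correlate a 2-D list with a square, odd-sized kernel.

    Positions outside the image are treated as zero (assumed "same" padding),
    so the output has the same height and width as the input.
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2  # kernel radius: 1 for 3x3, 2 for 5x5
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out


def box_kernel(k):
    """All-ones k x k kernel: the response counts the pixels the window covers."""
    return [[1.0] * k for _ in range(k)]


ones = [[1.0] * 4 for _ in range(4)]
out3 = conv2d_same(ones, box_kernel(3))  # each output sums up to 9 neighbors
out5 = conv2d_same(ones, box_kernel(5))  # each output sums up to 25 neighbors
```

With an all-ones kernel the response at each position equals the number of image pixels inside the window, which makes the larger reach of the 5×5 kernel directly visible.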
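The window sizes of the first, second, third, fourth and fifth maximum pooling layers are all 2×2.
The images output by the first, second, third, fourth, fifth, sixth, seventh, eighth and ninth convolutional layers are all processed by an activation function (sigmoid).
The images output by the first, second, third and fourth maximum pooling layers are all processed by batch normalization (BatchNormalization).
By using the above ReLU activation function, nonlinearity is added and the vanishing-gradient problem is alleviated. The above batch normalization prevents shifts in the distribution of the internal data.
The recognition module is a tenth convolutional layer. The tenth convolutional layer has a convolution kernel size of 1×1 and one convolution kernel.
It should be understood that the convolutional neural network is provided with initialized network parameters. The network parameters include weights and biases.
The average pooling module includes a first, second, third and fourth average pooling layer.
The attention mechanism module includes a global pooling layer, a first fully connected layer and a second fully connected layer connected in sequence.
Step S102: after marking a plurality of square sub-image blocks in the road surface image, input the road surface image into the convolutional neural network; the first, second, third, fourth and fifth convolutional neural network submodules output the first, second, third, fourth and fifth feature maps, respectively.
The road surface image can be acquired by an acquisition device such as a camera. The resolution of the acquired road surface image is set by the acquisition device. To make the resolution of the road surface image suitable for the convolutional neural network of some embodiments of the present application, the road surface image can be preprocessed in advance.
Specifically, the road surface image is preprocessed by filling pixels on its right and bottom sides so that it meets a first preset pixel resolution, the filled pixels having a grayscale value of 255; in addition, after the square sub-image blocks are marked in the road surface image, each square sub-image block is compressed to a second preset pixel resolution.
For example, if the originally acquired road surface image has a resolution of 3024×2048 pixels, pixels are filled on the right and bottom sides of the image to pad it to 3400×2200 pixels. Each marked square sub-image block is 100×100 pixels. To simplify computation and reduce the number of parameters to be computed, some embodiments of the present application scale the original road surface image by a factor of 32/100, so that each 100×100-pixel square sub-image block of the original road surface crack image is mapped to a 32×32-pixel square sub-image block after compression. In this way, the road surface image is divided into 34×22 square sub-image blocks.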
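The padding and scaling arithmetic of this example can be checked with a short sketch. The function names `pad_right_bottom` and `grid_after_padding` are hypothetical; the padded size is taken as a given preset (3400×2200 in the example) rather than derived from the raw resolution, because the text only calls it a "first preset pixel resolution".

```python
def pad_right_bottom(image, out_w, out_h, fill=255):
    """Pad a grayscale image (list of rows) on the right and bottom with `fill`."""
    h, w = len(image), len(image[0])
    if out_w < w or out_h < h:
        raise ValueError("target size must not be smaller than the image")
    padded = [row + [fill] * (out_w - w) for row in image]
    padded += [[fill] * out_w for _ in range(out_h - h)]
    return padded


def grid_after_padding(padded_w, padded_h, block=100, target=32):
    """Return ((columns, rows) of blocks, (width, height) after scaling by target/block)."""
    if padded_w % block or padded_h % block:
        raise ValueError("padded size must be a multiple of the block size")
    cols, rows = padded_w // block, padded_h // block
    return (cols, rows), (cols * target, rows * target)


tiny = pad_right_bottom([[0, 1], [2, 3]], out_w=3, out_h=3)
grid, scaled = grid_after_padding(3400, 2200)
```

Under these assumptions the 3400×2200 padded image yields a 34×22 grid and a scaled image of 1088×704 pixels, so each 32×32 region of the scaled image corresponds to one marked 100×100 block.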
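The feature maps of the different layers can all be regarded as three-dimensional matrices whose dimensions are image length, image width and number of channels. It should be noted that a large road surface crack image as input places high demands on GPU computing power. Therefore, to reduce the computational load, some embodiments of the present application use single-channel grayscale images for network computation, that is, the input has shape Batch size×1×H×W, where Batch size is the number of samples fed to the network at a time, H is the image length and W is the image width. The sizes of the first, second, third, fourth and fifth feature maps are [H/2, W/2, 32], [H/4, W/4, 64], [H/8, W/8, 128], [H/16, W/16, 256] and [H/32, W/32, 256], respectively.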
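The listed feature map sizes follow from halving the height and width once per submodule. The sketch below assumes that the convolutions preserve spatial size and that each 2×2 maximum pooling layer uses a stride of 2; neither is stated explicitly, but both are implied by the sizes above. `feature_map_shapes` is a hypothetical helper.

```python
def feature_map_shapes(h, w, channels=(32, 64, 128, 256, 256)):
    """Shapes [H, W, C] after each of the five submodules (2x2 pooling, stride 2 assumed)."""
    shapes = []
    for c in channels:
        h, w = h // 2, w // 2  # each maximum pooling layer halves both dimensions
        shapes.append((h, w, c))
    return shapes


# Scaled example image from the description: 1088 wide, 704 high.
shapes = feature_map_shapes(h=704, w=1088)
```

For the 1088×704 example the fifth feature map is 22×34 spatially, which matches the 34×22 grid of square sub-image blocks.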
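Step S103: input the first, second, third and fourth feature maps into the first, second, third and fourth average pooling layers, respectively, which output the sixth, seventh, eighth and ninth feature maps, respectively.
Because the first, second, third, fourth and fifth feature maps all differ in size, they cannot be merged directly along the third dimension (channels) in the subsequent steps. Therefore, before merging, this step first applies average pooling to the larger feature maps of the upper layers. This operation preserves the local features of the model while allowing the size of the feature maps to be scaled freely.
Specifically, the window size of the first average pooling layer is 16×16, that of the second is 8×8, that of the third is 4×4, and that of the fourth is 2×2.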
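A minimal sketch of the average pooling step, assuming non-overlapping windows (stride equal to the window size), which is what makes every pooled map land on the H/32×W/32 grid of the fifth feature map. `avg_pool2d` is a hypothetical helper operating on a single channel.

```python
def avg_pool2d(channel, window):
    """Average-pool one channel (list of rows) with non-overlapping square windows."""
    h, w = len(channel), len(channel[0])
    pooled = []
    for i in range(0, h - window + 1, window):
        row = []
        for j in range(0, w - window + 1, window):
            total = sum(channel[y][x]
                        for y in range(i, i + window)
                        for x in range(j, j + window))
            row.append(total / (window * window))
        pooled.append(row)
    return pooled


# Feature map k (k = 1..4) is already downsampled by 2**k; its window brings it to 1/32.
windows = {1: 16, 2: 8, 3: 4, 4: 2}
total_factors = [2 ** k * win for k, win in windows.items()]

demo = [[float(4 * r + c) for c in range(4)] for r in range(4)]
pooled_demo = avg_pool2d(demo, 2)
```

Each product of the existing downsampling factor and the window size equals 32, so the sixth to ninth feature maps share the spatial size of the fifth.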
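Step S104: input the fifth, sixth, seventh, eighth and ninth feature maps into the superposition module for superposition, and output the tenth feature map.
The length and width of the three-dimensional matrix of the superimposed feature map are unchanged, and its number of channels is the sum of the channel counts of all the multi-scale feature maps, which is 736 in some embodiments of the present application.
By superimposing the feature maps of all levels, more detail from the original image is brought into the recognition process, which prevents crack portions with indistinct features and low responses from being filtered out as the network deepens.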
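The channel count of the tenth feature map can be verified with a small sketch of channel-wise concatenation. For convenience the sketch stores a feature map as a list of channels (channel-first), whereas the description lists sizes as [length, width, channels]; this layout and the helper names `concat_channels` and `blank_map` are illustrative assumptions.

```python
def concat_channels(*feature_maps):
    """Stack feature maps along the channel axis; all must share height and width."""
    first = feature_maps[0]
    h, w = len(first[0]), len(first[0][0])
    stacked = []
    for fmap in feature_maps:
        for channel in fmap:
            if len(channel) != h or len(channel[0]) != w:
                raise ValueError("all feature maps must share the same height and width")
            stacked.append(channel)
    return stacked


def blank_map(n_channels, h=1, w=1):
    """Placeholder feature map filled with zeros."""
    return [[[0.0] * w for _ in range(h)] for _ in range(n_channels)]


# Fifth (256), sixth (32), seventh (64), eighth (128) and ninth (256) feature maps.
fifth, sixth, seventh, eighth, ninth = (blank_map(c) for c in (256, 32, 64, 128, 256))
tenth = concat_channels(fifth, sixth, seventh, eighth, ninth)
```

The sum 256 + 32 + 64 + 128 + 256 gives the 736 channels stated in the description.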
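Step S105: input the tenth feature map into the attention mechanism module, and output the weight matrix.
The global pooling layer performs global pooling on the tenth feature map and extracts a feature value for each channel of the tenth feature map. The tenth feature map has 736 channels in total.
The first fully connected layer encodes the per-channel feature values of the tenth feature map into a feature vector of a preset length. The preset length is 64.
The second fully connected layer learns weights from each feature vector to form the weight matrix, that is, the weights assigned to the different feature maps.
Adding weights to the feature maps allows the information of the multi-scale feature maps to be combined more effectively.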
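The attention module reads like a squeeze-and-excitation style channel attention block, and the sketch below follows that reading. Several details are assumptions, not taken from the application: global average pooling (the text only says "global pooling"), a ReLU after the first fully connected layer, and a sigmoid after the second. The demo uses 8 channels and a hidden length of 4 in place of 736 and 64 to stay small; all names are hypothetical.

```python
import math
import random


def global_avg_pool(channels):
    """One value per channel: the mean over all spatial positions (assumed pooling type)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in channels]


def dense(x, weights, bias):
    """Fully connected layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def channel_attention(channels, w1, b1, w2, b2):
    """Return one weight in (0, 1) per channel (ReLU and sigmoid are assumptions)."""
    squeezed = global_avg_pool(channels)                       # C values
    hidden = [max(0.0, v) for v in dense(squeezed, w1, b1)]    # encoded vector
    return [1.0 / (1.0 + math.exp(-v)) for v in dense(hidden, w2, b2)]


rng = random.Random(0)
n_channels, hidden_len = 8, 4  # stand-ins for 736 and 64
channels = [[[rng.random() for _ in range(3)] for _ in range(2)] for _ in range(n_channels)]
w1 = [[rng.uniform(-1, 1) for _ in range(n_channels)] for _ in range(hidden_len)]
b1 = [0.0] * hidden_len
w2 = [[rng.uniform(-1, 1) for _ in range(hidden_len)] for _ in range(n_channels)]
b2 = [0.0] * n_channels
weights = channel_attention(channels, w1, b1, w2, b2)
```

In a trained model the two weight matrices would be learned; random values are used here only to show the data flow from 736 pooled values to a length-64 encoding and back to one weight per channel.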
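Step S106: input the tenth feature map and the weight matrix into the weighting module, dot-multiply the tenth feature map by the weight matrix, input the result into the recognition module, and output the eleventh feature map.
The weighting method is dot multiplication, that is, all pixel values within a feature map are multiplied by its weight, and the resulting new feature map is the weighted result.
In the crack detection of some embodiments of the present application, the image annotations have relative positional relationships, so the recognition module uses a convolutional layer that does not change positional relationships. Its specific parameters are as described above and are not repeated here.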
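The weighting and recognition steps can be sketched as a per-channel scaling followed by a 1×1 convolution with a single kernel, which reduces to a weighted sum over channels at each spatial position. The channel-first list layout and the names `apply_channel_weights` and `conv1x1_single` are illustrative assumptions.

```python
def apply_channel_weights(channels, weights):
    """Multiply every pixel of channel c by weights[c]."""
    return [[[value * wt for value in row] for row in ch]
            for ch, wt in zip(channels, weights)]


def conv1x1_single(channels, kernel, bias=0.0):
    """1x1 convolution with one kernel: a weighted sum over channels per position.

    Spatial positions are untouched, so the output keeps the block layout.
    """
    h, w = len(channels[0]), len(channels[0][0])
    return [[bias + sum(kernel[c] * channels[c][i][j] for c in range(len(channels)))
             for j in range(w)]
            for i in range(h)]


tenth = [[[1.0, 2.0]], [[3.0, 4.0]]]          # 2 channels, 1 x 2 spatial grid
weighted = apply_channel_weights(tenth, [0.5, 2.0])
eleventh = conv1x1_single(weighted, kernel=[1.0, -1.0])
```

Because a 1×1 kernel never mixes neighboring positions, each output value depends only on the channel values of its own sub-image block, which is the property the description relies on.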
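Step S107: input the eleventh feature map into the activation module, and output a prediction matrix that characterizes whether each square sub-image block of the road surface image has a crack.
The process of this step specifically includes:
(1) The feature parameter value of each square sub-image block in the eleventh feature map is mapped to a value between 0 and 1 by an activation function, giving a pavement crack prediction probability value for each square sub-image block.
(2) Each feature parameter value is binarized against a preset pavement crack prediction probability threshold.
Depending on the result of the comparison, there are the following two cases.
(3) When the pavement crack prediction probability value of a square sub-image block is greater than the pavement crack prediction probability threshold, the feature parameter value of that square sub-image block is marked as 1, and the element of the prediction matrix corresponding to that square sub-image block is assigned the value 1.
(4) When the pavement crack prediction probability value of a square sub-image block is not greater than the pavement crack prediction probability threshold, the feature parameter value of that square sub-image block is marked as 0, and the element of the prediction matrix corresponding to that square sub-image block is assigned the value 0.
As the above process shows, each element of the prediction matrix corresponds to one square sub-image of the road surface image. The value of each element of the prediction matrix is 0 or 1. A value of 0 indicates that the corresponding square sub-image block has no crack, and a value of 1 indicates that the corresponding square sub-image block has a crack.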
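Steps (1) to (4) amount to a sigmoid followed by a strict greater-than comparison. In the sketch below the sigmoid is an assumption (the text only says "activation function" mapping to 0~1), and the threshold value 0.5 is a placeholder, since the application only calls it "preset". `predict_blocks` is a hypothetical name.

```python
import math


def sigmoid(value):
    """Map a real value to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def predict_blocks(eleventh_map, threshold=0.5):
    """Binarize per-block values: 1 if probability > threshold, else 0."""
    return [[1 if sigmoid(value) > threshold else 0 for value in row]
            for row in eleventh_map]


prediction = predict_blocks([[-2.0, 0.0, 3.0]])
```

Note that a probability exactly equal to the threshold is assigned 0, matching the "not greater than" wording of case (4).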
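Considering the continuity of crack features, the whole road surface image is taken as input, and end-to-end segmentation prediction is performed on whether each square sub-image block contains a crack. In a specific embodiment, if the scaled image is H×W pixels, there are H/32 rows and W/32 columns of square sub-image blocks in total. Correspondingly, after five downsampling operations in the network, the length and width of the input road surface image are each reduced to 1/32 of the original, and the prediction matrix that is finally returned, indicating whether pavement cracks are present, has size H/32×W/32. Specifically, for the 3400×2200-pixel road surface image described above, a 34×22 prediction matrix is output.
Some embodiments of the present application further disclose a computer-readable storage medium on which computer program instructions are stored; when the computer program instructions are executed by a processor, the pavement crack detection method described in the above embodiments is implemented.
Some embodiments of the present application further disclose a pavement crack detection system, comprising the computer-readable storage medium described in the above embodiments.
To evaluate the detection performance of the optimized model proposed in some embodiments of the present application, the original model before optimization was also trained, and its test results were compared with those of the optimized model. The main comparison models are the linear convolutional neural network model CNN-1 shown in FIG. 3, which corresponds to the optimized model of some embodiments of the present application, and the model CNN-2 with superimposed multi-scale feature maps shown in FIG. 4.
In some embodiments of the present application, each model is trained and tested on a real pavement crack dataset, and the comparison results are shown in Table 1. The evaluation metric is the positive-sample similarity index, coefficient, calculated by the following formula: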
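[Formula image PCTCN2022138775-appb-000001, not reproduced in this text]
Here X is the probability matrix produced by feeding the original image forward through the network, and Y is the label matrix built from the annotation file. ||X||_1 and ||Y||_1 are the L1 norms of the two matrices; X*Y is the Hadamard product of the two matrices; smooth is a smoothing factor added to avoid division-by-zero errors, usually a small positive real number, and takes the value 10⁻³ in some embodiments of the present application.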
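The formula itself survives only as an image reference, so its exact form cannot be confirmed from this text. The symbols defined above (L1 norms of X and Y, the Hadamard product X*Y, and a smoothing term) are consistent with a smoothed Dice coefficient, coefficient = (2·||X*Y||_1 + smooth) / (||X||_1 + ||Y||_1 + smooth). The sketch below implements that assumed form; it is a reconstruction for illustration, not the application's confirmed formula, and `dice_coefficient` is a hypothetical name.

```python
def l1_norm(matrix):
    """Entrywise L1 norm: the sum of absolute values of all elements."""
    return sum(abs(value) for row in matrix for value in row)


def hadamard(x, y):
    """Elementwise (Hadamard) product of two equally sized matrices."""
    return [[a * b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


def dice_coefficient(x, y, smooth=1e-3):
    """Assumed smoothed Dice form: (2*|X*Y|_1 + smooth) / (|X|_1 + |Y|_1 + smooth)."""
    overlap = l1_norm(hadamard(x, y))
    return (2.0 * overlap + smooth) / (l1_norm(x) + l1_norm(y) + smooth)


perfect = dice_coefficient([[1, 0], [0, 1]], [[1, 0], [0, 1]])
disjoint = dice_coefficient([[1, 0]], [[0, 1]])
empty = dice_coefficient([[0, 0]], [[0, 0]])
```

Under this assumed form a perfect match scores 1, disjoint predictions score close to 0, and the smoothing term keeps the ratio defined when both matrices are all zeros.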
Table 1. Similarity index of different models
Model                                          coefficient
CNN-1                                          0.6676
CNN-2                                          0.7163
Model of the embodiments of this application   0.7210
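The results in Table 1 show that, after the feature maps are superimposed, CNN-2 achieves a clearly higher positive-sample similarity index than CNN-1. Some embodiments of the present application further add the attention module, and the resulting optimized model achieves the best positive-sample similarity index, which shows that the optimization method proposed in some embodiments of the present application judges the presence or absence of cracks under complex road backgrounds more precisely.
In summary, some embodiments of the present application make full use of the multi-scale feature map structure of the convolutional neural network model, which compensates for the high missed-recognition rate caused by traditional convolutional neural network models that use only single-scale feature maps; an inter-channel attention mechanism processes the superimposed multi-scale feature maps and further screens the multi-scale information, thereby improving the feature extraction of the model; and the accuracy of crack recognition under complex road backgrounds can be significantly improved.
The above are only some implementations of the present application, but the scope of protection of the present application is not limited to them. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of protection of the claims.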

Claims (10)

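  1. A pavement crack detection method, characterized by comprising:
    constructing a pavement crack detection model, wherein the pavement crack detection model comprises a convolutional neural network optimized by a residual network, an average pooling module and an attention mechanism module; the convolutional neural network comprises a first convolutional neural network submodule, a second convolutional neural network submodule, a third convolutional neural network submodule, a fourth convolutional neural network submodule, a fifth convolutional neural network submodule, a superposition module, a weighting module, a recognition module and an activation module connected in sequence; and the average pooling module comprises a first average pooling layer, a second average pooling layer, a third average pooling layer and a fourth average pooling layer;
    after marking a plurality of square sub-image blocks in a road surface image, inputting the road surface image into the convolutional neural network, wherein the first, second, third, fourth and fifth convolutional neural network submodules respectively output a first feature map, a second feature map, a third feature map, a fourth feature map and a fifth feature map;
    inputting the first feature map, the second feature map, the third feature map and the fourth feature map into the first average pooling layer, the second average pooling layer, the third average pooling layer and the fourth average pooling layer, respectively, and respectively outputting a sixth feature map, a seventh feature map, an eighth feature map and a ninth feature map;
    inputting the fifth feature map, the sixth feature map, the seventh feature map, the eighth feature map and the ninth feature map into the superposition module for superposition, and outputting a tenth feature map;
    inputting the tenth feature map into the attention mechanism module, and outputting a weight matrix;
    inputting the tenth feature map and the weight matrix into the weighting module, dot-multiplying the tenth feature map by the weight matrix, inputting the result into the recognition module, and outputting an eleventh feature map;
    inputting the eleventh feature map into the activation module, and outputting a prediction matrix that characterizes whether each of the square sub-image blocks of the road surface image has a crack, wherein each element of the prediction matrix corresponds to one of the square sub-image blocks of the road surface image, the value of each element of the prediction matrix is 0 or 1, a value of 0 indicates that the corresponding square sub-image block has no crack, and a value of 1 indicates that the corresponding square sub-image block has a crack.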
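  2. The pavement crack detection method according to claim 1, characterized in that:
    the first convolutional neural network submodule comprises a first residual block and a first maximum pooling layer connected in sequence, the first residual block comprises a first convolutional layer and a second convolutional layer connected in sequence, and the first convolutional layer and the second convolutional layer both have a convolution kernel size of 5×5 and each have 32 convolution kernels;
    the second convolutional neural network submodule comprises a second residual block and a second maximum pooling layer connected in sequence, the second residual block comprises a third convolutional layer and a fourth convolutional layer connected in sequence, and the third convolutional layer and the fourth convolutional layer both have a convolution kernel size of 3×3 and each have 64 convolution kernels;
    the third convolutional neural network submodule comprises a third residual block and a third maximum pooling layer connected in sequence, the third residual block comprises a fifth convolutional layer and a sixth convolutional layer connected in sequence, and the fifth convolutional layer and the sixth convolutional layer both have a convolution kernel size of 3×3 and each have 128 convolution kernels;
    the fourth convolutional neural network submodule comprises a fourth residual block and a fourth maximum pooling layer connected in sequence, the fourth residual block comprises a seventh convolutional layer and an eighth convolutional layer connected in sequence, and the seventh convolutional layer and the eighth convolutional layer both have a convolution kernel size of 3×3 and each have 256 convolution kernels;
    the fifth convolutional neural network submodule comprises a fifth residual block and a fifth maximum pooling layer connected in sequence, the fifth residual block is a ninth convolutional layer, and the ninth convolutional layer has a convolution kernel size of 3×3 and 256 convolution kernels;
    the window sizes of the first maximum pooling layer, the second maximum pooling layer, the third maximum pooling layer, the fourth maximum pooling layer and the fifth maximum pooling layer are all 2×2.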
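  3. The pavement crack detection method according to claim 2, characterized in that: the feature maps output by the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer, the sixth convolutional layer, the seventh convolutional layer, the eighth convolutional layer and the ninth convolutional layer are all processed by an activation function;
    the feature maps output by the first maximum pooling layer, the second maximum pooling layer, the third maximum pooling layer and the fourth maximum pooling layer are all processed by batch normalization.
  4. The pavement crack detection method according to claim 1, characterized in that: the window size of the first average pooling layer is 16×16, the window size of the second average pooling layer is 8×8, the window size of the third average pooling layer is 4×4, and the window size of the fourth average pooling layer is 2×2.
  5. The pavement crack detection method according to claim 1, characterized in that the attention mechanism module comprises a global pooling layer, a first fully connected layer and a second fully connected layer connected in sequence;
    the global pooling layer is configured to perform global pooling on the tenth feature map and extract a feature value for each channel of the tenth feature map, the first fully connected layer is configured to encode the feature values of the channels of the tenth feature map into a feature vector of a preset length, and the second fully connected layer is configured to learn weights from each feature vector to form the weight matrix.
  6. The pavement crack detection method according to claim 1, characterized in that: the recognition module is a tenth convolutional layer, and the tenth convolutional layer has a convolution kernel size of 1×1 and one convolution kernel.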
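  7. The pavement crack detection method according to claim 1, characterized in that the step of outputting a prediction matrix that characterizes whether each of the square sub-image blocks of the road surface image has a crack comprises:
    mapping the feature parameter value of each square sub-image block in the eleventh feature map to a value between 0 and 1 by an activation function, to obtain a pavement crack prediction probability value for each square sub-image block;
    binarizing each feature parameter value against a preset pavement crack prediction probability threshold;
    when the pavement crack prediction probability value of a square sub-image block is greater than the pavement crack prediction probability threshold, marking the feature parameter value of that square sub-image block as 1 and assigning the value 1 to the element of the prediction matrix corresponding to that square sub-image block;
    when the pavement crack prediction probability value of a square sub-image block is not greater than the pavement crack prediction probability threshold, marking the feature parameter value of that square sub-image block as 0 and assigning the value 0 to the element of the prediction matrix corresponding to that square sub-image block.
  8. The pavement crack detection method according to claim 1, characterized in that: the road surface image is preprocessed by filling pixels on the right and bottom sides of the road surface image so that the road surface image meets a first preset pixel resolution;
    after the plurality of square sub-image blocks are marked in the road surface image, each of the square sub-image blocks in the road surface image is compressed to a second preset pixel resolution.
  9. A computer-readable storage medium, characterized in that: computer program instructions are stored on the computer-readable storage medium; when the computer program instructions are executed by a processor, the pavement crack detection method according to any one of claims 1 to 8 is implemented.
  10. A pavement crack detection system, characterized by comprising the computer-readable storage medium according to claim 9.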
PCT/CN2022/138775 2022-11-01 2022-12-13 一种路面裂缝检测方法、介质及系统 WO2024092968A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211355412.8A CN115661623A (zh) 2022-11-01 2022-11-01 一种路面裂缝检测方法、介质及系统
CN202211355412.8 2022-11-01

Publications (1)

Publication Number Publication Date
WO2024092968A1 true WO2024092968A1 (zh) 2024-05-10

Family

ID=84995157

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138775 WO2024092968A1 (zh) 2022-11-01 2022-12-13 一种路面裂缝检测方法、介质及系统

Country Status (2)

Country Link
CN (1) CN115661623A (zh)
WO (1) WO2024092968A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133960A (zh) * 2017-04-21 2017-09-05 武汉大学 基于深度卷积神经网络的图像裂缝分割方法
CN109949290A (zh) * 2019-03-18 2019-06-28 北京邮电大学 路面裂缝检测方法、装置、设备及存储介质
CN114418937A (zh) * 2021-12-06 2022-04-29 北京邮电大学 路面裂缝检测方法以及相关设备
US20220222914A1 (en) * 2021-01-14 2022-07-14 Tata Consultancy Services Limited System and method for attention-based surface crack segmentation
CN115147439A (zh) * 2022-07-11 2022-10-04 南京工业大学 基于深度学习与注意力机制的混凝土裂缝分割方法及系统

Also Published As

Publication number Publication date
CN115661623A (zh) 2023-01-31

Similar Documents

Publication Publication Date Title
CN112465748B (zh) 基于神经网络的裂缝识别方法、装置、设备及存储介质
CN108564085B (zh) 一种自动读取指针式仪表读数的方法
CN113205051B (zh) 基于高空间分辨率遥感影像的储油罐提取方法
CN106203327A (zh) 基于卷积神经网络的肺部肿瘤识别系统及方法
CN111783772A (zh) 一种基于RP-ResNet网络的抓取检测方法
CN109583483A (zh) 一种基于卷积神经网络的目标检测方法和系统
CN111652273B (zh) 一种基于深度学习的rgb-d图像分类方法
CN112233067A (zh) 一种热轧钢卷端面质量检测方法及系统
CN111753828A (zh) 一种基于深度卷积神经网络的自然场景水平文字检测方法
CN113936195B (zh) 敏感图像识别模型的训练方法、训练装置和电子设备
CN109284779A (zh) 基于深度全卷积网络的物体检测方法
CN113256494B (zh) 一种文本图像超分辨率方法
CN112819748B (zh) 一种带钢表面缺陷识别模型的训练方法及装置
CN110909615A (zh) 基于多尺度输入混合感知神经网络的目标检测方法
CN116012653A (zh) 一种注意力残差单元神经网络高光谱图像分类方法及系统
CN112364974A (zh) 一种基于激活函数改进的YOLOv3算法
CN116912674A (zh) 基于改进的YOLOv5s网络模型复杂水环境下目标检测方法及系统
CN115272691A (zh) 一种钢筋绑扎状态检测模型的训练方法、识别方法及设备
CN111368637B (zh) 一种基于多掩模卷积神经网络的搬运机器人识别目标方法
CN116523888B (zh) 路面裂缝的检测方法、装置、设备及介质
WO2024092968A1 (zh) 一种路面裂缝检测方法、介质及系统
CN115937540A (zh) 基于Transformer编码器的图像匹配方法
CN114596433A (zh) 一种绝缘子识别方法
CN111950451A (zh) 基于多尺度预测cnn及龙芯芯片的多类别目标识别方法
CN112417961A (zh) 一种基于场景先验知识的海面目标检测方法