CN114612306A

CN114612306A - Deep learning super-resolution method for crack detection

Info

Publication number: CN114612306A
Application number: CN202210250155.5A
Authority: CN
Inventors: 刘鹏宇; 刘天禹; 陈善继
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2022-03-15
Filing date: 2022-03-15
Publication date: 2022-06-10

Abstract

The invention discloses a deep learning super-resolution method for crack detection and belongs to the technical field of image super-resolution. The method comprises the following steps: constructing a crack image data set for super-resolution network training; constructing a crack-oriented super-resolution network; training the crack-oriented super-resolution network; and performing super-resolution amplification of the crack image. The method makes full use of the advantages of deep learning in the field of image super-resolution, designs a lightweight residual module comprising an attention mechanism and a depthwise separable convolution based on the characteristics of crack images, and constructs the super-resolution network with a post-upsampling structure. It thereby addresses the difficult and inaccurate mapping from low-resolution crack images to high-resolution images, performs super-resolution amplification of crack images with low computing resource occupation, retains the texture information of the crack, and improves the visual experience.

Description

Deep learning super-resolution method for crack detection

Technical Field

The invention relates to the technical field of image super-resolution, in particular to a deep learning super-resolution method for crack detection.

Background

At present, crack detection is mainly carried out either by manual visual inspection or by instruments such as acoustic emission equipment and laser scanners. The former is limited by the subjective judgment of the inspector and does not generalize. The latter relies on costly instruments, which is not conducive to large-scale deployment. The rapid development of deep learning image processing brings new opportunities for crack detection: analyzing crack images intelligently can improve the efficiency of detection work and the accuracy of the detection result, reduce the workload of inspectors as far as possible, and lower the detection cost. Deep learning image processing is mostly based on convolutional neural networks, in which the network extracts features through successive convolutional layers and obtains the expected result through mapping layers. Many experiments have shown that this approach has clear advantages over traditional methods and gives better identification performance for concrete cracks, road cracks and rock stratum cracks.

The detection network commonly used for crack detection is a semantic segmentation network. Such a network classifies the pixels of an image one by one, and more pixels mean richer content for the network to learn, so the network places higher requirements on the resolution of the input image. In the authoritative data sets of the crack detection field, Concrete Crack Images for Classification and Crack-detection, the image resolution is about 224 × 224, whereas the segmentation network requires input resolutions of 480 × 480, 640 × 640 or higher. A low-resolution image that does not meet the input requirement therefore needs to be amplified to obtain a high-resolution image that does (low resolution and high resolution are relative terms: 400 × 400 is a high-resolution image compared with 200 × 200, and a low-resolution image compared with 600 × 600). The commonly used amplification method is bicubic interpolation of the low-resolution image. Interpolation is simple and fast, requires no additional module, and is widely used for image amplification on mobile phones and computers. However, an image amplified by interpolation is only a coarse high-resolution image: the resolution is increased, but the texture information of the objects in the image is lost. The most common problem is that the edges of the amplified objects are blurred, which degrades the visual experience and hinders subsequent identification and segmentation.

With the development of deep learning super-resolution technology, image amplification by means of a super-resolution network has become a popular research direction. Such a method fits a mapping from low-resolution images to high-resolution images by learning from a large amount of data, improves the resolution of the image, brings a better visual experience, and is an important image processing technology. However, existing deep learning super-resolution technology mainly targets general real-world scenes such as landscapes, animals, plants, people, food, buildings and vehicles. The mapping learned by existing networks works well in those settings, but crack images amplified with it are distorted and blurred, which indicates that the previously learned mapping is not suitable for crack images. In addition, existing super-resolution technology occupies considerable computing resources and is not suitable as a preprocessing module for crack detection. In view of these problems, the invention designs a crack detection-oriented deep learning super-resolution method that addresses the difficult and inaccurate mapping from low-resolution crack images to high-resolution images, performs super-resolution amplification of crack images while occupying fewer resources, retains the texture information of the crack, and improves the visual experience.

Disclosure of Invention

The invention mainly addresses the following technical problems: existing deep learning super-resolution technology performs poorly on crack images, low-resolution crack images are difficult to map to high-resolution images, and the computing resource occupation is high. To this end, a crack detection-oriented deep learning super-resolution method is constructed, which improves the resolution of a crack image while retaining crack texture information and reducing the use of computing resources. To achieve this purpose, the invention adopts the following technical scheme:

A crack detection-oriented deep learning super-resolution method comprises the following steps:

Step 1: Construct a crack image data set for network training.

In deep learning, the quality of the data set is crucial to the super-resolution result. An original crack image data set is therefore built from openly available online image data and image data acquired in the field, and the crack image sets used for training and supervision are then built through data enhancement, data cropping and data downsampling.

Step 2: Construct a crack-oriented super-resolution network.

A crack-oriented lightweight super-resolution network is constructed based on a post-upsampling super-resolution network structure. This structure learns from the low-resolution images end to end and adds a learnable upsampling layer at the end to fit the high-resolution images, which has the advantage of greatly reducing the occupation of computing resources. The designed network can be divided into three modules: Head, Body and Tail.

The Head module consists of two ordinary convolution layers and is used to raise the channel dimension of the input low-resolution image and to perform preliminary texture information extraction.

Further, the Body module consists of 16 repeatedly stacked Blocks and 1 convolution layer and is used for refined texture feature extraction. Each Block is divided into a front part, a middle part and a back part. The front part uses an ordinary convolution layer to extract the information output by the previous layer. The middle part is a lightweight residual structure containing an attention mechanism and a depthwise separable convolution, whose input and output are fused through a skip connection. The back part uses an ordinary convolution layer to aggregate the texture information of the middle part. Finally, the input and the output of the Body module are fused through a skip connection, which improves the interaction between texture information.
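The weight saving that motivates the depthwise convolution in the middle part can be illustrated with simple arithmetic. The following is a minimal sketch, assuming the channel widths given later in the detailed description (64 channels expanded to 128, kernel size 3) and ignoring bias terms; the helper name `conv_weights` is hypothetical.

```python
def conv_weights(in_ch, out_ch, k, groups=1):
    """Weight count of a 2-D convolution, ignoring biases.

    With groups == in_ch == out_ch this is a depthwise convolution:
    each kernel has a single channel and processes one feature-map channel.
    """
    assert in_ch % groups == 0 and out_ch % groups == 0
    return (in_ch // groups) * out_ch * k * k


channels = 128  # width inside the residual structure (from the detailed description)

standard_3x3 = conv_weights(channels, channels, 3)                    # ordinary 3x3 convolution
depthwise_3x3 = conv_weights(channels, channels, 3, groups=channels)  # depthwise 3x3 convolution
pointwise_expand = conv_weights(64, channels, 1)                      # 1x1 convolution, 64 -> 128

print(standard_3x3)      # 147456
print(depthwise_3x3)     # 1152
print(pointwise_expand)  # 8192
```

Under these assumptions the depthwise layer carries 128 times fewer weights than an ordinary 3 × 3 convolution of the same width, even after the cost of the 1 × 1 expansion is added.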

Finally, the Tail module consists of two ordinary convolution layers and a sub-pixel convolution layer and is used to upsample the feature map output by the Body, thereby amplifying the low-resolution image.

Step 3: Train the crack-oriented super-resolution network.

The constructed crack super-resolution data set is input into the designed network, an L1 loss function and an Adam optimizer are selected, and the network is trained to fit the mapping from low-resolution crack images to high-resolution crack images. After training, the best-fitting model is saved.

Step 4: Perform super-resolution amplification of the crack image.

The low-resolution crack image is mapped with the trained model file to obtain the amplified high-resolution crack image. The image is saved and can be used for subsequent crack detection.

Compared with the prior art, the invention has the following advantages:

1. The invention provides a deep learning super-resolution method designed around the characteristics of crack images. It effectively addresses the distortion and blur of crack images amplified by existing methods, gives a better visual experience, retains crack texture information, and can be used for image preprocessing in crack detection and identification tasks.

2. A lightweight residual module comprising an attention mechanism and a depthwise separable convolution is designed, and a lightweight super-resolution network is constructed with a post-upsampling structure, so that the occupation of computing resources is effectively reduced while the network can still perform high-precision super-resolution.

Drawings

FIG. 1 is a schematic overall flow chart of the crack detection-oriented deep learning super-resolution method in the invention.

FIG. 2 is a schematic flow chart of constructing the crack image data set for training in the present invention.

FIG. 3 is a structural diagram of the crack detection-oriented super-resolution network in the present invention.

Detailed Description

The invention mainly realizes super-resolution of crack images, and the specific method adopted by the invention is described in detail below by combining the attached drawings.

Specifically, the process of the deep learning super-resolution method for crack detection is shown in FIG. 1 and includes the following steps. S1: constructing a crack image data set for network training. S2: constructing a crack-oriented super-resolution network. S3: training the crack-oriented super-resolution network. S4: super-resolution amplification of the crack image.

(1) For S1, a crack image data set for network training is constructed.

The flow is shown in FIG. 2. Part of the crack images are obtained from public resources, and a camera is used to photograph different types of cracks in different scenes as a supplement, which improves the diversity of crack types. The data is randomly rotated and supplemented with light to improve its generalization. The processed image data is then cropped to obtain the supervision images used as ground truth. Finally, the cropped images are downsampled and noise is added to generate the low-resolution images used for training.
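The cropping, downsampling and noise-adding steps above can be sketched as follows. This is an illustrative sketch only: the description does not specify the downsampling filter or the noise model, so box-filter averaging and Gaussian noise are assumptions, as are all function names, the noise level `sigma` and the synthetic test image. The random rotation and light supplementation steps are omitted.

```python
import random


def crop(img, top, left, size):
    """Cut a size x size supervision patch out of a grayscale image (list of rows)."""
    return [row[left:left + size] for row in img[top:top + size]]


def downsample(img, factor):
    """Average each factor x factor tile (box filter; an assumed choice)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - factor + 1, factor):
        row = []
        for x in range(0, w - factor + 1, factor):
            tile = [img[y + dy][x + dx] for dy in range(factor) for dx in range(factor)]
            row.append(sum(tile) / len(tile))
        out.append(row)
    return out


def add_noise(img, sigma, rng):
    """Add Gaussian noise (an assumed noise model) and clamp to [0, 255]."""
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in img]


def make_pair(img, top, left, hr_size=192, scale=2, sigma=2.0, seed=0):
    """Return (low-resolution training image, high-resolution supervision image)."""
    rng = random.Random(seed)
    hr = crop(img, top, left, hr_size)
    lr = add_noise(downsample(hr, scale), sigma, rng)
    return lr, hr


# Synthetic 224 x 224 image with a dark diagonal line standing in for a crack.
image = [[40.0 if abs(x - y) < 2 else 200.0 for x in range(224)] for y in range(224)]
lr, hr = make_pair(image, top=16, left=16)
print(len(hr), len(hr[0]), len(lr), len(lr[0]))  # 192 192 96 96
```

The default sizes match the 192 × 192 supervision images and 96 × 96 training images stated in the training step of this description.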

(2) For S2, a crack-oriented super-resolution network is constructed.

The overall architecture of the network is shown on the left side of FIG. 3. The designed network can be divided into three modules, Head, Body and Tail, and relu is used as the activation function.

The Head module consists of two ordinary convolution layers with a convolution kernel size of 3. It expands the number of channels of the input image, which has a resolution of 96 × 96 and 3 channels, to 64 and performs initial feature extraction.

The Body module consists of 16 repeatedly stacked Blocks and 1 ordinary convolution layer, with a skip connection between the input and the output of the Body module, and is used for refined texture feature extraction. The specific structure of a Block is shown on the right side of FIG. 3. Each Block is divided into a front part, a middle part and a back part. The front part extracts the information output by the previous layer with an ordinary convolution with a convolution kernel size of 3.

The middle part is a lightweight residual structure, that is, a skip connection between its input and output fuses the feature information. The residual structure first increases the number of feature map channels to 128 through an ordinary convolution with a convolution kernel size of 1, and then extracts information through a depthwise separable convolution with a convolution kernel size of 3. In this convolution the number of channels of each kernel is set to 1 and the number of kernels is set to the number of feature map channels, so that each kernel processes exactly one feature map channel. After the depthwise separable convolution, a channel split is applied to the feature map, dividing it into 2 feature maps with 64 channels each and the same size, which are sent to two branches: channel attention (left) and spatial attention (right).

The channel attention applies average pooling to each channel of the feature map to obtain a one-dimensional vector. An output vector is then obtained through 2 ordinary convolution layers with a convolution kernel size of 1; this vector expresses a weight for each channel of the feature map and gives larger weights to more important channels. Finally, the output vector is normalized and multiplied by the input feature map to obtain a new feature map.

The spatial attention focuses on the spatial information of the feature map. For each pixel, the mean over all channels of the feature map is taken to obtain a single-channel feature map. An output feature map is then obtained through an ordinary convolution layer with a convolution kernel size of 7. Finally, the output feature map is normalized and multiplied by the input feature map to obtain a new feature map. The two attention mechanisms assign weights to the channels and pixels of the feature map according to its regions of interest and improve the feature extraction capability. The feature maps processed by the two attention mechanisms are then concatenated along the channel dimension, and the number of channels is reduced through an ordinary convolution with a convolution kernel size of 1 to restore the feature map to the shape of the residual structure's input.
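The two attention branches described above can be sketched in plain Python on a small feature map stored as nested lists (channels × height × width). This is a sketch under stated assumptions, not the patented implementation: the "normalization" is assumed to be a sigmoid, a relu is assumed between the two 1 × 1 convolutions, the hidden width of the channel branch is assumed, and all weights, the uniform 7 × 7 kernel and the function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_attention(fmap, w1, w2):
    """Average-pool each channel, pass the vector through two 1x1 convolutions
    (equivalent to dense layers on a pooled vector), squash to (0, 1) and
    rescale each channel by its weight."""
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]  # assumed relu
    scores = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]   # assumed sigmoid
    return [[[s * p for p in row] for row in ch] for s, ch in zip(scores, fmap)]


def spatial_attention(fmap, kernel):
    """Average over channels at each pixel, apply one zero-padded k x k convolution,
    squash to (0, 1) and rescale every pixel of every channel by the result."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mean_map = [[sum(fmap[k][y][x] for k in range(c)) / c for x in range(w)] for y in range(h)]
    r = len(kernel) // 2
    gate = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + r][dx + r] * mean_map[yy][xx]
            row.append(sigmoid(acc))
        gate.append(row)
    return [[[gate[y][x] * fmap[k][y][x] for x in range(w)] for y in range(h)] for k in range(c)]


# Toy feature map: 4 channels of 8 x 8, split into two halves as in the description.
feat = [[[float((k + 1) * (x + y)) for x in range(8)] for y in range(8)] for k in range(4)]
left, right = feat[:2], feat[2:]          # channel split
w1 = [[0.1, -0.2]]                        # hypothetical 1x1 conv weights, 2 -> 1 channels
w2 = [[0.5], [-0.5]]                      # hypothetical 1x1 conv weights, 1 -> 2 channels
kernel = [[1.0 / 49.0] * 7 for _ in range(7)]  # hypothetical 7 x 7 kernel
fused = channel_attention(left, w1, w2) + spatial_attention(right, kernel)  # channel concat
print(len(fused), len(fused[0]), len(fused[0][0]))  # 4 8 8
```

Both branches preserve the shape of their input, so the concatenated result has the same shape as the feature map before the split and can be passed to the 1 × 1 channel-reduction convolution.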

The back part is an ordinary convolution with a convolution kernel size of 3 that aggregates the output of the middle residual structure. Finally, the input and the output of the Block are fused through a skip connection, which improves the interaction between texture information.
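The wiring of a single Block, with its inner skip connection around the middle part and its outer skip connection around the whole Block, can be sketched as below. The layers are passed in as plain callables, and the identity stand-ins used in the example are placeholders only; the function and parameter names are hypothetical, and the placement of activations is not shown because the description does not specify it.

```python
def add(a, b):
    """Element-wise sum used for the skip connections."""
    return [x + y for x, y in zip(a, b)]


def block(x, front, expand, depthwise, attend, reduce, back):
    """One Block: front conv, residual middle part, back conv, outer skip."""
    y = front(x)                              # front part: 3x3 convolution
    z = reduce(attend(depthwise(expand(y))))  # middle part: 1x1 expand, depthwise 3x3,
                                              # split + attention + concat, 1x1 reduce
    y = add(y, z)                             # inner skip connection of the middle part
    return add(x, back(y))                    # back part: 3x3 convolution, then outer skip


def identity(v):
    return v


# With identity stand-ins for every layer the two skip connections are easy to see.
out = block([1.0, 2.0], identity, identity, identity, identity, identity, identity)
print(out)  # [3.0, 6.0]
```

With identity layers the middle part doubles its input and the outer skip adds the original input once more, which is why the toy output is three times the input.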

The Tail module consists of two ordinary convolution layers with a convolution kernel size of 3 and a sub-pixel convolution layer and is used to upsample the feature map output by the Body, thereby amplifying the low-resolution image. Sub-pixel convolution is a commonly used upsampling module in image super-resolution and can increase the size of an image while preserving its original information. A network with this structure has a low computational cost and high precision and is well suited to the task of crack image super-resolution amplification.
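The rearrangement performed by a sub-pixel convolution layer can be sketched in plain Python. The sketch follows the common pixel-shuffle convention, in which a feature map with C·r² channels of size H × W becomes C channels of size rH × rW. The assumption that the last convolution of the Tail produces 3 × 2² = 12 channels is inferred from the 96 × 96 input and 192 × 192 output sizes and is not stated in the description; the function name is hypothetical.

```python
def pixel_shuffle(fmap, r):
    """Rearrange a (C*r*r) x H x W feature map into C x (H*r) x (W*r)."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    assert c_in % (r * r) == 0, "channel count must be divisible by r*r"
    c_out = c_in // (r * r)
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for y in range(h * r):
            for x in range(w * r):
                # Each group of r*r input channels supplies the r x r sub-pixels
                # of one output channel.
                src = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = fmap[src][y // r][x // r]
    return out


# Four 1 x 1 channels become one 2 x 2 channel.
tiny = pixel_shuffle([[[0.0]], [[1.0]], [[2.0]], [[3.0]]], 2)
print(tiny)  # [[[0.0, 1.0], [2.0, 3.0]]]

# Assumed Tail output: 12 channels of 96 x 96 become 3 channels of 192 x 192.
features = [[[0.0] * 96 for _ in range(96)] for _ in range(12)]
image = pixel_shuffle(features, 2)
print(len(image), len(image[0]), len(image[0][0]))  # 3 192 192
```

Because the upsampling is a pure rearrangement applied at the very end, all convolutions before it run at the low input resolution, which is the source of the computational saving of the post-upsampling structure.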

(3) For S3, the crack-oriented super-resolution network is trained.

The constructed crack super-resolution data set is input into the designed network. An L1 loss function and an Adam optimizer are selected, the number of training rounds is set to 200, the initial learning rate is set to 0.0005, the size of the input low-resolution training image is set to 96 × 96, and the size of the supervision image is set to 192 × 192. The loss between the network output for a training image and the corresponding supervision image is calculated, the adaptive learning rate is used to balance training speed and training precision, and the optimizer continuously optimizes the network to fit the mapping from low-resolution images to high-resolution images. After training is finished, the best-fitting model is saved.
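The L1 loss and the Adam update named above can be illustrated on a toy problem. The sketch below implements the mean absolute error and the standard Adam update rule for a single scalar parameter; the one-parameter "network" (a gain applied to each pixel), the toy data and the values of `beta1`, `beta2` and `eps` are assumptions for illustration, and only the learning rate of 0.0005 is taken from the description.

```python
import math


def l1_loss(pred, target):
    """Mean absolute error between two images given as lists of rows."""
    pairs = [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


def adam_step(param, grad, state, lr=0.0005, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar parameter; state holds t, m and v."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])  # bias-corrected second moment
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


# Toy problem: learn a gain so that gain * x matches the supervision values.
xs = [[10.0, 20.0], [30.0, 40.0]]
ts = [[1.05 * v for v in row] for row in xs]
flat_x = [v for row in xs for v in row]
flat_t = [v for row in ts for v in row]

gain, state = 1.0, {"t": 0, "m": 0.0, "v": 0.0}
initial_loss = l1_loss([[gain * v for v in row] for row in xs], ts)
for _ in range(200):
    diffs = [gain * v - t for v, t in zip(flat_x, flat_t)]
    # Gradient of the L1 loss with respect to the gain: mean of sign(diff) * x.
    grad = sum(((d > 0) - (d < 0)) * v for d, v in zip(diffs, flat_x)) / len(flat_x)
    gain = adam_step(gain, grad, state)
final_loss = l1_loss([[gain * v for v in row] for row in xs], ts)
print(final_loss < initial_loss)  # True
```

In a real training run the gradient would be computed by backpropagation through the whole network rather than by hand, and the model with the lowest loss would be kept.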

(4) For S4, super-resolution amplification of the crack image is performed.

The low-resolution crack image is mapped with the trained model file to obtain the high-resolution crack image. Compared with an image amplified by a traditional interpolation method, the high-resolution image obtained by the network has better texture detail and visual experience. The image is finally saved and can be used for subsequent crack detection.

The above embodiments are merely illustrative of the technical solutions of the present invention, and are not restrictive. Those skilled in the art will understand that: the above embodiments do not limit the present invention in any way, and all similar technical solutions obtained by means of equivalent replacement or equivalent transformation belong to the protection scope of the present invention.

Claims

1. A crack detection-oriented deep learning super-resolution method is characterized by comprising the following steps:

S1: constructing a crack image data set for network training;

S2: constructing a crack-oriented super-resolution network;

S3: training the crack-oriented super-resolution network;

for S2, a crack-oriented super-resolution network is constructed; the designed network can be divided into three modules, Head, Body and Tail, and relu is used as the activation function; the Head module consists of two ordinary convolution layers with a convolution kernel size of 3, expands the number of channels of the input image, which has a resolution of 96 × 96 and 3 channels, to 64, and performs initial feature extraction; the Body module consists of 16 repeatedly stacked Blocks and 1 ordinary convolution layer, with a skip connection between the input and the output of the Body module, and is used for refined texture feature extraction; each Block is divided into a front part, a middle part and a back part, and the front part extracts the information output by the previous layer with an ordinary convolution with a convolution kernel size of 3; the middle part is a lightweight residual structure, that is, a skip connection between its input and output fuses the feature information; the residual structure first increases the number of feature map channels to 128 through an ordinary convolution with a convolution kernel size of 1, and then extracts information through a depthwise separable convolution with a convolution kernel size of 3, in which the number of channels of each kernel is set to 1 and the number of kernels is set to the number of feature map channels; after the depthwise separable convolution, a channel split is applied to the feature map, dividing it into two feature maps with 64 channels each and the same size, which are sent to two branches, channel attention and spatial attention, respectively; the channel attention applies average pooling to each channel of the feature map to obtain a one-dimensional vector; an output vector is obtained through 2 ordinary convolution layers with a convolution kernel size of 1, and this vector expresses a weight for each channel of the feature map and gives larger weights to more important channels; finally, the output vector is normalized and multiplied by the input feature map to obtain a new feature map; the spatial attention focuses on the spatial information of the feature map, and an average pooling operation over all channels of the feature map yields a single-channel feature map; an output feature map is then obtained through an ordinary convolution layer with a convolution kernel size of 7; finally, the output feature map is normalized and multiplied by the input feature map to obtain a new feature map; the feature maps processed by the two attention mechanisms are then concatenated along the channel dimension, and the number of channels is reduced through an ordinary convolution with a convolution kernel size of 1 to restore the feature map to the shape of the residual structure's input; the back part is an ordinary convolution with a convolution kernel size of 3 that aggregates the output of the middle residual structure, and finally the input and the output of the Block are fused through a skip connection, which improves the interaction between texture information; the Tail module consists of two ordinary convolution layers with a convolution kernel size of 3 and a sub-pixel convolution layer and is used to upsample the feature map output by the Body;

for S3, the crack-oriented super-resolution network is trained; the constructed crack super-resolution data set is input into the designed network, an L1 loss function and an Adam optimizer are selected, the number of training rounds is set to 200, the initial learning rate is set to 0.0005, the size of the input low-resolution training image is set to 96 × 96, the size of the supervision image is set to 192 × 192, the loss between the network output for a training image and the supervision image is calculated, the fitting relation is optimized, and the best-fitting model is saved after training is finished; and the low-resolution crack image is mapped with the trained model file to obtain the high-resolution crack image.

2. The crack detection-oriented deep learning super-resolution method according to claim 1, characterized in that:

for S1, a crack image data set for network training is constructed; part of the crack images are obtained from public resources, and a camera is used to photograph different types of cracks in different scenes as a supplement, which improves the diversity of crack types; the data is randomly rotated and supplemented with light to improve its generalization; the processed image data is then cropped to obtain the supervision images used as ground truth; and the cropped images are then downsampled and noise is added to generate the low-resolution images used for training.