Lightweight engineering structure crack identification method and system based on phantom convolution
Technical Field
The invention relates to the technical field of computer vision and engineering structure crack identification, and in particular to a lightweight engineering structure crack identification method and system based on phantom convolution (the Ghost convolution of GhostNet).
Background
With the rapid socioeconomic development of China, important civil engineering structures are becoming increasingly large and complex. Once a structure has been built, its maintenance and management become the key to its safe operation. Cracks are one of the main defects of engineering structures and seriously affect their safe operation, so effective crack identification is very important. With the rapid development of computer technology, in particular convolutional neural networks, image recognition and computer vision, non-destructive crack identification based on computer vision has become a research hotspot in China and abroad.
With the rapid development of the mobile internet, portable devices have become widespread. However, a traditional neural network model has to extract deep features with strong expressive power through a huge amount of convolution computation, and the limited storage and computing resources of portable devices cannot meet this requirement. Designing an efficient and accurate lightweight neural network model for crack identification is therefore the key to solving this problem.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a lightweight engineering structure crack identification method and system based on phantom convolution, offering a solution for performing engineering structure crack identification on portable devices.
To achieve this purpose, the invention provides a lightweight engineering structure crack identification method based on phantom convolution, characterized in that
the method performs engineering structure crack identification on image data and comprises the following steps: collecting a picture of the concrete engineering structure to be identified; inputting the picture into a trained lightweight engineering structure crack identification network, which classifies the picture and outputs one of two identification results: crack or no crack;
the specific steps of the modeling and training process of the lightweight engineering structure crack identification network comprise:
1) constructing a phantom convolution module: extracting intrinsic image features with an ordinary convolution, applying a linear transformation to the intrinsic features to obtain ghost features, and concatenating the intrinsic features and the ghost features to obtain the complete image features;
2) constructing a network construction unit: first, a phantom convolution module increases the number of feature channels and extracts deep image features; then a depthwise convolution performs down-sampling; finally, a second phantom convolution module compresses the number of feature channels to reduce computation;
3) network training: building the lightweight engineering structure crack identification network by stacking network construction units, training it on a crack dataset, mapping the image features to a one-dimensional vector with global average pooling and pointwise convolution, and outputting the final crack identification result through a fully connected layer.
Preferably, in step 1) of the modeling and training process of the lightweight engineering structure crack identification network, the intrinsic image features are extracted with a 1 × 1 ordinary convolution, calculated as
$$Z = X * \lambda \tag{1}$$
where $X$ is the input feature with $c$ channels, $\lambda \in \mathbb{R}^{m \times c \times k \times k}$ is the convolution filter of this layer, $*$ denotes convolution, $\mathbb{R}$ denotes the real feature space, and $Z \in \mathbb{R}^{m \times h \times w}$ is the extracted intrinsic feature, $h$ and $w$ being the height and width of the feature after convolution. The number of ordinary convolution filters is $m = n/2$, where $n$ is the number of output feature channels, and the bias term of the convolution is discarded. The hyperparameters of the ordinary convolution are filter size $k = 1$, stride 1 and padding 0.
Preferably, when the intrinsic image features are linearly transformed in step 1), they are first batch-normalized and then linearly transformed by a 3 × 3 depthwise convolution, which extracts spatial image features as ghost features; the ghost features are then batch-normalized, and finally the intrinsic features and the ghost features are concatenated along the channel dimension to obtain the complete image features. In this way the phantom convolution generates part of the crack features with an ordinary convolution and the remaining, redundant features with cheap linear transformations, so that all features of the crack image are extracted at very low computation and storage cost. The phantom convolution is calculated as:
$$Y = Z \oplus F(Z) \tag{2}$$
where $Z \in \mathbb{R}^{m \times h \times w}$ is the extracted intrinsic feature (produced by $m = n/2$ ordinary 1 × 1 convolution filters with stride 1, padding 0 and no bias term, $n$ being the number of output feature channels), $F$ denotes the 3 × 3 depthwise convolution, $\oplus$ denotes concatenation along the channel dimension, and $Y \in \mathbb{R}^{n \times h \times w}$ is the final output image feature.
Preferably, the network construction unit in step 2) of the modeling and training process is designed with reference to the MobileNetV2 network structure: the first phantom convolution uses batch normalization followed by the ReLU nonlinear activation function, while the second phantom convolution uses batch normalization only.
Preferably, in step 2) of the modeling and training process, a lightweight channel attention mechanism, ECA (Efficient Channel Attention), is added before the second phantom convolution to bias the computing resources toward the most informative part of the input signal.
Preferably, in step 3) of the modeling and training process, the lightweight engineering structure crack identification network first uses a standard convolutional layer with stride 2 and four 3 × 3 convolution filters, then a series of network construction units with gradually increasing numbers of feature channels, and finally classifies through a fully connected layer.
Preferably, in step 3) of the modeling and training process, the lightweight engineering structure crack identification network is trained with an SGD + Momentum optimizer, which updates the weight matrix using a weighted average of the historical weight gradient matrix and the current weight gradient matrix, calculated as:
$$V_t = a V_{t-1} + (1 - a)\,\nabla_{\omega_t} r \tag{7}$$
$$\omega_{t+1} = \omega_t - b V_t \tag{8}$$
where $\nabla_{\omega_t} r$ is the current weight gradient, i.e. the gradient of the training objective $r$ with respect to $\omega_t$; $V_{t-1}$ is the historical weight gradient; $V_t$ is the weighted average of the current and historical weight gradients; $a$ is the weight (momentum) coefficient, $0 \le a \le 1$; $\omega_t$ is the current weight; $b$ is the step coefficient applied to $V_t$, which plays the role of the learning rate; and $\omega_{t+1}$ is the updated weight.
The invention also provides a lightweight engineering structure crack identification system based on phantom convolution, characterized by comprising a phantom convolution module, a network construction unit, a lightweight engineering structure crack identification network and a crack identification module;
the phantom convolution module: used to construct the phantom convolution, its structure comprising an ordinary convolution layer, a depthwise convolution layer and a feature concatenation layer. The ordinary convolution layer extracts the intrinsic image features with an ordinary convolution; the depthwise convolution layer batch-normalizes the intrinsic features, linearly transforms them by depthwise convolution to extract spatial image features as ghost features, and batch-normalizes the ghost features; the feature concatenation layer concatenates the intrinsic features and the ghost features along the channel dimension to obtain the complete image features;
the network construction unit: composed of two phantom convolution modules and a depthwise convolution; the first phantom convolution module increases the number of feature channels and extracts deep image features, the depthwise convolution then performs down-sampling, and the second phantom convolution module compresses the number of feature channels to reduce computation;
the lightweight engineering structure crack identification network: its first layer is a standard convolution layer, followed by a stack of network construction units with gradually increasing numbers of feature channels; the image features are mapped to a one-dimensional vector by global average pooling and pointwise convolution and finally classified by a fully connected layer;
the crack identification module: used to input the picture to be identified into the lightweight engineering structure crack identification network to obtain the classification result.
Further, the lightweight attention mechanism ECA is arranged between the depthwise convolution and the second phantom convolution module in the network construction unit where the number of feature channels reaches its maximum.
The invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above lightweight engineering structure crack identification method based on phantom convolution.
At present, mainstream lightweight neural network models extract deep image features through complex multi-layer network structures. Crack images, however, have relatively simple features, so applying an existing lightweight model directly involves redundant computation and slows model inference. Conversely, simplifying an existing lightweight network structure directly leads to insufficient feature extraction and lower crack identification accuracy.
The invention provides an efficient and accurate lightweight neural network model for crack identification, with the following beneficial effects:
1. The phantom convolution adopted by the invention has good robustness and nonlinear processing capability, and by replacing part of the ordinary convolution with cheap linear transformations it further reduces the floating-point operations and the parameters of the model.
2. The invention provides a lightweight network construction unit for crack identification that extracts deep features of crack images with little computation, greatly reducing model complexity while keeping the model accuracy essentially unchanged.
3. The invention builds a lightweight crack identification model from this efficient network construction unit and applies the lightweight attention mechanism ECA to bias computing resources toward the most informative part of the input signal, achieving efficient and accurate crack identification at very low computation and storage cost.
4. The method can be used for engineering structure crack identification on embedded devices.
Drawings
FIG. 1 is a flow chart of a lightweight engineering structure crack identification model based on phantom convolution according to the invention;
FIG. 2 is a diagram of the network construction unit architecture;
FIG. 3 is a schematic diagram of the ECA attention mechanism;
FIG. 4 is a diagram of the lightweight crack identification network architecture.
Detailed Description
The invention is described in further detail below with reference to the figures and specific embodiments.
The invention provides a lightweight engineering structure crack identification method based on phantom convolution for identifying engineering structure cracks in image data, comprising the following steps: collecting a picture of the concrete engineering structure to be identified, and inputting it into the trained lightweight engineering structure crack identification network, which classifies the picture and outputs one of two identification results: crack or no crack. The concrete engineering structure pictures are taken with a single-lens reflex camera at arbitrary angles and distances, in jpg format, with a resolution of at least 224 × 224.
The specific steps of the modeling and training process of the lightweight engineering structure crack identification network are shown in fig. 1, and comprise the following steps:
Step 1: construct the phantom convolution module.
101) First, a 1 × 1 ordinary convolution is used to extract the intrinsic image features, calculated as follows:
$$Z = X * \lambda \tag{1}$$
where $X$ is the input feature with $c$ channels, $\lambda \in \mathbb{R}^{m \times c \times k \times k}$ is the convolution filter of this layer, $*$ denotes convolution, $\mathbb{R}$ denotes the real feature space, and $Z \in \mathbb{R}^{m \times h \times w}$ is the extracted intrinsic feature, $h$ and $w$ being the height and width of the feature after convolution. The number of ordinary convolution filters is $m = n/2$, where $n$ is the number of output feature channels, and the bias term of the convolution is discarded. The hyperparameters of the ordinary convolution are filter size $k = 1$, stride 1 and padding 0.
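Because a 1 × 1 convolution has no spatial extent, Eq. (1) reduces to a per-pixel weighted sum over the c input channels. The following plain-Python sketch illustrates this; the function name `pointwise_conv`, the nested-list layout and the toy sizes are assumptions for illustration, not the patented implementation.

```python
from typing import List

# Feature maps are nested lists indexed as [channel][row][column].
Tensor3D = List[List[List[float]]]


def pointwise_conv(x: Tensor3D, weight: List[List[float]]) -> Tensor3D:
    """1 x 1 ordinary convolution (stride 1, padding 0, no bias), Eq. (1).

    x      : input feature with c channels, shape c x h x w
    weight : m filters, each holding c coefficients (lambda reshaped to m x c)
    returns: intrinsic feature Z with shape m x h x w
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if any(len(f) != c for f in weight):
        raise ValueError("every filter needs one coefficient per input channel")
    # Each output pixel is a weighted sum over channels at the same (row, col),
    # so the spatial size is unchanged and only the channel count becomes m.
    return [
        [[sum(f[i] * x[i][r][s] for i in range(c)) for s in range(w)] for r in range(h)]
        for f in weight
    ]


# Usage: two input channels and m = 2 filters (for example n = 4, m = n / 2).
x = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
z = pointwise_conv(x, [[1.0, 1.0], [0.5, 0.0]])
print(z)
```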
102) The intrinsic image features are batch-normalized and then linearly transformed by a 3 × 3 depthwise convolution, which extracts spatial image features as ghost features; the ghost features are batch-normalized, and finally the intrinsic features and the ghost features are concatenated along the channel dimension to obtain the complete image features. Crack images have relatively uniform detail features, and an ordinary convolution produces more redundant feature maps for them than for ordinary images. These redundant features are similar but not identical: removing them directly may cause the model to miss detail features of the crack and reduce identification accuracy, while keeping them by ordinary convolution wastes a large amount of computation and storage and slows model inference. The phantom convolution first generates part of the crack features by ordinary convolution and then generates the redundant features by cheap linear transformations, so that all features of the crack image are extracted at very low computation and storage cost and model inference is accelerated. The phantom convolution is calculated as follows:
$$Y = Z \oplus F(Z) \tag{2}$$
where $Z \in \mathbb{R}^{m \times h \times w}$ is the extracted intrinsic feature (produced by $m = n/2$ ordinary 1 × 1 convolution filters with stride 1, padding 0 and no bias term, $n$ being the number of output feature channels), $F$ denotes the 3 × 3 depthwise convolution, $\oplus$ denotes concatenation along the channel dimension, and $Y \in \mathbb{R}^{n \times h \times w}$ is the final output image feature.
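A minimal plain-Python sketch of Eq. (2) with m = 2 intrinsic channels follows. The names, toy kernels and sizes are illustrative assumptions, and the batch normalization steps described above are omitted for brevity. It shows that each ghost map is produced from exactly one intrinsic map by its own 3 × 3 depthwise filter, and that concatenation doubles the channel count to n = 2m without any further cross-channel convolution.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # indexed [channel][row][column]


def depthwise_conv(z: Tensor3D, kernels: List[List[List[float]]], stride: int = 1) -> Tensor3D:
    """d x d depthwise convolution (d odd, zero padding d // 2, no bias).

    Channel j is filtered only by kernels[j], so the channel count is unchanged.
    """
    d = len(kernels[0])
    p = d // 2
    h, w = len(z[0]), len(z[0][0])
    out = []
    for ch, k in zip(z, kernels):
        rows = []
        for r in range(0, h, stride):
            row = []
            for s in range(0, w, stride):
                acc = 0.0
                for u in range(d):
                    for v in range(d):
                        rr, ss = r + u - p, s + v - p
                        if 0 <= rr < h and 0 <= ss < w:  # zero padding outside the map
                            acc += k[u][v] * ch[rr][ss]
                row.append(acc)
            rows.append(row)
        out.append(rows)
    return out


def phantom_features(z: Tensor3D, kernels: List[List[List[float]]]) -> Tensor3D:
    """Eq. (2): Y = concat(Z, F(Z)); list concatenation is channel concatenation."""
    ghost = depthwise_conv(z, kernels)  # F(Z): m ghost maps with the same h x w
    return z + ghost                    # n = 2m output channels


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
z = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
y = phantom_features(z, [identity, box])
print(len(y))  # prints 4: 2 intrinsic + 2 ghost channels
```

The `stride` argument is unused here (phantom convolutions use stride 1) but shows how the same depthwise operation performs the stride-2 down-sampling inside a network construction unit.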
Step 2: construct the network construction unit.
201) As shown in FIG. 2, the network construction unit is formed by stacking two phantom convolution modules and a 3 × 3 depthwise convolution. First, a phantom convolution module increases the number of feature channels and extracts deep image features; then a 3 × 3 depthwise convolution with stride 2 performs down-sampling; finally, a second phantom convolution compresses the number of feature channels to reduce computation. MobileNetV2 indicates that applying the nonlinear activation function ReLU in low-dimensional feature spaces causes a large loss of information; therefore, batch normalization and the ReLU nonlinear activation function are used in the first (channel-expanding) phantom convolution, and only batch normalization is used in the second (channel-compressing) phantom convolution. ReLU is calculated as follows:
$$f(x) = \max(0, x) \tag{3}$$
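The order of operations in a network construction unit can be summarized as a shape trace. This sketch assumes zero padding of d // 2 for the depthwise convolution and uses made-up channel numbers (16 to 48 to 24), since the text does not list them; the function name `unit_shapes` is hypothetical. It only illustrates that the spatial size halves in the depthwise step while the two stride-1 phantom convolutions change only the channel count.

```python
from typing import List, Tuple


def relu(values: List[float]) -> List[float]:
    """Eq. (3), applied element-wise: f(x) = max(0, x)."""
    return [max(0.0, v) for v in values]


def unit_shapes(c_in: int, h: int, w: int, c_expand: int, c_out: int,
                stride: int = 2, d: int = 3) -> List[Tuple[str, Tuple[int, int, int]]]:
    """Trace (channels, height, width) through one network construction unit.

    Phantom convolutions use stride 1; only the d x d depthwise convolution
    (zero padding d // 2) changes the spatial size.
    """
    p = d // 2
    h2 = (h + 2 * p - d) // stride + 1
    w2 = (w + 2 * p - d) // stride + 1
    return [
        ("input", (c_in, h, w)),
        ("phantom conv, expand (BN + ReLU)", (c_expand, h, w)),
        (f"{d}x{d} depthwise conv, stride {stride}", (c_expand, h2, w2)),
        ("phantom conv, compress (BN only)", (c_out, h2, w2)),
    ]


# Hypothetical channel counts chosen only for illustration.
for name, shape in unit_shapes(c_in=16, h=56, w=56, c_expand=48, c_out=24):
    print(f"{name:35s} {shape}")
```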
202) As shown in FIG. 3, in the last network construction unit a lightweight channel attention mechanism, ECA (Efficient Channel Attention), is added before the second phantom convolution to bias the computing resources toward the most informative part of the input signal, thereby improving network performance.
203) The lightweight attention mechanism ECA first applies global average pooling to the extracted image features, reducing their height and width to 1 × 1, and then uses a band matrix $R_k$ with parameters $\omega$ to learn channel attention:
$$R_k = \begin{bmatrix} \omega^{1,1} & \cdots & \omega^{1,k} & 0 & 0 & \cdots & 0 \\ 0 & \omega^{2,2} & \cdots & \omega^{2,k+1} & 0 & \cdots & 0 \\ \vdots & & \ddots & & \ddots & & \vdots \\ 0 & \cdots & 0 & 0 & \omega^{C,C-k+1} & \cdots & \omega^{C,C} \end{bmatrix} \tag{4}$$
In this example $k = 3$, meaning that each channel exchanges information only with its two adjacent channels: the band matrix contains $3 \times C$ parameters, and the attention weight of channel $y_i$ depends only on $y_i$ itself and its 2 adjacent channels. To further improve the efficiency of ECA, all channels share the same parameters, and the attention weight of each channel is calculated as follows:
$$\eta_i = \sigma\Big(\sum_{j=1}^{3} \omega^{j} y_i^{j}\Big), \quad y_i^{j} \in \Omega_i^{3} \tag{5}$$
Formula (5) can be implemented efficiently by a one-dimensional convolution:
$$\eta = \sigma\big(\mathrm{C1D}_3(y)\big) \tag{6}$$
where $\eta$ is the channel attention weight output by the attention mechanism, $\sigma$ is the activation function, $y$ is the initial channel attention weight (the globally average-pooled channel descriptor), $\Omega_i^{3}$ is the set of 3 adjacent initial channel attention weights of channel $i$, $i$ is the index of the channel attention weight output by the attention mechanism, $j$ is the index of the initial channel attention weight, and $\mathrm{C1D}_3$ is a one-dimensional convolution of size 3. Finally, each learned channel attention weight is multiplied by its corresponding feature channel, and the result is input to the next phantom convolution module.
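A minimal plain-Python sketch of the ECA computation described above, assuming the sigmoid as the activation σ (as in the original ECA design) and zero padding at both ends of the channel axis; the function names and example weights are illustrative. The helper `band_matrix` places each row's band centred on the diagonal, which is what a zero-padded size-3 one-dimensional convolution computes, so multiplying y by this matrix and applying σ gives the same attention weights as `eca`.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # indexed [channel][row][column]


def band_matrix(num_channels: int, w: List[float]) -> List[List[float]]:
    """Band matrix with k = len(w) shared weights, centred on the diagonal."""
    k, p = len(w), len(w) // 2
    r = [[0.0] * num_channels for _ in range(num_channels)]
    for i in range(num_channels):
        for j in range(k):
            idx = i + j - p
            if 0 <= idx < num_channels:
                r[i][idx] = w[j]
    return r


def eca(features: Tensor3D, w: List[float]):
    """ECA sketch: GAP -> size-k 1-D convolution over channels -> sigmoid -> rescale.

    Returns (reweighted features, attention weights eta, pooled descriptors y).
    """
    num_channels = len(features)
    k, p = len(w), len(w) // 2
    # Global average pooling: one descriptor y_i per channel (H x W -> 1 x 1).
    y = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    eta = []
    for i in range(num_channels):
        acc = 0.0
        for j in range(k):  # neighbours i-1, i, i+1 when k = 3
            idx = i + j - p
            if 0 <= idx < num_channels:  # zero padding at both ends of the channel axis
                acc += w[j] * y[idx]
        eta.append(1.0 / (1.0 + math.exp(-acc)))  # sigmoid as the activation sigma
    # Multiply every channel by its attention weight before the next phantom convolution.
    out = [[[eta[i] * v for v in row] for row in features[i]] for i in range(num_channels)]
    return out, eta, y


feats = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
out, eta, y = eca(feats, [0.2, 1.0, -0.3])
print(eta)
```

Sharing the three weights across all channels is what lets the whole attention step cost only k parameters, instead of one band row per channel.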
Step 3: network training.
301) The lightweight engineering structure crack identification network is built by stacking network construction units; its structure is shown in FIG. 4. The first layer is a standard convolution layer with stride 2 and four 3 × 3 convolution filters, followed by a stack of network construction units with gradually increasing numbers of feature channels, so that deep image features are extracted with low computation and parameter counts. The phantom convolutions in all network construction units have stride 1, and down-sampling is performed by a 3 × 3 depthwise convolution with stride 2 between the first and second phantom convolutions. After feature extraction, global average pooling and pointwise convolution map the features to a one-dimensional vector of length 192, which is finally classified by a fully connected layer. Because the spatial features of crack images are relatively uniform, crack features can be extracted quickly through continuous down-sampling, reducing model computation and storage (ordinary images cannot be continuously down-sampled in this way without losing features). However, if continuous down-sampling is done directly by setting the stride of a pooling layer or an ordinary convolution to 2, some crack features may be lost and the identification accuracy reduced. The algorithm therefore down-samples with a 3 × 3 depthwise convolution with stride 2 between the first and second phantom convolutions, i.e. on the expanded, high-channel features, so that the crack features are fully extracted.
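How quickly continuous down-sampling shrinks the feature map can be checked with the standard convolution output-size formula. The sketch assumes a 224 × 224 input, padding 1 for every 3 × 3 convolution, and four down-sampling units; the unit count and the function names are hypothetical, since the text does not state how many units the network stacks.

```python
from typing import List


def conv_out(size: int, k: int = 3, stride: int = 2, pad: int = 1) -> int:
    """Output height/width of a convolution: floor((size + 2*pad - k) / stride) + 1."""
    return (size + 2 * pad - k) // stride + 1


def spatial_trace(input_size: int = 224, num_units: int = 4) -> List[int]:
    """Spatial size after the stride-2 stem and after each down-sampling unit."""
    sizes = [input_size]
    size = conv_out(input_size)  # stem: standard 3 x 3 convolution, stride 2
    sizes.append(size)
    for _ in range(num_units):   # each unit: 3 x 3 depthwise convolution, stride 2
        size = conv_out(size)
        sizes.append(size)
    return sizes


print(spatial_trace(224, 4))  # [224, 112, 56, 28, 14, 7]
```

Each stride-2 step quarters the number of spatial positions, so later units (which have more channels) operate on much smaller maps; this is why continuous down-sampling keeps computation low.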
302) The lightweight engineering structure crack identification network is trained on a crack dataset. Concrete crack pictures are taken with a single-lens reflex camera at arbitrary angles and distances, in the daytime, at night and under different weather conditions; the pictures are in jpg format with a resolution of at least 224 × 224. At least 5000 crack and non-crack pictures are collected and manually labeled to train the model, keeping the training samples balanced between the two classes. The concrete surface itself should not be pure black (it is typically gray or another color), so that it can be distinguished from the pure black cracks in the picture.
Model training uses SGD + Momentum as the optimizer, with the momentum (coefficient a below) set to 0.9. SGD + Momentum updates the weight matrix using a weighted average of the historical weight gradient matrix and the current weight gradient matrix. At the same learning rate, Momentum-accelerated SGD updates the weight matrix in larger steps, which helps the model move past local optima where possible and achieves good convergence. The update is calculated as follows:
$$V_t = a V_{t-1} + (1 - a)\,\nabla_{\omega_t} r \tag{7}$$
$$\omega_{t+1} = \omega_t - b V_t \tag{8}$$
where $\nabla_{\omega_t} r$ is the current weight gradient, i.e. the gradient of the training objective $r$ with respect to $\omega_t$; $V_{t-1}$ is the historical weight gradient; $V_t$ is the weighted average of the current and historical weight gradients; $a$ is the weight (momentum) coefficient, $0 \le a \le 1$; $\omega_t$ is the current weight; $b$ is the step coefficient applied to $V_t$, which plays the role of the learning rate; and $\omega_{t+1}$ is the updated weight.
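A minimal sketch of the update rule in Eqs. (7) and (8) on a toy quadratic objective. The objective, the starting point and b = 0.1 are assumptions for illustration; only a = 0.9 comes from the text. Note that many framework optimizers accumulate v ← a·v + g without the (1 − a) factor; with v initialized to zero the two forms differ only by a rescaling of b.

```python
from typing import List, Tuple


def momentum_step(w: List[float], v: List[float], grad: List[float],
                  a: float = 0.9, b: float = 0.1) -> Tuple[List[float], List[float]]:
    """One SGD + Momentum update following Eqs. (7) and (8)."""
    v_new = [a * vi + (1.0 - a) * gi for vi, gi in zip(v, grad)]  # Eq. (7)
    w_new = [wi - b * vi for wi, vi in zip(w, v_new)]             # Eq. (8)
    return w_new, v_new


# Toy objective r(w) = w0^2 + w1^2, whose gradient is 2w; start far from the minimum.
w, v = [5.0, -3.0], [0.0, 0.0]
for _ in range(300):
    w, v = momentum_step(w, v, [2.0 * wi for wi in w])
print(w)  # both components end up very close to 0
```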
303) The image features are mapped to a one-dimensional vector by global average pooling and pointwise convolution and output through a fully connected layer to obtain the final crack identification result.
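The classification head of step 303) can be sketched as follows. This plain-Python version assumes no bias terms, no activation between the pointwise convolution and the fully connected layer (the text does not specify these details) and a softmax output; toy dimensions replace the length-192 vector, and the class order is a hypothetical convention.

```python
import math
from typing import List

Tensor3D = List[List[List[float]]]  # indexed [channel][row][column]


def classification_head(features: Tensor3D, w_pw: List[List[float]],
                        w_fc: List[List[float]]) -> List[float]:
    """Global average pooling -> pointwise convolution -> fully connected -> softmax."""
    # Global average pooling reduces every channel to a single value (1 x 1 map).
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # On a 1 x 1 map a pointwise (1 x 1) convolution is a matrix-vector product;
    # in the patent this step produces a vector of length 192.
    vec = [sum(wr[i] * pooled[i] for i in range(len(pooled))) for wr in w_pw]
    # Fully connected layer to two logits (crack / no crack in this sketch).
    logits = [sum(wr[i] * vec[i] for i in range(len(vec))) for wr in w_fc]
    m = max(logits)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy sizes: 2 channels and a length-2 vector instead of the 192 used in the patent.
feats = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
probs = classification_head(feats, [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
print(probs)
```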
The experiments of this embodiment were run on a PC with an Intel Core i5-8300H processor, an NVIDIA GTX 1080 Ti graphics card and 16 GB of memory, and were implemented with the PaddlePaddle 2.1.3 deep learning framework. The public crack dataset released by Middle East Technical University was selected for evaluation; after the network training parameters were set, the performance of the algorithm was analyzed and improved experimentally.
Equation (2) shows that the phantom convolution has two hyperparameters: the number m of convolution filters used to generate the intrinsic features, and the filter size d × d of the depthwise convolution that linearly transforms the intrinsic features. Based on previous successful experience, m is set to n/2, and experiments with different values of d were carried out on the crack dataset; the results are shown in Table 1, where Acc is the crack identification accuracy, and MFLOPs and MParams are the number of floating-point operations and the number of parameters in millions, respectively.
The experimental results show that the 3 × 3 depthwise convolution filter performs better than both smaller and larger filters. This is because a 1 × 1 filter cannot introduce spatial information into the features, while the larger 5 × 5 and 7 × 7 filters lead to over-fitting and to more computation and memory access. d in the phantom convolution is therefore set to 3.
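The cost side of the trade-off in choosing d can be estimated by counting weights and multiply-accumulate operations. The sketch below uses a hypothetical 64-to-64-channel layer on a 28 × 28 map and ignores batch normalization and bias; it is an illustration, not a reproduction of Table 1, and the function names are assumptions. The depthwise term m·d² grows quadratically with d while the 1 × 1 term c·m stays fixed, and every variant remains far below an ordinary 3 × 3 convolution of the same width.

```python
from typing import Tuple


def ordinary_cost(c: int, n: int, h: int, w: int, k: int = 3) -> Tuple[int, int]:
    """Parameters and multiply-accumulates of an ordinary k x k convolution (stride 1, no bias)."""
    params = c * n * k * k
    return params, params * h * w


def phantom_cost(c: int, n: int, h: int, w: int, d: int = 3) -> Tuple[int, int]:
    """Same counts for a phantom convolution: 1 x 1 conv to m = n/2 maps + d x d depthwise."""
    m = n // 2
    params = c * m + m * d * d
    return params, params * h * w


# Hypothetical layer: 64 -> 64 channels on a 28 x 28 feature map.
print("ordinary 3x3:", ordinary_cost(64, 64, 28, 28))
for d in (1, 3, 5, 7):
    print(f"phantom, d={d}:", phantom_cost(64, 64, 28, 28, d))
```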
Table 2 shows the experimental results of the lightweight crack identification model and other mainstream lightweight deep neural networks on the public crack dataset of Middle East Technical University.
The experimental results show that, compared with the mainstream lightweight backbone networks of recent years, the proposed lightweight crack identification model achieves a greatly improved inference speed on the public crack dataset of Middle East Technical University. Compared with the well-performing GhostNet 0.25×, the proposed model keeps the same Acc while requiring nearly 2× fewer floating-point operations and nearly 2× fewer parameters, greatly cutting computation and storage costs while maintaining crack identification accuracy, so the model can be deployed on embedded devices.
Based on the above method, the present invention further provides a device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above method.
The invention constructs a lightweight network construction unit for crack identification based on phantom convolution and, drawing on the network architecture of MobileNetV3, designs a new lightweight crack identification network architecture. An ordinary convolution first increases the number of feature channels of the image, and a series of network construction units that continuously down-sample while gradually increasing the number of feature channels then extracts deep features of the crack image. Down-sampling in the network construction unit is implemented by a single 3 × 3 depthwise convolution on the high-channel-count features, rather than by conventional pooling or a strided ordinary convolution. This continuous down-sampling structure greatly reduces the parameters and computation of the model while avoiding the loss of crack image feature information caused by ordinary down-sampling. To improve network performance, the ECA lightweight attention mechanism is added at the position with the largest number of feature channels, biasing computing resources toward the most informative part of the input signal.
Those not described in detail in this specification are well within the skill of the art.
Finally, it should be noted that the above detailed description is only for illustrating the technical solution of the patent and not for limiting, although the patent is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the patent can be modified or replaced by equivalents without departing from the spirit and scope of the technical solution of the patent, which should be covered by the claims of the patent.