CN115565068B - Full-automatic detection method for breakage of high-rise building glass curtain wall based on light-weight deep convolutional neural network - Google Patents

Full-automatic detection method for breakage of high-rise building glass curtain wall based on light-weight deep convolutional neural network

Info

Publication number
CN115565068B
CN115565068B (application CN202211210404.4A)
Authority
CN
China
Prior art keywords
convolution
layer
channel
network
detection network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211210404.4A
Other languages
Chinese (zh)
Other versions
CN115565068A (en)
Inventor
卓仁杰
沈晓涵
杨明昊
余明行
高琳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo University
Original Assignee
Ningbo University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University filed Critical Ningbo University
Priority to CN202211210404.4A
Publication of CN115565068A
Application granted
Publication of CN115565068B
Legal status: Active

Classifications

    • G06V 20/176 — Scenes; Scene-specific elements; Terrestrial scenes; Urban or other man-made structures
    • G06N 3/082 — Neural networks; Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06V 10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06V 20/17 — Terrestrial scenes taken from planes or by drones
    • G06V 20/70 — Labelling scene content, e.g. deriving syntactic or semantic representations

Abstract

The invention relates to a fully automatic detection method for breakage of high-rise building glass curtain walls based on a lightweight deep convolutional neural network. The method constructs a YOLO network and replaces each convolution layer in the YOLO network with sequentially connected extended convolution, channel-by-channel convolution, point-by-point convolution and pruning layers to obtain new convolution layers; the replaced YOLO network serves as the constructed first detection network. A sparsity constraint is then imposed on the parameter $w_i$ to be learned in each pruning layer, and the new convolution layers in the first detection network are pruned to obtain a second detection network. Finally, a sparsity constraint is imposed on the scaling factor vector of each BN layer in the second detection network, which is pruned again to obtain a third detection network. The method thereby further reduces the number of model parameters while maintaining high accuracy, and realizes fully automatic, drone-based detection of the glass curtain walls of urban high-rise buildings.

Description

Full-automatic detection method for breakage of high-rise building glass curtain wall based on light-weight deep convolutional neural network
Technical Field
The invention relates to the field of image detection, in particular to a fully automatic detection method for breakage of high-rise building glass curtain walls based on a lightweight deep convolutional neural network.
Background
Architectural glass curtain walls are widely used on high-rise buildings thanks to their attractive appearance, wide field of view, rapid construction and strong plasticity. In the architectural glass curtain wall industry, China has developed rapidly from a non-existent market to a leading position and is now the world's largest producer and user of glass curtain walls. However, under the influence of external factors (such as severe weather) and service time, glass curtain walls can become damaged during use, which not only spoils their appearance and view but also poses a huge potential safety hazard, so timely and accurate inspection of architectural glass curtain walls must be strengthened. To this end, China has issued a strict engineering specification for architectural glass curtain walls (JGJ 102-2003), which stipulates that glass curtain walls must undergo a full inspection every five years. This greatly increases the demand for curtain wall inspection.
At present, the glass curtain wall inspection industry mainly relies on traditional manual visual inspection, which is labor-intensive and time-consuming and prone to missed or false detections caused by operator fatigue. In addition, the interior side of some high-rise glass curtain walls is obstructed, so inspection can only be carried out from outside the building; manual visual inspection must then be combined with work at height. Conventional inspection methods therefore also carry certain risks. An efficient and fast new glass curtain wall inspection method is thus urgently needed to overcome the shortcomings of the traditional approach.
With the rapid development of unmanned aerial vehicle (UAV) technology, drone-assisted inspection of architectural glass curtain walls has emerged in recent years. In this mode, the pictures shot by the drone are transmitted in real time to the inspector's computer, and the inspector only needs to review the returned pictures on the computer to assess the state of the glass curtain wall. This saves labor and time and effectively avoids the dangers of work at height. However, the inspector still has to examine every image collected by the drone manually, so the approach remains somewhat labor- and time-consuming and subject to missed or false detections caused by fatigue. Therefore, based on the pictures acquired by the drone, a convolutional neural network (CNN) is introduced to realize a fully automatic detection method for high-rise building glass curtain walls.
CNNs are widely used in tasks such as image detection, classification and segmentation owing to their excellent feature extraction capability. In recent years a large number of advanced CNNs have been applied to object detection, for example the YOLO series, SSD, Fast R-CNN and Faster R-CNN. However, these networks have large numbers of parameters and are difficult to deploy on small devices with limited computing power, such as drones. Further improvement of the prior art is therefore needed.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the prior art, a fully automatic detection method for breakage of high-rise building glass curtain walls based on a lightweight deep convolutional neural network whose model achieves high accuracy with a small number of parameters.
The technical scheme adopted by the invention for solving the technical problems is as follows: a full-automatic detection method for high-rise building glass curtain wall damage based on a lightweight deep convolutional neural network is characterized by comprising the following steps:
step 1, obtaining a certain number of glass curtain wall images of urban high-rise buildings, and carrying out damage marking on the glass curtain wall images to obtain labels of the glass curtain wall images so as to form a sample set;
step 2, dividing the sample set into a training set, a verification set and a test set;
step 3, constructing a first detection network, and training and verifying the constructed first detection network to obtain the first detection network with optimal parameters;
the method for constructing the first detection network specifically comprises the following steps:
constructing a YOLO network, replacing a convolution layer in the YOLO network by using sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution to obtain a new convolution layer, and finally using the replaced YOLO network as a constructed first detection network;
wherein the structure of the new convolution layer is determined as follows: judge whether the channel-by-channel convolution step size is 1; if so, a pruning layer is connected after the point-by-point convolution, and residual connection is performed between the input of the extended convolution and the output of the pruning layer to obtain the new convolution layer; if not, the sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution are taken as the new convolution layer;
the specific calculation process of the pruning layer is as follows:
the pruning layer comprises a differentiable gate whose output $g_i$ is calculated as

$$g_i = \sigma(w_i) = \frac{1}{1 + e^{-w_i}}$$

wherein $\sigma(\cdot)$ is the Sigmoid function, $g_i \in [0,1]$, and $w_i$ is the parameter to be learned in the differentiable gate;
the output $g_i$ of the differentiable gate is normalized to 0 or 1 to obtain

$$G_i = \mathbb{1}(g_i > 0.5)$$

wherein $G_i \in \{0,1\}$ and $\mathbb{1}(\cdot)$ is the indicator function: if $g_i > 0.5$ then $G_i = 1$; if $g_i \le 0.5$ then $G_i = 0$;
the output $\hat{F}_i$ of the pruning layer is calculated as

$$\hat{F}_i = F_i \odot E(G_i)$$

wherein $F_i$ is the feature map output by the point-by-point convolution, $E(G_i)$ is a spreading function that expands $G_i$ to the same size as $F_i$, and $\odot$ denotes element-by-element multiplication;
the output feature map $F_{i3}$ of a new convolution layer whose channel-by-channel convolution step size is 1 is calculated as

$$F_{i3} = F_{i1} \oplus \hat{F}_i$$

wherein $\oplus$ denotes element-by-element addition and $F_{i1}$ is the feature map input to the extended convolution; when $G_i = 0$, $F_{i3} = F_{i1}$; when $G_i = 1$, $F_{i3} = F_{i1} \oplus F_i$;
The training process is as follows: the images in the training set are input into the first detection network in batches; at each training step a sparsity constraint is imposed on the parameter $w_i$ to be learned in each pruning layer, the loss function $\mathrm{Loss}_b$ of the sparse training of the first detection network is calculated, and the parameters of the first detection network are updated by back-propagating $\mathrm{Loss}_b$;
step 4, pruning the new convolution layers whose channel-by-channel convolution step size is 1 in the first detection network with the optimal parameters from step 3 to obtain a second detection network;
step 5, inputting the images in the training set into a second detection network in batches, carrying out sparse constraint on the scaling factor vector gamma of each BN layer in the second detection network during each training, calculating a Loss function Loss of sparse training of the second detection network, and reversely updating parameters of the second detection network through the Loss function Loss to obtain the second detection network with optimal parameters;
step 6, pruning the second detection network with the optimal parameters in the step 5 according to the scaling factor vectors gamma of all BN layers to obtain a third detection network;
step 7, fine tuning the third detection network by using samples in the training set and the verification set, obtaining a final third detection network if the performance of the fine tuned third detection network meets the requirement, and otherwise, repeating the steps 4-6;
and 8, randomly selecting a glass curtain wall image in the test set, and inputting the selected glass curtain wall image into the final third detection network in the step 7 to obtain a detection result of the glass curtain wall image.
Further, the specific steps of the pruning in step 4 are as follows:
if the $g_i$ corresponding to the pruning layer in a new convolution layer whose channel-by-channel convolution step size is 1 satisfies $g_i \le 0.5$, the entire new convolution layer is pruned away;
if the $g_i$ corresponding to the pruning layer in a new convolution layer whose channel-by-channel convolution step size is 1 satisfies $g_i > 0.5$, the pruning layer in the new convolution layer is removed, the sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution are retained, and residual connection is performed between the input of the extended convolution and the output of the point-by-point convolution.
In order to increase the effective receptive field of the detection network, in step 3, after the convolutional layer is replaced by the sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution, the method further includes enlarging the size of at least part of convolutional kernels in the new convolutional layer.
Preferably, the YOLO network in step 3 is a YOLO v4 network.
Compared with the prior art, the invention has the following advantages: to enable a drone carrying the detection network to complete fully automatic detection of breakage of urban high-rise glass curtain walls, the ordinary convolution layers in the YOLO network are first replaced by sequentially connected extended convolution, channel-by-channel convolution, point-by-point convolution and pruning layers, which reduces the number of model parameters; secondly, to increase the effective receptive field of the network, some convolution kernels are enlarged, which effectively improves the detection precision of the network; a hybrid pruning strategy based on the new convolution layers and the feature map channels is then designed, further reducing the number of model parameters while maintaining high accuracy. Finally, the network is deployed on a drone, realizing fully automatic, drone-based detection of the glass curtain walls of urban high-rise buildings and changing the current industry practice of manual inspection and visual examination.
Drawings
FIG. 1 is a flow chart of a fully automatic detection method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a first detection network in accordance with the present invention;
FIG. 3 is a schematic diagram of the extended convolution, channel-by-channel convolution, and point-by-point convolution in connection with the present invention;
FIG. 4 is a schematic diagram of a new convolution layer with a channel-by-channel convolution step size of 1 in the present invention;
FIG. 5 is a schematic diagram of pruning a new convolution layer with a channel-by-channel convolution step size of 1 in the present invention;
FIG. 6 is a schematic diagram of channel pruning based on feature maps in the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings and embodiments.
As shown in fig. 1 to 6, the method for fully automatically detecting the breakage of a glass curtain wall of a high-rise building based on a lightweight deep convolutional neural network in the embodiment includes the following steps:
step 1, obtaining a certain number of glass curtain wall images of urban high-rise buildings, and carrying out damage marking on the glass curtain wall images to obtain labels of the glass curtain wall images so as to form a sample set;
in this embodiment, the glass curtain wall images of urban high-rise buildings are obtained by means such as web search, mobile phone photography and drone photography, and the breakage labeling of the images is completed with LabelImg;
step 2, dividing the sample set into a training set, a verification set and a test set;
in this embodiment, the sample set is divided into the training set, verification set and test set in the ratio 7:2:1; the training set is used to train the network, i.e. to adjust the network parameters; the verification set is used to select the optimal network parameters; the test set is used to test the generalization ability of the optimal network;
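As an illustration of this step, the following is a minimal Python sketch of the 7:2:1 split; the function name and the fixed seed are illustrative choices, not taken from the patent:

```python
import random

def split_dataset(samples, seed=0):
    """Shuffle paired (image, label) samples and split them 7:2:1
    into training, verification and test sets."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train, n_val = int(0.7 * n), int(0.2 * n)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```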
step 3, constructing a first detection network, and training and verifying the constructed first detection network to obtain the first detection network with optimal parameters;
the method for constructing the first detection network specifically comprises the following steps:
constructing a YOLO network, replacing a convolution layer in the YOLO network by sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution to obtain a new convolution layer, and finally using the replaced YOLO network as a constructed first detection network;
wherein the structure of the new convolution layer is determined as follows: judge whether the channel-by-channel convolution step size is 1; if so, a pruning layer is connected after the point-by-point convolution, and residual connection is performed between the input of the extended convolution and the output of the pruning layer to obtain the new convolution layer; if not, the sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution are taken as the new convolution layer;
when the channel-by-channel convolution step size is 1, the feature map input to the extended convolution and the feature map output by the pruning layer have the same size, so a residual connection can be made between the input of the extended convolution and the output of the pruning layer, which improves detection accuracy; conversely, when the channel-by-channel convolution step size is not 1, no residual connection can be made between the input of the extended convolution and the output of the pruning layer. When the channel-by-channel convolution step size is 2 (i.e. stride = 2), no residual connection is made between the input of the extended convolution and the output of the point-by-point convolution, as shown in fig. 3(a); when the channel-by-channel convolution step size is 1 (i.e. stride = 1), a residual connection is made between the input of the extended convolution and the output of the point-by-point convolution, as shown in fig. 3(b). The new-convolution-layer pruning described below is directed mainly at the case with a residual connection (i.e. channel-by-channel convolution step size 1). As shown in fig. 2, for convenience of description the module formed by connecting the extended convolution, channel-by-channel convolution and point-by-point convolution is denoted IRB, and as shown in fig. 4, the new convolution layer in which a pruning layer is connected after the point-by-point convolution is denoted IRB-compact; the first, second, third and fourth feature maps in fig. 4 correspond to the output of the extended convolution, the output of the channel-by-channel convolution, the output of the point-by-point convolution and the output of the pruning layer, respectively;
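To make the IRB structure concrete, here is a minimal PyTorch sketch of the block described above; the expansion ratio of 6, the ReLU6 activations and the class name are illustrative assumptions, not values specified by the patent. Enlarging kernel_size (e.g. to 5 or 7) corresponds to the e-IRB variant introduced below:

```python
import torch
import torch.nn as nn

class IRB(nn.Module):
    """Extended (1x1 expansion) convolution, channel-by-channel
    (depthwise) convolution and point-by-point (1x1) convolution,
    connected sequentially; residual connection only when stride == 1
    and the channel counts match."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6, kernel_size=3):
        super().__init__()
        mid = in_ch * expand
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),        # extended convolution
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, mid, kernel_size, stride,
                      kernel_size // 2, groups=mid,
                      bias=False),                        # channel-by-channel
            nn.BatchNorm2d(mid), nn.ReLU6(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),       # point-by-point
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_residual else y
```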
in this embodiment, the YOLO network is a YOLO v4 network. YOLO v4 is an improved version of the YOLO v3 object detection algorithm; compared with YOLO v3 it is optimized in data augmentation, the backbone feature extraction network, the feature pyramid, activation functions and other aspects, improving both real-time detection speed and precision. The YOLO v4 network comprises a feature extraction backbone, an SPP module, a feature fusion module and a classification regression layer. The feature extraction backbone is a CSPDarknet53 network structure and performs feature extraction on the input image. The SPP module extracts multi-scale depth features with different receptive fields and concatenates them along the channel dimension of the feature map for fusion, improving detection precision. PANet is the feature fusion module; it fuses contextual features through up-sampling and down-sampling, obtaining richer semantic information and improving object detection accuracy. The classification regression layer still uses the YOLO-head structure of YOLO v3. In addition, the feature extraction backbone and the feature fusion module contain multiple convolution layers, and a BN layer (batch normalization layer) follows each convolution layer. The specific structure of each module of the YOLO v4 network is shown in fig. 2; since YOLO v4 is prior art, it is not described further here;
when the convolution layers are replaced by sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution, the size of at least some convolution kernels in the new convolution layers is enlarged; an IRB with enlarged convolution kernels is denoted e-IRB. This effectively improves the detection precision of the detection network;
the specific working principle of the sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution can refer to the disclosure of the applicant's prior application CN202111395954.3 "an automatic detection method for breakage of glass curtain walls of high-rise buildings", and details thereof are not repeated herein.
The specific calculation process of the pruning layer is as follows:
the pruning layer comprises a differentiable gate whose output $g_i$ is calculated as

$$g_i = \sigma(w_i) = \frac{1}{1 + e^{-w_i}}$$

wherein $\sigma(\cdot)$ is the Sigmoid function, $g_i \in [0,1]$, and $w_i$ is the parameter to be learned in the differentiable gate;
the $g_i$ obtained here is continuous; in order to accurately generate the pruned network, in this embodiment the output $g_i$ of the differentiable gate is normalized to 0 or 1 to obtain

$$G_i = \mathbb{1}(g_i > 0.5)$$

wherein $G_i \in \{0,1\}$ and $\mathbb{1}(\cdot)$ is the indicator function: if $g_i > 0.5$ then $G_i = 1$; if $g_i \le 0.5$ then $G_i = 0$; since the indicator function $\mathbb{1}(\cdot)$ is non-differentiable, a straight-through estimator (STE) is used to calculate its gradient;
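One possible PyTorch realization of the differentiable gate with a straight-through estimator is sketched below; the class names and the initial value of $w_i$ (chosen so the gate starts open) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class _BinarizeSTE(torch.autograd.Function):
    """Forward: indicator 1(g > 0.5); backward: straight-through,
    i.e. the gradient is passed to g unchanged."""
    @staticmethod
    def forward(ctx, g):
        return (g > 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

class DifferentiableGate(nn.Module):
    """Pruning-layer gate: g_i = sigmoid(w_i), binarized to G_i via STE."""
    def __init__(self, w_init=1.0):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(w_init))

    def forward(self):
        g = torch.sigmoid(self.w)     # g_i in [0, 1]
        return _BinarizeSTE.apply(g)  # G_i in {0, 1}
```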
the output $\hat{F}_i$ of the pruning layer is calculated as

$$\hat{F}_i = F_i \odot E(G_i)$$

wherein $F_i$ is the feature map output by the point-by-point convolution, $E(G_i)$ is a spreading function that expands $G_i$ to the same size as $F_i$, and $\odot$ denotes element-by-element multiplication; for example, when $G_i = 1$, $E(G_i)$ is an all-ones tensor of the same size as $F_i$ and $\hat{F}_i = F_i$; when $G_i = 0$, $\hat{F}_i$ is all zeros;
the output feature map $F_{i3}$ of the new convolution layer is calculated as

$$F_{i3} = F_{i1} \oplus \hat{F}_i$$

wherein $\oplus$ denotes element-by-element addition and $F_{i1}$ is the feature map input to the extended convolution; when $G_i = 0$, $F_{i3} = F_{i1}$; when $G_i = 1$, $F_{i3} = F_{i1} \oplus F_i$;
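Putting the pieces together, the stride-1 new convolution layer (IRB-compact) can be sketched as follows, reusing the IRB and DifferentiableGate classes from the sketches above; since $G_i$ is a scalar, broadcasting plays the role of the spreading function $E(\cdot)$. This is a sketch under the stated assumptions, not the patent's exact implementation:

```python
class IRBCompact(nn.Module):
    """Stride-1 new convolution layer: IRB followed by a pruning layer,
    with the residual taken from the IRB input F_i1."""
    def __init__(self, channels, expand=6, kernel_size=3):
        super().__init__()
        self.irb = IRB(channels, channels, stride=1,
                       expand=expand, kernel_size=kernel_size)
        self.irb.use_residual = False  # the residual is added after the gate
        self.gate = DifferentiableGate()

    def forward(self, x):              # x is F_i1
        f = self.irb(x)                # F_i: the point-by-point output
        G = self.gate()                # G_i in {0, 1}
        return x + G * f               # F_i3 = F_i1 (+) E(G_i) ⊙ F_i
```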
The training process is as follows: the images in the training set are input into the first detection network in batches; at each training step a sparsity constraint is imposed on the parameter $w_i$ to be learned in each pruning layer, the loss function $\mathrm{Loss}_b$ of the sparse training of the first detection network is calculated, and the parameters of the first detection network are updated by back-propagating $\mathrm{Loss}_b$:

$$\mathrm{Loss}_b = \lambda_b \sum_i \sigma(w_i) + L_{\mathrm{YOLO}}$$

wherein $\sigma(\cdot)$ is the Sigmoid function; $w_i$ is the trainable parameter in the pruning layer; $\lambda_b$ is a sparsity factor that controls the degree of sparsity of the new convolution layers and is a manually set hyper-parameter; and $L_{\mathrm{YOLO}}$ is the loss function of YOLO v4, calculated with the loss of the existing YOLO v4 network;
step 4, pruning the new convolution layers whose channel-by-channel convolution step size is 1 in the first detection network with the optimal parameters from step 3 to obtain a second detection network; new convolution layers whose channel-by-channel convolution step size is not 1 are not pruned;
the pruning method comprises the following specific steps:
if the $g_i$ corresponding to the pruning layer in a new convolution layer whose channel-by-channel convolution step size is 1 satisfies $g_i \le 0.5$, the entire new convolution layer is pruned away;
if the $g_i$ corresponding to the pruning layer in a new convolution layer whose channel-by-channel convolution step size is 1 satisfies $g_i > 0.5$, the pruning layer in the new convolution layer is removed, the sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution are retained, and residual connection is performed between the input of the extended convolution and the output of the point-by-point convolution;
for example, when Gate = 0.274 (≤ 0.5), the new convolution layer is removed, denoted in fig. 5 as the IRB removal module; when Gate = 0.631 (> 0.5), the pruning layer is removed and the sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution are retained, denoted in fig. 5 as the IRB retention module;
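A sketch of this step-4 decision, applied to the IRBCompact class sketched above (names illustrative):

```python
import torch
import torch.nn as nn

def prune_irb_compact(block):
    """Step 4 on one stride-1 new convolution layer: if g_i <= 0.5 the
    whole layer reduces to the identity (F_i3 = F_i1, 'IRB removal');
    if g_i > 0.5 the gate is dropped and the residual IRB is kept
    ('IRB retention')."""
    g = torch.sigmoid(block.gate.w).item()
    if g <= 0.5:
        return nn.Identity()
    block.irb.use_residual = True  # residual now wraps the IRB itself
    return block.irb
```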
step 5, inputting the images in the training set into a second detection network in batches, carrying out sparse constraint on the scaling factor vector gamma of each BN layer in the second detection network during each training, calculating a Loss function Loss of the sparse training of the second detection network, and reversely updating the parameters of the second detection network through the Loss function Loss to obtain the second detection network with optimal parameters;
the calculation formula of Loss is:

$$\mathrm{Loss} = L_{\mathrm{YOLO}} + \lambda \sum_{\gamma \in Q} g(\gamma)$$

wherein $\lambda$ is a balance factor and a manually set parameter; $g(\gamma) = |\gamma|$ denotes L1 regularization of $\gamma$; and $Q$ is the set of the scaling factor vectors $\gamma$ of all BN layers;
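A minimal sketch of this BN sparsity loss in PyTorch, where each BatchNorm2d weight tensor plays the role of the scaling factor vector $\gamma$; the default value of $\lambda$ is again a placeholder:

```python
import torch.nn as nn

def sparse_loss_bn(model, yolo_loss, lam=1e-4):
    """Loss = L_YOLO + lambda * sum of |gamma| over all BN layers."""
    l1 = sum(m.weight.abs().sum() for m in model.modules()
             if isinstance(m, nn.BatchNorm2d))
    return yolo_loss + lam * l1
```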
step 6, pruning the second detection network with the optimal parameters in the step 5 according to the scaling factor vectors gamma of all BN layers to obtain a third detection network;
for the detailed pruning method, reference may be made to the applicant's prior application CN202111395954.3, "an automatic detection method for breakage of glass curtain walls of high-rise buildings", which is not repeated here. As shown in fig. 6, channels whose values in the scaling factor vector $\gamma$ are small are pruned; the figure shows that the pruning operation clearly reduces the model's parameters while the large values among the scaling factors are retained, which accelerates training and improves the training precision of the model;
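Since the patent defers the channel-pruning details to CN202111395954.3, the following is only a sketch in the network-slimming style that fig. 6 suggests: all $\gamma$ values are ranked globally and, per BN layer, the channels above the global threshold are kept; the global pruning ratio is an assumption:

```python
import torch
import torch.nn as nn

def bn_keep_masks(model, prune_ratio=0.5):
    """Return, for each BN layer, a boolean mask of the channels whose
    |gamma| exceeds a global threshold; masked-out channels are pruned."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    thresh = gammas.sort().values[int(len(gammas) * prune_ratio)]
    return {name: m.weight.detach().abs() > thresh
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```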
step 7, fine tuning the third detection network by using samples in the training set and the verification set, obtaining a final third detection network if the performance of the fine tuned third detection network meets the requirement, and otherwise, repeating the steps 4-6;
and 8, randomly selecting a glass curtain wall image in the test set, and inputting the selected glass curtain wall image into the final third detection network in the step 7 to obtain a detection result of the glass curtain wall image.
The network obtained as the final third detection network is deployed on a drone, and fully automatic detection of urban high-rise glass curtain walls is realized using the drone's ability to automatically circle a building. In this embodiment, the DJI PHANTOM 4 PRO drone is used.

Claims (4)

1. A full-automatic detection method for high-rise building glass curtain wall damage based on a lightweight deep convolutional neural network is characterized by comprising the following steps:
step 1, obtaining a certain number of glass curtain wall images of urban high-rise buildings, and carrying out damage marking on the glass curtain wall images to obtain labels of the glass curtain wall images so as to form a sample set;
step 2, dividing the sample set into a training set, a verification set and a test set;
step 3, constructing a first detection network, and training and verifying the constructed first detection network to obtain the first detection network with optimal parameters;
the method for constructing the first detection network specifically comprises the following steps:
constructing a YOLO network, replacing a convolution layer in the YOLO network by using sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution to obtain a new convolution layer, and finally using the replaced YOLO network as a constructed first detection network;
wherein the structure of the new convolution layer is determined as follows: judge whether the channel-by-channel convolution step size is 1; if so, a pruning layer is connected after the point-by-point convolution, and residual connection is performed between the input of the extended convolution and the output of the pruning layer to obtain the new convolution layer; if not, the sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution are taken as the new convolution layer;
the specific calculation process of the pruning layer comprises the following steps:
the pruning layer comprises a differentiable gate whose output $g_i$ is calculated as

$$g_i = \sigma(w_i) = \frac{1}{1 + e^{-w_i}}$$

wherein $\sigma(\cdot)$ is the Sigmoid function, $g_i \in [0,1]$, and $w_i$ is the parameter to be learned in the differentiable gate;
the output $g_i$ of the differentiable gate is normalized to 0 or 1 to obtain

$$G_i = \mathbb{1}(g_i > 0.5)$$

wherein $G_i \in \{0,1\}$ and $\mathbb{1}(\cdot)$ is the indicator function: if $g_i > 0.5$ then $G_i = 1$; if $g_i \le 0.5$ then $G_i = 0$;
the output $\hat{F}_i$ of the pruning layer is calculated as

$$\hat{F}_i = F_i \odot E(G_i)$$

wherein $F_i$ is the feature map output by the point-by-point convolution, $E(G_i)$ is a spreading function that expands $G_i$ to the same size as $F_i$, and $\odot$ denotes element-by-element multiplication;
the output feature map $F_{i3}$ of a new convolution layer whose channel-by-channel convolution step size is 1 is calculated as

$$F_{i3} = F_{i1} \oplus \hat{F}_i$$

wherein $\oplus$ denotes element-by-element addition and $F_{i1}$ is the feature map input to the extended convolution; when $G_i = 0$, $F_{i3} = F_{i1}$; when $G_i = 1$, $F_{i3} = F_{i1} \oplus F_i$;
The training process is as follows: the images in the training set are input into the first detection network in batches; at each training step a sparsity constraint is imposed on the parameter $w_i$ to be learned in each pruning layer, the loss function $\mathrm{Loss}_b$ of the sparse training of the first detection network is calculated, and the parameters of the first detection network are updated by back-propagating $\mathrm{Loss}_b$;
step 4, pruning the new convolution layers whose channel-by-channel convolution step size is 1 in the first detection network with the optimal parameters from step 3 to obtain a second detection network;
step 5, inputting the images in the training set into a second detection network in batches, carrying out sparse constraint on the scaling factor vector gamma of each BN layer in the second detection network during each training, calculating a Loss function Loss of sparse training of the second detection network, and reversely updating parameters of the second detection network through the Loss function Loss to obtain the second detection network with optimal parameters;
step 6, pruning the second detection network with the optimal parameters in the step 5 according to the scaling factor vectors gamma of all BN layers to obtain a third detection network;
step 7, fine tuning the third detection network by using samples in the training set and the verification set, obtaining a final third detection network if the performance of the fine tuned third detection network meets the requirement, and otherwise, repeating the steps 4-6;
and 8, randomly selecting a glass curtain wall image in the test set, and inputting the selected glass curtain wall image into the final third detection network in the step 7 to obtain a detection result of the glass curtain wall image.
2. The fully automatic detection method according to claim 1, characterized in that: the specific method for pruning in the step 4 comprises the following steps:
if the $g_i$ corresponding to the pruning layer in a new convolution layer whose channel-by-channel convolution step size is 1 satisfies $g_i \le 0.5$, the entire new convolution layer is pruned away;
if the $g_i$ corresponding to the pruning layer in a new convolution layer whose channel-by-channel convolution step size is 1 satisfies $g_i > 0.5$, the pruning layer in the new convolution layer is removed, the sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution are retained, and residual connection is performed between the input of the extended convolution and the output of the point-by-point convolution.
3. The fully automatic detection method according to claim 1, characterized in that: and in the step 3, after the convolution layer is replaced by the sequentially connected extended convolution, channel-by-channel convolution and point-by-point convolution, the method also comprises the step of expanding the size of at least part of convolution kernels in the new convolution layer.
4. The fully automatic detection method according to any one of claims 1 to 3, characterized in that: the YOLO network in step 3 is a YOLO v4 network.
CN202211210404.4A 2022-09-30 2022-09-30 Full-automatic detection method for breakage of high-rise building glass curtain wall based on light-weight deep convolutional neural network Active CN115565068B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211210404.4A CN115565068B (en) 2022-09-30 2022-09-30 Full-automatic detection method for breakage of high-rise building glass curtain wall based on light-weight deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211210404.4A CN115565068B (en) 2022-09-30 2022-09-30 Full-automatic detection method for breakage of high-rise building glass curtain wall based on light-weight deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN115565068A CN115565068A (en) 2023-01-03
CN115565068B true CN115565068B (en) 2023-04-18

Family

ID=84743900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211210404.4A Active CN115565068B (en) 2022-09-30 2022-09-30 Full-automatic detection method for breakage of high-rise building glass curtain wall based on light-weight deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN115565068B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009565A (en) * 2019-04-04 2019-07-12 武汉大学 A kind of super-resolution image reconstruction method based on lightweight network
CN114120154A (en) * 2021-11-23 2022-03-01 宁波大学 Automatic detection method for breakage of glass curtain wall of high-rise building
CN114581868A (en) * 2022-03-04 2022-06-03 京东鲲鹏(江苏)科技有限公司 Image analysis method and device based on model channel pruning
CN114898327A (en) * 2022-03-15 2022-08-12 武汉理工大学 Vehicle detection method based on lightweight deep learning network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220101494A1 (en) * 2020-09-30 2022-03-31 Nvidia Corporation Fourier transform-based image synthesis using neural networks


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Shangqian Gao et al. Network Pruning via Performance Maximization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, 9270-9280. *
刘济樾. Research on real-time face detection methods based on lightweight networks. China Master's Theses Full-text Database, Information Science and Technology, 2020, I138-1236. *

Also Published As

Publication number Publication date
CN115565068A (en) 2023-01-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant