CN115995056A - Automatic bridge disease identification method based on deep learning - Google Patents


Info

Publication number
CN115995056A
CN115995056A
Authority
CN
China
Prior art keywords
bridge
representing
training
disease
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310281075.0A
Other languages
Chinese (zh)
Inventor
章羚
王铁鑫
岳大川
于宪政
温晓光
王京杭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202310281075.0A priority Critical patent/CN115995056A/en
Publication of CN115995056A publication Critical patent/CN115995056A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an automatic bridge disease recognition method based on deep learning. Videos of the relevant bridge components are collected and screenshots are taken from them; the positions and types of the bridge diseases in the pictures are marked, thereby building a training data set; a deep neural network model for bridge diseases is trained on this data set; the bridge disease video is then input into the optimized YOLOv5 neural network model for automatic identification, which outputs disease identification and classification results together with disease localization results and generates a corresponding XML file. Because the invention identifies and marks bridge diseases automatically rather than manually, it effectively improves the efficiency of identifying and marking bridge diseases, reduces labor costs, and yields output that can be applied to the automatic generation of large-scale bridge disease training data sets.

Description

Automatic bridge disease identification method based on deep learning
Technical Field
The invention belongs to the technical field of bridge detection, and particularly relates to an automatic bridge disease identification method based on deep learning.
Background
According to the 2021 Statistical Bulletin on the Development of the Transportation Industry issued by the Ministry of Transport of China, by the end of 2021 the total number of highway bridges nationwide had reached 961,100, with a total length of 73.8021 million linear meters. Bridges have become an integral part of transportation systems all over the world, playing a vital role in the operation of national economies. However, as bridges age in service, related diseases often arise in bridge engineering under the influence of environment and load, posing many potential safety hazards. Once a bridge accident occurs, it causes great economic loss and may even endanger people's lives. Inspecting bridges regularly, discovering diseases early, and taking timely preventive and remedial measures is at present an effective way to rehabilitate dangerous old bridges. Daily inspection of bridge diseases is likewise the correct way to ensure safe passage over a bridge and to prolong its service life.
The continuous growth of total bridge length and the continuous extension of bridge service life lead to various bridge diseases. These diseases create potential safety hazards and reduce the service life of the bridge, for example cracking of bridge-support concrete, surface weathering, and exposed reinforcing steel bars. The high-risk conditions of bridge inspection and the enormous cost of bridge maintenance show that efficient bridge inspection and assessment is necessary. Before deep learning techniques were widely adopted, most detection used conventional computer vision methods based on image processing. For example, edge detection methods (e.g., the Sobel operator, Canny edge detection, the LoG filter operator) can detect the edges between a disease and the surrounding background: a static threshold is set for a specific case, and whether a relevant disease exists in the region of interest of the image is determined from that threshold; however, such methods are not applicable to dynamic and complex scenes. To overcome this limitation, the acquired images can be examined with computer vision techniques based on deep learning, helping inspectors determine the kind and location of bridge diseases, thereby saving a large amount of manpower and improving the efficiency of disease evaluation. However, deep learning methods need large data sets, and data set production currently relies mainly on manual work, which has low accuracy, high cost, and complex detection procedures, making such data sets difficult to obtain.
Disclosure of Invention
The invention aims to: provide an automatic bridge disease recognition method based on deep learning, which records video of the appearance of a bridge with an unmanned aerial vehicle, accurately recognizes bridge disease conditions by combining computer vision techniques, automatically classifies bridge disease categories, and outputs an XML file of the classification results that can be used for subsequent model optimization and training.
The technical scheme is as follows: the invention provides an automatic bridge disease identification method based on deep learning, which comprises the following steps:
(1) Preprocessing a bridge disease video obtained in advance to obtain the disease position and the disease type of the bridge, and constructing a data set to obtain a training set and a verification set;
(2) An attention mechanism module is introduced to optimize the Darknet-53 network structure and the loss function in the YOLOv5 algorithm, and their data parameters are adjusted; part of the weight parameter data in the neural network is frozen, and the bridge disease identification model is trained in batches;
(3) The convergence of the training model is observed to judge whether the loss function is over-fitted and whether it meets the training standard; if the training standard is met, the standard bridge disease recognition model is output directly; if not, a weight function meeting a preset standard is found, the data set is unfrozen, refined training is performed again, and the standard bridge disease recognition model is output once a weight function meeting the standard is obtained;
(4) The disease conditions in the bridge video are rapidly identified with the standard bridge disease identification model, the bridge disease categories and their probabilities are obtained, and the results are output and saved as an XML file usable for training.
Further, the implementation process of the step (1) is as follows:
a frame-cutting operation is performed on the bridge disease video to obtain image data, which is smoothed and augmented, and the image formats and sizes are unified; the processed picture data set is labeled and divided into two data sets, one with diseases and one without; the data set with diseases is refined, the diseases comprising: cracking, spalling, weathering, exposed steel bars, and rusting; the data set is divided into a training set and a validation set.
Further, the YOLOv5 in step (2) includes the Darknet-53 network from the convolutional neural network CNN family, and the network includes batch normalization and LeakyReLU activation function layers.
Further, the attention mechanism module introduced in step (2) optimizes the Darknet-53 network structure in YOLOv5 as follows:
the attention mechanism module comprises a channel attention part and a spatial attention part, and the attention mechanism is introduced between the input connector and the output connector so that network model training focuses on the target region; the channel attention formula is:

M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)

where \sigma is the Sigmoid activation function, \mathrm{MLP} denotes a multi-layer perceptron, \mathrm{AvgPool} and \mathrm{MaxPool} denote the average pooling and maximum pooling operations respectively, F is the input feature, and M_c(F) is the channel attention map obtained after the parallel pooling results are passed through the multi-layer perceptron;
the spatial attention part performs maximum pooling and average pooling on the output obtained from the channel attention and concatenates the results, then applies a convolution operation; after the channel attention output is reduced to a single channel, the final output is obtained through activation; the formula of spatial attention is:

M_s(F') = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')])\big)

where f^{7\times 7} denotes a convolution with a 7×7 kernel and M_s(F') is the spatial attention feature map obtained after the pooled results are processed by the convolution and activation function.
Further, the loss function in step (2) is:

L_{coord} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \big[(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2\big]

L_{conf} = -\lambda_{obj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \big[C_i \log \hat{C}_i + (1 - C_i)\log(1 - \hat{C}_i)\big] - \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{noobj} \big[C_i \log \hat{C}_i + (1 - C_i)\log(1 - \hat{C}_i)\big]

L_{class} = -\lambda_{class} \sum_{i=0}^{S^2} I_{i}^{obj} \sum_{c \in classes} \big[p_i(c) \log \hat{p}_i(c) + (1 - p_i(c))\log(1 - \hat{p}_i(c))\big]

where:
\lambda_{coord} represents the weight of the coordinate loss in the total loss;
\lambda_{class} represents the weight of the category loss in the total loss;
\lambda_{noobj} represents the weight of the no-detection-target confidence loss in the total loss;
\lambda_{obj} represents the weight of the with-detection-target confidence loss in the total loss;
I_{ij}^{obj} indicates whether the j-th anchor frame of the i-th grid contains the detection target: 1 if it does, 0 otherwise;
I_{ij}^{noobj} indicates whether the j-th anchor frame of the i-th grid does not contain the detection target: 1 if it does not, 0 otherwise;
x_i represents the abscissa of the center coordinates of the real target frame;
y_i represents the ordinate of the center coordinates of the real target frame;
w_i represents the width of the real target frame;
h_i represents the height of the real target frame;
C_i represents the confidence of the real target frame;
p_i represents the category of the detection target;
\hat{x}_i represents the abscissa of the center coordinates of the predicted target frame;
\hat{y}_i represents the ordinate of the center coordinates of the predicted target frame;
\hat{w}_i represents the width of the predicted target frame;
\hat{h}_i represents the height of the predicted target frame;
\hat{C}_i represents the confidence of the predicted target frame;
\hat{p}_i represents the category of the predicted target.
Compared with the prior art, the invention has the beneficial effects that:
according to the method, the bridge diseases are identified by optimizing the You Only Look Once version5 YOLOv5 algorithm, the identification accuracy is superior to that of the common YOLOv5 algorithm, the identification speed is high, the detection time can be greatly shortened, and the automatic marking efficiency is improved; the detection method of the invention not only can identify the position information of the bridge diseases, but also can judge the disease condition of the bridge and mark the disease probability, shortens the manual marking time and obviously improves the generation speed of the disease data set; the network model obtained by training in the invention automatically marks the disease position and disease category in the video, can be used for automatically generating the bridge disease picture data set, reduces the labor cost and improves the detection efficiency.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic diagram of a hybrid attention mechanism module architecture;
FIG. 3 is a schematic view of a channel attention structure;
fig. 4 is a schematic view of a spatial attention structure.
Description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings.
The invention provides an automatic bridge disease recognition method based on deep learning. Data are acquired by flying an unmanned aerial vehicle around the bridge, and the collected positioning information and video of the bridge's appearance are stored on a memory card via a data processing unit. The collected video is processed frame by frame to obtain image data, and the pictures are preprocessed by augmentation, resizing, smoothing, and filtering. The processed pictures are labeled to form a data set, giving a training set and a verification set. The YOLOv5 algorithm is optimized, the network model parameters are tuned and tested, picture feature values are extracted, and the bridge disease identification model is trained with part of the training set frozen. Whether the resulting training model converges and whether the loss function fits is then judged, and model training continues after unfreezing the data set and optimizing the parameters. The final training model is verified: the Intersection over Union (IoU) between the bridge disease regions produced by the model and the actual marked regions is calculated, the mean Average Precision (mAP) curve of the per-class precision values is drawn, and whether the model meets the conditions is analyzed. Finally, the trained model identifies bridge diseases and outputs an XML file. The XML file output by the model can be used to produce data sets for bridge detection; the method can detect bridge disease conditions and output a trainable XML file, realizing automatic marking of bridge diseases, automatically generating picture data sets in the bridge field, improving detection efficiency, and saving labor costs. As shown in fig. 1, the method specifically comprises the following steps:
step 1: preprocessing a bridge disease video obtained in advance, obtaining the disease position and the disease type of the bridge, and constructing a data set.
The bridge unmanned aerial vehicle flies around the bridge; the micro inertial measurement unit in the positioning module collects positioning information, the data processing unit processes the collected measurements, the camera module collects video of the bridge components, and the positioning information and video information are stored by the data storage unit.
Video of the relevant bridge components is collected and frames are extracted from it to obtain image data, which is then augmented and smoothed. The stored video data is extracted and converted into RGB picture data by a Python program, and filtering and smoothing are applied to complete the processing of the picture data set.
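As a rough illustration of this frame-extraction step (the patent's own program is not reproduced; the function name and the fixed-interval sampling policy are assumptions), the indices of the frames to grab when sampling one frame every K seconds can be computed as:

```python
def frame_indices(total_frames, fps, k_seconds):
    """Indices of the frames to grab when sampling one frame every k_seconds."""
    step = max(1, int(round(fps * k_seconds)))  # frames between consecutive grabs
    return list(range(0, total_frames, step))
```

A video decoder (e.g., one reading the UAV footage) would then seek to each returned index and save that frame as an RGB picture.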
One frame of the video signal is taken every K seconds to obtain a set of RGB pictures; the pictures containing diseases are selected manually, and noise points in the pictures are smoothed by Gaussian filtering. The one-dimensional Gaussian filter formula is:

G(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

Each of the three RGB channels of the picture is smoothed one-dimensionally, where \mu is the mean of x and \sigma^2 is the variance of x; because the center point is taken as the origin when computing the mean, \mu = 0.
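The per-channel one-dimensional Gaussian smoothing can be sketched in pure Python (a minimal illustration; the kernel radius and the edge-clamping policy are assumptions, not taken from the patent):

```python
import math

def gaussian_kernel_1d(sigma, radius=None):
    """Discrete zero-mean 1-D Gaussian kernel, normalised to sum to 1."""
    if radius is None:
        radius = max(1, int(3 * sigma))  # cover +/- 3 sigma
    raw = [math.exp(-(x * x) / (2.0 * sigma * sigma))
           for x in range(-radius, radius + 1)]
    total = sum(raw)
    return [v / total for v in raw]

def smooth_1d(signal, kernel):
    """Convolve a 1-D signal with the kernel, clamping indices at the borders."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), n - 1)  # replicate edge values
            acc += w * signal[j]
        out.append(acc)
    return out
```

Applying `smooth_1d` to each row (or column) of each of the R, G, and B channels gives the separable smoothing described above.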
The real frames of the processed picture data set are marked with labelImg software in a Python environment: the disease areas of each picture are framed and labeled and saved as an XML file; 80% of the marked data set is taken as the training set and the remaining 20% as the verification set.
For example: a bridge crack is labeled 'Crack', spalling is labeled 'Spalling', weathering is labeled 'Efflorescence', exposed reinforcing steel bars are labeled 'ExposedBars', and corrosion stains are labeled 'CorrosionStain'; a corresponding XML file is obtained after each picture is marked. The main parameters contained in the XML file are: size: the picture size; object: an object contained in the picture (one picture may contain multiple objects, here bridge diseases); name: the label class (in this method one of Crack, Spalling, Efflorescence, ExposedBars, CorrosionStain); bndbox: the object's real frame; difficult: whether the object is difficult to identify.
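An annotation of this shape can be read back with the standard library. The sketch below follows the Pascal VOC layout that labelImg emits (the sample string is invented for illustration) and parses the size, name, difficult, and bndbox fields described above:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<annotation>
  <size><width>1920</width><height>1080</height><depth>3</depth></size>
  <object>
    <name>Crack</name>
    <difficult>0</difficult>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>400</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return ((width, height), [object dicts]) from a VOC-style annotation."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.findtext("width"))
    height = int(size.findtext("height"))
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "difficult": obj.findtext("difficult") == "1",
            "bbox": tuple(int(box.findtext(t))
                          for t in ("xmin", "ymin", "xmax", "ymax")),
        })
    return (width, height), objects
```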
Step 2: the Darknet-53 network structure and the loss function in YOLOv5 are optimized through the attention introducing mechanism module, data parameters of the Darknet-53 network structure and the loss function are optimized, and a part of training set is frozen for training the bridge disease identification model in batches.
The YOLOv5 framework contains the Darknet-53 network from the convolutional neural network (CNN, Convolutional Neural Network) family, which contains batch normalization (Batch Normalization) and LeakyReLU activation function layers.
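As a minimal sketch of the two operations named here, batch normalization and LeakyReLU can be written in NumPy as follows (the learnable scale/shift parameters are omitted and the 0.1 slope is a simplifying assumption; this is not the patent's implementation):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalise each feature column of an (N, C) batch to zero mean, unit variance."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def leaky_relu(x, slope=0.1):
    """LeakyReLU: pass positives through, scale negatives by a small slope."""
    return np.where(x > 0, x, slope * x)
```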
The network model's training loss uses a mean square error (Mean Square Error, MSE) loss function for target frame coordinate regression, while category and confidence use cross-entropy loss functions, in the following specific forms:

L_{coord} = \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \big[(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2\big]

L_{conf} = -\lambda_{obj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{obj} \big[C_i \log \hat{C}_i + (1 - C_i)\log(1 - \hat{C}_i)\big] - \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{noobj} \big[C_i \log \hat{C}_i + (1 - C_i)\log(1 - \hat{C}_i)\big]

L_{class} = -\lambda_{class} \sum_{i=0}^{S^2} I_{i}^{obj} \sum_{c \in classes} \big[p_i(c) \log \hat{p}_i(c) + (1 - p_i(c))\log(1 - \hat{p}_i(c))\big]

where:
\lambda_{coord} represents the weight of the coordinate loss in the total loss;
\lambda_{class} represents the weight of the category loss in the total loss;
\lambda_{noobj} represents the weight of the no-detection-target confidence loss in the total loss;
\lambda_{obj} represents the weight of the with-detection-target confidence loss in the total loss;
I_{ij}^{obj} indicates whether the j-th anchor frame of the i-th grid contains the detection target: 1 if it does, 0 otherwise;
I_{ij}^{noobj} indicates whether the j-th anchor frame of the i-th grid does not contain the detection target: 1 if it does not, 0 otherwise;
x_i represents the abscissa of the center coordinates of the real target frame;
y_i represents the ordinate of the center coordinates of the real target frame;
w_i represents the width of the real target frame;
h_i represents the height of the real target frame;
C_i represents the confidence of the real target frame;
p_i represents the category of the detection target;
\hat{x}_i represents the abscissa of the center coordinates of the predicted target frame;
\hat{y}_i represents the ordinate of the center coordinates of the predicted target frame;
\hat{w}_i represents the width of the predicted target frame;
\hat{h}_i represents the height of the predicted target frame;
\hat{C}_i represents the confidence of the predicted target frame;
\hat{p}_i represents the category of the predicted target.
The loss function of the whole network is:

L = L_{coord} + L_{conf} + L_{class}

where \lambda_{coord}, \lambda_{class}, \lambda_{noobj}, and \lambda_{obj} respectively represent the weights of the coordinate loss, category loss, no-target confidence loss, and with-target confidence loss in the total loss; I_{ij}^{obj} indicates whether the j-th anchor frame of the i-th grid contains the detection target, namely a bridge disease; (x_i, y_i, w_i, h_i, C_i, p_i) are the center coordinates, width and height, confidence, and category of the real target frame, and (\hat{x}_i, \hat{y}_i, \hat{w}_i, \hat{h}_i, \hat{C}_i, \hat{p}_i) are the center coordinates, width and height, confidence, and category of the predicted target frame.
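A simplified per-anchor version of this loss — MSE on the box coordinates, binary cross-entropy on confidence and class — can be sketched in NumPy (the λ values and the flattened anchor layout are illustrative assumptions, not the patent's exact configuration):

```python
import numpy as np

def bce(p, t, eps=1e-7):
    """Elementwise binary cross-entropy between predictions p and targets t."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

def yolo_loss(obj_mask, true_box, pred_box, true_conf, pred_conf,
              true_cls, pred_cls,
              l_coord=5.0, l_obj=1.0, l_noobj=0.5, l_cls=1.0):
    """Simplified YOLO-style loss over A flattened anchors:
    obj_mask (A,), boxes (A, 4) as (x, y, w, h), conf (A,), cls (A, n_classes)."""
    noobj_mask = 1.0 - obj_mask
    coord = l_coord * np.sum(obj_mask[..., None] * (true_box - pred_box) ** 2)
    conf = (l_obj * np.sum(obj_mask * bce(pred_conf, true_conf))
            + l_noobj * np.sum(noobj_mask * bce(pred_conf, true_conf)))
    cls = l_cls * np.sum(obj_mask[..., None] * bce(pred_cls, true_cls))
    return coord + conf + cls
```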
IoU is the most common indicator in target detection; it measures the similarity between a predicted frame and a target frame by encoding the shape properties of the target (e.g., width, height, position) into a normalized metric, so it has scale invariance. It is computed as:

IoU = \frac{|A \cap B|}{|A \cup B|}

where A is the area of the prediction box and B is the area of the target box.
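The IoU computation can be illustrated for axis-aligned boxes given as (xmin, ymin, xmax, ymax) corners (a standard formulation, not code from the patent):

```python
def iou(box_a, box_b):
    """Intersection over Union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # 0 when boxes are disjoint
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```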
The DarkNet-53 network of the network model has 53 convolutional layers; the last layer is a 1×1 convolution realizing the full connection, leaving 52 convolutional layers in the backbone network. The first layer convolves with a filter consisting of 32 3×3 convolution kernels, and the subsequent convolutional layers are formed by 5 groups of repeated residual units; they yield candidate regions for the objects (diseases) and mark the category and position of each candidate region.
The mixed-domain attention mechanism CBAM is introduced to optimize the YOLOv5 model. As shown in fig. 2, the attention module comprises a channel attention part and a spatial attention part, and the attention mechanism is introduced between the input connector and the output connector so that network model training focuses more on the target region, achieving a better training effect. The channel attention part is shown in fig. 3; its formula is:

M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)

where \sigma is the Sigmoid activation function, \mathrm{MLP} denotes a multi-layer perceptron, \mathrm{AvgPool} and \mathrm{MaxPool} denote the average pooling and maximum pooling operations respectively, F is the input feature, and M_c(F) is the channel attention map obtained after the parallel pooling results are passed through the multi-layer perceptron.
The spatial attention part is shown in fig. 4: the output obtained from the channel attention is max-pooled and average-pooled, the results are concatenated, a convolution operation reduces the result to a single channel, and the final output is obtained through an activation function. The formula of spatial attention is:

M_s(F') = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F'); \mathrm{MaxPool}(F')])\big)

where F' is the feature obtained after the channel attention mechanism, \mathrm{AvgPool} and \mathrm{MaxPool} denote the average pooling and maximum pooling operations respectively, f^{7\times 7} denotes a convolution with a 7×7 kernel, \sigma is the Sigmoid activation function, and M_s(F') is the spatial attention feature map obtained after the pooled results are processed by the convolution and activation function.
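The CBAM channel-then-spatial pipeline described by the two formulas can be sketched in NumPy (a naive single-example illustration; the MLP shapes, random weights, and loop-based convolution are assumptions made for clarity, not the patent's implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """M_c = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))) for a (C, H, W) map."""
    avg = feat.mean(axis=(1, 2))                   # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))                     # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # shared two-layer perceptron
    return sigmoid(mlp(avg) + mlp(mx))             # per-channel weights in (0, 1)

def spatial_attention(feat, kernel):
    """M_s = sigmoid(conv([AvgPool_c(F'); MaxPool_c(F')])) with a (2, k, k) kernel."""
    stacked = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):          # naive sliding-window conv, single output channel
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(feat, w1, w2, kernel):
    """Channel attention followed by spatial attention on a (C, H, W) feature map."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]
```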
A network model trained with the attention module added achieves higher recognition accuracy.
During network model training, the initial learning rate is set to 0.01 and the batch size to 16; the program starts running with part of the data set frozen, and the YOLOv5 model automatically adjusts the learning rate during training to keep the model in an efficient learning state.
Step 3: observing the convergence condition of the training model, and judging whether the loss function is over-fitted or not and whether the loss function meets the training standard or not; the training standard is to look at the decline condition of the loss function, usually after a certain number of rounds of training, the loss function result is stabilized at a numerical value, the training is less and the fitting is insufficient, the training is more and the fitting is excessive, and the result can be seen through a graph drawn in a training log of the model.
If the training standard is met, directly outputting the model, and if the training standard is not met, finding a more optimized weight function, thawing the data set, performing refinement training again, and outputting the model after the weight function meeting the standard is obtained. In the situation of bridge diseases, the loss function is approximately trained to be about 10, so that the loss function can meet the training standard.
Since Darknet-53 in the YOLOv5 model contains 5 downsampling steps with a stride of 2×2, the feature map is scaled down by a factor of 32, so image sizes are adjusted to a multiple of 32. To ensure the relative accuracy of model training, suppress overfitting, and improve the generalization ability of the model, a large picture data set should be provided for the network model to learn from, so the existing picture data set is expanded: a program flips each picture left-right and up-down, then randomly deletes and adds pixels, adjusts contrast, adjusts brightness, and so on, generating similar but not identical samples to enlarge the training data set.
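The augmentations listed — flips, contrast, and brightness adjustment — can be sketched in NumPy (the jitter ranges are illustrative assumptions):

```python
import numpy as np

def augment(img, rng):
    """Generate flipped and brightness/contrast-jittered variants of an image
    given as an (H, W, 3) float array with values in [0, 255]."""
    variants = [
        img[:, ::-1],                                  # horizontal flip
        img[::-1, :],                                  # vertical flip
        np.clip(img * rng.uniform(0.8, 1.2), 0, 255),  # contrast jitter
        np.clip(img + rng.uniform(-20, 20), 0, 255),   # brightness shift
    ]
    return [v.astype(img.dtype) for v in variants]
```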
Whether the value of the loss function meets the requirement is observed: if it stabilizes around 10 and floats up and down there, the condition required by the method is met; if it floats up and down at values greater than 10, the program is terminated early to prevent overfitting of the model's learning process.
The early-termination program stores the weight functions obtained during the current training period, changes the batch size to 8, unfreezes the frozen data set, finds the best of the stored weight functions, and continues training the model on that basis after adjusting the parameters; finally, the weight function obtained by training is stored as the final training result.
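The early-termination rule — stop once the loss has stopped improving — can be sketched as a plateau check (the patience and min_delta parameters are assumptions; the patent instead reads the training-log graph by hand):

```python
def should_stop(losses, patience=10, min_delta=0.01):
    """Stop when the loss has not improved by at least min_delta
    over the last `patience` epochs, compared with all earlier epochs."""
    if len(losses) <= patience:
        return False                       # not enough history yet
    best_before = min(losses[:-patience])  # best loss seen before the window
    recent_best = min(losses[-patience:])  # best loss inside the window
    return recent_best > best_before - min_delta
```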
The model is then tested: an mAP curve is obtained on the verification set and checked against the requirements. The threshold is set to 0.5; a prediction is marked as a positive sample when its measured IoU value exceeds the threshold, and as a negative sample otherwise. The precision and recall of each prediction are calculated, and then AP and mAP:

AP = \int_0^1 p(r)\,dr

mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i

where p is the precision, r is the recall value corresponding to that precision, and n is the total number of categories. Specifically, the following concepts apply: true positives (TP) are positive samples labeled positive, false positives (FP) are negative samples labeled positive, true negatives (TN) are negative samples labeled negative, and false negatives (FN) are positive samples labeled negative; precision = TP / (TP + FP) and recall = TP / (TP + FN). The average precision (AP) of one class of labels is computed from precision and recall; the higher the AP value, the higher the probability that the label's targets are identified successfully. mAP is the mean of the AP values of all label classes over the whole data set; it judges the overall accuracy of the network model in identifying all targets and is the standard for judging whether the network model's recognition effect is good.
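The AP integral and the mAP mean can be sketched directly from a precision-recall curve (a rectangle-rule approximation over recall increments; real evaluators typically interpolate the precision curve first):

```python
def average_precision(precisions, recalls):
    """Area under the precision-recall curve, integrated over recall.
    `recalls` is assumed sorted in ascending order."""
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)  # rectangle over the recall increment
        prev_r = r
    return ap

def mean_average_precision(aps):
    """mAP: mean of the per-class average precisions."""
    return sum(aps) / len(aps)
```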
Step 4: and rapidly identifying bridge disease conditions in the bridge video by using the model, obtaining bridge disease categories and probability, and outputting and storing the results as XML files for training.
And identifying the bridge disease video by using the trained network model, marking the disease position and the disease category in the video, and outputting the result to an XML file. The model reads the video information of the bridge component detected by the bridge unmanned plane, automatically marks the area part of the bridge Liang Binghai in the video, marks the prediction probability of the prediction frame, and stores the result as an XML file. The result output by the model can be used for automatically marking bridge diseases in the video, and the XML file of the marked result can be used for training other models.
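Writing detection results back out as a Pascal-VOC-style XML file can be sketched with the standard library (the `confidence` tag is an extension beyond standard VOC, added here because the method stores prediction probabilities; the function name is illustrative):

```python
import xml.etree.ElementTree as ET

def detections_to_voc(image_name, width, height, detections):
    """Serialise detections [(label, confidence, (xmin, ymin, xmax, ymax)), ...]
    into a Pascal-VOC-style annotation XML string."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_name
    size = ET.SubElement(root, "size")
    for tag, val in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(val)
    for label, conf, (x1, y1, x2, y2) in detections:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "confidence").text = f"{conf:.2f}"  # non-standard field
        box = ET.SubElement(obj, "bndbox")
        for tag, val in (("xmin", x1), ("ymin", y1), ("xmax", x2), ("ymax", y2)):
            ET.SubElement(box, tag).text = str(val)
    return ET.tostring(root, encoding="unicode")
```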
The foregoing is merely a preferred embodiment of the method of the present invention and does not limit the invention in any other form; any change made to the specific embodiments or application scope according to the technical content of the present invention, whether applied directly or indirectly to other technical fields, remains within the protection scope of this patent.

Claims (5)

1. The automatic bridge disease identification method based on deep learning is characterized by comprising the following steps of:
(1) Preprocessing a bridge disease video obtained in advance to obtain the disease position and the disease type of the bridge, and constructing a data set to obtain a training set and a verification set;
(2) An attention mechanism module is introduced to optimize the Darknet-53 network structure and the loss function in the YOLOv5 algorithm, and the data parameters of the YOLOv5 algorithm are optimized; the weight parameter data in the Darknet-53 network are frozen, and the bridge disease identification model is trained in batches;
(3) The convergence of the model is observed to judge whether the loss function is over-fitted and whether it meets the training standard; if the training standard is met, the standard bridge disease recognition model is output directly; if not, a weight function meeting a preset standard is found, the data set is unfrozen, refined training is performed again, and the standard bridge disease recognition model is output once a weight function meeting the standard is obtained;
(4) The disease conditions in the bridge video are rapidly identified with the standard bridge disease identification model, the bridge disease categories and their probabilities are obtained, and the results are output and saved as an XML file usable for training.
2. The automatic bridge defect recognition method based on deep learning of claim 1, wherein the implementation process of the step (1) is as follows:
a frame-cutting operation is performed on the bridge disease video to obtain image data, which is smoothed and augmented, and the image formats and sizes are unified; the processed picture data set is labeled and divided into two data sets, one with diseases and one without; the data set with diseases is refined, the diseases comprising: cracking, spalling, weathering, exposed steel bars, and rusting; the data set is divided into a training set and a validation set.
3. The automatic bridge defect recognition method based on deep learning according to claim 1, wherein the YOLOv5 in step (2) comprises the Darknet-53 network from the convolutional neural network CNN family, and the network comprises batch normalization and a LeakyReLU activation function layer.
4. The automatic bridge defect recognition method based on deep learning according to claim 1, wherein the implementation process of optimizing the dark net-53 network structure in YOLOv5 by the attention introducing mechanism module in step (2) is as follows:
the attention mechanism module comprises a channel attention part and a space attention part, and the attention mechanism is led between the input connector and the output connector to lead the network model training to be focused on a target area; wherein the channel attention formula is:
$$M_c(F)=\sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F))+\mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$

wherein, $\sigma$ is the Sigmoid activation function; $\mathrm{MLP}$ represents the multi-layer perceptron; $\mathrm{AvgPool}$ and $\mathrm{MaxPool}$ represent the average pooling and maximum pooling operations, respectively; $F$ is the input feature; $M_c(F)$ represents the channel attention map obtained after the parallel pooling operations are fed into the multi-layer perceptron;
the spatial attention part concatenates the maximum-pooled and average-pooled results of the output obtained by the channel attention, then performs a convolution operation; after the output is reduced to a single channel, the final output result is obtained through the activation function; the formula of spatial attention is:
$$M_s(F')=\sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F');\mathrm{MaxPool}(F')])\big)$$

wherein, $F'$ is the output of the channel attention part; $f^{7\times 7}$ represents a convolution operation with a $7\times 7$ kernel; $M_s(F')$ represents the spatial attention feature map obtained after the pooling operations are processed by the convolution and the activation function.
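The two attention formulas of claim 4 can be sketched in NumPy as follows. This is an illustration only: the shared MLP weights are random stand-ins for learned parameters, and the learned $7\times 7$ convolution of the spatial part is approximated by summing the two channel-pooled maps and applying an unweighted $7\times 7$ box filter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(F, W1, W2):
    """M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))).
    F: (C, H, W); W1, W2: shared two-layer MLP weights (bottleneck)."""
    avg = F.mean(axis=(1, 2))                       # (C,) global average pooling
    mx = F.max(axis=(1, 2))                         # (C,) global max pooling
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)    # shared MLP, ReLU hidden layer
    return sigmoid(mlp(avg) + mlp(mx))              # (C,) channel weights

def spatial_attention(F, k=7):
    """M_s(F') = sigmoid(conv_{k x k}([AvgPool_c; MaxPool_c])).
    The learned k x k kernel is stood in for by a box-filter mean."""
    avg = F.mean(axis=0)                            # (H, W) pooling along channels
    mx = F.max(axis=0)                              # (H, W)
    stacked = avg + mx                              # stand-in for the 2-channel concat
    p = k // 2
    padded = np.pad(stacked, p)
    H, W = stacked.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean()  # naive k x k "convolution"
    return sigmoid(out)                             # (H, W) spatial weights

rng = np.random.default_rng(0)
F = rng.normal(size=(8, 16, 16))                    # C=8 input feature map
W1, W2 = rng.normal(size=(2, 8)), rng.normal(size=(8, 2))
Fc = F * channel_attention(F, W1, W2)[:, None, None]  # apply channel weights first
out = Fc * spatial_attention(Fc)[None, :, :]          # then spatial weights
```

Applying the channel weights before the spatial weights matches the sequential channel-then-spatial arrangement the claim describes.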
5. The automatic bridge disease recognition method based on deep learning of claim 4, wherein the loss function in step (2) is:
$$L_{coord}=\lambda_{coord}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}\Big[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2+\big(\sqrt{w_i}-\sqrt{\hat{w}_i}\big)^2+\big(\sqrt{h_i}-\sqrt{\hat{h}_i}\big)^2\Big]$$

$$L_{conf}=\lambda_{obj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{obj}\big(C_i-\hat{C}_i\big)^2+\lambda_{noobj}\sum_{i=0}^{S^2}\sum_{j=0}^{B}I_{ij}^{noobj}\big(C_i-\hat{C}_i\big)^2$$

$$L_{cls}=\lambda_{cls}\sum_{i=0}^{S^2}I_{i}^{obj}\sum_{c\in\mathrm{classes}}\big(p_i(c)-\hat{p}_i(c)\big)^2$$

wherein:
$\lambda_{coord}$ represents the weight of the coordinate loss in the total loss;
$\lambda_{cls}$ represents the weight of the class loss in the total loss;
$\lambda_{noobj}$ represents the weight, in the total loss, of the confidence loss without a detection target;
$\lambda_{obj}$ represents the weight, in the total loss, of the confidence loss containing a detection target;
$I_{ij}^{obj}$ indicates whether the $j$-th anchor box of the $i$-th grid contains a detection target: 1 if it does, 0 otherwise;
$I_{ij}^{noobj}$ indicates whether the $j$-th anchor box of the $i$-th grid contains no detection target: 1 if it does not, 0 otherwise;
$x_i$ represents the abscissa of the center coordinates of the real target box;
$y_i$ represents the ordinate of the center coordinates of the real target box;
$w_i$ represents the width of the real target box;
$h_i$ represents the height of the real target box;
$C_i$ represents the confidence of the real target box;
$p_i(c)$ represents the class of the detection target;
$\hat{x}_i$ represents the abscissa of the center coordinates of the predicted target box;
$\hat{y}_i$ represents the ordinate of the center coordinates of the predicted target box;
$\hat{w}_i$ represents the width of the predicted target box;
$\hat{h}_i$ represents the height of the predicted target box;
$\hat{C}_i$ represents the confidence of the predicted target box;
$\hat{p}_i(c)$ represents the class of the predicted target;
the total loss is the sum $L_{coord}+L_{conf}+L_{cls}$, where $S^2$ is the number of grids and $B$ is the number of anchor boxes per grid.
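Claim 5's three loss components can be sketched as a sum-of-squares computation over an $S^2\times B$ grid of anchors. The weight values below are illustrative defaults, not values fixed by the claim, and for simplicity the class term is summed per anchor rather than per grid.

```python
import numpy as np

def yolo_style_loss(truth, pred, obj_mask,
                    l_coord=5.0, l_obj=1.0, l_noobj=0.5, l_cls=1.0):
    """Sum-of-squares sketch of the claimed loss. truth/pred: dicts of
    arrays shaped (S*S grids, B anchors) for x, y, w, h, C, and
    (grids, anchors, classes) for p; obj_mask is 1 where the anchor
    holds a target. Weight values are illustrative, not the patent's."""
    m, nm = obj_mask, 1.0 - obj_mask
    coord = l_coord * np.sum(m * ((truth["x"] - pred["x"])**2 +
                                  (truth["y"] - pred["y"])**2 +
                                  (np.sqrt(truth["w"]) - np.sqrt(pred["w"]))**2 +
                                  (np.sqrt(truth["h"]) - np.sqrt(pred["h"]))**2))
    conf = (l_obj * np.sum(m * (truth["C"] - pred["C"])**2) +
            l_noobj * np.sum(nm * (truth["C"] - pred["C"])**2))
    cls = l_cls * np.sum(m[..., None] * (truth["p"] - pred["p"])**2)
    return coord + conf + cls

S2, B, K = 4, 3, 5                        # grids, anchors per grid, classes
rng = np.random.default_rng(1)
obj = np.zeros((S2, B)); obj[0, 0] = 1.0  # one anchor holds a target
truth = {"x": rng.random((S2, B)), "y": rng.random((S2, B)),
         "w": rng.random((S2, B)), "h": rng.random((S2, B)),
         "C": obj.copy(),
         "p": rng.random((S2, B, K))}
```

A perfect prediction yields zero loss, and any coordinate error at a target-holding anchor increases it, which is the behavior the convergence check of step (3) monitors.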
CN202310281075.0A 2023-03-22 2023-03-22 Automatic bridge disease identification method based on deep learning Pending CN115995056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310281075.0A CN115995056A (en) 2023-03-22 2023-03-22 Automatic bridge disease identification method based on deep learning


Publications (1)

Publication Number Publication Date
CN115995056A true CN115995056A (en) 2023-04-21

Family

ID=85992320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310281075.0A Pending CN115995056A (en) 2023-03-22 2023-03-22 Automatic bridge disease identification method based on deep learning

Country Status (1)

Country Link
CN (1) CN115995056A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222701A (en) * 2019-06-11 2019-09-10 北京新桥技术发展有限公司 A kind of bridge defect automatic identifying method
CN112801972A (en) * 2021-01-25 2021-05-14 武汉理工大学 Bridge defect detection method, device, system and storage medium
CN115082650A (en) * 2022-06-14 2022-09-20 南京航空航天大学 Implementation method of automatic pipeline defect labeling tool based on convolutional neural network
WO2022245936A1 (en) * 2021-05-19 2022-11-24 Rutgers, The State University Of New Jersey Systems and methods for machine learning enhanced railway condition monitoring, assessment and prediction


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SONGYANG SUN et al.: "YOLO Based Bridge Surface Defect Detection Using Decoupled Prediction", 2022 7th Asia-Pacific Conference on Intelligent Robot Systems *
HE Jinbao et al.: "Bridge Surface Defect Detection and Recognition Based on DCGAN and Improved YOLOv5s", Journal of Jiangxi Normal University (Natural Science Edition) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117058536A (en) * 2023-07-19 2023-11-14 中公高科养护科技股份有限公司 Pavement disease identification method, device and medium based on double-branch network
CN117058536B (en) * 2023-07-19 2024-04-30 中公高科养护科技股份有限公司 Pavement disease identification method, device and medium based on double-branch network
CN116682072A (en) * 2023-08-04 2023-09-01 四川公路工程咨询监理有限公司 Bridge disease monitoring system
CN116682072B (en) * 2023-08-04 2023-10-20 四川公路工程咨询监理有限公司 Bridge disease monitoring system

Similar Documents

Publication Publication Date Title
CN110569837B (en) Method and device for optimizing damage detection result
CN111681240B (en) Bridge surface crack detection method based on YOLO v3 and attention mechanism
CN108257114A (en) A kind of transmission facility defect inspection method based on deep learning
CN111080620A (en) Road disease detection method based on deep learning
CN110992349A (en) Underground pipeline abnormity automatic positioning and identification method based on deep learning
CN113592828B (en) Nondestructive testing method and system based on industrial endoscope
CN114998852A (en) Intelligent detection method for road pavement diseases based on deep learning
CN112330593A (en) Building surface crack detection method based on deep learning network
CN110909657A (en) Method for identifying apparent tunnel disease image
CN112966665A (en) Pavement disease detection model training method and device and computer equipment
CN111144749A (en) Intelligent labeling crowdsourcing platform for power grid images and working method
CN111860106A (en) Unsupervised bridge crack identification method
CN114841920A (en) Flame identification method and device based on image processing and electronic equipment
CN115995056A (en) Automatic bridge disease identification method based on deep learning
CN112329858B (en) Image recognition method for breakage fault of anti-loosening iron wire of railway motor car
CN113516652A (en) Battery surface defect and adhesive detection method, device, medium and electronic equipment
CN111626104B (en) Cable hidden trouble point detection method and device based on unmanned aerial vehicle infrared thermal image
CN115082650A (en) Implementation method of automatic pipeline defect labeling tool based on convolutional neural network
CN115272826A (en) Image identification method, device and system based on convolutional neural network
CN109190489A (en) A kind of abnormal face detecting method based on reparation autocoder residual error
CN116543327A (en) Method, device, computer equipment and storage medium for identifying work types of operators
CN116152674A (en) Dam unmanned aerial vehicle image crack intelligent recognition method based on improved U-Net model
CN113724219A (en) Building surface disease detection method and system based on convolutional neural network
Park et al. Potholeeye+: Deep-learning based pavement distress detection system toward smart maintenance
CN111369508A (en) Defect detection method and system for metal three-dimensional lattice structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230421