CN114359654A - YOLOv4 concrete apparent disease detection method based on position relevance feature fusion - Google Patents

YOLOv4 concrete apparent disease detection method based on position relevance feature fusion

Info

Publication number
CN114359654A
Authority
CN
China
Prior art keywords
yolov4
disease
feature
fusion
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111478855.1A
Other languages
Chinese (zh)
Inventor
苏祖强
赵成
韩延
王诚诚
王鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202111478855.1A priority Critical patent/CN114359654A/en
Publication of CN114359654A publication Critical patent/CN114359654A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of computer vision, and particularly relates to a YOLOv4 concrete apparent disease detection method based on position relevance feature fusion. The method performs multi-scale fusion on each of the three layers of features output by the path aggregation network of YOLOv4 and carries out multi-scale adaptive feature fusion through a position relevance attention module, thereby constructing a YOLOv4 model based on position relevance feature fusion; the position and category of the diseases in collected disease images are marked with a marking tool, and the model is trained with the disease images and the marked disease information; concrete apparent disease images detected in real time are input into the trained model, which outputs images marking the disease types and positions after detection. According to the invention, a feature fusion module based on position relevance is added behind the original path aggregation network of YOLOv4, so that the effect of YOLOv4 feature fusion is enhanced and the detection precision of the target is improved.

Description

YOLOv4 concrete apparent disease detection method based on position relevance feature fusion
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a method for detecting the apparent diseases of YOLOv4 concrete based on position relevance feature fusion.
Background
Worldwide, an increasing number of building structures are ageing, and most of them use concrete as a building material. Concrete is subjected over long periods to the combined action of various forces and, in extremely severe environments, to external attack such as chloride and sulfate corrosion, so that various diseases such as cracks, holes, honeycombs, pitted surfaces and exposed ribs inevitably occur. Civil engineering works composed of concrete structures, such as bridges and dams, place extremely high requirements on the integrity of the concrete structure; during long-term service, if they are not inspected and maintained in time, collapse accidents will eventually occur over time, causing irreparable losses. At present, concrete apparent disease detection relies mainly on manual inspection, but manual detection suffers from subjective detection results, high labor intensity and a low degree of automation. As the service life of bridges, dams and other structures increases, more and more of them need to be inspected, and manual disease detection can no longer meet the requirements of practical engineering applications, so a more efficient and more intelligent concrete apparent disease detection method is urgently needed.
Concrete apparent disease detection is essentially a target detection problem. With the rapid development of computer vision technology and artificial intelligence theory in recent years, researchers have combined computer vision techniques with artificial intelligence algorithms to realize concrete apparent disease detection. Deep learning is the latest research result in the field of artificial intelligence; compared with traditional machine learning methods, deep learning can realize adaptive feature extraction, and because its network structure is deeper, it can extract richer and more abstract features. At present, deep-learning-based target detection algorithms fall mainly into two categories. The first category comprises two-stage detection algorithms such as R-CNN and Faster R-CNN, which first obtain the regions to be detected through candidate-region generation algorithms and then send these regions to a convolutional neural network for category judgment, thereby detecting and localizing the target. The second category comprises single-stage target detection algorithms, such as SSD and YOLO, which achieve end-to-end target detection and localization. Because the on-site detection of concrete apparent diseases has requirements on both speed and precision, the single-stage approach is more suitable for this task. YOLOv4 is the fourth-generation algorithm of the YOLO series; it extracts and fuses image features at different scales through a CSPDarknet53 network and detects targets of different sizes on feature maps of different scales, achieving a good balance between detection speed and detection precision. However, in an actual concrete disease detection task the field environment is complex and the acquired images contain complex background information; when the single-stage detection algorithm YOLOv4 is used for detection, the feature information of the disease cannot be sufficiently extracted and missed detections easily occur.
Disclosure of Invention
In order to enrich the feature information extracted in the detection process of the YOLOv4 and further improve the detection precision, the invention provides a position relevance feature fusion-based YOLOv4 concrete apparent disease detection method, which comprises the following steps:
collecting concrete apparent disease images used for training a model, and constructing a YOLOv4 model based on position relevance feature fusion;
marking the position and the category of the disease by using a marking tool for the collected disease image;
training a model of YOLOv4 based on location relevance feature fusion according to the acquired disease image and the disease category and location information acquired by the labeling software;
detecting concrete apparent disease images in real time through a trained model based on position relevance feature fusion, classifying and positioning the detected diseases, and outputting images for marking disease types and positions after detection;
the position relevance feature fusion-based YOLOv4 model respectively performs multi-scale fusion on three layers of features output by a path aggregation network of YOLOv4, and performs feature multi-scale adaptive fusion through a position relevance attention-based module, specifically comprising:
the position relevance Attention module embeds position information into channel weights by using Coordinate Attention channel Attention from channel dimensions, and screens out channels sensitive to the position information during fusion;
the position relevance attention module performs spatially adaptive weight adjustment of the feature map by using Spatial Attention in the spatial dimension, so that more attention is paid to the position of the target during detection.
Further, the channel weight obtaining process includes:
two 1D global pooling operations are adopted, and each channel is aggregated along the horizontal coordinate and the vertical coordinate respectively to obtain two direction-aware features f_h and f_w;
Performing concatenate operation on the two obtained feature graphs, and then performing convolution operation to further extract feature information;
separating, from the extracted feature information, a feature map f'_h along the vertical direction and a feature map f'_w along the horizontal direction, and using an activation function to obtain the channel weights M_h and M_w for the vertical and horizontal directions:
M_h = σ(F_h(f'_h))
M_w = σ(F_w(f'_w))
wherein σ is the Sigmoid activation function, and F_h and F_w are two 1x1 convolution kernels used to adjust the vertical feature map f'_h and the horizontal feature map f'_w so that the output channel dimension is the same as the original number of input and output channels.
Further, the spatially adaptive weight adjustment of the feature map using Spatial Attention comprises:
generating two different features F_avg^s and F_max^s using maximum pooling and average pooling along the channel dimension, and connecting the two features;
extracting information from the two connected features by convolution calculation to obtain a spatial attention feature;
obtaining the spatial weight M_s by applying an activation function to the spatial attention feature, expressed as:
M_s = ρ(f^(7×7)([F_avg^s; F_max^s]))
wherein ρ(·) is the sigmoid activation function, f^(7×7) represents the convolution operation using a 7x7 convolution kernel, and F' is the input feature map from which F_avg^s and F_max^s are pooled.
Further, the training process of the position relevance feature fusion-based model of YOLOv4 includes:
dividing an input image into S × S squares;
predicting n bounding boxes in each square, and generating confidence of a detection target for the bounding boxes of each square;
for each boundary box, predicting the conditional probability of a certain class of detection target, and multiplying the conditional probability by the confidence coefficient to obtain the confidence coefficient of each boundary box for each specific class;
and calculating the difference between the output result and the labeling result by adopting the loss function of YOLOv4, and carrying out back-propagation training of the model through this difference.
Further, the confidence is expressed as:
Confidence = Pr(object) × IOU_pred^truth
wherein Confidence represents the confidence level, Pr(object) represents the probability that the bounding box contains the object to be detected, and IOU_pred^truth indicates the overlap ratio between the predicted bounding box and the labeled bounding box.
Further, the loss function of YOLOv4 is expressed as:
L = L_reg + L_conf + L_class
wherein L_reg is the frame regression loss, L_conf is the confidence loss, and L_class is the classification loss.
Further, the frame regression loss L_reg is expressed as:
L_reg = 1 - IOU(A, B) + ρ²(A, B)/m² + αv
v = (4/π²)·(arctan(w_gt/h_gt) - arctan(w/h))²
α = v/((1 - IOU(A, B)) + v)
wherein IOU(A, B) represents the intersection-over-union of the real box A and the predicted box B; ρ²(A, B) represents the squared Euclidean distance between the center points of the prediction box B and the real box A; m represents the diagonal distance of the region covering the real box A and the prediction box B; w and h denote the width and height of the prediction box, and w_gt and h_gt represent the true width and height; α is a balance coefficient, and v is a parameter for keeping the aspect ratio of the prediction box consistent.
Further, the confidence loss L_conf is expressed as:
L_conf = -Σ_{i=0}^{s²} Σ_{j=0}^{B} I_ij^obj [C_i^j·log(Ĉ_i^j) + (1 - C_i^j)·log(1 - Ĉ_i^j)] - λ_noobj·Σ_{i=0}^{s²} Σ_{j=0}^{B} I_ij^noobj [C_i^j·log(Ĉ_i^j) + (1 - C_i^j)·log(1 - Ĉ_i^j)]
wherein λ_noobj is the weight of the intersection-over-union error, s² is the number of grid cells, and B is the number of predicted bounding boxes per cell; I_ij^obj indicates whether a detection target exists in the j-th bounding box of the i-th cell, being 1 if it does and 0 if it does not; Ĉ_i^j is the predicted confidence, and C_i^j is the actual confidence.
Further, the classification loss L_class is expressed as:
L_class = -Σ_{i=0}^{s²} Σ_{j=0}^{B} I_ij^obj Σ_{c∈classes} [P_i^j(c)·log(P̂_i^j(c)) + (1 - P_i^j(c))·log(1 - P̂_i^j(c))]
wherein s² is the number of divided cells; I_ij^obj indicates whether a detection target exists in the j-th bounding box of the i-th cell, being 1 if it does and 0 if it does not; P_i^j is the actual probability of the category to which the object in the cell belongs, and P̂_i^j is the predicted probability.
According to the invention, a position relevance-based feature fusion module is added behind the original Path Aggregation Network (PANet) of YOLOv4, so that the effect of YOLOv4 feature fusion is enhanced and the detection precision of the target is improved.
Drawings
FIG. 1 is a flow chart of a method for detecting apparent diseases of YOLOv4 concrete based on location relevance feature fusion, according to the invention;
FIG. 2 is a training flowchart of the model of YOLOv4 based on location relevance feature fusion according to the present invention;
FIG. 3 is a block diagram of a location correlation feature fusion module according to the present invention;
FIG. 4 is a diagram of a YOLOv4 network structure based on location relevance feature fusion, which is adopted by the present invention;
FIG. 5 is a diagram of the detection effect of the present invention and the prior art, wherein (a) is an SSD detection effect diagram, (b) is a Faster RCNN detection effect diagram, (c) is an original YOLOv4 detection effect diagram, and (d) is a YOLOv4 model detection effect diagram based on location relevance feature fusion, namely the present invention;
fig. 6 is a diagram of the detection effect of the present invention and the prior art, wherein (a) is an SSD detection effect diagram, (b) is a Faster RCNN detection effect diagram, (c) is an original YOLOv4 detection effect diagram, and (d) is a detection effect diagram of the YOLOv4 model based on location relevance feature fusion, namely the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a position relevance feature fusion-based YOLOv4 concrete apparent disease detection method, which comprises the following steps:
collecting concrete apparent disease images used for training a model, and constructing a YOLOv4 model based on position relevance feature fusion;
marking the position and the category of the disease by using a marking tool for the collected disease image;
training a model of YOLOv4 based on location relevance feature fusion according to the acquired disease image and the disease category and location information acquired by the labeling software;
detecting concrete apparent disease images in real time through a trained model based on position relevance feature fusion, classifying and positioning the detected diseases, and outputting images for marking disease types and positions after detection;
the position relevance feature fusion-based YOLOv4 model respectively performs multi-scale fusion on three layers of features output by a path aggregation network of YOLOv4, and performs feature multi-scale adaptive fusion through a position relevance attention-based module, specifically comprising:
the position relevance Attention module embeds position information into channel weights by using Coordinate Attention channel Attention from channel dimensions, and screens out channels sensitive to the position information during fusion;
the position relevance Attention module performs Spatial adaptive weight adjustment on the feature map by using Spatial Attention space Attention from a space dimension, so that the position of the target is more concerned during detection.
In this embodiment, as shown in fig. 1, the method for detecting the apparent diseases of the YOLOv4 concrete based on location relevance feature fusion includes the following steps:
Step 1, acquiring apparent disease images of concrete buildings.
Step 2, finely marking the disease areas in the images by manual labeling with a marking tool, recording the position and category information of the diseases in each image.
Step 3, training the YOLOv4 model based on position relevance feature fusion according to the collected disease images and the disease category and position information obtained through the labeling software.
The position relevance feature fusion-based YOLOv4 model adopted by the invention performs multi-scale fusion on each of the three layers of features output by the Path Aggregation Network (PANet), performs multi-scale adaptive feature fusion through a position relevance attention module, and then performs output prediction; the improved overall network structure is shown in FIG. 4. The position relevance attention module first uses Coordinate Attention (CA) channel attention in the channel dimension to embed position information into the channel weights and to screen out channels sensitive to position information during fusion, thereby preventing the loss of position information during feature-map dimension transformation from degrading the feature fusion effect. It then uses Spatial Attention (SA) in the spatial dimension to carry out spatially adaptive weight adjustment on the feature map, so that more attention is paid to the position of the target during detection, ultimately improving the detection precision of the target.
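To make the described data flow easier to follow, a minimal PyTorch-style sketch is given below. It is illustrative only and not the patented implementation: the class name MultiScaleFusion, the channel counts, the nearest-neighbour resizing and the 1x1/3x3 fusion convolutions are all assumptions; the attention stage is left pluggable so that the CA and SA sketches appearing later in this description can be substituted for nn.Identity().

```python
# Illustrative sketch (assumptions, not the patent's code): fuse the three PANet
# output levels at one target scale, then hand the result to an attention stage.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    def __init__(self, channels_per_level=(128, 256, 512), out_channels=256,
                 attention=None):
        super().__init__()
        # 1x1 convolutions bring every level to a common channel count.
        self.align = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in channels_per_level)
        self.fuse = nn.Conv2d(out_channels * len(channels_per_level),
                              out_channels, kernel_size=3, padding=1)
        self.attention = attention if attention is not None else nn.Identity()

    def forward(self, features, target_index):
        # features: the three PANet outputs (e.g. 52x52, 26x26, 13x13 for a 416x416 input)
        target_size = features[target_index].shape[-2:]
        aligned = [F.interpolate(conv(f), size=target_size, mode='nearest')
                   for conv, f in zip(self.align, features)]
        fused = self.fuse(torch.cat(aligned, dim=1))   # multi-scale fusion
        return self.attention(fused)                   # position-relevance attention stage

# Example: fuse toward the middle 26x26 scale.
p3, p4, p5 = (torch.randn(1, 128, 52, 52), torch.randn(1, 256, 26, 26),
              torch.randn(1, 512, 13, 13))
print(MultiScaleFusion()([p3, p4, p5], target_index=1).shape)  # (1, 256, 26, 26)
```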
The location correlation attention module structure is shown in fig. 3, and the module includes a channel attention module and a space attention module, namely:
1) channel attention module
The module adopts two 1D global pooling operations that aggregate each channel along the horizontal and vertical coordinates respectively, obtaining two direction-aware features f_h and f_w. The two feature maps are then concatenated, and a convolution operation is carried out to further extract feature information, from which a feature map f'_h along the vertical direction and a feature map f'_w along the horizontal direction are separated. Finally, an activation function is used to obtain the channel weights M_h and M_w for the vertical and horizontal directions, output as follows:
M_h = σ(F_h(f'_h))
M_w = σ(F_w(f'_w))
wherein σ is the Sigmoid activation function, and F_h and F_w are two 1x1 convolution kernels used to adjust the channel dimension of the output to be the same as the original number of input and output channels.
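The channel-weight computation just described (axis-wise pooling, concatenation and shared convolution, splitting, then the 1x1 convolutions F_h and F_w followed by a Sigmoid) can be sketched as follows. This is an illustrative reimplementation under those formulas, not the patented code, and the intermediate channel reduction ratio is an assumed hyperparameter.

```python
# Illustrative Coordinate Attention sketch following M_h = σ(F_h(f'_h)) and
# M_w = σ(F_w(f'_w)); the reduction ratio is an assumption.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # aggregate along width  -> f_h
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # aggregate along height -> f_w
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)  # F_h
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)  # F_w

    def forward(self, x):
        n, c, h, w = x.shape
        f_h = self.pool_h(x)                          # (n, c, h, 1)
        f_w = self.pool_w(x).permute(0, 1, 3, 2)      # (n, c, w, 1)
        y = self.act(self.conv1(torch.cat([f_h, f_w], dim=2)))  # concatenate + convolve
        f_h2, f_w2 = torch.split(y, [h, w], dim=2)    # separate the two direction maps
        m_h = torch.sigmoid(self.conv_h(f_h2))                      # M_h, shape (n, c, h, 1)
        m_w = torch.sigmoid(self.conv_w(f_w2.permute(0, 1, 3, 2)))  # M_w, shape (n, c, 1, w)
        return x * m_h * m_w   # position-aware channel re-weighting
```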
2) Space attention module
The module uses maximum pooling (Max pooling) and average pooling (Avg pooling) along the channel dimension to generate two different features F_avg^s and F_max^s, then connects the two features and further extracts information through a convolution calculation to obtain a spatial attention feature U_s; an activation function is then applied to obtain the spatial weight M_s, output as follows:
M_s = ρ(f^(7×7)([F_avg^s; F_max^s]))
wherein ρ(·) is the sigmoid activation function and f^(7×7) represents the convolution operation using a 7x7 convolution kernel.
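A corresponding sketch of the spatial attention module (channel-wise average and max pooling, concatenation, a 7x7 convolution and a sigmoid) is given below; it follows the CBAM-style formulation implied by the description and is not the patented code.

```python
# Illustrative spatial attention sketch following M_s = ρ(f7x7([F_avg^s; F_max^s])).
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        f_avg = torch.mean(x, dim=1, keepdim=True)    # average pooling over channels
        f_max, _ = torch.max(x, dim=1, keepdim=True)  # max pooling over channels
        u = torch.cat([f_avg, f_max], dim=1)          # connect the two features
        m_s = torch.sigmoid(self.conv(u))             # spatial weight M_s
        return x * m_s                                # spatially re-weighted feature map
```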
In this embodiment, the acquired concrete apparent disease images are normalized and then reduced or enlarged to a size of 416 × 416 to obtain the processed images. The data are randomly shuffled, after which 80% of the disease images are assigned to a training set and 20% to a test set for training and testing the model.
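As a hedged illustration of this preprocessing (normalization, resizing to 416 × 416, random shuffling, 80/20 split), a small sketch follows; the directory layout, file format and use of OpenCV are assumptions.

```python
# Sketch of the described preprocessing; paths and libraries are assumed.
import glob
import random
import cv2
import numpy as np

def load_and_preprocess(path, size=416):
    img = cv2.imread(path)                    # BGR image
    img = cv2.resize(img, (size, size))       # reduce or enlarge to 416x416
    return img.astype(np.float32) / 255.0     # normalization to [0, 1]

paths = sorted(glob.glob("disease_images/*.jpg"))  # assumed directory layout
random.seed(0)
random.shuffle(paths)                              # random data shuffling
split = int(0.8 * len(paths))                      # 80% training / 20% test
train_set = [load_and_preprocess(p) for p in paths[:split]]
test_set = [load_and_preprocess(p) for p in paths[split:]]
```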
When the position and category of the diseases in the collected images are marked with a marking tool, the tool is used to annotate each target area in the image with a rectangular box, yielding the center-point coordinates, width, height and category of each annotated rectangle.
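For illustration, such rectangle annotations (center-point coordinates, width, height, category) are often converted to the normalized form used by YOLO-style training; the following small sketch shows one possible conversion, with the field layout assumed.

```python
# Sketch: convert an absolute-pixel rectangle (center x, center y, width, height,
# class id) into normalized YOLO-style label values.  The layout is an assumption.
def to_yolo_label(cx, cy, bw, bh, class_id, img_w, img_h):
    return (class_id, cx / img_w, cy / img_h, bw / img_w, bh / img_h)

# Example: a 200x80 px crack box centered at (520, 310) in a 1920x1080 image.
print(to_yolo_label(520, 310, 200, 80, class_id=1, img_w=1920, img_h=1080))
```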
Referring to fig. 2, the training flow of the position relevance feature fusion-based YOLOv4 model of the present invention includes:
A1, dividing the input image into S × S squares by the model;
A2, predicting n bounding boxes in each square, and generating for each bounding box a confidence that it contains a detection target, expressed as:
Confidence = Pr(object) × IOU_pred^truth
wherein Confidence represents the confidence level, Pr(object) represents the probability that the bounding box contains the object to be detected, and IOU_pred^truth represents the overlap ratio between the predicted bounding box and the labeled bounding box;
A3, for each bounding box, predicting the conditional probability Pr(class_i | object) of it containing a certain class of detection target, where Pr(class_i | object) represents the probability that the bounding box contains the i-th class of detection target;
A4, multiplying the Confidence obtained in step A2 by the conditional probability Pr(class_i | object) obtained in step A3 to obtain the confidence of each bounding box for each specific class;
and A5, calculating by adopting a loss function of YOLOv4 to obtain a positioning frame of each detection target, wherein the loss function is used for calculating the direct difference between the output result of the model and the labeling result. The YOLOv4 loss function mainly comprises three parts, namely border regression loss, confidence regression loss and classification loss, and is specifically represented by the following formula:
L=Lreg+Lconf+Lclass
In the above formula, L_reg is the border regression (coordinate) loss, with the specific formula:
L_reg = 1 - IOU(A, B) + ρ²(A, B)/m² + αv
v = (4/π²)·(arctan(w_gt/h_gt) - arctan(w/h))²
α = v/((1 - IOU(A, B)) + v)
wherein IOU(A, B) represents the intersection-over-union of the real box and the predicted box; ρ²(A, B) represents the squared Euclidean distance between the center points of the prediction box and the real box; m represents the diagonal distance of the region covering the real box and the prediction box; w and h denote the width and height of the prediction box, and w_gt and h_gt represent the true width and height; α is a balance coefficient, and v is a parameter for keeping the aspect ratio of the prediction box consistent.
L_conf is the confidence loss, with the specific formula:
L_conf = -Σ_{i=0}^{s²} Σ_{j=0}^{B} I_ij^obj [C_i^j·log(Ĉ_i^j) + (1 - C_i^j)·log(1 - Ĉ_i^j)] - λ_noobj·Σ_{i=0}^{s²} Σ_{j=0}^{B} I_ij^noobj [C_i^j·log(Ĉ_i^j) + (1 - C_i^j)·log(1 - Ĉ_i^j)]
wherein λ_noobj is the weight of the intersection-over-union error and is set to 0.5; I_ij^obj indicates whether a detection target exists in the j-th bounding box of the i-th cell; Ĉ_i^j is the predicted confidence, and C_i^j is the actual confidence.
L_class is the classification loss, for which a binary cross-entropy loss function is adopted, with the specific formula:
L_class = -Σ_{i=0}^{s²} Σ_{j=0}^{B} I_ij^obj Σ_{c∈classes} [P_i^j(c)·log(P̂_i^j(c)) + (1 - P_i^j(c))·log(1 - P̂_i^j(c))]
wherein P_i^j is the actual probability of the category to which the object in the cell belongs, and P̂_i^j is the predicted probability.
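The following is the illustrative sketch of the loss terms referred to in step A5. It is a simplification under stated assumptions: it evaluates a single matched box pair rather than the full S × S × B grid, uses a CIoU-style regression term consistent with the definitions above, and uses binary cross-entropy for the confidence and classification terms; the tensor layouts and function names are assumptions, not the patented code.

```python
# Simplified, illustrative loss sketch for one matched box; not the full grid loss.
import math
import torch

def ciou_loss(pred, target):
    """pred, target: (x1, y1, x2, y2) tensors for a single box each."""
    ix1, iy1 = torch.max(pred[0], target[0]), torch.max(pred[1], target[1])
    ix2, iy2 = torch.min(pred[2], target[2]), torch.min(pred[3], target[3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter + 1e-9)          # IOU(A, B)
    cp, ct = (pred[:2] + pred[2:]) / 2, (target[:2] + target[2:]) / 2
    rho2 = ((cp - ct) ** 2).sum()                           # squared center distance
    ex1, ey1 = torch.min(pred[0], target[0]), torch.min(pred[1], target[1])
    ex2, ey2 = torch.max(pred[2], target[2]), torch.max(pred[3], target[3])
    m2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9         # enclosing-box diagonal squared
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    wgt, hgt = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (torch.atan(wgt / hgt) - torch.atan(w / h)) ** 2
    alpha = v / (1 - iou + v + 1e-9)                        # balance coefficient
    return 1 - iou + rho2 / m2 + alpha * v

bce = torch.nn.BCELoss()    # inputs must already be probabilities in [0, 1]

def total_loss(pred_box, true_box, pred_conf, true_conf, pred_cls, true_cls):
    l_reg = ciou_loss(pred_box, true_box)   # border regression loss
    l_conf = bce(pred_conf, true_conf)      # confidence loss (binary cross-entropy)
    l_class = bce(pred_cls, true_cls)       # classification loss (binary cross-entropy)
    return l_reg + l_conf + l_class
```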
For the input images: the acquired concrete apparent disease images are first normalized, then reduced or enlarged to a size of 416x416 to obtain the processed images; the data are randomly shuffled, and 80% of the disease images are divided into a training set and 20% into a test set for training and testing the model.
Step 4, detecting the concrete apparent disease images in real time according to the trained model and the concrete apparent disease images to be detected acquired on site, classifying and locating the detected diseases, and outputting images marking the disease category and position after detection.
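Step 4 can be illustrated with a small inference sketch; the checkpoint name, the assumption that the model returns (x1, y1, x2, y2, confidence, class) rows, and the OpenCV drawing calls are all illustrative assumptions about a typical PyTorch deployment, not the patented system.

```python
# Sketch of the on-site detection step: load a trained model, run one image,
# and draw the predicted disease class and box.  Names and I/O are assumptions.
import cv2
import torch

CLASS_NAMES = ["exfoliation", "crack", "hole", "honeycomb", "exposed rib"]

model = torch.load("l_yolov4_trained.pt", map_location="cpu")  # assumed checkpoint
model.eval()

img = cv2.imread("site_image.jpg")
vis = cv2.resize(img, (416, 416))                              # draw on the 416x416 view
inp = torch.from_numpy(vis.astype("float32") / 255.0).permute(2, 0, 1).unsqueeze(0)

with torch.no_grad():
    detections = model(inp)   # assumed to return rows of (x1, y1, x2, y2, conf, cls)

for x1, y1, x2, y2, conf, cls in detections.tolist():
    cv2.rectangle(vis, (int(x1), int(y1)), (int(x2), int(y2)), (0, 0, 255), 2)
    cv2.putText(vis, f"{CLASS_NAMES[int(cls)]} {conf:.2f}", (int(x1), int(y1) - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)
cv2.imwrite("site_image_detected.jpg", vis)
```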
To verify the effectiveness of the method, an experiment was carried out on common concrete apparent diseases. A dataset of 1751 concrete disease images collected on site was established, with main disease types including exfoliation, cracks, holes, honeycombs and exposed ribs, and the disease positions and category information were labeled using image labeling software. The training and test datasets were divided at a ratio of 8:2; details of the dataset are shown in Table 1 below:
TABLE 1 Concrete apparent disease data set
Disease category    Training data labels    Test data labels    Total
Exfoliation         644                     145                 789
Crack               794                     157                 951
Hole                600                     160                 760
Honeycomb           481                     115                 596
Exposed rib         562                     161                 723
To verify the effect of the proposed position relevance feature fusion-based YOLOv4 model, training and testing were carried out on the established dataset, and the performance of the improved model was evaluated using the mean average precision (mAP), recall and precision commonly used as evaluation indexes for target detection. Corresponding ablation experiments were performed in five groups: the original YOLOv4 model; SA-YOLOv4 (Spatial Attention-YOLOv4), with only the SA spatial attention module added after the PANet output; CA-YOLOv4 (Coordinate Attention-YOLOv4), with only the CA channel attention module added; M-YOLOv4 (Multiscale-YOLOv4), with only multi-scale fusion added; and L-YOLOv4 (Location correlation fusion-YOLOv4), the YOLOv4 model based on position relevance feature fusion. The experimental results are shown in Table 2 below.
TABLE 2 results of the experiment
Network model mAP recall precision
YOLOv4 75.34% 57.14% 86.67%
SA-YOLOv4 75.91% 58.42% 86.76%
CA-YOLOv4 76.06% 58.19% 86.84%
M-YOLOv4 76.43% 58.65% 87.15%
L-YOLOv4 77.16% 59.34% 87.35%
As can be seen from Table 2, adding multi-scale fusion, the CA channel attention module or the SA spatial attention module behind the path aggregation network PANet each improves the detection effect of the original YOLOv4 model. The position relevance feature fusion-based YOLOv4 model, which combines all three ideas, shows the largest improvement: compared with the original YOLOv4, its mean average precision mAP is higher by 1.82%, its recall is higher by 2.2% and its precision is higher by 0.68%, which shows that the position relevance feature fusion-based YOLOv4 effectively improves the detection accuracy and reduces the missed detection rate.
Meanwhile, the position relevance feature fusion-based YOLOv4 was compared with the classic target detection algorithms SSD and Faster RCNN in terms of mean average precision (mAP) and detection speed, with the results shown in Table 3 below.
TABLE 3 Experimental results
(Table 3, comparing SSD, Faster RCNN, the original YOLOv4 and L-YOLOv4 in terms of mAP and detection speed, is provided as an image in the original document.)
As can be seen from Table 3, on the self-constructed concrete apparent disease dataset used in this experiment, the mAP of the original YOLOv4 reached 75.34%. The mAP of the position relevance feature fusion-based YOLOv4 is 1.82% higher than that of the original YOLOv4, while the mAP values of SSD and Faster RCNN on this dataset are lower by 5.66% and 2.15% respectively. The position relevance feature fusion-based YOLOv4 achieves a good balance of precision and speed in concrete apparent disease detection, which shows that with a position relevance-based feature fusion module added behind the Path Aggregation Network (PANet) of the original YOLOv4, the network extracts richer feature information through multi-scale adaptive feature fusion and improves the detection precision.
In Figs. 5-6, (a), (b), (c) and (d) respectively show the SSD detection results, the Faster RCNN detection results, the original YOLOv4 detection results and the detection results of the YOLOv4 model based on position relevance feature fusion. As can be seen from Figs. 5-6, SSD, Faster RCNN and the original YOLOv4 are not ideal on the disease images collected on site, exhibiting missed and false detections; L-YOLOv4 performs position-relevance-based multi-scale feature fusion after the path aggregation network, makes full use of feature information from different scales and enriches the context information of the target, so that target detection is more accurate and the problems of missed and false detection in the concrete apparent disease detection task are effectively alleviated.
Aiming at the concrete apparent disease detection scenario, the invention provides a YOLOv4 concrete apparent disease detection method based on position relevance feature fusion; a position relevance-based feature fusion module is added behind the Path Aggregation Network (PANet) of the original YOLOv4, so that the effect of YOLOv4 feature fusion is enhanced and the target detection effect is improved.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A position relevance feature fusion-based YOLOv4 concrete apparent disease detection method is characterized by comprising the following steps:
collecting concrete apparent disease images used for training a model, and constructing a YOLOv4 model based on position relevance feature fusion;
marking the position and the category of the disease by using a marking tool for the collected disease image;
training a model of YOLOv4 based on location relevance feature fusion according to the acquired disease image and the disease category and location information acquired by the labeling software;
detecting concrete apparent disease images in real time through a trained model based on position relevance feature fusion, classifying and positioning the detected diseases, and outputting images for marking disease types and positions after detection;
the position relevance feature fusion-based YOLOv4 model respectively performs multi-scale fusion on three layers of features output by a path aggregation network of YOLOv4, and performs feature multi-scale adaptive fusion through a position relevance attention-based module, specifically comprising:
the position relevance Attention module embeds position information into channel weights by using Coordinate Attention channel Attention from channel dimensions, and screens out channels sensitive to the position information during fusion;
the position relevance attention module performs spatially adaptive weight adjustment of the feature map by using Spatial Attention in the spatial dimension, so that more attention is paid to the position of the target during detection.
2. The method for detecting the apparent diseases of the YOLOv4 concrete based on the location correlation feature fusion as claimed in claim 1, wherein the channel weight obtaining process comprises:
two 1D global pooling operations are adopted, and each channel is aggregated along the horizontal coordinate and the vertical coordinate respectively to obtain two direction-aware features f_h and f_w;
Performing concatenate operation on the two obtained feature graphs, and then performing convolution operation to further extract feature information;
separating, from the extracted feature information, a feature map f'_h along the vertical direction and a feature map f'_w along the horizontal direction, and using an activation function to obtain the channel weights M_h and M_w for the vertical and horizontal directions:
M_h = σ(F_h(f'_h))
M_w = σ(F_w(f'_w))
wherein σ is the Sigmoid activation function, and F_h and F_w are two 1x1 convolution kernels used to adjust the vertical feature map f'_h and the horizontal feature map f'_w so that the output channel dimension is the same as the original number of input and output channels.
3. The method for detecting the apparent disease of the YOLOv4 concrete based on the location-related feature fusion as claimed in claim 1, wherein the spatially adaptive weight adjustment of the feature map using the Spatial Attention space comprises:
generating two different features F_avg^s and F_max^s using maximum pooling and average pooling along the channel dimension, and connecting the two features;
extracting information from the two connected features by convolution calculation to obtain a spatial attention feature;
obtaining the spatial weight M_s by applying an activation function to the spatial attention feature, expressed as:
M_s = ρ(f^(7×7)([F_avg^s; F_max^s]))
wherein ρ(·) is the sigmoid activation function, f^(7×7) represents the convolution operation using a 7x7 convolution kernel, and F' is the input feature map from which F_avg^s and F_max^s are pooled.
4. The method for detecting the apparent concrete diseases based on the position relevance feature fusion YOLOv4 as claimed in claim 1, wherein the training process of the model based on the position relevance feature fusion YOLOv4 comprises:
dividing an input image into S × S squares;
predicting n bounding boxes in each square, and generating confidence of a detection target for the bounding boxes of each square;
for each boundary box, predicting the conditional probability of a certain class of detection target, and multiplying the conditional probability by the confidence coefficient to obtain the confidence coefficient of each boundary box for each specific class;
and calculating the difference between the output result and the labeling result by adopting the loss function of YOLOv4, and carrying out back-propagation training of the model through this difference.
5. The method for detecting the apparent diseases of the YOLOv4 concrete based on the location relevance feature fusion as claimed in claim 4, wherein the confidence coefficient is expressed as:
Confidence = Pr(object) × IOU_pred^truth
wherein Confidence represents the confidence level, Pr(object) represents the probability that the bounding box contains the object to be detected, and IOU_pred^truth indicates the overlap ratio between the predicted bounding box and the labeled bounding box.
6. The method for detecting the apparent concrete diseases based on the position relevance feature fusion of YOLOv4 as claimed in claim 1, wherein the loss function of YOLOv4 is expressed as:
L = L_reg + L_conf + L_class
wherein L_reg is the frame regression loss, L_conf is the confidence loss, and L_class is the classification loss.
7. The method for detecting the apparent diseases of the YOLOv4 concrete based on the location correlation feature fusion as claimed in claim 6, wherein the frame regression loss L_reg is expressed as:
L_reg = 1 - IOU(A, B) + ρ²(A, B)/m² + αv
v = (4/π²)·(arctan(w_gt/h_gt) - arctan(w/h))²
α = v/((1 - IOU(A, B)) + v)
wherein IOU(A, B) represents the intersection-over-union of the real box A and the predicted box B; ρ²(A, B) represents the squared Euclidean distance between the center points of the prediction box B and the real box A; m represents the diagonal distance of the region covering the real box A and the prediction box B; w and h denote the width and height of the prediction box, and w_gt and h_gt represent the true width and height; α is a balance coefficient, and v is a parameter for keeping the aspect ratio of the prediction box consistent.
8. The method for detecting the apparent diseases of the YOLOv4 concrete based on the fusion of the position relevance features of claim 6, wherein the confidence loss L_conf is expressed as:
L_conf = -Σ_{i=0}^{s²} Σ_{j=0}^{B} I_ij^obj [C_i^j·log(Ĉ_i^j) + (1 - C_i^j)·log(1 - Ĉ_i^j)] - λ_noobj·Σ_{i=0}^{s²} Σ_{j=0}^{B} I_ij^noobj [C_i^j·log(Ĉ_i^j) + (1 - C_i^j)·log(1 - Ĉ_i^j)]
wherein λ_noobj is the weight of the intersection-over-union error, s² is the number of grid cells, and B is the number of predicted bounding boxes per cell; I_ij^obj indicates whether a detection target exists in the j-th bounding box of the i-th cell, being 1 if it does and 0 if it does not; Ĉ_i^j is the predicted confidence, and C_i^j is the actual confidence.
9. The method for detecting the apparent diseases of the YOLOv4 concrete based on the fusion of the position correlation characteristics according to claim 6, wherein the classification loss L_class is expressed as:
L_class = -Σ_{i=0}^{s²} Σ_{j=0}^{B} I_ij^obj Σ_{c∈classes} [P_i^j(c)·log(P̂_i^j(c)) + (1 - P_i^j(c))·log(1 - P̂_i^j(c))]
wherein s² is the number of divided cells; I_ij^obj indicates whether a detection target exists in the j-th bounding box of the i-th cell, being 1 if it does and 0 if it does not; P_i^j is the actual probability of the category to which the object in the cell belongs, and P̂_i^j is the predicted probability.
CN202111478855.1A 2021-12-06 2021-12-06 YOLOv4 concrete apparent disease detection method based on position relevance feature fusion Pending CN114359654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111478855.1A CN114359654A (en) 2021-12-06 2021-12-06 YOLOv4 concrete apparent disease detection method based on position relevance feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111478855.1A CN114359654A (en) 2021-12-06 2021-12-06 YOLOv4 concrete apparent disease detection method based on position relevance feature fusion

Publications (1)

Publication Number Publication Date
CN114359654A true CN114359654A (en) 2022-04-15

Family

ID=81097422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111478855.1A Pending CN114359654A (en) 2021-12-06 2021-12-06 YOLOv4 concrete apparent disease detection method based on position relevance feature fusion

Country Status (1)

Country Link
CN (1) CN114359654A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117011688A (en) * 2023-07-11 2023-11-07 广州大学 Method, system and storage medium for identifying diseases of underwater structure
CN117011688B (en) * 2023-07-11 2024-03-08 广州大学 Method, system and storage medium for identifying diseases of underwater structure
CN117351356A (en) * 2023-10-20 2024-01-05 三亚中国农业科学院国家南繁研究院 Field crop and near-edge seed disease detection method under unmanned aerial vehicle visual angle
CN117351356B (en) * 2023-10-20 2024-05-24 三亚中国农业科学院国家南繁研究院 Field crop and near-edge seed disease detection method under unmanned aerial vehicle visual angle

Similar Documents

Publication Publication Date Title
CN110059554B (en) Multi-branch target detection method based on traffic scene
CN110569901B (en) Channel selection-based countermeasure elimination weak supervision target detection method
CN111626128B (en) Pedestrian detection method based on improved YOLOv3 in orchard environment
CN111275688A (en) Small target detection method based on context feature fusion screening of attention mechanism
CN112287788A (en) Pedestrian detection method based on improved YOLOv3 and improved NMS
CN106682697A (en) End-to-end object detection method based on convolutional neural network
CN113409314B (en) Unmanned aerial vehicle visual detection and evaluation method and system for corrosion of high-altitude steel structure
CN108038846A (en) Transmission line equipment image defect detection method and system based on multilayer convolutional neural networks
CN110348437B (en) Target detection method based on weak supervised learning and occlusion perception
CN110751195B (en) Fine-grained image classification method based on improved YOLOv3
CN110738355A (en) urban waterlogging prediction method based on neural network
CN104899883A (en) Indoor object cube detection method for depth image scene
CN114399719B (en) Transformer substation fire video monitoring method
CN111985325A (en) Aerial small target rapid identification method in extra-high voltage environment evaluation
CN111539422A (en) Flight target cooperative identification method based on fast RCNN
CN114299011A (en) Remote sensing target quadrilateral frame rapid detection method based on deep learning
CN111898419A (en) Partition landslide detection system and method based on cascade deep convolutional neural network
CN116662468A (en) Urban functional area identification method and system based on geographic object space mode characteristics
CN114359654A (en) YOLOv4 concrete apparent disease detection method based on position relevance feature fusion
Yang et al. C-RPNs: Promoting object detection in real world via a cascade structure of Region Proposal Networks
CN117171533B (en) Real-time acquisition and processing method and system for geographical mapping operation data
CN114003623A (en) Similar path typhoon retrieval method
CN111738086B (en) Composition method and system for point cloud segmentation and point cloud segmentation system and device
Guo et al. Safety monitoring in construction site based on unmanned aerial vehicle platform with computer vision using transfer learning techniques
Wangli et al. Foxtail Millet ear detection approach based on YOLOv4 and adaptive anchor box adjustment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination