Disclosure of Invention
The technical scheme adopted by the invention is an intelligent pavement structure detection method based on deep learning, which comprises two main parts: a Deep Learning (DL) part and an Image Processing Technology (IPT) part. The specific steps are as follows:
S1 Deep Learning (DL) section:
Step one: manual calibration
The data set used by the invention consists of 3000 three-channel grayscale images of asphalt pavement in Jiangsu Province, China, each 4096×2000 pixels, shot by a ZOYON-RTM intelligent road detection vehicle. By training on the pavement distresses in these images, the invention can automatically identify, detect and extract pavement cracks for intelligent health monitoring of the pavement structure.
The images are annotated with the LabelImg software, and the annotations are saved as xml files.
Step two: target detection
The annotated xml files are input into the algorithm model for learning and training. The two-stage detector Faster R-CNN is used as the computational tool to perform target detection on cracked asphalt pavement.
1) Input
In Faster R-CNN, the short side of each input image is set to 600 pixels: a rectangular image of arbitrary size is scaled so that its short side is 600 pixels long. In the present invention, the adjusted input size is set to 600×600.
2) Basic convolutional neural network (Base CNN)
The classic classification model VGG-16 is used as the base CNN. Because the target localization step needs to preserve the original spatial features of the feature map, the network is truncated at the Conv13 layer: the layers up to and including Conv13 form the output of the Base CNN and serve as the input of the RPN.
3) Region proposal network (RPN)
The region proposal network uses anchor boxes at three scales, 512×512, 256×256 and 128×128 (red boxes), and for each scale three boxes with aspect ratios of 1:1, 1:2 and 2:1 (yellow boxes), giving 9 anchors in total. The anchors convert inputs of different sizes into fixed-size outputs, and conversely map fixed-size inputs onto outputs of the different preset anchor sizes.
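The anchor layout can be sketched in a few lines of Python. This is only an illustration: it assumes the three aspect ratios are the distinct set 1:1, 1:2 and 2:1 commonly used with Faster R-CNN, that each anchor keeps the area of its scale, and that boxes are written as (x1, y1, x2, y2); the function name `make_anchors` is hypothetical.

```python
from itertools import product

SCALES = (128, 256, 512)           # anchor side lengths taken from the text
RATIOS = ((1, 1), (1, 2), (2, 1))  # width:height, assumed values

def make_anchors(cx, cy):
    """Return the 9 anchors (x1, y1, x2, y2) centred on pixel (cx, cy)."""
    anchors = []
    for scale, (rw, rh) in product(SCALES, RATIOS):
        area = scale * scale
        # Keep the area of the scale fixed while changing the aspect ratio.
        w = (area * rw / rh) ** 0.5
        h = area / w
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(300, 300)
print(len(anchors))
```

Three scales times three ratios give the 9 anchors per sliding-window position mentioned above.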
The sliding window is set to 3×3, and 9 anchors are placed at the center pixel of each window. Unsuitable anchors are then deleted, and relatively good candidate samples are selected from the remaining anchors according to the following rules. Positive samples are samples that belong to a given class, and negative samples are samples that do not. The class is decided by the intersection over union (IoU), expressed as:
$$IoU = \frac{|A \cap B|}{|A \cup B|} \tag{1}$$
where IoU is the intersection over union; A is the ground-truth bounding box; B is the predicted bounding box.
(1) Positive samples: the IoU between the anchor and the ground truth is greater than 0.7.
(2) Negative samples: the IoU between the anchor and the ground truth is less than 0.3.
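The two rules above can be sketched as follows. Boxes are assumed to be (x1, y1, x2, y2) tuples, and the label -1 for anchors that fall between the two thresholds is an assumption (such anchors are usually left out of training); the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, truth, pos=0.7, neg=0.3):
    """Return 1 (positive), 0 (negative) or -1 (assumed: ignored)."""
    value = iou(anchor, truth)
    if value > pos:
        return 1
    if value < neg:
        return 0
    return -1

print(label_anchor((0, 0, 10, 10), (0, 0, 10, 10)))
```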
To enlarge the receptive field, the RPN applies a 3×3 convolutional layer to the output of the Base CNN. The foreground (object) and the background are then classified. At the same time, the offsets between the center coordinates of the ground truth and those of the corresponding anchor, and the ratios of the width and height of the ground truth to the width and height of the anchor, are regressed.
The loss function of the RPN consists of two parts, a cross-entropy classification loss and an L1 regression loss, expressed as:
$$L(\{p_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \lambda\,\frac{1}{N_{reg}}\sum_i p_i^*\, L_{reg}(t_i, t_i^*) \tag{2}$$
where p_i is the predicted probability that anchor i is a crack; t_i is the vector of the 4 parameterized coordinates of the predicted bounding box; p_i^* and t_i^* are the corresponding ground-truth values; L_cls is the classification loss for whether the anchor is a crack; L_reg is the regression loss; N_cls and N_reg are the corresponding sample sizes; λ is the parameter balancing the two losses; and i is the index of an anchor in a mini-batch.
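A minimal numeric sketch of this loss is given below. It assumes binary cross-entropy for the classification term and the smooth L1 function for the regression term (the form used in the original Faster R-CNN work; the text only says "L1"), with λ = 1 and N_reg defaulting to N_cls as hypothetical choices.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere (assumed form)."""
    x = abs(x)
    return 0.5 * x * x if x < 1.0 else x - 0.5

def rpn_loss(p, p_star, t, t_star, lam=1.0, n_reg=None):
    """p: predicted crack probabilities; p_star: 0/1 ground-truth labels;
    t, t_star: predicted and true 4-vectors of box parameters."""
    n_cls = len(p)
    n_reg = n_reg or n_cls      # hypothetical default
    eps = 1e-12                 # avoids log(0)
    l_cls = sum(-(ps * math.log(pi + eps) + (1 - ps) * math.log(1 - pi + eps))
                for pi, ps in zip(p, p_star))
    # The regression term only counts for positive anchors (p_star = 1).
    l_reg = sum(ps * sum(smooth_l1(a - b) for a, b in zip(ti, ts))
                for ps, ti, ts in zip(p_star, t, t_star))
    return l_cls / n_cls + lam * l_reg / n_reg

print(rpn_loss([0.5], [1], [(0, 0, 0, 0)], [(2, 0, 0, 0.5)]))
```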
4) RoI pooling
The RoI pooling layer accepts both the feature map output by the Base CNN and the RoIs output by the RPN. After the RoIs are projected onto the feature map, the mapped regions of different sizes are each divided into a fixed number of sub-regions, and max pooling over each sub-region yields feature maps of the same size.
5) Classifier
The final classifier performs classification and regression simultaneously. The RPN output is sampled into mini-batches of size 64, in which the ratio of positive to negative samples is 1:3. Positive samples are RoIs whose IoU with the ground truth is greater than 0.5; negative samples are RoIs whose IoU lies in the range (0.1, 0.5).
S2 Image Processing Technology (IPT) part:
Step one: pavement crack image preprocessing
Once the Deep Learning (DL) part has automatically identified the cracks in an image, the next stage, the Image Processing Technology (IPT) part, extracts the pavement crack information by combining several image processing methods. Image preprocessing is applied to the 3000 three-channel grayscale asphalt pavement images from Jiangsu Province that contain pavement distress. It consists of two steps: first the image is read, and then it is converted to grayscale. Preprocessing lays the foundation for the subsequent in-depth image processing. The details are as follows:
Image reading. The JPG format is used for reading pavement images. This format is convenient for storing a large number of pictures and improves the efficiency with which the system processes and identifies cracks. To handle a large number of pavement crack images, batch processing is adopted: first, the pictures in the folder are numbered; then the pictures are read in sequence, processed and identified; finally, the recognition results are output. This saves time, improves the visibility of the data, and facilitates analysis of the characteristic values of the pavement cracks.
Image graying. To facilitate the processing and recognition of the pavement crack image, the image needs to be converted to grayscale. The main purpose of pavement crack image processing is to separate the background region from the crack region, thereby extracting the crack from the picture. A gray image mainly reflects the brightness of each object in the image, so cracks can be extracted according to the brightness of different objects. Removing the color information therefore facilitates the subsequent processing of the image.
A color image is composed of three channels, R (red), G (green) and B (blue), each with 256 brightness levels. Each pixel takes on a color when the three channel values are combined, which forms the color image. A gray image, by contrast, has only one channel: it is composed of 256 gray values, where black is 0 and white is 255.
Since the image is an array of pixels, if the upper left corner of the image is taken as the origin, the vertical downward direction is the positive y-axis direction, and the horizontal rightward direction is the positive x-axis direction, the gray values of the pixels at different positions are represented by a set of functions related to coordinates. A color picture is thus expressed as:
f(x,y)=(R,G,B) (3)
where f(x, y) is the original image of the pavement crack; R is the brightness value of the red channel; G is the brightness value of the green channel; B is the brightness value of the blue channel.
The conversion of a color picture into a gray picture is represented as:
f(x,y)=(0.299×R+0.587×G+0.114×B) (4)
where f(x, y) is the original image of the road surface shot by the ZOYON-RTM intelligent road detection vehicle; R is the brightness value of the red channel; G is the brightness value of the green channel; B is the brightness value of the blue channel.
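Equation (4) can be applied pixel by pixel as in the sketch below. The image is assumed to be a nested list of (R, G, B) tuples with values from 0 to 255, and rounding to the nearest integer gray level is an assumption; the function name `to_gray` is hypothetical.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of gray values (equation 4)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]

sample = [[(255, 255, 255), (0, 0, 0)],
          [(255, 0, 0), (0, 0, 255)]]
print(to_gray(sample))
```

Because the three weights sum to 1, pure white stays at 255 and pure black at 0.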
Step two: image enhancement based on dodging algorithm
The Mask dodging algorithm is adopted to conduct batch processing on pavement crack pictures, brightness distribution of images is adjusted, detail information in the images is enhanced while original image brightness is basically maintained, contrast of the images is improved, and a foundation is laid for extracting crack information. The Mask dodging algorithm is a compound algorithm consisting of a Gaussian low-pass filter, image difference and piecewise linear gray scale stretching. The method comprises the following steps:
Gaussian low-pass filter. A Gaussian low-pass filter is a filter in the frequency domain. The image is transformed from the spatial domain to the frequency domain by the Fourier transform. Crack edges and other sharp gray-level changes (such as noise) in a pavement crack image mainly contribute to the high-frequency content of its Fourier transform. The low-frequency component that carries the uneven illumination can therefore be extracted by attenuating the high frequencies.
The two-dimensional form of the Gaussian low-pass filter is expressed as:
$$H(u, v) = e^{-D^2(u, v) / (2 D_0^2)} \tag{5}$$
where u and v are the coordinates of the image in the frequency domain; H(u, v) is the two-dimensional discrete function of a Gaussian low-pass filter of size u×v; D_0 is the cut-off frequency; D(u, v) is the distance from the center of the frequency rectangle.
Image difference operation. Taking the difference between the original picture and the image filtered by the Gaussian low-pass filter removes the uneven components of the original image and yields high-frequency content with even brightness. Since the high-frequency content mainly contains the crack information, most of the crack information is retained in the difference image. Because the difference image deviates in brightness from the original image, the logarithm of the average gray level of the original image is added to the whole difference image to correct its brightness. Adding the logarithm rather than the average gray level itself prevents the bright-spot noise that would arise when some pixels exceed gray level 255. The image difference operation can be expressed as:
G(x,y)=f(x,y)-μ(x,y)+log(average) (6)
where G(x, y) is the image after the difference operation; f(x, y) is the original image of the pavement crack; μ(x, y) is the image filtered by the Gaussian low-pass filter; log(average) is the logarithm of the average brightness of the original image.
Piecewise linear gray scale stretching. This method compresses the high-gray background region and the low-gray noise in the image, expands the gray range of the cracks of interest, and enhances the contrast of the image. Piecewise linear gray stretching is shown as:
where g(x, y) is the image after gray scale stretching; f(x, y) is the original image; [a, b] is the gray range of the original image; [c, d] is the gray range of the processed image.
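The stretching step can be sketched as follows. The three-segment form used here, which maps [0, a] onto [0, c], [a, b] onto [c, d] and [b, 255] onto [d, 255], is a common textbook variant and an assumption about the exact formula; the values of a, b, c and d in the example are hypothetical, and the sketch requires 0 < a < b < 255.

```python
def stretch(f, a, b, c, d, max_gray=255):
    """Piecewise linear gray stretch of one gray value f (assumed form)."""
    if f < a:
        # Low gray levels are compressed into [0, c).
        return c / a * f
    if f <= b:
        # The range of interest [a, b] is expanded onto [c, d].
        return (d - c) / (b - a) * (f - a) + c
    # High gray levels are compressed into (d, max_gray].
    return (max_gray - d) / (max_gray - b) * (f - b) + d

print(stretch(100, a=50, b=150, c=20, d=230))
```

With these example values the middle segment has slope 2.1, so contrast inside [a, b] is roughly doubled while the two outer segments are flattened.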
Step three: image segmentation
The dodging-based image enhancement described above lays a foundation for image segmentation. The quality and contrast of the image are greatly improved by the dodging enhancement, so image segmentation on this basis is effective. Image segmentation first applies minimum pooling, then outlines the crack contours with the Sobel edge detection operator, and finally introduces an adaptive threshold to binarize the image effectively, paving the way for image post-processing.
1) Minimum pooling
In a gray image, black has gray level 0, white has gray level 255, and the remaining gray levels lie between 0 and 255. Dark regions in pavement crack images tend to be cracks, while the background is relatively light; that is, the gray level of a crack is lower than that of the background. However, because the crack area is small and occupies only a small proportion of the image, false or missed detections occur when an edge detection operator is used to detect crack edges. To highlight the cracks, the crack-to-background contrast is enhanced and the image noise reduced by appropriately changing the image size, so the pavement crack image is processed with minimum pooling.
Minimum pooling operates with a template kernel of arbitrary size, which must however be square, such as 1×1 or 2×2. The image size after minimum pooling is given by equations 8 and 9.
$$M_2 = (M_1 - F)/S + 1 \tag{8}$$
$$N_2 = (N_1 - F)/S + 1 \tag{9}$$
where F×F is the template size; M×N is the image size, with subscript 1 denoting the size before pooling and subscript 2 the size after pooling; and S is the distance the template moves each time (the stride). Since the present invention uses minimum pooling to resize the image, the boundary padding is set to 0.
As can be seen from equations 8 and 9, the size and stride of the filter affect the size and quality of the processed image. The stride has a large influence on the image size, so it is set to 1 in order not to change the size of the original image excessively. Meanwhile, prior related studies show that a minimum pooling filter of size 4 yields a higher image signal-to-noise ratio and average gradient, that is, the image retains relatively more detail and is of relatively higher quality, so a minimum pooling filter of size 4 is selected.
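Equations 8 and 9 and the pooling itself can be sketched as below, assuming the image is a nested list of gray values and no padding is used; the function names are hypothetical.

```python
def pooled_size(m, n, f, s=1):
    """Output size from equations 8 and 9 (no padding)."""
    return (m - f) // s + 1, (n - f) // s + 1

def min_pool(img, f=4, s=1):
    """Replace each f x f window by its minimum gray value."""
    rows, cols = pooled_size(len(img), len(img[0]), f, s)
    return [[min(img[i * s + di][j * s + dj]
                 for di in range(f) for dj in range(f))
             for j in range(cols)]
            for i in range(rows)]

# A 5 x 5 white image with one dark (crack) pixel in the centre.
image = [[255] * 5 for _ in range(5)]
image[2][2] = 0
print(pooled_size(2000, 2000, 4, 1))
print(min_pool(image))
```

With a filter of size 4 and stride 1, a 2000×2000 image shrinks only to 1997×1997, while every window that touches a dark crack pixel takes the dark value, which thickens the crack.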
2) Sobel edge detection operator
The Sobel operator is a first-order partial derivative template that processes an image by weighted smoothing followed by differentiation. Its template is similar to that of the Prewitt operator except that a coefficient of 2 is used at the center position, which gives the Sobel operator an advantage over the Prewitt operator in smoothing noise. Noise suppression is necessary when working with derivatives: these operators detect edges from image gradients, and both the crack-to-background and the noise-to-background transitions show strong gray-level changes, so it is important to smooth noise while detecting cracks. The 3×3 Sobel operator can be expressed as:
$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \quad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \tag{10}$$
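A minimal sketch of Sobel edge detection is given below. It assumes the usual 3×3 horizontal and vertical Sobel kernels, combines the two responses as a gradient magnitude, and skips the one-pixel border; these choices and the function name are assumptions for illustration.

```python
import math

KX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))    # horizontal derivative
KY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))    # vertical derivative

def sobel_magnitude(img):
    """Gradient magnitude of a gray image; output is 2 pixels smaller per axis."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[i][j] * img[y + i - 1][x + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(KY[i][j] * img[y + i - 1][x + j - 1]
                     for i in range(3) for j in range(3))
            out[y - 1][x - 1] = math.hypot(gx, gy)
    return out

# A vertical edge between a dark and a bright region.
edge = [[0, 0, 255, 255] for _ in range(3)]
print(sobel_magnitude(edge))
```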
3) Adaptive threshold image binarization
The adaptive threshold segmentation method segments an image with multiple thresholds based on pixel neighborhood characteristics. It first divides the image into sub-blocks and then calculates a threshold for each sub-block. There are generally two ways to calculate the threshold. One is to compute the average value of the sub-block and correct it by adding a constant, which gives the threshold of the sub-block. The other is to convolve the sub-block with a Gaussian template and correct the convolved value with a constant to obtain the threshold of the sub-block. Finally, each sub-block is binarized according to its own threshold: pixels above the threshold become white and pixels below it become black.
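The first of the two threshold rules (sub-block mean plus a constant) can be sketched as follows. The block size and the constant `c` are hypothetical parameters, and non-overlapping blocks are assumed.

```python
def adaptive_threshold(img, block=2, c=0):
    """Binarize each block x block sub-block with threshold mean + c."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            values = [img[y][x] for y in ys for x in xs]
            threshold = sum(values) / len(values) + c
            for y in ys:
                for x in xs:
                    # Above the local threshold: white; otherwise black.
                    out[y][x] = 255 if img[y][x] > threshold else 0
    return out

gray = [[10, 200, 50, 60],
        [200, 200, 50, 60]]
print(adaptive_threshold(gray))
```

Because each sub-block gets its own threshold, the darker right-hand block is still split into black and white even though all of its values are far below the mean of the left-hand block.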
Step four: image post-processing
The segmentation step extracts the crack edges with good effect, but the image background remains very noisy. The image therefore needs post-processing to eliminate small-area noise in the background and to repair the cracks to some extent. Finally, the crack skeleton is extracted to display the crack form, which makes the crack morphology easy to identify. In the post-processing part, a morphological closing operation is applied first to erode the noise in the image background preliminarily. Maximum connected domain denoising is then applied to further remove small-area noise in the background region. Next, mean pooling is applied to the cracks, and smaller noise points are removed by adjusting the image size. Finally, the crack skeleton is extracted to display the morphological information of the crack.
Morphological closing operation. Morphological operations can repair and restore the crack morphology to a certain extent. On the one hand, they further filter noise in the image and are especially effective against salt-and-pepper noise; on the other hand, they can enhance crack details. The morphological opening and closing operations are built on erosion and dilation. Erosion shrinks the bright areas in the image and expands the dark areas; in other words, erosion thickens the crack region and enhances crack details, but also strengthens the noise. Dilation expands the bright areas in the image and shrinks the dark areas; that is, dilation thins the crack region and removes black noise points, but may lose some crack detail.
The morphological opening and closing operations combine dilation and erosion. The opening operation applies erosion first and then dilation, while the closing operation applies dilation first and then erosion. The opening operation smooths the contour of the target region, breaks narrow connections between objects and eliminates tiny noise points. The closing operation also smooths part of the contour of the target region but, in contrast to opening, it usually bridges small breaks or gaps in the boundary between two objects and fills small holes in the target object.
The opening operation removes noise but also makes the cracks discontinuous to a certain extent. Based on an analysis of what the opening and closing operations do, the noise points are eliminated mainly by maximum connected domain denoising, and the morphological closing operation is selected. In this way, the background noise is preliminarily removed while the continuity of the cracks is preserved.
The closing operation processes the image with structural elements. The structural elements are generally rectangular, square, elliptical or cross-shaped, and come in various sizes. In the invention, the image is first closed with a 4×4 square structural element and then closed again with a 6×6 cross-shaped structural element.
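Erosion, dilation, opening and closing on a gray or binary image can be sketched as follows. For simplicity the sketch assumes an odd-sized square structural element (3×3 by default) centred on each pixel, with windows clipped at the image border; it does not reproduce the 4×4 square and 6×6 cross-shaped elements named above, and the function names are hypothetical.

```python
def _morph(img, k, pick):
    """Apply pick (min or max) over a k x k window around every pixel."""
    h, w = len(img), len(img[0])
    r = k // 2
    return [[pick(img[yy][xx]
                  for yy in range(max(0, y - r), min(h, y + r + 1))
                  for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]

def dilate(img, k=3):
    return _morph(img, k, max)    # bright areas grow, dark areas shrink

def erode(img, k=3):
    return _morph(img, k, min)    # dark areas grow, bright areas shrink

def close(img, k=3):
    return erode(dilate(img, k), k)   # dilation first, then erosion

def open_(img, k=3):
    return dilate(erode(img, k), k)   # erosion first, then dilation

W = 255
speck = [[W] * 5 for _ in range(5)]
speck[2][2] = 0                       # isolated black noise point
band = [[0 if 2 <= x <= 4 else W for x in range(7)] for _ in range(7)]
print(close(speck) == [[W] * 5 for _ in range(5)], close(band) == band)
```

In this example closing removes the isolated black noise point but leaves the three-pixel-wide dark band (a stand-in for a crack) unchanged.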
Maximum connected domain denoising. To denoise by maximum connected domain, the image needs to be scanned and each connected domain in it labeled. Connected domains are generally labeled by the four-neighborhood method or the eight-neighborhood method. The four-neighborhood method scans the four points above, below, left and right of the current point, while the eight-neighborhood method additionally scans the four diagonal neighbors. The invention scans the image with the eight-neighborhood method.
Because the number of noise connected domains in the image is not large, the invention adopts a contour-based labeling method to track the connected domains of the image when the connected domains are scanned. The specific steps of the algorithm are as follows:
(1) The whole image is scanned, traversing it from left to right and from top to bottom.
(2) A copy of the original image is generated. When a contour of a connected domain is identified in the original image, the identified pixels are matched to the corresponding pixels of the copy, which are set to white or another color, so that the contour of the connected domain is outlined while the information of the original image is preserved.
(3) During scanning, if point A is the first contour point encountered and is not yet labeled, A is given a new label and boundary tracking starts from A according to a given search strategy. Eventually all edge points on the same contour as A are visited and the trace returns to A; the points on this path are given the same label as A. In this way, the boundary of one connected domain is outlined.
(4) After the outer contour of the connected domain has been traced, the domain must be checked for inner contours. Starting from each labeled outer contour point, the pixels to its right are scanned and given the same label as the outer contour point; scanning stops when a black pixel is encountered, which is typically an outer contour point with the same label at a different position.
(5) In step (4), if a special point B is encountered while scanning to the right, namely an inner contour pixel for which the pixel directly below is black and is not on the outer contour, the inner contour is traced starting from B according to a given boundary search strategy. Since B carries the same label as the outer contour points, the pixels on the same inner contour as B are given that label as well.
(6) After all pixels on the inner contour have been traversed, scanning continues to the right from the inner contour points, labeling pixels with the same label as the outer contour points until the next black pixel is reached. If the conditions of (4) and (5) are met during scanning, the above operations are repeated until the boundaries of all connected domains in the image have been traversed.
After the boundaries of the connected domains in the image have been labeled, the area enclosed by each closed boundary is calculated, and the areas are sorted from small to large. After repeated trials, the area of the connected domain at the 70% position of the sorted list for each image is selected as the threshold for deleting small connected domains. That is, connected domains with an area below this threshold are deleted and filled with white, and only connected domains with an area above it remain.
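The area-based filtering can be sketched as below. Note the substitutions: instead of the contour-tracing labeling described above, the sketch labels connected domains with a plain breadth-first flood fill over the eight-neighborhood, and it uses the pixel count of each domain as its area. Both are simplifying assumptions, as is the exact rule for picking the entry at the 70% position.

```python
from collections import deque

def remove_small_components(img, rank=0.7):
    """Fill with white every black 8-connected domain smaller than the
    area found at the `rank` position of the ascending area list."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 0 or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue, pixels = deque([(sy, sx)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and img[ny][nx] == 0):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    out = [row[:] for row in img]
    if not comps:
        return out
    areas = sorted(len(c) for c in comps)
    threshold = areas[min(len(areas) - 1, int(rank * len(areas)))]
    for c in comps:
        if len(c) < threshold:
            for y, x in c:
                out[y][x] = 255
    return out

# A diagonal 5-pixel crack plus two isolated noise pixels.
noisy = [[255] * 7 for _ in range(5)]
for i in range(5):
    noisy[i][i] = 0
noisy[0][6] = 0
noisy[4][0] = 0
cleaned = remove_small_components(noisy)
print(sum(row.count(0) for row in cleaned))
```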
Mean pooling. To remove scattered noise points on the boundary in the image, the invention processes the image with mean pooling.
The principle of mean pooling is the same as minimum pooling, as shown in equations 8 and 9.
Mean pooling yields the average value of a region. Since the input is a binary image, in which the only gray levels are 0 and 255, pixels with intermediate gray levels appear after averaging. Because a noise point has a small black area and is surrounded by white, its pixels have a high gray level, that is, a light color, after mean pooling. In contrast, a crack has a larger black area with less white around it, so the pixels of the crack region have a low gray level, close to black, after mean pooling. A fixed threshold is therefore used to binarize the image and thereby remove the noise points: pixels above the threshold become white, while pixels below it become black. In addition, the change in image size caused by mean pooling also helps to eliminate small-area noise points.
The image after mean pooling is then binarized with the simplest fixed-value binarization method. Statistics show that the pixel value of most noise points after mean pooling is about 100, so 100 is selected as the binarization threshold. The principle can be expressed as:
$$g(x, y) = \begin{cases} 255, & f(x, y) > T \\ 0, & f(x, y) \le T \end{cases} \tag{11}$$
where g(x, y) is the binarized image; f(x, y) is the input image; T is the segmentation threshold.
Through repeated comparison and experiments, the invention selects the filter with the size of 7, and the step length of the filter is 1.
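Mean pooling followed by fixed-threshold binarization can be sketched as follows. The default filter size 7, stride 1 and threshold 100 come from the text; the small 3×3 example and the choice to send values equal to the threshold to black are assumptions.

```python
def mean_pool(img, f=7, s=1):
    """Replace each f x f window by its mean gray value (no padding)."""
    rows = (len(img) - f) // s + 1
    cols = (len(img[0]) - f) // s + 1
    return [[sum(img[i * s + di][j * s + dj]
                 for di in range(f) for dj in range(f)) / (f * f)
             for j in range(cols)]
            for i in range(rows)]

def binarize(img, t=100):
    """Fixed-threshold binarization: above t becomes white, otherwise black."""
    return [[255 if v > t else 0 for v in row] for row in img]

W = 255
noise = [[W, W, W], [W, 0, W], [W, W, W]]    # one isolated black pixel
crack = [[0, 0, W], [0, 0, W], [0, 0, W]]    # mostly black window
print(binarize(mean_pool(noise, f=3)), binarize(mean_pool(crack, f=3)))
```

The window dominated by white averages well above the threshold and turns white, while the window dominated by black averages below it and stays black.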
Skeleton extraction. To further extract the morphology of the crack, the invention extracts the skeleton of the pavement crack. The skeleton, as its name suggests, expresses the course of the crack with a single line, so the crack morphology can be expressed with only a small number of pixels. On the one hand, this displays the form of the pavement crack intuitively and simply; on the other hand, it reduces the storage space of the images, which is convenient for storing a large number of processed images. The essence of skeleton extraction is thinning the target region in the image, so the invention uses a thinning algorithm to extract the crack skeleton. Among the many skeleton extraction methods, the invention adopts a look-up table method. The specific steps are as follows.
(1) The image is scanned with the eight-neighborhood method, traversing it from left to right and from top to bottom.
(2) During scanning, each pixel must be tested to decide whether it is a point on the skeleton, and hence whether it may be deleted. Four rules govern whether a pixel can be deleted:
1) Interior points of the target cannot be deleted;
2) Isolated points of the target cannot be deleted;
3) End points of a straight line cannot be deleted;
4) A boundary point is deleted if removing it does not increase the number of connected domains; otherwise it must be preserved.
(3) Whether a given pixel can be deleted is decided by looking it up in a skeleton thinning table. Because the invention uses the eight-neighborhood scanning method, the eight neighbors of each scanned pixel are labeled, and each neighbor position carries a different weight.
(4) The value of the central pixel is calculated from the eight neighborhood labels and weights obtained in step (3): it equals the weighted sum of the eight neighborhood weights and the binary gray values of the corresponding image pixels.
(5) The computed value is used as an index into the mapping table; for example, a value of 231 selects the 231st entry. The mapping table consists of 0s and 1s, where 0 means that the pixel cannot be deleted and 1 means that it can be deleted. The table has 256 entries, one for each combination of the eight neighborhood weights.
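The index computation of steps (3) to (5) can be sketched as below. The weight layout (powers of two, row by row, with 0 at the centre) and the convention that a black crack pixel sets its bit are assumptions, since the original weight figure is not reproduced here. The `table` in the example is a placeholder with a single entry filled in purely for illustration, not the real thinning table.

```python
# Assumed weights of the eight neighbours; the centre pixel has weight 0.
WEIGHTS = ((1, 2, 4),
           (8, 0, 16),
           (32, 64, 128))

def neighborhood_index(img, y, x, foreground=0):
    """Weighted sum over the eight neighbours of (y, x): a value in 0..255."""
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if img[y + dy][x + dx] == foreground:
                total += WEIGHTS[dy + 1][dx + 1]
    return total

def can_delete(img, y, x, table):
    """Look the pixel up in a 256-entry table of 0 (keep) and 1 (delete)."""
    return table[neighborhood_index(img, y, x)] == 1

patch = [[0, 255, 255],
         [255, 0, 0],
         [255, 255, 255]]
table = [0] * 256      # placeholder table, not the real one
table[17] = 1          # hypothetical entry for this example only
print(neighborhood_index(patch, 1, 1), can_delete(patch, 1, 1, table))
```

Because each neighbor contributes a distinct power of two, every one of the 256 possible neighborhood patterns maps to a unique table position.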
Step five: crack type determination and evaluation
The invention firstly adopts a projection method to divide the crack image into three types of transverse cracks, longitudinal cracks and oblique cracks. Then, the pixel length of the crack is calculated. And finally, carrying out batch processing on the images, and judging the accuracy of the preliminary identification of the pavement cracks based on the image processing technology.
Crack type determination. The crack type is determined by a projection method. In the processed binary image, crack pixels are black with a gray value of 0, and the background is white with a gray value of 255. The gray values of the pixels in the image are projected onto the X axis and the Y axis respectively, expressed as:
$$X(i) = \sum_{y=1}^{N} f(i, y), \quad i = 1, \dots, M \tag{12}$$
$$Y(i) = \sum_{x=1}^{M} f(x, i), \quad i = 1, \dots, N \tag{13}$$
where X(i) is the sum of the gray values projected on the X axis; Y(i) is the sum of the gray values projected on the Y axis; M is the number of pixels along the horizontal axis of the image; N is the number of pixels along the vertical axis of the image; f(x, y) is the original image.
The gray value distribution of the pavement crack image in two directions has a certain rule, and the rule is mainly expressed in the fluctuation of the gray value. The gray value fluctuation of the X axis of the longitudinal crack is stronger than that of the Y axis, the gray value fluctuation of the Y axis of the transverse crack is stronger than that of the X axis, and the fluctuation intensity of the X axis and the Y axis of the oblique crack is approximately the same.
The standard deviation is therefore introduced to quantify this fluctuation, expressed as:
$$X_\sigma = \sqrt{\frac{1}{M}\sum_{i=1}^{M}(x_i - \mu_1)^2} \tag{14}$$
$$Y_\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \mu_2)^2} \tag{15}$$
where X_σ is the standard deviation of the gray values of the pavement crack image on the X axis; Y_σ is the standard deviation of the gray values of the pavement crack image on the Y axis; M is the number of pixels along the horizontal axis of the image; N is the number of pixels along the vertical axis of the image; μ_1 is the mean of the gray values projected on the X axis; μ_2 is the mean of the gray values projected on the Y axis; x_i is the value of each pixel projected on the X axis; y_i is the value of each pixel projected on the Y axis.
The criteria for dividing the crack type according to the gray value standard deviations in the two directions can be expressed as:
$$X_\sigma > 1.5\,Y_\sigma \tag{16}$$
$$Y_\sigma > 1.5\,X_\sigma \tag{17}$$
where X_σ is the standard deviation of the gray values of the pavement crack image on the X axis, and Y_σ is the standard deviation of the gray values of the pavement crack image on the Y axis.
When equation 16 is satisfied, the crack is considered a longitudinal crack; when equation 17 is satisfied, the crack is considered a transverse crack; when neither equation 16 nor equation 17 is satisfied, the crack is considered an oblique crack.
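The projection-based classification can be sketched as follows. It assumes the binary image is a nested list with rows along the Y axis, uses the population standard deviation from Python's `statistics` module, and takes a longitudinal crack to be one whose X-axis projection fluctuates more strongly, as stated in the text; the function name is hypothetical.

```python
from statistics import pstdev

def crack_type(img, k=1.5):
    """Classify a binary crack image by the spread of its two projections."""
    x_proj = [sum(col) for col in zip(*img)]   # one sum per column (X axis)
    y_proj = [sum(row) for row in img]         # one sum per row (Y axis)
    x_sigma, y_sigma = pstdev(x_proj), pstdev(y_proj)
    if x_sigma > k * y_sigma:
        return "longitudinal"
    if y_sigma > k * x_sigma:
        return "transverse"
    return "oblique"

W = 255
vertical = [[0 if x == 2 else W for x in range(5)] for _ in range(5)]
horizontal = [[0 if y == 2 else W for _ in range(5)] for y in range(5)]
diagonal = [[0 if x == y else W for x in range(5)] for y in range(5)]
print(crack_type(vertical), crack_type(horizontal), crack_type(diagonal))
```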
Crack length calculation. The crack length is detected with OpenCV: the crack skeleton in the image is first marked with cv2.findContours, and the length of the crack skeleton is then calculated with cv2.arcLength.
Since the true lengths of the cracks in the pavement crack image set were not recorded, the invention reports the crack length as a pixel length. If the actual length of the crack in one image is known together with its calculated pixel length, the crack lengths in the other images can be computed from the ratio of actual length to pixel length, and the accuracy of the calculation can thereby be assessed.
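The conversion from pixel length to actual length described above amounts to a single scale factor, as in the sketch below. The function names and all numbers (1500 mm, 1200 pixels, 800 pixels) are hypothetical values chosen for illustration, not measurements from the data set.

```python
def length_scale(actual_length_mm, pixel_length):
    """Millimetres per pixel, from one crack whose actual length is known."""
    return actual_length_mm / pixel_length

def to_actual_length(pixel_length, scale):
    """Convert a pixel length from another image to millimetres."""
    return pixel_length * scale

scale = length_scale(1500.0, 1200.0)       # hypothetical reference crack
print(scale, to_actual_length(800.0, scale))
```

The conversion is only meaningful if all images share the same imaging geometry, which is an assumption here.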
Detailed Description
The data set used by the invention consists of 3000 three-channel grayscale images of asphalt pavement in Jiangsu Province, China, each 4096×2000 pixels, shot by a ZOYON-RTM intelligent road detection vehicle. These images were taken under different environmental and driving conditions, such as day and night illumination, high and low speed, and wet and dry surfaces. The original images are preprocessed as follows before being input into the invention for rapid calculation:
1) Each original image is divided into two 2048×2000 sub-images.
2) Each sub-image is resized to 2000×2000.
3) In the resulting 5673 images, four main types of pavement object were manually labeled and classified as cracks, sealed cracks, pavement markings and well covers, as shown in Table 1.
4) 5000 images were randomly selected as the training set, and the remaining 673 images were used as the test set.
The calibration work involved 10 persons in total and took about 100 hours, which is a relatively small labeling effort compared with semantic segmentation labels. The calibration results are summarized in Table 1.
Table 1 Calibration summary
| Classification | Road marking | Cracks | Sealed cracks | Manhole covers |
| Total number | 4063 | 3980 | 6323 | 580 |
In the RPN, as shown in Table 2, the feature map output by the Conv13 layer has a size of 39×39. A 3×3 sliding window is used, and 9 anchors are generated at the center pixel of each window, giving a total of 39×39×9 = 13689 anchor regions. Anchors whose edges extend beyond the original image boundary are deleted, and the 2000 best candidate samples are selected from the remaining 3000 anchors; the screening method is shown in Fig. 6.
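The anchor count and the boundary filtering can be reproduced with a short sketch. The assumptions are stated loudly: the width-to-height ratios are taken as 1:2, 1:1 and 2:1 (the usual Faster R-CNN choice), the anchor centers are spread evenly over the 600×600 input, and the function names are hypothetical. The score-based selection of the best candidates is not shown.

```python
import numpy as np

def make_anchors(feat_size=39, image_size=600,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return an array of shape (feat_size * feat_size * 9, 4) as (x1, y1, x2, y2)."""
    # Nine base shapes: every scale combined with every width:height ratio (assumed values).
    shapes = np.array([(s * np.sqrt(r), s / np.sqrt(r)) for s in scales for r in ratios])
    # Assumption: one anchor center per feature-map cell, evenly spaced over the image.
    stride = image_size / feat_size
    centers = (np.arange(feat_size) + 0.5) * stride
    cx, cy = np.meshgrid(centers, centers)
    cxy = np.stack([cx.ravel(), cy.ravel()], axis=1)   # (1521, 2)
    half = shapes[None, :, :] / 2.0                      # (1, 9, 2)
    boxes = np.concatenate([cxy[:, None, :] - half, cxy[:, None, :] + half], axis=2)
    return boxes.reshape(-1, 4)

def inside_image(anchors, image_size=600):
    """Keep only anchors that lie entirely inside the image."""
    keep = ((anchors[:, 0] >= 0) & (anchors[:, 1] >= 0) &
            (anchors[:, 2] <= image_size) & (anchors[:, 3] <= image_size))
    return anchors[keep]

anchors = make_anchors()
valid = inside_image(anchors)
```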
TABLE 2 basic convolution-VGG-16 architecture and Anchor size parameters
Fig. 4 shows the detection results on selected images for the single-stage detector YOLOv3 and the two-stage detector Faster-RCNN. It can be seen that YOLOv3 detects objects such as cracks less satisfactorily than Faster-RCNN. The final average precision obtained with Faster-RCNN is shown in Table 3; the overall mean average precision (mAP) reaches 0.9178.
Table 3 Faster-RCNN final average precision
| Classification | Road marking | Cracks | Sealed cracks | Manhole covers | Overall |
| Average precision | 0.9030 | 0.9091 | 0.9076 | 0.9517 | 0.9178 |
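As a check on Table 3, the overall value is consistent with the arithmetic mean of the four per-class values (assuming, as is usual for mAP, that the classes are weighted equally):

$$\text{mAP} = \frac{0.9030 + 0.9091 + 0.9076 + 0.9517}{4} = \frac{3.6714}{4} = 0.91785$$

This agrees with the reported 0.9178 up to the last digit.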
The feature extraction result graph is shown in fig. 7.
In the image enhancement step based on the dodging algorithm, the Mask dodging method and a dodging method based on the electronic printer principle are compared by both qualitative and quantitative evaluation.
Qualitative evaluation: the two output images are substantially identical from a visual point of view, but the images processed by the dodging algorithm based on the electronic printer principle contain fine ripple textures that do not appear in the images processed by the Mask dodging algorithm. In addition, the Mask dodging algorithm gives somewhat stronger contrast than the second algorithm.
Quantitative evaluation: a comprehensive analysis is carried out using three indices, namely the mean square error, the peak signal-to-noise ratio and the average gradient, as follows:
1. Mean square error
Mathematically, the mean square error reflects the degree to which a set of data deviates from a reference value, i.e., the magnitude of the fluctuation of the data. Here it reflects the degree of difference between the processed image and the original image, and can be used to evaluate how much the processed image has changed relative to the original. The smaller the value of the index, the closer the information contained in the processed image is to that of the original image. The index is calculated as:

$$\mathrm{MSE} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ f(x, y) - \hat{f}(x, y) \right]^2$$

where M and N are the length and width of the image respectively; $f(x, y)$ is the original image; and $\hat{f}(x, y)$ is the processed image.
2. Peak signal-to-noise ratio
The peak signal-to-noise ratio is mainly used to evaluate the change in image quality before and after processing such as compression, transmission or enhancement, and is built on the mean square error. The smaller the value of the index, the more the image signal is disturbed and the worse the image quality. The index is calculated as:

$$\mathrm{PSNR} = 10 \log_{10} \frac{L^2}{\mathrm{MSE}}$$

where MSE is the mean square error and L is the maximum gray level of the image, taken as 255 in the present invention.
3. Average gradient
The average gradient mainly reflects the detail information of the image. Generally, the larger the value of the index, the more detail the image contains, the greater its contrast, and therefore the clearer the image. In its commonly used form, the index is calculated as:

$$\bar{G} = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} \sqrt{\frac{\left(\Delta_x f(x, y)\right)^2 + \left(\Delta_y f(x, y)\right)^2}{2}}$$

where M and N are the length and width of the image respectively; $f(x, y)$ is the image; $\Delta_x f(x, y)$ is the gradient of the pixel along the row; and $\Delta_y f(x, y)$ is the gradient of the pixel along the column.
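The three indices can be computed with a few lines of NumPy. This is a sketch using the standard definitions, not code from the invention. Two choices are assumptions: the gradients are simple forward differences, and the average gradient is taken over the (M-1)×(N-1) positions where both differences exist.

```python
import numpy as np

def mse(original, processed):
    """Mean square error between two images of equal size."""
    diff = original.astype(np.float64) - processed.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original, processed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(original, processed)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / err))

def average_gradient(image):
    """Average gradient from forward differences along rows and columns."""
    img = image.astype(np.float64)
    dx = img[:-1, 1:] - img[:-1, :-1]   # difference between horizontal neighbours
    dy = img[1:, :-1] - img[:-1, :-1]   # difference between vertical neighbours
    return float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0)))

flat = np.zeros((4, 4), dtype=np.uint8)
shifted = np.full((4, 4), 10, dtype=np.uint8)
ramp = np.tile(np.arange(4, dtype=np.uint8), (4, 1))   # gray value grows by 1 per column
```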
The calculation results of the above indices are shown in table 4.
Table 4 Calculated index values for the two algorithms
From Table 4, it can be seen that the mean square error of Mask dodging is smaller than that of dodging based on the electronic printer principle, which indicates that the Mask dodging result is closer to the original image, changes it relatively less, and retains more of the information it contains. In terms of peak signal-to-noise ratio, the value for Mask dodging is larger, which means that the image processed by Mask dodging suffers less interference and has higher image quality. In terms of average gradient, the value for Mask dodging is also larger, which means that its images contain more detail and have greater contrast.
Through qualitative and quantitative analysis, the Mask dodging algorithm performance is superior to the dodging algorithm based on the electronic printer, so the Mask dodging algorithm is selected.
In the minimum pooling step of image segmentation, the index values for filters of sizes 1 through 6 were compared quantitatively, as shown in Table 5.
TABLE 5 values of various indices of images
It can be seen from Table 5 that both the mean square error and the peak signal-to-noise ratio of Fig. 17 a) and b) are 0, indicating that these images hardly differ from the homogenized image and that minimum pooling has not achieved the purpose of enhancing the contrast between crack and background; the filters of sizes 1 and 2 are therefore excluded. In terms of mean square error, Fig. 17 c) differs considerably from the three following images, which differ more markedly from the original image; the filter of size 3 is therefore excluded. The three indices of Fig. 17 d), e) and f) differ little, so the image with the larger peak signal-to-noise ratio and average gradient, i.e. the one with relatively more detail and higher quality, is selected. A minimum pooling filter of size 4 is therefore selected.
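Minimum pooling replaces each pixel with the darkest value in its window, which widens dark cracks against a brighter background. A minimal NumPy sketch is given below. The stride of 1 and the edge padding are assumptions, since the text specifies only the filter size, and `min_pool` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def min_pool(image, size=4, stride=1):
    """Minimum pooling; edge padding keeps the input size when stride is 1."""
    before = (size - 1) // 2
    after = size - 1 - before
    padded = np.pad(image, ((before, after), (before, after)), mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return windows.min(axis=(2, 3))[::stride, ::stride]

# One dark pixel on a bright background grows into a size x size dark block.
img = np.full((8, 8), 200, dtype=np.uint8)
img[3, 3] = 10
pooled = min_pool(img, size=4)
```

A filter of size 1 leaves the image unchanged, which matches the observation that very small filters do not enhance the crack.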
For edge detection, the Prewitt, Sobel, Laplacian and Scharr operators are compared. From fig. 28, it can be seen that all of the edge detection operators perform well on the picture, and the crack outline can be clearly seen. In addition, because the background noise in the image is severe, many white spots appear in the four images, but the Prewitt and Sobel operators suppress noise relatively better. The index values for the operators are shown in Table 6.
Table 6 values of the operator related indicators
In terms of mean square error, the value for the Scharr operator is the largest, while the values for the edge images detected by Prewitt and Sobel are the closest to each other and are smaller, indicating that the Prewitt and Sobel results are relatively closer to the homogenized image. In terms of peak signal-to-noise ratio, the values do not differ much: the Scharr operator gives the minimum, while Prewitt and Sobel are close to each other and relatively large, so the images detected by Prewitt and Sobel have relatively stronger noise immunity and higher image quality. In terms of average gradient, the value for the Scharr operator is the largest, while Prewitt and Sobel are close to each other and relatively low, so the image detected by the Scharr operator has stronger contrast and more detail. Taken together, the image detected by the Scharr operator has stronger contrast and more detail but also more severe noise, and the image detected by the Laplacian operator does not perform well in any respect. The Prewitt and Sobel operators give relatively good detection results, with good image quality and noise immunity.
In conclusion, the Sobel edge detection operator algorithm is simple and efficient, and the detection effect is good. Therefore, the Sobel edge detection operator is selected as one of the means of image segmentation.
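A minimal NumPy sketch of Sobel edge detection follows. The 3×3 kernels are the standard Sobel kernels. Combining the two directional responses as a gradient magnitude and padding the border by edge replication are assumptions, because the text does not state how the responses are combined.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Correlate an image with a 3x3 kernel, replicating the border pixels."""
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    windows = sliding_window_view(padded, (3, 3))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def sobel_magnitude(image):
    """Gradient magnitude from the horizontal and vertical Sobel responses."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.hypot(gx, gy)

# Vertical step edge: dark left half, brighter right half.
step = np.zeros((8, 8), dtype=np.uint8)
step[:, 4:] = 100
edges = sobel_magnitude(step)
```

The response is non-zero only in the two columns next to the step, which is the behavior that makes the crack outline stand out.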
In the morphological closing operation of image post-processing, the image is first closed with a 4×4 square structural element and then closed again with a 6×6 cross-shaped structural element, as shown in Fig. 19.
As can be seen from Fig. 19, after the first closing operation the background noise is reduced compared with the segmented image and the form of the crack is basically revealed, but many noise points remain. After the second closing operation the background contains far fewer noise points, and the basic form of the crack is well preserved.
As can be seen from Fig. 19 b), many black noise points remain in the background of the image after the two morphological closing operations. Since both the cracks and the noise are black, it is difficult to remove the noise while preserving the cracks. However, although the cracks and the noise have the same gray level, their areas are quite different: the crack has a large area, whereas the noise points are numerous but each has a small area. The noise points are therefore removed on the basis of area.
In order to remove sporadic noise points on the boundaries in the image, the image is processed using mean pooling. Through repeated comparison and test, a filter with a size of 7 is selected, and the filter step size is 1.
In the skeleton extraction process, whether a pixel can be deleted is decided by table lookup. Because an eight-neighborhood scanning method is adopted, the eight neighbors of the pixel being scanned are marked, and each neighbor position carries a different weight, as shown in Fig. 24. The value of the center pixel is calculated according to Fig. 24 as the weighted sum of the binary gray values of the eight neighbors, using the weights of the corresponding positions. This value is then used as an index into the mapping table; for example, for a value of 231, the 231st entry of the mapping table is looked up. The mapping table consists of 0s and 1s, where 0 means the pixel cannot be deleted and 1 means it can be deleted. The table has 256 entries, one for each combination of eight-neighborhood weights, and is shown in Fig. 25. In this example the 231st entry is 0, so the pixel should be retained.
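The lookup mechanism can be illustrated with a short sketch. Fig. 24 and Fig. 25 are not reproduced here, so everything specific is an assumption: the weights are taken to be powers of two in the layout shown, and the table is a placeholder in which only the entry quoted in the text (index 231, value 0) is meaningful.

```python
import numpy as np

# Assumed weight layout: one power of two per neighbour, centre weight 0.
WEIGHTS = np.array([[1,   2,  4],
                    [128, 0,  8],
                    [64,  32, 16]])

def neighbourhood_index(window):
    """window: 3x3 array of 0/1 values centred on the pixel being tested."""
    return int(np.sum(np.asarray(window) * WEIGHTS))

def can_delete(window, table):
    """A table entry of 1 means the centre pixel can be deleted, 0 means keep."""
    return bool(table[neighbourhood_index(window)] == 1)

# Placeholder table (hypothetical): the real 256 entries come from Fig. 25.
table = np.zeros(256, dtype=np.uint8)
table[231] = 0   # the one entry stated in the text

# A neighbourhood whose weighted sum is 1+2+4+128+64+32 = 231 under the assumed weights.
window = np.array([[1, 1, 1],
                   [1, 1, 0],
                   [1, 1, 0]])
index = neighbourhood_index(window)
```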
For crack identification, 300 crack images are tested, comprising 125 longitudinal cracks, 125 transverse cracks and 50 oblique cracks. The original images are classified, and longitudinal, transverse and oblique cracks are labeled 1, 2 and 3 respectively to generate an array. The processed images are then labeled according to the conditions for judging the crack type. The output labels are compared with the labels of the original images to calculate the recognition accuracy. The accuracy obtained with the method of the invention is 87%.
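The accuracy calculation amounts to comparing two label arrays element by element. A minimal sketch with toy labels follows; the five labels are illustrative only and are not data from the experiment.

```python
import numpy as np

def recognition_accuracy(true_labels, predicted_labels):
    """Fraction of images whose predicted crack type matches the manual label."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return float(np.mean(true_labels == predicted_labels))

# 1 = longitudinal, 2 = transverse, 3 = oblique (toy example)
truth = [1, 1, 2, 2, 3]
predicted = [1, 2, 2, 2, 3]
acc = recognition_accuracy(truth, predicted)
```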
The crack length calculation results are shown in Table 7.
TABLE 7 crack length
The method reports the crack length as a pixel length and provides an approach for calculating the true crack length.