CN117495793A - Track surface defect detection method, electronic equipment and storage medium - Google Patents
- Publication number
- CN117495793A (application number CN202311399424.5A)
- Authority
- CN
- China
- Prior art keywords
- feature
- model
- network
- convolution
- track
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 230000007547 defect Effects 0.000 title claims abstract description 50
- 238000001514 detection method Methods 0.000 title claims abstract description 41
- 238000003860 storage Methods 0.000 title claims abstract description 7
- 238000012549 training Methods 0.000 claims abstract description 30
- 238000000034 method Methods 0.000 claims abstract description 19
- 238000000605 extraction Methods 0.000 claims description 34
- 238000011176 pooling Methods 0.000 claims description 20
- 238000013135 deep learning Methods 0.000 claims description 13
- 238000012360 testing method Methods 0.000 claims description 10
- 230000004927 fusion Effects 0.000 claims description 9
- 238000007499 fusion processing Methods 0.000 claims description 6
- 230000001965 increasing effect Effects 0.000 claims description 6
- 238000011156 evaluation Methods 0.000 claims description 5
- 230000004913 activation Effects 0.000 claims description 4
- 238000004590 computer program Methods 0.000 claims description 3
- 238000007781 pre-processing Methods 0.000 claims description 3
- 238000000227 grinding Methods 0.000 claims description 2
- 238000002372 labelling Methods 0.000 claims description 2
- 238000012545 processing Methods 0.000 claims description 2
- 229910000831 Steel Inorganic materials 0.000 abstract description 3
- 239000010959 steel Substances 0.000 abstract description 3
- 238000010586 diagram Methods 0.000 description 4
- 238000004422 calculation algorithm Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 238000012795 verification Methods 0.000 description 3
- 230000002776 aggregation Effects 0.000 description 2
- 238000004220 aggregation Methods 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 230000000593 degrading effect Effects 0.000 description 2
- 238000007689 inspection Methods 0.000 description 2
- 230000003993 interaction Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 239000002131 composite material Substances 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000005520 cutting process Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 230000002708 enhancing effect Effects 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000012821 model calculation Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000001629 suppression Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 238000003466 welding Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a track surface defect detection method, electronic equipment and a storage medium for identifying and locating surface defects on the steel rails of a track. The detection method comprises the following steps: step S1: manually annotating the defects in images of the track that contain defects to obtain a data set; step S2: training a Rep-YOLO model with the data set obtained in step S1, and evaluating the model and adjusting its parameters after training to obtain an optimal Rep-YOLO model; step S3: inputting an acquired track image into the optimal Rep-YOLO model obtained in step S2, the model outputting the position, the category and the confidence of each target object. The invention not only saves cost but also greatly improves the efficiency of surface defect detection.
Description
Technical Field
The invention belongs to the technical field of rail transit, and particularly relates to a rail surface defect detection method based on deep learning.
Background
Rails are a key railway infrastructure, and keeping them in good condition is essential for the safe operation of the railway. As traffic density and train running speeds continue to increase, the interaction between train wheelsets and the steel rails gradually produces rail surface defects, including cracks, spalling and similar problems. The biggest drawback of manual inspection is the heavy workload it places on inspection personnel. In recent years, research in machine vision, AI and related fields has advanced, and many machine-vision-based detection algorithms have been developed; in particular, since convolutional neural networks were proposed, researchers have applied them to fields such as target detection and recognition. Machine-vision-based defect detection is contactless, fast and free of subjective factors; it reduces the cost of hiring large numbers of workers, keeps inspection personnel safe, increases detection accuracy, and avoids false and missed detections.
The YOLOv7 architecture consists of a backbone network, a feature fusion network and a prediction network. To improve the stability and learning ability of the deep network, YOLOv7 introduces extended efficient layer aggregation networks (E-ELAN) and adopts a compound model scaling method, so that the number of parameters, the computation cost, the inference speed and the accuracy of the model are well balanced. In addition, the model enriches training through a re-parameterization strategy, so that the network obtains more feature information, higher accuracy and faster inference. Although YOLOv7 is an efficient and accurate target detection algorithm, small-target detection remains a challenge, because small targets occupy few pixels in the image and are easily ignored or mistaken for noise or background. Therefore, to improve the performance of YOLOv7 on small-target detection, it needs to be improved.
Disclosure of Invention
Aiming at the problems of applying YOLOv7 to track surface defects, the technical problem to be solved by the invention is to provide a deep-learning-based track surface defect detection method, electronic equipment and a storage medium that improve the efficiency and accuracy of applying deep learning technology to track surface defect detection.
In order to solve the technical problems, the invention adopts the following technical scheme:
in one aspect, a method for detecting a track surface defect based on deep learning is provided, which includes the following steps:
step S1: acquiring an image of a defect in the track and a defect-free image;
step S2: dividing the original images from step S1 into training and testing images and preprocessing each to obtain data sets for training and testing;
step S3: training the Rep-YOLO model by utilizing the data set obtained in the step S2, and evaluating and adjusting parameters of the model after the training is finished to obtain an optimal Rep-YOLO model;
the Rep-YOLO model comprises a feature extraction backbone network, a cavity space convolution pooling pyramid structure, an asymptotic feature pyramid network and a multi-classifier module, wherein the feature extraction backbone network adopts a RepGhost structure as a main feature extraction module; three layers of feature graphs output by the feature extraction backbone network are processed by a cavity space convolution pooling pyramid structure and then are transmitted into an asymptotic feature pyramid network; three feature layers output by the feature extraction backbone network are used as input layers of the asymptotic feature pyramid network; the asymptotic feature pyramid network is provided with a feature fusion module, wherein the feature fusion module firstly fuses two adjacent low-level features and gradually brings high-level features into a fusion process; the multi-classifier module receives three layers of feature graphs output by the asymptotic feature pyramid network, and performs final classification and positioning on the feature graphs;
step S4: inputting the track image to be tested into the optimal Rep-YOLO model obtained in step S3; the model outputs the position, the category and the confidence of each target object.
Preferably, the data are preprocessed to remove dirty data; the manually labeled data set is enhanced and expanded to obtain a sample set, which is then divided into a training set and a test set at a ratio of 8.5:1.5; finally, the Rep-YOLO model is trained with the divided sample set.
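For illustration only, a minimal Python sketch of such an 8.5:1.5 split; the function name, the fixed random seed and the assumption that each image path has a matching label file are not taken from the patent.

```python
import random

def split_dataset(sample_paths, train_ratio=0.85, seed=0):
    """Shuffle the labeled samples and split them 8.5:1.5 into train/test lists."""
    paths = list(sample_paths)
    random.Random(seed).shuffle(paths)  # deterministic shuffle for reproducibility
    cut = int(len(paths) * train_ratio)
    return paths[:cut], paths[cut:]

# Example usage (hypothetical file list):
# train_files, test_files = split_dataset(all_image_files)
```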
Preferably, the model training in step S3 includes the steps of:
S51: setting the pixel areas of the network anchors to {12,16,19,36,40,28,36,75,76,55,72,146,142,110,192,243,459,401};
S52: using the RepGhost structure as the main feature extraction module of the feature extraction backbone network;
S53: outputting the bottom-layer feature map of the feature extraction backbone network as the input of the cavity space convolution pooling pyramid structure;
S54: taking the three feature layers output by the feature extraction backbone network as the input layers of the asymptotic feature pyramid network, fusing two adjacent low-level features, and gradually bringing the high-level features into the fusion process;
S55: combining the semantic information of the bottom-layer and high-level feature maps, and creating left and right feature map blocks;
S56: detecting the multiple prediction boxes in an image; among multiple predictions for the same detected object, only the prediction result with the highest confidence score is kept.
Preferably, the designed RepGhost structure is used as the main extraction module in the backbone network. The module input is fed into three branches (an ordinary convolution with a kernel size of 1, a GhostConv convolution, and a residual edge), each branch is normalized, the three branches are added channel by channel, and the result is passed through an activation layer.
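The following PyTorch sketch illustrates this three-branch structure. The GhostConv internals (a 1x1 primary convolution plus a cheap 3x3 depthwise convolution), the choice of ReLU and the equal input/output channel count are assumptions for illustration, not details taken from the patent.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a primary 1x1 conv produces half the output channels and a
    cheap depthwise 3x3 conv produces the rest (assumed GhostNet-style split, c_out even)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Conv2d(c_in, c_half, 1, bias=False)
        self.cheap = nn.Conv2d(c_half, c_half, 3, padding=1, groups=c_half, bias=False)

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

class RepGhostBlock(nn.Module):
    """Three parallel branches (1x1 conv, GhostConv, residual edge), each batch-normalized,
    summed channel by channel, then passed through an activation, as described above."""
    def __init__(self, channels):
        super().__init__()
        self.conv1x1 = nn.Sequential(nn.Conv2d(channels, channels, 1, bias=False),
                                     nn.BatchNorm2d(channels))
        self.ghost = nn.Sequential(GhostConv(channels, channels),
                                   nn.BatchNorm2d(channels))
        self.residual = nn.BatchNorm2d(channels)  # residual edge with its own normalization
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.conv1x1(x) + self.ghost(x) + self.residual(x))
```

At inference time, a block of this form can in principle be re-parameterized into a single convolution, which is the motivation for the branch-plus-BN layout.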
Preferably, the feature extraction backbone network outputs three layers of feature maps. The 20x20 feature map of the third layer output by the backbone is passed through convolution layers with cavity (dilated) convolution rates of 5, 9 and 13, an ordinary convolution with a kernel size of 1, and a pooling layer with a kernel size of 3; the five resulting feature maps from the dilated convolutions, the ordinary convolution and the pooling layer are concatenated and passed through a convolution with a kernel size of 1 to reduce the number of channels, and the result is then passed into the asymptotic feature pyramid network.
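A minimal sketch of such a dilated-convolution pooling pyramid. The 3x3 kernel size for the dilated branches and the stride-1 max pooling are assumptions; only the dilation rates 5, 9 and 13, the 1x1 branch, the pooling branch and the final 1x1 channel reduction come from the text above.

```python
import torch
import torch.nn as nn

class DilatedPoolingPyramid(nn.Module):
    """Sketch of the cavity space convolution pooling pyramid: dilated 3x3 convs with
    rates 5/9/13, a 1x1 conv, and a stride-1 max-pool branch; the five branch outputs
    are concatenated and squeezed back with a 1x1 convolution."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.dilated = nn.ModuleList([
            nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r, bias=False) for r in (5, 9, 13)
        ])
        self.conv1x1 = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
        self.reduce = nn.Conv2d(c_out * 4 + c_in, c_out, 1, bias=False)  # channel reduction

    def forward(self, x):
        feats = [b(x) for b in self.dilated] + [self.conv1x1(x), self.pool(x)]
        return self.reduce(torch.cat(feats, dim=1))
```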
Preferably, the asymptotic feature pyramid network processes the feature maps as follows: a 1*1 convolution is applied to the three shallow-to-deep feature layers C1, C2 and C3 generated by the feature extraction backbone network to produce C1_0, C2_0 and C3_0; C2_0 is down-sampled and, together with C3_0, enters ASF-1 to generate the feature map C3_1, which passes through a RepGhost structure to form C3_2; C3_0 is up-sampled by nearest-neighbour interpolation with a factor of 2 and, together with C2_0, enters ASF-1 to generate the feature map C2_1, which passes through a RepGhost structure to form C2_2. Then C1_0 and C2_2 are down-sampled and, together with C3_2, enter an ASF-2 structure followed by a RepGhost structure to generate the effective feature P3; C1_0 is down-sampled and C3_2 is up-sampled by nearest-neighbour interpolation with a factor of 2, and together with C2_2 they enter an ASF-2 module followed by a RepGhost structure to form the effective feature map P2; C3_2 is up-sampled by nearest-neighbour interpolation with a factor of 4 and C2_2 with a factor of 2 to the same size as C1_0, and together with C1_0 they enter the ASF-2 structure followed by a RepGhost structure to generate the effective feature P1. Finally, the three layers of feature maps are passed into the multi-classifier module.
Preferably, the three layers of feature maps input to the multi-classifier module are the 80x80 feature map P3, the 40x40 feature map P4 and the 20x20 feature map P5. The anchor pixel areas used for the 80x80 feature map P3 are set to {12,16,19,36,40,28}; the anchor pixel areas for the 40x40 feature map P4 are set to {36,75,76,55,72,146}; and the anchor pixel areas for the 20x20 feature map P5 are set to {142,110,192,243,459,401}.
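A small bookkeeping sketch of how the nine (width, height) anchor pairs listed above map onto the three detection heads; the dictionary keys and the helper function are illustrative names, not part of the claimed method.

```python
# Grouping of the nine (width, height) anchor pairs onto the three heads,
# following the pixel areas listed above.
ANCHORS = {
    "P3_80x80": [(12, 16), (19, 36), (40, 28)],       # small defects
    "P4_40x40": [(36, 75), (76, 55), (72, 146)],      # medium defects
    "P5_20x20": [(142, 110), (192, 243), (459, 401)]  # large defects
}

def anchors_for(head: str):
    """Return the anchor (w, h) pairs assigned to a given detection head."""
    return ANCHORS[head]
```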
Preferably, the surface defects include scratches, punch damage, straight point rail cracks, side grinding and fish-scale cracks. For defect classification and localization, the mean average precision (mAP) computed by the model is used as the evaluation index: the trained track defect detection model is tested on track defect images, and if the mAP of the model is greater than or equal to 90%, the model is considered able to detect track defects accurately; otherwise the model must be optimized and fine-tuned. And/or the model parameter adjustment includes increasing the number of training samples, modifying the learning rate and the number of iterations.
In another aspect, there is provided an electronic device including:
a memory storing a plurality of instructions;
and the processor loads instructions from the memory to execute the steps in the detection method.
Further, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by an electronic device, it implements the steps of the detection method.
The invention adopts the technical scheme and has the following beneficial effects:
(1) The designed RepGhost structure in the backbone network is used as the main feature extraction module, which enhances the feature extraction capability of the network while greatly reducing the number of parameters.
(2) The invention uses the cavity space convolution pooling pyramid structure, which effectively avoids problems such as image distortion caused by cropping and scaling image regions, alleviates the repeated extraction of related image features by the convolutional neural network, greatly improves the speed of generating candidate boxes, and enhances the ability to detect smaller targets on the track.
(3) The invention uses the asymptotic feature pyramid network as the neck of the whole network, which addresses the loss or degradation of feature information in the top-down and bottom-up paths of conventional feature pyramid networks; in particular, it supports direct interaction between non-adjacent layers during feature fusion, which mitigates such loss or degradation. It first fuses two adjacent low-level features and gradually incorporates the high-level features into the fusion process. This approach helps avoid large semantic gaps between non-adjacent levels.
(4) The surface defects that can be detected by the method include scratches, punch damage, straight point rail cracks, rail corrugation, low (collapsed) welding joints, side grinding and fish-scale cracks, which overcomes the limitation of traditional algorithms that can only detect specific defects and saves labor cost.
(5) Experimental results show that the detection method reaches an accuracy of 94.58%. It can be used to identify and locate rail surface defects, saving enterprises labor cost and greatly improving the efficiency of surface defect detection.
The specific technical scheme adopted by the invention and the beneficial effects brought by the technical scheme are disclosed in the following detailed description in combination with the drawings.
Drawings
The invention is further described with reference to the drawings and detailed description which follow:
FIG. 1 is a flowchart of a track surface defect detection method based on Rep-YOLO according to an embodiment of the present invention;
FIG. 2 is an overall network architecture diagram of an embodiment;
FIG. 3 is a block diagram of a RepGhost of an embodiment;
FIG. 4 is a diagram of a cavity space convolution pooling pyramid structure of an embodiment;
fig. 5 is a schematic diagram of an asymptotic feature pyramid network structure according to an embodiment.
Detailed Description
The technical solutions of the embodiments of the present invention will be explained and illustrated below with reference to the drawings; however, the following embodiments are only preferred embodiments of the present invention, not all of them. Other examples obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Example 1
As shown in fig. 1 to 5, the present embodiment relates to a track surface defect detection method based on deep learning, including the steps of:
step 1, acquiring an object detection image of an industrial site, and manually marking defects in original image data of a track to obtain a data set.
And carrying out data enhancement expansion on the data set to obtain a sample set. The enhancement and expansion operation of the data set is as follows: the data expansion is carried out by adding noise, turning, rotating, cutting, scaling, translating, dithering, affine transformation, gray scale increasing, fusing a specific background and the like, so as to obtain a final data set.
Specifically, 50% of the randomly extracted data sets are expanded by using a mosaic data enhancement method, 50% of the randomly extracted data sets are expanded by using a random clipping data enhancement method, and the two data enhancement modes are combined to finally obtain a final sample set.
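A hedged sketch of this 50%/50% augmentation choice. The mosaic here is simplified to a 2x2 tiling of four half-resolution images, the crop fraction and output size are assumed values, and bounding-box/label handling is omitted.

```python
import random
import numpy as np
import cv2

def simple_mosaic(imgs, size=640):
    """Simplified mosaic: resize four images to half resolution and tile them 2x2."""
    half = size // 2
    tiles = [cv2.resize(im, (half, half)) for im in imgs[:4]]
    top = np.hstack(tiles[:2])
    bottom = np.hstack(tiles[2:])
    return np.vstack([top, bottom])

def random_crop(img, crop_frac=0.8):
    """Randomly crop a sub-region and resize it back to the original resolution."""
    h, w = img.shape[:2]
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    y = random.randint(0, h - ch)
    x = random.randint(0, w - cw)
    return cv2.resize(img[y:y + ch, x:x + cw], (w, h))

def augment(sample_pool):
    """Pick mosaic or random crop with equal probability, as in the 50%/50% scheme above."""
    if random.random() < 0.5:
        return simple_mosaic(random.sample(sample_pool, 4))
    return random_crop(random.choice(sample_pool))
```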
Step 2: the training set and the verification set are divided at a ratio of 8.5:1.5; the composition of the data set is shown in Table 1. The training-set images are used to train the model, the verification-set images are used to verify the detection capability of the model, and through extensive training the model continuously adjusts its network parameters according to the verification results.
TABLE 1
Step 3: constructing the Rep-YOLO network structure.
In the embodiment of the invention, the Rep-YOLO model comprises a feature extraction backbone network, a cavity space convolution pooling pyramid structure, an asymptotic feature pyramid network and a multi-classifier module. In the Rep-YOLO network, the image passes through a backbone network built from RepGhost structures, MBConv and PConv. The input of the RepGhost module enters three branches (an ordinary convolution with a kernel size of 1, a GhostConv convolution, and a residual edge), each branch is normalized, the three branches are added channel by channel, and the result is passed through an activation layer. The backbone network outputs three layers of feature maps, of which the 20x20 feature map is input into the cavity space convolution pooling pyramid structure. The third-layer 20x20 feature map output by the backbone passes through convolution layers with cavity convolution rates of 5, 9 and 13, an ordinary convolution with a kernel size of 1, and a pooling layer with a kernel size of 3; the five resulting feature maps are concatenated and passed through a convolution with a kernel size of 1 to reduce the number of channels, and then passed into the asymptotic feature pyramid network.
A 1*1 convolution is applied to the three shallow-to-deep feature layers C1, C2 and C3 generated by the backbone feature extraction network to produce C1_0, C2_0 and C3_0. C2_0 is down-sampled and, together with C3_0, enters ASF-1 to generate the feature map C3_1, which passes through a RepGhost structure to form C3_2; C3_0 is up-sampled by nearest-neighbour interpolation with a factor of 2 and, together with C2_0, enters ASF-1 to generate the feature map C2_1, which passes through a RepGhost structure to form C2_2. Then C1_0 and C2_2 are down-sampled and, together with C3_2, enter an ASF-2 structure followed by a RepGhost structure to generate the effective feature P3; C1_0 is down-sampled and C3_2 is up-sampled by nearest-neighbour interpolation with a factor of 2, and together with C2_2 they enter an ASF-2 module followed by a RepGhost structure to form the effective feature map P2; C3_2 is up-sampled by nearest-neighbour interpolation with a factor of 4 and C2_2 with a factor of 2 to the same size as C1_0, and together with C1_0 they enter the ASF-2 structure followed by a RepGhost structure to generate the effective feature P1. Finally, the three layers of feature maps are passed into the multi-classifier module.
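A simplified sketch of what one ASF fusion step could look like, assuming an ASFF-style weighting (a per-pixel softmax over the input levels); the patent does not give the internals of ASF-1/ASF-2, so the weighting scheme and equal channel counts are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveSpatialFusion(nn.Module):
    """Assumed ASFF-style fusion: each input level is resized to the target resolution,
    a 1x1 conv predicts a per-pixel weight for each level, the weights are softmax-
    normalized over the levels, and the inputs are summed accordingly."""
    def __init__(self, channels, n_inputs):
        super().__init__()
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(channels, 1, 1) for _ in range(n_inputs)]
        )

    def forward(self, feats, target_size):
        # Bring every level to the target spatial size (nearest interpolation, as in the text).
        resized = [F.interpolate(f, size=target_size, mode="nearest") for f in feats]
        logits = torch.cat([wc(f) for wc, f in zip(self.weight_convs, resized)], dim=1)
        weights = torch.softmax(logits, dim=1)            # (N, n_inputs, H, W)
        return sum(weights[:, i:i + 1] * resized[i] for i in range(len(resized)))
```

For example, the P3 branch described above would call a three-input module of this kind with the down-sampled C1_0, the down-sampled C2_2 and C3_2, before the result enters a RepGhost structure.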
In the embodiment of the present invention, the multi-classifier module comprises three YOLO Head classifiers; they receive the fused features output by the feature fusion module at three different sizes, namely 20x20, 40x40 and 80x80.
Step 4: training the target model with the training set, and testing the trained model with the test set.
In the embodiment of the invention, the network configuration is parsed and the training network is started after the data-loading thread is launched. When training the target model, the weights are saved once every 10 iterations, and the best weights and the final weights are updated until the number of iterations reaches the maximum value, which may be an empirical or preset value.
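A minimal training-loop sketch of this checkpointing scheme (save every 10 iterations, keep best and last weights); the callback names, the use of mAP as the "best" criterion and the file layout are assumptions.

```python
import os
import torch

def train_loop(model, optimizer, data_loader, loss_fn, evaluate_map, max_iters, ckpt_dir="weights"):
    """Save a checkpoint every 10 iterations and keep updating the best and last weights
    until the maximum iteration count is reached."""
    os.makedirs(ckpt_dir, exist_ok=True)
    best_map, it = 0.0, 0
    while it < max_iters:
        for images, targets in data_loader:
            loss = loss_fn(model(images), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            it += 1
            if it % 10 == 0:
                torch.save(model.state_dict(), f"{ckpt_dir}/iter_{it}.pt")
                current_map = evaluate_map(model)          # assumed validation callback
                if current_map > best_map:
                    best_map = current_map
                    torch.save(model.state_dict(), f"{ckpt_dir}/best.pt")
            if it >= max_iters:
                break
    torch.save(model.state_dict(), f"{ckpt_dir}/last.pt")
```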
Step 5 specifically comprises the following steps:
S51: setting the pixel areas of the network anchors to {12,16,19,36,40,28,36,75,76,55,72,146,142,110,192,243,459,401};
S52: using the designed RepGhost structure as the main module of the backbone feature extraction network, which greatly reduces the computation and the number of parameters and makes the method suitable for lighter-weight industrial deployment;
S53: outputting the bottommost feature map of the network backbone as the input of the cavity space convolution pooling pyramid structure, which enhances the feature extraction capability for small-target defects;
S54: taking the three feature layers output by the backbone network as the input layers of the asymptotic feature pyramid network, fusing two adjacent low-level features and gradually bringing the high-level features into the fusion process, which avoids large semantic gaps between non-adjacent levels;
S55: combining the semantic information of the bottom-layer and high-level feature maps, and creating left and right feature map blocks;
S56: detecting the multiple prediction boxes in an image; since the predictions may be highly redundant, non-maximum suppression is performed: for the same detected object, any other prediction box whose overlap with the highest-confidence prediction box exceeds a threshold is filtered out, i.e. only the prediction with the highest confidence score is kept (a minimal sketch of this step follows this list).
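A minimal sketch of the non-maximum suppression in S56: predictions are sorted by confidence, the highest-confidence box per object is kept, and overlapping boxes above an IoU threshold are suppressed. The threshold value of 0.45 is an assumption.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.45):
    """Keep the highest-confidence prediction per object and drop overlapping duplicates."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        if order.size == 1:
            break
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < iou_thresh]
    return keep
```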
Step 6: evaluating the performance of the trained track defect detection model.
The loss function comprises a bounding-box regression loss, a confidence loss and a classification loss. The bounding-box regression loss is calculated with the SIoU loss, in which IoU is the intersection-over-union of the predicted box and the ground-truth box GT. The confidence loss and the classification loss are calculated with the binary cross-entropy loss:
BCE = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\left(\mathrm{Sigmoid}(x_i)\right)+(1-y_i)\log\left(1-\mathrm{Sigmoid}(x_i)\right)\right]
where N is the total number of samples, x is a sample, y is its label, and Sigmoid is the Sigmoid activation function.
To balance the positive and negative samples, the confidence loss and the classification loss apply Focal Loss on top of the binary cross-entropy loss:
\mathrm{Focal} = -(1-t)^{r}\log(t)
where r is a constant (when r = 0 the Focal Loss coincides with the BCE loss), and t can be expressed as
t = \begin{cases}\mathrm{Sigmoid}(x), & y = 1\\ 1-\mathrm{Sigmoid}(x), & y = 0\end{cases}
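A short sketch of this focal-weighted binary cross-entropy, written against the raw logits and following the formulas above (the exponent is written r as in the text); the default value of r is an assumption, and with r = 0 the function reduces to plain BCE.

```python
import torch
import torch.nn.functional as F

def focal_bce_loss(logits, targets, r=1.5):
    """Focal-weighted BCE: t is the predicted probability of the true label and each
    BCE term -log(t) is scaled by (1 - t)^r, as in the formulas above."""
    targets = targets.float()
    prob = torch.sigmoid(logits)
    t = torch.where(targets == 1, prob, 1 - prob)          # probability of the true label
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return ((1 - t) ** r * bce).mean()
```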
the model calculates the average accuracy mAP as an evaluation index, a trained orbit defect detection model is utilized to test an orbit defect image, if the mAP of the model is more than or equal to 90%, the model is identified to accurately detect the orbit defect, otherwise, the model parameters are optimized and finely adjusted.
Step 7: parameter optimization and fine-tuning. Combining the evaluation results of step 6, the track defect detection model is further optimized and fine-tuned, mainly by increasing the number of training samples, modifying the learning rate and adjusting the number of iterations.
In the Rep-YOLO-based method for identifying the main track surface defect targets, two data enhancement modes are combined to enhance and expand the data and obtain the samples, which strengthens the robustness of the network and prevents overfitting. In improving the YOLOv7 model, the backbone is designed around the RepGhost structure as the main feature extraction module, which enhances the learning capability of the backbone while greatly reducing the amount of parameter computation; three dilated convolution operations with different dilation rates are used in the cavity space convolution pooling pyramid structure, making the network more accurate at recognizing small targets; and through the asymptotic feature pyramid network, the spatial and semantic information is richer than in the original YOLOv7 model, the important features of the feature maps are more prominent, and the detection and recognition accuracy of the detector is effectively improved.
The backbone of the Rep-YOLO model uses the designed RepGhost structure as the main feature extraction module and the MBConv structure as the downsampling module; the three layers of feature maps output by the backbone pass through the cavity space convolution pooling pyramid structure and serve as the input of the asymptotic feature pyramid network, and capturing feature maps at different scales improves the detection capability of the network. Experimental results show that the accuracy of the detection method reaches 94.58%; it can be used to identify and locate rail surface defects, saving enterprises labor cost and greatly improving the efficiency of surface defect detection.
Example two
An electronic device, the electronic device comprising:
a memory storing a plurality of instructions;
and a processor for loading instructions from the memory to perform the steps of the detection method of the first embodiment.
Example III
A computer-readable storage medium having stored thereon a computer program which, when executed by an electronic device, performs the steps of the detection method of embodiment one.
While the invention has been described by way of embodiments, those skilled in the art will appreciate that the invention is not limited to the embodiments and drawings described above. Any modification that does not depart from the functional and structural principles of the present invention is intended to be included within the scope of the appended claims.
Claims (10)
1. The track surface defect detection method based on deep learning is characterized by comprising the following steps of:
step S1: acquiring an image of a defect in the track and a defect-free image;
step S2: dividing the original images from step S1 into training and testing images and preprocessing each to obtain data sets for training and testing;
step S3: training the Rep-YOLO model by utilizing the data set obtained in the step S2, and evaluating and adjusting parameters of the model after the training is finished to obtain an optimal Rep-YOLO model;
the Rep-YOLO model comprises a feature extraction backbone network, a cavity space convolution pooling pyramid structure, an asymptotic feature pyramid network and a multi-classifier module, wherein the feature extraction backbone network adopts a RepGhost structure as a main feature extraction module; three layers of feature graphs output by the feature extraction backbone network are processed by a cavity space convolution pooling pyramid structure and then are transmitted into an asymptotic feature pyramid network; three feature layers output by the feature extraction backbone network are used as input layers of the asymptotic feature pyramid network; the asymptotic feature pyramid network is provided with a feature fusion module, wherein the feature fusion module firstly fuses two adjacent low-level features and gradually brings high-level features into a fusion process; the multi-classifier module receives three layers of feature graphs output by the asymptotic feature pyramid network, and performs final classification and positioning on the feature graphs;
step S4: inputting the track image to be tested into the optimal Rep-YOLO model obtained in step S3; the model outputs the position, the category and the confidence of each target object.
2. The method for detecting the surface defects of the track based on the deep learning according to claim 1, characterized in that the data are preprocessed and dirty data are removed; the manually labeled data set is enhanced and expanded to obtain a sample set, which is divided into a training set and a test set at a ratio of 8.5:1.5; finally, the Rep-YOLO model is trained with the divided sample set.
3. The method for detecting rail surface defects based on deep learning as claimed in claim 1, wherein the model training in the step S3 comprises the steps of:
S51: setting the pixel areas of the network anchors to {12,16,19,36,40,28,36,75,76,55,72,146,142,110,192,243,459,401};
S52: using the RepGhost structure as the main feature extraction module of the feature extraction backbone network;
S53: outputting the bottom-layer feature map of the feature extraction backbone network as the input of the cavity space convolution pooling pyramid structure;
S54: taking the three feature layers output by the feature extraction backbone network as the input layers of the asymptotic feature pyramid network, fusing two adjacent low-level features, and gradually bringing the high-level features into the fusion process;
S55: combining the semantic information of the bottom-layer and high-level feature maps, and creating left and right feature map blocks;
S56: detecting the multiple prediction boxes in an image; among multiple predictions for the same detected object, only the prediction result with the highest confidence score is kept.
4. The method for detecting the surface defects of the track based on the deep learning according to claim 1, wherein the designed RepGhost structure is used as the main extraction module in the backbone network; the module input is fed into three branches (an ordinary convolution with a kernel size of 1, a GhostConv convolution, and a residual edge), each branch is normalized, the three branches are added channel by channel, and the result is passed through an activation layer.
5. The track surface defect detection method based on deep learning according to claim 1, wherein the feature extraction backbone network outputs three layers of feature maps; the cavity space convolution pooling pyramid structure passes the third-layer 20x20 feature map output by the feature extraction backbone network through convolution layers with cavity convolution rates of 5, 9 and 13, an ordinary convolution with a kernel size of 1, and a pooling layer with a kernel size of 3; the five resulting feature maps from the cavity convolutions, the ordinary convolution and the pooling layer are concatenated and passed through a convolution with a kernel size of 1 to reduce the number of channels, and the result is then passed into the asymptotic feature pyramid network.
6. The method for detecting the surface defects of the track based on the deep learning as set forth in claim 5, wherein the asymptotic feature pyramid network processes the feature maps as follows: a 1*1 convolution is applied to the three shallow-to-deep feature layers C1, C2 and C3 generated by the feature extraction backbone network to produce C1_0, C2_0 and C3_0; C2_0 is down-sampled and, together with C3_0, enters ASF-1 to generate the feature map C3_1, which passes through a RepGhost structure to form C3_2; C3_0 is up-sampled by nearest-neighbour interpolation with a factor of 2 and, together with C2_0, enters ASF-1 to generate the feature map C2_1, which passes through a RepGhost structure to form C2_2. Then C1_0 and C2_2 are down-sampled and, together with C3_2, enter an ASF-2 structure followed by a RepGhost structure to generate the effective feature P3; C1_0 is down-sampled and C3_2 is up-sampled by nearest-neighbour interpolation with a factor of 2, and together with C2_2 they enter an ASF-2 module followed by a RepGhost structure to form the effective feature map P2; C3_2 is up-sampled by nearest-neighbour interpolation with a factor of 4 and C2_2 with a factor of 2 to the same size as C1_0, and together with C1_0 they enter the ASF-2 structure followed by a RepGhost structure to generate the effective feature P1. Finally, the three layers of feature maps are passed into the multi-classifier module.
7. The method for detecting surface defects of a track based on deep learning according to claim 1, wherein the three layers of feature maps input to the multi-classifier module are the 80x80 feature map P3, the 40x40 feature map P4 and the 20x20 feature map P5; the anchor pixel areas used for the 80x80 feature map P3 are set to {12,16,19,36,40,28}, the anchor pixel areas for the 40x40 feature map P4 are set to {36,75,76,55,72,146}, and the anchor pixel areas for the 20x20 feature map P5 are set to {142,110,192,243,459,401}.
8. The method for detecting the surface defects of the track based on the deep learning according to claim 1, wherein the surface defects include scratches, punch damage, straight point rail cracks, side grinding and fish-scale cracks; for defect classification and localization, the mean average precision (mAP) computed by the model is used as the evaluation index, and the trained track defect detection model is used to test track defect images; if the mAP of the model is greater than or equal to 90%, the model is determined to be capable of accurately detecting track defects, otherwise the model must be optimized and fine-tuned; and/or the model parameter adjustment comprises increasing the number of training samples, modifying the learning rate and the number of iterations.
9. An electronic device, the electronic device comprising:
a memory storing a plurality of instructions;
a processor loading instructions from the memory to perform the steps in the method of any one of claims 1 to 8.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by an electronic device, implements the steps of the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311399424.5A CN117495793A (en) | 2023-10-26 | 2023-10-26 | Track surface defect detection method, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117495793A true CN117495793A (en) | 2024-02-02 |
Family
ID=89680721
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311399424.5A Withdrawn CN117495793A (en) | 2023-10-26 | 2023-10-26 | Track surface defect detection method, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117495793A (en) |
- 2023-10-26: CN application CN202311399424.5A filed (published as CN117495793A); status: not active, withdrawn
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117934819A (en) * | 2024-03-20 | 2024-04-26 | 中铁第六勘察设计院集团有限公司 | Robustness improving method of track defect detection system |
CN117934819B (en) * | 2024-03-20 | 2024-07-05 | 中铁第六勘察设计院集团有限公司 | Robustness improving method of track defect detection system |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | WW01 | Invention patent application withdrawn after publication | Application publication date: 20240202