CN117011249A

CN117011249A - Tire appearance defect detection method based on deep learning

Info

Publication number: CN117011249A
Application number: CN202310880700.3A
Authority: CN
Inventors: 刘韵婷; 戴佳霖
Original assignee: Shenyang Ligong University
Current assignee: Shenyang Ligong University
Priority date: 2023-07-18
Filing date: 2023-07-18
Publication date: 2023-11-07

Abstract

The invention provides a tire appearance defect detection method based on deep learning, comprising the following steps: inputting the tire image to be detected into the feature extraction network of an improved Mask R-CNN model, which extracts features from the tire image and outputs a feature map; inputting the feature map into the region candidate network of the improved Mask R-CNN model to generate a plurality of bounding boxes, screening them by non-maximum suppression, and selecting the bounding box with the highest confidence as a candidate box; and inputting the feature map into the Mask segmentation module of the improved Mask R-CNN model, which generates mask marks on the defective feature map, thereby detecting tire appearance defects. The invention addresses the low detection efficiency, low precision and strong subjectivity of tire appearance defect detection in China.

Description

Tire appearance defect detection method based on deep learning

Technical Field

The invention belongs to the technical field of tire detection, and particularly relates to a tire appearance defect detection method based on deep learning.

Background

In China, a large proportion of traffic accidents on expressways are caused by tire faults, so tire safety directly affects the driving safety of the whole vehicle and even the personal safety of its occupants. When the tread is seriously damaged, a blowout can easily occur at high speed or under emergency braking, and may even destroy the vehicle and cause fatalities, a loss that cannot be made good. Research on tire appearance inspection at home and abroad has mostly focused on the sidewall, the tire interior, the bead and similar positions. Owing to technical limitations and the complexity of the problem, local inspection of the tire tread is difficult, and visual inspection is error-prone mainly because tread patterns are numerous and varied. Tire appearance defects are currently detected by human eyes with the aid of tools, which involves strong subjective factors and lowers detection accuracy.

To improve the accuracy of target detection, a number of detection methods based on improved versions of Mask R-CNN have been developed. Silvia Liberata Ullo et al. propose a landslide detection method based on an improved Mask R-CNN, which applies transfer learning to landslide regions and fuses a target detection algorithm to improve detection accuracy. Yingying Xu et al., to address poor feature extraction against complex tunnel-surface backgrounds, adopt a path-enhanced feature pyramid network (PAFPN) together with an edge detection branch, and propose an automatic tunnel surface defect detection and segmentation method based on an improved Mask R-CNN. Yung Tian et al. propose an improved Mask R-CNN model, MASL R-CNN, for detecting and segmenting apple flowers in three growth states (bud, semi-open and fully open); the model uses U-Net as the backbone network and improves feature utilization and promotes feature reuse by concatenating feature maps during encoding and decoding. Wang Liang et al. propose a Mask R-CNN-based method for identifying and detecting camellia oleifera fruit, which avoids the limitations of traditional detection algorithms and handles the influence of fruit overlap, occlusion, color and similar factors on identification and detection under different illumination conditions.

However, the existing tire appearance defect detection still has the problems of low detection efficiency, low precision, strong subjectivity and the like.

Disclosure of Invention

Object of the invention: the invention provides a tire appearance defect detection method based on deep learning, together with an improved tire defect detection model, aiming to solve the problems of low detection efficiency, low precision and strong subjectivity in tire appearance defect detection in China.

The technical scheme is as follows:

The first aspect of the present invention provides a tire appearance defect detection method based on deep learning, comprising the steps of:

S1: inputting the tire image to be detected into the feature extraction network of an improved Mask R-CNN model, which extracts features from the tire image and outputs a feature map;

S2: inputting the feature map of step S1 into the region candidate network of the improved Mask R-CNN model to generate a plurality of bounding boxes, screening them by non-maximum suppression, and selecting the bounding box with the highest confidence as a candidate box;

S3: inputting the feature maps within the candidate boxes into the Mask segmentation module of the improved Mask R-CNN model, and generating mask marks for the defective feature maps, thereby detecting tire appearance defects.

Further, the method for establishing the improved Mask R-CNN model comprises the following steps:

Step 1, data acquisition: acquiring tire images with an RGB camera and forming a data set after expansion (augmentation);

Step 2, defect marking: marking defects in the tire images of the data set from step 1 with an image marking tool to obtain a defect-marked data set;

Step 3: fusing an SE attention module into the feature extraction network of the Mask R-CNN model to obtain the improved Mask R-CNN model;

Step 4, model pre-training: pre-training the improved Mask R-CNN model of step 3 on the COCO data set to obtain the corresponding parameters and weights;

Step 5: transferring the trained model by means of transfer learning; the parameters obtained by transfer learning are randomized through a fine-tuning operation, and the model is trained on the data set obtained in step 2 to finally obtain the improved Mask R-CNN model capable of identifying tire appearance defects.

Further, the improved Mask R-CNN model comprises a feature extraction network, a region candidate network and a Mask segmentation module, wherein the feature extraction network is respectively connected with the region candidate network and the Mask segmentation module, and the region candidate network is connected with the Mask segmentation module.

Further, the backbone network of the feature extraction network is SE-ResNet50+FPN. The SE-ResNet50 network is a ResNet50 network in which an SE attention module is added after the last small block of each Conv Block and Identity Block; it extracts features from the input tire image and improves the quality of the feature map. FPN is a feature processing architecture that generates multi-scale feature maps so that objects of different sizes can be handled in target detection.

Further, the SE attention module comprises Squeeze, Excitation and Scale. The feature map is input to Squeeze and to Scale; Squeeze compresses the feature map and outputs the compressed features to Excitation; Excitation adaptively adjusts them and outputs the resulting weights to Scale; Scale weights the originally input features by these weights and outputs the feature map to the next convolution.

Further, in step S3, the Mask segmentation module comprises a coordinate regression network and a Mask branch. The coordinate regression network regresses the coordinates of the selected final candidate box and determines the position for the subsequent mask generation; the Mask branch combines the result of the classification network with the position adjusted by the fully connected layer, performs segmentation prediction on the target to be detected, and generates the mask marks.

A second aspect of the present invention proposes a mobile terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, said processor implementing the method for detecting a tire appearance defect based on deep learning as in the first aspect when executing said program.

A third aspect of the present invention proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the deep learning-based tire appearance defect detection method according to the first aspect.

The beneficial effects are that:

The invention provides a tire appearance defect detection method based on deep learning. By adopting the Mask R-CNN model, the position of the tire in the image can be located more accurately and unnecessary background can be removed. First, features are extracted from the tire surface by a feature extraction network combined with an attention mechanism, which strengthens the model's attention to useful information and suppresses feature information of low relevance, so that the model extracts more robust features and both the feature extraction capability of the network and the quality of the feature map are improved. Then, the resulting feature map is processed by the RPN to complete the detection of tire defects. Finally, the image is segmented by the Mask R-CNN network and mask marks are generated. Experiments show that the improved detection network provided by the invention can detect larger defects as well as tiny and shallow scratches on the tread, and the tire defect detection accuracy reaches 91.3%.

Drawings

FIG. 1 is a diagram of an overall network model architecture;

FIG. 2 is a diagram of the SENet model;

FIG. 3 is a schematic diagram of a visual user interface;

FIG. 4 is a graph of the effect of identifying defects on a tire surface;

FIG. 5 is a diagram of the overall defect identification effect of a tire;

FIG. 6 is a graph of accuracy of network comparison experiments;

FIG. 7 is a response time histogram;

FIG. 8 is a graph of module comparison experimental recognition rate;

FIG. 9 is an ablation experiment identification rate line graph.

Detailed Description

The invention is described in more detail below with reference to the drawings accompanying the specification.

The present invention proposes a software-based solution. The tire safety visual detection system uses deep-learning-based image detection to detect damaged parts with an RGB camera. Its main feature is that all areas of the tire can be inspected, so that damage to the tire is detected comprehensively. Because the tire images are not captured in a fixed environment, locating the exact tire position requires removing unnecessary background areas. Taking these points together, the experiment uses a Mask R-CNN detection model to solve the above problems. In addition, a feature extraction network fused with the SE attention module is adopted, which strengthens the model's attention to useful information and suppresses feature information of low relevance, so that the model extracts more robust features. The network can also extract the object region from the background and then perform instance segmentation on the tire image. This reduces the amount of unnecessary tire background used as model input, which would otherwise affect detection accuracy.

As shown in FIG. 1, the present invention provides a tire appearance defect detection method based on deep learning, which specifically comprises the following steps:

The method for establishing the improved Mask R-CNN model is as follows:

Step 1, data acquisition: no public data set exists for anomaly detection of defects on the appearance surface of tires, so the data set has to be built specifically for this study. Because shooting angles differ during image acquisition, the acquired tire images differ as well; flipping, rotation, size change, brightness change and similar transformations are therefore applied to the original images to expand the image data set across different angles and obtain the final data set.
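The expansion operations listed above can be sketched as follows. This is only an illustration under stated assumptions: images are represented as plain nested lists of grayscale values (0 to 255) to keep the sketch dependency-free, the function names are hypothetical, and the rotation angle, scale and brightness range are arbitrary choices that do not come from the source.

```python
import random


def hflip(img):
    """Mirror left-right by reversing every row."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror top-bottom by reversing the row order."""
    return img[::-1]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def resize_nearest(img, new_h, new_w):
    """Change the image size with nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def adjust_brightness(img, factor):
    """Scale every pixel by `factor` and clip to the valid range 0..255."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def augment(img, rng):
    """Return one flipped, rotated, resized and re-lit copy of `img` each."""
    factor = rng.uniform(0.7, 1.3)  # assumed brightness range
    return [
        hflip(img),
        vflip(img),
        rot90(img),
        resize_nearest(img, len(img) // 2, len(img[0]) // 2),
        adjust_brightness(img, factor),
    ]
```

A call such as `augment(image, random.Random(0))` yields five additional samples per original image; in practice the defect annotations would have to be transformed in the same way, which this sketch does not cover.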

Step 2, defect marking: the invention uses the VGG Image Annotator (VIA) to mark defects in the size-adjusted images. The collected tire images are first adjusted to a uniform size: a captured tire image usually has 4000×3000 pixels, but such a large image is not conducive to identification, so in this experiment the images are cropped to 576×576 pixels. The defect targets in each tire image are then marked with the image marking tool, which provides the preselected regions from which features are to be extracted during subsequent network model training. Finally, the marks of each picture are stored in a corresponding JSON file, which records them as coordinates and can be used directly by the Mask R-CNN model.
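As an illustration of how such a coordinate file might be read, the sketch below parses a VIA-style JSON export and returns the polygon vertices per image. It assumes that defects were marked as polygons (the source does not state the region shape) and that the file follows VIA's usual layout with `filename`, `regions`, `shape_attributes`, `all_points_x` and `all_points_y` keys; the function names are hypothetical.

```python
import json


def load_via_polygons(json_text):
    """Map each image filename to a list of polygons [(x, y), ...]."""
    annotations = json.loads(json_text)
    result = {}
    for entry in annotations.values():
        regions = entry.get("regions", [])
        # Older VIA exports store regions as a dict keyed by index,
        # newer ones as a list; normalise to a list.
        if isinstance(regions, dict):
            regions = list(regions.values())
        polygons = []
        for region in regions:
            shape = region["shape_attributes"]
            if shape.get("name") != "polygon":
                continue  # this sketch only handles polygon marks
            polygons.append(
                list(zip(shape["all_points_x"], shape["all_points_y"]))
            )
        result[entry["filename"]] = polygons
    return result


def bounding_box(polygon):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)
```

The polygon vertices can then be rasterised into a binary mask per defect, and `bounding_box` gives the box that encloses each mask.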

Step 3: to improve the robustness of the features, a feature extraction network fused with the SE attention module is proposed. The improved feature extraction network embeds SE attention modules between convolution layers; according to the importance of each channel, the SE attention module strengthens attention to useful feature information and suppresses feature information of low relevance, so that the model extracts more robust features. In addition, a soft pooling operation is adopted in the feature extraction network to minimize the loss of feature information during pooling.

Step 4, model pre-training, namely pre-training the improved Mask R-CNN model in the step 3 on a COCO data set to obtain corresponding parameters and weights;

Step 5: transferring the trained model by means of transfer learning; the parameters obtained by transfer learning are randomized through a fine-tuning operation, and the model is trained on the data set obtained in step 2 to finally obtain an improved Mask R-CNN model capable of identifying tire appearance defects.

The invention combines the improved Mask R-CNN model with transfer learning. Transfer learning is commonly used in the recognition field; it saves time and, in most cases, achieves good performance without a large data set. With transfer learning, the similarity between the image features of the tire data set from step 2 and those of the COCO data set is compared, and a fine-tuning operation is then performed. Because the transferred model may not yet be fully adapted to the new task, fine-tuning is needed; it saves training time and can also improve learning accuracy. The network proposed by the invention is then initialized with the trained parameters.

The corresponding learned weights are obtained through training. Then, by means of transfer learning, the similarity between the features of the tire data set from step 2 and the features of the COCO data set on which the network was trained is compared, a fine-tuning operation is performed, and the resulting parameters are applied to the network model proposed by the invention to detect tire appearance defect data.

When the model is trained by transfer learning, the weights of the head layers are generated. When a test image is passed through ResNet, the model uses the region candidate network (RPN) to separate image regions and thereby generate high-quality regions; different regions yield different feature maps, and fixed-size feature maps are produced through the fully connected layer and used to detect defects in the tire image. The trained transfer learning model determines the accuracy of the regions formed. Moreover, because the model has been pre-trained on a large data set, its parameters start from a better position when fine-tuning begins, so the network converges more quickly and the learned features (particularly low-level features) are more diverse. To extract features in a way suited to the tire data set, the fine-tuning operation freezes the bottom layers of ResNet up to the third layer and adjusts the weights of the subsequent layers to fit the tire data set.
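The freezing rule in the last sentence can be sketched as a simple selection over layer names. Two assumptions are made that the source does not confirm: "the third layer" is read as the third ResNet stage, and layers are assumed to follow a hypothetical naming scheme such as `res2a_branch2a` or `bn3b_branch2b`. The function only decides which layers would be trainable and does not touch a real model.

```python
import re

# Hypothetical naming scheme: "res<stage>..." / "bn<stage>..." for backbone
# layers; anything else (RPN, FPN, heads) is treated as a non-backbone layer.
_STAGE_PATTERN = re.compile(r"^(?:res|bn)(\d)")


def select_trainable(layer_names, last_frozen_stage=3):
    """Return {layer name: True if the layer should be fine-tuned}."""
    flags = {}
    for name in layer_names:
        match = _STAGE_PATTERN.match(name)
        if match:
            # Backbone layer: train only stages after the frozen ones.
            flags[name] = int(match.group(1)) > last_frozen_stage
        elif name in ("conv1", "bn_conv1"):
            # The stem (first stage) stays frozen as well.
            flags[name] = False
        else:
            # Heads and other non-backbone layers are always trained.
            flags[name] = True
    return flags
```

In a Keras-style model, the resulting flags would be applied by setting each layer's `trainable` attribute before compiling.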

And inputting the data set obtained in the step 2 into a Mask R-CNN model with a transfer learning identification function for training to obtain an improved Mask R-CNN model capable of identifying the tire appearance defects.

Improved Mask R-CNN model structure and principle:

the Mask R-CNN model structure diagram with the improved fused attention mechanism is shown in figure 1. The model structure comprises a feature extraction network, a region candidate network and a Mask segmentation module, wherein the feature extraction network is respectively connected with the region candidate network and the Mask segmentation module, and the region candidate network is connected with the Mask segmentation module.

Feature extraction network structure: the backbone of the feature extraction network is SE-ResNet50+FPN. SE-ResNet50 is a ResNet50 network fused with the SE attention module; it extracts features from the input tire image and improves the robustness of the features. ResNet50 is a deep convolutional neural network mainly responsible for extracting features from the original image. The feature maps are repeatedly convolved and down-sampled, yielding feature maps of different sizes; after the final convolution, the resulting feature maps are up-sampled and mapped onto the earlier feature maps, so that the feature map obtained at each scale is improved. In addition, an SE attention module is added at every stage of feature extraction, namely after the last small block of each Conv Block and Identity Block in ResNet, which further improves the quality of the feature maps. In Stage1 to Stage4, each stage contains two network structures, the Conv Block and the Identity Block, and each of these consists of three small blocks. The specific details are as follows: 1. the output of each stage is the input of the next stage; 2. in both the Conv Block and the Identity Block, the last small block has four times as many channels as the first and middle small blocks; 3. in both the Conv Block and the Identity Block, the last small block has no ReLU operation. FPN is a feature processing architecture that generates multi-scale feature maps so that objects of different sizes can be handled in target detection. FPN adds extra layers after the convolutional neural network to fuse features of different sizes, which effectively improves the accuracy of object detection.

The structure of the SE attention module is shown in FIG. 2. It comprises Squeeze (global information embedding), Excitation (adaptive adjustment) and Scale (product). The feature map extracted from the input picture is input to Squeeze and to Scale; Squeeze compresses the feature map and outputs the compressed features to Excitation; Excitation outputs the adaptively adjusted weights to Scale; Scale weights the originally input features by these weights and outputs the feature map to the next convolution.

Currently, much work in the deep learning field improves network performance by improving the network in the spatial dimension. The SE attention module instead improves network performance by modeling the relationships between channels. Squeeze and Excitation are the two key steps in the SE network: they extract valuable information from the data and use it to strengthen the features relevant to the current task while suppressing ineffective features.

Given an input, the corresponding convolution and transformation operations produce an output with a new number of feature channels. In contrast to a conventional CNN, the SE attention module then recalibrates these previously obtained features through the Squeeze and Excitation operations.

Squeeze: Global Information Embedding. The Squeeze operation aggregates the input features so that the feature map of each channel, of spatial dimension h × w, is converted into a single real number; the output dimension therefore equals the number of input feature channels, which improves the reliability and accuracy of the data. z_c denotes the statistic generated by shrinking U over its spatial dimensions h × w, and u_c denotes the input feature of channel c; the compression operation is given by equation (1).
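For reference, a sketch of the squeeze step as it is written in the standard squeeze-and-excitation formulation, assuming that this is what equation (1) denotes: the statistic of each channel is its global average over the h × w spatial positions. The symbol F_sq and the indices i, j are assumptions taken from that formulation, not from this text.

```latex
z_c = F_{sq}(u_c) = \frac{1}{h \times w} \sum_{i=1}^{h} \sum_{j=1}^{w} u_c(i, j) \qquad (1)
```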

The specification: adaptive Recalibration (adaptive tuning). The method is similar to a gating mechanism of a cyclic neural network, and can set weights of various features according to parameters, so that each channel can understand and master data on the whole, further, the data can be better utilized, and some insignificant data can be processed. In order to increase the generalization capability of the model and reduce the complexity of the model, the sigmiod function and the Relu function are combined, and simultaneously, two full-connection layers (FC) are adopted for parameterization control, and the self-adaptive adjustment operation is obtained by a formula (2).

$$S = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2\,\delta(W_1 z)) \qquad (2)$$

where σ is the sigmoid function, δ is the ReLU function, W_1 is the parameter of the dimension-reducing layer with reduction ratio r, and W_2 is the parameter of the dimension-increasing layer.

Scale: and carrying out weighting operation on the weight value output after the self-adaptive adjustment and the original input characteristic.

The SE attention module is highly flexible: it is a structure added between the input and the output of a layer. When SE is embedded into a layer, the received feature map first undergoes a pooling operation and then passes through fully connected layer 1, activation layer 1, fully connected layer 2 and activation layer 2; finally, in the Scale layer, the learned per-channel values are multiplied channel by channel with the originally input feature map. Fully connected layer 1 reduces the feature dimension, activation layer 1 is a ReLU layer and activation layer 2 is a Sigmoid layer; fully connected layer 2 and activation layer 2 normalize the weight of each channel, which further improves fitting accuracy.
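The pooling, FC, ReLU, FC, Sigmoid and Scale sequence described above can be sketched in a few lines. This is a minimal illustration, not the trained module: global average pooling is assumed for the pooling step, biases are omitted, plain nested lists stand in for tensors, and the weight matrices `w1` and `w2` are hypothetical inputs.

```python
import math


def se_block(feature_map, w1, w2):
    """Apply a squeeze-and-excitation step to a C x h x w feature map.

    feature_map: list of C channels, each an h x w list of lists.
    w1: (C/r) x C weights of the dimension-reducing FC layer.
    w2: C x (C/r) weights of the dimension-increasing FC layer.
    Returns (rescaled feature map, per-channel weights).
    """
    # Squeeze: one scalar per channel (global average pooling assumed).
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]

    # Excitation: FC layer 1 followed by ReLU ...
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z))) for row in w1]
    # ... then FC layer 2 followed by sigmoid, giving weights in (0, 1).
    s = [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, hidden))))
         for row in w2]

    # Scale: multiply every value of channel c by its weight s[c].
    scaled = [[[v * s_c for v in row] for row in ch]
              for ch, s_c in zip(feature_map, s)]
    return scaled, s
```

With C channels and reduction ratio r, `w1` has C/r rows of length C and `w2` has C rows of length C/r, so the output keeps the shape of the input and only the relative strength of the channels changes.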

Procedure of the feature extraction network: first, the input picture is preprocessed (resizing, normalization and the like); the processed data are then fed into the feature extraction network (SE-ResNet50+FPN), and a better feature map is extracted by the feature extraction network fused with the attention mechanism.

The method comprises the following steps:

1) During training, the input image size is 576×576, i.e. the input shape is (batch size, 576, 576, 3). The image enters the feature extraction network, whose backbone is ResNet50; Stage2 to Stage5 are composed of the previously defined identity_block and conv_block, and the corresponding convolution operations are performed in turn.

2) In Stage1, zero padding is first applied to the input picture [ZeroPadding2D((3, 3))], so that the image size becomes (582, 582). The convolution kernel size is 7 and the stride is 2, so according to the formula the output length and width are (582-7+1)/2=288. The number of filters is 64, so the output shape is (288, 288, 64). BatchNorm does not change the length and width, and neither does the ReLU activation function. The image is then pooled by MaxPooling2D with a stride of 2, which changes the length and width, and the shape becomes (144, 144, 64).

3) In Stage2, the stride is first set to (1, 1), so the length and width are unchanged. The two identity_blocks likewise leave the length and width unchanged. The number of filters of the last identity_block is 256, so the output shape is (144, 144, 256). Because the length and width of the output feature map are 1/4 of those of the input image, the receptive field of this stage is 4×4.

4) In Stage3, the first conv_block uses the preset stride (2, 2), so the length and width are halved. The number of filters of the last identity_block is 512, so the output shape is (72, 72, 512). Because the length and width of the output feature map are 1/8 of those of the input picture, the receptive field is 8×8.

5) In Stage4, the first conv_block uses the preset stride (2, 2), so the length and width are halved. The number of filters of the last identity_block is 1024, so the output shape is (36, 36, 1024). Because the length and width of the output feature map are 1/16 of those of the input picture, the receptive field is 16×16.

6) In Stage5, the first conv_block uses the preset stride (2, 2), so the length and width are halved. The number of filters of the last identity_block is 2048, so the output shape is (18, 18, 2048). Because the length and width of the output feature map are 1/32 of those of the input picture, the receptive field is 32×32.
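The shape arithmetic in steps 1) to 6) can be checked with a short calculation. The sketch below uses the common convolution size formula floor((n + 2p - k) / s) + 1, which gives the same 288 as the formula in step 2) for these values, and assumes that pooling and stride-2 blocks round up when halving; the function names are hypothetical.

```python
def conv_out(size, kernel, stride, pad=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1


def ceil_div(size, stride):
    """Length after a stride-`stride` step that rounds up."""
    return (size + stride - 1) // stride


def backbone_shapes(input_size=576):
    """Feature map shapes (height, width, channels) for each stage."""
    shapes = {}
    # Stage1: zero padding of 3, then a 7x7 convolution with stride 2.
    s = conv_out(input_size, kernel=7, stride=2, pad=3)
    shapes["stage1_conv"] = (s, s, 64)
    # Stage1: max pooling with stride 2.
    s = ceil_div(s, 2)
    shapes["stage1_pool"] = (s, s, 64)
    # Stage2..Stage5: stride of the first conv_block and output filters.
    for name, stride, filters in [("stage2", 1, 256), ("stage3", 2, 512),
                                  ("stage4", 2, 1024), ("stage5", 2, 2048)]:
        s = ceil_div(s, stride)
        shapes[name] = (s, s, filters)
    return shapes


if __name__ == "__main__":
    for stage, shape in backbone_shapes(576).items():
        print(stage, shape)
```

For a 576×576 input this reproduces the sequence 288, 144, 144, 72, 36, 18 given in the steps above.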

S2: the feature map of step S1 is input into the region candidate network of the improved Mask R-CNN model to generate a plurality of bounding boxes (BBox), which are screened by non-maximum suppression; the bounding box with the highest confidence is selected as the candidate box, which improves the efficiency of the classification network.

After feature extraction is completed, the feature map obtained in step S1 is input to the region candidate network (RPN) and then enters the next layer, the Proposal Layer. The main task of this layer is to mark candidate boxes: a plurality of candidate boxes is obtained from the collected coordinates and then filtered by non-maximum suppression so that only the more effective candidate boxes remain. In this way the object region can be extracted from the background before instance segmentation is performed on the tire image, which reduces the amount of unnecessary tire background used as model input, since such background would otherwise affect detection accuracy. The result then enters the RoIAlign layer, which also performs a pooling operation. The purpose of this step is to avoid the partial feature loss caused by the RoIPooling operation.

RoIAlign, applied to the regions proposed by the RPN, is a region feature aggregation method: based on the position coordinates of a preselected box, it pools the corresponding region of the feature map into a feature map of fixed size. The ResNet backbone is a basic architecture comprising a series of residual networks, and FPN is a sampling method that brings together a plurality of independent features to generate rich multi-scale feature maps, providing strong support for later analysis and processing. By using preset models and parameters, the RPN effectively helps the model identify the optimal distribution range. The input of this part is a feature map; the regions are screened by non-maximum suppression, which discards invalid regions and keeps valid ones, thereby improving the efficiency of the classification network. Meanwhile, building on Faster R-CNN, the original RoIPooling operation before the fully connected layers is replaced by RoIAlign, which removes quantization and improves the accuracy of the algorithm.
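The idea of pooling a box to a fixed size without quantization can be illustrated with bilinear interpolation. The sketch below is a simplification under stated assumptions: it works on a single-channel feature map stored as nested lists, takes one sample at the centre of each output bin (RoIAlign implementations typically average several samples per bin), and uses hypothetical function names.

```python
import math


def bilinear(fm, y, x):
    """Sample a 2-D feature map at a real-valued position (y, x)."""
    h, w = len(fm), len(fm[0])
    # Clamp the position to the valid coordinate range.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = fm[y0][x0] * (1 - dx) + fm[y0][x1] * dx
    bottom = fm[y1][x0] * (1 - dx) + fm[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(fm, box, out_size):
    """Pool `box` = (y1, x1, y2, x2) to an out_size x out_size grid.

    Box coordinates stay real-valued; nothing is rounded to the grid.
    """
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    return [[bilinear(fm, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
             for j in range(out_size)]
            for i in range(out_size)]
```

RoIPooling would instead round the box and bin boundaries to integer cells, which is the quantization step that RoIAlign avoids.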

Non-maximum suppression is an existing method that suppresses elements that are not maxima, i.e. it searches for local maxima. For a given target, the algorithm generates multiple candidate boxes, each with a score. All boxes are sorted by score and the box with the highest score is selected; the overlap between each remaining box and this box, the so-called IoU (intersection over union), is then computed. Boxes whose IoU exceeds a threshold (set to 0.7 in this embodiment) are deleted, so that only the box with the highest score is retained. Boxes that are not the maximum are thereby suppressed, hence the name non-maximum suppression.
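The procedure described above can be sketched as a greedy loop. This is a minimal illustration in plain Python, assuming boxes are given as (x1, y1, x2, y2) tuples; the function names are hypothetical, and the default threshold of 0.7 is the value stated for this embodiment.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Return the indices of the boxes kept, highest score first."""
    # Sort box indices by descending score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)      # highest-scoring remaining box
        keep.append(best)
        # Delete every remaining box that overlaps it too strongly.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For example, two boxes that overlap with an IoU of about 0.82 collapse to the higher-scoring one, while a distant third box is kept.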

S3: the feature maps within the candidate boxes are input into the Mask segmentation module of the improved Mask R-CNN model, and mask marks are generated for the defective feature maps, thereby detecting tire appearance defects.

The Mask segmentation module comprises a coordinate regression network and a Mask branch. It is mainly used to predict the class of the target to be detected and to obtain its coordinates; adjusting the position through the fully connected layer avoids the precision error caused by an inaccurate position. In Mask R-CNN, the coordinate regression network regresses the coordinates of the selected final candidate box and determines the position for the subsequent mask generation; the Mask branch combines the result of the classification network with the position adjusted by the fully connected layer and performs segmentation prediction on the target to be detected to generate the mask marks.

The feature maps within the candidate boxes finally enter the Detection Target Layer, i.e. the target detection layer. Mask segmentation is performed at this layer and includes coordinate regression, classification (only defect detection is performed here, so no classification is carried out) and mask marking; the segmentation result is marked with a mask. Mask R-CNN is an improvement on Faster R-CNN, the difference being that Mask R-CNN has an additional Mask branch. The Mask branch extends the object detection framework, adds fully connected layers and redefines the ROI loss function. Finally, the output detection image has the same size as the original input image.

The structures of the region candidate network and the Mask branch in the improved Mask R-CNN model remain those of the classical model.

In order to display the detection result more intuitively, this design uses the improved model to detect tire appearance defects and displays the detection result through a visual user interface.

After an image is uploaded by selecting it from the local album, the system calls the improved Mask R-CNN model and its running environment, and the recognition result is returned to the system. Finally, the visual interface displays the identified defect locations.

The experiments of the invention are carried out through a visual user interface, so an interface that displays results intuitively is required. The graphical user interface (GUI) of the recognition model is built with PyQt5, and the interface layout can be created with Qt Designer in PyQt5. The design of Qt Designer conforms to the model-view-controller architecture, separates view from logic, and is easy to develop with. The designed interface is shown in fig. 3.

The invention also provides a mobile terminal which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the tire appearance defect detection method based on deep learning when executing the program.

The present invention also proposes a computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps in the method for detecting an appearance defect of a tire based on deep learning of the present invention.

Example 1

1.1 Experimental configuration

The experiment was performed on the Ubuntu 18.04.2 operating system, with an Intel Core i7-3770 CPU and a GTX 1070Ti GPU. The improved Mask R-CNN uses TensorFlow 1.14.0 and Keras 2.1.0 as its learning framework.

1.2 Experimental methods

Firstly, a test data set of 400 experimental samples was prepared; the improved Mask R-CNN model was then used to test this data, with defect detection carried out through the designed user interface. Defects in the tire surface area were identified with the feature extraction network of the model using ResNet101 and SE-ResNet50 as the backbone network, respectively.

After the above test was completed, 100 pictures of whole tire outlines were prepared, and defect detection was again performed with ResNet101 and the improved SE-ResNet50 as the backbone network, respectively.

1.3 Experimental results

Detection on the 400-sample test data set yields the recognition results for tire surface area defects with ResNet101 and with SE-ResNet50 as the backbone of the feature extraction network, as shown in fig. 4. The comparison shows that with ResNet101 as the base network most defects in the tire pictures can be identified, but some minor tread scratches and shallow scratches cannot. With the improved method, in which an attention mechanism is added between the convolution layers of the base network, large defects, tiny scratches and shallower scratches on the tread can all be identified and marked.

When the 100 whole-outline tire pictures are detected, the results, shown in fig. 5, indicate that both backbone networks identify large defects in the tire appearance, but for small scratches the SE-ResNet50 provided by the invention performs better.

The detection results also show that the improved base network is more advantageous in identifying minor defects. The experimental model can effectively address problems such as missed defect detection that occur with the original base network.

Example 2 comparative test

In order to verify that the network provided by the invention improves the accuracy of defect identification, the model provided by the invention is compared in accuracy with DenseNet, AlexNet, VGG, LeNet and GoogLeNet on the self-built data set, and the results are shown in Table 1.

Table 1 comparison of tire dataset experimental results

Comparing against the five other base networks and averaging the test results, Table 1 shows that the network model adopted in this experiment has a higher detection rate and a faster response time than the other base networks. The table also shows that results improve to some extent as base networks have been successively updated: with each newer base network, the detection accuracy increases and the detection time decreases, which provides a clear comparison with the method of the invention.

In conclusion, the network adopted in this experiment is more advantageous. The accuracy line graph of the comparison experiment is shown in fig. 6; the data in the graph show that the method used by this patent has a clear advantage in detection accuracy. The response time bar graph is shown in fig. 7, from which it can be seen that the backbone network used in the invention requires a shorter response time.

Meanwhile, in order to verify that Mask R-CNN is more effective in the field of defect identification, the experiment also compares the effects of the Faster R-CNN module and the Fast R-CNN module on the network provided by the invention, and the results are shown in Table 2.

Table 2 comparison table of experimental results for different modules

As can be seen from Table 2, the comparison experiment compared the application effects of three different modules in the model; after multiple experiments, the average of the experimental results was entered in the table. Experimental verification and result analysis show that the Mask R-CNN used by the model performs better in defect detection, has the shortest single-detection time, and achieves efficient and accurate defect identification. The accuracy line graph and the result comparison graph of this experiment are shown in fig. 8; the data in the line graph show that the Mask R-CNN used here is superior to Faster R-CNN and Fast R-CNN in both detection time and detection accuracy.

Example 3 ablation experiments

To further verify the effectiveness of the proposed attention network module, experimental comparisons were made on the self-built data set between the improved base network and the base network before improvement; ResNet50, ResNet101, SE-ResNet50 and SE-ResNet101 were selected for comparison, and the results are shown in Table 3.

Table 3 comparison of ablation experimental results

As can be seen from Table 3, the ablation experiment compared the application effects of four different networks in the model; after multiple experiments, the average of the experimental results was entered in the table. The experimental data show that, when the attention channel is not added, more convolution layers give higher identification accuracy; however, the more convolution layers there are, the less effective adding the attention mechanism becomes. The comparison graph of the ablation experiment is shown in fig. 9. The line graph shows that when the backbone network is ResNet50 or SE-ResNet101, the detection accuracy fluctuates more; when the backbone network is ResNet101, the detection accuracy is stable and clearly higher than with ResNet50; when ResNet101 is compared with SE-ResNet101, the detection accuracy is not obviously improved, while the response time is longer. In summary, the SE-ResNet50 adopted by the invention achieves higher accuracy as the backbone network.

In summary, the invention establishes a tire defect detection model based on deep learning and carefully analyzes the wear detection errors caused by tire defects. Transfer learning is applied together with fine-tuning, so that the resulting weights are learned according to tire characteristics; meanwhile, an attention module is added during feature extraction, which increases the attention of the model to useful feature information and suppresses feature information of low relevance. The final Mask R-CNN model can identify defect points in the tire image, realizing computer-vision tire defect detection.
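How the attention module emphasizes useful channels and suppresses less relevant ones can be sketched in a few lines. The sketch follows the Squeeze, Excitation and Scale stages named for the SE attention module; global average pooling and the two fully connected layers with ReLU and sigmoid follow the standard squeeze-and-excitation design and are assumptions about this model. The function names, the tiny feature map and the weight values are hypothetical, chosen only for illustration.

```python
import math


def squeeze(feature_map):
    """Squeeze: global average pooling, one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def dense(vec, weights, bias):
    """Fully connected layer: weights has one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def excitation(z, w1, b1, w2, b2):
    """Excitation: FC + ReLU, then FC + sigmoid, giving one weight per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def scale(feature_map, channel_weights):
    """Scale: multiply every value of a channel by that channel's weight."""
    return [[[x * w for x in row] for row in ch]
            for ch, w in zip(feature_map, channel_weights)]


def se_block(feature_map, w1, b1, w2, b2):
    """The feature map goes to both Squeeze and Scale, as in the SE module."""
    return scale(feature_map, excitation(squeeze(feature_map), w1, b1, w2, b2))


# Hypothetical input: 2 channels of size 2x2, reduced to 1 hidden unit.
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1, b1 = [[0.5, 0.5]], [0.0]
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]
weights = excitation(squeeze(fmap), w1, b1, w2, b2)
out = se_block(fmap, w1, b1, w2, b2)
```

With these illustrative weights the first channel receives a weight above 0.5 and the second a weight below 0.5, so the first channel is emphasized and the second is attenuated before the feature map is passed to the next convolution. In a trained network the weights are learned rather than fixed.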

The results of the embodiments show that the model of the invention realizes detection without manual visual inspection and can identify tire defects with higher precision, thereby avoiding the errors of manual visual inspection, whose accuracy is lower than that of machine detection. In the future, the model will be combined with a camera to realize faster vision-system detection, further improving processing efficiency and raising the accuracy of visual tire defect detection to a new level.

Claims

1. A tire appearance defect detection method based on deep learning is characterized in that: the method comprises the following steps:

2. The method for detecting tire appearance defects based on deep learning as claimed in claim 1, wherein: the method for establishing the improved Mask R-CNN model comprises the following steps:

3. The method for detecting the appearance defects of the tire based on the deep learning as claimed in claim 2, wherein: the improved Mask R-CNN model comprises a feature extraction network, a region candidate network and a Mask segmentation module, wherein the feature extraction network is respectively connected with the region candidate network and the Mask segmentation module, and the region candidate network is connected with the Mask segmentation module.

4. A tire appearance defect detection method based on deep learning as claimed in claim 3, wherein: the backbone network of the feature extraction network is SE-ResNet50+FPN; the SE-ResNet50 network is obtained by adding the SE attention module after the last small block of the Conv Block and the Identity Block in the ResNet50 network, and is used for extracting features of the input tire image while improving the quality of the feature map; FPN is a feature processing architecture that generates multi-scale feature maps to handle objects of different sizes in target detection.

5. The method for detecting the appearance defects of the tire based on the deep learning as claimed in claim 2, wherein: the SE attention module comprises Squeeze, Excitation and Scale; the feature map is input to both the Squeeze and the Scale; the Squeeze compresses the feature map and outputs the resulting features to the Excitation; the Excitation outputs weights after adaptive adjustment; the weights are input to the Scale, and the Scale performs a weighting operation on the previously input features using the adaptively adjusted weights, and finally outputs the feature map to the next convolution.

6. The method for detecting tire appearance defects based on deep learning as claimed in claim 1, wherein: the Mask segmentation module in the step S3 comprises a coordinate regression network and a Mask branch, wherein the coordinate regression network is used for regressing the coordinates of the finally selected candidate frame and determining the position for the subsequent Mask generation; the Mask branch is used for combining the result of the classification network with the adjusted position of the fully connected layer, and performing segmentation prediction on the target to be detected to generate Mask marks.

7. A mobile terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the deep learning based tire appearance defect detection method according to any one of claims 1 to 6 when executing the program.

8. A computer-readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the steps in the deep learning-based tire appearance defect detection method according to any one of claims 1 to 6.