CN115953386A - MSTA-YOLOv5-based lightweight gear surface defect detection method - Google Patents


Info

Publication number
CN115953386A
Authority
CN
China
Prior art keywords
convolution
msta
surface defect
gear surface
yolov5
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310056291.5A
Other languages
Chinese (zh)
Inventor
闫蕊
张让勇
刘琦
顾笑言
郭文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology, Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Qilu University of Technology
Publication of CN115953386A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of product defect detection and discloses a lightweight gear surface defect detection method based on MSTA-YOLOv5, comprising the following steps: first, gear surface defect images are acquired, labeled, and partitioned to construct a gear surface defect dataset; next, an MSTA-YOLOv5 detection model is constructed and trained on the gear surface defect dataset; finally, the gear defect image to be inspected is fed into the trained MSTA-YOLOv5 detection model to obtain the defect type of the inspected gear. The invention addresses the problems that existing detection models demand excessive computing resources, consume large amounts of memory, and are costly, while enterprises need low-latency models and mobile terminals need small models that are both fast and accurate; it realizes detection and automatic sorting of gear surface defects and improves the efficiency of gear surface defect detection.

Description

MSTA-YOLOv5-based lightweight gear surface defect detection method
Technical Field
The invention relates to the technical field of product defect detection, and in particular to a lightweight gear surface defect detection method based on MSTA-YOLOv5.
Background
With the development of science and technology and changing social demands, large multi-faced workpieces with complex structures have become increasingly common in industrial production. Gears are transmission parts widely used in the machinery industry, and their quality is particularly important in production. In actual production, however, various defects appear on the gear surface owing to factors such as the process flow, production equipment, and site environment; if not handled in time, these defects impair the gear's appearance, performance, and service life, reducing the producer's profit. The gear surface therefore needs to be inspected, but traditional manual inspection is labor-intensive, easily causes visual fatigue in inspectors, and leads to missed and false detections.
In recent years, with the rapid development of machine vision, vision-based inspection has been applied to product surface quality detection. Most existing gear defect detection techniques rely on digital image processing, but such techniques use a single processing mode and algorithm and struggle to extract defect targets effectively from gears with highly complex surfaces, so the detection results are unsatisfactory.
For example, patent document CN115187820A discloses a lightweight target detection method, apparatus, device, and storage medium that uses ShuffleNetv as the feature extraction module in a YOLOv4 network structure; however, its parameter count and computation remain very large, and its feature extraction module uses an SE attention mechanism, whose precision is insufficient.
For example, patent document CN112990325A discloses a lightweight network construction method for embedded real-time visual target detection that uses a CBAM attention mechanism; it has the advantage of being lightweight, but accuracy degrades considerably as the model is lightened, and its Focus slicing operation increases the parameter count, weakening the lightweight advantage.
For example, patent document CN114898171A discloses a real-time target detection method suitable for embedded platforms; it achieves a lightweight effect but still incurs a large loss of precision.
With the development of artificial intelligence, deep learning methods perform excellently on industrial images with complex backgrounds and weak defects and are widely used in image processing and workpiece quality inspection. Deep learning can accurately recognize and segment gear surface defects semantically, reduce interference from the background and other factors, and effectively improve detection accuracy. Although much research has improved various target detection networks for industrial defect detection with considerable success, little work has targeted the small, computation-light models that enterprises need: models that achieve good detection speed and accuracy on low-budget equipment with relatively limited computing power.
Disclosure of Invention
Deep learning methods have greatly improved accuracy in image classification, but current deep-learning-based target detection algorithms are costly because they demand large computing resources and consume memory heavily. To address these problems, and the facts that enterprises need low-latency models and mobile terminals need small models that are fast and accurate, the invention provides a lightweight gear surface defect detection method based on MSTA-YOLOv5, which realizes detection and automatic sorting of gear surface defects and improves the efficiency of gear surface defect detection.
The technical scheme for solving the technical problem of the invention is as follows:
a light-weight gear surface defect detection method based on MSTA-YOLOv5 comprises the following steps: firstly, acquiring a gear surface defect image, marking and dividing the image, and constructing a gear surface defect data set; then constructing an MSTA-YOLOv5 detection model, and training the MSTA-YOLOv5 detection model based on the gear surface defect data set; and finally, sending the gear defect image to be detected into a trained MSTA-YOLOv5 detection model to obtain the defect type of the detected gear.
The MSTA-YOLOv5 detection model comprises the following steps:
an input section: inputting the gear surface defect image into an MSTA-YOLOv5 network, and performing self-adaptive anchor frame calculation and Mosaic9 data enhancement;
a backbone part: the feature extraction backbone network adopts a ShuffleNetv2 architecture and comprises a CBRM operation, a first downsampling layer, a second convolution normalization layer, a second downsampling layer, a third convolution normalization layer, a third downsampling layer and a fourth convolution normalization layer which are sequentially connected; respectively recording 3 gear surface defect feature maps obtained by performing feature extraction on the gear surface defect image subjected to the down-sampling layer processing by utilizing 1-by-1 convolution as S2, S3 and S4;
a neck portion: the Neck portion Neck structure adopts FPN + PAN, the FPN layers transmit strong semantic information from top to bottom, S4 is convoluted by 3 x 3 to obtain a characteristic diagram Q4, Q4 is connected with S3 after being subjected to transposed convolution and up-sampling, and the characteristic diagram Q3 is obtained through 3 x 3 convolution; q3 is connected with S2 after being subjected to transposed convolution and upsampling, and then is subjected to convolution with 3 x 3 to obtain a characteristic diagram which is marked as Q2;
the PAN transmits strong positioning information from bottom to top, the characteristic diagram Q2 is used as a bottom layer characteristic R2, and the R2 is connected with Q3 after being downsampled to obtain a characteristic diagram R3; r3 is connected with Q4 after down-sampling, and the obtained characteristic diagram is marked as R4; r2, R3 and R4 are respectively subjected to convolution with 3 x 3 to obtain characteristic diagrams T2, T3 and T4;
respectively integrating an AMECA attention module behind the last 3C 3 modules of the Neck Neck structure, respectively taking the feature maps T2, T3 and T4 as original input feature maps, respectively passing through a global average pooling module and a global maximum pooling module, adding the two obtained feature maps, compressing spatial information, then using 1 × 1 to convolution learn channel attention information, combining the obtained channel attention information with the original input feature maps, and finally obtaining specific channel attention feature maps D1, D2 and D3;
an output section: and respectively inputting the characteristic diagrams D1, D2 and D3 into a YOLOv5-MSTA detection head network to finally obtain a detection result.
Further, the Mosaic9 data enhancement comprises: a batch of data is first taken from the full dataset; each time, 9 pictures are randomly drawn from it, cropped and scaled at random positions, and composited into a new picture; this process is repeated batch-size times, finally yielding a new batch of batch-size Mosaic9-enhanced pictures that is passed to the neural network for training.
Further, the CBRM operation comprises Conv, BN, ReLU, and MaxPool.
Further, the first, second, and third downsampling layers each include a Shuffle_Block(d) module. Shuffle_Block(d) feeds the input feature into two branches: the left branch has 2 convolutional layers, a 3 × 3 depthwise convolution with stride 2 and a 1 × 1 ordinary convolution; the right branch has three convolutional layers, a 1 × 1 ordinary convolution, a 3 × 3 depthwise convolution with stride 2, and a 1 × 1 ordinary convolution. The left and right branches are spliced by Concat to fuse their features, and finally a channel shuffle operation enables information exchange between the two branches.
Further, the second, third, and fourth convolution normalization layers each include a Shuffle_Block(c) module. Shuffle_Block(c) splits the channels into two branches; following the criterion of reducing model fragmentation, no operation is performed on the left branch, while the right branch has 3 convolutional layers: a 1 × 1 ordinary convolution, a 3 × 3 depthwise convolution, and a 1 × 1 ordinary convolution, all with identical input and output channel counts; the two 1 × 1 convolutions are no longer group convolutions but ordinary convolutions. After the 3 convolutions, the two branches are spliced by Concat.
Further, the transposed convolution upsampling operation comprises:
(1) Insert s-1 rows and columns of zeros between the elements of the input feature map, where s is the stride of the transposed convolution;
(2) Pad k-p-1 rows and columns of zeros around the input feature map, where k is the kernel_size of the transposed convolution and p its padding;
(3) Flip the convolution kernel parameters up-down and left-right;
(4) Perform an ordinary convolution with padding 0 and stride 1.
Further, the process of the AMECA attention module comprises:
(1) First, input a feature map X of dimension H × W × C;
(2) Perform spatial feature compression on X: global average pooling (GAP) over the spatial dimensions yields a 1 × 1 × C feature map F1, and global max pooling (GMP) yields a 1 × 1 × C feature map F2;
(3) Fuse F1 and F2 into a 1 × 1 × C feature map F3, obtaining higher-level semantic information;
(4) Perform channel feature learning on the fused feature map F3: a 1 × 1 convolution learns the importance of the different channels, and the output feature map F4 remains 1 × 1 × C;
(5) Pass the feature map F4 through the σ function to obtain F41;
(6) Finally, apply the channel attention: the channel attention map F41 is multiplied channel-wise with the original input feature map X to output the feature map X' carrying channel attention;
where H, W, and C denote the height, width, and channel count of the input feature map, and σ denotes an activation function;
with the feature maps T2, T3, and T4 as the input feature maps X, the resulting output feature maps X' are the feature maps D1, D2, and D3, respectively.
A computer-readable medium has stored thereon a computer program for executing the method described above.
The invention has the beneficial effects that:
according to the method, the input end is enhanced by adopting Mosaic9 data, so that a small sample target is added while a data set is enriched, and the training speed and the generalization capability of the network are improved; in order to facilitate model deployment, the ShuffleNet 2 is used as a backbone network extraction feature, channel rearrangement realizes cross-group information exchange, a YOLOv5 lightweight neural network model is constructed, the number of network parameters is reduced, and the model detection speed is increased; by adopting the transposition convolution mode to perform upsampling, the upsampling at the semantic level is realized, so that the network can be lighter while the characteristics contain stronger semantic information; and finally, an AMECA attention mechanism is added into the Neck structure, and the information extraction mode of the model channel is adjusted through an attention module, so that the channel characteristics are enhanced, the defect detection is more accurate, the extraction capability of the defect characteristics of the gear is further enhanced, and the detection performance of the defect model of the gear is improved.
Drawings
FIG. 1 is a network architecture diagram of the MSTA-YOLOv5 detection model of the present invention;
FIG. 2 is a flow chart of the Mosaic9 data enhancement of the present invention;
FIG. 3 is a structural diagram of two modules of ShuffleNet v2 of the present invention;
FIG. 4 is a block diagram of an AMECA attention module of the present invention;
FIG. 5 is a network architecture diagram of YOLOv5.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Moreover, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
A lightweight gear surface defect detection method based on MSTA-YOLOv5 comprises the following steps: first, gear surface defect images are acquired, labeled, and partitioned to construct a gear surface defect dataset; next, an MSTA-YOLOv5 detection model is constructed and trained on the gear surface defect dataset; finally, the gear defect image to be inspected is fed into the trained MSTA-YOLOv5 detection model to obtain the defect type of the inspected gear. The gear surface defect types comprise three classes: tooth-bottom black skin, tooth-surface black skin, and bumps.
As shown in fig. 1, the MSTA-YOLOv5 detection model includes:
An input section: the gear surface defect image is input into the MSTA-YOLOv5 network, and adaptive anchor box calculation and Mosaic9 data enhancement are performed;
A backbone section: the feature extraction backbone adopts the ShuffleNetv2 architecture and comprises, connected in sequence, a CBRM operation, a first downsampling layer, a second convolution normalization layer, a second downsampling layer, a third convolution normalization layer, a third downsampling layer, and a fourth convolution normalization layer; the 3 gear surface defect feature maps obtained by applying 1 × 1 convolution for feature extraction to the outputs of the downsampling layers are denoted S2, S3, and S4;
as shown in fig. 5 and fig. 1, based on the traditional YOLOv5 model, the present invention uses the ShuffleNetV2 architecture to replace the CSPDarknet53 as a feature extraction network, and constructs the YOLOv5 lightweight neural network model. The ShuffleNet V2 not only inherits the characteristics of the ShuffleNet grouping volume and the channel rearrangement, but also follows 4 criteria for designing the lightweight network. Under the same condition, the ShuffleNet V2 has higher speed and better accuracy compared with other models. The MSTA-YOLOv5 model inputs the target characteristic quantity extracted by the ShuffleNet V2, adaptively adjusts parameters of the network model according to the loss value returned by each iteration, and can obtain a detection model with the best evaluation index after the loss value converges to be stable. The parameter quantity Parameters and the calculated quantity FLOPs of the model are greatly reduced, and the size of the model is reduced. The comparison of the number of layers, the amount of parameters and the amount of calculations for the two architectures CSPDarknet53 and ShuffleNetv2 is shown in table 1.
TABLE 1 Backbone network comparison

Model          Network layers   Parameters   Computation
CSPDarknet53   270              7.03M        16.0 GFLOPs
ShuffleNetv2   308              3.79M        8.0 GFLOPs
Here FLOPs measures the amount of computation. For a convolutional layer, FLOPs is calculated as:
FLOPs = 2HW(C_in K^2 + 1)C_out    (1)
where C_in is the number of channels of the convolutional layer's input tensor, C_out is the number of channels of its output tensor, and K is the convolution kernel size; dropping the constant terms simplifies this to:
FLOPs = HW(C_in K^2)C_out    (2)
For a convolutional layer, the parameter count is calculated as:
Parameters = C_out × (C_in × K × K + 1)    (3)
where C_out is the number of output channels, C_in the number of input channels, and K the kernel size;
H and W are the height and width of the input feature map.
as shown in fig. 3, two modules of the Shuffle netv2 backbone are Shuffle _ Block (d), shuffle _ Block (c) and Shuffle _ Block (d), which can be distinguished in the yaml configuration file by the step size, where the step size stride =2 in Shuffle _ Block (c) and the step size stride =1 in Shuffle \ Block (d), which are used alternately in the present invention. The method comprises the steps of replacing Focus slices at the input end of an original YOLOv5 network with CBRM, performing convolution by 3 x 3, replacing all Conv + C3 of a backbone network with Shuffle _ Block, and removing SPP and a subsequent C3 structure because the speed is influenced by the parallel operation of the SPP.
A neck section: the Neck structure adopts FPN + PAN. The FPN layers pass strong semantic information from top to bottom: S4 is convolved with 3 × 3 to obtain feature map Q4; Q4 is upsampled by transposed convolution, concatenated with S3, and convolved with 3 × 3 to obtain feature map Q3; Q3 is upsampled by transposed convolution, concatenated with S2, and convolved with 3 × 3 to obtain feature map Q2;
The PAN passes strong localization information from bottom to top: feature map Q2 serves as the bottom feature R2; R2 is downsampled and concatenated with Q3 to obtain feature map R3; R3 is downsampled and concatenated with Q4 to obtain feature map R4; R2, R3, and R4 are each convolved with 3 × 3 to obtain feature maps T2, T3, and T4;
An AMECA attention module is integrated after each of the last 3 C3 modules of the Neck structure. Taking the feature maps T2, T3, and T4 as original input feature maps, each passes through a global average pooling module and a global max pooling module; the two resulting maps are added to compress the spatial information, a 1 × 1 convolution then learns the channel attention information, and the learned channel attention is combined with the original input feature map to finally obtain the channel attention feature maps D1, D2, and D3;
An output section: the feature maps D1, D2, and D3 are fed into the YOLOv5-MSTA detection head network to produce the final detection result.
As shown in fig. 2, the Mosaic9 data enhancement includes: a batch of data is first taken from the full dataset; each time, 9 pictures are randomly drawn from it, cropped and scaled at random positions, and composited into a new picture; this process is repeated batch-size times, finally yielding a new batch of batch-size Mosaic9-enhanced pictures that is passed to the neural network for training. Mosaic9 enhancement randomly crops and scales 9 images and then randomly arranges and stitches them into a single picture, which enriches the dataset, adds small-sample targets, and improves the network's training speed and generalization; during normalization the data of 9 images are computed at once, reducing the model's memory requirements.
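The per-picture composition step can be sketched as follows. This is a minimal illustration assuming a fixed 3 × 3 grid layout and crop-only composition; the text specifies random cropping and scaling of 9 pictures but not the exact geometry, so the grid and cell size here are assumptions.

```python
import numpy as np

def mosaic9(images, cell=213, rng=None):
    """Compose one mosaic picture from 9 randomly chosen, randomly cropped
    source images arranged on a 3x3 grid (grid layout is an assumption)."""
    if rng is None:
        rng = np.random.default_rng(0)
    canvas = np.zeros((cell * 3, cell * 3, 3), dtype=np.uint8)
    for i, idx in enumerate(rng.permutation(len(images))[:9]):
        img = images[idx]
        h, w = img.shape[:2]
        # random crop position; sources are assumed at least cell x cell
        y0 = int(rng.integers(0, h - cell + 1))
        x0 = int(rng.integers(0, w - cell + 1))
        r, c = divmod(i, 3)
        canvas[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = \
            img[y0:y0 + cell, x0:x0 + cell]
    return canvas

# a batch of 9 dummy 700x700 images standing in for gear defect pictures
batch = [np.random.randint(0, 256, (700, 700, 3), dtype=np.uint8) for _ in range(9)]
new_img = mosaic9(batch)
print(new_img.shape)  # (639, 639, 3)
```

Repeating this composition batch-size times yields the Mosaic9-enhanced batch described above.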
Specifically, the CBRM operation comprises Conv, BN, ReLU, and MaxPool.
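A minimal PyTorch sketch of such a stem is below; the channel count, kernel sizes, and strides are assumptions, since the text only names the four operations.

```python
import torch
import torch.nn as nn

class CBRM(nn.Module):
    """Sketch of the CBRM stem: Conv -> BatchNorm -> ReLU -> MaxPool.
    Output channels (32) and the stride-2 conv/pool are assumed values."""
    def __init__(self, c_in=3, c_out=32):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.pool(self.relu(self.bn(self.conv(x))))

x = torch.randn(1, 3, 640, 640)
y = CBRM()(x)
print(y.shape)  # torch.Size([1, 32, 160, 160]): each stride-2 stage halves H and W
```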
Specifically, the first, second, and third downsampling layers each include a Shuffle_Block(d) module. Shuffle_Block(d) no longer splits the channels; the full input feature is fed to two branches. The left branch has 2 convolutional layers, a 3 × 3 depthwise convolution with stride 2 and a 1 × 1 ordinary convolution; the right branch has three convolutional layers, a 1 × 1 ordinary convolution, a 3 × 3 depthwise convolution with stride 2, and a 1 × 1 ordinary convolution. The left and right branches are spliced by Concat to fuse their features, and finally a channel shuffle operation enables information exchange between the two branches. Unlike Shuffle_Block(c), a stride-2 3 × 3 depthwise convolution is introduced on both the left and right sides to achieve downsampling.
Specifically, the second, third, and fourth convolution normalization layers each include a Shuffle_Block(c) module. Shuffle_Block(c) splits the channels into two branches; following the criterion of reducing model fragmentation, no operation is performed on the left branch, while the right branch has 3 convolutional layers: a 1 × 1 ordinary convolution, a 3 × 3 depthwise convolution, and a 1 × 1 ordinary convolution, all with identical input and output channel counts; the two 1 × 1 convolutions are no longer group convolutions but ordinary convolutions. After the 3 convolutions, the two branches are spliced by Concat. This keeps the input and output channels identical; the Concat result of the two branches then undergoes a channel shuffle operation, which enables information exchange between the two branches.
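The two units can be sketched in PyTorch roughly as follows; the BN/ReLU placement and helper layout are assumptions following common ShuffleNetV2-style implementations, not the patent's exact code.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    # rearrange channels so information crosses the two branches
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

def dw(c, stride):  # 3x3 depthwise convolution + BN
    return nn.Sequential(
        nn.Conv2d(c, c, 3, stride, 1, groups=c, bias=False), nn.BatchNorm2d(c))

def pw(c_in, c_out):  # 1x1 ordinary convolution + BN + ReLU
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 1, bias=False), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class ShuffleBlockD(nn.Module):
    """Stride-2 unit: no channel split; both branches see the full input."""
    def __init__(self, c_in, c_out):
        super().__init__()
        mid = c_out // 2
        self.left = nn.Sequential(dw(c_in, 2), pw(c_in, mid))
        self.right = nn.Sequential(pw(c_in, mid), dw(mid, 2), pw(mid, mid))

    def forward(self, x):
        return channel_shuffle(torch.cat([self.left(x), self.right(x)], 1))

class ShuffleBlockC(nn.Module):
    """Stride-1 unit: split channels; left half untouched,
    right half through 1x1 -> depthwise 3x3 -> 1x1."""
    def __init__(self, c):
        super().__init__()
        half = c // 2
        self.right = nn.Sequential(pw(half, half), dw(half, 1), pw(half, half))

    def forward(self, x):
        a, b = x.chunk(2, 1)
        return channel_shuffle(torch.cat([a, self.right(b)], 1))

x = torch.randn(1, 24, 80, 80)
y = ShuffleBlockD(24, 48)(x)  # spatial size halves, channels 24 -> 48
z = ShuffleBlockC(48)(y)      # shape preserved
```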
Specifically, the transposed convolution upsampling operation comprises:
(1) Insert s-1 rows and columns of zeros between the elements of the input feature map, where s is the stride of the transposed convolution;
(2) Pad k-p-1 rows and columns of zeros around the input feature map, where k is the kernel_size of the transposed convolution and p its padding;
(3) Flip the convolution kernel parameters up-down and left-right;
(4) Perform an ordinary convolution with padding 0 and stride 1.
Using transposed convolution for upsampling lets the network learn the optimal upsampling by itself, achieving semantic-level upsampling so that the features carry stronger semantic information. In the transposed convolution computation, each input element is multiplied with the convolution kernel and the products are placed as the upsampled output corresponding to that element; where outputs from different inputs overlap, they are summed directly.
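The four steps can be implemented literally to verify the output size, which should equal (h - 1)s - 2p + k for an h × h input. This is a plain NumPy sketch of the arithmetic, not the network code:

```python
import numpy as np

def transposed_conv2d(x, k, stride, pad):
    """Manual transposed convolution following the four steps in the text."""
    h, w = x.shape
    kh = k.shape[0]
    # (1) insert stride-1 zeros between input elements
    z = np.zeros((h + (h - 1) * (stride - 1), w + (w - 1) * (stride - 1)))
    z[::stride, ::stride] = x
    # (2) pad k - p - 1 zeros around the border
    z = np.pad(z, kh - pad - 1)
    # (3) flip the kernel up-down and left-right
    kf = k[::-1, ::-1]
    # (4) ordinary convolution with padding 0 and stride 1
    oh, ow = z.shape[0] - kh + 1, z.shape[1] - kh + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(z[i:i + kh, j:j + kh] * kf)
    return out

x = np.arange(4.0).reshape(2, 2)  # toy 2x2 "feature map"
k = np.ones((3, 3))               # toy kernel
y = transposed_conv2d(x, k, stride=2, pad=0)
print(y.shape)  # (5, 5), matching (h-1)*s - 2p + k = (2-1)*2 - 0 + 3
```

Each input element contributes its value times every kernel weight exactly once, so the output total equals sum(x)·sum(k), a handy sanity check on the implementation.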
The proposed AMECA attention module is shown in fig. 4, and the feature map passes through a global average pooling module and a global maximum pooling module, and the two obtained feature maps are added to compress spatial information, and then 1 × 1 convolution is used to learn channel attention information, and the obtained channel attention information is combined with an original input feature map to finally obtain a specific channel attention feature map. AMECA avoids dimension reduction, effectively captures cross-channel interaction information, and enables the network to more accurately locate and identify a target area.
The process of the AMECA attention module comprises:
(1) First, input a feature map X of dimension H × W × C;
(2) Perform spatial feature compression on X: global average pooling (GAP) over the spatial dimensions yields a 1 × 1 × C feature map F1, and global max pooling (GMP) yields a 1 × 1 × C feature map F2;
(3) Fuse F1 and F2 into a 1 × 1 × C feature map F3, obtaining higher-level semantic information;
(4) Perform channel feature learning on the fused feature map F3: a 1 × 1 convolution learns the importance of the different channels, and the output feature map F4 remains 1 × 1 × C;
(5) Pass the feature map F4 through the σ function to obtain F41;
(6) Finally, apply the channel attention: the channel attention map F41 is multiplied channel-wise with the original input feature map X to output the feature map X' carrying channel attention;
where H, W, and C denote the height, width, and channel count of the input feature map, and σ denotes an activation function;
with the feature maps T2, T3, and T4 as the input feature maps X, the resulting output feature maps X' are the feature maps D1, D2, and D3, respectively.
In another embodiment, a computer-readable medium has stored thereon a computer program for executing the method described above.
After the lightweight processing, the network's computation and parameter count are greatly reduced; the AMECA attention mechanism introduced after the last 3 C3 modules of the YOLOv5 network's Neck structure adjusts how the model extracts spatial and channel information. The method guards well against loss of precision and can meet the requirement of real-time detection of gear surface defects.
Comparative experiments were performed as shown in table 2. The comparison measures parameter count, computation, and model size; the smaller these are, the lower the network complexity. With all other settings equal, the detection network using the ShuffleNetv2 module has a lower parameter count, computation, and model size than the one without it, confirming ShuffleNetv2's lightweight effect.
TABLE 2 comparison of parameters, calculated quantities, model sizes for different models
According to the detection results and the figures in the table, the MSTA-YOLOv5 model has clear advantages over the YOLOv3, YOLOv4, and YOLOv5s models. Compared with the original YOLOv5s model, its parameters and computation are greatly reduced: the parameter count drops by about 46%, the computation by 50%, and the model size by about 44%. The new model is leaner, its complexity is markedly lower, it meets the requirements of mobile-end deployment, and it achieves better gear surface defect detection results.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the present invention is not limited thereto; various modifications and changes that a person skilled in the art can make without inventive effort based on the technical solutions of the present invention fall within the scope of the present invention.

Claims (7)

1. A lightweight gear surface defect detection method based on MSTA-YOLOv5, characterized by comprising the following steps: first, acquiring gear surface defect images, labeling and partitioning them, and constructing a gear surface defect data set; then constructing an MSTA-YOLOv5 detection model and training it on the gear surface defect data set; finally, feeding the gear defect image to be detected into the trained MSTA-YOLOv5 detection model to obtain the defect type of the detected gear;
the MSTA-YOLOv5 detection model comprises:
an input section: the gear surface defect image is input into the MSTA-YOLOv5 network, and adaptive anchor-box calculation and Mosaic9 data enhancement are performed;
a backbone section: the feature-extraction backbone network adopts the ShuffleNetv2 architecture and comprises, connected in sequence, a CBRM operation, a first downsampling layer, a second convolution normalization layer, a second downsampling layer, a third convolution normalization layer, a third downsampling layer and a fourth convolution normalization layer; the three gear surface defect feature maps obtained by applying 1 x 1 convolution for feature extraction to the downsampled gear surface defect image are denoted S2, S3 and S4;
a neck section: the Neck structure adopts FPN + PAN; the FPN layers propagate strong semantic information top-down: S4 is convolved with a 3 x 3 kernel to obtain the feature map Q4; Q4 is upsampled by transposed convolution and concatenated with S3, then passed through a 3 x 3 convolution to obtain the feature map Q3; Q3 is upsampled by transposed convolution and concatenated with S2, and the feature map obtained by 3 x 3 convolution is denoted Q2;
the PAN propagates strong localization information bottom-up: the feature map Q2 is used as the bottom feature R2; R2 is downsampled and concatenated with Q3 to obtain the feature map R3; R3 is downsampled and concatenated with Q4, and the resulting feature map is denoted R4; R2, R3 and R4 are each convolved with a 3 x 3 kernel to obtain the feature maps T2, T3 and T4;
an AMECA attention module is integrated after each of the last three C3 modules of the Neck structure; the feature maps T2, T3 and T4 are taken as the original input feature maps and passed through a global average pooling module and a global maximum pooling module; the two resulting feature maps are added to compress the spatial information, a 1 x 1 convolution then learns the channel attention information, the channel attention information is combined with the original input feature map, and the channel attention feature maps D1, D2 and D3 are finally obtained;
an output section: the feature maps D1, D2 and D3 are input into the YOLOv5-MSTA detection head network to obtain the final detection result.
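The FPN + PAN wiring described in claim 1 can be sketched at the shape level. In this illustrative NumPy sketch the 3 x 3 convolutions are replaced by identities, the transposed-convolution upsampling by nearest-neighbour repetition, and the downsampling by stride-2 subsampling; the channel counts in the usage example are arbitrary assumptions, so only the topology and tensor shapes are meaningful.

```python
import numpy as np

def up2(x):
    # 2x nearest-neighbour upsample, a stand-in for transposed convolution
    return x.repeat(2, axis=1).repeat(2, axis=2)

def down2(x):
    # stride-2 subsample, a stand-in for the downsampling convolution
    return x[:, ::2, ::2]

def neck(s2, s3, s4):
    """Shape-level sketch of the FPN + PAN topology in claim 1.

    Inputs s2, s3, s4 are (C, H, W) arrays at decreasing resolution.
    """
    # FPN: top-down path with strong semantic information
    q4 = s4                                       # 3x3 conv stand-in
    q3 = np.concatenate([up2(q4), s3], axis=0)    # upsample Q4, connect with S3
    q2 = np.concatenate([up2(q3), s2], axis=0)    # upsample Q3, connect with S2
    # PAN: bottom-up path with strong localization information
    r2 = q2
    r3 = np.concatenate([down2(r2), q3], axis=0)  # downsample R2, connect with Q3
    r4 = np.concatenate([down2(r3), q4], axis=0)  # downsample R3, connect with Q4
    return r2, r3, r4                             # -> T2, T3, T4 after 3x3 convs
```

Running it on dummy tensors confirms that each PAN output keeps the spatial size of its FPN counterpart while accumulating channels from both paths.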
2. The MSTA-YOLOv5-based lightweight gear surface defect detection method according to claim 1, wherein the Mosaic9 data enhancement comprises: first taking a batch of data from the full data set; each time, randomly selecting 9 pictures, cropping and scaling them at random positions, and compositing them into a new picture; this process is repeated batch-size times to finally obtain a new batch of batch-size Mosaic9-enhanced pictures, which is passed to the neural network for training.
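The Mosaic9 step of claim 2 can be sketched as follows. This is a simplified illustration, not the patented implementation: the fixed 3 x 3 grid, the equal cell size and the padding strategy are assumptions, whereas the original crops and scales at random positions.

```python
import numpy as np

def mosaic9(images, out_size=640, seed=None):
    """Compose 9 images into one mosaic on a 3x3 grid (simplified sketch).

    Each source image is padded if needed, randomly cropped to the cell
    size, and placed into its grid cell; label handling is omitted.
    """
    rng = np.random.default_rng(seed)
    cell = out_size // 3
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for idx, img in enumerate(images[:9]):
        h, w = img.shape[:2]
        # pad bottom/right so a cell-sized crop always fits
        pad_h, pad_w = max(cell - h, 0), max(cell - w, 0)
        img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)))
        # random crop position inside the (padded) source image
        y = rng.integers(0, img.shape[0] - cell + 1)
        x = rng.integers(0, img.shape[1] - cell + 1)
        r, c = divmod(idx, 3)
        canvas[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = \
            img[y:y + cell, x:x + cell]
    return canvas
```

Repeating this composition batch-size times yields the enhanced batch described in the claim.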
3. The MSTA-YOLOv5-based lightweight gear surface defect detection method according to claim 1, wherein the CBRM operation comprises Conv, BN, ReLU and MaxPool.
4. The MSTA-YOLOv5-based lightweight gear surface defect detection method according to claim 1, wherein the first, second and third downsampling layers each comprise a Shuffle_Block(d) module; the Shuffle_Block(d) module splits the input feature into two branches; the left branch has two convolution layers, a 3 x 3 depthwise convolution with stride 2 and a 1 x 1 ordinary convolution; the right branch has three convolution layers, a 1 x 1 ordinary convolution, a 3 x 3 depthwise convolution with stride 2, and a 1 x 1 ordinary convolution; the left and right branches are spliced by Concat to fuse their features, and a channel shuffle operation finally enables information exchange between the two branches;
the second, third and fourth convolution normalization layers each comprise a Shuffle_Block(c) module; the Shuffle_Block(c) module performs a channel split, dividing the channels into two branches; following the criterion of reducing model fragmentation, no operation is performed on the left branch; the right branch has three convolution layers, a 1 x 1 ordinary convolution, a 3 x 3 depthwise convolution and a 1 x 1 ordinary convolution, all with the same number of input and output channels; the two 1 x 1 convolutions are ordinary convolutions rather than group convolutions; after the three convolutions, the two branches are spliced by Concat.
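The channel shuffle operation that closes both Shuffle_Block variants can be sketched directly; this is the standard ShuffleNetv2 reshape-transpose-reshape trick, shown here in NumPy on an (N, C, H, W) tensor.

```python
import numpy as np

def channel_shuffle(x, groups=2):
    """Interleave channels across groups so the two Concat branches
    exchange information (ShuffleNetv2 channel shuffle).

    x: array of shape (N, C, H, W), with C divisible by `groups`.
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    return (x.reshape(n, groups, c // groups, h, w)   # split into groups
             .transpose(0, 2, 1, 3, 4)                # swap group axes
             .reshape(n, c, h, w))                    # flatten back
```

With 8 channels in 2 groups, channels [0..3 | 4..7] become the interleaving [0, 4, 1, 5, 2, 6, 3, 7], so each half of the output mixes both branches.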
5. The MSTA-YOLOv5-based lightweight gear surface defect detection method according to claim 1, wherein the transposed-convolution upsampling comprises the following steps:
(1) inserting s-1 rows and columns of zeros between the elements of the input feature map, where s denotes the stride of the transposed convolution;
(2) padding k-p-1 rows and columns of zeros around the input feature map, where k denotes the kernel_size of the transposed convolution and p its padding;
(3) flipping the convolution kernel parameters up-down and left-right;
(4) performing a normal convolution operation with padding 0 and stride 1.
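The four steps of claim 5 can be executed literally on a single-channel 2-D array. This NumPy sketch follows the recipe step by step; the output size obeys the usual transposed-convolution formula (h-1)·s - 2p + k.

```python
import numpy as np

def transposed_conv2d(x, k, s=2, p=0):
    """Transposed convolution via the four steps of claim 5.

    x: 2-D input feature map; k: 2-D kernel; s: stride; p: padding.
    """
    h, w = x.shape
    kh, kw = k.shape
    # step 1: insert s-1 zero rows/columns between input elements
    up = np.zeros((h + (h - 1) * (s - 1), w + (w - 1) * (s - 1)))
    up[::s, ::s] = x
    # step 2: pad k - p - 1 zero rows/columns around the border
    up = np.pad(up, kh - p - 1)
    # step 3: flip the kernel up-down and left-right
    kf = k[::-1, ::-1]
    # step 4: normal convolution, padding 0, stride 1
    oh, ow = up.shape[0] - kh + 1, up.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(up[i:i + kh, j:j + kw] * kf)
    return out
```

For a 2 x 2 input, stride 2 and a 2 x 2 all-ones kernel, each input element is spread over a 2 x 2 block of the 4 x 4 output, which is exactly the doubling behaviour used in the neck.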
6. The MSTA-YOLOv5-based lightweight gear surface defect detection method according to claim 1, wherein the AMECA attention module process comprises:
(1) inputting a feature map X of dimensions H x W x C;
(2) performing spatial feature compression on the input feature map X: global average pooling (GAP) over the spatial dimensions yields a 1 x 1 x C feature map F1, and global maximum pooling (GMP) yields a 1 x 1 x C feature map F2;
(3) fusing F1 and F2 to obtain a 1 x 1 x C feature map F3 containing higher-level semantic information;
(4) performing channel feature learning on the fused feature map F3: a 1 x 1 convolution learns the importance of the different channels, and the output feature map F4 still has dimensions 1 x 1 x C;
(5) passing the feature map F4 through the sigma function to obtain F41;
(6) finally applying the channel attention: the channel attention map F41 is multiplied channel-wise with the original input feature map X to output a feature map X' carrying channel attention;
where H, W and C denote the height, width and number of channels of the input feature map, and sigma denotes the activation function;
the feature maps T2, T3 and T4 are used as the input feature map X, and the resulting output feature maps X' are the feature maps D1, D2 and D3, respectively.
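The six AMECA steps of claim 6 can be sketched in NumPy on a single (C, H, W) tensor. The 1 x 1 convolution is represented here by an assumed dense channel-mixing weight matrix `w` of shape (C, C) standing in for the learned layer, and sigma is taken to be the sigmoid function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ameca_attention(x, w):
    """AMECA channel attention following the steps of claim 6 (sketch).

    x: input feature map of shape (C, H, W);
    w: (C, C) weight matrix standing in for the 1x1 convolution.
    """
    f1 = x.mean(axis=(1, 2))        # (2) GAP  -> F1, shape (C,)
    f2 = x.max(axis=(1, 2))         # (2) GMP  -> F2, shape (C,)
    f3 = f1 + f2                    # (3) fuse spatially compressed maps
    f4 = w @ f3                     # (4) channel feature learning
    f41 = sigmoid(f4)               # (5) sigma activation
    return x * f41[:, None, None]   # (6) channel-wise product with X
```

For an all-ones input and an identity weight matrix, GAP and GMP both return 1 per channel, so every channel is rescaled by sigmoid(2).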
7. A computer-readable medium, characterized in that a computer program is stored thereon for performing the method according to any one of claims 1-6.
CN202310056291.5A 2023-01-06 2023-01-18 MSTA-YOLOv 5-based lightweight gear surface defect detection method Pending CN115953386A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310017224 2023-01-06
CN2023100172242 2023-01-06

Publications (1)

Publication Number Publication Date
CN115953386A true CN115953386A (en) 2023-04-11

Family

ID=87290776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310056291.5A Pending CN115953386A (en) 2023-01-06 2023-01-18 MSTA-YOLOv 5-based lightweight gear surface defect detection method

Country Status (1)

Country Link
CN (1) CN115953386A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502950A (en) * 2023-04-26 2023-07-28 佛山科学技术学院 Defect detection method based on federal learning and related equipment
CN116721071A (en) * 2023-06-05 2023-09-08 南京邮电大学 Industrial product surface defect detection method and device based on weak supervision
CN116721071B (en) * 2023-06-05 2024-08-06 南京邮电大学 Industrial product surface defect detection method and device based on weak supervision

Similar Documents

Publication Publication Date Title
CN111325751B (en) CT image segmentation system based on attention convolution neural network
CN109241972B (en) Image semantic segmentation method based on deep learning
CN109190752B (en) Image semantic segmentation method based on global features and local features of deep learning
CN113850824B (en) Remote sensing image road network extraction method based on multi-scale feature fusion
CN115953386A (en) MSTA-YOLOv 5-based lightweight gear surface defect detection method
CN111144329B (en) Multi-label-based lightweight rapid crowd counting method
CN110046550B (en) Pedestrian attribute identification system and method based on multilayer feature learning
CN110223304B (en) Image segmentation method and device based on multipath aggregation and computer-readable storage medium
CN113239930A (en) Method, system and device for identifying defects of cellophane and storage medium
CN112767423B (en) Remote sensing image building segmentation method based on improved SegNet
CN110020658B (en) Salient object detection method based on multitask deep learning
CN115082928B (en) Method for asymmetric double-branch real-time semantic segmentation network facing complex scene
CN113066065B (en) No-reference image quality detection method, system, terminal and medium
CN116416497B (en) Bearing fault diagnosis system and method
CN115205147A (en) Multi-scale optimization low-illumination image enhancement method based on Transformer
CN110866938B (en) Full-automatic video moving object segmentation method
CN111798469A (en) Digital image small data set semantic segmentation method based on deep convolutional neural network
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN114821058A (en) Image semantic segmentation method and device, electronic equipment and storage medium
CN113763327A (en) CBAM-Res _ Unet-based power plant pipeline high-pressure steam leakage detection method
CN113436198A (en) Remote sensing image semantic segmentation method for collaborative image super-resolution reconstruction
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
CN111160378A (en) Depth estimation system based on single image multitask enhancement
CN118397367A (en) Tampering detection method based on convolution vision Mamba
CN111914853B (en) Feature extraction method for stereo matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination