CN115953386A - MSTA-YOLOv5-based lightweight gear surface defect detection method - Google Patents


Info

Publication number
CN115953386A
Authority
CN
China
Prior art keywords
convolution
msta
surface defect
gear surface
yolov5
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310056291.5A
Other languages
Chinese (zh)
Inventor
闫蕊
张让勇
刘琦
顾笑言
郭文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Original Assignee
Qilu University of Technology
Shandong Computer Science Center National Super Computing Center in Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology, Shandong Computer Science Center National Super Computing Center in Jinan filed Critical Qilu University of Technology
Publication of CN115953386A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of product defect detection and discloses a lightweight gear surface defect detection method based on MSTA-YOLOv5, comprising the following steps: first, gear surface defect images are acquired, labeled, and partitioned to construct a gear surface defect dataset; next, an MSTA-YOLOv5 detection model is constructed and trained on the gear surface defect dataset; finally, the gear defect image to be inspected is fed into the trained MSTA-YOLOv5 detection model to obtain the defect type of the inspected gear. The invention addresses the problems that existing detection models demand excessive computing resources, consume large amounts of memory, and are costly, while enterprises need low-latency models and mobile terminals need small models that are both fast and accurate; it realizes detection and automatic sorting of gear surface defects and improves the efficiency of gear surface defect detection.

Description

MSTA-YOLOv5-based lightweight gear surface defect detection method
Technical Field
The invention relates to the technical field of product defect detection, and in particular to a lightweight gear surface defect detection method based on MSTA-YOLOv5.
Background
With the development of science and technology and changing social demands, large multi-faced workpieces with complex structures have become increasingly common in industrial production. Gears are transmission parts widely used in the machinery industry, and their quality is particularly important in production. In actual production, however, various defects appear on the gear surface owing to factors such as the process flow, production equipment, and site environment; if not handled in time, these defects impair the gear's appearance, performance, and service life, reducing the producer's profit. The gear surface therefore needs to be inspected, but traditional manual inspection is labor-intensive, easily causes visual fatigue in inspectors, and leads to missed and false detections.
In recent years, with the rapid development of machine vision, vision-based inspection has been applied to product surface quality detection. Most existing gear defect detection techniques rely on digital image processing, but such techniques use a single processing mode and algorithm and struggle to extract defect targets effectively from gears with highly complex surfaces, so the detection results are unsatisfactory.
For example, patent document CN115187820A discloses a lightweight target detection method, apparatus, device, and storage medium that uses ShuffleNetv as the feature extraction module in a YOLOv4 network structure; however, its parameter count and computation remain very large, and its feature extraction module uses an SE attention mechanism, whose precision is insufficient.
For example, patent document CN112990325A discloses a lightweight network construction method for embedded real-time visual target detection that uses a CBAM attention mechanism; it has the advantage of being lightweight, but accuracy degrades considerably as the model is lightened, and its Focus slicing operation increases the parameter count, weakening the lightweight advantage.
For example, patent document CN114898171A discloses a real-time target detection method suitable for embedded platforms; it achieves a lightweight effect but still incurs a large loss of precision.
With the development of artificial intelligence, deep learning methods perform excellently on industrial images with complex backgrounds and weak defects and are widely used in image processing and workpiece quality inspection. Deep learning can accurately recognize and segment gear surface defects semantically, reduce interference from the background and other factors, and effectively improve detection accuracy. Although much research has improved various target detection networks for industrial defect detection with considerable success, little work has targeted the small, computation-light models that enterprises need: models that achieve good detection speed and accuracy on low-budget equipment with relatively limited computing power.
Disclosure of Invention
Deep learning methods have greatly improved accuracy in image classification, but current deep-learning-based target detection algorithms are costly because they demand large computing resources and consume memory heavily. To address these problems, and the facts that enterprises need low-latency models and mobile terminals need small models that are fast and accurate, the invention provides a lightweight gear surface defect detection method based on MSTA-YOLOv5, which realizes detection and automatic sorting of gear surface defects and improves the efficiency of gear surface defect detection.
The technical scheme for solving the technical problem of the invention is as follows:
a light-weight gear surface defect detection method based on MSTA-YOLOv5 comprises the following steps: firstly, acquiring a gear surface defect image, marking and dividing the image, and constructing a gear surface defect data set; then constructing an MSTA-YOLOv5 detection model, and training the MSTA-YOLOv5 detection model based on the gear surface defect data set; and finally, sending the gear defect image to be detected into a trained MSTA-YOLOv5 detection model to obtain the defect type of the detected gear.
The MSTA-YOLOv5 detection model comprises the following steps:
an input section: inputting the gear surface defect image into an MSTA-YOLOv5 network, and performing self-adaptive anchor frame calculation and Mosaic9 data enhancement;
a backbone part: the feature extraction backbone network adopts a ShuffleNetv2 architecture and comprises a CBRM operation, a first downsampling layer, a second convolution normalization layer, a second downsampling layer, a third convolution normalization layer, a third downsampling layer and a fourth convolution normalization layer which are sequentially connected; respectively recording 3 gear surface defect feature maps obtained by performing feature extraction on the gear surface defect image subjected to the down-sampling layer processing by utilizing 1-by-1 convolution as S2, S3 and S4;
a neck portion: the Neck portion Neck structure adopts FPN + PAN, the FPN layers transmit strong semantic information from top to bottom, S4 is convoluted by 3 x 3 to obtain a characteristic diagram Q4, Q4 is connected with S3 after being subjected to transposed convolution and up-sampling, and the characteristic diagram Q3 is obtained through 3 x 3 convolution; q3 is connected with S2 after being subjected to transposed convolution and upsampling, and then is subjected to convolution with 3 x 3 to obtain a characteristic diagram which is marked as Q2;
the PAN transmits strong positioning information from bottom to top, the characteristic diagram Q2 is used as a bottom layer characteristic R2, and the R2 is connected with Q3 after being downsampled to obtain a characteristic diagram R3; r3 is connected with Q4 after down-sampling, and the obtained characteristic diagram is marked as R4; r2, R3 and R4 are respectively subjected to convolution with 3 x 3 to obtain characteristic diagrams T2, T3 and T4;
respectively integrating an AMECA attention module behind the last 3C 3 modules of the Neck Neck structure, respectively taking the feature maps T2, T3 and T4 as original input feature maps, respectively passing through a global average pooling module and a global maximum pooling module, adding the two obtained feature maps, compressing spatial information, then using 1 × 1 to convolution learn channel attention information, combining the obtained channel attention information with the original input feature maps, and finally obtaining specific channel attention feature maps D1, D2 and D3;
an output section: and respectively inputting the characteristic diagrams D1, D2 and D3 into a YOLOv5-MSTA detection head network to finally obtain a detection result.
Further, the Mosaic9 data enhancement comprises: a batch of data is first taken from the full dataset; each time, 9 pictures are randomly drawn from it, cropped and scaled at random positions, and composited into a new picture; this process is repeated batch-size times, finally yielding a new batch of batch-size Mosaic9-enhanced pictures that is passed to the neural network for training.
Further, the CBRM operation comprises Conv, BN, ReLU, and MaxPool.
Further, the first, second, and third downsampling layers each include a Shuffle_Block(d) module. Shuffle_Block(d) feeds the input feature into two branches: the left branch has 2 convolutional layers, a 3 × 3 depthwise convolution with stride 2 and a 1 × 1 ordinary convolution; the right branch has three convolutional layers, a 1 × 1 ordinary convolution, a 3 × 3 depthwise convolution with stride 2, and a 1 × 1 ordinary convolution. The left and right branches are spliced by Concat to fuse their features, and finally a channel shuffle operation enables information exchange between the two branches.
Further, the second, third, and fourth convolution normalization layers each include a Shuffle_Block(c) module. Shuffle_Block(c) splits the channels into two branches; following the criterion of reducing model fragmentation, no operation is performed on the left branch, while the right branch has 3 convolutional layers: a 1 × 1 ordinary convolution, a 3 × 3 depthwise convolution, and a 1 × 1 ordinary convolution, all with identical input and output channel counts; the two 1 × 1 convolutions are no longer group convolutions but ordinary convolutions. After the 3 convolutions, the two branches are spliced by Concat.
Further, the transposed convolution upsampling operation comprises:
(1) Insert s-1 rows and columns of zeros between the elements of the input feature map, where s is the stride of the transposed convolution;
(2) Pad k-p-1 rows and columns of zeros around the input feature map, where k is the kernel_size of the transposed convolution and p its padding;
(3) Flip the convolution kernel parameters up-down and left-right;
(4) Perform an ordinary convolution with padding 0 and stride 1.
Further, the process of the AMECA attention module comprises:
(1) First, input a feature map X of dimension H × W × C;
(2) Perform spatial feature compression on X: global average pooling (GAP) over the spatial dimensions yields a 1 × 1 × C feature map F1, and global max pooling (GMP) yields a 1 × 1 × C feature map F2;
(3) Fuse F1 and F2 into a 1 × 1 × C feature map F3, obtaining higher-level semantic information;
(4) Perform channel feature learning on the fused feature map F3: a 1 × 1 convolution learns the importance of the different channels, and the output feature map F4 remains 1 × 1 × C;
(5) Pass the feature map F4 through the σ function to obtain F41;
(6) Finally, apply the channel attention: the channel attention map F41 is multiplied channel-wise with the original input feature map X to output the feature map X' carrying channel attention;
where H, W, and C denote the height, width, and channel count of the input feature map, and σ denotes an activation function;
with the feature maps T2, T3, and T4 as the input feature maps X, the resulting output feature maps X' are the feature maps D1, D2, and D3, respectively.
A computer-readable medium has stored thereon a computer program for executing the method described above.
The invention has the beneficial effects that:
according to the method, the input end is enhanced by adopting Mosaic9 data, so that a small sample target is added while a data set is enriched, and the training speed and the generalization capability of the network are improved; in order to facilitate model deployment, the ShuffleNet 2 is used as a backbone network extraction feature, channel rearrangement realizes cross-group information exchange, a YOLOv5 lightweight neural network model is constructed, the number of network parameters is reduced, and the model detection speed is increased; by adopting the transposition convolution mode to perform upsampling, the upsampling at the semantic level is realized, so that the network can be lighter while the characteristics contain stronger semantic information; and finally, an AMECA attention mechanism is added into the Neck structure, and the information extraction mode of the model channel is adjusted through an attention module, so that the channel characteristics are enhanced, the defect detection is more accurate, the extraction capability of the defect characteristics of the gear is further enhanced, and the detection performance of the defect model of the gear is improved.
Drawings
FIG. 1 is a network architecture diagram of the MSTA-YOLOv5 detection model of the present invention;
FIG. 2 is a flow chart of the Mosaic9 data enhancement of the present invention;
FIG. 3 is a structural diagram of two modules of ShuffleNet v2 of the present invention;
FIG. 4 is a block diagram of an AMECA attention module of the present invention;
FIG. 5 is a network architecture diagram of YOLOv5.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Moreover, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
A lightweight gear surface defect detection method based on MSTA-YOLOv5 comprises the following steps: first, gear surface defect images are acquired, labeled, and partitioned to construct a gear surface defect dataset; next, an MSTA-YOLOv5 detection model is constructed and trained on the gear surface defect dataset; finally, the gear defect image to be inspected is fed into the trained MSTA-YOLOv5 detection model to obtain the defect type of the inspected gear. The gear surface defect types comprise three classes: tooth-bottom black skin, tooth-surface black skin, and bumps.
As shown in fig. 1, the MSTA-YOLOv5 detection model includes:
An input section: the gear surface defect image is input into the MSTA-YOLOv5 network, and adaptive anchor box calculation and Mosaic9 data enhancement are performed;
A backbone section: the feature extraction backbone adopts the ShuffleNetv2 architecture and comprises, connected in sequence, a CBRM operation, a first downsampling layer, a second convolution normalization layer, a second downsampling layer, a third convolution normalization layer, a third downsampling layer, and a fourth convolution normalization layer; the 3 gear surface defect feature maps obtained by applying 1 × 1 convolution for feature extraction to the outputs of the downsampling layers are denoted S2, S3, and S4;
as shown in fig. 5 and fig. 1, based on the traditional YOLOv5 model, the present invention uses the ShuffleNetV2 architecture to replace the CSPDarknet53 as a feature extraction network, and constructs the YOLOv5 lightweight neural network model. The ShuffleNet V2 not only inherits the characteristics of the ShuffleNet grouping volume and the channel rearrangement, but also follows 4 criteria for designing the lightweight network. Under the same condition, the ShuffleNet V2 has higher speed and better accuracy compared with other models. The MSTA-YOLOv5 model inputs the target characteristic quantity extracted by the ShuffleNet V2, adaptively adjusts parameters of the network model according to the loss value returned by each iteration, and can obtain a detection model with the best evaluation index after the loss value converges to be stable. The parameter quantity Parameters and the calculated quantity FLOPs of the model are greatly reduced, and the size of the model is reduced. The comparison of the number of layers, the amount of parameters and the amount of calculations for the two architectures CSPDarknet53 and ShuffleNetv2 is shown in table 1.
TABLE 1 Backbone network comparison

Model          Network layers   Parameters   Computation
CSPDarknet53   270              7.03M        16.0 GFLOPs
ShuffleNetv2   308              3.79M        8.0 GFLOPs
Here FLOPs measures the amount of computation. For a convolutional layer, FLOPs is calculated as:
FLOPs = 2HW(C_in K^2 + 1)C_out    (1)
where C_in is the number of channels of the convolutional layer's input tensor, C_out is the number of channels of its output tensor, and K is the convolution kernel size; dropping the constant terms simplifies this to:
FLOPs = HW(C_in K^2)C_out    (2)
For a convolutional layer, the parameter count is calculated as:
Parameters = C_out × (C_in × K × K + 1)    (3)
where C_out is the number of output channels, C_in the number of input channels, and K the kernel size;
H and W are the height and width of the input feature map.
as shown in fig. 3, two modules of the Shuffle netv2 backbone are Shuffle _ Block (d), shuffle _ Block (c) and Shuffle _ Block (d), which can be distinguished in the yaml configuration file by the step size, where the step size stride =2 in Shuffle _ Block (c) and the step size stride =1 in Shuffle \ Block (d), which are used alternately in the present invention. The method comprises the steps of replacing Focus slices at the input end of an original YOLOv5 network with CBRM, performing convolution by 3 x 3, replacing all Conv + C3 of a backbone network with Shuffle _ Block, and removing SPP and a subsequent C3 structure because the speed is influenced by the parallel operation of the SPP.
A neck section: the Neck structure adopts FPN + PAN. The FPN layers pass strong semantic information from top to bottom: S4 is convolved with 3 × 3 to obtain feature map Q4; Q4 is upsampled by transposed convolution, concatenated with S3, and convolved with 3 × 3 to obtain feature map Q3; Q3 is upsampled by transposed convolution, concatenated with S2, and convolved with 3 × 3 to obtain feature map Q2;
The PAN passes strong localization information from bottom to top: feature map Q2 serves as the bottom feature R2; R2 is downsampled and concatenated with Q3 to obtain feature map R3; R3 is downsampled and concatenated with Q4 to obtain feature map R4; R2, R3, and R4 are each convolved with 3 × 3 to obtain feature maps T2, T3, and T4;
An AMECA attention module is integrated after each of the last 3 C3 modules of the Neck structure. Taking the feature maps T2, T3, and T4 as original input feature maps, each passes through a global average pooling module and a global max pooling module; the two resulting maps are added to compress the spatial information, a 1 × 1 convolution then learns the channel attention information, and the learned channel attention is combined with the original input feature map to finally obtain the channel attention feature maps D1, D2, and D3;
An output section: the feature maps D1, D2, and D3 are fed into the YOLOv5-MSTA detection head network to produce the final detection result.
As shown in fig. 2, the Mosaic9 data enhancement includes: a batch of data is first taken from the full dataset; each time, 9 pictures are randomly drawn from it, cropped and scaled at random positions, and composited into a new picture; this process is repeated batch-size times, finally yielding a new batch of batch-size Mosaic9-enhanced pictures that is passed to the neural network for training. Mosaic9 enhancement randomly crops and scales 9 images and then randomly arranges and stitches them into a single picture, which enriches the dataset, adds small-sample targets, and improves the network's training speed and generalization; during normalization the data of 9 images are computed at once, reducing the model's memory requirements.
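The per-picture composition step can be sketched as follows. This is a minimal illustration assuming a fixed 3 × 3 grid layout and crop-only composition; the text specifies random cropping and scaling of 9 pictures but not the exact geometry, so the grid and cell size here are assumptions.

```python
import numpy as np

def mosaic9(images, cell=213, rng=None):
    """Compose one mosaic picture from 9 randomly chosen, randomly cropped
    source images arranged on a 3x3 grid (grid layout is an assumption)."""
    if rng is None:
        rng = np.random.default_rng(0)
    canvas = np.zeros((cell * 3, cell * 3, 3), dtype=np.uint8)
    for i, idx in enumerate(rng.permutation(len(images))[:9]):
        img = images[idx]
        h, w = img.shape[:2]
        # random crop position; sources are assumed at least cell x cell
        y0 = int(rng.integers(0, h - cell + 1))
        x0 = int(rng.integers(0, w - cell + 1))
        r, c = divmod(i, 3)
        canvas[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = \
            img[y0:y0 + cell, x0:x0 + cell]
    return canvas

# a batch of 9 dummy 700x700 images standing in for gear defect pictures
batch = [np.random.randint(0, 256, (700, 700, 3), dtype=np.uint8) for _ in range(9)]
new_img = mosaic9(batch)
print(new_img.shape)  # (639, 639, 3)
```

Repeating this composition batch-size times yields the Mosaic9-enhanced batch described above.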
Specifically, the CBRM operation comprises Conv, BN, ReLU, and MaxPool.
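A minimal PyTorch sketch of such a stem is below; the channel count, kernel sizes, and strides are assumptions, since the text only names the four operations.

```python
import torch
import torch.nn as nn

class CBRM(nn.Module):
    """Sketch of the CBRM stem: Conv -> BatchNorm -> ReLU -> MaxPool.
    Output channels (32) and the stride-2 conv/pool are assumed values."""
    def __init__(self, c_in=3, c_out=32):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.relu = nn.ReLU(inplace=True)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.pool(self.relu(self.bn(self.conv(x))))

x = torch.randn(1, 3, 640, 640)
y = CBRM()(x)
print(y.shape)  # torch.Size([1, 32, 160, 160]): each stride-2 stage halves H and W
```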
Specifically, the first, second, and third downsampling layers each include a Shuffle_Block(d) module. Shuffle_Block(d) no longer splits the channels; the full input feature is fed to two branches. The left branch has 2 convolutional layers, a 3 × 3 depthwise convolution with stride 2 and a 1 × 1 ordinary convolution; the right branch has three convolutional layers, a 1 × 1 ordinary convolution, a 3 × 3 depthwise convolution with stride 2, and a 1 × 1 ordinary convolution. The left and right branches are spliced by Concat to fuse their features, and finally a channel shuffle operation enables information exchange between the two branches. Unlike Shuffle_Block(c), a stride-2 3 × 3 depthwise convolution is introduced on both the left and right sides to achieve downsampling.
Specifically, the second, third, and fourth convolution normalization layers each include a Shuffle_Block(c) module. Shuffle_Block(c) splits the channels into two branches; following the criterion of reducing model fragmentation, no operation is performed on the left branch, while the right branch has 3 convolutional layers: a 1 × 1 ordinary convolution, a 3 × 3 depthwise convolution, and a 1 × 1 ordinary convolution, all with identical input and output channel counts; the two 1 × 1 convolutions are no longer group convolutions but ordinary convolutions. After the 3 convolutions, the two branches are spliced by Concat. This keeps the input and output channels identical; the Concat result of the two branches then undergoes a channel shuffle operation, which enables information exchange between the two branches.
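The two units can be sketched in PyTorch roughly as follows; the BN/ReLU placement and helper layout are assumptions following common ShuffleNetV2-style implementations, not the patent's exact code.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    # rearrange channels so information crosses the two branches
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

def dw(c, stride):  # 3x3 depthwise convolution + BN
    return nn.Sequential(
        nn.Conv2d(c, c, 3, stride, 1, groups=c, bias=False), nn.BatchNorm2d(c))

def pw(c_in, c_out):  # 1x1 ordinary convolution + BN + ReLU
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 1, bias=False), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class ShuffleBlockD(nn.Module):
    """Stride-2 unit: no channel split; both branches see the full input."""
    def __init__(self, c_in, c_out):
        super().__init__()
        mid = c_out // 2
        self.left = nn.Sequential(dw(c_in, 2), pw(c_in, mid))
        self.right = nn.Sequential(pw(c_in, mid), dw(mid, 2), pw(mid, mid))

    def forward(self, x):
        return channel_shuffle(torch.cat([self.left(x), self.right(x)], 1))

class ShuffleBlockC(nn.Module):
    """Stride-1 unit: split channels; left half untouched,
    right half through 1x1 -> depthwise 3x3 -> 1x1."""
    def __init__(self, c):
        super().__init__()
        half = c // 2
        self.right = nn.Sequential(pw(half, half), dw(half, 1), pw(half, half))

    def forward(self, x):
        a, b = x.chunk(2, 1)
        return channel_shuffle(torch.cat([a, self.right(b)], 1))

x = torch.randn(1, 24, 80, 80)
y = ShuffleBlockD(24, 48)(x)  # spatial size halves, channels 24 -> 48
z = ShuffleBlockC(48)(y)      # shape preserved
```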
Specifically, the transposed convolution upsampling operation comprises:
(1) Insert s-1 rows and columns of zeros between the elements of the input feature map, where s is the stride of the transposed convolution;
(2) Pad k-p-1 rows and columns of zeros around the input feature map, where k is the kernel_size of the transposed convolution and p its padding;
(3) Flip the convolution kernel parameters up-down and left-right;
(4) Perform an ordinary convolution with padding 0 and stride 1.
Using transposed convolution for upsampling lets the network learn the optimal upsampling by itself, achieving semantic-level upsampling so that the features carry stronger semantic information. In the transposed convolution computation, each input element is multiplied with the convolution kernel and the products are placed as the upsampled output corresponding to that element; where outputs from different inputs overlap, they are summed directly.
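The four steps can be implemented literally to verify the output size, which should equal (h - 1)s - 2p + k for an h × h input. This is a plain NumPy sketch of the arithmetic, not the network code:

```python
import numpy as np

def transposed_conv2d(x, k, stride, pad):
    """Manual transposed convolution following the four steps in the text."""
    h, w = x.shape
    kh = k.shape[0]
    # (1) insert stride-1 zeros between input elements
    z = np.zeros((h + (h - 1) * (stride - 1), w + (w - 1) * (stride - 1)))
    z[::stride, ::stride] = x
    # (2) pad k - p - 1 zeros around the border
    z = np.pad(z, kh - pad - 1)
    # (3) flip the kernel up-down and left-right
    kf = k[::-1, ::-1]
    # (4) ordinary convolution with padding 0 and stride 1
    oh, ow = z.shape[0] - kh + 1, z.shape[1] - kh + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(z[i:i + kh, j:j + kh] * kf)
    return out

x = np.arange(4.0).reshape(2, 2)  # toy 2x2 "feature map"
k = np.ones((3, 3))               # toy kernel
y = transposed_conv2d(x, k, stride=2, pad=0)
print(y.shape)  # (5, 5), matching (h-1)*s - 2p + k = (2-1)*2 - 0 + 3
```

Each input element contributes its value times every kernel weight exactly once, so the output total equals sum(x)·sum(k), a handy sanity check on the implementation.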
The proposed AMECA attention module is shown in fig. 4, and the feature map passes through a global average pooling module and a global maximum pooling module, and the two obtained feature maps are added to compress spatial information, and then 1 × 1 convolution is used to learn channel attention information, and the obtained channel attention information is combined with an original input feature map to finally obtain a specific channel attention feature map. AMECA avoids dimension reduction, effectively captures cross-channel interaction information, and enables the network to more accurately locate and identify a target area.
The process of the AMECA attention module comprises:
(1) First, input a feature map X of dimension H × W × C;
(2) Perform spatial feature compression on X: global average pooling (GAP) over the spatial dimensions yields a 1 × 1 × C feature map F1, and global max pooling (GMP) yields a 1 × 1 × C feature map F2;
(3) Fuse F1 and F2 into a 1 × 1 × C feature map F3, obtaining higher-level semantic information;
(4) Perform channel feature learning on the fused feature map F3: a 1 × 1 convolution learns the importance of the different channels, and the output feature map F4 remains 1 × 1 × C;
(5) Pass the feature map F4 through the σ function to obtain F41;
(6) Finally, apply the channel attention: the channel attention map F41 is multiplied channel-wise with the original input feature map X to output the feature map X' carrying channel attention;
where H, W, and C denote the height, width, and channel count of the input feature map, and σ denotes an activation function;
with the feature maps T2, T3, and T4 as the input feature maps X, the resulting output feature maps X' are the feature maps D1, D2, and D3, respectively.
In another embodiment, a computer-readable medium has stored thereon a computer program for executing the method described above.
After the lightweight processing, the network's computation and parameter count are greatly reduced; the AMECA attention mechanism introduced after the last 3 C3 modules of the YOLOv5 network's Neck structure adjusts how the model extracts spatial and channel information. The method guards well against loss of precision and can meet the requirement of real-time detection of gear surface defects.
Comparative experiments were performed as shown in table 2. The comparison measures parameter count, computation, and model size; the smaller these are, the lower the network complexity. With all other settings equal, the detection network using the ShuffleNetv2 module has a lower parameter count, computation, and model size than the one without it, confirming ShuffleNetv2's lightweight effect.
TABLE 2 comparison of parameters, calculated quantities, model sizes for different models
According to the detection results and the figures in the table, the MSTA-YOLOv5 model has clear advantages over the YOLOv3, YOLOv4, and YOLOv5s models. Compared with the original YOLOv5s model, its parameters and computation are greatly reduced: the parameter count drops by about 46%, the computation by 50%, and the model size by about 44%. The new model is leaner, its complexity is markedly lower, it meets the requirements of mobile-end deployment, and it achieves better gear surface defect detection results.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the present invention is not limited thereto; various modifications and changes that a person skilled in the art can make without inventive effort based on the technical solutions of the present invention fall within the scope of the present invention.

Claims (7)

1. A lightweight gear surface defect detection method based on MSTA-YOLOv5, characterized by comprising the following steps: first, acquiring gear surface defect images, labeling and partitioning them, and constructing a gear surface defect data set; then constructing an MSTA-YOLOv5 detection model and training it on the gear surface defect data set; finally, feeding the gear defect image to be detected into the trained MSTA-YOLOv5 detection model to obtain the defect type of the detected gear;
the MSTA-YOLOv5 detection model comprises:
an input section: the gear surface defect image is input into the MSTA-YOLOv5 network, and adaptive anchor-box calculation and Mosaic9 data enhancement are performed;
a backbone section: the feature-extraction backbone network adopts the ShuffleNetv2 architecture and comprises, connected in sequence, a CBRM operation, a first downsampling layer, a second convolution normalization layer, a second downsampling layer, a third convolution normalization layer, a third downsampling layer and a fourth convolution normalization layer; the three gear surface defect feature maps obtained by applying 1 x 1 convolution for feature extraction to the downsampled gear surface defect image are denoted S2, S3 and S4;
a neck section: the Neck structure adopts FPN + PAN; the FPN layers propagate strong semantic information top-down: S4 is convolved with a 3 x 3 kernel to obtain the feature map Q4; Q4 is upsampled by transposed convolution and concatenated with S3, then passed through a 3 x 3 convolution to obtain the feature map Q3; Q3 is upsampled by transposed convolution and concatenated with S2, and the feature map obtained by 3 x 3 convolution is denoted Q2;
the PAN propagates strong localization information bottom-up: the feature map Q2 is used as the bottom feature R2; R2 is downsampled and concatenated with Q3 to obtain the feature map R3; R3 is downsampled and concatenated with Q4, and the resulting feature map is denoted R4; R2, R3 and R4 are each convolved with a 3 x 3 kernel to obtain the feature maps T2, T3 and T4;
an AMECA attention module is integrated after each of the last three C3 modules of the Neck structure; the feature maps T2, T3 and T4 are taken as the original input feature maps and passed through a global average pooling module and a global maximum pooling module; the two resulting feature maps are added to compress the spatial information, a 1 x 1 convolution then learns the channel attention information, the channel attention information is combined with the original input feature map, and the channel attention feature maps D1, D2 and D3 are finally obtained;
an output section: the feature maps D1, D2 and D3 are input into the YOLOv5-MSTA detection head network to obtain the final detection result.
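The FPN + PAN wiring described in claim 1 can be sketched at the shape level. In this illustrative NumPy sketch the 3 x 3 convolutions are replaced by identities, the transposed-convolution upsampling by nearest-neighbour repetition, and the downsampling by stride-2 subsampling; the channel counts in the usage example are arbitrary assumptions, so only the topology and tensor shapes are meaningful.

```python
import numpy as np

def up2(x):
    # 2x nearest-neighbour upsample, a stand-in for transposed convolution
    return x.repeat(2, axis=1).repeat(2, axis=2)

def down2(x):
    # stride-2 subsample, a stand-in for the downsampling convolution
    return x[:, ::2, ::2]

def neck(s2, s3, s4):
    """Shape-level sketch of the FPN + PAN topology in claim 1.

    Inputs s2, s3, s4 are (C, H, W) arrays at decreasing resolution.
    """
    # FPN: top-down path with strong semantic information
    q4 = s4                                       # 3x3 conv stand-in
    q3 = np.concatenate([up2(q4), s3], axis=0)    # upsample Q4, connect with S3
    q2 = np.concatenate([up2(q3), s2], axis=0)    # upsample Q3, connect with S2
    # PAN: bottom-up path with strong localization information
    r2 = q2
    r3 = np.concatenate([down2(r2), q3], axis=0)  # downsample R2, connect with Q3
    r4 = np.concatenate([down2(r3), q4], axis=0)  # downsample R3, connect with Q4
    return r2, r3, r4                             # -> T2, T3, T4 after 3x3 convs
```

Running it on dummy tensors confirms that each PAN output keeps the spatial size of its FPN counterpart while accumulating channels from both paths.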
2. The MSTA-YOLOv5-based lightweight gear surface defect detection method according to claim 1, wherein the Mosaic9 data enhancement comprises: first taking a batch of data from the full data set; each time, randomly selecting 9 pictures, cropping and scaling them at random positions, and compositing them into a new picture; this process is repeated batch-size times to finally obtain a new batch of batch-size Mosaic9-enhanced pictures, which is passed to the neural network for training.
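The Mosaic9 step of claim 2 can be sketched as follows. This is a simplified illustration, not the patented implementation: the fixed 3 x 3 grid, the equal cell size and the padding strategy are assumptions, whereas the original crops and scales at random positions.

```python
import numpy as np

def mosaic9(images, out_size=640, seed=None):
    """Compose 9 images into one mosaic on a 3x3 grid (simplified sketch).

    Each source image is padded if needed, randomly cropped to the cell
    size, and placed into its grid cell; label handling is omitted.
    """
    rng = np.random.default_rng(seed)
    cell = out_size // 3
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    for idx, img in enumerate(images[:9]):
        h, w = img.shape[:2]
        # pad bottom/right so a cell-sized crop always fits
        pad_h, pad_w = max(cell - h, 0), max(cell - w, 0)
        img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)))
        # random crop position inside the (padded) source image
        y = rng.integers(0, img.shape[0] - cell + 1)
        x = rng.integers(0, img.shape[1] - cell + 1)
        r, c = divmod(idx, 3)
        canvas[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = \
            img[y:y + cell, x:x + cell]
    return canvas
```

Repeating this composition batch-size times yields the enhanced batch described in the claim.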
3. The MSTA-YOLOv5-based lightweight gear surface defect detection method according to claim 1, wherein the CBRM operation comprises Conv, BN, ReLU and MaxPool.
4. The MSTA-YOLOv5-based lightweight gear surface defect detection method according to claim 1, wherein the first, second and third downsampling layers each comprise a Shuffle_Block(d) module; the Shuffle_Block(d) module splits the input feature into two branches; the left branch has two convolution layers, a 3 x 3 depthwise convolution with stride 2 and a 1 x 1 ordinary convolution; the right branch has three convolution layers, a 1 x 1 ordinary convolution, a 3 x 3 depthwise convolution with stride 2, and a 1 x 1 ordinary convolution; the left and right branches are spliced by Concat to fuse their features, and a channel shuffle operation finally enables information exchange between the two branches;
the second, third and fourth convolution normalization layers each comprise a Shuffle_Block(c) module; the Shuffle_Block(c) module performs a channel split, dividing the channels into two branches; following the criterion of reducing model fragmentation, no operation is performed on the left branch; the right branch has three convolution layers, a 1 x 1 ordinary convolution, a 3 x 3 depthwise convolution and a 1 x 1 ordinary convolution, all with the same number of input and output channels; the two 1 x 1 convolutions are ordinary convolutions rather than group convolutions; after the three convolutions, the two branches are spliced by Concat.
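The channel shuffle operation that closes both Shuffle_Block variants can be sketched directly; this is the standard ShuffleNetv2 reshape-transpose-reshape trick, shown here in NumPy on an (N, C, H, W) tensor.

```python
import numpy as np

def channel_shuffle(x, groups=2):
    """Interleave channels across groups so the two Concat branches
    exchange information (ShuffleNetv2 channel shuffle).

    x: array of shape (N, C, H, W), with C divisible by `groups`.
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    return (x.reshape(n, groups, c // groups, h, w)   # split into groups
             .transpose(0, 2, 1, 3, 4)                # swap group axes
             .reshape(n, c, h, w))                    # flatten back
```

With 8 channels in 2 groups, channels [0..3 | 4..7] become the interleaving [0, 4, 1, 5, 2, 6, 3, 7], so each half of the output mixes both branches.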
5. The MSTA-YOLOv5-based lightweight gear surface defect detection method according to claim 1, wherein the transposed-convolution upsampling comprises the following steps:
(1) inserting s-1 rows and columns of zeros between the elements of the input feature map, where s denotes the stride of the transposed convolution;
(2) padding k-p-1 rows and columns of zeros around the input feature map, where k denotes the kernel_size of the transposed convolution and p its padding;
(3) flipping the convolution kernel parameters up-down and left-right;
(4) performing a normal convolution operation with padding 0 and stride 1.
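The four steps of claim 5 can be executed literally on a single-channel 2-D array. This NumPy sketch follows the recipe step by step; the output size obeys the usual transposed-convolution formula (h-1)·s - 2p + k.

```python
import numpy as np

def transposed_conv2d(x, k, s=2, p=0):
    """Transposed convolution via the four steps of claim 5.

    x: 2-D input feature map; k: 2-D kernel; s: stride; p: padding.
    """
    h, w = x.shape
    kh, kw = k.shape
    # step 1: insert s-1 zero rows/columns between input elements
    up = np.zeros((h + (h - 1) * (s - 1), w + (w - 1) * (s - 1)))
    up[::s, ::s] = x
    # step 2: pad k - p - 1 zero rows/columns around the border
    up = np.pad(up, kh - p - 1)
    # step 3: flip the kernel up-down and left-right
    kf = k[::-1, ::-1]
    # step 4: normal convolution, padding 0, stride 1
    oh, ow = up.shape[0] - kh + 1, up.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(up[i:i + kh, j:j + kw] * kf)
    return out
```

For a 2 x 2 input, stride 2 and a 2 x 2 all-ones kernel, each input element is spread over a 2 x 2 block of the 4 x 4 output, which is exactly the doubling behaviour used in the neck.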
6. The MSTA-YOLOv5-based lightweight gear surface defect detection method according to claim 1, wherein the AMECA attention module process comprises:
(1) inputting a feature map X of dimensions H x W x C;
(2) performing spatial feature compression on the input feature map X: global average pooling (GAP) over the spatial dimensions yields a 1 x 1 x C feature map F1, and global maximum pooling (GMP) yields a 1 x 1 x C feature map F2;
(3) fusing F1 and F2 to obtain a 1 x 1 x C feature map F3 containing higher-level semantic information;
(4) performing channel feature learning on the fused feature map F3: a 1 x 1 convolution learns the importance of the different channels, and the output feature map F4 still has dimensions 1 x 1 x C;
(5) passing the feature map F4 through the sigma function to obtain F41;
(6) finally applying the channel attention: the channel attention map F41 is multiplied channel-wise with the original input feature map X to output a feature map X' carrying channel attention;
where H, W and C denote the height, width and number of channels of the input feature map, and sigma denotes the activation function;
the feature maps T2, T3 and T4 are used as the input feature map X, and the resulting output feature maps X' are the feature maps D1, D2 and D3, respectively.
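The six AMECA steps of claim 6 can be sketched in NumPy on a single (C, H, W) tensor. The 1 x 1 convolution is represented here by an assumed dense channel-mixing weight matrix `w` of shape (C, C) standing in for the learned layer, and sigma is taken to be the sigmoid function.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ameca_attention(x, w):
    """AMECA channel attention following the steps of claim 6 (sketch).

    x: input feature map of shape (C, H, W);
    w: (C, C) weight matrix standing in for the 1x1 convolution.
    """
    f1 = x.mean(axis=(1, 2))        # (2) GAP  -> F1, shape (C,)
    f2 = x.max(axis=(1, 2))         # (2) GMP  -> F2, shape (C,)
    f3 = f1 + f2                    # (3) fuse spatially compressed maps
    f4 = w @ f3                     # (4) channel feature learning
    f41 = sigmoid(f4)               # (5) sigma activation
    return x * f41[:, None, None]   # (6) channel-wise product with X
```

For an all-ones input and an identity weight matrix, GAP and GMP both return 1 per channel, so every channel is rescaled by sigmoid(2).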
7. A computer-readable medium, characterized in that a computer program is stored thereon for performing the method according to any one of claims 1-6.
CN202310056291.5A 2023-01-06 2023-01-18 MSTA-YOLOv 5-based lightweight gear surface defect detection method Pending CN115953386A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310017224 2023-01-06
CN2023100172242 2023-01-06

Publications (1)

Publication Number Publication Date
CN115953386A true CN115953386A (en) 2023-04-11

Family

ID=87290776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310056291.5A Pending CN115953386A (en) 2023-01-06 2023-01-18 MSTA-YOLOv 5-based lightweight gear surface defect detection method

Country Status (1)

Country Link
CN (1) CN115953386A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116502950A (en) * 2023-04-26 2023-07-28 佛山科学技术学院 Defect detection method based on federal learning and related equipment
CN116721071A (en) * 2023-06-05 2023-09-08 南京邮电大学 Industrial product surface defect detection method and device based on weak supervision
CN116721071B (en) * 2023-06-05 2024-08-06 南京邮电大学 Industrial product surface defect detection method and device based on weak supervision

Similar Documents

Publication Publication Date Title
CN111325751B (en) CT image segmentation system based on attention convolution neural network
CN109241972B (en) Image semantic segmentation method based on deep learning
CN109190752B (en) Image semantic segmentation method based on global features and local features of deep learning
CN113850824B (en) Remote sensing image road network extraction method based on multi-scale feature fusion
CN115953386A (en) MSTA-YOLOv 5-based lightweight gear surface defect detection method
CN111144329B (en) Multi-label-based lightweight rapid crowd counting method
CN110046550B (en) Pedestrian attribute identification system and method based on multilayer feature learning
CN110223304B (en) Image segmentation method and device based on multipath aggregation and computer-readable storage medium
CN113239930A (en) Method, system and device for identifying defects of cellophane and storage medium
CN112767423B (en) Remote sensing image building segmentation method based on improved SegNet
CN110020658B (en) Salient object detection method based on multitask deep learning
CN115082928B (en) Method for asymmetric double-branch real-time semantic segmentation network facing complex scene
CN113066065B (en) No-reference image quality detection method, system, terminal and medium
CN116416497B (en) Bearing fault diagnosis system and method
CN115205147A (en) Multi-scale optimization low-illumination image enhancement method based on Transformer
CN110866938B (en) Full-automatic video moving object segmentation method
CN111798469A (en) Digital image small data set semantic segmentation method based on deep convolutional neural network
CN112149526B (en) Lane line detection method and system based on long-distance information fusion
CN114821058A (en) Image semantic segmentation method and device, electronic equipment and storage medium
CN113763327A (en) CBAM-Res _ Unet-based power plant pipeline high-pressure steam leakage detection method
CN113436198A (en) Remote sensing image semantic segmentation method for collaborative image super-resolution reconstruction
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
CN111160378A (en) Depth estimation system based on single image multitask enhancement
CN118397367A (en) Tampering detection method based on convolution vision Mamba
CN111914853B (en) Feature extraction method for stereo matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination