CN114663346A

CN114663346A - Strip steel surface defect detection method based on improved YOLOv5 network

Info

Publication number: CN114663346A
Application number: CN202210113743.4A
Authority: CN
Inventors: 石肖松; 刘坤; 杨晓松; 孟蕊
Original assignee: Hebei University of Technology
Current assignee: Hebei University of Technology
Priority date: 2022-01-30
Filing date: 2022-01-30
Publication date: 2022-06-24
Anticipated expiration: 2042-01-30
Also published as: CN114663346B

Abstract

The invention discloses a strip steel surface defect detection method based on an improved YOLOv5 network, which is based on a YOLOv5 network model and is added with a self-designed channel space attention module, thereby improving the detection precision and solving the problem of feature extraction in complex scenes and backgrounds. The detection method provided by the invention gives full play to the advantage of extracting the characteristics by the deep learning method, can learn simple shallow characteristics from a large amount of data set without depending on manual characteristic engineering, and then gradually learn more complex and abstract deep characteristics, and has the advantages of better performance, higher defect type identification precision, high defect accuracy and recall rate of the lithium battery and high identification speed.

Description

A strip surface defect detection method based on improved YOLOv5 network

技术领域technical field

本发明属于工业缺陷检测技术领域，具体涉及一种基于空间通道注意力模块的YOLOv5网络的带钢表面缺陷检测方法。The invention belongs to the technical field of industrial defect detection, and in particular relates to a strip steel surface defect detection method based on a YOLOv5 network of a spatial channel attention module.

背景技术Background technique

带钢是钢铁的重要原材料之一，广泛应用于机械制造、航空航天和交通运输中，在各类生产生活中占有重要地位，但是带钢的生产过程中，由于工业技术的限制和生产工艺的影响，可能造成带钢表面存在表面油斑、杏仁状缺陷、白点和划痕等多种缺陷。这些缺陷在很大程度上会影响带钢的抗腐蚀性和使用寿命。现有的缺陷检测手段使用人工裸眼检测为主，工人检测效率低，劳动强度大，生产成本高，已经不能满足带钢表面缺陷检测的需求。Strip steel is one of the important raw materials of steel, widely used in machinery manufacturing, aerospace and transportation, and plays an important role in various production and life. It may cause various defects such as surface oil spots, almond-shaped defects, white spots and scratches on the surface of the strip. These defects greatly affect the corrosion resistance and service life of the strip. The existing defect detection methods mainly use artificial naked-eye detection, which has low detection efficiency, high labor intensity and high production cost, and can no longer meet the needs of strip surface defect detection.

深度学习通过卷积神经网络进行缺陷特征的自动提取并进行学习，无需进行人工特征因子设计，因此深度神经网络具有学习能力强和鲁棒性高的特点，已逐渐成为带钢表面缺陷检测的主流方法。翁玉尚等人(翁玉尚,肖金球,夏禹. 改进Mask R-CNN算法的带钢表面缺陷检测[J/OL].计算机工程与应用:1-12[2021-06-24].)提出改进的掩码区域卷积神经网络(Mask R-CNN)算法，使用k-means II聚类算法改进区域建议网络(RPN)锚框生成方法。李维刚等人(李维刚,叶欣,赵云涛,王文波.基于改进YOLOv3算法的带钢表面缺陷检测[J]. 电子学报,2020)提出了一种融合浅层特征与深层特征的的YOLOV3算法框架，改进后的YOLOv3算法在东北大学带钢数据集上平均精度均值达到了80％。但是，面对微弱微小的带钢缺陷时，由于背景和前景耦合程度强、缺陷区域小，深度学习模型不能很好的提取特征，导致模型检测效果较差。Deep learning automatically extracts and learns defect features through convolutional neural networks, without the need for artificial feature factor design. Therefore, deep neural networks have the characteristics of strong learning ability and high robustness, and have gradually become the mainstream of strip surface defect detection. method. Weng Yushang et al. (Weng Yushang, Xiao Jinqiu, Xia Yu. Improved Mask R-CNN Algorithm for Strip Steel Surface Defect Detection [J/OL]. Computer Engineering and Applications: 1-12 [2021-06-24].) Proposed an improved mask Code Region Convolutional Neural Network (Mask R-CNN) algorithm, using k-means II clustering algorithm to improve the region proposal network (RPN) anchor box generation method. Li Weigang et al. (Li Weigang, Ye Xin, Zhao Yuntao, Wang Wenbo. Strip surface defect detection based on improved YOLOv3 algorithm [J]. Journal of Electronics, 2020) proposed a YOLOV3 algorithm framework that combines shallow features and deep features, The improved YOLOv3 algorithm achieves an average accuracy of 80% on the strip steel dataset of Northeastern University. However, in the face of weak and tiny strip defects, due to the strong coupling between the background and the foreground and the small defect area, the deep learning model cannot extract features well, resulting in poor model detection effect.

发明内容SUMMARY OF THE INVENTION

针对现有技术的不足，本发明拟解决的技术问题是，提供一种基于改进 YOLOv5网络的带钢表面缺陷检测方法，能对常见不同类型的带钢表面缺陷进行实时检测并进行缺陷定位，提高不同种类以及相似结构缺陷识别的准确率，能满足实际带钢工业生产的实时性和准确性要求。In view of the deficiencies of the prior art, the technical problem to be solved by the present invention is to provide a strip steel surface defect detection method based on an improved YOLOv5 network, which can perform real-time detection and defect positioning for common different types of strip steel surface defects, and improve the The accuracy of identification of different types and similar structural defects can meet the real-time and accuracy requirements of actual strip steel industrial production.

本发明解决上述技术问题采用的技术方案是：设计一种基于改进YOLOv5 网络的带钢表面缺陷检测方法，其特征在于，该方法包括以下步骤：The technical solution adopted by the present invention to solve the above-mentioned technical problems is to design a method for detecting surface defects of strip steel based on an improved YOLOv5 network, which is characterized in that the method comprises the following steps:

第一步：图像数据集获取Step 1: Image dataset acquisition

1.1利用工业相机采集带钢表面图像，筛选出包含缺陷的图片；当筛选出的缺陷图像中的缺陷类型涵盖已知的带钢表面缺陷的类型时，形成缺陷图片集；1.1 Use an industrial camera to collect images of the surface of the strip steel, and screen out the pictures containing defects; when the defect types in the screened defect images cover the types of known surface defects of the strip steel, a defect picture set is formed;

1.2将缺陷图片集进行尺寸归一化操作，然后使用Labelimg软件对缺陷图片集中的图片进行手动标注，使每一张缺陷图片上具有缺陷种类和缺陷位置坐标的标签；1.2 Normalize the size of the defect picture set, and then use Labelimg software to manually mark the pictures in the defect picture set, so that each defect picture has the label of the defect type and the coordinates of the defect position;

1.3随机将完成标注的缺陷图片集的不少于60％部分划分为训练集，余下部分为验证集；1.3 Randomly divide no less than 60% of the marked defect image set into the training set, and the rest as the verification set;

第二步：构造改进YOLOv5网络模型Step 2: Construct an improved YOLOv5 network model

改进YOLOv5网络模型为在YOLOv5网络模型的基础上，在PAN的三个 CSP23_模块与网络模型的分类与定位部分的三个conv模块之间均串连接入一个CSA模块；The improved YOLOv5 network model is based on the YOLOv5 network model, and a CSA module is connected in series between the three CSP23_ modules of the PAN and the three conv modules of the classification and positioning part of the network model;

CSA模块包括一个通道注意力模块和一个空间注意力模块，两个模块串联，通道注意力模块的输出为空间注意力模块的输入；The CSA module includes a channel attention module and a spatial attention module, the two modules are connected in series, and the output of the channel attention module is the input of the spatial attention module;

CSP23_模块输出的特征图F₁输入到CSA模块中，首先经过通道注意力模块处理，通道注意力模块首先将输入的特征F₁分别经过基于深度和宽度的全局最大池化和全局平均池化，得到两个1×1×C的特征图；接着，将得到的两个 1×1×C的特征图分别经过一个卷积核大小为k的快速一维卷积处理，然后将两个快速一维卷积得到的结果相加后再经过激活函数sigmoid处理，得到通道注意力；将通道注意力乘以原始特征F₁进行特征的重新加权，得到加权后的特征F₂；The feature map F1 output by the _{CSP23_} module is input into the CSA module, and firstly processed by the channel attention module. The channel attention module _first passes the input feature F1 through global max pooling and global average pooling based on depth and width, respectively. , two 1×1×C feature maps are obtained; then, the two obtained 1×1×C feature maps are processed by a fast one-dimensional convolution with a convolution kernel size of k, and then the two fast The results obtained by the one-dimensional convolution are added and then processed by the activation function sigmoid to obtain the channel attention; the channel attention is multiplied by the original feature F ₁ to re-weight the feature to obtain the weighted feature F ₂ ;

通道注意力模块输出的特征F₂输入到空间注意力模块，空间注意力模块对特征F₂分别进行全局最大池化和全局平均池化处理，得到两个H×W×1的特征图；然后将这两个H×W×1的特征图基于通道做通道拼接操作，然后将拼接操作得到的结果经过一个卷积核为7×7大小的卷积操作，降维为一个一维向量，即H×W×1，再经过激活函数sigmoid生成空间注意力权重；最后将空间注意力权重与输入特征F₂相乘，得到空间注意力模块的输出特征F₃；特征F₃即为CSA模块的输出，也为网络模型的分类与定位部分的conv模块的输入； _The feature F2 output by the channel attention module is input to the spatial attention module, and the spatial attention module performs global maximum pooling and global average pooling on the feature F2 respectively, and obtains _two feature maps of H×W×1; then The two H×W×1 feature maps are spliced based on the channel, and then the result obtained by the splicing operation is subjected to a convolution operation with a convolution kernel of 7×7 size, and the dimension is reduced to a one-dimensional vector, that is, H×W×1, and then generate the spatial attention weight through the activation function sigmoid; finally, multiply the spatial attention weight with the input feature F ₂ to obtain the output feature F ₃ of the spatial attention module; feature F ₃ is the CSA module’s output feature F 3 . The output is also the input of the conv module of the classification and positioning part of the network model;

第三步：训练改进YOLOv5网络模型Step 3: Train and improve the YOLOv5 network model

3.1图像数据集预处理3.1 Image dataset preprocessing

将训练集采用Mosaic数据增强的方式进行预处理；The training set is preprocessed by Mosaic data augmentation;

3.2参数设置3.2 Parameter setting

初始化所有权重值、偏置值、批量归一化尺度因子值，设置网络的初始学习率、batch_size，并将初始化的参数数据输入网络中；根据训练损失的变化来动态调整学习率与迭代次数，以更新整个网络的参数；训练分为两个阶段，第一阶段是训练开始的前100个周期，初始学习率固定为0.001，以加快收敛；第二阶段是指100个周期之后的训练周期，初始学习率设置为0.0001；Initialize all weight values, bias values, and batch normalized scale factor values, set the initial learning rate and batch_size of the network, and input the initialized parameter data into the network; dynamically adjust the learning rate and the number of iterations according to changes in training loss, In order to update the parameters of the entire network; the training is divided into two stages, the first stage is the first 100 cycles of training, and the initial learning rate is fixed at 0.001 to speed up the convergence; the second stage refers to the training cycle after 100 cycles, The initial learning rate is set to 0.0001;

3.3网络模型训练3.3 Network model training

将经过预处理的训练集输入到第二步中的初始化参数设定好后的改进 YOLOv5网络模型中进行特征提取，利用K-means聚类方法对训练集图像自动生成锚框，以锚框的尺寸作为先验框，通过边框回归预测得到边界框；然后使用 logistic分类器对边界框进行分类，获得每个边界框对应的缺陷类别分类概率；再通过非极大值抑制法对所有边界框的缺陷类别分类概率进行排序，确定每个边界框对应的缺陷类别，得到预测值；预测值包括缺陷类别和缺陷位置信息，非极大抑制阈值为0.5；然后通过损失函数GIOU计算预测值和真实值之间的loss 值；根据训练损失值进行反向传播，更新骨干网络和分类回归网络的参数，直至loss值符合预设，网络模型参数的训练完成；Input the preprocessed training set into the improved YOLOv5 network model after the initialization parameters are set in the second step for feature extraction, and use the K-means clustering method to automatically generate anchor frames for the training set images. The size is used as the prior box, and the bounding box is obtained by bounding box regression prediction; then the bounding box is classified by the logistic classifier to obtain the classification probability of the defect category corresponding to each bounding box; The classification probability of defect categories is sorted, the defect category corresponding to each bounding box is determined, and the predicted value is obtained; the predicted value includes defect category and defect location information, and the non-maximum suppression threshold is 0.5; then the predicted value and the true value are calculated by the loss function GIOU The loss value between them; backpropagation is performed according to the training loss value, and the parameters of the backbone network and the classification and regression network are updated, until the loss value conforms to the preset value, and the training of the network model parameters is completed;

3.4网络模型测试3.4 Network Model Test

将验证集输入到步骤3.3中完成参数训练的网络模型中，得到验证集的张量预测值；将其张量预测值与标注信息对比，测试该网络模型的可靠性；采用 AP对网络模型进行评价，当AP不小于85％时，则该网络模型测试为可靠；Input the validation set into the network model that has completed the parameter training in step 3.3, and obtain the tensor prediction value of the validation set; compare the tensor prediction value with the label information to test the reliability of the network model; Evaluation, when the AP is not less than 85%, the network model test is reliable;

第四步：带钢表面缺陷检测Step 4: Strip surface defect detection

将待检测的带钢表面图像经过如第一步的步骤1.2中相同的尺寸归一化操作，然后输入到第三步中测试为可靠的网络模型中，得到待检测的带钢表面图像的缺陷张量信息，包括缺陷位置、缺陷类别和置信度。The surface image of the strip to be inspected is subjected to the same size normalization operation as in step 1.2 of the first step, and then input into the network model tested as reliable in the third step to obtain the defects of the surface image of the strip to be inspected Tensor information, including defect location, defect category, and confidence.

与现有技术相比，本发明的有益效果是：本发明检测方法以YOLOv5网络模型为基础，加入了自行设计的通道空间注意力模块；通道空间注意力模块通过先将浅层特征与深层特征融合在一起再进行注意力运算，这样由于深层包含有更多的高级语义信息与更少的背景信息，浅层特征与深层特征融合之后的目标信息得到强化，在做注意力运算时可以让网络更多的关注目标缺陷，抑制背景信息，从而可以更好地引导多尺度融合，提高检测精度，解决了在复杂场景及背景下的特征提取难题。本发明检测方法充分发挥了深度学习方法提取特征的优势，能够不依赖人工的特征工程，从大量数据集中先学习简单的浅层特征，再逐渐学习到更为复杂抽象的深层特征，性能更好，缺陷种类识别精度更高，且锂电池的缺陷的精确率、召回率高，识别速度快。Compared with the prior art, the beneficial effects of the present invention are: the detection method of the present invention is based on the YOLOv5 network model, and a self-designed channel space attention module is added; Integrate them together and then perform the attention operation, so that since the deep layer contains more high-level semantic information and less background information, the target information after the fusion of shallow features and deep features is strengthened, and the network can be made when doing attention operations. More attention is paid to target defects and background information is suppressed, which can better guide multi-scale fusion, improve detection accuracy, and solve the problem of feature extraction in complex scenes and backgrounds. The detection method of the present invention fully utilizes the advantages of the deep learning method for extracting features, and can learn simple shallow features from a large number of data sets without relying on artificial feature engineering, and then gradually learn more complex and abstract deep features, with better performance. , the defect type identification accuracy is higher, and the lithium battery defect has a high precision rate, a high recall rate, and a fast identification speed.

附图说明Description of drawings

图1为本发明检测方法一种实施例的CSA模块的结构与原理示意图。FIG. 1 is a schematic diagram of the structure and principle of a CSA module according to an embodiment of the detection method of the present invention.

图2为本发明检测方法一种实施例的改进YOLOv5网络模型的结构与原理示意图。2 is a schematic diagram of the structure and principle of an improved YOLOv5 network model according to an embodiment of the detection method of the present invention.

具体实施方式Detailed ways

下面将结合附图对本申请的技术方案进行清楚、完整地描述，显然，所描述的实施例仅仅是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。The technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments in the present invention, all other embodiments obtained by those of ordinary skill in the art without making creative work fall within the scope of protection of the present application.

本发明提供一种基于改进YOLOv5网络的带钢表面缺陷检测方法(简称检测方法，参见图1-2)，该检测方法包括以下步骤：The present invention provides a kind of strip steel surface defect detection method (referred to as detection method, referring to Fig. 1-2) based on improved YOLOv5 network, this detection method comprises the following steps:

第一步：图像数据集获取Step 1: Image dataset acquisition

1.2将缺陷图片集进行尺寸归一化操作(本实施例为缩放至608*608像素)，然后使用Labelimg软件对缺陷图片集中的图片进行手动标注，使每一张缺陷图片上具有缺陷种类和缺陷位置坐标的标签；1.2 Normalize the size of the defect picture set (scale to 608*608 pixels in this embodiment), and then use Labelimg software to manually mark the pictures in the defect picture set, so that each defect picture has defect types and defects. label for location coordinates;

1.3随机将完成标注的缺陷图片集的不少于60％部分划分为训练集，余下部分为验证集；本实施为4:1，即训练集为80％，余下20％为验证集。1.3 Randomly divide no less than 60% of the marked defect image set into the training set and the rest as the validation set; this implementation is 4:1, that is, the training set is 80%, and the remaining 20% is the validation set.

改进YOLOv5网络模型为在YOLOv5网络模型的基础上，在PAN(像素聚合网络)的三个CSP23_(跨阶段局部网络，分别为CSP23_5、CSP23_4、 CSP23_3)模块与网络模型的分类与定位部分的三个conv(卷积)模块之间均串连接入一个CSA(通道空间注意力)模块。The improved YOLOv5 network model is based on the YOLOv5 network model, in the three CSP23_ (cross-stage local networks, respectively CSP23_5, CSP23_4, CSP23_3) modules of the PAN (pixel aggregation network) and three of the classification and positioning parts of the network model. A CSA (channel spatial attention) module is connected in series between the conv (convolution) modules.

CSA模块包括一个通道注意力(ChannelAttention)模块和一个空间注意力(SpatialAttention)模块，两个模块串联，通道注意力模块的输出为空间注意力模块的输入。The CSA module includes a channel attention module and a spatial attention module. The two modules are connected in series, and the output of the channel attention module is the input of the spatial attention module.

CSP23_模块输出的特征图F₁(C×W×H)输入到CSA模块中，首先经过通道注意力模块处理，通道注意力模块首先将输入的特征F₁(C×W×H)分别经过基于深度和宽度的全局最大池化(MaxPool)和全局平均池化(AvgPool)，得到两个C×1×1的特征图。接着，将得到的两个C×1×1的特征图分别经过一个卷积核大小为k的快速一维卷积(Conv1d)处理，然后将两个快速一维卷积得到的结果相加后再经过激活函数sigmoid处理，得到通道注意力；将通道注意力乘以原始特征F₁进行特征的重新加权，得到加权后的特征F₂。The feature map F ₁ (C×W×H) output by the CSP23_ module is input into the CSA module, and firstly processed by the channel attention module, which firstly passes the input feature F ₁ (C×W×H) through the Based on depth and width global max pooling (MaxPool) and global average pooling (AvgPool), two feature maps of C×1×1 are obtained. Next, the obtained two C×1×1 feature maps are processed by a fast one-dimensional convolution (Conv1d) with a convolution kernel size of k, and then the results obtained by the two fast one-dimensional convolutions are added. Then, through the activation function sigmoid processing, the channel attention is obtained; the channel attention is multiplied by the original feature F ₁ to re-weight the feature, and the weighted feature F ₂ is obtained.

快速一维卷积的卷积核大小k代表本地跨通道交互的覆盖范围，即有多少个相近邻参与一维通道的注意力预测。其中交互作用的覆盖范围(即内核大小k)与通道维成比例，具体计算公式为：The kernel size k of fast 1D convolution represents the coverage of local cross-channel interactions, that is, how many neighbors participate in the attention prediction of 1D channels. The coverage of the interaction (that is, the kernel size k) is proportional to the channel dimension, and the specific calculation formula is:

其中，C表示特征通道数量，β＝2和b＝1表示两个超参数。where C represents the number of feature channels, and β=2 and b=1 represent two hyperparameters.

通道注意力模块输出的特征F₂输入到空间注意力模块，空间注意力模块对特征F₂分别进行全局最大池化(Max Pool)和全局平均池化(Mean Pool)处理，得到两个1×W×H的特征图；然后将这两个1×W×H的特征图基于通道做通道拼接(Concat)操作，然后将拼接操作得到的结果经过一个卷积核为7 ×7大小的卷积操作，降维为一个一维向量，即1×W×H，再经过激活函数 sigmoid生成空间注意力权重。最后将空间注意力权重与输入特征F₂相乘，得到空间注意力模块的输出特征F₃。特征F₃即为CSA模块的输出，也为网络模型的分类与定位部分的conv模块的输入。 _The feature F2 output by the channel attention module is input to the spatial attention module, and the spatial attention module performs global maximum pooling (Max Pool) and global average pooling (Mean Pool) on the feature F2 respectively, and obtains _two 1× The feature map of W×H; then the two 1×W×H feature maps are concatenated based on the channel (Concat) operation, and then the result obtained by the concatenation operation is passed through a convolution kernel with a size of 7×7. Operation, reduce the dimension to a one-dimensional vector, that is, 1×W×H, and then generate the spatial attention weight through the activation function sigmoid. Finally, the spatial attention weight is multiplied by the input feature F ₂ to obtain the output feature F ₃ of the spatial attention module. Feature F3 is the output _of the CSA module and the input of the conv module of the classification and localization part of the network model.

3.1图像数据集预处理3.1 Image dataset preprocessing

3.2参数设置3.2 Parameter setting

初始化所有权重值、偏置值、批量归一化尺度因子值，设置网络的初始学习率、batch_size，并将初始化的参数数据输入网络中；根据训练损失的变化来动态调整学习率与迭代次数，以更新整个网络的参数；训练分为两个阶段，第一阶段是训练开始的前100个周期，初始学习率固定为0.001，以加快收敛；第二阶段是指100个周期之后的训练周期，初始学习率设置为0.0001。Initialize all weight values, bias values, and batch normalized scale factor values, set the initial learning rate and batch_size of the network, and input the initialized parameter data into the network; dynamically adjust the learning rate and the number of iterations according to changes in training loss, In order to update the parameters of the entire network; the training is divided into two stages, the first stage is the first 100 cycles of training, and the initial learning rate is fixed at 0.001 to speed up the convergence; the second stage refers to the training cycle after 100 cycles, The initial learning rate is set to 0.0001.

3.3网络模型训练3.3 Network model training

将经过预处理的训练集输入到第二步中的初始化参数设定好后的改进 YOLOv5网络模型中进行特征提取，利用K-means聚类方法对训练集图像自动生成锚框，以锚框的尺寸(锚框尺寸会根据图像缩放尺寸等比缩放)作为先验框，通过边框回归预测得到边界框；然后使用logistic分类器对边界框进行分类，获得每个边界框对应的缺陷类别分类概率；再通过非极大值抑制法(NMS)对所有边界框的缺陷类别分类概率进行排序，确定每个边界框对应的缺陷类别，得到预测值；预测值包括缺陷类别和缺陷位置信息，非极大抑制阈值为0.5；然后通过损失函数GIOU计算预测值和真实值之间的loss值；根据训练损失值进行反向传播，更新骨干网络和分类回归网络的参数，直至loss值符合预设，网络模型参数的训练完成；Input the preprocessed training set into the improved YOLOv5 network model after the initialization parameters are set in the second step for feature extraction, and use the K-means clustering method to automatically generate anchor frames for the training set images. The size (the anchor box size will be proportionally scaled according to the image zoom size) is used as the a priori box, and the bounding box is obtained through the bounding box regression prediction; then the logistic classifier is used to classify the bounding box to obtain the defect category classification probability corresponding to each bounding box; Then, the non-maximum value suppression method (NMS) is used to sort the defect category classification probabilities of all bounding boxes, determine the defect category corresponding to each bounding box, and obtain the predicted value; the predicted value includes the defect category and defect location information, and the non-maximum The suppression threshold is 0.5; then the loss value between the predicted value and the real value is calculated by the loss function GIOU; back-propagation is performed according to the training loss value, and the parameters of the backbone network and the classification and regression network are updated until the loss value conforms to the preset, the network model The training of parameters is completed;

3.4网络模型测试3.4 Network Model Test

将验证集输入到步骤3.3中完成参数训练的网络模型中，得到验证集的张量预测值；将其张量预测值与标注信息对比，测试该网络模型的可靠性；采用 AP对网络模型进行评价，当AP不小于85％时，则该网络模型测试为可靠。Input the validation set into the network model that has completed the parameter training in step 3.3, and obtain the tensor prediction value of the validation set; compare the tensor prediction value with the label information to test the reliability of the network model; Evaluation, when the AP is not less than 85%, the network model test is reliable.

第四步：带钢表面缺陷检测Step 4: Strip surface defect detection

将待检测的带钢表面图像经过如第一步的步骤1.2中相同的尺寸归一化操作，然后输入到第三步中测试为可靠的网络模型中，得到待检测的带钢表面图像的缺陷张量信息，包括缺陷位置、缺陷类别和置信度(缺陷类别分类概率最大值)。The surface image of the strip to be inspected is subjected to the same size normalization operation as in step 1.2 of the first step, and then input into the network model tested as reliable in the third step to obtain the defects of the surface image of the strip to be inspected Tensor information, including defect location, defect category, and confidence (maximum probability of defect category classification).

YOLOv5模型对输入端的图像采用Mosaic数据增强的方式，丰富图像内容，提升对烟雾较浅或者烟雾区域较小的检测效果。The YOLOv5 model uses the Mosaic data enhancement method for the input image to enrich the image content and improve the detection effect of light smoke or small smoke area.

Mosaic数据增强具有如下特征：Mosaic data augmentation has the following characteristics:

首先从训练样本集中取出一批(批是指Batch，批次是指Batch size，是模型的超参数，本实施例中批次等于32)图像；其次从这批图像中随机挑选4张图像，随机通过色域变化、缩小、反转和/或裁剪方式操作这四张图像，每张图像至少进行一种操作；然后将这4张图像按照左上、左下、右上、右下四个方向位置排列后拼接得到一张新的图像，新图像大小与未进行操作的原图大小相同，为608×608×3；重复上述操作，设置循环次数等于Batch size。First, take a batch of images from the training sample set (batch refers to Batch, batch refers to Batch size, which is a hyperparameter of the model, in this example, the batch is equal to 32) images; secondly, 4 images are randomly selected from this batch of images, Randomly operate the four images through color gamut change, reduction, inversion and/or cropping, and perform at least one operation on each image; then arrange the four images in four directions: upper left, lower left, upper right, and lower right After stitching, a new image is obtained. The size of the new image is the same as the size of the original image without operation, which is 608×608×3; repeat the above operation, and set the number of cycles equal to the Batch size.

Focus模块连接三组CBL(卷积层Convolution-批标准化Batch normalization-激活函数Leaky Relu，CBL)卷积结构和跨阶段局部网络CSP1_X 模块组成的结构，之后再连接一个池化网络SPP模块，构成特征提取网络；通过使用Focus模块对图像进行切片操作，经过Focus模块后送入由CBL卷积结构和CSP1_X模块组成的结构，再通过SPP模块将低层特征和高层特征相融合。The Focus module connects three groups of CBL (Convolution-Batch Normalization-Batch Normalization-Activation Function Leaky Relu, CBL) convolution structure and a structure composed of cross-stage local network CSP1_X modules, and then connects a pooling network SPP module to form features Extract the network; by using the Focus module to slice the image, after passing through the Focus module, it is sent to the structure composed of the CBL convolution structure and the CSP1_X module, and then the low-level features and high-level features are fused through the SPP module.

Focus模块具有如下特征：The Focus module has the following features:

首先将大小为608×608×3的原图像用1、2、3、4四个数字做标记，其次将相同数字的像素组合成大小为304×304×3的4个部分，然后将这4个部分按照数字大小在深度方向拼接为304×304×12的特征图，再连接一个CBL卷积结构。First mark the original image of size 608×608×3 with four numbers 1, 2, 3, and 4, and then combine the pixels of the same number into 4 parts of size 304×304×3, and then combine these 4 Each part is spliced into a 304×304×12 feature map in the depth direction according to the number size, and then connected to a CBL convolution structure.

Focus模块包含的CBL卷积结构的特征是：卷积层(convolution，conv)的卷积核个数为64，大小为3×3，步距为1。The features of the CBL convolution structure included in the Focus module are: the number of convolution kernels in the convolution layer (convolution, conv) is 64, the size is 3 × 3, and the stride is 1.

CSP1_X模块具有如下特征：The CSP1_X module has the following characteristics:

X代表残差结构的个数，除残差结构个数不同外的其他结构相同。X represents the number of residual structures, and other structures are the same except for the number of residual structures.

以CSP1_3模块为例，该CSP1_3模块首先对输入的特征图进行CBL卷积操作；然后送入3个残差结构中，接着对经过残差结构的特征图进行卷积，并与输入的特征图直接卷积后得到的新特征图进行深度方向拼接concat；最后通过批标准化、激活函数Leakyrelu和一层CBL卷积结构输入到下个模块中。Taking the CSP1_3 module as an example, the CSP1_3 module first performs the CBL convolution operation on the input feature map; The new feature map obtained after direct convolution is concatenated in the depth direction; finally, it is input into the next module through batch normalization, activation function Leakyrelu and a layer of CBL convolution structure.

CSP1_X模块中直接对输入特征图进行卷积的CBL卷积结构的卷积层conv 和CSP1_X模块中位于最后的CBL卷积结构中的卷积层conv相同，卷积层conv 相关尺寸为：卷积核大小为1×1，步距为1。The convolution layer conv of the CBL convolution structure that directly convolves the input feature map in the CSP1_X module is the same as the convolution layer conv located in the last CBL convolution structure in the CSP1_X module. The relevant dimensions of the convolution layer conv are: convolution The kernel size is 1×1 and the stride is 1.

特征再处理网络设计：采用特征金字塔FPN和像素聚合网络PAN的结构，其中FPN结构是由两组CSP2_3模块、CBL卷积结构和上采样up sample结构串联组成的；PAN包括两个CBL卷积结构，对数据进行下采样。Feature reprocessing network design: adopt the structure of feature pyramid FPN and pixel aggregation network PAN, where the FPN structure is composed of two sets of CSP2_3 modules, CBL convolution structure and up-sampling up sample structure in series; PAN includes two CBL convolution structures , downsample the data.

将FPN中的每个上采样结构的输出和特征提取网络中的CSP1_9-1、 CSP1_9-2模块的输出特征图进行深度方向张量拼接concat，同时将FPN中的每个CBL卷积结构的输出和PAN中的CBL卷积结构对应大小的特征图进行深度方向张量拼接concat，PAN结构每次经过CBL卷积结构后分别增加一个 CSP2_3模块和SPP模块，PAN结构的第一个CBL卷积结构之前也增加一个 CSP2_3模块和SPP模块；CSP2_3和SPP模块都不改变特征图大小，Neck中 CSP2_3-3模块的输出是76×76，之后连接到CBL卷积结构，把图像大小变成 38×38，CSP2_3-4模块的输出连接到CBL卷积结构的输入，把图像变成19× 19。Concat the output of each upsampling structure in the FPN and the output feature maps of the CSP1_9-1 and CSP1_9-2 modules in the feature extraction network for depth-wise tensor splicing concat, and concat the output of each CBL convolution structure in the FPN Perform depthwise tensor splicing concat with the feature map of the corresponding size of the CBL convolution structure in the PAN. After each time the PAN structure passes through the CBL convolution structure, a CSP2_3 module and an SPP module are added respectively. The first CBL convolution structure of the PAN structure A CSP2_3 module and an SPP module were also added before; neither the CSP2_3 nor the SPP module changed the feature map size. The output of the CSP2_3-3 module in Neck was 76×76, and then connected to the CBL convolution structure to change the image size to 38×38 , the output of the CSP2_3-4 module is connected to the input of the CBL convolutional structure, turning the image into 19×19.

FPN包括了两个CBL卷积结构，但是这里的CBL都是步距为1的卷积，对特征图的大小没有影响，是FPN中的上采样up sample改变的特征图大小，学习多尺度目标信息。FPN includes two CBL convolution structures, but the CBLs here are convolutions with a stride of 1, which has no effect on the size of the feature map. It is the size of the feature map changed by the up-sampling up sample in the FPN. Learning multi-scale targets information.

跨阶段局部网络CSP2_3模块具有的特征是：The features of the cross-stage local network CSP2_3 module are:

该模块结构与CSP1_3模块每个残差结构中的将add融合过程删除后的结构相同。CSP2_3模块包括一个初始的CBL卷积结构和一个末尾的CBL卷积结构、多个重复单元，大小为1×1的卷积层经批标准化和激活函数Leaky relu连接3×3卷积层，该3×3卷积层再连接批标准化和激活函数Leaky relu构成重复单元，重复单元的数量为三个，第一个重复单元的输入连接初始的CBL卷积结构的输出，三个重复单元依次串联后的输出连接一层卷积层，该卷积层的输出与原始输入经一层卷积层后进行拼接，拼接后再经批标准化和激活函数 Leaky relu连接末尾的CBL卷积结构。The structure of this module is the same as the structure after deleting the add fusion process in each residual structure of the CSP1_3 module. The CSP2_3 module consists of an initial CBL convolutional structure and a final CBL convolutional structure, multiple repeating units, a convolutional layer of size 1×1 connected to a 3×3 convolutional layer by batch normalization and activation function Leaky relu. The 3×3 convolutional layer is then connected to the batch normalization and activation function Leaky relu to form a repeating unit. The number of repeating units is three. The input of the first repeating unit is connected to the output of the initial CBL convolution structure, and the three repeating units are connected in series. The latter output is connected to a convolutional layer. The output of the convolutional layer is spliced with the original input through a convolutional layer. After splicing, batch normalization and activation function Leaky relu connect the CBL convolutional structure at the end.

FPN中的CBL卷积结构具有的特征是：卷积核大小都为1×1，步距都为 1。The characteristics of the CBL convolution structure in FPN are: the size of the convolution kernel is 1 × 1, and the stride is 1.

PAN中的CBL卷积结构具有的特征是：卷积核大小都为3×3，步距都为 2。The characteristics of the CBL convolution structure in PAN are: the size of the convolution kernel is 3 × 3, and the stride is 2.

SPP模块由最大池化核大小分别为1×1、5×5、9×9、13×13的四个并行池化层组成，SPP模块自身包含了两层CBL卷积结构，在开始和结束部分。The SPP module consists of four parallel pooling layers with maximum pooling kernel sizes of 1×1, 5×5, 9×9, and 13×13, respectively. The SPP module itself includes a two-layer CBL convolution structure. part.

YOLOv5模型的输出继承了YOLOv3的思想，采用3个尺度特征图进行检测，特征图大小分别19×19、38×38、76×76。并为增加的每个SPP模块后的每个尺度分配相应个数的Anchor Box，特征图中的每个像素生成9个Anchor Box，通过加权非极大值抑制筛选出最优框，并将GIOU作为损失函数返回到网络中，训练参数。The output of the YOLOv5 model inherits the idea of YOLOv3, and uses three scale feature maps for detection, and the feature map sizes are 19×19, 38×38, and 76×76 respectively. And assign a corresponding number of Anchor Boxes to each scale after each SPP module added, generate 9 Anchor Boxes for each pixel in the feature map, filter out the optimal box through weighted non-maximum suppression, and put the GIOU Returned to the network as a loss function, training parameters.

本实施例在Centos7.9.2的平台下完成，使用Python编程实现，测试网络模型和训练网络模型的计算机性能是：Tesla v100、interl Xeon(R)Gold 6271c CPU @2.6GHz；使用的框架是pytorch深度学习框架。YOLOv5模型的学习率选为 λ＝0.01，训练步数为500次。This example is completed under the platform of Centos 7.9.2 and implemented using Python programming. The computer performance of the test network model and the training network model is: Tesla v100, interl Xeon(R) Gold 6271c CPU @2.6GHz; the framework used is pytorch depth Learning framework. The learning rate of the YOLOv5 model is selected as λ=0.01, and the number of training steps is 500 times.

本实施例采用AP(Average Precision，平均精度)和mAP(mean AveragePrecision，平均准确率)进行评价。在目标检测中，每个类别都对应一个准确率和召回率。近几年常见的评价指标是AP和mAP，AP为准确率(precision)- 召回率(recall)曲线下的面积，P-R曲线是以Precision为y轴，Recall为x轴绘制。评判模型的优劣主要是通过曲线下的面积大小来决定。mAP则是多个类别的AP 均值。In this embodiment, AP (Average Precision, average precision) and mAP (mean Average Precision, average accuracy) are used for evaluation. In object detection, each category corresponds to a precision rate and a recall rate. The common evaluation indicators in recent years are AP and mAP. AP is the area under the precision-recall curve. The P-R curve is drawn with Precision as the y-axis and Recall as the x-axis. The pros and cons of the evaluation model are mainly determined by the size of the area under the curve. mAP is the average AP of multiple classes.

其中，TP是正例正确分为正例的数量，FP是正例错误分为负例的数量，FN 是负例分为正例的数量。Among them, TP is the number of positive examples correctly classified into positive examples, FP is the number of positive examples incorrectly classified into negative examples, and FN is the number of negative examples classified into positive examples.

AP的计算方法为：AP is calculated as:

其中，p表示准确率(precision)，r表示召回率(recall)。Among them, p represents the precision rate (precision), r represents the recall rate (recall).

在实际的统计情况中，准确率和召回率不是连续的曲线，独立的有限个数值。因此，计算过程中将其离散统计：In actual statistical situations, precision and recall are not continuous curves, but are independent finite numbers. Therefore, it is discrete statistics during the calculation process:

其中，N表示待测数据集的图像总数，p(k)表示在模型对k个图像识别出的准确率，Δr(k)代表当模型在识别出k个图像到k-1个图像时的召回率变化情况。Among them, N represents the total number of images in the data set to be tested, p(k) represents the accuracy rate of k images recognized by the model, and Δr(k) represents when the model recognizes k images to k-1 images. Changes in recall rate.

在目标检测中，通常利用预测出的标记框与目标的真实标记框的重叠度多少来判断模型是否正确检测出来目标，该重叠度也被称为IOU(Intersection Over Union)，一般会设定IoU的阈值为0.5，如果，模型经过计算得到的IoU大于0.5，那么则判定正确检测出目标，IoU示意图如图所示。In target detection, the degree of overlap between the predicted marked frame and the real marked frame of the target is usually used to judge whether the model correctly detects the target. This degree of overlap is also called IOU (Intersection Over Union), and IoU is generally set. The threshold is 0.5. If the IoU calculated by the model is greater than 0.5, then it is determined that the target is correctly detected. The schematic diagram of the IoU is shown in the figure.

计算公式为：The calculation formula is:

其中，A∩B表示预测框与目标框重叠的面积，A∪B表示预测框与目标框的并集面积。Among them, A∩B represents the overlapping area of the prediction frame and the target frame, and A∪B represents the union area of the prediction frame and the target frame.

本实施例对带钢表面块状缺陷、划痕、油斑和白点一共4种缺陷图像进行了实验，其中对白点的识别准确率在87％左右，其余所有缺陷识别率均达到90％以上，对于块状缺陷和油斑这两个具有相似结构的缺陷的识别率较高，说明本发明检测方法对于这两类相似缺陷区别度高，检测精度高。In this example, four kinds of defect images, including block defects, scratches, oil spots and white spots on the surface of the strip steel, have been tested. The recognition accuracy of white spots is about 87%, and the recognition rates of all other defects are above 90%. , the recognition rate of block defects and oil spots with similar structures is high, indicating that the detection method of the present invention has high discrimination and high detection accuracy for these two types of similar defects.

本发明未述及之处适用于现有技术。What is not described in the present invention applies to the prior art.

Claims

1. A strip steel surface defect detection method based on an improved YOLOv5 network is characterized by comprising the following steps:

the first step is as follows: image dataset acquisition

1.1, acquiring a surface image of the strip steel by using an industrial camera, and screening out a picture containing a defect; when the defect type in the screened defect image covers the known type of the surface defect of the strip steel, a defect image set is formed;

1.2, carrying out size normalization operation on the defect picture set, and then manually labeling the pictures in the defect picture set by using Labelimg software to enable each defect picture to have a label of defect type and defect position coordinates;

1.3 randomly dividing not less than 60% of the marked defect picture set into a training set, and taking the rest as a verification set;

the second step: improved YOLOv5 network model

The improved YOLOv5 network model is characterized in that on the basis of a YOLOv5 network model, a CSA module is connected in series between three CSP23_ modules of a PAN and three conv modules of a classification and positioning part of the network model;

the CSA module comprises a channel attention module and a space attention module, wherein the two modules are connected in series, and the output of the channel attention module is the input of the space attention module;

CSP23_ Module output feature map F₁The input is processed by a channel attention module firstly, and the input characteristic F is processed by the channel attention module firstly₁Respectively carrying out global maximum pooling and global average pooling based on depth and width to obtain two Cx 1 x 1 feature maps; secondly, respectively carrying out fast one-dimensional convolution processing on the two obtained Cx 1 x 1 characteristic graphs with a convolution kernel size of k, adding results obtained by the two fast one-dimensional convolutions, and carrying out sigmoid processing on the results to obtain channel attention; multiplying the channel attention by the original feature F₁Re-weighting the features to obtain weighted features F₂；

Feature F of channel attention module output₂Input to the spatial attention Module, spatial attention Module to feature F₂Respectively carrying out global maximum pooling and global average pooling to obtain two characteristic graphs of 1 multiplied by W multiplied by H; then, performing channel splicing operation on the two 1 xWxH feature graphs based on channels, performing convolution operation with a convolution kernel of 7 x 7 on the result obtained by the splicing operation, reducing the dimension to a one-dimensional vector, namely 1 xWxH, and generating a spatial attention weight through an activation function sigmoid; finally, the spatial attention weight and the input feature F are combined₂Multiplying to obtain the output characteristic F of the space attention module₃(ii) a Characteristic F₃The CSA module is the output of the CSA module and is also the input of the conv module of the classification and positioning part of the network model;

the third step: training improved Yolov5 network model

3.1 image dataset preprocessing

Preprocessing the training set in a Mosaic data enhancement mode;

3.2 parameter settings

Initializing all weight values, bias values and batch normalization scale factor values, setting the initial learning rate and batch _ size of the network, and inputting initialized parameter data into the network; dynamically adjusting the learning rate and the iteration times according to the change of the training loss so as to update the parameters of the whole network; the training is divided into two stages, the first stage is the first 100 periods of the training, and the initial learning rate is fixed to be 0.001 so as to accelerate convergence; the second stage refers to a training period 100 periods later, and the initial learning rate is set to 0.0001;

3.3 network model training

Inputting the preprocessed training set into an improved YOLOv5 network model with set initialization parameters in the second step for feature extraction, automatically generating an anchor frame for the images of the training set by using a K-means clustering method, taking the size of the anchor frame as a prior frame, and obtaining a boundary frame through frame regression prediction; then, classifying the bounding boxes by using a logistic classifier to obtain defect class classification probability corresponding to each bounding box; sorting the defect type classification probabilities of all the boundary frames by a non-maximum value inhibition method, and determining the defect type corresponding to each boundary frame to obtain a predicted value; the predicted value comprises defect type and defect position information, and the non-maximum inhibition threshold value is 0.5; then calculating a loss value between the predicted value and the true value through a loss function GIOU; performing back propagation according to the training loss value, updating parameters of the backbone network and the classification regression network until the loss value accords with the preset value, and finishing the training of the network model parameters;

3.4 network model testing

Inputting the verification set into the network model which completes parameter training in the step 3.3 to obtain a tensor prediction value of the verification set; comparing the predicted value of the tensor with the labeling information, and testing the reliability of the network model; evaluating the network model by using the AP, and testing the network model to be reliable when the AP is not less than 85%;

the fourth step: strip steel surface defect detection

And (3) performing the same size normalization operation as in the step 1.2 of the first step on the surface image of the strip steel to be detected, and then inputting the image into the network model tested to be reliable in the third step to obtain the defect tensor information of the surface image of the strip steel to be detected, including the defect position, the defect type and the confidence coefficient.

2. The strip steel surface defect detection method based on the improved YOLOv5 network as claimed in claim 1, wherein the convolution kernel size k of the fast one-dimensional convolution in the channel attention module represents the coverage of local cross-channel interaction, i.e. how many neighbors participate in the attention prediction of the one-dimensional channel; wherein the coverage of the interaction k is proportional to the channel dimension, and the specific calculation formula is:

where C denotes the number of eigenchannels, β ═ 2 and b ═ 1 denote two superparameters.

3. The method as claimed in claim 1, wherein in step 1.2 of the first step, the size normalization is performed to scale the image to 608 × 608 pixels.

4. The method for detecting the surface defects of the steel strip based on the improved YOLOv5 network as claimed in claim 1, wherein in step 1.3 of the first step, the training set is 80%, and the rest 20% is the verification set.