CN117541568A - Deep learning-based automobile brake disc surface defect detection method - Google Patents
- Publication number
- CN117541568A (application CN202311584766.4A)
- Authority
- CN
- China
- Prior art keywords
- module
- automobile brake
- brake disc
- layer
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0002 — Image analysis; inspection of images, e.g. flaw detection
- G06N3/0464 — Neural networks; convolutional networks [CNN, ConvNet]
- G06T7/11 — Segmentation; region-based segmentation
- G06V10/764 — Image or video recognition using machine learning; classification, e.g. of video objects
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82 — Image or video recognition using neural networks
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/20112 — Image segmentation details
- G06T2207/20132 — Image cropping
Description
Technical Field

The present invention belongs to the technical field of automobile brake discs, and specifically relates to a deep-learning-based method for detecting surface defects of automobile brake discs.

Background Art

As one of the key components of an automobile braking system, the brake disc converts the kinetic energy of the vehicle into heat through friction with other braking components, thereby decelerating or stopping the vehicle. During brake disc production, factors such as raw materials, equipment and process conditions, and manual operation inevitably cause sand holes, scratches, and dirt defects on the disc surface. If a defective brake disc is installed on a vehicle, the defect will damage the disc during use: at best the braking distance is lengthened, and at worst the brakes fail entirely, which can easily cause traffic accidents. Strict surface defect inspection before brake discs leave the factory is therefore necessary to guarantee braking performance and to prevent defective discs from entering the market at the source.

At present, pre-delivery surface defect inspection of automobile brake discs is usually performed either manually or with machine vision. Manual inspection suffers from high cost, poor real-time performance, and frequent false and missed detections, and cannot meet the high-precision requirements of brake disc surface defect detection. Machine-vision-based inspection, owing to the complex factory environment and objective factors such as noise introduced during image acquisition, typically requires multiple image processing algorithms to cooperate for feature extraction, making the defect detection pipeline complex and difficult to maintain and improve. Moreover, traditional image processing methods have limited feature extraction capability and perform poorly on complex defect information, resulting in weak detection performance and robustness. In recent years, with the continuous development of deep learning in computer vision, many researchers have applied deep-learning-based visual inspection to industrial settings, thereby avoiding the shortcomings of manual inspection and alleviating the poor robustness of machine-vision-based industrial inspection.
Summary of the Invention

The present invention provides a deep-learning-based method for detecting surface defects of automobile brake discs. Based on the original YOLOv5l object detection model, the backbone network, the feature fusion network, and the prediction network are improved to fit an automobile brake disc surface defect dataset, thereby improving detection accuracy and efficiency, enhancing the robustness of the detection algorithm, realizing automated inspection of brake discs, and solving the low efficiency and poor accuracy of manual inspection.

The technical solution of the present invention is described below in conjunction with the accompanying drawings.

A deep-learning-based method for detecting surface defects of automobile brake discs comprises the following steps:

Step 1: Build an image acquisition device at the end of the production line to capture surface images of automobile brake discs under different lighting conditions.

Step 2: Preprocess and annotate the acquired brake disc surface images, establish an automobile brake disc surface defect dataset, and divide all annotated sample images into a training set, a validation set, and a test set.

Step 3: Construct a brake disc surface defect detection model based on the surface defect dataset.

Step 4: Iteratively train and validate the detection model established in Step 3 using the training-set and validation-set samples, and select the weight checkpoint with the highest validation mAP as the optimal brake disc surface defect detection model.

Step 5: Test the optimal model obtained in Step 4 on the test-set samples from Step 2 and evaluate, by the mAP metric, whether the model meets the accuracy requirement; if not, repeat Step 4 and continue training the detection model; if so, proceed to Step 6.

Step 6: Use the detection model that meets the accuracy requirement to inspect brake disc surfaces and output the detection results, thereby realizing intelligent defect detection and recognition.
Further, the specific method of Step 1 is as follows:

An image acquisition device composed of a high-resolution CMOS area-array industrial camera and a surface light source photographs the automobile brake discs under different lighting conditions.

The different lighting conditions include lighting intensity and lighting quality.

Further, the specific method of Step 2 is as follows:

21) Crop the original brake disc surface images into 640×640 sub-images with a 20% overlap between adjacent sub-images, and apply mirroring and flipping to each sub-image with a probability of 50%.
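As a sketch of the tiling in step 21): a 20% overlap between adjacent 640×640 tiles corresponds to a stride of 640 × (1 − 0.2) = 512 pixels. The helper below is a hypothetical illustration, not code from the patent; it computes the top-left tile coordinates along one image axis, clamping the last tile to the image border.

```python
def tile_starts(length, tile=640, overlap=0.2):
    """Top-left coordinates of crop windows along one image axis.

    Adjacent tiles share `overlap` of their extent; the final tile is
    shifted back so it ends exactly at the image border.
    """
    stride = int(tile * (1 - overlap))      # 512 for 640-pixel tiles, 20% overlap
    starts = list(range(0, max(length - tile, 0) + 1, stride))
    if starts[-1] + tile < length:          # cover the right/bottom remainder
        starts.append(length - tile)
    return starts

# e.g. a 2048-pixel-wide image is covered by tiles starting at 0, 512, 1024, 1408
print(tile_starts(2048))
```

The same function is applied to both image axes to produce the full 2D grid of crops.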
22) Annotate the images processed in step 21) with the LabelImg annotation tool and export the annotations in VOC format.

23) Convert the VOC annotations to COCO format, completing the construction of the brake disc surface defect dataset.

24) Divide the dataset constructed in step 23) into a training set, a validation set, and a test set.

Further, the training set, validation set, and test set are divided in a ratio of 8:1:1.
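The 8:1:1 split in step 24) can be sketched as follows; this is a generic helper under the usual shuffle-then-slice convention, not code from the patent.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle a list of sample paths and split it into train/val/test."""
    items = list(samples)
    random.Random(seed).shuffle(items)      # deterministic shuffle for reproducibility
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset([f"img_{i}.jpg" for i in range(100)])
print(len(train), len(val), len(test))  # 80 10 10
```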
Further, the specific method of Step 3 is as follows:

31) Using the YOLOv5l detection model as the baseline, replace the original CSPDarkNet53 backbone feature extraction network of YOLOv5l with the self-built CSPDarkFormer backbone feature extraction network.

32) Improve the original YOLOv5l feature fusion network and use it as the feature fusion network of the brake disc surface defect detection model.

33) Decouple the prediction network of the original YOLOv5l, remove the confidence prediction branch while retaining the class prediction branch and the position prediction branch, and construct a loss function based on an anchor-free detection mechanism for model training.
Further, the specific method of step 31) is as follows:

The CSPDarkFormer backbone feature extraction network is based on the Transformer architecture, with the self-attention module commonly found in Transformers replaced by a convolutional module. It comprises a 5-layer structure: layer 1 is a Stem layer; layers 2-4 each consist of a Patch Embedding layer and a DFM module; layer 5 consists of a Patch Embedding layer, a DFM module, and an SPPF module. The feature maps of layers 3-5 of the backbone, denoted {C3, C4, C5}, are output as the inputs of the improved feature fusion network.

The Patch Embedding layer consists of a convolutional module with kernel size 3 and stride 2, followed by a BN normalization layer.

The DFM module consists of a C2Former module and a multi-layer perceptron (MLP) module.

The feature map is first normalized, then processed by the C2Former module and added element-wise to itself to form a residual connection; it then passes through a second normalization operation and the MLP module, realizing information interaction between different channels.

The computation of the DFM module is as follows:

X = Conv3×3,s=2(Input)
F1 = C2Former(Norm1(X)) + X
XDFM = MLP(Norm2(F1)) + F1

where XDFM is the output feature map of the DFM module; X is the feature map after downsampling by the Patch Embedding layer; Input is the input feature map of the Patch Embedding layer; Conv3×3,s=2(·) is the 3×3, stride-2 downsampling convolution applied to the input feature map; Norm1(·) and Norm2(·) are BN normalizations; C2Former(·) is the C2Former module; MLP(·) is the MLP module; F1 is the intermediate feature map of the DFM module.
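Each of the five backbone stages halves the spatial resolution (the Stem and each Patch Embedding layer use stride-2 convolutions), which is how a 640×640 input yields the {C3, C4, C5} maps at 80×80, 40×40, and 20×20. A quick sketch of this bookkeeping:

```python
def backbone_resolutions(input_size=640, num_downsamples=5):
    """Feature-map side length after the Stem and after each of the four
    Patch Embedding layers, each of which halves the resolution."""
    sizes = []
    size = input_size
    for _ in range(num_downsamples):
        size //= 2
        sizes.append(size)
    return sizes

# 640 -> 320 (Stem) -> 160 -> 80 (C3) -> 40 (C4) -> 20 (C5)
print(backbone_resolutions())  # [320, 160, 80, 40, 20]
```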
The C2Former module comprises two CBS modules with 1×1 convolution kernels and several DarkBlock modules; a CBS module consists of a convolutional module, a BN normalization layer, and a SiLU activation function.

The computation of the C2Former module is as follows:

XC2Former = f1×1([n×D(f1×1(X)), X])

where XC2Former is the output feature map of the C2Former module; X is its input feature map; n is the number of DarkBlock modules, which are connected in series, the feature map being concatenated after each DarkBlock and finally concatenated with the input X; f1×1 is a CBS module with a 1×1 convolution kernel; [·] is the concatenation operator; and D is a DarkBlock module, computed as follows:

XDarkBlock = f3×3(d5×5(X)) + X

where XDarkBlock is the output feature map of the DarkBlock module; X is its input feature map; d5×5 is a 5×5 DCBS module, consisting of a 5×5 depthwise separable convolution module, a BN normalization layer, and a SiLU activation function; f3×3 is a CBS module with a 3×3 convolution kernel; and the DarkBlock module adopts a residual connection.
The MLP module comprises a convolutional module with a 1×1 kernel and a SiLU activation function, computed as follows:

XMLP = g1×1(τ(g1×1(X)))

where XMLP is the output feature map of the MLP module; X is its input feature map; τ(·) is the SiLU activation function; g1×1 is a convolutional module with a 1×1 kernel.
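For illustration only: a 1×1 convolution acts as a per-pixel linear map over channels, so the MLP above reduces, at each spatial location, to two channel-mixing matrices with a SiLU in between. A minimal single-pixel sketch (the weights here are made up, not from the patent):

```python
import math

def silu(x):
    """SiLU activation: tau(x) = x * sigmoid(x)."""
    return x * (1.0 / (1.0 + math.exp(-x)))

def conv1x1(weights, channels):
    """A 1x1 convolution at one spatial location: a matrix-vector
    product over the channel dimension."""
    return [sum(w * c for w, c in zip(row, channels)) for row in weights]

def mlp(x, w1, w2):
    """X_MLP = g_1x1(tau(g_1x1(X))) for one pixel's channel vector."""
    return conv1x1(w2, [silu(v) for v in conv1x1(w1, x)])

# With identity weight matrices, the MLP degenerates to an element-wise SiLU.
out = mlp([1.0, -1.0], w1=[[1, 0], [0, 1]], w2=[[1, 0], [0, 1]])
```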
Further, the specific method of step 32) is as follows:

First, a SimSE lightweight channel attention module is added at the input; then the C3 modules in the original YOLOv5l feature fusion network are replaced with C2Former modules; finally, three feature maps of sizes 80×80, 40×40, and 20×20, denoted {P3, P4, P5}, are output as the inputs of the prediction network. The DarkBlock modules inside these C2Former modules have no residual connections.

The SimSE lightweight channel attention module is a lightweight SE attention module, computed as follows:

XSimSE = σ(g1×1(Avg(X)))

where XSimSE is the output feature map of the SimSE module; X is its input feature map; Avg(·) is global average pooling; g1×1 is a convolutional module with a 1×1 kernel; σ(·) is the HardSigmoid activation function.
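The formula above produces one attention weight per channel; in the usual SE design (assumed here, as the patent does not spell it out) those weights then rescale the channels of X. A small pure-Python sketch, with a made-up identity 1×1 conv and HardSigmoid taken as clip(x/6 + 0.5, 0, 1):

```python
def hard_sigmoid(x):
    """HardSigmoid: clip(x / 6 + 0.5, 0, 1)."""
    return min(1.0, max(0.0, x / 6.0 + 0.5))

def simse_weights(fmap, w):
    """Channel attention weights sigma(g_1x1(Avg(X))) for a feature map
    given as fmap[c][h][w]; `w` is the 1x1 conv's channel-mixing matrix."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mixed = [sum(wi * a for wi, a in zip(row, avg)) for row in w]
    return [hard_sigmoid(m) for m in mixed]

# 2 channels, 2x2 spatial, identity 1x1 conv (example values):
fmap = [[[3.0, 3.0], [3.0, 3.0]],       # channel 0 averages to 3.0
        [[-3.0, -3.0], [-3.0, -3.0]]]   # channel 1 averages to -3.0
print(simse_weights(fmap, [[1, 0], [0, 1]]))  # [1.0, 0.0]
```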
Further, the specific method of step 33) is as follows:

The loss function of the prediction network is expressed as:

Loss = λclsLcls + λbboxLbbox + λdflLdfl

where λcls, λbbox, and λdfl are the loss weight coefficients, set to 0.5, 7.5, and 0.375, respectively; Lcls is the classification loss; Lbbox and Ldfl together constitute the localization loss.

The main localization loss Lbbox is the CIOU loss, expressed as:

Lbbox = 1 − IOU + ρ²(b, bgt)/c² + αv

where IOU is the intersection-over-union between the predicted box and the ground-truth box; ρ²(b, bgt) is the squared Euclidean distance between the center of the predicted box and the center of the ground-truth box; c is the diagonal length of the smallest box enclosing the predicted and ground-truth boxes; and α and v measure aspect-ratio consistency, expressed as:

v = (4/π²)·(arctan(wgt/hgt) − arctan(w/h))²
α = v / ((1 − IOU) + v)

where w and h are the width and height of the predicted box, and wgt and hgt are the width and height of the ground-truth box.
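The three CIOU terms can be computed directly from corner coordinates. The sketch below follows the standard CIoU definition (with c the diagonal of the smallest enclosing box); it is an illustration, not the patent's implementation.

```python
import math

def ciou_loss(pred, gt):
    """CIOU loss for axis-aligned boxes given as (x1, y1, x2, y2):
    L = 1 - IOU + rho^2(b, b_gt) / c^2 + alpha * v
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # intersection-over-union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    area_p = (px2 - px1) * (py2 - py1)
    area_g = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_p + area_g - inter)
    # squared center distance over squared enclosing-box diagonal
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term
    w, h = px2 - px1, py2 - py1
    wgt, hgt = gx2 - gx1, gy2 - gy1
    v = 4 / math.pi ** 2 * (math.atan(wgt / hgt) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v

# Identical boxes give IOU = 1 and zero loss.
print(ciou_loss((0, 0, 10, 10), (0, 0, 10, 10)))  # 0.0
```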
The auxiliary localization loss Ldfl is the Distribution Focal Loss, expressed as:

Ldfl = −((yi+1 − y)·log(Si) + (y − yi)·log(Si+1))

where y is the distance from the center of the ground-truth box to its boundary; yi and yi+1 are the integer values obtained by rounding y down and up, respectively; Si and Si+1 are the predicted probabilities of the corresponding center-to-boundary distances; log is the logarithm.
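DFL interpolates the negative log-likelihood between the two integer bins bracketing the continuous target y. A small numeric illustration (the probability vector here is a made-up discrete distribution over integer distances):

```python
import math

def dfl(y, probs):
    """Distribution Focal Loss for one box edge.

    y     : continuous ground-truth distance from box center to the edge
    probs : predicted probability for each integer distance 0..len(probs)-1
    L = -((y_{i+1} - y) * log(S_i) + (y - y_i) * log(S_{i+1}))
    """
    yi = math.floor(y)                  # left integer neighbour y_i
    si, si1 = probs[yi], probs[yi + 1]
    return -((yi + 1 - y) * math.log(si) + (y - yi) * math.log(si1))

# Target halfway between bins 2 and 3, with mass split evenly: loss = ln 2.
print(round(dfl(2.5, [0.1, 0.1, 0.5, 0.5]), 4))  # 0.6931
```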
Lcls is the cross-entropy loss, expressed as:

Lcls = −(1/n)·Σi [yi·log(pi) + (1 − yi)·log(1 − pi)]

where n is the number of samples; pi is the predicted class probability score of sample i; yi is the ground-truth label.
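A direct transcription of the binary cross-entropy formula above (a generic sketch, not the patent's code):

```python
import math

def bce_loss(labels, probs):
    """Binary cross-entropy averaged over n samples:
    L = -(1/n) * sum(y_i * log(p_i) + (1 - y_i) * log(1 - p_i))
    """
    n = len(labels)
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(labels, probs)) / n

# Confident, correct predictions give a small loss.
print(bce_loss([1, 0], [0.9, 0.1]))
```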
Further, in Step 4:

The training configuration is as follows: 600 total training epochs; batch size 16; AdamW optimizer with an initial learning rate of 0.001 and cosine learning-rate decay; random cropping, Mosaic, MixUp, and random HSV transformation data augmentation during training, with Mosaic and MixUp disabled for the last 100 epochs.
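The patent specifies cosine decay but not its exact formula; a common form, sketched here as an assumption, anneals from the initial rate lr0 down to a floor lr_min over the training run:

```python
import math

def cosine_lr(epoch, total_epochs=600, lr0=0.001, lr_min=0.0):
    """Cosine learning-rate decay: lr(t) follows half a cosine period,
    from lr0 at epoch 0 down to lr_min at the final epoch."""
    t = epoch / total_epochs
    return lr_min + (lr0 - lr_min) * 0.5 * (1 + math.cos(math.pi * t))
```

For example, with the patent's settings the rate starts at 0.001, reaches 0.0005 at epoch 300, and approaches 0 at epoch 600.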
The training process is as follows: the backbone feature extraction network takes training images of resolution 640×640. The Stem layer first halves the feature-map resolution; the feature map then passes through three feature extraction layers, each composed of a Patch Embedding layer and a DFM module, and finally through a feature extraction layer composed of a Patch Embedding layer, a DFM module, and an SPPF module, completing the feature extraction. The backbone outputs three feature maps with resolutions 80×80, 40×40, and 20×20, denoted {C3, C4, C5}, as the input feature maps of the feature fusion network. After receiving {C3, C4, C5}, the feature fusion network first applies the SimSE lightweight channel attention module and then the improved fusion network, finally producing three feature maps with resolutions 80×80, 40×40, and 20×20, denoted {P3, P4, P5}, as the input feature maps of the prediction network.

The prediction network comprises three layers whose output resolutions correspond to their input resolutions. Each layer contains two branches, predicting class and position respectively; each branch consists of two CBS modules with 3×3 convolution kernels followed by a 1×1 convolution. The branch with output resolution 80×80 predicts small objects; the 40×40 branch predicts medium objects; the 20×20 branch predicts large objects. The brake disc defect detection model finally outputs defect classes and locations, each marked with a rectangular box.
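The 80×80, 40×40, and 20×20 prediction grids correspond to overall downsampling strides of 8, 16, and 32 for a 640×640 input (the stride values are inferred from the resolutions, not stated explicitly in the patent):

```python
def head_grid_sizes(input_size=640, strides=(8, 16, 32)):
    """Grid resolution of each prediction layer for a given input size."""
    return [input_size // s for s in strides]

print(head_grid_sizes())  # [80, 40, 20]
```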
Further, in Step 5, mAP is used as the evaluation metric, defined as follows:

AP = ∫₀¹ P(R) dR
mAP = (1/n)·Σi APi

where n is the number of defect classes; AP is the area under the P-R curve, representing the prediction accuracy for one defect class; mAP is the mean of the per-class AP values. P is the precision and R is the recall, defined as:

P = TP / (TP + FP)
R = TP / (TP + FN)

where TP is the number of correctly predicted positive samples; FP is the number of negative samples incorrectly predicted as positive; FN is the number of positive samples incorrectly predicted as negative.
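Precision and recall follow directly from the three counts; a one-line sketch for concreteness:

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP); R = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# 8 true positives, 2 false positives, 2 false negatives:
p, r = precision_recall(tp=8, fp=2, fn=2)
print(p, r)  # 0.8 0.8
```

AP for each class is then the area under the P-R curve swept out as the detection confidence threshold varies, and mAP averages AP over the defect classes.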
The beneficial effects of the present invention are:

1) The CSPDarkFormer backbone feature extraction network constructed by the present invention is based on the Transformer architecture, with the self-attention module commonly used in Transformers replaced by a convolutional module. It retains the cross-channel information interaction of the Transformer structure and significantly increases the model's ability to capture local context, while alleviating the sharp drop in detection speed that the self-attention mechanism suffers on large feature maps due to its heavy computation, thereby improving the efficiency and accuracy of feature extraction.

2) The improved feature fusion network uses large convolution kernels for feature fusion and feature extraction, enlarging the model's receptive field and improving detection accuracy.

3) Based on the original prediction network, the present invention decouples the prediction network and adopts an anchor-free detection mechanism, preventing prior information such as anchor boxes from affecting detection accuracy and thereby improving bounding-box regression accuracy.

4) Based on the anchor-free detection mechanism, the present invention optimizes the loss function by jointly representing the DFL loss and the bounding-box regression loss, increasing the comprehensiveness of positive- and negative-sample training, accelerating the convergence of bounding-box regression, and further improving detection accuracy and efficiency.
Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present invention and should not be regarded as limiting its scope; those of ordinary skill in the art may derive other related drawings from them without creative effort.
Fig. 1 is a schematic flow chart of the present invention;
Fig. 2a is an example image of sand holes;
Fig. 2b is an example image of a scratch;
Fig. 2c is an example image of dirt;
Fig. 3 is a schematic structural diagram of the deep-learning-based automobile brake disc surface defect detection model;
Fig. 4 is a schematic diagram of the DFM module;
Fig. 5 is a schematic diagram of the C2Former module;
Fig. 6a is a schematic diagram of the CBS module;
Fig. 6b is a schematic diagram of the DCBS module;
Fig. 7 shows the experimental detection results.
DETAILED DESCRIPTION
The present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention, not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the complete structures.
Embodiment 1
Referring to Fig. 1, this embodiment provides a deep-learning-based method for detecting surface defects of automobile brake discs, comprising the following steps:
Step 1: Build an image acquisition device at the end of the production line to capture images of the automobile brake disc surface under different lighting conditions, as follows:
An image acquisition device composed of a high-resolution CMOS area-array industrial camera and a surface light source photographs the automobile brake discs under different lighting conditions;
The different lighting conditions include adjusting the illumination intensity and illumination quality to simulate the complex environment of real factory production.
Step 2: Preprocess and annotate the acquired brake disc surface images, build an automobile brake disc surface defect dataset, and divide all annotated sample images into a training set, a validation set and a test set, as follows:
21) Crop the original brake disc surface images into sub-images of size 640×640, with a 20% overlap between adjacent sub-images; mirror and flip each sub-image with a probability of 50%;
22) Annotate the images produced in step 21) with the LabelImg annotation tool and output the annotations in VOC format;
23) Convert the VOC annotations to COCO format to facilitate model testing, completing the construction of the automobile brake disc surface defect dataset;
24) Divide the dataset built in step 23) into a training set, a validation set and a test set at a ratio of 8:1:1.
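The cropping and splitting arithmetic of steps 21) and 24) can be sketched in a few lines of plain Python; `crop_origins` and `split_sizes` are illustrative helper names, not part of the patent:

```python
def crop_origins(length, tile=640, overlap=0.2):
    """Top-left coordinates of 640x640 crops along one image axis with 20% overlap."""
    stride = int(tile * (1 - overlap))  # 640 * 0.8 = 512 pixels between crop origins
    xs, x = [], 0
    while x + tile < length:
        xs.append(x)
        x += stride
    last = length - tile  # clamp the final crop so it ends exactly at the image border
    if last not in xs:
        xs.append(last)
    return xs

def split_sizes(n, ratios=(0.8, 0.1, 0.1)):
    """Sample counts of the training/validation/test sets under the 8:1:1 split."""
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return n_train, n_val, n - n_train - n_val
```

For example, along a 1664-pixel axis the crop origins would be 0, 512 and 1024, and a 100-image dataset splits into 80/10/10.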
Step 3: Based on the automobile brake disc surface defect dataset, construct a brake disc surface defect detection model. With the YOLOv5l detection model as the baseline, optimize its backbone feature extraction network, feature fusion network and prediction network according to the characteristics of brake disc surface defects, as follows:
31) Taking the YOLOv5l detection model as the baseline, replace the original CSPDarkNet53 backbone feature extraction network of YOLOv5l with the self-built CSPDarkFormer backbone feature extraction network;
The CSPDarkFormer backbone is based on the Transformer architecture and replaces the self-attention module common in Transformers with a convolution module. It has a 5-layer structure: layer 1 is a Stem layer; layers 2-4 each consist of a Patch Embedding layer and a DFM module; layer 5 consists of a Patch Embedding layer, a DFM module and an SPPF module. The feature maps of layers 3-5 of the backbone, denoted {C3, C4, C5}, are output as the inputs of the improved feature fusion network;
The Patch Embedding layer comprises a convolution module with kernel size 3 and stride 2, followed by a BN normalization layer;
The DFM module mainly comprises a C2Former module and a multi-layer perceptron module MLP;
The feature map is first normalized, then processed by the C2Former module and added to itself element by element to form a residual connection; it then passes through another normalization operation and the MLP module, which realizes information interaction between different channels;
The calculation process of the DFM module is as follows:
F1 = X + C2Former(Norm1(X))
XDFM = F1 + MLP(Norm2(F1))
where XDFM is the output feature map of the DFM module; X is the feature map downsampled by the Patch Embedding layer, i.e. the result of applying a 3×3, stride-2 downsampling convolution to Input, the input feature map of the Patch Embedding layer; Norm1(·) and Norm2(·) are BN normalizations; C2Former(·) is the C2Former module; MLP(·) is the MLP multi-layer perceptron module; F1 is the intermediate feature map of the DFM module;
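The two residual equations of the DFM module can be wired up in a minimal sketch, with the normalization, C2Former and MLP operations stood in by plain callables (the real modules operate on tensors, not flat lists, so this only illustrates the residual data flow):

```python
def dfm_forward(x, norm1, c2former, norm2, mlp):
    """F1 = X + C2Former(Norm1(X)); X_DFM = F1 + MLP(Norm2(F1))."""
    f1 = [a + b for a, b in zip(x, c2former(norm1(x)))]   # first residual branch
    return [a + b for a, b in zip(f1, mlp(norm2(f1)))]    # second residual branch

# Toy stand-ins: identity norms, a doubling "C2Former" and a constant "MLP".
identity = lambda v: v
out = dfm_forward([1.0, 2.0], identity,
                  lambda v: [2 * a for a in v],
                  identity,
                  lambda v: [1.0] * len(v))
```

With these stand-ins, `out` is `[4.0, 7.0]`: the input survives both residual additions unchanged, which is the point of the connection scheme.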
The C2Former module comprises two CBS modules with 1×1 convolution kernels and several DarkBlock modules; each CBS module consists of a convolution module, a BN normalization layer and a SiLU activation function;
The calculation process of the C2Former module is as follows:
XC2Former = f1×1([[n×D(f1×1(X))], X])
where XC2Former is the output feature map of the C2Former module; X is its input feature map; n is the number of DarkBlock modules, which are connected in series; the feature map is concatenated after passing through each DarkBlock module, and is finally concatenated with the input X; f1×1 is a CBS module with a 1×1 convolution kernel; [·] is the Concat operator; D is the DarkBlock module, whose calculation process is as follows:
XDarkBlock = X + d5×5(f3×3(X))
where XDarkBlock is the output feature map of the DarkBlock module; X is its input feature map; d5×5 is a 5×5 DCBS module, which comprises a 5×5 depthwise separable convolution module, a BN normalization layer and a SiLU activation function; f3×3 is a CBS module with a 3×3 convolution kernel; in the backbone feature extraction network, the DarkBlock module uses a residual connection to alleviate the gradient vanishing phenomenon;
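One plausible reading of the C2Former data flow (split the doubled channels in two, run the DarkBlocks serially on one half while collecting every intermediate output, then concatenate everything before the final 1×1 CBS) can be sketched as follows; the flat-list representation and the helper name are illustrative only:

```python
def c2former_flow(x, dark_blocks):
    """Split channels, apply DarkBlocks serially, concatenate all intermediate outputs."""
    half = len(x) // 2
    a, b = x[:half], x[half:]       # Split operation: channels divided into two halves
    collected = [a, b]
    cur = b
    for block in dark_blocks:       # every DarkBlock output joins the final Concat
        cur = block(cur)
        collected.append(cur)
    # The trailing 1x1 CBS that adjusts channel count is omitted from this sketch.
    return [v for part in collected for v in part]

grown = c2former_flow([1.0, 2.0, 3.0, 4.0], [lambda v: [2 * a for a in v]])
```

With one doubling stand-in block, `grown` contains both halves plus the block output, showing how each DarkBlock widens the concatenated feature.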
The multi-layer perceptron MLP module comprises a convolution module with a 1×1 kernel and a SiLU activation function; its calculation process is as follows:
XMLP = g1×1(τ(g1×1(X)))
where XMLP is the output feature map of the MLP module; X is its input feature map; τ(·) is the SiLU activation function; g1×1 is a convolution module with a 1×1 convolution kernel.
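The SiLU activation τ(·) used throughout the CBS, DCBS and MLP modules is x·sigmoid(x); a one-line reference implementation:

```python
import math

def silu(x):
    """SiLU (swish) activation: x * sigmoid(x) = x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))
```

It is zero at the origin and smoothly gates negative inputs, e.g. silu(1) ≈ 0.7311 and silu(-1) ≈ -0.2689.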
32) Improve the original feature fusion network of YOLOv5l to serve as the feature fusion network of the brake disc surface defect detection model;
First, a SimSE lightweight channel attention module is added at the input side; then the C3 modules of the original YOLOv5l feature fusion network are replaced with C2Former modules; finally, three feature maps of sizes 80×80, 40×40 and 20×20 are output as the inputs of the prediction network, denoted {P3, P4, P5}. Taking the P4 output layer as an example, the output feature map of the P4 layer is computed from the intermediate feature maps of layers 3, 4 and 5 together with the layer-4 backbone output C4, by means of the upsampling UP(·) and downsampling Down(·) operations, 1×1 CBS modules f1×1, Concat operations [·], and the C2Former module C2Former(·). Here the C2Former module follows the same residual-connection scheme as the C3 module in the YOLOv5l feature fusion network, i.e. the DarkBlock modules inside the C2Former module have no residual connections.
The SimSE lightweight channel attention module is a lightweight variant of the SE attention module; its calculation process is as follows:
XSimSE = σ(g1×1(Avg(X)))
where XSimSE is the output feature map of the SimSE lightweight channel attention module; X is its input feature map; Avg(·) is the global average pooling operation; g1×1 is a convolution module with a 1×1 convolution kernel; σ(·) is the HardSigmoid activation function;
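A minimal sketch of the SimSE computation on per-channel values — global average pooling followed by a HardSigmoid — with the 1×1 channel-mixing convolution replaced by an identity for illustration:

```python
def hard_sigmoid(x):
    """HardSigmoid: clamp(x / 6 + 0.5, 0, 1), the sigma(.) of the SimSE module."""
    return min(max(x / 6.0 + 0.5, 0.0), 1.0)

def simse_weights(channel_maps):
    """Per-channel attention weights: Avg-pool each map, then HardSigmoid.
    The 1x1 convolution that mixes the pooled channels is assumed identity here."""
    pooled = [sum(c) / len(c) for c in channel_maps]  # global average pooling
    return [hard_sigmoid(p) for p in pooled]
```

Channels with strongly positive activations saturate to weight 1, strongly negative ones to 0, and zero-mean channels sit at 0.5.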
33) Decouple the prediction network of the original YOLOv5l: remove the prediction confidence branch and retain the class prediction branch and the position prediction branch, while constructing a loss function based on the anchor-free detection mechanism for model training.
The loss function of the prediction network is expressed as:
Loss = λclsLcls + λbboxLbbox + λdflLdfl
where λcls, λbbox and λdfl are the loss weight coefficients, set to 0.5, 7.5 and 0.375 respectively; Lcls is the classification loss; Lbbox and Ldfl together constitute the localization loss;
The main localization loss Lbbox is the CIOU loss, expressed as:
Lbbox = 1 − IOU + ρ2(b, bgt)/c2 + αv
where IOU is the intersection-over-union between the predicted box and the ground-truth box; ρ2(b, bgt) is the squared distance between the center point of the predicted box and that of the ground-truth box; c is the diagonal length of the smallest box enclosing the predicted and ground-truth boxes; α and v measure the aspect-ratio consistency, expressed as:
v = (4/π2)·(arctan(wgt/hgt) − arctan(w/h))2
α = v/((1 − IOU) + v)
where w and h are the width and height of the predicted box, and wgt and hgt are the width and height of the ground-truth box;
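The CIOU terms can be computed directly from corner-format boxes; a self-contained sketch (boxes as (x1, y1, x2, y2) tuples, `ciou_loss` is an illustrative helper name, and a small epsilon guards the α denominator when the boxes coincide):

```python
import math

def ciou_loss(b, g):
    """CIOU loss = 1 - IOU + rho^2 / c^2 + alpha * v for corner-format boxes."""
    # IOU: intersection area over union area
    iw = max(0.0, min(b[2], g[2]) - max(b[0], g[0]))
    ih = max(0.0, min(b[3], g[3]) - max(b[1], g[1]))
    inter = iw * ih
    union = (b[2] - b[0]) * (b[3] - b[1]) + (g[2] - g[0]) * (g[3] - g[1]) - inter
    iou = inter / union
    # Squared center-point distance rho^2
    rho2 = ((b[0] + b[2]) / 2 - (g[0] + g[2]) / 2) ** 2 \
         + ((b[1] + b[3]) / 2 - (g[1] + g[3]) / 2) ** 2
    # Squared diagonal c^2 of the smallest enclosing box
    c2 = (max(b[2], g[2]) - min(b[0], g[0])) ** 2 \
       + (max(b[3], g[3]) - min(b[1], g[1])) ** 2
    # Aspect-ratio term v and trade-off coefficient alpha
    v = (4 / math.pi ** 2) * (math.atan((g[2] - g[0]) / (g[3] - g[1]))
                              - math.atan((b[2] - b[0]) / (b[3] - b[1]))) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)  # epsilon avoids 0/0 for identical boxes
    return 1 - iou + rho2 / c2 + alpha * v
```

Identical boxes give a loss of 0, and the loss grows as the boxes drift apart even when they no longer overlap, which is what makes CIOU a useful regression target.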
The auxiliary localization loss Ldfl is the Distribution Focal Loss; it is expressed jointly with the localization loss to accelerate the convergence of bounding-box regression, and is defined as:
Ldfl = −((yi+1 − y)·log(Si) + (y − yi)·log(Si+1))
where y is the distance from the center point of the ground-truth box to its boundary; yi and yi+1 are the integer values obtained by rounding y down and up respectively; Si and Si+1 are the predicted probabilities of the distances from the center point of the predicted box to the boundary; log is the logarithm operation;
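The Distribution Focal Loss interpolates a cross-entropy between the two integer bins that bracket the continuous target y; a direct transcription of the formula above (Si and Si+1 are the predicted probabilities of the lower and upper bins):

```python
import math

def dfl(y, s_i, s_i1):
    """L_dfl = -((y_{i+1} - y) * log(S_i) + (y - y_i) * log(S_{i+1}))."""
    y_i = math.floor(y)   # target rounded down to the lower bin
    y_i1 = y_i + 1        # target rounded up to the upper bin
    return -((y_i1 - y) * math.log(s_i) + (y - y_i) * math.log(s_i1))
```

When y falls exactly on a bin and that bin has probability 1, the loss is 0; splitting the probability mass evenly for a halfway target gives ln 2 ≈ 0.6931.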
Lcls is the cross-entropy loss, expressed as:
Lcls = −(1/n)·Σi [yi·log(pi) + (1 − yi)·log(1 − pi)]
where n is the number of samples; pi is the predicted class probability score of sample i; yi is the ground-truth label.
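The exact form of the classification loss is partially elided in the source, so the sketch below assumes the standard binary cross-entropy averaged over n samples:

```python
import math

def cross_entropy(preds, labels):
    """Assumed form of L_cls: mean binary cross-entropy over the samples."""
    total = 0.0
    for p, y in zip(preds, labels):
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(preds)
```

A maximally uncertain prediction (p = 0.5) on a positive label costs ln 2 ≈ 0.6931, while a confident correct prediction (p = 0.9) costs only about 0.1054.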
Step 4: Iteratively train and validate the defect detection model established in step 3 with the training-set and validation-set samples, and select the weight model with the highest mAP on the validation set as the optimal brake disc surface defect detection model;
The training setup is as follows: 600 training epochs in total; batch size 16; AdamW optimizer with an initial learning rate of 0.001 and cosine learning-rate decay; random cropping, Mosaic, MixUp and random HSV transformation are used for data augmentation during training, with Mosaic and MixUp turned off for the last 100 epochs;
The training process is as follows: the backbone feature extraction network takes a 640×640 training image; the Stem layer first halves the feature-map resolution; the image then passes through three feature extraction layers, each composed of a Patch Embedding layer and a DFM module, and finally through a feature extraction layer composed of a Patch Embedding layer, a DFM module and an SPPF module to complete the feature extraction. The backbone outputs three feature maps with resolutions 80×80, 40×40 and 20×20, denoted {C3, C4, C5}, as the input feature maps of the feature fusion network. After the fusion network receives {C3, C4, C5}, the maps first pass through the SimSE lightweight channel attention module and then through the improved feature fusion network, finally yielding three feature maps with resolutions 80×80, 40×40 and 20×20, denoted {P3, P4, P5}, as the input feature maps of the prediction network;
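The resolution bookkeeping of the backbone (a Stem layer plus four stride-2 Patch Embedding layers, each halving the 640×640 input) can be checked in a few lines; the helper name is illustrative:

```python
def backbone_resolutions(input_size=640, stages=5):
    """Resolution after the Stem and each subsequent Patch Embedding layer (stride 2 each)."""
    res = [input_size]
    for _ in range(stages):
        res.append(res[-1] // 2)
    return res

res = backbone_resolutions()          # [640, 320, 160, 80, 40, 20]
c3, c4, c5 = res[3], res[4], res[5]   # the {C3, C4, C5} maps fed to the fusion network
```

The last three stages land exactly on the 80×80, 40×40 and 20×20 maps quoted in the text.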
The prediction network contains three layers whose output resolutions correspond to their input resolutions. Each layer has two branches, predicting class and position respectively, and each branch contains two CBS modules with 3×3 convolution kernels followed by a 1×1 convolution. The branch with 80×80 output resolution is responsible for predicting small targets, the 40×40 branch for medium targets, and the 20×20 branch for large targets. Finally, the brake disc defect detection model outputs the defect classes and locations, each marked with a rectangular box.
Step 5: Test the optimal brake disc surface defect detection model obtained in step 4 with the test-set samples from step 2, and evaluate whether the model meets the accuracy requirement according to the mAP metric; if not, repeat step 4 and continue training the model; if so, proceed to step 6;
In step 5, mAP is used as the evaluation metric, defined as follows:
AP = ∫01 P(R) dR
mAP = (1/n)·Σi=1n APi
where n is the number of defect categories; AP is the area enclosed by the P-R curve and the coordinate axes, representing the prediction accuracy for one defect category; mAP is the mean of the per-category accuracies, where P is the precision and R is the recall, defined as:
P = TP/(TP + FP)
R = TP/(TP + FN)
where TP is the number of correctly predicted positive samples; FP is the number of negative samples incorrectly predicted as positive; FN is the number of positive samples incorrectly predicted as negative.
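Precision, recall and the mAP average follow directly from the TP/FP/FN counts; a small sketch with illustrative numbers:

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP); R = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def mean_ap(per_class_ap):
    """mAP: mean of the per-class AP values (areas under each P-R curve)."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, a class with 8 true positives, 2 false positives and 2 false negatives scores P = R = 0.8, and three classes with APs 0.9, 0.8 and 0.7 yield an mAP of 0.8.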
Step 6: Use the defect detection model that meets the accuracy requirement to inspect the brake disc surface and output the detection results, thereby realizing intelligent defect detection and recognition.
Embodiment 2
Referring to Fig. 1, this embodiment applies Embodiment 1 to a practical case of detecting brake disc surface defects, as follows:
S1. Build an image acquisition device at the end of the production line to capture brake disc surface images under different lighting conditions.
Specifically, a high-resolution CMOS industrial area-array camera and a surface light source form the image acquisition device, which captures images of the brake disc surfaces on the assembly line. During acquisition, the brightness and illumination angle of the light source are adjusted to obtain surface defect images under different illumination intensities and qualities, simulating the complex production environment of a factory and enriching the dataset. The defects fall mainly into three categories: sand holes, scratches and dirt. Each image has a resolution of 5472×3648.
S2. Preprocess and annotate the brake disc surface images, build the surface defect dataset, and divide all annotated sample images into a training set, a validation set and a test set.
Specifically, because the resolution of the original images is too large for the subsequent defect detection algorithm, the original images are first cropped into 640×640 sub-images, keeping a 20% overlap between adjacent sub-images. After cropping, the images are cleaned to remove data that contain no defects or whose defects are blurred. The cleaned dataset is then mirrored and flipped with a probability of 50% to expand it. The LabelImg annotation tool is used to annotate the surface defect dataset and output VOC-format annotations, which are then converted to COCO format to facilitate quantitative evaluation of model performance, completing the dataset construction. Finally, the dataset is divided into a training set, a validation set and a test set at a ratio of 8:1:1. Sample images are shown in Figs. 2a, 2b and 2c: Fig. 2a shows a sand hole defect, Fig. 2b a scratch defect, and Fig. 2c a dirt defect.
S3. Build the brake disc surface defect detection model based on the surface defect dataset.
Specifically, with the YOLOv5l detection model as the baseline and guided by the surface defect dataset, the YOLOv5l target detection model is optimized in three respects: the backbone feature extraction network, the feature fusion network and the prediction network. The overall framework is shown in Fig. 3, and the specific optimizations are as follows:
(1) Backbone feature extraction network;
The backbone is based on the Transformer architecture, replacing the self-attention module common in Transformers with a convolution module. The CSPDarkFormer backbone feature extraction network is built and replaces the original CSPDarkNet53 backbone. CSPDarkFormer has a 5-layer structure: layer 1 is a Stem layer; layers 2-4 each consist of a Patch Embedding layer and a DFM module; layer 5 consists of a Patch Embedding layer, a DFM module and an SPPF module. The feature maps of layers 3-5 of the backbone, denoted {C3, C4, C5}, are output as the inputs of the improved feature fusion network;
The Patch Embedding layer comprises a convolution module with kernel size 3 and stride 2 and a BN normalization layer, and is mainly used to downsample the feature map. Given an input X, its calculation process is as follows:
PE(X) = Norm(Conv(X))
where PE(·) is the Patch Embedding layer; X is its input feature map; Conv(·) is the convolution operation; Norm(·) is BN normalization, which subtracts the mean of the current batch from the input x, divides by the standard deviation of the current batch, and then multiplies by γ and adds β to obtain the normalized value xi, expressed as:
xi = γ·(x − E(x))/√(Var(x) + ε) + β
where x is the input feature map; ε is a small non-zero constant that prevents division by zero; γ and β are learnable parameters of the model; E(x) and Var(x) are the mean and variance of the current batch, expressed as:
E(x) = (1/n)·Σi xi
Var(x) = (1/n)·Σi (xi − E(x))2
where n is the current batch size.
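The batch-normalization recipe above (subtract the batch mean, divide by the batch standard deviation, then scale by γ and shift by β) fits in a few lines of plain Python for a batch of scalars:

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """gamma * (x - E(x)) / sqrt(Var(x) + eps) + beta for every x in the batch."""
    n = len(xs)
    mean = sum(xs) / n                           # E(x)
    var = sum((x - mean) ** 2 for x in xs) / n   # Var(x), biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]
```

A batch of [1.0, 3.0] normalizes to approximately [-1, 1]; a non-zero β simply shifts the whole result, as the formula predicts.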
The DFM module is an important feature extraction component of the backbone network. As shown in Fig. 4, it mainly comprises a C2Former module and a multi-layer perceptron module MLP. The feature map is first normalized, then processed by the C2Former module and added to itself element by element to form a residual connection; another normalization operation and the multi-layer perceptron then realize information interaction between different channels. The calculation process of the DFM module is as follows:
F1 = X + C2Former(Norm1(X))
XDFM = F1 + MLP(Norm2(F1))
where XDFM is the output feature map of the DFM module; X is the feature map downsampled by the Patch Embedding layer; Norm1(·) and Norm2(·) are BN normalizations; F1 is the intermediate feature map; C2Former(·) denotes the C2Former module; MLP(·) is the MLP multi-layer perceptron module;
The C2Former module, shown in Fig. 5, comprises two CBS modules with 1×1 convolution kernels and several DarkBlock modules. Given an input X, features are first extracted by a 1×1 CBS module, which doubles the number of channels; a Split operation then divides the channels into two halves. The first half continues through the feature extraction path, where a Concat operation is performed after each DarkBlock module; the second half is then concatenated with the feature maps extracted from the first half, and a final 1×1 CBS module adjusts the number of output channels.
The feature extraction part uses the DarkBlock module, which consists of a CBS module with a 3×3 convolution kernel and a DCBS module with a 5×5 kernel connected in series; a residual connection is added inside the module to prevent gradient vanishing.
The multi-layer perceptron module MLP is a basic module of the Transformer architecture, mainly used for information interaction between channels. Its calculation process is as follows:
XMLP = g1×1(τ(g1×1(X)))
where XMLP is the output feature map of the MLP module; X is its input feature map; τ(·) is the SiLU activation function; g1×1 is a convolution module with a 1×1 convolution kernel;
The CSPDarkFormer backbone feature extraction network constructed by the present invention is based on the Transformer architecture and replaces the commonly used self-attention module in the Transformer structure with a convolution module. It retains the cross-channel information interaction capability of the Transformer, significantly strengthens the model's ability to capture local context information, and alleviates the sharp drop in detection speed that the Transformer self-attention mechanism suffers on large feature maps due to excessive computation, thereby improving the efficiency and accuracy of feature extraction.
(2) Feature fusion network;
The improved feature fusion network serves as the feature fusion network of the brake disc surface defect detection model. Compared with the original YOLOv5l feature fusion network, the improvements are: first, a SimSE lightweight channel attention module is added at the input side; then the C3 modules of the YOLOv5l feature fusion network are replaced with C2Former modules; finally, three feature maps of sizes 80×80, 40×40 and 20×20 are output as the inputs of the prediction network, denoted {P3, P4, P5}. Taking the P4 output layer as an example, the output feature map of the P4 layer is computed from the intermediate feature maps of layers 3, 4 and 5 together with the layer-4 backbone output C4, by means of the upsampling UP(·) and downsampling Down(·) operations, 1×1 CBS modules f1×1, Concat operations [·], and the C2Former module C2Former(·); the C2Former module follows the same residual-connection scheme as the C3 module in the YOLOv5l feature fusion network, i.e. the DarkBlock modules of the C2Former modules in the feature fusion network have no residual connections.
The lightweight channel attention module SimSE is a lightweight improvement of the SE attention module. The original SE attention mechanism uses fully connected layers to redistribute the channel weights; when the input feature map is large, these fully connected layers carry a large number of parameters and computations, which hinders practical deployment of the model. In the SimSE channel attention module, the number of fully connected layers is therefore reduced and convolution is used in their place, lowering both the computation and the parameter count. Its calculation process is as follows:
XSimSE = σ(g1×1(Avg(X)))
where XSimSE is the output feature map of the SimSE lightweight channel attention module; X is the input of the channel attention module; Avg(·) denotes the global average pooling layer; g1×1 is a convolution module with a 1×1 convolution kernel; σ(·) is the HardSigmoid activation function.
The improved feature fusion network of the present invention uses large convolution kernels for feature fusion and feature extraction, enlarging the receptive field of the model and improving its detection accuracy.
(3)预测网络;(3) Prediction network;
预测网络在原有YOLOv5l预测网络的基础上进行解耦操作,移除预测置信度分支,保留类别预测分支和位置预测分支,同时基于无锚框检测机制构建损失函数用于模型的训练,汽车刹车盘表面缺陷检测模型损失函数如下:The prediction network is decoupled based on the original YOLOv5l prediction network, removing the prediction confidence branch, retaining the category prediction branch and the position prediction branch, and building a loss function based on the anchor-free box detection mechanism for model training. The loss function of the automobile brake disc surface defect detection model is as follows:
Loss=λclsLcls+λbboxLbbox+λdflLdfl Loss=λ cls L cls +λ bbox L bbox +λ dfl L dfl
式中,λcls、λbbox和λdfl为各个损失权重系数,分别选取0.5、7.5和0.375。Where λ cls , λ bbox and λ dfl are the loss weight coefficients, which are selected as 0.5, 7.5 and 0.375 respectively.
Lcls为分类损失,采用交叉熵损失函数,可具体表示为:L cls is the classification loss, using the cross-entropy loss function, which can be specifically expressed as:

Lcls = -(1/n)Σ[yi·log(ŷi) + (1-yi)·log(1-ŷi)]

式中,n为样本数量;ŷi为预测样本的类别概率分数;yi为真实标签;log为对数运算。Where n is the number of samples (the sum runs over all n samples); ŷ i is the predicted class probability score of the sample; y i is the true label; log denotes the logarithmic operation.
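The classification loss can be sketched in NumPy as follows. This is a minimal binary-form sketch under our assumptions (per-sample averaging, labels in {0, 1}); the function name is ours.

```python
import numpy as np

def bce_loss(y_true, y_pred):
    """Cross-entropy classification loss, averaged over n samples.

    y_true: ground-truth labels (0 or 1); y_pred: predicted probability scores.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

# Confident correct predictions give a small loss
print(round(bce_loss([1, 0], [0.9, 0.1]), 4))  # 0.1054
```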
Lbbox和Ldfl共同组成定位损失,其中,Lbbox为CIOU损失函数,Ldfl为Distribution Focal Loss损失函数,具体可表示为:L bbox and L dfl together constitute the localization loss, where L bbox is the CIOU loss function and L dfl is the Distribution Focal Loss function, which can be specifically expressed as:

Lbbox = 1 - IOU + ρ2(b,bgt)/c2 + αv

式中,IOU为预测框与真实框之间的交并比;ρ2(b,bgt)为预测框中心点与真实框中心点的L2距离的平方;c为包含预测框与真实框的最小外接框的对角线距离;α和v用于衡量宽高比,可具体表示为:Where IOU is the intersection-over-union between the predicted box and the true box; ρ 2 (b, b gt ) is the squared L2 distance between the center point of the predicted box and that of the true box; c is the diagonal length of the smallest box enclosing the predicted and true boxes; α and v measure the aspect-ratio consistency and can be specifically expressed as:

v = (4/π2)·(arctan(wgt/hgt) - arctan(w/h))2
α = v/((1-IOU)+v)

式中,w和h分别为预测框的宽和高,wgt和hgt分别为真实框的宽和高。Where w and h are the width and height of the predicted box, and w gt and h gt are the width and height of the true box, respectively.
Ldfl = -((yi+1 - y)·log(Si) + (y - yi)·log(Si+1))
式中,y为真实框中心点到边界的距离;yi和yi+1分别表示真实值y向下取整和向上取整的整数值;Si与Si+1表示距离取yi和yi+1时对应的预测概率值。Where y is the distance from the center point of the true box to the boundary; y i and y i+1 are the integer values obtained by rounding y down and up respectively; S i and S i+1 are the predicted probability values corresponding to the distances y i and y i+1.
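The two localization terms can be sketched in NumPy as follows, in single-box form. The variable names, the (x1, y1, x2, y2) box convention, and the enclosing-box reading of c are our assumptions for illustration.

```python
import numpy as np

def ciou_loss(box_p, box_g):
    """CIOU loss sketch for axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Intersection-over-union
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)
    # rho^2: squared distance between the two box centers
    cp = ((box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2)
    cg = ((box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2)
    rho2 = (cp[0] - cg[0]) ** 2 + (cp[1] - cg[1]) ** 2
    # c^2: squared diagonal of the smallest box enclosing both boxes
    cw = max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])
    ch = max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term v and its weight alpha
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = 4 / np.pi ** 2 * (np.arctan(wg / hg) - np.arctan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return 1 - iou + rho2 / c2 + alpha * v

def dfl_loss(y, s):
    """DFL sketch: y is the continuous target distance, s[i] the predicted
    probability for integer bin i; loss interpolates the two nearest bins."""
    yl = int(np.floor(y))
    yr = yl + 1
    return -((yr - y) * np.log(s[yl]) + (y - yl) * np.log(s[yr]))

# Identical boxes give zero CIOU loss
print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0
```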
本发明基于原有预测网络,将预测网络进行解耦操作,采用无锚检测机制,防止锚框等先验信息对检测精度的影响,从而提高边界框回归精度,并且基于无锚检测机制,优化了损失函数,将DFL损失与边界框回归CIOU损失联合表示,增加正负样本训练的全面性,提高边界框回归的收敛速度,进一步提升模型检测精度和效率。Based on the original prediction network, the present invention decouples the prediction network and adopts an anchor-free detection mechanism, preventing prior information such as anchor boxes from affecting detection accuracy and thereby improving bounding-box regression accuracy. Building on the anchor-free mechanism, the loss function is also optimized: the DFL loss is represented jointly with the CIOU bounding-box regression loss, which broadens the coverage of positive and negative samples during training, speeds up the convergence of bounding-box regression, and further improves the model's detection accuracy and efficiency.
S4、利用训练集样本和验证集样本对S3建立的缺陷检测模型进行迭代训练及验证,选取验证集mAP指标最高的权重模型作为最终模型。S4. Use the training set and validation set samples to iteratively train and validate the defect detection model established in S3, and select the model weights with the highest validation mAP as the final model.
具体的,训练方式包括:训练总轮数为600轮,Batch size选取16,优化器采用AdamW优化器,初始学习率为0.001,采用cos学习率衰减方式,训练过程中采用随机裁剪、Mosaic、Mixup、随机HSV变换数据增强方式,最后100轮关闭Mosaic和MixUp数据增强,防止过拟合。Specifically, the training settings are: 600 training epochs in total; batch size 16; AdamW optimizer; initial learning rate 0.001 with cosine learning-rate decay; random cropping, Mosaic, MixUp and random HSV transformation used for data augmentation during training, with Mosaic and MixUp disabled for the last 100 epochs to prevent overfitting.
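The cosine decay schedule named above can be sketched as follows. The exact schedule form (and the final learning rate of 0) is our assumption; the patent only specifies the initial rate of 0.001 over 600 epochs.

```python
import math

def cosine_lr(epoch, total_epochs=600, lr0=0.001, lr_min=0.0):
    # Cosine decay from lr0 at epoch 0 down to lr_min at the final epoch
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_lr(0))    # 0.001
print(cosine_lr(300))  # ≈ 0.0005 (halfway)
print(cosine_lr(600))  # 0.0
```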
训练过程包括:主干特征提取网络输入分辨率大小为640×640的训练图像,首先经过Stem层后将特征图分辨率减小为原来的1/2,然后依次经过3层由Padding Embedding层和DFM模块组成的特征提取层,最后再经过由Padding Embedding层、DFM模块和SPPF模块组成的特征提取层,完成最后的特征提取,最终,主干特征提取网络输出三层特征图,其分辨率大小分别为80×80、40×40、20×20,记作{C3,C4,C5},作为特征融合网络的输入特征图;在特征融合网络获得{C3,C4,C5}后,首先经过SimSE轻量级通道注意力模块,然后经过改进的特征融合网络,最终获得三层特征图,其分辨率大小分别为80×80、40×40、20×20,分别记作{P3,P4,P5},作为预测网络的输入特征图;预测网络包含三层,每层输出分辨率与输入分辨率相对应,每一层包括两个分支,分别预测类别及位置,每个分支均包含两个卷积核大小为3×3的CBS模块以及卷积核大小为1×1的卷积,其中,输出分辨率80×80的预测网络分支负责预测小目标,比如砂眼以及细小脏污或划痕缺陷,输出分辨率40×40的预测网络分支负责预测中等目标,输出分辨率20×20的预测网络分支负责预测大目标,比如大面积脏污或较长划痕缺陷。最终汽车刹车盘缺陷检测模型输出缺陷类别及位置信息,均由矩形框标出。The training process is as follows: the backbone feature extraction network takes a training image with a resolution of 640×640, which first passes through the Stem layer, reducing the feature-map resolution to half the original. It then passes through 3 feature extraction layers, each composed of a Padding Embedding layer and a DFM module, and finally through a feature extraction layer composed of a Padding Embedding layer, a DFM module and an SPPF module, completing the feature extraction.
Finally, the backbone feature extraction network outputs three feature maps with resolutions of 80×80, 40×40 and 20×20, denoted {C 3 ,C 4 ,C 5 }, which serve as the input feature maps of the feature fusion network. After receiving {C 3 ,C 4 ,C 5 }, the feature fusion network first applies the SimSE lightweight channel attention module and then the improved fusion structure, finally producing three feature maps with resolutions of 80×80, 40×40 and 20×20, denoted {P 3 ,P 4 ,P 5 }, which serve as the input feature maps of the prediction network. The prediction network contains three layers, each with an output resolution matching its input resolution. Each layer includes two branches, predicting category and position respectively, and each branch contains two CBS modules with 3×3 convolution kernels followed by a 1×1 convolution. The branch with an output resolution of 80×80 predicts small targets, such as sand holes and small dirt or scratch defects; the branch with an output resolution of 40×40 predicts medium targets; and the branch with an output resolution of 20×20 predicts large targets, such as large-area dirt or long scratch defects. Finally, the automobile brake disc defect detection model outputs the defect category and location information, both marked by rectangular boxes.
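The feature-map sizes quoted above follow from repeated halving of the spatial resolution. The sketch below checks this arithmetic, assuming the Stem and each of the four subsequent extraction layers each halve the resolution (consistent with the 640 → 80/40/20 sizes stated in the text):

```python
def backbone_resolutions(input_size=640, num_stages=5):
    """Feature-map side lengths after the Stem and each extraction layer,
    assuming every stage halves the spatial resolution."""
    sizes = []
    size = input_size
    for _ in range(num_stages):
        size //= 2
        sizes.append(size)
    return sizes

sizes = backbone_resolutions()
print(sizes)       # [320, 160, 80, 40, 20]
print(sizes[-3:])  # [80, 40, 20] -> {C3, C4, C5}
```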
本发明中用到的CBS模块和DCBS模块分别如图6a和图6b所示,其中,CBS模块包含卷积模块、BN归一化层以及SiLU激活函数,DCBS包含由分离卷积和点卷积组成的深度可分离卷积模块、BN归一化层以及SiLU激活函数。The CBS module and DCBS module used in the present invention are shown in Figures 6a and 6b respectively. The CBS module consists of a convolution module, a BN normalization layer and a SiLU activation function, while the DCBS module consists of a depthwise separable convolution module (composed of a depthwise convolution and a pointwise convolution), a BN normalization layer and a SiLU activation function.
本发明中用到的激活函数有SiLU和Hardsigmoid,具体可表示为:The activation functions used in the present invention are SiLU and Hardsigmoid, which can be specifically expressed as:
SiLU(x) = x×sigmoid(x)
Hardsigmoid(x) = max(0, min(1, x/6 + 1/2))
S5、利用S2中的测试集样本对S4所获得的最优汽车刹车盘表面缺陷检测模型进行测试,根据mAP指标评估检测模型是否满足需求;若不满足要求,则重复S4,继续训练模型;若满足要求,则执行S6;S5, using the test set samples in S2 to test the optimal automobile brake disc surface defect detection model obtained in S4, and evaluating whether the detection model meets the requirements according to the mAP index; if it does not meet the requirements, repeat S4 and continue to train the model; if it meets the requirements, execute S6;
具体的,将mAP作为评估指标,具体定义如下:Specifically, mAP is used as the evaluation indicator, and its specific definition is as follows:
AP = ∫ P(R) dR (integrated over R from 0 to 1), mAP = (1/n)Σ APi

式中,n为类别数量,AP为P-R曲线和坐标轴所围成的面积,表示每类缺陷的预测精度;mAP为每类缺陷的精度的平均值,其中,P为准确率,R为召回率,可具体定义为:Where n is the number of categories; AP is the area enclosed by the P-R curve and the coordinate axes, representing the prediction accuracy for each defect class; mAP is the mean of the per-class AP values; P is the precision and R is the recall, which can be specifically defined as:
P = TP/(TP+FP), R = TP/(TP+FN)

式中,TP为被正确预测的正样本数量;FP为被错误预测为正样本的负样本数量;FN为被错误预测为负样本的正样本数量。Where TP is the number of correctly predicted positive samples; FP is the number of negative samples incorrectly predicted as positive; FN is the number of positive samples incorrectly predicted as negative.
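The metric definitions above reduce to simple arithmetic; the sketch below illustrates them with made-up counts (the function names and example values are ours):

```python
def precision_recall(tp, fp, fn):
    # P = TP / (TP + FP); R = TP / (TP + FN)
    return tp / (tp + fp), tp / (tp + fn)

def mean_ap(ap_per_class):
    # mAP is the mean of the per-class AP values
    return sum(ap_per_class) / len(ap_per_class)

p, r = precision_recall(tp=80, fp=20, fn=10)
print(p)            # 0.8
print(round(r, 4))  # 0.8889
print(round(mean_ap([0.9, 0.8, 0.7]), 4))  # 0.8
```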
S6、利用满足需求的缺陷检测模型对汽车刹车盘表面进行检测,输出检测结果,实现智能化缺陷检测与识别。S6. Use the defect detection model that meets the requirements to inspect the surface of the automobile brake disc, output the test results, and realize intelligent defect detection and identification.
实验验证Experimental verification
利用实例二中S4建立的汽车刹车盘表面缺陷数据集,对汽车刹车盘表面缺陷检测模型、YOLOv5l、YOLOv7l、Faster R-CNN和RetinaNet目标检测模型进行对比实验,结果如表1所示。Using the automobile brake disc surface defect dataset established in S4 of Embodiment 2, comparative experiments were conducted between the proposed automobile brake disc surface defect detection model and the YOLOv5l, YOLOv7l, Faster R-CNN and RetinaNet object detection models. The results are shown in Table 1.
表1Table 1
从表1中可以看出,本发明所提出的针对汽车刹车盘的表面缺陷检测模型具有良好的检测效果,相较于YOLOv5l和YOLOv7l分别提升1.5%和3%,与RetinaNet相比,性能提升8.1%,与双阶段目标检测模型Faster R-CNN相比,性能提升8.3%,满足实际生产精度需求。部分结果实验图如图7所示,其中每张图片左上角的数值为该类缺陷的概率值。As can be seen from Table 1, the surface defect detection model for automobile brake discs proposed in the present invention achieves good detection performance: it improves on YOLOv5l and YOLOv7l by 1.5% and 3% respectively, on RetinaNet by 8.1%, and on the two-stage detector Faster R-CNN by 8.3%, meeting actual production accuracy requirements. Some experimental result images are shown in Figure 7, where the value in the upper-left corner of each image is the predicted probability for that defect class.
综上,本发明在原有YOLOv5l目标检测模型的基础上对主干网络、特征融合网络以及预测网络进行改进以适配汽车刹车盘表面缺陷数据集,从而提高检测精度及效率,增强检测算法的鲁棒性,实现汽车刹车盘的自动化检测。In summary, based on the original YOLOv5l target detection model, the present invention improves the backbone network, feature fusion network and prediction network to adapt to the automobile brake disc surface defect dataset, thereby improving the detection accuracy and efficiency, enhancing the robustness of the detection algorithm, and realizing the automatic detection of automobile brake discs.
尽管已经示出和描述了本发明的实施例,对于本领域的普通技术人员而言,可以理解在不脱离本发明的原理和精神的情况下可以对这些实施例进行多种变化、修改、替换和变型,本发明的范围由所附权利要求及其等同物限定。Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and variations may be made to the embodiments without departing from the principles and spirit of the present invention, and that the scope of the present invention is defined by the appended claims and their equivalents.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311584766.4A CN117541568A (en) | 2023-11-24 | 2023-11-24 | Deep learning-based automobile brake disc surface defect detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311584766.4A CN117541568A (en) | 2023-11-24 | 2023-11-24 | Deep learning-based automobile brake disc surface defect detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117541568A true CN117541568A (en) | 2024-02-09 |
Family
ID=89795484
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311584766.4A Pending CN117541568A (en) | 2023-11-24 | 2023-11-24 | Deep learning-based automobile brake disc surface defect detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117541568A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117952983A (en) * | 2024-03-27 | 2024-04-30 | 中电科大数据研究院有限公司 | Intelligent manufacturing production process monitoring method and system based on artificial intelligence |
CN117975176A (en) * | 2024-04-02 | 2024-05-03 | 山东省计算中心(国家超级计算济南中心) | Steel defect detection method based on industrial dynamic visual self-prompt |
CN117975176B (en) * | 2024-04-02 | 2024-06-04 | 山东省计算中心(国家超级计算济南中心) | A steel defect detection method based on industrial dynamic visual self-prompt |
CN118212240A (en) * | 2024-05-22 | 2024-06-18 | 山东华德重工机械有限公司 | Automobile gear production defect detection method |
CN118603991A (en) * | 2024-06-05 | 2024-09-06 | 盐城工学院 | A method and system for positioning a light source in a brake disc defect detection process |
CN118603991B (en) * | 2024-06-05 | 2024-12-13 | 盐城工学院 | A method and system for positioning a light source in a brake disc defect detection process |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117541568A (en) | Deep learning-based automobile brake disc surface defect detection method | |
CN111223088B (en) | A casting surface defect recognition method based on deep convolutional neural network | |
CN114399672A (en) | A fault detection method for railway freight car brake shoe based on deep learning | |
CN114663346A (en) | Strip steel surface defect detection method based on improved YOLOv5 network | |
CN116485709A (en) | A Bridge Concrete Crack Detection Method Based on YOLOv5 Improved Algorithm | |
WO2024066035A1 (en) | Defect detection method and system based on battery surface image, and related device | |
CN111523540A (en) | Deep learning-based metal surface defect detection method | |
CN112164048B (en) | A method and device for automatic detection of surface defects of magnetic tiles based on deep learning | |
CN114495029A (en) | A traffic target detection method and system based on improved YOLOv4 | |
CN114972316A (en) | Battery case end surface defect real-time detection method based on improved YOLOv5 | |
CN113298767A (en) | Reliable go map recognition method capable of overcoming light reflection phenomenon | |
CN118314436A (en) | A lightweight insulator defect detection method based on improved YOLOv8 | |
CN117036243A (en) | Method, device, equipment and storage medium for detecting surface defects of shaving board | |
CN116612106A (en) | A method for surface defect detection of optical components based on YOLOX algorithm | |
CN116958073A (en) | Small sample steel defect detection method based on attention feature pyramid mechanism | |
CN116703919A (en) | A Surface Impurity Detection Method Based on the Optimal Transmission Distance Loss Model | |
CN117392116A (en) | Ultra-wide metal surface flaw detection and identification method | |
CN116342536A (en) | Aluminum strip surface defect detection method, system and equipment based on lightweight model | |
CN114549414A (en) | Abnormal change detection method and system for track data | |
CN110687120A (en) | Flange appearance quality inspection system | |
CN117853803A (en) | Small sample motor car anomaly detection method and system based on feature enhancement and communication network | |
CN117197146A (en) | Automatic identification method for internal defects of castings | |
CN117522798A (en) | Metal surface defect detection implementation method based on gradient re-parameterization target detection | |
CN115375677B (en) | Wine bottle defect detection method and system based on multi-path and multi-scale feature fusion | |
CN116645356A (en) | A Method for Surface Defect Detection of Nylon Cake Based on Grafted Neural Network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||