CN117011231B - Strip steel surface defect detection method and system based on improved YOLOv5 - Google Patents

Strip steel surface defect detection method and system based on improved YOLOv5

Info

Publication number
CN117011231B
CN117011231B (application CN202310774663.8A)
Authority
CN
China
Prior art keywords
feature
module
strip steel
network
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310774663.8A
Other languages
Chinese (zh)
Other versions
CN117011231A (en)
Inventor
张永平
沈思洁
徐森
郭乃瑄
孟海涛
陈朝峰
邵星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yancheng Institute of Technology
Yancheng Institute of Technology Technology Transfer Center Co Ltd
Original Assignee
Yancheng Institute of Technology
Yancheng Institute of Technology Technology Transfer Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yancheng Institute of Technology, Yancheng Institute of Technology Technology Transfer Center Co Ltd filed Critical Yancheng Institute of Technology
Priority to CN202310774663.8A priority Critical patent/CN117011231B/en
Publication of CN117011231A publication Critical patent/CN117011231A/en
Application granted granted Critical
Publication of CN117011231B publication Critical patent/CN117011231B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/766 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a strip steel surface defect detection method and system based on improved YOLOv5. The method comprises the following steps: building an improved anchor-free YOLOv5 network; acquiring a strip steel surface image; inputting the strip steel surface image into the anchor-free YOLOv5 network to obtain a strip steel surface defect detection result; and outputting the strip steel surface defect detection result. The method and system reduce the number of hyperparameters and improve the detection speed. To address the problems of unbalanced data and slow training, the loss function is improved: EIoU loss is adopted as the regression loss function, which is scale-invariant and directly optimizes the evaluation metric of bounding box regression, accelerating model fitting; and Focal Loss is adopted as the objectness loss to prevent network degradation caused by a large number of negative samples, improving detection accuracy on hard-to-classify samples by increasing their loss weight.

Description

Strip steel surface defect detection method and system based on improved YOLOv5
Technical Field
The invention relates to the technical field of computer data processing, in particular to a strip steel surface defect detection method and system based on improved YOLOv5.
Background
At present, strip steel is an important product of the modern steel industry. It is produced in large quantities and conveyed at high speed, which places high demands on the accuracy and speed of strip steel surface defect detection; because the production site has a complex environment and many interference factors, the detection method must also be robust and generalize well. Traditional methods such as eddy current testing, magnetic flux leakage testing and infrared testing can hardly meet the requirements of fast and accurate detection. Deep learning, which has developed rapidly in recent years, has become a research hotspot for strip steel surface defect image detection. Deep-learning-based target detection algorithms have replaced traditional target detection algorithms and become the mainstream. Existing deep learning object detection algorithms fall largely into two categories: one is the two-stage detection algorithms represented by Faster R-CNN and Mask R-CNN; the other is the one-stage detection algorithms represented by YOLO and SSD. The YOLO family balances detection accuracy and detection speed and keeps producing new reference models; it has gone through several generations, absorbed a large amount of advanced experience from other models, continuously improved detection accuracy and maintained good detection speed. Compared with earlier versions, the existing YOLOv5 algorithm greatly improves both detection speed and detection accuracy and is one of the best detection algorithms at present. However, the YOLO series of target detection algorithms, including YOLOv5, uses an anchor-based method: for each data set, initial anchor boxes are set, the network outputs prediction boxes based on the initial anchors during training, these are compared with the ground-truth boxes, and the network parameters are then updated iteratively by back-propagation. This approach has drawbacks. The anchors must be designed manually (aspect ratio, size and number), and different designs are required for different data sets, which is quite cumbersome. The anchor matching mechanism also matches objects of extreme sizes less frequently than objects of moderate size, so the network is less likely to learn these extreme samples during training. The number of anchors is large, and each anchor requires an IoU calculation, which reduces efficiency. Anchor-based algorithms also introduce non-maximum suppression (NMS), which improves detection accuracy, but its computational cost seriously limits the detection speed.
In recent years, although the basic detection framework for target detection has been established, new ideas such as Anchor-free and Transformer keep emerging, and optimization of the detection framework is still ongoing. Anchor-free methods represented by CornerNet and FCOS attempt to remove the prior boxes, reduce the number of hyperparameters, and have reached detection accuracy approaching that of anchor-based methods. Although this approach still has problems, it provides new ideas for target detection technology. An Anchor-free YOLOv5 network for detecting defects on the strip steel surface is therefore proposed. The YOLOv5s network in YOLOv5 is selected, and its network structure is modified in three ways.
Disclosure of Invention
The embodiment of the invention provides a strip steel surface defect detection method based on improved YOLOv5, which comprises the following steps:
building an improved anchor-free YOLOv5 network;
acquiring a strip steel surface image;
inputting the strip steel surface image into the anchor-free YOLOv5 network to obtain a strip steel surface defect detection result;
and outputting the strip steel surface defect detection result.
Preferably, the Anchor-free YOLOv5 network includes: a feature map module, a feature fusion module, a convolution module and a detector module which are connected in sequence.
Preferably, the working mechanism of the Anchor-free YOLOv5 network includes: an input network module, a Backbone network module, a Neck network module and a Prediction network module which are sequentially connected.
Preferably, the input network module performs Mosaic data enhancement on the input strip steel surface image.
Preferably, the Backbone network module is provided with a Focus structure and a CSPDarknet structure for extracting feature maps of the strip steel surface image after the Mosaic data enhancement;
the CSPDarknet structure includes 5 CSP modules.
Preferably, the Neck network module adopts a BiFPN module+PAN module structure;
the output of the Backbone network module is used as the input of the BiFPN module to perform feature fusion to obtain a feature pyramid;
the PAN module first copies the lowest layer of the feature pyramid to become the bottommost layer of a new feature pyramid;
performing a downsampling operation on the bottommost layer of the new feature pyramid;
the penultimate layer of the feature pyramid is subjected to a 3x3 convolution with a stride of 2, and is added to the downsampled bottom layer via a lateral connection; the addition adopts the concat operation;
finally, a 3x3 convolution is performed to fuse the features of the addition result.
Preferably, the Prediction network module uses EIoU loss for its output.
Preferably, the Anchor-free YOLOv5 network uses EIoU as the bounding box regression loss during training.
Preferably, after feature fusion, the Neck network module uses two convolution layers to predict the three-dimensional tensor encoding the bounding box, the objectness and the class prediction, based on Anchor-free bounding box regression.
The embodiment of the invention also provides a strip steel surface defect detection system based on improved YOLOv5, which comprises:
the building module is used for building an improved anchor-free YOLOv5 network;
the acquisition module is used for acquiring the strip steel surface image;
the detection module is used for inputting the strip steel surface image into the anchor-free YOLOv5 network to obtain a strip steel surface defect detection result;
and the output module is used for outputting the detection result of the surface defect of the strip steel.
The invention has the following beneficial effects:
(1) In order to alleviate the problems of the above-mentioned Anchor-based method, a novel anchor-free detection scheme is proposed, which reduces the number of hyperparameters and improves the detection speed.
(2) To address the problems of unbalanced data and slow training, the loss function is improved: EIoU loss is adopted as the regression loss function, which is scale-invariant and directly optimizes the evaluation metric of bounding box regression, accelerating model fitting.
(3) To prevent degradation of the network caused by a large number of negative samples, Focal Loss (FL) is used as the objectness loss. Focal Loss improves the detection accuracy of hard-to-classify samples by increasing their loss weight.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of an Anchor-free YOLOv5 network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a feature fusion module according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a convolution module according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a detector module according to an embodiment of the present invention;
FIG. 5 is a diagram of a BiFPN structure in an embodiment of the invention;
FIG. 6 is a schematic diagram of the FPN+PAN connection mode in an embodiment of the present invention;
FIG. 7 is a diagram of the PAN architecture in an embodiment of the present invention;
FIG. 8 is a regression diagram of bounding boxes according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a method for detecting defects on a strip steel surface by improving YOLOv5 according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The embodiment of the invention provides a strip steel surface defect detection method based on improved YOLOv5, which is shown in fig. 1 and comprises the following steps:
building an improved anchor-free YOLOv5 network;
acquiring a strip steel surface image;
inputting the strip steel surface image into the anchor-free YOLOv5 network to obtain a strip steel surface defect detection result;
and outputting the strip steel surface defect detection result.
The Anchor-free YOLOv5 network includes: a feature map module, a feature fusion module, a convolution module and a detector module which are connected in sequence.
The working mechanism of the Anchor-free YOLOv5 network comprises: an input network module, a Backbone network module, a Neck network module and a Prediction network module which are sequentially connected.
The input network module performs Mosaic data enhancement on the input strip steel surface image.
The Backbone network module is provided with a Focus structure and a CSPDarknet structure which are used for extracting feature maps of the strip steel surface image after the Mosaic data enhancement;
the CSPDarknet structure includes 5 CSP modules.
The Neck network module adopts a BiFPN module+PAN module structure;
the output of the Backbone network module is used as the input of the BiFPN module to perform feature fusion to obtain a feature pyramid;
the PAN module first copies the lowest layer of the feature pyramid to become the bottommost layer of a new feature pyramid;
performing a downsampling operation on the bottommost layer of the new feature pyramid;
the penultimate layer of the feature pyramid is subjected to a 3x3 convolution with a stride of 2, and is added to the downsampled bottom layer via a lateral connection; the addition adopts the concat operation;
finally, a 3x3 convolution is performed to fuse the features of the addition result.
The Prediction network module uses EIoU loss for its output.
The Anchor-free YOLOv5 network uses EIoU as the bounding box regression loss during training.
After feature fusion, the Neck network module uses two convolution layers to predict the three-dimensional tensor encoding the bounding box, the objectness and the class prediction, based on Anchor-free bounding box regression.
The working principle and the beneficial effects of the technical scheme are as follows:
although YOLOv5 achieves satisfactory results in object detection, the network divides the input image into a plurality of grid areas of the same size, then predicts the coordinate information of a plurality of bounding boxes, the confidence scores of the classifications and the probabilities of the respective categories of each bounding box for each grid area, then filters out some bounding boxes which are unlikely to be objects according to the set confidence score threshold, and finally processes out the rest of redundant bounding boxes through non-maximal suppression, thus obtaining the final detection result. There are still some problems in the application of the detection of defective images. For example, input of a 640X640 image, YOLOv5 needs to predict thousands of anchor boxes, but the number of defects of a defective image is about one or two. Too many negative samples may result in high false positive rates and low recall rates. To alleviate this problem, an Anchor-free YOLOv5 network for detection of defects on a tape surface is proposed, as shown in fig. 1.
FIG. 1 depicts the overall detection network; F_i and F_i' represent the feature maps on the bottom-up and top-down paths, respectively, and i denotes the level of the feature map. FIG. 2 shows the feature fusion module corresponding to the Agg module in fig. 1; FIG. 3 shows the convolution module corresponding to Conv in fig. 1, where n×n Conv indicates a convolution kernel of size n×n. FIG. 4 shows the detector module.
Specifically, the network divides an input image into a number of grid areas of the same size, then predicts for each grid area the coordinate information of several bounding boxes, the classification confidence scores and the probability of each class for each bounding box, then filters out bounding boxes that are unlikely to be targets according to a set confidence score threshold, and finally removes the remaining redundant bounding boxes by non-maximum suppression to obtain the final detection result.
The working mechanism of the network is described below. The network structure comprises four parts: Input, Backbone, Neck and Prediction.
The input image is subjected to Mosaic data enhancement: four pictures are used and spliced together by random scaling, random cropping and random arrangement, which greatly enriches the detection dataset; in particular, random scaling adds many small targets, making the network more robust.
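As an illustration of the Mosaic step, the following simplified sketch stitches four images around a random centre; real Mosaic augmentation also applies random scaling and cropping and remaps the box labels, which are omitted here, and the 640 output size and grey fill value 114 are assumptions:
import random
import numpy as np
import cv2

def mosaic4(images, out_size=640):
    # Place 4 images into the quadrants defined by a random mosaic centre.
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)
    cx = random.randint(out_size // 4, 3 * out_size // 4)
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    quads = [(0, 0, cx, cy), (cx, 0, out_size, cy),
             (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(images, quads):
        w, h = x2 - x1, y2 - y1
        canvas[y1:y2, x1:x2] = cv2.resize(img, (w, h))  # stand-in for random scale/crop
    return canvas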
The Backbone network part is mainly used for feature extraction and mainly adopts the Focus structure and the CSPDarknet structure. The Focus structure, newly introduced by the original author in YOLOv5 and not present in YOLOv1-YOLOv4, processes the input image directly; Focus is essentially a slicing operation on the picture. For example, an original 608x608x3 image input into the Focus structure first becomes a 304x304x12 feature map through the slicing operation, and then becomes a 304x304x32 feature map after one convolution operation with 32 convolution kernels. CSPDarknet is a structure generated from the YOLOv3 backbone Darknet53 by drawing on the experience of CSPNet (2019), and it contains 5 CSP modules. The convolution kernel in front of each CSP module has a size of 3x3 with stride = 2, so it also serves as downsampling. Since the Backbone has 5 CSP modules and the input image is 608x608, the feature map sizes change as 608 -> 304 -> 152 -> 76 -> 38 -> 19; after passing through the CSP modules 5 times, a feature map of size 19x19 is obtained. The CSP module splits the feature map of the base layer into two parts and then merges them through a cross-stage hierarchical structure, which reduces the amount of computation while maintaining accuracy. The Mish activation function is used in the Backbone of the network, and the Leaky ReLU function is used in the later parts of the network.
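The Focus slicing described above can be written compactly; the slice ordering below is an assumption (any fixed ordering of the four pixel sub-grids yields the same 12-channel result):
import torch

def focus_slice(x):
    # (B, C, H, W) -> (B, 4C, H/2, W/2): e.g. 3x608x608 -> 12x304x304,
    # after which YOLOv5 applies a convolution (e.g. 32 kernels -> 32x304x304).
    return torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                      x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)

print(focus_slice(torch.randn(1, 3, 608, 608)).shape)  # torch.Size([1, 12, 304, 304])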
The Neck network performs feature fusion. It adopts a BiFPN+PAN structure, and a CSP structure is also added to strengthen the network's feature fusion capability. The output of the backbone network is lightly processed (the number of channels is adjusted) and used as the input of the BiFPN module for feature fusion, giving a final feature pyramid with richer semantics and details. The layers of the feature pyramid have different resolutions, so different feature layers are used to detect objects of different scales, but they all share one detection head. The detection head is divided into a classification subnet and a regression subnet. The classification subnet predicts K object-class probabilities for each anchor point, where K is the number of object classes in the training dataset. The regression subnet predicts a 4-dimensional class-agnostic offset for each anchor point whose confidence exceeds a threshold, representing the distances from the anchor point to the left, top, right and bottom of the prediction box. Introducing BiFPN is one of the improvements of the present application; the Neck part of the original YOLOv5 structure is FPN+PAN. BiFPN stands for bi-directional feature pyramid network. When fusing features with different resolutions, they are first resized to the same resolution and then aggregated. Since different inputs have different resolutions, their contributions to the output are usually unequal. Therefore, an additional weight is added to each input so that the network learns the importance of each input feature. The weighting method is as follows:
O = Σ_i [ω_i / (ε + Σ_j ω_j)] · I_i
wherein O represents the fused result, I_i is an input, ω_i and ω_j are learnable weights to which a ReLU is applied at each update to ensure that their values are not less than 0, ε = 0.0001 is a small constant whose main function is to prevent the denominator from being equal to 0, ε + Σ_j ω_j is the normalizing weight, and ω_i · I_i represents a weighted feature. For ease of understanding, the two level-5 fusion features in BiFPN are described in terms of the following quantities:
P5_in denotes the input feature of the fifth layer on the top-down path, P5_td denotes the intermediate feature of the fifth layer on the top-down path, and P6_out denotes the output feature of the sixth layer on the bottom-up path; ω and ω′ are learning weights, C represents a depthwise separable convolution operation, and R represents a resolution-matching up-sampling or down-sampling operation.
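A minimal PyTorch sketch of this fast normalized fusion is given below; the module form, the two-input case and the channel size are illustrative assumptions, and the inputs are assumed to have already been resized to a common resolution by R:
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    # O = sum_i(w_i * I_i) / (eps + sum_j w_j), with ReLU keeping each w_i >= 0.
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.w)                  # keep the learnable weights non-negative
        w = w / (self.eps + w.sum())            # normalize
        return sum(wi * x for wi, x in zip(w, inputs))

fuse = WeightedFusion(n_inputs=2)
p5_in = torch.randn(1, 64, 20, 20)
p6_up = torch.randn(1, 64, 20, 20)              # already resized to match p5_in
p5_td = fuse([p5_in, p6_up])                    # an intermediate fused feature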
As shown in fig. 5-6, a bottom-up feature pyramid is added after the BiFPN, which contains two PAN structures. The PAN first copies the lowest layer of the feature pyramid to become the bottommost layer of a new feature pyramid. The bottommost layer of the new feature pyramid undergoes a downsampling operation, and the penultimate layer of the feature pyramid then undergoes a 3x3 convolution with a stride of 2; the result is added to the downsampled bottom layer via a lateral connection. Finally, a 3x3 convolution is performed to fuse their features. In this combined operation, BiFPN conveys strong semantic features from top to bottom while PAN conveys strong localization features from bottom to top; the two complement each other and aggregate parameters for different detection layers from different backbone layers.
As shown in fig. 7, the final addition in the PAN adopts the concat operation, which fuses features in series by directly connecting the two features: if the dimensions of the two input features x and y are p and q, the dimension of the output feature z is p+q. A new feature aggregation mode is used, which can generate more suitable anchor boxes and obtain better performance; it consists of the Agg and Conv modules, as shown in fig. 2 and fig. 3. The top-down feature maps F3′ and F4′ of the third and fourth layers are obtained from formulas (1) and (2).
F4′ = Conv(F4 + U(F5′))    (1)
F3′ = Conv(F3 + U(F4′))    (2)
In formula (1), U denotes that F5′ is first upsampled by 2x and then passed through a 1x1 convolution kernel so that it has the same shape as F4. U(F5′) is then added to F4 and fed to the Conv module to obtain F4′. Formula (2) works in the same way as formula (1). F3 and F4 denote input feature maps, and F3′, F4′ and F5′ denote the output feature maps after feature fusion. The spatial resolutions of F4′ and F3′ are medium and highest, respectively, and are used for detecting medium and small targets. The strip steel surface defects studied here belong to small targets.
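A minimal PyTorch sketch of formula (1) follows; the channel sizes, the nearest-neighbour upsampling and the composition of the Conv block (3x3 convolution, batch norm, SiLU) are assumptions made for illustration:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Agg(nn.Module):
    # Sketch of formula (1): F4' = Conv(F4 + U(F5')); U = 2x upsample + 1x1 conv
    # so that the upsampled map has the same shape as F4.
    def __init__(self, c_high, c_low):
        super().__init__()
        self.reduce = nn.Conv2d(c_high, c_low, kernel_size=1)
        self.conv = nn.Sequential(nn.Conv2d(c_low, c_low, 3, padding=1),
                                  nn.BatchNorm2d(c_low), nn.SiLU())

    def forward(self, f4, f5_prime):
        up = self.reduce(F.interpolate(f5_prime, scale_factor=2, mode="nearest"))
        return self.conv(f4 + up)

f5_prime = torch.randn(1, 512, 20, 20)
f4 = torch.randn(1, 256, 40, 40)
f4_prime = Agg(c_high=512, c_low=256)(f4, f5_prime)   # 1 x 256 x 40 x 40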
The Prediction output part of the network structure is modified to use EIoU loss, which will be explained in detail later.
Anchor-free bounding box regression
As shown in fig. 8, after Neck feature fusion in the YOLOv5s network, two convolution layers are applied to predict the three-dimensional tensor encoding the bounding box, the objectness and the class prediction. Here, bounding box regression is performed using the AF (anchor-free) scheme instead of the AB (anchor-based) scheme, which has three disadvantages. First, the anchor size in YOLOv5 is a hyperparameter that is difficult to set in advance. Second, the AB scheme introduces multiple anchors, which increases the number of negative samples and thus aggravates the imbalance between positive and negative samples. Third, detection performance depends strongly on the anchor sizes, and improper anchor sizes may degrade performance. This study therefore uses an anchor-free scheme for bounding box regression, as shown in the detector of fig. 4; the bounding box regression diagram is shown in fig. 8. In the anchor-free scheme, no anchor size needs to be preset. The offset-related parameters (t_x1, t_y1, t_x2, t_y2) of the prediction box are predicted for each coordinate (x_c, y_c), and the position of the prediction box (x1, y1, x2, y2) is then calculated as follows:
x1 = Stride_i · (x_c + 0.5 − e^{t_x1})
y1 = Stride_i · (y_c + 0.5 − e^{t_y1})
x2 = Stride_i · (x_c + 0.5 + e^{t_x2})
y2 = Stride_i · (y_c + 0.5 + e^{t_y2})
wherein (x1, y1) and (x2, y2) represent the top-left and bottom-right points of the prediction box, respectively, and (x_c, y_c) is a coordinate in the feature map. The reason 0.5 is added to (x_c, y_c) is to place the reference point at the center of the corresponding grid cell so that the detection is symmetric about it. It should be noted that the quantities combined with (x_c + 0.5, y_c + 0.5) are (e^{t_x1}, e^{t_y1}) and (e^{t_x2}, e^{t_y2}), not (t_x1, t_y1, t_x2, t_y2) themselves; using the exponential narrows the range of the bounding box regression and thereby improves the regression accuracy. Stride_i is the position conversion ratio: for box predictions generated at the different scales (i = 3, 4, 5), Stride_i is 8, 16 and 32, respectively.
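The decoding above can be sketched as follows; the (H, W, 4) offset layout and the grid construction are illustrative assumptions:
import torch

def decode_boxes(offsets, stride):
    # offsets: (H, W, 4) predictions (t_x1, t_y1, t_x2, t_y2) for every cell.
    h, w, _ = offsets.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cx, cy = xs.float() + 0.5, ys.float() + 0.5           # cell-centre reference points
    x1 = (cx - torch.exp(offsets[..., 0])) * stride
    y1 = (cy - torch.exp(offsets[..., 1])) * stride
    x2 = (cx + torch.exp(offsets[..., 2])) * stride
    y2 = (cy + torch.exp(offsets[..., 3])) * stride
    return torch.stack([x1, y1, x2, y2], dim=-1)          # boxes in image coordinates

boxes = decode_boxes(torch.zeros(20, 20, 4), stride=32)   # exp(0)=1 -> one-cell-wide boxes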
3.1 EIoU loss function
YOLOv5 uses CIoU as the bounding box regression loss during training. In this study, EIoU is chosen as the bounding box regression loss, which is more appropriate for the anchor-free scheme. The parameter v in the CIoU function reflects only the difference between aspect ratios, rather than the separate differences of width and height and their confidences, which affects model fitting to some extent. Therefore, the EIoU function is adopted as the loss function; EIoU splits the aspect-ratio similarity parameter v into two parts and calculates the width loss and the height loss separately, which converges faster. The EIoU loss function is:
L_EIoU = L_IoU + L_dis + L_asp = 1 − IoU + ρ²(b, b_gt) / c² + ρ²(w, w_gt) / c_w² + ρ²(h, h_gt) / c_h²
It can be seen that EIoU divides the loss function into three parts: the IoU loss, the distance loss L_dis and the side-length loss L_asp, so the side lengths appear directly as penalty terms. Here ρ²(b, b_gt) is the squared Euclidean distance between the center points of the prediction box and the real box, ρ²(w, w_gt) and ρ²(h, h_gt) are the squared differences of the widths and heights of the real and prediction boxes, c is the diagonal length of the smallest box enclosing both, c_w represents the width of the smallest box enclosing both the real and prediction boxes, and c_h represents the height of the smallest box enclosing both the real and prediction boxes.
The flow of the CIoU algorithm is introduced below:
Let the prediction box be B_p = (x1, y1, x2, y2) and the GT box (ground truth) be B_g = (x1′, y1′, x2′, y2′).
Input: the bounding box coordinates of the prediction box B_p and the GT box B_g:
B_p = (x1, y1, x2, y2)
B_g = (x1′, y1′, x2′, y2′)
Output: L_GIoU
Step 1: calculate the area of B_p: A_p = (x2 − x1) × (y2 − y1)
Step 2: calculate the area of B_g: A_g = (x2′ − x1′) × (y2′ − y1′)
Step 3: calculate the area IB of the intersection box between B_p and B_g:
x1_IB = max(x1, x1′), x2_IB = min(x2, x2′)
y1_IB = max(y1, y1′), y2_IB = min(y2, y2′)
if (x2_IB > x1_IB and y2_IB > y1_IB):
IB = (x2_IB − x1_IB) × (y2_IB − y1_IB)
else:
IB = 0
end
Step 4: calculate the area EB of the minimum enclosing box containing B_p and B_g:
x1_EB = min(x1, x1′), x2_EB = max(x2, x2′)
y1_EB = min(y1, y1′), y2_EB = max(y2, y2′)
EB = (x2_EB − x1_EB) × (y2_EB − y1_EB)
Step 5: calculate IoU:
AU = A_p + A_g − IB
IoU = IB / AU
Step 6: calculate GIoU:
GIoU = IoU − (EB − AU) / EB
Step 7: calculate the loss function L_GIoU:
L_GIoU = 1 − GIoU
Step 8: when one box contains the other, EB and AU remain unchanged even as the distance between the two boxes changes, so the GIoU loss does not change; DIoU improves on this by penalizing the center distance.
Calculate DIoU:
DIoU = ρ²(b, b_gt) / c²
where ρ(b, b_gt) is the Euclidean distance between the center points of the prediction box and the GT box, and c is the diagonal length of the minimum enclosing box.
Step 9: calculate L_DIoU:
L_DIoU = 1 − IoU + DIoU
Step 10: DIoU is improved further: when the center points of the two boxes coincide, c and d (the enclosing-box diagonal and the center distance) remain unchanged and the DIoU loss no longer changes, so the aspect ratio of the boxes must be introduced, giving CIoU.
Calculate CIoU:
CIoU = DIoU + αv
where α is a weight function and v measures the consistency of the aspect ratios:
v = (4 / π²) · (arctan(w_gt / h_gt) − arctan(w / h))², α = v / ((1 − IoU) + v)
Step 11: calculate L_CIoU:
L_CIoU = 1 − IoU + DIoU + αv
The penalty term of EIoU splits the aspect-ratio influence factor of the CIoU penalty term so that the width and height of the target box and the anchor box are penalized separately. The loss function therefore comprises three parts: overlap loss, center-distance loss and width-height loss. The first two parts follow the method in CIoU, while the width-height loss directly minimizes the differences between the widths and heights of the target box and the anchor box, which yields faster convergence.
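As a concrete reference, the EIoU regression loss described above can be sketched as follows (a minimal sketch assuming (x1, y1, x2, y2) box tensors; the epsilon value and tensor layout are illustrative assumptions):
import torch

def eiou_loss(pred, target, eps=1e-7):
    # pred, target: (..., 4) boxes in (x1, y1, x2, y2) form.
    px1, py1, px2, py2 = pred.unbind(-1)
    tx1, ty1, tx2, ty2 = target.unbind(-1)
    iw = (torch.min(px2, tx2) - torch.max(px1, tx1)).clamp(min=0)
    ih = (torch.min(py2, ty2) - torch.max(py1, ty1)).clamp(min=0)
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)
    cw = torch.max(px2, tx2) - torch.min(px1, tx1)        # enclosing-box width
    ch = torch.max(py2, ty2) - torch.min(py1, ty1)        # enclosing-box height
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4
    dw2 = ((px2 - px1) - (tx2 - tx1)) ** 2                # squared width difference
    dh2 = ((py2 - py1) - (ty2 - ty1)) ** 2                # squared height difference
    return (1 - iou + rho2 / (cw ** 2 + ch ** 2 + eps)
            + dw2 / (cw ** 2 + eps) + dh2 / (ch ** 2 + eps))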
3.2 Focal loss function
The anchor-free scheme reduces the number of prediction boxes compared with the original YOLOv5, but the positive and negative samples remain extremely unbalanced. This imbalance makes the cross-entropy loss adopted for the objectness loss of YOLOv5 unsuitable. Focal Loss (FL) is an important way to alleviate this problem. Its key point is to weight the loss of each sample according to how difficult the sample is to distinguish: easily distinguished samples receive a smaller weight, while hard samples receive a larger weight, which improves performance. Focal Loss has the following characteristics: when the predicted probability for the true class is very small (the sample is misclassified), the modulating factor approaches 1 and the weight of the sample in the loss function is barely affected; when the predicted probability is large (the sample is easily and correctly classified), the modulating factor approaches 0 and the weight of the sample in the loss function is greatly reduced; the focusing parameter adjusts the degree to which easily classified samples are down-weighted, and the larger the parameter, the stronger the reduction. The derivation of the Focal Loss formula is as follows:
the cross entropy loss LCE for the two classes can be described by equation (8).
L CE (p,y)=-ylog(p)-(1-y)log(1-p) (8)
Where y ε {0,1} represents the GT class, where 1 represents the object and 0 represents the background. p epsilon [0,1 ]]Is the target probability output of the logic function output. In order to extract indistinguishable samples and mitigate the weight of the easily distinguishable samples, at L CE Adding regulating factor to obtain L FL
L FL (p,y)=|y-p| γ L CE (p,y) (9)
In formula (9), γ is an adjustable parameter greater than 0. When a sample is correctly classified, |y − p| → 0 (that is, p → y) and L_FL(p, y) approaches 0, indicating that the loss of easily classified samples is reduced. When a sample is misclassified, |y − p| → 1 and L_FL becomes equivalent to L_CE.
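A minimal sketch of formula (9) follows; the choice gamma = 2 and the clamping for numerical stability are assumptions for illustration, since the text only requires gamma > 0:
import torch

def focal_objectness_loss(p, y, gamma=2.0):
    # L_FL = |y - p|^gamma * L_CE(p, y); p = predicted objectness, y in {0, 1}.
    p = p.clamp(1e-7, 1 - 1e-7)                                    # numerical stability
    ce = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))          # formula (8)
    return (y - p).abs().pow(gamma) * ce                           # formula (9)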
The target of each prediction box is marked for the training phase: the prediction box whose center is closest to the ground-truth box is marked as positive, and boxes whose overlap with the ground-truth box is larger than a threshold are not counted in the objectness loss L_FL. The threshold is set to 0.5. The class prediction loss is also a binary cross-entropy loss, the same as in the original YOLOv5. Thus, the final loss function of Anchor-free YOLOv5 in the training phase of this study is as follows:
L_AF = L_EIoU(box) + L_FL(objectness) + L_CE(class)    (10)
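Combining the sketches above, formula (10) could be assembled roughly as follows; the positive-sample mask and the use of probabilities (rather than logits) for the class branch are assumptions, not details stated in the text:
import torch
import torch.nn.functional as F

def total_loss(box_p, box_t, obj_p, obj_t, cls_p, cls_t, pos_mask):
    l_box = eiou_loss(box_p[pos_mask], box_t[pos_mask]).mean()            # L_EIoU(box)
    l_obj = focal_objectness_loss(obj_p, obj_t).mean()                    # L_FL(objectness)
    l_cls = F.binary_cross_entropy(cls_p[pos_mask], cls_t[pos_mask])      # L_CE(class)
    return l_box + l_obj + l_cls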
Theoretically, AF YOLOv5 is more efficient than AB YOLOv5. Take an N×N grid as an example: for the AF method the maximum number of predicted bounding boxes is N×N, since one box is predicted for each point, while for the AB method the maximum number is 3×N×N, since three boxes are predicted per point. In theory, both the training time and the test time of the AF method are therefore smaller than those of the AB method.
The improved feature fusion approach can produce detection boxes that better match the defects.
The EIoU regression loss splits the aspect-ratio loss term into the differences of the predicted width and height relative to the width and height of the minimum enclosing box, which accelerates the convergence of the prediction box and improves its regression accuracy.
Using FL as the objectness loss improves detection performance. On the one hand, when the numbers of positive and negative samples differ greatly, the cross-entropy loss may prevent the network from converging properly; on the other hand, the algorithm effectively reduces the weight of easily distinguished samples and pays more attention to hard samples, so the optimization becomes more targeted. Fig. 9 shows the overall algorithm architecture of the present application.
The embodiment of the invention further provides a strip steel surface defect detection system based on improved YOLOv5, which comprises:
the building module is used for building an improved anchor-free YOLOv5 network;
the acquisition module is used for acquiring the strip steel surface image;
the detection module is used for inputting the strip steel surface image into the anchor-free YOLOv5 network to obtain a strip steel surface defect detection result;
and the output module is used for outputting the detection result of the surface defect of the strip steel.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. The strip steel surface defect detection method based on the improved YOLOv5 is characterized by comprising the following steps of:
building an improved anchor-free YOLOv5 network;
acquiring a strip steel surface image;
inputting the strip steel surface image into the anchor-free YOLOv5 network to obtain a strip steel surface defect detection result;
outputting the strip steel surface defect detection result;
the working mechanism of the Anchor-free YOLOv5 network comprises: an input network module, a Backbone network module, a Neck network module and a Prediction network module which are sequentially connected;
the input network module performs Mosaic data enhancement on the input strip steel surface image;
the Backbone network module is provided with a Focus structure and a CSPDarknet structure which are used for extracting feature maps of the strip steel surface image after the Mosaic data enhancement;
the CSPDarknet structure comprises 5 CSP modules;
the Neck network module adopts a BiFPN module+PAN module structure;
the output of the Backbone network module is used as the input of the BiFPN module to perform feature fusion to obtain a feature pyramid;
the PAN module first copies the lowest layer of the feature pyramid to become the bottommost layer of a new feature pyramid;
performing a downsampling operation on the bottommost layer of the new feature pyramid;
the penultimate layer of the feature pyramid is subjected to a 3x3 convolution with a stride of 2, and is added to the downsampled bottom layer via a lateral connection; the addition adopts the concat operation;
finally, a 3x3 convolution is performed to fuse the features of the addition result;
an additional weight is added to each input of the BiFPN module so that the network learns the importance of each input feature; the weighting method is:
O = Σ_i [ω_i / (ε + Σ_j ω_j)] · I_i
wherein O represents the fused result, I_i is an input, ω_i and ω_j are learnable weights to which a ReLU is applied at each update to ensure that their values are not less than 0, ε = 0.0001 is a small constant whose function is to prevent the denominator from being equal to 0, ε + Σ_j ω_j is the normalizing weight, and ω_i · I_i represents a weighted feature;
the two level-5 fusion features in BiFPN are described in terms of the following quantities:
P5_in is the input feature of the fifth layer on the top-down path, P5_td is the intermediate feature of the fifth layer on the top-down path, P6_td is the intermediate feature of the sixth layer on the top-down path, P5_out is the output feature of the fifth layer on the bottom-up path, P4_out is the output feature of the fourth layer on the bottom-up path, P6_in is the input feature of the sixth layer on the top-down path, C represents a depthwise separable convolution operation, R(…) represents a resolution-matching up-sampling or down-sampling operation, ω_1, ω_2 and ω_3 are the learning weights of the input features (including P6_in) used to obtain the intermediate features, and ω′_1, ω′_2, ω′_3 and ω′_4 are the learning weights of the input, intermediate and output features used to obtain the output features.
2. The improved YOLOv5-based strip steel surface defect detection method of claim 1, wherein the anchor-free YOLOv5 network comprises: a feature map module, a feature fusion module, a convolution module and a detector module which are connected in sequence.
3. The method for detecting surface defects of strip steel based on improved YOLOv5 of claim 1, wherein the Prediction network module uses EIoU loss for its output.
4. The method for detecting surface defects of strip steel based on improved YOLOv5 of claim 1, wherein the anchor-free YOLOv5 network uses EIoU as the bounding box regression loss during training.
5. The method for detecting strip steel surface defects based on improved YOLOv5 of claim 4, wherein after feature fusion of the Neck network module, the three-dimensional tensor encoding the bounding box, the objectness and the class prediction are predicted based on Anchor-free bounding box regression using two convolution layers.
6. A strip steel surface defect detection system based on improved YOLOv5, comprising:
the building module is used for building an improved anchor-free YOLOv5 network;
the acquisition module is used for acquiring the strip steel surface image;
the detection module is used for inputting the strip steel surface image into the anchor-free YOLOv5 network to obtain a strip steel surface defect detection result;
the output module is used for outputting the strip steel surface defect detection result;
the working mechanism of the Anchor-free YOLOv5 network comprises: an input network module, a Backbone network module, a Neck network module and a Prediction network module which are sequentially connected;
the input network module performs Mosaic data enhancement on the input strip steel surface image;
the Backbone network module is provided with a Focus structure and a CSPDarknet structure which are used for extracting feature maps of the strip steel surface image after the Mosaic data enhancement;
the CSPDarknet structure comprises 5 CSP modules;
the Neck network module adopts a BiFPN module+PAN module structure;
the output of the Backbone network module is used as the input of the BiFPN module to perform feature fusion to obtain a feature pyramid;
the PAN module first copies the lowest layer of the feature pyramid to become the bottommost layer of a new feature pyramid;
performing a downsampling operation on the bottommost layer of the new feature pyramid;
the penultimate layer of the feature pyramid is subjected to a 3x3 convolution with a stride of 2, and is added to the downsampled bottom layer via a lateral connection; the addition adopts the concat operation;
finally, a 3x3 convolution is performed to fuse the features of the addition result;
an additional weight is added to each input of the BiFPN module so that the network learns the importance of each input feature; the weighting method is:
O = Σ_i [ω_i / (ε + Σ_j ω_j)] · I_i
wherein O represents the fused result, I_i is an input, ω_i and ω_j are learnable weights to which a ReLU is applied at each update to ensure that their values are not less than 0, ε = 0.0001 is a small constant whose function is to prevent the denominator from being equal to 0, ε + Σ_j ω_j is the normalizing weight, and ω_i · I_i represents a weighted feature;
the two level-5 fusion features in BiFPN are described in terms of the following quantities:
P5_in is the input feature of the fifth layer on the top-down path, P5_td is the intermediate feature of the fifth layer on the top-down path, P6_td is the intermediate feature of the sixth layer on the top-down path, P5_out is the output feature of the fifth layer on the bottom-up path, P4_out is the output feature of the fourth layer on the bottom-up path, P6_in is the input feature of the sixth layer on the top-down path, C represents a depthwise separable convolution operation, R(…) represents a resolution-matching up-sampling or down-sampling operation, ω_1, ω_2 and ω_3 are the learning weights of the input features (including P6_in) used to obtain the intermediate features, and ω′_1, ω′_2, ω′_3 and ω′_4 are the learning weights of the input, intermediate and output features used to obtain the output features.
CN202310774663.8A 2023-06-27 2023-06-27 Strip steel surface defect detection method and system based on improved YOLOv5 Active CN117011231B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310774663.8A CN117011231B (en) 2023-06-27 2023-06-27 Strip steel surface defect detection method and system based on improved YOLOv5

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310774663.8A CN117011231B (en) 2023-06-27 2023-06-27 Strip steel surface defect detection method and system based on improved YOLOv5

Publications (2)

Publication Number Publication Date
CN117011231A CN117011231A (en) 2023-11-07
CN117011231B true CN117011231B (en) 2024-04-09

Family

ID=88570155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310774663.8A Active CN117011231B (en) 2023-06-27 2023-06-27 Strip steel surface defect detection method and system based on improved YOLOv5

Country Status (1)

Country Link
CN (1) CN117011231B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115705637A (en) * 2021-08-11 2023-02-17 中国科学院沈阳计算技术研究所有限公司 Improved YOLOv5 model-based spinning cake defect detection method
CN113763364A (en) * 2021-09-09 2021-12-07 深圳市涌固精密治具有限公司 Image defect detection method based on convolutional neural network
CN113837059A (en) * 2021-09-22 2021-12-24 哈尔滨工程大学 Patrol vehicle for advising pedestrians to wear mask in time and control method thereof
CN116206323A (en) * 2022-10-18 2023-06-02 多彩贵州印象网络传媒股份有限公司 Structured information identification method based on anchor-free frame regression and pixel classification algorithm fusion
CN115719337A (en) * 2022-11-11 2023-02-28 无锡学院 Wind turbine surface defect detection method
CN116309704A (en) * 2023-02-20 2023-06-23 重庆邮电大学 Small target tracking method based on anchor-free frame detection network and feature re-fusion module
CN116309427A (en) * 2023-03-14 2023-06-23 盐城工学院 PCB surface defect detection method based on improved YOLOv5 algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Path Aggregation Network for Instance Segmentation";Shu Liu等;《 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition》;第8759-8768页 *
"YOLOX: Exceeding YOLO Series in 2021";Zheng Ge等;《arXiv》;第1-7页 *

Also Published As

Publication number Publication date
CN117011231A (en) 2023-11-07

Similar Documents

Publication Publication Date Title
CN109902677B (en) Vehicle detection method based on deep learning
WO2023015743A1 (en) Lesion detection model training method, and method for recognizing lesion in image
CN107145908B (en) A kind of small target detecting method based on R-FCN
CN110287932B (en) Road blocking information extraction method based on deep learning image semantic segmentation
CN109117876A (en) A kind of dense small target detection model building method, model and detection method
CN110298298A (en) Target detection and the training method of target detection network, device and equipment
CN111126472A (en) Improved target detection method based on SSD
CN109816012A (en) A kind of multiscale target detection method of integrating context information
CN107633226B (en) Human body motion tracking feature processing method
CN110765865B (en) Underwater target detection method based on improved YOLO algorithm
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
CN113591795A (en) Lightweight face detection method and system based on mixed attention feature pyramid structure
CN111898668A (en) Small target object detection method based on deep learning
CN113468968B (en) Remote sensing image rotating target detection method based on non-anchor frame
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN110263731B (en) Single step human face detection system
CN112634369A (en) Space and or graph model generation method and device, electronic equipment and storage medium
CN110009628A (en) A kind of automatic testing method for polymorphic target in continuous two dimensional image
WO2024032010A1 (en) Transfer learning strategy-based real-time few-shot object detection method
Fan et al. A novel sonar target detection and classification algorithm
CN116503389B (en) Automatic detection method for external absorption of tooth root
CN108460336A (en) A kind of pedestrian detection method based on deep learning
CN112926652A (en) Fish fine-grained image identification method based on deep learning
CN112149665A (en) High-performance multi-scale target detection method based on deep learning
CN111339950B (en) Remote sensing image target detection method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant