CN115187544A - DR-RSBU-YOLOv5-based fabric flaw detection method

DR-RSBU-YOLOv5-based fabric flaw detection method

Info

Publication number
CN115187544A
CN115187544A
Authority
CN
China
Prior art keywords
fabric
rsbu
prediction
training
output
Prior art date
Legal status
Withdrawn
Application number
CN202210802618.4A
Other languages
Chinese (zh)
Inventor
郑雨婷
吕文涛
余凯
王成群
徐伟强
Current Assignee
Zhejiang Sci Tech University ZSTU
Original Assignee
Zhejiang Sci Tech University ZSTU
Priority date
Filing date
Publication date
Application filed by Zhejiang Sci Tech University (ZSTU)
Priority to CN202210802618.4A
Publication of CN115187544A
Legal status: Withdrawn

Classifications

    • G06T 7/0004: Image analysis; inspection of images, e.g. flaw detection; industrial image inspection
    • G06V 10/25: Image preprocessing; determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/763: Pattern recognition or machine learning using clustering; non-hierarchical techniques, e.g. based on statistics or modelling distributions
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/30124: Indexing scheme for image analysis; industrial image inspection; fabrics, textile, paper
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fabric flaw detection method based on DR-RSBU-YOLOv5. The method comprises the following steps: establishing an extended fabric image dataset; dividing the dataset into a training set and a verification set; constructing a DR-RSBU-YOLOv5 network; inputting the training set for training, calculating the overall loss value of the network and updating its parameters; inputting and processing the verification set, and calculating the mean average precision over the fabric defect categories; obtaining a trained DR-RSBU-YOLOv5 network; and inputting and processing the fabric images to be detected, retaining the final prediction boxes and mapping them back to detect and locate the fabric flaws. The method effectively suppresses the interference that small defects and noise in fabric images cause to the detection of medium and large fabric defects, improves the accuracy and speed of the network, enhances the detection capability for fabric defects and accelerates detection.

Description

DR-RSBU-YOLOv5-based fabric flaw detection method
Technical Field
The invention relates to a fabric flaw detection method, in particular to a fabric flaw detection method based on a DR-RSBU-YOLOv5 model.
Background
The textile industry has long occupied an important position in China's national economic development. Fabrics serve not only as basic materials for making garments, but also as raw materials for other decorative and industrial applications. With economic development and rising living standards, market requirements on product quality have gradually increased, and professional quality inspectors are needed to inspect textiles for flaws and reject unqualified cloth. Manual inspection is inefficient and costly, and false or missed detections can arise from the inspectors' subjective factors. It is therefore highly necessary to design an efficient automatic fabric defect detection method.
In recent years, Convolutional Neural Networks (CNNs) have become increasingly important in image classification, detection, segmentation and other tasks, and a great number of researchers have entered the field and improved various network models. In object detection in particular, much excellent work has appeared, with representative detection models such as the R-CNN and YOLO series; among the many models, the YOLO series stands out for its high detection speed and light weight. However, the current fabric flaw detection task still suffers from problems such as noise interference and insufficient detection speed, and existing advanced detection models still leave room for improvement in detection accuracy and efficiency.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a fabric flaw detection method based on DR-RSBU-YOLOv5, which reduces complex background interference and the false and missed detections caused by noise in the fabric flaw detection task by introducing a noise-reducing improved depth residual shrinkage structure DR-RSBU-CW at specific positions of the network. Compared with existing advanced detection models, the method has clear advantages in detection accuracy and efficiency indexes, and also satisfies the real-time requirement of industrial scenarios.
The technical scheme adopted by the invention is as follows:
the method for detecting the fabric defects comprises the following steps:
s1) collecting a plurality of fabric images with fabric defects, sequentially carrying out fabric defect data labeling and data enhancement processing on each fabric image to obtain a labeled enhanced fabric image, wherein each labeled enhanced fabric image is provided with a target GT frame with the fabric defects obtained after the fabric defect data labeling, all the fabric images and the enhanced fabric image jointly establish an extended fabric image data set, and the extended fabric image data set comprises a plurality of extended fabric images.
S2) clustering all target GT frames in the extended fabric image data set by using a Kmeans++ clustering algorithm to obtain K prior frames.
S3) dividing the extended fabric image data set according to a preset proportion to obtain a training set, a verification set and a test set, wherein the preset proportion is 8.
S4) building a DR-RSBU-YOLOv5 network.
S5) the training set comprises M expanded fabric images, X expanded fabric images in the training set are selected and input into a DR-RSBU-YOLOv5 network for training, and N training prediction characteristic graphs are output for each expanded fabric image.
S6) aiming at the N training prediction feature maps of each expanded fabric image, uniformly distributing the K prior frames in the step S3) to the N training prediction feature maps, adjusting the K prior frames according to the training prediction feature maps to respectively obtain K training prediction frames, and selecting a plurality of training prediction frames as training candidate frames according to the target GT frame.
S7) calculating an overall loss value of the DR-RSBU-YOLOv5 network according to the training candidate frame and the target GT frame, reversely propagating the overall loss value into the DR-RSBU-YOLOv5 network, and updating parameters of the DR-RSBU-YOLOv5 network by using a gradient descent method.
S8) repeating the steps S5) to S7) for the expansion fabric images in the training set, each repetition selecting X further expansion fabric images and inputting them into the DR-RSBU-YOLOv5 network whose parameters were updated in the previous repetition of step S7), until all expansion fabric images in the training set have been processed by the parameter-updated DR-RSBU-YOLOv5 network; the DR-RSBU-YOLOv5 network obtained at that moment is taken as the pre-training DR-RSBU-YOLOv5 network.
S9) the verification set comprises a plurality of extension fabric images, each extension fabric image in the verification set is input into a pre-training DR-RSBU-YOLOv5 network for processing, and N verification prediction characteristic graphs are output; and (3) carrying out the same processing of processing the N training prediction feature maps in the step S6) on the N verification prediction feature maps of each extension fabric image in the verification set to obtain a plurality of verification prediction frames, and selecting the plurality of verification prediction frames as verification candidate frames according to the target GT frame.
And calculating the average precision AP of each fabric flaw category on the verification set according to the verification candidate frames and the target GT frames, and calculating the mean average precision mAP over all the AP values.
S10) repeating the steps S8) to S9) until the mean average precision mAP obtained over successive repetitions is equal to or approaches a fixed value, and taking the pre-training DR-RSBU-YOLOv5 network at that moment as the trained DR-RSBU-YOLOv5 network.
And inputting the test set into a trained DR-RSBU-YOLOv5 network for processing, and detecting and positioning the fabric defects.
S11) acquiring a plurality of fabric images to be detected with fabric flaws, inputting the fabric images to be detected into the trained DR-RSBU-YOLOv5 network for processing, and outputting N detection prediction feature maps; carrying out on the N detection prediction feature maps the same processing as applied to the N training prediction feature maps in step S6) to obtain a plurality of detection prediction frames; using non-maximum suppression NMS to remove redundant frames among the detection prediction frames, and taking the retained detection prediction frames as final prediction frames; and mapping the final prediction frames onto the fabric images to be detected to detect and locate the fabric flaws.
In the step S1), fabric defect data labeling and data enhancement processing are carried out sequentially on each fabric image to obtain a labeled enhanced fabric image. First, the category and position of each fabric defect in each fabric image are labeled: each fabric defect is completely framed by a rectangular target GT frame, and each target GT frame is labeled as (class, xmin, ymin, xmax, ymax), where class represents the category of the fabric defect contained in the target GT frame, xmin and ymin represent the x and y coordinates of the upper-left vertex of the target GT frame, and xmax and ymax represent the x and y coordinates of the lower-right vertex of the target GT frame. Mosaic data enhancement is then performed to finally obtain the labeled enhanced fabric image. The Mosaic data enhancement specifically consists of randomly selecting four fabric images, randomly cropping them, and stitching them into one image, as sketched below.
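A minimal sketch of the Mosaic augmentation just described, assuming square output images and HxWxC numpy arrays; the output size, crop policy and the omitted box-label handling are illustrative assumptions, not details of the patented implementation.

```python
import random
import numpy as np

def mosaic(images, out_size=640):
    """Stitch four randomly chosen fabric images into one mosaic image."""
    assert len(images) >= 4
    picks = random.sample(images, 4)
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    # random centre point where the four crops meet
    cx = random.randint(out_size // 4, 3 * out_size // 4)
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    regions = [(0, 0, cx, cy), (cx, 0, out_size, cy),
               (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x1, y1, x2, y2) in zip(picks, regions):
        h, w = y2 - y1, x2 - x1
        ih, iw = img.shape[:2]
        # pad first if the source is too small, then take a random crop
        img = np.pad(img, ((0, max(0, h - ih)), (0, max(0, w - iw)), (0, 0)))
        top = random.randint(0, img.shape[0] - h)
        left = random.randint(0, img.shape[1] - w)
        canvas[y1:y2, x1:x2] = img[top:top + h, left:left + w]
    return canvas
```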
In the step S2), the width and height of each target GT frame are obtained from its data label; all target GT frames in the extended fabric image data set are clustered by width and height using the Kmeans++ algorithm to obtain K cluster-center coordinates, and the K cluster-center coordinates are taken as widths and heights to form K prior frames, as sketched below.
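A sketch of this clustering step using scikit-learn's KMeans with k-means++ initialisation; the Euclidean distance on raw (width, height) pairs is an assumption, as the text does not state the distance metric.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_prior_boxes(gt_boxes, k=9):
    """gt_boxes: array of (xmin, ymin, xmax, ymax); returns K (w, h) prior sizes."""
    wh = np.stack([gt_boxes[:, 2] - gt_boxes[:, 0],
                   gt_boxes[:, 3] - gt_boxes[:, 1]], axis=1)
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(wh)
    centres = km.cluster_centers_
    # sort by area so the priors can later be split into scale groups
    return centres[np.argsort(centres[:, 0] * centres[:, 1])]
```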
In the step S4), the constructed DR-RSBU-YOLOv5 network includes a Backbone network Backbone, a first convolutional block CBL, an improved depth residual shrinkage structure DR-RSBU-CW, a path aggregation network PANet, and a prediction header part:
a) Backbone network Backbone:
The Backbone network comprises a Focus layer, four convolution bottleneck layers and a spatial pyramid fast pooling module SPPF connected in sequence; each convolution bottleneck layer comprises a convolution block CBL and a bottleneck structure module BottleneckCSP connected in sequence, and the numbers of Bottleneck modules in the BottleneckCSP modules of the four convolution bottleneck layers are 3, 6, 9 and 3 in order.
The outputs of the second convolution bottleneck layer, the third convolution bottleneck layer and the spatial pyramid fast pooling module SPPF serve as the outputs of the Backbone network; the output of the second convolution bottleneck layer is input into the path aggregation network PANet for processing; the output of the third convolution bottleneck layer is input into the improved depth residual shrinkage structure DR-RSBU-CW and the path aggregation network PANet respectively for processing; the output of the spatial pyramid fast pooling module SPPF is input into the first convolution block CBL for processing, the output of the first convolution block CBL is directly input into the path aggregation network PANet for processing, and the output of the first convolution block CBL is also up-sampled and then input into the path aggregation network PANet for processing.
b) Improved depth residual shrinkage structure DR-RSBU-CW:
The improved depth residual shrinkage structure DR-RSBU-CW comprises a first convolution layer Conv, a first batch normalization layer BN, a first rectified linear unit activation function ReLU, a second convolution layer Conv, a second batch normalization layer BN, a third convolution layer Conv, a third batch normalization layer BN, a global average pooling layer, a first fully connected layer FC, a second rectified linear unit activation function ReLU, a second fully connected layer FC, an activation function Sigmoid and a third rectified linear unit activation function ReLU. The input of the improved depth residual shrinkage structure DR-RSBU-CW is fed into the first convolution layer Conv and the second convolution layer Conv respectively for processing; the output of the first convolution layer Conv is processed in sequence by the first batch normalization layer BN, the first rectified linear unit activation function ReLU, the third convolution layer Conv and the third batch normalization layer BN; the output of the second convolution layer Conv is processed by the second batch normalization layer BN; the outputs of the second batch normalization layer BN and the third batch normalization layer BN are added to obtain an addition result, and zero-assignment processing is carried out on the addition result to obtain a zero-assigned result. After absolute-value processing, the output of the second batch normalization layer BN is passed in sequence through the global average pooling layer, the first fully connected layer FC, the second rectified linear unit activation function ReLU, the second fully connected layer FC and the activation function Sigmoid; the result obtained by multiplying the output of the global average pooling layer by the output of the activation function Sigmoid is used together with the zero-assigned result for soft thresholding; the processed result is added to the output of the first rectified linear unit activation function ReLU, and the sum is processed by the third rectified linear unit activation function ReLU and output as the output of the improved depth residual shrinkage structure DR-RSBU-CW.
c) Path aggregation network PANet:
The path aggregation network PANet comprises a first fusion bottleneck layer, a second convolution block CBL, a second fusion bottleneck layer, a third convolution block CBL, a third fusion bottleneck layer, a fourth convolution block CBL and a fused convolution layer connected in sequence; the first fusion bottleneck layer, the second fusion bottleneck layer and the third fusion bottleneck layer each comprise a fusion function Concat and a bottleneck structure module BottleneckCSP connected in sequence, and the number of Bottleneck modules in the BottleneckCSP module of each fusion bottleneck layer is 3; the fused convolution layer comprises one fusion function Concat and 5 convolution layers Conv connected in sequence.
The up-sampled output of the first convolution block CBL and the output of the third convolution bottleneck layer of the Backbone network are input together into the first fusion bottleneck layer for processing; the processed output is processed by the second convolution block CBL and then up-sampled, the up-sampled output and the output of the second convolution bottleneck layer of the Backbone network are input together into the second fusion bottleneck layer for processing, and the processed outputs are input into the third convolution block CBL and the prediction head part respectively for processing; the output of the third convolution block CBL, the output of the first fusion bottleneck layer and the output of the improved depth residual shrinkage structure DR-RSBU-CW are input together into the third fusion bottleneck layer for processing, and the processed outputs are input into the fourth convolution block CBL and the prediction head part respectively for processing; the output of the fourth convolution block CBL and the output of the first convolution block CBL are input together into the fused convolution layer for processing, and the processed output is input into the prediction head part for processing.
d) Prediction head part:
The prediction head part comprises three prediction heads YOLO Head; the output of the second fusion bottleneck layer in the top-down path of the path aggregation network PANet, the output of the third fusion bottleneck layer in the bottom-up path of the path aggregation network PANet and the output of the fused convolution layer serve as the inputs of the three prediction heads YOLO Head respectively; the outputs of the three prediction heads YOLO Head are the three prediction feature maps output by the DR-RSBU-YOLOv5 network, namely N = 3.
The three prediction heads YOLO Head adjust the number of channels of their inputs to num_anchors × (5 + num_classes); the outputs of the three prediction heads are input into three Loss functions respectively, and the three Loss functions output the classification confidence loss, the bounding-box confidence loss and the intersection-over-union (IoU) loss for the inputs of the prediction head part. Minimal sketches of the CBL block and of such a head are given below.
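The text names two recurring primitives without spelling out their internals: the convolution block CBL and the channel layout of a YOLO Head. A minimal sketch of both follows; the Conv + BatchNorm + LeakyReLU composition of CBL, the 1 × 1 kernel of the head and all channel sizes are assumptions rather than details taken from the patent.

```python
import torch.nn as nn

class CBL(nn.Module):
    """Assumed composition of the convolution block CBL: Conv + BatchNorm + LeakyReLU."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

def yolo_head(in_channels, num_anchors, num_classes):
    """Prediction head whose output channel count is num_anchors * (5 + num_classes):
    for each prior box on a grid cell, 4 box-adjustment values, 1 box confidence
    and num_classes classification confidences."""
    return nn.Conv2d(in_channels, num_anchors * (5 + num_classes), kernel_size=1)
```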
In the improved depth residual shrinkage structure DR-RSBU-CW, the outputs of the second batch normalization layer BN and the third batch normalization layer BN are added to obtain an addition result, and zero-assignment processing is then carried out on the addition result to obtain a zero-assigned result. Specifically, the input of the DR-RSBU-YOLOv5 network is an image; the image input to the DR-RSBU-YOLOv5 network is processed within the network until it is output by the second batch normalization layer BN and the third batch normalization layer BN as a first feature map and a second feature map, the first feature map and the second feature map are added to obtain an addition feature map, and zero-assignment processing is then carried out on all values of the addition feature map, namely all values of the addition feature map that are greater than or equal to 0 are set to zero, giving the zero-assigned feature map.
Absolute-value processing is carried out on the output of the second batch normalization layer BN; specifically, the absolute value of every value on the first feature map is taken before the next operation.
In the improved depth residual shrinkage structure DR-RSBU-CW, the result obtained by multiplying the output of the global average pooling layer by the output of the activation function Sigmoid is used together with the zero-assigned result for soft thresholding. Specifically, the image input to the DR-RSBU-YOLOv5 network is processed within the network until the output feature map of the global average pooling layer and the output feature map of the activation function Sigmoid are multiplied to obtain a multiplied feature map; soft thresholding is then applied to all values of the zero-assigned feature map using the multiplied feature map, and each value x in the soft thresholding is processed by the following soft threshold function:
y = x + τ,  when x < -τ
y = 0,      when x ≥ -τ
wherein y is the output of x after soft thresholding; τ is a threshold, that is, a result obtained by multiplying the output of the global average pooling layer by the output of the activation function Sigmoid.
The part where x > τ is not retained; only the noise-filtered features in the part where x < -τ are retained, and because the identity mapping adds back the original features after dimension reduction through a 1 × 1 convolution, not too much information is lost. This preserves, to some extent, detail features that may be flaws to be detected. A sketch of the whole structure is given below.
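A best-effort PyTorch sketch of the improved depth residual shrinkage structure DR-RSBU-CW as described above. Channel counts, kernel sizes, strides and the FC reduction ratio are illustrative assumptions; only the dataflow (two convolution branches, zero-assignment of the summed features, a channel-wise threshold derived from the absolute value of the second BN output, one-sided soft thresholding, and the final residual addition with the first ReLU output) follows the text.

```python
import torch
import torch.nn as nn

class DRRSBUCW(nn.Module):
    def __init__(self, c_in, c_out, reduction=4):
        super().__init__()
        # branch A: Conv1 -> BN1 -> ReLU1 -> Conv3 -> BN3
        self.conv1 = nn.Conv2d(c_in, c_out, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(c_out)
        self.relu1 = nn.ReLU(inplace=True)
        self.conv3 = nn.Conv2d(c_out, c_out, 3, padding=1, bias=False)
        self.bn3 = nn.BatchNorm2d(c_out)
        # branch B: 1x1 Conv2 -> BN2 (dimension-reduced "original" features)
        self.conv2 = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(c_out)
        # threshold path: GAP -> FC1 -> ReLU2 -> FC2 -> Sigmoid
        self.fc1 = nn.Linear(c_out, c_out // reduction)
        self.relu2 = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(c_out // reduction, c_out)
        self.relu3 = nn.ReLU(inplace=True)

    def forward(self, x):
        r = self.relu1(self.bn1(self.conv1(x)))          # first ReLU output
        a = self.bn3(self.conv3(r))                       # branch A output
        b = self.bn2(self.conv2(x))                       # branch B output
        added = a + b
        # zero-assignment: values >= 0 on the summed feature map become zero
        zeroed = torch.where(added >= 0, torch.zeros_like(added), added)
        # channel-wise threshold tau = GAP(|b|) * Sigmoid(FC(GAP(|b|)))
        gap = b.abs().mean(dim=(2, 3))                    # (B, C)
        alpha = torch.sigmoid(self.fc2(self.relu2(self.fc1(gap))))
        tau = (gap * alpha).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        # one-sided soft thresholding: keep only the x < -tau part, shifted by tau
        denoised = torch.where(zeroed < -tau, zeroed + tau, torch.zeros_like(zeroed))
        # residual addition with the first ReLU output, then final ReLU
        return self.relu3(denoised + r)
```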
In the step S5), X is a divisor of M and N is a divisor of K. Specifically, K = 9.
In the step S6), the following operations are performed on the N training prediction feature maps of each extended fabric image:
The K prior frames of step S3) are sorted by scale and then evenly divided, in sorted order, into N groups of prior frames; the N groups of prior frames are assigned to the N training prediction feature maps according to the sizes of the groups and of the feature maps, that is, the groups of prior frames, taken from small to large, are assigned in turn to the training prediction feature maps taken from large to small in size.
Adjusting the K prior frames according to the training prediction feature map, specifically adjusting the K prior frames according to the image information of the training prediction feature map, wherein the image information of the training prediction feature map comprises position scale adjustment information, classification confidence coefficient and frame confidence coefficient, the position scale adjustment information is adjustment information of width, height and center point coordinates, and the following operations are performed for each training prediction feature map:
dividing the training prediction feature map into H multiplied by W grid units, wherein H and W are respectively the height and width of the training prediction feature map, the center of each grid unit is called an anchor point, the information of the anchor point comprises position scale adjustment information, classification confidence coefficient and frame confidence coefficient, and the following operations are carried out for each anchor point and K/N prior frames distributed on the training prediction feature map:
superposing K/N prior frames on an anchor point, wherein the anchor point corresponds to an anchor point vector with the length of a, and a = num _ anchor (5 + num _ class), wherein num _ anchor represents the number of the prior frames on the anchor point, namely K/N, and num _ class represents the number of fabric defect categories, namely K; carrying out dimension splitting on the anchor point vector to obtain K/N one-dimensional adjustment vectors respectively corresponding to K/N priori frames, wherein the length of each one-dimensional adjustment vector is 5+ num \ class, adjusting the position and the scale of each priori frame according to the position scale adjustment information of each one-dimensional adjustment vector to respectively obtain K/N training prediction frames, wherein the information of each training prediction frame comprises respective position information, classification confidence coefficient and frame confidence coefficient, and the position information is the width and the height of the training prediction frame and the coordinate information of the central point.
K prior frames are adjusted to obtain K training prediction frames respectively. For each target GT frame, the intersection over union IoU between the target GT frame and each training prediction frame is computed, and the training prediction frame with the largest IoU with the target GT frame is taken as its training candidate frame; finally, a number of training prediction frames are selected as training candidate frames according to the target GT frames, as sketched below.
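As a concrete illustration of the two operations above, the sketch below first decodes one anchor point's vector into prediction boxes and then matches a GT box to the prediction box with the largest IoU. The YOLOv5-style decode equations (sigmoid offsets, squared width/height scaling) and the plain IoU formula are assumptions; the text only states that the prior boxes' positions and scales are adjusted and that the box with the largest IoU is kept.

```python
import torch

def decode_anchor_vector(vec, priors, grid_xy, stride, num_classes):
    """vec: (num_anchor * (5 + num_classes),); priors: (num_anchor, 2) in pixels;
    grid_xy: (2,) grid-cell indices of this anchor point; stride: pixels per cell."""
    num_anchor = priors.shape[0]
    parts = vec.view(num_anchor, 5 + num_classes)                        # one row per prior box
    xy = (torch.sigmoid(parts[:, 0:2]) * 2.0 - 0.5 + grid_xy) * stride   # centre (pixels)
    wh = (torch.sigmoid(parts[:, 2:4]) * 2.0) ** 2 * priors              # width/height (pixels)
    box_conf = torch.sigmoid(parts[:, 4])
    cls_conf = torch.sigmoid(parts[:, 5:])
    boxes = torch.cat([xy - wh / 2, xy + wh / 2], dim=1)                 # (xmin, ymin, xmax, ymax)
    return boxes, box_conf, cls_conf

def iou(box, boxes):
    """IoU between one GT box (4,) and prediction boxes (N, 4)."""
    lt = torch.maximum(box[:2], boxes[:, :2])
    rb = torch.minimum(box[2:], boxes[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def pick_candidate(gt_box, pred_boxes):
    """Index of the training prediction box with the largest IoU with the GT box."""
    return int(torch.argmax(iou(gt_box, pred_boxes)))
```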
In step S7), the overall loss value of the DR-RSBU-YOLOv5 network is calculated from the training candidate frames and the target GT frames; that is, the following operations are performed for each target GT frame and its training candidate frame obtained in step S6):
the target GT frame corresponds to a one-dimensional GT vector, the length of the one-dimensional GT vector is 5+ num \ class, and the information of the one-dimensional GT vector comprises position information, classification confidence coefficient and frame confidence coefficient; through step S6), a one-dimensional adjustment vector used by the training candidate frame is adjusted, and the loss between the one-dimensional GT vector and the one-dimensional adjustment vector is calculated, including the bounding box position loss, the classification confidence loss, and the bounding box confidence loss: calculating by using the CIoU loss according to the position information of each one-dimensional GT vector and each one-dimensional adjusting vector to obtain the position loss of the bounding box; obtaining classification confidence loss by using binary cross entropy loss calculation according to the classification confidence of each one-dimensional GT vector and each one-dimensional adjusting vector; and calculating to obtain the frame confidence coefficient loss by using binary cross entropy loss according to the frame confidence coefficient of each one-dimensional GT vector and each one-dimensional adjusting vector.
The bounding-box position loss, the classification confidence loss and the bounding-box confidence loss are weighted and summed to obtain the overall loss value of the DR-RSBU-YOLOv5 network; the overall loss value is back-propagated into the DR-RSBU-YOLOv5 network, and the parameters of the DR-RSBU-YOLOv5 network are updated and optimized using a gradient descent method, as sketched below.
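A sketch of the weighted overall loss described above: CIoU loss for box position plus binary cross-entropy for the classification and bounding-box confidences. The weight values and the use of torchvision's complete_box_iou_loss (available in recent torchvision releases) are assumptions; any CIoU implementation would serve.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import complete_box_iou_loss

def overall_loss(pred_boxes, gt_boxes, pred_cls, gt_cls, pred_obj, gt_obj,
                 w_box=0.05, w_cls=0.5, w_obj=1.0):
    """Boxes as (xmin, ymin, xmax, ymax); cls/obj tensors are raw logits."""
    box_loss = complete_box_iou_loss(pred_boxes, gt_boxes, reduction="mean")
    cls_loss = F.binary_cross_entropy_with_logits(pred_cls, gt_cls)
    obj_loss = F.binary_cross_entropy_with_logits(pred_obj, gt_obj)
    return w_box * box_loss + w_cls * cls_loss + w_obj * obj_loss
```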
In step S11), the following operations are performed for each final prediction frame and the detection prediction feature map on which it is located:
The fabric image to be detected from which the detection prediction feature map was produced by the trained DR-RSBU-YOLOv5 network is obtained, and the final prediction frame on the detection prediction feature map is mapped onto the fabric flaw of the fabric image to be detected according to the proportional relation between the detection prediction feature map and the fabric image to be detected, so as to detect and locate the fabric flaw on the fabric image to be detected.
The invention has the beneficial effects that:
the invention improves the YOLOv5 network, reduces the detection influence of small defects and noise on medium-sized and large-sized defects in a fabric image by improving the RSBU-CW module and designing an improved depth residual shrinkage structure DR-RSBU-CW with a noise reduction effect, can well conform to the original structure of the YOLOv5 network, fuses more characteristics on the premise of not bringing noise, ensures that the noise-reduced characteristic diagram can not only provide important information, but also reduces the introduction of noise as much as possible, improves the precision and the speed of the network, enhances the detection capability on the fabric defects and accelerates the detection efficiency.
Drawings
FIG. 1 is a flow chart of a fabric defect detection method of the present invention;
FIG. 2 is a schematic diagram of a network structure of the DR-RSBU-YOLOv5 model;
FIG. 3 is a schematic diagram of a noise reduced modified depth residual puncturing structure DR-RSBU-CW;
fig. 4 is a graph of the detection effect of a sample fabric image.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
The following description is provided for illustrative purposes and is not intended to limit the invention to the particular embodiments disclosed. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the features in the following embodiments and examples may be combined with each other without conflict.
As shown in fig. 1, the method for detecting fabric defects comprises the following steps:
s1) collecting a plurality of fabric images with fabric defects, sequentially carrying out fabric defect data labeling and data enhancement processing on each fabric image to obtain a labeled enhanced fabric image, wherein each labeled enhanced fabric image is provided with a target GT frame with the fabric defects obtained after the fabric defect data labeling, all the fabric images and the enhanced fabric image jointly establish an extended fabric image data set, and the extended fabric image data set comprises a plurality of extended fabric images.
In the step S1), fabric defect data labeling and data enhancement processing are carried out sequentially on each fabric image to obtain a labeled enhanced fabric image. First, the category and position of each fabric defect in each fabric image are labeled: each fabric defect is completely framed by a rectangular target GT frame, and each target GT frame is labeled as (class, xmin, ymin, xmax, ymax), where class represents the category of the fabric defect contained in the target GT frame, xmin and ymin represent the x and y coordinates of the upper-left vertex of the target GT frame, and xmax and ymax represent the x and y coordinates of the lower-right vertex of the target GT frame. Mosaic data enhancement is then performed to finally obtain the labeled enhanced fabric image. The Mosaic data enhancement specifically consists of randomly selecting four fabric images, randomly cropping them, and stitching them into one image.
S2) clustering all the target GT frames in the extended fabric image data set by using a Kmeans++ clustering algorithm to obtain K prior frames.
In the step S2), the width and height of each target GT frame are obtained from its data label; all target GT frames in the extended fabric image data set are clustered by width and height using the Kmeans++ algorithm to obtain K cluster-center coordinates, and the K cluster-center coordinates are taken as widths and heights to form K prior frames.
S3) dividing the extended fabric image data set according to a preset proportion to obtain a training set, a verification set and a test set, wherein the preset proportion is 8.
S4) building a DR-RSBU-YOLOv5 network, as shown in figure 2.
In the step S4), the constructed DR-RSBU-YOLOv5 network comprises a Backbone network Backbone, a first volume block CBL, an improved depth residual error shrinkage structure DR-RSBU-CW, a path aggregation network PANet and a prediction header part:
a) Backbone network Backbone:
The Backbone network comprises a Focus layer, four convolution bottleneck layers and a spatial pyramid fast pooling module SPPF connected in sequence; each convolution bottleneck layer comprises a convolution block CBL and a bottleneck structure module BottleneckCSP connected in sequence, and the numbers of Bottleneck modules in the BottleneckCSP modules of the four convolution bottleneck layers are 3, 6, 9 and 3 in order.
The outputs of the second convolution bottleneck layer, the third convolution bottleneck layer and the spatial pyramid fast pooling module SPPF serve as the outputs of the Backbone network; the output of the second convolution bottleneck layer is input into the path aggregation network PANet for processing; the output of the third convolution bottleneck layer is input into the improved depth residual shrinkage structure DR-RSBU-CW and the path aggregation network PANet respectively for processing; the output of the spatial pyramid fast pooling module SPPF is input into the first convolution block CBL for processing, the output of the first convolution block CBL is directly input into the path aggregation network PANet for processing, and the output of the first convolution block CBL is also up-sampled and then input into the path aggregation network PANet for processing.
b) Improved depth residual shrinkage structure DR-RSBU-CW:
As shown in fig. 3, the improved depth residual shrinkage structure DR-RSBU-CW comprises a first convolution layer Conv, a first batch normalization layer BN, a first rectified linear unit activation function ReLU, a second convolution layer Conv, a second batch normalization layer BN, a third convolution layer Conv, a third batch normalization layer BN, a global average pooling layer, a first fully connected layer FC, a second rectified linear unit activation function ReLU, a second fully connected layer FC, an activation function Sigmoid and a third rectified linear unit activation function ReLU. The input of the improved depth residual shrinkage structure DR-RSBU-CW is fed into the first convolution layer Conv and the second convolution layer Conv respectively for processing; the output of the first convolution layer Conv is processed in sequence by the first batch normalization layer BN, the first rectified linear unit activation function ReLU, the third convolution layer Conv and the third batch normalization layer BN; the output of the second convolution layer Conv is processed by the second batch normalization layer BN; the outputs of the second batch normalization layer BN and the third batch normalization layer BN are added to obtain an addition result, and zero-assignment processing is carried out on the addition result to obtain a zero-assigned result. After absolute-value processing, the output of the second batch normalization layer BN is passed in sequence through the global average pooling layer, the first fully connected layer FC, the second rectified linear unit activation function ReLU, the second fully connected layer FC and the activation function Sigmoid; the result obtained by multiplying the output of the global average pooling layer by the output of the activation function Sigmoid is used together with the zero-assigned result for soft thresholding; the processed result is added to the output of the first rectified linear unit activation function ReLU, and the sum is processed by the third rectified linear unit activation function ReLU and output as the output of the improved depth residual shrinkage structure DR-RSBU-CW.
In the improved depth residual shrinkage structure DR-RSBU-CW, the outputs of the second batch normalization layer BN and the third batch normalization layer BN are added to obtain an addition result, and zero-assignment processing is then carried out on the addition result to obtain a zero-assigned result. Specifically, the input of the DR-RSBU-YOLOv5 network is an image; the image input to the DR-RSBU-YOLOv5 network is processed within the network until it is output by the second batch normalization layer BN and the third batch normalization layer BN as a first feature map and a second feature map, the first feature map and the second feature map are added to obtain an addition feature map, and zero-assignment processing is then carried out on all values of the addition feature map, namely all values of the addition feature map that are greater than or equal to 0 are set to zero, giving the zero-assigned feature map.
and (4) carrying out absolute value processing on the output results of the second batch of normalization layers BN, specifically, carrying out next operation after all values on the first characteristic diagram are taken as absolute values.
In the improved depth residual shrinkage structure DR-RSBU-CW, the result obtained by multiplying the output of the global average pooling layer by the output of the activation function Sigmoid is used together with the zero-assigned result for soft thresholding. Specifically, the image input to the DR-RSBU-YOLOv5 network is processed within the network until the output feature map of the global average pooling layer and the output feature map of the activation function Sigmoid are multiplied to obtain a multiplied feature map; soft thresholding is then applied to all values of the zero-assigned feature map using the multiplied feature map, and each value x in the soft thresholding is processed by the following soft threshold function:
y = x + τ,  when x < -τ
y = 0,      when x ≥ -τ
wherein y is the output of x after soft thresholding; τ is a threshold, that is, a result obtained by multiplying the output of the global average pooling layer by the output of the activation function Sigmoid.
The part where x > τ is not retained; only the noise-filtered features in the part where x < -τ are retained, and because the identity mapping adds back the original features after dimension reduction through a 1 × 1 convolution, not too much information is lost. This preserves, to some extent, detail features that may be flaws to be detected.
c) Path aggregation network PANet:
The path aggregation network PANet comprises a first fusion bottleneck layer, a second convolution block CBL, a second fusion bottleneck layer, a third convolution block CBL, a third fusion bottleneck layer, a fourth convolution block CBL and a fused convolution layer connected in sequence; the first fusion bottleneck layer, the second fusion bottleneck layer and the third fusion bottleneck layer each comprise a fusion function Concat and a bottleneck structure module BottleneckCSP connected in sequence, and the number of Bottleneck modules in the BottleneckCSP module of each fusion bottleneck layer is 3; the fused convolution layer comprises one fusion function Concat and 5 convolution layers Conv connected in sequence.
The up-sampled output of the first convolution block CBL and the output of the third convolution bottleneck layer of the Backbone network are input together into the first fusion bottleneck layer for processing; the processed output is processed by the second convolution block CBL and then up-sampled, the up-sampled output and the output of the second convolution bottleneck layer of the Backbone network are input together into the second fusion bottleneck layer for processing, and the processed outputs are input into the third convolution block CBL and the prediction head part respectively for processing; the output of the third convolution block CBL, the output of the first fusion bottleneck layer and the output of the improved depth residual shrinkage structure DR-RSBU-CW are input together into the third fusion bottleneck layer for processing, and the processed outputs are input into the fourth convolution block CBL and the prediction head part respectively for processing; the output of the fourth convolution block CBL and the output of the first convolution block CBL are input together into the fused convolution layer for processing, and the processed output is input into the prediction head part for processing.
d) Prediction head part:
The prediction head part comprises three prediction heads YOLO Head; the output of the second fusion bottleneck layer in the top-down path of the path aggregation network PANet, the output of the third fusion bottleneck layer in the bottom-up path of the path aggregation network PANet and the output of the fused convolution layer serve as the inputs of the three prediction heads YOLO Head respectively; the outputs of the three prediction heads YOLO Head are the three prediction feature maps output by the DR-RSBU-YOLOv5 network, namely N = 3.
The three prediction heads YOLO Head adjust the number of channels of their inputs to num_anchors × (5 + num_classes); the outputs of the three prediction heads are input into three Loss functions respectively, and the three Loss functions output the classification confidence loss, the bounding-box confidence loss and the intersection-over-union (IoU) loss for the inputs of the prediction head part.
S5) the training set comprises M expanded fabric images, X expanded fabric images in the training set are selected and input into a DR-RSBU-YOLOv5 network for training, and N training prediction characteristic graphs are output for each expanded fabric image.
In step S5), X is a divisor of M and N is a divisor of K. Specifically, K =9.
S6) aiming at the N training prediction feature maps of each expanded fabric image, uniformly distributing the K prior frames in the step S3) on the N training prediction feature maps, adjusting the K prior frames according to the training prediction feature maps, respectively obtaining the K training prediction frames, and selecting a plurality of training prediction frames as training candidate frames according to a target GT frame.
In step S6), the following operations are performed for the N training prediction feature maps of each extended fabric image:
The K prior frames of step S3) are sorted by scale and then evenly divided, in sorted order, into N groups of prior frames; the N groups of prior frames are assigned to the N training prediction feature maps according to the sizes of the groups and of the feature maps, that is, the groups of prior frames, taken from small to large, are assigned in turn to the training prediction feature maps taken from large to small in size.
Adjusting the K prior frames according to the training prediction feature map, specifically adjusting the K prior frames according to the image information of the training prediction feature map, wherein the image information of the training prediction feature map comprises position scale adjustment information, classification confidence and frame confidence, the position scale adjustment information is adjustment information of coordinates of width, height and a center point, and for each training prediction feature map, the following operations are performed:
dividing the training prediction feature map into H multiplied by W grid units, wherein H and W are respectively the height and the width of the training prediction feature map, the center of each grid unit is called an anchor point, the information of the anchor point comprises position scale adjustment information, classification confidence coefficient and frame confidence coefficient, and the following operations are carried out for each anchor point and K/N prior frames distributed on the training prediction feature map:
superposing K/N prior frames on anchor points, wherein the anchor points correspond to an anchor point vector with the length of a, and a = num _ anchor (5 + num _ class), wherein num _ anchor represents the number of the prior frames on the anchor points, namely K/N, and num _ class represents the number of fabric defect categories, namely K; carrying out dimension splitting on the anchor point vector to obtain K/N one-dimensional adjustment vectors respectively corresponding to K/N priori frames, wherein the length of each one-dimensional adjustment vector is 5+ num \ class, adjusting the position and the scale of each priori frame according to the position scale adjustment information of each one-dimensional adjustment vector to respectively obtain K/N training prediction frames, wherein the information of each training prediction frame comprises respective position information, classification confidence coefficient and frame confidence coefficient, and the position information is the width and the height of the training prediction frame and the coordinate information of the central point.
K prior frames are adjusted to obtain K training prediction frames respectively. For each target GT frame, the intersection over union IoU between the target GT frame and each training prediction frame is computed, and the training prediction frame with the largest IoU with the target GT frame is taken as its training candidate frame; finally, a number of training prediction frames are selected as training candidate frames according to the target GT frames.
S7) calculating an overall loss value of the DR-RSBU-YOLOv5 network according to the training candidate frame and the target GT frame, reversely propagating the overall loss value into the DR-RSBU-YOLOv5 network, and updating parameters of the DR-RSBU-YOLOv5 network by using a gradient descent method.
In step S7), the overall loss value of the DR-RSBU-YOLOv5 network is calculated from the training candidate frames and the target GT frames; that is, the following operations are performed for each target GT frame and its training candidate frame obtained in step S6):
the target GT frame corresponds to a one-dimensional GT vector, the length of the one-dimensional GT vector is 5+ num \ class, and the information of the one-dimensional GT vector comprises position information, classification confidence coefficient and frame confidence coefficient; through step S6), a one-dimensional adjustment vector used by the training candidate frame is adjusted, and the loss between the one-dimensional GT vector and the one-dimensional adjustment vector is calculated, including the bounding box position loss, the classification confidence loss, and the bounding box confidence loss: calculating by using the CIoU loss according to the position information of each one-dimensional GT vector and each one-dimensional adjusting vector to obtain the position loss of the bounding box; obtaining classification confidence loss by using binary cross entropy loss calculation according to the classification confidence of each one-dimensional GT vector and each one-dimensional adjusting vector; and calculating to obtain frame confidence coefficient loss by using binary cross entropy loss according to the frame confidence coefficient of each one-dimensional GT vector and each one-dimensional adjusting vector.
The bounding-box position loss, the classification confidence loss and the bounding-box confidence loss are weighted and summed to obtain the overall loss value of the DR-RSBU-YOLOv5 network; the overall loss value is back-propagated into the DR-RSBU-YOLOv5 network, and the parameters of the DR-RSBU-YOLOv5 network are updated and optimized using a gradient descent method.
S8) repeating the steps S5) to S7) for the expansion fabric images in the training set, each repetition selecting X further expansion fabric images and inputting them into the DR-RSBU-YOLOv5 network whose parameters were updated in the previous repetition of step S7), until all expansion fabric images in the training set have been processed by the parameter-updated DR-RSBU-YOLOv5 network; the DR-RSBU-YOLOv5 network obtained at that moment is taken as the pre-training DR-RSBU-YOLOv5 network.
S9) the verification set comprises a plurality of extension fabric images, each extension fabric image in the verification set is input into a pre-training DR-RSBU-YOLOv5 network for processing, and N verification prediction characteristic graphs are output; and (4) carrying out the same processing of processing N training prediction feature maps in the step S6) on N verification prediction feature maps of each extension fabric image in the verification set to obtain a plurality of verification prediction frames, and selecting the plurality of verification prediction frames as verification candidate frames according to the target GT frame.
And calculating the average precision AP of each fabric flaw category on the verification set according to the verification candidate frames and the target GT frames, and calculating the mean average precision mAP over all the AP values, as sketched below.
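A sketch of the metric just described: the AP of one defect class is the area under its precision-recall curve (all-point interpolation is assumed here), and mAP is the mean of the per-class AP values.

```python
import numpy as np

def average_precision(recall, precision):
    """recall/precision: arrays of matched PR points for one class, recall ascending."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]              # precision envelope
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

def mean_average_precision(per_class_pr):
    """per_class_pr: dict mapping class name -> (recall array, precision array)."""
    return float(np.mean([average_precision(r, p) for r, p in per_class_pr.values()]))
```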
S10) repeating the steps S8) to S9) until the mean average precision mAP obtained over successive repetitions is equal to or approaches a fixed value, and taking the pre-training DR-RSBU-YOLOv5 network at that moment as the trained DR-RSBU-YOLOv5 network. The test set is input into the trained DR-RSBU-YOLOv5 network for processing to detect and locate the fabric defects.
S11) acquiring a plurality of fabric images to be detected with fabric flaws, inputting the fabric images to be detected into the trained DR-RSBU-YOLOv5 network for processing, and outputting N detection prediction feature maps; carrying out on the N detection prediction feature maps the same processing as applied to the N training prediction feature maps in step S6) to obtain a plurality of detection prediction frames; using non-maximum suppression NMS to remove redundant frames among the detection prediction frames, and taking the retained detection prediction frames as final prediction frames; and mapping the final prediction frames onto the fabric images to be detected to detect and locate the fabric flaws, as shown in fig. 4; a sketch of the NMS and mapping step is given below.
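A sketch of this final step: non-maximum suppression removes redundant detection prediction boxes, and the retained boxes are scaled back onto the fabric image to be detected using the proportional relation between the image and the detection prediction feature map. torchvision's nms is used here, and the IoU threshold value is an illustrative assumption.

```python
import torch
from torchvision.ops import nms

def finalize_detections(boxes, scores, feat_size, image_size, iou_thresh=0.45):
    """boxes: (N, 4) on the feature map as (xmin, ymin, xmax, ymax); sizes as (W, H)."""
    keep = nms(boxes, scores, iou_thresh)                  # indices of retained boxes
    kept = boxes[keep]
    sx = image_size[0] / feat_size[0]                      # width ratio
    sy = image_size[1] / feat_size[1]                      # height ratio
    scale = torch.tensor([sx, sy, sx, sy], dtype=kept.dtype)
    return kept * scale, scores[keep]                      # boxes in image pixels
```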
In step S11), the following operations are performed for each final prediction frame and the detection prediction feature map on which it is located:
The fabric image to be detected from which the detection prediction feature map was produced by the trained DR-RSBU-YOLOv5 network is obtained, and the final prediction frame on the detection prediction feature map is mapped onto the fabric flaw of the fabric image to be detected according to the proportional relation between the detection prediction feature map and the fabric image to be detected, so as to detect and locate the fabric flaw on the fabric image to be detected.
The size of the fabric image to be detected is 4096 × 1696 pixels, and 9 prior frames are obtained in step S2), their sizes being (7, 8), (15, 14), (30, 33), (25, 188), (639, 34), (639, 49), (71, 636), (639, 71) and (639, 637). The 3 detection prediction feature maps of different scales output in step S11) have sizes of 20 × 20, 40 × 40 and 80 × 80, and 3 prior frames are assigned to each detection prediction feature map. The 20 × 20 detection prediction feature map has the largest receptive field, so the 3 largest prior frames (71, 636), (639, 71) and (639, 637) are assigned to it; the 40 × 40 detection prediction feature map is assigned the 3 medium-sized prior frames (25, 188), (639, 34) and (639, 49); and the 80 × 80 detection prediction feature map is assigned the 3 smallest prior frames (7, 8), (15, 14) and (30, 33), as sketched below.
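A sketch of the assignment just described: the priors are sorted by area, split into equal groups, and the smallest group goes to the finest (80 × 80) feature map while the largest goes to the coarsest (20 × 20) one.

```python
def assign_priors(priors, feature_map_sizes):
    """priors: list of (w, h); feature_map_sizes: list of (H, W), e.g. [(20, 20), (40, 40), (80, 80)]."""
    n = len(feature_map_sizes)
    priors = sorted(priors, key=lambda wh: wh[0] * wh[1])          # small -> large
    group = len(priors) // n
    groups = [priors[i * group:(i + 1) * group] for i in range(n)]
    # coarser maps (fewer grid cells, larger receptive field) get the larger priors
    order = sorted(range(n), key=lambda i: feature_map_sizes[i][0] * feature_map_sizes[i][1])
    return {order[i]: groups[n - 1 - i] for i in range(n)}
```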
In order to verify the performance of the method, the DR-RSBU-YOLOv5 network is used to predict the fabric images to be detected, and the mean Average Precision (mAP) as well as the Precision and Recall corresponding to each fabric defect category are calculated from the prediction results. The fabric defect categories in the fabric images to be detected are: sewing head sealing, sewing head printing sealing print, insect sticking bug, hole breaking, wrinkle scrimp, weaving defect flaw, missing print, color difference color shade fold, stain dirty, miss pattern error, watermark, hair, defect, and water spot wax spot. The experimental results are shown in Table 1, and the detection rates for the overall fabric defect classes before and after the improvement of the network structure are shown in Table 2:
TABLE 1
(Table 1 is reproduced as an image in the original publication; it reports the Precision and Recall of each fabric defect category and the mean average precision mAP.)
TABLE 2
(Table 2 is reproduced as images in the original publication; it compares the detection results for the overall fabric defect classes before and after the improvement of the network structure.)
As can be seen from Table 1, the method can detect fabric defects of various categories and achieves high accuracy. As can be seen from Table 2, introducing and improving the depth residual shrinkage structure RSBU on the basis of the YOLOv5 network to obtain the noise-reduced improved depth residual shrinkage structure DR-RSBU-CW, and finally constructing the DR-RSBU-YOLOv5 network, increases the mean average precision mAP by one point and enhances the detection capability for fabric defects.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements made to the technical solutions of the present invention by those skilled in the art without departing from the spirit of the present invention should fall within the protection scope of the present invention.

Claims (10)

1. A fabric flaw detection method based on DR-RSBU-YOLOv5 is characterized in that: the method comprises the following steps:
S1) collecting a plurality of fabric images with fabric defects, and sequentially performing fabric defect data labeling and data enhancement processing on each fabric image to obtain labeled enhanced fabric images, wherein each labeled enhanced fabric image carries target GT frames of the fabric defects obtained from the fabric defect data labeling; all the fabric images and the labeled enhanced fabric images together form an extended fabric image data set, and the extended fabric image data set comprises a plurality of extended fabric images;
S2) clustering all the target GT frames of the extended fabric image data set by using the Kmeans++ clustering algorithm to obtain K prior frames;
s3) dividing the extended fabric image data set according to a preset proportion to obtain a training set and a verification set;
s4) building a DR-RSBU-YOLOv5 network;
S5) the training set comprises M extended fabric images; X extended fabric images in the training set are selected and input into the DR-RSBU-YOLOv5 network for training, and N training prediction feature maps are output for each extended fabric image;
S6) for the N training prediction feature maps of each extended fabric image, uniformly distributing the K prior frames from step S2) over the N training prediction feature maps, adjusting the K prior frames according to the training prediction feature maps to obtain K training prediction frames, and selecting a plurality of training prediction frames as training candidate frames according to the target GT frames;
S7) calculating the overall loss value of the DR-RSBU-YOLOv5 network according to the training candidate frames and the target GT frames, back-propagating the overall loss value into the DR-RSBU-YOLOv5 network, and updating the parameters of the DR-RSBU-YOLOv5 network by using the gradient descent method;
S8) repeating steps S5) to S7) for the extended fabric images in the training set, each time selecting X extended fabric images from the training set and inputting them into the DR-RSBU-YOLOv5 network whose parameters were updated in the previous repetition of step S7), until all extended fabric images in the training set have been processed by the parameter-updated DR-RSBU-YOLOv5 network; the DR-RSBU-YOLOv5 network obtained at that point is taken as the pre-trained DR-RSBU-YOLOv5 network;
S9) the verification set comprises a plurality of extended fabric images; each extended fabric image in the verification set is input into the pre-trained DR-RSBU-YOLOv5 network for processing, and N verification prediction feature maps are output; the N verification prediction feature maps of each extended fabric image in the verification set are processed in the same way as the N training prediction feature maps in step S6) to obtain a plurality of verification prediction frames, and several verification prediction frames are selected as verification candidate frames according to the target GT frames;
calculating the average precision AP of each fabric flaw category in the verification set according to the verification candidate frames and the target GT frames, and calculating the mean average precision mAP of all the AP values;
S10) repeating steps S8) to S9) until the mean average precision mAP obtained over multiple repetitions remains at a fixed value; the pre-trained DR-RSBU-YOLOv5 network obtained at that point is taken as the trained DR-RSBU-YOLOv5 network;
S11) acquiring a plurality of fabric images to be detected with fabric flaws, inputting each fabric image to be detected into the trained DR-RSBU-YOLOv5 network for processing, and outputting N detection prediction feature maps; processing the N detection prediction feature maps in the same way as the N training prediction feature maps in step S6) to obtain a plurality of detection prediction frames; removing redundant frames from the detection prediction frames with non-maximum suppression NMS, and taking the retained detection prediction frames as the final prediction frames; and mapping the final prediction frames onto the fabric image to be detected to detect and locate the fabric defects.
2. The DR-RSBU-YOLOv5 based fabric flaw detection method of claim 1, wherein: in step S1), sequentially performing fabric defect data labeling and data enhancement processing on each fabric image to obtain the labeled enhanced fabric image comprises: first performing data labeling of the type and position of each fabric defect in each fabric image, wherein each fabric defect is completely framed by a rectangular target GT frame, and each target GT frame is labeled as (class, xmin, ymin, xmax, ymax), where class represents the type of fabric defect contained in the target GT frame, xmin and ymin respectively represent the x and y coordinates of the top-left vertex of the target GT frame, and xmax and ymax represent the x and y coordinates of the bottom-right vertex of the target GT frame; and then performing Mosaic data enhancement to finally obtain the labeled enhanced fabric image.
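A minimal Python sketch of how such (class, xmin, ymin, xmax, ymax) labels can be turned into the width/height pairs used for clustering in step S2); the in-memory representation and the helper name are assumptions for illustration.

# Sketch: derive (width, height) pairs of target GT frames from labels of the
# form (class, xmin, ymin, xmax, ymax).
def gt_to_wh(labels):
    """labels: iterable of (class, xmin, ymin, xmax, ymax) tuples."""
    return [(xmax - xmin, ymax - ymin) for _, xmin, ymin, xmax, ymax in labels]

labels = [("hole", 120, 40, 180, 95), ("stain", 10, 10, 50, 60)]  # hypothetical
print(gt_to_wh(labels))   # [(60, 55), (40, 50)]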
3. The DR-RSBU-YOLOv5 based fabric flaw detection method of claim 2, wherein: in step S2), the width and height of each target GT frame are obtained from the data label of the target GT frame, all target GT frames in the extended fabric image data set are clustered according to their widths and heights by using the Kmeans++ algorithm to obtain K cluster-center coordinates, and the K cluster-center coordinates are respectively used as widths and heights to form the K prior frames.
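As an illustration of this clustering step, a minimal Python sketch using scikit-learn's KMeans with k-means++ initialisation on the (width, height) pairs; the library choice and helper names are assumptions, the claim only specifies the Kmeans++ algorithm.

# Sketch: cluster GT-frame (width, height) pairs with k-means++ initialisation;
# each cluster centre is used as one prior frame.
import numpy as np
from sklearn.cluster import KMeans

def prior_boxes(wh_pairs, k=9):
    wh = np.asarray(wh_pairs, dtype=float)
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(wh)
    return sorted(km.cluster_centers_.round().astype(int).tolist(),
                  key=lambda c: c[0] * c[1])

rng = np.random.default_rng(0)
wh_demo = rng.integers(5, 640, size=(200, 2))   # hypothetical GT widths/heights
print(prior_boxes(wh_demo, k=9))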
4. The DR-RSBU-YOLOv5 based fabric flaw detection method of claim 1, wherein: in step S4), the constructed DR-RSBU-YOLOv5 network comprises a Backbone network Backbone, a first convolution block CBL, an improved deep residual shrinkage structure DR-RSBU-CW, a path aggregation network PANet and a prediction head part:
a) Backbone network Backbone:
the Backbone network Backbone comprises a Focus layer, four convolution bottleneck layers and a spatial pyramid fast pooling module SPPF which are connected in sequence; each convolution bottleneck layer comprises a first convolution block CBL and a bottleneck structure module BottleneckCSP which are connected in sequence;
the outputs of the second convolution bottleneck layer, the third convolution bottleneck layer and the spatial pyramid fast pooling module SPPF serve as the outputs of the Backbone network Backbone; the output of the second convolution bottleneck layer is input into the path aggregation network PANet for processing; the output of the third convolution bottleneck layer is input into the improved deep residual shrinkage structure DR-RSBU-CW and the path aggregation network PANet for processing, respectively; the output of the spatial pyramid fast pooling module SPPF is input into the first convolution block CBL for processing, the output processed by the first convolution block CBL is directly input into the path aggregation network PANet for processing, and the output processed by the first convolution block CBL is also up-sampled and then input into the path aggregation network PANet for processing;
b) Improved deep residual shrinkage structure DR-RSBU-CW:
the improved deep residual shrinkage structure DR-RSBU-CW comprises a first convolution layer Conv, a first batch normalization layer BN, a first rectified linear unit activation function ReLU, a second convolution layer Conv, a second batch normalization layer BN, a third convolution layer Conv, a third batch normalization layer BN, a global average pooling layer, a first fully connected layer FC, a second rectified linear unit activation function ReLU, a second fully connected layer FC, an activation function Sigmoid and a third rectified linear unit activation function ReLU; the input of the improved deep residual shrinkage structure DR-RSBU-CW is input into the first convolution layer Conv and the second convolution layer Conv for processing, respectively; the output of the first convolution layer Conv is sequentially processed by the first batch normalization layer BN, the first rectified linear unit activation function ReLU, the third convolution layer Conv and the third batch normalization layer BN; the output of the second convolution layer Conv is processed by the second batch normalization layer BN; the outputs of the second batch normalization layer BN and the third batch normalization layer BN are added, and the addition result is zeroed to obtain a zeroing result; the output of the second batch normalization layer BN is subjected to absolute value processing and then sequentially processed by the global average pooling layer, the first fully connected layer FC, the second rectified linear unit activation function ReLU, the second fully connected layer FC and the activation function Sigmoid; the result obtained by multiplying the output of the global average pooling layer by the output of the activation function Sigmoid and the zeroing result are jointly subjected to soft thresholding; the processed result is added to the output of the first rectified linear unit activation function ReLU and then processed by the third rectified linear unit activation function ReLU, and the result is output as the output of the improved deep residual shrinkage structure DR-RSBU-CW;
c) Path aggregation network PANet:
the path aggregation network PANet comprises a first fusion bottleneck layer, a second convolution block CBL, a second fusion bottleneck layer, a third convolution block CBL, a third fusion bottleneck layer, a fourth convolution block CBL and a fusion convolution layer which are connected in sequence, wherein the first fusion bottleneck layer, the second fusion bottleneck layer and the third fusion bottleneck layer each comprise a fusion function Concat and a bottleneck structure module BottleneckCSP which are connected in sequence; the fusion convolution layer comprises a fusion function Concat and 5 convolution layers Conv which are connected in sequence;
the up-sampled output of the first convolution block CBL and the output of the third convolution bottleneck layer of the Backbone network Backbone are input together into the first fusion bottleneck layer for processing; the processed output is processed by the second convolution block CBL and then up-sampled; the up-sampled output and the output of the second convolution bottleneck layer of the Backbone network Backbone are input together into the second fusion bottleneck layer for processing, and the processed outputs are input into the third convolution block CBL and the prediction head part for processing, respectively; the output of the third convolution block CBL, the output of the first fusion bottleneck layer and the output of the improved deep residual shrinkage structure DR-RSBU-CW are input together into the third fusion bottleneck layer for processing, and the processed outputs are input into the fourth convolution block CBL and the prediction head part for processing, respectively; the output of the fourth convolution block CBL and the output of the first convolution block CBL are input together into the fusion convolution layer for processing, and the processed output is input into the prediction head part for processing;
d) Prediction head part:
the prediction head part comprises three prediction heads YOLO Head; the output of the second fusion bottleneck layer of the path aggregation network PANet, the output of the third fusion bottleneck layer of the path aggregation network PANet and the output of the fusion convolution layer are respectively used as the inputs of the three prediction heads YOLO Head; the outputs of the three prediction heads YOLO Head are the three prediction feature maps output by the DR-RSBU-YOLOv5 network, namely N = 3.
5. The DR-RSBU-YOLOv5 based fabric flaw detection method of claim 4, wherein: in the improved deep residual shrinkage structure DR-RSBU-CW, the outputs of the second batch normalization layer BN and the third batch normalization layer BN are added to obtain an addition result, and the addition result is then zeroed to obtain the zeroing result; specifically, the input of the DR-RSBU-YOLOv5 network is in image form, and the image input to the DR-RSBU-YOLOv5 network is processed within the network until it is processed by the second batch normalization layer BN and the third batch normalization layer BN and output as a first feature map and a second feature map; the first feature map and the second feature map are added to obtain an addition feature map, and all values on the addition feature map are then zeroed, namely the values greater than or equal to 0 among all values on the addition feature map are set to zero, to obtain a zeroed feature map;
the output of the second batch normalization layer BN is subjected to absolute value processing; specifically, absolute values are taken of all values on the first feature map before the next operation is performed.
6. The DR-RSBU-YOLOv5 based fabric flaw detection method of claim 5, wherein: in the improved deep residual shrinkage structure DR-RSBU-CW, the result obtained by multiplying the output of the global average pooling layer by the output of the activation function Sigmoid and the zeroing result are jointly subjected to soft thresholding; specifically, the image input to the DR-RSBU-YOLOv5 network is processed within the network until the output feature map of the global average pooling layer is multiplied by the output feature map of the activation function Sigmoid to obtain a multiplied feature map; soft thresholding is applied using all values on the multiplied feature map and the zeroed feature map, and a value x in the soft thresholding is processed by the following soft threshold function:
y = x − τ, if x > τ;
y = 0, if −τ ≤ x ≤ τ;
y = x + τ, if x < −τ
wherein y is the output of x after soft thresholding; τ is a threshold, that is, a result obtained by multiplying the output of the global average pooling layer by the output of the activation function Sigmoid.
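The soft threshold function above can equivalently be written as y = sign(x) · max(|x| − τ, 0); a short PyTorch sketch follows, in which applying τ channel-wise by broadcasting is an assumption based on the block description.

# Sketch: soft thresholding y = sign(x) * max(|x| - tau, 0), with tau broadcast
# over the feature map (e.g. one threshold per channel).
import torch

def soft_threshold(x: torch.Tensor, tau: torch.Tensor) -> torch.Tensor:
    return torch.sign(x) * torch.clamp(torch.abs(x) - tau, min=0.0)

x = torch.tensor([[-2.0, -0.3, 0.2, 1.5]])
print(soft_threshold(x, torch.tensor(0.5)))   # tensor([[-1.5000, 0.0000, 0.0000, 1.0000]])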
7. The DR-RSBU-YOLOv5 based fabric flaw detection method of claim 1, wherein: in step S5), X is a divisor of M, and N is a divisor of K.
8. The DR-RSBU-YOLOv5 based fabric flaw detection method of claim 1, wherein: in step S6), the following operations are performed on the N training prediction feature maps of each extended fabric image:
the K prior frames from step S2) are sorted by scale and then evenly divided into N groups of prior frames in that order; the N groups of prior frames are assigned to the N training prediction feature maps according to the sizes of the groups and the scales of the feature maps, namely the N groups of prior frames, in order of increasing size, are assigned to the N training prediction feature maps in order of decreasing scale;
the K prior frames are adjusted according to the training prediction feature maps, specifically according to the image information of each training prediction feature map, wherein the image information of a training prediction feature map comprises position-and-scale adjustment information, classification confidence and frame confidence, and the position-and-scale adjustment information is the adjustment information for the width, height and center-point coordinates; for each training prediction feature map, the following operations are performed:
dividing the training prediction feature map into H × W grid cells, where H and W are respectively the height and width of the training prediction feature map; the center of each grid cell is called an anchor point, and the information of an anchor point comprises position-and-scale adjustment information, classification confidence and frame confidence; for each anchor point and the K/N prior frames assigned to the training prediction feature map, the following operations are performed:
superposing the K/N prior frames on the anchor point, where the anchor point corresponds to an anchor-point vector of length a, with a = num_anchor × (5 + num_class), where num_anchor represents the number of prior frames on the anchor point, namely K/N, and num_class represents the number of fabric defect categories; the anchor-point vector is split by dimension to obtain K/N one-dimensional adjustment vectors respectively corresponding to the K/N prior frames, each one-dimensional adjustment vector having length 5 + num_class; the position and scale of each prior frame are adjusted according to the position-and-scale adjustment information of its one-dimensional adjustment vector to obtain K/N training prediction frames, where the information of each training prediction frame comprises its position information, classification confidence and frame confidence, the position information being the width, height and center-point coordinates of the training prediction frame;
after the K prior frames are adjusted, K training prediction frames are obtained; for each target GT frame, the intersection over union IoU between the target GT frame and each training prediction frame is calculated, and the training prediction frame with the largest IoU with the target GT frame is taken as a training candidate frame; finally, a plurality of training prediction frames are selected as training candidate frames according to the target GT frames.
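A minimal Python sketch of the IoU-based selection of training candidate frames described above; the (xmin, ymin, xmax, ymax) box format and helper names are assumptions for illustration.

# Sketch: for each target GT frame, take the training prediction frame with the
# largest IoU as the training candidate frame.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def training_candidates(gt_boxes, pred_boxes):
    return [max(pred_boxes, key=lambda p: iou(gt, p)) for gt in gt_boxes]

gt = [(0, 0, 10, 10)]                              # hypothetical target GT frame
preds = [(1, 1, 9, 9), (20, 20, 30, 30)]           # hypothetical prediction frames
print(training_candidates(gt, preds))              # [(1, 1, 9, 9)]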
9. The DR-RSBU-YOLOv5 based fabric flaw detection method of claim 8, wherein: in step S7), the overall loss value of the DR-RSBU-YOLOv5 network is calculated according to the training candidate frames and the target GT frames, namely the following operations are performed for each target GT frame and the training candidate frame obtained for that target GT frame in step S6):
the target GT frame corresponds to a one-dimensional GT vector of length 5 + num_class, and the information of the one-dimensional GT vector comprises position information, classification confidence and frame confidence; the one-dimensional adjustment vector used to adjust the training candidate frame in step S6) is taken, and the loss between the one-dimensional GT vector and the one-dimensional adjustment vector is calculated, including the bounding-box position loss, the classification confidence loss and the bounding-box confidence loss: the bounding-box position loss is calculated with the CIoU loss from the position information of the one-dimensional GT vector and the one-dimensional adjustment vector; the classification confidence loss is calculated with the binary cross-entropy loss from the classification confidences of the one-dimensional GT vector and the one-dimensional adjustment vector; and the bounding-box confidence loss is calculated with the binary cross-entropy loss from the frame confidences of the one-dimensional GT vector and the one-dimensional adjustment vector;
the bounding-box position loss, the classification confidence loss and the bounding-box confidence loss are weighted and summed to obtain the overall loss value of the DR-RSBU-YOLOv5 network; the overall loss value is back-propagated into the DR-RSBU-YOLOv5 network, and the parameters of the DR-RSBU-YOLOv5 network are updated and optimized by using the gradient descent method.
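As a sketch of this weighted sum, the following Python snippet combines the three loss terms; the weights are hypothetical placeholders, since the claim does not specify their numeric values.

# Sketch: overall loss as a weighted sum of bounding-box position loss (CIoU),
# classification confidence loss and bounding-box confidence loss (both BCE).
# The weights w_box, w_cls, w_obj are hypothetical, not taken from the patent.
def overall_loss(box_loss, cls_loss, obj_loss, w_box=0.05, w_cls=0.5, w_obj=1.0):
    return w_box * box_loss + w_cls * cls_loss + w_obj * obj_loss

print(overall_loss(box_loss=2.1, cls_loss=0.8, obj_loss=0.4))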
10. The DR-RSBU-YOLOv5 based fabric flaw detection method of claim 8, wherein: in step S11), the following operations are performed for each final prediction frame and the detection prediction feature map in which that final prediction frame is located:
obtaining the to-be-detected fabric image from which the detection prediction feature map was produced by the trained DR-RSBU-YOLOv5 network, mapping the final prediction frame on the detection prediction feature map onto the fabric flaw of the to-be-detected fabric image according to the proportional relation between the detection prediction feature map and the to-be-detected fabric image, and thereby detecting and locating the fabric flaw on the fabric image.
CN202210802618.4A 2022-07-07 2022-07-07 DR-RSBU-YOLOv 5-based fabric flaw detection method Withdrawn CN115187544A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210802618.4A CN115187544A (en) 2022-07-07 2022-07-07 DR-RSBU-YOLOv 5-based fabric flaw detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210802618.4A CN115187544A (en) 2022-07-07 2022-07-07 DR-RSBU-YOLOv 5-based fabric flaw detection method

Publications (1)

Publication Number Publication Date
CN115187544A true CN115187544A (en) 2022-10-14

Family

ID=83516926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210802618.4A Withdrawn CN115187544A (en) 2022-07-07 2022-07-07 DR-RSBU-YOLOv 5-based fabric flaw detection method

Country Status (1)

Country Link
CN (1) CN115187544A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116363485A (en) * 2023-05-22 2023-06-30 齐鲁工业大学(山东省科学院) Improved YOLOv 5-based high-resolution target detection method
CN116363485B (en) * 2023-05-22 2024-03-12 齐鲁工业大学(山东省科学院) Improved YOLOv 5-based high-resolution target detection method
CN117152484A (en) * 2023-07-19 2023-12-01 南京林业大学 Small target cloth flaw detection method for improving YOLOv5s
CN117152484B (en) * 2023-07-19 2024-03-26 南京林业大学 Small target cloth flaw detection method based on improved YOLOv5s

Similar Documents

Publication Publication Date Title
CN111223088B (en) Casting surface defect identification method based on deep convolutional neural network
CN111598861B (en) Improved Faster R-CNN model-based non-uniform texture small defect detection method
CN108918536B (en) Tire mold surface character defect detection method, device, equipment and storage medium
CN107543828B (en) Workpiece surface defect detection method and system
CN115187544A (en) DR-RSBU-YOLOv 5-based fabric flaw detection method
CN109711288A (en) Remote sensing ship detecting method based on feature pyramid and distance restraint FCN
CN111325713A (en) Wood defect detection method, system and storage medium based on neural network
CN111402226A (en) Surface defect detection method based on cascade convolution neural network
CN105844621A (en) Method for detecting quality of printed matter
CN113505865B (en) Sheet surface defect image recognition processing method based on convolutional neural network
CN115049619B (en) Efficient flaw detection method for complex scene
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
CN114549507B (en) Improved Scaled-YOLOv fabric flaw detection method
CN112200045A (en) Remote sensing image target detection model establishing method based on context enhancement and application
CN114359245A (en) Method for detecting surface defects of products in industrial scene
CN113538331A (en) Metal surface damage target detection and identification method, device, equipment and storage medium
CN115829995A (en) Cloth flaw detection method and system based on pixel-level multi-scale feature fusion
CN115170475A (en) Non-woven fabric defect detection method based on deep learning method
CN112991271A (en) Aluminum profile surface defect visual detection method based on improved yolov3
CN111915593A (en) Model establishing method and device, electronic equipment and storage medium
CN116342536A (en) Aluminum strip surface defect detection method, system and equipment based on lightweight model
CN114170168A (en) Display module defect detection method, system and computer readable storage medium
CN113298767A (en) Reliable go map recognition method capable of overcoming light reflection phenomenon
CN116883313A (en) Method for rapidly detecting vehicle body paint surface defects, image processing equipment and readable medium
CN115953387A (en) Radiographic image weld defect detection method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication
Application publication date: 20221014