CN116934685A - Steel surface defect detection algorithm based on Focal module and deformable convolution - Google Patents


Publication number
CN116934685A
Authority
CN
China
Prior art keywords
model
defects
defect detection
loss
detection algorithm
Prior art date
Legal status: Pending
Application number
CN202310673020.4A
Other languages
Chinese (zh)
Inventor
黄同愿
张伟峰
余潜江
Current Assignee
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Application filed by Chongqing University of Technology
Priority to CN202310673020.4A
Publication of CN116934685A

Links

Classifications

    • G06T7/0004 Industrial image inspection
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06V10/44 Local feature extraction, e.g. edges, contours, corners
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V10/54 Extraction of texture features
    • G06V10/763 Clustering, non-hierarchical techniques
    • G06V10/764 Classification, e.g. of video objects
    • G06V10/766 Regression, e.g. by projecting features on hyperplanes
    • G06V10/806 Fusion of extracted features
    • G06V10/82 Recognition using neural networks
    • G06T2207/10004 Still image; photographic image
    • G06T2207/30108 Industrial image inspection
    • G06T2207/30136 Metal
    • G06V2201/07 Target detection
    • Y02P90/30 Computing systems specially adapted for manufacturing


Abstract

The invention discloses a steel surface defect detection algorithm based on YOLOv7 with a Focal module and a deformable convolution network, belonging to the field of target detection and comprising the following steps: S1, data acquisition; S2, constructing a new target detection algorithm based on the YOLOv7 model; S3, optimizing the loss function, training the target detection model, and saving the optimal model; S4, predicting with the optimal model, saving the prediction results, computing evaluation indexes, and finally comparing the results. Steel is an indispensable raw material in manufacturing, and its quality determines the quality of the final product. However, many defects arise on the steel surface during production, and these defects are of many types with complex, irregular shapes. To detect them, we propose a DF-YOLOv7 model suited to steel surface defect detection. First, the model uses the K-means++ algorithm to adapt the fixed anchor box sizes to the defect sizes of different data sets, improving the model's feature extraction capability for different defects. Second, the model uses a D-SPPCSPC module to extract defect contour features, improving detection performance while further reducing the number of parameters. Finally, the model uses a CIoU loss with a Focal module, so that it focuses more on high-quality anchor boxes and alleviates the imbalance between positive and negative samples.
Experimental results show that at a detection speed of 51.7 frames per second, the mAP of the proposed model on the NEU-DET data set is 0.771, an improvement of 3.6% over the original model and higher than the accuracy of some SOTA detectors. The DF-YOLOv7 model therefore achieves good defect detection performance and meets the industrial requirement for real-time detection.

Description

Steel surface defect detection algorithm based on Focal module and deformable convolution
Technical Field
The invention relates to deep neural networks, and in particular to an algorithm that improves the detection accuracy of steel surface defects by fusing a Focal module and a deformable convolution network; it belongs to the field of target detection.
Background
In the object detection task, current steel defect detection algorithms generally suffer from low detection accuracy. In view of this, we propose a YOLOv7-based steel defect detection algorithm with higher accuracy, built on K-means++, deformable convolution, and a loss function augmented with a Focal module. Deformable convolution is used for enhanced feature extraction: the SPPCSPC architecture is modified by replacing its standard convolutions with deformable convolutions, and the new structure is named the D-SPPCSPC module. Combined with the FPN, different semantic information is fused into the three feature maps, improving feature extraction quality and defect detection performance. We retain the strong feature extraction network of YOLOv7 and add the improvements to its later stages. The method was validated through experiments against YOLO-series algorithms on the NEU-DET data set. The results show that the new YOLOv7 algorithm is very effective at detecting steel defects.
Inspired by data sets of steel collected in the field, the main problem in steel defect detection is first determining where a defect is located; only then can its type be judged. Moreover, insufficient extraction of contour information can lead to misjudgment of the defect type. We therefore introduce deformable convolution to strengthen image feature extraction, perform multi-scale feature extraction, and improve the perception of edge information in the feature map, raising the model's defect detection accuracy. On top of the YOLO architecture, we designed a stronger feature extraction module, D-SPPCSPC, to extract defect features.
Early steel surface defect detection relied on workers spotting defects under high-frequency, high-intensity flash lamps, so detection accuracy depended heavily on the worker's condition and experience. In traditional computer-vision defect detection, image information is acquired by a CCD camera, and defect information is obtained by analyzing local abnormal features with a signal-processing program. These conventional methods have clear limitations and struggle to meet the efficiency demands of modern high-speed production.
With the advent of deep learning, both two-stage and single-stage target detection methods were applied to the detection of steel surface defects. The two-stage detection method has high accuracy, but the detection speed is slower. Single stage detectors are faster but have a somewhat lower accuracy than two stage detectors.
Although YOLO has achieved great success in target detection, problems remain in steel defect detection. First, in computing anchor box sizes, different choices of initial centroids produce very different clustering results, reducing the practical usefulness of the anchors. Second, the correlation between shallow and deep features in the model is weak, so detail features are easily lost and defects cannot be well distinguished and localized. Third, because positive and negative samples are imbalanced, the loss computation is strongly affected: when there are many easily distinguishable negative samples, training revolves around them, the positive samples are swamped, and the model converges slowly with inaccurate regression. In addition, publicly available steel defect data sets are small and cover few defect types. As a result, steel surface defect detection suffers from poor model generalization and incomplete feature extraction, leading to low detection accuracy that cannot effectively support real-time industrial inspection.
Disclosure of Invention
To address the low real-time detection accuracy of steel defects in target detection, the invention fuses a Focal module with deformable convolution to effectively improve defect detection accuracy while retaining good real-time performance. The specific scheme is as follows:
The detection algorithm fusing the Focal module and deformable convolution comprises the following steps:
s1, data acquisition;
s2, constructing a newly developed target detection algorithm based on a Focal module and a YOLOv7 model of a deformable convolution network;
s3, optimizing the loss function, training a target detection algorithm model, and storing an optimal model;
s4, predicting by using an optimal model, storing a prediction result, acquiring an evaluation index, and finally comparing the result;
Further, regarding the experimental data in step S1: computer vision has been rapidly deployed across a wide range of industrial applications, so the demand for steel defect detection data collected from production lines keeps growing, tying computer vision ever more closely to steel defect detection. A large-scale benchmark named NEU-DET, covering the relevant computer vision tasks with the important defects annotated, serves this need. NEU-DET was collected by Northeastern University (NEU) and recorded in the NEU surface defect database. It covers six typical surface defects of hot-rolled steel strip: rolled-in scale (RS), patches (Pa), crazing (Cr), pitted surface (PS), inclusion (In) and scratches (Sc). The database contains 1,800 grayscale images: 300 samples for each of the six defect types, each image originally 200 × 200 pixels. Intra-class defects differ greatly in appearance; scratches, for example, may be horizontal, vertical or oblique. Meanwhile, different classes can look similar, e.g. rolled-in scale, crazing and pitted surface. In addition, the gray level of defect images within a class varies with illumination and material changes. In short, the NEU surface defect database poses two difficult challenges: large intra-class appearance differences and inter-class similarity.
Further, step S2 mainly comprises the use of the K-means++ method, the design of the D-SPPCSPC structure and of an enhanced feature extraction layer based on it, and the optimization of the CIoU loss function. The steps are as follows:
s21, the generalization capability of different data sets is improved through K-means++, and the feature extraction capability of the model for different defect sizes is improved;
s22, a D-SPPCSPC module is provided, the extraction capacity of the shallow key features is enhanced from the deep features, more effective information is provided for model prediction defects, and the robustness of the model is improved;
s23, proposing CIoULoss with a Focal module, solving the problem of sample unbalance in a bounding box regression task, and reducing the contribution of a large number of anchor boxes which are less overlapped with a target box to the bounding box regression so as to focus on a high-quality anchor box, thereby improving the regression accuracy and the robustness of the model.
Further, in step S21, K-means++ significantly reduces the error of the clustering result, yielding a better clustering effect and anchor box scales better suited to the training data, which improves detection speed and accuracy. The anchor box sizes suited to different data sets are not the same; the anchor box size is therefore designed flexibly according to the actual situation, which speeds up convergence during training and localizes targets more accurately, better meeting the needs of actual production.
Further, in step S22, the deformable convolution adds offsets to the sampling points so that the convolution kernel can be adjusted dynamically according to the content of the target region, better extracting the features of defects with complex, irregular shapes and capturing richer texture and contour information. Using deformable convolution to fuse features in the later stages of the network helps pass more low-resolution semantic information to the final detection stage, further improving detection accuracy.
Further, the CIoU loss in step S23 is used for the bounding box regression task of the target detection algorithm because, unlike the L1 and L2 losses, it is scale invariant and directly quantifies the overlap between the predicted and ground-truth boxes. A Focal module is added on this basis to reduce the loss contribution of easily classified samples and increase the loss proportion of hard samples, adjusting the balance between positive and negative sample losses. This improves regression accuracy and model robustness, and the loss function markedly improves model performance.
Further, the CIoU loss in step S23 is used for the bounding box regression task. A bounding box loss defined on the L1 norm sums the absolute differences between the coordinates of the predicted box and the corresponding coordinates of the ground-truth box; one defined on the L2 norm sums the squared differences. In contrast, the IoU-based regression loss (IoU_Loss) treats the box formed by the four corner points as a whole and takes the correlation between coordinates into account. A Focal module is added on this basis, giving the Focal-CIoU loss:
Loss_FocalCIoU = IoU^γ · Loss_CIoU
Further, the target detection algorithm is built for model training: the data set is loaded for training, the optimal model of the target detection algorithm is saved, and the trained model is used to predict on the test set.
The invention has the beneficial effects that:
the real-time detector of the YOLO series disclosed by the invention is favored by the industry because of the real-time advantage of the detection algorithm of the steel surface defects through the Focal module and the deformable convolution. In this section we compared to the same scale YOLO series detectors, including yolov3, yolov4, yolov5-l, yolov7-l, performance on target detection tasks on NEU-DET to demonstrate the effectiveness of the proposed method on steel defect detection issues.
In general, we propose a DF-YOLOv7-based defect detection algorithm that can efficiently and accurately detect steel surface defects of complex, irregular shape. K-means++ improves generalization across data sets and the feature extraction capability for defects of different sizes; the D-SPPCSPC module extracts defect contour features and improves detection performance; and the added Focal module strengthens the focus on high-quality samples, alleviating the imbalance between positive and negative samples. Experimental results show that the mAP of the proposed algorithm on the NEU-DET data set reaches 0.771, 3.6% higher than the original model and above the accuracy of some SOTA defect detectors. The DF-YOLOv7 model thus significantly improves defect detection accuracy on the steel surface defect data set while maintaining the processing speed required for steel surface defect detection. A slight drawback is that the proposed scheme appears to contribute differently to models of different scales, and it still has limitations when detecting defects with irregular shapes, unclear boundaries or extreme background interference. In future research, as defect detection technology continues to develop, refining and sharpening defect information will be an important topic. Image processing methods will be widely used for noise reduction and for enhancing boundary contour features; for example, a diffusion model may be used to reduce background interference and enable robust defect detection. Further work will extend the network architecture to a wider range of defect detection fields, to meet the defect detection requirements of more practical scenarios.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the invention is described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of the improved YOLOv7 overall architecture;
FIG. 2 is a structural diagram of the D-SPPCSPC module;
FIG. 3 is a schematic diagram of the deformable convolution principle;
FIG. 4 shows the visual comparison between SOTA YOLO-series detectors and DF-YOLOv7 (ours).
Detailed Description
The invention is described in further detail below with reference to specific embodiments.
To address the low real-time detection accuracy of steel defects in target detection, the invention fuses a Focal module with deformable convolution to effectively improve defect detection accuracy while retaining good real-time performance. The specific scheme is as follows:
The detection algorithm fusing the Focal module and deformable convolution comprises the following steps:
s1, data acquisition;
the experimental data in step S1 has been rapidly deployed into a wide range of industrial applications in actual production. Therefore, the requirements for steel defect detection data collected from these platforms are increasing, which makes computer vision more and more closely related to steel defect detection. It is appreciated that a large scale reference is presented for various important computer vision tasks and relevant important defects are marked, named NEU-DET, to meet vision and defect detection. NEU-DET was collected by university of Northeast (NEU) and recorded In the university of northeast surface defect database, and has a total of six typical surface defects of hot rolled steel strip, namely, roll-In scale (RS), plaque (Pa), cracks (Cr), pitting (PS), inclusions (In) and scratches (Sc). The database includes 1,800 grayscale images: 300 samples of each of six different types of typical surface defects. The original resolution of each image is 200 x 200 pixels. The intra-class defects in the database are greatly different in appearance, for example, the scratches may be horizontal scratches, vertical scratches, oblique scratches, and the like. At the same time, the inter-class defects have similar aspects, such as rolling into scale, cracks and pitting. In addition, the gray scale of the intra-class defect image varies due to the influence of illumination and material variation. In short, the NEU surface defect database includes two difficult challenges, namely that intra-class defects have large differences in appearance, while inter-class defects have similar aspects.
S2, constructing a newly developed target detection algorithm based on a Focal module and a YOLOv7 model of a deformable convolution network;
s21, the generalization capability of different data sets is improved through K-means++, and the feature extraction capability of the model for different defect sizes is improved;
in general, in the process of data annotation, the size of a real frame is greatly different from the default size of an original algorithm, and the size of an anchor frame has influence on the network detection speed and accuracy. For the K-means clustering algorithm, when the selected initial centroids are different, the clustering results are greatly different. This allows the pre-set anchor frame to not play a promoting role, but rather slow down the model regression, causing decision bias or errors. In addition, the size of the anchor frame corresponding to the data set is not consistent with that of the data set. Therefore, the size of the Anchor box is flexibly designed according to actual conditions, so that the convergence speed during training can be increased, and the target can be positioned more accurately, thereby better meeting the requirements of actual production.
To overcome these defects, the K-means++ algorithm replaces the K-means algorithm when the anchor boxes are redesigned. K-means++ significantly reduces the error of the clustering result, yielding a better clustering effect and anchor box scales better suited to the training data, and improving detection speed and accuracy across different data sets.
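As a rough illustration of step S21, the sketch below clusters labelled box sizes into anchor shapes using K-means with K-means++ seeding, assuming boxes are given as (width, height) pairs; the toy data and the cluster count of 3 are invented for the example, not taken from NEU-DET:

```python
import numpy as np

def kmeans_pp_anchors(wh, k, iters=50, seed=0):
    """Cluster (width, height) box sizes into k anchor shapes.

    K-means++ seeding: the first centroid is chosen uniformly at random;
    each later centroid is sampled with probability proportional to the
    squared distance to its nearest already-chosen centroid."""
    rng = np.random.default_rng(seed)
    wh = np.asarray(wh, dtype=float)
    # --- K-means++ initialisation ---
    centroids = [wh[rng.integers(len(wh))]]
    for _ in range(k - 1):
        d2 = np.min(((wh[:, None, :] - np.array(centroids)[None]) ** 2).sum(-1), axis=1)
        centroids.append(wh[rng.choice(len(wh), p=d2 / d2.sum())])
    centroids = np.array(centroids)
    # --- standard Lloyd iterations ---
    for _ in range(iters):
        assign = np.argmin(((wh[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = wh[assign == j].mean(axis=0)
    # sort by area, as YOLO anchors are conventionally listed small to large
    return centroids[np.argsort(centroids.prod(axis=1))]

# toy (w, h) box sizes in pixels, standing in for real annotation statistics
boxes = np.vstack([
    np.random.default_rng(1).normal([20, 30], 4, (50, 2)),
    np.random.default_rng(2).normal([80, 60], 8, (50, 2)),
    np.random.default_rng(3).normal([150, 150], 10, (50, 2)),
])
anchors = kmeans_pp_anchors(boxes, k=3)
print(anchors.round(1))
```

In practice k would match the number of anchors the detector expects (nine for YOLOv7's three scales × three anchors), and `wh` would come from the training labels of the target data set.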
S22, a D-SPPCSPC module is provided, the extraction capacity of the shallow key features is enhanced from the deep features, more effective information is provided for model prediction defects, and the robustness of the model is improved;
the three feature layers output by the Neck part are all related to the feature images output by the SPPCSPC module at the last of the backbones, so that the feature enhancement extraction is carried out on the feature images generated by the module, and the adaptability of the model to defects of different scales can be remarkably improved. Therefore, the robustness and the positioning accuracy of the model can be improved to the greatest extent. A new D-SPPCSPC module is presented herein.
Replacing the conventional 3×3 convolution with deformable convolution further strengthens the extracted features without noticeably slowing model training or increasing model complexity.
Conventional convolution samples only fixed positions in the feature map, but steel surface defects are irregular in shape and inconsistent in size, so such models fit steel defects poorly. Important defect texture and contour information may then not be sufficiently extracted, degrading defect detection and causing defect targets to be missed. The regularly shaped convolution kernel thus limits feature extraction: conventional convolution cannot effectively extract irregular local features and loses key image information, reducing defect detection accuracy.
To solve the above problems, we introduce a Deformable Convolution Network (DCN) to enhance the feature extraction capability of the network. Unlike conventional convolution, deformable convolution adds offsets to the sampling points so that the convolution kernel can be dynamically adjusted according to the content of the target area; the features of defects with complex and irregular shapes are therefore better extracted, and richer texture and contour information is obtained.
In the later stages of the network, deformable convolution is used to fuse features, so that more low-resolution semantic information is transferred to the final detection stage, further improving detection accuracy. As shown in Fig. XX, the sampling positions of conventional convolution on the target are fixed, whereas deformable convolution adaptively learns its sensing region, giving it strong adaptability.
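The sampling rule described above can be illustrated numerically. The sketch below, under the assumption of a single-channel feature map and one output location, shows how a 3×3 deformable convolution adds a learned fractional offset to each sampling tap and reads the feature map with bilinear interpolation (in practice a library operator such as torchvision's DeformConv2d would be used; the function names here are illustrative):

```python
import numpy as np

def bilinear(feat, y, x):
    """Bilinearly sample a single-channel feature map at fractional location (y, x)."""
    H, W = feat.shape
    y, x = np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def deform_conv_at(feat, weights, offsets, py, px):
    """One output value of a 3x3 deformable convolution at (py, px):
    each tap samples at its regular grid position plus a learned fractional offset."""
    grid = [(-1, -1), (-1, 0), (-1, 1),
            (0, -1), (0, 0), (0, 1),
            (1, -1), (1, 0), (1, 1)]
    out = 0.0
    for w, (dy, dx), (oy, ox) in zip(weights, grid, offsets):
        out += w * bilinear(feat, py + dy + oy, px + dx + ox)
    return out
```

With all offsets set to zero this reduces to an ordinary 3×3 convolution; non-zero offsets let the kernel follow an irregular defect contour.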
In addition, D-SPPCSPC adds parallel 5×5, 9×9 and 13×13 MaxPool operations in the stacked convolution layers. These operations perform multi-scale pooling on the feature map according to the size of the input, and the pooled results are spliced together as a new feature map and fed to the subsequent convolution layers. The information in the input feature map is thus fully utilized, and the model adapts to images of different resolutions. In the final output stage, D-SPPCSPC splits the features into two parts: one is processed by conventional convolution, the other by parallel pooling, and the two are finally merged. This halves the amount of calculation, making the model faster while the accuracy becomes higher.
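A minimal sketch of the parallel pooling described above, assuming stride-1 max pooling with "same" padding so that the three pooled maps can be concatenated with the input along the channel axis (a simplification of the full D-SPPCSPC module, which also contains convolution branches):

```python
import numpy as np

def maxpool_same(x, k):
    """Stride-1 max pooling with 'same' padding (pad k//2, -inf fill), per channel."""
    C, H, W = x.shape
    p = k // 2
    xp = np.full((C, H + 2 * p, W + 2 * p), -np.inf)
    xp[:, p:p + H, p:p + W] = x
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[:, i, j] = xp[:, i:i + k, j:j + k].max(axis=(1, 2))
    return out

def spp_concat(x):
    """Concatenate the input with its 5x5, 9x9 and 13x13 max-pooled copies (SPP-style)."""
    return np.concatenate([x] + [maxpool_same(x, k) for k in (5, 9, 13)], axis=0)
```

Because all branches keep the spatial size, the concatenation quadruples the channel count; a following 1×1 convolution would normally compress it back.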
S23, a CIoU Loss with a Focal module is proposed to address sample imbalance in the bounding-box regression task: the contribution of the many anchor boxes that overlap little with the target box is reduced so that regression focuses on high-quality anchor boxes, improving the regression accuracy and robustness of the model. CIoU loss is used for the bounding-box regression task of the target detection algorithm. A bounding-box loss defined on the L1 norm sums the absolute differences between the four corner coordinates of the predicted bounding box and the corresponding abscissas and ordinates of the real bounding box; a loss defined on the L2 norm sums the squared differences. The IoU-based bounding-box regression loss (IoU_Loss) instead treats the box formed by the four points as a whole and takes the correlation between coordinates into account. A Focal module is added on this basis, giving the Focal-CIoU formula:
Loss_FocalCIoU = IoU^γ × Loss_CIoU
CIoU Loss is used for the bounding-box regression task of the target detection algorithm because, unlike L1 Loss and L2 Loss, it is scale-invariant and directly quantifies the overlap between the predicted and ground-truth boxes. The Focal module added on this basis reduces the loss contribution of easily classified samples, increases the loss proportion of hard samples, and adjusts the balance between positive and negative sample losses. The regression accuracy and robustness of the model are thereby improved, and this loss function greatly improves model performance.
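The loss above can be sketched for a single predicted/ground-truth box pair, using the standard CIoU definition (IoU, normalized center distance, aspect-ratio consistency term) and the IoU^γ focal weight; γ = 0.5 is an assumed illustrative value, not one stated in the source:

```python
import numpy as np

def ciou_loss(pred, gt):
    """pred, gt: boxes as (x1, y1, x2, y2). Returns (iou, ciou_loss)."""
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)
    # squared center distance over squared enclosing-box diagonal
    cpx, cpy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cgx, cgy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    c2 = cw ** 2 + ch ** 2 + 1e-9
    # aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / np.pi ** 2) * (np.arctan((gt[2] - gt[0]) / (gt[3] - gt[1]))
                            - np.arctan((pred[2] - pred[0]) / (pred[3] - pred[1]))) ** 2
    alpha = v / (1 - iou + v + 1e-9)
    return iou, 1 - iou + rho2 / c2 + alpha * v

def focal_ciou_loss(pred, gt, gamma=0.5):
    """Focal weighting: low-overlap anchors contribute less, L = IoU^gamma * L_CIoU."""
    iou, loss = ciou_loss(pred, gt)
    return (iou ** gamma) * loss
```

Note how the IoU^γ factor drives the contribution of a zero-overlap anchor to zero, which is exactly the "reduce the contribution of low-quality anchor boxes" behavior described above.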
S24, three prediction heads are further used to predict steel defects accurately, strengthening the extraction of defect features.
S3, optimizing the loss function, training a target detection algorithm model, and storing an optimal model;
to verify whether the improvements to the baseline YOLOv7 network are effective, ablation experiments were performed herein on the NEU-DET dataset. Seven experiments were conducted in total, and the results are shown in the table.
TABLE 1 comparison of experimental effects of different modules on v7 model
As can be seen from Table 1, each improved module raises detection accuracy over the original network, with the AP50 in particular improving significantly.
When the K-means++ algorithm is used to change the anchor-box sizes, the model's AP, AP50 and AP75 improve slightly, reaching 0.400, 0.747 and 0.394 respectively. Differences in local optima caused by selecting different initial centroids are avoided, so anchor-box sizes that better match the actual data set are obtained and targets are located more accurately. As the table shows, under the same conditions the improved algorithm with K-means++ achieves higher accuracy than without it.
After the K-means++ algorithm and the D-SPPCSPC module are combined, the AP50 and AP75 improve significantly over the original model. The deformable convolution breaks the regularly shaped kernel inherent in conventional convolution; by combining the kernel weights with offset values, the model learns geometric transformations better and outputs more associated target regions, improving its ability to adapt to deformation and its defect localization accuracy. The overall effect is, however, somewhat worse than with the K-means++ algorithm alone, because the deformable convolution may introduce unwanted regions that interfere with feature extraction and slightly reduce the model's numerical performance.
With the K-means++ algorithm, the D-SPPCSPC module and the Focal-CIoU Loss combined, the overall AP improves, and the AP50 in particular reaches 0.771, the best value in the whole experiment. The Focal module concentrates on hard-to-separate samples, alleviating the low classification accuracy of under-represented samples and also helping to improve the overall performance of the model.
From the results of Table XX, the DF-YOLOv7 model improves both accuracy and efficiency in defect detection: the overall FLOPs drop to 12.449 G and, compared with the baseline model, the parameter count drops to 32.754 M while detection accuracy rises. These results suggest that the DF-YOLOv7 model can better meet actual production requirements.
TABLE 2 comparison of experimental effects of different YOLO baseline models
YOLO series defect detection algorithms are favored by industry for their high real-time performance and high reliability. We verified the effectiveness of the proposed method for defect detection by comparing detectors of different sizes, including SSD, YOLOv3, YOLOv4, YOLOv5-l and YOLOv7-l, on the NEU-DET dataset. The experimental results show that the DF-YOLOv7 model is optimal for defects such as Crazing, Pitted_surface, Rolled_in_scale and Scratches, with an mAP of 0.771 and an FPS of 51.7, both the best values.
TABLE 3 comparison of experimental effects of SOTA target detection models
To further demonstrate the superiority of DF-YOLOv7 in surface defect detection, we performed a comparative analysis with some of the existing advanced detection methods.
The experimental results in Table 3 show that the mAP of DF-YOLOv7 reaches a maximum of 0.771. For four defect types, Inclusion, Patches, Pitted_surface and Scratches, the Precision of DF-YOLOv7 reaches 0.808, 0.935, 0.889 and 0.965 respectively, the best performance for these defect types. For other surface defects, such as Crazing and Rolled_in_scale, although the accuracy is below the best level of other SOTA defect detectors, the overall performance of the proposed model is still considerable. In terms of speed, FPS (Frames Per Second) was used to verify the real-time capability of the proposed model. DF-YOLOv7 reaches 51.7 FPS; compared with other algorithms its detection speed is at an upper-middle level, which means DF-YOLOv7 also meets the quasi-real-time requirement of steel surface defect detection.
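FPS measurement of the kind reported above can be sketched with a simple timing loop; the inference callable and the warm-up count are placeholders, not details from the source:

```python
import time

def measure_fps(infer, inputs, warmup=5):
    """Average frames per second of `infer` over `inputs`, after a short warm-up."""
    for x in inputs[:warmup]:          # warm-up pass (caches, JIT, GPU clocks)
        infer(x)
    t0 = time.perf_counter()
    for x in inputs:                   # timed pass over the full input list
        infer(x)
    return len(inputs) / (time.perf_counter() - t0)
```

For a GPU model, device synchronization would additionally be needed before reading the timer so that queued kernels are counted.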
S4, predicting by using an optimal model, storing a prediction result, acquiring an evaluation index, and finally comparing the result;
in Figure 4 we show one visualized test result for each of the detectors described above.

Claims (7)

1. A steel surface defect detection method based on YOLOv7 with a fused Focal module and deformable convolution network, comprising the following steps:
s1, data acquisition;
s2, constructing a newly developed target detection algorithm based on a YOLOv7 model;
s3, optimizing the loss function, training a target detection algorithm model, and storing an optimal model;
s4, predicting by using the optimal model, storing a prediction result, acquiring an evaluation index, and finally comparing the result.
2. The YOLOv7-based fused Focal module and deformable convolution network steel surface defect detection algorithm of claim 1, wherein: the experimental data in step S1 come from platforms that have been rapidly deployed in actual production across a wide range of industrial applications. The demand for steel defect detection data collected from these platforms is therefore growing, bringing computer vision ever closer to steel defect detection. A large-scale benchmark, named NEU-DET, is used for various important computer vision tasks, with the relevant important defects annotated for vision-based defect detection. NEU-DET was collected by Northeastern University (NEU) and recorded in the NEU surface defect database; it covers six typical surface defects of hot-rolled steel strip, namely rolled-in scale (RS), patches (Pa), crazing (Cr), pitted surface (PS), inclusion (In) and scratches (Sc). The database contains 1,800 grayscale images: 300 samples for each of the six defect types, each image with an original resolution of 200×200 pixels. Intra-class defects differ greatly in appearance; scratches, for example, may be horizontal, vertical or oblique. At the same time, inter-class defects have similar aspects, such as rolled-in scale, crazing and pitted surface. In addition, the gray level of intra-class defect images varies with illumination and material changes. In short, the NEU surface defect database poses two difficult challenges: intra-class defects differ greatly in appearance, while inter-class defects share similar aspects.
3. The YOLOv7-based fused Focal module and deformable convolution network steel surface defect detection algorithm of claim 1, wherein: step S2 mainly comprises the use of the K-means++ method, the design of the D-SPPCSPC structure, the design of an enhanced feature extraction layer based on the D-SPPCSPC architecture, and the optimization of the CIoU loss function. A simple operation mode is adopted throughout: multiple 3×3 convolution kernels, which are more efficient for GPU computation, are used, and excessive branch bottleneck structures are avoided as far as possible; specifically:
S21, K-means++ improves generalization across different data sets and improves the model's ability to extract features of defects of different sizes;
S22, a D-SPPCSPC module is proposed, which enhances the extraction of shallow key features from the deep features, provides more effective information for the model to predict defects, and improves the robustness of the model;
S23, a CIoU Loss with a Focal module is proposed to address sample imbalance in the bounding-box regression task: the contribution of the many anchor boxes that overlap little with the target box is reduced so that regression focuses on high-quality anchor boxes, improving the regression accuracy and robustness of the model.
4. The YOLOv7-based fused Focal module and deformable convolution network steel surface defect detection algorithm of claim 1, wherein: in step S21, K-means++ markedly reduces the error of the clustering result, yielding a better clustering effect and anchor-box dimensions better suited to the training data, which improves detection speed and accuracy; because the method adapts to different data sets, the resulting anchor-box sizes differ across them. The size of the Anchor box is therefore designed flexibly according to actual conditions, which accelerates convergence during training and localizes targets more accurately, thereby better meeting the requirements of actual production.
5. The YOLOv7-based fused Focal module and deformable convolution network steel surface defect detection algorithm of claim 3, wherein: in step S22, the deformable convolution adds offsets to the sampling points so that the convolution kernel can be dynamically adjusted according to the content of the target area, better extracting the features of defects with complex and irregular shapes and obtaining richer texture and contour information; deformable convolution is used to fuse features in the later stages of the network, which helps transfer more low-resolution semantic information to the final detection stage and further improves detection accuracy.
6. The YOLOv7-based fused Focal module and deformable convolution network steel surface defect detection algorithm of claim 3, characterized in that: the CIoU Loss in step S23 is used for the bounding-box regression task of the target detection algorithm because, unlike L1 Loss and L2 Loss, it is scale-invariant and directly quantifies the overlap between the predicted and ground-truth boxes. The Focal module added on this basis reduces the loss contribution of easily classified samples, increases the loss proportion of hard samples, and adjusts the balance between positive and negative sample losses. The regression accuracy and robustness of the model are thereby improved, and this loss function greatly improves model performance.
7. The YOLOv7-based fused Focal module and deformable convolution network steel surface defect detection algorithm of claim 6, wherein: the CIoU loss in step S23 is used for the bounding-box regression task of the target detection algorithm. A bounding-box loss defined on the L1 norm sums the absolute differences between the four corner coordinates of the predicted bounding box and the corresponding abscissas and ordinates of the real bounding box; a loss defined on the L2 norm sums the squared differences. The IoU-based bounding-box regression loss (IoU_Loss) instead treats the box formed by the four points as a whole and takes the correlation between coordinates into account. A Focal module is added on this basis, and the formula of the Focal-CIoU is as follows:
Loss_FocalCIoU = IoU^γ × Loss_CIoU
CN202310673020.4A 2023-06-07 2023-06-07 Steel surface defect detection algorithm based on Focal module and deformable convolution Pending CN116934685A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310673020.4A CN116934685A (en) 2023-06-07 2023-06-07 Steel surface defect detection algorithm based on Focal module and deformable convolution


Publications (1)

Publication Number Publication Date
CN116934685A true CN116934685A (en) 2023-10-24

Family

ID=88385361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310673020.4A Pending CN116934685A (en) 2023-06-07 2023-06-07 Steel surface defect detection algorithm based on Focal module and deformable convolution

Country Status (1)

Country Link
CN (1) CN116934685A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118072148A (en) * 2024-04-25 2024-05-24 深圳市威远精密技术有限公司 Precise ball screw pair detection system and method thereof
CN118072148B (en) * 2024-04-25 2024-06-25 深圳市威远精密技术有限公司 Precise ball screw pair detection system and method thereof

Similar Documents

Publication Publication Date Title
CN111292305B (en) Improved YOLO-V3 metal processing surface defect detection method
CN111223088B (en) Casting surface defect identification method based on deep convolutional neural network
CN111862064B (en) Silver wire surface flaw identification method based on deep learning
WO2022236876A1 (en) Cellophane defect recognition method, system and apparatus, and storage medium
CN111833306B (en) Defect detection method and model training method for defect detection
CN105067638B (en) Tire fetal membrane face character defect inspection method based on machine vision
CN115239704B (en) Accurate detection and repair method for wood surface defects
KR20230124713A (en) Fault detection methods, devices and systems
Wan et al. Ceramic tile surface defect detection based on deep learning
CN114581782B (en) Fine defect detection method based on coarse-to-fine detection strategy
CN110596120A (en) Glass boundary defect detection method, device, terminal and storage medium
CN114706358B (en) Method and system for processing welding process information of straight welded pipe
CN113724231A (en) Industrial defect detection method based on semantic segmentation and target detection fusion model
CN113177924A (en) Industrial production line product flaw detection method
CN112862744B (en) Intelligent detection method for internal defects of capacitor based on ultrasonic image
CN113221881B (en) Multi-level smart phone screen defect detection method
TW202127371A (en) Image-based defect detection method and computer readable medium thereof
CN116934685A (en) Steel surface defect detection algorithm based on Focal module and deformable convolution
CN114445707A (en) Intelligent visual fine detection method for defects of bottled water labels
CN115294033A (en) Tire belt layer difference level and misalignment defect detection method based on semantic segmentation network
CN113962954A (en) Surface defect detection method based on SE-R-YOLOV4 automobile steel part
CN116883393B (en) Metal surface defect detection method based on anchor frame-free target detection algorithm
CN107239761B (en) Fruit tree branch pulling effect evaluation method based on skeleton angular point detection
Li et al. Detection of small size defects in belt layer of radial tire based on improved faster r-cnn
CN115953387A (en) Radiographic image weld defect detection method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination