CN114359245A

CN114359245A - Method for detecting surface defects of products in industrial scene

Info

Publication number: CN114359245A
Application number: CN202210021615.7A
Authority: CN
Inventors: 王星; 庄开宇; 杨根科
Original assignee: Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Current assignee: Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Priority date: 2022-01-10
Filing date: 2022-01-10
Publication date: 2022-04-15

Abstract

The invention discloses a method for detecting surface defects of products in an industrial scene, which relates to the technical field of product surface defect detection and machine vision in industrial scenes, and comprises the following steps: step 1, image acquisition, namely obtaining surface images of products on an industrial production line; step 2, image annotation, namely annotating the surface images that contain defects to obtain a surface defect data set; step 3, data enhancement, namely enhancing the surface defect data set by one or a combination of several data enhancement modes, wherein the data enhancement modes comprise random cropping, random horizontal flipping, random vertical flipping, scale jittering, color jittering, Mosaic or Mixup; step 4, constructing a surface defect detection model; step 5, training the surface defect detection model; and step 6, performing prediction with the surface defect detection model.

Description

Method for detecting surface defects of products in industrial scene

Technical Field

The invention relates to the technical field of product surface defect detection and machine vision in an industrial scene, in particular to a product surface defect detection method in an industrial scene.

Background

In an industrial scene, efficient and stable quality inspection is an important link in the product manufacturing process. The speed and accuracy of quality inspection directly affect production line capacity and the final quality of the product. A defect-free surface is an important basis for judging whether a product meets industrial quality requirements. In industrial production and manufacturing, defect detection on the product surface is mostly completed through manual quality inspection, but manual inspection has high labor cost, high false detection and missed detection rates, and cannot meet the requirement of real-time detection. In order to adapt to the current trend toward informatization and intelligence in manufacturing, the constraints that traditional manual labor places on productivity and efficiency need to be eliminated from the production process.

Product surface defect detection methods based on traditional machine vision select and extract defect features by means of image processing. First, the acquired surface image of the product is affected by the variable and uncertain industrial production environment and by the complex background texture of the product. In addition, manually designed features cannot cover all defect features, are complex to extract, and generalize poorly. These disadvantages limit the application of traditional machine-vision-based methods for detecting surface defects of products.

In recent years, machine vision based on deep learning has been a research focus, and it also provides a new solution for detecting surface defects of products. Convolutional Neural Networks (CNN) are a type of feed-forward neural network that includes convolution calculations and has a deep structure, and are a representative algorithm of deep learning. The convolutional neural network has advantages such as local connection and weight sharing, and by modeling the pixels in a local neighborhood of the image through the convolution operation, it fully considers the relationship between a pixel and its surrounding pixels. The convolutional neural network gradually extracts complex high-level feature information from the image through multiple convolutional layers, and has strong representation learning capacity. In the field of product surface defect detection in industrial scenes, the most widely used models are the YOLO (You Only Look Once) series of target detection networks, which use a convolutional neural network to extract image features. A YOLO series detection model can identify the defect type in the image and output the bounding box coordinates of the defect, thereby realizing defect localization.

However, the existing method for detecting the surface defects of the products in the industrial scene still has the following problems:

1. At present, most product surface defect detection methods realized through a convolutional neural network are limited by the receptive field of each layer's feature map in the network, and the detection model cannot model pixels outside the receptive field in an image. Due to the inherently local modeling characteristic of the convolution operation, defect detection methods based on convolutional neural networks have difficulty capturing long-distance dependencies in the data, so the defect detection precision is low;

2. At present, most detection methods adopt a PANet network to fuse the multi-scale features extracted by the backbone network. Although the PANet network fuses input features of different scales, it simply sums them without distinction. Input features of different scales have different resolutions and generally contribute unequally to the fused output features. The traditional method therefore detects tiny defects poorly;

3. The detection head of most current detection methods is an anchor-based model, which requires densely placing anchor boxes of predefined sizes and aspect ratios on the feature map. In order to obtain good detection performance, it is necessary to perform clustering analysis and determine the best anchor boxes before model training. Because the surface defects of products on an industrial production line are random, varied and unknown, predefined anchor boxes lead to poor generalization capability of the detection method;

4. At present, when most detection methods assign positive and negative training samples to the ground truth and the background in an image, a fixed label assignment strategy is usually adopted, but applying a fixed assignment strategy to labels of various sizes, shapes and types can lead to suboptimal assignment results.

Therefore, those skilled in the art are devoted to developing a method for detecting surface defects of products in an industrial scene, so as to solve the above problems in the prior art.

Disclosure of Invention

In view of the above defects in the prior art, the technical problem to be solved by the present invention is that existing methods for detecting product surface defects in industrial scenes are not precise enough, in particular that their detection of tiny defects, their generalization and their robustness are not strong enough, and how to achieve a better balance between the speed and the precision of product surface defect detection.

In order to solve the technical problem, the invention provides a method for detecting surface defects of products in an industrial scene. First, a Swin Transformer is used as the backbone network to form a hierarchical feature map representation and extract multi-scale features of the product surface image. Then, a BiFPN network performs weighted multi-scale fusion on the input features extracted by the backbone network. Finally, an FCOS network is used as the detection head to output predicted defect detection results at multiple scales. During model training, the OTA dynamic label assignment strategy is used to assign training samples, so that the model converges quickly and effectively and the final detection precision is improved.

The invention provides a method for detecting surface defects of products in an industrial scene, which comprises the following steps:

step 1, acquiring images to obtain surface images of products on an industrial production line;

step 2, image marking, namely marking the surface image with defects to obtain a surface defect data set;

step 3, enhancing data, namely enhancing the data of the surface defect data set by one or a combination of a plurality of data enhancement modes, wherein the data enhancement modes comprise random cropping, random horizontal flipping, random vertical flipping, scale jittering, color jittering, Mosaic or Mixup;

step 4, constructing a surface defect detection model;

step 5, training a surface defect detection model;

step 6, performing prediction with the surface defect detection model;

wherein,

in the step 4, the method comprises the following steps:

step 4.1, establishing a backbone network for extracting multi-scale features through a Swin Transformer;

step 4.2, performing multi-scale fusion on the multi-scale features extracted by the backbone network through a BiFPN network, and enhancing the multi-scale features with different resolutions;

step 4.3, based on the multi-scale features after fusion enhancement, using an FCOS network, which is an anchor-free model, as a detection head to generate the surface defect detection model.

Further, in the step 4.1, image block division is implemented by a 7 × 7 convolution with a stride of 4, and downsampling of the feature map between different stages is implemented by a 3 × 3 convolution with a stride of 2; self-attention is calculated in non-overlapping local windows in each Swin Transformer block; assuming that each of the local windows contains M × M image blocks and the entire surface image contains h × w image blocks, the computational complexity of the global multi-head self-attention and of the window-based multi-head self-attention is:

Ω(MSA) = 4hwC² + 2(hw)²C;

Ω(W-MSA) = 4hwC² + 2M²hwC;

wherein MSA is global multi-head self-attention, Ω(MSA) is the complexity of global multi-head self-attention, W-MSA is window-based multi-head self-attention, Ω(W-MSA) is the complexity of window-based multi-head self-attention, and C is the number of channels; h is the number of image blocks along the image height H; w is the number of image blocks along the image width W.

Further, in the step 4.1, cross-window connections are allowed in order to improve efficiency; the window partition is shifted between consecutive Swin Transformer blocks, which adopt the W-MSA mechanism and the SW-MSA mechanism respectively, and the specific calculation is as follows:

wherein W-MSA is window-based multi-head self-attention, LN is layer normalization, MLP is a multi-layer perceptron, and SW-MSA is shifted-window-based multi-head self-attention.
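The difference between the regular and the shifted window partition can be illustrated with a small sketch. It is only an illustration under stated assumptions: the helper names `window_partition` and `cyclic_shift` are hypothetical, the grid holds image-block indices rather than features, and the shift of M // 2 blocks follows the published Swin Transformer design rather than anything stated in this text.

```python
def window_partition(grid, m):
    """Split an h x w grid (list of rows) into non-overlapping m x m windows."""
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "grid must be divisible by the window size"
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([grid[r][left:left + m] for r in range(top, top + m)])
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` blocks, wrapping around."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)]
            for r in range(h)]


# 4 x 4 grid of image-block indices, window size M = 2.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
regular = window_partition(grid, 2)                   # windows used by W-MSA
shifted = window_partition(cyclic_shift(grid, 1), 2)  # windows used by SW-MSA (shift = M // 2, assumed)

print(regular[0])  # blocks grouped by the regular partition
print(shifted[0])  # blocks drawn from four different regular windows
```

The first shifted window gathers blocks that belonged to four different regular windows, which is how information crosses window boundaries while attention is still computed inside fixed-size windows. Blocks that wrap around the image border would additionally need an attention mask, which this sketch omits.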

Further, in the step 4.2, the following steps are included:

step 4.2.1, deleting nodes with only one input edge in the backbone network by the BiFPN network;

step 4.2.2, if the original input node and the output node are in the same layer, adding an additional edge between the original input node and the output node;

step 4.2.3, the BiFPN network treats each bidirectional (top-down and bottom-up) path as one feature network layer, repeats the same layer multiple times to realize higher-level feature fusion, and weights the features by fast normalized fusion so that each normalized weight lies between 0 and 1.

Further, in the step 4.3, the FCOS network, being an anchor-free model, takes each position (x, y) as a training sample; the FCOS network shares parameters among different feature layers, and has three branches: a Classification branch, a Regression branch and a Center-ness branch; the Classification branch predicts the probability that position (x, y) of the current-layer feature map belongs to each of the C defect classes; the Regression branch predicts the defect bounding box coordinates corresponding to position (x, y) of the current-layer feature map; the Center-ness branch predicts the center-ness of position (x, y) of the current-layer feature map; the class confidence predicted by the Classification branch is multiplied by the center-ness predicted by the Center-ness branch, and the product is used as the confidence score for final post-processing.

Further, the size of the surface image is related to the product type, and the surface image is an RGB three-channel color image.

Further, in the step 5, the following steps are included:

step 5.1, dividing the surface defect data set subjected to data enhancement into a training set and a testing set according to a certain proportion;

step 5.2, inputting the training set subjected to data enhancement into the surface defect detection model for training to generate the trained surface defect detection model, wherein the division of positive and negative samples during training is based on an optimal transport assignment strategy.
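As a minimal sketch of the split in step 5.1, assuming a 4:1 ratio (the ratio named in the further refinement of this step) and a simple seeded shuffle. The function name `split_dataset` and the file names are hypothetical.

```python
import random


def split_dataset(samples, train_parts=4, test_parts=1, seed=0):
    """Shuffle the samples and split them train_parts : test_parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = len(items) * train_parts // (train_parts + test_parts)
    return items[:n_train], items[n_train:]


# Hypothetical image file names standing in for the annotated data set.
images = [f"img_{i:03d}.jpg" for i in range(10)]
train_set, test_set = split_dataset(images)
print(len(train_set), len(test_set))
```

Splitting before augmentation, as the refinement of step 5 describes, keeps augmented copies of a test image out of the training set.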

Further, in the step 5.1, the surface defect data set obtained in the step 2 is divided into the training set and the test set according to a ratio of 4: 1; in the step 5.2, the divided training set is enhanced by using the data enhancement strategy in the step 3, and is input into the surface defect detection model constructed in the step 4 for training; the total training loss of the surface defect detection model is a weighted sum of classification loss and regression loss:

Loss = L_cls + λL_reg

wherein L_cls is the classification loss, for which focal loss is selected as the loss between the predicted defect class and the ground-truth class; L_reg is the regression loss, for which GIoU loss is selected as the loss between the predicted defect bounding box coordinates and the ground-truth bounding box coordinates; λ is a weight coefficient, and the default value is 0.5;
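The regression term and the weighted sum can be sketched as follows. This is an illustrative single-box version under stated assumptions: boxes are (x1, y1, x2, y2) tuples with positive area, the GIoU loss is taken as 1 − GIoU as in the published GIoU formulation, and the function names are hypothetical.

```python
def giou_loss(pred, target):
    """GIoU loss (1 - GIoU) between two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    # Intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = area_p + area_t - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both boxes.
    enclose = (max(px2, tx2) - min(px1, tx1)) * (max(py2, ty2) - min(py1, ty1))
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou


def total_loss(l_cls, l_reg, lam=0.5):
    """Loss = L_cls + lambda * L_reg, with lambda = 0.5 by default."""
    return l_cls + lam * l_reg


print(giou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(giou_loss((0, 0, 1, 1), (2, 0, 3, 1)))  # disjoint boxes still give a useful signal
```

Unlike a plain IoU loss, the enclosing-box term keeps the loss informative when the predicted and ground-truth boxes do not overlap at all.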

the optimal transport assignment strategy defines the unit transport cost between each anchor point and each ground truth or the background as the sum of the classification loss and the regression loss, and converts the search for the optimal label assignment scheme into solving for the optimal transport plan; after the unit transport cost is defined, the optimal transport plan can be solved quickly and effectively through Sinkhorn-Knopp iteration.
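A toy version of the Sinkhorn-Knopp iteration gives a feel for how the transport plan turns into label assignments. Everything concrete here is an assumption made for illustration: the cost values are invented, the regularization strength `eps`, the iteration count and the fixed supply of k = 2 positive labels per ground truth are hypothetical choices, and the function name `sinkhorn` is not from the source.

```python
import math


def sinkhorn(cost, supply, demand, eps=0.1, iters=50):
    """Entropy-regularized transport plan via Sinkhorn-Knopp scaling."""
    m, n = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(m)]
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        # Rescale rows to match the supply, then columns to match the demand.
        u = [supply[i] / sum(kernel[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [demand[j] / sum(kernel[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n)] for i in range(m)]


# Rows: one ground-truth defect, then the background. Columns: four anchor points.
cost = [[0.2, 0.3, 2.0, 2.5],   # invented costs of assigning each anchor to the defect
        [1.0, 1.0, 1.0, 1.0]]   # invented costs of assigning each anchor to background
supply = [2.0, 2.0]             # the defect offers k = 2 positive labels, background the rest
demand = [1.0, 1.0, 1.0, 1.0]   # every anchor point receives exactly one label

plan = sinkhorn(cost, supply, demand)
# Each anchor takes the label of the supplier that ships it the most mass.
assignment = [max(range(len(plan)), key=lambda i: plan[i][j])
              for j in range(len(demand))]
print(assignment)
```

In this toy case the two cheapest anchors become positives for the defect and the other two are assigned to the background, so the assignment is decided jointly over all anchors rather than by a fixed per-anchor rule.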

Further, the hyper-parameters for training the surface defect detection model are set as follows: multi-scale training is adopted, in which the input surface image is resized so that its short edge is between 480 and 800 pixels and its long edge does not exceed 1333 pixels; an SGD optimizer with a momentum of 0.9 and a weight decay of 0.005 is adopted; the surface defect detection model is trained for 100 epochs in total, and the initial learning rate is 0.0001; the learning rate is reduced to 1/10 of its value at the 67th and the 89th epochs; 8 GPUs are used for training, each GPU is assigned two images, and the total batch size is 16.
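The learning-rate schedule and the multi-scale resizing rule can be sketched in a few lines. This is a sketch under assumptions: epochs are counted from 1, each drop takes effect at the start of the named epoch, the short-edge target is drawn uniformly, and the function names `learning_rate` and `multiscale_size` are hypothetical.

```python
import random


def learning_rate(epoch, base_lr=1e-4, milestones=(67, 89), gamma=0.1):
    """Step schedule: multiply the rate by gamma at each milestone epoch."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


def multiscale_size(width, height, short_range=(480, 800), max_long=1333, rng=random):
    """Pick a random short-edge target, capped so the long edge stays <= max_long."""
    short_target = rng.randint(*short_range)
    short, long_ = min(width, height), max(width, height)
    scale = min(short_target / short, max_long / long_)
    return round(width * scale), round(height * scale)


for epoch in (1, 67, 89):
    print(epoch, learning_rate(epoch))
print(multiscale_size(1920, 1080, rng=random.Random(0)))
```

The same scale factor must also be applied to the annotated bounding boxes of each resized image.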

Further, in the step 6, the following steps are included:

inputting the surface images in the divided test set into the surface defect detection model trained in the step 5, extracting the multi-scale features by the backbone network, then performing multi-scale fusion on the multi-scale features extracted by the backbone network through the BiFPN network, and then using the FCOS network as a detection head to output, for each multi-scale feature map of the BiFPN network, a predicted defect probability and a predicted defect bounding box; for the defect prediction results output by the FCOS network, a confidence threshold of 0.05 is first used to filter out low-confidence results, then Soft-NMS with a threshold of 0.6 is used to post-process the prediction results of all layers to generate the final defect prediction result, and the type and the position of each defect are displayed in the surface image.
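The post-processing chain can be sketched as below. The source names only the two thresholds, so the rest is assumed for illustration: a single defect class, the linear-decay variant of Soft-NMS, 0.6 read as the IoU threshold above which scores are decayed, and the function names `iou` and `soft_nms`.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(dets, iou_thr=0.6, score_thr=0.05):
    """dets: list of (box, score). Returns kept detections, best score first."""
    pool = [(box, s) for box, s in dets if s >= score_thr]  # confidence filter
    kept = []
    while pool:
        best = max(pool, key=lambda d: d[1])
        pool.remove(best)
        kept.append(best)
        rescored = []
        for box, s in pool:
            overlap = iou(best[0], box)
            if overlap > iou_thr:
                s *= 1.0 - overlap  # linear decay (assumed variant)
            if s >= score_thr:
                rescored.append((box, s))
        pool = rescored
    return kept


# Invented detections: two heavily overlapping boxes, one separate, one low-score.
detections = [((0, 0, 10, 10), 0.9), ((0, 0, 10, 9), 0.8),
              ((20, 20, 30, 30), 0.7), ((0, 0, 5, 5), 0.03)]
for box, score in soft_nms(detections):
    print(box, round(score, 2))
```

Instead of discarding the overlapping box outright, Soft-NMS lowers its score, which helps when two real defects lie close together.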

The method for detecting surface defects of products in an industrial scene based on Swin Transformer and FCOS provided by the embodiment of the invention has at least the following technical effects:

1. In the embodiment of the invention, a Swin Transformer is used as the backbone network for extracting image features. Image block division is first achieved by a 7 × 7 convolution with a stride of 4, and downsampling of the feature maps between different stages is then achieved by a 3 × 3 convolution with a stride of 2. To reduce computational complexity, multi-head self-attention is computed in non-overlapping shifted windows while cross-window connections are allowed. Effective long-distance dependency modeling in images is realized by adopting the W-MSA and SW-MSA mechanisms in two consecutive Swin Transformer blocks respectively.

2. In the embodiment of the invention, the BiFPN network is adopted for cross-scale connection, and the multi-scale input features extracted by the Swin Transformer backbone network are fused. Unlike the traditional PANet network, which simply sums the multi-scale features, the BiFPN network integrates weighted bidirectional cross-scale connections and fast normalized fusion, and enhances the representation capability of input features with different resolutions, thereby improving the precision of tiny defect detection.

3. In the embodiment of the invention, an FCOS network, which is an anchor-free model, is used as the detection head. Previous anchor-based models take a position (x, y) on the input image as the center of multiple anchor boxes and use these anchor boxes as references for regressing the target bounding boxes. The detection performance of an anchor-based model is very sensitive to the hyper-parameter design of the anchor boxes, and predefined anchor boxes also hinder the generalization performance of the detection method. The FCOS network instead takes each position (x, y) as a training sample, so that the amount of hyper-parameter design is reduced. By introducing the center-ness branch, the number of ambiguous samples is reduced, and the generalization and the robustness of the detection method are improved.

4. The embodiment of the invention determines which ground truth or background each anchor point should be assigned to by expressing label assignment in the detection method as an optimal transport problem and considering context information from a global perspective. This assignment strategy is called Optimal Transport Assignment (OTA), and realizes a globally optimal one-to-many assignment. This provides more high-quality supervision signals to the training process of the model, allowing it to converge quickly to the best result.

5. Through the improvement of each part, the embodiment of the invention has stronger robustness and generalization capability and improves the precision of the detection of the tiny defects. Compared with the detection method in the prior art, the technical scheme provided by the embodiment of the invention can accurately detect the defects with different sizes, variable sizes and complex types in the industrial production line, and realizes the balance between the detection speed and the detection precision of the surface defects of the products.

The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.

Drawings

FIG. 1 is a schematic flow chart of a preferred embodiment of the present invention;

FIG. 2 is a schematic block diagram of the embodiment shown in FIG. 1;

FIG. 3 is a schematic diagram of the BiFPN network in the embodiment shown in FIG. 1;

FIG. 4 is a schematic architectural diagram of the FCOS network in the embodiment shown in FIG. 1.

Detailed Description

The technical contents of the preferred embodiments of the present invention will be more clearly and easily understood by referring to the drawings attached to the specification. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.

Existing methods for detecting product surface defects in industrial scenes are still not precise enough; in particular, their detection of tiny defects, their generalization and their robustness are not strong enough, and a better balance between the speed and the precision of product surface defect detection remains to be achieved.

In the embodiment of the method for detecting the surface defects of the product in the industrial scene, the problems in the prior art are improved from four different aspects, specifically:

1. A Swin Transformer is adopted as the backbone network for extracting image features, and a hierarchical feature map is formed, which facilitates the subsequent multi-scale feature fusion. In order to overcome the inherently local modeling characteristic of convolutional neural networks for extracting image features, the Transformer structure common in natural language processing is introduced to capture long-distance dependencies in an image and perform global modeling.

2. Cross-scale connection is performed through a BiFPN network, and the multi-scale input features extracted by the backbone network are fused. Unlike the traditional PANet network, which simply sums the multi-scale features, the BiFPN network integrates weighted bidirectional cross-scale connections and fast normalized fusion, and enhances the representation capability of input features with different resolutions, thereby improving the precision of tiny defect detection.

3. An FCOS network, which is an anchor-free model, is used as the detection head. The complicated hyper-parameter design of the traditional anchor-based model, namely the size, the aspect ratio and the number of anchor boxes predefined on a feature map, is avoided, and the number of model parameters is reduced. Meanwhile, because a center-ness branch is introduced, the number of ambiguous samples is reduced, and the generalization and the robustness of the detection method are greatly improved.

4. The label assignment step in the detection method is expressed as a special linear program, namely an optimal transport problem, so that searching for the optimal label assignment scheme is converted into solving for the optimal transport plan, which is then solved by Sinkhorn-Knopp iteration. This assignment strategy is called Optimal Transport Assignment (OTA). Compared with traditional label assignment strategies, optimal transport assignment takes global information into consideration in the assignment process and performs one-to-many assignment dynamically, so that the model obtains more high-quality supervision signals and converges rapidly to the optimal result.

The method for detecting the surface defects of the product in the industrial scene comprises the following steps as shown in figure 1:

step 1, acquiring images, namely acquiring surface images of products on an industrial production line shot by an industrial camera;

step 2, image annotation, namely annotating the defects in the collected product surface images by using an image annotation tool. The annotated content is the type of each defect and its position in the image, and the obtained annotation files together with the original images form the surface defect data set required by the model;

step 3, data enhancement, namely enhancing the obtained data set, wherein the data enhancement mainly comprises random cropping, random horizontal flipping, random vertical flipping, scale jittering, color jittering, Mosaic or Mixup.

Step 4, constructing the surface defect detection model. First, a backbone network for extracting multi-scale features is established with a Swin Transformer (Shifted-Window Transformer). Then, multi-scale fusion is performed on the features extracted by the backbone network through a BiFPN (Bidirectional Feature Pyramid Network) network, enhancing features of different resolutions. Finally, based on the multi-scale features after fusion enhancement, an FCOS (Fully Convolutional One-Stage) network, which is an anchor-free model, is used as the detection head.

Step 5, training the surface defect detection model. The surface defect data set is divided into a training set and a test set according to a certain proportion, the training set subjected to data enhancement is input into the surface defect detection model for training, and positive and negative samples are divided during training according to the Optimal Transport Assignment (OTA) strategy.

Step 6, performing prediction with the surface defect detection model. Inference is carried out with the trained surface defect detection model: the product surface images in the test set are input, and the detected defect types and positions are output and displayed.

Specifically, in step 1, images are acquired to obtain surface images of products on an industrial production line photographed by an industrial camera. First, the illumination conditions are arranged so that the features of the surface defects of the products on the industrial production line are fully highlighted, and then a CCD industrial camera is used to capture high-quality surface images of the products. The image size is related to the product type, and the image is an RGB three-channel color image.

In step 2, image annotation, the acquired surface images of the product are annotated for defects by using an image annotation tool. Specifically, the open-source annotation tool labelImg is used to manually annotate the defects in the acquired product surface images, producing corresponding xml annotation files. Each surface defect image corresponds to one xml annotation file, which contains the defect class information and the defect position information. The annotation format of the xml file is the same as that of the PASCAL VOC data set, i.e. it stores the (x, y) coordinates of the upper-left and lower-right corners of the smallest rectangular bounding box enclosing the defect, together with the corresponding defect class. The surface defect images and their corresponding annotation files form the surface defect data set required by the model.
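A PASCAL VOC style annotation of this kind can be read with the Python standard library, as in the following sketch. The file name, the class name "scratch" and the coordinates are invented for illustration, and `parse_voc` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_voc(xml_text):
    """Return a list of (defect_class, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    defects = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        coords = tuple(int(float(box.findtext(tag)))
                       for tag in ("xmin", "ymin", "xmax", "ymax"))
        defects.append((obj.findtext("name"), coords))
    return defects


# Invented example of the annotation structure written by labelImg.
sample = """<annotation>
  <filename>part_0001.jpg</filename>
  <object>
    <name>scratch</name>
    <bndbox><xmin>48</xmin><ymin>120</ymin><xmax>95</xmax><ymax>141</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc(sample))
```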

In step 3, the data is enhanced. The surface defect data set is the basis of defect detection; considering that defect data is scarce in industrial practice, data enhancement is applied to the obtained data set. The data enhancement techniques mainly include random cropping, random horizontal flipping, random vertical flipping, scale jittering, color jittering, Mosaic, or Mixup. During data enhancement, if the surface defect image is shifted or transformed, the size and coordinates of the annotated bounding boxes are changed correspondingly through formula calculation, ensuring that the defects in the image still match their annotated bounding boxes after the image is transformed.

Specifically, random cropping: the image is randomly cropped while ensuring that no defect target is cut off, and the position of each original annotated bounding box in the cropped image is calculated; random horizontal flipping: the image and the annotated bounding boxes are flipped horizontally with a probability of 0.5; random vertical flipping: the image and the annotated bounding boxes are flipped vertically with a probability of 0.5; scale jittering: before cropping, the image is randomly resized to 0.5 to 1.5 times the original size, and the annotated bounding boxes are adjusted accordingly; color jittering: the image is converted from RGB space to HSV space, its value (brightness), saturation and hue are randomly changed to form pictures under different illumination and colors, and the converted image is transferred back to RGB space; Mosaic: four images, each with its corresponding annotated bounding boxes, are stitched together, yielding a new image together with the annotated bounding boxes corresponding to it; Mixup: two images are fused by weighting in a certain proportion, that is, corresponding pixel values are added in a certain proportion.
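The bounding-box arithmetic behind several of these transforms can be sketched as follows. The sketch assumes boxes are (x1, y1, x2, y2) in continuous coordinates on an image spanning [0, W] × [0, H], and that images are nested lists of pixel values; all function names are hypothetical.

```python
import random


def hflip_box(box, img_w):
    """Mirror a box left-right inside an image of width img_w."""
    x1, y1, x2, y2 = box
    return (img_w - x2, y1, img_w - x1, y2)


def vflip_box(box, img_h):
    """Mirror a box top-bottom inside an image of height img_h."""
    x1, y1, x2, y2 = box
    return (x1, img_h - y2, x2, img_h - y1)


def scale_box(box, factor):
    """Scale jittering: resizing the image by `factor` scales every coordinate."""
    return tuple(v * factor for v in box)


def random_flip_box(box, img_w, img_h, rng=random):
    """Apply each flip with probability 0.5, as described above."""
    if rng.random() < 0.5:
        box = hflip_box(box, img_w)
    if rng.random() < 0.5:
        box = vflip_box(box, img_h)
    return box


def mixup(img_a, img_b, lam):
    """Blend two equally sized images pixel by pixel with weight lam."""
    return [[lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


print(hflip_box((10, 20, 30, 40), img_w=100))
print(scale_box((10, 20, 30, 40), 1.5))
print(mixup([[0, 100]], [[200, 100]], lam=0.25))
```

For Mixup, the blended image keeps the annotated boxes of both source images; how their labels are weighted is not specified in the text and is left out here.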

In step 4, a surface defect detection model is constructed. Specifically, the proposed surface defect detection model can be divided into three parts, and the schematic architecture diagrams of the parts are shown in FIGS. 2 to 4.

First, the Swin Transformer backbone network for extracting multi-scale image features is shown in FIG. 2. The Swin Transformer is inspired by the Transformer from the field of natural language processing, which is known for using a self-attention mechanism to capture long-range dependencies in data. In the visual domain, the scale of elements varies greatly, and the resolution of pixels in an image is much higher than that of words in a passage of text. To address this, the Swin Transformer constructs hierarchical feature maps. Unlike the original model, the present invention first uses a 7 × 7 convolution with a stride of 4 to implement image block partitioning, and then uses a 3 × 3 convolution with a stride of 2 to downsample the feature maps between different stages. In each Swin Transformer block, self-attention is calculated in non-overlapping local windows. Assuming that each local window contains M × M image blocks and the entire image contains h × w image blocks, the computational complexity of global MSA (Multi-head Self-Attention) and window-based W-MSA (Window-based Multi-head Self-Attention) is:

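Ω(MSA) = 4hwC² + 2(hw)²C;

Ω(W-MSA) = 4hwC² + 2M²hwC;

From the above, the complexity of global MSA is quadratic in the number of image blocks hw, whereas that of W-MSA is linear in hw for a fixed M; the computational complexity of the Swin Transformer is therefore linear in the image size.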
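The two expressions can be evaluated directly to see the difference in growth. In the sketch below the function names are hypothetical, and the values h = w = 56, C = 96 and M = 7 are illustrative assumptions rather than settings taken from this text.

```python
def msa_complexity(h, w, c):
    """Omega(MSA) = 4hwC^2 + 2(hw)^2 C."""
    return 4 * h * w * c ** 2 + 2 * (h * w) ** 2 * c


def wmsa_complexity(h, w, c, m):
    """Omega(W-MSA) = 4hwC^2 + 2 M^2 hw C."""
    return 4 * h * w * c ** 2 + 2 * m ** 2 * h * w * c


# Doubling the image side quadruples hw.
for side in (56, 112):
    print(side, msa_complexity(side, side, 96), wmsa_complexity(side, side, 96, 7))
```

When hw grows by a factor of four, Ω(W-MSA) grows by exactly four, while the second term of Ω(MSA) grows by sixteen.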

To enlarge the receptive field of the network and approach global self-attention, cross-window connections are allowed while the efficiency of window-based computation is retained. The window partition is shifted between consecutive Swin Transformer blocks, which use the W-MSA (Window-based Multi-head Self-Attention) and SW-MSA (Shifted-Window Multi-head Self-Attention) mechanisms, respectively. The calculation is as follows:
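The equations themselves are not present in this text. Assuming the embodiment follows the published Swin Transformer formulation, which uses the same W-MSA, SW-MSA, LN and MLP operators, two consecutive blocks would compute the following, where z^{l-1} denotes the input to block l and \hat{z}^{l} the output of its attention sub-layer (both symbols are assumptions taken from that formulation):

```latex
\begin{aligned}
\hat{z}^{l}   &= \text{W-MSA}\left(\text{LN}\left(z^{l-1}\right)\right) + z^{l-1} \\
z^{l}         &= \text{MLP}\left(\text{LN}\left(\hat{z}^{l}\right)\right) + \hat{z}^{l} \\
\hat{z}^{l+1} &= \text{SW-MSA}\left(\text{LN}\left(z^{l}\right)\right) + z^{l} \\
z^{l+1}       &= \text{MLP}\left(\text{LN}\left(\hat{z}^{l+1}\right)\right) + \hat{z}^{l+1}
\end{aligned}
```

Each sub-layer is preceded by layer normalization and wrapped in a residual connection, and the two blocks differ only in whether the regular or the shifted window partition is used for attention.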

Then, a BiFPN network is used to perform multi-scale fusion on the features extracted by the Swin Transformer backbone network, so as to enhance the input features with different resolutions; its schematic architecture diagram is shown in FIG. 3. First, compared with the PANet, the BiFPN network deletes each node that has only one input edge, because such a node performs no feature fusion and therefore contributes little to a feature network whose purpose is to fuse different features; second, if an original input node and an output node are in the same layer, an additional edge is added between them, so that more features are fused without adding much extra computational cost; third, the BiFPN network regards each bidirectional (top-down and bottom-up) path as one feature network layer, and repeats the same layer multiple times to realize higher-level feature fusion. Fast normalized fusion is used to weight the features, so that each normalized weight lies between 0 and 1. The BiFPN network thus integrates weighted bidirectional cross-scale connections and fast normalized fusion.
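Fast normalized fusion can be sketched for plain feature vectors as follows. The formulation, in which learnable weights are passed through a ReLU and divided by their sum plus a small constant, and the value eps = 1e-4 are assumptions taken from the published BiFPN design; the function name and the inputs are hypothetical.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors with non-negative, normalized weights."""
    w = [max(0.0, x) for x in weights]  # ReLU keeps every weight >= 0
    total = sum(w) + eps                # eps avoids division by zero
    norm = [x / total for x in w]       # each normalized weight lies in [0, 1]
    fused = [sum(n * f[i] for n, f in zip(norm, features))
             for i in range(len(features[0]))]
    return fused, norm


# Two input features of the same resolution with hypothetical learned weights.
fused, norm = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
print(norm)
print(fused)
```

Because the normalization needs only a ReLU and a division, it avoids the exponentials of a softmax while still letting the network learn how much each input resolution contributes.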

Finally, the anchor-free FCOS network is used as the detection head; its architecture is shown schematically in FIG. 4. FCOS takes each position (x, y) directly as a training sample, which reduces the number of hyper-parameters to be designed and improves the generalization and robustness of the detection method. The FCOS network shares parameters across different feature levels and has three branches: the Classification branch, the Regression branch, and the Center-ness branch. The Classification branch predicts the probability that position (x, y) of the current-level feature map belongs to each of the C defect classes. The Regression branch predicts the defect bounding-box coordinates corresponding to position (x, y) of the current-level feature map. The Center-ness branch predicts the center-ness of position (x, y) of the current-level feature map. The class confidence predicted by the Classification branch is multiplied by the center-ness predicted by the Center-ness branch, and the product is used as the confidence score in final post-processing, which suppresses a large number of low-quality defect detection results.
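The scoring rule described above can be sketched as follows. The present text does not define center-ness; the sketch assumes the definition from the published FCOS design, in which l, t, r and b are the distances from position (x, y) to the left, top, right and bottom sides of the ground-truth box. The function names are illustrative.

```python
import math


def centerness_target(l, t, r, b):
    """Assumed FCOS center-ness: 1.0 at the box center, approaching 0 near the border."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


def final_score(class_confidence, centerness):
    """Confidence used in post-processing: class confidence times center-ness."""
    return class_confidence * centerness


if __name__ == "__main__":
    centered = centerness_target(5, 5, 5, 5)   # position at the box center
    off_center = centerness_target(1, 5, 9, 5)  # position near the left border
    print(final_score(0.9, centered), final_score(0.9, off_center))
```

A position far from the center of a defect receives a low center-ness, so its box is ranked low even when its class confidence is high.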

In step 5, the surface defect detection model is trained. Specifically, the surface defect data set obtained in step 2 is divided into a training set and a test set in a ratio of 4:1. The training set is augmented using the data enhancement strategy of step 3 and input into the surface defect detection model constructed in step 4 for training. The total training loss of the surface defect detection model is a weighted sum of the classification loss and the regression loss: Loss = L_cls + λL_reg. L_cls is the classification loss; focal loss is chosen as the loss between the predicted defect class and the ground-truth class. L_reg is the regression loss; GIoU loss is chosen as the loss between the predicted defect bounding-box coordinates and the ground-truth bounding-box coordinates. λ is a weighting factor, which is 0.5 by default.
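A minimal sketch of the total loss for a single sample is given below. Only the weighted sum and λ = 0.5 come from the text; the focal-loss parameters alpha = 0.25 and gamma = 2.0 are the commonly used defaults and are assumptions here, as are the function names and the (x1, y1, x2, y2) box format. Boxes are assumed to have positive area.

```python
import math


def focal_loss(p, is_positive, alpha=0.25, gamma=2.0):
    """Binary focal loss for one predicted probability p (alpha, gamma assumed)."""
    p_t = p if is_positive else 1.0 - p
    alpha_t = alpha if is_positive else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))


def giou_loss(box_a, box_b):
    """1 - GIoU for two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Area of the smallest box enclosing both inputs
    hull = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = inter / union - (hull - union) / hull
    return 1.0 - giou


def total_loss(l_cls, l_reg, lam=0.5):
    """Loss = L_cls + lambda * L_reg, with lambda = 0.5 by default."""
    return l_cls + lam * l_reg


if __name__ == "__main__":
    l_cls = focal_loss(0.8, True)
    l_reg = giou_loss((0, 0, 10, 10), (1, 1, 11, 11))
    print(total_loss(l_cls, l_reg))
```

Unlike plain IoU, the GIoU term still varies when the predicted and ground-truth boxes do not overlap, which is useful for small defects.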

During training, positive and negative samples are assigned using the Optimal Transport Assignment (OTA) strategy. The unit transport cost between an anchor point and a ground truth or the background is defined as the sum of the classification loss and the regression loss, so that the search for the optimal label assignment is converted into the search for an optimal transport plan. Once the unit transport cost is defined, the optimal transport plan can be solved quickly and efficiently by Sinkhorn-Knopp iteration. The OTA strategy takes context information into account from a global perspective and dynamically assigns defect labels of various sizes, shapes and categories in a one-to-many manner.
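The Sinkhorn-Knopp iteration mentioned above can be sketched in simplified form as follows. Each row of the cost matrix is a supplier (a ground-truth defect or the background) and each column is an anchor point that demands one label. The cost values, the supply of two labels per supplier, the regularization eps = 0.1 and the iteration count are all assumptions made for the example.

```python
import math


def sinkhorn(cost, supply, demand, eps=0.1, iters=50):
    """Approximate the optimal transport plan by Sinkhorn-Knopp scaling."""
    m, n = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(n)] for i in range(m)]
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Alternately rescale columns to match demand and rows to match supply
        v = [demand[j] / sum(kernel[i][j] * u[i] for i in range(m)) for j in range(n)]
        u = [supply[i] / sum(kernel[i][j] * v[j] for j in range(n)) for i in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n)] for i in range(m)]


def assign_labels(plan):
    """Each anchor takes the supplier that sends it the most mass."""
    rows, cols = len(plan), len(plan[0])
    return [max(range(rows), key=lambda i: plan[i][j]) for j in range(cols)]


if __name__ == "__main__":
    # Row 0: one ground-truth defect; row 1: background (hypothetical costs)
    cost = [[0.2, 0.3, 2.0, 2.5],
            [1.0, 1.0, 0.1, 0.1]]
    plan = sinkhorn(cost, supply=[2.0, 2.0], demand=[1.0, 1.0, 1.0, 1.0])
    print(assign_labels(plan))
```

In this example the two anchors with low cost to the defect become positive samples and the other two are assigned to the background.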

The hyper-parameters for model training are set as follows. Multi-scale training is adopted: the input image is resized so that its short edge is between 480 and 800 pixels and its long edge does not exceed 1333 pixels. An SGD optimizer with a momentum of 0.9 and a weight decay of 0.005 is used. The model is trained for a total of 100 epochs with an initial learning rate of 0.0001, and the learning rate is reduced by a factor of 10 at the 67th and 89th epochs. Training uses 8 GPUs with two images per GPU, for a total batch size of 16.
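The learning-rate schedule and the multi-scale resizing rule can be sketched as follows. The sketch assumes that epochs are counted from 1, that the reduction takes effect at the start of the 67th and 89th epochs, and that the long-edge limit takes precedence over the short-edge range when both cannot be met; none of these details is fixed by the text, and the function names are illustrative.

```python
import random


def learning_rate(epoch, base_lr=1e-4, milestones=(67, 89), gamma=0.1):
    """Step schedule: multiply the rate by gamma at each milestone epoch."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)


def multiscale_size(width, height, short_range=(480, 800), max_long=1333, rng=random):
    """Pick a random short-edge target and keep the long edge within max_long."""
    target_short = rng.randint(*short_range)
    short_edge, long_edge = min(width, height), max(width, height)
    # The long-edge cap wins, so the short edge may end up below the range
    scale = min(target_short / short_edge, max_long / long_edge)
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    gpus, images_per_gpu = 8, 2
    print("total batch size:", gpus * images_per_gpu)
    for epoch in (1, 67, 89):
        print(epoch, learning_rate(epoch))
    print(multiscale_size(1920, 1080))
```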

In step 6, the surface defect detection model performs prediction. Specifically, the product surface images in the test set are input into the surface defect detection model trained in step 5. The Swin Transformer backbone first extracts multi-scale features; the BiFPN network then fuses these features across scales; and the FCOS network, used as the detection head, outputs a predicted defect probability and a predicted defect bounding box for each multi-scale feature map of the BiFPN network. For the defect prediction results output by the FCOS network, a confidence threshold of 0.05 is used to filter out low-confidence results. The prediction results of all levels are then post-processed by Soft-NMS with a threshold of 0.6 to generate the final defect prediction results, and the type and location of each defect are displayed in the product surface image.
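The post-processing can be sketched as follows, assuming the linear variant of Soft-NMS, in which the score of a box whose IoU with a higher-scoring box exceeds the threshold is multiplied by (1 - IoU) rather than discarded. The text does not state which Soft-NMS variant is used, and the sketch ignores defect classes for brevity; the function names and the detection format (x1, y1, x2, y2, score) are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes; only the first four fields are used."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def soft_nms(detections, iou_thr=0.6, score_thr=0.05):
    """Filter by confidence, then apply linear Soft-NMS (assumed variant)."""
    pending = [list(d) for d in detections if d[4] >= score_thr]
    kept = []
    while pending:
        best = max(pending, key=lambda d: d[4])
        pending.remove(best)
        kept.append(tuple(best))
        for d in pending:
            overlap = iou(best, d)
            if overlap > iou_thr:
                d[4] *= 1.0 - overlap  # decay the score instead of deleting the box
        pending = [d for d in pending if d[4] >= score_thr]
    return kept


if __name__ == "__main__":
    detections = [
        (0, 0, 10, 10, 0.90),
        (0, 0, 10, 9, 0.80),    # heavy overlap with the first box
        (20, 20, 30, 30, 0.70),
        (0, 0, 5, 5, 0.01),     # below the 0.05 confidence threshold
    ]
    for det in soft_nms(detections):
        print(det)
```

In the example, the second box is kept with a reduced score rather than removed, which helps when two real defects lie close together.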

The method for detecting surface defects of products in an industrial scene based on the Swin Transformer and FCOS, provided by the embodiment of the invention, has stronger robustness and generalization capability through the improvements to the parts described above, and improves the precision of detecting tiny defects. Compared with prior-art detection methods, the method can accurately detect defects of different and variable sizes and complex types on an industrial production line, and achieves a balance between the speed and the precision of product surface defect detection.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

1. A method for detecting surface defects of a product in an industrial scene is characterized by comprising the following steps:

step 4, constructing a surface defect detection model;

step 5, training a surface defect detection model;

step 6, predicting a surface defect detection model;

wherein,

in the step 4, the method comprises the following steps:

2. The method for detecting surface defects of products under industrial scenes according to claim 1, characterized in that in step 4.1, image block partitioning is realized by a 7 × 7 convolution with a stride of 4, and down-sampling of the feature maps between different stages is realized by a 3 × 3 convolution with a stride of 2; self-attention is calculated within non-overlapping local windows in each Swin Transformer block; assuming that each of the local windows contains M × M image blocks and the entire surface image contains h × w image blocks, the computational complexities of global multi-head self-attention and window-based multi-head self-attention are:

Ω(MSA) = 4hwC² + 2(hw)²C;

Ω(W-MSA) = 4hwC² + 2M²hwC;

3. The method for detecting surface defects of products under industrial scenes according to claim 2, characterized in that in the step 4.1, cross-window connections are allowed to improve efficiency; the window partition is shifted between consecutive Swin Transformer blocks, which adopt the W-MSA mechanism and the SW-MSA mechanism, respectively, and the specific calculation is as follows:

4. The method for detecting the surface defects of the products under the industrial scene as claimed in claim 1, wherein in the step 4.2, the method comprises the following steps:

5. The method for detecting the surface defects of the products under the industrial scene as claimed in claim 1, wherein in the step 4.3, the anchor-free FCOS network takes each position (x, y) as a training sample; the FCOS network shares parameters among different feature levels and has three branches: a Classification branch, a Regression branch and a Center-ness branch; the Classification branch predicts the probability that position (x, y) of the current-level feature map belongs to each of the C defect classes; the Regression branch predicts the defect bounding-box coordinates corresponding to position (x, y) of the current-level feature map; the Center-ness branch predicts the center-ness of position (x, y) of the current-level feature map; and the class confidence predicted by the Classification branch is multiplied by the center-ness predicted by the Center-ness branch, the product being used as the confidence score in final post-processing.

6. The method for detecting surface defects of products in an industrial setting as claimed in claim 1, wherein the size of the surface image is related to the type of the product, and the surface image is an RGB three-channel color map.

7. The method for detecting the surface defects of the products under the industrial scene as claimed in claim 1, wherein in the step 5, the method comprises the following steps:

8. The method for detecting the surface defects of the products under the industrial scene as claimed in claim 7, wherein in the step 5.1, the surface defect data set obtained in the step 2 is divided into the training set and the testing set according to a ratio of 4: 1; in the step 5.2, the divided training set is enhanced by using the data enhancement strategy in the step 3, and is input into the surface defect detection model constructed in the step 4 for training; the total training loss of the surface defect detection model is a weighted sum of classification loss and regression loss:

Loss = L_cls + λL_reg

9. The method for detecting the surface defects of the products under the industrial scene as claimed in claim 7, wherein the hyper-parameters for training the surface defect detection model are set as follows: multi-scale training is adopted, the input surface image being resized so that its short edge is between 480 and 800 and its long edge does not exceed 1333; an SGD optimizer with a momentum of 0.9 and a weight decay of 0.005 is adopted; the surface defect detection model is trained for 100 epochs in total, and the initial learning rate is 0.0001; the learning rate is reduced by a factor of 10 at the 67th and 89th epochs; and training uses 8 GPUs with two images per GPU, for a total batch size of 16.

10. The method for detecting the surface defects of the products under the industrial scene as claimed in claim 7 or 8, wherein in the step 6, the method comprises the following steps: