CN114494164A - Steel surface defect detection method and device and computer storage medium

Info

Publication number: CN114494164A
Application number: CN202210037702.1A
Authority: CN
Inventors: 王正旭; 代晓林; 刘梦玫; 张楠
Original assignee: Dalian Jiaji Automation Electromechanical Technology Co ltd
Current assignee: Dalian Jiaji Automation Electromechanical Technology Co ltd
Priority date: 2022-01-13
Filing date: 2022-01-13
Publication date: 2022-05-13

Abstract

The invention discloses a steel surface defect detection method and device and a computer storage medium, and relates to the technical field of steel defect detection. The method comprises the following steps: acquiring a steel surface defect picture data set, preprocessing the steel surface defect pictures in the data set, and dividing them into a training set and a test set; constructing a Mask RCNN based on a multi-head self-attention mechanism and a Swin Transformer, and training the Mask RCNN with the training set; and performing steel surface defect detection on the pictures in the test set with the trained Mask RCNN to obtain a steel surface defect detection result. The multi-head self-attention mechanism and the Swin Transformer can retain small-pixel information and reduce the amount of computation; detection precision is therefore improved, in particular for small-size defects, while computation is faster, so that real-time detection is achieved.

Description

Steel surface defect detection method and device and computer storage medium

Technical Field

The invention relates to the technical field of steel defect detection, and in particular to a steel surface defect detection method and device using a Mask RCNN network based on a multi-head self-attention mechanism and a Swin Transformer, and to a computer storage medium.

Background

Steel is used in almost every field of human society, such as construction, automobiles, aerospace and precision instruments. In the foreseeable future no material can fully replace steel, and steel remains an indispensable material for human society. During production, transportation and storage, the quality of steel often changes because of long transportation times and complex, variable natural conditions. When a purchaser inspects and accepts steel products, the types and severity of their defects must be determined, and this determination is currently done manually. However, the manual workload is huge and the response is slow, which greatly affects subsequent production and processing.

Current steel surface defect identification mainly comprises traditional detection methods that classify hand-crafted features and machine learning classification detection methods. The traditional defect identification algorithms fall into three categories:

1) Statistical recognition algorithm

This type of algorithm analyses the regularity and periodicity of the pixel distribution, uses statistical inference to compute the probability distribution of defect occurrence, and judges metal surface defects from the computed probabilities and expectations. It mainly includes thresholding, clustering, edge detection, two-dimensional fractal analysis, gray-level statistics, co-occurrence matrix calculation, local binary patterns, morphological feature discrimination and similar methods. Its advantages are a small amount of computation and low hardware requirements. Its disadvantages are that the metal surface defects must be regular and periodic, relatively accurate initial values are required, the method is sensitive to noisy data, the prediction accuracy is mediocre, and small defects such as fine scratches and small pits cannot be detected.

2) Spectral method

This method illuminates the metal surface with a light source, analyses the spectrum of the reflected light, separates defects from the background using various calculation methods, and thereby judges whether the metal surface is defective. It mainly includes the Fourier transform, Gabor filters, improved FIR filters, the wavelet transform, multi-scale geometric analysis, the Hough transform and similar methods. Its advantages are a simple structure, mature algorithms, and ease of implementation and operation. Its disadvantages are that it requires a highly stable light source and several high-precision spectrum receivers, so the equipment investment is large; the equipment is easily damaged and hard to maintain in the hot, humid and corrosive environment of a factory; and the prediction accuracy is poor because only filters are used for data analysis.

3) Modelling method

The modelling method predicts whether the metal surface is defective by converting the original picture data into a low-dimensional distribution. It mainly includes the Markov random field model, the Weibull distribution model, the active contour model and the like. The method is mainly used for detecting and classifying surface materials, and is generally not used for detecting surface defects because its prediction accuracy is low.

Among machine learning classification detection methods, common deep learning image recognition algorithms use network architectures such as Fast RCNN and YOLOv5. Although these algorithms have achieved certain breakthroughs in image recognition, they still cannot meet the requirements of steel surface defect recognition: their recognition accuracy on small-scale defects is low and their recognition speed is slow.

Disclosure of Invention

In view of the above, the present invention aims to provide a steel surface defect detection method, device and computer storage medium using a Mask RCNN network based on a multi-head self-attention mechanism and a Swin Transformer, in which the deep learning method is designed around the characteristics of steel surface defects, so as to meet the requirement of high-precision, fast identification of large, medium and small steel surface defects of every category.

In order to achieve the purpose, the invention provides the following technical scheme:

In one aspect, the invention provides a steel surface defect detection method, comprising the following steps:

acquiring a steel surface defect picture data set, preprocessing the steel surface defect pictures in the data set, and dividing them into a training set and a test set; each steel surface defect picture is a picture containing a crack, inclusion, patch, pitted surface, rolled-in scale or scratch defect;

constructing a mask region-based convolutional neural network (Mask RCNN) based on a multi-head self-attention mechanism and a Swin Transformer, and training the Mask RCNN with the training set; the Mask RCNN comprises a fixed-window multi-head self-attention module, a shifted-window multi-head self-attention module, a fully connected module, a layer normalization module, a patch merging module, a feature pyramid network (FPN), a region proposal network (RPN), region-of-interest alignment (RoI Align) and a fully connected network; the Mask RCNN uses the multi-head self-attention mechanism and the Swin Transformer as its backbone network, extracts fused feature maps with the FPN, extracts regions of interest with the RPN, obtains candidate proposal boxes through RoI Align, determines the category and regresses the proposal boxes with a fully connected layer and a linear layer, and obtains the defect instance segmentation through the mask branch;

and performing steel surface defect detection on the steel surface defect pictures in the test set with the trained Mask RCNN to obtain a steel surface defect detection result.

Further, training the Mask RCNN with the training set comprises:

inputting the steel surface defect pictures in the training set into the backbone network with the self-attention mechanism and the Swin Transformer, and extracting feature maps of 4 different stages;

fusing the feature maps of adjacent stages among the feature maps of the 4 different stages in the manner of a feature pyramid network (FPN) to obtain 4 feature maps of different resolutions;

inputting the 4 feature maps of different resolutions into the RPN, and obtaining a predetermined number of candidate proposal boxes according to the size of the original image;

performing binary classification on the candidate proposal boxes, pooling them with RoI Align, performing classification and regression through the fully connected network to obtain detection results and the corresponding loss, and performing defect instance segmentation and labelling with the mask branch;

adjusting the weights of the backbone network based on the multi-head self-attention mechanism and the Swin Transformer by stochastic gradient descent (SGD) according to the calculated loss, and ending training when the number of iterations reaches a set value to obtain a weight file;

and loading the weight file into the Swin Transformer architecture based on the multi-head self-attention mechanism to obtain the trained Mask RCNN based on the multi-head self-attention mechanism and the Swin Transformer.

Further, fusing the feature maps of adjacent stages among the feature maps of the 4 different stages in the FPN manner to obtain 4 feature maps of different resolutions comprises:

performing a 1x1 convolution on the feature map S5 output by stage 4 to obtain a feature map F5, and performing a 3x3 convolution on the feature map F5 to obtain a feature map P5;

performing a 1x1 convolution on the feature map S4 output by stage 3 to obtain a feature map F4, up-sampling the feature map F5 by a factor of 2, fusing it with the feature map F4, and performing a 3x3 convolution to obtain a feature map P4;

performing a 1x1 convolution on the feature map S3 output by stage 2 to obtain a feature map F3, up-sampling the feature map F4 by a factor of 2, fusing it with the feature map F3, and performing a 3x3 convolution to obtain a feature map P3;

and performing a 1x1 convolution on the feature map S2 output by stage 1 to obtain a feature map F2, up-sampling the feature map F3 by a factor of 2, fusing it with the feature map F2, and performing a 3x3 convolution to obtain a feature map P2.

Further, inputting the 4 feature maps of different resolutions into the RPN and obtaining a predetermined number of candidate proposal boxes according to the size of the original image comprises:

inputting the original image and the feature maps into a candidate box generator, generating a predetermined number of candidate boxes with aspect ratios of 1:1, 1:2 and 2:1, obtaining candidate proposal boxes by regression prediction, and obtaining the filtered candidate proposal boxes by non-maximum suppression.

Further, preprocessing the steel surface defect pictures in the data set comprises:

performing crop-and-pad data augmentation to obtain preprocessed steel surface defect pictures, and marking the position and type of each defect.

Further, inputting the steel surface defect pictures in the training set into the backbone network with the self-attention mechanism and the Swin Transformer and extracting feature maps of 4 different stages comprises:

constructing a swin_transformer_block basic module;

constructing a patch_partition basic module to create the patches (sub-images);

constructing a stage 1 module, whose input is the patches obtained by the patch_partition basic module; applying a mapping and a layer normalization operation to the input patches; performing a convolution on the input picture with a convolution kernel of the same size as a patch to obtain a first feature map; taking the first feature map as the input of the swin_transformer_block, and performing the swin_transformer_block operation again on the output feature map to obtain a second feature map, which is the input of stage 2;

constructing a patch_merging basic module, whose input is, for example, a feature map of size 4x4; every 2x2 group of adjacent pixels is treated as one partition, and the pixels at the same position in each partition are gathered to obtain four 2x2 feature maps; the four feature maps are concatenated along the depth (channel) direction, and after a layer normalization operation and a linear transformation by a fully connected layer along the depth direction, the number of depth channels of the feature map is doubled and a 2x down-sampled feature map is output;

constructing a stage 2 module, which performs the patch_merging operation on the input second feature map to obtain a third feature map; the third feature map is taken as the input of the swin_transformer_block, and the swin_transformer_block operation is performed again on the output feature map to obtain a fourth feature map, which is the input of stage 3;

constructing a stage 3 module, which performs the patch_merging operation on the input fourth feature map to obtain a fifth feature map; the fifth feature map is taken as the input of the swin_transformer_block, the swin_transformer_block operation is performed again on the output feature map, and the swin_transformer_block is then applied cyclically two more times to obtain a sixth feature map, which is the input of stage 4;

constructing a stage 4 module, which performs the patch_merging operation on the input sixth feature map to obtain a seventh feature map; the seventh feature map is taken as the input of the swin_transformer_block, and the swin_transformer_block operation is performed again on the output feature map to obtain an eighth feature map.
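The pixel rearrangement at the heart of the patch merging step can be sketched as follows. This is a minimal illustration on a single-channel 4x4 map; the function name `patch_merging_rearrange` and the ordering of the four sub-maps are assumptions made for the example, and the subsequent layer normalization and fully connected layer are left out.

```python
def patch_merging_rearrange(feature_map):
    """Split an H x W single-channel map into four (H/2) x (W/2) maps.

    Each sub-map collects the pixel at one fixed position inside every
    2x2 partition, so stacking the four sub-maps along the channel axis
    halves the spatial resolution without discarding any pixel.
    """
    height, width = len(feature_map), len(feature_map[0])
    assert height % 2 == 0 and width % 2 == 0, "H and W must be even"
    sub_maps = []
    for row_offset in (0, 1):
        for col_offset in (0, 1):
            sub_maps.append([
                [feature_map[r][c] for c in range(col_offset, width, 2)]
                for r in range(row_offset, height, 2)
            ])
    return sub_maps  # 4 channels, each (H/2) x (W/2)


# 4x4 example map whose entries are 0..15 in row-major order.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
channels = patch_merging_rearrange(grid)
print(channels[0])  # [[0, 2], [8, 10]]
print(channels[3])  # [[5, 7], [13, 15]]
```

In a full implementation the four stacked channels would then pass through layer normalization and a linear layer that sets the output channel count.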

Further, constructing the swin_transformer_block basic module comprises:

performing a layer normalization (LayerNorm) operation on the input feature map, then applying the fixed-window self-attention mechanism, and fusing the resulting feature map with the original feature map to obtain a first fused feature map; the fixed-window attention mechanism partitions the feature map with windows of size 7x7 and performs a multi-head self-attention operation within each window;

performing a LayerNorm operation and a multi-layer perceptron (MLP) operation on the first fused feature map, and fusing the result with the first fused feature map a second time to obtain a first twice-fused feature map;

performing a LayerNorm operation on the first twice-fused feature map, then applying the shifted-window self-attention mechanism, and fusing the resulting feature map with the original feature map to obtain a second fused feature map; the shifted-window attention mechanism partitions the feature map with windows of size 7x7 and shifts the windows to realize information interaction between different windows;

and performing a LayerNorm operation and an MLP operation on the second fused feature map, and fusing the result with the second fused feature map a second time to obtain a second twice-fused feature map.
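The wiring of the four sub-steps above can be sketched as follows. This is a minimal sketch that assumes each "fusion" is an element-wise addition of a sub-layer's output to its input (a residual connection); the attention and MLP sub-layers are passed in as hypothetical callables, and the helper names are chosen for the example only.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]


def fuse(tokens, sublayer):
    """LayerNorm, then the sub-layer, then add the input back (assumed fusion)."""
    normed = [layer_norm(t) for t in tokens]
    transformed = sublayer(normed)
    return [[a + b for a, b in zip(t, u)] for t, u in zip(tokens, transformed)]


def swin_transformer_block(tokens, window_attention, shifted_attention, mlp_1, mlp_2):
    first_fused = fuse(tokens, window_attention)            # fixed-window attention
    first_twice_fused = fuse(first_fused, mlp_1)            # first MLP
    second_fused = fuse(first_twice_fused, shifted_attention)  # shifted-window attention
    return fuse(second_fused, mlp_2)                        # second MLP


def zero_sublayer(tokens):
    """Placeholder sub-layer that contributes nothing, used only for the demo."""
    return [[0.0] * len(t) for t in tokens]


tokens = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
block_output = swin_transformer_block(
    tokens, zero_sublayer, zero_sublayer, zero_sublayer, zero_sublayer
)
print(block_output == tokens)  # True: with zero sub-layers only the skip path remains
```

Swapping the placeholders for real attention and MLP functions leaves the wiring unchanged, which is the point of passing them in as arguments.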

In another aspect, the present invention further provides a steel surface defect detection device, the device comprising:

a data set unit, configured to acquire a steel surface defect picture data set, preprocess the steel surface defect pictures in the data set, and divide them into a training set and a test set; each steel surface defect picture is a picture containing a crack, inclusion, patch, pitted surface, rolled-in scale or scratch defect;

a training unit, configured to construct a mask region-based convolutional neural network (Mask RCNN) based on a multi-head self-attention mechanism and a Swin Transformer, and to train the Mask RCNN with the training set acquired by the data set unit; the Mask RCNN comprises a fixed-window multi-head self-attention module, a shifted-window multi-head self-attention module, a fully connected module, a layer normalization module, a patch merging module, a feature pyramid network (FPN), a region proposal network (RPN), region-of-interest alignment (RoI Align) and a fully connected network; the Mask RCNN uses the multi-head self-attention mechanism and the Swin Transformer as its backbone network, extracts fused feature maps with the FPN, extracts regions of interest with the RPN, obtains candidate proposal boxes through RoI Align, determines the category and regresses the proposal boxes with a fully connected layer and a linear layer, and obtains the defect instance segmentation through the mask branch;

and a detection unit, configured to perform steel surface defect detection on the steel surface defect pictures in the test set acquired by the data set unit, using the Mask RCNN trained by the training unit, to obtain a steel surface defect detection result.

Further, the training unit comprises:

a first feature map extraction subunit, configured to input the steel surface defect pictures in the training set into the backbone network with the self-attention mechanism and the Swin Transformer, and to extract feature maps of 4 different stages;

a second feature map extraction subunit, configured to fuse, in the FPN manner, the feature maps of adjacent stages among the feature maps of the 4 different stages obtained by the first feature map extraction subunit, so as to obtain 4 feature maps of different resolutions;

a candidate proposal box obtaining subunit, configured to input the 4 feature maps of different resolutions obtained by the second feature map extraction subunit into the RPN, and to obtain a predetermined number of candidate proposal boxes according to the size of the original image;

an instance segmentation subunit, configured to perform binary classification on the candidate proposal boxes obtained by the candidate proposal box obtaining subunit, pool them with RoI Align, perform classification and regression through the fully connected network to obtain a detection result and the corresponding loss, and perform defect instance segmentation and labelling with the mask branch;

a weight file acquisition subunit, configured to adjust the weights of the backbone network based on the multi-head self-attention mechanism and the Swin Transformer by stochastic gradient descent (SGD) according to the calculated loss, to end training when the number of iterations reaches a set value so as to obtain a weight file, and to load the weight file into the Swin Transformer architecture based on the multi-head self-attention mechanism to obtain the trained Mask RCNN based on the multi-head self-attention mechanism and the Swin Transformer.

In still another aspect, the present invention further provides a computer readable storage medium, in which a set of computer instructions is stored, and when the set of computer instructions is executed by a processor, the method for detecting surface defects of a steel material as described above is implemented.

The invention has the advantages and positive effects that:

The multi-head self-attention mechanism and the Swin Transformer adopted by the invention retain small-pixel information and reduce the amount of computation; detection precision is therefore improved, and in particular greatly improved for small-size defects; at the same time, computation is faster and real-time detection is achieved; in addition, the Mask RCNN network not only detects defects on the steel surface but also performs defect instance segmentation, helping front-line workers determine the positions and shapes of defects more accurately.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic diagram of the structure of the Mask RCNN network based on the multi-head self-attention mechanism and the Swin Transformer according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of a method for detecting surface defects of steel products according to an embodiment of the present invention;

FIG. 3 is a diagram showing the results of the image inspection of the surface defects of steel materials according to the embodiment of the present invention;

In the figures: 1-crack, 2-scratch, 3-pitted surface, 4-patch, 5-rolled-in scale, 6-inclusion; the black boxes are the predicted defect boxes.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The Mask RCNN (Mask Region-based Convolutional Neural Network) based on the multi-head self-attention mechanism and the Swin Transformer is a deep learning method. As shown in FIG. 1, it is composed of a fixed-window multi-head self-attention module, a shifted-window multi-head self-attention module, a fully connected module, a layer normalization module, a patch merging module, a Feature Pyramid Network (FPN), a Region Proposal Network (RPN), region-of-interest alignment (RoI Align), and a fully connected network. When this Mask RCNN is used to detect steel defects, the multi-head self-attention mechanism and the Swin Transformer serve as the backbone network, fused feature maps are extracted with the FPN, regions of interest (RoIs) are extracted with the RPN, candidate proposal boxes are obtained through RoI Align, a fully connected layer and a linear layer determine the category and regress the proposal boxes, and the defect instance segmentation is obtained through the mask branch. In this way, the various defects on the steel surface can be detected accurately and rapidly, and marked and segmented.

FIG. 2 shows a flowchart of the Mask RCNN steel surface defect detection method based on the multi-head self-attention mechanism and the Swin Transformer in an embodiment of the present invention. The method includes the following specific steps:

S101, acquiring steel surface defect pictures, preprocessing the picture data, and dividing it into a training set and a test set;

The steel surface defect pictures are pictures containing crack, inclusion, patch, pitted surface, rolled-in scale or scratch defects, and the acquired pictures are divided into a training set and a test set in a ratio of 7:3. The embodiment of the invention uses a hot-rolled strip steel surface defect database covering 6 defect types (inclusions, scratches, rolled-in scale, cracks, pitted surfaces and patches), with 300 pictures per defect type, giving 1800 steel surface defect pictures in total, each 200 px × 200 px in size.
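The 7:3 division of the 1800 pictures can be sketched as follows. This is an illustration only: the file names are hypothetical, and a plain shuffled split is assumed because the source does not say whether the split is stratified by defect type.

```python
import random

DEFECT_TYPES = ["crack", "inclusion", "patch", "pitted_surface", "rolled_in_scale", "scratch"]
IMAGES_PER_TYPE = 300

# Hypothetical file names; the source does not give a naming scheme.
samples = [
    (f"{name}_{index:03d}.jpg", name)
    for name in DEFECT_TYPES
    for index in range(IMAGES_PER_TYPE)
]

rng = random.Random(0)  # fixed seed so the split is reproducible
rng.shuffle(samples)

split_index = len(samples) * 7 // 10  # 70 % of the pictures go to training
train_set, test_set = samples[:split_index], samples[split_index:]
print(len(train_set), len(test_set))  # 1260 540
```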

In specific implementation, the steel surface defect pictures in the training set need to be preprocessed. Specifically, the preprocessing includes performing crop-and-pad data augmentation to obtain new steel surface defect pictures of size 224 × 224 × 3 (H × W × C), and marking the position and type of each defect.

S102, inputting the training set pictures into the backbone network with the self-attention mechanism and the Swin Transformer, and extracting feature maps S2, S3, S4 and S5 of 4 different stages;

In step S102, the original picture is input into the Swin Transformer architecture with multi-head self-attention modules, and the feature maps S2, S3, S4 and S5 of the 4 different stages are extracted.
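To make the four stage outputs concrete, the sketch below computes the feature map shape after each stage for a 224 × 224 input. The patch size of 4 and the initial channel count of 96 are assumptions borrowed from the commonly used Swin-T configuration and are not stated in this document; only the halving of resolution and doubling of channels between stages follows from the patch merging description.

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return (height, width, channels) of the feature map after each stage.

    patch_size and embed_dim are assumed values (Swin-T style), not taken
    from the source. Every stage after the first starts with patch merging,
    which halves the side length and doubles the channel count.
    """
    side = image_size // patch_size
    channels = embed_dim
    shapes = []
    for stage in range(num_stages):
        if stage > 0:
            side //= 2
            channels *= 2
        shapes.append((side, side, channels))
    return shapes


stage_shapes = swin_stage_shapes()
for name, shape in zip(["S2", "S3", "S4", "S5"], stage_shapes):
    print(name, shape)
# S2 (56, 56, 96)
# S3 (28, 28, 192)
# S4 (14, 14, 384)
# S5 (7, 7, 768)
```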

S103, fusing the feature maps of adjacent stages in the manner of an FPN (feature pyramid network) to obtain 4 feature maps P2, P3, P4 and P5 of different resolutions;

In step S103, a 1x1 convolution is performed on the feature map S5 output by stage 4 to obtain a feature map F5, and a 3x3 convolution is performed on F5 to obtain a feature map P5. A 1x1 convolution is performed on the feature map S4 output by stage 3 to obtain a feature map F4; F5 is up-sampled by a factor of 2 and fused with F4, and a 3x3 convolution is performed to obtain a feature map P4. A 1x1 convolution is performed on the feature map S3 output by stage 2 to obtain a feature map F3; F4 is up-sampled by a factor of 2 and fused with F3, and a 3x3 convolution is performed to obtain a feature map P3. A 1x1 convolution is performed on the feature map S2 output by stage 1 to obtain a feature map F2; F3 is up-sampled by a factor of 2 and fused with F2, and a 3x3 convolution is performed to obtain a feature map P2.
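The top-down "up-sample by 2 and fuse" pathway can be sketched on single-channel maps as follows. Several simplifications are assumed here and are not taken from the source: the 1x1 lateral convolutions are treated as already applied, the 3x3 output convolutions are omitted, up-sampling is nearest-neighbour, fusion is element-wise addition, and the fused map at each level is the one carried down to the next level, as in the usual FPN design.

```python
def upsample_2x(feature_map):
    """Nearest-neighbour 2x up-sampling of a 2D map."""
    out = []
    for row in feature_map:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise addition of two maps of equal size (assumed fusion)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fpn_top_down(laterals):
    """laterals: [F2, F3, F4, F5], finest first. Returns fused maps, same order."""
    merged = [None] * len(laterals)
    merged[-1] = laterals[-1]  # the coarsest level has nothing above it
    for level in range(len(laterals) - 2, -1, -1):
        merged[level] = add_maps(laterals[level], upsample_2x(merged[level + 1]))
    return merged


def constant_map(side, value):
    return [[value] * side for _ in range(side)]


# Toy laterals with sides 8, 4, 2, 1 so each level is half the previous one.
laterals = [constant_map(8, 1.0), constant_map(4, 10.0),
            constant_map(2, 100.0), constant_map(1, 1000.0)]
merged = fpn_top_down(laterals)
print(merged[0][0][0])  # 1111.0: every coarser level contributes to the finest map
```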

S104, inputting the feature maps P2, P3, P4 and P5 into the RPN (region proposal network), and obtaining a predetermined number of proposal boxes according to the size of the original image, with multiple candidate proposal boxes generated for each object;

In step S104, the original image and the feature maps are input into the candidate box generator to generate a certain number of candidate boxes with aspect ratios of 1:1, 1:2 and 2:1; regression prediction is then used to obtain the proposal boxes, and non-maximum suppression is applied to obtain the filtered proposal boxes.
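Candidate box generation with the three aspect ratios and the non-maximum suppression filter can be sketched as follows. The anchor area, the reading of each ratio as height:width, the IoU threshold of 0.5 and the example boxes and scores are all assumptions chosen for illustration; the regression step that refines the boxes is not shown.

```python
import math


def make_anchors(center_x, center_y, area, ratios=(1.0, 0.5, 2.0)):
    """Boxes (x1, y1, x2, y2) of equal area with height:width of 1:1, 1:2 and 2:1."""
    boxes = []
    for ratio in ratios:
        width = math.sqrt(area / ratio)
        height = width * ratio
        boxes.append((center_x - width / 2, center_y - height / 2,
                      center_x + width / 2, center_y + height / 2))
    return boxes


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Visit boxes by descending score; keep one only if it does not overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for index in order:
        if all(iou(boxes[index], boxes[k]) <= iou_threshold for k in kept):
            kept.append(index)
    return kept


anchors = make_anchors(center_x=100, center_y=100, area=32 * 32)
proposals = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
kept = non_maximum_suppression(proposals, scores=[0.9, 0.8, 0.7])
print(kept)  # [0, 2]: the second box overlaps the first too strongly and is dropped
```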

S105, inputting the candidate proposal boxes into a regressor for binary classification, then pooling them with RoI Align, performing classification and regression through the fully connected network to obtain the detection result and the corresponding loss, and performing defect instance segmentation and labelling with the mask_branch;

In step S105, the candidate proposal boxes are input into RoI Align for pooling and then into the fully connected network for classification and candidate box regression; outputs with 6 channels and 24 channels are obtained, representing the classification and regression results respectively, and the steel surface defect detection loss is calculated. The mask_branch performs pixel alignment and up-sampling to restore each feature map to the original size with 6 channels, yielding the segmentation labels of the defect instances.
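One way to read the 6-channel and 24-channel outputs is sketched below. It assumes that the 24 regression values are 4 box coordinates for each of the 6 defect classes (class-specific box regression); the source gives only the channel counts, so this layout, the function name and the example numbers are hypothetical.

```python
NUM_CLASSES = 6
COORDS_PER_BOX = 4  # assumed layout: 24 = 6 classes x 4 box values


def decode_head_outputs(class_scores, box_outputs):
    """Pick the highest-scoring class and the 4 box values stored for that class."""
    best_class = max(range(len(class_scores)), key=lambda i: class_scores[i])
    start = best_class * COORDS_PER_BOX
    return best_class, box_outputs[start:start + COORDS_PER_BOX]


# Made-up outputs for a single proposal, for illustration only.
class_scores = [0.1, 0.05, 2.3, 0.4, -1.0, 0.2]
box_outputs = [float(i) for i in range(NUM_CLASSES * COORDS_PER_BOX)]

predicted_class, predicted_box = decode_head_outputs(class_scores, box_outputs)
print(predicted_class, predicted_box)  # 2 [8.0, 9.0, 10.0, 11.0]
```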

S106, adjusting the weights of the backbone network based on the multi-head self-attention mechanism and the Swin Transformer by stochastic gradient descent (SGD) according to the calculated loss; when the number of iterations reaches the set value, training ends and a weight file is obtained, after which the test set is predicted;

In step S106, stochastic gradient descent (SGD) is used. The number of pictures input per batch during training, batch_size, is set to 2; the learning rate is set to 0.00125 × batch_size × gpu_nums, where gpu_nums is the number of GPUs; the momentum is set to 0.9; and the weight decay is set to 0.00005-0.0001. The weights of the whole Swin Transformer architecture based on the multi-head self-attention mechanism are adjusted through the loss function. The number of iterations is set to 12-24; when it is reached, training ends and the weight file is obtained. The weight file is loaded into the Swin Transformer architecture based on the multi-head self-attention mechanism, and the pictures of the test set are predicted to obtain the position, type and confidence of the steel surface defects.
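The learning rate rule and a single SGD update with the stated momentum and weight decay can be sketched as follows. The single-GPU setting, the toy weights and gradients, and the particular momentum formulation (velocity accumulates the decayed gradient, then the weights move by the learning rate times the velocity) are assumptions; the source does not specify which SGD variant is used.

```python
def learning_rate(batch_size, gpu_nums, base=0.00125):
    """Learning rate rule from step S106: 0.00125 x batch_size x gpu_nums."""
    return base * batch_size * gpu_nums


def sgd_step(weights, grads, velocity, lr, momentum=0.9, weight_decay=0.0001):
    """One SGD update with momentum and L2 weight decay; velocity is updated in place."""
    new_weights = []
    for i, (w, g) in enumerate(zip(weights, grads)):
        g = g + weight_decay * w                    # L2 weight decay term
        velocity[i] = momentum * velocity[i] + g    # momentum accumulation
        new_weights.append(w - lr * velocity[i])
    return new_weights


lr = learning_rate(batch_size=2, gpu_nums=1)  # a single GPU is assumed here
weights = [1.0, -2.0]
velocity = [0.0, 0.0]
weights = sgd_step(weights, grads=[0.5, 0.5], velocity=velocity, lr=lr)
print(lr)  # 0.0025
```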

For ease of understanding, step S102 is explained in detail below. Step S102 specifically includes the following steps:

s201, defining a swin _ transformer _ block basic module.

Firstly, carrying out normalized LayerNorm operation on an input feature map, then fusing the feature map obtained by operation with the original feature map by using a fixed window self-attention mechanism, and recording the fused feature map as a fused feature map. And then, performing normalized LayerNorm operation and multi-layer perceptron MLP operation on the fused feature map, and performing secondary fusion on the operated features and the fused feature map to obtain a secondary fused feature map which is used as the input of the next step.

And secondly, carrying out normalized LayerNorm operation on the secondary fusion characteristic diagram obtained in the first step, and then fusing the characteristic diagram obtained by operation with the original characteristic diagram by using a self-attention mechanism of a migration window to be recorded as a fusion characteristic diagram. And then, performing second fusion on the operated features and the fusion feature map by using normalized LayerNorm operation and multi-layer perceptron MLP operation on the fusion feature map to obtain a secondary fusion feature map.

The fixed-window attention mechanism partitions the feature map into windows of size 7x7 and performs a multi-head self-attention operation within each window. The shifted-window attention mechanism also partitions the feature map into windows of size 7x7 but shifts the windows so that information is exchanged between different windows, namely by moving rows 0, 1, 2 to the rightmost side of the feature map and columns 0, 1, 2 to the bottommost side of the feature map.
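Window partitioning and window shifting can be sketched with NumPy as below. The sketch assumes a shift of 3 positions (matching rows and columns 0, 1, 2) implemented as a cyclic roll, in which the first three rows wrap around to the bottom and the first three columns wrap around to the right; the function names and this wrap-around convention are assumptions, not details confirmed by the text.

```python
import numpy as np


def window_partition(x, window_size=7):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window_size, window_size, C).
    H and W are assumed to be multiples of window_size.
    """
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)


def cyclic_shift(x, shift=3):
    """Cyclically roll the feature map by `shift` positions along height and width."""
    return np.roll(x, shift=(-shift, -shift), axis=(0, 1))


feature_map = np.arange(56 * 56 * 96, dtype=np.float64).reshape(56, 56, 96)
fixed_windows = window_partition(feature_map)                  # fixed-window partition
shifted_windows = window_partition(cyclic_shift(feature_map))  # shifted-window partition
```

After the roll, each 7x7 window covers pixels that belonged to different windows in the fixed partition, which is what allows information to pass between neighbouring windows.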

A multi-head self-attention operation is then performed on each partitioned window. Each window is regarded as an input feature map; the length of the input feature map sequence is 4i, and the input nodes are x_i. Each input x_i is mapped to a vector a_i by Input Embedding. The vector a_i is passed through three transformation matrices, namely the query matrix W_q, the key matrix W_k and the value matrix W_v, to obtain the corresponding query vector q^i, key vector k^i and value vector v^i. According to the number of heads h, the vectors q^i, k^i and v^i are each split evenly into h parts, one part per head. For each head h_j, the query vector q^{i,h_j} is matched against the key vector k^{i,h_j}, while v^i represents the information extracted from the input vector a_i: q^i is combined with each k^i by a dot product and the result is divided by d^{1/2} to obtain the corresponding attention weight α, where d is the length of the key vector k^i. The matching results are written uniformly in matrix-multiplication form, and softmax is applied to each row to obtain the weights of the value vectors v. The value vectors are then weighted by these weights, again written in matrix-multiplication form. Finally, the results of all heads are concatenated (concat), and the concatenated result is fused through a trainable weight matrix W^o to obtain the final new feature map b_i.
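The multi-head self-attention computation for one window can be sketched in NumPy as follows. The sketch assumes a 7x7 window flattened to 49 tokens, 96 channels and 3 heads, takes d to be the per-head key length, and uses randomly initialised matrices in place of trained weights; all names are illustrative.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(a, W_q, W_k, W_v, W_o, num_heads):
    """a: (n, c) embedded tokens of one window. Returns (b, alpha)."""
    n, c = a.shape
    d = c // num_heads  # per-head length of the key vectors (assumption)

    def split(m):
        # Project, then split the channels evenly across the heads: (heads, n, d)
        return (a @ m).reshape(n, num_heads, d).transpose(1, 0, 2)

    q, k, v = split(W_q), split(W_k), split(W_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # dot products divided by d^(1/2)
    alpha = softmax(scores, axis=-1)                # row-wise softmax -> attention weights
    heads = alpha @ v                               # weighted sum of the value vectors
    concat = heads.transpose(1, 0, 2).reshape(n, c) # concatenate the heads
    return concat @ W_o, alpha                      # fuse through W^o


rng = np.random.default_rng(0)
tokens = rng.normal(size=(49, 96))
W_q, W_k, W_v, W_o = (rng.normal(scale=0.1, size=(96, 96)) for _ in range(4))
b, alpha = multi_head_self_attention(tokens, W_q, W_k, W_v, W_o, num_heads=3)
```

Each row of `alpha` sums to 1, so every output token is a weighted average of the value vectors in the same window before the final fusion through `W_o`.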

S202, defining a patch_partition basic module, which creates the sub-image patches.

The input picture is stored as a tensor x_i of size 224x224x3 (H_i x W_i x C_i). In this example, the original image is divided using a patch size of 4x4 (patch_size = 4) into 56x56 patches, each covering 4x4 pixels over 3 channels, and the number of heads is set to 3. Each patch is flattened into one dimension, giving a channel number C_patch = 16x3 = 48, and the pixel sequences are then reassembled according to patch position; that is, the original image is downsampled by a factor of patch_size.
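The patch creation of S202 can be sketched with a reshape, assuming the image is stored as a NumPy array of shape (H, W, C) whose height and width are multiples of patch_size. The function name is hypothetical.

```python
import numpy as np


def patch_partition(image, patch_size=4):
    """Flatten each patch_size x patch_size patch into one channel vector.

    (H, W, C) -> (H / patch_size, W / patch_size, patch_size * patch_size * C)
    """
    H, W, C = image.shape
    x = image.reshape(H // patch_size, patch_size, W // patch_size, patch_size, C)
    x = x.transpose(0, 2, 1, 3, 4)  # bring the two patch-grid axes to the front
    return x.reshape(H // patch_size, W // patch_size, patch_size * patch_size * C)


image = np.arange(224 * 224 * 3, dtype=np.float64).reshape(224, 224, 3)
patches = patch_partition(image)  # 56 x 56 grid, 48 values per patch
```

For a 224x224x3 input this yields a 56x56x48 array, which matches the input size given for Stage 1.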

S203, defining a Stage 1 module, whose input is the 56x56x48 patch feature map; the number of heads is set to 3.

First, embedding mapping and a LayerNorm operation are applied to the patches, turning them into a feature map with channel number C = 96. This process can also be understood as dividing the original image into patches, with input channel number C_input = 3 and output channel number C_output = 96: a convolution is applied to the input picture with kernel_size equal to patch_size and a stride of 4, yielding a 56x56x96 feature map. This feature map is taken as the input of swin_transformer_block; the output feature map is passed through swin_transformer_block again, and the result is the input of Stage 2.
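The embedding step of Stage 1 can be sketched as a linear map applied to each flattened patch followed by LayerNorm. Applying one linear map to every flattened 4x4x3 patch is equivalent to a 4x4 convolution with stride 4 on the image, up to how the weights are arranged. The weights below are random placeholders, and the function names are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """LayerNorm over the channel (last) axis, without learnable parameters."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def linear_embedding(patches, weight, bias):
    """Map (h, w, 48) flattened patches to (h, w, 96) features, then normalize."""
    return layer_norm(patches @ weight + bias)


rng = np.random.default_rng(0)
patches = rng.normal(size=(56, 56, 48))        # output of the patch partition step
weight = rng.normal(scale=0.1, size=(48, 96))  # placeholder for trained weights
bias = np.zeros(96)
embedded = linear_embedding(patches, weight, bias)
```

The result has shape 56x56x96, the feature-map size that Stage 1 passes to swin_transformer_block.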

S204, defining a patch_clustering basic module, whose input is a feature map and whose output is that feature map downsampled by a factor of 2.

First, a feature map of size 4x4 is selected, and every 2x2 block of adjacent pixels is treated as one patch; the pixels at the same position in each patch are then gathered together to obtain four 2x2 feature maps. The four feature maps are concatenated along the depth (channel) direction, and a LayerNorm operation followed by a linear transformation in the depth direction through a fully connected layer is applied, so that the channel depth of the feature map is doubled (for example, from 96 to 192 between Stage 1 and Stage 2).
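The patch_clustering operation can be sketched with strided slicing, assuming the feature map is a NumPy array of shape (H, W, C) with even H and W. The order in which the four sub-maps are concatenated and the random reduction weights are assumptions made for illustration.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """LayerNorm over the channel (last) axis, without learnable parameters."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gather_same_positions(x):
    """Collect the pixels at the same position of every 2x2 patch into four sub-maps."""
    return [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]]


def patch_clustering(x, weight):
    """(H, W, C) -> (H/2, W/2, 2C): concatenate to 4C, LayerNorm, linear map to 2C."""
    merged = np.concatenate(gather_same_positions(x), axis=-1)  # (H/2, W/2, 4C)
    return layer_norm(merged) @ weight                          # weight: (4C, 2C)


rng = np.random.default_rng(0)
stage1_out = rng.normal(size=(56, 56, 96))
reduction = rng.normal(scale=0.1, size=(4 * 96, 2 * 96))  # placeholder weights
stage2_in = patch_clustering(stage1_out, reduction)
```

For a 4x4 single-channel input, `gather_same_positions` returns four 2x2 maps, as in the example described in S204; for the 56x56x96 Stage 1 output the result is 28x28x192.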

S205, defining a Stage 2 module, whose input is the 56x56x96 feature map; the number of heads is set to 6.

First, a patch_clustering operation is performed on the feature map to obtain a 28x28x192 feature map. This feature map is taken as the input of swin_transformer_block; the output feature map is passed through swin_transformer_block again, and the result is the input of Stage 3.

S206, defining a Stage 3 module, whose input is the 28x28x192 feature map; the number of heads is set to 12.

First, a patch_clustering operation is performed on the feature map to obtain a 14x14x384 feature map. This feature map is taken as the input of swin_transformer_block; the output feature map is passed through swin_transformer_block again, and the result is then cycled through swin_transformer_block 2 more times to obtain the feature map that is the input of Stage 4.

S207, defining a Stage 4 module, whose input is the 14x14x384 feature map; the number of heads is set to 24.

First, a patch_clustering operation is performed on the feature map to obtain a 7x7x768 feature map (2x downsampling with the channel depth doubled, as defined in S204). This feature map is taken as the input of swin_transformer_block, and the output feature map is passed through swin_transformer_block again.
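The feature-map sizes of the four stages follow from two rules stated in S202 to S204: the patch step divides height and width by patch_size, and each later stage halves height and width while doubling the channel depth. A small helper, with a hypothetical name and parameter names, makes the arithmetic explicit.

```python
def swin_stage_shapes(image_size=224, patch_size=4, embed_dim=96, num_stages=4):
    """Return the (height, width, channels) of the feature map produced by each stage."""
    side = image_size // patch_size
    channels = embed_dim
    shapes = []
    for stage in range(num_stages):
        if stage > 0:
            side //= 2      # 2x downsampling by patch_clustering
            channels *= 2   # channel depth doubled by the fully connected layer
        shapes.append((side, side, channels))
    return shapes


stage_shapes = swin_stage_shapes()
```

Under these rules a 224x224 input gives 56x56x96, 28x28x192, 14x14x384 and 7x7x768 for Stages 1 to 4.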

To evaluate the detection performance of the steel surface defect detection method of the invention on steel surface defect pictures, the mAP (mean Average Precision) evaluation index is used, that is, the mean of the Average Precision (AP) over the different defect categories, where the AP is the area under the precision-recall curve. The invention measures defect detection performance by the mAP at an Intersection over Union (IoU) threshold of 0.5, where IoU is the ratio of the area of intersection to the area of union of a predicted box and a ground-truth box. The detection results are shown in FIG. 3.
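The IoU criterion can be sketched as follows, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max). The box format and the function names are assumptions, and whether a prediction at exactly the threshold counts as a match is a convention chosen here for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_match(predicted_box, ground_truth_box, threshold=0.5):
    """A prediction counts as a match when its IoU reaches the threshold."""
    return iou(predicted_box, ground_truth_box) >= threshold
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 area and have a union of 7, so their IoU is 1/7 and they do not match at the 0.5 threshold.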

The invention provides a new method for detecting steel surface defects that uses a multi-head self-attention mechanism, a Swin Transformer backbone network and a feature pyramid network (FPN) as the means of feature extraction. This effectively addresses the loss of information in small pixel regions, improves detection accuracy, advances steel surface defect detection technology, and has practical value for wider adoption. In addition, compared with other common deep learning algorithms, the Swin Transformer architecture adopted here has a simple structure, higher prediction accuracy and high prediction speed, can detect and classify a variety of surface defects, and can accurately identify even tiny defects.

The invention also provides a steel surface defect detection device corresponding to the steel surface defect detection method, the device comprising:

the training unit is used for constructing a mask region convolutional neural network (Mask RCNN) based on a multi-head attention mechanism and a Swin Transformer, and training the Mask RCNN by using the training set acquired by the data set unit; the Mask RCNN comprises a fixed-window multi-head self-attention module, a shifted-window multi-head self-attention module, a fully connected module, a layer normalization module, a patch merging module, a feature pyramid network FPN, a region proposal network RPN, region-of-interest alignment RoI Align and a fully connected network; the Mask RCNN uses the multi-head self-attention mechanism and the Swin Transformer as a backbone network, extracts fused feature maps by using the FPN, extracts regions of interest by using the RPN, obtains candidate proposal boxes by RoI Align, determines the category and extracts the proposal box by using a fully connected layer and a linear layer, and obtains the defect instance segmentation through the mask branch;

Wherein, the training unit specifically includes:

Because the steel surface defect detection device of this embodiment corresponds to the steel surface defect detection method of the above embodiment, its description is relatively brief; for the relevant similarities, refer to the description of the method in the above embodiment, which is not repeated here.

The embodiment of the invention also discloses a computer readable storage medium, wherein a computer instruction set is stored in the computer readable storage medium, and when being executed by a processor, the computer instruction set realizes the steel surface defect detection method provided by any one of the above embodiments.

In the embodiments provided in the present invention, it should be understood that the disclosed technical contents can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for detecting surface defects of a steel material, the method comprising:

2. The method for detecting the surface defects of the steel products as claimed in claim 1, wherein the training of the Mask RCNN by the training set comprises:

inputting the steel surface defect pictures in the training set into a backbone network with a self-attention mechanism and a Swin Transformer, and extracting feature maps of 4 different stages;

3. The method for detecting the surface defects of the steel products as claimed in claim 2, wherein the 4 feature maps of different stages are fused by using an FPN mode to obtain 4 feature maps of different resolutions, comprising:

4. The method as claimed in claim 2, wherein the step of inputting the 4 feature maps with different resolutions into the RPN to obtain a predetermined number of candidate suggestion boxes according to the size of the original map comprises:

5. The method for detecting the surface defect of the steel material according to claim 1, wherein the preprocessing of the surface defect picture of the steel material in the data set comprises:

6. The method for detecting the surface defects of the steel products according to claim 2, wherein inputting the steel surface defect pictures in the training set into a backbone network with a self-attention mechanism and a Swin Transformer and extracting feature maps of 4 different stages comprises the following steps:

constructing a swin_transformer_block basic module;

constructing a patch_clustering basic module, wherein the input of the patch_clustering basic module is a feature map with the size of 4x4; taking every 2x2 block of adjacent pixels as one patch, and gathering the pixels at the same position in each patch to obtain four 2x2 feature maps; concatenating the four feature maps in the depth channel direction, doubling the channel depth of the feature map through a layer normalization operation and a linear transformation of a fully connected layer in the depth direction, and outputting a feature map downsampled by a factor of 2;

7. The method for detecting the surface defects of the steel products as claimed in claim 6, wherein the step of constructing a swin_transformer_block basic module comprises the following steps:

performing a LayerNorm normalization operation on the input feature map, then fusing the feature map obtained by applying the fixed-window self-attention mechanism with the original feature map, and recording the result as a first fused feature map; the fixed-window attention mechanism partitions the feature map into windows with the size of 7x7 and performs a multi-head self-attention operation within each window;

8. A steel surface defect detection device, characterized in that the device includes:

the training unit is used for constructing a mask region convolutional neural network (Mask RCNN) based on a multi-head attention mechanism and a Swin Transformer, and training the Mask RCNN by using the training set acquired by the data set unit; the Mask RCNN comprises a fixed-window multi-head self-attention module, a shifted-window multi-head self-attention module, a fully connected module, a layer normalization module, a patch merging module, a feature pyramid network FPN, a region proposal network RPN, region-of-interest alignment RoI Align and a fully connected network; the Mask RCNN uses the multi-head self-attention mechanism and the Swin Transformer as a backbone network, extracts fused feature maps by using the FPN, extracts regions of interest by using the RPN, obtains candidate proposal boxes by RoI Align, determines the category and extracts the proposal box by using a fully connected layer and a linear layer, and obtains the defect instance segmentation through the mask branch;

9. A steel surface defect detecting apparatus as claimed in claim 1, wherein said training unit comprises:

10. A computer readable storage medium storing a set of computer instructions, wherein the set of computer instructions is executed by a processor to implement the method for detecting surface defects of steel products according to any one of claims 1 to 8.