CN117523298A - Steel surface defect detection method and system - Google Patents
- Publication number
- CN117523298A (application CN202311551599.3A)
- Authority
- CN
- China
- Prior art keywords
- steel surface
- image
- detection
- surface defect
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
- G06V10/20—Image preprocessing
- G06V10/40—Extraction of image or video features
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
Abstract
The invention discloses a steel surface defect detection method and system. The method comprises the following steps: acquiring a steel surface image and preprocessing it to obtain a processed image; extracting features from the processed image; performing target detection on the extracted features to obtain a steel surface defect image; and outputting the steel surface defect image to complete steel surface defect detection. The invention introduces the ECA attention mechanism into the backbone network and the neck network, which improves detection accuracy, especially for defect detection tasks in complex scenes. Meanwhile, Ghost convolution is introduced into the neck network to make the model lightweight, improving real-time performance and reducing computational load so that the model better meets the rapid-detection requirements of actual industrial environments. The invention also introduces the FcaNet attention mechanism at the detection head to enhance feature representation and discrimination capability.
Description
Technical Field
The invention relates to the field of computer vision, in particular to a steel surface defect detection method and system.
Background
Traditional steel surface defect detection methods rely mainly on manual visual inspection or on specific instruments. These methods are generally time-consuming and costly, and demand a high level of expertise and experience from the operator. To improve detection efficiency and accuracy, deep learning has been widely applied to steel surface defect detection in recent years.
However, prior-art solutions generally have complex network structures containing large numbers of parameters and requiring heavy computation. As a result, model training and inference demand substantial computational resources, limiting real-time performance and efficiency in practical applications. They are also insensitive to small defects: owing to limitations of the network structure and training data, existing solutions may struggle to detect small defects or complex textures, so some small or hidden defects may go undetected.
These drawbacks stem mainly from the limitations of prior-art solutions in data acquisition, network architecture design and algorithm optimization. Further improvement and optimization of the algorithms is therefore needed to increase the accuracy, real-time performance and adaptability of steel surface defect detection.
Disclosure of Invention
To solve the technical problems described in the background, the invention addresses the poor real-time performance and low detection precision of steel surface defect detection through improvements to the algorithm and system design, thereby improving the efficiency and reliability of the whole detection process.
In order to achieve the above object, the present invention provides a method for detecting a defect on a steel surface, comprising the steps of:
acquiring a steel surface image and preprocessing to obtain a processed image;
extracting features of the processed image;
performing target detection on the extracted features to obtain a steel surface defect image;
outputting the steel surface defect image to finish steel surface defect detection.
Preferably, the method for performing the preprocessing comprises: adjusting the size, brightness and contrast of the acquired steel surface image and removing noise to obtain the processed image.
Preferably, the method for extracting features comprises: introducing the ECA attention mechanism into the backbone network and neck network of the YOLOv5s structure, and simultaneously introducing Ghost convolution into the neck network of the YOLOv5s structure, to complete feature extraction.
Preferably, the method for performing the target detection comprises: detecting with a detection head on the basis of the extracted features; the detection head consists of a convolutional layer and a fully connected layer.
Preferably, in the target detection process, FcaNet (based on SENet) combined with the ASFF structure is adopted for multi-scale fusion, improving small-target detection precision.
The invention also provides a steel surface defect detection system, which is used for realizing the method and comprises the following steps: the device comprises an acquisition module, an extraction module, a detection module and an output module;
the acquisition module is used for acquiring the steel surface image and preprocessing the steel surface image to obtain a processed image;
the extraction module is used for extracting the characteristics of the processed image;
the detection module is used for carrying out target detection on the extracted characteristics to obtain a steel surface defect image;
the output module is used for outputting the steel surface defect image to finish steel surface defect detection.
Preferably, the preprocessing process includes: adjusting the size, brightness and contrast of the acquired steel surface image and removing noise to obtain the processed image.
Preferably, the workflow of the extraction module includes: introducing the ECA attention mechanism into the backbone network and neck network of the YOLOv5s structure, and simultaneously introducing Ghost convolution into the neck network of the YOLOv5s structure, to complete feature extraction.
Compared with the prior art, the invention has the following beneficial effects:
the invention introduces an ECA attention mechanism in the backbone network and the neck network, can improve the detection accuracy, and is particularly used in the defect detection task in a complex scene. Meanwhile, a Ghost convolution is introduced into the neck network, so that the weight of the model is reduced. The real-time performance can be improved, the calculation load can be reduced, and the model is more suitable for the rapid detection requirement in the actual industrial environment. The invention also introduces FcaNet attention mechanism at the detection head to enhance feature representation and discrimination capability. In addition, the invention also enhances the expression capability of the model to defect characteristics with different scales by introducing an ASFF structure, thereby improving the detection robustness.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the embodiments are briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method according to an embodiment of the invention.
FIG. 2 is a schematic diagram of the structure of YOLOv5s according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a network structure of a YOLO-EGA according to an embodiment of the invention;
FIG. 4 is a schematic diagram comparing the C3 structure and the C3Ghost structure according to an embodiment of the present invention; wherein a is the original C3 structure and b is the C3Ghost structure of the present invention;
fig. 5 is a schematic diagram of an ASFF structure according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Before proceeding with the description, the original YOLOv5s target detection algorithm is first described.
YOLOv5s is a lightweight object detection algorithm (fig. 2), and its network structure includes an input layer, a backbone network, a neck network (feature pyramid), a detection head, and an output layer.
The input layer accepts the input image and scales it to a predefined input size of 224×224. In YOLOv5, training is performed with Mosaic data augmentation, which mainly comprises the following steps:
1. Randomly select the stitching reference point coordinates (x_c, y_c).
2. Randomly select four pictures.
3. Place the four pictures at the upper-left, upper-right, lower-left and lower-right positions on a large picture of the specified size, according to the reference point and after size adjustment.
4. Map each picture's size transformation onto its labels.
5. Stitch the large image according to the specified coordinates, and handle detection-box coordinates that extend beyond the boundary. Through Mosaic data augmentation, YOLOv5 can effectively exploit multiple image samples at once, improving the generalization capability and detection accuracy of the model.
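The placement arithmetic behind these steps can be sketched in a few lines (an illustrative simplification, not code from the patent: only the paste regions on the mosaic canvas and the label remapping of step 5 are computed, and image pixels are omitted):

```python
def mosaic_regions(canvas, xc, yc, sizes):
    """Regions occupied on a (canvas x canvas) mosaic by four images placed
    top-left, top-right, bottom-left, bottom-right of the stitching point
    (xc, yc); regions are clipped to the canvas."""
    regions = []
    for i, (w, h) in enumerate(sizes):
        if i == 0:    # top-left: image's bottom-right corner meets (xc, yc)
            r = (max(xc - w, 0), max(yc - h, 0), xc, yc)
        elif i == 1:  # top-right
            r = (xc, max(yc - h, 0), min(xc + w, canvas), yc)
        elif i == 2:  # bottom-left
            r = (max(xc - w, 0), yc, xc, min(yc + h, canvas))
        else:         # bottom-right
            r = (xc, yc, min(xc + w, canvas), min(yc + h, canvas))
        regions.append(r)
    return regions

def remap_box(box, offset, canvas):
    """Step 5: shift a label box by its picture's paste offset and clip
    detection-box coordinates that extend beyond the mosaic boundary."""
    ox, oy = offset
    x1, y1, x2, y2 = box
    clip = lambda v, hi: min(max(v, 0), hi)
    return (clip(x1 + ox, canvas), clip(y1 + oy, canvas),
            clip(x2 + ox, canvas), clip(y2 + oy, canvas))
```

For example, with a 640×640 canvas and stitching point (300, 300), a 400×400 picture in the top-left slot occupies (0, 0, 300, 300), i.e. it is cropped by the canvas edge.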
Next, the backbone network is composed of a series of convolutional layers and residual blocks for extracting image features. The features are processed by a feature pyramid module to obtain feature maps of different scales for target detection.
In YOLOv5s, the backbone network is responsible for extracting image features at different levels and consists of modules such as CBS, C3 and SPPF. The CBS layer consists of convolution (Conv), Batch Normalization (BN) and the SiLU activation function. The C3 module includes three standard convolution layers and multiple bottleneck blocks. SPPF uses pooling kernels of 5×5 and 1×1 to enlarge the receptive field and accept input images of arbitrary aspect ratio and size.
In the neck network, an FPN+PAN structure is employed to improve the performance of the model. FPN (Feature Pyramid Network) delivers deep semantic features to the shallow layers in order to better handle targets of different scales. Meanwhile, PAN (Path Aggregation Network) transmits shallow-layer localization information to the deep layers and fuses features from the different backbone and detection layers. This structure is designed to capture more of the key information in the image through multi-level feature fusion, improving the model's performance on the target detection task. Through this optimization, the YOLOv5s neck network achieves better performance and accuracy in detection tasks.
The detection head predicts the position and category of targets from the feature maps; it comprises a group of convolution layers and fully connected layers that convert the backbone's output feature maps into predictions of target bounding-box positions and categories. In the YOLOv5 head, the different-size feature maps of the neck network are feature-expanded by 1×1 convolutions to handle targets at different scales. The number of feature channels after expansion is (number of categories + 5) × the number of anchors on each detection layer, where the 5 corresponds to the center-point coordinates, width, height and confidence of the prediction box. There are 3 detection layers, one feature map for each, and the grid size on each feature map is preset by YOLOv5. Each grid cell is preset with 3 anchors of different aspect ratios for predicting target positions and categories, so the channel dimension can store all position and category information based on the anchor prior boxes.
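The channel arithmetic above can be checked with a small sketch (the 6-class example is hypothetical, chosen to resemble a typical steel-defect taxonomy; it is not a figure stated in this document):

```python
def head_channels(num_classes, num_anchors=3):
    """Output channels of one detection layer: each of the num_anchors
    anchors predicts (x, y, w, h, confidence) plus one score per class."""
    return (num_classes + 5) * num_anchors
```

With 6 hypothetical defect classes this gives (6 + 5) × 3 = 33 channels per detection layer; COCO's 80 classes give the familiar 255-channel YOLOv5 output.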
The target frame regression calculation formula for YOLOv5 is as follows:
b_x = 2σ(t_x) − 0.5 + c_x
b_y = 2σ(t_y) − 0.5 + c_y
b_w = p_w × (2σ(t_w))²
b_h = p_h × (2σ(t_h))²
wherein (b_x, b_y, b_w, b_h) are the center-point coordinates, width and height of the prediction box; (c_x, c_y) are the upper-left corner coordinates of the grid cell containing the prediction box's center; (t_x, t_y) are the offsets of the prediction box's center relative to that upper-left corner; (t_w, t_h) are the scalings of the prediction box's width and height relative to the anchor; and (p_w, p_h) are the width and height of the prior-box anchor.
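The decoding above can be written out directly (a sketch; σ is the logistic sigmoid, and coordinates are in grid units):

```python
import math

def sigmoid(x):
    """Logistic sigmoid, the sigma in the regression formulas."""
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw predictions (tx, ty, tw, th) into a box given the grid
    cell's upper-left corner (cx, cy) and the anchor size (pw, ph)."""
    bx = 2 * sigmoid(tx) - 0.5 + cx      # center offset lies in (-0.5, 1.5)
    by = 2 * sigmoid(ty) - 0.5 + cy
    bw = pw * (2 * sigmoid(tw)) ** 2     # width scale lies in (0, 4)
    bh = ph * (2 * sigmoid(th)) ** 2
    return bx, by, bw, bh
```

At zero raw outputs the box center sits half a cell inside the grid cell and the size exactly matches the anchor: decode_box(0, 0, 0, 0, 3, 4, 2, 5) gives (3.5, 4.5, 2.0, 5.0).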
The loss function of YOLOv5s is a weighted sum of three parts (localization, objectness and classification losses), as shown below:
Loss = λ_1·L_obj + λ_2·L_bbox + λ_3·L_conf
wherein λ_1, λ_2, λ_3 are hyperparameters that balance the three losses; L_obj is the target classification loss; L_conf is the confidence loss indicating whether a target is present at the detection point; and L_bbox is the loss from the deviation between the ground-truth box and the prediction box.
Based on the YOLOv5s target detection algorithm, the invention improves YOLOv5s to raise network performance; the improved network, YOLO-EGA, is shown in fig. 3. In the backbone network, an ECA attention module is introduced to extract more comprehensive and more important target information and filter out redundant information, while adding only a small number of parameters. The ECA attention mechanism is also introduced into the feature-pyramid fusion process of the neck network, strengthening the representation of lower-level features and making the neck network more sensitive to the defect detection task. To make the model lightweight while preserving detection precision, the C3 module of the neck network is replaced by a C3Ghost module, which greatly reduces the parameter count and makes the model lighter while maintaining good performance. At the detection head, the invention approaches the problem from the frequency domain: a SENet-based FcaNet attention mechanism is introduced before the detection head. The FcaNet module learns weights for the different feature channels and, according to these weights, strengthens the expression of useful features while suppressing useless ones, thereby highlighting defect regions in the image, weakening background regions and realizing defect detection at multiple scales. In addition, the ASFF structure is introduced into the detection head to effectively integrate multi-level, multi-scale feature information and improve the model's perception and localization accuracy for steel defects. The detailed detection flow of the invention is described in the examples below.
Example 1
As shown in fig. 1, which is a schematic flow chart of a method of an embodiment, the steps include:
s1, acquiring a steel surface image and preprocessing to obtain a processed image.
First, images containing the steel surface are acquired; these may come from a camera, a scanner or other sources. After acquisition, preprocessing operations are performed on the input images, including adjustment of image size, brightness and contrast, and noise removal, to improve the accuracy and robustness of the subsequent algorithm. The specific steps are as follows:
1. Image resizing. Since the required input size is 224×224 pixels, the original image must be resized accordingly. First, compute the scaling ratio from the target size and the original image size; this typically means computing the scale along the length and width so that the image's original aspect ratio is preserved. Then scale the image using the computed ratio; high-quality scaling can usually be achieved with bilinear interpolation or similar methods. Finally, fill the border: if the scaled image does not fill the target size, pad the image with a gray (or other designated color) border so that the final image reaches the target size.
2. Image normalization. Normalize using the mean and standard deviation: subtract the per-channel mean from each channel's pixel values, then divide by the standard deviation. This helps the model learn and converge better.
3. Brightness and contrast adjustment. This can be achieved by a linear transformation, operating on the RGB channels at the pixel level. Brightness adjustment: increase or decrease the RGB value of each pixel. Contrast adjustment: stretch or compress the range of pixel values. Brightness and contrast are adjusted while keeping the image natural and realistic, so that the target object is easier to observe and recognize; the adjusted image should meet the basic requirements of the human eye for visual comfort and object recognition.
4. Noise removal. After the above steps, noise is removed; Gaussian filtering can be used to smooth the image and reduce noise, applying a Gaussian kernel to reduce pixel-value variation in the image.
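The preprocessing steps above can be sketched with NumPy (a minimal illustration under stated assumptions: the 224×224 target from step 1, caller-supplied mean/std for step 2, and a simple linear transform for step 3; Gaussian filtering for step 4 is left to a library routine):

```python
import numpy as np

TARGET = 224  # required network input size (step 1)

def letterbox_params(w, h, target=TARGET):
    """Scale ratio and border offsets that fit a (w x h) image into a
    target x target square while keeping the original aspect ratio;
    the remaining area is filled with a gray border."""
    r = min(target / w, target / h)
    new_w, new_h = round(w * r), round(h * r)
    return r, ((target - new_w) // 2, (target - new_h) // 2)

def normalize(img, mean, std):
    """Step 2: per-channel (pixel - mean) / std normalization."""
    return (img.astype(np.float32) - mean) / std

def adjust_brightness_contrast(img, alpha=1.0, beta=0.0):
    """Step 3: linear pixel transform; alpha stretches or compresses
    contrast, beta shifts brightness, clipped back to [0, 255]."""
    return np.clip(alpha * img.astype(np.float32) + beta, 0.0, 255.0)
```

For instance, a 448×224 image is scaled by 0.5 to 224×112 and then padded with a 56-pixel border at the top and bottom.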
S2, extracting features of the processed image.
The preprocessed image is then feature-extracted using a deep convolutional neural network (CNN). This embodiment adopts a lightweight network structure (YOLOv5s) and introduces the ECA attention mechanism in the feature extraction and fusion stages (backbone network and neck network) to enhance feature expression capability. In addition, Ghost convolution is introduced into the neck network to reduce the model's parameter count and realize a lightweight design. Specifically: by training weights over the channel dimension of the feature layers, the network can focus more on the defective portions of the input picture while suppressing irrelevant information from the surrounding complex environment. Beyond the backbone improvements, this embodiment also optimizes the neck network: the ECA attention mechanism is introduced there as well, and by training weights over the channel dimensions of features at different scales, the network attends more to important feature information, strengthening the feature fusion effect.
In summary, by introducing the ECA attention mechanism, this embodiment effectively improves the network's ability to detect steel surface defects. These optimizations let the network concentrate on key features and reduce interference from irrelevant information, significantly improving the accuracy and robustness of steel surface defect detection.
Meanwhile, to make the model lightweight, this embodiment further improves the original C3 structure in the neck network (as shown in fig. 4): the Ghost module from GhostNet replaces part of the convolution layers in the YOLOv5s network, i.e., the C3Ghost structure replaces the original C3 structure; the comparison between the two is shown in fig. 4. This lightens the model without reducing detection accuracy. The improvement enhances the model's perception of local features, reduces the parameter count, and improves performance under limited computing resources, making the model more efficient and accurate.
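Why the Ghost substitution shrinks the model can be seen from a parameter count (a sketch using the GhostNet paper's default ratio of 2 and a 5×5 cheap depthwise kernel; these defaults are an assumption, not values stated in this document):

```python
def conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def ghost_conv_params(c_in, c_out, k, dw_k=5, ratio=2):
    """Ghost convolution: a primary conv produces c_out/ratio intrinsic
    feature maps; cheap depthwise dw_k x dw_k ops generate the rest."""
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k          # ordinary convolution part
    cheap = intrinsic * (ratio - 1) * dw_k * dw_k  # depthwise "ghost" part
    return primary + cheap
```

For a 256-to-256-channel 3×3 layer, the standard convolution needs 589 824 weights while the Ghost version needs 298 112, roughly a 2× reduction, which is the lightweighting effect exploited here.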
S3, performing target detection on the extracted features to obtain a steel surface defect image.
On the basis of the extracted features, a detection head performs target detection. The detection head typically consists of convolution and fully connected layers that map the extracted features to target detection results. In this process, the embodiment adopts FcaNet (based on SENet) combined with the ASFF structure for multi-scale fusion, improving small-target detection.
Due to the complexity of industrial production environments, steel surface defects may be occluded by complex backgrounds, causing important features to be lost. To address this, channel attention mechanisms have been very successful in computer vision; SENet, for example, is a common choice in deep learning because of its simplicity and efficiency. However, a channel attention mechanism that represents each channel with a scalar can lose a significant amount of information. This embodiment approaches the problem from the frequency domain and uses FcaNet, a SENet-based frequency-domain channel attention module: FcaNet treats the channel representation problem as a compression process using frequency analysis, compressing the channels in the channel attention mechanism with a discrete cosine transform. FcaNet consistently outperforms SENet at the same parameter count and computational cost. The FcaNet attention module learns weights for the different feature channels and, according to these weights, strengthens the expression of useful features while suppressing useless ones. The method uses the two-dimensional DCT to fuse multiple frequency components: the input X is divided into n parts along the channel dimension, and each part is assigned a corresponding two-dimensional DCT frequency component. The formula is:
Freq^i = 2D-DCT^{u_i, v_i}(X^i) = Σ_{h=0}^{H−1} Σ_{w=0}^{W−1} X^i_{:,h,w} · B^{u_i, v_i}_{h,w},   s.t. i ∈ {0, 1, …, n−1}
wherein H is the height of the input feature map; W is the width of the input feature map; X is the input image feature tensor; u_i, v_i are the indices of the two-dimensional frequency component corresponding to X^i; B^{u_i, v_i} is the corresponding DCT basis function; Freq^i is the compressed C′-dimensional vector; and C′ = C/n.
FcaNet channel attention connects the 3 different-scale output layers of the feature fusion part of the YOLOv5s model to the 3 detection heads, which highlights defect regions in the image, weakens background regions and realizes defect detection at multiple scales, thereby strengthening the discrimination capability of the output feature maps. The adaptive nature of the FcaNet channel attention mechanism lets the network adapt to defects of different scales and shapes, improving its generalization capability and robustness.
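The frequency-domain compression can be sketched as follows (an illustrative NumPy version of the formula above; the unnormalized DCT-II basis is an assumption consistent with the FcaNet formulation, and the frequency indices are supplied by the caller):

```python
import numpy as np

def dct_basis(u, v, H, W):
    """Two-dimensional DCT basis function B^{u,v} of size H x W."""
    h = np.arange(H).reshape(-1, 1)
    w = np.arange(W).reshape(1, -1)
    return (np.cos(np.pi * u * (h + 0.5) / H) *
            np.cos(np.pi * v * (w + 0.5) / W))

def multi_spectral_descriptor(x, freq_uv):
    """Split the C channels of x (C x H x W) into n groups and compress the
    i-th group with its assigned frequency component (u_i, v_i); returns
    the length-C channel descriptor (the concatenated Freq^i vectors)."""
    C, H, W = x.shape
    n = len(freq_uv)
    parts = []
    for i, (u, v) in enumerate(freq_uv):
        xi = x[i * (C // n):(i + 1) * (C // n)]  # the i-th channel group
        parts.append((xi * dct_basis(u, v, H, W)).sum(axis=(1, 2)))
    return np.concatenate(parts)
```

With (u, v) = (0, 0) the basis is all ones, so that group's descriptor reduces to global average pooling up to a scale factor, which is exactly the SENet special case that FcaNet generalizes.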
In addition, in order to enhance the feature expression capability on the defective portion of the steel surface while suppressing the interference of the ineffective features in the complex background on the detection, the present embodiment needs to enhance the fusion utilization of the multi-scale features. However, in the original YOLOv5s algorithm, the FPN structure only adjusts different feature layers to be uniform in size and then accumulates the feature layers, so that fusion between different feature dimensions is inconsistent, and the detection effect is affected.
To solve the above-described problem, the present embodiment introduces an ASFF structure to the detection head portion. The ASFF module solves the problem of inconsistency inside the feature pyramid by learning the relation between different feature graphs, and adjusts the fusion proportion of the features of different levels according to the importance weight of each feature graph by introducing an attention mechanism. In this way, the module can automatically focus on features that are more helpful for target detection and filter out conflicting information that is not relevant to the target.
The embodiment fully exerts the self-adaptive feature fusion strategy of the ASFF module, and the ASFF structure filters conflict information spatially, optimizes the feature fusion process and reduces the calculation cost.
The structural design of ASFF is shown in fig. 5.
Taking ASFF-1 as an example: in the neck part of the YOLOv5s network, the output feature maps are Level 1, Level 2 and Level 3. This embodiment adjusts the Level 1, Level 2 and Level 3 feature maps to the same scale by downsampling or upsampling and then fuses them. The fusion operation is an adaptive weighted sum; the specific calculation is given by the following formula:
in the method, in the process of the invention,a signature output representing the (i, j) position; />Representing a fusion function of the learned different scale features; />Features representing 3 different scales may learn weights.
For Level1, level2, and Level3 feature maps, the present embodiment uses a 1×1 convolution operation to calculate three learnable weight parameters. Normalization was then performed by softmax, bringing the parameters to the [0,1] range and summing to 1.
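The adaptive weighted sum can be sketched as follows (a minimal NumPy illustration; in practice the per-pixel weight logits would come from the 1×1 convolutions described above):

```python
import numpy as np

def asff_fuse(levels, logits):
    """Fuse three same-scale feature maps (each C x H x W) with per-pixel
    softmax weights: the logits (3 x H x W) are normalized over the level
    axis, so the weights lie in [0, 1] and sum to 1 at every position."""
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)          # alpha, beta, gamma maps
    return sum(w[i] * levels[i] for i in range(len(levels)))
```

With all-zero logits the three levels contribute equally; training the logits lets the network suppress the level whose features conflict at a given position.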
S4, outputting a steel surface defect image to finish steel surface defect detection.
For each detected defect target, the network structure outputs its category, confidence and location information. The category indicates the type of defect, the confidence reflects the degree of confidence of the algorithm on the detection result, and the position information indicates the position of the defect target in the image.
Example two
In this embodiment, there is also provided a steel surface defect detection system, including: the device comprises an acquisition module, an extraction module, a detection module and an output module; the acquisition module is used for acquiring the steel surface image and preprocessing the steel surface image to obtain a processed image; the extraction module is used for extracting the characteristics of the processed image; the detection module is used for carrying out target detection on the extracted characteristics to obtain a steel surface defect image; the output module is used for outputting a steel surface defect image and finishing steel surface defect detection.
The preprocessing process comprises: adjusting the size, brightness, and contrast of the acquired steel surface image and removing noise to obtain the processed image. The workflow of the extraction module comprises: introducing an ECA attention mechanism into the backbone network and neck network of the YOLOv5s structure, and simultaneously introducing Ghost convolution into the neck network of the YOLOv5s structure to complete feature extraction. The workflow of the detection module comprises: performing detection with a detection head on the basis of the extracted features; the detection head consists of a convolutional layer and a fully connected layer. Meanwhile, FcaNet (based on SENet) is combined with the ASFF structure to perform multi-scale fusion, improving small-target detection accuracy.
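As a rough NumPy sketch of the ECA attention idea referenced above (global average pooling, a 1-D convolution across channels, and a sigmoid gate), where the kernel size and the uniform kernel weights are placeholders for learned parameters rather than the patent's actual values:

```python
import numpy as np

def eca_attention(x, k=3):
    """ECA-style channel reweighting of a (C, H, W) feature map."""
    C = x.shape[0]
    s = x.mean(axis=(1, 2))            # global average pooling -> (C,) descriptor
    pad = k // 2
    sp = np.pad(s, pad, mode="edge")   # pad channel descriptor for the 1-D conv
    kernel = np.full(k, 1.0 / k)       # placeholder for learned 1-D conv weights
    a = np.array([sp[i:i + k] @ kernel for i in range(C)])
    gate = 1.0 / (1.0 + np.exp(-a))    # sigmoid channel-attention weights in (0, 1)
    return x * gate[:, None, None]     # channel-wise reweighting of the input
```

The key property ECA exploits is that the 1-D convolution captures local cross-channel interaction with only k parameters, rather than the fully connected bottleneck used in SENet.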
The above embodiments merely illustrate preferred embodiments of the present invention, and the scope of the present invention is not limited thereto; various modifications and improvements made by those skilled in the art without departing from the spirit of the present invention all fall within the scope of the present invention as defined by the appended claims.
Claims (8)
1. A steel surface defect detection method, characterized by comprising the following steps:
acquiring a steel surface image and preprocessing it to obtain a processed image;
extracting features from the processed image;
performing target detection on the extracted features to obtain a steel surface defect image; and
outputting the steel surface defect image to finish steel surface defect detection.
2. The steel surface defect detection method according to claim 1, wherein the preprocessing comprises: adjusting the size, brightness, and contrast of the acquired steel surface image and removing noise to obtain the processed image.
3. The steel surface defect detection method according to claim 1, wherein the feature extraction comprises: introducing an ECA attention mechanism into the backbone network and neck network of the YOLOv5s structure, and simultaneously introducing Ghost convolution into the neck network of the YOLOv5s structure to complete feature extraction.
4. The steel surface defect detection method according to claim 1, wherein the target detection comprises: performing detection with a detection head on the basis of the extracted features; the detection head consists of a convolutional layer and a fully connected layer.
5. The steel surface defect detection method according to claim 4, wherein in the target detection process, FcaNet based on SENet is combined with the ASFF structure to perform multi-scale fusion, thereby improving small-target detection accuracy.
6. A steel surface defect detection system for implementing the method of any one of claims 1-5, comprising: an acquisition module, an extraction module, a detection module, and an output module;
the acquisition module is used for acquiring a steel surface image and preprocessing it to obtain a processed image;
the extraction module is used for extracting features from the processed image;
the detection module is used for performing target detection on the extracted features to obtain a steel surface defect image; and
the output module is used for outputting the steel surface defect image to finish steel surface defect detection.
7. The steel surface defect detection system of claim 6, wherein the preprocessing process comprises: adjusting the size, brightness, and contrast of the acquired steel surface image and removing noise to obtain the processed image.
8. The steel surface defect detection system of claim 6, wherein the workflow of the extraction module comprises: introducing an ECA attention mechanism into the backbone network and neck network of the YOLOv5s structure, and simultaneously introducing Ghost convolution into the neck network of the YOLOv5s structure to complete feature extraction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311551599.3A CN117523298A (en) | 2023-11-21 | 2023-11-21 | Steel surface defect detection method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117523298A true CN117523298A (en) | 2024-02-06 |
Family
ID=89765937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311551599.3A Pending CN117523298A (en) | 2023-11-21 | 2023-11-21 | Steel surface defect detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117523298A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115829991A (en) * | 2022-12-15 | 2023-03-21 | 淮阴工学院 | Steel surface defect detection method based on improved YOLOv5s |
CN116664558A (en) * | 2023-07-28 | 2023-08-29 | 广东石油化工学院 | Method, system and computer equipment for detecting surface defects of steel |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||