CN116777842A - Light texture surface defect detection method and system based on deep learning - Google Patents


Info

Publication number: CN116777842A
Application number: CN202310591633.3A
Authority: CN (China)
Prior art keywords: module, loss, texture surface, layer, training
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Inventors: Jin Yi (金一), Lu Haoran (鲁浩然), Wang Xu (王旭), Wang Tao (王涛), Li Yidong (李浥东)
Current and original assignee: Beijing Jiaotong University (the listed assignee may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Application filed by Beijing Jiaotong University; priority to CN202310591633.3A
Other languages: Chinese (zh)

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a deep-learning-based lightweight method and system for texture surface defect detection; the method is divided into a training stage and a testing stage. In the training stage, texture surface images from the input training set are propagated forward through the network layer by layer to obtain prediction boxes of the defect features; the loss between each prediction box and the ground-truth box of the target image is then calculated, back-propagation is performed with this loss to update the model weights, and the process is repeated until the set number of iteration epochs is reached. In the testing stage, the test-set data are loaded, the trained model outputs the category and position of each defect image, and evaluation metrics are computed to judge the performance of the model. If the expected requirements are not met, the method returns to the training stage for further tuning; once the expected performance is reached, the model weights are saved, completing the workflow of the whole technical invention and yielding the final solution.

Description

Light texture surface defect detection method and system based on deep learning
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a deep-learning-based method and system for lightweight detection of texture surface defects.
Background
In recent years, artificial intelligence research has surged and hardware has iterated rapidly, driving strong progress in deep learning. Object detection is one of the most actively studied tasks in deep learning and has been widely applied; object detectors based on deep neural networks show prominent advantages in face recognition, image segmentation, pedestrian re-identification, industrial inspection, and other areas. Continuous advances in deep learning keep producing innovations in object detection. Deep-learning-based object detection algorithms fall into two categories according to their detection procedure: two-stage algorithms, which achieve high detection accuracy but low speed, and one-stage algorithms, which are less accurate than two-stage methods but much faster. Surface defect detection is a branch of object detection, and in industry it is one of the important links in product production. The traditional manual approach is time-consuming and labor-intensive, and the quality of its results depends heavily on the subjective judgment of experts. Deep learning overcomes these drawbacks, offering good detection performance and strong generality, but problems remain: data volumes in the industrial field are small, detection latency is high, and detection precision still needs improvement.
To improve detection accuracy, object detection network models have become increasingly complex and deep. They demand large computational resources, involve heavy computation, and have poor detection timeliness, so on edge or embedded devices with limited computing power they cannot meet the real-time requirements of defect detection. To meet those real-time requirements on edge or embedded devices, a lightweight model design is needed.
Making a network model lightweight means simplifying the network and reducing its parameters to increase computation speed while preserving accuracy. Many lightweighting approaches exist; they can be roughly divided into two categories: compressing an existing model, and designing a lightweight network from the start. Specific techniques include lightweight network structures, knowledge distillation, network pruning, and quantization.
Disclosure of Invention
The embodiment of the invention provides a light texture surface defect detection method and system based on deep learning, which are used for solving the problems existing in the prior art.
In order to achieve the above purpose, the present invention adopts the following technical scheme.
The lightweight texture surface defect detection method based on deep learning comprises the following steps:
S1, based on texture surface images of a training set, obtaining prediction boxes of defect features through layer-by-layer convolutional forward propagation in a network model equipped with a ShuffleNetV2 lightweight network;
S2, calculating the loss between the prediction boxes of defect features and the ground-truth boxes of the target images, and back-propagating the loss through the network model to update the model parameters;
S3, repeating steps S1 and S2 until a preset number of iterations is reached, obtaining texture surface defect images;
S4, testing and evaluating the texture surface defect images; if the evaluation result does not meet the preset requirement, modifying the hyperparameters of the network model and returning to steps S1 to S3; otherwise, outputting the texture surface defect images.
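The S1 to S4 steps form the standard train-and-evaluate cycle of deep learning. As a minimal sketch of that control flow only, the toy example below substitutes the patent's network and loss with a one-parameter linear model and MSE loss; all names and values here are illustrative, not taken from the patent:

```python
import numpy as np

def train_and_evaluate(x, y, epochs=200, lr=0.1, target_loss=1e-3):
    """Toy stand-in for the S1-S4 loop: forward pass, loss, back-propagation,
    parameter update, repeated for a fixed number of epochs, then evaluation."""
    rng = np.random.default_rng(0)
    w, b = rng.normal(), 0.0                    # model parameters
    for _ in range(epochs):                     # S3: repeat until epoch limit
        pred = w * x + b                        # S1: forward propagation
        err = pred - y
        grad_w = np.mean(2 * err * x)           # S2: gradient of the MSE loss
        grad_b = np.mean(2 * err)
        w -= lr * grad_w                        # S2: update model parameters
        b -= lr * grad_b
    final_loss = np.mean((w * x + b - y) ** 2)  # S4: evaluate the trained model
    return w, b, final_loss, bool(final_loss <= target_loss)

x = np.linspace(-1.0, 1.0, 32)
y = 3.0 * x + 0.5                               # ground truth: w = 3, b = 0.5
w, b, loss, meets_requirement = train_and_evaluate(x, y)
```

In the patent's method, the final check corresponds to S4's comparison against the preset requirement; failing it would trigger hyperparameter changes and a return to S1.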
Preferably, the network model comprises a backbone network, a neck, and an output part arranged sequentially along the data flow direction;
the backbone network has a 3x3 convolutional layer, a normalization layer, a ReLu active layer, and a max pooling layer sequentially arranged in a data flow direction, and a first shufflenet 2 layer composed of one Shuffle block of step size 2, a second shufflenet 2 layer composed of three Shuffle blocks of step size 1, a third shufflenet 2 layer composed of one Shuffle block of step size 2, a fourth shufflenet 2 layer composed of seven Shuffle blocks of step size 1, a fifth shufflenet 2 layer composed of one Shuffle block of step size 2, a sixth shufflenet 2 layer composed of three Shuffle blocks of step size 1;
each Shuffle block includes a Shortcut branch and a deep convolution branch; respectively carrying out feature extraction operation on the short circuit branch and the deep convolution branch when the step length is 1, then recombining the extracted feature information through channel re-washing, and merging the short circuit branch and the deep convolution branch when the step length is 2;
the neck portion includes:
the first part is formed by stacking a GhostConv module, a CARAFE upsampling operator, a concatenation module, and a C3Ghost module; the concatenation module of the first part concatenates with the backbone network;
the second part is formed by stacking a CA_H attention mechanism module, a GhostConv module, a CARAFE upsampling operator, a concatenation module, and a C3Ghost module; the concatenation module of the second part concatenates with the backbone network;
the third part is formed by stacking a CA_H attention mechanism module, a GhostConv module, a concatenation module, and a C3Ghost module; the concatenation module of the third part concatenates with the GhostConv module of the second part;
the fourth part is formed by stacking a CA_H attention mechanism module, a GhostConv module, a concatenation module, and a C3Ghost module; the concatenation module of the fourth part concatenates with the GhostConv module of the first part;
the CA_H attention mechanism module is provided with an H-sigmoid activation function;
the output part is provided with three GhostConv modules which are respectively connected with the second part, the third part and the fourth part of the neck part.
Preferably, each GhostConv module has a standard convolution submodule and a channel-by-channel (depthwise) convolution submodule arranged sequentially along the data flow direction, and the output of the GhostConv module is the channel-wise concatenation of the convolution results of the two submodules.
Preferably, calculating the loss between the prediction box with defect features and the ground-truth box of the target image is achieved by a SIoU regression loss function comprising an angle loss, a distance loss, a shape loss, and an IoU loss;
the angle loss is calculated by the formula
$\Lambda = 1 - 2\sin^2\!\big(\arcsin\tfrac{c_h}{\sigma} - \tfrac{\pi}{4}\big)$, with $\sigma = \sqrt{(b^{gt}_{cx} - b_{cx})^2 + (b^{gt}_{cy} - b_{cy})^2}$ and $c_h = \max(b^{gt}_{cy}, b_{cy}) - \min(b^{gt}_{cy}, b_{cy})$ (1)
where $b^{gt}_{cx}$ and $b^{gt}_{cy}$ represent the center coordinates of the ground-truth box, and $b_{cx}$ and $b_{cy}$ represent the center coordinates of the prediction box;
the distance loss is calculated by the formula
$\Delta = \sum_{t=x,y}\big(1 - e^{-\gamma\rho_t}\big)$, with $\rho_x = \big(\tfrac{b^{gt}_{cx} - b_{cx}}{c_w}\big)^2$, $\rho_y = \big(\tfrac{b^{gt}_{cy} - b_{cy}}{c_h}\big)^2$, and $\gamma = 2 - \Lambda$ (2)
where $c_h$ and $c_w$ are defined as the height and width of the smallest rectangle that encloses both anchor boxes;
the shape loss is calculated by the formula
$\Omega = \sum_{t=w,h}\big(1 - e^{-\omega_t}\big)^{\theta}$, with $\omega_w = \tfrac{|w - w^{gt}|}{\max(w,\,w^{gt})}$ and $\omega_h = \tfrac{|h - h^{gt}|}{\max(h,\,h^{gt})}$ (3)
where $w$ and $h$ are defined as the width and height of the bounding box output by the model, $w^{gt}$ and $h^{gt}$ are defined as the width and height of the ground-truth box of the object, and $\theta$ is a variable factor representing the weight of the shape loss;
the IoU loss is calculated by the formula
$IoU = \tfrac{|A \cap B|}{|A \cup B|}$ (4)
where $A$ and $B$ respectively represent the two rectangular boxes;
based on formulas (1) to (4), the SIoU regression loss function is obtained as
$L_{SIoU} = 1 - IoU + \tfrac{\Delta + \Omega}{2}$ (5)
In a second aspect, the invention provides a lightweight texture surface defect detection system based on deep learning, comprising a training module and a testing module;
the training module has a training set, and is further configured to:
based on texture surface images of the training set, obtain prediction boxes of defect features through layer-by-layer convolutional forward propagation in a network model equipped with a ShuffleNetV2 lightweight network;
calculate the loss between the prediction boxes of defect features and the ground-truth boxes of the target images, and back-propagate the loss through the network model to update the model parameters;
repeat the above process until a preset number of iterations is reached, obtaining texture surface defect images;
the test module adds the texture surface defect image output by the training module into a test set of the test module, and the test module is also used for: and (3) testing and evaluating the texture surface defect image, if the evaluation result does not reach the preset requirement, modifying the super parameters of the network model, returning to the execution process of the execution training module, and otherwise, outputting the texture surface defect image.
According to the technical scheme provided by the embodiments of the invention, the invention provides a deep-learning-based lightweight method and system for texture surface defect detection; the method is divided into a training stage and a testing stage. In the training stage, texture surface images from the input training set are propagated forward layer by layer to obtain prediction boxes of the defect features; the loss between each prediction box and the ground-truth box of the target image is then calculated, back-propagation is performed with this loss to update the model weights, and the process is repeated until the set number of iteration epochs is reached. In the testing stage, the test-set data are loaded, the trained model outputs the category and position of each defect image, and evaluation metrics are computed to judge the performance of the model. If the expected requirements are not met, the method returns to the training stage for further tuning; once the expected performance is reached, the model weights are saved, completing the workflow and yielding the final solution. The method and system provided by the invention have the following advantages:
1. Compared with the existing lightweight obstacle detection model that improves the YOLOv5s detection network, the invention achieves a good detection effect, with a precision of 97.9%.
2. The invention combines two lightweight networks, an attention mechanism, a lightweight upsampling operator, and the SIoU loss function to keep the model parameter count small; the whole model is only 0.62 MB.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a prior art detection model architecture;
FIG. 2 is a process flow diagram of a lightweight texture surface defect detection method based on deep learning provided by the invention;
FIG. 3 is a schematic diagram of a network model of the deep learning-based lightweight texture surface defect detection method according to the present invention;
FIG. 4 is a schematic diagram of a ShuffleNetV2 convolution block of the deep-learning-based lightweight texture surface defect detection method of the present invention;
FIG. 5 is a schematic diagram of a CA attention mechanism module of the deep learning-based lightweight texture surface defect detection method according to the present invention;
FIG. 6 is a schematic diagram of a CA_H attention mechanism module of the deep learning based lightweight texture surface defect detection method according to the present invention;
fig. 7 is a schematic diagram of a GhostConv module of the deep learning-based lightweight texture surface defect detection method provided by the invention;
FIG. 8 is a flow chart of a preferred embodiment of a deep learning based lightweight texture surface defect detection method provided by the present invention;
FIG. 9 is a logical block diagram of a deep learning based lightweight texture surface defect detection system provided by the present invention;
fig. 10 is a schematic diagram of the two rectangular boxes used for calculating the IoU loss in the deep-learning-based lightweight texture surface defect detection method according to the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the purpose of facilitating an understanding of the embodiments of the invention, reference will now be made to several specific embodiments illustrated in the accompanying drawings, which should in no way be taken to limit the embodiments of the invention.
The invention provides a lightweight texture surface defect detection method and system based on channel shuffle and ghost features, which are used to solve the following problems in the prior art:
currently, most of product part defect detection in the industry is completed manually, and this method has high labor cost, low detection rate and low efficiency, for example: PCB surface defect detection, steel plate defect detection, metal shaft detection and the like. With the development of the deep learning technology, the application of the deep learning technology in the industry is a hot field, machines are effectively used for replacing manpower, and the Chinese manufacturing is changed into the Chinese 'intelligent' manufacturing. The deep learning technology overcomes the defects of the traditional method, has good detection effect and strong universality, but still has some problems, has small data volume in the industrial field, has slow detection timeliness, needs to be improved in detection precision, and is difficult to meet the requirements of real-time detection in industrial production. The method aims to improve the detection accuracy of the target detection network model, the target detection network model is more and more complex, the depth is also deeper and deeper, the required computational power resource is large, the calculation amount of the model is large, the detection timeliness is low, and the real-time requirement of defect detection cannot be met in edge equipment or embedded equipment with limited computational power. In order to meet the real-time requirement of defect detection in edge equipment or embedded equipment, a one-stage detection algorithm YOLOv5 is used as a reference model for light-weight design, and the purpose of reducing the complexity and parameter quantity of a network structure while maintaining high detection accuracy is achieved.
YOLOv5, the mainstream single-stage detection network, performs well in both detection speed and detection precision and has attracted wide attention in practical engineering applications. Aiming at the poor real-time performance and low detection precision of traditional train-track obstacle detection methods, a lightweight obstacle detection model, YOLOv5s-MGCT, which improves the YOLOv5s detection network, was proposed; it greatly increases detection speed, improves detection precision, and achieves good results in practical applications.
As shown in fig. 1, this related technical scheme introduces the lighter Mixup data augmentation in place of the original Mosaic augmentation in the algorithm; it introduces the depthwise separable convolution GhostConv from the GhostNet structure to replace the ordinary convolution layers in the feature extraction and feature fusion networks of the original YOLOv5s model, reducing the computational cost of the model; it adds a CA spatial attention mechanism at the end of the feature extraction network, reducing the loss of important position information during training and compensating for the precision lost through the GhostNet modification; and it applies sparse training and channel pruning to the improved model, pruning channels that contribute little to detection precision while retaining important feature information, making the model lighter. Compared with current mainstream detection algorithms, it has certain advantages in detection precision and speed, and is suitable for detecting obstacle targets in complex rail transit environments.
In addition, although the existing YOLOv5s-MGCT network structure introduces the GhostNet lightweight network, its overall model size of 4.7 MB needs to be reduced further. Its detection precision (mAP@0.5) is 94.7%; since industrial precision requirements are very high, this value also needs further improvement.
Referring to fig. 2, the invention provides a lightweight texture surface defect detection method based on deep learning, which comprises the following steps:
S1, based on texture surface images of a training set, obtaining prediction boxes of defect features through layer-by-layer convolutional forward propagation in a network model equipped with a ShuffleNetV2 lightweight network;
S2, calculating the loss between the prediction boxes of defect features and the ground-truth boxes of the target images, and back-propagating the loss through the network model to update the model parameters;
S3, repeating steps S1 and S2 until a preset number of iterations is reached, obtaining texture surface defect images;
S4, testing and evaluating the texture surface defect images; if the evaluation result does not meet the preset requirement, modifying the hyperparameters of the network model and returning to steps S1 to S3; otherwise, outputting the texture surface defect images.
The invention aims to meet the real-time requirement of defect detection on edge or embedded devices; it uses the one-stage detection algorithm YOLOv5 as the reference model for lightweight innovation, aiming to reduce network complexity and parameter count while maintaining high detection accuracy.
In the preferred embodiment of the present invention, the basic flow of the texture surface defect detection method is shown in fig. 2. First, in the model training stage, texture surface images of the training set are input and propagated forward through layer-by-layer convolution to obtain prediction boxes of defect features; the loss between each prediction box and the ground-truth box of the target image is then calculated, back-propagation is performed with this loss to update the model weights, and the process is repeated until the set number of iteration epochs is reached. In the testing stage, the test-set data are loaded, the trained model outputs the category and position of each defect image, and evaluation metrics are computed to judge the performance of the model. If the expected requirements are not met, the method returns to the training stage for further tuning; once the expected performance is reached, the model weights are saved, completing the workflow and yielding the final solution.
The invention provides a new model algorithm, a lightweight texture surface defect detection method based on channel shuffle and ghost features. As shown in fig. 3, an improved network model is provided, comprising a backbone network, a neck, and an output part arranged sequentially along the data flow direction.
The backbone network has a ShuffleNetV2 lightweight network for feature extraction, which specifically includes, arranged sequentially along the data flow direction, a 3x3 convolutional layer, a normalization layer, a ReLU activation layer, and a max pooling layer, followed by a first ShuffleNetV2 layer composed of one Shuffle block of stride 2, a second ShuffleNetV2 layer composed of three Shuffle blocks of stride 1, a third ShuffleNetV2 layer composed of one Shuffle block of stride 2, a fourth ShuffleNetV2 layer composed of seven Shuffle blocks of stride 1, a fifth ShuffleNetV2 layer composed of one Shuffle block of stride 2, and a sixth ShuffleNetV2 layer composed of three Shuffle blocks of stride 1.
Each Shuffle block includes a shortcut branch and a depthwise convolution branch. When the stride is 1, feature extraction is performed on the shortcut branch and the depthwise convolution branch separately and the extracted feature information is then recombined by channel shuffle; when the stride is 2, the shortcut branch and the depthwise convolution branch are merged.
The neck portion includes:
the first part is formed by stacking a GhostConv module, a CARAFE upsampling operator, a concatenation module, and a C3Ghost module; the concatenation module of the first part concatenates with the fourth ShuffleNetV2 layer of the backbone network;
the second part is formed by stacking a CA_H attention mechanism module, a GhostConv module, a CARAFE upsampling operator, a concatenation module, and a C3Ghost module; the concatenation module of the second part concatenates with the second ShuffleNetV2 layer of the backbone network;
the third part is formed by stacking a CA_H attention mechanism module, a GhostConv module, a concatenation module, and a C3Ghost module; the concatenation module of the third part concatenates with the GhostConv module of the second part;
the fourth part is formed by stacking a CA_H attention mechanism module, a GhostConv module, a concatenation module, and a C3Ghost module; the concatenation module of the fourth part concatenates with the GhostConv module of the first part;
the CA_H attention mechanism module is provided with an H-sigmoid activation function;
the output part is provided with three GhostConv modules which are respectively connected with the second part, the third part and the fourth part of the neck part.
Compared with the basic flow and the prior art, the method has five main innovations. First, to obtain features quickly in the backbone, the backbone network is built from the ShuffleNetV2 lightweight network, giving faster detection and fewer parameters. Second, the Sigmoid activation function can help a network model improve performance, but it has an exponential form and is expensive to compute; the H-sigmoid function has a curve similar to the Sigmoid function but contains no exponential operation, so compared with the Sigmoid function it reduces computation and saves network inference time. In the CA attention module, the original Sigmoid function is therefore replaced by the H-sigmoid function, and the improved attention module CA_H is constructed on the basis of the CA attention mechanism. Third, the lightweight GhostNet module is used to deploy the network model in the neck in a lightweight fashion; because the CA_H attention module contains spatial direction information and can address long-range dependence, it can find the regions of interest in the activation map, enrich its semantic information, and exclude useless information, so that, combined with the GhostConv module, it obtains enough feature information to help the model generate target boxes. Fourth, the original upsampling operation is replaced by the lightweight upsampling operator CARAFE, whose receptive field boundary is larger; taking the feature map obtained by Ghost convolution as the upsampling input thus prevents the loss of useful information. Fifth, using the SIoU loss function improves the detection effect of the model.
In the preferred embodiment provided by the present invention, the modules designed and used in the present invention are specifically arranged and function as follows.
(1) ShuffleNetV2 lightweight network module
A ShuffleNetV2 lightweight network is used in the backbone to build a lightweight, fast feature extraction network. The basic component of ShuffleNetV2 is the Shuffle block, which comes in two variants according to stride, as shown in fig. 4, which specifically illustrates the Shuffle block in the ShuffleNetV2 network. When the stride is 1, following the third of the four design principles proposed by ShuffleNetV2, a channel split operation divides the input channels evenly into two parts; one branch acts as a shortcut and performs no operations. The other branch, following the second of the four principles, uses two convolutions with kernel size 1 and one depthwise convolution with kernel size 3; the two branches are then concatenated and fused together, satisfying the first and fourth principles, and finally the feature information of the two channel groups is rearranged and mixed by channel shuffle. When the stride is 2, the input does not use the channel split operation, so the output channel dimension is doubled; one 3x3 depthwise separable convolution and one standard convolution are added to the branch that performs no operations at stride 1, and the other operations are unchanged.
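The channel shuffle operation that recombines the two branches can be sketched with NumPy using the reshape-transpose-reshape trick; the (N, C, H, W) array layout is an assumption matching common convolutional-network conventions:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Channel shuffle as used in ShuffleNetV2: split the channel dimension
    into `groups` groups and interleave them, so information flows between
    the branches after concatenation.  x has shape (N, C, H, W)."""
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.reshape(n, groups, c // groups, h, w)  # (N, g, C//g, H, W)
    x = x.transpose(0, 2, 1, 3, 4)               # swap group and channel axes
    return x.reshape(n, c, h, w)

# Example: 4 channels in 2 groups are interleaved to the order [0, 2, 1, 3]
x = np.arange(4, dtype=float).reshape(1, 4, 1, 1)
shuffled = channel_shuffle(x, groups=2)
```

The output shape is unchanged; only the channel ordering is permuted, which is why the operation adds no parameters to the network.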
(2) CA_H attention mechanism module
The SE module attends only to channels and loses feature position information; the CA attention module improves on this shortcoming by splitting the two-dimensional pooling into one-dimensional feature encodings along the vertical and horizontal directions and then encoding the feature channels that contain spatial-direction information, so that each channel effectively integrates the position information of the input features along a specific direction, which addresses the long-range dependence problem. Finally, the two attention maps are multiplied with the original input features to enrich the position information of the feature map. Because the module is small and has few parameters, it can be inserted into the network model. The main structure of the CA module is shown in fig. 5.
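The directional pooling, encoding, and re-weighting steps above can be sketched as below. This is a minimal illustration under stated assumptions (the reduction ratio is assumed, and `F.hardsigmoid` is used for the gates, which is exactly the CA-to-CA_H change the next paragraph describes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAH(nn.Module):
    """Sketch of a CA_H-style module: coordinate attention whose final
    Sigmoid gates are replaced by H-Sigmoid."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # split 2-D pooling into two 1-D directional poolings
        x_h = x.mean(dim=3, keepdim=True)                       # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (n, c, w, 1)
        y = F.relu(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        # H-Sigmoid instead of Sigmoid: this is the CA -> CA_H substitution
        a_h = F.hardsigmoid(self.conv_h(y_h))                       # (n, c, h, 1)
        a_w = F.hardsigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # (n, c, 1, w)
        return x * a_h * a_w   # re-weight input with both directional maps

out = CAH(16)(torch.randn(2, 16, 8, 8))
print(out.shape)  # torch.Size([2, 16, 8, 8])
```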
The Sigmoid activation function can help the network model improve performance, but it is exponential in form and computationally expensive. The H-Sigmoid activation function approximates Sigmoid as closely as possible yet involves no exponential operation, which reduces computation and saves network inference time. Therefore, in the CA attention module, the original Sigmoid function is replaced by the H-Sigmoid function, constructing a new module CA_H improved from the CA attention mechanism, as shown in fig. 6.
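The cost argument can be checked numerically; the pure-Python comparison below uses the usual relu6(x+3)/6 form of H-Sigmoid, which stays within about 0.09 of the true Sigmoid while needing no exponential:

```python
import math

def sigmoid(x: float) -> float:
    # standard logistic function: requires evaluating an exponential
    return 1.0 / (1.0 + math.exp(-x))

def h_sigmoid(x: float) -> float:
    # relu6(x + 3) / 6: piecewise linear, no exponential to evaluate
    return min(max(x + 3.0, 0.0), 6.0) / 6.0

for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(x, round(sigmoid(x), 3), round(h_sigmoid(x), 3))
```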
(3) GhostNet lightweight network module
GhostNet proposes the GhostConv module, which processes features and enriches feature maps with redundant information obtained through lower-cost operations. The main structure of the GhostConv module is shown in fig. 7: the module first applies a standard convolution that halves the number of output channels, then applies a channel-by-channel (depthwise) convolution to the convolved result, and finally channel-concatenates the first-step and second-step convolution results. Compared with ordinary convolution, GhostConv effectively reduces the parameter count of convolution operations.
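A minimal GhostConv sketch along these lines, assuming a PyTorch setting; the 5x5 depthwise "cheap" kernel and SiLU activation follow common YOLOv5-style implementations and are assumptions, not details stated here:

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """GhostConv sketch: a standard convolution produces half the output
    channels, a cheap depthwise convolution generates the 'ghost' half,
    and the two halves are channel-concatenated."""
    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),  # depthwise
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)  # intrinsic + ghost maps

out = GhostConv(16, 32)(torch.randn(2, 16, 20, 20))
print(out.shape)  # torch.Size([2, 32, 20, 20])
```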
Let the input data be X ∈ R^(c×h×w), the convolution kernel be K, and the output be Y ∈ R^(h'×w'×n); the computation of the ordinary convolution operation is c×k×k×h'×w'×n, where n is the number of convolution kernels and c is the number of channels. Some similar feature maps, called "ghosts", do not need to be extracted again in the GhostConv module: they can be generated through operations with small computation and low cost. Let the number of intrinsic channels be m; each feature map is converted by s low-cost operations, so n = m×s feature maps can be obtained. In GhostConv the last operation is an identity mapping, so m×(s-1) conversion operations are counted; considering the operating efficiency of real scenarios, the kernel size of the conversion operations is uniformly set to d. The ratio of the computation of the ordinary convolution operation to that of the Ghost convolution operation is shown in the following formula, where d and k do not differ greatly and s ≪ c.
It follows that the GhostConv operation saves a large amount of computation compared with the ordinary convolution operation, and offers clear advantages in accuracy and model parameter size.
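The computation-ratio claim can be checked with simple arithmetic; the helpers below plug illustrative sizes into the two cost expressions (symbols as defined above), and the ratio indeed approaches s when s ≪ c:

```python
def flops_standard(c, k, hp, wp, n):
    # ordinary convolution: c * k * k * h' * w' * n multiply-accumulates
    return c * k * k * hp * wp * n

def flops_ghost(c, k, d, hp, wp, n, s):
    # primary convolution produces m = n/s intrinsic maps; cheap d x d
    # depthwise operations generate the remaining (s - 1) * m "ghost" maps
    m = n // s
    return c * k * k * hp * wp * m + (s - 1) * d * d * hp * wp * m

c, k, d, hp, wp, n, s = 64, 3, 3, 32, 32, 128, 2  # illustrative sizes
ratio = flops_standard(c, k, hp, wp, n) / flops_ghost(c, k, d, hp, wp, n, s)
print(round(ratio, 3))  # 1.969, i.e. close to s*c/(c + s - 1) ~= s for s << c
```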
(4) CARAFE up-sampling operator module
A well-performing upsampling operator tends to have a large receptive field, so that neighborhood information can be used efficiently. Second, the size of the upsampling kernel should be dynamically matched to the semantic information of the activation map, implementing upsampling according to the input content. Third, the upsampling operation should avoid a large number of parameters and keep computational complexity low, achieving a lightweight effect. The CARAFE upsampling operator has these characteristics: it has a large receptive field when rearranging features, can perform the upsampling operation dynamically according to the input data, and has a small computation cost. CARAFE is divided into a kernel prediction part and a feature reassembly part. When input data reach the module, they first enter the kernel prediction part, which dynamically matches the upsampling kernel to the input content; upsampling is then realized through the content-aware reassembly module.
(5) SIoU Loss
None of the previously proposed regression loss functions considers the angle between the prediction frame and the real frame. To address this, SIoU Loss was proposed: it comprehensively considers distance, shape, IoU, and angular direction when calculating the penalty between the real target frame and the model's predicted anchor frame in an image. The added angular-direction penalty term greatly facilitates the training process, because it encourages the prediction box to move quickly to the nearest axis, after which only one coordinate, X or Y, needs to be regressed. In other words, the added angular-direction penalty term reduces the total number of degrees of freedom, speeds up calculating the distance between the real frame and the model's predicted bounding box, and accelerates their convergence.
The SIoU regression loss function consists of angle loss, distance loss, shape loss, and IoU loss. The angular loss definition is given in equations 1 and 2.
Wherein b_cx^gt and b_cy^gt represent the center coordinate values of the real frame, and b_cx and b_cy represent the center coordinate values of the prediction frame.
The distance loss is defined in formula 3, wherein c_h and c_w are defined as the height and width of the smallest rectangle that encloses both anchor boxes.
The definition of the shape loss Ω is given in equation 4, where w and h are defined as the width and height of the model output bounding box, w^gt and h^gt are defined as the width and height of the actual box of the object, and θ is a variable factor serving as the weight of the shape loss.
IoU loss definition is given in equation 5, wherein A and B represent two rectangular frames, specifically the real frame A and the predicted frame B; the value of IoU equals the ratio of the intersection of the two rectangular frames to their union (as shown in fig. 10);
the final SIoU regression loss function definition is given in equation 6.
The invention also provides an embodiment illustrating the execution process of the method provided by the invention.
The invention relates to a lightweight texture surface defect detection method based on channel shuffle and ghost features; the basic flow is shown in fig. 8. The implementation mainly comprises the following stages: a preprocessing stage, a feature extraction stage, a loss calculation stage, a model iteration optimization stage, and a model test and evaluation stage. The specific operation of each stage is explained in detail below.
Before the method is used, a technician is required to configure the related environment, including installing a Linux operating system, a Python 3.8 (or later) development environment, and the PyTorch 1.11 (or later) deep learning framework. Because the algorithm used by the invention is a deep-learning model algorithm, performing the model training process in a GPU environment is recommended, which requires installing the GPU version of PyTorch 1.11 (or later) and a corresponding version of the CUDA parallel computing architecture.
Input of algorithm:
1. Texture image data: comprising a training set and a test set; the training images are used to train the model's ability to extract features, and the test set is used to verify the model's performance.
2. Model algorithm hyper-parameters: including image size, batch size in training, iteration number and learning rate, optimizer momentum factor, etc.
Output of the algorithm:
The trained parameter weights of the model algorithm that reach the performance evaluation standard.
The method comprises the following steps:
1. Preprocessing stage
Step 1-1: converting the weakly marked data into data in the form of anchor frames;
step 1-2: loading texture pictures (comprising training set and test set data) into a GPU video memory;
step 1-3: use Mosaic data enhancement: randomly select four pictures, apply operations such as scaling and cropping, and finally synthesize them into one image.
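A simplified sketch of the Mosaic step above, assuming NumPy; bounding-box handling and the random scaling ranges are omitted for brevity, and the grey fill value and quadrant layout are assumptions in the style of common implementations:

```python
import numpy as np

def mosaic4(imgs, out_size=640, seed=0):
    """Combine four images into one canvas around a random centre point.
    Sketch of Mosaic augmentation (label/box remapping omitted)."""
    rng = np.random.default_rng(seed)
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # grey fill
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))  # random centre x
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))  # random centre y
    quads = [(0, 0, cy, cx), (0, cx, cy, out_size),
             (cy, 0, out_size, cx), (cy, cx, out_size, out_size)]
    for img, (y1, x1, y2, x2) in zip(imgs, quads):
        h, w = y2 - y1, x2 - x1
        ys = np.arange(h) * img.shape[0] // h   # nearest-neighbour resize
        xs = np.arange(w) * img.shape[1] // w
        canvas[y1:y2, x1:x2] = img[ys][:, xs]
    return canvas

imgs = [np.full((100, 120, 3), v, dtype=np.uint8) for v in (0, 60, 120, 180)]
m = mosaic4(imgs)
print(m.shape, m[0, 0, 0], m[-1, -1, 0])  # (640, 640, 3) 0 180
```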
2. Feature extraction stage
Step 2-1: firstly, obtaining feature information in the backbone network through a 3x3 convolution layer, a normalization layer, a ReLU activation layer and maximum pooling, and then passing through a Shuffle block with a step length of 2, three Shuffle blocks with a step length of 1, a Shuffle block with a step length of 2, seven Shuffle blocks with a step length of 1, a Shuffle block with a step length of 2 and three Shuffle blocks with a step length of 1;
step 2-2: features output by the backbone network are fed into the neck part; specifically, the neck can be divided into four parts: the first part consists of a GhostConv, a CARAFE up-sampling operator, a splice with the fourth layer of the backbone network and a C3Ghost superposition; the second part consists of a CA_H attention mechanism, a GhostConv, a CARAFE up-sampling operator, a splice with the second layer of the backbone network and a C3Ghost superposition; the third part consists of a CA_H attention mechanism, a GhostConv, a splice with the twelfth layer of the network structure and a C3Ghost superposition; the fourth part consists of a CA_H attention mechanism, a GhostConv, a splice with the seventh layer of the network structure and a C3Ghost superposition.
Step 2-3: and predicting the fifteenth, nineteenth and twenty third output results in the network structure to obtain a target classification prediction result.
3. Loss calculation stage
Step 3-1: for positioning Loss, a SIoU Loss calculation is used; for confidence and classification loss, a binary cross entropy function is used to calculate;
step 3-2: the three losses calculated above are added to obtain the final loss.
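The loss combination in steps 3-1 and 3-2 can be sketched as below; the gain weights are assumed YOLOv5-style values (not specified here), and the SIoU box term is taken as already computed:

```python
import torch
import torch.nn.functional as F

def total_loss(pred_obj, pred_cls, tgt_obj, tgt_cls, siou_box_loss,
               w_box=0.05, w_obj=1.0, w_cls=0.5):
    # confidence and classification losses: binary cross-entropy on logits
    obj_loss = F.binary_cross_entropy_with_logits(pred_obj, tgt_obj)
    cls_loss = F.binary_cross_entropy_with_logits(pred_cls, tgt_cls)
    # sum the three terms (weights here are assumptions, not patent values)
    return w_box * siou_box_loss + w_obj * obj_loss + w_cls * cls_loss

loss = total_loss(torch.zeros(4), torch.zeros(4, 2),
                  torch.ones(4), torch.ones(4, 2),
                  siou_box_loss=torch.tensor(0.3))
print(round(float(loss), 4))  # 0.05*0.3 + 1.5*ln(2), about 1.0547
```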
4. Model optimization stage
Step 4-1: the code implementation is based on the PyTorch deep learning framework; back-propagation can be performed starting from the finally calculated composite loss value, and the gradient values of the parameters in the model are calculated automatically;
step 4-2: using the gradients calculated in the previous step, update the learnable parameter values of the model algorithm with an optimizer (e.g., the SGD optimizer of PyTorch);
step 4-3: repeat all the preceding execution steps until the model reaches the number of rounds set by the hyper-parameters, then stop the training process.
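Steps 4-1 to 4-3 amount to a standard PyTorch training loop; the sketch below uses a tiny stand-in model and dummy targets rather than the full detection network, with the momentum factor set to a commonly assumed value:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny stand-in; in the real method this is the full detection network.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1),
                      nn.AdaptiveAvgPool2d(1),
                      nn.Flatten(),
                      nn.Linear(8, 4))
# SGD with a momentum factor, per step 4-2
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.937)

epochs = 3  # the round count set by the hyper-parameters (step 4-3)
for epoch in range(epochs):
    x = torch.randn(2, 3, 32, 32)   # batch of images
    y = torch.randn(2, 4)           # dummy regression targets
    loss = F.mse_loss(model(x), y)  # stand-in for the composite loss
    opt.zero_grad()
    loss.backward()   # autograd computes every parameter gradient (step 4-1)
    opt.step()        # optimizer updates the learnable parameters (step 4-2)
print(float(loss))
```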
5. Test evaluation stage
Step 5-1: the texture image of the test set is read and loaded to the GPU video memory, and standardized operation which is the same as that of the training link is carried out (note that image enhancement is not needed during test);
step 5-2: and (3) adopting Mean Average Precsion (mAP), parameter quantity, calculated quantity and Frames Per Second (FPS) evaluation indexes commonly used in the defect detection task, and primarily evaluating the model quality by evaluating the calculated index values.
Step 5-3: if the evaluation result does not meet the requirement, the hyper-parameters of the model need to be adjusted, and the first execution step is returned to for another round of model training; if the evaluation result meets the requirement, the model weights can be saved, yielding the solution of the lightweight defect detection task.
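The parameter-count and FPS indexes from step 5-2 can be measured with small helpers like these (mAP computation is omitted, as it requires full detection outputs; the placeholder model is purely illustrative):

```python
import time
import torch
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    # total number of learnable parameters
    return sum(p.numel() for p in model.parameters())

def measure_fps(model: nn.Module, shape=(1, 3, 640, 640), n_iters=10) -> float:
    model.eval()
    x = torch.randn(*shape)
    with torch.no_grad():
        for _ in range(2):                 # warm-up runs
            model(x)
        t0 = time.perf_counter()
        for _ in range(n_iters):
            model(x)
    return n_iters / (time.perf_counter() - t0)

tiny = nn.Conv2d(3, 8, 3)                  # placeholder for the trained model
n_params = count_params(tiny)              # 3*8*3*3 + 8 = 224
fps = measure_fps(tiny, shape=(1, 3, 64, 64))
print(n_params, fps > 0)
```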
In a second aspect, the present invention provides a system for performing the above method, comprising a training module 801 and a testing module 802;
the training module 801 has a training set, and is further configured to:
based on the texture surface image of the training set, a prediction frame with defect characteristics is obtained through forward propagation processing of layer-by-layer convolution in the network model;
obtaining loss between a prediction frame with defect characteristics and a target image real frame through calculation, and reversely transmitting the loss to a network model to update model parameters;
repeatedly executing the process until the preset iteration times are reached, and obtaining a texture surface defect image;
The test module 802 adds the texture surface defect image output by the training module to its own test set, and is further configured to: test and evaluate the texture surface defect image; if the evaluation result does not reach the preset requirement, modify the hyper-parameters of the network model and return to the execution process of the training module; otherwise, output the texture surface defect image.
In summary, the present invention provides a method and a system for detecting defects on a lightweight texture surface based on deep learning, wherein the method is divided into training and testing stages. The training stage is based on texture surface images of an input training set, the texture surface images are forward propagated layer by layer to obtain a prediction frame of the defect characteristics, the prediction frame of the defect characteristics is obtained, then loss between the prediction frame of the defect characteristics and a real frame of a target image is calculated, reverse propagation is carried out by using the loss, model weights are updated, and the process is repeated until the set iteration round number epoch is reached. And in the test stage, loading data of a test set, outputting the category and the position of the defect image through a trained model, performing evaluation index calculation, judging the performance of the model according to the index, returning to a training link again if the expected requirement cannot be met, performing further adjustment training, and storing model weights if the expected performance is reached, so that the flow of the whole technical invention is completed, and a final solution is obtained. The method and the system provided by the invention have the following advantages:
1. Compared with existing lightweight detection methods based on the improved YOLOv5s detection network, the method of the invention has a good detection effect, with a precision of 97.9%.
2. The invention utilizes two lightweight networks, an attention mechanism, a lightweight upsampling operator and a SIoU loss function to ensure that the model parameter quantity is small, and the whole model is only 0.62MB.
Those of ordinary skill in the art will appreciate that: the drawing is a schematic diagram of one embodiment and the modules or flows in the drawing are not necessarily required to practice the invention.
From the above description of embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present invention.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part. The apparatus and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (5)

1. The lightweight texture surface defect detection method based on deep learning is characterized by comprising the following steps of:
s1, obtaining a prediction frame with defect characteristics through layer-by-layer convolution forward propagation processing of a network model provided with a ShuffleNetv2 lightweight network based on a texture surface image of a training set;
s2, obtaining loss between a prediction frame with defect characteristics and a target image real frame through calculation, and reversely transmitting the loss to the network model to update model parameters;
s3, repeatedly executing the steps S1 and S2 until the preset iteration times are reached, and obtaining a texture surface defect image;
s4, testing and evaluating the texture surface defect image, if the evaluation result does not meet the preset requirement, modifying the super parameters of the network model, returning to the steps S1 to S3, and otherwise, outputting the texture surface defect image.
2. The method of claim 1, wherein the network model comprises a backbone network, a neck, and an output, sequentially arranged along a data flow direction;
the backbone network has a 3x3 convolutional layer, a normalization layer, a ReLU activation layer and a maximum pooling layer which are sequentially arranged along the data flow direction, a first ShuffleNetv2 layer consisting of one Shuffle block with a step length of 2, a second ShuffleNetv2 layer consisting of three Shuffle blocks with a step length of 1, a third ShuffleNetv2 layer consisting of one Shuffle block with a step length of 2, a fourth ShuffleNetv2 layer consisting of seven Shuffle blocks with a step length of 1, a fifth ShuffleNetv2 layer consisting of one Shuffle block with a step length of 2, and a sixth ShuffleNetv2 layer consisting of three Shuffle blocks with a step length of 1;
each Shuffle block includes a Shortcut branch and a deep convolution branch; when the step length is 1, the Shortcut branch and the depth convolution branch respectively perform characteristic extraction operation, then the extracted characteristic information is recombined through channel re-washing, and when the step length is 2, the Shortcut branch and the depth convolution branch are combined;
the neck portion includes:
the first part is formed by overlapping a GhostConv module, a CARAFE up-sampling operator, a splicing module and a C3Ghost module; the splicing module of the first part is used for splicing the backbone network;
the second part is formed by overlapping a CA_H attention mechanism module, a GhostConv module, a CARAFE up-sampling operator, a splicing module and a C3Ghost module; the splicing module of the second part is used for splicing the backbone network;
the third part consists of a CA_H attention mechanism module, a GhostConv module, a splicing module and a C3Ghost superposition; the splicing module of the third part is used for splicing the GhostConv module of the second part;
the fourth part is formed by overlapping a CA_H attention mechanism module, a GhostConv module, a splicing module and a C3Ghost module; the splicing module of the fourth part is used for splicing the GhostConv module of the first part;
the CA_H attention mechanism module is provided with an H-sigmoid activation function;
the output part is provided with three GhostConv modules which are respectively connected with the second part, the third part and the fourth part of the neck part.
3. The method according to claim 1, wherein each of the GhostConv modules has a standard convolution sub-module and a channel-by-channel convolution sub-module sequentially arranged along a data flow direction, and the output result of the GhostConv module is an output result of channel-merging the convolution result of the standard convolution sub-module and the convolution result of the channel-by-channel convolution sub-module.
4. The method of claim 1, wherein calculating the loss between the predicted box with the defect feature and the target image real box is accomplished by a SIoU regression loss function, including angle loss, distance loss, shape loss, and IoU loss;
the angle loss Λ is calculated by the formula
Λ = 1 - 2·sin²(arcsin(sin α) - π/4), sin α = |b_cy^gt - b_cy| / σ, σ = √((b_cx^gt - b_cx)² + (b_cy^gt - b_cy)²)  (1)(2)
wherein b_cx^gt and b_cy^gt represent the center coordinate values of the real frame, and b_cx and b_cy represent the center coordinate values of the prediction frame;
the distance loss Δ is calculated by the formula
Δ = Σ_{t=x,y} (1 - e^(-γ·ρ_t)), ρ_x = ((b_cx^gt - b_cx)/c_w)², ρ_y = ((b_cy^gt - b_cy)/c_h)², γ = 2 - Λ  (3)
wherein c_h and c_w are defined as the height and width of the smallest rectangle that encloses both anchor frames;
the shape loss Ω is calculated by the formula
Ω = Σ_{t=w,h} (1 - e^(-ω_t))^θ, ω_w = |w - w^gt| / max(w, w^gt), ω_h = |h - h^gt| / max(h, h^gt)  (4)
wherein w and h are defined as the width and height of the model output bounding box, w^gt and h^gt are defined as the width and height of the actual frame of the object, and θ is a variable factor representing the weight of the shape loss;
the IoU loss is calculated by the formula
IoU = |A ∩ B| / |A ∪ B|  (5)
wherein A and B respectively represent the two rectangular frames;
based on the formulas (1) to (5), the SIoU regression loss function formula is obtained:
L_SIoU = 1 - IoU + (Δ + Ω) / 2  (6).
5. The lightweight texture surface defect detection system based on deep learning is characterized by comprising a training module and a testing module;
the training module has a training set, and is further configured to:
based on a texture surface image of a training set, obtaining a prediction frame with defect characteristics through layer-by-layer convolution forward propagation processing of a network model provided with a shufflenet 2 lightweight network;
obtaining loss between a prediction frame with defect characteristics and a target image real frame through calculation, and reversely transmitting the loss to the network model to update model parameters;
repeatedly executing the process until the preset iteration times are reached, and obtaining a texture surface defect image;
the test module adds the texture surface defect image output by the training module into a test set of the test module, and is also used for: and testing and evaluating the texture surface defect image, if the evaluation result does not reach the preset requirement, modifying the super parameters of the network model, returning to the execution process of the training module, and otherwise, outputting the texture surface defect image.
CN202310591633.3A 2023-05-24 2023-05-24 Light texture surface defect detection method and system based on deep learning Pending CN116777842A (en)

Publications (1)

Publication Number Publication Date
CN116777842A true CN116777842A (en) 2023-09-19

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496475A (en) * 2023-12-29 2024-02-02 武汉科技大学 Target detection method and system applied to automatic driving
CN117496475B (en) * 2023-12-29 2024-04-02 武汉科技大学 Target detection method and system applied to automatic driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination