CN116071315A - Product visual defect detection method and system based on machine vision - Google Patents

Product visual defect detection method and system based on machine vision

Info

Publication number
CN116071315A
CN116071315A (application CN202211738497.8A)
Authority
CN
China
Prior art keywords
product
defect detection
defect
network model
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211738497.8A
Other languages
Chinese (zh)
Inventor
沈锋
郭中原
胡扬俊
俞强玮
费杰
刘盈智
刘立鹏
韩双来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FOCUSED PHOTONICS (HANGZHOU) Inc
Original Assignee
FOCUSED PHOTONICS (HANGZHOU) Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FOCUSED PHOTONICS (HANGZHOU) Inc
Priority to CN202211738497.8A
Publication of CN116071315A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • G06T7/0006Industrial image inspection using a design-rule based approach
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of image processing, and in particular to a product visual defect detection method and system based on machine vision. The method comprises the following steps: acquiring a product surface image; inputting the product surface image into a feature extraction network model to obtain product features; and inputting the product features into a defect detection network model to detect defects and output a defect detection result. The defect detection network model is obtained through the following steps: recording defect information on product surface images and constructing a product defect detection data set from the defect information, the data set comprising a training set and a verification set; then training and verifying a deep learning model with the product defect detection data set to obtain the defect detection network model. The beneficial technical effects of the invention include improving detection efficiency on complex product surfaces and raising detection accuracy.

Description

Product visual defect detection method and system based on machine vision
Technical Field
The invention relates to the technical field of image processing, in particular to a product visual defect detection method and system based on machine vision.
Background
Visual defects on the surface of industrial products adversely affect the attractiveness, comfort and usability of the products, so manufacturers inspect product surfaces for visual defects in order to discover and control them in time.
Current methods for detecting product visual defects fall into three types. The first is manual inspection, which suffers from high cost, difficulty in reaching the required precision and speed when judging minute defects, high labor intensity and poor consistency of detection standards. The second is contact detection by mechanical devices, which can meet production requirements in terms of quality, but the detection equipment is expensive, inflexible and slow. The third is machine vision detection, which automatically detects possible product defects using image processing and analysis techniques; working in a non-contact mode, it is flexible to install, offers high measurement precision and speed, and is an effective means of making equipment automated, intelligent and precisely controlled, with outstanding advantages such as safety, reliability, a wide spectral response range, the ability to work for long periods in harsh environments, and high production efficiency. The same machine vision detection equipment can also perform multi-parameter detection on different products, saving enterprises the expense of additional large equipment. However, surface defect detection based on machine vision still faces the following two technical problems:
1) The signal-to-noise ratio of the detection system is generally low, and weak signals are difficult to detect or cannot be effectively distinguished from noise, owing to the combined influence of environment, illumination, production process, noise and other factors.
2) Machine vision surface defect detection, especially on-line detection, involves a huge data volume, much redundant information and a high-dimensional feature space; meanwhile, given the diversity of objects and problems that real machine vision faces, current algorithms are insufficient at extracting the limited defect information from massive data, and their real-time performance is low.
Therefore, visual defect inspection of products still relies mainly on manual detection, but high repetitiveness and long working hours reduce manual detection efficiency, so product visual defects are easily overlooked; this in turn introduces latent, even serious, defect-tracing interference into later test plans, affecting the test schedule and project progress.
For this reason, it is necessary to study a machine-vision-based defect detection method that can improve defect detection efficiency and detection accuracy for complex surfaces.
For example, Chinese patent CN113421263A, published September 21, 2021, discloses a part defect detection method, apparatus, medium and computer program product. The method comprises: obtaining an image to be detected corresponding to a part to be detected, and performing defect prediction on the image based on a full-scale gray-level prior depth segmentation model to obtain an image defect detection result, where the model is obtained by iterative training optimization of a to-be-trained defect detection model, formed by cascading a preset number of deep neural network modules, on a pre-collected training defect image set. That invention addresses the technical problem of low accuracy in part defect detection.
For another example, Chinese patent CN114881998A, published August 9, 2022, discloses a surface defect detection method and system based on deep learning. The method obtains sample pictures of all defect types present in a product to generate a training sample set, the defect types including at least scratches, bumps, pits, white spots and watermarks. It performs defect detection on customer products with an image-based deep learning algorithm: a sample distribution data set is established for each defect of an aluminum plate, a neural network algorithm studies the feature distribution of the sample sets on two-dimensional images and builds a mathematical model, product images acquired on the customer's site are input into the model, the model outputs the coordinates of points in the image that may be defects, and the results are screened to judge whether the product is defective. That invention solves the inconsistency of detection standards in manual defect inspection. However, neither scheme solves the technical problems of the heavy computation, low detection efficiency and low detection precision of deep learning algorithms when machine vision detection methods are applied to complex surface defects.
Disclosure of Invention
The invention aims to solve the following technical problem: product visual defect detection currently suffers from low detection efficiency and low detection precision. A machine-vision-based product visual defect detection method and system are therefore provided, which can improve detection efficiency on complex product surfaces and raise detection precision.
To solve this technical problem, the invention adopts the following technical scheme. A machine-vision-based product visual defect detection method comprises the following steps:
acquiring a product surface image;
inputting the product surface image into a feature extraction network model to obtain product features;
inputting the product characteristics to a defect detection network model to detect defects and outputting a defect detection result;
the defect detection network model is obtained through the following steps: recording defect information on a product surface image, and constructing a product defect detection data set according to the defect information, wherein the product defect detection data set comprises a training set and a verification set; and training and verifying a deep learning model by using the product defect detection data set to obtain the defect detection network model.
Preferably, inputting the product surface image into the feature extraction network model to obtain the product features specifically includes:
inputting the product surface image into the feature extraction network model to obtain a feature map, wherein the feature map is a set of at least one product feature, and the feature extraction network model comprises a residual network model and a bottleneck layer.
Preferably, inputting the product surface image into the feature extraction network model to obtain a feature map specifically includes:
establishing a Focus module in the feature extraction network model, inputting the product surface image into the Focus module, and performing slicing operation;
and carrying out convolution operation on the picture obtained after slicing operation in the feature extraction network model to obtain a double downsampling feature map.
Preferably, the step of constructing a product defect detection dataset from the defect information comprises:
marking whether the product surface image contains a defect;
classifying the marked pictures containing the defect targets, and recording the real frame positions and the category information of all the defect targets;
and constructing different product defect detection data sets aiming at different types of defects, and finally dividing the product defect detection data sets into a training set and a verification set according to a certain proportion.
Preferably, the step of training and validating a deep learning model using the product defect detection dataset comprises:
(1) Constructing an algorithm network model, comprising: constructing a trunk feature extraction network, pre-training a classification task on a public image Net image dataset in advance by the trunk network, and storing a pre-training network model and a model weight file;
(2) Constructing a cross-stage partial network;
(3) Building a feature aggregation network, adding cross mini-batch normalization and a Mish activation function after each convolution layer of the feature aggregation network to form a convolution module;
(4) Training the constructed algorithm network model by using the training set of the product defect detection data set.
Preferably, the cross-phase partial network includes: and splitting the feature map obtained by the trunk feature extraction network into two parts, namely a trunk part and a branch part, wherein the trunk part is sent to the SPP module after passing through a plurality of convolution layers, the obtained feature map layers are spliced, and the result of the trunk part convolution operation is connected with the branch part after passing through a plurality of convolution layers.
Preferably, in the step of training the constructed algorithm network model with the training set of the product defect detection data set, a loss function is added, the loss function comprising a regression loss function, a class loss function and a confidence loss function; GIOU loss is used for the defect position regression loss function, and a binary cross-entropy loss function is used for the defect class loss function and the confidence loss function. The total loss value is calculated and error back-propagation is performed, using cross mini-batch normalization and the Mish activation function, and the model with the highest mean average precision on the verification set across all training is saved, yielding the trained improved target detection algorithm network model.
Preferably, the method for calculating the total loss value is as follows:
L = γ_1·L_cls + γ_2·L_obj + γ_3·L_box

wherein L_cls denotes the class loss function and γ_1 its weight value, L_obj denotes the confidence loss function and γ_2 its weight value, and L_box denotes the regression loss function and γ_3 its weight value.

The class loss function is calculated as:

L_cls = -(1/N_pos) Σ_{i∈pos} Σ_{j∈cla} [ O_ij·ln(Sigmoid(C_ij)) + (1 - O_ij)·ln(1 - Sigmoid(C_ij)) ]

wherein O_ij is the true class value, i.e. O_ij = 0 when sample i contains no class-j object and O_ij = 1 when sample i contains a class-j object; N_pos is the total number of positive samples; pos denotes the positions of target objects and cla the classes of target objects; Sigmoid(C_ij) is the predicted class value, i.e. the model's predicted probability that sample i contains a class-j target.

The confidence loss function is calculated as:

L_obj = -(1/N) Σ_i [ O_i·ln(Sigmoid(C_i)) + (1 - O_i)·ln(1 - Sigmoid(C_i)) ]

wherein O_i is the true confidence value, i.e. O_i = 1 when sample i is a positive sample and O_i = 0 when it is a negative sample; N is the total number of samples; Sigmoid(C_i) is the predicted confidence value, i.e. the model's predicted probability that sample i is a positive sample.

Denote the coordinates of the prediction frame as (x_1^p, y_1^p, x_2^p, y_2^p) and the coordinates of the real frame as (x_1^g, y_1^g, x_2^g, y_2^g), with x_2^p > x_1^p, y_2^p > y_1^p, x_2^g > x_1^g and y_2^g > y_1^g. The area of the prediction frame is

S^p = (x_2^p - x_1^p)(y_2^p - y_1^p)

and the area of the real frame is

S^g = (x_2^g - x_1^g)(y_2^g - y_1^g).

The area S_d of the overlapping part between the real frame and the prediction frame is calculated as:

x_1^d = max(x_1^p, x_1^g), y_1^d = max(y_1^p, y_1^g)
x_2^d = min(x_2^p, x_2^g), y_2^d = min(y_2^p, y_2^g)
S_d = max(0, x_2^d - x_1^d)·max(0, y_2^d - y_1^d)

Finding the minimum circumscribed rectangle of the real frame and the prediction frame, its area S_c is calculated as:

x_1^c = min(x_1^p, x_1^g), y_1^c = min(y_1^p, y_1^g)
x_2^c = max(x_2^p, x_2^g), y_2^c = max(y_2^p, y_2^g)
S_c = (x_2^c - x_1^c)(y_2^c - y_1^c)

The intersection-over-union IOU of the prediction frame and the real frame is calculated as:

IOU = S_d / U, where U = S^p + S^g - S_d

The regression loss function is calculated as:

L_box = 1 - IOU + (S_c - U)/S_c

wherein S_c is the area of the minimum circumscribed rectangle of the real frame and the prediction frame, and U is the area of the union of the real frame and the prediction frame.
Preferably, in the step of training the constructed algorithm network model by using the training set of the product defect detection dataset, predicting the verification set by using the improved target detection algorithm network model and parameters to obtain a prediction result, performing post-processing on the prediction result by using the GIOU-NMS module to obtain an output, calculating the detection precision and average precision mean value of the trained improved target detection algorithm network model based on the output, and recording the detection result;
wherein the calculation formula of the mean average precision is:

mAP = (1/x) Σ_{i=1}^{x} AP_i

wherein AP_i is the detection precision of a given inspected part of the product, and x is the number of inspected parts.
A machine vision based product visual defect detection system for performing the machine vision based product visual defect detection method as described above, comprising:
the image acquisition module is used for acquiring a product surface image and inputting the product surface image into the feature extraction network model to acquire product features;
the defect analysis module is used for inputting the product characteristics to a defect detection network model to detect defects and determining defect types and corresponding defect quantity;
and the data transmission and display module outputs and displays the defect type and the corresponding defect number of the defect product.
The beneficial technical effects of the invention include: by adopting the machine-vision-based product visual defect detection method and system, a product defect detection data set is constructed and the target detection algorithm is improved, so that product visual defects can be detected rapidly and accurately, solving the technical problem of the low efficiency of manual visual defect detection; by establishing a Focus module in the feature extraction network model, no features of the original product surface image sample are lost, which facilitates the subsequent convolution operations; by constructing a cross-stage partial network, the parameter computation load is reduced and defect detection efficiency improved; and by using GIOU loss for the defect position regression loss, the shortcoming that existing network regression loss functions attend only to the overlapping part and cannot regress when the prediction frame and the real frame do not intersect is overcome, improving defect detection precision.
Other features and advantages of the present invention will be disclosed in the following detailed description of the invention and the accompanying drawings.
Drawings
The invention is further described with reference to the accompanying drawings:
FIG. 1 is a flow chart of a method for detecting visual defects of a product based on machine vision according to an embodiment of the present invention;
FIG. 2 is a flow chart of constructing a product defect detection dataset according to an embodiment of the present invention;
FIG. 3 is a flow chart of training and validating a deep learning model using a product defect detection dataset in accordance with an embodiment of the present invention;
fig. 4 is a schematic diagram of a detection system of a machine vision-based product visual defect detection method according to an embodiment of the present invention.
Wherein: 1. image acquisition module; 2. defect analysis module; 3. data transmission and display module.
Detailed Description
The technical solutions in the embodiments of the present invention will be explained and illustrated below with reference to the drawings of the embodiments; however, the following embodiments are only preferred embodiments of the invention, not all of them. Other examples obtained by those skilled in the art without creative effort, based on the examples in these embodiments, fall within the protection scope of the present invention.
In the following description, directional or positional relationships such as the terms "inner", "outer", "upper", "lower", "left", "right", etc., are presented for convenience in describing the embodiments and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore should not be construed as limiting the invention.
Before explaining the technical scheme of the present embodiment in detail, first, a description is given of a background situation to which the present embodiment is applied.
Defect problems always exist in industrial production. Visual defects on the surface of industrial products adversely affect the attractiveness, comfort and usability of the products, and safety problems caused by product defects have grown, so enterprises pay increasing attention to identifying and detecting product visual defects. The traditional method of detecting product visual defects is manual operation by professional technicians; its detection efficiency is low, and the subjective judgment of technicians makes it difficult to ensure consistent and accurate detection standards. Later, with the rapid rise of computer image processing and analysis technology, these techniques were extended to the field of product visual defect detection, making detection methods more mature and complete; among them, machine vision detection plays an important role.
The machine vision detection method obtains the surface image of the product through a suitable light source and an image sensor (a CCD camera), extracts characteristic information from the image with a corresponding image processing algorithm, and then distinguishes, counts, stores and queries surface defects according to that information. It overcomes, to a great extent, the low sampling rate, low accuracy, poor real-time performance, low efficiency and high labor intensity of manual detection, and is widely researched and applied in modern industry. The machine vision detection method has two main application scenarios: (1) in the product testing stage, static testing of the product, rapidly identifying surface defects and saving labor; (2) in the production stage, product quality detection, where static detection can be completed rapidly and an intelligent quality inspection function is provided for large-batch, automated production lines.
However, the surface defect detection based on machine vision still has the following two technical problems:
(1) The signal to noise ratio of the detection system is generally low and weak signals are difficult to detect or cannot be effectively distinguished from noise due to the influence of multiple factors such as environment, illumination, production process, noise and the like. Therefore, how to construct a stable, reliable and robust machine vision defect detection system to adapt to illumination variation, noise and other interference of external bad environments is one of the problems to be solved.
(2) The machine vision surface defect detection, especially on-line detection, is characterized by huge data volume, more redundant information and high feature space dimension, and meanwhile, the algorithm capability for extracting limited defect information from mass data is insufficient and the real-time performance is not high by considering the diversity of real machine vision facing objects and problems. Therefore, more robust image processing and analysis algorithms need to be studied, the effectiveness and execution efficiency of image processing are improved, the complexity of the algorithms is reduced, and the accuracy of recognition is improved.
To this end, an embodiment of the present application provides a method for detecting visual defects of a product based on machine vision, please refer to fig. 1, which includes the following steps:
step A01) obtaining a product surface image.
The principle is that an image sensor (such as a CCD camera) converts the light collected by the lens into an electrical signal and then into a digital signal, which is passed to the next step for feature extraction. For the operation of obtaining the product surface image, reference may be made to the related art; it is not described in detail in the embodiments of the present application.
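As a concrete illustration, the following is a minimal sketch of the acquisition step, assuming the camera is exposed as an OpenCV-compatible video device; the device index and the 640×640 resolution are placeholder assumptions, and a production-line CCD camera would normally be driven through the vendor SDK instead.

```python
import cv2

def acquire_surface_image(device_index: int = 0):
    """Grab one frame from a camera exposed as a video device.

    Device index and resolution below are placeholders, not values
    taken from this patent.
    """
    cap = cv2.VideoCapture(device_index)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 640)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("failed to read a frame from the camera")
    return frame  # BGR ndarray, handed to the feature extraction step
```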
Step A02) inputting the product surface image into a feature extraction network model to obtain product features.
Further, inputting the product surface image into the feature extraction network model to obtain the product features specifically includes: inputting the product surface image into a feature extraction network model to obtain a feature map, wherein the feature map is a set of at least one product feature. Optionally, the feature extraction network model includes a residual network model and a bottleneck layer.
Further, inputting the product surface image into the feature extraction network model to obtain the feature map specifically includes: establishing a Focus module in the feature extraction network model, inputting the product surface image into the Focus module, and performing a slicing operation; then carrying out a convolution operation in the feature extraction network model on the picture obtained after the slicing operation to obtain a double-downsampled feature map.
Downsampling mainly serves to reduce the number of parameters and achieve dimensionality reduction in a neural network, while also enlarging the local receptive field. The downsampling process is inevitably accompanied by feature loss, especially for segmentation tasks that go through downsampling encoding and upsampling decoding. Focus is a special downsampling method in the target detection network that uses a slicing operation to split a high-resolution picture/feature map into several low-resolution pictures/feature maps. Specifically, a picture is split into 4 parts by interval sampling, giving four complementary pictures with no features lost; the W and H information of the picture is concentrated into the channel space, expanding the input channels by a factor of 4, i.e. the picture produced by splicing in the channel dimension has 12 channels relative to the original RGB three-channel mode; different features are then extracted by convolution, finally yielding a double-downsampled feature map without feature loss. In the YOLOv5s network, an original 640×640×3 image input into the Focus module first becomes a 320×320×12 feature map and then, after a convolution operation, a 320×320×32 feature map, which enlarges the receptive field of each point, reduces the loss of original information and facilitates the next convolution operation.
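The slicing-plus-convolution described above can be sketched in PyTorch as follows; the class is our own minimal rendering of a Focus module, with channel counts following the 640×640×3 to 320×320×12 to 320×320×32 example.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Slice-and-concatenate downsampling: (B, 3, 640, 640) -> (B, 12, 320, 320),
    then a convolution to (B, 32, 320, 320), with no information discarded."""
    def __init__(self, in_ch: int = 3, out_ch: int = 32):
        super().__init__()
        self.conv = nn.Conv2d(in_ch * 4, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.Mish()

    def forward(self, x):
        # Interval sampling: four complementary pictures, stacked on channels.
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.act(self.bn(self.conv(x)))
```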
And A03) inputting product characteristics into a defect detection network model to detect defects and outputting a defect detection result.
In this embodiment, a product defect detection data set is constructed and the target detection algorithm is improved, so that product visual defects can be detected rapidly and accurately, solving the technical problem of the low efficiency of manual visual defect detection.
The defect detection network model is obtained through the following steps:
recording defect information on a product surface image, and constructing a product defect detection data set according to the defect information, wherein the product defect detection data set comprises a training set and a verification set;
training and verifying the deep learning model by using the product defect detection data set to obtain a defect detection network model.
That is, before the defect detection network model of the present embodiment is actually applied, the detection accuracy and speed of the defect detection network model may be trained and improved by means of machine learning.
Further, the present embodiment utilizes a plurality of product surface images having various surface defects to construct a training set for training a defect detection network model. Referring to fig. 2, the step of constructing a product defect inspection data set according to defect information includes:
step B01), intercepting a surface area of a product to be detected based on morphological processing and background difference; by this step, unnecessary influence of other images in the background on the algorithm can be avoided.
Step B02) marking whether the surface image of the product contains defects or not;
step B03), classifying the marked pictures containing the defect targets, and recording the real frame positions and the category information of all the defect targets;
step B04) constructing different product defect detection data sets aiming at different types of defects, and finally dividing the product defect detection data sets into a training set and a verification set according to a certain proportion.
The training set is used to train the defect detection network model, and the verification set is used to verify the model's detection precision and speed. By extracting the surface area of the product to be detected based on morphological processing and background differencing, marking and classifying the pictures containing defects, and recording the position and class information of all defect targets, product defect detection data sets can be constructed for different types of defects.
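As an illustration of steps B02 to B04, the following sketch groups annotated samples by defect type and splits each group into training and verification sets; the record layout and the 8:2 split ratio are assumptions standing in for the unspecified "certain proportion".

```python
import random

def build_defect_dataset(samples, split_ratio: float = 0.8, seed: int = 0):
    """samples: list of dicts such as
    {"image": "part_0001.png", "boxes": [[x1, y1, x2, y2], ...], "labels": ["scratch", ...]}
    Images with an empty "boxes" list are the defect-free negatives (step B02).
    """
    by_type = {}
    for s in samples:                       # steps B03/B04: group by defect type
        key = s["labels"][0] if s["labels"] else "no_defect"
        by_type.setdefault(key, []).append(s)

    rng = random.Random(seed)
    train, val = [], []
    for group in by_type.values():          # stratified split keeps each type
        rng.shuffle(group)                  # represented in both sets
        cut = int(len(group) * split_ratio)
        train.extend(group[:cut])
        val.extend(group[cut:])
    return train, val
```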
Optionally, the data set can be expanded by translation, flipping, rotation, scaling, changing the background color temperature and the like, making the recognition capability of the defect detection network model more robust, as in the sketch below.
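One possible rendering of these expansions with torchvision transforms follows; color jitter stands in loosely for background color-temperature changes, and for box-level defect labels the same geometric transforms would also have to be applied to the box coordinates (e.g. with a detection-aware augmentation library).

```python
from torchvision import transforms

# Image-level versions of the augmentations listed above.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                  # flip
    transforms.RandomAffine(degrees=15,                      # rotation
                            translate=(0.1, 0.1),            # translation
                            scale=(0.8, 1.2)),               # scaling
    transforms.ColorJitter(brightness=0.2, saturation=0.2,   # rough stand-in for
                           hue=0.05),                        # color-temperature shifts
])
```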
Optionally, referring to fig. 3, the step of training and validating the deep learning model using the product defect detection dataset includes:
step C01) constructing an algorithm network model, which comprises the following steps: and constructing a trunk feature extraction network, pre-training the classification task of the trunk network on the published image Net image dataset in advance, and storing a pre-training network model and a model weight file.
In the field of machine vision, convolutional neural networks (ConvNets, CNNs) have become the dominant method. One milestone in CNN history was the appearance of the ResNet model, which made it possible to train deeper CNN models and thus achieve higher accuracy. The ResNet model is a residual network; its core is building shortcut connections between earlier and later layers, which helps gradients back-propagate during training so that deeper CNN networks can be trained. However, this design is aimed purely at saving computation time and further reducing the time required to train the whole model, and has no effect on the final model accuracy.
EfficientNet is a newer model scaling method building on ResNet: it uses a simple and efficient compound coefficient to scale the network up in three dimensions (network depth, width and input image resolution), rather than scaling dimensions arbitrarily as conventional ResNet practice does, and an optimal set of parameters (compound coefficients) can be obtained through neural architecture search. EfficientNet is not only much faster than ResNet but also more accurate.
The convolutional neural network EfficientNetV2 adopts training-aware neural architecture search (NAS) and scaling to jointly optimize training speed and parameter count, and expands the NAS search space with new ops such as Fused-MBConv. Compared with other state-of-the-art (SOTA) schemes, EfficientNetV2 converges faster and the model is smaller (6.8x). Training is usually accelerated by progressively increasing the image size, which typically reduces accuracy; to compensate for this loss, an improved progressive learning scheme was proposed that adaptively adjusts regularization factors such as dropout, data augmentation and mixup according to the image size. Benefiting from progressive learning, EfficientNetV2 clearly outperforms other models on the CIFAR/Cars/Flowers datasets. Pre-trained on the ImageNet dataset, EfficientNetV2 reaches 87.3% top-1 accuracy on the ImageNet image dataset, 2.0% better than ViT, while training 5x to 11x faster.
Preferably, this embodiment uses the lightweight EfficientNetv2 as the backbone feature extraction network. Target detection algorithms can generally be divided into single-stage and two-stage algorithms: single-stage algorithms complete convolutional feature extraction and attribute classification in one stage, whereas two-stage algorithms perform them separately, which produces a gap in detection and training speed. The representative single-stage algorithms at present are the YOLO series and the SSD series; the YOLO series is updated rapidly, currently achieves detection within 10 ms (the 5s model) with high detection precision, and is suitable for defect detection and localization. Note that, compared with the original backbone feature extraction network of the YOLOv5 network, the first 1 to 3 stages of EfficientNetv2 in this embodiment use Fused-MBConv (shallow network) modules, replacing the depthwise conv3x3 and expansion conv1x1 of the MBConv module with a single conventional conv3x3. Moreover, some stages of EfficientNetv1 use 5x5 convolution kernels, while EfficientNetv2 uses only 3x3 kernels but contains more layers to compensate for the receptive field; EfficientNetv2 also drops the last stride-1 stage present in EfficientNetv1. These differences give the backbone feature extraction network of this embodiment fewer parameters and lower memory consumption, effectively shortening training time.
In practice, a group of convolution layers with the same structure is usually called a stage. Stride is the step size; note that the Operator modules are stacked repeatedly several times within each stage, and apart from the stride of the first Operator module, the other strides default to 1. Layers denotes how many times the stage repeatedly stacks the Operator. A Fused-MBConv block of the kind described above can be sketched as follows.
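This is a minimal sketch of the Fused-MBConv block referred to above: a single ordinary 3x3 convolution expands the channels in place of the expand-1x1 + depthwise-3x3 pair of MBConv. The expansion ratio of 4 and the SiLU activation are assumptions, not values taken from this patent.

```python
import torch.nn as nn

class FusedMBConv(nn.Module):
    """One ordinary 3x3 convolution expands the channels, a 1x1 convolution
    projects back, with a residual shortcut when the shape is preserved."""
    def __init__(self, in_ch, out_ch, stride=1, expand=4):
        super().__init__()
        mid = in_ch * expand  # expansion ratio is an assumption
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.SiLU(),
            nn.Conv2d(mid, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```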
Step C02) builds a cross-phase partial network.
Further, constructing the cross-stage partial network may include splitting the feature map obtained by the backbone feature extraction network into two parts, recorded as the trunk part and the branch part; the trunk part is sent into the SPP module after several convolution layers, the resulting feature layers are spliced, and the result of the trunk convolution operations is connected with the branch part after several further convolution layers.
In the embodiment of the present application, constructing the cross-stage partial network reduces the computation load; fusing the residual network or bottleneck layer into the cross-stage partial network can effectively raise its computation speed and detection efficiency, improving detection precision to a certain extent while reducing the parameter computation load. A minimal sketch of this structure follows.
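The sketch below is one minimal reading of the structure in step C02: the input is split into trunk and branch by 1x1 convolutions, the trunk passes through the SPP module, and the two halves are concatenated. The channel arithmetic and the pooling sizes (5, 9, 13) are assumptions.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: parallel max-pools concatenated with the input."""
    def __init__(self, ch, pool_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in pool_sizes])

    def forward(self, x):
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)

class CSPBlock(nn.Module):
    """Split into trunk and branch; the trunk goes through convs + SPP, then the
    two parts are concatenated, which is what keeps the parameter count down."""
    def __init__(self, ch):
        super().__init__()
        half = ch // 2
        self.trunk_in = nn.Conv2d(ch, half, 1, bias=False)
        self.branch_in = nn.Conv2d(ch, half, 1, bias=False)
        self.spp = SPP(half)
        self.trunk_out = nn.Conv2d(half * 4, half, 1, bias=False)  # SPP widens x4

    def forward(self, x):
        trunk = self.trunk_out(self.spp(self.trunk_in(x)))
        return torch.cat([trunk, self.branch_in(x)], dim=1)
```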
Step C03) building a feature aggregation network, adding cross mini-batch normalization and a Mish activation function after each convolution layer of the feature aggregation network to form a convolution module, as in the sketch below.
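A minimal sketch of such a convolution module follows; plain nn.BatchNorm2d stands in here for cross mini-batch normalization, which is a training-time statistics-accumulation scheme rather than a distinct layer type in PyTorch.

```python
import torch.nn as nn

def conv_module(in_ch, out_ch, k=3, s=1):
    """Conv -> normalization -> Mish, the unit repeated through the feature
    aggregation network."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, k, stride=s, padding=k // 2, bias=False),
        nn.BatchNorm2d(out_ch),  # stand-in for cross mini-batch normalization
        nn.Mish(),
    )
```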
According to the above structure, the present embodiment provides a structure of a feature aggregation network, and examples are as follows:
Layer 77 is a convolution layer plus an up-sampling layer; the convolution kernel size is 1×1 and the stride is 1.
Layers 78 to 82 are formed by applying a convolution with kernel size 1×1 and stride 1 to the feature layer obtained at layer 62, concatenating the result with the feature layer obtained at layer 77, and then applying 5 convolution layers with kernel sizes 1×1, 3×3, 1×1, 3×3 and 1×1 in sequence, all with stride 1.
Layer 83 is a convolution layer plus an up-sampling layer; the convolution kernel size is 1×1 and the stride is 1.
Layers 84 to 88 are formed by applying a convolution with kernel size 1×1 and stride 1 to the feature layer obtained at layer 45, concatenating the result with the feature layer obtained at layer 83, and then applying 5 convolution layers with kernel sizes 1×1, 3×3, 1×1, 3×3 and 1×1 in sequence, all with stride 1.
The 1st classification and frame prediction result (YOLO head 1) is the output of layer 115; the 2nd (YOLO head 2) is the output of layer 87; the 3rd (YOLO head 3) is the output of layer 88.
In the above embodiment, the 1st classification and frame prediction result comprises a 2-dimensional convolution of size 3×3 with stride 1 followed by a 2-dimensional convolution of size 1×1 with stride 1, and the output feature map size is 52×52×18, where 52×52 is the feature map plane size and 18 is the number of channels. The 2nd classification and frame prediction result comprises a 2-dimensional convolution of size 3×3 with stride 1 followed by a 2-dimensional convolution of size 1×1 with stride 1, with output feature map size 26×26×18. The 3rd classification and frame prediction result comprises a 2-dimensional convolution of size 3×3 with stride 1 followed by a 2-dimensional convolution of size 1×1 with stride 1, with output feature map size 13×13×18. All three prediction results have 18 channels. Taking a 1×1×18 vector of the 3rd classification and frame prediction result (YOLO head 3) as an example, parameters 1 to 6 belong to the first prediction frame, 7 to 12 to the second, and 13 to 18 to the third. Among the parameters of the first prediction frame, the 1st is the confidence of the prediction frame, the 2nd is the probability that the frame contains a defect, and the 3rd to 6th are the 4 position adjustment parameters of the prediction frame; the parameters of the second and third prediction frames follow the same order.
It will be appreciated that, in the examples of the above embodiment, specific values are given for the layers, modules, kernel sizes and strides of the respective networks for clarity of presentation. These values are exemplary; one of ordinary skill in the art can modify the number of layers, number of modules, kernel sizes, strides and so on as required in actual use.
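The 18-channel head layout just described (3 prediction frames × 6 parameters per cell) can be decoded as in the sketch below; the tensor shapes follow the 52×52/26×26/13×13 examples above, while the function name is our own.

```python
import torch

def split_head_output(feat: torch.Tensor):
    """feat: (batch, 18, H, W) from one prediction head, H and W in {52, 26, 13}.
    Returns per-frame confidence, defect probability and 4 position parameters."""
    b, c, h, w = feat.shape
    assert c == 18, "3 prediction frames x 6 parameters per cell"
    feat = feat.view(b, 3, 6, h, w)          # (batch, frame, param, H, W)
    conf = feat[:, :, 0].sigmoid()           # parameter 1: frame confidence
    cls_prob = feat[:, :, 1].sigmoid()       # parameter 2: defect probability
    box = feat[:, :, 2:6]                    # parameters 3-6: position adjustments
    return conf, cls_prob, box
```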
Step C04) training the constructed algorithmic network model using a training set of product defect detection datasets.
Further, in the step of training the constructed algorithm network model with the training set of the product defect detection data set, a loss function is added. A loss function (or cost function) maps the value of a random event, or of its related random variables, to a non-negative real number representing the "risk" or "loss" of that event. In this embodiment, the loss functions comprise a regression loss function, a class loss function and a confidence loss function; GIOU loss is used for the defect position regression loss function, and a binary cross-entropy loss function is used for the defect class loss function and the confidence loss function. The total loss value is calculated and error back-propagation is performed, using cross mini-batch normalization and the Mish activation function to avoid network degradation and accelerate training, and the model with the highest mean average precision on the verification set across all training is saved, yielding the trained improved target detection algorithm network model.
In implementation, the model, model parameters and training results can be saved once every 10 iterations, and finally the model, model parameters and training results with the highest mean average precision on the verification set across all training are kept, as in the sketch below.
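A sketch of this checkpointing policy, with placeholder file names, might look like:

```python
import torch

best_map = 0.0

def maybe_checkpoint(model, optimizer, iteration, val_map=None,
                     path="checkpoint.pt", best_path="best_map.pt"):
    """Save a snapshot every 10 iterations; keep the weights with the
    highest validation mAP seen so far."""
    global best_map
    if iteration % 10 == 0:                         # periodic snapshot
        torch.save({"iteration": iteration,
                    "model": model.state_dict(),
                    "optimizer": optimizer.state_dict()}, path)
    if val_map is not None and val_map > best_map:  # highest-mAP weights
        best_map = val_map
        torch.save(model.state_dict(), best_path)
```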
Further, the calculation method of the total loss value is as follows:
L = γ_1·L_cls + γ_2·L_obj + γ_3·L_box

wherein L_cls denotes the class loss function and γ_1 its weight value, L_obj denotes the confidence loss function and γ_2 its weight value, and L_box denotes the regression loss function and γ_3 its weight value.

The class loss function is calculated as:

L_cls = -(1/N_pos) Σ_{i∈pos} Σ_{j∈cla} [ O_ij·ln(Sigmoid(C_ij)) + (1 - O_ij)·ln(1 - Sigmoid(C_ij)) ]

wherein O_ij is the true class value, i.e. O_ij = 0 when sample i contains no class-j object and O_ij = 1 when sample i contains a class-j object; N_pos is the total number of positive samples; pos denotes the positions of target objects and cla the classes of target objects; Sigmoid(C_ij) is the predicted class value, i.e. the model's predicted probability that sample i contains a class-j target.

The confidence loss function is calculated as:

L_obj = -(1/N) Σ_i [ O_i·ln(Sigmoid(C_i)) + (1 - O_i)·ln(1 - Sigmoid(C_i)) ]

wherein O_i is the true confidence value, i.e. O_i = 1 when sample i is a positive sample and O_i = 0 when it is a negative sample; N is the total number of samples; Sigmoid(C_i) is the predicted confidence value, i.e. the model's predicted probability that sample i is a positive sample.

Denote the coordinates of the prediction frame as (x_1^p, y_1^p, x_2^p, y_2^p) and the coordinates of the real frame as (x_1^g, y_1^g, x_2^g, y_2^g), with x_2^p > x_1^p, y_2^p > y_1^p, x_2^g > x_1^g and y_2^g > y_1^g. The area of the prediction frame is

S^p = (x_2^p - x_1^p)(y_2^p - y_1^p)

and the area of the real frame is

S^g = (x_2^g - x_1^g)(y_2^g - y_1^g).

The area S_d of the overlapping part between the real frame and the prediction frame is calculated as:

x_1^d = max(x_1^p, x_1^g), y_1^d = max(y_1^p, y_1^g)
x_2^d = min(x_2^p, x_2^g), y_2^d = min(y_2^p, y_2^g)
S_d = max(0, x_2^d - x_1^d)·max(0, y_2^d - y_1^d)

Finding the minimum circumscribed rectangle of the real frame and the prediction frame, its area S_c is calculated as:

x_1^c = min(x_1^p, x_1^g), y_1^c = min(y_1^p, y_1^g)
x_2^c = max(x_2^p, x_2^g), y_2^c = max(y_2^p, y_2^g)
S_c = (x_2^c - x_1^c)(y_2^c - y_1^c)

The intersection-over-union IOU of the prediction frame and the real frame is calculated as:

IOU = S_d / U, where U = S^p + S^g - S_d

The regression loss function is calculated as:

L_box = 1 - IOU + (S_c - U)/S_c

wherein S_c is the area of the minimum circumscribed rectangle of the real frame and the prediction frame, and U is the area of the union of the real frame and the prediction frame.
By using GIOU loss for the defect position regression loss, this embodiment overcomes the problem that existing network regression loss functions attend only to the overlapping part and cannot regress when the prediction frame and the real frame do not intersect, thereby improving defect detection precision. A sketch combining the three losses follows.
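Putting the three losses together as defined above, a sketch implementation could be the following; the binary cross-entropy terms use the logits form for numerical stability, and the loss weights γ are placeholders.

```python
import torch
import torch.nn.functional as F

def giou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) frames as (x1, y1, x2, y2). Implements
    L_box = 1 - IOU + (S_c - U) / S_c from the formulas above."""
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    s_d = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)      # overlap S_d
    s_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    s_g = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = s_p + s_g - s_d                                    # U
    cx1 = torch.min(pred[:, 0], target[:, 0]); cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2]); cy2 = torch.max(pred[:, 3], target[:, 3])
    s_c = (cx2 - cx1) * (cy2 - cy1)                            # enclosing rectangle S_c
    iou = s_d / (union + eps)
    return (1 - iou + (s_c - union) / (s_c + eps)).mean()

def total_loss(cls_logits, cls_targets, obj_logits, obj_targets,
               pred_frames, true_frames, w=(1.0, 1.0, 1.0)):
    """L = γ1·L_cls + γ2·L_obj + γ3·L_box; the weights w are placeholders."""
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)
    l_obj = F.binary_cross_entropy_with_logits(obj_logits, obj_targets)
    l_box = giou_loss(pred_frames, true_frames)
    return w[0] * l_cls + w[1] * l_obj + w[2] * l_box
```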
Further, in the step of training the constructed algorithm network model with the training set of the product defect detection data set, the verification set is predicted using the improved target detection algorithm network model and its parameters to obtain a prediction result; the prediction result is post-processed by a GIOU-NMS (non-maximum suppression) module to obtain the output; and, based on the output, the detection precision and mean average precision of the trained improved target detection algorithm network model are calculated and the detection results recorded;
wherein the calculation formula of the mean average precision is:

mAP = (1/x) Σ_{i=1}^{x} AP_i

wherein AP_i is the detection precision of a given inspected part of the product, and x is the number of inspected parts.
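A direct rendering of this formula:

```python
def mean_average_precision(ap_per_part):
    """mAP = (1/x) * sum of AP_i over the x inspected parts of the product."""
    return sum(ap_per_part) / len(ap_per_part)

# e.g. mean_average_precision([0.91, 0.88, 0.95]) -> 0.9133...
```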
A machine vision-based product visual defect detection system for performing the machine vision-based product visual defect detection method as described above, referring to fig. 4, comprising:
the image acquisition module 1 is used for acquiring a product surface image and inputting the product surface image into the feature extraction network model to acquire product features;
the defect analysis module 2 inputs product characteristics to the defect detection network model to detect defects and determine defect types and corresponding defect quantity;
the data display module 3 outputs and displays the defect type of the defect product and the corresponding defect number.
Preferably, in this embodiment, the captured product surface images are distributed to a plurality of computation servers in the detection system; for each input product surface image, the models deployed on the servers automatically identify the various defects in the image and generate all of the product's defect types and the corresponding defect counts, so the visual defect detection work can be completed in a very short time (within seconds).
While the invention has been described by way of the embodiments and drawings above, those skilled in the art will appreciate that the invention is not limited thereto. Any modifications that do not depart from the functional and structural principles of the present invention are intended to be included within the scope of the appended claims.

Claims (10)

1. The product visual defect detection method based on machine vision is characterized by comprising the following steps of:
acquiring a product surface image;
inputting the product surface image into a feature extraction network model to obtain product features;
inputting the product characteristics to a defect detection network model to detect defects and outputting a defect detection result;
the defect detection network model is obtained through the following steps: recording defect information on a product surface image, and constructing a product defect detection data set according to the defect information, wherein the product defect detection data set comprises a training set and a verification set; and training and verifying a deep learning model by using the product defect detection data set to obtain the defect detection network model.
2. The machine vision-based product visual defect detection method of claim 1, wherein inputting the product surface image into the feature extraction network model to obtain the product features specifically comprises:
inputting the product surface image into the feature extraction network model to obtain a feature map, wherein the feature map is a set of at least one product feature, and the feature extraction network model comprises a residual network model and a bottleneck layer.
3. The machine vision-based product visual defect detection method of claim 2, wherein inputting the product surface image into the feature extraction network model to obtain a feature map specifically comprises:
establishing a Focus module in the feature extraction network model, inputting the product surface image into the Focus module, and performing slicing operation;
and carrying out convolution operation on the picture obtained after slicing operation in the feature extraction network model to obtain a double downsampling feature map.
4. A machine vision based product visual defect detection method as set forth in claim 1, wherein,
the step of constructing a product defect detection dataset from the defect information comprises:
marking whether the product surface image contains a defect;
classifying the marked pictures containing the defect targets, and recording the real frame positions and the category information of all the defect targets;
and constructing different product defect detection data sets aiming at different types of defects, and finally dividing the product defect detection data sets into a training set and a verification set according to a certain proportion.
5. A machine vision based product visual defect detection method as set forth in claim 3, wherein,
the step of training and validating a deep learning model using the product defect detection dataset includes:
(1) Constructing an algorithm network model, comprising: constructing a trunk feature extraction network, pre-training a classification task on a public image Net image dataset in advance by the trunk network, and storing a pre-training network model and a model weight file;
(2) Constructing a cross-stage partial network;
(3) Building a feature aggregation network, adding cross mini-batch normalization and a Mish activation function after each convolution layer of the feature aggregation network to form a convolution module;
(4) Training the constructed algorithm network model by using the training set of the product defect detection data set.
6. The machine vision based product visual defect detection method of claim 5, wherein,
the cross-phase partial network comprises: and splitting the feature map obtained by the trunk feature extraction network into two parts, namely a trunk part and a branch part, wherein the trunk part is sent to the SPP module after passing through a plurality of convolution layers, the obtained feature map layers are spliced, and the result of the trunk part convolution operation is connected with the branch part after passing through a plurality of convolution layers.
7. The machine vision based product visual defect detection method of claim 5, wherein,
adding a loss function in the step of training the constructed algorithm network model with the training set of the product defect detection data set, the loss function comprising a regression loss function, a class loss function and a confidence loss function; wherein GIOU loss is used for the defect position regression loss function, and a binary cross-entropy loss function is used for the defect class loss function and the confidence loss function; and the total loss value is calculated and error back-propagation performed, using cross mini-batch normalization and the Mish activation function, and the model with the highest mean average precision on the verification set across all training is saved to obtain a trained improved target detection algorithm network model.
8. The method for detecting visual defects of a machine vision based product of claim 7,
the calculation method of the total loss value comprises the following steps:
$L = \gamma_1 L_{cls} + \gamma_2 L_{obj} + \gamma_3 L_{box}$

wherein $L_{cls}$ represents the class loss function and $\gamma_1$ the weight value of the class loss function, $L_{obj}$ represents the confidence loss function and $\gamma_2$ the weight value of the confidence loss function, $L_{box}$ represents the regression loss function and $\gamma_3$ the weight value of the regression loss function;

calculating the class loss function:

$L_{cls} = -\frac{1}{N_{pos}} \sum_{i \in pos} \sum_{j \in cla} \left[ O_{ij}\ln\big(\mathrm{sigmoid}(C_{ij})\big) + (1 - O_{ij})\ln\big(1 - \mathrm{sigmoid}(C_{ij})\big) \right]$

wherein: $O_{ij}$ represents the true value of the class, i.e. $O_{ij} = 0$ when sample $i$ contains no class-$j$ object and $O_{ij} = 1$ when sample $i$ contains a class-$j$ object; $N_{pos}$ represents the total number of positive samples; $pos$ denotes the positions of the target objects and $cla$ the classes of the target objects; $\mathrm{sigmoid}(C_{ij})$ represents the predicted value of the class, i.e. the probability predicted by the model that sample $i$ contains a class-$j$ object;

calculating the confidence loss function:

$L_{obj} = -\frac{1}{n} \sum_{i} \left[ O_i\ln\big(\mathrm{sigmoid}(C_i)\big) + (1 - O_i)\ln\big(1 - \mathrm{sigmoid}(C_i)\big) \right]$

wherein: $O_i$ represents the true value of the confidence, i.e. $O_i = 1$ when sample $i$ is a positive sample and $O_i = 0$ when sample $i$ is a negative sample; $n$ represents the total number of samples; $\mathrm{sigmoid}(C_i)$ represents the predicted value of the confidence, i.e. the probability predicted by the model that sample $i$ is a positive sample;

denoting the coordinates of the prediction frame as $(x_1^p, y_1^p, x_2^p, y_2^p)$ and the coordinates of the real frame as $(x_1^g, y_1^g, x_2^g, y_2^g)$, with $x_1^p < x_2^p$, $y_1^p < y_2^p$, $x_1^g < x_2^g$ and $y_1^g < y_2^g$;

area of the prediction frame: $S^p = (x_2^p - x_1^p)(y_2^p - y_1^p)$;

area of the real frame: $S^g = (x_2^g - x_1^g)(y_2^g - y_1^g)$;

calculating the area $S_d$ of the overlapping part between the real frame and the prediction frame:

$x_1^d = \max(x_1^p, x_1^g)$, $y_1^d = \max(y_1^p, y_1^g)$, $x_2^d = \min(x_2^p, x_2^g)$, $y_2^d = \min(y_2^p, y_2^g)$

$S_d = \max(0,\, x_2^d - x_1^d) \cdot \max(0,\, y_2^d - y_1^d)$;

searching the minimum circumscribed rectangle of the real frame and the prediction frame, and calculating the area $S_c$ of this minimum circumscribed rectangle:

$x_1^c = \min(x_1^p, x_1^g)$, $y_1^c = \min(y_1^p, y_1^g)$, $x_2^c = \max(x_2^p, x_2^g)$, $y_2^c = \max(y_2^p, y_2^g)$

$S_c = (x_2^c - x_1^c)(y_2^c - y_1^c)$;

calculating the intersection-over-union $IOU$ of the prediction frame and the real frame:

$IOU = \dfrac{S_d}{S^p + S^g - S_d}$;

calculating the regression loss function:

$L_{box} = 1 - IOU + \dfrac{S_c - U}{S_c}$

wherein: $S_c$ represents the area of the minimum circumscribed rectangle of the real frame and the prediction frame, and $U = S^p + S^g - S_d$ represents the area of the union of the real frame and the prediction frame.
9. The machine vision based product visual defect detection method of claim 7, wherein
in the step of training the constructed algorithm network model with the training set of the product defect detection data set, the verification set is predicted with the improved target detection algorithm network model and its parameters to obtain a prediction result, the prediction result is post-processed with a GIOU-NMS module to obtain the output, and the detection precision and the mean average precision of the trained improved target detection algorithm network model are calculated from that output and the detection result is recorded (a GIOU-NMS sketch follows this claim);
wherein the mean average precision is calculated as

$mAP = \dfrac{1}{x}\sum_{i=1}^{x} AP(i)$

wherein $AP(i)$ is the detection precision of the $i$-th detection part of the product, and $x$ is the number of detection parts.
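For illustration only: a minimal sketch of GIOU-based non-maximum suppression and of the mean-average-precision formula above. The claim does not specify the internals of the GIOU-NMS module, so the greedy scheme below (standard NMS with GIOU as the suppression metric) is an assumption, as are all names.

import torch

def pairwise_giou(box, others):
    # GIOU between one kept box (4,) and candidate boxes (N, 4), (x1, y1, x2, y2).
    s_p = (box[2] - box[0]) * (box[3] - box[1])
    s_g = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    w = (torch.min(box[2], others[:, 2]) - torch.max(box[0], others[:, 0])).clamp(min=0)
    h = (torch.min(box[3], others[:, 3]) - torch.max(box[1], others[:, 1])).clamp(min=0)
    s_d = w * h
    u = s_p + s_g - s_d
    s_c = ((torch.max(box[2], others[:, 2]) - torch.min(box[0], others[:, 0])) *
           (torch.max(box[3], others[:, 3]) - torch.min(box[1], others[:, 1])))
    return s_d / u.clamp(min=1e-7) - (s_c - u) / s_c.clamp(min=1e-7)  # GIOU = IOU - (S_c - U)/S_c

def giou_nms(boxes, scores, thresh=0.5):
    # Greedy NMS: keep the highest-scoring box, suppress candidates whose
    # GIOU with it exceeds the threshold, and repeat on the remainder.
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(int(i))
        if order.numel() == 1:
            break
        rest = order[1:]
        order = rest[pairwise_giou(boxes[i], boxes[rest]) <= thresh]
    return keep

def mean_average_precision(ap_per_part):
    # mAP = (1/x) * sum of AP(i) over the x detection parts of the product.
    return sum(ap_per_part) / len(ap_per_part)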
10. A machine vision based product visual defect detection system for performing the machine vision based product visual defect detection method of any one of claims 1 to 9, comprising:
the image acquisition module is used for acquiring a product surface image and inputting the product surface image into the feature extraction network model to acquire product features;
the defect analysis module is used for inputting the product characteristics to a defect detection network model to detect defects and determining defect types and corresponding defect quantity;
and the data transmission and display module is used for outputting and displaying the defect types and the corresponding defect numbers of the defective product (see the sketch following this claim).
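For illustration only: a minimal sketch of how the three claimed modules could be wired together; all names and call signatures are assumptions.

class DefectDetectionSystem:
    # Illustrative wiring of the three modules of claim 10.
    def __init__(self, feature_extractor, defect_detector, display):
        self.feature_extractor = feature_extractor  # feature extraction network model
        self.defect_detector = defect_detector      # defect detection network model
        self.display = display                      # data transmission and display

    def inspect(self, surface_image):
        features = self.feature_extractor(surface_image)              # image acquisition module
        defect_types, defect_counts = self.defect_detector(features)  # defect analysis module
        self.display(defect_types, defect_counts)                     # output and display
        return defect_types, defect_counts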
CN202211738497.8A 2022-12-31 2022-12-31 Product visual defect detection method and system based on machine vision Pending CN116071315A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211738497.8A CN116071315A (en) 2022-12-31 2022-12-31 Product visual defect detection method and system based on machine vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211738497.8A CN116071315A (en) 2022-12-31 2022-12-31 Product visual defect detection method and system based on machine vision

Publications (1)

Publication Number Publication Date
CN116071315A true CN116071315A (en) 2023-05-05

Family

ID=86183175

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211738497.8A Pending CN116071315A (en) 2022-12-31 2022-12-31 Product visual defect detection method and system based on machine vision

Country Status (1)

Country Link
CN (1) CN116071315A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095187A (en) * 2023-10-16 2023-11-21 四川大学 Meta-learning visual language understanding and positioning method
CN117095187B (en) * 2023-10-16 2023-12-19 四川大学 Meta-learning visual language understanding and positioning method
CN117078689A (en) * 2023-10-17 2023-11-17 沈阳宏远电磁线股份有限公司 Cable defect identification method and system based on machine vision
CN117078689B (en) * 2023-10-17 2024-01-30 沈阳宏远电磁线股份有限公司 Cable defect identification method and system based on machine vision

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN106875381B (en) Mobile phone shell defect detection method based on deep learning
CN111553929B (en) Mobile phone screen defect segmentation method, device and equipment based on converged network
KR102166458B1 (en) Defect inspection method and apparatus using image segmentation based on artificial neural network
CN108074231B (en) Magnetic sheet surface defect detection method based on convolutional neural network
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN116071315A (en) Product visual defect detection method and system based on machine vision
CN113723377A (en) Traffic sign detection method based on LD-SSD network
WO2024002187A1 (en) Defect detection method, defect detection device, and storage medium
CN115830004A (en) Surface defect detection method, device, computer equipment and storage medium
WO2020259416A1 (en) Image collection control method and apparatus, electronic device, and storage medium
CN113780423A (en) Single-stage target detection neural network based on multi-scale fusion and industrial product surface defect detection model
CN115775236A (en) Surface tiny defect visual detection method and system based on multi-scale feature fusion
CN112686896B (en) Glass defect detection method based on frequency domain and space combination of segmentation network
CN106682604B (en) Blurred image detection method based on deep learning
CN111881914A (en) License plate character segmentation method and system based on self-learning threshold
CN116739991A (en) Liquid crystal display screen surface defect detection method based on deep learning and electronic device
CN116152685A (en) Pedestrian detection method and system based on unmanned aerial vehicle visual field
CN116091506A (en) Machine vision defect quality inspection method based on YOLOV5
CN116228637A (en) Electronic component defect identification method and device based on multi-task multi-size network
CN116188361A (en) Deep learning-based aluminum profile surface defect classification method and device
Nguyen et al. Detection of weak micro-scratches on aspherical lenses using a Gabor neural network and transfer learning
CN113642473A (en) Mining coal machine state identification method based on computer vision
CN114219758A (en) Defect detection method, system, electronic device and computer readable storage medium
CN114596244A (en) Infrared image identification method and system based on visual processing and multi-feature fusion

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination