CN112184679A

CN112184679A - YOLOv3-based automatic wine bottle flaw detection method

Info

Publication number: CN112184679A
Application number: CN202011064186.9A
Authority: CN
Inventors: 关洁; 杨海东; 李俊宇; 李淑芬
Original assignee: Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute; Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Current assignee: Foshan Nanhai Guangdong Technology University CNC Equipment Cooperative Innovation Institute; Foshan Guangdong University CNC Equipment Technology Development Co. Ltd
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2021-01-05

Abstract

The invention discloses a YOLOv3-based method for automatically detecting wine bottle flaws. A YOLOv3 network learns defect features from manually collected and annotated images of flawed wine bottles, yielding a model that can automatically identify whether a wine bottle in production contains a flaw. The model is loaded into a real-time wine bottle recognition system, which captures images and returns recognition results in real time. This addresses the low efficiency and low precision of manual inspection and helps raise the degree of automation of the production process. The invention is also convenient to operate and easy to implement.

Description

YOLOv3-based automatic wine bottle flaw detection method

Technical Field

The invention relates to the technical field of deep learning and computer vision, in particular to an automatic wine bottle flaw detection method based on YOLOv3.

Background

As China's manufacturing industry continues to transform and automate, production lines that were once dominated by manual labor are gradually becoming machine-dominated, and artificial intelligence algorithms play a vital role in this shift. In wine bottle manufacturing, for example, quality inspection used to be carried out mainly by hand. Because bottles are produced in very large quantities, the inspection workload is heavy and flaws are easily missed. With a computer vision algorithm, a camera can photograph each bottle and the system can immediately judge whether it has flaws, which frees up manpower. The prior art therefore needs to be further improved and perfected.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides an automatic wine bottle flaw detection method based on YOLOv3.

The purpose of the invention is realized by the following technical scheme:

An automatic wine bottle flaw detection method based on YOLOv3 mainly comprises the following steps:

step S1: collecting abundant image data of defective wine bottles from the production site for model learning;

step S2: annotating the flaws in the images and organizing the data in the format of the VOC data set;

step S3: visualizing all the data to summarize the distribution of each class;

step S4: learning the wine bottle flaws in the data with a YOLOv3 network, fitting the data with deep multilayer convolution to obtain a converged weight model;

step S5: testing the defective wine bottle images of the test set with the learned weight model, then adjusting the parameters and retraining;

step S6: after the model's test performance has improved and stabilized, optimizing the model with optimization schemes to improve precision;

step S7: obtaining the final model and debugging it on the on-site recognition system.

Further, in step S1, two high-precision industrial cameras collect image data of the bottle cap and the bottle body respectively; the two parts are imaged separately because their environmental factors and size ratios differ. The images are then resized to 416 × 416 with OpenCV (an image processing library).

Further, in step S2, the flaws are divided into ten categories: bottle cap damage, bottle cap deformation, bottle cap broken edge, bottle cap rotation, bottle cap breaking point, label skew, label wrinkling, label bubble, normal code spraying, and abnormal code spraying. The flaws are annotated manually with the object annotation software LabelImg: every flaw in every picture is boxed and labeled with its class to obtain an annotation file for each image. The data and the corresponding annotation files are then organized in the format of the VOC data set and divided into a training set, a validation set, and a test set.
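As a minimal sketch of what one annotation file from this step might contain, the snippet below parses a Pascal VOC style XML record with Python's standard library. The file name, class identifiers, and box coordinates are made up for illustration and are not taken from the patent's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC XML layout (illustrative values only).
SAMPLE_XML = """<annotation>
  <filename>cap_0001.jpg</filename>
  <size><width>416</width><height>416</height><depth>3</depth></size>
  <object>
    <name>cap_deformation</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>200</xmax><ymax>150</ymax></bndbox>
  </object>
  <object>
    <name>label_skew</name>
    <bndbox><xmin>30</xmin><ymin>220</ymin><xmax>390</xmax><ymax>400</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (filename, [(class_name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        coords = tuple(
            int(bndbox.find(key).text) for key in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((obj.find("name").text,) + coords)
    return root.find("filename").text, boxes


filename, boxes = parse_voc(SAMPLE_XML)
print(filename, boxes)
```

Each tuple carries the class name followed by the corner coordinates of one framed flaw, which is the information the later training steps consume.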

Further, in step S3, the data are visualized with libraries such as pandas and matplotlib to inspect the data distribution and the share of each flaw class. The number of samples per class can then be adjusted according to the distribution charts, so that every class has enough data for the model to learn from and the data set stays balanced.

Further, in step S4, the YOLOv3 backbone is the Darknet-53 network, which borrows the residual structure of ResNet. It has 53 convolutional layers and contains 23 residual blocks; this structure allows a deeper network that can extract deep semantic information. The network contains 5 resX structures, where X is a number (res1, res2, …, res8) indicating that the res_block contains X res_units; these are the large components of YOLOv3. The basic component is the DBL, which consists of a convolution layer, a batch normalization (BN) layer, and a LeakyReLU activation layer. The BN layer accelerates convergence: it transforms the data of each layer to zero mean and unit variance, so that every layer is trained on similarly distributed data, which eases convergence and helps prevent exploding and vanishing gradients.
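The normalization and activation halves of a DBL unit can be illustrated on a handful of numbers. This is only a sketch: the learnable scale and shift of a real BN layer are omitted, the LeakyReLU slope of 0.1 is an assumption, and the activations are made up.

```python
import math


def batch_norm(values, eps=1e-5):
    """Standardize a batch of activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def leaky_relu(values, slope=0.1):
    """Pass positive values through and scale negative values by `slope`."""
    return [v if v > 0 else slope * v for v in values]


activations = [2.0, 4.0, 6.0, 8.0]  # made-up convolution outputs for one channel
normalized = batch_norm(activations)
print(normalized)
print(leaky_relu(normalized))
```

After normalization the values are centered on zero with a spread close to one regardless of their original scale, which is the property the paragraph above credits with easier convergence.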

Further, in step S4, YOLOv3 initializes the bounding box sizes with the k-means clustering approach of YOLOv2. This prior knowledge greatly helps bounding box initialization: using many bounding boxes safeguards detection quality but significantly slows the algorithm. Compared with two-stage structures such as Faster R-CNN, prediction is much faster, reaching 30 FPS, while the mAP on the COCO data set reaches 57.9%.

Further, in step S4, the YOLOv3 prediction part consists of three branches. Predictions at three different scales are obtained by concatenating (concat) tensors of different scales, and the three outputs correspond to three convolutional layers. With 10 classes in total, the last convolutional layer has 45 convolution kernels: 3 × (10 + 4 + 1) = 45, where 3 is the number of bounding boxes per grid cell, 4 is the number of box coordinates, and 1 is the confidence.

Further, in step S5, the detection capability of the model is evaluated for each flaw class by calculating the AP (average precision) of each class and the overall mAP (mean average precision). Classes with a low AP are identified, and the model is given targeted, intensive training on those classes to improve its overall detection accuracy.

Further, in step S6, after the basic training parameters have stabilized, optimization schemes such as modifying the cfg file or changing the backbone network can be applied to further improve the accuracy of the model.

The working process and principle of the invention are as follows. A YOLOv3 network learns defect features from manually collected and annotated images of flawed wine bottles, yielding a model that can automatically identify whether a wine bottle in production contains a flaw. The model is loaded into a real-time wine bottle recognition system, which captures images and returns recognition results in real time. This addresses the low efficiency and low precision of manual inspection and raises the degree of automation of the production process. Detection is very fast: several images can be processed per second, which meets the needs of high-volume inspection. The method also overcomes the low recognition precision for small targets in previous versions, so detection precision is significantly improved. The invention is also convenient to operate and easy to implement.

Compared with the prior art, the invention also has the following advantages:

(1) The YOLOv3-based automatic wine bottle flaw detection method provided by the invention uses a computer vision algorithm whose recognition speed and precision far exceed those of manual inspection, which greatly improves inspection efficiency and precision after production and keeps production running stably at high speed.

(2) While keeping detection precision sufficiently high, the YOLOv3-based automatic wine bottle flaw detection method is clearly faster than two-stage models: the speed can reach 30 FPS, and the mAP on the COCO data set reaches 57.9%.

(3) According to the method for automatically detecting the wine bottle flaws based on the YOLOv3, in the structural design of model prediction, the model can be ensured to learn and predict characteristics with a large scale change range by fusing multi-scale characteristics.

(4) The method for automatically detecting the wine bottle flaws based on the YOLOv3 adopts a method of directly predicting relative positions, predicts the relative coordinates of the center point of the b-box relative to the upper left corner of the grid unit, calculates loss by multiplying the error of IoU by the prediction category probability calculated by regression, and ensures the improvement of category and positioning precision during training.

Drawings

Figs. 1, 2 and 3 show the raw flawed wine bottle data collected in industry according to the present invention.

Fig. 4 is a network architecture framework diagram of the YOLOv3 model in the present invention.

Fig. 5 shows predictions on production images after the YOLOv3 model of the present invention has been trained on the wine bottle flaw data.

FIG. 6 is a simplified system diagram of the present invention for field testing a model.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the accompanying drawings and examples.

Example 1:

As shown in fig. 1 to fig. 6, this embodiment discloses a wine bottle flaw detection method based on deep learning object detection. It uses the one-stage YOLOv3 multi-scale feature extraction detection network to accurately locate and classify flaw targets that span many categories and a wide range of scales, and specifically includes the following implementation steps:

Step one: collect abundant image data of defective wine bottles from the production site for model learning.

(1) Because wine bottle flaws may be on the bottle cap and the bottle body, pictures of the bottle body and the bottle cap are collected respectively.

(2) High-precision industrial cameras photograph the defective wine bottles in the production environment. Two cameras are required, one to collect bottle cap images and one to collect bottle body images, as shown in figures 1 and 2.

(3) All pictures are resized to 416 × 416 with the OpenCV library so that the network input has a fixed size.

Step two: annotate the flaws in the images and organize the data in the format of the VOC data set.

(1) A standard covering all flaw features is drawn up from all the flaw categories, giving 10 flaw classes in total:

1. bottle cap damage, 2. bottle cap deformation, 3. bottle cap broken edge, 4. bottle cap rotation, 5. bottle cap breaking point, 6. label skew, 7. label wrinkling, 8. label bubble, 9. normal code spraying, 10. abnormal code spraying.

(2) The flaws are then annotated manually with the object annotation software LabelImg: every flaw in every picture is boxed and labeled with its class to obtain an annotation file for each image.

(3) The data and the corresponding annotation files are then organized in the format of the VOC data set and divided into a training set, a validation set, and a test set.
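The split into training, validation, and test sets could be sketched as follows. The 80/10/10 ratio, the fixed seed, and the image identifiers are assumptions for illustration; the patent does not state the proportions it uses.

```python
import random


def split_dataset(image_ids, train=0.8, val=0.1, seed=0):
    """Shuffle reproducibly and cut into train / val / test lists."""
    ids = sorted(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # whatever remains after train and val
    }


ids = [f"bottle_{i:04d}" for i in range(100)]  # hypothetical image identifiers
splits = split_dataset(ids)
print({name: len(members) for name, members in splits.items()})
```

In the usual VOC layout each of the three lists would be written to its own text file of image identifiers, one per line.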

Step three: visualize all the data to summarize the distribution of each class.

After annotation is finished, all annotation data are visualized with libraries such as pandas and matplotlib to inspect the data distribution and the share of each flaw class. The proportions of the classes, shown in fig. 3, are used to confirm that every class has enough data for the model to learn from and that the data set is balanced.
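The class-balance check can be sketched without any plotting. The example below swaps in the standard library's `Counter` for the pandas and matplotlib workflow named in the text; the label list and the 20% threshold are made up for illustration.

```python
from collections import Counter


def class_distribution(labels):
    """Map each class name to (count, share of all annotations)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: (n, n / total) for name, n in counts.most_common()}


def underrepresented(labels, min_share):
    """Classes whose share of the annotations falls below `min_share`."""
    return sorted(
        name for name, (_, share) in class_distribution(labels).items()
        if share < min_share
    )


# Hypothetical labels gathered from all annotation files.
labels = ["cap_damage"] * 5 + ["label_skew"] * 4 + ["label_bubble"] * 1
distribution = class_distribution(labels)
for name, (count, share) in distribution.items():
    print(f"{name:<14}{count:>3}  {'#' * count}  {share:.0%}")
print("needs more data:", underrepresented(labels, min_share=0.2))
```

Classes flagged this way are candidates for collecting more images or for augmentation before training.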

Step four: learn the wine bottle flaws in the data with a YOLOv3 network, fitting the data with deep multilayer convolution to obtain a converged weight model.

(1) Characteristics of the YOLOv3 network structure: YOLO is an end-to-end object detection model, shown in fig. 4. Its basic idea is as follows. First, a feature extraction network extracts features from the input image and outputs feature maps of specific sizes. The input image is divided into 13 × 13 grid cells, and if the center of a ground truth object falls in a grid cell, that grid cell is responsible for predicting the object. Each grid cell predicts a fixed number of bounding boxes (three in YOLOv3), and logistic regression determines which regression box is used for the prediction.

(2) First, the feature extraction part: the YOLOv3 backbone is the Darknet-53 network, which borrows the residual structure of ResNet. It has 53 convolutional layers and contains 23 residual blocks; this structure allows a deeper network that can extract deep semantic information. The network contains 5 resX structures, where X is a number (res1, res2, …, res8) indicating that the res_block contains X res_units; these are the large components of YOLOv3. The basic component is the DBL, which consists of a convolution layer, a BN layer, and a LeakyReLU activation layer. Five of the convolutions in Darknet-53 have a stride of 2, so after five halvings the feature map is reduced to 1/32 of the original input size. The network input size must therefore be a multiple of 32, and 416 × 416 is used here.
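The arithmetic behind the multiple-of-32 requirement can be traced in a few lines. This is an illustrative sketch; the function name is hypothetical.

```python
def downsample_trace(input_size, num_halvings=5):
    """Feature map side length after each stride-2 convolution."""
    if input_size % (2 ** num_halvings) != 0:
        raise ValueError("input size must be a multiple of 32")
    sizes = [input_size]
    for _ in range(num_halvings):
        sizes.append(sizes[-1] // 2)
    return sizes


print(downsample_trace(416))
```

For a 416 × 416 input the side length shrinks through 208, 104, 52, and 26 down to 13, which matches the 13 × 13 grid described above. A size such as 400 is rejected because it does not halve cleanly five times.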

(3) Second, the prediction part consists of three branches. Predictions at three different scales are obtained by concatenating (concat) tensors of different scales, and the three outputs correspond to three convolutional layers. With 10 classes in total, the last convolutional layer has 45 convolution kernels: 3 × (10 + 4 + 1) = 45, where 3 is the number of bounding boxes per grid cell, 4 is the number of box coordinates, and 1 is the confidence.
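The kernel count and the resulting output shapes can be checked with a short calculation. The strides of 32, 16, and 8 for the three scales are an assumption taken from the standard YOLOv3 design; the text above only states the 13 × 13 grid explicitly.

```python
def head_channels(num_classes, boxes_per_cell=3):
    """Kernels in the last convolution: boxes x (classes + 4 coordinates + 1 confidence)."""
    return boxes_per_cell * (num_classes + 4 + 1)


def output_shapes(input_size, num_classes, strides=(32, 16, 8)):
    """(grid height, grid width, channels) for each prediction scale."""
    channels = head_channels(num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


shapes = output_shapes(416, num_classes=10)
total_boxes = sum(h * w * 3 for h, w, _ in shapes)
print(head_channels(10), shapes, total_boxes)
```

Under these assumptions a 416 × 416 image yields grids of 13, 26, and 52 cells per side, each with 45 channels, for 10647 candidate boxes in total.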

(4) YOLOv3 also determines the initial bounding box sizes with the k-means clustering approach of YOLOv2. This prior knowledge greatly helps bounding box initialization: using many bounding boxes safeguards detection quality but significantly slows the algorithm.
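A minimal sketch of this clustering step is given below. It groups box widths and heights by IoU similarity, as in the YOLOv2 approach; the initialization from the first k boxes, the toy box sizes, and k = 2 are simplifications for illustration (the standard YOLOv3 configuration uses nine priors).

```python
def wh_iou(box, centroid):
    """IoU of two boxes aligned at the same top-left corner, given as (w, h)."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    return inter / (box[0] * box[1] + centroid[0] * centroid[1] - inter)


def kmeans_anchors(boxes, k, iters=100):
    """Cluster (w, h) pairs, assigning each box to the centroid with highest IoU."""
    centroids = list(boxes[:k])  # naive deterministic initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: wh_iou(box, centroids[i]))
            clusters[best].append(box)
        updated = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:  # assignments have stopped changing
            break
        centroids = updated
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])


# Made-up annotation sizes in pixels: three small flaws and three large ones.
boxes = [(10, 12), (100, 110), (12, 10), (105, 95), (11, 11), (95, 105)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, so the priors reflect box shape as well as size.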

(5) YOLOv3 directly predicts relative positions: it predicts the coordinates of the b-box center point relative to the upper left corner of the grid cell. The network directly predicts (t_x, t_y, t_w, t_h, t_0), and the position, size, and confidence of the b-box are then calculated with the following coordinate offset formulas.

b_x = σ(t_x) + c_x

b_y = σ(t_y) + c_y

b_w = p_w · e^(t_w)

b_h = p_h · e^(t_h)

Pr(object) · IOU(b, object) = σ(t_0)

t_x, t_y, t_w, and t_h are the predicted outputs of the model. c_x and c_y are the coordinates of the grid cell: for example, if the feature map of a layer is 13 × 13, there are 13 × 13 grid cells, and the grid cell in row 0, column 1 has c_x = 1 and c_y = 0. p_w and p_h are the width and height of the prior bounding box. b_x, b_y, b_w, and b_h are the center coordinates and the size of the predicted bounding box. A sum of squared errors loss is used when training these coordinate values because this error can be calculated very quickly.
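The decoding of one prediction can be sketched as below. The center and confidence follow the formulas in the text; the width and height use the exponential scaling of the prior, b_w = p_w · e^(t_w) and b_h = p_h · e^(t_h), as in the standard YOLOv3 formulation. The conversion to pixels through the stride and the example prior of 116 × 90 pixels are assumptions for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(t, cell, prior, grid_size, input_size=416):
    """Turn raw outputs (tx, ty, tw, th, t0) into a box in input-image pixels."""
    tx, ty, tw, th, t0 = t
    cx, cy = cell          # column and row index of the grid cell
    pw, ph = prior         # prior width and height, in pixels
    stride = input_size / grid_size
    bx = (sigmoid(tx) + cx) * stride   # center x
    by = (sigmoid(ty) + cy) * stride   # center y
    bw = pw * math.exp(tw)             # width scales the prior
    bh = ph * math.exp(th)             # height scales the prior
    return bx, by, bw, bh, sigmoid(t0)


decoded = decode_box((0, 0, 0, 0, 0), cell=(6, 6), prior=(116, 90), grid_size=13)
print(decoded)
```

With all raw outputs at zero the box sits at the middle of its cell, keeps the prior's size, and has a confidence of 0.5. Because the sigmoid bounds the offsets to the interval (0, 1), the predicted center can never leave its grid cell.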

(6) YOLOv3 predicts an objectness score for each bounding box using logistic regression. If a bounding box overlaps a ground truth object more than any other bounding box does, its score should be 1. If a bounding box is not the best but overlaps a ground truth object by more than a threshold (0.5 in YOLOv3), the prediction is ignored. YOLOv3 assigns only one bounding box to each ground truth object; a bounding box that is not assigned to a ground truth object incurs no coordinate or class prediction loss, only an objectness loss.
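The assignment rule can be sketched for a single ground truth object as follows. The role names and the example boxes are hypothetical, and a full implementation would repeat this across all objects and all three scales.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_roles(candidates, truth, ignore_thresh=0.5):
    """Label each candidate box as positive, ignored, or negative."""
    overlaps = [iou(box, truth) for box in candidates]
    best = max(range(len(candidates)), key=overlaps.__getitem__)
    roles = []
    for i, overlap in enumerate(overlaps):
        if i == best:
            roles.append("positive")   # objectness target 1, full loss
        elif overlap > ignore_thresh:
            roles.append("ignored")    # good overlap but not best: no loss
        else:
            roles.append("negative")   # objectness loss only
    return roles


truth = (0, 0, 10, 10)
candidates = [(0, 0, 10, 10), (0, 0, 10, 8), (20, 20, 30, 30)]
roles = assign_roles(candidates, truth)
print(roles)
```

The middle candidate overlaps the object by 0.8, which is above the threshold but not the best match, so it is excluded from the loss instead of being penalized as background.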

(7) The data set is fed into the network model to learn the features of wine bottle flaws and to output the corresponding position and category information. A loss function (binary cross-entropy loss) then compares the output with the annotations to calculate the error, and the model is fitted step by step through gradient descent and backpropagation to obtain a better flaw detection model.
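The binary cross-entropy term named above can be written out in a few lines. This sketch covers only that term, averaged over a list of predictions; the clamping constant `eps` is an assumption added for numerical safety.

```python
import math


def binary_cross_entropy(targets, probs, eps=1e-7):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(targets)


uncertain = binary_cross_entropy([1, 0], [0.5, 0.5])
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
print(uncertain, confident)
```

Predictions closer to the annotated targets give a smaller loss, which is the quantity gradient descent reduces during training.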

Step five: test the defective wine bottle images of the test set with the learned weight model, then adjust the parameters and retrain.

(1) After the model has been trained and fitted, it can first be tested on the test set to check how accurately it detects flaws in new data. A typical result is shown in fig. 5. The output is checked roughly for basic errors (mismatched labels, obvious flaws that go undetected, and so on); if any occur, the annotation data and the settings of the training files need to be rechecked.

(2) If no basic errors occur, the detection capability of the model can be evaluated for each flaw class by calculating the AP (average precision) of each class and the mAP (mean average precision). The per-class AP values reveal which classes are recognized with low precision, and targeted operations such as data augmentation are applied to those classes to improve the overall detection precision of the model.
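A minimal sketch of the per-class AP and overall mAP calculation follows. It assumes each detection has already been marked as a true or false positive against the annotations, and it uses the all-point area under the precision-recall curve; the patent does not say which AP variant it uses, and the detections shown are made up.

```python
def average_precision(scored_hits, num_gt):
    """scored_hits: list of (confidence, is_true_positive); num_gt: annotated objects."""
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    precisions, recalls, tp = [], [], 0
    for rank, (_, hit) in enumerate(ranked, start=1):
        tp += 1 if hit else 0
        precisions.append(tp / rank)
        recalls.append(tp / num_gt)
    # Replace each precision by the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Hypothetical detections for two of the ten flaw classes.
per_class_ap = {
    "cap_damage": average_precision([(0.9, True), (0.8, True)], num_gt=2),
    "label_bubble": average_precision([(0.9, True), (0.8, False), (0.7, True)], num_gt=2),
}
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
weakest = min(per_class_ap, key=per_class_ap.get)
print(per_class_ap, mean_ap, weakest)
```

The class with the lowest AP is the natural first target for additional data or augmentation.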

Step six: after the model's test performance has improved and stabilized, optimize the model with optimization schemes to improve precision.

(1) The detection accuracy can be increased by setting the random parameter in the cfg file to 1.

(2) The input image size may be increased by setting a larger value in the cfg file (height 608 or more), provided it remains a multiple of 32; this can increase the detection accuracy.

(3) Ensure that every class in the data set has corresponding labels; this can be checked with code.

(4) Increase the richness of the data so that there are enough images under different production backgrounds and lighting conditions, and ensure that the number of training iterations is sufficient.

Step seven: obtain the final model and debug it on the on-site recognition system.

Finally, once the model's accuracy is high and stable enough, it can be loaded on site and tested in real time with the system structure of FIG. 6. A fixed industrial camera captures images of the wine bottles in production (a second camera for the bottle caps is set up similarly and is not shown in the figure), and the images are transmitted to a computer and fed into the model for flaw detection. The stability of the detection results must be observed, including whether they change when the on-site environment changes. If they do, new image data must be collected and the model retrained to improve its recognition accuracy, until the model can stably detect the flaws in the images.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. A wine bottle flaw automatic detection method based on YOLOv3 is characterized by comprising the following steps:

2. The YOLOv3-based automatic wine bottle flaw detection method according to claim 1, wherein in step S1, two high-precision industrial cameras collect image data of the bottle cap and the bottle body respectively, the two parts being imaged separately because their environmental factors and size ratios differ; the images are then resized to 416 × 416 with OpenCV (an image processing library).

3. The YOLOv3-based automatic wine bottle flaw detection method according to claim 1, wherein in step S2, the flaws are divided into ten categories, namely bottle cap damage, bottle cap deformation, bottle cap broken edge, bottle cap rotation, bottle cap breaking point, label skew, label wrinkling, label bubble, normal code spraying, and abnormal code spraying; the flaws are annotated manually with the object annotation software LabelImg, every flaw in every picture being boxed and labeled with its class to obtain an annotation file for each image; the data and the corresponding annotation files are then organized in the format of the VOC data set and divided into a training set, a validation set, and a test set.

4. The YOLOv3-based automatic wine bottle flaw detection method according to claim 1, wherein in step S3, the data are visualized with libraries such as pandas and matplotlib to inspect the data distribution and the share of each flaw class, and the number of samples per class can be adjusted according to the distribution charts, so that every class has enough data for the model to learn from and the data set stays balanced.

5. The method of claim 1, wherein in step S4, the YOLOv3 backbone is the Darknet-53 network, which borrows the residual structure of ResNet, has 53 convolutional layers, and contains 23 residual blocks, this structure allowing a deeper network that can extract deep semantic information; the network contains 5 resX structures, where X is a number (res1, res2, …, res8) indicating that the res_block contains X res_units, these being the large components of YOLOv3; the basic component is the DBL, which consists of a convolution layer, a BN layer, and a LeakyReLU activation layer; the BN layer accelerates convergence by transforming the data of each layer to zero mean and unit variance, so that every layer is trained on similarly distributed data, which eases convergence and helps prevent exploding and vanishing gradients.

6. The YOLOv3-based automatic wine bottle flaw detection method according to claim 1, wherein in step S4, YOLOv3 initializes the bounding box sizes with the k-means clustering approach of YOLOv2; this prior knowledge greatly helps bounding box initialization, since using many bounding boxes safeguards detection quality but significantly slows the algorithm; compared with two-stage structures such as Faster R-CNN, prediction is much faster, reaching 30 FPS, while the mAP on the COCO data set reaches 57.9%.

7. The method of claim 1, wherein in step S4, the YOLOv3 prediction part consists of three branches; predictions at three different scales are obtained by concatenating (concat) tensors of different scales, and the three outputs correspond to three convolutional layers; with 10 classes in total, the last convolutional layer has 45 convolution kernels: 3 × (10 + 4 + 1) = 45, where 3 is the number of bounding boxes per grid cell, 4 is the number of box coordinates, and 1 is the confidence.

8. The YOLOv3-based automatic wine bottle flaw detection method according to claim 1, wherein in step S5, the detection capability of the model is evaluated for each flaw class by calculating the AP (average precision) of each class and the overall mAP (mean average precision); classes with a low AP are identified, and the model is given targeted, intensive training on those classes to improve its overall detection accuracy.

9. The method of claim 1, wherein in step S6, after the basic training parameters have stabilized, optimization schemes such as modifying the cfg file or changing the backbone network can be applied to further improve the accuracy of the model.