CN116051808A - YOLOv5-based lightweight part identification and positioning method - Google Patents

YOLOv5-based lightweight part identification and positioning method Download PDF

Info

Publication number
CN116051808A
CN116051808A (Application CN202310037356.1A)
Authority
CN
China
Prior art keywords
frame
target
image
center
yolov5
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310037356.1A
Other languages
Chinese (zh)
Inventor
赵礼刚
秦齐
吴爱胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202310037356.1A priority Critical patent/CN116051808A/en
Publication of CN116051808A publication Critical patent/CN116051808A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Abstract

The invention discloses a YOLOv5-based lightweight part identification and positioning method, which comprises: collecting part sample images with an industrial camera; performing data enhancement on the acquired image samples, building a data set, and dividing it into a training set and a verification set; constructing a lightweight YOLOv5-based part recognition deep learning model and training the algorithm model on the loaded data set to obtain target recognition results and prediction boxes; cropping the target region indicated by the prediction box from the image, extracting edges from the preprocessed region to obtain the target contour, extracting feature points of the contour to obtain their position information, and taking the calculated center of gravity as the locating point of the part; and converting the pixel coordinates of the part's center of gravity into actual physical coordinates based on the camera calibration parameters and the pinhole imaging principle, thereby obtaining the actual coordinates of the center of gravity of the part. The invention improves the robustness of target detection and greatly improves the efficiency of target identification.

Description

YOLOv5-based lightweight part identification and positioning method
Technical Field
The invention relates to the technical field of part identification and positioning, in particular to a YOLOv5-based lightweight part identification and positioning method with high target detection robustness and high target identification efficiency.
Background
With the wide application and study of machine vision, part identification has become a key link in the industrial manufacturing process. In particular, deploying part identification technology on industrial robots enables them to classify target parts more quickly and accurately, so lightweight part identification methods have become a major focus of research and development. In industrial production, part detection and identification face many challenges owing to the wide variety of part types, the complex classification environment, the huge quantities involved and the small size of the targets.
Driven by the rapid development of deep learning, two types of part target recognition methods are currently available. One is the regression-based single-stage (one-stage) family of algorithms, such as SSD and the YOLO series; the other is the candidate-box-based two-stage family, such as R-CNN and Faster R-CNN. A part target recognition algorithm for industrial production and manufacturing must guarantee both precision and recognition rate. In real environments, small target parts are prone to false detections and missed detections, and models with excessive parameters and volume are difficult to deploy on embedded platforms, so the high-precision requirements of industrial production cannot be met.
Disclosure of Invention
The invention aims to: the invention aims to provide a YOLOv5-based lightweight part identification and positioning method with high target detection robustness and high target identification efficiency.
The technical scheme is as follows: the invention comprises the following steps:
(1) Collecting part sample images through an industrial camera;
(2) Carrying out data enhancement on the acquired image samples, manufacturing a data set, and dividing the data set into a training set and a verification set;
(3) Constructing a part recognition deep learning model based on lightweight YOLOv5, loading a data set to train an algorithm model, and obtaining a target recognition result and a prediction frame;
(4) Intercepting a target area from an image in a prediction frame, extracting edges of a preprocessing area to obtain a target frame, extracting feature points of the target frame to obtain position information of the feature points, and calculating the center of gravity as a positioning point of the part;
(5) Based on camera calibration parameters and a pinhole imaging principle, converting pixel coordinates of the center of gravity of the part into actual physical coordinates, and obtaining the actual coordinates of the center of gravity of the part.
Further, the step (1) controls the height of the industrial camera to be 30mm away from the tabletop, sets the shooting angle of the camera to be perpendicular to the tabletop, and then collects part images by using the industrial camera.
Further, the step (2) includes:
(21) Randomly flipping and cropping the sample images obtained in step (1), and randomly adjusting the hue, brightness and saturation of the pictures to augment the data set;
(22) Labeling the processed images in the YOLO format with the labelimg annotation tool, classifying and labeling them according to part type;
(23) The training set is randomly selected as 80% of the total data set, and the remaining 20% forms the test set.
Further, the step (3) includes:
(31) An original YOLOv5 model is built, and the input end, backbone structure, Neck structure and output end of the model are built in sequence and connected according to the propagation direction of the algorithm;
(32) Replacing the backbone network of the YOLOv5 model with GhostNet;
(33) Constructing the novel feature pyramid SPP_F;
(34) Removing the detection layer for a large target scale;
(35) Replacing the GIOU loss function in the YOLOv5 model with the Alpha-SIOU loss function;
(36) The average precision P of target detection, the average recall rate R of the targets, and the mean average precision mAP of the overall sample detection are obtained, with the specific formulas:

P = TP / (TP + FP)

R = TP / (TP + FN)

mAP = (1/n) · Σ_{i=1}^{n} AP_i

wherein TP refers to the number of correctly identified parts; FN refers to the number of parts not identified; FP refers to the number of misidentified parts; n represents the number of part categories; AP_i is the average precision of the i-th category.
Further, the step (35) includes:
(35-1) Angle loss

Λ = 1 − 2·sin²(arcsin(x) − π/4)

x = c_h / σ = sin(α)

σ = √((b_cx^gt − b_cx)² + (b_cy^gt − b_cy)²)

c_h = max(b_cy^gt, b_cy) − min(b_cy^gt, b_cy)

wherein c_h is the height difference between the center points of the real frame and the predicted frame, σ is the distance between the center points of the real frame and the predicted frame, (b_cx^gt, b_cy^gt) are the coordinates of the center point of the real frame, and (b_cx, b_cy) are the coordinates of the center point of the predicted frame;

(35-2) Distance loss

Δ = Σ_{t=x,y} (1 − e^(−γ·ρ_t))

ρ_x = ((b_cx^gt − b_cx) / c_w)², ρ_y = ((b_cy^gt − b_cy) / c_h)²

γ = 2 − Λ

wherein (c_w, c_h) are the width and height of the minimum enclosing rectangle of the real frame and the predicted frame;

(35-3) Shape loss

Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ

ω_w = |w − w^gt| / max(w, w^gt), ω_h = |h − h^gt| / max(h, h^gt)

wherein w, h, w^gt, h^gt are the widths and heights of the predicted frame and the real frame, respectively; θ controls the degree of attention paid to the shape loss and is typically taken in [2, 6];

(35-4) IOU loss

IOU = |B ∩ B^gt| / |B ∪ B^gt|

L_Alpha-SIOU = 1 − IOU^α + ((Δ + Ω) / 2)^α

wherein IOU represents the ratio of the intersection of the areas of the image real frame and the prediction frame to the union of the areas, Δ is the distance loss, and Ω is the shape loss.
Further, the step (4) includes:
(41) Gray processing is carried out on the identified image;
(42) Performing Gaussian smoothing on the image processed in the step (41);
(43) Performing median filtering on the image processed in the step (42);
(44) Extracting the edge contour of the part using Canny edge detection;
(45) The center of gravity is calculated.
The beneficial effects are that: compared with the prior art, the invention has the following remarkable advantages: the mAP value on the test set reaches 99.4%; Precision and Recall are reduced by only 0.1% and 0.2% respectively compared with the YOLOv5 model, but the inference time is 35% faster than before, and the computation and volume are only 35.62% and 14.58% of those of the YOLOv5 model; the part identification performance is greatly improved, and the robustness of target detection is also improved.
Drawings
FIG. 1 is a block diagram of an algorithm of the present invention;
FIG. 2 is a diagram of the YOLOv5 algorithm;
FIG. 3 is a graph comparing a conventional convolution with a Ghost convolution;
FIG. 4 is a diagram of SPP and SPPF networks;
FIG. 5 is a diagram of the SPP_F network structure;
FIG. 6 is a wide and high contrast plot of a marker box in a part dataset;
FIG. 7 is a diagram showing the part recognition effect: (a) is the YOLOv5 detection result; (b) is the YOLOv5-ghost detection result; and (c) is the detection result of the present invention.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
The invention provides a lightweight part identification and positioning method based on improved YOLOv5, which comprises the following steps:
(1) Collecting part sample images through an industrial camera specifically comprises the following steps:
the flange coupler, the gear, the bolt, the nut, the bearing and the shaft sleeve are selected as objects, and the six parts have various details and various types and are widely used in actual industrial production. The height of the industrial camera is controlled at 30mm from the tabletop, the shooting angle of the camera is set to be perpendicular to the tabletop, and then the image of the part is acquired by the industrial camera.
(2) Carrying out data enhancement on the acquired image samples, making a data set, and dividing it into a training set and a verification set, which specifically includes: randomly flipping and cropping the sample images obtained in step (1), and randomly adjusting the hue, brightness and saturation of the pictures to augment the data set, giving 3797 part images; labeling the processed images in the YOLO format with the labelimg annotation tool according to part type; the training set is randomly selected as 80% of the total data set, and the remaining 20% forms the test set.
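For illustration only, the enhancement and split described above might be sketched in Python as follows; the jitter ranges, crop sizes and directory layout are assumptions, and a real pipeline would also transform the YOLO box labels together with each image.

```python
import random
from pathlib import Path

import cv2
import numpy as np

def augment(img):
    """Random flip/crop plus hue, brightness and saturation jitter (illustrative ranges)."""
    if random.random() < 0.5:
        img = cv2.flip(img, 1)                                   # horizontal flip
    h, w = img.shape[:2]
    top, left = random.randint(0, h // 10), random.randint(0, w // 10)
    bottom, right = h - random.randint(0, h // 10), w - random.randint(0, w // 10)
    img = img[top:bottom, left:right]                            # random crop
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] + random.uniform(-10, 10)) % 180  # hue
    hsv[..., 1] *= random.uniform(0.7, 1.3)                      # saturation
    hsv[..., 2] *= random.uniform(0.7, 1.3)                      # brightness
    return cv2.cvtColor(np.clip(hsv, 0, 255).astype(np.uint8), cv2.COLOR_HSV2BGR)

# 80% / 20% random split of the labeled images (hypothetical directory)
images = sorted(Path("dataset/images").glob("*.jpg"))
random.shuffle(images)
split = int(0.8 * len(images))
train_files, test_files = images[:split], images[split:]
```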
(3) As shown in fig. 1, a lightweight YOLOv5-based part recognition deep learning model is constructed, and the algorithm model is trained by loading the data set to obtain the target recognition results and prediction boxes, which specifically includes:
(31) As shown in fig. 2, an original YOLOv5 model is built, and the input end, backbone structure, Neck structure and output end of the model are built in sequence and connected according to the propagation direction of the algorithm;
(32) Replacing the backbone network of the YOLOv5 model with GhostNet;
As shown in fig. 3, the goal of GhostNet is to generate the redundant feature maps while reducing computation as much as possible. Comparing traditional convolution with the Ghost module shows that the Ghost module splits the traditional convolution into two steps: a small number of traditional feature maps are first obtained by an ordinary convolution, ghost feature maps are then generated from them by linear operations, and finally the traditional feature maps and ghost feature maps are concatenated as the output. The linear operations replace a large amount of the computation in traditional convolution, which greatly reduces the computation and volume of the model.
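A minimal PyTorch sketch of the Ghost operation described above follows; the channel ratio, kernel sizes and layer composition are illustrative assumptions rather than the exact GhostNet configuration used in the invention.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Sketch of a Ghost convolution: a small primary convolution produces a few
    intrinsic feature maps, a cheap depthwise ("linear") operation generates the
    ghost maps, and the two are concatenated. Assumes c_out is even."""
    def __init__(self, c_in, c_out, k=1, s=1, dw_k=3):
        super().__init__()
        c_primary = c_out // 2                   # intrinsic feature maps
        c_ghost = c_out - c_primary              # ghost feature maps
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_primary, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_primary), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(              # depthwise "linear" operation
            nn.Conv2d(c_primary, c_ghost, dw_k, 1, dw_k // 2,
                      groups=c_primary, bias=False),
            nn.BatchNorm2d(c_ghost), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```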
(33) Constructing a novel feature pyramid SPP_F:
as shown in fig. 4, the feature pyramid using the SPP (Spatial Pyramid Pooling) module as the model in YOLOv5 is initially used to separate the main context features, the SPP uses sampling regions with the size of kernel size= {1×1,5×5,9×9, 13×13} and performs Maxpool (maximum pooling) operation, and then performs Concat (fusion) processing on feature maps with different scales. The original YOLOv5 model used in the method adopts an SPPF module as a characteristic pyramid of the model, the SPPF structure is formed by connecting input in series through a plurality of MaxPool layers with the size of 5x5, namely 9*9 convolution in SPP is replaced by two series 5*5 convolutions, 13 x 13 convolutions in SPP are replaced by three series 5*5 convolutions, and the SPPF reduces calculation time while maintaining the SPP receptive field unchanged.
The oversized sampling regions of SPP and SPPF reduce the number of neurons, so part of the information is lost. The SPP_F module herein therefore adopts sampling regions of kernel size {1×1, 5×5, 9×9, 13×13} and performs Avgpool (average pooling): average pooling effectively avoids the situation in which a few strongly fluctuating values cause most of the numerical distribution of the sampling region to be ignored. SPP_F first applies a Conv (Conv+BN+ReLU) module, then a 3×3 average pooling, and then 5×5 and 7×7 average poolings on the basis of the 3×3 pooling, which effectively enlarges the receptive range of the backbone network features; finally a Concat operation is performed on the pooled results and the un-pooled data. The SPP_F network structure is shown in fig. 5.
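One possible reading of the SPP_F description is sketched below, assuming the 3×3 / 5×5 / 7×7 average-pooling chain and a concatenation with the un-pooled branch; the exact wiring, channel widths and output convolution are assumptions and may differ from the structure in fig. 5.

```python
import torch
import torch.nn as nn

class SPP_F(nn.Module):
    """Hedged sketch of SPP_F: Conv+BN+ReLU, a 3x3 average pooling, then 5x5 and
    7x7 average poolings applied to the 3x3 result, concatenated with the
    un-pooled branch."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_hidden = c_in // 2
        self.conv_in = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, 1, 1, bias=False),
            nn.BatchNorm2d(c_hidden), nn.ReLU(inplace=True))
        self.pool3 = nn.AvgPool2d(3, stride=1, padding=1)
        self.pool5 = nn.AvgPool2d(5, stride=1, padding=2)
        self.pool7 = nn.AvgPool2d(7, stride=1, padding=3)
        self.conv_out = nn.Conv2d(c_hidden * 4, c_out, 1, 1)

    def forward(self, x):
        x = self.conv_in(x)
        p3 = self.pool3(x)
        p5 = self.pool5(p3)      # 5x5 pooling on the 3x3-pooled map
        p7 = self.pool7(p3)      # 7x7 pooling on the 3x3-pooled map
        return self.conv_out(torch.cat([x, p3, p5, p7], dim=1))
```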
(34) Removing the detection layer for a large target scale;
in the YOLOv5 model, there are three detection layers, and when the input image size is 640 x 640, the neck network is processed accordingly. 8. And (3) downsampling by 16 times and 32 times, wherein the sizes of the corresponding detection layer characteristic diagrams are 80×80, 40×40 and 20×20, and the detection layer characteristic diagrams are used for identifying small, medium and large targets. The analysis of the images in the dataset is carried out herein, as shown in fig. 5, and the parts in the dataset are found to be mostly small and medium-sized targets, so that the 20×20 feature layers for detecting large targets are removed, and the parameter quantity and volume of the model are reduced, so that the method can be more suitable for part target detection herein.
(35) The GIOU loss function in the YOLOv5 model is replaced by the Alpha-SIOU loss function. GIOU does not consider the direction between the real frame and the predicted frame and converges slowly; SIOU introduces the vector angle between the real frame and the predicted frame and redefines the related loss functions. To address these problems, the power parameter Alpha is introduced into the SIOU loss function, and the Alpha hyper-parameter is adjusted to meet different levels of bounding-box regression accuracy. The original loss function is replaced by the Alpha-SIOU loss function, where the SIOU loss consists of four components:
(35-1) Angle loss (Angle cost)

Λ = 1 − 2·sin²(arcsin(x) − π/4)

x = c_h / σ = sin(α)

σ = √((b_cx^gt − b_cx)² + (b_cy^gt − b_cy)²)

c_h = max(b_cy^gt, b_cy) − min(b_cy^gt, b_cy)

wherein c_h is the height difference between the center points of the real frame and the predicted frame, σ is the distance between the center points of the real frame and the predicted frame, (b_cx^gt, b_cy^gt) are the coordinates of the center point of the real frame, and (b_cx, b_cy) are the coordinates of the center point of the predicted frame;

(35-2) Distance loss (Distance cost)

Δ = Σ_{t=x,y} (1 − e^(−γ·ρ_t))

ρ_x = ((b_cx^gt − b_cx) / c_w)², ρ_y = ((b_cy^gt − b_cy) / c_h)²

γ = 2 − Λ

wherein (c_w, c_h) are the width and height of the minimum enclosing rectangle of the real frame and the predicted frame;

(35-3) Shape loss (Shape cost)

Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ

ω_w = |w − w^gt| / max(w, w^gt), ω_h = |h − h^gt| / max(h, h^gt)

wherein w, h, w^gt, h^gt are the widths and heights of the predicted frame and the real frame, respectively; θ controls the degree of attention paid to the shape loss and is typically taken in [2, 6];

(35-4) IOU loss

IoU = |B ∩ B^gt| / |B ∪ B^gt|

L_Alpha-SIOU = 1 − IoU^α + ((Δ + Ω) / 2)^α

wherein IoU represents the ratio of the intersection of the areas of the image real frame and the prediction frame to the union of the areas.
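Putting the four components together, a hedged PyTorch sketch of an Alpha-SIOU bounding-box loss following the formulas above could read as follows; the default values of θ and α, the (x1, y1, x2, y2) box format and the eps safeguards are assumptions, not values taken from the patent.

```python
import math
import torch

def alpha_siou_loss(pred, target, theta=4.0, alpha=3.0, eps=1e-7):
    """Sketch of the Alpha-SIoU loss; pred/target are (..., 4) tensors of boxes."""
    # widths/heights and center points
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    cx1, cy1 = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx2, cy2 = (target[..., 0] + target[..., 2]) / 2, (target[..., 1] + target[..., 3]) / 2

    # IoU of the two boxes
    inter_w = (torch.min(pred[..., 2], target[..., 2]) - torch.max(pred[..., 0], target[..., 0])).clamp(0)
    inter_h = (torch.min(pred[..., 3], target[..., 3]) - torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union

    # smallest enclosing box (c_w, c_h)
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])

    # angle cost: Lambda = 1 - 2*sin^2(arcsin(c_h/sigma) - pi/4)
    sigma = torch.sqrt((cx2 - cx1) ** 2 + (cy2 - cy1) ** 2) + eps
    sin_alpha = torch.abs(cy2 - cy1) / sigma
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2

    # distance cost: Delta = sum_t (1 - exp(-gamma * rho_t)), gamma = 2 - Lambda
    gamma = 2 - angle
    rho_x = ((cx2 - cx1) / (cw + eps)) ** 2
    rho_y = ((cy2 - cy1) / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))

    # shape cost: Omega = sum_t (1 - exp(-omega_t))^theta
    omega_w = torch.abs(w1 - w2) / torch.max(w1, w2).clamp(min=eps)
    omega_h = torch.abs(h1 - h2) / torch.max(h1, h2).clamp(min=eps)
    shape = (1 - torch.exp(-omega_w)) ** theta + (1 - torch.exp(-omega_h)) ** theta

    # Alpha-SIoU: 1 - IoU^alpha + ((Delta + Omega)/2)^alpha
    return 1 - iou ** alpha + ((dist + shape) / 2) ** alpha
```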
(36) The average precision P of target detection, the average recall rate R of the targets, and the mean average precision mAP of the overall sample detection are obtained (the part recognition effect is shown in fig. 7), with the following formulas:

P = TP / (TP + FP)

R = TP / (TP + FN)

mAP = (1/n) · Σ_{i=1}^{n} AP_i

wherein TP refers to the number of correctly identified parts; FN refers to the number of parts not identified; FP refers to the number of misidentified parts; n represents the number of part categories; AP_i is the average precision of the i-th category.
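The evaluation quantities defined above reduce to a few lines of Python; this is only a restatement of the formulas, with the TP/FP/FN counts and per-class AP values assumed to be given.

```python
def precision_recall(tp, fp, fn):
    """P = TP/(TP+FP) and R = TP/(TP+FN), exactly as defined above."""
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    return p, r

def mean_average_precision(ap_per_class):
    """mAP as the mean of the per-class average precisions AP_i."""
    return sum(ap_per_class) / len(ap_per_class)
```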
(4) The method comprises the steps of intercepting a target area from an image in a prediction frame, extracting edges of a preprocessing area to obtain a target frame, extracting feature points of the target frame to obtain position information of the feature points, and calculating the center of gravity as a positioning point of a part, wherein the method specifically comprises the following steps:
(41) Gray processing is carried out on the identified image using YUV luminance graying: according to the physical meaning of the Y component of the YUV color space, its value reflects the brightness level; from the conversion relation between the RGB and YUV color spaces, the correspondence between the luminance Y and R, G, B can be established, and the gray value of the image can be expressed by the luminance value. The formula is:

Y(i, j) = 0.299·R(i, j) + 0.587·G(i, j) + 0.114·B(i, j)

where (i, j) represents a coordinate point in the two-dimensional image.
(42) Gaussian smoothing is performed on the image processed in step (41). Gaussian filtering, a linear filter, can remove Gaussian noise with a normal distribution: a Gaussian template scans each pixel in the image and replaces the value of the template's center pixel with the weighted average gray value of the pixels in the neighborhood covered by the template.
(43) Median filtering is performed on the image processed in step (42): the gray value of each pixel is replaced by the median of the gray values in its neighborhood, which effectively removes impulse noise and salt-and-pepper noise while preserving the edge details of the image.
(44) The Canny edge detector extracts the edge contour of the part. The Canny edge detection algorithm has strong noise resistance: it smooths the image with Gaussian filtering, computes the gradient magnitude and direction using finite differences of the first-order partial derivatives, applies non-maximum suppression to the gradient magnitude, and uses double-threshold detection and edge linking.
(45) The center of gravity of the contour extracted from the part edge is calculated as the locating point of the part:

x_c = (1/N) · Σ_{i=1}^{N} x_i

y_c = (1/N) · Σ_{i=1}^{N} y_i

where (x_i, y_i) are the coordinates of the N points on the extracted edge contour.
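Steps (41)-(45) can be sketched with OpenCV as follows; the Gaussian and median kernel sizes and the Canny thresholds are illustrative assumptions, not values specified by the method.

```python
import cv2
import numpy as np

def locate_part(crop):
    """Grayscale, Gaussian smoothing, median filtering, Canny edges, then the
    centroid of the edge points as the part's locating point (pixel coords)."""
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)        # step (41)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)             # step (42)
    med = cv2.medianBlur(blur, 5)                        # step (43)
    edges = cv2.Canny(med, 50, 150)                      # step (44)
    ys, xs = np.nonzero(edges)                           # edge pixel coordinates
    if xs.size == 0:
        return None
    # step (45): centroid of the edge points, (x_c, y_c) = (mean x_i, mean y_i)
    return float(xs.mean()), float(ys.mean())
```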
(5) Based on camera calibration parameters and a pinhole imaging principle, converting pixel coordinates of the center of gravity of the part into actual physical coordinates to obtain the actual coordinates of the center of gravity of the part, and specifically comprising the following steps: performing camera calibration by adopting a Zhang Zhengyou plane calibration method to obtain internal parameters and external parameters of a camera;
and determining the actual coordinates of the center of gravity of the part based on the principle of pinhole imaging according to the internal parameters and the external parameters of the camera, the vertical distance between the camera and the desktop where the part is located and the pixel coordinates of the center of gravity of the part.
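A minimal sketch of the pixel-to-physical conversion under the pinhole model, assuming the intrinsic matrix K obtained from Zhang Zhengyou's calibration and the known vertical camera-to-tabletop distance Z; carrying the camera-frame result into a robot or world frame via the extrinsic parameters is omitted, and all variable names are illustrative.

```python
import numpy as np

def pixel_to_world(u, v, K, Z):
    """Back-project the pixel centroid (u, v) at known depth Z (camera to
    tabletop) into camera-frame coordinates (X, Y, Z) using intrinsics K."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return X, Y, Z
```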
In order to verify the lightweight part identification positioning method based on YOLOv5 provided by the embodiment of the invention, the following experiment is carried out.
The performance of the detection model was measured using Precision (Precision), recall (Recall), and average Precision mean (mAP). The calculation formula is as follows:
P = TP / (TP + FP)

R = TP / (TP + FN)

mAP = (1/n) · Σ_{i=1}^{n} AP_i, where AP_i is the average precision of the i-th category
wherein: TP refers to identifying the correct number of parts; FN refers to the number of unidentified parts; FP refers to the number of parts that are misidentified; n represents the number of categories of parts.
In order to better test the performance of the improved algorithm of the present invention in identifying part targets, the currently popular lightweight target recognition algorithms are compared with the improved method of the present invention using the same data set and the same partitioning method, taking the model's mAP value (%), inference time (ms), computation (GFLOPs) and volume (MB) as evaluation indexes; the final comparison results are shown in Table 1.
TABLE 1
[Table 1: comparison of the lightweight detection algorithms on mAP (%), inference time (ms), computation (GFLOPs) and model volume (MB); the table is reproduced as an image in the original publication.]
As can be seen from the comparison of the data in the table, the mAP value of the algorithm reaches 0.994; Precision and Recall are reduced by 0.1% and 0.2% respectively compared with the YOLOv5 model, the inference time is 35% faster than before, and the computation and volume are only 35.62% and 14.58% of those of the YOLOv5 model. Compared with the currently popular lightweight recognition algorithms, the model not only improves Precision and Recall but also has a faster inference time (ms) and smaller computation (GFLOPs) and volume (MB). This analysis shows that the lightweight part detection algorithm proposed herein performs well on mean average precision (mAP), inference time (ms), computation (GFLOPs) and volume (MB), and is therefore better suited for deployment on low-power devices. To show the difference in detection effect before and after the model improvement more intuitively, part of the test comparison is shown in fig. 7: the false detections that occurred before the improvement are corrected and the classification confidence scores are also improved, which further illustrates the feasibility of the improved method.
The ablation experiment verifies the contribution of each improvement; the results are shown in Table 2, where (1) denotes adopting the SPP_F feature pyramid module, (2) denotes removing the large-target detection layer, and (3) denotes modifying the loss function. As can be seen from the data in the table, when the large-target detection layer is deleted, Precision and Recall drop by 1.0% compared with the YOLOv5 model, but the volume is reduced to 14.58% of the original, so deleting the large-target detection layer greatly reduces the model volume at only a slight cost in accuracy; after the SPP_F feature pyramid is adopted, the volume and computation of the model do not increase while Precision, Recall and mAP are all improved. Overall, the comprehensive performance of the algorithm is best when the Ghost model is used together with all three improvements.
TABLE 2
[Table 2: ablation results for the Ghost backbone, the SPP_F feature pyramid, removal of the large-target detection layer and the modified loss function; the table is reproduced as an image in the original publication.]

Claims (6)

1. A lightweight part identification positioning method based on YOLOv5 is characterized by comprising the following steps:
(1) Collecting part sample images through an industrial camera;
(2) Carrying out data enhancement on the acquired image samples, manufacturing a data set, and dividing the data set into a training set and a verification set;
(3) Constructing a part recognition deep learning model based on lightweight YOLOv5, loading a data set to train an algorithm model, and obtaining a target recognition result and a prediction frame;
(4) Intercepting a target area from an image in a prediction frame, extracting edges of a preprocessing area to obtain a target frame, extracting feature points of the target frame to obtain position information of the feature points, and calculating the center of gravity as a positioning point of the part;
(5) Based on camera calibration parameters and a pinhole imaging principle, converting pixel coordinates of the center of gravity of the part into actual physical coordinates, and obtaining the actual coordinates of the center of gravity of the part.
2. The YOLOv5-based lightweight part identification and positioning method of claim 1, wherein the step (1) controls the height of the industrial camera to be 30 mm from the tabletop, sets the shooting angle of the camera to be perpendicular to the tabletop, and then collects the part images by using the industrial camera.
3. The YOLOv5-based lightweight part identification and positioning method of claim 1, wherein step (2) comprises:
(21) Randomly overturning and cutting the sample image obtained in the step (1), and randomly adjusting the tone, brightness and saturation of the picture to strengthen a data set;
(22) Labeling the photo processed by the image by utilizing the YOLO format of a labelimg labeling tool, and classifying and labeling according to the type of the part;
(23) The training set is randomly selected according to 80% of the total data set number, and the rest 20% are the test sets.
4. The YOLOv5-based lightweight part identification and positioning method of claim 1, wherein step (3) comprises:
(31) An original YOLOv5 model is built, and an input end, a main structure, a Neck structure and an output end of the model are built in sequence and are connected according to the algorithm propagation direction;
(32) Replacing the backbone network of the YOLOv5 model with GhostNet;
(33) Constructing a novel feature pyramid SPP_F;
(34) Removing the detection layer for a large target scale;
(35) Replacing the GIOU loss function in the YOLOv5 model with the Alpha-SIOU loss function;
(36) The average accuracy mAP of the overall sample detection is obtained, the average accuracy P of the target detection and the average recall rate R of the target are obtained, and the specific formulas are as follows:
P = TP / (TP + FP)

R = TP / (TP + FN)

mAP = (1/n) · Σ_{i=1}^{n} AP_i, where AP_i is the average precision of the i-th category
wherein TP refers to identifying the correct number of parts; FN refers to the number of unidentified parts; FP refers to the number of parts that are misidentified; n represents the number of categories of parts.
5. The YOLOv5-based lightweight part identification and positioning method of claim 1, wherein the step (35) comprises:
(35-1) Angle loss

Λ = 1 − 2·sin²(arcsin(x) − π/4)

x = c_h / σ = sin(α)

σ = √((b_cx^gt − b_cx)² + (b_cy^gt − b_cy)²)

c_h = max(b_cy^gt, b_cy) − min(b_cy^gt, b_cy)

wherein c_h is the height difference between the center points of the real frame and the predicted frame, σ is the distance between the center points of the real frame and the predicted frame, (b_cx^gt, b_cy^gt) are the coordinates of the center point of the real frame, and (b_cx, b_cy) are the coordinates of the center point of the predicted frame;

(35-2) Distance loss

Δ = Σ_{t=x,y} (1 − e^(−γ·ρ_t))

ρ_x = ((b_cx^gt − b_cx) / c_w)², ρ_y = ((b_cy^gt − b_cy) / c_h)²

γ = 2 − Λ

wherein (c_w, c_h) are the width and height of the minimum enclosing rectangle of the real frame and the predicted frame;

(35-3) Shape loss

Ω = Σ_{t=w,h} (1 − e^(−ω_t))^θ

ω_w = |w − w^gt| / max(w, w^gt), ω_h = |h − h^gt| / max(h, h^gt)

wherein w, h, w^gt, h^gt are the widths and heights of the predicted frame and the real frame, respectively; θ controls the degree of attention paid to the shape loss and is typically taken in [2, 6];

(35-4) IOU loss

IOU = |B ∩ B^gt| / |B ∪ B^gt|

L_Alpha-SIOU = 1 − IOU^α + ((Δ + Ω) / 2)^α

wherein IOU represents the ratio of the intersection of the areas of the image real frame and the prediction frame to the union of the areas, Δ is the distance loss, and Ω is the shape loss.
6. The YOLOv5-based lightweight part identification and positioning method of claim 1, wherein step (4) comprises:
(41) Gray processing is carried out on the identified image;
(42) Performing Gaussian smoothing on the image processed in the step (41);
(43) Performing median filtering on the image processed in the step (42);
(44) Extracting the edge contour of the part using Canny edge detection;
(45) The center of gravity is calculated.
CN202310037356.1A 2023-01-10 2023-01-10 YOLOv 5-based lightweight part identification and positioning method Pending CN116051808A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310037356.1A CN116051808A (en) 2023-01-10 2023-01-10 YOLOv 5-based lightweight part identification and positioning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310037356.1A CN116051808A (en) 2023-01-10 2023-01-10 YOLOv 5-based lightweight part identification and positioning method

Publications (1)

Publication Number Publication Date
CN116051808A true CN116051808A (en) 2023-05-02

Family

ID=86121637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310037356.1A Pending CN116051808A (en) 2023-01-10 2023-01-10 YOLOv 5-based lightweight part identification and positioning method

Country Status (1)

Country Link
CN (1) CN116051808A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116227306A (en) * 2023-05-08 2023-06-06 中汽研(天津)汽车工程研究院有限公司 Automatic part connection structure identification method for finite element mesh division
CN116227306B (en) * 2023-05-08 2023-07-28 中汽研(天津)汽车工程研究院有限公司 Automatic part connection structure identification method for finite element mesh division

Similar Documents

Publication Publication Date Title
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN111223088B (en) Casting surface defect identification method based on deep convolutional neural network
CN106023257B (en) A kind of method for tracking target based on rotor wing unmanned aerial vehicle platform
CN111340797A (en) Laser radar and binocular camera data fusion detection method and system
CN110660052A (en) Hot-rolled strip steel surface defect detection method based on deep learning
CN111768388A (en) Product surface defect detection method and system based on positive sample reference
CN110991444B (en) License plate recognition method and device for complex scene
Li et al. Automatic bridge crack identification from concrete surface using ResNeXt with postprocessing
CN114897864B (en) Workpiece detection and defect judgment method based on digital-analog information
Wan et al. Ceramic tile surface defect detection based on deep learning
CN111553949A (en) Positioning and grabbing method for irregular workpiece based on single-frame RGB-D image deep learning
CN111046789A (en) Pedestrian re-identification method
CN113393439A (en) Forging defect detection method based on deep learning
Liu et al. Grasp and Inspection of Mechanical Parts based on Visual Image Recognition Technology
CN115830004A (en) Surface defect detection method, device, computer equipment and storage medium
CN114972216A (en) Construction method and application of texture surface defect detection model
CN109919215B (en) Target detection method for improving characteristic pyramid network based on clustering algorithm
CN116051808A (en) YOLOv 5-based lightweight part identification and positioning method
Fan et al. Application of YOLOv5 neural network based on improved attention mechanism in recognition of Thangka image defects
CN112288758A (en) Infrared and visible light image registration method for power equipment
CN114821358A (en) Optical remote sensing image marine ship target extraction and identification method
CN115984219A (en) Product surface defect detection method and device, electronic equipment and storage medium
CN115937736A (en) Small target detection method based on attention and context awareness
CN116912670A (en) Deep sea fish identification method based on improved YOLO model
CN115330705A (en) Skin paint surface defect detection method based on adaptive weighting template NCC

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination