CN114494756A

CN114494756A - Improved clustering algorithm based on Shape-GIoU

Info

Publication number: CN114494756A
Application number: CN202210007090.1A
Authority: CN
Inventors: 王兰美; 周琨; 王桂宝; 廖桂生; 孙长征; 张志伟
Original assignee: Xidian University; Shaanxi University of Technology
Current assignee: Xidian University; Shaanxi University of Technology
Priority date: 2022-01-05
Filing date: 2022-01-05
Publication date: 2022-05-13

Abstract

The invention provides a clustering algorithm based on Shape-GIoU improvement, which solves the problem that the original GIoU is degraded into IoU, introduces a proportionality coefficient lambda/beta, introduces the influence of the aspect ratio of two frames into a formula, can solve the problem that the GIoU is degraded, considers the influence of the non-coverage area and the aspect ratio of the two frames, and partially improves the detection performance of the YOLOv4 algorithm; firstly, constructing a data set and preprocessing sample data; secondly, introducing the proposed Shape-GIoU method into a K-means + + clustering method, and then replacing an anchor value in the cfg file of the original edition YOLOv4 algorithm by the obtained anchor value; finally, comparing a YOLOv4 algorithm model using a K-means + + clustering method, and analyzing a test result; compared with a K-means + + clustering algorithm, the improved clustering algorithm based on Shape-GIoU can solve the problem that the GIoU is degraded, considers the influences of non-coverage area and aspect ratio, and improves the detection precision and recall rate of the YOLOv4 algorithm model; in addition, the method can still be used in combination with other classical algorithm models, and the detection performance of the algorithm models is improved.

Description

Improved clustering algorithm based on Shape-GIoU

Technical Field

The invention belongs to the field of image recognition, and particularly relates to a Shape-GIoU-based improved clustering algorithm, wherein a K-means + + clustering method is combined with a Shape-GIoU to cluster an initial Anchor box by improving a computing mode Shape-GIoU of IoU, so that the detection precision and recall rate of a YOLOv4 algorithm in a data set are improved.

Background

In recent years, deep learning models are becoming the main algorithms in the field of target detection, wherein convolutional neural networks have achieved significant results in the field of target detection. The target detection model based on deep learning can be divided into two types: a two-stage detection algorithm based on Region Proposal, such as R-CNN, Fast R-CNN, etc., the algorithm needs to generate candidate regions first, and then classify and regress the target candidate regions. The other is a single-stage detection algorithm such as a YOLO series, and compared with a double-stage target detection algorithm, the YOLO series algorithm abandons the stage of generating a candidate region, and uses an Anchor box (Anchor box) to replace the candidate region for final regression. Before YOLOv4 was proposed, the parameter for evaluating the similarity between the Anchor box and the real box was IoU, but IoU has many drawbacks: firstly, IoU is always 0 when two frames do not intersect, and then IoU cannot express the information of the distance between the center points of the two frames; secondly, since IoU only contains information about the overlapping area of two frames, when IoU of the two frames are equal, it is impossible to determine the specific shape information, such as aspect ratio.

Therefore, the invention provides a clustering algorithm based on Shape-GIoU improvement, the problem that the original GIoU is degraded to IoU is solved through embedding the improved IoU calculation mode Shape-GIoU, the problem that the GIoU is degraded can be solved by introducing a proportional coefficient lambda/beta and introducing the influence of the aspect ratio into a formula, the influence of the non-coverage area and the aspect ratio is considered, and the detection accuracy and the recall rate of the YOLOv4 network are improved.

Disclosure of Invention

In view of the above problems, it is an object of the present invention to provide an improved clustering algorithm for IoU calculation.

In order to achieve the purpose, the invention adopts the following technical solutions:

a clustering algorithm based on Shape-GIoU improvement is characterized in that a proportional coefficient lambda/beta is introduced, the influence of non-coverage area and aspect ratio is considered, the problem that the original GIoU is degraded to IoU is solved, and the detection performance of the YOLOv4 algorithm is improved.

The improved clustering algorithm comprises the following steps:

downloading public data sets AIZOO and RMFD for detecting the face mask, wherein because the quality of the data sets is also uneven, the face mask-wearing photo or the face mask-not-wearing photo with the resolution ratio larger than 608X 608 is selected from the public AIZOO and RMFD face recognition data sets, and a face mask data set used by the invention is constructed; creating a corresponding xml file for each sample picture in the data set, and storing the path of each sample picture and the category information p of the interested target in the sample picture_iCoordinate information of the upper left corner and the lower right corner of the target and the resolution of the sample picture are stored in corresponding xml files; preprocessing xml file data, firstly obtaining coordinate values of an interested target in a sample picture, and then converting the coordinate data into the width and height of a real frame, wherein the calculation formula is as follows: the width is equal to the horizontal coordinate of the lower right corner-the horizontal coordinate of the upper left corner, the height is equal to the vertical coordinate of the lower right corner-the vertical coordinate of the upper left corner, finally, the width and height values are normalized, and the normalized width and heightValue-width-height value/resolution of input picture; wherein the download address of the AIZOO data set: https:// github. com/AIZOOTech/faceMaskDetection; download address of RMFD data set: https// gitubb. com/X-zhangyang/Real-World-Masked-Face-Dataset;

because the prior open data sets aiming at the face and mask detection are less and the picture quality is also uneven, the invention selects the face picture with the mask or the picture without the mask with the resolution ratio more than 608X 608 from the open AIZOO and RMFD face recognition data sets to construct the face and mask data set used by the invention; the data sets of the invention are divided into two categories: the method comprises the steps that a mask wearing face target and a mask not wearing face target comprise 11208 pictures, a data set comprises 7933 mask wearing face targets and 13651 mask not wearing face targets in different scenes, and a test set, a verification set and a training set are divided according to the ratio of 6:2: 2;

step two, arbitrarily selecting normalized width and height data (w) of a real frame of an interested target in 1 sample picture from the data set to be processed_j,h_j) As initial clustering center y_j(w_j,h_j)；

Step three, for each sample x in the data set_i(w_i,h_i) Calculating it from the selected cluster center y_j(w_j,h_j) The distance of (d); distance d used_ijThe calculation method is as follows:

d_ij＝1-Shape-GIoU_ij

the invention is Shape-GIoU_ijThe calculation is shown as follows:

x_i∩y_j＝min(w_i,w_j)×min(h_i,h_j)

x_i∪y_j＝w_i×h_i+w_j×h_j-min(w_i,w_j)×min(h_i,h_j)

A_ij＝max(w_i,w_j)×max(h_i,h_j)

wherein x_i∩y_jArea, x, representing the union between the ith real box and the jth cluster center_i∪y_jDenotes the area of intersection between the ith real box and the jth cluster center, A_ijAn area of a minimum bounding rectangle representing an ith real box and a jth cluster center, where i ═ 0, 1., 21583, j ═ 1, 2., 9;

as can be seen from the above formula, when A_ij＝(x_i∪y_j) When the two frames completely intersect, the GIoU will degenerate to IoU, and a scaling factor is introduced

The coefficient introduces the influence of the aspect ratio of the two frames into a formula, and can solve the problem that the GIoU is degenerated to IoU;

step four, calculating each sample x_i(w_i,h_i) Probability of being selected as next cluster center

Randomly generating a new clustering center y_j+1(w_j+1,h_j+1)；

Step five, repeating the step three and the step four until K clustering centers are selected, and operating a standard K-means algorithm by using the K initial clustering centers to recalculate the clustering centers of each category;

because the YOLOv4 algorithm has three detection scales, each scale has 3 anchors, so that 9 clustering centers are obtained by selecting K to be 9, and the values of the clustering centers are multiplied by the resolution of output pictures to finally obtain 9 anchors required by the YOLOv4 algorithm;

step six, obtaining an anchor value by using a K-means + + clustering algorithm, carrying out network training through standard YOLOv4 and storing a test result test1, obtaining a new anchor value by using the method of the invention, and carrying out network training through standard YOLOv4 and storing a test result test 2; comparing the test results of the 2 experiments, wherein the indexes of comparison are mAP and Recall respectively; downloading standard YOLOv4 network and compiling, the downloading address of YOLOv 4: https// githu.com/alexeyAB/darknet, changing a training set, a verification set and a test set directory in a voc.data file in a data folder into addresses of downloaded data sets, and specifying the number and name of categories; modifying anchor, classes and filters parameters in a YOLOv4.cfg file in a cfg folder, wherein the anchor parameters are 9 anchor values obtained by using a K-means + + clustering algorithm; classes are the category numbers of the data set constructed by the invention and are respectively a mask wearing face target and a mask not wearing face target, namely, the classes are 2; filters (category number +5) × 3 ═ 21; starting to train the YOLOv4 network, and selecting a YOLOv4_ best in a backup folder as a test weight Q after the network training is finished₁Testing is carried out and a test result test1 is obtained; the method of the invention is used for obtaining a new anchor value, modifying the anchor parameter in the Yolov4.cfg file in the cfg folder, training the Yolov4 network and obtaining the test weight Q₂The test is performed and the test result test2 is obtained.

The method provides an improved clustering algorithm based on Shape-GIoU, and combines a K-means + + clustering method and the Shape-GIoU to cluster an initial Anchor box by improving a calculation mode of IoU; compared with a K-means + + clustering method, the method disclosed by the invention improves the mAP value and the Recall value of the YOLOv4 algorithm in the data set.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a sample graph of a training set;

FIG. 3 is a comparison graph of IoU values;

FIG. 4 is a graph of Shape-GIoU value calculations;

FIG. 5 is a graph of the partial detection results of the YOLOv4 model using the method of the invention;

FIG. 6 is the overall performance of the YOLOv4 model using the K-means + + clustering algorithm and the YOLOv4 model using the method of the present invention on validation data sets;

Detailed Description

In order to make the aforementioned and other objects, features and advantages of the present invention more apparent, the following detailed description of the embodiments of the present invention, taken in conjunction with the accompanying drawings, is set forth below:

referring to fig. 1, the implementation steps of the invention are as follows:

downloading public data sets AIZOO and RMFD for detecting the face mask, wherein because the quality of the data sets is also uneven, the face mask-wearing photo or the face mask-not-wearing photo with the resolution ratio larger than 608X 608 is selected from the public AIZOO and RMFD face recognition data sets, and a face mask data set used by the invention is constructed; creating a corresponding xml file for each sample picture in the data set, and storing the path of each sample picture and the category information p of the interested target in the sample picture_iCoordinate information of the upper left corner and the lower right corner of the target and the resolution of the sample picture are stored in corresponding xml files; preprocessing xml file data, firstly obtaining coordinate values of an interested target in a sample picture, and then converting the coordinate data into the width and height of a real frame, wherein the calculation formula is as follows: and finally, normalizing the width and height values, wherein the calculation formula is as follows: the normalized width and height value is equal to the width and height value/the resolution of the input picture; wherein the download address of the AIZOO data set: https:// githubZOOTech/faceMaskDetection; download address of RMFD data set: https// gitubb. com/X-zhangyang/Real-World-Masked-Face-Dataset;

the data sets of the invention are divided into two categories: the method comprises the steps that a mask wearing face target and a mask not wearing face target comprise 11208 pictures, a data set comprises 7933 mask wearing face targets and 13651 mask not wearing face targets in different scenes, and a test set, a verification set and a training set are divided according to the ratio of 6:2: 2;

FIG. 2 is a partial sample diagram of a training set in a data set used in the present invention, so as to show the universality of a target detection object, and different images are trained under different angles in different scenes;

d_ij＝1-Shape-GIoU_ij

referring to FIG. 4, the Shape-GIoU calculation of the present invention is shown as follows:

x_i∩y_j＝min(w_i,w_j)×min(h_i,h_j)

x_i∪y_j＝w_i×h_i+w_j×h_j-min(w_i,w_j)×min(h_i,h_j)

A_ij＝max(w_i,w_j)×max(h_i,h_j)

wherein x_i∩y_jArea, x, representing the union between real box and cluster center_i∪y_jRepresenting the area of intersection between the real box and the cluster center, A_ijThe area of the smallest bounding rectangle representing the real box and the cluster center,

Randomly generating a new clustering center y_j+1(w_j+1,h_j+1)；

step six, obtaining an anchor value by using a K-means + + clustering algorithm, carrying out network training through standard YOLOv4 and storing a test result test1, obtaining a new anchor value by using the method of the invention, and carrying out network training through standard YOLOv4 and storing a test result test 2; comparing the test results of the 2 experiments, wherein the indexes of comparison are mAP and Recall respectively; downloading and compiling a standard YOLOv4 network, wherein the download address of the standard YOLOv4 network is as follows: https// githu.com/alexeyAB/darknet), changing the addresses of the training set, the verification set and the test set directory in the voc.data file in the data folder into download data sets, and specifying the number and the name of categories; modifying anchor, classes and filters parameters in a YOLOv4.cfg file in a cfg folder, wherein the anchor parameters are 9 anchor values obtained by using a K-means + + clustering algorithm; classes are the category numbers of the data set constructed by the invention and are respectively a mask wearing face target and a mask not wearing face target, namely, the classes are 2; filters (category number +5) × 3 ═ 21; starting to train the YOLOv4 network, and selecting a YOLOv4_ best in a backup folder as a test weight Q after the network training is finished₁Testing is carried out and a test result test1 is obtained; the method of the invention is used for obtaining a new anchor value, modifying the anchor parameter in the Yolov4.cfg file in the cfg folder, training the Yolov4 network and obtaining the test weight Q₂The test is performed and the test result test2 is obtained.

The invention is further described below in connection with a simulation example.

Simulation example:

the invention adopts a face mask data set as a training set, a verification set and a test set, and provides part of detection effect graphs using the method of the invention.

Fig. 2 is an illustration diagram of a part of samples in a training set, a part of test data in a face mask data set is randomly selected and displayed as a result, and pictures with different backgrounds, different types of masks, different target sizes, different angles and different target densities are selected to display the universality of the test result.

FIG. 3 shows a graphical representation of the Shape-GIoU-based calculation of the method of the present invention, wherein the dashed box represents the prediction box, the solid box represents the real box, and the parameter of FIG. 4(a) is w_i＝4,h_i＝8,w_j＝2,h _j4; 4(b) has a parameter w_i＝4,h_i＝8,

Fig. 4(a) results using different calculation methods:

x_i∩y_j＝min(w_i,w_j)×min(h_i,h_j)＝2×4＝8

x_i∪y_j＝w_i×h_i+w_j×h_j-min(w_i,w_j)×min(h_i,h_j)

＝4×8+2×4-2×4＝32

A_ij＝max(w_i,w_j)×max(h_i,h_j)＝4×8＝32

fig. 4(b) results using different calculation methods:

therefore, under the condition that the prediction frame is completely wrapped by the real frame, for the condition that the prediction frame accounts for the same proportion of the real frame but has different length-width ratios, the Shape-GIoU provided by the method can be well distinguished, but the existing calculation method cannot be distinguished.

FIG. 3 is a diagram comparing the Shape-GIoU calculation of the present invention with the current calculation method, wherein the dashed box represents the prediction box, and the solid box represents the real box, so that when the prediction box is completely wrapped by the real box, the current calculation method cannot be distinguished, and the Shape-GIoU proposed by the present invention can be distinguished from the current calculation method when the prediction box occupies the real box with the same ratio but with different aspect ratios.

Fig. 5 is a partial detection result diagram of the original YOLOv4 model, and a detection diagram of pictures with different backgrounds, different types of masks, different target sizes, different angles and different target densities is selected to show the universality of the original detection model, so that it can be seen that the basic class detection effect of the object in the picture is good.

FIG. 6 shows the overall performance of the YOLOv4 model using the K-means + + clustering algorithm and the YOLOv4 model using the method of the present invention on the test data set, and it can be seen that both mAP and Recall on the test set are improved by the method of the present invention.

In conclusion, simulation experiments show that the YOLOv4 model of the method can be well distinguished when the prediction frame is completely wrapped by the real frame and the prediction frame accounts for the same proportion of the real frame but has different length-width ratios, and the detection performance of the YOLOv4 algorithm model is improved. The method can also be used in combination with a classical algorithm model, and the detection performance of the algorithm model is improved.

Claims

1. A clustering algorithm based on Shape-GIoU improvement is characterized in that:

downloading public data sets AIZOO and RMFD of face mask detection, selecting a face photo with a mask or a photo without the mask with a resolution ratio of more than 608X 608 to construct a face mask data set used by the invention, and preprocessing data in the data set to obtain normalized wide and high value data;

Step three, for each sample x in the data set_i(w_i,h_i) Calculating it from the selected cluster center y_j(w_j,h_j) The distance of (d);

Randomly generating a new clustering center y_j+1(w_j+1,h_j+1)；

step six, obtaining an anchor value by using a K-means + + clustering algorithm, performing network training through standard YOLOv4 and storing a test result test1, obtaining a new anchor value by using the method of the invention, performing network training through standard YOLOv4 and storing a test result test2, and comparing the test results of 2 experiments, wherein the compared indexes are mAP and Recall respectively;

in the foregoing steps, i ═ 0, 1., 21583 denotes a reference numeral of sample data, and j ═ 1, 2., 9 denotes a reference numeral of a cluster center.

2. The improved clustering algorithm based on Shape-GIoU as claimed in claim 1, step one, downloading the public data sets aizo and RMFD of face mask detection, because the quality of the data sets is also uneven, the invention selects the face photo with a mask or the non-mask photo with resolution greater than 608 x 608 from the public data sets of face recognition of aizo and RMFD, and constructs the data set of the face mask used in the invention; creating a corresponding xml file for each sample picture in the data set, and storing the path of each sample picture and the category information p of the interested target in the sample picture_iCoordinate information of the upper left corner and the lower right corner of the target and the resolution of the sample picture are stored in corresponding xml files; preprocessing xml file data, firstly obtaining coordinate values of an interested target in a sample picture, and then converting the coordinate data into the width and height of a real frame, wherein the calculation formula is as follows: and finally, normalizing the width and height values, wherein the calculation formula is as follows: the normalized width and height value is equal to the width and height value/the resolution of the input picture; wherein the download address of the AIZOO data set: https:// github. com/AIZOOTech/faceMaskDetection; download address of RMFD data set: https// gitubb. com/X-zhangyang/Real-World-Masked-Face-Dataset;

the data sets of the invention are divided into two categories: the mask wearing face target and the mask not wearing face target comprise 11208 photos, the data set comprises 7933 mask wearing face targets and 13651 mask not wearing face targets in different scenes, and the test set, the verification set and the training set are divided according to the ratio of 6:2: 2.

3. The improved clustering algorithm based on Shape-GIoU as claimed in claim 1, wherein in step two, the normalized width and height data (w) of the real frame of the target of interest in 1 sample picture is arbitrarily selected from the data set to be processed_j,h_j) As initial clustering center y_j(w_j,h_j)。

4. The improved clustering algorithm based on Shape-GIoU as claimed in claim 1, step three, for each sample x in the data set_i(w_i,h_i) Calculating it from the selected cluster center y_j(w_j,h_j) The distance of (d); distance d used_ijThe calculation method is as follows:

d_ij＝1-Shape-GIoU_ij

the invention is Shape-GIoU_ijThe calculation is shown as follows:

x_i∩y_j＝min(w_i,w_j)×min(h_i,h_j)

x_i∪y_j＝w_i×h_i+w_j×h_j-min(w_i,w_j)×min(h_i,h_j)

A_ij＝max(w_i,w_j)×max(h_i,h_j)

as can be seen from the above formula, when A_ij＝(x_i∪y_j) When the two frames completely intersect, the GIoU is degenerated to IoU, and a proportionality coefficient lambda is introduced_ij/β_ijThis factor introduces the effect of the aspect ratio of the two boxes into the equation, which can solve the problem of GIoU falling to IoU.

5. The improved clustering algorithm based on Shape-GIoU as claimed in claim 1, wherein x is calculated for each sample_i(w_i,h_i) Probability of being selected as next cluster center

Randomly generating a new clustering center y_j+1(w_j+1,h_j+1)。

6. The improved clustering algorithm based on Shape-GIoU as claimed in claim 1, wherein step five, step three and step four are repeated until K clustering centers are selected, and the K initial clustering centers are used to run a standard K-means algorithm to recalculate the clustering centers of each category; because the YOLOv4 algorithm has three detection scales, each scale has 3 anchors, we take K to 9 to obtain 9 cluster centers, multiply the values of the cluster centers by the resolution of the output picture, and finally we obtain 9 anchors required by the YOLOv4 algorithm.

7. The improved clustering algorithm based on Shape-GIoU as claimed in claim 1, step six, using K-means + + clustering algorithm to obtain anchor value, training and storing test result test1 through standard YOLOv4 network, obtaining new anchor value using the method of the present invention, training and storing test result test2 through standard YOLOv4 network; the test results of 2 experiments were compared,the indexes of comparison are mAP and Recall respectively, and the standard YOLOv4 network download address is as follows: https:// github.com/alexeyAB/dacknet downloads and compiles a standard YOLOv4 network, changes a training set, a verification set and a test set directory in a voc.data file in a data folder as addresses of downloaded data sets, and specifies the number and name of categories; modifying the anchor, classes and filters parameters in a YOLOv4.cfg file in a cfg folder, wherein the anchor parameter is 9 anchor values obtained by using a K-means + + clustering algorithm; classes are the category numbers of the data set constructed by the invention and are respectively a mask wearing face target and a mask not wearing face target, namely, the classes are 2; filters (category number +5) × 3 ═ 21; starting to train the YOLOv4 network, and selecting a YOLOv4_ best in a backup folder as a test weight Q after the network training is finished₁Testing is carried out and a test result test1 is obtained; the method of the invention is used for obtaining a new anchor value, modifying the anchor parameter in the Yolov4.cfg file in the cfg folder, training the Yolov4 network and obtaining the test weight Q₂The test is performed and the test result test2 is obtained.