CN112200186B - Vehicle logo identification method based on improved YOLO_V3 model - Google Patents


Info

Publication number
CN112200186B
CN112200186B (application CN202011099944.0A)
Authority
CN
China
Prior art keywords
logo
model
vehicle
vehicle logo
yolo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011099944.0A
Other languages
Chinese (zh)
Other versions
CN112200186A (en)
Inventor
郭峰峰
白治江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CN202011099944.0A priority Critical patent/CN112200186B/en
Publication of CN112200186A publication Critical patent/CN112200186A/en
Application granted granted Critical
Publication of CN112200186B publication Critical patent/CN112200186B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08Detecting or categorising vehicles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a vehicle logo recognition method based on an improved YOLO_V3 model. The implementation steps are as follows: build a labeled and annotated vehicle logo image data set and apply data augmentation; extract multi-scale features of the vehicle logos in the data set with the convolutional neural network of the improved YOLO_V3 model and train a logo recognition model; input an image containing the logo to be detected, recognize it with the logo recognition model, and obtain its position information; output a predicted image from the trained model to complete detection. The proposed method is robust, recognizes vehicle logos reliably, and supports the construction of intelligent transportation systems, thereby improving urban traffic safety.

Description

Vehicle logo identification method based on improved YOLO_V3 model
Technical Field
The invention belongs to the technical fields of computer vision and artificial intelligence, and in particular relates to a vehicle logo recognition method based on an improved YOLO_V3 model, which detects whether a vehicle logo is present in an image and identifies which logo it is.
Background
In recent years, vehicle detection and recognition systems based on computer vision have played an increasingly important role in smart city construction. Promoting the healthy development of smart cities requires continuous scientific and technological progress to improve and perfect the Intelligent Transport System (ITS). Vehicles are important members of the smart city and important objects of perception. Thus, besides license plate features, logo features are also important in vehicle detection. The traditional approach to quickly finding a target vehicle in a large volume of video relies mainly on license plate recognition: given a known plate number, images of vehicles with the same plate are retrieved from the video. However, stolen, damaged, or occluded license plates all hinder plate-based recognition. The vehicle logo has distinctive, highly representative features and uniquely identifies the vehicle brand; determining the brand through the logo narrows the search range and realizes pre-classification. Vehicle logo recognition is essentially an image classification technology comprising two key links: feature extraction and feature classification. Conventional image features include Local Binary Pattern (LBP) features, Histogram of Oriented Gradients (HOG) features, and Haar features. Common feature classification methods include random forest learning, Support Vector Machine (SVM) learning, and AdaBoost learning.
Within vehicle brand and model identification, logo recognition is an important research direction, and recognizing logos quickly, accurately, and in real time is the central challenge. Conventional logo recognition comprises two steps: logo localization and logo classification. For localization, one method first locates the logo from the license plate position and the relative position of plate and logo, then computes texture features of the candidate logo region in different directions to obtain the logo's edge information, and finally refines the localization; this method requires the logo to have obvious edge features, adapts poorly to varied environments, and achieves the desired effect only under ideal conditions. Another method combines AdaBoost learning with Chebyshev moments, but the computation is complex and time-consuming. For classification, one method matches logos against templates with edge-direction histograms; it needs a large number of template samples, and because image quality is uneven, recognition accuracy is low, matching is slow, and real-time performance is poor. Another method extracts SIFT features and trains a classifier on them, but SIFT has obvious drawbacks: heavy computation and many redundant extreme points, which hamper feature-point extraction.
In recent years, deep learning methods represented by the Convolutional Neural Network (CNN) have achieved remarkable results in computer vision. One approach combines a CNN with a support vector machine classifier: the CNN first screens candidate logos, which are then sent to the SVM for classification, achieving fairly high accuracy. However, the SVM requires manually selected features and places high demands on the data set, whereas the CNN needs no manual feature selection and is more accurate than traditional methods; still, its many parameters and heavy computation prevent real-time performance.
In existing logo recognition methods, many classification stages are affected by the localization stage: if localization drifts, recognition accuracy drops. Even when localization is correct, recognition efficiency may remain poor, and the demands on picture quality and environment are high.
Disclosure of Invention
In order to improve the recognition rate, robustness, and real-time performance of vehicle logo recognition, a vehicle logo recognition method based on an improved YOLO_V3 model is provided. It improves the accuracy of logo recognition, and experiments show that the model recognizes vehicle logos more accurately and rapidly.
In order to achieve the above object, the present invention provides a vehicle logo recognition method based on an improved YOLO_V3 model, comprising the steps of:
S1, building a labeled and annotated data set of vehicle logo images;
S2, extracting multi-scale features of the vehicle logo in the data set with the convolutional neural network of the improved YOLO_V3 model, and training a logo recognition model;
S3, inputting an image of the logo to be detected, recognizing it with the logo recognition model, and obtaining its position information;
S4, outputting a predicted image according to the trained model to complete detection.
The step S1 comprises the following steps:
Collect logos in the initial image data, expand the initial images of the existing logo types with data augmentation such as Random Erase, CutOut, MixUp, rotation, and contrast enhancement, and annotate them. Crop the logo images of the initial data set to a fixed resolution and match them with the labels of the initial images to obtain the annotated logo images and labels as the logo-type data set. Further, the data set is divided into a training set and a test set, and the test set evaluates the accuracy and robustness of the logo model. Model robustness is assessed by precision (PR) and recall (RE): the precision PR reflects the accuracy of the logo model, and the higher the PR value, the better its robustness; the recall RE reflects the rate at which logos are detected and distinguished, and the higher the RE value, the better the detection result. Precision and recall are defined as follows:
PR = T / (T + TF), RE = T / (T + FT),
where T is the number of logos correctly detected by the model (true positives), TF the number falsely detected (false positives), and FT the number missed (false negatives).
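As a minimal illustration of these definitions, PR and RE can be computed directly from the three counts; the function name and the example numbers below are ours, not from the patent:

```python
# Precision (PR) and recall (RE) from the three detection counts defined
# above: T correctly detected logos, TF falsely detected, FT missed.
def precision_recall(t, tf, ft):
    pr = t / (t + tf) if t + tf else 0.0   # PR = T / (T + TF)
    re = t / (t + ft) if t + ft else 0.0   # RE = T / (T + FT)
    return pr, re

pr, re = precision_recall(90, 10, 5)   # 90 correct, 10 false, 5 missed
print(round(pr, 3), round(re, 3))      # 0.9 0.947
```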
The step S2 comprises the following steps:
The convolutional neural network of the improved YOLO_V3 model performs convolution operations of different sizes on the logo images of the input training set, forming feature maps of the logo images at different scales; the network learns the logo features at these scales and thus realizes multi-scale detection of the logo.
The step S3 comprises the following steps:
Input the logo image to be detected into the logo model; use the K-means algorithm to compute the parameters of the anchor boxes and determine the initial positions of the bounding boxes, with three anchor boxes per cell predicting bounding boxes at each scale. Divide the logo image to be detected into S×S grid cells, each predicting B rectangular boxes and the corresponding confidences, where S is the number of grid divisions per side and B the number of boxes each cell is responsible for. Select the prior bounding box with the highest confidence score and predict the position of the logo image to be detected through a logistic regression function.
The K-means method comprises the following steps:
1) Randomly select K samples from the data objects as the initial K centroids; 2) compute the Euclidean distance from every remaining sample to each centroid and assign each sample to the cluster of its closest centroid; 3) recompute the centroid of each cluster; 4) if none of the K centroids has changed, output the cluster result; otherwise return to step 2).
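The four steps above can be sketched as follows, clustering (width, height) pairs of annotated logo boxes into K anchor sizes. This is an illustration only: the text specifies Euclidean distance, whereas YOLO implementations often use an IoU-based distance, and the function name and sample boxes are invented:

```python
import random

# K-means over (width, height) pairs of labeled logo boxes, following the
# four steps described in the text (Euclidean distance, mean centroids,
# stop when centroids no longer change).
def kmeans_anchors(boxes, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)                 # step 1: K random samples
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:                           # step 2: assign to nearest centroid
            i = min(range(k), key=lambda j: (w - centroids[j][0]) ** 2
                                          + (h - centroids[j][1]) ** 2)
            clusters[i].append((w, h))
        new = [(sum(w for w, _ in c) / len(c),       # step 3: recompute centroids
                sum(h for _, h in c) / len(c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:                         # step 4: stop when unchanged
            return new
        centroids = new
    return centroids

boxes = [(10, 12), (11, 13), (40, 42), (42, 44), (90, 95), (88, 92)]
print(sorted(kmeans_anchors(boxes, 3)))
```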
The confidence and the position and size of the bounding box are calculated by the following coordinate-offset formulas:
Pr(object) × IOU(b, object) = σ(to),
bx = σ(tx) + cx,
by = σ(ty) + cy,
bw = pw × e^tw,
bh = ph × e^th,
where the predicted output of the model is (tx, ty, tw, th); cx and cy are the coordinates of the grid cell, and pw and ph are the width and height of the prior bounding box; bx, by, bw, and bh are the center coordinates and size of the predicted bounding box. The YOLO_V3 algorithm predicts the score of each bounding box using a logistic regression method. If a predicted bounding box overlaps the real box better than all other bounding boxes, its score is 1; if it is not the best but exceeds a certain threshold, the prediction is ignored. The YOLO_V3 algorithm assigns one bounding box to each real object; if a real object is not matched with a bounding box, no class-prediction loss or coordinate loss is generated, only object-prediction loss.
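A small sketch of this coordinate-offset decoding; the function names and sample values are ours, while the formulas follow the ones above ((tx, ty, tw, th) raw output, (cx, cy) grid cell offset, (pw, ph) prior box size):

```python
import math

# Decode one anchor's raw prediction into a bounding box using the
# coordinate-offset formulas above.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    bx = sigmoid(tx) + cx      # bx = σ(tx) + cx
    by = sigmoid(ty) + cy      # by = σ(ty) + cy
    bw = pw * math.exp(tw)     # bw = pw × e^tw
    bh = ph * math.exp(th)     # bh = ph × e^th
    return bx, by, bw, bh

# Zero offsets place the center in the middle of grid cell (3, 5) and keep
# the prior size unchanged.
print(decode_box(0, 0, 0, 0, 3, 5, 32, 32))  # (3.5, 5.5, 32.0, 32.0)
```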
The improvement of the invention is that:
the upsampling method in the yolo_v3 model herein alternates using pixel rebinning that performs depth-to-space conversion and nearest neighbor interpolation that performs spatial transform, helping to reduce the number of parameters and the temporal complexity. Using normalized coordinate scale, introducing a punishment factor, wherein no common area exists between the predicted frame and the real frame, the predicted frame moves towards the target frame, the distance between the central point and the overlapping area of the boundary frame are considered, the length and width between the target frame and the anchor frame are also very important, the moving speed can be provided, the convergence speed is fast, and finally the punishment factor of the aspect ratio of the target frame is required to be introduced, wherein a is the Euclidean distance between the central points of the predicted frame and the real frame, b is the central point of the anchor frame, b t Is the center point of the target frame and c is the diagonal distance that merges the anchor frame and the target frame into the smallest rectangle. The definition is as follows:
w and v in this equation are parameters for measuring the consistency of the scale between the target frame and the anchor frame and for balancing the scale, respectively, and it can be seen from the equation of w that the bounding box loss function is optimized in a direction that tends to overlap more areas, where w and v are calculated as follows:
features fig. 13×13 are omitted, and detection on the scale of 104×104 is added.
The improved YOLO_V3 algorithm used in this method has a higher recognition rate, clear advantages over other methods, better effect in practical application, and higher accuracy in the various complex scenes where samples occur; it adapts to difficult environments and generalizes more strongly.
Drawings
FIG. 1 is a flow chart of the vehicle logo recognition method based on the improved YOLO_V3 model of the invention;
FIG. 2 is the network training loss curve of the method;
FIG. 3 illustrates nearest-neighbor interpolation in the upsampling of the method;
FIG. 4 illustrates pixel reorganization in the upsampling of the method;
FIG. 5 shows the structure of the improved YOLO_V3 network.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples.
The invention provides a vehicle logo recognition method based on an improved YOLO_V3 model. As shown in the general flow diagram of fig. 1, the method comprises the following steps:
S1, building a labeled and annotated data set of vehicle logo images;
The step S1 comprises the following steps:
Collect logos in the initial image data, expand the initial images of the existing logo types with data augmentation such as Random Erase, CutOut, MixUp, rotation, and contrast enhancement, and annotate them. Crop the logo images of the initial data set to a fixed resolution and match them with the labels of the initial images to obtain the annotated logo images and labels as the logo-type data set. Further, the data set is divided into a training set and a test set at a certain ratio, and the test set evaluates the accuracy and robustness of the logo recognition model. Robustness is evaluated by precision (PR) and recall (RE): the precision PR reflects the accuracy of the logo recognition model, and the higher the PR value, the better its robustness; the recall RE reflects the rate at which logos are detected and distinguished, and the higher the RE value, the better the detection result. Precision and recall are defined as follows:
PR = T / (T + TF), RE = T / (T + FT),
where T is the number of logos correctly detected by the model (true positives), TF the number falsely detected (false positives), and FT the number missed (false negatives). Specifically, the data set is divided into a training set and a test set at a ratio of 7:3.
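A minimal sketch of the 7:3 split, assuming `samples` stands in for the list of (image, label) pairs in the logo data set; the helper name and seed are illustrative:

```python
import random

# Shuffle the data set once, then cut it at 70% into training and test sets.
def split_dataset(samples, train_ratio=0.7, seed=42):
    items = list(samples)
    random.Random(seed).shuffle(items)      # shuffle before splitting
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]         # (training set, test set)

train, test = split_dataset(range(100))
print(len(train), len(test))  # 70 30
```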
S2, extracting multi-scale features of the vehicle logo in the data set with the convolutional neural network of the improved YOLO_V3 model, and training a logo recognition model;
The step S2 comprises the following steps:
The convolutional neural network of the improved YOLO_V3 model performs convolution operations of different sizes on the logo images of the input training set, forming feature maps of the logo images at different scales. Specifically, for an input resolution of 416×416, the network forms 26×26, 52×52, and 104×104 feature maps through the convolution operations and the residual network. The network learns the logo features at these scales and thus realizes multi-scale detection of the logo.
S3, inputting an image of the logo to be detected, recognizing it with the logo recognition model, and obtaining its position information;
The step S3 comprises the following steps:
Input the image of the logo to be detected into the logo model and use the K-means algorithm to compute the parameters of the anchor boxes. The K-means method comprises the following steps: 1) randomly select K samples from the data objects as the initial K centroids; 2) compute the Euclidean distance from every remaining sample to each centroid and assign each sample to the cluster of its closest centroid; 3) recompute the centroid of each cluster; 4) if none of the K centroids has changed, output the cluster result; otherwise return to step 2). Meanwhile, determine the initial positions of the bounding boxes, with three anchor boxes per cell predicting bounding boxes at each scale. Divide the logo image to be detected into S×S grid cells, each predicting B rectangular boxes and the corresponding confidences, where S is the number of grid divisions per side and B the number of boxes each cell is responsible for. Select the prior bounding box with the highest confidence score and predict the position of the logo image to be detected through a logistic regression function. The confidence and the position and size of the bounding box are calculated by the following coordinate-offset formulas:
Pr(object) × IOU(b, object) = σ(to),
bx = σ(tx) + cx,
by = σ(ty) + cy,
bw = pw × e^tw,
bh = ph × e^th,
where the predicted output of the model is (tx, ty, tw, th); cx and cy are the coordinates of the grid cell, and pw and ph are the width and height of the prior bounding box; bx, by, bw, and bh are the center coordinates and size of the predicted bounding box. The YOLO_V3 algorithm predicts the score of each bounding box using a logistic regression method. If a predicted bounding box overlaps the real box better than all other bounding boxes, its score is 1; if it is not the best but exceeds a certain threshold, the prediction is ignored. The YOLO_V3 algorithm assigns one bounding box to each real object; if a real object is not matched with a bounding box, no class-prediction loss or coordinate loss is generated, only object-prediction loss.
When training of the improved YOLO_V3 algorithm starts, the learning rate of the training phase is set to 0.001, the number of samples selected per training step (the batch size) to 10, the initial decay rate to 0.0005, and the momentum to 0.9. When the iteration count of the training model reaches 500, the learning rate is decayed by a factor of 0.1 so that the loss function can converge further. The convergence curve of the loss function during training of the improved YOLO_V3 algorithm is shown in fig. 2; the loss decreases as the number of iterations increases.
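The schedule described here can be sketched as a simple step-decay rule; the function is an illustrative sketch, not the patent's training code:

```python
# Base learning rate 0.001, decayed by factor 0.1 once iterations reach 500,
# matching the hyperparameters stated above.
BASE_LR, DECAY_AT, DECAY_FACTOR = 0.001, 500, 0.1

def learning_rate(iteration):
    return BASE_LR * DECAY_FACTOR if iteration >= DECAY_AT else BASE_LR

print(learning_rate(100))  # 0.001
print(learning_rate(600))
```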
The improvement of the invention is that:
nearest neighbor interpolation, which does not require computation, is performed by assigning to it the nearest neighbor pixel in four directions. Let x+a, y+b (x, y is a positive integer, a, b is a fraction greater than 0 and less than 1) be the required pixel coordinates, f (x+a, y+b) be the gray value of the pixel to be solved, as shown in fig. 3. If (x+a, y+b) falls in the A region, it can be determined that a <0.5, B <0.5, the pixel value to be calculated is the pixel value of the upper left corner, and similarly, the pixel value to be calculated in the B region is the pixel value of the lower left corner, the pixel value to be calculated in the C region is the pixel value of the upper right corner, and the pixel value to be calculated in the D region is the pixel value of the lower right corner.
The flow of the pixel reorganization algorithm is shown in fig. 4. A convolution operation produces a feature map with r² channels (the input low-resolution image and the resulting feature map have the same spatial size), where r is the upsampling factor, i.e., the magnification of the image; to obtain the high-resolution image, a periodic shuffling step is then applied. The upsampling method in the YOLO_V3 model here alternates between this pixel reorganization, which performs a depth-to-space conversion, and nearest-neighbor interpolation, which performs a spatial transform, helping to reduce the number of parameters and the time complexity.
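A minimal sketch of the periodic shuffling on list-based feature maps; the layout convention, channel index c = (y mod r)·r + (x mod r), is one common depth-to-space choice and an assumption here:

```python
# Pixel reorganization (pixel shuffle / depth-to-space): r*r channels of
# size H×W are rearranged into a single channel of size rH×rW.
def pixel_shuffle(channels, r):
    """channels: list of r*r feature maps (H×W lists); returns an rH×rW map."""
    assert len(channels) == r * r
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * (w * r) for _ in range(h * r)]
    for y in range(h * r):
        for x in range(w * r):
            # channel index from the sub-pixel position, spatial index from
            # the coarse position
            c = (y % r) * r + (x % r)
            out[y][x] = channels[c][y // r][x // r]
    return out

# Four 1×1 channels become one 2×2 map.
chans = [[[1]], [[2]], [[3]], [[4]]]
print(pixel_shuffle(chans, 2))  # [[1, 2], [3, 4]]
```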
Normalized coordinate scales are used and a penalty factor is introduced: when the predicted box and the real box share no common area, the penalty drives the predicted box toward the target box. Besides the center-point distance and the overlap area of the bounding boxes, the width and height of the target box relative to the anchor box also matter; accounting for them increases the moving speed and accelerates convergence. Finally, a penalty factor for the aspect ratio of the target box is introduced, where a is the Euclidean distance between the center points of the predicted box and the real box, b is the center point of the anchor box, b_t is the center point of the target box, and c is the diagonal length of the smallest rectangle enclosing the anchor box and the target box. The loss is defined as:
L = 1 - IOU + a²(b, b_t)/c² + w·v,
where v measures the consistency of the aspect ratio between the target box and the anchor box and w balances it; from the equation for w it can be seen that the bounding-box loss function is optimized in the direction of greater overlap. w and v are calculated as follows:
v = (4/π²) · (arctan(w_t/h_t) - arctan(w_a/h_a))²,
w = v / ((1 - IOU) + v),
where (w_t, h_t) and (w_a, h_a) are the widths and heights of the target box and the anchor box.
The 13×13 feature map is removed, and detection at the 104×104 scale is added. The resulting YOLO_V3 structure is shown in fig. 5.
The test results of some test samples are shown in the following table:

Vehicle logo   Correct (count)   Errors (count)   Accuracy
Volkswagen     46                4                96.0%
Audi           40                10               94.0%
BMW            44                6                96.0%
Benz           45                5                94.0%
Faraday        48                2                94.0%
Non-logo       50                0                100%
The improved YOLO_V3 algorithm used in this method has a higher recognition rate, clear advantages over other methods, better effect in practical application, and higher accuracy in the various complex scenes where samples occur; it adapts to difficult environments and generalizes strongly. The proposed method is robust, recognizes vehicle logos reliably, and supports the construction of intelligent transportation systems, thereby improving urban traffic safety.
Those of ordinary skill in the art will recognize that the embodiments described here are intended to help the reader understand the principles of the invention, and that the scope of the invention is not limited to these specific statements and embodiments. Various modifications and variations may be made in light of the teachings of this invention without departing from its spirit or essential scope, and such modifications and variations fall within the scope of the following claims.

Claims (7)

1. A vehicle logo recognition method based on an improved yolo_v3 model, comprising the steps of:
s1, manufacturing a data set of a logo type image with labels and tags;
s2, extracting multi-scale features of the vehicle logo in the data set by using a convolutional neural network of the improved YOLO_V3 model, and training the vehicle logo model, wherein the method comprises the following steps of:
s2.1, performing convolution operation of different sizes on the vehicle logo images of the input training set by using the improved convolutional neural network of the YOLO_V3 model to form feature images of different scales of the vehicle logo images;
s2.2, the convolutional neural network learns features of different scales of the logo image, and detection of multiple scales of the logo is achieved;
s3, inputting an image of the vehicle logo to be detected, identifying the vehicle logo to be detected by using the vehicle logo identification model, obtaining the position information of the vehicle logo to be detected,
the method comprises the following steps:
s3.1, inputting an image of a vehicle logo to be detected into a vehicle logo model, using a K-means algorithm to count parameters of an anchor frame, and simultaneously determining initial positions of the boundary frames, and predicting the boundary frames by three anchor frames of each unit on each scale;
S3.2, dividing the vehicle logo image to be detected into S × S grids, each grid predicting B rectangular boxes and the confidence corresponding to each rectangular box;
wherein S denotes the number of divided grids and B denotes the number of boxes each grid is responsible for;
S3.3, selecting the vehicle logo prior bounding box with the largest confidence score, and predicting the position of the vehicle logo in the image to be detected through a logistic regression function;
the confidence and the location of the bounding box are calculated by the following coordinate offset formulas:
Pr(object) × IOU(b, object) = σ(t_o),
b_x = σ(t_x) + c_x,
b_y = σ(t_y) + c_y,
b_w = p_w × e^(t_w),
b_h = p_h × e^(t_h),
wherein the predicted output of the model is (t_x, t_y, t_w, t_h); c_x and c_y denote the coordinates of the grid cell, and p_w and p_h denote the size of the prior bounding box; b_x, b_y, b_w and b_h are the center coordinates and size of the predicted bounding box; the YOLO_V3 algorithm predicts the score of each bounding box using logistic regression; the YOLO_V3 algorithm assigns one bounding box prior to each ground-truth object, and a prior that is not matched to a ground-truth object incurs no class prediction loss or coordinate loss, only an objectness prediction loss;
S4, introducing a penalty factor using the normalized coordinate scale, and finally introducing a penalty factor for the aspect ratio of the target box, wherein a is the Euclidean distance between the center points of the predicted box and the ground-truth box, b is the center point of the anchor box, b_t is the center point of the target box, and c is the diagonal length of the smallest rectangle enclosing both the anchor box and the target box; the loss is defined as:
L_box = 1 - IOU + a²/c² + w × v,
wherein w and v in this equation are the parameter for balancing the scales and the parameter for measuring the aspect-ratio consistency between the target box and the anchor box, respectively; it can be seen from the equation for w that the bounding-box loss function is optimized in the direction of a larger overlapping area; w and v are calculated as follows:
v = (4/π²) × (arctan(w_t/h_t) - arctan(w_a/h_a))²,
w = v / ((1 - IOU) + v);
S5, outputting the predicted image according to the trained model to complete the detection.
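The penalty terms of step S4 match a CIoU-style bounding-box loss. The following is a minimal sketch, assuming axis-aligned boxes given as (center x, center y, width, height); the function and variable names are illustrative and not taken from the claims:

```python
import math

def ciou_penalty(box_p, box_t):
    """CIoU-style loss for a predicted box vs. a target box.

    Implements 1 - IOU + a^2/c^2 + w*v, where a is the center distance,
    c the diagonal of the smallest enclosing rectangle, v the
    aspect-ratio consistency term, and w the balancing weight.
    """
    (px, py, pw, ph), (tx, ty, tw, th) = box_p, box_t

    # Intersection-over-union of the two boxes
    x1 = max(px - pw / 2, tx - tw / 2)
    y1 = max(py - ph / 2, ty - th / 2)
    x2 = min(px + pw / 2, tx + tw / 2)
    y2 = min(py + ph / 2, ty + th / 2)
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = pw * ph + tw * th - inter
    iou = inter / union if union > 0 else 0.0

    # a^2: squared distance between the two box centers
    a2 = (px - tx) ** 2 + (py - ty) ** 2
    # c^2: squared diagonal of the smallest enclosing rectangle
    ex1 = min(px - pw / 2, tx - tw / 2)
    ey1 = min(py - ph / 2, ty - th / 2)
    ex2 = max(px + pw / 2, tx + tw / 2)
    ey2 = max(py + ph / 2, ty + th / 2)
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    # v measures aspect-ratio consistency; w balances it against IoU
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    w = v / ((1 - iou) + v + 1e-9)

    return 1 - iou + a2 / c2 + w * v
```

A perfectly matching box yields a loss of zero, and the a²/c² term keeps a gradient even when the boxes do not overlap at all, which is the behavior the claim attributes to the improved loss.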
2. The vehicle logo recognition method based on the improved YOLO_V3 model of claim 1, wherein the step S1 comprises the following steps:
collecting initial image data of vehicle logos, expanding the initial image data of the existing vehicle logo types using data augmentation methods such as Random Erasing, CutOut, MixUp, rotation and contrast enhancement, and annotating the images;
cropping the vehicle logo images of the initial image data set to a fixed resolution and matching them with the labels of the initial image data to obtain the annotated vehicle logo images and logo labels as the existing logo type data set.
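Of the augmentations named in claim 2, Random Erasing can be sketched in a few lines. This is a simplified illustration, not the patent's implementation: the fill value is a constant and the erased region is square, whereas the published Random Erasing method also randomizes the aspect ratio and fill values:

```python
import numpy as np

def random_erase(img, area_frac=0.1, rng=None):
    """Blank out a random rectangle covering roughly `area_frac`
    of the image area (simplified Random Erasing sketch)."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[:2]
    eh = max(1, int(h * area_frac ** 0.5))  # erased height
    ew = max(1, int(w * area_frac ** 0.5))  # erased width
    y = rng.integers(0, h - eh + 1)         # random top-left corner
    x = rng.integers(0, w - ew + 1)
    out = img.copy()
    out[y:y + eh, x:x + ew] = 0             # constant fill
    return out
```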
3. The method of claim 2, wherein the existing logo type data set is divided into a training set and a test set, and the test set is used for testing the robustness of the YOLO_V3 model.
4. The vehicle logo recognition method based on the improved YOLO_V3 model of claim 1, wherein the K-means algorithm comprises the following steps:
S3.1.1, randomly selecting K samples from the data set to serve as the initial K centroids;
S3.1.2, calculating the Euclidean distance between each remaining sample and every centroid, and assigning each sample to the cluster of its closest centroid;
S3.1.3, recalculating the centroid of each cluster;
S3.1.4, if all K centroids are unchanged, outputting the cluster assignment result; if any centroid has changed, returning to step S3.1.2.
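The steps S3.1.1 to S3.1.4 can be sketched directly for anchor-box clustering on (width, height) pairs. Note that the claim specifies plain Euclidean distance, whereas common YOLO implementations cluster with a 1 − IoU distance instead; the names below are illustrative:

```python
import random

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs with K-means per S3.1.1-S3.1.4."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)          # S3.1.1: K random samples
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:                    # S3.1.2: nearest centroid
            i = min(range(k),
                    key=lambda j: (w - centroids[j][0]) ** 2
                                  + (h - centroids[j][1]) ** 2)
            clusters[i].append((w, h))
        new = [(sum(w for w, _ in c) / len(c),  # S3.1.3: recompute
                sum(h for _, h in c) / len(c)) if c else centroids[j]
               for j, c in enumerate(clusters)]
        if new == centroids:                  # S3.1.4: converged
            break
        centroids = new
    return centroids
```

The resulting k centroids serve as the anchor-box sizes used in step S3.1.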
5. The method of claim 1, wherein the vehicle logo model is a YOLO_V3 model using Darknet-53 as the underlying network structure.
6. The vehicle logo recognition method based on the improved YOLO_V3 model of claim 1, characterized in that the upsampling is improved: the upsampling in the YOLO_V3 model alternates pixel reorganization with nearest-neighbor interpolation to perform the depth-to-space conversion, reducing the number of parameters and the time complexity, with the nearest-neighbor interpolation performing the spatial conversion.
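The two upsampling operators named in claim 6 can be sketched with NumPy, assuming channel-first (C, H, W) feature maps; pixel reorganization is the standard depth-to-space rearrangement (as in PyTorch's PixelShuffle), and nearest-neighbor interpolation simply repeats each pixel:

```python
import numpy as np

def depth_to_space(x, r):
    """Pixel reorganization: (C*r*r, H, W) -> (C, H*r, W*r).
    Trades channel depth for spatial resolution with no parameters."""
    c, h, w = x.shape
    assert c % (r * r) == 0
    out_c = c // (r * r)
    x = x.reshape(out_c, r, r, h, w)     # split channels into (dy, dx)
    x = x.transpose(0, 3, 1, 4, 2)       # -> (out_c, h, dy, w, dx)
    return x.reshape(out_c, h * r, w * r)

def nearest_upsample(x, r):
    """Nearest-neighbor interpolation: repeat each pixel r times
    along both spatial axes of a (C, H, W) map."""
    return x.repeat(r, axis=1).repeat(r, axis=2)
```

Both operators are parameter-free, which is consistent with the claim's statement that the improved upsampling reduces the parameter count.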
7. The vehicle logo recognition method based on the improved YOLO_V3 model according to claim 1, wherein the 13 × 13 feature map is omitted and detection at the 104 × 104 scale is added.
CN202011099944.0A 2020-10-15 2020-10-15 Vehicle logo identification method based on improved YOLO_V3 model Active CN112200186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011099944.0A CN112200186B (en) 2020-10-15 2020-10-15 Vehicle logo identification method based on improved YOLO_V3 model


Publications (2)

Publication Number Publication Date
CN112200186A CN112200186A (en) 2021-01-08
CN112200186B true CN112200186B (en) 2024-03-15

Family

ID=74008665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011099944.0A Active CN112200186B (en) 2020-10-15 2020-10-15 Vehicle logo identification method based on improved YOLO_V3 model

Country Status (1)

Country Link
CN (1) CN112200186B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537106B (en) * 2021-07-23 2023-06-02 仲恺农业工程学院 Fish ingestion behavior identification method based on YOLOv5
CN113971779B (en) * 2021-10-29 2022-07-01 中国水利水电科学研究院 Water gauge automatic reading method based on deep learning
CN114916336B (en) * 2022-05-06 2024-03-15 山东理工大学 Chemical topping method based on cotton top leaf maturity stage classification and identification
CN115115939B (en) * 2022-07-28 2023-04-07 北京卫星信息工程研究所 Remote sensing image target fine-grained identification method based on characteristic attention mechanism

Citations (3)

Publication number Priority date Publication date Assignee Title
CN110796168A (en) * 2019-09-26 2020-02-14 江苏大学 Improved YOLOv 3-based vehicle detection method
CN111460894A (en) * 2020-03-03 2020-07-28 温州大学 Intelligent car logo detection method based on convolutional neural network
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium


Non-Patent Citations (2)

Title
Application of the YOLOv3 network in vehicle logo detection; Wang Lin; Huang Sanli; Application of Electronic Technique (09); full text *
Pedestrian detection method based on the YOLO algorithm; Dai Shu; Wang Huilan; Xu Chenchen; Liu Dan; Zhang Baojun; Radio Communications Technology (03); full text *

Also Published As

Publication number Publication date
CN112200186A (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN109840521B (en) Integrated license plate recognition method based on deep learning
CN112200186B (en) Vehicle logo identification method based on improved YOLO_V3 model
CN107563372B (en) License plate positioning method based on deep learning SSD frame
JP6547069B2 (en) Convolutional Neural Network with Subcategory Recognition Function for Object Detection
CN108681693B (en) License plate recognition method based on trusted area
CN108171136B (en) System and method for searching images by images for vehicles at multi-task gate
Han et al. A two-stage approach to people and vehicle detection with hog-based svm
Enzweiler et al. A mixed generative-discriminative framework for pedestrian classification
CN106650731B (en) Robust license plate and vehicle logo recognition method
CN111709416B (en) License plate positioning method, device, system and storage medium
CN105354568A (en) Convolutional neural network based vehicle logo identification method
Zhang et al. Study on traffic sign recognition by optimized Lenet-5 algorithm
Zhang et al. Road recognition from remote sensing imagery using incremental learning
CN104036284A (en) Adaboost algorithm based multi-scale pedestrian detection method
CN111915583B (en) Vehicle and pedestrian detection method based on vehicle-mounted thermal infrared imager in complex scene
CN108664969B (en) Road sign recognition method based on conditional random field
Mo et al. Vehicles detection in traffic flow
CN110287798B (en) Vector network pedestrian detection method based on feature modularization and context fusion
CN104282008A (en) Method for performing texture segmentation on image and device thereof
CN113159024A (en) License plate recognition technology based on improved YOLOv4
Tian et al. Vehicle detection grammars with partial occlusion handling for traffic surveillance
CN114332921A (en) Pedestrian detection method based on improved clustering algorithm for Faster R-CNN network
Hu et al. Traffic density recognition based on image global texture feature
Nafi’i et al. Vehicle brands and types detection using mask R-CNN
Shi et al. License plate localization in complex environments based on improved GrabCut algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant