CN112132130A - Real-time license plate detection method and system for whole scene

Info

Publication number
CN112132130A
Authority
CN
China
Prior art keywords
license plate
frame
prediction
training
scene
Prior art date
Legal status
Granted
Application number
CN202010999847.0A
Other languages
Chinese (zh)
Other versions
CN112132130B (en)
Inventor
柯逍
曾淦雄
林炳辉
Current Assignee
Fuzhou University
Original Assignee
Fuzhou University
Priority date
Application filed by Fuzhou University
Priority to CN202010999847.0A
Publication of CN112132130A
Application granted
Publication of CN112132130B
Legal status: Active

Classifications

    • G06V 10/22 — Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F 18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/23213 — Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering
    • G06V 30/153 — Character recognition: segmentation of character regions using recognition of characters or words
    • G06V 20/625 — Scenes; scene-specific elements: license plates

Abstract

The invention relates to a full-scene-oriented real-time license plate detection method and system, comprising the following steps: generating a training set and a testing set for network training and testing; modifying the YOLOv3-tiny network structure to generate an MD-YOLO model; training the constructed MD-YOLO model with the training set and selecting the optimal weight file based on the mAP; and taking the image to be detected containing the license plate as the input of the trained MD-YOLO model, and outputting and returning the result with the license plate detection frame. The method can perform direct license plate region extraction in any scene.

Description

Real-time license plate detection method and system for whole scene
Technical Field
The invention relates to the technical field of intelligent traffic image detection, in particular to a full-scene-oriented real-time license plate detection method and system.
Background
Motor vehicle license plate recognition is an important link in modern automated traffic control systems. It helps management systems of all kinds quickly identify vehicles and determine vehicle information. License plate detection is a prerequisite for license plate number recognition, and efficient license plate detection is an indispensable step of the whole license plate recognition system.
In the past, license plate detection methods centered on traditional digital image processing techniques, which can only handle license plate detection in specific environments and have poor robustness. In recent years, deep learning methods represented by convolutional neural networks have improved the precision and speed of license plate detection and greatly enhanced its robustness. However, two-stage detection methods represented by Fast R-CNN have high computational cost and large model size and cannot be deployed directly on terminal equipment. Single-stage detection methods represented by YOLO likewise suffer from large original models and heavy computation. A lighter object detection model therefore needs to be developed to achieve license plate detection suitable for terminal deployment and real-time operation.
In addition, in past technical implementations, most processed images contain only one license plate, which occupies a large proportion of the image. In more realistic scenes, however, it is very common for one picture to contain many license plates. The prior solution is to detect the vehicle positions first and then detect the license plate on each vehicle, which significantly increases processing time. A method that detects license plates directly in any scene therefore needs to be developed, greatly shortening the overall license plate extraction time.
Disclosure of Invention
In view of this, the present invention provides a full-scene-oriented real-time license plate detection method and system, which can complete direct license plate region extraction in any scene.
The invention is realized by adopting the following scheme: a full-scene-oriented real-time license plate detection method specifically comprises the following steps:
generating a training set and a testing set for network training and testing;
modifying a YOLOv3-tiny network structure to generate an MD-YOLO model;
training the constructed MD-YOLO model by adopting a training set, and selecting an optimal weight file based on the mAP;
and taking the image to be detected containing the license plate as the input of the trained MD-YOLO model, and outputting and returning the result with the license plate detection frame.
Further, the generating of the training set and the test set for network training and testing specifically includes:
collecting a license plate detection data set;
manually labeling the license plate areas of unlabeled pictures by using LabelImg, wherein each label comprises: a rectangular frame containing the license plate, the upper-left corner coordinates (lx, ly), the lower-right corner coordinates (rx, ry), and the label name "plate";
and dividing all the data marked with the labels into a training set and a testing set.
Furthermore, anchor frames are generated for the training data set by using a K-means clustering method and replace the anchor frames of the original model, so that the license plate detection precision is higher.
Further, generating the anchor frames for the training data set by using the K-means clustering method specifically comprises:
defining the width and height of the input picture as W and H respectively; in the picture, the coordinates of the i-th license plate are (lx_i, ly_i, rx_i, ry_i), i ∈ {1, 2, …, n}, where (lx_i, ly_i) represents the upper-left corner and (rx_i, ry_i) the lower-right corner of the i-th license plate;
the width and height of the license plate are normalized by using the following calculation formulas:
w_i = (rx_i - lx_i) / W
h_i = (ry_i - ly_i) / H
where w_i represents the normalized width and h_i represents the normalized height;
calculating the distance D between the license plate labeling frame and the license plate anchor frame, where D = 1 - IoU and IoU is the intersection-over-union between the labeling frame (w_i, h_i) and the anchor frame (w_j, h_j), with i indexing the i-th license plate and j the j-th anchor frame;
initializing k clustering centers, the license plate anchor frames having widths and heights (w_j, h_j), j ∈ {1, 2, 3, …, k};
calculating the distance D between each license plate frame and each license plate anchor frame, and assigning each license plate frame to the anchor frame with the minimum D;
and calculating the mean value of each cluster as the cluster center of the next iterative calculation, and repeating the steps until the position of the cluster center tends to be stable.
Further, the modifying YOLOv3-tiny network structure to generate the MD-YOLO model specifically includes:
the MD-YOLO model comprises the following layers in sequence: Conv1, MaxPool, Conv2, MaxPool, Conv3, MaxPool, Conv4, MaxPool, Conv5, MaxPool, Conv6, MaxPool, Conv7, Conv8, Conv9, Conv10, route, Conv11, UpSample, route, Conv12, Conv13; wherein Conv1 to Conv13 represent convolutional layers, MaxPool is a maximum pooling layer, and route represents merging the output of an earlier (base) convolutional layer with the current layer;
modifying the picture size of the input to the MD-YOLO model to 416 × 416, with Conv10 and Conv13 as two output layers with dimensions of 13 × 13 × C and 26 × 26 × C respectively, where C = (classes + 5) × 3 and classes = 1;
and adjusting the number of convolution kernels and the convolution kernel sizes of MD-YOLO.
Further, adjusting the number of convolution kernels and the convolution kernel sizes of MD-YOLO specifically comprises: the number of convolution kernels of Conv1 is 3, of Conv2 is 3, and of Conv7 is 512; the convolution kernel size of Conv9 is 1 × 1, and the convolution kernel size of Conv12 is 1 × 1.
Further, training the constructed MD-YOLO model with the training set and selecting the optimal weight file based on the mAP specifically comprises:
setting the training configuration file; every thousand iterations, a weight file is saved and the mAP is computed once, updating the weight file with the current best mAP value;
training for 350,000 iterations in total; after training, the weight file with the best mAP value is selected as the final result file.
Further, taking the image to be detected containing the license plate as the input of the trained MD-YOLO model and outputting and returning the result with the license plate detection frame specifically comprises the following steps:
inputting the image to be detected into the trained MD-YOLO model to obtain the predicted license plate prediction frames and the confidence P of each prediction frame;
performing non-maximum suppression on all the license plate prediction frames to remove repeated frame selections;
and screening the confidences of all the obtained license plate prediction frames, retaining the prediction frames whose confidence is greater than a preset value as the final result.
Further, performing non-maximum suppression on all the prediction frames to remove repeated frame selections specifically comprises:
calculating the aspect ratio λ of each license plate prediction frame, defined as:
λ = w_pred / h_pred
where h_pred and w_pred represent the height and width of the prediction frame respectively;
predicting, by means of the probability value P_λ, whether the content of each license plate prediction frame is a license plate, with the calculation formula:
[equation image in the original: piecewise definition of P_λ over ranges of λ]
updating the confidence of each license plate prediction frame as P = P + P_λ;
sorting the license plate prediction frames by confidence to generate a bounding-frame list; selecting the license plate prediction frame with the highest confidence, adding it to the output list, and deleting it from the bounding-frame list; calculating the IoU between that highest-confidence prediction frame and every other prediction frame in the bounding-frame list, and deleting the prediction frames whose IoU is greater than a preset threshold; then re-sorting the remaining license plate prediction frames by confidence and repeating the above steps until the bounding-frame list is empty, at which point the license plate prediction frames in the output list are the finally obtained prediction frames.
The invention also provides a full-scene-oriented real-time license plate detection system, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor, the instructions, when executed by the processor, implementing the steps of the method described above.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a full-scene-oriented real-time license plate detection method that makes targeted improvements to the single-stage object detection network YOLO (You Only Look Once), further improving detection precision and speed while reducing the network model size, making the method suitable for terminal deployment.
2. The invention uses multi-scale prediction, which significantly improves the detection of small target objects. Modifying the convolution kernels increases the nonlinearity of the network, improving its ability to handle complex environments and giving it full-scene detection capability.
3. The invention detects license plates directly, avoiding the time cost of vehicle detection and thereby further shortening the license plate detection time, while also avoiding the vehicle detection otherwise needed to handle occluded license plates in complex scenes.
Through these improvements, the network gains the capability of terminal deployment and real-time, efficient detection; the method is suitable for any scene and has the characteristic of one-time training, multi-scene application.
Drawings
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a prediction effect according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a full-scene-oriented real-time license plate detection method, which specifically includes the following steps:
S1: generating a training set and a testing set for network training and testing; the data comprise a self-built multi-license-plate scene data set and an open-source public single-license-plate image data set;
S2: modifying the YOLOv3-tiny network structure to generate an MD-YOLO model; based on the characteristics of the license plate detection task, the number of convolution kernels in the convolutional layers is modified to balance the accuracy and speed of the network and to compress the model size, and the convolution kernels of the detection layers are modified to increase the nonlinearity of the network and improve its detection capability in complex environments;
S3: training the constructed MD-YOLO model with the training set, and selecting the optimal weight file based on mAP (mean Average Precision);
S4: taking the image to be detected containing the license plate as the input of the trained MD-YOLO model, and outputting and returning a result with a license plate detection frame.
In this embodiment, in step S1, the generating a training set and a test set for network training and testing specifically includes:
collecting a license plate detection data set, including frames extracted from videos shot by vehicle event data recorders and a large public Chinese license plate data set;
manually labeling the license plate areas of unlabeled pictures by using LabelImg, labeling every license plate in the picture recognizable by the human eye, wherein each label comprises: a rectangular frame containing the license plate, the upper-left corner coordinates (lx, ly), the lower-right corner coordinates (rx, ry), and the label name "plate";
all labeled data are divided into training set and testing set in a ratio of 7: 3.
Anchor frames are then generated for the training data set by using the K-means clustering method and replace the anchor frames of the original model, so that the license plate detection precision is higher. The method specifically comprises the following steps:
defining the width and height of the input picture as W and H respectively; in the picture, the coordinates of the i-th license plate are (lx_i, ly_i, rx_i, ry_i), i ∈ {1, 2, …, n}, where (lx_i, ly_i) represents the upper-left corner and (rx_i, ry_i) the lower-right corner of the i-th license plate;
the width and height of the license plate are normalized by using the following calculation formulas:
w_i = (rx_i - lx_i) / W
h_i = (ry_i - ly_i) / H
where w_i represents the normalized width and h_i represents the normalized height;
calculating the distance D between the license plate labeling frame and the license plate anchor frame, where D = 1 - IoU and IoU is the intersection-over-union between the labeling frame (w_i, h_i) and the anchor frame (w_j, h_j), with i indexing the i-th license plate and j the j-th anchor frame; the calculation formula is:
IoU = [min(w_i, w_j) · min(h_i, h_j)] / [w_i · h_i + w_j · h_j - min(w_i, w_j) · min(h_i, h_j)]
initializing k clustering centers, the license plate anchor frames having widths and heights (w_j, h_j), j ∈ {1, 2, 3, …, k};
calculating the distance D between each license plate frame and each license plate anchor frame, and assigning each license plate frame to the anchor frame with the minimum D;
and calculating the mean value of each cluster as the cluster center of the next iterative calculation, and repeating the steps until the position of the cluster center tends to be stable.
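A runnable sketch of this clustering procedure is given below, under the assumption that k = 6 anchors are wanted (three per output scale, matching the two detection layers); boxes is an (n, 2) numpy array of the normalized widths and heights computed above.

    import numpy as np

    def iou_wh(boxes, anchors):
        # IoU between (w, h) pairs, ignoring box positions, as used
        # for anchor clustering.
        inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
                 np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
        union = ((boxes[:, 0] * boxes[:, 1])[:, None] +
                 (anchors[:, 0] * anchors[:, 1])[None, :] - inter)
        return inter / union

    def kmeans_anchors(boxes, k=6, iters=300, seed=0):
        rng = np.random.default_rng(seed)
        anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
        for _ in range(iters):
            # Assign each plate frame to the anchor with minimum D = 1 - IoU.
            assign = np.argmin(1.0 - iou_wh(boxes, anchors), axis=1)
            new = np.array([boxes[assign == j].mean(axis=0)
                            if np.any(assign == j) else anchors[j]
                            for j in range(k)])
            if np.allclose(new, anchors):   # cluster centers have stabilized
                return new
            anchors = new
        return anchors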
In this embodiment, in step S2, modifying the YOLOv3-tiny network structure to generate the MD-YOLO model specifically comprises:
the MD-YOLO model comprises the following layers in sequence: Conv1, MaxPool, Conv2, MaxPool, Conv3, MaxPool, Conv4, MaxPool, Conv5, MaxPool, Conv6, MaxPool, Conv7, Conv8, Conv9, Conv10, route, Conv11, UpSample, route, Conv12, Conv13; wherein Conv1 to Conv13 represent convolutional layers with convolution kernel sizes of 1 or 3, MaxPool is a maximum pooling layer with kernel size 2 and stride 2 or 1, and route represents merging the output of an earlier (base) convolutional layer with the current layer;
modifying the picture size of the input to the MD-YOLO model to 416 × 416, with Conv10 and Conv13 as two output layers at scales 13 × 13 × C and 26 × 26 × C respectively, where C = (classes + 5) × 3 and classes = 1; prediction outputs at two different scales help improve the detection rate of small target license plates.
Adjusting the number of convolution kernels and the convolution kernel sizes of MD-YOLO specifically comprises: the number of convolution kernels of Conv1 is 3 and of Conv2 is 3, and limiting these two greatly reduces the amount of computation; the number of convolution kernels of Conv7 is 512; the convolution kernel size of Conv9 is 1 × 1, and the convolution kernel size of Conv12 is 1 × 1. The 1 × 1 kernels further increase the nonlinearity of the network while reducing the amount of computation. The final MD-YOLO model is shown in the following table.
[Table shown as an image in the original: the complete MD-YOLO layer configuration.]
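For concreteness, a PyTorch sketch of this layer sequence follows. The channel widths of Conv3-Conv6, Conv8 and Conv11 are assumptions patterned on YOLOv3-tiny, since the text fixes only Conv1/Conv2 (3 kernels each), Conv7 (512 kernels), the 1 × 1 kernels of Conv9/Conv12, and the 18-channel outputs; the stride-1 final pooling keeps the 13 × 13 resolution.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def conv(c_in, c_out, k):
        # Convolution + batch norm + LeakyReLU, the usual darknet block.
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.1))

    class MDYOLO(nn.Module):
        def __init__(self, out_ch=18):            # C = (classes + 5) * 3 = 18
            super().__init__()
            self.c1 = conv(3, 3, 3)               # 3 kernels (from the text)
            self.c2 = conv(3, 3, 3)               # 3 kernels (from the text)
            self.c3 = conv(3, 64, 3)              # widths assumed from here on
            self.c4 = conv(64, 128, 3)
            self.c5 = conv(128, 256, 3)
            self.c6 = conv(256, 512, 3)
            self.c7 = conv(512, 512, 3)           # 512 kernels (from the text)
            self.c8 = conv(512, 256, 3)
            self.c9 = conv(256, 512, 1)           # 1 x 1 kernel (from the text)
            self.c10 = nn.Conv2d(512, out_ch, 1)  # 13 x 13 x 18 output layer
            self.c11 = conv(512, 128, 1)
            self.c12 = conv(128 + 256, 256, 1)    # 1 x 1 kernel (from the text)
            self.c13 = nn.Conv2d(256, out_ch, 1)  # 26 x 26 x 18 output layer
            self.pool = nn.MaxPool2d(2, 2)

        def forward(self, x):                     # x: (N, 3, 416, 416)
            x = self.pool(self.c1(x))             # 208 x 208
            x = self.pool(self.c2(x))             # 104 x 104
            x = self.pool(self.c3(x))             # 52 x 52
            x = self.pool(self.c4(x))             # 26 x 26
            r = self.c5(x)                        # 26 x 26 route source
            x = self.pool(r)                      # 13 x 13
            x = self.c6(x)
            x = F.max_pool2d(F.pad(x, (0, 1, 0, 1)), 2, 1)  # stride-1 pool
            y = self.c9(self.c8(self.c7(x)))
            out13 = self.c10(y)                   # first detection head
            z = F.interpolate(self.c11(y), scale_factor=2)  # UpSample
            z = torch.cat([z, r], dim=1)          # route: merge with Conv5
            out26 = self.c13(self.c12(z))         # second detection head
            return out13, out26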
In this embodiment, in step S3, training the constructed MD-YOLO model with the training set and selecting the optimal weight file based on the mAP specifically comprises:
setting the training configuration file; every thousand iterations, a weight file is saved and the mAP is computed once, updating the weight file with the current best mAP value;
training for 350,000 iterations in total; after training, the weight file with the best mAP value is selected as the final result file.
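The weight-selection rule can be sketched as follows; train_one_iteration, save_weights and evaluate_map are hypothetical placeholders for the framework's training internals, not functions named in the patent.

    best_map, best_file = 0.0, None
    for it in range(1, 350_001):                 # 350,000 iterations in total
        train_one_iteration(model)
        if it % 1000 == 0:                       # every thousand iterations
            path = f"md_yolo_{it}.weights"
            save_weights(model, path)            # periodic checkpoint
            m = evaluate_map(model, test_set)    # mAP on the test set
            if m > best_map:                     # keep the current best file
                best_map, best_file = m, path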
In this embodiment, taking the image to be detected containing the license plate as the input of the trained MD-YOLO model and outputting and returning the result with the license plate detection frame specifically comprises:
inputting the image to be detected into the trained MD-YOLO model to obtain the predicted license plate prediction frames and the confidence P of each prediction frame;
performing non-maximum suppression on all the license plate prediction frames to remove repeated frame selections;
and screening the confidences of all the obtained license plate prediction frames, retaining the prediction frames whose confidence is greater than the preset value (0.5 in this embodiment) as the final result.
In this embodiment, performing non-maximum suppression on all the license plate prediction frames to remove repeated frame selections specifically comprises:
calculating the aspect ratio λ of each license plate prediction frame, defined as:
λ = w_pred / h_pred
where h_pred and w_pred represent the height and width of the prediction frame respectively;
predicting, by means of the probability value P_λ, whether a prediction frame contains a license plate. The aspect ratio of a license plate is about 3.14, so the range of λ can be used to predict whether a prediction frame is a license plate; this makes full use of prior knowledge and improves the license plate recognition rate. A value of λ near 3.14 indicates a high probability of a license plate. To account for rotated license plates, this embodiment selects a larger width relaxation ratio. P_λ, the probability value computed from the prediction frame, is defined by the following formula:
[equation image in the original: piecewise definition of P_λ over ranges of λ]
updating the confidence of each license plate prediction frame as P = P + P_λ;
sorting the license plate prediction frames by confidence to generate a bounding-frame list; selecting the license plate prediction frame with the highest confidence, adding it to the output list, and deleting it from the bounding-frame list; calculating the IoU between that highest-confidence prediction frame and every other prediction frame in the bounding-frame list, and deleting the prediction frames whose IoU is greater than a preset threshold (set to 0.5 in this embodiment); then re-sorting the remaining license plate prediction frames by confidence and repeating the above steps until the bounding-frame list is empty, at which point the license plate prediction frames in the output list are the finally obtained prediction frames.
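A sketch of this modified non-maximum suppression is given below: each frame's confidence is raised by the aspect-ratio prior P_λ before standard greedy suppression at IoU > 0.5. Because the patent gives P_λ only as an equation image, the concrete bonus value and the relaxed λ range in p_lambda are assumptions; only the idea that λ near 3.14 should raise the confidence is taken from the text.

    def p_lambda(w, h, bonus=0.1):
        # Aspect-ratio prior: frames whose width/height ratio falls in a
        # relaxed band around 3.14 get a confidence bonus (assumed values).
        lam = w / h
        return bonus if 2.0 <= lam <= 5.0 else 0.0

    def iou(a, b):
        # IoU of two corner boxes (lx, ly, rx, ry).
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
        return inter / (area(a) + area(b) - inter)

    def nms_with_prior(frames, iou_thresh=0.5):
        # frames: list of (box, confidence); returns the kept frames.
        scored = [(box, p + p_lambda(box[2] - box[0], box[3] - box[1]))
                  for box, p in frames]              # P <- P + P_lambda
        scored.sort(key=lambda t: t[1], reverse=True)
        kept = []
        while scored:
            best = scored.pop(0)                     # highest confidence
            kept.append(best)
            scored = [t for t in scored
                      if iou(best[0], t[0]) <= iou_thresh]
        return kept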
The embodiment also provides a full-scene-oriented real-time license plate detection system, which comprises a memory, a processor and computer program instructions stored on the memory and capable of being executed by the processor, wherein when the processor executes the computer program instructions, the steps of the method are implemented.
This embodiment improves YOLOv3-tiny for the license plate detection task: by limiting the number of convolution kernels and the kernel sizes, the computation and size of the model are greatly reduced and the detection speed is further improved while detection precision is maintained. The final effect is shown in fig. 2: an image containing license plates is input, and the image with license plate labeling frames is returned. As can be seen from FIG. 2, the method is effective for license plate detection in complex scenes and highly robust. An end-to-end usage sketch follows.
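For completeness, a usage sketch combining the MDYOLO and nms_with_prior sketches from above; decode_predictions, which would turn the two raw output tensors into corner boxes with confidences, is a hypothetical helper the patent does not spell out.

    model = MDYOLO()
    model.load_state_dict(torch.load("md_yolo_best.pt"))   # best-mAP weights
    model.eval()
    with torch.no_grad():
        out13, out26 = model(torch.randn(1, 3, 416, 416))  # stand-in image
    frames = decode_predictions(out13, out26)   # hypothetical decoding step
    detections = [f for f in nms_with_prior(frames) if f[1] > 0.5]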
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing describes preferred embodiments of the present invention; other and further embodiments may be devised without departing from its basic scope, which is determined by the claims that follow. Any simple modification, equivalent change, or variation of the above embodiments made in accordance with the technical essence of the present invention falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. A full-scene-oriented real-time license plate detection method is characterized by comprising the following steps:
generating a training set and a testing set for network training and testing;
modifying a YOLOv3-tiny network structure to generate an MD-YOLO model;
training the constructed MD-YOLO model by adopting a training set, and selecting an optimal weight file based on the mAP;
and taking the image to be detected containing the license plate as the input of the trained MD-YOLO model, and outputting and returning a result with a license plate detection frame.
2. The full-scene-oriented real-time license plate detection method of claim 1, wherein the generation of the training set and the test set for network training and testing specifically comprises:
collecting a license plate detection data set;
manually labeling the license plate areas of unlabeled pictures by using LabelImg, wherein each label comprises: a rectangular frame containing the license plate, the upper-left corner coordinates (lx, ly), the lower-right corner coordinates (rx, ry), and the label name "plate";
and dividing all the data marked with the labels into a training set and a testing set.
3. The full-scene-oriented real-time license plate detection method of claim 2, wherein an anchor frame is generated by using a K-means clustering method on the training data set to replace an anchor frame of an original model, so that the license plate detection precision is higher.
4. The full-scene-oriented real-time license plate detection method of claim 3, wherein generating the anchor frames by K-means clustering on the training data set and replacing the anchor frames of the original model specifically comprises:
defining the width and height of the input picture as W and H respectively; in the picture, the coordinates of the i-th license plate are (lx_i, ly_i, rx_i, ry_i), i ∈ {1, 2, …, n}, where (lx_i, ly_i) represents the upper-left corner and (rx_i, ry_i) the lower-right corner of the i-th license plate;
the width and height of the license plate are normalized by using the following calculation formulas:
w_i = (rx_i - lx_i) / W
h_i = (ry_i - ly_i) / H
where w_i represents the normalized width and h_i represents the normalized height;
calculating the distance D between the license plate labeling frame and the license plate anchor frame, where D = 1 - IoU and IoU is the intersection-over-union between the labeling frame (w_i, h_i) and the anchor frame (w_j, h_j), with i indexing the i-th license plate and j the j-th anchor frame;
initializing k clustering centers, the license plate anchor frames having widths and heights (w_j, h_j), j ∈ {1, 2, 3, …, k};
calculating the distance D between each license plate frame and each license plate anchor frame, and assigning each license plate frame to the anchor frame with the minimum D;
and calculating the mean value of each cluster as the cluster center of the next iterative calculation, and repeating the steps until the position of the cluster center tends to be stable.
5. The full-scene-oriented real-time license plate detection method of claim 1, wherein the modifying of the YOLOv3-tiny network structure to generate the MD-YOLO model specifically comprises:
the MD-YOLO model comprises the following layers in sequence: Conv1, MaxPool, Conv2, MaxPool, Conv3, MaxPool, Conv4, MaxPool, Conv5, MaxPool, Conv6, MaxPool, Conv7, Conv8, Conv9, Conv10, route, Conv11, UpSample, route, Conv12, Conv13; wherein Conv1 to Conv13 represent convolutional layers, MaxPool is a maximum pooling layer, and route represents merging the output of an earlier (base) convolutional layer with the current layer;
modifying the picture size of the input to the MD-YOLO model to 416 × 416, with Conv10 and Conv13 as two output layers with dimensions of 13 × 13 × C and 26 × 26 × C respectively, where C = (classes + 5) × 3 and classes = 1;
and adjusting the number of convolution kernels and the convolution kernel sizes of MD-YOLO.
6. The full-scene-oriented real-time license plate detection method of claim 5, wherein adjusting the number of convolution kernels and the convolution kernel sizes of MD-YOLO is specifically: the number of convolution kernels of Conv1 is 3, of Conv2 is 3, and of Conv7 is 512; the convolution kernel size of Conv9 is 1 × 1, and the convolution kernel size of Conv12 is 1 × 1.
7. The full-scene-oriented real-time license plate detection method of claim 1, wherein training the constructed MD-YOLO model with the training set and selecting the optimal weight file based on the mAP specifically comprises:
setting the training configuration file; every thousand iterations, a weight file is saved and the mAP is computed once, updating the weight file with the current best mAP value;
training for 350,000 iterations in total; after training, the weight file with the best mAP value is selected as the final result file.
8. The full-scene-oriented real-time license plate detection method of claim 1, wherein taking the image to be detected containing the license plate as the input of the trained MD-YOLO model and outputting the result with the license plate detection frame specifically comprises the steps of:
inputting the image to be detected into the trained MD-YOLO model to obtain the predicted license plate prediction frames and the confidence P of each prediction frame;
performing non-maximum suppression on all the license plate prediction frames to remove repeated frame selections;
and screening the confidences of all the obtained license plate prediction frames, retaining the prediction frames whose confidence is greater than a preset value as the final result.
9. The full-scene-oriented real-time license plate detection method of claim 8, wherein performing non-maximum suppression on all the prediction frames to remove repeated frame selections specifically comprises:
calculating the aspect ratio λ of each license plate prediction frame, defined as:
λ = w_pred / h_pred
where h_pred and w_pred represent the height and width of the prediction frame respectively;
predicting, by means of the probability value P_λ, whether the content of each license plate prediction frame is a license plate, with the calculation formula:
[equation image in the original: piecewise definition of P_λ over ranges of λ]
updating the confidence of each license plate prediction frame as P = P + P_λ;
sorting the license plate prediction frames by confidence to generate a bounding-frame list; selecting the license plate prediction frame with the highest confidence, adding it to the output list, and deleting it from the bounding-frame list; calculating the IoU between that highest-confidence prediction frame and every other prediction frame in the bounding-frame list, and deleting the prediction frames whose IoU is greater than a preset threshold; then re-sorting the remaining license plate prediction frames by confidence and repeating the above steps until the bounding-frame list is empty, at which point the license plate prediction frames in the output list are the finally obtained prediction frames.
10. A full-scene-oriented real-time license plate detection system, comprising a memory, a processor, and computer program instructions stored on the memory and executable by the processor, the computer program instructions, when executed by the processor, performing the method steps of any one of claims 1-9.
CN202010999847.0A 2020-09-22 2020-09-22 Real-time license plate detection method and system for whole scene Active CN112132130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010999847.0A CN112132130B (en) 2020-09-22 2020-09-22 Real-time license plate detection method and system for whole scene

Publications (2)

Publication Number Publication Date
CN112132130A 2020-12-25
CN112132130B 2022-10-04

Family

ID=73841999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010999847.0A Active CN112132130B (en) 2020-09-22 2020-09-22 Real-time license plate detection method and system for whole scene

Country Status (1)

Country Link
CN (1) CN112132130B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3617949A1 (en) * 2017-09-29 2020-03-04 Alibaba Group Holding Limited Method and apparatus for improving vehicle loss assessment image identification result, and server
CN110059554A (en) * 2019-03-13 2019-07-26 重庆邮电大学 A kind of multiple branch circuit object detection method based on traffic scene
CN111209921A (en) * 2020-01-07 2020-05-29 南京邮电大学 License plate detection model based on improved YOLOv3 network and construction method
CN111401148A (en) * 2020-02-27 2020-07-10 江苏大学 Road multi-target detection method based on improved multilevel YO L Ov3
CN111524095A (en) * 2020-03-24 2020-08-11 西安交通大学 Target detection method for rotating object

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LELE XIE et al.: "A New CNN-Based Method for Multi-Directional Car License Plate Detection", IEEE Transactions on Intelligent Transportation Systems *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435446A (en) * 2021-07-07 2021-09-24 南京云创大数据科技股份有限公司 Inclined license plate correction method based on deep learning
CN113435446B (en) * 2021-07-07 2023-10-31 南京云创大数据科技股份有限公司 Deep learning-based inclined license plate correction method
CN114199265A (en) * 2021-11-22 2022-03-18 南京邮电大学 Auxiliary navigation system and navigation method based on target detection algorithm
CN114511899A (en) * 2021-12-30 2022-05-17 武汉光庭信息技术股份有限公司 Street view video fuzzy processing method and system, electronic equipment and storage medium
CN115050028A (en) * 2022-06-15 2022-09-13 松立控股集团股份有限公司 Sample vehicle license plate detection method in severe weather
CN115050028B (en) * 2022-06-15 2024-03-29 松立控股集团股份有限公司 Small sample license plate detection method in severe weather

Also Published As

Publication number Publication date
CN112132130B (en) 2022-10-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant