CN111695561A

CN111695561A - License plate detection and correction recognition method and recognition system based on SSD

Info

Publication number: CN111695561A
Application number: CN202010449471.6A
Authority: CN
Inventors: 孙超; 邢卫国; 施远银; 鞠蓉
Original assignee: Nanjing Boya Jizhi Intelligent Technology Co ltd
Current assignee: Nanjing Boya Jizhi Intelligent Technology Co ltd
Priority date: 2020-05-25
Filing date: 2020-05-25
Publication date: 2020-09-22

Abstract

The invention discloses an SSD-based license plate detection, correction and recognition method and system. The method comprises the following steps: inputting an image and detecting whether a license plate exists in the image, the position of the license plate and the positions of the key points on the license plate; judging whether the license plate is tilted and, if so, correcting the license plate through an affine transformation of the key points; recognizing the untilted or corrected license plate image by detecting whether characters exist in it and where they are, and outputting the recognition result. The system comprises a license plate and license plate key point detection module, a correction module and a license plate character detection module connected in sequence; the image to be detected is input into the license plate and license plate key point detection module, and the license plate character detection module outputs the character information on the license plate. The invention provides an end-to-end method for license plate detection, correction and character recognition based on the SSD algorithm, which can improve the accuracy of license plate recognition.

Description

License plate detection and correction recognition method and recognition system based on SSD

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to a license plate detection and correction recognition method and a license plate detection and correction recognition system based on an SSD.

Background

Vehicle detection and identification are important components of modern intelligent traffic systems, and good vehicle detection and identification systems can greatly relieve increasingly severe traffic pressure.

At present, vehicle detection and recognition mainly works by first extracting the license plate from the vehicle image and then recognizing the license plate characters. Conventional license plate recognition feeds the output of license plate detection directly into the recognition stage. This works well in ordinary scenes, but when the license plate is severely tilted, the plate may still be detected, yet the detected plate is tilted, which greatly reduces the accuracy of the subsequent recognition.

Disclosure of Invention

Purpose of the invention: to address the drop in license plate recognition rate caused by license plate tilt in the prior art, the invention discloses an SSD-based license plate detection, correction and recognition method and recognition system.

Technical solution: the invention adopts the following technical solution. An SSD-based license plate detection, correction and recognition method comprises the following steps:

S1, inputting an image, detecting whether a license plate exists in the image, and if so, detecting the position of the license plate and the positions of the key points on the license plate;

S2, judging whether the license plate is tilted, and if it is tilted, correcting the license plate through an affine transformation of the key points;

S3, recognizing the untilted license plate image or the corrected license plate image, detecting whether characters exist in the license plate image, detecting the positions and types of the characters if they exist, and outputting a recognition result.

An SSD-based license plate detection, correction and recognition system comprises a license plate and license plate key point detection module, a correction module and a license plate character detection module connected in sequence, wherein an image to be detected is input into the license plate and license plate key point detection module, and the license plate character detection module outputs the character information on the license plate; wherein:

the license plate and license plate key point detection module is used for detecting whether a license plate exists in the image, the position of the license plate and the positions of the key points on the license plate, and for outputting a separate license plate image to the correction module;

the correction module is used for judging whether the license plate in the license plate image is inclined or not, correcting the inclined license plate image and outputting the license plate image which is not inclined or the corrected license plate image to the license plate character detection module;

the license plate character detection module is used for detecting whether characters exist in the license plate image and the positions and the types of the characters, and outputting a recognition result.
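The sequential connection of the three modules can be sketched as a small driver function. This is only an illustrative sketch: the function name `recognize_plate` and the four callables passed to it are hypothetical placeholders for the modules described above, not an interface defined by the invention.

```python
from typing import Any, Callable, Optional


def recognize_plate(
    image: Any,
    detect: Callable[[Any], Optional[tuple]],
    is_tilted: Callable[[Any, Any], bool],
    correct: Callable[[Any, Any], Any],
    read_characters: Callable[[Any], str],
) -> Optional[str]:
    """Chain the modules: detection -> correction -> character detection."""
    # Detection module: returns (plate_image, box, keypoints) or None.
    detection = detect(image)
    if detection is None:
        return None  # no license plate in the image
    plate_image, box, keypoints = detection

    # Correction module: only tilted plates are transformed.
    if is_tilted(box, keypoints):
        plate_image = correct(plate_image, keypoints)

    # Character detection module: outputs the character information.
    return read_characters(plate_image)


# Usage with stand-in callables (real modules would be neural networks).
result = recognize_plate(
    "frame",
    detect=lambda img: ("plate", None, None),
    is_tilted=lambda box, kps: True,
    correct=lambda plate, kps: plate + "_corrected",
    read_characters=lambda plate: plate.upper(),
)
print(result)  # PLATE_CORRECTED
```

Passing the modules in as callables keeps the sketch independent of any particular network implementation.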

Preferably, the license plate and license plate key point detection module comprises a first data input layer, a basic feature extraction network, a deep feature extraction network, a first prior frame module and a first detection network; wherein:

the first data input layer is used for reading an input image, preprocessing the input image and outputting the preprocessed image to the basic feature extraction network;

the basic feature extraction network is used for extracting the shallow feature of the image and outputting a shallow feature map to the deep feature extraction network and the first detection network;

the deep feature extraction network is used for extracting deep features of the image from the shallow feature map and outputting the deep feature map to the first detection network;

the first prior frame module is used for setting candidate target frames on the shallow feature map and the deep feature map, thereby presetting candidate positions of the license plate in the image;

the first detection network comprises a plurality of connected convolution layers and is used for screening the candidate targets in the candidate target frames according to the shallow feature map and the deep feature map, and for performing license plate probability prediction, license plate position prediction and license plate key point prediction on the image.

Preferably, the basic feature extraction network comprises three maximum pooling layers; two convolution layers are arranged before the first maximum pooling layer and two after it, three convolution layers are arranged before the third maximum pooling layer and three after it, and an activation function layer follows each convolution layer; the preprocessed image is input into the first convolution layer, and the last activation function layer outputs the shallow feature map.

Preferably, the deep feature extraction network comprises a maximum pooling layer followed by five convolution layers, with an activation function layer after each convolution layer; the shallow feature map is input into the maximum pooling layer, the third activation function layer outputs a first deep feature map, the fifth activation function layer outputs a second deep feature map, and the first and second deep feature maps are input into the first detection network.

Preferably, the first prior frame module presets 25200 candidate target frames, which serve as the candidate targets, wherein:

19200 candidate targets are set on the shallow feature map, comprising 6400 candidate targets with an aspect ratio of 1:1, 6400 with an aspect ratio of 1:3 and 6400 with an aspect ratio of 3:1;

4800 candidate targets are set on the first deep feature map, comprising 1600 candidate targets with an aspect ratio of 1:1, 1600 with an aspect ratio of 1:3 and 1600 with an aspect ratio of 3:1;

1200 candidate targets are set on the second deep feature map, comprising 400 candidate targets with an aspect ratio of 1:1, 400 with an aspect ratio of 1:3 and 400 with an aspect ratio of 3:1.

Preferably, the license plate character detection module comprises a second data input layer, a feature extraction network, a second prior frame module and a second detection network; wherein:

the second data input layer is used for reading the license plate image, preprocessing the license plate image and outputting the preprocessed license plate image to the feature extraction network;

the feature extraction network is used for extracting features of the license plate image and outputting a feature map to the second detection network;

the second prior frame module is used for setting candidate target frames on the feature map, thereby presetting candidate positions of the characters in the license plate image;

the second detection network comprises two connected convolution layers and is used for screening the candidate targets in the candidate target frames according to the feature map, performing character probability prediction, character position prediction and character type prediction on the license plate image, and outputting a recognition result.

Preferably, the feature extraction network comprises five convolution layers and four maximum pooling layers arranged alternately, with an activation function layer after each convolution layer; the preprocessed license plate image is input into the first convolution layer, and the last activation function layer outputs the feature map.

Preferably, the second prior frame module presets 768 candidate target frames on the feature map, which serve as the candidate targets, comprising 256 candidate targets with an aspect ratio of 1:1, 256 with an aspect ratio of 1:2 and 256 with an aspect ratio of 2:1.

Preferably, when the candidate targets are screened, they are first sorted by their predicted probability and the low-probability candidates are filtered out; non-maximum suppression is then applied to the remaining candidates, so that among candidate target frames with a large overlap ratio the lower-scoring candidates are filtered out; the candidates that remain are taken as the final output.

Advantageous effects: the invention provides an end-to-end method for license plate detection, correction and character recognition based on the SSD algorithm. During detection, the key points of the license plate are detected in addition to the license plate itself. The detected license plate is no longer recognized directly; instead, it is corrected according to the detected key points and the corrected license plate is sent to the recognition stage. Recognition is carried out by detecting the characters in the corrected license plate, which can improve the accuracy of license plate recognition.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a general block diagram of the present invention;

FIG. 3 is a diagram of a basic feature extraction network in a license plate and license plate key point detection module;

FIG. 4 is a block diagram of an additional feature extraction network in a license plate and license plate key point detection module;

FIG. 5 is a block diagram of a first detection network in a license plate and license plate key point detection module;

FIG. 6 is a structural diagram of a feature extraction network in a license plate character detection module.

Detailed Description

The present invention will be further described with reference to the accompanying drawings.

The SSD (Single Shot MultiBox Detector) algorithm is a one-stage, multi-box prediction method. It detects targets directly with a CNN (convolutional neural network) and, to a certain extent, alleviates the problems that small targets are difficult to detect and that localization is inaccurate. The core concepts of the SSD algorithm are as follows:

1. Using multi-scale feature maps for detection

Multi-scale means that feature maps of different sizes are used. The feature maps at the front of a CNN are generally larger, and convolution or pooling with stride 2 is then used to gradually reduce the feature map size, so that both a larger feature map and a smaller feature map can be used for detection. The benefit of multi-scale detection is that the larger feature maps can be used to detect relatively small targets, while the smaller feature maps are responsible for detecting large targets.

2. Detection by convolution

The SSD algorithm directly adopts convolution to extract detection results from different feature maps, and only a small convolution kernel is needed to obtain detection values in the feature maps.

3. Setting a prior frame

The SSD algorithm borrows the concept of anchors from Faster R-CNN: each cell is provided with prior frames of different scales or aspect ratios, and the predicted bounding boxes are based on these prior frames, which reduces the training difficulty to a certain extent. In general, each cell is provided with several prior frames that differ in scale and aspect ratio.
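As an illustration of how prior frames are laid over a feature map, the following minimal sketch places one frame per aspect ratio at the center of every cell. The function name `prior_frames`, the normalized coordinate convention, the `scale` value and the grid size are assumptions made for the example. The width and height formula (scale times or divided by the square root of the ratio) is the usual SSD convention and is not stated in this description.

```python
import math


def prior_frames(rows, cols, scale, aspect_ratios):
    """Return prior frames as (cx, cy, w, h) in normalized [0, 1] coordinates."""
    frames = []
    for i in range(rows):
        for j in range(cols):
            # Center of cell (i, j) on the feature map.
            cx = (j + 0.5) / cols
            cy = (i + 0.5) / rows
            for ratio in aspect_ratios:  # ratio = width / height
                w = scale * math.sqrt(ratio)
                h = scale / math.sqrt(ratio)
                frames.append((cx, cy, w, h))
    return frames


# A small 4 x 4 feature map with three aspect ratios per cell.
frames = prior_frames(rows=4, cols=4, scale=0.2, aspect_ratios=(1.0, 3.0, 1.0 / 3.0))
print(len(frames))  # 4 * 4 cells * 3 ratios = 48
```

Every frame at a given scale has the same area, so changing the aspect ratio only changes the shape of the frame.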

The invention discloses a license plate detection and correction identification method based on SSD, which comprises the following steps:

s1, inputting an image, detecting whether a license plate exists in the image, and if so, detecting the position of the license plate and the position of a key point on the license plate.

The key points are four vertexes on the detected license plate, namely an upper left vertex, an upper right vertex, a lower left vertex and a lower right vertex. For different license plates, the key points are all the four vertexes.

S2, judging whether the license plate is tilted, and if it is tilted, correcting the license plate through an affine transformation of the key points.

When the license plate is detected, the detection network predicts the top-left and bottom-right vertex coordinates of the license plate and also detects the coordinates of the four key points of the license plate. The predicted top-left and bottom-right vertex coordinates of the license plate are compared with the top-left and bottom-right vertex coordinates obtained from the key point detection. If the two top-left vertex coordinates are consistent and the two bottom-right vertex coordinates are consistent, the license plate is not tilted; otherwise, the license plate is tilted.
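The tilt judgment can be sketched as a comparison of two pairs of corner coordinates. In this minimal sketch the function name `is_tilted`, the data layout and the pixel tolerance `tol` are assumptions; the description only states that the coordinates must be consistent, without giving a tolerance.

```python
def is_tilted(box, keypoints, tol=2.0):
    """Compare the box corners with the top-left and bottom-right key points.

    box: (x_min, y_min, x_max, y_max) from the position branch.
    keypoints: dict with 'top_left' and 'bottom_right' entries, each (x, y).
    tol: hypothetical tolerance in pixels for treating corners as consistent.
    """
    bx1, by1, bx2, by2 = box
    kx1, ky1 = keypoints["top_left"]
    kx2, ky2 = keypoints["bottom_right"]
    deviations = (abs(bx1 - kx1), abs(by1 - ky1), abs(bx2 - kx2), abs(by2 - ky2))
    # The plate counts as tilted as soon as any corner coordinate disagrees.
    return max(deviations) > tol


upright = is_tilted((10, 20, 202, 84), {"top_left": (10, 20), "bottom_right": (202, 84)})
rotated = is_tilted((10, 20, 202, 84), {"top_left": (10, 40), "bottom_right": (202, 64)})
print(upright, rotated)  # False True
```

For an upright plate the bounding box corners coincide with the plate corners, while a rotated plate has corners that lie inside its bounding box, which is what the comparison detects.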

When correcting a tilted license plate, the coordinates of the detected key points are determined first; the key point coordinates used in this step are the top-left, bottom-left and bottom-right vertex coordinates. The corrected coordinates of the key points are then determined; the invention uses (0, 0), (192, 0) and (0, 64) as the corrected coordinates. Next, a transformation matrix is obtained from the key point coordinates before correction and the corrected coordinates. Finally, the coordinates of the four key points of the license plate are transformed with this matrix to obtain the corrected license plate. When setting the corrected coordinates, the aspect ratio of the corrected license plate is kept the same as that of an actual license plate, generally 1:3; in addition, the size of the corrected license plate is the same as the size of the license plate images input when training the character detection.
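An affine transformation is fully determined by three non-collinear point pairs, so the transformation matrix can be obtained by solving two small linear systems. The sketch below does this with Cramer's rule in plain Python. The function names are hypothetical. The example pairs the top-left, top-right and bottom-left corners with (0, 0), (192, 0) and (0, 64); this pairing is an assumption chosen so that the targets span a 192 x 64 rectangle, and the function itself accepts any three non-collinear point pairs.

```python
def _det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def affine_from_three_points(src, dst):
    """Return the 2 x 3 matrix M with M @ [x, y, 1] = (u, v) for each pair."""
    a = [[x, y, 1.0] for x, y in src]
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("source points are collinear")
    matrix = []
    for k in range(2):  # k = 0 solves for u, k = 1 solves for v
        b = [p[k] for p in dst]
        row = []
        for col in range(3):
            # Cramer's rule: replace one column of a by the right-hand side.
            m = [r[:] for r in a]
            for i in range(3):
                m[i][col] = b[i]
            row.append(_det3(m) / d)
        matrix.append(row)
    return matrix


def apply_affine(matrix, point):
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# Hypothetical detected corners of a tilted plate (pixel coordinates).
src = [(10.0, 20.0), (200.0, 40.0), (5.0, 84.0)]
dst = [(0.0, 0.0), (192.0, 0.0), (0.0, 64.0)]
M = affine_from_three_points(src, dst)
mapped = [apply_affine(M, p) for p in src]
```

In practice the same matrix would be used to resample the image pixels, for example with an image library's affine warp, so that the output is a 192 x 64 plate image.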

S3, recognizing the untilted license plate image or the corrected license plate image, detecting whether characters exist in the license plate image, detecting the positions and the types of the characters if the characters exist, outputting a recognition result, and arranging the recognized and classified characters according to the positions of the characters when the recognition result is output.
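Arranging the detected characters by position can be sketched as a sort on the horizontal center of each character frame. The function name `assemble_plate_text` and the data layout are assumptions, and the sketch only covers a single-row plate read from left to right.

```python
def assemble_plate_text(detections):
    """Join detected characters in left-to-right order.

    detections: list of (label, (x_min, y_min, x_max, y_max)) tuples.
    """
    ordered = sorted(detections, key=lambda d: (d[1][0] + d[1][2]) / 2.0)
    return "".join(label for label, _ in ordered)


# Detections arrive in arbitrary order from the detection network.
detections = [
    ("B", (30, 5, 50, 60)),
    ("1", (60, 5, 80, 60)),
    ("A", (0, 5, 20, 60)),
]
text = assemble_plate_text(detections)
print(text)  # AB1
```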

Based on the concept of the SSD algorithm, the invention also discloses an SSD-based license plate detection, correction and recognition system, which comprises a license plate and license plate key point detection module, a correction module and a license plate character detection module connected in sequence, wherein an image to be detected is input into the license plate and license plate key point detection module, and the license plate character detection module outputs the character information on the license plate.

The license plate and license plate key point detection module comprises a first data input layer, a basic feature extraction network, a deep feature extraction network, a first prior frame module and a first detection network; wherein:

the first data input layer is used for reading an input image, carrying out corresponding preprocessing on the input image and outputting the preprocessed image to the basic feature extraction network;

the first detection network is used for carrying out license plate probability prediction, license plate position prediction and license plate key point prediction on the input image according to the shallow feature map and the deep feature map.

The basic feature extraction network consists of a plurality of convolution layers, activation function layers and maximum pooling layers. The convolution layers extract feature information from the image to be detected and recognized; each activation function layer is connected to a convolution layer and filters out useless interfering feature information output by that convolution layer; each maximum pooling layer is connected to an activation function layer and screens and reduces the dimension of the feature information output by the activation function layer, reducing the amount of feature computation. In one embodiment of the invention, the basic feature extraction network is built as follows: a convolution layer is created first, followed by an activation function layer (this embodiment uses ReLU, the rectified linear unit, as the activation function); a convolution layer and an activation function layer are then created again, after which a maximum pooling layer is created to compress the feature information and reduce the amount of computation; after that maximum pooling layer, a convolution layer, an activation function layer, a maximum pooling layer, a convolution layer, an activation function layer, a maximum pooling layer, a convolution layer, an activation function layer, a convolution layer and an activation function layer are created in turn, which completes the basic feature extraction network. The output features of the basic feature extraction network serve as the shallow features of the input image and are input to both the first detection network and the deep feature extraction network.
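The layer ordering can be written down as a simple list and checked against the layer counts stated elsewhere in this description (10 convolution layers, 10 activation function layers and 3 maximum pooling layers). The sketch below follows the block sizes given in the summary: two convolution layers before and two after the first pooling layer, and three before and three after the third. The function name is hypothetical, and the downsampling factor assumes that every pooling layer halves the feature map and that the convolutions preserve its size, neither of which is stated in the text.

```python
def build_base_layout(convs_per_block=(2, 2, 3, 3)):
    """Return the layer names of the basic feature extraction network in order."""
    layers = []
    for index, n_convs in enumerate(convs_per_block):
        if index > 0:
            layers.append("maxpool")  # one pooling layer between blocks
        for _ in range(n_convs):
            layers.extend(["conv", "relu"])  # activation after every convolution
    return layers


layout = build_base_layout()
counts = {name: layout.count(name) for name in ("conv", "relu", "maxpool")}
downsample = 2 ** counts["maxpool"]  # assumes stride-2 pooling
print(counts, downsample)  # {'conv': 10, 'relu': 10, 'maxpool': 3} 8
```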

The extra feature extraction network (the deep feature extraction network) consists of a plurality of convolution layers, activation function layers and a maximum pooling layer and is used for extracting deep features of the input image. When the extra feature extraction network is constructed, the shallow features are used as input, and a maximum pooling layer, a convolution layer, an activation function layer, a convolution layer and an activation function layer are created in sequence. The features output by the third activation function layer and by the last activation function layer are used as the first deep features and the second deep features respectively, and both are input to the first detection network.

The first detection network consists of a plurality of convolution layers and is used for predicting whether a candidate is a license plate, the position of the license plate in the input image, and the key point positions of the license plate. Its inputs are the shallow features output by the basic feature extraction network and the first and second deep features output by the extra feature extraction network. The detection network has two branches: one predicts the top-left and bottom-right vertex coordinates of the license plate, and the other detects the coordinates of the four key points of the license plate.

The first prior frame module is used for setting candidate target frames, which serve as the candidate targets. Taking the proportions of a license plate into account, 25200 candidate targets are preset in the license plate and license plate key point detection module. 19200 candidate targets are set on the shallow feature map: 6400 with an aspect ratio of 1:1, 6400 with an aspect ratio of 1:3 and 6400 with an aspect ratio of 3:1. In addition, 4800 candidate targets are set on the first deep feature map: 1600 with an aspect ratio of 1:1, 1600 with an aspect ratio of 1:3 and 1600 with an aspect ratio of 3:1. Finally, 1200 candidate targets are set on the second deep feature map: 400 with an aspect ratio of 1:1, 400 with an aspect ratio of 1:3 and 400 with an aspect ratio of 3:1. The candidate targets densely cover the feature maps.
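The candidate counts can be checked with a few lines of arithmetic. The sketch assumes square feature maps of 80 x 80, 40 x 40 and 20 x 20 cells; these grid sizes are inferred from the per-ratio counts of 6400, 1600 and 400 and are not stated in the text. The function name is hypothetical.

```python
def count_prior_frames(cells_per_map, ratios_per_cell=3):
    """Return the candidate count per feature map and the overall total."""
    per_map = [cells * ratios_per_cell for cells in cells_per_map]
    return per_map, sum(per_map)


# Assumed grids: shallow, first deep and second deep feature maps.
per_map, total = count_prior_frames([80 * 80, 40 * 40, 20 * 20])
print(per_map, total)  # [19200, 4800, 1200] 25200
```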

The license plate character detection module comprises a second data input layer, a feature extraction network, a second prior frame module and a second detection network; wherein:

the second data input layer is used for reading the license plate image, carrying out corresponding preprocessing on the license plate image and outputting the preprocessed license plate image to the feature extraction network;

the second detection network is used for performing character probability prediction, character position prediction and character type prediction on the license plate image according to the characteristics of the license plate image and outputting a recognition result.

The feature extraction network consists of a plurality of convolution layers, activation function layers and maximum pooling layers, and the corrected license plate image is input into its first convolution layer. When the feature extraction network is constructed, a convolution layer, an activation function layer, a maximum pooling layer, a convolution layer, an activation function layer, a maximum pooling layer, a convolution layer and an activation function layer are built in sequence, and the output of the last activation function layer is fed into the second detection network.

The second detection network consists of two convolution layers and is used for predicting the character position information, the character type information and whether a candidate in the input license plate image is a character.

The second prior frame module is used for setting candidate target frames, which serve as the candidate targets. Taking the proportions of license plate characters into account, 768 candidate targets are preset in the character detection network: 256 with an aspect ratio of 1:1, 256 with an aspect ratio of 1:2 and 256 with an aspect ratio of 2:1.

In a specific implementation, the image to be detected is first fed into the first data input layer for data processing and then into the basic feature extraction network, which contains 10 convolution layers, 10 activation function layers and 3 maximum pooling layers; its final output is called the shallow features. On the one hand, the shallow features are fed into the first detection network, which performs license plate position prediction, license plate key point prediction and license plate probability prediction for the candidate targets on the shallow features. On the other hand, the shallow features are passed to the extra feature network for further feature extraction; the extra feature network contains 6 convolution layers, 6 activation function layers and 1 maximum pooling layer in total, the features output by the third activation function layer are called the first deep features, and the features output by the last activation function layer are called the second deep features. Both sets of deep features are fed into the first detection network, which performs license plate position prediction, license plate key point prediction and license plate probability prediction for the candidate targets. Finally, the 25200 candidate targets are screened. In the screening process, the candidate targets are first sorted by probability score and the low-scoring candidates are filtered out; non-maximum suppression is then applied, that is, among candidate target frames whose overlap ratio exceeds a threshold, the lower-scoring candidates are filtered out; the remaining candidate targets are taken as the final output.
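The screening step can be sketched as score filtering followed by greedy non-maximum suppression. In this minimal sketch the function names, both threshold values and the use of intersection over union as the overlap measure are assumptions; the description only speaks of an overlap ratio and a threshold.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) frames."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def screen_candidates(candidates, score_threshold=0.5, iou_threshold=0.45):
    """candidates: list of (score, frame). Returns the surviving candidates."""
    # Step 1: drop low-scoring candidates and sort the rest, best first.
    remaining = sorted(
        (c for c in candidates if c[0] >= score_threshold),
        key=lambda c: c[0],
        reverse=True,
    )
    # Step 2: keep a candidate only if it does not overlap a better one too much.
    kept = []
    for cand in remaining:
        if all(iou(cand[1], k[1]) <= iou_threshold for k in kept):
            kept.append(cand)
    return kept


candidates = [
    (0.9, (0, 0, 10, 10)),
    (0.8, (1, 1, 11, 11)),        # overlaps the 0.9 frame heavily
    (0.7, (50, 50, 60, 60)),
    (0.2, (100, 100, 110, 110)),  # below the score threshold
]
kept = screen_candidates(candidates)
print([score for score, _ in kept])  # [0.9, 0.7]
```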

In the invention, the network parameters of the license plate key point detection module and of the license plate character detection module are obtained through iterative training: the parameters are updated continuously during training until the network output is stable, at which point they are fixed. In training, the training data are input into each network to compute the output, the loss between the output and the data labels is computed, the weights are updated through back propagation, and this process is repeated until the loss value reaches the expected level, yielding the trained network and its parameters.
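The training cycle of forward pass, loss computation, weight update and repetition can be illustrated on a toy problem. The sketch below fits a single weight with gradient descent on a mean squared error loss; the model, data, learning rate and stopping threshold are all hypothetical and stand in for the far larger detection networks and their loss functions, which the description does not specify.

```python
def train(samples, lr=0.01, target_loss=1e-6, max_steps=10000):
    """Fit y = w * x by repeating forward pass, loss, gradient and update."""
    w = 0.0
    loss = float("inf")
    for _ in range(max_steps):
        # Forward pass: compute the outputs for the training data.
        preds = [w * x for x, _ in samples]
        # Loss between the outputs and the labels (mean squared error).
        loss = sum((p - y) ** 2 for p, (_, y) in zip(preds, samples)) / len(samples)
        if loss <= target_loss:
            break  # the loss has reached the expected level
        # Gradient of the loss with respect to w, then the weight update.
        grad = sum(2 * (p - y) * x for p, (x, y) in zip(preds, samples)) / len(samples)
        w -= lr * grad
    return w, loss


# Labels follow y = 3x, so the trained weight should end up close to 3.
w, loss = train([(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)])
```

In a real network the gradient is obtained by back propagation through all layers rather than from a closed-form expression, but the loop has the same shape.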

The invention detects both the license plate and the key points of the license plate. After the license plate is detected, the detected target is not sent directly to the subsequent recognition; the license plate is corrected first, and the corrected license plate is then sent to the subsequent recognition. The correction is carried out by an affine transformation based on the obtained key points.

In the invention, license plate recognition is not carried out by classification but by detection: the corrected license plate is used as the target to be detected and is processed by the license plate character detection module. During detection and recognition, the positions of the characters on the license plate are regressed and the categories of the characters are classified, which improves the recognition accuracy. During recognition, the corrected license plate is first sent to the second data input layer, and the processed data are then sent to the feature extraction network, which contains 5 convolution layers, 5 activation function layers and 4 maximum pooling layers in total; character detection is carried out on the output feature map of the last activation function layer. The second prior frame module in the license plate character detection module presets 768 candidate targets; the second detection network performs character category prediction and character position prediction for these 768 candidate targets, and the candidate targets are then screened. In the screening process, the candidate targets are first sorted by probability score and the low-scoring candidates are filtered out; non-maximum suppression is then applied, that is, among candidate target frames whose overlap ratio exceeds a threshold, the lower-scoring candidates are filtered out; the remaining candidate targets are taken as the final output.

The above description is only of the preferred embodiments of the present invention, and it should be noted that: it will be apparent to those skilled in the art that various modifications and adaptations can be made without departing from the principles of the invention and these are intended to be within the scope of the invention.

Claims

1. A license plate detection and correction recognition method based on SSD is characterized by comprising the following steps:

2. A license plate detection and correction recognition system based on SSD is characterized by comprising a license plate and license plate key point detection module, a correction module and a license plate character detection module which are sequentially connected, wherein an image to be detected is input into the license plate and license plate key point detection module, and the license plate character detection module outputs character information on the license plate; wherein:

3. The SSD-based license plate detection and correction recognition system of claim 2, wherein the license plate and license plate key point detection module comprises a first data input layer, a basic feature extraction network, a deep feature extraction network, a first prior frame module, and a first detection network; wherein:

4. The SSD-based license plate detection and correction recognition system of claim 3, wherein the basic feature extraction network comprises three maximum pooling layers; two convolution layers are arranged before the first maximum pooling layer and two after it, three convolution layers are arranged before the third maximum pooling layer and three after it, and an activation function layer follows each convolution layer; the preprocessed image is input into the first convolution layer, and the last activation function layer outputs the shallow feature map.

5. The SSD-based license plate detection and correction recognition system of claim 3, wherein the deep feature extraction network comprises a maximum pooling layer followed by five convolution layers, with an activation function layer after each convolution layer; the shallow feature map is input into the maximum pooling layer, the third activation function layer outputs a first deep feature map, the fifth activation function layer outputs a second deep feature map, and the first and second deep feature maps are input into the first detection network.

6. The SSD-based license plate detection and correction recognition system of claim 3, wherein the first prior frame module presets 25200 candidate target frames, which serve as the candidate targets, wherein:

7. The SSD-based license plate detection and correction recognition system of claim 2, wherein the license plate character detection module comprises a second data input layer, a feature extraction network, a second prior frame module, and a second detection network; wherein:

8. The SSD-based license plate detection and correction recognition system of claim 7, wherein the feature extraction network comprises five convolution layers and four maximum pooling layers arranged alternately, with an activation function layer after each convolution layer; the preprocessed license plate image is input into the first convolution layer, and the last activation function layer outputs the feature map.

9. The SSD-based license plate detection and correction recognition system of claim 7, wherein the second prior frame module presets 768 candidate target frames on the feature map, which serve as the candidate targets, comprising 256 candidate targets with an aspect ratio of 1:1, 256 with an aspect ratio of 1:2 and 256 with an aspect ratio of 2:1.

10. The SSD-based license plate detection and correction recognition system of claim 3 or 7, wherein when the candidate targets are screened, they are first sorted by their predicted probability and the low-probability candidates are filtered out; non-maximum suppression is then applied to the remaining candidates, so that among candidate target frames with a large overlap ratio the lower-scoring candidates are filtered out; the candidates that remain are taken as the final output.