CN117011722A - License plate recognition method and device based on unmanned aerial vehicle real-time monitoring video - Google Patents


Info

Publication number: CN117011722A
Application number: CN202210453349.5A
Authority: CN (China)
Prior art keywords: license plate, feature, network model, unmanned aerial vehicle
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 阮雅端, 朱一鸣, 汪维, 汪良文, 陈启美
Current assignee: Nanjing University
Original assignee: Nanjing University
Application filed by Nanjing University; priority to CN202210453349.5A; published as CN117011722A

Classifications

    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06N  COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00  Computing arrangements based on biological models
    • G06N 3/02  Neural networks
    • G06N 3/08  Learning methods
    • Y  GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02  TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T  CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00  Road transport of goods or passengers
    • Y02T 10/10  Internal combustion engine [ICE] based vehicles
    • Y02T 10/40  Engine management systems


Abstract

A license plate recognition method and device based on unmanned aerial vehicle (UAV) real-time monitoring video. A license plate detection network model and a license plate recognition network model are obtained by training deep-learning convolutional neural networks. During real-time recognition, the detection model locates a rectangular license plate region in the monitoring video, the recognition model reads the specific license plate information from that region, and the two cascaded models together produce the license plate recognition result. To address the differing target scales caused by changes in the UAV's flying height, the method improves the feature pyramid network of the target detection convolutional neural network and optimizes the loss function. This resolves the scale-transformation problem for license plate targets and improves bounding-box localization in license plate detection, yielding more accurate license plate position information and therefore higher recognition accuracy.

Description

License plate recognition method and device based on unmanned aerial vehicle real-time monitoring video
Technical Field
The invention belongs to the technical field of video image processing, relates to unmanned aerial vehicle real-time monitoring video, and discloses a license plate recognition method and device based on unmanned aerial vehicle real-time monitoring video.
Background
Unmanned aerial vehicle (UAV) aerial photography, being simple, flexible, and inexpensive, has gradually become an important way of capturing intelligent traffic monitoring video. The license plate is the unique identifier used for vehicle management, and license plate recognition plays an extremely important role in intelligent traffic systems.
In an actual UAV monitoring environment, locating a license plate from its intrinsic features (color, edges, texture, and so on) suffers from low accuracy because of external factors such as highly complex backgrounds, weather, and illumination. Recognition methods based on character segmentation, which cut individual characters out of the license plate region, are easily degraded by image noise, blur, and low resolution, so their character recognition accuracy is also low. In the specific environment of a UAV, license plate recognition therefore remains a problem to be optimized: the flying height changes during shooting, so license plates appear at different scales, and semantic information is easily lost during feature extraction because the changing height changes the size of the license plate.
With the rapid development of deep learning, Alexey et al. proposed license plate recognition networks based on convolutional neural networks, X. Du et al. proposed the SpineNet network to address scale transformation, and domestic scholars have proposed combining a target detection algorithm with SpineNet, replacing the backbone of the original detection network, to handle targets of different scales. Existing research, however, targets natural scenes under fixed cameras; no license plate recognition algorithm has been proposed for UAV scenes, and a general-purpose network alone cannot meet the accuracy and real-time requirements of practical engineering applications.
Disclosure of Invention
The invention aims to solve the following problems: in a UAV real-time monitoring scene with many vehicles on the road, manual inspection alone cannot extract the license plates of offending vehicles in time; existing detection and recognition techniques cannot handle the license plate scale changes caused by the UAV's varying shooting angle and height; and the partial solutions proposed for recognition under image scale transformation likewise cannot meet the accuracy and real-time requirements of the UAV scene.
The technical scheme of the invention is as follows: a license plate recognition method based on UAV real-time monitoring video trains a license plate detection network model and a license plate recognition network model with deep-learning convolutional neural networks and cascades the two models to recognize license plates in the UAV's real-time monitoring video. First, a license plate detection training set is built from UAV monitoring video and the detection model is trained; the position and label information of the license plate regions obtained in training are stored, and the same data set used for the detection model serves as the training set for the recognition model. During real-time recognition, the detection model extracts a rectangular license plate region from the monitoring video, the recognition model reads the specific license plate information from that region, and the cascaded output, i.e. the rectangular region together with the specific information, forms the recognition result; the final license plate region and its specific information are displayed in the road traffic video data.
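The cascade described in this scheme can be sketched as follows. `detect_plates` and `recognize_plate` are hypothetical stand-ins for the two trained network models (their names and signatures are assumptions, not the patent's API); the sketch only illustrates the data flow of detection, ROI cropping, and recognition, with frames represented as 2-D lists of pixels.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) rectangular license plate region

def cascade_recognize(frame,
                      detect_plates: Callable[[object], List[Box]],
                      recognize_plate: Callable[[object], str]):
    """Run the detection model, crop each rectangular plate region from the
    frame, and feed the crop to the recognition model; return (box, text)."""
    results = []
    for (x1, y1, x2, y2) in detect_plates(frame):
        roi = [row[x1:x2] for row in frame[y1:y2]]  # crop the license plate ROI
        results.append(((x1, y1, x2, y2), recognize_plate(roi)))
    return results

# Stub models standing in for the two trained CNNs:
frame = [[0] * 8 for _ in range(8)]
boxes = cascade_recognize(frame,
                          detect_plates=lambda f: [(1, 1, 5, 3)],
                          recognize_plate=lambda roi: "ABC123")
```

The point of the cascade is that the recognition model never sees the full frame, only the rectangular regions the detection model localizes.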
Further, the license plate detection network model is implemented with a convolutional neural network, specifically:
1) Acquire road traffic video data captured by the UAV and divide it into the training set and test set required for training the network model;
2) Annotate the license plates in the road traffic video data, pre-train on license plate targets with the annotated data, and optimize the detection network for the multi-scale characteristics of license plates in UAV-captured road traffic video:
2.1) Improve the feature pyramid network: add an adaptive attention module and a feature enhancement module to the FPN. The adaptive attention module fuses feature maps of different scales to generate a feature map containing multi-scale context information; the feature enhancement module applies different dilation coefficients to the FPN's high-level semantic features to learn different receptive fields, and fuses the semantic information of the different receptive fields through a multi-branch pooling layer, obtaining feature maps with the same kernel but different receptive fields;
2.2) Optimize the loss function to improve bounding-box regression accuracy: use α-IOU Loss as the bounding-box loss function, introducing a hyperparameter α into CIOU. The loss function is:

L_{α-CIOU} = 1 - IOU^α + ρ^{2α}(b, b^{gt}) / c^{2α} + (βv)^α

where the hyperparameter α adjusts the regression accuracy of bounding boxes at different levels, IOU^α is the α-th power of the IOU, B is the predicted bounding box, B^{gt} is the ground-truth bounding box, C is the smallest box enclosing both B and B^{gt}, b and b^{gt} are the center points of B and B^{gt}, ρ(·) is the Euclidean distance, c is the diagonal length of C, and v and β are given by

v = (4/π²) (arctan(w^{gt}/h^{gt}) - arctan(w/h))²,  β = v / ((1 - IOU) + v)

where w^{gt}, h^{gt} are the width and height of the ground-truth bounding box and w, h the width and height of the predicted bounding box;
2.3) Set the IOU threshold to 0.5 and the confidence threshold to 0.8 to obtain more accurate prediction results.
3) The license plate detection network model outputs the license plate localization, yielding a license plate ROI picture, and marks it on the video image.
In step 1), monitoring video of the road section to be inspected is captured by UAV aerial photography rather than a fixed camera, with vehicle license plates visible in the monitoring frames. The data set is segmented at 30 frames/s and the image frames containing license plates are screened. Data augmentation is then applied: rotation, horizontal shifting, flipping, and a combined mode in which 4 pictures are each shrunk to one quarter and stitched into a single picture. The data set is annotated, and the annotated data are divided into a training set and a test set for training the license plate detection network model.
Further, the license plate recognition network model is implemented with a convolutional neural network. The same data set used for the detection model is used to pre-train the recognition network; the network hyperparameters are tuned to obtain the deep-learning license plate recognition network model, which performs character recognition on the license plate ROI pictures located by the detection model to obtain the specific license plate information.
The invention also provides a license plate recognition device based on UAV real-time monitoring video, comprising a computer-readable storage medium in which a computer program implementing the license plate detection network model and the license plate recognition network model is configured; when executed, the program carries out the license plate recognition method based on UAV real-time monitoring video described above.
To improve the accuracy and real-time performance of license plate recognition, the invention proposes a new scheme: the recognition network model is cascaded after the detection network model, extracting the license plate region from the image and then recognizing the license plate information. Compared with fixed-camera monitoring, the moving UAV changes its viewing angle, so license plates appear at different scales; the invention therefore improves the license plate detection network on the basis of an existing target detection neural network: 1) On the one hand, an adaptive attention module is added, fusing high-level and low-level features through multi-layer convolution operations to obtain rich semantic information at different scales and reduce the loss of semantic information; on the other hand, a feature enhancement module is added, using multi-branch convolution layers with different dilation coefficients to expand the context semantic information into feature maps of different scales, and fusing the receptive-field information learned by each branch, so that the network adapts to the scale changes found in practical applications and detection accuracy improves; 2) The loss function of the detection network is optimized, improving the regression accuracy of bounding boxes at different levels so that the target region is localized more precisely.
In addition, because of the specificity of the UAV scene, imaging is easily affected by lighting, which interferes with the extraction of license plate feature information, so general-purpose license plate recognition models cannot be used directly to detect and recognize plates. The invention extracts video frames from UAV aerial footage as the original data set, applies data augmentation to it, modifies parameters in the model training code such as the number of iterations, batch size, and weights, and adjusts network parameters and thresholds, obtaining by training a license plate detection model and a license plate recognition model suitable for UAV scenes and thereby solving the problems of license plate detection on UAV monitoring video.
For the problem that existing deep-learning methods cannot fully accommodate the scale transformation of detection targets in UAV scenes, the invention improves the feature pyramid network of the target detection convolutional neural network and optimizes the loss function. The prior art mostly addresses the loss of context information by improving the backbone network, for example by searching for feature-map scale rearrangements to counter the information loss caused by the backbone's continual downscaling. The invention instead improves the feature pyramid network from the angle of feature fusion, realizing multi-layer feature fusion. The proposed method resolves the scale transformation of license plate targets and improves bounding-box localization in license plate detection, yielding more accurate license plate position information and therefore higher recognition accuracy.
The invention displays the finally located license plate and its specific information in the road traffic video data, enabling real-time monitoring and recording of vehicles.
Drawings
Fig. 1 is a flowchart of a license plate recognition method based on a real-time monitoring video of an unmanned aerial vehicle.
Fig. 2 is a flowchart of a license plate recognition algorithm according to the present invention.
FIG. 3 is a diagram of the YOLOv5 detection network employed by the license plate detection network model of the present invention.
FIG. 4 is a diagram of a modified feature pyramid network of the present invention.
Fig. 5 is a diagram of a self-adaptive attention module in a license plate detection network model according to the present invention.
FIG. 6 is a diagram of a feature enhancement module in a license plate detection network model according to the present invention.
Detailed Description
The license plate recognition method based on UAV real-time monitoring video can effectively handle the differing scales of UAV footage while improving the detection accuracy of small targets. First, the feature pyramid network of the license plate detection network is improved: an adaptive attention module is added to reduce the loss of high-level semantic information, and a feature enhancement module is added to strengthen the expressive power of the feature pyramid and improve bounding-box localization. Second, the loss function is optimized to improve bounding-box regression accuracy and thus detection accuracy, which in turn improves recognition accuracy. The practice of the invention is described below.
According to the license plate recognition method based on the unmanned aerial vehicle real-time monitoring video, a network model of license plate detection and character recognition is trained by using a deep-learning convolutional neural network, and network parameters and corresponding thresholds are adjusted so that the method is suitable for unmanned aerial vehicle real-time monitoring scenes. As shown in fig. 1, the method comprises the following steps:
step1: and acquiring road traffic video data acquired by the unmanned aerial vehicle, and dividing the road traffic video data into data sets required for training and testing. Further, the license plate in the unmanned aerial vehicle aerial video is marked, for example, a marking tool LabelImg is used for marking, and the license plate is divided into a training set and a testing set according to the proportion of 8:2. All the data sets are picture frames under the unmanned airport scene, and are combined with the CCPD data set of the urban parking data set in China, and are uniformly marked to be used as the data sets. As a preferred mode, vehicle license plates are displayed on monitoring picture frames of road traffic video data, a data set is segmented according to a specification of 30 frames/s, high-quality image frames containing the license plates are screened, the high-quality image frames indicate that the license plates in pictures are free from blurring, and the original video resolution is generally required to be 1920 x 1080. The data augmentation operation is carried out on the data set, the data augmentation method comprises the steps of rotating, horizontally overlaying, overturning, combining multiple modes, respectively shrinking 4 pictures to one fourth, then piecing together to form a whole picture, marking the data set, and dividing the marked data set into a training set and a testing set for training a license plate detection network model.
Step2: and training a license plate detection network model. According to the multi-scale characteristics of license plates in the unmanned aerial vehicle aerial photographing scene, the detection network is optimized and improved: 1) Improving a feature pyramid network in a target detection neural network, specifically, on the basis of an original feature pyramid network, on one hand, adding an adaptive attention module, and simultaneously fusing high-layer and low-layer features through multi-layer convolution operation to obtain abundant semantic information with different scales, so that the loss of the semantic information is reduced; on the other hand, a feature enhancement module is added to obtain feature images with different scales by using convolution layers of a plurality of branches, the feature images with different scales can obtain receptive fields with different sizes, receptive field information obtained by each branch is fused, and the feature enhancement module learns the receptive field information obtained by learning, so that the problem of scale change in practical application can be finally adapted, and the detection accuracy is improved; 2) In order to improve the regression accuracy of the bounding box, the loss function is optimized, and the super parameters are added into the original loss function to adaptively adjust the target loss and gradient of the high-low cross-correlation ratio, so that the regression accuracy of the bounding box with different levels is realized. And inputting the training set into a target detection convolutional neural network for training, and storing a weight file trained by the target detection neural network. 3) And adjusting the IOU threshold and the confidence threshold, and eliminating the interference of the error bounding box.
The license plate detection network model of the invention uses the deep-learning YOLO V5 algorithm as the basis of the target detection model, as shown in fig. 3. The backbone is composed of CSP modules and extracts features through CSPDarknet53. The original feature pyramid network FPN is improved by adding an adaptive attention module, which mainly counters the information loss caused by the reduced number of channels, and a feature enhancement module, which strengthens the expressive power of the feature pyramid by learning different receptive fields of the higher-level feature maps. By fusing feature maps of different scales, the improved feature pyramid network makes better use of the features extracted by the backbone; reference numeral 1 denotes the processing of the adaptive attention module and reference numeral 2 that of the feature enhancement module.
1) Improved feature pyramid network: the improved FPN structure is shown in fig. 4. {c1, c2, c3, c4, c5} are the feature maps generated by applying successive convolution operations to the input image. The adaptive attention module fuses high-level and low-level features in a top-down manner, obtaining richer semantic information and reducing information loss, while the feature enhancement module learns different receptive fields over the semantically rich feature maps to improve detection accuracy. Both modules mainly target high-level semantic information: the feature enhancement module is added at layers 3-5, and the feature maps of different scales produced by the adaptive attention module from the layer-5 feature map are fused with the layer-5 feature map to generate a context-aggregated feature map; the feature enhancement module expands the receptive fields of the layer 3-5 high-level semantic features with different dilation coefficients, obtaining feature maps of different scales and enhancing the representational power of the feature pyramid. M5 and M6 in fig. 4 are summed, i.e. the feature map of c5 is fused with feature maps of different scales.
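The top-down fusion in the FPN pathway can be illustrated with a minimal sketch: a coarse, high-level map is upsampled to the size of the level below and added elementwise. Nearest-neighbour upsampling is used here as an assumption; the patent does not specify the interpolation.

```python
def upsample2x(fm):
    """Nearest-neighbour 2x upsampling of a 2-D feature map."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                   # duplicate each row
    return out

def top_down_fuse(high, low):
    """Fuse a high-level map into the low-level map one scale below it."""
    up = upsample2x(high)
    return [[u + l for u, l in zip(ur, lr)] for ur, lr in zip(up, low)]

c5 = [[1, 2], [3, 4]]             # coarse, semantically rich map
c4 = [[0] * 4 for _ in range(4)]  # finer map one level below
p4 = top_down_fuse(c5, c4)        # upsampled c5 added onto c4
```

In the improved network this plain addition is preceded by the adaptive attention weighting described next, but the direction of information flow is the same.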
The adaptive attention module takes c5 as input and, as shown in fig. 5, mainly comprises the following steps:
1) obtain features at different scales through an adaptive pooling layer;
2) upsample the context features of the different scales to the same size as c5, and fuse the three feature channels with a concat operation using spatial attention;
3) pass the fused feature map through a 1 x 1 convolution layer, a ReLU activation layer, a 3 x 3 convolution layer, and a sigmoid activation layer in turn to obtain the spatial weight corresponding to each feature map. The generated weight maps are multiplied with the channel-merged feature map, separated, and fused with the feature map of c5 to obtain the context feature aggregation map, which carries rich multi-scale context information. In fig. 5, the final context feature aggregation map is the result of fusing M5 and M6 in fig. 4.
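The weight-and-fuse step can be sketched numerically. The 1 x 1 / 3 x 3 convolution stack that produces the spatial weights is replaced here by a per-pixel sigmoid of the map value, which is a deliberate simplification and not the patent's layers; the sketch only shows the shape of the computation: a spatial weight in (0, 1) per scale, multiplied into the pooled feature and fused with c5.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def attention_fuse(c5, pooled_maps):
    """Weight each upsampled pooled map by a spatial weight in (0, 1) and add
    the weighted maps onto c5 (a stand-in for the module's conv + ReLU +
    conv + sigmoid weight branch)."""
    h, w = len(c5), len(c5[0])
    out = [row[:] for row in c5]
    for fm in pooled_maps:  # each map already upsampled to c5's size
        for i in range(h):
            for j in range(w):
                weight = sigmoid(fm[i][j])      # spatial weight in (0, 1)
                out[i][j] += weight * fm[i][j]  # weighted fusion with c5
    return out

c5 = [[1.0, 2.0], [3.0, 4.0]]
pooled = [[[0.5, 0.5], [0.5, 0.5]], [[1.0, 1.0], [1.0, 1.0]]]
agg = attention_fuse(c5, pooled)  # context feature aggregation map
```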
The feature enhancement module is shown in fig. 6. Through multi-branch convolution it applies different dilation coefficients to the c3, c4, and c5 feature maps on several branches, learns different receptive fields of the feature maps, and fuses the semantic information of the branches through a multi-branch pooling layer. Each branch contains a convolution layer with its dilation coefficient, a batch normalization (BN) layer, and a ReLU activation layer. The three branches shown in fig. 6 use different dilation coefficients but the same kernel size; dilated convolution only enlarges the receptive field of the feature map and does not affect its resolution or the pixel range it corresponds to. Concretely, the coverage of the 3 x 3 convolution kernel is enlarged to 7 x 7, and the corresponding receptive field likewise grows to 7 x 7. Multi-branch dilated convolution thus yields feature maps with the same kernel but different receptive fields, i.e. the context semantic information is expanded with different dilation coefficients into feature maps of different scales, which are then fused with the result of the adaptive attention module to obtain richer semantic information and improve the expressive power of the feature maps.
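The growth of the effective kernel under dilation follows the standard formula k_eff = k + (k - 1)(d - 1); with a 3 x 3 kernel and dilation coefficient 3 this gives exactly the 7 x 7 coverage mentioned above.

```python
def effective_kernel(k: int, d: int) -> int:
    """Effective kernel extent of a k x k convolution with dilation d."""
    return k + (k - 1) * (d - 1)

# For a 3x3 kernel: dilation 1 keeps 3x3, dilation 2 covers 5x5, dilation 3
# covers 7x7, so the branches see different receptive fields with one kernel.
sizes = [effective_kernel(3, d) for d in (1, 2, 3)]
```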
2) To improve bounding-box regression accuracy, the loss function is optimized: the invention uses α-IOU Loss as the bounding-box loss function. By introducing the hyperparameter α, α-IOU Loss adaptively weights the loss and gradient of targets with high and low IOU, which improves the accuracy of prediction-box regression; α generalizes to bounding-box regression and improves performance on both correctly and incorrectly localized bounding boxes. α-IOU Loss applies a unified power transform to existing losses, and choosing a suitable α improves bounding-box regression accuracy.
Regarding loss functions for bounding-box regression: the gradient vanishes when the two bounding boxes do not overlap; GIOU Loss solves this non-overlap problem by introducing a penalty term into IOU Loss; DIOU Loss further accelerates convergence by taking the distance between the two boxes as the penalty term; and CIOU Loss jointly considers the overlap area, center distance, and aspect ratio of the boxes, further improving convergence speed and detection performance. The invention improves the deep-learning YOLO V5 detection network, which adopts CIOU, so the selected loss function is L_{α-CIOU}:

L_{α-CIOU} = 1 - IOU^α + ρ^{2α}(b, b^{gt}) / c^{2α} + (βv)^α

In the formula, α is the hyperparameter adjusting the regression accuracy of bounding boxes at different levels, IOU^α is the α-th power of the IOU, B is the predicted bounding box, B^{gt} is the ground-truth bounding box, C is the smallest box enclosing both B and B^{gt}, b and b^{gt} are the center points of B and B^{gt}, ρ(·) is the Euclidean distance, c is the diagonal length of C, and v and β are given by

v = (4/π²) (arctan(w^{gt}/h^{gt}) - arctan(w/h))²,  β = v / ((1 - IOU) + v)

where w^{gt}, h^{gt} are the width and height of the ground-truth bounding box and w, h the width and height of the predicted bounding box.
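Under the formula above, a plain-Python sketch of the α-CIOU loss for axis-aligned boxes (x1, y1, x2, y2) might look as follows; the small `eps` guarding the divisions is an implementation assumption, not part of the patent.

```python
import math

def alpha_ciou_loss(pred, gt, alpha=3.0, eps=1e-9):
    """L_{a-CIOU} = 1 - IOU^a + rho^{2a}(b, b_gt)/c^{2a} + (beta*v)^a
    for boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection over union of the two boxes
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / (union + eps)
    # rho^2: squared distance between center points; c^2: squared diagonal of C
    rho2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps
    # v penalises aspect-ratio mismatch; beta trades it off against the IOU term
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    beta = v / ((1 - iou) + v + eps)
    return 1 - iou ** alpha + (rho2 / c2) ** alpha + (beta * v) ** alpha
```

A perfectly matching prediction drives all three penalty terms to zero, while disjoint boxes keep a loss above 1, so the gradient does not vanish on non-overlapping samples.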
3) The IOU threshold is set to 0.5 and the confidence threshold to 0.8 to obtain more accurate prediction results.
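Applying the two thresholds can be sketched as confidence filtering followed by non-maximum suppression; the detection tuple format (box, confidence) is an assumption for illustration, not the network's actual output format.

```python
def iou(a, b):
    """IOU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def filter_detections(dets, conf_thresh=0.8, iou_thresh=0.5):
    """Keep boxes with confidence >= 0.8, then suppress any box whose IOU
    with an already-kept, higher-confidence box exceeds 0.5."""
    dets = sorted((d for d in dets if d[1] >= conf_thresh),
                  key=lambda d: d[1], reverse=True)
    kept = []
    for box, conf in dets:
        if all(iou(box, k[0]) <= iou_thresh for k in kept):
            kept.append((box, conf))
    return kept

dets = [((0, 0, 10, 4), 0.95), ((1, 0, 11, 4), 0.85), ((20, 0, 30, 4), 0.9),
        ((0, 0, 10, 4), 0.5)]  # the last one fails the confidence threshold
plates = filter_detections(dets)
```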
Finally, the training set is input into the improved convolutional neural network for training.
Step3: and inputting the unmanned aerial vehicle monitoring video into a convolutional neural network model by utilizing a license plate target detection model to realize the positioning of the license plate, obtaining a license plate detection result and marking on an image. As shown in fig. 2, the position information and the tag information of the license plate region are stored.
Step4: the data set prepared in Step1 is used as a data set required by a license plate recognition network to be pre-trained, super parameters in the network are adjusted to be suitable for the license plate data set in an unmanned aerial vehicle scene, a network model of the license plate recognition based on deep learning is obtained, and a trained weight file is stored.
The license plate recognition network is trained end-to-end, with no need to segment characters in advance: the raw image of the license plate region is the input to the convolutional neural network. A lightweight convolutional backbone is adopted, and the network is fitted with a CTC loss. To further improve performance, a global context block is added to the intermediate feature map before the decoder: the required character-sequence length is computed from the backbone output through a fully connected layer, the final output is adjusted accordingly, and the result is concatenated with the backbone output. A 1 × 1 convolution matches the depth of the feature map to the number of character classes. In the final character-sequence decoding stage, character inference is completed using greedy search and prefix search.
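The greedy-search decoding mentioned above follows the standard CTC rule: take the argmax class per time step, collapse consecutive repeats, and drop blanks. A minimal sketch, where the character set is a toy stand-in rather than the patent's actual alphabet:

```python
import numpy as np

def ctc_greedy_decode(logits, charset, blank=0):
    """Greedy CTC decoding: argmax per step, collapse repeats, drop blanks.
    `logits` is a (T, num_classes) array; `charset` maps class index -> char."""
    best = np.argmax(logits, axis=1)
    out, prev = [], blank
    for k in best:
        if k != blank and k != prev:   # new non-blank label starts a character
            out.append(charset[k])
        prev = k
    return "".join(out)
```

For example, the per-step label path [A, A, blank, B, B, 8] collapses to the string "AB8".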
Step5: and (3) carrying out character recognition on the license plate region positioned in Step3 by using the license plate recognition model obtained in Step4, and outputting a license plate character sequence.
Step6: The finally located license plates and their specific information are displayed in the road traffic video data, and violating vehicles are monitored and recorded in real time.
Based on this license plate recognition method, the invention provides a license plate recognition algorithm for real-time unmanned aerial vehicle surveillance video scenes, and the final performance meets both real-time and accuracy requirements.
The license plate recognition method based on real-time unmanned aerial vehicle surveillance video can accurately recognize, in real time, a vehicle's unique identification information, namely its license plate information, providing more precise information for traffic supervision. First, license plates are annotated in the collected real-time unmanned aerial vehicle surveillance video, and the annotated data are used to train a license plate target detection convolutional neural network; to suit the unmanned aerial vehicle scene, the feature pyramid network and the loss function are improved and optimized, finally yielding a license plate target detection model. Then, with the license plate label information as ground truth, the network parameters are adjusted and a license plate recognition convolutional neural network is trained on the annotated data, finally yielding a license plate recognition model. Finally, the two algorithms are cascaded, and license plate recognition is realized with the two network models.
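The cascade of the two network models amounts to simple glue code: the detection model yields plate boxes, each ROI is cropped from the frame, and the recognition model maps the ROI to a character string. The sketch below uses stand-in models and hypothetical names; the patent does not publish this code:

```python
import numpy as np

def recognize_plates(frame, detect_model, recognize_model):
    """Two-stage cascade sketch.

    `frame` is an H x W (x C) numpy array; `detect_model` returns a list
    of (x1, y1, x2, y2) plate boxes; `recognize_model` maps a cropped
    ROI to a character string. Both models are illustrative stand-ins."""
    results = []
    for (x1, y1, x2, y2) in detect_model(frame):
        roi = frame[y1:y2, x1:x2]  # crop the license-plate ROI
        results.append(((x1, y1, x2, y2), recognize_model(roi)))
    return results
```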
The two algorithms are superior to other networks in detection accuracy and network architecture, and the network improvements make the method better suited to unmanned aerial vehicle scenes, so license plate character information is recognized accurately in real time. The license plate recognition method based on real-time unmanned aerial vehicle surveillance video can realize license plate recognition across different feature scales and different complex scenes, and the specific information of each license plate can be displayed accurately in the video in real time.

Claims (6)

1. A license plate recognition method based on unmanned aerial vehicle real-time surveillance video, characterized in that a license plate detection network model and a license plate recognition network model are obtained by training deep-learning convolutional neural networks, and the two network models are cascaded to realize license plate recognition on the unmanned aerial vehicle real-time surveillance video; first, a license plate detection training set is built from the unmanned aerial vehicle surveillance video, the license plate detection network model is trained, the position and label information of the license plate regions obtained in training are saved, and the same data set used for the license plate detection network model serves as the license plate recognition training set to train the license plate recognition network model; during real-time recognition of the surveillance video, the license plate detection network model detects a rectangular license plate region, the license plate recognition network model identifies the license plate region to obtain the specific license plate information, the rectangular license plate region and the specific license plate information are cascaded to obtain the license plate recognition result, and the final license plate region and specific license plate information are displayed in the road traffic video data.
2. The license plate recognition method based on unmanned aerial vehicle real-time surveillance video according to claim 1, wherein the license plate detection network model is realized with a convolutional neural network, specifically:
1) Road traffic video data collected by an unmanned aerial vehicle is acquired and divided into the training set and test set required to train the network model;
2) License plates in the road traffic video data are annotated, license plate target pre-training is performed with the annotated data, and the detection network is optimized for the multi-scale characteristics of license plates in the road traffic video data collected by the unmanned aerial vehicle:
2.1) Improving the feature pyramid network: an adaptive attention module and a feature enhancement module are added to the feature pyramid network FPN; the adaptive attention module generates a feature map containing multi-scale context information by fusing feature maps of different scales; the feature enhancement module, acting on the high-level semantic features of the FPN, learns different receptive fields with different dilation coefficients and fuses the semantic information of the different receptive fields through a multi-branch pooling layer, obtaining feature maps with the same kernel but different receptive fields;
2.2) Optimizing the loss function to improve bounding-box regression accuracy: α-IoU Loss is used as the bounding-box loss, introducing the hyperparameter α into CIoU; the loss function is as follows:
L_α-CIoU = 1 − IoU^α + ρ^{2α}(b, b^gt) / c^{2α} + (βv)^α
The hyperparameter α adjusts the regression accuracy of bounding boxes at different levels, IoU^α denotes the α-th power of the IoU, B denotes the prediction bounding box, B^gt the ground-truth bounding box, C the smallest box enclosing B and B^gt, b the center point of B, b^gt the center point of B^gt, ρ(·) the Euclidean distance, and c the diagonal length of C; v and β are as follows,
v = (4/π²) (arctan(w^gt/h^gt) − arctan(w/h))²,  β = v / ((1 − IoU) + v)
where w^gt, h^gt denote the width and height of the ground-truth bounding box, and w, h the width and height of the prediction bounding box;
2.3) The IoU threshold is set to 0.5 and the confidence threshold to 0.8, and network training is performed;
3) The license plate detection network model outputs the license plate locations, the license plate ROI images are obtained, and the results are marked on the video image.
3. The license plate recognition method based on unmanned aerial vehicle real-time surveillance video according to claim 2, wherein the feature maps output by the feature pyramid network FPN are {c1, c2, c3, c4, c5}; the adaptive attention module operates on feature map c5 to obtain feature maps of different scales, which are then added back to c5 to generate a context-aggregated feature map; the feature enhancement module operates on the c3, c4 and c5 feature maps, enhancing the representation capability of the feature pyramid with respect to the high-level semantic information of the image, specifically:
Taking c5 as the input of the adaptive attention module, its operation comprises:
1) Three features of different scales are obtained from c5 through the adaptive pooling layer;
2) The context features of different scales are transformed by upsampling into features of the same size as c5, and a spatial attention mechanism fuses the three feature channels via a concat operation;
3) The fused feature maps of the three channels pass sequentially through a 1 × 1 convolution layer, a ReLU activation layer, a 3 × 3 convolution layer and a sigmoid activation layer to obtain the spatial weights of each feature map; the generated weight maps are multiplied element-wise with the channel-merged feature maps, the results are separated by channel and added to the c5 feature map, yielding the context feature aggregation output;
Meanwhile, the feature enhancement module learns different receptive fields for c3, c4 and c5 of the FPN by using different dilation coefficients in a multi-branch convolution layer, and fuses the semantic information of the different branches through a multi-branch pooling layer as the feature output of that layer; the multi-branch convolution layer comprises convolution layers with different dilation coefficients, a batch normalization layer and a ReLU activation layer.
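The idea behind the multi-branch convolution with different dilation coefficients can be illustrated in one dimension: the same kernel sampled at different dilations covers different receptive fields, and the branches are then fused. This is an illustrative toy, not the patent's actual module; averaging stands in for the multi-branch pooling layer:

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Toy 1-D dilated convolution: the same 3-tap kernel at a larger
    dilation coefficient samples a wider receptive field."""
    pad = dilation * (len(kernel) - 1) // 2
    xp = np.pad(x, pad)  # zero-pad so output length equals input length
    return np.array([sum(kernel[j] * xp[i + j * dilation]
                         for j in range(len(kernel)))
                     for i in range(len(x))])

def feature_enhance(x, dilations=(1, 2, 4)):
    """Run the shared kernel at several dilations and fuse the branches;
    the dilation values and the averaging fusion are assumptions."""
    k = np.array([1 / 3, 1 / 3, 1 / 3])  # shared kernel across branches
    branches = [dilated_conv1d(x, k, d) for d in dilations]
    return np.mean(branches, axis=0)
```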
4. The license plate recognition method based on unmanned aerial vehicle real-time surveillance video according to claim 2, wherein in step 1), the unmanned aerial vehicle performs aerial acquisition to obtain non-fixed-camera surveillance video of the road section to be detected, with vehicle license plates visible in the monitored frames; the data set is segmented at 30 frames/s and the license plate image frames are screened; data augmentation is applied to the data set, the augmentation methods including rotation, horizontal translation, superposition, flipping and multi-mode combination, in which 4 pictures are each shrunk to one quarter and then spliced into one whole picture; the data set is annotated, the annotated data set is divided into a training set and a test set, and the license plate detection network model is trained.
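The four-picture splicing described in this claim (each image shrunk to a quarter of the area, then tiled 2 × 2) can be sketched with numpy; nearest-neighbour subsampling is an assumption made for simplicity:

```python
import numpy as np

def mosaic(imgs):
    """Shrink four equally sized images to half width and half height
    (one quarter of the area) and splice them into one picture."""
    assert len(imgs) == 4
    half = lambda im: im[::2, ::2]  # crude nearest-neighbour 2x downscale
    top = np.concatenate([half(imgs[0]), half(imgs[1])], axis=1)
    bot = np.concatenate([half(imgs[2]), half(imgs[3])], axis=1)
    return np.concatenate([top, bot], axis=0)
```

The spliced picture has the same size as each input image, so one training sample now exposes the detector to four scenes at once, each at a smaller scale.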
5. The license plate recognition method based on unmanned aerial vehicle real-time surveillance video according to claim 1 or 2, wherein the license plate recognition network model is realized with a convolutional neural network and pre-trained on the same data set as the license plate detection network model; the network hyperparameters are adjusted to obtain a deep-learning-based license plate recognition network model, and character recognition is performed on the license plate ROI images located by the license plate detection network model to obtain the specific license plate information.
6. A license plate recognition device based on unmanned aerial vehicle real-time surveillance video, characterized in that the device comprises a computer-readable storage medium in which a computer program is configured for realizing the license plate detection network model and the license plate recognition network model according to any one of claims 1-5; when executed, the computer program realizes the license plate recognition method based on unmanned aerial vehicle real-time surveillance video according to claims 1-5.
CN202210453349.5A 2022-04-27 2022-04-27 License plate recognition method and device based on unmanned aerial vehicle real-time monitoring video Pending CN117011722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210453349.5A CN117011722A (en) 2022-04-27 2022-04-27 License plate recognition method and device based on unmanned aerial vehicle real-time monitoring video


Publications (1)

Publication Number Publication Date
CN117011722A true CN117011722A (en) 2023-11-07

Family

ID=88569508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210453349.5A Pending CN117011722A (en) 2022-04-27 2022-04-27 License plate recognition method and device based on unmanned aerial vehicle real-time monitoring video

Country Status (1)

Country Link
CN (1) CN117011722A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117270545A (en) * 2023-11-21 2023-12-22 合肥工业大学 Convolutional neural network-based substation wheel type inspection robot and method
CN117270545B (en) * 2023-11-21 2024-03-29 合肥工业大学 Convolutional neural network-based substation wheel type inspection robot and method


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination