CN116612398A - Unmanned aerial vehicle inspection photo pole tower license plate character recognition method based on CTPN algorithm - Google Patents

Unmanned aerial vehicle inspection photo pole tower license plate character recognition method based on CTPN algorithm

Info

Publication number
CN116612398A
CN116612398A (application CN202211597530.XA)
Authority
CN
China
Prior art keywords
tower
aerial vehicle
unmanned aerial
ctpn
number plate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211597530.XA
Other languages
Chinese (zh)
Inventor
林龙旭
郑恩辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Jiliang University
Original Assignee
China Jiliang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Jiliang University filed Critical China Jiliang University
Priority to CN202211597530.XA priority Critical patent/CN116612398A/en
Publication of CN116612398A publication Critical patent/CN116612398A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/1444Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/15Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The application discloses a method for recognising pole tower number plate characters based on the CTPN algorithm. The method comprises: photographing the tower number plates of a plurality of transmission lines with an unmanned aerial vehicle; annotating the text regions of all number plates for subsequent training; training an improved CTPN algorithm on the annotated data set to obtain text position information; training a CRNN network on a public data set for text content recognition; and screening and completing the text information recognised from the number plate pictures of the line to be processed. By improving and optimising the CTPN network, the application determines the position of the pole tower number plate characters in the inspection photo and recognises the character content with the CRNN network, thereby realising recognition of small-target number plate information.

Description

Unmanned aerial vehicle inspection photo pole tower license plate character recognition method based on CTPN algorithm
Technical Field
The application relates to a character recognition method for pole tower number plates in unmanned aerial vehicle inspection photos, in the field of image processing, and in particular to a character recognition method for pole tower number plates in unmanned aerial vehicle inspection photos based on the CTPN algorithm.
Background
Against the background of the increasingly wide application of intelligent unmanned-aerial-vehicle inspection in the operation and maintenance of power transmission lines, no platform or software in China's power-grid system is currently based on artificial-intelligence theory and technology and able to realise automatic classification, naming and quality analysis of inspection photos taken without a specific flight track. Unmanned-aerial-vehicle inspection markedly improves the efficiency of on-site transmission-line inspection, but refined inspection involves enormous photo collection, classification and naming work. In existing inspection operations, after inspection personnel acquire image data through the unmanned aerial vehicle, each inspection photo is still manually associated with the tower it belongs to and its shooting position. Classifying and naming a large number of inspection photos in this way requires a great deal of effort and time, and the accuracy of the result cannot be guaranteed.
In traditional scene-text detection, characters were first recognised through texture, similar to pattern matching; but this enumerates pixels, the amount of computation is too large, and recognising specific cases through preset templates generalises poorly. Second, text detection methods based on connected components, with the stroke width transform (SWT) as an example, can find the region where characters are located more quickly, but factors such as affine transformation of the picture and broken character strokes in the scene leave these methods with low robustness. Later methods combined the two: the region where the text is located is found through connected components, and texture is then used to assist recognition, but only horizontally arranged text can be detected. Existing scene character recognition learns character information through deep learning.
The application needs to recognise scene text in high-pixel photos taken by an unmanned aerial vehicle, and the number plate in the photo is small, so the task can be regarded as small-target detection. Feeding the high-pixel picture directly into the network makes the model parameter count and the hardware requirements excessive, and most deep-learning models resize the input picture to a fixed size; but that size does not meet the requirements of small-target detection in high-pixel pictures, because too many pixels are lost after resizing and the small-target features are no longer distinct. The present study adapts an existing scene-text recognition model to small-target detection in high-pixel pictures.
Disclosure of Invention
In order to achieve the above purpose, the application provides a CTPN-based method for recognising pole tower number plate characters in unmanned aerial vehicle inspection photos: the CTPN algorithm is improved to locate characters in the inspection photo, the located characters are input into a CRNN algorithm to recognise the character content, so as to obtain the line name and the tower number of each tower, and finally the missing data are completed through analysis of the tower numbers.
The technical scheme adopted by the application is as follows:
step 1: taking inspection photographs of the pole tower license plate by using the unmanned aerial vehicle at fixed points;
step 2: labeling the acquired pole tower number plate data set, and constructing a pole tower number plate data set;
step 3: training an improved CTPN network to obtain text position information;
step 4: training a convolutional recurrent neural network to achieve identification of text content;
step 5: and carrying out character recognition on the number plate of the line to be processed.
Step 2 mainly comprises screening the inspection photos, removing pictures in which numbers or characters are missing or which are severely over- or under-exposed, then annotating the screened photos with the open-source tool PPOCRLabel: the characters are marked with a rectangular box, and the coordinates of the four corners of the box and the Chinese text of the tower number plate are output. A script then converts the output into the ICDAR format (from the International Conference on Document Analysis and Recognition) for subsequent training; the format mainly comprises the coordinates of the four points, the language of the characters and the text content.
The training of the improved CTPN network in step 3 to obtain the text location information is specifically:
cutting the inspection photo by adopting a sliding window with the size of 684 multiplied by 384, moving to the next position at the overlapping rate of 180 pixel points during transverse movement, moving to the next position at 180 pixel points during longitudinal movement, finally cutting out a plurality of pictures with the size of 684 multiplied by 382, storing the left upper corner coordinate position when moving to the point as a naming during cutting, simultaneously calculating whether the coordinates of each point of the stored rectangular frame are inside the real frame and whether the real frame is inside the cut picture, inputting the picture with the real frame in the cut picture and the picture with a part of the real frame as negative samples into a network, and inputting the picture with the real frame in the cut picture as positive samples into network learning.
The improved CTPN network inputs the preprocessed picture into a VGG16 network to extract image features; a 3×3 sliding window is convolved over the feature map to obtain feature vectors, which are input into a bidirectional long short-term memory (LSTM) network to learn the features of each row; finally the output feature values pass through a fully connected layer. When boxes are predicted, an anchor mechanism is introduced: the anchor width is fixed at 16, and ten heights are preset, namely 11, 16, 22, 32, 46, 65, 93, 134, 191 and 273, so as to cover texts of different heights; 2k vertical coordinates, 2k prediction scores and k side-refinement parameters are output.
The vertical coordinates consist of two parts, the y-coordinate of the box centre and the height of the rectangular box, and are regressed relative to the anchor:

v_c = (c_y - c_y^a) / h^a,  v_h = ln(h / h^a)

where c_y^a and h^a are respectively the y-coordinate centre and height of the anchor, and c_y and h are respectively the predicted y-coordinate centre and height. The k side-refinement parameters are used to refine the two end points of a text line and represent the horizontal offset of each proposal box:

o = (x_side - c_x^a) / w^a

where x_side is the predicted x-coordinate of the side nearest to the anchor's horizontal centre, c_x^a is the x-coordinate centre of the anchor, and w^a is the anchor width.
The cropped pictures are stitched back into the complete original picture through the previously stored upper-left-corner coordinates, and the predicted boxes are integrated in the original picture: candidate anchors whose horizontal distance to the current box is less than 50 are searched in the forward direction, anchors whose overlap with the current box is greater than 0.5 are retained, and the candidate with the largest Softmax score is selected; the search is then repeated in the reverse direction by the same method, and finally the candidate boxes meeting the requirements are joined into one large predicted box as the final output result.
The training results are evaluated by a loss function composed of three parts, corresponding to the three output branches. The first part is a logistic-regression (classification) loss supervising whether an anchor contains text, the second part is the regression of the bounding box, and the third part is the regression of the side-refinement offset for anchor boxes containing text.
Step 4 specifically comprises: first, the feature map of the input image is extracted from the pole tower number plate data set through the convolutional layers of a convolutional recurrent neural network; the convolutional part contains seven convolutional layers in total, four max-pooling layers are used when extracting features, and two normalisation operations are used to accelerate model convergence and shorten the training process.
The feature map is then converted into feature vectors column by column from left to right, each column containing 512-dimensional features; that is, the i-th feature vector is the concatenation of the i-th column of pixels of the feature map, and these vectors form a feature-vector sequence. The sequence is input into a recurrent neural network, a bidirectional LSTM, which continues to extract character-sequence features on top of the convolutional features and outputs predicted labels. Finally, the series of label distributions obtained from the recurrent layer is converted into the final label sequence by the connectionist temporal classification (CTC) loss.
Step 5 specifically comprises: reading the pole tower number plate photos of the whole line to be processed, organised by folder; recognising each number plate and matching it to obtain the line name and the tower number; storing the line names in an array in order and taking the name with the largest number of occurrences as the line name; then sorting the tower numbers and finding the values missing from the sequence so as to supplement the tower numbers.
In summary, in the CTPN-based method for recognising pole tower number plate characters in unmanned aerial vehicle inspection photos provided by the embodiment of the application, the improved and optimised CTPN algorithm is adapted to the high-pixel pictures taken by the unmanned aerial vehicle so as to obtain the position of the pole tower number plate; the picture at the obtained position is input into a CRNN + CTC (connectionist temporal classification) network to read the content of the number plate photo, and finally the recognised result is completed and corrected using prior knowledge.
The beneficial effects of the application are as follows:
in the prior art, for the step that the size of an input picture is adjusted, the size of the picture is adjusted to 448 multiplied by 448, more pixels are lost in the adjusted picture, characters in the picture of the pole and tower number plate picture are small targets, and pixels of an original image are larger, so that the content of the characters is difficult to detect due to the loss of the pixels after the size of the picture is adjusted, the model is larger due to the fact that the original picture is used for training, and the requirement on hardware video memory is too high in the training and testing process.
The application performs overlapping cropping of the original picture without discarding pixels, so that the integrity of the picture content is largely preserved during cropping and the text content and position of the small target can be detected; at the same time, the resulting model is smaller than a model trained on the full original picture, which is an advantage.
Drawings
FIG. 1 is a schematic flow chart of the present application;
FIG. 2 is a flowchart of an improved CTPN algorithm of the present application.
FIG. 3 is a schematic diagram of recognition according to an embodiment of the present application.
Detailed Description
The application will be described in further detail with reference to the accompanying drawings and specific examples.
As shown in fig. 1, the method mainly comprises the following steps:
step 1: taking pole tower license plates to patrol photos by using the unmanned aerial vehicle at fixed points;
step 2: labeling the acquired pole number plate data set to construct a pole number plate data set;
step 3: training an improved CTPN network to obtain text position information;
step 4: training the CRNN network to achieve identification of text content;
step 5: performing character recognition on the number plate of the line to be tested;
further, in the step 1, the unmanned aerial vehicle is utilized to shoot the inspection photographs of all parts of the pole tower at fixed points, and the specific method for constructing the low-quality photograph data set is as follows:
and shooting the tower number plate of the whole line through a camera of the unmanned aerial vehicle under a good environment, taking out the sd memory card in the unmanned aerial vehicle, and extracting the shot tower photo.
In step 2, the acquired pole tower number plate data are annotated to construct the pole tower number plate data set; the specific method is as follows:
screening inspection photographs, screening out numbers and characters lost in the inspection photographs and pictures with serious or underexposed shooting, marking the screened inspection photographs by using an open source method PPOCRLabael, marking coordinates of four points and Chinese meanings of the four points, changing the output results into a format of ICDAR2017 through scripts for a subsequent training model, wherein the format is composed of the coordinates of the four points, language used by characters and text content.
As shown in fig. 2, in step 3, an improved CTPN network is trained to obtain the position information of the characters in the tower number plate picture, and the specific method is as follows:
the original image size is 5472 multiplied by 3078, a sliding window with the size of 684 multiplied by 384 is used for cutting the image, the image moves to the next position at the overlapping rate of 180 pixel points when moving transversely, the image moves to the next position at 180 pixel points when moving longitudinally, a plurality of images with the size of 684 multiplied by 382 are finally cut, the upper left corner position when moving to the point is stored as a naming when cutting, meanwhile, whether the coordinates of each point are inside a real frame or not and whether a real frame exists inside the cut image or not is calculated, the image with no real frame in the cut image and a part of the image with the real frame exist as negative samples are input into a network, and the image with the real frame in the cut image is input into the network as positive samples for learning. The method is a training part, and a prediction part predicts each image.
The improved CTPN network inputs the preprocessed picture into a VGG16 network to extract image features, then convolves a 3×3 sliding window over the feature map to obtain feature vectors, inputs the feature vectors into a bidirectional LSTM network to learn the features of each row, and finally passes the output feature values through a fully connected layer. An anchor mechanism is introduced when boxes are predicted: the anchor width is fixed at 16, and ten heights are preset, namely 11, 16, 22, 32, 46, 65, 93, 134, 191 and 273, to ensure that texts of different heights are covered; 2k vertical coordinates, 2k prediction scores and k side-refinement parameters are output.
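A small sketch of this anchor mechanism, generating the k = 10 fixed-width boxes at one feature-map location (box layout (x1, y1, x2, y2) is a convention chosen for illustration):

```python
ANCHOR_HEIGHTS = [11, 16, 22, 32, 46, 65, 93, 134, 191, 273]

def anchors_at(cx, cy, width=16, heights=ANCHOR_HEIGHTS):
    """Return the ten anchors centred at (cx, cy): all share the fixed
    width of 16 and differ only in the preset heights listed above."""
    return [(cx - width / 2, cy - h / 2, cx + width / 2, cy + h / 2)
            for h in heights]
```

Each location therefore contributes k = 10 candidates, matching the 2k vertical coordinates and 2k scores the network outputs per position.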
The vertical coordinates consist of two parts, the y-coordinate of the box centre and the height of the rectangular box, and are regressed relative to the anchor:

v_c = (c_y - c_y^a) / h^a,  v_h = ln(h / h^a)

where c_y^a and h^a are respectively the y-coordinate centre and height of the anchor, and c_y and h are respectively the predicted y-coordinate centre and height. The k side-refinement parameters are used to refine the two end points of a text line and represent the horizontal offset of each proposal box:

o = (x_side - c_x^a) / w^a

where x_side is the predicted x-coordinate of the side nearest to the anchor's horizontal centre, c_x^a is the x-coordinate centre of the anchor, and w^a is the anchor width.
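As a minimal sketch of the vertical-coordinate and side-refinement regressions described above (plain target computations, not the authors' training code):

```python
import math

def vertical_targets(cy, h, cy_a, h_a):
    """Vertical regression targets relative to an anchor:
    v_c = (c_y - c_y^a) / h^a and v_h = ln(h / h^a)."""
    return (cy - cy_a) / h_a, math.log(h / h_a)

def side_offset(x_side, cx_a, w_a):
    """Side-refinement offset o = (x_side - c_x^a) / w^a used to
    refine the two end points of a text line."""
    return (x_side - cx_a) / w_a
```

An anchor that already matches the ground truth gives v_c = 0 and v_h = 0, so the network only learns corrections.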
The cropped pictures are stitched back into the complete original picture through the previously stored upper-left-corner coordinates, and the anchor boxes are integrated in the original picture: candidate anchors whose horizontal distance to the current box is less than 50 are searched in the forward direction, anchors whose overlap with the current box is greater than 0.5 are retained, and the box with the largest Softmax score is selected; the search is then repeated in the reverse direction by the same method, and finally the boxes meeting the requirements are joined into one large prediction box as the final output result.
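A simplified sketch of this grouping step, pairing fine-scale proposals into text lines by the horizontal-gap and vertical-overlap rules above (greedy left-to-right grouping, a simplification of the bidirectional search; boxes are (x1, y1, x2, y2) and the list is assumed non-empty):

```python
def merge_text_proposals(boxes, max_gap=50, min_v_overlap=0.5):
    """Group sorted proposals into text lines: a proposal joins the
    current line when its horizontal gap to the previous box is below
    max_gap and their vertical overlap ratio exceeds min_v_overlap;
    each line is then reduced to one enclosing prediction box."""
    def v_overlap(a, b):
        inter = min(a[3], b[3]) - max(a[1], b[1])
        return max(inter, 0) / min(a[3] - a[1], b[3] - b[1])

    boxes = sorted(boxes, key=lambda b: b[0])
    lines, cur = [], [boxes[0]]
    for b in boxes[1:]:
        last = cur[-1]
        if b[0] - last[2] < max_gap and v_overlap(last, b) > min_v_overlap:
            cur.append(b)
        else:
            lines.append(cur)
            cur = [b]
    lines.append(cur)
    # the enclosing box of each group is the final large prediction box
    return [(min(b[0] for b in l), min(b[1] for b in l),
             max(b[2] for b in l), max(b[3] for b in l)) for l in lines]
```

Two neighbouring 16-pixel-wide proposals on the same text row are merged into one box, while a distant proposal starts a new line.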
The training results need to be evaluated by a loss function, which consists of three parts, and three branches correspond to the three parts.
The first part is a Softmax loss for supervised learning of whether an anchor contains text, the second part is the regression of the bounding box, and the third part is the side-refinement offset regression for anchor boxes containing text.
Still further, in step 4 the CRNN network is trained to obtain the text content in the tower number plate picture; the specific steps are as follows:
the CRNN network can divide three parts, namely a marked pole tower number plate data set passes through a convolution layer of a first CNN network to extract a characteristic diagram of an input image, then four maximum pooling layers are used for extracting characteristics, and finally two normalization operations are used for accelerating model convergence and shortening a training process.
The feature map is converted into feature vectors column by column from left to right, each column containing 512-dimensional features; that is, the i-th feature vector is the concatenation of the i-th column of pixels of the feature map, and these vectors form a feature-vector sequence. The output vector sequence is input into the recurrent part, a bidirectional LSTM network, which continues to extract character-sequence features on the basis of the convolutional features and outputs predicted labels.
Finally, the series of label distributions obtained from the recurrent layer is converted into the final label sequence by the CTC loss.
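The collapse from per-frame labels to the final sequence can be sketched with the standard greedy CTC decoding rule (merge repeats, then drop blanks); this is an illustration of CTC decoding in general, not the authors' code:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse a per-frame label sequence into the final label
    sequence: consecutive repeats are merged first, then the blank
    label is removed."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```

For example, the frame sequence [blank, 3, 3, blank, 3, 5, 5, blank] decodes to [3, 3, 5]: the blank between the two 3s keeps them as distinct characters.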
In step 5, character recognition is performed on the number plates of the whole line; the specific scheme is as follows: the tower number plate photos taken by the unmanned aerial vehicle, organised by folder, are read; each number plate is recognised and matched to obtain the line name and the tower number; the line names are stored in an array in order, and the name with the largest number of occurrences is taken as the line name; the tower numbers are then sorted, and the values missing from the sequence are found so as to supplement the tower numbers.
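The majority vote over line names and the gap search over tower numbers can be sketched as follows (the (line_name, tower_number) pair structure is an assumption chosen for illustration):

```python
from collections import Counter

def consolidate(recognised):
    """Take the most frequent recognised line name as the line name,
    then sort the tower numbers and report the values missing from
    the consecutive sequence so they can be supplemented."""
    line_name = Counter(n for n, _ in recognised).most_common(1)[0][0]
    numbers = sorted({num for _, num in recognised})
    missing = [n for n in range(numbers[0], numbers[-1] + 1)
               if n not in set(numbers)]
    return line_name, missing
```

With recognised plates [("110kV A", 1), ("110kV A", 2), ("110kV B", 4)], the vote yields "110kV A" and tower 3 is reported as missing.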
In summary, in the CTPN-based method for recognising pole tower number plate characters in unmanned aerial vehicle inspection photos provided by the embodiment of the application, the improved and optimised CTPN algorithm is adapted to the high-pixel pictures taken by the unmanned aerial vehicle so as to obtain the position of the pole tower number plate; the picture at the obtained position is input into the CRNN + CTC network to read the content of the number plate photo, and finally the recognised result is completed and corrected using prior knowledge.
The application solves the problem of character recognition of pole tower number plates in inspection photos. Existing models cause loss and distortion of picture pixels, so the character content and position cannot be recognised; by means of the cropping method, the application locates the character position and recognises the content under high-resolution, small-target conditions.

Claims (5)

1. The unmanned aerial vehicle inspection photo pole tower license plate character recognition method based on the CTPN algorithm is characterized by comprising the following steps:
step 1: taking inspection photographs of the pole tower license plate by using the unmanned aerial vehicle at fixed points;
step 2: labeling the acquired pole tower number plate data set, and constructing a pole tower number plate data set;
step 3: training an improved CTPN network to obtain text position information;
step 4: training a convolutional recurrent neural network to achieve identification of text content;
step 5: and carrying out character recognition on the number plate of the line to be processed.
2. The CTPN-based unmanned aerial vehicle inspection photo tower number plate character recognition method according to claim 1, characterised in that: step 2 mainly comprises annotating the inspection photos with the open-source tool PPOCRLabel, marking the characters with a rectangular box and outputting the coordinates of the four corner points of the box and the Chinese text of the tower number plate; a script then converts the output into the ICDAR format (from the International Conference on Document Analysis and Recognition) for subsequent training, which mainly comprises the coordinates of the four points, the language of the characters and the text content.
3. The CTPN-based unmanned aerial vehicle inspection photo tower number plate character recognition method according to claim 1, characterised in that: the training of the improved CTPN network in step 3 to obtain the text position information is specifically: a sliding window of size 684×384 is used to crop the inspection photo; the window moves to the next position with an overlap of 180 pixels when moving horizontally, and likewise 180 pixels when moving vertically, finally cropping out a number of 684×384 pictures; each crop is named by the coordinates of its upper-left corner at the time of cropping; at the same time it is calculated whether the coordinates of each corner of a stored ground-truth rectangle lie inside the crop and whether a complete ground-truth box lies inside the crop; crops containing no ground-truth box, or only part of one, are input into the network as negative samples, and crops fully containing a ground-truth box are input into the network as positive samples for learning.
4. The CTPN-based unmanned aerial vehicle inspection photo tower number plate character recognition method according to claim 1, characterised in that step 4 is specifically: first, the feature map of the input image is extracted from the pole tower number plate data set through the convolutional layers of a convolutional recurrent neural network; the convolutional part contains seven convolutional layers in total, four max-pooling layers are used when extracting features, and two normalisation operations are used to accelerate model convergence and shorten the training process.
5. The unmanned aerial vehicle inspection photo tower number plate character recognition method based on the CTPN algorithm according to claim 1, characterized in that step 5 specifically comprises: reading the tower number plate photos of the whole line to be processed, grouped by folder; recognizing and matching each number plate to obtain a line name and a tower number; storing the line names into an array in order and taking the name with the highest occurrence count as the line name of the whole line; and sorting the tower numbers to find the missing tower numbers that need to be supplemented.
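The majority-vote and gap-finding post-process described in this claim can be sketched as follows; the function name and the example line name are illustrative, not taken from the patent.

```python
from collections import Counter

def summarize_line(plates):
    """`plates` is a list of (line_name, tower_no) pairs recognized from the
    number-plate photos of one line. The line name is chosen by majority
    vote over the recognized names (tolerating occasional misreads), the
    tower numbers are sorted, and gaps in the numbering are reported so
    the missing towers can be supplemented."""
    line_name = Counter(name for name, _ in plates).most_common(1)[0][0]
    towers = sorted({no for _, no in plates})
    present = set(towers)
    missing = [n for n in range(towers[0], towers[-1] + 1) if n not in present]
    return line_name, towers, missing
```

For example, if one plate out of four is misrecognized and tower 4 was never photographed, the vote still recovers the line name and the gap is flagged:

```python
plates = [("500kV Line A", 1), ("500kV Line A", 2),
          ("500kV Line 4", 3), ("500kV Line A", 5)]
name, towers, missing = summarize_line(plates)
# name == "500kV Line A", towers == [1, 2, 3, 5], missing == [4]
```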
CN202211597530.XA 2022-12-12 2022-12-12 Unmanned aerial vehicle inspection photo pole tower license plate character recognition method based on CTPN algorithm Pending CN116612398A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211597530.XA CN116612398A (en) 2022-12-12 2022-12-12 Unmanned aerial vehicle inspection photo pole tower license plate character recognition method based on CTPN algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211597530.XA CN116612398A (en) 2022-12-12 2022-12-12 Unmanned aerial vehicle inspection photo pole tower license plate character recognition method based on CTPN algorithm

Publications (1)

Publication Number Publication Date
CN116612398A true CN116612398A (en) 2023-08-18

Family

ID=87684149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211597530.XA Pending CN116612398A (en) 2022-12-12 2022-12-12 Unmanned aerial vehicle inspection photo pole tower license plate character recognition method based on CTPN algorithm

Country Status (1)

Country Link
CN (1) CN116612398A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117423117A (en) * 2023-12-18 2024-01-19 南京江北新区生物医药公共服务平台有限公司 Mechanism document identification method based on deep learning technology
CN117423117B (en) * 2023-12-18 2024-05-14 南京江北新区生物医药公共服务平台有限公司 Mechanism document identification method based on deep learning technology

Similar Documents

Publication Publication Date Title
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN112633277A (en) Channel ship board detection, positioning and identification method based on deep learning
CN108154102A (en) A kind of traffic sign recognition method
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN110705412A (en) Video target detection method based on motion history image
CN110781962B (en) Target detection method based on lightweight convolutional neural network
CN110135446B (en) Text detection method and computer storage medium
CN113052170B (en) Small target license plate recognition method under unconstrained scene
CN110267101B (en) Unmanned aerial vehicle aerial video automatic frame extraction method based on rapid three-dimensional jigsaw
Fan et al. Improving robustness of license plates automatic recognition in natural scenes
CN110929795A (en) Method for quickly identifying and positioning welding spot of high-speed wire welding machine
CN110969164A (en) Low-illumination imaging license plate recognition method and device based on deep learning end-to-end
CN112784834A (en) Automatic license plate identification method in natural scene
CN113361467A (en) License plate recognition method based on field adaptation
CN116612398A (en) Unmanned aerial vehicle inspection photo pole tower license plate character recognition method based on CTPN algorithm
CN116189162A (en) Ship plate detection and identification method and device, electronic equipment and storage medium
Xiaomeng et al. Vehicle detection in traffic monitoring scenes based on improved YOLOV5s
Yin Object Detection Based on Deep Learning: A Brief Review
Zhu et al. Human detection under UAV: an improved faster R-CNN approach
CN111881914B (en) License plate character segmentation method and system based on self-learning threshold
Zhang et al. A robust chinese license plate detection and recognition systemin natural scenes
WO2022121021A1 (en) Identity card number detection method and apparatus, and readable storage medium and terminal
CN112053407A (en) Automatic lane line detection method based on AI technology in traffic law enforcement image
Zhao et al. Ocean ship detection and recognition algorithm based on aerial image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination