CN111723789A

CN111723789A - Image text coordinate positioning method based on deep learning

Info

Publication number: CN111723789A
Application number: CN202010101820.5A
Authority: CN
Inventors: 王春宝
Original assignee: Individual
Current assignee: Individual
Priority date: 2020-02-19
Filing date: 2020-02-19
Publication date: 2020-09-29

Abstract

The invention discloses an image text coordinate positioning method based on deep learning, applied to the field of Robotic Process Automation (RPA). The principle is as follows. After an original image is acquired, it is preprocessed; the preprocessing includes an image enhancement algorithm and image scaling. Candidate regions are then extracted from the preprocessed image by a region proposal network (RPN), and features of these candidate regions are extracted by a convolutional neural network (CNN) and a recurrent neural network (RNN), yielding a plurality of candidate text regions and the coordinate points of their candidate boxes. Next, preliminary text recognition is performed on the candidate text regions using a CNN and connectionist temporal classification (CTC), and the final recognition result is obtained after correction by a language model. The invention greatly improves the speed and accuracy of text recognition in images and obtains accurate text coordinate points, thereby promoting the intelligent automation of related industries.

Description

Image text coordinate positioning method based on deep learning

Technical Field

The invention relates to the field of image object detection and recognition in artificial intelligence, in particular to an image text coordinate positioning method based on deep learning, applicable to robotic process automation and related fields.

Background

Traditional optical character recognition (OCR) is mainly aimed at high-quality document images. It assumes that the input image has a clean background, simple fonts and neatly arranged characters, and it achieves high recognition accuracy only when these conditions are met. Characters in natural images have far more complex fonts, colors and layouts than characters in documents; text in advertisements, trademarks and similar material often has a strong artistic style, and its font, size, color, layout and texture can vary drastically. Conventional OCR therefore can no longer meet the current needs of enterprises and industries. In addition, many intelligent applications, such as robotic process automation, depend on obtaining accurate coordinate points of text to complete the whole production and development process.

As enterprises pursue intelligent transformation, the technical complexity caused by growing data consumption remains one of the biggest challenges to be addressed, and Robotic Process Automation (RPA) can reduce this complexity effectively. Combining artificial intelligence with RPA makes good use of the strengths of both to solve practical problems and can play a vital role in exploring new business models.

Disclosure of Invention

The invention discloses an image text coordinate positioning method based on deep learning, applied to the field of robotic process automation. The principle is as follows. After an original image is acquired, it is preprocessed; the preprocessing includes an image enhancement algorithm and image scaling. Candidate regions are extracted from the preprocessed image by an RPN, and features of these regions are extracted by a CNN and an RNN, yielding a plurality of candidate text regions and the coordinate points of their candidate boxes. Preliminary text recognition is then performed on the candidate text regions using a CNN and CTC, and the final recognition result is obtained after correction by a language model. The invention greatly improves the speed and accuracy of text recognition in images and obtains accurate text coordinate points, thereby promoting the intelligent automation of related industries.

The beneficial effects of the invention are as follows. Perception- and judgment-based activities traditionally performed by humans can now be shared with robots, which in most cases perform them much faster. Because artificial intelligence can build a knowledge base from historical data and use it for decision-making and prediction, combining AI techniques with RPA helps overcome the limitations of RPA. By combining RPA with deep learning and relying on text recognition and positioning in images, the system helps enterprises implement intelligent automation solutions, reducing the complexity of enterprise management, greatly improving production efficiency and lowering production costs.

The method combines CNN-based image feature extraction with RNN-based sequence translation and provides a new image text coordinate positioning method that achieves the following two aims:

(1) to achieve end-to-end unconstrained character recognition;

(2) to locate the coordinates of text in the image and support retrieval of character coordinates.

Drawings

FIG. 1 is a processing flow chart of the image text coordinate positioning method based on deep learning according to the present invention.

Detailed Description

The present invention is further described below with reference to FIG. 1 and specific embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and are not to be construed as limiting it. The specific implementation steps are as follows:

1. Acquire an original image and perform initial processing.

2. Text regions obtained in the detection step usually contain noise and other factors that affect recognition, so the image must be preprocessed before character recognition. The preprocessing of the original image mainly consists of image scaling and image enhancement:

2.1 Scale the image to a size suitable for processing;

2.2 Denoise and enhance the image to remove background and noise points, increasing contrast so that the characters stand out.
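The preprocessing in step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a grayscale image stored as a list of pixel rows (0 to 255), uses nearest-neighbour resizing for 2.1, and uses a 3x3 median filter plus linear contrast stretching as stand-ins for the unspecified denoising and enhancement of 2.2. The target size (32, 128) is a hypothetical default.

```python
from statistics import median

def resize_nearest(img, new_h, new_w):
    """Step 2.1: nearest-neighbour scaling of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

def median_denoise(img):
    """Step 2.2 (denoising): 3x3 median filter; borders use available neighbours."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(int(median(window)))
        out.append(row)
    return out

def stretch_contrast(img):
    """Step 2.2 (enhancement): map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]

def preprocess(img, size=(32, 128)):
    """Scale first, then denoise and stretch contrast (size is a hypothetical default)."""
    scaled = resize_nearest(img, *size)
    return stretch_contrast(median_denoise(scaled))
```

A production system would more likely use an image library with interpolation and adaptive enhancement; the order shown (scale, then clean) simply mirrors steps 2.1 and 2.2.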

3. Perform RPN region proposal on the preprocessed image: a fully connected classification layer and a bounding-box regression layer are added on top of a CNN to propose target regions in the image.
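The region proposal of step 3 can be illustrated with the post-processing an RPN typically applies to its two heads. This sketch assumes the common Faster R-CNN conventions (anchors centred on each feature-map cell, deltas parameterised as dx, dy, dw, dh); the anchor sizes, aspect ratios and score threshold are hypothetical, and the scores and deltas would come from the classification and regression layers described above.

```python
import math

def make_anchors(feat_h, feat_w, stride, sizes=(16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate (x1, y1, x2, y2) anchors centred on every feature-map cell."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in sizes:
                for r in ratios:
                    # r is height / width; the anchor area stays s * s
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def decode(anchor, deltas):
    """Apply regression-layer deltas (dx, dy, dw, dh) to one anchor box."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    ncx, ncy = cx + dx * w, cy + dy * h
    nw, nh = w * math.exp(dw), h * math.exp(dh)
    return (ncx - nw / 2, ncy - nh / 2, ncx + nw / 2, ncy + nh / 2)

def propose(anchors, scores, deltas, threshold=0.5):
    """Keep anchors whose classification score passes the threshold and refine them."""
    return [decode(a, d) for a, s, d in zip(anchors, scores, deltas) if s >= threshold]
```

In practice non-maximum suppression would normally follow `propose` to remove overlapping boxes; it is omitted here for brevity.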

4. Extract features of the candidate regions using a CNN and an RNN to obtain a plurality of candidate text regions and candidate box coordinate points:

4.1 Extract features of the region proposals produced by the RPN using the CNN;

4.2 Further process the feature vectors extracted in step 4.1 with an RNN to obtain the regional features of each text row;

4.3 Refine the candidate box coordinates of the candidate text regions using an LSTM (long short-term memory) network to obtain more accurate candidate box coordinates.
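Steps 4.2 and 4.3 can be pictured with a toy model. The sketch below is purely illustrative: it assumes each column of a text row has already been reduced to a single scalar feature by the CNN of step 4.1, runs a scalar LSTM over the row in both directions so every column sees left and right context, and then applies a hypothetical linear head that turns the mean context into offsets for the four box edges. All weights (`DEMO_PARAMS`, `head`) are made-up placeholders; a real network would use vector states and learned parameters.

```python
import math

# Hypothetical toy weights; a trained network would learn these.
DEMO_PARAMS = {name: 0.5 for name in (
    "wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell(x, h, c, p):
    """One scalar LSTM step: input, forget and output gates plus candidate state."""
    i = sigmoid(p["wi"] * x + p["ui"] * h + p["bi"])
    f = sigmoid(p["wf"] * x + p["uf"] * h + p["bf"])
    o = sigmoid(p["wo"] * x + p["uo"] * h + p["bo"])
    g = math.tanh(p["wg"] * x + p["ug"] * h + p["bg"])
    c = f * c + i * g
    return o * math.tanh(c), c

def bidirectional_row(features, p=DEMO_PARAMS):
    """Step 4.2 sketch: read one row of column features in both directions."""
    fwd, h, c = [], 0.0, 0.0
    for x in features:
        h, c = lstm_cell(x, h, c, p)
        fwd.append(h)
    bwd, h, c = [], 0.0, 0.0
    for x in reversed(features):
        h, c = lstm_cell(x, h, c, p)
        bwd.append(h)
    bwd.reverse()
    # Each column now carries (left context, right context).
    return list(zip(fwd, bwd))

def refine_box(box, context, head):
    """Step 4.3 sketch: a hypothetical linear head maps the mean context state
    to offsets for the four edges of an (x1, y1, x2, y2) box."""
    mean = sum(a + b for a, b in context) / (2 * len(context))
    return tuple(edge + w * mean for edge, w in zip(box, head))
```

For example, `refine_box((10, 20, 110, 40), bidirectional_row([0.2, 0.8, 0.5]), (1, 0, 0, 0))` shifts only the left edge, by the mean context value.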

5. Perform preliminary text recognition on the candidate text regions using a CNN and CTC:

5.1 Extract features of each candidate text region using the CNN;

5.2 Recognize the text content from the extracted features through a CTC decoding mechanism. CTC is mainly used for sequence decoding. Because step 4 has already located the text at specific positions in the input image, recognition only needs to run on those regions, which greatly reduces task complexity and increases the speed of recognizing text in the image.
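The CTC decoding of step 5.2 can be sketched with best-path (greedy) decoding, the simplest CTC decoder; the patent does not state which decoding variant it uses, so this is an assumption. At each time step the most probable class is taken, consecutive repeats are merged, and blanks are dropped. The label string and its blank symbol at index 0 are hypothetical.

```python
def ctc_greedy_decode(probs, labels, blank=0):
    """Best-path CTC decoding.

    probs:  one list of class probabilities per time step.
    labels: symbol for each class index; labels[blank] is the blank symbol.
    """
    # Most probable class at each time step.
    best = [max(range(len(step)), key=step.__getitem__) for step in probs]
    chars, prev = [], None
    for k in best:
        # Emit a symbol only when it differs from the previous step and is not blank.
        if k != prev and k != blank:
            chars.append(labels[k])
        prev = k
    return "".join(chars)
```

The blank between two identical symbols is what allows a doubled character such as "aa" to survive the merge step: the frame sequence a, a, blank, a, b, b decodes to "aab". Beam search with a lexicon or language model is a common, more accurate alternative.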

6. Correct the result with a language model to obtain the final recognition result:

6.1 Build a corpus for subsequently training word vectors and language models;

6.2 Feed the text in the corpus into a deep learning neural network to train a text recognition correction model;

6.3 Pass the recognition result of step 5 through the trained language model, which outputs the corrected text information.
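The correction of step 6 uses a trained deep learning model, whose architecture the patent does not specify. As a much simpler stand-in that shows the same data flow (corpus, then model, then corrected output), the sketch below builds a word-frequency model from a corpus and replaces each unknown OCR token with the most frequent corpus word one edit away. It assumes lowercase English, whitespace-separated tokens; Chinese text would need a different tokenizer and candidate generator.

```python
import re
from collections import Counter

def train_unigram(corpus_text):
    """Stand-in for 6.1/6.2: word frequencies counted from a corpus."""
    return Counter(re.findall(r"[a-z]+", corpus_text.lower()))

def edits1(word, letters="abcdefghijklmnopqrstuvwxyz"):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + ch + b[1:] for a, b in splits if b for ch in letters]
    inserts = [a + ch + b for a, b in splits for ch in letters]
    return set(deletes + transposes + replaces + inserts)

def correct_word(word, model):
    """Keep known words; otherwise pick the most frequent known neighbour."""
    if word in model:
        return word
    candidates = [w for w in edits1(word) if w in model]
    return max(candidates, key=model.__getitem__) if candidates else word

def correct_text(text, model):
    """Stand-in for 6.3: correct each token of the step-5 recognition result."""
    return " ".join(correct_word(t, model) for t in text.lower().split())
```

Usage: with `model = train_unigram("invoice total amount invoice date")`, the OCR output `"invoce totl"` is corrected to `"invoice total"`.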

7. Retrieve a text to obtain the candidate box coordinates and the center coordinate point of the corresponding text:

7.1 From the coordinates of each text candidate box obtained in step 4, derive the coordinates of the center point of the text segment and the coordinates of each character;

7.2 Match the query text against the text recognized in the image, and return the text with the highest matching degree together with its coordinate point.
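Step 7 can be sketched as follows, assuming boxes in (x1, y1, x2, y2) pixel form. Per-character boxes are obtained by dividing the line box evenly, which is only an approximation that holds for roughly monospaced text; a real system could instead use the CTC frame alignment. Python's `difflib.SequenceMatcher.ratio()` is used as an assumed stand-in for the unspecified "matching degree".

```python
from difflib import SequenceMatcher

def center(box):
    """Step 7.1: centre point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def char_boxes(text, box):
    """Step 7.1: split a line box into equal-width character boxes (assumes even spacing)."""
    if not text:
        return []
    x1, y1, x2, y2 = box
    step = (x2 - x1) / len(text)
    return [(x1 + k * step, y1, x1 + (k + 1) * step, y2) for k in range(len(text))]

def locate(query, results):
    """Step 7.2: return (text, box, centre) of the best match for the query.

    results: list of (recognized_text, box) pairs produced by steps 4 to 6.
    """
    best_text, best_box = max(
        results, key=lambda r: SequenceMatcher(None, query, r[0]).ratio())
    return best_text, best_box, center(best_box)
```

An RPA tool could, for example, click the centre point returned by `locate("Submit", results)`; fuzzy matching lets the lookup tolerate small recognition errors in the query or the OCR output.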

8. Robotic process automation software then completes its work by calling the interface of the text recognition tool.

The above embodiments are only specific examples of the present invention. The protection scope of the present invention includes but is not limited to these embodiments; any suitable changes or substitutions made by a person of ordinary skill in the art in accordance with the claims of the deep learning based image text recognition method of the present invention fall within the protection scope of the present invention.

Claims

1. An image text coordinate positioning method based on deep learning, characterized by mainly comprising the following steps:

S1: acquiring an original image;

S2: preprocessing the original image, wherein, because the character regions obtained in the detection step usually contain noise and other factors that affect recognition, the image needs to be preprocessed before character recognition, and the preprocessing mainly comprises image scaling and image enhancement;

S3: performing RPN region proposal on the preprocessed image, wherein a fully connected classification layer and a bounding-box regression layer are added on top of a CNN to propose target regions in the image;

S4: performing feature extraction on the candidate regions using a CNN and an RNN to obtain a plurality of candidate text regions;

S5: performing preliminary text recognition on the candidate text regions using a CNN and CTC;

S6: correcting the result with a language model to obtain the final recognition result; and

S7: retrieving a text to obtain the candidate box coordinates and the center coordinate point of the corresponding text.

2. The image text coordinate positioning method based on deep learning according to claim 1, wherein the preprocessing in S2 specifically comprises: S2.1: scaling the image to a size suitable for processing; S2.2: denoising and enhancing the image.

3. The image text coordinate positioning method based on deep learning according to claim 1, wherein obtaining the plurality of candidate text regions in S4 comprises: S4.1: extracting features of the region proposals produced by the RPN using a CNN; S4.2: further processing the feature vectors extracted in S4.1 with an RNN to obtain the regional features of each row; S4.3: refining the candidate box coordinates of the candidate text regions using an LSTM (long short-term memory) network.

4. The image text coordinate positioning method based on deep learning according to claim 1, wherein the preliminary text recognition of the candidate text regions in S5 comprises: S5.1: extracting features of each candidate text region using a CNN; S5.2: recognizing the text content from the extracted features through a CTC decoding mechanism.

5. The image text coordinate positioning method based on deep learning according to claim 1, wherein the text recognition correction in S6 comprises: S6.1: building a corpus for subsequently training word vectors and language models; S6.2: feeding the text in the corpus into a deep learning neural network to train a text recognition correction model; S6.3: outputting the corrected text information.

6. The image text coordinate positioning method based on deep learning according to claim 1, wherein step S7 comprises: S7.1: deriving, from the coordinates of each text candidate box obtained in S4, the coordinates of the center point of the text segment and the coordinates of each character; S7.2: matching the query text against the text recognized in the image, and returning the text with the highest matching degree together with its coordinate point.

7. The image text coordinate positioning method based on deep learning according to claim 1, wherein robotic process automation software completes its work by calling an interface of a text recognition tool.