CN111723789A - Image text coordinate positioning method based on deep learning - Google Patents

Info

Publication number
CN111723789A
Authority
CN
China
Prior art keywords
text
image
neural network
candidate
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010101820.5A
Other languages
Chinese (zh)
Inventor
王春宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202010101820.5A
Publication of CN111723789A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning-based method for locating the coordinates of text in an image, applied to fields related to Robotic Process Automation (RPA). The principle is as follows. After the original image is acquired, it is preprocessed; the preprocessing comprises image enhancement and image scaling. Candidate regions are extracted from the preprocessed image by a region proposal network (RPN), and features of those candidate regions are extracted by a convolutional neural network (CNN) and a recurrent neural network (RNN), yielding a plurality of candidate text regions and the coordinate points of their candidate boxes. Preliminary text recognition is then performed on the candidate text regions using a CNN with CTC (connectionist temporal classification) decoding, and the final recognition result is obtained after correction by a language model. The invention greatly improves the speed and accuracy of text recognition in images and can obtain the precise text coordinate points that applications require, thereby promoting intelligent automation in related industries.

Description

Image text coordinate positioning method based on deep learning
Technical Field
The invention relates to the field of image object detection and recognition in artificial intelligence, and in particular to a deep-learning-based method for locating the coordinates of text in an image, applied to robotic process automation and related fields.
Background
Traditional optical character recognition (OCR) is mainly oriented to high-quality document images. It assumes that the input image has a clean background, simple fonts, and neatly arranged characters, and it achieves a high recognition level when these conditions are met. Text in natural images, by contrast, has far more complex fonts, colors, and arrangements than text in documents. Text in advertisements, trademarks, and similar material often has a strong artistic style, and its font, size, color, layout, and texture can vary drastically. Conventional OCR therefore can no longer meet the current needs of enterprises and industries. In addition, many intelligent applications, such as robotic process automation, need accurate text coordinate points in order to complete their entire production and development workflow.
As enterprises pursue their vision of intelligent transformation, the technical complexity that comes with growing data consumption remains one of the biggest challenges to be addressed, and Robotic Process Automation (RPA) can reduce that complexity considerably. Combining artificial intelligence with RPA makes good use of the strengths of both to solve practical problems and can play a vital role in exploring new business models.
Disclosure of Invention
The invention discloses a deep-learning-based method for locating the coordinates of text in an image, applied to fields related to robotic process automation. The principle is as follows. After the original image is acquired, it is preprocessed; the preprocessing comprises image enhancement and image scaling. Candidate regions are extracted from the preprocessed image by a region proposal network (RPN), and features of those candidate regions are extracted by a convolutional neural network (CNN) and a recurrent neural network (RNN), yielding a plurality of candidate text regions and the coordinate points of their candidate boxes. Preliminary text recognition is then performed on the candidate text regions using a CNN with CTC (connectionist temporal classification) decoding, and the final recognition result is obtained after correction by a language model. The invention greatly improves the speed and accuracy of text recognition in images and can obtain the precise text coordinate points that applications require, thereby promoting intelligent automation in related industries.
The beneficial effects of the invention are as follows. Activities based on perception and judgment that were traditionally performed by humans can now be shared with robots and, in most cases, completed very quickly by them. Because artificial intelligence can build a knowledge base from historical data and use it for behavioral decision-making and prediction, combining AI techniques with RPA helps to overcome the limitations of RPA. By combining RPA with deep learning, and by recognizing and locating text in pictures, the system can help enterprises realize intelligent automation solutions, thereby reducing the complexity of enterprise management, greatly improving production efficiency, and reducing production cost.
The method combines CNN-based image feature extraction with RNN-based sequence translation and provides a new image text coordinate positioning method that achieves the following two aims:
(1) to realize end-to-end unconstrained character recognition;
(2) to locate the coordinates of text in the image and support retrieval of character coordinates.
Drawings
FIG. 1 is a processing flow chart of the image text coordinate positioning method based on deep learning according to the present invention.
Detailed Description
The present invention is further illustrated below with reference to the accompanying FIG. 1 and specific embodiments. It is to be understood that the specific embodiments described herein merely illustrate the invention and are not to be construed as limiting it. The specific implementation steps are as follows:
1. Acquire an original image and perform initial processing.
2. Preprocess the original image. The text region obtained in the detection step usually contains noise and other factors that affect recognition, so the image needs to be preprocessed before character recognition. The preprocessing mainly comprises image scaling and image enhancement:
2.1 scale the picture to a size suitable for processing;
2.2 perform denoising and image enhancement on the image to remove background and noise points, thereby increasing contrast and highlighting the characters.
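Step 2 can be sketched in plain Python as follows. This is only a minimal illustration, assuming a grayscale image stored as a list of rows of 0-255 integers. The patent does not name a specific scaling or enhancement algorithm, so nearest-neighbour scaling, a 3x3 median filter, and linear contrast stretching are stand-ins chosen for the example, and all function names are hypothetical.

```python
from statistics import median


def scale_nearest(img, new_h, new_w):
    """Resize a grayscale image (list of rows) by nearest-neighbour sampling."""
    old_h, old_w = len(img), len(img[0])
    return [[img[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
            for r in range(new_h)]


def median_denoise(img):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [img[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            row.append(int(median(window)))
        out.append(row)
    return out


def stretch_contrast(img):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in img]


def preprocess(img, new_h, new_w):
    """Scale first (step 2.1), then denoise and enhance (step 2.2)."""
    return stretch_contrast(median_denoise(scale_nearest(img, new_h, new_w)))
```

In practice an image library would replace these helpers; the sketch only shows the order of operations described in steps 2.1 and 2.2.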
3. Perform region proposal on the preprocessed image with a region proposal network (RPN): a classification layer and a bounding-box regression layer, implemented as fully connected layers, are added on top of the CNN to propose target regions in the image.
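To illustrate how the two RPN heads in step 3 work together, the sketch below turns anchor boxes, classification scores, and regression offsets into proposals. The patent does not give the box parameterization; the center/size offsets used here follow the convention common in Faster R-CNN style detectors and are an assumption, as are the function names and the 0.5 score threshold.

```python
import math


def decode_box(anchor, deltas):
    """Apply regression offsets to an anchor.

    anchor: (cx, cy, w, h); deltas: (dx, dy, dw, dh).
    Returns the proposal as corner coordinates (x1, y1, x2, y2).
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = deltas
    px, py = cx + dx * w, cy + dy * h            # shift the center
    pw, ph = w * math.exp(dw), h * math.exp(dh)  # rescale width and height
    return (px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2)


def propose(anchors, scores, deltas, threshold=0.5):
    """Keep anchors whose text score passes the threshold, best score first."""
    kept = [(s, decode_box(a, d))
            for a, s, d in zip(anchors, scores, deltas) if s >= threshold]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [box for _, box in kept]
```

A full detector would normally also apply non-maximum suppression to the kept boxes; that step is omitted here for brevity.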
4. Extract features of the candidate regions using a convolutional neural network (CNN) and a recurrent neural network (RNN), thereby obtaining a plurality of candidate text regions and candidate box coordinate points:
4.1 extract features from the region proposals produced by the RPN using the CNN;
4.2 further process the feature vectors extracted in step 4.1 with the RNN to obtain the regional features of each row;
4.3 correct and adjust the candidate box coordinates of the candidate text regions using a long short-term memory (LSTM) network to obtain more accurate candidate box coordinates.
5. Perform preliminary text recognition on the candidate text regions using a CNN and CTC (connectionist temporal classification):
5.1 extract the features of each candidate text region with the CNN;
5.2 recognize the text content from the extracted features through a CTC decoding mechanism. CTC is mainly used for sequence decoding. Because step 4 has already located the text at specific positions in the input image, recognition only has to operate on those regions, which greatly reduces task complexity and improves the speed of recognizing text in the image.
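The CTC decoding in step 5.2 can be illustrated with greedy (best-path) decoding: take the most likely symbol in each frame, collapse consecutive repeats, then drop the blank symbol. The patent does not say which decoding strategy is used, so greedy decoding is an assumption here, and the function name and the toy alphabet are hypothetical.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Greedy CTC decoding.

    frame_probs: one probability list per time step, indexed like `alphabet`.
    alphabet: symbols by index; alphabet[blank] is the CTC blank placeholder.
    """
    # Most likely symbol index in each frame.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


# Frames spell "a a - a b b -": the blank between the two runs of "a"
# is what lets a doubled letter survive the collapse.
frames = [
    [0.1, 0.8, 0.1], [0.1, 0.7, 0.2], [0.9, 0.05, 0.05],
    [0.2, 0.6, 0.2], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1],
]
decoded = ctc_greedy_decode(frames, ["-", "a", "b"])
```

Beam search with a language model is a common alternative when higher accuracy is needed.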
6. Obtain the final recognition result after correction by a language model:
6.1 build a corpus for later training of word vectors and the language model;
6.2 input the text in the corpus into a deep learning neural network and train a text recognition correction model;
6.3 pass the recognition result of step 5 through the trained language model and output the corrected text information.
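The correction flow in step 6 (build a corpus, derive a model from it, pass the recognized text through the model) can be sketched as below. Note the substitution: the patent trains a neural correction model, whereas this example uses a much simpler stand-in, a word-frequency table plus edit distance, purely to show the data flow. All names, the sample corpus, and the distance limit of 2 are assumptions.

```python
from collections import Counter


def build_vocab(corpus):
    """Step 6.1/6.2 stand-in: count word frequencies in the corpus."""
    return Counter(corpus.lower().split())


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def correct_word(word, vocab, max_dist=2):
    """Replace a word with the closest vocabulary word, if close enough."""
    if not vocab or word in vocab:
        return word
    # Smallest distance wins; ties go to the more frequent word.
    dist, _, best = min((edit_distance(word, w), -n, w) for w, n in vocab.items())
    return best if dist <= max_dist else word


def correct_text(text, vocab):
    """Step 6.3 stand-in: correct each recognized word (lowercased)."""
    return " ".join(correct_word(w, vocab) for w in text.lower().split())


vocab = build_vocab("submit the invoice submit the form")
fixed = correct_text("subm1t the invo1ce", vocab)
```

The example shows a typical OCR confusion (the digit 1 read in place of the letter i) being repaired from corpus evidence.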
7. Obtain the candidate box coordinates and center coordinate point of the corresponding text by searching for the text:
7.1 from the coordinates of each text candidate box obtained in step 4, the center point coordinates of the text segment and the coordinates of each character can further be obtained;
7.2 match the searched text against the text recognized in the image, and return the text with the highest matching degree together with its coordinate point.
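Step 7 can be sketched as follows, assuming each recognized text comes with a candidate box given as (x1, y1, x2, y2). The patent does not define the "matching degree"; the similarity ratio from Python's standard difflib module is used here as one possible measure. The per-character boxes assume equal-width characters laid out left to right, which is a simplification, and all function names are hypothetical.

```python
from difflib import SequenceMatcher


def box_center(box):
    """Center point of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def char_boxes(text, box):
    """Split a text box into per-character boxes of equal width."""
    x1, y1, x2, y2 = box
    step = (x2 - x1) / len(text)
    return [(ch, (x1 + i * step, y1, x1 + (i + 1) * step, y2))
            for i, ch in enumerate(text)]


def locate(query, recognized):
    """Return the recognized text most similar to the query, with its box
    and center point, or None when nothing was recognized.

    recognized: list of (text, box) pairs.
    """
    if not recognized:
        return None
    text, box = max(
        recognized,
        key=lambda tb: SequenceMatcher(None, query, tb[0]).ratio(),
    )
    return {"text": text, "box": box, "center": box_center(box)}


recognized = [("Cancel", (10, 10, 70, 30)), ("Submit", (100, 10, 160, 30))]
hit = locate("Submt", recognized)
```

An RPA tool could then use `hit["center"]` as the click target for the matched on-screen text.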
8. Robotic process automation software then completes its work by calling the interface of the text recognition tool.
The above embodiments are only specific examples of the present invention. The protection scope of the present invention includes but is not limited to these embodiments; any suitable change or substitution made by a person of ordinary skill in the art in accordance with the claims of the deep-learning-based image text recognition method of the present invention falls within the protection scope of the present invention.

Claims (7)

1. An image text coordinate positioning method based on deep learning, characterized by mainly comprising the following steps:
S1: acquiring an original image;
S2: preprocessing the original image before character recognition, because the text region obtained in the detection step usually contains noise and other factors that affect recognition, wherein the preprocessing mainly comprises image scaling and image enhancement;
S3: performing RPN region proposal on the preprocessed image, wherein a classification layer and a bounding-box regression layer, implemented as fully connected layers, are added on top of a CNN convolutional neural network to propose target regions in the image;
S4: performing feature extraction on the candidate regions by using a CNN convolutional neural network and an RNN recurrent neural network to obtain a plurality of candidate text regions;
S5: performing preliminary text recognition on the candidate text regions by using a CNN convolutional neural network and CTC (connectionist temporal classification);
S6: obtaining the final recognition result after correction by a language model;
S7: obtaining the candidate box coordinates and the center coordinate point of the corresponding text by searching for the text.
2. The image text coordinate positioning method based on deep learning of claim 1, wherein the preprocessing in S2 specifically comprises: S2.1: scaling the picture to a size suitable for processing; S2.2: performing denoising and image enhancement operations on the image.
3. The image text coordinate positioning method based on deep learning of claim 1, wherein the step of obtaining a plurality of candidate text regions in S4 comprises: S4.1: performing feature extraction on the region proposals produced by the RPN through a CNN convolutional neural network; S4.2: further processing the feature vectors extracted in S4.1 by using an RNN recurrent neural network, thereby obtaining the regional features of each row; S4.3: correcting and adjusting the candidate box coordinates of the candidate text regions by using an LSTM (long short-term memory) network.
4. The image text coordinate positioning method based on deep learning of claim 1, wherein the preliminary text recognition of the candidate text regions in S5 comprises: S5.1: performing feature extraction on each candidate text region through a CNN convolutional neural network; S5.2: recognizing the text content from the extracted features through a CTC decoding mechanism.
5. The image text coordinate positioning method based on deep learning of claim 1, wherein the text recognition correction step in S6 comprises: S6.1: building a corpus for later training of word vectors and the language model; S6.2: inputting the text in the corpus into a deep learning neural network and training a text recognition correction model; S6.3: outputting the corrected text information.
6. The image text coordinate positioning method based on deep learning of claim 1, wherein step S7 comprises: S7.1: from the coordinates of each text candidate box obtained in step S4, further obtaining the center point coordinates of the text segment and the coordinates of each character; S7.2: matching the searched text against the text recognized in the image, and returning the text with the highest matching degree together with its coordinate point.
7. The image text coordinate positioning method based on deep learning of claim 1, wherein robotic process automation software further completes its work by calling an interface of the text recognition tool.
CN202010101820.5A 2020-02-19 2020-02-19 Image text coordinate positioning method based on deep learning Pending CN111723789A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010101820.5A CN111723789A (en) 2020-02-19 2020-02-19 Image text coordinate positioning method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010101820.5A CN111723789A (en) 2020-02-19 2020-02-19 Image text coordinate positioning method based on deep learning

Publications (1)

Publication Number Publication Date
CN111723789A (en) 2020-09-29

Family

ID=72564053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010101820.5A Pending CN111723789A (en) 2020-02-19 2020-02-19 Image text coordinate positioning method based on deep learning

Country Status (1)

Country Link
CN (1) CN111723789A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376658A (en) * 2018-10-26 2019-02-22 信雅达系统工程股份有限公司 A kind of OCR method based on deep learning
CN109492630A (en) * 2018-10-26 2019-03-19 信雅达系统工程股份有限公司 A method of the word area detection positioning in the financial industry image based on deep learning
US20190095730A1 (en) * 2017-09-25 2019-03-28 Beijing University Of Posts And Telecommunications End-To-End Lightweight Method And Apparatus For License Plate Recognition
CN109902622A (en) * 2019-02-26 2019-06-18 中国科学院重庆绿色智能技术研究院 A kind of text detection recognition methods for boarding pass information verifying
CN110363199A (en) * 2019-07-16 2019-10-22 济南浪潮高新科技投资发展有限公司 Certificate image text recognition method and system based on deep learning

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11893776B2 (en) 2020-10-30 2024-02-06 Boe Technology Group Co., Ltd. Image recognition method and apparatus, training method, electronic device, and storage medium
US11861919B2 (en) 2020-12-17 2024-01-02 Beijing Baidu Netcom Science Technology Co., Ltd. Text recognition method and device, and electronic device
WO2022160707A1 (en) * 2021-01-29 2022-08-04 北京来也网络科技有限公司 Human-machine interaction method and apparatus combined with rpa and ai, and storage medium and electronic device
CN112633422A (en) * 2021-03-10 2021-04-09 北京易真学思教育科技有限公司 Training method of text recognition model, text recognition method, device and equipment
CN112633422B (en) * 2021-03-10 2021-06-22 北京易真学思教育科技有限公司 Training method of text recognition model, text recognition method, device and equipment
CN113792175A (en) * 2021-08-23 2021-12-14 西南科技大学 Image understanding method based on fine-grained feature extraction
CN114281041A (en) * 2021-12-23 2022-04-05 浙江中控技术股份有限公司 Flow chart creation method, model training method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN111723789A (en) Image text coordinate positioning method based on deep learning
Bheda et al. Using deep convolutional networks for gesture recognition in american sign language
Yuliang et al. Detecting curve text in the wild: New dataset and new solution
Vaidya et al. Handwritten character recognition using deep-learning
CN110570481A (en) calligraphy word stock automatic repairing method and system based on style migration
Wilkinson et al. Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN110674777A (en) Optical character recognition method in patent text scene
CN111666937A (en) Method and system for recognizing text in image
CN110705490B (en) Visual emotion recognition method
Liu et al. Compact feature learning for multi-domain image classification
CN114038037A (en) Expression label correction and identification method based on separable residual attention network
CN113673510A (en) Target detection algorithm combining feature point and anchor frame joint prediction and regression
CN110458132A (en) One kind is based on random length text recognition method end to end
CN111523622A (en) Method for simulating handwriting by mechanical arm based on characteristic image self-learning
CN106127112A (en) Data Dimensionality Reduction based on DLLE model and feature understanding method
Li et al. Historical Chinese character recognition method based on style transfer mapping
Yu et al. Exemplar-based recursive instance segmentation with application to plant image analysis
Sutha et al. Neural network based offline Tamil handwritten character recognition System
Zhao et al. Cbph-net: A small object detector for behavior recognition in classroom scenarios
CN113743443B (en) Image evidence classification and recognition method and device
Zhu et al. Attention combination of sequence models for handwritten Chinese text recognition
CN111144469B (en) End-to-end multi-sequence text recognition method based on multi-dimensional associated time sequence classification neural network
CN111738177B (en) Student classroom behavior identification method based on attitude information extraction
Hu et al. Towards facial de-expression and expression recognition in the wild

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination