CN111723789A - Image text coordinate positioning method based on deep learning - Google Patents
- Publication number
- CN111723789A (Application No. CN202010101820.5A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- neural network
- candidate
- deep learning
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image text coordinate positioning method based on deep learning, applied in the field of Robotic Process Automation (RPA). The principle is as follows: after the original image is acquired, it is preprocessed by image enhancement and image scaling. Candidate regions are extracted from the preprocessed image with an RPN (Region Proposal Network), and features of these regions are extracted with a CNN (convolutional neural network) and an RNN (recurrent neural network), yielding multiple candidate text regions and the coordinates of their candidate boxes. Preliminary text recognition is then performed on the candidate text regions with a CNN and CTC (Connectionist Temporal Classification), and the preliminary result is corrected with a language model to obtain the final recognition result. The invention greatly improves the speed and accuracy of text recognition in images and obtains accurate text coordinate points, advancing automation in related industries.
Description
Technical Field
The invention relates to image object detection and recognition in artificial intelligence, in particular to an image text coordinate positioning method based on deep learning, applied in robotic process automation and related fields.
Background
Traditional optical character recognition (OCR) targets high-quality document images: it assumes the input image has a clean background, simple fonts and neatly arranged characters, and it achieves high recognition accuracy when these conditions are met. Text in natural images has far more complex fonts, colors and layouts than text in documents; text in advertisements, trademarks and similar material is often strongly stylized, and its font, size, color, layout and texture can vary drastically. Conventional OCR therefore cannot meet the current needs of enterprises and industries. In addition, many intelligent applications, such as robotic process automation, need accurate text coordinates to complete their workflows.
As enterprises pursue intelligent transformation, the technical complexity caused by growing data consumption remains one of the biggest challenges to address, and robotic process automation (RPA) can reduce this complexity effectively. Combining artificial intelligence with RPA exploits the strengths of both to solve practical problems and can play a vital role in exploring new business models.
Disclosure of Invention
The invention discloses an image text coordinate positioning method based on deep learning, applied in the field of robotic process automation. The principle is as follows: after the original image is acquired, it is preprocessed by image enhancement and image scaling. Candidate regions are extracted from the preprocessed image with an RPN, and features of these regions are extracted with a CNN and an RNN, yielding multiple candidate text regions and the coordinates of their candidate boxes. Preliminary text recognition is then performed on the candidate text regions with a CNN and CTC, and the preliminary result is corrected with a language model to obtain the final recognition result. The invention greatly improves the speed and accuracy of text recognition in images and obtains accurate text coordinate points, advancing automation in related industries.
Beneficial effects of the invention: perception- and judgment-based activities traditionally performed by humans can now be shared with robots, which in most cases perform them much faster. Because artificial intelligence can build a knowledge base from historical data and use it for behavioral decisions and predictions, combining AI techniques with RPA helps overcome the limitations of RPA. By combining RPA with deep learning through in-image text recognition and localization, the system helps enterprises implement intelligent automation, reducing management complexity, greatly improving production efficiency and lowering production costs.
The method combines CNN-based image feature extraction with RNN-based sequence transcription and provides a new image text coordinate positioning method with two aims:
(1) end-to-end recognition of unconstrained text;
(2) localization of text in the image, with retrieval of character coordinates.
Drawings
FIG. 1 is a processing flow chart of the image text coordinate positioning method based on deep learning according to the present invention.
Detailed Description
The invention is further illustrated below with reference to FIG. 1 and specific embodiments. The embodiments described here only illustrate the invention and are not to be construed as limiting it. The implementation steps are as follows:
1. Acquire the original image and perform initial processing.
2. Preprocess the original image. Text regions obtained by detection usually contain noise and other factors that affect recognition, so the image must be preprocessed before character recognition. Preprocessing mainly consists of image scaling and image enhancement:
2.1 Scale the image to a size suitable for processing;
2.2 Denoise and enhance the image to remove background clutter and noise points, increasing contrast so that the characters stand out.
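Steps 2.1 and 2.2 do not name specific algorithms. As a minimal sketch, assuming a grayscale image stored as a list of rows of 0-255 integers, the following pure-Python code uses nearest-neighbour resizing for scaling, a 3x3 median filter for denoising and a linear contrast stretch for enhancement. These are illustrative stand-ins rather than the operations prescribed by the patent, and the function names and default target size are hypothetical.

```python
from statistics import median

Image = list[list[int]]  # grayscale rows, values 0..255


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Step 2.1 (assumed): nearest-neighbour scaling to a fixed size."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def median_denoise(img: Image) -> Image:
    """Step 2.2a (assumed): 3x3 median filter to suppress isolated noise points."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            # Window is clipped at the image border.
            window = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def stretch_contrast(img: Image) -> Image:
    """Step 2.2b (assumed): linearly stretch intensities to the full 0..255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in img]


def preprocess(img: Image, size: tuple[int, int] = (32, 128)) -> Image:
    """Scale, then denoise, then enhance contrast."""
    return stretch_contrast(median_denoise(resize_nearest(img, *size)))
```

Scaling first keeps the later filters' cost independent of the input resolution; whether to denoise before or after resizing is a design choice the patent leaves open.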
3. Perform RPN region proposal on the preprocessed image: a fully connected classification layer and a bounding-box regression layer are added on top of the CNN to propose target regions in the image.
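The patent does not specify the RPN's anchor design or post-processing. The sketch below assumes CTPN-style anchors (fixed width, several heights) and the common Faster R-CNN box parameterisation (dx, dy, dw, dh) for the bounding-box regression layer, and it adds greedy non-maximum suppression, which the text does not mention. The network outputs (text scores and regression deltas) are taken as given; function names and thresholds are hypothetical.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def make_anchors(cx: float, cy: float, width: float, heights: list[float]) -> list[Box]:
    """Anchors centred on one feature-map cell: fixed width, several heights (assumed)."""
    return [(cx - width / 2, cy - h / 2, cx + width / 2, cy + h / 2) for h in heights]


def decode(anchor: Box, deltas: tuple[float, float, float, float]) -> Box:
    """Apply regression deltas (dx, dy, dw, dh) to an anchor."""
    x1, y1, x2, y2 = anchor
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    dx, dy, dw, dh = deltas
    ncx, ncy = cx + dx * w, cy + dy * h          # centre shifts scale with box size
    nw, nh = w * math.exp(dw), h * math.exp(dh)  # log-space size changes
    return (ncx - nw / 2, ncy - nh / 2, ncx + nw / 2, ncy + nh / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    def area(r: Box) -> float:
        return (r[2] - r[0]) * (r[3] - r[1])

    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def propose(boxes: list[Box], scores: list[float],
            score_thr: float = 0.5, iou_thr: float = 0.7) -> list[Box]:
    """Keep boxes classified as text, then apply greedy non-maximum suppression."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thr),
                   key=lambda i: -scores[i])
    keep: list[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < iou_thr for k in keep):
            keep.append(i)
    return [boxes[i] for i in keep]
```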
4. Use a CNN and an RNN to extract features of the candidate regions, obtaining multiple candidate text regions and their candidate box coordinates:
4.1 Use the CNN to extract features of the region proposals produced by the RPN;
4.2 Further process the feature vectors extracted in step 4.1 with an RNN to obtain the regional features of each text line;
4.3 Refine the candidate box coordinates of the candidate text regions with an LSTM (long short-term memory) network to obtain more accurate candidate box coordinates.
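Steps 4.2 and 4.3 can be illustrated with a tiny LSTM that reads the features of the proposals along one text row and a linear head that predicts a vertical correction for each box. This is a hedged sketch only: the class `TinyLSTM`, the layer sizes, the random untrained weights and the vertical-only (dy, dh) refinement are assumptions for illustration, not details given in the patent.

```python
import math
import random

Vector = list[float]
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: list[Vector], v: Vector) -> Vector:
    """Dense matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class TinyLSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    GATES = "ifoc"  # input, forget, output gates and cell candidate

    def __init__(self, in_dim: int, hid_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hid = hid_dim
        # One weight matrix per gate, applied to the concatenation [x, h].
        self.w = {
            g: [[rng.uniform(-0.1, 0.1) for _ in range(in_dim + hid_dim)]
                for _ in range(hid_dim)]
            for g in self.GATES
        }
        self.b = {g: [0.0] * hid_dim for g in self.GATES}

    def run(self, seq: list[Vector]) -> list[Vector]:
        """Return the hidden state after each step of the sequence."""
        h, c = [0.0] * self.hid, [0.0] * self.hid
        states = []
        for x in seq:
            z = list(x) + h
            pre = {g: [p + bias for p, bias in zip(matvec(self.w[g], z), self.b[g])]
                   for g in self.GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            states.append(h)
        return states


def refine_boxes(boxes: list[Box], feats: list[Vector],
                 lstm: TinyLSTM, head: list[Vector]) -> list[Box]:
    """Shift and rescale each box vertically by (dy, dh) predicted from its LSTM state."""
    refined = []
    for (x1, y1, x2, y2), h in zip(boxes, lstm.run(feats)):
        dy, dh = matvec(head, h)  # head is a 2 x hid_dim matrix
        height = y2 - y1
        cy = (y1 + y2) / 2 + dy * height
        new_h = height * math.exp(dh)
        refined.append((x1, cy - new_h / 2, x2, cy + new_h / 2))
    return refined
```

In a trained system the weights of the LSTM and the head would be learned jointly with the detector; with an all-zero head the boxes are returned unchanged.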
5. Perform preliminary text recognition on the candidate text regions using a CNN and CTC (Connectionist Temporal Classification):
5.1 Extract features of each candidate text region with the CNN;
5.2 Decode the text content from the extracted features with CTC. CTC is mainly used for sequence decoding. Because step 4 has already located the text at specific positions in the input image, recognition only needs to run on those regions, which greatly reduces task complexity and increases the speed of recognizing text in the image.
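CTC output can be decoded in several ways; a minimal sketch of the simplest, greedy (best-path) decoding, is shown below. It assumes the recognizer emits one probability vector per horizontal frame, with class 0 reserved for the CTC blank and class k (k >= 1) mapped to `alphabet[k - 1]`; the alphabet and probabilities in the test are hypothetical.

```python
BLANK = 0  # index reserved for the CTC blank symbol (assumed convention)


def ctc_greedy_decode(frame_probs: list[list[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for k in best:
        # A label is emitted only when it differs from the previous frame's label
        # and is not the blank; a blank between two equal labels keeps both.
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)
```

For example, per-frame argmax labels [a, a, blank, a, b, b] decode to "aab": repeats collapse unless a blank separates them. Beam search, optionally combined with a language model, is a common and slower alternative.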
6. Correct the preliminary result with a language model to obtain the final recognition result:
6.1 Build a corpus for later training of word vectors and the language model;
6.2 Feed the corpus text into a deep neural network to train a text recognition correction model;
6.3 Pass the recognition result of step 5 through the trained language model, which outputs the corrected text.
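The patent trains a deep neural correction model (step 6.2) but gives no architecture. To illustrate only the idea of corpus-based correction, the sketch below swaps in a much simpler technique: a unigram word-frequency model built from the corpus plus edit-distance-1 candidate generation. It assumes space-separated lowercase Latin words (Chinese text would need character-level handling), and all names are hypothetical.

```python
import re
from collections import Counter


def train_unigram(corpus: str) -> Counter:
    """Word frequencies from the corpus (stand-in for the trained language model)."""
    return Counter(re.findall(r"[a-z]+", corpus.lower()))


def edits1(word: str, letters: str = "abcdefghijklmnopqrstuvwxyz") -> set[str]:
    """All strings one delete, adjacent swap, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    swaps = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in letters}
    inserts = {l + c + r for l, r in splits for c in letters}
    return deletes | swaps | replaces | inserts


def correct_word(word: str, counts: Counter) -> str:
    """Keep known words; otherwise pick the most frequent known neighbour."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.__getitem__) if candidates else word


def correct_text(text: str, counts: Counter) -> str:
    return " ".join(correct_word(w, counts) for w in text.lower().split())
```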
7. Retrieve a query text to obtain the candidate box coordinates and the center coordinate point of the matching text:
7.1 From the coordinates of each text candidate box obtained in step 4, derive the center coordinates of each text segment and the coordinates of each character;
7.2 Match the query text against the text recognized in the image and return the best-matching text together with its coordinates.
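Steps 7.1 and 7.2 can be sketched as follows, assuming each recognized line is a `(text, box)` pair with `box = (x1, y1, x2, y2)` in pixels. The patent does not say how per-character coordinates are derived or how matches are scored; here characters are assumed to split the line box evenly, and similarity uses the standard-library `difflib.SequenceMatcher.ratio()`. Function names are hypothetical.

```python
from difflib import SequenceMatcher

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_center(box: Box) -> tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def char_boxes(text: str, box: Box) -> list[tuple[str, Box]]:
    """Assumed: split the line box evenly among its characters."""
    if not text:
        return []
    x1, y1, x2, y2 = box
    step = (x2 - x1) / len(text)
    return [(ch, (x1 + i * step, y1, x1 + (i + 1) * step, y2))
            for i, ch in enumerate(text)]


def locate(query: str, results: list[tuple[str, Box]]):
    """Return (text, box, center) of the recognized line most similar to query.

    Raises ValueError if results is empty.
    """
    best_text, best_box = max(
        results, key=lambda r: SequenceMatcher(None, query, r[0]).ratio())
    return best_text, best_box, box_center(best_box)
```

Fuzzy matching tolerates small recognition errors in either the query or the OCR output; an exact-match lookup would be simpler but brittle.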
8. RPA software completes its work by calling the interface of the text recognition tool.
The above embodiments are only specific examples of the invention. The scope of protection of the invention includes but is not limited to these embodiments, and any suitable change or substitution made by a person of ordinary skill in the art in accordance with the claims of this deep-learning-based image text recognition method falls within the scope of protection of the invention.
Claims (7)
1. An image text coordinate positioning method based on deep learning, characterized by mainly comprising the following steps:
S1: acquiring an original image;
S2: preprocessing the original image, the preprocessing mainly comprising image scaling and image enhancement, because text regions obtained by detection usually contain noise and other factors that affect recognition, so the image must be preprocessed before character recognition;
S3: performing RPN region proposal on the preprocessed image, wherein a fully connected classification layer and a bounding-box regression layer are added on a CNN to propose target regions in the image;
S4: performing feature extraction on the candidate regions with a CNN and an RNN to obtain a plurality of candidate text regions;
S5: performing preliminary text recognition on the candidate text regions with a CNN and CTC (Connectionist Temporal Classification);
S6: correcting the preliminary result with a language model to obtain the final recognition result;
S7: acquiring the candidate box coordinates and the center coordinate point of the corresponding text by retrieving the text.
2. The image text coordinate positioning method based on deep learning of claim 1, wherein the preprocessing in S2 specifically comprises: S2.1: scaling the image to a size suitable for processing; S2.2: denoising and enhancing the image.
3. The image text coordinate positioning method based on deep learning of claim 1, wherein obtaining the plurality of candidate text regions in S4 comprises: S4.1: extracting, with a CNN, features of the region proposals produced by the RPN; S4.2: further processing the feature vectors extracted in S4.1 with an RNN to obtain the regional features of each text line; S4.3: refining the candidate box coordinates of the candidate text regions with an LSTM (long short-term memory) network.
4. The method according to claim 1, wherein the preliminary text recognition in S5 comprises: S5.1: extracting features of each candidate text region with a CNN; S5.2: decoding the text content from the extracted features with a CTC decoding mechanism.
5. The image text coordinate positioning method based on deep learning of claim 1, wherein the text recognition correction in S6 comprises: S6.1: building a corpus for later training of word vectors and the language model; S6.2: feeding the text in the corpus into a deep neural network to train a text recognition correction model; S6.3: outputting the corrected text information.
6. The image text coordinate positioning method based on deep learning of claim 1, wherein S7 comprises: S7.1: deriving, from the coordinates of each text candidate box obtained in S4, the coordinates of the center point of each text segment and the coordinates of each character; S7.2: matching the retrieved text against the text recognized in the image, and returning the best-matching text and its coordinate points.
7. The image text coordinate positioning method based on deep learning of claim 1, wherein robotic process automation software completes its work by calling an interface of a text recognition tool.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010101820.5A CN111723789A (en) | 2020-02-19 | 2020-02-19 | Image text coordinate positioning method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111723789A true CN111723789A (en) | 2020-09-29 |
Family
ID=72564053
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010101820.5A Pending CN111723789A (en) | 2020-02-19 | 2020-02-19 | Image text coordinate positioning method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111723789A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109376658A (en) * | 2018-10-26 | 2019-02-22 | 信雅达系统工程股份有限公司 | A kind of OCR method based on deep learning |
CN109492630A (en) * | 2018-10-26 | 2019-03-19 | 信雅达系统工程股份有限公司 | A method of the word area detection positioning in the financial industry image based on deep learning |
US20190095730A1 (en) * | 2017-09-25 | 2019-03-28 | Beijing University Of Posts And Telecommunications | End-To-End Lightweight Method And Apparatus For License Plate Recognition |
CN109902622A (en) * | 2019-02-26 | 2019-06-18 | 中国科学院重庆绿色智能技术研究院 | A kind of text detection recognition methods for boarding pass information verifying |
CN110363199A (en) * | 2019-07-16 | 2019-10-22 | 济南浪潮高新科技投资发展有限公司 | Certificate image text recognition method and system based on deep learning |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11893776B2 (en) | 2020-10-30 | 2024-02-06 | Boe Technology Group Co., Ltd. | Image recognition method and apparatus, training method, electronic device, and storage medium |
US11861919B2 (en) | 2020-12-17 | 2024-01-02 | Beijing Baidu Netcom Science Technology Co., Ltd. | Text recognition method and device, and electronic device |
WO2022160707A1 (en) * | 2021-01-29 | 2022-08-04 | 北京来也网络科技有限公司 | Human-machine interaction method and apparatus combined with rpa and ai, and storage medium and electronic device |
CN112633422A (en) * | 2021-03-10 | 2021-04-09 | 北京易真学思教育科技有限公司 | Training method of text recognition model, text recognition method, device and equipment |
CN112633422B (en) * | 2021-03-10 | 2021-06-22 | 北京易真学思教育科技有限公司 | Training method of text recognition model, text recognition method, device and equipment |
CN113792175A (en) * | 2021-08-23 | 2021-12-14 | 西南科技大学 | Image understanding method based on fine-grained feature extraction |
CN114281041A (en) * | 2021-12-23 | 2022-04-05 | 浙江中控技术股份有限公司 | Flow chart creation method, model training method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |