CN111507348A - Character segmentation and identification method based on CTC deep neural network - Google Patents


Info

Publication number
CN111507348A
CN111507348A
Authority
CN
China
Prior art keywords
segmentation
ctc
recognition
neural network
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010294624.4A
Other languages
Chinese (zh)
Inventor
侯进
黄贤俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyuan Hengji Technology Co ltd
Original Assignee
Shenyuan Hengji Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyuan Hengji Technology Co ltd filed Critical Shenyuan Hengji Technology Co ltd
Priority to CN202010294624.4A priority Critical patent/CN111507348A/en
Publication of CN111507348A publication Critical patent/CN111507348A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention provides a character segmentation and recognition method based on a CTC deep neural network, which comprises the following steps: a1. extracting features from an input image by using CNN; a2. carrying out CELL segmentation on the features extracted in a1, the height and width of each CELL being fixed and the number of CELLs being determined by the image length; a3. directly segmenting and classifying each CELL with determined features and outputting segmentation signals; a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model by using CTC LOSS, feeding back the loss and training the whole model; a5. segmenting the text by using the segmentation signals output by a3 and carrying out CNN + softmax classification recognition on each single character. The real segmentation signals are mapped from the labeled text, and the CTC LOSS can automatically solve the problem of text alignment.

Description

Character segmentation and identification method based on CTC deep neural network
Technical Field
The invention relates to the technical field of character segmentation and recognition, in particular to a character segmentation and recognition method based on a CTC deep neural network.
Background
OCR (Optical Character Recognition) is an image processing technology for detecting, recognizing and structuring characters in images. Current OCR technology is divided into three modules: detection, recognition and structuring. Detection and recognition follow two frameworks: (1) single-character detection and single-character recognition, where the core task of the detection module is to detect each independent character region of an image, the recognition module is responsible for recognizing the characters of each cut character-region image, and the basic framework of the existing recognition model is CNN + softmax; (2) text-line detection and whole-line recognition, where the core task of the detection module is to detect the text regions in the image, the recognition module is responsible for recognizing the text of each cut text-region image, and the basic framework of the existing recognition model is CNN + LSTM + CTC.
The detection mainly comprises the following steps: I. extracting features from the picture; II. enumerating a large number of rectangles to try to regress the corresponding objects; III. classifying the enumerated rectangles into 2 types, namely positive samples with large intersection and other negative samples; IV. cutting the positive samples from the feature map and then regressing the boundary of the target according to the feature map. For text-line recognition, a deep recurrent network is used for character-string recognition, combining CNN and RNN: image features are extracted by CNN, the feature map is sliced transversely, a typical RNN structure, the LSTM recurrent network, is then adopted for text inference, and finally the CTC loss function is adopted to calculate the difference between predicted characters and labels to complete end-to-end training.
The text information in the image content is structured based on templates and rule logic. The existing frameworks have certain disadvantages. Under the 1st framework, detection requires bounding-box annotation of each character position, the annotation cost is extremely high, and the structuring difficulty is greatly increased; therefore, the specific target of a general detection task is to detect text lines rather than individual characters. Under the 2nd framework, the recognition module takes a lot of time. Aiming at the problems of the existing frameworks, a character segmentation and recognition method that optimizes the recognition framework and reduces the time consumption of the recognition module is urgently needed.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
The invention aims to provide a character segmentation and identification method based on a CTC deep neural network, so as to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme:
a character segmentation and identification method based on a CTC deep neural network comprises the following steps:
a1. extracting features of the input image by using CNN;
a2. performing CELL segmentation on the features extracted in the step a1, wherein the height and the width of each CELL are fixed, and the number of CELLs is determined by the length of the image;
a3. directly carrying out segmentation classification on each CELL with the determined features and outputting segmentation signals;
a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model by using the CTC LOSS calculation formula, feeding back the loss and training the whole model;
a5. segmenting the text by using the segmentation signal output in the step a3, and performing CNN + softmax classification recognition on each single character.
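The CELL segmentation of step a2 can be sketched as follows. This is a minimal illustration under assumed values (channel count, feature height, cell width), which are not specified by the invention; the function name is hypothetical:

```python
import numpy as np

def split_into_cells(feature_map, cell_width=4):
    """Split a (channels, height, width) feature map into fixed-size CELLs.

    The height and width of each CELL are fixed; the number of CELLs is
    determined by the image (feature map) length, as in step a2.
    """
    c, h, w = feature_map.shape
    n_cells = w // cell_width  # number of CELLs follows from the image length
    return [feature_map[:, :, i * cell_width:(i + 1) * cell_width]
            for i in range(n_cells)]

# Assumed CNN output for one text-line image: 64 channels, height 8, width 40.
features = np.zeros((64, 8, 40))
cells = split_into_cells(features)
print(len(cells), cells[0].shape)  # 10 cells, each of shape (64, 8, 4)
```

Each CELL would then be classified independently in step a3 to produce one segmentation signal per CELL.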
Further, the real segmentation signal is mapped from the annotation text.
Further, the CTC LOSS can automatically resolve the text alignment issue.
Further, the CTC LOSS calculation formula is as follows:
p(l \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid x)
where x is the feature produced after extracting features from the image using CNN, l is the real signal, and π represents a single correct alignment scheme.
Further, the single correct alignment scheme is one of all the alignment schemes, and each single alignment scheme occurs in the set of alignment schemes with a certain probability.
Further, the probability of the single alignment scheme is calculated as follows:
p(\pi \mid x) = \prod_{t=1}^{T} y^{t}_{\pi_t}
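Under the above definitions, the probability of a single alignment scheme is the product of the per-timestep class probabilities. A minimal numeric sketch, where the softmax outputs y are made-up values for illustration:

```python
import numpy as np

# Per-timestep softmax outputs y: T = 3 timesteps x K = 3 classes.
# Each row sums to 1; the values are illustrative assumptions.
y = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.1, 0.1, 0.8]])

def path_probability(y, pi):
    """p(pi | x): product over t of y[t, pi[t]], i.e. the formula above."""
    return float(np.prod(y[np.arange(len(pi)), list(pi)]))

p = path_probability(y, [0, 1, 2])
print(round(p, 3))  # 0.8 * 0.7 * 0.8 ≈ 0.448
```

Summing this quantity over every alignment scheme that maps to the real signal gives the probability that the CTC LOSS maximizes.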
Compared with the prior art, the invention has the following beneficial effects: 1. the method greatly improves the speed of OCR recognition, and recognition can be optimized in a targeted manner after the text is cut into single characters, so that the final precision is improved; 2. the method improves the recognition framework by separating the recognition process into the two steps of character segmentation and single-character recognition, so that each step can be optimized separately and pertinently; 3. the method has a unique concept, a novel idea and operability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a process diagram of the CTC deep neural network-based text segmentation and recognition method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention; it is obvious that the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is further described with reference to the following drawings and detailed description:
as shown in fig. 1, the method for character segmentation and recognition based on CTC deep neural network includes the following steps:
a1. extracting features of the input image by using CNN;
a2. performing CELL segmentation on the features extracted in the step a1, wherein the height and the width of each CELL are fixed, and the number of CELLs is determined by the length of the image;
a3. directly carrying out segmentation classification on each CELL with the determined features and outputting segmentation signals;
a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model by using the CTC LOSS calculation formula, feeding back the loss and training the whole model;
a5. segmenting the text by using the segmentation signal output in the step a3, and performing CNN + softmax classification recognition on each single character.
The method fundamentally improves the speed of OCR recognition, and after the text is cut into single characters, recognition optimization can be targeted, so that the final precision is improved; meanwhile, the recognition framework is improved by separating the recognition process into the two steps of character segmentation and single-character recognition, so that each step can be optimized separately and pertinently.
According to the above, the real segmentation signal is mapped from the annotation text.
In accordance with the above, the CTC LOSS can automatically resolve the text alignment issue.
According to the above, the CTC LOSS calculation formula is as follows:
p(l \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid x)    (formula 1)
where x is the feature produced after extracting features from the image using CNN, l is the real signal, and π represents a single correct alignment scheme.
According to the above, the single correct alignment scheme is one of all the alignment schemes, and each single alignment scheme occurs in the set of alignment schemes with a certain probability.
According to the above, the probability of the single alignment scheme is calculated as follows:
p(\pi \mid x) = \prod_{t=1}^{T} y^{t}_{\pi_t}    (formula 2)
For verification, the real segmentation signals are mapped from the annotated text, so that inputting text contents of one, two and four characters respectively yields the real segmentation signals "101", "10101" and "101010101". The core role of CTC is to automatically solve the alignment problem, so that in the above cases the difference between the segmentation signal output by the model and the real signal mapped from the text length can be calculated.
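The text-to-signal mapping above can be sketched as follows; the function name and the sample texts are illustrative assumptions:

```python
def text_to_signal(text):
    """Map an annotated text to its ground-truth segmentation signal.

    Each character contributes "01" after a leading "1", so a 1-character
    text maps to "101", a 2-character text to "10101", and a 4-character
    text to "101010101", matching the examples above.
    """
    return "1" + "01" * len(text)

print(text_to_signal("a"))     # "101"
print(text_to_signal("ab"))    # "10101"
print(text_to_signal("abcd"))  # "101010101"
```

The "1" symbols mark character boundaries and the "0" symbols mark character interiors, so the signal length grows linearly with the text length.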
The word "state" is taken as an example to illustrate its associated definition and computational logic:
by using the above method for character segmentation and recognition based on the CTC deep neural network, the CTC L OSS aims to maximize the probability value of formula 1, where x in formula 1 is a feature generated after an image is extracted by using CNN, L is a real signal, and pi represents a single correct alignment scheme, where all in formula 3 are correct alignment schemes.
\mathcal{B}^{-1}(l) = \{\pi \mid \mathcal{B}(\pi) = l\}    (formula 3)
The probability of a single correct alignment scheme is calculated using formula 2 above.
The loss between the real segmentation signal and the segmentation signal output by the model is calculated through formulas 1, 2 and 3; the loss is fed back and the whole model is trained.
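Formulas 1 to 3 together can be illustrated with a brute-force computation that enumerates every alignment scheme, keeps those that collapse to the real signal (formula 3), multiplies per-timestep probabilities (formula 2), and sums the result (formula 1). The class values and probabilities below are illustrative assumptions, and the standard CTC collapse mapping (merge repeats, then drop blanks) is assumed:

```python
import itertools
import math

import numpy as np

BLANK = 0  # class index assumed to act as the CTC blank

def collapse(pi):
    """The mapping B: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in pi:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return tuple(out)

def ctc_loss_bruteforce(y, label):
    """-log p(l|x), summing p(pi|x) over all alignments that map to l."""
    T, K = y.shape
    total = 0.0
    for pi in itertools.product(range(K), repeat=T):
        if collapse(pi) == tuple(label):                    # formula 3
            total += float(np.prod(y[np.arange(T), list(pi)]))  # formula 2
    return -math.log(total)                                  # formula 1

# Illustrative softmax outputs: 3 timesteps, 3 classes, rows sum to 1.
y = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1],
              [0.7, 0.2, 0.1]])
loss = ctc_loss_bruteforce(y, [1])
print(round(loss, 4))  # -log(0.489) ≈ 0.7154
```

Enumerating alignments is exponential in T and is only feasible for toy inputs; practical implementations use the dynamic-programming forward-backward recursion instead, but the quantity computed is the same.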
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that various changes, modifications and substitutions can be made without departing from the spirit and scope of the invention as defined by the appended claims. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. The character segmentation and identification method based on the CTC deep neural network is characterized by comprising the following steps of:
a1. extracting features of the input image by using CNN;
a2. performing CELL segmentation on the features extracted in the step (a1), wherein the height and the width of each CELL are fixed, and the number of CELLs is determined by the length of the image;
a3. directly carrying out segmentation classification on each CELL with the determined features and outputting segmentation signals;
a4. calculating the loss between the real segmentation signals and the segmentation signals output by the model by using the CTC LOSS calculation formula, feeding back the loss and training the whole model;
a5. segmenting the text by utilizing the segmentation signal output in the step (a3), and performing CNN + softmax classification recognition on each single character.
2. The method of CTC deep neural network-based word segmentation and recognition according to claim 1, wherein the true segmentation signal is mapped from an annotation text.
3. The CTC deep neural network-based word segmentation and recognition method of claim 1, wherein the CTC LOSS can automatically solve the text alignment problem.
4. The CTC deep neural network-based text segmentation and recognition method of claim 1, wherein the CTC LOSS calculation formula is as follows:
p(l \mid x) = \sum_{\pi \in \mathcal{B}^{-1}(l)} p(\pi \mid x)
where x is the feature produced after extracting features from the image using CNN, l is the real signal, and π represents a single correct alignment scheme.
5. The method of CTC deep neural network-based text segmentation and recognition of claim 4, wherein the single correct alignment scheme is one of all the alignment schemes, each single alignment scheme occurring in the set of alignment schemes with a certain probability.
6. The method of CTC deep neural network-based text segmentation and recognition of claim 5, wherein the probability of the single alignment scheme is calculated as follows:
p(\pi \mid x) = \prod_{t=1}^{T} y^{t}_{\pi_t}
CN202010294624.4A 2020-04-15 2020-04-15 Character segmentation and identification method based on CTC deep neural network Pending CN111507348A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010294624.4A CN111507348A (en) 2020-04-15 2020-04-15 Character segmentation and identification method based on CTC deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010294624.4A CN111507348A (en) 2020-04-15 2020-04-15 Character segmentation and identification method based on CTC deep neural network

Publications (1)

Publication Number Publication Date
CN111507348A true CN111507348A (en) 2020-08-07

Family

ID=71870990

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010294624.4A Pending CN111507348A (en) 2020-04-15 2020-04-15 Character segmentation and identification method based on CTC deep neural network

Country Status (1)

Country Link
CN (1) CN111507348A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381175A (en) * 2020-12-05 2021-02-19 中国人民解放军32181部队 Circuit board identification and analysis method based on image processing
CN113537201A (en) * 2021-09-16 2021-10-22 江西风向标教育科技有限公司 Multi-dimensional hybrid OCR recognition method, device, equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3200123A1 (en) * 2016-01-28 2017-08-02 Siemens Aktiengesellschaft Text recognition
CN108960245A (en) * 2018-07-13 2018-12-07 广东工业大学 The detection of tire-mold character and recognition methods, device, equipment and storage medium
CN109241894A (en) * 2018-08-28 2019-01-18 南京安链数据科技有限公司 A kind of specific aim ticket contents identifying system and method based on form locating and deep learning
CN109993160A (en) * 2019-02-18 2019-07-09 北京联合大学 A kind of image flame detection and text and location recognition method and system
US10388272B1 (en) * 2018-12-04 2019-08-20 Sorenson Ip Holdings, Llc Training speech recognition systems using word sequences
CN110175603A (en) * 2019-04-01 2019-08-27 佛山缔乐视觉科技有限公司 A kind of engraving character recognition methods, system and storage medium
CN110766017A (en) * 2019-10-22 2020-02-07 国网新疆电力有限公司信息通信公司 Mobile terminal character recognition method and system based on deep learning
CN110866530A (en) * 2019-11-13 2020-03-06 云南大学 Character image recognition method and device and electronic equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3200123A1 (en) * 2016-01-28 2017-08-02 Siemens Aktiengesellschaft Text recognition
CN108960245A (en) * 2018-07-13 2018-12-07 广东工业大学 The detection of tire-mold character and recognition methods, device, equipment and storage medium
CN109241894A (en) * 2018-08-28 2019-01-18 南京安链数据科技有限公司 A kind of specific aim ticket contents identifying system and method based on form locating and deep learning
US10388272B1 (en) * 2018-12-04 2019-08-20 Sorenson Ip Holdings, Llc Training speech recognition systems using word sequences
CN109993160A (en) * 2019-02-18 2019-07-09 北京联合大学 A kind of image flame detection and text and location recognition method and system
CN110175603A (en) * 2019-04-01 2019-08-27 佛山缔乐视觉科技有限公司 A kind of engraving character recognition methods, system and storage medium
CN110766017A (en) * 2019-10-22 2020-02-07 国网新疆电力有限公司信息通信公司 Mobile terminal character recognition method and system based on deep learning
CN110866530A (en) * 2019-11-13 2020-03-06 云南大学 Character image recognition method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张艺玮; 赵一嘉; 王馨悦; 董兰芳: "Chinese character recognition combining a dense neural network with a long short-term memory model" (结合密集神经网络与长短时记忆模型的中文识别) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381175A (en) * 2020-12-05 2021-02-19 中国人民解放军32181部队 Circuit board identification and analysis method based on image processing
CN113537201A (en) * 2021-09-16 2021-10-22 江西风向标教育科技有限公司 Multi-dimensional hybrid OCR recognition method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113158808B (en) Method, medium and equipment for Chinese ancient book character recognition, paragraph grouping and layout reconstruction
Rehman et al. Performance analysis of character segmentation approach for cursive script recognition on benchmark database
CN112818951B (en) Ticket identification method
CN104778470B (en) Text detection based on component tree and Hough forest and recognition methods
CN104881458B (en) A kind of mask method and device of Web page subject
EP3349124A1 (en) Method and system for generating parsed document from digital document
CN110413787B (en) Text clustering method, device, terminal and storage medium
CN113537227B (en) Structured text recognition method and system
CN111507348A (en) Character segmentation and identification method based on CTC deep neural network
CN111539417B (en) Text recognition training optimization method based on deep neural network
CN109086772A (en) A kind of recognition methods and system distorting adhesion character picture validation code
CN114187595A (en) Document layout recognition method and system based on fusion of visual features and semantic features
CN117437647B (en) Oracle character detection method based on deep learning and computer vision
CN114581932A (en) Picture table line extraction model construction method and picture table extraction method
Ghosh et al. R-PHOC: segmentation-free word spotting using CNN
CN111832497B (en) Text detection post-processing method based on geometric features
Karanje et al. Survey on text detection, segmentation and recognition from a natural scene images
CN109284678A (en) Guideboard method for recognizing semantics and system
US20230315799A1 (en) Method and system for extracting information from input document comprising multi-format information
CN111581478A (en) Cross-website general news acquisition method for specific subject
CN113761209B (en) Text splicing method and device, electronic equipment and storage medium
CN114529894A (en) Rapid scene text detection method fusing hole convolution
Mars et al. Combination of DE-GAN with CNN-LSTM for Arabic OCR on Images with Colorful Backgrounds
Mosannafat et al. Farsi text detection and localization in videos and images
Fan et al. BURSTS: A bottom-up approach for robust spotting of texts in scenes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination