CN114255464A - Natural scene character detection and identification method based on CRAFT and SCRN-SEED framework - Google Patents
Natural scene character detection and identification method based on CRAFT and SCRN-SEED framework
- Publication number
- CN114255464A (application number CN202111530794.9A)
- Authority
- CN
- China
- Prior art keywords
- network
- training
- scrn
- data set
- craft
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
The invention discloses a natural scene character detection and identification method based on the CRAFT and SCRN-SEED frameworks, which comprises the following steps: (1) establishing an image data set from a real data set and a synthetic data set, and dividing it into a training set and a test set; (2) training a CRAFT network with the image data set; (3) training the irregular-text correction network SCRN with the real data set; (4) combining the SCRN with the SEED network and training the combined SCRN-SEED network; (5) connecting the CRAFT network with the SCRN-SEED network, and constructing and training the complete model. The method fully detects curved, deformed, or long text instances: it locates each character accurately and then links the detected characters into a text line through an affinity mechanism, making it suitable for curved, deformed, or extremely long text. By correcting irregular text pictures and using global semantic information during recognition, low-quality text instances can also be accurately recognized.
Description
Technical Field
The invention relates to the technical field of character detection methods, in particular to a natural scene character detection and identification method based on CRAFT and SCRN-SEED frameworks.
Background
Optical character recognition (OCR) conventionally refers to analyzing an input scanned document image to recognize the character information it contains. The technique assumes that the input image has a clean background, simple fonts, and regular character arrangement, and it can reach a high recognition level when these requirements are met. Scene text recognition (STR) refers to recognizing text information in natural scene pictures and is far more difficult than recognizing text in scanned documents, because characters in natural scenes appear in extremely varied forms: text lines may be horizontal, vertical, curved, rotated, or twisted; character regions in the image may be incomplete or blurred; and backgrounds vary widely, as characters may appear on a flat, curved, or wrinkled surface, complex interfering textures may lie near the character region, or non-character regions may carry character-like textures such as sand, grass, fences, and brick walls.
Scene text detection methods based on neural networks achieve good results and far surpass traditional techniques in detection and recognition, yet they still cannot handle curvature, blur, and interfering textures well in natural scenes. Existing natural scene character detection methods have the following problems. First, character localization is inaccurate: traditional text localization frameworks mostly attend to a whole line of text, require a large receptive field, and mark the text position with a single rectangular box, which is unsuitable for curved, deformed, or extremely long text and makes accurate localization and labeling difficult. Second, character recognition is inaccurate: many recognition methods based on the encoder-decoder framework have been proposed for curved text, but most rely on local visual features and ignore global semantic information, so recognition accuracy drops sharply under image blur, uneven illumination, incomplete characters, and similar conditions.
Disclosure of Invention
The purpose of the invention is as follows: in view of the above problems, the present invention aims to provide a natural scene text detection and recognition method based on a CRAFT and SCRN-SEED framework.
The technical scheme is as follows: the invention discloses a natural scene character detection and identification method based on CRAFT and SCRN-SEED frameworks, which comprises the following steps:
(1) establishing an image data set by utilizing the real data set and the synthetic data set, and dividing the image data set into a training set and a testing set;
(2) training the CRAFT network with an image data set:
(201) improving the CRAFT network, taking a ResNet50 network as a backbone network, inputting the pictures in the synthetic data set into the improved CRAFT network for feature extraction, and outputting a region score and an affinity score;
(202) encoding the two scores through Gaussian heatmap mapping to generate a Gaussian heatmap;
(203) splitting the complete text in an input picture into individual characters using a watershed algorithm, and generating polygons of arbitrarily shaped text from the characters through post-processing;
(204) initializing the improved CRAFT network with a pre-trained model, applying the idea of transfer learning;
(3) training the irregular-text correction network SCRN with the real data set;
(4) combining the SCRN with the SEED network, and training the combined SCRN-SEED network;
(5) and connecting the improved CRAFT network and the SCRN-SEED network, constructing a complete model and training.
Further, the step of initializing the improved CRAFT network by using the pre-training model according to the idea of transfer learning includes:
first, the CRAFT network is trained with the synthetic data set and optimized with an Adam optimizer; the network is then fine-tuned with multiple real data sets, mixing in the SynthText data set at a 1:5 ratio and applying online hard example mining at a 1:3 ratio;
then, the CRAFT network is trained with a real data set containing quadrilateral labels together with the SynthText data set, and part of the data is split off as a test set to tune the network parameters.
Further, the step of combining the SCRN and the SEED network and training the combined SCRN-SEED network comprises:
replacing the image correction module in the SEED network with the trained SCRN to construct the SCRN-SEED network, initializing its parameters with a pretrained language model of the semantic model FastText, preliminarily training the SCRN-SEED network with the test set, and adjusting the network parameters according to the training effect.
Further, the step of connecting the improved CRAFT network with the SCRN-SEED network and constructing and training the complete model comprises: generating a minimum rectangular box containing all the characters from the polygons of arbitrarily shaped text, cropping the rectangular box, adjusting the format of the cropped pictures, and inputting them into the SCRN-SEED network to complete the model; training the model with a validation set and keeping the parameters with the best training effect; and then inputting natural scene pictures into the model to perform automatic character detection and recognition.
Further, the real data sets come from ICDAR2013, ICDAR2015, ICDAR2017, MSRA-TD500, TotalText, CTW-1500, and the like, and the synthetic data set is the SynthText data set;
each picture in the image data set is resized, and the pictures in the data set are converted to the LMDB format.
Beneficial effects: compared with the prior art, the invention has the following notable advantages:
1. the method can fully detect curved, deformed, or long text instances: it locates each character accurately and then links the detected characters into a text line through an affinity mechanism, so it only needs to attend to the spacing between characters rather than the whole line of text, requires no large receptive field, and is suitable for curved, deformed, or extremely long text;
2. the invention can accurately recognize low-quality text instances: the correction module of the SCRN corrects irregular text pictures according to the output of the detector, using the center line of each character instance and several geometric attributes, which gives a better correction effect; the corrected text picture is then recognized by the recognition module of the SEED network, which uses semantic information to predict global information, greatly improving accuracy on text with defects, blur, and similar degradations.
Drawings
FIG. 1 is a diagram of the basic framework of the improved CRAFT model of the present invention;
FIG. 2 is a basic frame diagram of an irregular text correction network SCRN employed in the present invention;
FIG. 3 is a basic framework diagram of the SEED network identification module employed in the present invention;
FIG. 4 is a diagram of an overall model structure of the method for detecting and identifying characters in a natural scene according to the present invention;
fig. 5 is a diagram illustrating an exemplary effect of the method for detecting and identifying characters in a natural scene according to the present invention.
Detailed Description
The method for detecting and identifying the natural scene characters based on the CRAFT and the SCRN-SEED framework comprises the following steps:
(1) An image data set is established using the real data set and the synthetic data set and divided into a training set and a test set; each picture in the image data set is resized, and the pictures in the data set are converted to the LMDB format.
The real data set is from ICDAR2013, ICDAR2015, ICDAR2017, MSRA-TD500, TotalText and CTW-1500, and the synthetic data set is a SynthText data set.
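The resizing in step (1) can be sketched as a nearest-neighbour resize. This is a dependency-free illustration only; a real pipeline would use OpenCV or PIL for resizing and an LMDB writer for the format conversion, and the function name here is ours:

```python
import numpy as np

def resize_nn(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W[, C]) image array."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]

img = np.arange(12).reshape(3, 4)
print(resize_nn(img, 6, 8).shape)  # (6, 8)
```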
(2) The CRAFT network is trained using an image data set, and the flowchart is shown in FIG. 1.
(201) Improving the CRAFT network, taking a ResNet50 network as a backbone network, inputting the pictures in the synthetic data set into the improved CRAFT network for feature extraction, and outputting a Region Score and an Affinity Score;
(202) Encoding the two scores through Gaussian heatmap mapping to generate a Gaussian heatmap;
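The Gaussian heatmap encoding of step (202) can be sketched as follows. This is a simplified NumPy illustration: CRAFT warps an isotropic 2-D Gaussian onto each character quadrilateral, whereas here an axis-aligned Gaussian is simply centred on each character box (the function name and sigma ratio are illustrative):

```python
import numpy as np

def gaussian_heatmap(h, w, boxes, sigma_ratio=0.3):
    """Render one 2-D Gaussian per character box onto an h x w score map.

    boxes: list of (x0, y0, x1, y1) axis-aligned character boxes.
    A full CRAFT implementation perspective-warps the Gaussian onto each
    character quadrilateral; here one is simply centred on each box.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        sigma = sigma_ratio * max(x1 - x0, y1 - y0)
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g)  # overlapping characters keep the max
    return heat

heat = gaussian_heatmap(32, 64, [(4, 8, 20, 24), (36, 8, 52, 24)])
print(heat.shape)  # (32, 64), peaking at the two box centres
```

The affinity score can be encoded the same way, with the Gaussians placed between adjacent character boxes rather than on them.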
(203) Splitting the complete text in an input picture into individual characters using a watershed algorithm, and generating polygons of arbitrarily shaped text from the characters through post-processing;
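Step (203) splits the region-score map into character instances with a watershed transform. As a much simpler stand-in, the sketch below labels connected components of the thresholded score map, which behaves the same way whenever the character Gaussians do not touch; a real implementation would run watershed (e.g. OpenCV's) so that touching characters are still split along the valleys between peaks:

```python
import numpy as np
from collections import deque

def split_characters(score_map, thresh=0.5):
    """Label connected regions of score_map above thresh (4-connectivity).

    Returns (labels, n): an int map where 0 is background and 1..n are
    character instance ids.
    """
    mask = score_map > thresh
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and labels[i, j] == 0:
                n += 1
                labels[i, j] = n
                q = deque([(i, j)])
                while q:  # flood fill the current character blob
                    y, x = q.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = n
                            q.append((ny, nx))
    return labels, n

# two separated blobs -> two character instances
score = np.zeros((8, 16), dtype=np.float32)
score[2:5, 2:5] = 1.0
score[2:5, 10:13] = 1.0
labels, n = split_characters(score)
print(n)  # 2
```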
(204) Applying the idea of transfer learning, initializing the improved CRAFT network with a pre-trained model:
first, the CRAFT network is trained with the synthetic data set and optimized with an Adam optimizer; the network is then fine-tuned with multiple real data sets, mixing in the SynthText data set at a 1:5 ratio to ensure that character regions are reliably separated, and applying online hard example mining at a 1:3 ratio; data augmentation techniques such as cropping, rotation, or color changes may also be applied.
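The 1:5 SynthText mix and 1:3 hard-example mix used during fine-tuning can be realised with a simple interleaved sampler. The sketch below only illustrates the mixing schedule; the data sets are placeholder lists and the function name is ours:

```python
import random

def mixed_batches(real, synth, steps, synth_ratio=5, rng=None):
    """Yield `steps` samples, drawing 1 synthetic sample for every
    `synth_ratio` real samples (the 1:5 SynthText mix; swap in a pool of
    hard examples with synth_ratio=3 for the 1:3 online hard mining mix)."""
    rng = rng or random.Random(0)
    for step in range(steps):
        if step % (synth_ratio + 1) == synth_ratio:
            yield ("synth", rng.choice(synth))
        else:
            yield ("real", rng.choice(real))

samples = list(mixed_batches(["real_a", "real_b"], ["synthtext_a"], steps=12))
print(sum(1 for src, _ in samples if src == "synth"))  # 2
```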
Then, the CRAFT network is trained with a real data set containing quadrilateral labels together with the SynthText data set, and part of the data is split off as a test set to tune the network parameters.
(3) The irregular text correction network SCRN is trained using the real data set, and the network framework is shown in fig. 2.
(4) The SCRN is combined with the SEED network, and the combined SCRN-SEED network is trained as the recognition network. The framework of the recognition module in the original SEED network is shown in fig. 3.
The image correction module in the original SEED network is replaced with the trained SCRN to construct the SCRN-SEED network, which improves the correction effect and recognition accuracy for irregular text and reduces the difficulty of model training. A pretrained language model of the semantic model FastText is downloaded for the target recognition language and used to initialize the parameters; the SCRN-SEED network is then preliminarily trained with the test set, and the network parameters are adjusted according to the training effect.
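Initialising the semantic branch from FastText amounts to loading pretrained word vectors and using them as supervision targets for the predicted global semantic feature. The sketch below is an assumption-laden miniature: the 3-dimensional vectors stand in for real 300-dimensional FastText embeddings, and the lookup-with-zero-fallback is one plausible handling of out-of-vocabulary words:

```python
import numpy as np

# Stand-in for vectors loaded from a FastText .vec file
# (each line: a word followed by its embedding components).
PRETRAINED = {
    "stop": np.array([0.1, 0.9, 0.2]),
    "exit": np.array([0.2, 0.8, 0.1]),
}

def semantic_target(transcription, table):
    """Look up the embedding that supervises the semantic branch;
    unknown words fall back to a zero vector."""
    vec = table.get(transcription.lower())
    return vec if vec is not None else np.zeros(3)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

t = semantic_target("STOP", PRETRAINED)
print(cosine(t, PRETRAINED["exit"]) > 0.9)  # semantically close words score high
```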
(5) The improved CRAFT network is connected with the SCRN-SEED network, and the complete model is constructed and trained.
A minimum rectangular box containing all the characters is generated from the polygon of arbitrarily shaped text; the rectangular box is cropped, the cropped picture format is adjusted, and the picture is input into the SCRN-SEED network to complete the model. The constructed model framework, shown in fig. 4, comprises three parts: the CRAFT network as the detector, the SCRN network as the correction network, and the SEED network as the recognizer. The model is trained with a validation set and the parameters with the best training effect are kept; natural scene pictures are then input into the trained model to perform automatic character detection and recognition. Fig. 5 shows the effect at each stage when a natural scene picture is localized and recognized by the model.
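The handoff from detector to recogniser, taking the minimum rectangle around a text polygon and cropping it, can be sketched as follows (axis-aligned for simplicity; a production system might instead use a rotated minimum-area rectangle such as OpenCV's minAreaRect):

```python
import numpy as np

def crop_min_rect(image, polygon):
    """Crop the smallest axis-aligned rectangle containing every vertex
    of an arbitrarily shaped text polygon, clamped to the image bounds."""
    pts = np.asarray(polygon)
    h, w = image.shape[:2]
    x0 = max(int(pts[:, 0].min()), 0)
    y0 = max(int(pts[:, 1].min()), 0)
    x1 = min(int(np.ceil(pts[:, 0].max())), w)
    y1 = min(int(np.ceil(pts[:, 1].max())), h)
    return image[y0:y1, x0:x1]

img = np.arange(100 * 200).reshape(100, 200)
poly = [(30, 10), (90, 12), (88, 40), (32, 44)]  # a slightly skewed text polygon
print(crop_min_rect(img, poly).shape)  # (34, 60)
```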
Claims (5)
1. The natural scene character detection and identification method based on the CRAFT and the SCRN-SEED framework is characterized by comprising the following steps:
(1) establishing an image data set by utilizing the real data set and the synthetic data set, and dividing the image data set into a training set and a testing set;
(2) training the CRAFT network with an image data set:
(201) improving the CRAFT network, taking a ResNet50 network as a backbone network, inputting the pictures in the synthetic data set into the improved CRAFT network for feature extraction, and outputting a region score and an affinity score;
(202) encoding the two scores through Gaussian heatmap mapping to generate a Gaussian heatmap;
(203) splitting the complete text in an input picture into individual characters using a watershed algorithm, and generating polygons of arbitrarily shaped text from the characters through post-processing;
(204) initializing the improved CRAFT network with a pre-trained model, applying the idea of transfer learning;
(3) training the irregular-text correction network SCRN with the real data set;
(4) combining the SCRN with the SEED network, and training the combined SCRN-SEED network;
(5) connecting the improved CRAFT network with the SCRN-SEED network, and constructing and training the complete model.
2. The method for detecting and recognizing characters in natural scene according to claim 1, wherein the step of initializing the improved CRAFT network by using the pre-training model based on the idea of transfer learning comprises:
first, the CRAFT network is trained with the synthetic data set and optimized with an Adam optimizer; the network is then fine-tuned with multiple real data sets, mixing in the SynthText data set at a 1:5 ratio and applying online hard example mining at a 1:3 ratio;
then, the CRAFT network is trained with a real data set containing quadrilateral labels together with the SynthText data set, and part of the data is split off as a test set to tune the network parameters.
3. The method for detecting and recognizing natural scene characters as claimed in claim 1, wherein the step of combining the SCRN with the SEED network and training the combined SCRN-SEED network comprises:
replacing the image correction module in the SEED network with the trained SCRN, initializing the parameters with a pretrained language model of the semantic model FastText, preliminarily training the resulting SCRN-SEED network with the test set, and adjusting the network parameters according to the training effect.
4. The natural scene character detection and recognition method of claim 1, wherein the step of connecting the improved CRAFT network with the SCRN-SEED network and constructing and training the complete model comprises: generating a minimum rectangular box containing all the characters from the polygons of arbitrarily shaped text, cropping the rectangular box, adjusting the format of the cropped pictures, and inputting them into the SCRN-SEED network to complete the model; training the model with a validation set and keeping the parameters with the best training effect; and then inputting natural scene pictures into the model to perform automatic character detection and recognition.
5. The natural scene character detection and recognition method of claim 1, wherein the real data set comes from the ICDAR2013, ICDAR2015, ICDAR2017, MSRA-TD500, TotalText, and CTW-1500 databases, and the synthetic data set is the SynthText data set;
each picture in the image data set is resized, and the pictures in the data set are converted to the LMDB format.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111530794.9A CN114255464A (en) | 2021-12-14 | 2021-12-14 | Natural scene character detection and identification method based on CRAFT and SCRN-SEED framework |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111530794.9A CN114255464A (en) | 2021-12-14 | 2021-12-14 | Natural scene character detection and identification method based on CRAFT and SCRN-SEED framework |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114255464A true CN114255464A (en) | 2022-03-29 |
Family
ID=80792318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111530794.9A Pending CN114255464A (en) | 2021-12-14 | 2021-12-14 | Natural scene character detection and identification method based on CRAFT and SCRN-SEED framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114255464A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114972947A (en) * | 2022-07-26 | 2022-08-30 | 之江实验室 | Depth scene text detection method and device based on fuzzy semantic modeling |
CN114972947B (en) * | 2022-07-26 | 2022-12-06 | 之江实验室 | Depth scene text detection method and device based on fuzzy semantic modeling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111325203B (en) | American license plate recognition method and system based on image correction | |
CN109241894B (en) | Bill content identification system and method based on form positioning and deep learning | |
CN111723585B (en) | Style-controllable image text real-time translation and conversion method | |
CN111626292B (en) | Text recognition method of building indication mark based on deep learning technology | |
CN113537227B (en) | Structured text recognition method and system | |
CN112307919B (en) | Improved YOLOv 3-based digital information area identification method in document image | |
CN111242024A (en) | Method and system for recognizing legends and characters in drawings based on machine learning | |
CN110796131A (en) | Chinese character writing evaluation system | |
CN112069900A (en) | Bill character recognition method and system based on convolutional neural network | |
Tardón et al. | Optical music recognition for scores written in white mensural notation | |
CN116704523B (en) | Text typesetting image recognition system for publishing and printing equipment | |
CN115880566A (en) | Intelligent marking system based on visual analysis | |
CN115240210A (en) | System and method for auxiliary exercise of handwritten Chinese characters | |
CN114255464A (en) | Natural scene character detection and identification method based on CRAFT and SCRN-SEED framework | |
CN117437647B (en) | Oracle character detection method based on deep learning and computer vision | |
CN109508716B (en) | Image character positioning method and device | |
CN109147002B (en) | Image processing method and device | |
CN114581932A (en) | Picture table line extraction model construction method and picture table extraction method | |
CN113989806A (en) | Extensible CRNN bank card number identification method | |
CN111832497B (en) | Text detection post-processing method based on geometric features | |
CN110766001B (en) | Bank card number positioning and end-to-end identification method based on CNN and RNN | |
CN111414889A (en) | Financial statement identification method and device based on character identification | |
CN115311666A (en) | Image-text recognition method and device, computer equipment and storage medium | |
Nath et al. | Improving various offline techniques used for handwritten character recognition: a review | |
CN113657162A (en) | Bill OCR recognition method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||