CN114255464A - Natural scene character detection and identification method based on CRAFT and SCRN-SEED framework


Info

Publication number
CN114255464A
CN114255464A (Application CN202111530794.9A)
Authority
CN
China
Prior art keywords
network
training
scrn
data set
craft
Prior art date
Legal status
Pending
Application number
CN202111530794.9A
Other languages
Chinese (zh)
Inventor
叶堂华 (Ye Tanghua)
孙乐 (Sun Le)
朱均可 (Zhu Junke)
刘凯 (Liu Kai)
Current Assignee
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202111530794.9A
Publication of CN114255464A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods


Abstract

The invention discloses a natural scene character detection and recognition method based on the CRAFT and SCRN-SEED frameworks, which comprises the following steps: (1) establishing an image data set from a real data set and a synthetic data set, and dividing it into a training set and a test set; (2) training a CRAFT network with the image data set; (3) training the irregular-text correction network SCRN with the real data set; (4) combining the SCRN with the SEED network, and training the combined SCRN-SEED network; (5) connecting the CRAFT network with the SCRN-SEED network to construct a complete model, and training it. The method fully detects curved, deformed, or long text instances: it locates each character precisely and then links the detected characters into a single text line through an affinity mechanism, making it suitable for curved, deformed, or extremely long text. By correcting irregular text pictures and using semantic information to capture global context, it can also accurately recognize low-quality text instances.

Description

Natural scene character detection and identification method based on CRAFT and SCRN-SEED framework
Technical Field
The invention relates to the technical field of character detection methods, in particular to a natural scene character detection and identification method based on CRAFT and SCRN-SEED frameworks.
Background
Optical Character Recognition (OCR) traditionally refers to analyzing a scanned document image to recognize the character information it contains. The technique assumes that the input image has a clean background, simple fonts, and regularly arranged characters, and it reaches a high recognition level when these requirements are met. Scene Text Recognition (STR) refers to recognizing text information in natural scene pictures and is far harder than recognizing text in scanned documents, because characters in natural scenes appear in extremely varied forms: text lines may be horizontal, vertical, curved, rotated, or twisted; character regions may be occluded or blurred; and backgrounds are diverse, with text appearing on flat, curved, or wrinkled surfaces, complex interfering textures near character regions, or non-character regions whose textures resemble characters, such as sand, grass, fences, and brick walls.
Neural-network-based scene text detection methods have achieved good results and far surpass traditional techniques in detection and recognition, yet they still cannot handle curvature, blur, and interfering textures in natural scenes well. Existing natural scene character detection methods have the following problems. First, character localization is inaccurate: traditional text localization frameworks mostly attend to whole lines of text, require a large receptive field, and mark a text's position with a single rectangular box, which is unsuitable for curved, deformed, or extremely long text and makes accurate localization and labeling difficult. Second, character recognition is inaccurate: many proposed encoder-decoder recognition methods handle curved text, but most rely on local visual features and ignore global semantic information, so recognition accuracy drops sharply under image blur, uneven illumination, incomplete characters, and similar conditions.
Disclosure of Invention
The purpose of the invention is as follows: in view of the above problems, the invention aims to provide a natural scene character detection and recognition method based on the CRAFT and SCRN-SEED frameworks.
The technical scheme is as follows: the invention discloses a natural scene character detection and identification method based on CRAFT and SCRN-SEED frameworks, which comprises the following steps:
(1) establishing an image data set by utilizing the real data set and the synthetic data set, and dividing the image data set into a training set and a testing set;
(2) training the CRAFT network with an image data set:
(201) improving the CRAFT network by taking a ResNet50 network as the backbone, inputting the pictures in the synthetic data set into the improved CRAFT network for feature extraction, and outputting a region score and an affinity score;
(202) encoding the two scores via Gaussian heatmap mapping to generate a Gaussian heatmap;
(203) cutting the complete text in an input picture into single characters using the watershed algorithm, and generating polygons of arbitrarily shaped texts from the characters through post-processing;
(204) initializing the improved CRAFT network with a pre-trained model, applying the idea of transfer learning;
(3) training the irregular-text correction network SCRN with the real data set;
(4) combining the SCRN with the SEED network, and training the combined SCRN-SEED network;
(5) connecting the improved CRAFT network with the SCRN-SEED network to construct a complete model, and training it.
Further, the step of initializing the improved CRAFT network by using the pre-training model according to the idea of transfer learning includes:
first, the CRAFT network is trained with the synthetic data set and optimized with an Adam optimizer, and the network is then fine-tuned with multiple real data sets; during fine-tuning, the SynthText data set is used at a ratio of 1:5 and online hard example mining is applied at a ratio of 1:3;
then, the CRAFT network is trained with a real data set containing quadrilateral labels and the SynthText data set, and part of the data is split off as a test set to tune the network parameters.
Further, the step of combining the SCRN and the SEED network and training the combined SCRN-SEED network comprises:
replacing the image correction module in the SEED network with the trained SCRN to construct the SCRN-SEED network, initializing the parameters with the pre-trained language model of the semantic model FastText, preliminarily training the SCRN-SEED network with the test set, and adjusting the network parameters according to the training effect.
Further, the step of connecting the improved CRAFT network with the SCRN-SEED network, constructing a complete model, and training it comprises: generating a minimum rectangular box containing all the characters from the polygons of arbitrarily shaped texts, cropping the rectangular box, adjusting the format of the cropped pictures, and inputting them into the SCRN-SEED network to complete the model; the model is then trained with a validation set, the parameters with the best training effect are kept, and natural scene pictures are input into the model for automatic character detection and recognition.
Further, the real data sets are from ICDAR2013, ICDAR2015, ICDAR2017, MSRA-TD500, TotalText, CTW-1500, etc., and the synthetic data set is the SynthText data set;
each picture in the image data set is resized, and the pictures in the data set are converted to LMDB format.
Advantageous effects: compared with the prior art, the invention has the following notable advantages:
1. The method fully detects curved, deformed, or long text instances: it locates each character precisely and then links the detected characters into one text line through an affinity mechanism, so it only needs to attend to the spacing between characters rather than the whole text line, requires no large receptive field, and suits curved, deformed, or extremely long text;
2. The invention accurately recognizes low-quality text instances: the correction module of the SCRN corrects irregular text pictures according to the detector's output, using the center line and several geometric attributes of each character instance, which yields a better correction effect; the corrected text picture is then recognized by the recognition module of the SEED network, which uses semantic information to predict global context, greatly improving accuracy on defective, blurred, and similar text.
Drawings
FIG. 1 is a diagram of the basic framework of the improved CRAFT model of the present invention;
FIG. 2 is a basic framework diagram of the irregular-text correction network SCRN employed in the present invention;
FIG. 3 is a basic framework diagram of the recognition module of the SEED network employed in the present invention;
FIG. 4 is a diagram of an overall model structure of the method for detecting and identifying characters in a natural scene according to the present invention;
FIG. 5 is a diagram illustrating an example effect of the natural scene character detection and recognition method of the present invention.
Detailed Description
The method for detecting and identifying the natural scene characters based on the CRAFT and the SCRN-SEED framework comprises the following steps:
(1) Establishing an image data set from the real data set and the synthetic data set, dividing it into a training set and a test set, resizing each picture in the image data set, and converting the pictures in the data set to LMDB format.
The real data sets are ICDAR2013, ICDAR2015, ICDAR2017, MSRA-TD500, TotalText, and CTW-1500, and the synthetic data set is the SynthText data set.
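As a minimal sketch of this preparation step (the 640x640 target size, JPEG re-encoding, key scheme, and map size are illustrative assumptions, not specified by the patent), the resizing and LMDB conversion could look like this in Python:

```python
import io
import lmdb
from pathlib import Path
from PIL import Image

def build_lmdb(image_dir: str, lmdb_path: str, size=(640, 640)) -> None:
    """Resize each picture and store it in an LMDB database for fast training I/O."""
    env = lmdb.open(lmdb_path, map_size=1 << 34)  # upper bound on DB size, adjust as needed
    n = 0
    with env.begin(write=True) as txn:
        for img_file in sorted(Path(image_dir).glob("*.jpg")):
            img = Image.open(img_file).convert("RGB").resize(size)
            buf = io.BytesIO()
            img.save(buf, format="JPEG")
            txn.put(f"image-{n:09d}".encode(), buf.getvalue())  # hypothetical key scheme
            n += 1
        txn.put(b"num-samples", str(n).encode())
    env.close()
```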
(2) The CRAFT network is trained with the image data set; the network framework is shown in FIG. 1.
(201) Improving the CRAFT network by taking a ResNet50 network as the backbone, inputting the pictures in the synthetic data set into the improved CRAFT network for feature extraction, and outputting a Region Score and an Affinity Score;
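A minimal PyTorch sketch of this step follows. It keeps only the ResNet50 backbone with a small two-channel head and omits the U-shaped skip connections that the full CRAFT architecture uses for upsampling; the layer widths and 640x640 input are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn
import torchvision

class CraftResNet50(nn.Module):
    """ResNet50 backbone with a small head emitting region and affinity score maps."""
    def __init__(self):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool + fc
        self.head = nn.Sequential(
            nn.Conv2d(2048, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 1),  # channel 0: region score, channel 1: affinity score
        )

    def forward(self, x):
        feats = self.backbone(x)                   # 1/32 resolution; the full model
        scores = torch.sigmoid(self.head(feats))   # upsamples via skip connections
        return scores[:, 0], scores[:, 1]

region, affinity = CraftResNet50()(torch.randn(1, 3, 640, 640))
```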
(202) Encoding the two scores via Gaussian heatmap mapping to generate a Gaussian heatmap;
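The Gaussian heatmap ground truth can be rendered by warping an isotropic 2D Gaussian template into each character box, the way CRAFT-style encodings are commonly generated; the template size and standard deviation below are assumptions:

```python
import cv2
import numpy as np

def gaussian_template(size: int = 64, sigma_ratio: float = 0.25) -> np.ndarray:
    """Square isotropic 2D Gaussian with peak 1.0 at the centre."""
    xs = np.arange(size, dtype=np.float32) - (size - 1) / 2.0
    g = np.exp(-(xs ** 2) / (2 * (size * sigma_ratio) ** 2))
    return np.outer(g, g)

def render_heatmap(hw: tuple, char_quads) -> np.ndarray:
    """Warp the Gaussian template into every character quadrilateral."""
    heatmap = np.zeros(hw, dtype=np.float32)
    tpl = gaussian_template()
    s = tpl.shape[0] - 1
    src = np.float32([[0, 0], [s, 0], [s, s], [0, s]])
    for quad in char_quads:                        # quad: 4 (x, y) corners, same order
        M = cv2.getPerspectiveTransform(src, np.float32(quad))
        warped = cv2.warpPerspective(tpl, M, (hw[1], hw[0]))
        heatmap = np.maximum(heatmap, warped)      # overlapping characters keep the max
    return heatmap
```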
(203) Cutting the complete text in an input picture into single characters using the watershed algorithm, and generating polygons of arbitrarily shaped texts from the characters through post-processing;
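A sketch of the character-splitting step with OpenCV's watershed is shown below; the two score thresholds are illustrative assumptions:

```python
import cv2
import numpy as np

def split_characters(region_score: np.ndarray,
                     low: float = 0.4, high: float = 0.7) -> np.ndarray:
    """Split a region-score map into per-character labels with watershed."""
    text_mask = (region_score > low).astype(np.uint8)   # candidate text pixels
    sure_fg = (region_score > high).astype(np.uint8)    # confident character cores
    _, markers = cv2.connectedComponents(sure_fg)       # one seed per character
    markers = markers + 1                               # reserve 0 for "unknown"
    markers[(text_mask == 1) & (sure_fg == 0)] = 0      # ambiguous ring around cores
    canvas = cv2.cvtColor((region_score * 255).astype(np.uint8),
                          cv2.COLOR_GRAY2BGR)           # watershed needs 3 channels
    return cv2.watershed(canvas, markers)               # labels >= 2 are characters
```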
(204) Applying the idea of transfer learning, initializing the improved CRAFT network with a pre-trained model:
first, the CRAFT network is trained with the synthetic data set and optimized with an Adam optimizer, then fine-tuned with multiple real data sets; during fine-tuning, the SynthText data set is used at a ratio of 1:5 to ensure that character regions are reliably separated, online hard example mining is applied at a ratio of 1:3, and data augmentation techniques such as cropping, rotation, or color changes may also be applied.
Then, the CRAFT network is trained with a real data set containing quadrilateral labels and the SynthText data set, and part of the data is split off as a test set to tune the network parameters.
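A sketch of this fine-tuning recipe is given below, assuming PyTorch and a per-pixel loss map; the reading of 1:5 as roughly one SynthText sample per five real samples, the batch size, and the learning rate are assumptions:

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def mixed_loader(real_ds, synth_ds, batch_size=8):
    """Draw real and SynthText samples at an expected 5:1 (real:synthetic) rate."""
    combined = ConcatDataset([real_ds, synth_ds])
    weights = [5.0 / len(real_ds)] * len(real_ds) + \
              [1.0 / len(synth_ds)] * len(synth_ds)
    sampler = WeightedRandomSampler(weights, num_samples=len(combined))
    return DataLoader(combined, batch_size=batch_size, sampler=sampler)

def ohem_loss(pixel_loss, pos_mask, ratio=3):
    """Online hard example mining: keep all positive pixels plus the hardest
    negatives at a 3:1 negative:positive ratio."""
    pos = pixel_loss[pos_mask]
    neg = pixel_loss[~pos_mask]
    k = min(neg.numel(), ratio * max(pos.numel(), 1))
    hard_neg, _ = torch.topk(neg, k)
    return (pos.sum() + hard_neg.sum()) / (pos.numel() + k + 1e-6)

# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # Adam, per the patent
```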
(3) The irregular-text correction network SCRN is trained with the real data set; the network framework is shown in FIG. 2.
(4) Combining the SCRN with the SEED network, and training the combined SCRN-SEED network to serve as the recognition network. The framework of the recognition module in the original SEED network is shown in FIG. 3.
The trained SCRN replaces the image correction module in the original SEED network to construct the SCRN-SEED network, which improves the correction effect and recognition accuracy on irregular texts and reduces the difficulty of model training. A pre-trained language model of the semantic model FastText is downloaded according to the recognition language required, the parameters are initialized with the pre-trained model, the SCRN-SEED network is preliminarily trained with the test set, and the network parameters are adjusted according to the training effect.
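A sketch of the FastText-based semantic supervision follows; the embedding file name, feature dimension, and cosine loss are assumptions in the spirit of SEED, where a predicted holistic semantic vector is supervised with FastText embeddings of the ground-truth transcriptions:

```python
import fasttext
import torch
import torch.nn as nn
import torch.nn.functional as F

# File name is an assumption: a pre-trained FastText model for the target language.
ft = fasttext.load_model("cc.en.300.bin")

class SemanticHead(nn.Module):
    """Predicts a global semantic vector from the encoder's holistic feature."""
    def __init__(self, feat_dim: int = 512, sem_dim: int = 300):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(feat_dim, sem_dim),
                                nn.ReLU(inplace=True),
                                nn.Linear(sem_dim, sem_dim))

    def forward(self, holistic_feature):
        return self.fc(holistic_feature)

def semantic_loss(pred_sem, transcripts):
    """Cosine distance between predicted semantics and FastText word embeddings."""
    targets = torch.stack([torch.from_numpy(ft.get_word_vector(t))
                           for t in transcripts]).to(pred_sem.device)
    return (1.0 - F.cosine_similarity(pred_sem, targets)).mean()
```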
(5) Connecting the improved CRAFT network with the SCRN-SEED network, constructing a complete model, and training it.
A minimum rectangular box containing all the characters is generated from the polygon of an arbitrarily shaped text, the rectangular box is cropped, the cropped picture is adjusted in format, and the picture is input into the SCRN-SEED network to complete the model. The constructed model, shown in FIG. 4, comprises three parts: the CRAFT network serves as the detector, the SCRN network as the correction network, and the SEED network as the recognizer. The model is trained with a validation set and the parameters with the best training effect are kept; pictures of natural scenes are then input into the trained model to perform automatic character detection and recognition. FIG. 5 shows the effect of each stage when a natural scene picture is localized and recognized by the model.
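A sketch of the cropping step with OpenCV follows; the 100x32 recognizer input size and the corner-ordering assumption for cv2.boxPoints are illustrative:

```python
import cv2
import numpy as np

def crop_min_rect(image: np.ndarray, polygon: np.ndarray,
                  out_size=(100, 32)) -> np.ndarray:
    """Fit the minimum rotated rectangle around a detected text polygon,
    then warp it to the fixed input size expected by the recognizer."""
    rect = cv2.minAreaRect(polygon.astype(np.float32))  # (centre, (w, h), angle)
    box = cv2.boxPoints(rect).astype(np.float32)        # 4 corners, bl-tl-tr-br order
    w, h = out_size
    dst = np.float32([[0, h], [0, 0], [w, 0], [w, h]])  # matches boxPoints order
    M = cv2.getPerspectiveTransform(box, dst)
    return cv2.warpPerspective(image, M, out_size)
```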

Claims (5)

1. The natural scene character detection and identification method based on the CRAFT and the SCRN-SEED framework is characterized by comprising the following steps:
(1) establishing an image data set by utilizing the real data set and the synthetic data set, and dividing the image data set into a training set and a testing set;
(2) training the CRAFT network with an image data set:
(201) improving the CRAFT network by taking a ResNet50 network as the backbone, inputting the pictures in the synthetic data set into the improved CRAFT network for feature extraction, and outputting a region score and an affinity score;
(202) encoding the two scores via Gaussian heatmap mapping to generate a Gaussian heatmap;
(203) cutting the complete text in an input picture into single characters using the watershed algorithm, and generating polygons of arbitrarily shaped texts from the characters through post-processing;
(204) initializing the improved CRAFT network with a pre-trained model, applying the idea of transfer learning;
(3) training the irregular-text correction network SCRN with the real data set;
(4) combining the SCRN with the SEED network, and training the combined SCRN-SEED network;
(5) connecting the improved CRAFT network with the SCRN-SEED network to construct a complete model, and training it.
2. The method for detecting and recognizing characters in natural scene according to claim 1, wherein the step of initializing the improved CRAFT network by using the pre-training model based on the idea of transfer learning comprises:
first, the CRAFT network is trained with the synthetic data set and optimized with an Adam optimizer, and the network is then fine-tuned with multiple real data sets; during fine-tuning, the SynthText data set is used at a ratio of 1:5 and online hard example mining is applied at a ratio of 1:3;
then, the CRAFT network is trained with a real data set containing quadrilateral labels and the SynthText data set, and part of the data is split off as a test set to tune the network parameters.
3. The method for detecting and recognizing natural scene characters as claimed in claim 1, wherein the step of combining the SCRN with the SEED network and training the combined SCRN-SEED network comprises:
replacing the image correction module in the SEED network with the trained SCRN, initializing the parameters with the pre-trained language model of the semantic model FastText, preliminarily training the combined SCRN-SEED network with the test set, and adjusting the network parameters according to the training effect.
4. The natural scene character detection and recognition method of claim 1, wherein the step of connecting the improved CRAFT network with the SCRN-SEED network, constructing a complete model, and training it comprises: generating a minimum rectangular box containing all the characters from the polygons of arbitrarily shaped texts, cropping the rectangular box, adjusting the format of the cropped pictures, inputting them into the SCRN-SEED network to complete the model, training the model with a validation set, keeping the parameters with the best training effect, and inputting natural scene pictures into the model to perform automatic character detection and recognition.
5. The natural scene character detection and recognition method of claim 1, wherein the real data sets are from the ICDAR2013, ICDAR2015, ICDAR2017, MSRA-TD500, TotalText, and CTW-1500 databases, and the synthetic data set is the SynthText data set;
each picture in the image data set is resized, and the pictures in the data set are converted to LMDB format.
CN202111530794.9A (filed 2021-12-14, priority 2021-12-14): Natural scene character detection and identification method based on CRAFT and SCRN-SEED framework. Status: Pending; published as CN114255464A.

Priority Applications (1)

Application Number: CN202111530794.9A
Priority / Filing Date: 2021-12-14
Title: Natural scene character detection and identification method based on CRAFT and SCRN-SEED framework

Publications (1)

Publication Number: CN114255464A, published 2022-03-29

Family

ID: 80792318

Family Applications (1)

Application Number: CN202111530794.9A (priority date 2021-12-14, filing date 2021-12-14), status: Pending

Country Status (1)

CN: CN114255464A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

CN114972947A *: priority date 2022-07-26, publication date 2022-08-30, assignee 之江实验室 (Zhejiang Lab), title "Depth scene text detection method and device based on fuzzy semantic modeling"
CN114972947B *: granted 2022-12-06, same title and assignee


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination