CN112580507A - Deep learning text character detection method based on image moment correction - Google Patents

Deep learning text character detection method based on image moment correction

Info

Publication number
CN112580507A
CN112580507A (application CN202011506599.8A)
Authority
CN
China
Prior art keywords
character
loss
text
loss function
sample
Prior art date
Legal status: Granted
Application number
CN202011506599.8A
Other languages
Chinese (zh)
Other versions
CN112580507B (en)
Inventor
田辉 (Tian Hui)
刘其开 (Liu Qikai)
Current Assignee
Hefei High Dimensional Data Technology Co ltd
Original Assignee
Hefei High Dimensional Data Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hefei High Dimensional Data Technology Co ltd
Priority to CN202011506599.8A
Publication of CN112580507A
Application granted
Publication of CN112580507B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a deep learning text character detection method based on image moment correction, which specifically comprises the following steps: preparing a data set; manually correcting the pre-labeled boxes; generating a heat map label in Gaussian heat map form from the boxes; defining the neural network structure and the loss function; pre-training; expanding a training sample set of the actual scene; performing adaptive binarization on the expanded training sample set, calculating the Hu moment feature vector of each character and taking the mean of the feature vector as the character's auxiliary label; modifying the form of the loss function; and performing fine-tuning training as well as model testing and verification. The method combines the heat map label and the moment feature vector label to form an optimized loss function, which improves the accuracy of the character boxes and overcomes over-segmentation and under-segmentation of character boxes; preprocessing the expanded sample set compensates for the shortage of character-level labels, giving the method better character detection generalization.

Description

Deep learning text character detection method based on image moment correction
Technical Field
The invention belongs to the field of target detection, and particularly relates to a deep learning text character detection method based on image moment correction.
Background
At present, text detection is widely applied in the field of computer vision, for example in real-time translation, image retrieval, scene analysis, geographic positioning and navigation for the blind; it therefore has very high application value and research significance in scene understanding and text analysis.
The existing text detection methods fall into the following categories:
1. Traditional image processing methods based on hand-crafted feature detection, such as MSER (maximally stable extremal regions) and SWT (stroke width transform); they mainly handle text detection for printed fonts and print/scan scenes, and perform poorly on natural scene text.
2. Two-stage methods based on deep learning, which generate candidate regions, extract the corresponding features, fine-tune the network and output the corresponding text region boxes; their advantages are higher precision, good performance on small-scale target detection and shared computation, but inference is slow and the training period long.
3. One-stage methods based on deep learning, which skip candidate-box generation and directly predict the target's text region boxes end to end; they infer quickly, but are less precise than two-stage methods and detect small targets poorly.
Most existing text detection algorithms output the position coordinates of text line regions. For example, the reference network CTPN in existing text detection technology is an improvement on the Two-stage approach: building on Faster R-CNN, it adds improvements specific to horizontally or vertically arranged target text and outputs text line regions. Existing text detection algorithms are not accurate down to character-level text detection and thus provide limited information.
The existing character-level text detection algorithm is based on the idea of semantic segmentation: a Gaussian center heat map replaces a pixel-level block heat map as the label, two indices (a region score and a compactness score) are adopted to optimize the network, and in post-processing the probability map is binarized to obtain the final character boxes. Character-level text detection can output not only the coordinates of individual character boxes but also the coordinates of text line regions; the output information is richer and can meet broader customer needs. However, the existing character-level text detection algorithm is affected by its parameters and by complex Chinese text scenes, and over-segmentation or under-segmentation occurs on the segmented character boxes, corresponding respectively to the rectangular box and the blackened rectangular box shown in FIG. 4.
Disclosure of Invention
In order to solve the above problems, the present invention provides a deep learning text character detection method based on image moment correction, which includes the following steps:
A: preparing a data set, namely pre-labeling randomly sampled samples in the data set and storing the box of each character of each sample;
B: manually correcting inaccurately pre-labeled boxes, and generating a heat map label in Gaussian heat map form from the boxes;
C: defining the neural network structure and the loss function loss_cross;
D: performing preliminary pre-training using the network structure and loss function loss_cross determined in step C;
E: expanding a training sample set of the actual scene;
F: performing adaptive binarization on the training sample set expanded in step E, calculating the Hu moment feature vector of each character, and taking the mean of the feature vector as the character's auxiliary label;
G: modifying the form of the loss function by adding a regular-term branch, and performing fine-tuning training on the expanded training sample set with the modified loss function loss;
H: model testing and verification, namely varying the parameter theta used to generate the Gaussian heat maps from the pre-labeling and plotting the accuracy curve of the character boxes under different theta thresholds, so that a suitable parameter theta can be selected as required.
Further, in the present invention,
the data set in step A mainly comprises data from ICDAR2017, ICDAR2019 and CTW, and randomly sampled samples in the data set are pre-labeled with a public character-level segmentation model trained by EasyOCR.
Further, in the present invention,
inaccurate pre-labeling in step B specifically means that a character box is over-segmented or under-segmented;
over-segmentation means that the character box does not contain the whole current character, and under-segmentation means that the character box contains other characters or symbols besides the current character.
Further, in the present invention,
in step B the box is mapped onto a two-dimensional Gaussian map by perspective transformation to generate the label in Gaussian heat map form.
Further, in the present invention,
the specific operation of determining the neural network structure in step C is as follows:
the network takes a sample of preset size as input, uses the VGG16 reference network as the feature extraction network and U-net as the decoding network,
and outputs a pixel score matrix representing the confidence region;
the loss function loss_cross in step C is determined as follows:
loss_cross adopts pixel-level cross entropy, namely a theta threshold is set on the label heat map, and pixels above the theta threshold are regarded as character regions (class 1) while pixels below it are regarded as non-character regions (class 0).
Further, in the present invention,
the method for expanding the training sample set of the actual scene in step E comprises taking random screenshots of, or photographing from different angles, a computer screen interface containing documents, pre-labeling with the pre-trained model, and manually correcting in the manner of step B.
Further, in the present invention,
the theta threshold is obtained by the following steps:
performing Gaussian smoothing on the heat map label and calculating its gradient map;
determining the connected regions under different thresholds with the watershed algorithm, and taking the minimum circumscribed rectangle of each connected region, i.e. the character box under that threshold;
randomly sampling a number of characters for statistics, judging the accuracy of the minimum bounding boxes under the corresponding thresholds, and taking the threshold with the highest accuracy as the theta threshold.
Further, in the present invention,
the loss function loss modified in step G is the loss function loss_cross of step C plus an L2 loss:

loss = loss_cross + m * loss_L2

where

$$loss_{L2} = \frac{1}{mK}\sum_{i=1}^{m}\sum_{j=1}^{K}\left(y_{ij} - f(x_{ij})\right)^{2}$$

characterizes the L2 loss of the sample moments, m denotes the number of samples, K denotes the number of characters in a single sample, y_ij denotes the mean of the moment feature vector corresponding to the jth character in the ith sample, and f(x_ij) denotes the network's predicted mean of the moment feature vector for the jth character in the ith sample.
Further, in the present invention,
in step H the model test and verification samples are characters in text scenes photographed or screenshotted from randomly selected computer documents.
The invention has the following advantages:
The detection method of the invention proposes representing the center of a single character based on image moment features, providing more robust auxiliary information: an optimized loss function is formed by combining the Gaussian heat map with the moment features to improve the accuracy of the character boxes, and combining the segmentation task (heat map label) with the regression task (moment feature label) improves the model's character detection and segmentation ability, overcoming over-segmentation and under-segmentation of character boxes. In addition, samples are synthesized from screenshot text scenes to pre-train a preliminary character text detection model; real text samples are then pre-labeled and manually corrected, and the moment feature of each character in the real samples is calculated and used as a regular term of the loss function during fine tuning. This preprocessing compensates for the shortage of character-level labels on the one hand, and on the other hand gives the method better character detection generalization in actual printed, photographed or screenshot text scenes.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 shows a prior art character segmentation algorithm flow diagram;
FIG. 2 shows a flow diagram of a character segmentation algorithm of an embodiment of the present invention;
FIG. 3 illustrates an example of a sample label Gaussian heat map of the present invention;
FIG. 4 illustrates an example of the over-segmentation or under-segmentation phenomenon.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Because the sample background of a natural scene is complex and would bias the computed image moment features, the image moment feature values are computed only for screenshots with a computer-document background or for photographed specific scenes. Moments of different orders have different properties: taking raw (origin) moments or central moments as image features cannot guarantee simultaneous translation, rotation and scale invariance; central moments are only translation invariant, while normalized central moments are additionally scale invariant, and the Hu moment invariants built from them are also rotation invariant. The Hu moment vector is therefore used as auxiliary information to provide the network with more prior knowledge for training.
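As a quick, hedged illustration of that invariance (the glyph, angle and scale below are arbitrary choices, not values from the patent), the seven Hu moment invariants of a character image barely change under rotation and scaling:

```python
import cv2
import numpy as np

img = np.zeros((128, 128), np.uint8)
cv2.putText(img, "T", (30, 100), cv2.FONT_HERSHEY_SIMPLEX, 3, 255, 8)

def hu(a):
    """Seven Hu moment invariants of a single-channel image."""
    return cv2.HuMoments(cv2.moments(a)).flatten()

M = cv2.getRotationMatrix2D((64, 64), 30, 0.5)   # rotate 30 degrees, scale 0.5
transformed = cv2.warpAffine(img, M, (128, 128))
print(np.round(hu(img), 8))
print(np.round(hu(transformed), 8))  # nearly identical up to discretization error
```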
The invention discloses a deep learning text character detection method based on image moment correction, which comprises the following steps:
A: preparing a data set; the public Chinese data sets used in the method mainly comprise the ICDAR2017 data set, the ICDAR2019 data set and CTW (Chinese Text in the Wild) data. The CTW data has high diversity and complexity, comprising planar text, raised text, city street-view text, town street-view text, text under weak illumination, long-distance text, partially displayed text, and so on. For each image, all Chinese characters are annotated in the data set; for each Chinese character, its character category and bounding box are annotated. First, randomly sampled samples in the data set are pre-labeled with the public character-level segmentation model trained by EasyOCR, and the box of each character of each sample is stored.
B: a simple human-computer interaction labeling interface for fine correction is developed, similar to an object detection labeling tool; it automatically loads a picture and its corresponding JSON-format label, and character boxes with inaccurate pre-labels are then corrected manually in a pop-up dialog box. An inaccurate prediction means the box does not fully cover the current character (over-segmentation) or extends into adjacent characters, commas and other regions (under-segmentation); for specific examples see the rectangular box (over-segmentation) and the blackened rectangular box (under-segmentation) in FIG. 4. A label in Gaussian heat map form is then generated from each character's box, as in the sample label Gaussian map of FIG. 3: in this step the character's box is mapped onto a two-dimensional Gaussian map through perspective transformation to represent the character's heat map label, as in the sketch below.
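A minimal sketch of this label generation, under stated assumptions: the patch size, sigma and corner order are illustrative, and the patent does not spell out the recipe beyond warping a two-dimensional Gaussian onto each character box with a perspective transform.

```python
import cv2
import numpy as np

def gaussian_patch(size=64, sigma=0.35):
    """Isotropic 2D Gaussian on a size x size grid, peak value 1 at the center."""
    xs = np.linspace(-1.0, 1.0, size)
    xx, yy = np.meshgrid(xs, xs)
    return np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2)).astype(np.float32)

def render_heatmap(image_shape, char_boxes, patch_size=64):
    """char_boxes: list of 4x2 arrays, corners ordered TL, TR, BR, BL in (x, y)."""
    h, w = image_shape[:2]
    heatmap = np.zeros((h, w), np.float32)
    patch = gaussian_patch(patch_size)
    src = np.float32([[0, 0], [patch_size - 1, 0],
                      [patch_size - 1, patch_size - 1], [0, patch_size - 1]])
    for box in char_boxes:
        M = cv2.getPerspectiveTransform(src, np.float32(box))
        warped = cv2.warpPerspective(patch, M, (w, h))
        heatmap = np.maximum(heatmap, warped)  # overlapping characters keep the max
    return heatmap

label = render_heatmap((128, 128), [np.float32([[10, 10], [40, 12], [38, 50], [8, 48]])])
```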
C: the network structure and the loss function are defined. The network takes a sample of size h x w x 3 as input, uses the VGG16 reference network as the feature extraction network and an improved U-net as the decoding network, and outputs a pixel score matrix representing the confidence region (the specific structure is shown in FIG. 2), where h is the height of the input image, w is its width, and 3 is the number of RGB channels; a minimal sketch of such a network follows.
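A minimal PyTorch sketch, as an assumption of one plausible realization (the patent's "improved U-net", channel widths and tap points are not specified here); the fully connected moment branch anticipates the FIG. 2 description below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class VGGUNet(nn.Module):
    """VGG16 encoder + small U-net-style decoder; channel sizes are illustrative."""
    def __init__(self, moment_dim=7):  # 7 = length of a Hu moment vector
        super().__init__()
        vgg = torchvision.models.vgg16(weights=None).features
        self.stage1 = vgg[:16]    # up to conv3_3: 256 ch at 1/4 resolution
        self.stage2 = vgg[16:23]  # conv4_3: 512 ch at 1/8 resolution
        self.stage3 = vgg[23:30]  # conv5_3: 512 ch at 1/16 resolution
        self.up2 = nn.Sequential(nn.Conv2d(512 + 512, 256, 3, padding=1),
                                 nn.ReLU(inplace=True))
        self.up1 = nn.Sequential(nn.Conv2d(256 + 256, 128, 3, padding=1),
                                 nn.ReLU(inplace=True))
        self.score = nn.Conv2d(128, 1, 1)       # 1x1 conv -> pixel score matrix
        self.moment_head = nn.Sequential(       # FC branch for the moment vector
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, moment_dim))

    def forward(self, x):                       # x: (B, 3, h, w)
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        u2 = self.up2(torch.cat([F.interpolate(f3, size=f2.shape[2:]), f2], dim=1))
        u1 = self.up1(torch.cat([F.interpolate(u2, size=f1.shape[2:]), f1], dim=1))
        return torch.sigmoid(self.score(u1)), self.moment_head(u1)

scores, moments = VGGUNet()(torch.randn(1, 3, 256, 256))  # scores: (1, 1, 64, 64)
```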
The loss function uses pixel-level cross entropy: a theta threshold is set on the label heat map, and pixels above the theta threshold are identified as character regions (class 1) while pixels below it are identified as non-character regions (class 0).
Therefore, the accuracy of the parameter theta under different values needs to be compared in order to select the best parameter theta. The theta threshold is obtained by testing on actual training samples with the help of the watershed algorithm from graphics, in the following general steps:
first, Gaussian smoothing is applied to the label heat map and its gradient map is calculated; then the connected regions under different thresholds are determined according to the watershed algorithm and the minimum circumscribed rectangle of each connected region (i.e. the character box under that threshold) is taken; finally, a number of characters are randomly sampled for statistics, the accuracy of the minimum bounding boxes under each threshold is judged subjectively, and the threshold with relatively high accuracy is taken as the theta threshold, as in the sketch below.
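A hedged sketch of this procedure; as an assumption, the watershed flooding is approximated by cv2.connectedComponents on the smoothed, thresholded heat map, and the patent's manual accuracy judgment is left as a comment.

```python
import cv2
import numpy as np

def boxes_at_threshold(heatmap, theta):
    """Character boxes for one candidate theta: smooth, binarize, take the
    minimum circumscribed rectangle of every connected region."""
    smoothed = cv2.GaussianBlur(heatmap, (5, 5), 0)
    binary = (smoothed > theta).astype(np.uint8)
    num, labels = cv2.connectedComponents(binary)
    boxes = []
    for region in range(1, num):                   # label 0 is the background
        ys, xs = np.where(labels == region)
        pts = np.column_stack([xs, ys]).astype(np.float32)
        boxes.append(cv2.minAreaRect(pts))         # ((cx, cy), (w, h), angle)
    return boxes

heatmap = np.random.rand(256, 256).astype(np.float32)   # stand-in label heat map
candidates = {t: boxes_at_threshold(heatmap, t) for t in np.linspace(0.2, 0.8, 7)}
# sample characters at random for each candidate theta, judge box accuracy by
# hand as in the patent, and keep the theta with the highest accuracy
```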
D: pre-training; preliminary pre-training is performed with the network structure and the loss function defined in step C.
E: the training sample set of the actual scene is expanded by taking random screenshots of, or photographing from different angles, a computer screen interface containing documents, such as web pages and Word documents; the pre-trained model is used for pre-labeling, and manual correction is done in the manner of step B.
F: adaptive binarization is performed on the samples expanded in step E to obtain binary images, then the Hu moment feature vector of each character is calculated, and the mean of the Hu moment feature vector is taken as the character's auxiliary label. In theory, the mean moment features of character regions differ little from one another, while the moment feature values of character regions are much larger than those of non-character regions. Introducing a moment feature branch, on the one hand, tilts the model's attention toward character regions, which benefits detection; on the other hand, the moment feature mean can guide the network to learn more accurate character boxes, which benefits segmentation. A sketch of this computation follows.
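A sketch under stated assumptions: OpenCV's adaptive Gaussian thresholding stands in for the unspecified adaptive binarization, and the log scaling of the Hu values is a common practice rather than something the patent states.

```python
import cv2
import numpy as np

def hu_moment_vector(gray_char):
    """gray_char: single-channel uint8 crop of one character."""
    binary = cv2.adaptiveThreshold(gray_char, 255,
                                   cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                   cv2.THRESH_BINARY_INV, 31, 10)
    hu = cv2.HuMoments(cv2.moments(binary)).flatten()   # 7 invariants
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)  # compress dynamic range

def auxiliary_label(gray_char):
    """Mean of the Hu moment feature vector, used as the character's label."""
    return float(hu_moment_vector(gray_char).mean())

crop = np.full((48, 48), 255, np.uint8)
cv2.putText(crop, "A", (8, 40), cv2.FONT_HERSHEY_SIMPLEX, 1.5, 0, 3)
print(auxiliary_label(crop))
```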
G: the form of the loss function is modified by adding a regular-term branch, and fine-tuning training is performed on the expanded training sample set with the modified loss function. Model training with the expanded samples differs from pre-training in the following detail: the loss function of the network is modified, i.e. a regular-term branch taking the Hu moment feature vector as auxiliary label information is added, and the original cross entropy loss_cross is jointly trained with an added L2 loss over the character box moment vectors, the weight m taking a value of 0.01-0.05:

loss = loss_cross + m * loss_L2

where

$$loss_{L2} = \frac{1}{mK}\sum_{i=1}^{m}\sum_{j=1}^{K}\left(y_{ij} - f(x_{ij})\right)^{2}$$

characterizes the L2 loss of the sample moments, m denotes the number of samples, and K denotes the number of characters in a single sample; y_ij denotes the mean of the moment feature vector corresponding to the jth character in the ith sample and serves as the moment feature label, f(x_ij) denotes the network's predicted mean of the moment feature vector for the jth character in the ith sample, and L2 denotes the least-squares error. A sketch of this joint loss follows.
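A hedged PyTorch sketch of this joint loss; the reduction over pixels and characters and the weight name m_weight are my assumptions (the patent reuses the symbol m for both the sample count and the 0.01-0.05 weight).

```python
import torch
import torch.nn.functional as F

def joint_loss(pred_scores, label_heatmap, pred_moments, label_moments,
               theta=0.4, m_weight=0.03):
    """loss = loss_cross + m * loss_L2 from step G."""
    # pred_scores: (B, 1, H, W) sigmoid outputs; label_heatmap: (B, 1, H, W)
    target = (label_heatmap > theta).float()           # class 1 = character pixels
    loss_cross = F.binary_cross_entropy(pred_scores, target)
    # pred_moments / label_moments: (B, K) per-character moment-vector means
    loss_l2 = F.mse_loss(pred_moments, label_moments)
    return loss_cross + m_weight * loss_l2
```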
H: model testing and verification. The model of this method mainly aims at character detection in text scenes photographed from computer documents, so samples from this scene are used for testing and verification and the character segmentation accuracy is counted. Since the pre-labeled heat maps are affected by the parameter theta, the accuracy under different values of theta needs to be compared in order to select the best parameter theta: the parameter theta used to generate the Gaussian heat maps from the pre-labeling is varied, and the accuracy curve of the character boxes under different theta thresholds is plotted so that a suitable parameter theta can be selected as required, as in the plotting sketch below.
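A plotting sketch of this sweep; accuracy_at is a hypothetical stub standing in for the statistical check over sampled characters described above.

```python
import numpy as np
import matplotlib.pyplot as plt

def accuracy_at(theta):
    """Hypothetical stub: fraction of sampled character boxes judged correct
    at this theta; replace with the manual/statistical check of step H."""
    return float(np.exp(-8.0 * (theta - 0.45) ** 2))  # dummy bell-shaped curve

thetas = np.linspace(0.2, 0.8, 13)
plt.plot(thetas, [accuracy_at(t) for t in thetas], marker="o")
plt.xlabel("theta threshold")
plt.ylabel("character box accuracy")
plt.title("Accuracy of character boxes under different theta thresholds")
plt.show()
```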
FIG. 1 illustrates a prior art character segmentation algorithm.
The input sample is scaled to h x w x 3 as the network input, and a VGG16 reference network is adopted as the feature extraction network; the higher the stage of the extraction network, the more abstract the corresponding feature map, with the size halved at each stage. To fuse low-level and high-level feature information, the decoding network U-net upsamples the feature map of an output layer to the same size as that of a given stage of the extraction network so the two can be merged and fused; finally, a 1 x 1 convolution layer outputs a pixel score matrix representing the character-connection confidence region. The main idea is to predict the character detection box with a segmentation task, add a character-connection confidence matrix to the output branch to solve character localization in non-rectangular areas, and use a synthesized character data set for weakly supervised learning to complete the model's pre-training task, thereby improving character segmentation in general natural scenes.
FIG. 2 shows the character segmentation algorithm of the present method.
The network structure is basically the same as above; the input sample size and the outputs differ. The input has the h x w x 3 structure, the VGG16 reference network serves as the feature extraction network, and the decoding network fuses the upper- and lower-layer features; a 1 x 1 convolution layer outputs a pixel score matrix representing the character moment mean vector, and a fully connected branch is introduced to output the moment feature vector. The two branches combine the segmentation and regression tasks, replacing the box coordinates of target detection with moment features; owing to the properties of the moment feature vector, this makes localization and segmentation more robust for Chinese character text, whose aspect ratio is relatively consistent. A batch of data sets related to the algorithm's practical application is constructed, such as text data sets of computer photographing and screenshot scenes, with the aim of solving the character-level text detection problem. Borrowing the idea of semantic segmentation, each character is likewise labeled with a Gaussian heat map, where a higher pixel value indicates a pixel closer to the character's center point.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A deep learning text character detection method based on image moment correction is characterized by comprising the following steps:
A: preparing a data set, namely pre-labeling randomly sampled samples in the data set and storing the box of each character of each sample;
B: manually correcting inaccurately pre-labeled boxes, and generating a heat map label in Gaussian heat map form from the boxes;
C: defining the neural network structure and the loss function loss_cross;
D: performing preliminary pre-training using the network structure and loss function loss_cross determined in step C;
E: expanding a training sample set of the actual scene;
F: performing adaptive binarization on the training sample set expanded in step E, calculating the Hu moment feature vector of each character, and taking the mean of the feature vector as the character's auxiliary label;
G: modifying the form of the loss function by adding a regular-term branch, and performing fine-tuning training on the expanded training sample set with the modified loss function loss;
H: model testing and verification, namely varying the parameter theta used to generate the Gaussian heat maps from the pre-labeling and plotting the accuracy curve of the character boxes under different theta thresholds, so that a suitable parameter theta can be selected as required.
2. The deep learning text character detection method based on image moment correction according to claim 1, wherein
the data set in step A mainly comprises data from ICDAR2017, ICDAR2019 and CTW, and randomly sampled samples in the data set are pre-labeled with a public character-level segmentation model trained by EasyOCR.
3. The deep learning text character detection method based on image moment correction according to claim 1, wherein
inaccurate pre-labeling in step B specifically means that a character box is over-segmented or under-segmented;
over-segmentation means that the character box does not contain the whole current character, and under-segmentation means that the character box contains other characters or symbols besides the current character.
4. The deep learning text character detection method based on image moment correction according to claim 1, wherein
in step B the box is mapped onto a two-dimensional Gaussian map by perspective transformation to generate the label in Gaussian heat map form.
5. The deep learning text character detection method based on image moment correction according to claim 1, wherein
the specific operation of determining the neural network structure in step C is as follows:
the network takes a sample of preset size as input, uses the VGG16 reference network as the feature extraction network and U-net as the decoding network,
and outputs a pixel score matrix representing the confidence region;
the loss function loss_cross in step C is determined as follows:
loss_cross adopts pixel-level cross entropy, namely a theta threshold is set on the label heat map, and pixels above the theta threshold are regarded as character regions (class 1) while pixels below it are regarded as non-character regions (class 0).
6. The method for detecting deep learning text characters based on image moment correction according to any one of claims 1-5,
the method for expanding the training sample set of the actual scene in step E comprises taking random screenshots of, or photographing from different angles, a computer screen interface containing documents, pre-labeling with the pre-trained model, and manually correcting in the manner of step B.
7. The method for detecting deep learning text characters based on image moment correction according to any one of claims 1-5,
the theta threshold is obtained by the following steps:
performing Gaussian smoothing on the heat map label and calculating its gradient map;
determining the connected regions under different thresholds with the watershed algorithm, and taking the minimum circumscribed rectangle of each connected region, i.e. the character box under that threshold;
randomly sampling a number of characters for statistics, judging the accuracy of the minimum bounding boxes under the corresponding thresholds, and taking the threshold with the highest accuracy as the theta threshold.
8. The method for detecting deep learning text characters based on image moment correction according to any one of claims 1-5,
the loss function loss modified in step G is the loss function loss_cross of step C plus an L2 loss:

loss = loss_cross + m * loss_L2

where

$$loss_{L2} = \frac{1}{mK}\sum_{i=1}^{m}\sum_{j=1}^{K}\left(y_{ij} - f(x_{ij})\right)^{2}$$

characterizes the L2 loss of the sample moments, m denotes the number of samples, K denotes the number of characters in a single sample, y_ij denotes the mean of the moment feature vector corresponding to the jth character in the ith sample, and f(x_ij) denotes the network's predicted mean of the moment feature vector for the jth character in the ith sample.
9. The deep learning text character detection method based on image moment correction according to claim 8, wherein
in step H the model test and verification samples are characters in text scenes photographed or screenshotted from randomly selected computer documents.
CN202011506599.8A, priority and filing date 2020-12-18, granted as CN112580507B (Active): Deep learning text character detection method based on image moment correction

Priority Applications (1)

Application Number: CN202011506599.8A (granted as CN112580507B); Priority Date: 2020-12-18; Filing Date: 2020-12-18; Title: Deep learning text character detection method based on image moment correction

Applications Claiming Priority (1)

Application Number: CN202011506599.8A (granted as CN112580507B); Priority Date: 2020-12-18; Filing Date: 2020-12-18; Title: Deep learning text character detection method based on image moment correction

Publications (2)

Publication Number Publication Date
CN112580507A 2021-03-30
CN112580507B CN112580507B (en) 2024-05-31

Family

ID=75136268

Family Applications (1)

Application Number: CN202011506599.8A (CN112580507B, Active); Priority Date: 2020-12-18; Filing Date: 2020-12-18; Title: Deep learning text character detection method based on image moment correction

Country Status (1)

Country Link
CN (1) CN112580507B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090157571A1 (en) * 2007-12-12 2009-06-18 International Business Machines Corporation Method and apparatus for model-shared subspace boosting for multi-label classification
CN104899821A (en) * 2015-05-27 2015-09-09 合肥高维数据技术有限公司 Method for erasing visible watermark of document image
WO2017185257A1 (en) * 2016-04-27 2017-11-02 北京中科寒武纪科技有限公司 Device and method for performing adam gradient descent training algorithm
RU2656708C1 (en) * 2017-06-29 2018-06-06 Samsung Electronics Co., Ltd. Method for separating texts and illustrations in images of documents using a descriptor of document spectrum and two-level clustering
EP3422254A1 (en) * 2017-06-29 2019-01-02 Samsung Electronics Co., Ltd. Method and apparatus for separating text and figures in document images
EP3499457A1 (en) * 2017-12-15 2019-06-19 Samsung Display Co., Ltd System and method of defect detection on a display
CN108399421A (en) * 2018-01-31 2018-08-14 南京邮电大学 A kind of zero sample classification method of depth of word-based insertion
WO2020046960A1 (en) * 2018-08-31 2020-03-05 Alibaba Group Holding Limited System and method for optimizing damage detection results
CN110717492A (en) * 2019-10-16 2020-01-21 电子科技大学 Method for correcting direction of character string in drawing based on joint features
CN111079638A (en) * 2019-12-13 2020-04-28 河北爱尔工业互联网科技有限公司 Target detection model training method, device and medium based on convolutional neural network
CN111222434A (en) * 2019-12-30 2020-06-02 深圳市爱协生科技有限公司 Method for obtaining evidence of synthesized face image based on local binary pattern and deep learning
CN111553346A (en) * 2020-04-26 2020-08-18 佛山市南海区广工大数控装备协同创新研究院 Scene text detection method based on character region perception

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
JUNMING CHANG et al.: "A Segmentation Algorithm for Touching Character Based on the Invariant Moments and Profile Feature", 2012 International Conference on Control Engineering and Communication Technology, 31 December 2012, pages 188-191, XP032311483, DOI: 10.1109/ICCECT.2012.159 *
TIAN, H. et al.: "Promising Techniques for Anomaly Detection on Network Traffic", Computer Science and Information Systems, vol. 14, no. 3, 30 November 2017, pages 597-609 *
杨玲玲 et al.: "A natural scene text detection algorithm based on image moments and texture features" (一种基于图像矩和纹理特征的自然场景文本检测算法), Journal of Chinese Computer Systems (小型微型计算机系统), vol. 37, no. 06, 30 June 2016, pages 1313-1317 *
田萱 et al.: "Food label text detection based on semantic segmentation" (基于语义分割的食品标签文本检测), Transactions of the Chinese Society for Agricultural Machinery (农业机械学报), vol. 51, no. 08, 31 August 2020, pages 336-343 *
田辉: "Research on copyright protection of computer games" (计算机游戏著作权保护问题研究), China Doctoral Dissertations Full-text Database, Social Sciences I (中国博士学位论文全文数据库社会科学Ⅰ辑), no. 2019, 15 September 2019, pages 117-3 *
章慧 et al.: "A text region detection and localization algorithm for news video based on multi-scale image fusion" (基于多尺度图像融合的新闻视频文字区域检测定位算法), Journal of Guizhou University (Natural Sciences) (贵州大学学报(自然科学版)), vol. 29, no. 06, 15 December 2012, pages 86-90 *
贾文其 et al.: "License plate character recognition based on stacked denoising autoencoder neural network" (基于栈式降噪自编码神经网络的车牌字符识别), Computer Engineering and Design (计算机工程与设计), vol. 37, no. 03, 31 March 2016, pages 751-756 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221867A (en) * 2021-05-11 2021-08-06 北京邮电大学 Deep learning-based PCB image character detection method
CN113313720A (en) * 2021-06-30 2021-08-27 上海商汤科技开发有限公司 Object segmentation method and device
CN113313720B (en) * 2021-06-30 2024-03-29 上海商汤科技开发有限公司 Object segmentation method and device
CN113743416A (en) * 2021-08-24 2021-12-03 的卢技术有限公司 Data enhancement method for real sample-free situation in OCR field
CN113743416B (en) * 2021-08-24 2024-03-05 的卢技术有限公司 Data enhancement method for non-real sample situation in OCR field
CN114579046A (en) * 2022-01-21 2022-06-03 南华大学 Cloud storage similar data detection method and system
CN114579046B (en) * 2022-01-21 2024-01-02 南华大学 Cloud storage similar data detection method and system
CN117649672A (en) * 2024-01-30 2024-03-05 湖南大学 Font type visual detection method and system based on active learning and transfer learning
CN117649672B (en) * 2024-01-30 2024-04-26 湖南大学 Font type visual detection method and system based on active learning and transfer learning

Also Published As

Publication number Publication date
CN112580507B (en) 2024-05-31

Similar Documents

Publication Publication Date Title
CN111325203B (en) American license plate recognition method and system based on image correction
CN112580507B (en) Deep learning text character detection method based on image moment correction
CN111723585B (en) Style-controllable image text real-time translation and conversion method
CN110390251B (en) Image and character semantic segmentation method based on multi-neural-network model fusion processing
WO2019192397A1 (en) End-to-end recognition method for scene text in any shape
CN110647829A (en) Bill text recognition method and system
CN110969129B (en) End-to-end tax bill text detection and recognition method
CN103049763B (en) Context-constraint-based target identification method
CN110580699A (en) Pathological image cell nucleus detection method based on improved fast RCNN algorithm
CN110766008A (en) Text detection method facing any direction and shape
CN111860348A (en) Deep learning-based weak supervision power drawing OCR recognition method
CN108509881A (en) A kind of the Off-line Handwritten Chinese text recognition method of no cutting
CN113673338B (en) Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
CN112287941B (en) License plate recognition method based on automatic character region perception
CN110502655B (en) Method for generating image natural description sentences embedded with scene character information
CN111738055A (en) Multi-class text detection system and bill form detection method based on same
CN110598698B (en) Natural scene text detection method and system based on adaptive regional suggestion network
CN113158977B (en) Image character editing method for improving FANnet generation network
CN111523622B (en) Method for simulating handwriting by mechanical arm based on characteristic image self-learning
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN111340034A (en) Text detection and identification method and system for natural scene
CN112070174A (en) Text detection method in natural scene based on deep learning
CN113762269A (en) Chinese character OCR recognition method, system, medium and application based on neural network
CN110991374B (en) Fingerprint singular point detection method based on RCNN
CN116229482A (en) Visual multi-mode character detection recognition and error correction method in network public opinion analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant