CN112580507A - Deep learning text character detection method based on image moment correction - Google Patents
- Publication number: CN112580507A (application CN202011506599.8A)
- Authority
- CN
- China
- Prior art keywords
- character
- loss
- text
- loss function
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06V30/40 — Document-oriented image-based pattern recognition
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V20/62 — Text, e.g. of license plates, overlay texts or captions on TV images
- G06V30/153 — Segmentation of character regions using recognition of characters or words
- G06V30/10 — Character recognition
Abstract
The invention discloses a deep learning text character detection method based on image moment correction. The method comprises: preparing a data set; manually correcting the pre-labeled character boxes; generating heat map labels in Gaussian heat map form from the boxes; defining a neural network structure and a loss function; pre-training; expanding the training sample set with actual-scene samples; applying adaptive binarization to the expanded training sample set, calculating the Hu moment feature vector of each character, and taking the mean of that vector as the character's auxiliary label; modifying the form of the loss function and performing fine-tuning training; and testing and verifying the model. By combining the heat map label and the moment feature vector label into an optimized loss function, the method improves the accuracy of the character box and alleviates the problems of over-segmentation and under-segmentation of character boxes. Preprocessing the expanded sample set compensates for the scarcity of character-level annotation and gives the method better generalization in character detection.
Description
Technical Field
The invention belongs to the field of target detection, and particularly relates to a deep learning text character detection method based on image moment correction.
Background
At present, text detection is widely applied to the field of computer vision, such as real-time translation, image retrieval, scene analysis, geographic positioning, blind navigation and the like, so that the text detection has extremely high application value and research significance in scene understanding and text analysis.
The existing text detection methods are divided into the following categories:
1. Traditional image processing methods based on manually designed features, such as MSER (maximally stable extremal regions) and SWT (stroke width transform), mainly handle text detection for printed fonts and print-and-scan scenes, and detect text poorly in natural scenes;
2. Two-stage methods based on deep learning generate candidate regions, extract the corresponding features, fine-tune the network, and output the corresponding text region boxes; their advantages are higher precision, good performance on small-scale targets and shared computation, while their drawbacks are low inference speed and a long training period;
3. One-stage methods based on deep learning skip candidate-box generation and predict the target text region boxes end to end; they infer quickly, but are less precise than two-stage methods and detect small targets poorly.
Most existing text detection algorithms output the position coordinates of text-line regions. For example, the reference network CTPN is an improvement on the Two-stage approach: on the basis of Faster R-CNN, it exploits the specific horizontal or vertical arrangement of the target text and outputs text-line regions. Because existing text detection techniques are not accurate down to character-level detection, the information they provide is limited.
The existing character-level text detection algorithm is based on the idea of semantic segmentation: a Gaussian center heat map replaces a pixel-level block heat map as the label, the network is optimized with two indexes, the region score and the affinity score, and in post-processing the probability map is binarized to obtain the final character boxes. Character-level text detection outputs not only the coordinates of individual character boxes but also the coordinates of text-line regions, so its output is richer and can satisfy broader customer requirements. However, the existing character-level algorithms are affected by parameters and by complex Chinese text scenes, so over-segmentation or under-segmentation occurs on the segmented character boxes, corresponding respectively to the plain rectangular boxes and the blackened rectangular boxes shown in FIG. 4.
Disclosure of Invention
In order to solve the above problems, the present invention provides a deep learning text character detection method based on image moment correction, which includes the following steps:
a: preparing a data set, namely pre-labeling a randomly sampled sample in the data set, and storing a box frame of each character of the sample;
b: manually correcting the box frame which is not accurately pre-marked, and generating a heat map label in a Gaussian heat map form according to the box frame;
c: defining the neural network structure and the loss function loss_cross;
d: using the network structure and loss function loss_cross determined in said step C to carry out preliminary pre-training;
e: expanding a training sample set of an actual scene;
f: performing adaptive binarization on the training sample set expanded in step E, calculating the Hu moment feature vector of each character, and taking the mean of that vector as the auxiliary label of the character;
g: modifying the form of the loss function by adding a regular term branch, and performing fine-tuning training with the modified loss function loss on the expanded training sample set;
h: and (3) model testing and verifying, namely modifying the parameter theta of the Gaussian heat map generated by the pre-labeling, and drawing an accuracy rate change curve of the character box frame under different theta threshold values, so that a proper parameter theta is selected according to requirements.
Further, in the present invention,
the data set in step A mainly comprises data from ICDAR2017, ICDAR2019 and CTW, and the randomly sampled samples in the data set are pre-labeled with a public character-level segmentation model trained by EasyOCR.
Further, in the present invention,
the pre-labeling inaccuracy in step B specifically means that a character box is over-segmented or under-segmented;
over-segmentation means that the character box does not contain all of the current character, and under-segmentation means that the character box contains other characters or symbols besides the current character.
Further, in the present invention,
in step B, the box is mapped onto a two-dimensional Gaussian map by perspective transformation to generate the label in Gaussian heat map form.
Further, in the present invention,
the specific operation of determining the neural network structure in the step C is as follows:
a sample with a preset size is input by the network, a VGG16 reference network is taken as a feature extraction network, and U-net is taken as a decoding network;
outputting a pixel score matrix representing the confidence region;
the loss function loss_cross in said step C is determined as follows:
the loss function loss_cross adopts pixel-level cross entropy loss, i.e. a theta threshold is set on the label heat map; regions above the theta threshold are regarded as character regions, represented by class 1, and regions below it as non-character regions, represented by class 0.
Further, in the present invention,
the method of expanding the training sample set of the actual scene in step E comprises taking random screenshots of, or photographing from different angles, a computer-screen interface containing documents, pre-labeling them with the pre-trained model, and correcting manually in the manner of step B.
Further, in the present invention,
the theta threshold is obtained by the following steps:
performing Gaussian smoothing processing on the heat map label, and calculating a gradient map of the heat map label;
determining the connected regions under different thresholds according to a watershed algorithm, and taking the minimum enclosing rectangle of each connected region, i.e. the character box under that threshold;
randomly sampling a number of characters for statistics, judging the accuracy of the minimum enclosing boxes under the corresponding thresholds, and taking the threshold with the highest accuracy as the theta threshold.
Further, in the present invention,
the loss function loss modified in step G adds an L2 loss to the loss function loss_cross of step C:

loss = loss_cross + m * loss_L2

where loss_L2 = (1/(N*K)) * sum_{i=1..N} sum_{j=1..K} (y_ij - f(x_ij))^2 is the L2 loss characterizing the sample moments, N denotes the number of samples, K denotes the number of characters of a single sample, y_ij denotes the mean of the moment feature vector corresponding to the j-th character in the i-th sample, and f(x_ij) denotes the mean of the moment feature vector predicted by the network for the j-th character in the i-th sample.
Further, in the present invention,
in the model testing and verification of step H, the samples are characters in text scenes photographed or screenshotted from randomly selected computer documents.
The invention has the advantages that:
the detection method of the invention provides that the center of a single character is represented based on the image moment characteristics, more robust auxiliary information is provided, namely, an optimization loss function is formed by combining a Gaussian heat map and the moment characteristics to improve the accuracy of a character box frame, the character detection segmentation capability of a model is improved by combining a segmentation task (a heat map label) and a regression task (a moment characteristic label), and the problems of excessive segmentation and under-segmentation of a character frame are solved; in addition, a sample is synthesized by text scenes in the screenshot, a preliminary character text detection model is pre-trained, then pre-labeling is carried out in a real text sample, the text is manually corrected, and the moment characteristic of each character in the real sample is calculated and used as a regular term of a loss function in the training fine adjustment. The preprocessing mode makes up the problem of insufficient character-level labeling on one hand, and on the other hand, the character detection generalization capability of the preprocessing mode is better in the actual text scene of printing, photographing or screenshot.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 shows a prior art character segmentation algorithm flow diagram;
FIG. 2 shows a flow diagram of a character segmentation algorithm of an embodiment of the present invention;
FIG. 3 shows an exemplary sample-label Gaussian map of the present invention;
fig. 4 illustrates an exemplary diagram of an over-segmentation or under-segmentation phenomenon.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Because the sample background of a natural scene is complex and computing image moment features there would introduce deviation, the image moment feature values are calculated only for screenshots of computer-document backgrounds or photographs of specific scenes. Moments of different orders have different properties: using raw (origin) moments or central moments as image features cannot guarantee translation, rotation and scale invariance simultaneously. Central moments alone are only translation invariant, whereas normalized central moments are invariant to translation, scale and rotation; the Hu moment vector is therefore used as auxiliary information to give the network more prior knowledge during training.
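As an illustrative aside (not part of the claimed method), the normalized central moments and the seven Hu invariants referred to above can be computed with plain NumPy; `cv2.HuMoments(cv2.moments(img))` yields the same quantities:

```python
import numpy as np

def hu_moments(img):
    """Seven Hu invariant moments of a 2-D grayscale/binary image.

    Sketch of the pipeline the description relies on: central moments give
    translation invariance, normalization adds scale invariance, and the Hu
    combinations add rotation invariance.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    img = img.astype(np.float64)
    m00 = img.sum()
    cx = (img * xs).sum() / m00          # centroid
    cy = (img * ys).sum() / m00
    def mu(p, q):                        # central moment
        return (img * (xs - cx) ** p * (ys - cy) ** q).sum()
    def eta(p, q):                       # normalized central moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2)
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    h1 = n20 + n02
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2
    h3 = (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2
    h4 = (n30 + n12) ** 2 + (n21 + n03) ** 2
    h5 = ((n30 - 3 * n12) * (n30 + n12)
          * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          + (3 * n21 - n03) * (n21 + n03)
          * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    h6 = ((n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
          + 4 * n11 * (n30 + n12) * (n21 + n03))
    h7 = ((3 * n21 - n03) * (n30 + n12)
          * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          - (n30 - 3 * n12) * (n21 + n03)
          * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2))
    return np.array([h1, h2, h3, h4, h5, h6, h7])

# translation invariance: the same shape at two positions gives the same vector
a = np.zeros((64, 64)); a[10:20, 10:30] = 1.0
b = np.zeros((64, 64)); b[30:40, 20:40] = 1.0
assert np.allclose(hu_moments(a), hu_moments(b))
```

Scale invariance holds exactly only in the continuous limit, so small discretization differences remain between resized shapes; the mean of this 7-vector is what step F uses as the auxiliary label.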
The invention discloses a deep learning text character detection method based on image moment correction, which comprises the following steps:
A, preparing a data set: the public Chinese data sets used in the method mainly comprise the ICDAR2017 data set, the ICDAR2019 data set and CTW (Chinese Text in the Wild) data. The CTW data has high diversity and complexity, including planar text, projected text, city and town street-view text, text under weak illumination, long-distance text, partially displayed text and the like. For each image, all Chinese characters are annotated in the data set; for each Chinese character, the data set is labeled with its character category and bounding box. First, the randomly sampled samples in the data set are pre-labeled with a public character-level segmentation model trained by EasyOCR, and the box of each character of each sample is stored;
B, developing a simple human-computer interaction labeling interface for fine correction, similar to an object detection labeling tool: it automatically loads a picture and its corresponding json-format label, and the character boxes with inaccurate pre-labels are then corrected manually in a pop-up dialog box. An inaccurate prediction means the box does not cover the whole current character (over-segmentation) or extends into adjacent characters, commas and the like (under-segmentation); for concrete examples see the plain rectangular boxes (over-segmentation) and the blackened rectangular boxes (under-segmentation) in FIG. 4. A label in Gaussian heat map form is then generated from each character box, as in the sample-label Gaussian map of FIG. 3: in this step, the character box is mapped onto a two-dimensional Gaussian map by a perspective transform to form the character's heat map label.
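A minimal sketch of this labeling step, assuming an inverse warp with nearest-neighbour sampling (the patch size and sigma are illustrative; `cv2.getPerspectiveTransform` and `cv2.warpPerspective` perform the same job in practice):

```python
import numpy as np

def solve_homography(src, dst):
    """Direct linear transform mapping 4 (x, y) src points to 4 dst points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=np.float64))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def gaussian_heatmap_label(canvas_hw, quad, size=64, sigma=0.25):
    """Warp a canonical isotropic Gaussian patch onto a character quad."""
    h, w = canvas_hw
    grid = np.mgrid[0:size, 0:size].astype(np.float64)
    c = (size - 1) / 2
    gauss = np.exp(-(((grid[1] - c) ** 2 + (grid[0] - c) ** 2)
                     / (2 * (sigma * size) ** 2)))
    src = [(0, 0), (size - 1, 0), (size - 1, size - 1), (0, size - 1)]
    Hinv = np.linalg.inv(solve_homography(src, quad))   # image -> patch
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    q = Hinv @ pts
    u, v = q[0] / q[2], q[1] / q[2]
    inside = (u >= 0) & (u <= size - 1) & (v >= 0) & (v <= size - 1)
    label = np.zeros(h * w)
    # nearest-neighbour sampling keeps the sketch short
    label[inside] = gauss[np.round(v[inside]).astype(int),
                          np.round(u[inside]).astype(int)]
    return label.reshape(h, w)
```

The peak of the warped Gaussian marks the character centre; overlapping per-character maps are typically combined with a pixel-wise maximum.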
C, defining the network structure and loss function: the network takes as input a sample of size h x w x 3, uses the VGG16 reference network as the feature extraction network and an improved U-net as the decoding network, and outputs a pixel score matrix representing the confidence region (the specific structure is shown in FIG. 2), where h is the height of the input image, w is its width, and 3 is the number of RGB channels;
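At toy scale, the encoder-decoder wiring of this step looks as follows; the channel widths, depth and single skip connection are illustrative stand-ins for the actual VGG16 stages and the improved U-net:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Scaled-down sketch of the VGG16-encoder / U-net-decoder of step C."""

    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)                      # halves the feature map
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.dec = nn.Sequential(nn.Conv2d(32 + 16, 16, 3, padding=1),
                                 nn.ReLU())
        self.head = nn.Conv2d(16, 1, 1)                  # 1x1 conv -> score map

    def forward(self, x):                                # x: (B, 3, h, w)
        f1 = self.enc1(x)
        f2 = self.enc2(self.pool(f1))
        d = self.dec(torch.cat([self.up(f2), f1], dim=1))  # U-net skip merge
        return torch.sigmoid(self.head(d))               # per-pixel confidence

out = TinyUNet()(torch.zeros(1, 3, 64, 96))              # NCHW view of h x w x 3
```

The real decoder repeats the upsample-concatenate-convolve pattern once per encoder stage before the final 1 x 1 convolution.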
The loss function uses pixel-level cross entropy loss: a theta threshold is set on the label heat map, and pixels above the theta threshold are identified as character regions (class 1) while pixels below it are identified as non-character regions (class 0).
The accuracy under different values of the parameter theta therefore needs to be compared in order to select the best one. The theta threshold is obtained by testing on actual training samples with the help of the watershed algorithm from graphics, in the following general steps:
first, Gaussian smoothing is applied to the label heat map and its gradient map is computed; then the connected regions under different thresholds are determined by the watershed algorithm and the minimum enclosing rectangle of each connected region (i.e. the character box under that threshold) is taken; a number of characters are randomly sampled for statistics, the accuracy of the minimum enclosing boxes under each threshold is judged manually, and the threshold with the relatively highest accuracy is taken as the theta threshold.
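A simplified stand-in for this box extraction (plain 4-connected labeling in place of the full watershed, and axis-aligned boxes in place of minimum-area rectangles):

```python
import numpy as np
from collections import deque

def character_boxes(heatmap, theta):
    """Bounding boxes of the connected regions above a theta threshold.

    Sketch only: the description needs the minimum enclosing rectangle of
    each connected region, so BFS component labeling is enough here.
    """
    mask = heatmap >= theta
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    h, w = mask.shape
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        q = deque([(sy, sx)]); seen[sy, sx] = True
        y0 = y1 = sy; x0 = x1 = sx
        while q:
            y, x = q.popleft()
            y0, y1 = min(y0, y), max(y1, y)
            x0, x1 = min(x0, x), max(x1, x)
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True; q.append((ny, nx))
        boxes.append((x0, y0, x1, y1))
    return boxes

# two Gaussian blobs: a very low theta merges them into one box
# (under-segmentation), a higher theta separates two character boxes
ys, xs = np.mgrid[0:32, 0:64].astype(np.float64)
hm = (np.exp(-((xs - 16) ** 2 + (ys - 16) ** 2) / 40)
      + np.exp(-((xs - 44) ** 2 + (ys - 16) ** 2) / 40))
```

Sweeping theta over such synthetic maps and counting correct boxes reproduces the accuracy-versus-threshold comparison described above.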
D, pre-training, and performing preliminary pre-training by adopting the network structure and the loss function defined in the step C.
E, expanding the training sample set of the actual scene: take random screenshots of, or photograph from different angles, a computer-screen interface containing documents, such as a web page or a Word document; pre-label them with the pre-trained model and correct manually in the manner of step B.
F, adaptively binarizing the samples expanded in step E to obtain binary images, then calculating the Hu moment feature vector of each character and taking its mean as the character's auxiliary label. In theory the moment-feature means of character regions differ little from one another, while being much larger than those of non-character regions. Introducing the moment-feature branch therefore, on the one hand, tilts the model's attention toward character regions, which helps detection; on the other hand, the moment-feature mean can guide the network to learn more accurate character boxes, which helps segmentation.
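A mean-based sketch of the adaptive binarization (block size and offset c are illustrative; `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_MEAN_C` plays this role in practice):

```python
import numpy as np

def adaptive_binarize(gray, block=15, c=5):
    """Mark a pixel as ink when it is darker than its local mean minus c.

    The local mean over a block x block window is computed with an
    integral image (cumulative sums), so the sketch stays O(h * w).
    """
    pad = block // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    ii = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    ii = np.pad(ii, ((1, 0), (1, 0)))          # zero row/col for window sums
    h, w = gray.shape
    local_sum = (ii[block:block + h, block:block + w]
                 - ii[:h, block:block + w]
                 - ii[block:block + h, :w]
                 + ii[:h, :w])
    local_mean = local_sum / (block * block)
    return (gray < local_mean - c).astype(np.uint8)   # 1 = ink / character
```

The resulting binary image is what the Hu moment feature vector of each character is computed from in this step.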
G, modifying the form of the loss function by adding a regular term branch, and performing fine-tuning training with the modified loss function on the expanded training sample set. Model training with the expanded samples differs from pre-training in the following detail: the loss function of the network is modified by adding a regular term branch that takes the Hu moment feature vector as auxiliary label information, i.e. an L2 loss on the character-box moment vectors is added to the original cross entropy loss_cross and joint training is carried out, with m taking a value of 0.01-0.05:

loss = loss_cross + m * loss_L2

where loss_L2 = (1/(N*K)) * sum_{i=1..N} sum_{j=1..K} (y_ij - f(x_ij))^2 is the L2 (least-squares) loss characterizing the sample moments, N is the number of samples, K is the number of characters of a single sample, y_ij is the mean of the moment feature vector corresponding to the j-th character in the i-th sample (used as the moment feature label), and f(x_ij) is the mean of the moment feature vector predicted by the network for the j-th character in the i-th sample.
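A NumPy sketch of this joint objective under stated assumptions: theta and m are illustrative values within the ranges given above, and the per-character moment-feature means are assumed to be precomputed:

```python
import numpy as np

def joint_loss(pred_heat, gt_heat, pred_mu, gt_mu, theta=0.4, m=0.03):
    """loss = loss_cross + m * loss_L2 from step G (sketch).

    pred_heat: predicted pixel scores in (0, 1); gt_heat: Gaussian label map;
    pred_mu / gt_mu: per-character moment-feature means, shape (N, K).
    """
    # pixel-level cross entropy against the theta-binarized label heat map
    y = (gt_heat >= theta).astype(np.float64)          # class 1 = character
    p = np.clip(pred_heat, 1e-7, 1 - 1e-7)
    loss_cross = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    # L2 regular term on the Hu-moment means (least-squares error)
    loss_l2 = np.mean((gt_mu - pred_mu) ** 2)
    return loss_cross + m * loss_l2

gt = np.exp(-((np.mgrid[0:16, 0:16][0] - 8) ** 2
              + (np.mgrid[0:16, 0:16][1] - 8) ** 2) / 10.0)
```

A perfect prediction drives both terms toward zero, while a wrong moment mean is penalized even when the heat map alone looks acceptable, which is the mechanism claimed to reduce over- and under-segmentation.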
H, model testing and verification. The model of the method mainly aims to solve character detection for text scenes photographed from computer documents, so samples from that scene are used for testing and verification, and the character segmentation accuracy is counted. Since the pre-labeled heat map is affected by the parameter theta, the accuracy under different values of theta needs to be compared in order to select the best one: the parameter theta of the Gaussian heat map generated by pre-labeling is modified, and an accuracy curve of the character box under different theta thresholds is drawn, so that a suitable parameter theta can be selected as required.
FIG. 1 illustrates a prior art character segmentation algorithm
The input sample is scaled to h x w x 3 as the network input, and the VGG16 reference network serves as the feature extraction network; the higher the stage of the extraction network, the more abstract the corresponding feature map, and its size is halved at each stage. To fuse low-level and high-level feature information, the decoding network U-net upsamples an output layer's feature map to the same size as the feature map of some stage of the extraction network, merges and fuses them, and finally outputs, through a 1 x 1 convolution layer, a pixel score matrix representing the character-connection confidence region. The main idea is to predict character detection boxes with a segmentation task, add a character-connection confidence matrix as an output branch to solve character localization in non-rectangular regions, and complete the model's pre-training with weakly supervised learning on a synthesized character data set, thereby improving character segmentation in general natural scenes.
FIG. 2 shows the character segmentation algorithm of the present method
The method is basically the same in network structure, differing in the input sample size and the output: the input has an h x w x 3 structure, the VGG16 reference network serves as the feature extraction network with a decoding network fusing the upper- and lower-layer features, a pixel score matrix representing the character moment mean vector is output through a 1 x 1 convolution layer, and a branch with a fully connected layer is introduced to output the moment feature vector. The two branches combine the segmentation and regression tasks, replacing the box coordinates of object detection with moment features; owing to the properties of the moment feature vector, this is more robust for localizing and segmenting Chinese character text, whose aspect ratio is relatively consistent. A batch of data sets matching the algorithm's practical application is constructed, such as text data sets of computer photographing and screenshot scenes, with the aim of solving the character-level text detection problem. Borrowing the idea of semantic segmentation, each character is likewise labeled with a Gaussian heat map, in which a higher pixel value indicates a pixel closer to the character's center point.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (9)
1. A deep learning text character detection method based on image moment correction is characterized by comprising the following steps:
a: preparing a data set, namely pre-labeling a randomly sampled sample in the data set, and storing a box frame of each character of the sample;
b: manually correcting the box frame which is not accurately pre-marked, and generating a heat map label in a Gaussian heat map form according to the box frame;
c: defining the neural network structure and the loss function loss_cross;
d: using the network structure and loss function loss_cross determined in said step C to carry out preliminary pre-training;
e: expanding a training sample set of an actual scene;
f: performing adaptive binarization on the training sample set expanded in step E, calculating the Hu moment feature vector of each character, and taking the mean of that vector as the auxiliary label of the character;
g: modifying the form of the loss function by adding a regular term branch, and performing fine-tuning training with the modified loss function loss on the expanded training sample set;
h: and (3) model testing and verifying, namely modifying the parameter theta of the Gaussian heat map generated by the pre-labeling, and drawing an accuracy rate change curve of the character box frame under different theta threshold values, so that a proper parameter theta is selected according to requirements.
2. The method of claim 1, wherein the text character detection method based on image moment correction,
the data set in step A mainly comprises data from ICDAR2017, ICDAR2019 and CTW, and the randomly sampled samples in the data set are pre-labeled with a public character-level segmentation model trained by EasyOCR.
3. The method of claim 1, wherein the text character detection method based on image moment correction,
the pre-labeling inaccuracy in step B specifically means that a character box is over-segmented or under-segmented;
over-segmentation means that the character box does not contain all of the current character, and under-segmentation means that the character box contains other characters or symbols besides the current character.
4. The deep learning text character detection method based on image moment correction according to claim 1, wherein
in step B the box frame is mapped to a two-dimensional Gaussian map by a perspective transformation to generate the label in the form of a Gaussian heat map.
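One way to realize this mapping is to fit the homography sending the unit square to the character box and evaluate a canonical 2-D Gaussian through its inverse. The sketch below is illustrative only (the claim does not fix the warp direction or the Gaussian width; `sigma` is an assumed parameter):

```python
import numpy as np

def homography(src, dst):
    """Solve the 3x3 perspective transform mapping 4 src points to 4 dst points (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.array(A, dtype=float))
    return vt[-1].reshape(3, 3)          # null-space vector, up to scale

def gaussian_heatmap(shape, box, sigma=0.25):
    """Warp a canonical Gaussian (peak at (0.5, 0.5) on the unit square)
    onto the quadrilateral `box` given as 4 corner points, clockwise."""
    h_inv = np.linalg.inv(homography([(0, 0), (1, 0), (1, 1), (0, 1)], box))
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    q = h_inv @ pts                       # pull each pixel back to the unit square
    u, v = q[0] / q[2], q[1] / q[2]
    g = np.exp(-((u - 0.5) ** 2 + (v - 0.5) ** 2) / (2 * sigma ** 2))
    return g.reshape(shape)

heat = gaussian_heatmap((40, 40), [(10, 10), (30, 10), (30, 30), (10, 30)])
```

Because the Gaussian is evaluated analytically through the inverse homography, no image interpolation is needed, and the peak lands at the center of the character box.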
5. The deep learning text character detection method based on image moment correction according to claim 1, wherein
the specific operation of determining the neural network structure in step C is as follows:
the network takes a sample of preset size as input, with the VGG16 base network as the feature extraction network and U-net as the decoding network;
it outputs a pixel score matrix representing the confidence region;
the loss function loss_cross in said step C is determined as follows:
the loss function loss_cross uses a pixel-level cross-entropy loss, i.e., a theta threshold is applied to the label heat map: pixels greater than the theta threshold are regarded as character regions and represented by class 1, and pixels smaller than the theta threshold are regarded as non-character regions and represented by class 0.
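The thresholded pixel-level cross-entropy of claim 5 can be sketched in NumPy. This is an illustrative stand-in for the framework loss actually used in training; the value of `theta` is an assumption:

```python
import numpy as np

def pixel_cross_entropy(pred, heat, theta=0.4, eps=1e-7):
    """Pixel-level binary cross-entropy against a thresholded heat-map label.

    pred: predicted character-confidence scores in (0, 1), same shape as heat.
    heat: Gaussian heat-map label; pixels > theta become class 1 (character),
          the rest class 0 (non-character).
    """
    y = (heat > theta).astype(np.float64)
    p = np.clip(pred, eps, 1 - eps)       # guard the logs
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

heat = np.array([[0.9, 0.1], [0.6, 0.2]])
good = pixel_cross_entropy(np.array([[0.99, 0.01], [0.99, 0.01]]), heat)
bad = pixel_cross_entropy(np.array([[0.01, 0.99], [0.01, 0.99]]), heat)
```

A prediction matching the thresholded label yields a near-zero loss, while an inverted prediction is heavily penalized.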
6. The deep learning text character detection method based on image moment correction according to any one of claims 1-5, wherein
the method for expanding the training sample set of the actual scene in step E comprises taking random screenshots, or photographs at different angles, of a computer screen interface containing documents, pre-labeling them with the pre-trained model, and manually correcting them in the manner of step B.
7. The deep learning text character detection method based on image moment correction according to any one of claims 1-5, wherein
the theta threshold is obtained by the following steps:
performing Gaussian smoothing on the heat map label and calculating its gradient map;
determining the connected regions under different thresholds with a watershed algorithm, and taking the minimum circumscribed rectangle of each connected region as the character frame under that threshold;
randomly sampling a number of characters, judging the accuracy of the minimum circumscribed frames under the corresponding thresholds, and taking the threshold with the highest accuracy as the theta threshold.
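The region-extraction part of this threshold sweep can be sketched as follows. For simplicity, a 4-connected flood fill stands in for the watershed segmentation named in the claim, and boxes are axis-aligned bounding rectangles:

```python
import numpy as np

def boxes_at_threshold(heat, t):
    """Bounding boxes (x0, y0, x1, y1) of the connected regions of heat > t.

    A plain 4-connected flood fill replaces the watershed algorithm of the
    claim; each connected region yields its minimal enclosing rectangle.
    """
    mask = heat > t
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                stack, pix = [(sy, sx)], []
                seen[sy, sx] = True
                while stack:                       # iterative flood fill
                    y, x = stack.pop()
                    pix.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                ys, xs = zip(*pix)
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

# Two separated blobs in a heat map should give two character boxes.
heat = np.zeros((20, 20))
heat[4:8, 3:7] = 0.8
heat[12:16, 10:16] = 0.8
boxes = boxes_at_threshold(heat, 0.5)
```

In the patented procedure, the accuracy of such boxes over the sampled characters would be compared across candidate thresholds, and the best-scoring value adopted as theta.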
8. The deep learning text character detection method based on image moment correction according to any one of claims 1-5, wherein
the loss function loss modified in step G is the loss function loss_cross of step C plus an
L2 loss: loss = loss_cross + m * loss_L2
where loss_L2 = (1/(m*k)) * Σ_i Σ_j (y_ij - f(x_ij))^2 is the L2 loss characterizing the sample moments, m denotes the number of samples, k denotes the number of characters in a single sample, y_ij denotes the mean of the moment feature vectors corresponding to the j-th character in the i-th sample, and f(x_ij) denotes the network-predicted mean of the moment feature vectors corresponding to the j-th character in the i-th sample.
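The combined fine-tuning loss can be sketched as below. This is illustrative only: the mean-squared form of loss_L2 with 1/(m*k) normalization is a reconstruction from the symbols defined in claim 8, and the moment labels are treated as scalars rather than full feature vectors for brevity.

```python
import numpy as np

def total_loss(loss_cross, y, f):
    """loss = loss_cross + m * loss_L2, with
    loss_L2 = (1/(m*k)) * sum_ij (y_ij - f(x_ij))^2 over the
    mean moment labels, arranged as an (m samples x k characters) array."""
    m, k = y.shape
    loss_l2 = np.sum((y - f) ** 2) / (m * k)
    return loss_cross + m * loss_l2

y = np.ones((2, 3))                       # m = 2 samples, k = 3 characters
perfect = total_loss(0.1, y, np.ones((2, 3)))   # prediction matches labels
off = total_loss(0.1, y, np.zeros((2, 3)))      # prediction misses all moments
```

When the predicted moment means match the auxiliary labels, the regularization branch contributes nothing and the loss reduces to loss_cross.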
9. The deep learning text character detection method based on image moment correction according to claim 8, wherein
in the model testing and verification of step H, the samples are characters in text scenes from randomly selected photographs or screenshots of computer documents.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011506599.8A CN112580507B (en) | 2020-12-18 | 2020-12-18 | Deep learning text character detection method based on image moment correction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112580507A true CN112580507A (en) | 2021-03-30 |
CN112580507B CN112580507B (en) | 2024-05-31 |
Family
ID=75136268
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011506599.8A Active CN112580507B (en) | 2020-12-18 | 2020-12-18 | Deep learning text character detection method based on image moment correction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112580507B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090157571A1 (en) * | 2007-12-12 | 2009-06-18 | International Business Machines Corporation | Method and apparatus for model-shared subspace boosting for multi-label classification |
CN104899821A (en) * | 2015-05-27 | 2015-09-09 | 合肥高维数据技术有限公司 | Method for erasing visible watermark of document image |
WO2017185257A1 (en) * | 2016-04-27 | 2017-11-02 | 北京中科寒武纪科技有限公司 | Device and method for performing adam gradient descent training algorithm |
RU2656708C1 (en) * | 2017-06-29 | 2018-06-06 | Самсунг Электроникс Ко., Лтд. | Method for separating texts and illustrations in images of documents using a descriptor of document spectrum and two-level clustering |
CN108399421A (en) * | 2018-01-31 | 2018-08-14 | 南京邮电大学 | A kind of zero sample classification method of depth of word-based insertion |
EP3422254A1 (en) * | 2017-06-29 | 2019-01-02 | Samsung Electronics Co., Ltd. | Method and apparatus for separating text and figures in document images |
EP3499457A1 (en) * | 2017-12-15 | 2019-06-19 | Samsung Display Co., Ltd | System and method of defect detection on a display |
CN110717492A (en) * | 2019-10-16 | 2020-01-21 | 电子科技大学 | Method for correcting direction of character string in drawing based on joint features |
WO2020046960A1 (en) * | 2018-08-31 | 2020-03-05 | Alibaba Group Holding Limited | System and method for optimizing damage detection results |
CN111079638A (en) * | 2019-12-13 | 2020-04-28 | 河北爱尔工业互联网科技有限公司 | Target detection model training method, device and medium based on convolutional neural network |
CN111222434A (en) * | 2019-12-30 | 2020-06-02 | 深圳市爱协生科技有限公司 | Method for obtaining evidence of synthesized face image based on local binary pattern and deep learning |
CN111553346A (en) * | 2020-04-26 | 2020-08-18 | 佛山市南海区广工大数控装备协同创新研究院 | Scene text detection method based on character region perception |
Non-Patent Citations (7)
Title |
---|
JUNMING CHANG 等: "A Segmentation Algorithm for Touching Character Based on the Invariant Moments and Profile Feature", 《2012 INTERNATIONAL CONFERENCE ON CONTROL ENGINEERING AND COMMUNICATION TECHNOLOGY》, 31 December 2012 (2012-12-31), pages 188 - 191, XP032311483, DOI: 10.1109/ICCECT.2012.159 * |
TIAN, H 等: "Promising Techniques for Anomaly Detection on Network Traffic", 《COMPUTER SCIENCE AND INFORMATION SYSTEMS》, vol. 14, no. 3, 30 November 2017 (2017-11-30), pages 597 - 609 * |
YANG Lingling et al.: "A Natural Scene Text Detection Algorithm Based on Image Moments and Texture Features", Journal of Chinese Computer Systems, vol. 37, no. 06, 30 June 2016 (2016-06-30), pages 1313 - 1317 * |
TIAN Xuan et al.: "Text Detection on Food Labels Based on Semantic Segmentation", Transactions of the Chinese Society for Agricultural Machinery, vol. 51, no. 08, 31 August 2020 (2020-08-31), pages 336 - 343 * |
TIAN Hui: "Research on Copyright Protection of Computer Games", China Doctoral Dissertations Full-text Database, Social Sciences I, no. 2019, 15 September 2019 (2019-09-15), pages 117 - 3 * |
ZHANG Hui et al.: "Text Region Detection and Localization in News Video Based on Multi-scale Image Fusion", Journal of Guizhou University (Natural Sciences), vol. 29, no. 06, 15 December 2012 (2012-12-15), pages 86 - 90 * |
JIA Wenqi et al.: "License Plate Character Recognition Based on Stacked Denoising Autoencoder Neural Network", Computer Engineering and Design, vol. 37, no. 03, 31 March 2016 (2016-03-31), pages 751 - 756 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113221867A (en) * | 2021-05-11 | 2021-08-06 | 北京邮电大学 | Deep learning-based PCB image character detection method |
CN113313720A (en) * | 2021-06-30 | 2021-08-27 | 上海商汤科技开发有限公司 | Object segmentation method and device |
CN113313720B (en) * | 2021-06-30 | 2024-03-29 | 上海商汤科技开发有限公司 | Object segmentation method and device |
CN113743416A (en) * | 2021-08-24 | 2021-12-03 | 的卢技术有限公司 | Data enhancement method for real sample-free situation in OCR field |
CN113743416B (en) * | 2021-08-24 | 2024-03-05 | 的卢技术有限公司 | Data enhancement method for non-real sample situation in OCR field |
CN113989485A (en) * | 2021-11-29 | 2022-01-28 | 合肥高维数据技术有限公司 | Text character segmentation method and system based on OCR recognition |
CN114579046A (en) * | 2022-01-21 | 2022-06-03 | 南华大学 | Cloud storage similar data detection method and system |
CN114579046B (en) * | 2022-01-21 | 2024-01-02 | 南华大学 | Cloud storage similar data detection method and system |
CN114549906A (en) * | 2022-02-28 | 2022-05-27 | 长沙理工大学 | Improved image classification algorithm for step-by-step training of Top-k loss function |
CN117649672A (en) * | 2024-01-30 | 2024-03-05 | 湖南大学 | Font type visual detection method and system based on active learning and transfer learning |
CN117649672B (en) * | 2024-01-30 | 2024-04-26 | 湖南大学 | Font type visual detection method and system based on active learning and transfer learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112580507B (en) | Deep learning text character detection method based on image moment correction | |
CN111325203B (en) | American license plate recognition method and system based on image correction | |
CN111723585B (en) | Style-controllable image text real-time translation and conversion method | |
CN111488826B (en) | Text recognition method and device, electronic equipment and storage medium | |
CN110390251B (en) | Image and character semantic segmentation method based on multi-neural-network model fusion processing | |
WO2019192397A1 (en) | End-to-end recognition method for scene text in any shape | |
CN110647829A (en) | Bill text recognition method and system | |
CN113673338B (en) | Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels | |
CN110969129B (en) | End-to-end tax bill text detection and recognition method | |
CN110399845A (en) | Continuously at section text detection and recognition methods in a kind of image | |
CN103049763B (en) | Context-constraint-based target identification method | |
CN110580699A (en) | Pathological image cell nucleus detection method based on improved fast RCNN algorithm | |
CN111860348A (en) | Deep learning-based weak supervision power drawing OCR recognition method | |
CN110298343A (en) | A kind of hand-written blackboard writing on the blackboard recognition methods | |
CN111738055B (en) | Multi-category text detection system and bill form detection method based on same | |
CN112287941B (en) | License plate recognition method based on automatic character region perception | |
CN113158977B (en) | Image character editing method for improving FANnet generation network | |
CN110502655B (en) | Method for generating image natural description sentences embedded with scene character information | |
CN112069900A (en) | Bill character recognition method and system based on convolutional neural network | |
CN111523622B (en) | Method for simulating handwriting by mechanical arm based on characteristic image self-learning | |
CN112070174A (en) | Text detection method in natural scene based on deep learning | |
CN111340034A (en) | Text detection and identification method and system for natural scene | |
CN113762269A (en) | Chinese character OCR recognition method, system, medium and application based on neural network | |
CN115116074A (en) | Handwritten character recognition and model training method and device | |
CN110991374B (en) | Fingerprint singular point detection method based on RCNN |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||