CN113705673A - Character detection method, device, equipment and storage medium
- Publication number
- CN113705673A, CN202110996114.6A
- Authority
- CN
- China
- Prior art keywords
- character
- detection model
- detection
- training set
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The application discloses a text detection method, apparatus, device and storage medium, the method comprising: synthesizing real document scene images as a training set, and performing data enhancement, noise reduction and correction processing on the training set; constructing a text detection model and a direction detection model, wherein the input of the direction detection model is the image content at the text-instance region coordinates output by the text detection model, and the output is the corresponding direction value; training the text detection model and the direction detection model with the training set; inputting the text image to be detected into the trained text detection model for detection, and outputting the coordinates of the text regions in the image; and passing each text region through the trained direction detection model to determine the direction of the image, and adjusting the coordinates according to the corresponding direction conversion. This increases the richness of the training set, improves the generalization capability of the trained model, improves text detection in complex scenes, and provides the correct text direction for the text recognition stage.
Description
Technical Field
The present invention relates to the field of text detection, and in particular, to a text detection method, apparatus, device, and storage medium.
Background
At present, mainstream OCR (Optical Character Recognition) technology is implemented in two stages, detection and recognition. The detection stage detects the regions of an image that contain text and returns the coordinates of those regions; the recognition stage recognizes the image content at the coordinates output by the detection stage and returns the corresponding text. Detection is an important step in OCR: accurately locating the text is a prerequisite for recognition, and it removes redundant information before the recognition stage.
Text detection also belongs to the field of object detection. Because there are many curved and irregular text instances, and there are gaps between characters, text detection requires higher precision. Existing text detection techniques are mainly traditional detection methods, segmentation-based methods, regression-based methods and hybrid methods. These methods work well on text of various regular shapes, but perform poorly on dense text, small-target text and text in complex scenes, where they are prone to missed detections, false detections, low detection quality and redundant detection boxes.
Therefore, how to improve text detection in complex scenes is a technical problem that urgently needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a device and a storage medium for detecting a character, which can improve the character detection capability in a complex scene and provide a correct character direction for a character recognition stage. The specific scheme is as follows:
a text detection method, comprising:
synthesizing a real document scene image as a training set, and performing data enhancement, noise reduction and correction processing on the training set;
constructing a text detection model and a direction detection model; the input of the direction detection model is the image content at the text-instance region coordinates output by the text detection model, and the output is the corresponding direction value;
training the text detection model and the direction detection model with the processed training set until the network converges;
inputting the text image to be detected into the trained text detection model for detection, and outputting the coordinates of the text regions in the text image to be detected;
and passing each text region through the trained direction detection model to determine the direction of the text image to be detected, and adjusting the coordinates according to the corresponding direction conversion.
Preferably, in the text detection method provided in the embodiment of the present invention, when synthesizing a real document scene image as a training set, the method further includes:
running detection on the training set, and counting the detection results;
among the counted results, retaining only the detection boxes of text-instance regions that exceed a set size, and shrinking the retained boxes by a set ratio;
and rendering the detection results with an image processing tool to achieve semi-automatic labeling.
Preferably, in the text detection method provided in the embodiment of the present invention, the denoising processing performed on the training set includes:
converting the data-enhanced training set to grayscale;
removing salt-and-pepper noise by median filtering;
enhancing text contour information with a high-contrast retention (high-pass) algorithm;
binarizing with an adaptive threshold algorithm to obtain a binary image;
and performing an AND operation between the binary image and the data-enhanced training set to obtain the noise-reduced training set.
Preferably, in the text detection method provided in an embodiment of the present invention, the correcting process performed on the training set includes:
performing edge detection on the noise-reduced training set;
detecting line directions with the Hough transform, and obtaining a rotation angle from the mode of all the line directions;
obtaining a rotation matrix from the rotation angle;
and performing an affine transformation with the rotation matrix to obtain the corrected training set.
Preferably, in the text detection method provided in the embodiment of the present invention, the Backbone part of the text detection model uses a ResNet network structure, a MobileNetV3 network structure, a RepVGG network structure, or a Swin_Transformer network structure;
the Neck part of the text detection model uses an FPN structure, a Bi_FPN structure, a PANet structure or a Recursive-FPN structure.
Preferably, in the text detection method provided in the embodiment of the present invention, an SPP network structure is fused into the Backbone part of the text detection model at the feature extraction stage.
Preferably, in the text detection method provided in the embodiment of the present invention, the direction detection model is composed of convolution, pooling, Batch Normalization, activation functions and skip connections.
The embodiment of the invention also provides a text detection apparatus, which comprises:
a data processing module, configured to synthesize real document scene images as a training set and to perform data enhancement, noise reduction and correction processing on the training set;
a model construction module, configured to construct a text detection model and a direction detection model; the input of the direction detection model is the image content at the text-instance region coordinates output by the text detection model, and the output is the corresponding direction value;
a model training module, configured to train the text detection model and the direction detection model with the processed training set until the network converges;
a text detection module, configured to input the text image to be detected into the trained text detection model for detection and to output the coordinates of the text regions in the text image to be detected;
and a direction detection module, configured to pass each text region through the trained direction detection model to determine the direction of the text image to be detected, and to adjust the coordinates according to the corresponding direction conversion.
The embodiment of the invention also provides character detection equipment which comprises a processor and a memory, wherein the processor executes the computer program stored in the memory to realize the character detection method provided by the embodiment of the invention.
The embodiment of the present invention further provides a computer-readable storage medium for storing a computer program, where the computer program is executed by a processor to implement the above-mentioned text detection method provided in the embodiment of the present invention.
According to the above technical solution, the text detection method provided by the invention comprises: synthesizing real document scene images as a training set, and performing data enhancement, noise reduction and correction processing on the training set; constructing a text detection model and a direction detection model, wherein the input of the direction detection model is the image content at the text-instance region coordinates output by the text detection model, and the output is the corresponding direction value; training the text detection model and the direction detection model with the processed training set until the network converges; inputting the text image to be detected into the trained text detection model for detection, and outputting the coordinates of the text regions in the text image to be detected; and passing each text region through the trained direction detection model to determine the direction of the text image to be detected, and adjusting the coordinates according to the corresponding direction conversion.
In the character detection method provided by the invention, a real document scene image is synthesized to be used as a training set, and data enhancement, noise reduction and correction processing are carried out on the training set, so that the richness of the training set can be increased, the interference of background noise is removed, and the generalization capability of a training model is improved; the character detection model and the direction detection model which are trained can detect characters at any angle, so that poor detection effect caused by character angle problems is avoided, the direction of the whole image can be judged according to the directions of all character instance areas, the character detection capability and accuracy under a complex scene are improved, and meanwhile, the correct character direction is provided for a character recognition stage.
In addition, the invention also provides a corresponding device, equipment and a computer readable storage medium aiming at the character detection method, so that the method has higher practicability, and the device, the equipment and the computer readable storage medium have corresponding advantages.
Drawings
In order to more clearly illustrate the embodiments of the present invention or technical solutions in related arts, the drawings used in the description of the embodiments or related arts will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart of a text detection method according to an embodiment of the present invention;
FIG. 2 shows an image and its label segmentation map after data enhancement and noise reduction, according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an SPP structure provided in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a text detection model according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a text detection apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a text detection method, as shown in FIG. 1, comprising the following steps:
S101, synthesizing real document scene images as a training set, and performing data enhancement, noise reduction and correction processing on the training set;
In practical applications, the document detection scenes addressed by the invention can include text detection on electronically scanned documents of various types and layouts (text paragraphs, tables, and mixed text, table and picture documents), and text detection on various types of documents photographed by a user with a handheld device in a complex scene. The resulting document scene images can include dense text, small-target text, and text at any angle and of any shape.
Open-source data sets mostly cover text detection in natural scenes, so a model trained on them generalizes poorly to documents. The difficulty of text detection in real scenes is mainly affected by factors such as illumination intensity, shooting angle, sharpness, the size of the text in the image and the distribution of the text. For this reason, the invention synthesizes real document scene images as the training set.
In addition, in the invention, optimization with image processing techniques is an indispensable step. It includes background interference removal, tilt correction and the like, so that the quality of the document is improved as much as possible and the model's ability to detect text is improved.
S102, constructing a text detection model and a direction detection model; the input of the direction detection model is the image content at the text-instance region coordinates output by the text detection model, and the output is the corresponding direction value;
S103, training the text detection model and the direction detection model with the processed training set until the network converges;
S104, inputting the text image to be detected into the trained text detection model for detection, and outputting the coordinates of the text regions in the text image to be detected;
S105, passing each text region through the trained direction detection model to determine the direction of the text image to be detected, and adjusting the coordinates according to the corresponding direction conversion.
In the text detection method provided by the embodiment of the invention, a real document scene image is synthesized to be used as a training set, and the training set is subjected to data enhancement, noise reduction and correction processing, so that the richness of the training set can be increased, the interference of background noise is removed, and the generalization capability of a training model is improved; the character detection model and the direction detection model which are trained can detect characters at any angle, so that poor detection effect caused by character angle problems is avoided, the direction of the whole image can be judged according to the directions of all character instance areas, the character detection capability and accuracy under a complex scene are improved, and meanwhile, the correct character direction is provided for a character recognition stage.
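As a rough illustration of step S105, the sketch below assumes that the direction detection model returns one of four direction values (0, 90, 180 or 270 degrees clockwise) for each text region, that the image direction is taken as the majority vote over the regions, and that box coordinates are then mapped back to the upright image. The four-way encoding, the vote and all function names are assumptions made for this example; the source does not specify them.

```python
from collections import Counter

def image_direction(region_directions):
    """Majority vote over per-region direction values (assumed to be degrees)."""
    return Counter(region_directions).most_common(1)[0][0]

def to_upright(x, y, width, height, direction):
    """Map a point of an image that appears rotated clockwise by `direction`
    degrees back to upright coordinates. `width` and `height` describe the
    rotated image as received."""
    if direction == 0:
        return (x, y)
    if direction == 90:
        return (y, width - x)
    if direction == 180:
        return (width - x, height - y)
    if direction == 270:
        return (height - y, x)
    raise ValueError("direction must be 0, 90, 180 or 270")

def adjust_boxes(boxes, width, height, region_directions):
    """Vote on the image direction, then convert every box corner."""
    direction = image_direction(region_directions)
    adjusted = [[to_upright(x, y, width, height, direction) for x, y in box]
                for box in boxes]
    return direction, adjusted
```

For instance, `adjust_boxes(boxes, 100, 50, [180, 180, 0])` would treat the image as upside down, because two of the three regions vote for 180 degrees, and would mirror every corner through the image centre.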
In a specific implementation, because manual labeling is time-consuming and labor-intensive, synthesizing real document scene images as the training set in step S101 may further include: running detection on the training set and counting the detection results; retaining only the detection boxes of text-instance regions that exceed a set size, and shrinking the retained boxes by a set ratio; and rendering the detection results with an image processing tool (such as OpenCV) to achieve semi-automatic labeling. This filters out unsatisfactory detection results, saves time and labor, reduces the cost of tedious manual work, and reduces the errors introduced by different manual processing passes.
It should be noted that an open-source text detection model may be used to run detection on the training set (that is, document data containing many small-target instances and dense text), and the detection results are counted. Because open-source models detect small-target instances poorly, only the detection boxes of relatively large text-instance regions are retained, to ensure the correctness of the synthesized data set, and these boxes are shrunk according to the actual situation. The computer image processing library OpenCV is then used to render new image data. This generates labels for small-target and dense text in a targeted way and also increases the number of small-target samples. It is a semi-automatic labeling approach, and it brings the synthesized training data closer to real scenes.
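The filtering and shrinking step can be sketched as follows. This is a minimal example under stated assumptions: boxes are axis-aligned `(x1, y1, x2, y2)` tuples, "exceeds the set size" means both width and height exceed a threshold, and a box is shrunk about its own centre. The function name `filter_and_shrink` and its parameters are hypothetical; in a full pipeline the corresponding image crop would also be resized when the new image is rendered.

```python
def filter_and_shrink(boxes, min_width, min_height, ratio):
    """Keep only boxes larger than the size threshold and shrink each one
    about its centre by `ratio` (for example 0.5 halves both sides)."""
    kept = []
    for x1, y1, x2, y2 in boxes:
        w, h = x2 - x1, y2 - y1
        if w <= min_width or h <= min_height:
            continue  # small detections from the open-source model are discarded
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        new_w, new_h = w * ratio, h * ratio
        kept.append((cx - new_w / 2, cy - new_h / 2,
                     cx + new_w / 2, cy + new_h / 2))
    return kept
```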
In a specific implementation, the data enhancement performed on the training set in step S101 may include random cropping, random flipping, random blurring, and random changes to image brightness. These four transformations increase both the richness and the complexity of the training samples and improve the robustness of the model. Rotation by an arbitrary angle is also applied, to improve the model's ability to detect text at different angles.
Real scenes include not only text detection on scanned documents but also detection on documents photographed by users, so environmental noise is unavoidable. Scanned documents may show page tilt, scanning artifacts, handwriting, seals on the perforation line, salt-and-pepper noise and so on; user photographs may show uneven illumination, page tilt, blurred writing, background interference and so on. Therefore, in the invention, image noise reduction and correction are applied to the input image captured by the user at the model inference stage. FIG. 2 shows an image and its label segmentation map after data enhancement and noise reduction.
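Two of the listed augmentations can be sketched in a few lines. The example below assumes a grayscale image stored as a list of rows and axis-aligned boxes `(x1, y1, x2, y2)`; the flip probability of 0.5 and the brightness range 0.7 to 1.3 are illustrative values, not taken from the source. The point of the sketch is that geometric augmentations must update the box labels together with the pixels, while photometric ones leave the labels unchanged.

```python
import random

def hflip(image, boxes):
    """Mirror the image left to right and mirror the box x-coordinates with it."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    new_boxes = [(width - x2, y1, width - x1, y2) for x1, y1, x2, y2 in boxes]
    return flipped, new_boxes

def adjust_brightness(image, factor):
    """Scale pixel values by `factor`, clipping to the 0..255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in image]

def augment(image, boxes, rng):
    """Apply a random flip and a random brightness change (assumed ranges)."""
    if rng.random() < 0.5:
        image, boxes = hflip(image, boxes)
    image = adjust_brightness(image, rng.uniform(0.7, 1.3))
    return image, boxes
```

Passing a seeded `random.Random` instance as `rng` keeps the augmentation reproducible.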
In a specific implementation, because the text detection method performs pixel-level classification based on the segmentation idea, binarization is a very important step. The noise reduction performed on the training set in step S101 may include: converting the data-enhanced training set to grayscale; removing salt-and-pepper noise by median filtering; enhancing text contour information with a high-contrast retention (high-pass) algorithm; binarizing with an adaptive threshold algorithm to obtain a binary image; and performing an AND operation between the binary image and the data-enhanced training set to obtain the noise-reduced training set. The AND operation avoids excessive loss of text information during binarization.
In a specific implementation, image correction mainly solves the problem of tilted image content. A tilted document has a great influence on the subsequent text ordering; if it is not corrected, the order of the characters and of the lines becomes confused. The correction performed on the training set in step S101 may include: performing edge detection on the noise-reduced training set; detecting line directions with the Hough transform and obtaining the rotation angle from the mode of all line directions; obtaining a rotation matrix from the rotation angle; and performing an affine transformation with the rotation matrix to obtain the corrected training set.
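The five noise-reduction steps can be traced on a tiny image with the dependency-free sketch below. It is an illustration under assumptions, not the patented implementation: the neighbourhood is fixed at 3 x 3, the high-pass step is modelled as "pixel minus local mean plus 128", the threshold offset of 5 is arbitrary, and the mask polarity (255 for pixels brighter than the local threshold) is a choice made for the example. All function names are hypothetical.

```python
from statistics import median

def to_gray(rgb):
    """Luma-weighted grayscale conversion of an image given as rows of (r, g, b)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]

def window(img, y, x, radius=1):
    """Values in the square neighbourhood of (y, x), clamping at the borders."""
    h, w = len(img), len(img[0])
    return [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]

def per_pixel(img, fn):
    """Apply fn(pixel, neighbourhood) at every position."""
    return [[fn(img[y][x], window(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]

def median_filter(img):
    # Replaces isolated salt-and-pepper pixels with the neighbourhood median.
    return per_pixel(img, lambda p, win: int(median(win)))

def high_pass(img):
    # Pixel minus local mean, re-centred on mid-grey and clipped to 0..255.
    return per_pixel(img, lambda p, win:
                     min(255, max(0, round(p - sum(win) / len(win) + 128))))

def adaptive_threshold(img, offset=5):
    # 255 where the pixel is brighter than (local mean - offset), else 0.
    return per_pixel(img, lambda p, win:
                     255 if p > sum(win) / len(win) - offset else 0)

def denoise(rgb):
    mask = adaptive_threshold(high_pass(median_filter(to_gray(rgb))))
    # Bitwise AND of every colour channel with the 0/255 mask.
    return [[tuple(c & m for c in px) for px, m in zip(row, mask_row)]
            for row, mask_row in zip(rgb, mask)]
```

In an OpenCV-based implementation the same steps would likely map to `cv2.cvtColor`, `cv2.medianBlur`, `cv2.adaptiveThreshold` and `cv2.bitwise_and`.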
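The last three correction steps (mode of the line directions, rotation matrix, affine transformation) can be sketched as below. The example assumes the Hough transform has already produced a list of line angles in degrees; binning angles to whole degrees before taking the mode, rotating about a given centre, and the function names are all assumptions made for illustration. The sign of the correcting angle depends on the angle convention of the line detector and is not fixed by the source.

```python
import math
from collections import Counter

def dominant_angle(angles_deg, bin_size=1.0):
    """Mode of the detected line directions after binning to `bin_size` degrees."""
    bins = Counter(round(a / bin_size) for a in angles_deg)
    return bins.most_common(1)[0][0] * bin_size

def rotation_matrix(angle_deg, cx, cy):
    """2 x 3 affine matrix for a rotation by `angle_deg` about (cx, cy).
    With image coordinates (y pointing down), a positive angle appears
    counter-clockwise on screen."""
    a = math.cos(math.radians(angle_deg))
    b = math.sin(math.radians(angle_deg))
    return [[a, b, (1 - a) * cx - b * cy],
            [-b, a, b * cx + (1 - a) * cy]]

def apply_affine(matrix, x, y):
    """Transform a single point with a 2 x 3 affine matrix."""
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
```

Taking the mode rather than the mean means a few stray lines (for example a table border at 45 degrees) do not pull the estimated skew away from the dominant text direction. In an OpenCV-based implementation the equivalent calls would likely be `cv2.Canny`, `cv2.HoughLines`, `cv2.getRotationMatrix2D` and `cv2.warpAffine`.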
In specific implementation, in the text detection method provided in the embodiment of the present invention, in order to improve the extraction capability of the image features of the model and the detection capability of the model on small target texts and dense texts, the text detection model adopts a text detection architecture with a mixed idea, and is mainly optimized from a backhaul part and a tack part of the text detection model.
Specifically, a Backbone part of the character detection model is mainly responsible for extracting image features, the Backbone part can use various structures, the purpose is that the feature extraction capability, the reasoning speed and the occupation of video memory are all optimal as far as possible, the method mainly comprises a ResNet series structure, the output of a feature extraction network is four feature maps with different sizes and scales, a feature map with a large scale is mainly responsible for detecting a small target, and a feature map with a small scale is responsible for detecting a large target, namely, targets with different sizes are detected according to the sizes of the feature maps. Meanwhile, the backhaul part of the character detection model is fused with an SPP (Spatial Pyramid Pooling) network structure in the feature extraction stage, so as to increase the receptive field, extract important context features without reducing the network computation speed, and the structure is shown in fig. 3.
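One common way to fuse SPP into a backbone without changing the feature-map resolution is to max-pool the same map with several kernel sizes at stride 1 and concatenate the results with the input. The sketch below assumes that variant (the source refers only to FIG. 3 and does not give the kernel sizes), works on a single channel stored as a list of rows, and uses the kernel sizes 5, 9 and 13 purely as an example.

```python
def max_pool_same(fmap, kernel):
    """Stride-1 max pooling that keeps the spatial size; positions outside
    the map are ignored, which is equivalent to padding with -infinity."""
    r = kernel // 2
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[i][j]
                 for i in range(max(0, y - r), min(h, y + r + 1))
                 for j in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)]
            for y in range(h)]

def spp(fmap, kernel_sizes=(5, 9, 13)):
    """Return the input plus one pooled copy per kernel size, as a list of
    channels to be concatenated along the channel axis."""
    return [fmap] + [max_pool_same(fmap, k) for k in kernel_sizes]
```

Each pooled copy summarises a progressively larger neighbourhood, which is how the receptive field grows while every output keeps the input resolution.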
The ResNet series network structure belongs to the classical characteristic extraction network structure in the computer vision field, the strong characteristic extraction capability of the ResNet series network structure is widely applied to various computer vision tasks, preferably, a ResNet50+ deformation convolution structure can be used in the invention, the detection capability of a model for bent characters is improved, a ResNet50 and a ResNet18 are trained simultaneously by adopting a model distillation technology, the knowledge learned by a complex model is quickly transferred to a lightweight model, the reasoning speed of the model is improved, and the occupation of a video memory is reduced.
It should be noted that the backhaul part of the text detection model may also use a MobileNetV3 network structure, a RepVGG network structure, or a Swin _ Transformer network structure. The MobileNet network belongs to a lightweight network structure, and compared with a ResNet series network structure, the MobileNet network has fewer parameters and is lighter in model. The RepVGG network structure is also called as a heavily parameterized structure, belongs to a training-reasoning decoupling architecture, and generally belongs to a Plain structure, a 3 x 3 convolution structure is mainly used in the network, a multi-branch structure is arranged in a training stage, and the RepVGG network can be fused into a single main network in a reasoning stage, so that the reasoning speed is improved while the feature extraction capability is maintained. The Swin _ Transformer structure is similar to a CNN hierarchical feature mapping mode, a self-attention mechanism is sampled to calculate local region features, and a window-shifting multi-head attention (SW-MSA) structure is used for realizing cross-domain information fusion.
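The branch fusion that makes a re-parameterized block fast at inference can be checked numerically. The sketch below is a simplified single-channel illustration that assumes three training-time branches (a 3 x 3 convolution, a 1 x 1 convolution and an identity shortcut) and omits Batch Normalization folding, which a real block would also need. It shows that adding the 1 x 1 weight and the identity to the centre of the 3 x 3 kernel gives one convolution with the same output.

```python
def conv3x3(img, kernel):
    """Zero-padded 3 x 3 convolution (cross-correlation) on one channel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += kernel[dy + 1][dx + 1] * img[yy][xx]
    return out

def multi_branch(img, k3, k1):
    """Training-time form: 3 x 3 branch + 1 x 1 branch + identity branch."""
    c = conv3x3(img, k3)
    return [[c[y][x] + k1 * img[y][x] + img[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]

def fuse(k3, k1):
    """Inference-time form: fold the 1 x 1 weight and the identity into the
    centre tap of the 3 x 3 kernel."""
    fused = [row[:] for row in k3]
    fused[1][1] += k1 + 1
    return fused
```

With these definitions, `conv3x3(img, fuse(k3, k1))` reproduces `multi_branch(img, k3, k1)` for any input, so the three branches cost a single convolution at inference time.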
Specifically, the Neck part of the character detection model may use a Bi_FPN (weighted bi-directional feature pyramid network) structure. The output of the feature extraction network is four feature maps of different scales, where a large-scale feature map is mainly responsible for detecting small targets and a small-scale feature map is responsible for detecting large targets, that is, targets of different sizes are detected according to the size of the feature map. After the feature maps of different scales are input into the Neck structure, the Bi_FPN structure makes it easier for the finer feature maps of the lower layers of the network to be passed to the upper layers, and performs efficient multi-scale fusion on the large-scale feature maps output by the Backbone, which promotes the propagation of small-target feature information through the neural network and thereby improves small-target detection performance. An ordinary FPN structure treats the features of different scales equally when fusing them, whereas the Bi_FPN structure applies weights to the feature maps of different scales, so that feature information can be better balanced. Fig. 4 shows an architecture diagram of the entire text detection model.
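The weighted fusion idea can be sketched as follows, assuming the feature maps have already been resized to a common resolution. The normalization used here (non-negative weights divided by their sum plus a small epsilon) is one widely used choice and is an assumption, as are the function and variable names.

```python
def weighted_fusion(feature_maps, weights, eps=1e-4):
    """Fuse same-sized 2D feature maps using normalized non-negative weights."""
    clipped = [max(0.0, w) for w in weights]  # keep every weight non-negative
    norm = sum(clipped) + eps                 # eps avoids division by zero
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            sum(w * fmap[y][x] for w, fmap in zip(clipped, feature_maps)) / norm
            for x in range(width)
        ]
        for y in range(height)
    ]


low_level = [[1.0, 1.0], [1.0, 1.0]]
high_level = [[3.0, 3.0], [3.0, 3.0]]
fused_equal = weighted_fusion([low_level, high_level], [1.0, 1.0])
fused_low_only = weighted_fusion([low_level, high_level], [1.0, 0.0])
```

With equal weights the result is close to a plain average; in a trained network the weights would be learned, letting it favour whichever scale carries more useful information.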
It should be noted that the Neck component of the text detection model can be replaced with an FPN (Feature Pyramid Network) structure, a PANet (Path Aggregation Network) structure, or a Recursive-FPN (Recursive Feature Pyramid Network) structure. The FPN structure is a classic feature fusion network structure; it was the first to pass semantics top-down, performs feature fusion, and alleviates the problem of feature imbalance. The PANet structure is a path aggregation network. Whereas a general FPN feature pyramid network only fuses different feature scales, PANet adds a path called Bottom_up, so that the finer feature maps of the lower layers are more easily passed to the upper layers; fusion is then performed at the same scale, followed by a concatenation operation, so that both Top-Down and Bottom-up paths exist and the resulting information is richer. The Recursive-FPN structure feeds the output of a traditional FPN network back into the Backbone structure for reuse, so that multi-scale feature information can be acquired more effectively.
In a specific implementation, in the text detection method provided in the embodiment of the present invention, the direction detection model is composed of a series of convolution, pooling, Batch Normalization, activation function, and skip connection layers. The input of the model is the image corresponding to the character region coordinates output by the character detection model; the image is converted into a grayscale image, and the character region image is warped to a size of 32 x 200 pixels using perspective transformation. The output of the model is the corresponding direction value: for example, 0 represents the correct direction, 1 represents a clockwise rotation of 90 degrees, and likewise 2 represents 180 degrees and 3 represents 270 degrees. The direction of the image is determined by the mode of the direction values of all character instance regions in the image. Working at the character instance level discards the influence of the background area, since the input consists only of the features of the character regions. This is more reliable than feeding the whole image to a direction detector, avoids the increase in both error and computation caused by the large table areas or blank areas common in documents, and allows a lighter model structure. At inference, image sizes are uniformly scaled to 32 x 200, with white padding used to preserve the character aspect ratio, and the accuracy of the final trained model is higher than when the whole image is used as input.
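The mode-based vote over per-region direction values can be sketched as below. The direction codes follow the description (0, 1, 2, 3 for 0, 90, 180 and 270 degrees); the tie-breaking rule, the fallback for an image with no detected regions and the function name are assumptions.

```python
from collections import Counter

# Direction codes as described: code -> clockwise rotation in degrees.
DIRECTION_TO_DEGREES = {0: 0, 1: 90, 2: 180, 3: 270}


def image_direction(region_directions):
    """Return the most frequent direction value among all text regions.

    Assumptions: an image without regions is treated as upright (0), and
    ties are resolved in favour of the smallest direction value.
    """
    if not region_directions:
        return 0
    counts = Counter(region_directions)
    best_count = max(counts.values())
    return min(d for d, c in counts.items() if c == best_count)


votes = [0, 1, 1, 3, 1]
direction = image_direction(votes)
degrees = DIRECTION_TO_DEGREES[direction]
```

Because a single misclassified region is outvoted by the others, the image-level decision is more robust than any individual prediction.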
Based on the same inventive concept, the embodiment of the present invention further provides a text detection apparatus. Since the principle by which the apparatus solves the problem is similar to that of the text detection method, the implementation of the apparatus can refer to the implementation of the text detection method, and repeated description is omitted.
In specific implementation, as shown in fig. 5, the text detection apparatus provided in the embodiment of the present invention specifically includes:
the data processing module 11 is configured to synthesize a real document scene image as a training set, and perform data enhancement, noise reduction, and correction processing on the training set;
the model construction module 12 is used for constructing a character detection model and a direction detection model; the input of the direction detection model is an image value corresponding to the text instance region coordinates output by the text detection model, and the output is a corresponding direction value;
the model training module 13 is used for training the character detection model and the direction detection model by using the processed training set until the network converges;
the character detection module 14 is configured to input the character image to be detected to the trained character detection model for detection, and output coordinates of a character region in the character image to be detected;
and the direction detection module 15 is used for judging the direction of the character image to be detected by applying the trained direction detection model to each character region, and performing the coordinate adjustment of the corresponding direction conversion.
In the character detection device provided by the embodiment of the invention, the interaction of the above modules increases the richness of the training set, removes the interference of background noise, and improves the generalization capability of the trained model. The device can detect characters at any angle, which avoids poor detection caused by character angle problems; it can judge the direction of the whole image from the directions of all character instance regions, which improves character detection capability and accuracy in complex scenes and provides the correct character direction for the character recognition stage.
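The description does not spell out how coordinates are adjusted after the direction is judged. The sketch below shows one way to do it, under the assumption that direction value k means the image content appears rotated clockwise by k x 90 degrees, so that restoring it requires k counter-clockwise quarter turns. Coordinates are continuous, with the origin at the top-left corner and the y axis pointing down; the function name is hypothetical.

```python
def to_upright(points, width, height, direction):
    """Map points from a rotated image into the upright image.

    Each loop iteration applies one counter-clockwise quarter turn:
    (x, y) -> (y, width - x), after which width and height swap.
    Returns the adjusted points and the upright image size.
    """
    for _ in range(direction % 4):
        points = [(y, width - x) for x, y in points]
        width, height = height, width
    return points, width, height


# A box in a 200 x 100 image whose text was judged to have direction 1.
box = [(200, 0), (200, 100), (150, 100), (150, 0)]
upright_box, new_width, new_height = to_upright(box, 200, 100, 1)
```

After one quarter turn the top-right corner of the rotated image lands on the top-left corner of the upright image, and the image size changes from 200 x 100 to 100 x 200.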
For more specific working processes of the modules, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Correspondingly, the embodiment of the invention also discloses a character detection device, which comprises a processor and a memory; the processor executes the computer program stored in the memory to implement the character detection method disclosed in the foregoing embodiments.
For more specific processes of the above method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Further, the present invention also discloses a computer readable storage medium for storing a computer program; the computer program, when executed by a processor, implements the text detection method disclosed above.
For more specific processes of the above method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts can be cross-referenced between embodiments. Because the device, the equipment and the storage medium disclosed by the embodiments correspond to the method disclosed by the embodiments, their description is relatively brief, and the relevant points can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
To sum up, a text detection method provided by the embodiment of the present invention includes: synthesizing a real document scene image as a training set, and performing data enhancement, noise reduction and correction processing on the training set; constructing a character detection model and a direction detection model, where the input of the direction detection model is an image value corresponding to the text instance region coordinates output by the text detection model, and the output is a corresponding direction value; training the character detection model and the direction detection model using the processed training set until the network converges; inputting the character image to be detected into the trained character detection model for detection, and outputting the coordinates of the character regions in the character image to be detected; and judging the direction of the character image to be detected by applying the trained direction detection model to each character region, and performing the coordinate adjustment of the corresponding direction conversion. In this method, synthesizing real document scene images as the training set and applying data enhancement, noise reduction and correction processing increases the richness of the training set, removes the interference of background noise, and improves the generalization capability of the trained model. The trained character detection model and direction detection model can detect characters at any angle, which avoids poor detection caused by character angle problems; the direction of the whole image can be judged from the directions of all character instance regions, which improves character detection capability and accuracy in complex scenes and provides the correct character direction for the character recognition stage.
In addition, the invention also provides a corresponding device, equipment and a computer readable storage medium aiming at the character detection method, so that the character detection method has higher practicability, and the device, the equipment and the computer readable storage medium have corresponding advantages.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The method, the apparatus, the device and the storage medium for detecting characters provided by the present invention are described in detail above, and a specific example is applied in the description to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
Claims (10)
1. A text detection method, comprising:
synthesizing a real document scene image as a training set, and performing data enhancement, noise reduction and correction processing on the training set;
constructing a character detection model and a direction detection model; the input of the direction detection model is an image value corresponding to the text instance region coordinates output by the text detection model, and the output is a corresponding direction value;
training the character detection model and the direction detection model by using the processed training set until the network converges;
inputting the character image to be detected into the trained character detection model for detection, and outputting the coordinates of the character area in the character image to be detected;
and judging the direction of the character image to be detected by applying the trained direction detection model to each character region, and performing the coordinate adjustment of the corresponding direction conversion.
2. The text detection method of claim 1, wherein, while synthesizing a real document scene image as a training set, further comprising:
detecting the training set, and counting the detected result;
in the statistical result, retaining only the detection boxes of text instance regions whose detected size exceeds a set size, and shrinking the retained detection boxes by a set proportion;
and rendering the detection result by adopting an image processing tool to realize semi-automatic marking.
3. The text detection method of claim 1, wherein denoising the training set comprises:
converting the training set after data enhancement into a gray scale map;
removing salt-and-pepper noise by median filtering;
enhancing the character outline information by adopting a high-pass (high-contrast-preserving) algorithm;
performing binarization with an adaptive threshold algorithm to obtain a binary image;
and performing AND operation on the obtained binary image and the training set after data enhancement to obtain the training set after noise reduction.
4. The text detection method of claim 3, wherein the correcting the training set comprises:
carrying out edge detection processing on the training set subjected to noise reduction;
detecting line directions using the Hough transform, and obtaining a rotation angle from the mode of all line directions;
obtaining a rotation matrix according to the rotation angle;
and carrying out affine transformation according to the rotation matrix to obtain the training set after correction processing.
5. The text detection method according to claim 1, wherein the Backbone part of the text detection model uses a ResNet network structure, a MobileNetV3 network structure, a RepVGG network structure, or a Swin_Transformer network structure;
the Neck part of the character detection model adopts an FPN structure, a Bi_FPN structure, a PANet structure or a Recursive-FPN structure.
6. The text detection method according to claim 5, wherein the Backbone part of the text detection model is fused with an SPP network structure in a feature extraction stage.
7. The text detection method of claim 1, wherein the direction detection model consists of convolution, pooling, Batch Normalization, activation function, and skip connection layers.
8. A character detection apparatus, comprising:
the data processing module is used for synthesizing a real document scene image as a training set and carrying out data enhancement, noise reduction and correction processing on the training set;
the model construction module is used for constructing a character detection model and a direction detection model; the input of the direction detection model is an image value corresponding to the text instance region coordinates output by the text detection model, and the output is a corresponding direction value;
the model training module is used for training the character detection model and the direction detection model by using the processed training set until the network converges;
the character detection module is used for inputting the character image to be detected to the trained character detection model for detection and outputting the coordinates of the character area in the character image to be detected;
and the direction detection module is used for judging the direction of the character image to be detected by applying the trained direction detection model to each character region, and performing the coordinate adjustment of the corresponding direction conversion.
9. A text detection device comprising a processor and a memory, wherein the processor implements the text detection method of any one of claims 1 to 7 when executing a computer program stored in the memory.
10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the text detection method according to any one of claims 1 to 7.
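For illustration only, the last three correction steps recited in claim 4 (taking the mode of the line directions, building a rotation matrix, and applying the affine transformation) can be sketched in pure Python as follows. The line angles are assumed to have been produced already by edge detection and a Hough transform; the 1-degree binning, the sign convention and all function names are assumptions rather than part of the claims.

```python
import math
from collections import Counter


def dominant_angle(line_angles_deg, bin_size=1.0):
    """Mode of the line directions after rounding to bins of bin_size degrees."""
    if not line_angles_deg:
        return 0.0  # assumption: no lines detected means no rotation
    bins = Counter(round(a / bin_size) for a in line_angles_deg)
    most_common_bin, _ = bins.most_common(1)[0]
    return most_common_bin * bin_size


def rotation_matrix(angle_deg, center):
    """2 x 3 affine matrix rotating about center.

    With the y axis pointing down, positive angles rotate counter-clockwise.
    """
    cx, cy = center
    a = math.cos(math.radians(angle_deg))
    b = math.sin(math.radians(angle_deg))
    return [
        [a, b, (1 - a) * cx - b * cy],
        [-b, a, b * cx + (1 - a) * cy],
    ]


def apply_affine(matrix, point):
    """Apply a 2 x 3 affine matrix to a single (x, y) point."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )


angles = [1.9, 2.1, 2.0, 45.0, -3.2]   # hypothetical Hough line angles
skew = dominant_angle(angles)
matrix = rotation_matrix(skew, center=(100.0, 50.0))
center_after = apply_affine(matrix, (100.0, 50.0))
```

Using the mode rather than the mean keeps a few stray lines, such as the 45-degree outlier above, from biasing the estimated skew. Whether the image is then rotated by the angle or by its negative depends on how the line angles are measured.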
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110996114.6A CN113705673B (en) | 2021-08-27 | 2021-08-27 | Text detection method, text detection device, text detection equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110996114.6A CN113705673B (en) | 2021-08-27 | 2021-08-27 | Text detection method, text detection device, text detection equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113705673A true CN113705673A (en) | 2021-11-26 |
CN113705673B CN113705673B (en) | 2023-12-12 |
Family
ID=78656035
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110996114.6A Active CN113705673B (en) | 2021-08-27 | 2021-08-27 | Text detection method, text detection device, text detection equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113705673B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114067192A (en) * | 2022-01-07 | 2022-02-18 | 北京许先网科技发展有限公司 | Character recognition method and system |
CN114898375A (en) * | 2022-05-20 | 2022-08-12 | 深信服科技股份有限公司 | Character detection model training method and component, text recognition method and component |
CN114998482A (en) * | 2022-06-13 | 2022-09-02 | 厦门大学 | Intelligent generation method of characters and artistic patterns |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097049A (en) * | 2019-04-03 | 2019-08-06 | 中国科学院计算技术研究所 | A kind of natural scene Method for text detection and system |
CN110222680A (en) * | 2019-05-19 | 2019-09-10 | 天津大学 | A kind of domestic waste article outer packing Method for text detection |
CN111353491A (en) * | 2020-03-12 | 2020-06-30 | 中国建设银行股份有限公司 | Character direction determining method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN113705673B (en) | 2023-12-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gao et al. | ICDAR 2019 competition on table detection and recognition (cTDaR) | |
CN108898610B (en) | Object contour extraction method based on mask-RCNN | |
CN113705673B (en) | Text detection method, text detection device, text detection equipment and storage medium | |
CN110598609B (en) | Weak supervision target detection method based on significance guidance | |
CN109948510B (en) | Document image instance segmentation method and device | |
CN109409366B (en) | Distorted image correction method and device based on angular point detection | |
CN110378310B (en) | Automatic generation method of handwriting sample set based on answer library | |
US8442319B2 (en) | System and method for classifying connected groups of foreground pixels in scanned document images according to the type of marking | |
US20130208986A1 (en) | Character recognition | |
CN112712273B (en) | Handwriting Chinese character aesthetic degree judging method based on skeleton similarity | |
CN110598566A (en) | Image processing method, device, terminal and computer readable storage medium | |
CN112949455B (en) | Value-added tax invoice recognition system and method | |
CN112069900A (en) | Bill character recognition method and system based on convolutional neural network | |
CN109190625A (en) | A kind of container number identification method of wide-angle perspective distortion | |
CN112800955A (en) | Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid | |
CN114529925A (en) | Method for identifying table structure of whole line table | |
CN113158895A (en) | Bill identification method and device, electronic equipment and storage medium | |
CN110335280A (en) | A kind of financial documents image segmentation and antidote based on mobile terminal | |
CN116704516B (en) | Visual inspection method for water-soluble fertilizer package | |
CN116071763A (en) | Teaching book intelligent correction system based on character recognition | |
CN113033558A (en) | Text detection method and device for natural scene and storage medium | |
CN103455816B (en) | Stroke width extraction method and device and character recognition method and system | |
CN110443235B (en) | Intelligent paper test paper total score identification method and system | |
CN115512379A (en) | Method and system for identifying and extracting check result of check box in paper text | |
CN114581928A (en) | Form identification method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||