CN113705673A - Character detection method, device, equipment and storage medium - Google Patents

Character detection method, device, equipment and storage medium

Info

Publication number
CN113705673A
Authority
CN
China
Prior art keywords
character
detection model
detection
training set
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110996114.6A
Other languages
Chinese (zh)
Other versions
CN113705673B (en)
Inventor
王明辉
闾磊
邓川
黄甫毅
胡一可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Yishu Technology Co ltd
Original Assignee
Sichuan Yishu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Yishu Technology Co ltd
Priority to CN202110996114.6A
Publication of CN113705673A
Application granted
Publication of CN113705673B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a character detection method, device, equipment and storage medium. The method comprises the following steps: synthesizing real document scene images as a training set, and performing data enhancement, noise reduction and correction processing on the training set; constructing a character detection model and a direction detection model, wherein the input of the direction detection model is the image content at the text instance region coordinates output by the character detection model, and the output is the corresponding direction value; training the character detection model and the direction detection model using the training set; inputting the character image to be detected into the trained character detection model and outputting the coordinates of the character regions in the image; and determining the direction of the image by applying the trained direction detection model to each character region, and adjusting the coordinates according to the corresponding direction conversion. In this way, the richness of the training set is increased, the generalization capability of the trained model is improved, character detection in complex scenes is improved, and the correct character direction is provided to the character recognition stage.

Description

Character detection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of text detection, and in particular, to a text detection method, apparatus, device, and storage medium.
Background
At present, mainstream OCR (Optical Character Recognition) technology is implemented in two stages, detection and recognition. The detection stage detects the regions of an image in which characters exist and returns the coordinates of those regions; the recognition stage recognizes the image content at the coordinates output by the detection stage and returns the corresponding characters. Detection is an important step in OCR: accurately locating the characters is the prerequisite for recognition, and it removes redundant information before the recognition stage.
Character detection is a branch of object detection. Because many text instances are curved or irregular, and because gaps exist between characters, character detection demands higher precision. Existing character detection techniques fall mainly into traditional detection methods, segmentation-based methods, regression-based methods, and hybrid methods. These methods work well on text of various regular shapes, but perform poorly on dense characters, small-target characters and characters in complex scenes, where they are prone to missed detections, false detections, low detection quality, redundant detection boxes and similar problems.
Therefore, how to improve character detection in complex scenes is a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a device and a storage medium for detecting a character, which can improve the character detection capability in a complex scene and provide a correct character direction for a character recognition stage. The specific scheme is as follows:
a text detection method, comprising:
synthesizing a real document scene image as a training set, and performing data enhancement, noise reduction and correction processing on the training set;
constructing a character detection model and a direction detection model; the input of the direction detection model is the image content at the text instance region coordinates output by the character detection model, and the output is the corresponding direction value;
training the character detection model and the direction detection model by using the processed training set until the network converges;
inputting the character image to be detected into the trained character detection model for detection, and outputting the coordinates of the character regions in the character image to be detected;
and determining the direction of the character image to be detected by applying the trained direction detection model to each character region, and adjusting the coordinates according to the corresponding direction conversion.
Preferably, in the text detection method provided in the embodiment of the present invention, when synthesizing a real document scene image as a training set, the method further includes:
running detection on the training set, and collecting statistics on the detection results;
in the statistics, retaining only the detection boxes of text instance regions that exceed a set size, and shrinking the retained detection boxes by a set ratio;
and rendering the detection results with an image processing tool to realize semi-automatic labeling.
Preferably, in the text detection method provided in the embodiment of the present invention, the denoising processing performed on the training set includes:
converting the training set after data enhancement into grayscale images;
removing salt-and-pepper noise by median filtering;
enhancing character outline information with a high-contrast-preserving (high-pass) algorithm;
performing binarization with an adaptive threshold algorithm to obtain a binary image;
and performing an AND operation on the obtained binary image and the training set after data enhancement to obtain the training set after noise reduction.
Preferably, in the text detection method provided in an embodiment of the present invention, the correcting process performed on the training set includes:
carrying out edge detection processing on the training set subjected to noise reduction;
detecting line directions by Hough transform, and obtaining a rotation angle from the mode of all line directions;
obtaining a rotation matrix according to the rotation angle;
and carrying out affine transformation according to the rotation matrix to obtain the training set after correction processing.
Preferably, in the text detection method provided in the embodiment of the present invention, the Backbone part of the character detection model uses a ResNet network structure, a MobileNetV3 network structure, a RepVGG network structure, or a Swin Transformer network structure;
the Neck part of the character detection model adopts an FPN structure, a BiFPN structure, a PANet structure or a Recursive-FPN structure.
Preferably, in the text detection method provided in the embodiment of the present invention, an SPP network structure is fused into the Backbone part of the character detection model in the feature extraction stage.
Preferably, in the text detection method provided in the embodiment of the present invention, the direction detection model is composed of convolution, pooling, Batch Normalization, activation functions, and skip connections.
The embodiment of the invention also provides a character detection device, which comprises:
the data processing module is used for synthesizing a real document scene image as a training set and carrying out data enhancement, noise reduction and correction processing on the training set;
the model construction module is used for constructing a character detection model and a direction detection model; the input of the direction detection model is the image content at the text instance region coordinates output by the character detection model, and the output is the corresponding direction value;
the model training module is used for training the character detection model and the direction detection model by using the processed training set until the network converges;
the character detection module is used for inputting the character image to be detected to the trained character detection model for detection and outputting the coordinates of the character area in the character image to be detected;
and the direction detection module is used for determining the direction of the character image to be detected by applying the trained direction detection model to each character region, and adjusting the coordinates according to the corresponding direction conversion.
The embodiment of the invention also provides character detection equipment which comprises a processor and a memory, wherein the processor executes the computer program stored in the memory to realize the character detection method provided by the embodiment of the invention.
The embodiment of the present invention further provides a computer-readable storage medium for storing a computer program, where the computer program is executed by a processor to implement the above-mentioned text detection method provided in the embodiment of the present invention.
According to the above technical scheme, the character detection method provided by the invention comprises the following steps: synthesizing real document scene images as a training set, and performing data enhancement, noise reduction and correction processing on the training set; constructing a character detection model and a direction detection model, wherein the input of the direction detection model is the image content at the text instance region coordinates output by the character detection model, and the output is the corresponding direction value; training the character detection model and the direction detection model by using the processed training set until the network converges; inputting the character image to be detected into the trained character detection model for detection, and outputting the coordinates of the character regions in the character image to be detected; and determining the direction of the character image to be detected by applying the trained direction detection model to each character region, and adjusting the coordinates according to the corresponding direction conversion.
In the character detection method provided by the invention, a real document scene image is synthesized to be used as a training set, and data enhancement, noise reduction and correction processing are carried out on the training set, so that the richness of the training set can be increased, the interference of background noise is removed, and the generalization capability of a training model is improved; the character detection model and the direction detection model which are trained can detect characters at any angle, so that poor detection effect caused by character angle problems is avoided, the direction of the whole image can be judged according to the directions of all character instance areas, the character detection capability and accuracy under a complex scene are improved, and meanwhile, the correct character direction is provided for a character recognition stage.
In addition, the invention also provides a corresponding device, equipment and a computer readable storage medium aiming at the character detection method, so that the method has higher practicability, and the device, the equipment and the computer readable storage medium have corresponding advantages.
Drawings
In order to more clearly illustrate the embodiments of the present invention or technical solutions in related arts, the drawings used in the description of the embodiments or related arts will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart of a text detection method according to an embodiment of the present invention;
FIG. 2 shows an image and its label segmentation map after data enhancement and noise reduction, according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an SPP structure provided in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a text detection model according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a text detection device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a character detection method which, as shown in FIG. 1, comprises the following steps:
S101, synthesizing real document scene images as a training set, and performing data enhancement, noise reduction and correction processing on the training set;
In practical applications, the document detection scenarios targeted by the invention include character detection on electronic scanned documents of various types and layouts (text paragraphs, tables, and mixtures of text, tables and pictures), and character detection on documents of various types photographed by a user with a handheld device in complex scenes. The resulting document scene images may contain dense text, small-target characters, and characters at arbitrary angles and in arbitrary shapes.
Open-source datasets mostly cover character detection in natural scenes, so a model trained on them generalizes poorly. The difficulty of character detection in real scenes stems mainly from factors such as illumination intensity, shooting angle, sharpness, the size of the characters in the image, and the distribution of the characters.
In addition, optimization with image processing techniques, including background interference removal and skew correction, is an indispensable step in the invention: it improves the quality of the document as much as possible and thereby improves the model's ability to detect characters.
S102, constructing a character detection model and a direction detection model; the input of the direction detection model is the image content at the text instance region coordinates output by the character detection model, and the output is the corresponding direction value;
S103, training the character detection model and the direction detection model by using the processed training set until the network converges;
S104, inputting the character image to be detected into the trained character detection model for detection, and outputting the coordinates of the character regions in the character image to be detected;
S105, determining the direction of the character image to be detected by applying the trained direction detection model to each character region, and adjusting the coordinates according to the corresponding direction conversion.
In the text detection method provided by the embodiment of the invention, a real document scene image is synthesized to be used as a training set, and the training set is subjected to data enhancement, noise reduction and correction processing, so that the richness of the training set can be increased, the interference of background noise is removed, and the generalization capability of a training model is improved; the character detection model and the direction detection model which are trained can detect characters at any angle, so that poor detection effect caused by character angle problems is avoided, the direction of the whole image can be judged according to the directions of all character instance areas, the character detection capability and accuracy under a complex scene are improved, and meanwhile, the correct character direction is provided for a character recognition stage.
In specific implementation, in the text detection method provided in the embodiment of the present invention, because manual labeling is time-consuming and labor-intensive, synthesizing real document scene images as a training set in step S101 may further include: running detection on the training set and collecting statistics on the detection results; retaining only the detection boxes of text instance regions that exceed a set size, and shrinking the retained detection boxes by a set ratio; and rendering the detection results with an image processing tool (such as OpenCV) to realize semi-automatic labeling. In this way, unsatisfactory detection results are filtered out, time and labor are saved, the cost of tedious manual work is reduced, and the errors that arise between different manual annotation passes are reduced.
It should be noted that an open-source character detection model may be used to run detection on the training set (i.e., document data with a large number of small-target instances and dense characters), and the detection results are counted. Because open-source models detect small-target instances poorly, only the detection boxes of relatively large text instance regions are retained, to ensure the correctness of the synthesized dataset, and these boxes are shrunk according to the actual situation. The computer image processing library OpenCV is then used to render new image data. This generates labels for small-target and dense characters in a targeted manner and at the same time increases the number of small-target samples. It is a semi-automatic labeling approach that brings the synthesized training data closer to real scenes.
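The filter-and-shrink step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the axis-aligned box format, the function name `filter_and_shrink`, the thresholds, and the choice to shrink each box about its own centre are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]

def filter_and_shrink(boxes: List[Box], min_w: float, min_h: float,
                      ratio: float) -> List[Box]:
    """Keep boxes of at least min_w x min_h, then scale each about its centre by ratio."""
    kept = []
    for x0, y0, x1, y1 in boxes:
        w, h = x1 - x0, y1 - y0
        if w < min_w or h < min_h:
            continue  # small detections are treated as unreliable and dropped
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        half_w, half_h = w * ratio / 2, h * ratio / 2
        kept.append((cx - half_w, cy - half_h, cx + half_w, cy + half_h))
    return kept

# One large box survives and is halved; the small box is filtered out.
labels = filter_and_shrink([(0, 0, 100, 40), (0, 0, 10, 5)],
                           min_w=20, min_h=10, ratio=0.5)
```

In a full pipeline the image content of each retained region would be resized by the same ratio and rendered into a new image, so that the shrunken boxes remain valid labels.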
In specific implementation, in the text detection method provided in the embodiment of the present invention, the data enhancement performed on the training set in step S101 may include random cropping, random flipping, random blurring, and random changes of image brightness. These four transformations increase both the richness and the complexity of the training samples and thereby improve the robustness of the model. Rotation by an arbitrary angle is also applied, to improve the model's ability to detect characters at different angles.
Real scenes include not only character detection on scanned documents but also detection on documents photographed by users, so environmental noise is unavoidable. Scanned documents may exhibit page skew, scanning artifacts, handwriting, seals over perforations, salt-and-pepper noise and the like; photographs taken by users may exhibit uneven illumination, page skew, blurred writing, background interference and the like. Therefore, in the invention, noise reduction and correction are also applied, in the model inference stage, to the input images captured by users. FIG. 2 shows an image and its label segmentation map after data enhancement and noise reduction.
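A minimal NumPy sketch of the four random transformations is given below. The crop fraction, the probabilities, the 3 x 3 box blur and the brightness range are assumed values chosen for illustration; the patent does not specify them, and arbitrary-angle rotation is omitted for brevity.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply random crop, flip, blur and brightness change to an HxWx3 uint8 image."""
    h, w = image.shape[:2]
    # Random crop keeping at least 80% of each side (assumed fraction).
    ch = int(rng.integers(max(1, int(0.8 * h)), h + 1))
    cw = int(rng.integers(max(1, int(0.8 * w)), w + 1))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    out = image[top:top + ch, left:left + cw].astype(np.float32)
    # Random horizontal flip.
    if rng.random() < 0.5:
        out = out[:, ::-1]
    # Random blur: 3x3 box filter built from shifted views of an edge-padded copy.
    if rng.random() < 0.5:
        p = np.pad(out, ((1, 1), (1, 1), (0, 0)), mode="edge")
        out = sum(p[dy:dy + ch, dx:dx + cw] for dy in range(3) for dx in range(3)) / 9.0
    # Random brightness scaling.
    out = out * rng.uniform(0.7, 1.3)
    return np.clip(out, 0, 255).astype(np.uint8)

sample = augment(np.full((50, 80, 3), 128, dtype=np.uint8), np.random.default_rng(0))
```

Geometric transformations such as cropping and flipping must also be applied to the label coordinates; that bookkeeping is left out of this sketch.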
In a specific implementation, in the above text detection method provided in the embodiment of the present invention, because the text detection method performs pixel-level classification based on a segmentation approach, binarization is a very important step. The noise reduction performed on the training set in step S101 may include: converting the training set after data enhancement into grayscale images; removing salt-and-pepper noise by median filtering; enhancing character outline information with a high-contrast-preserving (high-pass) algorithm; performing binarization with an adaptive threshold algorithm to obtain a binary image; and performing an AND operation on the obtained binary image and the training set after data enhancement to obtain the noise-reduced training set. Combining the binary image with the data-enhanced image in this way avoids excessive loss of character information caused by binarization.
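The noise-reduction chain can be sketched in plain NumPy as below. Several details are assumptions rather than statements of the patent: the window sizes, the threshold offset, the unsharp-mask form of the high-pass enhancement, and the mask polarity (text pixels set to 255 so that the AND keeps their original values). OpenCV offers equivalent building blocks such as `cv2.medianBlur`, `cv2.adaptiveThreshold` and `cv2.bitwise_and`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _windows(img: np.ndarray, k: int) -> np.ndarray:
    """All k x k neighbourhoods of an edge-padded 2-D image, shape HxWxkxk."""
    pad = k // 2
    return sliding_window_view(np.pad(img, pad, mode="edge"), (k, k))

def denoise(image: np.ndarray, block: int = 15, offset: float = 10.0) -> np.ndarray:
    """Sketch of the noise-reduction chain for an HxWx3 uint8 image with dark text."""
    # 1. Grayscale using BT.601 luma weights (RGB channel order assumed).
    luma = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = image.astype(np.float32) @ luma
    # 2. A 3x3 median filter removes isolated salt-and-pepper pixels.
    med = np.median(_windows(gray, 3), axis=(2, 3))
    # 3. High-pass style enhancement: add back the detail a local mean removes.
    detail = med - _windows(med, 5).mean(axis=(2, 3))
    enhanced = np.clip(med + detail, 0, 255)
    # 4. Adaptive threshold: text if darker than the local mean minus offset.
    local_mean = _windows(enhanced, block).mean(axis=(2, 3))
    mask = np.where(enhanced < local_mean - offset, 255, 0).astype(np.uint8)
    # 5. AND with the original so text pixels keep their original values.
    return np.bitwise_and(image, mask[..., None])

page = np.full((40, 40, 3), 255, dtype=np.uint8)
page[18:21, 5:35] = 30   # a dark text stroke
page[5, 5] = 30          # an isolated noise pixel
clean = denoise(page)
```

In this example the stroke keeps its original value while the isolated pixel and the background are zeroed. The windowed views trade memory for brevity, so the sketch suits small images only.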
In specific implementation, in the text detection method provided in the embodiment of the present invention, image correction mainly addresses skew of the image content. A skewed document strongly affects the subsequent text ordering step: without correction, the order of the characters and the order of the lines become confused. The correction performed on the training set in step S101 may include: performing edge detection on the denoised training set; detecting line directions by Hough transform and obtaining a rotation angle from the mode of all line directions; obtaining a rotation matrix from the rotation angle; and performing an affine transformation with the rotation matrix to obtain the corrected training set.
In specific implementation, in the text detection method provided in the embodiment of the present invention, in order to improve the model's ability to extract image features and to detect small-target text and dense text, the character detection model adopts a hybrid text detection architecture, and is optimized mainly in its Backbone part and its Neck part.
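The last three correction steps (mode of the line directions, rotation matrix, affine mapping) can be sketched as follows, assuming the line angles have already been obtained from a Hough transform. The function names and the 1-degree binning are illustrative assumptions. The matrix follows the same convention as `cv2.getRotationMatrix2D` with scale 1, where a positive angle rotates counter-clockwise in image coordinates.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def dominant_angle(angles_deg: Sequence[float], bin_width: float = 1.0) -> float:
    """Mode of the line directions after rounding to bins of bin_width degrees."""
    bins = Counter(round(a / bin_width) for a in angles_deg)
    return bins.most_common(1)[0][0] * bin_width

def rotation_matrix(angle_deg: float, cx: float, cy: float) -> List[List[float]]:
    """2x3 affine matrix that rotates by angle_deg about the point (cx, cy)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [[cos_a, sin_a, (1 - cos_a) * cx - sin_a * cy],
            [-sin_a, cos_a, sin_a * cx + (1 - cos_a) * cy]]

def apply_affine(m: List[List[float]], x: float, y: float) -> Tuple[float, float]:
    """Map a single point through a 2x3 affine matrix."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Most detected lines are tilted by about 2 degrees; outliers do not move the mode.
skew = dominant_angle([2.1, 1.9, 2.3, 45.0, 1.8, -30.2])
matrix = rotation_matrix(skew, cx=100.0, cy=50.0)
```

Using the mode rather than the mean makes the estimate insensitive to a few stray lines, such as table borders or picture edges. Whether the image is rotated by the estimated angle or by its negative depends on how the Hough angles are measured.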
Specifically, the Backbone part of the character detection model is mainly responsible for extracting image features. Various structures can be used for the Backbone, with the aim of making feature extraction capability, inference speed and video memory usage all as good as possible; the ResNet family is the main choice. The output of the feature extraction network is four feature maps of different scales: the large-scale feature maps are mainly responsible for detecting small targets and the small-scale feature maps for detecting large targets, that is, targets of different sizes are detected according to the size of the feature map. In addition, an SPP (Spatial Pyramid Pooling) network structure is fused into the Backbone part in the feature extraction stage, to enlarge the receptive field and extract important context features without reducing the computation speed of the network. The structure is shown in FIG. 3.
The ResNet family is a classic feature extraction architecture in computer vision, and its strong feature extraction capability is widely used across computer vision tasks. Preferably, the invention uses a ResNet50 with deformable convolution to improve the model's ability to detect curved text, and trains ResNet50 and ResNet18 simultaneously using model distillation, so that the knowledge learned by the complex model is transferred quickly to the lightweight model, which increases inference speed and reduces video memory usage.
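One common form of SPP block, sketched below, runs several stride-1 max-pooling branches in parallel and concatenates them with the input along the channel axis. Whether FIG. 3 uses this form is not stated in the text; the kernel sizes 5, 9 and 13 and the function names are assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def max_pool_same(x: np.ndarray, k: int) -> np.ndarray:
    """Stride-1 max pooling over a CxHxW float map; -inf padding keeps the size."""
    pad = k // 2
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    return sliding_window_view(padded, (k, k), axis=(1, 2)).max(axis=(3, 4))

def spp(x: np.ndarray, kernels=(5, 9, 13)) -> np.ndarray:
    """Concatenate the input with max-pooled copies at several receptive fields."""
    return np.concatenate([x] + [max_pool_same(x, k) for k in kernels], axis=0)

features = np.random.default_rng(0).standard_normal((4, 16, 16)).astype(np.float32)
pooled = spp(features)  # 4 input channels become 16 output channels
```

Because every branch keeps the spatial size, the block enlarges the receptive field without changing the feature-map resolution; a 1 x 1 convolution would normally follow to reduce the channel count.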
It should be noted that the Backbone part of the character detection model may also use a MobileNetV3, RepVGG or Swin Transformer network structure. MobileNet is a lightweight network structure with fewer parameters and a lighter model than the ResNet family. RepVGG, also called a re-parameterized structure, is an architecture that decouples training from inference. It is a plain structure built mainly from 3 x 3 convolutions: it has a multi-branch structure in the training stage, and the branches can be fused into a single-path network in the inference stage, which increases inference speed while preserving feature extraction capability. The Swin Transformer uses hierarchical feature maps similar to those of a CNN, applies self-attention to compute features within local windows, and uses shifted-window multi-head self-attention (SW-MSA) to fuse information across windows.
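The branch fusion behind RepVGG-style re-parameterization can be illustrated with a small NumPy sketch. It assumes that Batch Normalization has already been folded into each branch and that the input and output channel counts are equal (needed for the identity branch); `conv2d_same` and `fuse_branches` are hypothetical helper names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stride-1 'same' cross-correlation. x: CxHxW, w: OxCxkxk with k odd."""
    pad = w.shape[-1] // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    win = sliding_window_view(xp, w.shape[-2:], axis=(1, 2))  # C x H x W x k x k
    return np.einsum("chwij,ocij->ohw", win, w)

def fuse_branches(w3: np.ndarray, w1: np.ndarray) -> np.ndarray:
    """Merge a 3x3 branch, a 1x1 branch and an identity branch into one 3x3 kernel."""
    fused = w3.copy()
    fused[:, :, 1, 1] += w1[:, :, 0, 0]   # the 1x1 kernel sits on the centre tap
    c = w3.shape[0]
    idx = np.arange(c)
    fused[idx, idx, 1, 1] += 1.0          # identity: centre tap of 1 per channel
    return fused

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
w3 = rng.standard_normal((4, 4, 3, 3))
w1 = rng.standard_normal((4, 4, 1, 1))
train_out = conv2d_same(x, w3) + conv2d_same(x, w1) + x   # multi-branch form
infer_out = conv2d_same(x, fuse_branches(w3, w1))         # single fused kernel
```

The two outputs agree to numerical precision, which is why the multi-branch network used in training can be replaced by a single-path network at inference.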
Specifically, the Neck part of the character detection model may use a BiFPN (weighted bi-directional feature pyramid network) structure. The feature extraction network outputs four feature maps of different scales, with the large-scale maps mainly responsible for detecting small targets and the small-scale maps for detecting large targets. After these feature maps are input into the Neck, the BiFPN structure allows the finer feature maps from the lower layers of the network to propagate easily to the upper layers and performs efficient multi-scale fusion with the large-scale feature maps output by the Backbone. This promotes the propagation of small-target feature information through the network and thus improves small-target detection performance. An ordinary FPN treats features of different scales equally when fusing them, whereas BiFPN applies weights to the feature maps of different scales, so that the feature information is better balanced. FIG. 4 shows the architecture of the entire text detection model.
It should be noted that the Neck component of the character detection model can be replaced with an FPN (Feature Pyramid Network) structure, a PANet (Path Aggregation Network) structure, or a Recursive-FPN (Recursive Feature Pyramid Network) structure. FPN is a classic feature fusion network structure: it was the first to implement a top-down semantic pathway with feature fusion, and it alleviates the problem of feature imbalance. PANet is a path aggregation network. Whereas a general FPN fuses features of different scales along the top-down path only, PANet adds a bottom-up path so that the finer feature maps of the lower layers propagate more easily to the upper layers; features are then fused at the same scale and concatenated. With both top-down and bottom-up paths, the resulting information is richer. Recursive-FPN feeds the output of a conventional FPN back into the Backbone for reuse, so that multi-scale feature information is captured more effectively.
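The weighting idea can be illustrated with the fast normalized fusion commonly used in BiFPN, where each input feature map gets a learnable non-negative weight and the weights are normalized by their sum. The sketch below fuses one fine map with an upsampled coarse map; the shapes, the nearest-neighbour upsampling and the fixed weights are assumptions for illustration, and in a real network the weights would be learned.

```python
import numpy as np

def upsample2x(f: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a CxHxW feature map."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def weighted_fusion(features, weights, eps: float = 1e-4) -> np.ndarray:
    """Fuse same-shaped feature maps with normalized non-negative weights."""
    w = np.maximum(np.asarray(weights, dtype=np.float32), 0.0)  # ReLU on weights
    w = w / (w.sum() + eps)                                     # normalize to ~1
    return sum(wi * f for wi, f in zip(w, features))

coarse = np.ones((8, 4, 4), dtype=np.float32)       # low-resolution, semantic map
fine = np.full((8, 8, 8), 3.0, dtype=np.float32)    # high-resolution, detailed map
fused = weighted_fusion([fine, upsample2x(coarse)], weights=[1.0, 1.0])
```

With equal weights this reduces to plain averaging; unequal weights let the network decide how much each scale contributes at each fusion node.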
In a specific implementation, in the text detection method provided in the embodiment of the present invention, the direction detection model is composed of a series of convolution, pooling, Batch Normalization, activation function and skip connection layers. The input of the model is the image content at the character region coordinates output by the character detection model. This content is converted to grayscale and warped by a perspective transformation to a size of 32 x 200 pixels. The output of the model is the corresponding direction value: for example, 0 represents the correct direction, 1 represents a clockwise rotation of 90 degrees, 2 represents 180 degrees, and 3 represents 270 degrees. The direction of the image is determined from the mode of the direction values of all text instance regions in the image. Working at the text instance level discards the influence of the background: the input consists only of the features of the character regions, which is more reliable than feeding the whole image to a direction detector, avoids the additional errors and computation caused by large table regions or blank regions in a document, and allows a lighter model structure. At inference, region images are uniformly scaled to 32 x 200, with white padding used to preserve the character aspect ratio, and the final trained model is more accurate than one that takes the whole image as input.
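The voting and coordinate-adjustment steps can be sketched as follows. The per-region direction values are assumed to come from the direction detection model; the function names and the pixel-index convention (top-left origin, x to the right, y downward) are assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[int, int]

def image_direction(region_directions: Sequence[int]) -> int:
    """Direction of the whole image: the mode of the per-region direction values."""
    return Counter(region_directions).most_common(1)[0][0]

def rotate_points_cw(points: Sequence[Point], width: int, height: int,
                     quarter_turns: int) -> List[Point]:
    """Map pixel coordinates into the image rotated clockwise by quarter_turns x 90 degrees."""
    out = []
    for x, y in points:
        w, h = width, height
        for _ in range(quarter_turns % 4):
            x, y = h - 1 - y, x   # one clockwise quarter turn
            w, h = h, w           # width and height swap after each turn
        out.append((x, y))
    return out

# Five regions vote; the majority says the page is rotated by one quarter turn.
direction = image_direction([0, 1, 1, 3, 1])
# Assumption: undoing a clockwise rotation of d quarter turns takes (4 - d) more turns.
corrected = rotate_points_cw([(0, 0), (3, 2)], width=4, height=3,
                             quarter_turns=(4 - direction) % 4)
```

Voting over regions means a few misclassified regions do not change the decision for the page, and the same quarter-turn mapping is applied to every detected box corner.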
Based on the same inventive concept, the embodiment of the present invention further provides a text detection apparatus, and since the principle of the apparatus for solving the problem is similar to the text detection method, the implementation of the apparatus can refer to the implementation of the text detection method, and repeated details are not repeated.
In specific implementation, as shown in fig. 5, the text detection apparatus provided in the embodiment of the present invention specifically includes:
the data processing module 11 is configured to synthesize a real document scene image as a training set, and perform data enhancement, noise reduction, and correction processing on the training set;
the model construction module 12 is used for constructing a character detection model and a direction detection model; the input of the direction detection model is the image content corresponding to the text instance region coordinates output by the character detection model, and the output is a corresponding direction value;
the model training module 13 is used for training the character detection model and the direction detection model by using the processed training set until the network converges;
the character detection module 14 is configured to input the character image to be detected to the trained character detection model for detection, and output coordinates of a character region in the character image to be detected;
and the direction detection module 15 is used for judging the direction of the character image to be detected by applying the trained direction detection model to each character area, and for performing the coordinate adjustment of the corresponding direction conversion.
In the text detection apparatus provided by the embodiment of the invention, the interaction of the above modules increases the richness of the training set, removes the interference of background noise, and improves the generalization capability of the trained model; characters at any angle can be detected, which avoids poor detection caused by character angle problems; and the direction of the whole image can be judged from the directions of all text instance regions, which improves character detection capability and accuracy in complex scenes and provides the correct character direction for the character recognition stage.
For more specific working processes of the modules, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Correspondingly, the embodiment of the invention also discloses a character detection device, which comprises a processor and a memory; the processor executes the computer program stored in the memory to implement the character detection method disclosed in the foregoing embodiments.
For more specific processes of the above method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
Further, the present invention also discloses a computer readable storage medium for storing a computer program; the computer program, when executed by a processor, implements the text detection method disclosed above.
For more specific processes of the above method, reference may be made to corresponding contents disclosed in the foregoing embodiments, and details are not repeated here.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device, the equipment and the storage medium disclosed by the embodiment correspond to the method disclosed by the embodiment, so that the description is relatively simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
To sum up, a text detection method provided by the embodiment of the present invention includes: synthesizing a real document scene image as a training set, and performing data enhancement, noise reduction and correction processing on the training set; constructing a character detection model and a direction detection model, where the input of the direction detection model is the image content corresponding to the text instance region coordinates output by the character detection model, and the output is a corresponding direction value; training the character detection model and the direction detection model by using the processed training set until the network converges; inputting the character image to be detected into the trained character detection model for detection, and outputting the coordinates of the character area in the character image to be detected; and judging the direction of the character image to be detected by applying the trained direction detection model to each character area, and performing the coordinate adjustment of the corresponding direction conversion. In this method, synthesizing a real document scene image as a training set and performing data enhancement, noise reduction and correction processing on it increases the richness of the training set, removes the interference of background noise, and improves the generalization capability of the trained model. The trained character detection model and direction detection model can detect characters at any angle, which avoids poor detection caused by character angle problems; the direction of the whole image can be judged from the directions of all text instance regions, which improves character detection capability and accuracy in complex scenes and provides the correct character direction for the character recognition stage.
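The correction processing mentioned above (the claims derive a rotation angle from the mode of the detected line directions and build a rotation matrix for an affine transformation) can be sketched as follows. This is a minimal illustration under stated assumptions: the line angles are taken as given instead of being produced by a Hough transform, they are grouped into 1-degree bins before taking the mode, and the function names are hypothetical.

```python
import math
from collections import Counter

def dominant_angle(line_angles_deg, bin_deg=1.0):
    """Mode of the line directions after rounding them to bins of bin_deg degrees."""
    bins = Counter(round(a / bin_deg) for a in line_angles_deg)
    return bins.most_common(1)[0][0] * bin_deg

def rotation_matrix(angle_deg, center):
    """2x3 affine matrix that rotates points by angle_deg about center."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    cx, cy = center
    return [[c, -s, cx - c * cx + s * cy],
            [s, c, cy - s * cx - c * cy]]

def warp_point(m, point):
    """Apply the 2x3 affine matrix m to a single (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Hypothetical line directions in degrees: mostly text lines near 2 degrees,
# plus two outliers such as a table border and a diagonal stroke.
angles = [2.1, 1.9, 2.0, 45.0, 2.2, 88.0]
skew = dominant_angle(angles)

# Rotating by -skew about the image centre levels the dominant direction.
center = (100.0, 50.0)
m = rotation_matrix(-skew, center)
tilted = (center[0] + 10.0 * math.cos(math.radians(skew)),
          center[1] + 10.0 * math.sin(math.radians(skew)))
levelled = warp_point(m, tilted)
```

Using the mode rather than the mean keeps the outlier lines from pulling the estimated skew away from the dominant text direction.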
In addition, the invention also provides a corresponding device, equipment and a computer readable storage medium aiming at the character detection method, so that the character detection method has higher practicability, and the device, the equipment and the computer readable storage medium have corresponding advantages.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The method, the apparatus, the device and the storage medium for detecting characters provided by the present invention are described in detail above, and a specific example is applied in the description to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (10)

1. A text detection method, comprising:
synthesizing a real document scene image as a training set, and performing data enhancement, noise reduction and correction processing on the training set;
constructing a character detection model and a direction detection model; the input of the direction detection model is the image content corresponding to the text instance region coordinates output by the character detection model, and the output is a corresponding direction value;
training the character detection model and the direction detection model by using the processed training set until the network converges;
inputting the character image to be detected into the trained character detection model for detection, and outputting the coordinates of the character area in the character image to be detected;
and judging the direction of the character image to be detected by applying the trained direction detection model to each character area, and performing the coordinate adjustment of the corresponding direction conversion.
2. The text detection method of claim 1, wherein, while synthesizing a real document scene image as a training set, further comprising:
detecting the training set, and counting the detected result;
in the statistical result, retaining only the detection boxes of text instance regions that exceed a set size, and shrinking the retained detection boxes by a set proportion;
and rendering the detection result by adopting an image processing tool to realize semi-automatic marking.
3. The text detection method of claim 1, wherein denoising the training set comprises:
converting the training set after data enhancement into a gray scale map;
removing salt-and-pepper noise by adopting median filtering;
enhancing the character outline information by adopting a high-contrast retention (high-pass) algorithm;
adopting a self-adaptive threshold algorithm to realize binarization processing to obtain a binary image;
and performing AND operation on the obtained binary image and the training set after data enhancement to obtain the training set after noise reduction.
4. The text detection method of claim 3, wherein the correcting the training set comprises:
carrying out edge detection processing on the training set subjected to noise reduction;
detecting the line directions by using the Hough transform, and obtaining a rotation angle according to the mode of all the line directions;
obtaining a rotation matrix according to the rotation angle;
and carrying out affine transformation according to the rotation matrix to obtain the training set after correction processing.
5. The text detection method according to claim 1, wherein the Backbone part of the text detection model uses a ResNet network structure, a MobileNetV3 network structure, a RepVGG network structure, or a Swin_Transformer network structure;
the Neck part of the character detection model adopts an FPN structure, a Bi_FPN structure, a PANet structure or a Recursive-FPN structure.
6. The text detection method according to claim 5, wherein the Backbone part of the text detection model is fused with an SPP network structure in a feature extraction stage.
7. The text detection method of claim 1, wherein the direction detection model consists of convolution, pooling, Batch Normalization, activation function, and skip connection layers.
8. A character detection apparatus, comprising:
the data processing module is used for synthesizing a real document scene image as a training set and carrying out data enhancement, noise reduction and correction processing on the training set;
the model construction module is used for constructing a character detection model and a direction detection model; the input of the direction detection model is the image content corresponding to the text instance region coordinates output by the character detection model, and the output is a corresponding direction value;
the model training module is used for training the character detection model and the direction detection model by using the processed training set until the network converges;
the character detection module is used for inputting the character image to be detected to the trained character detection model for detection and outputting the coordinates of the character area in the character image to be detected;
and the direction detection module is used for judging the direction of the character image to be detected by applying the trained direction detection model to each character area, and for performing the coordinate adjustment of the corresponding direction conversion.
9. A text detection device comprising a processor and a memory, wherein the processor implements the text detection method of any one of claims 1 to 7 when executing a computer program stored in the memory.
10. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the text detection method according to any one of claims 1 to 7.
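The noise reduction steps recited in claim 3 can be illustrated with a small pure-Python sketch. It covers only median filtering, adaptive thresholding and the AND operation; grayscale conversion is assumed to have been done already, and the high-contrast retention step is omitted. The window sizes, the offset, the mask polarity (text pixels set to 255) and all function names are assumptions made for the illustration, not values taken from the patent.

```python
from statistics import median

def window(img, r, c, k):
    """Values in the (2k+1) x (2k+1) neighbourhood of (r, c), clipped at the border."""
    h, w = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - k), min(h, r + k + 1))
            for j in range(max(0, c - k), min(w, c + k + 1))]

def median_filter(img, k=1):
    """Replace each pixel by the median of its neighbourhood."""
    return [[int(median(window(img, r, c, k))) for c in range(len(img[0]))]
            for r in range(len(img))]

def adaptive_threshold(img, k=2, offset=10):
    """Mark a pixel as text (255) when it is darker than its local mean minus offset."""
    mask = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            vals = window(img, r, c, k)
            local_mean = sum(vals) / len(vals)
            row.append(255 if img[r][c] < local_mean - offset else 0)
        mask.append(row)
    return mask

def bitwise_and(img, mask):
    """Pixel-wise AND of an 8-bit image with a 0/255 mask."""
    return [[p & m for p, m in zip(rp, rm)] for rp, rm in zip(img, mask)]

# 6x6 grayscale patch: light background (200), a dark stroke two pixels wide (30),
# one pepper pixel (0) and one salt pixel (255).
noisy = [[200, 200, 30, 30, 200, 200] for _ in range(6)]
noisy[2][0] = 0
noisy[3][5] = 255

smoothed = median_filter(noisy)       # isolated salt and pepper pixels disappear
mask = adaptive_threshold(smoothed)   # binary image of the stroke
cleaned = bitwise_and(noisy, mask)    # only stroke pixels of the input survive
```

Because the mask is computed on the filtered image, the isolated noise pixels of the input fall outside it and are zeroed by the AND operation, while the stroke keeps its original gray values.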
CN202110996114.6A 2021-08-27 2021-08-27 Text detection method, text detection device, text detection equipment and storage medium Active CN113705673B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110996114.6A CN113705673B (en) 2021-08-27 2021-08-27 Text detection method, text detection device, text detection equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110996114.6A CN113705673B (en) 2021-08-27 2021-08-27 Text detection method, text detection device, text detection equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113705673A true CN113705673A (en) 2021-11-26
CN113705673B CN113705673B (en) 2023-12-12

Family

ID=78656035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110996114.6A Active CN113705673B (en) 2021-08-27 2021-08-27 Text detection method, text detection device, text detection equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113705673B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067192A (en) * 2022-01-07 2022-02-18 北京许先网科技发展有限公司 Character recognition method and system
CN114898375A (en) * 2022-05-20 2022-08-12 深信服科技股份有限公司 Character detection model training method and component, text recognition method and component
CN114998482A (en) * 2022-06-13 2022-09-02 厦门大学 Intelligent generation method of characters and artistic patterns

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097049A (en) * 2019-04-03 2019-08-06 中国科学院计算技术研究所 A kind of natural scene Method for text detection and system
CN110222680A (en) * 2019-05-19 2019-09-10 天津大学 A kind of domestic waste article outer packing Method for text detection
CN111353491A (en) * 2020-03-12 2020-06-30 中国建设银行股份有限公司 Character direction determining method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113705673B (en) 2023-12-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant