CN111798480A - Character detection method and device based on single character and character connection relation prediction - Google Patents
- Publication number: CN111798480A
- Application number: CN202010719772.6A
- Authority: CN (China)
- Prior art keywords: character, map, text, word, training
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/187: Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
- G06T7/0002: Inspection of images, e.g. flaw detection
- G06T7/11: Region-based segmentation
- G06T7/136: Segmentation; Edge detection involving thresholding
- G06V30/153: Segmentation of character regions using recognition of characters or words
- G06T2207/20081: Indexing scheme for image analysis or image enhancement; Training; Learning
- G06T2207/20084: Indexing scheme for image analysis or image enhancement; Artificial neural networks [ANN]
- G06V30/287: Character recognition specially adapted to the type of the alphabet, e.g. of Kanji, Hiragana or Katakana characters
Abstract
The invention discloses a character detection method and device based on single-character and character connection relation prediction, comprising the following steps: training a neural network model; performing feature extraction through the neural network model to obtain feature maps, namely a character position score map (Text-map) and an inter-character connection relation score map (Link-map); post-processing the Text-map and the Link-map; and calculating the minimum circumscribed rectangle of each connected domain to detect the positions of characters in the picture. The character-level prediction forces the convolution kernels of the model to attend to character-level features, and the post-processing branch effectively associates the character-level features learned by the model, enabling the detection of long texts and avoiding missed characters.
Description
Technical Field
The invention belongs to the field of character detection, and particularly relates to a character detection method based on single character and character connection relation prediction.
Background
Before the rapid rise of neural networks, text detection was usually built on hand-designed feature extraction methods as basic components, such as region-based feature extraction with MSER (Maximally Stable Extremal Regions) and the Stroke Width Transform (SWT). With the development of neural network technology in recent years, more and more neural-network-based methods have become popular, and new text detection methods have been proposed in succession. In general, however, these methods are essentially improvements on general object detection or general instance segmentation methods, such as SSD (Single Shot MultiBox Detector), Faster R-CNN, and Fully Convolutional Networks (FCN). According to their design, text detection algorithms can be divided into two broad categories: text detection methods based on bounding-box regression and text detection methods based on pixel segmentation.
Text detection methods based on bounding-box regression. Some text detection methods adapt the bounding-box regression used in general object detection. Unlike general objects, the targets in text detection are often irregular text fields at various angles, so a general-purpose object detector cannot be used directly as a text detection model. To address this, Textboxes-style methods were designed that adapt to text objects of different shapes by changing the shapes of the convolution kernels and anchor boxes of a general object detector, with good detection results on horizontal rectangular text. The Deep Matching Prior Network further applies quadrilateral sliding windows to filter out erroneous detection regions. In a more recent method, a Rotation-Sensitive regression detector (RSDD) fully exploits rotation-invariant features by actively rotating the convolutional filters, improving the detection of inclined text targets. However, such methods still require prior knowledge to constrain the shape of targets in natural scenes.
Text detection methods based on pixel segmentation. Another common family of methods is based on segmentation, aiming to find the coverage region of text in the image at the pixel level by detecting text boundary regions. Similar algorithms optimize on this basis by attempting to reduce background interference at the feature level, using attention mechanisms to enhance text regions. Segmentation-based text detection represents the input image as text/non-text regions in the form of a binary map, and then further separates the text instances within the text regions using a post-processing method.
Disclosure of Invention
The invention provides a character detection method and device based on single-character and character connection relation prediction, so as to detect each character in a picture.
The technical scheme adopted by the invention is as follows:
In a first aspect, the present invention provides a text detection method based on single-character and character connection relation prediction, comprising the following steps:

training a neural network model; performing feature extraction through the neural network model to obtain feature maps, namely a character position score map (Text-map) and an inter-character connection relation score map (Link-map), and post-processing the Text-map and the Link-map, the post-processing comprising:

converting the Text-map into a binary map Bm1 by setting a threshold λ1 for the Text-map;

converting the Link-map into a binary map Bm2 by setting a threshold λ2 for the Link-map;

initializing the binary map Bm1 to 0, and setting a position on Bm1 to 1 when the feature map value at the corresponding position is greater than the threshold λ1;

initializing the binary map Bm2 to 0, and setting a position on Bm2 to 1 when the feature map value at the corresponding position is greater than the threshold λ2;

performing connected component analysis on the binary maps Bm1 and Bm2 to obtain the pixels of all text regions;

calculating the minimum circumscribed rectangle of each connected domain, thereby detecting the positions of the text in the picture.
With this post-processing, no additional method such as Non-Maximum Suppression (NMS) is used beyond the three lightweight computations above; after the connected domains representing text regions are obtained, the minimum circumscribed rectangle of each is taken as the bounding rectangle of the text instance, so the binary maps are obtained by simple thresholding without time-consuming design. In essence this is still a pixel-segmentation-based text detection method, but unlike methods that directly segment text instances it breaks through their limitations: by predicting a feature map for single characters and a feature map for the connection relations of the instance each character belongs to, and combining them into complete text-instance bounding boxes through a search procedure, it handles the up-down connection problem in text detection, the representation of curved text fields, and the detection of long text fields.
In one possible design, training the neural network model includes:

performing label conversion, converting labels in the form of bounding-box coordinates into score-map labels with character-level annotations;
for each training image, a corresponding Text-map and Link-map are generated for each instance in the picture: on the Text-map, a specific Gaussian map unit is generated for each character position in the original image; on the Link-map, a binary map unit representing the instance connection relation is generated for each instance in the original image. To generate the Link-map, a diagonal is first drawn in each character bounding box, yielding an upper triangle and a lower triangle; the centroids of the upper triangles and of the lower triangles belonging to the same instance are then connected, giving an upper line and a lower line, and closing these two polylines yields the polygonal region representing the instance connection relation. First, the binary connection map generated in this way differs from the binary semantic segmentation map of generic segmentation-based text detection schemes: the connection relation here is more compact, and the tightness of the connection between two characters can be measured by the width of the connection binary map. Second, encoding a single character as a Gaussian heat map reflects the center and edges of the character and flexibly represents the relation between the ground-truth label and the image, while representing the intra-instance connection relation as a binary segmentation map helps the deep model learn the semantic connection information within a word or field.
In one possible design, the Gaussian map unit is generated as follows:

generating a two-dimensional standard Gaussian feature map in advance;

calculating the perspective transformation matrix between the four corner coordinates of the standard Gaussian feature map and those of the character bounding box;

transferring the standard Gaussian feature map into the bounding-box region through the perspective transformation.
In one possible design, when the neural network model is trained, character-level labels are generated from the word-level labels of the training samples in a weakly supervised manner, the character-level labels being represented in the form of character bounding boxes.
In one possible design, the character-level labels are generated as follows:

for a real training sample with word-level labels, forward inference is carried out on the sample using a fully trained model to predict the sample's Text-map; a feature slice of the text is cut from the feature map according to the original word-level label; a watershed algorithm is applied to the text feature slice to estimate the position of each character, represented in the form of a bounding box; finally, the bounding boxes are mapped back onto the original image.
In one possible design, during model training the number of generated character bounding boxes is matched against the number of real characters to obtain a confidence, which is used to weigh the character bounding boxes generated during training.

In one possible design, the confidence is used to weigh the character bounding boxes generated during model training as follows:
for a word-level label instance w in the training data, let Tw and Lw respectively denote the region of the character bounding boxes generated for instance w and the number of its real characters; applying a watershed image segmentation algorithm yields Lcw, the number of generated character bounding boxes; the confidence score Sconf(w) of the instance's character bounding boxes is then calculated by the following formula,
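Since the formula itself is not reproduced in this text, the following reconstruction from the surrounding definitions is stated as an assumption:

$$S_{conf}(w) = \frac{L_w - \min\!\left(L_w,\; \lvert L_w - L_{cw} \rvert\right)}{L_w}$$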
where the min function returns the minimum of its arguments;
the confidence score map Sc(p) describing the complete image can then be expressed as follows,
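Likewise a reconstruction stated as an assumption, taking Tw as the word-level bounding-box region of instance w:

$$S_c(p) = \begin{cases} S_{conf}(w), & p \in T_w \\ 1, & \text{otherwise} \end{cases}$$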
where p ranges over the pixels of the corresponding real word-level bounding box.
The confidence score Sconf(w) measures the confidence of the generated labels, making the labels generated by the model during training more accurate, and the recognition performance of the whole network model can be monitored through the confidence score map Sc(p) describing the complete image.
In one possible design, the neural network model uses DetNet as the base network. DetNet is a novel backbone network specially designed for object detection tasks; by using fewer down-sampling operations, it preserves the ability to locate small target objects in object detection.
In a second aspect, the present invention provides a word detection apparatus based on single character and word-word connection relation prediction, comprising a memory, a processor and a transceiver connected in sequence, wherein the memory is used for storing a computer program, the transceiver is used for sending and receiving messages, and the processor is used for reading the computer program and executing the method according to the first aspect.
In a third aspect, the present invention provides a computer-readable storage medium having stored thereon instructions which, when run on a computer, perform the method according to the first aspect.
The invention has the following advantages and beneficial effects:
1. With the post-processing method of the invention, no additional method such as Non-Maximum Suppression (NMS) is used beyond the three lightweight computations; after the connected domains representing text regions are obtained, the minimum circumscribed rectangle of each is taken as the bounding rectangle of the text instance, so the binary maps are obtained by simple thresholding without time-consuming design. The method is in essence still a pixel-segmentation-based text detection method, but unlike methods that directly segment text instances it breaks through their limitations: by predicting a feature map for single characters and a feature map for the connection relations of the instance each character belongs to, and combining them into complete text-instance bounding boxes through a search procedure, it handles the up-down connection problem in text detection, the representation of curved text fields, and the detection of long text fields. In other words, traditional text detection methods, whether based on bounding-box regression or on pixel segmentation, often need a deeper network structure to enlarge the receptive field of the model in order to detect long or large text targets; the present method avoids this by predicting at the character level and linking the characters in post-processing.
2. The binary connection map generated by the method differs from the binary semantic segmentation map of generic segmentation-based text detection schemes: the connection relation is more compact, and the tightness of the connection between two characters can be measured by the width of the connection binary map. Moreover, encoding a single character as a Gaussian heat map reflects the center and edges of the character and flexibly represents the relation between the ground-truth label and the image, while representing the intrinsic connection relation of an instance as a binary segmentation map helps the deep model learn the semantic connection information within words or fields;
3. The confidence score Sconf(w) measures the confidence of the labels generated during training, making the labels generated by the training model more accurate, and the recognition performance of the whole network model can be monitored through the confidence score map Sc(p) describing the complete image.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is an overall structure diagram of a neural network model according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, numbers, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
It should be understood that specific details are provided in the following description to facilitate a thorough understanding of example embodiments. However, it will be understood by those of ordinary skill in the art that the example embodiments may be practiced without these specific details. For example, systems may be shown in block diagrams in order not to obscure the examples in unnecessary detail. In other instances, well-known processes, structures and techniques may be shown without unnecessary detail in order to avoid obscuring example embodiments.
The aim of text detection is to accurately locate the position of text in an image. Text detection technology has gone through the following stages in recent years: a first stage detecting horizontal text, a second stage detecting straight text inclined at certain angles, and a third stage detecting text of arbitrary shape, including text curved at arbitrary angles. Recent research tends to locate text positions with arbitrary polygons, which localize text in images more accurately. The character detection method of this embodiment, based on single-character detection and word relation prediction, also belongs to this new category.
Our method aims to accurately locate the position of each character in the input image and the connection relationships between characters within a field.
Examples
In a first aspect, this embodiment provides a text detection method based on single-character and inter-character connection relation prediction, comprising the following steps:

training a neural network model; performing feature extraction through the neural network model to obtain feature maps, namely a character position score map (Text-map) and an inter-character connection relation score map (Link-map), and post-processing the Text-map and the Link-map, the post-processing comprising:

converting the Text-map into a binary map Bm1 by setting a threshold λ1 for the Text-map;

converting the Link-map into a binary map Bm2 by setting a threshold λ2 for the Link-map;

initializing the binary map Bm1 to 0, and setting a position on Bm1 to 1 when the feature map value at the corresponding position is greater than the threshold λ1;

initializing the binary map Bm2 to 0, and setting a position on Bm2 to 1 when the feature map value at the corresponding position is greater than the threshold λ2;
performing Connected Component Labelling (CCL) on the binary maps Bm1 and Bm2 to obtain the pixels of all text regions; in a specific implementation, this step can be carried out with the connectedComponents function of OpenCV, a cross-platform computer vision and machine learning software library.
calculating the minimum circumscribed rectangle of each connected domain to detect the positions of the text in the picture; in a specific implementation, this step is carried out with the minAreaRect function of OpenCV.
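In a specific implementation, the post-processing described above can be sketched as follows in Python with OpenCV and NumPy. The threshold values and the pixel-wise union of Bm1 and Bm2 are illustrative assumptions of this sketch; the description above fixes neither.

```python
import cv2
import numpy as np

def postprocess(text_map: np.ndarray, link_map: np.ndarray,
                lambda1: float = 0.7, lambda2: float = 0.4):
    """Threshold the two score maps, label connected components, and
    return the minimum circumscribed rectangle of each text region."""
    # Binary maps Bm1/Bm2: initialised to 0, set to 1 where the score
    # at the corresponding position exceeds its threshold.
    bm1 = (text_map > lambda1).astype(np.uint8)
    bm2 = (link_map > lambda2).astype(np.uint8)

    # Character pixels and link pixels together form the text regions
    # (pixel-wise union; an assumption of this sketch).
    combined = np.logical_or(bm1, bm2).astype(np.uint8)

    # Connected Component Labelling (CCL) via OpenCV.
    n_labels, labels = cv2.connectedComponents(combined, connectivity=4)

    boxes = []
    for label in range(1, n_labels):            # label 0 is background
        ys, xs = np.nonzero(labels == label)
        points = np.stack([xs, ys], axis=1).astype(np.float32)
        rect = cv2.minAreaRect(points)          # minimum circumscribed rectangle
        boxes.append(cv2.boxPoints(rect))       # its 4 corner points
    return boxes
```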
The overall structure of the text detection model provided by this embodiment mainly comprises three parts, as shown in FIG. 1. The first part is a backbone network built from DetNet, used for feature extraction; DetNet is a novel backbone specially designed for object detection tasks, and with fewer down-sampling operations it preserves the ability to locate small target objects. The second part is an up-sampling feature-fusion part. The third part is the post-processing part based on connected-domain analysis, also called the inference part, since this stage exists only at inference time. In implementation, the model can output text position representations of various shapes according to specific requirements, typically including character bounding boxes, word bounding boxes, polygonal curved-edge bounding boxes, and the like.
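For concreteness, a minimal PyTorch sketch of this three-part structure follows. The channel widths, feature strides, concatenation-based fusion, and sigmoid output head are illustrative assumptions, and the backbone argument is a stand-in for DetNet rather than an implementation of it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextLinkNet(nn.Module):
    """Backbone + up-sampling feature fusion + 2-channel score-map head."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        # Stand-in for DetNet; assumed to return three feature maps with
        # 256/128/64 channels at strides 8/4/2 (illustrative choices).
        self.backbone = backbone
        self.fuse1 = nn.Conv2d(256 + 128, 128, kernel_size=3, padding=1)
        self.fuse2 = nn.Conv2d(128 + 64, 64, kernel_size=3, padding=1)
        self.head = nn.Conv2d(64, 2, kernel_size=1)   # Text-map, Link-map

    def forward(self, x: torch.Tensor):
        f8, f4, f2 = self.backbone(x)                 # deepest feature first
        y = F.interpolate(f8, size=f4.shape[2:], mode="bilinear",
                          align_corners=False)        # up-sample ...
        y = F.relu(self.fuse1(torch.cat([y, f4], dim=1)))   # ... and fuse
        y = F.interpolate(y, size=f2.shape[2:], mode="bilinear",
                          align_corners=False)
        y = F.relu(self.fuse2(torch.cat([y, f2], dim=1)))
        scores = torch.sigmoid(self.head(y))
        return scores[:, 0], scores[:, 1]             # Text-map, Link-map
```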
With this post-processing, no additional method such as Non-Maximum Suppression (NMS) is used beyond the three lightweight computations above; after the connected domains representing text regions are obtained, the minimum circumscribed rectangle of each connected domain represents a text instance, so the binary maps are obtained by simple thresholding without time-consuming design. In essence this is still a pixel-segmentation-based text detection method, but unlike methods that directly segment text instances it breaks through their limitations: by predicting a feature map for single characters and a feature map for the connection relations of the instance each character belongs to, and combining them into complete text-instance bounding boxes through a search procedure, it handles the up-down connection problem in text detection, the representation of curved text fields, and the detection of long text fields.
In one possible design, training the neural network model includes performing label conversion. Because the training process of the algorithm does not directly regress character coordinates but relies on a dedicated post-processing branch operating on the feature maps, labels in bounding-box numerical form must be converted into score-map labels with character-level annotations.
The character-level labels for synthetic data are generated as follows: for each training image, a corresponding Text-map and Link-map are generated for each instance in the picture. On the Text-map, a specific Gaussian map unit is generated for each character position in the original image; on the Link-map, a binary map unit representing the instance connection relation is generated for each instance in the original image. To generate the Link-map, a diagonal is first drawn in each character bounding box, yielding an upper triangle and a lower triangle; the centroids of the upper triangles and of the lower triangles belonging to the same instance are then connected, giving an upper line and a lower line, and closing these two polylines yields the polygonal region representing the instance connection relation (see the sketch below). First, the binary connection map generated in this way differs from the binary semantic segmentation map of generic segmentation-based text detection schemes: the connection relation here is more compact, and the tightness of the connection between two characters can be measured by the width of the connection binary map. Second, encoding a single character as a Gaussian heat map reflects the center and edges of the character and flexibly represents the relation between the ground-truth label and the image, while representing the intra-instance connection relation as a binary segmentation map helps the deep model learn the semantic connection information within a word or field.
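The diagonal-and-centroid construction can be illustrated with a short NumPy sketch. The corner ordering of the character boxes and the choice of diagonal are assumptions of the sketch, not details fixed by the description above.

```python
import numpy as np

def link_polygon(char_boxes):
    """Build the polygonal Link-map region for one word instance.

    char_boxes: character bounding boxes of the instance in reading order,
    each a (4, 2) array ordered top-left, top-right, bottom-right,
    bottom-left (an assumed convention).
    """
    upper, lower = [], []
    for box in char_boxes:
        tl, tr, br, bl = np.asarray(box, dtype=np.float64)
        # One diagonal (tl -> br) splits the box into an upper triangle
        # (tl, tr, br) and a lower triangle (tl, br, bl).
        upper.append((tl + tr + br) / 3.0)   # centroid of upper triangle
        lower.append((tl + br + bl) / 3.0)   # centroid of lower triangle
    # Upper line left-to-right plus lower line right-to-left closes the
    # polygon that represents the instance connection relation.
    return np.array(upper + lower[::-1])
```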
Computing the Gaussian distribution value of every pixel inside each character bounding box directly takes a great deal of time and is not feasible in a practical training process. Note that in training general object detection models, where the labels are horizontal quadrilateral bounding boxes, such algorithms generate the internal Gaussian feature map directly from the target bounding-box label; in the present task, however, the character bounding box is often an arbitrary convex quadrilateral, so, considering time complexity, the Gaussian map unit is generated with the following steps:
generating a two-dimensional standard Gaussian feature map in advance;

calculating the perspective transformation matrix between the four corner coordinates of the standard Gaussian feature map and those of the character bounding box;

transferring the standard Gaussian feature map into the bounding-box region through the perspective transformation.
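A minimal OpenCV sketch of these three steps follows; the template size, the Gaussian sigma, and compositing by per-pixel maximum are illustrative assumptions.

```python
import cv2
import numpy as np

def render_gaussian_unit(canvas: np.ndarray, char_box,
                         size: int = 64, sigma_ratio: float = 0.25):
    """Warp a pre-generated 2-D standard Gaussian onto an arbitrary convex
    quadrilateral character box on the Text-map canvas (float32, HxW)."""
    # Step 1: two-dimensional standard Gaussian feature map, generated once.
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    sigma = size * sigma_ratio
    gaussian = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2)).astype(np.float32)

    # Step 2: perspective transform between the Gaussian map's four corners
    # and the four coordinates of the character bounding box.
    src = np.float32([[0, 0], [size - 1, 0], [size - 1, size - 1], [0, size - 1]])
    dst = np.float32(char_box)                # (4, 2), same corner order
    matrix = cv2.getPerspectiveTransform(src, dst)

    # Step 3: transfer the standard Gaussian into the bounding-box region.
    h, w = canvas.shape[:2]
    warped = cv2.warpPerspective(gaussian, matrix, (w, h))
    np.maximum(canvas, warped, out=canvas)    # composite by per-pixel maximum
    return canvas
```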
In one possible design, when the neural network model is trained, character-level labels are generated from the word-level labels of the training samples in a weakly supervised manner, the character-level labels being represented in the form of character bounding boxes. In a specific implementation, the character-level labels are generated as follows:
for a real training sample with word-level labels, forward inference is carried out on the sample using a fully trained model to predict the sample's Text-map; a feature slice of the text is cut from the feature map according to the original word-level label; a watershed algorithm is applied to the text feature slice to estimate the position of each character, represented in the form of a bounding box; finally, the bounding boxes are mapped back onto the original image.
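A sketch of this crop-and-split procedure on a single word-level Text-map slice, assuming a scikit-image watershed and illustrative seed/region thresholds (the description above fixes neither):

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def char_boxes_from_word_slice(text_slice: np.ndarray,
                               seed_thresh: float = 0.6,
                               region_thresh: float = 0.2):
    """Estimate character bounding boxes on a cropped word-level slice of
    the predicted Text-map; thresholds are illustrative assumptions."""
    # Seeds: confident character centres; region: the whole text area.
    seeds, _ = ndimage.label(text_slice > seed_thresh)
    region = text_slice > region_thresh
    # Watershed on the inverted score map separates touching characters.
    labels = watershed(-text_slice, markers=seeds, mask=region)

    boxes = []
    for obj in ndimage.find_objects(labels):
        if obj is None:
            continue
        ys, xs = obj
        # Axis-aligned box in slice coordinates; mapping back onto the
        # original image inverts the crop transform (not shown here).
        boxes.append((xs.start, ys.start, xs.stop, ys.stop))
    return boxes
```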
In one possible design, during model training the number of generated character bounding boxes is matched against the number of real characters to obtain a confidence, which is used to weigh the character bounding boxes generated during training. In a specific implementation, the confidence is used to weigh the generated character bounding boxes as follows:
for a word-level label instance w in the training data, let Tw and Lw respectively denote the region of the character bounding boxes generated for instance w and the number of its real characters; applying a watershed image segmentation algorithm yields Lcw, the number of generated character bounding boxes; the confidence score Sconf(w) of the instance's character bounding boxes is then calculated by the following formula,
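As in the disclosure above, the original formula is not reproduced in this text; the following reconstruction from the surrounding definitions is stated as an assumption:

$$S_{conf}(w) = \frac{L_w - \min\!\left(L_w,\; \lvert L_w - L_{cw} \rvert\right)}{L_w}$$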
where the min function returns the minimum of its arguments;
the confidence score map Sc(p) describing the complete image can then be expressed as follows,
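Again a reconstruction stated as an assumption, taking Tw as the word-level bounding-box region of instance w:

$$S_c(p) = \begin{cases} S_{conf}(w), & p \in T_w \\ 1, & \text{otherwise} \end{cases}$$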
where p ranges over the pixels of the corresponding real word-level bounding box.
The confidence score Sconf(w) measures the confidence of the generated labels, making the labels generated by the model during training more accurate, and the recognition performance of the whole network model can be monitored through the confidence score map Sc(p) describing the complete image.
In other words, conventional text detection methods, whether based on bounding-box regression or on pixel segmentation, often need a deeper network structure to enlarge the receptive field of the model in order to detect long or large text targets; the present method links character-level predictions in post-processing and therefore does not depend on an enlarged receptive field.
In a second aspect, the present embodiment provides a word detection apparatus based on single character and word-word connection relation prediction, including a memory, a processor, and a transceiver connected in sequence, where the memory is used for storing a computer program, the transceiver is used for sending and receiving messages, and the processor is used for reading the computer program and executing the method according to the first aspect.
For example, the memory may include, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Flash Memory, First-In-First-Out (FIFO) memory, and/or First-In-Last-Out (FILO) memory; the processor may be, but is not limited to, a microprocessor of the STM32F105 family; the transceiver may be, but is not limited to, a Wireless Fidelity (WiFi) transceiver, a Bluetooth transceiver, a General Packet Radio Service (GPRS) transceiver, and/or a ZigBee transceiver. In addition, the device may include, but is not limited to, a power module, a display screen, and other necessary components.
A third aspect of the present embodiment provides a computer-readable storage medium, on which instructions are stored, and when the instructions are executed on a computer, the method according to the first aspect or any one of the possible designs of the first aspect of the present embodiment is executed. The computer-readable storage medium refers to a carrier for storing data, and may include, but is not limited to, floppy disks, optical disks, hard disks, flash memories, flash disks and/or Memory sticks (Memory sticks), etc., and the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
For the working process, the working details, and the technical effects of the computer-readable storage medium provided in this embodiment, reference may be made to the first aspect of the embodiment, which is not described herein again.
The present invention also provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method according to the first aspect of the embodiments; the computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.
The embodiments described above are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device to perform the methods described in the embodiments or some portions of the embodiments.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. The character detection method based on single character and character connection relation prediction is characterized by comprising the following steps of:
training a neural network model; performing feature extraction through the neural network model to obtain a feature map, wherein the feature map comprises a character position score map Text-map and an inter-character connection relation score map Link-map, and performing post-processing on the Text-map and the Link-map, and the post-processing comprises the following steps:
converting the Text-map into a binary map Bm1 by setting a threshold λ1 for the Text-map,

converting the Link-map into a binary map Bm2 by setting a threshold λ2 for the Link-map,

initializing the binary map Bm1 to 0, wherein a position on the binary map Bm1 is set to 1 when the feature map value at the corresponding position is greater than the threshold λ1;

initializing the binary map Bm2 to 0, wherein a position on the binary map Bm2 is set to 1 when the feature map value at the corresponding position is greater than the threshold λ2;

performing connected component analysis on the obtained binary maps Bm1 and Bm2, thereby obtaining the pixels of all character regions;
and calculating the minimum circumscribed rectangle of each connected domain to realize the detection of the position of the characters from the picture.
2. The method of claim 1, wherein the model training for training the neural network model comprises:
performing label conversion, and converting the label in the form of the bounding box numerical value into a score map label with character level labels;
for each training image, generating a corresponding Text-map and a corresponding Link-map for each instance in the picture; and generating a specific Gaussian map unit on the Text-map for each character position in the original image, and generating a binary map unit representing example connection relation on the Link-map for each example in the original image.
3. The method of claim 2, wherein the Gaussian map unit is generated by:
generating a two-dimensional standard Gaussian feature map in advance;
calculating the perspective transformation matrix between the four corner coordinates of the standard Gaussian feature map and those of the character bounding box;

and transferring the standard Gaussian feature map into the bounding-box region through the perspective transformation.
4. The method as claimed in claim 1, wherein when the neural network model is trained, character-level labels are generated from the word-level labels of the training samples in a weakly supervised manner, the character-level labels being represented in the form of character bounding boxes.
5. The method of claim 4, wherein the method for generating the character-level labels comprises:
for a real training sample with word-level labels, carrying out forward inference on the sample by using a fully trained model and predicting the sample's Text-map; cutting out a feature slice of the text from the feature map according to the original word-level label; applying a watershed algorithm to the text feature slice to estimate the position of each character, represented in the form of a bounding box; and finally mapping the bounding boxes back onto the original image.
6. The word detection method based on single character and word connection relation prediction as claimed in claim 4, wherein: in the model training process, the number of the generated character bounding boxes is matched with the number of the real characters to obtain a confidence coefficient, and the confidence coefficient is used for measuring the character bounding boxes generated in the model training process.
7. The method for detecting words based on single character and word connection relation prediction as claimed in claim 6, wherein the method for measuring the character bounding box generated in the model training process by using confidence coefficient comprises:
for a word-level label instance w in the training data, Tw and Lw respectively denote the region of the character bounding boxes generated for instance w and the number of real characters; applying a watershed image segmentation algorithm yields Lcw, the number of generated character bounding boxes; the confidence score Sconf(w) of the instance's character bounding boxes is then calculated by the following formula,

where the min function returns the minimum of its arguments;

the confidence score map Sc(p) describing the complete image can then be expressed as follows,

where p ranges over the pixels of the corresponding real word-level bounding box.
8. The word detection method based on single character and word connection relation prediction as claimed in claim 1, wherein: the neural network model adopts a basic network DetNet.
9. A character detection device based on single character and character connection relation prediction is characterized in that: the system comprises a memory, a processor and a transceiver which are connected in sequence, wherein the memory is used for storing a computer program, the transceiver is used for transmitting and receiving messages, and the processor is used for reading the computer program and executing the method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that: the computer-readable storage medium has stored thereon instructions that, when executed on a computer, perform the method of any of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010719772.6A CN111798480B (en) | 2020-07-23 | 2020-07-23 | Character detection method and device based on single character and character connection relation prediction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111798480A true CN111798480A (en) | 2020-10-20 |
CN111798480B CN111798480B (en) | 2024-07-26 |
Family
ID=72828691
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010719772.6A Active CN111798480B (en) | 2020-07-23 | 2020-07-23 | Character detection method and device based on single character and character connection relation prediction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111798480B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05174187A (en) * | 1991-12-25 | 1993-07-13 | Matsushita Electric Ind Co Ltd | Character recognizing device |
CN108764036A (en) * | 2018-04-24 | 2018-11-06 | 西安电子科技大学 | A kind of handwritten form Tibetan language word fourth recognition methods |
WO2019192397A1 (en) * | 2018-04-04 | 2019-10-10 | 华中科技大学 | End-to-end recognition method for scene text in any shape |
CN111179251A (en) * | 2019-12-30 | 2020-05-19 | 上海交通大学 | Defect detection system and method based on twin neural network and by utilizing template comparison |
CN111242129A (en) * | 2020-01-03 | 2020-06-05 | 创新工场(广州)人工智能研究有限公司 | Method and device for end-to-end character detection and identification |
Non-Patent Citations (3)
Title |
---|
吴财贵; 唐权华: "Sensitive text detection in images based on deep learning" (基于深度学习的图片敏感文字检测), Computer Engineering and Applications (计算机工程与应用), vol. 51, no. 14, 14 October 2014 (2014-10-14) *
陈善雄; 韩旭; 林小渝; 刘云; 王明贵: "Character detection method for ancient Yi-script documents based on MSER and CNN" (基于MSER和CNN的彝文古籍文献的字符检测方法), Journal of South China University of Technology (Natural Science Edition) (华南理工大学学报(自然科学版)), no. 06, 15 June 2020 (2020-06-15) *
陈红洁; 顾国弟; 巢国平; 罗文彬; 林瑜; 李俊彦; 肖永来; 罗忆; 杨学晨; 姚良群; 汪明浩; 张华; 沈峰; 王全荣: "Vehicle license plate detection and analysis system based on video images" (基于视频图像的车辆号牌检测和分析系统), National Science and Technology Achievements Compilation 2010, batches 2-3 (2010年国家科技成果网科技成果汇编第二批-第三批), 30 November 2009 (2009-11-30) *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112257708A (en) * | 2020-10-22 | 2021-01-22 | 润联软件系统(深圳)有限公司 | Character-level text detection method and device, computer equipment and storage medium |
CN112541491A (en) * | 2020-12-07 | 2021-03-23 | 沈阳雅译网络技术有限公司 | End-to-end text detection and identification method based on image character region perception |
CN112541491B (en) * | 2020-12-07 | 2024-02-02 | 沈阳雅译网络技术有限公司 | End-to-end text detection and recognition method based on image character region perception |
CN112580629A (en) * | 2020-12-23 | 2021-03-30 | 深圳市捷顺科技实业股份有限公司 | License plate character recognition method based on deep learning and related device |
CN113065547A (en) * | 2021-03-10 | 2021-07-02 | 国网河北省电力有限公司 | Character supervision information-based weak supervision text detection method |
CN113673338A (en) * | 2021-07-16 | 2021-11-19 | 华南理工大学 | Natural scene text image character pixel weak supervision automatic labeling method, system and medium |
CN113673338B (en) * | 2021-07-16 | 2023-09-26 | 华南理工大学 | Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels |
CN113837168A (en) * | 2021-09-22 | 2021-12-24 | 易联众智鼎(厦门)科技有限公司 | Image text detection and OCR recognition method, device and storage medium |
CN114708591A (en) * | 2022-04-19 | 2022-07-05 | 复旦大学 | Document image Chinese character detection method based on single character connection |
CN114708591B (en) * | 2022-04-19 | 2024-10-15 | 复旦大学 | Document image Chinese character detection method based on single word connection |
Also Published As
Publication number | Publication date |
---|---|
CN111798480B (en) | 2024-07-26 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |