CN107093172B - Character detection method and system - Google Patents
- Publication number
- CN107093172B (application CN201610091568.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- color
- blocks
- connected blocks
- merging
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10008—Still image; Photographic image from scanner, fax or copier
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30176—Document
Abstract
The invention discloses a character detection method and system. The method comprises: performing color reduction on each of the three color channels of a target image to obtain a color-reduced image, and converting the target image into a binary image; merging connected blocks of the same color in the color-reduced image, and merging connected blocks of the same color in the binary image; merging the connected blocks of each color channel of the color-reduced image, and the connected blocks of the binary image, by connection in the vertical and horizontal directions respectively, to obtain candidate text regions in the target image; and extracting a specific region at the position on the target image corresponding to each candidate text region, and judging whether the extracted region contains a text row or text column by comparing the probability that it contains a text region with a preset probability threshold. By implementing the invention, text in an image can be accurately detected.
Description
Technical Field
The invention relates to text detection in images, and in particular to a character detection method and system.
Background
A document image is a document converted into an image format by some means (such as scanning) for electronic reading; typical examples are Portable Document Format (PDF) images and DjVu images.
The current text detection technology can detect text in a document image (locate a text-bearing area in the image), and perform text recognition based on the detected text-bearing area.
Images in the general sense include not only document images but also non-document images (e.g., images uploaded by users to web albums), which may be Joint Photographic Experts Group (JPG) images, Bitmap (BMP) images, Tagged Image File Format (TIFF) images, Graphics Interchange Format (GIF) images, Exchangeable Image File Format (EXIF) images, and the like.
If the characters in a non-document image can be recognized, accurate semantic information can be obtained, helping users retrieve and manage their images. Detecting the characters in the image is a necessary preliminary step of recognizing them. Existing text detection techniques mostly use manually specified features to judge whether an image contains characters, and mostly target English text. Because Chinese and English differ significantly in character structure, detection accuracy for Chinese in images differs greatly from that for English, and is difficult to bring up to the requirements of practical applications.
Disclosure of Invention
The embodiment of the invention provides a character detection method and a character detection system, which can accurately detect a text in an image.
The technical scheme of the embodiments of the invention is realized as follows.
In a first aspect, an embodiment of the present invention provides a text detection method, where the method includes:
performing color reduction processing on each image in a three-color channel of a target image to obtain a color reduction image, and converting the target image into a binary image;
merging the connected blocks with the same color in the color reduction image, and merging the connected blocks with the same color in the binary image;
merging the connected blocks of each color channel of the color-reduced image, and the connected blocks of the binary image, by connection in the vertical and horizontal directions respectively, to obtain a candidate text region in the target image;
and extracting a specific region at the position on the target image corresponding to the candidate text region, and judging whether the extracted specific region contains a text row or a text column based on the result of comparing the probability that the extracted specific region contains a text region with a preset probability threshold.
Preferably, the color-reducing processing of each image in the three color channels of the target image to obtain a color-reduced image includes:
quantizing each channel of the red, green and blue channels of the target image by K levels respectively to obtain K level intervals;
and mapping the brightness of each pixel in the target image in the RGB three-color channel into an interval of corresponding channel quantization, wherein K is an integer and 255> K > 1.
Preferably, the merging connected blocks with the same color in the color-reduced image and the merging connected blocks with the same color in the binary image includes:
for each pixel in the color-reduced image and in the binary image, treated as a separate connected block, establishing a union-find structure over the pixels and performing the following:
if the color of a pixel is the same as that of any of its 8 adjacent pixels, merging the connected blocks to which the two same-colored adjacent pixels belong into one connected block;
and judging the pixel area of each connected block: if the pixel area of a connected block is smaller than a pixel area threshold, merging it into an adjacent connected block and setting its color to the color of the connected block it is merged into.
Preferably, after the merging the connected blocks with the same color in the color-reduced image and the merging the connected blocks with the same color in the binary image, the method further comprises:
discarding connected blocks which are in the color reduction image and in the binary image and accord with preset characteristics; the preset features include at least one of:
the area of the connected blocks is smaller than the pixel area threshold value;
any side length of the connected block is larger than a first preset proportion of the corresponding image side length;
and any side length in the connected blocks is larger than the frame length threshold, and the ratio of the pixel area to the bounding box area is smaller than the ratio threshold.
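A minimal sketch of this discard filter follows. The class and function names are illustrative assumptions; the default thresholds (4 pixels, 0.8, 65 pixels, 0.22) are the example values quoted later in the description.

```python
from dataclasses import dataclass

@dataclass
class Block:
    area: int          # pixel area of the connected block
    w: int             # bounding-box width
    h: int             # bounding-box height

def keep_block(b: Block, img_w: int, img_h: int,
               min_area: int = 4,          # pixel area threshold
               bg_ratio: float = 0.8,      # first preset proportion (background test)
               frame_len: int = 65,        # frame length threshold
               fill_ratio: float = 0.22) -> bool:
    """Return False if the block matches any of the three discard rules."""
    if b.area < min_area:                              # too small to carry text
        return False
    if b.w > bg_ratio * img_w or b.h > bg_ratio * img_h:
        return False                                   # likely a background color
    if (b.w > frame_len or b.h > frame_len) and b.area < fill_ratio * (b.w * b.h):
        return False                                   # likely a frame/border
    return True
```

Blocks surviving this filter proceed to the positional merging stage.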
Preferably, after the merging the connected blocks with the same color in the color-reduced image and the merging the connected blocks with the same color in the binary image, the method further comprises:
merging the connected blocks of each color channel of the color-reduced image into new connected blocks based on their positional relationships, and merging the connected blocks of the binary image into new connected blocks based on their positional relationships; wherein at least one of the following processes is executed:
merging the connected blocks with the distance smaller than the distance threshold;
taking the maximum value of the average values of the respective lengths and the widths of any two connected blocks, and if the maximum value meets a preset condition, combining the two selected connected blocks;
combining connected blocks of which the bounding boxes are crossed and the crossed parts accord with preset crossed characteristics;
and merging the connected blocks of which the bounding boxes are aligned and meet a preset alignment merging rule.
Preferably, the merging, in a connected manner, the connected blocks of each color channel of the three color channels of the color-reduced image and the connected blocks in the binary image in the vertical and horizontal directions, respectively, to obtain a candidate text region in the target image, includes:
combining in the horizontal direction, combining in the vertical direction and combining in the horizontal direction in sequence based on a connection combination rule; wherein the connection merging rule comprises:
and connecting the two selected connected blocks to form a new connected block according to at least one of the following conditions:
the minimum of the center distance and the edge distance between the bounding boxes of the two connected blocks along the reference axis is smaller than a first preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis;
the distance between the bounding boxes of the two connected blocks perpendicular to the reference axis is smaller than a second preset proportion of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis;
the difference between the side lengths of the two bounding boxes along the reference axis is smaller than a third preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis.
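A hedged sketch of this connection-merge test for two bounding boxes along a reference axis (0 for horizontal, 1 for vertical). The proportions p1, p2, p3 stand for the first, second and third preset proportions; their values here are placeholders. The sketch conjoins all three conditions, while the claim language covers connecting on any subset of them.

```python
def can_connect(box_a, box_b, axis=0, p1=0.5, p2=0.5, p3=0.5):
    """box = (x, y, w, h). True if the three geometric conditions all hold."""
    def side(box, ax):          # side length of a box along an axis
        return box[2] if ax == 0 else box[3]
    def lo(box, ax):            # lower coordinate of a box along an axis
        return box[0] if ax == 0 else box[1]

    perp = 1 - axis
    min_side = min(side(box_a, axis), side(box_b, axis))
    min_perp = min(side(box_a, perp), side(box_b, perp))

    ca = lo(box_a, axis) + side(box_a, axis) / 2.0     # box centers on the axis
    cb = lo(box_b, axis) + side(box_b, axis) / 2.0
    center_dist = abs(ca - cb)
    edge_dist = max(lo(box_a, axis), lo(box_b, axis)) - min(
        lo(box_a, axis) + side(box_a, axis), lo(box_b, axis) + side(box_b, axis))
    axial_ok = min(center_dist, max(edge_dist, 0)) < p1 * min_side

    perp_dist = max(lo(box_a, perp), lo(box_b, perp)) - min(
        lo(box_a, perp) + side(box_a, perp), lo(box_b, perp) + side(box_b, perp))
    perp_ok = max(perp_dist, 0) < p2 * min_perp

    size_ok = abs(side(box_a, axis) - side(box_b, axis)) < p3 * min_side
    return axial_ok and perp_ok and size_ok
```

For example, two equal 10-pixel boxes separated by a 2-pixel gap on the same row connect, while boxes 40 pixels apart do not.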
Preferably, the extracting a specific region from the position of the candidate text region on the target image, and determining whether the extracted specific region includes a text row or a text column based on a comparison result between a probability that the extracted specific region includes a text region and a preset probability threshold includes:
extracting the specific region from the target image according to the connected bounding boxes obtained on the color-reduced image and the binary image, sliding a window with a specific step length over each bounding box, feeding each window into a convolutional neural network classifier for discrimination, and obtaining the probability that each window contains characters;
averaging the probabilities of characters contained in the sliding window to obtain the probability that the candidate character area comprises character rows or character columns;
and if the obtained probability is greater than a preset probability threshold, determining that the character row or the character column exists in the specific area.
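The sliding-window verification above can be sketched as follows. Here `window_probs` stands for the per-window outputs of the convolutional neural network classifier, which is not implemented in this sketch; the function names are illustrative.

```python
def region_contains_text(window_probs, prob_threshold=0.5):
    """Average per-window text probabilities and compare to the threshold."""
    if not window_probs:
        return False
    avg = sum(window_probs) / len(window_probs)
    return avg > prob_threshold

def slide_windows(region_width, window_width, step):
    """Yield window start offsets for a given sliding-window step length."""
    x = 0
    while x + window_width <= region_width:
        yield x
        x += step
```

A region whose windows score (0.9, 0.8, 0.7) passes a 0.5 threshold and is judged to contain a text row or column.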
In a second aspect, an embodiment of the present invention provides a text detection system, where the system includes:
the color reduction binary processing unit is used for carrying out color reduction processing on each image in three-color channels of the target image to obtain a color reduction image and converting the target image into a binary image;
a first merging unit, configured to merge connected blocks with the same color in the color-reduced image, and merge connected blocks with the same color in the binary image;
a second merging unit, configured to merge a connected block of each color channel of the three color channels of the color-reduced image and a connected block in the binary image in a connected manner in the vertical and horizontal directions, respectively, to obtain a candidate text region in the target image;
and the judging unit is used for extracting a specific area from the position, corresponding to the candidate character area, of the target image and judging whether the extracted specific area contains character rows or character columns or not based on the comparison result of the probability of containing the character area in the extracted specific area and a preset probability threshold.
Preferably, the color-reducing binary processing unit is further configured to quantize each of the red, green, and blue channels of the target image into K levels respectively to obtain K levels of intervals;
and mapping the brightness of each pixel in the target image in the RGB three-color channel into an interval of corresponding channel quantization, wherein K is an integer and 255> K > 1.
Preferably, the first merging unit is further configured to treat each pixel in the color-reduced image and in the binary image as a separate connected block, establish a union-find structure over the pixels, and perform the following processing:
the first merging unit is further configured to merge the connected blocks to which two same-colored adjacent pixels belong into one connected block if the color of a pixel is the same as that of any of its 8 adjacent pixels;
The first merging unit is further configured to determine a pixel area of each connected block, merge the connected blocks into a connected block adjacent to the connected block if the pixel area of the connected block is smaller than a pixel area threshold, and set the color of the connected block as the color of the merged connected block.
Preferably, the system further comprises:
a discarding processing unit, configured to discard connected blocks in the color-reduced image and connected blocks in the binary image that meet a preset feature after the first merging unit merges the connected blocks in the color-reduced image that have the same color and merges the connected blocks in the binary image that have the same color; the preset features include at least one of:
discarding connected blocks of which the area is smaller than a pixel area threshold value in the connected blocks;
discarding connected blocks any of whose side lengths is larger than a first preset proportion of the corresponding image side length;
and discarding the connected blocks of which any side length is larger than the frame length threshold value and the ratio of the pixel area to the bounding box area is smaller than the ratio threshold value.
Preferably, the system further comprises
A fourth merging unit, configured, after the first merging unit merges the same-colored connected blocks in the color-reduced image and the same-colored connected blocks in the binary image, to merge the connected blocks of each color channel of the color-reduced image into new connected blocks based on their positional relationships, and to merge the connected blocks of the binary image into new connected blocks based on their positional relationships;
the fourth merging unit is further configured to perform at least one of the following processes:
merging the connected blocks with the distance smaller than the distance threshold;
taking the maximum value of the average values of the respective lengths and the widths of any two connected blocks, and if the maximum value meets a preset condition, combining the two selected connected blocks;
combining connected blocks of which the bounding boxes are crossed and the crossed parts accord with preset crossed characteristics;
and merging the connected blocks of which the bounding boxes are aligned and meet a preset alignment merging rule.
Preferably, the second merging unit is further configured to sequentially perform merging in the horizontal direction, merging in the vertical direction, and merging in the horizontal direction based on a connection merging rule; wherein the connection merging rule comprises:
and connecting the two selected connected blocks to form a new connected block according to at least one of the following conditions:
the minimum of the center distance and the edge distance between the bounding boxes of the two connected blocks along the reference axis is smaller than a first preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis;
the distance between the bounding boxes of the two connected blocks perpendicular to the reference axis is smaller than a second preset proportion of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis;
the difference between the side lengths of the two bounding boxes along the reference axis is smaller than a third preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis.
Preferably, the judging unit is further configured to extract the specific region from the target image according to the connected bounding boxes obtained on the color-reduced image and the binary image, slide a window with a specific step length over each bounding box, feed each window into a convolutional neural network classifier for discrimination, and obtain the probability that each window contains characters;
the judging unit is further configured to average probabilities of the characters included in the sliding window to obtain a probability that the candidate character region includes a character row or a character column;
the judging unit is further configured to judge that a text row or a text column exists in the specific region if the obtained probability is greater than a preset probability threshold.
According to the method, the image is divided into connected blocks by color; the connected blocks are potential bounding boxes containing characters. The probability that each bounding box contains a text row (or text column) is then verified by sliding a convolutional neural network window over it; when the probability is greater than a preset probability threshold, the bounding box is judged to contain a text row (or column). This processing is suitable for both document images and non-document images, and text in an image can be accurately detected.
Drawings
FIG. 1 is a first flowchart of a text detection method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a second exemplary embodiment of a text detection method;
fig. 3 to 6 are schematic diagrams illustrating detection results of the text detection method according to the embodiment of the invention;
FIGS. 7-8 are schematic diagrams of convolutional neural networks in an embodiment of the present invention;
fig. 9 is an alternative structural diagram of the text detection system according to the embodiment of the invention.
Detailed Description
Embodiments of the present invention provide a method and system for detecting text in images. The images include not only conventional document images, such as PDF images, but also non-document images, such as Joint Photographic Experts Group (JPG) images, Bitmap (BMP) images, Tagged Image File Format (TIFF) images, Graphics Interchange Format (GIF) images, and Exchangeable Image File Format (EXIF) images.
The text detection system of the embodiments locates the text-bearing regions of an image by implementing the text detection method. The image on which the system performs text detection may be a document image, such as a PDF document, or a non-document image, such as a JPG, BMP, TIFF, GIF or EXIF image. Typical sources of such images are screenshots from electronic devices (such as smartphones, tablet computers and notebook computers), scanned electronic versions of printed matter such as posters and magazines, and other digital images containing printed Chinese characters.
Referring to fig. 1, in the embodiment of the present invention, in step 101, color reduction processing is performed on each image in three color channels of a target image to obtain a color-reduced image, and the target image is converted into a binary image; in step 102, combining the connected blocks with the same color in the color reduction image, and combining the connected blocks with the same color in the binary image; in step 103, combining the connected blocks of each color channel of the three color channels of the subtractive image and the connected blocks in the binary image in a connected manner in the vertical and horizontal directions respectively to obtain candidate text regions in the target image; in step 104, a specific region is extracted from the position of the target image corresponding to the candidate text region, and whether a text row or a text column is included in the extracted specific region is determined based on a comparison result between a probability that the extracted specific region includes a text region and a preset probability threshold.
It can be seen that the text detection system locates text rows (or text columns; e.g., rows of Chinese characters, rows of letters, numbers and symbols such as English text, or rows formed of any combination of Chinese characters, letters, numbers and symbols) in the images shown in figs. 3 to 6 by color clustering and layering, merging and filtering of connected blocks, and discrimination based on a deep convolutional neural network, so that the characters in the located text rows can then be recognized.
The present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Example one
Referring to fig. 2, the method for detecting text by the text detection system of the embodiment includes the following steps:
Step 201: Input the target image to be detected and quantize each of its red, green and blue (RGB) channels into K levels (K is an integer, 255 > K > 1, for example 4); that is, divide (for example, uniformly divide) the 0-255 luminance range of each RGB channel into K intervals (bins), reducing the luminance levels 0-255 to levels 0 to K-1, and map the luminance of each pixel of the target image in each RGB channel into the corresponding channel's bin. Since each RGB channel originally has 256 luminance levels (0-255), the target image can have 256^3 colors; after dividing each channel's luminance into K intervals, it has K^3 (less than 256^3) colors. A color-reduced image f1 is thus obtained.
Taking K = 2, each channel has two luminance levels, 0 and 1, after quantization: luminances 0-127 of each channel map to quantized luminance 0, and luminances 128-255 map to quantized luminance 1. If the RGB luminance of a pixel in the target image is (0, 122, 255), its luminance after color reduction is (0, 0, 1). This luminance mapping is performed for every pixel of the target image.
Characters in an image usually fall into two cases: 1) the characters are monochromatic; 2) the luminance of the characters differs markedly from the surrounding regions. For both cases, step 201 achieves the following technical effect: the characters in the color-reduced image take one of the K^3 colors.
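The color reduction of step 201 can be sketched in a few lines of pure Python; the function names are illustrative, and for K = 2 the sketch reproduces the pixel mapping in the example above.

```python
def quantize_channel(v: int, k: int) -> int:
    """Map a 0-255 luminance to its bin index in 0..k-1 (uniform bins)."""
    assert 1 < k < 255 and 0 <= v <= 255
    return v * k // 256

def reduce_pixel(rgb, k=2):
    """Apply K-level quantization to each of a pixel's RGB channels."""
    return tuple(quantize_channel(v, k) for v in rgb)
```

With K = 2, `reduce_pixel((0, 122, 255))` yields `(0, 0, 1)`, matching the example in the text.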
Step 202: Convert the target image into a grayscale image (a single gray channel) and perform local adaptive binarization on it: divide the grayscale image into N windows and, for each of the N windows, divide the pixels in the window into two parts according to a uniform threshold T, obtaining a binary image f2, where T is the Gaussian-weighted sum over a window of preset size (such as 25 × 25 pixels) centered on the pixel.
For the same two cases, step 202 achieves the following technical effect: the characters in the binary image are either black or white.
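A pure-Python sketch of the local adaptive binarization of step 202: each pixel's threshold T is a Gaussian-weighted sum over a window centered on it (25 × 25 in the text; a smaller window is used in the test for brevity). The constant `c` subtracted from T is an implementation assumption not stated in the text; OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_GAUSSIAN_C` implements the same idea.

```python
import math

def gaussian_kernel(size, sigma):
    """Normalized 2-D Gaussian weights for a size x size window."""
    half = size // 2
    raw = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
            for x in range(size)] for y in range(size)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]

def adaptive_binarize(gray, size=25, sigma=5.0, c=10):
    """gray: list of rows of 0-255 values; returns a 0/255 binary image."""
    h, w = len(gray), len(gray[0])
    half = size // 2
    kern = gaussian_kernel(size, sigma)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            t = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    # clamp at the image border (edge replication)
                    sy = min(max(y + dy, 0), h - 1)
                    sx = min(max(x + dx, 0), w - 1)
                    t += gray[sy][sx] * kern[dy + half][dx + half]
            out[y][x] = 255 if gray[y][x] > t - c else 0
    return out
```

A dark character pixel in a bright neighborhood falls below its local threshold and binarizes to 0, while the bright surround binarizes to 255.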
The pixels corresponding to characters in the color-reduced image and binary image obtained in steps 201 and 202 have uniform colors; in step 203, each pixel is treated as a connected block and connected blocks of the same color are merged, so that the characters become connected.
Step 203: Identify the connected blocks in the color-reduced image and the binary image, merge the connected blocks of the same color in the color-reduced image, and merge the connected blocks of the same color in the binary image.
For the connected blocks of each of the RGB color channels of the color-reduced image f1, and the connected blocks of the binary image f2 (a single grayscale channel), the following processing is performed:
1) Each pixel is taken as a separate connected block (i.e., a connected subgraph, a concept from graph theory: each pixel of the image is a vertex of an undirected graph, edges join adjacent pixels, and the whole image is an undirected graph).
2) A union-find (disjoint-set) structure is established, a classical algorithm for efficiently performing connected-block merging.
3) The color-reduced image f1 and the binary image f2 are traversed, and each pixel is processed as follows:
Traversing the pixels of the color-reduced image f1: for a given pixel, if its color (the color of a pixel in an RGB channel means its luminance value in that channel; the color of a pixel in the grayscale image means its gray value) is the same as that of any of its 8 adjacent pixels (the 8 neighbors above, below, left, right, and at the two ends of the 2 diagonals), merge the connected blocks to which the two same-colored adjacent pixels belong into one connected block. Then traverse the connected blocks and judge the pixel area of each: if the pixel area of connected block k (k ranges over the number of connected blocks) is smaller than the pixel area threshold (4 pixels), merge connected block k into an adjacent connected block and set its color to the color of the connected block it is merged into.
For example, for a pixel i in the color-reduced image f1 (1 ≤ i ≤ I1, where I1 is the number of pixels in f1), consider its luminance in any channel X of the three RGB channels (say, the R channel). If pixel i and any pixel j among its 8 adjacent pixels (the 8 neighbors above, below, left, right, and at the two ends of the 2 diagonals) have the same luminance in that channel, the connected block to which pixel i belongs and the connected block to which pixel j belongs are merged into one connected block. Then each connected block is traversed and its pixel area judged: if the pixel area of connected block k (k ranges over the number of connected blocks) is smaller than the threshold (4 pixels), connected block k is merged into an adjacent connected block, and the color of its pixels is set to the luminance of the connected block it is merged into.
As another example, if a pixel i in the grayscale image of the target image (1 ≤ i ≤ I2, where I2 is the number of pixels in the grayscale image) has the same color (gray value) as a pixel j among its 8 adjacent pixels (the 8 neighbors above, below, left, right, and at the two ends of the 2 diagonals), the connected blocks to which adjacent pixels i and j belong are merged into one connected block. Then each connected block is traversed and its pixel area judged: if the pixel area of connected block k is smaller than the threshold (4 pixels), connected block k is merged into an adjacent connected block, and the gray value of its pixels is set to the gray value of the pixels in the connected block it is merged into.
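The union-find merging of steps 1) to 3) can be sketched as follows, a minimal illustration over a small color map (the small-area merge of step 203 is omitted for brevity). The function names are illustrative.

```python
def find(parent, i):
    """Find the root of element i, with path halving for efficiency."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def union(parent, a, b):
    """Merge the sets containing a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra

def connected_blocks(img):
    """img: list of rows of color values; returns one label per pixel."""
    h, w = len(img), len(img[0])
    parent = list(range(h * w))
    for y in range(h):
        for x in range(w):
            # 8-neighborhood: scanning top-left to bottom-right, it suffices
            # to union with the 4 already-visited neighbors (W, NW, N, NE).
            for dy, dx in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == img[y][x]:
                    union(parent, ny * w + nx, y * w + x)
    return [find(parent, i) for i in range(h * w)]
```

On a 3 × 3 map with two colors, all same-colored 8-connected pixels collapse into one block per color.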
Step 203 merges pixels belonging to the same character (or at least the same stroke, for Chinese characters) into one connected block for subsequent processing.
A subsequent step 204 discards connected blocks in the color-reduced image and in the binary image that match preset features (the preset features correspond to features of non-text regions of the image).
The connected blocks of each color channel of the color-reduced image f1 and the connected blocks of the binary image f2 are respectively subjected to at least one of the following processes:
1) Discard connected blocks whose area is still smaller than the pixel area threshold (e.g., 4 pixels); such blocks are regarded as not bearing characters.
2) Discard connected blocks corresponding to background colors: any side length of the connected block is larger than a first preset proportion (e.g., 0.8 times) of the corresponding image side length.
3) Discard connected blocks corresponding to frames: any side of the connected block is larger than the frame length threshold (e.g., 65 pixels), and the ratio of the connected block's pixel area to its bounding box area is smaller than the ratio threshold (e.g., 0.22). The bounding box of a connected block is the smallest rectangle containing all of the block's pixels (the sides of the rectangle are parallel to the x and y axes of the image, so it is uniquely determined).
Optionally, considering that the image may include characters whose strokes are not connected (e.g., multi-stroke Chinese characters, or the English letters i and j), step 205 may be further performed to merge the disconnected strokes of such characters together.
In step 205, the connected blocks of each color channel in the color-reduced image are merged into new connected blocks based on their positional relationships (such as distance and intersection), and the connected blocks in the binary image are likewise merged into new connected blocks based on their positional relationships (such as distance and intersection).
1) Merge connected blocks whose distance is smaller than the distance threshold (the distance here being the Chebyshev distance d between the center points of the bounding boxes of the two connected blocks).
2) Take ms = max((a1 + b1)/2.0, (a2 + b2)/2.0), the maximum of the average side lengths of the two connected blocks, where a1 and b1 are the length and width of the bounding box of the first connected block and a2 and b2 are the length and width of the bounding box of the second connected block, and take 0.4·ms as the distance threshold. Then, if the preset condition is met (0.4·ms < 1, or 1 < 0.4·ms < 3 and distance d < 3), the two selected connected blocks are merged.
3) For the connected blocks of each of the RGB three-color channels of the color-reduced image f1 and the connected blocks of the binary image f2, merge connected blocks whose bounding boxes intersect and whose intersecting portions conform to preset intersection features. For example, if the bounding boxes of two connected blocks intersect, the area of the intersection is greater than a preset proportion (e.g., 10%) of the area of the smaller of the two bounding boxes, and the area of the intersection is less than 10% of the image area, then the two connected blocks are merged.
4) Merge connected blocks whose bounding boxes are aligned and which satisfy a preset alignment merging rule. Alignment means the bounding boxes of the connected blocks are aligned in the horizontal or vertical direction, i.e.: 1) the bounding boxes of the two connected blocks have the same height and the same vertical position; or 2) the bounding boxes of the two connected blocks have the same width and the same horizontal position.
An example of an alignment merging rule: two aligned connected blocks are merged if the area of the merged bounding box (i.e., the smallest bounding box containing both original bounding boxes) exceeds the sum of the two original bounding box areas by less than the area-increment proportion threshold (e.g., 10%), and the area of the merged bounding box is less than a proportion threshold (e.g., 10%) of the image area.
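Two of the position-based tests of step 205, the Chebyshev center distance of rule 1 and the bounding-box intersection test of rule 3, can be sketched as follows. Function names and the (x0, y0, x1, y1) box convention are assumptions; the 10% ratios follow the text's examples.

```python
def center(b):
    """Center point of an inclusive (x0, y0, x1, y1) bounding box."""
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)

def chebyshev(b1, b2):
    """Chebyshev distance d between the two box centers (rule 1)."""
    (cx1, cy1), (cx2, cy2) = center(b1), center(b2)
    return max(abs(cx1 - cx2), abs(cy1 - cy2))

def intersect_merge(b1, b2, img_area, overlap_ratio=0.10):
    """Rule 3: merge when the boxes overlap enough relative to the
    smaller box, but the overlap stays small relative to the image."""
    iw = min(b1[2], b2[2]) - max(b1[0], b2[0]) + 1
    ih = min(b1[3], b2[3]) - max(b1[1], b2[1]) + 1
    if iw <= 0 or ih <= 0:
        return False                     # boxes do not intersect
    inter = iw * ih
    smaller = min((b1[2] - b1[0] + 1) * (b1[3] - b1[1] + 1),
                  (b2[2] - b2[0] + 1) * (b2[3] - b2[1] + 1))
    return inter > overlap_ratio * smaller and inter < overlap_ratio * img_area
```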
In step 206, the connected blocks of each color channel of the RGB three-color channels of the color-reduced image f1 and the connected blocks in the binary image f2 are respectively combined in a connected manner in the vertical and horizontal directions, so as to obtain candidate text regions (including text row regions and text column regions) in the image.
The aim is to connect single characters (such as Chinese characters) into character rows or columns. Based on the connection merging rule (the same rule, described later, is used for both horizontal and vertical merging), the connected blocks are first merged in the horizontal direction, then in the vertical direction, and finally in the horizontal direction again.
Generally, horizontally arranged characters are more common in images than vertically arranged ones, so in step 206 the connected blocks are first merged in the horizontal direction, which ensures that horizontally arranged characters are merged first and reduces the possibility of horizontal text being wrongly merged vertically. The connected blocks are then merged in the vertical direction, so that characters that do not satisfy the horizontal merging rule but satisfy the vertical merging rule are merged as well. During this process, however, the bounding boxes of the connected blocks may change, producing new pairs of bounding boxes that satisfy the horizontal merging rule; the connected blocks are therefore merged in the horizontal direction once more.
One example of a connection merging rule: two connected blocks are connected into a new connected block if their bounding boxes satisfy at least one of the following conditions:
1) the minimum of the center distance (the distance between the center coordinates of the two bounding boxes along the reference axis) and the edge distance (the distance between the edge coordinates of the two bounding boxes along the reference axis) of the bounding boxes of the two connected blocks along the reference axis (the horizontal or vertical axis) is smaller than a first preset proportion (e.g., 0.15 times) of the smaller of the two bounding boxes' side lengths along the reference axis;
Since the coordinate ranges of the two bounding boxes along the reference axis may be separated or partially overlapping, the smaller of the center distance and the edge distance most accurately characterizes the distance between the bounding boxes along that axis.
2) the distance between the bounding boxes of the two connected blocks in the direction perpendicular to the reference axis is smaller than a second preset proportion (e.g., 2 times) of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis;
3) the difference between the side lengths of the bounding boxes of the two connected blocks along the reference axis is smaller than a third preset proportion (e.g., 30%) of the smaller of the two bounding boxes' side lengths along the reference axis.
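The three conditions above can be sketched as one test along a chosen reference axis. This is an illustrative reading, not the patent's code: boxes are assumed to be inclusive (x0, y0, x1, y1) tuples, the conditions are combined with a logical OR following the text's "at least one of" wording, and the proportions 0.15, 2.0 and 0.30 are the quoted examples.

```python
def can_connect(b1, b2, axis, p1=0.15, p2=2.0, p3=0.30):
    """axis: 0 = horizontal reference axis, 1 = vertical reference axis."""
    def side(b, ax):                      # side length along axis ax
        return b[2 + ax] - b[ax] + 1
    def center(b, ax):
        return (b[ax] + b[2 + ax]) / 2.0
    perp = 1 - axis
    # condition 1: gap along the reference axis (smaller of center/edge distance)
    centre_d = abs(center(b1, axis) - center(b2, axis))
    edge_d = max(b1[axis], b2[axis]) - min(b1[2 + axis], b2[2 + axis])
    gap = min(centre_d, max(edge_d, 0))
    c1 = gap < p1 * min(side(b1, axis), side(b2, axis))
    # condition 2: offset perpendicular to the reference axis
    c2 = abs(center(b1, perp) - center(b2, perp)) < \
        p2 * min(side(b1, perp), side(b2, perp))
    # condition 3: similar size along the reference axis
    c3 = abs(side(b1, axis) - side(b2, axis)) < \
        p3 * min(side(b1, axis), side(b2, axis))
    return c1 or c2 or c3
```

Two adjacent 10 x 10 boxes on the same baseline connect horizontally, while two distant, differently sized boxes do not satisfy any condition.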
In the foregoing steps 201 to 206, the union of the bounding boxes connected into a row, obtained from the color-reduced image f1 and the binary image f2, forms a new rectangular bounding box: a region potentially containing a text row or text column (a candidate text region). A region of interest (ROI, the aforementioned specific region: a region to be processed, outlined in the target image I by a box, circle, ellipse, irregular polygon, etc.) is then extracted from the target image I. A sliding window with a specific side length and step length is applied, for example using the shortest side length S of the region as the window side length and 0.5S as the sliding step, and each window is sent into a pre-trained convolutional neural network (CNN) classifier for discrimination, yielding the probability p_w that the window contains text. All p_w values are averaged to obtain the probability p_l that the candidate text region contains a text row (or text column); if p_l is greater than a preset probability threshold (e.g., 0.5), it is judged that a text row (or text column) exists in the region of interest.
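The sliding-window scoring can be sketched in one dimension (windows slide along the ROI's longer side). The classifier is injected as a callable standing in for the pre-trained CNN; `score_region` and its parameters are assumptions for illustration, while the window side S = min(w, h), the 0.5·S stride, the averaging of p_w into p_l, and the 0.5 threshold follow the text.

```python
def score_region(roi_w, roi_h, classify, threshold=0.5):
    """classify(pos, size) -> p_w, the text probability of one window.
    Returns (p_l, contains_text)."""
    s = min(roi_w, roi_h)               # window side length S
    stride = max(1, int(0.5 * s))       # sliding step 0.5 * S
    long_side = max(roi_w, roi_h)
    probs = []
    pos = 0
    while pos + s <= long_side:         # slide along the longer side
        probs.append(classify(pos, s))  # p_w for this window
        pos += stride
    if not probs:
        probs.append(classify(0, s))    # degenerate ROI: a single window
    p_l = sum(probs) / len(probs)       # average p_w -> p_l
    return p_l, p_l > threshold
```

With a stub classifier that always returns 0.8, a 40 x 10 ROI yields p_l = 0.8 and is judged to contain text.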
In step 208, overlapping bounding boxes are combined into a single bounding box and output as a text-containing region.
Training a convolutional neural network:
Chinese characters in the received data (including text images) are labeled. The output of step 206 (before filtering by the convolutional neural network) is then screened and the portions close to the labels are selected; the bounding boxes are cut into sliding windows according to the method of step 208, the windows belonging to text and those not belonging to text are separated manually, and all windows are scaled to 32 × 32 pixels.
These windows were assembled into training and validation data to train the neural networks shown in figs. 6 and 7, each sample being cropped to 27 × 27 pixels at a random center and flipped randomly. Training uses stochastic gradient descent (SGD) with a batch size of 50, a weight decay (weight_decay) of 0.0005, a momentum of 0.9, and a learning rate computed as lr = base_lr · (1 + 0.0001 · iter)^(−0.75), where iter is the iteration count; for the first 100,000 iterations base_lr is 0.001, and 0.0001 thereafter.
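The quoted learning-rate schedule, reconstructed under the assumption that it is the usual inverse-decay policy lr = base_lr · (1 + γ·iter)^(−p) with γ = 0.0001 and p = 0.75, and that base_lr drops from 0.001 to 0.0001 after the first 100,000 iterations:

```python
def learning_rate(iteration, gamma=0.0001, power=0.75):
    """Inverse-decay learning-rate schedule quoted in the text."""
    base_lr = 0.001 if iteration < 100_000 else 0.0001
    return base_lr * (1 + gamma * iteration) ** (-power)
```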
An embodiment of the present invention provides a text detection system, which is shown in fig. 9 and includes:
a subtractive color binary processing unit 100, configured to perform subtractive color processing on each image in three color channels of a target image to obtain a subtractive color image, and convert the target image into a binary image;
a first merging unit 200, configured to merge connected blocks with the same color in the color-reduced image, and merge connected blocks with the same color in the binary image;
a second merging unit 300, configured to merge a connected block of each color channel of the three color channels of the color-reduced image and a connected block in the binary image in a connected manner in the vertical and horizontal directions, respectively, to obtain a candidate text region in the target image;
a determining unit 400, configured to extract a specific region from a position on the target image corresponding to the candidate text region, and determine whether the extracted specific region includes a text row or a text column based on a comparison result between a probability that the extracted specific region includes a text region and a preset probability threshold.
Preferably, the color-reducing binary processing unit 100 is further configured to quantize each of the red, green, and blue channels of the target image into K levels of intervals;
and to map the brightness of each pixel of the target image in the RGB three-color channels into the quantized interval of the corresponding channel, where K is an integer and 255 > K > 1.
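A minimal sketch of the K-level quantization: each channel value in 0-255 is mapped to a representative color of one of K equal-width intervals (the interval midpoint is used here as the representative, which is an assumption; the text only requires mapping into the interval). Function names are invented for illustration.

```python
def quantize_channel(value, k):
    """Map a 0-255 channel value to the midpoint of its K-level interval."""
    step = 256.0 / k                     # width of one quantization interval
    level = min(int(value / step), k - 1)
    return int(level * step + step / 2)  # representative color of the interval

def reduce_colors(pixel, k=4):
    """pixel: (r, g, b) tuple -> color-reduced (r, g, b)."""
    return tuple(quantize_channel(c, k) for c in pixel)
```

With K = 4 the intervals are 64 wide, so 0 maps to 32, 128 to 160, and 255 to 224.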
Preferably, the first merging unit 200 is further configured to take each pixel in the color-reduced image and in the binary image as a single connected block, establish a union-find (disjoint-set) structure over the pixels, and perform the following processing:
the first merging unit 200 is further configured to merge two adjacent connected blocks with the same color into the same connected block if the color of the pixel is the same as that of any one of the 8 adjacent pixels
The first merging unit 200 is further configured to determine a pixel area of each connected block, merge the connected block into a connected block adjacent to the connected block if the pixel area of the connected block is smaller than a pixel area threshold, and set a color of the connected block as a color of the merged connected block.
Preferably, the system further comprises:
a discarding processing unit 500, configured to discard connected blocks in the color-reduced image and connected blocks in the binary image that meet a preset feature after the first merging unit 200 merges connected blocks in the color-reduced image that have the same color and merges connected blocks in the binary image that have the same color; the preset features include at least one of:
discarding connected blocks of which the area is smaller than a pixel area threshold value in the connected blocks;
discarding connected blocks in which any side length is larger than a first preset proportion of the corresponding image side length;
and discarding the connected blocks of which any side length is larger than the frame length threshold value and the ratio of the pixel area to the bounding box area is smaller than the ratio threshold value.
Preferably, the system further comprises
A third merging unit 600, configured to, after the first merging unit 200 merges the connected blocks with the same color in the color-reduced image and merges the connected blocks with the same color in the binary image, merge the connected blocks of each color channel in the color-reduced image into new connected blocks based on their positional relationships, and merge the connected blocks in the binary image into new connected blocks based on their positional relationships;
the third merging unit 600 is further configured to perform at least one of the following processes:
merging the connected blocks with the distance smaller than the distance threshold;
taking the maximum value of the average values of the respective lengths and the widths of any two connected blocks, and if the maximum value meets a preset condition, combining the two selected connected blocks;
combining connected blocks of which the bounding boxes are crossed and the crossed parts accord with preset crossed characteristics;
and merging the connected blocks of which the bounding boxes are aligned and meet a preset alignment merging rule.
Preferably, the second merging unit 300 is further configured to sequentially perform merging in the horizontal direction, merging in the vertical direction, and merging in the horizontal direction based on a connection merging rule; wherein the connection merging rule comprises:
and connecting the two selected connected blocks to form a new connected block according to at least one of the following conditions:
the minimum of the center distance and the edge distance of the bounding boxes of the two connected blocks along the reference axis is smaller than a first preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis;
the distance between the bounding boxes of the two connected blocks in the direction perpendicular to the reference axis is smaller than a second preset proportion of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis;
the difference between the side lengths of the bounding boxes of the two connected blocks along the reference axis is smaller than a third preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis.
Preferably, the determining unit 400 is further configured to extract a region of interest from the target image, obtain the connected bounding boxes in the color-reduced image and the binary image, and send them, via a sliding window with a specific step length, into a convolutional neural network classifier for discrimination, obtaining the probability that each sliding window contains text;
the determining unit 400 is further configured to average probabilities of the characters included in the sliding window to obtain a probability that the candidate character region includes a character row or a character column;
the determining unit 400 is further configured to determine that a text row or a text column exists in the region of interest if the obtained probability is greater than a preset probability threshold.
An embodiment of the present invention provides a computer storage medium storing executable instructions for executing the text detection method shown in fig. 1 or fig. 2.
In summary, the embodiments of the present invention have the following beneficial effects:
the invention provides a method and a system for detecting characters in an image, which are suitable for positioning characters such as print Chinese characters and the like in the image in a network album, and the output result can be used as the input of a character recognition system to help to finally generate an accurate character recognition result.
Those skilled in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Random Access Memory (RAM), a Read-Only Memory (ROM), a magnetic disk, and an optical disk.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a RAM, a ROM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (11)
1. A method for detecting text, the method comprising:
quantizing each of the red, green and blue channels of the target image into K levels respectively to obtain K level intervals, wherein K is an integer and 255 > K > 1;
mapping the brightness of each pixel in the target image in an RGB three-color channel to an interval corresponding to channel quantization to obtain a color reduction image, and converting the target image into a binary image;
merging the connected blocks with the same color in the color reduction image, and merging the connected blocks with the same color in the binary image;
sequentially carrying out combination in the horizontal direction, combination in the vertical direction and combination in the horizontal direction on the connected blocks of each color channel of the three-color channel of the subtractive image and the connected blocks in the binary image based on a connection combination rule to obtain a candidate character area in the target image; wherein the connection merging rule comprises:
and connecting the two selected connected blocks to form a new connected block according to at least one of the following conditions:
the minimum of the center distance and the edge distance of the bounding boxes of the two connected blocks along the reference axis is smaller than a first preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis; the distance between the bounding boxes of the two connected blocks in the direction perpendicular to the reference axis is smaller than a second preset proportion of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis; the difference between the side lengths of the bounding boxes of the two connected blocks along the reference axis is smaller than a third preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis;
and extracting a specific region at the position on the target image corresponding to the candidate character region, and judging whether the extracted specific region contains character rows or character columns or not based on the comparison result of the probability of containing the character region in the extracted specific region and a preset probability threshold.
2. The method of claim 1, wherein merging connected blocks having the same color in a subtractive image and merging connected blocks having the same color in a binary image comprises:
taking each pixel in the subtractive image and in the binary image as a separate connected block, establishing a union-find structure over the pixels and performing the following:
if the color of the pixel is the same as that of any one of its 8 adjacent pixels, merging the connected blocks of the two adjacent pixels with the same color into the same connected block;
and judging the pixel area of each connected block, merging the connected blocks into the connected blocks adjacent to the connected blocks if the pixel area of the connected blocks is smaller than a pixel area threshold value, and setting the color of the connected blocks as the color of the merged connected blocks.
3. The method of claim 1, wherein after merging connected blocks having the same color in the subtractive image and merging connected blocks having the same color in the binary image, the method further comprises:
discarding connected blocks which are in the color reduction image and in the binary image and accord with preset characteristics; the preset features include at least one of:
connected blocks whose area is smaller than the pixel area threshold;
connected blocks in which any side length is larger than a first preset proportion of the corresponding image side length;
and connected blocks in which any side length is larger than the frame length threshold and the ratio of pixel area to bounding box area is smaller than the ratio threshold.
4. The method of claim 1, wherein after merging connected blocks having the same color in the subtractive image and merging connected blocks having the same color in the binary image, the method further comprises:
merging the connected blocks of each color channel in the subtractive image into new connected blocks respectively based on the position relationship of the connected blocks, and merging the connected blocks in the binary image into new connected blocks based on the position relationship; wherein the merging comprises performing at least one of:
merging the connected blocks with the distance smaller than the distance threshold;
taking the maximum value of the average values of the respective lengths and the widths of any two connected blocks, and if the maximum value meets a preset condition, combining the two selected connected blocks;
combining connected blocks of which the bounding boxes are crossed and the crossed parts accord with preset crossed characteristics;
and merging the connected blocks of which the bounding boxes are aligned and meet a preset alignment merging rule.
5. The method according to any one of claims 1 to 4, wherein the extracting a specific region at a position on the target image corresponding to the candidate text region, and determining whether the extracted specific region contains a text row or a text column based on a comparison result between a probability that the extracted specific region contains a text region and a preset probability threshold comprises:
extracting a specific area from the target image, obtaining the connected bounding boxes in the color reduction image and the binary image, and sending the connected bounding boxes, via a sliding window with a specific step length, into a convolutional neural network classifier for judgment, obtaining the probability that each sliding window contains characters;
averaging the probabilities of characters contained in the sliding window to obtain the probability that the candidate character area comprises character rows or character columns;
and if the obtained probability is greater than a preset probability threshold, determining that the character row or the character column exists in the specific area.
6. A text detection system, the system comprising:
the color reduction binary processing unit is used for quantizing each of the red, green and blue channels of the target image into K levels respectively to obtain K level intervals, wherein K is an integer and 255 > K > 1;
the color reduction binary processing unit is used for mapping the brightness of each pixel in the target image in an RGB three-color channel to a quantized interval of a corresponding channel to obtain a color reduction image, and converting the target image into a binary image;
a first merging unit, configured to merge connected blocks with the same color in the color-reduced image, and merge connected blocks with the same color in the binary image;
a second merging unit, configured to sequentially perform merging in the horizontal direction, merging in the vertical direction, and merging in the horizontal direction on a connected block of each color channel of the three-color channels of the color-reduced image and a connected block in the binary image based on a connection and merging rule, so as to obtain a candidate text region in the target image; wherein the connection merging rule comprises:
and connecting the two selected connected blocks to form a new connected block according to at least one of the following conditions:
the minimum of the center distance and the edge distance of the bounding boxes of the two connected blocks along the reference axis is smaller than a first preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis; the distance between the bounding boxes of the two connected blocks in the direction perpendicular to the reference axis is smaller than a second preset proportion of the smaller of the two bounding boxes' side lengths perpendicular to the reference axis; the difference between the side lengths of the bounding boxes of the two connected blocks along the reference axis is smaller than a third preset proportion of the smaller of the two bounding boxes' side lengths along the reference axis;
and the judging unit is used for extracting a specific area from the position, corresponding to the candidate character area, of the target image and judging whether the extracted specific area contains character rows or character columns or not based on the comparison result of the probability of containing the character area in the extracted specific area and a preset probability threshold.
7. The system of claim 6,
the first merging unit is further configured to take each pixel in the color reduction image and the binary image as a single connected block, establish a union-find structure over the pixels, and perform the following processing:
the first merging unit is further configured to merge the connected blocks to which two adjacent pixels with the same color belong into the same connected block if the color of a pixel is the same as that of any one of its 8 adjacent pixels;
the first merging unit is further configured to determine a pixel area of each connected block, merge the connected blocks into a connected block adjacent to the connected block if the pixel area of the connected block is smaller than a pixel area threshold, and set the color of the connected block as the color of the merged connected block.
8. The system of claim 6, wherein the system further comprises:
a discarding processing unit, configured to discard connected blocks in the color-reduced image and connected blocks in the binary image that meet a preset feature after the first merging unit merges the connected blocks in the color-reduced image that have the same color and merges the connected blocks in the binary image that have the same color; the preset features include at least one of:
connected blocks whose area is smaller than the pixel area threshold;
connected blocks in which any side length is larger than a first preset proportion of the corresponding image side length;
and connected blocks in which any side length is larger than the frame length threshold and the ratio of pixel area to bounding box area is smaller than the ratio threshold.
9. The system of claim 6, further comprising
A fourth merging unit, configured to, after the first merging unit merges the connected blocks with the same color in the color-reduced image and merges the connected blocks with the same color in the binary image, merge the connected blocks of each color channel in the color-reduced image into new connected blocks based on their positional relationship, and merge the connected blocks in the binary image into new connected blocks based on the positional relationship;
wherein the fourth merging unit is further configured to perform at least one of the following processes:
merging the connected blocks with the distance smaller than the distance threshold;
taking the maximum value of the average values of the respective lengths and the widths of any two connected blocks, and if the maximum value meets a preset condition, combining the two selected connected blocks;
combining connected blocks of which the bounding boxes are crossed and the crossed parts accord with preset crossed characteristics;
and merging the connected blocks of which the bounding boxes are aligned and meet a preset alignment merging rule.
10. The system according to any one of claims 6 to 9,
the judgment unit is further configured to extract a specific region from the target image, obtain the connected bounding boxes in the color reduction image and the binary image, and send them, via a sliding window with a specific step length, into a convolutional neural network classifier for judgment, obtaining the probability that each sliding window contains characters;
the judging unit is further configured to average probabilities of the characters included in the sliding window to obtain a probability that the candidate character region includes a character row or a character column;
the judging unit is further configured to judge that a text row or a text column exists in the specific region if the obtained probability is greater than a preset probability threshold.
11. A storage medium having stored thereon executable instructions which, when executed, cause a processor to perform the character detection method of any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610091568.8A CN107093172B (en) | 2016-02-18 | 2016-02-18 | Character detection method and system |
PCT/CN2017/073407 WO2017140233A1 (en) | 2016-02-18 | 2017-02-13 | Text detection method and system, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610091568.8A CN107093172B (en) | 2016-02-18 | 2016-02-18 | Character detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107093172A CN107093172A (en) | 2017-08-25 |
CN107093172B true CN107093172B (en) | 2020-03-17 |
Family
ID=59625563
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610091568.8A Active CN107093172B (en) | 2016-02-18 | 2016-02-18 | Character detection method and system |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107093172B (en) |
WO (1) | WO2017140233A1 (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108205676B (en) * | 2017-11-22 | 2019-06-07 | 西安万像电子科技有限公司 | The method and apparatus for extracting pictograph region |
CN108989793A (en) * | 2018-07-20 | 2018-12-11 | 深圳市华星光电技术有限公司 | A kind of detection method and detection device of text pixel |
CN109191539B (en) * | 2018-07-20 | 2023-01-06 | 广东数相智能科技有限公司 | Oil painting generation method and device based on image and computer readable storage medium |
CN109389150B (en) * | 2018-08-28 | 2022-04-05 | 东软集团股份有限公司 | Image consistency comparison method and device, storage medium and electronic equipment |
CN111222368B (en) * | 2018-11-26 | 2023-09-19 | 北京金山办公软件股份有限公司 | Method and device for identifying document paragraphs and electronic equipment |
CN111325199B (en) * | 2018-12-14 | 2023-10-27 | 中移(杭州)信息技术有限公司 | Text inclination angle detection method and device |
CN111401110A (en) * | 2019-01-03 | 2020-07-10 | 百度在线网络技术(北京)有限公司 | Method and device for extracting information |
CN109815957A (en) * | 2019-01-30 | 2019-05-28 | 邓悟 | A kind of character recognition method based on color image under complex background |
CN110059685B (en) * | 2019-04-26 | 2022-10-21 | 腾讯科技(深圳)有限公司 | Character area detection method, device and storage medium |
CN110058838B (en) * | 2019-04-28 | 2021-03-16 | 腾讯科技(深圳)有限公司 | Voice control method, device, computer readable storage medium and computer equipment |
CN109977956B (en) * | 2019-04-29 | 2022-11-18 | 腾讯科技(深圳)有限公司 | Image processing method and device, electronic equipment and storage medium |
CN111178346B (en) * | 2019-11-22 | 2023-12-08 | 京东科技控股股份有限公司 | Text region positioning method, text region positioning device, text region positioning equipment and storage medium |
CN111062365B (en) * | 2019-12-30 | 2023-05-26 | 上海肇观电子科技有限公司 | Method, apparatus, chip circuit and computer readable storage medium for recognizing mixed typeset text |
CN111369441B (en) * | 2020-03-09 | 2022-11-15 | 稿定(厦门)科技有限公司 | Word processing method, medium, device and apparatus |
CN111340028A (en) * | 2020-05-18 | 2020-06-26 | 创新奇智(北京)科技有限公司 | Text positioning method and device, electronic equipment and storage medium |
CN111681229B (en) * | 2020-06-10 | 2023-04-18 | 创新奇智(上海)科技有限公司 | Deep learning model training method, wearable clothes flaw identification method and wearable clothes flaw identification device |
CN112149523B (en) * | 2020-09-04 | 2021-05-28 | 开普云信息科技股份有限公司 | Method and device for identifying and extracting pictures based on deep learning and parallel-searching algorithm |
CN112418204A (en) * | 2020-11-18 | 2021-02-26 | 杭州未名信科科技有限公司 | Text recognition method, system and computer medium based on paper document |
CN112650832B (en) * | 2020-12-14 | 2022-09-06 | 中国电子科技集团公司第二十八研究所 | Knowledge correlation network key node discovery method based on topology and literature characteristics |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101398894A (en) * | 2008-06-17 | 2009-04-01 | 浙江师范大学 | Automobile license plate automatic recognition method and implementing device thereof |
CN101447027A (en) * | 2008-12-25 | 2009-06-03 | 东莞市微模式软件有限公司 | Binaryzation method of magnetic code character area and application thereof |
CN102136064A (en) * | 2011-03-24 | 2011-07-27 | 成都四方信息技术有限公司 | System for recognizing characters from image |
CN103632159A (en) * | 2012-08-23 | 2014-03-12 | 阿里巴巴集团控股有限公司 | Method and system for training classifier and detecting text area in image |
CN103839062A (en) * | 2014-03-11 | 2014-06-04 | 东方网力科技股份有限公司 | Image character positioning method and device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090148043A1 (en) * | 2007-12-06 | 2009-06-11 | International Business Machines Corporation | Method for extracting text from a compound digital image |
CN101615252B (en) * | 2008-06-25 | 2012-07-04 | 中国科学院自动化研究所 | Method for extracting text information from adaptive images |
CN101763516B (en) * | 2010-01-15 | 2012-02-29 | 南京航空航天大学 | Character recognition method based on fitting functions |
JP5826081B2 (en) * | 2012-03-19 | 2015-12-02 | 株式会社Pfu | Image processing apparatus, character recognition method, and computer program |
CN103034856B (en) * | 2012-12-18 | 2016-01-20 | 深圳深讯和科技有限公司 | The method of character area and device in positioning image |
2016
- 2016-02-18 CN CN201610091568.8A patent/CN107093172B/en active Active
2017
- 2017-02-13 WO PCT/CN2017/073407 patent/WO2017140233A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN107093172A (en) | 2017-08-25 |
WO2017140233A1 (en) | 2017-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107093172B (en) | Character detection method and system | |
CN111476067B (en) | Character recognition method and device for image, electronic equipment and readable storage medium | |
CN101122953B (en) | Picture words segmentation method | |
JP6139396B2 (en) | Method and program for compressing binary image representing document | |
KR100339691B1 (en) | Apparatus for recognizing code and method therefor | |
CN107590491B (en) | Image processing method and device | |
US8462394B2 (en) | Document type classification for scanned bitmaps | |
CN100527156C (en) | Picture words detecting method | |
CN111191695A (en) | Website picture tampering detection method based on deep learning | |
CN104298982A (en) | Text recognition method and device | |
KR101169140B1 (en) | Apparatus and method for generating image for text region extraction | |
CN109241861B (en) | Mathematical formula identification method, device, equipment and storage medium | |
CN105447522A (en) | Complex image character identification system | |
US20150371100A1 (en) | Character recognition method and system using digit segmentation and recombination | |
CN103577818A (en) | Method and device for recognizing image characters | |
CN107977658B (en) | Image character area identification method, television and readable storage medium | |
JP2009169948A (en) | Device and method for determining orientation of document, and program and recording medium thereof | |
WO2015002719A1 (en) | Method of improving contrast for text extraction and recognition applications | |
EP0949579A2 (en) | Multiple size reductions for image segmentation | |
CN110598566A (en) | Image processing method, device, terminal and computer readable storage medium | |
JP4077919B2 (en) | Image processing method and apparatus and storage medium therefor | |
US7146047B2 (en) | Image processing apparatus and method generating binary image from a multilevel image | |
CN110210467B (en) | Formula positioning method of text image, image processing device and storage medium | |
CN111461131A (en) | Identification method, device, equipment and storage medium for ID card number information | |
CN109948598B (en) | Document layout intelligent analysis method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||