CN104951741A - Character recognition method and device thereof - Google Patents

Character recognition method and device thereof

Info

Publication number
CN104951741A
CN104951741A CN201410127438.6A
Authority
CN
China
Prior art keywords
block
identified
pixel
word
combined
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410127438.6A
Other languages
Chinese (zh)
Inventor
张宇 (Zhang Yu)
杜志军 (Du Zhijun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201410127438.6A priority Critical patent/CN104951741A/en
Publication of CN104951741A publication Critical patent/CN104951741A/en
Priority to HK15111978.6A priority patent/HK1211363A1/en
Pending legal-status Critical Current

Abstract

The present application discloses a character recognition method and device to solve the problem of low character recognition precision in the prior art. The bounding rectangle of each connected region formed by the strokes in a text line is determined, and each bounding rectangle is cut to obtain blocks to be merged. According to the overlap between blocks to be merged and a preset aspect-ratio range, the blocks to be merged are merged to obtain blocks to be recognized. A start block is selected from the blocks to be recognized; for each block to be recognized after the start block, all blocks from the start block up to that block are merged into one candidate character block, the character in the candidate character block is recognized, and the confidence of the recognition is determined; the character with the highest confidence is determined as the character in the actual character block where the start block is located. This method avoids characters with left-right or left-middle-right structures being wrongly recognized as several different characters, and effectively improves recognition precision in scenes where Chinese characters are mixed with other characters.

Description

Character recognition method and device
Technical field
The present application relates to the field of computer technology, and in particular to a character recognition method and device.
Background art
With the development of computer technology, character recognition technology has emerged. With it, a device can extract the text in an image, and applying character recognition to the entry of non-digitized information can significantly improve entry efficiency. The usual approach is to capture an image of the non-digitized information (for example, by photographing a paper document) and then use character recognition to recognize the text in the image, so that the information can be obtained and entered. Clearly, when character recognition is used to enter non-digitized information, the precision of the recognition is a key factor in the accuracy of the entered information.
In the prior art, there are mainly the following two character recognition methods.
The first, shown in Figure 1, comprises the following steps:
S101: extract the text lines in the image and binarize each extracted text line.
Fig. 2A is a schematic diagram of the first prior-art character recognition method, illustrated with a paper document. In Fig. 2A, the paper document contains many text lines; for example, the first text line is the line containing the two characters of "document", the second is the line containing the four characters of "vendor information", and the third is the line containing the two characters of "phone" (电话).
In the image shown in Fig. 2A, each text line can be extracted separately and then binarized. The purpose of binarization is to take the text in the line as foreground and all other information as background, so as to distinguish the two.
For example, suppose the extracted text line is the line containing 电话 ("phone") in Fig. 2A. The pixel value of every pixel on a character stroke in this line can then be set to 255 (a pixel value of 255 is pure white) and the pixel value of every other pixel set to 0 (a pixel value of 0 is pure black), as shown in Fig. 2B.
Fig. 2B is a schematic diagram of a binarized text line. In Fig. 2B, the strokes of all characters (Chinese characters, digits, and punctuation marks) in the line containing 电话 are pure white, and everything else is pure black.
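The binarization step described above can be sketched as follows. This is only a minimal illustration of global thresholding, not the patent's implementation: the threshold value of 128 and the tiny hand-made grayscale grid are assumptions for demonstration, and a real system would typically choose the threshold adaptively (e.g. with Otsu's method).

```python
def binarize(gray, threshold=128):
    """Map dark (ink) pixels to foreground 255 and light (paper) pixels
    to background 0, matching the pure-white-strokes convention above."""
    return [[255 if px < threshold else 0 for px in row] for row in gray]

# A tiny made-up 2x3 grayscale patch: one dark "stroke" column in the middle.
gray = [[250, 30, 251],
        [252, 25, 249]]
print(binarize(gray))  # [[0, 255, 0], [0, 255, 0]]
```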
S102: for each extracted and binarized text line, determine the vertical projection of each column of pixels in the line.
The vertical projection of a column of pixels is the sum of the pixel values of that column.
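The vertical projection just defined can be written very compactly; a sketch under the assumption that the binarized line is a list of equal-length pixel rows:

```python
def vertical_projection(binary):
    """Per-column sum of pixel values in a binarized text line
    (rows of 0/255 values)."""
    return [sum(col) for col in zip(*binary)]

line = [[0, 255, 0, 0],
        [0, 255, 255, 0]]
print(vertical_projection(line))  # [0, 510, 255, 0]
```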
S103: according to the vertical projections of the columns, determine the character block at which each character in the line is located.
When determining the height of the character block of a character, for the binarized text line, the topmost pixel with value 255 and the bottommost pixel with value 255 among all the strokes of the character are determined; the horizontal lines through these two pixels are taken as the upper and lower boundaries of the character block, as shown in Fig. 2B.
In the text line shown in Fig. 2B, take the character 电 ("electricity"): when determining the height of its character block, the horizontal line through the topmost 255-valued pixel of its strokes is taken as the upper boundary of the block, and the horizontal line through the bottommost 255-valued pixel as the lower boundary.
When determining the width of the character block of a character, note that there is usually a gap between adjacent characters. Ideally every pixel in such a gap has value 0; that is, the vertical projection of every column in the gap is 0. The left and right boundaries of a region whose vertical projection is 0 can therefore be taken as the right boundary of the character block on its left and the left boundary of the character block on its right, as shown in Fig. 2B.
In the text line shown in Fig. 2B, take the characters 电 and 话: since the vertical projection of every column in the gap between them is 0, the left boundary of this zero-projection region is taken as the right boundary of the block of 电, and its right boundary as the left boundary of the block of 话.
In this way, each character block can be determined from its upper/lower and left/right boundaries.
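The width rule of S103 — adjacent character blocks are separated by runs of zero projection — can be sketched like this. A simplification for illustration: a real pipeline would also apply the top/bottom boundary search described above to each block found.

```python
def split_columns(projection):
    """Return (left, right) column ranges of maximal runs whose vertical
    projection is nonzero; zero-projection gaps separate the blocks."""
    blocks, start = [], None
    for x, v in enumerate(projection):
        if v > 0 and start is None:
            start = x                      # a block begins
        elif v == 0 and start is not None:
            blocks.append((start, x - 1))  # the gap closes the block
            start = None
    if start is not None:                  # block running to the line's edge
        blocks.append((start, len(projection) - 1))
    return blocks

print(split_columns([0, 255, 510, 0, 0, 255, 0]))  # [(1, 2), (5, 5)]
```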
S104: recognize the image in each rectangular block as one character.
As can be seen, the first prior-art method determines character blocks mainly by vertical projection and then recognizes the character in each determined block.
However, Chinese characters have complex structures, and many of them have left-right or left-middle-right structures. For such characters there are often gaps between the radicals, and the method of Fig. 1 then tends to take the block containing a single radical as a character block, so that one radical of a character is recognized as a whole character; the recognition precision of the method of Fig. 1 is therefore low. For example, the character 树 ("tree") may be recognized by the method of Fig. 1 as 木, 又, 寸, or as 权, 寸, or as 木, 对.
The second method: since Chinese characters have a relatively fixed ratio of width to height, for an extracted text line the height of the characters in the line can be estimated. For each character in the line, the left boundary of the block containing it is determined; then, according to a preset width-to-height ratio, the maximum estimated right boundary of the block is determined (the distance from this maximum estimated right boundary to the left boundary is greater than the product of the character height and the preset ratio). The actual right boundary of the character block is then searched for leftward from the maximum estimated right boundary; once found, the character block is determined and the image within it is recognized as one character, as shown in Fig. 3.
Fig. 3 is a schematic diagram of the second prior-art character recognition method. In Fig. 3, the shaded regions are blocks possibly containing characters; the third shaded region is taken as the example. When determining a character block, the left boundary is determined first; suppose its coordinate is (X0, 0). Then, according to the preset width-to-height ratio, the maximum estimated right boundary of the block is determined; suppose its coordinate is (Xr, 0), so that Xr − X0 (the distance from the left boundary to the maximum estimated right boundary) is greater than the product of the block height and the preset ratio. The actual right boundary (X1, 0) is then searched for leftward from the maximum estimated right boundary; once it is found, the character block is determined.
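The boundary search of the second method can be sketched as below. The ratio of 1.2 and the scan-left rule (stop at the first column with nonzero projection) are illustrative assumptions; the text only states that the actual right boundary is searched for leftward from the maximum estimated one.

```python
def find_right_boundary(projection, left, height, max_ratio=1.2):
    """Estimate the farthest possible right boundary from the preset
    width-to-height ratio, then scan left until a stroke column
    (nonzero vertical projection) is met."""
    est_right = min(left + int(height * max_ratio), len(projection) - 1)
    right = est_right
    while right > left and projection[right] == 0:
        right -= 1
    return right

# A character occupying columns 2..6 of a 12-column line of height 5.
proj = [0, 0, 255, 510, 255, 255, 255, 0, 0, 0, 0, 0]
print(find_right_boundary(proj, left=2, height=5))  # 6
```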
As can be seen, the second prior-art method performs recognition based on the assumption that the aspect ratio (the ratio of width to height) of Chinese characters is relatively fixed. However, even if this is true of Chinese characters, practical scenes frequently mix large numbers of Chinese characters with other characters such as English letters and Arabic numerals, whose aspect ratios may not be fixed; the second method therefore also has low recognition precision when Chinese characters are mixed with other characters.
For example, suppose a text line contains the text "1日" ("the 1st"), a mixture of an Arabic numeral and a Chinese character. In this case the second method may wrongly recognize "1日" as the single character 旧 ("old").
Summary of the invention
The embodiments of the present application provide a character recognition method and device to solve the problem of low character recognition precision in the prior art.
A character recognition method provided by an embodiment of the present application comprises:
determining the connected components formed by the strokes in a text line, and determining the bounding rectangle of each connected component;
for each bounding rectangle, cutting the rectangle according to the pixel values of the pixels within it, to obtain blocks to be merged;
merging the blocks to be merged that meet a specified condition, according to the overlap between the blocks and a preset aspect-ratio range of a character block, to obtain blocks to be recognized;
selecting blocks to be recognized in turn as the start block, in left-to-right order;
for each block to be recognized after the start block, determining all blocks from the start block up to that block, merging the determined blocks into one candidate character block, recognizing the character in the candidate character block, and determining the confidence of the recognition;
determining the character with the highest recognition confidence as the character in the actual character block where the start block is located.
A character recognition device provided by an embodiment of the present application comprises:
a bounding rectangle determination module, which determines the connected components formed by the strokes in a text line and determines the bounding rectangle of each connected component;
a cutting module, which, for each bounding rectangle, cuts the rectangle according to the pixel values of the pixels within it, to obtain blocks to be merged;
a merging module, which merges the blocks to be merged that meet a specified condition, according to the overlap between the blocks and a preset aspect-ratio range of a character block, to obtain blocks to be recognized;
a confidence determination module, which selects blocks to be recognized in turn as the start block, in left-to-right order, and, for each block to be recognized after the start block, determines all blocks from the start block up to that block, merges the determined blocks into one candidate character block, recognizes the character in the candidate character block, and determines the confidence of the recognition;
a recognition determination module, which determines the character with the highest recognition confidence as the character in the actual character block where the start block is located.
The embodiments of the present application provide a character recognition method and device. The method determines the bounding rectangles of the connected components formed by the strokes in a text line, cuts each bounding rectangle according to its pixel values to obtain blocks to be merged, and merges these blocks according to their overlap and a preset aspect-ratio range to obtain blocks to be recognized. A start block is then selected; for each block to be recognized after the start block, all blocks from the start block up to that block are merged into one candidate character block, the character in the candidate character block is recognized, and the confidence of the recognition is determined; finally, the character with the highest confidence is determined as the character in the actual character block where the start block is located. This method avoids Chinese characters with left-right or left-middle-right structures being wrongly recognized as several different characters, and also effectively improves recognition precision in scenes where Chinese characters are mixed with other characters.
Brief description of the drawings
The accompanying drawings described here are provided for a further understanding of the present application and constitute a part of it; the schematic embodiments of the application and their descriptions are used to explain the application and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is the flow of the first prior-art character recognition method;
Fig. 2A is a schematic diagram of the first prior-art character recognition method, illustrated with a paper document;
Fig. 2B is a schematic diagram of a binarized text line;
Fig. 3 is a schematic diagram of the second prior-art character recognition method;
Fig. 4 is the character recognition flow provided by an embodiment of the present application;
Fig. 5 illustrates the overlap of bounding rectangles, using the character 对, as provided by an embodiment of the present application;
Fig. 6A is a schematic diagram of an application scene of local binarization provided by an embodiment of the present application;
Fig. 6B is a schematic diagram of applying only global binarization to the text line shown in Fig. 6A;
Fig. 6C is a schematic diagram of the text line shown in Fig. 6A after the combined global and local binarization provided by an embodiment of the present application;
Fig. 7 is a structural diagram of the character recognition device provided by an embodiment of the present application.
Detailed description of the embodiments
To avoid wrongly recognizing a Chinese character with a left-right or left-middle-right structure as several different characters, the embodiments of the present application first cut blocks and then merge the cut blocks according to their overlap and the preset aspect-ratio range of a character block, which effectively improves the recognition precision for characters of these structures. Further, to improve recognition precision in scenes where Chinese characters are mixed with other characters, the embodiments adopt a trial approach: blocks that might belong together are tentatively merged into one candidate character block, the character in the candidate block is recognized and the confidence of the recognition determined, and the character with the highest confidence is determined as the character in the actual character block.
To make the purposes, technical solutions and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments of the application and the corresponding drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the application without creative work fall within the protection scope of the application.
Fig. 4 shows the character recognition flow provided by an embodiment of the present application, which comprises the following steps:
S401: determine the connected components formed by the strokes in a text line, and determine the bounding rectangle of each connected component.
In practical application scenes, all characters, including Chinese characters, are composed of strokes, and the strokes forming one character often form one or more connected components. Therefore, in an embodiment of the present application, the connected components formed by the strokes in a text line can be determined, and then the bounding rectangle of each connected component.
Specifically, before determining the connected components formed by the strokes in a text line, the text line can first be extracted from an image (for example, a photograph of a document) and binarized. The method of extracting text lines from an image is outside the protection scope of the present application. The purpose of binarizing the text line is to take the strokes in the line as foreground and everything else as background, so as to separate them. For example, the pixels on strokes can be set to a preset foreground pixel value (e.g. 255, pure white) and all other pixels to a preset background pixel value (e.g. 0, pure black). Then, when determining the connected components formed by the strokes, it suffices to determine the connected components formed by the pixels whose value equals the foreground value, and finally to determine the bounding rectangle of each connected component.
It should be noted that, in the embodiments of the present application, the bounding rectangle determined for a connected component is not its minimum-area bounding rectangle, but the axis-aligned rectangle whose upper and lower boundaries are horizontal lines and whose left and right boundaries are vertical lines.
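A sketch of S401 under the foreground/background convention above: 4-connected components of foreground pixels are found by flood fill, and each is given the axis-aligned bounding rectangle the text describes (not a minimum-area rotated one). This is a minimal illustration, not the patent's implementation.

```python
from collections import deque

def bounding_rects(binary, fg=255):
    """(left, top, right, bottom) of each 4-connected foreground component."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    rects = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] != fg or seen[y][x]:
                continue
            # Flood-fill one component, tracking its extreme coordinates.
            q, seen[y][x] = deque([(y, x)]), True
            top = bottom = y
            left = right = x
            while q:
                cy, cx = q.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w \
                            and binary[ny][nx] == fg and not seen[ny][nx]:
                        seen[ny][nx] = True
                        q.append((ny, nx))
            rects.append((left, top, right, bottom))
    return rects

# Two separate strokes: a vertical bar at x=1 and a dot at (0, 3).
img = [[0, 255, 0, 255],
       [0, 255, 0, 0]]
print(bounding_rects(img))  # [(1, 0, 1, 1), (3, 0, 3, 0)]
```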
S402: for each bounding rectangle, cut the rectangle according to the pixel values of the pixels within it, to obtain blocks to be merged.
In the embodiments of the present application, it is considered that two adjacent characters in an image may be stuck together for some reason (for example, when handwritten, the strokes of adjacent characters may touch). Therefore, after the bounding rectangle of each connected component has been determined, each bounding rectangle can be cut according to the vertical projection of each column of pixels within it, yielding blocks to be merged. A block to be merged obtained at this point may contain all the strokes of one character, or only a radical of a Chinese character, such as the 木 in 树 ("tree").
S403: merge the blocks to be merged that meet a specified condition, according to the overlap between the blocks and the preset aspect-ratio range of a character block, to obtain blocks to be recognized.
In the embodiments of the present application, after each bounding rectangle has been cut into blocks to be merged, the blocks can be merged.
Specifically, when merging the blocks to be merged, two kinds of merging can be performed: merging blocks in a vertically stacked relation, and merging blocks in a left-right relation.
When merging vertically stacked blocks, it must first be determined, for two blocks to be merged, whether they are in a stacked relation. A concrete method is: judge whether the distance between the upper boundaries of the two blocks is greater than a first distance threshold, and whether the distance between their lower boundaries is greater than a second distance threshold; both thresholds can be set as required. Suppose the two blocks are a first block and a second block. Then:
If both judgments are affirmative, it is further judged whether the abscissa of the left boundary of the first block is less than the abscissa of the right boundary of the second block, or whether the abscissa of the right boundary of the first block is less than the abscissa of the left boundary of the second block. In the first case, it can then be judged whether the distance from the left boundary of the first block to the right boundary of the second block is greater than a third distance threshold; if so, the two blocks are determined to be in a stacked relation. In the second case, it can be judged whether the distance from the right boundary of the first block to the left boundary of the second block is greater than the third distance threshold; if so, the two blocks are likewise determined to be in a stacked relation. Once two blocks are determined to be in a stacked relation, they can be merged; specifically, the bounding rectangle that contains the connected components of both blocks is determined and taken as the merged block.
For example, take the character 广. In step S401, its upper dot 丶 and lower 厂 are determined as two connected components, giving two bounding rectangles, one for 丶 and one for 厂. Neither rectangle can be cut in step S402, so both become blocks to be merged. In step S403, for these two blocks, the distance between their upper boundaries is greater than the first distance threshold and the distance between their lower boundaries is greater than the second distance threshold; further, the left boundary of the rectangle of 丶 is less than the right boundary of the rectangle of 厂, and the distance from the left boundary of the rectangle of 丶 to the right boundary of the rectangle of 厂 is greater than the third distance threshold. The two blocks are therefore determined to be in a stacked relation and are merged: a bounding rectangle containing both connected components 丶 and 厂 is determined as the merged block.
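One possible reading of the stacked-relation test can be sketched as follows. Blocks are (left, top, right, bottom) tuples, and the three parameters stand for the first, second and third distance thresholds of the text. Because the machine-translated description leaves the exact comparisons ambiguous, this is only an interpretation: the top edges and bottom edges must be far enough apart (one block sits above the other) while the horizontal extents overlap by more than the third threshold.

```python
def is_stacked(a, b, t1=2, t2=2, t3=3):
    """a, b: (left, top, right, bottom). Interpretation of the patent's test:
    tops and bottoms sufficiently far apart, horizontal overlap above t3."""
    if abs(a[1] - b[1]) <= t1 or abs(a[3] - b[3]) <= t2:
        return False
    overlap = min(a[2], b[2]) - max(a[0], b[0])  # shared horizontal extent
    return overlap > t3

def merge(a, b):
    """Bounding rectangle containing both blocks, taken as the merged block."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

dot = (3, 0, 8, 2)    # rough stand-in for the top stroke of a character
base = (0, 4, 9, 12)  # rough stand-in for the part below it
if is_stacked(dot, base):
    print(merge(dot, base))  # (0, 0, 9, 12)
```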
When merging blocks in a left-right relation, the merging can be performed according to the overlap of two adjacent blocks to be merged and the preset aspect-ratio range of a character block. Specifically, since the rectangles determined in step S401 are axis-aligned bounding rectangles of connected components, for a Chinese character with a left-right or left-middle-right structure the bounding rectangles of different connected components may overlap; that is, the blocks obtained after cutting in step S402 may overlap.
For example, for the character 对, the left part 又 and the right part 寸 are two different connected components, but the bounding rectangles of these two components have an overlapping part, as shown in Fig. 5.
Fig. 5, provided by an embodiment of the present application, illustrates the overlap of bounding rectangles using the character 对. In Fig. 5, the left part 又 and the right part 寸 of 对 are two different connected components whose bounding rectangles overlap. When the bounding rectangles are cut in step S402, these two rectangles are not split; they proceed directly to subsequent processing as blocks to be merged, and therefore have an overlapping part.
Moreover, since Chinese characters have a relatively fixed aspect ratio, an aspect-ratio range for character blocks can be preset according to the characteristics of Chinese characters; this range is simply the aspect-ratio range of ordinary Chinese characters.
Two adjacent blocks to be merged can then be merged according to the size of their overlapping region and this preset aspect-ratio range. That is, for two adjacent blocks to be merged, the abscissa of the right boundary of the left block is determined as a first abscissa and the abscissa of the left boundary of the right block as a second abscissa. When the first abscissa is greater than the second abscissa, the difference between them is greater than a preset third threshold, and the aspect ratio of the block obtained by merging the two falls within the preset aspect-ratio range of a character block, the two blocks are determined to meet the specified condition and are merged. In other words, if two adjacent blocks to be merged overlap substantially and the aspect ratio of the merged block falls within the preset range, the two blocks are merged.
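The left-right merging condition can be sketched like this. The threshold of 2 pixels and the aspect-ratio range (0.8, 1.2) are illustrative assumptions, not values from the patent.

```python
def should_merge_lr(left_blk, right_blk, third_threshold=2,
                    ratio_range=(0.8, 1.2)):
    """left_blk, right_blk: (left, top, right, bottom), left_blk on the left.
    Merge when the first abscissa (right edge of the left block) exceeds the
    second abscissa (left edge of the right block) by more than the third
    threshold, AND the merged block's width-to-height ratio is in range."""
    first_x, second_x = left_blk[2], right_blk[0]
    if first_x - second_x <= third_threshold:
        return False
    merged_w = max(left_blk[2], right_blk[2]) - min(left_blk[0], right_blk[0])
    merged_h = max(left_blk[3], right_blk[3]) - min(left_blk[1], right_blk[1])
    lo, hi = ratio_range
    return lo <= merged_w / merged_h <= hi

# Two overlapping radical blocks that together look like one square character.
print(should_merge_lr((0, 0, 6, 10), (3, 0, 10, 10)))  # True
print(should_merge_lr((0, 0, 6, 10), (7, 0, 10, 10)))  # False (no overlap)
```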
In addition, when a first block to be merged lies entirely inside a second block to be merged (the first and second blocks being any two different blocks to be merged), the two can be merged directly, the merged result being the second block.
In this way, for most Chinese characters with left-right or left-middle-right structures, even if their radicals were cut into different blocks in step S402, these radicals are merged back into one block in the merging of step S403, as a block to be recognized.
It should be noted that, when merging blocks to be merged, the blocks can be processed in order (this order being the normal writing order, i.e., from left to right): for each block in turn, judge by the above conditions whether to merge; if a merge occurs, judge again whether the merged block can be further merged with the following block, until no further merge is possible; if no merge occurs, continue from the block after the current merged block. That is, the merging is recursive.
S404: select blocks to be recognized in turn as the start block, in order.
Here the order refers to the normal writing order of characters, that is, from left to right.
S405: for each block to be recognized after the start block, determine all blocks from the start block up to that block, merge them into one candidate character block, recognize the character in the candidate character block, and determine the confidence of the recognition.
In an embodiment of the present application, after the blocks to be recognized have been obtained in step S403 and a start block selected in step S404, for each block to be recognized after the start block, all blocks from the start block up to that block can be determined and tentatively merged into one candidate character block, which is then sent to a recognition engine to determine the confidence of the recognized character. The confidence of the character recognized from a candidate character block represents how credible it is to recognize the strokes in that block as that character. The specific method of determining recognition confidence is outside the protection scope of the present application.
S406: the character with the highest recognition confidence is determined as the character of the actual character block where the starting-point block is located.
After step S405 has tentatively merged the starting-point block with every possible run of following blocks to be identified into pending character blocks and determined the recognition confidences, the character with the highest confidence can be determined as the character of the actual character block where the starting-point block is located.
It should be noted that the confidence-based recognition of steps S404-S406 is also recursive. That is, after the character with the highest confidence has been recognized for the current starting-point block, the last block to be identified contained in the highest-confidence pending character block can be determined, and recognition then continues with the block to be identified immediately after that last block as the new starting-point block.
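The recursion of steps S404-S406 can be sketched as follows; `recognize` is a hypothetical stand-in for the recognition engine, returning a (character, confidence) pair:

```python
def recognize_line(blocks, recognize):
    """For each starting-point block, tentatively merge it with every run
    of following blocks, keep the highest-confidence reading, then restart
    from the block after the last block of the winning run (S404-S406)."""
    words = []
    start = 0
    while start < len(blocks):
        best_word, best_conf, best_end = None, -1.0, start
        for end in range(start, len(blocks)):
            candidate = blocks[start:end + 1]      # pending character block
            word, conf = recognize(candidate)
            if conf > best_conf:
                best_word, best_conf, best_end = word, conf, end
        words.append(best_word)
        start = best_end + 1   # next starting-point block follows the winner
    return words
```

For instance, with blocks for 口, 十 and 村, a recognizer that scores the merged 口+十 highest yields the reading 叶 followed by 村.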
The effect of the character recognition method provided by the embodiment of the present application is described below with a concrete example.
For example, suppose the characters in the text line are actually "绿叶村1日" ("Greenery Village, Day 1"). Then, through the above step S401, 9 connected domains can be determined, namely:
The connected domain of the upper part of the 纟 radical of "绿" (green), the part resembling "一";
The connected domain of the lower part of the 纟 radical of "绿", the rising "提" stroke;
The connected domain of "录" (the right part of "绿");
The connected domain of "叶" (leaf);
The connected domain of "木" (the left part of "村" (village));
The connected domain of the dot of "寸" (the right part of "村");
The connected domain of the part of "寸" other than the dot;
The connected domain of "1";
The connected domain of "日" (day).
Then, the bounding rectangles of these 9 connected domains are determined. Each of the 9 bounding rectangles has horizontal lines as its top and bottom boundaries and vertical lines as its left and right boundaries.
Through step S402, suppose that, according to the pixel values of each column of pixels in the bounding rectangle of "叶" (leaf), that bounding rectangle is cut into two blocks to be combined, one containing "口" and the other containing "十", and that none of the other bounding rectangles are cut, each serving directly as a block to be combined. There are then 10 blocks to be combined, namely:
The block to be combined of the upper part of the 纟 radical of "绿" (green), the part resembling "一";
The block to be combined of the lower part of the 纟 radical of "绿", the rising "提" stroke;
The block to be combined of "录";
The block to be combined of "口";
The block to be combined of "十";
The block to be combined of "木";
The block to be combined of the dot of "寸";
The block to be combined of the part of "寸" other than the dot;
The block to be combined of "1";
The block to be combined of "日" (day).
In step S403, the above 10 blocks to be combined are processed as follows.
First, blocks to be combined in an overlapping relation or a containment relation are merged.
Specifically, for the block to be combined of the upper part of the 纟 radical of "绿" (green) and the block of its lower part (the rising "提" stroke), it can be determined that the two blocks are in an overlapping relation, so they are merged directly, giving the block to be combined of 纟. For the block to be combined of the dot of "寸" and the block of the part of "寸" other than the dot, it is determined that the dot's block lies inside the other block, that is, the two are in a containment relation, so the two are merged into the block to be combined of "寸".
Next, blocks to be combined in a left-right relation are merged.
Specifically, for the block to be combined of 纟, it is determined that its overlapping region with the block of "录" is large, and that the width/height ratio after merging falls within the preset aspect-ratio range of a character block, so the 纟 block and the "录" block are merged, giving the block to be combined of "绿". For the block of "绿" and the block of "口", it is determined that the two have no overlapping region, so they are not merged further; a block to be identified is thus obtained, namely the block to be identified of "绿". Merging then continues from the block to be combined of "口".
For the block to be combined of "口", it is determined that it has no overlapping region with the following block of "十", so they are not merged, giving the block to be identified of "口". Merging continues with the block to be combined of "十".
For the block to be combined of "十", it is determined that it has no overlapping region with the following block of "木", so they are not merged, giving the block to be identified of "十". Merging continues with the block to be combined of "木".
For the block to be combined of "木", it is determined that its overlapping region with the following block of "寸" is large, and that the width/height ratio of the merged block falls within the preset aspect-ratio range of a character block, so the two blocks are merged into one, namely the block to be combined of "村" (village). Continuing with the block of "村", it is determined that it has no overlapping region with the following block of "1", so they are not merged, giving the block to be identified of "村"; merging continues with the block to be combined of "1".
For the block to be combined of "1", it is determined that it has no overlapping region with the following block of "日" (day), so they are not merged, giving the block to be identified of "1"; merging continues with the block to be combined of "日".
For the block to be combined of "日", there is no block to be combined after it, so the block to be identified of "日" is obtained.
At this point, the merging of step S403 ends, and 6 blocks to be identified are obtained in total, namely:
The block to be identified of "绿" (green);
The block to be identified of "口";
The block to be identified of "十";
The block to be identified of "村" (village);
The block to be identified of "1";
The block to be identified of "日" (day).
In step S404, in left-to-right order, the block to be identified of "绿" (green) is first selected as the starting-point block. First, this "绿" block (the starting-point block) itself is taken as a pending character block, recognized, and the confidence determined. Then the "绿" block and the following block of "口" are merged into one pending character block and recognized, and the confidence determined. Then the blocks of "绿", "口" and "十" are merged into one pending character block and recognized, and the confidence determined. And so on, until all blocks from the "绿" block to the block of "日" (day) have been merged into one pending character block, recognized, and the confidence determined. Obviously, the highest recognition confidence occurs when the "绿" block by itself serves as the pending character block; the character "绿" is therefore recognized.
Since the highest confidence occurs with the "绿" block by itself, the "绿" block is also the last block to be identified contained in the highest-confidence pending character block. Therefore the next block to be identified is the block of "口", which is taken as the new starting-point block.
Similarly, the "口" block itself is recognized as a pending character block and the confidence determined; then the "口" block and the "十" block are merged into one pending character block and recognized, and the confidence determined; then the blocks of "口", "十" and "村" (village) are merged into one pending character block and recognized; and so on, until all blocks from the "口" block to the "日" (day) block have been merged into one pending character block and the confidence determined. Because a standalone "口" and the "口" inside "叶" (leaf) always differ to some extent, the confidence of recognizing the "口" block by itself will, with high probability, be lower than the confidence of recognizing "口" and "十" merged into one pending character block. With high probability, therefore, the method provided by the embodiment of the present application recognizes the merged blocks of "口" and "十" as the character "叶". That is, the actual character block where the current starting-point block "口" is located is the character block of "叶".
Since the last block in the highest-confidence pending character block is the "十" block, the next block to be identified is the block of "村"; recognition therefore next proceeds with "村" as the starting-point block.
Recognition with "村" (village) as the starting-point block is similar to the above case with "绿" (green) as the starting-point block: the character "村" is recognized, and the details are not repeated here. Recognition then proceeds with the block to be identified of "1" as the starting-point block.
First, the "1" block itself is recognized as a pending character block and the confidence determined; then "1" and the following "日" (day) are merged into one pending character block, recognized, and the confidence determined. Since the digit "1" also differs slightly from the vertical stroke 丨 on the left of "旧" (old), the confidence of recognizing the "1" block by itself will, with high probability, be higher than that of recognizing "1" and "日" merged into one pending character block. Thus, with high probability, the method provided by the embodiment of the present application recognizes the "1" block as the digit "1" rather than merging "1" and "日" and misrecognizing them as "旧". Subsequently, the character "日" can be recognized; the details are not repeated here.
As can be seen from the above example, with the character recognition method provided by the embodiment of the present application, for left-right-structured characters such as "绿" (green) and "村" (village), the cut-then-merge approach effectively prevents individual radicals of these characters from being recognized as separate characters, and the same applies to Chinese characters of left-middle-right structure. For "叶" (leaf), since a standalone "口" and the "口" inside "叶" differ to some extent, and this difference is reflected in the recognition confidence, "叶" can be recognized accurately under the highest-confidence principle. For "1" and "日" (day), since the digit "1" differs slightly from the 丨 of "旧" (old) and this difference, too, is reflected in the confidence, "1" and "日" can likewise be recognized accurately under the highest-confidence principle rather than being merged and misrecognized as "旧". Therefore, the character recognition method provided by the embodiment of the present application effectively guarantees recognition accuracy in application scenarios involving Chinese characters of left-right and left-middle-right structure, and in scenarios where Chinese characters are mixed with digits and letters, effectively improving the precision of character recognition.
Further, when binarizing the text line before step S401, it is considered that in practical application scenarios, images of documents and the like may be affected by uneven illumination, shadows and so on when captured. Therefore, in order to distinguish the strokes in the text line extracted from the image from the other parts more accurately, the embodiment of the present application, when binarizing the text line, applies global binarization to the parts not affected by shadow and local binarization to the parts affected by shadow.
Specifically, the binarization method in the embodiment of the present application may be as follows. For each pixel to be processed in the text line, determine the difference obtained by subtracting a preset global threshold from the pixel value of that pixel. When the absolute value of the difference is greater than a preset first threshold and the difference is less than 0, set the pixel value of the pixel to the preset foreground pixel value; when the absolute value of the difference is greater than the first threshold and the difference is greater than 0, set the pixel value of the pixel to the preset background pixel value. When the absolute value of the difference is not greater than the first threshold, determine the pixel values of all pixels within a specified range around the pixel, find the maximum and minimum among them, and determine the mean of the maximum and minimum pixel values; when the pixel value of the pixel minus this mean is less than 0, set the pixel value of the pixel to the preset foreground pixel value, and when the pixel value of the pixel minus this mean is greater than 0, set it to the preset background pixel value. The global threshold can be set by the Otsu (OTSU) algorithm, as shown in Fig. 6A, Fig. 6B and Fig. 6C.
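A sketch of this combined global/local binarization, assuming grayscale input as a NumPy array; the window radius, the handling of a zero difference, and passing the global threshold in as a parameter (rather than computing it with the Otsu algorithm) are assumptions of this sketch:

```python
import numpy as np

def binarize(gray, global_thresh, t1, radius=4):
    """Combined global/local binarization (foreground = 255, background = 0).
    Pixels far from the global threshold are classified globally; pixels
    near it (|pixel - global_thresh| <= t1, e.g. under a shadow) are
    compared against the local threshold (max + min) / 2 of a window."""
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.uint8)
    g = gray.astype(np.int32)
    for y in range(h):
        for x in range(w):
            diff = g[y, x] - global_thresh
            if abs(diff) > t1:
                # Global decision: darker than threshold -> foreground.
                out[y, x] = 255 if diff < 0 else 0
            else:
                # Local threshold: mean of the max and min pixel values in
                # the surrounding window (a zero difference is treated as
                # background here -- an assumption).
                win = g[max(0, y - radius):y + radius + 1,
                        max(0, x - radius):x + radius + 1]
                local = (win.max() + win.min()) / 2
                out[y, x] = 255 if g[y, x] - local < 0 else 0
    return out
```

A dark stroke pixel well below the global threshold is kept as foreground directly, while a shadowed pixel close to the global threshold is decided by its local window.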
Fig. 6A is a schematic diagram of an application scenario of local binarization provided by the embodiment of the present application. In Fig. 6A, the characters in the text line extracted from the image are "绿叶村1日"; the left half of the text line is covered by a shadow while the right half is not, so that "绿" (green) and "叶" (leaf) in the text line lie within the shadow.
In practical application scenarios, the strokes of characters are generally darker in color, that is, the pixel values of stroke pixels are smaller, while the background of the characters (such as white paper) is lighter, that is, the pixel values of background pixels are larger. Therefore, the general principle of binarizing a text line is to compare the pixel value of each pixel with a threshold: if the pixel value is greater than the threshold, the pixel is treated as a background pixel and its value is set to the background pixel value, e.g. 0; otherwise, if the pixel value is less than the threshold, the pixel is treated as a foreground pixel, i.e. a stroke pixel, and its value is set to the foreground pixel value, e.g. 255.
In a scene with shadow occlusion, however, if only a global threshold set by the Otsu algorithm is used, the pixel values of all pixels in the shadowed part will be smaller than normal, causing the global threshold to become smaller. As a result, the pixel values of stroke pixels under the shadow will very likely be greater than the global threshold and be treated as background pixels, and in the worst case the binarized text line will be as shown in Fig. 6B.
Fig. 6B is a schematic diagram of applying only global binarization to the text line shown in Fig. 6A. Clearly, when only global binarization is applied to "绿" (green) and "叶" (leaf), which are under the shadow in Fig. 6A, the pixels of the strokes of these two characters are all greater than the global threshold and are therefore all treated as background pixels, with their values set to 0 (pure black). The text line obtained after binarization is thus as shown in Fig. 6B; comparing Fig. 6A with Fig. 6B, it can be seen intuitively that "绿" and "叶" have disappeared from the binarized text line in Fig. 6B.
To avoid the situation of Fig. 6B, the embodiment of the present application combines global and local binarization. When binarizing the text line shown in Fig. 6A, a global threshold is likewise first set by the Otsu algorithm.
For a pixel to be processed that is not under the shadow: if it is a stroke pixel, its pixel value will be less than the global threshold, and the absolute value of the difference between the pixel value and the global threshold will generally be large (i.e., greater than the first threshold); if it is a background pixel, its pixel value will be greater than the global threshold, and the absolute value of the difference will generally also be large (i.e., greater than the first threshold). Therefore, when the absolute value of the difference is greater than the first threshold, the pixel can be binarized directly according to the magnitude relation between its pixel value and the global threshold.
For a pixel to be processed that is under the shadow, the absolute value of the difference between its pixel value and the global threshold will be small (i.e., not greater than the first threshold). In this case, the pixel values of all pixels within a specified range around the pixel (e.g., within a radius of 3 to 5 pixels) can be determined, the maximum and minimum pixel values found among them, and the mean of the maximum and minimum taken as a local threshold; the pixel is then binarized according to the magnitude relation between its pixel value and the local threshold, as shown in Fig. 6C.
Fig. 6C is a schematic diagram of the text line of Fig. 6A processed by the combined global and local binarization provided by the embodiment of the present application. Comparing Fig. 6C with Fig. 6A, it can be seen that "绿" (green) and "叶" (leaf), which are under the shadow in Fig. 6A, remain clear and intact in the text line shown in Fig. 6C.
Further, in step S402 shown in Fig. 4, for each bounding rectangle, when cutting the rectangle according to the pixel values of its pixels, the vertical projection of each column of pixels in the binarized bounding rectangle can be determined, and the minimum among these vertical projections found. The distance from the column of pixels with the minimum vertical projection to the left or right boundary of the rectangle is determined, as is the ratio of this distance to the width of the rectangle. When the minimum vertical projection is less than a preset second threshold and the ratio falls within a preset ratio range, the rectangle is cut using the column of pixels with the minimum vertical projection as the dividing line. Here, the vertical projection of a column of pixels is the sum of the pixel values of that column.
Specifically, since in the embodiment of the present application the pixel values of foreground (stroke) pixels are set to the foreground pixel value 255 (pure white) and the pixel values of background pixels are set to 0 (pure black) when the text line is binarized, a column of pixels with a very small vertical projection is likely to be the gap between two characters or between different radicals of one character. However, if cutting the rectangle at that column yields a rectangle that is too wide or too narrow, the bounding rectangle should in fact not be split. Therefore, for a given bounding rectangle, only when the minimum vertical projection among its columns is less than the second threshold, and the ratio of the width of a rectangle obtained by cutting at that column to the original width of the bounding rectangle falls within the preset ratio range (e.g., 1/3 to 1/2), is the rectangle cut using that column as the dividing line.
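The cutting test just described can be sketched as follows; treating the dividing column itself as excluded from both halves, and measuring the distance to the nearer boundary, are assumptions of this sketch:

```python
def try_cut(rect, t2, ratio_range=(1 / 3, 1 / 2)):
    """Cut a bounding rectangle at its weakest column (step S402).
    rect: 2-D list of binarized pixel values (foreground = 255).
    Returns (left, right) halves, or None when the cut conditions fail."""
    width = len(rect[0])
    # Vertical projection of each column: sum of its pixel values.
    proj = [sum(row[x] for row in rect) for x in range(width)]
    x_min = proj.index(min(proj))
    dist = min(x_min, width - 1 - x_min)   # distance to the nearer boundary
    ratio = dist / width
    if proj[x_min] < t2 and ratio_range[0] <= ratio <= ratio_range[1]:
        left = [row[:x_min] for row in rect]
        right = [row[x_min + 1:] for row in rect]
        return left, right
    return None
```

A blank column near the middle of the rectangle produces a cut, while a weak column too close to a boundary (which would yield an over-narrow piece) is rejected.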
In addition, during cutting, after a bounding rectangle has been cut, whether each of the two resulting rectangles can be cut further is judged according to the above two conditions; if so, cutting continues, otherwise it stops, until no bounding rectangle can be cut any further. Each rectangle after cutting then serves as a block to be combined, and the subsequent steps are performed.
Further, in step S405 shown in Fig. 4, if, after the starting-point block has been selected, all blocks from the starting-point block to every block to be identified after it were merged into pending character blocks and recognized, the efficiency of character recognition would be low. Therefore, to improve the efficiency of character recognition, in the embodiment of the present application, before the tentative recognition of step S405 is carried out to determine confidences, the gap between every two adjacent blocks to be identified can be determined; from the gaps between every two adjacent blocks to be identified, an actual-gap estimate is determined; from the heights of the blocks to be identified, an actual-height estimate is determined; and from the actual-gap estimate and the actual-height estimate, the maximum merging position corresponding to the starting-point block is determined, the maximum merging position being located after the starting-point block. Here, the absolute value of the difference between the gap between every two adjacent blocks from the starting-point block to the maximum merging position and the actual-gap estimate is not greater than a preset fourth threshold; and, for each block to be identified between the starting-point block and the maximum merging position, the width/height ratio of the block obtained by merging all blocks from the starting-point block to that block falls within the preset aspect-ratio range of a character block.
Then, when the tentative recognition of step S405 is carried out, it suffices, for each block to be identified between the starting-point block and the maximum merging position, to determine all blocks from the starting-point block to that block, merge all the determined blocks into one pending character block, recognize the character in the pending character block, and determine the recognition confidence. That is, the efficiency of character recognition is improved by reducing the number of tentative recognitions.
The description continues with the example where the characters in the text line are "绿叶村1日".
After the 6 blocks to be identified of "绿" (green), "口", "十", "村" (village), "1" and "日" (day) have been obtained, the actual-gap estimate can be determined from the gaps between every two adjacent blocks among these 6 blocks. Specifically, the mean of the gaps between every two adjacent blocks may be taken as the actual-gap estimate; alternatively, the gaps between every two adjacent blocks may be sorted in ascending or descending order and the middle gap after sorting taken as the actual-gap estimate.
Similarly, the actual-height estimate can be determined from the heights of these 6 blocks. Specifically, the mean of the heights of the 6 blocks may be taken as the actual-height estimate, or the blocks may be sorted by height in ascending or descending order and the middle height after sorting taken as the actual-height estimate.
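Both estimates described above (the mean, or the middle value after sorting) can be sketched with one helper:

```python
def estimate(values, method="median"):
    """Estimate a typical gap or height from per-block measurements,
    either as the mean or as the middle value after sorting, as
    described for the actual-gap and actual-height estimates."""
    if method == "mean":
        return sum(values) / len(values)
    ordered = sorted(values)
    return ordered[len(ordered) // 2]   # middle element after sorting
```

For 6 blocks there are 5 gaps, so the sorted-middle variant simply picks the third-smallest gap.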
Suppose the block to be identified of "绿" (green) is selected as the starting-point block. Then the maximum merging position corresponding to the "绿" block can be determined according to the following two conditions:
First condition: the absolute value of the difference between the gap between every two adjacent blocks from the "绿" block to the maximum merging position and the above actual-gap estimate is not greater than the preset fourth threshold (that is, the gaps between every two adjacent blocks from the "绿" block to the maximum merging position differ little from the actual-gap estimate determined above);
Second condition: for each block to be identified between the "绿" block and the maximum merging position, the width/height ratio of the block obtained by merging all blocks from "绿" to that block falls within the preset aspect-ratio range of a character block.
Suppose the maximum merging position of "绿" determined by the above two conditions lies between "口" and "十" (that is, the absolute value of the difference between the gap between "绿" and "口" and the actual-gap estimate is not greater than the preset fourth threshold, and the width/height ratio of the block obtained by merging "绿" and "口" is within the preset aspect-ratio range of a character block, but the width/height ratio of the block obtained by merging "绿", "口" and "十" would exceed the preset range). Then in step S405 only "绿" itself needs to be recognized and its confidence determined, and "绿" and "口" merged, recognized and the confidence determined; there is no need to tentatively recognize every block to be identified after "口" one by one (that is, no need to merge and recognize "绿", "口" and "十", nor "绿", "口", "十" and "村" (village), and so on), which effectively improves the efficiency of character recognition.
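A sketch of determining the maximum merging position under the two conditions above; representing blocks as (width, height) pairs and taking the merged height as the maximum constituent height are assumptions of this sketch:

```python
def max_merge_position(blocks, gaps, start, gap_est, t4, ar_range):
    """Find the furthest block index that may be merged with the
    starting-point block (the 'maximum merging position'): every gap
    along the way must stay within t4 of the gap estimate, and the
    cumulative width/height ratio of the merged block must stay inside
    the preset range. blocks: (width, height); gaps[i]: gap i..i+1."""
    pos = start
    width = blocks[start][0]
    for i in range(start, len(blocks) - 1):
        if abs(gaps[i] - gap_est) > t4:
            break                      # first condition violated
        width += gaps[i] + blocks[i + 1][0]
        height = max(b[1] for b in blocks[start:i + 2])
        if not (ar_range[0] <= width / height <= ar_range[1]):
            break                      # second condition violated
        pos = i + 1
    return pos
```

With this, the tentative recognition of step S405 only needs to try merged runs up to `pos`, rather than all the way to the end of the line.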
The above is the character recognition method provided by the embodiment of the present application. Based on the same idea, the embodiment of the present application further provides a character recognition device, as shown in Fig. 7.
Fig. 7 is a schematic structural diagram of the character recognition device provided by the embodiment of the present application, which specifically comprises:
a bounding rectangle determination module 701, which determines the connected domains formed by the strokes in a text line and determines the bounding rectangle of each connected domain;
a cutting module 702, which, for each bounding rectangle, cuts the rectangle according to the pixel values of the pixels in it, obtaining blocks to be combined;
a merging module 703, which, according to the overlapping regions of the blocks to be combined and the preset aspect-ratio range of a character block, merges the blocks to be combined that satisfy a specified condition, obtaining blocks to be identified;
a confidence determination module 704, which, in the arrangement order of the blocks to be identified, selects each block to be identified in turn as the starting-point block, and, for each block to be identified located after the starting-point block, determines all blocks from the starting-point block to that block, merges all the determined blocks into one pending character block, recognizes the character in the pending character block, and determines the recognition confidence;
a recognition determination module 705, which determines the character with the highest recognition confidence as the character of the actual character block where the starting-point block is located.
The device further comprises:
a binarization module 706, which, before the bounding rectangle determination module 701 determines the connected domains formed by the strokes in the text line, extracts the text line and binarizes it.
The binarization module 706 is specifically configured to: for each pixel to be processed in the text line, determine the difference obtained by subtracting the preset global threshold from the pixel value of the pixel; when the absolute value of the difference is greater than the preset first threshold and the difference is less than 0, set the pixel value of the pixel to the preset foreground pixel value; when the absolute value of the difference is greater than the preset first threshold and the difference is greater than 0, set the pixel value of the pixel to the preset background pixel value; and when the absolute value of the difference is not greater than the preset first threshold, determine the pixel values of all pixels within a specified range around the pixel, determine the maximum and minimum pixel values among them, determine the mean of the maximum and minimum pixel values, and, when the pixel value of the pixel minus the mean is less than 0, set the pixel value of the pixel to the preset foreground pixel value, and when the pixel value of the pixel minus the mean is greater than 0, set it to the preset background pixel value.
The cutting module 702 is specifically configured to: determine the vertical projection of each column of pixels in the binarized bounding rectangle and determine the minimum among these vertical projections; determine the distance from the column of pixels with the minimum vertical projection to the left or right boundary of the rectangle; determine the ratio of the distance to the width of the rectangle; and, when the minimum vertical projection is less than the preset second threshold and the ratio falls within the preset ratio range, cut the rectangle using the column of pixels with the minimum vertical projection as the dividing line.
The merging module 703 is specifically configured to: for two adjacent blocks to be combined, determine the abscissa of the right boundary of the left block as the first abscissa and the abscissa of the left boundary of the right block as the second abscissa; and, when the first abscissa is greater than the second abscissa, the difference of the first abscissa minus the second abscissa is greater than a preset third threshold, and the width/height ratio of the block obtained by merging the two blocks falls within the preset aspect-ratio range of a character block, determine that the two blocks satisfy the specified condition and merge them.
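The specified condition checked by the merging module can be sketched as follows, with blocks represented as (x_left, y_top, x_right, y_bottom) rectangles (an illustrative representation, not from the specification):

```python
def should_merge(left, right, t3, ar_range):
    """Specified condition of merging module 703: the two blocks overlap
    horizontally by more than t3, and the merged block's width/height
    ratio falls inside the preset character-block aspect-ratio range."""
    x1 = left[2]    # first abscissa: right boundary of the left block
    x2 = right[0]   # second abscissa: left boundary of the right block
    if x1 <= x2 or x1 - x2 <= t3:
        return False
    # Width and height of the merged (bounding) block.
    width = max(left[2], right[2]) - min(left[0], right[0])
    height = max(left[3], right[3]) - min(left[1], right[1])
    return ar_range[0] <= width / height <= ar_range[1]
```

Two radical blocks that interleave horizontally (as 木 and 寸 do inside 村) pass the test, while clearly separated character blocks do not.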
The device further comprises:
Statistical module 707 is configured to, before the confidence determination module 704 determines, for each block to be identified located after the starting block, all blocks from the starting block to that block to be identified: determine the gap between every two adjacent blocks to be identified; determine an estimated actual gap according to the gaps between every two adjacent blocks to be identified; determine an estimated actual height according to the heights of the blocks to be identified; and determine, according to the estimated actual gap and the estimated actual height, the maximum merging position corresponding to the starting block, the maximum merging position being located after the starting block; wherein the absolute value of the difference between the gap between every two adjacent blocks from the starting block to the maximum merging position and the estimated actual gap is not greater than a preset fourth threshold, and, for each block to be identified between the starting block and the maximum merging position, the aspect ratio of the block obtained by merging all blocks from the starting block to that block to be identified falls within the aspect ratio range.
The confidence determination module 704 is specifically configured to, for each block to be identified between the starting block and the maximum merging position, determine all blocks from the starting block to that block to be identified.
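The gap/height statistics and the maximum merging position can be sketched as follows. The patent does not fix the estimators, so the median is used here as one plausible choice, and the estimated height is used as the merged block's height as a simplification; all parameter values are illustrative.

```python
import statistics

def max_merge_position(blocks, start, fourth_thresh=4, aspect_range=(0.5, 1.5)):
    """Find the farthest block index that may still be merged with the
    starting block. blocks are (x0, y0, x1, y1) tuples sorted left to right."""
    # Estimated actual gap and height: the median is one plausible estimator.
    gaps = [blocks[i + 1][0] - blocks[i][2] for i in range(len(blocks) - 1)]
    gap_est = statistics.median(gaps)
    height_est = statistics.median(b[3] - b[1] for b in blocks)
    pos = start
    for i in range(start + 1, len(blocks)):
        # The gap to the next block must stay close to the estimated gap.
        if abs(gaps[i - 1] - gap_est) > fourth_thresh:
            break
        # The merged block's aspect ratio must stay in the character range
        # (the estimated height stands in for its height, as a simplification).
        width = blocks[i][2] - blocks[start][0]
        if not (aspect_range[0] <= width / height_est <= aspect_range[1]):
            break
        pos = i
    return pos
```

Capping the merge at this position keeps the confidence module from recognizing candidate blocks that could never be a single character, which trims the search considerably.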
An embodiment of the present application provides a character recognition method and device. The method determines the bounding rectangles of the connected domains formed by the strokes in a text line, cuts each bounding rectangle according to the pixel values of the pixels in it to obtain blocks to be merged, and then merges the blocks to be merged according to their overlapping regions and a preset aspect ratio range to obtain blocks to be identified. A starting block is selected from the blocks to be identified; for each block to be identified located after the starting block, all blocks from the starting block to that block are merged into a candidate character block, the character in the candidate character block is identified, and the confidence of the identification is determined; finally, the character with the highest confidence is determined as the character in the actual character block where the starting block is located. This method avoids mistakenly recognizing a Chinese character with a left-right or left-middle-right structure as multiple different characters, and effectively improves the precision of character recognition in scenes where Chinese characters are mixed with other characters.
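The overall per-line loop described above (select a starting block, merge it with progressively more following blocks, keep the highest-confidence recognition) can be sketched as follows. `recognize` stands in for any single-character recognizer returning a (character, confidence) pair; the greedy consumption of blocks and the fixed merge limit are simplifying assumptions, not the patent's exact control flow.

```python
def recognize_line(blocks, recognize, max_merge=3):
    """Greedy sketch of the per-line loop: for each starting block, try
    merging it with up to max_merge following blocks, recognize each
    candidate character block, and keep the highest-confidence result."""
    result, i = [], 0
    while i < len(blocks):
        best_char, best_conf, best_end = None, -1.0, i
        for j in range(i, min(i + max_merge, len(blocks))):
            # Candidate character block: blocks i..j merged into one.
            char, conf = recognize(blocks[i:j + 1])
            if conf > best_conf:
                best_char, best_conf, best_end = char, conf, j
        result.append(best_char)
        i = best_end + 1  # continue after the consumed blocks
    return result
```

With a recognizer that scores "木" + "目" merged as the single character "相" higher than either piece alone, this loop reproduces the left-right-structure behavior the paragraph describes.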
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that comprises a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device that comprises the element.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The foregoing are merely embodiments of the present application and are not intended to limit the present application. To those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (12)

1. A character recognition method, characterized in that it comprises:
determining connected domains formed by the strokes in a text line, and determining a bounding rectangle of each connected domain;
for each bounding rectangle, cutting the bounding rectangle according to the pixel values of the pixels in the bounding rectangle to obtain blocks to be merged;
merging the blocks to be merged that satisfy a specified condition according to the overlapping regions of the blocks to be merged and a preset aspect ratio range of a character block, to obtain blocks to be identified;
selecting blocks to be identified in turn as starting blocks according to the sequential order of the blocks to be identified;
for each block to be identified located after the starting block, determining all blocks from the starting block to that block to be identified, merging all the determined blocks into a candidate character block, identifying the character in the candidate character block, and determining the confidence of the identification; and
determining the character with the highest identification confidence as the character in the actual character block where the starting block is located.
2. The method according to claim 1, characterized in that, before determining the connected domains formed by the strokes in the text line, the method further comprises:
extracting the text line; and
performing binarization processing on the text line.
3. The method according to claim 2, characterized in that performing binarization processing on the text line specifically comprises:
for each pixel to be processed in the text line, determining the difference obtained by subtracting a preset global threshold from the pixel value of the pixel;
when the absolute value of the difference is greater than a preset first threshold and the difference is less than 0, setting the pixel value of the pixel to a preset foreground pixel value;
when the absolute value of the difference is greater than the preset first threshold and the difference is greater than 0, setting the pixel value of the pixel to a preset background pixel value; and
when the absolute value of the difference is not greater than the preset first threshold, determining the pixel values of all pixels within a specified range around the pixel, determining the maximum pixel value and the minimum pixel value among the determined pixel values, and determining the average of the maximum pixel value and the minimum pixel value; and, when the pixel value of the pixel minus the average is less than 0, setting the pixel value of the pixel to the preset foreground pixel value, and when the pixel value of the pixel minus the average is greater than 0, setting the pixel value of the pixel to the preset background pixel value.
4. The method according to claim 2, characterized in that cutting the bounding rectangle according to the pixel values of the pixels in the bounding rectangle specifically comprises:
determining the vertical projection of each column of pixels in the binarized bounding rectangle, and determining the minimum vertical projection among the vertical projections of the columns;
determining the distance from the column of pixels at the minimum vertical projection to the left or right boundary of the bounding rectangle;
determining the ratio of the distance to the width of the bounding rectangle; and
when the minimum vertical projection is less than a preset second threshold and the ratio falls within a preset ratio range, cutting the bounding rectangle using the column of pixels at the minimum vertical projection as the segmentation line.
5. The method according to claim 1, characterized in that merging the blocks to be merged that satisfy the specified condition according to the overlapping regions of the blocks to be merged and the preset aspect ratio range of a character block specifically comprises:
for two adjacent blocks to be merged, determining the abscissa of the right boundary of the left block as a first abscissa, and determining the abscissa of the left boundary of the right block as a second abscissa; and
when the first abscissa is greater than the second abscissa, the difference of the first abscissa minus the second abscissa is greater than a preset third threshold, and the aspect ratio of the block obtained by merging the two blocks falls within the preset aspect ratio range of a character block, determining that the two blocks satisfy the specified condition and merging the two blocks.
6. The method according to claim 1, characterized in that, for each block to be identified located after the starting block, before determining all blocks from the starting block to that block to be identified, the method further comprises:
determining the gap between every two adjacent blocks to be identified;
determining an estimated actual gap according to the gaps between every two adjacent blocks to be identified;
determining an estimated actual height according to the heights of the blocks to be identified; and
determining, according to the estimated actual gap and the estimated actual height, a maximum merging position corresponding to the starting block, the maximum merging position being located after the starting block; wherein the absolute value of the difference between the gap between every two adjacent blocks from the starting block to the maximum merging position and the estimated actual gap is not greater than a preset fourth threshold, and, for each block to be identified between the starting block and the maximum merging position, the aspect ratio of the block obtained by merging all blocks from the starting block to that block to be identified falls within the aspect ratio range;
and wherein, for each block to be identified located after the starting block, determining all blocks from the starting block to that block to be identified specifically comprises:
for each block to be identified between the starting block and the maximum merging position, determining all blocks from the starting block to that block to be identified.
7. A character recognition device, characterized in that it comprises:
a bounding rectangle determination module, configured to determine connected domains formed by the strokes in a text line and determine the bounding rectangle of each connected domain;
a cutting module, configured to, for each bounding rectangle, cut the bounding rectangle according to the pixel values of the pixels in the bounding rectangle to obtain blocks to be merged;
a merging module, configured to merge the blocks to be merged that satisfy a specified condition according to the overlapping regions of the blocks to be merged and a preset aspect ratio range of a character block, to obtain blocks to be identified;
a confidence determination module, configured to select blocks to be identified in turn as starting blocks according to the sequential order of the blocks to be identified, and, for each block to be identified located after the starting block, determine all blocks from the starting block to that block to be identified, merge all the determined blocks into a candidate character block, identify the character in the candidate character block, and determine the confidence of the identification; and
an identification determination module, configured to determine the character with the highest identification confidence as the character in the actual character block where the starting block is located.
8. The device according to claim 7, characterized in that the device further comprises:
a binarization module, configured to extract the text line and perform binarization processing on the text line before the bounding rectangle determination module determines the connected domains formed by the strokes in the text line.
9. The device according to claim 8, characterized in that the binarization module is specifically configured to: for each pixel to be processed in the text line, determine the difference obtained by subtracting a preset global threshold from the pixel value of the pixel; when the absolute value of the difference is greater than a preset first threshold and the difference is less than 0, set the pixel value of the pixel to a preset foreground pixel value; when the absolute value of the difference is greater than the preset first threshold and the difference is greater than 0, set the pixel value of the pixel to a preset background pixel value; and when the absolute value of the difference is not greater than the preset first threshold, determine the pixel values of all pixels within a specified range around the pixel, determine the maximum pixel value and the minimum pixel value among the determined pixel values, determine the average of the maximum pixel value and the minimum pixel value, set the pixel value of the pixel to the preset foreground pixel value when the pixel value of the pixel minus the average is less than 0, and set the pixel value of the pixel to the preset background pixel value when the pixel value of the pixel minus the average is greater than 0.
10. The device according to claim 8, characterized in that the cutting module is specifically configured to: determine the vertical projection of each column of pixels in the binarized bounding rectangle, and determine the minimum vertical projection among the vertical projections of the columns; determine the distance from the column of pixels at the minimum vertical projection to the left or right boundary of the bounding rectangle; determine the ratio of the distance to the width of the bounding rectangle; and, when the minimum vertical projection is less than a preset second threshold and the ratio falls within a preset ratio range, cut the bounding rectangle using the column of pixels at the minimum vertical projection as the segmentation line.
11. The device according to claim 7, characterized in that the merging module is specifically configured to: for two adjacent blocks to be merged, determine the abscissa of the right boundary of the left block as a first abscissa, and determine the abscissa of the left boundary of the right block as a second abscissa; and, when the first abscissa is greater than the second abscissa, the difference of the first abscissa minus the second abscissa is greater than a preset third threshold, and the aspect ratio of the block obtained by merging the two blocks falls within the preset aspect ratio range of a character block, determine that the two blocks satisfy the specified condition and merge the two blocks.
12. The device according to claim 7, characterized in that the device further comprises:
a statistical module, configured to, before the confidence determination module determines, for each block to be identified located after the starting block, all blocks from the starting block to that block to be identified: determine the gap between every two adjacent blocks to be identified; determine an estimated actual gap according to the gaps between every two adjacent blocks to be identified; determine an estimated actual height according to the heights of the blocks to be identified; and determine, according to the estimated actual gap and the estimated actual height, the maximum merging position corresponding to the starting block, the maximum merging position being located after the starting block; wherein the absolute value of the difference between the gap between every two adjacent blocks from the starting block to the maximum merging position and the estimated actual gap is not greater than a preset fourth threshold, and, for each block to be identified between the starting block and the maximum merging position, the aspect ratio of the block obtained by merging all blocks from the starting block to that block to be identified falls within the aspect ratio range;
and wherein the confidence determination module is specifically configured to, for each block to be identified between the starting block and the maximum merging position, determine all blocks from the starting block to that block to be identified.
CN201410127438.6A 2014-03-31 2014-03-31 Character recognition method and device thereof Pending CN104951741A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410127438.6A CN104951741A (en) 2014-03-31 2014-03-31 Character recognition method and device thereof
HK15111978.6A HK1211363A1 (en) 2014-03-31 2015-12-04 Method and apparatus for character recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410127438.6A CN104951741A (en) 2014-03-31 2014-03-31 Character recognition method and device thereof

Publications (1)

Publication Number Publication Date
CN104951741A true CN104951741A (en) 2015-09-30

Family

ID=54166387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410127438.6A Pending CN104951741A (en) 2014-03-31 2014-03-31 Character recognition method and device thereof

Country Status (2)

Country Link
CN (1) CN104951741A (en)
HK (1) HK1211363A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020181777A1 (en) * 2001-05-30 2002-12-05 International Business Machines Corporation Image processing method, image processing system and program
CN1588431A (en) * 2004-07-02 2005-03-02 清华大学 Character extracting method from complecate background color image based on run-length adjacent map
CN102063611A (en) * 2010-01-21 2011-05-18 汉王科技股份有限公司 Method and system for inputting characters
CN102169542A (en) * 2010-02-25 2011-08-31 汉王科技股份有限公司 Method and device for touching character segmentation in character recognition
CN103577818A (en) * 2012-08-07 2014-02-12 北京百度网讯科技有限公司 Method and device for recognizing image characters


Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105631450A (en) * 2015-12-28 2016-06-01 小米科技有限责任公司 Character identifying method and device
KR102012819B1 (en) * 2016-01-05 2019-08-21 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 Text image processing method and device
CN106940799A (en) * 2016-01-05 2017-07-11 腾讯科技(深圳)有限公司 Method for processing text images and device
WO2017118356A1 (en) * 2016-01-05 2017-07-13 腾讯科技(深圳)有限公司 Text image processing method and apparatus
CN106940799B (en) * 2016-01-05 2020-07-24 腾讯科技(深圳)有限公司 Text image processing method and device
KR20170137170A (en) * 2016-01-05 2017-12-12 텐센트 테크놀로지(센젠) 컴퍼니 리미티드 Method and apparatus for text image processing
US20180053048A1 (en) * 2016-01-05 2018-02-22 Tencent Technology (Shenzhen) Company Limited Text image processing method and apparatus
US10572728B2 (en) * 2016-01-05 2020-02-25 Tencent Technology (Shenzhen) Company Limited Text image processing method and apparatus
EP3401842A4 (en) * 2016-01-05 2019-08-28 Tencent Technology (Shenzhen) Company Limited Text image processing method and apparatus
CN105740860B (en) * 2016-01-28 2018-04-06 河南大学 Retail shop's label Chinese character region automatic testing method in natural scene
CN107784316A (en) * 2016-08-26 2018-03-09 阿里巴巴集团控股有限公司 A kind of image-recognizing method, device, system and computing device
CN108629238A (en) * 2017-03-21 2018-10-09 高德软件有限公司 A kind of method and apparatus of identification Chinese character label
CN108629238B (en) * 2017-03-21 2020-07-10 阿里巴巴(中国)有限公司 Method and device for identifying Chinese character mark
CN106951893A (en) * 2017-05-08 2017-07-14 奇酷互联网络科技(深圳)有限公司 Text information acquisition methods, device and mobile terminal
CN110135417A (en) * 2018-02-09 2019-08-16 北京世纪好未来教育科技有限公司 Sample mask method and computer storage medium
CN110135426A (en) * 2018-02-09 2019-08-16 北京世纪好未来教育科技有限公司 Sample mask method and computer storage medium
CN110135425A (en) * 2018-02-09 2019-08-16 北京世纪好未来教育科技有限公司 Sample mask method and computer storage medium
CN111630521A (en) * 2018-02-28 2020-09-04 佳能欧洲股份有限公司 Image processing method and image processing system
CN109389150A (en) * 2018-08-28 2019-02-26 东软集团股份有限公司 Image consistency comparison method, device, storage medium and electronic equipment
CN109558876A (en) * 2018-11-20 2019-04-02 浙江口碑网络技术有限公司 Character recognition processing method and device
CN109558876B (en) * 2018-11-20 2021-11-16 浙江口碑网络技术有限公司 Character recognition processing method and device
CN113538450A (en) * 2020-04-21 2021-10-22 百度在线网络技术(北京)有限公司 Method and device for generating image
US11810333B2 (en) 2020-04-21 2023-11-07 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating image of webpage content
CN111968104A (en) * 2020-08-27 2020-11-20 中冶赛迪重庆信息技术有限公司 Machine vision-based steel coil abnormity identification method, system, equipment and medium
CN111967460B (en) * 2020-10-23 2021-02-23 北京易真学思教育科技有限公司 Text detection method and device, electronic equipment and computer storage medium
CN111967460A (en) * 2020-10-23 2020-11-20 北京易真学思教育科技有限公司 Text detection method and device, electronic equipment and computer storage medium
CN115995080A (en) * 2023-03-22 2023-04-21 曲阜市检验检测中心 Archive intelligent management system based on OCR (optical character recognition)
CN115995080B (en) * 2023-03-22 2023-06-02 曲阜市检验检测中心 Archive intelligent management system based on OCR (optical character recognition)
CN117612172A (en) * 2024-01-24 2024-02-27 成都医星科技有限公司 Desensitization position locating and desensitization method and device, electronic equipment and storage medium
CN117612172B (en) * 2024-01-24 2024-03-19 成都医星科技有限公司 Desensitization position locating and desensitization method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
HK1211363A1 (en) 2016-05-20

Similar Documents

Publication Publication Date Title
CN104951741A (en) Character recognition method and device thereof
WO2020140698A1 (en) Table data acquisition method and apparatus, and server
KR102492369B1 (en) Binarization and normalization-based inpainting for text removal
US11087168B2 (en) Method and apparatus for positioning text over image, electronic apparatus, and storage medium
JP6951905B2 (en) How to cut out lines and words for handwritten text images
CN105930159A (en) Image-based interface code generation method and system
US8693790B2 (en) Form template definition method and form template definition apparatus
JP6262188B2 (en) A method for segmenting text characters in a document image using vertical projection of the central area of the characters
CN105654072A (en) Automatic character extraction and recognition system and method for low-resolution medical bill image
KR102596989B1 (en) Method and apparatus for recognizing key identifier in video, device and storage medium
JP2017535891A (en) Method and apparatus for detecting text
CN112560862A (en) Text recognition method and device and electronic equipment
CN107330430A (en) Tibetan character recognition apparatus and method
JP5906788B2 (en) Character cutout method, and character recognition apparatus and program using this method
CN109299718B (en) Character recognition method and device
JP2013137761A (en) Determination for transparent painting-out based on reference background color
CN109948598B (en) Document layout intelligent analysis method and device
JP2016038821A (en) Image processing apparatus
CN105095826A (en) Character recognition method and character recognition device
CN112800824A (en) Processing method, device and equipment for scanning file and storage medium
Kumar et al. Ancient indian document analysis using cognitive memory network
CN108596182A (en) Language of the Manchus component cutting method
CN108564078A (en) The method for extracting language of the Manchus word image central axes
Kise et al. Document image segmentation as selection of Voronoi edges
Gaur et al. A Survey on OCR for Overlapping and Broken Characters in Document Image: Problem with Overlapping and Broken Characters in Document Image

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1211363

Country of ref document: HK

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20150930

REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1211363

Country of ref document: HK