CN103577818B - Method and apparatus for recognizing text in images - Google Patents

Method and apparatus for recognizing text in images

Info

Publication number
CN103577818B
CN103577818B (application CN201210279370.4A)
Authority
CN
China
Prior art keywords
sentence
recognition result
block
word
confidence level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210279370.4A
Other languages
Chinese (zh)
Other versions
CN103577818A (en)
Inventor
韩钧宇
丁二锐
吴中勤
文林福
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210279370.4A priority Critical patent/CN103577818B/en
Publication of CN103577818A publication Critical patent/CN103577818A/en
Application granted granted Critical
Publication of CN103577818B publication Critical patent/CN103577818B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The present invention provides a method and apparatus for recognizing text in images. The method includes: S1, obtaining the character region in an image to be recognized; S2, recognizing each character block in the character region and recording the position information of each block; S3, performing layout analysis based on the position information of each block to obtain a sentence-structure distribution; S4, correcting the recognition results of the blocks through semantic analysis based on the sentence-structure distribution, to obtain a corrected recognition result. By making effective use of the semantic information between characters to revise the recognition results of the blocks, the present invention improves the accuracy of image text recognition and better meets users' recognition needs.

Description

Method and apparatus for recognizing text in images
【Technical field】
The present invention relates to computer application technologies, and in particular to a method and apparatus for recognizing text in images.
【Background technology】
With the rapid development of the mobile Internet, applications based on images captured by mobile terminal cameras are becoming increasingly widespread. Image text recognition technology recognizes the text in an image and converts it into editable text, relieving the user of the burden of typing the corresponding text and making it convenient to store and edit that text. However, image text recognition is a highly complex technical problem; especially when the image content is complex, the recognition accuracy often fails to meet users' needs.
Existing image text recognition methods mainly include the following steps:
1) determine the character region in the image; 2) segment the character region into character blocks; 3) extract features from each block and match them against a feature database to obtain the matched characters as the recognition result.
Although such methods have fairly strong recognition ability, they are based on recognizing single characters in isolation; recognition errors therefore occur easily and there is no effective correction mechanism, so the recognition accuracy is relatively low.
【Summary of the invention】
In view of this, the present invention provides a method and apparatus for recognizing text in images, so as to improve the accuracy of image text recognition.
The specific technical solutions are as follows:
A method for recognizing text in images, the method including:
S1, obtaining the character region in an image to be recognized;
S2, recognizing each character block in the character region and recording the position information of each block;
S3, performing layout analysis based on the position information of each block to obtain a sentence-structure distribution;
S4, correcting the recognition results of the blocks through semantic analysis based on the sentence-structure distribution, to obtain a corrected recognition result.
According to a preferred embodiment of the present invention, step S1 specifically includes:
the server receiving an image to be recognized sent by a mobile terminal and extracting the character region from the image to be recognized; or,
the server receiving a character region that the mobile terminal has extracted from the image to be recognized and sent.
According to a preferred embodiment of the present invention, step S3 specifically includes:
taking, based on the coordinates of the block centers in the image to be recognized, blocks whose vertical positions differ by less than a preset first threshold as a text line in the horizontal direction; or,
taking blocks whose horizontal positions differ by less than a preset second threshold as a text line in the vertical direction; or,
taking blocks whose vertical positions differ by less than the preset first threshold and whose sizes differ by less than a preset size threshold as a text line in the horizontal direction; or,
taking blocks whose horizontal positions differ by less than the preset second threshold and whose sizes differ by less than the preset size threshold as a text line in the vertical direction.
According to a preferred embodiment of the present invention, step S4 specifically includes:
S41, matching the recognition results of the blocks in a text line against a word lexicon to obtain the recognition results that form words;
S42, combining the word-forming recognition results and the non-word-forming recognition results in block order to obtain candidate sentences;
S43, determining the semantic confidence of each sentence, matching each sentence against a sentence database, and determining the matching confidence of each sentence according to the match;
S44, combining the semantic confidence and matching confidence of each sentence to determine its total confidence, and selecting the sentence with the highest total confidence as the corrected recognition result.
According to a preferred embodiment of the present invention, step S41 further includes: deleting, from the recognition results of the non-leading blocks in a text line, the recognition results that cannot form a word with any recognition result of an adjacent block, except for recognition results that are independently meaningful or whose adjacent block lacks recognition results.
According to a preferred embodiment of the present invention, step S2 further includes: determining the confidence of each recognition result of a block according to the similarity between the recognition result and the block in the image;
in step S43, the semantic confidence of a sentence is obtained by summing the confidences of the recognition results in the sentence, with the confidences of the word-forming recognition results weighted up in the summation.
According to a preferred embodiment of the present invention, step S43 specifically includes: selecting the top n1 sentences by semantic confidence, n1 being a preset positive integer, matching the selected sentences against the sentence database, and determining the matching confidence of each sentence according to the match.
According to a preferred embodiment of the present invention, in step S43 the matching confidence C_m of sentence i is determined using the following formula:
C_m = N_i × α × P_i
where N_i is the number of characters in sentence i, α is a preset coefficient, and P_i is the ratio of the maximum number of continuously matching characters between sentence i and sentence L to the total number of characters of sentence L, sentence L being the sentence in the sentence database matched to sentence i.
According to a preferred embodiment of the present invention, the method further includes:
S5, searching with the corrected recognition result, determining the network document that best matches the corrected recognition result, and extracting from that network document the text content matching the corrected recognition result as an extended recognition result.
According to a preferred embodiment of the present invention, the extracting from the network document the text content matching the corrected recognition result as the extended recognition result is:
extracting from the network document the smallest sentence or smallest paragraph containing the corrected recognition result as the extended recognition result.
An apparatus for recognizing text in images, the apparatus including:
a region acquisition unit, for obtaining the character region in an image to be recognized;
a character recognition unit, for recognizing each character block in the character region;
a position recording unit, for recording the position information of each block;
a layout analysis unit, for performing layout analysis based on the position information of each block to obtain a sentence-structure distribution;
a semantic analysis unit, for correcting the recognition results of the blocks through semantic analysis based on the sentence-structure distribution, to obtain a corrected recognition result.
According to a preferred embodiment of the present invention, the region acquisition unit receives an image to be recognized sent by a mobile terminal and extracts the character region from the image to be recognized; or it receives a character region that the mobile terminal has extracted from the image to be recognized and sent.
According to a preferred embodiment of the present invention, the layout analysis unit is specifically configured to:
take, based on the coordinates of the block centers in the image to be recognized, blocks whose vertical positions differ by less than a preset first threshold as a text line in the horizontal direction; or,
take blocks whose horizontal positions differ by less than a preset second threshold as a text line in the vertical direction; or,
take blocks whose vertical positions differ by less than the preset first threshold and whose sizes differ by less than a preset size threshold as a text line in the horizontal direction; or,
take blocks whose horizontal positions differ by less than the preset second threshold and whose sizes differ by less than the preset size threshold as a text line in the vertical direction.
According to a preferred embodiment of the present invention, the semantic analysis unit specifically includes:
a dictionary matching subunit, for matching the recognition results of the blocks in a text line against a word lexicon to obtain the word-forming recognition results;
a sentence determination subunit, for combining the word-forming recognition results and the non-word-forming recognition results in block order to obtain candidate sentences;
a semantic confidence determination subunit, for determining the semantic confidence of each sentence;
a matching confidence determination subunit, for matching each sentence against a sentence database and determining the matching confidence of each sentence according to the match;
a correction subunit, for combining the semantic confidence and matching confidence of each sentence into a total confidence, and selecting the sentence with the highest total confidence as the corrected recognition result.
According to a preferred embodiment of the present invention, the dictionary matching subunit is further configured to delete, from the recognition results of the non-leading blocks in a text line, the recognition results that cannot form a word with any recognition result of an adjacent block, except for recognition results that are independently meaningful or whose adjacent block lacks recognition results.
According to a preferred embodiment of the present invention, the character recognition unit is further configured to determine the confidence of each recognition result of a block according to the similarity between the recognition result and the block in the image;
the semantic confidence determination subunit is specifically configured to sum the confidences of the recognition results in a sentence to obtain the sentence's semantic confidence, with the confidences of the word-forming recognition results weighted up in the summation.
According to a preferred embodiment of the present invention, the matching confidence determination subunit is specifically configured to select the top n1 sentences by semantic confidence, n1 being a preset positive integer, match the selected sentences against the sentence database, and determine the matching confidence of each sentence according to the match.
According to a preferred embodiment of the present invention, the matching confidence determination subunit determines the matching confidence C_m of sentence i using the following formula:
C_m = N_i × α × P_i
where N_i is the number of characters in sentence i, α is a preset coefficient, and P_i is the ratio of the maximum number of continuously matching characters between sentence i and sentence L to the total number of characters of sentence L, sentence L being the sentence in the sentence database matched to sentence i.
According to a preferred embodiment of the present invention, the apparatus further includes: a network expansion unit, for searching with the corrected recognition result, determining the network document that best matches the corrected recognition result, and extracting from that network document the text content matching the corrected recognition result as an extended recognition result.
According to a preferred embodiment of the present invention, when performing the extraction, the network expansion unit specifically extracts from the network document the smallest sentence or smallest paragraph containing the corrected recognition result as the extended recognition result.
As can be seen from the above technical solutions, the present invention obtains a sentence-structure distribution through layout analysis and corrects the recognition results of the character blocks through semantic analysis based on that distribution. By making effective use of the semantic information between characters to revise the recognition results of the blocks, it improves the accuracy of image text recognition and better meets users' recognition needs.
【Description of the drawings】
Fig. 1 is a flowchart of the image text recognition method provided by Embodiment 1 of the present invention;
Fig. 2 is an example of a character region provided by Embodiment 1 of the present invention;
Fig. 3 is a schematic diagram of the semantic-analysis correction process provided by Embodiment 1 of the present invention;
Fig. 4 is a schematic diagram of the system provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of the image recognition apparatus provided by Embodiment 2 of the present invention;
Fig. 6 is a structural diagram of the semantic analysis unit provided by Embodiment 2 of the present invention.
【Detailed description of the embodiments】
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1
Fig. 1 is a flowchart of the image text recognition method provided by an embodiment of the present invention. As shown in Fig. 1, the method may include the following steps:
Step 101: obtain the character region in the image to be recognized.
The server obtains an image containing text from the mobile terminal. The image may be the original image captured by the mobile terminal, in which case the server extracts the character region from the image to be recognized in this step. Alternatively, after capturing the original image, the mobile terminal may itself extract the character region from the image and send the character region to the server.
Existing techniques may be used to extract the character region after removing the image background, including but not limited to the following:
Method 1: first perform color run-length coding according to color Euclidean distance, then perform color clustering; generate and select character layers based on the clustering result, e.g. retaining connected components larger than a certain size; generate the image layers based on the Euclidean distance between the connected components and each cluster center; then classify each image layer as a character layer, noise layer, or background layer according to the relationship between its pixel count and the layer's segmentation-threshold pixel count; finally, remove the noise and background layers to obtain the character layer, i.e. the character region.
Method 2: select a large number of sample images containing text and images without text, and use the Canny operator to extract the edge information of both classes of images as training samples for sparse-representation classification dictionaries; feed the two classes of training samples into a classification sparse-representation dictionary training algorithm to obtain a text sparse-representation classification dictionary and a non-text sparse-representation classification dictionary; convert the image to be recognized to grayscale and extract its edge information with the Canny operator; extract candidate character regions from the grayscale edge information using sparse representation over the classification dictionaries; apply a run-length smoothing algorithm in the horizontal and vertical directions to connect the isolated edges of the candidate character regions into larger regions; then perform projection analysis to find the corresponding text lines, discarding isolated edges in the candidate character regions that lie outside the text lines; finally, recognize the detected character region.
If the mobile terminal performs the extraction, it may use existing character-region extraction software or extract the character region manually.
In addition, one or more character regions may be obtained in this step. Since the content of this step belongs to the prior art, it is not described in detail here.
Step 102: recognize each character block in the character region and record the position information of each block.
As in the prior art, recognizing each block in the character region may include the following steps: binarize the character region; segment the binarized character region into character blocks; extract the feature information of each block and match it against a character feature database, taking the matching results as the block's recognition results. The specific implementation is not repeated here.
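The feature-matching step can be illustrated with a toy sketch. Everything below is hypothetical: the three-dimensional feature vectors, the character database, and the cosine-similarity-to-confidence mapping merely stand in for whatever features and matcher an actual implementation uses.

```python
import math

def cosine_similarity(a, b):
    # Similarity between two feature vectors; in [0, 1] for non-negative features.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recognize_block(block_features, feature_db, top_k=3):
    """Return the top_k candidate characters with confidences (0-100 scale),
    by matching the block's features against a character feature database."""
    scored = [(round(cosine_similarity(block_features, vec) * 100), char)
              for char, vec in feature_db.items()]
    scored.sort(reverse=True)
    return [(char, conf) for conf, char in scored[:top_k]]

# Toy database: each "character" is described by a 3-dimensional feature vector.
db = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.7, 0.7, 0.0]}
candidates = recognize_block([0.9, 0.1, 0.0], db)
```

As in the example below, each block thus yields an ordered list of candidate recognition results, each with a confidence.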
The recorded position information of a block may be the coordinates of the block's center in the image, and may further include the size of the block, etc.
It should be noted that a block may have multiple recognition results: for one block, the recognition results whose similarity with the block in the image satisfies a preset requirement are usually kept, and each recognition result has a confidence determined by its similarity. Taking the blocks in the image shown in Fig. 2 as an example:
The recognition results of the first block are: In (44), in (44);
the recognition results of the second block are: State (32), enclose (31), ware (29);
the recognition results of the third block are: nosebleed (41), bright (40), postal (39);
the recognition results of the fourth block are: politics (67), attack (48), change (46).
The numbers in parentheses are the confidences of the recognition results. (The candidates are English glosses of visually similar Chinese characters; the correct reading of the example, as shown later, is "China Post".)
Step 103: perform layout analysis based on the position information of each block to obtain the sentence-structure distribution.
In this step, using the coordinates of the block centers in the image, blocks whose vertical positions differ by less than a preset first threshold (i.e. lying approximately on one horizontal line) are taken as a text line in the horizontal direction; alternatively, blocks whose horizontal positions differ by less than a preset second threshold (i.e. approximately aligned in one vertical column) are taken as a text line in the vertical direction. Whether to take horizontal or vertical text lines depends on the writing direction of the text in the image: if it is written horizontally, horizontal text lines are taken in this step; if it is written vertically, vertical text lines are taken. This can be set in advance.
More preferably, the size information of the blocks may also be combined: blocks whose vertical positions differ by less than the preset first threshold and whose sizes differ by less than a preset size threshold are taken as a horizontal text line; or blocks whose horizontal positions differ by less than the preset second threshold and whose sizes differ by less than the preset size threshold are taken as a vertical text line.
The first and second thresholds can be set based on empirical values and adjusted as the situation requires. For example, for the character region shown in Fig. 2, since the vertical position gaps of the four blocks are within the preset first threshold, the four blocks are joined into one text line.
When the present invention is implemented, obtaining the sentence-structure distribution in this step is not limited to the threshold-based method above; slanted text lines that are neither horizontal nor vertical may also be extracted by other mechanisms.
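Under the stated assumptions (horizontal writing, preset thresholds), the threshold-based grouping might be sketched as below; the block representation and the threshold values are illustrative, not taken from the patent.

```python
def group_horizontal_lines(blocks, first_threshold, size_threshold=None):
    """Group blocks into horizontal text lines: blocks whose vertical center
    positions differ by less than first_threshold (and, optionally, whose
    sizes differ by less than size_threshold) fall into the same line.
    Each block is a dict with center coordinates cx, cy and height h."""
    lines = []
    for block in sorted(blocks, key=lambda b: b["cx"]):  # left-to-right order
        for line in lines:
            ref = line[0]
            close = abs(block["cy"] - ref["cy"]) < first_threshold
            similar = size_threshold is None or abs(block["h"] - ref["h"]) < size_threshold
            if close and similar:
                line.append(block)
                break
        else:
            lines.append([block])
    return lines

# Four blocks roughly on one horizontal line, plus one block on another line.
blocks = [{"cx": 10, "cy": 50, "h": 20}, {"cx": 32, "cy": 52, "h": 21},
          {"cx": 55, "cy": 49, "h": 20}, {"cx": 78, "cy": 51, "h": 19},
          {"cx": 12, "cy": 90, "h": 20}]
lines = group_horizontal_lines(blocks, first_threshold=8)
```

With these inputs the first four blocks form one text line, mirroring the Fig. 2 example.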
Step 104: correct the recognition results of the blocks through semantic analysis based on the sentence-structure distribution, obtaining the corrected recognition result.
The recognition results of the blocks belonging to the same text line are corrected by semantic analysis. During the correction, the recognition results of the blocks are first matched in order against a word lexicon to determine the word combinations that the recognition results of the blocks in the line can form; then the semantic confidence of each sentence that the line's recognition results can combine into is determined, and the sentence with the highest semantic confidence is selected as the corrected recognition result. Alternatively, each sentence is further matched against a sentence database, the matching confidence of each sentence is determined according to the match, the semantic confidence and matching confidence are combined into a total confidence for each sentence, and the sentence with the highest total confidence is selected as the corrected recognition result.
The semantic-analysis correction process is described in detail below with reference to Fig. 3. As shown in Fig. 3, the process may include the following steps:
Step 301: match the recognition results of the blocks in the text line against the word lexicon, and delete from the recognition results of the non-leading blocks those results that cannot form a word with any recognition result of an adjacent block, except for results that are independently meaningful or whose adjacent block lacks recognition results.
It should be noted that deleting such non-word-forming recognition results is a step performed to improve the efficiency of the subsequent sentence confidence computation and selection; it is not mandatory.
Step 302: combine the word-forming recognition results and the non-word-forming recognition results in block order to obtain the candidate sentences, and determine the semantic confidence of each sentence.
That is, this step determines all sentences that the text line could possibly be recognized as: the word-forming and non-word-forming recognition results are combined according to the order of the blocks, yielding all possible sentences.
Continuing with the recognition results shown in Fig. 2: "enclose" and "ware" in the second block's results and "nosebleed" in the third block's results cannot form a word with any recognition result of an adjacent block and are not independently meaningful, so they can be deleted in step 301.
Matching the remaining recognition results against the word lexicon yields the words: China, postal, bright attack, open policy.
All possible sentences generated include:
In State bright attack
In State bright change
In State open policy
In State postal
In State postal change
In State postal attack
China bright change
China open policy
China bright attack
China postal
China postal attack
China postal change
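Step 302 can be sketched as enumerating every reading of the text line over a small lattice of units (lexicon words spanning two blocks, or single-character candidates). The unit lists below loosely mirror the example above with placeholder English glosses; they are illustrative, not the patent's data. A reading assembled from single characters that duplicates a lexicon word is dropped, since it is scored as the word instead.

```python
def enumerate_sentences(units_by_start, n_blocks, start=0):
    """Enumerate all readings of a text line.  units_by_start maps a block
    index to the units (word or single character) beginning there; each unit
    is (text, length_in_blocks).  Returns every full-line combination."""
    if start == n_blocks:
        return [""]
    sentences = []
    for text, length in units_by_start.get(start, []):
        for rest in enumerate_sentences(units_by_start, n_blocks, start + length):
            sentences.append(text + rest)
    return sentences

# Simplified stand-in for the four-block example: a two-block word or the
# surviving single-character candidates at each position.
units = {
    0: [("China ", 2), ("In ", 1)],
    1: [("State ", 1)],
    2: [("bright attack ", 2), ("bright politics ", 2), ("postal politics ", 2),
        ("bright ", 1), ("postal ", 1)],
    3: [("attack ", 1), ("change ", 1)],
}
sentences = enumerate_sentences(units, 4)
# Deduplicate: a singles reading identical to a word reading collapses into it.
unique = sorted(set(s.strip() for s in sentences))
```

With these units the enumeration yields twelve distinct sentences, matching the count in the example list.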
When determining the semantic confidence of a sentence, the confidences of the recognition results in the sentence can be summed, with the confidences of the word-forming recognition results weighted up, for example doubled; the sum is then taken as the sentence's semantic confidence.
Continuing the example, the semantic confidences of the sentences are listed below; the numbers in parentheses are the confidences of the recognition results, and the numbers in square brackets are the semantic confidences of the sentences.
In (44) State (32) bright attack (40×2+48×2) 【252】
In (44) State (32) bright (40) change (46) 【162】
In (44) State (32) open policy (40×2+67×2) 【290】
In (44) State (32) postal (39×2+67×2) 【288】
In (44) State (32) postal (39) change (46) 【161】
In (44) State (32) postal (39) attack (48) 【163】
China (44×2+32×2) bright (40) change (46) 【238】
China (44×2+32×2) open policy (40×2+67×2) 【366】
China (44×2+32×2) bright attack (40×2+48×2) 【328】
China (44×2+32×2) postal (39×2+67×2) 【364】
China (44×2+32×2) postal (39) attack (48) 【239】
China (44×2+32×2) postal (39) change (46) 【237】
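The scoring just illustrated — summing per-block confidences with word-forming results doubled — can be sketched as follows; the doubling factor is one possible weighting, as noted above.

```python
def semantic_confidence(units):
    """units: list of (block_confidences, forms_word) pairs, one per word or
    single character in the sentence.  Confidences of word-forming units are
    doubled before summing; single characters are counted once."""
    total = 0
    for confidences, forms_word in units:
        weight = 2 if forms_word else 1
        total += weight * sum(confidences)
    return total

# "China postal": two lexicon words with block confidences (44, 32) and (39, 67).
china_postal = semantic_confidence([([44, 32], True), ([39, 67], True)])
# "In State postal change": four single characters, no words.
in_state = semantic_confidence([([44], False), ([32], False),
                                ([39], False), ([46], False)])
```

The two calls reproduce the 364 and 161 values from the worked example.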
Step 303: select the top n1 sentences by semantic confidence, n1 being a preset positive integer, match the selected sentences against the sentence database, and determine the matching confidence of each sentence according to the match.
Selecting only the top few sentences by semantic confidence for matching against the sentence database makes this step more efficient; of course, all sentences could also be matched against the sentence database to determine their matching confidences.
Continuing the example, suppose the top three sentences by semantic confidence are selected, i.e. "China open policy", "China postal", and "China bright attack".
The sentence database used for matching contains common sentences. The higher the degree of match between a selected sentence and a common sentence, the more likely that sentence is to be the correct recognition result; the matching confidence is therefore determined according to the match with the sentence database. The match can be characterized by information such as the number of characters of the sentence itself and the ratio of the maximum number of continuously matching characters between the sentence and its matched database sentence to the number of characters of that matched sentence.
Suppose the matching confidence C_m of sentence i is determined using the following formula:
C_m = N_i × α × P_i    (1)
where N_i is the number of characters in sentence i, α is a preset coefficient, e.g. 100, and P_i is the ratio of the maximum number of continuously matching characters between sentence i and sentence L to the total number of characters of sentence L. Sentence L is the sentence in the sentence database matched to sentence i: when the selected sentences are matched against the sentence database, sentence L is obtained first. Sentence L may be a sentence that completely matches sentence i character for character, or the sentence that matches sentence i on the most characters; that is, the sentence in the database whose character match with sentence i reaches a certain degree and is maximal is selected as the matched sentence L. In specific implementations of the present invention, sentence L may also be obtained by other sentence-matching strategies.
For example, the sentence "China open policy" is matched to "China Unicom open policy cooperation room" (nine characters in the original Chinese) in the sentence database; the matching confidence computed by formula (1) is 4 × 100 × (2/9) ≈ 88.
The sentence "China postal" is matched to "China Post" in the sentence database; the matching confidence computed by formula (1) is 4 × 100 × (4/4) = 400.
The sentence "China bright attack" finds no match in the sentence database, so its matching confidence is 0.
Step 304: combine the semantic confidence and matching confidence of each sentence into a total confidence, and select the sentence with the highest total confidence as the corrected recognition result.
Assuming that the value that semantic confidence degree and matching confidence level are summed in this step is as the total of each sentence Confidence level, then connecting upper example:
Total confidence level of sentence " China Post " is:366+88=454.
Total confidence level of sentence " China Post " is:364+400=764.
Total confidence level of sentence " China bright attacks " is:328+0=328.
Final choice " China Post " is as the recognition result after correction.
After the semantic-analysis correction, the recognition result already has a certain accuracy, and the server can return the corrected recognition result to the mobile terminal for display. To further improve recognition accuracy, however, the result can be extended by means of a web search, that is, step 105 can further be executed.
Continuing with Fig. 1, step 105: Determine the network document whose matching state with the corrected recognition result is optimal, and intercept from that network document the network text content matching the corrected recognition result as the extended recognition result.
The corrected recognition result is searched on the network, the matching state between each document returned by the search and the corrected recognition result is calculated, and the document with the optimal matching state is determined. The so-called optimal matching state may be the largest number of matched characters, or the largest ratio of matched characters to the total number of characters of the network text content, and so on.
When intercepting the network text content, the network text content containing the corrected recognition result is intercepted from the determined network document. Specifically, based on the punctuation or line breaks in the network document, the minimum sentence or minimum paragraph containing the corrected recognition result can be intercepted as the extended recognition result. The server can then return the extended recognition result to the mobile terminal for display.
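The minimum-sentence interception described above can be sketched as follows (the punctuation set and the function name are assumptions; the patent only specifies cutting at punctuation or line breaks):

```python
import re


def intercept_min_sentence(document: str, result: str):
    # Split the network document at sentence punctuation or line breaks, then
    # return the shortest fragment that still contains the corrected result.
    fragments = re.split(r"[。！？；.!?;\n]+", document)
    hits = [f.strip() for f in fragments if result in f]
    return min(hits, key=len) if hits else None


doc = "China Post Group Corporation was founded in 1995. Unrelated text follows."
print(intercept_min_sentence(doc, "China Post"))
# China Post Group Corporation was founded in 1995
```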
Continuing the example above, "China Post" is searched on the network to obtain the network text content with the largest number of characters matching the current recognition result, and the minimum sentence containing the current recognition result, "China Post Group Corporation", is intercepted as the extended recognition result.
The corrected recognition result and the extended recognition result may be displayed either one at a time or both together, for example displayed as "China Post - China Post Group Corporation".
The above is a detailed description of the method provided by the present invention. The device provided by the present invention is described in detail below with reference to Embodiment 2. The device is arranged on the server and is mainly used in the system architecture shown in Fig. 4, a system composed of a mobile terminal and a server. The mobile terminal may send a captured image containing text to the server as the image to be recognized, and the server extracts the character area from it; alternatively, the mobile terminal takes the captured image containing text as the image to be recognized, extracts the character area from it, and sends the character area to the server. The server then executes the flow shown in Embodiment 1, and finally returns to the mobile terminal one or both of the recognition result corrected on the basis of semantic analysis and the recognition result extended through the network.
Embodiment 2
Fig. 5 is a structural diagram of the image recognition device provided by Embodiment 2 of the present invention. As shown in Fig. 5, the device includes: an area acquisition unit 500, a word recognition unit 510, a position recording unit 520, a printed page analysis unit 530, and a semantic analysis unit 540.
The area acquisition unit 500 first obtains the character area in the image to be recognized. The area acquisition unit 500 may receive the image to be recognized sent by the mobile terminal and extract the character area from it; alternatively, it may receive the character area that the mobile terminal extracts from the image to be recognized and sends.
The word recognition unit 510 recognizes each block in the character area separately; an existing recognition method may be used, for example: binarizing the character area; dividing the binarized character area into blocks; and extracting the feature information of each block and matching it against a feature database, taking the matching result as the recognition result of each block.
The position recording unit 520 records the location information of each block. The recorded location information may be the coordinate information of the block center in the image, and may further include the size information of the block, etc.
The printed page analysis unit 530 performs printed page analysis based on the location information of each block to obtain the sentence structure distribution. The printed page analysis unit 530 may be specifically configured to:
using the coordinate information of block centers in the image to be recognized, take blocks whose vertical position gap is less than a preset first threshold as a text line in the horizontal direction; alternatively,
using the coordinate information of block centers in the image to be recognized, take blocks whose horizontal position gap is less than a preset second threshold as a text line in the vertical direction; alternatively,
using the coordinate information of block centers in the image to be recognized, take blocks whose vertical position gap is less than the preset first threshold and whose size difference is less than a preset size threshold as a text line in the horizontal direction; alternatively,
using the coordinate information of block centers in the image to be recognized, take blocks whose horizontal position gap is less than the preset second threshold and whose size difference is less than the preset size threshold as a text line in the vertical direction.
Whether to take text lines in the horizontal direction or in the vertical direction depends on the layout of the text in the image: if the text is written horizontally, text lines in the horizontal direction are taken in this step; if it is written vertically, text lines in the vertical direction are taken. This can be set in advance. The first threshold and the second threshold can be set based on empirical values, and the values can be adjusted as the case may be.
In a specific implementation of the present invention, the implementation of the printed page analysis unit is not limited to the threshold-based method set out above; oblique text lines that are neither horizontal nor vertical may also be extracted by other mechanisms.
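A minimal sketch of the first threshold rule, grouping blocks into horizontal text lines by the vertical gap between block centers (the greedy clustering strategy and the data layout are assumptions; the patent only specifies the gap test itself):

```python
def group_horizontal_lines(blocks, first_threshold):
    # blocks: (x_center, y_center) coordinates of block centers.
    # A block joins an existing line when its vertical gap to that line's
    # last block is below the preset first threshold; otherwise it starts
    # a new horizontal text line.
    lines = []
    for x, y in sorted(blocks, key=lambda b: (b[1], b[0])):
        for line in lines:
            if abs(line[-1][1] - y) < first_threshold:
                line.append((x, y))
                break
        else:
            lines.append([(x, y)])
    return lines


# Five blocks whose centers form two horizontal text lines.
blocks = [(10, 100), (30, 102), (50, 99), (12, 200), (34, 203)]
print(len(group_horizontal_lines(blocks, first_threshold=10)))  # 2
```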
The semantic analysis unit 540 performs correction based on semantic analysis on the recognition results of the blocks according to the sentence structure distribution, to obtain the corrected recognition result.
The structure of the semantic analysis unit 540 is described in detail below. As shown in Fig. 6, the semantic analysis unit 540 may specifically include: a dictionary matching subunit 541, a sentence determination subunit 542, a semantic confidence determination subunit 543, a matching confidence determination subunit 544, and a correction subunit 545.
The dictionary matching subunit 541 matches the recognition results of the blocks in a text line against a word library to obtain recognition results that constitute words.
Preferably, the dictionary matching subunit 541 may also delete, from the recognition results of blocks that are not first in the text line, those recognition results that cannot form a word with the recognition results of adjacent blocks, except for recognition results that can stand alone semantically or whose adjacent block's recognition result is missing.
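The pruning rule applied by the dictionary matching subunit can be sketched as follows (the word library, the "standalone" set, and the pairwise adjacency check are illustrative assumptions):

```python
def prune_candidates(line_results, lexicon, standalone):
    # line_results: candidate characters per block along one text line.
    # A non-leading candidate survives only if it forms a lexicon word with
    # some candidate of the preceding block, stands alone semantically, or
    # the preceding block produced no recognition result at all.
    kept = [list(line_results[0])]
    for current in line_results[1:]:
        prev = kept[-1]
        kept.append([
            c for c in current
            if not prev or c in standalone or any(p + c in lexicon for p in prev)
        ])
    return kept


# "邮" followed by candidates "政"/"攻": only "邮政" is in the word library,
# so the non-word-forming candidate "攻" is deleted.
print(prune_candidates([["邮"], ["政", "攻"]], {"邮政"}, set()))
# [['邮'], ['政']]
```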
The sentence determination subunit 542 combines the recognition results that constitute words and the recognition results that do not constitute words in block order to obtain candidate sentences.
The semantic confidence determination subunit 543 determines the semantic confidence of each sentence. The semantic confidence is determined based on the confidence of each recognition result; in this case, the word recognition unit 510 shown in Fig. 5 also determines the confidence of each block's recognition result according to the similarity between the recognition result and the block in the picture. The semantic confidence determination subunit 543 then sums the confidences of the recognition results in a sentence to obtain the semantic confidence of the sentence, raising the confidence of word-constituting recognition results during the summation.
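The summation performed by the semantic confidence determination subunit can be sketched as follows (the boost factor of 1.5 is an assumption; the patent only says that the confidence of word-constituting results is raised):

```python
def semantic_confidence(results, boost=1.5):
    # results: (confidence, constitutes_word) per recognition result in the
    # sentence; word-constituting results are up-weighted before summing.
    return sum(c * (boost if word else 1.0) for c, word in results)


# Two word-constituting results and one isolated character.
print(semantic_confidence([(100, True), (100, True), (60, False)]))  # 360.0
```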
The matching confidence determination subunit 544 matches each sentence against the sentence database and determines the matching confidence of each sentence according to the matching state.
To improve the efficiency of the matching confidence calculation, the matching confidence determination subunit 544 may select the top n1 sentences by semantic confidence, n1 being a preset positive integer, match the selected sentences against the sentence database, and determine the matching confidence of each sentence according to the matching state.
The sentence database used for matching contains common sentences. The higher the degree of match between a selected sentence and a common sentence, the more likely that sentence is the correct recognition result; therefore, the matching confidence here is determined according to the matching state against the sentence database. The so-called matching state can be reflected in information such as the number of characters of the sentence itself, and the ratio of the maximum number of continuously matched characters between the sentence and its matched sentence in the sentence database to the number of characters of that matched sentence.
Specifically, the matching confidence Cm of sentence i may be determined using the following formula:
Cm = Ni × α × Pi
wherein Ni is the number of characters contained in sentence i, α is a preset coefficient, and Pi is the ratio of the maximum number of continuously matched characters between sentence i and sentence L to the total number of characters of sentence L, where sentence L is the matched sentence of sentence i in the sentence database. That is to say, when a selected sentence is matched against the sentence database, sentence L is obtained first; sentence L may be a sentence that matches sentence i completely on its characters, or the sentence that matches sentence i on the most characters. In other words, among the sentences in the sentence database whose character-level match with sentence i reaches a certain degree, the one with the greatest degree of match is taken as the matched sentence, i.e. sentence L.
The correction subunit 545 combines the semantic confidence and the matching confidence of each sentence to determine the total confidence of each sentence, and selects the sentence with the highest total confidence as the corrected recognition result.
Continuing with Fig. 5, in order to further improve recognition accuracy, the device may also include: a network expanding unit 550, for searching with the corrected recognition result, determining the network document whose matching state with the corrected recognition result is optimal, and intercepting from that network document the network text content matching the corrected recognition result as the extended recognition result.
Specifically, when performing the interception, the minimum sentence or minimum paragraph containing the corrected recognition result can be intercepted from the network document as the extended recognition result.
Since the device is arranged in the server, the server can return either one of the corrected recognition result and the extended recognition result obtained by the device to the mobile terminal for display, or return both.
From the above description it can be seen that the method and device provided by the present invention have the following advantages:
1) The semantic information between characters is efficiently used to revise the recognition results of the blocks, which improves the precision of image text recognition and better meets the recognition needs of users.
2) The massive network text resources on the Internet are fully utilized to extend the recognition result, further mining the user's intent and better meeting the usage needs of users.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (19)

1. A method of image text recognition, characterized in that the method comprises:
S1, obtaining the character area in an image to be recognized;
S2, recognizing each block in the character area respectively and recording the location information of each block;
S3, performing printed page analysis based on the location information of each block to obtain a sentence structure distribution;
S41, matching the recognition result of each block in a text line with a word library to obtain recognition results that constitute words;
S42, combining the recognition results that constitute words and the recognition results that do not constitute words in block order to obtain sentences;
S43, determining the semantic confidence of each sentence, or further matching each sentence with a sentence database and determining the matching confidence of each sentence according to the matching state;
S44, selecting a sentence as the corrected recognition result according to the semantic confidence or the total confidence of the sentence, wherein the total confidence of a sentence is determined by combining its semantic confidence and matching confidence.
2. The method according to claim 1, characterized in that S1 specifically comprises:
a server receiving the image to be recognized sent by a mobile terminal, and extracting the character area from the image to be recognized; alternatively,
the server receiving the character area extracted from the image to be recognized and sent by the mobile terminal.
3. The method according to claim 1, characterized in that S3 specifically comprises:
using the coordinate information of block centers in the image to be recognized, taking blocks whose vertical position gap is less than a preset first threshold as a text line in the horizontal direction; alternatively,
using the coordinate information of block centers in the image to be recognized, taking blocks whose horizontal position gap is less than a preset second threshold as a text line in the vertical direction; alternatively,
using the coordinate information of block centers in the image to be recognized, taking blocks whose vertical position gap is less than the preset first threshold and whose size difference is less than a preset size threshold as a text line in the horizontal direction; alternatively,
using the coordinate information of block centers in the image to be recognized, taking blocks whose horizontal position gap is less than the preset second threshold and whose size difference is less than the preset size threshold as a text line in the vertical direction.
4. The method according to claim 1, characterized in that S41 further comprises: deleting, from the recognition results of blocks that are not first in a text line, those recognition results that cannot form a word with the recognition results of adjacent blocks, except for recognition results that can stand alone semantically or whose adjacent block's recognition result is missing.
5. The method according to claim 1, characterized in that S2 further comprises: determining the confidence of the recognition result of each block according to the similarity between the recognition result and the block in the picture;
and in S43, the confidences of the recognition results in a sentence are summed to obtain the semantic confidence of the sentence, wherein the confidence of word-constituting recognition results is raised during the summation.
6. The method according to claim 1, characterized in that matching each sentence with the sentence database in S43 and determining the matching confidence of each sentence according to the matching state specifically comprises: selecting the top n1 sentences by semantic confidence, n1 being a preset positive integer, matching the selected sentences with the sentence database, and determining the matching confidence of each sentence according to the matching state.
7. The method according to claim 1, characterized in that in S43 the matching confidence Cm of sentence i is determined using the following formula:
Cm = Ni × α × Pi
wherein Ni is the number of characters contained in sentence i, α is a preset coefficient, and Pi is the ratio of the maximum number of continuously matched characters between sentence i and sentence L to the total number of characters of sentence L, where the sentence L is the matched sentence of sentence i in the sentence database.
8. The method according to claim 1, characterized in that the method further comprises:
S5, searching with the corrected recognition result, determining the network document whose matching state with the corrected recognition result is optimal, and intercepting from that network document the network text content matching the corrected recognition result as the extended recognition result.
9. The method according to claim 8, characterized in that intercepting from the network document the network text content matching the corrected recognition result as the extended recognition result is:
intercepting from the network document the minimum sentence or the minimum paragraph containing the corrected recognition result as the extended recognition result.
10. A device for image recognition, characterized in that the device comprises:
an area acquisition unit, for obtaining the character area in an image to be recognized;
a word recognition unit, for recognizing each block in the character area respectively;
a position recording unit, for recording the location information of each block;
a printed page analysis unit, for performing printed page analysis based on the location information of each block to obtain a sentence structure distribution;
a semantic analysis unit, for matching the recognition result of each block in a text line with a word library to obtain recognition results that constitute words; combining the recognition results that constitute words and the recognition results that do not constitute words in block order to obtain sentences; determining the semantic confidence of each sentence, or further matching each sentence with a sentence database and determining the matching confidence of each sentence according to the matching state; and selecting a sentence as the corrected recognition result according to the semantic confidence or the total confidence of the sentence, wherein the total confidence of a sentence is determined by combining its semantic confidence and matching confidence.
11. The device according to claim 10, characterized in that the area acquisition unit receives the image to be recognized sent by a mobile terminal and extracts the character area from the image to be recognized; alternatively, it receives the character area extracted from the image to be recognized and sent by the mobile terminal.
12. The device according to claim 10, characterized in that the printed page analysis unit is specifically configured to:
using the coordinate information of block centers in the image to be recognized, take blocks whose vertical position gap is less than a preset first threshold as a text line in the horizontal direction; alternatively,
using the coordinate information of block centers in the image to be recognized, take blocks whose horizontal position gap is less than a preset second threshold as a text line in the vertical direction; alternatively,
using the coordinate information of block centers in the image to be recognized, take blocks whose vertical position gap is less than the preset first threshold and whose size difference is less than a preset size threshold as a text line in the horizontal direction; alternatively,
using the coordinate information of block centers in the image to be recognized, take blocks whose horizontal position gap is less than the preset second threshold and whose size difference is less than the preset size threshold as a text line in the vertical direction.
13. The device according to claim 10 or 12, characterized in that the semantic analysis unit specifically comprises:
a dictionary matching subunit, for matching the recognition result of each block in a text line with a word library to obtain recognition results that constitute words;
a sentence determination subunit, for combining the recognition results that constitute words and the recognition results that do not constitute words in block order to obtain sentences;
a semantic confidence determination subunit, for determining the semantic confidence of each sentence;
a matching confidence determination subunit, for matching each sentence with the sentence database and determining the matching confidence of each sentence according to the matching state;
a correction subunit, for combining the semantic confidence and the matching confidence of each sentence to determine the total confidence of each sentence, and selecting the sentence with the highest total confidence as the corrected recognition result.
14. The device according to claim 13, characterized in that the dictionary matching subunit is further configured to delete, from the recognition results of blocks that are not first in a text line, those recognition results that cannot form a word with the recognition results of adjacent blocks, except for recognition results that can stand alone semantically or whose adjacent block's recognition result is missing.
15. The device according to claim 13, characterized in that the word recognition unit is further configured to determine the confidence of the recognition result of each block according to the similarity between the recognition result and the block in the picture;
and the semantic confidence determination subunit is specifically configured to: sum the confidences of the recognition results in a sentence to obtain the semantic confidence of the sentence, wherein the confidence of word-constituting recognition results is raised during the summation.
16. The device according to claim 13, characterized in that the matching confidence determination subunit is specifically configured to: select the top n1 sentences by semantic confidence, n1 being a preset positive integer, match the selected sentences with the sentence database, and determine the matching confidence of each sentence according to the matching state.
17. The device according to claim 13, characterized in that the matching confidence determination subunit determines the matching confidence Cm of sentence i using the following formula:
Cm = Ni × α × Pi
wherein Ni is the number of characters contained in sentence i, α is a preset coefficient, and Pi is the ratio of the maximum number of continuously matched characters between sentence i and sentence L to the total number of characters of sentence L, where the sentence L is the matched sentence of sentence i in the sentence database.
18. The device according to claim 10, characterized in that the device further comprises: a network expanding unit, for searching with the corrected recognition result, determining the network document whose matching state with the corrected recognition result is optimal, and intercepting from that network document the network text content matching the corrected recognition result as the extended recognition result.
19. The device according to claim 18, characterized in that, when performing the interception, the network expanding unit specifically intercepts from the network document the minimum sentence or the minimum paragraph containing the corrected recognition result as the extended recognition result.
CN201210279370.4A 2012-08-07 2012-08-07 A kind of method and apparatus of pictograph identification Active CN103577818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210279370.4A CN103577818B (en) 2012-08-07 2012-08-07 A kind of method and apparatus of pictograph identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210279370.4A CN103577818B (en) 2012-08-07 2012-08-07 A kind of method and apparatus of pictograph identification

Publications (2)

Publication Number Publication Date
CN103577818A CN103577818A (en) 2014-02-12
CN103577818B true CN103577818B (en) 2018-09-04

Family

ID=50049568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210279370.4A Active CN103577818B (en) 2012-08-07 2012-08-07 A kind of method and apparatus of pictograph identification

Country Status (1)

Country Link
CN (1) CN103577818B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021142765A1 (en) * 2020-01-17 2021-07-22 Microsoft Technology Licensing, Llc Text line detection

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951741A (en) * 2014-03-31 2015-09-30 阿里巴巴集团控股有限公司 Character recognition method and device thereof
CN104143084A (en) * 2014-07-17 2014-11-12 武汉理工大学 Auxiliary reading glasses for visual impairment people
CN105574530B (en) * 2014-10-08 2019-11-22 富士通株式会社 The method and apparatus for extracting the line of text in document
CN105631393A (en) * 2014-11-06 2016-06-01 阿里巴巴集团控股有限公司 Information recognition method and device
CN105678207A (en) * 2014-11-19 2016-06-15 富士通株式会社 Device and method for identifying content of target nameplate image from given image
CN106709489B (en) * 2015-07-13 2020-03-03 腾讯科技(深圳)有限公司 Character recognition processing method and device
CN108399405B (en) * 2017-02-07 2023-06-27 腾讯科技(上海)有限公司 Business license identification method and device
US10275687B2 (en) * 2017-02-16 2019-04-30 International Business Machines Corporation Image recognition with filtering of image classification output distribution
GB2571530B (en) * 2018-02-28 2020-09-23 Canon Europa Nv An image processing method and an image processing system
CN109308476B (en) * 2018-09-06 2019-08-27 邬国锐 Billing information processing method, system and computer readable storage medium
CN109033798B (en) * 2018-09-14 2020-07-07 北京金堤科技有限公司 Click verification code identification method and device based on semantics
CN111615702B (en) * 2018-12-07 2023-10-17 华为云计算技术有限公司 Method, device and equipment for extracting structured data from image
CN109934210B (en) 2019-05-17 2019-08-09 上海肇观电子科技有限公司 Printed page analysis method, reading aids, circuit and medium
CN112183513B (en) * 2019-07-03 2023-09-05 杭州海康威视数字技术股份有限公司 Method and device for recognizing characters in image, electronic equipment and storage medium
CN110490190B (en) * 2019-07-04 2021-10-26 贝壳技术有限公司 Structured image character recognition method and system
CN111539412B (en) * 2020-04-21 2021-02-26 上海云从企业发展有限公司 Image analysis method, system, device and medium based on OCR
CN112541496B (en) * 2020-12-24 2023-08-22 北京百度网讯科技有限公司 Method, device, equipment and computer storage medium for extracting POI (point of interest) names

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1604073A (en) * 2004-11-22 2005-04-06 北京北大方正技术研究院有限公司 Method for conducting title and text logic connection for newspaper pages
CN101447017A (en) * 2008-11-27 2009-06-03 浙江工业大学 Method and system for quickly identifying and counting votes on the basis of layout analysis
CN101493896A (en) * 2008-01-24 2009-07-29 夏普株式会社 Document image processing apparatus and method
CN101770576A (en) * 2008-12-31 2010-07-07 北京新岸线网络技术有限公司 Method and device for extracting characters
CN102456136A (en) * 2010-10-29 2012-05-16 方正国际软件(北京)有限公司 Image-text splitting method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006092346A (en) * 2004-09-24 2006-04-06 Fuji Xerox Co Ltd Equipment, method, and program for character recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of a Printed Chinese Character Recognition System; Liu Juning; China Master's Theses Full-text Database; 2011-09-15 (No. 9); pp. 3-4, 11, 14 *

Also Published As

Publication number Publication date
CN103577818A (en) 2014-02-12

Similar Documents

Publication Publication Date Title
CN103577818B (en) A kind of method and apparatus of pictograph identification
CN110569832B (en) Text real-time positioning and identifying method based on deep learning attention mechanism
US20200065601A1 (en) Method and system for transforming handwritten text to digital ink
Burie et al. ICDAR2015 competition on smartphone document capture and OCR (SmartDoc)
JP5031741B2 (en) Grammatical analysis of document visual structure
EP1598770B1 (en) Low resolution optical character recognition for camera acquired documents
US8494273B2 (en) Adaptive optical character recognition on a document with distorted characters
JP2020184109A (en) Learning model generation device, character recognition device, learning model generation method, character recognition method, and program
CN108805076A (en) The extracting method and system of environmental impact assessment report table word
CN111191649A (en) Method and equipment for identifying bent multi-line text image
CN112434690A (en) Method, system and storage medium for automatically capturing and understanding elements of dynamically analyzing text image characteristic phenomena
US20150055866A1 (en) Optical character recognition by iterative re-segmentation of text images using high-level cues
CN111046760A (en) Handwriting identification method based on domain confrontation network
CN111753120A (en) Method and device for searching questions, electronic equipment and storage medium
CN115131804A (en) Document identification method and device, electronic equipment and computer readable storage medium
CN104408403B (en) A kind of referee method that secondary typing is inconsistent and device
CN108304815A (en) A kind of data capture method, device, server and storage medium
CN112949649B (en) Text image identification method and device and computing equipment
CN109508712A (en) A kind of Chinese written language recognition methods based on image
RU2657181C1 (en) Method of improving quality of separate frame recognition
CN106339726A (en) Method and device for handwriting recognition
CN112560849B (en) Neural network algorithm-based grammar segmentation method and system
CN115050025A (en) Knowledge point extraction method and device based on formula recognition
Su et al. HITHCD-2018: Handwritten Chinese Character Database of 21K-Category
Dimov et al. CBIR approach to the recognition of a sign language alphabet

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant