CN103679165A - OCR (optical character recognition) character recognition method and system - Google Patents

Info

Publication number
CN103679165A
Authority
CN
China
Prior art keywords
word string
noise
character
ocr
string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310752624.4A
Other languages
Chinese (zh)
Other versions
CN103679165B (en
Inventor
王海峰
和为
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310752624.4A priority Critical patent/CN103679165B/en
Publication of CN103679165A publication Critical patent/CN103679165A/en
Application granted granted Critical
Publication of CN103679165B publication Critical patent/CN103679165B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides an OCR (optical character recognition) character recognition method. The method comprises the following steps: performing OCR character recognition on an image in a target area selected by a user to obtain a recognized word string; calculating the number of sub-word strings in the recognized word string; when the number of sub-word strings in the word string is greater than 2, determining whether the number of characters in the first sub-word string W1 and the number of characters in the K-th sub-word string WK are smaller than a preset value; if the number of characters in W1 and/or the number of characters in WK is smaller than the preset value, determining whether the noise probability score of W1 and/or the noise probability score of WK is greater than a preset noise threshold; and, if so, determining that W1 and/or WK is noise and deleting W1 and/or WK from the word string to obtain a new word string. According to the embodiments, the accuracy of translating OCR recognition results can be improved. The invention also provides an OCR character recognition system.

Description

OCR character recognition method and system
Technical field
The present invention relates to the field of character recognition technology, and in particular to an OCR character recognition method and system.
Background
At present, many translation apps support photo translation. A typical operation proceeds as follows: the user points a mobile terminal (such as a smartphone) at the foreign-language text to be translated and takes a photo, and the photo is covered with a gray overlay; the user swipes a finger over the overlaid photo to "wipe" out the words to be translated; OCR is performed on the region the user swiped to obtain the foreign-language text; and a machine translation module is called to translate the OCR result, which is finally presented to the user.
The whole operating process is shown in Fig. 1. However, this process has a problem: because the finger blocks the screen while the user is "wiping" the words, neighboring words to the left, right, above or below are often "wiped" into the OCR region as well. As shown in Fig. 1, the user intends to translate the word "Obama", but in practice a few extra letters are selected on each side, so the OCR result is "it Obama I" and the final machine translation result is "Obama, I". Such a translation result confuses the user and degrades the user experience.
Summary of the invention
The present invention aims to solve at least one of the technical defects described above.
To this end, one object of the present invention is to propose an OCR character recognition method. The method can improve the accuracy of translating OCR recognition results.
Another object of the present invention is to propose an OCR character recognition system.
To achieve the above objects, an embodiment of the first aspect of the present invention discloses an OCR character recognition method comprising the following steps: performing OCR character recognition on an image in a target area selected by a user to obtain a recognized word string, wherein the word string comprises K sub-word strings, each sub-word string comprises at least one character, and K is a positive integer; calculating the number of sub-word strings in the recognized word string; if the number of sub-word strings in the word string is greater than 2, determining whether the number of characters in the 1st sub-word string W1 and the number of characters in the K-th sub-word string WK are less than a preset value; if the number of characters in W1 and/or the number of characters in WK is less than the preset value, determining whether the noise probability score of W1 and/or the noise probability score of WK is greater than a preset noise threshold; and, if so, determining that W1 and/or WK is noise and deleting W1 and/or WK from the word string to obtain a new word string.
According to the OCR character recognition method of the embodiment of the present invention, noise reduction is applied to the OCR recognition result before translation, so that OCR noise, which is usually introduced by user misoperation, can be identified and deleted. After denoising, the translation result is cleaner and more accurate, which improves the user experience.
In addition, the OCR character recognition method according to the above embodiment of the present invention may also have the following additional technical features:
In some examples, the method further comprises: if the number of sub-word strings in the word string equals 2, determining whether the number of characters in W1 is less than the number of characters in WK; if the number of characters in W1 is less than the number of characters in WK, further determining whether the number of characters in W1 is less than the preset value; if the number of characters in W1 is less than the preset value, further determining whether the noise probability score of W1 is greater than the preset noise threshold; and, if so, determining that W1 is noise and deleting W1 from the word string to obtain a new word string.
In some examples, the method further comprises: if the number of characters in W1 is greater than the number of characters in WK, further determining whether the number of characters in WK is less than the preset value; if the number of characters in WK is less than the preset value, further determining whether the noise probability score of WK is greater than the preset noise threshold; and, if so, determining that WK is noise and deleting WK from the word string to obtain a new word string.
In some examples, the noise probability score is obtained by the following formulas:
$$P_{left} = \alpha \log p(W_1) + \beta \log p(W_2 \mid W_1),$$
$$P_{right} = \alpha \log p(W_K) + \beta \log p(W_K \mid W_{K-1}).$$
In some examples, the method further comprises: performing OCR translation on the new word string.
An embodiment of the second aspect of the present invention provides an OCR character recognition system, comprising: a recognition module, configured to perform OCR character recognition on an image in a target area selected by a user to obtain a recognized word string, wherein the word string comprises K sub-word strings, each sub-word string comprises at least one character, and K is a positive integer; a computing module, configured to calculate the number of sub-word strings in the recognized word string; and a denoising module, configured to determine, when the number of sub-word strings in the word string is greater than 2, whether the number of characters in the 1st sub-word string W1 and the number of characters in the K-th sub-word string WK are less than a preset value; if so, to determine whether the noise probability score of W1 and/or the noise probability score of WK is greater than a preset noise threshold; and, if so, to determine that W1 and/or WK is noise and delete W1 and/or WK from the word string to obtain a new word string.
According to the OCR character recognition system of the embodiment of the present invention, noise reduction is applied to the OCR recognition result before translation, so that OCR noise, which is usually introduced by user misoperation, can be identified and deleted. After denoising, the translation result is cleaner and more accurate, which improves the user experience.
In addition, the OCR character recognition system according to the above embodiment of the present invention may also have the following additional technical features:
In some examples, the denoising module is further configured to: if the number of sub-word strings in the word string equals 2, determine whether the number of characters in W1 is less than the number of characters in WK; if the number of characters in W1 is less than the number of characters in WK, further determine whether the number of characters in W1 is less than the preset value; if the number of characters in W1 is less than the preset value, further determine whether the noise probability score of W1 is greater than the preset noise threshold; and, if so, determine that W1 is noise and delete W1 from the word string to obtain a new word string.
In some examples, the denoising module is further configured to: if the number of characters in W1 is greater than the number of characters in WK, further determine whether the number of characters in WK is less than the preset value; if the number of characters in WK is less than the preset value, further determine whether the noise probability score of WK is greater than the preset noise threshold; and, if so, determine that WK is noise and delete WK from the word string to obtain a new word string.
In some examples, the noise probability score is obtained by the following formulas:
$$P_{left} = \alpha \log p(W_1) + \beta \log p(W_2 \mid W_1),$$
$$P_{right} = \alpha \log p(W_K) + \beta \log p(W_K \mid W_{K-1}).$$
In some examples, the system further comprises: a translation module, configured to perform OCR translation on the new word string.
Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic diagram of an interface for OCR recognition and translation;
Fig. 2 is a flowchart of an OCR character recognition method according to one embodiment of the present invention;
Fig. 3 is a flowchart of an OCR character recognition method according to another embodiment of the present invention; and
Fig. 4 is a structural diagram of an OCR character recognition system according to one embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and should not be construed as limiting it.
In the description of the present invention, it should be understood that terms indicating orientation or positional relationships, such as "longitudinal", "lateral", "upper", "lower", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer", are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore should not be construed as limiting the present invention.
In the description of the present invention, it should be noted that, unless otherwise specified and limited, the terms "mounted", "connected" and "coupled" should be interpreted broadly. For example, a connection may be mechanical or electrical, may be an internal connection between two elements, and may be direct or indirect through an intermediary. A person of ordinary skill in the art can understand the specific meaning of these terms according to the specific circumstances.
The OCR character recognition method and system according to the embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of an OCR character recognition method according to one embodiment of the present invention.
As shown in Fig. 2, the OCR character recognition method according to one embodiment of the present invention comprises the following steps:
Step S201: perform OCR character recognition on the image in the target area selected by the user to obtain a recognized word string, wherein the word string comprises K sub-word strings, each sub-word string comprises at least one character, and K is a positive integer.
Step S202: calculate the number of sub-word strings in the recognized word string.
Step S203: if the number of sub-word strings in the word string is greater than 2, determine whether the number of characters in the 1st sub-word string W1 and the number of characters in the K-th sub-word string WK are less than a preset value.
Step S204: if the number of characters in W1 and/or the number of characters in WK is less than the preset value, determine whether the noise probability score of W1 and/or the noise probability score of WK is greater than a preset noise threshold.
Step S205: if so, determine that W1 and/or WK is noise and delete W1 and/or WK from the word string to obtain a new word string.
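Steps S203 to S205 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `denoise_edges`, the `noise_score` callback and the default values (preset value 3, threshold 10.5, both taken from the later example in this description) are chosen here for illustration only.

```python
from typing import Callable, List


def denoise_edges(
    words: List[str],
    noise_score: Callable[[List[str], int], float],  # hypothetical scorer: (words, index) -> score
    preset_len: int = 3,       # "preset value" for the character count
    threshold: float = 10.5,   # "preset noise threshold"
) -> List[str]:
    """Drop the first and/or last sub-word string when it looks like noise (K > 2 case)."""
    if len(words) <= 2:
        # Strings with one or two sub-word strings follow different rules.
        return list(words)

    to_drop = set()
    for idx in (0, len(words) - 1):
        # S203: only short edge sub-word strings are noise candidates.
        if len(words[idx]) < preset_len:
            # S204/S205: a candidate is noise when its score exceeds the threshold.
            if noise_score(words, idx) > threshold:
                to_drop.add(idx)
    return [w for i, w in enumerate(words) if i not in to_drop]
```

The scorer is passed in as a callback so that the length test and the threshold test stay separate from however the noise probability score is actually computed.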
In one embodiment of the present invention, the OCR character recognition method further comprises the following steps:
1. If the number of sub-word strings in the word string equals 2, determine whether the number of characters in W1 is less than the number of characters in WK.
2. If the number of characters in W1 is less than the number of characters in WK, further determine whether the number of characters in W1 is less than the preset value.
3. If the number of characters in W1 is less than the preset value, further determine whether the noise probability score of W1 is greater than the preset noise threshold.
4. If so, determine that W1 is noise and delete W1 from the word string to obtain a new word string.
Further, the method also comprises:
1. If the number of characters in W1 is greater than the number of characters in WK, further determine whether the number of characters in WK is less than the preset value.
2. If the number of characters in WK is less than the preset value, further determine whether the noise probability score of WK is greater than the preset noise threshold.
3. If so, determine that WK is noise and delete WK from the word string to obtain a new word string.
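The two-substring case above can be sketched as follows. The names `denoise_pair` and `noise_score` are hypothetical, and the defaults (3 and 10.5) are borrowed from the later example. The two lists above do not say what happens when both sub-word strings have the same length, so this sketch leaves the string unchanged in that case; that choice is an assumption.

```python
from typing import Callable, List


def denoise_pair(
    words: List[str],
    noise_score: Callable[[List[str], int], float],  # hypothetical scorer: (words, index) -> score
    preset_len: int = 3,
    threshold: float = 10.5,
) -> List[str]:
    """Handle a word string with exactly two sub-word strings (K == 2)."""
    if len(words) != 2:
        return list(words)

    first, last = words
    if len(first) < len(last):
        idx = 0   # the shorter, first sub-word string is the only candidate
    elif len(first) > len(last):
        idx = 1   # the shorter, last sub-word string is the only candidate
    else:
        return list(words)  # assumption: equal lengths are left untouched

    is_short = len(words[idx]) < preset_len
    if is_short and noise_score(words, idx) > threshold:
        return [words[1 - idx]]  # keep only the other sub-word string
    return list(words)
```

Only the shorter of the two sub-word strings is ever tested, so at most one of them is removed and the result is never empty.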
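In one embodiment of the present invention, the noise probability score is obtained by the following formulas:
$$P_{left} = \alpha \log p(W_1) + \beta \log p(W_2 \mid W_1),$$
$$P_{right} = \alpha \log p(W_K) + \beta \log p(W_K \mid W_{K-1}).$$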
After the new word string is obtained, the OCR character recognition method of the embodiment of the present invention further comprises: performing OCR translation on the new word string.
As a concrete example, suppose that in OCR translation the OCR recognition result (i.e. the recognized word string) is a word string W^K = W1 W2 W3 W4 ... W(K-2) W(K-1) WK comprising K words. Within W^K, W1 and WK may be noise introduced by user misoperation. In general, the noise on either side is no longer than one word. Noise reduction of the OCR recognition result therefore consists of calculating the noise probability scores of W1 and WK separately; if a noise probability score is greater than a certain threshold (the preset noise threshold in the example above), W1 and/or WK is determined to be noise.
As shown in Fig. 3, the specific steps for determining whether a sub-word string is noise are:
Step S301: start; the input is the word string W^K = W1 ... WK.
Step S302: determine whether K equals 1; if so, go to step S303; otherwise, go to step S304.
Step S303: return W1.
Step S304: determine whether K equals 2; if so, go to step S305; otherwise, go to step S308.
Step S305: determine whether the number of characters in W1 is less than the number of characters in W2 (that is, WK with K equal to 2), i.e. len(W1) < len(W2); if so, go to step S306; otherwise, go to step S307.
Step S306: let T = {W1}, where T denotes a set containing the single sub-word string W1.
Step S307: let T = {WK}, where T denotes a set containing the single sub-word string WK.
Step S308: let T = {W1, WK}, where T denotes a set containing the sub-word strings W1 and WK. For the example shown in Fig. 1, T = {it, I}.
Step S309: delete from the set T any word whose character length (i.e. number of characters) is greater than the preset value. Because the English words that users need translated generally contain more than 3 letters, the preset value can be set to, but is not limited to, 3.
Step S310: for each word in the set T, calculate the noise probability score NoisyScore(); if the noise probability score is greater than the threshold θ (i.e. the preset noise threshold), the sub-word string is considered noise.
Step S311: end.
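The flow of steps S301 to S311 can be sketched end to end as below. The function names `select_candidates` and `remove_noise` and the `noisy_score` callback are hypothetical. Note one detail taken literally from step S309: words longer than the preset value are discarded from T, so a word whose length equals the preset value stays a candidate, which is slightly looser than the "less than the preset value" wording used elsewhere in the description.

```python
from typing import Callable, List


def select_candidates(words: List[str]) -> List[int]:
    """Steps S302-S308: return the indices that make up the candidate set T."""
    k = len(words)
    if k <= 1:
        return []                      # S303: a single word is returned unchanged
    if k == 2:
        # S305-S307: only the shorter word is a candidate; ties fall to the last word.
        return [0] if len(words[0]) < len(words[1]) else [1]
    return [0, k - 1]                  # S308: both edge words are candidates


def remove_noise(
    words: List[str],
    noisy_score: Callable[[List[str], int], float],  # hypothetical scorer: (words, index) -> score
    preset_len: int = 3,
    theta: float = 10.5,
) -> List[str]:
    """Steps S309-S310: filter T by length, score the rest, drop the noisy words."""
    candidates = [i for i in select_candidates(words) if len(words[i]) <= preset_len]
    noisy = {i for i in candidates if noisy_score(words, i) > theta}
    return [w for i, w in enumerate(words) if i not in noisy]


if __name__ == "__main__":
    # Toy scorer for demonstration only: treats every candidate as noise.
    print(remove_noise(["it", "Obama", "I"], lambda ws, i: 11.0))
```

Working with indices rather than the words themselves avoids ambiguity when the same short word appears at both ends of the string.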
In the above example, the noise probability score NoisyScore() can be computed with a method similar to a statistical language model: for the leftmost word (W1), calculate P_left; for the rightmost word (WK), calculate P_right. The formulas are:
$$P_{left} = \alpha \log p(W_1) + \beta \log p(W_2 \mid W_1);$$
$$P_{right} = \alpha \log p(W_K) + \beta \log p(W_K \mid W_{K-1}).$$
Here p(w_i | w_{i-1}) denotes the probability of the bigram w_{i-1} w_i, which is estimated as:
$$p(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1} w_i)}{\sum_{w_i} \mathrm{count}(w_{i-1} w_i)}$$
and p(w_i) denotes the probability of the unigram w_i, which is estimated as:
$$p(w_i) = \frac{\mathrm{count}(w_i)}{\sum_{w'_i} \mathrm{count}(w'_i)}$$
where α and β are the weights of the unigram and bigram terms, with values of, but not limited to, -1 and -0.5 respectively.
Based on experimental statistics, the threshold θ (i.e. the preset noise threshold) of the noise probability score NoisyScore() can be set to 10.5.
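The count-based estimates and the two score formulas can be sketched as follows. Several details here are assumptions that the description does not specify: the natural logarithm is used, unseen words and bigrams are given a small probability floor (`floor`) so that the logarithm is defined, and the corpus is a list of already tokenized sentences. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def build_counts(corpus: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count unigrams and adjacent-word bigrams over tokenized sentences."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def p_unigram(w: str, unigrams: Counter, floor: float = 1e-9) -> float:
    """p(w) = count(w) / sum of all unigram counts, floored (assumption)."""
    total = sum(unigrams.values())
    return max(unigrams[w] / total, floor) if total else floor


def p_bigram(prev: str, w: str, bigrams: Counter, floor: float = 1e-9) -> float:
    """p(w | prev) = count(prev w) / sum over w' of count(prev w'), floored (assumption)."""
    denom = sum(c for (a, _), c in bigrams.items() if a == prev)
    return max(bigrams[(prev, w)] / denom, floor) if denom else floor


def noisy_score(words: List[str], idx: int, unigrams: Counter, bigrams: Counter,
                alpha: float = -1.0, beta: float = -0.5) -> float:
    """P_left when idx is 0, otherwise P_right for the last word; needs at least two words."""
    if idx == 0:
        uni = p_unigram(words[0], unigrams)
        bi = p_bigram(words[0], words[1], bigrams)      # p(W_2 | W_1)
    else:
        uni = p_unigram(words[-1], unigrams)
        bi = p_bigram(words[-2], words[-1], bigrams)    # p(W_K | W_{K-1})
    return alpha * math.log(uni) + beta * math.log(bi)
```

Because α and β are negative, rare words and rare word pairs produce large positive scores, which is why a score above the threshold is read as evidence of noise.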
According to the OCR character recognition method of the embodiment of the present invention, noise reduction is applied to the OCR recognition result before translation, so that OCR noise, which is usually introduced by user misoperation, can be identified and deleted. After denoising, the translation result is cleaner and more accurate, which improves the user experience.
Fig. 4 is a structural diagram of an OCR character recognition system according to one embodiment of the present invention. As shown in Fig. 4, the OCR character recognition system 400 according to one embodiment of the present invention comprises: a recognition module 410, a computing module 420 and a denoising module 430.
The recognition module 410 is configured to perform OCR character recognition on the image in the target area selected by the user to obtain a recognized word string, wherein the word string comprises K sub-word strings, each sub-word string comprises at least one character, and K is a positive integer. The computing module 420 is configured to calculate the number of sub-word strings in the recognized word string. The denoising module 430 is configured to determine, when the number of sub-word strings in the word string is greater than 2, whether the number of characters in the 1st sub-word string W1 and the number of characters in the K-th sub-word string WK are less than a preset value; if so, to determine whether the noise probability score of W1 and/or the noise probability score of WK is greater than a preset noise threshold; and, if so, to determine that W1 and/or WK is noise and delete W1 and/or WK from the word string to obtain a new word string.
In one embodiment of the present invention, the denoising module 430 is further configured to: if the number of sub-word strings in the word string equals 2, determine whether the number of characters in W1 is less than the number of characters in WK; if the number of characters in W1 is less than the number of characters in WK, further determine whether the number of characters in W1 is less than the preset value; if the number of characters in W1 is less than the preset value, further determine whether the noise probability score of W1 is greater than the preset noise threshold; and, if so, determine that W1 is noise and delete W1 from the word string to obtain a new word string.
Further, the denoising module 430 is also configured to: if the number of characters in W1 is greater than the number of characters in WK, further determine whether the number of characters in WK is less than the preset value; if the number of characters in WK is less than the preset value, further determine whether the noise probability score of WK is greater than the preset noise threshold; and, if so, determine that WK is noise and delete WK from the word string to obtain a new word string.
The noise probability score can be obtained by the following formulas:
$$P_{left} = \alpha \log p(W_1) + \beta \log p(W_2 \mid W_1),$$
$$P_{right} = \alpha \log p(W_K) + \beta \log p(W_K \mid W_{K-1}).$$
The OCR character recognition system 400 of the embodiment of the present invention may also comprise a translation module (not shown in the figure), which is configured to perform OCR translation on the new word string.
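One way to picture how the modules of system 400 fit together is the sketch below. It is an illustration under assumptions, not the structure of Fig. 4 itself: the class name `OcrTranslationPipeline`, the use of plain callables for each module, and splitting the recognized string on whitespace to obtain the sub-word strings are all choices made here for brevity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class OcrTranslationPipeline:
    recognize: Callable[[bytes], str]                   # stands in for recognition module 410
    denoise: Callable[[List[str]], List[str]]           # stands in for denoising module 430
    translate: Optional[Callable[[str], str]] = None    # optional translation module

    def run(self, image_region: bytes) -> str:
        text = self.recognize(image_region)
        # Computing module 420: the number of sub-word strings is len(words).
        # Assumption: sub-word strings are separated by whitespace.
        words = text.split()
        cleaned = " ".join(self.denoise(words))
        return self.translate(cleaned) if self.translate else cleaned


if __name__ == "__main__":
    # Stub modules for demonstration only; a real system would plug in an OCR
    # engine, a scorer-based denoiser and a machine translation service.
    pipeline = OcrTranslationPipeline(
        recognize=lambda _image: "it Obama I",
        denoise=lambda ws: ws[1:-1] if len(ws) > 2 else ws,
    )
    print(pipeline.run(b""))
```

Keeping each module behind a callable makes it possible to test the denoising step in isolation from OCR and translation.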
Specifically, with reference to Fig. 3, the OCR character recognition system 400 of the embodiment of the present invention operates as follows:
Suppose that in OCR translation the OCR recognition result (i.e. the recognized word string) is a word string W^K = W1 W2 W3 W4 ... W(K-2) W(K-1) WK comprising K words. Within W^K, W1 and WK may be noise introduced by user misoperation. In general, the noise on either side is no longer than one word. Noise reduction of the OCR recognition result therefore consists of calculating the noise probability scores of W1 and WK separately; if a noise probability score is greater than a certain threshold (the preset noise threshold in the example above), W1 and/or WK is determined to be noise.
As shown in Fig. 3, the specific processing comprises:
Step S301: start; the input is the word string W^K = W1 ... WK.
Step S302: determine whether K equals 1; if so, go to step S303; otherwise, go to step S304.
Step S303: return W1.
Step S304: determine whether K equals 2; if so, go to step S305; otherwise, go to step S308.
Step S305: determine whether the number of characters in W1 is less than the number of characters in W2 (that is, WK with K equal to 2), i.e. len(W1) < len(W2); if so, go to step S306; otherwise, go to step S307.
Step S306: let T = {W1}, where T denotes a set containing the single sub-word string W1.
Step S307: let T = {WK}, where T denotes a set containing the single sub-word string WK.
Step S308: let T = {W1, WK}, where T denotes a set containing the sub-word strings W1 and WK. For the example shown in Fig. 1, T = {it, I}.
Step S309: delete from the set T any word whose character length (i.e. number of characters) is greater than the preset value. Because the English words that users need translated generally contain more than 3 letters, the preset value can be set to, but is not limited to, 3.
Step S310: for each word in the set T, calculate the noise probability score NoisyScore(); if the noise probability score is greater than the threshold θ (i.e. the preset noise threshold), the sub-word string is considered noise.
Step S311: end.
In the above example, the noise probability score NoisyScore() can be computed with a method similar to a statistical language model: for the leftmost word (W1), calculate P_left; for the rightmost word (WK), calculate P_right. The formulas are:
$$P_{left} = \alpha \log p(W_1) + \beta \log p(W_2 \mid W_1);$$
$$P_{right} = \alpha \log p(W_K) + \beta \log p(W_K \mid W_{K-1}).$$
Here p(w_i | w_{i-1}) denotes the probability of the bigram w_{i-1} w_i, which is estimated as:
$$p(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1} w_i)}{\sum_{w_i} \mathrm{count}(w_{i-1} w_i)}$$
and p(w_i) denotes the probability of the unigram w_i, which is estimated as:
$$p(w_i) = \frac{\mathrm{count}(w_i)}{\sum_{w'_i} \mathrm{count}(w'_i)}$$
where α and β are the weights of the unigram and bigram terms, with values of, but not limited to, -1 and -0.5 respectively.
Based on experimental statistics, the threshold θ (i.e. the preset noise threshold) of the noise probability score NoisyScore() can be set to 10.5.
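A short worked calculation may help show how the score compares with the threshold. The probabilities below are invented purely for illustration, and the natural logarithm is assumed because the description does not state the base. With α = -1 and β = -0.5, a rare leftmost word scores above θ = 10.5, while a common one does not:

```latex
% Hypothetical rare edge word: p(W_1) = 10^{-4}, p(W_2 \mid W_1) = 10^{-2}
P_{left} = -\ln(10^{-4}) - 0.5\,\ln(10^{-2}) = 4\ln 10 + \ln 10 = 5\ln 10 \approx 11.51 > 10.5
% Hypothetical common edge word: p(W_1) = 10^{-2}, p(W_2 \mid W_1) = 10^{-1}
P_{left} = -\ln(10^{-2}) - 0.5\,\ln(10^{-1}) = 2\ln 10 + 0.5\ln 10 = 2.5\ln 10 \approx 5.76 < 10.5
```

Under these assumed values the first word would be treated as noise and deleted, and the second would be kept.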
According to the OCR character recognition system of the embodiment of the present invention, noise reduction is applied to the OCR recognition result before translation, so that OCR noise, which is usually introduced by user misoperation, can be identified and deleted. After denoising, the translation result is cleaner and more accurate, which improves the user experience.
In this specification, a description that refers to the terms "an embodiment", "some embodiments", "an example", "a concrete example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such terms do not necessarily refer to the same embodiment or example, and the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (10)

1. An OCR character recognition method, characterized in that it comprises the following steps:
performing OCR character recognition on an image in a target area selected by a user to obtain a recognized word string, wherein the word string comprises K sub-word strings, each sub-word string comprises at least one character, and K is a positive integer;
calculating the number of sub-word strings in the recognized word string;
if the number of sub-word strings in the word string is greater than 2, determining whether the number of characters in the 1st sub-word string W1 and the number of characters in the K-th sub-word string WK are less than a preset value;
if the number of characters in W1 and/or the number of characters in WK is less than the preset value, determining whether the noise probability score of W1 and/or the noise probability score of WK is greater than a preset noise threshold;
if so, determining that W1 and/or WK is noise and deleting W1 and/or WK from the word string to obtain a new word string.
2. The OCR character recognition method according to claim 1, characterized in that it further comprises:
if the number of sub-word strings in the word string equals 2, determining whether the number of characters in W1 is less than the number of characters in WK;
if the number of characters in W1 is less than the number of characters in WK, further determining whether the number of characters in W1 is less than the preset value;
if the number of characters in W1 is less than the preset value, further determining whether the noise probability score of W1 is greater than the preset noise threshold;
if so, determining that W1 is noise and deleting W1 from the word string to obtain a new word string.
3. The OCR character recognition method according to claim 2, characterized in that it further comprises:
if the number of characters in W1 is greater than the number of characters in WK, further determining whether the number of characters in WK is less than the preset value;
if the number of characters in WK is less than the preset value, further determining whether the noise probability score of WK is greater than the preset noise threshold;
if so, determining that WK is noise and deleting WK from the word string to obtain a new word string.
4. The OCR character recognition method according to claim 1, characterized in that the noise probability score is obtained by the following formulas: $P_{left} = \alpha \log p(W_1) + \beta \log p(W_2 \mid W_1)$, $P_{right} = \alpha \log p(W_K) + \beta \log p(W_K \mid W_{K-1})$.
5. according to the OCR character identifying method described in claim 1-4 any one, it is characterized in that, also comprise: described new word string is carried out to OCR translation.
6. an OCR character recognition system, is characterized in that, comprising:
Identification module, carries out OCR character recognition to obtain the word string of identification for the image in the target area that user is selected, and wherein, institute's predicate string comprises K sub-word string, and every sub-word string at least comprises 1 character, and described K is positive integer;
Computing module, for calculating the quantity of the word string neutron word string of described identification;
Denoising module, is greater than 2 for the quantity at institute's predicate string neutron word string, judges described the 1st sub-word string W 1the number of middle character and described K sub-word string W kwhether the number of middle character is less than preset value, if while being less than described preset value, judges described W 1noise probability score and/or described W knoise probability score whether be greater than default noise, if be greater than described default noise, judge described W 1and/or described W kfor noise and from institute's predicate string, delete described W 1and/or described W kto obtain new word string.
7. The OCR character recognition system according to claim 6, characterized in that the denoising module is further configured to:
If the number of sub-word strings in the word string equals 2, judge whether the number of characters in W_1 is less than the number of characters in W_K;
If the number of characters in W_1 is less than the number of characters in W_K, further judge whether the number of characters in W_1 is less than the preset value;
If the number of characters in W_1 is less than the preset value, further judge whether the noise probability score of W_1 is greater than the preset noise;
If so, determine that W_1 is noise and delete W_1 from the word string to obtain a new word string.
8. The OCR character recognition system according to claim 7, characterized in that the denoising module is further configured to:
If the number of characters in W_1 is greater than the number of characters in W_K, further judge whether the number of characters in W_K is less than the preset value;
If the number of characters in W_K is less than the preset value, further judge whether the noise probability score of W_K is greater than the preset noise;
If so, determine that W_K is noise and delete W_K from the word string to obtain a new word string.
9. The OCR character recognition system according to claim 6, characterized in that the noise probability score is obtained by the following formulas: $P_{\text{left}} = \alpha \log p(W_1) + \beta \log p(W_2 \mid W_1)$, $P_{\text{right}} = \alpha \log p(W_K) + \beta \log p(W_K \mid W_{K-1})$.
10. The OCR character recognition system according to any one of claims 6-9, characterized by further comprising:
A translation module, configured to perform OCR translation on the new word string.
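The denoising logic recited in the claims above can be sketched roughly as follows. This is a minimal illustration in Python, not the patented implementation. The names `p_left`, `p_right`, `denoise`, `preset_len`, and `preset_noise` are hypothetical, the language model is passed in as plain callables (`unigram`, `bigram`) because the claims do not specify how p(·) is estimated, and the "and/or" wording is read here as checking each end of the word string independently. The comparison direction (a score greater than the preset noise marks the sub-word string as noise) follows the claim wording as given.

```python
import math
from typing import Callable, List

# Hypothetical language-model interfaces (assumptions, not from the patent):
Unigram = Callable[[str], float]      # w -> p(w)
Bigram = Callable[[str, str], float]  # (w, prev) -> p(w | prev)
Score = Callable[[List[str]], float]  # sub-word strings -> noise probability score


def p_left(subs: List[str], unigram: Unigram, bigram: Bigram,
           alpha: float = 1.0, beta: float = 1.0) -> float:
    """P_left = alpha * log p(W_1) + beta * log p(W_2 | W_1)."""
    return (alpha * math.log(unigram(subs[0]))
            + beta * math.log(bigram(subs[1], subs[0])))


def p_right(subs: List[str], unigram: Unigram, bigram: Bigram,
            alpha: float = 1.0, beta: float = 1.0) -> float:
    """P_right = alpha * log p(W_K) + beta * log p(W_K | W_{K-1})."""
    return (alpha * math.log(unigram(subs[-1]))
            + beta * math.log(bigram(subs[-1], subs[-2])))


def denoise(subs: List[str], left_score: Score, right_score: Score,
            preset_len: int, preset_noise: float) -> List[str]:
    """Drop W_1 and/or W_K when they are short and score as noise."""
    k = len(subs)

    def is_noise(sub: str, score: Score) -> bool:
        # Short sub-word string whose score exceeds the preset noise.
        return len(sub) < preset_len and score(subs) > preset_noise

    drop_left = drop_right = False
    if k > 2:
        # More than two sub-word strings: test both ends.
        drop_left = is_noise(subs[0], left_score)
        drop_right = is_noise(subs[-1], right_score)
    elif k == 2:
        # Exactly two: only the shorter sub-word string is a candidate.
        if len(subs[0]) < len(subs[1]):
            drop_left = is_noise(subs[0], left_score)
        elif len(subs[0]) > len(subs[1]):
            drop_right = is_noise(subs[1], right_score)
    # Fewer than two sub-word strings: nothing is removed.
    return subs[(1 if drop_left else 0):(k - 1 if drop_right else k)]
```

In use, `left_score` and `right_score` could be built from `p_left` and `p_right` with `functools.partial`, binding whatever language model and weights α, β are available; keeping the scorers as parameters leaves the decision procedure independent of how the probabilities are estimated.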
CN201310752624.4A 2013-12-31 2013-12-31 OCR (optical character recognition) character recognition method and system Active CN103679165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310752624.4A CN103679165B (en) 2013-12-31 2013-12-31 OCR (optical character recognition) character recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310752624.4A CN103679165B (en) 2013-12-31 2013-12-31 OCR (optical character recognition) character recognition method and system

Publications (2)

Publication Number Publication Date
CN103679165A (en) 2014-03-26
CN103679165B (en) 2017-02-08

Family

ID=50316655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310752624.4A Active CN103679165B (en) 2013-12-31 2013-12-31 OCR (optical character recognition) character recognition method and system

Country Status (1)

Country Link
CN (1) CN103679165B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599857A (en) * 2016-12-20 2017-04-26 广东欧珀移动通信有限公司 Image identification method, apparatus, computer-readable storage medium and terminal device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5448474A (en) * 1993-03-03 1995-09-05 International Business Machines Corporation Method for isolation of Chinese words from connected Chinese text
CN1477559A (en) * 2002-08-23 2004-02-25 华为技术有限公司 Method for implementing long character string prefix matching
US20100008582A1 (en) * 2008-07-10 2010-01-14 Samsung Electronics Co., Ltd. Method for recognizing and translating characters in camera-based image
CN103186587A (en) * 2011-12-30 2013-07-03 牟颖 Method for quickly translating English word of book through mobile phone

Also Published As

Publication number Publication date
CN103679165B (en) 2017-02-08

Similar Documents

Publication Publication Date Title
US9262412B2 (en) Techniques for predictive input method editors
Fowler et al. Effects of language modeling and its personalization on touchscreen typing performance
US9785631B2 (en) Identification and extraction of acronym/definition pairs in documents
CN114399769B (en) Training method of text recognition model, and text recognition method and device
CN107832229A (en) A kind of system testing case automatic generating method based on NLP
US10325018B2 (en) Techniques for scheduling language models and character recognition models for handwriting inputs
WO2014190732A1 (en) Method and apparatus for building a language model
US20200327886A1 (en) Method for creating a knowledge base of components and their problems from short text utterances
CN104133561A (en) Auxiliary information display method and device based on input method
US20210406464A1 (en) Skill word evaluation method and device, electronic device, and non-transitory computer readable storage medium
CN111369980A (en) Voice detection method and device, electronic equipment and storage medium
CN113642316A (en) Chinese text error correction method and device, electronic equipment and storage medium
CN103679165A (en) OCR (optical character recognition) character recognition method and system
CN110929514B (en) Text collation method, text collation apparatus, computer-readable storage medium, and electronic device
CN112528628A (en) Text processing method and device and electronic equipment
US9014477B2 (en) Method and apparatus for automatically identifying character segments for character recognition
CN116541494A (en) Model training method, device, equipment and medium for replying information
CN104134064B (en) Character recognition method and device
CN115547508A (en) Data correction method, data correction device, electronic equipment and storage medium
JP2012093968A (en) Character recognition apparatus and character recognition method, recognition character correction apparatus and recognition character correction method and program
CN111339776B (en) Resume parsing method and device, electronic equipment and computer-readable storage medium
CN110728137B (en) Method and device for word segmentation
KR20140002171A (en) Method for interpreting automatically
CN109614621A (en) A kind of method, device and equipment correcting text
CN113688625A (en) Language identification method and device

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant