CN1300740C - Postal coding numberical string identifying method - Google Patents
- Publication number
- CN1300740C (application CNB2005100235506A / CN200510023550A)
- Authority
- CN
- China
- Prior art keywords
- centerdot
- recognition result
- probability
- prime
- sample
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Character Discrimination (AREA)
Abstract
The present invention relates to a postal code digit-string recognition method comprising the following steps. The images of the N characters of a postal code, $X = (x_1, \ldots, x_n, \ldots, x_N)$, are input to each of K independent isolated-character recognition classifiers $e_k$. Each classifier recognizes an input character image $x_n$ as one of the postal code classes $(c_1, \ldots, c_m, \ldots, c_M)$ or rejects it. The probability $P(x \in C_m \mid e_k(x) = m')$ that the input pattern belongs to class $c_m$ when the recognition result is m' is calculated. From this probability, the probability $p(D \mid X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$ is calculated, where D is a valid postal code in the postal code dictionary library Ω. The recognition result of the input pattern is then determined from $p(D \mid X)$. The voting rule of the method exploits the strengths of each classifier according to its own characteristics. Prior knowledge of each classifier's recognition performance is obtained from statistics over a large number of samples and serves as the basis for voting, so that the combined result achieves a high recognition rate and high confidence. This improves the accuracy of postal code digit-string recognition.
Description
Technical field
The present invention relates to a method for recognizing postal code digit strings.
Background technology
After decades of development, optical character recognition has gradually become practical, yet recognition systems are still expected to achieve better performance. To improve recognition rate and confidence, high-performance recognition systems increasingly combine multiple information sources, multiple feature extraction methods, and multiple recognition methods.
A simple existing approach to combining multiple classifiers for postal code digit strings is voting, for example the majority voting rule and the unanimity rule. However, these voting rules ignore the characteristics of the individual classifiers and follow a "one classifier, one vote" principle. In practice, because the classifiers use different features, rest on different principles and methods, or are trained on different samples, their recognition performance differs and is to some extent complementary: each classifier recognizes each class with a different degree of ability.
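The "one classifier, one vote" rules mentioned above can be sketched as follows. This is a minimal Python illustration, not taken from the patent; the function names and the use of `None` for a rejected character are assumptions. Every classifier's vote counts equally, regardless of how reliable that classifier is for the class it votes for.

```python
from collections import Counter

REJECT = None  # assumption: a rejected character is represented as None

def majority_vote(labels):
    # One classifier, one vote: accept a label only if more than half of
    # the K classifiers output it; otherwise reject the character.
    votes = Counter(label for label in labels if label is not REJECT)
    if not votes:
        return REJECT
    label, count = votes.most_common(1)[0]
    return label if count > len(labels) / 2 else REJECT

def unanimous_vote(labels):
    # Unanimity rule: accept a label only if every classifier output it.
    if not labels:
        return REJECT
    first = labels[0]
    if first is not REJECT and all(label == first for label in labels):
        return first
    return REJECT
```

For instance, `majority_vote([3, 3, 8])` returns 3, while `unanimous_vote([3, 3, 8])` rejects the character.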
Conventional multiple-classifier combination focuses on combining isolated-character recognition results, with the aim of optimizing single-character recognition. As shown in Fig. 1, an input sample $x_n$ is recognized by K recognition classifiers, yielding K recognition results $s_n^{(k)}$ (k = 1, 2, …, K); a multiple-classifier decision then combines them into the final recognition result $C_n$. This combination ignores the context of the character string: the sequence of combined results $(C_1, \ldots, C_n, \ldots, C_N)$ for the characters of the string is passed to a dictionary library, which checks whether the recognition result of the whole string is valid, as shown in Fig. 2.
In some practical applications, the goal is optimal recognition of the whole character string rather than of each single character, because optimal single-character recognition does not necessarily yield optimal whole-string recognition. In postal code recognition, for example, the result can be used by an automatic mail sorting machine only if all six digits are recognized correctly, so recognition of the whole postal code digit string must be optimized.
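A toy example with made-up numbers shows how the character-wise optimum can fail at the string level. The sketch assumes each position already has a combined class-probability table and scores a dictionary entry by the product of its per-character probabilities; this scoring is an illustrative assumption, not the patent's formula, and the 3-digit codes stand in for real 6-digit postal codes.

```python
import math

# Hypothetical combined class probabilities for each position of a toy
# 3-digit code (real Chinese postal codes have N = 6 digits).
probs = [
    {"2": 0.6, "7": 0.4},
    {"0": 0.55, "6": 0.45},
    {"1": 0.9, "7": 0.1},
]
dictionary = {"261", "701"}  # hypothetical set of valid codes

def charwise_result(probs):
    # Prior-art style: best class at each position, chosen independently.
    return "".join(max(p, key=p.get) for p in probs)

def stringwise_result(probs, dictionary):
    # String level: the valid code whose characters are jointly most probable
    # (product of per-position probabilities; an illustrative assumption).
    def joint(code):
        return math.prod(p.get(ch, 0.0) for p, ch in zip(probs, code))
    return max(dictionary, key=joint)
```

Here `charwise_result(probs)` returns `"201"`, which a dictionary check can only reject, while `stringwise_result(probs, dictionary)` returns `"261"` (0.6 × 0.45 × 0.9 = 0.243, versus 0.4 × 0.55 × 0.9 = 0.198 for `"701"`).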
Summary of the invention
The object of the present invention is to provide a postal code digit-string recognition method based on knowledge-base-driven combination of multiple classifiers.
To achieve this object, the present invention adopts the following technical solution.
A postal code digit-string recognition method comprises the following steps:
(1) The images $X = (x_1, \ldots, x_n, \ldots, x_N)$ of the N postal code characters are input to each of K independent isolated-character recognition classifiers $e_k$, where N and K are positive integers greater than 1. For Chinese postal codes, N = 6.
(2) Each isolated-character recognition classifier $e_k$ recognizes the input character image $x_n$ as one of the postal code classes $\{c_1, \ldots, c_m, \ldots, c_M\}$, or rejects it, which is denoted $c_{M+1}$, where M is a positive integer greater than 1. Each postal code class $c_m$ is one of the digits 0 to 9, so M = 10.
(3) Calculate the probability $P(x \in C_m \mid e_k(x) = m')$ that the input pattern belongs to class $c_m$ when the recognition result is m'.
(4) From $P(x \in C_m \mid e_k(x) = m')$, calculate the probability $p(D \mid X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$, where D is a valid postal code in the postal code dictionary library Ω.
(5) Determine the recognition result of the input pattern from the probability $p(D \mid X)$.
In a preferred embodiment, the probability $P(x \in C_m \mid e_k(x) = m')$ in step (3), that the input pattern belongs to class $c_m$ when the recognition result is m', can be computed as follows.
Sample statistics are collected on the recognition results of each isolated-character recognition classifier $e_k$ to form the confusion matrix of its recognition behavior:
$$CM_k = \begin{pmatrix} n_{11}^{(k)} & n_{12}^{(k)} & \cdots & n_{1,M+1}^{(k)} \\ n_{21}^{(k)} & n_{22}^{(k)} & \cdots & n_{2,M+1}^{(k)} \\ \vdots & \vdots & \ddots & \vdots \\ n_{M1}^{(k)} & n_{M2}^{(k)} & \cdots & n_{M,M+1}^{(k)} \end{pmatrix}$$
where $n_{mm'}^{(k)}$ is the number of samples of class $C_m$ that classifier $e_k$ recognizes as class $C_{m'}$. Its meaning is:
(a) when m = m', the number of samples of class $C_m$ that $e_k$ recognizes correctly;
(b) when m' = M+1, the number of samples of class $C_m$ that $e_k$ rejects;
(c) when m ≠ m' and m' ≠ M+1, the number of samples of class $C_m$ that $e_k$ misrecognizes as class $C_{m'}$.
The total number of samples for which classifier $e_k$ outputs the recognition result $m' = e_k(x)$ is:
$$n_{\cdot m'}^{(k)} = \sum_{m=1}^{M} n_{mm'}^{(k)}$$
Given that the recognition result of classifier $e_k$ is m', the probability that the sample comes from class $C_m$ is:
$$P(x \in C_m \mid e_k(x) = m') = \frac{n_{mm'}^{(k)}}{n_{\cdot m'}^{(k)}}, \quad m = 1, 2, \ldots, M$$
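The confusion-matrix statistics can be sketched in Python as follows. This is a minimal illustration under stated assumptions: classes are 0-indexed (0 … M−1) instead of 1 … M, index M stands for the rejection class $c_{M+1}$, a rejected sample is passed in as `None`, and returning all-zero probabilities for an output that never occurred in the statistics sample is a hypothetical choice the text does not specify.

```python
def confusion_matrix(true_labels, predicted, M):
    # n[m][m'] counts samples of true class m (0..M-1) that classifier e_k
    # output as class m'; column M counts rejections (predicted None).
    n = [[0] * (M + 1) for _ in range(M)]
    for m, m_prime in zip(true_labels, predicted):
        n[m][M if m_prime is None else m_prime] += 1
    return n

def posterior(n, m_prime):
    # P(x in C_m | e_k(x) = m') = n[m][m'] / n_{.m'} for every true class m,
    # where n_{.m'} is the column total for output m'.
    column_total = sum(row[m_prime] for row in n)
    if column_total == 0:
        return [0.0] * len(n)  # assumption: no evidence for this output
    return [row[m_prime] / column_total for row in n]
```

With true labels `[0, 0, 1, 1, 1]` and outputs `[0, 1, 1, 1, None]`, the matrix is `[[1, 1, 0], [0, 2, 1]]`, and `posterior(n, 1)` gives `[1/3, 2/3]`: when this classifier outputs class 1, the sample really belongs to class 1 only two times out of three.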
In another preferred embodiment, the probability $p(D \mid X)$ in step (4), that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$, is computed from $P(x \in C_m \mid e_k(x) = m')$ as follows.
Assume that the samples used to generate the confusion matrix $CM_k$ are sufficient and reflect the spatial distribution of the recognition results. $CM_k$ is then used as prior knowledge when combining the classifiers, i.e., $P(x \in C_m \mid e_k(x) = m')$ serves as the score of classifier $e_k$ when voting, and the score for $x \in C_m$ is:
$$s^{(k)}(x \in C_m) = P(x \in C_m \mid e_k(x) = m'), \quad m = 1, 2, \ldots, M$$
Let f(D) denote the frequency with which postal code D occurs; the score S(D|X) of X with respect to D is then computed as follows:
Finally, the probability that X belongs to D is $p(D \mid X) = e^{f(D)} \cdot S(D \mid X)$.
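The sketch below assumes one plausible form of S(D|X) purely for illustration: the product over positions n of the average, over the K classifiers, of $s^{(k)}(x_n \in C_{d_n})$. It then applies $p(D \mid X) = e^{f(D)} \cdot S(D \mid X)$ as stated in the text. The function names, the `scores[n][k][m]` layout, the averaging step, and the default f(D) = 0 for unseen codes are all hypothetical; postal codes are represented as tuples of digit indices.

```python
import math

def score_S(code, scores):
    # ASSUMED form of S(D|X) (hypothetical): product over positions n of the
    # mean over classifiers k of s^(k)(x_n in C_{d_n}).
    # scores[n][k][m] holds s^(k)(x_n in C_m) = P(x_n in C_m | e_k(x_n) = m'_k),
    # where m'_k is the label classifier e_k actually output for x_n.
    return math.prod(
        sum(per_class[d] for per_class in scores[n]) / len(scores[n])
        for n, d in enumerate(code)
    )

def p_string(code, scores, f):
    # p(D|X) = e^{f(D)} * S(D|X); f maps a code to its occurrence frequency,
    # and codes missing from f get f(D) = 0 (an assumption).
    return math.exp(f.get(code, 0.0)) * score_S(code, scores)

def rank_dictionary(dictionary, scores, f):
    # Evaluate every valid code D in the dictionary Omega, best first.
    return sorted(((p_string(D, scores, f), D) for D in dictionary), reverse=True)
```

In a real system the dictionary Ω would hold every valid six-digit code, and the ranked list would feed the decision step.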
In a further preferred embodiment, the recognition result of the input pattern in step (5) is determined from $p(D \mid X)$ as follows.
If there exists D ∈ Ω such that p(D|X) is the maximum among the recognition results and p(D|X) > α, then X = D, i.e., the recognition result is D. Here α is a threshold that trades off rejection against misrecognition (α = 0.5).
If there exists D ∈ Ω such that p(D|X) is the maximum among the recognition results, and there exists D' ∈ Ω such that p(D'|X) is second only to the maximum p(D|X), and p(D|X) − p(D'|X) > β, where β is a constant (β = 0.2), then X = D, i.e., the recognition result is D.
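The two decision rules can be sketched as below with α = 0.5 and β = 0.2. Evaluating the second rule only when the first fails, rejecting the string when neither rule fires, and treating a missing runner-up as probability 0 are assumptions about details the text leaves open; `decide` is a hypothetical name.

```python
def decide(probabilities, alpha=0.5, beta=0.2):
    # probabilities: dict mapping each dictionary code D to p(D|X).
    # Returns the accepted code, or None to reject the whole string.
    if not probabilities:
        return None
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    best, p_best = ranked[0]
    # Rule 1: the best candidate is accepted if p(D|X) > alpha.
    if p_best > alpha:
        return best
    # Rule 2: otherwise accept it if it beats the runner-up D' by more than beta.
    p_second = ranked[1][1] if len(ranked) > 1 else 0.0
    if p_best - p_second > beta:
        return best
    return None
```

For example, `{"200031": 0.45, "200032": 0.2}` fails the first rule but passes the second (margin 0.25), whereas `{"200031": 0.4, "200032": 0.3}` is rejected.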
In the postal code digit-string recognition method of the present invention, the voting rule exploits the strengths of each classifier according to its own characteristics. Prior knowledge of each classifier's recognition performance is obtained from statistics over a large number of samples and used as the basis for voting, so that the combined result achieves a high recognition rate and high confidence. This improves the accuracy of postal code digit-string recognition.
Description of drawings
The present invention is further described below with reference to the drawings and an embodiment.
Fig. 1 is a block diagram of isolated-character recognition by multiple-classifier combination in the prior art.
Fig. 2 is a block diagram of dictionary verification of recognition results in the prior art.
Fig. 3 is a functional block diagram of the method of the invention.
Embodiment
As shown in Fig. 3, the sequence to be recognized, $X = (x_1, \ldots, x_n, \ldots, x_N)$, is recognized by the isolated-character recognition classifiers $e_k$; a decision is then made by combining the dictionary library with the occurrence probabilities, giving the final recognition result sequence $(d_1, d_2, \ldots, d_N)$.
A postal code digit-string recognition method comprises the following steps:
(1) The images $X = (x_1, \ldots, x_n, \ldots, x_N)$ of the N postal code characters are input simultaneously to K independent isolated-character recognition classifiers. For Chinese postal codes, N = 6.
(2) Each isolated-character recognition classifier $e_k$ recognizes the input character image $x_n$ and produces a recognition result: the classifier assigns the input pattern to one of the classes $\{c_1, \ldots, c_m, \ldots, c_M\}$, or rejects it. For postal code digits, M = 10, i.e., the recognition result can be any of {0, 1, …, 9}.
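(3) The probability that the input pattern belongs to class $c_m$ when the recognition result is m' is obtained as follows.
First, the recognition behavior of classifier $e_k$ is collected over a large number of samples to form the confusion matrix of that classifier:
$$CM_k = \begin{pmatrix} n_{11}^{(k)} & n_{12}^{(k)} & \cdots & n_{1,M+1}^{(k)} \\ n_{21}^{(k)} & n_{22}^{(k)} & \cdots & n_{2,M+1}^{(k)} \\ \vdots & \vdots & \ddots & \vdots \\ n_{M1}^{(k)} & n_{M2}^{(k)} & \cdots & n_{M,M+1}^{(k)} \end{pmatrix}$$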
where $n_{mm'}^{(k)}$ is the number of samples of class $C_m$ that classifier $e_k$ recognizes as class $C_{m'}$. Its meaning is:
(a) if m = m', the number of samples of class $C_m$ that $e_k$ recognizes correctly;
(b) if m' = M+1, the number of samples of class $C_m$ that $e_k$ rejects;
(c) if m ≠ m' and m' ≠ M+1, the number of samples of class $C_m$ that $e_k$ misrecognizes as class $C_{m'}$.
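For classifier $e_k$, the total number of samples with recognition result $m' = e_k(x)$ is:
$$n_{\cdot m'}^{(k)} = \sum_{m=1}^{M} n_{mm'}^{(k)}$$
Given that the recognition result of classifier $e_k$ is m', the probability that the sample comes from class $C_m$ can be expressed as a conditional probability:
$$P(x \in C_m \mid e_k(x) = m') = \frac{n_{mm'}^{(k)}}{n_{\cdot m'}^{(k)}}, \quad m = 1, 2, \ldots, M$$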
If the samples used to generate the confusion matrix $CM_k$ are sufficient and reflect the distribution of the pattern space, the matrix reflects the recognition behavior of classifier $e_k$. $CM_k$ is then used as prior knowledge when combining the classifiers, i.e., $P(x \in C_m \mid e_k(x) = m')$ serves as the score when voting, and the score for $x \in C_m$ is:
$$s^{(k)}(x \in C_m) = P(x \in C_m \mid e_k(x) = m'), \quad m = 1, 2, \ldots, M$$
(4) Calculate the probability that X belongs to a given postal code string $D = (d_1, d_2, \ldots, d_N)$:
Let $D = (d_1, d_2, \ldots, d_N)$ be a valid postal code in the postal code dictionary library Ω, and, for a specific application scenario, let f(D) denote the frequency with which postal code D occurs.
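The text does not say how f(D) is estimated. One simple option, assumed here only for illustration, is the relative frequency of each valid code in a log of previously processed mail for the target application. The `history` log and the function name are hypothetical.

```python
from collections import Counter

def code_frequencies(history, dictionary):
    # Hypothetical estimate of f(D): relative frequency of each valid code D
    # among the valid codes in a log of previously sorted mail.
    counts = Counter(code for code in history if code in dictionary)
    total = sum(counts.values())
    return {D: (counts[D] / total if total else 0.0) for D in dictionary}
```

Codes outside the dictionary Ω are ignored, and a valid code never seen in the log gets f(D) = 0.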
The score S(D|X) of X with respect to D is computed as follows:
The likelihood that X belongs to D is then expressed as:
$$p(D \mid X) = e^{f(D)} \cdot S(D \mid X)$$
(5) The optimal recognition result of the input pattern is determined by the following rules.
Rule 1:
If there exists D ∈ Ω such that
$$p(D \mid X) = \max_{D'' \in \Omega} p(D'' \mid X) \quad \text{and} \quad p(D \mid X) > \alpha,$$
then X = D,
where α is a threshold that trades off rejection against misrecognition (α = 0.5).
Rule 2:
If there exists D ∈ Ω such that
$$p(D \mid X) = \max_{D'' \in \Omega} p(D'' \mid X),$$
there exists D' ∈ Ω such that
$$p(D' \mid X) = \max_{D'' \in \Omega,\ D'' \neq D} p(D'' \mid X),$$
and p(D|X) − p(D'|X) > β,
then X = D.
Here β is a constant (β = 0.2).
Claims (6)
1. A postal code digit-string recognition method, comprising the following steps:
(1) inputting the images $X = (x_1, \ldots, x_n, \ldots, x_N)$ of the N postal code characters into each of K independent isolated-character recognition classifiers $e_k$, where N and K are positive integers greater than 1;
(2) each isolated-character recognition classifier $e_k$ recognizing the input character image $x_n$ as one of the postal code classes $\{c_1, \ldots, c_m, \ldots, c_M\}$, or rejecting it, the rejection being denoted $c_{M+1}$, where M is a positive integer greater than 1;
(3) calculating the probability $P(x \in C_m \mid e_k(x) = m')$ that the input pattern belongs to class $c_m$ when the recognition result is m';
(4) calculating, from $P(x \in C_m \mid e_k(x) = m')$, the probability $p(D \mid X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$, where D is a valid postal code in the postal code dictionary library Ω;
(5) determining the recognition result of the input pattern from the probability $p(D \mid X)$.
2. The postal code digit-string recognition method according to claim 1, characterized in that: in step (1), the number N of postal code characters is 6; and in step (2), each postal code class in $\{c_1, \ldots, c_m, \ldots, c_M\}$ is one of the digits 0 to 9.
3. The postal code digit-string recognition method according to claim 1 or 2, characterized in that in step (3), the probability $P(x \in C_m \mid e_k(x) = m')$ that the input pattern belongs to class $c_m$ when the recognition result is m' is computed by collecting sample statistics on the recognition results of each isolated-character recognition classifier $e_k$ to form the confusion matrix of its recognition behavior:
$$CM_k = \begin{pmatrix} n_{11}^{(k)} & n_{12}^{(k)} & \cdots & n_{1,M+1}^{(k)} \\ n_{21}^{(k)} & n_{22}^{(k)} & \cdots & n_{2,M+1}^{(k)} \\ \vdots & \vdots & \ddots & \vdots \\ n_{M1}^{(k)} & n_{M2}^{(k)} & \cdots & n_{M,M+1}^{(k)} \end{pmatrix}$$
where $n_{mm'}^{(k)}$ is the number of samples of class $C_m$ that classifier $e_k$ recognizes as class $C_{m'}$, meaning: (a) when m = m', the number of samples of class $C_m$ that $e_k$ recognizes correctly;
(b) when m' = M+1, the number of samples of class $C_m$ that $e_k$ rejects;
(c) when m ≠ m' and m' ≠ M+1, the number of samples of class $C_m$ that $e_k$ misrecognizes as class $C_{m'}$;
the total number of samples for which classifier $e_k$ outputs the recognition result $m' = e_k(x)$ is:
$$n_{\cdot m'}^{(k)} = \sum_{m=1}^{M} n_{mm'}^{(k)}$$
and, given that the recognition result of classifier $e_k$ is m', the probability that the sample comes from class $C_m$ is:
$$P(x \in C_m \mid e_k(x) = m') = \frac{n_{mm'}^{(k)}}{n_{\cdot m'}^{(k)}}, \quad m = 1, 2, \ldots, M$$
4. The postal code digit-string recognition method according to claim 1 or 2, characterized in that in step (4), the probability $p(D \mid X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$ is computed from $P(x \in C_m \mid e_k(x) = m')$ as follows:
assuming that the samples used to generate the confusion matrix $CM_k$ are sufficient and reflect the spatial distribution of the recognition results, $CM_k$ is used as prior knowledge when combining the classifiers, i.e., $P(x \in C_m \mid e_k(x) = m')$ serves as the score when voting, and the score for $x \in C_m$ is:
$$s^{(k)}(x \in C_m) = P(x \in C_m \mid e_k(x) = m'), \quad m = 1, 2, \ldots, M$$
letting f(D) denote the frequency with which postal code D occurs, the score S(D|X) of X with respect to D is computed as follows:
finally, the probability that X belongs to D is $p(D \mid X) = e^{f(D)} \cdot S(D \mid X)$.
5. The postal code digit-string recognition method according to claim 1 or 2, characterized in that in step (5), the recognition result of the input pattern is determined from the probability $p(D \mid X)$ as follows:
if there exists D ∈ Ω such that p(D|X) is the maximum among the recognition results and p(D|X) > α, then X = D,
i.e., the recognition result is D, where α is a threshold that trades off rejection against misrecognition;
if there exists D ∈ Ω such that p(D|X) is the maximum among the recognition results, and there exists D' ∈ Ω such that p(D'|X) is second only to the maximum p(D|X), and p(D|X) − p(D'|X) > β, where β is a constant, then X = D, i.e., the recognition result is D.
6. The postal code digit-string recognition method according to claim 5, characterized in that α and β are 0.5 and 0.2, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2005100235506A CN1300740C (en) | 2005-01-25 | 2005-01-25 | Postal coding numberical string identifying method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1645408A CN1645408A (en) | 2005-07-27 |
CN1300740C true CN1300740C (en) | 2007-02-14 |
Family ID: 34875908
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2005100235506A Expired - Fee Related CN1300740C (en) | 2005-01-25 | 2005-01-25 | Postal coding numberical string identifying method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1300740C (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100452042C (en) * | 2006-06-23 | 2009-01-14 | 腾讯科技(深圳)有限公司 | Digital string fuzzy match method |
CN101894266A (en) * | 2010-06-30 | 2010-11-24 | 北京捷通华声语音技术有限公司 | Handwriting recognition method and system |
CN110443159A (en) * | 2019-07-17 | 2019-11-12 | 新华三大数据技术有限公司 | Digit recognition method, device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0929179A (en) * | 1995-07-17 | 1997-02-04 | Toshiba Corp | Addressee reader |
CN1154879A (en) * | 1996-12-19 | 1997-07-23 | 邮电部第三研究所 | Process and apparatus for recognition of postcode in course of letter sorting |
JPH1034089A (en) * | 1996-07-30 | 1998-02-10 | Toshiba Corp | Video coding device |
US6269171B1 (en) * | 1995-04-12 | 2001-07-31 | Lockheed Martin Corporation | Method for exploiting correlated mail streams using optical character recognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1207664C (en) | Error correcting method for voice identification result and voice identification system | |
CN1276381C (en) | Region detecting method and region detecting apparatus | |
CN101122953B (en) | Picture words segmentation method | |
CN1163841C (en) | On-line hand writing Chinese character distinguishing device | |
CN102346847B (en) | License plate character recognizing method of support vector machine | |
CN1302456C (en) | Sound veins identifying method | |
CN1120757C (en) | Method and device for recognition of delivery data on mail matter | |
CN101067808A (en) | Text key word extracting method | |
CN1222871A (en) | Method of processing postal matters | |
CN101059870A (en) | Image cutting method based on attribute histogram | |
CN1300740C (en) | Postal coding numberical string identifying method | |
CN1945628A (en) | Video frequency content expressing method based on space-time remarkable unit | |
CN101079044A (en) | Similarity measurement method for audio-frequency fragments | |
CN100390815C (en) | Template optimized character recognition method and system | |
CN101046809A (en) | New word identification method based on association rule model | |
CN1367460A (en) | Character string identification device, character string identification method and storage medium thereof | |
CN1388947A (en) | Character recognition system | |
CN1227373A (en) | Handwriting verification device | |
CN1545067A (en) | A method for compressing digitalized archive file using computer | |
CN1734466A (en) | The character recognition device and the character identifying method that are used for the character of recognition image | |
CN1167956A (en) | Method and device for recognition of similar writing | |
CN1488119A (en) | Resolution enhancement by nearest neighbor classified filtering | |
CN1838150A (en) | Probabilistic boosting tree structure for learned discriminative models | |
CN1916938A (en) | Identifying distance regulator and method thereof and text lines identifier and method thereof | |
CN1186744C (en) | Chinese character recognizing method based on structure model |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20070214; termination date: 20150125 |
EXPY | Termination of patent right or utility model | |