CN1300740C - Postal coding numerical string identifying method - Google Patents

Postal coding numerical string identifying method

Info

Publication number
CN1300740C
Authority
CN
China
Prior art keywords
centerdot
recognition result
probability
prime
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2005100235506A
Other languages
Chinese (zh)
Other versions
CN1645408A (en)
Inventor
吕岳
邬建中
文颖
原晓梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI INST OF POSTAL SCIENCE
Original Assignee
SHANGHAI INST OF POSTAL SCIENCE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI INST OF POSTAL SCIENCE
Priority to CNB2005100235506A
Publication of CN1645408A
Application granted
Publication of CN1300740C
Expired - Fee Related
Anticipated expiration

Landscapes

  • Character Discrimination (AREA)

Abstract

The present invention relates to a postal code numeric string recognition method comprising the following steps. The image sequence $X = (x_1, \ldots, x_n, \ldots, x_N)$ of an N-character postal code is input to each of K independent single-character recognition classifiers $e_k$. Each classifier either recognizes an input character image $x_n$ as one of the postal code characters $\{c_1, \ldots, c_m, \ldots, c_M\}$ or rejects it. For a recognition result $m'$, the probability that the input pattern belongs to class $c_m$, $P(x \in C_m / e_k(x) = m')$, is calculated. From these probabilities, the probability $p(D|X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$ is calculated, where D is a valid postal code in the postal code dictionary $\Omega$. The recognition result of the input pattern is then decided according to $p(D|X)$. The voting rule of this method exploits the strengths of each classifier according to that classifier's own characteristics: prior knowledge of each classifier's recognition performance is obtained from statistics over a large number of samples and is used as the basis for voting. The combined result can thereby reach a high recognition rate and high confidence, which improves the accuracy of postal code numeric string recognition.

Description

Postal coding numerical string identifying method
Technical field
The present invention relates to a method for recognizing postal code numeric strings.
Background art
After decades of development, optical character recognition is gradually becoming practical, yet there is still demand for recognition systems that perform better. To raise the recognition rate and confidence, there is a growing tendency to combine multiple information sources, multiple feature extraction methods, and multiple recognition methods to build high-performance recognition systems.
A simple existing way to combine multiple classifiers for postal code numeric strings is voting, for example the majority vote rule or the unanimity rule. These voting rules, however, ignore the characteristics of the individual classifiers and follow the principle of "one classifier, one vote". In practice, because the classifiers use different features, rest on different principles and methods, or are trained on different samples, their recognition performance differs and is to some extent complementary; that is, each classifier's ability to recognize each class differs.
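For reference, the two prior-art voting rules mentioned above can be sketched as follows. This is a minimal illustration, not code from the patent: the function names, the use of `None` as a rejection marker, and the "more than half of all classifiers" reading of the majority rule are assumptions.

```python
from collections import Counter

REJECT = None  # hypothetical marker for "character rejected"


def majority_vote(labels):
    """Return the label chosen by more than half of the classifiers, else REJECT."""
    votes = Counter(label for label in labels if label is not REJECT)
    if not votes:
        return REJECT
    label, count = votes.most_common(1)[0]
    return label if count > len(labels) / 2 else REJECT


def unanimous_vote(labels):
    """Return the common label only if every classifier agrees on it, else REJECT."""
    if not labels:
        return REJECT
    first = labels[0]
    if first is not REJECT and all(label == first for label in labels):
        return first
    return REJECT


# Three classifiers read the same digit image as 3, 3 and 8.
print(majority_vote([3, 3, 8]))   # 3
print(unanimous_vote([3, 3, 8]))  # None
```

Both rules give every classifier the same weight regardless of how reliable it is on a given digit, which is the limitation the text points out.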
Multi-classifier combination generally concerns the combination of single-character recognition results, with the aim of optimizing single-character recognition. The principle is shown in Fig. 1: a sample $X_n$ to be recognized passes through K recognition classifiers, giving K recognition results $S_n^{(k)}$ ($k = 1, 2, \ldots, K$), and a multi-classifier combination decision over these results yields the final recognition result $C_n$. This combination does not consider the context of the character string. Instead, the combined recognition sequence $(C_1 \ldots C_n \ldots C_N)$ of the characters in the string is sent to a dictionary, which checks whether the recognition result of the string is valid, as shown in Fig. 2.
In some practical applications the goal is the best recognition of the character string as a whole, not merely of each single character, because optimal single-character recognition does not necessarily give optimal recognition of the whole string. In postal code recognition, for example, a code can be used by an automatic mail sorting machine only if all six digits are recognized correctly, so it is the recognition of the whole postal code numeric string that must be optimized.
Summary of the invention
The object of the present invention is to provide a postal code numeric string recognition method based on knowledge-base-driven combination of multiple classifiers.
To achieve this object, the present invention adopts the following technical solution.
A postal code numeric string recognition method comprises the following steps:
(1) The image sequence $X = (x_1, \ldots, x_n, \ldots, x_N)$ of an N-character postal code string is input to each of K independent single-character recognition classifiers $e_k$, where N and K are positive integers greater than 1. For Chinese postal code numeric strings, N = 6.
(2) Each single-character recognition classifier $e_k$ either recognizes the input character image $x_n$ as one of the postal code characters $\{c_1, \ldots, c_m, \ldots, c_M\}$ or rejects it, which is denoted $c_{(M+1)}$, where M is a positive integer greater than 1. Each postal code character is one of the digits 0 to 9, so M = 10.
(3) The probability $P(x \in C_m / e_k(x) = m')$ that the input pattern is $c_m$ when the recognition result is $m'$ is calculated.
(4) From $P(x \in C_m / e_k(x) = m')$, the probability $p(D|X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$ is calculated, where D is a valid postal code in the postal code dictionary $\Omega$.
(5) The recognition result of the input pattern is decided according to the probability $p(D|X)$.
In a preferred mode of the present invention, the probability $P(x \in C_m / e_k(x) = m')$ in step (3), that the input pattern is $c_m$ when the recognition result is $m'$, can be calculated as follows.
Sample statistics are gathered on the recognition results of each single-character recognition classifier $e_k$ to form the confusion matrix describing that classifier's recognition behavior:
$$
CM_k = \begin{pmatrix}
n_{11}^{(k)} & \cdots & n_{1M}^{(k)} & n_{1(M+1)}^{(k)} \\
\vdots & n_{ij}^{(k)} & \vdots & \vdots \\
n_{M1}^{(k)} & \cdots & n_{MM}^{(k)} & n_{M(M+1)}^{(k)}
\end{pmatrix}, \quad k = 1, 2, \ldots, K
$$
where $n_{mm'}^{(k)}$ is the number of samples of class $C_m$ that classifier $e_k$ recognizes as class $C_{m'}$. Specifically:
(a) when $m = m'$, it is the number of samples of class $C_m$ that $e_k$ recognizes correctly;
(b) when $m' = M+1$, it is the number of samples of class $C_m$ that $e_k$ rejects;
(c) when $m \neq m'$ and $m' \neq M+1$, it is the number of samples of class $C_m$ that $e_k$ misrecognizes as class $C_{m'}$.
The total number of samples for which the recognition result of classifier $e_k$ is $m' = e_k(x)$ is:
$$
n_{m'}^{(k)} = \sum_{i=1}^{M} n_{im'}^{(k)}, \quad m' = 1, 2, \ldots, M+1
$$
Given that the recognition result of classifier $e_k$ is $m'$, the probability that the sample comes from class $C_m$ is:
$$
P(x \in C_m / e_k(x) = m') = \frac{n_{mm'}^{(k)}}{n_{m'}^{(k)}} = \frac{n_{mm'}^{(k)}}{\sum_{i=1}^{M} n_{im'}^{(k)}}, \quad m' = 1, 2, \ldots, M
$$
In another preferred mode of the present invention, step (4) calculates the probability $p(D|X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$ from $P(x \in C_m / e_k(x) = m')$ as follows.
Assume that the samples used to generate the confusion matrix $CM_k$ are sufficiently numerous and reflect the spatial distribution of recognition results. $CM_k$ is then used as prior knowledge when combining the classifiers; that is, $P(x \in C_m / e_k(x) = m')$ is used as the voting score, and the score for $x \in C_m$ is:
$$
s^{(k)}(x \in C_m) = P(x \in C_m / e_k(x) = m'), \quad m = 1, 2, \ldots, M
$$
Let $f(D)$ denote the frequency with which postal code D occurs. The score of X with respect to D is then calculated as:
$$
s(d_n | x_n) = \frac{1}{K} \sum_{k=1}^{K} s^{(k)}(x_n \in C_{d_n})
$$
$$
S(D|X) = \prod_{n=1}^{N} s(d_n | x_n) = \prod_{n=1}^{N} \frac{1}{K} \sum_{k=1}^{K} s^{(k)}(x_n \in C_{d_n})
$$
Finally, the probability that X belongs to D is $p(D|X) = e^{f(D)} \cdot S(D|X)$.
In a further preferred mode of the present invention, step (5) decides the recognition result of the input pattern according to the probability $p(D|X)$ as follows.
If there exists $D \in \Omega$ such that $p(D|X)$ is the maximum over the candidate recognition results and $p(D|X) > \alpha$, then $X = D$, that is, the recognition result is D. Here $\alpha$ is a threshold that trades off rejection against misrecognition ($\alpha = 0.5$).
If there exists $D \in \Omega$ such that $p(D|X)$ is the maximum over the candidate recognition results, and there exists $D' \in \Omega$ whose value $p(D'|X)$ is second only to the maximum $p(D|X)$, and if $p(D|X) - p(D'|X) > \beta$, where $\beta$ is a constant ($\beta = 0.2$), then $X = D$, that is, the recognition result is D.
In the postal code numeric string recognition method of the present invention, the voting rule exploits the strengths of each classifier according to that classifier's own characteristics. Prior knowledge of each classifier's recognition performance is obtained from statistics over a large number of samples and is used as the basis for voting, so that the combined recognition result reaches a high recognition rate and high confidence. The accuracy of postal code numeric string recognition is thereby improved.
Description of drawings
The present invention is further described below with reference to the drawings and an embodiment.
Fig. 1 is a block diagram of multi-classifier combination for single-character recognition in the prior art.
Fig. 2 is a block diagram of dictionary-based verification of the recognition result in the prior art.
Fig. 3 is a functional block diagram of the method of the present invention.
Embodiment
As shown in Fig. 3, the sequence to be recognized, $X = (x_1, \ldots, x_n, \ldots, x_N)$, is first recognized by the single-character recognition classifiers $e_k$. A decision is then made by combining the dictionary with the frequency of occurrence of each code, finally yielding the recognition result sequence $(d_1, d_2, \ldots, d_N)$.
A postal code numeric string recognition method comprises the following steps:
(1) The image sequence $X = (x_1, \ldots, x_n, \ldots, x_N)$ of an N-character postal code string is input simultaneously to K independent single-character recognition classifiers. For Chinese postal code numeric strings, N = 6.
(2) Each single-character recognition classifier $e_k$ recognizes the input character image $x_n$ and produces a recognition result: the classifier either assigns the input pattern to one of the classes $\{c_1, \ldots, c_m, \ldots, c_M\}$ or rejects it. For postal code digits, M = 10, that is, the recognition result may be any one of {0, 1, ..., 9}.
(3) When the recognition result is $m'$, the probability that the input pattern is $c_m$ is expressed as follows.
First, a large number of samples are used to gather statistics on the recognition behavior of classifier $e_k$, forming the confusion matrix for that classifier:
$$
CM_k = \begin{pmatrix}
n_{11}^{(k)} & \cdots & n_{1M}^{(k)} & n_{1(M+1)}^{(k)} \\
\vdots & n_{ij}^{(k)} & \vdots & \vdots \\
n_{M1}^{(k)} & \cdots & n_{MM}^{(k)} & n_{M(M+1)}^{(k)}
\end{pmatrix}, \quad k = 1, 2, \ldots, K
$$
where $n_{mm'}^{(k)}$ is the number of samples of class $C_m$ that classifier $e_k$ recognizes as class $C_{m'}$. Specifically:
(a) if $m = m'$, it is the number of samples of class $C_m$ that $e_k$ recognizes correctly;
(b) if $m' = M+1$, it is the number of samples of class $C_m$ that $e_k$ rejects;
(c) if $m \neq m'$ and $m' \neq M+1$, it is the number of samples of class $C_m$ that $e_k$ misrecognizes as class $C_{m'}$.
For classifier $e_k$, the total number of samples whose recognition result is $m' = e_k(x)$ is:
$$
n_{m'}^{(k)} = \sum_{i=1}^{M} n_{im'}^{(k)}, \quad m' = 1, 2, \ldots, M+1
$$
Given that the recognition result of classifier $e_k$ is $m'$, the probability that the sample comes from class $C_m$ can be expressed as the conditional probability:
$$
P(x \in C_m / e_k(x) = m') = \frac{n_{mm'}^{(k)}}{n_{m'}^{(k)}} = \frac{n_{mm'}^{(k)}}{\sum_{i=1}^{M} n_{im'}^{(k)}}, \quad m' = 1, 2, \ldots, M
$$
If the samples used to generate the confusion matrix $CM_k$ are sufficiently numerous and reflect the distribution of the pattern space, the confusion matrix reflects the recognition behavior of classifier $e_k$. $CM_k$ is then used as prior knowledge when combining the classifiers; that is, $P(x \in C_m / e_k(x) = m')$ is used as the voting score, and the score for $x \in C_m$ is:
$$
s^{(k)}(x \in C_m) = P(x \in C_m / e_k(x) = m'), \quad m = 1, 2, \ldots, M
$$
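Step (3) amounts to normalizing each column of the confusion matrix. The sketch below illustrates this with a toy two-class matrix. The function name, the nested-list layout, the invented counts, and the uniform fallback for a label the classifier never produced are all assumptions for illustration; the patent does not specify them.

```python
def posterior_table(confusion):
    """Turn one classifier's confusion matrix into conditional probabilities.

    confusion[m][m_prime] is the number of class-m samples that the classifier
    labelled m_prime; the last column (index M) counts rejections.
    Returns table[m_prime][m] = P(x in C_m / e_k(x) = m_prime) for m_prime < M.
    """
    M = len(confusion)
    table = []
    for m_prime in range(M):
        column_total = sum(confusion[i][m_prime] for i in range(M))
        if column_total == 0:
            # Assumption: if the label never occurred in the sample set,
            # fall back to a uniform estimate instead of dividing by zero.
            table.append([1.0 / M] * M)
        else:
            table.append([confusion[m][m_prime] / column_total for m in range(M)])
    return table


# Toy matrix with M = 2 classes plus a rejection column (counts are invented).
toy = [
    [90, 6, 4],   # true class 0: 90 read as 0, 6 read as 1, 4 rejected
    [10, 84, 6],  # true class 1: 10 read as 0, 84 read as 1, 6 rejected
]
post = posterior_table(toy)
print(post[0])  # [0.9, 0.1]
```

For postal code digits the same function would be applied to a 10 x 11 matrix per classifier, giving K tables that serve as the voting scores $s^{(k)}$.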
(4) The probability that X belongs to a given postal code string $D = (d_1, d_2, \ldots, d_N)$ is calculated.
Let $D = (d_1, d_2, \ldots, d_N)$ be a valid postal code in the postal code dictionary $\Omega$, and let $f(D)$ denote the frequency with which postal code D occurs in a particular application.
The score of X with respect to D is calculated as:
$$
s(d_n | x_n) = \frac{1}{K} \sum_{k=1}^{K} s^{(k)}(x_n \in C_{d_n})
$$
$$
S(D|X) = \prod_{n=1}^{N} s(d_n | x_n) = \prod_{n=1}^{N} \frac{1}{K} \sum_{k=1}^{K} s^{(k)}(x_n \in C_{d_n})
$$
Finally, the likelihood that X belongs to D is expressed as:
$$
p(D|X) = e^{f(D)} \cdot S(D|X)
$$
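The score computation in step (4) can be sketched as below. The sketch reads the final formula as $e^{f(D)}$ multiplied by $S(D|X)$, which is an interpretation of the extracted text. The function names, the table layout `posteriors[k][m_prime][m]`, the toy numbers, and the choice to let a rejecting classifier contribute zero score are assumptions; the source defines the conditional probability only for non-rejected outputs.

```python
import math


def char_score(posteriors, outputs, digit):
    """s(d_n | x_n): mean over the K classifiers of P(x in C_digit / e_k(x) = output_k).

    posteriors[k][m_prime][m] is classifier k's conditional-probability table;
    outputs[k] is classifier k's label for this character, or None if it rejected.
    """
    total = 0.0
    for table, out in zip(posteriors, outputs):
        if out is not None:  # assumption: a rejection adds nothing to the score
            total += table[out][digit]
    return total / len(posteriors)


def string_probability(posteriors, outputs_per_char, code, frequency):
    """p(D|X) = e^{f(D)} * prod_n s(d_n | x_n) for one candidate code D."""
    score = 1.0
    for outputs, digit in zip(outputs_per_char, code):
        score *= char_score(posteriors, outputs, digit)
    return math.exp(frequency) * score


# Toy setting: K = 2 classifiers, M = 2 classes, N = 2 characters.
tables = [
    [[0.9, 0.1], [0.2, 0.8]],  # classifier 0: rows are outputs m', columns are classes m
    [[0.8, 0.2], [0.1, 0.9]],  # classifier 1
]
outputs = [(0, 0), (1, 0)]  # per character: the labels given by classifiers 0 and 1
p = string_probability(tables, outputs, code=(0, 1), frequency=0.0)
print(round(p, 3))  # 0.425
```

In use, this value would be computed for every valid code D in the dictionary, so that only valid postal codes compete in the decision step.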
(5) The following rules are used to determine the optimal recognition result of the input pattern.
Rule 1:
If there exists $D \in \Omega$ such that
$$
p(D|X) = \max_{D'' \in \Omega} p(D''|X) \quad \text{and} \quad p(D|X) > \alpha
$$
then $X = D$,
where $\alpha$ is a threshold used to trade off rejection against misrecognition ($\alpha = 0.5$).
Rule 2:
If there exists $D \in \Omega$ such that
$$
p(D|X) = \max_{D'' \in \Omega} p(D''|X),
$$
there exists $D' \in \Omega$ such that
$$
p(D'|X) = \max_{D'' \in \Omega - \{D\}} p(D''|X),
$$
and
$$
p(D|X) - p(D'|X) > \beta,
$$
then $X = D$.
Here $\beta$ is a constant ($\beta = 0.2$).
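The two decision rules can be sketched as a single function. The source does not state how the rules interact, so this sketch assumes a code is accepted if either rule fires and the string is rejected otherwise. The function name, the dictionary-of-probabilities input, and the treatment of a missing runner-up as probability 0 are likewise assumptions.

```python
def decide(probabilities, alpha=0.5, beta=0.2):
    """Pick a postal code from {code: p(D|X)} or return None to reject the string."""
    if not probabilities:
        return None
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    best_code, best_p = ranked[0]
    # Rule 1: the best candidate clears the absolute threshold alpha.
    if best_p > alpha:
        return best_code
    # Rule 2: the best candidate leads the runner-up by more than beta.
    second_p = ranked[1][1] if len(ranked) > 1 else 0.0  # assumption for a lone candidate
    if best_p - second_p > beta:
        return best_code
    return None


print(decide({"200001": 0.6, "200007": 0.3}))   # 200001 (Rule 1)
print(decide({"200001": 0.4, "200007": 0.1}))   # 200001 (Rule 2)
print(decide({"200001": 0.4, "200007": 0.35}))  # None (neither rule fires)
```

The keys here are made-up six-digit strings; in practice they would be the valid codes of the dictionary $\Omega$.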

Claims (6)

1. A postal code numeric string recognition method, comprising the following steps:
(1) inputting the image sequence $X = (x_1, \ldots, x_n, \ldots, x_N)$ of an N-character postal code string to each of K independent single-character recognition classifiers $e_k$, where N and K are positive integers greater than 1;
(2) each said single-character recognition classifier $e_k$ either recognizing the input character image $x_n$ as one of the postal code characters $\{c_1, \ldots, c_m, \ldots, c_M\}$ or rejecting it, which is denoted $c_{(M+1)}$, where M is a positive integer greater than 1;
(3) calculating the probability $P(x \in C_m / e_k(x) = m')$ that the input pattern is $c_m$ when the recognition result is $m'$;
(4) calculating, from $P(x \in C_m / e_k(x) = m')$, the probability $p(D|X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$, where D is a valid postal code in the postal code dictionary $\Omega$;
(5) deciding the recognition result of the input pattern according to the probability $p(D|X)$.
2. The postal code numeric string recognition method according to claim 1, characterized in that: in said step (1), the number N of characters in the postal code string is 6; in said step (2), each postal code character in $\{c_1, \ldots, c_m, \ldots, c_M\}$ is any one of the digits 0 to 9.
3. The postal code numeric string recognition method according to claim 1 or 2, characterized in that: in said step (3), the probability $P(x \in C_m / e_k(x) = m')$ that the input pattern is $c_m$ when the recognition result is $m'$ is calculated by gathering sample statistics on the recognition results of said single-character recognition classifier $e_k$ to form the confusion matrix describing the recognition behavior of said classifier:
$$
CM_k = \begin{pmatrix}
n_{11}^{(k)} & \cdots & n_{1M}^{(k)} & n_{1(M+1)}^{(k)} \\
\vdots & n_{ij}^{(k)} & \vdots & \vdots \\
n_{M1}^{(k)} & \cdots & n_{MM}^{(k)} & n_{M(M+1)}^{(k)}
\end{pmatrix}, \quad k = 1, 2, \ldots, K
$$
where $n_{mm'}^{(k)}$ is the number of samples of class $C_m$ that said classifier $e_k$ recognizes as class $C_{m'}$, meaning:
(a) when $m = m'$, the number of samples of class $C_m$ that $e_k$ recognizes correctly;
(b) when $m' = M+1$, the number of samples of class $C_m$ that $e_k$ rejects;
(c) when $m \neq m'$ and $m' \neq M+1$, the number of samples of class $C_m$ that $e_k$ misrecognizes as class $C_{m'}$;
the total number of samples for which the recognition result of said classifier $e_k$ is $m' = e_k(x)$ is:
$$
n_{m'}^{(k)} = \sum_{i=1}^{M} n_{im'}^{(k)}, \quad m' = 1, 2, \ldots, M+1
$$
and, given that the recognition result of said classifier $e_k$ is $m'$, the probability that the sample comes from class $C_m$ is:
$$
P(x \in C_m / e_k(x) = m') = \frac{n_{mm'}^{(k)}}{n_{m'}^{(k)}} = \frac{n_{mm'}^{(k)}}{\sum_{i=1}^{M} n_{im'}^{(k)}}, \quad m' = 1, 2, \ldots, M.
$$
4. The postal code numeric string recognition method according to claim 1 or 2, characterized in that: in said step (4), the probability $p(D|X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$ is calculated from $P(x \in C_m / e_k(x) = m')$ as follows:
assuming that the samples used to generate the confusion matrix $CM_k$ are sufficiently numerous and reflect the spatial distribution of recognition results, $CM_k$ is used as prior knowledge when combining the classifiers, that is, $P(x \in C_m / e_k(x) = m')$ is used as the voting score, and the score for $x \in C_m$ is:
$$
s^{(k)}(x \in C_m) = P(x \in C_m / e_k(x) = m'), \quad m = 1, 2, \ldots, M
$$
letting $f(D)$ denote the frequency with which postal code D occurs, the score of X with respect to D is calculated as:
$$
s(d_n | x_n) = \frac{1}{K} \sum_{k=1}^{K} s^{(k)}(x_n \in C_{d_n})
$$
$$
S(D|X) = \prod_{n=1}^{N} s(d_n | x_n) = \prod_{n=1}^{N} \frac{1}{K} \sum_{k=1}^{K} s^{(k)}(x_n \in C_{d_n})
$$
and finally the probability that X belongs to D is $p(D|X) = e^{f(D)} \cdot S(D|X)$.
5. The postal code numeric string recognition method according to claim 1 or 2, characterized in that: in said step (5), the recognition result of the input pattern is decided according to the probability $p(D|X)$ as follows:
if there exists $D \in \Omega$ such that $p(D|X)$ is the maximum over the candidate recognition results and $p(D|X) > \alpha$, then $X = D$, that is, the recognition result is D, where $\alpha$ is a threshold that trades off rejection against misrecognition;
if there exists $D \in \Omega$ such that $p(D|X)$ is the maximum over the candidate recognition results, and there exists $D' \in \Omega$ whose value $p(D'|X)$ is second only to the maximum $p(D|X)$, and if $p(D|X) - p(D'|X) > \beta$, where $\beta$ is a constant, then $X = D$, that is, the recognition result is D.
6. The postal code numeric string recognition method according to claim 5, characterized in that: the values of said $\alpha$ and $\beta$ are 0.5 and 0.2 respectively.
CNB2005100235506A 2005-01-25 2005-01-25 Postal coding numerical string identifying method Expired - Fee Related CN1300740C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100235506A CN1300740C (en) 2005-01-25 2005-01-25 Postal coding numerical string identifying method

Publications (2)

Publication Number Publication Date
CN1645408A CN1645408A (en) 2005-07-27
CN1300740C true CN1300740C (en) 2007-02-14

Family

ID=34875908

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100235506A Expired - Fee Related CN1300740C (en) 2005-01-25 2005-01-25 Postal coding numerical string identifying method

Country Status (1)

Country Link
CN (1) CN1300740C (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100452042C (en) * 2006-06-23 2009-01-14 腾讯科技(深圳)有限公司 Digital string fuzzy match method
CN101894266A (en) * 2010-06-30 2010-11-24 北京捷通华声语音技术有限公司 Handwriting recognition method and system
CN110443159A (en) * 2019-07-17 2019-11-12 新华三大数据技术有限公司 Digit recognition method, device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0929179A (en) * 1995-07-17 1997-02-04 Toshiba Corp Addressee reader
CN1154879A (en) * 1996-12-19 1997-07-23 邮电部第三研究所 Process and apparatus for recognition of postcode in course of letter sorting
JPH1034089A (en) * 1996-07-30 1998-02-10 Toshiba Corp Video coding device
US6269171B1 (en) * 1995-04-12 2001-07-31 Lockheed Martin Corporation Method for exploiting correlated mail streams using optical character recognition

Also Published As

Publication number Publication date
CN1645408A (en) 2005-07-27

Similar Documents

Publication Publication Date Title
CN1207664C (en) Error correcting method for voice identification result and voice identification system
CN1276381C (en) Region detecting method and region detecting apparatus
CN101122953B (en) Picture words segmentation method
CN1163841C (en) On-line hand writing Chinese character distinguishing device
CN102346847B (en) License plate character recognizing method of support vector machine
CN1302456C (en) Sound veins identifying method
CN1120757C (en) Method and device for recognition of delivery data on mail matter
CN101067808A (en) Text key word extracting method
CN1222871A (en) Method of processing postal matters
CN101059870A (en) Image cutting method based on attribute histogram
CN1300740C (en) Postal coding numerical string identifying method
CN1945628A (en) Video frequency content expressing method based on space-time remarkable unit
CN101079044A (en) Similarity measurement method for audio-frequency fragments
CN100390815C (en) Template optimized character recognition method and system
CN101046809A (en) New word identification method based on association rule model
CN1367460A (en) Character string identification device, character string identification method and storage medium thereof
CN1388947A (en) Character recognition system
CN1227373A (en) Handwriting verification device
CN1545067A (en) A method for compressing digitalized archive file using computer
CN1734466A (en) The character recognition device and the character identifying method that are used for the character of recognition image
CN1167956A (en) Method and device for recognition of similar writing
CN1488119A (en) Resolution enhancement by nearest neighbor classified filtering
CN1838150A (en) Probabilistic boosting tree structure for learned discriminative models
CN1916938A (en) Identifying distance regulator and method thereof and text lines identifier and method thereof
CN1186744C (en) Chinese character recognizing method based on structure model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070214

Termination date: 20150125

EXPY Termination of patent right or utility model