CN101256631B - Method and apparatus for character recognition - Google Patents

Info

Publication number
CN101256631B
Authority
CN
China
Prior art keywords
sample
distortion
distortion sample
character
class
Legal status
Expired - Fee Related
Application number
CN2007100787676A
Other languages
Chinese (zh)
Other versions
CN101256631A (en)
Inventor
黄开竹
孙俊
堀田悦伸
藤本克仁
直井聪
Current Assignee
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Application filed by Fujitsu Ltd
Priority to CN2007100787676A
Priority to JP2008044928A (JP5365026B2)
Publication of CN101256631A
Application granted
Publication of CN101256631B
Current legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method, a device, a program and a readable storage medium for character recognition. The method comprises the steps of: recognizing an input character sample to produce a recognition result; generating a confidence for the recognition result; judging, according to the confidence, whether the input character sample is a deformed sample; if it is not, taking the recognition result as the final recognition result; if it is, recognizing the deformed sample to generate the final recognition result. By directly using the samples misrecognized at the first stage as the basic training samples of the second stage, the invention can effectively handle deformed samples and thus improves the recognition accuracy of the system.

Description

Character recognition method and device
Technical field
The present invention relates to the field of pattern recognition, and in particular to techniques for recognizing deformed or slightly abnormal characters. More specifically, it relates to a character recognition method, device, program and readable storage medium.
Background art
Many techniques can be used to recognize characters. A widely used approach is statistical recognition, and the support vector machine (SVM) (see V. N. Vapnik, Statistical Learning Theory, Springer, New York, 2nd edition, 1998; and C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery, 2(2):121-167, 1998) is one of the best methods for character recognition. When characters are recognized with an SVM, a large number of character samples is usually collected in advance; this sample set is the training set. The whole training set is then fed to the SVM for training. In general, the SVM first constructs a classification decision function for every pair of classes by solving a quadratic programming problem. For example, recognizing the 10 Arabic numerals gives C(10,2) = 45 two-class problems: 0 vs. 1, 0 vs. 2, ..., 8 vs. 9. When a new character is input for recognition, each two-class decision function casts a vote for one of its two classes, and the final classification result is the character that receives the most votes.
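The one-vs-one voting scheme described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `one_vs_one_vote` and the toy pairwise rules are hypothetical stand-ins for the C(10,2) = 45 trained two-class SVM decision functions, and ties are broken in favour of whichever class received a vote first.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_vote(x, classes, pairwise):
    """Return the class that wins the most pairwise votes for sample x.

    pairwise maps each class pair (a, b) to a decision function that
    returns either a or b for the given sample.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):  # C(len(classes), 2) two-class problems
        votes[pairwise[(a, b)](x)] += 1
    # most_common lists equal counts in first-encountered order
    return votes.most_common(1)[0][0]


# Hypothetical toy rules standing in for 45 trained two-class SVMs:
# each rule picks whichever of its two digits is closer to the 1-D feature.
digits = list(range(10))
rules = {
    (a, b): (lambda x, a=a, b=b: a if abs(x[0] - a) <= abs(x[0] - b) else b)
    for a, b in combinations(digits, 2)
}

print(len(rules))                              # 45
print(one_vs_one_vote([3.2], digits, rules))   # 3
```

In a real system each entry of `rules` would be a trained binary SVM; the voting logic itself does not change.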
Statistical methods, and especially the decision surfaces obtained by SVMs, usually achieve good recognition rates, but many practical applications demand even higher ones; recognizing bank bills, for example, requires higher precision. Feeding in all training samples at once can cause the SVM to give too little weight to the few deformed samples. In other words, these few deformed or slightly abnormal samples are likely to be treated by the SVM as noise points and effectively ignored. In practice, however, whether these deformed samples are recognized correctly often directly determines whether the system achieves good recognition accuracy. Moreover, statistical recognition methods generally require a large number of training samples, so when only a small number of deformed samples exist, applying a statistical method directly often fails to give good results. Much recent work has focused on improving statistical methods, for example the training efficiency or the recognition speed of SVMs.
The contents of U.S. Patent No. 6,327,581, entitled "Methods and apparatus for building a support vector machine classifier", and U.S. Patent No. 6,134,344, entitled "Method and apparatus for improving the efficiency of support vector machines", are incorporated herein as background art of the present invention. However, when handling heavily deformed samples, these methods also feed in all training samples at once, so the SVM likewise gives too little weight to some deformed samples and cannot achieve good recognition results.
Summary of the invention
To overcome the shortcomings of the prior art, the invention provides a character recognition method, device, program and readable storage medium. The character recognition method works in two stages. At the first stage, a conventional statistical method such as an SVM recognizes the sample; deformed or slightly abnormal samples tend to be misrecognized at this stage. The samples misrecognized at the first stage are then handled separately by the second-stage structure. The two-stage structure handles deformed samples well and can therefore greatly improve the accuracy of the system.
A first object of the invention is to provide a character recognition method comprising the following steps: recognizing an input character sample to produce a recognition result; generating a confidence for the recognition result; judging, according to the confidence, whether the input character sample is a deformed sample; if it is not a deformed sample, taking the recognition result as the final recognition result and outputting it; if it is a deformed sample, recognizing the deformed sample to generate and output the final recognition result.
The input character sample is recognized using a support vector machine.
The confidence is generated by fitting the distribution of matching distances.
Generating the confidence by fitting the distribution of matching distances further comprises: producing the matching distances between the recognition result and each character class; selecting, in ascending order of matching distance, the l smallest matching distances as candidate distances; and computing the confidence of the recognition result from the l candidate distances.
Judging, according to the confidence, whether the input character sample is a deformed sample specifically comprises: if the confidence is greater than a set threshold, judging that the input character sample is a non-deformed sample; otherwise, judging that it is a deformed sample.
Recognizing the deformed sample specifically comprises: generating a deformed-sample template library, and taking the difference between the deformed sample and the template vectors of the template library to obtain difference features; converting the deformed samples into within-class differences and between-class differences, and judging whether each difference feature is a within-class or a between-class difference; and assigning the deformed sample to the class of the template whose difference feature is judged to be a within-class difference with the highest confidence.
Generating the deformed-sample template library specifically comprises: generating a deformed-sample base library that contains only deformed samples; and automatically clustering the base library to generate the deformed-sample template library.
Taking the difference between the deformed sample and the template vectors of the template library means that the deformed sample is differenced only with the deformed-sample templates of the top N candidate classes of the recognition result.
Generating the between-class differences from the deformed samples specifically comprises generating them by sampling according to a given percentage.
A second object of the invention is to provide a character recognition device comprising: a first-stage classification decision unit for recognizing the input character sample and producing a recognition result; a confidence generation unit for generating the confidence of the recognition result; a confidence judging unit for judging, according to the confidence, whether the input character sample is a deformed sample and, if it is not, taking the recognition result as the final recognition result; a second-stage classification decision unit for recognizing the deformed sample and generating the final recognition result; and an output unit for outputting the final recognition result.
The first-stage classification decision unit is a support vector machine.
The confidence generation unit further comprises: a matching distance generation unit for producing the matching distances between the recognition result and each character class; a candidate distance selection unit for selecting, in ascending order of matching distance, the l smallest matching distances as candidate distances; and a computing unit for computing the confidence of the recognition result from the l candidate distances.
The confidence judging unit further comprises: a threshold setting unit for setting a threshold; and a comparison judgment unit that compares the confidence of the recognition result with the threshold. If the confidence is greater than the threshold, it judges the input character sample to be a non-deformed sample and takes the recognition result as the final recognition result; otherwise it judges the input character sample to be a deformed sample and passes it to the second-stage classification decision unit.
The second-stage classification decision unit further comprises: a deformed-sample template library generation unit for generating the deformed-sample template library; a difference operator unit for taking the difference between the deformed sample and the template vectors of the template library to obtain difference features; a deformed-sample conversion unit for converting the deformed samples into within-class and between-class differences; and a difference-feature decision unit for judging whether a difference feature is a within-class or a between-class difference, and for assigning the deformed sample to the class of the template whose difference is judged to be within-class with the highest confidence.
The deformed-sample template library generation unit further comprises: a deformed-sample base library generation unit for generating the deformed-sample base library, which contains only deformed samples; and a sample clustering unit for automatically clustering the base library to generate the deformed-sample template library.
The difference operator unit also takes the difference between the deformed sample and the deformed-sample templates of the top N candidate classes of the recognition result to obtain difference features.
The deformed-sample conversion unit further comprises a sampling unit for generating the between-class differences by sampling according to a given percentage.
A third object of the invention is to provide a character recognition program, which performs the following: recognizing an input character sample to produce a recognition result; generating a confidence for the recognition result; judging, according to the confidence, whether the input character sample is a deformed sample; if it is not a deformed sample, taking the recognition result as the final recognition result and outputting it; if it is a deformed sample, recognizing the deformed sample to generate and output the final recognition result.
A fourth object of the invention is to provide a readable storage medium storing a character recognition program, the program performing the following: recognizing an input character sample to produce a recognition result; generating a confidence for the recognition result; judging, according to the confidence, whether the input character sample is a deformed sample; if it is not a deformed sample, taking the recognition result as the final recognition result and outputting it; if it is a deformed sample, recognizing the deformed sample to generate and output the final recognition result.
The invention uses a two-stage recognition structure in which samples misrecognized by the first-stage recognition device go on to be recognized by the second-stage recognition device. By directly using the samples misrecognized at the first stage as the basic training samples of the second stage, the technical solution of the invention can effectively handle deformed samples and therefore improves the recognition accuracy of the system. In addition, generating the deformed-sample library automatically from the misrecognized samples avoids collecting a sample library by hand, which greatly saves labor and material resources.
Furthermore, the invention recognizes deformed samples using difference features. By taking differences between deformed-character feature vectors, the multiclass character recognition problem is converted into a two-class problem, namely recognizing within-class versus between-class differences, and a classifier is trained on the converted two-class difference-feature set. Converting the multiclass problem into a two-class one greatly expands the number of samples, makes effective use of statistical recognition methods and improves classifier training, thereby improving the accuracy of the system.
Description of the drawings
Fig. 1 is a block diagram of the character recognition device of the invention;
Fig. 2a is a flowchart of the character recognition method of the invention;
Fig. 2b is a flowchart of an embodiment of character recognition of the invention;
Fig. 3 is a block diagram of the confidence generation unit of the invention;
Fig. 4 is a block diagram of the confidence judging unit of the invention;
Fig. 5 is a block diagram of the second-stage classification decision unit of the invention;
Fig. 6 is a structural diagram of the deformed-sample template library generation unit of the invention;
Fig. 7 is a flowchart of the method for generating the deformed-sample base library of the invention;
Fig. 8 is a flowchart of the method for generating the deformed-sample training library of the invention;
Fig. 9 is a schematic diagram of the operation of the second-stage classification decision unit of the invention.
Detailed description of embodiments
The invention recognizes input character samples with a two-stage structure. Because the samples misrecognized at the first stage are often deformed or slightly abnormal samples, recognizing these misrecognized samples separately at the second stage can greatly improve the recognition accuracy on deformed samples. The invention is described in detail below with reference to the drawings.
Fig. 1 is a block diagram of the character recognition device of the invention. As shown, the device comprises a first-stage classification decision unit 11, a confidence generation unit 12, a confidence judging unit 13 and a second-stage classification decision unit 14; in a preferred embodiment it also comprises an output unit 15. First-stage classification decision unit 11 recognizes the input character sample and produces a recognition result; it can be implemented with a conventional support vector machine. Confidence generation unit 12 generates the confidence of the recognition result. Confidence judging unit 13 judges, according to the confidence, whether the input character sample is a deformed sample and, if it is not, takes the recognition result as the final recognition result. Second-stage classification decision unit 14 recognizes the deformed sample and generates the final recognition result. Output unit 15 outputs the final recognition result.
Fig. 2a is a flowchart of the character recognition method of the invention. As shown, in step S101 first-stage classification decision unit 11 recognizes the input character sample based on training sample set T101 and produces a recognition result. In step S111, confidence generation unit 12 computes the confidence of the recognition result produced by unit 11. In step S102, confidence judging unit 13 judges whether this confidence is greater than a set threshold Th1. If it is, the input sample can be recognized very reliably by unit 11, so the recognition result of unit 11 becomes the final recognition result and the flow jumps to step S105, where output unit 15 outputs it. Otherwise, the input character sample is judged to be a deformed sample and step S106 is executed: the input deformed sample (including slightly abnormal samples) is recognized by second-stage classification decision unit 14, whose result is taken as the final recognition result, and the flow proceeds to step S105, where output unit 15 outputs it. Second-stage unit 14 is designed for deformed samples and recognizes such abnormal characters better. Handling characters of different nature with a two-stage structure effectively improves the recognition accuracy of the system. Fig. 2b is a flowchart of an embodiment of character recognition of the invention.
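The flow of Fig. 2a amounts to routing each sample by its first-stage confidence. The sketch below is illustrative only and assumes hypothetical callables: `first_stage` returns a (label, confidence) pair, standing in for units 11 and 12, and `second_stage` stands in for unit 14. The default `th1` is an arbitrary placeholder, not a value from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Sequence, Tuple

Sample = Sequence[float]


@dataclass
class TwoStageRecognizer:
    """Hypothetical wrapper mirroring steps S101/S111/S102/S106/S105 of Fig. 2a."""
    first_stage: Callable[[Sample], Tuple[Hashable, float]]  # units 11 + 12: (label, confidence)
    second_stage: Callable[[Sample], Hashable]               # unit 14
    th1: float = 0.9                                         # placeholder threshold

    def recognize(self, sample: Sample) -> Tuple[Hashable, str]:
        label, conf = self.first_stage(sample)       # S101 and S111
        if conf > self.th1:                           # S102: reliable first-stage result
            return label, "first"                     # S105: output as final result
        return self.second_stage(sample), "second"    # S106 (deformed sample), then S105


rec = TwoStageRecognizer(
    first_stage=lambda s: ("7", 0.95) if s[0] > 0 else ("1", 0.40),
    second_stage=lambda s: "4",
)
print(rec.recognize([1.0]))    # ('7', 'first')
print(rec.recognize([-1.0]))   # ('4', 'second')
```

Because the text says the confidence must be *greater than* Th1, a confidence exactly equal to Th1 is routed to the second stage in this sketch.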
Fig. 3 is a block diagram of confidence generation unit 12 of the invention. As shown, confidence generation unit 12 further comprises matching distance generation unit 31, candidate distance selection unit 32 and computing unit 33. Matching distance generation unit 31 produces the matching distances between the recognition result and each character class; candidate distance selection unit 32 selects, in ascending order of matching distance, the l smallest matching distances as candidate distances; and computing unit 33 computes the confidence of the recognition result from the l candidate distances. The working principle of the confidence generation unit is described in detail below.
Confidence generation unit 12 generates the confidence of the recognition result by fitting the distribution of matching distances. Suppose the matching distance Dis(I, i) between input sample I and character class i and the conditional probability p(I|i) of input sample I given character class i satisfy the following relation:
$$p(I \mid i) \propto e^{-\mathrm{Dis}(I,i)/t}$$
where t is a positive preset constant that can be obtained empirically or experimentally. In practice, many distributions take an exponential form; the frequently encountered normal distribution, for example, belongs to the exponential family. By Bayes' theorem, given an input sample image, the posterior probability that the image belongs to class i, which serves as the confidence, can be calculated as:
$$p(i \mid I) = \frac{p(I \mid i)\,p(i)}{\sum_{j=1}^{C} p(I \mid j)\,p(j)} \approx \frac{e^{-\mathrm{Dis}(I,i)/t}}{\sum_{j=1}^{C} e^{-\mathrm{Dis}(I,j)/t}} \qquad (1)$$
where C is the number of character classes. The approximation above assumes that the prior probabilities p(i) of all character classes are roughly equal; this assumption is met if every character class occurs with roughly the same frequency during training.
The confidence is computed as follows. First, matching distance generation unit 31 produces the matching distance Dis(I, i) between the recognition result and each character class, with i ranging over all C character classes, which gives the matching distances between input sample I and every character class. Next, candidate distance selection unit 32 takes the l smallest (l < C) of these matching distances as candidate distances. The character class corresponding to each candidate distance is called a recognition candidate, denoted Cand_i, i = 1, 2, ..., l, and the matching distance of each candidate Cand_i is called its candidate distance, denoted Dis_i. The candidate distances are sorted in ascending order, that is, Dis_i ≤ Dis_j for i < j, with i, j = 1, 2, ..., l. Finally, computing unit 33 calculates the confidence Conf of the recognition result of first-stage classification decision unit 11 with the following formula:
$$\mathrm{Conf} = \frac{e^{-\mathrm{Dis}_1/t}}{\sum_{i=1}^{l} e^{-\mathrm{Dis}_i/t}} \qquad (2)$$
where t is a positive preset constant and Dis_1, the smallest candidate distance, belongs to the top candidate Cand_1. In practice the matching distances of the lower-ranked candidates are often very large, so their exponential terms are negligible. Formula (2) therefore ignores the classes ranked after the l-th candidate, which speeds up the computation without reducing the accuracy of the confidence estimate.
Fig. 4 is a block diagram of confidence judging unit 13 of the invention. As shown, confidence judging unit 13 further comprises threshold setting unit 41 and comparison judgment unit 42. Threshold setting unit 41 presets the threshold Th1, which is usually a positive number between 0 and 1. Comparison judgment unit 42 compares the confidence Conf of the recognition result of first-stage classification decision unit 11 with Th1. If Conf is greater than Th1, the input character sample is judged to be a non-deformed sample, and the recognition result of unit 11 is taken as the final recognition result; otherwise, the input character sample is judged to be a deformed sample and is passed to the second-stage classification decision unit for recognition.
Fig. 5 is a block diagram of second-stage classification decision unit 14 of the invention. As shown, second-stage classification decision unit 14 further comprises deformed-sample template library generation unit 51, difference operator unit 52, deformed-sample conversion unit 53 and difference-feature decision unit 54. Unit 51 generates the deformed-sample template library; unit 52 takes the difference between the deformed sample and the template vectors of the template library to obtain difference features; unit 53 converts the deformed samples into within-class and between-class differences; and unit 54 judges whether each difference feature is a within-class or a between-class difference, and assigns the deformed sample to the class of the template whose difference is judged to be within-class with the highest confidence.
Fig. 6 is a structural diagram of deformed-sample template library generation unit 51. In a preferred embodiment, the deformed-sample template library is obtained from the deformed-sample base library by an automatic clustering method such as C-means (A. K. Jain, M. N. Murty, and P. J. Flynn, Data clustering: A survey, ACM Computing Surveys, 31(3):264-323, 1999). As shown, unit 51 further comprises deformed-sample base library generation unit 61 and sample clustering unit 62. Unit 61 generates the deformed-sample base library, which contains only deformed samples; sample clustering unit 62 automatically clusters the base library to generate the deformed-sample template library.
In a preferred embodiment, deformed-sample base library generation unit 61 generates the base library by the method shown in Fig. 7. As shown, each sample in the training sample library is input in turn to first-stage classification decision unit 11. Then, based on the confidence produced by confidence generation unit 12, confidence judging unit 13 judges whether the sample has been misrecognized by unit 11; if it has, the misrecognized sample is added to the deformed-sample base library. The base library therefore contains only samples misrecognized by the first-stage unit, that is, deformed samples. Sample clustering unit 62 then applies an automatic clustering algorithm to the base library, grouping the samples of each class into K cluster centers, which serve as the K templates of that class. The automatic clustering algorithm thus yields the deformed-sample template library.
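A minimal sketch of Figs. 6 and 7, under stated assumptions: a training sample counts as deformed when a hypothetical `first_stage` classifier returns a label different from its ground truth (the patent routes this decision through confidence judging unit 13), and plain k-means, a member of the C-means family cited above, with a fixed iteration count clusters each class into K templates. All names and the toy data are illustrative only.

```python
import random
from collections import defaultdict


def build_base_library(training_set, first_stage):
    """Fig. 7 sketch: keep only (vector, label) pairs the first stage misrecognizes."""
    return [(x, y) for x, y in training_set if first_stage(x) != y]


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns up to k cluster centres for the given points."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, min(k, len(points)))]
    for _ in range(iters):
        groups = [[] for _ in centres]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
            groups[dists.index(min(dists))].append(p)
        # Recompute each centre as its group mean; keep the old centre if a group is empty
        centres = [
            [sum(col) / len(g) for col in zip(*g)] if g else centres[i]
            for i, g in enumerate(groups)
        ]
    return centres


def build_template_library(base_library, k):
    """Cluster each class of the base library into k templates (Fig. 6, unit 62)."""
    by_class = defaultdict(list)
    for x, y in base_library:
        by_class[y].append(x)
    return {y: kmeans(pts, k) for y, pts in by_class.items()}


# Toy data: a hypothetical first stage that always answers "a".
train = [([0.0, 0.0], "a"), ([0.1, 0.0], "a"),
         ([5.0, 5.0], "b"), ([5.2, 5.0], "b"), ([9.0, 9.0], "b"), ([9.0, 9.2], "b")]
base = build_base_library(train, first_stage=lambda x: "a")
templates = build_template_library(base, k=2)
print(len(base), sorted(templates))   # 4 ['b']
```

Only class "b" receives templates here, because every "a" sample is recognized correctly and never enters the base library; its two templates settle near (5.1, 5.0) and (9.0, 9.1).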
In a preferred embodiment, difference operator unit 52 computes difference features only between the input deformed sample and the templates of the classes that appear among the top N recognition candidates of first-stage classification decision unit 11, and feeds these features to difference-feature decision unit 54 for recognition. This avoids computing difference features between the input deformed sample and the templates of all classes, which effectively reduces the number of classifications performed by unit 54 and increases recognition speed.
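The top-N restriction and the final decision of unit 54 can be sketched as below. The scorer `within_score` is a hypothetical stand-in for the trained two-class difference-feature classifier; in this sketch it simply treats smaller differences as more likely to be within-class. The function name and the toy templates are illustrative assumptions.

```python
def second_stage_decide(sample, candidates, templates, top_n, within_score):
    """Return the class whose template difference looks most like a within-class difference.

    candidates:   first-stage candidate classes, best first.
    templates:    class -> list of template vectors (deformed-sample template library).
    within_score: hypothetical classifier giving confidence that a difference
                  vector is a within-class difference (higher = more confident).
    """
    best_class, best_score = None, float("-inf")
    for cls in candidates[:top_n]:                       # only the top-N candidate classes
        for tmpl in templates.get(cls, []):
            diff = [a - b for a, b in zip(sample, tmpl)]  # difference feature
            score = within_score(diff)
            if score > best_score:
                best_class, best_score = cls, score
    return best_class


templates = {"3": [[1.0, 1.0]], "8": [[4.0, 4.0]], "5": [[1.1, 1.0]]}
candidates = ["8", "3", "5"]                        # made-up first-stage ranking
score = lambda d: -sum(v * v for v in d)            # toy scorer: small difference = within-class
print(second_stage_decide([1.2, 1.0], candidates, templates, 2, score))  # 3
print(second_stage_decide([1.2, 1.0], candidates, templates, 3, score))  # 5
```

With `top_n=2` the closer template of class "5" is never consulted, which is exactly the speed-for-coverage trade-off the top-N restriction makes.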
Deformed-sample conversion unit 53 converts the deformed samples into within-class and between-class samples for training. This conversion generates the deformed-sample training library, in which the within-class samples form the within-class training library and the between-class samples form the between-class training library. The invention uses random numbers to address the imbalance whereby between-class samples far outnumber within-class samples: the deformed-sample conversion unit further comprises a sampling unit that generates the between-class differences by sampling according to a given percentage. The method for generating the deformed-sample training library is shown in Fig. 8.
As shown in Fig. 8, initial values are set first, and samples V_k and V_j are drawn from the deformed-sample base library Dw = {V_1, V_2, ..., V_q}. If all deformed samples have been drawn, the flow ends. Otherwise, it checks whether V_k and V_j belong to the same class. If they do, counter i is incremented by 1 and the difference feature N_i is computed; this is a within-class difference, and N_i is added to the within-class training library of the deformed-sample training library. If they do not, a random number is converted into a probability by the probability conversion unit, and the sampling unit adds the difference feature N_i between V_k and V_j to the between-class training library with probability p, where 0 < p < 1 is a preset threshold. The process loops until the difference features between all deformed samples have been traversed.
In the process above, an integer between 0 and X is first generated at random, where X = [1/p] denotes the integer part of 1/p; this realizes the sampling probability p. In this way the sampling unit adds between-class difference features to the between-class training library with probability p.
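The Fig. 8 loop can be sketched as follows. This is an assumption-laden illustration rather than the patent's code: it enumerates each unordered pair of samples once with `itertools.combinations`, and it reads "an integer between 0 and X" as a uniform draw from {0, ..., X-1} that accepts the pair only when the draw is 0, which gives an acceptance probability of exactly p when 1/p is an integer. The function name and seed parameter are hypothetical.

```python
import random
from itertools import combinations


def build_training_library(base_library, p, seed=0):
    """Fig. 8 sketch: keep every within-class difference; keep between-class ones with prob. p."""
    rng = random.Random(seed)
    x_max = int(1 / p)                  # X = [1/p], the integer part of 1/p
    within, between = [], []
    for (vk, yk), (vj, yj) in combinations(base_library, 2):
        diff = [a - b for a, b in zip(vk, vj)]  # difference feature N_i
        if yk == yj:
            within.append(diff)                 # within-class training library
        elif rng.randrange(x_max) == 0:         # assumption: accept 1 of X equally likely integers
            between.append(diff)                # between-class training library
    return within, between


# Toy base library: 3 classes with 2 one-dimensional samples each.
toy = [([0.0], 0), ([0.5], 0), ([10.0], 1), ([10.5], 1), ([20.0], 2), ([20.5], 2)]
within, between = build_training_library(toy, p=1 / 7)
print(len(within))   # 3 within-class pairs; the between-class count depends on the random draws
```

Drawing from a seeded `random.Random` instance keeps the sampled library reproducible between runs.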
On the one hand, the process above actually converts the multiclass deformed-sample problem into a two-class problem, namely deciding whether a pair of samples is a within-class or a between-class pair. This greatly expands the number of samples and so compensates for the shortage of deformed samples. On the other hand, the random-number method effectively avoids the problem that between-class samples far outnumber within-class samples, so the second-stage classification decision unit is not biased toward identifying difference features as between-class differences. The characteristics of this method are illustrated by the following example.
Suppose the deformed-sample base library $D_w$, formed from the training samples that the first-level decision function misrecognizes, contains 10 classes, for example the 10 digits 0-9, and that every class contains the same number $q$ of samples. When the first-level decision function is trained with a statistics-based method such as SVM, very few samples are misrecognized, so each class in $D_w$ contains very few samples; a typical number is about 15. From so few samples, an SVM can hardly learn a classifier with high recognition accuracy. If the 10-class classification problem is converted into a 2-class problem, namely recognizing within-class versus between-class differences, the number of samples can be greatly expanded: the number of between-class samples is $\binom{10}{2} q^2 = 45 \times 225 = 10125$, and the number of within-class samples is $10 \binom{q}{2} = 10 \times 105 = 1050$. Once the samples are expanded in this way, however, between-class samples far outnumber within-class samples. This is known as an imbalanced recognition problem, and a decision function trained on such data tends to assign new input samples to the class with more samples. The invention therefore samples with random numbers. Suppose $p = 1/7$: the sampling unit adds each between-class sample to the between-class training library with probability 1/7 (about 14.3%), so the final between-class library contains about $\binom{10}{2} q^2 / 7 \approx 1446$ samples. On the one hand, the numbers of within-class and between-class samples produced in this way are relatively balanced; on the other hand, the two-class problem has 1050 + 1446 = 2496 samples, far more than the 15 × 10 = 150 samples of the original 10-class problem. Statistics-based learning methods can therefore be used to train a high-precision decision function, greatly improving the accuracy of the system.
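The sample counts in this example can be checked directly. The sketch assumes, as the example does, 10 classes with q = 15 samples each and p = 1/7; the name `pair_counts` is hypothetical, and the between-class count after sampling is an expected value truncated to an integer, as in the text.

```python
from math import comb


def pair_counts(num_classes, q, p):
    """Pair counts for a base library of num_classes classes with q samples each."""
    within = num_classes * comb(q, 2)        # same-class pairs: 10 * C(15, 2)
    between = comb(num_classes, 2) * q * q   # cross-class pairs: C(10, 2) * 15^2
    kept = int(between * p)                  # expected between-class pairs after sampling
    return within, between, kept


within, between, kept = pair_counts(num_classes=10, q=15, p=1 / 7)
print(within, between, kept, within + kept, 10 * 15)
# → 1050 10125 1446 2496 150
```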
The difference-feature decision unit 54 is trained on the deformed-sample training library generated from the deformed-sample base library; the classifier can be any two-class classifier, such as an SVM. The final result is sent to output unit 15. Specifically, let the input character sample be $I$. There are $C$ character classes in total, and each class has $W$ templates in the deformed-sample template library, denoted $T_{ij}$ ($1 \le i \le C$, $1 \le j \le W$). Let $D_N$ be the set of the top $N$ recognition candidates of the first recognition unit; then $N \cdot W$ difference vectors $\Delta_{ij} = |I - T_{ij}|$ ($i \in D_N$, $1 \le j \le W$) are obtained, where $|I - T_{ij}|$ denotes taking the absolute value of each dimension of the vector $I - T_{ij}$. For each difference vector, the difference-feature decision unit 54 decides whether it is a within-class difference or a between-class difference. The decision process is as follows:
Let the decision function of the difference-feature decision unit be $f_{D_w}$ ($f_{D_w}$ can be trained from the two kinds of features, namely within-class and between-class difference features, by a classification method such as SVM).
When $f_{D_w}(\Delta_{ij}) \ge 0$, the input difference feature $\Delta_{ij}$ is judged to be a within-class difference feature, i.e., $I$ and template $T_{ij}$ belong to the same character class.
When $f_{D_w}(\Delta_{ij}) < 0$, the input difference feature $\Delta_{ij}$ is judged to be a between-class difference feature, i.e., $I$ and template $T_{ij}$ belong to different character classes.
Specifically, the recognition output $O$ of the difference-feature decision unit is
$$O = \arg\max_{i \in D_N} \max_{1 \le j \le W} f_{D_w}(\Delta_{ij}) \qquad (1)$$
The larger $f_{D_w}(\Delta_{ij})$ is, the more likely $\Delta_{ij}$ is a within-class difference feature. Therefore, over all templates of the classes appearing in the first-level recognition candidate set, i.e., all $\Delta_{ij} = |I - T_{ij}|$ with $i \in D_N$ and $1 \le j \le W$, the difference-feature classifier assigns sample $I$ to the class of the template with the maximum decision value.
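Decision rule (1) can be sketched as follows. The name `classify_deformed` and the toy decision function are hypothetical: in the patent $f_{D_w}$ is a trained two-class classifier such as an SVM, and the linear stand-in here exists only so the example runs.

```python
def classify_deformed(sample, templates, candidates, f_dw):
    """Return (class, score) for the template with the largest f_Dw(delta_ij).

    sample: feature vector I.
    templates: dict class label i -> list of template vectors T_ij.
    candidates: top-N class labels D_N from the first level.
    f_dw: trained two-class decision function (>= 0 means within-class).
    """
    best_class, best_score = None, float("-inf")
    for i in candidates:                   # only classes in D_N
        for t in templates.get(i, []):     # j = 1..W
            delta = [abs(a - b) for a, b in zip(sample, t)]  # delta_ij = |I - T_ij|
            score = f_dw(delta)
            if score > best_score:
                best_class, best_score = i, score
    return best_class, best_score


def toy_f_dw(delta):
    """Hypothetical stand-in for a trained SVM: small differences score high."""
    return 1.0 - sum(delta)


templates = {"3": [[0.0, 1.0]], "8": [[1.0, 1.0], [0.9, 0.2]]}
label, score = classify_deformed([0.1, 0.9], templates, ["3", "8"], toy_f_dw)
print(label, score >= 0)  # → 3 True
```

Restricting the loop to $D_N$ keeps the cost at $N \cdot W$ evaluations rather than $C \cdot W$.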
The above difference-feature classifier effectively combines template matching with a two-class classifier. On the one hand, the templates are formed from deformed samples, so deformed samples can be recognized effectively. On the other hand, because difference features treat samples only as within-class or between-class pairs, the number of samples is effectively expanded, which improves the classification accuracy of the decision unit, especially of statistics-based classifiers (e.g., SVM).
Fig. 9 is a schematic diagram of the operation of the second-level classification decision unit of the present invention. As shown, after the confidence judgment unit 13 judges a sample to be a deformed sample, the difference operation unit 52 subtracts from the input deformed sample those templates in the deformed-sample template library that belong to the character classes in the top-N recognition candidate set of the first-level classification decision unit, obtaining difference features. Based on the input difference features and the deformed-sample training library, the difference-feature decision unit 54 judges whether each difference feature is a within-class or a between-class difference, and identifies the deformed sample as the class of the template whose difference feature has the maximum within-class confidence. Finally, the output unit 15 outputs the recognition result of the difference-feature decision unit 54. The deformed-sample template library is generated from the deformed-sample base library by the sample clustering unit using an automatic clustering algorithm, and the deformed-sample training library is generated from the deformed-sample base library by the deformed-sample conversion unit 53.
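The text states only that the template library is produced by an automatic clustering algorithm, without naming one. The sketch below assumes per-class k-means, a swapped-in technique that is not necessarily the inventors' choice; the cluster centers play the role of the templates $T_{ij}$. The function names and the number of templates per class are hypothetical.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means; an assumed stand-in for the unspecified clustering step."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, min(k, len(vectors)))]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in vectors:
            # Assign v to the nearest center (squared Euclidean distance).
            nearest = min(range(len(centers)),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            groups[nearest].append(v)
        # Move each center to the mean of its group; empty groups keep their center.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return centers


def build_template_library(base_library, templates_per_class):
    """Cluster each class of the base library separately; centers become T_ij."""
    by_class = {}
    for label, vec in base_library:
        by_class.setdefault(label, []).append(vec)
    return {label: kmeans(vecs, templates_per_class)
            for label, vecs in by_class.items()}


base = [("a", [0, 0]), ("a", [0, 1]), ("a", [10, 10]), ("a", [10, 11]), ("b", [5, 5])]
template_library = build_template_library(base, templates_per_class=2)
print(sorted(template_library["a"]), template_library["b"])
```

Clustering within each class, rather than over the whole base library, guarantees that every template carries a single class label.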
The present invention adopts a two-level recognition structure: samples misrecognized by the first-level recognition device go on to the second-level recognition device for recognition. By directly using the samples misrecognized at the first level as the basic training samples of the second level, the technical solution of the invention can effectively handle deformed samples and thus improve the recognition accuracy of the system. At the same time, generating the deformed-sample library automatically from misrecognized samples avoids collecting a sample library by hand, greatly saving manpower and material resources.
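The two-level structure can be summarized as a small control-flow sketch. All callables are hypothetical placeholders for the first-level classifier, the confidence generation unit and the second-level decision unit; the rule that a confidence greater than the threshold means a non-deformed sample follows claim 4.

```python
def recognize(sample, first_level, confidence_fn, threshold, second_level):
    """Two-level flow: accept the level-1 result when its confidence exceeds
    the threshold; otherwise treat the sample as deformed and use level 2.

    first_level(sample) -> (label, candidates): result and top-N classes D_N.
    confidence_fn(sample, label) -> float: confidence of the level-1 result.
    second_level(sample, candidates) -> label: deformed-sample recognizer.
    """
    label, candidates = first_level(sample)
    if confidence_fn(sample, label) > threshold:
        return label, "first level"                         # non-deformed sample
    return second_level(sample, candidates), "second level"  # deformed sample


# Toy callables (hypothetical), only to exercise both branches.
def first(sample):
    return "6", ["6", "0", "8"]


def second(sample, candidates):
    return "0"


print(recognize("x", first, lambda s, l: 0.95, 0.8, second))  # → ('6', 'first level')
print(recognize("x", first, lambda s, l: 0.40, 0.8, second))  # → ('0', 'second level')
```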
In addition, the method for difference characteristic has been adopted in the identification that the present invention carries out the distortion sample, by poor with getting mutually between the deformed characters proper vector, thereby the character recognition problem of multiclass is converted to two classes, also is the identification problem of interior difference of class and class differences.And in two class difference characteristics after the conversion set training classifier.The character recognition problem of multiclass is converted to the identification of two classes, and greatly the exptended sample number effectively utilizes the recognition methods based on statistics, has improved the sorter training effect, thereby has improved the precision of system.
The above embodiments are intended only to illustrate the present invention, not to limit it.

Claims (13)

1. A character recognition method, characterized in that the method comprises the following steps:
A first classification step of recognizing an input character sample to produce a recognition result;
Generating a confidence of the recognition result;
Judging, according to the confidence, whether the input character sample is a deformed sample;
If it is not a deformed sample, taking the recognition result as the final recognition result;
If it is a deformed sample, recognizing the deformed sample to generate the final recognition result;
Outputting the final recognition result,
Wherein recognizing the deformed sample comprises the following steps:
Generating a deformed-sample template library, and taking the difference between the deformed sample and the templates of the deformed-sample template library to obtain difference features;
Converting the deformed samples into within-class samples and between-class samples, and judging whether each difference feature is a within-class difference or a between-class difference;
Identifying the deformed sample as the class of the template whose difference feature has the maximum within-class confidence.
2. The character recognition method according to claim 1, characterized in that the confidence is generated by fitting the distribution of matching distances.
3. The character recognition method according to claim 2, characterized in that generating the confidence by fitting the distribution of matching distances further comprises the following steps:
Producing the matching distance between the recognition result and each character class;
Selecting the first l matching distances, in ascending order of matching distance, as candidate distances;
Calculating the confidence of the recognition result from the l candidate distances.
4. The character recognition method according to claim 1, characterized in that judging, according to the confidence, whether the input character sample is a deformed sample specifically comprises the following steps:
If the confidence is greater than a set threshold, judging that the input character sample is a non-deformed sample;
Otherwise, judging that the input character sample is a deformed sample.
5. The character recognition method according to claim 1, characterized in that the step of generating the deformed-sample template library comprises the following steps:
Generating a deformed-sample base library, the deformed-sample base library containing only the deformed samples;
Performing automatic clustering on the deformed-sample base library to generate the deformed-sample template library.
6. The character recognition method according to claim 1, characterized in that taking the difference between the deformed sample and the templates of the deformed-sample template library to obtain difference features means:
Taking the difference between the deformed sample and only the deformed-sample templates of the top-N candidate classes of the recognition result of the first classification step, to obtain difference features.
7. The character recognition method according to claim 1, characterized in that converting the deformed samples into between-class samples comprises: sampling according to a given percentage to generate the between-class samples.
8. A character recognition device, characterized in that the device comprises:
A first-level classification decision unit, configured to recognize an input character sample and produce a recognition result;
A confidence generation unit, configured to generate a confidence of the recognition result;
A confidence judgment unit, configured to judge, according to the confidence, whether the input character sample is a deformed sample; if it is a non-deformed sample, the recognition result is taken as the final recognition result, and if it is a deformed sample, the deformed sample is input to a second-level classification decision unit;
The second-level classification decision unit, configured to recognize the deformed sample and generate the final recognition result;
An output unit, configured to output the final recognition result,
Wherein the second-level classification decision unit further comprises:
A deformed-sample template library generation unit, configured to generate a deformed-sample template library;
A difference operation unit, configured to take the difference between the deformed sample and the templates of the deformed-sample template library to obtain difference features;
A deformed-sample conversion unit, configured to convert the deformed samples into within-class samples and between-class samples;
A difference-feature decision unit, configured to judge whether each difference feature is a within-class difference or a between-class difference, and to identify the deformed sample as the class of the template whose difference feature has the maximum within-class confidence.
9. The character recognition device according to claim 8, characterized in that the confidence generation unit further comprises:
A matching distance generation unit, configured to produce the matching distance between the recognition result and each character class;
A candidate distance selection unit, configured to select the first l matching distances, in ascending order of matching distance, as candidate distances;
A computing unit, configured to calculate the confidence of the recognition result from the l candidate distances.
10. The character recognition device according to claim 8, characterized in that the confidence judgment unit further comprises:
A threshold setting unit, configured to set a threshold;
A comparison judgment unit, configured to compare the confidence of the recognition result with the threshold; if the confidence is greater than the set threshold, the input character sample is judged to be a non-deformed sample and the recognition result is taken as the final recognition result; otherwise, the input character sample is judged to be a deformed sample and is input to the second-level classification decision unit.
11. The character recognition device according to claim 8, characterized in that the deformed-sample template library generation unit further comprises:
A deformed-sample base library generation unit, configured to generate a deformed-sample base library, the deformed-sample base library containing only the deformed samples;
A sample clustering unit, configured to perform automatic clustering on the deformed-sample base library to generate the deformed-sample template library.
12. The character recognition device according to claim 8, characterized in that the difference operation unit is further configured to take the difference between the deformed sample and the deformed-sample templates of the top-N candidate classes of the recognition result of the first-level classification decision unit, to obtain difference features.
13. The character recognition device according to claim 8, characterized in that the deformed-sample conversion unit further comprises:
A sampling unit, configured to convert the deformed samples into the between-class samples by sampling according to a given percentage.
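Claims 2 to 4 describe the confidence computation and the threshold test but do not give the fitting formula. The sketch below keeps the claimed steps (matching distances per class, the l smallest taken in ascending order, a threshold comparison) and replaces the unspecified distribution fit with a softmax over negated distances. That formula is an assumption made only for illustration, and the names, l = 3 and scale = 1.0 are hypothetical.

```python
import math


def confidence_from_distances(distances, l=3, scale=1.0):
    """Top-1 label and its confidence from the l smallest matching distances.

    distances: dict class label -> matching distance (smaller is better).
    Placeholder formula (assumption, not the patent's fitted distribution):
    a softmax over the negated candidate distances.
    """
    ranked = sorted(distances.items(), key=lambda kv: kv[1])[:l]  # ascending order
    d_min = ranked[0][1]
    # Subtract d_min before exponentiating so large distances cannot underflow to 0.
    weights = [math.exp(-(d - d_min) / scale) for _, d in ranked]
    return ranked[0][0], weights[0] / sum(weights)


def is_deformed(confidence, threshold):
    """Claim 4: a confidence greater than the threshold means non-deformed."""
    return not confidence > threshold


label, conf = confidence_from_distances({"0": 1.0, "6": 1.1, "8": 4.0, "9": 7.5})
print(label, round(conf, 3), is_deformed(conf, threshold=0.8))  # → 0 0.512 True
```

Two nearly tied candidates ("0" at 1.0 and "6" at 1.1) yield a low confidence, so the sample is routed to the second level as deformed.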

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN2007100787676A CN101256631B (en) 2007-02-26 2007-02-26 Method and apparatus for character recognition
JP2008044928A JP5365026B2 (en) 2007-02-26 2008-02-26 Method, apparatus and program for identifying code

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2007100787676A CN101256631B (en) 2007-02-26 2007-02-26 Method and apparatus for character recognition

Publications (2)

Publication Number Publication Date
CN101256631A CN101256631A (en) 2008-09-03
CN101256631B true CN101256631B (en) 2011-06-01

Family

ID=39786587

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2007100787676A Expired - Fee Related CN101256631B (en) 2007-02-26 2007-02-26 Method and apparatus for character recognition

Country Status (2)

Country Link
JP (1) JP5365026B2 (en)
CN (1) CN101256631B (en)

Also Published As

Publication number Publication date
CN101256631A (en) 2008-09-03
JP5365026B2 (en) 2013-12-11
JP2008210388A (en) 2008-09-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110601

Termination date: 20180226
