CN1204526C - Preclassifying method and system for Chinese handwriting character recognition - Google Patents

Publication number
CN1204526C
Authority
CN
China
Prior art keywords
chinese character
candidate group
hanzi features
hanzi
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 02127006
Other languages
Chinese (zh)
Other versions
CN1471042A (en)
Inventor
郭丰俊
镇立新
黄建成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google Technology Holdings LLC
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

The present invention relates to a pre-classification method for Chinese handwriting recognition systems. Such a system performs pre-classification and fine classification on the features of a handwritten Chinese character in order to recognize it. The method comprises: extracting a low-dimensional first Chinese character feature of the handwritten character and generating a first candidate group from it; extracting a high-dimensional second Chinese character feature of the handwritten character for use in fine classification; reducing the dimensionality of the high-dimensional second feature to obtain a low-dimensional second Chinese character feature, and generating a second candidate group from it; and obtaining a final candidate group from the intersection of the first and second candidate groups. For the same handwritten character, the invention pre-classifies through two sub-pre-classifiers, using two kinds of Chinese character features to screen out two different candidate groups. It thereby avoids the shortcomings of screening candidates with only one pre-classifier and one kind of feature.

Description

Pre-classification method and system for handwritten Chinese character recognition
Technical field
The present invention relates to methods and systems for recognizing handwritten Chinese characters, and in particular to a pre-classification method and apparatus for handwritten Chinese character recognition.
Background art
The computing power and memory capacity of handheld devices are very limited, so Chinese handwriting recognition (CHWR) for portable handheld devices differs greatly from Chinese handwriting recognition for desktop computers. A recognition system generally needs a fine classifier that operates on high-dimensional Chinese character features over the entire character set. The computational complexity and memory demand of such a classifier are too high for it to be applied directly on a handheld device.
Fig. 4a and Fig. 4b show the flow of two existing pre-classifiers. In Fig. 4a, the structural features of the Chinese character are input in step 71, and elastic matching (a dynamic programming method) is then performed in step 72 to obtain the candidate group. However, the structural models of Chinese characters used here are not easy to train. The structures of different characters differ greatly, and so does the computational complexity: the more complex the structure of a character, the more computing time the pre-classifier needs.
Fig. 4b uses a multi-stage pre-classification strategy to improve recognition speed. In each pre-classifier, a feature range is set for each class based on training data. If the feature of the input sample falls within the feature range of a class, that class is included in the candidate group; otherwise it is excluded. The statistical features of the handwritten Chinese character are input in step 81 and compared with the statistical feature range of each Chinese character in step 82 to obtain the candidate group.
Existing pre-classifiers screen the candidate group with only one kind of feature and must trade speed against accuracy: accuracy is low when speed is high, and speed is low when accuracy is high. Their overall performance struggles to meet the needs of handheld devices.
Summary of the invention
In view of the shortcomings of the prior art, an object of the present invention is to provide a new pre-classification method and pre-classifier that better balance the accuracy and speed of pre-classification.
A further object of the present invention is to provide an efficient Chinese handwriting recognition method and system.
According to the present invention, a pre-classification method is provided for a handwritten Chinese character recognition system that performs pre-classification and fine classification on the features of a handwritten Chinese character in order to recognize it. The method comprises: extracting a low-dimensional first Chinese character feature of the handwritten character, and generating a first candidate group from it; extracting a high-dimensional second Chinese character feature of the handwritten character, for use in fine classification; reducing the dimensionality of the high-dimensional second feature to obtain a low-dimensional second Chinese character feature, and generating a second candidate group from it; and obtaining a final candidate group from the intersection of the first and second candidate groups.
For the same handwritten Chinese character, the invention uses two sub-pre-classifiers and two different Chinese character features to screen out two different candidate groups. Pre-classification is based on both candidate groups, which avoids the shortcomings of generating the candidate group with a single pre-classifier and a single kind of feature.
The present invention also provides a method for recognizing a handwritten Chinese character in a handwritten Chinese character recognition system. The method comprises: extracting a low-dimensional first Chinese character feature of the handwritten character, and generating a first candidate group by a first sub-pre-classification; extracting a high-dimensional second Chinese character feature of the handwritten character; obtaining a low-dimensional second Chinese character feature from the high-dimensional second feature, and generating a second candidate group by a second sub-pre-classification; obtaining a final candidate group from the intersection of the first and second candidate groups, as the result of pre-classification; and identifying the written character using the high-dimensional second feature and the final candidate group. By using two different candidate groups together with the high-dimensional second feature, the combined accuracy and speed of recognition improve markedly.
The present invention also provides a pre-classifier for a handwritten Chinese character recognition system. It comprises: a low-dimensional first Chinese character feature extraction device, for extracting the low-dimensional first Chinese character feature of the handwritten character; a first sub-pre-classifier, for generating the first candidate group; a high-dimensional second Chinese character feature extraction device, for extracting the high-dimensional second Chinese character feature of the handwritten character; a dimensionality-reduction device, which reduces the high-dimensional second feature to obtain the low-dimensional second Chinese character feature; a second sub-pre-classifier, which generates the second candidate group from the low-dimensional second feature; and a final candidate group generating device, which obtains the final candidate group from the intersection of the first and second candidate groups.
The present invention also provides a handwritten Chinese character recognition system. It comprises: a low-dimensional first Chinese character feature extraction device, for extracting the low-dimensional first Chinese character feature of the handwritten character; a first sub-pre-classifier, for generating the first candidate group; and a high-dimensional second Chinese character feature extraction device, for extracting the high-dimensional second Chinese character feature of the handwritten character. The system further comprises: a dimensionality-reduction transformation device, which reduces the high-dimensional second feature to obtain the low-dimensional second Chinese character feature; a second sub-pre-classifier, which generates the second candidate group from the low-dimensional second feature so obtained; a final candidate group generating device, for generating the final candidate group; and a fine classifier, which recognizes the handwritten character using the high-dimensional second feature and the final candidate group.
The fine classifier of the present invention recognizes the handwritten Chinese character using the intersection of the first and second candidate groups. This makes full use of the complementarity of the two candidate groups and removes unnecessary candidates, thereby improving the recognition speed of the fine classifier.
The low-dimensional first Chinese character feature differs from the low-dimensional second Chinese character feature, and the two are essentially uncorrelated. The resulting first and second candidate groups are therefore complementary to some degree.
In addition, the peripheral features of a Chinese character are more important than its internal features and contribute more to recognition. The low-dimensional second Chinese character feature of the invention is therefore chosen to be a peripheral statistical feature of the character. The dimensionality-reduction transformation device of the invention summarizes the peripheral features of the sampled high-dimensional second feature, for example by summing them, to obtain the low-dimensional second feature. This avoids a separate extraction step for the low-dimensional second feature.
The present invention also proposes a method of generating a candidate group by pre-classification in a handwritten Chinese character recognition system, comprising: training a plurality of templates of effective statistical features; dividing the templates into a plurality of statistical feature clusters; generating, for each cluster, a cluster center that represents all the Chinese character features in it; generating a character index group for each statistical feature cluster; sampling an input Chinese character to obtain its statistical feature; comparing that statistical feature with the cluster center of each cluster and selecting the most similar clusters, the number of which is predetermined; and merging the character index groups of the selected clusters to produce the candidate group for the input character. Comparing against cluster centers is better than the prior-art approach of comparing against the feature range of each cluster: it is more accurate and more flexible.
Description of drawings
Fig. 1 is a block diagram of the handwritten Chinese character classifier according to the present invention.
Fig. 2 is a schematic diagram of the Chinese character feature dimensionality-reduction method according to the present invention.
Fig. 3 is a schematic diagram of candidate group selection according to the present invention.
Fig. 4a and Fig. 4b show the flow of two prior-art pre-classifiers.
Fig. 4c is a flow chart of the pre-classifier according to the present invention.
Fig. 5a and Fig. 5b are schematic diagrams of extracting the high-dimensional Chinese character feature matrix of a handwritten Chinese character.
Fig. 6a and Fig. 6b are schematic diagrams of reducing the dimensionality of the high-dimensional Chinese character feature matrix of Fig. 5b.
Embodiment
With reference to Fig. 1, the handwritten Chinese character classifier of the present invention comprises a pre-classifier 1 and a fine classifier 2. The pre-classifier comprises a first sub-pre-classifier 12 and a second sub-pre-classifier 13. Pre-classifier 1 also comprises a low-dimensional first Chinese character feature extraction device 10, which extracts the low-dimensional first Chinese character feature from the input character. This feature is generally a low-dimensional statistical feature of the character, such as a low-dimensional frequency-domain feature, or another statistical feature. The first sub-pre-classifier 12 also stores a plurality of clusters adapted to the low-dimensional first feature, including their cluster centers and corresponding character index groups (not shown). Each cluster contains a number of Chinese characters with similar features and has a cluster center that represents the common features of the characters in it. The first sub-pre-classifier compares the low-dimensional first feature with each of its cluster centers to obtain the distance to each center. Based on these distances, it selects the several clusters with the smallest distances as its output. The Chinese characters contained in these nearest clusters form the first candidate group.
Pre-classifier 1 also comprises a device for obtaining the low-dimensional second Chinese character feature, namely a dimensionality-reduction transformation device 21. This device reduces the extracted high-dimensional second Chinese character feature to the low-dimensional second Chinese character feature. The high-dimensional second feature is extracted by the high-dimensional Chinese character feature extraction device and is used for fine classification. Both the high-dimensional and the low-dimensional second features are statistical features of the Chinese character, but the low-dimensional second feature is a different statistical feature from the low-dimensional first feature. As noted above, Chinese characters have many kinds of statistical features, and the first and second features can be any of them, provided that the two selected features differ, that is, are mutually orthogonal to some degree (almost uncorrelated). For example, the stroke-count feature and the stroke-direction feature of a Chinese character have low correlation: the characters "China fir" and "close to" have similar stroke-count features and fall in the same cluster, but their direction features differ greatly and do not fall in the same cluster. The second sub-pre-classifier stores a plurality of clusters adapted to the low-dimensional second feature. Each cluster contains a number of Chinese characters and has a cluster center that represents the common features of the characters in it. The second sub-pre-classifier compares the low-dimensional second feature of the input character with each of its cluster centers to obtain the distance to each center. Based on these distances, it selects the several clusters with the smallest distances as its output. The Chinese characters contained in these nearest clusters form the second candidate group.
Because the first and second candidate groups are complementary to some degree, their intersection can be used as the final candidate group of the pre-classifier, that is, as the candidate group of the fine classifier. This removes the unnecessary characters from the first candidate group, screened by the low-dimensional first Chinese character feature, and from the second candidate group, screened by the low-dimensional second Chinese character feature. The operation is performed by the final candidate group generating device (intersection generating device) 14 and the final candidate group storage device 15 of the pre-classifier shown in Fig. 1. It reduces the number of characters in the candidate group that the fine classifier must process, which raises the recognition speed of the fine classifier and, in turn, of the whole handwritten Chinese character classifier.
Alternatively, because the low-dimensional first and second Chinese character features are mutually orthogonal to some degree, the first and second candidate groups are complementary, and the group screened by the first feature and the group screened by the second feature can supplement each other. In this case, the final candidate group generating device (intersection generating device) 14 in Fig. 1 is simply replaced by a union generating device (not shown). The final candidate group of the pre-classifier then consists of all the characters in the first and second candidate groups, and serves as the candidate group from which the fine classifier 22 identifies the handwritten Chinese character.
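The two ways of combining the candidate groups described above, intersection and union, amount to ordinary set operations. A minimal sketch, assuming candidate groups are collections of character labels; the function name `combine_candidate_groups` and the `mode` parameter are illustrative choices, not terms from the patent.

```python
def combine_candidate_groups(first_group, second_group, mode="intersection"):
    """Combine the outputs of the two sub-pre-classifiers into the final candidate group."""
    first, second = set(first_group), set(second_group)
    if mode == "intersection":
        # Keep only characters that both sub-pre-classifiers consider plausible.
        return first & second
    if mode == "union":
        # Keep every character that either sub-pre-classifier considers plausible.
        return first | second
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical candidate groups, using letters as stand-ins for characters.
first_group = ["a", "b", "c"]
second_group = ["b", "c", "d"]
narrow = combine_candidate_groups(first_group, second_group)
wide = combine_candidate_groups(first_group, second_group, mode="union")
```

The intersection gives the fine classifier fewer characters to examine, while the union is less likely to discard the correct character.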
The fine classifier 22 comprises an extraction device 20 for a high-dimensional Chinese character feature, which extracts the high-dimensional feature from the handwritten Chinese character. To give recognition sufficient precision, a high-dimensional directional feature is generally chosen. The fine classifier 22 uses this high-dimensional feature to identify the handwritten character from the pre-selected candidate group delivered to it.
The low-dimensional second Chinese character feature is obtained by a dimensionality-reduction transformation of the high-dimensional Chinese character feature used by the fine classifier. This function is performed by the dimensionality-reduction transformation device 21. As noted above, the peripheral features of a handwritten Chinese character are more important than its internal features, so during dimensionality reduction the invention preferentially extracts the peripheral features from the high-dimensional feature. Fig. 2a shows a high-dimensional statistical feature extracted by the high-dimensional Chinese character feature extraction device, in which each black dot represents a multi-dimensional feature. The features at the four corners (the peripheral features) are extracted from the high-dimensional feature, as shown in Fig. 2b. The peripheral features inside each dashed outline are then summarized, for example by summing, to obtain the reduced statistical feature shown in Fig. 2c. Using this reduced statistical feature as the low-dimensional second Chinese character feature simplifies feature extraction.
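The corner-summing reduction described above might look like the following sketch. It assumes the high-dimensional feature is stored as a grid of per-block feature vectors and that each corner region is a square window of `corner` by `corner` blocks; the grid layout, the window size and the name `reduce_by_corners` are assumptions, since the patent only shows the regions in its figures.

```python
def reduce_by_corners(grid, corner):
    """Sum the block feature vectors inside each of the four corner windows.

    grid[r][c] is the feature vector of block (r, c). The result concatenates
    one summed vector per corner, so its length is 4 * len(grid[0][0]).
    """
    rows, cols = len(grid), len(grid[0])
    windows = [
        (range(0, corner), range(0, corner)),                      # top-left
        (range(0, corner), range(cols - corner, cols)),            # top-right
        (range(rows - corner, rows), range(0, corner)),            # bottom-left
        (range(rows - corner, rows), range(cols - corner, cols)),  # bottom-right
    ]
    reduced = []
    for row_range, col_range in windows:
        blocks = [grid[r][c] for r in row_range for c in col_range]
        # Sum each feature dimension over the blocks of this corner.
        reduced.extend(sum(values) for values in zip(*blocks))
    return reduced


# Hypothetical 4 x 4 grid with a 2-dimensional feature per block (32 dimensions).
grid = [[[1, r * 4 + c] for c in range(4)] for r in range(4)]
low_dim = reduce_by_corners(grid, corner=2)
```

Here 32 values are reduced to 8, and no second feature extraction pass over the handwriting is needed.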
With reference to Fig. 3, the character index group generating device 5 of the present invention is described below. This device divides the Chinese characters to be recognized into a plurality of clusters according to their features. Each cluster has a cluster center, which represents the feature of the cluster, that is, the common features of all the Chinese characters in it. Each cluster corresponds to a character index group, which contains the indices of the characters in the cluster. The character index group generating device 5 comprises statistical feature templates 51, a clustering device 52, a character index group storage device 53 and a cluster center storage device 54.
Suppose m Chinese characters need to be recognized. Effective statistical feature templates 51 are first trained, so that the number of templates is also m. A clustering technique is then used to divide the m templates into n clusters. For pre-classification to be fast, n and m must satisfy n << m, that is, the number of clusters must be far smaller than the number of templates. The cluster center and the character index group of each cluster are then obtained; the character index group records the indices of all the Chinese characters in the cluster. The characters in the same cluster have similar features.
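The training step above can be sketched with a plain k-means loop. The patent only says "a clustering technique", so k-means, the deterministic initialization from the first n templates, and the name `build_clusters` are assumptions made for illustration.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_clusters(templates, n, iterations=20):
    """Cluster m templates into n clusters.

    templates maps each character to its feature vector. Returns
    (centers, index_groups): centers[k] is the mean feature of cluster k and
    index_groups[k] lists the characters assigned to it.
    """
    chars = list(templates)
    # Assumed initialization: use the first n templates as starting centers.
    centers = [list(templates[c]) for c in chars[:n]]
    index_groups = [[] for _ in range(n)]
    for _ in range(iterations):
        index_groups = [[] for _ in range(n)]
        for c in chars:
            nearest = min(range(n),
                          key=lambda k: squared_distance(templates[c], centers[k]))
            index_groups[nearest].append(c)
        for k, members in enumerate(index_groups):
            if members:  # an empty cluster keeps its previous center
                columns = zip(*(templates[c] for c in members))
                centers[k] = [sum(col) / len(members) for col in columns]
    return centers, index_groups


# Hypothetical templates: m = 4 characters, clustered into n = 2 clusters.
templates = {"a": [0.0, 0.0], "b": [0.0, 1.0], "c": [10.0, 10.0], "d": [10.0, 11.0]}
centers, index_groups = build_clusters(templates, n=2)
```

In a real system m would be in the thousands and n far smaller, so that comparing an input against n centers is much cheaper than comparing it against m templates.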
In this way, a plurality of first clusters, cluster centers and first character index groups for these m Chinese characters can be obtained from the low-dimensional first Chinese character feature, and a plurality of second clusters, cluster centers and second character index groups can be obtained from the low-dimensional second Chinese character feature. Applying the method above to the frequency-domain feature of the characters yields a plurality of frequency-domain feature clusters, frequency-domain cluster centers and frequency-domain character index groups.
Applying it to the directional feature of the characters yields a plurality of directional feature clusters, directional cluster centers and directional character index groups.
The candidate group generating device of the present invention is described in detail below with reference to Fig. 3. Each sub-pre-classifier comprises a candidate group generating device 6, which comprises a feature input device 60, a cluster center comparison device 61, a cluster selection device 62 and a character index group combination storage device 63. After the feature of the handwritten Chinese character is extracted, the feature input device 60 inputs it to the sub-pre-classifier. The cluster center comparison device 61 compares the input feature with the cluster center of each cluster (or character index group) in this sub-pre-classifier. The cluster selection device 62 uses the resulting distances to select the P clusters nearest to the input, that is, P character index groups. The character index group combination storage device 63 combines the Chinese characters in these P character index groups into a candidate group.
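The lookup performed by devices 60 to 63 can be sketched as a nearest-center search followed by a merge of index groups. The squared Euclidean distance and the name `candidate_group` are assumptions; the patent speaks only of a "distance".

```python
import heapq


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def candidate_group(feature, centers, index_groups, p):
    """Merge the character index groups of the p clusters nearest to the input feature."""
    nearest = heapq.nsmallest(
        p, range(len(centers)),
        key=lambda k: squared_distance(feature, centers[k]))
    candidates = set()
    for k in nearest:
        candidates.update(index_groups[k])
    return candidates


# Hypothetical cluster centers and their character index groups.
centers = [[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]]
index_groups = [["a", "b"], ["c"], ["d", "e"]]
one_cluster = candidate_group([4.0, 4.0], centers, index_groups, p=1)
two_clusters = candidate_group([4.0, 4.0], centers, index_groups, p=2)
```

Each sub-pre-classifier would run this lookup with its own feature, centers and index groups.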
Combining the candidate groups obtained by the two sub-pre-classifiers gives the final candidate group of the pre-classifier. The value of P affects both the recognition accuracy and the number of clusters in the candidate group, that is, the number of candidates. If P is large, recognition accuracy improves, but the number of candidates also grows, which slows the subsequent fine classifier. If P is small, the subsequent fine classifier runs quickly, but recognition accuracy falls.
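One way to see the effect of P is to measure, on labelled samples, how often the correct character survives pre-classification and how large the candidate group becomes. The sketch below uses made-up one-dimensional data; the helper names and the hit-rate metric are assumptions for illustration only.

```python
def nearest_clusters(feature, centers, p):
    """Indices of the p cluster centers closest to the feature."""
    order = sorted(
        range(len(centers)),
        key=lambda k: sum((x - y) ** 2 for x, y in zip(feature, centers[k])))
    return order[:p]


def evaluate(p, samples, centers, index_groups):
    """Return (hit_rate, mean_candidate_count) for a given P."""
    hits = 0
    total_size = 0
    for feature, label in samples:
        candidates = set()
        for k in nearest_clusters(feature, centers, p):
            candidates.update(index_groups[k])
        hits += label in candidates
        total_size += len(candidates)
    return hits / len(samples), total_size / len(samples)


# Hypothetical clusters and labelled test samples.
centers = [[0.0], [4.0], [8.0]]
index_groups = [["a", "b"], ["c", "d"], ["e", "f"]]
samples = [([0.5], "a"), ([2.4], "b"), ([7.0], "e"), ([5.9], "f")]
small_p = evaluate(1, samples, centers, index_groups)
large_p = evaluate(2, samples, centers, index_groups)
```

On this toy data, raising P from 1 to 2 recovers the two samples that were missed, at the cost of doubling the number of candidates passed to the fine classifier.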
The recognition of the handwritten Chinese character "hand" is described below with reference to Fig. 5a, Fig. 5b, Fig. 6a and Fig. 6b. After the handwritten character "hand" is input, the handwritten Chinese character classifier extracts two kinds of statistical features from it. The low-dimensional first Chinese character feature extraction device 10 extracts one (the first) low-dimensional feature of "hand". The high-dimensional second Chinese character feature extraction device 20 extracts another (the second) high-dimensional feature of "hand". These two statistical features can be chosen from those commonly used in Chinese character recognition, such as the directional feature, the contour feature, the stroke-count feature and the frequency-domain feature. One statistical feature is used by the first sub-pre-classifier 12, and the other by the fine classifier 2. The two features are preferably chosen to reflect different properties of the character, because the classifier also passes the high-dimensional second feature through the dimensionality-reduction transformation to obtain the low-dimensional second Chinese character feature used by the second sub-pre-classifier 13.
In this embodiment, the low-dimensional first Chinese character feature is a low-dimensional frequency-domain feature, for example one of fewer than 30 dimensions. The high-dimensional second Chinese character feature is a high-dimensional directional feature, for example one of more than 100 dimensions.
Fig. 5 schematically shows the extraction of a high-dimensional Chinese character feature of the character "hand". To distinguish it from the feature used by the first sub-pre-classifier 12, it is called the second Chinese character feature. After input, the character "hand" is divided into a plurality of blocks, as shown in Fig. 5a. Fig. 5a is only an example; the actual block division is determined by the required dimensionality of the statistical feature. In each block, the system computes the directional feature of the strokes and extracts the result shown in Fig. 5b. The symbols "-", "|", "/" and "\" in Fig. 5b represent different directional features.
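A block-wise directional feature of the kind described above could be computed roughly as follows. The patent does not say how directions are measured, so everything here is an assumption: strokes are given as point lists in the unit square, each pen segment is quantized to one of four orientations (0, 45, 90 or 135 degrees), and its length is credited to the block containing its midpoint.

```python
import math


def direction_features(strokes, grid=4):
    """Return features[row][col][d]: total stroke length per block and orientation.

    Orientation index d is 0 (horizontal), 1 (45 degrees), 2 (vertical) or
    3 (135 degrees), measured in the coordinate system of the input points.
    """
    features = [[[0.0] * 4 for _ in range(grid)] for _ in range(grid)]
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            length = math.hypot(x1 - x0, y1 - y0)
            if length == 0:
                continue
            # Orientation ignores pen travel direction, so fold angles into [0, 180).
            angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
            direction = int((angle + 22.5) // 45) % 4
            col = min(int((x0 + x1) / 2 * grid), grid - 1)
            row = min(int((y0 + y1) / 2 * grid), grid - 1)
            features[row][col][direction] += length
    return features


# Hypothetical input: one horizontal stroke near a corner, one vertical stroke
# near the opposite corner.
strokes = [[(0.1, 0.1), (0.3, 0.1)], [(0.9, 0.8), (0.9, 0.95)]]
features = direction_features(strokes)
```

With a 4 by 4 grid this gives 64 dimensions; a finer grid or more orientation bins would be needed to exceed the 100 dimensions mentioned for the embodiment.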
Fig. 6 shows how the dimensionality-reduction transformation device 21 reduces the high-dimensional second Chinese character feature of Fig. 5 to a low-dimensional statistical feature, which serves as the low-dimensional second Chinese character feature. As noted above, the peripheral features of a Chinese character are much more important than its internal features. In Fig. 6a, dashed rectangles select the high-dimensional features at the four corners of the input character. The directional features of the blocks inside each dashed rectangle are then summarized, for example by summing, to reduce the dimensionality and obtain the low-dimensional directional feature shown in Fig. 6b. This low-dimensional directional feature is used by the second sub-pre-classifier 13 and is therefore called the low-dimensional second Chinese character feature.
The method above yields the low-dimensional first and second Chinese character features that the pre-classifier requires. The first sub-pre-classifier 12 compares the low-dimensional frequency-domain feature with each frequency-domain cluster center to obtain the distances between them. Based on these distances, it selects the P1 nearest frequency-domain feature clusters. The Chinese characters in these clusters form the first candidate group. The value of P1 is a compromise between recognition accuracy (recognition rate) and the required amount of computation (speed).
Likewise, the second sub-pre-classifier 13 compares the low-dimensional directional feature with each directional cluster center to obtain the distances between them. Based on these distances, it selects the P2 nearest directional feature clusters. The Chinese characters in these clusters form the second candidate group. The value of P2 is likewise a compromise between recognition accuracy (recognition rate) and the required amount of computation (speed).
Next, the intersection device 14 receives the first candidate group output by the first sub-pre-classifier 12 and the second candidate group output by the second sub-pre-classifier 13, and takes their intersection as the final candidate group of the pre-classifier. Finally, the fine classifier 22 uses the high-dimensional directional feature to identify the handwritten Chinese character from this candidate group.
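The final stage can be sketched as a search restricted to the final candidate group. The patent does not specify how the fine classifier decides, so the nearest-template rule, the name `fine_classify` and the toy templates below are assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fine_classify(high_dim_feature, candidates, templates):
    """Return the candidate whose high-dimensional template is nearest, or None."""
    if not candidates:
        return None
    # sorted() only makes tie-breaking deterministic.
    return min(sorted(candidates),
               key=lambda ch: squared_distance(high_dim_feature, templates[ch]))


# Hypothetical high-dimensional templates (3 dimensions here for brevity).
templates = {
    "a": [0.0, 0.0, 0.0],
    "b": [1.0, 1.0, 1.0],
    "c": [2.0, 2.0, 2.0],
    "d": [3.0, 3.0, 3.0],
}
first_group = {"a", "b", "c"}    # output of the first sub-pre-classifier
second_group = {"b", "c", "d"}   # output of the second sub-pre-classifier
final_group = first_group & second_group
result = fine_classify([1.2, 0.9, 1.1], final_group, templates)
```

Only the templates in the final candidate group are compared against the input, which is where the speed-up over searching all m characters comes from.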
Ask for the common factor of the first candidate group and the second candidate group, be equivalent to utilize first kind of Hanzi features screening of low-dimensional to obtain the first candidate group after, get rid of impossible Chinese character in the first candidate group, promptly unnecessary Chinese character according to second kind of Hanzi features of low-dimensional.Like this, just dwindled the Chinese character quantity in the final candidate group, also just dwindled the identification range of sophisticated category device, thereby accelerated recognition speed.
Fig. 4c shows how the pre-classifier of the present invention differs from the existing pre-classifiers of Fig. 4a and Fig. 4b. In the present invention, the statistical features of the input handwritten Chinese character are first sampled in step 91 and then compared with the cluster center of each cluster in step 92. In step 93, based on the comparison, the P clusters closest to the statistical features of the input handwritten character are selected. In step 94, the Chinese characters in these P clusters form the candidate group. The present invention uses the statistical features of Chinese characters, and its classifier is a distance classifier rather than a dynamic programming classifier.
Once the required recognition speed and recognition rate have been determined, the handwritten Chinese character classifier of the present invention can jointly consider the value of the first cluster count P1, the value of the second cluster count P2, and whether to use the intersection or the union of the first candidate group and the second candidate group, so as to determine a handwritten Chinese character recognition scheme that suits different requirements.
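The choice between intersection and union can be pictured with a small sketch. The function `combine_groups` and its `mode` parameter are hypothetical names introduced only for this illustration.

```python
def combine_groups(first_group, second_group, mode="intersection"):
    """Combine two candidate groups.

    "intersection" gives a smaller group (faster fine classification);
    "union" gives a larger group (less risk of dropping the correct character).
    """
    if mode == "intersection":
        return first_group & second_group
    if mode == "union":
        return first_group | second_group
    raise ValueError(f"unknown mode: {mode!r}")

first_group = {"一", "二", "三"}
second_group = {"二", "三", "日"}

fast = combine_groups(first_group, second_group, "intersection")
safe = combine_groups(first_group, second_group, "union")
print(len(fast), len(safe))
```

Together with P1 and P2, this gives three settings that can be adjusted to meet a given speed or recognition-rate target.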

Claims (16)

1. A pre-classification method for a handwritten Chinese character recognition system, the handwritten Chinese character recognition system being used to pre-classify and finely classify the features of a handwritten Chinese character in order to recognize the character, the method comprising extracting a low-dimensional first kind of Hanzi feature of the handwritten Chinese character and producing a first candidate group; characterized in that the method comprises:
Extracting a high-dimensional second kind of Hanzi feature of the handwritten Chinese character, for use in fine classification;
Reducing the dimensionality of the high-dimensional second kind of Hanzi feature to obtain a low-dimensional second kind of Hanzi feature, and producing a second candidate group; and
Obtaining a final candidate group from the intersection of the first candidate group and the second candidate group.
2. the method for presorting as claimed in claim 1, it is characterized in that: described first kind of Hanzi features is two kinds of different Chinese character statistical natures with second kind of Hanzi features, and they are stroke direction feature, contour feature, stroke number feature and the frequency field features that are selected from respectively in the Chinese character statistical nature.
3. the method for presorting as claimed in claim 1 is characterized in that: the frequency field feature that described first kind of Hanzi features is Chinese character, second kind of stroke direction feature that Hanzi features is a Chinese character.
4. the method for presorting as claimed in claim 1 is characterized in that: first kind of Hanzi features of described low-dimensional and second kind of Hanzi features are all less than 30 dimensions, and second kind of Hanzi features of described higher-dimension is greater than 100 dimensions.
5. the method for presorting as claimed in claim 1 is characterized in that: said dimensionality reduction is the second kind of technology of Chinese character feature that becomes low-dimensional after four jiaos of peripheral characteristics from the Chinese character of second kind of Hanzi features of the higher-dimension that has extracted gather.
6. A method for recognizing a handwritten Chinese character in a handwritten Chinese character recognition system, comprising extracting a low-dimensional first kind of Hanzi feature of the handwritten Chinese character, for use by a first sub-pre-classifier to produce a first candidate group; and extracting a high-dimensional second kind of Hanzi feature of the handwritten Chinese character, for use in fine classification; characterized in that the method comprises:
Reducing the dimensionality of the high-dimensional second kind of Hanzi feature to obtain a low-dimensional second kind of Hanzi feature, for use by a second sub-pre-classifier to produce a second candidate group;
Obtaining a final candidate group from the intersection of the first candidate group and the second candidate group, as the result of pre-classification; and
Using the high-dimensional second kind of Hanzi feature to identify the handwritten Chinese character from the final candidate group.
7. The method for recognizing a handwritten Chinese character in a handwritten Chinese character recognition system as claimed in claim 6, characterized in that: the first kind of Hanzi feature and the second kind of Hanzi feature are different Chinese character statistical features, each selected from among the stroke direction feature, the contour feature, the stroke count feature and the frequency-domain feature of Chinese character statistical features.
8. The method for recognizing a handwritten Chinese character in a handwritten Chinese character recognition system as claimed in claim 6, characterized in that: the first kind of Hanzi feature is the frequency-domain feature of the Chinese character, and the second kind of Hanzi feature is the stroke direction feature of the Chinese character.
9. The method for recognizing a handwritten Chinese character in a handwritten Chinese character recognition system as claimed in claim 6, characterized in that: the low-dimensional first kind of Hanzi feature and the low-dimensional second kind of Hanzi feature each have fewer than 30 dimensions, and the high-dimensional second kind of Hanzi feature has more than 100 dimensions.
10. The method for recognizing a handwritten Chinese character in a handwritten Chinese character recognition system as claimed in claim 6, characterized in that: said dimensionality reduction aggregates the four-corner peripheral features of the Chinese character, taken from the extracted high-dimensional second kind of Hanzi feature, to form the low-dimensional second kind of Hanzi feature.
11. A pre-classifier for a handwritten Chinese character recognition system, comprising a low-dimensional first-kind Hanzi feature extraction device for extracting a low-dimensional first kind of Hanzi feature of a handwritten Chinese character, and a first sub-pre-classifier for producing a first candidate group according to the low-dimensional first kind of Hanzi feature; characterized in that the pre-classifier further comprises:
A high-dimensional second-kind Hanzi feature extraction device, for extracting a high-dimensional second kind of Hanzi feature of the handwritten Chinese character;
A low-dimensional second-kind Hanzi feature extraction device, for obtaining a low-dimensional second kind of Hanzi feature from the high-dimensional second kind of Hanzi feature;
A second sub-pre-classifier, for producing a second candidate group according to the low-dimensional second kind of Hanzi feature; and
A final candidate group generation device, for obtaining a final candidate group from the intersection of the first candidate group and the second candidate group.
12. A handwritten Chinese character recognition system, comprising a low-dimensional first-kind Hanzi feature extraction device for extracting a low-dimensional first kind of Hanzi feature of a handwritten Chinese character; a first sub-pre-classifier for producing a first candidate group according to the extracted low-dimensional first kind of Hanzi feature; and a high-dimensional second-kind Hanzi feature extraction device for extracting a high-dimensional second kind of Hanzi feature of the handwritten Chinese character; characterized in that the handwritten Chinese character recognition system further comprises:
A dimensionality reduction device, for reducing the dimensionality of the extracted high-dimensional second kind of Hanzi feature to obtain a low-dimensional second kind of Hanzi feature;
A second sub-pre-classifier, for producing a second candidate group according to the low-dimensional second kind of Hanzi feature;
A final candidate group generation device, for producing a final candidate group; and
A fine classifier, for using the high-dimensional second kind of Hanzi feature to identify the handwritten Chinese character from the final candidate group.
13. The handwritten Chinese character recognition system as claimed in claim 12, characterized in that: the final candidate group generation device obtains the final candidate group from the intersection of the first candidate group and the second candidate group.
14. A method for producing a candidate group by pre-classification in a handwritten Chinese character recognition system, comprising:
Training a plurality of templates of effective statistical features;
Dividing the templates into a plurality of statistical feature clusters;
In each cluster, generating a cluster center that represents all the Hanzi features in that cluster;
For each statistical feature cluster, producing a character index group;
For an input Chinese character, extracting the statistical feature of the Chinese character sample;
Comparing the statistical feature of the Chinese character sample with each cluster center, and selecting a plurality of clusters having the smallest distance to the statistical feature of the input Chinese character, wherein the number of clusters so selected is predetermined; and
Merging the character index groups of the selected clusters to produce the candidate group corresponding to the input Chinese character.
15. The method for producing a candidate group as claimed in claim 14, characterized in that: the number of statistical feature clusters is far smaller than the number of statistical feature templates.
16. A method for generating the candidate group of a sub-pre-classifier, comprising extracting the statistical feature of a handwritten input Chinese character sample; characterized in that: the statistical feature extracted from the Chinese character sample is compared with the cluster center of each Chinese character cluster stored in the pre-classifier, wherein each cluster center represents the common features of the Chinese characters in that cluster; a number of clusters having the smallest distance to the statistical feature of the input Chinese character are selected, wherein the number of clusters so selected is predetermined; and the character index groups of the selected clusters are merged to produce the candidate group corresponding to the input Chinese character.
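The training side of claims 14 to 16 (dividing templates into clusters, generating a cluster center for each cluster, and building a character index group per cluster) can be sketched as below. This is a sketch under stated assumptions, not the patented procedure: the claims do not name a clustering algorithm, so a plain k-means loop with naive seeding is used here, each character is assumed to have a single template, and the names `build_cluster_index`, `assign` and `mean` are hypothetical.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def assign(templates, centers):
    """Character index groups: each character goes to its nearest cluster center."""
    groups = [set() for _ in centers]
    for ch, vec in templates.items():
        nearest = min(range(len(centers)), key=lambda i: math.dist(vec, centers[i]))
        groups[nearest].add(ch)
    return groups

def build_cluster_index(templates, k, iterations=10):
    """Cluster the templates (k-means, an assumed choice) and return
    (cluster centers, character index group of each cluster)."""
    centers = list(templates.values())[:k]  # naive seeding: the first k templates
    for _ in range(iterations):
        groups = assign(templates, centers)
        # Recompute each center; an empty cluster keeps its previous center.
        centers = [mean([templates[ch] for ch in g]) if g else c
                   for g, c in zip(groups, centers)]
    return centers, assign(templates, centers)

# Hypothetical 2-dimensional statistical feature templates, one per character.
templates = {"一": (0, 0), "二": (0, 1), "口": (5, 5), "日": (5, 6)}
centers, index_groups = build_cluster_index(templates, k=2)
print(centers, index_groups)
```

At recognition time only the `k` cluster centers are compared with the input feature, so keeping the number of clusters far smaller than the number of templates, as claim 15 requires, is what makes the pre-classification inexpensive.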
CN 02127006 2002-07-25 2002-07-25 Preclassifying method and system for Chinese handwriting character recognition Expired - Fee Related CN1204526C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 02127006 CN1204526C (en) 2002-07-25 2002-07-25 Preclassifying method and system for Chinese handwriting character recognition

Publications (2)

Publication Number Publication Date
CN1471042A CN1471042A (en) 2004-01-28
CN1204526C true CN1204526C (en) 2005-06-01

Family

ID=34143448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 02127006 Expired - Fee Related CN1204526C (en) 2002-07-25 2002-07-25 Preclassifying method and system for Chinese handwriting character recognition

Country Status (1)

Country Link
CN (1) CN1204526C (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101685497B (en) * 2008-09-28 2011-10-12 汉王科技股份有限公司 Method and device for processing hand-written information
BR112012009701A8 (en) * 2009-10-28 2017-10-10 Koninklijke Philips Electronics Nv METHOD FOR SELECTING EXERCISES FROM A PLURALITY OF EXERCISES FOR A USER, DEVICE FOR SELECTING EXERCISES FROM A PLURALITY OF EXERCISES FOR A USER AND MEANS OF INFORMATION
CN103295007B (en) * 2013-05-02 2016-06-22 华南理工大学 A kind of Feature Dimension Reduction optimization method for Chinese Character Recognition
CN113642679B (en) * 2021-10-13 2021-12-28 山东凤和凰城市科技有限公司 Multi-type data identification method

Also Published As

Publication number Publication date
CN1471042A (en) 2004-01-28

Similar Documents

Publication Publication Date Title
CN1167030C (en) Handwriteen character recognition using multi-resolution models
CN1315090C (en) Method for identifying hand-writing characters
CN1163841C (en) On-line hand writing Chinese character distinguishing device
CN1908960A (en) Feature classification based multiple classifiers combined people face recognition method
CN1333366C (en) On-line hand-written Chinese characters recognition method based on statistic structural features
Long et al. Building compact MQDF classifier for large character set recognition by subspace distribution sharing
CN100390815C (en) Template optimized character recognition method and system
CN1236458A (en) Reducing handwriting recognizer errors using decision trees
Song et al. Comparative study of part-based handwritten character recognition methods
Biswas et al. Writer identification of Bangla handwritings by radon transform projection profile
Pal et al. Handwritten street name recognition for Indian postal automation
CN1204526C (en) Preclassifying method and system for Chinese handwriting character recognition
Guo et al. Research on Feature Extraction for Character Recognition of NaXi Pictograph.
Lamghari et al. Template matching for recognition of handwritten Arabic characters using structural characteristics and Freeman code
Sadr et al. Categorization of persian detached handwritten letters using intelligent combinations of classifiers
CN1029534C (en) Handwriting Chinese character online identifying method and system
CN1851729A (en) AdaBoost based characteristic extracting method for pattern recognition
Saba et al. Semantic analysis based forms information retrieval and classification
Hassan Arabic (Indian) Handwritten‏‏ Digits Recognition Using Multi feature and KNN Classifier
Hoque et al. A moving window classifier for offline character recognition
Bresler et al. Simultaneous segmentation and recognition of graphical symbols using a composite descriptor
Amor et al. Multifont Arabic character recognition using Hough transform and hidden Markov models
Freitas et al. Metaclasses and zoning mechanism applied to handwriting recognition
Gao et al. A vision-based fast chinese postal envelope identification system
CN115063667B (en) Parallel identification processing method for document scanning PDF (Portable document Format) file

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
ASS Succession or assignment of patent right

Owner name: MOTOROLA MOBILE CO., LTD

Free format text: FORMER OWNER: MOTOROLA INC.

Effective date: 20110120

C41 Transfer of patent application or patent right or utility model
TR01 Transfer of patent right

Effective date of registration: 20110120

Address after: Illinois State

Patentee after: MOTOROLA MOBILITY, Inc.

Address before: Illinois, USA

Patentee before: Motorola, Inc.

ASS Succession or assignment of patent right

Owner name: MOTOROLA MOBILITY INC.

Free format text: FORMER OWNER: MOTOROLA INC.

Effective date: 20110620

C41 Transfer of patent application or patent right or utility model
CI01 Publication of corrected invention patent application

Correction item: Patentee

Correct: Motorola Inc.|Illinois State

False: Motorola Mobility LLC|Illinois State

Number: 10

Volume: 27

ERR Gazette correction

Free format text: CORRECT: PATENTEE; FROM: MOTOROLA MOBILITY INC.: ILLINOIS, THE USA TO: MOTOROLA INC.: ILLINOIS, THE USA

TR01 Transfer of patent right

Effective date of registration: 20110620

Address after: Illinois State

Patentee after: MOTOROLA MOBILITY, Inc.

Address before: Illinois, USA

Patentee before: Motorola, Inc.

C41 Transfer of patent application or patent right or utility model
C56 Change in the name or address of the patentee
CP01 Change in the name or title of a patent holder

Address after: Illinois State

Patentee after: MOTOROLA MOBILITY LLC

Address before: Illinois State

Patentee before: MOTOROLA MOBILITY, Inc.

TR01 Transfer of patent right

Effective date of registration: 20160303

Address after: California, USA

Patentee after: Google Technology Holdings LLC

Address before: Illinois State

Patentee before: MOTOROLA MOBILITY LLC

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050601

Termination date: 20210725
