SUMMARY OF THE INVENTION
According to one aspect of the present invention, there is provided a method of compressing handwritten character templates, the method comprising the steps of: generating a codebook comprising vectors that define the centers of groups of uncompressed model character feature vectors provided by model character templates; and comparing the uncompressed model character feature vectors with the codebook to provide compressed templates of the model characters.
The method may preferably include calculating distances between the groups of uncompressed model character feature vectors and an uncompressed input character feature vector, and providing the top-n candidate characters by calculating the distances between the uncompressed input character feature vector and the model character templates, where n is a value based on a threshold.
The step of providing the uncompressed input character feature vector may include the step of extracting features from a normalized input character template.
The codebook preferably includes no more than 256 groups.
The uncompressed input character feature vector may be an eight-dimensional vector.
The top-n candidate characters may be provided by calculating the distances between the uncompressed input character feature vector and the model character templates using a sorting method that provides the candidate characters.
The candidate characters may be provided in a character recognition method.
According to another aspect, the present invention includes a system for compressing handwritten character templates, the system comprising: a codebook generator module for generating a codebook, wherein the codebook comprises vectors that define the centers of groups of uncompressed model character feature vectors provided by model character templates; and a template compression module, operably connected to the codebook generator module, for comparing the uncompressed model character feature vectors with the codebook so as to provide compressed templates of the model characters.
The system preferably includes a template matching module, operably connected to the template compression module, for providing candidate characters from the distances between an uncompressed input character feature vector and the model character templates.
The system may also include a distance lookup table generator.
In this specification, including the claims, the term "comprising" or similar terms are intended to mean a non-exclusive inclusion. For example, a method or apparatus that comprises a list of elements does not include only those elements, but may also include other elements not listed.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring to Fig. 1, there is shown a schematic block diagram of the software elements of a system 100 for compressing handwritten character templates according to a preferred embodiment of the invention. Fig. 2 is a general flow diagram of a method of compressing handwritten character templates using the software elements of the system 100. The software elements include a codebook generator module 105, which produces uncompressed model character feature vectors 110 from model character templates and classifies the feature vectors 110 into groups 115. The model character templates comprise the statistical averages of the feature vectors of numerous input samples of each character. The uncompressed model character feature vectors 110 comprise the feature vectors of all characters in a character set. For some languages, such as Chinese, a character set contains more than 10,000 characters. The vector centers of the groups 115 are indexed to form a codebook 125 (step 210). Next is a template compression module 120, in which the codes from the codebook 125 are compared with the uncompressed model character feature vectors 110 to produce compressed templates 135 of the model characters (step 215).
A template matching module 140 comprises a lookup table generator 155, which compares the codes in the codebook 125 with the uncompressed input character feature vectors 130 to compute a distance lookup table 145 (step 220). A template matching generator 160 calculates the distances between the compressed templates 135 of the model characters and the uncompressed input character feature vectors 130 and sorts the results (step 225). Finally, the system 100 provides the top-n candidate characters 150, where n is a constant (step 230).
Referring to Fig. 3, there is shown a schematic diagram of an input to the present invention, comprising a digitized, high-resolution, three-dimensional information representation 300 of a handwritten character 305. The information representation 300 includes the two-dimensional physical features of the character 305 and a time dimension that provides directional information about the handwritten strokes. The stroke directions of the horizontal and vertical features of the character 305 are illustrated by arrows 310. Methods of producing three-dimensional images such as the information representation 300 are well known in the art, so the steps of such methods are only summarized here.
Fig. 3 also illustrates how the information representation 300 of the digitized character 305 is converted into uncompressed input character feature vectors 130. The pixels of the character 305 are first fitted to a grid 315 and normalized so that the size of the character 305 is the same as the size of the model characters used to produce the uncompressed model character feature vectors 110. Each cell of the grid 315 is then subdivided into a uniform fine grid 320, which is then analyzed to extract a feature vector 130. In the embodiment shown in Fig. 3, each grid 320 comprises 7 × 7 cells. An example of a feature vector 130 is an eight-dimensional direction vector. Each dimension of the vector 130 corresponds to a stroke direction specified by one of the eight arrows 325 shown in Fig. 3, which are produced by dividing a circle in 45-degree increments. Those skilled in the art will appreciate that feature vectors 130 of more or fewer dimensions may also be used according to the invention.
Each cell of a grid 320 that contains pixels from a handwritten stroke is assigned to one of the eight directions based on the direction that best fits the actual stroke direction in that cell. The direction dimensions of the cells of the grid 320 are summed to produce an uncompressed input character feature vector 130. An eight-dimensional direction feature vector is defined as $V = \{v_1, v_2, v_3, v_4, v_5, v_6, v_7, v_8\}$, where the value of $v_i$ is the number of cells of the $i$-th direction dimension in the grid 320, with $1 \le i \le 8$. The feature vector describing the character 305 shown in Fig. 3 would thus be assigned the following values: $v_1 = 4$; $v_2 = 0$; $v_3 = 7$; $v_4 = 0$; $v_5 = 0$; $v_6 = 0$; $v_7 = 0$; $v_8 = 3$.
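The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `direction_feature_vector` and the input layout (a 2-D list holding a direction index 1 to 8 per cell, or `None` for a cell without stroke pixels) are hypothetical and are not taken from the specification.

```python
from typing import List, Optional

NUM_DIRECTIONS = 8  # eight stroke directions, 45 degrees apart


def direction_feature_vector(cells: List[List[Optional[int]]]) -> List[int]:
    """Count how many cells of one grid fall into each of the 8 directions.

    `cells` is a hypothetical 2-D grid: each entry is a direction index
    1..8 for a cell containing stroke pixels, or None for an empty cell.
    """
    v = [0] * NUM_DIRECTIONS
    for row in cells:
        for d in row:
            if d is not None:
                v[d - 1] += 1  # v_i counts the cells assigned to direction i
    return v


# A 7 x 7 grid built to reproduce the example values v1=4, v3=7, v8=3.
example: List[List[Optional[int]]] = [[None] * 7 for _ in range(7)]
for r in range(4):
    example[r][0] = 1      # four cells in direction 1
for r in range(7):
    example[r][3] = 3      # seven cells in direction 3
for c in range(4, 7):
    example[6][c] = 8      # three cells in direction 8

print(direction_feature_vector(example))  # [4, 0, 7, 0, 0, 0, 0, 3]
```

How a cell is assigned its direction is outside this sketch; only the summation into the eight dimensions is shown.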
Referring to Fig. 4, there is shown a flow diagram summarizing the method 400 of providing the feature vectors 130 described above. In step 405, the information representation 300 of an input character 305 is received. In step 410, the information representation 300 of the input character 305 is normalized to a grid 315. In step 415, each cell of the grid 315 is further subdivided into a grid 320. In step 420, the direction features of each grid 320 are extracted. Next, in step 425, a value is assigned to each dimension of the uncompressed input character feature vector 130.
According to the present invention, the model character templates comprise the statistical averages of the feature vectors of numerous input samples of each character. The features of each input sample are extracted by the same process as the method 400 described above.
Referring to Fig. 5, there is shown a flow diagram illustrating a method 500 of generating the codebook 125 in the codebook generator module 105 using the uncompressed model character feature vectors 110 of all model characters in a character set. In step 505, the uncompressed model character feature vectors 110 are provided as described above. The uncompressed model character feature vectors 110 make up the template of each model character. Next, in step 510, the uncompressed model character feature vectors 110 are clustered according to a clustering process known in the art, for example the K-Means method or the GLA algorithm used in vector quantization. According to a preferred embodiment of the invention, the eight-dimensional uncompressed model character feature vectors 110 are clustered into 256 different groups 115. The clustering may be based on, for example, the Euclidean distance between the vectors 110. In step 515, the method 500 then provides the codebook 125 to the template compression module 120. The codebook 125 comprises the vector centers of the 256 groups 115, with each group 115 assigned a code between 0 and 255. Depending on the number of model characters analyzed, many thousands of uncompressed model character feature vectors 110 may be compressed into the 256 groups 115. According to another embodiment of the invention, a portable electronic device that uses the invention and has minimal memory resources may use fewer groups 115 to define the codebook 125.
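As a rough illustration of step 510, the following Python sketch clusters feature vectors with a plain K-Means loop and returns the group centers, so that the list index of each center serves as its code. The function names, the fixed iteration count, the random seeding, and the handling of empty groups are all assumptions made for the example; the specification names K-Means and GLA only as possible clustering methods and does not prescribe these details.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_codebook(vectors: List[Vector], k: int = 256,
                   iterations: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain K-Means; returns up to k group centers (list index = code)."""
    rng = random.Random(seed)
    k = min(k, len(vectors))
    # Start from k distinct sample vectors chosen at random.
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        # Assignment step: each vector joins the group of its nearest center.
        for v in vectors:
            nearest = min(range(k),
                          key=lambda c: squared_euclidean(v, centers[c]))
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its members.
        for c, members in enumerate(groups):
            if members:  # an empty group keeps its previous center
                dim = len(members[0])
                centers[c] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers
```

With the default `k=256`, every code fits in one byte; a device with less memory could pass a smaller `k`, in line with the alternative embodiment mentioned above.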
Referring to Fig. 6, there is shown a flow diagram of a method 600 of producing the compressed templates 135 of the model characters using the template compression module 120. In step 605, each uncompressed model character feature vector 110 is compared with the groups 115 in the codebook 125. Next, in step 610, each uncompressed model character feature vector 110 is replaced with the single code number of the group 115 that best matches that feature vector. In step 615, the compressed templates 135 of the model characters are provided to the template matching module 140.
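Steps 605 and 610 can be sketched as a nearest-center search followed by packing the resulting codes into bytes. The names `nearest_code` and `compress_template` are hypothetical, and the use of squared Euclidean distance as the "best match" criterion is an assumption, since the specification does not state which measure is used at this step.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_code(vector: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook center closest to `vector`.

    Squared Euclidean distance is an assumed matching criterion.
    """
    return min(
        range(len(codebook)),
        key=lambda c: sum((x - y) ** 2 for x, y in zip(vector, codebook[c])),
    )


def compress_template(template: List[Vector], codebook: List[Vector]) -> bytes:
    """Replace each feature vector of a template with a one-byte code."""
    if len(codebook) > 256:
        raise ValueError("codes must fit in one byte (at most 256 groups)")
    return bytes(nearest_code(v, codebook) for v in template)


# Toy codebook with three centers and a template of three grid vectors.
codebook = [[0] * 8, [5] * 8, [10] * 8]
template = [[1] * 8, [9] * 8, [4] * 8]
print(list(compress_template(template, codebook)))  # [0, 2, 1]
```

Storing the result as `bytes` makes the one-byte-per-grid cost of a compressed template explicit.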
The foregoing describes how the original, uncompressed model character feature vectors 110 are converted into the compressed templates 135 of the model characters. For example, consider an alphabet or dictionary of k digitized characters used with the above embodiment of the invention. Storing the corresponding uncompressed model character feature vectors 110 requires a memory module holding k × m eight-dimensional vectors, where m is the total number of grids 320 used to define each character 305. With the above process of the invention, however, each eight-byte uncompressed model character feature vector 110 is reduced to a single entry of the compressed template 135 of the model character that requires only 1 byte of memory. The required memory is thus reduced to 1/8 of the size originally required.
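As a worked illustration of this saving, take k = 10,000 characters (the order of magnitude mentioned earlier for Chinese) and m = 49 grids per character (the template size shown in Fig. 8). The figures below assume one byte per vector dimension, which is implied by the eight-byte vector size but is otherwise an assumption of this example; the codebook size is likewise computed for 256 centers of eight bytes each.

```latex
\begin{align*}
\text{uncompressed templates} &= k \times m \times 8
  = 10{,}000 \times 49 \times 8 = 3{,}920{,}000 \text{ bytes} \\
\text{compressed templates} &= k \times m \times 1
  = 10{,}000 \times 49 \times 1 = 490{,}000 \text{ bytes} \\
\text{codebook} &= 256 \times 8 = 2{,}048 \text{ bytes}
\end{align*}
```

The templates alone shrink by exactly a factor of eight. The codebook adds a small fixed cost that does not grow with k.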
Referring to Fig. 7, there is shown a flow diagram of a method 700 of providing the final candidate characters 150 for use in an associated character recognition algorithm. In step 220, the distance between each group 115 in the codebook 125 and each uncompressed input character feature vector 130 is calculated. As known in the art, the vector distance can be calculated using the following formula:
$$D(A, B) = \sum_{i=1}^{8} \lvert a_i - b_i \rvert \qquad \text{(Formula 1)}$$
where $A = \{a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8\}$ and $B = \{b_1, b_2, b_3, b_4, b_5, b_6, b_7, b_8\}$. The distance between A and B thus equals the sum of the absolute values of the differences between corresponding dimensions. In step 710, the distances are provided in the lookup table 145. The size of the distance lookup table 145 is m × 256, where m is again the total number of grids 320 used to define each input character 305. In other words, m is the total number of uncompressed input character feature vectors 130 used to define each input character 305.
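The distance of Formula 1 and the construction of the m × 256 lookup table can be sketched as follows. The function names are hypothetical, and the table is represented here as a list of rows, one per input grid, which is only one possible layout.

```python
from typing import List, Sequence

Vector = Sequence[float]


def l1_distance(a: Vector, b: Vector) -> float:
    """Sum of absolute differences between corresponding dimensions."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_distance_table(input_vectors: List[Vector],
                         codebook: List[Vector]) -> List[List[float]]:
    """table[j][c] = distance between input grid j and codebook center c.

    For m input vectors and 256 centers the table has m x 256 entries.
    """
    return [[l1_distance(v, code) for code in codebook]
            for v in input_vectors]
```

The table is computed once per input character, so each later comparison against a model template needs only table lookups rather than fresh vector arithmetic.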
The remainder of the flow diagram of Fig. 7 is described with reference to the schematic diagram of Fig. 8, which illustrates a compressed model character template 805. The template shown in Fig. 8 comprises 49 grids and therefore holds 49 code indices into the codebook 125. The values of the grids are designated {5, 10, ..., 15}. The index of the first grid 810 is 5, so the distance between the feature of the first grid 810 and the feature in the first grid 320 of the uncompressed input character template is $Dis_{1,5}$. The value of $Dis_{1,5}$ can be found in the lookup table 145. Similarly, the distance between the feature in each of the other grids of the compressed model character template 805 and the corresponding feature in the uncompressed input character template is determined. After the distances for all 49 grids have been determined, the total distance (DIS) between the uncompressed input character template and the compressed model character template 805 is calculated according to the following formula:
$$DIS = Dis_{1,5} + Dis_{2,10} + \cdots + Dis_{49,15} \qquad \text{(Formula 2)}$$
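The summation of Formula 2 amounts to one table lookup per grid. In the sketch below, `table[j][c]` is assumed to hold the precomputed distance between input grid j and codebook center c, and `compressed` is the sequence of one-byte codes of a model character template; both names are hypothetical.

```python
from typing import List, Sequence


def template_distance(table: List[List[float]],
                      compressed: Sequence[int]) -> float:
    """Total distance between an input character and one model template.

    For each grid j, look up the distance to the center whose code is
    stored at position j of the compressed template, then add them up.
    """
    if len(table) != len(compressed):
        raise ValueError("input and template must have the same grid count")
    return sum(row[code] for row, code in zip(table, compressed))
```

For the template of Fig. 8, the first two terms would be `table[0][5]` and `table[1][10]` when zero-based grid indices are used.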
Referring again to Fig. 7, in step 225 the system 100 calculates the distances between the compressed templates 135 of the model characters and the uncompressed input character feature vectors 130. These distances are sorted in step 720. Finally, in step 230, the system 100 provides the top-n candidate characters 150.
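The sorting and top-n selection of steps 720 and 230 might look like the sketch below. The function name, the dictionary mapping each model character to its compressed template, and the use of `heapq.nsmallest` in place of a full sort are all choices made for this illustration and are not specified in the text.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def top_n_candidates(table: List[List[float]],
                     templates: Dict[str, Sequence[int]],
                     n: int) -> List[Tuple[float, str]]:
    """Return the n model characters closest to the input, nearest first.

    table[j][c] is the precomputed distance between input grid j and
    codebook center c; each template is a sequence of one-byte codes.
    """
    scored = (
        (sum(row[code] for row, code in zip(table, codes)), char)
        for char, codes in templates.items()
    )
    # nsmallest avoids sorting the whole character set when n is small.
    return heapq.nsmallest(n, scored)


# Toy data: two grids, two codebook centers, three model characters.
table = [[3, 1], [7, 5]]
templates = {"A": bytes([0, 0]), "B": bytes([1, 1]), "C": bytes([1, 0])}
print(top_n_candidates(table, templates, 2))  # [(6, 'B'), (8, 'C')]
```

Ties in distance are broken by character order in this sketch, which is a side effect of comparing tuples rather than a behavior described in the specification.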
The present invention is therefore an improved method of compressing handwritten character templates for use in electronic character recognition processing. The use of the codebook 125 enables multidimensional uncompressed model character feature vectors 110 to be compressed into a codebook index that is preferably only one byte. Further, the invention can preferably use a simple lookup technique to match the input character feature vectors 130 against the model character templates by comparing the distance between each feature of the input character and of the model character. When used in conjunction with a character recognition system, the invention can achieve high-accuracy character recognition while reducing memory and processing overhead. The reduced memory and processing overhead makes the invention particularly suitable for use with portable electronic devices such as mobile phones and personal digital assistants (PDAs).
The above detailed description provides a preferred exemplary embodiment only and is not intended to limit the scope, applicability, or configuration of the invention. Rather, the description of the preferred exemplary embodiment provides those skilled in the art with an enabling description for implementing the preferred exemplary embodiment of the invention. It will be understood that various changes may be made in the function and arrangement of elements and steps without departing from the spirit and scope of the invention as set forth in the appended claims.