Summary of the invention
According to one aspect, the present invention is a method of handwriting recognition comprising the steps of: receiving a representation of a handwritten input character written on a user interface of an electronic device; extracting a directional feature vector from the input character; extracting a stroke segment feature vector from the input character; and determining a matching candidate character by comparing the directional feature vector and the stroke segment feature vector with model characters.
The method may also comprise the step of preprocessing the handwritten input character by performing smoothing, noise removal and size normalization.
The method may also comprise the step of presorting the handwritten input character.
The method may also comprise the step of providing a short list of candidate characters and their corresponding first confidence scores ($v_i^2$) by comparing only the directional feature vector with the model characters.
The step of determining the matching candidate character may comprise comparing the short list of candidate characters and their corresponding first confidence scores ($v_i^2$) with the stroke segment feature vector.
The method may further comprise the step of calculating a disorder function D(S) of the handwritten input character.
The disorder function D(S) may be calculated according to the following formula:
where $s_i = (x_i, y_i)$ and $s_j = (x_j, y_j)$ are the starting points of two handwritten strokes.
The step of comparing the short list of candidate characters and their corresponding first confidence scores ($v_i^2$) with the stroke segment feature vector may comprise providing corresponding second confidence scores ($v_i^3$) and merging the first confidence scores ($v_i^2$) and the second confidence scores ($v_i^3$) according to the following formula:
where $\theta_2 + \theta_3 = 1$, and $\theta_2$ and $\theta_3$ are determined empirically.
$\theta_2$ and $\theta_3$ may vary according to whether $D(S) < T(S)$, where T(S) is an empirically determined threshold that depends on the number of strokes, and D(S) is the disorder function calculated according to the following formula:
where $s_i = (x_i, y_i)$ and $s_j = (x_j, y_j)$ are the starting points of two handwritten strokes.
The matching candidate character may be the character having the maximum value of $v_i$.
According to a further aspect, the present invention is a system for handwriting recognition comprising: a microprocessor; a read-only memory (ROM) operatively coupled to the microprocessor; a programmable memory operatively coupled to the microprocessor; and a tablet operatively coupled to the microprocessor. The microprocessor is operative to execute code stored in the ROM to receive a representation of a handwritten input character written on the tablet, extract a directional feature vector from the input character, extract a stroke segment feature vector from the input character, and determine a matching candidate character by comparing the directional feature vector and the stroke segment feature vector with model characters stored in the programmable memory.
The microprocessor may further be operative to preprocess the handwritten input character by performing smoothing, noise removal and size normalization.
The microprocessor may further be operative to presort the handwritten input character.
The microprocessor may further be operative to provide a short list of candidate characters and their corresponding first confidence scores ($v_i^2$) by comparing only the directional feature vector with the model characters.
The microprocessor may further be operative to compare the short list of candidate characters and their corresponding first confidence scores ($v_i^2$) with the stroke segment feature vector.
In this specification and the claims, the terms "comprises", "comprising" and similar terms denote a non-exclusive inclusion, so that a method or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed.
Embodiment
Referring to Fig. 1, a flowchart illustrates, according to one embodiment of the present invention, a handwriting recognition method 100 for determining the model character candidate that best matches a representation of a handwritten input character. Method 100 begins at step 105, where a representation of a handwritten input character is received. In step 110 the input character is preprocessed, and in step 115 a disorder function D(S), which relates to the stroke order used to create the input character, is calculated. In step 120 the input character is presorted to create a first short list of matching candidate characters. A directional feature vector is then extracted from the input character in step 125. In step 130 the directional feature vector is compared with the model characters corresponding to the first short list of matching candidate characters, to create a second short list of matching candidate characters and associated first confidence scores ($v_i^2$). In step 135 a stroke segment feature vector is extracted from the input character. Then, in step 140, the stroke segment feature vector is compared with the second short list of matching candidate characters to create a second set of confidence scores ($v_i^3$). Finally, in step 145, the handwritten input character is recognized by determining a matching candidate character. The matching candidate character is determined by merging the first confidence scores ($v_i^2$) and the second confidence scores ($v_i^3$).
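To make the data flow of method 100 concrete, the following is a hypothetical Python skeleton. Each step is passed in as a callable with an illustrative name (preprocess, disorder, presort, direction_match, segment_match, merge); none of these names comes from this specification, and the step implementations are left to the caller. What the sketch shows is the cascade: each stage narrows or rescores the candidate list produced by the stage before it.

```python
from typing import Callable, Dict, List

Scores = Dict[str, float]  # candidate character -> confidence score


def recognize(
    strokes,
    preprocess: Callable,
    disorder: Callable,
    presort: Callable[..., List[str]],
    direction_match: Callable[..., Scores],
    segment_match: Callable[..., Scores],
    merge: Callable[[Scores, Scores, float], str],
) -> str:
    """Hypothetical skeleton of method 100; each callable stands in for one step."""
    char = preprocess(strokes)               # step 110: smooth, denoise, normalize
    d_value = disorder(char)                 # step 115: disorder function D(S)
    first_list = presort(char)               # step 120: first short list
    v2 = direction_match(char, first_list)   # steps 125-130: second short list + v_i^2
    v3 = segment_match(char, list(v2))       # steps 135-140: v_i^3 for the same candidates
    return merge(v2, v3, d_value)            # step 145: D(S) may steer the merge weights
```

The candidates scored in step 140 are exactly the keys of the step 130 result, so the second short list is always a subset of the first.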
Further details of method 100 are provided below. Method 100 can be incorporated into handheld electronic devices, such as personal digital assistants (PDAs) and mobile telephones, to provide handwriting recognition. In step 105, the representation of the handwritten input character may be written on a user interface of the electronic device. In step 110, preprocessing may perform smoothing, noise removal and size normalization on the input character. The disorder function D(S) is described later in this specification. In step 120, the input character is presorted by first grouping all characters in the character set into, for example, several hundred groups, each containing dozens of characters. For some languages, such as Chinese, the character set contains more than 10,000 characters. Model characters are then trained to represent each group, and the input character is compared with each of these model characters. The first short list of candidate characters is then formed from the characters in the best-matching group or groups.
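A minimal sketch of the presorting in step 120, assuming each group is represented by a single model feature vector (for example the mean of its members' vectors) and that groups are ranked by Euclidean distance. The group representation, the distance, and the hypothetical parameter top_groups (how many of the closest groups contribute to the first short list) are all assumptions; the text says only that the best-matching group(s) supply the list.

```python
import math


def presort(features, group_models, group_members, top_groups=2):
    """Hypothetical sketch of step 120.

    features: feature vector of the input character.
    group_models: dict group_id -> model feature vector representing the group.
    group_members: dict group_id -> list of characters in that group.
    Returns the first short list: all characters of the top_groups closest groups.
    """
    # Rank groups by Euclidean distance between the input and each group model
    ranked = sorted(group_models, key=lambda g: math.dist(features, group_models[g]))
    short_list = []
    for g in ranked[:top_groups]:
        short_list.extend(group_members[g])
    return short_list
```

Comparing against a few hundred group models instead of more than 10,000 character models is what keeps the later, more expensive comparisons affordable on a handheld device.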
In step 125, method 100 extracts the directional feature vector from the input character using methods known in the art. For example, in preprocessing step 110 the pixels of the digitized input character are first fitted to an N×N grid and normalized, so that the input character has the same size as the model characters used to create the model character directional feature vectors. Each element of the N×N grid is then subdivided into an even finer grid, which is analyzed to extract the directional feature vector. One example of a directional feature vector is an 8-dimensional directional feature vector, each dimension of which corresponds to one stroke direction used to create the input character. The eight stroke directions can be created by dividing the circle into eight equal 45-degree sectors. Those skilled in the art will recognize that directional feature vectors with more or fewer dimensions may also be used in accordance with the present invention. Next, each element of the finer grid that contains pixels of a handwritten stroke is assigned one of the eight dimensions, namely the direction that best matches the actual stroke direction within that element. The direction dimensions of the grid elements are then summed to create the directional feature vector. The 8-dimensional directional feature vector is defined as $V = \{v_1, v_2, v_3, v_4, v_5, v_6, v_7, v_8\}$, where $v_i$ ($1 \le i \le 8$) is the count of the i-th direction dimension in the grid. The 8-dimensional directional feature vectors within each element of the N×N grid are then averaged. Finally, an 8×N×N-dimensional directional feature vector is obtained for the whole input character.
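The following Python sketch approximates step 125 under simplifying assumptions. Instead of analyzing pixels in finer sub-grids, it quantizes the direction of each segment between consecutive pen points (already size-normalized to the square [0, size), with y increasing downward) into one of eight 45-degree sectors, counts those directions per cell of an n×n grid, and then normalizes each cell's eight counts to sum to 1, which is one possible reading of the "averaged" step. The function name direction_features and the parameters n and size are illustrative.

```python
import math


def direction_features(strokes, n=4, size=1.0):
    """Hypothetical sketch: 8-direction histogram per cell of an n x n grid.

    strokes: list of strokes, each a list of (x, y) points already
    size-normalized to the square [0, size) x [0, size).
    Returns a flat list of length 8 * n * n.
    """
    hist = [[0.0] * 8 for _ in range(n * n)]
    for stroke in strokes:
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            dx, dy = x1 - x0, y1 - y0
            if dx == 0 and dy == 0:
                continue  # skip repeated points
            # Quantize the segment direction into one of eight 45-degree sectors
            angle = math.atan2(dy, dx) % (2 * math.pi)
            direction = int(round(angle / (math.pi / 4))) % 8
            # Assign the segment to the grid cell containing its midpoint
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            col = min(int(mx / size * n), n - 1)
            row = min(int(my / size * n), n - 1)
            hist[row * n + col][direction] += 1
    features = []
    for cell in hist:
        total = sum(cell)
        # Normalize each cell's counts so its 8 values sum to 1 (or stay 0)
        features.extend(c / total if total else 0.0 for c in cell)
    return features
```

With n = 4 the result has 8 × 4 × 4 = 128 components, matching the 8×N×N layout described above.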
Each model character feature vector is the statistical average of the feature vectors extracted from a plurality of input samples of that character, and model character feature vectors are provided for all characters in the character set.
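Building the model character feature vectors as described amounts to an element-wise mean over the training samples of each character. A minimal sketch, with the hypothetical function name build_templates and the assumption that all samples of a character yield feature vectors of equal length:

```python
def build_templates(samples):
    """samples: dict character -> list of equal-length feature vectors.

    Returns dict character -> element-wise mean feature vector.
    """
    templates = {}
    for char, vectors in samples.items():
        count = len(vectors)
        # zip(*vectors) walks the vectors column by column (one feature at a time)
        templates[char] = [sum(column) / count for column in zip(*vectors)]
    return templates
```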
Next, method 100 continues at step 130, where the directional feature vector is compared with the model character feature vectors to create the second short list of candidate characters, which is a subset of the first short list. Confidence scores ($v_i^2$) are then assigned to the candidate characters in the second short list, based on a best-match comparison of the distance between the directional feature vector of the input character and the directional feature vectors of the candidate characters. The distance between vectors may be based, for example, on the Euclidean distance or the city block distance.
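A minimal sketch of step 130, assuming model vectors built by averaging as above. The text allows either Euclidean or city block distance; converting a distance into a confidence score with 1 / (1 + distance), and the parameter keep (the length of the second short list), are illustrative assumptions rather than details taken from the text.

```python
import math


def city_block(a, b):
    """City block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def second_short_list(features, templates, first_list, keep=10, metric="euclidean"):
    """Hypothetical sketch of step 130.

    Scores every candidate of the first short list against its model vector
    and returns the keep best as a dict candidate -> first confidence score.
    """
    dist = math.dist if metric == "euclidean" else city_block
    # Smaller distance -> higher confidence (assumed mapping)
    scored = {c: 1.0 / (1.0 + dist(features, templates[c])) for c in first_list}
    best = sorted(scored, key=scored.get, reverse=True)[:keep]
    return {c: scored[c] for c in best}
```

The returned dictionary is both the second short list (its keys) and the first confidence scores $v_i^2$ (its values).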
Method 100 then continues at step 135, where the stroke segment feature vectors are extracted from the input character. The stroke segment feature vectors are obtained by using the third dimension (the time dimension) of the input character. Fig. 2 illustrates the stroke segment features of a Chinese character. The stroke segment features are defined by the stroke segments lying between points of locally maximal change in writing direction of the input character. A two-dimensional vector (dx, dy) is then obtained for each segment, where dx and dy are the coordinate distances between the start and end points of the segment. The number of stroke segment feature vectors used to describe an input character can vary with the complexity of the character.
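One way to realize the segmentation of step 135 is sketched below under stated assumptions: each stroke is a time-ordered list of (x, y) points; a breakpoint is an interior point whose turning angle is a local maximum and at least min_turn (45 degrees here, an arbitrary choice); and each resulting segment yields its (dx, dy) vector. The names turning_angle and stroke_segments are hypothetical.

```python
import math


def turning_angle(p0, p1, p2):
    """Absolute change in writing direction at p1, in radians (0..pi)."""
    a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    diff = abs(a2 - a1) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


def stroke_segments(stroke, min_turn=math.radians(45)):
    """Split one time-ordered stroke at local maxima of the turning angle
    and return one (dx, dy) vector per segment."""
    turns = [turning_angle(stroke[i - 1], stroke[i], stroke[i + 1])
             for i in range(1, len(stroke) - 1)]
    cuts = [0]
    for k, t in enumerate(turns):
        left = turns[k - 1] if k > 0 else 0.0
        right = turns[k + 1] if k + 1 < len(turns) else 0.0
        if t >= min_turn and t >= left and t > right:
            cuts.append(k + 1)  # turns[k] belongs to stroke point k + 1
    cuts.append(len(stroke) - 1)
    # (dx, dy) between the start and end point of each segment
    return [(stroke[e][0] - stroke[s][0], stroke[e][1] - stroke[s][1])
            for s, e in zip(cuts, cuts[1:]) if e > s]
```

For a multi-stroke character, the per-stroke lists would be concatenated in writing order, which is how the time dimension enters the feature sequence.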
The stroke segment feature vectors are used to determine whether the input character was written in the expected manner. For example, Chinese characters have a standard stroke order, which is usually observed even when characters are written with connected strokes. When writing Chinese characters, the standard stroke order generally follows these rules: 1) left before right; 2) top before bottom. Fig. 3 shows an example of a Chinese character written according to these standard stroke-order rules. As shown in Fig. 3, the standard stroke order is s1, s2, s3, s4, where the black dot on each stroke marks its starting point. Note that s2 is below s1, s3 is to the right of s2, and s4 is below and to the right of s3.
Method 100 continues at step 140, where the stroke segment feature vectors obtained from the input character are compared with the second short list of matching candidate characters to create the second set of confidence scores ($v_i^3$). Fig. 4 is a schematic diagram of the comparison process that produces the second set of confidence scores ($v_i^3$). The stroke segment model sequences of the characters in the second short list of matching candidates are compared with the stroke segment feature vectors of the input character. A stroke segment model sequence comprises probability density functions (PDFs), such as Gaussian mixture PDFs. The number of feature vectors from the input character may differ from the number of stroke segment models of a matching candidate character, so dynamic programming (DP) is used to compare the model sequence with the input character. A confidence score ($v_i^3$) is calculated as the measure of each comparison. During DP processing, the confidence score between a stroke segment feature vector and a segment model is determined as the likelihood that the feature vector matches the segment model.
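The DP comparison of step 140 can be sketched as a DTW-style alignment that preserves segment order but lets one model absorb several feature vectors, or one feature vector span several models. Assumptions in this sketch: each segment model is a Gaussian mixture with diagonal covariance stored as (weight, mean, variance) tuples; inputs are non-empty; and the score is the summed log-likelihood along the best path, with no length normalization (the text does not say whether or how scores are normalized across candidates). gmm_log_likelihood and dp_match are hypothetical names.

```python
import math


def gmm_log_likelihood(x, mixture):
    """Log-likelihood of a 2-D point x under a diagonal Gaussian mixture.

    mixture: list of (weight, (mean_x, mean_y), (var_x, var_y)) tuples.
    """
    total = 0.0
    for weight, mean, var in mixture:
        density = weight
        for xi, mu, s2 in zip(x, mean, var):
            density *= math.exp(-(xi - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
        total += density
    return math.log(total) if total > 0 else float("-inf")


def dp_match(features, models):
    """Align feature vectors to segment models in order and return the best
    total log-likelihood over all monotonic alignments."""
    n, m = len(features), len(models)
    neg = float("-inf")
    score = [[neg] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            ll = gmm_log_likelihood(features[i], models[j])
            if i == 0 and j == 0:
                best_prev = 0.0
            else:
                best_prev = max(
                    score[i - 1][j - 1] if i > 0 and j > 0 else neg,  # advance both
                    score[i - 1][j] if i > 0 else neg,  # two vectors share one model
                    score[i][j - 1] if j > 0 else neg,  # one vector spans two models
                )
            score[i][j] = ll + best_prev
    return score[n - 1][m - 1]
```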
Finally, in step 145, method 100 recognizes the handwritten input character by determining the best-matching candidate character, which it does by merging the first confidence scores ($v_i^2$) and the second confidence scores ($v_i^3$). The two sets of confidence scores are merged according to the following formula:
where $\theta_2 + \theta_3 = 1$.
The candidate character with the maximum value of $v_i$ is chosen as the best-matching candidate character. $\theta_2$ and $\theta_3$ are weight factors that reflect which set of confidence scores ($v_i^2$ or $v_i^3$) is more likely to provide an accurate recognition result.
Fig. 5 is a flowchart of a method 500 for determining the values of $\theta_2$ and $\theta_3$. First, in step 505, method 500 determines whether the input character was written with connected strokes or with a plurality of separate strokes. An input character written with connected strokes most likely conforms to the standard stroke order. Therefore, if the input character was written with connected strokes, method 500 immediately outputs the values $\theta_2 = a_1$ and $\theta_3 = a_2$, where $a_1$ and $a_2$ are determined empirically by testing comparisons of input characters with model candidate characters. If, instead, step 505 determines that the input character was written with a plurality of separate strokes, method 500 proceeds to step 510, where the disorder function D(S) is calculated. D(S) is a measure of how closely the stroke order of the input character agrees with the standard stroke order. For example, in one embodiment of the present invention, D(S) can be calculated for Chinese as follows.
First, define $d(s_i, s_j)$, which describes the relation between two strokes, where $s_i = (x_i, y_i)$ and $s_j = (x_j, y_j)$ are the starting-point coordinates of the two strokes. $d(s_i, s_j)$ is then defined according to the following formula:
(Formula 2)
Note that Formula 2 conforms to the rules for standard Chinese stroke order explained above with reference to Fig. 3. Applying Formula 2 to the Chinese character shown in Fig. 3 gives $d(s_1, s_2) = 1$, $d(s_1, s_3) = 0$, $d(s_1, s_4) = 0$, $d(s_2, s_3) = 1$, $d(s_2, s_4) = 0$ and $d(s_3, s_4) = 0$.
The disorder function defines the relation among all strokes of the input character. If $S = s_1 s_2 \ldots s_n$ is a stroke sequence, D(S) can be defined as:
(Formula 3)
For example, referring again to the Chinese character of Fig. 3, if the character is written in its standard order, i.e. $S = s_1 s_2 s_3 s_4$, Formulas 2 and 3 give D(S) = 2. If the character is written in reverse order, i.e. $S = s_4 s_3 s_2 s_1$, then D(S) = 10. Table 1 gives further example values of D(S) for other stroke orders of the Chinese character 天 (tiān).
Table 1
S | D(S)
$s_4 s_1 s_2 s_3$ | 8
$s_1 s_4 s_3 s_2$ | 6
$s_3 s_2 s_4 s_1$ | 6
$s_2 s_1 s_3 s_4$ | 2
... | ...
In general, an input character written in the standard stroke order has a small D(S) value, while a character written in a nonstandard order has a larger D(S) value.
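Formulas 2 and 3 are not reproduced in this text, so the sketch below is a hypothetical reconstruction inferred only from the worked values above. It assumes that d counts how many of the two rules (left before right, top before bottom) stroke $s_j$ violates relative to $s_i$, and that D(S) sums d over every pair of strokes taken in writing order. The starting-point coordinates for 天 are likewise assumed (y increasing downward), chosen only to match the relative positions described for Fig. 3. Under these assumptions the sketch reproduces the six d values listed for Fig. 3 and every D(S) value given in the text and in Table 1; it should not be read as the specification's actual formulas.

```python
from itertools import combinations


def d(si, sj):
    """Hypothetical Formula 2: number of standard-order rules violated when
    stroke sj is written after stroke si. Each argument is an (x, y)
    starting point, with y increasing downward."""
    (xi, yi), (xj, yj) = si, sj
    right_to_left = int(xj < xi)  # violates "left before right"
    bottom_to_top = int(yj < yi)  # violates "top before bottom"
    return right_to_left + bottom_to_top


def disorder(sequence):
    """Hypothetical Formula 3: D(S) = sum of d over all stroke pairs, each
    pair taken in the order in which the strokes were written."""
    return sum(d(a, b) for a, b in combinations(sequence, 2))


# Assumed starting points for strokes s1..s4 of the Fig. 3 character.
TIAN = {1: (3, 1), 2: (1, 4), 3: (5, 2), 4: (6, 5)}

for order in ["1234", "4321", "4123", "1432", "3241", "2134"]:
    print(order, disorder([TIAN[int(k)] for k in order]))
# 1234 2
# 4321 10
# 4123 8
# 1432 6
# 3241 6
# 2134 2
```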
Referring again to Fig. 5, after the disorder function D(S) has been calculated, method 500 proceeds to step 515, where the value of D(S) is compared with a threshold T(S). T(S) is determined empirically from samples of model characters written in different stroke orders. For example, referring again to Fig. 3, the threshold for four-stroke characters may be set to T = 4, so that the stroke sequences $s_1 s_2 s_3 s_4$ and $s_2 s_1 s_3 s_4$ are accepted as standard sequences, while other sequences are regarded as nonstandard.
If D(S) < T(S), meaning that the input character was probably written in the standard order, method 500 again outputs the values $\theta_2 = a_1$ and $\theta_3 = a_2$. If, however, D(S) > T(S), meaning that the input character was probably not written in the standard order, method 500 outputs a different set of values, $\theta_2 = b_1$ and $\theta_3 = b_2$.
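Method 500 reduces to a small decision function. In this sketch the weight pairs a = (a_1, a_2) and b = (b_1, b_2) are passed in as parameters because the text gives no values for them, and the boundary case D(S) = T(S), which the text leaves open, is assumed to fall on the nonstandard side. choose_weights is a hypothetical name.

```python
def choose_weights(written_connected, d_value, threshold, a, b):
    """Sketch of method 500 (Fig. 5).

    written_connected: True if the character was written with connected strokes.
    d_value: disorder function D(S); threshold: T(S).
    a, b: empirically determined (theta2, theta3) pairs.
    Returns (theta2, theta3).
    """
    if written_connected:      # step 505: likely standard order
        return a
    if d_value < threshold:    # step 515: D(S) < T(S), likely standard order
        return a
    return b                   # likely nonstandard (D(S) >= T(S) assumed here)
```

For instance, choose_weights(False, 6, 4, a=(0.5, 0.5), b=(0.8, 0.2)) returns (0.8, 0.2); these pair values are arbitrary placeholders, not empirical results.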
Fig. 6 is a schematic diagram of an electronic device or system, in the form of a mobile telephone 601, that can be used to implement the above methods of the present invention. Telephone 601 comprises a radio frequency communications unit 602 coupled to and in communication with a processor 603. A user interface, in this embodiment in the form of a display screen 605, a keypad 606 and a tablet 619, is also coupled to and in communication with processor 603. As will be apparent to those skilled in the art, tablet 619 may be part of display screen 605 when the display screen is a touch screen.
Processor 603 comprises an encoder/decoder 611 with an associated read-only memory (ROM) 612, which stores data for encoding and decoding voice or other signals transmitted or received by mobile telephone 601. Processor 603 also comprises a microprocessor 613, coupled to encoder/decoder 611 by a common data and address bus 617, together with an associated read-only memory (ROM) 614, random access memory (RAM) 604, static programmable memory 616 and a removable SIM module 618. Static programmable memory 616 and SIM module 618 can, among other things, store the model character feature vectors, representations of input characters entered using tablet 619, and other information.
Radio frequency communications unit 602 is a combined receiver and transmitter with a common antenna 607. Communications unit 602 has a transceiver 608 coupled to antenna 607 via a radio frequency amplifier 609. Transceiver 608 is also coupled to a combined modulator/demodulator 610, which couples communications unit 602 to processor 603.
Microprocessor 613 has ports for coupling to, for example, keypad 606, screen 605 and tablet 619. Read-only memory 614 stores code for performing handwriting recognition, as described above, on representations of handwritten characters written on the tablet using, for example, a pen, stylus or finger.
Thus, a user of telephone 601 can write one or more characters on tablet 619, and telephone 601 stores these characters in random access memory (RAM) 604, static programmable memory 616 and/or removable SIM module 618. The user can then issue a command, for example using keypad 606, requesting recognition of the handwritten characters entered using tablet 619.
The command to recognize the handwritten characters can be processed by microprocessor 613. Using the code stored in ROM 614, microprocessor 613 then performs the method of the present invention described above, determining a matching candidate character for each input character by comparing the directional feature vector and the stroke segment feature vector with the model characters. Depending on the requirements of the particular system, microprocessor 613 can then execute further commands based on the recognized input characters. Such further commands include, for example, sending a message containing the recognized input characters, or entering information containing the recognized input characters into an address book.
The present invention therefore provides an improved method and system for recognizing handwritten characters written on a user interface of an electronic device. The invention is particularly effective for recognizing ideographic characters, such as Chinese or Japanese characters, which often contain many strokes that are usually written in a standard order. By analyzing both the stroke segment feature vectors and the directional feature vectors associated with an input character, the invention increases the likelihood that the input character is recognized correctly.
The above detailed description provides preferred exemplary embodiments only and is not intended to limit the scope, applicability or configuration of the invention. Rather, the detailed description of the preferred exemplary embodiments provides those skilled in the art with an enabling description for implementing a preferred exemplary embodiment of the invention. It should be understood that various changes may be made in the function and arrangement of elements and steps without departing from the spirit and scope of the invention as set forth in the appended claims.