CN106845319A - Handwriting registration method, handwriting recognition method, and device thereof

Publication number
CN106845319A
Authority
CN
China
Prior art keywords
radical
hmm
character
hand
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510876255.9A
Other languages
Chinese (zh)
Inventor
王亮
李建杰
刘欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Application filed by Canon Inc
Priority to CN201510876255.9A
Publication of CN106845319A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/32 Digital ink
    • G06V30/36 Matching; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a handwriting registration method, a handwriting recognition method, and a device thereof. The handwriting registration method includes building a radical dictionary that contains radical hidden Markov models (HMMs), and generating radical-based character HMMs by combining radical HMMs selected from the radical dictionary. The invention can achieve more accurate and faster handwriting recognition.

Description

Handwriting registration method, handwriting recognition method, and device thereof
Technical Field
The present invention relates generally to the field of handwriting recognition, and more particularly to a method and a device for online recognition of handwritten characters.
Background
To recognize handwriting online, prior art 1 [1] builds one hidden Markov model (HMM) for each character. As a result, the recognition dictionary, which contains thousands of models, is large, and the computational cost of the technique is high.
To reduce the computational cost, radical-based methods are used in handwriting recognition for ideographic languages. US Patent 7903877B2 (hereinafter prior art 2) uses a character-radical dictionary to represent thousands of characters by sharing a smaller set of radicals. Compared with character-based methods, however, radical-based methods tend to degrade recognition accuracy, because a single HMM that represents every appearance of a radical across different characters can hardly cover the actual diversity of those characters.
To improve recognition accuracy, US Patent 6956969B2 (hereinafter prior art 3) classifies radicals into several categories. Fig. 1A illustrates the principle of prior art 3 for building radical models. A radical X is classified into m categories according to its geometric layout in different characters, where each category corresponds to one HMM and m is a positive integer. The recognition accuracy of prior art 3 is still far below that of prior art 1. A new handwriting recognition method that recognizes handwriting both quickly and accurately is therefore desirable.
References
[1] Han Shu, "On-Line Handwriting Recognition Using Hidden Markov Models", Master's thesis in Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1997
Summary of the Invention
The present invention has been made in view of at least one of the above problems.
According to one aspect of the invention, a handwriting registration method is provided. The method includes building a radical dictionary that contains radical hidden Markov models (HMMs), and generating radical-based character HMMs by combining radical HMMs selected from the radical dictionary, wherein the radical HMMs in the radical dictionary are generated by the following steps:
- a training data acquisition step, which includes: selecting a training radical that has at least one category, and obtaining the corresponding seed HMM of one of the at least one category, wherein the training radical is classified into the at least one category based on the geometric layout of the radical in different characters; obtaining a training dataset of character samples, wherein the character samples include handwriting trajectories; and obtaining a model dataset of character HMMs, wherein the model dataset includes the HMMs of multiple characters;
- a radical detection and radical sample point determination step: using the obtained seed HMM, detecting, in the training dataset, the character samples that contain the radical as training character samples of the radical, and determining the sample points of the radical for each training character sample of the radical;
- a state sequence extraction step: decoding each training character sample of the radical using the HMM of the corresponding character in the model dataset, and extracting, from the HMM of the corresponding character, the state sequence that represents the radical;
- a clustering step: clustering the extracted state sequences into subclasses based on the number of states, so that each subclass corresponds to one radical HMM.
Further features of the invention will become apparent from the following description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1A illustrates the general principle of prior art 3; Fig. 1B illustrates the general principle of building radical models according to an embodiment of the invention.
Fig. 2 is a diagram of an application environment of the handwriting recognition dictionary built by the handwriting registration method according to an embodiment of the present application.
Fig. 3 is a schematic block diagram of a device according to a first exemplary system configuration, in which the device implements a handwriting input function through an embedded handwriting recognition module according to an embodiment of the invention.
Fig. 4 is a schematic block diagram of a device according to a second exemplary system configuration, in which the device implements a handwriting input function through a distributed handwriting recognition module according to an embodiment of the invention.
Fig. 5 is a schematic block diagram illustrating an exemplary hardware configuration of handwriting recognition module 330 in Fig. 3 or handwriting recognition module 430 in Fig. 4.
Fig. 6 shows the two stages in the general flowchart of the handwriting registration method according to an embodiment of the invention.
Fig. 7 shows a flowchart of stage I in Fig. 6, the construction of radical models.
Fig. 8 shows an example implementation of steps S200 and S300 in Fig. 7.
Fig. 9 is a schematic diagram illustrating the basic process of constructing radical models according to an embodiment of the invention.
Fig. 10 is a schematic diagram illustrating the process of constructing a character model according to an embodiment of the invention.
Fig. 11 shows an example of generating the models of multiple characters by combining the corresponding radical models according to an embodiment of the invention.
Fig. 12 is a schematic diagram illustrating the process of generating seed models according to an embodiment of the invention.
Fig. 13 shows the functional structure of a handwriting registration apparatus according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the invention are described in detail below with reference to the drawings. Note that the following description is merely illustrative and exemplary in nature, and is in no way intended to limit the invention, its application, or its uses. Unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in the embodiments do not limit the scope of the invention. In addition, techniques, methods, and devices known to those skilled in the art may not be discussed in detail, but are regarded as part of this specification where appropriate.
In HMM-based handwriting recognition, each character corresponds to one HMM, and each HMM consists of a sequence of states. In particular, if a character HMM is built by radical sharing, the character HMM can be regarded as a combination of the HMMs of its radicals. For example, the character "slow-witted" is composed of the radical "mouth" and the radical "wood". The state sequence representing "slow-witted" is therefore the combination of the state sequence representing "mouth" and the state sequence representing "wood".
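The combination of state sequences described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the patent's implementation: it assumes left-to-right HMMs (each state either repeats or advances to the next one), and the helper names `left_to_right` and `concatenate`, the 0.5 probabilities, and the state counts 7 and 8 are chosen purely for the example.

```python
def left_to_right(n_states, stay=0.5):
    """Transition matrix of a left-to-right HMM (assumed topology).

    Every state either repeats (probability `stay`) or advances;
    the last state can only repeat.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i + 1 < n_states:
            A[i][i] = stay
            A[i][i + 1] = 1.0 - stay
        else:
            A[i][i] = 1.0
    return A


def concatenate(A1, A2, exit_prob=0.5):
    """Chain two left-to-right models into one longer model.

    The last state of the first model is re-opened so that it can
    advance into the first state of the second model.
    """
    n1, n2 = len(A1), len(A2)
    A = [[0.0] * (n1 + n2) for _ in range(n1 + n2)]
    for i in range(n1):
        for j in range(n1):
            A[i][j] = A1[i][j]
    for i in range(n2):
        for j in range(n2):
            A[n1 + i][n1 + j] = A2[i][j]
    A[n1 - 1][n1 - 1] = 1.0 - exit_prob
    A[n1 - 1][n1] = exit_prob
    return A


mouth = left_to_right(7)   # hypothetical 7-state radical model
wood = left_to_right(8)    # hypothetical 8-state radical model
character = concatenate(mouth, wood)
print(len(character))      # number of states in the combined model
```

The combined model has as many states as the two radical models together, which is the sense in which a radical-based character HMM is a combination of radical HMMs.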
In this application, unless otherwise stated, a radical model means a radical HMM, and a character model means a character HMM. The number of states of an HMM indicates the model complexity of that HMM. In other words, the number of states of a character HMM indicates the model complexity of the character HMM, and the number of states of a radical HMM indicates the model complexity of the radical HMM.
Classifying a radical by its geometric layout can improve accuracy. However, radicals with the same geometric layout may still have different model complexities. Take the left-vertical radical "gold" as an example. Here, "left-vertical" refers to a particular geometric layout of the radical within a character. In HMMs trained on whole characters, the first 17 states match the radical "gold" for the character "Needles", whereas only the first 11 states match the radical "gold" for the character "Duo". In this case, it is inappropriate to build a single radical HMM for the left-vertical "gold" that covers both the 17-state and the 11-state case.
Next, the reason why a radical with the same geometric layout matches different numbers of states is analyzed further. In general, the model complexity of a radical is affected by two factors. The first factor is the total number of states of the character that contains the radical. The second factor is relative complexity. Among all the states of a character, some will match the radical. By the intrinsic nature of HMMs, the matching result depends on the complexity of the radical's trajectory relative to the character's trajectory. The greater this relative complexity, the more of the character HMM's states match the radical. In the previous example, "gold" in "Needles" obtains 17 states, 6 more than "gold" in "Duo". This is because the relative complexity of the "gold" trajectory with respect to the "Needles" trajectory is greater than its relative complexity with respect to the "Duo" trajectory.
In Figs. 1A and 1B, each circle represents a state in a model. An arrow around a circle represents a state transition. A row of circles represents the state transition path of a model and can be regarded as a state sequence.
The basic principle of building radical models according to an embodiment of the invention is now described with reference to Fig. 1B. In a first classification operation, radical X is classified into m categories C1, C2, ..., Cm according to its geometric layout in different characters.
In a first example, the radical "mouth" can be classified into 3 categories: top, bottom, and outer "mouth". The representative characters of these 3 categories are "number", "accounting for", and "enclosing", respectively. The first example classifies the radical based only on its position.
In a second example, the radical "mouth" can be classified into 2 categories: large "mouth" and small "mouth". The representative characters of the 2 categories are "number" and "enclosing"; "accounting for" falls into the same category as "number". The second example classifies the radical based only on its size.
In a third example, the radical "day" can be classified into 4 categories: left-vertical, right-vertical, top-horizontal, and bottom-horizontal "day". The representative characters of these 4 categories are "bright", "rising sun", "morning", and "Books", respectively. The third example classifies the radical based on its position and shape.
The above three examples are merely illustrative and are not intended to limit the scope of protection of the invention. The geometric layout includes at least one of the following properties: the position, shape, or size of the radical within the character. A radical can be classified in different ways. In the first classification operation of Fig. 1B, any of the ways of classifying radicals disclosed in prior art 3 can be applied. In general, the more categories a radical is divided into, the more detailed and complete the resulting radical HMMs can be.
In addition, a second classification operation divides each category into at least one subclass. The criterion of the second classification operation is the model complexity of radical X in its respective character, or in other words, the number of states of radical X in its respective character. Category C1 is divided into subclasses C11, ..., C1n; category C2 is divided into subclasses C21, ..., C2n; category Cm is divided into subclasses Cm1, ..., Cmn. In Fig. 1B, m and n are positive integers. As shown in Fig. 1B, each subclass corresponds to one radical HMM. In other words, for a given category, the number of subclasses equals the number of radical HMMs.
Return to the example above. After the first classification, the radical "gold" in "Needles" and in "Duo" both fall into category C1. In the second classification, because the numbers of states of the radical "gold" in the respective characters are 17 and 11, the radical "gold" in "Needles" falls into subclass C11, whereas the radical "gold" in "Duo" falls into subclass C12. C11 and C12 correspond to two different radical HMMs.
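The two classification levels can be pictured as a nested grouping: first by geometric layout, then by number of states. The sketch below is a hedged illustration only; the function name `two_level_classify` and the tuple format are assumptions, and subclasses are keyed by state count rather than by the labels C11 and C12, since the description does not say how those labels are assigned.

```python
from collections import defaultdict


def two_level_classify(occurrences):
    """Group radical occurrences by layout category, then by state count.

    occurrences: iterable of (character, layout, n_states) tuples.
    Returns {layout: {n_states: [characters]}}.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for character, layout, n_states in occurrences:
        tree[layout][n_states].append(character)
    return {layout: dict(groups) for layout, groups in tree.items()}


# The radical "gold" as it occurs in the two characters of the example.
occurrences = [
    ("Needles", "left-vertical", 17),
    ("Duo", "left-vertical", 11),
]
subclasses = two_level_classify(occurrences)
print(subclasses)
```

Both occurrences share one layout category but end up in two different subclasses, so each would receive its own radical HMM.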
The invention considers not only the geometric layout of a radical in its respective character, but also the model complexity of the radical in that character. Compared with the prior art, the radical HMMs constructed by embodiments of the invention are therefore more accurate. Character HMMs constructed by sharing these accurate radical HMMs represent characters more accurately. Consequently, using a dictionary that contains more accurate character HMMs achieves more accurate handwriting recognition.
Two system configurations are described below with reference to Figs. 3 and 4. Fig. 3 is a schematic block diagram of a device according to a first exemplary system configuration, in which the device implements a handwriting input function through an embedded handwriting recognition module according to an embodiment of the invention. Fig. 4 is a schematic block diagram of a device according to a second exemplary system configuration, in which the device implements a handwriting input function through a distributed handwriting recognition module according to another embodiment of the invention. Device 300 in Fig. 3 or device 400 in Fig. 4 may be a mobile phone (such as 200 in Fig. 2), a multifunction printer (MFP), or another electronic device that implements a handwriting recognition function. The main difference between the two exemplary system configurations is that in Fig. 3, handwriting recognition module 330 is embedded in device 300, whereas in Fig. 4, handwriting recognition module 430 is a distributed module separate from device 400.
Next, an application environment of the handwriting recognition dictionary built by the handwriting registration method according to an embodiment of the present application is described with reference to Fig. 2. A user edits a message on mobile phone 200. Mobile phone 200 may be device 300 in Fig. 3 or device 400 in Fig. 4. In either case, the screen of the mobile phone mainly includes 3 regions: message display region 210, input display region 220, and text editing region 230. Together, 210, 220, and 230 form a Short Message Service (SMS) UI, in which 201 denotes two exemplary short messages and 202 is the current position of the cursor.
Using a stylus or the user's finger 208, the user inputs handwriting trajectory 207 on the screen of mobile phone 200. In Fig. 3, sensor 310 converts the user's touches on the phone into sample points, or in other words, acquires handwriting trajectory 207. Trajectory buffer 320 stores the discrete sample points of trajectory 207, including spatial and temporal information. Recognition module 330 reads trajectory 207 from buffer 320. Module 330 contains a handwriting recognition dictionary built by the handwriting registration method according to an embodiment of the present application. Using the handwriting recognition dictionary, recognition module 330 recognizes trajectory 207 as the corresponding character, i.e., the character "buying", and outputs the recognition result back to application module 340. In this application environment, application module 340 is SMS, so the character "buying" is displayed at the current position of cursor 202 in input display region 220. The user can enter input using button 204, backspace using button 205, switch languages using button 206, and so on. After finishing editing the current message, the user presses button 203 to send the message. The whole message then appears in message display region 210 and is transmitted over a communication network not shown in the figure.
Alternatively, the second system configuration can be applied. As shown in Fig. 4, units 410, 420, and 440 of Fig. 4 are similar to units 310, 320, and 340 of Fig. 3, respectively, and are not described again for brevity. The main difference between Fig. 4 and Fig. 3 is the location of recognition module 430. Module 430, which contains the handwriting recognition dictionary built by the handwriting registration method according to an embodiment of the present application, is located outside device 400. Module 430 communicates with buffer 420 and application module 440 via computer network 450. Network 450 may be a LAN or the Internet. Recognition module 430 may be another computing device, such as a personal computer (PC), a computer workstation, or a computing service provided through network cloud computing technology. Using the handwriting recognition dictionary, recognition module 430 recognizes trajectory 207, sent via network 450, as the corresponding character, i.e., the character "buying", and outputs the recognition result back to application module 440 via network 450. In the case of Fig. 4, the user interacts with the UI of the mobile phone, such as the SMS UI, in a manner similar to the case of Fig. 3.
Fig. 5 is a block diagram illustrating an exemplary hardware configuration of handwriting recognition module 530. Module 530 may be 330 in Fig. 3 or 430 in Fig. 4. Storage unit 534 stores the handwriting recognition dictionary and the software program of the handwriting recognition method. The recognition dictionary is built by the handwriting registration method according to an embodiment of the invention. Fig. 13, described later, illustrates the flowchart of the recognition method. Trajectory 207 is sent to memory 533, and the recognition dictionary is also loaded into memory 533. Processor 532 is arranged to retrieve the software program of the handwriting recognition method. Processor 532 is also arranged to fetch, decode, and execute all steps of the handwriting recognition method, such as the steps shown in Fig. 13. By searching the recognition dictionary, processor 532 produces a recognition result and records the result to memory 533 over system bus 535. In addition to memory 533, the output can also be stored more permanently in storage unit 534.
Network interface 531 is optional. For the first system configuration in Fig. 3, network interface 531 is not necessary. For the second system configuration in Fig. 4, however, network interface 531 is necessary for communicating with network 450, and thus for the input and output of recognition module 430.
In addition, handwriting recognition module 530 may take various forms, with one or more unnecessary components removed or one or more additional components added.
The method of the invention is described in detail below with reference to the drawings. Fig. 6 shows the two stages in the general flowchart of the handwriting registration method according to an embodiment of the invention.
Stage I constructs radical HMMs. In stage I, the invention provides a new approach to handwriting registration and produces accurate radical HMMs. In stage I, a radical dictionary that contains the radical HMMs can be built.
Stage II constructs character HMMs. Radical-based character HMMs are constructed by combining radical HMMs selected from the radical dictionary. The character models obtained in stage II are therefore also accurate.
Fig. 7 shows a flowchart of stage I in Fig. 6, the construction of radical models.
In step S100, training data is obtained. The training data includes seed models, a training dataset of character samples, and a model dataset of character models. Step S100 includes the following sub-steps. A training radical that has at least one category is selected, and the corresponding seed HMM of one of the at least one category is obtained, where the classification of the training radical into the at least one category is based on the geometric layout of the radical in different characters. A training dataset of character samples is obtained, where the character samples include handwriting trajectories. A model dataset of character HMMs is obtained, where the model dataset includes the HMMs of multiple characters.
Step S200 is the radical detection and radical sample point determination step. In step S200, using the obtained seed HMM, the character samples in the training dataset that contain the training radical are detected as training character samples of the radical. Then, for each training character sample of the radical, the sample points of the radical are determined.
Step S300 is the state sequence extraction step. In step S300, each training character sample of the radical is decoded using the HMM of the corresponding character in the model dataset, and the state sequence that represents the radical is extracted from the HMM of the corresponding character.
Step S400 is the clustering step. In step S400, the extracted state sequences are clustered into subclasses based on the number of states, so that each subclass corresponds to one radical HMM and to one specific number of states.
Fig. 8 shows an example implementation of steps S200 and S300 in Fig. 7. In step S210, the seed model obtained in step S100 is used to detect whether each character sample in the training dataset obtained in step S100 contains the training radical. For the trajectory of one character, all possible segments of the trajectory are decoded with the seed model by the Viterbi algorithm, and the corresponding confidence is computed. If the highest confidence is greater than a predetermined threshold (for example, 0.5), then in step S220 the character trajectory is detected as containing the training radical, where the training radical corresponds to the seed model. The segment with the highest confidence is determined to be the sample points of the training radical, i.e., the trajectory of the training radical. Otherwise, if the highest confidence is less than the predetermined threshold, no training radical corresponding to the seed model has been found in the character trajectory, and the character trajectory is filtered out.
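The segment search in steps S210 and S220 can be sketched as an exhaustive scan over contiguous segments. The sketch is a simplification under stated assumptions: the `confidence` callable stands in for Viterbi decoding of a segment with the seed HMM (the description does not say how the confidence is computed), and the function name `detect_radical`, the string-based toy trajectory, and `toy_confidence` are hypothetical.

```python
def detect_radical(track, confidence, threshold=0.5, min_len=1):
    """Find the contiguous segment of `track` that best matches a seed model.

    confidence: callable scoring one segment; a stand-in for decoding the
    segment with the seed HMM.
    Returns ((start, end), score) when the best score exceeds the threshold,
    otherwise (None, score), meaning the sample is filtered out.
    """
    best_span, best_score = None, float("-inf")
    for start in range(len(track)):
        for end in range(start + min_len, len(track) + 1):
            score = confidence(track[start:end])
            if score > best_score:
                best_span, best_score = (start, end), score
    if best_score > threshold:
        return best_span, best_score
    return None, best_score


def toy_confidence(segment):
    """Toy scorer: full confidence only for the stroke pattern "RDLU"."""
    return 1.0 if segment == "RDLU" else 0.0


# Each letter stands for one sample point of a hypothetical trajectory.
span, score = detect_radical("abRDLUcd", toy_confidence)
print(span, score)
```

Scanning every segment costs a number of scorer calls quadratic in the trajectory length; a real system would likely restrict the candidate segments, for instance to stroke boundaries, but that is a design choice the description leaves open.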
In step S310, the trajectory of the character is decoded using the model of the character by the Viterbi method, which is a known solution in the art. That is, the sample points of the character trajectory are matched with the states in the model of the character.
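For readers unfamiliar with the decoding in step S310, the following is a generic textbook Viterbi decoder, not code from the patent. It assumes a discrete-emission HMM for brevity (real handwriting trajectories would need continuous features), and the 3-state model and its probabilities are made-up example values.

```python
import math


def log(p):
    """Log-probability that maps 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, init, trans, emit):
    """Return the most likely state index for every observation."""
    n = len(init)
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for t in range(n):
            # Best predecessor of state t at this time step.
            best = max(range(n), key=lambda s: prev[s] + log(trans[s][t]))
            score.append(prev[best] + log(trans[best][t]) + log(emit[t][o]))
            ptr.append(best)
        back.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical 3-state left-to-right model; state s mostly emits symbol s.
init = [1.0, 0.0, 0.0]
trans = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
emit = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
path = viterbi([0, 0, 1, 1, 1, 2], init, trans, emit)
print(path)  # [0, 0, 1, 1, 1, 2]
```

The returned path assigns one state to each sample point, which is the matching between sample points and states that the step describes.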
In step S320, based on the matching result, the state sequence corresponding to the trajectory of the training radical is extracted from the state sequence of the character model and serves as the state sequence of the training radical.
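Given the per-sample state alignment and the index range of the radical's sample points, the extraction in step S320 reduces to collecting the states visited inside that range. The sketch below assumes the alignment is a list with one state index per sample point; the function name `radical_states` and the example values are hypothetical.

```python
def radical_states(path, start, end):
    """Ordered distinct states aligned to sample points start..end-1.

    path: one state index per sample point (the decoding result).
    start, end: half-open index range of the radical's sample points.
    """
    states = []
    for state in path[start:end]:
        if not states or states[-1] != state:
            states.append(state)
    return states


# Hypothetical alignment of an 8-point character trajectory to 5 states;
# the radical covers the first five sample points.
path = [0, 0, 1, 1, 2, 3, 3, 4]
extracted = radical_states(path, 0, 5)
print(extracted, len(extracted))
```

The length of the extracted list is the number of states of the radical in this character, which is the quantity used later for clustering.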
Next, an exemplary process of constructing radical models from character samples and character models is described in detail with reference to Fig. 9. Fig. 9 corresponds to stage I in Fig. 6.
In this example, based on the geometric layout of "mouth" in different characters, the radical "mouth" can be classified into top "mouth" and bottom "mouth". The two categories of "mouth" are called "mouth 1" and "mouth 2", respectively. Fig. 9 illustrates how category "mouth 1" is divided into subclasses, i.e., the second classification in Fig. 1B that divides category C1 into subclasses C11, ..., C1n.
As shown in Fig. 9, the cylinder in the upper left corner represents the training dataset of character samples, which contains, for example, samples of "note", "brother", "city", "Lesson", "slow-witted", and "STAFF". The dashed rectangle at the top contains the seed model of "mouth 1". The generation of seed models is described later.
Step S51 in Fig. 9 corresponds to step S200 in Fig. 7. In step S51, the seed model of "mouth 1" is used to detect the radical "mouth 1" in the character samples in the left cylinder. As a result, "brother", "city", "slow-witted", and "STAFF" are judged to contain "mouth 1". The sample points of the radical "mouth 1", i.e., the trajectory of the radical "mouth 1", are then determined. In Fig. 9, the 4 resulting trajectories are marked with 4 ellipses. The algorithm of step S51 has been described above as steps S210 and S220 in Fig. 8.
Steps S52 and S53 in Fig. 9 correspond to step S300 in Fig. 7. The cylinder in the upper right corner represents the model dataset of character models. The model dataset includes the HMMs of multiple characters, among them the models of "brother", "city", "slow-witted", and "STAFF". In step S52, the 4 character samples that contain "mouth 1" are decoded using the models of "brother", "city", "slow-witted", and "STAFF", respectively. In step S53, the state sequences that represent "mouth 1" are extracted from the character models of "brother", "city", "slow-witted", and "STAFF", respectively. The resulting state sequences are shown in the bottom cylinder; the state sequences of "mouth 1" from "brother", "city", "slow-witted", and "STAFF" contain 7, 6, 7, and 6 states, respectively. The algorithm of steps S52 and S53 has been described above as steps S310 and S320 in Fig. 8.
Step S54 corresponds to step S400 in Fig. 7. Because the state sequences of "mouth 1" from "brother" and "slow-witted" have the same number of states, namely 7, these two state sequences are clustered into the first subclass. Because the state sequences of "mouth 1" from "city" and "STAFF" have the same number of states, namely 6, these two state sequences are clustered into the second subclass.
Each subclass has a representative radical HMM. There are different ways to obtain this representative HMM.
In one way, the radical HMM that represents each subclass is obtained by selecting one state sequence from among the state sequences that belong to the subclass. For the first subclass, the state sequence of "mouth 1" from "brother" or "slow-witted" can be selected as the representative radical HMM. For the second subclass, the state sequence of "mouth 1" from "city" or "STAFF" can be selected as the representative radical HMM.
Alternatively, the radical HMM that represents each subclass can be obtained by training on multiple handwriting samples that contain the training radical of the corresponding subclass. Take the first subclass as an example. Multiple handwriting samples of "brother" or "slow-witted" are collected and used for training to obtain the representative radical HMM of the first subclass. In the same way, multiple handwriting samples of "city" or "STAFF" are collected and used for training to obtain the representative radical HMM of the second subclass.
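The clustering of step S54 and the first way of choosing a representative can be sketched as follows. The sketch is illustrative only: state sequences are modeled as plain lists of hypothetical state labels, the function names are assumptions, and taking the first member of each cluster is just one possible selection rule, since the description allows any member to be selected.

```python
from collections import defaultdict


def cluster_by_state_count(sequences):
    """Cluster extracted state sequences by their number of states.

    sequences: {source character: state sequence}.
    Returns {n_states: [(source character, state sequence), ...]}.
    """
    clusters = defaultdict(list)
    for character, states in sequences.items():
        clusters[len(states)].append((character, states))
    return dict(clusters)


def pick_representatives(clusters):
    """Select one member sequence per subclass (here simply the first)."""
    return {n: members[0][1] for n, members in clusters.items()}


# State sequences of "mouth 1" with the state counts from the example.
sequences = {
    "brother": [f"b{i}" for i in range(7)],
    "city": [f"c{i}" for i in range(6)],
    "slow-witted": [f"s{i}" for i in range(7)],
    "STAFF": [f"t{i}" for i in range(6)],
}
clusters = cluster_by_state_count(sequences)
representatives = pick_representatives(clusters)
print(sorted(clusters))
```

The second way, retraining on many handwriting samples per subclass, would replace `pick_representatives` with an HMM training routine and is not shown here.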
As described above, the cylinder in the upper left corner of Fig. 9 represents the training dataset of character samples. Although only 6 character samples are shown in the training dataset, a much larger training dataset can be used. In one embodiment, the training dataset can include samples of the whole character set. Whenever a seed model, such as "mouth 2", "wood 1", "wood 2", "wood 3", "Tony 1", or "Tony 2", is input into the dashed rectangle at the top, the handwriting registration method automatically generates the radical subclasses and radical HMMs derived from the input seed model.
The radical dictionary is built by collecting the radical subclasses. In one implementation, the radical dictionary has a tree structure, meaning that each radical in the radical dictionary has a structure like that of radical X in Fig. 1B. Given the category and the subsequent subclass of a radical, the model of the radical can therefore be retrieved easily.
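One way to picture such a tree is three nested levels: radical, layout category, and number of states. The sketch below is an assumption about the data layout rather than the patent's storage format; the helper names are hypothetical, the strings stand in for real HMM objects, and labeling the 6-state subclass "mouth 12" is also an assumption.

```python
def add_model(dictionary, radical, category, n_states, model):
    """Insert a radical HMM at radical -> category -> state count."""
    dictionary.setdefault(radical, {}).setdefault(category, {})[n_states] = model


def find_model(dictionary, radical, category, n_states):
    """Retrieve a radical HMM; raises KeyError if no such subclass exists."""
    return dictionary[radical][category][n_states]


radical_dictionary = {}
add_model(radical_dictionary, "mouth", "mouth 1", 7, "HMM of mouth 11")
add_model(radical_dictionary, "mouth", "mouth 1", 6, "HMM of mouth 12")
add_model(radical_dictionary, "wood", "wood 3", 8, "HMM of wood 31")

print(find_model(radical_dictionary, "mouth", "mouth 1", 7))
```

Each lookup follows one path from the radical through its category to its subclass, mirroring the structure of radical X in Fig. 1B.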
Return to Fig. 6.Next, the stage II that character Construction of A Model will be described.Figure 10 is to illustrate Property ground illustrate according to an embodiment of the invention construction character model treatment schematic diagram.The treatment Step and radical model combination step are selected comprising radical model.
For character " slow-witted ", it is made up of key element radical " mouth " and " wood ".Based on " mouth " Geometric layout in " slow-witted ", selects the classification of " mouth 1 ".Also, based on " wood " " slow-witted " In geometric layout, the classification of selection " wood 3 ".
In the overall character HMM of the character "slow-witted", the state sequence representing the radical "mouth" has 7 states. Based on this number, the subclass "mouth 11" is selected, so the representative HMM of subclass "mouth 11" can be obtained from the radical dictionary. Similarly, in the overall character HMM of the character "slow-witted", the state sequence representing the radical "wood" has 8 states. Based on this number, the subclass "wood 31" is selected, and the representative HMM of subclass "wood 31" is then obtained from the radical dictionary.
The radical-based HMM of the character "slow-witted" is therefore generated by combining the representative HMM of subclass "mouth 11" with the representative HMM of subclass "wood 31".
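The selection and combination steps for "slow-witted" can be sketched as follows. This is a minimal sketch, not the patented implementation: the function names are hypothetical, each radical HMM is reduced to its state count, the classification is assumed to have been chosen from the geometric layout already, and every dictionary entry other than the 7-state "mouth 11" and the 8-state "wood 31" is an invented placeholder.

```python
# classification -> subclass -> number of states of the representative HMM.
# Only "mouth 11" (7 states) and "wood 31" (8 states) come from the text.
RADICAL_DICTIONARY = {
    "mouth 1": {"mouth 11": 7, "mouth 12": 9},
    "wood 3": {"wood 31": 8, "wood 32": 10},
}


def select_subclass(classification, num_states):
    """Pick the subclass whose representative HMM has the requested state count."""
    for subclass, states in RADICAL_DICTIONARY[classification].items():
        if states == num_states:
            return subclass
    raise LookupError(f"no subclass of {classification!r} with {num_states} states")


def build_character_model(key_elements):
    """key_elements: (classification, state count) pairs in writing order.

    The character model is returned as an ordered list of references to
    shared radical models rather than as copies of them.
    """
    return [select_subclass(c, n) for c, n in key_elements]


slow_witted = build_character_model([("mouth 1", 7), ("wood 3", 8)])
print(slow_witted)
```

Storing references instead of copies is what lets several characters share one radical model.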
Note that an overall character HMM is a character HMM that models the character as a whole; it is not produced by combining or sharing the models of key element radicals. An overall character HMM is usually produced by training on multiple handwriting samples of the respective character.
Fig. 11 shows an example of generating the models of multiple characters by combining the corresponding radical models according to an embodiment of the invention. Combining the models "west 11" and "wood 31" generates the character model of "chestnut"; combining the models "standing grain 11", "Dao 11" and "wood 32" generates the character model of "pears"; and so on. "mouth 11" is shared by more than one character, as are "wood 31", "wood 32" and "mouth 12". In the hand-written register method, the resulting character recognition dictionary is much smaller than a conventional character recognition dictionary comprising overall character HMMs, because the number of radical models is much smaller than the number of character models. The radical sharing mechanism therefore achieves the aim of fast handwriting recognition.
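The sharing mechanism can be illustrated by inverting the character-to-radical mapping. The sketch below uses only the three compositions stated in the text; the function name `shared_radical_models` is hypothetical, and with so few characters the example shows the sharing relation rather than the size reduction.

```python
from collections import defaultdict

# Character -> ordered radical models, as given in the text.
character_models = {
    "slow-witted": ["mouth 11", "wood 31"],
    "chestnut": ["west 11", "wood 31"],
    "pears": ["standing grain 11", "Dao 11", "wood 32"],
}


def shared_radical_models(models):
    """Return the radical models that are referenced by more than one character."""
    users = defaultdict(list)
    for character, radicals in models.items():
        for radical in radicals:
            users[radical].append(character)
    return {r: chars for r, chars in users.items() if len(chars) > 1}


shared = shared_radical_models(character_models)
print(shared)
```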
There are different ways to generate a seed model. Fig. 12 is a schematic diagram illustrating the process of generating a seed model according to an embodiment of the invention. In step S81, representative characters of the training radical are selected, where each character contains the training radical in a different geometric layout. For example, for the training radical "mouth", the characters "number" and "accounting for" are selected, where the geometric layout of "mouth 1" in "number" is top and the geometric layout of "mouth 2" in "accounting for" is bottom. In step S82, the radical segment corresponding to "mouth 1" is marked in a sample of the character "number". Likewise, the radical segment corresponding to "mouth 2" is marked in a sample of the character "accounting for". In Fig. 12, these two segments are denoted radical segment 1 and radical segment 2. Next, in step S83, the samples of the characters "number" and "accounting for" are each decoded with the corresponding character model using the Viterbi method, which is well known in this technical field. That is, the sample points of the character trajectory are matched with the states in the character model. In step S84, the state sequences corresponding to radical segment 1 and radical segment 2 are extracted from the character models of "number" and "accounting for", respectively, and serve as the seed models of "mouth 1" and "mouth 2".
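Steps S83 and S84 can be sketched as a Viterbi alignment followed by extraction of the states under a marked segment. This is a sketch under several assumptions that the text does not state: a left-to-right topology in which each step either stays in a state or advances by one, uniform transition scores so that only emission scores matter, and a toy emission table. The function names are hypothetical.

```python
NEG_INF = float("-inf")


def viterbi_left_to_right(log_emission):
    """Align sample points to states.

    log_emission[t][s] is the log-likelihood of sample point t under state s.
    The path must start in state 0 and end in the last state.
    """
    num_points, num_states = len(log_emission), len(log_emission[0])
    score = [[NEG_INF] * num_states for _ in range(num_points)]
    back = [[0] * num_states for _ in range(num_points)]
    score[0][0] = log_emission[0][0]
    for t in range(1, num_points):
        for s in range(num_states):
            stay = score[t - 1][s]
            move = score[t - 1][s - 1] if s > 0 else NEG_INF
            best, back[t][s] = (move, s - 1) if move > stay else (stay, s)
            if best > NEG_INF:
                score[t][s] = best + log_emission[t][s]
    if score[-1][-1] == NEG_INF:
        raise ValueError("no valid alignment: fewer sample points than states")
    path = [num_states - 1]
    for t in range(num_points - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def extract_radical_states(path, segment):
    """Return the distinct states visited by the sample points start..end (inclusive)."""
    start, end = segment
    states = []
    for s in path[start:end + 1]:
        if not states or states[-1] != s:
            states.append(s)
    return states


# Toy scores: 6 sample points, 3 states; each point clearly prefers one state.
log_emission = [
    [0, -5, -5], [0, -5, -5],
    [-5, 0, -5], [-5, 0, -5],
    [-5, -5, 0], [-5, -5, 0],
]
path = viterbi_left_to_right(log_emission)
seed_states = extract_radical_states(path, (2, 5))  # marked radical segment
print(path, seed_states)
```

In this toy case the marked segment covers sample points 2 to 5, so the extracted seed consists of the states aligned to those points.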
Note that the method for generating the seed model is not limited to the above. Alternatively, the seed model can be obtained by training on multiple handwriting samples of the training radical, where those handwriting samples belong to the classification corresponding to the seed HMM.
According to another aspect, the invention provides a hand-written recognition method comprising the following steps:
- obtaining handwriting samples; and
- recognizing the obtained handwriting samples by using a character dictionary comprising multiple radical-based character models, wherein the multiple radical-based character models are generated by the hand-written register method described above.
The recognition step further comprises:
- normalizing the input handwriting samples to, for example, 400*400;
- extracting the features of the handwriting samples according to formula (1) to formula (3); and
- decoding the features of the handwriting samples according to the Viterbi method, using a recognition dictionary that includes the recognition templates of the input characters.
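The normalization step listed above can be sketched as follows. The text only gives the target size of 400*400, so the details here are assumptions: the trajectory is translated to the origin and scaled uniformly so that its longer side spans 400 units, which preserves the aspect ratio. The function name is hypothetical, and feature extraction by formulas (1) to (3) is not reproduced.

```python
def normalize_trajectory(points, size=400):
    """Scale a list of (x, y) sample points into a size*size box.

    Uniform scaling (aspect ratio preserved) is an assumption of this sketch.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Guard against a degenerate trajectory consisting of a single location.
    span = max(max(xs) - min_x, max(ys) - min_y) or 1
    scale = size / span
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]


normalized = normalize_trajectory([(10, 20), (30, 20), (30, 60)])
print(normalized)
```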
Those skilled in the art will understand that the method of the invention is also applicable to East Asian characters such as Japanese, Chinese or Korean characters.
According to another aspect, the hand-written register method of the invention can be an offline registration method, and the hand-written recognition method of the invention can be an online recognition method. The character dictionary according to the embodiment can be built offline, while the handwriting recognition process is used online.
The invention considers not only the geometric layout of a radical but also the model complexity of the radical within its respective character. The radical models constructed according to embodiments of the invention are therefore more accurate than those of the prior art. Character models constructed by sharing these accurate radical models can represent characters more precisely. Consequently, using a recognition dictionary that comprises these more accurate character models achieves the aim of recognizing handwriting more accurately.
In addition, owing to the inherent properties of the radical-based hand-written register method and the resulting small dictionary size, the advantage of fast handwriting recognition is achieved.
Fig. 13 shows the functional configuration of a hand-written calling mechanism according to an embodiment of the invention. The hand-written calling mechanism 9000 and the units it includes can be constituted by hardware, firmware, software or any combination thereof, as long as the units in device 9000 can realize the functions of the corresponding steps of the hand-written register method described above. If device 9000 is partly or wholly constituted by software, the software is stored in the memory of a computer, and when the processor of the computer performs processing by executing the stored software, the computer can realize the functions of the hand-written register method of the invention. Alternatively, device 9000 can be partly or wholly constituted by hardware or firmware. Device 9000 can be incorporated into an image processing equipment as a functional module.
The hand-written calling mechanism 9000 includes a radical model construction unit 9100 and a character model construction unit 9200, wherein unit 9100 is configured to build a radical dictionary including radical hidden Markov models (HMMs), and unit 9200 is configured to generate radical-based character HMMs by combining radical HMMs selected from the radical dictionary.
The radical model construction unit 9100 includes:
- a training data obtaining subelement 9110, configured to: obtain a training dataset of character samples, wherein the character samples include handwriting trajectory samples; select a training radical, classify the radical into at least one classification according to the geometric layout of the radical in different characters, and obtain the respective seed HMM of the at least one classification; and obtain a model dataset of character HMMs, wherein the model dataset includes the HMMs of multiple characters;
- a radical detection and radical sample point determination subelement 9120, configured to detect, by using the obtained seed HMM, character samples in the training dataset that contain the radical as training character samples of the radical, and to determine the sample points of the radical for each training character sample of the radical;
- a state sequence extraction subelement 9130, configured to decode each training character sample of the radical by using the HMM of the respective character in the model dataset, and to extract the state sequence representing the radical from the HMM of each respective character;
- a cluster subelement 9140, configured to cluster the extracted state sequences into subclasses based on the number of states, so that each subclass corresponds to one radical HMM.
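The clustering performed by subelement 9140 can be sketched as grouping state sequences by their length. The sketch rests on assumptions: the function names are hypothetical, the state sequences are placeholder index lists, only the 7-state count for "slow-witted" comes from the text, and picking the first member as the representative is just one possible way of selecting a state sequence from a subclass.

```python
from collections import defaultdict


def cluster_by_state_count(extracted):
    """Group (source character, state sequence) pairs by number of states."""
    subclasses = defaultdict(list)
    for source_character, state_sequence in extracted:
        subclasses[len(state_sequence)].append((source_character, state_sequence))
    return dict(subclasses)


def representative(members):
    """Choose one state sequence to stand for the subclass (here: the first)."""
    return members[0][1]


# Placeholder state sequences; the 9-state count is invented for illustration.
extracted = [
    ("brother", list(range(7))),
    ("slow-witted", list(range(7))),
    ("city", list(range(9))),
    ("STAFF", list(range(9))),
]
clusters = cluster_by_state_count(extracted)
for num_states, members in clusters.items():
    print(num_states, [character for character, _ in members])
```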
The character model construction unit 9200 includes:
- a radical model selection subelement 9210, configured to, for a training character including key element radicals: select one classification for each key element radical based on the geometric layout of that key element radical in the training character; select, for each key element radical, one subclass from the radical dictionary based on the number of states of the state sequence representing that key element radical in the overall character HMM of the training character; and obtain, for each key element radical, the radical HMM corresponding to the selected subclass; and
- a radical model combination subelement 9220, configured to generate the radical-based HMM of the training character by combining the obtained radical HMMs of the key element radicals.
According to another aspect of the invention, there is provided a handwriting recognition dictionary including multiple radical-based character models, wherein the multiple radical-based character models are generated by any of the hand-written register methods described above.
According to another aspect of the invention, there is provided a handwriting recognition apparatus. The handwriting recognition apparatus includes: a sample acquisition unit configured to obtain handwriting samples; the handwriting recognition dictionary described above; and a recognition unit configured to recognize the obtained handwriting samples by using the handwriting recognition dictionary.
According to another aspect of the invention, there is provided a mobile phone. The mobile phone includes: a sensor configured to convert a user's touch on the mobile phone into sample points; a trajectory buffer configured to store the sample points; the handwriting recognition apparatus described above, configured to recognize the sample points as a character; and a text editing module configured to output text according to the recognized character.
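The data flow between the four components of the mobile phone can be sketched as below. Everything here is hypothetical scaffolding: the class and function names, the shape of a touch event, and the stub recognizer passed in as `recognize` are assumptions used only to show the order in which the components hand data to each other.

```python
class TrajectoryBuffer:
    """Stores sample points until a character is ready to be recognized."""

    def __init__(self):
        self._points = []

    def store(self, point):
        self._points.append(point)

    def flush(self):
        points, self._points = self._points, []
        return points


def handle_touches(touch_events, recognize, text=""):
    """Sensor -> trajectory buffer -> recognizer -> text editing."""
    buffer = TrajectoryBuffer()
    for event in touch_events:
        # Sensor stage: convert each touch into an (x, y) sample point.
        buffer.store((event["x"], event["y"]))
    character = recognize(buffer.flush())
    # Text editing stage: append the recognized character to the text.
    return text + character


def stub_recognizer(points):
    """Placeholder for the handwriting recognition apparatus."""
    return "A" if points else ""


print(handle_touches([{"x": 1, "y": 2}, {"x": 3, "y": 4}], stub_recognizer, "text: "))
```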
According to another aspect of the invention, there is provided an image processing equipment. The image processing equipment includes: the handwriting recognition apparatus described above; and a graphics processing unit configured to process an input image according to the recognition result of the handwriting recognition apparatus. The image processing equipment can be at least one of the following: a duplicator, a facsimile machine, a scanner, a printer or a multi-function printer.
It should be noted that the method and device of the invention can be implemented in several ways. For example, they can be implemented by software, hardware, firmware or any combination of the three. The order of the method steps described above is only illustrative, and unless specifically stated otherwise, the steps of the method of the invention are not limited to the order described in detail above. Furthermore, in some embodiments, the invention may be embodied as a program recorded in a recording medium, including machine-readable instructions for implementing the method according to the invention. The invention therefore also covers the recording medium that stores the program for implementing the method according to the invention.
Although the invention has been described with reference to exemplary embodiments, it should be understood that the invention is not limited to the disclosed exemplary embodiments. It will be apparent to those skilled in the art that the above exemplary embodiments can be modified without departing from the scope and spirit of the invention. The scope of the following claims should be given the broadest interpretation so as to cover all such modifications and equivalent structures and functions.

Claims (25)

1. A hand-written register method, the hand-written register method comprising:
building a radical dictionary including radical hidden Markov models (HMMs), and generating radical-based character HMMs by combining radical HMMs selected from the radical dictionary, wherein the radical HMMs in the radical dictionary are generated through the following steps:
a training data obtaining step, comprising: selecting a training radical that includes at least one classification, and obtaining a respective seed HMM of one of the at least one classification, wherein the training radical is classified into the at least one classification based on the geometric layout of the radical in different characters; obtaining a training dataset of character samples, wherein the character samples include handwriting trajectories; and obtaining a model dataset of character HMMs, wherein the model dataset includes the HMMs of multiple characters;
a radical detection and radical sample point determination step of detecting, by using the obtained seed HMM, character samples in the training dataset that contain the radical as training character samples of the radical, and determining the sample points of the radical for each training character sample of the radical;
a state sequence extraction step of decoding each training character sample of the radical by using the HMM of the respective character in the model dataset, and extracting the state sequence representing the radical from the HMM of each respective character; and
a clustering step of clustering the extracted state sequences into subclasses based on the number of states, so that each subclass corresponds to one radical HMM.
2. The hand-written register method according to claim 1, wherein the radical-based character HMM is generated through the following steps:
a radical model selection step of, for a training character including key element radicals, selecting one classification for each key element radical based on the geometric layout of that key element radical in the training character, selecting one subclass from the radical dictionary for each key element radical based on the number of states of the state sequence representing that key element radical in the overall character HMM of the training character, and obtaining, for each key element radical, the radical HMM corresponding to the selected subclass; and
a radical model combination step of generating the radical-based HMM of the training character by combining the obtained radical HMMs of the key element radicals.
3. The hand-written register method according to claim 1 or claim 2, wherein the geometric layout includes at least one of the following properties: the position, shape or size of the radical in the character.
4. The hand-written register method according to claim 1 or claim 2, wherein the clustering step comprises:
clustering the extracted state sequences that have the same number of states into the same subclass.
5. The hand-written register method according to claim 4, wherein, in the clustering step, the radical HMM corresponding to each subclass is obtained by selecting a state sequence from among the state sequences belonging to that subclass.
6. The hand-written register method according to claim 4, wherein, in the clustering step, the radical HMM corresponding to each subclass is obtained by training on multiple handwriting samples of the training radical belonging to the corresponding subclass.
7. The hand-written register method according to claim 1 or claim 2, wherein the seed HMM is obtained by extracting the state sequence representing the training radical from the HMM of a source character, wherein the geometric layout of the training radical in the source character belongs to the classification corresponding to the seed HMM.
8. The hand-written register method according to claim 1 or claim 2, wherein the seed HMM is obtained by training on multiple handwriting samples of the training radical, wherein the multiple handwriting samples of the training radical belong to the classification corresponding to the seed HMM.
9. The hand-written register method according to claim 1 or claim 2, wherein the hand-written register method is used to register East Asian characters.
10. A hand-written recognition method, the hand-written recognition method comprising the following steps:
obtaining handwriting samples; and
recognizing the obtained handwriting samples by using a character dictionary comprising multiple radical-based character models, wherein the multiple radical-based character models are generated by the hand-written register method according to any one of claims 1 to 9.
11. The hand-written recognition method according to claim 10, wherein the character dictionary is built offline, and the hand-written recognition method is used online.
12. A hand-written calling mechanism, the hand-written calling mechanism including a radical model construction unit and a character model construction unit, wherein the radical model construction unit is configured to build a radical dictionary including radical hidden Markov models (HMMs), and the character model construction unit is configured to generate radical-based character HMMs by combining radical HMMs selected from the radical dictionary, wherein the radical model construction unit includes:
a training data obtaining subelement, configured to: obtain a training dataset of character samples, wherein the character samples include handwriting trajectory samples; select a training radical, classify the radical into at least one classification according to the geometric layout of the radical in different characters, and obtain a respective seed HMM of one of the at least one classification; and obtain a model dataset of character HMMs, wherein the model dataset includes the HMMs of multiple characters;
a radical detection and radical sample point determination subelement, configured to detect, by using the obtained seed HMM, character samples in the training dataset that contain the radical as training character samples of the radical, and to determine the sample points of the radical for each training character sample of the radical;
a state sequence extraction subelement, configured to decode each training character sample of the radical by using the HMM of the respective character in the model dataset, and to extract the state sequence representing the radical from the HMM of each respective character; and
a cluster subelement, configured to cluster the extracted state sequences into subclasses based on the number of states, so that each subclass corresponds to one radical HMM.
13. The hand-written calling mechanism according to claim 12, wherein the character model construction unit includes:
a radical model selection subelement, configured to, for a training character including key element radicals, select one classification for each key element radical based on the geometric layout of that key element radical in the training character, select one subclass from the radical dictionary for each key element radical based on the number of states of the state sequence representing that key element radical in the overall character HMM of the training character, and obtain, for each key element radical, the radical HMM corresponding to the selected subclass; and
a radical model combination subelement, configured to generate the radical-based HMM of the training character by combining the obtained radical HMMs of the key element radicals.
14. The hand-written calling mechanism according to claim 12 or claim 13, wherein the geometric layout includes at least one of the following properties: the position, shape or size of the radical in the character.
15. The hand-written calling mechanism according to claim 12 or claim 13, wherein the cluster subelement clusters the extracted state sequences that have the same number of states into the same subclass.
16. The hand-written calling mechanism according to claim 15, wherein, in the cluster subelement, the radical HMM corresponding to each subclass is obtained by selecting a state sequence from among the state sequences belonging to that subclass.
17. The hand-written calling mechanism according to claim 15, wherein, in the cluster subelement, the radical HMM corresponding to each subclass is obtained by training on multiple handwriting samples containing the training radical.
18. The hand-written calling mechanism according to claim 12 or claim 13, wherein the seed HMM is obtained by extracting the state sequence representing the training radical from the HMM of a source character, wherein the geometric layout of the training radical in the source character belongs to the classification corresponding to the seed HMM.
19. The hand-written calling mechanism according to claim 12 or claim 13, wherein the seed HMM is obtained by training on multiple handwriting samples of the training radical, wherein the multiple handwriting samples of the training radical belong to the classification corresponding to the seed HMM.
20. The hand-written calling mechanism according to claim 12 or claim 13, wherein the hand-written calling mechanism is used to register East Asian characters.
21. A handwriting recognition dictionary, the handwriting recognition dictionary including multiple radical-based character models, wherein the multiple radical-based character models are generated by the hand-written register method according to any one of claims 1 to 9.
22. A handwriting recognition apparatus, the handwriting recognition apparatus including:
a sample acquisition unit configured to obtain handwriting samples;
the handwriting recognition dictionary according to claim 21; and
a recognition unit configured to recognize the obtained handwriting samples by using the handwriting recognition dictionary.
23. A mobile phone, the mobile phone including:
a sensor configured to convert a user's touch on the mobile phone into sample points;
a trajectory buffer configured to store the sample points;
the handwriting recognition apparatus according to claim 22, configured to recognize the sample points as a character; and
a text editing module configured to output text according to the recognized character.
24. An image processing equipment, the image processing equipment including:
the handwriting recognition apparatus according to claim 22; and
a graphics processing unit configured to process an input image according to the recognition result of the handwriting recognition apparatus.
25. The image processing equipment according to claim 24, wherein the image processing equipment is a duplicator, a facsimile machine, a scanner, a printer or a multi-function printer.
CN201510876255.9A 2015-12-03 2015-12-03 Hand-written register method, hand-written recognition method and its device Pending CN106845319A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510876255.9A CN106845319A (en) 2015-12-03 2015-12-03 Hand-written register method, hand-written recognition method and its device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510876255.9A CN106845319A (en) 2015-12-03 2015-12-03 Hand-written register method, hand-written recognition method and its device

Publications (1)

Publication Number Publication Date
CN106845319A true CN106845319A (en) 2017-06-13

Family

ID=59149053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510876255.9A Pending CN106845319A (en) 2015-12-03 2015-12-03 Hand-written register method, hand-written recognition method and its device

Country Status (1)

Country Link
CN (1) CN106845319A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097193A (en) * 2019-04-28 2019-08-06 第四范式(北京)技术有限公司 The method and system of training pattern and the method and system of forecasting sequence data

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170613