CN106845319A - Handwriting registration method, handwriting recognition method and device thereof - Google Patents
Handwriting registration method, handwriting recognition method and device thereof
- Publication number
- CN106845319A CN201510876255.9A
- Authority
- CN
- China
- Prior art keywords
- radical
- hmm
- character
- hand
- training
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/32—Digital ink
- G06V30/36—Matching; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
Abstract
The invention discloses a handwriting registration method, a handwriting recognition method, and devices therefor. The handwriting registration method includes building a radical dictionary comprising radical hidden Markov models (HMMs), and generating radical-based character HMMs by combining radical HMMs selected from the radical dictionary. The invention can achieve more accurate and faster handwriting recognition.
Description
Technical field
The present invention relates generally to the field of handwriting recognition, and more particularly to a method and a device for online recognition of handwritten characters.
Background art
For online handwriting recognition, prior art 1 [1] builds one hidden Markov model (HMM) for each character. The recognition dictionary, which therefore contains thousands of models, is large, and the computational cost of this technique is high.
To reduce the computational cost, radical-based methods have been used for handwriting recognition of ideographic languages. US Patent 7903877B2 (referred to as prior art 2) uses a character-radical dictionary to represent thousands of characters by sharing a smaller subset of radicals. Compared with character-based methods, however, radical-based methods often seem to degrade recognition accuracy, because a single HMM that represents every appearance of a radical across different characters can hardly cover the actual diversity of those characters.
To improve recognition accuracy, US Patent 6956969B2 (referred to as prior art 3) classifies radicals into several categories. Fig. 1A illustrates the principle that prior art 3 follows when building radical models. Radical X is classified into m categories according to its geometric layout in different characters, where each category corresponds to one HMM and m is a positive integer. The recognition accuracy of prior art 3 is still far below that of prior art 1. A new handwriting recognition method that recognizes handwriting both quickly and accurately is therefore desirable.
References
[1] Han Shu, "On-Line Handwriting Recognition Using Hidden Markov Models", Master's thesis in Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1997.
Summary of the invention
The present invention has been made in view of at least one of the above problems.
According to one aspect of the invention, a handwriting registration method is provided. The method includes: building a radical dictionary comprising radical hidden Markov models (HMMs), and generating radical-based character HMMs by combining radical HMMs selected from the radical dictionary, wherein the radical HMMs in the radical dictionary are generated by the following steps:
- a training data acquisition step, which includes: selecting a training radical comprising at least one category, and obtaining a seed HMM for each of the at least one category, wherein the training radical is classified into the at least one category based on the geometric layout of the radical in different characters; obtaining a training data set of character samples, wherein each character sample includes a handwriting trajectory; and obtaining a model data set of character HMMs, wherein the model data set includes the HMMs of a plurality of characters;
- a radical detection and radical sampling point determination step of detecting, by using the obtained seed HMM, the character samples in the training data set that contain the radical as training character samples of the radical, and determining the sampling points of the radical in each training character sample of the radical;
- a state sequence extraction step of decoding each training character sample of the radical by using the HMM of the corresponding character in the model data set, and extracting, from the HMM of the corresponding character, the state sequence that represents the radical;
- a clustering step of clustering the extracted state sequences into subclasses based on their number of states, so that each subclass corresponds to one radical HMM.
Further features of the invention will become apparent from the following description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Fig. 1A illustrates the general principle of prior art 3; Fig. 1B illustrates the general principle of building radical models according to an embodiment of the invention.
Fig. 2 is a diagram of an application environment of the handwriting recognition dictionary built by the handwriting registration method according to an embodiment of the present application.
Fig. 3 is a schematic block diagram of a device according to a first exemplary system configuration, in which the device implements a handwriting input function through an embedded handwriting recognition module according to an embodiment of the invention.
Fig. 4 is a schematic block diagram of a device according to a second exemplary system configuration, in which the device implements a handwriting input function through a distributed handwriting recognition module according to an embodiment of the invention.
Fig. 5 is a schematic block diagram illustrating an exemplary hardware arrangement of the handwriting recognition module 330 in Fig. 3 or the handwriting recognition module 430 in Fig. 4.
Fig. 6 shows the two stages in the general flowchart of the handwriting registration method according to an embodiment of the invention.
Fig. 7 is a flowchart of the radical model construction of stage I in Fig. 6.
Fig. 8 shows an example implementation of step S200 and step S300 in Fig. 7.
Fig. 9 is a schematic diagram illustrating the basic process of constructing radical models according to an embodiment of the invention.
Fig. 10 is a schematic diagram illustrating the process of constructing a character model according to an embodiment of the invention.
Fig. 11 shows an example of generating the models of multiple characters by combining the corresponding radical models according to an embodiment of the invention.
Fig. 12 is a schematic diagram illustrating the process of generating seed models according to an embodiment of the invention.
Fig. 13 shows the functional structure of a handwriting registration device according to an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the invention are described in detail below with reference to the accompanying drawings. It should be noted that the following description is merely illustrative and exemplary in nature, and is in no way intended to limit the invention, its application, or its uses. Unless otherwise specified, the relative arrangement of the constituent elements and steps, the numerical expressions, and the numerical values set forth in the embodiments do not limit the scope of the invention. In addition, techniques, methods, and devices known to those skilled in the art may not be discussed in detail, but are regarded as part of this specification where appropriate.
In HMM-based handwriting recognition, each character corresponds to an HMM, and each HMM consists of a sequence of states. In particular, if a character HMM is built by radical sharing, it can be regarded as a combination of the HMMs of its radicals. For example, the character "slow-witted" consists of the radical "mouth" and the radical "wood". The state sequence representing the character "slow-witted" is therefore the combination of the state sequence representing the radical "mouth" and the state sequence representing the radical "wood".
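The combination described above can be sketched in a few lines. The sketch assumes a left-to-right topology (each state either repeats or advances to the next state), which the text does not state explicitly. The state counts 7 and 8 for "mouth" and "wood" are the ones given later in the description for the character "slow-witted", and the function name is hypothetical.

```python
def left_to_right_transitions(n_states):
    """Allowed transitions of a left-to-right chain: self-loops plus one step forward.

    Assumption: the HMMs use a left-to-right topology; the source does not say so.
    """
    transitions = [(i, i) for i in range(n_states)]
    transitions += [(i, i + 1) for i in range(n_states - 1)]
    return sorted(transitions)


# State counts of the two radicals inside the character "slow-witted".
n_mouth, n_wood = 7, 8

# Combining the radical state sequences gives one chain of 7 + 8 states.
n_character = n_mouth + n_wood
transitions = left_to_right_transitions(n_character)

# States 0..6 belong to "mouth", states 7..14 to "wood".
# The transition (6, 7) is the link from the last "mouth" state to the first "wood" state.
print(n_character, (6, 7) in transitions)
```

Only the chain structure is modeled here; emission and transition probabilities are omitted.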
In this application, unless otherwise specified, a radical model refers to a radical HMM, and a character model refers to a character HMM. The number of states of an HMM indicates its model complexity. In other words, the number of states of a character HMM indicates the model complexity of that character HMM, and the number of states of a radical HMM indicates the model complexity of that radical HMM.
Classifying radicals by their geometric layout can improve accuracy. However, radicals with the same geometric layout may still have different model complexities. Take the left-vertical radical "gold" as an example. Here, "left-vertical" refers to a particular geometric layout of the radical within a character. In the HMM trained on the character "Needles", the first 17 states match the radical "gold". For the character "Duo", however, only the first 11 states match the radical "gold". In this case, building one radical HMM for left-vertical "gold" that has to cover both 17 states and 11 states is inappropriate.
Next, the reason why radicals with the same geometric layout match different numbers of states is analyzed further. In general, the model complexity of a radical is influenced by two factors. The first factor is the total number of states of the character containing the radical. The second factor is relative complexity. Among all the states of a character, some will match the radical. By the intrinsic nature of the HMM, the matching result depends on the complexity of the radical's trajectory relative to the character's trajectory. The greater this relative complexity, the more of the character HMM's states match the radical. In the previous example, "gold" in "Needles" obtains 17 states, 6 more than "gold" in "Duo". This is because the complexity of the "gold" trajectory relative to the "Needles" trajectory is greater than the complexity of the "gold" trajectory relative to the "Duo" trajectory.
In Fig. 1A and Fig. 1B, each circle represents a state in a model. The arrows around the circles represent state transitions. A row of circles represents the state transition path of a model and can be regarded as a state sequence.
The general principle of building radical models according to an embodiment of the invention is now explained with reference to Fig. 1B. In a first classification operation, radical X is classified into m categories C1, C2, ..., Cm according to its geometric layout in different characters.
In a first example, the radical "mouth" can be classified into 3 categories: top, bottom, and outer "mouth". Representative characters of these 3 categories are "number", "accounting for", and "enclosing", respectively. The first example classifies the radical based only on its position.
In a second example, the radical "mouth" can be classified into 2 categories: large "mouth" and small "mouth". Representative characters of the 2 categories are "number" and "enclosing", respectively; "accounting for" belongs to the same category as "number". The second example classifies the radical based only on its size.
In a third example, the radical "day" can be classified into 4 categories: left-vertical, right-vertical, top-horizontal, and bottom-horizontal "day". Representative characters of these 4 categories are "bright", "rising sun", "morning", and "Books", respectively. The third example classifies the radical based on its position and shape.
The above three examples are merely illustrative and are not intended to limit the scope of protection of the invention. The geometric layout includes at least one of the following properties: the position, shape, or size of the radical within the character. Radicals can be classified in different ways. In the first classification operation of Fig. 1B, any of the ways of classifying radicals disclosed in prior art 3 can be applied. In general, the more categories a radical is divided into, the more detailed and complete the resulting radical HMMs.
Further, each category is classified into at least one subclass by a second classification operation. The criterion of the second classification operation is the model complexity of radical X in its respective character or, in other words, the number of states of radical X in its respective character. Category C1 is classified into subclasses C11, ..., C1n; category C2 into subclasses C21, ..., C2n; and category Cm into subclasses Cm1, ..., Cmn. In Fig. 1B, m and n are positive integers. As shown in Fig. 1B, each subclass corresponds to one radical HMM. In other words, for a particular category, the number of subclasses equals the number of radical HMMs.
Returning to the earlier example, after the first classification the radical "gold" in "Needles" and the radical "gold" in "Duo" both fall into category C1. In the second classification, because the radical "gold" has 17 and 11 states in the respective characters, the radical "gold" in "Needles" falls into subclass C11 and the radical "gold" in "Duo" falls into subclass C12. C11 and C12 correspond to two different radical HMMs.
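The two classification operations can be illustrated as a grouping by layout followed by a grouping by state count. The following is only a sketch using the "gold" example from the text; the record layout and the function name `two_level_classify` are assumptions made for illustration.

```python
from collections import defaultdict

# Each record: (character, radical, geometric layout, number of states that the
# radical occupies inside the character HMM). Values are taken from the example.
occurrences = [
    ("Needles", "gold", "left-vertical", 17),
    ("Duo", "gold", "left-vertical", 11),
]


def two_level_classify(records):
    """Group radical occurrences by layout (category), then by state count (subclass)."""
    tree = defaultdict(lambda: defaultdict(list))
    for character, radical, layout, n_states in records:
        tree[(radical, layout)][n_states].append(character)
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {category: dict(subclasses) for category, subclasses in tree.items()}


tree = two_level_classify(occurrences)
print(tree)
```

Both occurrences share the category ("gold", "left-vertical") but land in different subclasses, one per state count, so each subclass can be given its own radical HMM.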
The invention considers not only the geometric layout of a radical in its respective character, but also the model complexity of the radical in its respective character. The radical HMMs constructed according to embodiments of the invention are therefore more accurate than those of the prior art. Character HMMs constructed by sharing these accurate radical HMMs can represent characters more accurately. Using a dictionary that contains these more accurate character HMMs thus achieves more accurate handwriting recognition.
Two system configurations are described below with reference to Fig. 3 and Fig. 4. Fig. 3 is a schematic block diagram of a device according to the first exemplary system configuration, in which the device implements a handwriting input function through an embedded handwriting recognition module according to an embodiment of the invention. Fig. 4 is a schematic block diagram of a device according to the second exemplary system configuration, in which the device implements a handwriting input function through a distributed handwriting recognition module according to another embodiment of the invention. The device 300 in Fig. 3 or the device 400 in Fig. 4 can be a mobile phone (such as 200 in Fig. 2), a multi-function printer (MFP), or another electronic device that implements a handwriting recognition function. The main difference between the two exemplary system configurations is that in Fig. 3 the handwriting recognition module 330 is embedded in the device 300, whereas in Fig. 4 the handwriting recognition module 430 is a distributed module separate from the device 400.
Next, an application environment of the handwriting recognition dictionary built by the handwriting registration method according to an embodiment of the present application is described with reference to Fig. 2. A user edits a message on the mobile phone 200. The mobile phone 200 can be the device 300 in Fig. 3 or the device 400 in Fig. 4. In either case, the screen of the mobile phone mainly includes 3 regions: a message display area 210, an input display area 220, and a text editing area 230. Areas 210, 220, and 230 together constitute a Short Message Service (SMS) UI, in which 201 denotes two exemplary short messages and 202 is the current position of the cursor.
Using a stylus or the user's finger 208, the user inputs a handwriting trajectory 207 on the screen of the mobile phone 200. In Fig. 3, the sensor 310 converts the user's touches on the phone into sampling points; in other words, it acquires the handwriting trajectory 207. The discrete sampling points of the trajectory 207, including spatial and temporal information, are stored in the trajectory buffer 320. The recognition module 330 reads the trajectory 207 from the buffer 320. The module 330 includes a handwriting recognition dictionary built by the handwriting registration method according to an embodiment of the present application. Using the handwriting recognition dictionary, the recognition module 330 recognizes the trajectory 207 as the corresponding character, namely the character "buying", and outputs the recognition result back to the application module 340. In this application environment the application module 340 is SMS, so the character "buying" is displayed at the current position of the cursor 202 in the input display area 220. The user can confirm input with button 204, backspace with button 205, switch languages with button 206, and so on. After finishing editing the current message, the user presses button 203 to send it. The entire message then appears in the message display area 210 and is transmitted over a communication network not shown in the figure.
Alternatively, the second system configuration can be applied. As shown in Fig. 4, units 410, 420, and 440 of Fig. 4 are similar to units 310, 320, and 340 of Fig. 3, respectively, and are not described again here for brevity. The main difference between Fig. 4 and Fig. 3 is the location of the recognition module 430. The module 430, which contains the handwriting recognition dictionary built by the handwriting registration method according to an embodiment of the present application, is located outside the device 400. The module 430 communicates with the buffer 420 and the application module 440 via a computer network 450. The network 450 can be a LAN or the Internet. The recognition module 430 can be another computing device, such as a personal computer (PC), a computer workstation, or a computing service provided through network cloud computing technology. Using the handwriting recognition dictionary, the recognition module 430 recognizes the trajectory 207 sent via the network 450 as the corresponding character, namely the character "buying", and outputs the recognition result back to the application module 440 via the network 450. In the case of Fig. 4, the user interacts with the UI of the mobile phone, such as the SMS UI, in a manner similar to the case of Fig. 3.
Fig. 5 is a block diagram illustrating an exemplary hardware arrangement of a handwriting recognition module 530. The module 530 can be 330 in Fig. 3 or 430 in Fig. 4. The storage unit 534 stores the handwriting recognition dictionary and the software program of the handwriting recognition method. The recognition dictionary is built by the handwriting registration method according to an embodiment of the invention. A flowchart of the recognition method is illustrated in Fig. 13, described later. Not only is the trajectory 207 sent to the memory 533, but the recognition dictionary is also loaded into the memory 533. The processor 532 is arranged to retrieve the software program of the handwriting recognition method. The processor 532 is also arranged to fetch, decode, and execute all the steps of the handwriting recognition method, such as the steps shown in Fig. 13. By looking up the recognition dictionary, the processor 532 produces a recognition result and records it to the memory 533 over the system bus 535. In addition to the memory 533, the output can also be stored more permanently in the storage unit 534.
The network interface 531 is optional. For the first system configuration in Fig. 3, the network interface 531 is not required. For the second system configuration in Fig. 4, however, the network interface 531 is necessary in order to communicate with the network 450 and thereby carry the input and output of the recognition module 430.
In addition, the handwriting recognition module 530 can take various forms, with one or more unnecessary components removed or one or more additional components added.
The method of the invention is described in detail below with reference to the accompanying drawings. Fig. 6 shows the two stages in the general flowchart of the handwriting registration method according to an embodiment of the invention.
Stage I constructs the radical HMMs. In stage I, the invention provides a new approach to handwriting registration and produces accurate radical HMMs. Stage I builds a radical dictionary comprising the radical HMMs.
Stage II constructs the character HMMs. Radical-based character HMMs are constructed by combining radical HMMs selected from the radical dictionary. The character models obtained in stage II are therefore also accurate.
Fig. 7 is a flowchart of the radical model construction of stage I in Fig. 6.
In step S100, training data is obtained. The training data includes seed models, a training data set of character samples, and a model data set of character models. Step S100 includes the following sub-steps. A training radical comprising at least one category is selected, and a seed HMM is obtained for each of the at least one category, wherein the classification of the training radical into the at least one category is based on the geometric layout of the radical in different characters. A training data set of character samples is obtained, wherein each character sample includes a handwriting trajectory. A model data set of character HMMs is obtained, wherein the model data set includes the HMMs of a plurality of characters.
Step S200 is the radical detection and radical sampling point determination step. In step S200, the character samples in the training data set that contain the training radical are detected by using the obtained seed HMM and are taken as the training character samples of the radical; the sampling points of the radical are then determined for each training character sample of the radical.
Step S300 is the state sequence extraction step. In step S300, each training character sample of the radical is decoded by using the HMM of the corresponding character in the model data set, and the state sequence representing the radical is extracted from the HMM of the corresponding character.
Step S400 is the clustering step. In step S400, the extracted state sequences are clustered into subclasses based on their number of states, so that each subclass corresponds to one radical HMM and to one specific number of states.
Fig. 8 shows an example implementation of step S200 and step S300 in Fig. 7.
In step S210, the seed model obtained in step S100 is used to detect whether each character sample in the training data set obtained in step S100 contains the training radical. For a character trajectory, all possible segments of the trajectory are decoded with the seed model by the Viterbi algorithm, and the corresponding confidence is calculated for each segment. If the highest confidence is greater than a predetermined threshold (for example, 0.5), then in step S220 the character trajectory is detected as containing the training radical to which the seed model corresponds, and the segment with the highest confidence is determined as the sampling points of the training radical, that is, the trajectory of the training radical. Otherwise, if the highest confidence is below the predetermined threshold, no training radical corresponding to the seed model has been found in the character trajectory, and the character trajectory is filtered out.
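Steps S210 and S220 amount to an exhaustive search over contiguous segments of the trajectory. The sketch below illustrates that search only; the Viterbi decoding against the seed HMM is replaced by a caller-supplied `score_segment` function, which is a hypothetical stand-in, and the handling of a score exactly equal to the threshold (treated as "not found") is an assumption because the text does not cover that case.

```python
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float]


def detect_radical(
    trajectory: Sequence[Point],
    score_segment: Callable[[Sequence[Point]], float],
    threshold: float = 0.5,
    min_len: int = 2,
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) slice of the best-scoring segment, or None.

    score_segment stands in for "decode the segment with the seed HMM and
    compute a confidence"; it is not part of the described method's interface.
    """
    best_score = float("-inf")
    best_span = None
    n = len(trajectory)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            score = score_segment(trajectory[start:end])
            if score > best_score:
                best_score, best_span = score, (start, end)
    # Keep the sample only if the best confidence exceeds the threshold.
    if best_span is not None and best_score > threshold:
        return best_span
    return None


# Toy data: the first four points play the role of the radical's trajectory.
toy = [(0, 0), (1, 0), (1, 1), (0, 1), (5, 5), (6, 6)]


def toy_scorer(segment):
    # Pretend the seed model is confident only about the first four points.
    return 0.9 if list(segment) == toy[:4] else 0.1


print(detect_radical(toy, toy_scorer))
```

A sample for which `detect_radical` returns `None` corresponds to a character trajectory that is filtered out.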
In step S310, the character trajectory is decoded with the model of the character by the Viterbi method, a solution well known in the art. That is, the sampling points of the character trajectory are matched with the states in the model of the character.
In step S320, based on the matching result, the state sequence corresponding to the trajectory of the training radical is extracted from the state sequence of the character model, as the state sequence of the training radical.
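Once the Viterbi alignment is available, the extraction in step S320 is a matter of reading off the states aligned with the radical's sampling points. The sketch below assumes the alignment is given as one state index per sampling point and that the radical occupies a contiguous slice of the trajectory; both the data layout and the function name are assumptions.

```python
def extract_radical_states(alignment, span):
    """Return the distinct states aligned with the radical's sampling points.

    alignment: state index of the character HMM for each sampling point,
               as produced by Viterbi decoding (assumed given here).
    span:      (start, end) slice of the sampling points that form the radical.
    """
    start, end = span
    states = []
    for state in alignment[start:end]:
        # Consecutive points often share a state; keep each state once.
        if not states or states[-1] != state:
            states.append(state)
    return states


# Nine sampling points aligned to a six-state character model.
alignment = [0, 0, 1, 2, 2, 3, 4, 4, 5]

# If the radical covers the first five sampling points, it owns states 0..2.
print(extract_radical_states(alignment, (0, 5)))
```

The length of the returned list is the number of states that the radical occupies in the character model, which is the quantity used for clustering in step S400.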
Next, an exemplary process of constructing radical models from character samples and character models is described in detail with reference to Fig. 9. Fig. 9 corresponds to stage I in Fig. 6.
In this example, based on the geometric layout of "mouth" in different characters, the radical "mouth" can be classified into top "mouth" and bottom "mouth". These two categories of "mouth" are called "mouth 1" and "mouth 2", respectively. Fig. 9 illustrates how the category "mouth 1" is classified into subclasses, that is, the second classification that divides category C1 into subclasses C11, ..., C1n in Fig. 1B.
As shown in Fig. 9, the cylinder in the upper left corner represents the training data set of character samples, which contains, for example, samples of "note", "brother", "city", "Lesson", "slow-witted", and "STAFF". The dashed rectangle at the top contains the seed model of "mouth 1". The generation of seed models is described later.
Step S51 in Fig. 9 corresponds to step S200 in Fig. 7. In step S51, the radical "mouth 1" is detected in the character samples of the left cylinder by using the seed model of "mouth 1". As a result, "brother", "city", "slow-witted", and "STAFF" are judged to contain "mouth 1". The sampling points of the radical "mouth 1", that is, the trajectories of the radical "mouth 1", are then determined. In Fig. 9, the 4 resulting trajectories are indicated by 4 ellipses. The algorithm of step S51 has been shown in steps S210 and S220 of Fig. 8.
Steps S52 and S53 in Fig. 9 correspond to step S300 in Fig. 7. The cylinder in the upper right corner represents the model data set of character models. The model data set includes the HMMs of a plurality of characters, among them the models of "brother", "city", "slow-witted", and "STAFF". In step S52, the 4 character samples containing "mouth 1" are decoded by using the models of "brother", "city", "slow-witted", and "STAFF", respectively. In step S53, the state sequences representing "mouth 1" are extracted from the character models of "brother", "city", "slow-witted", and "STAFF", respectively. The extracted state sequences are shown in the bottom cylinder: the state sequences of "mouth 1" from "brother", "city", "slow-witted", and "STAFF" include 7, 6, 7, and 6 states, respectively. The algorithm of steps S52 and S53 has been shown in steps S310 and S320 of Fig. 8.
Step S54 corresponds to step S400 in Fig. 7. Because the state sequences of "mouth 1" from "brother" and "slow-witted" have the same number of states, namely 7, these two state sequences are clustered into a first subclass. Likewise, because the state sequences of "mouth 1" from "city" and "STAFF" have the same number of states, namely 6, these two state sequences are clustered into a second subclass. Each subclass has a representative radical HMM. There are different ways to obtain this representative HMM.
In one way, the radical HMM representing each subclass is obtained by selecting one state sequence from among the state sequences belonging to that subclass. For the first subclass, the state sequence of "mouth 1" from either "brother" or "slow-witted" can be selected as the representative radical HMM. For the second subclass, the state sequence of "mouth 1" from either "city" or "STAFF" can be selected as the representative radical HMM.
Alternatively, the radical HMM representing each subclass can be obtained by training on multiple handwriting samples that contain the training radical of the corresponding subclass. Take the first subclass as an example: multiple handwriting samples of "brother" or "slow-witted" are collected and used for training to obtain the representative radical HMM of the first subclass. In the same way, multiple handwriting samples of "city" or "STAFF" are collected and used for training to obtain the representative radical HMM of the second subclass.
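The clustering of step S54 and the first way of obtaining a representative (selecting one member of each subclass) can be sketched as follows. The state sequences are placeholders made of labels, the state counts follow the grouping described for step S54, and the function names are hypothetical. The second way, retraining on handwriting samples, is not shown.

```python
from collections import defaultdict


def cluster_by_state_count(sequences):
    """Group extracted state sequences by their number of states."""
    clusters = defaultdict(dict)
    for source_character, states in sequences.items():
        clusters[len(states)][source_character] = states
    return dict(clusters)


def pick_representatives(clusters):
    """Select one member per subclass; here simply the first one encountered."""
    return {n: next(iter(members.values())) for n, members in clusters.items()}


# Number of states that "mouth 1" occupies in each character model (step S54).
state_counts = {"brother": 7, "city": 6, "slow-witted": 7, "STAFF": 6}

# Placeholder state sequences; real ones would carry HMM state parameters.
extracted = {c: [f"{c}/{i}" for i in range(n)] for c, n in state_counts.items()}

clusters = cluster_by_state_count(extracted)
representatives = pick_representatives(clusters)
print(sorted(clusters), {n: sorted(m) for n, m in clusters.items()})
```

Choosing the first member is arbitrary; the text allows either member of a subclass to serve as its representative.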
As described above, the cylinder in the upper left corner of Fig. 9 represents the training data set of character samples. Although only 6 character samples are shown in the training data set, a much larger training data set can be used. In one embodiment, the training data set can include samples of the entire character set. Once any seed model, such as "mouth 2", "wood 1", "wood 2", "wood 3", "Tony 1", or "Tony 2", is input into the dashed rectangle at the top, the handwriting registration method automatically generates the radical subclasses and radical HMMs derived from the input seed model.
The radical dictionary is built by collecting the radical subclasses. In one implementation, the radical dictionary has a tree structure, meaning that each radical in the radical dictionary has the same structure as radical X in Fig. 1B. Given the category and then the subclass of a radical, the model of the radical can therefore be retrieved easily.
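A tree-structured radical dictionary can be pictured as nested mappings: radical, then category, then subclass. The sketch below is a minimal illustration with placeholder model records; the state counts attached to "mouth 11", "mouth 12", and "wood 31" are assumptions inferred from the examples in the text, and `retrieve_model` is a hypothetical helper.

```python
# radical -> category (geometric layout) -> subclass (state count) -> model record
radical_dictionary = {
    "mouth": {
        "mouth 1": {
            "mouth 11": {"n_states": 7},  # assumed: the 7-state subclass
            "mouth 12": {"n_states": 6},  # assumed: the 6-state subclass
        },
    },
    "wood": {
        "wood 3": {
            "wood 31": {"n_states": 8},
        },
    },
}


def retrieve_model(dictionary, radical, category, subclass):
    """Walk the tree from radical to category to subclass and return the model."""
    return dictionary[radical][category][subclass]


model = retrieve_model(radical_dictionary, "mouth", "mouth 1", "mouth 11")
print(model["n_states"])
```

A real model record would hold the HMM parameters of the subclass rather than only a state count.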
Returning to Fig. 6, stage II, character model construction, is described next. Fig. 10 is a schematic diagram illustrating the process of constructing a character model according to an embodiment of the invention. The process includes a radical model selection step and a radical model combination step.
The character "slow-witted" consists of the component radicals "mouth" and "wood". Based on the geometric layout of "mouth" in "slow-witted", the category "mouth 1" is selected. Likewise, based on the geometric layout of "wood" in "slow-witted", the category "wood 3" is selected.
In the whole-character HMM of the character "slow-witted", the state sequence representing the radical "mouth" has 7 states. Based on this number, the subclass "mouth 11" is selected, and the representative HMM of subclass "mouth 11" can be obtained from the radical dictionary. Similarly, in the whole-character HMM of "slow-witted", the state sequence representing the radical "wood" has 8 states. Based on this number, the subclass "wood 31" is selected, and the representative HMM of subclass "wood 31" is obtained from the radical dictionary.
Therefore, the radical-based HMM of the character "slow-witted" is generated by combining the representative HMM of subclass "mouth 11" with the representative HMM of subclass "wood 31".
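The selection and combination steps above can be sketched as follows. This is an illustration under stated assumptions: `RADICAL_DICT`, `select_radical_model` and `build_character_model` are hypothetical names, each HMM is reduced to a list of state labels, and "combining" is assumed to mean concatenating the radical state lists in writing order, which the text does not spell out.

```python
# Hypothetical radical dictionary: class -> {number of states: (subclass, state labels)}
RADICAL_DICT = {
    "mouth 1": {7: ("mouth 11", [f"mouth11_s{i}" for i in range(7)])},
    "wood 3": {8: ("wood 31", [f"wood31_s{i}" for i in range(8)])},
}


def select_radical_model(radical_class, num_states):
    """Pick the subclass of `radical_class` whose HMM has `num_states` states."""
    return RADICAL_DICT[radical_class][num_states]


def build_character_model(elements):
    """elements: (class, state count in the whole-character HMM) per element radical."""
    subclasses, states = [], []
    for radical_class, num_states in elements:
        subclass, radical_states = select_radical_model(radical_class, num_states)
        subclasses.append(subclass)
        # Assumption: combination is plain concatenation of the radical states.
        states.extend(radical_states)
    return subclasses, states


# "slow-witted": class "mouth 1" with 7 states, class "wood 3" with 8 states.
subclasses, states = build_character_model([("mouth 1", 7), ("wood 3", 8)])
print(subclasses, len(states))
```

The class is fixed by the geometric layout and the subclass by the state count, so the pair (class, state count) is enough to address one radical HMM.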
Note that a whole-character HMM is an HMM that models the character as a whole, as distinct from a character HMM produced by combining or sharing the models of its element radicals. A whole-character HMM is usually produced by training on multiple handwriting samples of the corresponding character.
Figure 11 shows an example of generating the models of multiple characters by combining the corresponding radical models according to an embodiment of the invention. Combining the models "west 11" and "wood 31" generates the character model of "chestnut" (栗); combining the models "standing grain 11", "Dao 11" and "wood 32" generates the character model of "pears" (梨); and so on. "mouth 11" is shared by more than one character, as are "wood 31", "wood 32" and "mouth 12". In the hand-written registration method, the resulting character recognition dictionary is much smaller than a conventional character recognition dictionary containing whole-character HMMs, because the number of radical models is much smaller than the number of character models. The radical sharing mechanism therefore achieves the goal of fast handwriting recognition.
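A small sketch of the sharing mechanism: each character stores only references to radical models, so a radical model used by several characters is stored once. The three entries are the compositions named in the text; the dictionary layout and variable names are assumptions made for illustration.

```python
# Each character lists the radical subclasses it is built from (references only).
CHARACTER_DICT = {
    "slow-witted": ["mouth 11", "wood 31"],
    "chestnut": ["west 11", "wood 31"],
    "pears": ["standing grain 11", "Dao 11", "wood 32"],
}

# Every reference from a character to a radical model.
references = [name for parts in CHARACTER_DICT.values() for name in parts]

# Radical models that actually have to be stored.
unique_models = set(references)

# Radical models referenced by more than one character.
shared = {name for name in unique_models if references.count(name) > 1}

print(len(references), len(unique_models), sorted(shared))
```

With only three characters the saving is one model; the gap between references and stored models widens as more characters reuse the same radicals.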
There are different ways to generate a seed model. Figure 12 is a schematic diagram illustrating the process of generating seed models according to an embodiment of the invention. In step S81, representative characters of the training radical are selected, each containing the training radical in a different geometric layout. For example, for the training radical "mouth", the characters "number" (号) and "accounting for" (占) are selected: the geometric layout of "mouth 1" in "number" is the top, and the geometric layout of "mouth 2" in "accounting for" is the bottom. In step S82, the radical segment corresponding to "mouth 1" is labeled in a sample of the character "number". Likewise, the radical segment corresponding to "mouth 2" is labeled in a sample of the character "accounting for". In Fig. 12, these two segments are denoted radical segment 1 and radical segment 2. Next, in step S83, the samples of the characters "number" and "accounting for" are each decoded with the corresponding character model using the Viterbi method, a well-known technique in the art. That is, the sample points of the character's trajectory are matched to the states in the character's model. In step S84, the state sequences corresponding to radical segment 1 and radical segment 2 are extracted from the character models of "number" and "accounting for", respectively, and serve as the seed models of "mouth 1" and "mouth 2".
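Steps S83 and S84 can be sketched with a generic log-domain Viterbi decoder followed by a lookup of the states aligned to the labeled radical segment. This is a toy illustration, not the patent's implementation: the 3-state left-to-right HMM, its probabilities, the six sample points and the function names `viterbi` and `radical_state_sequence` are all assumptions.

```python
import math

NEG_INF = float("-inf")


def viterbi(log_emit, log_trans, log_init):
    """Return the most likely state index for each sample point.

    log_emit[t][j]: log-likelihood of sample point t under state j.
    log_trans[i][j]: log-probability of moving from state i to state j.
    log_init[j]: log-probability of starting in state j.
    """
    n_states = len(log_init)
    score = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for j in range(n_states):
            # Best predecessor of state j at time t.
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emit[t][j])
            pointers.append(best_i)
        score = new_score
        backptr.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(backptr):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def radical_state_sequence(path, segment):
    """States aligned to the labeled radical segment (inclusive sample indices)."""
    first, last = segment
    return sorted(set(path[first:last + 1]))


# Toy whole-character model: 3 states, left-to-right transitions only.
log = math.log
log_init = [0.0, NEG_INF, NEG_INF]
log_trans = [
    [log(0.5), log(0.5), NEG_INF],
    [NEG_INF, log(0.5), log(0.5)],
    [NEG_INF, NEG_INF, 0.0],
]
early, middle, late = [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]
log_emit = [[log(p) for p in row] for row in (early, early, middle, middle, late, late)]

path = viterbi(log_emit, log_trans, log_init)
# Assume the radical segment was labeled as sample points 0 to 3.
seed_states = radical_state_sequence(path, (0, 3))
print(path, seed_states)
```

The decoded path assigns one state to every sample point, so the seed model is simply the part of the character model whose states were visited inside the labeled segment.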
Note that the method for generating the seed model is not limited. Alternatively, the seed model may be obtained by training on multiple handwriting samples of the training radical, wherein the multiple handwriting samples of the training radical belong to the class corresponding to the seed HMM.
According to another aspect, the present invention provides a hand-written recognition method comprising the following steps:
- obtaining a handwriting sample; and
- recognizing the obtained handwriting sample by using a character dictionary containing a plurality of radical-based character models, wherein the plurality of radical-based character models are generated by the hand-written registration method described above.
The recognition step further includes:
- normalizing the input handwriting sample to, for example, 400*400;
- extracting features of the handwriting sample according to formulas (1) to (3); and
- decoding the features of the handwriting sample according to the Viterbi method, using a recognition dictionary that includes the recognition templates of the input characters.
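The normalization step in the list above can be sketched as scaling a trajectory into a 400*400 box. The text gives only the target size, so the choices here are assumptions: the bounding box is moved to the origin, the aspect ratio is preserved, and the function name `normalize_trajectory` is hypothetical.

```python
def normalize_trajectory(points, size=400):
    """Scale (x, y) sample points into a size x size box, keeping the aspect ratio."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # Use the larger side of the bounding box; fall back to 1 for a single dot.
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1
    scale = size / extent
    return [((x - min_x) * scale, (y - min_y) * scale) for x, y in points]


# Three sample points with a 20 x 40 bounding box.
normalized = normalize_trajectory([(10, 20), (30, 20), (30, 60)])
print(normalized)
```

Stretching each axis independently to fill the box would be an equally valid reading of "400*400"; preserving the aspect ratio keeps narrow characters from being distorted.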
Those skilled in the art will understand that the method of the invention is also applicable to East Asian characters such as Japanese, Chinese or Korean characters.
According to another aspect, the hand-written registration method of the invention may be an offline registration method, and the hand-written recognition method of the invention may be an online recognition method. The character dictionary according to an embodiment may be built offline, while the handwriting recognition process is performed online.
The present invention considers not only the geometric layout of a radical but also the model complexity of the radical in its respective characters; therefore, compared with the prior art, the radical models constructed according to embodiments of the invention are more accurate. Character models constructed by sharing these accurate radical models can represent characters more accurately. Using a recognition dictionary that contains these more accurate character models therefore achieves the goal of more accurate handwriting recognition.
In addition, owing to the inherent properties of the radical-based hand-written registration method and the resulting small dictionary, the advantage of fast handwriting recognition is achieved.
Figure 13 shows the functional configuration of a hand-written registration device according to an embodiment of the invention. The hand-written registration device 9000 and the units it includes may be implemented by any of hardware, firmware, software, or any combination thereof, as long as the units in device 9000 can realize the functions of the corresponding steps of the hand-written registration method described above. If device 9000 is implemented partly or wholly in software, the software is stored in the memory of a computer, and when the processor of the computer performs processing by executing the stored software, the computer can realize the functions of the hand-written registration method of the invention. Alternatively, device 9000 may be implemented partly or wholly in hardware or firmware. Device 9000 may be incorporated into an image processing apparatus as a functional module.
The hand-written registration device 9000 includes a radical model construction unit 9100 and a character model construction unit 9200, wherein unit 9100 is configured to build a radical dictionary including radical hidden Markov models (HMMs), and unit 9200 is configured to generate a radical-based character HMM by combining radical HMMs selected from the radical dictionary.
The radical model construction unit 9100 includes:
- a training data acquisition subunit 9110 configured to: obtain a training data set of character samples, wherein the character samples include handwriting trajectory samples; select a training radical, classify the radical into at least one class according to its geometric layout in different characters, and obtain a seed HMM for each of the at least one class; and obtain a model data set of character HMMs, wherein the model data set includes the HMMs of a plurality of characters;
- a radical detection and radical sample point determination subunit 9120 configured to detect, by using the obtained seed HMM, the character samples in the training data set that contain the radical as training character samples of the radical, and to determine the sample points of the radical for each training character sample of the radical;
- a state sequence extraction subunit 9130 configured to decode each training character sample of the radical by using the HMM of the corresponding character in the model data set, and to extract, from the HMM of the corresponding character, the state sequence representing the radical;
- a clustering subunit 9140 configured to cluster the extracted state sequences into subclasses based on the number of states, such that each subclass corresponds to one radical HMM.
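The clustering performed by subunit 9140 can be sketched as grouping the extracted state sequences by their number of states. The function names and the toy sequences are hypothetical, and taking the first sequence of each group as the representative is only one assumed way of obtaining the radical HMM of a subclass.

```python
from collections import defaultdict


def cluster_by_state_count(state_sequences):
    """Group extracted state sequences so that each group has one state count."""
    clusters = defaultdict(list)
    for sequence in state_sequences:
        clusters[len(sequence)].append(sequence)
    return dict(clusters)


def representative(sequences):
    """Assumed rule: use the first sequence of a subclass as its representative."""
    return sequences[0]


# Toy state sequences extracted for one class of a radical (labels are made up).
extracted = [
    [f"a{i}" for i in range(7)],
    [f"b{i}" for i in range(8)],
    [f"c{i}" for i in range(7)],
]

subclasses = cluster_by_state_count(extracted)
models = {count: representative(seqs) for count, seqs in subclasses.items()}
print(sorted(subclasses), {count: len(seqs) for count, seqs in subclasses.items()})
```

Because the key is the state count, the same number later selects the subclass when a character model is assembled from its whole-character HMM.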
The character model construction unit 9200 includes:
- a radical model selection subunit 9210 configured to, for a training character that includes element radicals: select a class for each element radical based on the geometric layout of that element radical in the training character; select a subclass for each element radical from the radical dictionary based on the number of states of the state sequence representing that element radical in the whole-character HMM of the training character; and obtain, for each element radical, the radical HMM corresponding to the selected subclass; and
- a radical model combination subunit 9220 configured to generate the radical-based HMM of the training character by combining the obtained radical HMMs of the element radicals.
According to another aspect of the invention, there is provided a handwriting recognition dictionary including a plurality of radical-based character models, wherein the plurality of radical-based character models are generated by any of the hand-written registration methods described above.
According to another aspect of the invention, there is provided a handwriting recognition apparatus. The handwriting recognition apparatus includes: a sample acquisition unit configured to obtain a handwriting sample; the handwriting recognition dictionary described above; and a recognition unit configured to recognize the obtained handwriting sample by using the handwriting recognition dictionary.
According to another aspect of the invention, there is provided a mobile phone. The mobile phone includes: a sensor configured to convert a user's touch on the mobile phone into sample points; a trajectory buffer configured to store the sample points; the handwriting recognition apparatus described above, configured to recognize the sample points as characters; and a text editing module configured to output text according to the recognized characters.
According to another aspect of the invention, there is provided an image processing apparatus. The image processing apparatus includes: the handwriting recognition apparatus described above; and an image processing unit configured to process an input image according to the recognition result of the handwriting recognition apparatus. The image processing apparatus may be at least one of the following: a copier, a facsimile machine, a scanner, a printer, or a multi-function printer.
It should be noted that the method and device of the invention can be implemented in several ways. For example, they can be implemented by software, hardware, firmware, or any combination of the three. The order of the method steps described above is only illustrative, and unless specifically stated otherwise, the steps of the method of the invention are not limited to the order described above.
In addition, in some embodiments, the invention may be embodied as a program recorded in a recording medium, including machine-readable instructions for implementing the method according to the invention. The invention therefore also covers a recording medium storing a program for implementing the method according to the invention.
Although the invention has been described with reference to exemplary embodiments, it should be understood that the invention is not limited to the disclosed exemplary embodiments. It will be apparent to those skilled in the art that the above exemplary embodiments can be modified without departing from the scope and spirit of the invention. The scope of the following claims should be given the broadest interpretation so as to cover all such modifications and equivalent structures and functions.
Claims (25)
1. A hand-written registration method, comprising:
building a radical dictionary including radical hidden Markov models (HMMs), and generating a radical-based character HMM by combining radical HMMs selected from the radical dictionary, wherein the radical HMMs in the radical dictionary are generated by the following steps:
a training data obtaining step, comprising: selecting a training radical that includes at least one class, and obtaining a seed HMM of one of the at least one class, wherein the training radical is classified into the at least one class based on the geometric layout of the radical in different characters; obtaining a training data set of character samples, wherein the character samples include handwriting trajectories; and obtaining a model data set of character HMMs, wherein the model data set includes the HMMs of a plurality of characters;
a radical detection and radical sample point determination step of detecting, by using the obtained seed HMM, the character samples in the training data set that contain the radical as training character samples of the radical, and determining the sample points of the radical for each training character sample of the radical;
a state sequence extraction step of decoding each training character sample of the radical by using the HMM of the corresponding character in the model data set, and extracting, from the HMM of the corresponding character, the state sequence representing the radical; and
a clustering step of clustering the extracted state sequences into subclasses based on the number of states, such that each subclass corresponds to one radical HMM.
2. The hand-written registration method according to claim 1, wherein the radical-based character HMM is generated by the following steps:
a radical model selection step of, for a training character that includes element radicals, selecting a class for each element radical based on the geometric layout of that element radical in the training character, selecting a subclass for each element radical from the radical dictionary based on the number of states of the state sequence representing that element radical in the whole-character HMM of the training character, and obtaining, for each element radical, the radical HMM corresponding to the selected subclass; and
a radical model combination step of generating the radical-based HMM of the training character by combining the obtained radical HMMs of the element radicals.
3. The hand-written registration method according to claim 1 or claim 2, wherein the geometric layout includes at least one of the following properties: the position, shape or size of the radical in the character.
4. The hand-written registration method according to claim 1 or claim 2, wherein the clustering step comprises:
clustering the extracted state sequences that have the same number of states into the same subclass.
5. The hand-written registration method according to claim 4, wherein, in the clustering step, the radical HMM corresponding to each subclass is obtained by selecting a state sequence from among the state sequences belonging to that subclass.
6. The hand-written registration method according to claim 4, wherein, in the clustering step, the radical HMM corresponding to each subclass is obtained by training on multiple handwriting samples containing the training radical of the corresponding subclass.
7. The hand-written registration method according to claim 1 or claim 2, wherein the seed HMM is obtained by extracting, from the HMM of a source character, the state sequence representing the training radical, wherein the geometric layout of the training radical in the source character belongs to the class corresponding to the seed HMM.
8. The hand-written registration method according to claim 1 or claim 2, wherein the seed HMM is obtained by training on multiple handwriting samples of the training radical, wherein the multiple handwriting samples of the training radical belong to the class corresponding to the seed HMM.
9. The hand-written registration method according to claim 1 or claim 2, wherein the hand-written registration method is used to register East Asian characters.
10. A hand-written recognition method, comprising the following steps:
obtaining a handwriting sample; and
recognizing the obtained handwriting sample by using a character dictionary containing a plurality of radical-based character models, wherein the plurality of radical-based character models are generated by the hand-written registration method according to any one of claims 1 to 9.
11. The hand-written recognition method according to claim 10, wherein the character dictionary is built offline, and the hand-written recognition method is performed online.
12. A hand-written registration device, comprising a radical model construction unit and a character model construction unit, wherein the radical model construction unit is configured to build a radical dictionary including radical hidden Markov models (HMMs), and the character model construction unit is configured to generate a radical-based character HMM by combining radical HMMs selected from the radical dictionary, wherein the radical model construction unit includes:
a training data acquisition subunit configured to: obtain a training data set of character samples, wherein the character samples include handwriting trajectory samples; select a training radical, classify the radical into at least one class according to the geometric layout of the radical in different characters, and obtain a seed HMM of one of the at least one class; and obtain a model data set of character HMMs, wherein the model data set includes the HMMs of a plurality of characters;
a radical detection and radical sample point determination subunit configured to detect, by using the obtained seed HMM, the character samples in the training data set that contain the radical as training character samples of the radical, and to determine the sample points of the radical for each training character sample of the radical;
a state sequence extraction subunit configured to decode each training character sample of the radical by using the HMM of the corresponding character in the model data set, and to extract, from the HMM of the corresponding character, the state sequence representing the radical; and
a clustering subunit configured to cluster the extracted state sequences into subclasses based on the number of states, such that each subclass corresponds to one radical HMM.
13. The hand-written registration device according to claim 12, wherein the character model construction unit includes:
a radical model selection subunit configured to, for a training character that includes element radicals, select a class for each element radical based on the geometric layout of that element radical in the training character, select a subclass for each element radical from the radical dictionary based on the number of states of the state sequence representing that element radical in the whole-character HMM of the training character, and obtain, for each element radical, the radical HMM corresponding to the selected subclass; and
a radical model combination subunit configured to generate the radical-based HMM of the training character by combining the obtained radical HMMs of the element radicals.
14. The hand-written registration device according to claim 12 or claim 13, wherein the geometric layout includes at least one of the following properties: the position, shape or size of the radical in the character.
15. The hand-written registration device according to claim 12 or claim 13, wherein the clustering subunit clusters the extracted state sequences that have the same number of states into the same subclass.
16. The hand-written registration device according to claim 15, wherein, in the clustering subunit, the radical HMM corresponding to each subclass is obtained by selecting a state sequence from among the state sequences belonging to that subclass.
17. The hand-written registration device according to claim 15, wherein, in the clustering subunit, the radical HMM corresponding to each subclass is obtained by training on multiple handwriting samples containing the training radical.
18. The hand-written registration device according to claim 12 or claim 13, wherein the seed HMM is obtained by extracting, from the HMM of a source character, the state sequence representing the training radical, wherein the geometric layout of the training radical in the source character belongs to the class corresponding to the seed HMM.
19. The hand-written registration device according to claim 12 or claim 13, wherein the seed HMM is obtained by training on multiple handwriting samples of the training radical, wherein the multiple handwriting samples of the training radical belong to the class corresponding to the seed HMM.
20. The hand-written registration device according to claim 12 or claim 13, wherein the hand-written registration device is used to register East Asian characters.
21. A handwriting recognition dictionary, including a plurality of radical-based character models, wherein the plurality of radical-based character models are generated by the hand-written registration method according to any one of claims 1 to 9.
22. A handwriting recognition apparatus, comprising:
a sample acquisition unit configured to obtain a handwriting sample;
the handwriting recognition dictionary according to claim 21; and
a recognition unit configured to recognize the obtained handwriting sample by using the handwriting recognition dictionary.
23. A mobile phone, comprising:
a sensor configured to convert a user's touch on the mobile phone into sample points;
a trajectory buffer configured to store the sample points;
the handwriting recognition apparatus according to claim 22, configured to recognize the sample points as characters; and
a text editing module configured to output text according to the recognized characters.
24. An image processing apparatus, comprising:
the handwriting recognition apparatus according to claim 22; and
an image processing unit configured to process an input image according to the recognition result of the handwriting recognition apparatus.
25. The image processing apparatus according to claim 24, wherein the image processing apparatus is a copier, a facsimile machine, a scanner, a printer or a multi-function printer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510876255.9A CN106845319A (en) | 2015-12-03 | 2015-12-03 | Hand-written register method, hand-written recognition method and its device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106845319A true CN106845319A (en) | 2017-06-13 |
Family
ID=59149053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510876255.9A Pending CN106845319A (en) | 2015-12-03 | 2015-12-03 | Hand-written register method, hand-written recognition method and its device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106845319A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110097193A (en) * | 2019-04-28 | 2019-08-06 | 第四范式(北京)技术有限公司 | The method and system of training pattern and the method and system of forecasting sequence data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20170613 |