CN102750552B - Handwriting recognition method and system as well as handwriting recognition terminal - Google Patents

Publication number
CN102750552B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210205916.1A
Other languages
Chinese (zh)
Other versions
CN102750552A (en)
Inventor
李健
郑晓明
张连毅
武卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing InfoQuick SinoVoice Speech Technology Corp.
Original Assignee
JIETONG HUASHENG SPEECH TECHNOLOGY Co Ltd
Landscapes

  • Character Discrimination (AREA)

Abstract

The invention provides a handwriting recognition method, a handwriting recognition system, and a character recognition terminal. It addresses the problem that existing character recognition results frequently contain errors, which lowers the recognition rate and degrades the user experience of multi-character handwriting input. The handwriting recognition method comprises: collecting continuously input handwriting; extracting handwriting features; inputting the handwriting features into a maximum entropy model, which judges whether the current stroke is a segmentation point; and, if so, segmenting the characters to obtain the final recognition result. The maximum-entropy-based character segmentation method is a statistical prediction model. It judges the relationship between character strokes more accurately, thereby determining segmentation points, and it provides the probability of each segmentation-point judgment, so that character segmentation is judged more comprehensively and the accuracy of the recognition result is improved.

Description

A handwriting recognition method, system, and handwriting recognition terminal
Technical Field
The present invention relates to the field of pattern recognition, and in particular to a handwriting recognition method, system, and handwriting recognition terminal based on a maximum entropy model.
Background
Handwriting recognition refers to the process of converting the ordered trajectory information produced when writing on a handwriting device into internal character codes (Hanzi internal codes). It is in effect a mapping from the coordinate sequence of the handwriting trajectory to internal character codes, and it is one of the most natural and convenient means of human-computer interaction.
Many kinds of devices are currently used for handwriting input, such as electromagnetic-induction handwriting tablets, pressure-sensitive handwriting tablets, touch screens, touchpads, and ultrasonic pens. The strokes a user writes on a handwriting input device are stored by the computer in a form similar to a vector drawing. By processing and comparing information such as the character image, the pen-up and pen-down events, and the spatial position of each point of the handwriting, the system converts the data into character codes usable by the computer and outputs them. With the spread of mobile information tools such as smartphones and palmtop computers, handwriting recognition technology has entered an era of large-scale application and can be widely used in various desktop and embedded operating systems.
Handwriting input has also developed from single-character recognition to multi-character recognition. Segmenting multi-character input is a key technology affecting handwriting recognition accuracy and user experience. Most character segmentation algorithms currently in use judge segmentation points by rules and cannot provide the probability that a point is a segmentation point. As a result, the character recognition results after segmentation often contain errors, which lowers the recognition rate and degrades the experience of multi-character handwriting input.
Summary of the Invention
The invention provides a handwriting recognition method, system, and character recognition terminal to solve the problem that existing character recognition results often contain errors, which lowers the recognition rate and in turn degrades the experience of multi-character handwriting input.
To solve the above problem, the invention discloses a handwriting recognition method comprising: collecting continuously input handwriting; extracting handwriting features; inputting the handwriting features into a maximum entropy model, which judges whether the current stroke is a segmentation point; and, if so, segmenting the characters to obtain the final recognition result.
Preferably, the maximum entropy model judging whether the current stroke is a segmentation point comprises: the maximum entropy model uses the handwriting features to give the probability that the current stroke is a segmentation point; if the probability obtained is greater than a preset probability, the current stroke is a segmentation point.
Preferably, the method further comprises a step of determining the preset probability, which comprises: segmenting the character handwriting to obtain at least one segmentation path; performing single-character recognition on each segmentation path to obtain candidate recognition results for that path and a first probability value for each candidate recognition result; scoring each candidate recognition result with a language model to obtain a second probability value, which expresses the association between characters, for each candidate recognition result; obtaining a combined probability value for each candidate recognition result from its first and second probability values; and selecting the maximum combined probability value as the preset probability.
Preferably, collecting continuously input handwriting comprises: collecting character handwriting input continuously in overlapped fashion (successive characters written in the same area), in a row, or in a column.
Preferably, the method further comprises establishing the maximum entropy model, which comprises: selecting maximum entropy model features, preparing training data, and training the maximum entropy model.
Preferably, selecting the maximum entropy model features comprises: selecting handwriting features of character handwriting input continuously in overlapped fashion; that is, selecting at least one of the following as a feature of the maximum entropy model: the relative position between strokes, the position of the stroke within the writing area, the region containing the pen-down point, the region containing the pen-up point, the size of the added stroke, the ratio of the stroke height to the height of the writing area, or the ratio of the stroke width to the width of the writing area.
Preferably, selecting the maximum entropy model features comprises: selecting handwriting features of character handwriting input continuously in a row, that is, selecting at least one of the width of the gap before the current character, the width of the gap after it, and the aspect ratio of the current character as a feature of the maximum entropy model; or selecting handwriting features of character handwriting input continuously in a column, that is, selecting at least one of the width of the gap above the current character, the width of the gap below it, and the aspect ratio of the current character as a feature of the maximum entropy model.
The invention also discloses a handwriting recognition system comprising: an acquisition module for collecting continuously input handwriting; a feature extraction module for extracting handwriting features; a segmentation module for inputting the handwriting features into a maximum entropy model, which judges whether the current stroke is a segmentation point; and a recognition module for segmenting the characters when the current stroke is a segmentation point, to obtain the final recognition result.
Preferably, the handwriting recognition system further comprises a determination module for determining the preset probability. The determination module comprises:
a segmentation submodule for segmenting the character handwriting to obtain at least one segmentation path;
a single-character recognition submodule for performing single-character recognition on each segmentation path to obtain candidate recognition results for that path and a first probability value for each candidate recognition result;
a language model recognition submodule for scoring each candidate recognition result with a language model to obtain a second probability value, which expresses the association between characters, for each candidate recognition result;
a combined judgment submodule for obtaining a combined probability value for each candidate recognition result from its first and second probability values;
a selection submodule for selecting the maximum combined probability value as the preset probability.
The invention also discloses a handwriting recognition terminal comprising the handwriting recognition system disclosed by the invention.
Compared with the prior art, the invention has the following advantages:
The maximum-entropy-based character segmentation method provided by the invention is a statistics-based prediction model. It can judge the relationship between one character stroke and the next more accurately and thereby determine whether a point is a segmentation point, and it provides the probability of that judgment. Character segmentation is thus judged more comprehensively, and the accuracy of the recognition result is improved.
Brief Description of the Drawings
Fig. 1 is a flowchart of a handwriting recognition method according to an embodiment of the invention;
Fig. 2 is a structural diagram of a handwriting recognition system according to an embodiment of the invention.
Detailed Description
To make the above objects, features, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
The invention proposes a handwriting recognition method, system, and terminal. The method extracts handwriting features from the user's continuous writing and inputs them into a maximum entropy model to judge whether a stroke is a segmentation point. It can judge the relationship between one character stroke and the next more accurately and thus improve the accuracy of the recognition result.
Embodiments are described in detail below.
Fig. 1 is a flowchart of a handwriting recognition method according to an embodiment of the invention.
Step 11: collect continuously input handwriting;
Step 12: extract handwriting features;
The user may write multiple characters in succession in the same handwriting area. The characters include Chinese characters, punctuation marks, English letters, and the like.
The character handwriting input continuously by the user is collected; character handwriting refers to information input in the form of strokes. Many devices can collect handwriting input, such as electromagnetic-induction handwriting tablets, pressure-sensitive handwriting tablets, touch screens, touchpads, and ultrasonic pens. All of them use a sensing device installed on the equipment to record the coordinates of the user's writing, i.e., handwriting points. Usually the pen-down position is recorded as the start of a stroke and the pen-up position as its end; the pen-down position, the pen-up position, and the series of handwriting points between them form one input stroke.
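The grouping of handwriting points into strokes described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the event tuple format `(kind, x, y)` and the names `Stroke` and `group_into_strokes` are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


@dataclass
class Stroke:
    """One input stroke: all handwriting points from pen-down to pen-up."""
    points: List[Point] = field(default_factory=list)


def group_into_strokes(events: Iterable[Tuple[str, float, float]]) -> List[Stroke]:
    """Group raw pen events into strokes.

    Each event is a hypothetical (kind, x, y) tuple with kind in
    {"down", "move", "up"}; real devices report events differently.
    """
    strokes: List[Stroke] = []
    current = None
    for kind, x, y in events:
        if kind == "down":
            # Pen-down marks the start of a new stroke.
            current = Stroke([(x, y)])
        elif current is not None:
            current.points.append((x, y))
            if kind == "up":
                # Pen-up marks the end of the stroke.
                strokes.append(current)
                current = None
    return strokes
```

Events that arrive before any pen-down are ignored in this sketch; a real driver might handle them differently.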
Step 13: input the handwriting features into the maximum entropy model, which judges whether the current stroke is a segmentation point;
In the handwriting recognition method of this embodiment, what is collected is the handwriting of multiple characters input continuously by the user. In practice this may be the handwriting points of characters input continuously in overlapped fashion, in a row, or in a column.
Before judging whether a stroke is a segmentation point, a maximum entropy model must be established. Establishing it may comprise: selecting maximum entropy model features, preparing training data, and training the maximum entropy model.
A concrete example is described in detail below:
(1) Select maximum entropy model features
Features related to character stroke position are selected as features of the maximum entropy model. Different handwriting features are selected for different input modes. For example, for continuous overlapped input, at least one of the following handwriting features may be selected as a feature of the maximum entropy model: the relative position between strokes, the position of the stroke within the writing area, the region containing the pen-down point, the region containing the pen-up point, the size of the added stroke, the ratio of the stroke height to the height of the writing area, or the ratio of the stroke width to the width of the writing area. The selected features include but are not limited to those listed above; the required handwriting features can be chosen according to the needs of the actual application. For continuous input in a row, at least one of the following may be selected as a feature of the maximum entropy model: the width of the gap before the current character, the width of the gap after it, and the aspect ratio of the current character. For continuous input in a column, at least one of the following may be selected: the width of the gap above the current character, the width of the gap below it, and the aspect ratio of the current character. The description below uses overlapped input as the example.
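To make the overlapped-input features more concrete, the sketch below computes a few of them for a pair of consecutive strokes. The patent does not say how regions or relative positions are quantified, so the 3×3 grid, the centre-offset definition of relative position, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def bounding_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) of a stroke."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def region_index(point: Point, width: float, height: float, grid: int = 3) -> int:
    """Index of the grid cell (row-major, grid x grid) containing the point."""
    col = min(int(point[0] / width * grid), grid - 1)
    row = min(int(point[1] / height * grid), grid - 1)
    return row * grid + col


def stroke_features(prev: List[Point], cur: List[Point],
                    width: float, height: float) -> Dict[str, float]:
    """Position features of the current stroke relative to the previous one."""
    px0, py0, px1, py1 = bounding_box(prev)
    cx0, cy0, cx1, cy1 = bounding_box(cur)
    return {
        # Relative position: offset between bounding-box centres,
        # normalised by the size of the writing area.
        "dx": ((cx0 + cx1) - (px0 + px1)) / (2 * width),
        "dy": ((cy0 + cy1) - (py0 + py1)) / (2 * height),
        # Regions containing the pen-down and pen-up points.
        "pen_down_region": region_index(cur[0], width, height),
        "pen_up_region": region_index(cur[-1], width, height),
        # Stroke height and width as fractions of the writing area.
        "height_ratio": (cy1 - cy0) / height,
        "width_ratio": (cx1 - cx0) / width,
    }
```

In overlapped input, a large jump back toward the upper left after a stroke near the lower right is the kind of pattern these features let a model pick up as a likely character boundary.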
(2) Prepare training data
After the features of the maximum entropy model have been selected, training data is prepared. This requires fixing the character stroke position features used in the model, such as the relative position between strokes and the position of the stroke within the writing area, i.e., the x in the model. Then the data is prepared: a number of overlapped-input character strokes are collected and labeled according to the chosen features.
Consider a stochastic process p(y|x) which, given an observable vector x, outputs some y with a certain probability, where y belongs to a finite set Y. For character segmentation, Y = {1, 0}, denoting segmentation point and non-segmentation point respectively. x represents the features related to character stroke position, i.e., the overlapped-input character strokes yet to be judged, including the relative position between strokes, the position of the stroke within the writing area, and so on. To reconstruct the stochastic process p(y|x), we sample its output and obtain N training examples $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$. Since these training examples are generated by this stochastic process, we assume that the empirical probability of an event in the training examples equals the expected probability of that event given p(y|x).
(3) Train the maximum entropy model
Once the training data is ready, it is used to train the maximum entropy model. The data labeled in the previous step with character stroke position features, such as the relative position between strokes and the position of the stroke within the writing area, is fed into maximum entropy model training. The data format is: whether to cut, feature 1, feature 2
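A record in the format just described could be read as in the sketch below. The text gives only the field order, so the comma separator, the use of 1/0 for the label, and the name `parse_training_line` are assumptions.

```python
from typing import Tuple


def parse_training_line(line: str) -> Tuple[int, Tuple[str, ...]]:
    """Parse one training record: label first, then the feature values.

    Assumes comma-separated fields; the label is 1 for a segmentation
    point and 0 otherwise. Feature values are kept as strings.
    """
    fields = [part.strip() for part in line.split(",")]
    label = int(fields[0])
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label}")
    return label, tuple(fields[1:])
```

For example, `parse_training_line("1, far_left, top")` yields the label `1` together with the two feature values.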
An event can be represented by an indicator function $f_i(x, y)$. If the event occurs in sample $(x_j, y_j)$, then $f_i(x_j, y_j) = 1$; otherwise it is 0. For example, if x indicates that the previous character has been completely written and y is a segmentation point, then $f_i(x, y) = 1$; in all other cases $f_i(x, y) = 0$. The empirical probability of this event in the training examples is expressed as:
$$\tilde{p}(f_i) = \sum_{x,y} \tilde{p}(x, y)\, f_i(x, y) \qquad (1)$$
where $\tilde{p}(x, y)$ is the probability that sample (x, y) occurs in the training examples, i.e., $\tilde{p}(x, y) = \frac{1}{N} \times$ (the number of occurrences of (x, y) in the training character strokes).
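As a small numeric illustration of the empirical probability of a feature, the sketch below counts how often each (x, y) pair occurs and sums the indicator over those relative frequencies. The string-valued x and the name `empirical_expectation` are assumptions made for brevity; in practice x would be a feature vector.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple

Sample = Tuple[Hashable, int]


def empirical_expectation(samples: List[Sample],
                          feature: Callable[[Hashable, int], float]) -> float:
    """Sum over distinct (x, y) of (count / N) * f(x, y)."""
    n = len(samples)
    counts = Counter(samples)
    return sum(count / n * feature(x, y) for (x, y), count in counts.items())


def prev_complete_and_cut(x: Hashable, y: int) -> float:
    """Indicator: previous character is complete and y is a segmentation point."""
    return 1.0 if x == "prev_complete" and y == 1 else 0.0
```

With four samples of which two are `("prev_complete", 1)`, the empirical probability of this feature is 2/4.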
If p(y|x) is known, the expected probability of event $f_i(x, y)$ is expressed as:
$$p(f_i) = \sum_{x,y} \tilde{p}(x)\, p(y|x)\, f_i(x, y) \qquad (2)$$
where $\tilde{p}(x)$ is the probability of x in the training examples.
By our assumption, $p(f_i) = \tilde{p}(f_i)$, that is:
$$\sum_{x,y} \tilde{p}(x)\, p(y|x)\, f_i(x, y) = \sum_{x,y} \tilde{p}(x, y)\, f_i(x, y) \qquad (3)$$
We call the indicator function $f_i(x, y)$ a feature function, or feature for short. The equation above is then called a constraint equation on feature $f_i(x, y)$, or constraint for short. A constraint is an equation relating the stochastic process p(y|x) and the training examples with respect to a given feature. It restricts the distribution p(y|x) so that the samples it generates are, with respect to that feature, statistically close to the training examples.
Suppose n features are defined. All stochastic processes that satisfy the constraints of these n features form a set:
$$C \equiv \{\, p(y|x) \mid p(f_i) = \tilde{p}(f_i) \ \text{for}\ i \in \{1, 2, \ldots, n\} \,\} \qquad (4)$$
In general, |C| > 1. We choose the stochastic process in C with the largest entropy as the reconstructed model. The entropy here is the conditional entropy, expressed as:
$$H(p) \equiv -\sum_{x,y} \tilde{p}(x)\, p(y|x) \log p(y|x) \qquad (5)$$
The model we finally reconstruct is then:
$$p^{*} = \arg\max_{p \in C} H(p) \qquad (6)$$
This model is called the maximum entropy model. The maximum entropy principle ensures that the model generalizes well.
Form of the maximum entropy model and parameter estimation
Solving formula (6) yields a maximum entropy model of the following form:
$$p(y|x) = \frac{1}{Z(x)} \exp\Big( \sum_i \lambda_i f_i(x, y) \Big) \qquad (7)$$
In the above formula, $\lambda_i$ is the weight of feature $f_i(x, y)$; it can be obtained by training on the training character strokes with an iterative algorithm such as IIS or L-BFGS. Z(x) is the normalization coefficient.
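The sketch below evaluates the model of formula (7) and fits the weights on toy data. Note the substitution: instead of IIS or L-BFGS it uses plain gradient ascent on the average log-likelihood, whose gradient for each weight is the empirical expectation of the feature minus its model expectation, so training stops moving exactly when the constraints of formula (3) hold. The function names, the string-valued x, and the step size are assumptions for illustration.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

LABELS = (0, 1)  # 1 = segmentation point, 0 = not a segmentation point
Feature = Callable[[Hashable, int], float]


def predict(weights: Sequence[float], features: Sequence[Feature],
            x: Hashable) -> Dict[int, float]:
    """p(y|x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x), as in formula (7)."""
    scores = {
        y: math.exp(sum(w * f(x, y) for w, f in zip(weights, features)))
        for y in LABELS
    }
    z = sum(scores.values())  # normalization coefficient Z(x)
    return {y: s / z for y, s in scores.items()}


def train(samples: List[Tuple[Hashable, int]], features: Sequence[Feature],
          steps: int = 1000, lr: float = 1.0) -> List[float]:
    """Fit the weights by gradient ascent (a stand-in for IIS / L-BFGS)."""
    n = len(samples)
    weights = [0.0] * len(features)
    # Empirical expectation of each feature, formula (1).
    empirical = [sum(f(x, y) for x, y in samples) / n for f in features]
    for _ in range(steps):
        # Model expectation of each feature, formula (2).
        expected = [0.0] * len(features)
        for x, _label in samples:
            p = predict(weights, features, x)
            for i, f in enumerate(features):
                expected[i] += sum(p[y] * f(x, y) for y in LABELS) / n
        weights = [w + lr * (e - m)
                   for w, e, m in zip(weights, empirical, expected)]
    return weights
```

On toy data where three of four "gap" samples are segmentation points, the fitted model's p(1 | "gap") approaches 0.75, which is the value the constraint requires. A production system would more likely use a library optimizer than this loop.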
Once the maximum entropy model has been established, the collected handwriting features are input into it for judgment. The judgment process is described in step 14.
Step 14: if the maximum entropy model judges that the current stroke is a segmentation point, the characters are segmented and the final recognition result is obtained. If the maximum entropy model judges that the current stroke is not a segmentation point, no segmentation is performed, and collection of the handwriting features of the continuously input characters may continue.
Specifically, the maximum entropy model gives the probability that the point between one character stroke and the next is a segmentation point. If this probability is large, the point is considered a segmentation point and the characters are segmented there. To decide whether the probability is large enough, a fixed value can be set empirically: if the segmentation probability obtained is greater than this fixed value, the probability that the point between the strokes is a segmentation point is considered large, and segmentation can be performed.
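The fixed-threshold decision can be sketched as below. It assumes that `cut_probs[i]` is the model's probability that a character boundary follows `strokes[i]`; both that convention and the name `segment` are introduced here for illustration, and the value 0.5 is only a placeholder for the empirically chosen fixed value.

```python
from typing import List, Sequence, TypeVar

S = TypeVar("S")


def segment(strokes: Sequence[S], cut_probs: Sequence[float],
            threshold: float = 0.5) -> List[List[S]]:
    """Split strokes into characters wherever the cut probability exceeds the threshold."""
    characters: List[List[S]] = []
    current: List[S] = []
    for stroke, p_cut in zip(strokes, cut_probs):
        current.append(stroke)
        if p_cut > threshold:
            # The point after this stroke is judged to be a segmentation point.
            characters.append(current)
            current = []
    if current:
        # Strokes after the last segmentation point form the final character.
        characters.append(current)
    return characters
```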
Alternatively, the segmentation probability obtained can be incorporated into the path search to further improve the recognition rate. In a concrete implementation, a preset probability can be set: if the segmentation probability obtained is greater than the preset probability, the probability that the point between the strokes is a segmentation point is considered large, and segmentation can be performed. The preset probability can be obtained as follows:
The character handwriting is segmented to obtain at least one segmentation path;
For example, segmenting the handwritten input "天下" ("all over the world") yields four segmentation paths: "二|人|下", "二人|下", "二|人下", and "二人下". Each segmentation path has a corresponding segmentation probability value.
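The four paths in the example are all the ways of placing or omitting a cut at each of the two gaps between three stroke groups. A sketch that enumerates every path for a sequence of stroke groups follows; the name `segmentation_paths` is an assumption, and a real system would prune this exponential set rather than enumerate it in full.

```python
from itertools import product
from typing import List, Sequence, TypeVar

S = TypeVar("S")


def segmentation_paths(strokes: Sequence[S]) -> List[List[List[S]]]:
    """Enumerate all 2**(n-1) ways to split n stroke groups into consecutive characters."""
    if not strokes:
        return []
    paths = []
    # One boolean per gap between neighbouring stroke groups: cut or no cut.
    for cuts in product((False, True), repeat=len(strokes) - 1):
        path: List[List[S]] = []
        current = [strokes[0]]
        for stroke, cut in zip(strokes[1:], cuts):
            if cut:
                path.append(current)
                current = []
            current.append(stroke)
        path.append(current)
        paths.append(path)
    return paths


if __name__ == "__main__":
    for path in segmentation_paths(["二", "人", "下"]):
        print("|".join("".join(group) for group in path))
```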
Single-character recognition is performed on each segmentation path to obtain candidate recognition results for that path and a first probability value for each candidate recognition result;
The recognition itself can use any existing recognition method; the embodiment of the invention places no limit on this.
In each segmentation path, each single character delimited by the candidate segmentation points is recognized. Recognition of each single character may yield multiple candidate recognition results (single-character candidates), together with a single-character recognition probability for each candidate, called the first probability value.
For example, for the short input "天下" ("all over the world"), the four segmentation paths "二|人|下", "二人|下", "二|人下", and "二人下" are each recognized. For the path "二人|下", single-character recognition is performed on "二人" and on "下" separately. The candidate recognition results for "二人" may be "天", "夫", and so on, each with a single-character recognition probability; say the first probability values of "天" and "夫" are A and B respectively. Likewise, single-character recognition of "下" yields one or more candidate recognition results, each with its first probability value. The other segmentation paths are recognized in the same way and are not described one by one.
Each candidate recognition result is scored with a language model to obtain a second probability value, which expresses the association between characters, for each candidate recognition result;
The language model represents the association between characters, expressed as a probability. A language model is a model for computing the probability of a phrase or sentence. For a given input, multiple segmentation paths produce multiple candidate recognition results. A candidate recognition result here means the character, word, phrase, or sentence formed by combining, according to the language model, the single-character candidates obtained in the single-character recognition step above. For instance, "二" and "人" combine into the single character "天", so "天" is a candidate recognition result; "文" and "件" combine into the word "文件" ("file"), which is also a candidate recognition result. For each candidate recognition result, the language model computes how probable it is that the sentence is correct. For example, suppose one candidate recognition result for the user's handwriting is "文仵" and another is "文件". According to the language model, the probability of "文件" is greater than that of "文仵"; if the recognition probabilities of the two are close, the language model lets the result be settled as the more common "文件".
As for implementing the language model, a simple method considers only the probability of two adjacent characters, e.g., the probability that "件" is preceded by "文" and the probability that "仵" is preceded by "文", regardless of what comes before that. In reality this independence does not hold, so a more complex implementation may also consider the character two positions back (or more), or use a word-based language model, at the cost of considerably more computation and storage.
Similarly, among the candidate recognition results "二人下", "天下", and "夫下", "天下" is a common word, so its language model probability is the highest, whereas "二人下" is not a common word, so its language model probability is lower.
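The simple two-character approach can be sketched as a character bigram model. The tiny corpus, the add-one smoothing (added here so that unseen pairs do not get zero probability), and the function names are all assumptions for illustration; the patent does not specify how the language model is estimated.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def train_bigram(corpus: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count single characters and adjacent character pairs in the corpus."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for sentence in corpus:
        unigrams.update(sentence)
        bigrams.update(zip(sentence, sentence[1:]))
    return unigrams, bigrams


def bigram_log_prob(text: str, unigrams: Counter, bigrams: Counter,
                    alpha: float = 1.0) -> float:
    """Sum of log P(current | previous) with add-alpha smoothing."""
    vocab = len(unigrams) or 1
    log_p = 0.0
    for prev, cur in zip(text, text[1:]):
        numerator = bigrams[(prev, cur)] + alpha
        denominator = unigrams[prev] + alpha * vocab
        log_p += math.log(numerator / denominator)
    return log_p


if __name__ == "__main__":
    uni, bi = train_bigram(["天下", "文件", "天下", "文件夹"])
    for candidate in ("文件", "文仵", "天下", "夫下"):
        print(candidate, bigram_log_prob(candidate, uni, bi))
```

With this toy corpus the common words "文件" and "天下" score higher than "文仵" and "夫下", matching the behaviour described above.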
A combined probability value for each candidate recognition result is obtained from its first and second probability values. A simple way to compute the combined probability value is a weighted sum of the first and second probability values of the candidate recognition result. More complex computations can of course also be used; the embodiment of the invention places no limit on this.
The maximum combined probability value is selected as the preset probability. This maximum combined probability value can represent the segmentation cost of the segmentation path, i.e., the probability, derived from the input order and the relative positions of the strokes, that the segmentation is correct.
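The weighted-sum combination and the selection of the maximum can be sketched as follows. The weight of 0.5 and the probability values in the usage example are made up for illustration; the patent does not give a weight.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, float]  # (text, first probability, second probability)


def combined_score(p_recognition: float, p_language: float,
                   weight: float = 0.5) -> float:
    """Weighted sum of the recognition and language model probabilities."""
    return weight * p_recognition + (1.0 - weight) * p_language


def best_candidate(candidates: List[Candidate],
                   weight: float = 0.5) -> Tuple[str, float]:
    """Return the candidate text with the maximum combined probability value."""
    scored = [(text, combined_score(p_rec, p_lm, weight))
              for text, p_rec, p_lm in candidates]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Illustrative numbers only.
    print(best_candidate([("天下", 0.6, 0.9), ("夫下", 0.7, 0.1), ("二人下", 0.5, 0.05)]))
```

Here "夫下" has the highest single-character recognition probability, but the language model score lets "天下" win once the two values are combined.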
The example above uses a maximum entropy model, one statistical method, to segment and recognize continuously input characters. In specific applications, other statistical methods, such as the support vector machine (SVM), may also be used for this purpose. The application of these methods is described extensively in the prior art and is not repeated here.
In summary, with the above flow, the handwriting recognition method extracts handwriting features from the user's continuous writing and inputs them into a maximum entropy model to judge whether a stroke is a segmentation point. It can judge the relationship between one character stroke and the next more accurately and improve the accuracy of the recognition result. In addition, because the user can input multiple characters at once, input speed is greatly increased.
In practice, the handwriting recognition method of the embodiment can be applied to products that require handwriting input, for example under desktop operating systems on PCs, notebook computers, tablet computers, and handwriting tablets. It can also be applied under embedded operating systems, for example: intelligent mobile terminals such as palmtop computers, mobile phones, PADs, PDAs, small-screen phones, and landscape-screen phones; GPS/GIS terminals such as personal information terminals and in-vehicle information terminals; intelligent learning terminals such as e-books, electronic dictionaries, and intelligent toys; and other data terminals such as tax-control machine entry terminals, Chinese second-generation ID card reader information terminals, large database query terminals, hotel management system entry terminals, intelligent alarms, interactive digital TV remote controls, karaoke song-selection devices, and information appliances. The invention places low demands on the screen size of the handwriting area and is particularly suited to overlapped input and recognition on small-screen devices, giving it a considerable advantage on small-screen devices such as current mobile phones.
Preferably, in a multitasking system, the segmentation and combined recognition described above can run concurrently with the writing process (i.e., handwriting collection), further speeding up recognition. A multitasking system here means a system capable of multithreading. While the user is writing, handwriting collection uses little or no CPU, so most of the CPU is idle. In a multitasking system this idle CPU time can be used to recognize while the user writes, which speeds up recognition.
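Recognizing while the user writes can be sketched as a producer-consumer arrangement: the collection loop puts finished strokes on a queue and a worker thread processes them as they arrive. The function name `recognize_while_writing` and the `recognize` callback are hypothetical. In CPython, threads interleave rather than run Python code in parallel, which is sufficient here because collection is mostly idle waiting for input.

```python
import queue
import threading
from typing import Callable, Iterable, List, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def recognize_while_writing(stroke_source: Iterable[S],
                            recognize: Callable[[S], R]) -> List[R]:
    """Process strokes on a worker thread while collection continues."""
    strokes: "queue.Queue" = queue.Queue()
    results: List[R] = []

    def worker() -> None:
        while True:
            stroke = strokes.get()
            if stroke is None:  # sentinel: writing has finished
                break
            results.append(recognize(stroke))

    thread = threading.Thread(target=worker)
    thread.start()
    for stroke in stroke_source:  # stands in for the handwriting collection loop
        strokes.put(stroke)
    strokes.put(None)
    thread.join()
    return results
```

Because a single worker consumes a FIFO queue, results come back in the order the strokes were written.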
Based on the foregoing, the embodiment of the invention also provides a corresponding system embodiment.
Fig. 2 is a structural diagram of a handwriting recognition system according to an embodiment of the invention.
Acquisition module 21, for collecting continuously input handwriting;
Feature extraction module 22, for extracting handwriting features;
Segmentation module 23, for inputting the handwriting features into the maximum entropy model, which judges whether the current stroke is a segmentation point;
Recognition module 24, for segmenting the characters when the current stroke is a segmentation point, to obtain the final recognition result.
The segmentation module 23 can make full use of statistical methods, in particular the maximum-entropy-based judgment of the method embodiment above, to judge more accurately whether a point between characters is a segmentation point. To further improve the accuracy of the judgment, the segmentation probability obtained from the maximum entropy model can be incorporated into the path search, further improving the recognition rate. The system may therefore further comprise a determination module 25, which may comprise:
Segmentation submodule 251, for segmenting the character handwriting to obtain at least one segmentation path;
Single-character recognition submodule 252, for performing single-character recognition on each segmentation path to obtain candidate recognition results for that path and a first probability value for each candidate recognition result, and for passing each candidate recognition result to the language model recognition submodule 253;
Language model recognition submodule 253, for scoring each candidate recognition result with a language model to obtain a second probability value, which expresses the association between characters, for each candidate recognition result;
Combined judgment submodule 254, for obtaining a combined probability value for each candidate recognition result from its first and second probability values;
Selection submodule 255, for selecting the maximum combined probability value as the preset probability.
Once the preset probability value has been determined, the segmentation probability obtained from the maximum entropy model can be compared with it; if the segmentation probability is greater than or equal to the preset probability value, the point between the characters is judged to be a segmentation point. This additional probability comparison improves the accuracy of judging whether a point between characters is a segmentation point and strengthens character recognition.
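How modules 22 to 24 might fit together can be sketched with a small class that takes each module as a callable. The class name, the callable signatures, and the default threshold are assumptions for illustration; the comparison uses "greater than or equal to", following the paragraph above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class HandwritingRecognitionSystem:
    """Wires hypothetical feature extraction, segmentation, and recognition callables."""
    extract_features: Callable[[Sequence[Any], int], Any]   # role of module 22
    cut_probability: Callable[[Any], float]                 # role of module 23
    recognize_character: Callable[[List[Any]], str]         # role of module 24
    preset_probability: float = 0.5                         # placeholder value

    def process(self, strokes: Sequence[Any]) -> List[str]:
        """Segment the collected strokes and recognize each character."""
        results: List[str] = []
        current: List[Any] = []
        for index, stroke in enumerate(strokes):
            current.append(stroke)
            features = self.extract_features(strokes, index)
            if self.cut_probability(features) >= self.preset_probability:
                results.append(self.recognize_character(current))
                current = []
        if current:
            results.append(self.recognize_character(current))
        return results
```

Passing the modules in as callables keeps the sketch independent of any particular feature set or recognizer.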
Building on the above maximum-entropy-model-based handwriting recognition system, an embodiment of the present invention further provides a handwriting recognition terminal. The terminal may comprise the above handwriting recognition system and thereby support recognition of continuously input characters. For the specific structure of the handwriting recognition system, refer to Fig. 2; it is not described in detail here.
The handwriting recognition terminal may be a terminal running a desktop operating system, such as a PC, notebook computer, tablet computer, or handwriting pad; an intelligent mobile terminal, such as a palmtop computer, mobile phone, PAD, PDA, small-screen mobile phone, or landscape-screen mobile phone; or any other type of terminal with a multitasking system.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced. Because the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, see the corresponding part of the method embodiment.
The handwriting recognition method, system, and handwriting recognition terminal provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the invention, and the description of the above embodiments is intended only to help in understanding the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (7)

1. A handwriting recognition method, characterized by comprising:
establishing a maximum entropy model, wherein establishing the maximum entropy model comprises: selecting maximum entropy model features, preparing training data, and training the maximum entropy model;
gathering continuously input handwriting;
extracting handwriting features;
inputting the handwriting features into the maximum entropy model, the maximum entropy model judging whether the current stroke is a cut point;
if so, cutting the characters to obtain a final recognition result;
wherein the maximum entropy model judging whether the current stroke is a cut point comprises: the maximum entropy model using the handwriting features to give the probability that the current stroke is a cut point; and, if the probability obtained is greater than a preset probability, the current stroke is a cut point;
wherein selecting the maximum entropy model features comprises: selecting handwriting features of character handwriting input continuously in overlapped-character mode, namely selecting, as a feature of the maximum entropy model, at least one of: the relative position between strokes, the position of the stroke within the writing region, the region in which the pen-down point of the stroke lies, the region in which the pen-up point lies, the size of the newly added stroke, the ratio of the stroke height to the height of the writing region, or the ratio of the stroke width to the width of the writing region.
2. The method according to claim 1, characterized by further comprising determining the preset probability, wherein determining the preset probability comprises:
cutting the character handwriting to obtain at least one cutting route;
performing single-character recognition on each cutting route, obtaining a candidate recognition result for each cutting route, and obtaining a first probability value of that candidate recognition result;
scoring each candidate recognition result with a language model, yielding for each candidate recognition result a second probability value that represents inter-character association information;
obtaining a combined probability value for each candidate recognition result from the first probability value and the second probability value of that candidate recognition result;
selecting the maximum combined probability value as the preset probability.
3. The method according to claim 1, characterized in that gathering the continuously input handwriting comprises: gathering character handwriting input continuously in overlapped-character mode, in rows, or in columns.
4. The method according to claim 1, characterized in that selecting the maximum entropy model features comprises:
selecting handwriting features of character handwriting input continuously in rows, namely selecting at least one of the width of the gap before the current character, the width of the gap after the current character, and the aspect ratio of the current character as a feature of the maximum entropy model;
selecting handwriting features of character handwriting input continuously in columns, namely selecting at least one of the width of the gap above the current character, the width of the gap below the current character, and the aspect ratio of the current character as a feature of the maximum entropy model.
5. A handwriting recognition system, characterized by comprising:
an acquisition module, configured to gather continuously input handwriting;
a feature extraction module, configured to extract handwriting features;
a cutting module, configured to input the handwriting features into a maximum entropy model, the maximum entropy model judging whether the current stroke is a cut point, wherein the maximum entropy model judging whether the current stroke is a cut point comprises: the maximum entropy model using the handwriting features to give the probability that the current stroke is a cut point, and, if the probability obtained is greater than a preset probability, the current stroke is a cut point; and configured to establish the maximum entropy model, wherein establishing the maximum entropy model comprises: selecting maximum entropy model features, preparing training data, and training the maximum entropy model; wherein selecting the maximum entropy model features comprises: selecting handwriting features of character handwriting input continuously in overlapped-character mode, namely selecting, as a feature of the maximum entropy model, at least one of: the relative position between strokes, the position of the stroke within the writing region, the region in which the pen-down point of the stroke lies, the region in which the pen-up point lies, the size of the newly added stroke, the ratio of the stroke height to the height of the writing region, or the ratio of the stroke width to the width of the writing region;
an identification module, configured to cut the characters when the current stroke is a cut point and obtain a final recognition result.
6. The system according to claim 5, characterized by further comprising:
a determination module, configured to determine the preset probability;
wherein the determination module comprises:
a cutting submodule, configured to cut the character handwriting to obtain at least one cutting route;
a single-character recognition submodule, configured to perform single-character recognition on each cutting route, obtain a candidate recognition result for each cutting route, and obtain a first probability value of that candidate recognition result;
a language model recognition submodule, configured to score each candidate recognition result with a language model, yielding for each candidate recognition result a second probability value that represents inter-character association information;
a comprehensive judgment submodule, configured to obtain a combined probability value for each candidate recognition result from the first probability value and the second probability value of that candidate recognition result;
a selection submodule, configured to select the maximum combined probability value as the preset probability.
7. A handwriting recognition terminal, characterized by comprising the handwriting recognition system according to any one of claims 5 to 6.
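The stroke-level judgment recited in claims 1 and 5 can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the claimed implementation. A two-class maximum entropy model whose features fire only for the "cut" class reduces to a logistic function of the weighted feature sum, and that form is used here. The function names, the particular feature subset, the 100 x 100 writing region, and the weight values are all hypothetical; in the method the weights would come from training on prepared data.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def stroke_features(prev: List[Point], curr: List[Point],
                    region_w: float, region_h: float) -> Dict[str, float]:
    """A few overlapped-writing features for the current stroke (hypothetical subset)."""
    xs = [x for x, _ in curr]
    ys = [y for _, y in curr]
    prev_xs = [x for x, _ in prev]
    return {
        "bias": 1.0,
        # relative position between strokes: horizontal offset of the left edges
        "rel_x": (min(xs) - min(prev_xs)) / region_w,
        # horizontal location of the pen-down and pen-up points in the region
        "pen_down_x": curr[0][0] / region_w,
        "pen_up_x": curr[-1][0] / region_w,
        # stroke height and width as ratios of the writing region
        "height_ratio": (max(ys) - min(ys)) / region_h,
        "width_ratio": (max(xs) - min(xs)) / region_w,
    }


def cut_probability(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Binary maximum entropy probability that the current stroke is a cut point."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights; a real system would learn them from training data.
weights = {"bias": -1.0, "rel_x": -6.0, "pen_down_x": -2.0}

# The stroke jumps back to the top-left after a stroke at the bottom-right.
p_new = cut_probability(
    stroke_features([(70, 60), (90, 90)], [(10, 10), (40, 12)], 100, 100), weights)
# The stroke continues next to the previous one.
p_same = cut_probability(
    stroke_features([(10, 10), (40, 12)], [(25, 5), (25, 50)], 100, 100), weights)
print(p_new > p_same)
```

The resulting probability would then be compared with the preset probability to decide whether the current stroke is a cut point.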
CN201210205916.1A 2012-06-18 2012-06-18 Handwriting recognition method and system as well as handwriting recognition terminal Active CN102750552B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210205916.1A CN102750552B (en) 2012-06-18 2012-06-18 Handwriting recognition method and system as well as handwriting recognition terminal

Publications (2)

Publication Number Publication Date
CN102750552A CN102750552A (en) 2012-10-24
CN102750552B true CN102750552B (en) 2015-07-22

Family

ID=47030717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210205916.1A Active CN102750552B (en) 2012-06-18 2012-06-18 Handwriting recognition method and system as well as handwriting recognition terminal

Country Status (1)

Country Link
CN (1) CN102750552B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345365B (en) * 2013-07-12 2016-04-13 北京蒙恬科技有限公司 The display packing of continuous handwriting input and the hand input device of employing the method
CN103809694A (en) * 2014-02-21 2014-05-21 上海分维智能科技有限公司 Handwriting recognition child intelligent learning system based on intelligent terminal
US10976918B2 (en) * 2015-10-19 2021-04-13 Myscript System and method of guiding handwriting diagram input
CN106056114B (en) * 2016-05-24 2019-07-05 腾讯科技(深圳)有限公司 Contents of visiting cards recognition methods and device
CN107067005A (en) * 2017-04-10 2017-08-18 深圳爱拼信息科技有限公司 A kind of method and device of Sino-British mixing OCR Character segmentations
CN109858323A (en) * 2018-12-07 2019-06-07 广州光大教育软件科技股份有限公司 A kind of character hand-written recognition method and system
CN112699780A (en) * 2020-12-29 2021-04-23 上海臣星软件技术有限公司 Object identification method, device, equipment and storage medium
CN113064497A (en) * 2021-03-23 2021-07-02 上海臣星软件技术有限公司 Statement identification method, device, equipment and computer storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101901355A (en) * 2010-06-29 2010-12-01 北京捷通华声语音技术有限公司 Character recognition method and device based on maximum entropy
CN102063620A (en) * 2010-12-31 2011-05-18 北京捷通华声语音技术有限公司 Handwriting identification method, system and terminal
CN102073884A (en) * 2010-12-31 2011-05-25 北京捷通华声语音技术有限公司 Handwriting recognition method, system and handwriting recognition terminal
CN102243708A (en) * 2011-06-29 2011-11-16 北京捷通华声语音技术有限公司 Handwriting recognition method, handwriting recognition system and handwriting recognition terminal

Also Published As

Publication number Publication date
CN102750552A (en) 2012-10-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP01 Change in the name or title of a patent holder

Address after: 100193, No. two, building 10, Zhongguancun Software Park, 8 northeast Wang Xi Road, Beijing, Haidian District, 206-1

Patentee after: Beijing InfoQuick SinoVoice Speech Technology Corp.

Address before: 100193, No. two, building 10, Zhongguancun Software Park, 8 northeast Wang Xi Road, Beijing, Haidian District, 206-1

Patentee before: Jietong Huasheng Speech Technology Co., Ltd.

CP02 Change in the address of a patent holder

Address after: Building 2102, building 1, Haidian District, Beijing

Patentee after: BEIJING SINOVOICE TECHNOLOGY Co.,Ltd.

Address before: 100193 two, 206-1, Zhongguancun Software Park, 8 Northeast Northeast Road, Haidian District, Beijing, 206-1

Patentee before: BEIJING SINOVOICE TECHNOLOGY Co.,Ltd.

CP02 Change in the address of a patent holder