Disclosure of Invention
The invention provides a handwriting recognition method, a handwriting recognition system and a character recognition terminal, and aims to solve the problems of reduced recognition rate and degraded multi-character handwriting experience caused by frequent errors in conventional character recognition results.
In order to solve the above problems, the present invention discloses a handwriting recognition method, comprising: acquiring continuously input handwriting; extracting handwriting characteristics; inputting the handwriting characteristics into a maximum entropy model, which judges whether the current stroke is a cutting point; and if so, cutting the characters to obtain a final recognition result.
Preferably, the determining, by the maximum entropy model, whether the current stroke is a cutting point includes: the maximum entropy model computes, from the handwriting characteristics, the probability that the current stroke is a cutting point; and if the obtained probability is greater than a preset probability, the current stroke is a cutting point.
Preferably, the method further comprises a step of determining the preset probability, which comprises: cutting the character handwriting to obtain at least one segmentation path; performing single character recognition on each segmentation path to obtain, for each segmentation path, a candidate recognition result and a first probability value of that candidate recognition result; scoring each candidate recognition result with a language model to obtain, for each candidate recognition result, a second probability value expressing the association information between characters; obtaining a comprehensive probability value for each candidate recognition result from its first and second probability values; and selecting the maximum comprehensive probability value as the preset probability.
Preferably, the acquiring of the continuously input handwriting comprises: collecting character handwriting continuously input in overlapped (stacked) form, or character handwriting continuously input in a row or a column.
Preferably, the method further comprises establishing the maximum entropy model, which comprises: selecting the maximum entropy model features, preparing training data, and training the maximum entropy model.
Preferably, the selected maximum entropy model features include: for character handwriting continuously input in overlapped form, selecting at least one of the relative position among strokes, the position of a stroke in the writing area, the region of the stroke's pen-down point, the region of the stroke's pen-up point, the size proportion of the added stroke, the proportion of the stroke height to the writing-area height, or the proportion of the stroke width to the writing-area width as a feature of the maximum entropy model.
Preferably, the selected maximum entropy model features include: for character handwriting continuously input in a row, selecting at least one of the width of the preceding gap, the width of the following gap, and the aspect ratio of the current character as a feature of the maximum entropy model; for character handwriting continuously input in a column, selecting at least one of the width of the upper gap, the width of the lower gap, and the aspect ratio of the current character as a feature of the maximum entropy model.
The invention also discloses a handwriting recognition system, which comprises: an acquisition module, configured to acquire continuously input handwriting; a feature extraction module, configured to extract handwriting characteristics; a cutting module, configured to input the handwriting characteristics into the maximum entropy model, which judges whether the current stroke is a cutting point; and a recognition module, configured to cut the characters to obtain a final recognition result when the current stroke is a cutting point.
Preferably, the handwriting recognition system further comprises: the determining module is used for determining a preset probability; the determining module comprises:
a cutting submodule, configured to cut the character handwriting to obtain at least one segmentation path;
a single character recognition submodule, configured to perform single character recognition on each segmentation path and obtain, for each segmentation path, a candidate recognition result and a first probability value of that candidate recognition result;
a language model recognition submodule, configured to score each candidate recognition result with a language model and obtain, for each candidate recognition result, a second probability value expressing the association information between characters;
a comprehensive judgment submodule, configured to obtain a comprehensive probability value for each candidate recognition result from its first and second probability values;
and a selection submodule, configured to select the maximum comprehensive probability value as the preset probability.
The invention also discloses a handwriting recognition terminal which comprises the handwriting recognition system disclosed by the invention.
Compared with the prior art, the invention has the following advantages:
the character cutting method based on maximum entropy is a statistical prediction model; it judges the relationship between character strokes more accurately to determine whether a stroke is a cutting point, gives the probability behind that judgment, evaluates the cuts between characters more comprehensively, and improves the accuracy of the recognition result.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The invention provides a handwriting recognition method, a handwriting recognition system and a handwriting recognition terminal.
The following examples are given for the purpose of illustration.
Fig. 1 is a flowchart of a handwriting recognition method according to an embodiment of the present invention.
Step 11, collecting continuously input handwriting;
step 12, extracting handwriting characteristics;
the user can repeatedly and continuously input a plurality of characters in the same handwriting area, and the characters comprise Chinese characters, punctuation marks, English letters and other forms.
Character handwriting continuously input by the user is acquired, where character handwriting refers to information input in stroke form. Many kinds of equipment can acquire handwriting input, such as electromagnetic induction tablets, pressure-sensitive tablets, touch screens, touch pads and ultrasonic pens; during acquisition, each device records the coordinates written by the user, i.e. the stroke points, with its built-in sensing device. The pen-down position is usually recorded as the start of a stroke, the pen-up position as its end, and the series of handwriting points between them forms an input stroke.
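The pen-down/move/pen-up capture described above can be sketched as follows; the class and method names are illustrative, not part of the invention:

```python
from dataclasses import dataclass, field

@dataclass
class Stroke:
    # A stroke is the ordered list of (x, y) points sampled
    # between one pen-down and the following pen-up event.
    points: list = field(default_factory=list)

class StrokeCollector:
    """Minimal sketch of handwriting capture: pen-down starts a
    stroke, sampled points extend it, pen-up closes it."""
    def __init__(self):
        self.strokes = []
        self._current = None

    def pen_down(self, x, y):
        self._current = Stroke(points=[(x, y)])

    def move(self, x, y):
        if self._current is not None:
            self._current.points.append((x, y))

    def pen_up(self, x, y):
        if self._current is not None:
            self._current.points.append((x, y))
            self.strokes.append(self._current)
            self._current = None
```

In a typical usage pattern, the input device's event handlers call `pen_down`, `move` and `pen_up`; each closed stroke can then be passed on to feature extraction.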
Step 13, inputting the handwriting characteristics into a maximum entropy model, and judging whether the current stroke is a cutting point or not by the maximum entropy model;
in the handwriting recognition method of this embodiment, a plurality of character handwritings continuously input by the user are collected. In practical application, the collected handwriting points may come from characters continuously input in overlapped form, from characters continuously input in a row, or from characters continuously input in a column.
Before determining whether a stroke is a cutting point, the maximum entropy model needs to be established. Specifically, establishing the maximum entropy model may include: selecting the maximum entropy model features, preparing training data, and training the maximum entropy model.
The following specific examples are given in detail:
(1) selecting maximum entropy model features
Features related to character stroke positions are selected as features of the maximum entropy model. Different handwriting characteristics are selected for different input situations. In the case of continuous overlapped input, the selected handwriting characteristics may include at least one of: the relative position among strokes, the position of a stroke in the writing area, the region of the stroke's pen-down point, the region of the stroke's pen-up point, the size proportion of the added stroke, the proportion of the stroke height to the writing-area height, or the proportion of the stroke width to the writing-area width. The selected features include, but are not limited to, those listed above; the desired handwriting features may be chosen according to the actual application. In the case of continuous input in rows, the selected features may include at least one of: the width of the current character's preceding gap, the width of its following gap, and its aspect ratio. In the case of continuous input in columns, the selected features may include at least one of: the width of the current character's upper gap, the width of its lower gap, and its aspect ratio. The following takes overlapped-character input as an example.
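A few of the listed overlapped-input features can be sketched as below; the exact feature definitions, the thirds-based region granularity, and the names are illustrative assumptions, since the text does not pin them down:

```python
def stroke_bbox(points):
    """Bounding box (min_x, min_y, max_x, max_y) of a stroke."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def maxent_features(prev_points, cur_points, area_w, area_h):
    """Sketch of a few overlapped-input features for the
    maximum entropy model (illustrative definitions)."""
    _, _, px1, _ = stroke_bbox(prev_points)
    cx0, cy0, cx1, cy1 = stroke_bbox(cur_points)
    # relative position between strokes: horizontal gap
    # (negative values would indicate overlap)
    gap = cx0 - px1
    # region of the pen-down point within the writing area,
    # quantized here into a 3x3 grid
    down_x, down_y = cur_points[0]
    down_region = (int(3 * down_x / area_w), int(3 * down_y / area_h))
    # proportion of stroke height/width to writing-area height/width
    h_ratio = (cy1 - cy0) / area_h
    w_ratio = (cx1 - cx0) / area_w
    return {"gap": gap, "down_region": down_region,
            "h_ratio": h_ratio, "w_ratio": w_ratio}
```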
(2) Preparing training data
After selecting the features of the maximum entropy model, training data is prepared. This requires determining the character-stroke-position features used in the model, such as the relative position between strokes and the position of a stroke in the writing area, i.e. the observation x in the model below. Character stroke data is then prepared and labeled according to the determined features.
Consider a random process p(y|x) that, based on an observed vector x, outputs some y with a certain probability, where y belongs to a finite set Y. In the character segmentation judgment, Y = {1, 0}, representing a cutting point and a non-cutting point respectively. x represents the character-stroke-position features of the strokes to be judged, including the relative position between strokes, the position of a stroke in the writing area, and so on. To reconstruct the random process p(y|x), we sample its output, producing N training samples (x_1, y_1), (x_2, y_2), …, (x_N, y_N). Since these training examples are generated by the random process, we assume that the empirical probability of an event in the training examples equals the expected probability of that event under the known p(y|x).
(3) Training maximum entropy model
After the training data is prepared, the maximum entropy model is trained with it. The data labeled in the previous step with the relative positions among strokes and the positions of strokes in the writing area is fed into the maximum entropy model for training, in the following format: cut-label, feature 1, feature 2, …
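The labeled-sample format above might be serialized as in this small sketch; the delimiter and field ordering are illustrative, since the patent only gives the abstract layout:

```python
def sample_line(is_cut, features):
    """Format one labeled training sample in the
    'cut-label, feature 1, feature 2, ...' layout
    (serialization details are illustrative)."""
    label = "1" if is_cut else "0"
    return ", ".join([label] + [str(f) for f in features])
```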
An event can be represented by a feature function f_i(x, y): if the event occurs in example (x_j, y_j), then f_i(x_j, y_j) = 1, otherwise 0. For example, if x satisfies "the previous character's writing is complete" and y is a cutting point, then f_i(x, y) = 1; otherwise f_i(x, y) = 0. The empirical probability of this event in the training examples is expressed as:
p̃(f_i) = Σ_{x,y} p̃(x, y) f_i(x, y)    (1)
where p̃(x, y) is the empirical probability of the sample (x, y), i.e. its number of occurrences in the training character strokes divided by N.
If p(y|x) is known, the expected probability of the event f_i(x, y) is expressed as:
p(f_i) = Σ_{x,y} p̃(x) p(y|x) f_i(x, y)    (2)
where p̃(x) is the empirical probability of x in the training examples.
According to our assumption, p(f_i) = p̃(f_i), namely:
Σ_{x,y} p̃(x) p(y|x) f_i(x, y) = Σ_{x,y} p̃(x, y) f_i(x, y)    (3)
We call f_i(x, y) a feature function, or feature for short. Equation (3) is called the constraint equation for feature f_i(x, y), or simply a constraint. A constraint is an equation, for one feature, between the random process p(y|x) and the training samples; it restricts the distribution of p(y|x) so that the resulting model is statistically close to the training samples with respect to that feature.
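The empirical expectation of equation (1) and the model expectation of equation (2), whose equality is the constraint (3), can be sketched as follows; the function names and the binary label set are illustrative:

```python
def empirical_expectation(samples, f):
    """p~(f) = sum over (x,y) of p~(x,y) * f(x,y): the average of
    the feature function over the N training samples (Eq. (1))."""
    return sum(f(x, y) for x, y in samples) / len(samples)

def model_expectation(samples, p_y_given_x, f, labels=(0, 1)):
    """p(f) = sum over (x,y) of p~(x) * p(y|x) * f(x,y), with
    p~(x) estimated from the training samples (Eq. (2))."""
    return sum(p_y_given_x(y, x) * f(x, y)
               for x, _ in samples for y in labels) / len(samples)
```

The constraint (3) then says that, for every feature, these two quantities must be equal for the reconstructed model.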
Assuming that n features have been defined, all random processes that satisfy these n features form a set:
C ≡ { p(y|x) | p(f_i) = p̃(f_i) for i ∈ {1, 2, …, n} }    (4)
Typically |C| > 1. We choose the random process with the largest entropy as the reconstructed model. The entropy here is a conditional entropy, expressed as:
H(p) ≡ -Σ_{x,y} p̃(x) p(y|x) log p(y|x)    (5)
The model we finally reconstruct is:
p* = argmax_{p ∈ C} H(p)    (6)
This model is referred to as the maximum entropy model. The principle of maximum entropy ensures that the model generalizes well. Its expression form and parameter calculation are as follows.
Solving equation (6) yields a maximum entropy model having the form:
p(y|x) = (1 / Z(x)) exp( Σ_i λ_i f_i(x, y) )    (7)
In the above formula, λ_i is the weight of feature f_i(x, y) and may be trained from the training character strokes using the IIS or L-BFGS iterative algorithms; Z(x) is a normalization coefficient.
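Equation (7) can be evaluated directly once the weights λ_i are known; the sketch below uses illustrative weights rather than values produced by IIS or L-BFGS training:

```python
import math

def maxent_prob(x, weights, feature_fns, labels=(0, 1)):
    """p(y|x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x), Eq. (7).
    weights play the role of the lambda_i; here they are
    illustrative, not trained values."""
    scores = {y: math.exp(sum(w * f(x, y)
                              for w, f in zip(weights, feature_fns)))
              for y in labels}
    z = sum(scores.values())  # normalization coefficient Z(x)
    return {y: s / z for y, s in scores.items()}
```

With all weights zero the model is uninformative and assigns each label probability 1/|Y|; training moves the weights so the constraints (3) hold.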
After the maximum entropy model is established, the acquired handwriting characteristics are input into it for judgment. The specific judgment process may be as described in step 14.
Step 14, if the maximum entropy model judges that the current stroke is a cutting point, the characters are cut to obtain a final recognition result. If the maximum entropy model judges that the current stroke is not a cutting point, the characters are not cut, and the handwriting characteristics of the continuously input characters may continue to be acquired.
The judgment process may specifically include the maximum entropy model giving the probability that the position between character strokes is a cutting point. If the obtained cutting probability is greater than a fixed value, the position between the strokes is very likely a cutting point, and the characters may be cut.
The judgment of whether the obtained cutting-point probability is high enough can also be added into the path search, further improving the recognition rate. In a specific implementation, a preset probability may be set; if the obtained cutting-point probability is greater than the preset probability, the position between the strokes is very likely a cutting point, and the characters may be cut. The preset probability may be obtained as follows:
cutting the character handwriting to obtain at least one cutting path;
for example, the "world" is divided into 4 division paths, i.e., "two | person | next", "two | person next", and "two person next", where each division path corresponds to a division probability value.
Performing single character recognition on each segmentation path, and obtaining a candidate recognition result and a first probability value of the candidate recognition result for each segmentation path;
the identification process may adopt various existing identification methods, and the embodiment of the present invention is not limited herein.
In each segmentation path, each single character delimited by the candidate cutting points is recognized. The recognition of each single character can produce a plurality of candidate recognition results (single character candidate recognition results), and each candidate recognition result has a single character recognition probability, called the first probability value.
For example, for the handwritten input "the world", the 4 corresponding segmentation paths "two | person | under", "two | person under", "two person | under" and "two person under" are recognized respectively. For the segmentation path "two person | under", single character recognition is performed on "two person" and "under" separately; the candidate recognition results for "two person" may be "day", "fu" and so on, and each candidate recognition result has a single character recognition probability, e.g. the first probability values for "day" and "fu" are A and B respectively. Similarly, single character recognition of "under" yields one or more candidate recognition results and a first probability value for each. The single character recognition of the other segmentation paths proceeds in the same way and is not described further.
Scoring each candidate recognition result by using a language model to obtain a second probability value which is used for expressing the association information between the characters and aims at each candidate recognition result;
the language model may represent association information between characters, which may be represented by probabilities. The language model is a model for calculating the probability of a phrase or a sentence, and for a sentence, if there are multiple segmentation paths, there are multiple candidate recognition results, where the candidate recognition result refers to the candidate recognition result of combining the single character candidate recognition results obtained in step 131 into a character, a word, a phrase or a sentence according to the language model, for example, "two" and "person" are combined into a character "day", which is a candidate recognition result, and "text" and "piece" are combined into a phrase "file" which is also a candidate recognition result. The language model will calculate for each candidate recognition result how likely it is that the sentence is correct. For example, one candidate recognition result of the user input stroke point is a "piece" and the other candidate recognition result is a "file", and the probability of the "file" is greater than that of the "piece" as known by the language model; if the recognition probabilities of the file and the file are not very different, the language model determines the result as the more common file.
As for the implementation of the language model, a simple approach considers only the probability of two adjacent words, e.g. the probability that "piece" follows "text", regardless of earlier words. In practice this is not always sufficient, so a more complex implementation may also take the previous word (or several) into account, or use a word-based language model, though the computation and memory requirements then increase considerably.
Similarly, among the candidate recognition results "two person under", "day under" ("the world"), "fu under" and so on, "day under" obtains the highest probability from the language model because it is a common word, while "two person under" is not a common word and thus gets a low language model probability.
A comprehensive probability value for each candidate recognition result is obtained from its first and second probability values. A simple way to calculate it is a weighted addition of the first and second probability values of the candidate recognition result. Of course, other more complex calculation methods may also be adopted; the embodiment of the present invention is not limited herein.
The maximum comprehensive probability value is then selected as the preset probability. This maximum comprehensive probability value represents the cost of the segmentation path, i.e. the probability of a correct segmentation based on the input order and the relative positions of the strokes.
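The weighted combination of the first (single character recognition) and second (language model) probability values, followed by selection of the maximum, can be sketched as follows; the weight alpha and the data layout are illustrative, since the patent leaves the combination method open:

```python
def best_path_score(candidates, alpha=0.5):
    """For each candidate recognition result, combine the single
    character recognition probability (first value) and the
    language model score (second value) by weighted addition,
    then take the maximum comprehensive probability value as the
    preset probability. alpha is an illustrative weight."""
    combined = [alpha * first + (1 - alpha) * second
                for first, second in candidates]
    return max(combined)
```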
The above example performs segmentation recognition of continuous characters with the maximum-entropy-model-based statistical method. In a specific application, other statistical methods may also be used for segmentation recognition of continuous characters, such as the Support Vector Machine (SVM) method.
In summary, through the processing of the above processes, the handwriting recognition method can extract handwriting features continuously written by the user and input the handwriting features into the maximum entropy model to judge whether the handwriting features are cut points, can judge the relationship between character strokes more accurately, and improves the accuracy of the recognition result. Meanwhile, the user can input a plurality of characters at a time, so that the input speed is greatly improved.
In practical application, the handwriting recognition method provided by the embodiment of the invention can be applied to products with handwriting input requirements, such as desktop operating systems on PCs, notebook computers, tablet computers and writing pads. It can also be applied to embedded operating systems: intelligent mobile terminals such as palmtop computers, mobile phones, PADs, PDAs, and small-screen or landscape-screen mobile phones; GPS/GIS terminals such as personal information terminals and vehicle-mounted information terminals; intelligent learning terminals such as e-book readers, electronic dictionaries and intelligent toys; and data terminals such as tax-control input terminals, second-generation ID card reading terminals, large-scale database query terminals, hotel management system input terminals, intelligent alarms, digital television interactive remote controllers, karaoke song-ordering devices and information appliance controllers. The invention places low requirements on the screen size of the handwriting area, so it is particularly suitable for overlapped-character input and recognition on small-screen devices, and offers considerable advantages for small-screen devices such as current mobile phones.
Preferably, in a multitasking system, the cutting and comprehensive recognition processes can run concurrently with the writing process (i.e. the handwriting collecting process), further increasing the recognition speed. A multitasking system here means a system capable of multithreading. While the user is writing, handwriting acquisition occupies little or no CPU, so the CPU is mostly idle; in a multitasking system this idle CPU time can be used to recognize while writing, speeding up recognition.
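The recognize-while-writing idea can be sketched with a consumer thread; the queue-based structure, the sentinel convention and the names are illustrative:

```python
import queue
import threading

def recognize_while_writing(stroke_queue, recognize):
    """Sketch: a worker thread consumes strokes as they are
    captured, so recognition overlaps with the writing process
    instead of waiting for the user to finish."""
    results = []

    def worker():
        while True:
            stroke = stroke_queue.get()
            if stroke is None:      # sentinel: writing finished
                break
            results.append(recognize(stroke))

    t = threading.Thread(target=worker)
    t.start()
    return t, results
```

The capture code would push each completed stroke onto the queue and push the sentinel when input ends; after joining the thread, the results list holds the recognition output.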
Based on the above, the embodiment of the invention also provides a corresponding system embodiment.
Fig. 2 is a structural diagram of a handwriting recognition system according to an embodiment of the present invention.
The acquisition module 21 is used for acquiring continuously input handwriting;
a feature extraction module 22, configured to extract features of the handwriting;
the cutting module 23 is configured to input the handwriting characteristics into a maximum entropy model, and the maximum entropy model determines whether the current stroke is a cutting point;
and the recognition module 24 is configured to cut the character to obtain a final recognition result when the current stroke is a cut point.
The cutting module 23 may use a statistical method, in particular the maximum-entropy-model-based judgment described in the above method embodiments, to determine cutting points more accurately. To further improve the accuracy of the judgment, the cutting probability obtained from the maximum entropy model can be added into the path search, further improving the recognition rate. The system may thus further comprise a determining module 25, which may comprise:
a cutting submodule 251, configured to cut the character handwriting to obtain at least one segmentation path;
a single character recognition submodule 252, configured to perform single character recognition on each segmentation path, obtain a candidate recognition result and a first probability value of that candidate recognition result for each segmentation path, and pass each candidate recognition result to the language model recognition submodule 253;
a language model recognition submodule 253, configured to score each candidate recognition result with a language model and obtain, for each candidate recognition result, a second probability value expressing the association information between characters;
a comprehensive judgment submodule 254, configured to obtain a comprehensive probability value for each candidate recognition result from its first and second probability values;
and a selection submodule 255, configured to select the maximum comprehensive probability value as the preset probability.
After the preset probability value is determined, the cutting probability obtained by the maximum entropy model can be compared with it; if the cutting probability is greater than or equal to the preset probability value, it can be judged that a cutting point exists between the characters. This second probability comparison improves the accuracy of judging cutting points and enhances the character recognition capability.
Based on the handwriting recognition system based on the maximum entropy model, the embodiment of the invention also provides a handwriting recognition terminal, and the handwriting recognition terminal can comprise the handwriting recognition system, so that the recognition of continuous character input is supported. The specific structure of the handwriting recognition system can be shown in fig. 2, and will not be described in detail here.
The handwriting recognition terminal may be a desktop-operating-system terminal such as a PC, notebook computer, tablet computer or handwriting pad; an intelligent mobile terminal such as a palmtop computer, mobile phone, PAD, PDA, small-screen mobile phone or landscape-screen mobile phone; or any of various terminals with a multitasking system.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The handwriting recognition method, the handwriting recognition system and the handwriting recognition terminal provided by the invention are introduced in detail, a specific example is applied in the text to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.