CN101452357A - Hand-written character input method and system - Google Patents


Info

Publication number
CN101452357A
Authority
CN
China
Prior art keywords
character
hand
sample
written
center
Prior art date
Legal status
Granted
Application number
CNA2008102198651A
Other languages
Chinese (zh)
Other versions
CN101452357B (en)
Inventor
高精鍊
黄新春
陈炳辉
蔡沐宇
胡安进
陆华兴
Current Assignee
Guangdong Guobi Technology Co Ltd
Original Assignee
Guangdong Guobi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Guobi Technology Co Ltd
Priority to CN2008102198651A
Publication of CN101452357A
Application granted
Publication of CN101452357B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention discloses a handwritten character input method comprising: A, performing feature selection on prestored character classes and computing a coarse-classification template; B, performing feature transformation on the prestored character classes and computing a fine-classification template; C, acquiring the discrete coordinate sequence of the trajectory points of an input character and using a smooth continuous function to adjust the size and shape of the handwritten character and the coordinates of its center of gravity; D, extracting features to obtain a multidimensional feature vector of the handwritten character; E, selecting a subset of the feature values of the handwritten character, matching the character against the coarse-classification template, and selecting several candidate character classes with the greatest similarity; and F, performing feature transformation on the handwritten character, matching it against the sample centers of the candidate classes in the fine-classification template, and determining the most similar character classes. The invention also discloses a handwritten character input system. The invention recognizes handwritten input faster and with higher accuracy.

Description

Handwritten character input method and system
Technical field
The present invention relates to the field of handwritten character recognition, and in particular to a handwritten character input method and system.
Background art
Handwriting recognition is currently used in a wide range of communication and information-processing terminals. Such products usually have a touch screen for writing: the user writes on it with a pen or finger, the terminal recognizes the handwriting, displays the corresponding character, and carries out any subsequent operations. Because handwriting recognition makes input faster and more flexible, it is widely used.
Most current handwriting recognition pipelines consist of signal acquisition, preprocessing, feature extraction, and feature matching. In existing methods, preprocessing uses a piecewise linear function to move the center of gravity of the character strokes to the center of the bounding box; the transformed character shape becomes unnatural, which impairs subsequent recognition.
The prior art also requires heavy computation and long processing times, so recognition remains slow and its accuracy is limited; further improvement is needed.
Summary of the invention
In view of these deficiencies of the prior art, the technical problem addressed by the embodiments of the invention is to provide a handwritten character input method and system with faster recognition and higher recognition accuracy.
The object of the invention is achieved by the following technical solution. A handwritten character input method comprises the following steps:
A, selecting a subset of the feature values from the feature vectors of the samples of the prestored character classes, computing the sample center of each character class, and obtaining a coarse-classification template made up of the sample centers of all character classes;
B, performing feature transformation on the feature vectors of the samples of the prestored character classes, recomputing the sample center of each character class, and obtaining a fine-classification template made up of the sample centers of all character classes;
C, receiving the handwritten character input signal and acquiring the discrete coordinate sequence of the trajectory points of the input character; using a smooth continuous function to transform the acquired discrete coordinate sequence into another discrete coordinate sequence, thereby adjusting the size and shape of the handwritten character and the coordinates of its center of gravity, to obtain the normalized coordinate sequence of the character;
D, performing feature extraction on the normalized coordinate sequence of the input character to obtain the multidimensional feature vector of the handwritten character;
E, selecting a subset of the feature values from the multidimensional feature vector of the handwritten character, matching the character against the coarse-classification template, and choosing from the prestored character classes several candidate classes with the greatest similarity;
F, performing feature transformation on the multidimensional feature vector of the handwritten character, matching the transformed vector against the sample centers of the candidate classes in the fine-classification template, and determining the most similar character classes for the user to choose from.
The invention also provides a handwritten character input system, comprising:
A signal acquisition module, configured to receive the handwritten character input signal and acquire the discrete coordinate sequence of the trajectory points of the handwritten character;
A normalization module, configured to use a smooth continuous function to transform the acquired discrete coordinate sequence into another discrete coordinate sequence, adjusting the size and shape of the handwritten character and the coordinates of its center of gravity, to obtain the normalized coordinate sequence of the character;
A feature extraction module, configured to perform feature extraction on the normalized coordinate sequence to obtain the multidimensional feature vector of the handwritten character;
A storage module, configured to store the feature transformation matrix and the coarse- and fine-classification templates of all character classes, where the coarse-classification template consists of the sample centers of all character classes computed after feature selection and the fine-classification template consists of the sample centers of all character classes computed after feature transformation;
A coarse classification module, configured to select a subset of the feature values from the multidimensional feature vector of the handwritten character, match the character against the coarse-classification template in the storage module, and choose from the prestored character classes several candidate classes with the greatest similarity;
A fine classification module, configured to perform feature transformation on the multidimensional feature vector of the handwritten character, match the transformed vector against the sample centers of the candidate classes in the fine-classification template in the storage module, and determine the most similar character classes for the user to choose from.
Compared with the prior art, the invention preprocesses the handwritten character with a smooth continuous function, so that the size and shape of the preprocessed character are more natural and more standardized. Subsequent feature extraction is therefore faster and more precise, giving faster recognition of handwritten input and higher recognition accuracy.
In a preferred implementation of the handwritten character input method of the invention, step C further comprises:
Determining whether the input of a handwritten character has finished: when no handwriting input signal has been received for longer than a preset threshold, acquisition of the discrete coordinate sequence of this character ends.
The another kind of preferred implementation of the method for a kind of hand-written character input of the present invention is that described step C also comprises step:
Whether the tracing point of the hand-written character that inspection collects has only one, gathers again if then delete this tracing point;
Coordinate distance in the tracing point of the hand-written character that detection collects between the consecutive point if this distance less than preset threshold, is then deleted wherein a bit, makes to keep certain distance between the consecutive point.
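The acquisition-side checks of step C can be sketched in Python as below. This is a minimal illustration, not the patented implementation: the function names, the 0.8 s end-of-character timeout, and the 2-unit minimum spacing are hypothetical values chosen for the example, since the text only speaks of "preset thresholds". When two adjacent points are too close, the sketch keeps the earlier one and drops the later one.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical thresholds; the text only says "preset threshold".
END_TIMEOUT_S = 0.8   # seconds without pen input that end a character
MIN_POINT_DIST = 2.0  # minimum spacing between kept adjacent points


def character_finished(last_signal_time: float, now: float,
                       timeout: float = END_TIMEOUT_S) -> bool:
    """True once no input signal has arrived for longer than `timeout`."""
    return now - last_signal_time > timeout


def clean_trajectory(points: List[Point],
                     min_dist: float = MIN_POINT_DIST) -> List[Point]:
    """Apply the two checks of step C to one acquired trajectory."""
    if len(points) <= 1:
        # A lone point is deleted; the caller should acquire the input again.
        return []
    kept = [points[0]]
    for p in points[1:]:
        q = kept[-1]
        # Drop p if it is closer than min_dist to the previously kept point.
        if math.hypot(p[0] - q[0], p[1] - q[1]) >= min_dist:
            kept.append(p)
    return kept
```

For example, `clean_trajectory([(0, 0), (1, 0), (3, 0), (3.5, 0), (6, 0)])` keeps `[(0, 0), (3, 0), (6, 0)]`.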
In another preferred implementation of the method, step C specifically comprises the following steps:
Transforming the abscissa and ordinate values of all trajectory points into the range 0 to 100;
Computing the center-of-gravity coordinates of the trajectory points in the horizontal and vertical directions;
Dividing all trajectory point coordinates and the center-of-gravity coordinates by 100 so that they lie between 0 and 1, and transforming the acquired discrete coordinate sequence into another discrete coordinate sequence with a smooth continuous function that maps both center-of-gravity coordinates to 0.5;
Multiplying all trajectory point coordinates by 64 to obtain the normalized coordinate sequence of the input character.
In another preferred implementation of the method, step D specifically comprises the following steps:
Based on the normalized coordinate sequence of the handwritten character, decomposing each vector segment formed by adjacent trajectory points onto 8 reference directions to obtain segment length values on each reference direction;
Processing these segment length values to obtain a multidimensional feature vector consisting of large-scale and small-scale feature values.
Because this feature extraction step decomposes the trajectory vectors onto 8 reference directions to form the feature vector, it requires little computation, so it is fast and the extracted features are more accurate.
In another preferred implementation of the method, step A specifically comprises the following steps:
According to the Fisher criterion, choosing from the samples of each prestored character class the several feature values that maximize the Fisher ratio;
Computing, from the sample feature vectors made up of the selected feature values, the feature vector of the sample center of each character class, to obtain a coarse-classification template made up of the sample centers of all character classes.
Step E specifically comprises the following steps:
According to the Fisher criterion, selecting a subset of the feature values from the multidimensional feature vector of the handwritten character, so that the character has a feature vector of selected feature values with the same dimension as the class samples;
Matching the handwritten character against the coarse-classification template and choosing from the prestored character classes several candidate classes with the greatest similarity.
Because the features for coarse classification are selected from the character classes with the Fisher criterion, the selected features give better recognition accuracy with little computation.
In another preferred implementation of the method, step B specifically comprises the following steps:
Transforming the samples of all character classes with the feature transformation matrix obtained from the Fisher criterion, reducing the dimension of their multidimensional feature vectors;
Recomputing the sample centers of all character classes after the feature transformation;
Iteratively adjusting the feature transformation matrix and the sample centers of all character classes, recomputing them to obtain a fine-classification template made up of the sample centers of all character classes.
Step F specifically comprises the following steps:
Transforming the handwritten character with the iteratively adjusted feature transformation matrix to obtain its low-dimensional feature vector;
Matching the low-dimensional feature vector of the handwritten character against the sample centers of the candidate classes in the fine-classification template, and determining from the candidates the character classes with the greatest similarity for the user to choose from.
On top of coarse classification, the handwritten character and the character samples of the candidate classes are transformed with the feature transformation matrix, the transformation matrix and the sample centers are iteratively adjusted, and the candidate classes are then finely classified. This makes recognition of handwritten input fast and accurate.
The feature vectors of the samples of the prestored character classes in steps A and B are multidimensional feature vectors obtained in advance through steps C and D.
In a preferred implementation of the handwritten character input system of the invention, the signal acquisition module specifically comprises:
A collecting unit, configured to acquire the discrete coordinate sequence of the trajectory points of the handwritten character;
A judging unit, configured to determine whether the input of a handwritten character has finished: when no handwriting input signal has been received for longer than a preset threshold, acquisition of the discrete coordinate sequence of this character ends;
A detecting unit, configured to check whether the acquired trajectory consists of only one point and, if so, delete that point and acquire again; and to measure the coordinate distance between adjacent trajectory points and, if the distance is below a preset threshold, delete one of the two points so that adjacent points stay a minimum distance apart.
In another preferred implementation of the system, the system further comprises a display module configured to display the most similar character classes output by the fine classification module for the user to choose from.
Description of drawings
Fig. 1 is a flowchart of a handwritten character input method according to an embodiment of the invention.
Fig. 2 is a detailed flowchart of the handwritten character input method according to an embodiment of the invention.
Fig. 3 is a schematic structural diagram of a handwritten character input system according to an embodiment of the invention.
Fig. 4 is a detailed schematic structural diagram of the handwritten character input system according to an embodiment of the invention.
Fig. 5a shows a character before the size and shape adjustment in step S02 of Fig. 1.
Fig. 5b shows the character after the size and shape adjustment in step S02 of Fig. 1.
Fig. 6 illustrates placing the adjusted character into a grid in step S02 of Fig. 1.
Fig. 7 shows the 8 reference directions described in step S03 of Fig. 1.
Fig. 8 illustrates the decomposition of a vector segment onto 2 reference directions as described in step S03 of Fig. 1.
Embodiment
To make the invention easier to understand, it is described in further detail below with reference to the drawings; the embodiments shown in the drawings do not limit the invention in any way.
The invention recognizes a handwritten character by passing it through coordinate sequence acquisition, preprocessing, feature vector extraction, coarse classification, and fine classification.
Fig. 1 shows the flowchart of a handwritten character input method according to an embodiment of the invention; the method comprises the following steps:
Step S01: select a subset of the feature values from the feature vectors of the samples of the prepared character classes, compute the sample center of each character class, and obtain the coarse-classification template made up of the sample centers of all character classes; the template is stored in the memory of the input terminal, such as a mobile phone. Specifically, the samples of the prestored character classes first go through feature extraction to obtain their multidimensional feature vectors; then, according to the Fisher criterion, the several feature values that maximize the Fisher ratio are chosen from the multidimensional feature vectors of the samples of each class, and the sample center of each class is computed, giving the coarse-classification template made up of the sample centers of all character classes.
The purpose of this step is to obtain the coarse-classification template from the feature vectors of the samples of the prepared character classes. To speed up coarse classification, only a subset of the features is used to compute the matching distance; feature selection and template design are carried out on a training sample set. The training sample set contains handwriting samples of every character class, and after feature extraction each sample is represented by 640 feature values (a 640-dimensional feature vector $x = [x_1, \dots, x_{640}]^T$). Suppose there are $N$ samples in total from $C$ classes, with class $i$ having $N_i$ samples. Features are selected with the Fisher criterion (described in detail in pattern recognition textbooks). The idea of the Fisher criterion function is to construct an evaluation function that, at its optimum, makes the distance between classes as large as possible and the distance between samples within each class as small as possible.
Represent the $j$-th sample of class $i$ as a feature vector $x_j^i$ (made up of the candidate features). The sample center (mean) of each class is
$$\mu_i = \frac{1}{N_i} \sum_{j=1}^{N_i} x_j^i, \quad i = 1, \dots, C \qquad (1)$$
and the overall center is $\mu_0 = \frac{1}{N} \sum_{i=1}^{C} N_i \mu_i$.
The within-class and between-class covariance matrices are computed as:
$$S_w = \frac{1}{N} \sum_{i=1}^{C} \sum_{j=1}^{N_i} (x_j^i - \mu_i)(x_j^i - \mu_i)^T \qquad (2)$$
$$S_b = \frac{1}{N} \sum_{i=1}^{C} N_i (\mu_i - \mu_0)(\mu_i - \mu_0)^T \qquad (3)$$
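Formulas (1) to (3) translate directly into NumPy. The sketch below (the function name and return layout are illustrative, not from the patent) computes the class means, the overall center, $S_w$ and $S_b$ for a labeled sample matrix. A useful sanity check, used in the test, is the standard identity that $S_w + S_b$ equals the total covariance of the data with the same $1/N$ normalization.

```python
import numpy as np


def scatter_matrices(X: np.ndarray, y: np.ndarray):
    """Class means (1), overall center, S_w (2) and S_b (3).

    X: (N, D) sample feature vectors; y: (N,) class labels.
    """
    classes = np.unique(y)
    N, D = X.shape
    means = np.zeros((len(classes), D))
    counts = np.zeros(len(classes))
    Sw = np.zeros((D, D))
    for k, c in enumerate(classes):
        Xc = X[y == c]
        counts[k] = len(Xc)
        means[k] = Xc.mean(axis=0)                   # mu_i, formula (1)
        diff = Xc - means[k]
        Sw += diff.T @ diff                          # sum of (x - mu_i)(x - mu_i)^T
    mu0 = (counts[:, None] * means).sum(axis=0) / N  # overall center mu_0
    dm = means - mu0
    Sb = (counts[:, None] * dm).T @ dm               # sum of N_i (mu_i - mu_0)(mu_i - mu_0)^T
    return classes, means, mu0, Sw / N, Sb / N
```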
The goal of feature selection is to choose the subset of features for which the trace of the matrix $S_w^{-1} S_b$ (the Fisher ratio) is maximal. Here $S_w$ and $S_b$ are computed over the candidate features, which change during the feature selection process.
Finding the feature combination with the maximal Fisher ratio is a combinatorial optimization problem, which can be solved approximately by sequential forward selection:
First compute the Fisher ratio of each individual feature and select the feature with the largest ratio. Then, for each remaining feature in turn, form a feature vector with the already-selected features and compute its Fisher ratio; add the feature giving the largest ratio to the selected set. Repeat until the number of selected features reaches the specified number (set to 100 or fewer).
The detailed procedure is as follows. First, each of the 640 features in turn is taken as the candidate, its Fisher ratio is computed, and the feature with the largest ratio is chosen as the first selected feature. Next, each of the remaining 639 features is evaluated together with the first selected feature (two candidate features at this point), and the 2-feature combination with the largest Fisher ratio is chosen. Then each of the remaining 638 features is evaluated together with the selected 2-feature combination (three candidate features), and the 3-feature combination with the largest Fisher ratio is chosen. This repeats until the number of selected features reaches the specified number, after which the feature set is fixed.
After feature selection, the coarse-classification template of each class is the center (mean) of that class's samples, computed with formula (1).
Step S02: perform feature transformation on the feature vectors of the samples of the prestored character classes, recompute the sample center of each character class, and obtain the fine-classification template made up of the sample centers of all character classes.
To obtain higher recognition accuracy, fine classification uses feature transformation rather than feature selection: the original $D = 640$-dimensional feature vector is linearly transformed into a low-dimensional vector of dimension $d$ ($d < D$), where $d$ is generally set between 100 and 150. The transformation is $y = Wx$, where $W$ is a $d \times D$ transformation matrix. The transformation matrix is chosen to maximize the Fisher ratio $\mathrm{tr}[(W S_w W^T)^{-1} W S_b W^T]$; the solution is that the rows of $W$ are the $d$ eigenvectors of the matrix $S_w^{-1} S_b$ with the largest eigenvalues (a standard result that need not be repeated here). After dimensionality reduction, the template of each class is the center of its samples (formula (1)).
The transformation matrix and class templates obtained this way do not yet give very high recognition accuracy. They are therefore adjusted iteratively so that the classification error on the training sample set (each sample being assigned to the nearest class) decreases gradually. First, the weights of all training samples are set to 1; all training samples are classified with the transformation matrix and class-center templates obtained from the Fisher criterion, and the weight of each misclassified sample is increased by 1. Denote the weight of sample $x_j^i$ (the $j$-th sample of class $i$) by $v_j^i$.
The class centers and the within-class and between-class covariance matrices are then recomputed as follows:
$$\mu_i = \frac{1}{\sum_{j=1}^{N_i} v_j^i} \sum_{j=1}^{N_i} v_j^i x_j^i \qquad (4)$$
$$\mu_0 = \frac{1}{\sum_{i=1}^{C} \sum_{j=1}^{N_i} v_j^i} \sum_{i=1}^{C} \sum_{j=1}^{N_i} v_j^i x_j^i \qquad (5)$$
$$S_w = \frac{1}{\sum_{i=1}^{C} \sum_{j=1}^{N_i} v_j^i} \sum_{i=1}^{C} \sum_{j=1}^{N_i} v_j^i (x_j^i - \mu_i)(x_j^i - \mu_i)^T \qquad (6)$$
$$S_b = \frac{1}{\sum_{i=1}^{C} V_i} \sum_{i=1}^{C} V_i (\mu_i - \mu_0)(\mu_i - \mu_0)^T, \quad \text{where } V_i = \sum_{j=1}^{N_i} v_j^i \qquad (7)$$
On this basis, the transformation matrix is recomputed by maximizing $\mathrm{tr}[(W S_w W^T)^{-1} W S_b W^T]$, the class centers after feature transformation are recomputed, the training samples are classified again, and the weight of each misclassified sample is increased by 1. This repeats until the classification error on the training samples no longer decreases.
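The iterative adjustment can be sketched as weighted Fisher discriminant analysis with error-driven reweighting. This is an illustrative reconstruction under assumptions the text does not fix: nearest-center assignment uses squared Euclidean distance, the eigenvectors of $S_w^{-1} S_b$ are obtained by Cholesky whitening of $S_w$ (with a small ridge) followed by a symmetric eigendecomposition, and the loop returns the last model before the training error stopped decreasing. The function names and `max_iter` are hypothetical.

```python
import numpy as np


def weighted_fisher(X, y, v, d, ridge=1e-6):
    """Weighted S_w, S_b per (4)-(7); return W (d x D) whose rows are the
    eigenvectors of S_w^-1 S_b with the d largest eigenvalues."""
    D = X.shape[1]
    total = v.sum()                                    # equals sum_i V_i
    mu0 = (v[:, None] * X).sum(axis=0) / total         # (5)
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for c in np.unique(y):
        m = y == c
        vm = v[m]
        Vi = vm.sum()
        mu = (vm[:, None] * X[m]).sum(axis=0) / Vi     # (4)
        diff = X[m] - mu
        Sw += (vm[:, None] * diff).T @ diff            # numerator of (6)
        Sb += Vi * np.outer(mu - mu0, mu - mu0)        # numerator of (7)
    Sw /= total
    Sb /= total
    # Whiten with S_w = L L^T, then solve a symmetric eigenproblem.
    L = np.linalg.cholesky(Sw + ridge * np.eye(D))
    Linv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(Linv @ Sb @ Linv.T)      # ascending eigenvalues
    U = U[:, ::-1][:, :d]                              # d largest
    return (Linv.T @ U).T                              # rows: eigenvectors of S_w^-1 S_b


def train_fine_template(X, y, d, max_iter=20):
    """Reweight misclassified samples until training error stops decreasing."""
    classes = np.unique(y)
    v = np.ones(len(X))                                # all weights start at 1
    best, best_err = None, None
    for _ in range(max_iter):
        W = weighted_fisher(X, y, v, d)
        Z = X @ W.T                                    # transformed samples
        centers = np.stack([
            (v[y == c][:, None] * Z[y == c]).sum(axis=0) / v[y == c].sum()
            for c in classes
        ])
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        wrong = classes[dist.argmin(axis=1)] != y
        err = int(wrong.sum())
        if best_err is not None and err >= best_err:
            break                                      # error no longer decreases
        best, best_err = (W, centers), err
        if err == 0:
            break
        v = v + wrong                                  # misclassified weight += 1
    return best[0], best[1], classes
```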
Step S03: receive the handwritten character input signal and acquire the discrete coordinate sequence of the trajectory points of the handwritten character. Specifically, while the user writes on the touch screen with a pen, the $(x, y)$ coordinates of the pen tip are recorded whenever the pen is down. The complete trajectory of an input character is represented by a single sequence of $(x, y)$ points: $\{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$.
A smooth continuous function is then used to transform the acquired discrete coordinate sequence into another discrete coordinate sequence, adjusting the size and shape of the handwritten character and the coordinates of its center of gravity, to obtain the normalized coordinate sequence of the character. Normalizing the character trajectory serves two purposes: size standardization and shape correction. As shown in Figs. 5a and 5b, the character in Fig. 5a becomes the shape in Fig. 5b after normalization: not only has its bounding box taken the prescribed size (after normalization, every character's bounding box has the same size), but its shape has also changed and become more standard, which makes it easier to recognize.
Normalization is performed by two coordinate transformation functions, $x' = x^a$ and $y' = y^b$: replacing the coordinates $(x, y)$ of every point in the character trajectory with $(x', y')$ yields the normalized character trajectory.
The parameters $a$ and $b$ are estimated as follows:
First, find the minimum values of $x$ and $y$ in the coordinate sequence and subtract them from the $x$ and $y$ coordinates of all points, so that the minimum of both $x$ and $y$ becomes 0. Then multiply all $x$ and $y$ values by $100/u$, where $u$ is the larger of the maximum $x$ and maximum $y$ values, so that $x$ and $y$ lie between 0 and 100.
Second, compute the projections of the stroke trajectory onto the horizontal and vertical directions. Place the stroke trajectory in a $100 \times 100$ grid, as shown in Fig. 6 (the figure shows a $10 \times 10$ grid for illustration). Summing the stroke lengths within each column of the grid gives the horizontal projection $f_x(i)$, $i = 1, 2, \dots, 100$; likewise, summing the stroke lengths within each row gives the vertical projection $f_y(i)$, $i = 1, 2, \dots, 100$. The horizontal center of gravity is computed from $f_x(i)$:
$$x_c = \frac{\sum_{i=1}^{100} i \, f_x(i)}{\sum_{i=1}^{100} f_x(i)} \qquad (8)$$
The vertical center of gravity $y_c$ is computed from $f_y(i)$ in the same way.
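Third, divide the coordinates of all points and $(x_c, y_c)$ by 100 so that they lie between 0 and 1. The functions $x' = x^a$ and $y' = y^b$ must map $x_c$ and $y_c$ to 0.5, i.e. $x_c^a = 0.5$, giving $a = \frac{\log 0.5}{\log x_c}$, and likewise $y_c^b = 0.5$, giving $b = \frac{\log 0.5}{\log y_c}$. For example, if $x_c = 0.25$ then $a = 0.5$, and $x' = \sqrt{x}$ maps 0.25 to 0.5 while keeping 0 and 1 fixed. After this transformation, the center of gravity of the character trajectory moves to $(0.5, 0.5)$ while the boundary stays unchanged.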
Fourth, multiply $(x', y')$ by a given factor so that the character's bounding box takes the prescribed size; this factor is set to 64. Finally, all coordinates in the normalized character trajectory lie between 0 and 64.
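The four normalization steps can be sketched as below. Assumptions beyond the text: the stroke-length projections are approximated by splitting each segment into short pieces (`step` grid units) and crediting each piece to the column and row of its midpoint, and the normalized center of gravity is clamped away from 0 and 1 so that the logarithm is defined. Formula (8) is applied with the 1-based column index exactly as written. Function names are illustrative.

```python
import math
import numpy as np


def exponent_for(c: float) -> float:
    """Exponent p with c ** p == 0.5 (third step: a = log 0.5 / log x_c)."""
    c = min(max(c, 1e-6), 1 - 1e-6)      # keep log(c) finite and nonzero
    return math.log(0.5) / math.log(c)


def normalize_trajectory(points, grid=100, out_size=64, step=0.5):
    """Power-function normalization x' = x^a, y' = y^b (four steps)."""
    P = np.asarray(points, dtype=float)
    # Step 1: shift minima to 0, scale by 100/u with u the largest coordinate.
    P = P - P.min(axis=0)
    u = P.max()
    if u == 0:
        raise ValueError("trajectory has no extent")
    P *= grid / u
    # Step 2: stroke-length projections onto columns (fx) and rows (fy).
    fx = np.zeros(grid)
    fy = np.zeros(grid)
    for (x0, y0), (x1, y1) in zip(P[:-1], P[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        n = max(1, math.ceil(length / step))
        for t in (np.arange(n) + 0.5) / n:           # midpoints of n pieces
            col = min(int(x0 + t * (x1 - x0)), grid - 1)
            row = min(int(y0 + t * (y1 - y0)), grid - 1)
            fx[col] += length / n
            fy[row] += length / n
    i = np.arange(1, grid + 1)
    xc = (i * fx).sum() / fx.sum()                   # formula (8)
    yc = (i * fy).sum() / fy.sum()
    # Step 3: scale to [0, 1] and move the center of gravity to (0.5, 0.5).
    Q = P / grid
    Q[:, 0] **= exponent_for(xc / grid)
    Q[:, 1] **= exponent_for(yc / grid)
    # Step 4: scale to the 64 x 64 frame.
    return Q * out_size
```

For instance, `exponent_for(0.25)` returns 0.5. Because step 1 removes translation and scale, shifted or uniformly scaled copies of a trajectory normalize to the same output.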
Step S04: perform feature extraction on the normalized coordinate sequence of the handwritten character to obtain its multidimensional feature vector. Basic idea: as shown in Fig. 7, each stroke segment (the vector segment joining two adjacent points) is decomposed onto the 8 reference directions D1 to D8; the segment length values in each direction are recorded for each cell of a $64 \times 64$ grid, and direction feature values are then computed at two scales.
First, decompose the stroke segments onto the 8 reference directions. Every pair of adjacent points in the coordinate sequence forms a line segment, which is a directed vector $f_i$. If the direction of $f_i$ lies between the two reference directions D2 and D3, $f_i$ is decomposed into components along D2 and D3 (as shown in Fig. 8), and the component length along each reference direction is added to that direction's segment length value in the grid cell containing the segment. This yields $64 \times 64$ segment length values for each of the 8 directions.
Second, compute the large-scale features. The $64 \times 64$ grid of each direction is divided evenly into $4 \times 4$ blocks, and the segment length values of each direction are summed within each block, giving $8 \times 4 \times 4 = 128$ feature values.
Third, compute the small-scale features. The $64 \times 64$ grid of each direction is divided evenly into $8 \times 8$ blocks, and the segment length values of each direction are summed within each block, giving $8 \times 8 \times 8 = 512$ feature values.
The total number of large-scale and small-scale features is $128 + 512 = 640$.
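A sketch of the two-scale direction features. Assumptions: D1 to D8 are taken at 0°, 45°, ..., 315° counter-clockwise from the positive x axis (the actual labeling in Fig. 7 may differ), each segment is decomposed onto its two neighbouring directions by the parallelogram rule, and the whole segment is credited to the grid cell containing its midpoint. The output concatenates the 128 large-scale and 512 small-scale values; names are illustrative.

```python
import math
import numpy as np

GRID = 64
DIR_ANGLE = math.pi / 4   # 8 reference directions, 45 degrees apart


def direction_grid(points):
    """(8, 64, 64) array of segment lengths per reference direction and cell."""
    g = np.zeros((8, GRID, GRID))
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue
        theta = math.atan2(dy, dx) % (2 * math.pi)
        k = int(theta // DIR_ANGLE) % 8            # lower neighbouring direction
        alpha = k * DIR_ANGLE
        # Parallelogram decomposition onto directions k and k+1.
        a = length * math.sin(alpha + DIR_ANGLE - theta) / math.sin(DIR_ANGLE)
        b = length * math.sin(theta - alpha) / math.sin(DIR_ANGLE)
        col = max(0, min(int((x0 + x1) / 2), GRID - 1))  # midpoint cell
        row = max(0, min(int((y0 + y1) / 2), GRID - 1))
        g[k, row, col] += a
        g[(k + 1) % 8, row, col] += b
    return g


def extract_features(points):
    """640-dimensional feature vector: 128 large-scale + 512 small-scale."""
    g = direction_grid(points)
    large = g.reshape(8, 4, 16, 4, 16).sum(axis=(2, 4))   # 8 x 4 x 4 = 128
    small = g.reshape(8, 8, 8, 8, 8).sum(axis=(2, 4))     # 8 x 8 x 8 = 512
    return np.concatenate([large.ravel(), small.ravel()])
```

A segment lying exactly on a reference direction contributes its full length to that direction and nothing to its neighbour; every segment length is counted once at each of the two scales.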
Step S05: select a subset of the feature values from the multidimensional feature vector of the handwritten character, match the character against the coarse-classification template, and choose from the prestored character classes several candidate classes with the greatest similarity. Specifically, as in step S01, the feature values chosen by the Fisher criterion (those maximizing the Fisher ratio) are taken from the multidimensional feature vector of the handwritten character; the number of selected feature values is the same as in step S01.
The template matching distance is computed as follows. Let the multidimensional feature vector of the handwritten character be $x = [x_1, \dots, x_n]^T$ and the sample center of a class in the coarse-classification template be $y = [y_1, \dots, y_n]^T$; the matching distance is then
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i| \qquad (9)$$
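Coarse matching with the city-block distance (9) can be sketched as follows. Here `selected` holds the feature indices chosen in step S01 and `coarse_templates` has one row per class; both names, and returning the distances alongside the labels, are illustrative choices rather than details from the text. `np.argpartition` finds the $M$ nearest classes without fully sorting all of them.

```python
import numpy as np


def coarse_candidates(x_full, selected, coarse_templates, labels, M=100):
    """Top-M classes by city-block distance (9) on the selected features.

    x_full: (640,) feature vector of the input character
    selected: indices of the features chosen in step S01
    coarse_templates: (num_classes, len(selected)) class sample centers
    """
    x = np.asarray(x_full)[selected]
    dist = np.abs(coarse_templates - x).sum(axis=1)   # formula (9) for every class
    M = min(M, len(labels))
    idx = np.argpartition(dist, M - 1)[:M]            # M smallest, unordered
    idx = idx[np.argsort(dist[idx])]                  # nearest first
    return [labels[i] for i in idx], dist[idx]
```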
Step S06, multidimensional eigenvector to described handwriting characters carries out eigentransformation, handwriting characters after the eigentransformation is mated with the center of a sample of the candidate character classes of choosing from described disaggregated classification template, therefrom determine the most similar character type, select for the user.The purpose of this step is to carry out disaggregated classification, to an input character, in rough sort, find out M candidate's classification after, disaggregated classification adopts than the more feature of rough sort, recomputate the distance of input character, get nearest classification as final recognition result to M candidate's class template.
Fine classification outputs several classes (typically 10) with the smallest matching distances as the final candidates. These candidates can be displayed directly for the user to choose from, or selected automatically using contextual language rules.
In the coarse classification of step S05, the feature vector of the input character (the character to be recognized) is compared (matched) against the template of every character class stored in the template database, and the M classes (e.g., M = 100) with the smallest distances (i.e., the greatest similarity) are kept as candidates. The fine classification of step S06 then finds, among these candidates, the class with the smallest distance as the final recognition result.
Coarse and fine classification compare against different templates (with different features): the coarse-classification template is simple (fewer features) and fast to compute, whereas the fine-classification template is more complex (more features) and slower to compute.
The purpose of coarse classification is to find the M candidate classes quickly, so that fine classification need not compute distances to all classes (only to the M candidates), which improves overall recognition speed.
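The coarse-to-fine cascade described above can be sketched as follows. The fine-stage distance is not specified in this part of the document, so squared Euclidean distance is assumed there; all names and the argument layout are hypothetical. The speed-up comes from applying the expensive comparison only to the M coarse candidates.

```python
import numpy as np

def recognize(x_full, sel_idx, coarse_centers, W, fine_centers, m=100, n_out=10):
    """Two-stage coarse-to-fine matching (sketch).

    x_full:         full feature vector of the input character (e.g. 640-D)
    sel_idx:        indices of the Fisher-selected features for the coarse stage
    coarse_centers: (num_classes, len(sel_idx)) coarse template
    W:              (len(x_full), k) feature transformation matrix
    fine_centers:   (num_classes, k) fine template in the transformed space
    Returns the class indices of the best n_out candidates and their distances.
    """
    # Coarse stage: cheap L1 distance (Eq. 9) on few features, against every class
    d_coarse = np.abs(coarse_centers - x_full[sel_idx]).sum(axis=1)
    cand = np.argsort(d_coarse)[:m]
    # Fine stage: transform once, then compare only with the m candidates
    z = x_full @ W
    d_fine = ((fine_centers[cand] - z) ** 2).sum(axis=1)  # assumed metric
    order = np.argsort(d_fine)[:n_out]
    return cand[order], d_fine[order]
```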
In summary, Fig. 2 shows a detailed flowchart of the handwritten character input method according to an embodiment of the invention.
Step S02 specifically comprises the following steps:
Step S021: transform the samples of all character classes using the feature transformation matrix obtained according to the Fisher criterion, reducing the dimension of their multidimensional feature vectors;
Step S022: recompute the sample centers of all character classes after the feature transformation;
Step S023: iteratively adjust the feature transformation matrix and the sample centers of all character classes, recomputing both, to obtain the fine-classification template consisting of the sample centers of all character classes.
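Steps S021 and S022 can be sketched with a standard Fisher linear discriminant (LDA), which is assumed here to be what "the feature transformation matrix obtained according to the Fisher criterion" means. The iterative adjustment of step S023 is not specified in this excerpt and is not implemented; the function names and the regularization constant are hypothetical.

```python
import numpy as np

def fisher_lda(X: np.ndarray, labels: np.ndarray, out_dim: int, reg: float = 1e-3) -> np.ndarray:
    """Return a (d, out_dim) projection maximizing between-class over
    within-class scatter (classical Fisher/LDA criterion)."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    Sw += reg * np.eye(d)  # hypothetical regularizer keeps Sw invertible
    # Solve Sb w = lambda Sw w by whitening with Sw^(-1/2)
    evals, evecs = np.linalg.eigh(Sw)
    Sw_inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    lam, V = np.linalg.eigh(Sw_inv_sqrt @ Sb @ Sw_inv_sqrt)
    top = np.argsort(lam)[::-1][:out_dim]
    return Sw_inv_sqrt @ V[:, top]

def class_centers(Z: np.ndarray, labels: np.ndarray):
    """Step S022: mean of the transformed samples of each class."""
    classes = np.unique(labels)
    return classes, np.stack([Z[labels == c].mean(axis=0) for c in classes])
```

With 640-D samples and `out_dim=120`, this would yield 120 transformed features, the dimension used in the worked example; the document does not say whether its matrix is computed exactly this way.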
Step S03 specifically comprises the following steps:
Step S031: receive the handwritten character input signal and collect the discrete coordinate sequence of the input character's trajectory points;
Step S032: determine whether input of a character has finished. When no handwritten input signal has been received for longer than a preset threshold, collection of this character's discrete coordinate sequence ends; that is, when the pen has been lifted for longer than a threshold (e.g., 0.5 seconds), the character is considered complete. The complete handwriting trajectory of an input character is represented by a sequence of (x, y) points, {(x1, y1), (x2, y2), ..., (xn, yn)}, in which the start of each stroke (pen-down) is marked by the special coordinate (1, 0).
Step S033: check whether the collected trajectory of the handwritten character contains only a single point; if so, delete that point and collect again;
Step S034: measure the coordinate distance between adjacent points in the collected trajectory; if this distance is below a preset threshold, i.e., the two adjacent points coincide or are very close, delete one of them so that adjacent points remain a certain distance apart;
Step S035: using a smooth continuous function, transform the collected discrete coordinate sequence of trajectory points into another discrete coordinate sequence, adjusting the size and shape of the handwritten input character and the coordinates of its center of gravity, to obtain the normalized coordinate sequence of the character.
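A sketch of step S035, following the procedure of claim 4 and the power functions $x' = x^a$, $y' = y^b$ that this document proposes. Requiring the center of gravity g to map to 0.5 gives $a = \ln 0.5 / \ln g$. Two details are assumptions not stated in the text: each axis is min-max scaled to [0, 100], and the center of gravity is the plain mean of the points. The function name is hypothetical.

```python
import math
from typing import List, Tuple

def normalize(points: List[Tuple[float, float]], size: int = 64) -> List[Tuple[float, float]]:
    """Power-function normalization sketch: center of gravity -> 0.5, then x size."""
    def axis(vals: List[float]) -> List[float]:
        lo, hi = min(vals), max(vals)
        if hi == lo:                         # degenerate axis: place it at the center
            return [0.5 * size] * len(vals)
        scaled = [100.0 * (v - lo) / (hi - lo) for v in vals]  # range 0..100
        g = sum(scaled) / len(scaled) / 100.0                  # center of gravity, in (0, 1)
        u = [s / 100.0 for s in scaled]                        # range 0..1
        a = math.log(0.5) / math.log(g)                        # chosen so that g ** a == 0.5
        return [size * (ui ** a) for ui in u]                  # smooth monotone map, then x64
    xs = axis([p[0] for p in points])
    ys = axis([p[1] for p in points])
    return list(zip(xs, ys))
```

Because $u \mapsto u^a$ with $a > 0$ is smooth and monotone on [0, 1], the endpoints stay at 0 and 64 and no piecewise-linear kinks are introduced.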
Step S04 specifically comprises the following steps:
Step S041: using the normalized coordinate sequence of the handwritten character, decompose each vector line segment formed by adjacent trajectory points into 8 reference directions (shown in Figs. 7 and 8), obtaining the vector length value in each reference direction;
Step S042: process the resulting vector length values to obtain a multidimensional feature vector consisting of large-scale and small-scale feature values.
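The exact decomposition rule of Figs. 7 and 8 is not reproduced in this excerpt. The sketch below assumes a common scheme: each segment vector is split between its two neighbouring reference directions (multiples of 45°) by the parallelogram rule, and the two components are credited to the grid cell containing the segment midpoint. The function name is hypothetical, and one stroke should be passed at a time so that the pen-down marker is not treated as a point.

```python
import math
import numpy as np

def direction_planes(points, size=64):
    """Accumulate segment lengths into 8 direction planes of size x size (sketch)."""
    planes = np.zeros((8, size, size))
    step = math.pi / 4
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue
        ang = math.atan2(dy, dx) % (2 * math.pi)
        k_raw = int(ang // step)
        phi = ang - k_raw * step            # angle past direction k, in [0, 45 deg)
        k = k_raw % 8
        length = math.hypot(dx, dy)
        # v = a * e_k + b * e_(k+1), from the law of sines
        a = length * math.sin(step - phi) / math.sin(step)
        b = length * math.sin(phi) / math.sin(step)
        cx = min(max(int((x0 + x1) / 2), 0), size - 1)
        cy = min(max(int((y0 + y1) / 2), 0), size - 1)
        planes[k, cy, cx] += a
        planes[(k + 1) % 8, cy, cx] += b
    return planes
```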
Step S05 specifically comprises the following steps:
Step S051: according to the Fisher criterion, select a subset of feature values from the multidimensional feature vector of the handwritten input character, so that the input character has a feature vector made up of the selected feature values, with the same dimension as the character-class samples;
Step S052: match the handwritten input character against each entry of the coarse-classification template and choose from the prestored character classes the several candidate classes with the greatest similarity.
Step S06 specifically comprises the following steps:
Step S061: transform the handwritten input character with the iteratively adjusted feature transformation matrix to obtain its low-dimensional feature vector;
Step S062: match the low-dimensional feature vector of the input character against the sample centers of the candidate character classes taken from the fine-classification template, and determine from the candidates the character classes with the greatest similarity for the user to choose from.
Fig. 3 shows a schematic structural diagram of the handwritten character input system according to an embodiment of the invention. The system comprises:
Signal acquisition module 1, used to receive the handwritten character input signal and collect the discrete coordinate sequence of the handwritten character's trajectory points;
Normalization module 2, used to transform the collected discrete coordinate sequence into another discrete coordinate sequence by means of a smooth continuous function, adjusting the size and shape of the handwritten character and the coordinates of its center of gravity, to obtain the normalized coordinate sequence of the handwritten character;
Feature extraction module 3, used to decompose each vector line segment formed by adjacent trajectory points in the normalized coordinate sequence into eight reference directions, obtaining the multidimensional feature vector of the handwritten character;
Storage module 4, used to store the feature transformation matrix and the coarse- and fine-classification templates of all character classes; the coarse-classification template consists of the sample centers computed for all character classes after feature selection, and the fine-classification template consists of the sample centers computed for all character classes after feature transformation;
Coarse classification module 5, used to select a subset of feature values from the multidimensional feature vector of the handwritten input character, match the input character against each entry of the coarse-classification template in storage module 4, and choose from the prestored character classes the several candidate classes with the greatest similarity;
Fine classification module 6, used to apply the feature transformation to the multidimensional feature vector of the handwritten input character, match the transformed vector against the sample centers of the candidate classes taken from the fine-classification template in storage module 4, and determine the most similar character classes for the user to choose from.
Fig. 4 shows a detailed schematic structural diagram of the handwritten character input system according to an embodiment of the invention. Signal acquisition module 1 of Fig. 3 specifically comprises:
Collection unit 101, used to collect the discrete coordinate sequence of the handwritten character's trajectory points;
Judging unit 102, used to determine whether input of a handwritten character has finished: when no handwritten input signal has been received for longer than a preset threshold, collection of this character's discrete coordinate sequence ends;
Detection unit 103, used to check whether the collected trajectory contains only a single point and, if so, delete that point and collect again; and to measure the coordinate distance between adjacent trajectory points and, if it is below a preset threshold, delete one of the two points so that adjacent points remain a certain distance apart.
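The judging and detection units can be sketched as two small functions. This is a minimal sketch, assuming timestamped samples `(t, x, y)` in seconds and a single stroke per trajectory (the pen-down marker is not handled); the names are hypothetical, and the 0.5 s default is the example threshold from step S032. Comparing each point with the last *kept* point, rather than its raw neighbour, is a design choice that guarantees the retained points are spaced at least `min_dist` apart.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]
PEN_UP_TIMEOUT = 0.5  # seconds; example threshold from step S032

def split_characters(events: List[Tuple[float, float, float]],
                     timeout: float = PEN_UP_TIMEOUT) -> List[List[Point]]:
    """Judging-unit sketch: group timestamped samples (t, x, y) into characters.
    No samples arrive while the pen is up, so a gap longer than `timeout`
    between consecutive samples ends the current character."""
    chars: List[List[Point]] = []
    current: List[Point] = []
    last_t: Optional[float] = None
    for t, x, y in events:
        if last_t is not None and t - last_t > timeout and current:
            chars.append(current)
            current = []
        current.append((x, y))
        last_t = t
    if current:
        chars.append(current)
    return chars

def clean_trajectory(points: List[Point], min_dist: float) -> Optional[List[Point]]:
    """Detection-unit sketch: None for a single-point trajectory (discard and
    collect again); otherwise drop points closer than `min_dist` to the
    previously kept point."""
    if len(points) <= 1:
        return None
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) >= min_dist:
            kept.append(p)
    return kept
```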
The system further comprises:
Display module 7, used to display the most similar character classes output by fine classification module 6 for the user to choose from.
Compared with existing methods, the techniques used here for normalization, feature extraction, coarse classification, and fine classification differ as follows:
Normalization: the coordinate transformation functions $x' = x^a$ and $y' = y^b$ are newly proposed. Their advantage is that they are smooth continuous functions, so the transformed character shape looks more natural, while the center of gravity of the character trajectory is still moved to the center (0.5) of the bounding box. Earlier methods also mapped the character's center of gravity to the box center, but they used piecewise linear functions, so the transformed character shape was unnatural, which also hurt subsequent recognition.
Feature extraction: stroke trajectory line segments are decomposed directly into 8 directions. Existing methods first render the trajectory as an image and then decompose the directions of the pixels in that image, which requires more computation, and the generated image is distorted. Our method avoids the extra computation of generating an image, and the resulting direction features are more accurate.
Coarse classification: existing methods generally either select a subset of features by hand (e.g., using the large-scale features directly) or apply a feature transformation. The features we select with the Fisher criterion give better recognition accuracy than hand-selected features, and require less computation than a feature transformation (since no linear transformation is needed).
Fine classification: iteratively adjusting the feature transformation matrix and the class-center templates noticeably improves recognition accuracy. Existing methods use the transformation matrix obtained directly from the Fisher criterion and then adjust the class-center templates on the transformed features with the Learning Vector Quantization (LVQ) algorithm. By adjusting the transformation matrix and the class-center templates together, our method achieves higher recognition accuracy.
This method is applicable to the recognition of Chinese characters, English letters, digits, and symbols.
1. Input character trajectory:
Figure A200810219865D00231
2. Trajectory after normalization:
Figure A200810219865D00232
3. The 10 candidate classes selected by coarse classification and their matching distances (computed on the 60 selected features; candidates are shown by the English glosses of the Chinese characters):
Instigate (597), Sound of sighing (621), What (643), Pyridine (676), Call out (689), Furan (698), Smell (715), Cry (732), Larynx (764), Mark (771)
4. Fine classification re-ranks the 10 candidate classes; their distances (computed on the 120 transformed features) are:
Sound of sighing (1079), Pyridine (1121), Call out (1157), Instigate (1186), What (1233), Cry (1298), Furan (1374), Larynx (1419), Smell (1462), Mark (1503)
5. The final recognition result is "Sound of sighing".
The above describes a preferred embodiment of the present invention and does not limit the scope of its rights. It should be understood that those skilled in the art can make certain improvements and modifications without departing from the principle of the invention, and these improvements and modifications are also regarded as falling within the protection scope of the invention.

Claims (13)

1. A handwritten character input method, comprising the following steps:
A. selecting a subset of feature values from the feature vectors of the samples of the prestored character classes and calculating the sample center of each character class, to obtain a coarse-classification template consisting of the sample centers of all character classes;
B. applying a feature transformation to the feature vectors of the samples of the prestored character classes and recalculating the sample center of each character class, to obtain a fine-classification template consisting of the sample centers of all character classes;
C. receiving the handwritten character input signal and collecting the discrete coordinate sequence of the input character's trajectory points; using a smooth continuous function, transforming the collected discrete coordinate sequence into another discrete coordinate sequence so as to adjust the size and shape of the handwritten input character and the coordinates of its center of gravity, to obtain the normalized coordinate sequence of the character;
D. performing feature extraction on the normalized coordinate sequence of the input character to obtain the multidimensional feature vector of the handwritten character;
E. selecting a subset of feature values from the multidimensional feature vector of the handwritten input character, matching the input character against each entry of the coarse-classification template, and choosing from the prestored character classes the several candidate character classes with the greatest similarity;
F. applying the feature transformation to the multidimensional feature vector of the handwritten input character, matching the transformed input character against the sample centers of the candidate character classes taken from the fine-classification template, and determining the most similar character classes for the user to choose from.
2. The handwritten character input method according to claim 1, characterized in that step C further comprises the step of:
determining whether input of a handwritten character has finished, and ending collection of the character's discrete coordinate sequence when no handwritten input signal has been received for longer than a preset threshold.
3. The handwritten character input method according to claim 1 or 2, characterized in that step C further comprises the steps of:
checking whether the collected trajectory of the handwritten character contains only a single point and, if so, deleting that point and collecting again;
measuring the coordinate distance between adjacent points in the collected trajectory and, if this distance is below a preset threshold, deleting one of the two points so that adjacent points remain a certain distance apart.
4. The handwritten character input method according to claim 1, characterized in that step C specifically comprises the steps of:
transforming the abscissa and ordinate values of all trajectory points into the range 0 to 100;
calculating the center-of-gravity coordinates of the abscissas and of the ordinates of all trajectory points, respectively;
dividing all trajectory-point coordinates and the center-of-gravity coordinates by 100 so that they lie between 0 and 1, and, using a smooth continuous function that maps both the abscissa and the ordinate of the center of gravity to 0.5, transforming the collected discrete coordinate sequence into another discrete coordinate sequence;
then multiplying all trajectory-point coordinates by 64 to obtain the normalized coordinate sequence of the input character.
5. The handwritten character input method according to claim 1 or 4, characterized in that step D specifically comprises the steps of:
decomposing, according to the normalized coordinate sequence of the input character, each vector line segment formed by adjacent trajectory points into 8 reference directions, obtaining the vector length value in each reference direction;
processing the resulting vector length values to obtain a multidimensional feature vector consisting of large-scale and small-scale feature values.
6. The handwritten character input method according to claim 1, characterized in that step A specifically comprises the steps of:
selecting, according to the Fisher criterion, the several feature values with the largest Fisher ratios from the samples of each prestored character class;
calculating, from the sample feature vectors consisting of the selected feature values, the feature vector of the sample center of each character class, to obtain a coarse-classification template consisting of the sample centers of all character classes.
7. The handwritten character input method according to claim 6, characterized in that step E specifically comprises the steps of:
selecting, according to the Fisher criterion, a subset of feature values from the multidimensional feature vector of the handwritten input character, so that the input character has a feature vector made up of the selected feature values, with the same dimension as the character-class samples;
matching the handwritten input character against each entry of the coarse-classification template and choosing from the prestored character classes the several candidate character classes with the greatest similarity.
8. The handwritten character input method according to claim 1 or 6, characterized in that step B specifically comprises the steps of:
transforming the samples of all character classes using the feature transformation matrix obtained according to the Fisher criterion, reducing the dimension of their multidimensional feature vectors;
recalculating the sample centers of all character classes after the feature transformation;
iteratively adjusting the feature transformation matrix and the sample centers of all character classes, recalculating both, to obtain a fine-classification template consisting of the sample centers of all character classes.
9. The handwritten character input method according to claim 8, characterized in that step F specifically comprises the steps of:
transforming the handwritten input character with the iteratively adjusted feature transformation matrix to obtain its low-dimensional feature vector;
matching the low-dimensional feature vector of the input character against the sample centers of the candidate character classes taken from the fine-classification template, and determining from the candidates the character classes with the greatest similarity for the user to choose from.
10. The handwritten character input method according to claim 1, characterized in that the feature vectors of the samples of the prestored character classes in steps A and B are multidimensional feature vectors obtained in advance through steps C and D.
11. A handwritten character input system, characterized by comprising:
a signal acquisition module, used to receive the handwritten character input signal and collect the discrete coordinate sequence of the handwritten character's trajectory points;
a normalization module, used to transform the collected discrete coordinate sequence into another discrete coordinate sequence by means of a smooth continuous function, adjusting the size and shape of the handwritten character and the coordinates of its center of gravity, to obtain the normalized coordinate sequence of the character;
a feature extraction module, used to perform feature extraction on the normalized coordinate sequence to obtain the multidimensional feature vector of the handwritten character;
a storage module, used to store the feature transformation matrix and the coarse- and fine-classification templates of all character classes, the coarse-classification template consisting of the sample centers computed for all character classes after feature selection, and the fine-classification template consisting of the sample centers computed for all character classes after feature transformation;
a coarse classification module, used to select a subset of feature values from the multidimensional feature vector of the handwritten input character, match the input character against each entry of the coarse-classification template in the storage module, and choose from the prestored character classes the several candidate character classes with the greatest similarity;
a fine classification module, used to apply the feature transformation to the multidimensional feature vector of the handwritten input character, match the transformed input character against the sample centers of the candidate character classes taken from the fine-classification template in the storage module, and determine the most similar character classes for the user to choose from.
12. The system according to claim 11, characterized in that the signal acquisition module specifically comprises:
a collection unit, used to collect the discrete coordinate sequence of the handwritten character's trajectory points;
a judging unit, used to determine whether input of a handwritten character has finished, and to end collection of the character's discrete coordinate sequence when no handwritten input signal has been received for longer than a preset threshold;
a detection unit, used to check whether the collected trajectory contains only a single point and, if so, delete that point and collect again; and to measure the coordinate distance between adjacent trajectory points and, if this distance is below a preset threshold, delete one of the two points so that adjacent points remain a certain distance apart.
13. The system according to claim 11 or 12, characterized in that the system further comprises:
a display module, used to display the most similar character classes output by the fine classification module for the user to choose from.
CN2008102198651A 2008-12-11 2008-12-11 Hand-written character input method and system Expired - Fee Related CN101452357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008102198651A CN101452357B (en) 2008-12-11 2008-12-11 Hand-written character input method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008102198651A CN101452357B (en) 2008-12-11 2008-12-11 Hand-written character input method and system

Publications (2)

Publication Number Publication Date
CN101452357A true CN101452357A (en) 2009-06-10
CN101452357B CN101452357B (en) 2012-06-20

Family

ID=40734618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008102198651A Expired - Fee Related CN101452357B (en) 2008-12-11 2008-12-11 Hand-written character input method and system

Country Status (1)

Country Link
CN (1) CN101452357B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010078698A1 (en) * 2008-12-30 2010-07-15 广东国笔科技股份有限公司 Handwritten character recognition method and system
CN101620504B (en) * 2009-08-12 2011-09-28 北京红旗中文贰仟软件技术有限公司 Method and device for identifying vectoring track in office suite
CN102761495A (en) * 2011-04-29 2012-10-31 周佳 Method, terminal and system for instant messaging based on original handwritings
WO2012146128A1 (en) * 2011-04-29 2012-11-01 北京壹人壹本信息科技有限公司 Sending and receiving method and terminal of instant communication base on manuscript original handwriting
TWI455031B (en) * 2011-04-29 2014-10-01
CN102761495B (en) * 2011-04-29 2015-12-16 北京壹人壹本信息科技有限公司 A kind of instant communicating method based on original handwriting, communication terminal and system
CN102662465A (en) * 2012-03-26 2012-09-12 北京国铁华晨通信信息技术有限公司 Method and system for inputting visual character based on dynamic track
CN102663454A (en) * 2012-04-18 2012-09-12 安徽科大讯飞信息科技股份有限公司 Method and device for evaluating character writing standard degree
CN102663454B (en) * 2012-04-18 2014-08-20 安徽科大讯飞信息科技股份有限公司 Method and device for evaluating character writing standard degree
CN108133213A (en) * 2016-12-01 2018-06-08 西安米特电子科技有限公司 A kind of embedded digital recognition methods imaged towards shell of gas meter formula
CN108279841A (en) * 2018-01-05 2018-07-13 西安电子科技大学 Complete hand-written online Tibetan language input system based on syllable word

Also Published As

Publication number Publication date
CN101452357B (en) 2012-06-20


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
PP01 Preservation of patent right

Effective date of registration: 20150921

Granted publication date: 20120620

RINS Preservation of patent right or utility model and its discharge
PD01 Discharge of preservation of patent

Date of cancellation: 20160921

Granted publication date: 20120620

PP01 Preservation of patent right

Effective date of registration: 20160921

Granted publication date: 20120620

RINS Preservation of patent right or utility model and its discharge
PD01 Discharge of preservation of patent

Date of cancellation: 20170921

Granted publication date: 20120620

PP01 Preservation of patent right

Effective date of registration: 20170921

Granted publication date: 20120620

PD01 Discharge of preservation of patent

Date of cancellation: 20180321

Granted publication date: 20120620

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120620

Termination date: 20151211

DD01 Delivery of document by public notice

Addressee: Gao Jingjian

Document name: Notification of Approving Refund

DD01 Delivery of document by public notice