Summary of the invention
In view of the deficiencies of the prior art, the technical problem to be solved by the embodiments of the invention is to provide a method and a system for handwritten character input that achieve faster recognition and higher recognition accuracy.
The object of the invention is achieved by the following technical solution: a method for handwritten character input, comprising the following steps:
A. selecting a subset of feature values from the feature vectors of the samples of pre-stored character classes, calculating the sample center of each character class, and obtaining a coarse classification template composed of the sample centers of all character classes;
B. performing a feature transformation on the feature vectors of the samples of the pre-stored character classes, recalculating the sample center of each character class, and obtaining a fine classification template composed of the sample centers of all character classes;
C. receiving the handwritten character input signal and collecting the discrete coordinate sequence of the trajectory points of the input character; using a smooth continuous function to transform the collected discrete coordinate sequence into another discrete coordinate sequence so as to adjust the size, the shape and the center-of-gravity coordinates of the input character, thereby obtaining the normalized coordinate sequence of the character;
D. performing feature extraction on the normalized coordinate sequence of the input character to obtain the multidimensional feature vector of the handwritten character;
E. selecting a subset of feature values from the multidimensional feature vector of the input character, matching the input character against the coarse classification template, and choosing from the pre-stored character classes a number of candidate character classes with the greatest similarity;
F. performing the feature transformation on the multidimensional feature vector of the input character, matching the transformed input character against the sample centers of the candidate character classes chosen from the fine classification template, and determining the most similar character classes for the user to select.
The invention also provides a system for handwritten character input, the system comprising:
a signal acquisition module, configured to receive the handwritten character input signal and collect the discrete coordinate sequence of the trajectory points of the handwritten character;
a normalization module, configured to use a smooth continuous function to transform the collected discrete coordinate sequence of the trajectory points into another discrete coordinate sequence so as to adjust the size, the shape and the center-of-gravity coordinates of the handwritten character, thereby obtaining the normalized coordinate sequence of the character;
a feature extraction module, configured to perform feature extraction on the normalized coordinate sequence to obtain the multidimensional feature vector of the handwritten character;
a storage module, configured to store the feature transformation matrix and the coarse and fine classification templates of all character classes, where the coarse classification template is composed of the sample centers of all character classes calculated after feature selection, and the fine classification template is composed of the sample centers of all character classes calculated after feature transformation;
a coarse classification module, configured to select a subset of feature values from the multidimensional feature vector of the input character, match the input character against the coarse classification template in the storage module, and choose from the pre-stored character classes a number of candidate character classes with the greatest similarity;
a fine classification module, configured to perform the feature transformation on the multidimensional feature vector of the input character, match the transformed input character against the sample centers of the candidate character classes chosen from the fine classification template in the storage module, and determine the most similar character classes for the user to select.
Compared with the prior art, the invention preprocesses the handwritten character with a smooth continuous function, so that the size and shape of the preprocessed character become more regular. Subsequent feature extraction is therefore faster and more precise, which gives the invention the beneficial effects of faster recognition and higher recognition accuracy for handwritten character input.
In a preferred embodiment of the handwritten character input method of the invention, step C further comprises the step of:
judging whether the input of a handwritten character has finished: when the time during which no handwritten input signal is received exceeds a preset threshold, collection of the discrete coordinate sequence of this character is ended.
In another preferred embodiment of the method, step C further comprises the steps of:
checking whether the collected handwritten character has only one trajectory point and, if so, deleting that point and collecting again;
detecting the coordinate distance between adjacent trajectory points of the collected character and, if the distance is smaller than a preset threshold, deleting one of the two points so that adjacent points keep a certain distance from each other.
In another preferred embodiment of the method, step C specifically comprises the following steps:
transforming the abscissa and ordinate values of all trajectory points into the range 0 to 100;
calculating the center-of-gravity coordinate of the abscissas and of the ordinates of all trajectory points respectively;
dividing all trajectory point coordinates and the center-of-gravity coordinates by 100 so that they lie between 0 and 1, and using a smooth continuous function that maps both the abscissa and the ordinate of the center of gravity to 0.5 to transform the collected discrete coordinate sequence into another discrete coordinate sequence;
multiplying all trajectory point coordinates by 64 to obtain the normalized coordinate sequence of the input character.
In another preferred embodiment of the method, step D specifically comprises the following steps:
decomposing, according to the normalized coordinate sequence of the handwritten character, the vector segments formed by all adjacent trajectory points onto 8 reference directions to obtain the vector segment length values in each reference direction;
processing the vector segment length values to obtain a multidimensional feature vector composed of large-scale feature values and small-scale feature values.
Because this feature extraction step obtains the feature vector by decomposing the trajectory point vectors onto 8 reference directions, the amount of computation is small, so extraction is faster and the extracted features are more accurate.
In another preferred embodiment of the method, step A specifically comprises the following steps:
selecting, according to the Fisher criterion, from the samples of each pre-stored character class the several feature values that maximize the Fisher ratio;
calculating, from the feature vectors of the samples composed of the selected feature values, the feature vector of the sample center of each character class, and obtaining the coarse classification template composed of the sample centers of all character classes.
Step E specifically comprises the following steps:
selecting, according to the Fisher criterion, a subset of feature values from the multidimensional feature vector of the input character, so that the input character has a feature vector composed of the selected feature values with the same dimension as the samples of the character classes;
matching the input character against the coarse classification template, and choosing from the pre-stored character classes a number of candidate character classes with the greatest similarity.
Because the features used for coarse classification are selected by the Fisher criterion, the selected features give better recognition accuracy with a small amount of computation.
In another preferred embodiment of the method, step B specifically comprises the following steps:
using the feature transformation matrix obtained according to the Fisher criterion to transform the samples of all character classes, reducing the dimension of their multidimensional feature vectors;
recalculating the sample centers of all character classes after the feature transformation;
iteratively adjusting the feature transformation matrix and the sample centers of all character classes, recalculating the feature transformation matrix and the sample centers, and obtaining the fine classification template composed of the sample centers of all character classes.
Step F specifically comprises the following steps:
transforming the input character with the iteratively adjusted feature transformation matrix to obtain its low-dimensional feature vector;
matching the low-dimensional feature vector of the input character against the sample centers of the candidate character classes chosen from the fine classification template, and determining from the candidate character classes the character classes with the greatest similarity for the user to select.
On the basis of coarse classification, the feature transformation matrix is applied to the handwritten character and to the character samples of the candidate character classes, the feature transformation matrix and the sample centers of the candidate character classes are then adjusted iteratively, and fine classification is performed on the candidate character classes, so that handwritten character recognition is both fast and accurate.
The feature vectors of the samples of the pre-stored character classes referred to in steps A and B are multidimensional feature vectors obtained in advance by steps C and D.
In a preferred embodiment of the handwritten character input system of the invention, the signal acquisition module specifically comprises:
a collecting unit, configured to collect the discrete coordinate sequence of the trajectory points of the handwritten character;
a judging unit, configured to judge whether the input of a handwritten character has finished: when the time during which no handwritten input signal is received exceeds a preset threshold, collection of the discrete coordinate sequence of this character is ended;
a detecting unit, configured to check whether the collected handwritten character has only one trajectory point and, if so, to delete that point and collect again; and to detect the coordinate distance between adjacent trajectory points of the collected character and, if the distance is smaller than a preset threshold, to delete one of the two points so that adjacent points keep a certain distance from each other.
In another preferred embodiment of the system, the system further comprises a display module, configured to display the most similar character classes output by the fine classification module for the user to select.
Embodiment
To make the invention easier to understand, it is described in further detail below with reference to the accompanying drawings; the embodiments shown in the drawings do not limit the invention in any way.
The invention recognizes a handwritten input character by passing it through a processing flow of coordinate sequence collection, preprocessing, feature vector extraction, coarse classification and fine classification.
Fig. 1 shows the flowchart of a handwritten character input method according to an embodiment of the invention. The method comprises the following steps:
Step S01: a subset of feature values is selected from the feature vectors of the samples of the pre-prepared character classes, the sample center of each character class is calculated, and a coarse classification template composed of the sample centers of all character classes is obtained; the coarse classification template is stored in the memory of an input terminal such as a mobile phone. Specifically, the multidimensional feature vectors of the samples of the pre-stored character classes are obtained in advance by feature extraction; then, according to the Fisher criterion, the several feature values that maximize the Fisher ratio are chosen from the multidimensional feature vectors of the samples of each character class, the sample center of each character class is calculated, and the coarse classification template composed of the sample centers of all character classes is obtained.
The purpose of this step is to obtain the coarse classification template from the feature vectors of the samples of the pre-prepared character classes. To increase the speed of coarse classification, only a subset of the features is used to compute the matching distance. Feature selection and template design are carried out on a training sample set. The training sample set contains handwritten samples of every character class; after feature extraction, each sample is represented by 640 feature values (a 640-dimensional feature vector $x = [x_1, \ldots, x_{640}]^T$). Suppose there are N samples in total from C classes, where class i has $N_i$ samples. The criterion for selecting features is the Fisher criterion (described in detail in pattern recognition textbooks). The basic idea of the Fisher criterion function is to construct an evaluation function such that, when it is optimal, the distance between the classes is as large as possible while the distance between the samples within each class is as small as possible.
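Denote the j-th sample of class i by the feature vector $x_j^i$ (composed of the current candidate features). The sample center (mean) of each class is then

$$\mu_i = \frac{1}{N_i}\sum_{j=1}^{N_i} x_j^i \qquad (1)$$

The overall center is

$$\mu_0 = \frac{1}{N}\sum_{i=1}^{C} N_i\,\mu_i$$

The within-class covariance matrix and the between-class covariance matrix are calculated respectively as

$$S_w = \frac{1}{N}\sum_{i=1}^{C}\sum_{j=1}^{N_i}\left(x_j^i-\mu_i\right)\left(x_j^i-\mu_i\right)^T, \qquad S_b = \frac{1}{N}\sum_{i=1}^{C} N_i\left(\mu_i-\mu_0\right)\left(\mu_i-\mu_0\right)^T$$

The goal of feature selection is to choose a subset of the features such that the trace of the matrix $S_w^{-1}S_b$, that is $\operatorname{tr}(S_w^{-1}S_b)$ (the Fisher ratio), reaches its maximum. Here the candidate features that make up $x_j^i$ change during the feature selection process. Finding the feature combination with the largest Fisher ratio is a combinatorial optimization problem, which can be solved approximately by sequential forward selection: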
First, the Fisher ratio of each individual feature is calculated and the feature with the largest Fisher ratio is selected. Then each of the remaining features is combined in turn with the already selected features to form a feature vector, the Fisher ratio of that vector is calculated, and the feature giving the largest Fisher ratio is added to the selected features. This is repeated until the number of selected features reaches a specified number (set to 100 or fewer).
The detailed process of feature selection is as follows. First, each of the 640 features is taken in turn as the candidate, its Fisher ratio is calculated, and the feature with the largest Fisher ratio becomes the first selected feature. Next, each of the remaining 639 features is evaluated in turn together with the first selected feature (the candidate now consists of 2 features), and the combination of 2 feature values with the largest Fisher ratio is selected. Then each of the remaining 638 features is evaluated in turn together with the selected combination of 2 feature values (the candidate now consists of 3 features), and the combination of 3 feature values with the largest Fisher ratio is selected. This is repeated until the number of selected features reaches the specified number. Once feature selection is finished, the feature set is fixed.
After feature selection, the coarse classification template of each class is the center (mean) of the samples of that class, calculated by formula (1).
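The sequential forward selection described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function names, the 1/N normalization of the covariance matrices, and the small `ridge` term added to keep the within-class matrix invertible are all assumptions made for the example.

```python
from typing import List, Sequence

Samples = Sequence[Sequence[Sequence[float]]]  # samples_by_class[i][j] = feature vector


def class_stats(samples_by_class: Samples, feats: List[int]):
    """Within-class and between-class covariance restricted to `feats`."""
    k = len(feats)
    total = sum(len(s) for s in samples_by_class)
    means = [[sum(x[f] for x in s) / len(s) for f in feats] for s in samples_by_class]
    mu0 = [sum(len(s) * m[a] for s, m in zip(samples_by_class, means)) / total
           for a in range(k)]
    sw = [[0.0] * k for _ in range(k)]
    sb = [[0.0] * k for _ in range(k)]
    for s, m in zip(samples_by_class, means):
        for x in s:
            d = [x[f] - m[a] for a, f in enumerate(feats)]
            for a in range(k):
                for b in range(k):
                    sw[a][b] += d[a] * d[b] / total
        e = [m[a] - mu0[a] for a in range(k)]
        for a in range(k):
            for b in range(k):
                sb[a][b] += len(s) * e[a] * e[b] / total
    return sw, sb


def fisher_ratio(samples_by_class: Samples, feats: List[int], ridge: float = 1e-9) -> float:
    """tr(S_w^-1 S_b), computed by solving S_w X = S_b with Gauss-Jordan elimination."""
    sw, sb = class_stats(samples_by_class, feats)
    k = len(feats)
    # Augmented matrix [S_w + ridge*I | S_b]; the ridge term is an assumption.
    aug = [[sw[r][c] + (ridge if r == c else 0.0) for c in range(k)] + list(sb[r])
           for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return sum(aug[i][k + i] for i in range(k))


def forward_select(samples_by_class: Samples, n_select: int) -> List[int]:
    """Greedily add the feature that maximizes the Fisher ratio of the selected set."""
    dim = len(samples_by_class[0][0])
    selected: List[int] = []
    while len(selected) < n_select:
        remaining = [f for f in range(dim) if f not in selected]
        best = max(remaining, key=lambda f: fisher_ratio(samples_by_class, selected + [f]))
        selected.append(best)
    return selected
```

Each round re-evaluates every remaining feature together with the features already chosen, so the cost grows with both the number of features and the size of the selected set; a practical implementation would probably update the matrices incrementally.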
Step S02: a feature transformation is applied to the feature vectors of the samples of the pre-stored character classes, the sample center of each character class is recalculated, and a fine classification template composed of the sample centers of all character classes is obtained.
To obtain higher recognition accuracy, fine classification uses feature transformation rather than feature selection; that is, the original D = 640-dimensional feature vector is mapped by a linear transformation to a lower-dimensional vector (d < D). The dimension of the feature vector after transformation is generally set between 100 and 150. The transformation is y = Wx, where W is a d × D transformation matrix. The transformation matrix is solved for so that the Fisher ratio $\operatorname{tr}\left[(W S_w W^T)^{-1}\, W S_b W^T\right]$ is maximized. As a result, the rows of W are the d eigenvectors of the matrix $S_w^{-1}S_b$ that correspond to its largest eigenvalues (this is a standard mathematical method and need not be elaborated here). After dimensionality reduction, the template of each class is the center of the samples of that class (formula (1)).
The feature transformation matrix and class templates obtained above do not by themselves give very high recognition accuracy. Therefore the transformation matrix and the class templates are adjusted iteratively, so that the classification error on the training sample set (each sample being assigned to the nearest class) is gradually reduced. First, the weights of all training samples are set to 1; all training samples are classified with the transformation matrix obtained by the Fisher criterion and the class-center templates, and the weight of each misclassified sample is increased by 1. Let the weight of sample $x_j^i$ (the j-th sample of class i) be $w_j^i$. The class centers and the within-class and between-class covariance matrices are recalculated by the following formulas:

$$\mu_i = \frac{1}{\tilde N_i}\sum_{j=1}^{N_i} w_j^i\, x_j^i, \qquad \mu_0 = \frac{1}{\tilde N}\sum_{i=1}^{C} \tilde N_i\,\mu_i$$

$$S_w = \frac{1}{\tilde N}\sum_{i=1}^{C}\sum_{j=1}^{N_i} w_j^i\left(x_j^i-\mu_i\right)\left(x_j^i-\mu_i\right)^T, \qquad S_b = \frac{1}{\tilde N}\sum_{i=1}^{C} \tilde N_i\left(\mu_i-\mu_0\right)\left(\mu_i-\mu_0\right)^T$$

where $\tilde N_i = \sum_{j=1}^{N_i} w_j^i$ and $\tilde N = \sum_{i=1}^{C} \tilde N_i$.
On this basis, the transformation matrix and the class centers after feature transformation are recalculated by maximizing $\operatorname{tr}\left[(W S_w W^T)^{-1}\, W S_b W^T\right]$; the training samples are classified again and the weight of each misclassified sample is increased by 1. This is repeated until the classification error on the training samples no longer decreases.
Step S03: the handwritten character input signal is received and the discrete coordinate sequence of the trajectory points of the handwritten character is collected. Specifically, when the user writes on a touch screen with a pen, the position of the pen tip while the pen is down is recorded as a sequence of (x, y) coordinates. The complete handwriting trajectory of an input character is represented by one (x, y) sequence: {(x1, y1), (x2, y2), ..., (xn, yn)}.
A smooth continuous function is then used to transform the collected discrete coordinate sequence of the trajectory points into another discrete coordinate sequence so as to adjust the size, the shape and the center-of-gravity coordinates of the handwritten character, thereby obtaining the normalized coordinate sequence of the character. Normalization of the character trajectory has two purposes: size standardization and shape correction. As shown in Figs. 5a and 5b, the character in Fig. 5a takes the shape in Fig. 5b after normalization: not only has its boundary taken the specified size (after normalization the boundaries of all characters have the same size), but its shape has also changed and become more regular, which makes it easier to recognize.
Normalization is realized by two coordinate transformation functions, $x' = x^a$ and $y' = y^b$. After the coordinates (x, y) of every point in the character trajectory are replaced by (x', y'), the normalized character trajectory is obtained.
The parameters a and b are estimated as follows:
First, the minimum values of x and y in the coordinate sequence are found and subtracted from the x and y coordinates of all points respectively, so that the minimum values of x and y both become 0. Then all x and y values are multiplied by 100/u, where u is the maximum of the x and y values of all points, so that the x and y values lie between 0 and 100.
Second, the projections of the stroke trajectory onto the horizontal and vertical directions are computed. The stroke trajectory is placed in a 100 × 100 grid, as shown in Fig. 6 (the schematic diagram uses a 10 × 10 grid). Summing the stroke length in each column of the grid gives the projection onto the horizontal direction, fx(i), i = 1, 2, ..., 100. Likewise, summing the stroke length in each row of the grid gives the projection onto the vertical direction, fy(i), i = 1, 2, ..., 100. The center of gravity in the horizontal direction is calculated from fx(i):

$$x_c = \frac{\sum_{i=1}^{100} i\, f_x(i)}{\sum_{i=1}^{100} f_x(i)}$$

Likewise, the center of gravity in the vertical direction, yc, is calculated from fy(i).
Third, the coordinates of all points and (xc, yc) are divided by 100 so that they lie between 0 and 1. The functions $x' = x^a$ and $y' = y^b$ map xc and yc to 0.5 respectively, i.e. $x_c^a = 0.5$, which gives $a = \log 0.5 / \log x_c$. Similarly, $y_c^b = 0.5$, which gives $b = \log 0.5 / \log y_c$.
After this transformation the center of gravity of the character trajectory is moved to (0.5, 0.5), while the boundary remains unchanged because the functions map 0 to 0 and 1 to 1.
Fourth, (x', y') is multiplied by a given factor so that the bounding box of the character takes the specified size. This factor is set to 64, so that in the end the coordinates of all points in the normalized character trajectory lie between 0 and 64.
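The four normalization steps can be illustrated with the short sketch below. It is written under stated assumptions and is not the embodiment's implementation: the trajectory is treated as a single stroke without pen-lift markers, the center of gravity is approximated by the length-weighted mean of the segment midpoints instead of the 100 × 100 grid projection, and the names `power_exponent`, `weighted_centroid` and `normalize` are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def power_exponent(center: float, eps: float = 1e-6) -> float:
    """Exponent a such that center ** a == 0.5 (center is clamped into (0, 1))."""
    center = min(max(center, eps), 1.0 - eps)
    return math.log(0.5) / math.log(center)


def weighted_centroid(points: List[Point]) -> Point:
    """Length-weighted mean of segment midpoints (stand-in for the grid projection)."""
    total = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        total += length
        cx += length * (x0 + x1) / 2.0
        cy += length * (y0 + y1) / 2.0
    if total == 0.0:
        raise ValueError("trajectory has no extent")
    return cx / total, cy / total


def normalize(points: List[Point], size: float = 64.0) -> List[Point]:
    """Map a trajectory into [0, size] and move its center of gravity to size / 2."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)
    shifted = [(x - min_x, y - min_y) for x, y in points]
    u = max(max(x, y) for x, y in shifted)
    if u == 0.0:
        raise ValueError("trajectory has no extent")
    # Scaling by 100/u and then dividing by 100 is the same as dividing by u.
    unit = [(x / u, y / u) for x, y in shifted]
    xc, yc = weighted_centroid(unit)
    a, b = power_exponent(xc), power_exponent(yc)
    return [(size * x ** a, size * y ** b) for x, y in unit]
```

Because the exponents are positive, the power functions are monotonic on [0, 1], so the order of the points along each axis is preserved while the shape is stretched toward the center.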
Step S04: feature extraction is performed on the normalized coordinate sequence of the handwritten character to obtain its multidimensional feature vector. The basic idea, as shown in Fig. 7, is to decompose the stroke segments (every two adjacent points are joined into a vector segment) onto the 8 reference directions D1 to D8, to record in a 64 × 64 grid the segment length value of each direction in each cell, and then to calculate direction feature values at two scales.
First, the stroke segments are decomposed onto the 8 reference directions. Every two adjacent points in the coordinate sequence are joined into a segment, which is a directed vector $f_i$. When the direction of $f_i$ lies between two reference directions, for example D2 and D3, $f_i$ is decomposed into components along D2 and D3 (as shown in Fig. 8), and the length of the component along each reference direction is added to the segment length value of that direction in the cell where the segment lies. In this way, 64 × 64 segment length values are obtained for each of the 8 directions.
Second, the large-scale features are calculated. The 64 × 64 grid of each direction is divided evenly into 4 × 4 blocks, and the sum of the segment length values of that direction within each block is calculated, giving 8 × 4 × 4 = 128 feature values.
Third, the small-scale features are calculated. The 64 × 64 grid of each direction is divided evenly into 8 × 8 blocks, and the sum of the segment length values of that direction within each block is calculated, giving 8 × 8 × 8 = 512 feature values.
The total number of large-scale and small-scale features is 128 + 512 = 640.
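The three feature extraction steps can be sketched as follows. Several details are assumptions made for the illustration, since the text does not fix them: D1 is taken as the positive x direction with the other directions following at 45° intervals, each segment is credited entirely to the cell containing its midpoint, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
GRID = 64          # cells per side of the normalized character box
N_DIR = 8          # number of reference directions
STEP = math.pi / 4 # angle between neighbouring reference directions


def decompose(dx: float, dy: float) -> Tuple[int, float, int, float]:
    """Split (dx, dy) into components along its two neighbouring reference directions."""
    length = math.hypot(dx, dy)
    theta = math.atan2(dy, dx) % (2.0 * math.pi)
    k_raw = int(theta // STEP)
    phi = theta - k_raw * STEP  # angle measured from the lower reference direction
    alpha = max(0.0, length * (math.cos(phi) - math.sin(phi)))
    beta = max(0.0, length * math.sin(phi) / math.sin(STEP))
    return k_raw % N_DIR, alpha, (k_raw + 1) % N_DIR, beta


def direction_maps(points: List[Point]) -> List[List[List[float]]]:
    """Accumulate component lengths into one GRID x GRID map per direction."""
    maps = [[[0.0] * GRID for _ in range(GRID)] for _ in range(N_DIR)]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        k0, a, k1, b = decompose(x1 - x0, y1 - y0)
        # Assumption: the whole segment is assigned to the cell of its midpoint.
        col = min(GRID - 1, max(0, int((x0 + x1) / 2.0)))
        row = min(GRID - 1, max(0, int((y0 + y1) / 2.0)))
        maps[k0][row][col] += a
        maps[k1][row][col] += b
    return maps


def pool(maps: List[List[List[float]]], blocks: int) -> List[float]:
    """Sum each direction map over blocks x blocks equal square regions."""
    size = GRID // blocks
    feats = []
    for plane in maps:
        for br in range(blocks):
            for bc in range(blocks):
                feats.append(sum(plane[r][c]
                                 for r in range(br * size, (br + 1) * size)
                                 for c in range(bc * size, (bc + 1) * size)))
    return feats


def extract_features(points: List[Point]) -> List[float]:
    """128 large-scale (4 x 4) plus 512 small-scale (8 x 8) direction features."""
    maps = direction_maps(points)
    return pool(maps, 4) + pool(maps, 8)
```

The decomposition satisfies `alpha * u_k + beta * u_(k+1) = (dx, dy)`, where `u_k` is the unit vector of reference direction k, so no stroke length is lost when a segment falls between two directions.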
Step S05: a subset of feature values is selected from the multidimensional feature vector of the input character, the input character is matched against the coarse classification template, and a number of candidate character classes with the greatest similarity are chosen from the pre-stored character classes. Specifically, the feature values selected in step S01 according to the Fisher criterion (those that maximize the Fisher ratio) are taken from the multidimensional feature vector of the input character, so that the number of selected feature values is the same as in step S01.
The matching distance for template matching is calculated as follows: let the feature vector of the input character be $x = [x_1, \ldots, x_n]^T$ and let the sample center of a class in the coarse classification template be the feature vector $y = [y_1, \ldots, y_n]^T$; the matching distance is then calculated by the following formula:
Step S06: the feature transformation is applied to the multidimensional feature vector of the input character, the transformed input character is matched against the sample centers of the candidate character classes chosen from the fine classification template, and the most similar character classes are determined for the user to select. The purpose of this step is fine classification: after M candidate classes have been found for an input character in coarse classification, fine classification uses more features than coarse classification to recalculate the distance from the input character to the templates of the M candidate classes, and takes the nearest class as the final recognition result.
Fine classification provides several classes (generally 10) with the smallest matching distances as the final candidates. These candidate classes can be displayed directly for the user to select, or one can be selected automatically from the context using language rules.
In the coarse classification of step S05, the feature vector of the input character (the character to be recognized) is compared (matched) with the template of each character class stored in the template database, and the M classes (for example M = 100) with the smallest distance (that is, the greatest similarity) are found as candidates. The fine classification of step S06 then finds, among these candidates, the class with the smallest distance as the final recognition result.
Coarse classification and fine classification compare against different templates (and use different features): the coarse classification template is simple (few features) and fast to compute, whereas the fine classification template is more complex (more features) and slower to compute.
The purpose of coarse classification is that, once the M candidate classes have been found quickly, fine classification does not need to calculate the distance to every class (only the distances to the M candidate classes), which improves the overall recognition speed.
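The two-stage matching of steps S05 and S06 can be sketched as follows. The matching distance formula is not reproduced in the text, so the squared Euclidean distance used here is an assumption, as are the function names and the dictionary layout of the templates (class label mapped to sample center).

```python
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (assumed matching distance)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def coarse_candidates(x: Sequence[float], selected: Sequence[int],
                      coarse_templates: Dict[str, Sequence[float]], m: int) -> List[str]:
    """Keep the m classes whose coarse centers are nearest on the selected features."""
    sub = [x[i] for i in selected]
    ranked = sorted(coarse_templates, key=lambda label: sq_dist(sub, coarse_templates[label]))
    return ranked[:m]


def transform(w: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Linear feature transformation y = W x, with W given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def fine_classify(x: Sequence[float], w: Sequence[Sequence[float]],
                  fine_templates: Dict[str, Sequence[float]],
                  candidates: Sequence[str], top: int = 10) -> List[str]:
    """Rank only the candidate classes by distance in the transformed space."""
    y = transform(w, x)
    return sorted(candidates, key=lambda label: sq_dist(y, fine_templates[label]))[:top]


def recognize(x: Sequence[float], selected: Sequence[int],
              coarse_templates: Dict[str, Sequence[float]],
              w: Sequence[Sequence[float]],
              fine_templates: Dict[str, Sequence[float]],
              m: int = 100, top: int = 10) -> List[str]:
    """Coarse classification to m candidates, then fine classification among them."""
    candidates = coarse_candidates(x, selected, coarse_templates, m)
    return fine_classify(x, w, fine_templates, candidates, top)
```

With C classes, the fine stage computes only m distances in the d-dimensional transformed space instead of C, which is where the speed gain described above comes from; the trade-off is that a class pruned by the coarse stage can no longer be recovered.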
In summary, Fig. 2 shows the detailed flowchart of a handwritten character input method according to an embodiment of the invention.
Step S02 specifically comprises the following steps:
Step S021: the samples of all character classes are transformed with the feature transformation matrix obtained according to the Fisher criterion, reducing the dimension of their multidimensional feature vectors;
Step S022: the sample centers of all character classes are recalculated after the feature transformation;
Step S023: the feature transformation matrix and the sample centers of all character classes are adjusted iteratively, the feature transformation matrix and the sample centers are recalculated, and the fine classification template composed of the sample centers of all character classes is obtained.
Step S03 specifically comprises the following steps:
Step S031: the handwritten character input signal is received and the discrete coordinate sequence of the trajectory points of the input character is collected;
Step S032: it is judged whether the input of a character has finished; when the time during which no handwritten input signal is received exceeds a preset threshold, collection of the discrete coordinate sequence of this character is ended. When the pen-up time exceeds a threshold (for example 0.5 seconds), the writing of the character is considered finished. The complete handwriting trajectory of an input character is represented by one (x, y) sequence: {(x1, y1), (x2, y2), ..., (xn, yn)}, where the start of writing (pen-down) is represented by the special coordinate (1, 0).
Step S033: it is checked whether the collected handwritten character has only one trajectory point; if so, that point is deleted and collection is repeated;
Step S034: the coordinate distance between adjacent trajectory points of the collected character is detected; if the distance is smaller than a preset threshold, that is, if two adjacent points coincide or are very close together, one of them is deleted so that adjacent points keep a certain distance from each other;
Step S035: a smooth continuous function is used to transform the collected discrete coordinate sequence of the trajectory points into another discrete coordinate sequence so as to adjust the size, the shape and the center-of-gravity coordinates of the input character, thereby obtaining the normalized coordinate sequence of the character.
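The checks of steps S033 and S034 can be sketched as below. This is only an illustration under assumptions: the function name `clean_trajectory` and the default threshold `min_dist` are hypothetical, special marker coordinates are assumed to have been removed beforehand, and of two points that are too close the later one is the one dropped.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def clean_trajectory(points: List[Point], min_dist: float = 1.0) -> Optional[List[Point]]:
    """Filter a collected trajectory.

    Returns None when the trajectory has at most one point, signalling that the
    input should be discarded and collected again; otherwise returns the points
    with every point closer than min_dist to the previously kept point removed.
    """
    if len(points) <= 1:
        return None
    kept = [points[0]]
    for x, y in points[1:]:
        last_x, last_y = kept[-1]
        if math.hypot(x - last_x, y - last_y) >= min_dist:
            kept.append((x, y))
    return kept
```

For example, `clean_trajectory([(0, 0), (0.2, 0), (2, 0), (2, 0), (5, 4)])` keeps `(0, 0)`, `(2, 0)` and `(5, 4)`, so no zero-length segment reaches the direction decomposition of step S04.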
Step S04 specifically comprises the following steps:
Step S041: according to the normalized coordinate sequence of the handwritten character, the vector segments formed by all adjacent trajectory points are decomposed onto 8 reference directions (as shown in Figs. 7 and 8) to obtain the vector segment length values in each reference direction;
Step S042: the vector segment length values are processed to obtain a multidimensional feature vector composed of large-scale feature values and small-scale feature values.
Step S05 specifically comprises the following steps:
Step S051: according to the Fisher criterion, a subset of feature values is selected from the multidimensional feature vector of the input character, so that the input character has a feature vector composed of the selected feature values with the same dimension as the samples of the character classes;
Step S052: the input character is matched against the coarse classification template, and a number of candidate character classes with the greatest similarity are chosen from the pre-stored character classes.
Step S06 specifically comprises the following steps:
Step S061: the input character is transformed with the iteratively adjusted feature transformation matrix to obtain its low-dimensional feature vector;
Step S062: the low-dimensional feature vector of the input character is matched against the sample centers of the candidate character classes chosen from the fine classification template, and the character classes with the greatest similarity are determined from the candidate character classes for the user to select.
Fig. 3 shows a schematic structural diagram of a handwritten character input system according to an embodiment of the invention. The system comprises:
Signal acquisition module 1, used to receive the handwritten character input signal and to collect the discrete coordinate sequence of the trajectory points of the handwritten character;
Normalization module 2, used to transform, by means of a smooth continuous function, the collected discrete coordinate sequence of the trajectory points into another discrete coordinate sequence so as to adjust the size, shape and centroid coordinates of the handwritten character, thereby obtaining the normalized coordinate sequence of the character;
Feature extraction module 3, used to decompose, according to the normalized coordinate sequence of the handwritten character, the vector line segments formed by all pairs of adjacent trajectory points into eight reference directions, thereby obtaining the multidimensional feature vector of the handwritten character;
Storage module 4, used to store the feature transformation matrix together with the coarse classification template and the fine classification template of all character classes, where the coarse classification template consists of the sample centers of all character classes computed after feature selection, and the fine classification template consists of the sample centers of all character classes computed after feature transformation;
Coarse classification module 5, used to select a subset of feature values from the multidimensional feature vector of the input handwritten character, to match the input character against each sample center of the coarse classification template in the storage module 4, and to choose from the prestored character classes the several candidate character classes with the greatest similarity;
Fine classification module 6, used to apply the feature transformation to the multidimensional feature vector of the input handwritten character, to match the transformed character against the sample centers of the chosen candidate character classes in the fine classification template of the storage module 4, and to determine the most similar character classes for the user to select.
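The direct decomposition performed by the feature extraction module 3 can be sketched as below. The sketch assumes the eight reference directions are spaced 45 degrees apart and that each segment is split between its two neighboring reference directions by the parallelogram rule; the text does not confirm that exact rule, and the function names are hypothetical. It stops at the eight accumulated length values and does not compute the large-scale and small-scale features.

```python
import math

STEP = math.pi / 4  # angle between neighboring reference directions (assumed 45 degrees)

def decompose_segment(p, q):
    """Split the vector p->q into components along its two neighboring directions."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    r = math.hypot(dx, dy)
    hist = [0.0] * 8
    if r == 0.0:
        return hist
    theta = math.atan2(dy, dx) % (2 * math.pi)
    idx = int(theta // STEP)
    phi = theta - idx * STEP          # angle past the lower reference direction
    k = idx % 8
    # Parallelogram rule: r*u(theta) = a*u(k*STEP) + b*u((k+1)*STEP)
    hist[k] += r * math.sin(STEP - phi) / math.sin(STEP)
    hist[(k + 1) % 8] += r * math.sin(phi) / math.sin(STEP)
    return hist

def direction_lengths(points):
    """Accumulate the eight directional length values over a whole trajectory."""
    total = [0.0] * 8
    for p, q in zip(points, points[1:]):
        for i, v in enumerate(decompose_segment(p, q)):
            total[i] += v
    return total
```

A segment lying exactly on a reference direction contributes its full length to that direction, while a segment halfway between two directions contributes equally to both.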
Fig. 4 shows a detailed schematic structural diagram of the handwritten character input system according to an embodiment of the invention. The signal acquisition module 1 of Fig. 3 specifically comprises:
Collecting unit 101, used to collect the discrete coordinate sequence of the trajectory points of the handwritten character;
Judging unit 102, used to judge whether the input of one handwritten character has finished: when the time during which no handwritten character input signal is received exceeds a preset threshold, collection of the discrete coordinate sequence of this character ends;
Detecting unit 103, used to check whether the collected handwritten character has only one trajectory point and, if so, to delete that point and collect again; and to check the coordinate distance between adjacent trajectory points of the collected character and, if that distance is less than a preset threshold, to delete one of the two points so that adjacent points keep a certain distance apart.
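The checks made by the detecting unit 103 can be sketched as follows. The sketch treats the trajectory as one flat list of (x, y) points and, of two points that are too close, keeps the earlier one; the text only says that one of them is deleted, so this choice and the name `filter_track_points` are assumptions.

```python
import math

def filter_track_points(points, min_dist):
    """Return None if the input must be collected again, else the thinned trajectory."""
    if len(points) <= 1:
        return None  # a single trajectory point is discarded and input is collected again
    kept = [points[0]]
    for p in points[1:]:
        # Compare with the last kept point so that kept neighbors stay min_dist apart.
        if math.hypot(p[0] - kept[-1][0], p[1] - kept[-1][1]) >= min_dist:
            kept.append(p)
    return kept
```

For example, with `min_dist = 1.0` the points `(0.1, 0)` and `(2.2, 0)` in the sequence `(0, 0), (0.1, 0), (2, 0), (2.2, 0), (5, 0)` are dropped.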
The system further comprises:
Display module 7, used to display the most similar character classes output by the fine classification module 6 for the user to select.
Compared with existing methods, the techniques used here for normalization, feature extraction, coarse classification and fine classification differ as follows:
Normalization: the coordinate transformation functions x' = x^a and y' = y^b are newly proposed. Their benefit is that the coordinate transformation is a smooth continuous function, so the transformed character shape is more natural, while the centroid of the character trajectory is still moved to the center (0.5) of the bounding box. Earlier approaches also mapped the character centroid to the center of the bounding box, but they used piecewise linear functions, so the transformed character shape is unnatural, which also affects subsequent recognition.
Feature extraction: the stroke trajectory line segments are decomposed directly into 8 directions. Existing methods first convert the trajectory into an image and then decompose the direction of the pixels in the image, which requires more computation and produces a distorted image. Our method avoids the extra computation of generating an image, and the direction features obtained are more accurate.
Coarse classification: existing methods generally either select a subset of features manually (for example, using the large-scale features directly) or apply a feature transformation. The features we select with the Fisher criterion give better recognition accuracy than manually selected features and require less computation than a feature transformation (because no linear transformation is involved).
Fine classification: iteratively adjusting the feature transformation matrix and the class-center templates clearly improves recognition accuracy. Existing methods directly adopt the transformation matrix obtained with the Fisher criterion and then adjust the class-center templates on the transformed features with the Learning Vector Quantization (LVQ) algorithm. Our method obtains higher recognition accuracy by adjusting the transformation matrix and the class-center templates simultaneously.
The method is applicable to the recognition of Chinese characters, English letters, digits and symbols.
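The power-function normalization described above can be sketched as follows. The sketch assumes that the coordinates are first scaled linearly into [0, 1] by the bounding box, that the centroid is the plain mean of the trajectory points, and that each exponent is chosen so that the centroid coordinate c satisfies c^a = 0.5, that is, a = ln 0.5 / ln c. These choices and the function names are assumptions made for illustration.

```python
import math

def power_exponent(c):
    """Exponent a with c**a == 0.5, for a centroid coordinate c in (0, 1)."""
    c = min(max(c, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite and non-zero
    return math.log(0.5) / math.log(c)

def normalize_track(points):
    """Map trajectory points into the unit square with x' = x**a, y' = y**b."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    width = (max(xs) - x_min) or 1.0    # avoid division by zero for degenerate strokes
    height = (max(ys) - y_min) or 1.0
    u = [(x - x_min) / width for x in xs]
    v = [(y - y_min) / height for y in ys]
    a = power_exponent(sum(u) / len(u))  # centroid taken as the mean of the points (assumed)
    b = power_exponent(sum(v) / len(v))
    return [(ui ** a, vi ** b) for ui, vi in zip(u, v)]
```

Under these assumptions the position of the original centroid is mapped exactly to 0.5 on each axis, and the map is smooth and monotonic on [0, 1], unlike a piecewise linear mapping with a kink at the centroid. The mean of the transformed points is in general only close to 0.5, not equal to it.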
1. Input character trajectory: [figure]
2. Trajectory after normalization: [figure]
3. The 10 candidate classes selected by coarse classification and their matching distances (distances computed on the 60 selected features):
Candidate | Instigate | Sound of sighing | What | Pyridine | Call out | Furan | Smell | Cry | Larynx | Mark
Distance | 597 | 621 | 643 | 676 | 689 | 698 | 715 | 732 | 764 | 771
4. Re-ranking of the 10 candidate classes by fine classification and their distances (distances computed on the 120 transformed features):
Candidate | Sound of sighing | Pyridine | Call out | Instigate | What | Cry | Furan | Larynx | Smell | Mark
Distance | 1079 | 1121 | 1157 | 1186 | 1233 | 1298 | 1374 | 1419 | 1462 | 1503
5. The final recognition result is "Sound of sighing".
The above is a preferred embodiment of the present invention and of course does not limit the scope of the rights of the invention. It should be understood that those skilled in the art may make improvements and modifications without departing from the principle of the invention, and such improvements and modifications are also regarded as falling within the scope of protection of the invention.