CN1333366C - On-line hand-written Chinese characters recognition method based on statistic structural features - Google Patents


Info

Publication number
CN1333366C
Authority
CN
China
Prior art keywords
point
prime
sigma
person
handwriting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB200510011510XA
Other languages
Chinese (zh)
Other versions
CN1664846A (en)
Inventor
丁晓青
鲁湛
刘长松
陈彦
彭良瑞
方弛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CNB200510011510XA priority Critical patent/CN1333366C/en
Publication of CN1664846A publication Critical patent/CN1664846A/en
Application granted granted Critical
Publication of CN1333366C publication Critical patent/CN1333366C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Character Discrimination (AREA)

Abstract

The present invention relates to an on-line handwritten Chinese character recognition method based on statistical structural features and belongs to the technical field of Chinese character recognition. The character to be processed is first preprocessed to remove noise and part of the writing distortion, and the region it occupies is mapped to an area of fixed size. Statistical structural features that reflect the characteristics of on-line handwritten Chinese characters well, comprising direction features and edge features, are then extracted. The resulting original features are compressed and transformed into recognition features by linear discriminant analysis. Finally, a modified quadratic discriminant function classifier based on a Gaussian model is used for training and recognition; it replaces small eigenvalues with a preset constant in order to remove the adverse effect that inaccurate estimates of small eigenvalues have on classification performance. The present invention achieves an average recognition rate of 98.43%, a satisfactory result.

Description

On-line handwritten Chinese character recognition method based on statistical structural features
Technical field
The on-line handwritten Chinese character recognition method based on statistical structural features belongs to the field of on-line handwritten Chinese character recognition.
Background art
On-line handwritten Chinese character recognition is a technique in which a computer samples a person's handwriting in real time through a digitizer and then recognizes the resulting data automatically. It lets people enter Chinese characters into a computer or smart device in their natural way of writing, readily meets users' requirements for general-purpose use and miniaturization, and complements other input methods such as speech recognition to form an efficient Chinese character input system. On-line handwritten Chinese character recognition therefore has great theoretical and practical significance in computer applications.
Thanks to the sustained efforts of many researchers, on-line handwritten Chinese character recognition has made considerable progress. Common methods fall into two classes. The first is statistical recognition based on analysis of the whole character pattern, which describes and exploits character information from a global point of view; typically, a Chinese character pattern is described by a high-dimensional numerical feature vector, and the classification decision is made by applying decision theory to the distribution of these vectors in feature space. The second is recognition based on analysis of local substructures, whose features are derived mainly from the local stroke information obtained after structural decomposition; representative methods include the character-string model, the attributed relational graph model, and the hidden Markov model. Because statistical recognition extracts and matches features with the whole character as the unit, it makes full use of the computer's numerical processing power, makes it easier to find the global correlated structural features of Chinese characters in feature space by mathematical means, and is fast and performs well. The present invention therefore uses a statistical method to recognize on-line handwritten Chinese characters.
The success of a statistical recognition method depends on the effectiveness of the statistical feature set, that is, on how well the patterns of different characters can be separated in feature space. In the published literature, the statistical features used for on-line Chinese characters are mainly transform-coefficient features, which suit characters with few strokes and curved strokes, such as English letters, digits, and Japanese ideograms. Chinese characters are rich in high-frequency components, so transform-coefficient features perform poorly on them. Research on off-line Chinese character recognition has proposed a number of statistical structural numerical features, such as directional line element features, grid features, frame features, stroke density features, and background features, and has obtained good recognition results. A few publications have tried to introduce these statistical structural features into on-line Chinese character recognition, but because they do not take the characteristics of on-line handwritten Chinese characters into account, recognition performance is poor.
On the basis of the necessary preprocessing, the present invention designs and extracts statistical structural features adapted to the characteristics of on-line handwritten Chinese characters and thereby realizes a high-performance on-line handwritten Chinese character recognition system, which has not been reported in any other literature to date.
Summary of the invention
The object of the present invention is to realize an on-line handwritten Chinese character recognition method based on statistical structural features. The method takes a single on-line handwritten Chinese character as the object to be processed. It first applies the necessary preprocessing to the character, then extracts statistical structural features that reflect the characteristics of on-line handwritten Chinese characters well, then compresses and transforms these original features into recognition features by linear discriminant analysis (LDA), and finally recognizes the character with a modified quadratic discriminant function (MQDF) classifier.
The present invention consists of the following parts: preprocessing, extraction of statistical structural features, feature transformation, and classifier design.
1. Preprocessing
The purpose of preprocessing is to remove as much noise and writing distortion from the handwriting trace as possible before recognition, so that the character to be recognized has a better basis for recognition. It has two tasks. The first is to filter out noise caused by the acquisition device and the writer, such as isolated-point noise, jagged (sawtooth) noise, and uneven pen speed; the main means are smoothing filters and resampling. The second is to reshape the character to be recognized to remove part of the writing distortion, which comprises linear normalization and nonlinear normalization, so that the region occupied by the character is mapped to an area of fixed size and the strokes are distributed more evenly in space after reshaping.
Let the handwriting trace of an on-line handwritten Chinese character be:
$P(x_1, y_1), P(x_2, y_2), \ldots, P(x_i, y_i), \text{(break)}, P(x_{i+1}, y_{i+1}), \ldots, P(x_N, y_N)$.
This is a time-ordered sequence of point coordinates obtained by the computer sampling the pen-tip trajectory in real time through a digitizer during writing. The (break) mark denotes the interruption between a pen-up and the following pen-down, that is, between two natural strokes.
Removing isolated-point noise means removing from the trace point sequence any stroke that consists of only one or two points. Jagged noise is filtered by taking a weighted average of the coordinates of neighboring points, which acts as a low-pass filter. The filtering formulas are:
$$x'_i = \frac{1}{4}\left(x_{i-1} + 2 x_i + x_{i+1}\right)$$
$$y'_i = \frac{1}{4}\left(y_{i-1} + 2 y_i + y_{i+1}\right)$$
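The smoothing step can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the function name `smooth_stroke` is hypothetical, and leaving the two end points of a stroke unchanged is an assumption, since the text does not say how points without two neighbors are handled.

```python
import numpy as np

def smooth_stroke(points):
    """Apply the (1, 2, 1)/4 weighted average to one stroke.

    points: sequence of (x, y) pairs belonging to a single stroke.
    Assumption: the first and last points are kept as they are.
    """
    pts = np.asarray(points, dtype=float)
    if len(pts) < 3:
        return pts.copy()
    out = pts.copy()
    # Interior points: p'_i = (p_{i-1} + 2 * p_i + p_{i+1}) / 4 for x and y.
    out[1:-1] = (pts[:-2] + 2.0 * pts[1:-1] + pts[2:]) / 4.0
    return out
```

For the stroke `[(0, 0), (1, 4), (2, 0)]`, the middle point moves from (1, 4) to (1, 2), which shows how a single-sample spike is damped.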
Uneven pen speed is removed by resampling the written trajectory at a fixed length interval, so that a stroke of a given length is represented by a fixed number of points. The formulas are:
$$x''_j = \left[ x'_i \cdot (s_{i+1} - jL) + x'_{i+1} \cdot (jL - s_i) \right] / d_i$$
$$y''_j = \left[ y'_i \cdot (s_{i+1} - jL) + y'_{i+1} \cdot (jL - s_i) \right] / d_i$$
Here $L$ is the fixed sampling interval, set to the constant 1; $(x'_i, y'_i)$, $1 \le i \le N$, are the $N$ coordinate points of the stroke to be resampled, and $i$ is chosen so that $s_i \le jL < s_{i+1}$; $d_i = \sqrt{(x'_{i+1} - x'_i)^2 + (y'_{i+1} - y'_i)^2}$ is the distance between two consecutive points; $s_i = \sum_{k=0}^{i-1} d_k$ is the cumulative length, with $s_0 = 0$; and $(x''_j, y''_j)$, $j = 0, 1, \ldots, \left[ s_N / L \right]$, are the new coordinate points obtained by resampling.
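The resampling formula is linear interpolation along the cumulative stroke length, so it can be sketched with `numpy.interp`. The function name `resample_stroke` and the removal of repeated points (to avoid zero-length segments, where $d_i = 0$) are assumptions made for this illustration.

```python
import numpy as np

def resample_stroke(points, L=1.0):
    """Resample one stroke at a fixed arc-length interval L."""
    pts = np.asarray(points, dtype=float)
    if len(pts) < 2:
        return pts.copy()
    step = np.diff(pts, axis=0)
    d = np.hypot(step[:, 0], step[:, 1])       # segment lengths d_i
    # Assumption: drop points that repeat their predecessor (d_i == 0).
    keep = np.concatenate(([True], d > 0))
    pts, d = pts[keep], d[d > 0]
    if len(pts) < 2:
        return pts.copy()
    s = np.concatenate(([0.0], np.cumsum(d)))  # cumulative lengths s_i
    # Target arc lengths j * L for j = 0 .. floor(s_N / L).
    t = np.arange(int(np.floor(s[-1] / L)) + 1) * L
    # np.interp evaluates [p_i (s_{i+1} - t) + p_{i+1} (t - s_i)] / d_i.
    x = np.interp(t, s, pts[:, 0])
    y = np.interp(t, s, pts[:, 1])
    return np.column_stack((x, y))
```

With `L = 1`, an L-shaped stroke of total length 7 is represented by 8 equally spaced points regardless of how many samples the digitizer originally produced.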
Reshaping requires the new coordinates of each trace point after the transformation; in the present invention they are computed by the density equalization method. First the trace of the on-line character is converted into a character image $[f(x'', y'')]_{W \times H}$, where the image before the reshaping transformation has width $W$ and height $H$. Each trace point $P(x''_i, y''_i)$ corresponds to a black pixel, $f(x''_i, y''_i) = 1$, and all other pixels are white, $f(x'', y'') = 0$. $U(x'')$ and $V(y'')$ denote the pixel density projections in the horizontal and vertical directions respectively:
$$U(x'') = \sum_{y''=1}^{H} f(x'', y'') + \alpha_U, \quad x'' = 1, 2, \ldots, W$$
$$V(y'') = \sum_{x''=1}^{W} f(x'', y'') + \alpha_V, \quad y'' = 1, 2, \ldots, H$$
where $\alpha_U$ and $\alpha_V$ are bias constants, set here to $\alpha_U = \alpha_V = 6$. The trace point with original coordinates $(x'', y'')$ then has the new coordinates $(x''', y''')$:
$$x''' = \frac{\sum_{k=1}^{x''} U(k) \times W'}{\sum_{k=1}^{W} U(k)}, \qquad y''' = \frac{\sum_{l=1}^{y''} V(l) \times H'}{\sum_{l=1}^{H} V(l)}$$
where $W'$ is the maximum abscissa and $H'$ the maximum ordinate after processing. These two values are the desired coordinate range of the trace points after processing and must be set before reshaping; both are set to 64 here.
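The density equalization mapping can be sketched as a pair of cumulative sums over the projections. The function name `density_equalize` and the assumption that the input coordinates are already integers in the ranges 1..W and 1..H are choices made for this illustration.

```python
import numpy as np

def density_equalize(points, W, H, W_out=64, H_out=64, alpha=6.0):
    """Map integer trace points (1 <= x <= W, 1 <= y <= H) to new coordinates.

    alpha plays the role of the bias constants alpha_U = alpha_V = 6.
    """
    pts = np.asarray(points, dtype=int)
    img = np.zeros((H, W))
    img[pts[:, 1] - 1, pts[:, 0] - 1] = 1.0   # black pixel at each trace point
    U = img.sum(axis=0) + alpha               # U(x''), x'' = 1..W
    V = img.sum(axis=1) + alpha               # V(y''), y'' = 1..H
    cu, cv = np.cumsum(U), np.cumsum(V)       # cu[x-1] = sum_{k=1}^{x} U(k)
    x_new = cu[pts[:, 0] - 1] * W_out / cu[-1]
    y_new = cv[pts[:, 1] - 1] * H_out / cv[-1]
    return np.column_stack((x_new, y_new))
```

Columns and rows that contain many trace points receive a larger share of the output range, which is what spreads dense stroke regions apart; the bias constant keeps empty columns from collapsing to zero width.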
The last preprocessing step joins the trace points within each natural stroke end to end in order, inserts into the trace sequence the points on the connecting lines that do not coincide with original trace points, and removes coincident points among adjacent trace points.
2. Extraction of statistical structural features
The statistical structural features are extracted from the preprocessed on-line handwriting trace. After careful study of the structural characteristics of on-line handwritten Chinese characters, the present invention designs and extracts two kinds of statistical structural features, called direction features and edge features.
2.1 Extraction of direction features
Direction features are themselves of two kinds, called adjacent-point direction features and adjacent-inflection-point direction features.
2.1.1 Adjacent-point direction feature
First the direction of each trace point is calculated. Take any point $P_i$ in the trace point coordinate sequence; except for the last point, it has at least one subsequent point $P_j$ ($j > i$). The direction of the directed line segment from $P_i$ to $P_j$ is defined as the direction value $\theta_i$ of $P_i$, with range $[0^\circ, 360^\circ)$. In Figure 3, (a) shows the direction from a point $P_i$ to its adjacent point $P_{i+1}$, (b) shows the direction from an inflection point $P_i$ to the adjacent inflection point $P_j$, and (c) illustrates how the angle of a directed line segment is calculated. When $j = i + 1$, this direction value is called the adjacent-point direction.
$\theta_i$ is calculated as follows. Let $(X_i, Y_i)$ be the coordinates of point $P_i$ and $(X_j, Y_j)$ the coordinates of point $P_j$.
Since the tangent of $\theta_i$ is
$$\tan(\theta_i) = \frac{Y_j - Y_i}{X_j - X_i}$$
it follows that
$$\theta_i = \arctan\left(\frac{Y_j - Y_i}{X_j - X_i}\right)$$
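A plain arctangent only returns angles in $(-90^\circ, 90^\circ)$ and is undefined when $X_j = X_i$, while the stated range is $[0^\circ, 360^\circ)$. The sketch below therefore swaps in the two-argument arctangent `math.atan2`, which resolves the quadrant from the signs of the two differences; this substitution and the function name `direction_deg` are assumptions of the illustration, not something the text specifies.

```python
import math

def direction_deg(p_i, p_j):
    """Direction of the directed segment p_i -> p_j in degrees, in [0, 360)."""
    dx = p_j[0] - p_i[0]
    dy = p_j[1] - p_i[1]
    # atan2 picks the quadrant from the signs of dy and dx and handles dx == 0.
    theta = math.degrees(math.atan2(dy, dx)) % 360.0
    # Guard against a tiny negative angle being rounded up to exactly 360.0.
    return 0.0 if theta >= 360.0 else theta
```

For example, a segment pointing along the negative x axis gives 180 degrees, which a single-argument arctangent could not distinguish from 0 degrees.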
Next the direction attribute coefficients of each trace point are calculated. The direction attribute coefficients of a trace point are the four function values obtained by taking the direction value of the point as the argument of the trapezoidal and half-trapezoidal functions shown in Figure 4:
Horizontal-stroke direction attribute coefficient function $f^{(h)}(\theta)$
Vertical-stroke direction attribute coefficient function $f^{(s)}(\theta)$
[Formula image C20051001151000124 in the original; not reproduced here]
Left-falling-stroke direction attribute coefficient function $f^{(p)}(\theta)$
[Formula image C20051001151000125 in the original; not reproduced here]
Right-falling-stroke direction attribute coefficient function $f^{(n)}(\theta)$
[Formula image C20051001151000126 in the original; not reproduced here]
The six parameters $\alpha_1$ to $\alpha_6$ above are angle thresholds that determine the shape of the direction attribute coefficient functions. In the present invention they are set to $\alpha_1 = -10^\circ$, $\alpha_2 = 260^\circ$, $\alpha_3 = 280^\circ$, $\alpha_4 = 250^\circ$, $\alpha_5 = 300^\circ$, $\alpha_6 = 330^\circ$.
After the direction attribute coefficients have been obtained, the coordinate space of the trace point image is divided evenly into $K_1 \times K_1$ sub-blocks, as shown in Figure 5. For each sub-block, the four kinds of direction attribute coefficients of all trace points in it are summed separately, giving $K_1 \times K_1 \times 4$ feature dimensions in total. Taking sub-block $(k, l)$, $1 \le k \le K_1$, $1 \le l \le K_1$, as an example, the four feature values are:
$$F^{(h)}_{k,l} = \sum_{P(x,y) \in D(k,l)} f^{(h)}(\theta)$$
$$F^{(s)}_{k,l} = \sum_{P(x,y) \in D(k,l)} f^{(s)}(\theta)$$
$$F^{(p)}_{k,l} = \sum_{P(x,y) \in D(k,l)} f^{(p)}(\theta)$$
$$F^{(n)}_{k,l} = \sum_{P(x,y) \in D(k,l)} f^{(n)}(\theta)$$
In each formula $\theta$ is the direction value of the point $P(x, y)$, and the sum runs over the trace points in the region $D(k,l)$ of sub-block $(k, l)$.
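The per-block accumulation can be sketched as follows. Because the four coefficient functions are only available as formula images, the sketch takes the coefficients as a precomputed array; the function name `block_direction_features`, the default `K1 = 8`, and the convention of clipping points on the outer border into the last block are all assumptions of this illustration. The coordinate range of 64 follows the value chosen for $W'$ and $H'$ in preprocessing.

```python
import numpy as np

def block_direction_features(points, coeffs, K1=8, size=64):
    """Sum the 4 direction attribute coefficients inside each of K1 x K1 blocks.

    points: (N, 2) coordinates in [0, size].
    coeffs: (N, 4) coefficients per point, ordered (h, s, p, n).
    Returns a flat vector of length K1 * K1 * 4.
    """
    pts = np.asarray(points, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    # Block index of each point along x and y, clipped onto the grid.
    idx = np.clip((pts * K1 / size).astype(int), 0, K1 - 1)
    feat = np.zeros((K1, K1, 4))
    # np.add.at accumulates correctly when several points share a block.
    np.add.at(feat, (idx[:, 0], idx[:, 1]), coeffs)
    return feat.reshape(-1)
```

Using `np.add.at` rather than `feat[i, j] += coeffs` matters here: plain fancy-index assignment would keep only one contribution per block when indices repeat.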
2.1.2 Adjacent-inflection-point direction feature
When the handwriting trembles, the adjacent-point direction can deviate considerably. An adjacent-inflection-point direction is therefore also designed: $P_i$ and $P_j$ are taken to be adjacent inflection points in the trace, and the direction of each trace point is recalculated. An inflection point is a point at which the writing direction of the stroke changes sharply; stroke end points are also treated as inflection points. Inflection points are determined by the basic method of polygonal approximation: first the cosine of the opening angle between each point in the stroke and its neighboring points is calculated. A point is judged to be an inflection point when the cosine of its opening angle $\gamma$ reaches a maximum and exceeds a preset threshold, set to -0.8, at which $\gamma$ is about 2.5 radians.
The cosine of the opening angle $\gamma$ can be calculated with the law of cosines. Let $a$, $b$, $c$ be the three sides of the triangle formed by the current trace point and its preceding and following neighbors, where $\gamma$ is the angle between sides $a$ and $b$ and $c$ is the side opposite $\gamma$. The lengths of the three sides are first computed from the coordinates of the triangle's vertices, and the law of cosines then gives $\cos\gamma = \frac{a^2 + b^2 - c^2}{2ab}$, as shown in Figure 6.
Let $P_i$ and $P_j$, $j > i$, be adjacent inflection points in the trace. The direction of every trace point between these two points, including $P_i$, is set to the direction of the directed line segment from $P_i$ to $P_j$.
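The inflection-point rule can be sketched as below. Several details are assumptions of this illustration rather than statements from the text: "reaches a maximum" is read as a local maximum relative to the two immediate neighbors, the cosine uses the standard law of cosines $\cos\gamma = (a^2 + b^2 - c^2) / (2ab)$ (which agrees with the stated pairing of the threshold -0.8 with about 2.5 radians), angles are resolved with `arctan2`, and the function names are hypothetical.

```python
import numpy as np

def find_inflection_points(points, threshold=-0.8):
    """Return indices of inflection points in one stroke, end points included."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    if n < 2:
        return list(range(n))
    cos_g = np.full(n, -1.0)  # placeholder for end points and degenerate cases
    for i in range(1, n - 1):
        a = np.linalg.norm(pts[i] - pts[i - 1])
        b = np.linalg.norm(pts[i + 1] - pts[i])
        c = np.linalg.norm(pts[i + 1] - pts[i - 1])
        if a > 0 and b > 0:
            cos_g[i] = (a * a + b * b - c * c) / (2 * a * b)
    corners = [0]
    for i in range(1, n - 1):
        # Assumption: "maximum" means not smaller than either neighbor.
        local_max = cos_g[i] >= cos_g[i - 1] and cos_g[i] >= cos_g[i + 1]
        if cos_g[i] > threshold and local_max:
            corners.append(i)
    corners.append(n - 1)
    return corners

def inflection_directions(points, corners):
    """Direction (degrees) per point: from its inflection point to the next one."""
    pts = np.asarray(points, dtype=float)
    theta = np.full(len(pts), np.nan)  # the last point has no direction
    for s, e in zip(corners[:-1], corners[1:]):
        d = pts[e] - pts[s]
        theta[s:e] = np.degrees(np.arctan2(d[1], d[0])) % 360.0
    return theta
```

On an L-shaped stroke such as `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]`, the corner at index 2 is detected, and every point on each arm receives the direction of that arm as a whole, so small jitter between neighboring samples no longer affects the direction value.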
The direction attribute coefficients of each trace point are then recalculated from its adjacent-inflection-point direction, and the four kinds of direction attribute coefficients are summed within each sub-block of the spatial partition, giving another $K_1 \times K_1 \times 4$ feature dimensions.
The direction feature is the combination of these two kinds of features, $K_1 \times K_1 \times 8$ dimensions in total.
2.2 Extraction of edge features
Edge features differ from direction features in that they better reflect the peripheral structural information of a Chinese character.
Taking the left-to-right direction as an example, edge features are extracted as follows. The left half of the image corresponding to the preprocessed on-line handwriting trace is divided at equal intervals into $K_2$ horizontal sub-regions, as shown in Figure 7(a). Each sub-region is scanned row by row in the direction of the arrow, that is, from the left edge of the image toward the right. In the $i$-th row scan, when a coordinate point is first found to be a trace point, the four adjacent-point direction attribute coefficients of that trace point are calculated and denoted $f^{(h)}_{i,1}, f^{(s)}_{i,1}, f^{(p)}_{i,1}, f^{(n)}_{i,1}$; if no trace point is found, these four coefficients are 0. Scanning then continues, and when another coordinate point is found to be a trace point, its adjacent-point direction attribute coefficients are calculated and denoted $f^{(h)}_{i,2}, f^{(s)}_{i,2}, f^{(p)}_{i,2}, f^{(n)}_{i,2}$; likewise, if no second trace point is found, these four coefficients are 0. When the row scans finish, the coefficients obtained from each row are summed separately, giving an 8-dimensional feature. The $K_2$ sub-regions give $K_2 \times 8$ edge feature dimensions in total.
The same procedure is repeated from the remaining 7 arrow directions, namely from the right, top, and bottom edges and along the diagonal directions, as shown in Figure 7(b), where the arrows indicate both the equal spatial division and the scanning direction. This gives $K_2 \times 8 \times 8$ edge feature dimensions in total.
Combining the direction features and the edge features gives the complete statistical structural feature vector $V$ of an on-line handwritten Chinese character.
3. Feature transformation
The original features extracted above have a high dimension; when the number of samples is not very large, this increases computational complexity and degrades classifier performance. Before the original features are passed to the classifier, they are therefore transformed, compressing the high-dimensional original features into a low-dimensional feature space. The present invention uses linear discriminant analysis (LDA) for the feature transformation. Let $\{V_i^{(j)}, 1 \le i \le N_j, 1 \le j \le C\}$ be the set of original feature vectors, where $V_i^{(j)}$ is the original feature vector extracted from the $i$-th sample of class $j$, $N_j$ is the number of samples in class $j$, and $C$ is the number of classes. Each class represents one Chinese character in the national standard character set. The mean of each class and the mean over all classes are calculated as:
$$\mu_j = \frac{1}{N_j} \sum_{i=1}^{N_j} V_i^{(j)}, \qquad \mu = \frac{1}{C} \sum_{j=1}^{C} \mu_j$$
Then the within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$ are calculated:
$$S_w = \frac{1}{C} \sum_{j=1}^{C} \left( \frac{1}{N_j} \sum_{i=1}^{N_j} \left(V_i^{(j)} - \mu_j\right)\left(V_i^{(j)} - \mu_j\right)^T \right)$$
$$S_b = \frac{1}{C} \sum_{j=1}^{C} (\mu_j - \mu)(\mu_j - \mu)^T$$
$|(S_b + S_w)/S_w|$ is chosen as the optimization criterion; that is, a linear transformation matrix $A$ is sought that maximizes $\left| \frac{A^T (S_b + S_w) A}{A^T S_w A} \right|$.
The transformation matrix $A$ is an $n \times m$ matrix, where $n$ is the dimension of the original features and $m$ is the chosen dimension after transformation. $A$ is obtained as follows: the matrix $S_w^{-1}(S_b + S_w)$ is decomposed into eigenvalues $\{\gamma_i, i = 1, 2, \ldots, n\}$, sorted in descending order, and the corresponding eigenvectors $\xi_i$, $i = 1, 2, \ldots, n$. The first $m$ eigenvectors form the matrix $A = [\xi_1, \xi_2, \ldots, \xi_m]$, which is the required linear transformation matrix.
The feature transformation formula is:
$$Y = A^T \cdot V$$
where $V$ is the original structural feature vector and $Y$ is the feature vector after transformation.
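The LDA steps above can be sketched with NumPy as follows. The function name `lda_transform` is hypothetical, and the sketch assumes that $S_w$ is nonsingular, which requires enough samples per class relative to the feature dimension; a real system with high-dimensional features may need regularization that the text does not describe.

```python
import numpy as np

def lda_transform(X, labels, m):
    """Return the n x m matrix A built from the top-m eigenvectors.

    X: (num_samples, n) original feature vectors, one per row.
    labels: class label of each row.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n = X.shape[1]
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    mu = means.mean(axis=0)                  # mean of the class means
    Sw = np.zeros((n, n))
    for c, mu_c in zip(classes, means):
        D = X[labels == c] - mu_c
        Sw += D.T @ D / len(D)               # per-class covariance
    Sw /= len(classes)
    B = means - mu
    Sb = B.T @ B / len(classes)
    # Eigen-decomposition of Sw^{-1} (Sb + Sw); assumes Sw is invertible.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb + Sw))
    order = np.argsort(evals.real)[::-1]     # descending eigenvalues
    return evecs[:, order[:m]].real
```

With samples stored as rows, the transformed features are `X @ A`, which is the row-wise form of $Y = A^T \cdot V$.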
4. Classifier design
The present invention uses a modified quadratic discriminant function (MQDF) classifier based on a Gaussian model. The standard quadratic discriminant function (QDF) classifier is introduced first. Its decision function is:
$$g_j(Y) = \sum_{i=1}^{m} \frac{\left((Y - \mu_j)^T \xi_i^{(j)}\right)^2}{\lambda_i^{(j)}} + \sum_{i=1}^{m} \log \lambda_i^{(j)}$$
where $Y$ is the input feature vector, $m$ is the feature dimension, $\mu_j$ is the mean vector of class $j$, $\xi_i^{(j)}$ is the $i$-th eigenvector of the covariance matrix of class $j$, and $\lambda_i^{(j)}$ is the $i$-th eigenvalue of the covariance matrix of class $j$. The input $Y$ is classified by the following rule:
$Y$ is assigned to class $i$ if $g_i(Y) = \min_{1 \le j \le C} g_j(Y)$, where $C$ is the number of classes.
In practice, small eigenvalues are estimated inaccurately, which degrades the performance of QDF. To reduce this adverse effect on classification performance, the modified classifier MQDF is used. MQDF replaces eigenvalues that are too small with a predetermined constant; its discriminant function is:
$$g_j(Y) = \sum_{i=1}^{k} \frac{\left((Y - \mu_j)^T \xi_i^{(j)}\right)^2}{\lambda_i^{(j)}} + \sum_{i=k+1}^{m} \frac{\left((Y - \mu_j)^T \xi_i^{(j)}\right)^2}{\lambda} + \sum_{i=1}^{k} \log \lambda_i^{(j)} + \sum_{i=k+1}^{m} \log \lambda, \quad j = 1, 2, \ldots, C$$
where $k$ is a positive integer less than $m$ and $\lambda$ is a constant. Both $k$ and $\lambda$ are empirical parameters determined by experiment. In classification, the input $Y$ is assigned to the class for which $g_j(Y)$ is smallest.
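The MQDF decision rule can be sketched as follows. The function names and the layout of a class model as a `(mean, eigenvalues, eigenvectors)` tuple are assumptions of this illustration; the values of `k` and `lam` in any usage are placeholders, since the text only says they are determined by experiment.

```python
import numpy as np

def mqdf_score(Y, mu, eigvals, eigvecs, k, lam):
    """Evaluate g_j(Y) for one class.

    eigvals: (m,) eigenvalues sorted in descending order.
    eigvecs: (m, m) matrix whose columns are the matching eigenvectors.
    The m - k smallest eigenvalues are replaced by the constant lam.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    proj = np.asarray(eigvecs, dtype=float).T @ (np.asarray(Y, dtype=float) - mu)
    denom = np.concatenate((eigvals[:k], np.full(len(eigvals) - k, lam)))
    return float(np.sum(proj ** 2 / denom) + np.sum(np.log(denom)))

def mqdf_classify(Y, models, k, lam):
    """Return the index of the class model with the smallest g_j(Y)."""
    scores = [mqdf_score(Y, mu, w, v, k, lam) for mu, w, v in models]
    return int(np.argmin(scores))
```

Setting `k` equal to the feature dimension reduces `mqdf_score` to the standard QDF decision function, which makes the relationship between the two formulas easy to check numerically.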
The invention is characterized in that it is an on-line handwritten Chinese character recognition method based on statistical structural features, comprising the following steps in order:
(1) Preprocess the input on-line handwriting trace.
Suppose the trace of an on-line handwritten Chinese character is $P(x_1, y_1), P(x_2, y_2), \ldots, P(x_i, y_i), \text{(break)}, P(x_{i+1}, y_{i+1}), \ldots, P(x_N, y_N)$. The following preprocessing steps are carried out in order.
(1.1) Remove isolated-point noise.
Remove from the trace point sequence any stroke that consists of only one or two points.
(1.2) Filter out jagged noise.
Take a weighted average of the coordinates of neighboring points with the following formulas, which acts as a low-pass filter:
$$x'_i = \frac{1}{4}\left(x_{i-1} + 2 x_i + x_{i+1}\right)$$
$$y'_i = \frac{1}{4}\left(y_{i-1} + 2 y_i + y_{i+1}\right)$$
(1.3) Resample to remove uneven pen speed.
Resample the written trajectory at a fixed length interval with the following formulas, so that a stroke of a given length is represented by a fixed number of points:
$$x''_j = \left[ x'_i \cdot (s_{i+1} - jL) + x'_{i+1} \cdot (jL - s_i) \right] / d_i$$
$$y''_j = \left[ y'_i \cdot (s_{i+1} - jL) + y'_{i+1} \cdot (jL - s_i) \right] / d_i$$
Here $L$ is the fixed sampling interval, set to the constant 1; $(x'_i, y'_i)$, $1 \le i \le N$, are the $N$ coordinate points of the stroke to be resampled, and $i$ is chosen so that $s_i \le jL < s_{i+1}$; $d_i = \sqrt{(x'_{i+1} - x'_i)^2 + (y'_{i+1} - y'_i)^2}$ is the distance between two consecutive points; $s_i = \sum_{k=0}^{i-1} d_k$ is the cumulative length, with $s_0 = 0$; and $(x''_j, y''_j)$, $j = 0, 1, \ldots, \left[ s_N / L \right]$, are the new coordinate points obtained by resampling.
(1.4) Reshape by the density equalization method.
First convert the trace of the on-line character into a character image $[f(x'', y'')]_{W \times H}$ of width $W$ and height $H$. Each trace point $P(x''_i, y''_i)$ corresponds to a black pixel, $f(x''_i, y''_i) = 1$, and all other pixels are white, $f(x'', y'') = 0$. Compute the density projections $U(x'')$ and $V(y'')$ of the image in the horizontal and vertical directions:
$$U(x'') = \sum_{y''=1}^{H} f(x'', y'') + \alpha_U, \quad x'' = 1, 2, \ldots, W$$
$$V(y'') = \sum_{x''=1}^{W} f(x'', y'') + \alpha_V, \quad y'' = 1, 2, \ldots, H$$
where $\alpha_U$ and $\alpha_V$ are bias constants, set here to $\alpha_U = \alpha_V = 6$. The trace point with original coordinates $(x'', y'')$ then has the new coordinates $(x''', y''')$:
$$x''' = \frac{\sum_{k=1}^{x''} U(k) \times W'}{\sum_{k=1}^{W} U(k)}, \qquad y''' = \frac{\sum_{l=1}^{y''} V(l) \times H'}{\sum_{l=1}^{H} V(l)}$$
where $W'$ is the maximum abscissa and $H'$ the maximum ordinate after reshaping.
(1.5) Interpolate points and delete coincident points.
Join the trace points within each natural stroke end to end in order, insert into the trace sequence the points on the connecting lines that do not coincide with original trace points, and remove coincident points among adjacent trace points.
(2) Extract statistical structural features
Direction features and edge features are extracted from the preprocessed on-line handwriting trace and combined into the original statistical structural features. They are extracted as follows:
(2.1) Extract direction features
The direction feature is the combination of the adjacent-point direction feature and the adjacent-inflection-point direction feature, which are extracted as follows:
(2.1.1) Extract the adjacent-point direction feature
(a) First calculate the adjacent-point direction of every trace point except the last: the direction $\theta_i$ of the directed line segment from $P_i$ to $P_{i+1}$, with range $[0^\circ, 360^\circ)$. The direction of the last point is marked invalid.
(b) Then calculate the four direction attribute coefficients of each trace point from its direction value $\theta_i$ with the following functions:
Horizontal-stroke direction attribute coefficient function $f^{(h)}(\theta)$
[Formula image C20051001151000171 in the original; not reproduced here]
Vertical-stroke direction attribute coefficient function $f^{(s)}(\theta)$
[Formula image C20051001151000172 in the original; not reproduced here]
Left-falling-stroke direction attribute coefficient function $f^{(p)}(\theta)$
[Formula image C20051001151000173 in the original; not reproduced here]
Right-falling-stroke direction attribute coefficient function $f^{(n)}(\theta)$
[Formula image C20051001151000174 in the original; not reproduced here]
The six parameters $\alpha_1$ to $\alpha_6$ are angle thresholds that determine the shape of the direction attribute coefficient functions. In the present invention they are set to $\alpha_1 = -10^\circ$, $\alpha_2 = 260^\circ$, $\alpha_3 = 280^\circ$, $\alpha_4 = 250^\circ$, $\alpha_5 = 300^\circ$, $\alpha_6 = 330^\circ$.
(c) Divide the space occupied by the trace point coordinates evenly into $K_1 \times K_1$ sub-blocks, and for each sub-block sum the four kinds of direction attribute coefficients of all trace points in it separately. Taking sub-block $(k, l)$, $1 \le k \le K_1$, $1 \le l \le K_1$, as an example, the four feature values are:
$$F^{(h)}_{k,l} = \sum_{P(x,y) \in D(k,l)} f^{(h)}(\theta)$$
$$F^{(s)}_{k,l} = \sum_{P(x,y) \in D(k,l)} f^{(s)}(\theta)$$
$$F^{(p)}_{k,l} = \sum_{P(x,y) \in D(k,l)} f^{(p)}(\theta)$$
$$F^{(n)}_{k,l} = \sum_{P(x,y) \in D(k,l)} f^{(n)}(\theta)$$
In each formula $\theta$ is the direction value of the point $P(x, y)$.
This gives the $K_1 \times K_1 \times 4$-dimensional adjacent-point direction feature.
(2.1.2) Extract the adjacent-inflection-point direction feature
Determine the inflection points in the trace by polygonal approximation. An inflection point is a point at which the writing direction of the stroke changes sharply, and includes the corner points of a stroke. First calculate the cosine of the opening angle between each point in the stroke and its neighboring points;
The cosine of the opening angle $\gamma$ can be calculated with the law of cosines. Let $a$, $b$, $c$ be the three sides of the triangle formed by the current trace point and its preceding and following neighbors, where $\gamma$ is the angle between sides $a$ and $b$ and $c$ is the side opposite $\gamma$. The lengths of the three sides are first computed from the coordinates of the triangle's vertices, and the law of cosines then gives $\cos\gamma = \frac{a^2 + b^2 - c^2}{2ab}$;
A point is judged to be an inflection point when the cosine of its opening angle $\gamma$ reaches a maximum and exceeds a preset threshold, set to -0.8, at which $\gamma$ is about 2.5 radians. Stroke end points are also treated as inflection points.
Calculate the adjacent-inflection-point direction of each trace point: let $P_i$ and $P_j$, $j > i$, be adjacent inflection points in the trace. The direction of every trace point between these two points, including $P_i$, is set to the direction of the directed line segment from $P_i$ to $P_j$.
Repeat steps (b) and (c) of (2.1.1) to obtain the $K_1 \times K_1 \times 4$-dimensional adjacent-inflection-point direction feature.
(2.2) Extract edge features
First extract the edge feature of the left-to-right scan. The left half of the image corresponding to the preprocessed on-line handwriting trace is divided at equal intervals into $K_2$ horizontal sub-regions, as shown in Figure 7(a), and scanned row by row in the direction of the arrow, that is, from the left edge of the image toward the right. In the $i$-th row scan, when a coordinate point is found to be a trace point for the first time, the four adjacent-point direction attribute coefficients of that trace point are calculated and denoted $f^{(h)}_{i,1}, f^{(s)}_{i,1}, f^{(p)}_{i,1}, f^{(n)}_{i,1}$; if no trace point is found, these four coefficients are 0. Scanning continues, and when a coordinate point is found to be a trace point for the second time, its adjacent-point direction attribute coefficients are calculated and denoted $f^{(h)}_{i,2}, f^{(s)}_{i,2}, f^{(p)}_{i,2}, f^{(n)}_{i,2}$; likewise, if no second trace point is found, these four coefficients are 0. When the row scans finish, the coefficients obtained from each row are summed separately, giving an 8-dimensional feature:
[Formula image C20051001151000182 in the original; not reproduced here]
The $K_2$ sub-regions give $K_2 \times 8$ edge feature dimensions in total.
Then repeat the above steps from the right, top, and bottom edges and along the four diagonal scanning directions, as shown in Figure 7(b), to obtain $K_2 \times 8 \times 8$ edge feature dimensions in total.
(3) Feature transformation
Recognition features are extracted from the original statistical structural features by linear discriminant analysis (LDA), which improves the feature distribution and the recognition performance. It comprises the following steps in order:
(3.1) Compute the mean $\mu_j$ of each class and the mean $\mu$ of all classes:
$$\mu_j = \frac{1}{N_j}\sum_{i=1}^{N_j} V_i^{(j)}, \qquad \mu = \frac{1}{C}\sum_{j=1}^{C}\mu_j$$
where $V_i^{(j)}$ is the original feature vector extracted from the $i$-th sample of the $j$-th class, $N_j$ is the number of samples of the $j$-th class, and $C$ is the number of classes.
(3.2) Compute the within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$:
$$S_w = \frac{1}{C}\sum_{j=1}^{C}\left(\frac{1}{N_j}\sum_{i=1}^{N_j}\left(V_i^{(j)}-\mu_j\right)\left(V_i^{(j)}-\mu_j\right)^T\right)$$
$$S_b = \frac{1}{C}\sum_{j=1}^{C}\left(\mu_j-\mu\right)\left(\mu_j-\mu\right)^T$$
(3.3) Perform an eigenvalue and eigenvector decomposition of the matrix $S_w^{-1}(S_b+S_w)$, obtaining the eigenvalues $\gamma_i$, $i=1,2,\dots,n$, sorted in descending order, and the eigenvectors $\xi_i$, $i=1,2,\dots,n$.
(3.4) Form the linear transformation matrix $A=[\xi_1,\xi_2,\dots,\xi_m]$ from the first $m$ eigenvectors.
(3.5) Compute the transformed feature vector $Y$ from the original feature $V$ and the transformation matrix $A$:
$$Y = A^T V$$
(4) Perform online handwritten Chinese character recognition with the MQDF classifier.
Recognition with the MQDF classifier comprises two parts: first, a recognition library file is generated by training on samples collected in advance, using the recognition features obtained above; the recognition library is then used to recognize the actual input samples.
(4.1) Training process:
(4.1.1) For each class $j$, compute its mean $\mu_j$ and covariance matrix $\Sigma_j$ from the $m$-dimensional recognition features obtained above:
$$\mu_j = \frac{1}{N_j}\sum_{i=1}^{N_j} Y_i^{(j)}, \qquad \Sigma_j = \frac{1}{N_j}\sum_{i=1}^{N_j}\left(Y_i^{(j)}-\mu_j\right)\left(Y_i^{(j)}-\mu_j\right)^T$$
where $Y_i^{(j)}$ is the recognition feature vector extracted from the $i$-th sample of the $j$-th class and $N_j$ is the number of samples of the $j$-th class.
(4.1.2) Perform an eigenvalue and eigenvector decomposition of the covariance matrix $\Sigma_j$ of each class, obtaining the eigenvalues $\lambda_i^{(j)}$, $i=1,2,\dots,m$, sorted in descending order, and the eigenvectors $\xi_i^{(j)}$, $i=1,2,\dots,m$.
(4.1.3) Compute the substitute value for the small eigenvalues:
$$\lambda = \frac{1}{C}\sum_{j=1}^{C}\lambda_{k+1}^{(j)}$$
where $k$ is a positive integer less than $m$, determined by experiment.
(4.1.4) Store the quantities obtained above, namely $\lambda_i^{(j)}$ ($j=1,2,\dots,C$; $i=1,2,\dots,k$), $\xi_i^{(j)}$ ($j=1,2,\dots,C$; $i=1,2,\dots,m$), $\mu_j$ ($j=1,2,\dots,C$) and $\lambda$, in the recognition library file for use in subsequent recognition.
(4.2) Recognition process:
(4.2.1) Obtain the recognition feature $Y$ from the sample to be recognized and compute the decision function $g_j(Y)$ of each class:
$$g_j(Y) = \sum_{i=1}^{k}\frac{\left((Y-\mu_j)^T\xi_i^{(j)}\right)^2}{\lambda_i^{(j)}} + \sum_{i=k+1}^{m}\frac{\left((Y-\mu_j)^T\xi_i^{(j)}\right)^2}{\lambda} + \sum_{i=1}^{k}\log\lambda_i^{(j)} + \sum_{i=k+1}^{m}\log\lambda$$
where $m$ and $k$ take the same values as in the training process.
(4.2.2) The input sample to be recognized is assigned to the class for which $g_j(Y)$ is smallest.
Experiments show that the average recognition rate of the present invention is 98.43%, which is a satisfactory result.
Description of drawings
Fig. 1: Structure of the online handwritten Chinese character recognition system.
Fig. 2: Preprocessing flow.
Fig. 3: Method of computing the direction of a stroke point.
Fig. 4: Method of computing the direction attribute coefficients.
Fig. 5: Method of dividing the stroke point coordinate space into sub-blocks.
Fig. 6: Method of computing inflection points.
Fig. 7: Method of extracting the edge feature.
Fig. 8: Flow of the LDA feature transformation.
Embodiment
To implement the online handwritten Chinese character recognition system based on statistical structural features, a recognition library must first be obtained by training; online handwritten Chinese characters can then be recognized using this library. A practical system therefore has to implement both the training process and the recognition process; its structure is shown in Fig. 1. Part of the processing is identical in the two processes.
Each part of the system is described in detail below:
A. Implementation of the training process
A.1 Preprocessing
The preprocessing flow is shown in Fig. 2. Suppose the handwriting of an online handwritten Chinese character is: $P(x_1,y_1), P(x_2,y_2), \dots, P(x_i,y_i)$, (break), $P(x_{i+1},y_{i+1}), \dots, P(x_N,y_N)$.
First, isolated-point noise is removed: strokes consisting of only one or two points are deleted from the stroke point sequence.
Then the coordinates of neighbouring points are averaged with weights to filter out jagged (sawtooth) noise. The filtering formula is:
$$x_i' = \frac{1}{4}\left(x_{i-1} + 2x_i + x_{i+1}\right)$$
$$y_i' = \frac{1}{4}\left(y_{i-1} + 2y_i + y_{i+1}\right)$$
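The filter above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the function name `smooth_stroke` is hypothetical, and keeping the two stroke endpoints unchanged is an assumption, since the text does not say how points without two neighbours are handled.

```python
def smooth_stroke(points):
    """Apply the (1, 2, 1)/4 weighted average to the interior points of one stroke.

    points: list of (x, y) tuples belonging to a single stroke.
    Assumption: the first and last points are kept unchanged.
    """
    if len(points) < 3:
        return list(points)
    smoothed = [points[0]]
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        smoothed.append(((x0 + 2 * x1 + x2) / 4, (y0 + 2 * y1 + y2) / 4))
    smoothed.append(points[-1])
    return smoothed
```

Applying the function stroke by stroke keeps the averaging from mixing points across a pen lift.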
The purpose of resampling is to eliminate the effect of uneven pen speed. The written trajectory is resampled at a fixed arc-length interval, so that a stroke of a given length is represented by a given number of points. The sampling formula is:
$$x_j'' = \left[x_i'\left(S_{i+1} - jL\right) + x_{i+1}'\left(jL - S_i\right)\right] / d_i$$
$$y_j'' = \left[y_i'\left(S_{i+1} - jL\right) + y_{i+1}'\left(jL - S_i\right)\right] / d_i$$
In the above formula, $L$ is the fixed sampling interval, a constant set to 1; $(x_i', y_i')$ are the $N$ coordinate points of the stroke to be resampled; $i$ satisfies $1 \le i \le N$ and $S_i \le jL < S_{i+1}$; $d_i = \sqrt{(x_{i+1}' - x_i')^2 + (y_{i+1}' - y_i')^2}$ is the distance between two consecutive points; $S_i = \sum_{k=0}^{i-1} d_k$ is the cumulative length, with $S_0 = 0$; and $(x_j'', y_j'')$, $j = 0, 1, \dots, \left[\frac{S_N}{L}\right]$, are the new coordinate points obtained by resampling.
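As a rough sketch of the resampling formula, the following function interpolates linearly along the cumulative arc length. The name `resample_stroke` is hypothetical, indices are 0-based (so `S[0] = 0` and `S[i + 1] = S[i] + d[i]`), and dropping consecutive duplicate points beforehand is an assumption made here to avoid zero-length segments.

```python
import math

def resample_stroke(points, L=1.0):
    """Resample one stroke at arc-length spacing L by linear interpolation."""
    # Assumption: consecutive duplicate points are removed so that every d[i] > 0.
    pts = [p for i, p in enumerate(points) if i == 0 or p != points[i - 1]]
    if len(pts) < 2:
        return list(pts)
    # d[i]: length of segment i; S[i]: cumulative length up to point i.
    d = [math.hypot(q[0] - p[0], q[1] - p[1]) for p, q in zip(pts, pts[1:])]
    S = [0.0]
    for di in d:
        S.append(S[-1] + di)
    out = []
    i = 0
    for j in range(int(S[-1] // L) + 1):
        t = j * L
        # Advance to the segment with S[i] <= t < S[i + 1] (or the last segment).
        while i < len(d) - 1 and t >= S[i + 1]:
            i += 1
        (x0, y0), (x1, y1) = pts[i], pts[i + 1]
        out.append(((x0 * (S[i + 1] - t) + x1 * (t - S[i])) / d[i],
                    (y0 * (S[i + 1] - t) + y1 * (t - S[i])) / d[i]))
    return out
```

With `L = 1`, a stroke of total length 7 yields 8 resampled points, one per unit of arc length including the start.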
The task of shape normalization is to eliminate part of the writing deformation of the character to be recognized. It combines linear and nonlinear normalization, so that the region occupied by the character is mapped to an area of fixed size and the strokes are distributed more evenly in space. After normalization every stroke point is transformed to a new coordinate, computed by the density equalization method. First, the handwriting of the online Chinese character is converted into a character image $[f(x'', y'')]_{W \times H}$ of width $W$ and height $H$: the pixel at the coordinate of any stroke point $P(x_i'', y_i'')$ is black, $f(x_i'', y_i'') = 1$, and all other pixels are white, $f(x'', y'') = 0$. $U(x'')$ and $V(y'')$ denote the pixel density projections in the horizontal and vertical directions respectively, that is:
$$U(x'') = \sum_{y''=1}^{H} f(x'', y'') + \alpha_U, \qquad x'' = 1, 2, \dots, W$$
$$V(y'') = \sum_{x''=1}^{W} f(x'', y'') + \alpha_V, \qquad y'' = 1, 2, \dots, H$$
where $\alpha_U$ and $\alpha_V$ are bias constants; in the present invention $\alpha_U = \alpha_V = 6$. A stroke point with original coordinate $(x'', y'')$ then has the new coordinate $(x''', y''')$:
$$x''' = \frac{\sum_{k=1}^{x''} U(k) \times W'}{\sum_{k=1}^{W} U(k)}, \qquad y''' = \frac{\sum_{l=1}^{y''} V(l) \times H'}{\sum_{l=1}^{H} V(l)}$$
where $W'$ is the maximum horizontal coordinate after processing and $H'$ is the maximum vertical coordinate after processing. In the present invention $W' = H' = 64$.
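The density equalization mapping can be sketched as follows. The function name `density_equalize` is hypothetical, and the sketch assumes that the stroke points have already been rounded to integer pixel coordinates in the ranges 1..W and 1..H.

```python
from collections import Counter

def density_equalize(points, W, H, W_out=64, H_out=64, alpha=6):
    """Map integer stroke points (x in 1..W, y in 1..H) to density-equalized coordinates."""
    pixels = set(points)                 # f(x, y) = 1 exactly at stroke pixels
    col = Counter(x for x, _ in pixels)  # black pixels per column
    row = Counter(y for _, y in pixels)  # black pixels per row
    # cum_u[x] = U(1) + ... + U(x), with U(x) = column count + alpha
    cum_u = [0]
    for x in range(1, W + 1):
        cum_u.append(cum_u[-1] + col[x] + alpha)
    # cum_v[y] = V(1) + ... + V(y), with V(y) = row count + alpha
    cum_v = [0]
    for y in range(1, H + 1):
        cum_v.append(cum_v[-1] + row[y] + alpha)
    return [(cum_u[x] * W_out / cum_u[W], cum_v[y] * H_out / cum_v[H])
            for x, y in points]
```

Because the projections are accumulated, dense columns and rows are stretched and sparse ones are compressed, while the bias `alpha` keeps empty columns from collapsing to zero width.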
A.2 Extraction of statistical structural features
This step extracts features suited to the structural characteristics of online handwritten Chinese characters from the preprocessed handwriting. Two kinds of statistical structural features are designed and extracted in the present invention, called the direction feature and the edge feature.
A.2.1 Extraction of the direction feature
The direction feature is formed by merging two features: the adjacent-point direction feature and the adjacent-inflection-point direction feature.
The adjacent-point direction feature is extracted as follows:
1) First compute the adjacent-point direction of every stroke point except the last: the direction $\theta_i$ of the directed line segment from the current point $P_i$ to the next point $P_{i+1}$, with range $[0^\circ, 360^\circ)$. The direction of the last point is marked invalid.
2) From the direction value $\theta_i$ of each stroke point, compute the 4 direction attribute coefficients of that point as follows:
Horizontal-stroke direction attribute coefficient function
Figure C20051001151000224
Vertical-stroke direction attribute coefficient function
Figure C20051001151000225
Left-falling-stroke direction attribute coefficient function
Right-falling-stroke direction attribute coefficient function
The six parameters $\alpha_1$ to $\alpha_6$ are angle thresholds that determine the shape of the direction attribute coefficient functions. In the present invention they are set to: $\alpha_1 = -10^\circ$, $\alpha_2 = 260^\circ$, $\alpha_3 = 280^\circ$, $\alpha_4 = 250^\circ$, $\alpha_5 = 300^\circ$, $\alpha_6 = 330^\circ$.
3) Divide the space occupied by the stroke point coordinates evenly into $K_1 \times K_1$ sub-blocks, and in each sub-block sum each of the 4 direction attribute coefficients over all stroke points in it. Taking the $(k, l)$-th sub-block ($1 \le k \le K_1$, $1 \le l \le K_1$) as an example, the 4-dimensional feature obtained is:
$$F_{k,l}^{(h)} = \sum_{P(x,y) \in D(k,l)} f^{(h)}(\theta)$$
$$F_{k,l}^{(s)} = \sum_{P(x,y) \in D(k,l)} f^{(s)}(\theta)$$
$$F_{k,l}^{(p)} = \sum_{P(x,y) \in D(k,l)} f^{(p)}(\theta)$$
$$F_{k,l}^{(n)} = \sum_{P(x,y) \in D(k,l)} f^{(n)}(\theta)$$
where $\theta$ is the direction value of the point $P(x, y)$.
In the present invention $K_1 = 8$, so the adjacent-point direction feature has $8 \times 8 \times 4 = 256$ dimensions.
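Steps 1) and 3) can be sketched as below. The four direction attribute coefficient functions are given only as formula images in the source, so the sketch takes them as a caller-supplied list `coeff_fns` rather than guessing their shape. The function names are hypothetical, and using `atan2` to cover the full $[0^\circ, 360^\circ)$ range is an implementation choice made here.

```python
import math

def adjacent_point_directions(stroke):
    """Direction in degrees, in [0, 360), from each point to its successor.

    The last point has no successor, so its direction is None (invalid).
    """
    dirs = []
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        dirs.append(math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0)
    dirs.append(None)
    return dirs

def block_direction_feature(points, dirs, coeff_fns, size=64, K1=8):
    """Sum each coefficient function over the points in each of K1 x K1 sub-blocks.

    coeff_fns: list of functions theta -> coefficient (supplied by the caller;
    the patent's own functions are not reproduced here).
    Returns a flat list of K1 * K1 * len(coeff_fns) values.
    """
    n = len(coeff_fns)
    feat = [0.0] * (K1 * K1 * n)
    cell = size / K1
    for (x, y), theta in zip(points, dirs):
        if theta is None:
            continue  # skip points with an invalid direction
        k = min(int(x / cell), K1 - 1)
        l = min(int(y / cell), K1 - 1)
        base = (l * K1 + k) * n
        for c, fn in enumerate(coeff_fns):
            feat[base + c] += fn(theta)
    return feat
```

With four coefficient functions and `K1 = 8`, the returned list has 256 entries, matching the dimension stated above.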
The adjacent-inflection-point direction feature is extracted as follows:
The inflection points in the handwriting are determined by polygonal approximation. An inflection point is a point at which the writing direction of the stroke changes sharply. First compute, for each point in the stroke, the cosine of the angle it subtends with its neighbouring points.
The cosine of the subtended angle $\gamma$ can be computed with the law of cosines. Let $a$, $b$, $c$ be the three sides of the triangle formed by the current stroke point and the stroke points immediately before and after it, where $\gamma$ is the angle between sides $a$ and $b$, and $c$ is the side opposite $\gamma$. First compute the lengths of the three sides from the coordinates of the triangle's vertices; the law of cosines then gives $\cos\gamma = \dfrac{a^2 + b^2 - c^2}{2ab}$.
A point is judged to be an inflection point when the cosine of its subtended angle $\gamma$ reaches a local maximum and exceeds a set threshold. The threshold is set to $-0.8$, at which $\gamma$ is about 2.5 radians. Stroke endpoints are also treated as inflection points.
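The inflection test can be sketched as follows, using the standard law of cosines, $\cos\gamma = (a^2 + b^2 - c^2)/(2ab)$, under which a straight stroke gives $\cos\gamma = -1$ and the threshold $-0.8$ corresponds to $\gamma \approx 2.5$ rad. The function names are hypothetical, the input is assumed to contain no coincident consecutive points, and treating ties with a neighbour as a local maximum is an assumption.

```python
import math

def subtended_cosines(stroke):
    """Cosine of the angle at each interior point between its two neighbours."""
    cosines = []
    for p, q, r in zip(stroke, stroke[1:], stroke[2:]):
        a = math.hypot(q[0] - p[0], q[1] - p[1])  # previous point to current point
        b = math.hypot(r[0] - q[0], r[1] - q[1])  # current point to next point
        c = math.hypot(r[0] - p[0], r[1] - p[1])  # side opposite the angle
        cosines.append((a * a + b * b - c * c) / (2 * a * b))
    return cosines

def find_inflections(stroke, threshold=-0.8):
    """Indices of inflection points in one stroke, including both endpoints."""
    n = len(stroke)
    if n < 3:
        return list(range(n))
    cos_g = subtended_cosines(stroke)  # cos_g[t] belongs to point t + 1
    idx = [0]
    for t, c in enumerate(cos_g):
        left = cos_g[t - 1] if t > 0 else -1.0
        right = cos_g[t + 1] if t + 1 < len(cos_g) else -1.0
        # Local maximum of the cosine that also exceeds the threshold.
        if c > threshold and c >= left and c >= right:
            idx.append(t + 1)
    idx.append(n - 1)
    return idx
```

For an L-shaped stroke the only interior inflection point is the corner, where the cosine rises from $-1$ to about 0.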
Compute the adjacent-inflection-point direction of each stroke point: let $P_i$ and $P_j$, $j > i$, be two adjacent inflection points in the stroke point sequence; the direction of every stroke point between them, including $P_i$, is set to the direction of the directed line segment from $P_i$ to $P_j$.
Repeat steps 2) and 3) of the adjacent-point direction feature extraction above to obtain the 256-dimensional adjacent-inflection-point direction feature.
The adjacent-point direction feature and the adjacent-inflection-point direction feature are merged into a 512-dimensional direction feature.
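A minimal sketch of the direction assignment between adjacent inflection points is given below. The function name is hypothetical, the inflection indices are assumed to be sorted and to include both stroke endpoints, and leaving the final point's direction as `None` mirrors the adjacent-point rule rather than anything stated for this feature.

```python
import math

def adjacent_inflection_directions(stroke, inflections):
    """Give every point the direction of the segment joining its enclosing inflection points.

    stroke: list of (x, y) points of one stroke.
    inflections: sorted indices of inflection points, including both endpoints.
    """
    dirs = [None] * len(stroke)
    for i, j in zip(inflections, inflections[1:]):
        (xi, yi), (xj, yj) = stroke[i], stroke[j]
        theta = math.degrees(math.atan2(yj - yi, xj - xi)) % 360.0
        for t in range(i, j):  # points from P_i up to, but not including, P_j
            dirs[t] = theta
    return dirs
```

The resulting directions can be passed through the same coefficient functions and sub-block accumulation as the adjacent-point directions.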
A.2.2 Extraction of the edge feature
The edge feature differs from the direction feature in that it reflects the peripheral structural information of a Chinese character well. The edge feature is extracted as follows:
First extract the edge feature for the left-to-right scanning direction: divide the left half of the image corresponding to the preprocessed online handwriting equally into $K_2$ horizontal sub-regions, as shown in Fig. 7(a). In each sub-region, scan row by row in the direction of the arrow, that is, from the left edge of the image towards the right. In the $i$-th row scan, when a stroke point is encountered for the first time, compute the 4 adjacent-point direction attribute coefficients of that point, denoted $f_{i,1}^{(h)}, f_{i,1}^{(s)}, f_{i,1}^{(p)}, f_{i,1}^{(n)}$; if no stroke point is encountered, these 4 coefficients are 0. Continue scanning; when a stroke point is encountered for the second time, compute its adjacent-point direction attribute coefficients, denoted $f_{i,2}^{(h)}, f_{i,2}^{(s)}, f_{i,2}^{(p)}, f_{i,2}^{(n)}$; likewise, if no second stroke point is encountered, these 4 coefficients are 0. When the row scans are finished, sum the coefficients obtained over all rows to obtain an 8-dimensional feature:
$$\sum_i f_{i,1}^{(h)},\ \sum_i f_{i,1}^{(s)},\ \sum_i f_{i,1}^{(p)},\ \sum_i f_{i,1}^{(n)},\ \sum_i f_{i,2}^{(h)},\ \sum_i f_{i,2}^{(s)},\ \sum_i f_{i,2}^{(p)},\ \sum_i f_{i,2}^{(n)}$$
The $K_2$ sub-regions yield a $K_2 \times 8$-dimensional edge feature in total.
Then repeat the above method from the other three edges (right, top and bottom) and along the 4 diagonal scanning directions, to obtain an edge feature of $K_2 \times 8 \times 8$ dimensions in total.
In the present invention $K_2 = 8$, so the edge feature has 512 dimensions.
Merging the direction feature and the edge feature gives the complete 1024-dimensional statistical structural feature of an online handwritten Chinese character.
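The left-to-right scan can be sketched as follows for a single scanning direction. The function name and the input layout are hypothetical: `coeffs` maps each stroke pixel `(x, y)`, 0-based, to its 4 direction attribute coefficients. The sketch assumes that the "second" stroke point is simply the second stroke pixel met in the row, and that the image size is divisible by `K2`.

```python
def edge_feature_left_to_right(coeffs, size=64, K2=8):
    """Edge feature for the left-to-right scan over the left half of the image.

    coeffs: dict mapping a stroke pixel (x, y) to a 4-tuple (h, s, p, n).
    Returns a flat list of K2 * 8 values: per horizontal sub-region, the summed
    coefficients of the first hit (4 values) and of the second hit (4 values).
    """
    rows_per_region = size // K2
    feat = [[0.0] * 8 for _ in range(K2)]
    for y in range(size):
        region = y // rows_per_region
        hits = 0
        for x in range(size // 2):  # scan from the left edge to the middle
            c = coeffs.get((x, y))
            if c is None:
                continue
            for d in range(4):
                feat[region][4 * hits + d] += c[d]
            hits += 1
            if hits == 2:  # only the first two stroke points of a row count
                break
    return [v for region in feat for v in region]
```

Running the same routine on suitably flipped, transposed, or diagonally re-indexed pixel maps would give the other seven scanning directions, for $8 \times 8 \times 8 = 512$ dimensions in total.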
A.3 Feature transformation
The flow of the feature transformation is shown in Fig. 8. It uses linear discriminant analysis (LDA): a transformation matrix $A$ is computed and used to transform and compress the original features into the final recognition features.
The concrete steps of the feature transformation are as follows:
1) First compute the mean of each class and the mean of all classes:
$$\mu_j = \frac{1}{N_j}\sum_{i=1}^{N_j} V_i^{(j)}, \qquad \mu = \frac{1}{C}\sum_{j=1}^{C}\mu_j$$
2) Then compute the within-class scatter matrix $S_w$ and the between-class scatter matrix $S_b$:
$$S_w = \frac{1}{C}\sum_{j=1}^{C}\left(\frac{1}{N_j}\sum_{i=1}^{N_j}\left(V_i^{(j)}-\mu_j\right)\left(V_i^{(j)}-\mu_j\right)^T\right)$$
$$S_b = \frac{1}{C}\sum_{j=1}^{C}\left(\mu_j-\mu\right)\left(\mu_j-\mu\right)^T$$
3) Perform an eigenvalue and eigenvector decomposition of the matrix $S_w^{-1}(S_b+S_w)$, obtaining the eigenvalues $\{\gamma_i, i=1,2,\dots,n\}$, sorted in descending order, and the eigenvectors $\xi_i$, $i=1,2,\dots,n$. Form the matrix $A=[\xi_1,\xi_2,\dots,\xi_m]$ from the first $m$ eigenvectors; $A$ is the required linear transformation matrix. In the present invention, $m$ is set to 128.
The transformation matrix $A$ must be stored in a file for use by the feature transformation of the recognition process.
4) Once the transformation matrix $A$ has been obtained, the final feature is computed with the transformation formula:
$$Y = A^T V$$
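Steps 1) to 4) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the function name `lda_transform` is hypothetical, $S_w$ is assumed to be invertible, and any tiny imaginary parts returned by the general eigen-solver are discarded.

```python
import numpy as np

def lda_transform(samples_by_class, m):
    """Compute the n x m LDA transformation matrix A.

    samples_by_class: list of arrays, one per class, each of shape (N_j, n).
    """
    means = [X.mean(axis=0) for X in samples_by_class]
    mu = np.mean(means, axis=0)          # mean of the class means
    C = len(samples_by_class)
    n = means[0].shape[0]
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for X, mu_j in zip(samples_by_class, means):
        D = X - mu_j
        Sw += D.T @ D / len(X)           # per-class covariance
        diff = (mu_j - mu)[:, None]
        Sb += diff @ diff.T
    Sw /= C
    Sb /= C
    # Eigen-decomposition of Sw^{-1} (Sb + Sw); assumes Sw is invertible.
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb + Sw))
    order = np.argsort(-vals.real)       # descending eigenvalues
    return vecs[:, order[:m]].real

# Usage: A = lda_transform(classes, m); Y = A.T @ V for one original feature vector V.
```

In practice a 1024-dimensional $S_w$ estimated from limited data may be ill-conditioned, in which case some regularization would be needed; the source does not discuss this.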
A.4 Training the MQDF classifier
From the $m$-dimensional recognition features $Y$ obtained above, compute the mean and covariance matrix of each class:
$$\mu_j = \frac{1}{N_j}\sum_{i=1}^{N_j} Y_i^{(j)}, \qquad \Sigma_j = \frac{1}{N_j}\sum_{i=1}^{N_j}\left(Y_i^{(j)}-\mu_j\right)\left(Y_i^{(j)}-\mu_j\right)^T$$
where $Y_i^{(j)}$ is the feature vector extracted from the $i$-th training sample of the $j$-th class, $N_j$ is the number of training samples of the $j$-th class, $\mu_j$ is the mean of the $j$-th class, and $\Sigma_j$ is the covariance matrix of the $j$-th class.
Perform an eigenvalue and eigenvector decomposition of the covariance matrix of each class, obtaining the eigenvalues $\lambda_i^{(j)}$, $i=1,2,\dots,m$, sorted in descending order, and the eigenvectors $\xi_i^{(j)}$, $i=1,2,\dots,m$; $\lambda_i^{(j)}$ is the $i$-th eigenvalue of $\Sigma_j$ and $\xi_i^{(j)}$ is the $i$-th eigenvector of $\Sigma_j$.
The parameter $\lambda$ of the MQDF classifier, that is, the substitute value for the small eigenvalues, is computed as:
$$\lambda = \frac{1}{C}\sum_{j=1}^{C}\lambda_{k+1}^{(j)}$$
In the above formula, $k$ is a positive integer less than $m$; in the present invention $k$ is set to 32, and $C$ is the number of classes.
The parameters above, namely $\lambda_i^{(j)}$ ($j=1,2,\dots,C$; $i=1,2,\dots,k$), $\xi_i^{(j)}$ ($j=1,2,\dots,C$; $i=1,2,\dots,m$), $\mu_j$ ($j=1,2,\dots,C$) and $\lambda$, are stored in the recognition library file for use by the recognition process. This completes the training of the MQDF classifier.
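The training step can be sketched with NumPy as below. The function name `train_mqdf` and the returned tuple layout are hypothetical. For simplicity the sketch keeps all $m$ eigenvalues per class, although only the first $k$ need to be stored according to the text.

```python
import numpy as np

def train_mqdf(features_by_class, k):
    """Estimate MQDF parameters from per-class recognition features.

    features_by_class: list of arrays, one per class, each of shape (N_j, m).
    Returns (params, lam): params[j] = (mu_j, eigenvalues, eigenvectors) with
    eigenvalues in descending order and eigenvectors as matching columns;
    lam is the average of the (k+1)-th largest eigenvalue over all classes.
    """
    params = []
    for Y in features_by_class:
        mu = Y.mean(axis=0)
        D = Y - mu
        cov = D.T @ D / len(Y)
        vals, vecs = np.linalg.eigh(cov)   # symmetric matrix, ascending eigenvalues
        order = np.argsort(vals)[::-1]     # reorder to descending
        params.append((mu, vals[order], vecs[:, order]))
    # 0-based index k is the (k+1)-th largest eigenvalue.
    lam = float(np.mean([vals[k] for _, vals, _ in params]))
    return params, lam
```

`numpy.linalg.eigh` is used because covariance matrices are symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.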
B. Implementation of the recognition process
The recognition process is shown in Fig. 1. As in the training process, recognition first performs preprocessing and then extracts the original statistical structural feature $V$.
For the LDA feature transformation, the recognition process directly uses the transformation matrix $A$ provided by the training process to obtain the recognition feature vector $Y = A^T V$.
For recognition with the MQDF classifier, all relevant classifier parameters are read from the recognition library file provided by the training process. The decision function of the MQDF classifier is:
$$g_j(Y) = \sum_{i=1}^{k}\frac{\left((Y-\mu_j)^T\xi_i^{(j)}\right)^2}{\lambda_i^{(j)}} + \sum_{i=k+1}^{m}\frac{\left((Y-\mu_j)^T\xi_i^{(j)}\right)^2}{\lambda} + \sum_{i=1}^{k}\log\lambda_i^{(j)} + \sum_{i=k+1}^{m}\log\lambda, \qquad j = 1, 2, \dots, C$$
During recognition, $g_j(Y)$ is computed for each class with the above formula, and the classification rule is as follows:
$Y$ is assigned to the $i$-th class if $g_i(Y) = \min_{1 \le j \le C} g_j(Y)$, where $C$ is the number of classes.
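The decision function and classification rule can be sketched as follows. The function names are hypothetical, and the parameter layout (mean, descending eigenvalues, eigenvectors as columns) is an assumption about how the recognition library is loaded.

```python
import numpy as np

def mqdf_score(Y, mu, eigvals, eigvecs, lam, k):
    """Evaluate g_j(Y) for one class.

    eigvals: eigenvalues in descending order (only the first k are used).
    eigvecs: matrix whose i-th column is the i-th eigenvector.
    lam: constant substituted for the eigenvalues with index greater than k.
    """
    proj = eigvecs.T @ (Y - mu)            # (Y - mu)^T xi_i for every i
    m = proj.shape[0]
    major = np.sum(proj[:k] ** 2 / eigvals[:k])
    minor = np.sum(proj[k:] ** 2) / lam
    log_terms = np.sum(np.log(eigvals[:k])) + (m - k) * np.log(lam)
    return float(major + minor + log_terms)

def classify(Y, params, lam, k):
    """Return the index of the class with the smallest decision function value."""
    scores = [mqdf_score(Y, mu, vals, vecs, lam, k) for mu, vals, vecs in params]
    return int(np.argmin(scores))
```

Because $\lambda$ is shared by all classes, the last log term is the same for every class and does not affect the arg-min; it is kept here only to follow the formula term by term.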
To verify the validity of the present invention, the following experiment was carried out:
The training set uses 1000 sets of samples of the GB level-2 Chinese character set and 400 sets of samples of the GBK character set. A further 60 sets of GB level-2 samples and 30 sets of GBK samples serve as test samples and are tested within the GBK recognition range. All samples are freely written online handwritten Chinese characters. The concrete parameter values used in training and recognition are those given in the embodiment above.
The experimental results are as follows:
Test set | GB level-2 Chinese character set: 6763 characters, 60 sets, 405,780 samples | GBK Chinese character set: 14240 characters, 30 sets, 427,200 samples | Overall average
Test recognition rate | 99.30% | 98.17% | 98.43%
The data in the table show that the online handwritten Chinese character recognition method based on statistical structural features achieves very high recognition performance under both recognition ranges. The recognition speed reaches 35.27 characters per second on a computer with a Pentium IV 1.7 GHz processor, which fully satisfies practical needs.
In summary, the online handwritten Chinese character recognition method and system based on statistical structural features proposed by the present invention can recognize freely written online handwritten Chinese characters. Experiments confirm that it achieves high recognition accuracy and reliability, and it has very broad application prospects.

Claims (1)

1. An online handwritten Chinese character recognition method based on statistical structural features, characterized in that the whole process consists of a training stage and a recognition stage and is carried out, on a computer with a Pentium IV 1.7 GHz processor, according to the following steps in order:
Training stage:
Step 1. Preprocessing, comprising the following steps in order:
Step 1.1: The computer samples a person's handwriting in real time through a digitizing acquisition device; the handwriting of one online handwritten Chinese character is obtained as: $P(x_1,y_1), P(x_2,y_2), \dots, P(x_i,y_i)$, break, $P(x_{i+1},y_{i+1}), \dots, P(x_N,y_N)$; where the break mark denotes the interruption between the pen-up of one natural stroke and the pen-down of the next; the handwriting is the series of point coordinates, ordered in time, obtained by sampling the trajectory of the pen tip while the Chinese character is written by hand, and $N$ is the total number of point coordinates;
Step 1.2: The computer removes isolated-point noise, that is, it deletes from the above stroke point sequence the strokes consisting of only one or two points;
Step 1.3: Filter out jagged (sawtooth) noise, that is, the computer low-pass filters the handwriting by taking a weighted average of the coordinates of each point and its neighbouring points, forming the new coordinate points $(x_i', y_i')$:
$$x_i' = \frac{1}{4}\left(x_{i-1} + 2x_i + x_{i+1}\right)$$
$$y_i' = \frac{1}{4}\left(y_{i-1} + 2y_i + y_{i+1}\right)$$
Step 1.4: The computer eliminates the unevenness of pen speed by resampling, that is, it resamples at a sampling interval of fixed length, so that a stroke of a given length is represented by a given number of coordinate points $(x_j'', y_j'')$:
$$x_j'' = \left[x_i'\left(S_{i+1} - jL\right) + x_{i+1}'\left(jL - S_i\right)\right] / d_i$$
$$y_j'' = \left[y_i'\left(S_{i+1} - jL\right) + y_{i+1}'\left(jL - S_i\right)\right] / d_i$$
where $L$ is the fixed sampling interval, a constant set to 1,
$(x_i', y_i')$ are the $N$ coordinate points of the stroke to be resampled, and $i$ satisfies $1 \le i \le N$ and $S_i \le jL < S_{i+1}$;
$S_i$ is the cumulative length, $S_i = \sum_{k=0}^{i-1} d_k$, $S_0 = 0$;
$d_i = \sqrt{(x_{i+1}' - x_i')^2 + (y_{i+1}' - y_i')^2}$ is the distance between two consecutive points;
$(x_j'', y_j'')$ are the new coordinate points obtained by resampling;
$j = 0, 1, \dots, \left[\frac{S_N}{L}\right]$;
Step 1.5: Perform shape normalization with the density equalization method
Step 1.5.1: Convert the handwriting of the online Chinese character into a character image, denoted $[f(x'', y'')]_{W \times H}$, where $W$ is the image width before the normalizing transformation and $H$ is its height; the pixel at the coordinate of any stroke point $P(x_i'', y_i'')$ is black, $f(x_i'', y_i'') = 1$, and all other pixels are white, $f(x'', y'') = 0$;
Step 1.5.2: Compute the density projections of the image in the horizontal and vertical directions, denoted $U(x'')$ and $V(y'')$ respectively:
$$U(x'') = \sum_{y''=1}^{H} f(x'', y'') + \alpha_U, \qquad x'' = 1, 2, \dots, W$$
$$V(y'') = \sum_{x''=1}^{W} f(x'', y'') + \alpha_V, \qquad y'' = 1, 2, \dots, H$$
where $\alpha_U$ and $\alpha_V$ are bias constants, set here to $\alpha_U = \alpha_V = 6$;
Step 1.5.3: Compute the new coordinate $(x''', y''')$ of the stroke point whose original coordinate is $(x'', y'')$:
$x''' = \dfrac{\sum_{k=1}^{x''} U(k) \times W'}{\sum_{k=1}^{W} U(k)}$; $W'$ is the maximum horizontal coordinate after normalization;
$y''' = \dfrac{\sum_{l=1}^{y''} V(l) \times H'}{\sum_{l=1}^{H} V(l)}$; $H'$ is the maximum vertical coordinate after normalization;
$W'$ and $H'$ are the desired ranges of the stroke point coordinates after processing; they are set in advance, with $W' = H' = 64$;
Step 1.5.4: Interpolate and delete coincident points
The stroke points within each natural stroke are connected end to end in order; points on the connecting lines that do not coincide with the original stroke points are inserted into the stroke sequence, and coincident points among adjacent stroke points are removed;
Step 2. Extract the statistical structural features
Step 2.1: Extract the direction feature;
Step 2.1.1: Extract the adjacent-point direction feature;
Step 2.1.1.1: Compute the direction of each stroke point;
Take any point $P_i$; except for the last point, every point $P_i$ has at least one subsequent point $P_j$, $j = i + 1$. The direction of the directed line segment from $P_i$ to $P_j$ is set as the direction value of the point $P_i$, denoted $\theta_i$, with range $[0^\circ, 360^\circ)$; this direction value is called the adjacent-point direction;
$\theta_i$ is computed as follows: let $(X_i, Y_i)$ be the coordinate of the point $P_i$ and $(X_j, Y_j)$ the coordinate of the point $P_j$;
since the tangent of $\theta_i$ is $\tan(\theta_i) = \dfrac{Y_j - Y_i}{X_j - X_i}$,
it follows that $\theta_i = \arctan\left(\dfrac{Y_j - Y_i}{X_j - X_i}\right)$;
Step 2.1.1.2: Compute the direction attribute coefficients of each stroke point, that is, taking the direction value of the point as the independent variable, compute the values at that point of the following 4 functions:
Horizontal-stroke direction attribute coefficient function, denoted $f^{(h)}(\theta)$:
Vertical-stroke direction attribute coefficient function, denoted $f^{(s)}(\theta)$:
Figure C2005100115100004C4
Left-falling-stroke direction attribute coefficient function, denoted $f^{(p)}(\theta)$:
Figure C2005100115100004C5
Right-falling-stroke direction attribute coefficient function, denoted $f^{(n)}(\theta)$:
where $\alpha_1$ to $\alpha_6$ are angle thresholds that determine the shape of the direction attribute coefficient functions, set to: $\alpha_1 = -10^\circ$, $\alpha_2 = 260^\circ$, $\alpha_3 = 280^\circ$, $\alpha_4 = 250^\circ$, $\alpha_5 = 300^\circ$, $\alpha_6 = 330^\circ$;
Step 2.1.1.3: Divide the coordinate space of the stroke point image evenly into $K_1 \times K_1$ sub-blocks, where $K_1$ is 8, and in each sub-block sum each direction attribute coefficient over all stroke points in it, obtaining $K_1 \times K_1 \times 4$ features; for any sub-block $(k, l)$, $1 \le k \le K_1$, $1 \le l \le K_1$, the 4-dimensional feature is:
the sum of the horizontal-stroke direction attribute coefficients of all stroke points in the $(k, l)$-th sub-block, denoted $F_{k,l}^{(h)}$: $F_{k,l}^{(h)} = \sum_{P(x,y) \in D(k,l)} f^{(h)}(\theta)$, where $\theta$ is the direction value of the point $P(x, y)$;
the sum of the vertical-stroke direction attribute coefficients of all stroke points in the $(k, l)$-th sub-block, denoted $F_{k,l}^{(s)}$: $F_{k,l}^{(s)} = \sum_{P(x,y) \in D(k,l)} f^{(s)}(\theta)$, where $\theta$ is the direction value of the point $P(x, y)$;
the sum of the left-falling-stroke direction attribute coefficients of all stroke points in the $(k, l)$-th sub-block, denoted $F_{k,l}^{(p)}$: $F_{k,l}^{(p)} = \sum_{P(x,y) \in D(k,l)} f^{(p)}(\theta)$, where $\theta$ is the direction value of the point $P(x, y)$;
the sum of the right-falling-stroke direction attribute coefficients of all stroke points in the $(k, l)$-th sub-block, denoted $F_{k,l}^{(n)}$: $F_{k,l}^{(n)} = \sum_{P(x,y) \in D(k,l)} f^{(n)}(\theta)$, where $\theta$ is the direction value of the point $P(x, y)$;
Step 2.1.2: Extract the adjacent-inflection-point direction feature
Step 2.1.2.1: Determine the inflection points in the handwriting by polygonal approximation; an inflection point is a point at which the writing direction of the stroke changes sharply; first compute, for each point in the stroke, the cosine of the angle it subtends with its neighbouring points;
The cosine of the subtended angle $\gamma$ can be computed with the law of cosines: let $a$, $b$, $c$ be the three sides of the triangle formed by the current stroke point and the stroke points immediately before and after it, where $\gamma$ is the angle between sides $a$ and $b$, and $c$ is the side opposite $\gamma$; first compute the lengths of the three sides from the coordinates of the triangle's vertices; the law of cosines then gives $\cos\gamma = \dfrac{a^2 + b^2 - c^2}{2ab}$;
The current stroke point is judged to be an inflection point when the cosine of the subtended angle $\gamma$ reaches a local maximum and exceeds a set threshold, where the threshold is set to $-0.8$, at which $\gamma$ is about 2.5 radians;
Step 2.1.2.2: Compute the adjacent-inflection-point direction of each stroke point: let $P_i$ and $P_j$, $j > i$, be two adjacent inflection points in the stroke point sequence; the direction of every stroke point between them, including $P_i$, is set to the direction of the directed line segment from $P_i$ to $P_j$; then, using these directions, repeat steps 2.1.1.2 and 2.1.1.3 above to obtain the $K_1 \times K_1 \times 4$-dimensional adjacent-inflection-point direction feature;
Step 2.1.3: Merge the adjacent-point direction feature and the adjacent-inflection-point direction feature to obtain the $K_1 \times K_1 \times 8$-dimensional direction feature;
Step 2.2: Extract the edge feature, which reflects the peripheral structural information of the Chinese character
Step 2.2.1: First extract the edge feature for the left-to-right scanning direction: divide the left half of the image corresponding to the preprocessed online handwriting equally into $K_2$ horizontal sub-regions; in each sub-region, scan row by row from the left border of the image towards the right; whenever a scanned coordinate point is a stroke point, compute the 4 adjacent-point direction attribute coefficients of that stroke point, denoted $f_{i,1}^{(h)}, f_{i,1}^{(s)}, f_{i,1}^{(p)}, f_{i,1}^{(n)}$, where the subscript indicates the first stroke point encountered in the $i$-th row scan; if no stroke point is encountered, these coefficients are 0; next, within the same row scan, if a coordinate point that is a stroke point is encountered again, compute the 4 adjacent-point direction attribute coefficients of that stroke point, denoted $f_{i,2}^{(h)}, f_{i,2}^{(s)}, f_{i,2}^{(p)}, f_{i,2}^{(n)}$, where the subscript indicates the second stroke point encountered in the $i$-th row scan; otherwise these 4 coefficients are 0, and this row scan ends. Then scan the next row, until all row scans of the sub-region are finished; sum each of the above 8 direction attribute coefficients over all rows to obtain an 8-dimensional feature: $\sum_i f_{i,1}^{(h)}, \sum_i f_{i,1}^{(s)}, \sum_i f_{i,1}^{(p)}, \sum_i f_{i,1}^{(n)}, \sum_i f_{i,2}^{(h)}, \sum_i f_{i,2}^{(s)}, \sum_i f_{i,2}^{(p)}, \sum_i f_{i,2}^{(n)}$; the $K_2$ sub-regions yield a $K_2 \times 8$-dimensional edge feature in total;
Step 2.2.2: Repeat the method of step 2.2.1 from the other three edges (right, top and bottom) and along the four diagonal directions, thereby obtaining from these 8 directions an edge feature of $K_2 \times 8 \times 8$ dimensions in total;
Step 2.2.3: Merge the direction feature with the edge feature obtained in steps 2.2.1 and 2.2.2 above to obtain the complete statistical structural feature of an online handwritten Chinese character, denoted $V$:
Step 3 eigentransformation
Step 3.1: calculate the original feature vector set of a Chinese character in the Chinese characters of the national standard set, this set V i (j)Expression:
{{V i (j),1≤i≤N j},1≤j≤C}
Wherein, C represents the classification number, and each classification is represented a Chinese character in the Chinese characters of the national standard set; Of all categories among the C represents that with j j represents wherein j classification;
N jThe number of samples of representing j classification,
I represents i sample in j the classification;
Step 3.2 is calculated the average of each classification and the average of all categories with following formula, uses μ respectively j, μ represents;
μ j = 1 N j Σ i = 1 N j V i ( j ) ;
In calculating the Chinese characters of the national standard set, the average μ of a Chinese character of each classification representative jAfter, be calculated as follows the average of all categories:
μ = 1 C Σ j = 1 C μ j ;
Step 3.3: Compute the within-class scatter matrix and the between-class scatter matrix, denoted $S_w$ and $S_b$ respectively:
$$S_w = \frac{1}{C} \sum_{j=1}^{C} \left( \frac{1}{N_j} \sum_{i=1}^{N_j} \left(V_i^{(j)} - \mu_j\right)\left(V_i^{(j)} - \mu_j\right)^T \right)$$
$$S_b = \frac{1}{C} \sum_{j=1}^{C} (\mu_j - \mu)(\mu_j - \mu)^T$$
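The class means and scatter matrices of steps 3.2 and 3.3 can be written out directly. The sketch below uses plain Python lists so that it stays self-contained; the function names (`mean_vector`, `scatter_matrices`) and the toy data are illustrative assumptions and do not come from the patent.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def add_outer(acc, d, scale):
    """In place: acc += scale * d d^T."""
    for r, a in enumerate(d):
        for c, b in enumerate(d):
            acc[r][c] += scale * a * b


def scatter_matrices(classes):
    """classes[j] is the list of the N_j original feature vectors of class j.

    Returns (class_means, mu, S_w, S_b) following the formulas of steps
    3.2 and 3.3: every class is weighted by 1/C regardless of N_j.
    """
    num_classes = len(classes)
    dim = len(classes[0][0])
    class_means = [mean_vector(samples) for samples in classes]
    mu = mean_vector(class_means)  # mean of the class means
    s_w = [[0.0] * dim for _ in range(dim)]
    s_b = [[0.0] * dim for _ in range(dim)]
    for samples, mu_j in zip(classes, class_means):
        for v in samples:
            d = [a - b for a, b in zip(v, mu_j)]
            add_outer(s_w, d, 1.0 / (num_classes * len(samples)))
        d = [a - b for a, b in zip(mu_j, mu)]
        add_outer(s_b, d, 1.0 / num_classes)
    return class_means, mu, s_w, s_b


# Two toy classes with two 2-dimensional samples each.
classes = [[[0, 0], [2, 0]], [[4, 2], [4, 4]]]
class_means, mu, s_w, s_b = scatter_matrices(classes)
```

Note that $\mu$ is the unweighted mean of the class means, as in the formula above, rather than the mean of all pooled samples; the two differ when the $N_j$ are unequal.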
Step 3.4: Transform the high-dimensional original feature vector into a low-dimensional feature space by linear discriminant analysis:
Perform eigenvalue and eigenvector decomposition of the matrix $S_w^{-1}(S_b + S_w)$, obtaining the eigenvalues sorted in descending order of magnitude, denoted $\gamma_i$, $i = 1, 2, \ldots, n$, with corresponding eigenvectors $\xi_i$, $i = 1, 2, \ldots, n$. Select $m$ according to the principle of maximizing $\left| \frac{A^T (S_b + S_w) A}{A^T S_w A} \right|$, and form the matrix $A$ from the first $m$ eigenvectors; $A$ is an $n \times m$ matrix, $A = [\xi_1, \xi_2, \ldots, \xi_m]$, which is the linear transformation matrix;
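As a side note that is not part of the original text, the matrix decomposed in step 3.4 can be related to the more common Fisher form $S_w^{-1} S_b$. The short derivation below assumes $S_w$ is invertible; the symbol $\rho_i$ is introduced here only for illustration.

```latex
S_w^{-1}(S_b + S_w) = S_w^{-1} S_b + I
\quad\Longrightarrow\quad
S_w^{-1} S_b \,\xi_i = \rho_i \,\xi_i
\;\iff\;
S_w^{-1}(S_b + S_w)\,\xi_i = (1 + \rho_i)\,\xi_i ,
\qquad \gamma_i = 1 + \rho_i .
```

Both matrices therefore share the same eigenvectors $\xi_i$, and since $\gamma_i = 1 + \rho_i$ preserves the ordering, the first $m$ eigenvectors are the same in either formulation.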
Step 3.5: Compute the transformed feature vector $Y$ as follows:
$Y = A^T V$, where $V$ is the set of original feature vectors extracted from the Chinese characters of all classes in the national standard Chinese character set;
Step 4: Train the classifier
From the $m$-dimensional recognition feature vectors $Y$ obtained in step 3, estimate the mean $\mu_j$ and covariance matrix $\Sigma_j$ of each class with the following formulas:
$$\mu_j = \frac{1}{N_j} \sum_{i=1}^{N_j} Y_i^{(j)}, \qquad \Sigma_j = \frac{1}{N_j} \sum_{i=1}^{N_j} \left(Y_i^{(j)} - \mu_j\right)\left(Y_i^{(j)} - \mu_j\right)^T$$
where $Y_i^{(j)}$ is the feature vector extracted from the $i$-th training sample of the $j$-th class, $N_j$ is the number of training samples of the $j$-th class, $\mu_j$ is the mean of the recognition features of the $j$-th class, and $\Sigma_j$ is the covariance matrix of the $j$-th class;
Next, perform eigenvalue and eigenvector decomposition of the covariance matrix of each class, obtaining the eigenvalues $\lambda_i^{(j)}$, $i = 1, 2, \ldots, m$, sorted in descending order of magnitude, where $\lambda_i^{(j)}$ is the $i$-th eigenvalue of the covariance matrix of the $j$-th class; the corresponding eigenvectors are $\zeta_i^{(j)}$, $i = 1, 2, \ldots, m$;
Then compute the substitute value for the small eigenvalues as follows; that is, replace the smaller eigenvalues with a predetermined constant $\lambda$ in order to reduce the adverse effect on classification performance of small eigenvalues that are inaccurately estimated:
$$\lambda = \frac{1}{C} \sum_{j=1}^{C} \lambda_{k+1}^{(j)},$$ where $k$ is a positive integer less than $m$;
Then store the following parameters in the recognition library file: $\lambda_i^{(j)}$ ($j = 1, 2, \ldots, C$; $i = 1, 2, \ldots, k$), $\zeta_i^{(j)}$ ($j = 1, 2, \ldots, C$; $i = 1, 2, \ldots, m$), $\mu_j$ ($j = 1, 2, \ldots, C$), and $\lambda$;
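The substitute constant and the truncated eigenvalue lists that go into the recognition library can be illustrated with a few lines of Python. This is a sketch under assumptions: the helper name `mqdf_parameters` is hypothetical, and the per-class eigenvalues are taken as given (the eigendecomposition itself is not shown).

```python
def mqdf_parameters(eigenvalues_per_class, k):
    """Return (lam, leading) for the classifier parameters.

    eigenvalues_per_class[j] holds the m eigenvalues of class j's covariance
    matrix. `lam` is the average over classes of the (k+1)-th largest
    eigenvalue, and `leading[j]` keeps the k largest eigenvalues of class j.
    """
    m = len(eigenvalues_per_class[0])
    if not 0 < k < m:
        raise ValueError("k must be a positive integer less than m")
    ordered = [sorted(ev, reverse=True) for ev in eigenvalues_per_class]
    # Index k (0-based) is the (k+1)-th eigenvalue in descending order.
    lam = sum(ev[k] for ev in ordered) / len(ordered)
    leading = [ev[:k] for ev in ordered]
    return lam, leading


# Made-up eigenvalues for two classes with m = 4 and k = 2.
lam, leading = mqdf_parameters([[5.0, 3.0, 1.0, 0.5], [4.0, 2.0, 1.0, 0.1]], 2)
```

Only the $k$ leading eigenvalues per class need to be stored, since every remaining eigenvalue is replaced by the shared constant.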
This yields the modified quadratic classifier under the Gaussian model, denoted MQDF; the decision function $g_j(Y)$ of an input feature vector is computed as follows:
$$g_j(Y) = \sum_{i=1}^{k} \frac{\left((Y - \mu_j)^T \zeta_i^{(j)}\right)^2}{\lambda_i^{(j)}} + \sum_{i=k+1}^{m} \frac{\left((Y - \mu_j)^T \zeta_i^{(j)}\right)^2}{\lambda} + \sum_{i=1}^{k} \log \lambda_i^{(j)} + \sum_{i=k+1}^{m} \log \lambda, \qquad j = 1, 2, \ldots, C$$
The classification decision rule is:
If: $g_i(Y) = \min_{1 \le j \le C} g_j(Y)$
Then: $Y$ is assigned to the $i$-th class;
Recognition stage:
First, obtain the recognition feature $Y$ from the sample to be recognized and compute the decision function $g_j(Y)$ of each class with the following formula; the values of $m$ and $k$ are the same as in training:
$$g_j(Y) = \sum_{i=1}^{k} \frac{\left((Y - \mu_j)^T \zeta_i^{(j)}\right)^2}{\lambda_i^{(j)}} + \sum_{i=k+1}^{m} \frac{\left((Y - \mu_j)^T \zeta_i^{(j)}\right)^2}{\lambda} + \sum_{i=1}^{k} \log \lambda_i^{(j)} + \sum_{i=k+1}^{m} \log \lambda, \qquad j = 1, 2, \ldots, C$$
Second, assign the input sample to be recognized to the class for which $g_j(Y)$ attains its minimum value.
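The recognition stage can be sketched as a direct evaluation of the decision function. The code below is an illustration under assumptions rather than the patented implementation: the names `mqdf_score` and `classify`, the dictionary layout of a class model, and the toy parameters are all hypothetical.

```python
import math


def mqdf_score(y, mu, eigvals, eigvecs, lam, k):
    """Evaluate g_j(Y) for one class.

    eigvecs: the m eigenvectors of the class covariance, in descending
             eigenvalue order; eigvals: at least the k leading eigenvalues.
    Eigenvalues beyond the k-th are replaced by the shared constant lam.
    """
    diff = [a - b for a, b in zip(y, mu)]
    score = 0.0
    for i, vec in enumerate(eigvecs):
        proj = sum(d * z for d, z in zip(diff, vec))  # (Y - mu_j)^T zeta_i
        ev = eigvals[i] if i < k else lam
        score += proj * proj / ev + math.log(ev)
    return score


def classify(y, models, lam, k):
    """Return (index of the class with minimum g_j(Y), all scores)."""
    scores = [
        mqdf_score(y, mdl["mu"], mdl["eigvals"], mdl["eigvecs"], lam, k)
        for mdl in models
    ]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy 2-dimensional example with k = 1 and axis-aligned eigenvectors.
identity = [[1.0, 0.0], [0.0, 1.0]]
models = [
    {"mu": [0.0, 0.0], "eigvals": [4.0], "eigvecs": identity},
    {"mu": [10.0, 10.0], "eigvals": [4.0], "eigvecs": identity},
]
label, scores = classify([1.0, 1.0], models, lam=1.0, k=1)
```

In this toy case the sample lies near the first class mean, so the first class attains the smaller score and is selected.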
CNB200510011510XA 2005-04-01 2005-04-01 On-line hand-written Chinese characters recognition method based on statistic structural features Expired - Fee Related CN1333366C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB200510011510XA CN1333366C (en) 2005-04-01 2005-04-01 On-line hand-written Chinese characters recognition method based on statistic structural features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB200510011510XA CN1333366C (en) 2005-04-01 2005-04-01 On-line hand-written Chinese characters recognition method based on statistic structural features

Publications (2)

Publication Number Publication Date
CN1664846A CN1664846A (en) 2005-09-07
CN1333366C true CN1333366C (en) 2007-08-22

Family

ID=35035928

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200510011510XA Expired - Fee Related CN1333366C (en) 2005-04-01 2005-04-01 On-line hand-written Chinese characters recognition method based on statistic structural features

Country Status (1)

Country Link
CN (1) CN1333366C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101299236B (en) * 2008-06-25 2010-06-09 华南理工大学 Method for recognizing Chinese hand-written phrase

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100394435C (en) * 2006-05-25 2008-06-11 无敌科技(西安)有限公司 Word recognition method and its system
CN101393643B (en) * 2007-09-21 2012-01-18 华东师范大学 Computer stroke deforming system and method
CN101320422B (en) * 2008-06-06 2010-06-02 广东开心信息技术有限公司 Normative decision method and apparatus for cross, connection and separation relationship of handwritten Chinese character strokes
CN101354747B (en) * 2008-09-18 2011-07-20 炬力集成电路设计有限公司 Method and apparatus for recognizing hand-written symbol
CN101901344B (en) * 2010-08-13 2012-04-25 上海交通大学 Method for detecting character image local feature based on corrosion method and DoG operator
CN102043537B (en) * 2010-12-28 2013-12-25 东莞宇龙通信科技有限公司 Mobile terminal and control method for hand input
CN102135820B (en) * 2011-01-18 2013-03-06 浙江大学 Planarization pre-processing method
CN103646582B (en) * 2013-12-04 2016-08-17 广东小天才科技有限公司 A kind of method and device pointing out clerical error
CN104102450A (en) * 2014-06-18 2014-10-15 深圳贝特莱电子科技有限公司 Touch screen based gesture recognition method and system
CN104766101B (en) * 2015-04-22 2018-02-06 福州大学 A kind of k nearest neighbor hand-written discrimination system algorithm based on searching characteristic value
CN106485280A (en) * 2016-10-18 2017-03-08 安徽天达网络科技有限公司 A kind of computer based image-recognizing method
CN108416249A (en) * 2017-02-10 2018-08-17 肖奇 A kind of written handwriting identification system and method
CN108932454A (en) * 2017-05-23 2018-12-04 杭州海康威视系统技术有限公司 A kind of character recognition method based on picture, device and electronic equipment
JP6868057B2 (en) * 2019-05-27 2021-05-12 株式会社東芝 Reading system, reading method, program, storage medium, and mobile
CN111144064B (en) * 2019-12-05 2022-07-19 稿定(厦门)科技有限公司 Character deformation method, medium, equipment and device
CN112861709A (en) * 2021-02-05 2021-05-28 金陵科技学院 Hand-drawn sketch recognition method based on simple strokes

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH09319826A (en) * 1996-05-28 1997-12-12 Oki Electric Ind Co Ltd Hand-written character recognition device
JPH10214312A (en) * 1997-01-29 1998-08-11 Hitachi Ltd Online hand-written character recognition device
CN1204817A (en) * 1997-07-02 1999-01-13 株式会社三井高科技 Method and apparatusfor on-line handwritten input character recognition and recording medium for executing said method
CN1252584A (en) * 1998-10-26 2000-05-10 松下电器产业株式会社 On-line hand writing Chinese character distinguishing device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
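HMM model of on-line handwritten Chinese characters based on relations between stroke segments, Lu Zhan, Ding Xiaoqing, Journal of Tsinghua University (Science and Technology), Vol. 44, No. 7, 2004 *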

Also Published As

Publication number Publication date
CN1664846A (en) 2005-09-07

Similar Documents

Publication Publication Date Title
CN1333366C (en) On-line hand-written Chinese characters recognition method based on statistic structural features
Chacko et al. Handwritten character recognition using wavelet energy and extreme learning machine
CN102622610B (en) Handwritten Uyghur character recognition method based on classifier integration
Coetzer et al. Offline signature verification using the discrete radon transform and a hidden Markov model
US8391613B2 (en) Statistical online character recognition
Dongre et al. A review of research on Devnagari character recognition
CN101510259B (en) On-line identification method for 'ding' of handwriting Tibet character
Sonkusare et al. A survey on handwritten character recognition (HCR) techniques for English alphabets
CN105426842A (en) Support vector machine based surface electromyogram signal multi-hand action identification method
CN1828630A (en) Manifold learning based human face posture identification method
CN101968853A (en) Improved immune algorithm based expression recognition method for optimizing support vector machine parameters
CN104899601A (en) Identification method of handwritten Uyghur words
Deore et al. A survey on offline signature recognition and verification schemes
CN102542243A (en) LBP (Local Binary Pattern) image and block encoding-based iris feature extracting method
Okawa et al. Offline writer verification based on forensic expertise: Analyzing multiple characters by combining the shape and advanced pen pressure information
Dhanikonda et al. An efficient deep learning model with interrelated tagging prototype with segmentation for telugu optical character recognition
Saraf et al. Devnagari script character recognition using genetic algorithm for get better efficiency
Ramzi et al. Online Arabic handwritten character recognition using online-offline feature extraction and back-propagation neural network
Guo et al. Research on Feature Extraction for Character Recognition of NaXi Pictograph.
Pal et al. Interval-valued symbolic representation based method for off-line signature verification
CN106295478A (en) A kind of image characteristic extracting method and device
Halder et al. Individuality of isolated Bangla numerals
Singh et al. Survey on offline signature recognition and verification schemes
Padmajadevi et al. A review of handwritten signature verification systems and methodologies
Izadi et al. A new segmentation algorithm for online handwritten word recognition in Persian script

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070822

Termination date: 20140401