CN103324925A - Method and device used for obtaining character data used for handwritten character recognition - Google Patents


Info

Publication number
CN103324925A
Authority
CN
China
Prior art keywords
character data
data
conversion
curve
curvilinear
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012100780589A
Other languages
Chinese (zh)
Inventor
沈海峰
山本宽树
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to CN2012100780589A priority Critical patent/CN103324925A/en
Publication of CN103324925A publication Critical patent/CN103324925A/en
Pending legal-status Critical Current

Landscapes

  • Character Discrimination (AREA)

Abstract

The invention provides a method and device used for obtaining character data used for handwritten character recognition by methods of curvilinear transformation and generating a handwritten character recognition classifier by the obtained character data. The method used for obtaining the character data used for handwritten character recognition includes an obtaining step and a curvilinear transformation step, wherein in the obtaining step, at least one original handwritten character datum is obtained, and in the curvilinear transformation step, at least one curvilinear transformation method is applied to each of the obtained original handwritten character data, and at least one transformation character datum is obtained for each of the obtained original handwritten character data to serve as the character data used for handwritten character recognition. In the curvilinear transformation step, shapes of the original handwritten character data are transformed in a non-linear mode, the structure of the original handwritten character data is not broken, and the transformed character data cannot be recovered to the original handwritten character data in a pre-processing method.

Description

Method and apparatus for obtaining character data for handwritten character recognition
Technical field
The present invention relates generally to a method and apparatus for obtaining character data for handwritten character recognition. More specifically, the present invention relates to a method and apparatus that use a curvilinear transformation method to obtain character data for handwritten character recognition. The present invention also relates to a method and apparatus for generating a handwritten character recognition classifier using the obtained character data.
Background art
Handwritten character recognition is widely used in many fields, and many techniques for it have been proposed so far. To overcome the limited amount of handwritten character data and to improve the robustness of handwritten character recognition systems to environmental diversity (such as different writers and different writing styles), techniques for generating artificial handwritten character data are widely adopted.
U.S. Patent No. 5903884 uses a method of changing the aspect ratio and a rotation method to change the coordinates of the character trajectory.
U.S. Patent No. 7418128 uses a random deformation method to change the coordinates of the character trajectory.
In Fig. 1, character image (a) is the original handwritten character data, character image (b) is the transformed character data obtained with the method of changing the aspect ratio, and character image (c) is the transformed character data obtained with the rotation method. As can be seen, transformed character image (b) is easily restored to the original handwritten character image (a) by a preprocessing algorithm such as a size normalization algorithm. Transformed character image (c) is easily restored to the original handwritten character image (a) by a preprocessing algorithm such as a slant normalization algorithm. Size normalization and slant normalization algorithms are commonly used in conventional handwritten character recognition systems. Clearly, when the above methods are combined with such preprocessing, the effect of the transformation is lost.
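The point about size normalization can be illustrated with a minimal Python sketch. It assumes a size normalization that rescales each axis independently onto a unit square; this is only one common variant, and the function names and sample coordinates are hypothetical rather than taken from the patent.

```python
def change_aspect_ratio(points, x_scale):
    """Stretch the trajectory horizontally, which changes its aspect ratio."""
    return [(x * x_scale, y) for x, y in points]


def size_normalize(points):
    """Map the bounding box onto the unit square, scaling each axis independently."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    width = (max(xs) - x_min) or 1.0   # guard against zero-width strokes
    height = (max(ys) - y_min) or 1.0  # guard against zero-height strokes
    return [((x - x_min) / width, (y - y_min) / height) for x, y in points]


# Hypothetical trajectory and its aspect-ratio-changed copy.
original = [(0.0, 0.0), (2.0, 4.0), (4.0, 1.0), (3.0, 3.0)]
stretched = change_aspect_ratio(original, 1.5)

# Largest coordinate difference between the two normalized trajectories.
max_difference = max(
    abs(ax - bx) + abs(ay - by)
    for (ax, ay), (bx, by) in zip(size_normalize(original), size_normalize(stretched))
)
```

Under this assumption the two normalized trajectories coincide, so the stretched sample adds no new information once preprocessing has been applied.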
In Fig. 1, character images (d) and (e) show the transformed character data obtained with different random deformation methods. Without the reference character image (a), it is difficult to recognize which character is shown in character image (d). This method considers neither the shape of the handwritten character data nor the relation between adjacent points along the character trajectory. It therefore easily produces an erroneous transformation or distortion of the handwritten character data, and may destroy the structure of the handwritten character data when a large random offset is used. In practice, a writer would never write such a character, so including it brings no benefit to handwritten character recognition.
Summary of the invention
As described above, the inventors of the present invention have found that the known typical techniques for handwritten character recognition cannot effectively overcome the limited amount of handwritten character data.
With the known typical techniques, if the processing that changes the aspect ratio of the original handwritten character data or rotates the original handwritten character data is combined with preprocessing algorithms such as size normalization and slant normalization, the effect of the transformation is lost. In addition, if random deformation is applied to the original handwritten character data, the structure of the original handwritten character data may be destroyed, making the processed character difficult to recognize.
To solve at least one of the above technical problems, the present invention provides a method for obtaining character data for handwritten character recognition, comprising: an acquiring step of acquiring at least one original handwritten character data; and a curvilinear transformation step of applying at least one curvilinear transformation method to each of the acquired original handwritten character data and obtaining, for each of the acquired original handwritten character data, at least one transformed character data as the character data for handwritten character recognition, wherein the curvilinear transformation step nonlinearly transforms the shape of the original handwritten character data without destroying the structure of the original handwritten character data, and the transformed character data cannot be restored to the original handwritten character data by a preprocessing method.
In addition, to solve at least one of the above technical problems, the present invention provides a method for generating a handwritten character recognition classifier, comprising: a data obtaining step of obtaining original handwritten character data and transformed character data of the original handwritten character data by using the above method for obtaining character data for handwritten character recognition; and a classifier generating step of generating the handwritten character recognition classifier using the obtained original handwritten character data and the obtained transformed character data.
In addition, to solve at least one of the above technical problems, the present invention provides a device for obtaining character data for handwritten character recognition, comprising: an acquiring unit configured to acquire at least one original handwritten character data; and a curvilinear transformation unit configured to apply at least one curvilinear transformation method to each of the acquired original handwritten character data and to obtain, for each of the acquired original handwritten character data, at least one transformed character data as the character data for handwritten character recognition, wherein the curvilinear transformation unit nonlinearly transforms the shape of the original handwritten character data without destroying the structure of the original handwritten character data, and the transformed character data cannot be restored to the original handwritten character data by a preprocessing method.
In addition, to solve at least one of the above technical problems, the present invention provides a device for generating a handwritten character recognition classifier, comprising: a data obtaining unit configured to obtain original handwritten character data and transformed character data of the original handwritten character data by using the above device for obtaining character data for handwritten character recognition; and a classifier generating unit configured to generate the handwritten character recognition classifier using the obtained original handwritten character data and the obtained transformed character data.
Clearly, the present invention differs from the existing schemes. The object of the present invention is to use a curvilinear transformation method to overcome the above limitations of the existing schemes. The transformation or deformation of the present invention is controlled by multi-stage curves and can be performed in any direction. Therefore, the present invention changes only the shape of the handwritten character data without destroying its structure, and the connections between the transformed points are smooth. The transformed character data cannot be restored to the original handwritten character data when a preprocessing algorithm such as a slant normalization algorithm is applied. With the curvilinear transformation method of the present invention, handwriting diversity in real environments, such as different writers and different writing styles, can be handled effectively.
Other features and advantages of the present invention will become clear from the following description with reference to the accompanying drawings.
Description of drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
Fig. 1 schematically shows images of original handwritten character data and of characters processed by various transformation methods.
Fig. 2 schematically shows images of original handwritten character data, of a character processed by the curvilinear transformation method of the present invention, and of a character processed by the curvilinear transformation method followed by a slant normalization preprocessing method.
Fig. 3 is a flowchart of the basic processing of a method for obtaining character data for handwritten character recognition according to an example of the present invention.
Fig. 4 is a flowchart of exemplary processing of the curvilinear transformation step S120 according to an example of the present invention.
Fig. 5 shows an example of curvilinear transformation parameters.
Fig. 6 shows examples of control directions.
Fig. 7 is a flowchart of exemplary processing of the curvilinear transformation parameter obtaining process according to an example of the present invention.
Fig. 8 is a flowchart of exemplary processing of the curvilinear transformation parameter obtaining step S211 according to an example of the present invention.
Fig. 9 is a flowchart of exemplary processing of the transformed character data obtaining step S122 according to an example of the present invention.
Fig. 10 schematically shows details of the curvilinear transformation parameters and of the sample points of the original handwritten character data according to an example of the present invention.
Fig. 11 shows a comparison between the acquired original handwritten character data and the transformed character data obtained from Fig. 10 with different values of β.
Fig. 12 shows examples of transformed character data obtained with different curvilinear transformation methods.
Fig. 13 shows an experimental evaluation of the handwritten character recognition accuracy of classifiers generated by different methods.
Fig. 14 shows another experimental evaluation of the handwritten character recognition accuracy of classifiers generated by different methods.
Fig. 15 is a functional block diagram of a device 1 for obtaining character data for handwritten character recognition according to an example of the present invention.
Fig. 16 is a flowchart of the basic processing for generating a handwritten character recognition classifier according to an example of the present invention.
Fig. 17 is a functional block diagram of a device 10 for generating a handwritten character recognition classifier according to an example of the present invention.
Fig. 18 is a block diagram showing the hardware configuration of a computer system 1000 in which embodiments of the present invention can be implemented.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Note that similar reference numerals and letters refer to similar items in the figures, so once an item has been defined in one figure, it need not be discussed again for subsequent figures.
Fig. 2 schematically shows images of original handwritten character data, of a character processed by the curvilinear transformation method of the present invention, and of a character processed by the curvilinear transformation method followed by a slant normalization preprocessing method.
In Fig. 2, transformed character image (b) is obtained by applying the curvilinear transformation method to the original handwritten character image (a), and character image (c) is obtained by applying a preprocessing method such as slant normalization to the transformed character image (b). Clearly, the character in character image (c) differs significantly from the original handwritten character in character image (a). Therefore, the present invention can be used together with preprocessing algorithms. The combination yields a greater effect and can successfully handle handwriting diversity in real environments, such as different writers and different writing styles.
Fig. 3 is a flowchart of the basic processing of a method for obtaining character data for handwritten character recognition according to an example of the present invention.
As shown in Fig. 3, the method for obtaining character data for handwritten character recognition can comprise an acquiring step S110 and a curvilinear transformation step S120.
In the acquiring step S110, at least one original handwritten character data can be acquired.
In the curvilinear transformation step S120, at least one curvilinear transformation method can be applied to each of the acquired original handwritten character data, and at least one transformed character data can be obtained for each of the acquired original handwritten character data as the character data for handwritten character recognition. In the curvilinear transformation step S120, the shape of the original handwritten character data can be nonlinearly transformed without destroying the structure of the original handwritten character data, and the transformed character data cannot be restored to the original handwritten character data by a preprocessing method. As is known, preprocessing methods can include size normalization and slant normalization.
Fig. 4 is a flowchart of exemplary processing of the curvilinear transformation step S120 according to an example of the present invention.
As shown in Fig. 4, for each of the acquired original handwritten character data, the curvilinear transformation step S120 can comprise a curvilinear transformation parameter setting step S121 and a transformed character data obtaining step S122.
In the curvilinear transformation parameter setting step S121, at least one set of curvilinear transformation parameters for controlling at least one curvilinear transformation method can be set.
In the transformed character data obtaining step S122, at least one transformed character data can be obtained by applying the at least one curvilinear transformation method to the acquired original handwritten character data using the at least one set of curvilinear transformation parameters.
In the curvilinear transformation parameter setting step S121, the curvilinear transformation parameters can be defined manually.
Fig. 5 shows an example of curvilinear transformation parameters. For example, the curvilinear transformation parameters can comprise a control direction, a control curve, control parameter 1 (the first control parameter) and control parameter 2 (the second control parameter).
Control direction:
Fig. 6 shows four examples of control directions. On the left side of the handwritten character data in image (a), the control direction is pushing from left to right in the horizontal direction. On the right side of the handwritten character data in image (b), the control direction is pushing from right to left in the horizontal direction. On the upper side and the lower side of the handwritten character data in images (c) and (d), the control direction is pushing from top to bottom and from bottom to top in the vertical direction, respectively.
Various control directions can be used without departing from the spirit and scope of the present invention. For example, a control direction parallel to a diagonal of the handwritten character data can be used.
Control curve:
As shown in Fig. 5, the control curve is an arcuate curve. Various control curves can be used without departing from the spirit and scope of the present invention.
For example, the control curve can be selected from an arc, an arcuate curve, an exponential curve, a logarithmic curve, a sine curve and a cosine curve. In addition, the control curve can be selected from combinations of a straight line, an arc, an arcuate curve, an exponential curve, a logarithmic curve, a sine curve and a cosine curve. As one example, the control curve can be a combination of a straight line and a sine curve. As another example, the control curve can be a combination of two sine curves.
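As a rough illustration of the curve families listed above, the following Python sketch expresses each control curve as a normalized profile on [0, 1] whose value is the fraction of the maximum offset. The parameterizations, the function names and the particular line-plus-sine combination are assumptions made for illustration; the patent does not specify formulas for these curves.

```python
import math


def arc_profile(u):
    """Circular arc: zero at both ends, one at the midpoint."""
    return math.sqrt(max(0.0, 1.0 - (2.0 * u - 1.0) ** 2))


def sine_profile(u):
    """Half a sine period: zero at both ends, one at the midpoint."""
    return math.sin(math.pi * u)


def exponential_profile(u):
    """Exponential growth rescaled so that it runs from 0 to 1."""
    return math.expm1(u) / math.expm1(1.0)


def logarithmic_profile(u):
    """Logarithmic growth rescaled so that it runs from 0 to 1."""
    return math.log1p(u) / math.log1p(1.0)


def line_then_sine_profile(u):
    """One possible combination: a straight ramp up to the midpoint,
    followed by a sine segment that falls back to zero."""
    if u < 0.5:
        return 2.0 * u
    return math.sin(math.pi * u)
```

Keeping every profile in the range 0 to 1 lets the same scaling by control parameter 1 be reused regardless of which curve is chosen.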
Control parameter 1:
Control parameter 1 can specify the maximum transformation offset along the control direction, as shown in Fig. 5.
Typically, control parameter 1 = β × (maximum span of the handwritten character along the control direction), where 0<β≤1. As shown in Fig. 5, points on the center line of the handwritten character data have the maximum transformation offset, and the top and bottom of the handwritten character data have zero offset.
Control parameter 2:
Control parameter 2 can specify the range of the acquired original handwritten character data to be transformed in the direction perpendicular to the control direction.
Typically, control parameter 2 = γ × (maximum span of the handwritten character along the direction perpendicular to the control direction), where 0<γ≤1. In Fig. 5, control parameter 2 covers the whole handwritten character data, that is, γ=1. In other words, all sample points of the handwritten character data will be transformed.
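The two control parameters can be computed from the bounding box of the character, as in the following minimal Python sketch. It assumes an axis-aligned control direction and a trajectory given as a list of (x, y) sample points; the function name and the `direction` argument are hypothetical.

```python
def control_parameters(points, beta, gamma, direction="horizontal"):
    """Return (control parameter 1, control parameter 2) for an axis-aligned push.

    control parameter 1 = beta  * span along the control direction
    control parameter 2 = gamma * span perpendicular to the control direction
    """
    if not (0.0 < beta <= 1.0 and 0.0 < gamma <= 1.0):
        raise ValueError("beta and gamma must lie in (0, 1]")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    span_x = max(xs) - min(xs)
    span_y = max(ys) - min(ys)
    if direction == "horizontal":
        along, across = span_x, span_y
    elif direction == "vertical":
        along, across = span_y, span_x
    else:
        raise ValueError("direction must be 'horizontal' or 'vertical'")
    return beta * along, gamma * across


# Hypothetical character that is 4 units wide and 10 units tall.
sample = [(0.0, 0.0), (4.0, 10.0), (2.0, 5.0)]
max_offset, transform_range = control_parameters(sample, beta=0.25, gamma=1.0)
```

With β = 0.25 and γ = 1 the maximum offset is a quarter of the character width, and the transformed range covers the full character height.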
With the defined curvilinear transformation parameters, at least one curvilinear transformation method can be controlled. Then, in the transformed character data obtaining step S122, at least one transformed character data can be obtained by applying the at least one curvilinear transformation method to the acquired original handwritten character data using the at least one set of curvilinear transformation parameters.
Alternatively, the at least one curvilinear transformation method can be applied to normal data to obtain the transformed character data. In that case, the curvilinear transformation step S120 can further comprise a data selection step of selecting data from the acquired original handwritten character data, wherein data whose first recognition candidate is correct and has a confidence score greater than or equal to a defined threshold are selected as normal data.
The confidence score can be determined according to a likelihood ratio criterion. The confidence score can be normalized, so the defined threshold can be set between 0 and 1.
Alternatively, the curvilinear transformation parameters can be set by a curvilinear transformation parameter obtaining process in the curvilinear transformation parameter setting step S121.
Fig. 7 is a flowchart of exemplary processing of the curvilinear transformation parameter obtaining process according to an example of the present invention.
As shown in Fig. 7, for each of the acquired original handwritten character data, the curvilinear transformation parameter obtaining process can comprise a data selection step S210 and a curvilinear transformation parameter obtaining step S211.
In the data selection step S210, data are selected from the acquired original handwritten character data. Data whose first recognition candidate is correct and has a confidence score greater than or equal to a defined threshold are selected as normal data. Data whose first recognition candidate is correct but has a confidence score below the defined threshold, or whose correct recognition candidate was selected by the user from the remaining N-best recognition candidates, are selected as abnormal data.
The confidence score can be determined according to a likelihood ratio criterion. The confidence score can be normalized, so the defined threshold can be set between 0 and 1.
As is known, the acquired original handwritten character data can be recognized by a handwritten character recognition engine. The N recognition candidates with the highest scores can be called the N-best candidates.
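The selection rule of step S210 can be sketched in Python as follows. The sketch assumes each sample comes with its true label and an N-best list of (label, normalized confidence) pairs sorted from best to worst; this data layout and the function name are hypothetical, and the user's manual choice among the remaining candidates is approximated by checking whether the true label appears there.

```python
def select_data(true_label, n_best, threshold):
    """Classify one sample as 'normal', 'abnormal', or None (not selected)."""
    top_label, top_confidence = n_best[0]
    if top_label == true_label:
        # First candidate is correct: the confidence score decides.
        return "normal" if top_confidence >= threshold else "abnormal"
    remaining_labels = [label for label, _ in n_best[1:]]
    if true_label in remaining_labels:
        # Correct candidate found only among the remaining N-best candidates.
        return "abnormal"
    return None  # correct label is not among the N-best candidates


confident = select_data("A", [("A", 0.92), ("R", 0.05), ("H", 0.03)], threshold=0.8)
hesitant = select_data("A", [("A", 0.55), ("R", 0.30), ("H", 0.15)], threshold=0.8)
second_best = select_data("A", [("R", 0.60), ("A", 0.35), ("H", 0.05)], threshold=0.8)
missing = select_data("A", [("R", 0.60), ("H", 0.35), ("K", 0.05)], threshold=0.8)
```

Samples that fall into neither group are simply left out of the parameter search in this sketch; the source does not say how such samples are handled.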
In the curvilinear transformation parameter obtaining step S211, at least one set of optimal curvilinear transformation parameters for controlling at least one curvilinear transformation method can be obtained from the normal data and the abnormal data.
Fig. 8 is a flowchart of exemplary processing of the curvilinear transformation parameter obtaining step S211 according to an example of the present invention.
As shown in Fig. 8, the curvilinear transformation parameter obtaining step S211 can comprise a curvilinear transformation parameter defining step S310, a learning data obtaining step S320 and an optimal curvilinear transformation parameter obtaining step S330.
In the curvilinear transformation parameter defining step S310, at least one set of curvilinear transformation parameters for controlling at least one curvilinear transformation method can be defined, similarly to the parameter definition shown in Fig. 5.
In the learning data obtaining step S320, at least one set of learning data can be obtained by applying the at least one curvilinear transformation method to the normal data using the at least one set of curvilinear transformation parameters.
In the optimal curvilinear transformation parameter obtaining step S330, at least one set of optimal curvilinear transformation parameters can be obtained by finding at least one learning data that is closest to the abnormal data. For example, the closest learning data can be found by comparing the learning data and the abnormal data according to a distance measure.
The learning data (for example, data obtained by using multiple sets of curvilinear transformation parameters) and the abnormal data each have the same number T of sample points. Let [x_i(t), y_i(t)] be the coordinates in the two-dimensional plane of the sample point of the i-th learning data at time t, where 0≤t≤T, and let [x(t), y(t)] be the coordinates of the sample point of the abnormal data at time t. The distance between the i-th learning data and the abnormal data can be calculated using the Euclidean distance defined as follows:
d_i = \sum_{t=1}^{T} \left\{ [x_i(t) - x(t)]^2 + [y_i(t) - y(t)]^2 \right\}
Multiple learning data are obtained by applying multiple sets of curvilinear transformation parameters to the normal data, so multiple distance values can be obtained from the above formula. By comparing these distance values d_i, the index j with the smallest distance value can be obtained:
j = \arg\min_{i} d_i
This means that the j-th learning data is the closest or most similar to the abnormal data. Therefore, the j-th set of curvilinear transformation parameters is optimal.
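The search for the optimal parameter set can be sketched in Python as below. As in the formula above, the distance is the sum of squared coordinate differences without a square root, which gives the same ranking as the rooted form. The function names and sample coordinates are hypothetical.

```python
def trajectory_distance(learning, abnormal):
    """d_i: sum over sample points of squared x and y differences."""
    if len(learning) != len(abnormal):
        raise ValueError("both trajectories must have the same number of sample points")
    return sum(
        (xi - x) ** 2 + (yi - y) ** 2
        for (xi, yi), (x, y) in zip(learning, abnormal)
    )


def best_parameter_index(learning_sets, abnormal):
    """Return (j, distances), where j = argmin_i d_i (zero-based)."""
    distances = [trajectory_distance(learning, abnormal) for learning in learning_sets]
    j = min(range(len(distances)), key=distances.__getitem__)
    return j, distances


# Hypothetical abnormal sample and three learning samples, one per parameter set.
abnormal = [(0.0, 0.0), (1.0, 1.0)]
learning_sets = [
    [(0.0, 1.0), (1.0, 2.0)],
    [(0.0, 0.0), (1.0, 1.5)],
    [(2.0, 0.0), (1.0, 1.0)],
]
j, distances = best_parameter_index(learning_sets, abnormal)
```

Here the second learning sample is closest to the abnormal sample, so the parameter set that produced it would be taken as optimal.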
With the obtained optimal curvilinear transformation parameters, at least one curvilinear transformation method can be controlled. Then, in the transformed character data obtaining step S122, at least one transformed character data can be obtained by applying the at least one curvilinear transformation method to the normal data according to the at least one set of optimal curvilinear transformation parameters.
The transformed character data obtaining step S122 can be carried out in various ways. Fig. 9 is a flowchart of exemplary processing of the transformed character data obtaining step S122 according to an example of the present invention.
As shown in Fig. 9, the transformed character data obtaining step S122 can comprise a curve size adjusting step S220, a transformation offset calculating step S221 and a coordinate modifying step S222.
In the curve size adjusting step S220, the size of the control curve can be adjusted according to control parameter 1 and control parameter 2.
In the transformation offset calculating step S221, a set of transformation offsets can be calculated from the adjusted control curve for all sample points within the range of the acquired original handwritten character data to be transformed.
In the coordinate modifying step S222, the coordinates of each sample point within the range of the acquired original handwritten character data to be transformed can be modified according to the calculated set of transformation offsets, using the transformation offset corresponding to that sample point.
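Steps S220 to S222 can be sketched in Python for a horizontal left-to-right push. This is a minimal sketch under several assumptions that the source does not fix: the control curve is modeled as half a sine period, the range set by γ is centered on the vertical midpoint of the character, and points outside that range are left unchanged. The function and parameter names are hypothetical.

```python
import math


def curve_transform(points, beta, gamma=1.0, profile=lambda u: math.sin(math.pi * u)):
    """Push a trajectory from left to right under a control curve."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    span_x = max(xs) - min(xs)
    span_y = max(ys) - min(ys)

    # S220: scale the control curve using control parameters 1 and 2.
    max_offset = beta * span_x                           # control parameter 1
    extent = gamma * span_y                              # control parameter 2
    y_start = (min(ys) + max(ys)) / 2.0 - extent / 2.0   # assumed: centered range

    transformed = []
    for x, y in points:
        # S221: the offset depends on the position along the perpendicular axis.
        u = (y - y_start) / extent if extent > 0 else 0.0
        offset = max_offset * profile(u) if 0.0 <= u <= 1.0 else 0.0
        # S222: for a horizontal control direction only x is modified.
        transformed.append((x + offset, y))
    return transformed


# Hypothetical character: 4 units wide, 10 units tall.
points = [(0.0, 0.0), (0.0, 5.0), (0.0, 10.0), (4.0, 5.0)]
result = curve_transform(points, beta=0.5)
narrow = curve_transform(points, beta=0.5, gamma=0.5)
```

In this sketch the points on the horizontal center line move by the full maximum offset, while the top and bottom points stay in place, matching the behavior described for Fig. 5.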
Fig. 10 schematically shows details of the curvilinear transformation parameters and of the sample points of the original handwritten character data according to an example of the present invention.
Suppose that the acquired original handwritten character data have T sample points in total, and that a planar coordinate system (x, y) is used to describe the coordinates of each sample point in the acquired original handwritten character data.
The coordinates of each sample point in the acquired original handwritten character data are denoted (x_i, y_i), where 1≤i≤T.
In this example, curvilinear transformation parameters are defined to control the curvilinear transformation method. The control direction is pushing from left to right in the horizontal direction (that is, the positive x-axis direction in Fig. 10, indicated by the arrow); the control curve is selected as an arcuate curve; the maximum transformation offset along the control direction is defined as control parameter 1; and the range of the acquired original handwritten character data to be transformed in the direction perpendicular to the control direction (that is, the positive y-axis direction in Fig. 10) is defined as control parameter 2.
As described above, control parameter 1 = β × (maximum span of the handwritten character along the control direction), where 0<β≤1, and control parameter 2 = γ × (maximum span of the handwritten character along the direction perpendicular to the control direction), where 0<γ≤1. In this example, all sample points of the acquired original handwritten character data are assumed to be transformed, that is, γ=1.
The defined curvilinear transformation parameters are used to control the curvilinear transformation method. How the curvilinear transformation is applied to the acquired original handwritten character data is described below.
In a step similar to the curve size adjusting step S220, the size of the control curve can be adjusted along the control direction according to control parameter 1 and along the direction perpendicular to the control direction according to control parameter 2.
After the size of the control curve has been adjusted, the acquired original handwritten character data and the control curve can be placed in a reference coordinate system to simplify the description and understanding of the present invention.
In the reference coordinate system, for each sample point (x_i, y_i), the corresponding points that have the same y coordinate as the sample point can be found by drawing a first reference line that is parallel to the x axis and passes through the sample point. For sample point CC in Fig. 10, by drawing the first reference line RL1 that is parallel to the x axis and passes through sample point CC, the corresponding point BB can be found on the second reference line RL2 and the corresponding point bb can be found on the control curve. The second reference line RL2 is parallel to the y axis and passes through at least one end point of the control curve.
In a step similar to the transformation offset calculating step S221, the transformation offsets can be calculated from the adjusted control curve for all sample points in the acquired original handwritten character data.
Using the defined maximum transformation offset, the first reference line and the second reference line, the sample point of the acquired original handwritten character data at which the maximum transformation offset occurs can be found by determining, for each point on the control curve, its distance to the second reference line.
For example, as shown in Fig. 10, the distance from point aa to the second reference line, that is, the distance between points aa and AA, equals the maximum transformation offset. It can therefore be determined that the maximum transformation offset occurs at sample point DD of the acquired original handwritten character data, which lies on the same first reference line as points AA and aa.
The transformation offset of sample point CC can be calculated from the control curve, the maximum transformation offset, and the distance between point AA and point BB.
Similarly, the transformation offset can be calculated for all sample points in the acquired original handwritten character data.
The calculation depends on the control curve. Different calculations can be used for different control curves.
According to the calculated transformation offsets, the coordinates of each sample point are modified for all sample points in the acquired original handwritten character data.
In this example, because the control direction is along the x axis, only the x coordinates of the sample points are modified. If the control direction is along the y axis, only the y coordinates of the sample points are modified according to the transformation offsets, which are calculated in a manner similar to that described above. If the control direction is along another direction, both the x and y coordinates of the sample points are modified according to the transformation offsets, which are again calculated in a similar manner.
Utilize the coordinate of the modification of all sampled points, can obtain the conversion character data to the original hand-written character data of obtaining.
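The offset calculation and coordinate modification described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the sine-based shapes chosen for the "1/2 arcuate" and "1 arcuate" control curves, and the reading of β as a fraction of the character width are all hypothetical. The control direction is taken to be along the x axis, so only x coordinates change.

```python
import math

def half_arch(t):
    """Assumed '1/2 arcuate' control curve: rises from 0 at t=0 to 1 at t=1."""
    return math.sin(0.5 * math.pi * t)

def full_arch(t):
    """Assumed '1 arcuate' control curve: 0 at both ends, 1 in the middle."""
    return math.sin(math.pi * t)

def curvilinear_transform(points, beta, curve=half_arch, direction=1, y_range=None):
    """Shift each sampling point along the x axis by an offset read from the curve.

    points    : list of (x, y) sampling points of one character
    beta      : first control parameter, here a fraction of the character width
    curve     : control curve mapping a normalised y position in [0, 1] to [0, 1]
    direction : +1 pushes from left to right, -1 from right to left
    y_range   : second control parameter, (y_low, y_high) range to transform;
                defaults to the full height of the character
    """
    if not points:
        return []
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    max_offset = beta * (max(xs) - min(xs))   # maximum transformation offset
    y_lo, y_hi = y_range if y_range is not None else (min(ys), max(ys))
    span = y_hi - y_lo
    transformed = []
    for x, y in points:
        if span > 0 and y_lo <= y <= y_hi:
            t = (y - y_lo) / span             # position along the control curve
            dx = direction * max_offset * curve(t)
        else:
            dx = 0.0                          # outside the range: left unchanged
        transformed.append((x + dx, y))       # only the x coordinate is modified
    return transformed
```

In this sketch the offset of a point depends only on its y coordinate, which mirrors the use of a horizontal reference line through each sampling point; a control direction along the y axis would swap the roles of the two coordinates.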
Figure 11 compares the obtained original handwritten character data with the transformed character data obtained as in Figure 10 using different β values.
As shown in Figure 11, the degree of transformation increases as β increases.
Figure 12 shows examples of transformed character data obtained with different curvilinear transformation methods.
Images (a) and (b) use curvilinear transformation methods with a 1/2 arcuate curve that push along the horizontal direction from left to right and from right to left, respectively. As can be seen, the degree of transformation increases as β increases.
As mentioned above, different types of curvilinear transformation methods can easily be combined for more complex handwritten character transformation operations. For example, image (c) combines one type of curvilinear transformation method, which pushes along the horizontal direction from left to right using a 1/2 arcuate curve, with another type, which pushes along the horizontal direction from right to left using a 1 arcuate curve. The transformed character data obtained with this combined method clearly differ both from the obtained original handwritten character data and from the transformed character data obtained with either method alone.
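One way to picture such a combination is as function composition: each curvilinear transformation maps a list of sampling points to a new list, and the combined method applies them in turn. The sketch below is illustrative only; the helper names, the sine-based curve shapes, and the absolute offsets of 2 and 3 units are assumptions made for the example.

```python
import math
from functools import reduce

def make_row_shift(max_offset, curve):
    """Build a transform that shifts each point along x by max_offset * curve(t),
    where t in [0, 1] is the point's normalised y position."""
    def transform(points):
        if not points:
            return []
        ys = [y for _, y in points]
        y_lo, span = min(ys), max(ys) - min(ys)
        if span == 0:
            return list(points)
        return [(x + max_offset * curve((y - y_lo) / span), y) for x, y in points]
    return transform

def compose(*transforms):
    """Combine several transforms into one, applied from left to right."""
    def combined(points):
        return reduce(lambda pts, f: f(pts), transforms, points)
    return combined

def half_arch(t):          # assumed shape of the 1/2 arcuate curve
    return math.sin(0.5 * math.pi * t)

def full_arch(t):          # assumed shape of the 1 arcuate curve
    return math.sin(math.pi * t)

push_right = make_row_shift(+2.0, half_arch)   # left to right, 1/2 arcuate curve
push_left = make_row_shift(-3.0, full_arch)    # right to left, 1 arcuate curve
combined = compose(push_right, push_left)

stroke = [(0.0, 0.0), (0.0, 5.0), (0.0, 10.0)]  # a vertical stroke
warped = combined(stroke)
```

Applied to the vertical stroke, the combined transform bends the middle of the stroke to the left while the upper end is pushed to the right, a shape that neither transform produces alone.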
Experimental evaluation can be used to demonstrate the advantages of the above methods.
Figures 13 and 14 show experimental evaluations of the handwritten character recognition accuracy achieved by classifiers generated with different methods.
In Figure 13, the control curve is a 1/2 arcuate curve. "Baseline" denotes the case in which only the obtained, pre-recorded original handwritten character data, without any transformed character data, are used for classifier generation. All other experiments use the obtained original handwritten character data together with the same quantity of transformed character data for classifier generation. The transformed character data are obtained with different curvilinear transformation methods, with the curvilinear transformation parameters set manually. As can be seen, the additional transformed character data are very useful. When additional transformed character data are used (obtained with the curvilinear transformation method that pushes along the horizontal direction from right to left with β = 0.2), the handwritten character recognition accuracy improves significantly, from 80.32% to 84.51%.
In Figure 14, the control curve is a 1 arcuate curve. Similarly, when additional transformed character data are used (obtained with the curvilinear transformation method that pushes along the horizontal direction from right to left with β = 0.3), the handwritten character recognition accuracy improves significantly, from 80.32% to 82.81%.
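The reported accuracies can be put in perspective with a short calculation. The sketch below, whose function name is hypothetical, derives the absolute gain in percentage points and the relative reduction in error rate from the figures quoted above; it adds no new experimental data.

```python
def accuracy_gain(baseline_acc, new_acc):
    """Return (absolute gain in percentage points, relative error-rate reduction).

    Both accuracies are given in percent, e.g. 80.32 for 80.32%.
    """
    absolute = new_acc - baseline_acc
    relative_error_reduction = absolute / (100.0 - baseline_acc)
    return absolute, relative_error_reduction

# Figure 13: 1/2 arcuate curve, beta = 0.2
gain_13 = accuracy_gain(80.32, 84.51)
# Figure 14: 1 arcuate curve, beta = 0.3
gain_14 = accuracy_gain(80.32, 82.81)
```

With these inputs, the gains are 4.19 and 2.49 percentage points, which correspond to removing roughly 21% and 13% of the baseline recognition errors, respectively.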
The above methods for obtaining character data for handwritten character recognition can be implemented by various devices.
Figure 15 is a functional block diagram of a device 1 for obtaining character data for handwritten character recognition according to an example of the present invention.
As shown in Figure 15, the device 1 for obtaining character data for handwritten character recognition may comprise an acquiring unit 110 and a curvilinear transformation unit 120, which are configured to implement the obtaining step S110 and the curvilinear transformation step S120 shown in Figure 3, respectively.
Preferably, the curvilinear transformation unit 120 further comprises a curvilinear transformation parameter setting unit 121 and a transformed character data obtaining unit 122, which are configured to implement the transformation parameter setting step S121 and the transformed character data obtaining step S122 shown in Figure 4, respectively.
More preferably, the curvilinear transformation unit 120 further comprises a data selection unit (not shown in the figures) configured to implement a data selection step of selecting data from the obtained original handwritten character data, wherein data whose first recognition candidate is correct and has a confidence score greater than or equal to a defined threshold are selected as normal data.
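The data selection step can be sketched as a simple partition of recognised samples. The claims additionally describe abnormal data: samples whose first candidate is correct but scores below the threshold, or whose correct candidate the user picks from the remaining N-best candidates. In the hedged sketch below, the `Sample` structure and the function name are hypothetical, and the user's choice is approximated by checking whether the true label appears further down the N-best list.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Sample:
    label: str                              # correct character for this sample
    candidates: List[Tuple[str, float]]     # N-best (character, confidence), best first

def select_data(samples, threshold):
    """Split samples into normal and abnormal data by confidence score."""
    normal, abnormal = [], []
    for s in samples:
        if not s.candidates:
            continue
        top_char, top_score = s.candidates[0]
        if top_char == s.label and top_score >= threshold:
            normal.append(s)                # confident and correct first candidate
        elif top_char == s.label:
            abnormal.append(s)              # correct first candidate, low confidence
        elif any(c == s.label for c, _ in s.candidates[1:]):
            abnormal.append(s)              # correct answer only among the other N-best
        # samples whose correct character is not in the N-best list are skipped
    return normal, abnormal
```

How the confidence score itself is computed is left open here; the claims mention a likelihood ratio criterion.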
The curvilinear transformation parameter setting unit 121 can be used to define the curvilinear transformation parameters manually.
Alternatively, the curvilinear transformation parameter setting unit 121 can be used to set the curvilinear transformation parameters through a curvilinear transformation parameter obtaining process.
Preferably, the curvilinear transformation parameter setting unit 121 further comprises a data selection unit 210 and a curvilinear transformation parameter obtaining unit 211, which are configured to implement the data selection step S210 and the curvilinear transformation parameter obtaining step S211 shown in Figure 7, respectively.
Preferably, the curvilinear transformation parameter obtaining unit 211 further comprises a curvilinear transformation parameter definition unit 310, a learning data obtaining unit 320, and an optimal curvilinear transformation parameter obtaining unit 330, which are configured to implement the curvilinear transformation parameter definition step S310, the learning data obtaining step S320, and the optimal curvilinear transformation parameter obtaining step S330 shown in Figure 8, respectively.
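The three steps handled by unit 211 amount to a search over candidate parameter sets: transform the normal data with each candidate to produce learning data, compare the learning data with the abnormal data, and keep the candidate whose learning data are closest. The sketch below assumes a mean point-to-point Euclidean distance and equally long point lists; the function names and the toy transform used for the demonstration are hypothetical stand-ins, not the transformation of the invention.

```python
import math

def mean_point_distance(a, b):
    """Assumed distance measure: mean Euclidean distance between corresponding
    sampling points of two equally long point lists."""
    if not a or len(a) != len(b):
        raise ValueError("point lists must be non-empty and equally long")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)

def find_best_parameters(normal, abnormal, candidates, transform,
                         distance=mean_point_distance):
    """Return (best_params, best_distance) over the candidate parameter sets."""
    best, best_d = None, math.inf
    for params in candidates:
        learning = transform(normal, params)   # learning data for this candidate
        d = distance(learning, abnormal)
        if d < best_d:
            best, best_d = params, d
    return best, best_d

def toy_transform(points, beta):
    """Toy nonlinear stand-in: shift x by beta * y**2."""
    return [(x + beta * y * y, y) for x, y in points]

normal = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
abnormal = toy_transform(normal, 0.3)          # pretend this is a hard sample
best, best_d = find_best_parameters(normal, abnormal, [0.1, 0.2, 0.3, 0.4],
                                    toy_transform)
```

In practice the candidate list would contain full parameter sets (control direction, control curve, and the two control parameters) rather than a single number.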
Preferably, the transformed character data obtaining unit 122 further comprises a curve size adjustment unit 220, a transformation offset calculation unit 221, and a coordinate modification unit 222, which are configured to implement the curve size adjustment step S220, the transformation offset calculation step S221, and the coordinate modification step S222 shown in Figure 9, respectively.
The units described above and those described below are exemplary and/or preferred modules for implementing the various steps. These modules can be hardware units (such as processors or application-specific integrated circuits) and/or software modules (such as computer-readable programs). The modules for implementing the individual steps are not described exhaustively below. However, wherever there is a step that performs a certain process, there can be a corresponding functional module or unit (implemented by hardware and/or software) for implementing the same process. The technical solutions defined by all combinations of the steps described and the units corresponding to those steps are included in the disclosure of this specification, as long as the technical solutions they constitute are complete and applicable.
In addition, the above devices made up of the various units can be incorporated as functional modules into a hardware device such as a computer. Besides these functional modules, the computer can of course have other hardware or software components.
The methods and devices described with reference to Figures 1-15 can be applied, individually or in combination, to a method and a device for generating a handwritten character recognition classifier.
A method and a device for generating a handwritten character recognition classifier according to an exemplary embodiment of the present invention are now described with reference to Figures 16 and 17. Figure 16 is a flowchart of the basic processing for generating a handwritten character recognition classifier according to an example of the present invention. Figure 17 is a functional block diagram of a device 10 for generating a handwritten character recognition classifier according to an example of the present invention.
As shown in Figure 16, the method for generating a handwritten character recognition classifier may comprise a data obtaining step S101 and a classifier generation step S102.
In the data obtaining step S101, the methods described with reference to Figures 1-14 can be applied, individually or in combination, to obtain original handwritten character data and transformed character data for the original handwritten character data.
In the classifier generation step S102, the obtained original handwritten character data and the obtained transformed character data can be used to generate the handwritten character recognition classifier.
As shown in Figure 17, the device 10 for generating a handwritten character recognition classifier according to an exemplary embodiment of the present invention may comprise a data obtaining unit 101 and a classifier generation unit 102, which are configured to implement the data obtaining step S101 and the classifier generation step S102 shown in Figure 16, respectively.
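The classifier generation step can be illustrated with a deliberately simple classifier trained on the union of original and transformed samples. The text does not say which classifier is used, so the nearest-centroid model below is a stand-in chosen for brevity; the function names are hypothetical, and every sample is assumed to have been resampled to the same number of points.

```python
import math
from collections import defaultdict

def features(points):
    """Flatten a list of (x, y) sampling points into one feature vector."""
    return [c for p in points for c in p]

def generate_classifier(original, transformed):
    """Train a nearest-centroid classifier on original plus transformed data.

    original, transformed : lists of (label, points) pairs
    Returns a function mapping a list of points to the predicted label.
    """
    sums = {}
    counts = defaultdict(int)
    for label, points in list(original) + list(transformed):
        f = features(points)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, v in enumerate(f):
            acc[i] += v
        counts[label] += 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def classify(points):
        f = features(points)
        return min(centroids, key=lambda lab: math.dist(f, centroids[lab]))

    return classify

original = [("1", [(0, 0), (0, 5), (0, 10)]), ("-", [(0, 0), (5, 0), (10, 0)])]
transformed = [("1", [(0, 0), (1.4, 5), (2, 10)]), ("-", [(0, 0), (5, 1), (10, 0)])]
classify = generate_classifier(original, transformed)
```

The transformed samples shift each class centroid toward distorted writing, which is the intended effect of adding transformed character data to the training set.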
Figure 18 is a block diagram showing the hardware configuration of a computer system 1000 that can implement embodiments of the present invention.
As shown in Figure 18, the computer system comprises a computer 1110. The computer 1110 comprises a processing unit 1120, a system memory 1130, a fixed non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190, and an output peripheral interface 1195, which are connected via a system bus 1121.
The system memory 1130 comprises a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input/output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136, and some program data 1137 reside in the RAM 1132.
A fixed non-volatile memory 1141, such as a hard disk, is connected to the fixed non-volatile memory interface 1140. The fixed non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146, and some program data 1147.
Removable non-volatile memory drives, such as a floppy disk drive 1151 and a CD-ROM drive 1155, are connected to the removable non-volatile memory interface 1150. For example, a floppy disk can be inserted into the floppy disk drive 1151, and a CD (compact disc) can be inserted into the CD-ROM drive 1155.
Input devices such as a mouse 1161 and a keyboard 1162 are connected to the user input interface 1160.
The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 via a local area network 1171. Alternatively, the network interface 1170 can be connected to a modem (modulator-demodulator) 1172, and the modem 1172 is connected to the remote computer 1180 via a wide area network 1173.
The remote computer 1180 may comprise a memory 1181, such as a hard disk, which can store remote application programs 1185.
The video interface 1190 is connected to a monitor 1191.
The output peripheral interface 1195 is connected to a printer 1196 and a loudspeaker 1197.
The computer system shown in Figure 18 is merely illustrative and is in no way intended to limit the invention, its application, or its uses.
The computer system shown in Figure 18 can be incorporated in any embodiment, either as a stand-alone computer or as a processing system within an electronic device. One or more unnecessary components can be removed from it, and one or more additional components can be added to it.
The method and system of the present invention can be implemented in many ways, for example by software, hardware, firmware, or any combination thereof. The order of the method steps described above is merely illustrative, and the method steps of the present invention are not limited to the order specifically described above unless explicitly stated otherwise. In addition, in some embodiments, the present invention can also be implemented as a program recorded on a recording medium, the program comprising machine-readable instructions for implementing the method according to the present invention. The present invention therefore also covers a recording medium that stores a program for implementing the method according to the present invention.
Although some specific embodiments of the present invention have been shown in detail by way of example, those skilled in the art will appreciate that the above examples are intended to be illustrative only and do not limit the scope of the invention. Those skilled in the art will also appreciate that the above embodiments can be modified without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (22)

1. A method for obtaining character data for handwritten character recognition, comprising:
an obtaining step of obtaining at least one piece of original handwritten character data; and
a curvilinear transformation step of applying at least one curvilinear transformation method to each piece of the obtained original handwritten character data, and obtaining, for each piece of the obtained original handwritten character data, at least one piece of transformed character data as character data for handwritten character recognition,
wherein the curvilinear transformation step non-linearly transforms the shape of the original handwritten character data without destroying the structure of the original handwritten character data, and the transformed character data cannot be restored to the original handwritten character data by a preprocessing method.
2. The method according to claim 1, wherein, for each piece of the obtained original handwritten character data, the curvilinear transformation step comprises:
a curvilinear transformation parameter definition step of defining at least one set of curvilinear transformation parameters for controlling the at least one curvilinear transformation method; and
a transformed character data obtaining step of obtaining at least one piece of transformed character data by applying the at least one curvilinear transformation method to the obtained original handwritten character data using the at least one set of curvilinear transformation parameters.
3. The method according to claim 1, wherein, for each piece of the obtained original handwritten character data, the curvilinear transformation step comprises:
a data selection step of selecting data from the obtained original handwritten character data, wherein data whose first recognition candidate is correct and has a confidence score greater than or equal to a defined threshold are selected as normal data;
a curvilinear transformation parameter definition step of defining at least one set of curvilinear transformation parameters for controlling the at least one curvilinear transformation method; and
a transformed character data obtaining step of obtaining at least one piece of transformed character data by applying the at least one curvilinear transformation method to the normal data using the at least one set of curvilinear transformation parameters.
4. The method according to claim 1, wherein, for each piece of the obtained original handwritten character data, the curvilinear transformation step comprises:
a data selection step of selecting data from the obtained original handwritten character data, wherein data whose first recognition candidate is correct and has a confidence score greater than or equal to a defined threshold are selected as normal data, and data whose first recognition candidate is correct but has a confidence score lower than the defined threshold, or whose correct recognition candidate is selected by a user from the remaining N-best recognition candidates, are selected as abnormal data;
a curvilinear transformation parameter obtaining step of obtaining, from the normal data and the abnormal data, at least one set of optimal curvilinear transformation parameters for controlling the at least one curvilinear transformation method; and
a transformed character data obtaining step of obtaining at least one piece of transformed character data by applying the at least one curvilinear transformation method to the normal data according to the at least one set of optimal curvilinear transformation parameters.
5. The method according to claim 4, wherein the curvilinear transformation parameter obtaining step comprises:
a curvilinear transformation parameter definition step of defining at least one set of curvilinear transformation parameters for controlling the at least one curvilinear transformation method;
a learning data obtaining step of obtaining at least one set of learning data by applying the at least one curvilinear transformation method to the normal data using the at least one set of curvilinear transformation parameters; and
an optimal curvilinear transformation parameter obtaining step of obtaining the at least one set of optimal curvilinear transformation parameters by finding at least one piece of learning data closest to the abnormal data.
6. The method according to any one of claims 2, 3, and 5, wherein the curvilinear transformation parameter definition step comprises:
defining a control direction;
defining a control curve;
defining a first control parameter to specify the maximum transformation offset along the control direction; and
defining a second control parameter to specify the range of the obtained original handwritten character data to be transformed in the direction perpendicular to the control direction.
7. The method according to claim 6, wherein the transformed character data obtaining step comprises:
a curve size adjustment step of adjusting the size of the control curve according to the first control parameter and the second control parameter;
a transformation offset calculation step of calculating, according to the adjusted control curve, a set of transformation offsets for all sampling points within the range to be transformed in the obtained original handwritten character data; and
a coordinate modification step of modifying, according to the calculated set of transformation offsets, the coordinates of each sampling point within the range to be transformed in the obtained original handwritten character data by using the transformation offset corresponding to that sampling point.
8. The method according to claim 6, wherein the control curve is selected from an arc, an arcuate curve, an exponential curve, a logarithmic curve, a sine curve, and a cosine curve, or from a combination of a straight line, an arc, an arcuate curve, an exponential curve, a logarithmic curve, a sine curve, and a cosine curve.
9. The method according to claim 5, wherein, in the optimal curvilinear transformation parameter obtaining step, the closest learning data are found by comparing the learning data with the abnormal data according to a distance measure.
10. The method according to claim 3 or 4, wherein, in the data selection step, the confidence score is determined according to a likelihood ratio criterion.
11. A method for generating a handwritten character recognition classifier, comprising:
a data obtaining step of obtaining original handwritten character data and transformed character data for the original handwritten character data by using the method according to any one of claims 1 to 10; and
a classifier generation step of generating the handwritten character recognition classifier using the obtained original handwritten character data and the obtained transformed character data.
12. A device for obtaining character data for handwritten character recognition, comprising:
an acquiring unit configured to obtain at least one piece of original handwritten character data; and
a curvilinear transformation unit configured to apply at least one curvilinear transformation method to each piece of the obtained original handwritten character data, and to obtain, for each piece of the obtained original handwritten character data, at least one piece of transformed character data as character data for handwritten character recognition,
wherein the curvilinear transformation unit non-linearly transforms the shape of the original handwritten character data without destroying the structure of the original handwritten character data, and the transformed character data cannot be restored to the original handwritten character data by a preprocessing method.
13. The device according to claim 12, wherein, for each piece of the obtained original handwritten character data, the curvilinear transformation unit comprises:
a curvilinear transformation parameter definition unit configured to define at least one set of curvilinear transformation parameters for controlling the at least one curvilinear transformation method; and
a transformed character data obtaining unit configured to obtain at least one piece of transformed character data by applying the at least one curvilinear transformation method to the obtained original handwritten character data using the at least one set of curvilinear transformation parameters.
14. The device according to claim 12, wherein, for each piece of the obtained original handwritten character data, the curvilinear transformation unit comprises:
a data selection unit configured to select data from the obtained original handwritten character data, wherein data whose first recognition candidate is correct and has a confidence score greater than or equal to a defined threshold are selected as normal data;
a curvilinear transformation parameter definition unit configured to define at least one set of curvilinear transformation parameters for controlling the at least one curvilinear transformation method; and
a transformed character data obtaining unit configured to obtain at least one piece of transformed character data by applying the at least one curvilinear transformation method to the normal data using the at least one set of curvilinear transformation parameters.
15. The device according to claim 12, wherein, for each piece of the obtained original handwritten character data, the curvilinear transformation unit comprises:
a data selection unit configured to select data from the obtained original handwritten character data, wherein data whose first recognition candidate is correct and has a confidence score greater than or equal to a defined threshold are selected as normal data, and data whose first recognition candidate is correct but has a confidence score lower than the defined threshold, or whose correct recognition candidate is selected by a user from the remaining N-best recognition candidates, are selected as abnormal data;
a curvilinear transformation parameter obtaining unit configured to obtain, from the normal data and the abnormal data, at least one set of optimal curvilinear transformation parameters for controlling the at least one curvilinear transformation method; and
a transformed character data obtaining unit configured to obtain at least one piece of transformed character data by applying the at least one curvilinear transformation method to the normal data according to the at least one set of optimal curvilinear transformation parameters.
16. The device according to claim 15, wherein the curvilinear transformation parameter obtaining unit further comprises:
a curvilinear transformation parameter definition unit configured to define at least one set of curvilinear transformation parameters for controlling the at least one curvilinear transformation method;
a learning data obtaining unit configured to obtain at least one set of learning data by applying the at least one curvilinear transformation method to the normal data using the at least one set of curvilinear transformation parameters; and
an optimal curvilinear transformation parameter obtaining unit configured to obtain the at least one set of optimal curvilinear transformation parameters by finding at least one piece of learning data closest to the abnormal data.
17. The device according to any one of claims 13, 14, and 16, wherein the curvilinear transformation parameter definition unit is configured to:
define a control direction;
define a control curve;
define a first control parameter to specify the maximum transformation offset along the control direction; and
define a second control parameter to specify the range of the obtained original handwritten character data to be transformed in the direction perpendicular to the control direction.
18. The device according to claim 17, wherein the transformed character data obtaining unit comprises:
a curve size adjustment unit configured to adjust the size of the control curve according to the first control parameter and the second control parameter;
a transformation offset calculation unit configured to calculate, according to the adjusted control curve, a set of transformation offsets for all sampling points within the range to be transformed in the obtained original handwritten character data; and
a coordinate modification unit configured to modify, according to the calculated set of transformation offsets, the coordinates of each sampling point within the range to be transformed in the obtained original handwritten character data by using the transformation offset corresponding to that sampling point.
19. The device according to claim 17, wherein the control curve is selected from an arc, an arcuate curve, an exponential curve, a logarithmic curve, a sine curve, and a cosine curve, or from a combination of a straight line, an arc, an arcuate curve, an exponential curve, a logarithmic curve, a sine curve, and a cosine curve.
20. The device according to claim 16, wherein, in the optimal curvilinear transformation parameter obtaining unit, the closest learning data are found by comparing the learning data with the abnormal data according to a distance measure.
21. The device according to claim 14 or 15, wherein, in the data selection unit, the confidence score is determined according to a likelihood ratio criterion.
22. A device for generating a handwritten character recognition classifier, comprising:
a data obtaining unit configured to obtain original handwritten character data and transformed character data for the original handwritten character data by using the device according to any one of claims 12 to 21; and
a classifier generation unit configured to generate the handwritten character recognition classifier using the obtained original handwritten character data and the obtained transformed character data.
CN2012100780589A 2012-03-22 2012-03-22 Method and device used for obtaining character data used for handwritten character recognition Pending CN103324925A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012100780589A CN103324925A (en) 2012-03-22 2012-03-22 Method and device used for obtaining character data used for handwritten character recognition

Publications (1)

Publication Number Publication Date
CN103324925A true CN103324925A (en) 2013-09-25

Family

ID=49193656

Country Status (1)

Country Link
CN (1) CN103324925A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1315024A (en) * 1998-08-26 2001-09-26 Decuma AB Character recognition
JP2002288668A (en) * 2001-03-23 2002-10-04 Yoshinobu Takeuchi Curve linear transformation method
CN101536012A (en) * 2005-07-01 2009-09-16 Microsoft Corporation Ink warping for normalization and beautification / ink beautification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ADEL M.ALIMI: "A Neuro-Fuzzy Approach to Recognize Arabic Handwritten Characters", 《INTERNATIONAL CONFERENCE ON NEURAL NETWORKS》, vol. 3, 9 June 1997 (1997-06-09), XP010238662, DOI: doi:10.1109/ICNN.1997.613998 *
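Liu Laiyuan et al.: "Handwritten digit recognition based on curve moments", 《Pattern Recognition and Artificial Intelligence》, vol. 8, no. 2, 15 June 1995 (1995-06-15) *
Miao Duoqian et al.: "Off-line handwritten digit recognition based on principal curves", 《Acta Electronica Sinica》, vol. 33, no. 9, 25 September 2005 (2005-09-25) *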

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197224A (en) * 2019-05-29 2019-09-03 华南理工大学 Aerial hand-written character track restoration methods based on the confrontation study of feature space depth
CN110197224B (en) * 2019-05-29 2021-05-14 华南理工大学 Method for recovering handwritten character track in air based on feature space depth counterstudy
CN113139533A (en) * 2021-04-06 2021-07-20 广州大学 Method, device, medium and equipment for quickly recognizing handwriting vector
CN113139533B (en) * 2021-04-06 2022-08-02 广州大学 Method, device, medium and equipment for quickly recognizing handwriting vector

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130925