CN106339726A - Method and device for handwriting recognition - Google Patents


Info

Publication number
CN106339726A
Authority
CN
China
Prior art keywords
radical, handwritten, character, training
Legal status
Pending
Application number
CN201510424004.7A
Other languages
Chinese (zh)
Inventor
李建杰
刘欣
王亮
Current Assignee
Canon Inc
Original Assignee
Canon Inc
Application filed by Canon Inc
Priority to CN201510424004.7A
Publication of CN106339726A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/24 Character recognition characterised by the processing or recognition method
    • G06V30/242 Division of the character sequences into groups prior to recognition; Selection of dictionaries
    • G06V30/244 Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
    • G06V30/2455 Discrimination between machine-print, hand-print and cursive writing

Abstract

The invention provides a method and a device for handwriting recognition. A handwriting registration method comprises the following steps: a training data acquisition step used for acquiring training data of a training character, wherein the training character comprises at least one Chinese character component, and the training data comprises codes of the training character; a virtual sample generation step used for generating at least one virtual sample of the training character based on a Chinese character component data set, wherein the Chinese character component data set comprises at least one writing style template of the at least one Chinese character component of the training character; and an identification template generation step used for generating at least one identification template of the training character based on the training data and the at least one virtual sample of the training character.

Description

Method and device for handwriting recognition
Technical field
The present invention relates generally to the field of handwriting recognition, and in particular to a method and device for online recognition of handwritten characters.
Background
In recent years, online handwriting recognition has been widely used in human-machine interaction systems. Taking a multifunction peripheral (MFP) as an example, handwriting recognition allows users to input operating parameters to instruct the MFP. A user can handwrite the name and address of a fax destination; the MFP recognizes the handwritten characters and performs the fax operation accordingly.
In such a scenario, the user sometimes needs to register a new character in the dictionary of the handwriting recognition engine using only a small number of handwriting samples. For example, a user may need to register a Japanese Kanji character, because when ROM size is limited it is not possible to generate a dictionary covering all Japanese Kanji characters. However, if the samples registered with the MFP during the registration phase are written by one person, the MFP's handwriting recognition engine may have difficulty recognizing samples of that character written by other people, because different people may have different writing styles, for example different stroke orders, stroke counts and stroke shapes. Figs. 12a to 12e help illustrate stroke count: in Fig. 12a the writing style template has 1 stroke, in Figs. 12b to 12d the templates each have 2 strokes, and in Fig. 12e the template has 3 strokes.
U.S. Patent No. 7865018 discloses a handwriting recognition technique that uses a personalized handwriting recognition engine. The technique uses examples of an individual's previous writing style to help recognize new handwriting input by that individual. The method cannot support other people whose writing styles differ.
In an online registration scenario, the number of real handwriting samples available for registration is usually very small and cannot cover all possible writing styles of a character. Yet even when only a few registration samples of a character exist, there is still a need to recognize the various test samples of that character.
Summary of the invention
The present invention has been made in view of at least one of the above problems.
According to one aspect of the present invention, a handwriting registration method is provided, including: a training data acquisition step of acquiring training data of a training character, wherein the training character includes at least one radical and the training data includes the code of the training character; a virtual sample generation step of generating at least one virtual sample of the training character based on a radical data set, wherein the radical data set includes at least one writing style template of the at least one radical of the training character; and a recognition template generation step of generating at least one recognition template of the training character based on the training data and the at least one virtual sample of the training character.
Other features of the present invention will become apparent from the following description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the present invention and, together with the description, serve to explain the principles of the present invention.
Fig. 1 is a schematic block diagram illustrating the configuration of an image processing apparatus 100 according to a first exemplary system configuration.
Fig. 2 is a block diagram illustrating an exemplary hardware configuration of the character recognition unit 120 in Fig. 1.
Fig. 3 is a general flowchart schematically showing the generation of radical writing style templates according to the first embodiment.
Fig. 4 is a flowchart of clustering radical samples according to the first embodiment.
Fig. 5 is a general flowchart of the handwriting registration method.
Fig. 6 is a flowchart of generating at least one virtual sample of a training character according to the second embodiment.
Fig. 7 is a flowchart of generating at least one virtual sample of a training character according to the third embodiment.
Fig. 8 is a detailed flowchart of radical detection according to the third embodiment.
Fig. 9 illustrates a registration sample of a Japanese Kanji character and its stroke order.
Fig. 10 is a detailed flowchart of constructing the handwriting of the detected radicals according to the third embodiment.
Fig. 11a illustrates the strokes numbered 7 and 8 of the registration sample in Fig. 9.
Fig. 11b illustrates the stroke segments of the strokes numbered 7 and 8 of the registration sample in Fig. 9.
Figs. 12a to 12e illustrate five writing style templates of the radical 己.
Fig. 13 illustrates an example of constructing virtual samples of a training character from the writing style templates of its radicals.
Fig. 14 illustrates the image coordinate system.
Fig. 15 illustrates an exemplary character structure dictionary.
Fig. 16 is a schematic diagram illustrating the basic processing of generating radical writing style templates according to the first embodiment.
Fig. 17 is a schematic diagram illustrating the basic processing of generating at least one virtual sample of a training character according to the second embodiment.
Fig. 18 is a schematic diagram illustrating the basic processing of generating at least one virtual sample of a training character according to the third embodiment.
Fig. 19 shows the functional configuration of the trajectory segmentation device according to the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention will now be described in detail with reference to the drawings. The following description is merely illustrative and exemplary in nature and is in no way intended to limit the present invention, its application or its uses. Unless specifically stated otherwise, the relative arrangement of components and steps, the numerical expressions and the numerical values set forth in the embodiments do not limit the scope of the present invention. Techniques, methods and devices known to those skilled in the art may not be discussed in detail, but are intended to be part of this specification where appropriate.
Fig. 1 is a schematic block diagram illustrating the configuration of an image processing apparatus 100 according to a first exemplary system configuration capable of implementing embodiments of the present invention. The apparatus 100 is a multifunction peripheral with various functions such as copying, scanning and faxing. These functions may also be implemented by multiple devices in cooperation. The apparatus 100 includes a user interface (UI) unit 110, a character recognition unit 120, an image processing unit 130 and a network interface 140.
The units of the apparatus 100 communicate with one another via a bus 10. The UI unit 110 enables the user to input commands and optimize operating parameters through a keyboard or touch screen, and displays various information such as status and processing progress to the user. For example, the UI unit 110 allows the user to input handwriting via the touch screen. The character recognition unit 120 acquires the input handwriting and computes a recognition result. The image processing unit 130 then processes the input image according to the recognition result. The network interface 140 connects the image processing apparatus 100 to a network and controls data reception from, and data transmission to, external devices on the network. The received data may be data to be printed, and the data sent to an external device may be an image obtained by scanning a paper document or an image to be faxed to a certain destination.
Fig. 2 is a block diagram illustrating an exemplary hardware configuration of the character recognition unit 120 in Fig. 1. The processor 121 controls the overall operation of the image processing apparatus 100 by loading a program and a recognition dictionary stored in a hard disk drive (HDD) 123 into the memory 122. The processor 121 also communicates with the other components in the character recognition unit 120 via the bus 10, and is configured to read, decode and execute all steps of the disclosed method. The processor 121 records the character recognition result in the memory 122 via the system bus 10. Besides the memory 122, the character recognition result may also be stored more permanently on the HDD 123. Alternatively, the character recognition result may be used as an auxiliary command for controlling the image processing unit 130.
Hereinafter, the method of generating radical writing style templates will be described in detail with reference to the first embodiment and the drawings.
First, the terms used in this specification are described.
(1) Character code. Each Japanese Kanji character is assigned a ground-truth, unique code. In this embodiment, the Unicode system is applied; for example, the character 記 is defined as 0x8a18. Each radical is also assigned a unique code; for example, the radical 言 is defined as 0x0001 and the radical 己 as 0x0010.
(2) Radical. A radical is defined as a group of adjacent strokes in a character. Within a character, radicals do not overlap, meaning that different radicals in a character never share a stroke. Each character can be considered to consist of at least one radical. In this embodiment, if a character can be split into left and right parts, the strokes of the left half and the strokes of the right half are each defined as a radical. If a character can be split into upper and lower parts, the strokes of the upper half and the strokes of the lower half are each defined as a radical. If a character cannot be split, the whole character is defined as a single radical.
(3) Character structure dictionary. The character structure dictionary is generated in advance and contains pairing information between characters and their radicals. For a given character, the dictionary indicates which radicals compose the character, the codes of those radicals, the positions of the radicals in the character, and the writing order of the radicals.
The positional information of a radical in a character includes, for example:
The coordinates of the radical's center point in the Japanese Kanji character, where the character is normalized to 400*400 and the origin of the coordinate system is at the upper-left corner. The normalized size of 400*400 and the origin position are merely illustrative. Fig. 14 illustrates the image coordinate system with the origin and coordinate directions marked. Needless to say, the normalized size and the origin position are not limited to the above.
The height of the radical in the Japanese Kanji character.
The width-to-height ratio (aspect ratio) of the radical in the Japanese Kanji character.
Fig. 15 illustrates an exemplary character structure dictionary, which contains many entries. The entry for 記 shows that 記 consists of the radicals 言 and 己. For each of 言 and 己, the dictionary lists the code, center point coordinates, height, aspect ratio and writing order.
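As an illustration of the kind of record the character structure dictionary holds, here is a minimal Python sketch, assuming a hypothetical `RadicalPlacement` record per radical carrying the fields listed above (center, height, width-to-height ratio, writing order). Only the codes 0x8a18, 0x0001 and 0x0010 come from the text; the geometry values in the sample entry for 記 are made up and are not the values shown in Fig. 15.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RadicalPlacement:
    """One radical of a character, as listed in the character structure dictionary."""
    code: int                    # radical code, e.g. 0x0001 for 言
    center: Tuple[float, float]  # radical centre in the 400*400 character box, origin top-left
    height: float                # radical height in the same box
    aspect: float                # width-to-height ratio
    order: int                   # writing order within the character (0 = written first)

    def bbox(self) -> Tuple[float, float, float, float]:
        """(left, top, right, bottom) of the radical inside the character box."""
        width = self.height * self.aspect
        cx, cy = self.center
        return (cx - width / 2, cy - self.height / 2, cx + width / 2, cy + self.height / 2)


# Illustrative entry only: the geometry below is made up, not copied from Fig. 15.
STRUCTURE_DICT: Dict[int, List[RadicalPlacement]] = {
    0x8A18: [  # 記 = 言 + 己
        RadicalPlacement(code=0x0001, center=(110, 200), height=360, aspect=0.5, order=0),
        RadicalPlacement(code=0x0010, center=(300, 220), height=280, aspect=0.6, order=1),
    ],
}


def radicals_in_writing_order(char_code: int) -> List[int]:
    """Codes of the radicals that compose a character, in writing order."""
    return [r.code for r in sorted(STRUCTURE_DICT[char_code], key=lambda r: r.order)]
```

For example, `radicals_in_writing_order(0x8A18)` gives `[0x0001, 0x0010]`, i.e. 言 followed by 己.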
(4) Writing style template. Different people may write the same radical in many different styles. These differences can be characterized by differences in stroke count, stroke shape, stroke direction and/or stroke order. Radical writing style templates are created to represent these writing styles; therefore, one radical can correspond to multiple radical writing style templates.
A radical writing style template includes the code of the corresponding radical and the stroke features of that radical. The stroke features include the stroke count and/or stroke shapes and/or stroke directions and/or stroke order.
One function of a writing style template is that, given the template, a virtual trajectory of the radical can be constructed.
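A minimal sketch of such a template, assuming (hypothetically) that stroke shape is stored as a list of points per stroke inside a unit box, that list order encodes stroke order, and that point order within a stroke encodes stroke direction. `WritingStyleTemplate` and `virtual_track` are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


@dataclass
class WritingStyleTemplate:
    radical_code: int
    # Strokes in writing order; within a stroke, point order gives the direction.
    # Coordinates are assumed to lie in a unit box [0, 1] x [0, 1].
    strokes: List[Stroke]

    @property
    def stroke_count(self) -> int:
        return len(self.strokes)

    def virtual_track(self, left: float, top: float, width: float, height: float) -> List[Stroke]:
        """Map the template strokes into the given box to obtain a virtual radical track."""
        return [[(left + x * width, top + y * height) for x, y in s] for s in self.strokes]
```

Storing coordinates in a unit box keeps the template independent of where and how large the radical is drawn; placement is deferred until the character is assembled.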
Fig. 16 is a schematic diagram illustrating the basic processing of generating radical writing style templates according to the first embodiment. Fig. 16 shows only the main steps and omits some detailed steps.
Cylinder 1610 illustrates handwriting of Japanese Kanji characters with their corresponding codes, for example handwriting of 記 with its ground-truth code 0x8a18 and handwriting of 課 with its code 0x8ab2. Cylinder 1620 illustrates the character structure dictionary, including the pairing information of 記 with 言 and 己, and of 課 with 言 and 果.
In step 100, radical samples are extracted from the character handwriting based on 1610 and 1620.
Cylinder 1630 illustrates the extracted radical samples, such as samples of the radicals 言, 己 and 果.
In step 200, the radical samples are clustered.
Cylinder 1640 illustrates the clustering result. Two samples of 言 are merged into one class because their stroke features are similar, while another sample falls into a different class, from which a further writing style template is generated. Fig. 16 shows similar processing for the radicals 己 and 果.
In step 300, writing style templates are created from the classes in cylinder 1640; usually one class yields one writing style template.
Cylinder 1650 illustrates the created writing style templates.
The generation of writing style templates is described in more detail below with reference to Fig. 3.
Fig. 3 is a general flowchart schematically showing the generation of radical writing style templates according to the first embodiment. In this embodiment, the characters are Japanese Kanji characters. A handwriting data set of Japanese Kanji characters is generated in advance, in which each character has multiple handwriting samples written by multiple writers.
As shown in Fig. 3, the Japanese Kanji handwriting data set, the corresponding codes and the character structure dictionary are provided as inputs for generating the radical writing style templates.
In step 100, radical samples are extracted from the character handwriting. For each input handwriting sample of a Japanese Kanji character, the character structure dictionary is searched using the character's code. Then, using the corresponding character structure in the dictionary, the structure of the handwriting is judged to be left-right, top-bottom or whole. According to the judged structure, the character handwriting is divided into at least one part, and each part is extracted as a radical sample.
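The text does not spell out how the handwriting is cut into radical parts in step 100. One hypothetical way, sketched below, is to normalize the character into the 400*400 dictionary box and assign each stroke to the radical whose dictionary center is closest to the stroke's centroid. This nearest-center rule and the function names are assumptions for illustration, not the method described in the patent.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]


def normalize_to_box(strokes: List[Stroke], size: float = 400.0) -> List[Stroke]:
    """Scale handwriting uniformly so its longer side spans `size`, origin at top-left."""
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    x0, y0 = min(xs), min(ys)
    span = max(max(xs) - x0, max(ys) - y0) or 1.0  # avoid dividing by zero for a single dot
    k = size / span
    return [[((x - x0) * k, (y - y0) * k) for x, y in s] for s in strokes]


def split_into_radicals(strokes: List[Stroke],
                        radical_centers: Dict[int, Point]) -> Dict[int, List[Stroke]]:
    """Assign each stroke to the radical whose dictionary centre is nearest to the stroke centroid.

    radical_centers maps radical code -> (x, y) centre in the 400*400 character box.
    Returns radical code -> list of the original (unnormalized) strokes.
    """
    normalized = normalize_to_box(strokes)
    groups: Dict[int, List[Stroke]] = {code: [] for code in radical_centers}
    for raw, s in zip(strokes, normalized):
        cx = sum(x for x, _ in s) / len(s)
        cy = sum(y for _, y in s) / len(s)
        nearest = min(
            radical_centers,
            key=lambda c: (radical_centers[c][0] - cx) ** 2 + (radical_centers[c][1] - cy) ** 2,
        )
        groups[nearest].append(raw)
    return groups
```

A centroid rule like this is fragile for strokes that cross the boundary between radicals; it is only meant to show where the dictionary's position information could enter the split.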
In step 200, samples of the same radical written by different writers are clustered. Fig. 4 is a flowchart of clustering radical samples according to the first embodiment.
In step 210, the radical samples are clustered by stroke count. For example, suppose the radical 言 has three samples with stroke counts of 6, 6 and 5. The two 6-stroke samples are clustered into the same first-level trajectory group, and the 5-stroke sample into another first-level trajectory group.
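Step 210 amounts to grouping by stroke count. A minimal sketch, assuming each radical sample is a list of strokes and each stroke a list of (x, y) points; the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Stroke = List[Tuple[float, float]]
Sample = List[Stroke]  # one radical sample = its strokes in writing order


def group_by_stroke_count(samples: List[Sample]) -> Dict[int, List[Sample]]:
    """Step 210: put samples with the same number of strokes into the same first-level group."""
    groups: Dict[int, List[Sample]] = defaultdict(list)
    for sample in samples:
        groups[len(sample)].append(sample)
    return dict(groups)
```

With the three samples of 言 above (6, 6 and 5 strokes) this yields two first-level groups, keyed 6 and 5.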
In step 220, the size of each radical trajectory is normalized. After normalization, the bounding box of each radical trajectory becomes 200*200, and its upper-left corner is at coordinate (0, 0). Needless to say, the normalization parameters are adjustable and are not intended to limit the scope of protection of the present invention.
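A sketch of step 220, assuming the 200*200 target is reached by scaling x and y independently, so that each sample's bounding box fills the square exactly; the text does not say whether the aspect ratio should be preserved, so this is one possible reading.

```python
from typing import List, Tuple

Stroke = List[Tuple[float, float]]


def normalize_radical(strokes: List[Stroke], size: float = 200.0) -> List[Stroke]:
    """Step 220: map the bounding box of a radical trajectory onto [0, size] x [0, size]."""
    xs = [x for s in strokes for x, _ in s]
    ys = [y for s in strokes for _, y in s]
    x0, y0 = min(xs), min(ys)
    # A degenerate (zero-width or zero-height) box keeps that coordinate at 0.
    w = (max(xs) - x0) or 1.0
    h = (max(ys) - y0) or 1.0
    return [[((x - x0) * size / w, (y - y0) * size / h) for x, y in s] for s in strokes]
```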
In step 230, corresponding strokes are resampled. Within each first-level radical trajectory group, the corresponding strokes are resampled so that the trajectories of corresponding strokes have the same number of points. First, the point counts of all trajectories of a corresponding stroke in the first-level group are summed, and the sum is divided by the number of trajectories of that stroke to obtain an average point count. The trajectories of the corresponding stroke are then resampled to this count using equidistant sampling.
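A sketch of step 230. Interpolating along arc length is one common reading of "equidistant sampling", and rounding the average point count to the nearest integer is an assumption; the text only says the sum is divided by the number of trajectories.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_equidistant(stroke: List[Point], n: int) -> List[Point]:
    """Resample a polyline to n points spaced evenly along its arc length."""
    if n == 1 or len(stroke) == 1:
        return [stroke[0]] * n
    # Cumulative arc length at every original point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0:
        return [stroke[0]] * n  # all points coincide
    out, seg = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(stroke) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = stroke[seg], stroke[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def resample_group(samples: List[List[List[Point]]], stroke_index: int) -> List[List[Point]]:
    """Step 230 for one stroke index: resample that stroke of every sample to the average point count."""
    strokes = [sample[stroke_index] for sample in samples]
    n = round(sum(len(s) for s in strokes) / len(strokes))  # rounding rule is an assumption
    return [resample_equidistant(s, n) for s in strokes]
```

For an L-shaped stroke `[(0, 0), (10, 0), (10, 10)]` resampled to 5 points, the points land every 5 units of arc length, including the corner at (10, 0).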
In step 240, the radical samples are clustered based on stroke count and on the distance between radical samples. The clustering principle is as follows. First, each trajectory in a first-level trajectory group is regarded as a class. Then, classes within the first-level group are merged if their feature distance is less than a threshold. Each merged result is regarded as a class, and merging can be repeated according to the feature distance. Classes with different stroke counts cannot be merged, so the trajectories in a first-level group yield at least one class. Various feature distances can be used to decide whether to merge; the following description is merely illustrative and should not limit the scope of protection of the present invention.
Different features can be used to calculate the distance between radical samples. The parameters used in the calculation are defined as follows:
n is the total number of strokes in a radical sample;
j is the stroke index;
m is the total number of points in a stroke;
i is the point index;
t is the total number of trajectories in a class;
s is the trajectory index within a class;
p and q are the indices of two classes.
In one example, a four-dimensional feature is extracted for each point of a trajectory, as in formulas (1) to (3), where i = 1, 2, ..., m.
$$\mathrm{feature}_i = (x_i,\ y_i,\ dx_i,\ dy_i) \tag{1}$$
$$dx_i = x_{i+1} - x_i \tag{2}$$
$$dy_i = y_{i+1} - y_i \tag{3}$$
If a class contains t radical samples, the average feature of the class at the point with stroke index j and point index i is calculated as in formula (4).
$$\mathrm{feature}_{i,j,\mathrm{aver}} = \frac{1}{t}\sum_{s=1}^{t} \mathrm{feature}_{i,j,s} \tag{4}$$
The Euclidean distance is used to measure the similarity between two classes. The feature distance between class p and class q for the corresponding stroke with stroke index j is calculated as in formula (5).
$$\mathrm{distance}_{p,q,j} = \sum_{i=1}^{m} \left\lVert \mathrm{feature}_{i,j,\mathrm{aver},p} - \mathrm{feature}_{i,j,\mathrm{aver},q} \right\rVert^{2} \tag{5}$$
The radical distance between class p and class q is then calculated as in formula (6).
$$\mathrm{sumdistance}_{p,q} = \frac{1}{n}\sum_{j=1}^{n} \mathrm{distance}_{p,q,j} \tag{6}$$
If the value calculated by formula (6) is less than a predetermined threshold, class p and class q are merged; otherwise they are not merged.
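The sketch below ties formulas (1) to (6) to the merging rule of step 240. Two details are assumptions not fixed by the text: the last point of a stroke, which has no successor in formulas (2) and (3), is given dx = dy = 0, and each round merges the closest qualifying pair of classes rather than an arbitrary pair below the threshold. Samples are assumed to be already resampled as in step 230, so stroke j has the same number of points in every sample of a first-level group.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Feature = Tuple[float, float, float, float]
Sample = List[List[Point]]  # strokes -> points, already resampled per step 230


def point_features(stroke: List[Point]) -> List[Feature]:
    """Formulas (1)-(3). The last point is paired with itself, i.e. dx = dy = 0 (assumption)."""
    nxt = list(stroke[1:]) + [stroke[-1]]
    return [(x, y, nx - x, ny - y) for (x, y), (nx, ny) in zip(stroke, nxt)]


def class_average(members: List[Sample]) -> List[List[Feature]]:
    """Formula (4): per-stroke, per-point mean feature over the t samples of a class."""
    t = len(members)
    feats = [[point_features(stroke) for stroke in sample] for sample in members]
    return [
        [tuple(sum(f[j][i][d] for f in feats) / t for d in range(4)) for i in range(len(feats[0][j]))]
        for j in range(len(feats[0]))
    ]


def class_distance(avg_p: List[List[Feature]], avg_q: List[List[Feature]]) -> float:
    """Formulas (5) and (6): squared feature differences summed over points, averaged over strokes."""
    per_stroke = [
        sum(sum((a - b) ** 2 for a, b in zip(fp, fq)) for fp, fq in zip(sp, sq))
        for sp, sq in zip(avg_p, avg_q)
    ]
    return sum(per_stroke) / len(per_stroke)


def cluster(samples: List[Sample], threshold: float) -> List[List[Sample]]:
    """Step 240: start with one class per sample, then merge the closest pair below `threshold`."""
    classes = [[s] for s in samples]
    while True:
        best = None  # (distance, p, q)
        for p in range(len(classes)):
            for q in range(p + 1, len(classes)):
                if len(classes[p][0]) != len(classes[q][0]):
                    continue  # classes with different stroke counts are never merged
                d = class_distance(class_average(classes[p]), class_average(classes[q]))
                if d < threshold and (best is None or d < best[0]):
                    best = (d, p, q)
        if best is None:
            return classes  # nothing left to merge: these are the second-level groups
        _, p, q = best
        classes[p].extend(classes.pop(q))  # q > p, so index p is unaffected
```

Recomputing every class average in each round is quadratic and slow for large groups; caching averages per class would be the obvious optimization, left out here for clarity.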
After step 240, the classes that can no longer be merged correspond to second-level trajectory groups. In step 300, each second-level group corresponds to one radical writing style template. A radical may have multiple writing style templates, each representing a specific way of writing the radical; different templates therefore have different stroke features.
The average features of a writing style template can be calculated through the following steps (a) to (c). These steps are merely illustrative and are not intended to limit the scope of protection of the present invention.
(a) Normalize the radical trajectories in the corresponding second-level group; in a preferred example, the trajectory coordinates are normalized to [0, 1];
(b) calculate the feature of each point using formulas (1), (2) and (3);
(c) calculate the average features of the strokes from the calculated point features.
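Steps (a) to (c) can be sketched as follows, assuming x and y are normalized into [0, 1] independently and the last point of each stroke gets dx = dy = 0; both choices are assumptions. `template_features` is an illustrative name.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Feature = Tuple[float, float, float, float]
Sample = List[List[Point]]  # strokes -> points, same point count per stroke across the group


def normalize_unit(sample: Sample) -> Sample:
    """Step (a): map the sample's bounding box onto [0, 1] x [0, 1] (x and y scaled separately)."""
    xs = [x for s in sample for x, _ in s]
    ys = [y for s in sample for _, y in s]
    x0, y0 = min(xs), min(ys)
    w = (max(xs) - x0) or 1.0
    h = (max(ys) - y0) or 1.0
    return [[((x - x0) / w, (y - y0) / h) for x, y in s] for s in sample]


def point_features(stroke: List[Point]) -> List[Feature]:
    """Step (b): formulas (1)-(3), with dx = dy = 0 assumed for the last point."""
    nxt = list(stroke[1:]) + [stroke[-1]]
    return [(x, y, nx - x, ny - y) for (x, y), (nx, ny) in zip(stroke, nxt)]


def template_features(group: List[Sample]) -> List[List[Feature]]:
    """Step (c): average each point feature over all samples of a second-level group."""
    feats = [[point_features(s) for s in normalize_unit(sample)] for sample in group]
    t = len(feats)
    return [
        [tuple(sum(f[j][i][d] for f in feats) / t for d in range(4)) for i in range(len(feats[0][j]))]
        for j in range(len(feats[0]))
    ]
```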
The method of generating writing style templates is not particularly limited, as long as the required writing style templates can be obtained.
Fig. 5 is a flowchart of the handwriting registration method. The method includes:
Step 1000, a training data acquisition step of acquiring training data of a training character, wherein the training character includes at least one radical and the training data includes the code of the training character;
Step 2000, a virtual sample generation step of generating at least one virtual sample of the training character based on the training data and a radical data set, wherein the radical data set includes at least one writing style template of the at least one radical of the training character; and
Step 3000, a recognition template generation step of generating at least one recognition template of the training character based on the training data and the at least one virtual sample of the training character.
Note that the training data obtained in step 1000 may be insufficient. For example, the training data may include only one or two real handwriting samples of the character. Such small-scale training data cannot meet the need to recognize the varied samples of the character written by many different people.
According to embodiments of the present invention, a large number of virtual samples of the training character can be generated based on the training data and the corresponding writing style templates. Recognition templates can therefore be generated as if from many different training samples, and these recognition templates are able to recognize many different written samples.
The inputs of step 2000 include the training data of the training character and the radical data set. The training character includes at least one radical, and the training data includes the code of the training character. The radical data set includes at least one writing style template of the at least one radical of the training character.
Two embodiments of generating virtual samples in step 2000 are described in detail below with reference to Figs. 6 and 7 respectively.
If the user needs to register a new character, registration can be completed in at least two ways. In one way, the user inputs the code of the new character and the character structure dictionary, so that registration is possible without inputting any training samples of the new character. In the other way, the user inputs the code of the new character and a small number of samples of it, so that registration is possible without inputting the character structure dictionary. These two options allow the user's registration requests to be met flexibly.
Fig. 17 is a schematic diagram illustrating the basic processing of generating at least one virtual sample of a training character according to the second embodiment.
The user input, shown as parallelogram 1710, includes the code of the training character 課, i.e., 0x8ab2.
Cylinder 1720 illustrates the character structure dictionary, including the pairing information of 課 with 言 and 果.
In step 2100, the radicals composing the character 課 are detected based on 1710 and 1720.
Block 1740 illustrates the detected radicals 言 and 果.
Block 1730 illustrates the radical data set, which includes the writing style templates 1731 of 言 and the writing style templates 1732 of 果. Block 1730 serves as an input to step 2200.
In step 2200, the trajectories of the radicals 言 and 果 are constructed based on 1731 and 1732 respectively.
Block 1750 illustrates the constructed trajectories 1751 of 言 and the constructed trajectories 1752 of 果.
In step 2300, virtual samples of the character 課 are constructed.
Block 1760 illustrates four exemplary constructed virtual samples.
The generation of at least one virtual sample of the training character is described in more detail below with reference to Fig. 6.
Fig. 6 is a flowchart of generating at least one virtual sample of a training character according to the second embodiment.
In the second embodiment, besides the code of the training character, the training data also includes the character structure dictionary as input. The dictionary contains an entry for the training character with the pairing information of its radicals. Preferably, the pairing information also includes the code, position and order of each corresponding radical.
To register handwriting for the Japanese Kanji character 記, its Unicode value 0x8a18 is input, as shown in Fig. 6. In step 2100, the structure dictionary is used for radical detection. The detection result includes:
the character with Unicode 0x8a18 consists of the radicals 言 and 己, whose codes are 0x0001 and 0x0010 respectively, together with the positions and writing order of 言 and 己.
According to the detected radical codes, all writing style templates of 言 and 己 are extracted from the input radical data set. In this example, there are three writing style templates of 言 and two writing style templates of 己.
In step 2200, the handwriting of the detected radicals is constructed from the stroke features in all the writing style templates. The construction of the radical 己 from one writing style template is described below as an illustration.
Step 2200-1: select a writing style template.
Step 2200-2: select the feature of one stroke in the selected template.
Step 2200-3: generate a virtual trajectory of that stroke according to the coordinate information in the stroke feature.
Repeat steps 2200-2 and 2200-3 for each remaining stroke of the template.
After the trajectories of all strokes have been generated, the virtual trajectory corresponding to the selected template is complete. In this way, three virtual trajectories of 言 and two virtual trajectories of 己 are constructed. In step 2300, the radical trajectories are combined based on the character structure dictionary to construct virtual samples of the Japanese Kanji character 記. Diverse virtual radical trajectories ensure diverse virtual character samples.
The pairing information of 記 is found in the character structure dictionary using Unicode 0x8a18. It includes the radical order, center position, height and aspect ratio of the two radicals. The actual sizes of the three virtual trajectories of 言 and the two virtual trajectories of 己 are calculated from the height and aspect ratio. The resized trajectories are then combined into virtual samples of 記 according to the radical order and center positions. In this way, up to six (3 × 2) virtual samples are constructed.
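The combination described above is essentially a Cartesian product over the virtual tracks of each radical, with each track rescaled from its height and aspect ratio and centered on the radical's dictionary position. A minimal sketch, assuming virtual tracks are stored in a unit box and each radical is passed as a plain dict with hypothetical keys `order`, `center`, `height`, `aspect` and `tracks`:

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = List[Point]
Track = List[Stroke]  # a radical trajectory: its strokes in writing order


def place(track: Track, center: Point, height: float, aspect: float) -> Track:
    """Resize a unit-box radical track to height x (height * aspect) and centre it on `center`."""
    width = height * aspect
    left, top = center[0] - width / 2, center[1] - height / 2
    return [[(left + x * width, top + y * height) for x, y in s] for s in track]


def build_virtual_samples(radicals: Sequence[dict]) -> List[Track]:
    """Combine one virtual track per radical, for every combination of tracks.

    Each radical is a dict with keys 'order', 'center', 'height', 'aspect' and
    'tracks' (a list of unit-box virtual tracks built from its writing style templates).
    """
    ordered = sorted(radicals, key=lambda r: r["order"])
    options = [
        [place(t, r["center"], r["height"], r["aspect"]) for t in r["tracks"]]
        for r in ordered
    ]
    # One character sample per element of the Cartesian product; strokes follow radical order.
    return [[stroke for track in combo for stroke in track] for combo in product(*options)]
```

With three tracks for 言 and two for 己 this produces 3 × 2 = 6 samples, and the number of samples grows multiplicatively with each additional radical.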
In a preferred embodiment, the virtual samples of 記 are further resized to another predetermined size.
Figure 18 is at least one void that character is trained in the generation exemplarily illustrating according to 3rd embodiment Intend the schematic diagram of the basic handling of sample.
The input of user is illustrated by parallelogram 1810, and it includes training sampleAnd its Code 0x8ab2.
Block 1820 exemplified with radical data collection, it include the writing style template 1821 of " ", with And the writing style template 1822 of " really ".
By step 2100 ' process, detect based on 1810 and 1820 and to form character " " Radical.
Block 1840 is exemplified with the result detecting, the code 0x01 of " " and the code of " really " 0x30.
By step 2200 ' process, be based respectively on 1821 and 1822 come to construct radical " " and " really " track.
The track of " really " of the track 1851 of " " exemplified with construction for the block 1850 and construction 1852.
Cylinder 1830 is exemplified with the character knot of the unpaired message including " ", " really " and " " Structure dictionary.
By step 2300 ' process, construct input sample based on 1830Virtual sample This.
The block 1860 exemplarily virtual sample exemplified with 4 constructions.
Next, the generation of at least one virtual sample of the training character is described in more detail with reference to Fig. 7.
Fig. 7 is a flowchart of generating at least one virtual sample of the training character according to the third embodiment.
In the third embodiment, in addition to the code of the training character, the training data also includes at least one training sample of the training character. Therefore, virtual samples of the training character are also generated.
To register a sample of the Japanese Kanji character 記, the user inputs his or her own handwriting for training. The registered sample consists of 8 consecutive strokes, written in the stroke order marked in Fig. 9. Returning to Fig. 7, in step 2100' the radicals included in the registered sample are detected.
Multiple stroke groups are generated from the training sample, the strokes within each group being consecutive. The groups differ in their number of strokes and/or their starting stroke.
The way these stroke groups are selected in step 2100'-1 of Fig. 8 is described in detail below. Fig. 9 illustrates the registered sample of the Japanese Kanji character 記 and its stroke order. If the first stroke in Fig. 9 is taken as the starting stroke, up to 8 stroke groups can be selected, each with a different stroke number from 1 to 8. If the second stroke in Fig. 9 is taken as the starting stroke, up to 7 stroke groups can be selected, each with a different stroke number from 1 to 7. In this way, any stroke in Fig. 9 can serve as the starting stroke. If the last stroke in Fig. 9 is taken as the starting stroke, only one stroke group, with a stroke number of 1, can be selected.
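The counting rule above (8 groups from the first stroke, 7 from the second, down to 1 from the last) can be sketched as follows; the function name and the 0-based (start, length) representation are illustrative assumptions.

```python
def contiguous_stroke_groups(num_strokes):
    """Every run of consecutive strokes as (start, length), with a 0-based start.
    Starting at stroke i gives num_strokes - i runs of lengths 1..num_strokes - i."""
    return [(start, length)
            for start in range(num_strokes)
            for length in range(1, num_strokes - start + 1)]


groups = contiguous_stroke_groups(8)
print(len(groups))  # 36 = 8 + 7 + ... + 1
```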
Referring back to Fig. 8, step 2100'-2 is the distance calculation step. For each selected stroke group, the subset of radical writing style templates having the same stroke number as the currently selected group is chosen from the radical data set, and the distance between the currently selected stroke group and each template in that subset is calculated. Various distance calculation methods can be applied, as long as the distance reflects how well a candidate radical writing style template matches the stroke group. In this embodiment, the Euclidean distance is used, and a dynamic time warping (DTW) algorithm is used to measure the distance and compute a corresponding similarity score: the smaller the distance, the higher the similarity score.
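A minimal sketch of this distance step, assuming each stroke group and each template is flattened into a single sequence of (x, y) points, Euclidean distance as the local DTW cost, and the hypothetical score mapping 1 / (1 + distance), which only preserves the stated rule that a smaller distance gives a higher score. The patent does not specify the score formula, the feature representation, or these function names.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two point sequences, Euclidean local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def similarity(distance):
    # Hypothetical monotone mapping: smaller distance -> higher score.
    return 1.0 / (1.0 + distance)


def rank_templates(group, templates):
    """group: list of strokes; templates: list of (name, strokes).
    Only templates with the same stroke number as the group are compared;
    results are (score, name) pairs, best first."""
    points = [p for stroke in group for p in stroke]
    ranked = []
    for name, strokes in templates:
        if len(strokes) != len(group):
            continue
        tmpl_points = [p for stroke in strokes for p in stroke]
        ranked.append((similarity(dtw_distance(points, tmpl_points)), name))
    return sorted(ranked, reverse=True)
```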
In step 2100'-3, all writing style templates for which a distance to a stroke group has been calculated are ranked according to that distance information, and the first radical included in the training character is determined.
Then, the strokes corresponding to the first radical are removed from all strokes of the training character, and the next radical is determined from the remaining strokes and the distance information.
In a preferred embodiment, as shown in Fig. 9, the first radical 己, corresponding to the strokes numbered 7 and 8, is determined. The strokes numbered 7 and 8 are then removed. Only the stroke groups produced from the strokes numbered 1 to 6 are kept, and the distance between each kept stroke group and the candidate writing style templates is calculated. Next, after ranking based on the distance information, the other radicals of this handwriting are determined. In the case of Fig. 9, the second radical 言, corresponding to the strokes numbered 1 to 6, is determined in this way.
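The greedy procedure of step 2100'-3 and the preferred embodiment (pick the best-matching run of strokes, remove its strokes, repeat on what remains) might look like this. `best_match` is a hypothetical callback standing in for the template search of step 2100'-2; it returns the best (score, radical) for a run of strokes, or None.

```python
def greedy_detect(num_strokes, best_match):
    """Greedy radical detection over runs of consecutive strokes.

    best_match(start, length) -> (score, radical) or None (hypothetical callback).
    Repeatedly keeps the best-scoring run among the strokes not yet assigned,
    then removes that run's strokes and searches again."""
    remaining = set(range(num_strokes))
    detected = []
    while remaining:
        candidates = []
        for start in sorted(remaining):
            length = 1
            # Extend the run only while all of its strokes are still unassigned.
            while start + length - 1 in remaining:
                match = best_match(start, length)
                if match is not None:
                    candidates.append((match[0], start, length, match[1]))
                length += 1
        if not candidates:
            break  # leftover strokes match no radical template
        _, start, length, radical = max(candidates)
        detected.append((radical, start, length))
        remaining -= set(range(start, start + length))
    return detected
```

With a toy score table for the 8-stroke sample of 記 in which strokes 7 and 8 (0-based 6 and 7) match 己 best, the sketch first reports 己 and then 言 for strokes 1 to 6.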
In another preferred embodiment, in step 2100'-2, all subsets of templates produced from the track of the training sample are sorted into classes such that subsets in the same class have the same stroke number and subsets in different classes have different stroke numbers. For example, all subsets of templates produced from the track in Fig. 9 are divided into 8 classes. For each class, only the template with the highest similarity score in that class is kept, and the other templates in the class are discarded. As a result, 8 different templates, with stroke numbers from 1 to 8, are retained.
Next, in step 2100'-3, the template with the highest similarity score among the 8 templates is taken as the first radical included in the training character. Then, every template whose strokes overlap those of the first radical is removed from the remaining 7 templates, and the next radical is determined from the templates that remain and their similarity scores.
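The per-stroke-number variant can be sketched in the same style: keep only the best match in each stroke-number class, then pick radicals in score order, discarding any survivor whose strokes overlap a radical already chosen. The tuple layout (score, start, length, radical) and the function name are assumptions for illustration.

```python
def detect_by_stroke_count(matches):
    """matches: iterable of (score, start, length, radical), one per candidate
    stroke run scored against a template (hypothetical layout)."""
    # One survivor per stroke-number class: the match with the highest score.
    best_per_count = {}
    for m in matches:
        length = m[2]
        if length not in best_per_count or m[0] > best_per_count[length][0]:
            best_per_count[length] = m
    pool = sorted(best_per_count.values(), reverse=True)
    detected = []
    while pool:
        _, start, length, radical = pool.pop(0)
        detected.append((radical, start, length))
        taken = set(range(start, start + length))
        # Drop every remaining survivor that shares a stroke with this radical.
        pool = [m for m in pool if taken.isdisjoint(range(m[1], m[1] + m[2]))]
    return detected
```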
Thus, step 2100' in Fig. 7 detects the radicals in the training sample.
Next, in step 2200', handwriting of the detected radicals is constructed using the radical data set.
Fig. 10 is a detailed flowchart of constructing the handwriting of the detected radicals according to the third embodiment.
Fig. 11a illustrates the strokes numbered 7 and 8 of the registered sample of the Japanese Kanji character 記 in Fig. 9.
Fig. 11b illustrates the stroke fragments of the strokes numbered 7 and 8 of the registered sample of the Japanese Kanji character 記 in Fig. 9.
In step 2200'-1, the strokes are split at dominant points. As shown in Fig. 11a, the strokes numbered 7 and 8, which correspond to the radical 己, are split at their dominant points, i.e. the turning points of the strokes.
Referring to Fig. 11b, 6 stroke fragments are obtained. Next, referring to Figs. 12a to 12e, the radical data set contains 5 writing style templates of the radical 己, which differ in how the stroke fragments are connected. In step 2200'-2, the stroke fragments are combined according to the stroke numbers and normalized stroke features of the 5 writing style templates, yielding constructed handwriting of the radical 己.
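Splitting at dominant points could be approximated as below. The text only says that the strokes are split at their turning points; treating a point as dominant when the writing direction changes by at least a hypothetical 60-degree threshold is an assumption, as are the function names.

```python
import math


def turning_angle(p0, p1, p2):
    """Change of writing direction at p1, in degrees (0 means straight on)."""
    a1 = math.atan2(p1[1] - p0[1], p1[0] - p0[0])
    a2 = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    diff = abs(a2 - a1) % (2 * math.pi)
    return math.degrees(min(diff, 2 * math.pi - diff))


def split_at_dominant_points(stroke, threshold_deg=60.0):
    """Split one stroke into fragments wherever it turns by >= threshold_deg.
    The dominant point is shared by the two fragments it separates."""
    if len(stroke) < 3:
        return [list(stroke)]
    fragments, current = [], [stroke[0]]
    for i in range(1, len(stroke) - 1):
        current.append(stroke[i])
        if turning_angle(stroke[i - 1], stroke[i], stroke[i + 1]) >= threshold_deg:
            fragments.append(current)
            current = [stroke[i]]  # the dominant point starts the next fragment
    current.append(stroke[-1])
    fragments.append(current)
    return fragments
```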
In the preferred example, 5 constructed handwritings of the radical 己 are obtained. In the same way, 4 constructed handwritings of the radical 言 are obtained.
The radical handwriting construction scheme described above preserves the shape of the stroke fragments while providing a wide variety of stroke fragment combinations.
Then, in the registered sample of Fig. 9, the strokes numbered 7 and 8 are replaced with each of the 5 constructed handwritings of the radical 己, and the strokes numbered 1 to 6 are replaced with each of the 4 constructed handwritings of the radical 言. Therefore, after step 2300' in Fig. 7, up to 20 (5 × 4) virtual samples of the training character 記 can be obtained.
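The replacement that yields up to 20 virtual samples can be sketched as a Cartesian product over the candidate handwritings of each detected radical. This sketch assumes the constructed radical handwritings are already in the coordinates of the registered sample (they are rebuilt from its own stroke fragments), so no rescaling is done; `replace_radicals` and its argument layout are hypothetical.

```python
from itertools import product


def replace_radicals(sample, radicals):
    """sample: list of strokes of the registered handwriting.
    radicals: list of (start, length, candidates); each candidate is a list of
    strokes constructed for the radical occupying sample[start:start + length].
    Returns every sample obtained by swapping in one candidate per radical."""
    radicals = sorted(radicals, key=lambda r: r[0])
    virtual = []
    for choice in product(*(cands for _, _, cands in radicals)):
        out, pos = [], 0
        for (start, length, _), strokes in zip(radicals, choice):
            out.extend(sample[pos:start])  # strokes not covered by any radical
            out.extend(strokes)
            pos = start + length
        out.extend(sample[pos:])
        virtual.append(out)
    return virtual
```

For 記 with 4 candidates for 言 (strokes 1 to 6) and 5 for 己 (strokes 7 and 8), this produces 4 × 5 = 20 virtual samples.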
In another embodiment, the recognition engine uses an HMM (hidden Markov model). As is well known in the art, such a recognition engine can recognize a test sample as long as, at the recognition template generation stage, the track of each radical included in the test sample is present in some training sample.
Fig. 13 illustrates an example of constructing virtual samples of the training character from the virtual samples of its radicals. There are 4 virtual samples of the radical 言, namely A to D, and 5 virtual samples of the radical 己, namely a to e. In step 2300, a total of 5 virtual samples of the character 記 are constructed to generate recognition templates, namely A+a, B+b, C+c, D+d and A+e.
In this case, an HMM-based recognition engine can recognize the test sample C+d, because the radical tracks C and d are present in the training samples C+c and D+d, respectively.
Alternatively, in the third embodiment, steps 2200' and 2300' may be replaced by steps 2200 and 2300 of the second embodiment. In short, the radical handwriting construction step may use the detection result of the radical detection step together with the radical data set, and the virtual sample construction step may use the character structure dictionary.
Preferably, to diversify the virtual samples of the training character and thereby improve the recognition capability of the generated recognition templates, the handwriting registration method according to the present invention further includes a virtual sample deformation step, which deforms, by geometric transformation, the virtual samples of the training character obtained in the virtual sample generation step. The geometric transformation method is not particularly limited, as long as it diversifies the virtual samples.
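Since the patent leaves the geometric transformation open, the sketch below uses one common choice, a small random affine transform (scaling, horizontal shear and rotation about the centroid). The parameter ranges and the names `affine_deform` and `random_deformations` are assumptions, not values from the text.

```python
import math
import random


def affine_deform(sample, angle_deg=0.0, shear=0.0, sx=1.0, sy=1.0):
    """Scale, shear and rotate every point of a sample about its centroid."""
    pts = [p for stroke in sample for p in stroke]
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))

    def move(x, y):
        x, y = (x - cx) * sx, (y - cy) * sy  # scale about the centroid
        x = x + shear * y                     # horizontal shear
        return cx + c * x - s * y, cy + s * x + c * y  # rotate and shift back

    return [[move(x, y) for x, y in stroke] for stroke in sample]


def random_deformations(sample, n, seed=0, max_angle=10.0, max_shear=0.2,
                        scale_range=(0.9, 1.1)):
    """n randomly deformed copies of one sample (hypothetical parameter ranges)."""
    rng = random.Random(seed)
    return [affine_deform(sample,
                          rng.uniform(-max_angle, max_angle),
                          rng.uniform(-max_shear, max_shear),
                          rng.uniform(*scale_range),
                          rng.uniform(*scale_range))
            for _ in range(n)]
```

Keeping the stroke structure unchanged means each deformed copy can be fed to the same feature extraction as the original sample.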
Returning to Fig. 5, step 3000, the recognition template generation step for generating recognition templates of the training character, is described below.
The training samples and virtual samples of the training character are geometrically deformed to increase their diversity.
The training samples, virtual samples and deformed samples of the training character are normalized to, for example, 400 × 400. Then, at least one recognition template of the training character is generated. In one example, an online training method based on a hidden Markov model may be used to generate the recognition templates.
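A sketch of normalizing a sample to a 400 × 400 box. The text does not say whether the aspect ratio is preserved; this version assumes it is, scaling the longer side to 400 and centring the shorter one. The function name is hypothetical.

```python
def normalize(sample, size=400.0):
    """Map all points of a sample into a size x size box, keeping aspect ratio."""
    pts = [p for stroke in sample for p in stroke]
    min_x, max_x = min(x for x, _ in pts), max(x for x, _ in pts)
    min_y, max_y = min(y for _, y in pts), max(y for _, y in pts)
    w, h = max_x - min_x, max_y - min_y
    longest = max(w, h)
    scale = size / longest if longest > 0 else 1.0
    # Centre the shorter side inside the box.
    off_x = (size - w * scale) / 2
    off_y = (size - h * scale) / 2
    return [[((x - min_x) * scale + off_x, (y - min_y) * scale + off_y)
             for x, y in stroke] for stroke in sample]
```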
First, features are extracted from the training samples and virtual samples of the training character according to formulas (1) to (3).
Second, recognition templates are generated from the features of the training samples and virtual samples in three steps:
- An initial template of the training character is created using k-means clustering and the Viterbi method.
- The initial template is re-estimated from all of the handwriting feature data using the Baum-Welch algorithm.
- The existing templates of Japanese Kanji characters in an online handwriting recognition engine are updated by adding the re-estimated template to them.
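The first step, creating an initial template by k-means clustering and the Viterbi method, is commonly realised as segmental k-means: segment each feature sequence uniformly over the states of a left-to-right HMM, estimate each state's mean, re-align with Viterbi, and repeat until the alignment stops changing. The sketch below assumes that reading, a left-to-right topology without skips, and squared Euclidean distance in place of a Gaussian log-likelihood; Baum-Welch re-estimation is not shown, and all names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    n = len(vectors)
    return tuple(sum(c) / n for c in zip(*vectors))


def uniform_segmentation(T, n_states):
    """Spread T frames evenly over the states (every state used when T >= n)."""
    return [min(t * n_states // T, n_states - 1) for t in range(T)]


def viterbi_align(seq, means):
    """Lowest-cost left-to-right state path (stay, or advance by one state)."""
    T, N = len(seq), len(means)
    inf = float("inf")
    cost = [[inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    cost[0][0] = sq_dist(seq[0], means[0])  # must start in the first state
    for t in range(1, T):
        for s in range(N):
            stay = cost[t - 1][s]
            advance = cost[t - 1][s - 1] if s > 0 else inf
            prev, back[t][s] = (stay, s) if stay <= advance else (advance, s - 1)
            cost[t][s] = prev + sq_dist(seq[t], means[s])
    path = [N - 1]  # must end in the last state
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def segmental_kmeans(sequences, n_states, iters=10):
    """Returns the state means of the final iteration and the alignments."""
    paths = [uniform_segmentation(len(seq), n_states) for seq in sequences]
    for _ in range(iters):
        means = [centroid([x for seq, p in zip(sequences, paths)
                           for x, st in zip(seq, p) if st == s])
                 for s in range(n_states)]
        new_paths = [viterbi_align(seq, means) for seq in sequences]
        if new_paths == paths:
            break  # alignment is stable
        paths = new_paths
    return means, paths
```

Each feature sequence must have at least as many frames as there are states, so that every state receives at least one frame.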
Note that the method of generating recognition templates is not limited to this one.
According to another aspect, the handwriting registration method of the present invention is an online registration method.
According to yet another aspect, the present invention provides a handwriting recognition method comprising:
- an obtaining step of obtaining a handwriting sample; and
- a recognition step of recognizing the handwriting sample using multiple recognition templates, some of which are generated by the handwriting registration method described above.
The recognition step further includes:
- normalizing the input handwriting sample to, for example, 400 × 400;
- extracting features of the handwriting sample according to formulas (1) to (3); and
- decoding the features of the handwriting sample according to the Viterbi method, using a recognition dictionary that includes the recognition templates of the input characters.
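Viterbi decoding against a dictionary of templates can be sketched as follows, assuming each template is a left-to-right HMM with diagonal Gaussian emissions and a single hypothetical self-loop probability shared by all states; the actual templates, features and transition model are not specified here, and the function names are illustrative. The character whose best state path has the highest log-likelihood is returned.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a diagonal Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def viterbi_score(seq, states, self_loop=0.6):
    """Log-likelihood of the best left-to-right path through states,
    where states is a list of (mean, var) pairs."""
    stay, move = math.log(self_loop), math.log(1 - self_loop)
    neg_inf = float("-inf")
    n = len(states)
    delta = [neg_inf] * n
    delta[0] = log_gauss(seq[0], *states[0])  # decoding starts in state 0
    for x in seq[1:]:
        new = [neg_inf] * n
        for s in range(n):
            best = delta[s] + stay
            if s > 0:
                best = max(best, delta[s - 1] + move)
            new[s] = best + log_gauss(x, *states[s])
        delta = new
    return delta[-1]  # decoding must end in the last state


def recognize(seq, dictionary):
    """dictionary: character -> list of (mean, var) states."""
    scores = {char: viterbi_score(seq, states) for char, states in dictionary.items()}
    return max(scores, key=scores.get)
```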
Those skilled in the art will understand that the method of the present invention is also applicable to East Asian characters such as Japanese, Chinese or Korean.
Compared with the prior art, the method and device of the present invention can recognize characters written by different people, even if those characters were registered by a single writer with only a few samples. When a user inputs and registers a sample, recognition templates can be generated online and added to the dictionary.
Meanwhile, the recognition dictionary containing the recognition templates is kept small, because not all recognition templates are pre-generated and included in it. A small recognition dictionary is particularly useful and important when the dictionary is stored in the embedded memory of an image processing apparatus such as a copier or printer.
Fig. 19 shows the functional configuration of the handwriting registration device according to the present invention. The handwriting registration device 4000 and its units may be implemented in hardware, firmware, software or any combination thereof, as long as the units of device 4000 can implement the functions of the corresponding steps of the handwriting registration method described above. If device 4000 is implemented partly or wholly in software, the software is stored in the memory of a computer, and when the computer's processor executes the stored software, the computer implements the functions of the handwriting registration method of the present invention. In another aspect, device 4000 may be implemented partly or wholly in hardware or firmware. Device 4000 may be incorporated into an image processing apparatus as a functional module.
The handwriting registration device 4000 includes: a training data acquiring unit 4100 configured to acquire training data of a training character, where the training character includes at least one radical and the training data includes the code of the training character; a virtual sample generating unit 4200 configured to generate at least one virtual sample of the training character based on a radical data set, where the radical data set includes at least one writing style template of the at least one radical of the training character; and a recognition template generating unit 4300 configured to generate at least one recognition template of the training character based on the training data of the training character and the at least one virtual sample.
Preferably, the virtual sample generating unit 4200 further includes: a radical detection subunit 4210 configured to detect the at least one radical included in the training character; a radical handwriting construction subunit 4220 configured to construct, using the radical data set, at least one handwriting of at least one detected radical; and a virtual sample construction subunit 4230 configured to construct at least one virtual sample of the training character using the constructed radical handwriting.
The present invention also discloses a handwriting recognition apparatus including: an acquiring unit configured to acquire at least one handwriting sample; and a recognition unit configured to recognize the handwriting sample using multiple recognition templates, at least some of which are generated by the handwriting registration method according to the present invention.
The present invention also discloses an image processing apparatus including: the handwriting recognition apparatus described above; and an image processing unit configured to process an input image according to the recognition result of the handwriting recognition apparatus.
The image processing apparatus may be a copier, facsimile machine, scanner, printer or multifunction printer.
The method and device of the present invention can be implemented in many ways, for example in software, hardware, firmware or any combination thereof. The order of the method steps described above is merely illustrative, and the steps of the method of the present invention are not limited to that order unless specifically stated otherwise. Furthermore, in some embodiments, the present invention may be embodied as a program recorded on a recording medium, including machine-readable instructions for implementing the method according to the present invention. The present invention therefore also covers a recording medium storing a program for implementing the method according to the present invention.
Although some specific embodiments of the present invention have been described in detail by way of example, those skilled in the art will appreciate that the above examples are merely illustrative and do not limit the scope of the present invention, and that the above embodiments can be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the appended claims.

Claims (26)

1. A handwriting registration method, comprising:
a training data obtaining step of obtaining training data of a training character, wherein the training character includes at least one radical, and the training data includes a code of the training character;
a virtual sample generation step of generating at least one virtual sample of the training character based on a radical data set, wherein the radical data set includes at least one writing style template of the at least one radical of the training character; and
a recognition template generation step of generating at least one recognition template of the training character based on the training data of the training character and the at least one virtual sample.
2. The handwriting registration method according to claim 1, wherein the at least one writing style template includes a code of the at least one radical and stroke features of the training character.
3. The handwriting registration method according to claim 2, wherein the stroke features of the at least one radical of the training character include stroke number and/or stroke shape and/or stroke direction and/or stroke order.
4. The handwriting registration method according to claim 2 or 3, wherein the radical data set includes multiple writing style templates of the at least one radical of the training character, and the multiple writing style templates have different stroke features.
5. The handwriting registration method according to any one of claims 1 to 3, wherein the virtual sample generation step further includes:
a radical detection step of detecting the at least one radical included in the training character;
a radical handwriting construction step of constructing, by using the radical data set, at least one handwriting of at least one detected radical; and
a virtual sample construction step of constructing at least one virtual sample of the training character by using the constructed radical handwriting.
6. The handwriting registration method according to claim 5, wherein the virtual sample generation step further uses a character structure dictionary containing at least one entry, one of the at least one entry including pairing information of the training character and its corresponding radicals.
7. The handwriting registration method according to claim 6, wherein the pairing information further includes the code, position and order of each of the corresponding radicals.
8. The handwriting registration method according to claim 7, wherein, for each radical included in the training character, the radical data set includes at least one writing style template.
9. The handwriting registration method according to claim 8, wherein:
the radical detection step is based on the code of the training character and the character structure dictionary; and
the virtual sample construction step is based on the character structure dictionary.
10. The handwriting registration method according to claim 5, wherein the virtual sample generation step further uses at least one training sample of the training character.
11. The handwriting registration method according to claim 10, wherein the radical detection step uses the at least one training sample of the training character and the radical data set.
12. The handwriting registration method according to claim 11, wherein the radical handwriting construction step uses the detection result of the radical detection step and the radical data set; and
the virtual sample construction step uses a character structure dictionary containing at least one entry, one of the at least one entry including pairing information of the training character and its corresponding radicals.
13. The handwriting registration method according to claim 11, wherein the radical handwriting construction step further includes splitting the strokes of the at least one training sample of the training character into fragments at the turning points of the strokes, and combining the fragments based on the radical data set.
14. The handwriting registration method according to claim 13, wherein the radical handwriting construction step further includes combining the fragments based on stroke number information and normalized stroke features of a group of radical writing style templates, the group including the writing style templates corresponding to the at least one radical obtained by the radical detection step.
15. The handwriting registration method according to claim 1, further comprising:
a virtual sample deformation step of deforming, by geometric transformation, the at least one virtual sample of the training character obtained by the virtual sample generation step.
16. The handwriting registration method according to claim 1, wherein the at least one writing style template of the at least one radical of the training character is generated by the following steps:
a radical sample clustering step of clustering radical samples by using a training sample data set of the training character; and
a feature extraction step of extracting features of the clustering result as the at least one writing style template of the at least one radical of the training character.
17. The handwriting registration method according to claim 16, wherein the features of the clustering result further include the stroke number in the clustering result and normalized stroke features in the clustering result.
18. The handwriting registration method according to claim 16, wherein the radical sample clustering step further includes:
a sample decomposition sub-step of dividing each sample in the training sample data set of the training character into at least one part, the at least one part corresponding to at least one radical; and
a radical clustering sub-step of clustering the divided parts based on the stroke number of each divided part and the distances between the divided parts.
19. The handwriting registration method according to claim 1, wherein the training character is a Japanese, Chinese or Korean character.
20. A handwriting recognition method, comprising the following steps:
obtaining at least one handwriting sample; and
recognizing the at least one handwriting sample by using multiple recognition templates, wherein at least some of the multiple recognition templates are generated by the handwriting registration method according to any one of claims 1 to 19.
21. The handwriting recognition method according to claim 20, wherein the handwriting registration method is an online registration method.
22. A handwriting registration device, comprising:
a training data acquiring unit configured to acquire training data of a training character, wherein the training character includes at least one radical, and the training data includes a code of the training character;
a virtual sample generating unit configured to generate at least one virtual sample of the training character based on a radical data set, wherein the radical data set includes at least one writing style template of the at least one radical of the training character; and
a recognition template generating unit configured to generate at least one recognition template of the training character based on the training data of the training character and the at least one virtual sample.
23. The handwriting registration device according to claim 22, wherein the virtual sample generating unit further includes:
a radical detection subunit configured to detect the at least one radical included in the training character;
a radical handwriting construction subunit configured to construct, using the radical data set, at least one handwriting of at least one detected radical; and
a virtual sample construction subunit configured to construct at least one virtual sample of the training character using the constructed radical handwriting.
24. A handwriting recognition apparatus, comprising:
an acquiring unit configured to acquire at least one handwriting sample; and
a recognition unit configured to recognize the at least one handwriting sample using multiple recognition templates, wherein at least some of the multiple recognition templates are generated by the handwriting registration method according to any one of claims 1 to 19.
25. An image processing apparatus, comprising:
the handwriting recognition apparatus according to claim 24; and
an image processing unit configured to process an input image according to the recognition result of the handwriting recognition apparatus.
26. The image processing apparatus according to claim 25, wherein the image processing apparatus is a copier, facsimile machine, scanner, printer or multifunction printer.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510424004.7A CN106339726A (en) 2015-07-17 2015-07-17 Method and device for handwriting recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510424004.7A CN106339726A (en) 2015-07-17 2015-07-17 Method and device for handwriting recognition

Publications (1)

Publication Number Publication Date
CN106339726A (en) 2017-01-18

Family

ID=57826149

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510424004.7A Pending CN106339726A (en) 2015-07-17 2015-07-17 Method and device for handwriting recognition

Country Status (1)

Country Link
CN (1) CN106339726A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1516061A (en) * 1998-02-09 2004-07-28 Method and device for recognition of character notation
CN101149804A (en) * 2006-09-19 2008-03-26 Beijing Samsung Telecommunication Technology Research Co., Ltd. Self-adaptive hand-written discrimination system and method
CN101627398A (en) * 2007-03-06 2010-01-13 Microsoft Corporation Radical-based HMM modeling for handwritten East Asian characters
CN101866417A (en) * 2010-06-18 2010-10-20 Xidian University Method for identifying handwritten Uigur characters
CN101901348A (en) * 2010-06-29 2010-12-01 Beijing Jietong Huasheng Speech Technology Co., Ltd. Normalization based handwriting identifying method and identifying device
CN102360436A (en) * 2011-10-24 2012-02-22 Institute of Software, Chinese Academy of Sciences Identification method for on-line handwritten Tibetan characters based on components
CN103366151A (en) * 2012-03-30 2013-10-23 Canon Inc. A method and an apparatus for identifying hand-written characters
CN103902993A (en) * 2012-12-28 2014-07-02 Canon Inc. Document image identification method and device
CN104008363A (en) * 2013-02-26 2014-08-27 Canon Inc. Handwriting track detection, standardization and online-identification and abnormal radical collection

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
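Liu Xiaojuan (刘晓娟): "Research on radical extraction algorithms in online handwritten Chinese character recognition", China Master's Theses Full-text Database, Information Science and Technology Series *
Huang Heming (黄鹤鸣): "Research on offline handwritten Tibetan character recognition", Wanfang Data *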

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875723A (en) * 2018-01-03 2018-11-23 Beijing Kuangshi Technology Co., Ltd. Object detection method, device and system and storage medium
CN108875723B (en) * 2018-01-03 2023-01-06 Beijing Kuangshi Technology Co., Ltd. Object detection method, device and system and storage medium
CN112699780A (en) * 2020-12-29 2021-04-23 Shanghai Chenxing Software Technology Co., Ltd. Object identification method, device, equipment and storage medium
CN113269223A (en) * 2021-03-16 2021-08-17 Chongqing Geographic Information and Remote Sensing Application Center City style classification method based on spatial culture modular factorial analysis
CN113269223B (en) * 2021-03-16 2022-04-22 Chongqing Geographic Information and Remote Sensing Application Center City style classification method based on spatial culture modular factorial analysis

Similar Documents

Publication Publication Date Title
CN104268603B (en) Intelligent marking method and system for text objective questions
Plamondon et al. Online and off-line handwriting recognition: a comprehensive survey
CN103577818B (en) A method and apparatus for pictograph recognition
US7428516B2 (en) Handwriting recognition using neural networks
CN109614944A (en) Mathematical formula recognition method, device, equipment and readable storage medium
US20070286486A1 (en) System and method for automated reading of handwriting
JPS61502495A (en) Cryptographic analysis device
JP2007317022A (en) Handwritten character processor and method for processing handwritten character
CN104866216B (en) An information processing method and smart pen
CN101581981A (en) Method and system for directly forming Chinese text by writing Chinese characters on a piece of common paper
Kumar et al. A systematic survey on CAPTCHA recognition: types, creation and breaking techniques
CN106339726A (en) Method and device for handwriting recognition
RU2259592C2 (en) Method for recognizing graphic objects using integrity principle
CN100357957C (en) Character recognition apparatus and method for recognizing characters in image
Sanjrani et al. Handwritten optical character recognition system for Sindhi numerals
CN113673294A (en) Method and device for extracting key information of document, computer equipment and storage medium
JP2008225695A (en) Character recognition error correction device and program
JP3898645B2 (en) Form format editing device and form format editing program
JP7435098B2 (en) Kuzushiji recognition system, Kuzushiji recognition method and program
US11442981B2 (en) Information providing device, information providing method, and recording medium with combined images corresponding to selected genre
JP2011522492A (en) Kanji input method suitable for Chinese education
CN107886808B (en) Braille square auxiliary labeling method and system
Saloum DAD: A Detailed Arabic Dataset for Online Text Recognition and Writer Identification, a New Type
JPS592191A (en) Recognizing and processing system of handwritten japanese sentence
CN109766978A (en) Word code generation method, recognition method, device and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170118
