CN1020213C - Hand-written character recognition apparatus - Google Patents

Publication number
CN1020213C
CN1020213C CN89106539A CN89106539A CN1020213C CN 1020213 C CN1020213 C CN 1020213C CN 89106539 A CN89106539 A CN 89106539A CN 89106539 A CN89106539 A CN 89106539A CN 1020213 C CN1020213 C CN 1020213C
Authority
CN
China
Prior art keywords
character
stroke
degree
approximation
input
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN89106539A
Other languages
Chinese (zh)
Other versions
CN1040447A (en)
Inventor
吉田公义
田守宽文
板野秋夫
茶谷公之
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Corp
Original Assignee
Sony Corp

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Character Discrimination (AREA)

Abstract

To reduce the size of the dictionary and speed up retrieval, a degree of approximation is obtained between each input stroke and predefined basic structural elements of characters, and the input character is evaluated with fuzziness from a feature vector whose components are these degrees of approximation. The data for one stroke is supplied through a temporary buffer memory 3 to approximation degree arithmetic circuits 401-426, which calculate the degree of approximation to templates T1-T26. The calculated degrees of approximation are supplied to a feature vector storing buffer 6, which has 26 memory areas in the row direction corresponding to the templates and, in the column direction, memory areas corresponding to the maximum stroke count of the characters to be recognized. Consequently, a feature vector is stored for each stroke. The feature vectors for one character and the character codes of a feature dictionary 7 are processed in a detecting circuit 8, and the code number of the input character is output.

Description

Hand-written character recognition apparatus
The present invention relates to an apparatus for recognizing handwritten input characters.
In the hand-written character recognition apparatus of the present invention, the degree of approximation between each input stroke and predefined basic structural elements of characters is obtained, these degrees of approximation are used as the components of a feature vector, and the input character is evaluated from this vector with fuzziness, which improves various recognition characteristics.
The prior-art method of recognizing handwritten input characters, shown in Fig. 6, is as follows:
1. The input handwriting of one handwritten stroke (Fig. 6A) is approximated by a polyline (Fig. 6B) formed from the sampling points P_0, P_1, ..., P_n on the handwriting and their time-series information.
2. The polyline of item 1 is compared with predefined basic handwriting shapes, i.e. "basic stroke types", to perform "stroke recognition".
3. According to the result of item 2, the input stroke is converted into the code number of the closest basic stroke type.
4. Items 1-3 are repeated for all strokes of one character.
5. With reference to a dictionary, the character that has the code numbers of item 3 in stroke order is judged to be the input character.
This method has been widely used.
With this method, each input stroke is replaced by one basic stroke type, so almost all of the information in the sampling points P_0, P_1, ..., P_n of the input handwriting can be discarded, except for the data needed for later recognition. Character recognition is therefore possible even on a device with little memory. Also, because the dictionary lists the code numbers of the basic stroke types in stroke order, the input character can be recognized simply by comparing these, in order, with the code numbers of the input character's basic stroke types. The dictionary can thus be made small and the time needed for comparison short.
Reference: "Nikkei", issue of December 5, 1983
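The prior-art lookup described above can be sketched roughly as follows. This is a minimal illustration, not the published implementation: the dictionary contents, the character names, and the stroke-type code numbers are all hypothetical.

```python
# Hypothetical prior-art dictionary: each character maps to the sequence of
# basic-stroke-type code numbers in stroke order (all values are made up).
PRIOR_ART_DICTIONARY = {
    "char_A": (3, 7),
    "char_B": (1, 1, 4),
}


def recognize_by_codes(stroke_codes):
    """Return the character whose code sequence matches exactly, else None."""
    key = tuple(stroke_codes)
    for char, codes in PRIOR_ART_DICTIONARY.items():
        if codes == key:
            return char
    return None


print(recognize_by_codes([3, 7]))  # exact match on every stroke
print(recognize_by_codes([2, 7]))  # one misrecognized stroke: no match
```

Because each stroke is reduced to a single code before lookup, one misrecognized stroke is enough to make the lookup fail, which is the weakness the following paragraphs discuss.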
However, this method has a drawback: when input noise or the writer's manner of input changes or deforms the basic shape of the input handwriting, errors occur in the stroke recognition of item 2, and recognition accuracy falls significantly as a result. For example, when the character "one" (a single horizontal stroke) is written with a "pen-entry" mark at its start, as circled in Fig. 7A, its basic stroke type may be recognized as the shape shown in Fig. 7B.
For this reason, dictionaries have in the past handled "stroke types that are easily misrecognized" by listing their codes in parallel, in the form "code of stroke S = C_1, C_2, ..., C_n".
Doing so, however, both enlarges the dictionary and lengthens retrieval time. Moreover, when a user wants to add a character not yet registered in the dictionary, the stroke type codes that must be registered cannot be determined from the input strokes alone, so such additional registration is very difficult.
The object of the present invention is to solve the above problems.
To this end, in the present invention, the degree of approximation between the input stroke information and predefined basic structural elements of characters is obtained, the input stroke information is converted into a feature vector whose components are these degrees of approximation, and the evaluation is varied according to the qualifiers used in the evaluation.
By performing character recognition with fuzziness in this way, the dictionary can be made smaller and retrieval faster without lowering the recognition rate.
First, an outline of the present invention and its embodiment is given.
That is, in the present invention:
1. Suitable template patterns (basic stroke types) T_1-T_26, for example those shown in Fig. 2, are prepared.
2. The information S_i of the i-th stroke of each handwritten input character is compared with the templates T_1-T_26 in order, and the degree of approximation E_ij to each template T_j (j = 1-26) is calculated.
For example, if the input character is the katakana character shown in Figure 891065393_IMG2, its first stroke S_1 is a left-falling stroke ("pie"). Its degree of approximation to templates T_1, T_2, T_3, etc. is therefore high and its degree of approximation to template T_7 etc. is low, giving E_101 = 90%, E_102 = 80%, E_103 = 95%, ..., E_107 = 0%, ..., E_126 = 0%. Similarly, because the second stroke S_2 is a dot ("dian"), E_201 = 5%, E_202 = 0%, E_203 = 0%, ..., E_207 = 95%, ..., E_226 = 0%. (The values are assumed for the purpose of explanation.)
3. For each character, the result of item 2 is stored for each stroke S_i as a feature vector V_i = (E_i01, E_i02, ..., E_i26).
For the character of the preceding example (Figure 891065393_IMG3), the feature vectors are:
V_1 = (90, 80, 95, ..., 0, ..., 0)
V_2 = (5, 0, 0, ..., 95, ..., 0)
4. When the katakana character shown in Figure 891065393_IMG4 is written correctly, its first stroke S_1 "substantially" matches template T_3, and its second stroke S_2 "definitely" matches template T_7.
Therefore, if the dictionary contains the character shown in Figure 891065393_IMG5, the entry for that character (Figure 891065393_IMG6) describes its character code (a JIS code) together with T_3 = "substantially", T_7 = "definitely".
That is, for each character, the dictionary describes the character's code number, the template T_j closest to the character's i-th stroke S_i, and a qualifier indicating the degree of this approximation (match). The template T_j and qualifier are described in stroke order, once for each stroke of the character.
The characters are further classified according to their total stroke count.
5. Function curves with fuzziness, such as those shown in Fig. 3, are prepared.
6. From the character data of item 4, the data of the first character is taken from the entry for the total stroke count of the input character.
In the preceding example, the total stroke count of the character shown in Figure 891065393_IMG7 is 2, so the first character data is taken from the 2-stroke entry.
7. Suppose, for simplicity, that the character data obtained in item 6 is that of the character shown in Figure 891065393_IMG8. Since the entry for the first stroke S_1 is T_3 = "substantially", the "substantially" function curve of Fig. 3 is selected, and the degree of approximation to template T_3, 95% (= E_103), is taken from the feature vector V_1 obtained in item 3.
Then, according to the "substantially" function curve of Fig. 3, this degree of approximation of 95% is converted into a degree of conformity G_1, for example G_1 = 96%.
Likewise, since the entry for the second stroke S_2 is T_7 = "definitely", the "definitely" function curve of Fig. 3 is selected, the degree of approximation to template T_7, 95% (= E_207), is taken from the feature vector V_2, and it is converted into a degree of conformity G_2, for example G_2 = 98%.
That is, when character data is taken out in item 6, a function curve of Fig. 3 is selected for each stroke S_i, and the corresponding degree of approximation E_ij in the feature vector V_i is converted into the degree of conformity G_i given by the selected function curve.
8. The minimum of the degrees of conformity G_i obtained in item 7 is taken as the character degree of conformity G_m for the code number represented by this character data.
In the preceding example, since G_1 = 96% and G_2 = 98%, the degree of conformity G_m of the input character to this dictionary character is 96% (= G_1).
9. Thereafter, items 7 and 8 are repeated for all character data with the matching stroke count.
10. When item 9 is finished, among the resulting degrees of conformity G_m (there are as many as there are character data), the character with the highest degree of conformity G_m is taken as the first candidate for the input character, and its code number is output.
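Items 3 to 10 above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented circuit: the two qualifier curves stand in for the Fig. 3 function curves, whose actual shapes are not given in the text (a square root for "substantially" and a square for "definitely" are assumed here), and the dictionary codes CHAR_X, CHAR_Y, and CHAR_Z are hypothetical. Degrees of approximation are expressed as decimals between 0 and 1.

```python
import math

NUM_TEMPLATES = 26

# Assumed stand-ins for the Fig. 3 function curves (shapes are NOT from the source).
QUALIFIER_CURVES = {
    "substantially": math.sqrt,        # lenient: raises intermediate values
    "definitely": lambda e: e * e,     # strict: lowers intermediate values
}


def feature_vector(approx):
    """Item 3: build V_i = (E_i01, ..., E_i26) from {template number: value}."""
    return [approx.get(j, 0.0) for j in range(1, NUM_TEMPLATES + 1)]


def character_conformity(vectors, strokes):
    """Items 7-8: convert each stroke's E_ij to G_i and return G_m = min G_i."""
    return min(
        QUALIFIER_CURVES[qualifier](v[template - 1])
        for v, (template, qualifier) in zip(vectors, strokes)
    )


def recognize(vectors, dictionary):
    """Items 6, 9, 10: score entries with the same stroke count, return the best."""
    scores = {
        code: character_conformity(vectors, strokes)
        for code, strokes in dictionary.items()
        if len(strokes) == len(vectors)
    }
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical dictionary (item 4): code -> [(closest template, qualifier), ...]
DICTIONARY = {
    "CHAR_X": [(3, "substantially"), (7, "definitely")],
    "CHAR_Y": [(7, "definitely"), (3, "substantially")],
    "CHAR_Z": [(1, "definitely"), (2, "definitely"), (7, "substantially")],
}

# Feature vectors using the illustrative values from the text (as decimals).
vectors = [
    feature_vector({1: 0.90, 2: 0.80, 3: 0.95}),
    feature_vector({1: 0.05, 7: 0.95}),
]
result = recognize(vectors, DICTIONARY)
print(result)
```

With these assumed curves the numbers differ from the illustrative 96% and 98% in the text, but the structure is the same: a qualifier-dependent conversion per stroke, a minimum over the strokes of one candidate, and a maximum over the candidates with the same stroke count.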
The template patterns T_1-T_26 are determined in view of the following points:
1. The structural elements of Chinese characters, such as the horizontal stroke, the vertical stroke, the left-falling stroke, and the bend, are limited in kind.
2. Even strokes that look the same vary with the way the pen is moved, producing deformed variants such as a left-falling stroke that bends. Moreover, for strokes such as the left-falling stroke, no fixed length or angle is prescribed, so various deformations can occur. Template patterns T_1-T_3, for example, are separate templates prepared to distinguish such deformation types.
3. Very complex basic shapes occur extremely rarely across all Chinese characters, so no templates are defined for them, and they are handled by other recognition methods.
In addition, the stroke portions drawn with dotted lines in template patterns T_1-T_26 indicate portions whose contribution may be reduced or ignored when the degree of approximation E_ij is obtained.
A structural example of the present invention is described below.
Fig. 1 is a system diagram of one embodiment of the present invention.
Figs. 2 to 7 are explanatory diagrams thereof.
In Fig. 1, (1) denotes coordinate input means such as a graphics tablet. The coordinate sequence P_0-P_n of one stroke is input through this input means (1) and sent to a polyline compression circuit (2), where it is compressed into a sequence of polyline segments and their end point information. That is, if the input stroke (coordinate sequence) is preprocessed into polyline segments #1-#4 as shown in Fig. 4B, the angle (direction) of each segment #1-#4 is quantized into one of 8 direction numbers as shown in Fig. 4A, and the length of each segment and the coordinates of its start and end points are also converted, yielding data such as shown in Fig. 4C.
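The segment encoding described here can be sketched as follows. This is only an illustration under assumptions: the numbering of the 8 directions in Fig. 4A is not given in the text, so direction 0 is assumed to point along the positive x axis with numbers increasing counterclockwise (y axis pointing up), and the input is assumed to be the polyline vertices already produced by the approximation step.

```python
import math


def quantize_direction(dx, dy):
    """Map a segment direction to one of 8 direction numbers.

    Assumed numbering: 0 = +x, increasing counterclockwise in 45-degree steps.
    """
    angle = math.atan2(dy, dx)               # range -pi..pi
    return round(angle / (math.pi / 4)) % 8  # nearest multiple of 45 degrees


def encode_polyline(vertices):
    """Return (direction number, length, start point, end point) per segment."""
    segments = []
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        dx, dy = x1 - x0, y1 - y0
        segments.append(
            (quantize_direction(dx, dy), math.hypot(dx, dy), (x0, y0), (x1, y1))
        )
    return segments


for segment in encode_polyline([(0, 0), (3, 0), (3, 4)]):
    print(segment)
```

Each tuple corresponds to one row of data of the kind shown in Fig. 4C: a quantized direction, a length, and the start and end coordinates of the segment.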
The data for one stroke is then sent through a buffer memory (3) to degree-of-approximation calculation circuits (401)-(426), which calculate the degrees of approximation E_ij to the template patterns T_1-T_26 (item 2 above). The degrees of approximation E_ij are calculated independently and in parallel for each template pattern T_j, according to the algorithms described in rule storage circuits (501)-(526).
The calculated degrees of approximation E_ij are then sent to a feature vector buffer memory (6). The figure shows the structure of this buffer memory (6) schematically: it has 26 memory areas in the row direction corresponding to the template patterns T_1-T_26, and K memory areas in the column direction corresponding to the maximum stroke count K of the characters to be recognized. The feature vector V_i of each stroke S_i of a character is therefore stored in this buffer memory (6) stroke by stroke (item 3 above).
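The layout of the feature vector buffer can be pictured as a K-by-26 table that fills one line per stroke. The class below is a hypothetical software analogue of buffer memory (6), written only to illustrate that layout; the class name, method names, and overflow behavior are assumptions, not part of the source.

```python
NUM_TEMPLATES = 26


class FeatureVectorBuffer:
    """K lines of 26 cells; one line holds the feature vector of one stroke."""

    def __init__(self, max_strokes):
        self.max_strokes = max_strokes
        self.cells = [[0.0] * NUM_TEMPLATES for _ in range(max_strokes)]
        self.stroke_count = 0

    def store(self, vector):
        """Store the feature vector V_i of the next stroke S_i."""
        if len(vector) != NUM_TEMPLATES:
            raise ValueError("a feature vector needs 26 components")
        if self.stroke_count >= self.max_strokes:
            raise OverflowError("more strokes than the buffer was sized for")
        self.cells[self.stroke_count] = list(vector)
        self.stroke_count += 1

    def character_vectors(self):
        """Return the vectors of the strokes entered so far, in stroke order."""
        return self.cells[: self.stroke_count]


buf = FeatureVectorBuffer(max_strokes=3)
buf.store([0.9, 0.8, 0.95] + [0.0] * 23)
buf.store([0.05] + [0.0] * 5 + [0.95] + [0.0] * 19)
print(buf.stroke_count, len(buf.character_vectors()))
```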
The feature vectors V_i of the character and the character codes from a feature dictionary (7) (item 4 above) are then processed in an evaluation circuit (8) according to items 5-10 above, and the code number with the highest degree of conformity to the input character is output.
Fig. 5 shows an example of the rule by which the degree-of-approximation calculation circuit (401) calculates the degree of approximation E_i01 of an input stroke S_i to template T_1.
That is, Fig. 5A shows, in exaggerated form, the left-falling stroke ("pie") contained in characters such as "right side" and "five". For this stroke, in the case of template T_1, the lengths L_1-L_4, L_h, and L_w are measured as shown in Fig. 5B, and the degree of approximation is calculated as:
E_i01 = (a·L_h - b·L_w - c·L_1 + d·L_4 + e·L_3) / L_2
where E_i01 = 1 when E_i01 > 1, and E_i01 = 0 when E_i01 < 0.
Here a-e are predetermined constants, and the degree of approximation E_i01 is expressed as a decimal.
In template T_1, the weight of the dotted-line stroke portion is zero or small, so the constants e and d applied to the values L_3 and L_4 are smaller than the other constants a-c.
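The T_1 rule can be written directly as a clamped weighted sum. The sketch below follows the formula in the text; the default values of the constants a-e are placeholders chosen only so that d and e are smaller than a-c, since the source does not give actual values.

```python
def approximation_t1(l1, l2, l3, l4, lh, lw,
                     a=1.0, b=0.5, c=0.5, d=0.1, e=0.1):
    """Degree of approximation E_i01 of a stroke to template T_1, in 0..1.

    The constants a-e are assumed placeholder values, not from the source.
    """
    if l2 <= 0:
        raise ValueError("L_2 must be positive")
    value = (a * lh - b * lw - c * l1 + d * l4 + e * l3) / l2
    return min(1.0, max(0.0, value))  # clamp to the range 0..1


print(approximation_t1(l1=0, l2=10, l3=0, l4=0, lh=8, lw=2))
```

The clamping step implements the two boundary conditions stated with the formula, so the result can be used directly as a component of the feature vector.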
Likewise, the calculation circuits (402)-(426) calculate the degrees of approximation E_i02-E_i26 according to formulas defined individually for the corresponding templates T_2-T_26.
According to the present invention, handwritten input characters are recognized as described above. In particular, because the degree of approximation E_ij between the input stroke S_i and the predefined templates T_1-T_26 is obtained, and character recognition is based on this degree of approximation E_ij and the qualifiers, the recognition rate is not lowered by natural variation or deformation of the handwriting. Moreover, because templates such as T_1-T_3 are provided to handle certain deformations as well, the recognition rate is improved and the ability to recognize varied or deformed handwriting is strengthened.
Furthermore, since the dictionary (7) basically needs only one representative template T_j and qualifier for each stroke S_i, the dictionary (7) can be made small and its retrieval fast.
Registration of an undefined character can also be realized as follows: each time the user inputs a stroke, the shape of the template with the highest degree of approximation is displayed as an image, and the user confirms interactively whether this template is the correct shape.
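The interactive registration could be organized along the following lines. This is a speculative sketch: the source only says that the best-matching template is shown and confirmed, so the `confirm` callback, the handling of a rejected proposal (the user supplies a replacement template number), and the single default qualifier are all assumptions made for illustration.

```python
def best_template(feature_vector):
    """Return the 1-based number of the template with the highest E_ij."""
    index = max(range(len(feature_vector)), key=feature_vector.__getitem__)
    return index + 1


def register_character(code, feature_vectors, confirm,
                       default_qualifier="substantially"):
    """Build a dictionary entry stroke by stroke.

    `confirm(stroke_no, template)` is a hypothetical callback: it returns True
    to accept the proposed template, or a replacement template number.
    """
    strokes = []
    for stroke_no, vector in enumerate(feature_vectors, start=1):
        proposed = best_template(vector)
        answer = confirm(stroke_no, proposed)
        template = proposed if answer is True else int(answer)
        strokes.append((template, default_qualifier))
    return code, strokes


v1 = [0.0] * 26
v1[2] = 0.95          # stroke 1 is closest to T_3
v2 = [0.0] * 26
v2[0], v2[6] = 0.05, 0.95  # stroke 2 is closest to T_7
print(register_character("NEW_CHAR", [v1, v2], lambda stroke_no, t: True))
```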

Claims (1)

1. A hand-written character recognition apparatus comprising: a handwriting input circuit including a coordinate input device and a polyline compression circuit that approximates, with polylines, each handwriting stroke input through said coordinate input device;
a rule storage circuit storing in advance a plurality of pieces of template information corresponding to basic structural elements of characters; and
a feature dictionary memory storing in advance, for each character, the template information closest to each of its strokes and a qualifier indicating the degree of approximation;
said hand-written character recognition apparatus being characterized by further comprising:
a degree-of-approximation calculation circuit for calculating the degree of approximation between each piece of stroke information supplied from said handwriting input circuit and each of the plurality of pieces of template information stored in said rule storage circuit; and
an evaluation circuit that evaluates the handwritten character by referring to said feature dictionary memory on the basis of the degree of approximation of each stroke calculated by said degree-of-approximation calculation circuit, and that, in this evaluation, varies the degree of character judgment by qualifying said degree of approximation with said qualifier.
CN89106539A 1988-08-17 1989-08-17 Hand-written character recognition apparatus Expired - Fee Related CN1020213C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP63204144A JP2762472B2 (en) 1988-08-17 1988-08-17 Character recognition method and character recognition device
JPP204144/88 1988-08-17
JP204144/88 1988-08-17

Publications (2)

Publication Number Publication Date
CN1040447A CN1040447A (en) 1990-03-14
CN1020213C true CN1020213C (en) 1993-03-31

Family

ID=16485566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN89106539A Expired - Fee Related CN1020213C (en) 1988-08-17 1989-08-17 Hand-written character recognition apparatus

Country Status (3)

Country Link
JP (1) JP2762472B2 (en)
KR (1) KR0128733B1 (en)
CN (1) CN1020213C (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0548030B1 (en) * 1991-12-19 1998-09-30 Texas Instruments Incorporated Character recognition
TW274135B (en) * 1994-09-14 1996-04-11 Hitachi Seisakusyo Kk
JPH10124505A (en) * 1996-10-25 1998-05-15 Hitachi Ltd Character input device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0632083B2 (en) * 1988-06-10 1994-04-27 サン電子株式会社 Handwritten character recognition system by fuzzy reasoning
JPH0632084B2 (en) * 1988-07-25 1994-04-27 サン電子株式会社 Handwritten character recognition method by fuzzy reasoning

Also Published As

Publication number Publication date
JPH0253193A (en) 1990-02-22
CN1040447A (en) 1990-03-14
JP2762472B2 (en) 1998-06-04
KR0128733B1 (en) 1998-04-15
KR900003771A (en) 1990-03-27

Similar Documents

Publication Publication Date Title
CN1167030C (en) Handwriteen character recognition using multi-resolution models
CN1098504C (en) Method for performing string matching
CN107330127B (en) Similar text detection method based on text picture retrieval
CN1107283C (en) Method and apparatus for character recognition of handwriting input
CN1017663B (en) Printer
CN1163841C (en) On-line hand writing Chinese character distinguishing device
US6035063A (en) Online character recognition system with improved standard strokes processing efficiency
CN1755706A (en) Image construction method, fingerprint image construction apparatus, and program
CN1040693A (en) Hand-written character recognition apparatus and method
JPH0139154B2 (en)
CN113076465A (en) Universal cross-modal retrieval model based on deep hash
JPH11203461A (en) Graph sorting method and system, graph retrieving method and system, graph sorting feature extracting method, graph sorting table preparing method, information recording medium, and method for evaluating similarity or difference between graphs
JP2673871B2 (en) Method and device for pattern recognition by neural network
CN1051633A (en) target identification system
CN114745553A (en) Image data storage method based on big data
CN115914640A (en) Data compression method for Internet of vehicles
CN1227373A (en) Handwriting verification device
CN1020213C (en) Hand-written charactor recognition apparatus
CN1035844C (en) Method of sorting out candidate characters in character recognition system
CN115526310A (en) Network model quantification method, device and equipment
KR100248601B1 (en) On-line character recognition method and device
CN1019699B (en) Hand-written charactor recognition apparatus
CN110941730A (en) Retrieval method and device based on human face feature data migration
CN1303563C (en) Method and system for compressing hand-written character template
JPH05314320A (en) Recognition result evaluating system using difference of recognition distance and candidate order

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C15 Extension of patent right duration from 15 to 20 years for appl. with date before 31.12.1992 and still valid on 11.12.2001 (patent law change 1993)
OR01 Other related matters
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee