JPH0330191B2

Info

Publication number: JPH0330191B2
Application number: JP59057420A
Authority: JP
Priority date: 1984-03-27
Filing date: 1984-03-27
Publication date: 1991-04-26
Also published as: JPS60201486A

Description

DETAILED DESCRIPTION OF THE INVENTION

(Technical Field) The present invention relates to a fast and accurate method for reading handwritten documents.

(Background Art) A method for reading handwritten documents has previously been proposed in which the presence or absence of a mark in an entry frame that designates the character type of a written character is detected, and the character in any column where a designation was detected is read by referring only to the dictionary of the designated character type. With this method, however, all dictionaries must be referenced when no character type has been designated, which slows processing.

(Object and Summary of the Invention) An object of the present invention is to remedy the above drawback of the prior art and provide a fast and accurate method for reading handwritten documents. Its characteristic feature is that the character line amount (the quantity of stroke in a character) is detected and taken as the complexity of that character, and, when no character type has been designated, the dictionary of the character type containing the character is selected according to that complexity and used for identification.

(Embodiment of the Invention) Fig. 1 is a block diagram showing an embodiment of the handwritten Japanese document reading method according to the present invention. In the figure, 1 is a photoelectric conversion unit, 2 is a pattern register, 3 is a feature extraction unit, 4 is a character line amount detection unit, 5 is a character type designation detection unit, 6 is an identification unit, 7 is a character name output, 8 is a hiragana dictionary memory, 9 is a katakana dictionary memory, 10 is an alphanumeric and symbol dictionary memory, and 11 is a kanji dictionary memory.

Fig. 2 shows an example of a form used in this embodiment, where 21 is the form, 22 is a character type designation entry frame, within which 23 is a kanji designation field, 24 is a hiragana designation field, 25 is a katakana designation field, and 26 is an alphanumeric and symbol designation field, 27 is a character entry frame, 28 is a character type designation entry line, and 29 is a character entry line. The form is filled in, for example, as in the completed form shown in Fig. 3.

The operation of the present invention is described below using this example form.

First, for the character type designation entry line 28 of form 21 in Fig. 2, the presence or absence of a character type designation corresponding to each character in the character entry line 29 is detected and sent to the identification unit 6. To do this, the photoelectric conversion unit 1 photoelectrically converts the character type designation entry line 28 into a binary quantized electrical signal, and the region for one character is cut out and stored in the pattern register 2. The character type designation detection unit 5 divides the pattern register 2 into four regions corresponding to the character type designation entry frame 22, counts the black dots in each region (points on a character line are treated as black dots), and compares each count with a threshold to detect whether each character type is designated. The presence or absence of a designation for each character type, namely kanji, hiragana, katakana, symbols and so on, is stored in the character type designation memory in the character type designation detection unit 5. These operations determine, for each character in the character entry line 29, whether a character type has been designated.

Next, the character entry line 29 in Fig. 2 is read. The photoelectric conversion unit 1 photoelectrically converts the character entry line 29 into a binary quantized electrical signal, and the region for one character is cut out and stored in the pattern register 2. The feature extraction unit 3 extracts various features from the character pattern in the pattern register 2 and sends them to the identification unit 6.
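The designation detection step described above (split the stored pattern into four regions, count black dots in each, compare with a threshold) can be sketched in Python as follows. This is only an illustrative sketch: the text does not say how the four regions are laid out, what the threshold value is, or in what order the fields appear, so the equal-width vertical strips, the `threshold` parameter, the `>=` comparison, and the `FIELDS` order are all assumptions.

```python
from typing import List

Bitmap = List[List[int]]  # 1 = black dot (character line), 0 = white

# Hypothetical field order, chosen only for illustration.
FIELDS = ["kanji", "hiragana", "katakana", "alphanumeric_symbol"]


def detect_designation(cell: Bitmap, threshold: int) -> List[bool]:
    """Return one flag per designation field.

    Assumes the four fields are equal-width vertical strips of the cell;
    a field counts as designated when its black-dot count reaches the
    threshold.
    """
    width = len(cell[0])
    strip = width // 4
    flags = []
    for i in range(4):
        lo = i * strip
        # The last strip absorbs any leftover columns.
        hi = width if i == 3 else lo + strip
        count = sum(sum(row[lo:hi]) for row in cell)
        flags.append(count >= threshold)
    return flags
```

Pairing the result with `FIELDS` (for example `dict(zip(FIELDS, flags))`) gives the per-character content of what the text calls the character type designation memory.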

At the same time, the character line amount detection unit 4 detects the character line amount from the character pattern in the pattern register 2 and normalizes it by the size of the character to obtain the complexity D of the character. The complexity is given by the following equation.

$$D = \frac{A \times K}{W_L \times (P_B + P_R)}$$

where K is a constant used to make D an integer, A is the total number of black dots in the character frame, P_B is the height of the circumscribed frame of the character, and P_R is likewise its width. W_L is the line width of the character and is obtained from the following equation.

$$W_L = \frac{A}{A - Q}$$

where Q is the number of positions at which all four points are black when every point in the character frame is observed through a 2×2 window.
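The two equations above can be followed step by step in a short Python sketch. It is a sketch under stated assumptions, not the patented circuit: the bitmap representation, the function names, and the use of truncation to make D an integer are assumptions (the text only says K is a constant for making D an integer). Intuitively, for a stroke of roughly constant width, A − Q approximates the stroke length, so W_L approximates the stroke width and D is proportional to stroke length divided by character size.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = black dot, 0 = white


def count_black(bitmap: Bitmap) -> int:
    """A: total number of black dots in the character frame."""
    return sum(sum(row) for row in bitmap)


def count_full_windows(bitmap: Bitmap) -> int:
    """Q: number of 2x2 window positions where all four dots are black."""
    q = 0
    for y in range(len(bitmap) - 1):
        for x in range(len(bitmap[0]) - 1):
            if (bitmap[y][x] and bitmap[y][x + 1]
                    and bitmap[y + 1][x] and bitmap[y + 1][x + 1]):
                q += 1
    return q


def bounding_box(bitmap: Bitmap) -> Tuple[int, int]:
    """(P_B, P_R): height and width of the circumscribed frame."""
    rows = [y for y, row in enumerate(bitmap) if any(row)]
    cols = [x for x in range(len(bitmap[0])) if any(r[x] for r in bitmap)]
    if not rows:
        return 0, 0
    return rows[-1] - rows[0] + 1, cols[-1] - cols[0] + 1


def complexity(bitmap: Bitmap, k: int = 5) -> int:
    """D = A*K / (W_L * (P_B + P_R)), truncated to an integer (assumption)."""
    a = count_black(bitmap)
    if a == 0:
        return 0  # blank frame: no stroke at all
    q = count_full_windows(bitmap)
    wl = a / (a - q)  # Q < A whenever A > 0, so this never divides by zero
    pb, pr = bounding_box(bitmap)
    return int(a * k / (wl * (pb + pr)))
```

The default `k=5` matches the value of K given for this embodiment.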

Once the complexity D of the character has been obtained, it is sent to the character type designation detection unit 5. The character type designation detection unit 5 reads the character type designation memory sequentially and sends each character type designation to the identification unit 6. If no character type designation can be detected in the character type designation area, as shown at 22 in Fig. 3, the unit evaluates the following conditions using the complexity D, determines the character type, and sends it to the identification unit 6.

D < a: all dictionaries are referenced.

D ≥ a: the character type is taken to be kanji, and the kanji dictionary is referenced.

In this embodiment, a = 10 and K = 5.
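The selection rule can be written as a small Python function. This is a minimal sketch: the function name, the string labels for the dictionaries, and the use of `None` to represent "no designation detected" are assumptions made for illustration; only the threshold test and the default a = 10 come from the text.

```python
from typing import List, Optional

# Hypothetical labels for the four dictionary memories (8 to 11 in Fig. 1).
ALL_DICTIONARIES = ["hiragana", "katakana", "alphanumeric_symbol", "kanji"]


def select_dictionaries(designated: Optional[str], d: int, a: int = 10) -> List[str]:
    """Choose which dictionaries the identification step should consult.

    designated: character type marked on the form, or None if no mark
                was detected for this character.
    d:          complexity D of the character.
    a:          complexity threshold (10 in the described embodiment).
    """
    if designated is not None:
        # A designation on the form always takes priority.
        return [designated]
    if d >= a:
        # Complex undesignated characters are assumed to be kanji.
        return ["kanji"]
    # Simple undesignated characters: fall back to every dictionary.
    return list(ALL_DICTIONARIES)
```

For example, `select_dictionaries(None, 12)` selects only the kanji dictionary, while `select_dictionaries(None, 4)` falls back to all four.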

The identification unit 6 matches the features sent from the feature extraction unit 3 against the dictionary and finally outputs the category name of one character to the character name output 7.

Four dictionary memories are provided for use by the identification unit 6: the hiragana dictionary memory 8, the katakana dictionary memory 9, the alphanumeric and symbol dictionary memory 10, and the kanji dictionary memory 11. The features sent from the feature extraction unit 3 are matched against the dictionary memory of the character type that was designated in advance for each character.

(Effects of the Invention) As described in detail above, the present invention first detects character type designations and, for characters with no designation, detects the character line amount, selects a character type, and identifies the character using a dictionary suited to that type. Reading is therefore fast and accurate, which makes fast and accurate reading of handwritten Japanese documents possible.

[Brief explanation of drawings]

Fig. 1 is a block diagram showing an embodiment of the handwritten document reading method according to the present invention, Fig. 2 is a diagram showing an example of a form used in the embodiment of the present invention, and Fig. 3 is a diagram showing an example of that form filled in.

1: photoelectric conversion unit, 2: pattern register, 3: feature extraction unit, 4: character line amount detection unit, 5: character type designation detection unit, 6: identification unit, 7: character name output, 8: hiragana dictionary memory, 9: katakana dictionary memory, 10: alphanumeric and symbol dictionary memory, 11: kanji dictionary memory, 21: form, 22: character type designation entry frame, 23: hiragana designation field, 24: katakana designation field, 25: alphanumeric and symbol designation field, 26: kanji designation field, 27: character entry frame, 28: character type designation entry line, 29: character entry line.

Claims

[Scope of Claims]

1. A handwritten document reading method in which a handwritten Japanese document has a character frame in which a character is written and a character type designation area, provided near the character frame, for designating a character type, and a handwritten character written in the character frame is recognized using a dictionary of the character type designated in the character type designation area, the method being characterized in that, when no character type is designated, the character line amount of the handwritten character is obtained as the complexity of the character, a dictionary of the character type corresponding to the complexity is selected, and the handwritten character is recognized using the selected dictionary.

2. The handwritten document reading method according to claim 1, wherein separate dictionaries are prepared for hiragana, katakana, alphanumeric characters, and kanji.