JPH0319589B2


Info

Publication number: JPH0319589B2
Application number: JP58203843A
Authority: JP
Inventors: Hiroyuki Harashima; Kunio Sakai; Yoshiaki Kurosawa
Original assignee: Tokyo Shibaura Electric Co Ltd
Current assignee: Toshiba Corp
Priority date: 1983-10-31
Filing date: 1983-10-31
Publication date: 1991-03-15
Also published as: JPS6095689A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical field of the invention]

The present invention relates to an optical character reading device, in particular one for recognizing handwritten kanji.

[Technical background of the invention and its problems]

In recent years, optical character readers (OCRs) for handwritten kanji recognition have been developed that, to improve recognition accuracy, perform kanji recognition using as a guide the reading results for the furigana characters (usually katakana) corresponding to the handwritten kanji (Japanese Patent Application No. Sho 56-99573).

In an OCR of this type, furigana characters corresponding to the kanji are written on the form beforehand, and the final recognition result is selected, on the basis of the reading results for these furigana characters, from among the multiple candidate characters produced by kanji recognition.

However, in an OCR of this type the kanji recognition accuracy depends on the recognition results for the furigana (katakana) characters, so the method is effective only when the furigana are recognized with high accuracy. For kanji with unusual readings, such as 月見里 (read "Yamanashi") and 五月雨 (read "samidare"), it is difficult to supply the correct furigana in advance, and such an OCR may therefore misread them. Moreover, even for ordinary kanji, the katakana used as furigana cannot be recognized perfectly, which makes it difficult to read kanji with high accuracy.
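The dependency described above can be illustrated with a minimal Python sketch of the prior-art selection step. It assumes a per-character table that maps each kanji to a single katakana reading; the table `READINGS`, its entries and the function `select_by_reading` are hypothetical and are not taken from the cited application.

```python
# Hypothetical kanji -> katakana reading table (illustrative entries only).
READINGS = {"坂": "サカ", "板": "イタ", "酒": "サケ"}


def select_by_reading(kanji_candidates, katakana_reading):
    """Pick the kanji candidate whose stored reading equals the recognized katakana.

    Returns None when no candidate matches, i.e. the kanji cannot be decided.
    """
    for kanji in kanji_candidates:
        if READINGS.get(kanji) == katakana_reading:
            return kanji
    return None


print(select_by_reading(["板", "坂"], "サカ"))  # 坂
print(select_by_reading(["板", "坂"], "サク"))  # None: one misread katakana defeats the selection
```

Because the selection keys on an exact reading, a single misrecognized katakana character, or a reading that the table cannot anticipate (as with 月見里), leaves the kanji undecided.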

[Purpose of the invention]

The present invention was made in view of the above circumstances. Its purpose is to provide an optical character reading device that can recognize kanji with high accuracy in an OCR that performs kanji recognition by referring to the recognition results for furigana characters.

[Summary of the invention]

In the present invention, character recognition means are provided that recognize, character by character, the characters on a form on which kanji and the furigana characters corresponding to those kanji are recorded. Based on each character-level recognition result from the character recognition means, word recognition means recognize the words made up of the kanji and of the furigana, respectively. The word recognition means perform word-level recognition by referring to word recognition tables prepared in advance.

The control means collate the word-level recognition results from the word recognition means and, when the collation shows that the word-level results for the kanji and for the furigana characters correspond to each other, output those results as the final answer.

As a result, the kanji and furigana characters can both be recognized simultaneously and accurately, so that kanji can be read with high accuracy.

[Embodiments of the invention]

An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing part of the configuration of an OCR according to the embodiment. In FIG. 1, a scanning unit 10 scans a form (shown in FIG. 2), photoelectrically converts each of the kanji and furigana katakana characters recorded on the form, and outputs each resulting quantization pattern (a character pattern made up of binarized signals) to a character recognition unit 11. Based on the quantization patterns sent from the scanning unit 10, the character recognition unit 11 recognizes the kanji and furigana katakana characters on the form character by character.
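The text states only that each quantization pattern is a character pattern of binarized signals. As a hedged illustration, the sketch below assumes 8-bit gray levels (0 = black, 255 = white) and a single fixed threshold; `quantize` and `THRESHOLD` are hypothetical names, and a real scanning unit may binarize differently.

```python
THRESHOLD = 128  # hypothetical fixed threshold on 8-bit gray levels


def quantize(gray_rows, threshold=THRESHOLD):
    """Convert a grid of gray levels into a binary character pattern (1 = ink, 0 = paper)."""
    return [[1 if level < threshold else 0 for level in row] for row in gray_rows]


# A tiny 3x3 character cell as gray levels (darker = lower value).
cell = [
    [250, 30, 240],
    [20, 25, 30],
    [245, 35, 250],
]
for row in quantize(cell):
    print("".join("#" if bit else "." for bit in row))
```

The printed pattern is a small cross, which is the kind of binary pattern the character recognition unit 11 would then classify.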

A word recognition unit 12 recognizes each word made up of the respective characters, based on the character-level recognition results for the kanji and katakana output from the character recognition unit 11. The word recognition unit 12 stores in advance a word-level recognition table (dictionary) for kanji and another for katakana. A control unit 13 collates the word-level recognition results for the kanji and katakana output from the word recognition unit 12 and, when the collation shows that these results correspond to each other, outputs them as the final answer. The control unit 13 also controls the operation of the scanning unit 10, the character recognition unit 11 and the word recognition unit 12.
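The two pre-stored word recognition tables can be pictured as a pair of dictionaries built from the same list of (reading, spelling) entries: one indexed by katakana word (cf. FIG. 5) and one indexed by kanji word (cf. FIG. 6). This is a sketch under that assumption; the entries in `ENTRIES` and the function `build_tables` are hypothetical, not the actual contents of the figures.

```python
from collections import defaultdict

# Hypothetical (katakana reading, kanji spelling) entries for surnames and given names.
ENTRIES = [
    ("サカイ", "坂井"),
    ("サカイ", "酒井"),
    ("サカイ", "堺"),
    ("クニオ", "邦夫"),
    ("クニオ", "国男"),
]


def build_tables(entries):
    """Return (kana_table, kanji_table) built from (reading, spelling) pairs."""
    kana_table = defaultdict(list)   # katakana word -> kanji spellings
    kanji_table = defaultdict(list)  # kanji word -> katakana readings
    for reading, spelling in entries:
        kana_table[reading].append(spelling)
        kanji_table[spelling].append(reading)
    return dict(kana_table), dict(kanji_table)


kana_table, kanji_table = build_tables(ENTRIES)
print(kana_table["サカイ"])  # ['坂井', '酒井', '堺']
print(kanji_table["邦夫"])   # ['クニオ']
```

Keeping both directions lets the katakana side report which kanji a reading may stand for, and the kanji side report the reading of a kanji word, which is what the control unit 13 compares.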

The operation of the OCR configured as above will now be explained. First, a form such as the one shown in FIG. 2 is scanned by the scanning unit 10, and the kanji recorded on the form and their furigana katakana are each converted into quantization patterns and sent to the character recognition unit 11. As shown in FIG. 2, the form is provided in advance with a character area 21, in which kanji 20 are written, and a character area 23, in which the furigana katakana characters 22 corresponding to the kanji 20 (hereinafter simply "katakana") are written. The scanning unit 10 sends the quantization patterns corresponding to the kanji 20 and katakana 22 on the form to the character recognition unit 11, which recognizes the kanji 20 and katakana 22 character by character on the basis of those patterns.

The character-level recognition results from the character recognition unit 11 are sent in sequence to the word recognition unit 12, which stores them word by word, for example in a buffer memory. The results are passed to the word recognition unit 12 word by word on the basis of a format table prepared in advance by the control unit 13 (the format table stores, among other things, information indicating the positions of the characters 20 and 22 written on the form). The word recognition unit 12 thus performs recognition word by word, that is, for example, separately for the surname and for the given name.
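Grouping character results into words through the format table might look like the following sketch. It assumes that the format table simply lists, for each word field, the indices of its character cells in reading order, and that each character result is a list of candidate characters; the field names and the layout in `FORMAT_TABLE` are hypothetical.

```python
# Hypothetical format table: word field -> indices of its character cells (reading order).
FORMAT_TABLE = {
    "kana_surname": range(0, 3),
    "kana_given": range(3, 6),
}


def group_into_words(char_results, format_table):
    """Buffer the per-character candidate lists word by word, as directed by the format table."""
    return {field: [char_results[i] for i in cells] for field, cells in format_table.items()}


# One candidate list per katakana cell, as produced by the character recognition unit.
chars = [["サ"], ["カ"], ["ク", "イ"], ["ケ", "ク"], ["ニ"], ["オ"]]
words = group_into_words(chars, FORMAT_TABLE)
print(words["kana_surname"])  # [['サ'], ['カ'], ['ク', 'イ']]
print(words["kana_given"])    # [['ケ', 'ク'], ['ニ'], ['オ']]
```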

Suppose, for example, that in the katakana recognition results of the character recognition unit 11 shown in FIG. 3, one character 30 of the surname and one character 31 of the given name cannot be determined uniquely, and that these results 30 and 31 each have two candidate characters, 32 and 33 respectively. Suppose likewise that in the kanji recognition results shown in FIG. 4, one character 40 of the surname and one character 41 of the given name cannot be determined, each having two candidate characters, 42 and 43 respectively. The word recognition unit 12 processes these word-level katakana (FIG. 3) and kanji (FIG. 4) results by referring to the word recognition tables (hereinafter "word dictionaries") shown in FIGS. 5 and 6.

For the katakana of FIG. 3, word recognition uses the word dictionary of FIG. 5. The surname "サカク" (Sakaku) and the given name "ケニオ" (Kenio) do not exist in the word dictionary and are therefore eliminated. As a result, the words "サカイ" (Sakai) and "クニオ" (Kunio) shown in FIG. 7 are sent from the word recognition unit 12 to the control unit 13 as the katakana word recognition results for the surname and the given name, together with the sets of kanji 70 and 71 corresponding to the readings "Sakai" and "Kunio", respectively.
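The katakana step can be sketched as expanding every combination of candidate characters and keeping only the readings listed in the word dictionary. The candidate pairs ク/イ and ケ/ク follow from the eliminated words サカク and ケニオ in the text; the dictionary contents and the name `recognize_kana_word` are assumptions made for illustration.

```python
from itertools import product

# Hypothetical katakana word dictionary (cf. FIG. 5): reading -> kanji spellings.
KANA_TABLE = {"サカイ": ["坂井", "酒井"], "クニオ": ["邦夫", "国男"]}


def recognize_kana_word(char_candidates, table):
    """Expand all candidate combinations and keep only readings found in the dictionary."""
    results = {}
    for chars in product(*char_candidates):
        word = "".join(chars)
        if word in table:
            results[word] = table[word]
    return results


surname = [["サ"], ["カ"], ["ク", "イ"]]  # character 30: candidates ク / イ
given = [["ケ", "ク"], ["ニ"], ["オ"]]   # character 31: candidates ケ / ク
print(recognize_kana_word(surname, KANA_TABLE))  # {'サカイ': ['坂井', '酒井']}
print(recognize_kana_word(given, KANA_TABLE))    # {'クニオ': ['邦夫', '国男']}
```

Exhaustive expansion is cheap for a few ambiguous characters; the number of combinations grows as the product of the per-character candidate counts.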

For the kanji of FIG. 4, the word recognition unit 12 searches the word dictionary of FIG. 6 and, in accordance with the katakana recognition results of FIG. 3 (サカ "Saka" and ニオ "nio") and the candidate characters 42 and 43 of FIG. 4, selects the kanji words "坂井" (Sakai) and "邦夫" (Kunio). The word recognition unit 12 then sends the recognition results shown in FIG. 8 (坂井 and 邦夫) to the control unit 13 together with their corresponding katakana 80 and 81.
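A comparable sketch of the kanji step assumes that a kanji word is kept only if it appears in the kanji dictionary and its reading agrees with the katakana characters recognized with a single candidate (here サカ and ニオ), ambiguous positions being treated as wildcards. The kanji candidates 坂/板 and 夫/天, the dictionary entries and both function names are hypothetical, since FIGS. 4 and 6 are not reproduced.

```python
from itertools import product

# Hypothetical kanji word dictionary (cf. FIG. 6): kanji word -> katakana readings.
KANJI_TABLE = {"坂井": ["サカイ"], "板井": ["イタイ"], "邦夫": ["クニオ"]}


def reading_agrees(reading, kana_candidates):
    """True if the reading has the right length and matches every katakana
    position recognized with a single candidate; positions with several
    candidates are treated as wildcards (an assumption)."""
    if len(reading) != len(kana_candidates):
        return False
    return all(len(cands) != 1 or cands[0] == ch
               for ch, cands in zip(reading, kana_candidates))


def recognize_kanji_word(kanji_candidates, kana_candidates, table):
    """Expand the kanji candidates and keep dictionary words whose reading agrees."""
    results = {}
    for chars in product(*kanji_candidates):
        word = "".join(chars)
        readings = [r for r in table.get(word, []) if reading_agrees(r, kana_candidates)]
        if readings:
            results[word] = readings
    return results


surname = recognize_kanji_word([["坂", "板"], ["井"]], [["サ"], ["カ"], ["ク", "イ"]], KANJI_TABLE)
given = recognize_kanji_word([["邦"], ["夫", "天"]], [["ケ", "ク"], ["ニ"], ["オ"]], KANJI_TABLE)
print(surname)  # {'坂井': ['サカイ']}
print(given)    # {'邦夫': ['クニオ']}
```

In this example 板井 is in the dictionary but is discarded, because its reading イタイ conflicts with the recognized サカ.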

The control unit 13 collates the recognition results of FIGS. 7 and 8 received from the word recognition unit 12, and outputs as the final answer each result for which the word-level katakana and kanji correspond to each other. In this case, the answer shown in FIG. 9 is output from the control unit 13. If the collation in the control unit 13 finds no mutually corresponding katakana and kanji results, the word is rejected.
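The collation in the control unit 13 can be sketched as an intersection of the (katakana, kanji) pairs proposed by each side for every word field, with an empty intersection treated as a reject. Representing results as sets of pairs, the field names, and the function `collate` are assumptions; the text does not say how several surviving pairs would be handled, so this sketch simply returns all of them.

```python
def collate(kana_side, kanji_side):
    """Return, per word field, the sorted pairs agreed on by both sides, or None (reject)."""
    answers = {}
    for field, kana_pairs in kana_side.items():
        common = kana_pairs & kanji_side.get(field, set())
        answers[field] = sorted(common) if common else None
    return answers


# FIG. 7 style: each katakana word with the kanji it may stand for (hypothetical spellings).
kana_side = {
    "surname": {("サカイ", "坂井"), ("サカイ", "酒井")},
    "given": {("クニオ", "邦夫"), ("クニオ", "国男")},
}
# FIG. 8 style: each kanji word with its katakana reading.
kanji_side = {
    "surname": {("サカイ", "坂井")},
    "given": {("クニオ", "邦夫")},
}
print(collate(kana_side, kanji_side))
# {'surname': [('サカイ', '坂井')], 'given': [('クニオ', '邦夫')]}
```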

In this way, the kanji written on the form and the katakana serving as their furigana are each recognized word by word, and the recognition results are collated, so that a combination in which the word-level kanji and katakana correspond to each other can be output as the final answer. Therefore, even when the character-level recognition results for the katakana furigana are incomplete, the answer is output as a mutually corresponding word-level combination of kanji and katakana, which greatly reduces the kanji misreading rate. Furthermore, because recognition is performed word by word, the method is also effective for reading kanji with unusual readings (for example 月見里 "Yamanashi" and 五月雨 "samidare").

[Effect of the invention]

As detailed above, according to the present invention, the word-level recognition results for the furigana characters and for the kanji are collated, and the mutually corresponding combination of furigana characters and kanji is output as the final answer. Thus, even when the character-level recognition results for the furigana characters are incomplete, the kanji misreading rate can be greatly reduced, and kanji can consequently be recognized with high accuracy.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of an optical character reading device according to an embodiment of the present invention; FIG. 2 shows an example of a form; FIGS. 3 and 4 each show an example of the recognition results of the character recognition unit of FIG. 1; FIGS. 5 and 6 each show an example of a word recognition table provided in the word recognition unit of FIG. 1; FIGS. 7 and 8 each show an example of the recognition results of the word recognition unit of FIG. 1; and FIG. 9 shows an example of the final answer of the control unit of FIG. 1.

10: scanning unit; 11: character recognition unit; 12: word recognition unit; 13: control unit.

Claims


1. An optical character reading device comprising: a form having a character area for recording kanji and a character area for recording furigana characters corresponding to the kanji; a scanning unit that photoelectrically converts the kanji and furigana characters recorded on the form and outputs quantization patterns; character recognition means for recognizing each of the kanji and furigana characters character by character on the basis of the quantization patterns output from the scanning unit; word recognition means for recognizing, on the basis of the character-level recognition results of the character recognition means and by referring to word recognition tables prepared in advance, the words made up of the kanji and of the furigana characters, respectively; and control means for collating the word-level recognition results output from the word recognition means and, when the word-level recognition results for the kanji and for the furigana characters correspond to each other, outputting those recognition results as the final answer.