JPH05120485A

JPH05120485A - Character reader

Info

Publication number: JPH05120485A
Application number: JP3285038A
Authority: JP
Inventors: Minako Kuwata; みな子桑田; Yasuhisa Nakamura; 安久中村; Kazuhiro Takehara; 和宏竹原
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1991-10-30
Filing date: 1991-10-30
Publication date: 1993-05-18

Abstract

PURPOSE: To improve reading accuracy without reducing memory utilization efficiency or processing speed. CONSTITUTION: The device comprises a scanner 10 that scans characters and outputs image data G1; a recognition unit 11 that extracts each character pattern from the data G1 and accesses a standard pattern R1 to extract N candidate character codes; an error table reference unit 12 that refers to an error table R2 for each candidate character code to extract a further M candidate character codes; a post-processing unit 13 that identifies the most suitable character code among the (N+M) candidate character codes and derives it as the final recognition code G4; and an output unit 14 that outputs the final recognition code G4.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character reading device, and more particularly to a character reading device that outputs each character it reads as the corresponding character code.

[0002]

[Prior Art] In general, a character reading device optically scans the characters of Japanese text written on paper and first outputs image data. It then extracts the pattern of each character contained in the image data and, converting each character pattern into the corresponding character code by pattern matching, outputs that character code. The character codes have generally been used as input data for computers, word processors, notebook-sized electronic computers (electronic organizers), and the like.

[0003] FIG. 4 is a block diagram of a conventional character reading device 2. The character reading device 2 comprises: a scanner 10 that optically scans Japanese text written in advance on paper and outputs image data G1; a recognition unit 11 that receives the image data G1, extracts each character pattern contained in it, converts each character pattern into character codes by pattern matching against the standard pattern R1, and outputs, as the matching result for each character pattern, data G2 consisting of N candidate character codes and their similarity; a post-processing unit 13 that accesses a post-processing dictionary R3 and outputs the optimum final recognition code G5 finally recognized from the N given candidate character codes; and an output unit 14 that outputs the final recognition code G5 as input data to a personal computer, word processor, or other device connected at the next stage.

[0004] The recognition unit 11 of the conventional character reading device 2 matches each character pattern cut out from the image data G1 against the standard pattern R1, determines N candidate character codes that may be the optimum character code, and passes them to the post-processing unit 13 as data G2 of N candidate character codes and similarity. The post-processing unit 13 sequentially receives the N candidate character codes G2 from the preceding recognition unit 11, expands them into word spellings, and, while accessing the post-processing dictionary R3, performs the final character recognition according to, for example, whether the expanded word spelling is registered in the dictionary R3. It then outputs the final recognition code G5.

[0005]

[Problems to Be Solved by the Invention] The standard pattern R1 accessed by the recognition unit 11 of the conventional character reading device 2 stores standard patterns for only a limited set of characters determined in advance. The N candidate character codes derived in pattern matching by the recognition unit 11 are therefore limited to the character codes of character patterns stored in the standard pattern R1. Consequently, when a character whose standard pattern is not stored in R1 is read by the scanner 10 and input as image data G1, the recognition unit 11 may extract N candidate character codes, but the character ultimately cannot be recognized. If standard patterns for every character were stored in the standard pattern R1, this recognition failure would not occur. Doing so, however, would increase the memory consumed by R1 and reduce the effective use of memory in the character reading device 2, and it would also greatly increase the recognition processing time, which includes accessing R1, so that the reading speed of the device 2 itself would drop markedly. Increasing the number of patterns stored in the standard pattern R1 to avoid recognition failures therefore cannot be a fundamental solution.

[0006] Accordingly, an object of the present invention is to provide a character reading device with improved character reading performance that causes neither a drop in memory utilization efficiency nor a drop in processing speed.

[0007]

[Means for Solving the Problems] The character reading device according to the present invention comprises: scanning means for scanning a character and outputting an input character pattern; first storage means for storing, in advance, each of a plurality of character patterns in association with its character code; first character code output means; second storage means for storing, grouped in advance by similar pattern, the character codes of a plurality of character patterns including character patterns not stored in the first storage means; determination means for determining, for each character code output by the first character code output means, that the corresponding matching result is not valid; second character code output means for searching the second storage means on the basis of the character code corresponding to the determination output of the determination means and outputting the character codes contained in the corresponding group; and recognition output means for recognizing and outputting the optimum character code from among the plurality of character codes output by the first and second character code output means.

[0008] The first character code output means is configured to perform pattern matching between the input character pattern output by the scanning means and each of the plurality of character patterns stored in the first storage means, and to output the character code corresponding to each of the one or more character patterns whose matching result reaches a predetermined level.

[0009]

[Operation] The character reading device according to the present invention is characterized by having second storage means in addition to the first storage means, so that reading accuracy can be improved without a large increase in memory consumption or processing time. The second storage means stores the character codes of a plurality of character patterns including character patterns not stored in the first storage means. The first character code output means accesses the first storage means and outputs character codes, and the second character code output means accesses the second storage means and outputs character codes. Because the recognition output means recognizes the optimum character code from among the character codes output by both the first and second character code output means, the probability that the optimum character code is among those codes, that is, the probability that it is recognized, increases. Furthermore, whereas the first storage means stores both character patterns and character codes, the second storage means stores only character codes, so its memory consumption is smaller than that of the first storage means. Since the second character code output means outputs character codes without pattern matching, no additional pattern matching is needed, and the processing speed is high relative to the recognition rate achieved by the recognition output means.

[0010]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings.

[0011] FIG. 1 is a block diagram of a character reading device according to one embodiment of the present invention. FIGS. 2(a) to 2(d) illustrate the procedure for creating the error table according to one embodiment of the present invention.

[0012] FIG. 3 is a flow chart of the reading process of the character reading device according to one embodiment of the present invention. In FIG. 1, the character reading device 1 includes a scanner 10; a candidate character extraction unit 15 comprising a recognition unit 11 and an error table reference unit 12; a post-processing unit 13; an output unit 14; and a standard pattern R1, an error table R2, and a post-processing dictionary R3, each of which is a kind of memory in which data is stored.

[0013] The standard pattern R1 is configured to store the standard pattern of each of a plurality of characters in association with its character code. The error table R2, described in detail later, is a table configured to store character codes corresponding both to character patterns stored in the standard pattern R1 and to character patterns not stored in R1, grouped by similar pattern. The post-processing dictionary R3 is a dictionary configured to store a plurality of words in units of character codes.

[0014] The scanner 10 optically reads the characters of Japanese text written in advance on paper and derives image data G1. The candidate character extraction unit 15 receives the image data G1 and in response outputs data G3 consisting of (N+M) candidate character codes and their similarity. The candidate character extraction unit 15 has a recognition unit 11 that accesses the standard pattern R1 and an error table reference unit 12 that accesses the error table R2. The recognition unit 11 cuts the character pattern of each character out of the image data G1 and matches each cut-out pattern against the standard character patterns stored in R1. As data G2, it outputs the character codes of the N standard character patterns that satisfy a predetermined matching condition as candidate character codes, together with the similarity of each pattern. The error table reference unit 12 accesses the error table R2 for each of the N given candidate character codes, identifies the group of each candidate character code, and reads M character codes from the identified groups. It then derives data G3 consisting of the similarity and the (N+M) candidate character codes, that is, the N given candidates combined with the M candidates read from the error table R2.
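The merging of the N recognizer candidates with the M codes read from the error table can be pictured with a minimal Python sketch. The table contents, the function name `expand_candidates`, and the choice to skip duplicate codes are illustrative assumptions, not details given in the patent.

```python
# Hypothetical miniature of error table R2: candidate code -> similar-looking codes.
ERROR_TABLE_R2 = {"衰": ["哀", "京"]}


def expand_candidates(n_candidates, error_table):
    """Return the N recognizer candidates followed by the M codes read from R2.

    Duplicates are skipped; that is an assumption made for this sketch.
    """
    extra = []
    for code in n_candidates:
        for similar in error_table.get(code, []):
            if similar not in n_candidates and similar not in extra:
                extra.append(similar)
    return list(n_candidates) + extra


candidates_g3 = expand_candidates(["衰"], ERROR_TABLE_R2)
print(candidates_g3)  # ['衰', '哀', '京']
```

Here N = 1 and M = 2, so three candidate codes would be handed to the post-processing stage.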

[0015] The post-processing unit 13 sequentially receives the (N+M) candidate character codes extracted by the preceding candidate character extraction unit 15, expands them into word spellings, compares these with the words stored in the post-processing dictionary R3, and identifies the final character code from the validity of the word spelling. The character code finally recognized by the post-processing unit 13 with reference to the post-processing dictionary R3 is passed to the output unit 14 as the final recognition code G4.
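One way to read the dictionary check is as a search over the spellings that the per-character candidate lists can form. The sketch below is only an illustration under assumptions: the function name `choose_word`, the sample dictionary, and the fallback to the top candidates when no spelling is registered are not specified in the patent.

```python
from itertools import product


def choose_word(candidates_per_position, dictionary):
    """Return the first candidate spelling registered in the dictionary.

    Falls back to the first candidate of each position when nothing matches
    (an assumed policy for this sketch).
    """
    for spelling in product(*candidates_per_position):
        word = "".join(spelling)
        if word in dictionary:
            return word
    return "".join(position[0] for position in candidates_per_position)


# Hypothetical post-processing dictionary R3.
R3 = {"衰弱", "京都"}
print(choose_word([["哀", "衰", "京"], ["弱"]], R3))  # 衰弱
```

Because the correct code "衰" is among the candidates for the first position, the dictionary lookup can recover the word even though the top candidate was wrong.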

[0016] The output unit 14 either supplies the given final recognition code G4 as input data to a personal computer, word processor, or the like, or prints it on a printer or displays it on a CRT (cathode ray tube).

[0017] As described above, the character reading device 1 reads the characters of Japanese text with the scanner 10 and either supplies the result as character codes G4 to a personal computer, word processor, or the like, or prints it on a printer or displays it on a CRT. The operator can therefore process the reading result of the Japanese text scanned by the scanner 10 as data on a personal computer or similar device, and can check the result on the printout or on the screen.

[0018] The major structural difference between the character reading device 1 and the conventional character reading device 2 described above is that the device 1 has an error table R2 and an error table reference unit 12 that accesses it. In converting the image data G1 read by the scanner 10 into character codes, the device 1 uses the candidate character extraction unit 15, which includes the recognition unit 11 and the error table reference unit 12, to judge the character codes from the first candidate to the (N+M)-th candidate to be plausible character codes for each character pattern contained in the data G1, and leaves the final recognition of the character code to the post-processing unit 13. Thus, whereas the post-processing unit 13 of the conventional device 2 determines the final recognized character from N candidate character codes, the post-processing unit 13 of the device 1 makes the final recognition from (N+M) candidate character codes, so the recognition accuracy, that is, the character reading accuracy, is dramatically improved over the prior art.

[0019] Next, the procedure for creating the error table R2 is described. The error table R2 is created in advance by the procedure shown in FIGS. 2(a) to 2(d).

[0020] In FIG. 2(a), the character types CH that can be input to and output from the device 1 are determined in advance, and a category number Ci (i = 0, 1, 2, 3, ...) is prepared in one-to-one correspondence with each character type CH, as shown in FIG. 2(a). The table in which the character types CH and the category numbers Ci correspond one to one is provided as the category correspondence table T1.

[0021] FIG. 2(b) shows the procedure for creating the error table R2 as a flow chart, FIG. 2(c) shows the table T2 for error table creation that is accessed during creation, and FIG. 2(d) shows the error table R2 created by the procedure of FIG. 2(b). The procedure of FIG. 2(b) is prepared in advance as a program. Areas for the table T2 of FIG. 2(c) and the table R2 of FIG. 2(d) are reserved in advance in a memory space of predetermined capacity.

[0022] The tables and the processing-flow program shown in FIGS. 2(a) to 2(d) may be prepared in a separate computer, with the error table R2 created in advance on that computer and then loaded into the character reading device 1. Alternatively, these tables and the program may be prepared in the internal memory of the character reading device 1, so that the device 1 itself creates the error table R2 before the character reading operation.

[0023] In step ST1 of FIG. 2(b) (abbreviated ST1 in the figure), storage areas for the table T2 for error table creation and for the error table R2 are first allocated and cleared (initialized).

[0024] The processing from step ST2 onward is carried out by a processing unit having a function equivalent to the recognition unit 11 of the character reading device 1.

[0025] In step ST2, it is determined whether data input has ended, that is, whether input of the image data of one character has ended. If the end of the input data is confirmed, the processing from step ST8 onward, described later, is executed. Otherwise, in the next step ST3, character recognition is performed on the input image data of one character. This character recognition is performed by accessing the standard pattern R1. The character codes obtained by accessing R1 are successively displayed or printed and presented to the operator, who compares the character that was read with the recognized character to check whether the recognition result is correct. Whether the recognition result is correct is decided in the next step ST4. If it is correct, processing returns to step ST2 and the next character pattern is handled in the same way; if it is not correct, processing proceeds to step ST5 and onward.

[0026] In step ST5, processing to build the table T2 for error table creation begins. In step ST5, the category correspondence table T1 prepared in advance is accessed on the basis of the character type CH of the input character pattern, the corresponding category number Ci is identified, and it is assigned to the variable x. In the next step ST6, the table T1 is likewise accessed on the basis of the character type CH of the recognized character obtained in step ST3, the corresponding category number Ci is identified, and it is assigned to the variable y. Then, in the next step ST7, 1 is added to element [x, y] of the table T2, thereby creating (updating) the table T2. The creation of the table T2 in steps ST5 to ST7 is explained with reference to FIG. 2(c).

[0027] FIG. 2(c) shows the table T2 that is created in order to create the error table R2. The table T2 is a two-dimensional array indexed by [x, y], where x and y each range over the character types that can be input and output. For example, when the character “衰” is presented for recognition and the recognition result is “哀”, “衰” is registered in the “哀” element of the table T2; in other words, 1 is added to the corresponding element. Likewise, when “衰” written in a different hand is presented for recognition and the recognition result is “京”, “衰” is registered in the “京” element of the table T2, that is, 1 is added. By successively registering the number of such recognition errors in the table T2, a table is completed which indicates that, when a character corresponding to category number Ci is recognized, the character may in fact be category number Cidj. This table is the error table R2 (see FIG. 2(d)).
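Steps ST5 to ST7 amount to counting confusions in a two-dimensional table. The following Python sketch shows that bookkeeping under assumptions: the category numbers reuse the values 21 to 23 from the figure description, while the function name `record_error` and the nested-dictionary layout (instead of a preallocated array) are choices made only for illustration.

```python
from collections import defaultdict

# Hypothetical excerpt of category correspondence table T1: character type CH -> Ci.
T1 = {"衰": 21, "哀": 22, "京": 23}


def record_error(t2, input_char, recognized_char, t1=T1):
    """Count one recognition error in T2 (steps ST5 to ST7)."""
    x = t1[input_char]       # ST5: category of the character that was presented
    y = t1[recognized_char]  # ST6: category of the (wrong) recognition result
    t2[x][y] += 1            # ST7: add 1 to element [x, y]


t2 = defaultdict(lambda: defaultdict(int))
record_error(t2, "衰", "哀")
record_error(t2, "衰", "哀")
record_error(t2, "衰", "京")
print({x: dict(row) for x, row in t2.items()})  # {21: {22: 2, 23: 1}}
```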

[0028] To describe the error table R2 briefly with reference to FIG. 2(d): from the table T2 for error table creation, category number Ci = 21 (衰) is recognized as possibly being category number Ci = 22 (哀) or category number Ci = 23 (京). In the error table R2, i is the number of categories presented for recognition in order to create the table R2, and j is the optimum number of characters to register in the table R2, determined on the basis of the memory capacity of the device 1 itself and experience.

[0029] As described above, the error table R2 is created from the table T2 for error table creation in steps ST8 and ST9. For each x column of the table T2, the characters of the corresponding y column are sorted in descending order of value, and then j y-column characters are output for each x column, which yields the error table R2.
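The sort-and-truncate step can be sketched in a few lines of Python. This is an illustration only: the function name `build_error_table`, the sample counts, and the decision to drop entries whose count is zero are assumptions rather than details from the patent.

```python
def build_error_table(t2, j):
    """Derive R2 from T2: per x, keep the j most frequent y (steps ST8 and ST9)."""
    r2 = {}
    for x, row in t2.items():
        # Sort the y entries of this x by error count, largest first.
        ranked = sorted(row.items(), key=lambda item: item[1], reverse=True)
        # Keep at most j entries; zero counts are skipped (assumed policy).
        r2[x] = [y for y, count in ranked[:j] if count > 0]
    return r2


# Hypothetical counts: category 21 was confused with 22 five times, and so on.
T2 = {21: {22: 5, 23: 2, 24: 1}}
print(build_error_table(T2, j=2))  # {21: [22, 23]}
```

A larger j gives the post-processing stage more candidates at the cost of a larger table, which matches the statement that j is chosen from memory capacity and experience.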

[0030] As described above, the standard pattern R1 is referred to when the error table R2 is created. Therefore, if characters not registered in the standard pattern R1, for example JIS level-2 characters that are never reflected in R1, are presented for recognition, JIS level-2 character codes appear in the error table R2. Even when a JIS level-2 character is input to the device 1, its character code exists in the error table R2 and is therefore sent to the post-processing unit 13 as a candidate character code.

[0031] Likewise, if unusual fonts not reflected in the standard pattern R1 (decoratively designed fonts, as opposed to the Gothic or Mincho typefaces used in newspapers) are presented as input data when the table R2 is created, the character codes of these fonts are reflected in the error table R2 and are therefore extracted as candidate character codes by the candidate character extraction unit 15.

[0032] As described above, the number of recognizable characters and the number of recognizable fonts can be increased without any change to the standard pattern R1, simply by providing the error table R2 and the error table reference unit 12 that accesses it.

[0033] The standard pattern R1 accessed by the recognition unit 11 requires, for example, 384 bytes of memory on a 32-bit machine to store the data for one character, whereas the error table R2 is configured to store only character codes and needs only 2 bytes per character. Therefore, to improve the reading accuracy of the character reading device 1, providing the error table R2 so that more candidate character codes are given to the post-processing unit 13 has an advantage over extending the pattern data in the standard pattern R1: the memory consumed by the device 1 is small for the reading accuracy obtained.
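The size difference can be made concrete with a small calculation. The two per-character figures come from the text; the count of 1000 additional characters and the function name `added_memory` are hypothetical, and the sketch ignores any table overhead.

```python
BYTES_PER_PATTERN = 384  # one character in standard pattern R1 (from the text)
BYTES_PER_CODE = 2       # one character code in error table R2 (from the text)


def added_memory(extra_characters):
    """Bytes needed to add characters as R1 patterns versus as R2 codes."""
    return (extra_characters * BYTES_PER_PATTERN,
            extra_characters * BYTES_PER_CODE)


pattern_bytes, code_bytes = added_memory(1000)  # 1000 is an arbitrary example
print(pattern_bytes, code_bytes, pattern_bytes // code_bytes)  # 384000 2000 192
```

Under these assumptions, each character stored as a code costs 1/192 of what it would cost as a pattern.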

[0034] Next, the reading operation of the character reading device shown in FIG. 1 is described following the processing flow of FIG. 3.

[0035] It is assumed that the standard pattern R1, the error table R2, and the post-processing dictionary R3 are stored inside the character reading device 1 in advance.

[0036] In step S1 of FIG. 3 (abbreviated S1 in the figure), the scanner 10 scans Japanese characters written on paper and derives image data G1. The image data G1 is supplied to the candidate character extraction unit 15.

[0037] In the next steps S2 to S4, the recognition unit 11 cuts characters out of the given image data G1 one at a time, performs pattern matching between each cut-out character pattern and each standard pattern stored in the standard pattern R1, and extracts the N character codes of the standard patterns that satisfy a predetermined matching condition. Specifically, the recognition unit 11 matches the given character pattern against each of the character patterns stored in the standard pattern R1 and, if the similarity of a pattern meets a predetermined threshold, extracts the character code corresponding to that pattern as a candidate character code. Thus N character codes, where N is at least one, are extracted as candidates.

【００３８】上述のようにして認識部１１がＮ個の候補
文字コードと類似度のデータＧ２を抽出すると、これは
間違いテーブル参照部１２に与えられる。間違いテーブ
ル参照部１２はステップＳ５の処理において、与えられ
るＮ個の候補文字コードのそれぞれについて、抽出され
るべき文字コードであるか否かを判定する。この判定
は、与えられる類似度データが所定のしきい値に達して
いるか否かにより判定される。この類似度データは、標
準パターンにストアされた文字パターンとスキャナー１
０により読取られた入力文字パターンとの距離データを
表わしており、この距離データが大きいほどパターンは
類似していないことが知られているので、この類似度デ
ータが小さければパターンは類似しているといえる。し
たがって、類似度データが予め定められたしきい値に達
していなければ、間違いテーブル参照部１２は抽出され
た候補文字コードは妥当ではない、すなわち疑しいと判
断し、後述するステップＳ６以降の処理に移行するが、
与えられる類似度データに基づいて抽出された候補文字
コードが疑わしいと判断されなければ、後述するステッ
プＳ８以降の処理が実行される。When the recognition unit 11 has extracted the N candidate character codes and the similarity data G2 as described above, these are given to the error table reference unit 12. In step S5, the error table reference unit 12 determines, for each of the given N candidate character codes, whether it is a character code that should be extracted. The determination depends on whether the associated similarity data has reached a predetermined threshold value. The similarity data represents the distance between a character pattern stored in the standard pattern and the input character pattern read by the scanner 10. Because a larger distance is known to mean less similar patterns, smaller similarity data means more similar patterns. Therefore, if the similarity data has not reached the predetermined threshold value, the error table reference unit 12 judges the extracted candidate character code to be invalid, that is, suspicious, and proceeds to step S6 and the subsequent steps described later. If no extracted candidate character code is judged suspicious on the basis of the given similarity data, the processing from step S8 described later is executed.
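The step S5 decision can be sketched as follows. The text leaves the direction of the threshold comparison open to interpretation; since the similarity data is described as a distance, this sketch assumes a candidate is suspicious when its distance exceeds a threshold. The function name, the threshold value, and the sample distances (including the second candidate "表") are hypothetical.

```python
# Hypothetical sketch of step S5. Assumption: "suspicious" means the distance
# is larger than suspicion_threshold; the source does not fix this direction.

def split_by_suspicion(candidates, suspicion_threshold):
    """Partition (code, distance) pairs into trusted and suspicious lists."""
    trusted, suspicious = [], []
    for code, distance in candidates:
        if distance > suspicion_threshold:
            suspicious.append((code, distance))
        else:
            trusted.append((code, distance))
    return trusted, suspicious


# Illustrative G2 data: the distances are invented for this example.
G2 = [("衰", 12), ("表", 5)]
trusted, suspicious = split_by_suspicion(G2, suspicion_threshold=10)

# Go to the error table lookup (S6) only if something looks suspicious.
next_step = "S6" if suspicious else "S8"
print(trusted, suspicious, next_step)
```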

【００３９】ステップＳ６の処理においては、前述のス
テップＳ５の処理で抽出された候補文字コードが妥当で
ないと判断されたことに応じて、間違いテーブル参照部
１２は間違いテーブルＲ２をアクセスする。このとき、
疑わしい候補文字コードのそれぞれについて間違いテー
ブルＲ２が検索され、該当文字コードに対応する複数の
候補文字コードがさらに抽出される。図２（ｄ）に示さ
れるように、たとえば“衰”という候補文字コードが疑
わしいと判断されれば、間違いテーブル参照部１２は間
違いテーブルＲ２をアクセスし、文字コード“衰”につ
いてさらに“哀”および“京”を含む文字コードを読出
す。このようにして、疑わしいと判断された候補文字コ
ードのそれぞれについて、間違いテーブルＲ２が参照さ
れることにより、さらにＮ個の候補文字コードが抽出さ
れる。この（Ｎ＋Ｍ）個の候補文字コードと類似度のデ
ータＧ３は次のステップＳ７の処理において後処理部１
３に導出される。In step S6, in response to the determination in step S5 that an extracted candidate character code is not valid, the error table reference unit 12 accesses the error table R2. The error table R2 is searched for each suspicious candidate character code, and a plurality of further candidate character codes corresponding to that character code are extracted. For example, as shown in FIG. 2(d), if the candidate character code "衰" is judged suspicious, the error table reference unit 12 accesses the error table R2 and reads out further character codes for "衰", including "哀" and "京". By referring to the error table R2 in this way for each candidate character code judged suspicious, M further candidate character codes are extracted. In the next step S7, the (N + M) candidate character codes and the similarity data G3 are delivered to the post-processing unit 13.
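The error table lookup of step S6 amounts to a plain key lookup with no pattern matching, which the following sketch illustrates. The table row contains only the two characters named in the FIG. 2(d) example. The dictionary layout, the function name, the sample distance, and the use of `None` as the similarity value for codes that were never pattern-matched are assumptions.

```python
# Hypothetical sketch of step S6. ERROR_TABLE_R2 holds only the example row
# named in the text; a real table would hold many groups of similar characters.
ERROR_TABLE_R2 = {
    "衰": ["哀", "京"],
}


def expand_with_error_table(candidates, suspicious_codes, error_table):
    """Append the error-table entries for each suspicious code, skipping duplicates.

    Added codes carry None as their distance because they were obtained by
    table lookup, not by pattern matching (an assumption of this sketch).
    """
    expanded = list(candidates)
    seen = {code for code, _ in candidates}
    for code in suspicious_codes:
        for extra in error_table.get(code, []):
            if extra not in seen:
                seen.add(extra)
                expanded.append((extra, None))
    return expanded


G2 = [("衰", 12)]  # N = 1 candidate, distance invented for illustration
G3 = expand_with_error_table(G2, ["衰"], ERROR_TABLE_R2)
print(G3)  # [('衰', 12), ('哀', None), ('京', None)]
```

In this example N is 1 and M is 2, so three candidate character codes are passed on to post-processing.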

【００４０】後処理部１３は、次のステップＳ８の処理
において、与えられる候補文字コードを、逐次単語綴り
に展開し、その類似度データを考慮しながら後処理辞書
Ｒ３を参照し言語処理を行なう。単語綴りで展開したも
のが後処理辞書Ｒ３に登録されていれば、この展開した
単語綴りの各文字を最終認識コードＧ４にして出力部１
４に導出し、後処理辞書Ｒ３に登録されていなければそ
の類似度データを考慮しもっとも最適な文字コードとな
るような最終認識コードＧ４を出力部１４に導出する。In the next step S8, the post-processing unit 13 expands the given candidate character codes successively into word spellings and performs language processing by referring to the post-processing dictionary R3 while taking the similarity data into account. If an expanded word spelling is registered in the post-processing dictionary R3, each character of that spelling is delivered to the output unit 14 as the final recognition code G4. If it is not registered, the final recognition code G4 that gives the most suitable character code in view of the similarity data is delivered to the output unit 14.
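The dictionary-based post-processing of step S8 can be sketched as below. The text does not say how spellings are enumerated or how ties are resolved, so the following are assumptions: candidates at each position are ranked by distance, spellings are tried in that order, the first one found in the dictionary wins, and the fallback is the best-ranked candidate at each position. Each position is assumed to hold at least one candidate. The function name and the sample data are hypothetical.

```python
from itertools import product


def post_process(candidates_per_position, dictionary):
    """Pick a spelling for a run of character positions.

    candidates_per_position: one non-empty list per position, each holding
    (code, distance) pairs; distance may be None for error-table additions.
    dictionary: a set of registered word spellings (the role of R3).
    """
    def cost(item):
        _, distance = item
        # Codes without a measured distance rank after measured ones.
        return float("inf") if distance is None else distance

    ranked = [sorted(cands, key=cost) for cands in candidates_per_position]

    # Expand the candidates into word spellings and look each one up.
    for combo in product(*ranked):
        spelling = "".join(code for code, _ in combo)
        if spelling in dictionary:
            return spelling

    # No registered spelling: fall back to the best candidate per position.
    return "".join(cands[0][0] for cands in ranked)


positions = [
    [("衰", 12), ("哀", None), ("京", None)],  # first character, N + M candidates
    [("都", 3)],                               # second character
]
word = post_process(positions, {"京都"})
fallback = post_process(positions, set())
print(word, fallback)  # 京都 衰都
```

The first call shows the benefit described in the text: "京" was absent from the pattern-matching result but is recovered because the error table supplied it and the dictionary confirms the word.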

【００４１】出力部１４は、次のステップＳ９の処理に
おいて与えられる最終認識コードＧ４を次段のコンピュ
ータやワードプロセッサの入力データとして結果出力す
るか、プリンタやＣＲＴに印字出力または画面出力する
よう動作する。In the next step S9, the output unit 14 outputs the given final recognition code G4 as input data for a downstream computer or word processor, or prints it on a printer or displays it on a CRT screen.

【００４２】つづいて、ステップＳ１０の処理において
処理は終了か否かが判断される。これは、スキャナー１
０においてのスキャナー入力が所定時間期間行なわれな
かったことなどに応じて判断される。この処理終了が判
断されると一連の文字読取りは終了するが、スキャナー
１０において文字入力が継続するようであれば、ただち
にステップＳ１の処理にもどり、次の入力文字について
同様にして（Ｎ＋Ｍ）個の候補文字コードの抽出による
文字認識が行なわれるよう処理が繰返される。Subsequently, in step S10, it is determined whether processing is finished. This is judged from, for example, the absence of scanner input at the scanner 10 for a predetermined period. When the end of processing is determined, the series of character reading operations ends. If character input continues at the scanner 10, processing returns immediately to step S1, and the same character recognition, by extraction of (N + M) candidate character codes, is repeated for the next input character.

【００４３】以上のように、文字読取装置１の候補文字
抽出部１５は、標準パターンＲ１を参照して抽出された
Ｎ個の候補文字コードのそれぞれを一定の判定基準に基
づいて最適文字コードとして抽出すべきかどうかを判定
し、抽出すべきでないと判定されたとき、その候補文字
コードに対応するさらなる文字コードを抽出するために
間違いテーブルＲ２を参照し、さらなる候補文字コード
としてＭ個の文字コードを抽出し、Ｎ個の文字コードに
Ｍ個の文字コードを追加し後処理部１３へ導出すること
で、後処理部１３における最終認識の認識率を高めるよ
うにしている。As described above, the candidate character extraction unit 15 of the character reading device 1 determines, based on a fixed criterion, whether each of the N candidate character codes extracted by referring to the standard pattern R1 should be extracted as the optimum character code. When it determines that a code should not be, it refers to the error table R2 to extract further character codes corresponding to that candidate, obtaining M additional candidate character codes. The M character codes are added to the N character codes and delivered to the post-processing unit 13, which raises the recognition rate of the final recognition in the post-processing unit 13.

【００４４】[0044]

【発明の効果】以上のようにこの発明によれば、第１記
憶手段に記憶されない文字コードを第２記憶手段に記憶
し、第１文字コード出力手段は第１記憶手段をアクセス
して文字コードを出力し、第２文字コード出力手段は第
２記憶手段をアクセスして文字コードを出力するので、
認識出力手段は、第１および第２文字コード出力手段が
出力する複数文字コードの中から最終コードを認識する
ので、最適文字コードが読取られる確率は向上する。As described above, according to the present invention, character codes not stored in the first storage means are stored in the second storage means. The first character code output means accesses the first storage means and outputs character codes, and the second character code output means accesses the second storage means and outputs character codes. The recognition output means therefore recognizes the final code from among the plurality of character codes output by the first and second character code output means, so the probability that the optimum character code is read improves.

【００４５】さらに、第２記憶手段は文字コードのみを
ストアするように構成されているので、そのメモリー容
量は第１記憶手段のそれに比較し小さくてもよく、上述
した読取率の向上の割に、メモリー消費量は大きくない
という効果がある。Furthermore, because the second storage means is configured to store only character codes, its memory capacity can be smaller than that of the first storage means, so memory consumption stays small relative to the improvement in reading rate described above.

【００４６】また、第２文字コード出力手段はパターン
照合はせずに第２記憶手段をアクセスして文字コードを
出力するのみなので、ここでのパターン照合は不要とな
って、外装置自体の処理速度は読取精度が高い割に向上
するという効果がある。In addition, the second character code output means performs no pattern matching; it only accesses the second storage means and outputs character codes. No pattern matching is needed at this stage, so the processing speed of the device itself is improved relative to the high reading accuracy obtained.

[Brief description of drawings]

【図１】本発明の一実施例による文字読取装置の構成図
である。FIG. 1 is a configuration diagram of a character reading device according to an embodiment of the present invention.

【図２】（ａ）ないし（ｄ）は、本発明の一実施例によ
る間違いテーブルの作成手順を説明するための図であ
る。FIGS. 2(a) to 2(d) are diagrams for explaining the procedure for creating the error table according to an embodiment of the present invention.

【図３】本発明の一実施例による文字読取装置の読取処
理のフロー図である。FIG. 3 is a flowchart of a reading process of the character reading device according to the embodiment of the present invention.

【図４】従来の文字読取装置の構成図である。FIG. 4 is a configuration diagram of a conventional character reading device.

[Explanation of symbols]

１文字読取装置１０スキャナー１１認識部１２間違いテーブル参照部１３後処理部１４出力部Ｒ１標準パターンＲ２間違いテーブルＲ３後処理辞書Ｇ１画像データＧ２Ｎ個の候補文字コードと類似度のデータＧ３（Ｎ＋Ｍ）個の候補文字コードと類似度のデータＧ４最終認識コードなお、各図中、同一の符号は同一または相当部分を示
す。
1: character reading device
10: scanner
11: recognition unit
12: error table reference unit
13: post-processing unit
14: output unit
R1: standard pattern
R2: error table
R3: post-processing dictionary
G1: image data
G2: N candidate character codes and similarity data
G3: (N + M) candidate character codes and similarity data
G4: final recognition code
In the drawings, the same reference numerals indicate the same or corresponding parts.

Claims

[Claims]

1. A character reading device comprising: scanning means for scanning a character and outputting an input character pattern; first storage means for storing in advance a plurality of character patterns in association with their respective character codes; first character code output means for performing pattern matching between the input character pattern output by the scanning means and each of the plurality of character patterns stored in the first storage means, and outputting the character code corresponding to at least one character pattern whose matching result reaches a predetermined level; second storage means for storing in advance the character codes of a plurality of character patterns, including character patterns not stored in the first storage means, grouped by similar pattern; determination means for determining, for each character code output by the first character code output means, that the corresponding matching result is not valid; second character code output means for searching the second storage means, in response to the determination output of the determination means, based on the corresponding character code and outputting the character codes included in the corresponding group; and recognition output means for recognizing and outputting the optimum character code from among the plurality of character codes output by the first and second character code output means.