JPH04270482A

JPH04270482A - Printing character recognition device

Info

Publication number: JPH04270482A
Application number: JP3031109A
Authority: JP
Inventors: Keiko Abe; 阿部　惠子
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1991-02-26
Filing date: 1991-02-26
Publication date: 1992-09-25

Abstract

PURPOSE: To allow a character to be corrected easily and accurately even when the character to be recognized has been misrecognized by mechanical similarity judgment. CONSTITUTION: The device comprises a decision circuit 17 that refers to a coarse classification dictionary memory 12 and a fine classification dictionary memory 15 and outputs the character code of a character whose dot pattern is similar to that of the character to be recognized; a display device 3 that displays the original dot pattern of the character to be recognized together with characters similar to it; and a similar character table 19 in which character codes and the attributes of the corresponding characters can be freely registered by group.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a printed character recognition device for recognizing, for example, printed characters.

[0002]

[Prior Art] There is rapidly growing interest in digitizing articles from newspapers, books, and the like and building databases from them so that they can be used efficiently, and character recognition devices that can identify and encode printed documents quickly and accurately are being developed. Some of these devices select candidate characters for the character to be recognized using a coarse classification dictionary memory, which stores rough feature values together with the corresponding character codes. They then compare the pattern of the character to be recognized with the pattern of each candidate character using a fine classification dictionary memory, which stores character codes together with the corresponding dot patterns (character font data), and assign the candidate's character code to the character to be recognized when the two patterns are similar.

[0003] If no candidate character exists, or if the pattern of the candidate character is not similar to the pattern of the character to be recognized, that character is rejected as unrecognizable. Rejected or misrecognized characters can be corrected by the following methods.
1. Display other candidate characters with high similarity and select from among them.
2. Correct the character by kana-kanji conversion, code input, selection by radical and stroke count, or the like.

[0004]

[Problem to be Solved by the Invention] However, the candidate characters obtained by referring to the coarse classification dictionary memory are characters judged to be similar by a mechanical procedure, and mechanical similarity judgment does not always extract the intended character. Selecting from among the candidate characters can therefore result in the character being misrecognized. In such a case the character must be entered by kana-kanji conversion or the like, which is cumbersome.

[0005] Moreover, when characters are entered by kana-kanji conversion or the like, there is a risk of confusing, for example, the digit "1" with the lowercase letter "l", or the hiragana "り" with the katakana "リ". In view of this, an object of the present invention is to enable a printed character recognition device to correct a character easily and accurately even when the character to be recognized has been misrecognized by mechanical similarity judgment.

[0006]

[Means for Solving the Problems] As shown, for example, in FIG. 1, the printed character recognition device according to the present invention comprises: data input means (1) for obtaining the dot pattern of a character to be recognized; dictionary storage means (12, 15) for storing character codes in association with feature values of printed characters; classification means (10, 11, 17, 2) for referring to the dictionary storage means and outputting the character code of a printed character whose dot pattern is similar to the character to be recognized; display means (3) for displaying the original dot pattern of the character to be recognized and characters similar to it; and similar character storage means (19) in which character codes and the attributes of the corresponding characters are recorded in arbitrarily defined groups. When the recognition result character obtained by referring to the dictionary storage means (12, 15) does not match the character to be recognized, the device refers to the similar character storage means (19) and displays, on the display means (3), the characters of the group to which the recognition result character belongs together with their attributes (kanji, digit, lowercase letter, hiragana, katakana, and so on).

[0007]

[Operation] According to the present invention, characters regarded as similar on the basis of the operator's judgment, operational experience, or the like are registered in advance, together with their attributes, in the similar character storage means (19) by group. When the recognition result character obtained by referring to the dictionary storage means (12, 15) does not match the character to be recognized, referring to the similar character storage means (19) causes the characters of the group to which the recognition result character belongs to be displayed on the display means (3), so the character to be recognized can easily be identified within that group. Because the attribute of each character is displayed along with the characters of the group, the digit "1" is not mistaken for the lowercase letter "l", for example.

[0008]

[Embodiment] An embodiment of the character recognition device according to the present invention is described below with reference to the drawings. FIG. 1 shows the character recognition device of this example. In FIG. 1, reference numeral 1 denotes an image scanner, which supplies the image data of a printed document it has read to a workstation 2. The workstation 2 first extracts character strings by projecting the image data in one direction, then identifies the circumscribing frame of each character by projection within each character string, and supplies the input pattern IP of each character, consisting of the dot pattern inside its circumscribing frame, to a character recognition board 5 via an input/output interface circuit 4.
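
The two-stage projection described above can be sketched as follows. This is a minimal illustration, not the workstation's actual procedure: it assumes a binary image given as a list of rows of 0/1 values and horizontal text lines, and the names `runs` and `segment` are hypothetical.

```python
def runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(image):
    """Split a binary page image into character boxes.

    Returns (top, bottom, left, right) tuples with bottom/right exclusive.
    """
    boxes = []
    row_profile = [sum(row) for row in image]            # projection in one direction
    for top, bottom in runs(row_profile):                # each run is one character string
        line = image[top:bottom]
        col_profile = [sum(col) for col in zip(*line)]   # projection within the string
        for left, right in runs(col_profile):
            # tighten the box vertically to this character's own extent
            rows = [r for r in range(top, bottom) if any(image[r][left:right])]
            boxes.append((rows[0], rows[-1] + 1, left, right))
    return boxes
```

Touching or overlapping characters would defeat this simple run detection; the source does not say how such cases are handled.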

[0009] In the normal character recognition mode, when candidate characters corresponding to the input pattern IP exist, the character recognition board 5 determines the candidate characters and the residual between the pattern of each candidate character and the input pattern, and supplies pairs of character code and residual for, for example, the ten candidates with the highest similarity to the workstation 2 via the input/output interface circuit 4. When the residual of the top-ranked candidate character is smaller than a predetermined level, the workstation 2 assigns the character code of that candidate to the input pattern IP. When no candidate character exists, or when the residual of the top-ranked candidate exceeds the predetermined level, the workstation 2 judges that the character could not be identified and assigns a reject code to the input pattern.
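
The accept-or-reject rule applied by the workstation can be sketched as below. The value of the predetermined level, the representation of the reject code, and the handling of a residual exactly equal to the level are not given in the source; here `REJECT`, `threshold`, and the choice to reject on equality are assumptions made for illustration.

```python
REJECT = None  # hypothetical stand-in for the reject code


def assign_code(candidates, threshold):
    """Pick a character code for one input pattern.

    candidates: list of (char_code, residual) pairs, in any order.
    Returns the best candidate's code, or REJECT when there is no candidate
    or the best residual is not below the threshold (assumed tie handling).
    """
    if not candidates:
        return REJECT
    code, residual = min(candidates, key=lambda pair: pair[1])
    return code if residual < threshold else REJECT
```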

[0010] The workstation 2 also has a character font ROM and displays the identification result for the input document, in the form of characters, on the display screen 3A of the display device 3 (see FIG. 2). At this time, a high-brightness square, for example, is displayed in place of each character to which a reject code has been assigned. As described later, the workstation 2 can also be used to correct misrecognized characters through the display screen 3A shown in FIG. 2.

[0011] The configuration of the character recognition board 5 and its operation in the character recognition mode are described in detail below. In the character recognition board 5 of FIG. 1, the input pattern IP, consisting of the dot pattern of the character to be recognized, is supplied to a normalization processing circuit 7 via a first dual-port memory 6A. The normalization processing circuit 7 expands or contracts the input pattern IP to obtain a normalized pattern NP of 24×24 dots and supplies this normalized pattern NP to a fine classification circuit 10 via a first-in first-out memory (hereinafter "FIFO memory") 8 and an input pattern set circuit 9. The input pattern IP is also supplied to a coarse classification processing circuit 11 via a second dual-port memory 6B, and the coarse classification processing circuit 11 extracts rough feature values from the input pattern IP.
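
One simple way to expand or contract a pattern to 24×24 dots is nearest-neighbour resampling, sketched below. The source does not state which scaling method the normalization processing circuit 7 uses, so the method and the name `normalize` are assumptions.

```python
def normalize(pattern, size=24):
    """Resample a binary dot pattern (list of equal-length rows) to size x size.

    Each output dot copies the input dot at the proportionally scaled position
    (nearest-neighbour resampling, an assumed method).
    """
    height, width = len(pattern), len(pattern[0])
    return [
        [pattern[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]
```

Because both the input pattern and the dictionary patterns are brought to the same grid, they can later be compared dot by dot.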

[0012] The feature values in this example are obtained by taking the sides of the rectangle circumscribing the input pattern IP as sides A to D and encoding the pattern structure of the character portion near each side. The pattern structure near a side is the value obtained by projecting the character pattern, from that side, onto an observation line a few dots away from the side; in addition, the number of dots of the character portion near each side and the like can also be used. Reference numeral 12 denotes a coarse classification dictionary memory, which stores, for each feature value, the character codes of the characters having that feature value.

[0013] The coarse classification processing circuit 11 refers to the coarse classification dictionary memory 12, retrieves the character codes of all characters (hereinafter "candidate characters") that have the same feature code as the feature code corresponding to the feature values extracted from the input pattern IP, and supplies these character codes to a candidate pattern set circuit 14 via a FIFO memory 13. In this example, a candidate character is a character whose feature codes for sides A to D are each equal to the feature codes of the corresponding four sides of the character to be recognized. Reference numeral 15 denotes a fine classification dictionary memory, which stores, for each character code, the character code and the corresponding dot pattern (character font). These dot patterns are also normalized to 24×24 dots. If the character codes are, for example, JIS codes, multiple normalized dot patterns are stored for each character code to match the typefaces to be recognized (Mincho, newspaper Mincho, Gothic, and so on). If the character codes are defined in a form that also identifies the typeface, a single dot pattern per character code is sufficient.
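
The coarse classification step amounts to a table lookup keyed by the four side codes. The sketch below illustrates the idea under explicit assumptions: the source does not specify how the projected value is encoded, so the run-count encoding in `side_codes`, the observation depth of 3 dots, and the mapping of sides A to D onto top, right, bottom, and left are all hypothetical.

```python
from collections import defaultdict


def count_runs(bits):
    """Number of maximal runs of 1s in a 0/1 sequence."""
    return sum(1 for i, b in enumerate(bits) if b and (i == 0 or not bits[i - 1]))


def side_codes(pattern, depth=3):
    """Hypothetical four-side feature code.

    Ink lying within `depth` dots of each side is projected onto a line
    parallel to that side, and the projection is coded as its number of runs.
    """
    cols = list(zip(*pattern))
    top = [int(any(col[:depth])) for col in cols]
    bottom = [int(any(col[-depth:])) for col in cols]
    left = [int(any(row[:depth])) for row in pattern]
    right = [int(any(row[-depth:])) for row in pattern]
    return tuple(count_runs(side) for side in (top, right, bottom, left))


def build_coarse_dictionary(fonts):
    """Map each four-side code to the character codes that share it."""
    table = defaultdict(list)
    for code, pattern in fonts.items():
        table[side_codes(pattern)].append(code)
    return table


def candidates(table, input_pattern):
    """Character codes whose four side codes all equal those of the input."""
    return table.get(side_codes(input_pattern), [])
```

Requiring all four codes to match keeps the candidate list short, at the cost of returning no candidate at all when noise changes a single side code.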

[0014] The candidate pattern set circuit 14 refers to the fine classification dictionary memory 15 to retrieve the normalized pattern RP corresponding to the character code of each candidate character supplied from the coarse classification processing circuit 11, and supplies these normalized patterns RP one after another to the fine classification circuit 10. The fine classification circuit 10 compares the normalized pattern NP of the character to be recognized with the normalized pattern RP of the candidate character. It supplies a first residual ΔIP, obtained by erasing pattern NP with pattern RP, and a second residual ΔRP, obtained by erasing pattern RP with pattern NP, to a decision circuit 17 via FIFO memories 16A and 16B, respectively. The character code of the candidate character output from the coarse classification processing circuit 11 is also supplied to the decision circuit 17 via a FIFO memory 16C. The decision circuit 17 obtains the final residual Δ by, for example, adding the first residual ΔIP and the second residual ΔRP.
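
The two residuals can be illustrated with a dot-by-dot comparison. The sketch assumes that "erasing" one pattern with the other means clearing every dot that the other pattern also has, and that each residual is the count of dots left over; the source does not state how the residual is measured, and the function name `residuals` is hypothetical.

```python
def residuals(np_pattern, rp_pattern):
    """Return (delta_ip, delta_rp, delta) for two equally sized binary patterns.

    delta_ip: dots of NP that remain after erasing NP with RP.
    delta_rp: dots of RP that remain after erasing RP with NP.
    delta:    their sum, used as the final residual.
    """
    delta_ip = delta_rp = 0
    for n_row, r_row in zip(np_pattern, rp_pattern):
        for n, r in zip(n_row, r_row):
            if n and not r:
                delta_ip += 1
            elif r and not n:
                delta_rp += 1
    return delta_ip, delta_rp, delta_ip + delta_rp
```

Under these assumptions the sum equals the number of dots at which the two patterns differ, so identical patterns give a residual of 0.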

[0015] The closer the residual Δ is to 0, the more similar the candidate character is considered to be to the character to be recognized, so a character with a residual Δ closer to 0 has a higher similarity. The decision circuit 17 therefore pairs the character code C of each of the ten candidate characters with the highest similarity with its residual Δ and outputs these pairs one after another to the input/output interface circuit 4 via a FIFO memory 18. When no candidate character exists, it outputs a predetermined reject code to the interface circuit 4. These pairs of character code C and residual Δ, or the reject code, are supplied to the workstation 2.
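
Selecting the ten most similar candidates is a smallest-residual selection, which might look like the following. `REJECT_CODE` is a hypothetical placeholder, since the source does not give the actual value of the reject code.

```python
import heapq

REJECT_CODE = "REJECT"  # hypothetical placeholder for the predetermined reject code


def top_candidates(scored, limit=10):
    """Return up to `limit` (char_code, residual) pairs, smallest residual first.

    Returns REJECT_CODE when there are no candidates at all.
    """
    if not scored:
        return REJECT_CODE
    return heapq.nsmallest(limit, scored, key=lambda pair: pair[1])
```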

[0016] Reference numeral 19 denotes a similar character table made up of random access memory (RAM); the similar character table 19 is connected to the workstation 2. Its contents are divided into a registered character code table, which always exists, and a temporary similar code table, which is created as needed. FIG. 4 shows the memory layout of the registered character code table, which the operator builds on the basis of operational experience and the like. As shown in FIG. 4, the data at each address is divided into a character code, a next address, and a table number. The table number is the number of the group to which a similar character belongs, and similar characters are characters that the operator has judged to be similar and grouped together. The character code field stores character data C1, C2, C3, ..., each indicating the character code and the attribute of a character (for example digit, lowercase letter, uppercase letter, comma, hiragana, katakana, period, Latin symbol, kanji, or symbol), for all characters, in order at addresses N1, N2, N3, ....

[0017] The next address field of the data at address Ni stores the address of the next character belonging to the same group as the character of character data Ci, and the table number field stores the number of the group to which that character belongs. As a concrete example from FIG. 4, suppose the characters 0, 8, B, o, and g correspond to addresses N1, N3, N6, N7, and N9, respectively, that the character set (0, o) belongs to the group with table number 1, and that the character set (8, B, g) belongs to the group with table number 6. Then address N6, corresponding to the character B, is written as the next address in the data for the character 8 at address N3, and address N9, corresponding to the character g, is written as the next address in the data for the character B at address N6. Because no other similar characters are registered in the group with table number 6, 0 is written as the next address in the data at address N9. Similarly, address N7, corresponding to the character o, is written as the next address in the data at address N1 for the character 0.

[0018] As shown in FIG. 4, similar characters belonging to the same group are linked by pointers in this example, so once a character is specified, the next character in the same group can be found easily. For example, when the character B, which belongs to the group with table number 6, is specified, the next character g in that group is readily identified from the next address N9 stored for the character B. When a character that has no next address is specified, such as the character g at address N9, the search returns to the first character of the group to which that character belongs in the registered character code table.
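
The pointer-linked layout of the registered character code table can be modelled as follows, using the example characters from FIG. 4. The sketch makes several assumptions: the symbolic addresses N1 to N9 are represented by the integers 1 to 9, the attribute strings are illustrative, and the "first character of the group" is taken to be the group member at the lowest address. The names `Entry`, `first_of_group`, and `next_similar` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    char: str
    attribute: str
    next_address: int  # 0 means no further character is registered in the group
    table_number: int  # number of the group the character belongs to


# Example entries from the text: (0, o) in group 1 and (8, B, g) in group 6.
registered = {
    1: Entry("0", "digit", 7, 1),
    3: Entry("8", "digit", 6, 6),
    6: Entry("B", "uppercase letter", 9, 6),
    7: Entry("o", "lowercase letter", 0, 1),
    9: Entry("g", "lowercase letter", 0, 6),
}


def first_of_group(table, table_number):
    """Lowest address holding a character of the given group (assumed ordering)."""
    return min(addr for addr, e in table.items() if e.table_number == table_number)


def next_similar(table, address):
    """Follow the next-address pointer, wrapping to the start of the group."""
    entry = table[address]
    if entry.next_address:
        return entry.next_address
    return first_of_group(table, entry.table_number)
```

Starting from "g" at address 9, `next_similar` wraps to address 3, the character "8", which is the step the correction procedure relies on.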

[0019] In this example, entries can be added to or deleted from the registered character code table at any time. To do so, the similar code table shown in FIG. 5 must be temporarily formed in the similar character table 19 from the registered character code table of FIG. 4. As shown in FIG. 5, the data at each address of the similar code table consists of a table number and a character code. The table number field holds table numbers written in the order 1, 2, 3, ..., and the character code field holds character data Ci, consisting of the character code and attribute of a character belonging to that table number. Each table number is written consecutively, repeated as many times as there are characters in that table.
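
Forming the temporary similar code table from the registered table is essentially a sort by table number. The sketch below assumes integer addresses, illustrative attribute strings, and that a table number of 0 marks a character that belongs to no group; none of these details are stated in the source, and `similar_code_table` is a hypothetical name.

```python
# address -> (character, attribute, next address, table number)
registered = {
    1: ("0", "digit", 7, 1),
    3: ("8", "digit", 6, 6),
    6: ("B", "uppercase letter", 9, 6),
    7: ("o", "lowercase letter", 0, 1),
    9: ("g", "lowercase letter", 0, 6),
}


def similar_code_table(registered):
    """List (table number, character, attribute) rows ordered by table number.

    Each table number appears once per character in its group, consecutively.
    """
    rows = [
        (table_no, char, attr)
        for _, (char, attr, _next, table_no) in sorted(registered.items())
        if table_no  # assumed: 0 means the character is in no group
    ]
    # sorted() is stable, so address order is kept within each group
    return sorted(rows, key=lambda row: row[0])
```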

[0020] Displaying the contents of the similar code table of FIG. 5 on the display device 3 of FIG. 1 gives the result shown in FIG. 6. In FIG. 6, the left column shows the table number of each group, and the right column shows the similar characters belonging to that group together with the attribute of each. The similar characters in FIG. 6 are not characters that the character recognition board 5 of FIG. 1 judges to be similar; they are characters that the operator has judged to be similar and grouped together. Using this display, the operator can add a similar character to a desired table number, delete a similar character from a desired table number, or define a group of similar characters under a new table number.

[0021] Next, a method for correcting a misrecognized character is described with reference to FIGS. 2 and 3. FIG. 2 shows the display screen 3A of the display device 3 in this example. The display screen 3A includes a recognition result display area 20, which displays the character recognition results for one page of the input document. The recognition results are displayed in horizontal writing, like the input document. Reference numeral 21 denotes a cursor indicating the character to be corrected. A surrounding area display section 22 and an original pattern display section 32 are provided at the lower left of the recognition result display area 20. Moving the cursor 21 onto a desired character in the recognition results causes the original pattern display section 32 to show the original dot pattern of the character to be corrected and the surrounding area display section 22 to show the surrounding pattern within a predetermined range centered on that character. In the example of FIG. 2, the correct character in the input document is the digit "8", but the recognition result is the lowercase letter "g".

[0022] Reference numeral 23 denotes a character attribute button that shows the attribute of the character recognized for the character to be corrected (in this example a "button" is an on-screen element, not a physical button). The character of interest button 24, immediately to the right of the character attribute button 23, displays the character recognized for the character to be corrected. When the recognition result character is "g", as in this example, the character attribute button 23 displays the text "English" (or "lowercase letter"). In this example, a coordinate input unit such as a mouse (not shown) is used to move a cross-shaped cursor 35 onto a button; operating the coordinate input switch then executes the function of that button.

[0023] In this example, selecting the character attribute button 23 with the cursor 35 replaces the character displayed on the character of interest button 24 with the next character belonging to the same similar character table in FIG. 6, and causes the character attribute button 23 to display the attribute of that next character. As described later, this function makes it easy to correct the character to be corrected.

[0024] Reference numeral 25 denotes a function button area, in which a candidate button 26A, a code button 26B, and other buttons are displayed. For example, selecting the candidate button 26A with the cursor 35 displays, in a candidate character display area 33, the ten candidate characters judged to be similar to the character to be corrected, together with their residuals. The operator can recognize the original character from the surrounding area display section 22 and the original pattern display section 32, so if the original character is among the ten candidates, selecting it with the cursor 35 assigns the character code of the selected character to the character that could not be recognized. If the desired character is not among the candidates, a character code can be entered directly by selecting the code button 26B, and kanji can be entered using the kana-kanji button 26C and the radical/stroke-count button 26E. Selecting the table button 26D displays a list of character codes (for example JIS codes) and the corresponding characters. The external character registration button 26F and the learning deletion button 26G are used when registering characters in special typefaces and the like. Reference numeral 27 denotes a re-recognition button, used to recognize the character to be corrected again, and 28 denotes a backspace button.

[0025] Reference numeral 29 denotes a correction history display section, which displays, from the left in descending order of frequency, ten characters obtained by correcting characters or by assigning character codes after rejection. When the desired character is shown in the display section 29, selecting it with the cursor 35 easily assigns the correct character code to the character that could not be recognized. Reference numeral 30 denotes a next button and 31 a previous button. Selecting the next button 30 moves the cursor 21 to a later occurrence of the same character as the one currently under the cursor, and selecting the previous button 31 moves the cursor 21 to an earlier occurrence of that character.
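
The correction history and the next/previous navigation can be sketched with standard library tools. The sketch treats the recognition result as a plain string and keeps the cursor in place when no further occurrence exists; both are assumptions, and the function names are hypothetical.

```python
from collections import Counter


def history_display(corrections, limit=10):
    """Characters entered as corrections, most frequent first, at most `limit`."""
    return [char for char, _ in Counter(corrections).most_common(limit)]


def next_same(text, pos):
    """Index of the next occurrence of text[pos] after pos, or pos if there is none."""
    found = text.find(text[pos], pos + 1)
    return found if found != -1 else pos


def prev_same(text, pos):
    """Index of the previous occurrence of text[pos] before pos, or pos if none."""
    found = text.rfind(text[pos], 0, pos)
    return found if found != -1 else pos
```

Jumping between occurrences of the same recognized character is useful because a systematic misrecognition, such as "g" for "8", tends to recur throughout a page.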

[0026] Reference numeral 34 denotes a similarity button. Selecting the similarity button 34 displays the following items in the recognition result display area 20.
1. Display the list of similar character tables
2. Add an entry to a similar character table
3. Register a new similar character table
4. Delete an entry from the similar character tables
For example, selecting the first item, list display, shows the list of similar character tables of FIG. 6 in the recognition result display area 20. The next item, additional registration, registers the character displayed on the character of interest button 24 in a desired similar character table. The next item, new registration, registers that character in a new table. The last item, registration deletion, deletes a desired character from a desired table, or a desired table itself.

[0027] An example of how the similar character table of this example is used is as follows. Suppose that when the candidate button 26A is selected with the cursor 35 in FIG. 2, only the characters "g", "6", and "9" are displayed in the candidate character display area 33, each with its residual. In this example, the operator can tell from the surrounding area display section 22 and the original pattern display section 32 that the original character is "8", and therefore that the candidate character display area 33 does not contain the intended character. The operator then selects the character attribute button 23 with the cursor 35. In response, as shown in FIG. 3, the character of interest button 24 displays the next similar character belonging to the same table as the character "g" (the character "8", as can be seen from FIG. 6), and the character attribute button 23 displays the attribute of that similar character ("digit" in this example).

[0028] When the character displayed on the character of interest button 24 is the target character, the operator selects the character of interest button 24 with the cursor 35. The character code of the character on the character of interest button 24 is then assigned to the character to be corrected, and the selected character is displayed within the cursor 21 in the recognition result display area 20. Because the character attribute button 23 displays the attribute of the character, the operator will not mistake, for example, the numeral "1" for the lowercase letter "l", or the numeral "0" for the uppercase letter "O". If, on the other hand, the character displayed on the character of interest button 24 is not the target character, the operator selects the character attribute button 23 with the cursor 35; the character of interest button 24 then displays the next character in the same similar character table, and the character attribute button 23 displays the attribute of that next character.
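The button interaction described in this walkthrough can be sketched as follows. This is a minimal illustration, not the patented circuit: the names `CorrectionSession`, `press_attribute_button` and `press_character_button` are hypothetical, the group contents beyond "g" followed by the numeral "8" are invented for the example, and wrapping around at the end of the group is an assumption the text does not state.

```python
from typing import List, Tuple

Entry = Tuple[str, str]  # (character, attribute)

# Hypothetical group: only "g" followed by "8" (a numeral) comes from the text.
GROUP: List[Entry] = [
    ("g", "lowercase letter"),
    ("8", "numeral"),
    ("B", "uppercase letter"),  # assumed extra member, for illustration only
]


class CorrectionSession:
    """Steps through the similar character group of a misrecognized character."""

    def __init__(self, group: List[Entry], recognized: str) -> None:
        self.group = group
        # Start at the recognition result character within its group.
        self.index = next(i for i, (c, _) in enumerate(group) if c == recognized)

    def press_attribute_button(self) -> Entry:
        """Show the next character of the group and its attribute.

        Wrapping to the start of the group is an assumption.
        """
        self.index = (self.index + 1) % len(self.group)
        return self.group[self.index]

    def press_character_button(self) -> str:
        """Accept the character currently shown as the corrected character."""
        return self.group[self.index][0]


session = CorrectionSession(GROUP, recognized="g")
shown = session.press_attribute_button()      # next character after "g"
corrected = session.press_character_button()  # operator accepts it
print(shown, corrected)
```

Returning the attribute together with the character mirrors the point made above: the operator sees "numeral" next to the glyph and so can tell a "1" from an "l" before accepting it.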

[0029] As described above, according to this example, the target character and the recognition result character are registered in advance in the group with the same table number in FIG. 6. Consequently, even when the target character is not displayed in the candidate character display area 33, the character code of the target character can be assigned to the character to be corrected simply by selecting the character attribute button 23, so an erroneous recognition result can be corrected easily. In addition, because the character attribute button 23 displays the attribute of each character, the target character will not be misidentified.

[0030] The present invention is not limited to the embodiment described above, and it goes without saying that various configurations may be adopted without departing from the gist of the present invention.

[0031]

[Effects of the Invention] According to the present invention, the operator can register in the storage means for similar characters, divided into groups, characters that the machine cannot judge to be similar, so characters erroneously recognized by the machine can be corrected easily. In addition, because the attribute of each character is displayed on the display means, characters can be corrected accurately.

[Brief Description of the Drawings]

FIG. 1 is a configuration diagram showing an embodiment of a character recognition device according to the present invention.

FIG. 2 is a front view showing an example of the display screen of the display device of the embodiment.

FIG. 3 is a diagram of the main part showing an example of a change in the display content of the display screen of FIG. 2.

FIG. 4 is a diagram showing the memory configuration of the registered character code table of the similar character table of the embodiment.

FIG. 5 is a diagram showing the memory configuration of the similar code table of the similar character table of the embodiment.

FIG. 6 is a diagram showing an example in which the contents of the similar code table of the similar character table of the embodiment are displayed on the display screen of FIG. 2.

[Explanation of symbols]

1 Image scanner
2 Workstation
3 Display device
5 Character recognition board
12 Memory for major classification dictionary
15 Memory for subclassification dictionary
10 Subclassification circuit
17 Judgment circuit
19 Similar character table

Claims

[Claims]

1. A printed character recognition device comprising: data input means for obtaining a dot pattern of a character to be recognized; dictionary storage means storing character codes of printed characters in correspondence with their feature quantities; classification means for outputting, by referring to the dictionary storage means, the character code of a printed character that is similar as a dot pattern to the character to be recognized; display means for displaying the original dot pattern of the character to be recognized and characters similar to the character to be recognized; and storage means for similar characters in which character codes and the attributes of the characters having those character codes are recorded by group, characterized in that, when the recognition result character obtained by referring to the dictionary storage means does not match the character to be recognized, the characters of the group to which the recognition result character belongs and the attributes of those characters are displayed on the display means by referring to the storage means for similar characters.