JPH01217686A - Character reader

Info

Publication number: JPH01217686A
Application number: JP63042257A
Authority: JP
Inventors: Shinji Matsuda; 信治松田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1988-02-26
Filing date: 1988-02-26
Publication date: 1989-08-31

Abstract

PURPOSE: To improve reading accuracy in character recognition processing by executing the processing based on the character types corresponding respectively to a specific code stored in a second memory means and a code stored in a first memory means. CONSTITUTION: A control part 17 has a character type designating area 20 as a storage area for FC information, and further has a character code area 21 for storing the character codes designated by the code in the character type designating area 20. When a character is recognized, a recognition control means 17 executes character recognition processing based on the character types corresponding to the specific code stored in the second memory means 2 and the code stored in the first memory means 20. Thereby, for instance, the range to be read can be limited to a general character type and only the specific character types designated by the specific code. Accordingly, the specific character types used on the document to be read are designated, and unused special character types and the like are eliminated from the range to be read, so that the erroneous reading ratio is lowered and reading accuracy is improved.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Application Field) The present invention relates to a character reading device that performs character recognition processing based on format control information.

(Prior Art) Conventionally, an optical character reader (OCR) optically scans the characters recorded on a form to be read, and a character recognition unit performs recognition processing on the character pattern obtained by the scan. The character recognition unit matches the character pattern against dictionary information consisting of a group of standard character patterns prepared in advance, and thereby identifies the character pattern to be read.

In an OCR, format control information (FC information), recorded for example on a control sheet, is supplied from outside (for example, by the user) before reading starts. FC information is the set of reading control parameters needed to secure high reading accuracy by limiting the reading range: the character types to be read, the character pitch, the typeface, the positions of the lines to be read on the form, the size of the form, and so on. Of these, the FC information on character type is especially important, because limiting the character types to be read limits the range of dictionary information required during character recognition processing.

Conventionally, when character types are set (as a character set) through FC information, the applicable types are designated by code from among general character types, such as numerals (N), alphabetic characters (A), and kana characters (K), and symbols (special character types) such as "." and "+". Specifically, if (N) and symbols are designated, for example, the character recognition unit retrieves the dictionary information corresponding to numerals and to symbols such as "." and "+", and recognizes the character pattern to be read using that dictionary information. This method can limit the reading range to, say, numerals and symbols only, but when there are many kinds of symbols the range is not narrowed very much. In practice, the symbols actually used are often limited to a few kinds. Consequently, if symbols that never appear on the form are also designated by the FC information, the misreading rate during character recognition processing rises and reading accuracy falls accordingly.

(Problems to be Solved by the Invention) Conventionally, setting the character types in the FC information also designates character types, such as symbols, that are not actually to be read, so the reading range becomes relatively wide. As a result, the misreading rate during character recognition processing rises and reading accuracy is relatively degraded.

An object of the present invention is to provide a character reading device that, when the character types of the FC information are set, can accurately designate the character types that match the reading range and thereby improve reading accuracy in character recognition processing.

[Structure of the Invention] (Means for Solving the Problems and Operation) The present invention is a character reading device that performs character recognition processing based on dictionary information and on format control information such as character types. The device comprises first memory means for storing codes that indicate, within the format control information, the designated range of each of the general character types and of the special character types consisting of various symbols, and second memory means for storing a specific code for designating only specific character types within the designated range of the general or special character types. During character recognition processing, recognition control means executes character recognition based on the character types corresponding respectively to the specific code stored in the second memory means and the codes stored in the first memory means.

With this configuration, the reading range can be limited to, for example, the general character types plus only the specific character types designated by the specific code. The specific character types used on the form to be read can therefore be designated, and unused special character types and the like can be excluded from the reading range, which lowers the misreading rate and improves reading accuracy.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of an OCR according to the embodiment. As shown in FIG. 1, the OCR comprises a quantization unit 10, a detection and segmentation unit 11, a preprocessing and normalization unit 12, a feature extraction unit 13, a recognition unit 14, an output unit 15, and a dictionary memory 16. The quantization unit 10 is a circuit that generates character pattern data consisting of binarized signals (hereinafter, a character pattern) from the image signal obtained when a scanning system (not shown, and including a photoelectric conversion circuit) scans the form. The detection and segmentation unit 11 is a circuit that detects and cuts out the character pattern of a single character from the group of character patterns, for example one line's worth, obtained by the quantization unit 10. The preprocessing and normalization unit 12 is a circuit that removes noise from the segmented character pattern and normalizes its position, size, and so on. The feature extraction unit 13 is a circuit that generates and outputs the feature information needed to identify the character pattern. The recognition unit 14 matches the feature information of the character pattern against the dictionary information stored in advance in the dictionary memory 16, and outputs the recognition result (the corresponding character code) to the output unit 15. The output unit 15 is a circuit that outputs the recognition result to, for example, a host computer.
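The chain of units 10 to 14 can be sketched in Python as a series of small functions. This is only an illustrative toy under stated assumptions: the function names, the fixed threshold, segmentation at blank columns, normalization that only crops blank rows (no noise removal or rescaling), and the size-plus-pixels feature vector are all hypothetical choices, not details given in the patent.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # rows of grey levels (input) or 0/1 pixels (binarized)

def quantize(gray: Image, threshold: int = 128) -> Image:
    """Unit 10: binarize grey levels; dark pixels become 1 (ink)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def segment(line: Image) -> List[Image]:
    """Unit 11: cut one line into single characters at blank columns."""
    inked = [any(row[x] for row in line) for x in range(len(line[0]))]
    chars, start = [], None
    for x, has_ink in enumerate(inked + [False]):  # sentinel closes the last run
        if has_ink and start is None:
            start = x
        elif not has_ink and start is not None:
            chars.append([row[start:x] for row in line])
            start = None
    return chars

def normalize(pattern: Image) -> Image:
    """Unit 12 (position only): crop blank rows above and below the ink."""
    inked_rows = [i for i, row in enumerate(pattern) if any(row)]
    return pattern[inked_rows[0]:inked_rows[-1] + 1]

def extract_features(pattern: Image) -> Tuple[int, ...]:
    """Unit 13: assumed feature vector = (height, width, pixels...)."""
    return (len(pattern), len(pattern[0])) + tuple(px for row in pattern for px in row)

def recognize(features: Tuple[int, ...], dictionary: Dict[str, Tuple[int, ...]]) -> str:
    """Unit 14: return the dictionary entry with the fewest mismatches."""
    def distance(template: Tuple[int, ...]) -> int:
        mismatches = sum(a != b for a, b in zip(features, template))
        return mismatches + abs(len(features) - len(template))
    return min(dictionary, key=lambda ch: distance(dictionary[ch]))

def read_line(gray: Image, dictionary: Dict[str, Tuple[int, ...]]) -> str:
    patterns = segment(quantize(gray))
    return "".join(recognize(extract_features(normalize(p)), dictionary) for p in patterns)

# Toy 3x5 scan: a vertical bar, a gap, then a short horizontal stroke.
GRAY = [
    [0, 255, 255, 255, 255],
    [0, 255, 0, 0, 255],
    [0, 255, 255, 255, 255],
]
DICTIONARY = {"I": (3, 1, 1, 1, 1), "-": (1, 2, 1, 1)}  # hypothetical templates
print(read_line(GRAY, DICTIONARY))  # prints I-
```

Each function stands in for one hardware block of FIG. 1; the output unit 15 is represented only by the final `print`.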

The control unit 17 consists of a microprocessor that controls the operation of the whole OCR, and is the circuit that registers (sets) the FC information, the processing that bears on the gist of the present invention. The control unit 17 has a storage area for the FC information supplied from outside, and outputs this FC information to the recognition unit 14. As the storage area for FC information, the control unit 17 has a character type designation area 20 as shown in FIG. 2(a). It further has a character code area 21, shown in FIG. 2(b) or (c), which stores the character codes designated by the code in the character type designation area 20.

Next, the operation of the embodiment is described. First, the form to be read is scanned by the scanning system (not shown), and the quantization unit 10 generates a binarized character pattern. The detection and segmentation unit 11 cuts this character pattern into the character pattern of a single character, which the preprocessing and normalization unit 12 then normalizes. The feature extraction unit 13 generates the feature information needed to identify the character pattern and outputs it to the recognition unit 14.

The recognition unit 14 matches the feature information against the dictionary information from the dictionary memory 16, and determines and outputs the corresponding character code as the recognition result. In doing so, the recognition unit 14 performs character recognition processing based on the dictionary information corresponding to the character types designated in the FC information output from the control unit 17.
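The effect of matching only against the dictionary entries for the designated character types can be illustrated with a small Python sketch. The feature vectors, the squared-distance measure, and the names `DICTIONARY`, `recognize`, and `allowed` are assumptions made for illustration; the patent does not specify how the matching is computed.

```python
from typing import Dict, Optional, Set, Tuple

Features = Tuple[float, ...]

# Hypothetical dictionary: one standard feature vector per character.
DICTIONARY: Dict[str, Features] = {
    "1": (0.9, 0.1, 0.0),
    "7": (0.6, 0.5, 0.1),
    "l": (1.0, 0.0, 0.1),
    "/": (0.3, 0.3, 0.9),
}

def squared_distance(a: Features, b: Features) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recognize(features: Features, allowed: Optional[Set[str]] = None) -> str:
    """Return the closest dictionary character, searching only `allowed` if given."""
    candidates = {
        ch: template
        for ch, template in DICTIONARY.items()
        if allowed is None or ch in allowed
    }
    if not candidates:
        raise ValueError("no dictionary entries for the designated character types")
    return min(candidates, key=lambda ch: squared_distance(features, candidates[ch]))

# A scanned "1" whose features happen to lie closer to the template for "l".
sample = (0.97, 0.03, 0.08)
print(recognize(sample))                             # whole dictionary: l
print(recognize(sample, allowed=set("0123456789")))  # numerals only: 1
```

With the whole dictionary in play the sample is misread as a letter; once the search is limited to numerals, the confusable entry is no longer a candidate.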

Based on the FC information supplied from outside, for example on a control sheet, the control unit 17 sets designation bits in the corresponding entries of the character type designation area 20 shown in FIG. 2(a).

In the character type designation area 20, "N" denotes numerals, "A" alphabetic characters, and "K" kana characters. "Symbol A" through "Symbol C" are designation areas provided for each predetermined class of special character types. Specifically, "Symbol C" is, for example, the area for designating 26 special character types consisting of the period "." and plus "+" symbols together with the special character types included in "Symbol A" and "Symbol B". The character type designation area 20 also has an area, "Code", for storing a specific code that designates specific character types.
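One way to picture the character type designation area 20 is as a small bit field with one designation bit per entry. The Python sketch below assumes this; the bit positions, the `DesignationArea` name, and the `set_designation` helper are hypothetical, since the text only says that designation bits are set in the corresponding entries.

```python
from enum import IntFlag
from typing import Iterable

class DesignationArea(IntFlag):
    """Entries of character type designation area 20 (bit positions assumed)."""
    N = 0x01         # numerals
    A = 0x02         # alphabetic characters
    K = 0x04         # kana characters
    SYMBOL_A = 0x08  # special character types, class A
    SYMBOL_B = 0x10  # special character types, class B
    SYMBOL_C = 0x20  # special character types, class C
    CODE = 0x40      # specific code designating individual characters

def set_designation(names: Iterable[str]) -> DesignationArea:
    """Set one designation bit for each entry named in the FC information."""
    area = DesignationArea(0)
    for name in names:
        area |= DesignationArea[name]
    return area

# FC information that designates numerals plus individually coded characters.
area_20 = set_designation(["N", "CODE"])
print(DesignationArea.N in area_20)         # True
print(DesignationArea.SYMBOL_C in area_20)  # False
```

Keeping "Code" as a separate bit from "Symbol A" to "Symbol C" mirrors the point of the invention: a whole symbol class and a hand-picked subset are designated independently.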

Suppose that external FC information sets the designation bits at "N" and "Symbol C" in the character type designation area 20. The control unit 17 then outputs to the recognition unit 14 an FC code designating that the reading range for recognition processing consists of the numerals "0" to "9" and the 26 special character types designated by "Symbol C". The recognition unit 14 performs recognition processing based on the dictionary information corresponding to the FC code output from the control unit 17.

Suppose instead that external FC information sets the designation bits at "N" and "Code" in the character type designation area 20. From these designation bits, the control unit 17 designates as the reading range the character types consisting of the numerals "0" to "9" and the symbols designated by "Code", for example the period "." and plus "+". The control unit 17 generates the character codes designated by "Code", for example the code for the period "." (2E, shown in FIG. 2(b)) and the code for plus "+" (2B, shown in FIG. 2(c)), and outputs them together with the character codes of the numerals to the recognition unit 14 as the FC code. The recognition unit 14 then performs character recognition processing based on dictionary information whose reading range consists only of the numerals and the symbols "." and "+".
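The "N" plus "Code" case can be worked through in a short Python sketch that builds the resulting reading range. The `FCInfo` structure, the `reading_range` function, and the mapping of "A" to uppercase Latin letters are assumptions for illustration; kana and the "Symbol A" to "Symbol C" classes are left out because the text does not list their contents. The character codes 2E and 2B are the ones given for FIG. 2(b) and (c).

```python
import string
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class FCInfo:
    designated: Set[str]  # entries set in area 20, e.g. {"N", "Code"}
    character_codes: List[int] = field(default_factory=list)  # contents of area 21

# General character types that are easy to enumerate.
GENERAL_TYPES: Dict[str, Set[str]] = {
    "N": set(string.digits),
    "A": set(string.ascii_uppercase),  # assumption: uppercase Latin letters
}

def reading_range(fc: FCInfo) -> Set[str]:
    """Characters the recognition unit should consider for this FC information."""
    allowed: Set[str] = set()
    for name, chars in GENERAL_TYPES.items():
        if name in fc.designated:
            allowed |= chars
    if "Code" in fc.designated:
        # Only the individually coded characters are added, not a whole symbol class.
        allowed |= {chr(code) for code in fc.character_codes}
    return allowed

fc = FCInfo(designated={"N", "Code"}, character_codes=[0x2E, 0x2B])
allowed = reading_range(fc)
print("".join(sorted(allowed)))  # prints +.0123456789
print(len(allowed))              # prints 12
```

Twelve candidates remain here, against the 36 that the text gives for "N" plus "Symbol C" (10 numerals and 26 special character types).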

In this way, a specific code that designates specific character types within the general or special character types is set separately from the codes that designate the general character types "N", "A", and "K" and the special character types "Symbol A" through "Symbol C". With this method the range of special character types, for example, can be limited to exactly the range required. For a form that uses only symbols such as the period "." and plus "+", recognition processing is therefore performed based on the dictionary information for just those special character types. This makes it possible to reduce the misreading rate in recognition processing significantly.

[Effects of the Invention] As described in detail above, according to the present invention, a character reading device that performs recognition processing based on format control information can accurately designate the character types that match the reading range. The misreading rate in character recognition processing can therefore be lowered and, as a result, reading accuracy improved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a character reading device according to an embodiment of the present invention, and FIGS. 2(a) to 2(c) are conceptual diagrams each explaining the contents of the storage areas provided in the control unit of the embodiment. 14: recognition unit, 16: dictionary memory, 17: control unit. Applicant's agent: Patent attorney Takehiko Suzue

Claims

[Claims] A character reading device that performs character recognition processing based on dictionary information and on format control information, such as character types, supplied from outside, characterized by comprising: first memory means for storing codes that indicate, within the format control information, the designated range of each of general character types and special character types consisting of various symbols; second memory means for storing a specific code for designating only specific character types within the designated range of the general character types or special character types indicated by the codes stored in the first memory means; and recognition control means for executing character recognition processing based on the character types corresponding respectively to the specific code stored in the second memory means and the codes stored in the first memory means.