JPH06301821A

JPH06301821A - Character recognition processing system

Info

Publication number: JPH06301821A
Application number: JP5085829A
Authority: JP
Inventors: Takahiro Kimura; 隆弘木村
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1993-04-13
Filing date: 1993-04-13
Publication date: 1994-10-28

Abstract

PURPOSE: To improve the OCR recognition rate and reduce the operator's verification workload by performing knowledge processing based on a dictionary database when the characters of each form image are recognized. CONSTITUTION: A host computer 1 includes a corporate dictionary source creation unit that creates a corporate dictionary source as an example of the database. A knowledge processing device 4 performs knowledge processing using a knowledge dictionary 5 when a character recognition device 3 recognizes the characters of each form image. The host computer 1 creates the corporate dictionary source from a result master file 2; the source consists of proper nouns peculiar to the region, such as personal names, business category names, and corporate names. The knowledge dictionary 5 stores the latest dictionary database, obtained by converting this corporate dictionary source for knowledge-base use. The latest dictionary database is transferred online to the knowledge processing device 4, which performs knowledge processing with the knowledge dictionary 5 of proper nouns when the character recognition device recognizes the characters of each form image.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition processing system that improves the OCR recognition rate of a character recognition device.

[0002]

[Prior Art] Conventionally, a character recognition device 100 such as that shown in FIG. 5 has been known as an optical character recognition device.

[0003] This character recognition device 100 is connected to a host computer 101 via a LAN or a communication line. The host computer 101 has a result master file 102. The result master file 102 is provided by an institution associated with the forms read by the character recognition device 100, for example a financial institution such as a bank, and is used to reduce the number of items that must be verified on forms from customers with a transaction history. A verification terminal 103, used for checking the results of character recognition, is also connected to the host computer 101. The verification terminal 103 checks the image of the characters written on a form against the character code data produced by the character recognition device 100.

[0004] With such a character recognition device 100, a form image or an OCR form is converted into character code data by so-called plain reading based on a dictionary (not shown) provided in advance. The verification terminal 103 is then used to check the image of the form against the character code data produced by the character recognition device 100.

[0005]

[Problems to Be Solved by the Invention] However, because the conventional character recognition device 100 described above performs only plain reading of the form image or OCR form, its OCR recognition rate is low.

[0006] In particular, the OCR recognition rate is low for proper nouns peculiar to a region, such as personal names, business category names, and corporate names.

[0007] The present invention has been made in view of the above circumstances. Its object is to reduce the verification workload of each operator by improving the OCR recognition rate of the character recognition device.

[0008]

[Means for Solving the Problems] To achieve the above object, the character recognition processing system according to the present invention comprises: database creating means for editing and creating a dictionary database of proper nouns of the region, such as personal names, business category names, and corporate names, from a result master file created by accumulating result data processed by a host computer over the past several months; and a knowledge processing device that performs knowledge processing, based on the created dictionary database, when a character recognition device recognizes the characters of each form image.

[0009]

[Operation] In the above configuration, the result data processed by the host computer over the past several months is accumulated and managed to create a result master file. Based on that result data, the latest dictionary database of proper nouns of the region, such as personal names, business category names, and corporate names, is edited and created. This latest dictionary database is transferred online to the knowledge processing device. When the character recognition device recognizes the characters of each form image, the knowledge processing device performs knowledge processing using the knowledge dictionary of proper nouns. Performing OCR knowledge processing with this knowledge dictionary improves the OCR recognition rate.

[0010]

[Embodiment] FIG. 1 shows the configuration of an embodiment of the character recognition processing system according to the present invention.

[0011] This character recognition processing system comprises a host computer 1; a result master file 2 connected to the host computer 1; a character recognition device (OCR) 3, likewise connected to the host computer 1 via a LAN or a communication line; a knowledge processing device 4 connected to the character recognition device 3; a knowledge dictionary 5 connected to the knowledge processing device 4; and a verification terminal 6 connected to the host computer 1.

[0012] The result master file 2 is a data file that accumulates the business results processed by the host computer 1 over the past several months. In the case of a bank, for example, various customer-related items are defined, as shown in FIG. 2, such as bank name, branch name, account type (current account, etc.), account number, amount, client, recipient, and telephone number, and the transaction results are stored under the corresponding items.
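The record layout described above can be pictured as a simple typed record. The following Python sketch is illustrative only: the class name, field names, types, and sample values are assumptions, since the source lists the items of FIG. 2 but not their encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultRecord:
    """One transaction entry in the result master file (hypothetical layout)."""
    bank_name: str
    branch_name: str
    account_type: str    # e.g. "current"
    account_number: str  # kept as text to preserve leading zeros
    amount: int
    client: str
    recipient: str       # later used as the source of corporate names
    phone_number: str


# Made-up sample values, for illustration only.
sample = ResultRecord(
    bank_name="Example Bank",
    branch_name="Main",
    account_type="current",
    account_number="0012345",
    amount=150000,
    client="Taro Yamada",
    recipient="株式会社A",
    phone_number="03-0000-0000",
)
```

Of these fields, only `recipient` feeds the dictionary creation procedure described for FIG. 4.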

[0013] The host computer 1 of this embodiment includes a corporate dictionary source creation unit 1A serving as knowledge dictionary creating means, which creates a corporate dictionary source such as that shown in FIG. 3 as an example of the database described below.

[0014] The knowledge processing device 4 performs knowledge processing using the knowledge dictionary 5 when the character recognition device 3 recognizes the characters of each form image. The knowledge dictionary 5 stores the latest dictionary database, obtained by converting the corporate dictionary source for knowledge-base use. The host computer 1 creates this corporate dictionary source from the result master file 2; it consists of proper nouns peculiar to the region, such as personal names, business category names, and corporate names.

[0015] The verification terminal 6 is a terminal device that checks the image of the characters written on a form against the character code data produced by the character recognition device 3.

[0016] Next, the operation of this embodiment will be described with reference to FIG. 4, focusing on the processing procedure of the corporate dictionary creation processing unit 1A.

[0017] First, the result data processed by the host computer 1 over the past several months is accumulated and managed to create the result master file 2.

[0018] Next, based on that result data, the corporate dictionary creation processing unit 1A of the host computer 1 edits and creates the latest dictionary database of proper nouns peculiar to the region, such as personal names, business category names, and corporate names, as follows.

[0019] As shown in FIG. 4, the corporate dictionary creation processing unit 1A comprises a corporate name file creation process 8, a corporate name file 9, a first sort process 10, a first work file 11, a second sort process 12, a second work file 13, and a corporate dictionary source creation process 14, and creates a corporate dictionary source 15.

[0020] First, the corporate name file creation process 8 uses the result master file 2 and a business category name file 7 to remove the business category names from the recipient names and creates the corporate name file 9.

[0021] Here, a business category name indicates a corporation's form of business. For “株式会社Ａ” (a stock company named A), the business category name is “株式会社” or its abbreviation “（カ”; for “Ｂ株式会社”, it is “株式会社” or “カ）”; and for 有限会社Ｃ (a limited company named C), it is “有限会社” or “（ユ”.
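The removal of business category names from recipient names can be sketched as follows. This is a minimal Python illustration under stated assumptions: the list `BUSINESS_CATEGORY_NAMES` stands in for the contents of the business category name file 7 and contains only the strings mentioned in the text, and the function name and the prefix-or-suffix matching rule are hypothetical, since the source does not describe how the matching is done.

```python
# Stand-in for business category name file 7 (only the strings named in the text).
BUSINESS_CATEGORY_NAMES = ["株式会社", "有限会社", "（カ", "カ）", "（ユ"]


def strip_business_category(recipient, categories=BUSINESS_CATEGORY_NAMES):
    """Return the recipient name with a leading and/or trailing category removed."""
    name = recipient.strip()
    # Try longer category strings first so a short one never shadows a long one.
    ordered = sorted(categories, key=len, reverse=True)
    for category in ordered:
        if name.startswith(category):
            name = name[len(category):]
            break
    for category in ordered:
        if name and name.endswith(category):
            name = name[:-len(category)]
            break
    return name.strip()


recipients = ["株式会社A", "B株式会社", "（ユC", "Bカ）"]
corporate_names = [strip_business_category(r) for r in recipients]
```

With this rule, “株式会社A” and “B株式会社” reduce to “A” and “B”, so the same corporation written with a full or abbreviated category name ends up under one key in the corporate name file.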

[0022] Using the created corporate name file 9, the first sort process 10 sorts the records in ascending order with the corporate name as the key and outputs the result to the first work file 11. If multiple records share the same key at sort time, the number of such records is also counted and output.
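The first sort, with its per-key record count, can be sketched in Python as below. The function name and the in-memory list representation of the files are assumptions for illustration; ordering here is by Unicode code point, which is only an approximation of whatever collating sequence the actual system uses.

```python
from itertools import groupby


def first_sort(corporate_names):
    """Sort names ascending and collapse duplicates into (name, count) pairs."""
    ordered = sorted(corporate_names)
    # groupby merges adjacent equal keys, which is why the sort must come first.
    return [(name, sum(1 for _ in group)) for name, group in groupby(ordered)]


# Hypothetical contents of corporate name file 9.
work_file_1 = first_sort(["B", "A", "B", "C", "B", "C"])
```

Counting during the sort means the second sort process can rank names by frequency without rereading the corporate name file.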

[0023] Using the first work file 11, the second sort process 12 sorts the records in descending order of number of occurrences (record count) and, within the same count, in ascending order of corporate name (a-i-u-e-o order, that is, Japanese syllabary order), and outputs the result to the second work file 13. Using the second work file 13, the corporate dictionary source creation process 14 then converts the data into a format suited to OCR character recognition and creates a corporate dictionary source such as that shown in FIG. 3.
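The second sort and the conversion step can be sketched in Python as follows. The function names are hypothetical, and the tab-separated output is only a placeholder: the actual format of the corporate dictionary source in FIG. 3 is not given in the text. Code-point ordering is used for the name key, which approximates but does not exactly reproduce Japanese syllabary order.

```python
def second_sort(counted_names):
    """Order (name, count) pairs by count descending, then name ascending."""
    return sorted(counted_names, key=lambda item: (-item[1], item[0]))


def to_dictionary_source(sorted_entries):
    """Render entries as text, one per line (placeholder format, not FIG. 3)."""
    return "\n".join(f"{name}\t{count}" for name, count in sorted_entries)


# Hypothetical contents of the first work file 11.
work_file_2 = second_sort([("A", 1), ("B", 3), ("C", 3)])
dictionary_source = to_dictionary_source(work_file_2)
```

Placing frequent names first presumably lets the most likely candidates be considered earliest during recognition, although the source does not state the reason for this ordering.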

[0024] This latest dictionary source is transferred online to the knowledge processing device 4 and added there, thereby creating the latest knowledge dictionary 5.

[0025] As a result, the character recognition device 3 performs knowledge processing by using the created knowledge dictionary 5 when recognizing the characters of each OCR form image or OCR form.
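The source does not say how the knowledge processing uses the dictionary. One plausible reading, sketched below purely as an assumption, is to snap a raw OCR result to the closest dictionary entry when the two are similar enough. The function name, the similarity cutoff, and the sample names are hypothetical; `difflib.get_close_matches` is from the Python standard library and stands in for whatever matching the actual device performs.

```python
import difflib


def knowledge_process(raw_text, knowledge_dictionary, cutoff=0.8):
    """Replace a raw OCR string with its nearest dictionary entry, if any.

    Assumption: similarity-based correction; the patent does not specify this.
    """
    matches = difflib.get_close_matches(
        raw_text, knowledge_dictionary, n=1, cutoff=cutoff
    )
    # Fall back to the plain-reading result when nothing is close enough.
    return matches[0] if matches else raw_text


# Hypothetical dictionary entries and a plain-reading result with "0" for "O".
knowledge_dictionary = ["YAMADA SHOJI", "SUZUKI KOGYO"]
corrected = knowledge_process("YAMADA SH0JI", knowledge_dictionary)
unchanged = knowledge_process("XYZ", knowledge_dictionary)
```

A result that has no close dictionary entry is passed through unchanged, so the operator at the verification terminal still sees the plain-reading output for names the dictionary does not cover.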

[0026] With the above method, the OCR character recognition rate of the character recognition device 3 improves, and the workload of each operator in the verification work is reduced.

[0027] Although this embodiment has been described using financial institutions such as banks as an example, the present invention is not limited to OCR entry systems in financial institutions; it is also applicable to OCR entry systems using character recognition devices in various distribution industries, government offices, and the like.

[0028]

[Effects of the Invention] As described above, according to the present invention, knowledge processing is performed when the character recognition device recognizes the characters of each form image, based on a dictionary database of proper nouns of the region, such as personal names, business category names, and corporate names, created from result data. This improves the OCR recognition rate of the character recognition device, reduces the operator's verification workload in the OCR entry system, and improves the processing efficiency of the OCR entry system.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of the character recognition processing system according to the present invention.

FIG. 2 is an explanatory diagram showing an example of the result master file.

FIG. 3 is an explanatory diagram showing an example of the corporate dictionary source.

FIG. 4 is an explanatory diagram showing the processing procedure for creating the corporate dictionary source.

FIG. 5 is a block diagram showing a conventional character recognition processing system.

[Explanation of symbols]

1 Host computer
2 Result master file
3 Character recognition device
4 Knowledge processing device
5 Knowledge dictionary
6 Verification terminal
7 Business category name file

Claims

[Claims]

1. A character recognition processing system comprising: database creating means for editing and creating a dictionary database of proper nouns of a region, such as personal names, business category names, and corporate names, from a result master file created by accumulating result data processed by a host computer over the past several months; and a knowledge processing device that performs knowledge processing, based on the created dictionary database, when a character recognition device recognizes the characters of each form image.