JPH0816730A

JPH0816730A - Character recognition system

Info

Publication number: JPH0816730A
Application number: JP6170251A
Authority: JP
Inventor: Masahiro Iwazawa (岩沢正宏)
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1994-06-29
Filing date: 1994-06-29
Publication date: 1996-01-19

Abstract

PURPOSE: To improve character recognition efficiency and reduce the operator's workload during correction and similar tasks, by performing word recognition after the candidate characters selected by a character recognition unit have been reordered according to their frequency of use. CONSTITUTION: A usage frequency table 123 indicates how often each character has been used up to the previous run. A rearrangement unit 213 reorders the candidate characters selected by a character recognition unit 212 using the usage frequency table data stored in a usage frequency table storage unit 222, and outputs the candidates in the new order to a word recognition processing device 100. The data of the usage frequency table 123 is transferred from the word recognition processing device 100 to a character recognition device 200, and the character recognition device 200 reorders the candidate characters taking the contents of this table into account.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a character recognition system that reads characters on a form and performs character recognition processing on them.

[0002]

[Prior Art] A known character recognition system for optically reading the characters on a form and recognizing what is written there consists of a character recognition device and an information processing device that performs knowledge processing (for example, of kanji) on the recognition results of the character recognition device. In such a system, the character recognition device reads the characters on the form and recognizes each character, while the information processing device performs word recognition, correction of individual characters, and similar processing based on those recognition results.

[0003]

[Problems to be Solved by the Invention] In the conventional character recognition system described above, however, recognition is performed independently for each form. The system is not designed to carry the results of earlier recognition runs over into the current run. When the character recognition device misrecognizes a character, word recognition would normally correct it. However, the correct character may be missing from the candidates used in that process, or recognition efficiency may drop. There has therefore been a demand for a character recognition system that uses earlier recognition results to make character recognition more efficient.

[0004]

[Means for Solving the Problems] To solve the above problems, the character recognition system of the present invention has a usage frequency table that records how often each character has been used up to the previous run. A character recognition unit selects several candidate characters for each character and ranks them. A rearrangement unit then reorders these candidates according to the usage frequencies in the table. Finally, a word recognition processing unit recognizes words from the candidate ranking produced by the rearrangement unit.

[0005]

[Operation] In the character recognition system of the present invention, the character recognition unit recognizes each character individually and selects several ranked candidate characters for it. The rearrangement unit reorders these candidates according to the usage frequencies in the usage frequency table. The word recognition processing unit then recognizes the words made up of these characters, using the rearranged candidate ranking.

[0006]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. <<Embodiment 1>> FIG. 1 is a block diagram showing Embodiment 1 of the character recognition system of the present invention. The illustrated system consists of a word recognition processing device 100 and a character recognition device 200. The word recognition processing device 100 is an information processing device such as a personal computer or workstation. It includes a control unit 110, a storage unit 120, a display 131, a keyboard 132, and a mouse 133.

[0007] The control unit 110 performs the word recognition processing of the word recognition processing device 100 and controls its other units. It includes a word recognition processing unit 111, a table data management unit 112, and a table data transfer unit 113.

[0008] The word recognition processing unit 111 consists of a word collation program and a data correction program. It takes the character recognition results received from the character recognition device 200 and recognizes the characters word by word. It also displays the recognition results on the display 131 and corrects the result for each character according to input from the keyboard 132 or the mouse 133.

[0009] The table data management unit 112 manages the data of a usage frequency table 123 (described later) stored in the storage unit 120. For each character recognized by the word recognition processing unit 111, it adds the character to the usage frequency table 123 if the character is not yet present, and otherwise increments the character's usage frequency counter. The table data transfer unit 113 controls the transfer of the contents of the usage frequency table 123 to the character recognition device 200.

[0010] The storage unit 120 is a storage device such as a magnetic disk drive or semiconductor memory. It holds form format data 121, a dictionary file 122, and the usage frequency table 123.

[0011] The form format data 121 describes the format of each form on which the character recognition device 200 performs character recognition. The layout of each form and the areas to be recognized (the positions of the entry fields where characters are handwritten or printed) are fixed in advance for that form. The format consists of data such as these areas to be recognized (fields), the size of the form, and the character set table 221 to be used for each entry field. The character set table 221 is described later.

[0012] The dictionary file 122 contains the dictionary data used for word recognition in each area of the form. Each form has fixed entry positions, and the content entered in each field is also constrained to some extent. For example, an address field contains prefecture names and place names, so dictionaries of those words are needed. The character recognition system therefore prepares, in advance, a dictionary file of the words expected in each field.

[0013] The usage frequency table 123 records how often each character has been used up to the previous run. It is structured as follows. FIG. 2 is an explanatory diagram of the usage frequency table 123. For each character that has appeared in recognition processing so far, the table holds three kinds of information:
- its character code;
- used-dictionary information, indicating which dictionaries in the dictionary file 122 the character is used in;
- a frequency counter for each dictionary, indicating how many times the character has been used in that dictionary.

[0014] For example, for character code x, the used-dictionary information records that character code x is used in dictionaries A, B, and so on. The frequency counters for dictionary A, dictionary B, and so on give the number of times character code x has been used in each of those dictionaries.
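The table layout described in [0013] and [0014] can be pictured as a nested mapping. The following Python sketch is only an illustration, not the patent's data format. The names `UsageTable`, `used_dictionaries`, and `frequency` are hypothetical, and the counter values are made up. The keys of the inner mapping serve as the used-dictionary information.

```python
# Hypothetical in-memory layout of usage frequency table 123:
# character code -> {dictionary id -> frequency counter}.
# The keys of the inner dict double as the "used-dictionary information".
UsageTable = dict[str, dict[str, int]]

def used_dictionaries(table: UsageTable, char: str) -> set[str]:
    """Dictionaries in which `char` has been used so far."""
    return set(table.get(char, {}))

def frequency(table: UsageTable, char: str, dictionary: str) -> int:
    """Counter for `char` in `dictionary`; 0 if the pair was never seen."""
    return table.get(char, {}).get(dictionary, 0)

table: UsageTable = {
    "x": {"A": 12, "B": 3},  # character code x, used in dictionaries A and B (illustrative counts)
}
print(sorted(used_dictionaries(table, "x")))  # ['A', 'B']
print(frequency(table, "x", "A"))             # 12
print(frequency(table, "x", "C"))             # 0
```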

[0015] The display 131, keyboard 132, and mouse 133 are input/output devices that the operator uses to correct data and give work instructions.

[0016] The character recognition device 200 is an optical character reader (OCR) that optically reads the handwritten characters entered on a form and recognizes each character. It consists of a character recognition processing unit 210 and a storage unit 220. The character recognition processing unit 210 performs the recognition processing of the character recognition device 200 and controls its other units. It consists of a reading unit 211, a character recognition unit 212, and a rearrangement unit 213. The storage unit 220 includes a character set table 221 and a usage frequency table storage unit 222.

[0017] The reading unit 211 consists of an image reader or similar device. It optically reads the characters entered on the form and outputs them as image data. The character recognition unit 212 takes the output of the reading unit 211 and analyzes and recognizes the entered characters one at a time, based on the form format data 121 received from the word recognition processing device 100. For each character, it selects several ranked candidate characters from the character set table 221. The rearrangement unit 213 reorders these candidates using the usage frequency table data held in the usage frequency table storage unit 222. It then outputs the reordered candidates to the word recognition processing device 100.

[0018] The character set table 221 in the storage unit 220 records the character codes of all characters that appear in each dictionary. For each character, it stores the character code and the file numbers of the dictionaries in which that character is used. The usage frequency table storage unit 222 holds the data of the usage frequency table 123 transferred by the table data transfer unit 113 of the word recognition processing device 100.

[0019] Next, the operation of the character recognition system configured as above is described. First, the basic operation of the system is explained without the rearrangement based on the usage frequency table. The operation of this embodiment is described after that.

[0020] <<Basic operation of the character recognition system>> When the character recognition device 200 starts reading a form, it first reads the form's identification data (for example, a form number), which is written in a predetermined area. The character recognition unit 212 then sends this identification data to the word recognition processing device 100 and requests the format data of the form. The word recognition processing device 100 looks up the format data for the received identification data and sends it to the character recognition device 200.

[0021] From this format data, the character recognition device 200 knows which fields on the form must be recognized and what kind of data each field contains, and it performs character recognition accordingly. For each field, the character recognition unit 212 looks among the characters registered in the character set table 221 for those that resemble the entered character. It ranks them in descending order of likelihood according to their glyph features and selects them as candidate characters.

[0022] For example, when recognizing the characters in a certain field, the device determines from the format data which dictionary file (say, "A") that field uses. Of the characters registered in the character set table 221, only those whose dictionary file is "A" are considered as candidate characters. The selected candidates are then transferred to the word recognition processing device 100.
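Here is a minimal sketch of the field-level filtering described above. It assumes that the character set table 221 is held as a mapping from character to dictionary file numbers, and that the OCR engine supplies a glyph-similarity score for each character (higher is better). The scores and the function names are hypothetical; the patent does not specify how similarity is computed.

```python
# Hypothetical layout of character set table 221:
# character code -> dictionary file numbers in which the character is used.
CharacterSetTable = dict[str, set[str]]

def candidate_pool(charset: CharacterSetTable, field_dictionary: str) -> set[str]:
    """Characters eligible as candidates in a field that uses `field_dictionary`."""
    return {ch for ch, dicts in charset.items() if field_dictionary in dicts}

def rank_candidates(similarity: dict[str, float], pool: set[str]) -> list[str]:
    """Order eligible characters by glyph similarity, most likely first.

    `similarity` is an assumed per-character score from the OCR engine.
    """
    eligible = [ch for ch in similarity if ch in pool]
    return sorted(eligible, key=lambda ch: similarity[ch], reverse=True)

charset: CharacterSetTable = {"埼": {"A"}, "玉": {"A"}, "王": {"A", "B"}, "主": {"B"}}
pool = candidate_pool(charset, "A")           # 主 is excluded: it is not in dictionary A
scores = {"王": 0.91, "主": 0.88, "玉": 0.85}  # illustrative similarity scores
print(rank_candidates(scores, pool))          # ['王', '玉']
```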

[0023] The word recognition processing unit 111 of the word recognition processing device 100 matches the candidates against the dictionary file 122, taking the rank of the transferred candidate characters into account, and selects the best-matching word. Suppose, for example, that the character recognition device 200 returns 「埼王県」 as the recognition result for a prefecture name. In other words, the candidates for the second character are ranked 1. 王 ("king"), 2. 玉 ("jewel"), and so on.

[0024] In this case, the word recognition processing unit 111 performs word recognition with the dictionary for that field. The dictionary has no entry 「埼王県」. It does contain 「埼玉県」 (Saitama Prefecture), which results from taking the second-ranked candidate 玉 for the second character, so that word is adopted. The word recognition processing unit 111 then shows the word recognition result on the display 131, together with the image data of the characters entered on the form, and asks the operator to confirm it. To correct a character shown on the display 131, the operator brings up the list of candidates that the character recognition device 200 sent for that character. The operator then uses the keyboard 132 or mouse 133 to pick the character believed to be correct.
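The dictionary matching in this example can be sketched as follows. The patent states only that the candidate ranks are taken into account. The rule used here is an assumption: a word qualifies only if each of its characters appears among the candidates at that position, and the qualifying word with the lowest total rank wins. `match_word` is a hypothetical name.

```python
from typing import Optional

def match_word(candidates: list[list[str]], dictionary: list[str]) -> Optional[str]:
    """Pick the dictionary word best supported by per-position candidate lists.

    candidates[i] holds the ranked candidates for position i (best first).
    A word qualifies only if every character appears among the candidates at
    its position; among qualifying words, the lowest total rank wins (assumed rule).
    """
    best_word: Optional[str] = None
    best_cost: Optional[int] = None
    for word in dictionary:
        if len(word) != len(candidates):
            continue
        cost = 0
        for ch, ranked in zip(word, candidates):
            if ch not in ranked:
                break  # this word cannot be built from the candidates
            cost += ranked.index(ch)
        else:
            if best_cost is None or cost < best_cost:
                best_word, best_cost = word, cost
    return best_word

prefectures = ["埼玉県", "千葉県", "奈良県"]
ocr = [["埼", "崎"], ["王", "玉", "主"], ["県"]]
print(match_word(ocr, prefectures))  # 埼玉県
```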

[0025] In this basic operation, however, the character recognition device 200 merely selects, for each character, the characters that look closest to it. When the character set table 221 contains many characters whose glyphs resemble a given character, the correct character may therefore be excluded from the candidates. The reason is that the number of candidates sent from the character recognition device 200 to the word recognition processing device 100 is capped per character; the word recognition processing device 100 tells the character recognition device 200 how many it wants. If the correct character ranks below that cap, it is never sent as a candidate.

[0026] The word recognition processing unit 111 of the word recognition processing device 100 performs word recognition against the dictionary file 122 using only the candidates it receives. If the correct character was not among them, the word may therefore fail to be recognized.

[0027] Even when the correct character does reach the word recognition processing device 100 as a candidate, word recognition in the word recognition processing unit 111 becomes less efficient the closer that character's rank is to the bottom of the list.

[0028] For this reason, in Embodiment 1 the character recognition device 200 uses the usage frequency table data when it selects candidate characters. This allows accurate and efficient word recognition even when the number of candidates sent from the character recognition device 200 is limited. This operation is described next as the operation of Embodiment 1.

[0029] <<Operation of Embodiment 1>> FIGS. 3 and 4 are operation flowcharts of the word recognition processing device 100 and the character recognition device 200, respectively. When a form is inserted into the character recognition device 200, the reading unit 211 starts reading (step S41). The device then requests the form format data, as described above (step S42).

[0030] When the word recognition processing device 100 receives a message from the character recognition device 200, it first determines what was received (step S31). If the message is a form format data request, the device proceeds as in the basic operation. It retrieves the form format data 121 that corresponds to the form's identification data and transfers it (step S32). At the same time, it checks the retrieved form format data 121 to see whether the usage frequency table 123 is to be used (step S33). This use/non-use setting is recorded in the form format data 121 when the form format is designed.

[0031] If step S33 finds that the usage frequency table 123 is to be used, the table data transfer unit 113 transfers the table to the character recognition device 200 (step S34), and processing returns to step S31. If step S33 finds that the table is not to be used, processing returns directly to step S31.
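Steps S32 to S34 amount to a small request handler on the word recognition side. In the sketch below, `FormFormat`, its `use_frequency_table` flag, and `handle_format_request` are hypothetical names. They stand in for the form format data 121 and its use/non-use setting. The message tuples are an assumed representation, not a protocol defined in the patent.

```python
from dataclasses import dataclass, field

UsageTable = dict[str, dict[str, int]]

@dataclass
class FormFormat:
    """Hypothetical subset of form format data 121."""
    form_id: str
    use_frequency_table: bool  # recorded when the form format is designed
    fields: dict[str, str] = field(default_factory=dict)  # field name -> dictionary id

def handle_format_request(form_id: str,
                          formats: dict[str, FormFormat],
                          usage_table: UsageTable) -> list[tuple[str, object]]:
    """Reply to a form format data request (steps S32-S34)."""
    fmt = formats[form_id]
    replies: list[tuple[str, object]] = [("format", fmt)]  # S32: send format data
    if fmt.use_frequency_table:                            # S33: table used for this form?
        replies.append(("usage_table", usage_table))      # S34: send the usage table
    return replies

formats = {
    "F001": FormFormat("F001", True, {"prefecture": "A"}),
    "F002": FormFormat("F002", False),
}
print([kind for kind, _ in handle_format_request("F001", formats, {})])  # ['format', 'usage_table']
print([kind for kind, _ in handle_format_request("F002", formats, {})])  # ['format']
```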

[0032] The character recognition device 200 receives the form format data 121 transferred from the word recognition processing device 100, together with the usage frequency table 123 if it was sent in step S34 (step S43). The character recognition unit 212 then starts character analysis (step S44).

[0033] As described under <<Basic operation>>, the character recognition unit 212 performs the prescribed character analysis on the image data read by the reading unit 211, guided by the form's format data. Of the characters registered in the character set table 221, it considers only those belonging to the dictionary file used for the field. From these, it selects the characters that resemble the analyzed character as candidates, ranked in descending order of likelihood according to their glyph features (step S45).

[0034] Next, the rearrangement unit 213 checks the form format data received in step S43 to see whether the usage frequency table 123 is to be used (step S46). If it is, the unit reorders the candidate characters taking the contents of the usage frequency table 123 into account (step S47).

[0035] Specifically, for every candidate selected for a recognized character, the rearrangement unit looks up the frequency counter for the corresponding dictionary and revises the ranking on that basis. For example, if a candidate is ranked low but has a high counter value in the usage frequency table 123, its rank is raised to reflect that value. How strongly the counter value affects the ranking is determined empirically in advance.
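The patent leaves the weighting of the counter value to empirical tuning, so any concrete formula is an assumption. The sketch below uses one simple choice. Each candidate gets points for its original OCR rank plus `weight` times its frequency counter in the field's dictionary, and the candidates are re-sorted by that combined score. The function name `rearrange` and the default `weight` are hypothetical.

```python
def rearrange(candidates: list[str],
              usage_table: dict[str, dict[str, int]],
              dictionary: str,
              weight: float = 0.5) -> list[str]:
    """Re-rank OCR candidates (best first) using per-dictionary usage counts.

    Assumed scoring: the candidate at OCR rank r (0 = best) of n gets
    (n - r) points plus `weight` times its frequency counter for `dictionary`.
    `weight` stands in for the empirically chosen degree of influence.
    """
    n = len(candidates)

    def score(item: tuple[int, str]) -> float:
        rank, ch = item
        count = usage_table.get(ch, {}).get(dictionary, 0)
        return (n - rank) + weight * count

    # sorted() is stable, so equal scores keep their original OCR order.
    reranked = sorted(enumerate(candidates), key=score, reverse=True)
    return [ch for _, ch in reranked]

usage = {"玉": {"A": 40}, "王": {"A": 2}}  # illustrative counts for dictionary A
print(rearrange(["王", "主", "壬", "玉", "五"], usage, dictionary="A"))
# ['玉', '王', '主', '壬', '五']
```

With these illustrative counts, 玉 moves from fourth place to first. It would therefore survive a limit of three candidates that would otherwise have cut it.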

[0036] Once the candidates have been reordered in step S47, or if step S46 found that the usage frequency table 123 is not to be used (in which case the candidate order is left unchanged), the device transfers the character recognition results for one form to the word recognition processing device 100 (step S48). These results include the candidate characters and the image data of the entered characters.

[0037] If the message received in step S31 is a character recognition result containing candidate characters from the character recognition device 200, the word recognition processing unit 111 performs word collation (step S35). For each field, it consults the corresponding dictionary file 122 and recognizes the characters as a word. The word recognition processing unit 111 then displays the word recognition result together with the image data of the entered characters received from the character recognition device 200. It asks the operator to confirm and correct the result (step S36).

[0038] When the operator confirms the recognized content, the table data management unit 112 checks the form format data 121 to see whether the usage frequency table 123 is used for that form (step S37). If it is, the contents of the table are updated (step S38). A character that is not yet registered has its character code added to the table. For a character that is already registered, the frequency counter for the corresponding dictionary is incremented. If step S37 finds that the usage frequency table 123 is not used, processing returns directly to step S31.
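Step S38 can be sketched as an upsert on the nested mapping. The patent does not say what value a newly registered character's counter starts at. Starting at 1 is an assumption here, and `record_confirmed_word` is a hypothetical name.

```python
def record_confirmed_word(usage_table: dict[str, dict[str, int]],
                          word: str, dictionary: str) -> None:
    """Step S38 sketch: register unseen characters, otherwise increment their counter.

    Assumption: a newly registered character starts with a count of 1 for the
    dictionary in which it was confirmed.
    """
    for ch in word:
        per_dict = usage_table.setdefault(ch, {})                # register the character code if new
        per_dict[dictionary] = per_dict.get(dictionary, 0) + 1  # increment its counter

usage: dict[str, dict[str, int]] = {}
record_confirmed_word(usage, "埼玉県", dictionary="A")
record_confirmed_word(usage, "埼玉県", dictionary="A")
print(usage["玉"])  # {'A': 2}
```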

[0039] In this way, each time the character recognition device 200 reads a form, it receives the contents of the usage frequency table 123 as accumulated up to the previous form. As more forms are read, a correct character that initially ranked low among the candidates rises in rank, which improves the recognition rate.

[0040] In Embodiment 1 above, the entire usage frequency table 123 is transferred every time a form format data request is received from the character recognition device 200. The system may instead transfer only the data for character codes that have been updated. This shortens the transfer time of the usage frequency table 123.
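One way to implement the differential transfer is to track which character codes have changed since the last transfer and send only those entries. The class and function names below are hypothetical. Sending the whole per-dictionary entry for each changed character is an assumed design choice.

```python
class DeltaUsageTable:
    """Usage table that remembers which character codes changed since the last transfer."""

    def __init__(self) -> None:
        self.counts: dict[str, dict[str, int]] = {}
        self._dirty: set[str] = set()

    def increment(self, ch: str, dictionary: str) -> None:
        per_dict = self.counts.setdefault(ch, {})
        per_dict[dictionary] = per_dict.get(dictionary, 0) + 1
        self._dirty.add(ch)

    def take_delta(self) -> dict[str, dict[str, int]]:
        """Return copies of the entries changed since the previous call, then reset."""
        delta = {ch: dict(self.counts[ch]) for ch in self._dirty}
        self._dirty.clear()
        return delta

def apply_delta(replica: dict[str, dict[str, int]],
                delta: dict[str, dict[str, int]]) -> None:
    """Receiving side (cf. usage frequency table storage unit 222): overwrite changed entries."""
    for ch, per_dict in delta.items():
        replica[ch] = dict(per_dict)

table = DeltaUsageTable()
for ch in "埼玉県":
    table.increment(ch, "A")
replica: dict[str, dict[str, int]] = {}
apply_delta(replica, table.take_delta())  # first transfer: three entries
table.increment("玉", "A")
delta = table.take_delta()
print(delta)                              # {'玉': {'A': 2}}
apply_delta(replica, delta)
print(replica == table.counts)            # True
```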

[0041] As described above, in Embodiment 1 the data of the usage frequency table 123 is transferred from the word recognition processing device 100 to the character recognition device 200. The character recognition device 200 then reorders the candidate characters taking the table's contents into account. The candidates it outputs are therefore ordered by frequency of use. As a result, the recognition rate improves in proportion to how often the same characters recur in a given task, and word collation takes less time. Embodiment 1 also allows accurate word recognition even when the number of candidates sent from the character recognition device 200 is limited.

[0042] In Embodiment 1, the candidate characters are reordered on the character recognition device 200 side. The reordering may instead be performed on the word recognition processing device 100 side. This configuration is described next as Embodiment 2.

[0043] <<Embodiment 2>> FIG. 5 is a block diagram of the character recognition system of Embodiment 2. Here the control unit 110 of the word recognition processing device 100 contains a rearrangement unit 114 in addition to the word recognition processing unit 111 and the table data management unit 112. In this configuration, the table data transfer unit 113 of Embodiment 1 is not needed.

[0044] The rearrangement unit 114 has the same function as the rearrangement unit 213 of the character recognition device 200 in Embodiment 1. It reorders the candidate characters received from the character recognition device 200, taking the contents of the usage frequency table 123 into account. The rest of the word recognition processing device 100 is configured as in Embodiment 1, so its description is omitted here.

[0045] On the character recognition device 200 side, the character recognition processing unit 210 has a reading unit 211 and a character recognition unit 212, and the storage unit 220 holds a character set table 221. The rearrangement unit 213 and the usage frequency table storage unit 222 of the first embodiment are not needed, so the character recognition device 200 needs only the basic functions of a character recognition device.

[0046] Next, the operation of the second embodiment configured as above will be described. <<Operation of Second Embodiment>> FIGS. 6 and 7 are operation flowcharts of the word recognition processing device 100 and the character recognition device 200, respectively, in the second embodiment. First, when a form is inserted into the character recognition device 200, the reading unit 211 starts reading (step S71) and, as in the first embodiment, issues a form format data request (step S72).

[0047] The word recognition processing device 100 examines what it has received from the character recognition device 200 (step S61). If it is a form format data request, the device retrieves the form format data 121 corresponding to the form identification data, transfers it (step S62), and returns to step S61. Unlike the first embodiment, the data of the usage frequency table 123 is not transferred.
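The difference between the two embodiments at step S62 can be sketched as a small reply builder. The message layout, the `FormFormat` fields, and the `embodiment` switch are all hypothetical; that the first embodiment sends the table at this point is an assumption drawn from the contrast stated in the paragraph above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FormFormat:
    """Hypothetical subset of the form format data 121."""
    form_id: str
    uses_frequency_table: bool


def reply_to_format_request(form_id: str,
                            formats: Mapping[str, FormFormat],
                            usage_table: Mapping[str, int],
                            embodiment: int) -> dict:
    """Build the reply sent in step S62 (hypothetical message layout)."""
    reply: dict = {"form_format": formats[form_id]}
    if embodiment == 1:
        # Assumed: in Embodiment 1 the table also goes to the character
        # recognition device, whose rearrangement unit 213 needs it.
        reply["usage_table"] = dict(usage_table)
    # Embodiment 2: the table stays on the word recognition side.
    return reply
```

Keeping the table on the word recognition side removes one transfer per form, at the cost of doing the rearrangement there.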

[0048] When the character recognition device 200 receives the form format data 121 from the word recognition processing device 100 (step S73), the character recognition unit 212 starts character analysis (step S74) and selects candidate characters from the character set table 221 (step S75). These operations are the same as in the first embodiment. Once the candidate characters have been selected, the character recognition processing unit 210 transfers the character recognition results for one form, including these candidate characters, to the word recognition processing device 100 (step S77).
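The text does not describe how the character recognition unit 212 picks candidates from the character set table 221. The sketch below assumes the recognizer produces a similarity score for each character in the table and keeps the `k` best, ranked by score, then packages the results for one form as step S77 describes. The scores, `k`, and both function names are hypothetical.

```python
from __future__ import annotations

import heapq
from typing import Mapping


def select_candidates(similarity: Mapping[str, float], k: int) -> list[str]:
    """Return the k characters with the highest (hypothetical) similarity.

    Rank 0 is the best match. Ties are broken by character code so the
    result is deterministic.
    """
    best = heapq.nsmallest(k, similarity.items(),
                           key=lambda item: (-item[1], item[0]))
    return [ch for ch, _ in best]


def form_result(fields: list[list[Mapping[str, float]]],
                k: int) -> list[list[list[str]]]:
    """Ranked candidates for every character of every field on one form."""
    return [[select_candidates(scores, k) for scores in field]
            for field in fields]
```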

[0049] If the content received in step S61 is a character recognition result from the character recognition device 200, the word recognition processing unit 111 first determines from the form format data 121 whether the form uses the usage frequency table 123 (step S63). If it does, the rearrangement unit 114 rearranges the ranks of the candidate characters, taking the contents of the usage frequency table 123 into account (step S64).

[0050] When the rearrangement in step S64 is complete, or when step S63 determines that the table is not used, the word recognition processing unit 111 performs word matching (step S65). For each field, it refers to the corresponding dictionary file 122 and recognizes the characters as a word. The word recognition processing unit 111 then displays the word recognition result together with the image data of the written characters received from the character recognition device 200, and prompts the operator to confirm or correct it (step S66).
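The text does not spell out the word matching algorithm of step S65. The following sketch shows one plausible reading and should be treated as an assumption. It scores each dictionary word of the right length by the sum of the ranks its characters hold in the candidate lists, rejects words that contain a character absent from the candidates, and keeps the lowest-cost word. Under such a scheme, promoting a frequently confirmed character changes which word wins, as the two `print` lines show with a hypothetical two-entry dictionary.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def match_word(candidates: Sequence[Sequence[str]],
               dictionary: Iterable[str]) -> str | None:
    """Return the dictionary word whose characters have the lowest total rank.

    Assumed scoring (the source does not specify the algorithm):
    candidates[i] is the ranked candidate list for position i; words of a
    different length, or containing a non-candidate character, are skipped.
    Returns None if no dictionary word fits.
    """
    best_word: str | None = None
    best_cost: int | None = None
    for word in dictionary:
        if len(word) != len(candidates):
            continue
        cost = 0
        for ch, ranked in zip(word, candidates):
            if ch not in ranked:
                break  # this word cannot be formed from the candidates
            cost += ranked.index(ch)
        else:
            # Strict "<" keeps the earlier dictionary entry on a tie.
            if best_cost is None or cost < best_cost:
                best_word, best_cost = word, cost
    return best_word


place_names = ["大田", "太田"]
print(match_word([["大", "太"], ["田"]], place_names))  # 大田 (recognizer order)
print(match_word([["太", "大"], ["田"]], place_names))  # 太田 (太 promoted)
```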

[0051] When the operator has confirmed the recognition result, the table data management unit 112 checks the form format data 121 to determine whether the usage frequency table 123 is used for that form (step S67). If it is, the contents of the usage frequency table 123 are updated (step S69). If a character is not yet registered, its character code is registered. If it is already registered, its frequency counter for the corresponding dictionary is incremented. If step S67 finds that the usage frequency table 123 is not used, the process returns directly to step S61.
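The update in step S69 can be sketched as follows. The nested layout `table[character][dictionary_name] -> counter` is an assumption based on the phrase "the frequency counter for the corresponding dictionary". It is also assumed that a newly registered character starts its counter at 1, so the current occurrence is counted. The function name is hypothetical.

```python
from __future__ import annotations


def update_usage_table(table: dict[str, dict[str, int]],
                       dictionary_name: str,
                       confirmed_text: str,
                       form_uses_table: bool) -> None:
    """Fold operator-confirmed characters into the usage frequency table.

    Assumed layout: table[character][dictionary_name] -> counter.
    """
    if not form_uses_table:
        return  # step S67 "not used": leave the table unchanged
    for ch in confirmed_text:
        # Unregistered character: register its code with an empty counter set.
        counters = table.setdefault(ch, {})
        # Increment the counter for the dictionary this field was matched
        # against; a first occurrence is assumed to start the counter at 1.
        counters[dictionary_name] = counters.get(dictionary_name, 0) + 1
```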

[0052] Thus, in the second embodiment as well, the contents of the usage frequency table 123 accumulated up to the previous reading are reflected in the ranking of the candidate characters. A character that initially ranks low among the candidates therefore moves up as it keeps being confirmed as the correct character while forms are read, which improves the recognition rate.
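The rank promotion described above can be demonstrated with a small simulation. The setup is hypothetical: the recognizer returns the same ranked list on every read, the operator confirms the same correct character each time, and re-ranking uses the assumed stable sort by descending count.

```python
from __future__ import annotations

from collections import Counter


def top_candidate_per_read(recognizer_output: list[str],
                           correct: str,
                           reads: int) -> list[str]:
    """Simulate repeated reads of one field and report the top candidate.

    Hypothetical setup: a fixed recognizer ranking and an operator who
    confirms `correct` after every read.
    """
    counts: Counter[str] = Counter()
    tops = []
    for _ in range(reads):
        # Assumed policy: stable sort by descending count (missing keys are 0).
        ranked = sorted(recognizer_output, key=lambda ch: -counts[ch])
        tops.append(ranked[0])
        counts[correct] += 1  # table update after operator confirmation
    return tops


print(top_candidate_per_read(["未", "末", "本"], correct="本", reads=3))
# ['未', '本', '本']
```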

[0053] Furthermore, in the second embodiment the character recognition device 200 performs only basic operations, so it does not need high processing capability. This arrangement is advantageous for systems in which the word recognition processing device 100 has more processing capability than the character recognition device 200. However, the character recognition device 200 must transfer a sufficient number of candidate characters.
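The reason a sufficient number of candidates must be transferred can be shown in a few lines. The re-ranking on the word recognition side can only reorder what it receives: if the correct character was cut off before transfer, no frequency information can bring it back. The ranking, counts, and function name are hypothetical, and the re-ranking again assumes a stable sort by descending count.

```python
from __future__ import annotations

from typing import Mapping, Sequence


def rank_after_transfer(full_ranking: Sequence[str], k: int,
                        usage_counts: Mapping[str, int],
                        correct: str) -> int | None:
    """Rank of `correct` after the word side re-ranks the k transferred candidates.

    Returns None when `correct` was not among the k candidates transferred.
    """
    transferred = list(full_ranking[:k])  # only k candidates are sent
    # Assumed policy: stable sort by descending usage count.
    reranked = sorted(transferred, key=lambda ch: -usage_counts.get(ch, 0))
    return reranked.index(correct) if correct in reranked else None


ranking = ["未", "末", "木", "本"]  # hypothetical recognizer order
counts = {"本": 10}
print(rank_after_transfer(ranking, 2, counts, "本"))  # None: 本 never arrives
print(rank_after_transfer(ranking, 4, counts, "本"))  # 0: promoted to first
```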

[0054] In each of the above embodiments, the form format data 121 is sent from the word recognition processing device 100 every time the character recognition device 200 reads a form, but the invention is not limited to this. The form format data 121 need not be sent when it is the same as the previous one. The form format data 121 may also be held on the character recognition device 200 side instead of being sent from the word recognition processing device 100. Further, although the character recognition device 200 has been described as an optical character reader, the invention is not limited to systems that read characters optically; reading may be magnetic, for example.
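The variation in which unchanged form format data is not resent can be sketched as a sender that remembers the last form it served. This is a hypothetical illustration. The class name is invented. It is assumed that the character recognition device keeps the most recently received form format data 121 and reuses it whenever the reply is `None`.

```python
from __future__ import annotations

from typing import Any, Mapping


class FormFormatSender:
    """Word-recognition-side helper that skips resending unchanged format data.

    Hypothetical: the character recognition device is assumed to reuse the
    form format data it last received whenever the reply is None.
    """

    def __init__(self, formats: Mapping[str, Any]) -> None:
        self._formats = formats
        self._last_sent: str | None = None

    def reply(self, form_id: str) -> Any | None:
        if form_id == self._last_sent:
            return None  # same form as last time: nothing to send
        self._last_sent = form_id
        return self._formats[form_id]
```

Comparing only the form identifier is the simplest choice; a real system would also need to resend if the format data itself changed between reads.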

[0055]

[Effects of the Invention] As described above, in the character recognition system of the present invention, the ranks of the candidate characters selected by the character recognition unit are rearranged according to the frequency of use derived from the word recognition results up to the previous reading, and word recognition is performed in this rearranged order. This improves the efficiency of character recognition processing and reduces the operator's workload, for example when correcting results.

[Brief description of drawings]

FIG. 1 is a configuration diagram of the first embodiment of the character recognition system of the present invention.

FIG. 2 is an explanatory diagram of the usage frequency table in the character recognition system of the present invention.

FIG. 3 is a flowchart showing the operation of the word recognition processing device in the first embodiment of the character recognition system of the present invention.

FIG. 4 is a flowchart showing the operation of the character recognition device in the first embodiment of the character recognition system of the present invention.

FIG. 5 is a configuration diagram of the second embodiment of the character recognition system of the present invention.

FIG. 6 is a flowchart showing the operation of the word recognition processing device in the second embodiment of the character recognition system of the present invention.

FIG. 7 is a flowchart showing the operation of the character recognition device in the second embodiment of the character recognition system of the present invention.

[Explanation of symbols]

100 Word recognition processing device
111 Word recognition processing unit
112 Table data management unit
113 Table data transfer unit
114 Rearrangement unit
123 Usage frequency table
200 Character recognition device
212 Character recognition unit
213 Rearrangement unit
222 Usage frequency table storage unit

Claims

[Claims]

1. A character recognition system that reads characters on a form and performs word recognition processing, comprising: a character recognition unit that performs character recognition on each character and selects a plurality of ranked candidate characters for each character; a usage frequency table indicating the frequency of use of each character up to the previous time; a rearrangement unit that rearranges the ranks of the candidate characters selected by the character recognition unit based on the frequencies of use in the usage frequency table; and a word recognition processing unit that recognizes words composed of the characters based on the candidate ranks rearranged by the rearrangement unit.

2. A character recognition system comprising a character recognition device that reads characters on a form and performs character recognition processing on each character, and a word recognition processing device that performs word recognition processing based on the character recognition result for each character received from the character recognition device, wherein the word recognition processing device comprises: a usage frequency table indicating the frequency of use of each character up to the previous time; and a table data transfer unit that transfers the usage frequency table to the character recognition device; and the character recognition device comprises: a character recognition unit that performs character recognition processing on each character and selects, for each character, a plurality of candidate characters assigned a candidate order that sets the order in which the word recognition processing device performs word recognition processing; a usage frequency table storage unit that receives the usage frequency table transferred from the table data transfer unit and retains its contents; and a rearrangement unit that, when the character recognition unit selects a plurality of candidate characters, rearranges their candidate order based on the frequencies of use in the usage frequency table held in the usage frequency table storage unit so that candidate characters with a higher frequency of use are ranked higher, and sends the candidate characters to the word recognition processing device in the rearranged order.

3. A character recognition system comprising a character recognition device that reads characters on a form, performs character recognition processing on each character, and selects a plurality of candidate characters for each character in a candidate order, and a word recognition processing device that performs word recognition processing based on the candidate characters selected by the character recognition device, wherein the word recognition processing device comprises: a usage frequency table indicating the frequency of use of each character up to the previous time; a rearrangement unit that uses the usage frequency table to rearrange the ranks of the plurality of candidate characters received from the character recognition device for each character so that candidate characters with a higher frequency of use are ranked higher; and a word recognition processing unit that performs word recognition processing based on the candidate order rearranged by the rearrangement unit.