JPH05282484A - Optical character reader

Info

Publication number: JPH05282484A
Application number: JP4077449A
Authority: JP
Inventors: Etsuo Saito; 悦生斉藤
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1992-03-31
Filing date: 1992-03-31
Publication date: 1993-10-29

Abstract

PURPOSE: To provide an excellent optical character reader that can read characters at a high recognition rate by performing post-processing that improves the character recognition rate without using a 'word' dictionary. CONSTITUTION: An optical system 2 reads the characters entered on a document 1. A character recognizing means 8 reads at least KANA (Japanese syllabary) characters and Roman characters. Areas each consisting of a plurality of consecutive characters are specified by using an FC table 7 or the like, and data specifying the character type to be read in each area is stored; prescribed post-processing is then specified for a group of areas consisting of a KANA character area and a Roman character area that are specified as having the same pronunciation. For the specified group of areas, a KANA-Roman character matching processing part 13 collates and compares the KANA character recognition result with the Roman character recognition result and performs the prescribed post-processing.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to an optical character reader that optically reads characters written on a form and outputs their character codes to a file, a printer, or the like.

[0002]

[Prior Art] Conventionally, an optical character reader configured as shown in FIG. 5 has been available. This device is described below.

[0003] First, in FIG. 5, reference numeral 1 denotes a form to be read, on which the characters to be read have been entered by handwriting, printing, or a similar method. The optical character reader that reads this form 1 is broadly divided into a pre-processing section, which inputs and stores data, and a post-processing section, which recognizes the stored data.

[0004] Of these, the pre-processing section consists of the following, connected in sequence: an optical system 2 that scans the characters on the form 1 and obtains image data in the form of an optical signal; a photoelectric conversion unit 3 that converts the optical signal into an electric signal; an A/D conversion unit 4 that converts the electric signal into a digital signal; and an image memory 5 that stores the digitized image data.

[0005] The post-processing section, on the other hand, consists of: a character image cutout means 6 that sequentially cuts out character image data, one character at a time, from the form image data stored in the image memory 5; an FC table 7, connected to the character image cutout means 6, in which preset form format data (hereinafter referred to as FC data) is registered; a character recognition means 8 that recognizes the character image data cut out by the character image cutout means 6; a standard character pattern storage unit 9, connected to the character recognition means 8, in which standard character pattern image data for the characters to be recognized is stored in advance together with the corresponding character codes; and a recognition result editing means 10 that edits the recognition results for the character image data.

[0006] The optical character reader configured as above operates as follows. First, the optical system 2 scans the characters entered on the form 1 to obtain an optical signal. This optical signal is sent to the photoelectric conversion unit 3 and converted into an electric signal. The A/D conversion unit 4 then converts the electric signal into a digital signal, which is stored in the image memory 5 as image data.

[0007] Next, post-processing is performed using the form image data stored in the image memory 5. Typically, once the image data for one form has been stored in the image memory 5, the character image cutout means 6 calculates the character positions in the stored form image data according to the FC data in the FC table 7, cuts out the character image data one character at a time, and sends it to the character recognition means 8. The character recognition means 8 compares the input character image data against the standard character pattern image data in the standard character pattern storage unit 9 and outputs the character code of the character pattern judged to be closest. After recognition, the recognition result editing means 10 edits the per-character recognition results to obtain the final recognition result for the form.
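The per-character matching described above can be sketched as follows. This is only an illustration under stated assumptions: the 3x3 bitmaps, the pixel-agreement similarity measure, and the names `similarity`, `recognize`, and `STANDARD_PATTERNS` are hypothetical and are not taken from the patent, which does not specify how closeness is measured in the conventional device.

```python
# Hypothetical sketch: match one cut-out character image against stored
# standard patterns and return the code of the closest pattern.
# Bitmaps and the similarity measure are illustrative assumptions.

def similarity(a, b):
    """Count the pixels on which two equally sized binary images agree."""
    return sum(pa == pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(char_image, standard_patterns):
    """Return the character code whose standard pattern is closest to char_image."""
    return max(
        standard_patterns,
        key=lambda code: similarity(char_image, standard_patterns[code]),
    )


# Stand-in for the standard character pattern storage unit 9.
STANDARD_PATTERNS = {
    "I": [(0, 1, 0), (0, 1, 0), (0, 1, 0)],
    "L": [(1, 0, 0), (1, 0, 0), (1, 1, 1)],
    "T": [(1, 1, 1), (0, 1, 0), (0, 1, 0)],
}

noisy_t = [(1, 1, 1), (0, 1, 0), (0, 0, 0)]  # a "T" with one pixel missing
print(recognize(noisy_t, STANDARD_PATTERNS))
```

Because each character is decided in isolation, a single degraded image can tip the result to a neighboring pattern, which is the weakness the invention addresses.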

[0008]

[Problems to Be Solved by the Invention] However, because the conventional optical character reader described above performs recognition on each character in isolation, it is difficult to obtain a high recognition rate. An alternative to such per-character recognition is to read a plurality of consecutive characters, match them against character strings registered in a dictionary as so-called "words", and output the closest word as the result. With this method, however, only words registered in the dictionary can be recognized; moreover, as the dictionary grows, matching takes longer, or a large external storage area is needed to hold the dictionary.

[0009] The present invention has been proposed to solve these problems of the prior art. Its object is to provide an excellent optical character reader that can read characters at a high recognition rate by performing post-processing that improves the character recognition rate without using a "word" dictionary.

[0010]

[Means for Solving the Problems] The present invention focuses on the case where data with the same reading is entered in two forms, kana and Roman characters (romaji), and is configured to match the kana against the Roman characters of the same reading in such a case.

[0011] That is, the optical character reader of the present invention is an optical character reader that reads characters optically, characterized by comprising: a kana character recognition means that reads kana characters; a Roman character recognition means that reads Roman characters; a recognition character type designating means that designates areas each consisting of a plurality of consecutive characters and stores data designating the character type to be read in each area; a post-processing designating means that designates that prescribed post-processing be performed on a set of areas consisting of a kana character area, for which the recognition character type designating means designates the reading of kana characters, and a Roman character area, which is designated as holding data with the same reading as that kana character area and for which the reading of Roman characters is designated; and a kana-Roman character matching processing unit that, for the kana character area and the Roman character area making up the set of areas designated by the post-processing designating means, collates and compares the kana character recognition result from the kana character recognition means with the Roman character recognition result from the Roman character recognition means and performs the prescribed post-processing.

[0012] More specifically, an FC table that serves as both the recognition character designating means and the post-processing designating means can be used. In this FC table, FC data corresponding to the form to be read is registered for each character or for each area made up of consecutive characters; sets of kana and Roman character areas designated as having the same reading are registered as part of this area information; and command information that causes the kana-Roman character matching processing unit to process the recognition results for these sets of areas can also be stored.

[0013] The kana character recognition means and the Roman character recognition means can also be configured to calculate similarities, sort the characters in descending order of similarity, and output a fixed number of candidates for each character as the recognition result. In this case, the kana-Roman character matching processing unit is preferably configured to take, from the kana candidates and Roman character candidates output by the kana character recognition means and the Roman character recognition means, those corresponding to a set of kana and Roman character areas designated as having the same reading by the area information of the recognition character designating means, to generate kana candidate strings and Roman character candidate strings as designated by the post-processing designating means, and to match these against each other to edit the recognition result.

[0014] Typically, the kana-Roman character matching processing unit is preferably configured to: generate a kana candidate string from the kana candidates output by the kana character recognition means; generate a Roman character candidate string from this kana candidate string; calculate an evaluation score for the kana candidate string by matching it against the kana candidates, and an evaluation score for the Roman character candidate string by matching it against the Roman character candidates; assign evaluation points according to the candidate order of the kana candidates output by the kana character recognition means and calculate an evaluation score for each pair of kana and Roman character candidate strings; and finally output the pair with the highest evaluation score as the recognition result.

[0015]

[Operation] In the optical character reader of the present invention configured as above, kana and Roman characters entered with the same reading are read individually by the kana character recognition means and the Roman character recognition means in accordance with the data of the recognition character type designating means, and each means outputs its recognition result. For the kana character area and the Roman character area making up a set of areas designated by the post-processing designating means, the kana-Roman character matching processing unit can then collate and compare the kana character recognition result with the Roman character recognition result and perform the prescribed post-processing. In other words, by matching the kana character recognition result against the Roman character recognition result for a set of areas designated as having the same reading, the present invention can improve the character recognition rate in those areas.

[0016]

[Embodiment] An embodiment of the optical character reader according to the present invention is described with reference to FIGS. 1 to 4. Parts identical to those of the conventional example shown in FIG. 5 are given the same reference numerals.

[0017] In FIG. 1, entry frames for OCR reading, such as those shown in FIG. 2, are printed in advance on the form 1 in a dropout color. On this form 1, the characters to be read by the optical character reader of this embodiment are entered in the OCR entry frames by handwriting, printing, or a similar method. As in the conventional example described above, the optical character reader that reads this form 1 is broadly divided into a pre-processing section, which inputs and stores data, and a post-processing section, which recognizes the stored data.

[0018] Of these, the pre-processing section first has, as in the conventional example shown in FIG. 5, the following connected in sequence: an optical system 2 that scans the characters on the form 1 and obtains image data in the form of an optical signal; a photoelectric conversion unit 3 that converts the optical signal into an electric signal; an A/D conversion unit 4 that converts the electric signal into a digital signal; and an image memory 5 that stores the digitized image data. Here, a CCD sensor or the like is used as the photoelectric conversion unit 3, for example. The A/D conversion unit 4 is configured to perform sampling and quantization to obtain binary image data at, for example, 8 lines/mm. In addition to the above, the pre-processing section of this embodiment has a form conveying means 11 for conveying the form 1 and a conveyance control unit 12 that outputs a control signal to control the form conveying means 11.

[0019] The post-processing section, on the other hand, first has, as in the conventional example shown in FIG. 5: a character image cutout means 6 that obtains character image data from the OCR image data stored in the image memory 5 and sequentially cuts it out one character at a time; an FC table 7, connected to the character image cutout means 6, in which preset FC data is registered; a character recognition means 8 that recognizes the character image data cut out by the character image cutout means 6; a standard character pattern storage unit 9, connected to the character recognition means 8, in which standard character pattern image data for the characters to be recognized is stored in advance together with the corresponding character codes; and a recognition result editing means 10 that edits the recognition results for the character image data.

[0020] In this case, the FC table 7 corresponds to a means serving as both the recognition character designating means and the post-processing designating means of the present invention. That is, the FC table 7 holds a plurality of sets of FC data corresponding to the various forms to be read; for example, data such as the position of each OCR character entry frame measured from the upper left corner of the form as the origin, its size, the pitch of the character frames, and the character type is registered for each character or for each field made up of consecutive characters. In addition, pairs of kana and Roman character fields designated as having the same reading are registered as part of this field information, together with command information that causes the kana-Roman character matching processing unit 13 to process the recognition results for such kana-Roman character field pairs.
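One way to picture the FC table is as a list of field records per form, where a field may name a partner field with the same reading. The sketch below is a hypothetical layout, not the format used in the patent: the class `FieldFormat`, its attribute names, the form key `"form_A"`, and all coordinates are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldFormat:
    """One field of FC data (hypothetical layout, measured from the form's upper left corner)."""
    name: str
    x_mm: float                 # position of the first character frame
    y_mm: float
    frame_width_mm: float       # size of one character frame
    frame_height_mm: float
    pitch_mm: float             # distance between successive frames
    char_count: int
    char_type: str              # "kana" or "roman"
    same_reading_as: Optional[str] = None  # partner field; doubles as the matching command


# Hypothetical FC table holding the FC data for one form.
FC_TABLE = {
    "form_A": [
        FieldFormat("name_kana", 20.0, 30.0, 5.0, 7.0, 6.0, 4, "kana", "name_roman"),
        FieldFormat("name_roman", 20.0, 40.0, 5.0, 7.0, 6.0, 9, "roman", "name_kana"),
        FieldFormat("address_kana", 20.0, 50.0, 5.0, 7.0, 6.0, 12, "kana"),
    ],
}


def matching_pairs(fields):
    """Return (kana field, Roman field) name pairs that should go through matching."""
    by_name = {f.name: f for f in fields}
    pairs = []
    for f in fields:
        if f.char_type == "kana" and f.same_reading_as is not None:
            partner = by_name[f.same_reading_as]
            if partner.char_type == "roman":
                pairs.append((f.name, partner.name))
    return pairs


print(matching_pairs(FC_TABLE["form_A"]))
```

Fields without a partner, such as the hypothetical `address_kana`, would simply bypass the matching unit and go straight to result editing.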

[0021] The character recognition means 8 covers at least two character types, kana and Roman characters, and corresponds to a recognition means serving as both the kana character recognition means and the Roman character recognition means of the present invention. That is, the character recognition means 8 is configured to calculate similarities using, for example, the multiple similarity method, sort the characters in descending order of similarity, and output three candidates for each character.
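The candidate output described above amounts to ranking characters by similarity and keeping the top three. The sketch below shows only that ranking step; the similarity values are invented for illustration, the function name `top_candidates` is hypothetical, and the similarity computation itself (the multiple similarity method in the embodiment) is not implemented here.

```python
def top_candidates(similarities, n=3):
    """Return the n characters with the highest similarity, best first."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [char for char, _ in ranked[:n]]


# Hypothetical similarities for one handwritten kana image.
sims = {"ン": 0.60, "シ": 0.88, "ツ": 0.91, "ソ": 0.74}
print(top_candidates(sims))
```

Keeping several candidates rather than a single winner is what lets a later stage recover a correct character that was ranked second or third.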

[0022] In addition to the above, the post-processing section of this embodiment has a kana-Roman character matching processing unit 13 between the character recognition means 8 and the recognition result editing means 10. Using the recognition results from the character recognition means 8 and the field information in the FC table 7, the kana-Roman character matching processing unit 13 generates kana candidate strings and Roman character candidate strings for a kana-Roman character field pair designated as having the same reading, matches them against each other, and edits the recognition result.

[0023] The optical character reader of this embodiment configured as above operates as follows. First, the form conveying means 11 is controlled by the control signal output from the conveyance control unit 12, and the form 1 is conveyed. While the form is being conveyed, the optical system 2 scans the characters entered on the form 1, one line of printed OCR character entry frames at a time, to obtain an optical signal. The photoelectric conversion unit 3 converts this optical signal into an electric signal, and the A/D conversion unit 4 samples and quantizes it to obtain binary image data at 8 lines/mm, which is stored in the image memory 5.

[0024] Next, the character image cutout means 6 cuts out the character image data for each character from the OCR image data stored in the image memory 5. That is, in accordance with the FC data registered in the FC table 7 for each character or for each field made up of consecutive characters, such as the position and size of the OCR character entry frames, the pitch of the character frames, and the character type, the character image cutout means 6 cuts out the character image data one character at a time and sends it to the character recognition means 8.
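Cutting out characters from frame position, size, and pitch can be sketched as a simple coordinate calculation. The following is a minimal illustration, assuming that the 8 lines/mm density applies in both directions and that the image is a list of pixel rows; the function names `frame_boxes` and `cut_out` and the field dimensions are hypothetical.

```python
PIXELS_PER_MM = 8  # scanning density given in the embodiment; assumed equal in x and y


def frame_boxes(x_mm, y_mm, width_mm, height_mm, pitch_mm, char_count):
    """Return (left, top, right, bottom) pixel boxes for each character frame of a field."""
    boxes = []
    for i in range(char_count):
        left = round((x_mm + i * pitch_mm) * PIXELS_PER_MM)
        top = round(y_mm * PIXELS_PER_MM)
        right = left + round(width_mm * PIXELS_PER_MM)
        bottom = top + round(height_mm * PIXELS_PER_MM)
        boxes.append((left, top, right, bottom))
    return boxes


def cut_out(image, box):
    """Cut one character image out of a form image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Hypothetical 4-character field starting 20 mm right and 30 mm down from the origin.
boxes = frame_boxes(20.0, 30.0, 5.0, 7.0, 6.0, 4)
blank_form = [[0] * 400 for _ in range(300)]
first_char = cut_out(blank_form, boxes[0])
print(boxes[0], len(first_char), len(first_char[0]))
```

Because the frames are printed in a dropout color, the cut-out region would contain only the entered character; that optical detail is outside this sketch.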

[0025] The character recognition means 8 then compares the received character image data against the standard character pattern image data in the standard character pattern storage unit 9 and outputs the character codes of the character patterns judged to be closest. That is, it calculates similarities using the multiple similarity method or the like, sorts the characters in descending order of similarity, outputs three candidates for each character, and sends these recognition results to the kana-Roman character matching processing unit 13, for example field by field.

[0026] Next, the kana-Roman character matching processing unit 13 checks the received recognition results against the field information in the FC table 7 and, for fields for which kana-Roman character matching is designated, edits the kana and Roman character recognition results using the recognition results for those fields. An example of this editing is described in more detail below with reference to FIGS. 3 and 4. FIG. 3 shows an example of recognition results from the character recognition means 8 when the kana characters "タカハシ" and the Roman characters "TAKAHASHI" are read: table (A) shows the three kana candidates given for each kana character image, and table (B) shows the three Roman character candidates given for each Roman character image. FIG. 4 is a flowchart showing an example of the processing flow set up in the kana-Roman character matching processing unit 13 for editing the recognition results.

[0027] That is, in editing the recognition results shown in FIG. 3, steps 1 to 6 shown in FIG. 4 are first carried out to calculate and store the evaluation score per character of the first kana-Roman character candidate string pair. Specifically, as shown in step 1, only the first candidates are selected from the kana candidates to generate the first kana candidate string. In the example of FIG. 3, this is "タカハツ". Next, as shown in step 2, this kana candidate string "タカハツ" is romanized to generate the first Roman character candidate string, "TAKAHATSU". As a result, the first kana-Roman character candidate string pair, "タカハツ" - "TAKAHATSU", is obtained.
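Steps 1 and 2 can be sketched as picking the first candidate at each position and romanizing the result with a lookup table. In the sketch below, only the first candidates ("タカハツ") come from the text; the second and third candidates stand in for table (A) of FIG. 3, which is not reproduced here, and the Hepburn-style `ROMAJI` table is an assumed subset chosen to match the "TSU" and "SHI" spellings in the example.

```python
# Hepburn-style readings for the kana used here (illustrative subset, an assumption).
ROMAJI = {"タ": "TA", "カ": "KA", "ハ": "HA", "ツ": "TSU", "シ": "SHI"}

# Hypothetical stand-in for table (A) of FIG. 3: candidates per position, best first.
KANA_CANDIDATES = [
    ["タ", "ク", "ケ"],
    ["カ", "ガ", "ヤ"],
    ["ハ", "ル", "ヘ"],
    ["ツ", "シ", "ソ"],
]


def first_candidate_string(candidates):
    """Step 1: build the first kana candidate string from the first candidates only."""
    return "".join(position[0] for position in candidates)


def romanize(kana_string):
    """Step 2: convert a kana candidate string into a Roman character candidate string."""
    return "".join(ROMAJI[kana] for kana in kana_string)


kana = first_candidate_string(KANA_CANDIDATES)
print(kana, romanize(kana))
```

A real romanization table would also have to deal with long vowels, small kana, and alternative spellings, none of which the text discusses.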

[0028] Next, as shown in steps 3 and 4, the kana candidate string "タカハツ" and the Roman character candidate string "TAKAHATSU" making up the generated first kana-Roman character candidate string pair are each compared against the kana candidates and Roman character candidates in the recognition results shown in FIG. 3, and an evaluation score is calculated for each string. For example, for both kana and Roman characters, each character is scored according to where it appears among the recognition candidates: 5 points if it is the first candidate, 3 points if it is the second, 1 point if it is the third, and 0 points if it is not among the candidates. The sum over all characters gives the evaluation score of the kana candidate string and of the Roman character candidate string.

[0029] In the example of FIG. 3, all four characters of the kana candidate string "タカハツ" are first candidates and receive 5 points each, so the evaluation score of the string is 5 points × 4 = 20 points. Of the nine characters of the Roman character candidate string "TAKAHATSU", the first character "T" is a second candidate and receives 3 points, and the next five characters "AKAHA" are all first candidates and receive 5 points each, but the remaining three characters "TSU" are not among the candidates and receive 0 points; the evaluation score of the string is therefore 3 points × 1 + 5 points × 5 = 28 points. The evaluation score of the first kana-Roman character candidate string pair as a whole is thus 20 points + 28 points = 48 points.
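The scoring arithmetic above can be reproduced with a short sketch. The candidate tables below are hypothetical stand-ins for tables (A) and (B) of FIG. 3, constructed only so that they agree with the facts stated in the text (all of "タカハツ" are first candidates, "T" is a second candidate, "AKAHA" are first candidates, and "T", "S", "U" are absent at the last three positions); the name `string_score` is likewise an assumption.

```python
POINTS = (5, 3, 1)  # points for the 1st, 2nd, and 3rd candidate; absent characters get 0

# Hypothetical candidate tables consistent with the description of FIG. 3.
KANA_CANDIDATES = [
    ["タ", "ク", "ケ"], ["カ", "ガ", "ヤ"], ["ハ", "ル", "ヘ"], ["ツ", "シ", "ソ"],
]
ROMAN_CANDIDATES = [
    ["I", "T", "J"], ["A", "R", "H"], ["K", "X", "H"],
    ["A", "R", "H"], ["H", "N", "M"], ["A", "R", "H"],
    ["S", "G", "B"], ["H", "N", "M"], ["I", "J", "L"],
]


def string_score(string, candidates):
    """Sum the points each character earns from its rank among that position's candidates."""
    total = 0
    for char, position in zip(string, candidates):
        if char in position:
            total += POINTS[position.index(char)]
    return total


kana_score = string_score("タカハツ", KANA_CANDIDATES)
roman_score = string_score("TAKAHATSU", ROMAN_CANDIDATES)
print(kana_score, roman_score, kana_score + roman_score)
```

The comparison here is positional, character by character, which matches the worked example; how strings of unequal length should be aligned is not stated in the text.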

[0030] Further, as shown in step 5, the evaluation score of the first kana-Roman character candidate string pair obtained in this way is divided by the total number of characters to obtain the evaluation score per character. Here the evaluation score of the pair is 48 points, as described above, and the total number of characters is 4 + 9 = 13, so the evaluation score per character is 48 ÷ 13 = 3.69..., or about 3.7 points. Then, as shown in step 6, this score (3.7 points) is stored as the evaluation score of the first kana-Roman character candidate string pair, "タカハツ" - "TAKAHATSU".

[0031] After the evaluation score per character of the first kana-Roman character candidate string pair has been calculated and stored in this way, the evaluation score per character of the second pair is calculated and stored in the same manner. That is, a second kana candidate string is generated by taking just one character from the second candidates (step 1), and steps 2 to 6 are then carried out in the same way to obtain the evaluation score of the second kana-Roman character candidate string pair. This procedure is repeated until the evaluation score per character has been calculated and stored for every kana-Roman character candidate string pair.

[0032] When the last kana candidate string has been generated and the evaluation score per character of the last kana-Roman character candidate string pair has been calculated and stored, in other words, when no new kana candidate string can be generated, as shown in step 7, processing proceeds to step 8, where the pair with the highest evaluation score is selected from among all the kana-Roman character candidate string pairs and output as the recognition result.
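Taken together, steps 1 to 8 amount to an exhaustive search over kana candidate strings. The sketch below is one possible reading, not the patented implementation: the candidate tables are hypothetical stand-ins for FIG. 3, the `ROMAJI` table is an assumed Hepburn-style subset, and two choices are assumptions of this sketch, namely that the divisor is the kana length plus the generated Roman string length, and that positions beyond the shorter of string and candidate table earn nothing.

```python
from itertools import product

POINTS = (5, 3, 1)  # 1st, 2nd, 3rd candidate; 0 if absent

# Assumed Hepburn-style readings for every kana in the hypothetical table.
ROMAJI = {"タ": "TA", "ク": "KU", "ケ": "KE", "カ": "KA", "ガ": "GA", "ヤ": "YA",
          "ハ": "HA", "ル": "RU", "ヘ": "HE", "ツ": "TSU", "シ": "SHI", "ソ": "SO"}

# Hypothetical stand-ins for tables (A) and (B) of FIG. 3.
KANA_CANDIDATES = [
    ["タ", "ク", "ケ"], ["カ", "ガ", "ヤ"], ["ハ", "ル", "ヘ"], ["ツ", "シ", "ソ"],
]
ROMAN_CANDIDATES = [
    ["I", "T", "J"], ["A", "R", "H"], ["K", "X", "H"],
    ["A", "R", "H"], ["H", "N", "M"], ["A", "R", "H"],
    ["S", "G", "B"], ["H", "N", "M"], ["I", "J", "L"],
]


def string_score(string, candidates):
    """Positional score; characters without a matching position earn nothing."""
    return sum(POINTS[pos.index(ch)] for ch, pos in zip(string, candidates) if ch in pos)


def best_pair(kana_candidates, roman_candidates):
    """Evaluate every kana candidate string and return (score per character, kana, roman)."""
    best = None
    for chars in product(*kana_candidates):           # step 1; exhausting the product is step 7
        kana = "".join(chars)
        roman = "".join(ROMAJI[k] for k in kana)      # step 2
        total = (string_score(kana, kana_candidates)  # steps 3 and 4
                 + string_score(roman, roman_candidates))
        per_char = total / (len(kana) + len(roman))   # step 5
        if best is None or per_char > best[0]:        # steps 6 and 8
            best = (per_char, kana, roman)
    return best


score, kana, roman = best_pair(KANA_CANDIDATES, ROMAN_CANDIDATES)
print(kana, roman, round(score, 2))
```

With this hypothetical data the search settles on "タカハシ" - "TAKAHASHI" at 61/13, about 4.69 points per character, ahead of the 48/13 earned by the first-candidate pair "タカハツ" - "TAKAHATSU"; the Roman character field corrects a kana misreading without any word dictionary.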

[0033] The recognition result editing means 10 then edits the recognition results for the kana-Roman character field pairs obtained by the kana-Roman character matching processing unit 13 together with the recognition results for the other fields, and outputs the final recognition result.

[0034] As described above, in this embodiment the kana-Roman character matching processing unit 13 matches the kana against the Roman characters in a kana-Roman character field pair designated as having the same reading, without using a "word" dictionary as in the prior art, and can thereby improve the character recognition rate in those fields.

【００３５】[0035]

The present invention is not limited to the embodiment described above. For example, the number of candidates output by the character recognition means 8 can be changed as appropriate, and the similarity of the candidates can be calculated by various methods. The specific method of kana-romaji matching can also be chosen freely. For example, instead of evaluating all kana-romaji candidate string pairs as in the embodiment, it is also possible to extract and evaluate only those candidate string pairs in which first and second candidates make up at least a certain proportion.
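The variation in which only some of the candidate string pairs are evaluated can be sketched as follows. This is a hedged illustration: the function name pruned_rank_tuples, the default of three candidates per character, and the threshold of 0.75 are assumptions chosen for the example, and the specification does not fix a particular proportion.

```python
from itertools import product


def pruned_rank_tuples(n_chars, n_cands=3, min_ratio=0.75):
    """Yield rank tuples in which first and second candidates (ranks 0 and 1)
    make up at least min_ratio of the character positions."""
    for ranks in product(range(n_cands), repeat=n_chars):
        top = sum(1 for r in ranks if r <= 1)
        if top / n_chars >= min_ratio:
            yield ranks


# For a four-character field with three candidates per character, only the
# combinations with at most one third candidate remain to be evaluated.
kept = list(pruned_rank_tuples(4))
print(len(kept), "of", 3 ** 4, "combinations kept")
```

Each rank tuple that is kept selects one candidate per character position, so the pruned set can be fed to the same evaluation as the full set while reducing the number of pairs to be scored.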

【００３６】[0036]

[Effects of the Invention] As described above, according to the present invention, the kana-romaji matching processing unit collates and compares the kana recognition result produced by the kana character recognition means with the romaji recognition result produced by the romaji recognition means and performs a certain post-processing. This makes it possible to provide an excellent optical character reader that can read characters at a high recognition rate without using a "word" dictionary as in the prior art.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment of an optical character reader according to the present invention.

FIG. 2 is a schematic diagram showing an example of an entry frame on a form to be read by the apparatus of FIG. 1.

FIG. 3 is a diagram showing an example of recognition results produced by the character recognition means of FIG. 1 when the kana characters 「タカハシ」 ("Takahashi") and the romaji "TAKAHASHI" are read from a set of areas designated as having the same reading. Table (A) shows the three kana candidates given for each kana character image, and table (B) shows the three romaji candidates given for each romaji character image.

FIG. 4 is a flowchart showing an example of the processing flow of the kana-romaji matching processing unit of FIG. 1.

FIG. 5 is a block diagram showing an example of a conventional optical character reader.

[Explanation of symbols]

1 ... Form
2 ... Optical system
3 ... Photoelectric conversion unit
4 ... A/D conversion unit
5 ... Image memory
6 ... Character image cutout means
7 ... FC table
8 ... Character recognition means
9 ... Standard character pattern storage unit
10 ... Recognition result editing means
11 ... Form transport means
12 ... Transport control unit
13 ... Kana-romaji matching processing unit

Claims

[Claims]

1. An optical character reader for optically reading characters, comprising: kana character recognition means for reading "kana" characters; romaji recognition means for reading "romaji" characters; recognition character type designating means for designating areas each consisting of a plurality of consecutive characters and storing data that designates the character type to be read in each area; post-processing designating means for designating a certain post-processing for a set of areas consisting of a kana character area designated by the recognition character type designating means for reading "kana" characters and a romaji area designated for reading "romaji" characters and designated as having the same reading as the kana character area; and a kana-romaji matching processing unit that, for the kana character area and the romaji area forming the set of areas designated by the post-processing designating means, collates and compares the kana recognition result produced by the kana character recognition means with the romaji recognition result produced by the romaji recognition means and performs the certain post-processing.

2. The optical character reader according to claim 1, wherein an FC table that serves as both the recognition character type designating means and the post-processing designating means is used; FC data corresponding to the form to be read is registered in the FC table for each character or for each area of consecutive characters; and this area information includes command information that registers a set of a kana character area and a romaji area designated as having the same reading and causes the recognition results for that set to be stored in the kana-romaji matching processing unit.

3. The optical character reader according to claim 1, wherein the kana character recognition means and the romaji recognition means are each configured to calculate similarities, sort the characters in descending order of similarity, and output a fixed number of candidates as the recognition result for each character.

4. The optical character reader according to claim 3, wherein the kana-romaji matching processing unit is configured to generate, in accordance with the designation by the post-processing designating means, kana candidate character strings and romaji candidate strings from those of the kana candidates output by the kana character recognition means and the romaji candidates output by the romaji recognition means that correspond to a set of a kana character area and a romaji area designated as having the same reading by the area information of the recognition character type designating means.

5. The optical character reader according to claim 4, wherein the kana-romaji matching processing unit is configured to: generate a kana candidate character string from the kana candidates output by the kana character recognition means; generate a romaji candidate string from this kana candidate character string; match the kana candidate character string against the kana candidates to calculate a kana candidate character string evaluation score; match the romaji candidate string against the romaji candidates to calculate a romaji candidate string evaluation score; perform this evaluation score calculation in turn for the kana candidates output by the kana character recognition means, so that an evaluation score is calculated for each kana-romaji candidate string pair; and finally output the pair with the highest evaluation score as the recognition result.