JPH07239916A

JPH07239916A - Printed character recognition device

Info

Publication number: JPH07239916A
Application number: JP6054992A
Authority: JP
Inventors: Kenichi Hattori; 健一服部; Takeshi Machida; 健町田
Original assignee: N T T DATA TSUSHIN KK; NTT Data Communications Systems Corp
Current assignee: N T T DATA TSUSHIN KK; NTT Data Corp
Priority date: 1994-03-01
Filing date: 1994-03-01
Publication date: 1995-09-12

Abstract

PURPOSE: To obtain a high recognition rate for printed characters regardless of the character font used. CONSTITUTION: A dictionary number corresponding to the font used by each customer is registered in a disk storage device 3 of a workstation 1. When a printed document is received from a customer, it is read by an image reader 7 and the read image is displayed on the workstation 1. Part of the displayed document image is then provisionally recognized using the dictionary whose number is registered for that customer, and if the resulting recognition rate is 85% or higher, the full document is recognized using that dictionary. If the recognition rate is below 85%, or if no dictionary number has been registered for the customer, full recognition is performed using all of the dictionaries, covering every font.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a printed character recognition device, and more particularly to a printed character recognition device that can handle multiple character fonts.

[0002]

[Prior Art] For example, when transaction documents from many business partners are recognized automatically using a character recognition device (OCR), the character font used to print the documents usually differs from one business partner to another. Consequently, for documents that contain a lot of noise, such as documents received by facsimile, it is difficult to obtain a high recognition rate when the font used in the document differs from the font supported by the OCR dictionary.

[0003]

[Problems to Be Solved by the Invention] As described above, with an OCR that recognizes printed documents, there are cases in which a sufficient recognition rate cannot be obtained, making the OCR impractical, when the character font of the document does not match that of the OCR.

[0004] Accordingly, the main object of the present invention is to obtain a high recognition rate in character recognition of printed documents, regardless of the font in which the document is printed.

[0005] A secondary object of the present invention is to obtain highly accurate recognition results in as short a time as possible when recognizing documents printed in various character fonts.

[0006]

[Means for Solving the Problems] The printed character recognition device according to the present invention comprises: dictionary storage means storing a plurality of recognition dictionaries corresponding to a plurality of character fonts; a dictionary table in which, for each printing source that provides recognition target documents, a dictionary number corresponding to the character font it uses is registered; dictionary selection means for selecting, by referring to this dictionary table, the dictionary number of the recognition dictionary to be used for a recognition target document; and recognition means for performing character recognition of the recognition target document using the dictionary in the dictionary storage means that corresponds to the selected dictionary number.

[0007]

[Operation] According to the present invention, recognition dictionaries corresponding to various character fonts are prepared in advance. When a recognition target document is received, the dictionary number corresponding to the character font used by its printing source is first looked up in the dictionary table, and the document is recognized using the recognition dictionary with that number. Therefore, even if the font differs from one printing source to another, an appropriate recognition dictionary is selected automatically for each printing source, enabling character recognition with a high recognition rate. Moreover, the processing time is shorter than when all dictionaries are used.

[0008] In a preferred embodiment, provisional recognition means is further provided which, after the registered dictionary numbers have been tentatively selected from the dictionary table, provisionally recognizes a designated partial range of the recognition target document using each of the dictionaries with the selected numbers and obtains a recognition rate for each. If any of the recognition rates from the provisional recognition means is equal to or higher than a predetermined threshold, the dictionary selection means finally selects the dictionary number that achieved the highest recognition rate; if none reaches the threshold, it selects all of the dictionaries in the dictionary storage means. All dictionaries are also selected when no dictionary number corresponding to the printing source is registered in the dictionary table.

[0009] In this way, even if the dictionary number registered in the dictionary table is wrong or missing, or the printing source has changed the font it uses, a wrong dictionary is never used and a high recognition rate can always be obtained.

[0010] Furthermore, in addition to the above configuration, the device of the present invention may further include dictionary registration means that selects the dictionary with the highest frequency of correct answers, based on the recognition results from the recognition means, and registers it in the dictionary table. Then, even when a printing source changes its font or a new printing source is added, an appropriate dictionary number is registered in the dictionary table automatically, so manual registration work can be omitted or reduced.

[0011]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

[0012] FIG. 1 shows the system configuration of an embodiment of the printed character recognition device according to the present invention. The main purpose of this system is to read images of printed documents sent from various customers and automatically recognize their characters.

[0013] In FIG. 1, a workstation 1 performs overall control of the system, selection of the recognition dictionary, and correction and output of the recognition results. It consists of a computer 11, a display 13, and a keyboard 15, and further includes a disk storage device 3 that stores a dictionary table described later.

[0014] An image processing unit (IPU) 5, which processes images of printed documents according to commands from the computer 11, is connected to the workstation 1. An image reader 7 for reading the image of a printed document and a character recognition unit (CRU) 9 for recognizing the characters of the printed document from the read image are connected to the IPU 5.

[0015] FIG. 2 shows an example of the dictionary table stored in the disk storage device 3.

[0016] As shown in FIG. 2, the dictionary table lists customer codes identifying the various customers and the dictionary numbers registered for each customer code. Here, a dictionary number is the identification number of one of the recognition dictionaries stored in the CRU 9. As described later, the CRU 9 holds one recognition dictionary per character font for many fonts, so a dictionary number also identifies a character font. Dictionary numbers are registered in the dictionary table for each customer from the workstation 1, after checking in advance which character fonts that customer uses.
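As a rough illustration of the dictionary table described above, it can be modeled as a mapping from customer code to a list of registered dictionary numbers. This is only a sketch: the customer codes, the dictionary numbers, and the function name `registered_dictionaries` are hypothetical and do not come from the patent.

```python
from typing import Dict, List

# Hypothetical dictionary table: customer code -> registered dictionary numbers.
# Dictionary numbers 1..30 would identify the per-font dictionaries D1..D30.
dictionary_table: Dict[str, List[int]] = {
    "C001": [3],      # customer known to use a single font
    "C002": [7, 12],  # customer with two registered fonts
    "C003": [],       # customer code present, but nothing registered yet
}


def registered_dictionaries(table: Dict[str, List[int]], customer_code: str) -> List[int]:
    """Return the dictionary numbers registered for a customer.

    An empty list means nothing is registered, either because the field
    is empty or because the customer code is unknown.
    """
    return list(table.get(customer_code, []))
```

An empty result corresponds to the case where no dictionary number is registered for the customer.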

[0017] FIG. 3 shows the character types (categories) to be recognized, illustrating 56 categories of digits, alphabetic characters, and symbols used in English text. FIG. 4 shows the structure of the recognition dictionaries for these 56 categories, which are stored in the memory of the CRU 9.

[0018] As shown in FIG. 4, one dictionary is prepared per character font, giving 30 dictionaries D1 to D30 for a total of 30 fonts. Each dictionary area can hold reference vectors for 256 categories, but in this embodiment only the reference vectors for the 56 categories shown in FIG. 3 are stored, in areas 1 to 56, and areas 57 to 256 are left empty.

[0019] Each dictionary is given an area for 256 categories because the CRU 9 hardware has 256 one-bit processing units arranged in parallel so that recognition processing for 256 categories can be carried out simultaneously. Once the dictionary to be used for recognition has been chosen, the CRU 9 sends each bit of the feature vector of the character being recognized to all 256 processing units at once, and at the same time sends the corresponding bits of the 256 category reference vectors of the chosen dictionary to the 256 processing units in parallel. This operation is repeated for every bit of the feature vector, and when the last bit has been processed, the distance values of the character to all 256 categories are obtained simultaneously.
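The bit-serial, category-parallel matching described above can be simulated in software as follows. This is a sketch under a stated assumption: the patent does not name the distance measure, so a Hamming distance (count of differing bits) is assumed here purely for illustration. The names `NUM_UNITS` and `bit_serial_distances` are hypothetical.

```python
from typing import List, Sequence

NUM_UNITS = 256  # number of parallel 1-bit processing units in the CRU


def bit_serial_distances(feature: Sequence[int],
                         references: List[Sequence[int]]) -> List[int]:
    """Simulate the per-bit broadcast to the processing units.

    feature:    bits (0/1) of the feature vector of the character.
    references: one reference bit vector per category (at most 256).
    Returns one accumulated distance per category.
    ASSUMPTION: the distance is a Hamming distance; the source only says
    that distance values are obtained.
    """
    if len(references) > NUM_UNITS:
        raise ValueError("more categories than processing units")
    acc = [0] * len(references)  # one accumulator per processing unit
    for i, fbit in enumerate(feature):            # one step per feature bit
        for unit, ref in enumerate(references):   # simultaneous in hardware
            acc[unit] += fbit ^ ref[i]            # 1 if the bits differ
    return acc
```

In hardware the inner loop would run on all units in the same step, so the time per character depends on the number of feature bits rather than on the number of categories.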

[0020] FIG. 5 shows the overall flow of processing performed by the computer 11 of the workstation 1 in this system.

[0021] As shown in FIG. 5, processing starts when the image of a printed document read by the image reader 7 is received from the IPU 5 (step S1). The received document image is first displayed on the display 13 (step S2), and dictionary selection processing is then started (step S3). The details of this dictionary selection processing are described later.

[0022] Once a dictionary has been selected, its dictionary number and a recognition execution command are sent to the IPU 5. The IPU 5 then sends the document image and the dictionary number to the CRU 9, and the CRU 9 performs recognition using the document image and the dictionary corresponding to that number and returns the recognition result to the computer 11 via the IPU 5.

[0023] The computer 11 receives this recognition result (step S5) and then begins correction processing (step S6). For each character of the document, the received recognition result contains the character images of 16 candidates, ranked 1st to 16th in ascending order of distance value. In the initial state of correction processing, only the image of the first-ranked candidate for each character is displayed, alongside the document image. The operator compares the candidate images with the document image and, where a candidate image is wrong, points out that character using the keyboard or mouse.

[0024] When a character is pointed out, the computer 11 lists the remaining 15 candidates for that character, and the operator selects the correct character from among them. Correction processing is completed by repeating this operation.

[0025] When correction processing is complete, processing ends after, for example, printing out a correct document image based on the corrected recognition result or storing the text data of the document in the disk storage device 3.

[0026] FIG. 6 shows the detailed processing flow of the dictionary selection described above.

[0027] As shown in FIG. 6, the operator first uses the mouse or keyboard to designate, within the displayed document image, the range to be provisionally recognized (for example, the first sentence) (step S301). The operator then enters the correct character string for that range from the keyboard (step S302). The operator further enters the customer code from the keyboard (step S303) and then enters an execution command (step S304).

[0028] On receiving these inputs, the computer 11 first looks up the dictionary number corresponding to the customer code in the dictionary table (step S305) and checks whether a dictionary number is registered (step S306). If no dictionary number is registered in the field for that customer code, dictionary number "0" is selected (step S307).

[0029] If, on the other hand, the check in step S306 finds that one or more dictionary numbers are registered in the field, a command to recognize the designated range is sent to the IPU 5 together with each dictionary number. The CRU 9 then recognizes the designated range using the dictionary corresponding to that dictionary number, and the recognition result is returned to the computer 11.

[0030] When the computer 11 receives this recognition result (step S309), it compares the result with the correct character string entered earlier and calculates the recognition rate (rate of correct answers) (step S310).
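The recognition-rate calculation of step S310 might look like the following minimal sketch. It assumes that the recognized string and the operator-entered correct string are compared position by position; the patent does not specify how the two strings are aligned, and the function name `recognition_rate` is hypothetical.

```python
def recognition_rate(recognized: str, correct: str) -> float:
    """Fraction of characters in `correct` that were recognized correctly.

    ASSUMPTION: characters are compared position by position. A real
    system might need alignment to cope with dropped or split characters.
    """
    if not correct:
        raise ValueError("correct string must not be empty")
    matches = sum(1 for r, c in zip(recognized, correct) if r == c)
    return matches / len(correct)
```

For example, if the digit 0 is recognized in place of the letter O in a seven-character word, the rate is 6/7, which is just above the 85% threshold given as an example in the text.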

[0031] After this provisional recognition of the designated range has been performed for all dictionaries registered in the field (step S311), the computer 11 checks whether the highest of the dictionaries' recognition rates exceeds a predetermined threshold (for example, 85%) (step S312). If it does, the dictionary number that achieved the highest recognition rate is selected (step S313); if not, dictionary number "0" is selected (step S307).

[0032] Here, dictionary number "0" means that recognition is performed using all 30 of the recognition dictionaries described above.
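The dictionary selection flow of FIG. 6 (steps S305 to S313) can be summarized in a short sketch. The function name `select_dictionary`, the callback `provisional_rate`, and the constant names are hypothetical; the callback stands in for the provisional (tentative) recognition of the designated range by the CRU followed by the recognition-rate calculation.

```python
from typing import Callable, Dict, List

ALL_DICTIONARIES = 0  # dictionary number "0": use all 30 dictionaries
THRESHOLD = 0.85      # example threshold given in the text


def select_dictionary(table: Dict[str, List[int]],
                      customer_code: str,
                      provisional_rate: Callable[[int], float],
                      threshold: float = THRESHOLD) -> int:
    """Return the dictionary number to use for full recognition.

    provisional_rate(n) is assumed to return the recognition rate obtained
    by recognizing the designated range with dictionary n.
    """
    registered = table.get(customer_code, [])         # S305
    if not registered:                                # S306: none registered
        return ALL_DICTIONARIES                       # S307
    rates = {n: provisional_rate(n) for n in registered}  # S309 to S311
    best = max(rates, key=rates.get)
    if rates[best] > threshold:                       # S312: exceeds threshold
        return best                                   # S313
    return ALL_DICTIONARIES                           # S307
```

The comparison uses "exceeds" as in the description of step S312. A threshold that also accepts a rate exactly equal to the threshold would need `>=` instead.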

[0033] FIG. 7 shows the flow of the recognition processing (recognition of each character of the document image) performed by the CRU 9 once the dictionary number has been determined as above and the CRU 9 has been notified of it. In this processing, when matching against the dictionary is finished, the distance values are sorted in ascending order and 16 candidate characters are output. The sort is performed in two stages, making use of the 256-category parallel processing capability of the CRU 9. FIG. 8 illustrates this two-stage sort.

[0034] As shown in FIG. 7, when dictionary number "0" is received, the character being recognized is first recognized using each of the 30 dictionaries (step S401). A vertical sort is then performed on the recognition results (step S402).

[0035] As shown in FIG. 8, the vertical sort orders the candidate characters within each category from the one with the smallest distance (distance 1) to the one with the largest distance (distance 30). The labels "distance 1" to "distance 30" thus denote the rank of a result among the 30 dictionaries for that category, not the distance value itself. This yields a candidate character group C1 of 56 categories at distance 1, a candidate character group C2 of 56 categories at distance 2, ..., and a candidate character group C30 of 56 categories at distance 30.

[0036] Next, the distance-1 character group C1 is selected from the candidate character groups C1 to C30 (step S403), and a horizontal sort is performed on this group C1 (step S405).

[0037] In the horizontal sort, as shown in FIG. 8, the candidate characters within the distance-1 candidate character group C1 are rearranged in ascending order of distance value. After this horizontal sort, the CRU 9 selects the 16 candidate characters with the smallest distance values and returns them to the workstation 1 as the recognition result (step S406).

[0038] When a dictionary number other than "0" is received, on the other hand, the CRU 9 performs recognition using the dictionary corresponding to that number (step S404). This gives a distance value for each of the 56 categories, so a horizontal sort is then performed on those distance values (step S405), and the top 16 candidates are selected and output (step S406).

[0039] The two-stage sort described above has the advantage that only the second-stage sort is needed when a dictionary number other than "0" is specified. In the case of dictionary number "0", the first stage sorts within each category, so the same category never appears more than once among the final candidate characters; more distinct categories can therefore be output as candidates, which benefits the subsequent correction processing.
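The two-stage sort for dictionary number "0" can be sketched as below. The data layout (a mapping from dictionary number to a list of per-category distances) and the function names are assumptions made for illustration. Because only the rank-1 group C1 is used afterwards, this sketch takes the per-category minimum instead of fully ordering all 30 results in each category.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[float, int, int]  # (distance, category index, dictionary number)


def vertical_sort(distances: Dict[int, List[float]]) -> List[Candidate]:
    """Stage 1: for each category, keep the smallest distance over all dictionaries.

    distances[dict_no][category] is the distance of the character to that
    category under dictionary dict_no. The result corresponds to group C1.
    """
    num_categories = len(next(iter(distances.values())))
    group_c1: List[Candidate] = []
    for cat in range(num_categories):
        dist, dict_no = min((d[cat], no) for no, d in distances.items())
        group_c1.append((dist, cat, dict_no))
    return group_c1


def horizontal_sort(group: List[Candidate], top_n: int = 16) -> List[Candidate]:
    """Stage 2: order candidates by ascending distance and keep the top_n."""
    return sorted(group)[:top_n]
```

With a single dictionary, only `horizontal_sort` is needed. With all dictionaries, each category contributes exactly one entry to C1, so no category can appear twice among the 16 candidates.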

[0040] As described above, in this embodiment a dictionary number matching the font used is registered in advance for each customer and recognition is performed using the dictionary with that number, so recognition is fast and accurate whatever font the customer prints with. Also, when a customer's font has been changed partway through, or a document is received from an unregistered customer, recognition is performed using all dictionaries and, as described above, the same category is kept from appearing more than once among the candidate characters, so a relatively high recognition rate is obtained and subsequent correction is easy.

[0041] The present invention can be implemented in various forms other than the above embodiment. For example, when recognition is performed with dictionary number "0", the device may record which dictionary the first-ranked candidate character after the horizontal sort came from, tally these records over all characters of the document, and automatically register in the dictionary table the dictionary that scored the most points (that is, was most often correct). Dictionary numbers are then registered in the dictionary table automatically, and manual registration can be omitted or reduced.
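The automatic registration variant described above amounts to a simple tally. The following sketch assumes that, for each character of the document, the dictionary number of the first-ranked candidate has been recorded; the function name `dictionary_to_register` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def dictionary_to_register(top_candidate_dicts: Iterable[int]) -> Optional[int]:
    """Return the dictionary number that most often supplied the first-ranked candidate.

    top_candidate_dicts: one dictionary number per recognized character.
    Returns None if no characters were recognized.
    """
    counts = Counter(top_candidate_dicts)
    if not counts:
        return None
    return counts.most_common(1)[0][0]
```

The returned number could then be written into the dictionary table entry for the customer. How ties are broken is not addressed in the text; this sketch simply keeps whichever of the tied numbers `Counter` returns first.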

[0042] It is also possible to let the operator, at their discretion, skip the provisional recognition in the dictionary selection processing and force the use of a dictionary number the operator specifies, or, when several dictionary numbers are registered, have recognition performed with all of those dictionaries in the same manner as for dictionary number "0". This allows the trouble of provisional recognition to be avoided where appropriate.

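[0043]

[Effects of the Invention] As described above, according to the present invention, a high recognition rate can be obtained by automatically selecting the corresponding dictionary, whatever font was used to print the document. In addition, because the number of the corresponding dictionary is registered in advance, the overall processing time is short.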
Regardless of the type of font used to print the document, it is possible to automatically select the corresponding dictionary and obtain a high recognition rate. Further, since the corresponding dictionary number is registered in advance, the entire processing time becomes short.

[Brief description of drawings]

[FIG. 1] A block diagram showing the system configuration of an embodiment of the present invention.

[FIG. 2] A diagram showing an example of the dictionary table of the embodiment.

[FIG. 3] A diagram showing an example of the recognition categories of the embodiment.

[FIG. 4] A diagram showing the dictionary structure of the embodiment.

[FIG. 5] A flowchart showing the overall processing flow of the workstation of the embodiment.

[FIG. 6] A flowchart showing the detailed flow of the dictionary selection processing of FIG. 5.

[FIG. 7] A flowchart showing the flow of the recognition processing of the CRU of the embodiment.

[FIG. 8] A diagram showing the two-stage sort in the recognition processing of FIG. 7.

[Explanation of symbols]

1 Workstation
3 Disk storage device
5 Image processing unit (IPU)
7 Image reader
9 Character recognition unit (CRU)

Claims

[Claims]

1. An apparatus for recognizing characters in printed documents from one or more printing sources, comprising: dictionary storage means storing a plurality of recognition dictionaries corresponding to a plurality of character fonts; a dictionary table in which a dictionary number corresponding to the character font used is registered for each printing source; dictionary selection means for selecting, by referring to the dictionary table, the dictionary number of the recognition dictionary to be used for a recognition target document; and recognition means for performing character recognition of the recognition target document using the dictionary in the dictionary storage means corresponding to the selected dictionary number.

2. The apparatus according to claim 1, wherein the dictionary selection means selects the registered dictionary number when a dictionary number corresponding to the printing source of the recognition target document is registered in the dictionary table, and selects all of the dictionaries in the dictionary storage means when none is registered.

3. The apparatus according to claim 2, further comprising provisional recognition means which, when the dictionary selection means has selected the registered dictionary number, provisionally recognizes a designated range within the recognition target document using each dictionary corresponding to the selected dictionary number and obtains a recognition rate, wherein the dictionary selection means finally selects the dictionary number that obtained the highest recognition rate if any of the recognition rates from the provisional recognition means is equal to or higher than a predetermined threshold, and selects all of the dictionaries in the dictionary storage means if none is equal to or higher than the threshold.

4. The apparatus according to claim 1, further comprising dictionary registration means for selecting the dictionary with the highest frequency of correct answers based on the recognition results from the recognition means and registering that dictionary in the dictionary table.

5. A method of performing character recognition on printed documents from one or more printing sources, comprising: a dictionary selection step of selecting the dictionary number of the recognition dictionary to be used for a recognition target document by referring to a dictionary table in which a dictionary number corresponding to the character font used by each printing source is registered in advance; and a recognition step of selecting the dictionary corresponding to the selected dictionary number from among a plurality of recognition dictionaries corresponding to a plurality of character fonts, and performing character recognition of the recognition target document using the selected recognition dictionary.