JP2002279351A

JP2002279351A - Character recognition device, method, and program, and computer-readable recording medium on which the program is recorded

Info

Publication number: JP2002279351A
Application number: JP2001079473A
Authority: JP
Inventors: Naoya Misawa; 直也三澤; Yoko Fujiwara; 葉子藤原
Original assignee: Minolta Co Ltd
Current assignee: Minolta Co Ltd
Priority date: 2001-03-19
Filing date: 2001-03-19
Publication date: 2002-09-27
Anticipated expiration: 2021-03-19
Also published as: JP4385536B2

Abstract

PROBLEM TO BE SOLVED: To provide a character recognition device that recognizes fonts simultaneously with character codes, adapts easily to its usage environment, and recognizes characters with high precision. SOLUTION: This character recognition device 1 detects the fonts installed in a personal computer 3 and a printer 4, obtains from the personal computer 3 and the printer 4 character image data for each character of the detected fonts, uses the character image data to prepare a reference pattern for each character of the detected fonts, and prepares an identification dictionary for each detected font by registering the reference patterns. The character code and the font of each character pattern recognized in document image data obtained from a scanner 2 are determined using the per-font identification dictionaries.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a character recognition device. More specifically, it relates to a character recognition device that can recognize a font at the same time as a character code and that adapts easily to its usage environment to perform highly accurate character recognition.

[0002]

[Prior Art] In a conventional character recognition device, the identification dictionary used to identify character patterns was created by averaging the feature amounts of several representative fonts (usually one each of the serif, sans-serif, and monospace families), so that recognition remains stable despite differences in character patterns between fonts and variations such as faint or crushed strokes.

[0003] In contrast, a character recognition device has been proposed that uses per-font identification dictionaries, each created from the feature amounts of a single font (for example, Japanese Unexamined Patent Publication No. H11-85908). Such a device can recognize the font at the same time as the character code. In addition, for the fonts covered by the identification dictionaries it holds, it achieves a higher recognition rate than the aforementioned identification dictionary in which the feature amounts of multiple fonts are averaged.

[0004] On the other hand, such a device has the drawback that the recognition rate drops for fonts other than those covered by the identification dictionaries it holds. To support a variety of usage environments, many font-specific dictionaries must therefore be prepared, which is costly. Moreover, a configuration in which the user specifies the font to be recognized in advance makes operation cumbersome, while a configuration in which the font is recognized automatically makes the processing extremely complex. Furthermore, because fonts are so numerous and varied, preparing identification dictionaries for all of them is difficult in practice, so environments that use special fonts cannot be supported.

[0005]

[Problems to Be Solved by the Invention] The present invention was made in view of the above problems of the prior art. Its object is to provide a character recognition device that can recognize a font at the same time as a character code and that adapts easily to its usage environment to perform highly accurate character recognition.

[0006]

[Means for Solving the Problems] The above object of the present invention is achieved by the following means.

[0007] (1) A character recognition device comprising: font detecting means for detecting a font installed in a connected device; character image data acquiring means for acquiring, from the connected device, character image data for each character of the font detected by the font detecting means; standard pattern creating means for creating a standard pattern for each character of the detected font using the character image data acquired by the character image data acquiring means; identification dictionary creating means for creating an identification dictionary by registering the standard patterns created by the standard pattern creating means; and character identifying means for determining the character code and font of a character pattern using the identification dictionary created by the identification dictionary creating means.

[0008] (2) A character recognition device comprising: font detecting means for detecting a font installed in a connected device; character image data acquiring means for acquiring, from the connected device, character image data for each character of the font detected by the font detecting means; standard pattern creating means for creating a standard pattern for each character of the detected font using the character image data acquired by the character image data acquiring means; identification dictionary creating means for creating an identification dictionary for each detected font by registering the standard patterns created by the standard pattern creating means; and character identifying means for determining the character code and font of a character pattern using the per-font identification dictionaries created by the identification dictionary creating means.

[0009] (3) The character recognition device according to (2), wherein the character identifying means takes, as the character code of the character pattern, the character code of a standard pattern whose similarity is equal to or greater than a predetermined threshold.

[0010] (4) The character recognition device according to (2) or (3), wherein the character identifying means determines the priority of the identification dictionaries in order of identification frequency.

[0011] (5) The character recognition device according to any one of (2) to (4), wherein the character identifying means determines a font for each character block by taking the font of the identification dictionary with the highest identification frequency as the font of that character block.

[0012] (6) A character recognition method comprising the steps of: detecting a font installed in a connected device; acquiring, from the connected device, character image data for each character of the detected font; creating a standard pattern for each character of the detected font using the character image data; creating an identification dictionary by registering the standard patterns; and determining the character code and font of a character pattern using the identification dictionary.

[0013] (7) A character recognition method comprising the steps of: detecting a font installed in a connected device; acquiring, from the connected device, character image data for each character of the detected font; creating a standard pattern for each character of the detected font using the character image data; creating an identification dictionary for each detected font by registering the standard patterns; and determining the character code and font of a character pattern using the per-font identification dictionaries.

[0014] (8) A program for causing a computer to execute: a procedure of detecting a font installed in a connected device; a procedure of acquiring, from the connected device, character image data for each character of the detected font; a procedure of creating a standard pattern for each character of the detected font using the character image data; a procedure of creating an identification dictionary by registering the standard patterns; and a procedure of determining the character code and font of a character pattern using the identification dictionary.

[0015] (9) A program for causing a computer to execute: a procedure of detecting a font installed in a connected device; a procedure of acquiring, from the connected device, character image data for each character of the detected font; a procedure of creating a standard pattern for each character of the detected font using the character image data; a procedure of creating an identification dictionary for each detected font by registering the standard patterns; and a procedure of determining the character code and font of a character pattern using the per-font identification dictionaries.

[0016] (10) A computer-readable recording medium on which is recorded a program for causing a computer to execute: a procedure of detecting a font installed in a connected device; a procedure of acquiring, from the connected device, character image data for each character of the detected font; a procedure of creating a standard pattern for each character of the detected font using the character image data; a procedure of creating an identification dictionary by registering the standard patterns; and a procedure of determining the character code and font of a character pattern using the identification dictionary.

[0017] (11) A computer-readable recording medium on which is recorded a program for causing a computer to execute: a procedure of detecting a font installed in a connected device; a procedure of acquiring, from the connected device, character image data for each character of the detected font; a procedure of creating a standard pattern for each character of the detected font using the character image data; a procedure of creating an identification dictionary for each detected font by registering the standard patterns; and a procedure of determining the character code and font of a character pattern using the per-font identification dictionaries.
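Means (3) to (5) can be illustrated with a minimal Python sketch. It is not the patented implementation: the pixel-agreement `similarity` measure, the threshold value of 0.9, and all function names are assumptions chosen for illustration, and glyphs are assumed to be equal-sized binary grids.

```python
from collections import Counter


def similarity(a, b):
    """Fraction of pixels that agree between two equal-sized binary glyphs (assumed measure)."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def identify(pattern, dictionaries, order, threshold):
    """Means (3): accept the first standard pattern whose similarity meets the threshold."""
    for font in order:
        for code, standard in dictionaries[font].items():
            if similarity(pattern, standard) >= threshold:
                return font, code
    return None


def recognize_block(patterns, dictionaries, threshold=0.9):
    """Recognize the characters of one character block; return (codes, block_font)."""
    hits = Counter()  # how often each per-font dictionary produced a match
    codes = []
    for pattern in patterns:
        # Means (4): consult dictionaries in descending order of identification frequency.
        order = sorted(dictionaries, key=lambda font: -hits[font])
        result = identify(pattern, dictionaries, order, threshold)
        if result is None:
            codes.append(None)  # no dictionary reached the threshold
            continue
        font, code = result
        hits[font] += 1
        codes.append(code)
    # Means (5): the most frequently matched dictionary gives the font of the block.
    block_font = hits.most_common(1)[0][0] if hits else None
    return codes, block_font
```

Ordering the dictionaries by hit count means that, once a block's dominant font has been found, most characters are matched against that font's dictionary first.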

[0018]

[Embodiments of the Invention] Embodiments of the present invention are described below in detail with reference to the drawings.

[0019] FIG. 1 is a block diagram showing the overall configuration of a character recognition system that includes a character recognition device according to an embodiment of the present invention. The character recognition system comprises a character recognition device 1, a scanner 2 serving as an image reading device, a personal computer 3 serving as an image processing device, and a printer 4 serving as an image forming device, which are connected via a network 5 so that they can communicate with one another. The types and number of devices connected to the network 5 are not limited to the example shown in FIG. 1.

[0020] Next, the configuration of each of the above devices is described. To avoid repetition, parts that have the same function in more than one device are described only the first time they appear and are omitted thereafter.

[0021] FIG. 2 is a block diagram showing an example of the configuration of the character recognition device 1 according to the present embodiment. The character recognition device 1 is a computer and, as shown in FIG. 2, has a CPU 11, a ROM 12, a RAM 13, a hard disk 14, a display 15, an input device 16, a network interface 17, and a bus 18.

[0022] The CPU 11 performs various control and arithmetic processing. The ROM 12 stores various programs and data. The RAM 13 temporarily stores programs and data as a work area. The hard disk 14 stores various programs and data. The display 15 presents various displays. The input device 16 is a keyboard, a mouse, or the like, and is used for various inputs. The network interface 17 is an interface for connecting to a network and communicating with other devices on it. These components are interconnected by the bus 18, over which signals are exchanged. In the present embodiment, the computer 1 (that is, the character recognition device 1) performs the predetermined operations described later; the program that controls these operations is stored in the ROM 12 or on the hard disk 14.

[0023] The scanner 2 reads a document set at a predetermined position to obtain image data (bitmap data) and outputs the image data to other devices via the network.

[0024] The personal computer 3 has the same configuration as the character recognition device 1, as shown in FIG. 2. As an image processing device, the personal computer 3 has fonts installed that are used to make an image forming device such as a printer or a display render character images.

[0025] The printer 4 prints out a print image (image data) based on a print job. When a print job contains a control language, the printer 4 interprets it and rasterizes it to create the print image; the fonts needed for this are installed in the printer 4.

[0026] The network 5 consists of LANs in which computers and network devices are connected according to standards such as Ethernet (registered trademark), Token Ring, or FDDI, WANs in which LANs are connected to one another by dedicated lines, and the like.

[0027] Next, an outline of the operation of the character recognition device 1 in the present embodiment is described. FIG. 3 is a flowchart showing the overall procedure of the character recognition processing of the character recognition device 1 according to the present embodiment. First, the character recognition device 1 displays on the display 15 an input screen for a command to create a new identification dictionary, and waits until the user inputs that command (NO in S101). The user inputs the command to the character recognition device 1 by operating the input device 16. On accepting the command from the user (YES in S101), the character recognition device 1 performs the identification dictionary creation processing (S102).

[0028] FIG. 4 is a flowchart showing the procedure of the identification dictionary creation processing of the character recognition device 1 according to the present embodiment. In this processing, the character recognition device 1 first detects devices connected to it via the network interface 17 and the network 5 (S201). On detecting a connected device on the network 5 (YES in S202), it detects whether that device has any installed fonts (S203). In the case of a personal computer, for example, the installed fonts are held by the OS (operating system), and the font data is kept in a specific directory on the hard disk that the OS uses. In the case of a printer that supports a page description language such as PostScript (Adobe Systems, U.S.), the printer itself holds the fonts; it interprets the page description language and rasterizes fonts based on font data stored on a hard disk or the like. The character recognition device 1 detects the fonts installed in such a device by searching the specific directory on the device's hard disk or the like and finding this font data. If a font installed in the detected device is found and no identification dictionary has yet been created for it (YES in S204), the character recognition device 1 sends the device, via the network interface 17 and the network 5, a character image data transfer request asking it to transfer character image data (bitmap data) for every character of the detected font (S205).

[0029] FIG. 5 is a flowchart showing the operation of the personal computer 3 in the case where the character recognition device 1 has detected the personal computer 3 on the network 5, has searched a predetermined directory on the hard disk of the personal computer 3 and detected font data, and has sent the personal computer 3 a transfer request for the character image data of the font corresponding to that font data. The personal computer 3 waits until it receives a character image data transfer request from the character recognition device 1 (NO in S301). On receiving the request (YES in S301), it prepares character image data for every character of the requested font. There are two types of font: bitmap fonts, which represent the shape of a character as a set of dots, and outline fonts, which store the contour (outline) of a character as data. In a bitmap font, the font data consists of the character image data itself. An outline font, in contrast, has as its font data font metric data, which describes the width and height of each character, and outline data, which describes the information for computing the character's contour; character image data is obtained by rasterizing to the required character size based on these data. Therefore, if the requested font is a bitmap font (NO in S302), the personal computer 3 sends the character image data that constitutes the font data for each character to the character recognition device 1 as is. If the requested font is an outline font (YES in S302), it rasterizes each character to a fixed size based on the font data (S303) and sends the resulting character image data to the character recognition device 1.
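The branch between bitmap and outline fonts (S302, S303) can be sketched in Python as follows. This is only an illustration: the `font` dictionary layout, the function names, and the toy `rasterize` function (which fills axis-aligned rectangles given in 0 to 1 coordinates) are hypothetical stand-ins, not a real font format or a real outline rasterizer.

```python
def rasterize(outline, size):
    """Toy rasterizer (assumption): fill axis-aligned rectangles given in 0..1 coordinates."""
    bitmap = [[0] * size for _ in range(size)]
    for x0, y0, x1, y1 in outline:
        for y in range(round(y0 * size), round(y1 * size)):
            for x in range(round(x0 * size), round(x1 * size)):
                bitmap[y][x] = 1
    return bitmap


def prepare_character_images(font, size=16):
    """Return {character code: bitmap} for every character of the requested font."""
    if font["kind"] == "bitmap":
        # NO in S302: the font data already is the character image data.
        return dict(font["glyphs"])
    # YES in S302: rasterize every character to one fixed size (S303).
    return {code: rasterize(outline, size) for code, outline in font["glyphs"].items()}
```

Rasterizing every outline glyph to the same fixed size gives the recognition side patterns of uniform dimensions, which is what a later pixel-wise comparison would need.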

[0030] When the device receiving the character image data transfer request is the printer 4, the fonts installed in the printer are outline fonts, so the printer 4 operates in sequence according to the transfer request reception step (S301), the rasterization step (S303), and the character image data transfer step (S304) of FIG. 5. When the character recognition device 1 itself has fonts installed as an image processing device, it treats itself as one of the connected devices and operates in sequence according to the font detection step (S203) and the rasterization step (S303) described above.

[0031] In FIG. 4, the character recognition device 1 waits until it receives the character image data from the device via the network 5 and the network interface 17 (NO in S206). On receiving the character image data (YES in S206), it stores the data in a predetermined directory on the hard disk 14. It then uses the received character image data to create a standard pattern for every character of the font (S207), and creates the identification dictionary for the font by registering the created standard patterns in a dictionary (S208). How the identification dictionary is created depends on the character recognition method. With a feature extraction method, feature amounts are computed from the received character image data of each character by a predetermined method and registered as the standard pattern of that character. Specific examples of extracted features include the slant of the character strokes, the number of loops, the stroke width, and the character area. With a pattern matching method, the received character image data of each character is registered as is as the standard pattern of that character. However, to tolerate positional shifts, skew, and slight size differences in the character patterns to be identified, character image data in which the character is blurred to some degree may be created from the received character image data and registered as the standard pattern. The created identification dictionary for the font is stored in a predetermined directory on the hard disk 14.
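The two ways of creating standard patterns described above can be sketched in Python. This is a rough illustration under stated assumptions: the 3x3 box blur, the two features computed (character area and a crude stroke width taken as the mean horizontal run length), and the function names are choices made here; the source does not specify the blur kernel or how the features are calculated.

```python
def blur(glyph):
    """3x3 box blur of a binary glyph; edge pixels average over in-bounds neighbours only."""
    h, w = len(glyph), len(glyph[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [glyph[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def features(glyph):
    """Two illustrative features: character area and mean horizontal run length."""
    area = sum(map(sum, glyph))
    runs = []
    for row in glyph:
        run = 0
        for pixel in row:
            if pixel:
                run += 1
            elif run:
                runs.append(run)
                run = 0
        if run:
            runs.append(run)
    width = sum(runs) / len(runs) if runs else 0.0
    return {"area": area, "line_width": width}


def build_dictionary(char_images, method="pattern"):
    """S207/S208: create one standard pattern per character and register it."""
    make = blur if method == "pattern" else features
    return {code: make(image) for code, image in char_images.items()}
```

The blurred pattern spreads each stroke over neighbouring pixels, so a scanned character that is shifted by a pixel still overlaps the standard pattern substantially.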

[0032] The character recognition device 1 then goes on to detect other fonts installed in the device (S203). If there is a font for which no identification dictionary has been created (YES in S204), it repeats the dictionary creation steps for the detected font (S205 to S208), creating an identification dictionary for each detected font. If the device has no installed fonts, or identification dictionaries have already been created for all of its installed fonts (NO in S204), the character recognition device 1 looks for another connected device (S201). If another connected device is detected (YES in S202), it repeats the font detection and dictionary creation steps (S203 to S208), creating an identification dictionary for each detected font. If no other connected device is detected (NO in S202), the identification dictionary creation processing ends.
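The loop over devices and fonts in FIG. 4 (S201 to S208) can be summarized by the following Python sketch. The function signature is an assumption: device discovery and network transfer are abstracted into a `devices` list and a caller-supplied `fetch_images` callback, neither of which is named in the source.

```python
def create_dictionaries(devices, existing, fetch_images, make_pattern):
    """Create an identification dictionary for every font that does not have one yet.

    devices      -- iterable of (device name, list of installed font names)
    existing     -- dict mapping font name to its dictionary; updated in place
    fetch_images -- callback (device, font) -> {character code: character image}
    make_pattern -- callback turning one character image into a standard pattern
    """
    for device, fonts in devices:                    # S201, S202: next connected device
        for font in fonts:                           # S203: next installed font
            if font in existing:                     # NO in S204: dictionary already exists
                continue
            images = fetch_images(device, font)      # S205, S206: request and receive images
            existing[font] = {code: make_pattern(image)   # S207: standard patterns
                              for code, image in images.items()}  # S208: register
    return existing
```

Because fonts that already have a dictionary are skipped, a font installed on both the personal computer and the printer is transferred only once.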

[0033] In FIG. 3, when the identification dictionary creation processing is complete, the character recognition device 1 displays an input screen for a document reading command on the display 15 and waits until the user inputs a document reading command (NO in S103). The user inputs the command to the character recognition device 1 by operating the input device 16. When the user inputs a document reading command (YES in S103), the character recognition device 1 sends a document reading command to the scanner 2 via the network interface 17 and the network 5 (S104). On receiving the command, the scanner 2 reads the document set at the predetermined reading position and sends the resulting image data to the character recognition device 1. The character recognition device 1 waits until it receives the image data from the scanner 2 (NO in S105). On receiving the image data via the network 5 and the network interface 17 (YES in S105), it stores the data in a predetermined directory on the hard disk 14.

[0034] Next, the character regions and non-character regions contained in the received image data are identified, and the image data is separated into character regions and non-character regions (S106). In this processing, the characteristics of the brightness histogram or of the frequency-decomposed spectrum of small regions of the image data are used to distinguish the character regions in the original image data from non-character regions such as photographs and graphics, and the non-character regions are cut out of the original image data. The image data of the cut-out non-character regions is stored temporarily in a predetermined directory on the hard disk 14 so that it can later be output together with the recognized character code data. At this point, the non-character image data may, if necessary, be subjected to image correction such as a smoothing filter, a change of resolution, compression, or similar processing. The resulting image data containing only the character regions may, if necessary, be corrected by filtering so that crushed or faint characters do not occur.
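One possible reading of the brightness-histogram test in S106 is sketched below in Python. The decision rule and every threshold in it are assumptions made for illustration; the source only says that histogram or spectral characteristics of small regions are used. The sketch treats a tile as text when nearly all of its pixels are either very dark or very light and at least a few are dark.

```python
def is_text_tile(tile, dark=64, light=192, min_extreme=0.9, min_dark=0.02):
    """Heuristic (assumed): text is dark ink on light paper with few mid-tones.

    tile is a 2-D list of 8-bit brightness values (0 = black, 255 = white).
    """
    pixels = [p for row in tile for p in row]
    n = len(pixels)
    n_dark = sum(p <= dark for p in pixels)
    n_light = sum(p >= light for p in pixels)
    mostly_extreme = (n_dark + n_light) / n >= min_extreme
    has_ink = n_dark / n >= min_dark
    return mostly_extreme and has_ink
```

A photograph tends to fill the middle of the histogram and so fails the first condition, while blank paper fails the second.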

[0035] Character recognition preprocessing is then performed on the image data containing only the character regions (S107). Because character recognition is performed on monochrome binary image data, this processing converts color image data expressed in RGB, Lab, or the like into monochrome binary image data in which the characters are black and the background is white. If necessary, the generated monochrome binary image data is further subjected to processing such as removal of noise such as isolated points and correction of image skew.
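The preprocessing of S107 can be sketched in Python as follows. The fixed luminance threshold of 128, the ITU-R BT.601 luminance weights, and the 8-neighbour definition of an isolated point are assumptions chosen for the example; the source does not state how binarization or noise removal is performed.

```python
def to_binary(rgb_image, threshold=128):
    """Convert RGB pixels to binary: 1 = black (character), 0 = white (background)."""
    return [[1 if 0.299 * r + 0.587 * g + 0.114 * b < threshold else 0
             for r, g, b in row]
            for row in rgb_image]


def remove_isolated_points(binary):
    """Clear black pixels that have no black pixel among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if binary[y][x]:
                around = sum(binary[j][i]
                             for j in range(max(0, y - 1), min(h, y + 2))
                             for i in range(max(0, x - 1), min(w, x + 2))) - 1
                if around == 0:
                    cleaned[y][x] = 0
    return cleaned
```

Skew correction, which the source also mentions, is omitted from this sketch.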

[0036] Next, layout analysis is performed on the monochrome binary image data obtained by the character recognition preprocessing (S108). In this step, histograms of the projected image are used to detect the line structure and column layout of the character areas in the image, and the character areas are recognized as a plurality of character blocks. Here, a character block is a character area divided to roughly paragraph size by column boundaries, blank lines, line breaks, and the like. A region in which the character size differs, for example, also forms a character block of its own.
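The projection histogram used in S108 can be sketched as follows. The function name `projection_runs` and its `axis` parameter are hypothetical; the sketch only finds runs of non-empty rows or columns and does not model the gap-size rules a real layout analyzer would need to tell a line break from a block boundary.

```python
def projection_runs(binary, axis):
    """Return (start, end) pairs of consecutive non-empty rows or columns.

    axis=0 counts black pixels per row (finds text lines);
    axis=1 counts black pixels per column (finds columns of text).
    `end` is exclusive.
    """
    if axis == 0:
        profile = [sum(row) for row in binary]
    else:
        profile = [sum(col) for col in zip(*binary)]
    runs, start = [], None
    for i, count in enumerate(profile):
        if count and start is None:
            start = i                      # a run of inked rows/columns begins
        elif not count and start is not None:
            runs.append((start, i))        # a blank row/column ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs
```

For example, a five-row image with ink on rows 1, 3, and 4 yields two runs along axis 0, which would correspond to two text lines.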

[0037] Lines and characters are then cut out in sequence from each character block recognized by the layout analysis (S109), and character recognition processing is performed on the cut-out character image data (S110). The circumscribed-rectangle data of each cut-out character image is stored in a predetermined directory on the hard disk 14 as the position data of that character.
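As an illustration of the character cut-out in S109, the sketch below splits one text line at blank columns and returns the circumscribed rectangle of each character. The function name, the tuple layout `(left, top, right, bottom)`, and the blank-column splitting rule are assumptions; the source does not describe the segmentation method, and touching characters would not be separated by this approach.

```python
def character_boxes(line_rows, line_top=0):
    """Return circumscribed rectangles (left, top, right, bottom), all inclusive.

    line_rows: binary rows (1 = black) of a single text line.
    line_top:  row index of the line within the page, added to the y values.
    """
    width = len(line_rows[0]) if line_rows else 0
    boxes, left = [], None
    for x in range(width + 1):
        # The extra column past the right edge closes a trailing character.
        inked = x < width and any(row[x] for row in line_rows)
        if inked and left is None:
            left = x
        elif not inked and left is not None:
            # Tighten the box vertically to the rows that contain ink.
            ys = [y for y, row in enumerate(line_rows) if any(row[left:x])]
            boxes.append((left, line_top + ys[0], x - 1, line_top + ys[-1]))
            left = None
    return boxes
```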

[0038] FIG. 6 is a flowchart showing the procedure of the character recognition processing of the character recognition device 1 according to the present embodiment. In this procedure, the character recognition device 1 first selects one of the character blocks recognized by the layout analysis (S401), cuts out a line from that character block, and then cuts out a character from that line (S402).

[0039] Next, a feature amount is extracted from the character pattern of the cut-out character (S403). The feature amount is extracted by the same method that was used to create the standard pattern of each character registered in the identification dictionary in the standard pattern creation step (S207) of the identification dictionary creation processing. If that step (S207) used pattern matching rather than feature extraction as the character recognition method, the character pattern of the cut-out character is itself compared with the standard patterns, so the feature extraction step (S403) is omitted.

[0040] The identification dictionary to be used for the identification processing is then selected according to the priority order (S404) and read from the hard disk 14 into the RAM 13. The priority order is determined by the identification dictionary priority determination step (S408) described later. The first time only, a default priority order or a random order is used.

[0041] Identification processing is then performed, in which the feature amount extracted from the character pattern is compared with the standard pattern of each character registered in the selected identification dictionary (S405). Specifically, the similarity between the extracted feature amount and each standard pattern in the selected identification dictionary is calculated in turn. When a standard pattern whose similarity is equal to or greater than a predetermined threshold is found (YES in S406), the character code of that standard pattern is determined to be the character code of the character pattern, and the character code data is saved in a predetermined directory on the hard disk 14 (S407). Methods of calculating the similarity used in the identification processing include the city block distance, the Euclidean distance, a linear discriminant function, the subspace method, and Bayesian classification. If the selected identification dictionary contains no standard pattern whose similarity is at or above the threshold (NO in S406), the next identification dictionary is selected in priority order and the identification procedure is repeated (S404 to S406) until such a standard pattern is found, and the character code of the character pattern is thereby determined.
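The loop over S404 to S407 can be sketched as below. The data layout (`dictionaries` as a mapping from font name to a mapping from character code to standard pattern), the function names, and the conversion of city block distance into a similarity via 1 / (1 + distance) are all assumptions for illustration; the source lists the city block distance as one option but does not define how a distance becomes a similarity.

```python
def city_block_similarity(a, b):
    """Similarity in (0, 1]; 1.0 means the two feature vectors are identical."""
    distance = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + distance)


def identify(feature, dictionaries, priority, threshold):
    """Return (char_code, font) for the first pattern at or above threshold.

    dictionaries: {font: {char_code: standard_pattern}}
    priority:     font names in the order their dictionaries are tried
    Returns None if no dictionary contains a sufficiently similar pattern.
    """
    for font in priority:                                        # S404
        for char_code, pattern in dictionaries[font].items():    # S405
            if city_block_similarity(feature, pattern) >= threshold:  # S406
                return char_code, font                           # S407
    return None
```

Because the search stops at the first match, a well-ordered priority list means most characters are resolved in the first dictionary tried, and the font of the matching dictionary is available as the font candidate.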

[0042] Next, the font of the identification dictionary used to determine the character code is taken as the font candidate for the character pattern, and the font candidate data is saved in a predetermined directory on the hard disk 14 (S408). The font candidate data accumulated on the hard disk 14 is then analyzed, and the priority order of the identification dictionaries corresponding to the respective fonts is determined and updated in order of how often each font has been a candidate (S409). In addition, the font size is calculated from the character pattern, and the font size data is saved in a predetermined directory on the hard disk 14 (S410).
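The frequency-based priority update of S409 can be sketched with a counter over the accumulated font candidates. The function name and the tie-breaking rule (fonts with equal counts keep their current relative order) are assumptions; the source only says that the order follows candidate frequency.

```python
from collections import Counter


def update_priority(current_priority, font_candidates):
    """Reorder fonts so the most frequent font candidate comes first.

    current_priority: list of font names, one per identification dictionary
    font_candidates:  font names recorded so far, one per recognized character
    """
    counts = Counter(font_candidates)  # fonts never seen count as 0
    # sorted() is stable, so fonts with equal counts keep their current order.
    return sorted(current_priority, key=lambda font: -counts[font])
```

With this ordering, the first element of the returned list is the font whose dictionary has matched most often so far.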

[0043] After the above series of character recognition steps (S402 to S411) has been repeated for the character patterns of all characters cut out from the selected character block (YES in S411), the font of the identification dictionary that is first in the priority order is determined to be the font of that character block, and the font type data is saved in a predetermined directory on the hard disk 14 (S408). The series of character recognition steps (S401 to S412) is then repeated for every character block recognized by the layout analysis, after which the character recognition processing ends (YES in S413).

[0044] In the identification procedure described above (S405 and S406), the similarity calculation ends as soon as a standard pattern is found whose similarity to the feature amount extracted from the character pattern is equal to or greater than the predetermined threshold. Alternatively, the similarities to all standard patterns in all identification dictionaries may be calculated, and in the priority update step (S409) the priority order of the identification dictionaries may be determined in descending order of the similarity of their standard patterns. When similarities to all standard patterns in all identification dictionaries are calculated in this way and two or more standard patterns are found whose similarity is at or above the threshold, the neighboring characters may be spell-checked and the character code that forms a word found in a word dictionary may be adopted. Furthermore, in the character block font determination step (S408), the font of the identification dictionary used most often is taken as the font of the character block regardless of character code; instead, the font of the character block may be determined by giving greater weight to character codes whose shapes differ greatly from font to font.
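The weighted variant of the character block font decision might look like the following sketch. The function name `block_font`, the `weights` mapping, and the default weight of 1.0 are hypothetical; the source does not say how the weights are chosen or combined.

```python
from collections import defaultdict


def block_font(recognized, weights=None):
    """Pick the font of a character block by (optionally weighted) voting.

    recognized: list of (char_code, font) pairs, one per character in the block
    weights:    {char_code: weight}; codes not listed count with weight 1.0
    Returns None for an empty block.
    """
    weights = weights or {}
    totals = defaultdict(float)
    for char_code, font in recognized:
        totals[font] += weights.get(char_code, 1.0)
    return max(totals, key=totals.get) if totals else None
```

With no weights this reduces to taking the most frequently used dictionary. Giving a larger weight to a character such as "g", whose shape tends to vary strongly between typefaces, lets a single distinctive character outvote several ambiguous ones.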

[0045] Returning to FIG. 3, when the character recognition processing is finished, the character recognition device 1 recognizes the color of each character cut out in that processing and saves the resulting character color data in a predetermined directory on the hard disk 14 (S110). Character color recognition is performed by reading the color data (RGB, Lab, or the like) of the pixels in the original image data that correspond to the character and taking the average over all of those pixels. Character pixels are distinguished from background pixels by using the monochrome binary image data created in the character recognition preprocessing: within the circumscribed rectangle of each character, black pixels are treated as character pixels and white pixels as background pixels.
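The color averaging described above can be sketched as follows, assuming RGB tuples, a binary mask with 1 for black, and a circumscribed rectangle given as inclusive `(left, top, right, bottom)` coordinates. These representations and the function name are illustrative assumptions.

```python
def character_color(original, binary, box):
    """Average the original colors of the black (character) pixels in a box.

    original: rows of (r, g, b) tuples from the original image
    binary:   rows of 1 (black, character) / 0 (white, background)
    box:      (left, top, right, bottom), all inclusive
    Returns an (r, g, b) tuple of floats, or None if the box has no black pixel.
    """
    left, top, right, bottom = box
    pixels = [
        original[y][x]
        for y in range(top, bottom + 1)
        for x in range(left, right + 1)
        if binary[y][x]  # skip background pixels inside the rectangle
    ]
    if not pixels:
        return None
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))
```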

[0046] An output file is then created from the character code data obtained in the character recognition procedure (S109) and the non-character-area image data obtained in the character/non-character area separation step (S106) (S111). Specifically, the character code data and the non-character-area image data are read from the hard disk 14 into the RAM 13 and converted into a predetermined output file format. The font type data and font size obtained in the character recognition procedure (S109) and the character color data obtained in the character color recognition step (S110) are also read from the hard disk 14 and attached to the character code data as character attribute data. The character code data is then laid out together with the non-character-area image data on the basis of the character position data obtained in the character recognition procedure (S109). The output file thus created is stored in a predetermined directory on the hard disk 14 or output to another device such as the personal computer 3 or the printer 4 via the network interface 17 and the network 5.

[0047] In the above embodiment, the identification dictionary creation processing (S102, S201 to S208) is performed, and the identification dictionaries are updated, every time character recognition processing is carried out. Alternatively, the identification dictionary creation processing may be performed separately from the character recognition processing: only the first time, at fixed intervals, or only when the connection environment has changed, for example.

[0048] Also, in the above embodiment, an identification dictionary is created for each detected font and used for character recognition. Alternatively, character recognition may by default use an ordinary identification dictionary, that is, one created by averaging the feature amounts of several fonts, with the per-font identification dictionaries described above being used only when the recognition rate is low, that is, when the average similarity in the identification processing is equal to or less than a predetermined value. Then, if for example a special font that the default identification dictionary cannot recognize is in use in the environment of the character recognition system, the identification dictionaries of the detected fonts are used only for input data in that font and the default identification dictionary is used for all other input data, which improves the overall recognition rate.
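The fallback rule can be sketched as below. The two recognizers are passed in as callables that return a `(char_code, similarity)` pair; that interface, the function name, and the choice to apply the decision to a whole batch of characters are assumptions, since the source does not state the granularity at which the average similarity is evaluated.

```python
def recognize_with_fallback(features, recognize_default, recognize_per_font,
                            min_mean_similarity):
    """Use the default dictionary unless its mean similarity is too low.

    features:            feature amounts of the characters to recognize
    recognize_default:   callable(feature) -> (char_code, similarity)
    recognize_per_font:  callable(feature) -> (char_code, similarity)
    min_mean_similarity: switch to per-font dictionaries at or below this value
    """
    results = [recognize_default(f) for f in features]
    if results:
        mean = sum(similarity for _, similarity in results) / len(results)
        if mean <= min_mean_similarity:
            # Recognition with the averaged dictionary looks unreliable,
            # so redo this input with the per-font dictionaries.
            results = [recognize_per_font(f) for f in features]
    return [char_code for char_code, _ in results]
```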

[0049] Furthermore, in the above embodiment a separate identification dictionary is created for each detected font. Alternatively, the standard patterns obtained from the character image data of all characters of all detected fonts may be registered in a single identification dictionary.

[0050] In the above embodiment, the character recognition device 1, the scanner 2, the computer 3, and the printer 4 are connected to one another via the network 5. Alternatively, the character recognition device 1 and the scanner 2, or the character recognition device 1 and the printer 4, may be connected locally using a serial interface such as RS-232C, USB, or IEEE 1394, a parallel interface such as SCSI or IEEE 1284, or a wireless communication interface such as Bluetooth, IEEE 802.11, HomeRF, or IrDA.

[0051] In addition to the configuration described in the above embodiment, the character recognition device 1 according to the present invention may further include document reading means or printing means, and may thus be implemented as a multifunction peripheral (MFP) such as a scanner, digital copier, or facsimile machine having a character recognition function.

[0052] The character recognition device and character recognition method according to the present invention can be realized either by a dedicated hardware circuit that executes the above steps or by a CPU executing a predetermined program that describes the above steps. In the latter case, the program that operates the character recognition device can be provided on a computer-readable recording medium such as a floppy (registered trademark) disk or a CD-ROM. The program recorded on the computer-readable recording medium is normally transferred to and stored in a ROM, a hard disk, or the like. The program may be provided, for example, as stand-alone application software, or may be built into the software of the character recognition device as one of its functions.

[0053]

[Effects of the Invention] As described above, according to the present invention, the fonts used in the environment in which the character recognition device operates are detected automatically, and an identification dictionary is created for each font and used for character recognition. Compared with using an identification dictionary in which the feature amounts of several fonts are averaged, the font can therefore be recognized at the same time as the character code, and a higher recognition rate can be achieved.

[0054] Moreover, because identification dictionaries are created only for the fonts present in the environment of use, highly accurate character recognition is possible without preparing identification dictionaries for a large number of fonts in advance.

[0055] Furthermore, even in an environment where a special font is used, an identification dictionary for that font is created automatically, so the device adapts easily to its environment and performs highly accurate character recognition.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the overall configuration of a character recognition system including a character recognition device according to an embodiment of the present invention.

FIG. 2 is a block diagram showing an example of the configuration of the character recognition device 1 according to the embodiment.

FIG. 3 is a flowchart showing the overall procedure of the character recognition processing of the character recognition device 1 according to the embodiment.

FIG. 4 is a flowchart showing the procedure of the identification dictionary creation processing of the character recognition device 1 according to the embodiment.

FIG. 5 is a flowchart showing the procedure of the character image data transfer processing of the personal computer 3 in the embodiment.

FIG. 6 is a flowchart showing the procedure of the character recognition processing of the character recognition device 1 according to the embodiment.

[Explanation of symbols]

1: character recognition device; 11: CPU; 12: ROM; 13: RAM; 14: hard disk; 15: display; 16: input device; 17: network interface; 18: bus; 2: scanner; 3: personal computer; 4: printer; 5: network.

Claims

[Claims]

1. A character recognition device comprising: font detecting means for detecting a font incorporated in a connected device; character image data obtaining means for obtaining, from the connected device, character image data for each character of the font detected by the font detecting means; standard pattern creating means for creating a standard pattern of each character of the detected font using the character image data obtained by the character image data obtaining means; identification dictionary creating means for creating an identification dictionary by registering the standard patterns created by the standard pattern creating means; and character identification means for determining the character code and font of a character pattern using the identification dictionary created by the identification dictionary creating means.

2. A character recognition device comprising: font detecting means for detecting a font incorporated in a connected device; character image data obtaining means for obtaining, from the connected device, character image data for each character of the font detected by the font detecting means; standard pattern creating means for creating a standard pattern of each character of the detected font using the character image data obtained by the character image data obtaining means; identification dictionary creating means for creating an identification dictionary for each detected font by registering the standard patterns created by the standard pattern creating means; and character identification means for determining the character code and font of a character pattern using the per-font identification dictionaries created by the identification dictionary creating means.

3. The character recognition device according to claim 1, wherein the character code of a standard pattern whose similarity is equal to or greater than a predetermined threshold is determined as the character code of the character pattern.

4. The character recognition device according to claim 2 or 3, wherein the character identification means determines the priority order of the identification dictionaries in order of identification frequency.

5. The character recognition device according to claim 2, wherein the character identification means determines the font of each character block by taking the font of the identification dictionary with the highest identification frequency as the font of that character block.

6. A character recognition method comprising the steps of: detecting a font incorporated in a connected device; obtaining, from the connected device, character image data for each character of the detected font; creating a standard pattern of each character of the detected font using the character image data; creating an identification dictionary by registering the standard patterns; and determining the character code and font of a character pattern using the identification dictionary.

7. A character recognition method comprising the steps of: detecting a font incorporated in a connected device; obtaining, from the connected device, character image data for each character of the detected font; creating a standard pattern of each character of the detected font using the character image data; creating an identification dictionary for each detected font by registering the standard patterns; and determining the character code and font of a character pattern using the per-font identification dictionaries.

8. A program for causing a computer to execute: a procedure for detecting a font incorporated in a connected device; a procedure for obtaining, from the connected device, character image data for each character of the detected font; a procedure for creating a standard pattern of each character of the detected font using the character image data; a procedure for creating an identification dictionary by registering the standard patterns; and a procedure for determining the character code and font of a character pattern using the identification dictionary.

9. A program for causing a computer to execute: a procedure for detecting a font incorporated in a connected device; a procedure for obtaining, from the connected device, character image data for each character of the detected font; a procedure for creating a standard pattern of each character of the detected font using the character image data; a procedure for creating an identification dictionary for each detected font by registering the standard patterns; and a procedure for determining the character code and font of a character pattern using the per-font identification dictionaries.

10. A computer-readable recording medium on which is recorded a program for causing a computer to execute: a procedure for detecting a font incorporated in a connected device; a procedure for obtaining, from the connected device, character image data for each character of the detected font; a procedure for creating a standard pattern of each character of the detected font using the character image data; a procedure for creating an identification dictionary by registering the standard patterns; and a procedure for determining the character code and font of a character pattern using the identification dictionary.

11. A computer-readable recording medium on which is recorded a program for causing a computer to execute: a procedure for detecting a font incorporated in a connected device; a procedure for obtaining, from the connected device, character image data for each character of the detected font; a procedure for creating a standard pattern of each character of the detected font using the character image data; a procedure for creating an identification dictionary for each detected font by registering the standard patterns; and a procedure for determining the character code and font of a character pattern using the per-font identification dictionaries.