JP3055484B2

JP3055484B2 - Character recognition apparatus and method

Info

Publication number: JP3055484B2
Application number: JP9029029A
Authority: JP
Inventors: 泰彦長谷川; 俊之荒谷
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1997-02-13
Filing date: 1997-02-13
Publication date: 2000-06-26
Anticipated expiration: 2017-02-13
Also published as: JPH10228521A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a character recognition apparatus and method, and more particularly to a character recognition apparatus and method for recognizing characters of a predetermined height.

[0002]

[Description of the Related Art] Character recognition apparatuses have conventionally been used to recognize characters marked on paper by printing or similar means. The prior art for such an apparatus is described below with reference to FIG. 7, which is a schematic diagram showing the logical structure of a conventional character recognition apparatus.

[0003] The OCR system 1 exemplified here as a character recognition apparatus is realized, for example, as a structure (not shown) in which an image scanner is connected to a computer system and an OCR application program is installed on that computer system.

[0004] In the OCR system 1 of this structure, the computer system executes appropriate processing operations in accordance with the program, so that the system logically comprises an image extraction unit 2, a multi-valued image memory 3, a character cutout unit 4, a character normalization unit 5, a standard pattern memory 6, a character recognition unit 7, and so on.

[0005] The image extraction unit 2 corresponds to the part in which the computer system controls the operation of the image scanner. It optically reads the characters printed on a form, performs photoelectric conversion, generates digitized multi-level image data, and stores that data in the multi-valued image memory 3. The multi-valued image memory 3 corresponds, for example, to a predetermined storage area of a RAM and temporarily stores the multi-valued image data.

[0006] The character cutout unit 4 binarizes the multi-valued image data, cuts it out character by character, and detects the height and width of each character. The character normalization unit 5 removes minute irregularities from the image data and then, referring to the character height and character width, corrects the position of the image data vertically and horizontally with respect to a predetermined image area. Standard patterns of the characters to be recognized are set in the standard pattern memory 6 in advance. The character recognition unit 7 sequentially matches the many standard patterns against the recognition pattern, that is, the image data whose position has been corrected as described above, and outputs the character of the standard pattern with the highest matching accuracy as the recognition result.

[0007] The character recognition apparatus 1 structured as described above can recognize characters printed on a form. In that case, the image extraction unit 2 optically reads the characters printed on the form, performs photoelectric conversion, and stores the digitized multi-level image data in the multi-valued image memory 3. The character cutout unit 4 binarizes the multi-valued image data by setting black pixels, which belong to the character itself, to 1 and background portions without black pixels to 0. It then cuts the binarized image data out character by character and detects the character height and character width of the image data.
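
The binarization and height/width detection described above can be illustrated with a minimal sketch. It is not the patented implementation: the fixed threshold of 128, the list-of-lists image layout, and the function names `binarize`, `bounding_box`, and `height_and_width` are all assumptions made for illustration.

```python
def binarize(gray, threshold=128):
    """Map dark pixels (value < threshold) to 1 and background to 0.

    The fixed threshold is an assumption; the source does not say how
    the binarization level is chosen.
    """
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def bounding_box(binary):
    """Return (top, bottom, left, right) of the black pixels, or None if blank."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    if not rows:
        return None
    cols = [x for x in range(len(binary[0])) if any(row[x] for row in binary)]
    return rows[0], rows[-1], cols[0], cols[-1]


def height_and_width(binary):
    """Character height and width in pixels, counted inclusively.

    Assumes the image contains at least one black pixel.
    """
    top, bottom, left, right = bounding_box(binary)
    return bottom - top + 1, right - left + 1


# A 5x4 multi-level image holding one small dark glyph.
gray = [
    [255, 255, 255, 255],
    [255,  10,  20, 255],
    [255,  15, 255, 255],
    [255,  12,  30, 255],
    [255, 255, 255, 255],
]
binary = binarize(gray)
print(height_and_width(binary))  # (3, 2)
```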

[0008] For the image data cut out character by character in this way, the character normalization unit 5 removes minute irregularities, refers to the character height and character width detected by the character cutout unit 4, and generates a recognition pattern by aligning the center position of the black pixels with the center position of the image area of the standard pattern. Next, the character recognition unit 7 sequentially matches the recognition pattern output from the character normalization unit 5 against the many standard patterns set in the standard pattern memory 6 and outputs the character of the standard pattern with the highest matching accuracy as the recognition result.
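
The conventional center alignment and sequential matching can be sketched as follows. The similarity measure (fraction of agreeing pixels), the 5x5 image area, and the names `center_in_area`, `match_score`, and `recognize` are assumptions; the source does not specify how matching accuracy is computed.

```python
def center_in_area(glyph, area_h, area_w):
    """Paste the cut-out glyph so that its center coincides with the area center."""
    h, w = len(glyph), len(glyph[0])
    top, left = (area_h - h) // 2, (area_w - w) // 2
    area = [[0] * area_w for _ in range(area_h)]
    for y, row in enumerate(glyph):
        for x, v in enumerate(row):
            area[top + y][left + x] = v
    return area


def match_score(pattern, standard):
    """Fraction of pixels that agree (an assumed similarity measure)."""
    total = len(pattern) * len(pattern[0])
    same = sum(p == s for pr, sr in zip(pattern, standard) for p, s in zip(pr, sr))
    return same / total


def recognize(pattern, standards):
    """Match against every standard pattern and return the best (character, score)."""
    return max(((c, match_score(pattern, s)) for c, s in standards.items()),
               key=lambda cs: cs[1])


# Two hypothetical standard patterns in a 5x5 image area.
standards = {
    "I": [[0, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 1, 0, 0],
          [0, 0, 1, 0, 0], [0, 0, 0, 0, 0]],
    "-": [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
}
glyph = [[1], [1], [1]]  # a cut-out vertical bar, 3 rows by 1 column
pattern = center_in_area(glyph, 5, 5)
print(recognize(pattern, standards))  # ('I', 1.0)
```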

[0009] In the OCR system 1 described above, before the recognition pattern detected from the form is matched against the standard patterns, the center position of the image data is made to coincide with the center position of the image area of the standard pattern. This prevents the matching accuracy from falling because of positional misalignment between the standard pattern and the recognition pattern during pattern matching, so characters printed on a form can be recognized well.

[0010]

[Problems to be Solved by the Invention] In the conventional character recognition apparatus described above, the position of the recognition pattern to be matched against the standard patterns is corrected in order to improve the accuracy of character recognition.

[0011] However, as shown in FIG. 8(a), for example, characters printed on a form may lack their upper or lower part because of missing printer dots, a misaligned rubber-stamp impression, or the like. If the center position of image data whose upper or lower part is missing is made to coincide with the center of the standard pattern area, the resulting shape instead differs from the standard pattern, as shown in FIGS. 8(b) and 8(c), and recognition accuracy falls.
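
A small worked calculation shows why centering a truncated character misplaces it. The image area height, nominal character height, and number of lost rows below are assumed values chosen only for illustration.

```python
def centered_top(area_h, glyph_h):
    """Row at which a glyph's top edge lands after vertical centering."""
    return (area_h - glyph_h) // 2


AREA_H = 16   # height of the standard-pattern image area (assumed)
FULL_H = 12   # nominal character height (assumed)
MISSING = 4   # rows lost from the top of the printed character (assumed)

# Where the surviving rows belong: the intact character's top plus the lost rows.
true_top = centered_top(AREA_H, FULL_H) + MISSING
# Where conventional centering of the shortened glyph puts them.
naive_top = centered_top(AREA_H, FULL_H - MISSING)

print(true_top, naive_top, true_top - naive_top)  # 6 4 2
```

With these numbers the surviving strokes end up two rows too high, half the height of the missing portion, so they no longer overlap the corresponding strokes of the standard pattern.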

[0012] The present invention has been made in view of the problems described above, and its object is to provide a character recognition apparatus and method that can satisfactorily recognize even characters whose upper or lower part is missing.

[0013]

[Means for Solving the Problems] First, the character recognition apparatus of the present invention is a character recognition apparatus in which standard patterns of the characters to be recognized are set in advance; characters of a predetermined height marked on a form are optically read; the read image data is cut out character by character and its center position is aligned with the center position of the image area of the standard pattern; the image data with its center position aligned is sequentially matched, as a recognition pattern, against many standard patterns; and the character of the standard pattern with the highest matching accuracy is output as the recognition result. The apparatus is provided with: missing-portion determination means for determining whether the image data cut out character by character has a missing portion in the vertical direction; direction detection means for detecting the direction of a missing portion whose presence has been detected; height detection means for detecting the height of a missing portion whose presence has been detected; position correction means for correcting the position of the image data relative to the image area in the vertical direction in accordance with the detected direction and height of the missing portion, and for displacing image data whose missing direction cannot be determined both upward and downward; and result selection means for comparing the two matching results and selecting one of them.

[0014] The character recognition method of the present invention is a character recognition method in which standard patterns of the characters to be recognized are set in advance; characters of a predetermined height marked on a form are optically read; the read image data is cut out character by character and its center position is aligned with the center position of the image area of the standard pattern; the image data with its center position aligned is sequentially matched, as a recognition pattern, against many standard patterns; and the character of the standard pattern with the highest matching accuracy is output as the recognition result. In this method, whether the image data cut out character by character has a missing portion in the vertical direction is determined; the direction of a missing portion whose presence has been detected is detected; the height of that missing portion is detected; the position of the image data relative to the image area is corrected in the vertical direction in accordance with the detected direction and height; image data whose missing direction cannot be determined is displaced both upward and downward; and the two matching results are compared and one of them is selected.

[0015] Accordingly, when characters of a predetermined height marked on a form are to be recognized, the image data optically read from the form is cut out character by character. Whether the image data cut out in this way has a missing portion in the vertical direction is then determined. If no missing portion is detected, the center position of the cut-out image data is aligned with the center position of the standard pattern, and the aligned image data is sequentially matched, as a recognition pattern, against many standard patterns.

[0016] If a missing portion is detected, however, its direction and height are detected, and the position of the image data relative to the image area is corrected in the vertical direction in accordance with the detected direction and height. Because the center alignment is corrected in this way, image data whose upper or lower part is missing is placed at its original position, so a recognition pattern with a missing upper or lower part is matched against the standard patterns at the proper position. Furthermore, when it cannot be determined whether the missing portion lies at the top or at the bottom of the character, the center position of the image data is displaced both upward and downward, each displaced version is matched against the standard patterns, and the more accurate of the two results is selected.

[0017] The term "character" as used in the present invention covers any character of constant height, including numerals, letters of the alphabet, and specific symbols. The term "form" means a medium on which characters are marked, for example a sheet of paper on which characters are printed. Each means of the character recognition apparatus need only provide its function logically and may be realized, for example, as dedicated hardware, a general-purpose computer operating in accordance with an appropriate program, or a combination of these.

[0018] In another aspect of the character recognition apparatus described above, character height storage means in which a specified value of the character height is set in advance is provided, and the missing-portion determination means determines whether a missing portion exists by comparing the height of each character's image data with the specified value. Because a character whose upper or lower part is missing falls short of the expected height, comparing the character height with the specified value reveals whether a missing portion exists.
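
The comparison against the specified height can be sketched as below. The one-pixel tolerance, the nominal value of 24, and the function names are assumptions; the source states only that the height is compared with the specified value.

```python
def has_missing_portion(char_height, nominal_height, tolerance=1):
    """True when the character is shorter than nominal by more than `tolerance`.

    The tolerance is an assumption, added so that one pixel of scanning
    noise is not reported as a missing portion.
    """
    return nominal_height - char_height > tolerance


def missing_height(char_height, nominal_height):
    """Number of rows by which the character falls short of the nominal height."""
    return max(0, nominal_height - char_height)


NOMINAL = 24  # assumed value held by the character height storage means
for h in (24, 23, 18):
    print(h, has_missing_portion(h, NOMINAL), missing_height(h, NOMINAL))
```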

[0019] In another aspect of the character recognition apparatus described above, data correction means is provided for correcting the specified height value on the basis of the height of image data determined to have no missing portion. The height of the image data may be higher or lower overall because of, for example, a malfunctioning printer or scanner. Even in such a case, the specified value is updated on the basis of the height of image data determined to have no missing portion, so a missing portion in some of the image data is detected in relative terms.
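
One way such a correction of the specified value might look is sketched below. The update rule (rounded mean of the heights judged intact), the tolerance, and the name `update_nominal` are assumptions; the source says only that the value is corrected on the basis of image data determined to have no missing portion.

```python
def update_nominal(nominal, heights, tolerance=1):
    """Re-estimate the nominal height from the characters judged intact.

    A character counts as intact when it is not shorter than the current
    nominal value by more than `tolerance`. If none qualify, the nominal
    value is returned unchanged. The rounded mean is an assumed rule.
    """
    intact = [h for h in heights if nominal - h <= tolerance]
    if not intact:
        return nominal
    return round(sum(intact) / len(intact))


# Characters scanned slightly tall overall, with one truncated character (18).
heights = [25, 25, 26, 18, 25]
new_nominal = update_nominal(24, heights)
print(new_nominal)  # 25
```

After the update, the truncated character is measured against 25 rather than 24, so its shortfall is judged relative to the characters actually scanned.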

[0020]

[0021]

[Embodiments of the Invention] One embodiment of the present invention is described below with reference to FIGS. 1 to 6. Parts of this embodiment that are identical to those of the conventional example described above are given the same names and reference numerals, and their detailed description is omitted. FIG. 1 is a schematic diagram showing the logical structure of the character recognition apparatus of this embodiment, FIG. 2 is a block diagram showing its physical structure, FIGS. 3 and 4 are schematic diagrams showing recognition patterns, and FIGS. 5 and 6 are flowcharts showing the character recognition method of this embodiment.

[0022] As shown in FIG. 2, the OCR system 11 exemplified as the character recognition apparatus in this embodiment comprises a computer system 12, which is a data processing apparatus, and an image scanner 13, which is an image reading apparatus.

[0023] The computer system 12 has a CPU 101 as the main body of the computer. Connected to the CPU 101 through a bus line 102 are a ROM 103, a RAM 104, an HDD 105, an FDD 107 into which an FD 106 is loaded, a CD drive 109 into which a CD-ROM 108 is loaded, a keyboard 110, a mouse 111, a display 112, a communication I/F 113, and so on. A connector 114 is connected to the communication I/F 113, and the image scanner 13 is connected to the connector 114.

[0024] In the OCR system 11 of this embodiment, the ROM 103, the RAM 104, the HDD 105, the FD 106, the CD-ROM 108, and the like correspond to information storage media, and the programs and data required for the various operations are stored on them as software. For example, the control program that causes the CPU 101 to execute the various processing operations is written in advance on the FD 106 or the CD-ROM 108. Such software is installed in advance on the HDD 105, is copied to the RAM 104 when the computer system 12 starts up, and is read by the CPU 101.

[0025] When the CPU 101 reads the appropriate program and executes the various processing operations in this way, the various functions are realized as the various means. As shown in FIG. 1, the OCR system 11 of this embodiment logically comprises, as such means, an image extraction unit 2, a multi-valued image memory 3, a character cutout unit 4, a field coordinate memory 14, a standard character height memory 15 serving as the character height storage means, a missing-portion detection unit 16, first and second reference position correction units 17 and 18, a character normalization unit 5, a standard pattern memory 6, a character recognition unit 7, a result selection unit 19, and so on.

[0026] In the image extraction unit 2, the CPU 101 controls the operation of the image scanner 13 in accordance with the control program set in the ROM 103, thereby optically capturing the characters to be recognized that are printed on the form, performing photoelectric conversion, and outputting digitized multi-level image data. The multi-valued image memory 3 corresponds to a storage area of a rewritable data storage medium such as the RAM 104 or the HDD 105 and temporarily stores the multi-valued image data that the image extraction unit 2 has read from the form.

[0027] In the character cutout unit 4, the CPU 101 executes predetermined processing operations in accordance with the control program set in the ROM 103, thereby binarizing the multi-valued image data stored in the multi-valued image memory 3, cutting it out character by character, detecting the height and width of each character's image data, and calculating the character coordinates and character heights for one field (for example, one line). The field coordinate memory 14 likewise corresponds to a storage area of a rewritable data storage medium such as the RAM 104 or the HDD 105 and temporarily stores various data detected by the character cutout unit 4, such as the height of the image data.

[0028] The standard character height memory 15 likewise corresponds to a storage area of a rewritable data storage medium such as the RAM 104 or the HDD 105. Here, as the character height storage means, it holds the specified value of the height of the standard pattern, stored in advance in an updatable manner. In the missing-portion detection unit 16, the CPU 101 executes predetermined processing operations in accordance with the control program set in the ROM 103, whereby the unit functions as missing-portion determination means for determining whether the image data cut out character by character has a missing portion in the vertical direction, as direction detection means for detecting the direction of a missing portion whose presence has been detected, and as height detection means for detecting the height of a missing portion whose presence has been detected.

[0029] That is, the missing-portion detection unit 16 compares the character height stored in the field coordinate memory 14 with the specified character height set in advance in the standard character height memory 15 to detect whether a character has a missing portion. If there is no missing portion, it outputs the various data directly to the character normalization unit 5, as in the conventional apparatus. If there is a missing portion, it detects the direction and height of the missing portion from the character coordinates stored in the field coordinate memory 14 and outputs the various data to the first or second reference position correction unit 17, 18 according to the direction of the missing portion.

[0030] More specifically, the missing-portion detection unit 16 determines the presence or absence of missing portions field by field, and detects the direction and height of a missing portion only when characters with missing portions are found in part of the field. In that case, when a missing upper part is detected, the various data is output to the first reference position correction unit 17, and when a missing lower part is detected, the various data is output to the second reference position correction unit 18. When all the characters in the field have missing portions, however, the direction is not detected; only the height is detected, and the various data is output to both the first and second reference position correction units 17 and 18.
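
The field-by-field branching can be sketched as follows. The way the direction is inferred here (comparing a character's top and bottom coordinates with the field's extent) is an assumption, as are the tolerance and the names `route_field` and `missing_direction`; this part of the source says only that the direction is detected from the stored character coordinates.

```python
def route_field(heights, nominal, tolerance=1):
    """Classify one field as 'none', 'partial', or 'all' characters truncated."""
    flags = [nominal - h > tolerance for h in heights]
    if not any(flags):
        return "none", flags      # pass straight to normalization
    if all(flags):
        return "all", flags       # direction unknown: use both correction units
    return "partial", flags       # direction can be detected per character


def missing_direction(char_top, char_bottom, field_top, field_bottom):
    """Guess which edge is missing from the gap to the field's edges.

    Coordinates grow downward. Comparing gaps is an assumed heuristic.
    """
    top_gap = char_top - field_top
    bottom_gap = field_bottom - char_bottom
    if top_gap > bottom_gap:
        return "upper"            # would go to the first correction unit 17
    if bottom_gap > top_gap:
        return "lower"            # would go to the second correction unit 18
    return "unknown"


# (top, bottom) of three characters in one field; the third lacks its top.
chars = [(10, 33), (10, 33), (16, 33)]
heights = [b - t + 1 for t, b in chars]
kind, flags = route_field(heights, nominal=24)
field_top = min(t for t, _ in chars)
field_bottom = max(b for _, b in chars)
print(kind, missing_direction(*chars[2], field_top, field_bottom))  # partial upper
```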

[0031] In the first and second reference position correction units 17 and 18, the CPU 101 likewise executes predetermined processing operations in accordance with the control program set in the ROM 103, whereby the units function as position correction means for correcting the position of the image data relative to the image area in the vertical direction in accordance with the detected direction and height of the missing portion. That is, for the image data input as described above, the first and second reference position correction units 17 and 18 displace its reference position vertically in accordance with the height of the missing portion. As shown in FIG. 3, this reference position is the position of the upper edge of the field, so when the coordinates are changed so that this position moves up or down, the center position of the image data also moves vertically.
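
The effect of displacing the reference position can be sketched as a computation of where the surviving rows should be placed in the image area. This is one possible reading, not the implementation in the source: the idea of restoring the intact vertical extent before centering, the 32-row image area, and the names `corrected_extent` and `placement_top` are assumptions.

```python
def corrected_extent(top, bottom, missing, direction):
    """Vertical extent the intact character would occupy (y grows downward)."""
    if direction == "upper":
        return top - missing, bottom
    if direction == "lower":
        return top, bottom + missing
    raise ValueError("direction must be 'upper' or 'lower'")


def placement_top(top, bottom, missing, direction, area_h):
    """Row in the image area at which the surviving rows should start."""
    full_top, full_bottom = corrected_extent(top, bottom, missing, direction)
    full_h = full_bottom - full_top + 1
    area_offset = (area_h - full_h) // 2     # center the restored extent
    return area_offset + (top - full_top)    # surviving rows keep their offset


# A character spanning rows 16..33 that lacks 6 rows at the top,
# placed in an assumed 32-row image area.
print(placement_top(16, 33, 6, "upper", 32))  # 10
# The same shortfall at the bottom of a character spanning rows 10..27.
print(placement_top(10, 27, 6, "lower", 32))  # 4
```

Plain centering of the 18-row glyph would start it at row 7 in both cases, which is the misplacement the correction is meant to avoid.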

[0032] In the character normalization unit 5, the CPU 101 likewise executes predetermined processing operations in accordance with the control program set in the ROM 103, whereby the unit removes minute irregularities from the image data and then, referring to the character height and character width stored in the field coordinate memory 14, generates a recognition pattern by moving the image data so that its center position is aligned with the center position of the image area of the standard pattern.

[0033] As described above, however, the reference position of image data whose upper or lower part is missing has been displaced vertically in accordance with the direction and height of the missing portion. The center position that is aligned with the center of the image area of the standard pattern is therefore also displaced in accordance with the missing portion, and as a result a recognition pattern whose upper or lower part is missing is also generated at its original position.

[0034] In the character recognition unit 7, the CPU 101 likewise executes predetermined processing operations in accordance with the control program set in the ROM 103, whereby the unit sequentially matches the various standard patterns in the standard pattern memory 6 against the recognition pattern normalized by the character normalization unit 5 and outputs the character of the standard pattern with the highest matching accuracy as the recognition result.

[0035] As described above, however, a recognition pattern whose missing direction has not been determined is matched against the standard patterns in both an upward-displaced state and a downward-displaced state. In this case the character recognition unit 7 acts as the result selection means, comparing the two matching results and selecting the one with the higher matching accuracy.
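
The selection between the two matching results can be sketched as below. The pixel-agreement score, the 5x3 image area, the two hypothetical standard patterns, and all function names are assumptions made for illustration.

```python
def place(glyph, area_h, area_w, top):
    """Paste glyph rows into a blank area starting at row `top`, centered horizontally."""
    left = (area_w - len(glyph[0])) // 2
    area = [[0] * area_w for _ in range(area_h)]
    for y, row in enumerate(glyph):
        area[top + y][left:left + len(row)] = row
    return area


def best_match(pattern, standards):
    """(character, score) of the closest standard pattern by pixel agreement."""
    def score(std):
        return sum(p == s for pr, sr in zip(pattern, std) for p, s in zip(pr, sr))
    return max(((c, score(s)) for c, s in standards.items()), key=lambda cs: cs[1])


def recognize_unknown_direction(glyph, missing, standards, area_h, area_w):
    """Try both displacements and keep the better of the two matching results."""
    full_h = len(glyph) + missing
    base = (area_h - full_h) // 2
    as_upper_missing = place(glyph, area_h, area_w, base + missing)  # top rows lost
    as_lower_missing = place(glyph, area_h, area_w, base)            # bottom rows lost
    return max(best_match(as_upper_missing, standards),
               best_match(as_lower_missing, standards),
               key=lambda cs: cs[1])


# Two hypothetical standard patterns in a 5x3 image area.
standards = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
# An "L" whose top two rows were not printed.
glyph = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
print(recognize_unknown_direction(glyph, 2, standards, 5, 3))  # ('L', 13)
```

In this example the upward-missing hypothesis agrees with "L" on 13 of 15 pixels, while the downward-missing hypothesis reaches only 9, so the former result is kept.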

[0036] The various means described above are realized using hardware such as the image scanner 13 and the RAM 104 as necessary, but they are realized mainly by the CPU 101 operating in accordance with the software written in the RAM 104 and the like.

[0037] Such software is written as a control program for causing the CPU 101 and the like to execute processing operations such as the following: controlling the operation of the image scanner 13 to read multi-level image data from a form and store it in a predetermined area of the RAM 104 or the like; binarizing this multi-valued image data and cutting it out character by character; extracting the height and width of each piece of image data and calculating the character coordinates and character heights for one field; storing these various data in a predetermined area of the RAM 104 or the like; comparing the character heights detected in this way with the preset specified value to detect the presence or absence of missing portions field by field; when image data with a missing portion is detected in part of a field, detecting the direction and height of the missing portion of that image data and displacing its reference position either upward or downward; when missing portions are detected throughout a field, detecting only the height of the missing portion of that field and displacing the reference position of all the image data both upward and downward; removing minute irregularities from the image data; generating a recognition pattern by aligning the center position of the image data with the center position of the image area of the standard pattern with reference to the character height and character width; sequentially matching this recognition pattern against the various standard patterns; and outputting the character of the standard pattern with the highest matching accuracy as the recognition result.

[0038] Various data, such as the standard patterns of the characters that serve as recognition results and the specified values of their heights, can also be attached to the software of such a control program, for example in the form of separately updatable data files.

[0039] The character recognition method performed by the OCR system 11 of this embodiment in the configuration described above is explained below. First, the user sets the form whose characters are to be read on the image scanner 13 and manually operates the keyboard 110 of the computer system 12 to start the character recognition program.

[0040] The CPU 101 of the computer system 12 then controls the operation of the image scanner 13 so that, as shown in FIG. 5, the image extraction unit 2 optically reads the characters printed on the form, performs photoelectric conversion, generates digitized multi-level image data, and stores it in the multi-valued image memory 3 (step S1).

[0041] Next, the character cutout unit 4 binarizes the multi-valued image data by setting the black pixels of the characters to 1 and the background portions without black pixels to 0 (step S2), and cuts out the image data character by character (step S3). It then detects the character height and character width (step S4) and stores these various data in the field coordinate memory 14 (step S5). At this time, as shown in FIG. 3, the coordinates of the uppermost and lowermost ends of every character in the field are detected with the image area of the form as the reference, and the character height of each character is calculated as the difference between them.

【００４２】つぎに、欠落検出部１６は、標準文字高メ
モリ１５から標準パターンの文字高の規定値を読み出す
(ステップＳ６)。この標準文字高メモリ１５には、帳票
から読み取る活字やゴム印の文字に対応して標準パター
ンの文字高が事前に設定されており、例えば、このよう
な文字高は認識する必要がある数字やアルファベットな
どの文字群毎に設定されている。Next, the missing detection unit 16 reads the prescribed value of the character height of the standard pattern from the standard character height memory 15 (step S6). In the standard character height memory 15, the character height of the standard pattern is set in advance to correspond to the printed type or rubber-stamp characters read from the form; for example, such a character height is set for each group of characters that needs to be recognized, such as numerals or alphabetic letters.

【００４３】この欠落検出部１６では、イメージデータ
の欠落の有無が検出され、図６に示すように、それに対
応して処理が三通りに分かれることになる。まず、最初
にフィールド座標メモリ１４に格納されている１フィー
ルドの全部の文字単位のイメージデータに対し、その上
下方向での欠落の有無を各々判定し(ステップＳ７)、フ
ィールド内の全体で欠落が一様に検出された場合は(ス
テップＳ８)、欠落の方向は検出することなく欠落の高
さのみ算出する(ステップＳ９)。その場合、イメージデ
ータの高さ等の各種データは第一第二の基準位置補正部
１７，１８の両方に出力され、これらによりイメージデ
ータの基準位置は欠落の高さに対応して上下両方に補正
される(ステップＳ１０)。このように欠落の方向が判定
されない場合に、イメージデータから二種類の認識パタ
ーンを作成することを、ここでは第一処理と呼称する。The missing detection unit 16 detects whether any image data has a missing portion and, as shown in FIG. 6, the processing branches three ways accordingly. First, for the character-unit image data of all characters of one field stored in the field coordinate memory 14, the presence or absence of a missing portion in the vertical direction is determined for each character (step S7). If a missing portion is detected uniformly over the entire field (step S8), only the height of the missing portion is calculated, without detecting its direction (step S9). In this case, various data such as the height of the image data are output to both the first and second reference position correction units 17 and 18, which correct the reference position of the image data both upward and downward according to the height of the missing portion (step S10). Creating two types of recognition patterns from the image data in this way, when the direction of the missing portion is not determined, is referred to here as the first process.

【００４４】また、１フィールドの一部の文字単位のイ
メージデータから欠落が検出された場合は(ステップＳ
１１)、欠落が検出されたイメージデータに対して欠落
の高さの算出と方向の検出を実行し(ステップＳ１２，
Ｓ１３)、この方向に対応して基準位置を上下一方に補
正する(ステップＳ１４〜Ｓ１６)。このように欠落の方
向に対応して認識パターンを作成することを、ここでは
第二処理と呼称する。If a missing portion is detected in the character-unit image data of only some characters of one field (step S11), the height of the missing portion is calculated and its direction is detected for the image data in which it was found (steps S12 and S13), and the reference position is corrected either upward or downward according to this direction (steps S14 to S16). Creating a recognition pattern according to the direction of the missing portion in this way is referred to here as the second process.

【００４５】さらに、１フィールドの全体から欠落が検
出されない場合は(ステップＳ１１)、基準位置を補正す
ることなく従来と同様に認識パターンを生成する。これ
を、ここでは第三処理と呼称する。Further, if no omission is detected from the whole one field (step S11), a recognition pattern is generated in the same manner as in the related art without correcting the reference position. This is referred to herein as a third process.
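The three-way branch among the first, second, and third processes can be summarized with a short Python sketch. The function name `classify_field` and the string labels are hypothetical; the input is assumed to be one boolean per character in the field, set to True when a missing portion was detected for that character.

```python
def classify_field(missing_flags):
    """Pick the process for one field from its per-character missing flags."""
    if missing_flags and all(missing_flags):
        # Uniformly missing: direction unknown, so correct both up and down.
        return "first"
    if any(missing_flags):
        # Missing in some characters only: detect direction, correct one way.
        return "second"
    # Nothing missing: generate the recognition pattern without correction.
    return "third"
```

Treating an empty field as the third process is a choice made for this sketch; the description does not discuss that case.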

【００４６】ここで、欠落の検出などの処理のアルゴリ
ズムを各々説明する。まず、欠落の検出（アルゴリズム
１）は、図３に示すように、文字切り出し部４で切り出
した文字単位のイメージデータの上端と下端との座標
(Ｈ１，Ｌ１)を各々検出し、その格差である文字高（ｈ
１）と標準パターンの文字高の規定値（ｋｈ）とを比較
する。この場合、“ｈ１−ｋｈ”の絶対値を文字高差と
し、これが許容範囲内の場合は欠落無しと判定し、許容
範囲を逸脱している場合は欠落有りと判定する。このと
き、適正なプログラムに従って動作するＣＰＵ１０１が
データ補正手段として機能することにより、欠落無しと
判定されたイメージデータの文字高と標準パターンの文
字高の規定値との平均値が算出され、これが文字高の規
定値として更新される(ステップＳ１５)。The algorithms for processes such as missing-portion detection are described in turn. First, in the detection of a missing portion (algorithm 1), as shown in FIG. 3, the coordinates (H1, L1) of the upper end and the lower end of the character-unit image data cut out by the character cutout unit 4 are detected, and the character height (h1), which is the difference between them, is compared with the prescribed value (kh) of the character height of the standard pattern. The absolute value of "h1-kh" is taken as the character height difference; if it is within the allowable range, it is determined that nothing is missing, and if it is outside the allowable range, it is determined that a portion is missing. At this time, the CPU 101 operating according to the appropriate program functions as data correction means: the average of the character height of the image data determined to have no missing portion and the prescribed value of the character height of the standard pattern is calculated, and this average replaces the prescribed value of the character height (step S15).

【００４７】ただし、この更新値が初期設定の規定値に
対して許容範囲を逸脱している場合には、規定値の更新
は中止される。これにより光学系や帳票毎、フィールド
毎に発生する印字による文字サイズのバラツキが吸収で
きる。なお、上述のような許容範囲は予め実験にて算出
しておくことが好適である。However, if the updated value is out of the allowable range with respect to the initially specified value, the updating of the specified value is stopped. As a result, variations in character size due to printing that occur in each optical system, each form, and each field can be absorbed. It is preferable that the allowable range described above is calculated in advance by an experiment.
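Algorithm 1 together with the guarded update of the prescribed value might look like the following Python sketch. The function name `check_character` and the two tolerance parameters are assumptions; the description only says that the allowable ranges should be determined by experiment.

```python
def check_character(h1, kh, kh_initial, detect_tol, update_tol):
    """Return (missing, new_kh) for one character of measured height h1.

    kh is the current prescribed height, kh_initial the initially set one.
    Both tolerances are illustrative and would be tuned by experiment.
    """
    if abs(h1 - kh) > detect_tol:
        # Height difference outside the allowable range: a portion is missing.
        return True, kh
    # Nothing missing: average the measured height with the prescribed value.
    candidate = (h1 + kh) / 2
    if abs(candidate - kh_initial) > update_tol:
        # The update would drift too far from the initial value: cancel it.
        return False, kh
    return False, candidate
```

For example, with `kh = kh_initial = 40`, a character of height 42 moves the prescribed value to 41.0, while a character of height 30 is flagged as missing and leaves it unchanged.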

【００４８】つぎに、フィールドの一部のイメージデー
タから検出された欠落の方向の判定（アルゴリズム２）
と高さの算出（アルゴリズム３）とを、図４を参照して
以下に説明する。まず、欠落無しと判定されたイメージ
データを基準文字と考え、その上端と下端との座標値を
基準座標として使用する。Next, the determination of the direction of a missing portion detected in the image data of part of a field (algorithm 2) and the calculation of its height (algorithm 3) will be described with reference to FIG. 4. First, image data determined to have no missing portion is regarded as a reference character, and the coordinate values of its upper end and lower end are used as reference coordinates.

【００４９】基準文字と処理対象のイメージデータとの
上端の座標Ｈ３，Ｈ１の格差を絶対値で算出し、同様に
下端Ｌ３，Ｌ１の格差の絶対値も算出し、これらを比較
して大きい方が上端の場合は上端欠落（パターン１）と
判定し、逆に大きい方が下端の場合は下端欠落（パター
ン２）と判定する。なお、比較差が許容範囲の場合は、
欠落無し（パターン３）または、上下同数欠落（パター
ン４）と判定し、基準位置補正は実行しない。欠落の高
さは、上端および下端の座標の格差の絶対値とする。The absolute value of the difference between the upper-end coordinates H3 and H1 of the reference character and of the image data to be processed is calculated, and the absolute value of the difference between the lower-end coordinates L3 and L1 is calculated in the same way. The two are compared: if the larger one is at the upper end, the upper end is determined to be missing (pattern 1); conversely, if the larger one is at the lower end, the lower end is determined to be missing (pattern 2). If the difference between the two is within the allowable range, it is determined that nothing is missing (pattern 3) or that equal amounts are missing at the top and bottom (pattern 4), and the reference position is not corrected. The height of the missing portion is taken as the absolute value of the difference in the upper-end and lower-end coordinates.
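Algorithms 2 and 3 can be sketched in Python as below. The function name, the return labels, and the tolerance parameter are hypothetical. The sketch also assumes that the reported height is the coordinate difference on the side judged to be missing, which is one reading of the description rather than something it states explicitly.

```python
def missing_direction_and_height(h3, l3, h1, l1, tol):
    """Compare a character (h1, l1) with the reference character (h3, l3).

    h* are upper-end coordinates and l* lower-end coordinates.
    Returns (direction, height); height 0 means no correction is applied.
    """
    d_top = abs(h3 - h1)      # displacement of the upper end
    d_bottom = abs(l3 - l1)   # displacement of the lower end
    if abs(d_top - d_bottom) <= tol:
        # Pattern 3 (nothing missing) or pattern 4 (equal at both ends).
        return "none_or_equal", 0
    if d_top > d_bottom:
        return "top", d_top        # pattern 1: upper end missing
    return "bottom", d_bottom      # pattern 2: lower end missing
```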

【００５０】そして、基準位置の補正（アルゴリズム
４）では、上端欠落の場合には、第一の基準位置補正部
１７にてイメージデータの先頭を示す基準位置の座標の
数値から、欠落の高さの半分の数値を減算する。この場
合、イメージデータの上端と下端との座標の数値が欠落
の半分だけ減少することになり、イメージデータは座標
のみ欠落の半分だけ上昇したようになる。In the correction of the reference position (algorithm 4), when the upper end is missing, the first reference position correction unit 17 subtracts half the height of the missing portion from the coordinate value of the reference position that indicates the head of the image data. The coordinate values of the upper end and the lower end of the image data then decrease by half the height of the missing portion, so that the image data appears to have risen by that amount in coordinates only.

【００５１】しかし、ビットマップ上の実際のイメージ
データは変位していないので、上述のような座標に基づ
いてビットマップからイメージデータを読み出すと、こ
のイメージデータは相対的に欠落の半分の高さだけ下降
することになる。そこで、上述のように上昇した上端と
下端との座標を平均して中心位置の座標を算出し、これ
がイメージ領域の中心位置に整合するように文字正規化
部５がビットマップ上でイメージデータを移動させる
と、このイメージデータは欠落が無い状態の本来の位置
に配置されることになる。However, the actual image data on the bitmap has not been displaced, so when the image data is read from the bitmap based on these coordinates, it is lowered, in relative terms, by half the height of the missing portion. The coordinates of the center position are therefore calculated by averaging the raised upper-end and lower-end coordinates, and when the character normalizing unit 5 moves the image data on the bitmap so that this center matches the center position of the image area, the image data is placed at the position it would originally occupy with nothing missing.

【００５２】なお、下端欠落の場合には、第二の基準位
置補正部１８にてイメージデータの基準位置の座標に欠
落の高さの半分の数値を加算し、イメージデータがイメ
ージ領域に対して欠落が無い状態の本来の位置に配置さ
れるようにする。また、前述のように欠落の方向が判定
されないイメージデータは、第一第二の基準位置補正部
１７，１８の両方で基準位置の補正が実行される。上述
のような基準位置の補正で欠落部分を補ったイメージデ
ータのメモリ内容には０値をセットする。When the lower end is missing, the second reference position correction unit 18 adds half the height of the missing portion to the coordinate of the reference position of the image data, so that the image data is placed in the image area at the position it would originally occupy with nothing missing. As described above, for image data whose missing direction is not determined, the reference position is corrected by both the first and second reference position correction units 17 and 18. The value 0 is set in the memory contents of the image data where the missing portion has been filled in by this correction of the reference position.
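The arithmetic behind algorithm 4 can be checked with a small Python sketch. It assumes coordinates that increase downward and uses hypothetical function names; it computes only the corrected center coordinate, not the bitmap move performed by the character normalizing unit 5.

```python
def corrected_center(top, bottom, direction, missing_height):
    """Center coordinate after shifting the reference position.

    Upper end missing: shift by minus half the missing height.
    Lower end missing: shift by plus half the missing height.
    """
    if direction == "top":
        shift = -missing_height / 2
    elif direction == "bottom":
        shift = missing_height / 2
    else:
        shift = 0.0
    # Both end coordinates move with the reference position.
    return ((top + shift) + (bottom + shift)) / 2


def candidate_centers(top, bottom, missing_height):
    """Both corrected centers, for the case where the direction is unknown."""
    return (corrected_center(top, bottom, "top", missing_height),
            corrected_center(top, bottom, "bottom", missing_height))
```

As a worked case, a complete character spanning rows 10 to 50 has its center at 30. If 8 rows are missing at the top, the observed span is 18 to 50 with an uncorrected center of 34; shifting by -4 restores the center to 30. If 8 rows are missing at the bottom, the observed span is 10 to 42 and shifting by +4 again gives 30.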

【００５３】つぎに、文字正規化部５では、従来と同様
に微少な凹凸を除去する正規化を実行し、イメージデー
タの上端と下端との座標を平均して中心位置の座標を算
出し、これを標準パターンのイメージ領域の中心位置に
整合させて認識パターンを生成する(ステップＳ１７)。
このとき、イメージデータの上端と下端との座標は基準
位置から算出されるので、第一第二の基準位置補正部１
７，１８で基準位置が変位されている場合、そのイメー
ジデータは欠落が無い状態の本来の位置でイメージ領域
に整合される。なお、前述のようにフィールド全体で欠
落が検出されて方向が判定されなかったイメージデータ
は、基準位置が上下両方に変位されているので上下両方
に変位した認識パターンが各々生成される。Next, the character normalizing unit 5 performs normalization to remove minute irregularities as in the prior art, calculates the coordinates of the center position by averaging the coordinates of the upper end and the lower end of the image data, and aligns this center with the center position of the image area of the standard pattern to generate a recognition pattern (step S17). Because the coordinates of the upper end and the lower end of the image data are calculated from the reference position, image data whose reference position has been displaced by the first or second reference position correction unit 17 or 18 is aligned with the image area at the position it would originally occupy with nothing missing. For image data in which a missing portion was detected over the entire field and the direction was not determined, the reference position has been displaced both upward and downward, so two recognition patterns, one displaced upward and one displaced downward, are generated.

【００５４】そして、文字認識部７では、標準パターン
メモリ６から多数の標準パターンを順番に読み出して認
識パターンとマッチングさせ(ステップＳ１８)、マッチ
ング精度が最高の標準パターンの文字を認識結果として
出力する(ステップＳ１９)。ただし、フィールド全体が
欠落として二種類の認識パターンが生成されたイメージ
データに対しては、二種類の認識パターンに標準パター
ンをマッチングさせ、その精度が最高の一方が選択され
る。この場合、そのフィールドで最初のイメージデータ
に対して欠落の方向が判明することになるので、以後の
イメージデータに対しては欠落方向を特定して一方のマ
ッチングを省略することも可能である。本実施の形態の
ＯＣＲシステム１１は、上述のように文字単位のイメー
ジデータの上下方向での欠落の有無を判定し、存在が検
出された欠落の方向と高さとを検出し、検出された欠落
の方向と高さとに対応してイメージデータのイメージ領
域に対する位置を上下方向に補正するので、上部や下部
が欠落した文字のイメージデータを欠落が無い状態と同
様にイメージ領域に配置することができ、上部や下部が
欠落した文字を良好な精度で認識することができる。Then, the character recognition unit 7 reads a large number of standard patterns in order from the standard pattern memory 6 and matches them against the recognition pattern (step S18), and outputs the character of the standard pattern with the highest matching accuracy as the recognition result (step S19). For image data for which two types of recognition patterns were generated because the entire field was treated as having a missing portion, the standard patterns are matched against both recognition patterns and the one with the highest accuracy is selected. In this case, the missing direction becomes known from the first image data in the field, so for subsequent image data the missing direction can be fixed and one of the two matching operations omitted. As described above, the OCR system 11 of the present embodiment determines whether character-unit image data has a missing portion in the vertical direction, detects the direction and height of any missing portion found, and corrects the vertical position of the image data relative to the image area accordingly. Image data of a character whose upper or lower part is missing can therefore be placed in the image area just as if nothing were missing, and such characters can be recognized with good accuracy.
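The selection between two candidate recognition patterns in steps S18 and S19 can be illustrated as follows. The description does not specify the matching measure, so this Python sketch substitutes a simple count of agreeing pixels; `match_score`, `recognize`, and the data layout are hypothetical.

```python
def match_score(pattern, template):
    """Count positions where two equally sized binary bitmaps agree.

    This pixel-agreement score is a stand-in for the unspecified
    matching accuracy used by the character recognition unit.
    """
    return sum(p == t
               for prow, trow in zip(pattern, template)
               for p, t in zip(prow, trow))


def recognize(candidates, templates):
    """Return (character, candidate_index, score) of the best match.

    candidates: one or two recognition patterns for the same character.
    templates: dict mapping each character to its standard pattern.
    """
    score, idx, ch = max(
        ((match_score(c, t), i, ch)
         for i, c in enumerate(candidates)
         for ch, t in templates.items()),
        key=lambda item: item[0],
    )
    return ch, idx, score
```

The returned candidate index indicates which displacement (upward or downward) produced the best match, which is the information that would let later characters in the same field skip one of the two matching passes.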

【００５５】しかも、文字単位のイメージデータの高さ
を事前に用意された規定値と比較して欠落の有無を判定
するので、簡単な処理で欠落の有無を良好に検出するこ
とができる。さらに、欠落無しと判定されたイメージデ
ータの高さに基づいて高さの規定値を補正するので、印
刷や読み取りの誤差のためにイメージデータの高さが全
体的に変動している場合でも、これに対応して上部や下
部が欠落した文字を相対的に良好に検出することができ
る。Further, since the presence or absence of a missing portion is determined by comparing the height of the image data in character units with a prescribed value prepared in advance, the presence or absence of the missing portion can be detected satisfactorily by simple processing. Furthermore, since the specified value of the height is corrected based on the height of the image data determined not to be missing, even when the height of the image data is entirely changed due to an error in printing or reading, Correspondingly, characters with missing upper and lower portions can be detected relatively favorably.

【００５６】また、欠落の方向が判明しないイメージデ
ータは上下両方に変位させ、その二つのマッチング結果
を比較して一方を選択するので、欠落の方向が判明しな
い文字も良好に認識することができる。その場合、フィ
ールドの全体が欠落して方向が判明しなくとも、最初の
文字が認識されれば方向が判定できるので、以後の文字
では二つのパターン生成やマッチングを一つに省略する
ことができる。Image data whose missing direction is unknown is displaced both upward and downward, and the two matching results are compared to select one, so characters whose missing direction is unknown can also be recognized well. Even if the entire field has a missing portion and the direction is unknown, the direction can be determined once the first character has been recognized, so for subsequent characters the two pattern generations and matching operations can be reduced to one.

【００５７】なお、本発明は上記形態に限定されるもの
ではなく、その要旨を逸脱しない範囲で各種の変形を許
容する。例えば、上記形態では上部や下部が欠落したイ
メージデータをイメージ領域の本来の位置に配置するた
め、基準位置の座標を欠落の高さの半分の数値だけ欠落
の方向に上下させることを例示した。しかし、上部が欠
落している場合には、そのイメージデータの上端の座標
から欠落の高さを減算し、下部が欠落している場合に
は、下端の座標に欠落の高さを加算するようなことも可
能である。The present invention is not limited to the above embodiment, and various modifications are permitted without departing from its gist. For example, the above embodiment illustrates moving the coordinate of the reference position up or down, in the direction of the missing portion, by half the height of the missing portion in order to place image data whose upper or lower part is missing at its original position in the image area. However, it is also possible to subtract the height of the missing portion from the upper-end coordinate of the image data when the upper part is missing, and to add the height of the missing portion to the lower-end coordinate when the lower part is missing.

【００５８】また、上記形態では、ＲＡＭ１０４等にソ
フトウェアとして格納されている制御プログラムに従っ
てＣＰＵ１０１が動作することにより、ＯＣＲシステム
１１の各種手段が実現されることを例示した。しかし、
このような各種手段の各々を固有のハードウェアとして
形成することも可能であり、一部をソフトウェアとして
ＲＡＭ１０４等に格納するとともに一部をハードウェア
として形成することも可能である。In the above embodiment, the various means of the OCR system 11 are realized by the CPU 101 operating according to a control program stored as software in the RAM 104 or the like. However, each of these means can also be formed as dedicated hardware, or part of them can be stored as software in the RAM 104 or the like while the rest is formed as hardware.

【００５９】また、上記形態では、コンピュータシステ
ム１２の起動時に、ＨＤＤ１０５に事前に格納されてい
るソフトウェアがＲＡＭ１０４に複写され、このように
ＲＡＭ１０４に格納されたソフトウェアをＣＰＵ１０１
が読み取ることを想定したが、このようなソフトウェア
をＨＤＤ１０５に格納したままＣＰＵ１０１に利用させ
ることや、ＲＯＭ１０３に事前に固定的に書き込んでお
くことも可能である。さらに、単体で取り扱える情報記
憶媒体であるＦＤ１０６やＣＤ−ＲＯＭ１０８等にソフ
トウェアを書き込んでおき、このＦＤ１０６等からＲＡ
Ｍ１０４等にソフトウェアをインストールすることも可
能であるが、このようなインストールを実行することな
くＦＤ１０６等からＣＰＵ１０１がソフトウェアを直接
に読み取って処理動作を実行することも可能である。In the above embodiment, it is assumed that when the computer system 12 is started, the software stored in advance in the HDD 105 is copied to the RAM 104 and the CPU 101 reads the software stored in the RAM 104. However, the CPU 101 can also use such software while it remains stored in the HDD 105, or the software can be written permanently in the ROM 103 in advance. Furthermore, the software can be written to the FD 106, the CD-ROM 108, or another information storage medium that can be handled on its own and installed from the FD 106 or the like into the RAM 104 or the like; alternatively, the CPU 101 can read the software directly from the FD 106 or the like and execute the processing operation without such installation.

【００６０】つまり、本発明の文字認識装置の各種手段
をソフトウェアにより実現する場合、そのソフトウェア
はＣＰＵ１０１が読み取って対応する動作を実行できる
状態に有れば良い。また、上述のような各種手段を実現
する制御プログラムを、複数のソフトウェアの組み合わ
せで形成することも可能であり、その場合、単体の製品
となる情報記憶媒体には、本発明の文字認識装置を実現
するための必要最小限のソフトウェアのみを格納してお
けば良い。That is, when the various means of the character recognition apparatus of the present invention are realized by software, the software only needs to be in a state where the CPU 101 can read and execute the corresponding operation. It is also possible to form a control program for realizing the various means as described above by combining a plurality of software. In this case, the information storage medium as a single product includes the character recognition device of the present invention. It is only necessary to store only the minimum necessary software for realization.

【００６１】例えば、既存のオペレーティングシステム
が実装されているコンピュータシステム１２に、ＣＤ−
ＲＯＭ１０８等の情報記憶媒体によりアプリケーション
ソフトを提供するような場合、本発明の文字認識装置の
各種手段を実現するソフトウェアは、アプリケーション
ソフトとオペレーティングシステムとの組み合わせで実
現されるので、オペレーティングシステムに依存する部
分のソフトウェアは情報記憶媒体のアプリケーションソ
フトから省略することができる。For example, when application software is provided on an information storage medium such as the CD-ROM 108 to a computer system 12 on which an existing operating system is installed, the software that realizes the various means of the character recognition device of the present invention is realized by the combination of the application software and the operating system, so the portion of the software that depends on the operating system can be omitted from the application software on the information storage medium.

【００６２】また、このように情報記憶媒体に記述した
ソフトウェアをＣＰＵ１０１に供給する手法は、その情
報記憶媒体をコンピュータシステム１２に直接に装填す
ることに限定されない。例えば、上述のようなソフトウ
ェアをホストコンピュータの情報記憶媒体に格納してお
き、このホストコンピュータを通信ネットワークで端末
コンピュータに接続し、ホストコンピュータから端末コ
ンピュータにデータ通信でソフトウェアを供給すること
も可能である。The method of supplying the software described on the information storage medium to the CPU 101 is not limited to loading the information storage medium directly into the computer system 12. For example, it is also possible to store the software as described above in an information storage medium of a host computer, connect the host computer to a terminal computer via a communication network, and supply the software from the host computer to the terminal computer by data communication. is there.

【００６３】上述のような場合、端末コンピュータが自
身の情報記憶媒体にソフトウェアをダウンロードした状
態でスタンドアロンの処理動作を実行することも可能で
あるが、ソフトウェアをダウンロードすることなくホス
トコンピュータとのリアルタイムのデータ通信により処
理動作を実行することも可能である。この場合、ホスト
コンピュータと端末コンピュータとを通信ネットワーク
で接続したシステム全体が、本発明の文字認識装置に相
当することになる。In such a case, the terminal computer can execute stand-alone processing with the software downloaded to its own information storage medium, but it can also execute the processing through real-time data communication with the host computer without downloading the software. In the latter case, the entire system in which the host computer and the terminal computer are connected by the communication network corresponds to the character recognition device of the present invention.

【００６４】[0064]

【発明の効果】本発明は以上説明したように構成されて
いるので、以下に記載するような効果を奏する。Since the present invention is configured as described above, it has the following effects.

【００６５】請求項１記載の発明は、認識する文字の標
準パターンを事前に設定しておき、帳票に標記された所
定の高さの文字を光学的に読み取り、読み取られたイメ
ージデータを文字単位に切り出して中心位置を標準パタ
ーンのイメージ領域の中心位置に整合させ、中心位置が
整合したイメージデータを認識パターンとして多数の標
準パターンと順次マッチングさせ、マッチング精度が最
高の標準パターンの文字を認識結果として出力する文字
認識装置において、文字単位に切り出されたイメージデ
ータの上下方向での欠落の有無を判定する欠落判定手段
と、存在が検出された欠落の方向を検出する方向検出手
段と、存在が検出された欠落の高さを検出する高さ検出
手段と、検出された欠落の方向と高さとに対応してイメ
ージデータのイメージ領域に対する位置を上下方向に補
正し、欠落の方向が判明しないイメージデータは上下両
方に変位させる位置補正手段と、二つのマッチング結果
を比較して一方を選択する結果選択手段と、を設けたこ
とにより、上部や下部が欠落した文字のイメージデータ
を欠落が無い状態と同様にイメージ領域に配置すること
ができるので、上部や下部が欠落した文字を良好な精度
で認識することができ、欠落の方向が判明しない文字も
良好に認識することができる。 The invention according to claim 1 is a character recognition device in which standard patterns of the characters to be recognized are set in advance, characters of a predetermined height marked on a form are optically read, the read image data is cut out in character units, the center position is aligned with the center position of the image area of the standard pattern, the aligned image data is matched sequentially as a recognition pattern against a large number of standard patterns, and the character of the standard pattern with the highest matching accuracy is output as the recognition result. The device is provided with missing determining means for determining whether the image data cut out in character units has a missing portion in the vertical direction, direction detecting means for detecting the direction of a detected missing portion, height detecting means for detecting the height of a detected missing portion, position correcting means for correcting the vertical position of the image data relative to the image area according to the detected direction and height and for displacing image data whose missing direction is unknown both upward and downward, and result selecting means for comparing the two matching results and selecting one of them. As a result, image data of a character whose upper or lower part is missing can be placed in the image area just as if nothing were missing, so such characters can be recognized with good accuracy, and characters whose missing direction is unknown can also be recognized well.

【００６６】請求項２記載の発明は、請求項１記載の文
字認識装置であって、文字の高さの規定値が事前に設定
された文字高記憶手段を設け、欠落判定手段は、文字単
位のイメージデータの高さを規定値と比較して欠落の有
無を判定することにより、簡単な処理で欠落の有無を良
好に検出することができる。According to a second aspect of the present invention, there is provided the character recognition apparatus according to the first aspect, further comprising a character height storage unit in which a prescribed value of a character height is set in advance, By comparing the height of the image data with the specified value to determine the presence or absence of a missing portion, the presence or absence of the missing portion can be satisfactorily detected by simple processing.

【００６７】請求項３記載の発明は、請求項２記載の文
字認識装置であって、欠落無しと判定されたイメージデ
ータの高さに基づいて高さの規定値を補正するデータ補
正手段を設けたことにより、印刷や読み取りの誤差のた
めにイメージデータの高さが全体的に変動している場合
でも、これに対応して上部や下部が欠落した文字を相対
的に良好に検出することができる。The invention according to claim 3 is the character recognition device according to claim 2, provided with data correcting means for correcting the prescribed value of the height based on the height of image data determined to have no missing portion. As a result, even when the height of the image data varies overall because of printing or reading errors, characters whose upper or lower part is missing can be detected well in relative terms.

【００６８】[0068]

【００６９】請求項４記載の発明は、認識する文字の標
準パターンを事前に設定しておき、帳票に標記された所
定の高さの文字を光学的に読み取り、読み取られたイメ
ージデータを文字単位に切り出して中心位置を標準パタ
ーンのイメージ領域の中心位置に整合させ、中心位置が
整合したイメージデータを認識パターンとして多数の標
準パターンと順次マッチングさせ、マッチング精度が最
高の標準パターンの文字を認識結果として出力するよう
にした文字認識方法において、文字単位に切り出された
イメージデータの上下方向での欠落の有無を判定し、存
在が検出された欠落の方向を検出し、存在が検出された
欠落の高さを検出し、検出された欠落の方向と高さとに
対応してイメージデータのイメージ領域に対する位置を
上下方向に補正するようにしたことにより、上部や下部
が欠落した文字のイメージデータを欠落が無い状態と同
様にイメージ領域に配置することができるので、上部や
下部が欠落した文字を良好な精度で認識することができ
る。The invention according to claim 4 is a character recognition method in which standard patterns of the characters to be recognized are set in advance, characters of a predetermined height marked on a form are optically read, the read image data is cut out in character units, the center position is aligned with the center position of the image area of the standard pattern, the aligned image data is matched sequentially as a recognition pattern against a large number of standard patterns, and the character of the standard pattern with the highest matching accuracy is output as the recognition result. In this method, whether the image data cut out in character units has a missing portion in the vertical direction is determined, the direction and the height of a detected missing portion are detected, and the vertical position of the image data relative to the image area is corrected according to the detected direction and height. As a result, image data of a character whose upper or lower part is missing can be placed in the image area just as if nothing were missing, so such characters can be recognized with good accuracy.

[Brief description of the drawings]

【図１】本発明の実施の一形態の文字認識装置であるＯ
ＣＲシステムの論理的構造を示す模式図である。FIG. 1 is a schematic diagram showing the logical structure of an OCR system that is a character recognition device according to an embodiment of the present invention.

【図２】ＯＣＲシステムの物理的構造を示すブロック図
である。FIG. 2 is a block diagram showing a physical structure of the OCR system.

【図３】標準的な認識パターンを示す模式図である。FIG. 3 is a schematic diagram showing a standard recognition pattern.

【図４】標準的な認識パターンとともに各部が欠落した
認識パターンを示す模式図である。FIG. 4 is a schematic diagram showing a recognition pattern in which each part is missing together with a standard recognition pattern.

【図５】ＯＣＲシステムによる文字認識方法の前半部を
示すフローチャートである。FIG. 5 is a flowchart showing the first half of a character recognition method by the OCR system.

【図６】ＯＣＲシステムによる文字認識方法の後半部を
示すフローチャートである。FIG. 6 is a flowchart showing the second half of the character recognition method by the OCR system.

【図７】従来の文字認識装置であるＯＣＲシステムを示
す模式図である。FIG. 7 is a schematic diagram showing an OCR system that is a conventional character recognition device.

【図８】標準パターンと認識パターンを示す模式図であ
る。FIG. 8 is a schematic diagram showing a standard pattern and a recognition pattern.

[Explanation of symbols]

２画像抽出部３多値イメージメモリ４文字切り出し部５文字正規化部６標準パターンメモリ７文字認識部１１文字認識装置であるＯＣＲシステム１２コンピュータシステム１３イメージスキャナ１４フィールド座標メモリ１５文字高記憶手段である標準文字高メモリ１６欠落検出手段等の各種手段に相当する欠落検出
部１７位置補正手段に相当する第一の基準位置補正部１８位置補正手段に相当する第二の基準位置補正部１０１コンピュータであるＣＰＵ１０２バスライン１０３情報記憶媒体であるＲＯＭ１０４情報記憶媒体であるＲＡＭ１０５情報記憶媒体であるＨＤＤ１０６情報記憶媒体であるＦＤ１０７ＦＤＤ１０８情報記憶媒体であるＣＤ−ＲＯＭ１０９ＣＤドライブ１１０キーボード１１１マウス１１２ディスプレイ１１３通信Ｉ／Ｆ
2 Image extraction unit
3 Multi-valued image memory
4 Character cutout unit
5 Character normalizing unit
6 Standard pattern memory
7 Character recognition unit
11 OCR system (character recognition device)
12 Computer system
13 Image scanner
14 Field coordinate memory
15 Standard character height memory (character height storage means)
16 Missing detection unit (corresponding to various means such as missing detecting means)
17 First reference position correction unit (corresponding to position correcting means)
18 Second reference position correction unit (corresponding to position correcting means)
101 CPU (computer)
102 Bus line
103 ROM (information storage medium)
104 RAM (information storage medium)
105 HDD (information storage medium)
106 FD (information storage medium)
107 FDD
108 CD-ROM (information storage medium)
109 CD drive
110 Keyboard
111 Mouse
112 Display
113 Communication I/F

フロントページの続き (58)調査した分野(Int.Cl.⁷，ＤＢ名) G06K 9/62 610 Continued from front page (58) Fields searched (Int.Cl.⁷, DB name) G06K 9/62 610

Claims

(57) [Claims]

1. A character recognition device in which standard patterns of the characters to be recognized are set in advance, characters of a predetermined height marked on a form are optically read, the read image data is cut out in character units, the center position is aligned with the center position of the image area of the standard pattern, the aligned image data is matched sequentially as a recognition pattern against a large number of standard patterns, and the character of the standard pattern with the highest matching accuracy is output as the recognition result, the device comprising: missing determining means for determining whether the image data cut out in character units has a missing portion in the vertical direction; direction detecting means for detecting the direction of a detected missing portion; height detecting means for detecting the height of a detected missing portion; position correcting means for correcting the vertical position of the image data relative to the image area according to the detected direction and height of the missing portion, and for displacing image data whose missing direction is unknown both upward and downward; and result selecting means for comparing the two matching results and selecting one of them.

2. The character recognition device according to claim 1, further comprising character height storage means in which a prescribed value of the character height is set in advance, wherein the missing determining means determines the presence or absence of a missing portion by comparing the height of the character-unit image data with the prescribed value.

3. The character recognition device according to claim 2, further comprising data correcting means for correcting the prescribed value of the height based on the height of image data determined to have no missing portion.

4. A character recognition method in which standard patterns of the characters to be recognized are set in advance, characters of a predetermined height marked on a form are optically read, the read image data is cut out in character units, the center position is aligned with the center position of the image area of the standard pattern, the aligned image data is matched sequentially as a recognition pattern against a large number of standard patterns, and the character of the standard pattern with the highest matching accuracy is output as the recognition result, the method comprising: determining whether the image data cut out in character units has a missing portion in the vertical direction; detecting the direction of a detected missing portion; detecting the height of a detected missing portion; correcting the vertical position of the image data relative to the image area according to the detected direction and height of the missing portion; displacing image data whose missing direction is unknown both upward and downward; and comparing the two matching results and selecting one of them.