JP2001266069A

JP2001266069A - Reading method for optical character

Info

Publication number: JP2001266069A
Application number: JP2000076054A
Authority: JP
Inventors: Masashi Noguchi; 雅司野口
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2000-03-17
Filing date: 2000-03-17
Publication date: 2001-09-28

Abstract

PROBLEM TO BE SOLVED: To read characters correctly in optical character reading even if the position at which the characters are written is shifted. SOLUTION: A segmentation control part 14 calculates a segmentation start position and a number of characters to segment, so that the segmented character image also covers the character positions before and after the expected ones. The calculation is based on the character start position and the number of valid characters to be read, which are registered in a slip form file 18, and its result is used to control a character-segmenting part 13. The part 13 then segments a character image covering at least the prescribed number of valid characters, a character-recognizing part 15 recognizes the characters, and the resulting character codes are stored in a buffer memory 16. An output processing part 17 discards invalid data such as 'null' from the character codes stored in the buffer memory 16 and outputs only the valid data, for the valid number of characters, as output data OUT.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an optical character reading method for reading a character entry area (hereinafter, "field") in an optical character reader (hereinafter, "OCR").

[0002]

2. Description of the Related Art

Conventionally, when an OCR reads characters or numerals written in a field of a form, it first decomposes the image of the field into pixels, reads it, and temporarily stores it in an image memory. Next, image data for a predetermined number of valid characters is cut out, starting from a character start position predetermined as the position where the characters should be written. Feature data is then extracted from the cut-out image data, character recognition is performed, and the corresponding character codes are output as the recognition result.

[0003]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, the conventional OCR has the following problem. When numerals or the like are printed on a form by a printer, for example, the printing position of the first character may deviate from the character start position in the field. In that case, the valid characters of the predetermined number of digits cannot all be read, and digits are dropped.
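The dropped-digit problem can be illustrated with a minimal sketch. It assumes a field modeled as a list of character cells in which a space stands for an empty cell; the function name `conventional_read` and the cell model are illustrative only and do not come from the patent.

```python
def conventional_read(cells, start, valid_count):
    """Cut exactly valid_count cells beginning at the registered start position."""
    return "".join(cells[start:start + valid_count])


# A field of 8 character cells. The registered start position is cell 2
# and 4 valid characters are expected.
aligned = list("  1234  ")   # printed where the form expects it
shifted = list("   1234 ")   # printed one cell to the right

print(repr(conventional_read(aligned, 2, 4)))  # '1234'
print(repr(conventional_read(shifted, 2, 4)))  # ' 123' (the last digit is lost)
```

With a fixed window, a shift of one cell is enough to lose the final digit.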

[0004]

An object of the present invention is to solve the above problem of the prior art and to provide an optical character reading method that can read characters accurately even if the printing position is shifted.

[0005]

SUMMARY OF THE INVENTION

To solve the above problem, the present invention provides an optical character reading method that performs a reading process of decomposing the image of a field of a form to be read into pixels and reading it optically, a cutout process of cutting out the image of the field based on a predetermined character start position and number of valid characters, and a recognition process of recognizing characters in the cut-out image and outputting the character codes of the recognition result. In the cutout process, an image for more characters than the number of valid characters is cut out from the field based on the character start position and the number of valid characters. In the recognition process, invalid data in the recognition result of the cut-out image is discarded, and character codes for the number of valid characters are output.

[0006]

Because the optical character reading method of the present invention is configured as described above, it operates as follows. First, in the reading process, the image of the field of the form to be read is decomposed into pixels and read optically. Next, in the cutout process, a character image for more characters than the number of valid characters is cut out from the field, based on the predetermined character start position and number of valid characters. Character recognition is then performed in the recognition process. At this point, invalid data in the recognition result of the character image cut out by the cutout process is discarded, and only character codes for the number of valid characters are output.

[0007]

EMBODIMENT OF THE INVENTION

FIG. 1 is a block diagram of an OCR showing an embodiment of the present invention. This OCR includes an image scanner 11 that optically reads a form 1 to be read. A plurality of fields 2 are provided on the reading surface of the form 1. In each field 2, a predetermined character start position 3 is defined for the first character, and a predetermined number of valid characters are printed in a single row at regular intervals, beginning at this character start position.

[0008]

The image scanner 11 decomposes the image of the field 2 of the form 1 into pixels and reads it optically. An image memory 12 is connected to the output side of the image scanner 11. The image memory 12 temporarily stores the image of the field 2 for recognition processing, and a character cutout unit 13 is connected to the image memory 12. The character cutout unit 13 cuts out a character image from the image memory 12 according to instructions from a cutout control unit 14 and outputs it to a character recognition unit 15.

[0009]

The cutout control unit 14 gives the character cutout unit 13 the information needed to cut out the character image, based on the character start position 3 of the field 2 in the form 1, the number of valid characters, the character size, and the like. The character recognition unit 15 extracts feature data from the cut-out character image and generates the corresponding character code by referring to a character dictionary or the like (not shown).

[0010]

A buffer memory 16, which temporarily stores the generated character codes for output processing of the recognition result, is connected to the output side of the character recognition unit 15. An output processing unit 17 is connected to the buffer memory 16. The output processing unit 17 discards invalid data in the recognition result and outputs only the character codes for the number of valid characters as output data OUT.

[0011]

The OCR further includes a form format file 18 in which the format of the form 1 is registered in advance. The form format file 18 holds information such as the position of the field 2 in the form 1 to be read, the character start position 3, the number of valid characters, and the character size. From the form format file 18, the position information of the field 2 is supplied to the image scanner 11, and information such as the character start position 3 and the number of valid characters is supplied to the cutout control unit 14. Information such as the number of valid characters is also supplied from the form format file 18 to the output processing unit 17.
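One way to picture the form format file 18 is as a small record per field, with each unit receiving only the items it needs. The sketch below is an assumption for illustration: the class name `FieldFormat`, its attribute names, the units (pixels, cell indices), and the three helper functions are hypothetical, since the patent does not specify a file layout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldFormat:
    """Hypothetical record for one field 2 in the form format file 18."""
    field_position: Tuple[int, int]  # position of the field on the form (assumed: x, y in pixels)
    char_start: int                  # character start position 3 (assumed: character-cell index)
    valid_count: int                 # number of valid characters
    char_size: int                   # character size (assumed: cell width in pixels)


def info_for_scanner(fmt: FieldFormat) -> Tuple[int, int]:
    # The image scanner 11 receives the field position.
    return fmt.field_position


def info_for_cutout_control(fmt: FieldFormat) -> Tuple[int, int, int]:
    # The cutout control unit 14 receives the start position, valid count, and size.
    return (fmt.char_start, fmt.valid_count, fmt.char_size)


def info_for_output_processing(fmt: FieldFormat) -> int:
    # The output processing unit 17 receives the number of valid characters.
    return fmt.valid_count


fmt = FieldFormat(field_position=(120, 40), char_start=2, valid_count=4, char_size=16)
print(info_for_cutout_control(fmt))
```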

[0012]

FIG. 2 is a flowchart showing the operation of the OCR of FIG. 1, and FIG. 3 is an explanatory diagram of the processing of the cutout control unit 14 in FIG. 1. The operation of FIG. 1 is described below with reference to FIGS. 2 and 3. When the OCR of FIG. 1 is started and the form 1 is set on the image scanner 11, the image reading process of step S1 in FIG. 2 begins.

[0013]

In step S1, the image scanner 11 starts operating and reads the image of the field 2 of the form 1 based on the field position information in the form format file 18. The image of the field 2 that has been read is stored in the image memory 12.

[0014]

In step S2, the cutout control unit 14 starts operating and, based on the character start position 3 and the number of valid characters of the field 2 in the form format file 18, calculates the cutout start position and the number of characters to cut out of the character image, for example as follows.

cutout start position = read start position - 1
number of cutout characters = number of valid characters + 2
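The two formulas of step S2 can be written as a short sketch. It assumes positions are counted in whole character cells and treats the "read start position" in the formula as the registered character start position, which is how the surrounding text describes it; the function name `cutout_window` is illustrative.

```python
def cutout_window(char_start, valid_count):
    """Return (cutout start position, number of cutout characters) as in step S2."""
    cut_start = char_start - 1     # begin one character position earlier
    cut_count = valid_count + 2    # one extra position in front, one behind
    return cut_start, cut_count


print(cutout_window(2, 4))  # (1, 6)
```

The sketch does no clamping. The source does not say what should happen when the registered start position is already the first cell of the field.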

[0015]

That is, in step S2, as illustrated in FIG. 3, the cutout start position is set so that cutting of the character image begins at the character position one character to the left of the predetermined character start position. In addition, the number of cutout characters is set so that a character image two characters longer than the predetermined number of valid characters is cut out of the field 2. The cutout control unit 14 controls the character cutout unit 13 based on the cutout start position and the number of cutout characters calculated in this way.
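How the cutout control unit 14 turns character positions into image coordinates is not spelled out in the source. The sketch below is one possible reading, under the assumption that characters sit on a uniform pitch equal to the registered character size and that cell 0 begins at pixel column 0 of the field image; `cell_pixel_ranges` and `pitch` are hypothetical names.

```python
def cell_pixel_ranges(cut_start, cut_count, pitch):
    """Return a (left, right) pixel column range, right-exclusive, for each cut cell.

    Assumes a uniform pitch in pixels per character cell (an assumption,
    not something the patent states).
    """
    return [
        ((cut_start + i) * pitch, (cut_start + i + 1) * pitch)
        for i in range(cut_count)
    ]


# Cutout window from step S2 for start position 2 and 4 valid characters,
# with an assumed pitch of 16 pixels per cell.
for left, right in cell_pixel_ranges(1, 6, 16):
    print(left, right)
```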

[0016]

In step S3, the character cutout unit 13 starts operating, cuts out the character image from the image memory 12 according to the instructions from the cutout control unit 14, and outputs it to the character recognition unit 15. In step S4, the character recognition unit 15 starts operating, extracts feature data from the character image supplied by the character cutout unit 13, and generates the corresponding character codes by referring to the character dictionary or the like.

[0017]

In step S5, the character codes of the recognition result generated by the character recognition unit 15 are stored in the buffer memory 16. In step S6, the output processing unit 17 starts operating and discards invalid data. That is, among the character codes stored in the buffer memory 16, the character codes corresponding to a "blank", where nothing is written, are discarded. Then, in step S7, only the remaining valid data, for the number of valid characters, is output as output data OUT.
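Steps S6 and S7 amount to a filter over the buffered character codes. The sketch below assumes a single space is the code the recognizer emits for an empty cell and that at most the registered number of valid characters is emitted; `BLANK` and `output_processing` are illustrative names.

```python
BLANK = " "  # assumed stand-in for the "blank" character code


def output_processing(buffer, valid_count):
    """Discard blank codes (step S6) and output the valid codes (step S7)."""
    kept = [code for code in buffer if code != BLANK]
    # Emit no more than the registered number of valid characters.
    return "".join(kept[:valid_count])


print(output_processing(list(" 1234 "), 4))  # 1234
print(output_processing(list("  1234"), 4))  # 1234
```

Note that this sketch also drops a blank that falls between two printed characters. The source does not discuss that case.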

[0018]

As described above, the OCR of this embodiment has the cutout control unit 14, which performs control so that a character image covering more characters than the number of valid characters is cut out of the image memory 12, based on the character start position and the number of valid characters registered in advance in the form format file 18. It also has the output processing unit 17, which discards invalid data from among the character codes recognized by the character recognition unit 15 and outputs only the character codes for the predetermined number of valid characters as output data OUT. As a result, even if the position of the characters written in the field 2 of the form 1 deviates from the predetermined character start position, the characters can be read accurately without dropped digits, and character codes with the correct number of digits can be output.
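Putting the steps together, a minimal end-to-end sketch shows why a shift of one character position in either direction no longer drops a digit. The cell model, the identity `recognize` stub standing in for the character recognition unit 15, and all function names are assumptions made for illustration.

```python
def recognize(cell_images):
    # Stand-in for the character recognition unit 15: in this toy model
    # each cell already holds its character, and " " means an empty cell.
    return list(cell_images)


def read_field(cells, char_start, valid_count):
    # Step S2: widen the window by one position on each side.
    cut_start = char_start - 1
    cut_count = valid_count + 2
    # Step S3: cut the widened window out of the stored field.
    cut = cells[cut_start:cut_start + cut_count]
    # Steps S4 and S5: recognize and buffer the character codes.
    buffer = recognize(cut)
    # Steps S6 and S7: discard blanks and output the valid codes.
    kept = [code for code in buffer if code != " "]
    return "".join(kept[:valid_count])


# Registered start position 2, 4 valid characters; printed on target,
# one cell to the left, and one cell to the right.
for printed in ("  1234  ", " 1234   ", "   1234 "):
    print(repr(printed), "->", read_field(list(printed), 2, 4))
```

All three inputs yield `1234` in this model.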

[0019]

The present invention is not limited to the above embodiment, and various modifications are possible. Examples of such modifications include the following (a) and (b).

(a) To make the description of each process clear, the OCR of FIG. 1 is composed of individual processing units such as the character cutout unit 13, the cutout control unit 14, the character recognition unit 15, and the output processing unit 17, but the processes may instead be carried out under program control using a microcomputer or the like.

[0020]

(b) In the description of FIG. 3, two extra characters' worth of character image, one before and one after the predetermined character positions, are cut out, but a larger number of extra character images may be cut out.
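Modification (b) can be sketched by making the number of extra positions a parameter. The `margin` parameter, the function name `read_with_margin`, and the explicit bounds check are assumptions for illustration; with `margin=1` the window matches the formulas of step S2.

```python
def read_with_margin(cells, char_start, valid_count, margin=1):
    """Cut `margin` extra cells on each side, then drop blanks."""
    cut_start = char_start - margin
    if cut_start < 0:
        # The source does not cover this case; the sketch simply rejects it.
        raise ValueError("margin extends past the left edge of the field")
    cut = cells[cut_start:char_start + valid_count + margin]
    kept = [code for code in cut if code != " "]
    return "".join(kept[:valid_count])


# Registered start position 3, 4 valid characters, printed two cells to the right.
printed = list("     1234 ")
print(read_with_margin(printed, 3, 4, margin=1))  # 123  (one cell of margin is not enough)
print(read_with_margin(printed, 3, 4, margin=2))  # 1234
```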

[0021]

EFFECTS OF THE INVENTION

As described in detail above, according to the present invention, a cutout process is performed that cuts out of the field an image for more characters than the number of valid characters, based on the predetermined character start position and number of valid characters. As a result, even if the position of the characters written in the field of the form deviates from the predetermined character start position, the characters can be read without dropped digits. In addition, a recognition process is performed that discards invalid data in the recognition result of the cut-out image and outputs character codes for the number of valid characters. As a result, even if the position of the characters written in the field of the form deviates from the predetermined character start position, the correct number of valid characters can be read.

[Brief description of the drawings]

FIG. 1 is a block diagram of an OCR showing an embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the OCR of FIG. 1.

FIG. 3 is an explanatory diagram of the processing of the cutout control unit 14 in FIG. 1.

[Explanation of symbols]

1 Form
2 Field
3 Character start position
11 Image scanner
12 Image memory
13 Character cutout unit
14 Cutout control unit
15 Character recognition unit
16 Buffer memory
17 Output processing unit
18 Form format file

Claims

[Claims]

1. An optical character reading method that performs a reading process of decomposing the image of a character entry area of a form to be read into pixels and reading it optically, a cutout process of cutting out the image of the character entry area based on a predetermined character start position and number of valid characters, and a recognition process of recognizing characters in the cut-out image and outputting the character codes of the recognition result, wherein, in the cutout process, an image for more characters than the number of valid characters is cut out from the character entry area based on the character start position and the number of valid characters, and, in the recognition process, invalid data in the recognition result of the cut-out image is discarded and character codes for the number of valid characters are output.