JPH01245376A - Character segmenting device for character reader

Info

Publication number: JPH01245376A
Application number: JP63072739A
Authority: JP
Inventors: Noboru Okada; 昇岡田
Original assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Current assignee: Toshiba Corp; Toshiba Computer Engineering Corp
Priority date: 1988-03-26
Filing date: 1988-03-26
Publication date: 1989-09-29

Abstract

PURPOSE: To allow character frames to be printed in a non-dropout color by detecting a character frame field from a generated histogram and segmenting each character based on that field. CONSTITUTION: A document to be read, bearing a character frame field made up of character frames partitioned one per character and characters recorded in those frames, is scanned by a scanning part 20, and pattern data corresponding to the character frame field and the characters are stored in an image buffer 21. A histogram forming part 22 forms a histogram consisting of projection data in the row and column directions from the image pattern data. A black character frame field detecting part 23 detects the character frame field based on the histogram, and a character segmenting part 24 segments each character from the buffer 21 based on the detected character frame field. Consequently, the character frames no longer need to be printed on the document in advance in a dropout color.

Description

DETAILED DESCRIPTION OF THE INVENTION [Object of the Invention] (Industrial Application Field) The present invention relates to a character cutting device for a character reading device that performs character recognition one character at a time.

(Prior Art) Conventionally, in an optical character reading device, a form to be read is optically scanned, and the image pattern data obtained by photoelectric conversion is stored in an image buffer. The image buffer holds, for example, the image pattern data for one sheet (one form). Characters are then cut out of the image buffer one at a time, and character recognition is performed character by character.

In the character cutting process, as shown in FIG. 5, a histogram consisting of projection data 10a to 10d corresponding to the individual image pattern data (character patterns) P in the image buffer is created. Here, each character is recorded in a character frame 11 printed on the form in a dropout color.

The character cutting section of the optical character reading device scans the histogram corresponding to, for example, a one-line field, detects the white data corresponding to the spaces between character frames, and determines character cutting positions 12a to 12c.

Using the character cutting positions 12a to 12c as references, the image pattern data for each character is cut out of the image buffer and transferred to the character recognition section. The character recognition section then performs character recognition on each character.
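As a rough illustration of this conventional gap-based approach, the following Python sketch builds a column projection from a small binary image and cuts wherever the projection drops to zero. The function names, the tiny test image, and the half-open span convention are assumptions made for illustration; the specification does not describe an implementation.

```python
def column_projection(image):
    """Count black pixels (1s) in each column of a binary image."""
    return [sum(column) for column in zip(*image)]


def gap_cut_positions(projection):
    """Return half-open (start, end) column spans where the projection is non-zero.

    Runs of zero (white data) between spans play the role of the spaces
    between character frames in the conventional method.
    """
    spans = []
    start = None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                      # a character span begins
        elif count == 0 and start is not None:
            spans.append((start, x))       # white gap closes the span
            start = None
    if start is not None:
        spans.append((start, len(projection)))
    return spans


# Hypothetical 3 x 8 image holding three character patterns separated by gaps.
image = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
spans = gap_cut_positions(column_projection(image))
```

This only works when white columns separate the characters, which is why a non-dropout frame stored in the buffer would defeat it.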

In the character cutting method described above, character frames are printed on the form in advance to fix the recording position of each character, and these frames must be printed in a dropout color so that they are not read. This is because, if the character frames were stored in the image buffer together with the characters, they would become data that interferes with the character cutting process. However, having to print the character frames in a dropout color is a major constraint when creating a form; it complicates the form production process and increases the cost of the form.

Furthermore, because a predetermined space must be provided between character frames, the form as a whole becomes larger, making it difficult to create a small form.

(Problems to be Solved by the Invention) In the conventional character cutting method, the character frames provided in advance on a form must be printed in a dropout color.

This has the drawback of complicating the form production process and increasing the cost of the form.

Furthermore, because a predetermined space must be provided between character frames, it is difficult to create a small form.

An object of the present invention is to provide a character cutting device for a character reading device that can reliably cut out the characters recorded in the character frames of a form whose character frames are printed in a non-dropout color and require no space between them.

[Configuration of the Invention] (Means for Solving the Problems and Operation) The present invention includes buffer memory means that stores image pattern data, corresponding to a character frame field and to characters, obtained by scanning a form to be read that bears the character frame field, made up of character frames partitioned one per character, and the characters recorded in those frames. From the image pattern data stored in the buffer memory means, histogram generating means generates a histogram consisting of projection data in each of the row and column directions. Based on the histogram generated by the histogram generating means, character frame field detecting means detects the character frame field. Based on the character frame field detected by the character frame field detecting means, character cutting means cuts out characters one at a time from the buffer memory means.

A device configured in this way eliminates the need to print the character frames on the form in advance in a dropout color. In addition, because the character cutting process is based on the character frames themselves, no space needs to be provided between character frames.

(Embodiment) An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a character reading device according to the embodiment. As shown in FIG. 1, the device includes a scanning section 20, an image buffer 21, a histogram generation section 22, a black character frame field detection section 23, a character cutting section 24, and a character recognition section 25. The scanning section 20 is a circuit that optically scans a form to be read and outputs the image pattern data obtained by photoelectric conversion. The image buffer 21 is a buffer memory that stores one sheet of image pattern data output from the scanning section 20.

The histogram generation section 22 is a circuit that generates a histogram for a prespecified area of the image pattern data stored in the image buffer 21.

The black character frame field detection section 23 is a circuit that detects a black character frame field based on the histogram generated by the histogram generation section 22. The character cutting section 24 cuts out character pattern data for each character from the image buffer 21 based on the detected black character frame field and outputs it to the character recognition section 25. The character recognition section 25 is a circuit that performs character recognition on the character pattern data for each character.

Next, the operation of the embodiment will be described. First, when a form to be read is scanned by the scanning section 20, one sheet of image pattern data corresponding to the characters and the black character frames recorded on the form is stored in the image buffer 21. Here, the black character frames are printed on the form in a non-dropout color and form a field in which vertical and horizontal ruled lines divide a group of characters into individual characters. As shown in FIG. 3, the image buffer 21 therefore holds character pattern data corresponding to the characters (here, numerals) and image pattern data B corresponding to the black character frames.

Next, the histogram generation section 22 generates a histogram for a prespecified area (the range enclosed by the dotted line in FIG. 3) of the image pattern data stored in the image buffer 21. This area is specified, for example, by a host computer as format control information. As shown in FIG. 4, the histogram generation section 22 generates the histogram for the image pattern data in the designated area A.

As shown in step S1 of FIG. 2, the histogram generation section 22 scans the area A in the horizontal direction and generates a histogram H1 such as that shown in FIG. 4.
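The horizontal scan of step S1 can be pictured as counting black pixels along each row of the designated area. The Python sketch below is one way to do that under stated assumptions: the `FORM` bitmap, the `(top, bottom, left, right)` half-open area tuple, and the function name are hypothetical and stand in for the image buffer contents and the format control information.

```python
# Hypothetical 7 x 11 bitmap: a two-cell black frame with one dot per cell.
FORM = [
    "...........",
    ".#########.",
    ".#...#...#.",
    ".#.#.#.#.#.",
    ".#...#...#.",
    ".#########.",
    "...........",
]
image = [[1 if c == "#" else 0 for c in row] for row in FORM]


def row_projection(image, area):
    """Count black pixels per row inside area = (top, bottom, left, right).

    The bounds are half-open, so rows top..bottom-1 and columns
    left..right-1 are scanned.
    """
    top, bottom, left, right = area
    return [sum(image[y][left:right]) for y in range(top, bottom)]


# Scan the whole bitmap as area A.
h1 = row_projection(image, (0, len(image), 0, len(image[0])))
```

In this toy bitmap the two horizontal ruled lines produce the largest counts, while rows that cross only the vertical lines and the characters produce smaller ones.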

The histogram H1 consists of projection data D1 corresponding to the characters in the horizontal direction and projection data D2 and D3 corresponding to the black character frame field F. Based on the projection data D2 and D3 corresponding to the black character frame in the horizontal histogram H1 generated by the histogram generation section 22, the black character frame field detection section 23 detects the upper and lower positions 40 and 41 of the black character frame field F (step S2).

Next, as shown in FIG. 4, the histogram generation section 22 sets predetermined left-end and right-end ranges within the designated area A and scans each range in the vertical direction to generate histograms H2 and H3 (step S4). Based on the projection data D4 and D5 corresponding to the black character frame in the vertical histograms H2 and H3 generated by the histogram generation section 22, the black character frame field detection section 23 detects the left and right positions 42 and 43 of the black character frame field F (step S5).

When detecting the upper and lower positions 40 and 41 or the left and right positions 42 and 43, the black character frame field detection section 23 determines whether the peaks of the projection data D2 to D5 corresponding to the black character frame satisfy predetermined reference values for height and width.

In this way, the black character frame field detection section 23 detects the position of the black character frame field (40 to 43 in FIG. 4) based on the histograms H1 to H3 generated by the histogram generation section 22 (step S6). The character cutting section 24 then cuts out character pattern data for each character (pattern P in FIG. 4) from the image buffer 21 based on the black character frame field detected by the black character frame field detection section 23 (step S7) and outputs it to the character recognition section 25. The character recognition section 25 performs character recognition on the character pattern data for each character cut out by the character cutting section 24.

Here, based on the detected black character frame field, the character cutting section 24 detects the vertical and horizontal edges of the black character frame for each character and creates character frame position data for one character from these edges.

The cutting positions are determined from this character frame position data, and the character cutting process is performed for each character.
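One way to picture this per-character cutting is to find the vertical ruled lines inside the detected field and crop the interior between each neighboring pair. The sketch below makes several assumptions that are not in the specification: field bounds are inclusive pixel coordinates, horizontal lines are one pixel thick, and a column counts as a vertical edge only when it is black over the full field height.

```python
FORM = [
    "...........",
    ".#########.",
    ".#...#...#.",
    ".#.#.#.#.#.",
    ".#...#...#.",
    ".#########.",
    "...........",
]
image = [[1 if c == "#" else 0 for c in row] for row in FORM]


def cut_cells(image, top, bottom, left, right):
    """Crop the interior of each character frame in the field (inclusive bounds)."""
    height = bottom - top + 1
    # Columns that are black from the top line to the bottom line.
    edge_cols = [
        x for x in range(left, right + 1)
        if sum(image[y][x] for y in range(top, bottom + 1)) == height
    ]
    cells = []
    for x0, x1 in zip(edge_cols, edge_cols[1:]):
        if x1 - x0 > 1:  # adjacent edge columns belong to one thick line
            cells.append([row[x0 + 1:x1] for row in image[top + 1:bottom]])
    return cells


# Field position as it might come out of the detection steps (40 to 43).
cells = cut_cells(image, top=1, bottom=5, left=1, right=9)
```

Each returned cell excludes the frame lines themselves, so only the character pattern would be passed on for recognition. A character stroke spanning the full field height would be misread as an edge in this simplified version.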

If no black character frame field is detected in step S3, the character cutting process of the conventional method (FIG. 5) is executed (step S8).
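The branch at step S3 can be sketched as a simple decision on the row histogram: if two well-separated long horizontal lines are present, use the frame-based path, otherwise fall back to the conventional gap-based path. The 0.8 line-length fraction, the function names, and the sample bitmaps are assumptions for illustration.

```python
def parse(rows):
    """Convert rows of '#' and '.' into a binary image."""
    return [[1 if c == "#" else 0 for c in row] for row in rows]


def choose_method(image, min_line_fraction=0.8):
    """Return 'frame' when a top and bottom frame line are found, else 'gap'."""
    width = len(image[0])
    row_hist = [sum(row) for row in image]
    lines = [y for y, v in enumerate(row_hist) if v >= min_line_fraction * width]
    # Require two line rows that are not simply neighbors in one thick line.
    framed = len(lines) >= 2 and lines[-1] - lines[0] > 1
    return "frame" if framed else "gap"


framed_form = parse([
    "...........",
    ".#########.",
    ".#...#...#.",
    ".#.#.#.#.#.",
    ".#...#...#.",
    ".#########.",
    "...........",
])
unframed_form = parse([
    "..#..#..",
    "..#..#..",
])

framed_choice = choose_method(framed_form)
unframed_choice = choose_method(unframed_form)
```

Keeping the conventional path as a fallback means forms printed with dropout-color frames could still be processed by the same device.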

[Effects of the Invention] As described in detail above, according to the present invention, when a form having a black character frame field printed in a non-dropout color such as black is used, the image pattern data corresponding to the black character frame field can be used to cut out characters one at a time from the group of characters recorded in the field. The step of printing the character frames in a dropout color can therefore be eliminated, which simplifies the form production process and reduces production cost.

In addition, because characters are cut out using the black character frames, the space between character frames required by the conventional method no longer needs to be secured. That space can be omitted from the form, so the form as a whole can be made smaller.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of a character reading device according to an embodiment of the present invention; FIG. 2 is a flowchart for explaining the operation of the embodiment; FIGS. 3 and 4 are conceptual diagrams for explaining the operation of the embodiment; and FIG. 5 is a conceptual diagram for explaining the conventional character cutting method.
20 ... scanning section, 21 ... image buffer, 22 ... histogram generation section, 23 ... black character frame field detection section, 24 ... character cutting section, 25 ... character recognition section.

Claims

[Scope of Claims] A character cutting device for a character reading device, comprising: scanning means for scanning a form to be read, which bears a character frame field made up of character frames partitioned one per character and characters recorded in those frames, and outputting image pattern data obtained by photoelectric conversion; buffer memory means for storing the image pattern data, corresponding to the character frame field and the characters, output from the scanning means; histogram generating means for generating, from the image pattern data stored in the buffer memory means, a histogram consisting of projection data in each of the row and column directions; character frame field detecting means for detecting the character frame field based on the histogram generated by the histogram generating means; and character cutting means for cutting out characters one at a time from the buffer memory means based on the character frame field detected by the character frame field detecting means.