JPH05210759A

JPH05210759A - Character recognizing device

Info

Publication number: JPH05210759A
Application number: JP4040261A
Authority: JP
Inventors: Yoshiaki Asougawa; 佳誠麻生川
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 1992-01-30
Filing date: 1992-01-30
Publication date: 1993-08-20

Abstract

PURPOSE: To determine character height comparatively accurately, even when an input character image is tilted or characters of different heights are mixed, by dividing a read character string image into plural regions and determining the height of the character string in each region.

CONSTITUTION: Character image data for one line is sent from an I/O device 6 to a CPU 1. The CPU 1 stores the character image data in an image storage RAM 4 and divides the input character string into plural regions, each of a prescribed width or a prescribed number of characters. The height of the character string in each region is then found, and the average height over all regions is calculated using a RAM 3 as an arithmetic buffer memory. Based on this average character string height, the CPU 1 segments unit characters under a character segmentation control program stored in a ROM 2. The segmented unit character data is recognized by a character image recognizing part 5.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a character recognition device that reads characters using a scanner or the like.

[0002]

[Prior Art] A conventional character recognition device reads characters, for example one line at a time, using a scanner or the like, segments the line into unit characters, and recognizes each segmented unit character by pattern comparison with predetermined reference characters. This recognition work requires determining the height of the segmented character string (line). Conventional devices treat the distance between the highest point and the lowest point of the segmented character string image as the line height (character string height).
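The conventional measure described above can be sketched in a few lines. This is only an illustration: the bitmap representation (rows of 0/1 pixels) and the names `line_height` and `to_bitmap` are assumptions, not taken from the publication.

```python
def line_height(image):
    """Conventional height: rows spanned from the topmost to the bottommost ink pixel.

    `image` is a list of equal-length rows; 1 marks ink, 0 marks background
    (an assumed representation).
    """
    ink_rows = [y for y, row in enumerate(image) if any(row)]
    return ink_rows[-1] - ink_rows[0] + 1 if ink_rows else 0


def to_bitmap(rows):
    """Helper for the example: '#' becomes 1, anything else becomes 0."""
    return [[1 if c == "#" else 0 for c in row] for row in rows]


no_descender = to_bitmap([
    "#....",
    "#.##.",
    "####.",
    ".....",
])
with_descender = to_bitmap([
    "#....",
    "#.##.",
    "####.",
    "..#..",
])

print(line_height(no_descender), line_height(with_descender))  # prints "3 4"
```

A single stroke that reaches below the baseline changes the height reported for the whole line, even though every other character is unchanged.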

[0003]

[Problems to Be Solved by the Invention] However, when the line height is obtained simply from the highest and lowest points, the line height may not correspond to the character height if the input character string is tilted or if characters of different heights are mixed, as in Western-language text. This can interfere with the segmentation of unit characters.

[0004] For example, comparing FIG. 5(a) with FIG. 5(b), the two sentences differ only in the words "book" and "pen", yet the distance between the highest and lowest points of each sentence differs, so they are recognized as character strings of different heights.

[0005] Also, as shown in FIG. 5(c), when the input character string image is tilted, its height is recognized as larger than when it is not tilted.

[0006] In particular, for text such as English documents printed in so-called proportional type, where character size is not constant and characters are close together, a common method for segmenting and recognizing unit characters is to extract several segmentation candidate points, perform character recognition at each candidate point, and adopt the candidate point with the best recognition result as the segmentation position. When determining the range from which such candidate points are extracted, the character width is estimated from the character height.

[0007] However, if the character height is not determined correctly, as in the examples above, the segmentation candidate points cannot be set properly and it becomes difficult to obtain the character width accurately.

[0008] That is, the standard character width is estimated from the character height, so if the character height cannot be extracted accurately, estimating the standard character width becomes difficult. To ensure that the correct segmentation position falls within the extraction range, the range must be enlarged; many candidate positions are then extracted and processing efficiency drops. Conversely, an attempt to improve processing efficiency may leave the correct segmentation position outside the extraction range, degrading processing performance.

[0009] Furthermore, conventional devices had the drawback that characters of the same shape but different sizes could not be recognized accurately.

[0010] The present invention has been made to overcome these problems of conventional character recognition devices. Its object is to provide a character recognition device that can determine character height comparatively accurately even when the input character string image is tilted or characters of different heights are mixed.

[0011]

[Means for Solving the Problems] The character recognition device of the present invention has image input means for inputting a character string image of at least one line, and is characterized by comprising a CPU 1 serving as determination means that divides the character string image input by the image input means into a plurality of regions of a predetermined width or a predetermined number, extracts the height of the character string in each divided region, and determines the height of the input character string from the extracted data.

[0012]

[Operation] In the character recognition device configured as above, the CPU 1 divides the character string image read for each line into a plurality of regions and determines the height of the character string in each region. This makes it possible to realize a character recognition device that is less affected by the arrangement or tilt of the character string.

[0013]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0014] FIG. 1 is a block diagram showing the configuration of one embodiment of the character recognition device of the present invention. Data obtained from input character string reading means such as a scanner (not shown) is transmitted to an input/output device 6, and the transmitted character image data for one line is sent via a bus line 7 to the CPU 1, which serves as the determination means.

[0015] The CPU 1 stores the character image data in the image storage RAM 4 and divides the input character string into regions, each of a predetermined width or a predetermined number of characters. It then obtains the height of the character string in each region and, using the RAM 3 as a buffer memory for calculation, obtains the average of the heights over all regions.
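The region-wise averaging can be sketched as follows, assuming the line image is a list of rows of 0/1 pixels. The function names, the fixed pixel region width, the rule of skipping blank regions, and the synthetic tilted band used as test input are all illustrative assumptions; the publication does not specify these details.

```python
def extent_height(image, x0, x1):
    """Rows spanned by ink within columns [x0, x1); 0 if the slice is blank."""
    ink_rows = [y for y, row in enumerate(image) if any(row[x0:x1])]
    return ink_rows[-1] - ink_rows[0] + 1 if ink_rows else 0


def average_region_height(image, region_width):
    """Split the line into fixed-width regions and average their heights.

    Blank regions are skipped (an assumption; the source does not say
    how empty regions are treated).
    """
    width = len(image[0])
    heights = [
        extent_height(image, x0, min(x0 + region_width, width))
        for x0 in range(0, width, region_width)
    ]
    heights = [h for h in heights if h > 0]
    return sum(heights) / len(heights) if heights else 0.0


def tilted_band(width, thickness, step):
    """Synthetic tilted text line: a band that moves down one row every `step` columns."""
    rows = thickness + (width - 1) // step
    image = [[0] * width for _ in range(rows)]
    for x in range(width):
        top = x // step
        for y in range(top, top + thickness):
            image[y][x] = 1
    return image


band = tilted_band(width=100, thickness=7, step=8)
whole = extent_height(band, 0, 100)         # whole-line measure: 19
regional = average_region_height(band, 20)  # average over five regions: 9.0
print(whole, regional)
```

For this synthetic input the whole-line measure is 19 rows while the regional average is 9.0, much nearer the band's actual thickness of 7.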

[0016] Next, using the obtained average character string height, the CPU 1 segments unit characters under the character segmentation control program stored in the ROM 2.

[0017] The character data segmented into unit characters is compared with reference patterns by the character image recognition unit 5, and character recognition is performed.

[0018] FIG. 2 is a flowchart explaining the character recognition processing performed by the CPU 1 of FIG. 1. First, when a character string image is input from a scanner (not shown), the CPU 1 stores the input image in the RAM 4 in step S1. Next, in step S2, the CPU 1 divides the character string image into regions of a predetermined width.

[0019] Next, in step S3, the CPU 1 obtains the height of each region and calculates the average height for the line. Once the line height has been obtained, the CPU 1 segments individual characters in step S4 based on that line height. The segmented characters undergo character recognition processing in step S5.

[0020] Step S6 determines whether character recognition processing is complete. If any characters remain unrecognized, the processing from step S4 is repeated.

[0021] If step S6 determines that character recognition processing is complete for all characters, the CPU 1 performs recognition processing in step S7 to decide, for characters of different sizes, whether they are characters of the same shape, and step S8 determines whether this processing is complete. If it is not complete, the processing from step S3 is repeated.

[0022] When recognition of same-shaped characters is complete, the character recognition processing ends.

[0023] FIG. 3 shows a specific example of character string height determination by the character recognition device of the present invention. In the figure, one line of text is divided into five regions, each for example 20 mm wide. The height of the character string is obtained for each region, and the average of these values is taken as the height of the character string.

[0024] As a result, if the height of the character string in FIG. 3(a) is 7 mm, a conventional character recognition device would measure as much as about 20 mm for the same character string input at a tilt in FIG. 3(b), whereas with the character recognition device of the present invention the difference between the two is slight.
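The figures quoted for FIG. 3 can be checked with a rough geometric model. The sketch below assumes, beyond what the publication states, that the text line is a straight band 7 mm thick, that the five 20 mm regions cover a 100 mm line, and that the vertical extent of a tilted band within a slice of width w is w·tan θ + h / cos θ. The function names are hypothetical.

```python
import math


def slice_extent(width, thickness, theta):
    """Vertical extent of a tilted band inside a slice `width` wide.

    Assumed model: the text line is a straight band whose untilted
    height is `thickness`, rotated by `theta` radians.
    """
    return width * math.tan(theta) + thickness / math.cos(theta)


def solve_tilt(width, thickness, target, iterations=60):
    """Bisection for the tilt at which the slice extent equals `target`."""
    lo, hi = 0.0, math.pi / 4  # the extent grows monotonically on this interval
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if slice_extent(width, thickness, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


LINE_WIDTH_MM = 100.0    # five regions of 20 mm (assumed to span the whole line)
REGION_WIDTH_MM = 20.0
UPRIGHT_HEIGHT_MM = 7.0
TILTED_HEIGHT_MM = 20.0  # whole-line height quoted for the tilted input

theta = solve_tilt(LINE_WIDTH_MM, UPRIGHT_HEIGHT_MM, TILTED_HEIGHT_MM)
per_region = slice_extent(REGION_WIDTH_MM, UPRIGHT_HEIGHT_MM, theta)
print(f"tilt = {math.degrees(theta):.1f} deg, per-region height = {per_region:.1f} mm")
```

Under these assumptions the tilt comes out at about 7.4 degrees and each 20 mm region measures roughly 9.6 mm, far closer to the upright 7 mm than the 20 mm whole-line value.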

[0025] FIG. 4 is a diagram explaining the operation of the present invention, using Western characters printed in a proportional typeface as an example. The height of the characters "multi" is obtained in each of the regions delimited by boundary lines A to G. Because the result of cumulatively adding the height-direction data in each region is represented as a histogram, a portion that projects strongly in the height direction means that at least part of a character is present, while a low portion means either a gap between characters or part of a character drawn horizontally.

[0026] Therefore, even when characters are close together, as in a proportional typeface, at least the portions where the cumulative height value in each region is low can be extracted as segmentation candidate points.
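One way to read the histogram described above is as a column-wise ink count (a vertical projection profile), with candidate points taken from runs of low values. The sketch below follows that reading; the threshold rule, the choice of the run centre, and the names `column_profile` and `cut_candidates` are assumptions rather than details given in the publication.

```python
def column_profile(image):
    """Number of ink pixels in each column (a vertical projection profile)."""
    return [sum(col) for col in zip(*image)]


def cut_candidates(profile, threshold):
    """Centre column of every run whose profile value is <= threshold."""
    candidates, start = [], None
    # The appended sentinel exceeds the threshold, so a trailing run is closed.
    for x, value in enumerate(profile + [threshold + 1]):
        if value <= threshold and start is None:
            start = x
        elif value > threshold and start is not None:
            candidates.append((start + x - 1) // 2)
            start = None
    return candidates


glyphs = [
    "#..#.#",
    "#..#.#",
    "####.#",
]
image = [[1 if c == "#" else 0 for c in row] for row in glyphs]

profile = column_profile(image)
print(profile)                     # [3, 1, 1, 3, 0, 3]
print(cut_candidates(profile, 0))  # [4]
print(cut_candidates(profile, 1))  # [1, 4]
```

With a threshold of 0 only the true gap at column 4 is found. Raising the threshold to 1 also yields column 1, which lies on a horizontal stroke, matching the remark that a low value may be either a gap or a horizontally drawn part of a character. Each candidate would then be tested by recognition.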

[0027]

[Effects of the Invention] As described above, according to the character recognition device of the present invention, the CPU 1 divides the input character string image into a plurality of regions and determines the character height, so the height of the character string can be determined correctly even for a character string image input at a tilt, and character segmentation and recognition of same-shaped characters can be performed accurately and easily.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of one embodiment of the character recognition device of the present invention.

FIG. 2 is a flowchart showing the operation performed by the CPU 1 of FIG. 1.

FIG. 3 is a diagram showing a specific example of character string height determination performed by the character recognition device of the present invention.

FIG. 4 is a diagram showing a specific example of character segmentation processing performed by the character recognition device of the present invention.

FIG. 5 is a diagram showing a specific example of character string height determination by a conventional character recognition device.

[Explanation of symbols]

1 CPU (determination means)
2 ROM
3 RAM for calculation
4 RAM for image storage
5 Character recognition unit
6 Input/output device

Claims

[Claims]

1. A character recognition device having image input means for inputting a character string image of at least one line, characterized by comprising determination means that divides the character string image input by said image input means into a plurality of regions of a predetermined width or a predetermined number, extracts the height of the character string in each divided region, and determines the height of the input character string from the extracted data.