JPH05210759A - Character recognizing device - Google Patents
- Publication number
- JPH05210759A (related identifiers: JP4040261A, JP4026192A)
- Authority
- JP
- Japan
- Prior art keywords
- character
- height
- character string
- image
- cpu
- Prior art date
- Legal status (as listed by Google; an assumption, not a legal conclusion)
- Pending
Landscapes
- Character Input (AREA)
Abstract
Description
[0001]
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a character recognition device that reads characters using a scanner or the like.
[0002]
2. Description of the Related Art
A conventional character recognition device reads characters line by line with a scanner or the like, segments each line into unit characters, and recognizes each unit character by comparing its pattern with predetermined reference characters. This recognition work requires determining the height of the segmented character string (line). A conventional device treats the distance between the highest point and the lowest point of the segmented character string image as the line height (character string height).
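As a minimal sketch of the conventional measurement described above, assuming the line image is a binary raster given as a list of rows (1 = ink, rows ordered top to bottom), the line height is simply the extent from the first to the last row that contains any ink. The function name `conventional_line_height` is hypothetical and not taken from the patent.

```python
from typing import Sequence


def conventional_line_height(image: Sequence[Sequence[int]]) -> int:
    """Line height as the extent from the topmost to the bottommost ink row.

    `image` is a binary line image: a sequence of rows, each a sequence of
    0/1 pixels where 1 means ink. Returns 0 if the image contains no ink.
    """
    ink_rows = [y for y, row in enumerate(image) if any(row)]
    if not ink_rows:
        return 0
    # Both boundary rows belong to the text, hence the +1.
    return ink_rows[-1] - ink_rows[0] + 1


line = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
print(conventional_line_height(line))  # 3
```

Because a single descender or a slight tilt moves the first or last ink row, this whole-line measurement is the one the following paragraphs identify as fragile.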
[0003]
Problems to Be Solved by the Invention
However, when the line height is obtained simply by comparing the highest and lowest points, the line height may not correspond to the character height if the input character string is tilted, or if characters of different heights are mixed, as in Latin-script text. This can interfere with the segmentation of unit characters.
[0004] For example, comparing FIG. 5(a) with FIG. 5(b), the two strings differ only in the words "book" and "pen", yet because the distance between the highest and lowest points of the sentence as a whole differs, they are recognized as strings of different heights.
[0005] Likewise, when the input character string image is tilted, as in FIG. 5(c), its height is recognized as larger than when it is not tilted.
[0006] This matters especially when the character size is not constant and characters are close together, as in an English document printed in a so-called proportional typeface. A common method for segmenting and recognizing unit characters in such text is to extract several candidate cut points, perform character recognition at each candidate, and adopt the candidate that gives the best result as the cut position. When determining the range within which such candidate points are extracted, the character width is estimated from the character height.
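The candidate-based segmentation described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: `best_cut` and the `score` callback are hypothetical, with `score(x)` standing in for running character recognition on the segment that ends at column `x` and returning a confidence (higher is better).

```python
from typing import Callable, Sequence


def best_cut(candidates: Sequence[int], score: Callable[[int], float]) -> int:
    """Run recognition at every candidate cut and keep the best-scoring one.

    `score` is a hypothetical recognizer-confidence callback. Ties go to the
    earliest candidate, because max() returns the first maximal element.
    """
    if not candidates:
        raise ValueError("no candidate cut points")
    return max(candidates, key=score)


# Toy recognizer that is most confident when the cut lands on column 12.
print(best_cut([10, 12, 15], lambda x: -abs(x - 12)))  # 12
```

Every candidate costs one recognition pass, so the total cost grows with the number of candidates; this is why the range in which they are extracted matters.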
[0007] However, if the character height is not determined correctly, as in the examples above, the candidate cut points cannot be determined properly and it is difficult to obtain the character width accurately.
[0008] That is, the standard character width is estimated from the character height, so if the character height cannot be extracted accurately, the standard character width is difficult to estimate. To ensure that the correct cut position always falls within the extraction range, the range must be enlarged, which produces many candidate cut positions and lowers processing efficiency. Conversely, narrowing the range to improve efficiency may leave the correct cut position outside it, degrading processing performance.
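The trade-off can be made concrete with a short sketch. It assumes, purely for illustration, that the standard character width is a fixed fraction (`width_ratio`, hypothetical) of the line height, and that the next cut is searched for between `lo` and `hi` times that width from the current position; none of these constants come from the patent.

```python
from typing import Tuple


def candidate_range(start_x: int, line_height: float, width_ratio: float = 0.6,
                    lo: float = 0.5, hi: float = 1.5) -> Tuple[int, int]:
    """Column window in which to look for the next cut point.

    The standard character width is estimated from the line height; the
    window spans lo..hi standard widths to the right of start_x.
    """
    std_width = width_ratio * line_height  # hypothetical height-to-width rule
    return start_x + round(lo * std_width), start_x + round(hi * std_width)


print(candidate_range(0, 20))  # (6, 18): a 12-column window
print(candidate_range(0, 40))  # (12, 36): height overestimated 2x
```

If tilt doubles the measured height, the window doubles in width (more candidates to test) and also shifts right, so a cut at column 8 would be missed entirely.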
[0009] A further drawback is that characters of the same shape but different sizes cannot be recognized accurately.
[0010] The present invention was made to overcome these problems of conventional character recognition devices. Its object is to provide a character recognition device that can determine character height relatively accurately even when the input character string image is tilted or contains characters of different heights.
[0011]
Means for Solving the Problems
The character recognition device of the present invention has image input means for inputting a character string image of at least one line, and comprises a CPU 1 serving as determination means that divides the character string image input by the image input means into a plurality of regions of a predetermined width or a predetermined number, extracts the character string height in each divided region, and determines the height of the input character string from the extracted data.
[0012]
Operation
In the character recognition device configured as above, the CPU 1 divides the character string image read for each line into a plurality of regions and determines the character string height for each region. This yields a character recognition device that is less sensitive to the arrangement and inclination of the character string.
[0013]
Embodiments
Embodiments of the present invention are described below with reference to the drawings.
[0014] FIG. 1 is a block diagram showing the configuration of an embodiment of the character recognition device of the present invention. Data obtained from input character string reading means such as a scanner (not shown) is transmitted to an input/output device 6, and the transmitted character image data for one line is sent via a bus line 7 to the CPU 1 serving as determination means.
[0015] The CPU 1 stores the received character image data in the image storage RAM 4 and divides the input character string into regions each having a predetermined width or a predetermined number of characters. It then obtains the character string height in each divided region and, using the RAM 3 as a calculation buffer memory, computes the average of these heights over all regions.
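A minimal sketch of the region-based measurement in [0015], assuming a binary line image given as a list of rows (1 = ink). The function names are hypothetical, strips that contain no ink are skipped (the patent does not say how blank regions are treated), and dividing into a predetermined number of regions instead of a predetermined width would simply mean choosing `region_width = ceil(n_columns / n_regions)`.

```python
from typing import List, Sequence


def region_heights(image: Sequence[Sequence[int]], region_width: int) -> List[int]:
    """Split a binary line image into vertical strips and measure each strip.

    A strip's height is the extent from its topmost to its bottommost ink
    row, measured within that strip only. Strips without ink are skipped
    (an assumption; the source does not cover blank regions).
    """
    if region_width <= 0:
        raise ValueError("region_width must be positive")
    n_cols = len(image[0]) if image else 0
    heights = []
    for x0 in range(0, n_cols, region_width):
        ink_rows = [y for y, row in enumerate(image) if any(row[x0:x0 + region_width])]
        if ink_rows:
            heights.append(ink_rows[-1] - ink_rows[0] + 1)
    return heights


def average_line_height(image: Sequence[Sequence[int]], region_width: int) -> float:
    """Line height as the mean of the per-region heights (0.0 if blank)."""
    heights = region_heights(image, region_width)
    return sum(heights) / len(heights) if heights else 0.0


img = [
    [0, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 0],
]
print(region_heights(img, 2))       # [2, 1, 2]
print(average_line_height(img, 2))  # 1.6666666666666667
```

On this toy image the whole-line extent is 3 rows, while the per-region average is about 1.67, because no single strip spans both the highest and the lowest ink.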
[0016] Next, the CPU 1 segments unit characters on the basis of the computed average character string height, according to a character segmentation control program stored in the ROM 2.
[0017] The character data segmented into unit characters is compared with reference patterns by the character image recognition unit 5, which performs character recognition.
[0018] FIG. 2 is a flowchart explaining the character recognition processing performed by the CPU 1 of FIG. 1. When a character string image is input from the scanner (not shown), the CPU 1 first stores the input image in the RAM 4 in step S1. Next, in step S2, the CPU 1 divides the character string image into regions of a predetermined width.
[0019] Next, in step S3, the CPU 1 obtains the height of each region and computes the average height for the line. Once the line height has been obtained, in step S4 the CPU 1 segments individual characters on the basis of that line height. In step S5, each segmented character undergoes character recognition processing.
[0020] In step S6, it is determined whether character recognition processing is complete. If any character has not yet been recognized, the processing from step S4 is repeated.
[0021] If step S6 determines that character recognition has been completed for all characters, then in step S7 the CPU 1 performs recognition processing on characters of different sizes to determine whether they are same-shaped characters, and in step S8 determines whether this processing is complete. If it is not complete, the processing from step S3 is repeated.
[0022] When the recognition of same-shaped characters is complete, the character recognition processing ends.
[0023] FIG. 3 shows a specific example of character string height determination by the character recognition device of the present invention. In FIG. 3, a one-line character string is divided into five regions, each, for example, 20 mm wide. The character string height is obtained for each region, and the average of these values is taken as the height of the character string.
[0024] As a result, whereas a conventional character recognition device measures about 20 mm for the tilted input of FIG. 3(b) when the same character string measures 7 mm in FIG. 3(a), the character recognition device of the present invention yields only a small difference between the two.
[0025] FIG. 4 illustrates the operation of the present invention using Latin characters printed in a proportional typeface: the height of the characters "multi" is obtained in each of the regions delimited by boundary lines A to G. Because the result of cumulatively adding the data in the height direction within each region is represented as a histogram, a portion that protrudes in the height direction indicates that at least part of a character is present, while a low portion indicates either a gap between characters or part of a horizontally drawn stroke.
[0026] Therefore, when characters are close together, as in a proportional typeface, at least the portions of each region where the cumulative height value is low can be extracted as candidate cut points.
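The histogram of [0025] corresponds to what is commonly called a vertical projection profile. The sketch below assumes a binary image with 1 = ink, counts ink pixels per column over the whole line, and returns the columns at or below a threshold as candidate cut points. The function names and the `max_ink` threshold are hypothetical. Thin horizontal strokes also produce low counts, so the result is only a candidate list for the recognition step to choose from.

```python
from typing import List, Sequence


def column_profile(image: Sequence[Sequence[int]]) -> List[int]:
    """Number of ink pixels in each column (vertical projection profile)."""
    return [sum(col) for col in zip(*image)]


def cut_candidates(image: Sequence[Sequence[int]], max_ink: int = 1) -> List[int]:
    """Columns whose ink count is at most max_ink (hypothetical threshold).

    Low columns are either gaps between characters or thin horizontal
    strokes, so they are candidates only, not final cut positions.
    """
    return [x for x, n in enumerate(column_profile(image)) if n <= max_ink]


img = [
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
]
print(column_profile(img))             # [3, 1, 0, 3, 1]
print(cut_candidates(img))             # [1, 2, 4]
print(cut_candidates(img, max_ink=0))  # [2]
```

Column 2 is a true gap, while columns 1 and 4 are thin strokes; a stricter threshold removes them here, but in real text it may also discard gaps that touch serifs.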
[0027]
Effects of the Invention
As described above, in the character recognition device of the present invention the CPU 1 divides the input character string image into a plurality of regions to determine the character height. The height of a character string can therefore be determined correctly even for a string image input at a tilt, so that character segmentation and recognition of same-shaped characters can be performed accurately and easily.
FIG. 1 is a block diagram showing the configuration of an embodiment of the character recognition device of the present invention.
FIG. 2 is a flowchart showing the operation performed by the CPU 1 in FIG. 1.
FIG. 3 is a diagram showing a specific example of character string height determination performed by the character recognition device of the present invention.
FIG. 4 is a diagram showing a specific example of character segmentation processing performed by the character recognition device of the present invention.
FIG. 5 is a diagram showing a specific example of character string height determination by a conventional character recognition device.
Reference signs: 1 CPU (determination means), 2 ROM, 3 RAM for calculation, 4 RAM for image storage, 5 character recognition unit, 6 input/output device
Claims (1)
1. A character recognition device having image input means for inputting a character string image of at least one line, the device comprising determination means for dividing the character string image input by said image input means into a plurality of regions of a predetermined width or a predetermined number, extracting the height of the character string in each divided region, and determining the height of the input character string from the extracted data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP4040261A JPH05210759A (en) | 1992-01-30 | 1992-01-30 | Character recognizing device |
Publications (1)
Publication Number | Publication Date |
---|---|
JPH05210759A true JPH05210759A (en) | 1993-08-20 |
Family
ID=12575727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP4040261A Pending JPH05210759A (en) | 1992-01-30 | 1992-01-30 | Character recognizing device |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPH05210759A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6478395A (en) * | 1987-09-19 | 1989-03-23 | Fujitsu Ltd | Character recognition device |
- 1992-01-30: application JP4040261A filed for JPH05210759A (status shown: active, pending)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103946865A (en) * | 2011-11-21 | 2014-07-23 | Nokia Corporation | Methods and apparatuses for facilitating detection of text within an image |
CN103946865B (en) * | 2011-11-21 | 2017-03-29 | Nokia Technologies Oy | Method and apparatus for contributing to the text in detection image |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6643401B1 (en) | | Apparatus and method for recognizing character |
US4903312A (en) | | Character recognition with variable subdivisions of a character region |
JP4607633B2 (en) | | Character direction identification device, image forming apparatus, program, storage medium, and character direction identification method |
JP2006031546A (en) | | Character direction identifying device, character processing device, program and storage medium |
KR20010015025A (en) | | Character extracting method |
JPH05210759A (en) | | Character recognizing device |
JP4244692B2 (en) | | Character recognition device and character recognition program |
JP4136257B2 (en) | | Character recognition device, character recognition method, and storage medium |
JP2728086B2 (en) | | Character extraction method |
JP2993533B2 (en) | | Information processing device and character recognition device |
JP3710164B2 (en) | | Image processing apparatus and method |
JP3848792B2 (en) | | Character string recognition method and recording medium |
JPH0816719A (en) | | Character segmenting character recognition method, method and device by using same |
JPH0950488A (en) | | Method for reading different size characters coexisting character string |
JPH0782524B2 (en) | | Optical character reader |
JP3071479B2 (en) | | Line spacing detection method |
JPH09106437A (en) | | Device and method for segmenting character |
JP2728085B2 (en) | | Character extraction method |
JPH0757047A (en) | | Character segmentation system |
JP2851102B2 (en) | | Character extraction method |
JPH05135204A (en) | | Character recognition device |
JPH03217993A (en) | | Character size recognizer |
JPH08339424A (en) | | Device and method for image processing |
JPH10134147A (en) | | Font discriminating device and storage medium stored with its font discriminating process |
JPH08202822A (en) | | Character segmenting device and method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
1998-04-14 | A02 | Decision of refusal | Free format text: JAPANESE INTERMEDIATE CODE: A02 |