JPH09305707A - Image extracting system - Google Patents

Image extracting system

Info

Publication number
JPH09305707A
Authority
JP
Japan
Prior art keywords
pixel
character
ruled line
image
pixels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP8118059A
Other languages
Japanese (ja)
Other versions
JP2871590B2 (en)
Inventor
Katsuhiko Takahashi
勝彦 高橋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP8118059A
Publication of JPH09305707A
Application granted
Publication of JP2871590B2
Anticipated expiration
Expired - Lifetime

Abstract

PROBLEM TO BE SOLVED: To extract characters and the like with high precision from a grayscale image containing at least straight lines and characters. SOLUTION: Among the pixels belonging to a straight line identified by a ruled line removing means 902 (ruled line pixels), an interference pixel likelihood calculating means 904 treats those near a point of contact with a character as a set of pixel rows of width 1 running in the line direction. For each ruled line pixel in such a row, it compares the pixel's density with the average density calculated by an average ruled line density calculating means 903, and so obtains, pixel by pixel, the likelihood that the pixel is one where the character and the straight line overlap (an interference pixel). A character stroke likelihood calculating means 905 examines the continuity of the pixels with a high interference pixel likelihood and expresses it as an index of character stroke likelihood. An interference pixel restoring means 906 extracts the pixels with a high character stroke likelihood index as interference pixels.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to an image extraction system, and in particular to an image extraction system for an optical character reader that extracts characters, figures, symbols, and the like that interfere with straight lines such as character frames and ruled lines.

[0002]

[Prior Art] As a method of reading a form image containing straight lines such as ruled lines together with characters, using a multivalued image as the input image, the technique described in Japanese Patent Application Laid-Open No. 7-334619 has been proposed. The invention disclosed in that publication assumes a form in which the characters are darker than the ruled lines, and determines a threshold for character recognition and a threshold for straight-line detection from the state of the input image. Selecting the pixels whose values exceed the character recognition threshold then extracts only the characters, unambiguously.

[0003]

As methods for extracting characters and the like that interfere with straight lines from a binary image containing straight lines such as frame lines and ruled lines together with characters, the techniques disclosed in Japanese Examined Patent Publication No. 63-251874, Japanese Patent Application Laid-Open No. 6-309498, and others are known. The methods described in Japanese Examined Patent Publication No. 63-251874 and Japanese Patent Application Laid-Open No. 6-309498 capture the form image as a binary image and extract the characters from features such as the line width of the character frame and the continuity of character strokes.

[0004]

[Problems to Be Solved by the Invention] The method described in Japanese Patent Application Laid-Open No. 7-334619 is very effective when the densities of the ruled lines and the characters are clearly distinct, but in actual forms the two densities are not always clearly separated. For example, even among ruled lines printed with the same ink, thick and thin ruled lines have different gray values, and even within a single ruled line the density differs greatly between the center and the periphery. When a ruled line is slightly skewed, the density value changes gradually along the direction of the ruled line. Handwritten characters likewise differ greatly in density between portions written with pen pressure and portions written without it.

[0005]

The binary-image-based methods described in Japanese Examined Patent Publication No. 63-251874, Japanese Patent Application Laid-Open No. 6-309498, and others, on the other hand, are sometimes unable to separate characters from ruled lines unambiguously. This is because features observed on a binary image, such as character frame width and character stroke continuity, are not necessarily features that separate characters from ruled lines uniquely.

[0006]

To address this, the invention disclosed in Japanese Examined Patent Publication No. 63-251874 adopts a hypothesis verification scheme in which multiple separation states are generated as hypotheses and verified by character recognition. However, in forms composed of block character frames and ruled lines, straight lines such as frame lines and ruled lines (hereinafter the two are collectively called simply ruled lines) interfere with characters in complicated ways more often than in forms composed of single-character frames, so the number of hypotheses grows exponentially. Moreover, to extract characters correctly even when the horizontal stroke at the bottom of the digit "2" or the horizontal bar in the middle of the letter "A" overlaps a ruled line, a very large number of separation states must be generated as hypotheses.

[0007]

The present invention has been made in view of the above circumstances to solve the problems inherent in the prior art. Its object is therefore to provide a novel image extraction system that can separate characters and the like uniquely and with high accuracy even from form images in which the density difference between straight lines such as ruled lines and the characters is not very clear.

[0008]

[Means for Solving the Problems] To achieve the above object, the invention of claim 1 (the first invention) is an image extraction system for extracting characters, figures, symbols, and the like from a grayscale image containing at least straight lines such as frame lines and ruled lines together with characters, figures, symbols, or the like, comprising: binarizing means that separates the straight lines such as ruled lines, the characters, figures, symbols, and the like from the background in the grayscale image to obtain a binary image; ruled line removing means that removes the straight lines such as ruled lines from the binary image and stores the positions of the pixels belonging to the removed ruled lines (hereinafter, ruled line pixels) and of the pixels belonging to the characters and the like contained in the binary image after ruled line removal (hereinafter, character pixels); seed pixel selecting means that, at each location of adjacency, selects as a seed pixel the pixel of highest density among the ruled line pixels adjacent to the character pixels; average ruled line density calculating means that calculates, with reference to the grayscale image, the average density of the seed pixel and the ruled line pixels lying in the straight-line direction from it; and interference pixel restoring means that scans the pixels of the grayscale image in the direction of the straight line, starting from the position of the seed pixel, and, when a pixel value is larger than the average ruled line density plus a constant, judges the pixel to be one where a character and a straight line overlap (hereinafter, an interference pixel) and stores its position coordinates, whereas when the pixel value is smaller than the average ruled line density plus the constant, it ends the pixel scan in that direction. When interference pixels are extracted by the interference pixel restoring means, the extracted interference pixels are treated in the same way as character pixels, and the processing from the seed pixel selecting means onward is applied repeatedly to the locations where they adjoin ruled line pixels, thereby extracting the characters and the like.

[0009]

The invention of claim 3 (the second invention) is an image extraction system for extracting characters, figures, symbols, and the like from a grayscale image containing at least straight lines such as frame lines and ruled lines together with characters, figures, symbols, or the like, comprising: binarizing means that distinguishes the straight lines such as ruled lines, the characters, figures, and symbols from the background in the grayscale image to obtain a binary image; ruled line removing means that removes the straight lines such as ruled lines from the binary image and stores the positions of the removed ruled line pixels and of the character pixels contained in the binary image after ruled line removal; seed pixel selecting means that, at each location where the character pixels and the ruled line pixels adjoin, divides the ruled line pixels near that location into pixel rows of width 1 running in the line direction and extracts from each pixel row the pixel with the maximum density value as a seed pixel; average ruled line density calculating means that calculates, with reference to the grayscale image, the average density of the ruled line pixels in the pixel row to which the seed pixel belongs; interference pixel likelihood calculating means that calculates, for each ruled line pixel near the adjoining location, the likelihood that the pixel is an interference pixel from the difference between its gray value in the grayscale image and the average density of the pixel row to which it belongs; character stroke likelihood calculating means that calculates, as the character stroke likelihood of a pixel, the degree to which pixels of high interference pixel likelihood exist contiguously; and interference pixel restoring means that scans pixels in the direction of the ruled line to which the seed pixel belongs, starting from the seed pixel, and, when the character stroke likelihood is larger than a threshold, judges the pixel to be an interference pixel and stores its position coordinates, whereas when the character stroke likelihood is smaller than the threshold, it ends the pixel scan in that direction.

[0010]

[Operation] Even in a form where the densities of straight lines such as ruled lines and of the characters are not clearly separated, an interference pixel, where a ruled line and a character overlap, tends to be denser than the neighboring ruled line pixels lying on the same pixel row in the ruled line direction.

[0011]

In the invention of claim 1 (the first invention), among the ruled line pixels identified from geometric features and the like, those around locations where interference pixels are likely to exist, that is, around locations where ruled line pixels and character pixels adjoin, are examined for average density row by row, and pixels whose density is higher than this average are judged to be interference pixels. This makes it possible to extract interference pixels uniquely and with high accuracy even from images in which the density values of the ruled lines and the characters are not clearly separated. Combining the character pixels extracted by the ruled line removing means with the interference pixels yields the characters. In addition, interference pixels are restored in order starting from around the seed pixel, and the pixel search in a given direction is stopped as soon as a pixel that is not an interference pixel is found. As a result, even when the input image contains some noise, pixels that are not actually interference pixels are not restored by mistake.

[0012]

In the invention of claim 3 (the second invention), the character stroke likelihood is calculated from how contiguously ruled line pixels darker than the surrounding ruled line pixels exist, and this likelihood is used as the criterion for extracting interference pixels. Consequently, even if a few ruled line pixels show no detectable density difference, the interference pixels can be extracted with a smooth shape as long as ruled line pixels with a clear density difference exist around them.

[0013]

[Embodiments] Preferred embodiments of the present invention are described in detail below with reference to the drawings.

[0014]

FIG. 1 is a block diagram showing an embodiment of a system for extracting characters from a form image, configured according to the invention of claim 1 (the first invention). FIG. 2 shows an example of an input image for this embodiment.

[0015]

Referring to FIGS. 1 and 2, the binarizing means 100 binarizes the input image and separates the characters and ruled lines from the background. Commonly known binarization methods include binarizing the entire image with a uniform threshold and varying the threshold for each local region. FIG. 3 shows part of the image obtained by binarizing the input image of FIG. 2 with a uniform threshold. The ruled line removing means 101 extracts the ruled lines printed on the form from the binary image output by the binarizing means 100 and removes them. The ruled line removing means 101 also stores the positions of the removed ruled line pixels. As a concrete storage method, an image area of the same size as the binary image may be allocated to hold the ruled line positions, or a third value may be written to the ruled line pixels in the binary image to make a ternary image. This embodiment stores the ruled line positions by the latter method.
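The binarization and ternary labeling described above can be sketched roughly as follows. This is a minimal illustration, not the patent's procedure: the text does not say how ruled lines are detected, so the run-length heuristic in `label_ruled_lines`, the label values, and the parameter `min_run` are all assumptions made for the example, and only horizontal ruled lines are handled.

```python
# Label values for the ternary image (an assumed encoding).
BACKGROUND, CHARACTER, RULED_LINE = 0, 1, 2


def binarize(gray, threshold):
    """Uniform-threshold binarization: 1 where the density exceeds threshold."""
    return [[1 if v > threshold else 0 for v in row] for row in gray]


def label_ruled_lines(binary, min_run):
    """Build a ternary label image from a binary image.

    Horizontal runs of at least min_run foreground pixels are labeled as
    ruled line (an assumed heuristic); other foreground pixels are labeled
    as character pixels.
    """
    labels = [[CHARACTER if v else BACKGROUND for v in row] for row in binary]
    for y, row in enumerate(binary):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1  # advance to the end of the foreground run
                if x - start >= min_run:
                    for i in range(start, x):
                        labels[y][i] = RULED_LINE
            else:
                x += 1
    return labels
```

Note that a pixel where a stroke crosses the line is labeled as ruled line here, which is exactly the situation the later restoration steps are meant to resolve.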

[0016]

FIG. 4 shows part of the ternary image generated by the ruled line removing means 101. As FIG. 4 shows, when the ruled line removing means 101 has finished operating, the positions of the background pixels 400, the character pixels 401, and the ruled line pixels 402 have been identified. Note that the ruled line pixels include the pixels where a character and a ruled line overlap, that is, the interference pixels.

[0017]

Next, the operating principle of the seed pixel selecting means 102 is explained with reference to FIGS. 4 and 5.

[0018]

In FIG. 4 there are two locations where character pixels 401 and ruled line pixels 402 adjoin. Because an interference pixel, the target of extraction, is part of a character stroke, it is likely to lie adjacent to a pixel 401 already known to be a character pixel. Furthermore, on the assumption that an interference pixel is denser than the surrounding ruled line, the pixel most likely to be an interference pixel at each adjoining location is selected as a seed pixel. For the image of FIG. 4, for example, the two seed pixels 500 and 501 shown in FIG. 5 are selected.
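A minimal sketch of this seed selection follows. It assumes a ternary label image with the hypothetical label values below, treats a ruled line pixel as "adjacent" when it touches a character pixel in the 4-neighborhood, and groups touching contact pixels into one adjoining location by 8-connectivity. The text does not specify the grouping rule, so that part is an assumption. Coordinates are `(y, x)` tuples.

```python
# Label values for the ternary image (an assumed encoding).
BACKGROUND, CHARACTER, RULED_LINE = 0, 1, 2


def select_seed_pixels(labels, gray):
    """Return one seed pixel (y, x) per adjoining location.

    The seed is the densest ruled line pixel among those touching
    character pixels at that location.
    """
    h, w = len(labels), len(labels[0])
    # Ruled line pixels that touch a character pixel (4-neighborhood).
    contact = set()
    for y in range(h):
        for x in range(w):
            if labels[y][x] != RULED_LINE:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == CHARACTER:
                    contact.add((y, x))
                    break
    seeds = []
    while contact:
        # Flood-fill one group of mutually touching contact pixels.
        stack = [contact.pop()]
        group = []
        while stack:
            y, x = stack.pop()
            group.append((y, x))
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    n = (y + dy, x + dx)
                    if n in contact:
                        contact.remove(n)
                        stack.append(n)
        # The densest pixel of the group becomes the seed.
        seeds.append(max(group, key=lambda p: gray[p[0]][p[1]]))
    return sorted(seeds)
```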

[0019]

The average ruled line density calculating means 103 calculates the average ruled line density around each seed pixel selected by the seed pixel selecting means 102. For a horizontal ruled line, it calculates the average density of the $w_h$ ruled line pixels lying to the left and right of the seed pixel, centered on it. For a vertical ruled line, it calculates the average density of the $w_v$ ruled line pixels lying above and below the seed pixel, centered on it. In general, the overlap length with a horizontal ruled line is shorter than the overlap length with a vertical ruled line, so the values are chosen such that $w_h < w_v$. Even when fewer ruled line pixels than the above number are available for calculating the average ruled line density, the average is calculated from only the ruled line pixels that are present.
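The windowed average can be sketched as below. The text leaves open whether the window counts are totals or per-side values, so this example assumes a hypothetical `half_width` parameter giving the number of positions examined on each side of the seed. The label value and function name are also assumptions. Positions that fall outside the image or are not labeled as ruled line are skipped, which mirrors the rule that the average uses only the ruled line pixels actually present.

```python
RULED_LINE = 2  # assumed label value for ruled line pixels


def average_ruled_density(gray, labels, seed, half_width, horizontal=True):
    """Average density of ruled line pixels in a window centered on the seed.

    seed is a (y, x) tuple. The window runs left-right for a horizontal
    ruled line and up-down for a vertical one.
    """
    y0, x0 = seed
    values = []
    for off in range(-half_width, half_width + 1):
        y, x = (y0, x0 + off) if horizontal else (y0 + off, x0)
        inside = 0 <= y < len(gray) and 0 <= x < len(gray[0])
        if inside and labels[y][x] == RULED_LINE:
            values.append(gray[y][x])
    # The seed itself is a ruled line pixel, so values is never empty.
    return sum(values) / len(values)
```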

[0020]

Next, the reason for calculating the average only from pixels on the same pixel row as the seed pixel is explained with reference to FIG. 6.

[0021]

FIG. 6 shows density values observed in an actual form image at a location where a horizontal ruled line and a vertical character stroke cross. The figure shows that within a pixel row 600 running in the ruled line direction, the values of the interference pixels and of the other ruled line pixels are clearly separated, but that the density values differ between the center and the periphery of the ruled line. For example, comparing ruled line pixel 601 at the center of the ruled line with interference pixel 602 at the edge of the ruled line, ruled line pixel 601 has the higher density value. This arises because the diameter of the area over which a single CCD pixel of the image input device senses light is considerably large relative to the ruled line width. It follows that, to separate ruled line pixels into interference pixels and other pixels, it is effective to examine the density variation separately for each pixel row.

[0022]

Next, the operating principle of the interference pixel restoring means 104 is explained with reference to FIG. 7. Taking the horizontal direction as the x-axis and the vertical direction as the y-axis, the interference pixel restoring means 104 starts from each seed pixel and searches pixels to the left and right for a horizontal ruled line, or up and down for a vertical ruled line. It compares each gray value $g(x, y)$ with the average density value that the average ruled line density calculating means 103 calculated for that seed pixel, restores as an interference pixel any pixel $P(x_s + x, y)$ satisfying

$$g(x_s + x, y) > \text{average} + \alpha \tag{1}$$

and writes a fourth value, corresponding to an interference pixel, into the label image. Here $x_s$ is the x-coordinate of the seed pixel and $\alpha$ is a constant. FIG. 7 shows the case of a horizontal ruled line. First, the density of the seed pixel 700 is compared with the average; if expression (1) is satisfied, the pixel is judged to be an interference pixel and the density of the pixel 701 to its left is examined. If pixel 701 also satisfies expression (1), the search continues further to the left. When a pixel that does not satisfy expression (1) appears, the same search is carried out for the pixels to the right of the seed pixel. When the search of the right-hand pixels ends, restoration of the interference pixels for this seed pixel is complete.
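The scan governed by expression (1) can be sketched for a horizontal ruled line as follows. The label values and function name are assumptions. Two behaviors are also assumed because the text does not state them: nothing is restored when the seed itself fails expression (1), and the scan stops at pixels that are not labeled as ruled line.

```python
RULED_LINE, INTERFERENCE = 2, 3  # assumed label values (third and fourth values)


def restore_interference_row(gray, labels, seed, mean, alpha):
    """Scan left, then right, from the seed along a horizontal ruled line.

    Pixels whose density exceeds mean + alpha are relabeled as interference
    pixels; the scan in each direction ends at the first pixel that fails.
    Returns the restored (y, x) positions and updates labels in place.
    """
    y, xs = seed
    restored = []
    if gray[y][xs] <= mean + alpha:
        return restored  # assumed: a failing seed restores nothing
    labels[y][xs] = INTERFERENCE
    restored.append((y, xs))
    for step in (-1, 1):  # left first, then right, as in FIG. 7
        x = xs + step
        while (0 <= x < len(gray[y])
               and labels[y][x] == RULED_LINE
               and gray[y][x] > mean + alpha):
            labels[y][x] = INTERFERENCE
            restored.append((y, x))
            x += step
    return restored
```

Stopping at the first failing pixel means an isolated dark pixel farther along the line is not picked up from this seed.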

[0023]

The extracted interference pixels are then treated in the same way as character pixels, and the processing from the seed pixel selecting means 102 onward is repeated for the locations where they adjoin ruled line pixels. Further interference pixels are thereby restored, and for an image like that of FIG. 4 a label image like that of FIG. 8 is obtained. Extracting only the pixels bearing the character pixel label and those bearing the interference pixel label then extracts the characters correctly. In FIG. 8, reference numeral 803 denotes the interference pixels.

[0024]

Next, an embodiment of a system for extracting characters from a form image, configured according to the invention of claim 3 (the second invention), is described with reference to FIG. 9.

[0025]

Referring to FIG. 9, the binarizing means 900 and the ruled line removing means 901 have exactly the same functions as the corresponding means of the first invention described above.

[0026]

First, the function of the seed pixel selecting means 902 is explained with reference to FIG. 10. In the image output by the ruled line removing means 901, the seed pixel selecting means 902 selects, in each of the pixel rows 1000 to 1003 near every interference location where ruled line label pixels and character label pixels adjoin, the pixel of maximum density value (1004 to 1007) as a seed pixel. The length of each pixel row, that is, the search range for the seed pixel, may be determined as follows. First, the ruled line pixel row 1000 adjacent to the character pixel row 1008, and the ruled line pixel row 1002 adjacent to the character pixel row 1009, consist only of the ruled line pixels that touch character pixels under 4-connectivity. The pixel rows are then made slightly longer the farther they are from the character pixel rows 1008 and 1009. In this way, even when a character stroke crosses a ruled line at an acute angle, the probability increases that a pixel that really is an interference pixel is selected as the seed pixel.

[0027]

The average ruled line density calculating means 903 calculates, for each seed pixel selected by the seed pixel selecting means 902, the average ruled line density of the ruled line pixels lying on the same pixel row.

[0028]

The interference pixel likelihood calculating means 904 compares the gray values of the ruled line pixels lying on the same pixel row as a seed pixel with the average ruled line density calculated for that seed pixel, and calculates the confidence that each ruled line pixel is an interference pixel (the interference pixel likelihood). The interference pixel likelihood is calculated only for the $w_h$ ruled line pixels to the left and right of the seed pixel for a horizontal ruled line, or the $w_v$ ruled line pixels above and below the seed pixel for a vertical ruled line; it need not be obtained for ruled line pixels far from the seed pixel. The interference pixel likelihood $O(x, y)$ is defined by

$$O(x, y) = f\bigl(g(x, y) - \text{average}\bigr) \tag{2}$$

where $f$ is a monotonically increasing function shaped like a sigmoid function. The upper limit of the range of this function is denoted $O_{max}$ and the lower limit $O_{min}$.
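As one possible instance of expression (2), the sketch below uses a logistic function for $f$. The text only requires a monotonically increasing, sigmoid-like function, so the logistic form, the `gain` parameter, and the default range of 0 to 1 are assumptions made for illustration.

```python
import math


def interference_likelihood(g, mean, gain=1.0, o_min=0.0, o_max=1.0):
    """Map the density difference g - mean to a likelihood in [o_min, o_max].

    A logistic curve is used as one example of a sigmoid-like f.
    """
    z = gain * (g - mean)
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return o_min + (o_max - o_min) / (1.0 + math.exp(-z))
```

A pixel whose density equals the row average lands at the midpoint of the range, and denser pixels approach the upper limit.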

[0029]

When several interference locations between ruled lines and characters lie close together, more than one interference pixel likelihood may be defined for a single ruled line pixel. In that case, the maximum value is taken as the interference pixel likelihood of that pixel. For ruled line pixels at the intersection of a vertical ruled line and a horizontal ruled line, however, the minimum value is adopted.
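The combination rule above is small enough to state directly in code. The function name and the `at_intersection` flag are hypothetical, and how intersection pixels are detected is outside this sketch.

```python
def combine_likelihoods(values, at_intersection=False):
    """Merge several candidate likelihoods computed for one ruled line pixel.

    The maximum is used in general; the minimum is used for pixels at the
    intersection of a vertical and a horizontal ruled line.
    """
    return min(values) if at_intersection else max(values)
```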

[0030]

For the pixels whose interference pixel likelihood has been calculated, the character stroke likelihood calculating means 905 expresses how contiguously pixels of high interference pixel likelihood are lined up as a quantitative value called the "character stroke likelihood." First, the directionality in a small region is defined by the following equation (3).
[0031]

[0032] In this calculation, however, the interference pixel likelihood of character label pixels and background label pixels is set to O(x, y) = Omax and O(x, y) = Omin, respectively. To make this formula easier to understand, the principle of calculating D_0(x, y) is explained as an example with reference to FIG. 11. D_0(x, y) is equal to the larger of the product-sum results computed with the two horizontally elongated filters 1100 and 1101, and it therefore governs the extraction of horizontal strokes about two or more pixels wide. D_45(x, y), D_90(x, y), and D_135(x, y) can be calculated in the same way with filters rotated in 45-degree steps.
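The computation of D_0 as the larger of two product-sum results can be sketched as follows. The actual coefficients of filters 1100 and 1101 are given only in FIG. 11, so this sketch assumes two uniform filters, 5 pixels long and 2 pixels high, one covering rows y-1 and y and the other rows y and y+1. The tap layout, the normalization, and the treatment of samples outside the image are all assumptions.

```python
from typing import List, Tuple

Grid = List[List[float]]  # O[y][x], interference pixel likelihood per pixel

# Assumed tap offsets (dx, dy): each filter is 5 pixels long and 2 pixels high.
HALF_LEN = 2
FILTER_1100 = [(dx, dy) for dy in (-1, 0) for dx in range(-HALF_LEN, HALF_LEN + 1)]
FILTER_1101 = [(dx, dy) for dy in (0, 1) for dx in range(-HALF_LEN, HALF_LEN + 1)]

def product_sum(O: Grid, x: int, y: int, taps: List[Tuple[int, int]],
                o_min: float = 0.0) -> float:
    """Average of O over the filter taps; samples outside the image count as o_min."""
    h, w = len(O), len(O[0])
    total = 0.0
    for dx, dy in taps:
        xx, yy = x + dx, y + dy
        total += O[yy][xx] if 0 <= xx < w and 0 <= yy < h else o_min
    return total / len(taps)

def d0(O: Grid, x: int, y: int) -> float:
    """Horizontal directionality: the larger of the two filter responses."""
    return max(product_sum(O, x, y, FILTER_1100),
               product_sum(O, x, y, FILTER_1101))
```

Under these assumptions, a horizontal band two pixels high gives the maximum response, while a vertical band of the same thickness gives a much weaker one.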

[0033] The character stroke likelihood of each pixel is then defined by L(x, y) = MAX(L_0(x, y), L_45(x, y), L_90(x, y), L_135(x, y)) (4)

[0034]

[0035] These expressions define the character stroke likelihood. The values L_0(x, y), L_45(x, y), L_90(x, y), and L_135(x, y) indicate the likelihood that a run of interference pixels exists near pixel (x, y) in the horizontal, right diagonal, vertical, and left diagonal directions, respectively.

[0036] To make this formula easier to understand, the method of calculating L_0(x, y) is explained with reference to FIG. 12. If a horizontal run of interference pixels (that is, a character stroke) passes through pixel 1200, there should also be a character stroke connected to it in its surroundings. Therefore, from each of the groups of pixel positions 1201 to 1203 and 1204 to 1206, the position whose D value in the illustrated direction is largest is selected, and the likelihood that a horizontal character stroke exists at pixel position 1200 is calculated from these values and the value of D_0 at pixel position 1200. L_45(x, y), L_90(x, y), and L_135(x, y) are obtained in the same way with this arrangement rotated in 45-degree steps.
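A rough sketch of this idea follows. The exact positions 1201 to 1206, the direction of D used at each of them, and the rule for combining the three values appear only in FIG. 12 and in an equation not reproduced in this text. The sketch therefore assumes that the two groups are the three left and three right neighbours, that D_0 is used at every position, and that the three values are averaged; all three choices are assumptions.

```python
from typing import List

Grid = List[List[float]]  # D0[y][x], horizontal directionality per pixel

def _at(D0: Grid, x: int, y: int) -> float:
    """D0 value, or 0.0 outside the image."""
    return D0[y][x] if 0 <= y < len(D0) and 0 <= x < len(D0[0]) else 0.0

def l0(D0: Grid, x: int, y: int) -> float:
    """Horizontal stroke likelihood from the centre pixel and its best neighbours."""
    left = max(_at(D0, x - 1, y + dy) for dy in (-1, 0, 1))   # assumed positions 1201-1203
    right = max(_at(D0, x + 1, y + dy) for dy in (-1, 0, 1))  # assumed positions 1204-1206
    return (left + _at(D0, x, y) + right) / 3.0               # assumed combination rule

def stroke_likelihood(l_0: float, l_45: float, l_90: float, l_135: float) -> float:
    """Equation (4): the largest of the four directional likelihoods."""
    return max(l_0, l_45, l_90, l_135)
```

With this combination rule, an isolated pixel with a high D_0 scores lower than the same pixel inside a continuous horizontal run, which is the behaviour the paragraph describes.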

[0037] The interference pixel restoration means 906 restores pixels with a high character stroke likelihood as interference pixels. As in the first invention described above, restoration starts from the ruled line pixel row in contact with the character label pixels and proceeds toward the opposite side of the ruled line. Within one ruled line pixel row, pixels are searched in the ruled line direction starting from the seed pixel. If a pixel's character stroke likelihood is greater than the threshold, the pixel is restored and the search moves to the adjacent pixel; if it is smaller than the threshold, the search in that direction is stopped.
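The search within one ruled line pixel row can be sketched as follows. The function name is hypothetical, and the sketch assumes that the seed pixel itself is tested against the threshold and that a value exactly equal to the threshold stops the search; the text does not specify either point.

```python
from typing import List

def restore_row(likelihood: List[float], seed: int, threshold: float) -> List[int]:
    """Return the indices restored as interference pixels in one ruled line pixel row.

    The row is scanned outward from the seed in both directions, and each scan
    stops at the first pixel whose character stroke likelihood is not above
    the threshold.
    """
    restored = set()
    for step in (-1, 1):
        i = seed
        while 0 <= i < len(likelihood) and likelihood[i] > threshold:
            restored.add(i)
            i += step
    return sorted(restored)
```

Note that a pixel with a high likelihood beyond a stopping point is not restored, because the scan never reaches it.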

[0038] Using the present invention, an experiment was conducted to extract characters from 50 form images of forms with blue ruled lines. As a result, an extraction success rate of 94.1% was obtained for locations where character extraction is relatively difficult because a ruled line and a character stroke overlap over a long section.

[0039] This embodiment has described an example in which the present invention is applied to form images. The invention is also effective for images other than form images, whenever characters, symbols, or figures are to be extracted from an image in which straight line segments touch them. One example is underline removal.

[0040] Next, an embodiment of a method for extracting character codes from a form image, configured according to the invention of claim 2 or claim 3, is described with reference to the block diagram shown in FIG. 13. This embodiment is a system in which character cutout means 1305 and character recognition means 1306 are added after the interference pixel restoration means 1304 (104 in FIG. 1) shown in FIG. 1. When a grayscale image such as that of FIG. 14 is input to the system shown in this block diagram, the final output of the invention of claim 1, that is, the output image of the interference pixel restoration means 1304, is an image containing only characters, as shown in FIG. 15.

[0041] The character cutout means 1305 generates an image of each individual character from this image. Commonly known ways of implementing it include a method that uses the valley points of the black pixel histogram obtained by projecting black pixels in the vertical direction, and a method that uses the result of labeling connected black pixels.
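The projection-based approach can be sketched as follows. This is a simplified version that assumes a binary image with 1 for black pixels and treats only columns containing no black pixels as valleys; practical implementations usually also split at shallow, non-empty valleys. The function name is hypothetical.

```python
from typing import List, Tuple

def segment_columns(img: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, one per character.

    The black pixels are projected vertically into a histogram, and the
    image is split wherever the histogram drops to zero.
    """
    hist = [sum(col) for col in zip(*img)]
    spans, start = [], None
    for x, count in enumerate(hist):
        if count > 0 and start is None:
            start = x                  # a character begins
        elif count == 0 and start is not None:
            spans.append((start, x))   # a valley ends the character
            start = None
    if start is not None:
        spans.append((start, len(hist)))
    return spans
```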

[0042] FIG. 16 shows an example of the character images that the character cutout means 1305 extracts by these methods from the top character string of the image shown in FIG. 15. The character recognition means 1306 recognizes each character image shown in FIG. 16 and outputs the corresponding character code. As shown in FIG. 15, in the image output by the interference pixel restoration means 1304, the character shape near the portions where characters interfere with straight line components such as ruled lines is restored to a state close to the actual shape, so a high reading rate can be obtained with conventional individual character recognition methods. In contrast, with a character reading method that uses, for example, the invention disclosed in the aforementioned Japanese Patent Publication No. 63-251874, the character shape near the points where straight line components such as ruled lines interfere with characters becomes staircase-like. Inappropriate values are therefore extracted as the features for character recognition, which makes character recognition difficult.

[0043]

[Effects of the Invention] As described above, according to the present invention, characters, figures, and symbols can be extracted with high accuracy from a grayscale image that contains straight lines such as frame lines and ruled lines together with characters, figures, and symbols.

[0044] In addition, because the characters extracted by the present invention faithfully restore the original character shapes, the invention can provide high-quality images as input to the character recognition processing that is likely to be performed in a subsequent stage.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment according to the first invention.

FIG. 2 is a diagram showing an example of an input grayscale image in the first invention.

FIG. 3 is a diagram showing a part of a binary image of the form shown in FIG. 2.

FIG. 4 is a diagram showing an example of a label image output by the ruled line removal means according to the first invention.

FIG. 5 is a diagram showing an example of the positions of seed pixels set by the seed pixel selection means according to the first invention.

FIG. 6 is a diagram showing an example of gray values at an intersection of a ruled line and a character.

FIG. 7 is a diagram showing the order in which the interference pixel restoration means according to the first invention searches for pixels.

FIG. 8 is a diagram showing an example of a label image after restoration of interference pixels.

FIG. 9 is a block diagram showing an embodiment according to the second invention.

FIG. 10 is a diagram showing an example of the positions of seed pixels set by the seed pixel selection means in the second invention.

FIG. 11 is a diagram showing two types of filters for calculating the continuity (horizontal direction) of interference pixels in a minute area.

FIG. 12 is a diagram illustrating a method of calculating the character stroke likelihood (horizontal direction).

FIG. 13 is a block diagram showing an embodiment of the invention described in claim 2.

FIG. 14 is a diagram showing an example of an input grayscale image.

FIG. 15 is a diagram showing the output image of the interference pixel restoration means when FIG. 14 is the input image.

FIG. 16 is a diagram showing an example of character images output by the character cutout means.

[Explanation of symbols]

100 ... Binarization means
101 ... Ruled line removal means
102 ... Seed pixel selection means
103 ... Average ruled line density calculation means
104 ... Interference pixel restoration means
400 ... Background pixel
401 ... Character pixel
402 ... Ruled line pixel
500, 501 ... Seed pixels
600 ... Pixel row aligned in the ruled line direction
601 ... Ruled line pixel at the center of a horizontal ruled line
602 ... Interference pixel near the outline of a horizontal ruled line
700 ... Seed pixel
701 ... Pixel to the left of the seed pixel
800 ... Background pixel
801 ... Character pixel
802 ... Ruled line pixel
803 ... Interference pixel
900 ... Binarization means
901 ... Ruled line removal means
902 ... Seed pixel selection means
903 ... Average ruled line density calculation means
904 ... Interference pixel likelihood calculation means
905 ... Character stroke likelihood calculation means
906 ... Interference pixel restoration means
1000, 1001, 1002, 1003 ... Search ranges for seed pixels
1004, 1005, 1006, 1007 ... Seed pixels selected from search ranges 1000, 1001, 1002, 1003
1008, 1009 ... Character pixels adjacent to ruled line pixels
1100, 1101 ... Filters for calculating the directionality (horizontal direction) of interference pixels in a minute area
1200, 1201, 1202, 1203, 1204, 1205, 1206 ... Pixels
1300 ... Binarization means
1301 ... Ruled line removal means
1302 ... Seed pixel selection means
1303 ... Average ruled line density calculation means
1304 ... Interference pixel restoration means
1305 ... Character cutout means
1306 ... Character recognition means

Claims (4)

[Claims]

1. An image extraction method for extracting characters, figures, symbols, and the like from a grayscale image that includes at least straight lines such as frame lines and ruled lines together with characters, figures, symbols, or the like, comprising: binarization means for separating the straight lines such as ruled lines, the characters, figures, symbols, and the like from the background in the grayscale image to obtain a binary image; ruled line removal means for removing straight lines such as ruled lines from the binary image and storing the positions of pixels belonging to the removed ruled lines and the like (hereinafter abbreviated as ruled line pixels) and of pixels belonging to the characters and the like contained in the binary image after ruled line removal (hereinafter abbreviated as character pixels); seed pixel selection means for selecting, at each adjacent location, the pixel with the highest density among the ruled line pixels adjacent to the character pixels as a seed pixel; average ruled line density calculation means for calculating, with reference to the grayscale image, the average density of the seed pixel and the ruled line pixels lying in the straight line direction from it; and interference pixel restoration means for scanning pixels in the grayscale image in the direction of the straight line starting from the position of the seed pixel, determining a pixel to be a pixel in which a character and a straight line overlap (hereinafter referred to as an interference pixel) and storing its position coordinates when its pixel value is greater than the value obtained by adding a constant to the average ruled line density, and ending the pixel scan in that direction when the pixel value is smaller than the value obtained by adding the constant to the average ruled line density; wherein, when an interference pixel is extracted by the interference pixel restoration means, the extracted interference pixel is treated in the same way as a character pixel, and the processing from the seed pixel selection means onward is applied repeatedly to the locations where it adjoins ruled line pixels, thereby extracting the characters and the like.

2. The image extraction method according to claim 1, further comprising: character cutout means for determining the region occupied by each character or the like in the image containing characters and the like generated by the interference pixel restoration means, and outputting character images each containing only one character; and character recognition means for recognizing the character images and outputting character codes.

3. An image extraction method for extracting characters, figures, symbols, and the like from a grayscale image that includes at least straight lines such as frame lines and ruled lines together with characters, figures, symbols, or the like, comprising: binarization means for extracting straight lines such as ruled lines, characters, figures, and symbols, and the background, from the grayscale image to obtain a binary image; ruled line removal means for removing straight lines such as ruled lines from the binary image and storing the positions of the removed ruled line pixels and of the character pixels contained in the binary image after ruled line removal; seed pixel selection means for dividing, at each location where the character pixels and the ruled line pixels adjoin, the ruled line pixels near that location into pixel rows of width 1 aligned in the straight line direction, and extracting from each pixel row the pixel with the maximum density value as a seed pixel; average ruled line density calculation means for calculating, with reference to the grayscale image, the average density of the ruled line pixels in the pixel row to which the seed pixel belongs; interference pixel likelihood calculation means for calculating, for each ruled line pixel near the adjacent location, the likelihood that the pixel is an interference pixel from the difference between its gray value in the grayscale image and the average density of the pixel row to which it belongs; character stroke likelihood calculation means for calculating, as the character stroke likelihood of a pixel, the degree to which pixels with high interference pixel likelihood occur continuously; and interference pixel restoration means for scanning pixels in the direction of the ruled line to which the seed pixel belongs, starting from the seed pixel, determining a pixel to be an interference pixel and storing its position coordinates when its character stroke likelihood is greater than a threshold, and ending the pixel scan in that direction when the character stroke likelihood is smaller than the threshold.

4. The image extraction method according to claim 3, further comprising: character cutout means for determining the region occupied by each character or the like in the image containing characters and the like generated by the interference pixel restoration means, and outputting character images each containing only one character; and character recognition means for recognizing the character images and outputting character codes.
JP8118059A 1996-05-13 1996-05-13 Image extraction method Expired - Lifetime JP2871590B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP8118059A JP2871590B2 (en) 1996-05-13 1996-05-13 Image extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP8118059A JP2871590B2 (en) 1996-05-13 1996-05-13 Image extraction method

Publications (2)

Publication Number Publication Date
JPH09305707A true JPH09305707A (en) 1997-11-28
JP2871590B2 JP2871590B2 (en) 1999-03-17

Family

ID=14727006

Family Applications (1)

Application Number Title Priority Date Filing Date
JP8118059A Expired - Lifetime JP2871590B2 (en) 1996-05-13 1996-05-13 Image extraction method

Country Status (1)

Country Link
JP (1) JP2871590B2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6778712B1 (en) 1999-12-20 2004-08-17 Fujitsu Limited Data sheet identification device
JP2005258683A (en) * 2004-03-10 2005-09-22 Fujitsu Ltd Character recognition device, character recognition method, medium processing method, character recognition program, and computer readable recording medium recording character recognition program
US7796817B2 (en) 2006-09-14 2010-09-14 Fujitsu Limited Character recognition method, character recognition device, and computer product
US8854691B2 (en) 2011-02-25 2014-10-07 Murata Machinery Ltd. Image processing apparatus and image processing method for extracting a line segment
CN112488108A (en) * 2020-12-11 2021-03-12 广州小鹏自动驾驶科技有限公司 Parking space number identification method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
JP2871590B2 (en) 1999-03-17

Similar Documents

Publication Publication Date Title
JP3904840B2 (en) Ruled line extraction device for extracting ruled lines from multi-valued images
EP0632402B1 (en) Method for image segmentation and classification of image elements for document processing
US8644616B2 (en) Character recognition
JP6080259B2 (en) Character cutting device and character cutting method
JP2951814B2 (en) Image extraction method
US8947736B2 (en) Method for binarizing scanned document images containing gray or light colored text printed with halftone pattern
JP2005523530A (en) System and method for identifying and extracting character string from captured image data
WO2011128777A2 (en) Segmentation of textual lines in an image that include western characters and hieroglyphic characters
CN116071763B (en) Teaching book intelligent correction system based on character recognition
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
JP3411472B2 (en) Pattern extraction device
JP2011248702A (en) Image processing device, image processing method, image processing program, and program storage medium
JP4049560B2 (en) Halftone dot removal method and system
JPH09305707A (en) Image extracting system
JP2868134B2 (en) Image processing method and apparatus
CN109086766B (en) Multi-threshold fusion crown word number extraction method based on integral graph
JPH10261047A (en) Character recognition device
JP2003317107A (en) Method and device for ruled-line detection
CN109086769B (en) Method for identifying fracture adhesion laser printing digit string
JP3343305B2 (en) Character extraction device and character extraction method
Jambekar A Review of Optical Character Recognition System for Recognition of Printed Text
JP3634248B2 (en) Character area extraction method, character area extraction apparatus, and recording medium
Selvam et al. Enhancing Text Detection in Natural Scenes: A Hybrid Approach with MSER, Connected Components, and Norm-CLAHE
Siddique et al. An absolute Optical Character Recognition system for Bangla script Utilizing a captured image
Deivalakshmi A simple system for table extraction irrespective of boundary thickness and removal of detected spurious lines

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080108

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20090108

Year of fee payment: 10

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100108

Year of fee payment: 11

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110108

Year of fee payment: 12

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120108

Year of fee payment: 13

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130108

Year of fee payment: 14

EXPY Cancellation because of completion of term