JPH09305707A

JPH09305707A - Image extracting system

Info

Publication number: JPH09305707A
Application number: JP8118059A
Authority: JP
Inventors: Katsuhiko Takahashi; 勝彦高橋
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1996-05-13
Filing date: 1996-05-13
Publication date: 1997-11-28
Anticipated expiration: 2016-05-13
Also published as: JP2871590B2

Abstract

PROBLEM TO BE SOLVED: To extract characters and the like with high accuracy from a grayscale image that contains at least straight lines and characters. SOLUTION: An interference pixel likelihood calculating means 904 treats the pixels that belong to the straight lines identified by a ruled line removing means 901 (ruled line pixels) and that lie near a point of contact with a character as a set of one-pixel-wide pixel rows running in the line direction. For each pixel, it compares the density of the ruled line pixel with the average density of its pixel row, calculated by an average ruled line density calculating means 903, and thereby estimates the likelihood that the pixel is one where the character and the straight line overlap (an interference pixel). A character stroke likelihood calculating means 905 checks how continuously pixels with a high interference pixel likelihood are arranged and expresses this continuity as a character stroke likelihood index. An interference pixel restoring means 906 extracts pixels with a high character stroke likelihood as interference pixels.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an image extraction system, and more particularly to an image extraction system for extracting characters, figures, symbols, and the like that interfere with straight lines, such as character frames and ruled lines, in an optical character reader.

[0002]

[Prior Art] As a method of reading a form image that contains straight lines such as ruled lines together with characters, using a multi-valued (grayscale) image as the input, the technique described in Japanese Patent Application Laid-Open No. 7-334619 has been proposed. The invention disclosed in that publication assumes a form on which the characters are darker than the ruled lines, and determines a threshold for character recognition and a threshold for straight-line detection from the state of the input image. Selecting the pixels whose values exceed the character-recognition threshold then extracts the characters alone, unambiguously.

[0003] Techniques disclosed in Japanese Patent Publication No. 63-251874, Japanese Patent Application Laid-Open No. 6-309498, and elsewhere are also known as methods for extracting characters and the like that interfere with straight lines from a binary image containing straight lines, such as frame lines and ruled lines, and characters. In the methods described in these two publications, the form image is captured as a binary image, and characters and the like are extracted on the basis of features such as the line width of the character frame and the continuity of character strokes.

[0004]

[Problems to Be Solved by the Invention] The method described in Japanese Patent Application Laid-Open No. 7-334619 is very effective when the densities of the ruled lines and the characters are clearly distinct, but on actual forms the densities of ruled lines and characters are not always clearly separated. For example, even ruled lines printed with the same ink differ in gray level between thick and thin lines, and even within a single ruled line the density differs greatly between its center and its edges. If a ruled line is slightly skewed, its density value also changes subtly along the line. The density of handwritten characters likewise differs completely between parts written with heavy pen pressure and parts written with light pressure.

[0005] On the other hand, the methods based on binary images described in Japanese Patent Publication No. 63-251874, Japanese Patent Application Laid-Open No. 6-309498, and the like sometimes cannot separate characters from ruled lines unambiguously. This is because features in a binary image, such as the character frame width and the continuity of character strokes, do not always distinguish characters from ruled lines unambiguously.

[0006] To address this, the invention disclosed in Japanese Patent Publication No. 63-251874 adopts a hypothesis-verification approach in which several possible separation states are generated as hypotheses and then verified by character recognition. However, on forms made up of block character frames and ruled lines, straight lines such as frame lines and ruled lines (hereinafter both are simply called ruled lines) interfere with characters in complex ways more often than on forms made up of single-character frames, so the number of hypotheses grows exponentially. A further problem is that a very large number of separation states must be generated as hypotheses for characters to be extracted correctly even when, for example, the horizontal stroke at the bottom of the numeral "2" or the horizontal bar in the middle of the letter "A" overlaps a ruled line.

[0007] The present invention has been made in view of the above circumstances to solve the problems inherent in the prior art. Accordingly, an object of the present invention is to provide a novel image extraction system that can separate characters and the like accurately and unambiguously even from form images in which the density difference between straight lines such as ruled lines and the characters is not very clear.

[0008]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 (the first invention) is an image extraction system for extracting characters, figures, symbols, and the like from a grayscale image containing at least straight lines, such as frame lines and ruled lines, and characters, figures, or symbols. The system comprises: binarizing means for separating the straight lines (such as ruled lines), characters, figures, symbols, and the like from the background of the grayscale image to obtain a binary image; ruled line removing means for removing straight lines such as ruled lines from the binary image and storing the positions of the pixels belonging to the removed ruled lines (hereinafter, ruled line pixels) and of the pixels belonging to characters and the like contained in the binary image after ruled line removal (hereinafter, character pixels); seed pixel selecting means for selecting, at each location where ruled line pixels are adjacent to character pixels, the darkest of those adjacent ruled line pixels as a seed pixel; average ruled line density calculating means for calculating, with reference to the grayscale image, the average density of the seed pixel and the ruled line pixels lying along the line direction from it; and interference pixel restoring means that scans the pixels of the grayscale image in the line direction starting from the position of the seed pixel, judges a pixel whose value exceeds the average ruled line density plus a constant to be a pixel where a character and a straight line overlap (hereinafter, an interference pixel) and stores its coordinates, and ends the scan in that direction when a pixel's value is smaller than the average ruled line density plus the constant. When the interference pixel restoring means extracts interference pixels, the extracted interference pixels are treated in the same way as character pixels, and the processing from the seed pixel selecting means onward is applied repeatedly to the locations where they are adjacent to ruled line pixels, thereby extracting characters and the like.

[0009] The invention according to claim 3 (the second invention) is an image extraction system for extracting characters, figures, symbols, and the like from a grayscale image containing at least straight lines, such as frame lines and ruled lines, and characters, figures, or symbols. The system comprises: binarizing means for extracting the straight lines (such as ruled lines), characters, figures, and symbols, and the background, from the grayscale image to obtain a binary image; ruled line removing means for removing straight lines such as ruled lines from the binary image and storing the positions of the removed ruled line pixels and of the character pixels contained in the binary image after ruled line removal; seed pixel selecting means that, for each location where character pixels and ruled line pixels are adjacent, divides the ruled line pixels near that location into one-pixel-wide pixel rows running in the line direction and extracts the pixel with the maximum density in each pixel row as a seed pixel; average ruled line density calculating means for calculating, with reference to the grayscale image, the average density of the ruled line pixels in the pixel row to which each seed pixel belongs; interference pixel likelihood calculating means that, for each ruled line pixel near the adjacent location, calculates the likelihood that the pixel is an interference pixel from the difference between its gray value in the grayscale image and the average density of the pixel row to which it belongs; character stroke likelihood calculating means that calculates, as the character stroke likelihood of a pixel, how continuously pixels with a high interference pixel likelihood are arranged; and interference pixel restoring means that scans pixels in the direction of the ruled line to which the seed pixel belongs, starting from the seed pixel, judges a pixel whose character stroke likelihood is larger than a threshold to be an interference pixel and stores its coordinates, and ends the scan in that direction when the character stroke likelihood is smaller than the threshold.

[0010]

[Operation] Even on a form where the densities of straight lines such as ruled lines and of characters are not clearly separated, the interference pixels, where a ruled line and a character overlap, tend to be darker than the neighboring ruled line pixels on the same pixel row in the line direction.

[0011] In the invention of claim 1 (the first invention), the average density is examined, pixel row by pixel row, for those ruled line pixels (identified from geometric features and the like) that lie where interference pixels are likely to exist, namely around locations where ruled line pixels and character pixels are adjacent, and pixels darker than this average are judged to be interference pixels. As a result, interference pixels can be extracted accurately and unambiguously even from images in which the density values of ruled lines and characters are not clearly separated. Characters can then be extracted by combining the character pixels extracted by the ruled line removing means with the interference pixels. In addition, interference pixels are restored in order outward from the seed pixel, and the search in a given direction stops as soon as a pixel that is not an interference pixel is found. This prevents pixels that are not actually interference pixels from being restored by mistake, even when the input image contains some noise.

[0012] In the invention of claim 3 (the second invention), the character stroke likelihood is calculated from how continuously ruled line pixels darker than the surrounding ruled line pixels occur, and this likelihood is used as the criterion for extracting interference pixels. Consequently, even if several ruled line pixels show no detectable density difference, interference pixels can still be extracted with a smooth shape as long as the surrounding ruled line pixels show a clear density difference.

[0013]

[Embodiments] Preferred embodiments of the present invention will now be described in detail with reference to the drawings.

[0014] FIG. 1 is a block diagram of an embodiment of a system, configured according to the invention of claim 1 (the first invention), for extracting characters from a form image. FIG. 2 shows an example of an input image for this embodiment.

[0015] Referring to FIGS. 1 and 2, binarizing means 100 binarizes the input image and separates the characters and ruled lines from the background. Commonly known binarization methods include binarizing the whole image with a single uniform threshold and varying the threshold for each local region. FIG. 3 shows part of the image obtained by binarizing the input image of FIG. 2 with a uniform threshold. Ruled line removing means 101 extracts the ruled lines printed on the form from the binary image output by binarizing means 100 and removes them. Ruled line removing means 101 also stores the positions of the removed ruled line pixels. This can be done either by allocating a separate image area of the same size as the binary image to record the ruled line positions, or by writing a third value into the ruled line pixels of the binary image to form a ternary image. This embodiment uses the latter method.
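The binarization and ternary labeling can be sketched as follows. This is a minimal illustration, not the embodiment itself: it assumes gray values are densities (larger means darker) and uses hypothetical label codes 0, 1, and 2. The patent does not say how ruled lines are detected, so the row-fill test below is a crude hypothetical stand-in that only finds full-width horizontal lines.

```python
BACKGROUND, CHARACTER, RULED = 0, 1, 2  # hypothetical label codes


def binarize(gray, threshold):
    """Uniform-threshold binarization: 1 = ink, 0 = background.
    Assumes gray values are densities (larger = darker)."""
    return [[1 if v > threshold else 0 for v in row] for row in gray]


def ternary_labels(binary, min_fill=0.8):
    """Hypothetical stand-in for ruled line removal: a row whose ink
    fraction is at least min_fill is taken to be a horizontal ruled line,
    and its ink pixels receive the third value RULED. All remaining ink
    pixels become CHARACTER pixels."""
    labels = []
    for row in binary:
        is_rule = sum(row) / len(row) >= min_fill
        labels.append([
            (RULED if is_rule else CHARACTER) if v else BACKGROUND
            for v in row
        ])
    return labels
```

For a vertical stroke crossing a one-pixel ruled line, `ternary_labels(binarize(gray, 100))` marks the crossing pixel as `RULED`, which matches the note in [0016] that ruled line pixels still include the interference pixels at this stage.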

[0016] FIG. 4 shows part of the ternary image generated by ruled line removing means 101. As FIG. 4 shows, once ruled line removing means 101 has finished, the positions of background pixels 400, character pixels 401, and ruled line pixels 402 are identified (note that the ruled line pixels still include interference pixels, i.e., pixels where a character and a ruled line overlap).

[0017] Next, the operating principle of seed pixel selecting means 102 will be described with reference to FIGS. 4 and 5.

[0018] In FIG. 4, character pixels 401 and ruled line pixels 402 are adjacent at two locations. Because the interference pixels to be extracted are part of a character stroke, they are likely to lie next to pixels 401 already known to be character pixels. Furthermore, on the assumption that interference pixels are darker than the surrounding ruled line, the pixel most likely to be an interference pixel at each adjacent location, that is, the darkest one, is selected as the seed pixel. For the image of FIG. 4, for example, the two seed pixels 500 and 501 shown in FIG. 5 are selected.
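A minimal sketch of seed pixel selection, reusing the hypothetical label codes from the binarization step. The patent does not define how the touching pixels are grouped into "adjacent locations"; grouping them by 8-connectivity is an assumption made here.

```python
from collections import deque

BACKGROUND, CHARACTER, RULED = 0, 1, 2  # hypothetical label codes
N4 = ((1, 0), (-1, 0), (0, 1), (0, -1))   # 4-neighbourhood offsets


def select_seeds(gray, labels):
    """Return one seed (x, y) per adjacent location: the darkest ruled line
    pixel among those 4-adjacent to a character pixel. Grouping the
    touching pixels into locations by 8-connectivity is an assumption."""
    h, w = len(labels), len(labels[0])

    def is_char(x, y):
        return 0 <= x < w and 0 <= y < h and labels[y][x] == CHARACTER

    touching = {
        (x, y)
        for y in range(h)
        for x in range(w)
        if labels[y][x] == RULED
        and any(is_char(x + dx, y + dy) for dx, dy in N4)
    }
    seeds, seen = [], set()
    for start in sorted(touching, key=lambda p: (p[1], p[0])):
        if start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:                      # flood-fill one adjacent location
            x, y = queue.popleft()
            group.append((x, y))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    q = (x + dx, y + dy)
                    if q in touching and q not in seen:
                        seen.add(q)
                        queue.append(q)
        seeds.append(max(group, key=lambda p: gray[p[1]][p[0]]))
    return seeds
```

Two strokes touching the same ruled line far apart yield two seeds, as in FIG. 5; a two-pixel-wide stroke yields a single seed, namely the darker of its two contact pixels.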

[0019] Average ruled line density calculating means 103 calculates the average ruled line density around each seed pixel selected by seed pixel selecting means 102. For a horizontal ruled line, it calculates the average density of the $w_h$ ruled line pixels centered on the seed pixel and extending to its left and right; for a vertical ruled line, it calculates the average density of the $w_v$ ruled line pixels centered on the seed pixel and extending above and below it. Because a character generally overlaps a horizontal ruled line over a shorter length than a vertical one, the widths are chosen so that $w_h < w_v$. If fewer ruled line pixels than this are available, the average is calculated from only the ruled line pixels that are present.
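The windowed average can be sketched as below. The default window widths are illustrative values chosen only to satisfy $w_h < w_v$; the patent gives no numbers. As in claim 1, the seed pixel itself is included in the average, and window positions that fall outside the image or are not ruled line pixels are skipped.

```python
RULED = 2  # hypothetical label code


def average_ruled_density(gray, labels, seed, horizontal=True, w_h=7, w_v=15):
    """Mean density of up to w ruled line pixels centred on the seed, taken
    only from the seed's own pixel row (horizontal line) or column
    (vertical line). The defaults w_h=7, w_v=15 are illustrative
    assumptions. Out-of-image or non-ruled positions are skipped."""
    x0, y0 = seed
    half = (w_h if horizontal else w_v) // 2
    values = []
    for d in range(-half, half + 1):
        x, y = (x0 + d, y0) if horizontal else (x0, y0 + d)
        if 0 <= y < len(labels) and 0 <= x < len(labels[0]) and labels[y][x] == RULED:
            values.append(gray[y][x])
    return sum(values) / len(values)  # the seed itself is always a ruled pixel
```

With a row `[100, 100, 180, 100, 100]` and the seed at the dark pixel, a window of width 5 averages all five values (116.0); a window of width 7 gives the same result because the two extra positions lie outside the image.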

[0020] The reason the average is calculated only from pixels on the same pixel row as the seed pixel is explained with reference to FIG. 6.

[0021] FIG. 6 shows density values observed in an actual form image at a location where a horizontal ruled line crosses a vertical character stroke. Within each pixel row 600 running in the ruled-line direction, the values of the interference pixels are clearly separated from those of the other ruled line pixels, but the density values near the center of the ruled line differ from those near its edges. For example, ruled line pixel 601 at the center of the ruled line is darker than interference pixel 602 at the edge of the ruled line, so no single global threshold could separate them. This is because the area sensed by a single CCD element of the image input device has a diameter that is considerably large relative to the ruled line width. It follows that, to separate ruled line pixels into interference pixels and other pixels, it is effective to examine the density variation within each pixel row.

[0022] Next, the operating principle of interference pixel restoring means 104 will be described with reference to FIG. 7. Taking the horizontal direction as the $x$ axis and the vertical direction as the $y$ axis, interference pixel restoring means 104 starts from each seed pixel and searches pixels to the left and right for a horizontal ruled line, or upward and downward for a vertical ruled line. It compares each gray value $g(x, y)$ with the average density value $\bar{g}$ calculated for that seed pixel by average ruled line density calculating means 103, restores as an interference pixel each pixel $P(x_s + x, y)$ satisfying

$$ g(x_s + x,\, y) > \bar{g} + \alpha \qquad (1) $$

and writes a fourth value, denoting an interference pixel, into the label image. Here $x_s$ is the $x$ coordinate of the seed pixel and $\alpha$ is a constant. FIG. 7 shows the case of a horizontal ruled line. First, the density of seed pixel 700 is compared with the average; if it satisfies Eq. (1), the seed pixel is judged to be an interference pixel and the density of pixel 701 to its left is examined. If pixel 701 also satisfies Eq. (1), the search continues leftward; when a pixel that does not satisfy Eq. (1) appears, the pixels to the right of the seed pixel are searched in the same way. When the search to the right is complete, restoration of interference pixels for this seed pixel ends.
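The scan of FIG. 7 can be sketched as below. The label codes (3 standing for the "fourth value") are hypothetical, and stopping the scan at the first pixel that is not a ruled line pixel is an assumption; the text only describes stopping when Eq. (1) fails.

```python
RULED, INTERFERENCE = 2, 3  # hypothetical label codes (3 = "fourth value")


def restore_from_seed(gray, labels, seed, mean, alpha, horizontal=True):
    """Scan outward from the seed along the ruled line, first toward smaller
    coordinates (left/up), then toward larger ones (right/down). Each ruled
    pixel with g > mean + alpha (Eq. (1)) is relabelled INTERFERENCE; the
    first pixel failing Eq. (1) stops the scan in that direction. Leaving
    the ruled line (an assumption) also stops it. Returns restored (x, y)."""
    h, w = len(labels), len(labels[0])
    x0, y0 = seed
    restored = []
    if gray[y0][x0] <= mean + alpha:  # the seed itself fails Eq. (1)
        return restored
    labels[y0][x0] = INTERFERENCE
    restored.append(seed)
    for step in (-1, 1):
        d = step
        while True:
            x, y = (x0 + d, y0) if horizontal else (x0, y0 + d)
            if not (0 <= x < w and 0 <= y < h) or labels[y][x] != RULED:
                break
            if gray[y][x] <= mean + alpha:
                break
            labels[y][x] = INTERFERENCE
            restored.append((x, y))
            d += step
    return restored
```

For the row `[100, 170, 200, 175, 100]` with the seed at `x = 2`, a mean of 149, and $\alpha = 10$, the three middle pixels exceed 159 and are restored in the order seed, left, right.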

[0023] The extracted interference pixels are then treated in the same way as character pixels, and the processing from seed pixel selecting means 102 onward is repeated for the locations where they are adjacent to ruled line pixels. This restores further interference pixels, and for the image of FIG. 4 the label image of FIG. 8 is obtained. Characters can then be extracted correctly by taking only the pixels labeled as character pixels and those labeled as interference pixels. In FIG. 8, reference numeral 803 denotes interference pixels.
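The iteration described above can be sketched end to end for horizontal ruled lines only. Several simplifications are assumptions rather than the patent's specification: an adjacent location is taken to be one horizontal run of touching pixels in a single row, the average is taken over all original ruled line pixels in the window (including already-restored ones), and the loop stops after a pass that restores nothing. Each productive pass relabels at least one pixel, so the loop always terminates.

```python
BACKGROUND, CHARACTER, RULED, INTERFERENCE = 0, 1, 2, 3  # hypothetical codes
N4 = ((1, 0), (-1, 0), (0, 1), (0, -1))


def extract_characters(gray, labels, alpha=10, w_h=7):
    """Sketch of the first invention's loop (horizontal ruled lines only).
    Repeats seed selection, row-wise average density and Eq. (1)
    restoration until a pass restores nothing; restored pixels then act as
    character pixels. `labels` is modified in place. Returns a mask that is
    1 for CHARACTER or INTERFERENCE pixels."""
    h, w = len(labels), len(labels[0])
    half = w_h // 2

    def inside(x, y):
        return 0 <= x < w and 0 <= y < h

    def is_stroke(x, y):
        return inside(x, y) and labels[y][x] in (CHARACTER, INTERFERENCE)

    while True:
        touching = [(x, y) for y in range(h) for x in range(w)
                    if labels[y][x] == RULED
                    and any(is_stroke(x + dx, y + dy) for dx, dy in N4)]
        # Simplification: one adjacent location = one run of consecutive
        # touching pixels in the same row (touching is in row-major order).
        runs = []
        for p in touching:
            if runs and runs[-1][-1] == (p[0] - 1, p[1]):
                runs[-1].append(p)
            else:
                runs.append([p])
        restored_any = False
        for run in runs:
            x0, y0 = max(run, key=lambda p: gray[p[1]][p[0]])  # seed pixel
            if labels[y0][x0] != RULED:
                continue  # already restored by an earlier seed in this pass
            # Original ruled line pixels include the restored ones.
            window = [gray[y0][x] for x in range(x0 - half, x0 + half + 1)
                      if inside(x, y0) and labels[y0][x] in (RULED, INTERFERENCE)]
            limit = sum(window) / len(window) + alpha  # right side of Eq. (1)
            if gray[y0][x0] <= limit:
                continue
            labels[y0][x0] = INTERFERENCE
            restored_any = True
            for step in (-1, 1):  # scan left, then right
                x = x0 + step
                while inside(x, y0) and labels[y0][x] == RULED and gray[y0][x] > limit:
                    labels[y0][x] = INTERFERENCE
                    x += step
        if not restored_any:
            break
    return [[1 if v in (CHARACTER, INTERFERENCE) else 0 for v in row]
            for row in labels]
```

For a stroke that touches a two-row ruled line only from above, the first pass restores the top row of the crossing and the second pass, now seeded by that restored pixel, restores the row beneath it. This is the repetition that [0023] describes.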

[0024] Next, an embodiment of a system, configured according to the invention of claim 3 (the second invention), for extracting characters from a form image will be described with reference to FIG. 9.

[0025] Referring to FIG. 9, binarizing means 900 and ruled line removing means 901 have exactly the same functions as those of the first invention described above.

[0026] First, the function of seed pixel selecting means 902 will be described with reference to FIG. 10. In the image output by ruled line removing means 901, seed pixel selecting means 902 selects as seed pixels the pixels 1004 to 1007 having the maximum density value in each of the pixel rows 1000 to 1003 near every interference location where ruled-line-labeled pixels and character-labeled pixels are adjacent. The length of each pixel row, that is, the search range for the seed pixel, may be set as follows. First, ruled line pixel row 1000, adjacent to character pixel row 1008, and ruled line pixel row 1002, adjacent to character pixel row 1009, consist only of ruled line pixels that are 4-connected to character pixels. The farther a pixel row lies from character pixel rows 1008 and 1009, the somewhat longer its length is set. In this way, even when a character stroke crosses a ruled line at an acute angle, the probability increases that a pixel which really is an interference pixel is selected as a seed pixel.
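The widening search range can be sketched for one interference location on a horizontal ruled line touched from above. The widening rate `grow` is a hypothetical parameter: the patent says only that rows farther from the character are made "somewhat longer". Row 0 spans exactly the columns where the character touches the line, which matches the 4-connected first row described above.

```python
RULED = 2  # hypothetical label code


def seeds_per_row(gray, labels, rows, x_left, x_right, grow=1):
    """Second-invention seed selection for one interference location.
    `rows` lists the ruled line's pixel rows ordered away from the
    character; x_left..x_right are the columns where the character touches
    the first row. Row k is searched over x_left - k*grow .. x_right +
    k*grow (the rate `grow` is an assumption). Returns the darkest ruled
    pixel (x, y) of each row."""
    w = len(labels[0])
    seeds = []
    for k, y in enumerate(rows):
        lo = max(0, x_left - k * grow)
        hi = min(w - 1, x_right + k * grow)
        candidates = [(x, y) for x in range(lo, hi + 1) if labels[y][x] == RULED]
        if candidates:
            seeds.append(max(candidates, key=lambda p: gray[p[1]][p[0]]))
    return seeds
```

For a stroke that drifts one column per row as it crosses the line, `grow=1` lets each row's seed follow the stroke, whereas `grow=0` would pin every search to the contact column and miss the darker pixels.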

[0027] For each seed pixel selected by seed pixel selecting means 902, average ruled line density calculating means 903 calculates the average density of the ruled line pixels on the same pixel row.

[0028] Interference pixel likelihood calculating means 904 compares the gray value of each ruled line pixel on the same pixel row as a seed pixel with the average ruled line density calculated for that seed pixel, and computes the degree of confidence that each ruled line pixel is an interference pixel (its interference pixel likelihood). The interference pixel likelihood is calculated only for the $w_h$ ruled line pixels centered on the seed pixel to its left and right for a horizontal ruled line, or for the $w_v$ ruled line pixels centered on it above and below for a vertical ruled line; it need not be computed for ruled line pixels farther from the seed pixel. The interference pixel likelihood $O(x, y)$ is defined by

$$ O(x, y) = f\bigl(g(x, y) - \bar{g}\bigr) \qquad (2) $$

where $\bar{g}$ is the average ruled line density and $f$ is a monotonically increasing, sigmoid-shaped function. The upper and lower limits of the range of $f$ are denoted $O_{\max}$ and $O_{\min}$, respectively.
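Eq. (2) can be illustrated with one concrete choice of $f$: a logistic curve rescaled to $[O_{\min}, O_{\max}]$. The logistic form, the default range, and the `scale` parameter are illustrative assumptions; the patent requires only a monotonically increasing, sigmoid-shaped function.

```python
import math


def interference_likelihood(g, mean, o_min=0.0, o_max=1.0, scale=10.0):
    """Eq. (2): O = f(g - mean), with f taken here to be a logistic curve
    rescaled to [o_min, o_max]. The logistic form, the range and `scale`
    are illustrative choices; only monotonicity and a sigmoid shape come
    from the patent."""
    s = 1.0 / (1.0 + math.exp(-(g - mean) / scale))
    return o_min + (o_max - o_min) * s
```

A pixel exactly at the row average sits at the midpoint of the range, darker pixels approach $O_{\max}$, and lighter pixels approach $O_{\min}$. The `scale` parameter controls how sharply the likelihood rises around the average.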

[0029] When several interference locations between ruled lines and characters lie close together, more than one interference pixel likelihood may be defined for a single ruled line pixel; in that case the maximum value is taken as the pixel's interference pixel likelihood. For ruled line pixels at the intersection of a vertical and a horizontal ruled line, however, the minimum value is used.
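The combination rule can be sketched as a merge over per-location estimates. The input format, a flat list of `((x, y), value)` pairs plus a set of intersection pixels, is a hypothetical choice made for illustration.

```python
def merge_likelihoods(estimates, intersections):
    """estimates: iterable of ((x, y), value) pairs, possibly several per
    pixel when interference locations are close together.
    intersections: set of (x, y) where vertical and horizontal ruled lines
    cross. Keeps the maximum per pixel, except the minimum at crossings.
    Returns {(x, y): combined likelihood}."""
    merged = {}
    for p, v in estimates:
        if p not in merged:
            merged[p] = v
        elif p in intersections:
            merged[p] = min(merged[p], v)
        else:
            merged[p] = max(merged[p], v)
    return merged
```

Taking the minimum at crossings is conservative: a crossing pixel is dark because two ruled lines overlap there, so it should not be mistaken for part of a character stroke.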

[0030] For the pixels whose interference pixel likelihood has been calculated, character stroke likelihood calculating means 905 expresses how continuously pixels with a high interference pixel likelihood are arranged as a quantitative value called the "character stroke likelihood". First, the directionality within a small region is defined by the following Eq. (3).

[0031]

[0032] In this calculation, the interference pixel likelihood of character-labeled pixels is set to $O(x, y) = O_{\max}$ and that of background-labeled pixels to $O(x, y) = O_{\min}$. To make the formula easier to understand, the principle of calculating $D_0(x, y)$ is explained as an example with reference to FIG. 11. $D_0(x, y)$ is equivalent to taking the larger of the two sum-of-products results computed with two horizontally elongated filters 1100 and 1101, and it is therefore responsible for extracting horizontal strokes about two or more pixels wide. $D_{45}(x, y)$, $D_{90}(x, y)$, and $D_{135}(x, y)$ can be calculated with similar filters rotated in 45-degree steps.
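Because Eq. (3) and the filter coefficients of FIG. 11 are not reproduced in this text, the following is only a hypothetical sketch of a horizontal directionality measure in the same spirit. It uses two 2-row box filters, one covering rows y-1 and y and one covering rows y and y+1, each `2*half_len+1` pixels long, and takes the larger mean response. The filter shape, length, and normalization are all assumptions.

```python
def d0(O, x, y, half_len=2, pad=0.0):
    """Hypothetical horizontal directionality D_0(x, y) on a likelihood map
    O (list of rows). Two horizontally elongated 2-row box filters (rows
    y-1..y and rows y..y+1) are applied and the larger mean response is
    returned. Filter shape and length are assumptions; pixels outside the
    map count as `pad`."""
    h, w = len(O), len(O[0])

    def value(xx, yy):
        return O[yy][xx] if 0 <= xx < w and 0 <= yy < h else pad

    def box(rows):
        vals = [value(x + dx, yy) for yy in rows
                for dx in range(-half_len, half_len + 1)]
        return sum(vals) / len(vals)

    return max(box((y - 1, y)), box((y, y + 1)))
```

A horizontal band of high likelihood two pixels thick gives the maximum response on either of its rows, while a vertical line through the same pixel gives only a weak response. This reflects the role the text assigns to $D_0$.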

[0033] The character stroke likelihood of each pixel is then defined by

$$ L(x, y) = \max\bigl(L_0(x, y),\, L_{45}(x, y),\, L_{90}(x, y),\, L_{135}(x, y)\bigr) \qquad (4) $$

[0034]

[0035] The values $L_0(x, y)$, $L_{45}(x, y)$, $L_{90}(x, y)$, and $L_{135}(x, y)$ indicate the likelihood that a run of interference pixels exists near pixel $(x, y)$ in the horizontal, right-diagonal, vertical, and left-diagonal directions, respectively.
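Eq. (4) is a pixelwise maximum over the four directional maps. The sketch below assumes the maps are lists of rows that all have the same shape.

```python
def stroke_likelihood(L0, L45, L90, L135):
    """Eq. (4): the character stroke likelihood of each pixel is the largest
    of its four directional likelihoods (all maps share one shape)."""
    return [[max(vals) for vals in zip(r0, r45, r90, r135)]
            for r0, r45, r90, r135 in zip(L0, L45, L90, L135)]
```

Because of the maximum, a pixel scores highly if a stroke passes through it in any one of the four directions, whatever the stroke's orientation.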
(X, y), L ₉₀ (x, y), and L ₁₃₅ (x, y) are rows of interference pixels near the pixel (x, y) in the horizontal direction, the right diagonal direction, the vertical direction, and the left diagonal direction. Indicates the likelihood of existence.

[0036] To make this formula easier to understand, the method of calculating L_0(x, y) is described with reference to FIG. 12. If a horizontal run of interference pixels (that is, a character stroke) exists at pixel 1200, there should also be character strokes connected to it in its surroundings. Therefore, from each of the pixel groups 1201 to 1203 and 1204 to 1206, the pixel whose value of D in the direction shown in the figure is largest is selected, and the likelihood that a horizontal character stroke exists at pixel position 1200 is calculated from these values and the value of D_0 at pixel position 1200. L_45(x, y), L_90(x, y), and L_135(x, y) are obtained in the same way with this arrangement rotated in 45-degree steps.
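The description of L_0 above can be sketched as follows. FIG. 12 is not reproduced here and the combining formula is not given in this text, so the geometry and the combination rule are assumptions: the groups 1201 to 1203 and 1204 to 1206 are taken to be the three pixels at rows y-1, y, and y+1 in the column `OFFSET` pixels to the left and to the right of (x, y); the horizontal response D_0 is used at every neighbor (the patent uses "the direction shown in the figure", which may differ per pixel); and the three values are combined by a simple mean. The names `at` and `l0` are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # directional response map, e.g. D_0
OFFSET = 2  # assumed horizontal distance of each neighbor group from (x, y)


def at(d: Grid, x: int, y: int) -> float:
    """Map value at (x, y), or 0 outside the map."""
    return d[y][x] if 0 <= y < len(d) and 0 <= x < len(d[0]) else 0.0


def l0(d0: Grid, x: int, y: int) -> float:
    """Horizontal stroke likelihood: combine D_0 at (x, y) with the best
    response in a left and a right neighbor group (assumed geometry)."""
    left = max(at(d0, x - OFFSET, y + dy) for dy in (-1, 0, 1))   # stands in for 1201-1203
    right = max(at(d0, x + OFFSET, y + dy) for dy in (-1, 0, 1))  # stands in for 1204-1206
    return (at(d0, x, y) + left + right) / 3.0  # assumed rule: simple mean


stroke = [[1.0 if row == 2 else 0.0 for _ in range(7)] for row in range(5)]
blob = [[1.0 if (row, col) == (2, 3) else 0.0 for col in range(7)] for row in range(5)]
print(l0(stroke, 3, 2))  # 1.0: the response continues on both sides
print(l0(blob, 3, 2))    # about 0.333: an isolated response is discounted
```

Taking the minimum of the three values instead of their mean would demand continuation on both sides more strictly; either is only a stand-in for the patent's unstated formula.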

[0037] The interference pixel restoration means 906 restores pixels with a high character stroke likelihood as interference pixels. As in the first invention described above, restoration starts from the ruled-line pixel row in contact with the character label pixels and proceeds toward the opposite side of the ruled line. Within one ruled-line pixel row, pixels are searched along the ruled-line direction starting from the seed pixel: if a pixel's character stroke likelihood is larger than the threshold, the pixel is restored and the search moves on to the next pixel; if it is smaller than the threshold, the search in that direction stops.
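Within one ruled-line pixel row, the search in [0037] reduces to growing a run outward from the seed in both directions until the likelihood no longer exceeds the threshold. A minimal Python sketch, assuming the row has already been extracted as a 1-D list of character stroke likelihoods ordered along the ruled line; the function name `restore_row` is hypothetical. In the full method, rows would be processed starting from the one touching the character label pixels.

```python
from typing import List, Set


def restore_row(likelihood: List[float], seed: int, threshold: float) -> Set[int]:
    """Indices restored as interference pixels in one ruled-line pixel row.

    Starting at the seed, move left and then right; each pixel whose character
    stroke likelihood exceeds `threshold` is restored, and the search in that
    direction stops at the first pixel that does not exceed it."""
    restored: Set[int] = set()
    for step in (-1, 1):
        i = seed
        while 0 <= i < len(likelihood) and likelihood[i] > threshold:
            restored.add(i)
            i += step
    return restored


row = [0.1, 0.8, 0.9, 0.7, 0.2, 0.9]
print(sorted(restore_row(row, seed=2, threshold=0.5)))  # [1, 2, 3]
```

Index 5 is not restored even though its likelihood is high, because the rightward search already stopped at index 4; this is the intended effect of restoring only runs connected to the seed.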

[0038] An experiment was conducted in which the present invention was used to extract characters from 50 form images of forms with blue ruled lines. As a result, an extraction success rate of 94.1% was obtained for the relatively difficult cases in which ruled lines and character strokes overlap over a long section.

[0039] Although this embodiment describes an example in which the present invention is applied to form images, the invention is also effective for extracting characters, symbols, or figures from images other than forms in which straight line segments touch characters, symbols, or figures. One example is underline removal.

[0040] Next, an embodiment of a method for extracting character codes from a form image, configured according to the invention of claim 2 or claim 3, will be described with reference to the block diagram shown in FIG. 13. This embodiment is a system in which a character cutout means 1305 and a character recognition means 1306 are added after the interference pixel restoration means 1304 (104 in FIG. 1) of the configuration shown in FIG. 1. When a grayscale image such as that in FIG. 14 is input to the system shown in this block diagram, the final output image of the invention of claim 1, that is, of the interference pixel restoration means 1304, is an image containing only the characters, as shown in FIG. 15.

[0041] The character cutout means 1305 generates a separate image for each character from this image. Commonly known ways to implement this means include using the valley points of a black-pixel histogram obtained by projecting black pixels in the vertical direction, and using the result of labeling connected black pixels.
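The first technique mentioned, cutting at valleys of the vertical projection histogram, can be sketched in Python as follows. Treating any column whose black-pixel count is at or below `valley_max` as a valley is a simplification assumed here; practical implementations often also look for local minima and enforce plausible character widths. The function names are hypothetical.

```python
from typing import List, Tuple

Binary = List[List[int]]  # 1 = black pixel, 0 = white pixel


def column_histogram(img: Binary) -> List[int]:
    """Project black pixels vertically: the black-pixel count of each column."""
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def cut_at_valleys(img: Binary, valley_max: int = 0) -> List[Tuple[int, int]]:
    """Split a text-line image into [start, end) column ranges separated by
    valley columns whose black-pixel count is <= valley_max."""
    spans: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(column_histogram(img)):
        if count > valley_max and start is None:
            start = x  # a character region begins
        elif count <= valley_max and start is not None:
            spans.append((start, x))  # the region ends at a valley
            start = None
    if start is not None:
        spans.append((start, len(img[0])))
    return spans


def crop(img: Binary, span: Tuple[int, int]) -> Binary:
    """Character image for one column range."""
    return [row[span[0]:span[1]] for row in img]


line = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
]
print(cut_at_valleys(line))  # [(0, 2), (3, 5)]
```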

[0042] FIG. 16 shows an example of character images that the character cutout means 1305 has extracted, using these methods, from the top character string of the image shown in FIG. 15. The character recognition means 1306 recognizes each character image shown in FIG. 16 and outputs the corresponding character code. As shown in FIG. 15, the image output by the interference pixel restoration means 1304 restores the character shape near the places where characters interfere with straight-line components such as ruled lines to a state close to the actual shape, so a high reading rate can be obtained with conventional individual character recognition methods. By contrast, in character reading methods that use the invention disclosed in the aforementioned Japanese Patent Publication No. 63-251874, the character shape near points where straight-line components such as ruled lines interfere with characters becomes stair-stepped. Inappropriate values are then extracted as features for character recognition, which makes recognition difficult.

[0043]

[Effects of the Invention]

As described above, according to the present invention, characters, figures, and symbols can be extracted with high accuracy from grayscale images that contain straight lines, such as frame lines and ruled lines, together with characters, figures, and symbols.

[0044] Furthermore, because the characters extracted by the present invention faithfully reproduce the original character shapes, the invention can supply images of excellent quality as input to the character recognition processing that is likely to be performed at a later stage.

[Brief description of drawings]

FIG. 1 is a block diagram showing an embodiment according to the first invention.

FIG. 2 is a diagram showing an example of an input grayscale image in the first invention.

FIG. 3 is a diagram showing part of a binary image of the form shown in FIG. 2.

FIG. 4 is a diagram showing an example of a label image output by the ruled line removal means according to the first invention.

FIG. 5 is a diagram showing an example of the positions of seed pixels set by the seed pixel selection means according to the first invention.

FIG. 6 is a diagram showing an example of gray values at an intersection of a ruled line and a character.

FIG. 7 is a diagram showing the order in which the interference pixel restoration means according to the first invention searches pixels.

FIG. 8 is a diagram showing an example of a label image after interference pixels have been restored.

FIG. 9 is a block diagram showing an embodiment according to the second invention.

FIG. 10 is a diagram showing an example of the positions of seed pixels set by the seed pixel selection means in the second invention.

FIG. 11 is a diagram showing two types of filters for calculating the continuity (horizontal direction) of interference pixels in a small region.

FIG. 12 is a diagram illustrating a method of calculating the character stroke likelihood (horizontal direction).

FIG. 13 is a block diagram showing an embodiment of the invention described in claim 2.

FIG. 14 is a diagram showing an example of an input grayscale image.

FIG. 15 is a diagram showing the output image of the interference pixel restoration means when the image of FIG. 14 is input.

FIG. 16 is a diagram showing an example of character images output by the character cutout means.

[Explanation of symbols]

100 ... Binarization means
101 ... Ruled line removal means
102 ... Seed pixel selection means
103 ... Average ruled line density calculation means
104 ... Interference pixel restoration means
400 ... Background pixel
401 ... Character pixel
402 ... Ruled line pixel
500, 501 ... Seed pixel
600 ... Pixel row aligned in the ruled-line direction
601 ... Ruled line pixel at the center of a horizontal ruled line
602 ... Interference pixel near the outline of a horizontal ruled line
700 ... Seed pixel
701 ... Pixel to the left of the seed pixel
800 ... Background pixel
801 ... Character pixel
802 ... Ruled line pixel
803 ... Interference pixel
900 ... Binarization means
901 ... Ruled line removal means
902 ... Seed pixel selection means
903 ... Average ruled line density calculation means
904 ... Interference pixel likelihood calculation means
905 ... Character stroke likelihood calculation means
906 ... Interference pixel restoration means
1000, 1001, 1002, 1003 ... Seed pixel search range
1004, 1005, 1006, 1007 ... Seed pixels selected from the search ranges 1000, 1001, 1002, and 1003
1008, 1009 ... Character pixel adjacent to a ruled line pixel
1100, 1101 ... Filter for calculating the directionality (horizontal direction) of interference pixels in a small region
1200, 1201, 1202, 1203, 1204, 1205, 1206 ... Pixel
1300 ... Binarization means
1301 ... Ruled line removal means
1302 ... Seed pixel selection means
1303 ... Average ruled line density calculation means
1304 ... Interference pixel restoration means
1305 ... Character cutout means
1306 ... Character recognition means

Claims

1. An image extraction method for extracting characters, figures, symbols, and the like from a grayscale image containing at least straight lines, such as frame lines and ruled lines, and characters, figures, symbols, and the like, the method comprising: binarizing means for separating straight lines such as ruled lines, characters, figures, symbols, and the like from the background in the grayscale image to obtain a binary image; ruled line removal means for removing straight lines such as ruled lines from the binary image and storing the positions of the pixels belonging to the removed ruled lines (hereinafter, ruled line pixels) and the positions of the pixels belonging to characters and the like contained in the binary image after ruled line removal (hereinafter, character pixels); seed pixel selection means for selecting as a seed pixel, at each position where a character pixel and a ruled line pixel are adjacent, the pixel with the highest density among the ruled line pixels adjacent to the character pixel; average ruled line density calculation means for calculating, with reference to the grayscale image, the average density of the seed pixel and of the ruled line pixels lying in the straight-line direction from the seed pixel; and interference pixel restoration means for scanning the pixels of the grayscale image in the straight-line direction starting from the position of the seed pixel, determining that a pixel is one in which a character and a straight line overlap (hereinafter, an interference pixel) and storing its position coordinates when its pixel value is larger than the average ruled line density plus a constant, and ending the pixel scan in that direction when its pixel value is smaller than the average ruled line density plus the constant; wherein, when interference pixels are extracted by the interference pixel restoration means, the extracted interference pixels are treated in the same manner as character pixels, and the processing from the seed pixel selection means onward is applied repeatedly to the portions where they are adjacent to ruled line pixels, thereby extracting characters and the like.
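The scan in claim 1 can be illustrated with a small Python sketch. It assumes a horizontal ruled line lying below the character, a gray image in which larger values mean higher density (darker), seeds chosen among the three ruled line pixels directly below a character or interference pixel, and an average taken over all ruled line pixels in the seed's row. The parameter `margin` stands for the claim's constant; it and all function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Gray = List[List[float]]  # density: larger = darker
Mask = List[List[bool]]   # True where the pixel is a ruled line pixel
Pixel = Tuple[int, int]   # (x, y)


def select_seed(gray: Gray, ruled: Mask, px: Pixel) -> Optional[Pixel]:
    """Densest ruled line pixel among the three directly below px (assumed
    horizontal ruled line under the character), or None if there is none."""
    x, y = px
    if y + 1 >= len(gray):
        return None
    cands = [(x + dx, y + 1) for dx in (-1, 0, 1)
             if 0 <= x + dx < len(gray[0]) and ruled[y + 1][x + dx]]
    return max(cands, key=lambda p: gray[p[1]][p[0]]) if cands else None


def average_ruled_density(gray: Gray, ruled: Mask, y: int) -> float:
    """Mean density of the ruled line pixels in row y (includes the seed)."""
    vals = [gray[y][x] for x in range(len(gray[0])) if ruled[y][x]]
    return sum(vals) / len(vals)


def scan_from_seed(gray: Gray, ruled: Mask, seed: Pixel, margin: float) -> Set[Pixel]:
    """Scan left and right along the line from the seed; a pixel denser than
    average + margin is an interference pixel, otherwise that direction ends."""
    sx, sy = seed
    limit = average_ruled_density(gray, ruled, sy) + margin
    found: Set[Pixel] = set()
    for step in (-1, 1):
        x = sx
        while 0 <= x < len(gray[0]) and ruled[sy][x] and gray[sy][x] > limit:
            found.add((x, sy))
            x += step
    return found


def extract_interference(gray: Gray, ruled: Mask, char_px: Pixel, margin: float) -> Set[Pixel]:
    """Repeat seed selection and scanning, treating newly found interference
    pixels like character pixels, until no further pixels are found."""
    result: Set[Pixel] = set()
    frontier = {char_px}
    while frontier:
        seeds = {select_seed(gray, ruled, p) for p in frontier} - {None}
        new: Set[Pixel] = set()
        for seed in seeds:
            new |= scan_from_seed(gray, ruled, seed, margin)
        frontier = new - result
        result |= frontier
    return result


gray = [
    [0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0],  # character stroke above the line
    [0.5, 0.5, 0.9, 0.9, 0.9, 0.5, 0.5],  # ruled line, crossed at x = 2..4
    [0.5, 0.5, 0.9, 0.9, 0.9, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.9, 0.0, 0.0, 0.0],  # character stroke below the line
]
ruled = [[y in (1, 2)] * 7 for y in range(4)]
print(sorted(extract_interference(gray, ruled, (3, 0), margin=0.1)))
# [(2, 1), (2, 2), (3, 1), (3, 2), (4, 1), (4, 2)]
```

Each repetition moves one pixel row deeper into the ruled line, matching the claim's instruction to treat extracted interference pixels like character pixels and reapply the seed selection step.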

2. The image extraction method according to claim …, further comprising: character cutout means for determining, from the image containing characters and the like generated by the interference pixel restoration means, the region in which each character or the like exists and outputting character images each containing only one character; and character recognition means for recognizing the character images and outputting character codes.

3. An image extraction method for extracting characters, figures, symbols, and the like from a grayscale image containing straight lines, such as frame lines or ruled lines, and characters, figures, or symbols, the method comprising: binarizing means for separating straight lines such as ruled lines, characters, figures, symbols, and the like from the background in the grayscale image to obtain a binary image; ruled line removal means for removing straight lines such as ruled lines from the binary image and storing the positions of the removed ruled line pixels and of the character pixels contained in the binary image after ruled line removal; seed pixel selection means for dividing, at each position where a character pixel and a ruled line pixel are adjacent, the ruled line pixels near the adjacent portion into pixel rows of width 1 aligned in the straight-line direction, and extracting from each pixel row the pixel with the maximum density value as a seed pixel; average ruled line density calculation means for calculating the average density of the ruled line pixels in the pixel row to which the seed pixel belongs; interference pixel likelihood calculation means for calculating, for each ruled line pixel near the adjacent portion, the likelihood that the pixel is an interference pixel from the difference between its gray value in the grayscale image and the average density of the pixel row to which it belongs; character stroke likelihood calculation means for calculating, as the character stroke likelihood of each pixel, how continuously pixels with a high interference pixel likelihood are arranged around it; and interference pixel restoration means for scanning pixels in the direction of the ruled line to which the seed pixel belongs, starting from the seed pixel, determining that a pixel is an interference pixel and storing its position coordinates when its character stroke likelihood is larger than a threshold, and ending the pixel scan in that direction when its character stroke likelihood is smaller than the threshold.
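The formula for the interference pixel likelihood (paragraph [0031]) is not reproduced in this text, so the following Python sketch only illustrates the idea stated in claim 3 and paragraph [0032]: a ruled line pixel gets a likelihood that grows with how much denser it is than the average of its width-1 pixel row, character label pixels get O_max, and background label pixels get O_min. The linear scaling by `SCALE`, the clamping, the use of horizontal rows, and computing O for every ruled line pixel rather than only those near the adjacent portion are all assumptions; the function name is hypothetical.

```python
from typing import List

Gray = List[List[float]]  # density: larger = darker
Labels = List[List[str]]  # "c" character, "b" background, "r" ruled line

O_MAX, O_MIN = 1.0, 0.0
SCALE = 0.4  # assumed density excess that maps to O_MAX


def interference_likelihood(gray: Gray, labels: Labels) -> List[List[float]]:
    """O(x, y) for a horizontal ruled line: each image row is treated as one
    width-1 pixel row along the line direction."""
    h, w = len(gray), len(gray[0])
    out = [[O_MIN] * w for _ in range(h)]
    for y in range(h):
        ruled_vals = [gray[y][x] for x in range(w) if labels[y][x] == "r"]
        avg = sum(ruled_vals) / len(ruled_vals) if ruled_vals else 0.0
        for x in range(w):
            if labels[y][x] == "c":
                out[y][x] = O_MAX
            elif labels[y][x] == "r":
                excess = (gray[y][x] - avg) / SCALE
                out[y][x] = min(O_MAX, max(O_MIN, excess))
            # background pixels keep O_MIN
    return out


gray = [[0.5, 0.5, 0.9, 0.5], [0.0, 0.9, 0.0, 0.0]]
labels = [["r", "r", "r", "r"], ["b", "c", "b", "b"]]
o = interference_likelihood(gray, labels)
print([round(v, 2) for v in o[0]])  # [0.0, 0.0, 0.75, 0.0]
```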

4. The image extraction method according to claim …, further comprising: character cutout means for determining, from the image containing characters and the like generated by the interference pixel restoration means, the region in which each character or the like exists and outputting character images each containing only one character; and character recognition means for recognizing the character images and outputting character codes.