JP3734614B2

JP3734614B2 - Image processing method, apparatus, and recording medium

Info

Publication number: JP3734614B2
Application number: JP03609098A
Authority: JP
Inventors: 裕子杉浦
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1998-02-18
Filing date: 1998-02-18
Publication date: 2006-01-11
Anticipated expiration: 2018-02-18
Also published as: JPH11232386A

Description

[0001]
[Technical field of the invention]
The present invention relates to an image processing method, apparatus, and recording medium with improved ruled line identification accuracy.
[0002]
[Prior art]
As a method for recognizing the dotted ruled lines that make up a table, there is, for example, the ruled line recognition method described in JP-A-7-230525. This method extracts rectangles no larger than a predetermined threshold as dotted line elements, and extracts as a dotted ruled line the rectangle obtained by merging elements whose spacing is within a predetermined threshold. The validity of the result as a ruled line is also judged from the variances of the rectangle sizes and of the spacing between rectangles.
[0003]
When ruled lines or characters are recognized, if the image to be processed is, for example, too dark or blurred, the algorithms and thresholds used in the recognition process may be unable to cope, and recognition accuracy may fall. Image characteristics can, however, be adjusted at input time: by changing the binarization threshold used when the image is input with an input device such as a scanner, the density of the input image can be corrected so as to obtain image characteristics suited to the recognition algorithm.
[0004]
[Problems to be solved by the invention]
In the method described in the above publication, if rectangles are extracted with a predetermined threshold, a rectangle that forms part of a character, for example, may be erroneously extracted as a component of a broken line. Moreover, re-inputting an image merely because the binary data already input has characteristics that do not suit the algorithm is very laborious.
[0005]
Therefore, if the characteristics of the image to be processed can be grasped as pre-processing for the recognition of ruled lines and characters, and that characteristic information can be used, the algorithm, thresholds, and so on can be changed to suit that information, and recognition accuracy can be improved.
[0006]
The present invention has been made on the basis of the above considerations.
An object of the present invention is to provide an image processing method, apparatus, and recording medium that improve the identification accuracy of ruled lines and characters by determining, as pre-processing for ruled line and character recognition, whether the input image is a blurred image (blurring being one characteristic of an input image) and by extracting blurred areas.
[0007]
[Means for Solving the Problems]
In order to achieve the above object, the invention according to claim 1 is characterized in that rectangles of black pixel continuous components are extracted from binarized image data, rectangles corresponding to dotted line elements are extracted from the extracted rectangles, dotted line elements within a predetermined distance of one another among the extracted dotted line elements are combined, the dotted line elements that were not combined as a result of this processing are counted, and the image is determined to be a blurred image when the count value is equal to or greater than a predetermined value.
[0008]
The invention according to claim 2 is characterized in that rectangles of black pixel continuous components are extracted from binarized image data, rectangles corresponding to dotted line elements are extracted from the extracted rectangles, dotted line elements within a predetermined distance of one another among the extracted dotted line elements are combined, the dotted line elements that were not combined as a result of this processing are counted, the image is determined to be a blurred image when the count value is equal to or greater than a predetermined value, and, for the image so determined, a blurred area is extracted by integrating the dotted line elements that were not combined.
[0009]
The invention according to claim 3 is characterized in that a dotted line existing in the blurred area is removed as a pseudo dotted line.
[0010]
The invention according to claim 4 is characterized in that, when the size of the blurred area is equal to or greater than a predetermined value, a warning is given that the image is a blurred image.
[0011]
The invention according to claim 5 is characterized by comprising: means for extracting rectangles of black pixel continuous components from binarized image data; means for extracting rectangles corresponding to dotted line elements from the extracted rectangles; means for combining dotted line elements within a predetermined distance of one another among the extracted dotted line elements; means for counting the dotted line elements that were not combined as a result of the combining process; and means for determining that the image is a blurred image when the count value is equal to or greater than a predetermined value.
[0012]
The invention according to claim 6 is characterized by comprising: means for extracting rectangles of black pixel continuous components from binarized image data; means for extracting rectangles corresponding to dotted line elements from the extracted rectangles; means for combining dotted line elements within a predetermined distance of one another among the extracted dotted line elements; means for counting the dotted line elements that were not combined as a result of the combining process; means for determining that the image is a blurred image when the count value is equal to or greater than a predetermined value; and means for extracting, for the image so determined, a blurred area by integrating the dotted line elements that were not combined.
[0013]
The invention according to claim 7 is a computer-readable recording medium on which is recorded a program for causing a computer to implement the image processing method according to any one of claims 1 to 4.
[0014]
[Embodiments of the invention]
Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.
Before describing each embodiment of the present invention, the terms used in the present invention are first defined.
Rectangle: A run of connected pixels in an image, or an image portion that is continuous for at least a predetermined threshold (for example, a continuous black pixel portion or a continuous white pixel portion in the case of a binary image), is treated as one block, and the range enclosed by the circumscribed rectangle that touches and contains the block is defined as a rectangle.
[0015]
Rectangle extraction: Extracting the position coordinates of a rectangle is defined as rectangle extraction.
Single element dotted line: A dotted line consisting of one dotted line element; that is, it is synonymous with a dotted line element that could not be combined.
Dotted line: The result of combining dotted line elements.
[0016]
<Example 1>
When determining from binary input data whether an image is a blurred image, one characteristic of a blurred image is that ruled lines and characters are blurred and therefore broken up, so the number of rectangles is large. However, an image that contains dotted lines or halftone dots also has a large number of rectangles, so simply counting rectangles cannot tell whether the image is a blurred image or a clear image containing many dotted lines. If dotted line element rectangles can be distinguished from rectangles caused by blurring, the image can be judged to be a blurred image when the number of rectangles caused by blurring exceeds a threshold.
[0017]
Therefore, the first embodiment uses the fact that a blurred image contains many rectangles: rectangles that have become discontinuous through blurring are distinguished from dotted line element and halftone dot element rectangles, the rectangles produced by blurring are counted, and whether the image is a blurred image is determined from the count value.
[0018]
FIG. 1 shows the configuration of Embodiment 1 of the present invention. FIG. 2 shows a process flowchart of the first embodiment of the present invention. An image such as a document is input by the binary image input unit 1 such as a scanner and stored in the binary image memory 2 (step 101). The rectangle extraction unit 3 extracts the rectangle of the black pixel continuous component from the binary image memory 2 and stores it in the rectangle memory 4 (step 102).
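The rectangle extraction of step 102 can be illustrated with a minimal Python sketch. The text does not say how connected black pixels are traced, so the 8-connectivity, the list-of-lists image format (1 for black, 0 for white), and the function name `extract_rectangles` are all assumptions made here for illustration only.

```python
from collections import deque


def extract_rectangles(image):
    """Return bounding boxes (x0, y0, x1, y1), inclusive, of black-pixel components.

    `image` is a list of rows holding 1 (black) or 0 (white).
    8-connectivity is an assumption; the source does not specify it.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    rects = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over one connected component.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            rects.append((x0, y0, x1, y1))
    return rects


sample = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
rectangles = extract_rectangles(sample)
```

In this sketch the returned list plays the role of the rectangle memory 4.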
[0019]
The dotted line extraction unit 5 consists of a dotted line element selection unit 6 and a dotted line element combination processing unit 8. The dotted line element selection unit 6 selects, from the rectangles held in the rectangle memory 4, rectangles whose size corresponds to a dotted line element, and stores them in the dotted line element memory 7 (step 103).
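As a rough illustration of step 103, the selection can be sketched as a size filter. The text only says "a size corresponding to a dotted line element", so the `max_size` limit, its default of 4 pixels, and the function name are hypothetical.

```python
def select_dotted_line_elements(rects, max_size=4):
    """Keep rectangles (x0, y0, x1, y1) whose width and height are both small.

    `max_size` is a hypothetical threshold in pixels; the source gives no value.
    """
    selected = []
    for x0, y0, x1, y1 in rects:
        width = x1 - x0 + 1
        height = y1 - y0 + 1
        if width <= max_size and height <= max_size:
            selected.append((x0, y0, x1, y1))
    return selected


candidates = [(0, 0, 1, 0), (0, 5, 20, 6), (3, 1, 3, 2)]
elements = select_dotted_line_elements(candidates)
```

The long rectangle (0, 5, 20, 6) is dropped because it is far wider than a dot or dash.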
[0020]
Subsequently, the dotted line element combination processing unit 8 combines dotted line elements that lie within a fixed distance of one another in the processing direction. That is, when combining dotted line elements arranged horizontally, it combines the dotted line elements that lie within a fixed distance of one another in the horizontal direction (the processing direction). The dotted lines extracted by this combination are held in the ruled line memory 9 (steps 104 and 109). Dotted line elements that the combination processing unit 8 did not combine with two or more dotted line elements are also held in the ruled line memory 9, as single element dotted lines (steps 104 and 105).
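The horizontal combination of step 104 might be sketched as below. This is one possible reading, not the method of the embodiment: the greedy left-to-right chaining, the parameters `max_gap` and `max_row_offset`, and the `min_elements` cutoff are assumptions. The phrase "two or more dotted line elements" could suggest a stricter cutoff, which `min_elements` lets a reader experiment with.

```python
def bounding_box(rects):
    """Smallest rectangle (x0, y0, x1, y1) enclosing all given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def combine_horizontal(elements, max_gap=6, max_row_offset=2, min_elements=2):
    """Chain elements left to right; return (dotted_lines, single_elements).

    All three thresholds are hypothetical values chosen for illustration.
    """
    remaining = sorted(elements)  # sorted by x0, then y0
    lines, singles = [], []
    while remaining:
        chain = [remaining.pop(0)]
        extended = True
        while extended:
            extended = False
            last = chain[-1]
            for i, cand in enumerate(remaining):
                gap = cand[0] - last[2] - 1  # white pixels between the two
                same_row = abs(cand[1] - last[1]) <= max_row_offset
                if same_row and 0 <= gap <= max_gap:
                    chain.append(remaining.pop(i))
                    extended = True
                    break
        if len(chain) >= min_elements:
            lines.append(bounding_box(chain))  # a combined dotted line
        else:
            singles.extend(chain)  # kept as single element dotted lines
    return lines, singles


dots = [(0, 0, 1, 1), (4, 0, 5, 1), (8, 0, 9, 1), (30, 20, 31, 21)]
dotted_lines, single_elements = combine_horizontal(dots)
```

Here the three evenly spaced dots form one dotted line, and the isolated rectangle is left as a single element dotted line.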
[0021]
A single element dotted line is a dotted line whose element could not be combined in the dotted line element combination processing. In other words, it is a rectangle that the dotted line element selection unit 6 selected as having dotted line element size, but that is judged not to be a dotted line element rectangle and is regarded as a rectangle caused by blurring. The number of single element dotted lines can therefore be assumed to be the number of rectangles produced by blurring. The blurred document determination unit 10 determines a blurred document as follows. The blurred document determination unit 10 consists of a counter 11 that counts single element dotted lines, a marking unit 12 that marks a blurred document, and a document type memory 13.
[0022]
The counter 11 counts the single element dotted lines (step 106). If the number of single element dotted lines is equal to or greater than a predetermined number, the image being processed is determined to be a blurred image (steps 107 and 108), and the marking unit 12 attaches information indicating that the input image is a blurred image and stores it in the document type memory 13.
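The decision of steps 106 to 108 reduces to a count and a comparison, sketched below. The threshold `min_count`, its default value, and the dictionary used to stand in for the document type memory 13 are hypothetical.

```python
def judge_blurred_image(single_elements, min_count=50):
    """Count single element dotted lines and mark the image type.

    `min_count` is a hypothetical threshold; the source only says
    "a predetermined number".
    """
    count = len(single_elements)      # role of counter 11
    is_blurred = count >= min_count   # "equal to or greater than" in the text
    # Role of marking unit 12: attach the result as a document type record.
    return {
        "single_element_count": count,
        "document_type": "blurred" if is_blurred else "normal",
    }


record = judge_blurred_image([(0, 0, 1, 1)] * 3, min_count=3)
```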
[0023]
<Example 2>
In the first embodiment described above, no information is obtained about which part of an image determined to be a blurred image is actually blurred. If the blurred areas in the image are known, useful information can be provided to subsequent processing, for example processing that applies blur correction to the extracted blurred areas.
[0024]
Therefore, this embodiment is an embodiment in which information that is effective for other processing is provided by extracting a blurred region in an image.
[0025]
FIG. 3 shows the configuration of the second embodiment of the present invention, and FIG. 4 shows a process flowchart of the second embodiment of the present invention. The configuration up to the dotted line element combination processing unit 28 and the ruled line memory 29 is the same as in the first embodiment. The present embodiment additionally provides a blurred area extraction processing unit 30, which consists of a single element dotted line combining unit 31 and a blurred area integration unit 32, and a blurred area memory 33.
[0026]
The single element dotted line combining unit 31 takes the single element dotted lines held in the ruled line memory 29 and combines those that lie within a predetermined range of one another, thereby forming small blurred areas (steps 205 and 206). The blurred area integration unit 32 then combines and grows the small blurred areas in the same way, thereby extracting the blurred areas (step 207).
[0027]
FIGS. 5, 6, and 7 are diagrams for explaining the generation of a blurred area. In FIGS. 5, 6, and 7, A, B, and C are single element dotted lines, that is, dotted line elements that could not be combined with two or more dotted line elements. First, taking A in FIG. 5 as the reference for processing, the range obtained by widening A vertically and horizontally by an arbitrary value (width), centered on A, is called range a (labeled ア in FIG. 5), and range a is the processing range within which elements are combined with A. In FIG. 5, B lies within range a, so A and B are combined to form the small blurred area α shown in FIG. 6.
[0028]
Once a small blurred area has been extracted, a combining range i (labeled イ in FIG. 6) is set with the small blurred area α as the reference, as shown in FIG. 6. Since C lies within this combining range, C is combined with the small blurred area in the same manner as described above, and the enlarged small blurred area α shown in FIG. 7 is formed. This processing is repeated until no single element dotted line remains within the combining range. When no single element dotted line remains within the combining range, the small blurred area extraction process that started from A ends.
[0029]
The single element dotted line combining unit 31 executes the above processing for all single element dotted lines. The extracted small blurred areas are held in the blurred area memory 33. Next, the blurred area integration unit 32 executes integration processing on the small blurred areas generated by the single element dotted line combining unit 31. Using the small blurred area data held in the blurred area memory 33, it combines small blurred areas with one another in the same way as the processing of the combining unit 31. That is, it performs the combining process of the combining unit 31 with small blurred areas in place of single element dotted lines. The blurred areas integrated here are again held in the blurred area memory 33.
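The growth procedure of FIGS. 5 to 7 can be sketched as a repeated expand-and-merge loop. The code below is an illustration under assumptions: the margin value, the use of rectangle intersection as the test for "lies within the range", and the treatment of an isolated element as a one-element area are not stated in the text, and the function names are hypothetical.

```python
def expand(rect, margin):
    """Widen a rectangle (x0, y0, x1, y1) by `margin` on every side."""
    x0, y0, x1, y1 = rect
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def intersects(a, b):
    """True when two inclusive rectangles share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def grow_regions(rects, margin):
    """Merge rectangles that fall inside each other's widened range."""
    remaining = list(rects)
    regions = []
    while remaining:
        region = remaining.pop(0)  # reference element, like A in FIG. 5
        merged = True
        while merged:
            merged = False
            search = expand(region, margin)  # combining range
            for i, cand in enumerate(remaining):
                if intersects(search, cand):
                    cand = remaining.pop(i)
                    region = (min(region[0], cand[0]), min(region[1], cand[1]),
                              max(region[2], cand[2]), max(region[3], cand[3]))
                    merged = True  # range is recomputed from the grown area
                    break
        regions.append(region)
    return regions


A, B, C, D = (10, 10, 11, 11), (14, 12, 15, 13), (19, 15, 20, 16), (100, 100, 101, 101)
small_areas = grow_regions([A, B, C, D], margin=4)     # role of combining unit 31
blurred_areas = grow_regions(small_areas, margin=4)    # role of integration unit 32
```

C is out of reach of A alone but is absorbed once A and B have merged, which mirrors the step from FIG. 6 to FIG. 7. The second call reuses the same routine on the small areas, following the statement that integration is the same process with small blurred areas substituted for single element dotted lines.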
[0030]
<Example 3>
Executing the blurred area extraction process of the second embodiment on every image takes a long time. In this embodiment, therefore, the blurred area extraction process of the second embodiment is executed only for images determined to be blurred images in the first embodiment, which shortens the processing time.
[0031]
FIG. 8 shows the configuration of the third embodiment. FIG. 9 shows a process flowchart of the third embodiment. The configuration of the third embodiment is a combination of the configurations of the first and second embodiments. In other words, since the processing of the first embodiment leaves the type of the image being processed in the document type memory 48, the blurred area extraction process 49 of the second embodiment is executed (step 309) when reference to the memory 48 yields information that the image is blurred (step 308).
[0032]
<Example 4>
In the dotted line extraction process, dotted lines are formed by extracting rectangles of dotted line element size and combining the dotted line element rectangles that lie within a threshold distance of one another. In a blurred area of the image, however, rectangles that have become discontinuous are combined by mistake, and pseudo dotted lines are generated.
[0033]
Therefore, in this embodiment, the blurred areas in the image are extracted by a simple method, and the dotted lines extracted within a blurred area are regarded as pseudo dotted lines and removed from the dotted line information, which improves the accuracy of dotted line recognition.
[0034]
FIG. 10 shows the configuration of the fourth embodiment, in which a pseudo dotted line removing unit 51 is further added to the configuration of the third embodiment, and other components are the same as those of the third embodiment. FIG. 11 shows a process flowchart of the fourth embodiment. The blurred area extracted in the second embodiment is held in the blurred area memory 50, and the dotted line extracted by the dotted line extraction unit 45 is held in the ruled line memory 46.
[0035]
The pseudo dotted line removing unit 51 compares the position coordinates of the dotted lines in the ruled line memory 46 with the blurred areas in the blurred area memory 50, and removes from the ruled line memory 46 any dotted line that lies within a blurred area (a pseudo dotted line) or that touches a blurred area (also a pseudo dotted line) (step 410).
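The comparison in step 410 can be sketched as a rectangle overlap test. How "touches" is measured is not given in the text; the sketch below assumes that a dotted line touches a blurred area when the two rectangles overlap or are directly adjacent with no pixel between them. The function names are hypothetical.

```python
def touches_or_overlaps(line, area):
    """True when two inclusive rectangles overlap or are directly adjacent.

    Adjacency (a one-pixel tolerance) is an assumed reading of "touches".
    """
    return (line[0] <= area[2] + 1 and area[0] <= line[2] + 1
            and line[1] <= area[3] + 1 and area[1] <= line[3] + 1)


def remove_pseudo_dotted_lines(dotted_lines, blurred_areas):
    """Keep only the dotted lines that are clear of every blurred area."""
    return [line for line in dotted_lines
            if not any(touches_or_overlaps(line, area) for area in blurred_areas)]


areas = [(10, 10, 20, 16)]
lines = [
    (12, 12, 18, 12),  # inside the blurred area
    (21, 10, 30, 10),  # starts on the column right next to the area
    (0, 40, 50, 40),   # far from the area
]
kept = remove_pseudo_dotted_lines(lines, areas)
```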
[0036]
<Example 5>
To improve the recognition accuracy of ruled lines and characters in a severely blurred image, one could modify the recognition algorithm, but developing such an algorithm takes time.
[0037]
Therefore, in the fifth and sixth embodiments below, in order to deal with the problems of a blurred image, processing is performed to warn the user that an image is a blurred image when the number or the area of the blurred areas extracted in the second embodiment is larger than a predetermined value. For an image that triggers the warning, the user either inputs the image again or corrects the image.
[0038]
FIG. 12 shows the configuration of the fifth embodiment. When the document type memory 48 is referred to and the information indicating that the document is a blurred document is obtained, the blurred image warning unit 52 issues a warning that the document is a blurred image. For example, the user is notified using a method such as displaying a warning on the display screen or notifying by voice.
[0039]
<Example 6>
FIG. 13 shows the configuration of the sixth embodiment. Since the blurred areas extracted in the second embodiment are held in the blurred area memory 50, the blurred area size calculation unit 53 calculates, from the blurred area memory 50, the total area of the blurred areas, the total number of blurred areas, or a similar quantity. When the result exceeds a predetermined threshold, the blurred image warning unit 52 warns that the image is a blurred image, as in the fifth embodiment.
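The check made by the blurred area size calculation unit 53 can be sketched as follows. The threshold parameters `max_total_area` and `max_count` and the function names are hypothetical; the text only says that a warning is issued when "a predetermined threshold" is exceeded.

```python
def total_area(blurred_areas):
    """Sum of pixel areas of inclusive rectangles (x0, y0, x1, y1)."""
    return sum((x1 - x0 + 1) * (y1 - y0 + 1) for x0, y0, x1, y1 in blurred_areas)


def should_warn(blurred_areas, max_total_area=None, max_count=None):
    """Return True when either configured limit is exceeded.

    Both limits are hypothetical; pass None to skip a check.
    """
    if max_total_area is not None and total_area(blurred_areas) > max_total_area:
        return True
    if max_count is not None and len(blurred_areas) > max_count:
        return True
    return False


areas = [(10, 10, 20, 16), (100, 100, 101, 101)]
area_sum = total_area(areas)  # 11 * 7 + 2 * 2 = 81
warn_by_area = should_warn(areas, max_total_area=80)
warn_by_count = should_warn(areas, max_count=1)
```

A caller would pass the result to whatever stands in for the blurred image warning unit 52, for example an on-screen message.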
[0040]
<Example 7>
FIG. 14 shows the configuration of the seventh embodiment, which is realized by software. When the present invention is realized by software, a computer system comprising a CPU, a memory, a display device, a hard disk, a keyboard, a CD-ROM drive, a mouse, and the like is prepared, as shown in FIG. 14. A computer-readable recording medium such as a CD-ROM holds a program that realizes the image processing functions and processing procedures of the present invention. Images to be processed, such as documents, are stored on the hard disk, for example. The CPU reads the program that realizes the above processing functions and procedures from the recording medium, determines whether an image read from the hard disk or the like is a blurred image, and outputs the determination result to a display or the like.
[0041]
[Effects of the invention]
As described above, according to the present invention, whether the input image is blurred is determined as pre-processing for ruled line and character recognition, so useful information can be given for improving the accuracy of the subsequent ruled line and character recognition.
[0042]
According to the present invention, the blurred areas in the input image are extracted, so the recognition accuracy of ruled lines and characters can be further improved by using this area information.
[0043]
According to the present invention, the blurred area extraction process is executed only on images determined to be blurred images, so the processing time is shortened.
[0044]
According to the present invention, because a dotted line lying in a blurred area is very likely to be a pseudo dotted line, excluding the dotted lines extracted within blurred areas raises the removal rate of pseudo dotted lines in the image and thus improves the extraction accuracy of dotted ruled lines.
[0045]
According to the present invention, the user can be alerted that the input image is a blurred image, and the user can then take appropriate action on the blurred image.
[Brief description of the drawings]
FIG. 1 shows the configuration of Embodiment 1 of the present invention.
FIG. 2 shows a process flowchart of Embodiment 1 of the present invention.
FIG. 3 shows the configuration of Embodiment 2 of the present invention.
FIG. 4 shows a process flowchart of Embodiment 2 of the present invention.
FIG. 5 is a first diagram for explaining blurred area generation.
FIG. 6 is a second diagram for explaining blurred area generation.
FIG. 7 is a third diagram for explaining blurred area generation.
FIG. 8 shows the configuration of Embodiment 3 of the present invention.
FIG. 9 shows a process flowchart of Embodiment 3 of the present invention.
FIG. 10 shows the configuration of Embodiment 4 of the present invention.
FIG. 11 shows a process flowchart of Embodiment 4 of the present invention.
FIG. 12 shows the configuration of Embodiment 5 of the present invention.
FIG. 13 shows the configuration of Embodiment 6 of the present invention.
FIG. 14 shows the configuration of Embodiment 7 of the present invention.
[Explanation of symbols]
1 Image input unit
2 Binary image memory
3 Rectangle extraction unit
4 Rectangle memory
5 Dotted line extraction unit
6 Dotted line element selection unit
7 Dotted line element memory
8 Combination processing unit
9 Ruled line memory
10 Blurred document determination unit
11 Counter
12 Marking unit
13 Document type memory

Claims

An image processing method characterized in that rectangles of black pixel continuous components are extracted from binarized image data, rectangles corresponding to dotted line elements are extracted from the extracted rectangles, dotted line elements within a predetermined distance of one another among the extracted dotted line elements are combined, the dotted line elements that were not combined as a result of this processing are counted, and the image is determined to be a blurred image when the count value is equal to or greater than a predetermined value.

An image processing method characterized in that rectangles of black pixel continuous components are extracted from binarized image data, rectangles corresponding to dotted line elements are extracted from the extracted rectangles, dotted line elements within a predetermined distance of one another among the extracted dotted line elements are combined, the dotted line elements that were not combined as a result of this processing are counted, the image is determined to be a blurred image when the count value is equal to or greater than a predetermined value, and, for the image so determined, a blurred area is extracted by integrating the dotted line elements that were not combined.

The image processing method according to claim 2, wherein a dotted line existing in the blurred area is removed as a pseudo dotted line.

The image processing method according to claim 2, wherein when the size of the blurred area is equal to or larger than a predetermined value, a warning is given that the image is a blurred image.

An image processing apparatus comprising: means for extracting rectangles of black pixel continuous components from binarized image data; means for extracting rectangles corresponding to dotted line elements from the extracted rectangles; means for combining dotted line elements within a predetermined distance of one another among the extracted dotted line elements; means for counting the dotted line elements that were not combined as a result of the combining process; and means for determining that the image is a blurred image when the count value is equal to or greater than a predetermined value.

An image processing apparatus comprising: means for extracting rectangles of black pixel continuous components from binarized image data; means for extracting rectangles corresponding to dotted line elements from the extracted rectangles; means for combining dotted line elements within a predetermined distance of one another among the extracted dotted line elements; means for counting the dotted line elements that were not combined as a result of the combining process; means for determining that the image is a blurred image when the count value is equal to or greater than a predetermined value; and means for extracting, for the image so determined, a blurred area by integrating the dotted line elements that were not combined.

A computer-readable recording medium on which is recorded a program for causing a computer to implement the image processing method according to any one of claims 1 to 4.