JP2004192164A

JP2004192164A - Image processor, image forming device having the same, image processing method, image processing program and computer-readable recording medium

Info

Publication number: JP2004192164A
Application number: JP2002357259A
Authority: JP
Inventors: Toyohisa Matsuda; 豊久松田
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2002-12-09
Filing date: 2002-12-09
Publication date: 2004-07-08

Abstract

PROBLEM TO BE SOLVED: To provide an image processor capable of highly precise area division processing, an image forming device having the image processor, an image processing method, an image processing program and a computer-readable recording medium.

SOLUTION: A clustering part 11 generates class information and object information for the input image data at each level by recursive class division processing. For the class information and the object information, a run length calculating part 12 calculates the run length over which pixels having the same information continue in the horizontal direction. A character area estimating part 13 estimates pixels belonging to a character area based on the run length of the class information. Finally, an area deciding part 14 decides which area each object area belongs to, based on the proportion of pixels estimated to belong to the character area.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する分野】
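The present invention relates to an image processing method, an image processing apparatus, an image forming apparatus, a program, and a recording medium that divide a multi-tone image containing a mixture of background, characters, and photographs, such as video input from digital television broadcasting, into background, character, and other regions.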
【０００２】
【従来の技術】
チューナを介して受信したデジタルテレビ放送信号を復号して得られた多値入力画像（静止画）データには、文字・写真・背景領域が混在しており、それぞれの領域において固有の画質劣化を伴う。文字領域では、文字にじみ、文字欠けが発生し、写真領域ではＪＰＥＧ（ＪｏｉｎｔＰｈｏｔｏｇｒａｐｈｉｃＥｘｐｅｒｔｓＧｒｏｕｐ）、ＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ）によるリンギング、ブロックノイズなどの圧縮アーティファクツが発生する。また、背景領域には少なからずノイズが見られ、そのまま拡大してプリンタ出力した際には画質劣化が非常に目立つ。
【０００３】
また、文字にじみや圧縮によるアーティファクツを解決するような処理を画像データ全体に実施すると画像がぼけたり、写真領域の画質を高めるために、ディテイル再現向上、輪郭強調を行うと、背景領域も強調されてしまい、ノイズを際立たせてしまう。したがって、背景・文字・写真領域などが混在する入力多値画像データの背景領域、文字領域、写真領域を検出して分割し、それぞれの領域に適した処理を行うことが望ましい。
【０００４】
このような課題に対して、従来から画像データの領域分割処理が開発されており、たとえば、複数の画素からなる小領域内の最大画素濃度差を用いて分割する方法がある。この方法では、背景領域の濃度分布が写真・文字領域の濃度分布に比べて、平坦であることを利用し、注目画素を含む小領域内の最大濃度差が第一所定閾値以下であれば、その注目画素を背景領域と判定し、それ以外をオブジェクト領域として判定する。さらに、オブジェクト領域について第２所定閾値以上であれば、注目画素を文字領域と判定し、それ以外を写真領域として判定する。
【０００５】
しかしながら、この方法では、写真領域に含まれるオブジェクトの輪郭など、濃度変化の激しい領域が文字領域として誤って判定されたり、かすれ文字など、２値画像でありながら濃度変化が緩やかな領域が写真領域として誤って判定される。そのため、画像データ全体としては領域分割精度が悪いという問題がある。この問題を解決するための技術として、特許文献１に記載の画像領域分離装置がある。この装置は、背景色画素から構成される背景領域と非背景色画素から構成され、写真や文字等の異なる種類の非背景領域とを有する画像情報を、走査することによって入力する入力手段と、この入力手段により走査毎に入力した画像情報の中から、非背景色画素を検出する非背景色画素分離手段と、非背景色画素分離手段により分離された走査方向に一つ以上連続する非背景色画素を一つのランとし、その長さを検出するラン検出手段と、非背景色画素分離手段により分離された非背景色画素が、非背景領域のエッジを構成するエッジ画素であるかどうかを判定するエッジ画素判定手段と、エッジ画素として判定された非背景色画素の割合に基づき、検出されたランが写真や文字等の異なる種類の非背景領域のうちのいずれの種類の非背景領域に属するかを示すランの属性を判定する領域判定手段とを備えており、これらの手段を用いて領域分割を行う。この領域判定手段では、ラン検出手段で検出されたランの中のエッジ画素の割合が多い場合には、そのランを文字領域として判定し、ランの中のエッジ画素の割合が少ない場合には、そのランを写真領域として判定する。
【０００６】
一般に文字は白色と黒色とが鮮明に分かれており、これに対して写真は白色と黒色とが緩やかに変化していく場合が多い。したがって、一つのランに属する連続した非背景色画素の中で、ラン検出手段によりエッジ画素として判定された非背景色画素の割合が多い場合には、文字領域であると判定することが可能になる。反対に写真領域の場合は文字領域とは異なり、非背景色画素が連続し、かつ、なだらかな変化を呈するために、ランの中でエッジ画素として判定された非背景色画素の割合が少ない場合には、写真領域であると判定することが可能になる。このように、領域判定手段は、非背景色画素から成る主走査方向のランを、各ランにおけるエッジ画素含有率が高い場合には文字領域として判定し、エッジ画素含有率が低い場合には写真領域として判定する。
【０００７】
【特許文献１】
特開平６−５４１８０号公報
【０００８】
【発明が解決しようとする課題】
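In the conventional technique described above, non-background pixels are detected based on, for example, the average density within an N×N pixel block. Because the technique assumes a white background and detects non-background pixels by color difference, it cannot separate background regions from non-background regions in input image data that has backgrounds of various colors, as digital television broadcasts do. Furthermore, because the edge amount is binarized (classified) using a fixed threshold based on the maximum density difference within the N×N pixel block, the accuracy of separating character regions from photograph regions within the non-background region is poor.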
【０００９】
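An object of the present invention is to provide an image processing apparatus capable of highly accurate region segmentation, an image forming apparatus including the image processing apparatus, an image processing method, an image processing program, and a computer-readable recording medium.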
【００１０】
【課題を解決するための手段】
本発明は、複数の画素からなる画像を示す画像データが入力され、入力された画像データに基づいて画像を構成する各画素が、文字領域、背景領域およびその他領域のいずれの領域に属するかを判定し、画像データの領域分割を行う領域分割部を備える画像処理装置において、
前記領域分割部は、
注目画素とその周辺画素とからなる画素ブロックの特徴量を各画素の画素値を用いて求め、求めた特徴量に基づく閾値を生成し、生成された閾値と各画素の画素値とを比較して注目画素を２つの画素集合にクラス分けし、前記クラス分けによって分類された画素集合に対して、前記閾値とは異なる閾値でさらにクラス分けを行うことで複数段階のクラス分けを行い、段階ごとのクラス分けの結果を示すクラス情報を生成するクラス情報生成手段と、
クラス情報生成手段が生成した複数の閾値に基づいて、注目画素が背景領域に属するか否かを判断し、その判断結果を示すオブジェクト情報を生成するオブジェクト情報生成手段と、
同じクラス情報を有し、所定の方向に互いに隣接する画素からなるクラスランの画素数であるクラスランレングスと、同じオブジェクト情報を有し、所定の方向に互いに隣接する画素からなるオブジェクトランの画素数であるオブジェクトランレングスとを前記段階ごとに算出するランレングス算出手段と、
前記クラスランレングスに基づいて、クラスランに含まれる画素が文字領域に属するか否かを前記段階ごとに推定する文字領域推定手段と、
オブジェクト情報に基づいて画素が背景領域に属するか否かを判定するとともに、前記オブジェクトランに含まれる画素のうち、前記文字領域推定手段によって文字領域に属すると推定された画素の前記段階ごとの割合に基づいて、オブジェクトランに含まれる画素が文字領域およびその他領域のいずれに属するかを判定する領域判定手段とを有することを特徴とする画像処理装置である。
【００１１】
本発明に従えば、領域分割部は、複数の画素からなる画像を示す画像データに基づいて、画像を構成する各画素が、文字領域、背景領域およびその他領域のいずれの領域に属するかを判定し、画像データの領域分割を行う。
【００１２】
領域分割部は、上記のような構成となっており、まずクラス情報生成手段が、注目画素とその周辺画素とからなる画素ブロックの特徴量を各画素の画素値を用いて求め、求めた特徴量に基づく閾値を生成し、生成された閾値と各画素の画素値とを比較して注目画素のクラス分けを行う。このクラス分けによって各画素は、２つの画素集合に分類され、分類された画素集合の各画素に対して前記閾値とは異なる閾値でさらにクラス分けを行う。この処理を繰り返すことで、複数段階のクラス分けを行う。複数段階のクラス分けの結果は、クラス情報として生成される。クラス情報とは、上記のようにクラス分けによって、分類された際に各画素がいずれのクラス、すなわち明度値などの画素値が閾値以上のクラスまたは閾値未満のクラスに属するかを示す情報である。
【００１３】
たとえば、第１の段階では、１回目のクラス分けによって、２つのクラスに分類され、第２の段階では、これら２つのクラスの画素がさらにクラス分けされて４つのクラスに分類される。したがって、第１の段階のクラス情報は、各画素が２つのクラスのいずれに属するか示し、第２の段階のクラス情報は、各画素が４つのクラスのいずれに属するかを示す。
【００１４】
オブジェクト情報生成手段では、クラス情報生成手段が生成した複数の閾値に基づいて、注目画素が背景領域に属するか否かを判断し、その判断結果を示すオブジェクト情報を生成する。
【００１５】
このようにして、クラス情報およびオブジェクト情報が生成されると、ランレングス算出手段は、クラスランレングスとオブジェクトランレングスとを前記段階ごとに算出する。クラスランレングスは、同じクラス情報を有し、所定の方向に互いに隣接する画素からなるクラスランの画素数であり、オブジェクトランレングスは、同じオブジェクト情報を有し、所定の方向に互いに隣接する画素からなるオブジェクトランの画素数である。つまり、クラスランレングスは、クラス分けによって同じクラスに分類された画素が連続して並んだ場合の画素数を示し、オブジェクトランレングスは、背景領域に属する画素が連続して並んだ場合、もしくは背景画素には属しない画素（文字領域またはその他領域に属する画素）が連続して並んだ場合の画素数を示している。
【００１６】
次に、ランレングス算出手段によって算出されたクラスランレングスに基づいて、クラスランに含まれる画素が文字領域に属するか否かを前記段階ごとに判断するのであるが、クラスランレングスのみで画素が文字領域に属するか否かを判定すると、判定精度が低いものとなってしまう場合がある。したがって、最終的な判定は、後述の領域判定手段によって行い、文字領域推定手段では、クラスランレングスに基づいて、文字領域に属する可能性が高い画素を段階ごとに推定する。
【００１７】
以上のようにして得られた各手段の動作結果に基づいて、領域判定手段が画素の属する領域を判定する。
【００１８】
まず、オブジェクト情報生成手段によって生成されたオブジェクト情報に基づいて、画素が背景領域に属するか否かを判定する。背景領域に属さないと判定された画素については、次のようにして文字領域に属するか、その他領域に属するかを判定する。
【００１９】
背景領域に属しない画素を含むオブジェクトランについて、このオブジェクトランに含まれる画素のうち、文字領域推定手段によって文字領域に属すると推定された画素の割合を前記段階ごとに算出する。文字領域では、１つのオブジェクトランの中に、同じ段階で文字領域と推定された画素が含まれる割合が多いことから、文字領域に属すると推定された画素の段階ごとの割合に基づいて、オブジェクトランが文字領域に属する画素からなるオブジェクトランであるか否かを判断する。文字領域に属する画素からなるオブジェクトランであれば、そのオブジェクトランに含まれる画素を文字領域に含まれる画素として判定する。文字領域に属する画素からなるオブジェクトランでなければ、そのオブジェクトランに含まれる画素をその他領域に含まれる画素として判定する。
【００２０】
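Because the pixel of interest is classified with a threshold based on the features of the pixel block consisting of the pixel of interest and its neighboring pixels, the class information and object information reflect the influence of neighboring pixels better than when a fixed threshold is used. The background region is determined accurately from the object information. The character region is determined using both the class information and the object information, from the estimate based on the class run length and from the proportion of estimated pixels contained in the object run, so pixels belonging to a character region can be determined accurately.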
【００２１】
このように、各領域の判定精度が高いので、画像データの領域分割精度を向上させることができる。
【００２２】
また本発明は、前記クラス情報生成手段は、特徴量として注目画素のエッジ量と、前記画素ブロックに含まれる画素の濃度平均値と、周辺画素が注目画素であったときに行ったクラス分けの閾値とを用い、前記エッジ量を重み係数として、濃度平均値と周辺画素の閾値とを線形補間して閾値を算出することを特徴とする。
【００２３】
本発明に従えば、クラス情報生成手段は、特徴量として注目画素のエッジ量と、画素ブロックに含まれる画素の濃度平均値と、周辺画素が注目画素であったときに行ったクラス分けの閾値とを用い、エッジ量を重み係数として、濃度平均値と周辺画素の閾値とを線形補間して閾値を算出する。
【００２４】
これにより、エッジ強度を反映した閾値を用いるため、文字領域などのエッジ付近の画素において適切にクラス分けを行うことができる。
【００２５】
また本発明は、前記クラス情報生成手段は、前記エッジ量の下限値を画像データのダイナミックレンジに基づいて設定することを特徴とする。
【００２６】
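According to the present invention, the class information generating means sets the lower limit of the edge amount based on the dynamic range of the image data.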
【００２７】
これにより、画像データのダイナミックレンジが狭い場合でも、適切にクラス分けを行うことができる。
【００２８】
また本発明は、前記クラス情報生成手段は、注目画素を所定の走査方向に順次移動させてクラス分けを行い、走査ラインごとに走査方向を変えることを特徴とする。
【００２９】
本発明に従えば、クラス情報生成手段は、注目画素を所定の走査方向に順次移動させてクラス分けを行い、走査ラインごとに走査方向を変える。
【００３０】
クラス情報生成手段が閾値を算出する場合、周辺画素が注目画素であったときに行ったクラス分けの閾値を用いる。したがって、ラインを左から右へ走査するときは、注目画素の上側の周辺画素と左側の周辺画素の閾値を用いることとなる。走査方向を変えずにクラス分けを行うと、常に左側の周辺画素の影響を受けるため、適切な閾値を算出することができない。走査ラインごとに走査方向を変えることによって、左右の周辺画素の影響を平均して受けることができるため、適切なクラス分けを行うことができる。
【００３１】
また本発明は、前記クラス情報生成手段は、周辺画素に含まれるエッジ画素の位置に基づいて、注目画素の閾値を算出せずに周辺画素の閾値の中から選択するか、もしくは濃度平均値と周辺画素の閾値とを線形補間して算出することを特徴とする。
【００３２】
本発明に従えば、クラス情報生成手段は、周辺画素に含まれるエッジ画素の位置に基づいて、注目画素の閾値を算出せずに周辺画素の閾値の中から選択するか、もしくは濃度平均値と周辺画素の閾値とを線形補間して算出する。
【００３３】
周辺画素にエッジ画素が含まれる場合、算出する閾値は、エッジ画素の影響を強く受けることになり、適切な閾値を算出することができない。したがって、エッジ画素の位置が注目画素の上のみの場合は、閾値を算出せずに左側の周辺画素の閾値を注目画素の閾値として用いるなど、エッジ画素の位置に基づいて注目画素の閾値を生成するので、特に注目画素がエッジ付近の背景画素などの場合に適切な閾値を生成してクラス分けを行うことができる。
【００３４】
また本発明は、前記オブジェクト情報生成手段は、前記クラス情報生成手段が生成した閾値として、画像データの最初の注目画素に対して予め定められている初期閾値が連続する場合、注目画素が背景領域に属すると判断することを特徴とする。
【００３５】
本発明に従えば、オブジェクト情報生成手段は、クラス情報生成手段が生成した閾値として、画像データの最初の注目画素に対して予め定められている初期閾値が連続する場合、注目画素が背景領域に属すると判断する。
【００３６】
画像データの最初の注目画素は、背景画素である場合がほとんどであるため、クラス情報生成手段が生成した閾値が初期閾値であり、それが連続するときは、注目画素が背景領域に属する場合が多い。したがって、このような条件で注目画素が背景領域に属するか否かの判断をすることで、容易かつ精度良く判断することができる。
【００３７】
また本発明は、前記ランレングス算出手段は、同種複数処理型演算装置で構成され、走査ラインを予め定める画素数分のクラス情報を含むデータパスに分割し、データパスごとにランレングスを算出し、各データパスの算出後にデータパス間を連結してランレングスを求めることを特徴とする。
【００３８】
本発明に従えば、ランレングス算出手段は、同種複数処理型演算装置で構成される。このとき、走査ラインを予め定める画素数分のクラス情報を含むデータパスに分割し、データパスごとにランレングスを算出し、各データパスの算出後にデータパス間を連結してランレングスを求める。
【００３９】
同種複数処理型演算装置、いわゆるＳＩＭＤ（ＳｉｎｇｌｅＩｎｓｔｒｕｃｔｉｏｎＭｕｌｔｉｐｌｅＤａｔａ）型プロセッサは、同種の命令の処理を同時に行うことができる。ランレングスを算出する場合に、命令をランレングスの算出とすると、データパスごとのランレングスの算出処理を同時に行うことができる。したがって、ランレングス算出処理の処理速度を高速化することができる。
【００４０】
また本発明は、前記文字領域推定手段は、ランレングス算出手段により算出されたクラスランレングスを予め定める文字推定閾値と比較し、閾値以下であればクラスランに含まれる画素は文字領域に属すると推定することを特徴とする。
【００４１】
本発明に従えば、文字領域推定手段は、ランレングス算出手段により算出されたクラスランレングスを予め定める文字推定閾値と比較し、閾値以下であればクラスランに含まれる画素は文字領域に属すると推定する。
【００４２】
一般的に文字は繁雑度が高いため、クラスランレングスを文字推定閾値と比較するだけで容易に文字領域に属するか否かを推定することができる。
【００４３】
また本発明は、前記文字領域推定手段は、周辺画素のいずれかが注目画素と同じクラス情報を有し、かつ、文字領域に属しないと推定されている場合、注目画素を文字領域に属しないと推定することを特徴とする。
【００４４】
本発明に従えば、文字領域推定手段は、周辺画素の何れかが注目画素と同じクラス情報を有し、かつ、文字領域に属しないと推定されている場合、注目画素を文字領域に属しないと推定する。
【００４５】
文字領域の推定に用いられるランレングスは、所定の方向、たとえば横方向のみのランレングスであり、ランレングスと閾値の比較のみで推定すると、横方向の繁雑度のみに依存し、推定精度が低くなってしまう。したがって、上記のような条件を付加して推定を行うことで推定精度を高めることができる。
【００４６】
また本発明は、前記領域判定手段によって、文字領域に属すると判定された連続する画素のうち、最端部の画素が有するクラス情報と同じクラス情報を有する画素を文字画素として検知する文字検知手段が備えられていることを特徴とする。
【００４７】
本発明に従えば、文字検知手段は、領域判定手段によって、文字領域に属すると判定された連続する画素のうち、最端部の画素が有するクラス情報と同じクラス情報を有する画素を文字画素として検知する。
【００４８】
文字領域に属する画素は同じクラス情報を有し、文字領域に属すると判定された連続する画素のうち、最端部の画素が文字領域に属する場合が多いので、精度よく文字領域に属する画素を検知することができる。
【００４９】
また本発明は、上記の画像処理装置と、画像処理装置によって処理された画像データを出力する画像出力装置とを備えることを特徴とする画像形成装置である。
【００５０】
本発明に従えば、上記の画像処理装置によって処理された画像データを、画像出力装置から出力する。
【００５１】
これにより、画像データが高精度で領域分割され、各領域に応じた後処理が施された画像データを出力することができるので、高画質な静止画像を形成することができる。
【００５２】
また本発明は、複数の画素からなる画像を示す画像データが入力され、入力された画像データに基づいて画像を構成する各画素が、文字領域、背景領域およびその他領域のいずれの領域に属するかを判定し、画像データの領域分割を行う領域分割工程を備える画像処理方法において、
前記領域分割工程は、
注目画素とその周辺画素とからなる画素ブロックの特徴量を各画素の画素値を用いて求め、求めた特徴量に基づく閾値を生成し、生成された閾値と画素値とを比較して注目画素を２つの画素集合にクラス分けし、前記クラス分けによって分類された画素集合に対して、前記閾値とは異なる閾値でさらにクラス分けを行うことで複数段階のクラス分けを行い、段階ごとのクラス分けの結果を示すクラス情報を生成するクラス情報生成工程と、
クラス情報生成工程で生成した複数の閾値に基づいて、注目画素が背景領域に属するか否かを判断し、その判断結果を示すオブジェクト情報を生成するオブジェクト情報生成工程と、
同じクラス情報を有し、所定の方向に互いに隣接する画素からなるクラスランの画素数であるクラスランレングスと、同じオブジェクト情報を有し、所定の方向に互いに隣接する画素からなるオブジェクトランの画素数であるオブジェクトランレングスとを前記段階ごとに算出するランレングス算出工程と、
前記クラスランレングスに基づいて、クラスランに含まれる画素が文字領域に属するか否かを前記段階ごとに推定する文字領域推定工程と、
オブジェクト情報に基づいて画素が背景領域に属するか否かを判定するとともに、前記オブジェクトランに含まれる画素のうち、前記文字領域推定工程によって文字領域に属すると推定された画素の前記段階ごとの割合に基づいて、オブジェクトランに含まれる画素が文字領域およびその他領域のいずれに属するかを判定する領域判定工程とを有することを特徴とする画像処理方法である。
【００５３】
本発明に従えば、領域分割工程は、複数の画素からなる画像を示す画像データに基づいて、画像を構成する各画素が、文字領域、背景領域およびその他領域のいずれの領域に属するかを判定し、画像データの領域分割を行う。
【００５４】
領域分割工程は、上記のような工程からなり、まずクラス情報生成工程が、注目画素とその周辺画素とからなる画素ブロックの特徴量を各画素の画素値を用いて求め、求めた特徴量に基づく閾値を生成し、生成された閾値と各画素の画素値とを比較して注目画素のクラス分けを行う。このクラス分けによって各画素は、２つの画素集合に分類され、分類された画素集合の各画素に対して前記閾値とは異なる閾値でさらにクラス分けを行う。この処理を繰り返すことで、複数段階のクラス分けを行う。複数段階のクラス分けの結果は、クラス情報として生成される。クラス情報とは、上記のようにクラス分けによって、分類された際に各画素がいずれのクラス、すなわち明度値などの画素値が閾値以上のクラスまたは閾値未満のクラスに属するかを示す情報である。
【００５５】
たとえば、第１の段階では、１回目のクラス分けによって、２つのクラスに分類され、第２の段階では、これら２つのクラスの画素がさらにクラス分けされて４つのクラスに分類される。したがって、第１の段階のクラス情報は、各画素が２つのクラスのいずれに属するか示し、第２の段階のクラス情報は、各画素が４つのクラスのいずれに属するかを示す。
【００５６】
オブジェクト情報生成工程では、クラス情報生成工程で生成された複数の閾値に基づいて、注目画素が背景領域に属するか否かを判断し、その判断結果を示すオブジェクト情報を生成する。
【００５７】
このようにして、クラス情報およびオブジェクト情報が生成されると、ランレングス算出工程では、クラスランレングスとオブジェクトランレングスとを前記段階ごとに算出する。クラスランレングスは、同じクラス情報を有し、所定の方向に互いに隣接する画素からなるクラスランの画素数であり、オブジェクトランレングスは、同じオブジェクト情報を有し、所定の方向に互いに隣接する画素からなるオブジェクトランの画素数である。つまり、クラスランレングスは、クラス分けによって同じクラスに分類された画素が連続して並んだ場合の画素数を示し、オブジェクトランレングスは、背景領域に属する画素が連続して並んだ場合、もしくは背景画素には属しない画素（文字領域またはその他領域に属する画素）が連続して並んだ場合の画素数を示している。
【００５８】
次に、ランレングス算出工程によって算出されたクラスランレングスに基づいて、クラスランに含まれる画素が文字領域に属するか否かを前記段階ごとに判断するのであるが、クラスランレングスのみで画素が文字領域に属するか否かを判定すると、判定精度が低いものとなってしまう場合がある。したがって、最終的な判定は、後述の領域判定工程によって行い、文字領域推定工程では、クラスランレングスに基づいて、文字領域に属する可能性が高い画素を段階ごとに推定する。
【００５９】
以上のようにして得られた各工程の結果に基づいて、領域判定工程では、画素の属する領域を判定する。
【００６０】
まず、オブジェクト情報生成工程で生成されたオブジェクト情報に基づいて、画素が背景領域に属するか否かを判定する。背景領域に属さないと判定された画素については、次のようにして文字領域に属するか、その他領域に属するかを判定する。
【００６１】
背景領域に属しない画素を含むオブジェクトランについて、このオブジェクトランに含まれる画素のうち、文字領域推定工程で文字領域に属すると推定された画素の割合を前記段階ごとに算出する。文字領域では、１つのオブジェクトランの中に、同じ段階で文字領域と推定された画素が含まれる割合が多いことから、文字領域に属すると推定された画素の段階ごとの割合に基づいて、オブジェクトランが文字領域に属する画素からなるオブジェクトランであるか否かを判断する。文字領域に属する画素からなるオブジェクトランであれば、そのオブジェクトランに含まれる画素を文字領域に含まれる画素として判定する。文字領域に属する画素からなるオブジェクトランでなければ、そのオブジェクトランに含まれる画素をその他領域に含まれる画素として判定する。
【００６２】
注目画素とその周辺画素とからなる画素ブロックの特徴量に基づく閾値を用いて注目画素のクラス分けを行っているので、固定閾値を用いてクラス分けを行う場合に比べ、周辺画素の影響を反映させたクラス情報およびオブジェクト情報を生成することができる。オブジェクト情報の判定は、オブジェクト情報に基づいて精度よく行われる。文字領域の判定は、クラス情報およびオブジェクト情報を用いて、クラスランレングスに基づく推定と、オブジェクトランに含まれる推定画素数の割合とから判定しているので、精度よく文字領域に属する画素を判定できる。
【００６３】
このように、各領域の判定精度が高いので、画像データの領域分割精度を向上させることができる。
【００６４】
また本発明は、上記の画像処理方法をコンピュータに実行させるための画像処理プログラムである。
【００６５】
本発明に従えば、上記の画像処理方法をコンピュータに実行させるための画像処理プログラムとして提供することができる。
【００６６】
また本発明は、上記の画像処理方法をコンピュータに実行させるための画像処理プログラムを記録したコンピュータ読み取り可能な記録媒体である。
【００６７】
本発明に従えば、上記の画像処理方法をコンピュータに実行させるための画像処理プログラムを記録したコンピュータ読み取り可能な記録媒体として提供することができる。
【００６８】
【発明の実施の形態】
本発明は、文字・写真・背景領域が混在する多値入力画像データに対して、領域判定処理を行う画像処理装置であり、たとえば、デジタル放送（データ放送）で得られた多値入力画像データをプリンタなどで印刷する場合に予め画像処理を行う装置である。
【００６９】
図１は、本発明の実施の一形態である画像形成装置１の構成を示すブロック図である。画像形成装置１は、画像処理装置２と画像出力装置であるプリンタ９とからなり、画像処理装置２は、入力部３、領域分割部４、補正部５、解像度変換部６、色補正部７およびハーフトーン部８からなる。
【００７０】
本実施形態における画像形成装置１は、デジタルテレビ放送などで送信される画像データを印刷して出力するデジタルプリンタとして説明する。印刷して出力するためには、まず有線ケーブルまたは放送用無線アンテナなどを介して送られてきたデジタルテレビ放送信号を、チューナなどの入力部３によって、入力多値画像データ（以下では単に画像データと呼ぶ。）に変換する。画像データは、格子状に配列された複数の画素からなり、各画素は明度値や色度などの画素値を有している。
【００７１】
次に、領域分割部４により、画像データの各画素が文字領域、背景領域、写真領域のいずれの領域に属するかを判定し、画像データを文字領域、背景領域、写真領域に分割した後、補正部５によりそれぞれの領域に適した補正処理を行う。
【００７２】
補正部５は、文字にじみ補正処理手段５ａ、圧縮アーティファクツ除去処理手段５ｂ、ノイズ除去処理手段５ｃからなり、文字領域であると判定された領域については、文字にじみ補正処理手段５ａが文字にじみおよび文字欠けを補正する処理を行い、写真領域には、圧縮アーティファクツ除去処理手段５ｂが圧縮によるアーティファクツを除去する処理を行い、また、背景領域には、ノイズ除去処理手段５ｃが雑音成分を除去するような処理を行う。
【００７３】
補正されて画質改善された画像データは、解像度変換部６によって、プリンタ９の解像度に合わせて解像度変換処理される。色補正部７が、解像度変換処理された画像データの色空間をデバイス色空間に変換した後、最後にハーフトーン部８が中間調処理を行い、プリンタ９に出力する。プリンタ９は、たとえば、電子写真方式やインクジェット方式を用いて画像処理装置２から出力された画像データを紙などの記録媒体に印刷する。
【００７４】
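The above processing is controlled by a CPU (Central Processing Unit), not shown. The image processing apparatus 2 and the printer 9 may be connected directly by a connection cable or via a network such as a LAN (Local Area Network). In that case, the image processing apparatus 2 may be a personal computer (PC) or the like, and the printer 9 may be a facsimile machine, a copier, or a multifunction peripheral having copy and fax functions.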
【００７５】
図２は、領域分割部４の構成を示すブロック図である。領域分割部４は、色変換部１０、クラスタリング部１１、ランレングス算出部１２、文字領域推定部１３および領域判定部１４からなる。
【００７６】
領域分割部４では、写真領域、背景領域、文字領域が混在する画像データに対して、色変換部１０が所定の色空間に変換した後、クラスタリング部１１が再帰的クラス分け処理によって画像データのクラス情報、および、オブジェクト情報を生成する。そして、ランレングス算出部１２が、クラス情報およびオブジェクト情報それぞれについて、水平方向に同一情報を有する画素が連続するランレングスを算出する。
【００７７】
次に、文字領域推定部１３は、クラス情報のランレングスに基づいて文字領域に属する画素を推定する。そして、領域判定部１４は、オブジェクト情報のランが連続する各オブジェクト領域において、文字領域に属すると推定された画素の含有率に基づいて、オブジェクト領域が文字領域、背景領域、写真領域のどの領域に属するかを判定する。
【００７８】
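The operation of each unit is described in detail below. First, if the input image data is an RGB color space image, the color conversion unit 10 computes (R + G + B) / 3 to reduce the three channels to a single value per pixel.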
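The (R + G + B) / 3 conversion can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `to_gray`, the list-of-rows image layout, and the use of integer division are all assumptions made here.

```python
def to_gray(rgb_rows):
    """Reduce each (R, G, B) pixel to one value by averaging the channels.

    rgb_rows is a list of rows, each row a list of (r, g, b) tuples.
    Integer division is an assumption; the source only gives (R+G+B)/3.
    """
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_rows]


if __name__ == "__main__":
    row = [(255, 255, 255), (0, 0, 0), (30, 60, 90)]
    print(to_gray([row]))
```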
【００７９】
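As another color conversion method, the input RGB image may be converted to the CIE 1976 L*a*b* color space (CIE: Commission Internationale de l'Eclairage, the International Commission on Illumination; L*: lightness; a*, b*: chromaticity), which is a uniform color space, and the L* signal is used. FIG. 3 shows an example of an input image (FIG. 3(a)) and the image consisting of the L* signal generated by the color space conversion (FIG. 3(b)).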
【００８０】
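The clustering unit 11 performs recursive classification on the image data and generates class information and object information; it serves as both the class information generating means and the object information generating means. Class information indicates which class each pixel falls into under the recursive classification, that is, whether its pixel value (for example, lightness) belongs to the class at or above the threshold or to the class below it. Object information indicates whether each pixel belongs to the background region or to a non-background region (object region), which comprises the character and photograph regions.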
【００８１】
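Recursive classification computes a threshold from the features of a pixel block containing the pixel of interest and classifies the pixel of interest with that threshold. The pixel block is a 3×3 block consisting of the pixel of interest at the center and its eight neighboring pixels.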
【００８２】
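FIG. 4 shows the 3×3 pixel block. If the pixel of interest C1 is at (x, y), the neighboring pixels P1 to P8 are at P1(x-1, y-1), P2(x, y-1), P3(x+1, y-1), P4(x-1, y), P5(x+1, y), P6(x-1, y+1), P7(x, y+1), and P8(x+1, y+1). The features used are the neighborhood average, the neighborhood edge amount, and the neighborhood thresholds. The neighborhood average Avg is the mean of the nine pixel values in the window of FIG. 4. The edge amount is obtained with the Prewitt operators shown in FIG. 5: the 3×3 pixel values are extracted and convolved with the matrix coefficients. FIG. 5(a) is the operator for the vertical direction and FIG. 5(b) is the operator for the horizontal direction. Applying each operator yields the vertical edge amount edge_v(x, y) and the horizontal edge amount edge_h(x, y).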
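The block features described above can be sketched as follows. The kernel coefficients are the standard Prewitt operators; which of them corresponds to FIG. 5(a) and which to FIG. 5(b), the sign convention, and the name `block_features` are assumptions, since the figure is not reproduced in the text. The sketch handles interior pixels only and does not combine edge_v and edge_h into a single edge amount, because equation (2) is not available here.

```python
# Standard Prewitt kernels. The mapping to FIG. 5(a)/(b) is assumed.
PREWITT_V = ((-1, 0, 1), (-1, 0, 1), (-1, 0, 1))    # responds to vertical edge lines
PREWITT_H = ((-1, -1, -1), (0, 0, 0), (1, 1, 1))    # responds to horizontal edge lines


def block_features(img, x, y):
    """Return (avg, edge_v, edge_h) for the 3x3 block centred on (x, y).

    img is a list of rows of scalar pixel values; (x, y) must be an
    interior pixel so that all eight neighbours exist.
    """
    block = [[img[y + dy][x + dx] for dx in (-1, 0, 1)] for dy in (-1, 0, 1)]
    avg = sum(sum(row) for row in block) / 9
    edge_v = sum(PREWITT_V[j][i] * block[j][i] for j in range(3) for i in range(3))
    edge_h = sum(PREWITT_H[j][i] * block[j][i] for j in range(3) for i in range(3))
    return avg, edge_v, edge_h


if __name__ == "__main__":
    # A vertical step edge: dark columns on the left, bright column on the right.
    img = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]
    print(block_features(img, 1, 1))
```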
【００８３】
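The threshold for the pixel of interest is then determined dynamically from the neighborhood average Avg, the vertical edge amount edge_v, the horizontal edge amount edge_h, and the neighborhood thresholds (the thresholds of neighboring pixels that have already been classified). To improve segmentation accuracy, the neighborhood average is used mainly as the threshold at edges so that the class can change, and the neighborhood threshold is used mainly in flat areas so that the class does not change.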
【００８４】
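The threshold is therefore computed by linear interpolation with the edge amount as the weight coefficient. The general linear interpolation formula is:
Y = (1 - a) × X1 + a × X2   (1)
where 0 ≤ a ≤ 1.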
【００８５】
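In equation (1), computing the threshold Y with the weight coefficient a set to the edge amount, X1 to the neighborhood threshold, and X2 to the neighborhood average means that the neighborhood average dominates the classification threshold at edges and the neighborhood threshold dominates in flat areas.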
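A minimal sketch of equation (1) applied to the threshold, assuming the weight has already been normalized to the range 0 to 1. The function and parameter names are illustrative, and how the two neighborhood thresholds are merged into a single X1 is left to the caller because that detail belongs to an equation not reproduced in this text.

```python
def threshold_by_interpolation(a, neighbor_th, avg):
    """Equation (1): Y = (1 - a) * X1 + a * X2.

    a            normalised edge weight, 0 <= a <= 1
    neighbor_th  X1, threshold inherited from already classified neighbours
    avg          X2, average of the 3x3 block
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("weight a must lie in [0, 1]")
    return (1.0 - a) * neighbor_th + a * avg


if __name__ == "__main__":
    print(threshold_by_interpolation(0.0, 128, 200))   # flat area: neighbour threshold
    print(threshold_by_interpolation(1.0, 128, 200))   # strong edge: block average
    print(threshold_by_interpolation(0.25, 128, 200))  # in between
```

With a = 0 the inherited threshold is kept, so the class tends not to change in flat areas; with a = 1 the local average is used, so the class can flip across an edge.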
The edge amount Edge is computed with the following equation.
【００８６】
【数１】

【００８７】
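To compute the threshold by linear interpolation using equation (1), the edge amount serving as the weight coefficient must lie in the range 0 ≤ Edge ≤ 1, but the edge amount Edge computed by equation (2) does not. Therefore a maximum value W is set for Edge, and dividing by this maximum yields 0 ≤ Edge / W ≤ 1.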
【００８８】
The maximum value W of the edge amount Edge is applied by equation (3) below.
Edge = Edge > W ? W : Edge   (3)
【００８９】
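Equation (3) means that Edge takes the former value when the condition holds and the latter when it does not. That is, when Edge is greater than W, Edge is set to W; when it is W or less, Edge is used unchanged.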
【００９０】
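The recursive classification of this embodiment obtains features such as the edge amount, the neighborhood average, and the neighborhood thresholds in the 3×3 pixel block consisting of the pixel of interest and its neighbors, generates a threshold from these features, and classifies the pixel of interest. The pixels of each resulting class are then classified again with a different threshold, giving classification in multiple stages (levels). This embodiment uses three recursion levels: level 1 splits classes at strong edges, level 2 at moderately strong edges, and level 3 at weak edges. A strong edge is one where the difference in pixel value between the pixels on either side of the edge is large; a weak edge is one where that difference is small.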
【００９１】
A lower limit is therefore placed on the edge amount in order to realize the multi-level recursive classification.
【００９２】
With the lower limit expressed as a function LOWER_VAL(W) of W, the lower limit is applied by equation (4) below.
Edge = Edge < LOWER_VAL(W) ? 0 : Edge   (4)
【００９３】
Here, when Edge is less than LOWER_VAL(W), Edge is set to 0; when it is LOWER_VAL(W) or greater, Edge is used unchanged.
【００９４】
The function LOWER_VAL(W) is, for example, the following function of W.
LOWER_VAL(W) = 32 × W / 128   (5)
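Equations (3) to (5) together limit the edge amount to either 0 or a value between LOWER_VAL(W) and W. A minimal sketch, assuming a non-negative integer edge amount and integer division in equation (5); the function names are illustrative.

```python
def lower_val(w):
    """Equation (5): LOWER_VAL(W) = 32 * W / 128 (integer division assumed)."""
    return 32 * w // 128


def limit_edge(edge, w):
    """Apply the upper limit (3) and then the lower limit (4) to an edge amount."""
    edge = w if edge > w else edge               # equation (3)
    edge = 0 if edge < lower_val(w) else edge    # equation (4)
    return edge


if __name__ == "__main__":
    for w in (128, 64, 32):
        print(w, lower_val(w))
    print(limit_edge(200, 128), limit_edge(10, 128), limit_edge(50, 64))
```

For W = 128, 64, and 32 the lower limits come out as 32, 16, and 8, which matches the per-level values quoted later in the description.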
【００９５】
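Substituting the edge amount Edge obtained from equations (2) to (4), the neighborhood average Avg, and the neighborhood thresholds th(x-1, y) and th(x, y-1) into equation (1) gives the threshold th(x, y) for the pixel of interest. Here (x-1, y) is the coordinate of P4, the pixel immediately to the left of the pixel of interest C1, and (x, y-1) is the coordinate of P2, the pixel immediately above it. Thus th(x-1, y) is the threshold that was used to classify P4, and th(x, y-1) is the threshold that was used to classify P2.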
【００９６】
【数２】

【００９７】
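Equation (7) performs rounding to the nearest integer. Because the threshold th(x, y) is an integer, rounding can be achieved by adding 0.5 to TH/W. In integer arithmetic, however, adding 0.5 after the division increases the processing load, so rounding is generally done by adding half of the divisor to the dividend and then dividing by the divisor.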
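The integer rounding trick described above can be sketched as follows, assuming non-negative operands. The name `round_div` is illustrative; `th_sum` stands for the unnormalized value TH and `w` for the divisor W.

```python
def round_div(th_sum, w):
    """Round th_sum / w to the nearest integer using integer arithmetic only.

    Adding half of the divisor before dividing is equivalent to adding 0.5
    after the division for non-negative th_sum and positive w.
    """
    return (th_sum + w // 2) // w


if __name__ == "__main__":
    print(round_div(191, 128))  # 191/128 is about 1.49
    print(round_div(192, 128))  # 192/128 is exactly 1.5
```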
【００９８】
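In practice, classification scans the pixels of the image data along the row direction (main scanning direction), repeating the processing for each pixel. When one line is finished, the target line moves in the column direction (sub-scanning direction) and classification proceeds again along the main scanning direction.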
【００９９】
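As noted above, computing th(x, y) requires the neighborhood thresholds th(x-1, y) and th(x, y-1). On the first line, however, there is no pixel above the pixel of interest, so th(x, y-1) is unavailable. Likewise, when a line is classified from left to right, the first pixel of interest (the leftmost pixel) has no left neighbor, so th(x-1, y) is unavailable. An initial threshold is therefore set in advance, and where a neighboring pixel does not exist, the initial threshold is used in place of th(x, y-1) or th(x-1, y) to compute th(x, y).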
【０１００】
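In the following, pixel values (for example, lightness) range from 0 (black) to 255 (white), and the initial threshold is 128. Other initial thresholds, such as the average pixel value of the whole image, may also be used.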
【０１０１】
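Because th(x, y) is computed from th(x-1, y) and th(x, y-1), always classifying from left to right along the main scanning direction means that th(x, y) is always influenced by the threshold th(x-1, y) of the left neighbor, and classification may not be appropriate. The direction is therefore switched between left-to-right and right-to-left every predetermined number of lines. When a line is classified from right to left, the neighborhood threshold substituted into equation (6) is changed from th(x-1, y) to th(x+1, y). The resulting th(x, y) takes the pixel above and the pixels on both the left and the right into account evenly.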
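The alternating scan can be sketched as a generator that yields each pixel together with the horizontal offset of the neighbor whose threshold is reused. The name `scan_order` and the parameter `lines_per_direction` (how many lines are processed before the direction flips) are assumptions; the text only says the direction changes every predetermined number of lines.

```python
def scan_order(width, height, lines_per_direction=1):
    """Yield (x, y, side_dx) in processing order.

    side_dx is -1 when the line runs left to right (reuse th(x-1, y)) and
    +1 when it runs right to left (reuse th(x+1, y)).
    """
    for y in range(height):
        left_to_right = (y // lines_per_direction) % 2 == 0
        xs = range(width) if left_to_right else range(width - 1, -1, -1)
        side_dx = -1 if left_to_right else 1
        for x in xs:
            yield x, y, side_dx


if __name__ == "__main__":
    for step in scan_order(3, 2):
        print(step)
```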
【０１０２】
さらに、クラスタリング部１１へ入力される画像データとして、明度値など１つの画素値のみでなく、他に色差などを入力し、エッジ量算出に、色差のエッジ量を付加することにより、色差も考慮したクラス分け処理を行うことができる。
【０１０３】
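Furthermore, computing the dynamic range of the whole image (the difference between the maximum and minimum pixel values) and deriving LOWER_VAL(W) from it with the following equation makes the classification adapt better to the image, which improves processing accuracy.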
【０１０４】
【数３】

Here D denotes the dynamic range.
【０１０５】
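This is because the edge amount in an image is closely related to its dynamic range: edges are hard to detect in an image with a narrow dynamic range (small D), and changing the lower limit used in the edge amount calculation according to the dynamic range accommodates such images.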
【０１０６】
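Because the image processing in this embodiment is raster processing, pixels in the same flat area can receive different thresholds depending on where the pixel of interest lies relative to an edge. For example, when the edge lies below the pixel of interest, the flat area is continuous, and as equations (6) and (7) show, the threshold is computed from the neighborhood thresholds of the left and upper neighbors, which belong to the same flat area. When the edge lies above the pixel of interest, the upper neighbor is an edge pixel, so the threshold is computed from the neighborhood thresholds of an edge pixel and a flat-area pixel. Pixels in the same flat area therefore end up with different thresholds depending on their position relative to the edge. FIG. 6(a) shows the distribution of thresholds obtained with equations (6) and (7) for each pixel of the image data in FIG. 3. The threshold varies in flat areas such as the background and the land and sea in the photograph at the bottom.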
【０１０７】
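The method of computing the classification threshold is therefore changed according to the positional relationship between the pixel of interest and the edge. The case where the pixels of interest are classified one at a time from left to right along the line is described first.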
【０１０８】
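As shown in FIG. 7(a), when the only edge pixel among the neighbors is the pixel above the pixel of interest and there are no edge pixels to its left or right, the threshold th(x-1, y) used to classify the left pixel is used unchanged as th(x, y). As shown in FIG. 7(b), when there is no edge pixel above the pixel of interest and the pixels to its left and right are edge pixels, the threshold th(x, y-1) used to classify the pixel above is used unchanged as th(x, y).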
【０１０９】
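As shown in FIG. 7(c), when none of the neighboring pixels is an edge pixel, the threshold of the pixel of interest is whichever of the left pixel's threshold and the upper pixel's threshold is closer to the preset initial threshold. As shown in FIG. 7(d), in all other cases the threshold of the pixel of interest is computed with equation (6).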
【０１１０】
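Next, the case where the pixels of interest are classified one at a time from right to left along the line is described. As shown in FIG. 8(a), when the only edge pixel among the neighbors is the pixel above the pixel of interest and there are no edge pixels to its left or right, the threshold th(x+1, y) used to classify the right pixel is used unchanged as th(x, y). As shown in FIG. 8(b), when there is no edge pixel above the pixel of interest and the pixels to its left and right are edge pixels, the threshold th(x, y-1) used to classify the pixel above is used unchanged as th(x, y).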
【０１１１】
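As shown in FIG. 8(c), when none of the neighboring pixels is an edge pixel, the threshold of the pixel of interest is whichever of the right pixel's threshold and the upper pixel's threshold is closer to the preset initial threshold. As shown in FIG. 8(d), in all other cases the threshold of the pixel of interest is computed with equation (6).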
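The four cases of FIGS. 7 and 8 can be summarized in one selection function. This is a simplified sketch under stated assumptions: the neighborhood is reduced to two flags (`edge_above`, `edge_beside`) whose exact definition depends on figures not reproduced here, ties in case (c) go to the side threshold, and equation (6) is passed in as a callable `compute_eq6` because the equation itself is not available in this text. `th_side` is th(x-1, y) for a left-to-right line and th(x+1, y) for a right-to-left line.

```python
def choose_threshold(edge_above, edge_beside, th_side, th_above,
                     initial_th, compute_eq6):
    """Pick the threshold for the pixel of interest from the edge layout."""
    if edge_above and not edge_beside:
        return th_side                      # case (a): reuse the side neighbour
    if edge_beside and not edge_above:
        return th_above                     # case (b): reuse the upper neighbour
    if not edge_above and not edge_beside:
        # case (c): keep whichever inherited threshold is closer to the initial one
        if abs(th_side - initial_th) <= abs(th_above - initial_th):
            return th_side
        return th_above
    return compute_eq6()                    # case (d): interpolate a new threshold


if __name__ == "__main__":
    print(choose_threshold(False, False, 100, 120, 128, lambda: 0))
```

Reusing an inherited threshold in cases (a) to (c) is what keeps the threshold constant across a flat area, which the description relies on later when it derives the object information.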
【０１１２】
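FIG. 6(b) shows the threshold distribution when thresholds are determined in this way. The unnatural threshold variation in flat areas no longer occurs. The threshold in flat areas can thus be kept constant, which in turn allows the object information described later to be created.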
【０１１３】
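Recursive classification is carried out by determining a threshold for each pixel as described above and repeating the classification. Specifically, it is implemented as follows.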
【０１１４】
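In this embodiment, recursive classification is repeated down to three levels.
First, in the level 1 classification, the edge amount upper limit (= the sum of the weight coefficients) W1 is set to 128, and each pixel is assigned to one of two classes with lightness 0 or 255 based on the threshold determined as described above. A pixel whose lightness is greater than the threshold is given lightness 255, and a pixel whose lightness is less than the threshold is given lightness 0. The resulting lightness of each pixel is stored per pixel as the level 1 class information and is the level 1 classification result.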
【０１１５】
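At level 2, the pixels assigned to class 0 and the pixels assigned to class 255 at level 1 are each classified further. Setting the edge amount upper limit to W2 = W1 / 2 (= 64) detects finer edges than level 1 and makes class changes more likely. The edge amount lower limit LOWER_VAL(W2) is 16, obtained by substituting W = 64 into equation (5).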
【０１１６】
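In the level 2 classification, pixels assigned to class 0 at level 1 are split into the two classes 0 and 85, and pixels assigned to class 255 at level 1 are split into the two classes 170 and 255. The resulting lightness of each pixel is stored as the level 2 class information and is the level 2 classification result.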
【０１１７】
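Finally, at level 3, the pixels assigned to classes 0, 85, 170, and 255 at level 2 are each classified further. Setting the edge amount upper limit to W3 = W2 / 2 (= 32) detects still finer edges and makes class changes more likely. The edge amount lower limit LOWER_VAL(W3) is 8, obtained by substituting W3 = 32 into equation (5).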
【０１１８】
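In the level 3 classification, pixels assigned to class 0 at level 2 are split into classes 0 and 28, pixels in class 85 into classes 56 and 85, pixels in class 170 into classes 170 and 196, and pixels in class 255 into classes 226 and 255. The resulting lightness of each pixel is stored as the level 3 class information and is the level 3 classification result.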
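The three-level split can be written as a small lookup of child classes per parent class. The class values are taken from the description; the table layout, the name `split`, and the choice to send a pixel equal to the threshold to the upper class (following the earlier "at or above the threshold" wording) are assumptions.

```python
# Child classes (lower, upper) for each parent class at each level.
CHILDREN = {
    1: {None: (0, 255)},                      # level 1 has no parent class
    2: {0: (0, 85), 255: (170, 255)},
    3: {0: (0, 28), 85: (56, 85), 170: (170, 196), 255: (226, 255)},
}


def split(level, parent, value, th):
    """Return the class label a pixel receives at the given level.

    parent is the class assigned at the previous level (None for level 1),
    value is the pixel value and th the threshold chosen for this pixel.
    """
    lower, upper = CHILDREN[level][parent]
    return upper if value >= th else lower


if __name__ == "__main__":
    c1 = split(1, None, 180, 128)
    c2 = split(2, c1, 180, 200)
    c3 = split(3, c2, 180, 175)
    print(c1, c2, c3)
```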
【０１１９】
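FIG. 9 shows a tree structure that schematically represents the classification of pixels when recursive classification is carried out to three levels. Here 0, 28, 56, ... 255 are the lightness values of the classes and serve as the class information that identifies each class. With this tree structure, the level 1 and level 2 class information is easily derived from the level 3 class information. For example, a pixel in class 196 at level 3 belongs to class 170 at level 2 and to class 255 at level 1. Only the level 3 class information therefore needs to be stored for each pixel.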
【０１２０】
ただし、必ずしもクラス情報には明度値を用いる必要はなく、レベル３におけるクラス情報からレベル１，２におけるクラス情報がわかれば良い。たとえば、レベル３のクラスにおいて、前述のクラス０をクラス１，クラス２８をクラス２，クラス５６をクラス３，…，クラス２５５をクラス８などとしてもよい。
【０１２１】
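The clustering unit 11 also creates object information from the per-pixel thresholds determined during recursive classification. Object information is determined for each pixel and indicates whether the pixel belongs to the background region or to an object region other than the background (photograph, characters, and so on). For example, the object information is stored as 1 when the pixel belongs to the background region and as 0 when it belongs to an object region.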
【０１２２】
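Whether a pixel belongs to the background region is decided at each level: while the threshold used for classification is the initial threshold and continues to be so, the pixels are judged to belong to the background region. When thresholds are determined under the conditions shown in FIGS. 7 and 8, the initial threshold persists because the flat area is continuous. Since any region other than the background is assumed to contain some object, everything other than the background region is judged to be an object region.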
【０１２３】
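Accordingly, in the per-pixel recursive classification, if the neighborhood threshold adopted as the threshold is that of a background pixel, the pixel of interest belongs to the background region; if it is that of a non-background pixel, the pixel of interest belongs to the non-background region.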
【０１２４】
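When the threshold is computed with equation (6), the pixel of interest is treated as belonging to the non-background region, because a newly computed threshold suggests that some object is present.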
【０１２５】
以上のように、再帰的クラス分け処理によってクラスタリング部１１は、各画素のクラス情報とオブジェクト情報とを作成する。
【０１２６】
図１０は、各画素のクラス情報の分布を示す図である。本実施形態では、クラス情報として明度値を用いており、この明度値を階調値として用いることで、各画素が有するクラス情報を画像として可視化することができる。図１０（ａ）は、レベル１のクラス情報の分布を示し、図１０（ｂ）は、レベル２のクラス情報の分布を示し、図１０（ｃ）は、レベル３のクラス情報の分布を示している。レベル１から３にかけてクラスが詳細に分類される様子が分かる。
【０１２７】
図１１は、各画素のオブジェクト情報の分布を示す図である。図では、背景領域に属する画素の明度値を２５５（白の領域）とし、オブジェクト領域に属する画素の明度値を１２８（グレーの領域）としてオブジェクト情報の分布を示している。
【０１２８】
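Next, the run length calculation unit 12, which is the run length calculating means, computes the run lengths in the main scanning direction of the class information and object information created by the clustering unit 11. Run lengths are computed for each level, which in this embodiment means up to level 3.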
【０１２９】
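FIG. 12 shows an example of the run length calculation procedure, here with 16 pixels per line. The calculation consists of two processes. Each pixel is given one variable (a count), and the run length is obtained by changing this count under set conditions. The first process works from left to right along the line using the class information of each pixel (see FIG. 12(a)) and increases the count for as long as pixels of the same class continue. The second process works from right to left: when the count of the right neighbor is 1 greater than the count of the pixel of interest, the count of the pixel of interest is replaced with that of the right neighbor, so that every pixel receives the run length of the run it belongs to. Splitting the work into two processes avoids complicated loops and allows a SIMD processor (same-kind multiple-processing arithmetic unit) to perform it as multi-pass processing.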
【０１３０】
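The first process is described with reference to FIG. 12(b). Based on the class information in FIG. 12(a), when the class information of the left neighbor equals that of the pixel of interest, the count of the pixel of interest is the left neighbor's count plus 1. For level 1 in FIG. 12(b), starting with the leftmost pixel as the pixel of interest, its level 1 class information is 0 and it has no left neighbor, so count 0 is written to the output buffer and the pixel of interest moves to the next pixel on the right.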
【０１３１】
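The next pixel of interest (second from the left) also has level 1 class information 0, so 1 is added to the left neighbor's count and count 1 is written to the output buffer. The next pixel of interest (third from the left) has level 1 class information 255 and belongs to a different class from its left neighbor, so the count is reset and count 0 is written to the output buffer. Counts are determined for the whole line in the same way, comparing the class information of each pixel with that of its left neighbor. The count reached before the next pixel with count 0 appears indicates the run length of the run to which the pixel belongs.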
【０１３２】
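Because the class information shown in FIG. 12(a) is level 3 class information, the level 1 class information must be derived from it in order to compute level 1 run lengths. For example, the class information stored for the sixth pixel from the left is the level 3 class 170; from the tree structure in FIG. 9, its level 2 class information is 170 and its level 1 class information is 255.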
【０１３３】
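Next, the level 2 class information of each pixel is obtained and run lengths are computed as for level 1. The level 2 class information is derived from the level 3 class information as follows. With the level 3 class information as in and the level 2 class information as out, the conversion is easily realized by the following expressions.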
【０１３４】
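(1) out = (in < 56) ? 0 : out;
(2) out = (56 <= in && in < 170) ? 85 : out;
(3) out = (170 <= in && in < 226) ? 170 : out;
(4) out = (226 <= in) ? 255 : out;
【０１３５】
(1) The level 3 class information is compared with 56; if it is less than 56, the level 2 class information is 0.
【０１３６】
(2) If the level 3 class information is 56 or more and less than 170, the level 2 class information is 85.
【０１３７】
(3) If the level 3 class information is 170 or more and less than 226, the level 2 class information is 170.
【０１３８】
(4) If the level 3 class information is 226 or more, the level 2 class information is 255.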
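Rules (1) to (4) can be sketched as a chain of range checks, with a second function for the step from level 2 to level 1 that follows the tree of FIG. 9. The function names are illustrative, and the level 1 rule is inferred from the tree (classes 0 and 85 descend from 0; classes 170 and 255 descend from 255).

```python
def level3_to_level2(c3):
    """Map a level 3 class label to its level 2 parent (rules (1) to (4))."""
    if c3 < 56:
        return 0
    if c3 < 170:
        return 85
    if c3 < 226:
        return 170
    return 255


def level2_to_level1(c2):
    """Map a level 2 class label to its level 1 parent."""
    return 0 if c2 < 170 else 255


if __name__ == "__main__":
    for c3 in (0, 28, 56, 85, 170, 196, 226, 255):
        c2 = level3_to_level2(c3)
        print(c3, c2, level2_to_level1(c2))
```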
【０１３９】
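At level 2, run lengths are computed while ignoring class changes at level 1, the level above. When the absolute difference between the class information of the left neighbor and that of the pixel of interest is 255, the class is regarded as unchanged, and the count continues to increase instead of being reset to 0. In other words, a point already judged at level 1 to be a class change, that is, a run boundary, is not detected again at level 2 and below. FIG. 12(b) shows the level 2 run length results.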
【０１４０】
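For level 3, run lengths can be computed directly from the stored class information. As with level 2, however, class changes at the higher levels 1 and 2 are ignored: when the absolute difference between the class information of the left neighbor and that of the pixel of interest exceeds 28, the class is regarded as unchanged and the count continues to increase. The first process thus yields run lengths for levels 1 to 3.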
【０１４１】
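The second process is described next. Using the counts obtained by the first process (FIG. 12(b)), the count of the pixel of interest is compared with that of its right neighbor; if the right neighbor's count is greater by exactly 1, the count of the pixel of interest is replaced with the right neighbor's count. Because the count of the pixel at the right end of a run corresponds to the run length, replacing the counts of all pixels in the run with that of the right-end pixel gives every pixel the run length of its own run. Level 1 is used as an example below (see FIG. 12(c)).
・The count of the rightmost pixel is 1 and it has no right neighbor, so the count stays 1.
・The count of the next pixel (second from the right) is 0 and the right neighbor's count is greater by 1, so the count is replaced with 1.
・The count of the third pixel from the right is 3 and the right neighbor's count is not greater by 1, so the count stays 3.
・The count of the fourth pixel from the right is 2 and the right neighbor's count is greater by 1, so the count is replaced with 3.
・The count of the fifth pixel from the right is 1 and the right neighbor's count is greater by 1, so the count is replaced with 3.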
【０１４２】
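The process is repeated in the same way for the remaining pixels. The comparison between the count of the pixel of interest and that of its right neighbor uses the counts obtained by the first process (FIG. 12(b)), while the replacement value is the right neighbor's count after the second process. Since the maximum of a consecutive sequence of counts (the count at the right end of the run) corresponds to the run length, this amounts to replacing the counts of the consecutively counted pixels with that maximum.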
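The two processes can be sketched for a single line and a single level as follows. The function names and the example labels are illustrative (the example reproduces the five rightmost counts walked through above, not the full 16-pixel line of FIG. 12), and the counts start at 0 as in the description, so a stored value of n means a run of n + 1 pixels.

```python
def run_counts(labels):
    """First process: left to right, count up while the label repeats."""
    counts = []
    for i, label in enumerate(labels):
        if i > 0 and label == labels[i - 1]:
            counts.append(counts[-1] + 1)
        else:
            counts.append(0)
    return counts


def spread_run_lengths(counts):
    """Second process: right to left, copy each run's final count to all its pixels.

    The comparison uses the first-process counts; the copied value is the
    right neighbour's count after the second process.
    """
    out = list(counts)
    for i in range(len(counts) - 2, -1, -1):
        if counts[i + 1] == counts[i] + 1:
            out[i] = out[i + 1]
    return out


if __name__ == "__main__":
    labels = [0, 0, 255, 255, 255, 255, 0, 0]
    first = run_counts(labels)
    print(first)
    print(spread_run_lengths(first))
```

Neither loop branches on anything except a comparison with the immediate neighbor, which is the property the description points to when it says the split into two processes suits multi-pass SIMD execution.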
【０１４３】
以上の第１および第２の処理と同様の処理を行えば、オブジェクト情報のランレングスを算出することができる。第１の処理では、左隣の画素と同じオブジェクト情報であれば、左隣りの画素のカウントに１を加えたカウントを注目画素のカウントとする。第２の処理では、第１の処理結果に基づいて、カウントの置き換えを行う。
【０１４４】
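In a processor such as a SIMD processor that drives multiple data paths with a single program counter, the class information of one line can be divided among several data paths, for example data path A and data path B as shown in FIG. 13(a), and the first process can be executed on all data paths simultaneously.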
【０１４５】
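Run lengths are computed separately within each data path (see FIG. 13(b)), and the data paths are then joined (see FIG. 13(c)). At the junction between data path A and data path B, if the adjacent pixels have the same class information, the count of the rightmost pixel of data path A is added to the pixels of data path B until a pixel with count 0, other than the leftmost pixel of data path B, appears (see FIG. 13(c)). If the class information differs at the junction, the data paths are joined as they are. After joining, the second process is carried out as described above to give each pixel the run length of its run (see FIG. 13(d)). This makes the processing straightforward on a SIMD processor.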
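The split-and-join step can be modeled in plain Python by computing the first process on two segments independently and then patching the junction. This is only a sequential model of the data paths, not SIMD code. One assumption is made explicit: because counts start at 0, the carried value is taken as the rightmost count of segment A plus 1, so that the joined result equals what a single pass over the whole line would give. The function names and `split_at` are illustrative.

```python
def run_counts(labels):
    """First process on one segment: count up while the label repeats."""
    counts = []
    for i, label in enumerate(labels):
        if i > 0 and label == labels[i - 1]:
            counts.append(counts[-1] + 1)
        else:
            counts.append(0)
    return counts


def joined_run_counts(labels, split_at):
    """Compute counts per segment, then join the segments at split_at."""
    seg_a, seg_b = labels[:split_at], labels[split_at:]
    counts_a, counts_b = run_counts(seg_a), run_counts(seg_b)
    if seg_a and seg_b and seg_a[-1] == seg_b[0]:
        carry = counts_a[-1] + 1          # assumed +1 so counts stay zero-based
        for i in range(len(counts_b)):
            if i > 0 and counts_b[i] == 0:
                break                     # next run in B starts here
            counts_b[i] += carry
    return counts_a + counts_b


if __name__ == "__main__":
    labels = [0, 0, 255, 255, 255, 255, 0, 0]
    print(joined_run_counts(labels, 4))
    print(run_counts(labels))
```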
【０１４６】
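Next, the character region estimation unit 13, which is the character region estimating means, estimates the pixels belonging to a character region from the run lengths of the class information. Characters are generally considered to have high complexity, so a pixel whose class run length is at or below the character estimation threshold SIZEOFTEXT can be estimated to belong to a character region.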
【０１４７】
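However, the run lengths computed by the run length calculation unit 12 are run lengths in the main scanning direction, so estimating character regions from the threshold SIZEOFTEXT alone depends only on the horizontal complexity of the image and is not accurate enough.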
【０１４８】
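Therefore, if the neighboring pixels include a pixel that belongs to the same class as the pixel of interest and has been estimated not to be in a character region, the pixel of interest is not estimated to be in a character region even if its class run length is at or below SIZEOFTEXT. Adding this condition improves the accuracy of character region estimation.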
【０１４９】
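In addition, estimation is performed in two directions, covering processing from the left of the line and from the right. When processing from left to right, a pixel is not estimated to be in a character region, even if its class run length is at or below SIZEOFTEXT, when any of the neighboring pixels shown in FIG. 14 satisfies one of the following conditions.
・The left neighbor belongs to the same class as the pixel of interest and has not been estimated to be in a character region.
・The pixel above belongs to the same class as the pixel of interest and has not been estimated to be in a character region.
・The upper-left pixel belongs to the same class as the pixel of interest and has not been estimated to be in a character region.
・The upper-right pixel belongs to the same class as the pixel of interest and has not been estimated to be in a character region.
When processing from right to left, a pixel already estimated to be in a character region by the left-to-right pass is not estimated to be in a character region when the following condition holds.
・The right neighbor belongs to the same class as the pixel of interest and has not been estimated to be in a character region.
With these two passes ((1) left to right along the line, (2) right to left along the line), character regions can be estimated accurately from the run lengths of the class information. This character region estimation is carried out at each level.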
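The two-pass estimation for one level can be sketched as follows. This is an interpretation under stated assumptions rather than the patented procedure: the function name `estimate_text`, the 2-D list layout, processing every row with a left-to-right pass followed by a right-to-left pass, and treating out-of-image neighbors as imposing no condition are all choices made here. The threshold value is a parameter because the text gives no value for SIZEOFTEXT.

```python
def estimate_text(classes, run_len, size_of_text):
    """Return a 2-D list of booleans: True where a pixel is estimated as text.

    classes[y][x]  class label of the pixel at one level
    run_len[y][x]  run length of the class run containing the pixel
    """
    h, w = len(classes), len(classes[0])
    text = [[False] * w for _ in range(h)]

    def vetoes(x, y, nx, ny):
        """True if neighbour (nx, ny) has the same class and is not text."""
        inside = 0 <= nx < w and 0 <= ny < h
        return inside and classes[ny][nx] == classes[y][x] and not text[ny][nx]

    for y in range(h):
        # Pass (1): left to right, checking left, upper, upper-left, upper-right.
        for x in range(w):
            short = run_len[y][x] <= size_of_text
            blocked = any(vetoes(x, y, x + dx, y + dy)
                          for dx, dy in ((-1, 0), (0, -1), (-1, -1), (1, -1)))
            text[y][x] = short and not blocked
        # Pass (2): right to left, checking the right neighbour only.
        for x in range(w - 2, -1, -1):
            if text[y][x] and vetoes(x, y, x + 1, y):
                text[y][x] = False
    return text


if __name__ == "__main__":
    classes = [[0, 0, 0, 0, 0], [255, 0, 255, 255, 255]]
    run_len = [[5, 5, 5, 5, 5], [1, 1, 3, 3, 3]]
    for row in estimate_text(classes, run_len, 3):
        print(row)
```

In the example, the short class-0 run in the second row is rejected because the pixel directly above it has the same class and was not estimated as text, which is the vertical check that a purely horizontal run length cannot provide.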
【０１５０】
図１５は、各レベルにおける文字推定領域を示す図である。図１５（ａ）は、レベル１における文字推定領域、図１５（ｂ）は、レベル２における文字推定領域、図１５（ｃ）は、レベル３における文字推定領域をそれぞれ示している。図では、文字領域に属すると推定された画素の明度値を２５５、それ以外の画素の明度値を０としている。
【０１５１】
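The region determination unit 14 is the region determining means that determines the region of each pixel from the run lengths of the object information and the character region estimation results. A run of object information is treated as a unit window (a group of pixels handled as a single item), and region determination is based on the proportion of pixels estimated at each level to be in a character region, taken from the results of the character region estimation unit 13.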
【０１５２】
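First, the number of character-estimated pixels at level 1, at level 2, and at level 3 within the unit window is counted. FIG. 16 shows an example of a unit window subject to region determination. In this example the run length of the object information forming the unit window is 8 (the figure shows 7 rather than 8 because the run length calculation starts counting at 0); pixels estimated to be in a character region are marked with * and pixels estimated not to be are marked with -.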
【０１５３】
The character-estimated pixels are counted for each level within the unit window. In FIG. 16 the count is 4 at level 1, 3 at level 2, and 0 at level 3.
【０１５４】
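The background, character, and photograph regions are then determined from these counts. In a character region, a continuous object region is usually made up of character-estimated pixels from a single level; for example, the conditions shown below indicate a high likelihood of a character region.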
【０１５５】
【表１】

【０１５６】
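Conversely, in a photograph region a continuous object region is usually made up of character-estimated pixels from several levels. For example, the conditions shown below indicate a high likelihood of a photograph region.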
【０１５７】
【表２】

【０１５８】
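For the actual determination, a LUT (Look-Up Table) that associates the run length of the object information and the numbers of character-estimated pixels at levels 1, 2, and 3 with a region determination result is stored in advance, and the LUT is consulted with the character-estimated pixel counts to decide whether the object region is a character region or a photograph region. The LUT can be created, for example, by a learning method using a neural network.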
【０１５９】
なお、背景領域は、オブジェクト情報を作成した際、オブジェクト領域が存在しない領域を背景領域と判定する。また、オブジェクト領域であったとしてもオブジェクト情報のランレングスがある程度大きく、各レベルにおける文字領域推定画素数が少ない場合には、背景領域として判定してもよい。
【０１６０】
図１７は、領域判定結果を示す図である。ただし、文字領域に属する画素の明度値を０（黒の領域）、背景領域に属する画素の明度値を２５５（白の領域）、写真領域に属する画素の明度値を１２８（グレーの領域）としている。
【０１６１】
さらに、領域分割結果に基づいて、文字領域に判定された画素から詳細に文字を検知する。なお、文字検知を行う際には、図１８の領域分割部４のブロック図に示すように、領域判定部１４の後段に文字検知部１５が設けられる。文字検知部１５以外の部位については、図２で説明した部位と同じであるので説明は省略する。なお、文字検知部１５は必ずしも領域分割部４に備える必要はない。
【０１６２】
文字検知部１５は、領域判定部１４において文字領域であると判定された画素について、文字領域推定結果を用いてさらに詳細に文字を検知する。文字推定領域において、連続する文字推定領域の最初の画素の属するクラスが文字クラスであるのが一般的であることから、最初の画素が属するクラスを検知し、文字領域であると判定された領域内において、検知したクラスと同一のクラスに属する画素が文字領域に属すると判定することにより、文字の判定精度をさらに向上させることができる。
【０１６３】
図１９は、文字検知部１５が文字の検知を行った場合の領域判定結果を示す図である。各領域を示す明度値は、図１７に示した判定結果と同じである。図からわかるように図１７に示した判定結果に比べて、精度良く文字領域が分割されているのがわかる。
【０１６４】
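FIG. 20 is a flowchart of the region segmentation process. In step S1, the color conversion unit 10 converts the color space of the input image data and obtains the pixel values, such as lightness, used for region determination. In step S2, the clustering unit 11 performs recursive classification and generates the class information and object information. In step S3, the run length calculation unit 12 computes the run lengths in the main scanning direction of the generated class information and object information.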
【０１６５】
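In step S4, the character region estimation unit 13 compares the run lengths of the class information with the threshold SIZEOFTEXT and estimates that pixels belonging to runs whose run length does not exceed the threshold belong to a character region. In step S5, the region determination unit 14 decides whether the pixels of each object region belong to a character region or a photograph region, based on the number of pixels estimated to be in a character region among the pixels in the region over which the object information is continuous.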
【０１６６】
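As described above, this embodiment classifies the image data into multiple classes by recursive classification, which determines a threshold for each pixel of interest while taking the influence of neighboring pixels into account, and performs region determination based on the result. Segmentation accuracy is therefore higher than when classification uses a fixed threshold.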
【０１６７】
また、本発明の実施の他の形態は、コンピュータを画像処理装置２として機能させるための画像処理プログラム、および画像処理プログラムを記録したコンピュータ読み取り可能な記録媒体である。これによって、画像処理プログラムおよび画像処理プログラムを記録した記録媒体を持ち運び自在に提供することができる。
【０１６８】
記録媒体は、プリンタやコンピュータシステム（コンピュータシステムに適用する場合はアプリケーション・ソフトとして用いることができる）に備えられるプログラム読み取り装置により読み取られることで、画像処理プログラムが実行される。
【０１６９】
コンピュータシステムの入力手段としては、フラットベッドスキャナ・フィルムスキャナ・デジタルカメラなどを用いてもよい。コンピュータシステムは、これらの入力手段と、所定のプログラムがロードされることにより画像処理などを実行するコンピュータと、コンピュータの処理結果を表示するＣＲＴ（ＣａｔｈｏｄｅＲａｙＴｕｂｅ）ディスプレイ・液晶ディスプレイなどの画像表示装置と、コンピュータの処理結果を紙などに出力するプリンタより構成される。さらには、ネットワークを介してサーバーなどに接続するための通信手段としてのモデムなどが備えられる。
【０１７０】
なお、記録媒体としては、プログラム読み取り装置によって読み取られるものには限らず、マイクロコンピュータのメモリ、たとえばＲＯＭであっても良い。記録されているプログラムはマイクロプロセッサがアクセスして実行しても良いし、あるいは、記録媒体から読み出したプログラムを、マイクロコンピュータのプログラム記憶エリアにダウンロードし、そのプログラムを実行してもよい。このダウンロード機能は予めマイクロコンピュータが備えているものとする。
【０１７１】
記録媒体の具体的な例としては、磁気テープやカセットテープなどのテープ系、フレキシブルディスクやハードディスクなどの磁気ディスクやＣＤ−ＲＯＭ（ＣｏｍｐａｃｔＤｉｓｃ−ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）／ＭＯ（ＭａｇｎｅｔｏＯｐｔｉｃａｌ）ディスク／ＭＤ（ＭｉｎｉＤｉｓｃ）／ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｃ）などの光ディスクのディスク系、ＩＣ（ＩｎｔｅｇｒａｔｅｄＣｉｒｃｕｉｔ）カード（メモリカードを含む）／光カードなどのカード系、あるいはマスクＲＯＭ、ＥＰＲＯＭ（ＥｒａｓａｂｌｅＰｒｏｇｒａｍｍａｂｌｅＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、ＥＥＰＲＯＭ（ＥｌｅｃｔｒｉｃａｌｌｙＥｒａｓａｂｌｅＰｒｏｇｒａｍｍａｂｌｅＲｅａｄＯｎｌｙＭｅｍｏｒｙ）、フラッシュＲＯＭなどの半導体メモリを含めた固定的にプログラムを担持する媒体である。
【０１７２】
また、本実施形態においては、コンピュータはインターネットを含む通信ネットワークに接続可能なシステム構成とし、通信ネットワークを介して画像処理プログラムをダウンロードしても良い。なお、このように通信ネットワークからプログラムをダウンロードする場合には、そのダウンロード機能は予めコンピュータに備えておくか、あるいは別な記録媒体からインストールされるものであっても良い。また、ダウンロード用のプログラムはユーザーインターフェースを介して実行されるものであっても良いし、決められたＵＲＬ（ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＬｏｃａｔｅｒ）から定期的にプログラムをダウンロードするようなものであっても良い。
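[0173]
[Effects of the Invention]
As described above, according to the present invention, the target pixel is classified using a threshold based on the feature amount of a pixel block consisting of the target pixel and its peripheral pixels, so class information and object information that reflect the influence of the peripheral pixels can be generated, in contrast to classification with a fixed threshold. Whether a pixel belongs to the background area is determined accurately based on the object information. The character area is determined, using the class information and the object information, from the estimate based on the class run length and from the proportion of estimated pixels contained in the object run, so the pixels belonging to the character area can be determined accurately. Since the determination accuracy for each area is thus high, the area division accuracy of the image data can be improved.
[0174]
Further, according to the present invention, a threshold reflecting the edge strength is used, so pixels near edges, such as those in a character area, can be classified appropriately.
[0175]
Further, according to the present invention, classification can be performed appropriately even when the dynamic range of the image data is narrow.
[0176]
Further, according to the present invention, changing the scanning direction for each scanning line allows the influence of the left and right peripheral pixels to be received evenly, so appropriate classification can be performed.
[0177]
Further, according to the present invention, the threshold of the target pixel is generated based on the positions of edge pixels, for example by using the threshold of the left peripheral pixel as the threshold of the target pixel without calculating a threshold when an edge pixel lies only above the target pixel. An appropriate threshold can therefore be generated and the classification performed, particularly when the target pixel is, for example, a background pixel near an edge.
[0178]
Further, according to the present invention, whether or not the target pixel belongs to the background area can be determined easily and accurately.
[0179]
Further, according to the present invention, a same-type multi-processing arithmetic device, a so-called SIMD (Single Instruction Multiple Data) processor, can process instructions of the same type simultaneously. When run lengths are calculated, making run length calculation the instruction allows the run length calculation for each data path to be performed simultaneously. The run length calculation process can therefore be sped up.
[0180]
Further, according to the present invention, since characters generally have high complexity, whether pixels belong to the character area can be estimated easily just by comparing the class run length with the character estimation threshold.
[0181]
Further, according to the present invention, the estimation accuracy for the character area can be improved.
Further, according to the present invention, pixels belonging to a character area have the same class information, and among consecutive pixels determined to belong to the character area, the endmost pixel often belongs to the character area, so pixels belonging to the character area can be detected accurately.
[0182]
Further, according to the present invention, image data that has been divided into areas with high accuracy and subjected to post-processing suited to each area can be output, so a high-quality still image can be formed.
[0183]
Further, according to the present invention, the image processing method can be provided as an image processing program for causing a computer to execute it.
[0184]
Further, according to the present invention, the image processing method can be provided as a computer-readable recording medium on which an image processing program for causing a computer to execute it is recorded.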
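[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of an image forming apparatus 1 according to an embodiment of the present invention.
[FIG. 2] A block diagram showing the configuration of the area division unit 4.
[FIG. 3] A diagram showing examples of an input image (FIG. 3(a)) and an image composed of the L* signal generated by color space conversion (FIG. 3(b)).
[FIG. 4] A diagram showing a pixel block of 3 × 3 pixels.
[FIG. 5] A diagram showing an example of the Prewitt operator (Prewitt filter).
[FIG. 6] A diagram showing the distribution of thresholds at each pixel.
[FIG. 7] A diagram explaining how the threshold is determined according to the positional relationship between the target pixel and surrounding edge pixels.
[FIG. 8] A diagram explaining how the threshold is determined according to the positional relationship between the target pixel and surrounding edge pixels.
[FIG. 9] A diagram showing a tree structure that schematically represents the classification of pixels when the recursive classification process is performed up to three levels.
[FIG. 10] A diagram showing the distribution of the class information of each pixel.
[FIG. 11] A diagram showing the distribution of the object information of each pixel.
[FIG. 12] A diagram showing an example of the procedure of the run length calculation process.
[FIG. 13] A diagram showing an example of the procedure of the run length calculation process using a SIMD processor.
[FIG. 14] A diagram explaining the character area estimation process performed by the character area estimation unit 13.
[FIG. 15] A diagram showing the estimated character areas at each level.
[FIG. 16] A diagram showing an example of a unit window subject to area determination.
[FIG. 17] A diagram showing area determination results.
[FIG. 18] A block diagram showing another configuration of the area division unit 4.
[FIG. 19] A diagram showing area determination results when the character detection unit 15 performs character detection.
[FIG. 20] A flowchart showing the area division process.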
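[Explanation of Reference Numerals]
1 Image forming apparatus
2 Image processing apparatus
3 Input unit
4 Area division unit
5 Correction unit
6 Resolution conversion unit
7 Color correction unit
8 Halftone unit
9 Printer
10 Color conversion unit
11 Clustering unit
12 Run length calculation unit
13 Character area estimation unit
14 Area determination unit
[0001]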
[Field of the Invention]
The present invention relates to an image processing method, an image processing apparatus, an image forming apparatus, a program, and a recording medium for dividing a multi-tone image in which background, characters, and photographs are mixed into background, character, and other regions, for example for video input from digital television broadcasting.
[0002]
[Prior art]
Multi-valued input image (still image) data obtained by decoding a digital television broadcast signal received via a tuner contains a mixture of character, photograph, and background areas, each of which suffers its own characteristic image quality degradation. In the character area, character bleeding and character loss occur; in the photograph area, compression artifacts such as ringing and block noise caused by JPEG (Joint Photographic Experts Group) and MPEG (Moving Picture Experts Group) occur. In addition, considerable noise is seen in the background area, and when the image is enlarged as is and output to a printer, the degradation is very conspicuous.
[0003]
If processing that addresses character bleeding or compression artifacts is applied to the entire image data, the image becomes blurred. Conversely, if detail reproduction enhancement or contour enhancement is applied to improve the quality of the photograph area, the background area is also enhanced and its noise stands out. It is therefore desirable to detect and separate the background area, character area, and photograph area of input multi-valued image data in which these areas are mixed, and to apply processing suited to each area.
[0004]
To solve such problems, area division processing for image data has been developed. For example, there is a method that divides areas using the maximum pixel density difference in a small area consisting of a plurality of pixels. This method exploits the fact that the density distribution of the background area is flatter than that of photograph and character areas: if the maximum density difference in the small area containing the pixel of interest is equal to or less than a first predetermined threshold, the pixel of interest is determined to belong to the background area; otherwise it is determined to belong to an object area. Further, within the object area, if the maximum density difference is equal to or larger than a second predetermined threshold, the pixel of interest is determined to belong to a character area; otherwise it is determined to belong to a photograph area.
[0005]
However, in this method, an area with a sharp change in density, such as the outline of an object contained in a photograph area, is erroneously determined to be a character area. The area division accuracy over the image data as a whole is therefore poor. As a technique for solving this problem, there is the image area separation device described in Patent Document 1. This device comprises: input means for inputting, by scanning, image information that includes a background area made up of background color pixels and non-background areas of different types such as photographs and characters; non-background color pixel separation means for detecting non-background color pixels from the image information input for each scan by the input means; run detection means for treating one or more non-background color pixels, separated by the non-background color pixel separation means and consecutive in the scanning direction, as one run and detecting its length; edge pixel determining means for determining whether each non-background color pixel separated by the non-background color pixel separation means is an edge pixel forming an edge of a non-background area; and area determination means for determining, based on the ratio of non-background color pixels determined to be edge pixels, a run attribute indicating which type of non-background area, such as a photograph or characters, the detected run belongs to. Area division is performed using these means. The area determination means determines a run detected by the run detection means to be a character area when the ratio of edge pixels in the run is large, and to be a photograph area when the ratio is small.
[0006]
In general, characters show a sharp distinction between white and black, whereas in photographs white and black often change gradually. Therefore, when a large proportion of the consecutive non-background color pixels belonging to one run detected by the run detection means are determined to be edge pixels, the run can be determined to be a character area. Conversely, in a photograph area, unlike a character area, the non-background color pixels are consecutive and change gradually, so when the proportion of non-background color pixels in the run determined to be edge pixels is small, the run can be determined to be a photograph area. In this way, the area determination means determines each run of non-background color pixels in the main scanning direction to be a character area when the proportion of edge pixels in the run is high, and a photograph area when it is low.
[0007]
[Patent Document 1]
JP-A-6-54180
[0008]
[Problems to be solved by the invention]
In the above prior art, non-background pixels are detected based on, for example, the average density value in an N × N pixel block. However, because non-background pixels are detected by color difference on the assumption that the background color is white, input image data having background areas of various colors, as in digital television broadcasting, cannot be divided into background and non-background areas. Furthermore, since the edge amount is binarized (classified) using a fixed threshold based on the maximum density difference in the N × N pixel block, the accuracy of dividing the non-background area into a character area and a photograph area is poor.
[0009]
An object of the present invention is to provide an image processing apparatus capable of performing highly accurate area division processing, an image forming apparatus including the image processing apparatus, an image processing method, an image processing program, and a computer-readable recording medium.
[0010]
[Means for Solving the Problems]
The present invention is an image processing apparatus comprising an area dividing unit that receives image data representing an image composed of a plurality of pixels, determines, based on the input image data, whether each pixel constituting the image belongs to a character area, a background area, or another area, and divides the image data into areas,
wherein the area dividing unit includes:
class information generating means for obtaining, using the pixel value of each pixel, a feature amount of a pixel block consisting of a target pixel and its peripheral pixels, generating a threshold based on the obtained feature amount, classifying the target pixel by comparing the generated threshold with its pixel value so that the pixels are divided into two pixel sets, further classifying each resulting pixel set with a threshold different from that threshold so as to perform classification in a plurality of stages, and generating, for each stage, class information indicating the result of the classification;
object information generating means for determining, based on the plurality of thresholds generated by the class information generating means, whether or not the target pixel belongs to the background area, and generating object information indicating the determination result;
run length calculating means for calculating, for each of the stages, a class run length, which is the number of pixels in a class run consisting of pixels that have the same class information and are adjacent to one another in a predetermined direction, and an object run length, which is the number of pixels in an object run consisting of pixels that have the same object information and are adjacent to one another in a predetermined direction;
character area estimation means for estimating, for each of the stages, whether or not the pixels included in a class run belong to a character area, based on the class run length; and
area determining means for determining, based on the object information, whether or not a pixel belongs to the background area, and for determining whether the pixels included in an object run belong to a character area or another area based on the ratio, at each stage, of the pixels included in the object run that are estimated by the character area estimation means to belong to the character area.
[0011]
According to the present invention, the area dividing unit determines, based on image data representing an image composed of a plurality of pixels, whether each pixel constituting the image belongs to a character area, a background area, or another area, and divides the image data into areas.
[0012]
The area dividing unit has the configuration described above. First, the class information generating means obtains, using the pixel value of each pixel, a feature amount of a pixel block consisting of the target pixel and its peripheral pixels, generates a threshold based on the obtained feature amount, and classifies the target pixel by comparing the generated threshold with its pixel value. This classification divides the pixels into two pixel sets, and the pixels of each set are then classified again with a threshold different from the first. Repeating this process yields a classification in a plurality of stages, and the result of the multi-stage classification is generated as class information. The class information indicates which class each pixel belongs to under this classification, that is, whether its pixel value, such as a lightness value, belongs to the class at or above the threshold or the class below the threshold.
[0013]
For example, in the first stage, the pixels are classified into two classes by the first classification, and in the second stage, the pixels of these two classes are further classified and classified into four classes. Accordingly, the first-stage class information indicates which of the two classes each pixel belongs to, and the second-stage class information indicates which of the four classes each pixel belongs to.
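The two-stage example above can be sketched as follows. This is a minimal illustration only: the function names `classify_levels` and `mean_threshold` are hypothetical, and using the mean of each pixel set as its threshold is a simplifying assumption, whereas the text generates a threshold per target pixel from the features of its pixel block.

```python
def mean_threshold(values):
    """Assumed stand-in threshold: the mean of the pixel set."""
    return sum(values) / len(values)


def classify_levels(values, levels, threshold_fn=mean_threshold):
    """Return the class information of every pixel for each stage.

    Stage k assigns each pixel one of 2**k classes: every class of the
    previous stage is split in two by comparing pixel values with a
    threshold computed for that class.
    """
    classes = [0] * len(values)
    per_level = []
    for _ in range(levels):
        # Group pixel indices by the class they currently belong to.
        groups = {}
        for index, cls in enumerate(classes):
            groups.setdefault(cls, []).append(index)
        new_classes = classes[:]
        for cls, indices in groups.items():
            threshold = threshold_fn([values[i] for i in indices])
            for i in indices:
                # Child class: 2*cls for "below", 2*cls + 1 for "at or above".
                new_classes[i] = 2 * cls + (1 if values[i] >= threshold else 0)
        classes = new_classes
        per_level.append(classes[:])
    return per_level
```

With four pixel values such as `[10, 20, 200, 220]`, the first stage separates dark from bright pixels (two classes) and the second stage splits each of those again (four classes).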
[0014]
The object information generating means determines whether or not the pixel of interest belongs to the background area based on the plurality of thresholds generated by the class information generating means, and generates object information indicating the result of the determination.
[0015]
When the class information and the object information have been generated in this way, the run length calculating means calculates the class run length and the object run length for each of the stages. The class run length is the number of pixels in a class run consisting of pixels that have the same class information and are adjacent to one another in a predetermined direction. The object run length is the number of pixels in an object run consisting of pixels that have the same object information and are adjacent to one another in a predetermined direction. That is, the class run length indicates the number of consecutive pixels classified into the same class by the classification, and the object run length indicates the number of consecutive pixels that belong to the background area, or the number of consecutive pixels that do not belong to the background area (pixels that belong to the character area or another area).
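A run length over one scan line can be computed with a single pass. The sketch below assumes the predetermined direction is horizontal and that the labels of one line (class information or object information) are given as a list; the name `run_lengths` is hypothetical.

```python
def run_lengths(labels):
    """Return the runs of a scan line and the run length seen by each pixel.

    A run is a maximal sequence of adjacent pixels with the same label.
    """
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([label, 1])   # start a new run
    per_pixel = [length for _, length in runs for _ in range(length)]
    return [tuple(run) for run in runs], per_pixel
```

The same function serves for class runs and object runs, since both are defined by equality of the per-pixel information.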
[0016]
Next, based on the class run length calculated by the run length calculating means, whether or not the pixels included in a class run belong to the character area is estimated for each of the stages. If the determination of whether a pixel belongs to the character area were made at this point, its accuracy could be low. The final determination is therefore made by the area determining means described later, and the character area estimation means only estimates, for each stage, the pixels that are highly likely to belong to the character area based on the class run length.
[0017]
Based on the operation results of the respective units obtained as described above, the region determining unit determines the region to which the pixel belongs.
[0018]
First, it is determined whether or not a pixel belongs to a background area based on the object information generated by the object information generating means. For a pixel determined not to belong to the background area, it is determined whether the pixel belongs to the character area or another area as follows.
[0019]
For an object run including a pixel that does not belong to the background area, the ratio of the pixel that is estimated to belong to the character area by the character area estimation unit among the pixels included in the object run is calculated for each of the above-described steps. In the character area, since there is a large percentage of pixels that are estimated to belong to the character area at the same stage in one object run, the object run is determined based on the ratio of pixels estimated to belong to the character area for each stage. It is determined whether or not the run is an object run composed of pixels belonging to the character area. If the object run is composed of pixels belonging to the character area, the pixels included in the object run are determined as the pixels included in the character area. If it is not an object run consisting of pixels belonging to the character area, the pixels included in the object run are determined as pixels included in other areas.
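The ratio-based determination for one scan line might look like the sketch below. The names, the 0.5 ratio threshold, and the rule of taking the best ratio over all stages are assumptions made for illustration; the text only states that the decision is based on the per-stage ratio of pixels estimated to belong to the character area.

```python
BACKGROUND, TEXT, OTHER = "background", "text", "other"


def determine_areas(is_object, text_est_by_stage, min_ratio=0.5):
    """Label each pixel of a scan line as background, text, or other.

    is_object[i] is True when pixel i does not belong to the background.
    text_est_by_stage[k][i] is True when stage k estimated pixel i as text.
    """
    n = len(is_object)
    result = [BACKGROUND] * n
    start = 0
    while start < n:
        # Find the end of the object run (or background run) starting here.
        end = start
        while end < n and is_object[end] == is_object[start]:
            end += 1
        if is_object[start]:
            length = end - start
            # Assumed rule: use the highest per-stage ratio of text estimates.
            best = max(sum(stage[start:end]) / length
                       for stage in text_est_by_stage)
            label = TEXT if best >= min_ratio else OTHER
            result[start:end] = [label] * length
        start = end
    return result
```

All pixels of an object run receive the same label, which matches the description that the run as a whole is judged to consist of character pixels or not.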
[0020]
Since the target pixel is classified using a threshold based on the feature amount of the pixel block consisting of the target pixel and its peripheral pixels, class information and object information that reflect the influence of the peripheral pixels can be generated, in contrast to classification with a fixed threshold. Whether a pixel belongs to the background area is determined accurately based on the object information. The character area is determined, using the class information and the object information, from the estimate based on the class run length and from the proportion of estimated pixels contained in the object run, so the pixels belonging to the character area can be determined accurately.
[0021]
As described above, since the determination accuracy of each region is high, the region division accuracy of the image data can be improved.
[0022]
Further, the present invention is characterized in that the class information generating means obtains, as feature amounts, the edge amount of the target pixel, the density average value of the pixels included in the pixel block, and the thresholds of the classification performed when the peripheral pixels were target pixels, and calculates the threshold by linearly interpolating between the density average value and the thresholds of the peripheral pixels using the edge amount as a weighting coefficient.
[0023]
According to the present invention, the class information generating means obtains, as feature amounts, the edge amount of the target pixel, the density average value of the pixels included in the pixel block, and the thresholds of the classification performed when the peripheral pixels were target pixels. The threshold is calculated by linearly interpolating between the density average value and the thresholds of the peripheral pixels using the edge amount as a weighting coefficient.
[0024]
As a result, since the threshold value reflecting the edge strength is used, it is possible to appropriately classify pixels in the vicinity of an edge such as a character area.
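One possible reading of this interpolation is sketched below. The exact formula is not given in this passage, so the normalization by `edge_max`, the direction of the weighting (a stronger edge pulls the threshold toward the block average), and the function name are all assumptions.

```python
def interpolate_threshold(block_mean, neighbor_threshold, edge_amount,
                          edge_max=255.0):
    """Blend the block's density average with a neighboring pixel's threshold.

    The weight w is the edge amount normalized to [0, 1] (assumed).
    w = 1 gives the block mean; w = 0 reuses the neighbor's threshold.
    """
    w = min(max(edge_amount / edge_max, 0.0), 1.0)
    return w * block_mean + (1.0 - w) * neighbor_threshold
```

Under this reading, flat regions inherit the threshold of already processed neighbors, while pixels at strong edges get a threshold close to the local average, which is where a two-way split is most meaningful.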
[0025]
Further, the invention is characterized in that the class information generating means sets the lower limit value of the edge amount based on a dynamic range of image data.
[0026]
According to the invention, the class information generating means sets the lower limit of the image edge amount based on the dynamic range of the image data.
[0027]
As a result, even when the dynamic range of the image data is narrow, the classification can be appropriately performed.
[0028]
Further, in the invention, it is preferable that the class information generating means sequentially moves the pixel of interest in a predetermined scanning direction, classifies the pixel, and changes the scanning direction for each scanning line.
[0029]
According to the present invention, the class information generating means sequentially moves the pixel of interest in a predetermined scanning direction to perform classification, and changes the scanning direction for each scanning line.
[0030]
When the class information generating means calculates a threshold, it uses the thresholds of the classifications performed when the peripheral pixels were target pixels. When a line is scanned from left to right, therefore, the thresholds of the peripheral pixels above and to the left of the target pixel are used. If the classification were performed without changing the scanning direction, the threshold would always be influenced by the left peripheral pixels and an appropriate threshold could not be calculated. By changing the scanning direction for each scanning line, the influence of the left and right peripheral pixels is received evenly, so appropriate classification can be performed.
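Alternating the scanning direction per line can be expressed as a simple visiting order. The sketch assumes even lines run left to right and odd lines right to left; the name `serpentine_order` is hypothetical.

```python
def serpentine_order(height, width):
    """Yield (row, column) pairs, reversing direction on every other line."""
    for y in range(height):
        if y % 2 == 0:
            columns = range(width)               # left to right
        else:
            columns = range(width - 1, -1, -1)   # right to left
        for x in columns:
            yield (y, x)
```

On a right-to-left line the already processed horizontal neighbor is the one on the right, so over two consecutive lines both sides contribute to the thresholds.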
[0031]
Further, the present invention is characterized in that, based on the positions of edge pixels included in the peripheral pixels, the class information generating means either selects the threshold of the target pixel from among the thresholds of the peripheral pixels without calculating it, or calculates it by linear interpolation between the density average value and the thresholds of the peripheral pixels.
[0032]
According to the present invention, based on the positions of edge pixels included in the peripheral pixels, the class information generating means either selects the threshold of the target pixel from among the thresholds of the peripheral pixels without calculating it, or calculates it by linear interpolation between the density average value and the thresholds of the peripheral pixels.
[0033]
When an edge pixel is included in the peripheral pixels, the calculated threshold is strongly affected by that edge pixel, and an appropriate threshold cannot be obtained. The threshold of the target pixel is therefore generated based on the positions of the edge pixels; for example, when an edge pixel lies only above the target pixel, the threshold of the left peripheral pixel is used as the threshold of the target pixel without calculating a new one. As a result, an appropriate threshold can be generated and the classification performed, particularly when the target pixel is, for example, a background pixel near an edge.
[0034]
Further, the present invention is characterized in that the object information generating means determines that the target pixel belongs to the background area when the predetermined initial threshold for the first target pixel of the image data continues as the threshold generated by the class information generating means.
[0035]
According to the present invention, when the predetermined initial threshold for the first target pixel of the image data continues as the threshold generated by the class information generating means, the object information generating means determines that the target pixel belongs to the background area.
[0036]
Since the first target pixel of the image data is almost always a background pixel, the target pixel often belongs to the background area when the threshold generated by the class information generating means remains at the initial threshold. Under this condition, therefore, whether or not the target pixel belongs to the background area can be determined easily and accurately.
[0037]
Further, in the present invention, the run-length calculating means is constituted by a same-type multi-processing type arithmetic unit, divides a scan line into data paths including class information for a predetermined number of pixels, and calculates a run length for each data path. After the calculation of each data path, the data paths are connected to obtain a run length.
[0038]
According to the present invention, the run length calculating means is constituted by the same type of multiple processing type arithmetic device. At this time, the scan line is divided into data paths including class information for a predetermined number of pixels, the run length is calculated for each data path, and after calculating each data path, the data paths are connected to obtain a run length.
[0039]
A multi-processing type arithmetic device of the same type, that is, a so-called Single Instruction Multiple Data (SIMD) type processor can simultaneously execute the processing of the same type of instruction. When calculating the run length, if the instruction is the calculation of the run length, the process of calculating the run length for each data path can be performed simultaneously. Therefore, the processing speed of the run length calculation processing can be increased.
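The divide-then-connect scheme can be illustrated in plain Python. This sketch only models the data flow: the per-segment step is independent for each data path and is the part a SIMD processor could run simultaneously, while here it runs sequentially. The function names and the segment size are assumptions.

```python
def runs_in_segment(labels):
    """Run-length encode one data path (segment) independently of the others."""
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    return runs


def segmented_run_lengths(labels, segment_size):
    """Split a scan line into segments, encode each, then connect the results."""
    segments = [labels[i:i + segment_size]
                for i in range(0, len(labels), segment_size)]
    partial = [runs_in_segment(segment) for segment in segments]
    merged = []
    for segment_runs in partial:
        for label, length in segment_runs:
            # A run that continues across a segment boundary is joined here.
            if merged and merged[-1][0] == label:
                merged[-1][1] += length
            else:
                merged.append([label, length])
    return [tuple(run) for run in merged]
```

The connection step touches only one entry per run, so its cost is small compared with the per-pixel work done inside each segment.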
[0040]
Further, the present invention is characterized in that the character area estimation means compares the class run length calculated by the run length calculating means with a predetermined character estimation threshold, and estimates that the pixels included in the class run belong to the character area when the class run length is equal to or less than the threshold.
[0041]
According to the present invention, the character area estimation means compares the class run length calculated by the run length calculating means with a predetermined character estimation threshold, and estimates that the pixels included in the class run belong to the character area when the class run length is equal to or less than the threshold.
[0042]
In general, characters have a high degree of complexity, so it can be easily estimated whether or not a character belongs to a character area only by comparing the class run length with a character estimation threshold.
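The threshold comparison can be sketched for a single scan line as follows. The function name and the parameter `size_of_text` (standing in for the character estimation threshold) are hypothetical, and the comparison uses "equal to or less than" as stated in this passage.

```python
def estimate_text_pixels(class_labels, size_of_text):
    """Flag pixels whose class run is short enough to be a character stroke.

    Returns one boolean per pixel: True when the class run containing the
    pixel has a length equal to or less than size_of_text.
    """
    flags = []
    n = len(class_labels)
    start = 0
    while start < n:
        end = start
        while end < n and class_labels[end] == class_labels[start]:
            end += 1
        length = end - start
        flags.extend([length <= size_of_text] * length)
        start = end
    return flags
```

Short runs correspond to the frequent class changes that text produces along a line, while long runs are more typical of flat background or smooth photographic content.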
[0043]
Further, the present invention is characterized in that the character area estimation means estimates that the target pixel does not belong to the character area when any of its surrounding pixels has the same class information as the target pixel and is estimated not to belong to the character area.
[0044]
According to the present invention, when any of the surrounding pixels has the same class information as the target pixel and is estimated not to belong to the character area, the target pixel is estimated not to belong to the character area.
[0045]
The run length used for estimating the character area is a run length in a predetermined direction, for example only the horizontal direction. If the estimation relied only on comparing the run length with the threshold, it would depend solely on the complexity in the horizontal direction, and the estimation accuracy would be low. Adding the above condition to the estimation improves the accuracy.
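The added condition can be sketched on a two-dimensional grid. The choice of the eight adjacent pixels as "surrounding pixels", the single pass over the initial estimates, and the function name are assumptions; the passage does not specify the neighborhood or the processing order.

```python
def refine_text_estimate(classes, text_est):
    """Withdraw a text estimate when a same-class neighbor is not text.

    classes[y][x] is the class information, text_est[y][x] the initial
    estimate from the run length comparison. Neighbors are the eight
    adjacent pixels (assumed). Decisions use the initial estimates only.
    """
    height, width = len(classes), len(classes[0])
    refined = [row[:] for row in text_est]
    for y in range(height):
        for x in range(width):
            if not text_est[y][x]:
                continue
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy, dx) == (0, 0):
                        continue
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if classes[ny][nx] == classes[y][x] and not text_est[ny][nx]:
                        refined[y][x] = False
    return refined
```

In the example used for checking, a short horizontal run that sits directly under a long run of the same class loses its text estimate, which is the vertical information that the horizontal run length alone cannot provide.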
[0046]
Further, the present invention is characterized by comprising character detecting means for detecting, as character pixels, those pixels among the consecutive pixels determined by the area determining means to belong to the character area that have the same class information as the endmost pixel.
[0047]
According to the present invention, the character detecting means detects, as character pixels, those pixels among the consecutive pixels determined by the area determining means to belong to the character area that have the same class information as the endmost pixel.
[0048]
Pixels belonging to a character area have the same class information, and among consecutive pixels determined to belong to the character area, the endmost pixel often belongs to the character area. Pixels belonging to the character area can therefore be detected accurately.
[0049]
According to another aspect of the present invention, there is provided an image forming apparatus comprising: the image processing device described above; and an image output device that outputs image data processed by the image processing device.
[0050]
According to the invention, the image data processed by the image processing device is output from the image output device.
[0051]
As a result, the image data is divided into regions with high precision, and image data subjected to post-processing according to each region can be output, so that a high-quality still image can be formed.
[0052]
Further, the present invention is an image processing method comprising an area dividing step of receiving image data representing an image composed of a plurality of pixels, determining, based on the input image data, whether each pixel constituting the image belongs to a character area, a background area, or another area, and dividing the image data into areas,
wherein the area dividing step includes:
a class information generating step of obtaining, using the pixel value of each pixel, a feature amount of a pixel block consisting of a target pixel and its peripheral pixels, generating a threshold based on the obtained feature amount, classifying the target pixel by comparing the generated threshold with its pixel value so that the pixels are divided into two pixel sets, further classifying each resulting pixel set with a threshold different from that threshold so as to perform classification in a plurality of stages, and generating, for each stage, class information indicating the result of the classification;
an object information generating step of determining, based on the plurality of thresholds generated in the class information generating step, whether or not the target pixel belongs to the background area, and generating object information indicating the determination result;
a run length calculating step of calculating, for each of the stages, a class run length, which is the number of pixels in a class run consisting of pixels that have the same class information and are adjacent to one another in a predetermined direction, and an object run length, which is the number of pixels in an object run consisting of pixels that have the same object information and are adjacent to one another in a predetermined direction;
a character area estimation step of estimating, for each of the stages, whether or not the pixels included in a class run belong to a character area, based on the class run length; and
an area determining step of determining, based on the object information, whether or not a pixel belongs to the background area, and of determining whether the pixels included in an object run belong to a character area or another area based on the ratio, at each stage, of the pixels included in the object run that are estimated in the character area estimation step to belong to the character area.
[0053]
According to the present invention, the area dividing step determines whether each pixel constituting the image belongs to a character area, a background area, or another area based on image data indicating an image composed of a plurality of pixels. Then, the image data is divided into regions.
[0054]
The area dividing step includes the steps described above. First, in the class information generating step, a feature amount of a pixel block consisting of the target pixel and its peripheral pixels is obtained using the pixel value of each pixel, a threshold based on the obtained feature amount is generated, and the target pixel is classified by comparing the generated threshold with its pixel value. This classification divides the pixels into two pixel sets, and the pixels of each set are then classified again with a threshold different from the first. Repeating this process yields a classification in a plurality of stages, and the result of the multi-stage classification is generated as class information. The class information indicates which class each pixel belongs to under this classification, that is, whether its pixel value, such as a lightness value, belongs to the class at or above the threshold or the class below the threshold.
[0055]
For example, in the first stage, the pixels are classified into two classes by the first classification, and in the second stage, the pixels of these two classes are further classified and classified into four classes. Accordingly, the first-stage class information indicates which of the two classes each pixel belongs to, and the second-stage class information indicates which of the four classes each pixel belongs to.
[0056]
In the object information generating step, it is determined whether or not the target pixel belongs to the background area based on the plurality of thresholds generated in the class information generating step, and object information indicating the determination result is generated.
[0057]
When the class information and the object information have been generated in this way, the class run length and the object run length are calculated for each of the stages in the run length calculating step. The class run length is the number of pixels in a class run consisting of pixels that have the same class information and are adjacent to one another in a predetermined direction. The object run length is the number of pixels in an object run consisting of pixels that have the same object information and are adjacent to one another in a predetermined direction. That is, the class run length indicates the number of consecutive pixels classified into the same class by the classification, and the object run length indicates the number of consecutive pixels that belong to the background area, or the number of consecutive pixels that do not belong to the background area (pixels that belong to the character area or another area).
[0058]
Next, based on the class run length calculated in the run length calculating step, it is determined at each stage whether or not the pixels included in the class run belong to the character area. Because a determination made at this point may have low accuracy, the final determination is left to the area determination step described later; in the character area estimation step, pixels that are likely to belong to the character area are estimated for each stage based on the class run length.
[0059]
In the area determination step, the area to which the pixel belongs is determined based on the results of the respective steps obtained as described above.
[0060]
First, it is determined whether or not a pixel belongs to a background area based on the object information generated in the object information generation step. For a pixel determined not to belong to the background area, it is determined whether the pixel belongs to the character area or another area as follows.
[0061]
For an object run consisting of pixels that do not belong to the background area, the ratio of pixels estimated to belong to the character area in the character area estimation step, among the pixels included in the object run, is calculated for each of the stages. In a character area, a large proportion of the pixels in one object run are estimated to belong to the character area at the same stage. Therefore, whether or not the object run consists of pixels belonging to the character area is determined based on the per-stage ratio of pixels estimated to belong to the character area. If it does, the pixels included in the object run are determined to be pixels of the character area; if it does not, they are determined to be pixels of another area.
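The ratio-based decision for a single object run can be sketched as follows. This is only an illustration under stated assumptions: the cut-off `ratio_threshold` and the rule that one sufficiently high stage is enough are hypothetical, since the text only says that the decision is based on the per-stage ratios.

```python
def classify_object_run(char_flags_per_level, ratio_threshold=0.5):
    """Label one non-background object run as 'character' or 'other'.

    char_flags_per_level: one list of booleans per stage; True marks a pixel
        that the character area estimation step flagged at that stage.
    ratio_threshold: hypothetical cut-off, not a value given in the text.
    """
    for flags in char_flags_per_level:
        ratio = sum(flags) / len(flags)  # share of estimated character pixels
        if ratio >= ratio_threshold:     # assumed rule: any one stage suffices
            return "character"
    return "other"
```

With this rule, a run whose pixels are mostly flagged at one stage is labeled as a character run even if the other stages flag few pixels.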
[0062]
Since the target pixel is classified using a threshold based on the feature amount of the pixel block including the target pixel and its peripheral pixels, class information and object information that reflect the influence of the peripheral pixels can be generated, in contrast to classification with a fixed threshold. The background area is accurately determined based on the object information. The character area is determined, using the class information and the object information, based on the class run length and on the ratio of pixels in the object run that are estimated to belong to the character area, so the pixels belonging to the character area can also be determined accurately.
[0063]
As described above, since the determination accuracy of each region is high, the region division accuracy of the image data can be improved.
[0064]
Further, the present invention is an image processing program for causing a computer to execute the above image processing method.
[0065]
According to the present invention, it is possible to provide an image processing program for causing a computer to execute the above image processing method.
[0066]
The present invention is also a computer-readable recording medium on which an image processing program for causing a computer to execute the above-described image processing method is recorded.
[0067]
According to the present invention, it is possible to provide a computer-readable recording medium on which an image processing program for causing a computer to execute the above-described image processing method is recorded.
[0068]
BEST MODE FOR CARRYING OUT THE INVENTION
The present invention relates to an image processing apparatus that performs an area determination process on multi-valued input image data in which character, photograph, and background areas are mixed. For example, it is an apparatus that performs image processing in advance when multi-valued input image data obtained by digital broadcasting (data broadcasting) is printed by a printer or the like.
[0069]
FIG. 1 is a block diagram illustrating a configuration of an image forming apparatus 1 according to an embodiment of the present invention. The image forming apparatus 1 includes an image processing apparatus 2 and a printer 9 as an image output apparatus. The image processing apparatus 2 includes an input unit 3, an area dividing unit 4, a correction unit 5, a resolution conversion unit 6, a color correction unit 7, and a halftone unit 8.
[0070]
The image forming apparatus 1 according to the present embodiment will be described as a digital printer that prints and outputs image data transmitted by digital television broadcasting or the like. In order to print and output, first, a digital television broadcast signal sent via a wired cable or a broadcast radio antenna is input to an input unit 3 such as a tuner, which yields multi-valued image data (hereinafter simply referred to as image data). The image data is composed of a plurality of pixels arranged in a grid, and each pixel has pixel values such as a brightness value and chromaticity.
[0071]
Next, the region dividing unit 4 determines whether each pixel of the image data belongs to a character region, a background region, or a photograph region, and divides the image data into a character region, a background region, and a photograph region. The correction unit 5 performs a correction process suitable for each area.
[0072]
The correction unit 5 includes a character bleeding correction processing unit 5a, a compression artifact removal processing unit 5b, and a noise removal processing unit 5c. The character bleeding correction processing unit 5a corrects character bleeding in the character area, the compression artifact removal processing unit 5b removes artifacts due to compression in the photograph area, and the noise removal processing unit 5c removes noise components in the background area.
[0073]
The corrected image data whose image quality has been improved is subjected to resolution conversion processing by the resolution conversion unit 6 in accordance with the resolution of the printer 9. After the color correction unit 7 converts the color space of the image data subjected to the resolution conversion process into the device color space, finally, the halftone unit 8 performs a halftone process and outputs the halftone process to the printer 9. The printer 9 prints image data output from the image processing apparatus 2 on a recording medium such as paper using an electrophotographic method or an ink jet method, for example.
[0074]
The above processing is controlled by a CPU (Central Processing Unit), not shown. The image processing apparatus 2 and the printer 9 may be directly connected by a connection cable, or may be connected via a network such as a LAN (Local Area Network). In this case, the image processing apparatus 2 may be a personal computer (PC) or the like, and the printer 9 may be a facsimile device, a copying device, or a multifunction device having a copying function and a facsimile function.
[0075]
FIG. 2 is a block diagram showing a configuration of the area dividing unit 4. The area dividing unit 4 includes a color conversion unit 10, a clustering unit 11, a run length calculating unit 12, a character area estimation unit 13, and an area determination unit 14.
[0076]
In the area dividing unit 4, the color conversion unit 10 first converts the image data, in which the photograph area, the background area, and the character area are mixed, into a predetermined color space. The clustering unit 11 then performs recursive classification processing on the converted image data to generate class information and object information. The run length calculating unit 12 then calculates, for each of the class information and the object information, the run length over which pixels having the same information continue in the horizontal direction.
[0077]
Next, the character area estimation unit 13 estimates pixels belonging to the character area based on the run length of the class information. Then, for each object area in which a run of the object information continues, the area determination unit 14 determines whether the area is a character area, a background area, or a photograph area based on the proportion of pixels estimated to belong to the character area.
[0078]
Hereinafter, the operation of each part will be described in detail. First, in the color conversion unit 10, if the input image data is an RGB color space image, (R + G + B) / 3 is calculated so that the three components are unified into a single value per pixel.
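As a minimal sketch of this conversion, the function below averages the three components of each pixel. The nested-list image layout, the name `to_single_channel`, and the use of integer division are assumptions made for illustration; the text only specifies (R + G + B) / 3.

```python
def to_single_channel(rgb_rows):
    """Convert rows of (R, G, B) tuples into rows of single values.

    Each output value is (R + G + B) / 3, truncated to an integer
    (the truncation is an assumption; the text does not specify rounding).
    """
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_rows]
```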
[0079]
As another color conversion method, the input RGB color space image may be converted into the CIE 1976 L*a*b* uniform color space (CIE: Commission Internationale de l'Eclairage, the International Commission on Illumination; L*: lightness; a*, b*: chromaticity), and its L* signal used. FIG. 3 is a diagram showing an example of an input image (FIG. 3A) and an image composed of the L* signal generated by the color space conversion (FIG. 3B).
[0080]
The clustering unit 11 serves as both a class information generation unit and an object information generation unit, and performs recursive classification processing on the image data to generate class information and object information. The class information indicates which class each pixel belongs to when classified by the recursive classification process, that is, whether a pixel value such as a brightness value falls in the class equal to or more than the threshold or in the class less than the threshold. The object information indicates whether each pixel belongs to the background area or to a non-background area (object area) such as a character area or a photograph area.
[0081]
The recursive classification process is a process of calculating a threshold based on a feature amount of a pixel block including a target pixel, and classifying the target pixel using the calculated threshold. First, as a pixel block, a 3 × 3 pixel block including a central target pixel and eight peripheral pixels is used.
[0082]
FIG. 4 is a diagram showing a pixel block of 3 × 3 pixels. Assuming that the coordinates of the target pixel C1 are (x, y), the coordinates of the peripheral pixels P1 to P8 are P1 (x-1, y-1), P2 (x, y-1), P3 (x+1, y-1), P4 (x-1, y), P5 (x+1, y), P6 (x-1, y+1), P7 (x, y+1), and P8 (x+1, y+1), respectively. As the feature amounts, a neighborhood average value, a neighborhood edge amount, and a neighborhood threshold are used. The neighborhood average value Avg is obtained as the average of the pixel values of the nine pixels in this window. For the edge amount, a Prewitt operator (Prewitt filter) as shown in FIG. 5 is used: the pixel values of the 3 × 3 pixels are extracted and convolved with the matrix coefficients. FIG. 5A shows the vertical operator and FIG. 5B shows the horizontal operator. Using the respective operators, the vertical edge amount edge_v (x, y) and the horizontal edge amount edge_h (x, y) can be calculated.
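The feature amounts of one 3 × 3 block can be sketched as below. The coefficients are the standard Prewitt kernels; which of the two corresponds to the "vertical" operator of FIG. 5A is an assumption here, because the figure is not reproduced in this text. The names `PREWITT_V`, `PREWITT_H`, and `block_features` are hypothetical.

```python
# Standard Prewitt kernels. Their assignment to "vertical" and "horizontal"
# is assumed, since FIG. 5 is not available in this text.
PREWITT_V = ((-1, 0, 1), (-1, 0, 1), (-1, 0, 1))
PREWITT_H = ((-1, -1, -1), (0, 0, 0), (1, 1, 1))


def block_features(block):
    """Return (Avg, edge_v, edge_h) for a 3x3 block given as rows, top to bottom."""
    avg = sum(sum(row) for row in block) / 9
    edge_v = sum(PREWITT_V[j][i] * block[j][i] for j in range(3) for i in range(3))
    edge_h = sum(PREWITT_H[j][i] * block[j][i] for j in range(3) for i in range(3))
    return avg, edge_v, edge_h
```

For a block whose right column is 255 and whose other pixels are 0, this yields Avg = 85, a large `edge_v`, and `edge_h` = 0, which matches the intuition of a vertical boundary passing through the window.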
[0083]
Then, the threshold value of the target pixel is dynamically determined using the neighborhood average value Avg, the vertical edge amount edge_v, the horizontal edge amount edge_h, and the neighborhood threshold (the threshold values of peripheral pixels that have already been classified). In order to improve the accuracy of region separation, the neighborhood average value is mainly used as the threshold at edge portions of the image so that the class changes, and the neighborhood threshold is mainly used at flat portions of the image so that the class does not change.
[0084]
Therefore, the threshold value is calculated by linear interpolation using the edge amount as a weight coefficient. The general linear interpolation formula is shown below.
Y = (1−a) × X1 + a × X2 (1)
However, the range of a is 0 ≦ a ≦ 1.
[0085]
In equation (1), the threshold Y is calculated with the weighting factor a taken from the edge amount, X1 as the neighborhood threshold, and X2 as the neighborhood average value. As a result, the neighborhood average value is mainly used as the threshold at edge portions, and the neighborhood threshold is mainly used as the threshold at flat portions.
Therefore, the edge amount Edge is calculated using the following calculation formula.
[0086]
(Equation 1)

[0087]
In order to calculate the threshold value by linear interpolation using equation (1), the edge amount used as the weight coefficient must lie in the range 0 ≦ Edge ≦ 1, but the edge amount Edge calculated by equation (2) does not fall within this range. Therefore, a maximum value W is provided for the edge amount Edge, and dividing by this maximum value gives the range 0 ≦ Edge / W ≦ 1.
[0088]
The edge amount Edge is limited to the maximum value W by the following equation (3).
Edge = (Edge > W) ? W : Edge (3)
[0089]
Equation (3) means that the former value is used when the condition is satisfied and the latter value is used when it is not. That is, when the edge amount Edge is larger than W, Edge = W is set; otherwise, Edge is used as it is.
[0090]
In the recursive classification process of the present embodiment, feature amounts such as the edge amount, the neighborhood average value, and the neighborhood threshold are obtained in a 3 × 3 pixel block including the pixel of interest and its surrounding pixels, and the pixel of interest is classified based on a threshold derived from them. The pixels of each class obtained by this classification are then classified again with different threshold values, so that classification is performed in a plurality of stages (levels). In the present embodiment, the number of recursion levels is set to 3: level 1 divides classes at strong edge portions, level 2 divides classes at relatively strong edge portions, and level 3 divides classes at weak edge portions. A strong edge portion is a portion where the pixel value difference between the pixels on both sides of the edge is large, and a weak edge portion is a portion where that difference is small.
[0091]
Therefore, a lower limit value of the edge amount is provided in order to realize a plurality of levels of recursive classification processing.
[0092]
Assuming that the lower limit is a function LOWER_VAL (W) of W, the lower limit is applied by the following equation (4).
Edge = (Edge < LOWER_VAL (W)) ? 0 : Edge (4)
[0093]
At this time, when the edge amount Edge is smaller than LOWER_VAL (W), Edge = 0 is set; otherwise, Edge is used as it is.
[0094]
The function LOWER_VAL (W) is, for example, a function of W as follows.
LOWER_VAL (W) = 32 × W / 128 (5)
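Equations (3) to (5) can be combined into a small helper, sketched below. It assumes that Edge is a non-negative integer magnitude and that the upper limit is applied before the lower limit; the function names are hypothetical.

```python
def lower_val(w):
    """Equation (5): lower limit of the edge amount for upper limit W."""
    return 32 * w // 128


def limit_edge(edge, w):
    """Apply the upper limit (3) and then the lower limit (4) to an edge amount."""
    if edge > w:             # equation (3): clip to the maximum value W
        edge = w
    if edge < lower_val(w):  # equation (4): suppress edges below the lower limit
        edge = 0
    return edge
```

For W = 128 the lower limit is 32, so an edge amount of 20 is treated as 0 (a flat portion), while 200 is clipped to 128.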
[0095]
By substituting the edge amount Edge calculated by equations (2) to (4), the neighborhood average value Avg, and the neighborhood thresholds th (x-1, y) and th (x, y-1) into equation (1), the threshold th (x, y) at the target pixel can be calculated. Here, the coordinates (x-1, y) are those of the pixel P4 to the left of the target pixel C1, and the coordinates (x, y-1) are those of the pixel P2 above it. Therefore, th (x-1, y) is the threshold used when the pixel P4 to the left of the target pixel was classified, and th (x, y-1) is the threshold used when the pixel P2 above the target pixel was classified.
[0096]
(Equation 2)

[0097]
Equation (7) represents rounding. Since the threshold value th (x, y) is an integer, rounding can be realized by adding 0.5 to TH / W. However, adding 0.5 after the division increases the processing amount in integer arithmetic. Therefore, as is common practice, rounding is realized by adding half of the denominator to the numerator and then dividing by the denominator.
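The rounding technique and its use in the interpolation can be sketched as follows. Equations (6) and (7) are not reproduced in this text, so the form below is derived only from equation (1) with a = Edge / W. The argument `th_near` stands for a single, already combined neighborhood threshold; how th (x-1, y) and th (x, y-1) are combined is left to the caller and is an assumption of this sketch.

```python
def rounded_div(numerator, denominator):
    """Round numerator / denominator to the nearest integer using integers only.

    Half of the denominator is added to the numerator before dividing.
    Intended for non-negative operands.
    """
    return (numerator + denominator // 2) // denominator


def interpolate_threshold(edge, w, th_near, avg):
    """Equation (1) with a = Edge / W, X1 = th_near, X2 = avg, in integer form."""
    th_scaled = (w - edge) * th_near + edge * avg  # numerator TH, scaled by W
    return rounded_div(th_scaled, w)
```

With Edge = 0 the neighborhood threshold is returned unchanged, and with Edge = W the neighborhood average value is returned, which is the behavior the interpolation is meant to give at flat and edge portions respectively.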
[0098]
As a procedure for actually performing the classification process, each pixel of the image data is scanned by repeating the process in the row direction (main scanning direction). When the processing of one line is completed, the processing target line is moved in the column direction (sub-scanning direction), and the classification processing is performed again in the main scanning direction.
[0099]
As described above, the neighborhood thresholds th (x-1, y) and th (x, y-1) are required to calculate the threshold value th (x, y). However, when the first line is processed, there is no pixel above the pixel of interest, so the neighborhood threshold th (x, y-1) cannot be used. Likewise, when a line is processed from left to right, the first pixel of interest, that is, the leftmost pixel, has no pixel adjacent on its left, so the neighborhood threshold th (x-1, y) cannot be used. Therefore, an initial threshold value is set in advance, and when a neighboring pixel does not exist, the initial threshold value is used in place of the neighborhood threshold th (x, y-1) or th (x-1, y) to calculate the threshold th (x, y).
[0100]
In the following, it is assumed that the range of the pixel value, for example, the brightness value is 0 (black) to 255 (white), and the initial threshold is 128. As another initial threshold value, for example, an average pixel value of the entire image data may be used.
[0101]
In addition, since the neighborhood thresholds th (x-1, y) and th (x, y-1) are used to calculate the threshold th (x, y), if every line is always classified from left to right in the main scanning direction, the threshold th (x, y) is biased by the neighborhood threshold th (x-1, y) of the pixel to the left of the target pixel, and appropriate classification may not be performed. Therefore, the processing direction is switched between left-to-right and right-to-left every predetermined number of lines. When a line is classified from right to left, the neighborhood threshold substituted into equation (6) is changed from th (x-1, y) to th (x+1, y). In this way, the threshold th (x, y) reflects the upper pixel and the left and right pixels evenly.
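The alternating scan can be sketched as a generator that also reports which horizontal neighbor supplies the neighborhood threshold. The parameter `lines_per_direction` is a hypothetical name for the "predetermined" number of lines, for which the text gives no value.

```python
def scan_order(width, height, lines_per_direction=1):
    """Yield (x, y, side_x) in processing order.

    side_x is the x coordinate of the previously processed horizontal
    neighbor: x - 1 when scanning left to right, x + 1 when scanning right
    to left. A value outside 0..width-1 means the neighbor does not exist,
    in which case the initial threshold would be used instead.
    """
    for y in range(height):
        left_to_right = (y // lines_per_direction) % 2 == 0
        xs = range(width) if left_to_right else range(width - 1, -1, -1)
        for x in xs:
            yield x, y, (x - 1 if left_to_right else x + 1)
```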
[0102]
Further, the image data input to the clustering unit 11 need not be limited to a single pixel value such as the lightness value; color differences may also be input. By adding the edge amount of the color difference to the edge amount calculation, classification that also takes the color difference into account can be performed.
[0103]
Further, the dynamic range (difference between the maximum value and the minimum value of the pixel values) of the entire image data is calculated, and LOWER_VAL (W) is calculated by the following equation, thereby performing a classification process more adapted to the image. As a result, processing accuracy can be improved.
[0104]
[Equation 3]

D represents the dynamic range.
[0105]
This is because the edge amount in an image is closely related to the dynamic range: edges are hard to detect in an image with a narrow dynamic range (small D). Changing the lower limit value used in calculating the edge amount according to the dynamic range makes it possible to handle such images in which edges are hard to detect.
[0106]
Since the image processing performed in the present embodiment is raster processing, the threshold value differs even among pixels of the same flat portion, depending on the positional relationship between the target pixel and the edge portion. For example, when an edge portion lies below the pixel of interest, the flat portion continues to the left of and above the pixel of interest, so the threshold calculated by equations (6) and (7) uses the neighborhood thresholds of peripheral pixels in the same flat portion. When an edge portion lies above the pixel of interest, however, the peripheral pixel above is an edge pixel, so the threshold is calculated from the neighborhood thresholds of an edge pixel and a flat-portion pixel. Therefore, even pixels in the same flat portion receive different thresholds depending on their positional relationship with the edge portion. FIG. 6A shows the distribution of the thresholds obtained by equations (6) and (7) for each pixel of the image data. It can be seen that the threshold changes within flat portions such as the land and sea portions of the background and the lower part of the photograph.
[0107]
Therefore, the method of calculating the threshold value for the classification process is changed depending on the positional relationship between the target pixel and the edge portion. First, a case will be described in which the classification process is performed pixel by pixel from the left to the right of a line.
[0108]
As shown in FIG. 7A, when, among the peripheral pixels, only pixels above the pixel of interest are edge pixels and there are no edge pixels to its left or right, the threshold th (x-1, y) used when the pixel to the left of the pixel of interest was classified is used as the threshold th (x, y) of the pixel of interest. As shown in FIG. 7B, when there is no edge pixel above the pixel of interest and the pixels to its left and right are edge pixels, the threshold th (x, y-1) used when the pixel above the pixel of interest was classified is used as the threshold th (x, y) of the pixel of interest.
[0109]
As shown in FIG. 7C, when none of the peripheral pixels is an edge pixel, whichever of the threshold used when the pixel to the left was classified and the threshold used when the pixel above was classified is closer to the preset initial threshold is set as the threshold of the pixel of interest. As shown in FIG. 7D, in all other cases, the threshold of the pixel of interest is calculated using equation (6).
[0110]
Next, a case will be described in which the classification process is performed pixel by pixel from the right to the left of a line. As shown in FIG. 8A, when, among the peripheral pixels, only pixels above the pixel of interest are edge pixels and there are no edge pixels to its left or right, the threshold th (x+1, y) used when the pixel to the right of the pixel of interest was classified is used directly as the threshold th (x, y) of the pixel of interest. As shown in FIG. 8B, when there is no edge pixel above the pixel of interest and the pixels to its left and right are edge pixels, the threshold th (x, y-1) used when the pixel above the pixel of interest was classified is used directly as the threshold th (x, y) of the pixel of interest.
[0111]
As shown in FIG. 8C, when none of the peripheral pixels is an edge pixel, whichever of the threshold used when the pixel to the right was classified and the threshold used when the pixel above was classified is closer to the preset initial threshold is set as the threshold of the pixel of interest. As shown in FIG. 8D, in all other cases, the threshold of the pixel of interest is calculated using equation (6).
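The four cases of FIGS. 7 and 8 can be sketched as one selection function that works for both scan directions. Reducing the neighborhood to three flags (edge above, edge left, edge right) is a simplification assumed here, and case B follows a literal reading of the text in which both the left and the right pixel are edge pixels. `th_side` is the threshold of the previously processed horizontal neighbor (left or right, depending on the scan direction), and `th_interpolated` is the value from equation (6), computed elsewhere.

```python
def select_threshold(edge_above, edge_left, edge_right,
                     th_side, th_up, th_initial, th_interpolated):
    """Choose the threshold of the pixel of interest from its edge context."""
    if edge_above and not edge_left and not edge_right:
        return th_side              # FIG. 7A / 8A: edge only above
    if not edge_above and edge_left and edge_right:
        return th_up                # FIG. 7B / 8B: edges to the left and right
    if not (edge_above or edge_left or edge_right):
        # FIG. 7C / 8C: flat neighborhood, keep the neighbor threshold
        # that is closer to the initial threshold
        return min((th_side, th_up), key=lambda th: abs(th - th_initial))
    return th_interpolated          # FIG. 7D / 8D: all other cases
```

When both neighbor thresholds are equally far from the initial threshold, this sketch returns `th_side`; the text does not say how such ties are resolved.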
[0112]
FIG. 6B shows the distribution of the threshold values when the threshold values are determined in this manner. From the figure, it can be seen that an unnatural change in the threshold value in the flat portion has not occurred. As a result, the threshold value of the flat portion can be kept constant, and the object information described later can be created.
[0113]
The recursive classification process is performed by determining a threshold value for each pixel and repeating the classification process as described above. Specifically, it is realized as follows.
[0114]
In this embodiment, the recursive classification process is repeated up to three levels.
First, in the classification process at level 1, the edge amount upper limit value (= sum of weighting factors) W1 is set to 128, and each pixel is classified, based on the threshold determined as described above, into one of two classes with a brightness value of 0 or 255. When the brightness value of a pixel is larger than the threshold, the brightness value of the pixel is set to 255, and when it is smaller than the threshold, it is set to 0. The brightness value obtained in this manner is stored for each pixel as level 1 class information and constitutes the classification result at level 1.
[0115]
At level 2, further classification processing is performed on the pixels classified at level 1 into the class with brightness value 0 and on those classified into the class with brightness value 255. By setting the edge amount upper limit value to W2 = W1 / 2 (= 64), edges finer than those at level 1 are detected so that class changes occur more easily. At this time, the edge amount lower limit value LOWER_VAL (W2) becomes 16 by substituting W = 64 into equation (5).
[0116]
In the level 2 classification process, the pixels classified into class 0 at level 1 are classified into two classes with brightness values 0 and 85, and the pixels classified into class 255 at level 1 are classified into two classes with brightness values 170 and 255. The brightness value of each pixel obtained in this way is stored as level 2 class information and constitutes the classification result at level 2.
[0117]
Finally, at the level 3, the pixels classified into the classes of the lightness values of 0, 85, 170, and 255 at the level 2 are further subjected to a classification process. By setting the edge amount upper limit value W3 to W3 = W2 / 2 (= 32), a finer edge is detected and a class change is easily caused. At this time, the edge amount lower limit value LOWER_VAL (W3) is set to 8 by substituting W3 = 32 into the equation (5).
[0118]
In the level 3 classification process, the pixels classified at level 2 into class 0 are classified into two classes with brightness values 0 and 28, those in class 85 into 56 and 85, those in class 170 into 170 and 196, and those in class 255 into 226 and 255. The brightness value of each pixel obtained in this way is stored as level 3 class information and constitutes the classification result at level 3.
[0119]
FIG. 9 is a diagram showing a tree structure that schematically illustrates pixel classification when recursive classification is performed up to level 3. Here, 0, 28, 56,... 255 are the brightness values of the classes and serve as class information identifying the classes. With this tree structure, the class information at levels 1 and 2 can easily be obtained from the class information at level 3. For example, a pixel belonging to class 196 at level 3 belongs to class 170 at level 2 and to class 255 at level 1. Therefore, only the level 3 class information needs to be stored for each pixel.
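The lookup from level 3 back to levels 2 and 1 can be sketched with two small tables built from the class values listed above. The dictionary and function names are hypothetical.

```python
# (lower, upper) child classes for every parent class, as listed in the text.
LEVEL2_CHILDREN = {0: (0, 85), 255: (170, 255)}
LEVEL3_CHILDREN = {0: (0, 28), 85: (56, 85), 170: (170, 196), 255: (226, 255)}


def invert(children):
    """Build a child -> parent table from a parent -> (children) table."""
    return {child: parent for parent, pair in children.items() for child in pair}


LEVEL3_TO_LEVEL2 = invert(LEVEL3_CHILDREN)
LEVEL2_TO_LEVEL1 = invert(LEVEL2_CHILDREN)


def class_path(level3_class):
    """Return (level 1, level 2, level 3) class information for a level 3 class."""
    level2 = LEVEL3_TO_LEVEL2[level3_class]
    return LEVEL2_TO_LEVEL1[level2], level2, level3_class
```

For the example in the text, `class_path(196)` gives (255, 170, 196).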
[0120]
However, it is not always necessary to use the lightness value as the class information; it is sufficient that the class information at levels 1 and 2 can be derived from the class information at level 3. For example, among the level 3 classes, class 0 may be called class 1, class 28 class 2, class 56 class 3,...
[0121]
Further, the clustering unit 11 creates object information based on the threshold value for each pixel determined when performing the recursive classification process. The object information is determined for each pixel and indicates whether the pixel belongs to a background area or an object (photograph, text, etc.) area other than the background. For example, when the pixel belongs to the background area, the object information is stored as 1, and when the pixel belongs to the object area, the object information is stored as 0.
[0122]
Whether or not a pixel belongs to the background area is determined for each level: a pixel is determined to belong to the background area as long as the threshold used for its classification is the initial threshold carried over unchanged. When the threshold is determined under the conditions shown in FIGS. 7 and 8, the initial threshold is carried over because the flat portion continues. Further, since some object is considered to exist in any area other than the background area, areas other than the background area are determined to be object areas.
[0123]
Therefore, in the recursive classification process performed for each pixel, if the neighborhood threshold adopted as the threshold is that of a background pixel, the pixel of interest belongs to the background area, and if it is that of a non-background pixel, the pixel of interest belongs to the non-background area.
[0124]
When the threshold value is calculated using Expression (6), it is assumed that the target pixel belongs to the non-background area. This is because the fact that the threshold value of the target pixel is newly calculated is considered to indicate that some object exists.
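The rule for object information can be sketched as below. The three labels for how a threshold was obtained ("initial", "inherited", "interpolated") are hypothetical bookkeeping introduced for this illustration; the values 1 (background) and 0 (object) follow the example given in the text.

```python
BACKGROUND, OBJECT = 1, 0


def object_info(threshold_source, neighbor_info=BACKGROUND):
    """Return the object information of the pixel of interest.

    threshold_source: "initial" if the preset initial threshold was used,
        "inherited" if a neighbor's threshold was adopted unchanged,
        "interpolated" if the threshold was newly calculated by equation (6).
    neighbor_info: object information of the neighbor whose threshold was
        adopted (only used when threshold_source is "inherited").
    """
    if threshold_source == "interpolated":
        return OBJECT        # a newly calculated threshold implies an object
    if threshold_source == "initial":
        return BACKGROUND    # the initial threshold is still in effect
    return neighbor_info     # inherited: same area as the source neighbor
```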
[0125]
As described above, the clustering unit 11 creates the class information and the object information of each pixel by the recursive classification processing.
[0126]
FIG. 10 is a diagram illustrating a distribution of class information of each pixel. In the present embodiment, the brightness value is used as the class information, and by using the brightness value as the gradation value, the class information of each pixel can be visualized as an image. FIG. 10A shows the distribution of level 1 class information, FIG. 10B shows that of level 2, and FIG. 10C shows that of level 3. It can be seen that the classes become more finely divided from level 1 to level 3.
[0127]
FIG. 11 is a diagram illustrating a distribution of object information of each pixel. In the figure, the distribution of the object information is shown with the brightness value of the pixels belonging to the background area being 255 (white area) and the brightness value of the pixels belonging to the object area being 128 (gray area).
[0128]
Next, the run length calculating unit 12, serving as run length calculating means, calculates the run length in the main scanning direction of the class information and the object information created by the clustering unit 11. The run length is calculated for each level, and in the present embodiment, the run lengths up to level 3 are calculated.
[0129]
FIG. 12 is a diagram illustrating an example of the procedure of the run-length calculation process. Here, it is assumed that one line consists of 16 pixels. The run length calculation process consists of two processes. One variable (count) is assigned to each pixel, and the run length is calculated by changing this count under predetermined conditions. The first process is based on the class information of each pixel (see FIG. 12 (a)): as long as pixels of the same class continue from the left to the right of the line, the count is incremented from pixel to pixel. The second process proceeds from the right to the left of the line: if the count of the pixel on the right is greater by 1 than the count of the pixel of interest, the count of the pixel of interest is replaced with the count of the pixel on the right. This gives each pixel the run length of the run to which it belongs. By dividing the processing into two processes, complicated loop processing can be avoided, and the processing can be performed by multi-pass processing on a SIMD processor (same type multiple processing type arithmetic unit).
[0130]
First, the first process will be described with reference to FIG. 12. In the first process, based on the class information of each pixel shown in FIG. 12A, when the class information of the pixel on the left is the same as the class information of the pixel of interest, the value obtained by adding 1 to the count of the pixel on the left is used as the count of the pixel of interest. At level 1 in FIG. 12B, taking the leftmost pixel as the pixel of interest, its level 1 class information is 0 and there is no pixel adjacent on the left, so the pixel of interest moves to the next pixel on the right.
[0131]
Since the level 1 class information of the next target pixel (the second pixel from the left) is also 0, 1 is added to the count of the pixel on the left and the count 1 is written to the output buffer. The level 1 class information of the next pixel of interest (the third pixel from the left) is 255, which belongs to a different class from the pixel on the left. Therefore, the count is returned to 0, and the count 0 is written to the output buffer. Similarly, the count is determined for the pixels of one line while comparing the class information of the pixel on the left with the class information of the pixel of interest. The count until a pixel with a count of 0 appears indicates the run length of the run to which the pixel belongs.
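The first process described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the function name first_pass_counts and the sample line are hypothetical (the sample is not the data of FIG. 12), and the class information of each pixel is assumed to be directly comparable.

```python
def first_pass_counts(classes):
    """Left-to-right pass: position of each pixel inside its run, counted from 0."""
    counts = []
    for i, cls in enumerate(classes):
        if i > 0 and classes[i - 1] == cls:
            # Same class as the pixel on the left: continue counting.
            counts.append(counts[-1] + 1)
        else:
            # Leftmost pixel or class change: the count returns to 0.
            counts.append(0)
    return counts


# Hypothetical 6-pixel line of level 1 class information.
line = [0, 0, 255, 255, 255, 0]
print(first_pass_counts(line))  # [0, 1, 0, 1, 2, 0]
```

Because counting starts at 0, the count at the right end of each run is the run length minus 1.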
[0132]
Since the class information shown in FIG. 12A is the level 3 class information, it is necessary to obtain the level 1 class information from the level 3 class information in order to calculate the level 1 run length. For example, the stored class information of the sixth pixel from the left is the level 3 class information 170, but from the tree structure shown in FIG. 9, the level 2 class information is found to be 170 and the level 1 class information is found to be 255.
[0133]
Next, the level 2 class information of each pixel is obtained, and the run length is calculated in the same manner as in level 1. A method for obtaining level 2 class information from level 3 class information will be described. Assuming that the level 3 class information is in and the level 2 class information is out, it can be easily realized by the following equation.
[0134]
(1) out = in < 56 ? 0 : out;
(2) out = in < 170 ? 85 : out;
(3) out = in < 226 ? 170 : out;
(4) out = 255;
[0135]
(1) The level 3 class information is compared with 56, and if it is less than 56, the level 2 class information is set to "0".
[0136]
(2) If the level 3 class information is 56 or more and less than 170, the level 2 class information is set to “85”.
[0137]
(3) If the level 3 class information is 170 or more and less than 226, the level 2 class information is set to “170”.
[0138]
(4) If the level 3 class information is 226 or more, the level 2 class information is set to "255".
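The mapping described in (1) to (4) can be sketched in Python as follows. The function names are hypothetical. The second function illustrates one possible branch-free reading of the expressions, in which the selects are applied in the order (4), (3), (2), (1); that evaluation order is an assumption made here so that the result matches the ranges stated in the text, and is not stated in the source.

```python
def level3_to_level2(value):
    """Map level 3 class information to level 2 using the stated ranges."""
    if value < 56:
        return 0
    if value < 170:
        return 85
    if value < 226:
        return 170
    return 255


def level3_to_level2_select(value):
    """Same mapping as a chain of selects, applied from (4) back to (1) (assumed order)."""
    out = 255                           # (4)
    out = 170 if value < 226 else out   # (3)
    out = 85 if value < 170 else out    # (2)
    out = 0 if value < 56 else out      # (1)
    return out


print(level3_to_level2(170))  # 170
```

Both functions agree for every value from 0 to 255, and the result for 170 matches the example of the sixth pixel given earlier.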
[0139]
At level 2, the run length is calculated while ignoring class changes at level 1, which is the level above level 2. To this end, when the absolute value of the difference between the class information of the pixel on the left and the class information of the pixel of interest is 255, it is considered that there is no change in the class, and the count is not returned to 0 but continues to be incremented. In other words, a change point of a class at level 1, that is, a position that has already been determined to be a run boundary, is not detected again at level 2 or later. FIG. 12B shows the result of the level 2 run length calculation.
[0140]
For level 3, the run length can be calculated using the stored class information as it is. However, as in level 2, in order to calculate the run length while ignoring class changes at levels 1 and 2, which are above level 3, the absolute value of the difference between the class information of the pixel on the left and the class information of the pixel of interest is calculated, and when it exceeds 28, it is regarded that there is no change in the class and the count-up is continued. By the first process as described above, the run lengths of levels 1 to 3 can be calculated.
[0141]
The second process will be described. In the second process, for the count of each pixel obtained in the first process (FIG. 12B), the count of the pixel of interest is compared with the count of the pixel on its right, and if the count of the pixel on the right is larger than the count of the pixel of interest by 1, the count of the pixel of interest is replaced with the count of the pixel on the right. Since the count of the pixel at the right end of a run is equal to the run length, replacing the counts of the pixels belonging to the same run with the count of the pixel at the right end of the run gives each pixel the run length of the run to which it belongs. The case of level 1 will be described below as an example (see FIG. 12C).
-The count of the rightmost pixel is “1”, and there is no pixel on its right, so the count remains “1”.
-The count of the next pixel (second from the right) is “0”, and the count of the pixel on its right is larger by 1, so the count is replaced with “1”.
-The count of the third pixel from the right is “3”, and the count of the pixel on its right is not larger by 1, so the count remains “3”.
-The count of the fourth pixel from the right is “2”, and the count of the pixel on its right is larger by 1, so the count is replaced with “3”.
-The count of the fifth pixel from the right is “1”, and the count of the pixel on its right is larger by 1, so the count is replaced with “3”.
[0142]
Hereinafter, the same process is repeated. The count of the pixel of interest and the count of the pixel on its right are compared using the counts obtained in the first process (FIG. 12B), while the count used for the replacement is the count after the second process. Since the maximum count within a continuously counted run (the count at the right end of the run) corresponds to the run length, this is equivalent to replacing the count of each continuously counted pixel with that maximum value.
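The second process can be sketched in Python as follows. This is a minimal illustration, not the embodiment's implementation: the function name second_pass_run_lengths is hypothetical. The comparison uses the counts from the first process, while the replacement value is taken from the already updated result, as described above. The sample input is the five rightmost level 1 counts of the worked example.

```python
def second_pass_run_lengths(counts):
    """Right-to-left pass: spread the right-end count of each run over the run."""
    result = list(counts)
    for i in range(len(counts) - 2, -1, -1):
        # Compare using the first-process counts ...
        if counts[i + 1] == counts[i] + 1:
            # ... but copy the value already updated by this pass.
            result[i] = result[i + 1]
    return result


# Five rightmost first-process counts from the worked example (left to right).
print(second_pass_run_lengths([1, 2, 3, 0, 1]))  # [3, 3, 3, 1, 1]
```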
[0143]
By performing the same processing as the above first and second processing, the run length of the object information can be calculated. In the first process, if the object information is the same as that of the pixel on the left, the count of the pixel of interest is obtained by adding 1 to the count of the pixel on the left. In the second processing, the count is replaced based on the first processing result.
[0144]
In a processor such as a SIMD processor that handles a plurality of data paths with one program counter, one line of class information is stored in a plurality of data paths, for example a data path A and a data path B as shown in FIG. 13, and in the first process the data paths can be processed simultaneously.
[0145]
The run length is calculated individually in each data path (see FIG. 13B), and the data paths are then connected (see FIG. 13C). At the connection between data path A and data path B, if the class information of the adjacent pixels is the same, the count of the right-end pixel of data path A is carried over into data path B, for the pixels from the left-end pixel of data path B up to the next pixel whose count is 0 (see FIG. 13C). When the class information differs at the connection, the data paths are connected as they are. After the data paths are connected, the second process is performed in the same manner as described above, and each pixel is given the run length of the run to which it belongs (see FIG. 13D). With the above processing, the run length calculation can easily be performed on a SIMD processor.
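The connection of two data paths can be sketched in Python as follows. This is one possible reading of the connection step and rests on assumptions: when the classes match at the joint, the counts of the first run of data path B are offset by the right-end count of data path A plus 1, so that counting continues across the joint. The function names and the sample line are hypothetical, and the two data paths are processed sequentially here rather than in parallel.

```python
def first_pass_counts(classes):
    """First process on one data path: position inside the run, counted from 0."""
    counts = []
    for i, cls in enumerate(classes):
        same = i > 0 and classes[i - 1] == cls
        counts.append(counts[-1] + 1 if same else 0)
    return counts


def connect_paths(classes_a, counts_a, classes_b, counts_b):
    """Join two data paths, continuing the count across the joint if classes match."""
    joined_b = list(counts_b)
    if classes_a and classes_b and classes_a[-1] == classes_b[0]:
        offset = counts_a[-1] + 1
        for i in range(len(joined_b)):
            if i > 0 and joined_b[i] == 0:
                break  # next run boundary inside data path B
            joined_b[i] += offset
    return list(counts_a) + joined_b


line = [0, 0, 255, 255, 255, 255, 0, 0]  # hypothetical class information
a, b = line[:4], line[4:]
joined = connect_paths(a, first_pass_counts(a), b, first_pass_counts(b))
print(joined)  # [0, 1, 0, 1, 2, 3, 0, 1]
```

The joined counts equal the counts obtained by running the first process over the undivided line, so the second process can then be applied unchanged.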
[0146]
Next, the character region estimating unit 13, which serves as character region estimating means, estimates pixels belonging to the character region based on the run length of the class information. Since characters are generally considered to have a high degree of complexity, if the run length of the class information is equal to or smaller than the character estimation threshold SIZEOFTEXT, the pixel can be estimated to belong to the character area.
[0147]
However, since the run length calculated by the run length calculation unit 12 is the run length in the main scanning direction, a character area estimated only from the threshold SIZEOFTEXT depends solely on the complexity of the image in the horizontal direction, and the accuracy is not sufficient.
[0148]
Therefore, when the surrounding pixels include a pixel that belongs to the same class as the pixel of interest and is estimated not to be a character area, the pixel of interest is not estimated to be a character area even if the run length of its class information is equal to or smaller than the predetermined threshold value SIZEOFTEXT. By making a determination with this condition added, the accuracy of character region estimation can be improved.
[0149]
Further, estimation processing is performed in two directions: from the left of the line and from the right of the line. First, when processing is performed from left to right, even if the run length of the class information is equal to or less than the predetermined threshold value SIZEOFTEXT, the pixel of interest is not estimated as a character area if any of the peripheral pixels shown in FIG. 14 satisfies one of the following conditions.
-The pixel on the left belongs to the same class as the pixel of interest and is not estimated as a character area
-The upper pixel belongs to the same class as the pixel of interest and is not estimated as a character area
-The upper left pixel belongs to the same class as the pixel of interest and is not estimated as a character area
-The upper right pixel belongs to the same class as the pixel of interest and is not estimated as a character area
Also, when processing is performed from the right to the left of a line, even if it has already been estimated as a character area in the processing from the left to the right, it is not estimated as a character area if the following conditions are satisfied.
-The pixel on the right belongs to the same class as the pixel of interest, and is not estimated as a character area
Through the above two-way processing ((1) processing from the left to the right of the line, (2) processing from the right to the left of the line), the character area can be accurately estimated based on the run length of the class information. The above-described character region estimation processing is performed at each level.
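The two-way estimation can be sketched in Python as follows for a single level. This is a minimal sketch under stated assumptions, not the embodiment's implementation: the function name estimate_text, the sample data, and the treatment of image borders (missing neighbors are simply skipped) are hypothetical. The results of the line above are assumed to be the final results of that line, and run_lengths is assumed to be expressed in the same unit as the threshold.

```python
def estimate_text(classes, run_lengths, size_of_text):
    """Estimate character pixels line by line with a left-to-right and a right-to-left pass."""
    height, width = len(classes), len(classes[0])
    text = [[False] * width for _ in range(height)]
    for y in range(height):
        # (1) Left to right: short run, and no same-class non-character neighbor.
        for x in range(width):
            is_text = run_lengths[y][x] <= size_of_text
            for ny, nx in ((y, x - 1), (y - 1, x), (y - 1, x - 1), (y - 1, x + 1)):
                if ny < 0 or nx < 0 or nx >= width:
                    continue  # neighbor outside the image
                if classes[ny][nx] == classes[y][x] and not text[ny][nx]:
                    is_text = False
            text[y][x] = is_text
        # (2) Right to left: cancel if the right pixel is same-class and non-character.
        for x in range(width - 2, -1, -1):
            if text[y][x] and classes[y][x + 1] == classes[y][x] and not text[y][x + 1]:
                text[y][x] = False
    return text


classes = [[0, 0, 5, 5, 5],
           [5, 5, 9, 9, 9]]
runs = [[2, 2, 3, 3, 3],
        [2, 2, 3, 3, 3]]
for row in estimate_text(classes, runs, size_of_text=2):
    print(row)
```

In this example the short run of class 5 on the second line is rejected: its right pixel touches the long class 5 run above (upper right pixel), and the right-to-left pass then cancels the left pixel as well.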
[0150]
FIG. 15 is a diagram showing a character estimation area at each level. FIG. 15A shows a character estimation area at level 1, FIG. 15B shows a character estimation area at level 2, and FIG. 15C shows a character estimation area at level 3. In the figure, the brightness value of the pixels estimated to belong to the character area is 255, and the brightness value of the other pixels is 0.
[0151]
The region determination unit 14 serves as region determination means that determines the region to which each pixel belongs based on the run length of the object information and the character region estimation result. A run of the object information is treated as a unit window (a certain range treated as one unit), and, based on the estimation result of the character region estimating unit 13, region determination is performed according to the proportion of pixels estimated as the character region at each level.
[0152]
First, the number of pixels in the level 1 character estimation area, the number of pixels in the level 2 character estimation area, and the number of pixels in the level 3 character estimation area in the unit window are counted. FIG. 16 is a diagram illustrating an example of a unit window to be subjected to region determination. In this example, the run length of the object information, which is the unit window, is 8 (the run length calculation process starts counting from 0, so it is shown as “7” instead of “8” in the figure). Pixels estimated to belong to the character area are represented by “*”, and pixels estimated not to belong to the character area are represented by “−”.
[0153]
First, the character area estimation pixels for each level in the unit window are counted. In FIG. 16, the estimated number of character area pixels at level 1 is 4, the estimated number of character area pixels at level 2 is 3, and the estimated number of character area pixels at level 3 is 0.
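The per-level counting inside a unit window can be sketched in Python as follows. The function name count_text_pixels and the pixel layout of the sample window are hypothetical; only the totals (4, 3, and 0 for a window of 8 pixels) follow the example above. An ASCII "-" is used here for pixels not estimated as a character area.

```python
def count_text_pixels(levels, start, run_length):
    """Count pixels marked '*' in the unit window, separately for each level."""
    return [row[start:start + run_length].count("*") for row in levels]


# One string per level; '*' = estimated character pixel, '-' = other pixel.
window_levels = [
    "**-*-*--",  # level 1
    "-*-*-*--",  # level 2
    "--------",  # level 3
]
print(count_text_pixels(window_levels, start=0, run_length=8))  # [4, 3, 0]
```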
[0154]
Then, the background, character, or photograph region is determined from the estimated numbers of character area pixels. In a character area, a continuous object area is often formed of character area estimation pixels of a single level. For example, under the following conditions, it is highly likely that the area is a character area.
[0155]
[Table 1]

[0156]
Conversely, in a photograph area, a continuous object area is often composed of character area estimation pixels at a plurality of levels. For example, under the following conditions, it is highly possible that the area is a photograph area.
[0157]
[Table 2]

[0158]
For the actual determination, an LUT (Look Up Table) is stored in advance that associates the area determination result with the run length of the object information and the estimated numbers of character area pixels at level 1, level 2, and level 3. By referring to the LUT based on the estimated numbers of character area pixels, it is determined whether the object area is a character area or a photograph area. The LUT can be created, for example, by a learning method using a neural network.
[0159]
When the object information is created, an area where no object area exists is determined as the background area. Even if it is an object area, when the run length of the object information is large to some extent and the estimated number of pixels of the character area at each level is small, it may be determined as the background area.
[0160]
FIG. 17 is a diagram illustrating a region determination result. Here, the brightness value of the pixels belonging to the character area is 0 (black area), the brightness value of the pixels belonging to the background area is 255 (white area), and the brightness value of the pixels belonging to the photograph area is 128 (gray area).
[0161]
Further, based on the region division result, characters can be detected in more detail from the pixels determined to be a character region. When character detection is performed, a character detection unit 15 is provided downstream of the area determination unit 14, as shown in the block diagram of the area division unit 4 in FIG. 18. The parts other than the character detection unit 15 are the same as the parts described with reference to FIG. 2. Note that the character detection unit 15 does not necessarily need to be provided in the area division unit 4.
[0162]
The character detection unit 15 detects characters in more detail using the character region estimation result for the pixels determined to be a character region by the region determination unit 14. In a continuous character estimation area, the class to which the first pixel belongs is generally a character class. Therefore, the class to which the first pixel belongs is detected, and, within the area determined to be a character area, the pixels belonging to the same class as the detected class are determined to belong to the character area, so that the accuracy of character determination can be further improved.
[0163]
FIG. 19 is a diagram illustrating an area determination result when the character detection unit 15 detects a character. The lightness value indicating each area is the same as in the determination result shown in FIG. 17. As can be seen from the figure, the character area is divided with higher accuracy than in the determination result shown in FIG. 17.
[0164]
FIG. 20 is a flowchart showing the area dividing process. First, in step S1, the color conversion unit 10 converts the color space of the input image data to obtain a pixel value used for area determination such as a lightness value. In step S2, the clustering unit 11 performs recursive classification processing to generate class information and object information. In step S3, the run length calculation unit 12 calculates the run length in the main scanning direction of the created class information and object information.
[0165]
In step S4, the character area estimating unit 13 compares the run length of the class information with the threshold value SIZEOFTEXT. Pixels belonging to a run whose run length is equal to or smaller than the threshold are estimated to be pixels belonging to a character area. In step S5, the region determination unit 14 determines whether the pixels of the object region belong to a character region or a photograph region, based on the number of pixels estimated as a character region among the pixels in the region where the object information is continuous.
[0166]
As described above, in the present embodiment, image data is classified into a plurality of classes by a recursive classification process that determines a threshold for each pixel of interest in consideration of the influence of peripheral pixels, and area determination is performed based on the result. Therefore, the area separation accuracy can be improved as compared with the case where the classification processing is performed using a fixed threshold value.
[0167]
Further, another embodiment of the present invention is an image processing program for causing a computer to function as the image processing device 2, and a computer-readable recording medium storing the image processing program. Thus, the image processing program and the recording medium storing the image processing program can be provided in a portable manner.
[0168]
The recording medium is read by a program reading device provided in a printer or a computer system, whereby the image processing program is executed (when applied to a computer system, the program can be used as application software).
[0169]
As an input means of the computer system, a flatbed scanner, a film scanner, a digital camera, or the like may be used. The computer system includes these input means, a computer that executes image processing when a predetermined program is loaded, and an image display device such as a CRT (Cathode Ray Tube) display or a liquid crystal display that displays the processing results of the computer. And a printer that outputs the processing results of the computer to paper or the like. Further, a modem or the like is provided as communication means for connecting to a server or the like via a network.
[0170]
The recording medium is not limited to a medium that can be read by a program reading device, but may be a memory of a microcomputer, for example, a ROM. The recorded program may be accessed and executed by a microprocessor, or a program read from a recording medium may be downloaded to a program storage area of a microcomputer and executed. It is assumed that the microcomputer has this download function in advance.
[0171]
Specific examples of the recording medium include tape systems such as a magnetic tape and a cassette tape; magnetic disks such as a flexible disk and a hard disk; optical disc systems such as a CD-ROM (Compact Disc-Read Only Memory), an MO (Magneto Optical) disk, an MD (Mini Disc), and a DVD (Digital Versatile Disc); card systems such as an IC (Integrated Circuit) card (including a memory card) and an optical card; and semiconductor memories such as a mask ROM, an EPROM (Erasable Programmable Read Only Memory), an EEPROM (Electrically Erasable Programmable Read Only Memory), and a flash ROM. Each of these is a medium that fixedly carries the program.
[0172]
In the present embodiment, the computer may have a system configuration connectable to a communication network including the Internet, and may download the image processing program via the communication network. When the program is downloaded from the communication network as described above, the download function may be provided in the computer in advance, or may be installed from another recording medium. The download program may be executed via a user interface, or may be a program that periodically downloads a program from a predetermined URL (Uniform Resource Locator).
[0173]
[Effects of the invention]
As described above, according to the present invention, the target pixel is classified using a threshold value based on the feature amount of the pixel block including the target pixel and its surrounding pixels. Therefore, compared to the case where classification is performed using a fixed threshold value, it is possible to generate class information and object information that reflect the influence of peripheral pixels. The background area is determined accurately based on the object information. The character area is determined, using the class information and the object information, based on the class run length and the ratio of the estimated number of character pixels included in the object run, so the pixels belonging to the character area can be determined accurately. Since the determination accuracy of each area is therefore high, the area division accuracy of the image data can be improved.
[0174]
Further, according to the present invention, since a threshold value reflecting the edge strength is used, it is possible to appropriately classify pixels near an edge such as a character area.
[0175]
Further, according to the present invention, classification can be appropriately performed even when the dynamic range of image data is narrow.
[0176]
Further, according to the present invention, by changing the scanning direction for each scanning line, the influence of the left and right peripheral pixels can be received on average, so that appropriate classification can be performed.
[0177]
Further, according to the present invention, the threshold of the target pixel is generated based on the positions of the edge pixels; for example, when edge pixels are located only above the target pixel, the threshold of the left peripheral pixel is used as the threshold of the target pixel without calculating a threshold. Therefore, an appropriate threshold value can be generated and classification can be performed appropriately, particularly when the target pixel is a background pixel near an edge.
[0178]
Further, according to the present invention, it is possible to easily and accurately determine whether or not the target pixel belongs to the background area.
[0179]
Further, according to the present invention, the same type multiple processing type arithmetic device, that is, a so-called Single Instruction Multiple Data (SIMD) type processor, can execute the same type of instruction on multiple data simultaneously. When the instruction is the calculation of the run length, the run length can be calculated for each data path simultaneously. Therefore, the processing speed of the run length calculation processing can be increased.
[0180]
In addition, according to the present invention, since characters generally have a high degree of complexity, it is possible to easily estimate whether or not a pixel belongs to a character area only by comparing the class run length with a character estimation threshold.
[0181]
Further, according to the present invention, it is possible to improve the estimation accuracy of the character area.
Further, according to the present invention, the pixels belonging to a character area have the same class information, and among the consecutive pixels determined to belong to the character area, the pixel at the extreme end often belongs to the character area, so pixels belonging to the character area can be detected accurately.
[0182]
Further, according to the present invention, image data is divided into regions with high accuracy, and image data subjected to post-processing according to each region can be output, so that a high-quality still image can be formed.
[0183]
Further, according to the present invention, it is possible to provide an image processing program for causing a computer to execute the image processing method.
[0184]
Further, according to the present invention, it is possible to provide a computer-readable recording medium in which an image processing program for causing a computer to execute the image processing method is recorded.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of an image forming apparatus 1 according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of a region dividing unit 4.
FIG. 3 is a diagram showing an example of an input image (FIG. 3A) and an image (FIG. 3B) composed of the L* signal generated by color space conversion.
FIG. 4 is a diagram showing a pixel block of 3 × 3 pixels.
FIG. 5 is a diagram illustrating an example of a Prewitt operator (prewitt filter).
FIG. 6 is a diagram showing a distribution of threshold values in each pixel.
FIG. 7 is a diagram illustrating a method of determining a threshold based on a positional relationship between a target pixel and peripheral edge pixels.
FIG. 8 is a diagram illustrating a method of determining a threshold based on a positional relationship between a target pixel and peripheral edge pixels.
FIG. 9 is a diagram illustrating a tree structure schematically illustrating pixel classification when recursive classification processing is performed up to three levels.
FIG. 10 is a diagram showing a distribution of class information of each pixel.
FIG. 11 is a diagram showing a distribution of object information of each pixel.
FIG. 12 is a diagram illustrating an example of a procedure of a run length calculation process.
FIG. 13 is a diagram illustrating an example of a procedure of a run length calculation process using a SIMD processor.
FIG. 14 is a diagram illustrating a character region estimation process performed by a character region estimation unit 13;
FIG. 15 is a diagram showing a character estimation area at each level.
FIG. 16 is a diagram illustrating an example of a unit window to be subjected to area determination.
FIG. 17 is a diagram showing an area determination result.
FIG. 18 is a block diagram showing another configuration of the area dividing unit 4.
FIG. 19 is a diagram illustrating an area determination result when the character detection unit 15 detects a character.
FIG. 20 is a flowchart showing an area dividing process.
[Explanation of symbols]
1 Image forming apparatus
2 Image processing device
3 Input section
4 Area division unit
5 Correction unit
6 Resolution converter
7 Color correction unit
8 Halftone section
9 Printer
10 Color converter
11 Clustering part
12 Run length calculation unit
13 Character area estimation unit
14 Area judgment unit

Claims

In an image processing apparatus including an area dividing unit to which image data representing an image composed of a plurality of pixels is input and which divides the image data into areas by determining, based on the input image data, to which of a character area, a background area, and another area each pixel constituting the image belongs,
The area dividing unit includes:
Class information generating means that obtains the feature amount of a pixel block including the target pixel and its surrounding pixels using the pixel value of each pixel, generates a threshold based on the obtained feature amount, classifies the target pixel into one of two pixel sets by comparing the generated threshold with the pixel value of each pixel, further classifies each pixel set obtained by the classification with a threshold different from said threshold, thereby performing classification in a plurality of stages, and generates class information indicating the result of the classification at each stage,
Based on a plurality of thresholds generated by the class information generating means, determine whether the pixel of interest belongs to the background area, object information generating means to generate object information indicating the determination result,
A class run length, which is the number of pixels of a class run composed of pixels adjacent to each other in a predetermined direction having the same class information, and a pixel of an object run composed of pixels adjacent to each other in a predetermined direction having the same object information. Run length calculating means for calculating an object run length, which is a number, for each of the stages,
Character area estimation means for estimating, for each of the stages, whether or not a pixel included in a class run belongs to a character area, based on the class run length,
Area determination means that determines whether or not a pixel belongs to the background area based on the object information, and determines whether a pixel included in an object run belongs to a character area or another area based on the proportion, at each of the stages, of the pixels included in the object run that are estimated by the character area estimating means to belong to the character area.

The image processing apparatus according to claim 1, wherein the class information generating means uses, as the feature amount, the edge amount of the pixel of interest, the average density value of the pixels included in the pixel block, and the threshold values used in the classification performed when the peripheral pixels were the pixel of interest, and calculates the threshold value by linearly interpolating between the density average value and the threshold values of the peripheral pixels using the edge amount as a weight coefficient.

3. The image processing apparatus according to claim 2, wherein the class information generation unit sets the lower limit of the edge amount based on a dynamic range of the image data.

3. The image processing apparatus according to claim 2, wherein the class information generating unit performs the classification by sequentially moving the pixel of interest in a predetermined scanning direction, and changes the scanning direction for each scanning line.

The image processing apparatus according to claim 2, wherein the class information generating means, based on the positions of the edge pixels included in the peripheral pixels, either selects the threshold value of the pixel of interest from among the threshold values of the peripheral pixels without calculating it, or calculates it by linear interpolation between the density average value and the threshold values of the peripheral pixels.

The object information generating means determines that the pixel of interest belongs to the background area when the initial threshold value predetermined for the first target pixel of the image data is continuous as the threshold generated by the class information generating means. The image processing apparatus according to claim 1, wherein:

The image processing apparatus according to claim 1, wherein the run-length calculating means is constituted by a same-type multiple-processing arithmetic unit, divides a scan line into data paths each including class information for a predetermined number of pixels, calculates a run length for each data path, and obtains the run length of the scan line by connecting the data paths after the calculation.
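
The divide, calculate, and connect procedure of this claim can be sketched as follows. The sketch processes the data paths one after another in plain Python, whereas the claimed arithmetic unit would presumably handle them in parallel; the function names and the `chunk_size` parameter are assumptions made for illustration.

```python
def run_lengths(values):
    """Return [(value, length), ...] for runs of consecutive equal values."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def chunked_run_lengths(scan_line, chunk_size):
    """Compute runs per fixed-size data path, then connect across boundaries."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    merged = []
    for start in range(0, len(scan_line), chunk_size):
        chunk_runs = run_lengths(scan_line[start:start + chunk_size])
        # Connect: a run that continues across the boundary is merged
        # with the last run of the preceding data path.
        if merged and chunk_runs and merged[-1][0] == chunk_runs[0][0]:
            merged[-1] = (merged[-1][0], merged[-1][1] + chunk_runs[0][1])
            chunk_runs = chunk_runs[1:]
        merged.extend(chunk_runs)
    return merged


if __name__ == "__main__":
    line = [0, 0, 1, 1, 1, 1, 0, 2, 2]
    print(chunked_run_lengths(line, 4))
```

The connected result is the same as computing the runs over the undivided scan line; only the boundary runs need fixing up after the per-path calculation.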

The image processing apparatus according to claim 1, wherein the character region estimating unit compares the class run length calculated by the run length calculating unit with a predetermined character estimation threshold, and estimates that the pixels included in the class run belong to the character region if the class run length is equal to or less than the threshold.
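
The comparison in this claim amounts to marking every pixel of a short class run as a character candidate. A minimal sketch for one scan line and one stage follows; the function name `estimate_character_pixels` and the threshold value used in the example are assumptions.

```python
def estimate_character_pixels(class_line, char_threshold):
    """Flag each pixel whose class run length is <= char_threshold."""
    flags = []
    run_start = 0
    n = len(class_line)
    for i in range(1, n + 1):
        # A run ends at the end of the line or where the class changes.
        if i == n or class_line[i] != class_line[run_start]:
            run_len = i - run_start
            flags.extend([run_len <= char_threshold] * run_len)
            run_start = i
    return flags


if __name__ == "__main__":
    print(estimate_character_pixels([1, 1, 0, 0, 0, 0, 0, 1], 2))
```

The idea behind the comparison is that character strokes produce short horizontal runs, while backgrounds and photographic areas tend to produce long ones.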

The image processing apparatus according to claim 8, wherein the character region estimating means estimates that the target pixel does not belong to the character region when any of the surrounding pixels has the same class information as the target pixel and is estimated not to belong to the character region.
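
The correction in this claim can be sketched as a single pass over a two-dimensional map. The claim does not say which pixels count as surrounding pixels, so the 8-neighborhood used here is an assumption, as are the function name `veto_by_neighbors` and the choice to test against the original estimates rather than the already corrected ones.

```python
def veto_by_neighbors(class_map, char_flags):
    """Clear a character estimate when a same-class neighbor is non-character.

    Assumption: 'surrounding pixels' means the 8-neighborhood.
    """
    h = len(class_map)
    w = len(class_map[0]) if h else 0
    out = [row[:] for row in char_flags]
    for y in range(h):
        for x in range(w):
            if not char_flags[y][x]:
                continue
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        same_class = class_map[ny][nx] == class_map[y][x]
                        if same_class and not char_flags[ny][nx]:
                            out[y][x] = False
    return out


if __name__ == "__main__":
    classes = [[1, 1, 1, 1], [0, 1, 0, 0]]
    flags = [[False] * 4, [True] * 4]
    print(veto_by_neighbors(classes, flags))
```

This removes isolated short runs that are merely the narrow tip of a larger non-character area of the same class.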

The image processing apparatus according to claim 1, further comprising a character detecting unit that detects, as character pixels, those pixels whose class information is the same as that of the pixel at the outermost end among the continuous pixels determined by the region determining unit to belong to the character region.

An image forming apparatus comprising: the image processing apparatus according to any one of claims 1 to 10; and
an image output device that outputs image data processed by the image processing apparatus.

An image processing method including an area dividing step of receiving image data representing an image composed of a plurality of pixels and, based on the input image data, dividing the image data into areas by determining which of a character area, a background area, and another area each pixel constituting the image belongs to,
wherein the area dividing step includes:
A class information generating step of obtaining a feature amount of a pixel block including a target pixel and its surrounding pixels using the pixel value of each pixel, generating a threshold based on the obtained feature amount, classifying the target pixel into one of two pixel sets by comparing the generated threshold with the pixel value, performing a multi-stage classification by further classifying each resulting pixel set with a threshold different from that threshold, and generating, for each stage, class information indicating the result of the classification;
An object information generating step of determining, based on the plurality of thresholds generated in the class information generating step, whether the target pixel belongs to the background area, and generating object information indicating the determination result;
A run length calculating step of calculating, for each of the stages, a class run length, which is the number of pixels of a class run composed of pixels that have the same class information and are adjacent to each other in a predetermined direction, and an object run length, which is the number of pixels of an object run composed of pixels that have the same object information and are adjacent to each other in the predetermined direction;
A character region estimating step of estimating, for each of the stages, whether the pixels included in the class run belong to a character region, based on the class run length; and
An area determining step of determining, based on the object information, whether or not each pixel belongs to the background area, and determining whether the pixels included in the object run belong to a character area or another area based on the ratio, at each stage, of the pixels in the object run that are estimated in the character region estimating step to belong to the character region.
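
The final determination of the area dividing step can be sketched for a single scan line and a single stage. The claim evaluates the ratio at each stage and does not state a numerical criterion, so the single-stage simplification, the function name `decide_object_runs`, and the `ratio_threshold` default of 0.5 are all assumptions made for illustration.

```python
def decide_object_runs(is_background, char_flags, ratio_threshold=0.5):
    """Label each pixel 'background', 'character', or 'other'.

    is_background: per-pixel object information (True = background).
    char_flags:    per-pixel character estimates for one stage.
    ratio_threshold is a hypothetical value, not taken from the claim.
    """
    labels = []
    start = 0
    n = len(is_background)
    for i in range(1, n + 1):
        # An object run ends where the object information changes.
        if i == n or is_background[i] != is_background[start]:
            length = i - start
            if is_background[start]:
                label = "background"
            else:
                # Ratio of character-estimated pixels within the object run.
                ratio = sum(char_flags[start:i]) / length
                label = "character" if ratio >= ratio_threshold else "other"
            labels.extend([label] * length)
            start = i
    return labels


if __name__ == "__main__":
    bg = [True, True, False, False, False, False, True]
    ch = [False, False, True, True, True, False, False]
    print(decide_object_runs(bg, ch))
```

Deciding per object run rather than per pixel means that a few misestimated pixels inside a character string or a photograph do not fragment the resulting area.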

An image processing program for causing a computer to execute the image processing method according to claim 12.

A computer-readable recording medium on which an image processing program for causing a computer to execute the image processing method according to claim 12 is recorded.