JP4701448B2 - Region of interest encoding method

Info

Publication number: JP4701448B2
Application number: JP2000283259A
Authority: JP
Inventors: 雄介水野
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2000-09-19
Filing date: 2000-09-19
Publication date: 2011-06-15
Anticipated expiration: 2020-09-19
Also published as: JP2002094991A

Description

【０００１】
【発明の属する技術分野】
この発明は、画像全体中の人物や物体などの特定の関心領域に対し優先的に符号化のビットレートを割り振ることにより、その関心領域の画質を他領域に比較して高く保持する関心領域符号化方法に関する。
【０００２】
【従来の技術】
画像を効率的に符号化する方法として、画像全体中の人物や物体などの特定の関心領域に対し優先的に符号化のビットレートを割り振ることにより、その関心領域の画質を他領域に比較して高く保持する、ＲＯＩ（Region of Interest：関心領域）と背景などの重要でない情報の非ＲＯＩに分ける方法がある。
【０００３】
この方式の代表例としては、Ｍａｘｓｈｉｆｔ法（第一従来例）とビットプレーン法としてのＥＢＣＯＴ（Embedded Block Coding with Optimized Truncation：第二従来例）とがある。
【０００４】
Ｍａｘｓｈｉｆｔ法（第一従来例）は、ＲＯＩ部分を任意の形で指定し、その部分を可逆圧縮する一方、非ＲＯＩ部分を非可逆圧縮するものである。具体的には、まず原画像に対して既知のウェーブレット変換を行って図１３のようなウェーブレット係数の分布を得た後、これらの分布の中で、非ＲＯＩ部分に想到する係数分布の最も大きなウェーブレット係数の値Ｖｍを求めておく。そして、Ｓ＞＝ｍａｘ（Ｖｍ）となるようなビット数Ｓを求め、ＲＯＩ部分Ａｒ１のウェーブレット係数のみを図１４のように増大する方向へＳビットだけシフトさせる。例えば、Ｖｍの値が十進数で「２５５」（即ち、二進数で「１１１１１１１１」）である場合には、Ｓ＝８ビットであり、またＶｍの値が十進数で「１２８」（即ち、二進数で「１０００００００」）である場合にも同様にＳ＝８ビットであるため、この場合にはＲＯＩ部分Ａｒ１のウェーブレット係数を図１４のように増大する方向へＳ＝８ビットだけシフトさせることになる。これにより、ＲＯＩ部分Ａｒ１については非ＲＯＩ部分に比べて圧縮率を低く設定でき、これによりＲＯＩ部分Ａｒ１について可逆圧縮の圧縮データを得ることが可能となる。この方法によれば、復号化側において、事前にＲＯＩ部分の形状等の定義情報を入手する必要がなく、そのまま復号化するだけでＲＯＩ部分の可逆的な復号を行うことができるので便利である。
【０００５】
また、ビットプレーン法のＥＢＣＯＴ（第二従来例）は、図１５の如く、画像を複数の矩形ブロックＢに分けて、その矩形ブロックＢごとにビットレートの優先度をつけて圧縮するものである。したがって、ＥＢＣＯＴによれば、一部の矩形ブロックＢについて、他の矩形ブロックＢよりもビットレートの高い画像圧縮を行うことができる。
【０００６】
【発明が解決しようとする課題】
Ｍａｘｓｈｉｆｔ法（第一従来例）では、ＲＯＩ部分において可逆圧縮を行っていたため、その部分の圧縮率に一定の限界があり、圧縮された画像ファイルのサイズは全体として比較的大きいものとならざるを得なかった。逆に、Ｍａｘｓｈｉｆｔ法において圧縮率を高めて圧縮された画像ファイルのサイズを小さくするためには、非ＲＯＩ部分のビットレートを大幅に下げなければならず、よって非ＲＯＩ部分の画質がＲＯＩ部分の画質に比べて極めて劣悪なものとなってしまう。
【０００７】
一方、ＥＢＣＯＴでは、矩形ブロックＢ毎にビットレートを変化させているため、復号化側に事前にブロックの座標及び大きさや境界等の定義情報が必要である。
【０００８】
また、ＥＢＣＯＴでは、矩形ブロックＢとしてしかＲＯＩ部分を指定していなかったため、任意の形状のＲＯＩ部分を定義することができず、例えば背景の中の人物の顔のみのビットレートを増大させたい場合などには、その処理が極めて困難にならざるを得ないこととなる。
【０００９】
そこで、この発明の課題は、任意の形状のＲＯＩ部分を非可逆圧縮することにより画像を効率的に符号化することができるとともに、復号化側において事前にＲＯＩ部分の形状や大きさ等の定義情報を入手する必要のない関心領域符号化方法を提供することにある。
【００１０】
【課題を解決するための手段】
上記課題を解決すべく、請求項１に記載の発明は、原画像をウェーブレット変換する第一工程と、前記原画像に対して指定された任意の形状の複数のマスク領域を、前記ウェーブレット変換に対応するウェーブレット平面に展開してマスク信号を生成する第二工程と、前記複数のマスク領域及び当該複数のマスク領域以外の非マスク領域に対して圧縮する際の符号量をそれぞれ割り当てる第三工程と、前記第三工程で割り当てられた符号量に応じて量子化及び符号化を行う第四工程とを備え、前記第二工程は、ウェーブレット平面に展開された異なるマスク領域に関する部分同士が重なるときは、重なる部分を優先順位の高いマスク領域に関する部分としてマスク信号を生成し、前記第四工程は、前記複数のマスク領域及び前記非マスク領域に個別に前記量子化及び前記符号化を行い、それぞれの圧縮データを生成するものである。
【００１２】
請求項２に記載の発明は、請求項１に記載の関心領域符号化方法であって、前記第四工程は、前記第一工程においてウェーブレット変換された原画像のウェーブレット係数を前記マスク信号に対応付けて取り出した複数のマスク領域に関する各ウェーブレット係数と、非マスク領域に関するウェーブレット係数とに量子化及び符号化を行うものである。
【００１４】
【発明の実施の形態】
図１はこの発明の一の実施の形態に係る関心領域符号化方法を示すフローチャートである。このフローチャートに沿って、この関心領域符号化方法を説明する。
【００１５】
＜圧縮方法＞
画像の圧縮は、図１中の圧縮装置Ｃｏ側のフローで実行される。
【００１６】
圧縮装置Ｃｏでは、図１の如く、まず原画像であるディジタル式の画像データ１に対してウェーブレット変換を行う。ウェーブレット変換とは、２分割フィルタバンクを使用してローパス側とハイパス側に画像を分けて符号化圧縮を行う変換方法である。ここで、ウェーブレット変換の代表的な方式としては、ウェーブレットパケット木の形態の相違によって、図２及び図３に示したｍａｌｌａｔ型と、図４及び図５に示したｓｐａｃｌｅ型と、図６及び図７に示したｐａｃｋｅｔ型といった複数の種類に区別される。ここでは、これらの中からひとつを選んでウェーブレット変換を行う。尚、図２、図４及び図６は一次元での、図３、図５及び図７は二次元での、それぞれウェーブレット変換の態様を示している。ただし、ウェーブレット変換の展開階層は各図に示した階層数に限定されず任意である。
【００１７】
ここで、図２及び図３に示したｍａｌｌａｔ型は、低い周波数成分（ローパス成分：図２中の相対的に小さなブロックで示した画像及び図３中の「Ｌ」で示したパス参照）が高い周波数成分（ハイパス成分：図２中の相対的に大きなブロックで示した画像及び図３中の「Ｈ」で示したパス参照）より多くの情報を含んでいるという仮定の下に低域通過フィルタのみを繰り返す方式のものである。これに対して、ウェーブレットパケット基底が任意の２進木構造に対応できることから、図４及び図５に示したｓｐａｃｌｅ型では、低い周波数成分（ローパス成分：図４中の相対的に小さなブロックで示した画像及び図５中の「Ｌ」で示したパス参照）だけでなく、高い周波数成分（ハイパス成分：図４中の最大ブロックで示した画像及び図５中の最初の分岐で「Ｈ」で示したパス参照）においても、さらに低い周波数成分（ローパス成分：「Ｌ」）と高い周波数成分（ハイパス成分：「Ｈ」）とに１段だけ展開している。また、図６及び図７に示したｐａｃｋｅｔ型では、短時間フーリエ変換のように、全ての枝において周波数成分（ローパス成分：「Ｌ」）と高い周波数成分（ハイパス成分：「Ｈ」）に展開して完全木構成を採用している。ここでは、これらのいずれの型を選択するかは、コストと容量に基づいて決められたレートを制約とし、復元歪みを最小にする型を選択する。
【００１８】
次に、圧縮装置Ｃｏにおいて、ウェーブレット変換された画像データ２に対して、重要な部分をＲＯＩ部分として指定するためのマスク信号３を与える。例えば、図８のような人物の画像データ（原画像）１において、額より下の顔部分のみをＲＯＩ部分とする場合、図９の白抜き部分に示すように単一のマスク領域４を設定し、これをＲＯＩ部分として指定する。このマスク領域４は、原画像１をディスプレイ装置の画面で見ながら、所謂マウス等のポインティング入力デバイスを用いて原画像１に対応して指定することができる。
【００１９】
尚、図９は、原画像１に対して単一のＲＯＩ部分を指定した例であるが、複数の領域をＲＯＩ部分として指定してもよい。これらは、それぞれ異なったマスク信号３により規定される。尚、全てのＲＯＩ部分についての全てのマスク領域４を除去した残りの部分が非ＲＯＩ部分５（図９）となる。
【００２０】
次に、図１のように複数のマスク信号３が与えられた場合には、その複数のマスク信号３に対して優先順位をつける。この優先順位が高いほど、情報量、例えばビットレートが高くなり、伸張時の損失が少なくなることになる。この際の優先順位としては、「１」「２」…というように数字の昇順で指定を行う。
【００２１】
そして、上述のように選択された型（ｍａｌｌａｔ型／ｓｐａｃｌｅ型／ｐａｃｋｅｔ型）のウェーブレット変換に対応して、マスク領域４をウェーブレット平面に展開してマスク信号３を生成する。
【００２２】
ここで、マスク信号をウェーブレット平面に相当する部分に変換する方法はウェーブレット変換のフィルタのタップ数に依存する。
【００２３】
例えば、図１１のようにウェーブレット変換の演算処理においてリバーシブル（Reversible）５×３フィルタ（分解側のローパスフィルタのタップ数が５タップで分解側のハイパスフィルタのタップ数が３タップであるフィルタ）を適用するものとすると、原画像１の偶数番目（２ｎ番目）の画素データがＲＯＩ部分として指定されている場合には、ローパスフィルタ（低域側）７のｎ番目のデータと、ハイパスフィルタ（高域側）８の（ｎ−１）番目及びｎ番目のデータとがＲＯＩ部分であるものとして、マスク信号をウェーブレット平面に展開する。また、原画像１の奇数番目（２ｎ＋１番目）の画素データがＲＯＩ部分として指定されている場合には、ローパスフィルタ（低域側）７のｎ番目及び（ｎ＋１）番目のデータと、ハイパスフィルタ（高域側）８の（ｎ−１）番目、ｎ番目及び（ｎ＋１）番目のデータとがＲＯＩ部分であるものとして、マスク信号をウェーブレット平面に展開する。尚、図１１は原画像１と最初の階層のウェーブレット平面との対応関係のみを示しているが、より深い階層の展開についても同様の再帰的な展開が行われる。
【００２４】
あるいは、例えば、図１２のようにウェーブレット変換の演算処理においてドビュッシー（Daubechies）９×７フィルタ（分解側のローパスフィルタのタップ数が９タップで分解側のハイパスフィルタのタップ数が７タップであるフィルタ）を適用するものとすると、原画像１の偶数番目（２ｎ番目）の画素データがＲＯＩ部分として指定されている場合には、ローパスフィルタ（低域側）７の（ｎ−１）番目、ｎ番目及び（ｎ＋１）番目のデータと、ハイパスフィルタ（高域側）８の（ｎ−２）番目、（ｎ−１）番目、ｎ番目及び（ｎ＋１）番目のデータとがＲＯＩ部分であるものとして、マスク信号をウェーブレット平面に展開する。また、原画像１の奇数番目（２ｎ＋１番目）の画素データがＲＯＩ部分として指定されている場合には、ローパスフィルタ（低域側）７の（ｎ−１）番目、ｎ番目、（ｎ＋１）番目及び（ｎ＋２）番目のデータと、ハイパスフィルタ（高域側）８の（ｎ−２）番目、（ｎ−１）番目、ｎ番目、（ｎ＋１）番目及び（ｎ＋２）番目のデータとがＲＯＩ部分であるものとして、マスク信号をウェーブレット平面に展開する。尚、図１２は原画像１と最初の階層のウェーブレット平面との対応関係のみを示しているが、より深い階層の展開についても同様の再帰的な展開が行われる。
【００２５】
尚、ローパスフィルタ（低域側）７及びハイパスフィルタ（高域側）８において、図１１及び図１２の対応関係について、原画像１の或る画素データとの対応により非ＲＯＩ部分と、且つ原画像１の他の画素データとの対応によりＲＯＩ部分とが重なり合う部分は、ＲＯＩ部分であるものとして、マスク信号をウェーブレット平面に展開する。
【００２６】
図１０中の白抜き部分４ａは、上記のようにしてマスク領域（ＲＯＩ部分）４をｍａｌｌａｔ型のウェーブレット平面に展開した領域（以下「展開マスク領域」と称す）４であり、この展開マスク領域４ａに対応したマスク信号３が生成され、ウェーブレット変換された画像データ２に与えられる。図１０中の符号５ａは非ＲＯＩ部分５がウェーブレット平面に展開された領域（以下「展開非マスク領域」と称す）を示している。そして、マスク領域（ＲＯＩ部分）４同士が重なり合う部分及びマスク領域（ＲＯＩ部分）４と非ＲＯＩ部分５とが重なり合う部分では、いずれか優先順位の高い方をマスク領域（ＲＯＩ部分）４とする一方、低い方を非ＲＯＩ部分５として処理する。
【００２７】
そして、図１において、ウェーブレット変換された画像データ２のウェーブレット係数を、それぞれのマスク信号３に対応付けして取り出していく。
【００２８】
次に、図１０のようにウェーブレット平面上に展開された各マスク領域４ａに対して優先順位をつける。尚、上述のように、原画像１に対して複数のＲＯＩ部分４を設定している場合は、この複数のＲＯＩ部分４毎にウェーブレット平面上に展開された数の展開マスク領域４ａに対して、互いに対応する各マスク領域４ａに対して同等の優先順位をつけ、最終的に全ての展開マスク領域４ａに優先順位を設定する。
【００２９】
ここで、原画像１に対して複数のＲＯＩ部分４を設定している場合において、ウェーブレット平面上のローパスフィルタ（低域側）７を通過した部分については、複数の展開マスク領域４ａが重なり合うことがあり得る。この場合は、その重なり合った部分について、重なり合った複数の展開マスク領域４ａのうち優先順位の高い方の展開マスク領域４ａであるとして優先順位を決定する。
【００３０】
そして、ウェーブレット変換された画像データ２において、どのマスク信号にもかからなかった展開非マスク領域５ａのウェーブレット係数を取り出す。この場合、展開非マスク領域５ａの優先順位は、全ての展開マスク領域４ａよりも低くなり（即ち、数字の昇順で大きな数字が付与される）、そのウェーブレット係数としては例えば「０」の値が採用される。
【００３１】
このようにして、複数のＲＯＩ部分６ａ，６ｂのウェーブレット係数が出力される。尚、このＲＯＩ部分６ａ，６ｂ…及び展開非マスク領域５ａのウェーブレット係数の出力データを、以下に「ＲＯＩ信号」と称することにする。
【００３２】
次に、先に設定した優先順位に応じて、ＲＯＩ部分６ａ，６ｂと展開非マスク領域５ａにそれぞれビット量を割り当てる。
【００３３】
この際のビット量の割り当て方法としては、ＲＯＩ部分６ａ，６ｂの情報量、例えばビット量が可逆圧縮に必要なビット量に満たない値に設定される。具体的に、例えば、まず、画像全体の圧縮率ビット量を決定し、その内、各ＲＯＩ部分６ａ，６ｂ…の優先順位の高いものから順番に所定の割合のビット量を順次割り当て、残りのビット量を展開非マスク領域５ａに割り当てる第１の方法と、各ＲＯＩ部分６ａ，６ｂ…と展開非マスク領域５ａの優先順位に応じて所定の割合で直接ビット量を決定する第２の方法とがあるが、いずれの方法を予め選択しておき、その選択された方法に従って各ＲＯＩ部分６ａ，６ｂ…及び展開非マスク領域５ａのビット量を決定する。この際、上述のように、ＲＯＩ部分６ａ，６ｂのビット量が可逆圧縮に必要なビット量に満たない値に設定される。尚、ＲＯＩ部分６ａ，６ｂのビット量を、可逆圧縮に必要なビット量以上に割り当てるモードと、可逆圧縮に必要なビット量に満たない値に設定するモードとを選択できるようにしてもよい。
【００３４】
そして、各ＲＯＩ部分６ａ，６ｂ…及び展開非マスク領域５ａのＲＯＩ信号に対して量子化処理を行い、それぞれ２値化されたデータ１０ａ，１０ｂ…１０ｚを生成する。尚、この量子化処理の方法としては、ＥＢＣＯＴやＳＰＩＨＴ（Image Compression with Set Partitioning in Hierarchical Trees）等のビットプレーン符号化法と同様の方法で２値化してもよい。
【００３５】
そして、それぞれの２値化されたデータ１０ａ，１０ｂ…１０ｚに対して、ＭＱコーダーやＱＭコーダーなどの算術符号化やハフマン符号化等の所定の方式を用いてエントロピー符号化を行う。
【００３６】
このようにすることで、各ＲＯＩ部分６ａ，６ｂ…及び展開非マスク領域５ａ毎の圧縮データ１１ａ，１１ｂ…１１ｚが生成される。
【００３７】
そして、各圧縮データ１１ａ，１１ｂ…１１ｚを、その優先順位に応じて順番に並べて、例えば所定の経路に送出し、あるいはハードディスクドライブ等の所定の記録媒体に記録する。
【００３８】
＜復号方法＞
画像の復号は、図１中の伸張装置Ｅｘ側のフローで実行される。
【００３９】
伸張装置Ｅｘでは、図１の如く、各圧縮データ１１ａ，１１ｂ…１１ｚを所定の方式に従ってエントロピー復号化し、順次に並べられた順番（即ち、ビットレートの優先順位の高い順番）で、２値化されたデータ２１ａ，２１ｂ…２１ｚを生成する。
【００４０】
次に、それぞれの２値化されたデータ２１ａ，２１ｂ…２１ｚを逆量子化（多値化）して、複数のＲＯＩ信号２２ａ，２２ｂ…２２ｚを生成する。この際、順次に並べられた順番（即ち、ビットレートの優先順位の高い順番）で、逆量子化（多値化）を行い、複数のＲＯＩ信号２２ａ，２２ｂ…２２ｚをビットレートの優先順位の高い順番で生成する。
【００４１】
そして、ウェーブレット平面において、各ＲＯＩ信号２２ａ，２２ｂ…２２ｚを優先順位に応じて、ウェーブレット変換されたウェーブレット係数を合成する。このとき、各ＲＯＩ信号２２ａ，２２ｂ…２２ｚの優先順位は、データが送られて来た順番（即ち、データの並んでいる順番）であるため、与えられた順番に従って順次ウェーブレット係数を合成し、ウェーブレット変換された画像データ２３を形成する。尚、複数のＲＯＩ信号２２ａ，２２ｂ…２２ｚの間で重なり合う部分が発生している場合には、優先順位の高い方を選択する。尚、非ＲＯＩ部分５のウェーブレット係数を「０」の値に設定していた場合は、各ＲＯＩ部分６ａ，６ｂ…及び非ＲＯＩ部分５に対応するＲＯＩ信号２２ａ，２２ｂ…２２ｚの値を加算するだけでよい。
【００４２】
最後に、ウェーブレット変換の型（ｍａｌｌａｔ型／ｓｐａｃｌｅ型／ｐａｃｋｅｔ型）に対応して、逆ウェーブレット変換を行なって画像データ２４を復元する。
【００４３】
以上のように、ウェーブレット平面に相当する部分に変換された任意の形状のＲＯＩ部分６ａ，６ｂ…に対して非可逆圧縮を行っているので、可逆圧縮を行っていたＭａｘｓｈｉｆｔ法（第一従来例）に比べて、圧縮後の画像ファイルのサイズを全体として小さくできる。また、ＥＢＣＯＴ（第二従来例）に比べて、任意の形状のＲＯＩ部分６ａ，６ｂ…を指定でき、例えば背景の中の人物の顔のみのビットレートを増大させたい場合などにおいて、その処理が容易になる。
【００４４】
【発明の効果】
本発明によれば、ＥＢＣＯＴ（第二従来例）に比べて、任意の形状のＲＯＩ部分を指定でき、例えば背景の中の人物の顔のみのビットレートを増大させたい場合などにおいて、その処理が容易になる。
【００４５】
また、本発明によれば、複数の任意の形状のマスク領域を指定した場合にも、それぞれのマスク領域に対して容易に非可逆圧縮を行うことができる。この場合、それぞれのマスク領域に対して個別に符号量を割り当てることで、割り当て符号量の自由な設定が可能となるという効果がある。
【図面の簡単な説明】
【図１】この発明の一の実施の形態に係る関心領域符号化方法を示すフローチャートである。
【図２】ｍａｌｌａｔ型のウェーブレットパケット木を示す図である。
【図３】ｍａｌｌａｔ型のウェーブレット平面を示す図である。
【図４】ｓｐａｃｌｅ型のウェーブレットパケット木を示す図である。
【図５】ｓｐａｃｌｅ型のウェーブレット平面を示す図である。
【図６】ｐａｃｋｅｔ型のウェーブレットパケット木を示す図である。
【図７】ｐａｃｋｅｔ型のウェーブレット平面を示す図である。
【図８】原画像の例を示す図である。
【図９】図８の原画像に対して設定された単一のマスク領域を示す図である。
【図１０】図９のマスク領域をｍａｌｌａｔ型のウェーブレット平面に展開した状態を示す図である。
【図１１】逆ウェーブレット５×３フィルタにおける低域側及び高域側と入力側との間のマスク領域の対応関係を示す図である。
【図１２】逆ウェーブレット９×７フィルタにおける低域側及び高域側と入力側との間のマスク領域の対応関係を示す図である。
【図１３】原画像に対して既知のウェーブレット変換を行った後のウェーブレット係数の分布を示す図である。
【図１４】第一従来例におけるウェーブレット係数の分布を示す図である。
【図１５】第二従来例のＥＢＣＯＴの概念を説明する図である。
【符号の説明】
１画像データ（原画像）
１０ａ〜１０ｚ２値化されたデータ
１１ａ〜１１ｚ圧縮データ
２ウェーブレット変換された画像データ
２１ａ〜２１ｚ２値化されたデータ
２２ａ〜２２ｚＲＯＩ信号
３マスク信号
４，４ａマスク領域
５，５ａ非ＲＯＩ部分
６ａ，６ｂ… ＲＯＩ部分
Ｃｏ圧縮装置
Ｅｘ伸張装置
[0001]
[Technical field of the invention]
The present invention relates to a region-of-interest encoding method that preferentially allocates encoding bit rate to a specific region of interest, such as a person or an object within an image, and thereby keeps the image quality of that region higher than that of other regions.
[0002]
[Prior art]
One method of efficiently encoding an image is to divide it into an ROI (Region of Interest) and a non-ROI that carries unimportant information such as the background, and to preferentially allocate encoding bit rate to the specific region of interest, such as a person or an object within the image, so that the image quality of that region is kept higher than that of other regions.
[0003]
Typical examples of this approach are the Maxshift method (first conventional example) and EBCOT (Embedded Block Coding with Optimized Truncation; second conventional example), a bit-plane method.
[0004]
The Maxshift method (first conventional example) designates an ROI portion of arbitrary shape and compresses that portion losslessly while compressing the non-ROI portion lossily. Specifically, a known wavelet transform is first applied to the original image to obtain a distribution of wavelet coefficients as shown in FIG. 13, and the value Vm of the largest wavelet coefficient in the coefficient distribution corresponding to the non-ROI portion is determined. Then a number of bits S such that S >= max(Vm) is obtained (as the examples that follow show, S is the number of bits needed to represent Vm), and only the wavelet coefficients of the ROI portion Ar1 are shifted by S bits in the increasing direction as shown in FIG. 14. For example, if Vm is 255 in decimal (11111111 in binary), then S = 8 bits, and if Vm is 128 in decimal (10000000 in binary), then likewise S = 8 bits; in either case the wavelet coefficients of the ROI portion Ar1 are shifted by S = 8 bits in the increasing direction as shown in FIG. 14. As a result, a lower compression ratio can be set for the ROI portion Ar1 than for the non-ROI portion, which makes it possible to obtain losslessly compressed data for the ROI portion Ar1. This method is convenient because the decoding side does not need to obtain definition information such as the shape of the ROI portion in advance and can losslessly decode the ROI portion simply by decoding the data as it is.
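The shift described for the first conventional example can be sketched as follows. This is a minimal illustration, assuming integer wavelet coefficients held in a flat list and a parallel boolean ROI mask; the function names `maxshift_bits` and `maxshift` are hypothetical and do not come from the patent or from any codec library.

```python
def maxshift_bits(non_roi_coeffs):
    """Number of bits S needed to represent Vm, the largest non-ROI magnitude.

    Assumes at least one non-ROI coefficient is supplied.
    """
    vm = max(abs(c) for c in non_roi_coeffs)
    return vm.bit_length()  # 255 -> 8, 128 -> 8


def maxshift(coeffs, roi_mask):
    """Shift only the ROI coefficients up by S bits; leave the others as is."""
    s = maxshift_bits([c for c, in_roi in zip(coeffs, roi_mask) if not in_roi])
    shifted = [(c << s) if in_roi else c for c, in_roi in zip(coeffs, roi_mask)]
    return shifted, s
```

After the shift, every nonzero ROI coefficient has a magnitude of at least 2**S, which is larger than any non-ROI magnitude, so a decoder can tell the two groups apart by magnitude alone. That is why no shape information has to be sent.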
[0005]
EBCOT, the bit-plane method (second conventional example), divides an image into a plurality of rectangular blocks B as shown in FIG. 15 and compresses the image with a bit-rate priority assigned to each rectangular block B. EBCOT can therefore compress some rectangular blocks B at a higher bit rate than other rectangular blocks B.
[0006]
[Problems to be solved by the invention]
Because the Maxshift method (first conventional example) compresses the ROI portion losslessly, the compression ratio of that portion is limited, and the compressed image file inevitably becomes relatively large as a whole. Conversely, to raise the compression ratio and reduce the size of the compressed image file with the Maxshift method, the bit rate of the non-ROI portion must be lowered drastically, so that the image quality of the non-ROI portion becomes extremely poor compared with that of the ROI portion.
[0007]
On the other hand, in EBCOT, since the bit rate is changed for each rectangular block B, definition information such as the coordinates, size, and boundary of the block is required in advance on the decoding side.
[0008]
Moreover, because EBCOT can designate an ROI portion only as a rectangular block B, an ROI portion of arbitrary shape cannot be defined. When, for example, the bit rate of only a person's face against a background is to be increased, the processing inevitably becomes extremely difficult.
[0009]
Accordingly, an object of the present invention is to provide a region-of-interest encoding method that can encode an image efficiently by lossily compressing an ROI portion of arbitrary shape, and that does not require the decoding side to obtain definition information such as the shape and size of the ROI portion in advance.
[0010]
[Means for Solving the Problems]
To solve the above problems, the invention according to claim 1 comprises: a first step of wavelet-transforming an original image; a second step of generating a mask signal by developing a plurality of mask regions of arbitrary shape, specified for the original image, onto a wavelet plane corresponding to the wavelet transform; a third step of assigning, to each of the plurality of mask regions and to a non-mask region other than the plurality of mask regions, a code amount to be used in compression; and a fourth step of performing quantization and encoding according to the code amounts assigned in the third step. In the second step, when portions relating to different mask regions developed on the wavelet plane overlap each other, the mask signal is generated with the overlapping portion treated as a portion relating to the mask region having the higher priority. In the fourth step, the quantization and the encoding are performed individually on the plurality of mask regions and the non-mask region to generate respective compressed data.
[0012]
The invention according to claim 2 is the region-of-interest encoding method according to claim 1, wherein the fourth step performs quantization and encoding on the wavelet coefficients relating to each of the plurality of mask regions, which are extracted by associating the wavelet coefficients of the original image wavelet-transformed in the first step with the mask signal, and on the wavelet coefficients relating to the non-mask region.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a flowchart showing a region of interest encoding method according to one embodiment of the present invention. This region-of-interest encoding method will be described along this flowchart.
[0015]
<Compression method>
Image compression is executed according to the flow on the compression device Co side in FIG. 1.
[0016]
As shown in FIG. 1, the compression device Co first performs a wavelet transform on digital image data 1, which is the original image. The wavelet transform is a transform method that uses a two-channel filter bank to divide an image into a low-pass side and a high-pass side for coding and compression. Typical wavelet transform schemes are distinguished, by the form of the wavelet packet tree, into several types: the mallat type shown in FIGS. 2 and 3, the spacle type shown in FIGS. 4 and 5, and the packet type shown in FIGS. 6 and 7. Here, one of these is selected and the wavelet transform is performed. FIGS. 2, 4, and 6 show the wavelet transform in one dimension, and FIGS. 3, 5, and 7 show it in two dimensions. The number of decomposition levels of the wavelet transform is not limited to the number shown in each figure and is arbitrary.
[0017]
The mallat type shown in FIGS. 2 and 3 repeats only the low-pass filtering, on the assumption that the low-frequency component (low-pass component: see the image shown by the relatively small blocks in FIG. 2 and the path labeled “L” in FIG. 3) contains more information than the high-frequency component (high-pass component: see the image shown by the relatively large blocks in FIG. 2 and the path labeled “H” in FIG. 3). In contrast, because a wavelet packet basis can correspond to an arbitrary binary tree structure, the spacle type shown in FIGS. 4 and 5 decomposes not only the low-frequency component (low-pass component: see the image shown by the relatively small blocks in FIG. 4 and the path labeled “L” in FIG. 5) but also the high-frequency component (high-pass component: see the image shown by the largest block in FIG. 4 and the path labeled “H” at the first branch in FIG. 5), by one stage only, into a lower-frequency component (low-pass component: “L”) and a higher-frequency component (high-pass component: “H”). The packet type shown in FIGS. 6 and 7 decomposes every branch into a low-frequency component (low-pass component: “L”) and a high-frequency component (high-pass component: “H”), as in the short-time Fourier transform, and thus adopts a complete tree structure. Which of these types to use is decided by treating the rate determined from cost and capacity as a constraint and selecting the type that minimizes the reconstruction distortion.
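The difference between the three tree shapes can be illustrated by listing the leaf subbands of a one-dimensional decomposition. The sketch below is only a reading of the description above, not the patent's implementation: the names `should_split` and `subband_leaves` are hypothetical, and each leaf is labeled by its path from the root (for example, "LLH" means low, low, high).

```python
def should_split(kind, label, levels):
    """Decide whether the node reached by path `label` is decomposed further."""
    if len(label) >= levels:
        return False
    if kind == "packet":
        return True                      # complete tree: split every branch
    if kind == "mallat":
        return "H" not in label          # split only the pure low-pass chain
    if kind == "spacle":
        # low-pass chain, plus one extra stage on the first high-pass branch
        return "H" not in label or label == "H"
    raise ValueError(f"unknown decomposition type: {kind}")


def subband_leaves(kind, levels, label=""):
    """Return the leaf subbands of the tree, ordered low-pass branch first."""
    if not should_split(kind, label, levels):
        return [label]
    return (subband_leaves(kind, levels, label + "L")
            + subband_leaves(kind, levels, label + "H"))
```

For three levels, the mallat tree has four leaves, the spacle tree five, and the packet tree eight, which reflects the increasing cost of the richer decompositions.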
[0018]
Next, in the compression device Co, a mask signal 3 for designating an important part as an ROI portion is given to the wavelet-transformed image data 2. For example, when only the face portion below the forehead is to be the ROI portion in image data (original image) 1 of a person as shown in FIG. 8, a single mask region 4 is set as shown by the white portion in FIG. 9 and designated as the ROI portion. The mask region 4 can be specified on the original image 1 with a pointing input device such as a mouse while the original image 1 is viewed on the screen of a display device.
[0019]
Although FIG. 9 shows an example in which a single ROI portion is specified for the original image 1, a plurality of regions may be specified as the ROI portion. These are defined by different mask signals 3. Note that the remaining portion obtained by removing all mask regions 4 for all ROI portions becomes a non-ROI portion 5 (FIG. 9).
[0020]
Next, when a plurality of mask signals 3 are given as shown in FIG. 1, priorities are assigned to the mask signals 3. The higher the priority, the larger the amount of information (for example, the bit rate) and the smaller the loss at decompression. The priorities are designated as numbers in ascending order, such as “1”, “2”, and so on.
[0021]
Then, in accordance with the wavelet transform of the type selected as described above (mallat type / spacle type / packet type), the mask region 4 is developed on the wavelet plane to generate the mask signal 3.
[0022]
Here, the method of converting the mask signal into a portion corresponding to the wavelet plane depends on the number of taps of the wavelet transform filter.
[0023]
For example, suppose that a reversible 5 × 3 filter (a filter whose decomposition-side low-pass filter has 5 taps and whose decomposition-side high-pass filter has 3 taps) is applied in the wavelet transform computation as shown in FIG. 11. When the even-numbered (2n-th) pixel data of the original image 1 is designated as an ROI portion, the mask signal is developed on the wavelet plane by treating the n-th data of the low-pass filter (low-frequency side) 7 and the (n-1)-th and n-th data of the high-pass filter (high-frequency side) 8 as ROI portions. When the odd-numbered ((2n+1)-th) pixel data of the original image 1 is designated as an ROI portion, the mask signal is developed on the wavelet plane by treating the n-th and (n+1)-th data of the low-pass filter (low-frequency side) 7 and the (n-1)-th, n-th, and (n+1)-th data of the high-pass filter (high-frequency side) 8 as ROI portions. FIG. 11 shows only the correspondence between the original image 1 and the wavelet plane of the first level, but the same development is applied recursively for deeper levels.
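The index correspondence stated for the 5 × 3 filter can be written down directly. The sketch below assumes one-dimensional, zero-based pixel positions and ignores image borders; the function name `expand_pixel_53` is hypothetical.

```python
def expand_pixel_53(x):
    """Map one ROI pixel position to the (low-pass, high-pass) indices it marks.

    Follows the correspondence given in the text for the reversible 5x3 filter.
    Indices are not clipped to the signal length here.
    """
    n, odd = divmod(x, 2)
    if odd:                                   # pixel 2n+1
        return {n, n + 1}, {n - 1, n, n + 1}
    return {n}, {n - 1, n}                    # pixel 2n
```

For instance, pixel 4 (n = 2, even) marks low-pass index 2 and high-pass indices 1 and 2, while pixel 5 (n = 2, odd) marks low-pass indices 2 and 3 and high-pass indices 1, 2, and 3.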
[0024]
Alternatively, suppose that a Daubechies 9 × 7 filter (a filter whose decomposition-side low-pass filter has 9 taps and whose decomposition-side high-pass filter has 7 taps) is applied in the wavelet transform computation as shown in FIG. 12. When the even-numbered (2n-th) pixel data of the original image 1 is designated as an ROI portion, the mask signal is developed on the wavelet plane by treating the (n-1)-th, n-th, and (n+1)-th data of the low-pass filter (low-frequency side) 7 and the (n-2)-th, (n-1)-th, n-th, and (n+1)-th data of the high-pass filter (high-frequency side) 8 as ROI portions. When the odd-numbered ((2n+1)-th) pixel data of the original image 1 is designated as an ROI portion, the mask signal is developed on the wavelet plane by treating the (n-1)-th, n-th, (n+1)-th, and (n+2)-th data of the low-pass filter (low-frequency side) 7 and the (n-2)-th, (n-1)-th, n-th, (n+1)-th, and (n+2)-th data of the high-pass filter (high-frequency side) 8 as ROI portions. FIG. 12 shows only the correspondence between the original image 1 and the wavelet plane of the first level, but the same development is applied recursively for deeper levels.
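The 9 × 7 correspondence differs from the shorter filter only in how far the marked indices spread, so it lends itself to a small offset table. As before, this is a sketch under assumptions: zero-based one-dimensional positions, no border handling, and the hypothetical names `OFFSETS_97` and `expand_pixel_97`.

```python
# Offsets relative to n, keyed by pixel parity, as (low-pass, high-pass).
OFFSETS_97 = {
    0: (range(-1, 2), range(-2, 2)),   # even pixel 2n:  n-1..n+1 and n-2..n+1
    1: (range(-1, 3), range(-2, 3)),   # odd pixel 2n+1: n-1..n+2 and n-2..n+2
}


def expand_pixel_97(x):
    """Map one ROI pixel position to the (low-pass, high-pass) indices it marks."""
    n, parity = divmod(x, 2)
    low_offsets, high_offsets = OFFSETS_97[parity]
    return {n + d for d in low_offsets}, {n + d for d in high_offsets}
```

Because the filter support is longer, a single ROI pixel marks more coefficients than in the 5 × 3 case, so the developed mask grows faster with each decomposition level.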
[0025]
In the low-pass filter (low-frequency side) 7 and the high-pass filter (high-frequency side) 8, under the correspondences of FIG. 11 and FIG. 12, a portion that is non-ROI through its correspondence with one pixel of the original image 1 but ROI through its correspondence with another pixel of the original image 1 is treated as an ROI portion when the mask signal is developed on the wavelet plane.
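Taken together, the per-pixel correspondence and the rule that ROI wins over non-ROI amount to a logical OR over all ROI pixels. The following sketch develops a one-dimensional boolean mask using the 5 × 3 correspondence and recurses on the low-pass half only, as a mallat-type tree would. Dropping indices that fall outside the subband is an assumption made here for brevity; the text does not specify border handling. The name `develop_mask_1d` is hypothetical.

```python
def develop_mask_1d(mask, levels=1):
    """Develop a 1-D ROI mask into the layout [low-pass part | high-pass part]."""
    if levels == 0 or len(mask) < 2:
        return list(mask)
    low = [False] * ((len(mask) + 1) // 2)
    high = [False] * (len(mask) // 2)
    for x, is_roi in enumerate(mask):
        if not is_roi:
            continue  # non-ROI pixels never clear a flag, so ROI wins overlaps
        n, odd = divmod(x, 2)
        low_idx = (n, n + 1) if odd else (n,)
        high_idx = (n - 1, n, n + 1) if odd else (n - 1, n)
        for i in low_idx:
            if 0 <= i < len(low):
                low[i] = True
        for i in high_idx:
            if 0 <= i < len(high):
                high[i] = True
    # Deeper levels repeat the same development on the low-pass half.
    return develop_mask_1d(low, levels - 1) + high
```

A two-dimensional mask could be handled by applying the same development along rows and then along columns, although that step is not shown here.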
[0026]
The white portion 4a in FIG. 10 is the region (hereinafter referred to as the “development mask region”) 4a obtained by developing the mask region (ROI portion) 4 on a mallat-type wavelet plane as described above. A mask signal 3 corresponding to this development mask region 4a is generated and given to the wavelet-transformed image data 2. Reference sign 5a in FIG. 10 denotes the region (hereinafter referred to as the “development non-mask region”) in which the non-ROI portion 5 is developed on the wavelet plane. Where mask regions (ROI portions) 4 overlap each other, or where a mask region (ROI portion) 4 overlaps the non-ROI portion 5, the one with the higher priority is processed as the mask region (ROI portion) 4 and the one with the lower priority as the non-ROI portion 5.
[0027]
In FIG. 1, the wavelet coefficients of the image data 2 subjected to wavelet transformation are extracted in association with the respective mask signals 3.
[0028]
Next, priorities are assigned to the development mask regions 4a developed on the wavelet plane as shown in FIG. 10. As described above, when a plurality of ROI portions 4 are set for the original image 1, the development mask regions 4a that were developed on the wavelet plane from the same ROI portion 4 are given the same priority, so that priorities are ultimately set for all development mask regions 4a.
[0029]
Here, when a plurality of ROI portions 4 are set for the original image 1, a plurality of development mask regions 4a may overlap one another in the portion of the wavelet plane that has passed through the low-pass filter (low-frequency side) 7. In that case, the overlapping portion is treated as belonging to the development mask region 4a with the highest priority among those that overlap, and its priority is determined accordingly.
[0030]
Then, from the wavelet-transformed image data 2, the wavelet coefficients of the development non-mask region 5a, which is not covered by any mask signal, are extracted. The priority of the development non-mask region 5a is lower than that of every development mask region 4a (that is, it is given a larger number in the ascending numbering), and a value of “0”, for example, is adopted as its wavelet coefficient.
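The overlap rule and the treatment of the uncovered region can be combined into a single label map over the wavelet plane. The sketch below is an illustration under assumptions: each developed mask is given as a set of flat coefficient indices keyed by its priority number (1 is the highest priority), and the name `build_label_map` is hypothetical.

```python
def build_label_map(developed_masks, size):
    """Label each coefficient position with the priority of the region owning it.

    Positions covered by no mask receive a number larger than every mask
    priority, which marks them as the lowest-priority non-mask region.
    """
    background = max(developed_masks, default=0) + 1
    labels = [background] * size
    # Paint the lowest priority first so that higher priorities overwrite overlaps.
    for priority in sorted(developed_masks, reverse=True):
        for i in developed_masks[priority]:
            labels[i] = priority
    return labels
```

With two masks that share one position, the shared position ends up labeled 1, and the uncovered positions are labeled 3.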
[0031]
In this way, the wavelet coefficients of the plurality of ROI portions 6a, 6b ... are output. The output data of the wavelet coefficients of the ROI portions 6a, 6b ... and the development non-mask region 5a are hereinafter referred to as “ROI signals”.
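One way to picture the ROI signals is as copies of the coefficient plane in which everything outside a given region is zeroed. This is only one reading of the text, chosen because it lets a decoder recombine the signals by simple addition. The name `split_roi_signals` and the flat-list layout are assumptions.

```python
def split_roi_signals(coeffs, labels):
    """Return {label: coefficients with 0 wherever another label owns the position}."""
    return {
        label: [c if owner == label else 0 for c, owner in zip(coeffs, labels)]
        for label in sorted(set(labels))
    }
```

Each resulting signal can then be quantized and entropy-coded independently with its own bit budget.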
[0032]
Next, bit amounts are allocated to the ROI portions 6a, 6b ... and the development non-mask region 5a according to the priorities set earlier.
[0033]
In this allocation, the amount of information (for example, the bit amount) of the ROI portions 6a, 6b ... is set to a value below the bit amount required for lossless compression. Specifically, there are two methods, for example. In the first method, the bit amount for the entire image at the target compression ratio is determined, predetermined proportions of that amount are allocated to the ROI portions 6a, 6b ... one after another in descending order of priority, and the remaining bit amount is allocated to the development non-mask region 5a. In the second method, the bit amounts are determined directly at predetermined proportions according to the priorities of the ROI portions 6a, 6b ... and the development non-mask region 5a. One of the two methods is selected in advance, and the bit amounts of the ROI portions 6a, 6b ... and the development non-mask region 5a are determined by the selected method. In either case, as stated above, the bit amount of the ROI portions 6a, 6b ... is set below the bit amount required for lossless compression. Alternatively, it may be made possible to select between a mode in which the ROI portions 6a, 6b ... are allocated at least the bit amount required for lossless compression and a mode in which they are allocated less than that amount.
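The first allocation method can be sketched as below. The percentages, the per-region lossless requirement, and the function name `allocate_bits` are all hypothetical inputs introduced for illustration; the patent does not specify how the proportions are chosen or how the lossless requirement is estimated.

```python
def allocate_bits(total_bits, roi_percent, lossless_bits):
    """First method: ROIs take fixed shares in priority order, the rest is background.

    roi_percent   -- share of the total for each ROI, highest priority first
    lossless_bits -- bits each ROI would need for lossless coding (assumed known)
    """
    roi_bits = []
    for percent, lossless in zip(roi_percent, lossless_bits):
        bits = total_bits * percent // 100
        # Keep the ROI lossy: stay strictly below its lossless requirement.
        roi_bits.append(min(bits, lossless - 1))
    return roi_bits, total_bits - sum(roi_bits)
```

In this sketch, any bits an ROI cannot use because of the lossless cap flow back into the non-mask region, which is a design choice of the example rather than something the text prescribes.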
[0034]
Then the ROI signals of the ROI portions 6a, 6b ... and the development non-mask region 5a are quantized to generate binarized data 10a, 10b ... 10z, respectively. The quantization may binarize the signals in the same manner as bit-plane coding methods such as EBCOT and SPIHT (Image Compression with Set Partitioning in Hierarchical Trees).
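The idea shared by bit-plane coders is to emit coefficient magnitudes one binary digit at a time, most significant plane first, so that truncating the stream lowers precision gradually. The sketch below shows only that decomposition; it is not EBCOT or SPIHT, it omits sign handling and context modeling, and the name `bit_planes` is hypothetical.

```python
def bit_planes(coeffs, num_planes):
    """Split coefficient magnitudes into binary planes, most significant first."""
    magnitudes = [abs(c) for c in coeffs]
    return [
        [(m >> plane) & 1 for m in magnitudes]
        for plane in range(num_planes - 1, -1, -1)
    ]
```

Keeping only the first few planes of a region is one simple way to fit that region into the bit amount allocated to it.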
[0035]
Then, entropy coding is performed on each of the binarized data 10a, 10b... 10z using a predetermined method such as arithmetic coding such as MQ coder or QM coder or Huffman coding.
[0036]
In this way, compressed data 11a, 11b ... 11z are generated for the ROI portions 6a, 6b ... and the development non-mask region 5a, respectively.
[0037]
Then, the compressed data 11a, 11b,... 11z are arranged in order according to the priority order, and are sent to a predetermined path, for example, or recorded on a predetermined recording medium such as a hard disk drive.
[0038]
<Decoding method>
The decoding of the image is executed in the flow on the decompression device Ex side in FIG.
[0039]
In the decompression device Ex, as shown in FIG. 1, the compressed data 11a, 11b ... 11z are entropy-decoded by a predetermined method in the order in which they are arranged (that is, in descending order of bit-rate priority) to generate binarized data 21a, 21b ... 21z.
[0040]
Next, the binarized data 21a, 21b ... 21z are inversely quantized (converted to multi-level values) to generate a plurality of ROI signals 22a, 22b ... 22z. The inverse quantization is likewise performed in the order in which the data are arranged, so the ROI signals 22a, 22b ... 22z are generated in descending order of bit-rate priority.
[0041]
Then, on the wavelet plane, the ROI signals 22a, 22b ... 22z are combined according to their priorities to synthesize the wavelet coefficients. Because the priority of each ROI signal 22a, 22b ... 22z is the order in which the data arrived (that is, the order in which the data are arranged), the wavelet coefficients are synthesized one after another in the given order to form wavelet-transformed image data 23. Where the ROI signals 22a, 22b ... 22z overlap one another, the one with the higher priority is selected. If the wavelet coefficients of the non-ROI portion 5 were set to “0”, it suffices simply to add the values of the ROI signals 22a, 22b ... 22z corresponding to the ROI portions 6a, 6b ... and the non-ROI portion 5.
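The synthesis step on the decoding side can be sketched in two variants. In the zero-filled case the signals are simply added; otherwise the value from the highest-priority signal that covers a position is taken. Representing "not covered" with `None` and the function name `synthesize` are assumptions of this example, not details from the text.

```python
def synthesize(roi_signals, zero_filled=True):
    """Combine per-region coefficient lists, given highest priority first."""
    if zero_filled:
        # Each signal is 0 outside its own region, so addition reassembles the plane.
        return [sum(values) for values in zip(*roi_signals)]
    # Otherwise pick, per position, the first (highest-priority) value present.
    return [
        next((v for v in values if v is not None), 0)
        for values in zip(*roi_signals)
    ]
```

Note that neither variant needs the shape of any region as side information: the arrival order of the signals already encodes their priority.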
[0042]
Finally, an inverse wavelet transform corresponding to the type of wavelet transform used (mallat type / spacle type / packet type) is performed to restore image data 24.
[0043]
As described above, lossy compression is applied to the ROI portions 6a, 6b ... of arbitrary shape converted into the corresponding portions of the wavelet plane, so the compressed image file can be made smaller as a whole than with the Maxshift method (first conventional example), which compresses the ROI portion losslessly. In addition, unlike EBCOT (second conventional example), ROI portions 6a, 6b ... of arbitrary shape can be designated, which makes processing easier when, for example, the bit rate of only a person's face against a background is to be increased.
[0044]
[Effects of the invention]
According to the present invention, unlike EBCOT (second conventional example), an ROI portion of arbitrary shape can be designated, which makes processing easier when, for example, the bit rate of only a person's face against a background is to be increased.
[0045]
Furthermore, according to the present invention, even when a plurality of mask regions having an arbitrary shape are designated, it is possible to easily perform irreversible compression on each mask region. In this case, there is an effect that it is possible to freely set the assigned code amount by individually assigning the code amount to each mask region.
[Brief description of the drawings]
FIG. 1 is a flowchart showing a region-of-interest encoding method according to an embodiment of the present invention.
FIG. 2 is a diagram showing a mallat-type wavelet packet tree.
FIG. 3 is a diagram showing a mallat-type wavelet plane.
FIG. 4 is a diagram showing a spacle-type wavelet packet tree.
FIG. 5 is a diagram showing a spacle-type wavelet plane.
FIG. 6 is a diagram showing a packet-type wavelet packet tree.
FIG. 7 is a diagram showing a packet-type wavelet plane.
FIG. 8 is a diagram showing an example of an original image.
FIG. 9 is a diagram showing a single mask region set for the original image of FIG. 8.
FIG. 10 is a diagram showing a state in which the mask region of FIG. 9 is developed on a mallat-type wavelet plane.
FIG. 11 is a diagram showing the correspondence of the mask region between the low-frequency and high-frequency sides and the input side in an inverse wavelet 5 × 3 filter.
FIG. 12 is a diagram showing the correspondence of the mask region between the low-frequency and high-frequency sides and the input side in an inverse wavelet 9 × 7 filter.
FIG. 13 is a diagram illustrating a distribution of wavelet coefficients after a known wavelet transform is performed on an original image.
FIG. 14 is a diagram showing a distribution of wavelet coefficients in the first conventional example.
FIG. 15 is a diagram for explaining the concept of EBCOT of the second conventional example.
[Explanation of symbols]
1 Image data (original image)
10a to 10z Binarized data
11a to 11z Compressed data
2 Wavelet-transformed image data
21a to 21z Binarized data
22a to 22z ROI signal
3 Mask signal
4, 4a Mask region
5, 5a Non-ROI portion
6a, 6b ... ROI portion
Co Compression device
Ex Decompression device

Claims

1. A region-of-interest encoding method comprising:
a first step of wavelet-transforming an original image;
a second step of generating a mask signal by developing a plurality of mask regions of arbitrary shape, specified for the original image, onto a wavelet plane corresponding to the wavelet transform;
a third step of assigning, to each of the plurality of mask regions and to a non-mask region other than the plurality of mask regions, a code amount to be used in compression; and
a fourth step of performing quantization and encoding according to the code amounts assigned in the third step,
wherein, in the second step, when portions relating to different mask regions developed on the wavelet plane overlap each other, the mask signal is generated with the overlapping portion treated as a portion relating to the mask region having the higher priority, and
wherein the fourth step performs the quantization and the encoding individually on the plurality of mask regions and the non-mask region to generate respective compressed data.

2. The region-of-interest encoding method according to claim 1,
wherein the fourth step performs quantization and encoding on the wavelet coefficients relating to each of the plurality of mask regions, which are extracted by associating the wavelet coefficients of the original image wavelet-transformed in the first step with the mask signal, and on the wavelet coefficients relating to the non-mask region.