JP3708218B2

JP3708218B2 - Image coding method

Info

Publication number: JP3708218B2
Application number: JP12210196A
Authority: JP
Inventors: 裕二伊藤
Original assignee: テキサスインスツルメンツインコーポレイテツド
Priority date: 1996-05-17
Filing date: 1996-05-17
Publication date: 2005-10-19
Anticipated expiration: 2016-05-17
Also published as: JPH09322159A

Description

【０００１】
【産業上の利用分野】
本発明は画像の符号化方法、特に、圧縮画像の符号化方法に関する。
【０００２】
【従来の技術及びその課題】
多層方式は、相異なる寸法の領域を持つ像表示を得る為の簡単な方法である。この表示は像の符号化、特に進み伝送（ｐｒｏｇｒｅｓｓｉｖｅｔｒａｎｓｍｉｓｓｉｏｎ）を介しての像の圧縮に有用であり、その結果得られるトリー（ｔｒｅｅ）は、カッドトリー（ｑｕａｄｔｒｅｅ）符号化（Ｅ．Ｓｈｕｓｔｅｒｍａｎ及びＭ．Ｆｅｄｅｒ著“Ｉｍａｇｅｃｏｍｐｒｅｓｓｉｏｎｖｉａｉｍｐｒｏｖｅｄｑｕａｄｔｒｅｅｄｅｃｏｍｐｏｓｉｔｉｏｎａｌｇｏｒｉｔｈｍ”ＩＥＥＥＴｒａｎｓ．ＩｍａｇｅＰｒｏｃｅｓｓｉｎｇ，ｖｏｌ．３，ｎｏ．２，ｐｐ２０７−２１５，１９９４，他）を用いて伝送するのが普通である。
【０００３】
自然の像は可変量の細部及び情報を持つ異なる寸法の領域に分割することができる。像のこの様なセグメント分割は、像データの効率のよい符号化の為に好ましい。カッドトリー（ＱＴ）符号化は、像の分解、即ち多層化を表わす主要な方式であり、像を２次元の均質（ｈｏｍｏｇｅｎｏｕｓ）な方形領域に分割する。その分解がトリーをつくり、トリーの各ノードは４つの子を持つと共に像の一意的に限定された領域に関連する。根が像全体に関連する。ＱＴ符号化を像の圧縮の為に用いる時、その結果得られるトリーを符号化することが必要である。符号化手順は、トリー構造情報の符号化、及びノード／リーフ情報、即ちセグメント分割フラグの符号化を含む。各々のリーフに対し、対応する部分像（ｓｕｂｉｍａｇｅ）の強度を記述するパラメータを割当てる。勿論、像の画素は、それがＱＴコードに現われれば、常にリーフである。トリー構造の符号化は、１番下のレベル、即ち、許容し得る最小寸法のブロックの１つ前のレベルで終了すべきであり、このブロックは画素に等しいか又はそれより大きくなければならない。
【０００４】
しかし、ＱＴ符号化はその有用な性質を活用していない。或るブロックは、隣接するブロックが分割される時、部分ブロックに分割される尤度が一層大きい。この性質は、自然の像は、連続的な小さなブロックを用いて表わすべき何等かの物体を描くものであるのに対し、背景は一層大きなブロックで構成される事実によって説明される。言い換えれば、ＱＴコードには、更に圧縮することができる空間的な冗長度が存在する。これは、物体を簡潔に表現する手法を用いることにより、ＱＴ符号化よりも更に効率よく像の分解を行なうことができることを意味する。物体を表示するための幾つかの候補の一つとして、縁（エッジ）の情報がある。縁情報を用いる一つの方法では、線近似方法によって縁を抽出する（階層形縁検出）。従って、像の分解は各ブロックに縁が存在することに基づいて実施され、頭と肩の像の様な単純な物体の場面には有用であるが、冗長なオーバーヘッド、即ち、縁が、ＱＴ符号化より低い符号化効率を招くのが難点である。
【０００５】
【課題を解決するための手段及び作用】
本発明では、像の符号化の為の改良された多層化方式（ＭｕｌｔｉｌａｙｅｒｉｎｇＳｃｈｅｍｅ）を提供する。縁は画素の精度で符号化しなくてもよく、セグメント分割の際の１番小さいブロック・レベルの精度であればよいから、縁の抽出は縮小された像（ｄｅｃｉｍａｔｅｄｉｍａｇｅ）で行うことができる。その時、適切な多層化を達成しながら、メモリ空間並びに縁抽出用の計算の複雑さと共に、縁データも減らすことができる。
【０００６】
本発明の符号化方法は、入力像の多層表示のためのパラメータを決定し、パラメータが決定された多層表示に対し所望の縮小フィルターを用いて像の縮小を行い、縮小像に対し縁の場所を探すためにラプラス演算を行って単位縁を検出し、検出した単位縁に基きマクロ縁を検出し、検出したマクロ縁を包み込む矩形領域毎に縁強度を算出するステップを含む。
【０００７】
【実施例】
本発明に至った理論的考察を含め、一実施例を図面を参照して説明する。
【０００８】
先ずカッドトリー符号化について以下に説明する。
【０００９】
Ｌ_t,bを像の多層表示とする。ここでｔ及びｂは夫々トップ・レベル及びボトム・レベルを表わすパラメータであり、ｔ＞ｂ≧０である。この時Ｌ_t,bは次の様に定義される。
【００１０】
【数１】

ｌ_t,iは２ⁱ×２ⁱの寸法のブロックで構成された層であり、これは分解過程が、そのブロックについては、レベルｔで出発して、レベルｉで終わることを意味する。
【００１１】
次に変数を定義する。ｐ_t,iは、次に低いレベル、即ちレベルｉ−１で４個のサブブロックに分割される２ⁱ×２ⁱの寸法のブロックの数であり、ｑ_t,iは、レベルｉ−１より下でも同じ寸法のままにとどまる２ⁱ×２ⁱの寸法のブロックの数であり、この為、このｑ_t,iに対応するブロックがｌ_t,iを構成する。入力像の寸法がＷ（幅）×Ｈ（高さ）の画素であると仮定し、像の画素の層をレベル０と表わす。その時、２つの変数の間の関係が反復方程式によって表わされる。
【００１２】
【数２】

【００１３】
ｔは次の条件を満たす。
【００１４】
【数３】

【００１５】
パラメータとしてｔ＝５及びｂ＝２である時の多層表示の一例を、２本の縁線が存在する図１に示す。
【００１６】
各々のレベルでの符号化の順序は、正方形の像の場合とは若干異なり、同じ寸法のブロックはラスター走査式の順序になる。即ち、左から右に、そして上から下へである。親のノードに‘１’を割当て、リーフに‘０’を割当てる。図２は、図１に示したＬ_5,2に対応するＱＴコードを示す。ｐ_t,iが黒のノードの数に等しく、ｑ_t,iがレベルｉにおける白のノードの数に相当する。
直接的な符号化を想定して、ＱＴ符号化を用いて像の分解を表わす為に使わなければならない速度Ｒ_t,b〔ビット〕は次の様になる。
【００１７】
【数４】

【００１８】
図２に見られる様に、ＱＴ符号化を用いて、図１に示す多層表示Ｌ_5,2を符号化するには５４ビットを必要とする。本発明は、５４ビットより少ないビットでＬ_5,2を表示することを可能にする。
【００１９】
図１において、縁がこの図に示される様に抽出される場合、その分解過程は縁情報だけに基づく。一つのブロック内に縁が存在すれば、そのブロックは４つの子ブロックにセグメント分割される。本発明においては、上−下手順と共に下−上手順（即ち、セグメント分割がボトムレベルｂから出発する）を用いることができる。その結果の分解は、縁及びパラメータが一定である限り、どの手順を採用しても同一である。
【００２０】
図３は、本発明の一実施例に係る符号器のブロック図である。像信号がパラメータ決定部１に入力し、像縮小部２を介して縁抽出部３に入力される。縁抽出部３は縁データを多層化部４へも出力する。多層化部４は、縁抽出部３から出力された縁データを基に、フレームを多層化する。最初はフレームを大きなブロック（トップ・レベル：ｔで規定されたサイズのブロック）で均等分割し（これを第ｔ層と呼ぶ）、各ブロックにマクロ縁が存在すれば４つの小ブロック（第ｔ−１層に相当）に分割する。そして、この操作をボトム・レベル：ｂで規定された層に到るまで繰り返す。符号化部５は多層化部４からの入力及び像信号とに基き視感度データ（ＬｕｍｉｎｏｓｉｔｙＤａｔａ）を出力する。
【００２１】
以下に各ブロックの機能を詳述する。
【００２２】
（１）パラメータ決定
パラメータ決定部１において、入力像の多層表示Ｌ_t,bのためのパラメータが決定される。
ｄを、縮小フィルタを用いて、水平及び垂直の両方向に入力像が係数１／ｄだけ部分標本化される様にする縮小係数とすると、ボトム・レベルである値ｂはｌｏｇ₂ｄに等しいか又はそれより大きくなることが望ましい。これは、その結果得られる分解が主符号化過程、即ち、符号化部５における適応ブロックをベースとした符号化にとって適切である様に保証するために、縁の場所の特定を十分精密にする必要があるからである。
【００２３】
（２）像の縮小（ＩｍａｇｅＤｅｃｉｍａｔｉｏｎ）
縮小された像に縁の抽出を適用することは、縁データ、計算の複雑さ、並びに像分解の為のメモリ空間が減少する点で利点がある。特に、像データについて云うと、１／４寸法像、即ちｄ＝２を使う時、縁データを１／３に減少することができる。図４は、ｄ＝２として、図１の縮小された像を用いた同様な分解の結果を示す。この場合、２つの縁が組合さって１つの縁になるので、多層表示のデータ量は、もと（図１参照）に比べて大体半分の寸法に減少させることができる。縁の数が減少することは、主に全体的なデータの減少に寄与し、これに対して縁データに対するコード長の変化は、データの減少に対する影響が少ない。使う縮小フィルタは符号器の選択に任される。これは、フィルタがこの後の過程、即ち縁の抽出に大した影響がないからである。
【００２４】
像縮小部２においては、パラメータが決定された多層表示に対し所望の縮小フィルタを用いて像の縮小を行う。更に、分解過程が縮小された像に適応される。実際、こうして得られた分解は若干変更されている。即ち、１つの４×４寸法のブロックはそのまま（図４参照）である。然し、この４×４寸法のブロックは、像の範囲全体の内の僅か１％を占めるにすぎず、この為、この違いは全体的な符号化の性能の低下を殆ど招かない。
【００２５】
（３）縁の抽出
【００２６】
図５に縁抽出部３の内部ブロック図を示す。
【００２７】
縁の場所を突き止める為、最初に周知のラプラース演算子をもとの像に適用し（ラプラス・フィルタ６）、その後、強度変化が大きい位置を表わす２進像、即ち縁を、μ＋Ｋ・σを用いた閾値作用によって求めることができる。ここでμ，σ及びＫは、夫々平均、微分空間の標準偏差及び係数である。一例として、８方向の小さなセグメントのパターンが図６に示されており、これらはテンプレートＴ_n（ｎ＝０，１…７）によって表わされており、各々の入口（ｊ，ｋ）がｔ_n（ｊ，ｋ）で表わされている。
【００２８】
ｊ，ｋ＝０，１，２，３，４として、λ（ｘ＋ｊ，ｙ＋ｋ）で表わされる５×５画素領域で構成される２進像内の部分領域をΛ（ｘ，ｙ）とする。テンプレートＴ_nとこのΛ（ｘ，ｙ）の間の相互相関Ｒ_n（ｘ，ｙ）は次の式によって計算することができる。
【００２９】
【数５】

【００３０】
この後、Ｒ_n（ｘ，ｙ）が８に等しいか又はそれより大きくなる様なｎが存在すれば、ｎビット平面内の座標（ｘ，ｙ）の所でフラグを高にする。ここでｎは０から７まで変化する。これは、テンプレートＴ_nが座標（ｘ，ｙ）で符号パターンとして検出されたことを示す。この過程を２進像全体に適用しなければならない。こうして単位縁が得られる（単位縁検出７）。
【００３１】
次にマクロ縁検出８について説明する。
【００３２】
単位縁を抽出した後、図７に示す様に、マクロ縁の検出が行なわれる。単位縁を接続して、１６方向に、即ち１８０°÷１６＝１１．２５°の間隔で定められたマクロ縁にする。検出の出発点は、８の内の任意のビット平面のフラグが作用している画素と定めることができる。こう云う点をラスター走査式に求めると、出発点より下に接続される探索区域を局限することができる。
【００３３】
ｎビット平面内のフラグが作用している出発点が見付かったと仮定すると、検出過程の方向Ｎは、Ｎ＝２ｎに従って定められる。この方向から次の探索動作が適用される。探索動作の前にマクロ縁の方向を予め決定するのはリスクがあるので、考えられる３つの方向、即ち、Ｎ，Ｎ−１及びＮ＋１の中から、最も起こりそうな方向を選ぶ。考えられる各々の方向で、各々に方向に沿って単位長Ｌ（単位）ごとにある各々の接続点（図７参照）で、マクロ縁が接続されているかどうかを決定する。ｎ，（Ｎ−１）／２又は（Ｎ＋１）／２の何れかのビット平面内でフラグがその接続点又はその近辺、即ち隣り合う８個の画素で作用していれば、マクロ縁を接続点まで伸ばす。こうして求められた３つの候補のマクロ縁の内、１番長いものを符号化すべきマクロ縁とする。一旦マクロ縁が検出されると、同様なマクロ縁を抽出することを避ける為の一種の後処理が適用され得る。抽出されたマクロ縁、並びにｎ，（Ｎ−１）／２又は（Ｎ＋１）／２の何れかのビット平面内の隣り合う８個の画素に対応する一連の画素を中立にする。これは、マクロ縁を減衰させる役割を果たし、像内の抽出されたマクロ縁の数を減らすのに大いに助けになる。
【００３４】
次に強度収集９について説明する。
【００３５】
抽出されたマクロ縁は、ブロックのセグメント分割を行なう時に同等に重要であると見做すことができるが、人間の視覚系の或る性質を考慮に入れた知覚上の順序付けに従ってそれらに等級を付けるのがより妥当である。ここでは、低輝度領域に於ける所定の輝度変化は、高輝度領域に於ける同じ輝度変化よりも、一層目に付きやすく、従って重要であるとするウエーバの法則を採用する。マクロ縁を求めたら、或る延長部Ｂ_extをもってマクロ縁を包み込む矩形領域を縁ベルトとして定める。縁ベルトの一例が図８に示されており、この図では、マクロ縁と平行な軸線及びマクロ縁に対して垂直な軸線を、夫々ｐ及びｑで示してある。こうして、縁ベルト内にある画素値をε（ｐ，ｑ）で表わすことができる。
【００３６】
一般的に、実際の縁は縁ベルト内のマクロ縁に沿って存在すると仮定してよい。ウエーバの法則から導き出した縁強度情報を求める為、各々の縁ベルト内でグレースケールの変化を検査する。（ｉ）縁ベルト全体の中でφで表わすグレー・レベルの平均値を計算する。（ｉｉ）マクロ縁に対して垂直な各々の軸線で、最小及び最大のグレースケールを見付け、こうして夫々低い方の強度δ₀及び高い方の強度δ₁を求める。
【００３７】
【数６】

【００３８】
【数７】

ここでτは縁ベルト内にある軸線ｐに沿った画素の数を表わす。
【００３９】
段形縁（ｓｔｅｐ−ｅｄｇｅ）の簡略例を図９に示すが、この図では、実際の縁の場所が太線で追跡され且つ表示されている。ｑ＝０である一連の画素がマクロ縁に対応することに注意されたい。
【００４０】
Ｆをもとの像に於けるグレースケール関数とする。次に、縁ベルト内で有意でなければならないコントラストＣをＦで定義する。
【００４１】
【数８】
Ｃ＝ΔＦ／Ｆ（８）
【００４２】
抽出された各々のマクロ縁に対し、式（８）は上の計算によって求められた統計を用いて、近似的に書直すことができる。
【００４３】
【数９】
Ｃ＝（δ₁−δ₀）／φ （９）
【００４４】
ウエーバの法則によると、マクロ縁の周りの視覚的な感度は、上に定義したコントラストＣに比例する。次に、コントラストＣが視覚的な感度の大きさを正しく量的に表わすと云う仮定に基づいて、Ｉ_Wで表わす縁強度の考えを導入する。
【００４５】
【数１０】

【００４６】
そのコントラストがθ₀未満である様なマクロ縁は、人間の知覚にとって重要度が劣り、従って像のセグメント分割にとっても重要度が劣ると考えられるので、縁データから取除くことが望ましい。縁強度は、一層高い強度のマクロ縁の周りの領域が一層細かい分解能、即ち一層小さいブロックで表わされる様にするいわゆる知覚を基本とした多層化を提供することができる。更に、各々の縁がその強度値に従った以下の太さを持つことができる様にする方式を実現することも可能である。
【００４７】
【数１１】

【００４８】
表１はマクロ縁当たりの符号化メッセージに関する表である。出発点に関するメッセージは適当な符号化方式を使うことによって更に圧縮することができる。
【００４９】
【表１】

【００５０】
本実施例においては、多層化部４及び符号化部５により、不要なマクロ縁が取り除かれる。その多層化手順を以下に要約する。
【００５１】
縁データをそれに相当するカッドトリーに変換する為に、各々のブロック内の縁の存在を単純に探索し、縁が存在するブロックでは‘１’を割当て、そうでなければ‘０’を割当てる。例えば、上から下への手順の場合、この探索動作はレベルｔからレベルｂ＋１まで適用され、こうして抽出されたマクロ縁に対応するＱＴコードが得られる。マクロ縁がＱＴコード、従って像の分解を一意的に限定することになる。
【００５２】
この手順の利点は、レベル・パラメータｔ及びｂを変えるだけで、分解能を一層細かくなる様に拡大したり、又は一層粗くなる様に減少することができることである。言い換えれば、同じ縁データを使って、種々の多層化を実施することができる。他方、均質性テストに基づく分解は、パラメータが変化する時始めからその過程を出発させなければならない。この性質は、最大符号化ビット速度の様な或る拘束の下での最適化が反復的に満たされる多重パス符号化過程にとって有利である。それは、１回目のパスで抽出された縁データを使った２回目のパスの後でも、縁に基づく多層化を行なうことができるからである。
【００５３】
本発明に係る方式の性能を評価する為に、２種類の実験を行なった。１つは、ＱＴ符号化及び本発明に係る方式双方で、分解トリーを伝送する為のビットをそれぞれ計数し比較することである。縁強度を分類する為のパラメータは、θ₀＝０．２，θ₁＝０．４，θ₂＝０．８，及びθ₃＝１．６に設定した。ＱＴ符号化に対するビット・カウントは、本方式によって出来る分解トリーが得られると仮定して、式（４）に基づいている。図１０は、ＭＰＥＧ−２標準化用のテストシーケンスである“フラワーガーデン”と“テーブルテニス”における多層表示のデータ速度を示す。本方式のアルゴリズムが、多層表示に対するデータの減少の点で、常にＱＴ符号化を凌いでいることが観察される。
【００５４】
もう一方は、一種の主観的な品質試験であって、縁に基づく分解によって生じた視覚的な効果を８×８寸法のブロックによる標本化と比較して示したものである。両方の場合、各々のブロックは平均輝度値によって表わした。公平な比較の為、縁強度の分類に関するパラメータは、本方式によるブロックの総数ができるだけ線形標本化の数に接近する様に、即ち５，２８０ブロックになる様に、経験的に且つ帰納的に決定した。表２が、この実験から得られた統計的な結果を示す。本方式が、ピーク信号対雑音比（ＰＳＮＲ）値の点で一層品質のよい像を提供することが観察される。
【００５５】
【表２】

【００５６】
これらの２つの実験結果から、本発明に係る多層化方式は、比較的少ないデータ量で、像を符号化するのに適切な像の分解を行なうことが確認できる。
【００５７】
以上、本発明の一実施例につき説明したが、本発明はこれに限られるものではない。
【００５８】
以下に、他の発明に関し説明する。これはＭＰＥＧ画像信号の制御方法に関し、特に可変長コードの復号方法に関する。
【００５９】
ＭＰＥＧ−２ビットストリームを復号する時、ＤＣＴ係数の可変長コード（ＶＬＣ）復号が、ビットストリームの復号速度に最も制約を与える。これは、符号化されたビットストリームの５０％−８０％がＤＣＴ係数によって占められているからである。従って、ＶＬＣ復号を早くする努力がなされている。他方、必要とされるメモリ空間が、ハードウェアの実現を考慮すると別の重要な因子である。
【００６０】
ＶＬＣの復号は、種々のＶＬＣで構成される２進ストリングを元の数値に変換することである。１組のＶＬＣが各々の事象の発生確率に従って組立てられる。即ち、事象の持つ確率が高ければ高いほど、その事象には一層短いコードが割り当てられ、コード当たりの平均ビット数を最小限に抑えることができる様にしている。表３は、ＭＰＥＧ−２ＤＣＴ係数に対するＶＬＣテーブルの一例を示す。可変長コードを復号することは、２進ストリング中の２つの相次ぐＶＬＣの間の位置として定義されるコード境界を見つけることに相当する。従来のＶＬＣ復号方法は、コード境界を探すのに、ビット・パターンのマッチングを用いるのが普通であった。
【００６１】
【表３】

【００６２】
例えば、ＭＰＥＧソフトウェア・シミュレーション・グループ（ＭＳＳＧ）によって考えられた復号方法は、１６ビット長の２進ストリングを一度に読取って、１６ビット長のパターンマッチングを適用することにより、対応するＤＣＴ係数に変換する機構を採用している。これは高速の復号を達成するが、この機構は、冗長なコードが少なくない為、ＤＣＴ係数テーブルに対して大量のメモリ空間を必要とする。実際、４３２個のコード項目の内の２５９が冗長であり、驚くべきことに、これは６０％に相当する。これはＶＬＣ復号方法が、関連するコードをアドレスするのに線形オペレーションを用いている為である。時には、ＶＬＣ復号をハードウェアで実現する時、メモリ空間がきめ手になることがある。これは、復号は高価な高速メモリを使うことを必要とするからである。
【００６３】
本発明は、ビット・パターンのマッチングの代わりに、非線形写像オペレーション（ｎｏｎｌｉｎｅａｒｍａｐｐｉｎｇｏｐｅｒａｔｉｏｎ）を用いるＶＬＣ復号方法を提供する。本発明に係る復号方法では、入力ビット列中の区切りから所定ビット数の範囲内で連続する０ビットの個数をカウントし、上記カウントした値に基いた状態レジスタの内容を用いてＤＣＴ係数を復号するステップを含む。
【００６４】
又、複数のアドレスオペレーションを介して状態レジスタの内容を取得するステップを含む復号方法も開示する。
【００６５】
本発明の一実施例を、理論的考察を含め、図面を参照しながら説明する。
【００６６】
本発明に従うＶＬＣ復号方法は、いわゆる自己位置ぎめができる様に設計された非線形写像オペレーションを特徴とする。各々のオペレーションは、それが現在位置を正しく更新することができる様に関連するビット・シフト情報を持っている。従って、係数テーブルにコード長は必要ではない。現在位置は必ずしもコード境界と同じ所にある訳ではなく、単に、その後のオペレーションを再開する時の中間ビット位置を示すにすぎないことに注意されたい。この方法において考慮すべき別の点は、ＩＳＯ／ＩＥＣ１３８１８−２で公表されているＤＣＴ係数テーブルの０とＤＣＴ係数テーブルの１の間の冗長度を全面的に活用しなければならないことである。言い換えれば、係数テーブルは、テーブル全体の規模を縮小することができる様に、冗長コードを取去る様に設計されていなければならない。本発明の一実施例に係る方法を図１１に示す。
【００６７】
図１１は、ＤＣＴ係数の復号が最大４段階で実行されることを意味し、最初の３つの各段階で、後述の表４の１６個のオペレーションのうち規定のものを用いて入力ビット列の情報を順次入手し、第４段階目でそれらの情報をもとに係数を復号することを示している。ここでは、スタート時点で、ゼロ・テーブルかワン・テーブルのうちどちらを使うかが定められていることが前提とされており、この１ビット情報を後述の式（１３）の状態レジスタＲの構成要素Ｘとして保持する。ゼロ・テーブルとワン・テーブルでは、復号法が異なる（図１２及び１３に各々処理手順を図示）。
【００６８】
詳述すると、ゼロ・テーブルとワン・テーブル共に、第１段階（ブロック２１）では、表４のｚｅｒｏｒｕｎ（）というオペレーションを用いて入力ビット列中の区切り（復号が終了している符号語の最後のビットの直後）から６ビットの範囲内で連続する“０”ビットの個数をカウントし、それを状態レジスタＲの構成要素Ｒ０として保持する。そして、入力ビット列中の現在位置を（Ｒ０＋１）ビット進める。第２段階（ブロック２２）では、そのＲ０の値で決定する処理を行い、その結果をＲ１として保持する。第３段階（ブロック２３）では、Ｒ１に基づいた処理を行い、その結果をＲ２として保持し、この時点で状態レジスタＲを得る。第４段階（ブロック２４）で、Ｒを用いて後述の表６の係数テーブルからＤＣＴ係数を復号する。
【００６９】
以下の説明において、ＤＣＴ係数は（ｒｕｎ、ｌｅｖｅｌ）形式で表示し、２進ストリングは‘００１０’と云う形で表記する。特に断らない限り、値は１０進値である。
【００７０】
表４は、本発明に係る方法が用いる写像オペレーションを記述する。１６個の写像オペレーションがあり、その内９つのオペレーションはコードをアドレスし、その後、復号プロセスを終わらせるのに使われ、７つのオペレーションは、定められた値に戻す。
【００７１】
【表４】

【００７２】
表４で、Ｍ（Ｒ，ｎ）（ｎ＝１，２，３，４……）は次の様に定義される写像関数である。
【００７３】
【数１２】
Ｍ（Ｒ，ｎ）＝Ｔａｂｌｅ〔Ｒ〕〔ｇｅｔｂｉｔ（ｎ）〕（１２）
【００７４】
ここでＲは、下記の形で定義される８ビット・レジスタであり、セマンティクスを表５に示す。これは状態レジスタと名付けられる。ｇｅｔ−ｂｉｔ（ｎ）は、次のｎビットの１０進値を戻し、ｎビットだけシフトさせる関数である。
【００７５】
【数１３】
Ｒ≡Ｘ；Ｒ０；Ｒ１；Ｒ２（１３）
【００７６】
【表５】

【００７７】
オペレーションｔｅｒｍ−＃４（）即ち、ｅｓｃａｐｅ−ｃｏｄｅはＣソース・ランゲージで以下の様に記述することができる。
【００７８】
【数１４】

【００７９】
ここで、本発明に係る方法に使われる係数テーブルを定義する。この場合、各々のコードが、２次元アレイの項目としてアクセスされる。例えば、〔０；０；０；０〕〔０〕は（Ｎ／Ａ，ＥＯＢ）に相当する。
【００８０】
【表６】

【００８１】
図１２は、本発明の一実施例に係る復号アルゴリズムを示し、ＤＣＴ係数テーブルの０の場合、即ち、ｉｎｔｒａ−ｖｌｃ−ｆｏｒｍａｔが作用していない時のこれらのオペレーションを有するＶＬＣ復号アルゴリズムを示す。この図において、Ｒ０とＲ１が求められ、これは図１１のブロック図における第１のアドレス・オペレーション（ブロック２１）および第２のアドレス・オペレーション（ブロック２２）に対応する。このプロセスは符号ビットをカバーしない。これは、ＶＬＣの他の先行部分が分かれば、容易に扱うことができるからである。このアルゴリズムが１つのコードの復号プロセスを示す。然し、最初の判定ルーチン、即ちｕｓｅ−ｉｎｔｒａ−ｖｌｃ（）は飛越すことができる。これは、映像層シンタクスｉｎｔｒａ−ｖｌｃ−ｆｏｒｍａｔが作用していない時、戻った値は０に固定されているからである。従って、こう云う状態では、復号プロセスは次のオペレーション、即ちｚｅｒｏ−ｒｕｎ（）から開始することができ、Ｘを強制的に０にする。それでも、状態レジスタＲは、ｕｓｅ−ｉｎｔｒａ−ｖｌｃ（）を適用する前に０；０；０；０にリセットすべきである。
【００８２】
ｉｎｔｒａ−ｖｌｃ−ｆｏｒｍａｔが作用している場合、図１３に示した別のアルゴリズムが使用される。この図において、Ｒ０、Ｒ１及びＲ２が求められ、これは図１１のブロック図における第１オペレーション（ブロック２１）、第２オペレーション（ブロック２２）及び第３オペレーション（ブロック２３）に対応する。
【００８３】
図１２及び図１３から、図１１に示すように、オペレーションｕｓｅ−ｉｎｔｒａ−ｖｌｃ（）を除いて、係数を復号するには、本発明に係るＶＬＣ復号方法は精々３回のアドレスオペレーションを必要とすることが判る。
【００８４】
表７は、２進ストリング‘００１０００１０’が現われた時のＶＬＣ復号プロセスの一例を示す。復号結果としてテーブル〔０；２；１；０〕〔２〕が得られることを示している。即ち、２進ストリング‘００１０００１０’はコード（１２，１）に対応する。表７の「現在位置」の欄で示されるビット間の縦線がビット列の区切り（符号語の区切りではなく、本発明に係るオペレーションを単位とする区切り）を示している。表７の例では、入力ビット列は右から左へ流れており、区切りより左にあるビット列は、既にそこから情報が抽出され、今後の復号処理に影響を及ぼさないデータとしてみなす。
【００８５】
【表７】

【００８６】
本発明に係る方法が、種々の符号化されたビットストリングからＤＣＴ係数を正しく復号することがシミュレーションによって検証されている。
【００８７】
テーブルの規模の縮小の場合と同じく、本方法は、１５４個の係数コード持つＶＬＣ復号（１１２個はＤＣＴ係数テーブルの０に対し、４２個はＤＣＴ係数テーブルの１に対し）を可能にする。これに対して、ＭＰＥＧソフトウェア・シミュレーション・グループ（ＭＳＳＧ）によって考えられた復号方法は、４３２個の係数コードを使う。更に精密な比較の為、表８は、各々の方法に対してコード項目を表わすのに要するビット数で、係数テーブルのフォーマットを示している。最終的には、テーブルの規模は、夫々本発明に係る方法とＭＳＳＧ復号方法に対し、１，６９４ビット（１５４×１１）及び６，９１２ビット（４３２×１６）である。その比は大まかに云うと４：１に等しい。
【００８８】
【表８】

【００８９】
更に、バス・アーキテクチャについて云うと、ＭＳＳＧ復号方法は８ビット・アドレス・バスを必要とする。これは、係数テーブルの最大項目数が２５０だからである。これに対し、表６に示す様な本発明に係る方法では、４ビット・アドレス・バスで十分である。
【００９０】
上述のように、各段階毎に復号のための情報（状態レジスタ）を取得し、現在位置を更新しながら復号を進めることができるので、最終的に参照する係数テーブルには符号長を表すフィールドが不要になり、その分だけ係数テーブルを縮小することができる。さらに係数テーブルを参照する際のアドレスビット長を従来（ＭＳＳＧ）の８ビットに対し、４ビットにできるので係数テーブルのエントリー数を半減できる。これらの２つの効果により係数テーブルを従来の１／３ないし１／４にできる。
【００９１】
以下に別の発明に関し説明する。この発明はビデオ映像信号の動き補償に関する。従来の動き補償を用いた符号化方式では、フレームを均等なサイズのブロックに分割して、そのブロック単位で動きを推定していた。近年、これを改良したものとして図１４に示すような可変サイズのブロック単位で動き補償を行う方式が検討されている。この手法により、フレームは、物体の近傍では小さなブロックに、また、背景など均質な領域では大きなブロックに分割される。その目的は、誤差が発生しやすい物体の近傍できめ細かく動き情報を伝送することで、フレーム全体で発生する誤差を抑制することである。しかし、この可変サイズブロック動き補償では、ブロック分割のための情報を四分木構造のデータで伝送した場合、符号化の利得に対してオーバーヘッドが大きくなるという問題が指摘されている。そこで、提案手法では、このオーバーヘッドを軽減するために、エッジ情報を抽出・伝送することにより可変サイズブロック分割を行う。すなわち、提案手法は、エッジに基づく可変サイズブロック分割を特徴とする動き補償符号化方式である。
【００９２】
従来から用いられている画素単位の追跡によるエッジ抽出では、オーバーヘッドの情報量が大きくなるので、これを防止するため提案手法では、エッジを点ではなく線分で近似的に表現する方法を採用する。ブロック分割アルゴリズムの機能ブロック図を図１５に示す。
【００９３】
まず、原画像（または、復号フレーム）に２次微分フィルタをかけることによって輝度変化の不連続点を抽出する。そして、閾値処理により得られた点（エッジ点と呼ぶ）の集合に対して２段階の群化処理を施し、物体の形状を表現する線分を抽出する。その第１段階の群化処理において、エッジ点の集合と８方向に量子化した５×５画素の大きさの線分パターンマスクとのテンプレート・マッチングを行ない方向性を持った線分（単位線分と呼ぶ）を抽出する（ブロック６１）。これが単位線分の抽出である。
【００９４】
次にマクロ線分の抽出（ブロック６２）を説明する。人間の視覚中枢細胞は物体の回転に対して１０°単位で反応する性質（これを方位選択性という）があることから、第２段階の郡化処理においては、同一方向に連続する単位線分を結合し、１６方向に量子化した線分（マクロ線分と呼ぶ）を抽出する。マクロ線分の例は図１６に示す通りである。マクロ線分を表現するデータ形式は、各線分の端点の２次元座標、方向（１６に量子化した内の１方向）及び長さ（結合画素数）である。
【００９５】
そしてブロック分割（ブロック６３）では、最初はフレームを大きなブロック（許容最大ブロック）で均等分割し（これを、第１層と呼ぶ）、各ブロックにマクロ線分が存在すれば４つの小ブロックに分割する（これを、第２層と呼ぶ）。そして、この操作を、予め規定した許容最小ブロックを得るまで繰り返す。図１６のブロック分割では、第１層から第３層までが存在している。
【００９６】
これまでの実験結果からオーバーヘッドの情報量は、画像の内容に依存せず、第２層までの分割であれば提案手法と四分木構造はほぼ同等、第３層以上であれば提案手法の方が少ない（つまり、有利である）ことを確認した。また、図１４中破線で示したブロックは、提案手法では通常エッジ抽出を原画像上で行うが、代替案として以前に復号したフレームを用いる方法があることを示している。この代替案では、符号化側の局所復号化部と復号化側ではともに等しい復号フレームが存在するので、エッジ抽出法が同じであれば得られるエッジも等しくなることを利用している。この手法の利点は、オーバーヘッドがなくなるということである。しかし、復号フレームと実際に処理しているフレームが類似していることを前提としているため、両者に違いが生じた場合は誤った分割を行うことになり符号化効率が低下する。従って、原画像を用いる場合と復号フレームを用いる場合を適応的に切換える機構が必要となる。
【００９７】
【発明の効果】
より少ないデータ量で、像を符号化するのに適切な像の分解を行うことができる。
【図面の簡単な説明】
【図１】多層表示の一例を示す図。
【図２】図１に示した多層表示例のＱＴコードを示す図。
【図３】本発明の一実施例に係る符号器のブロック図。
【図４】縮小された像を説明する図。
【図５】図３の符号器内の縁抽出部のブロック図。
【図６】８方向の小セグメント・パターンを示す図。
【図７】マクロ縁の検出を説明する図。
【図８】縁ベルトの一例を示す図。
【図９】段形縁の一例を示す図。
【図１０】シミュレーションによる多層表示のデータ速度を比較するグラフ。
【図１１】他の発明に係る復号方法の概念を示すブロック図。
【図１２】他の発明の一実施例に係る復号アルゴリズムを説明する図（ＤＣＴ係数テーブルが０の場合）。
【図１３】他の発明の一実施例に係る復号アルゴリズムを説明する図（ＤＣＴ係数テーブルが１の場合）。
【図１４】別の発明の一実施例を説明するブロック図。
【図１５】別の発明におけるアルゴリズムを説明する図。
【図１６】ブロック分割を説明する図。
【符号の説明】
１パラメータ決定部
２像縮小部
３縁抽出部
４多層化部
５符号化部

[0001]
[Industrial application fields]
The present invention relates to an image encoding method, and more particularly, to a compressed image encoding method.
[0002]
[Prior art and problems]
Multilayering is a simple way to obtain an image representation made up of regions of different sizes. This representation is useful for image coding, particularly for image compression via progressive transmission, and the resulting tree is usually transmitted using quadtree coding (E. Shusterman and M. Feder, “Image compression via improved quadtree decomposition algorithm,” IEEE Trans. Image Processing, vol. 3, no. 2, pp. 207-215, 1994, and others).
[0003]
Natural images can be divided into regions of different sizes carrying variable amounts of detail and information. Such segmentation of an image is desirable for efficient coding of the image data. Quadtree (QT) coding is the principal method of representing an image decomposition, i.e., a multilayering, and divides the image into two-dimensional homogeneous square regions. The decomposition produces a tree in which each node has four children and is associated with a uniquely defined region of the image; the root is associated with the entire image. When QT coding is used for image compression, the resulting tree must itself be encoded. The encoding procedure comprises encoding the tree structure information and encoding the node/leaf information, i.e., the segmentation flags. Each leaf is assigned parameters describing the intensity of the corresponding subimage. Naturally, an image pixel is always a leaf if it appears in the QT code. Encoding of the tree structure should end at the lowest level, i.e., the level immediately above that of the smallest allowable block, and this block must be equal to or larger than a pixel.
[0004]
However, QT coding fails to exploit a useful property of such decompositions: a block is more likely to be split into sub-blocks when its neighboring blocks are split. This property follows from the fact that a natural image depicts objects that should be represented by contiguous small blocks, whereas the background is made up of larger blocks. In other words, the QT code contains spatial redundancy that can be compressed further. This means that an image decomposition can be carried out more efficiently than with QT coding by using a technique that represents objects compactly. Edge information is one of several candidates for representing an object. In one method that uses edge information, edges are extracted by a line approximation method (hierarchical edge detection). The image decomposition is then carried out according to whether an edge is present in each block. This is useful for scenes with simple objects, such as head-and-shoulders images, but its drawback is that the redundant overhead, namely the edge data, leads to lower coding efficiency than QT coding.
[0005]
[Means and Actions for Solving the Problems]
The present invention provides an improved multilayering scheme for image coding. Edges need not be encoded with pixel accuracy; the accuracy of the smallest block level used in segmentation is sufficient, so edge extraction can be performed on a reduced (decimated) image. This reduces the edge data, together with the memory space and the computational complexity of edge extraction, while still achieving an appropriate multilayering.
[0006]
The encoding method of the present invention includes the steps of: determining parameters for a multilayer representation of an input image; reducing the image using a desired reduction filter for the multilayer representation whose parameters have been determined; applying a Laplacian operation to the reduced image to locate edges and detecting unit edges; detecting macro edges based on the detected unit edges; and calculating an edge strength for each rectangular region enclosing a detected macro edge.
[0007]
[Embodiment]
One embodiment will be described with reference to the drawings, including theoretical considerations leading to the present invention.
[0008]
First, quadtree coding will be described below.
[0009]
Let L_{t,b} be a multilayer representation of an image, where t and b are parameters denoting the top level and the bottom level, respectively, with t > b ≥ 0. L_{t,b} is then defined as follows.
[0010]
[Equation 1]

l_{t,i} is the layer composed of blocks of size 2ⁱ × 2ⁱ, which means that for those blocks the decomposition process starts at level t and ends at level i.
[0011]
Next, two variables are defined. p_{t,i} is the number of blocks of size 2ⁱ × 2ⁱ that are split into four sub-blocks at the next lower level, i.e., level i−1, and q_{t,i} is the number of blocks of size 2ⁱ × 2ⁱ that keep the same size below level i−1; the blocks counted by q_{t,i} therefore make up l_{t,i}. Assume that the input image is W (width) × H (height) pixels, and let the pixel layer of the image be level 0. The relationship between the two variables is then expressed by a recurrence equation.
[0012]
[Equation 2]

[0013]
t satisfies the following condition.
[0014]
[Equation 3]

[0015]
FIG. 1 shows an example of a multilayer representation with parameters t = 5 and b = 2, in which two edge lines are present.
[0016]
The coding order at each level differs slightly from that used for a square image: blocks of the same size are coded in raster-scan order, that is, from left to right and from top to bottom. A parent node is assigned '1' and a leaf is assigned '0'. FIG. 2 shows the QT code corresponding to L_{5,2} shown in FIG. 1. At level i, p_{t,i} equals the number of black nodes and q_{t,i} equals the number of white nodes.
Assuming straightforward coding, the rate R_{t,b} [bits] needed to represent the image decomposition with QT coding is as follows.
[0017]
[Equation 4]

[0018]
As seen in FIG. 2, encoding the multilayer representation L_{5,2} shown in FIG. 1 with QT coding requires 54 bits. The present invention makes it possible to represent L_{5,2} with fewer than 54 bits.
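Equation (4) itself is not reproduced in this text. As a hedged sketch only, assume that one flag bit is spent on every node from level t down to level b + 1 and none at level b, so that R_{t,b} is the sum of p_{t,i} + q_{t,i} over those levels. The function name `qt_bits`, the dictionary layout, and the counts below are hypothetical and are not the counts of FIG. 1.

```python
def qt_bits(p, q, t, b):
    """Bits for a quadtree code, assuming one flag per node at levels t..b+1.

    p[i] / q[i]: number of split / unsplit blocks at level i (assumed inputs).
    """
    for i in range(t, b, -1):
        children = p.get(i - 1, 0) + q.get(i - 1, 0)
        # Each split block at level i yields exactly four blocks at level i-1.
        if children != 4 * p[i]:
            raise ValueError(f"inconsistent counts at level {i}")
    return sum(p[i] + q[i] for i in range(t, b, -1))


# Hypothetical counts for t = 3, b = 1 (not taken from FIG. 1).
p = {3: 2, 2: 3, 1: 0}
q = {3: 2, 2: 5, 1: 12}
print(qt_bits(p, q, t=3, b=1))  # 4 flags at level 3 + 8 at level 2 = 12
```

The consistency check encodes the assumption that every split block has exactly four children, which may not hold at the borders of an image whose size is not a multiple of 2ᵗ.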
[0019]
When edges are extracted as shown in FIG. 1, the decomposition process is based solely on edge information: if an edge is present within a block, the block is segmented into four child blocks. In the present invention, a bottom-up procedure (in which segmentation starts from the bottom level b) can be used as well as a top-down procedure. The resulting decomposition is the same whichever procedure is adopted, as long as the edges and the parameters remain the same.
[0020]
FIG. 3 is a block diagram of an encoder according to an embodiment of the present invention. An image signal is input to the parameter determination unit 1 and then, via the image reduction unit 2, to the edge extraction unit 3. The edge extraction unit 3 also outputs the edge data to the multilayering unit 4. The multilayering unit 4 multilayers the frame on the basis of the edge data output from the edge extraction unit 3. The frame is first divided uniformly into large blocks (blocks of the size specified by the top level t; this is called the t-th layer), and each block that contains a macro edge is split into four smaller blocks (corresponding to the (t−1)-th layer). This operation is repeated until the layer specified by the bottom level b is reached. The encoding unit 5 outputs luminosity data based on the input from the multilayering unit 4 and the image signal.
[0021]
The function of each block will be described in detail below.
[0022]
(1) Parameter determination
The parameter determination unit 1 determines the parameters for the multilayer representation L_{t,b} of the input image.
Let d be the reduction factor, such that the reduction filter subsamples the input image by a factor of 1/d in both the horizontal and vertical directions. The bottom level b should then be equal to or greater than log₂ d. This is because the edge locations must be specified precisely enough to guarantee that the resulting decomposition is suitable for the main coding process, i.e., the adaptive block-based coding performed in the encoding unit 5.
[0023]
(2) Image Decimation
Applying edge extraction to a reduced image has the advantage of reducing the edge data, the computational complexity, and the memory space needed for image decomposition. Regarding the image data in particular, the edge data can be reduced to 1/3 when a quarter-size image, i.e., d = 2, is used. FIG. 4 shows the result of a similar decomposition using a reduced version of the image of FIG. 1 with d = 2. In this case, because the two edges merge into one, the amount of data in the multilayer representation can be reduced to roughly half that of the original (see FIG. 1). The reduction in the number of edges is the main contributor to the overall data reduction, whereas the change in the code length of the edge data has less effect. The choice of reduction filter is left to the encoder, because the filter has no significant effect on the subsequent process, i.e., edge extraction.
[0024]
The image reduction unit 2 reduces the image using a desired reduction filter for the multilayer representation whose parameters have been determined. The decomposition process is then applied to the reduced image. The decomposition obtained in this way is in fact slightly altered: one 4 × 4 block is left unsplit (see FIG. 4). However, this 4 × 4 block occupies only 1% of the entire image area, so the difference causes almost no degradation in overall coding performance.
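Because the text leaves the reduction filter to the encoder, the following is only an illustrative sketch: it uses plain d × d block averaging (an assumption, not the filter of the embodiment) and checks the recommended constraint b ≥ log₂ d. The names `decimate` and `bottom_level_ok` are hypothetical.

```python
import math


def decimate(image, d):
    """Shrink a 2-D list of gray levels by 1/d per axis using d x d block means.

    Block averaging is only one possible reduction filter (assumed here).
    Rows and columns that do not fill a whole d x d block are dropped.
    """
    h, w = len(image) // d, len(image[0]) // d
    return [
        [
            sum(image[y * d + j][x * d + i] for j in range(d) for i in range(d))
            / (d * d)
            for x in range(w)
        ]
        for y in range(h)
    ]


def bottom_level_ok(b, d):
    """Check the recommended constraint b >= log2(d)."""
    return b >= math.log2(d)


img = [[0, 0, 8, 8],
       [0, 0, 8, 8],
       [4, 4, 2, 2],
       [4, 4, 2, 2]]
print(decimate(img, 2))       # [[0.0, 8.0], [4.0, 2.0]]
print(bottom_level_ok(2, 2))  # True
```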
[0025]
(3) Edge extraction
[0026]
FIG. 5 shows an internal block diagram of the edge extraction unit 3.
[0027]
To locate the edges, the well-known Laplacian operator is first applied to the original image (Laplacian filter 6); a binary image marking the positions where the intensity change is large, i.e., the edges, is then obtained by thresholding with μ + K·σ. Here μ and σ are the mean and the standard deviation of the differential space, respectively, and K is a coefficient. As an example, FIG. 6 shows patterns of small segments in eight directions; these are represented by templates T_n (n = 0, 1, ..., 7), each entry (j, k) of which is denoted t_n(j, k).
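The thresholding step can be sketched as below. Several details are assumptions rather than statements of the source: the 4-neighbour Laplacian kernel, the use of the absolute response, a strict comparison, and the computation of μ and σ over interior pixels only. The function name `laplacian_edges` is hypothetical, and the image must be at least 3 × 3.

```python
import statistics


def laplacian_edges(image, k=1.0):
    """Binary edge map: |Laplacian| > mu + k * sigma.

    Assumptions: 4-neighbour kernel, absolute response, border pixels left
    at 0, and mu / sigma computed over the interior responses only.
    """
    h, w = len(image), len(image[0])
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            resp[y][x] = abs(
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
    interior = [resp[y][x] for y in range(1, h - 1) for x in range(1, w - 1)]
    threshold = statistics.mean(interior) + k * statistics.pstdev(interior)
    return [[1 if resp[y][x] > threshold else 0 for x in range(w)]
            for y in range(h)]


# A vertical step edge between columns 2 and 3.
img = [[0, 0, 0, 100, 100, 100, 100] for _ in range(5)]
for row in laplacian_edges(img):
    print(row)
```

For this step image the interior rows are flagged on both sides of the step (columns 2 and 3), while the border rows stay at 0.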
[0028]
Let Λ(x, y) be the sub-region of the binary image consisting of the 5 × 5 pixels λ(x + j, y + k), with j, k = 0, 1, 2, 3, 4. The cross-correlation R_n(x, y) between template T_n and Λ(x, y) can be calculated by the following equation.
[0029]
[Equation 5]

[0030]
Then, if there is an n for which R_n(x, y) is equal to or greater than 8, a flag is set at coordinates (x, y) in bit plane n, where n ranges from 0 to 7. This indicates that template T_n has been detected as a code pattern at coordinates (x, y). The process must be applied to the entire binary image. Unit edges are thus obtained (unit edge detection 7).
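Equation (5) and the templates of FIG. 6 are not reproduced in this text. The sketch below assumes the cross-correlation is the plain sum of products of t_n(j, k) and λ(x + j, y + k), and it uses two hypothetical templates (weight 2 along a horizontal or vertical line, 0 elsewhere) purely for illustration; with those weights, a threshold of 8 means at least four of the five line pixels are set.

```python
# Hypothetical templates indexed as t[j][k] (j: horizontal, k: vertical offset).
# The real T_0..T_7 are defined in FIG. 6 and are not reproduced here.
T_HORIZONTAL = [[2 if k == 2 else 0 for k in range(5)] for j in range(5)]
T_VERTICAL = [[2 if j == 2 else 0 for k in range(5)] for j in range(5)]


def cross_correlation(template, binary, x, y):
    """Assumed form: R_n(x, y) = sum_j sum_k t_n(j, k) * lambda(x + j, y + k)."""
    return sum(
        template[j][k] * binary[y + k][x + j]
        for j in range(5)
        for k in range(5)
    )


def unit_edge_planes(templates, binary, threshold=8):
    """One flag plane per template; a flag is set where R_n(x, y) >= threshold."""
    h, w = len(binary), len(binary[0])
    planes = []
    for template in templates:
        plane = [[0] * w for _ in range(h)]
        for y in range(h - 4):
            for x in range(w - 4):
                if cross_correlation(template, binary, x, y) >= threshold:
                    plane[y][x] = 1
        planes.append(plane)
    return planes


# A 5 x 5 binary patch containing one horizontal line.
patch = [[0] * 5, [0] * 5, [1] * 5, [0] * 5, [0] * 5]
planes = unit_edge_planes([T_HORIZONTAL, T_VERTICAL], patch)
print(planes[0][0][0], planes[1][0][0])  # 1 0
```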
[0031]
Next, the macro edge detection 8 will be described.
[0032]
After the unit edges have been extracted, macro edges are detected as shown in FIG. 7. Unit edges are connected to form macro edges defined in 16 directions, i.e., at intervals of 180° ÷ 16 = 11.25°. A detection starting point can be defined as a pixel whose flag is set in any of the eight bit planes. If such points are sought in raster-scan order, the search area can be confined to the region connected below the starting point.
[0033]
Assuming that a starting point whose flag is set in bit plane n has been found, the direction N of the detection process is determined by N = 2n. The following search operation proceeds from this direction. Because fixing the direction of the macro edge before the search is risky, the most likely direction is chosen from three candidates, namely N, N−1, and N+1. For each candidate direction, it is determined whether the macro edge is connected at each connection point (see FIG. 7) located every unit length L (unit) along that direction. If a flag is set in any of the bit planes n, (N−1)/2, or (N+1)/2 at the connection point or in its neighborhood, i.e., its eight adjacent pixels, the macro edge is extended to that connection point. Of the three candidate macro edges thus obtained, the longest is taken as the macro edge to be encoded. Once a macro edge has been detected, a kind of post-processing may be applied to avoid extracting similar macro edges: the series of pixels corresponding to the extracted macro edge, together with their eight adjacent pixels, is neutralized in each of the bit planes n, (N−1)/2, and (N+1)/2. This serves to attenuate the macro edges and greatly helps to reduce the number of macro edges extracted from the image.
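The relation between a unit-edge bit plane n and the three candidate macro-edge directions can be illustrated as follows. This is a small sketch only: the function name `candidate_directions` is hypothetical, and wrapping the direction index modulo 16 is an assumption that the source does not spell out.

```python
def candidate_directions(n):
    """Candidate macro-edge directions N-1, N, N+1 for unit-edge plane n.

    Directions are indices 0..15 spaced 180 / 16 = 11.25 degrees apart;
    wrapping modulo 16 is an assumption of this sketch.
    Returns a list of (direction index, angle in degrees) pairs.
    """
    if not 0 <= n <= 7:
        raise ValueError("n must be in 0..7")
    base = 2 * n  # N = 2n
    return [((base + delta) % 16, ((base + delta) % 16) * 11.25)
            for delta in (-1, 0, 1)]


print(candidate_directions(3))  # [(5, 56.25), (6, 67.5), (7, 78.75)]
print(candidate_directions(0))  # [(15, 168.75), (0, 0.0), (1, 11.25)]
```

Among the macro edges traced along these three candidates, the longest one would then be kept, as described above.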
[0034]
Next, the intensity collection 9 will be described.
[0035]
The extracted macro edges could be regarded as equally important for block segmentation, but it is more reasonable to rank them according to a perceptual ordering that takes certain properties of the human visual system into account. Here Weber's law is adopted, according to which a given luminance change in a low-luminance region is more noticeable, and therefore more important, than the same luminance change in a high-luminance region. Once a macro edge has been obtained, a rectangular region enclosing the macro edge with a certain extension B_ext is defined as an edge belt. FIG. 8 shows an example of an edge belt, in which the axis parallel to the macro edge and the axis perpendicular to it are denoted p and q, respectively. A pixel value within the edge belt can thus be written ε(p, q).
[0036]
In general, it may be assumed that the actual edge lies along the macro edge within the edge belt. To obtain edge strength information derived from Weber's law, the grayscale variation within each edge belt is examined. (i) The mean gray level over the entire edge belt, denoted φ, is calculated. (ii) On each axis perpendicular to the macro edge, the minimum and maximum gray levels are found, from which the lower intensity δ₀ and the higher intensity δ₁ are obtained, respectively.
[0037]
[Equation 6]

[0038]
[Equation 7]

Where τ represents the number of pixels along axis p in the edge belt.
[0039]
A simplified example of a step-edge is shown in FIG. 9, where the actual edge location is tracked and displayed in bold lines. Note that the series of pixels with q = 0 corresponds to the macro edge.
[0040]
Let F be the grayscale function of the original image. The contrast C, which must be significant within the edge belt, is then defined in terms of F.
[0041]
[Equation 8]
C = ΔF / F (8)
[0042]
For each extracted macro edge, equation (8) can be rewritten approximately using the statistics determined by the above calculations.
[0043]
[Equation 9]
C = (δ₁ − δ₀) / φ (9)
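The statistics that feed equation (9) can be sketched as follows. Equations (6) and (7) are not reproduced in this text, so the sketch assumes that δ₀ and δ₁ are the averages, over the τ positions along p, of the per-axis minimum and maximum gray levels. The function name `edge_belt_contrast` and the list layout `belt[p][q]` are hypothetical.

```python
def edge_belt_contrast(belt):
    """Return (phi, delta0, delta1, C) for one edge belt.

    belt[p][q] holds epsilon(p, q): p runs along the macro edge, q across it.
    delta0 / delta1 are taken as the averages over p of the per-axis minimum
    and maximum, an assumed reading of equations (6) and (7).
    phi must be non-zero for the contrast to be defined.
    """
    tau = len(belt)  # number of pixels along axis p
    phi = sum(sum(row) for row in belt) / sum(len(row) for row in belt)
    delta0 = sum(min(row) for row in belt) / tau
    delta1 = sum(max(row) for row in belt) / tau
    return phi, delta0, delta1, (delta1 - delta0) / phi


# An ideal step edge: dark side 50, bright side 150.
belt = [[50, 50, 150, 150] for _ in range(3)]
print(edge_belt_contrast(belt))  # (100.0, 50.0, 150.0, 1.0)
```

The same 100-level step on a brighter background (for example 150 to 250) gives a smaller C, which is the behaviour Weber's law calls for.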
[0044]
According to Weber's law, the visual sensitivity around a macro edge is proportional to the contrast C defined above. Next, on the assumption that the contrast C correctly quantifies the magnitude of visual sensitivity, the notion of edge strength, denoted I_W, is introduced.
[0045]
[Equation 10]

[0046]
Macro edges whose contrast is less than θ₀ are considered less important for human perception, and hence for image segmentation, so it is desirable to remove them from the edge data. Edge strength makes possible a so-called perception-based multilayering, in which the region around a macro edge of higher strength is represented at finer resolution, i.e., with smaller blocks. Furthermore, it is also possible to realize a scheme in which each edge is given the following thickness according to its strength value.
[0047]
[Equation 11]

[0048]
Table 1 is a table relating to coded messages per macro edge. The message about the starting point can be further compressed by using an appropriate encoding scheme.
[0049]
[Table 1]

[0050]
In the present embodiment, unnecessary macro edges are removed by the multilayering unit 4 and the encoding unit 5. The multilayering procedure is summarized below.
[0051]
To convert the edge data into the corresponding quadtree, each block is simply searched for the presence of an edge; '1' is assigned to a block that contains an edge and '0' otherwise. In a top-down procedure, for example, this search is applied from level t down to level b + 1, yielding the QT code corresponding to the extracted macro edges. The macro edges thus uniquely determine the QT code, and hence the image decomposition.
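The top-down conversion described above can be sketched as follows. The representation of the edge data as a set of pixel coordinates, the function name `edges_to_qt_code`, and the per-level output format are assumptions made for illustration; the source encodes macro edges as line segments, which would first have to be rasterized.

```python
def edges_to_qt_code(edge_pixels, width, height, t, b):
    """Top-down quadtree code derived purely from edge locations.

    edge_pixels: set of (x, y) points lying on extracted macro edges
    (an assumed representation). Returns one bit string per level
    t..b+1; '1' marks a split block and '0' a leaf.
    """
    def has_edge(x0, y0, size):
        return any(x0 <= x < x0 + size and y0 <= y < y0 + size
                   for x, y in edge_pixels)

    top = 1 << t
    blocks = [(x, y) for y in range(0, height, top) for x in range(0, width, top)]
    code = []
    for level in range(t, b, -1):
        size, half = 1 << level, 1 << (level - 1)
        bits, children = [], []
        # Raster-scan order within the level: top to bottom, left to right.
        for x0, y0 in sorted(blocks, key=lambda blk: (blk[1], blk[0])):
            if has_edge(x0, y0, size):
                bits.append("1")
                for cy in (y0, y0 + half):
                    for cx in (x0, x0 + half):
                        if cx < width and cy < height:
                            children.append((cx, cy))
            else:
                bits.append("0")
        code.append("".join(bits))
        blocks = children
    return code


edges = {(0, 0)}
print(edges_to_qt_code(edges, 8, 8, t=3, b=1))  # ['1', '1000']
print(edges_to_qt_code(edges, 8, 8, t=3, b=2))  # ['1']
```

The two calls reuse the same edge data with different level parameters, so a coarser or finer decomposition needs no new edge extraction.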
[0052]
The advantage of this procedure is that the resolution can be made finer or coarser simply by changing the level parameters t and b. In other words, various multilayerings can be carried out using the same edge data. By contrast, a decomposition based on homogeneity tests must restart the process from the beginning whenever the parameters change. This property is advantageous for multi-pass coding, in which optimization under a constraint such as a maximum coding bit rate is met iteratively, because edge-based multilayering can still be performed in the second and later passes using the edge data extracted in the first pass.
[0053]
In order to evaluate the performance of the method according to the present invention, two types of experiments were conducted. The first counts and compares the bits needed to transmit the decomposition tree in QT coding and in the method according to the present invention. The parameters for classifying edge strength are θ₀ = 0.2, θ₁ = 0.4, θ₂ = 0.8 and θ₃ = 1.6. The bit count for QT coding is based on equation (4), assuming that the decomposition tree that this scheme generates is obtained. FIG. 10 shows the data rate of the multi-layer display for "Flower Garden" and "Table Tennis", which are test sequences for MPEG-2 standardization. It is observed that the algorithm of this scheme always outperforms QT coding in terms of data reduction for the multi-layer display.
[0054]
The second is a kind of subjective quality test that shows the visual effect of edge-based decomposition compared with sampling using blocks of 8 × 8 size. In both cases, each block was represented by its average luminance value. For a fair comparison, the parameters for edge strength classification were determined empirically so that the total number of blocks produced by this scheme is as close as possible to the number of blocks in the uniform sampling, i.e. 5,280 blocks. Table 2 shows the statistical results obtained from this experiment. It is observed that this scheme provides a better quality image in terms of peak signal-to-noise ratio (PSNR) values.
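The measurement used in this experiment, representing each block by its average luminance and comparing the result with the original by PSNR, can be illustrated with the sketch below. It covers only the fixed-size block case and assumes 8-bit luminance (peak value 255) stored as a list of rows; the function names are hypothetical and the experiment's actual data is not reproduced.

```python
import math

def block_mean_image(img, block):
    """Replace every block x block tile of img by its mean luminance."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out

def psnr(orig, recon, peak=255.0):
    """Peak signal-to-noise ratio in dB (infinite for identical images)."""
    h, w = len(orig), len(orig[0])
    mse = sum((orig[y][x] - recon[y][x]) ** 2
              for y in range(h) for x in range(w)) / (h * w)
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)
```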
[0055]
[Table 2]

[0056]
From these two experimental results, it can be confirmed that the multi-layered system according to the present invention performs image decomposition suitable for encoding an image with a relatively small amount of data.
[0057]
Although one embodiment of the present invention has been described above, the present invention is not limited to this.
[0058]
Hereinafter, other inventions will be described. This relates to a method for controlling an MPEG image signal, and particularly to a method for decoding a variable length code.
[0059]
When decoding an MPEG-2 bitstream, variable length code (VLC) decoding of DCT coefficients places the greatest constraint on the decoding speed of the bitstream. This is because 50% to 80% of the encoded bitstream is occupied by DCT coefficients. Therefore, efforts are made to speed up VLC decoding. On the other hand, the required memory space is another important factor when considering hardware implementation.
[0060]
The decoding of VLC is to convert a binary string composed of various VLCs into an original numerical value. A set of VLCs are assembled according to the probability of each event occurring. That is, the higher the probability of an event, the shorter the code assigned to the event, so that the average number of bits per code can be minimized. Table 3 shows an example of a VLC table for MPEG-2 DCT coefficients. Decoding a variable length code is equivalent to finding a code boundary defined as a position between two successive VLCs in a binary string. Conventional VLC decoding methods typically use bit pattern matching to find code boundaries.
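The idea of locating code boundaries by bit pattern matching can be shown with a toy prefix code. The table below is hypothetical and is not the MPEG-2 DCT coefficient table; shorter codes stand for more probable events.

```python
# Hypothetical prefix-free code: shorter codes for more probable symbols.
TOY_VLC = {"1": "A", "01": "B", "001": "C", "000": "D"}

def decode_vlc(bits, table=TOY_VLC):
    """Decode a '0'/'1' string; return the symbols and the code boundaries.

    A boundary is the bit position just after a complete code.
    """
    symbols, boundaries = [], []
    start = 0
    for end in range(1, len(bits) + 1):
        code = bits[start:end]
        if code in table:  # pattern matched: a code boundary is found
            symbols.append(table[code])
            boundaries.append(end)
            start = end
    if start != len(bits):
        raise ValueError("trailing bits do not form a complete code")
    return symbols, boundaries
```

For the input `"1010011000"` the sketch returns the symbols A, B, C, A, D with boundaries after bits 1, 3, 6, 7 and 10.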
[0061]
[Table 3]

[0062]
For example, the decoding method devised by the MPEG Software Simulation Group (MSSG) adopts a mechanism that reads a 16-bit binary string at a time and applies 16-bit pattern matching to convert it to the corresponding DCT coefficient. This achieves fast decoding, but the mechanism requires a large amount of memory for the DCT coefficient table because the table contains many redundant codes. In fact, 259 of the 432 code items are redundant, which corresponds to about 60%. This is because the decoding method uses linear operations to address the associated code. When VLC decoding is implemented in hardware, memory space can become a major concern, because decoding requires expensive high-speed memory.
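Why linear (fixed-width) addressing produces redundant table items can be seen in a generic sketch: a code of length L indexed through a k-bit window occupies 2^(k−L) identical entries. The code below is an illustration with a hypothetical toy table, not the MSSG implementation.

```python
# Hypothetical prefix-free code, not the MPEG-2 table.
TOY_VLC = {"1": "A", "01": "B", "001": "C", "000": "D"}

def build_lookup(table, window):
    """Build a table addressed linearly by the next `window` bits.

    Each entry holds (symbol, code_length); a short code is replicated
    for every possible value of the unused trailing bits.
    """
    lut = [None] * (1 << window)
    for code, symbol in table.items():
        pad = window - len(code)
        base = int(code, 2) << pad
        for tail in range(1 << pad):
            lut[base + tail] = (symbol, len(code))
    return lut

lut = build_lookup(TOY_VLC, 3)
redundant = len(lut) - len(TOY_VLC)  # entries beyond one per distinct code
```

With a 3-bit window the four toy codes need eight entries, so half of the table is redundant. Note also that each entry must store the code length, which the method described below avoids.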
[0063]
The present invention provides a VLC decoding method that uses a non-linear mapping operation instead of bit pattern matching. The decoding method according to the present invention includes the steps of counting the number of consecutive 0 bits within a predetermined number of bits from the delimiter in the input bit string, and decoding the DCT coefficient using the contents of the status register based on the counted value.
[0064]
Also disclosed is a decoding method including the step of obtaining the contents of the status register via a plurality of address operations.
[0065]
An embodiment of the present invention will be described with reference to the drawings, including theoretical considerations.
[0066]
The VLC decoding method according to the invention features non-linear mapping operations designed to allow so-called self-positioning. Each operation has associated bit shift information so that it can correctly update the current position. Therefore, no code length is required in the coefficient table. Note that the current position does not necessarily coincide with a code boundary; it merely indicates the intermediate bit position at which the subsequent operation resumes. Another point to consider in this method is that the redundancy between DCT coefficient table zero and DCT coefficient table one published in ISO/IEC 13818-2 must be fully exploited. In other words, the coefficient table must be designed to remove redundant codes so that the overall table size can be reduced. A method according to an embodiment of the present invention is shown in FIG. 11.
[0067]
FIG. 11 shows that decoding of a DCT coefficient is executed in at most four stages. In the first three stages, information on the input bit string is obtained sequentially, each stage using a prescribed one of the 16 operations shown in Table 4 (described later), and in the fourth stage the coefficient is decoded based on that information. It is assumed that at the start it has already been determined which of table zero and table one is used, and this 1-bit information is held as component X of the status register R in equation (13), described later. The decoding procedure differs between table zero and table one (the processing procedures are shown in FIGS. 12 and 13, respectively).
[0068]
More specifically, for both table zero and table one, in the first stage (block 21) the operation called zero-run() in Table 4 counts the number of consecutive "0" bits within a range of 6 bits starting from the bit immediately after the delimiter of the input bit string (the end of the codeword that has just been decoded), and the count is held as component R0 of the status register R. The current position in the input bit string is then advanced by (R0 + 1) bits. In the second stage (block 22), processing is performed based on the value of R0, and the result is held as R1. In the third stage (block 23), processing based on R1 is performed, the result is held as R2, and the status register R is complete at this point. In the fourth stage (block 24), the DCT coefficient is decoded from the coefficient table shown in Table 6 (described later) using R.
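The first-stage operation can be sketched as below, following only what the text states: count leading "0" bits within a 6-bit range, then advance the current position by R0 + 1 bits. The bit string is modelled as a Python string of '0' and '1' characters read from left to right, which is an assumption made for illustration; Table 4 itself is not reproduced.

```python
def zero_run(bits, pos, window=6):
    """Count consecutive '0' bits starting at pos, looking at most
    `window` bits ahead, and return (R0, new_position).

    The position advances by R0 + 1 bits, as stated in the description.
    """
    run = 0
    while run < window and pos + run < len(bits) and bits[pos + run] == "0":
        run += 1
    return run, pos + run + 1

r0, pos = zero_run("00100010", 0)
```

For the string '00100010' used in the Table 7 example, the sketch yields R0 = 2, which agrees with the second component of the register value [0; 2; 1; 0] quoted there.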
[0069]
In the following description, DCT coefficients are expressed in (run, level) format, and binary strings are expressed in the form '0010'. Unless otherwise noted, the values are decimal values.
[0070]
Table 4 describes the mapping operations used by the method according to the invention. There are 16 mapping operations, of which 9 operations address the code and are then used to end the decoding process, and 7 operations return to a defined value.
[0071]
[Table 4]

[0072]
In Table 4, M(R, n) (n = 1, 2, 3, 4, ...) is a mapping function defined as follows.
[0073]
[Expression 12]
M(R, n) = Table[R][get-bit(n)]   (12)
[0074]
Here, R is an 8-bit register defined as follows, whose semantics are shown in Table 5. It is named the status register. get-bit(n) is a function that returns the next n bits of the input as a decimal value and advances the current position by n bits.
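The mapping function of equation (12) and get-bit(n) can be sketched as follows. The status register is modelled here as a tuple (X, R0, R1, R2) and the table as a dictionary, both of which are assumptions for illustration. Only the two table items quoted elsewhere in the description are filled in; the number of bits read in the usage lines (2) is likewise an assumption, since Tables 4 to 6 are not reproduced.

```python
class BitReader:
    """Holds the input bit string and the current position."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def get_bit(self, n):
        """Return the next n bits as an integer and advance by n bits."""
        chunk = self.bits[self.pos:self.pos + n]
        if len(chunk) < n:
            raise ValueError("not enough bits left")
        self.pos += n
        return int(chunk, 2)

def M(table, R, n, reader):
    """Equation (12): M(R, n) = Table[R][get-bit(n)]."""
    return table[R][reader.get_bit(n)]

# Only the items quoted in the text; all other items are omitted.
TABLE = {
    (0, 0, 0, 0): {0: ("N/A", "EOB")},
    (0, 2, 1, 0): {2: (12, 1)},
}

reader = BitReader("10")
coefficient = M(TABLE, (0, 2, 1, 0), 2, reader)
```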
[0075]
[Formula 13]
R≡X; R0; R1; R2 (13)
[0076]
[Table 5]

[0077]
The operation term-#4(), that is, escape-code, can be described in C source code as follows.
[0078]
[Expression 14]

[0079]
Here, a coefficient table used in the method according to the present invention is defined. In this case, each code is accessed as a two-dimensional array item. For example, [0; 0; 0; 0] [0] corresponds to (N / A, EOB).
[0080]
[Table 6]

[0081]
FIG. 12 shows a decoding algorithm according to an embodiment of the present invention: the VLC decoding algorithm built from these operations when DCT coefficient table zero is used, that is, when intra-vlc-format is not set. In this figure R0 and R1 are determined, corresponding to the first address operation (block 21) and the second address operation (block 22) in the block diagram of FIG. 11. The process does not cover the sign bit, because the sign is easily handled once the leading part of the VLC is known. The algorithm represents the decoding of one code. The first decision routine, use-intra-vlc(), can be skipped, because its return value is fixed to 0 when the video layer syntax element intra-vlc-format is not set. In that situation the decoding process can begin with the next operation, zero-run(), with X forced to zero. Nevertheless, the status register R should be reset to 0; 0; 0; 0 before use-intra-vlc() is applied.
[0082]
When intra-vlc-format is set, another algorithm, shown in FIG. 13, is used. In this figure R0, R1 and R2 are determined, corresponding to the first operation (block 21), the second operation (block 22) and the third operation (block 23) in the block diagram of FIG. 11.
[0083]
FIG. 12 and FIG. 13 show that, as illustrated in FIG. 11, the VLC decoding method according to the present invention requires at most three address operations to decode a coefficient, not counting the operation use-intra-vlc().
[0084]
Table 7 shows an example of the VLC decoding process when the binary string '00100010' appears. Table[0; 2; 1; 0][2] is obtained as the decoding result; that is, the binary string '00100010' corresponds to the code (12, 1). A vertical line between bits in the "current position" column of Table 7 indicates a bit string delimiter (a delimiter between operations according to the present invention, not between code words). In the example of Table 7, the input bit string flows from right to left, and the bits to the left of the delimiter are regarded as data from which information has already been extracted and which do not affect subsequent decoding.
[0085]
[Table 7]

[0086]
It has been verified by simulation that the method according to the invention correctly decodes the DCT coefficients from various encoded bit strings.
[0087]
Regarding table size reduction, the method performs VLC decoding with 154 coefficient codes (112 for DCT coefficient table zero and 42 for DCT coefficient table one). In contrast, the decoding method devised by the MPEG Software Simulation Group (MSSG) uses 432 coefficient codes. For a more precise comparison, Table 8 shows the coefficient table format in terms of the number of bits required to represent a code item for each method. Consequently, the table sizes are 1,694 bits (154 × 11) for the method according to the present invention and 6,912 bits (432 × 16) for the MSSG decoding method, so the MSSG table is roughly four times larger (a ratio of about 1:4).
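The table size comparison can be checked with a few lines of arithmetic. The numbers are those given in the text; the variable names are only for illustration.

```python
# Code item counts and bits per item, as stated in the description.
invention_items = 112 + 42      # table zero + table one
invention_bits = invention_items * 11
mssg_items = 432
mssg_bits = mssg_items * 16

# How many times larger the MSSG table is.
size_ratio = mssg_bits / invention_bits
```

The ratio evaluates to about 4.08, consistent with the statement that the conventional table is roughly four times larger.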
[0088]
[Table 8]

[0089]
Further, with respect to the bus architecture, the MSSG decoding method requires an 8-bit address bus. This is because the maximum number of items in the coefficient table is 250. On the other hand, in the method according to the present invention as shown in Table 6, a 4-bit address bus is sufficient.
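The bus widths quoted above follow from the number of items that must be addressable: an address of w bits selects among 2^w items. A small helper (the name `address_bits` is hypothetical) makes the relationship explicit.

```python
def address_bits(n_items):
    """Smallest address width in bits that can select among n_items."""
    return max(1, (n_items - 1).bit_length())

mssg_width = address_bits(250)  # maximum item count stated for MSSG
items_with_4_bits = 1 << 4      # items addressable with a 4-bit bus
```

A table of 250 items needs 8 address bits, while a 4-bit address covers up to 16 items per register value.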
[0090]
As described above, decoding information (the status register) is acquired at each stage and decoding proceeds while the current position is updated, so code length information is unnecessary in the coefficient table and the table can be reduced accordingly. Furthermore, since the address bit length used to refer to the coefficient table can be 4 bits instead of the conventional 8 bits (MSSG), the number of entries in the coefficient table can be halved. With these two effects, the coefficient table can be reduced to 1/3 to 1/4 of the conventional size.
[0091]
Another invention will be described below. It relates to motion compensation of a video image signal. In a conventional encoding method using motion compensation, a frame is divided into equal-sized blocks, and motion is estimated in units of those blocks. In recent years, a method for performing motion compensation in units of variable-size blocks, as shown in FIG., has been proposed. With this method, the frame is divided into small blocks in the vicinity of an object and into large blocks in a homogeneous region such as the background. The purpose is to suppress the error over the entire frame by transmitting detailed motion information in the vicinity of objects, where errors are likely to occur. However, it has been pointed out that in variable-size block motion compensation, when the block division information is transmitted as quadtree data, the overhead becomes large relative to the coding gain. Therefore, in the proposed method, variable-size block division is performed by extracting and transmitting edge information in order to reduce this overhead. In other words, the proposed method is a motion compensation coding method characterized by variable-size block division based on edges.
[0092]
In conventional edge extraction by pixel-by-pixel tracking, the amount of overhead information increases. To prevent this, the proposed method approximates edges with line segments instead of points. A functional block diagram of the block division algorithm is shown in FIG.
[0093]
First, discontinuities in luminance are extracted by applying a second-derivative filter to the original image (or the decoded frame). Then, a two-stage grouping process is performed on the set of points obtained by thresholding (referred to as edge points), and line segments expressing the shape of the object are extracted. In the first grouping stage, template matching is performed between the set of edge points and line segment pattern masks of 5 × 5 pixels quantized into eight directions, and directional line segments (referred to as unit line segments) are extracted (block 61). This is the extraction of the unit line segments.
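The first step, a second-derivative filter followed by thresholding, can be illustrated as follows. The 4-neighbour Laplacian kernel and the use of the absolute response are assumptions; the description does not specify the filter coefficients or the threshold rule.

```python
def laplacian_edge_points(img, threshold):
    """Return (x, y) positions whose 4-neighbour Laplacian magnitude
    is at least `threshold`. Border pixels are skipped.

    img is a list of rows of luminance values.
    """
    h, w = len(img), len(img[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            if abs(lap) >= threshold:
                points.append((x, y))
    return points
```

On an image with a vertical luminance step, the sketch marks the two columns on either side of the step as edge points, which the later grouping stages would then approximate by line segments.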
[0094]
Next, the macro line segment extraction (block 62) will be described. Since cells in the human visual center respond to the rotation of an object in units of about 10° (this is called orientation selectivity), the second grouping stage joins unit line segments that continue in the same direction and extracts line segments quantized into 16 directions (called macro line segments). An example of the macro line segment is as shown in FIG. The data format for expressing a macro line segment consists of the two-dimensional coordinates of the end point of each line segment, its direction (one of 16 quantized directions) and its length (number of connected pixels).
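The data format for a macro line segment can be sketched as a small record together with a direction quantizer. The description does not state how the 16 directions are distributed; the sketch assumes they divide 180° uniformly (steps of 11.25°, close to the 10° mentioned above), since a line has no sense of direction. The names `MacroSegment` and `quantize_direction` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class MacroSegment:
    x: int          # end point, horizontal coordinate
    y: int          # end point, vertical coordinate
    direction: int  # quantized direction index, 0..15
    length: int     # number of connected pixels

def quantize_direction(dx, dy, n_dirs=16):
    """Map a line direction (dx, dy) to one of n_dirs indices.

    Assumption: directions are spaced uniformly over 180 degrees.
    """
    angle = math.atan2(dy, dx) % math.pi  # undirected: 0 <= angle < pi
    step = math.pi / n_dirs
    return int(round(angle / step)) % n_dirs

segment = MacroSegment(x=10, y=4, direction=quantize_direction(1, 1), length=7)
```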
[0095]
In block division (block 63), the frame is first divided equally into large blocks (the maximum allowable blocks; this is called the first layer), and each block that contains a macro line segment is divided into four smaller blocks (this is called the second layer). This operation is repeated until a predetermined minimum allowable block size is reached. In the block division of FIG. 16, the first to third layers exist.
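The block division rule above can be sketched recursively. For simplicity the macro line segments are assumed to have been rasterized into a collection of (x, y) pixels; the function name and parameters are hypothetical.

```python
def divide_blocks(width, height, max_size, min_size, segment_pixels):
    """Return leaf blocks as (x, y, size) tuples.

    A block is split into four when it contains a pixel of any macro
    line segment and is still larger than the minimum allowable size.
    """
    leaves = []

    def split(x, y, size):
        has_segment = any(x <= px < x + size and y <= py < y + size
                          for px, py in segment_pixels)
        if has_segment and size > min_size:
            half = size // 2
            for oy in (0, half):
                for ox in (0, half):
                    split(x + ox, y + oy, half)
        else:
            leaves.append((x, y, size))

    # First layer: tile the frame with maximum allowable blocks.
    for y in range(0, height, max_size):
        for x in range(0, width, max_size):
            split(x, y, max_size)
    return leaves
```

With a 16 × 16 frame, a maximum block of 16, a minimum block of 4 and a single segment pixel at the origin, the sketch produces blocks in three layers (sizes 16, 8 and 4), small near the segment and large elsewhere.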
[0096]
The experimental results so far show that the amount of overhead information does not depend on the content of the image, and that when the frame is divided only up to the second layer, the proposed method and the quadtree structure are almost the same. It was also confirmed that the proposed method can require less overhead (that is, it is advantageous). In addition, the blocks indicated by broken lines in FIG. 14 show that, although edge extraction in the proposed method is normally performed on the original image, there is an alternative that uses a previously decoded frame. This alternative exploits the fact that the same decoded frame exists in both the local decoding unit on the encoding side and on the decoding side, so the extracted edges are identical as long as the edge extraction method is the same. The advantage of this approach is that there is no overhead. However, since it assumes that the decoded frame and the frame actually being processed are similar, an incorrect division is performed and the coding efficiency falls when the two differ. Therefore, a mechanism for adaptively switching between using the original image and using the decoded frame is required.
[0097]
[Effect of the invention]
An image decomposition suitable for encoding an image can be performed with a smaller amount of data.
[Brief description of the drawings]
FIG. 1 is a diagram showing an example of multilayer display.
FIG. 2 is a diagram showing a QT code of the multilayer display example shown in FIG. 1;
FIG. 3 is a block diagram of an encoder according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a reduced image.
FIG. 5 is a block diagram of an edge extraction unit in the encoder of FIG. 3;
FIG. 6 is a diagram showing a small segment pattern in eight directions.
FIG. 7 is a diagram illustrating detection of a macro edge.
FIG. 8 is a diagram showing an example of an edge belt.
FIG. 9 is a diagram showing an example of a stepped edge.
FIG. 10 is a graph comparing data rates of multilayer display by simulation.
FIG. 11 is a block diagram showing the concept of a decoding method according to another invention.
FIG. 12 is a diagram for explaining a decoding algorithm according to another embodiment of the present invention (when the DCT coefficient table is 0).
FIG. 13 is a diagram for explaining a decoding algorithm according to another embodiment of the present invention (when the DCT coefficient table is 1).
FIG. 14 is a block diagram illustrating an embodiment of another invention.
FIG. 15 is a diagram illustrating an algorithm according to another invention.
FIG. 16 is a diagram illustrating block division.
[Explanation of symbols]
1 Parameter determination part
2 Image reduction part
3 Edge extraction unit
4 Multi-layer part
5 Coding section

Claims

An image encoding method comprising the steps of:
determining parameters for multi-layer display of an input image;
reducing the image using a desired reduction filter for the multi-layer display for which the parameters have been determined;
performing a Laplacian operation on the reduced image to find edge locations and detecting unit edges;
detecting macro edges based on the detected unit edges; and
calculating an edge strength for each rectangular region enclosing a detected macro edge.