JP4258879B2

JP4258879B2 - Image encoding method and apparatus, image decoding method and apparatus, and computer-readable recording medium storing a program for causing a computer to realize the image encoding method and the image decoding method

Info

Publication number: JP4258879B2
Application number: JP05977099A
Authority: JP
Inventors: 稔栄藤; 幸一畑
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1999-03-08
Filing date: 1999-03-08
Publication date: 2009-04-30
Anticipated expiration: 2019-03-08
Also published as: JP2000261808A

Description

【０００１】
【発明の属する技術分野】
本発明は、３次元環境システムを実現することを目的とした画像符号化方法とその装置、画像復号化方法とその装置、コンピュータに画像符号化方法および画像復号化方法を実現させるためのプログラムを記録したコンピュータ読み取り可能な記録媒体に関する。
【０００２】
【従来の技術】
コンピュータグラフィックス（ＣｏｍｐｕｔｅｒＧｒａｐｈｉｃｓ，ＣＧ）とは、「コンピュータによりデータを処理し、生成された画像」または、「コンピュータによりデータを処理し、画像を生成する技術」という意味で用いられる。
【０００３】
広く解釈すれば、コンピュータで図形を描くアルゴリズムを開発することも、コンピュータを利用して図形を描くこともＣＧに含まれる。
【０００４】
ＣＧが表現可能な対象は、人間が見ることができる物体、場所や背景のみならず、実際に見ることができないものを表現することも可能である。
【０００５】
そのため、ＣＧは仮想現実感（ＶｉｒｔｕａｌＲｅａｌｉｔｙ，ＶＲ）や人工現実感（ＡｒｔｉｆｉｃｉａｌＲｅａｌｉｔｙ，ＡＲ）と呼ばれる技術に用いられ、重要な基礎となっている。これらの技術では、コンピュータ内に構築された世界をＣＧによって可視化し、あたかもその世界にいるかのような感覚を起こす。より現実味のある緻密な画像生成が必要である。
【０００６】
現在そのＣＧの多くは、ポリゴンベースドレンダリング（Ｐｏｌｙｇｏｎ−ＢａｓｅｄＲｅｎｄｅｒｉｎｇ，ＰＢＲ）という手法によって生成されている。光学的計算を行うレンダリング（Ｒｅｎｄｅｒｉｎｇ）行程は、実環境のような複雑なシーンである場合は計算コストが大きくなる。そのために、計算機の能力が飛躍的に発展した現在でも、実時間でシーンの変更が必要な場合には幾何計算専用のプロセッサが必要になることがある。
【０００７】
一方、撮影・蓄積された実写画像を基に画像を生成するイメージベースドレンダリング（Ｉｍａｇｅ−ＢａｓｅｄＲｅｎｄｅｒｉｎｇ，ＩＢＲ）と呼ばれる手法が提案されている。
【０００８】
蓄積画像数が莫大になるために大きな記憶領域を必要とする欠点があるが、実写画像を用いることにより写実的な再生画像を得ることができる。
【０００９】
しかし、撮影した画像しか再生できないために、架空の環境は表現することができない。ＩＢＲは実写画像を用いるために写実性に優れている。さらに、蓄積画像を出力するために計算コストは極めて小さく、人工現実感のように写実的な画像が求められ、かつ視点の変更とともに実時間でシーンの変更が求められるような応用には、後者のＩＢＲが適している。
【００１０】
本発明はＩＢＲを用いた３次元環境再現システムに関する。
【００１１】
ＩＢＲを用いた３次元環境再現システムを図１に示す。
【００１２】
図１では全方位画像（パノラマ画像）を撮影できるロボットを自走させ、多地点で撮像することにより、任意位置の任意視点の見え方を再現する。ＩＢＲの課題は、蓄積すべき情報量が多い点である。
【００１３】
これについては特開平１０−２７１５１１号公報記載の画像符号化装置と画像復号化装置がある。
【００１４】
これを符号化に関する従来例（従来例１）とする。従来例１では、図２に示すように物体周囲をカメラが移動し可能な限り多くの位置から物体を撮像し、異なる視点画像間の相関を利用して高能率符号化を行う。
【００１５】
この高能率符号化には、動き補償離散コサイン変換（Ｍｏｔｉｏｎ−ＣｏｍｐｅｎｓａｔｅｄＤｉｓｃｒｅｔｅＣｏｓｉｎｅＴｒａｎｓｆｏｒｍａｔｉｏｎ）符号化方式（ＭＣ−ＤＣＴ符号化と以後略す）が多視点間の予測符号化に拡張されて用いられている。
【００１６】
ＭＣ−ＤＣＴ符号化方式はＩＴＵ−Ｔ（国際電気通信連合電気通信標準化部門）で国際標準化されたＨ．２６１規格やＨ．２６３規格やＭＰＥＧ規格に採用された共通動画圧縮技術として良く知られている。
【００１７】
ここでＭＰＥＧ（ＭｏｖｉｎｇＰｉｃｔｕｒｅＥｘｐｅｒｔｓＧｒｏｕｐ）とは、世界標準化機構（ＩｎｔｅｒｎａｔｉｏｎａｌＯｒｇａｎｉｚａｔｉｏｎｆｏｒＳｔａｎｄａｒｄｉｚａｔｉｏｎ，ＩＳＯ）の音声・動画符号化の標準化を進めてきた作業グループ（ＩＳＯ／ＩＥＣＪＴＣ１／ＳＣ２９／ＷＧ１１）を指すが、同時にこのグループが定めたデータ圧縮の国際規格をも意味する。
【００１８】
ＭＣ−ＤＣＴ符号化方式では画像を１６×１６画素からなるマクロブロックと呼ばれるブロックに分割し、連続する画像（フレーム）間でマクロブロックの差分が最小となる移動量（動きベクトル）を計算し、その差分を離散コサイン変換（ＤＣＴ）符号化を行う。フレーム間差分最小となるよう動きベクトルを求め冗長性を除いた差分画像を得る作業は動き補償と呼ばれる。ＤＣＴは動き補償された画像（差分画像）に残る空間的冗長性を除くために用いられる。なお、最初のフレームは、他のフレームとの差分符号化が行えないため、ＤＣＴのみを用いて符号化される。これをＩピクチャとよぶ。ＭＣ−ＤＣＴ符号化される一般フレームをＰピクチャとよぶ。従来例１はこの構造を多視点画像間に拡張したものである。
【００１９】
３次元環境再現システムでは自走ロボットを所定の位置を走査するよう制御する。
【００２０】
この制御は角度センサーの情報から移動距離、回転を得て行われるが、角度センサーの精度不足や車輪のスリップが避けられず、正しく所定の位置を走行する保証はない。そのために得られた画像から実際に走行した視点位置を補正する必要がある。
【００２１】
画像中の特徴点から、カメラの動きを推定する手法として６特徴点照合による推定方法がある。例えば、文献ＪｕｙａｎｇＷｅｎｇ，ＮａｒｅｎｄｒａＡｈｕｊａ，ａｎｄＴｈｏｍａｓＳ．Ｈｕａｎｇ：“Ｏｐｔｉｍａｌｍｏｔｉｏｎａｎｄｓｔｒｕｃｔｕｒｅｅｓｔｉｍａｔｉｏｎ”，ＩＥＥＥＴｒａｎｓ，ＰａｔｔｅｒｎＡｎａｌｙｓｉｓａｎｄＭａｃｈｉｎｅＩｎｔｅｌｌｉｇｅｎｃｅ，１５（９），ｐｐ８６４−８８４（１９９３）に記載されている。
【００２２】
これを撮像装置の校正に関する従来例（従来例２）とする。この概念を図３に示す。画像中で
【００２３】
【数１】

【００２４】
が未知数であるとする。
【００２５】
ここで、ｄ０からｄ５まではカメラから物体（この場合、筐体の頂点）までの距離、Ｔｘ、Ｔｙ、Ｔｚはカメラと物体間の相対的並進成分、ωｘ、ωｙ、ωｚは回転成分である。基準位置で観測された６特徴点の座標を数２として定める。
【００２６】
【数２】

【００２７】
３次元空間中の特徴点の位置が（数１）と（数２）を既知とすると、任意の視点位置の特徴点投影座標が（数３）として計算できる。
【００２８】
【数３】

【００２９】
（数３）が実際には（数４）として観測されたとする。
【００３０】
【数４】

【００３１】
（数３）と（数４）は量子化誤差やレンズ系の収差のため、必ずしも一致しない。
【００３２】
（数３）と（数４）のずれを２乗誤差として（数５）に表現し、これを最小化する枠組みで物体−カメラ間の動きΨを決定する。
【００３３】
【数５】

【００３４】
この最小化は、（数３）の関数ｆが非線形関数であることから、反復による最小化手法が使われる。（数６）は非線形最小２乗化の手法として広く用いられているＬｅｖｅｎｂｅｒｇ−Ｍａｒｑｕａｒｔ法による最小化ステップを表している。
【００３５】
【数６】

【００３６】
Ｈは近似ヘッセ行列で、Ｉは単位行列∇ｅは勾配ｔは反復数である。λは非負の制御変数で、λが大きな時、最急降下法（収束は遅いが安定）に、λが小さな時、ニュートン法（不安定だが収束は速い）に近づく。
【００３７】
Ｌｅｖｅｎｂｅｒｇ−Ｍａｒｑｕａｒｔ法では、λを制御することにより最急降下法とニュートン法の長所が生かせる。なお、Ｌｅｖｅｎｂｅｒｇ−Ｍａｒｑｕａｒｔ法では、近似ヘッセ行列を以下のように計算する。
【００３８】
【数７】

【００３９】
【数８】

【００４０】
数８は、数５が２次形式をとっていることから、可能になった近似である。
【００４１】
反復が安定収束であるためには、数６において、勾配∇ｅに係る逆行列は正定値（ｐｏｓｉｔｉｖｅ−ｄｅｆｉｎｉｔｅ）である必要がある。
【００４２】
一般にヘッセ行列は正定値である保証はないが、数７が数８の近似により実対称行列となることから、この場合正定値であることが保証されている。以上２視点間の６特徴点照合により、カメラの並進・回転が求まる。ただし、並進については絶対値ではなく、特徴点までの距離も含めて比として得られる。
【００４３】
なお、６特徴点の照合が必要な理由は、以下のように説明できる。特徴点一つの観測につき水平・垂直の座標が得られることから２つの方程式が立つ。
【００４４】
カメラの動きパラメータ（未知パラメータ）は６個である。特徴点照合が１つにつき、基準座標における特徴点までの距離が１個未知パラメータとして増える。特徴点照合が６のとき、方程式数が未知パラメータ１２と等しくなり、数５の最小化として画像から動きパラメータが求まる。
【００４５】
【発明が解決しようとする課題】
従来例１は、図２に示したように内向き多視点画像の符号化を行っており、図１に示すようなＩＢＲを用いた３次元環境再現システム符号化には、改善の余地がある。異なる視点画像間の相関を用いた上で、符号化対象は全方位画像であることの性質を利用した符号化が必要である。また、３次元環境再現システムを実現するためには、符号化した全方位画像の復号化が必要である。
【００４６】
本発明は、かかる点に鑑み、全方位画像多視点符号化に適した符号化方法とその装置、全方位画像多視点復号化に適した復号化方法とその装置、およびコンピュータに全方位画像多視点符号化に適した符号化方法および全方位画像多視点復号化に適した復号化方法を実現させるプログラムを記録した記録媒体を提供することを目的とする。
【００４７】
【課題を解決するための手段】
この課題を解決するために、第１の発明は、全方位画像を予測符号化する方法であって、フレーム内符号化を行う参照画像符号化ステップと、符号化された参照画像から類似の全方位画像を予測する際、参照画像の端が他方の端に連続していることを仮定し、継ぎ目なく予測全方位画像を生成する予測ステップと、予測ステップにより生成された予測全方位画像と入力全方位画像との差分を符号化する残差符号化ステップを有することを特徴とする画像符号化方法である。
【００４８】
第２の発明は、全方位画像を予測復号化する方法であって、フレーム内復号化を行う参照画像復号化ステップと、復号化された参照画像から類似の全方位画像を予測する際、参照画像の端が他方の端に連続していることを仮定し、継ぎ目なく予測全方位画像を生成する予測ステップと、予測ステップにより生成された予測全方位画像と入力全方位画像との差分を復号化する残差復号化ステップを有することを特徴とする画像復号化方法である。
【００４９】
【発明の実施の形態】
（実施例１）
実施例１を図４、図５、図６、図７、図８を用いて説明する。図１に示した撮像系は、床面上を格子状にくまなく移動するように移動する。図４は、床面を上から俯瞰した、撮影点を表しており、円は全方位画像を表している。
【００５０】
全方位画像の中で、網かけされた円はテンプレート符号化画像であり、そうでない画像はテンプレート符号化画像を参照して予測符号化されるテンプレート予測符号化画像である。全方位画像は円筒状にシームレスな画像であるが、これを矩形に展開し、ＭＰＥＧ１とほぼ同じイントラフレーム符号化により符号化される（ＭＣ−ＤＣＴ符号化方式のＩピクチャに相当）。
【００５１】
ＭＰＥＧ１と異なるのは画像サイズで横３５２０×縦５７６画素であることである。
【００５２】
この処理ステップを図５に示す。従来の技術で説明したように、通常のＭＣ−ＤＣＴ符号化方式と同じように画像を１６×１６画素からなるマクロブロックと呼ばれるブロックに分割し、それをさらに分割した８×８画素単位で２次元ＤＣＴを行う。
【００５３】
これを量子化し、主観画質を落とさない範囲で情報を落とす（情報の欠落は量子化誤差に相当する）。量子化データは可変長符号化されてハフマン符号として出力される。
【００５４】
テンプレート予測符号化画像の符号化ステップを図６に示す。
【００５５】
視点位置の近いテンプレートを一度復号して参照画像とする。
【００５６】
この復号済みテンプレートとの間で当該画像のマクロブロックの差分が最小となる移動量（動きベクトル）を計算し、その差分を離散コサイン変換（ＤＣＴ）符号化を行う。
【００５７】
この処理を図７に示す。量子化以降はテンプレート符号化と同じステップをとる。上記動き補償で、従来のＭＣ−ＤＣＴ方式の動き補償と異なるのは、図８に示すように領域外参照となる動きベクトルを認めることである。
【００５８】
全方位画像はシームレスな円筒画像と考えることができるから、右方（左方）への突出分を画像左端（右端）からみたオフセットとして予測画像生成を行う。
【００５９】
この“シームレス（継ぎ目なし）”動き補償に対応して、動きベクトル予測も“シームレス”に行う。
【００６０】
以上特許請求の範囲の請求項１に該当する実施例１を説明した。
（実施例２）
次に実施例１で生成された符号データを復号する実施例２を示す。
【００６１】
図９は図５の逆過程であり、テンプレート符号化画像を再構成する。
【００６２】
横３５２０×縦５７６画素が扱えるＭＰＥＧ１のイントラ復号化ステップを実行する。
【００６３】
図１０は、図６の逆過程であり、テンプレート予測画像を再構成する。
【００６４】
再生しようとする視点位置がテンプレート符号化画像であれば、図９の処理ステップにしたがって復号化する。テンプレート予測符号化画像であれば、参照している近傍のテンプレート符号化画像を一度復号した後（あるいは、一度復号した画像は、消去せずに蓄積しておくとすると、メモリから読み出して）、図１０の処理ステップに従って復号化する。さらに再構成された画像から図１１に示すように任意視線方向を切り取れば、３次元環境再現が行える。
【００６５】
このように、膨大なデータ量となる全方位画像を異なる視点間の相関を利用した上で、全方位画像の特徴に注目して“シームレス”動き補償することにより、少ない符号量で符号化することができる。
（参考例１、２）
参考例１、２は３次元環境再現システムにおける観測系の動き推定方法に関するものである。
【００６６】
はじめに、参考例１、２の課題を示す。従来技術に示した従来例２は６特徴点照合により、カメラの並進・回転が求まる。図１のように一般に床面上を移動する撮像系では、自由度が水平移動に拘束され、また回転についても鉛直線を軸に１自由度の回転である。
【００６７】
したがって、２自由度の並進運動と１自由度の回転運動合計３自由度のパラメータ推定では、３特徴点照合により、カメラ移動パラメータが画像より推定できる。しかし、カメラ移動に際して、特徴点座標の垂直方向成分が変化しないことが多い。
【００６８】
言い換えれば、画面上、垂直方向の特徴点座標の僅かな観測誤差が、カメラの運動パラメータを大きく左右する。
【００６９】
参考例１、２では、かかる点に鑑み、床面上を移動する観測系の動きパラメータを安定して推定する方法と装置を提供する実施例を示す。
（参考例１）
参考例１に示す発明は、走行する観測系の水平移動、回転角度を推定する方法であって、基準位置における観測系周囲に存在するＮ個所（Ｎ≧６）の方向角を観測するステップ１と、上記Ｎ個所について異なる位置１で再度方向角を観測するステップ２と、上記Ｎ個所についてさらに異なる位置２で再度方向角を観測するステップ３と、以上の３Ｎの方向角を用いて、基準位置に対する位置１および位置２とその位置における鉛直線周りの観測系の回転を求めるステップからなることを特徴とする観測系動き推定方法である。
【００７０】
観測系動き推定方法を図１２、図１３、図１４を用いて説明する。参考の実施例では床面上を移動する撮像系の動きパラメータを安定して推定する方法を示す。
【００７１】
はじめに動きパラメータの推定原理を説明する。図１２に示すように、カメラが自走することにより、異なる観測位置で、見え方の異なる画像が撮像される。ここで求めたいのは、カメラの並行移動成分（床面上の２自由度）と鉛直方向周りの回転成分（１自由度）の計３自由度のパラメータである。
【００７２】
床面上の移動に対して、最も見え方変化の大きな特徴は、情景中の垂直エッジである（原理的には、垂直エッジだけではなく、特徴点を含む。後述の方向角に変換できる特徴であれば良い）。
【００７３】
本実施例では水平方向の見え方の変化だけを頼りにカメラの動きパラメータを推定する。図１３は、撮像装置の位置を真上から俯瞰した図であるが、このように３視点位置でＡの位置を基準とし、新たに位置Ｂおよび位置Ｃで同じ垂直エッジを観測するとする。
【００７４】
各位置における十字は、撮像装置中心で定まる局座標である。
【００７５】
基準位置Ａにおいて定めた座標系に対して、Ｂに移動した際のカメラ動きパラメータは（Ｔｘ^Ｂ、Ｔｙ^Ｂ、ω^Ｂ）、Ｃに移動した際のカメラ動きパラメータは（Ｔｘ^Ｃ、Ｔｙ^Ｃ、ω^Ｃ）とする。
【００７６】
未知数は６である。これを求めるために、各位置における垂直エッジの方位角を求める。観測１つに付き数９が１つ成り立つ。
【００７７】
数９においてｐは位置の識別子でこの場合［Ｂ、Ｃ］のいずれかである。ｉは垂直エッジの識別子であり、０≦ｉ＜Ｎ、Ｎ≧６であることは後述する。
【００７８】
【数９】

【００７９】
数９は垂直エッジｉの基準座標Ａからみた距離ｄｉ^Ａと位置ｐまでのカメラ動きパラメータが分かれば視点Ａにおける方位角が計算できることを意味する拘束式である。
【００８０】
さて、ここで２つの視点（例えば、基準点Ａと移動点Ｂ）で、垂直エッジを照合して基準座標から移動点Ｂまでのカメラ動きパラメータを求めようとすると、観測１つにつき拘束式が増えるが、新たに未知パラメータｄｉ^Ａが増える。
【００８１】
そこで、観測位置を３点（この場合に相当し、基準点Ａと移動点ＢとＣ）とすると、観測１に対して拘束式が２つ得られ、６特徴の観測で１２の拘束式、１２の未知パラメータとなり、未知パラメータが定まる。
【００８２】
より一般的には６特徴以上の観測であれば、未知パラメータは全て求まる。
【００８３】
数９を書き換えて数１０とする。
【００８４】
【数１０】

【００８５】
ただし、
【００８６】
【数１１】

【００８７】
【数１２】

【００８８】
【数１３】

【００８９】
である。
【００９０】
ここで求めるべきパラメータを数１４とする。
【００９１】
【数１４】

【００９２】
これまでの関係式から、Ψが求まれば視点位置Ａ、Ｂ、Ｃ間のカメラ移動が分かることになる。
【００９３】
基準位置Ａにおける観測角を数１５とすると、Ψと数１５から、視点位置Ｂにおける観測角と視点位置Ｃにおける観測角が数１６として導かれる。
【００９４】
【数１５】

【００９５】
【数１６】

【００９６】
以上は、移動パラメータと視点位置から計算された値であるが、実際に観測された角度を数１７であるとする。
【００９７】
【数１７】

【００９８】
ただし、
【００９９】
【数１８】

【０１００】
【数１９】

【０１０１】
である。
【０１０２】
ここで、Ψと数１５から得られる数１６を以後の微分処理を容易にするため（数１０参照）、数２０に表現を書き換える。数２０は明示しないまでもΨと基準点Ａにおける観測角（数１５）の関数である。
【０１０３】
【数２０】

【０１０４】
以上に対応して、観測位置Ｂ、Ｃにおける観測角の表現を数１７から数２１に表現を変える。
【０１０５】
【数２１】

【０１０６】
そして、計算により予測されるＢ地点、Ｃ地点の観測角と実測との２乗誤差を数２２で表す。
【０１０７】
【数２２】

【０１０８】
数２２を最小化するΨとしてパラメータが求まる。これは、従来技術で示したＬｅｖｅｎｂｅｒｇ−Ｍａｒｑｕａｒｔ法の枠組みで数２３の反復として求めることができる。
【０１０９】
【数２３】

【０１１０】
数２３中、Ｈは近似ヘッセ行列（数７、数８と同形）で、Ｉは単位行列∇ｅは勾配である。
【０１１１】
ヘッセ行列と勾配を求める微分処理は数１４の未知パラメータについて行われる。
【０１１２】
以上の処理ステップをまとめると、図１４となる。これを参考例１とする。
（参考例２）
参考例２の発明は、走行する観測系の水平移動、回転角度を推定する装置であって、観測系周囲に存在するＮ個所（Ｎ≧６）の方向角を観測する手段と、上記手段を異なる観測位置（基準位置、位置１、位置２）について３回動作させた結果を保持するメモリと、記憶された３観測位置における３Ｎの方向角を用いて、基準位置に対する位置１および位置２とその位置における鉛直線周りの観測系の回転を求める手段からなることを特徴とする観測系動き推定装置である。
【０１１３】
観測系動き推定装置の参考例２について、図１５を用いて説明する。図１５はブロック図であり、１０１は全方位画像から垂直エッジを抽出する特徴抽出部、１０２は垂直エッジの対応関係を求める特徴追跡部、１０３はエッジから図１３に示す観測角に変換する６方向抽出部、１０４〜１０５は基準点Ａ、移動点Ｂ、Ｃの角度を記憶するメモリ、１０７は以上の観測から数２３に示した反復演算を行い、数１４のパラメータを求めるカメラ動き計算部である。
【０１１４】
この装置によれば、自走しながら３地点で画像上に現れる６本の線分の見える方位角を記憶することにより、その線分（例えば、テーブルや本棚の稜線）までの距離と移動成分の比、および鉛直線周りの回転角を計算することができる。実際に観測した例を図１６、図１７に示す。
【０１１５】
未知パラメータのうち、距離に関する成分（言い換えれば回転以外の成分）は全て比として求まる。
【０１１６】
この例では、移動カメラの車輪から得られた距離から数１４のＴｘ^Ｂの絶対値を定め、比例演算で絶対距離を求めている。
【０１１７】
なお、本参考例では、全方位画像に写る垂直線分を特徴として用いたが、原理的には、図１３に示すように３次元中の物体の存在（エッジや角など）を観測位置からの水平面上の方位角に置き換えることができれば、数１４のパラメータを求めることができる。
【０１１８】
したがって、撮像対象が全方位画像である必要はなく、また用いる画像特徴も線分に限定する必要はない。
【０１１９】
例えば、通常のカメラを用いてその投影面上でレンズ中心に平行な線を仮定し、その線上の特徴移動を追跡することによりカメラ動きパラメータを推定することができる。
【０１２０】
さらに、観測手段は画像である必要も無い。方位角の観測であるからレーザー光の照射によってもこの方法および装置は適用できる。
【０１２１】
最後に６線分を観測したが、７以上の線分を観測しても、増える未知パラメータ（線分までの距離）の数よりも拘束式が増えるため、数２０の最小化により求めることができる。
【０１２２】
なお、本発明をソフトウエアプログラムによって実現し、そのプログラムをメモリ上に記憶することにより、実行することができる。その場合、そのプログラムをＣＤ−ＲＯＭ等の記録媒体を用いて、あるいはインターネット等の通信回線を用いてプログラムを配信して、本発明を実行することも可能である。
【０１２３】
【発明の効果】
以上のように本発明（請求項１）に係る画像符号化方法によれば、図１に例示される全方位画像を予測符号化する際、右端と左端が連結した動き補償を行うことにより、ＭＰＥＧ１などの通常の予測符号化を用いた方法よりも高能率に符号化することができる。
【０１２４】
また本発明（請求項２）に係る画像復号化方法によれば、図１に例示される全方位画像を予測符号化したデータを復号化する際、右端と左端が連結した動き補償を行うことにより、ＭＰＥＧ１などの通常の予測復号化を用いた方法よりも高能率に復号化することができる。
【図面の簡単な説明】
【図１】３次元環境再現システム概念図
【図２】多視点画像符号化の従来例を示す図
【図３】画像上の６特徴点照合によるカメラ動きパラメータ推定図
【図４】テンプレート符号化画像とテンプレート予測符号化画像の配置説明図
【図５】テンプレート符号化処理手順を示す図
【図６】テンプレート予測符号化処理手順を示す図
【図７】全方位画像の動き補償概念図
【図８】全方位画像の境界外動き補償の概念図
【図９】テンプレート復号化処理手順を示す図
【図１０】テンプレート予測復号化処理手順を示す図
【図１１】全方位画像と任意方向画像の関係説明図
【図１２】６垂直線による自走ロボット位置推定概念図
【図１３】方向角と自走ロボットの座標系説明図
【図１４】観測系動き推定方法の一実施例処理手順を示す図
【図１５】観測系動き推定装置の一実施例ブロック図
【図１６】観測系動き推定装置の一実施例による処理結果例１を示す図
【図１７】観測系動き推定装置の一実施例による処理結果例２を示す図
【符号の説明】
１０１特徴抽出部
１０２特徴追跡部
１０３６方向抽出部
１０４角度メモリ１
１０５角度メモリ２
１０６角度メモリ３
１０７カメラ動き計算部
[0001]
[Technical field of the invention]
The present invention relates to an image encoding method and apparatus and an image decoding method and apparatus intended to realize a three-dimensional environment system, and to a computer-readable recording medium storing a program for causing a computer to realize the image encoding method and the image decoding method.
[0002]
[Prior art]
Computer graphics (CG) is used to mean either “an image generated by processing data with a computer” or “a technique for generating images by processing data with a computer”.
[0003]
In broad terms, CG includes developing algorithms for drawing graphics on a computer and drawing graphics using a computer.
[0004]
CG can represent not only objects, places, and backgrounds that humans can see, but also things that cannot actually be seen.
[0005]
For this reason, CG is used in the technologies called virtual reality (VR) and artificial reality (AR), and forms an important foundation for them. In these technologies, a world constructed inside a computer is visualized by CG so that the user feels as if present in that world. This requires more realistic and more detailed image generation.
[0006]
At present, most CG is generated by a technique called polygon-based rendering (PBR). The rendering process, which performs optical calculations, becomes computationally expensive for complex scenes such as real environments. As a result, even now that computing power has improved dramatically, a processor dedicated to geometric calculation may be needed when the scene must change in real time.
[0007]
On the other hand, a technique called image-based rendering (IBR) has been proposed, in which images are generated from captured and stored real images.
[0008]
Although there is a drawback that a large storage area is required because the number of stored images becomes enormous, a realistic reproduction image can be obtained by using a real image.
[0009]
However, since only captured images can be reproduced, a fictitious environment cannot be represented. IBR excels in realism because it uses real images. Furthermore, because it outputs stored images, its computational cost is extremely low. For applications such as artificial reality, where realistic images are required and the scene must change in real time as the viewpoint changes, IBR, the latter of the two techniques, is the more suitable.
[0010]
The present invention relates to a three-dimensional environment reproduction system using IBR.
[0011]
A three-dimensional environment reproduction system using IBR is shown in FIG. 1.
[0012]
In FIG. 1, a robot capable of capturing omnidirectional images (panoramic images) travels under its own power and captures images at many points, so that the view from an arbitrary viewpoint at an arbitrary position can be reproduced. The problem with IBR is the large amount of information that must be stored.
[0013]
Regarding this, there is an image encoding device and an image decoding device described in Japanese Patent Laid-Open No. 10-271511.
[0014]
This is taken as the conventional example relating to encoding (Conventional Example 1). In Conventional Example 1, as shown in FIG. 2, a camera moves around an object and images it from as many positions as possible, and high-efficiency encoding is performed using the correlation between images from different viewpoints.
[0015]
For this high-efficiency encoding, the motion-compensated discrete cosine transform coding scheme (hereinafter, MC-DCT coding) is used, extended to predictive coding between multiple viewpoints.
[0016]
The MC-DCT coding scheme is well known as the common video compression technique adopted in the H.261 and H.263 standards, internationally standardized by ITU-T (International Telecommunication Union Telecommunication Standardization Sector), and in the MPEG standards.
[0017]
Here, MPEG (Moving Picture Experts Group) refers to a working group (ISO / IEC JTC1 / SC29 / WG11) that has been promoting the standardization of audio / video coding of the International Organization for Standardization (ISO). At the same time, it means the international standard for data compression established by this group.
[0018]
In the MC-DCT coding scheme, an image is divided into blocks of 16 × 16 pixels called macroblocks, the displacement (motion vector) that minimizes the macroblock difference between consecutive images (frames) is calculated, and the difference is encoded with the discrete cosine transform (DCT). Finding the motion vector that minimizes the inter-frame difference, and thereby obtaining a difference image with the redundancy removed, is called motion compensation. The DCT is used to remove the spatial redundancy remaining in the motion-compensated image (difference image). Because the first frame cannot be differentially encoded against another frame, it is encoded using the DCT alone; this is called an I picture. An ordinary frame that is MC-DCT encoded is called a P picture. Conventional Example 1 extends this structure to multi-viewpoint images.
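The motion-compensation step described above can be sketched as follows. This is a minimal illustration, not the method of any particular standard: the exhaustive search, the sum-of-absolute-differences (SAD) criterion, the `search` range, and the function names are all assumptions introduced here.

```python
import numpy as np

MB = 16  # macroblock size in pixels, as stated in the text


def best_motion_vector(ref, cur, top, left, search=4):
    """Exhaustively search for the displacement (dy, dx) minimizing the SAD."""
    block = cur[top:top + MB, left:left + MB].astype(np.int32)
    h, w = ref.shape
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + MB > h or x + MB > w:
                continue  # conventional scheme: skip candidates outside the frame
            cand = ref[y:y + MB, x:x + MB].astype(np.int32)
            sad = int(np.abs(block - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad


def residual(ref, cur, top, left, mv):
    """Difference image for one macroblock; this is what the DCT would encode."""
    dy, dx = mv
    pred = ref[top + dy:top + dy + MB, left + dx:left + dx + MB].astype(np.int32)
    return cur[top:top + MB, left:left + MB].astype(np.int32) - pred
```

When the current frame is an exact shifted copy of the reference, the residual for the best vector is all zeros, which is the redundancy that motion compensation removes before the DCT.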
[0019]
In the three-dimensional environment reproduction system, the self-propelled robot is controlled to scan a predetermined position.
[0020]
This control is performed by obtaining the travel distance and rotation from angle sensor information, but limited sensor accuracy and wheel slip are unavoidable, so there is no guarantee that the robot actually passes through the predetermined positions. The viewpoint positions actually visited must therefore be corrected using the captured images.
[0021]
One technique for estimating camera motion from feature points in an image is estimation by matching six feature points. It is described, for example, in Juyang Weng, Narendra Ahuja, and Thomas S. Huang: “Optimal motion and structure estimation”, IEEE Trans. Pattern Analysis and Machine Intelligence, 15(9), pp. 864-884 (1993).
[0022]
This is taken as the conventional example relating to calibration of the imaging apparatus (Conventional Example 2). The concept is illustrated in FIG. 3. Suppose that, in the image,
[0023]
[Equation 1]

[0024]
is unknown.
[0025]
Here, d0 to d5 are the distances from the camera to the object (in this case, the vertices of the housing), Tx, Ty, and Tz are the relative translation components between the camera and the object, and ωx, ωy, and ωz are the rotation components. The coordinates of the six feature points observed at the reference position are defined as Equation 2.
[0026]
[Equation 2]

[0027]
If (Equation 1) and (Equation 2), which together give the positions of the feature points in three-dimensional space, are known, the projected coordinates of the feature points at an arbitrary viewpoint position can be calculated as (Equation 3).
[0028]
[Equation 3]

[0029]
Assume that (Equation 3) is actually observed as (Equation 4).
[0030]
[Equation 4]

[0031]
(Equation 3) and (Equation 4) do not necessarily match due to quantization errors and lens system aberrations.
[0032]
The discrepancy between (Equation 3) and (Equation 4) is expressed as a squared error in (Equation 5), and the motion Ψ between the object and the camera is determined by minimizing this error.
[0033]
[Equation 5]

[0034]
Because the function f in (Equation 3) is nonlinear, an iterative minimization method is used. (Equation 6) represents the minimization step of the Levenberg-Marquardt method, which is widely used for nonlinear least squares.
[0035]
[Equation 6]

[0036]
H is the approximate Hessian matrix, I is the identity matrix, ∇e is the gradient, and t is the iteration count. λ is a non-negative control variable: when λ is large the step approaches the steepest descent method (slow but stable convergence), and when λ is small it approaches Newton's method (unstable but fast convergence).
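For reference, the textbook form of a Levenberg-Marquardt update, written with the symbols named in this paragraph, is given below. The equation image is not reproduced in this text, so the exact notation of Equation 6 (for example, the superscript convention for the iteration count t) is an assumption.

```latex
\Psi^{(t+1)} \;=\; \Psi^{(t)} \;-\; \left( H + \lambda I \right)^{-1} \nabla e\!\left( \Psi^{(t)} \right)
```

With λ → 0 the step reduces to −H⁻¹∇e, the Newton-type step; with λ large it approaches −(1/λ)∇e, a short steepest-descent step, which matches the behavior described above.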
[0037]
In the Levenberg-Marquardt method, controlling λ makes it possible to exploit the strengths of both the steepest descent method and Newton's method. The approximate Hessian matrix is calculated as follows.
[0038]
[Equation 7]

[0039]
[Equation 8]

[0040]
Equation 8 is an approximation made possible because Equation 5 takes a quadratic form.
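The kind of approximation referred to here is commonly written as follows. This is the standard Gauss-Newton form for a sum-of-squares error, offered as a sketch; the residuals r_k, the parameter components ψ_i, and the factor of 2 are notation assumed here, since the images of Equations 5, 7, and 8 are not reproduced in this text.

```latex
e(\Psi) = \sum_k r_k(\Psi)^2, \qquad
\frac{\partial^2 e}{\partial \psi_i \, \partial \psi_j}
= 2 \sum_k \left(
    \frac{\partial r_k}{\partial \psi_i} \frac{\partial r_k}{\partial \psi_j}
    + r_k \, \frac{\partial^2 r_k}{\partial \psi_i \, \partial \psi_j}
  \right)
\;\approx\;
2 \sum_k \frac{\partial r_k}{\partial \psi_i} \frac{\partial r_k}{\partial \psi_j}
```

Dropping the second-derivative term is reasonable when the residuals are small near the minimum, and it leaves a matrix built only from first derivatives.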
[0041]
For the iteration to converge stably, the inverse matrix applied to the gradient ∇e in Equation 6 must be positive definite.
[0042]
In general, a Hessian matrix is not guaranteed to be positive definite, but because the approximation of Equation 8 makes Equation 7 a real symmetric matrix, positive definiteness is guaranteed in this case. In this way, the translation and rotation of the camera are obtained by matching six feature points between two viewpoints. The translation, however, is obtained not as an absolute value but as a ratio that includes the distances to the feature points.
[0043]
The reason why the six feature points need to be collated can be explained as follows. Since horizontal and vertical coordinates can be obtained for each observation of a feature point, two equations are established.
[0044]
There are six camera motion parameters (unknown parameters). Each matched feature point adds one further unknown parameter, namely the distance to that feature point in the reference coordinates. With six matched feature points, the number of equations equals the number of unknown parameters, 12, and the motion parameters are obtained from the images by minimizing Equation 5.
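The count behind this statement can be written out explicitly. N is a symbol introduced here for illustration (the number of matched feature points); it is not notation from the source.

```latex
\underbrace{2N}_{\text{equations}} \;=\; \underbrace{6}_{\text{motion parameters}} + \underbrace{N}_{\text{distances}}
\quad\Longrightarrow\quad N = 6, \qquad 2N = 6 + N = 12
```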
[0045]
[Problems to be solved by the invention]
Conventional Example 1 encodes inward-looking multi-viewpoint images as shown in FIG. 2, and leaves room for improvement when applied to encoding for a three-dimensional environment reproduction system using IBR as shown in FIG. 1. Encoding is needed that, in addition to using the correlation between images from different viewpoints, exploits the fact that the images to be encoded are omnidirectional. Furthermore, realizing a three-dimensional environment reproduction system requires decoding the encoded omnidirectional images.
[0046]
In view of the above, an object of the present invention is to provide an encoding method and apparatus suited to multi-viewpoint encoding of omnidirectional images, a decoding method and apparatus suited to multi-viewpoint decoding of omnidirectional images, and a recording medium storing a program that causes a computer to realize such an encoding method and such a decoding method.
[0047]
[Means for Solving the Problems]
To solve this problem, the first invention is a method for predictively encoding an omnidirectional image, comprising: a reference image encoding step of performing intra-frame encoding; a prediction step of, when predicting a similar omnidirectional image from the encoded reference image, assuming that one end of the reference image is continuous with the other end and generating a predicted omnidirectional image without a seam; and a residual encoding step of encoding the difference between the predicted omnidirectional image generated in the prediction step and the input omnidirectional image.
[0048]
The second invention is a method for predictively decoding an omnidirectional image, comprising: a reference image decoding step of performing intra-frame decoding; a prediction step of, when predicting a similar omnidirectional image from the decoded reference image, assuming that one end of the reference image is continuous with the other end and generating a predicted omnidirectional image without a seam; and a residual decoding step of decoding the difference between the predicted omnidirectional image generated in the prediction step and the input omnidirectional image.
[0049]
DETAILED DESCRIPTION OF THE INVENTION
(Example 1)
Example 1 will be described with reference to FIGS. 4, 5, 6, 7, and 8. The imaging system shown in FIG. 1 moves so as to cover the entire floor surface in a grid pattern. FIG. 4 shows the shooting points as seen looking down on the floor from above; each circle represents an omnidirectional image.
[0050]
Among the omnidirectional images, the shaded circle is a template encoded image, and the other images are template predictive encoded images that are predictively encoded with reference to the template encoded image. An omnidirectional image is a seamless image in a cylindrical shape, but is expanded into a rectangle and encoded by intra-frame encoding that is substantially the same as MPEG1 (corresponding to an I picture of the MC-DCT encoding method).
[0051]
The difference from MPEG1 is the image size, which is 3520 pixels wide × 576 pixels high.
[0052]
This processing step is shown in FIG. 5. As described in the prior art section, the image is divided into blocks of 16 × 16 pixels called macroblocks, as in the ordinary MC-DCT coding scheme, and a two-dimensional DCT is performed on the 8 × 8 pixel units obtained by further dividing each macroblock.
[0053]
This is quantized, and information is dropped within a range that does not degrade subjective image quality (missing information corresponds to quantization error). The quantized data is variable-length encoded and output as a Huffman code.
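The transform and quantization steps can be sketched as below. This is an illustration under simplifying assumptions: a single uniform quantization step `qstep` stands in for the quantization matrix an MPEG-1-style coder would use, the variable-length (Huffman) coding stage is omitted, and all names are hypothetical.

```python
import numpy as np

N = 8  # DCT block size in pixels


def dct_matrix(n=N):
    """Orthonormal DCT-II basis matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


C = dct_matrix()


def encode_block(block, qstep=16.0):
    """2-D DCT of one 8x8 block followed by uniform quantization."""
    coeff = C @ block @ C.T
    return np.round(coeff / qstep).astype(np.int32)


def decode_block(levels, qstep=16.0):
    """Dequantize and apply the inverse 2-D DCT."""
    return C.T @ (levels * qstep) @ C
```

For a flat block only the DC coefficient survives quantization, which is why smooth image regions cost few bits; the reconstruction error of any block is bounded by the quantization step.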
[0054]
The encoding steps for a template predictive encoded image are shown in FIG. 6.
[0055]
A template close to the viewpoint position is once decoded as a reference image.
[0056]
For each macroblock of the image being encoded, the displacement (motion vector) that minimizes its difference from this decoded template is calculated, and the difference is encoded with the discrete cosine transform (DCT).
[0057]
This process is shown in FIG. 7. From quantization onward, the steps are the same as in template encoding. This motion compensation differs from that of the conventional MC-DCT scheme in that motion vectors referencing positions outside the image area are permitted, as shown in FIG. 8.
[0058]
Since an omnidirectional image can be regarded as a seamless cylindrical image, the part of a reference block that protrudes beyond the right (left) edge is taken at the same offset measured from the left (right) edge of the image when the predicted image is generated.
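The wraparound fetch described above can be sketched as follows. The function name and the choice to reject vertical overshoot are assumptions made for this illustration; the text only describes wrapping in the horizontal direction.

```python
import numpy as np

MB = 16  # macroblock size in pixels


def predict_block_seamless(ref, top, left, mv):
    """Fetch a 16x16 prediction block, wrapping horizontally around the panorama."""
    dy, dx = mv
    h, w = ref.shape
    rows = np.arange(top + dy, top + dy + MB)
    if rows[0] < 0 or rows[-1] >= h:
        raise ValueError("vertical out-of-frame reference is not wrapped in this sketch")
    # Columns past the right edge re-enter at the left edge, and vice versa.
    cols = np.arange(left + dx, left + dx + MB) % w
    return ref[np.ix_(rows, cols)]
```

A block at the right edge with a rightward vector is thus predicted partly from the last columns and partly from the first columns of the reference, so motion vectors near the seam remain usable.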
[0059]
In keeping with this “seamless” motion compensation, motion vector prediction is also performed “seamlessly”.
[0060]
Example 1, which corresponds to claim 1 of the claims, has been described above.
(Example 2)
Next, Example 2, which decodes the encoded data generated in Example 1, is described.
[0061]
FIG. 9 is the reverse process of FIG. 5 and reconstructs the template encoded image.
[0062]
An MPEG1 intra decoding step capable of handling horizontal 3520 × vertical 576 pixels is executed.
[0063]
FIG. 10 is the reverse process of FIG. 6 and reconstructs a template prediction image.
[0064]
If the viewpoint position to be reproduced corresponds to a template encoded image, it is decoded according to the processing steps of FIG. 9. If it corresponds to a template predictive encoded image, the nearby template encoded image that it references is decoded first (or, if decoded images are kept rather than discarded, read from memory), and the image is then decoded according to the processing steps of FIG. 10. Cutting out an arbitrary viewing direction from the reconstructed image, as shown in FIG. 11, then reproduces the three-dimensional environment.
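Cutting out a viewing direction from the unrolled panorama can be sketched as below. This assumes that image columns map linearly to azimuth over 360 degrees and that a plain column crop is acceptable; a real viewer would typically also re-project onto a flat image plane. The function and parameter names are hypothetical.

```python
import numpy as np


def cut_view(pano, azimuth_deg, fov_deg):
    """Crop the columns covering [azimuth - fov/2, azimuth + fov/2), wrapping at 360 degrees."""
    w = pano.shape[1]
    px_per_deg = w / 360.0
    width = int(round(fov_deg * px_per_deg))
    start = int(round((azimuth_deg - fov_deg / 2.0) * px_per_deg))
    cols = np.arange(start, start + width) % w  # wrap across the seam
    return pano[:, cols]
```

Because the column indices wrap, a view centered on the seam is assembled from both ends of the decoded image.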
[0065]
In this way, omnidirectional images, which amount to an enormous volume of data, can be encoded with a small amount of code by exploiting the correlation between different viewpoints and, focusing on the characteristics of omnidirectional images, applying “seamless” motion compensation.
(Reference Examples 1 and 2)
Reference examples 1 and 2 relate to a motion estimation method for an observation system in a three-dimensional environment reproduction system.
[0066]
First, the problem addressed by Reference Examples 1 and 2 is described. Conventional Example 2, described in the prior art section, obtains the translation and rotation of the camera by matching six feature points. For an imaging system that moves on a floor, as in FIG. 1, the translation is generally constrained to horizontal movement, and the rotation likewise has a single degree of freedom about the vertical axis.
[0067]
Therefore, for estimating the three degrees of freedom in total, two for translation and one for rotation, the camera motion parameters can be estimated from the images by matching three feature points. However, the vertical component of the feature point coordinates often does not change as the camera moves.
[0068]
In other words, a slight observation error of the feature point coordinates in the vertical direction on the screen greatly affects the motion parameters of the camera.
[0069]
In reference examples 1 and 2, in view of such points, examples are provided that provide a method and apparatus for stably estimating the motion parameters of an observation system moving on the floor surface.
(Reference Example 1)
The invention shown in Reference Example 1 is a method for estimating the horizontal movement and rotation angle of a traveling observation system, comprising: Step 1 of observing, at a reference position, the direction angles of N locations (N ≥ 6) around the observation system; Step 2 of observing the direction angles of the same N locations again at a different position 1; Step 3 of observing the direction angles of the same N locations again at a further different position 2; and a step of obtaining, from the resulting 3N direction angles, position 1 and position 2 relative to the reference position and the rotation of the observation system about the vertical line at those positions.
[0070]
The observation system motion estimation method will be described with reference to FIGS. 12, 13, and 14. This reference example shows a method for stably estimating the motion parameters of an imaging system moving on a floor.
[0071]
First, the principle of motion parameter estimation is described. As shown in FIG. 12, as the camera travels, images with different appearances are captured at different observation positions. What is to be found is a set of parameters with three degrees of freedom in total: the translation component of the camera (two degrees of freedom on the floor) and the rotation component about the vertical direction (one degree of freedom).
[0072]
The features whose appearance changes most with movement on the floor are the vertical edges in the scene (in principle, feature points as well as vertical edges can be used; any feature that can be converted into the direction angle described later will do).
[0073]
In this embodiment, the camera motion parameter is estimated based on only the change in the horizontal appearance. FIG. 13 is an overhead view of the position of the imaging apparatus, and it is assumed that the same vertical edge is newly observed at positions B and C with reference to the position A at the three viewpoint positions.
[0074]
The cross at each position is a local coordinate determined at the center of the imaging apparatus.
[0075]
With respect to the coordinate system defined at the reference position A, let the camera motion parameters for the move to B be (Tx^B, Ty^B, ω^B) and those for the move to C be (Tx^C, Ty^C, ω^C).
[0076]
There are six unknowns. To obtain them, the azimuth angle of each vertical edge is measured at each position. One instance of Equation 9 holds for each observation.
[0077]
In Equation 9, p is the position identifier, which in this case is either B or C. i is the vertical edge identifier, with 0 ≤ i < N; the reason that N ≥ 6 is explained later.
[0078]
[Equation 9]

[0079]
Equation 9 is a constraint equation meaning that, if the distance di^A of vertical edge i as seen from the reference coordinates A and the camera motion parameters up to position p are known, the azimuth angle at viewpoint A can be calculated.
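Since the image of Equation 9 is not reproduced in this text, the following is only a plausible planar model of the relation it describes, not the patent's formula: an edge seen at azimuth `theta_a` and distance `d_a` from A is re-expressed in the frame of a position p reached by translating (tx, ty) and rotating by omega about the vertical. All names and sign conventions are assumptions.

```python
import numpy as np


def predicted_azimuth(theta_a, d_a, tx, ty, omega):
    """Azimuth (radians) of an edge as seen from position p under planar motion."""
    # Edge position relative to p, still expressed along A's axes.
    x = d_a * np.cos(theta_a) - tx
    y = d_a * np.sin(theta_a) - ty
    # Subtract the rotation of p's frame about the vertical axis.
    return np.arctan2(y, x) - omega
```

In this model, scaling the distance and the translation by the same factor leaves the predicted azimuth unchanged, which is consistent with the earlier remark that translation is recovered only as a ratio to the feature distances.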
[0080]
Suppose the camera motion parameters from the reference position to moving point B were to be obtained by matching vertical edges between only two viewpoints (for example, reference point A and moving point B). Each observation adds one constraint equation, but it also adds one new unknown parameter di^A, so the unknowns cannot be determined.
[0081]
Therefore, with three observation positions (in this case, reference point A and moving points B and C), two constraint equations are obtained per observed feature, and 12 constraint equations are obtained by observing 6 features. There are also 12 unknown parameters (6 camera motion parameters and 6 distances), so the unknown parameters are determined.
[0082]
More generally, all unknown parameters can be obtained if 6 or more features are observed.
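The counting argument above can be checked with a short Python sketch. The function name `count_constraints` and its `n_moved_positions` parameter are hypothetical; the counts themselves follow the text (one constraint per feature per moved position, three motion parameters per moved position, one distance per feature).

```python
def count_constraints(n_features, n_moved_positions=2):
    """Return (constraints, unknowns) for the counting argument in the text."""
    # One constraint equation per feature per moved position (B, C, ...).
    constraints = n_moved_positions * n_features
    # (Tx, Ty, omega) per moved position plus one distance per feature.
    unknowns = 3 * n_moved_positions + n_features
    return constraints, unknowns

for n in range(4, 9):
    c, u = count_constraints(n)
    status = "determined" if c >= u else "underdetermined"
    print(f"N={n}: {c} constraints, {u} unknowns, {status}")
```

With a single moved position the unknowns always outnumber the constraints by three, which matches the two-viewpoint case described above.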
[0083]
Equation 9 is rewritten as Equation 10.
[0084]
[Equation 10]

[0085]
Here,
[0086]
[Equation 11]

[0087]
[Equation 12]

[0088]
[Equation 13]

[0089]
hold.
[0090]
Here, the parameters to be obtained are represented by Equation 14.
[0091]
[Equation 14]

[0092]
From the relations so far, once Ψ is obtained, the camera motion between viewpoint positions A, B, and C is known.
[0093]
Letting the observation angles at the reference position A be given by Equation 15, the observation angles at viewpoint positions B and C are derived from Ψ and Equation 15 as Equation 16.
[0094]
[Equation 15]

[0095]
[Equation 16]

[0096]
The above are values calculated from the motion parameters and the viewpoint positions, whereas the actually observed angles are denoted by Equation 17.
[0097]
[Equation 17]

[0098]
Here,
[0099]
[Equation 18]

[0100]
[Equation 19]

[0101]
hold.
[0102]
Here, Equation 16, which is obtained from Ψ and Equation 15, is rewritten as Equation 20 to simplify the subsequent differentiation (see Equation 10). Unless otherwise specified, Equation 20 is a function of Ψ and the observation angles at reference point A (Equation 15).
[0103]
[Equation 20]

[0104]
Correspondingly, the notation for the observation angles at observation positions B and C is changed from Equation 17 to Equation 21.
[0105]
[Equation 21]

[0106]
The squared error between the observation angles at points B and C predicted by calculation and those actually measured is then expressed by Equation 22.
[0107]
[Equation 22]

[0108]
The parameter Ψ that minimizes Equation 22 is sought. It can be obtained by iterating Equation 23 within the framework of the Levenberg-Marquardt method described in the prior art.
[0109]
[Equation 23]

[0110]
In Equation 23, H is an approximate Hessian matrix (of the same form as Equations 7 and 8), I is the identity matrix, and ∇e is the gradient.
[0111]
The differentiation needed to obtain the Hessian matrix and the gradient is carried out with respect to the unknown parameters of Equation 14.
[0112]
The above processing steps are summarized as shown in FIG. This is referred to as Reference Example 1.
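The estimation procedure of Reference Example 1 can be sketched in Python with NumPy on synthetic data. This is a minimal sketch under stated assumptions, not the patent's implementation: the equations are not reproduced in this text, so the planar bearing model in `predict`, the parameter layout of `psi`, the finite-difference Jacobian, and the damping schedule are all assumptions, and the update uses the standard Levenberg-Marquardt form with H approximated by JᵀJ.

```python
import numpy as np

def predict(psi, theta_a):
    """Predicted azimuths at B and C (assumed planar model).

    psi = [TxB, TyB, wB, TxC, TyC, wC, d_0, ..., d_{N-1}] (assumed layout).
    """
    d = psi[6:]
    x, y = d * np.cos(theta_a), d * np.sin(theta_a)  # edges in A's frame
    angles = [np.arctan2(y - ty, x - tx) - w
              for tx, ty, w in (psi[0:3], psi[3:6])]
    return np.concatenate(angles)

def residual(psi, theta_a, observed):
    """Predicted minus observed angles, wrapped into [-pi, pi]."""
    diff = predict(psi, theta_a) - observed
    return np.arctan2(np.sin(diff), np.cos(diff))

def jacobian(psi, theta_a, observed, h=1e-6):
    """Central-difference Jacobian of the residual (a simplification)."""
    cols = []
    for k in range(psi.size):
        step = np.zeros_like(psi)
        step[k] = h
        cols.append((residual(psi + step, theta_a, observed)
                     - residual(psi - step, theta_a, observed)) / (2 * h))
    return np.stack(cols, axis=1)

def levenberg_marquardt(psi, theta_a, observed, iterations=100, lam=1e-3):
    err = np.sum(residual(psi, theta_a, observed) ** 2)
    for _ in range(iterations):
        r = residual(psi, theta_a, observed)
        J = jacobian(psi, theta_a, observed)
        H = J.T @ J          # Gauss-Newton approximation of the Hessian
        grad = J.T @ r       # gradient of half the squared error
        trial = psi - np.linalg.solve(H + lam * np.eye(psi.size), grad)
        trial_err = np.sum(residual(trial, theta_a, observed) ** 2)
        if trial_err < err:  # accept the step and reduce damping
            psi, err, lam = trial, trial_err, max(lam / 10.0, 1e-9)
        else:                # reject the step and increase damping
            lam *= 10.0
    return psi, err

# Synthetic scene: six edges, all values invented for illustration.
theta_a = np.array([0.3, 1.2, 2.0, 3.0, 4.1, 5.3])
d_true = np.array([4.0, 3.0, 5.0, 3.5, 4.5, 6.0])
psi_true = np.concatenate(([1.0, 0.2, 0.1, 0.5, 1.0, -0.2], d_true))
observed = predict(psi_true, theta_a)

# Start from a slightly perturbed guess and refine it.
psi0 = psi_true + 0.05 * np.cos(np.arange(psi_true.size))
psi_hat, err = levenberg_marquardt(psi0, theta_a, observed)
print("squared error:", err)
print("rotations at B and C:", psi_hat[2], psi_hat[5])
```

Because bearings alone do not fix the overall scale, the translations and distances in `psi_hat` are meaningful only as ratios; the damping term keeps the linear system solvable despite that ambiguity.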
(Reference Example 2)
The invention of Reference Example 2 is an apparatus for estimating the horizontal movement and rotation angle of a traveling observation system. It is an observation system motion estimation device comprising: means for observing N direction angles (N ≧ 6) of features existing around the observation system; a memory that holds the results of operating said means three times at different observation positions (a reference position, position 1, and position 2); and means for obtaining, from the 3N direction angles stored for the three observation positions, the horizontal movement to position 1 and position 2 relative to the reference position and the rotation of the observation system about a vertical line at those positions.
[0113]
Reference Example 2 of the observation system motion estimation device will be described with reference to FIG. 15, a block diagram in which 101 is a feature extraction unit that extracts vertical edges from an omnidirectional image, 102 is a feature tracking unit that obtains vertical edge correspondences, 103 is a direction extracting unit that converts each edge into the observation angle shown in FIG., 104 to 106 are memories for storing the angles at the reference point A and the moving points B and C, and 107 is a camera motion calculating unit that performs the iterative calculation of Equation 23 on the above observations and obtains the parameters of Equation 14.
[0114]
With this apparatus, by storing the azimuth angles of six line segments appearing in the images at three points during self-propelled travel, the distances to the line segments (for example, the edges of a table or a bookshelf), the translational components, and the rotation angles about the vertical line can be calculated. Examples actually observed are shown in FIGS.
[0115]
Among the unknown parameters, all components related to distance (in other words, components other than rotation) are obtained as ratios.
[0116]
In this example, the absolute value of Tx^B in Equation 14 is determined from the travel distance measured by the wheels of the mobile camera, and the remaining absolute distances are obtained by proportional calculation.
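The proportional calculation can be illustrated with a small Python sketch. The parameter layout `[TxB, TyB, wB, TxC, TyC, wC, d_0, ...]` and the function name `apply_absolute_scale` are assumptions made for illustration, since Equation 14 is not reproduced here; the sketch also assumes the estimated Tx^B is nonzero.

```python
def apply_absolute_scale(psi_relative, tx_b_measured):
    """Rescale a solution that is known only up to a common scale factor.

    psi_relative:  [TxB, TyB, wB, TxC, TyC, wC, d_0, ...] (assumed layout)
    tx_b_measured: absolute Tx^B obtained from the wheel measurement
    """
    scale = tx_b_measured / psi_relative[0]
    rotation_indices = {2, 5}  # rotation angles are not affected by scale
    return [value if index in rotation_indices else value * scale
            for index, value in enumerate(psi_relative)]

# Hypothetical relative solution with one distance, rescaled so TxB = 1.0.
print(apply_absolute_scale([2.0, 1.0, 0.1, 1.0, 3.0, -0.2, 8.0], 1.0))
```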
[0117]
In this reference example, vertical line segments appearing in an omnidirectional image are used as features. In principle, however, the parameters of Equation 14 can be obtained as long as the presence of some feature of a three-dimensional object (such as an edge or a corner) can be converted into an azimuth angle on the horizontal plane as seen from the observation position, as shown in FIG.
[0118]
Therefore, the imaging target need not be an omnidirectional image, and the image feature to be used need not be limited to a line segment.
[0119]
For example, using an ordinary camera, the camera motion parameters can be estimated by assuming a line on the projection plane parallel to the lens center and tracking the movement of features along that line.
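For an ordinary camera, one way to turn a feature position on such a line into an azimuth angle is the pinhole relation sketched below in Python. The pinhole model, the parameters `cx` (principal point column) and `focal_px` (focal length in pixels), and the sign convention are assumptions for illustration and are not specified in the text.

```python
import math

def pixel_to_azimuth(u, cx, focal_px):
    """Azimuth (radians) of a feature at column u under a pinhole model."""
    # Horizontal offset from the optical axis divided by the focal length
    # gives the tangent of the azimuth angle.
    return math.atan2(u - cx, focal_px)

# A feature one focal length to the right of the image center.
print(pixel_to_azimuth(u=820.0, cx=320.0, focal_px=500.0))
```

Angles obtained this way could then be used in place of the azimuth angles read from an omnidirectional image, within the camera's limited field of view.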
[0120]
Furthermore, the observation means need not be an image. Because only azimuth angles are observed, this method and apparatus can also be applied using laser light irradiation.
[0121]
Lastly, six line segments were observed in this example, but seven or more may be observed. Each additional line segment adds more constraints than unknown parameters (the distance to that line segment), so the parameters can still be obtained.
[0122]
Note that the present invention can be implemented as a software program that is stored in a memory and executed. In that case, the program can also be distributed on a recording medium such as a CD-ROM or over a communication line such as the Internet.
[0123]
【Effect of the Invention】
As described above, according to the image encoding method of the present invention (claim 1), when the omnidirectional image exemplified in FIG. 1 is predictively encoded, performing motion compensation in which the right end and the left end are treated as connected allows encoding to be performed more efficiently than with ordinary predictive encoding such as MPEG1.
[0124]
Further, according to the image decoding method of the present invention (claim 2), when data obtained by predictively encoding the omnidirectional image exemplified in FIG. 1 is decoded, performing motion compensation in which the right end and the left end are treated as connected allows decoding to be performed more efficiently than with ordinary predictive decoding such as MPEG1.
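The idea of motion compensation with the left and right ends connected can be illustrated by a short Python sketch using NumPy. The function name `fetch_block_wrapped` and the tiny array are hypothetical; the sketch wraps only horizontally and assumes the block lies within the image vertically.

```python
import numpy as np

def fetch_block_wrapped(reference, top, left, height, width):
    """Fetch a reference block, treating the left and right ends as joined."""
    image_width = reference.shape[1]
    # Columns that fall past either end continue from the opposite end.
    cols = np.arange(left, left + width) % image_width
    return reference[top:top + height][:, cols]

# An 8-column reference image; the block starts 2 columns before the right end.
reference = np.arange(16).reshape(2, 8)
block = fetch_block_wrapped(reference, top=0, left=6, height=2, width=4)
print(block)
```

In this sketch the first two columns of the block come from inside the reference image and the last two come from its opposite end, so a block that crosses the image boundary still has a usable prediction.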
[Brief description of the drawings]
FIG. 1 is a conceptual diagram of a three-dimensional environment reproduction system.
FIG. 2 is a diagram showing a conventional example of multi-view image coding.
FIG. 3 is a diagram of camera motion parameter estimation by matching six feature points on an image.
FIG. 5 is a diagram showing a template encoding processing procedure.
FIG. 6 is a diagram showing a template predictive encoding processing procedure.
FIG. 7 is a conceptual diagram of motion compensation for an omnidirectional image.
FIG. 8 is a conceptual diagram of out-of-boundary motion compensation for omnidirectional images.
FIG. 9 is a diagram showing a template decoding processing procedure.
FIG. 10 is a diagram showing a template predictive decoding processing procedure.
FIG. 12 is a conceptual diagram of self-running robot position estimation using 6 vertical lines.
FIG. 13 is an explanatory diagram of a direction angle and a coordinate system of the self-running robot.
FIG. 15 is a block diagram of an example of an observation system motion estimation device.
FIG. 16 is a diagram showing processing result example 1 of an example of an observation system motion estimation device.
FIG. 17 is a diagram showing processing result example 2 of an example of an observation system motion estimation device.
[Explanation of symbols]
101 Feature extraction unit
102 Feature tracking unit
103 6-direction extraction unit
104 Angle memory 1
105 Angle memory 2
106 Angle memory 3
107 Camera motion calculation unit

Claims

A method for predictively encoding an omnidirectional image, comprising:
a reference image encoding step of performing intraframe encoding;
a prediction step of seamlessly generating a predicted omnidirectional image, when predicting a similar omnidirectional image from an encoded reference image, on the assumption that one end of the reference image is continuous with the other end; and
a residual encoding step of encoding a difference between an input omnidirectional image and the predicted omnidirectional image generated by the prediction step,
wherein, in the prediction step, for a macroblock that partially extends beyond an end of the reference image, motion compensation is performed using a first portion, which is the macroblock portion inside the reference image,
and a third portion, which is a macroblock portion near the other end of the reference image that can be predicted to carry the information content of a second portion, the second portion being the macroblock portion that extends beyond the end of the reference image.

A method for predictively decoding an omnidirectional image, comprising:
a reference image decoding step of performing intraframe decoding;
a prediction step of seamlessly generating a predicted omnidirectional image, when predicting a similar omnidirectional image from a decoded reference image, on the assumption that one end of the reference image is continuous with the other end; and
a residual decoding step of decoding a difference between an input omnidirectional image and the predicted omnidirectional image generated by the prediction step,
wherein, in the prediction step, for a macroblock that partially extends beyond an end of the reference image, motion compensation is performed using a first portion, which is the macroblock portion inside the reference image,
and a third portion, which is a macroblock portion near the other end of the reference image that can be predicted to carry the information content of a second portion, the second portion being the macroblock portion that extends beyond the end of the reference image.

An apparatus for predictively encoding an omnidirectional image, comprising:
reference image encoding means for performing intraframe encoding;
prediction means for seamlessly generating a predicted omnidirectional image, when predicting a similar omnidirectional image from an encoded reference image, on the assumption that one end of the reference image is continuous with the other end; and
residual encoding means for encoding a difference between an input omnidirectional image and the predicted omnidirectional image generated by the prediction means,
wherein the prediction means, for a macroblock that partially extends beyond an end of the reference image, performs motion compensation using a first portion, which is the macroblock portion inside the reference image,
and a third portion, which is a macroblock portion near the other end of the reference image that can be predicted to carry the information content of a second portion, the second portion being the macroblock portion that extends beyond the end of the reference image.

An apparatus for predictively decoding an omnidirectional image, comprising:
reference image decoding means for performing intraframe decoding;
prediction means for seamlessly generating a predicted omnidirectional image, when predicting a similar omnidirectional image from a decoded reference image, on the assumption that one end of the reference image is continuous with the other end; and
residual decoding means for decoding a difference between an input omnidirectional image and the predicted omnidirectional image generated by the prediction means,
wherein the prediction means, for a macroblock that partially extends beyond an end of the reference image, performs motion compensation using a first portion, which is the macroblock portion inside the reference image,
and a third portion, which is a macroblock portion near the other end of the reference image that can be predicted to carry the information content of a second portion, the second portion being the macroblock portion that extends beyond the end of the reference image.

A computer-readable recording medium recording a program for causing a computer to implement the image encoding method according to claim 1.

A computer-readable recording medium storing a program for causing a computer to realize the image decoding method according to claim 2.