JP2004007266A

JP2004007266A - Image encoder and its method, image decoder and its method, and program and recording medium

Info

Publication number: JP2004007266A
Application number: JP2002160465A
Authority: JP
Inventors: Takahiro Fukuhara; 福原　隆浩; Eizaburo Itakura; 板倉　英三郎
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2002-05-31
Filing date: 2002-05-31
Publication date: 2004-01-08

Abstract

PROBLEM TO BE SOLVED: To efficiently encode omnidirectional images obtained from multi-camera images.

SOLUTION: In an image encoding/decoding system 1, a JPEG-2000 encoding part 12 regards the omnidirectional images taken by a plurality of cameras and supplied from a camera video image input part 11 as tile images and performs tile encoding, which enables random-access decoding of an arbitrary camera image. The JPEG-2000 encoding part 12 can also generate one image by collecting pixel columns or pixel rows at the same position in a plurality of camera images, which increases the correlation between adjacent pixels and improves the compression ratio.

COPYRIGHT: (C)2004,JPO

Description

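[0001]
[Technical field to which the invention belongs]
The present invention relates to an image encoding apparatus and method for encoding images from a plurality of cameras by, for example, the JPEG-2000 system, an image decoding apparatus and method for decoding the compressed images, and a program and a recording medium.
[0002]
[Prior art]
A representative conventional image compression method is the JPEG (Joint Photographic Experts Group) method standardized by ISO (International Organization for Standardization). It uses the discrete cosine transform (DCT) and is known to provide good encoded and decoded images when a relatively large number of bits is allocated. However, when the number of coded bits is reduced beyond a certain point, the block distortion peculiar to DCT becomes conspicuous and the degradation becomes subjectively noticeable.
[0003]
Meanwhile, in recent years there has been active research on methods that divide an image into a plurality of frequency bands with a filter bank, a combination of high-pass and low-pass filters, and encode each band separately. Among them, wavelet transform coding is regarded as a promising new technology to replace DCT because, unlike DCT, it does not suffer from conspicuous block distortion at high compression.
[0004]
[Problems to be solved by the invention]
At present, electronic still cameras and video camcorders use the JPEG method described above or the MPEG (Moving Picture Experts Group) method, both of which use DCT as the transform.
[0005]
MPEG uses intra-frame coded pictures (I pictures) and inter-frame coded pictures (P pictures: unidirectional prediction; B pictures: bidirectional prediction) to improve compression efficiency. Because the decoder cannot decode P and B pictures without the I picture, a delay arises when encoding omnidirectional images made up of multi-camera images, which are becoming more common in environments such as immersive image communication and games. In addition, a large number of frame memories must be provided for motion compensation.
[0006]
For this reason, still-image-based codecs are widely used: DV in consumer camcorders, Motion-JPEG and DV in surveillance codecs, Motion-JPEG and DV in professional non-linear editing systems, and Motion-JPEG in the video codecs of digital cameras. The reasons include low delay, easy later editing because the coding is still-image based, and ease of reuse because each image is independent.
[0007]
Products based on the wavelet transform are nevertheless expected to appear on the market, and research institutions are actively studying ways to improve the efficiency of such coding methods. In fact, a standardization recommendation was issued in January 2001 for the JPEG-2000 system (developed by ISO/IEC JTC1 SC29/WG1, the same body as JPEG), which is expected to become the next-generation international still-image standard and the successor to JPEG. JPEG-2000 adopts the wavelet transform in place of the DCT of existing JPEG as its basic transform. Besides achieving higher compression than JPEG, it offers a rich set of functions that JPEG lacks, such as progressive decoding, error resilience, and lossless as well as lossy compression and decompression.
[0008]
In addition, the Motion-JPEG2000 system for moving images has also been standardized. It encodes each image constituting a moving image as one of a sequence of JPEG-2000 images.
[0009]
The present invention has been proposed in view of these circumstances. Its object is to provide an image encoding apparatus and method that efficiently encode a plurality of camera images captured by a plurality of cameras using the JPEG-2000 or Motion-JPEG2000 system, which are still-image-based coding systems using the wavelet transform; an image decoding apparatus and method that decode the compressed images; a program that causes a computer to execute the image encoding or image decoding process; and a computer-readable recording medium on which the program is recorded.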
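[0010]
[Means for solving the problems]
To achieve the above object, the image encoding apparatus and method according to the present invention generate an overall image whose components are a plurality of camera images captured by a plurality of adjacent cameras, and encode the overall image using each component camera image as the unit of encoding.
[0011]
To achieve the above object, the image decoding apparatus and method according to the present invention decode an encoded codestream that was generated by encoding an overall image, whose components are a plurality of camera images captured by a plurality of adjacent cameras, using each component camera image as the unit of encoding, and display the decoded camera images adjacent to one another.
[0012]
In such an image encoding apparatus and method and image decoding apparatus and method, each camera image is regarded as a component of one image, each component camera image is encoded as a unit of encoding, and the decoded camera images are displayed in their original adjacent arrangement.
[0013]
To achieve the above object, the image encoding apparatus and method according to the present invention also arrange the pixels within a plurality of camera images captured by a plurality of adjacent cameras so that adjacently placed camera images are symmetric to each other, and encode an overall image generated by arranging the rearranged camera images in the horizontal and vertical directions.
[0014]
To achieve the above object, the image decoding apparatus and method according to the present invention decode an encoded codestream that was generated by encoding such an overall image, that is, one in which the pixels within a plurality of camera images captured by a plurality of adjacent cameras are arranged so that adjacently placed camera images are symmetric to each other and the rearranged camera images are arranged in the horizontal and vertical directions. They divide the decoded image horizontally and vertically to generate the plurality of camera images and, where the pixels of the camera images have been arranged so that adjacently placed images are symmetric, restore the original pixel arrangement.
[0015]
In such an image encoding apparatus and method and image decoding apparatus and method, the camera images are flipped so that pixel values do not differ greatly across adjacent boundaries, joined horizontally and vertically so as to be symmetric, and encoded as one overall image. At decoding, the decoded image is divided horizontally and vertically to generate the original camera images, and where the pixels of the camera images have been arranged so that adjacently placed images are symmetric, the original pixel arrangement is restored.
[0016]
To achieve the above object, the image encoding apparatus and method according to the present invention also encode an overall image generated by grouping together the pixel columns or pixel rows at the same position in a plurality of camera images captured by a plurality of adjacent cameras.
[0017]
To achieve the above object, the image decoding apparatus and method according to the present invention decode an encoded codestream that was generated by encoding an overall image in which the pixel columns or pixel rows at the same position in a plurality of camera images captured by a plurality of adjacent cameras are grouped together, and generate the plurality of camera images by distributing the pixel columns or pixel rows of the decoded image back to the same position in each camera image.
[0018]
In such an image encoding apparatus and method and image decoding apparatus and method, the pixel columns or pixel rows at the same position in the camera images are grouped and encoded as one overall image. At decoding, the original camera images are generated by distributing the pixel columns or pixel rows of the decoded image back to the same position.
[0019]
To achieve the above object, the image encoding apparatus and method according to the present invention also intra-frame encode a first camera image captured by one or more first cameras among a plurality of adjacent cameras, decode the encoded first camera image, and encode a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera.
[0020]
Here, the image encoding apparatus and method encode a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera.
[0021]
To achieve the above object, the image decoding apparatus and method according to the present invention receive a first encoded codestream, generated by intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras, and a second encoded codestream, generated by decoding the encoded first camera image and encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera. They decode the first encoded codestream to generate the first camera image, decode the second encoded codestream to generate the first difference image, and combine the first difference image with the first camera image to generate the second camera image.
[0022]
Here, the image decoding apparatus and method decode a third encoded codestream, generated by encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera, to generate the second difference image, and combine the second difference image with the second camera image to generate the third camera image.
[0023]
In such an image encoding apparatus and method and image decoding apparatus and method, the camera images are divided into those that are intra-frame encoded and those that are encoded as difference images, and the difference between adjacent camera images, whose content is similar, is taken and encoded. At decoding, the decoded difference image and the decoded adjacent camera image are combined to generate the original camera image.
[0024]
The program according to the present invention causes a computer to execute the image encoding process or image decoding process described above, and the recording medium according to the present invention is a computer-readable medium on which such a program is recorded.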
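[0025]
[Embodiments of the invention]
Specific embodiments to which the present invention is applied are described in detail below with reference to the drawings. In these embodiments, the present invention is applied to an image encoding apparatus and method that encode images from a plurality of cameras by the JPEG-2000 system, and to an image decoding apparatus and method that decode the compressed images.
[0026]
(1) First embodiment
This embodiment applies the tile encoding function of the JPEG-2000 standard, in which one image is divided into a plurality of rectangular regions called tiles and encoded: each of the omnidirectional images from a plurality of cameras is regarded as a tile image and tile-encoded. Omnidirectional images from a plurality of cameras fall broadly into two types: those obtained by arranging cameras in a ring and capturing outward from the center of the ring, as shown in FIG. 1(A), and those obtained by arranging cameras in a ring and capturing toward the center of the ring, as shown in FIG. 1(B). As shown in FIG. 2, for example, the camera images of the first to sixteenth cameras obtained in this way are each regarded as a tile image and tile-encoded.
[0027]
Before describing the image encoding/decoding system of this embodiment, the wavelet transform and tile encoding in the JPEG-2000 standard are described first.
[0028]
In this wavelet transform, the low-frequency component is normally transformed repeatedly, as shown in FIG. 3, because most of the energy of an image is concentrated in the low-frequency component. This can also be seen from the way the subbands are formed as the decomposition level advances from level 1 in FIG. 4(A) to level 3 in FIG. 4(B).
[0029]
In FIG. 3 the number of wavelet transform levels is 2, which yields a total of seven subbands. The first filtering pass halves the horizontal size X_SIZE and the vertical size Y_SIZE, generating the four subbands LL-1, LH-1, HL-1, and HH-1. The second filtering pass further divides LL-1, generating the four subbands LL-2, LH-2, HL-2, and HH-2. In FIG. 3, L and H denote low and high frequency, respectively, and the number after them denotes the decomposition level. For example, LH-1 is the level-1 subband that is low-frequency in the horizontal direction and high-frequency in the vertical direction.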
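[0030]
Tile encoding in the JPEG-2000 standard is described next. FIG. 5 shows the tile encoding parameters defined in the JPEG-2000 standard. As shown in FIG. 5, one image is divided into, for example, 20 tiles with tile indices T0 to T19. The horizontal size of each tile is given by XTsiz and the vertical size by YTsiz. In this embodiment, (XTsiz, YTsiz) are the horizontal and vertical sizes of a camera image. In the JPEG-2000 standard, the positions of the image and the tiles are expressed on coordinate axes called the reference grid, which makes it easy to determine the image sampling positions when the resolution is converted. Because each tile must be encodable and decodable on its own, pixels cannot be referenced across tile boundaries. The tile position is specified by the position of the top-left corner of the tile relative to the origin of the reference grid, and this offset is defined as XTOsiz and YTOsiz in FIG. 5.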
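As a rough illustration of the tile parameters described above, the following sketch maps a tile index to its rectangle on the reference grid. It assumes raster-order tile numbering and a layout of five tiles across; the 640x480 camera size and the function name `tile_rect` are hypothetical and are not taken from FIG. 5 or from any JPEG-2000 library.

```python
def tile_rect(index, tiles_across, xtsiz, ytsiz, xtosiz=0, ytosiz=0):
    """Return (x0, y0, x1, y1) of tile `index` on the reference grid.

    Assumes tiles are numbered left to right, then top to bottom,
    and that (xtosiz, ytosiz) is the offset of the first tile.
    """
    row, col = divmod(index, tiles_across)
    x0 = xtosiz + col * xtsiz
    y0 = ytosiz + row * ytsiz
    return (x0, y0, x0 + xtsiz, y0 + ytsiz)


# 20 tiles T0..T19, each the size of one camera image (assumed 640x480).
rects = [tile_rect(i, tiles_across=5, xtsiz=640, ytsiz=480) for i in range(20)]
```

Because every camera image occupies exactly one tile, looking up a camera's region reduces to this index arithmetic.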
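[0031]
In JPEG-2000 tile encoding, when the wavelet filter would extend into a neighboring tile during the transform of a tile's image, the pixels of the neighboring tile are not used. Instead, the pixels inside the tile being encoded are symmetrically extended, as shown in FIG. 6, and the wavelet transform is applied. In the example of FIG. 6, pixels b and c are symmetrically extended in the horizontal direction and pixels e and f in the vertical direction, beyond the tile being encoded. The JPEG-2000 standard provides a reversible 5×3 wavelet filter and an irreversible 9×7 wavelet filter, and defines for each the number of pixels to extend symmetrically from a tile boundary, so this embodiment may simply follow those definitions.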
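The symmetric extension described above can be sketched for a single row of samples as follows. This is a minimal illustration, assuming whole-sample mirroring in which the boundary sample itself is not repeated; the function name and the choice of `n` are hypothetical, and the actual extension length depends on the filter in use.

```python
def symmetric_extend(samples, n):
    """Mirror n samples beyond each end of `samples`.

    The boundary sample is the mirror axis and is not duplicated,
    so [a, b, c, d] with n=2 becomes [c, b, a, b, c, d, c, b].
    """
    if n >= len(samples):
        raise ValueError("n must be smaller than the number of samples")
    left = [samples[i] for i in range(n, 0, -1)]
    right = [samples[-1 - i] for i in range(1, n + 1)]
    return left + list(samples) + right


extended = symmetric_extend(["a", "b", "c", "d"], 2)
```

Only samples from inside the tile are used, so the tile stays decodable without its neighbors.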
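[0032]
FIG. 7 shows the schematic configuration of the image encoding/decoding system of the first embodiment. As shown in FIG. 7, the image encoding/decoding system 1 consists of an image encoding apparatus 10, which has a camera video input unit 11, a JPEG-2000 encoding unit 12, and a codestream recording unit 13, and an image decoding apparatus 20, which has a JPEG-2000 decoding unit 21 and an image display unit 22. The image encoding apparatus 10 and the image decoding apparatus 20 are connected via a network N, for example the Internet.
[0033]
In the image encoding apparatus 10, the camera video input unit 11 obtains omnidirectional images from the plurality of cameras and supplies this camera image information D10 to the JPEG-2000 encoding unit 12.
[0034]
The JPEG-2000 encoding unit 12 regards the camera image information D10 supplied from the camera video input unit 11 as tile images, as described above, and encodes it according to the JPEG-2000 system. The JPEG-2000 encoding unit 12 transmits the resulting encoded codestream D11 to the image decoding apparatus 20 via the network N, or supplies it to the codestream recording unit 13 to be recorded on a recording medium. When there are a plurality of camera images, there are as many encoded codestreams D11 as cameras.
[0035]
The codestream recording unit 13 records the encoded codestream D11 supplied from the JPEG-2000 encoding unit 12 on a recording medium (not shown) such as a hard disk or a memory card.
[0036]
The 16 camera images shown in FIG. 2 are the images for a single time instant. For a moving image of 30 frames per second, as in ordinary television video, the JPEG-2000 encoding unit 12 therefore encodes the image of FIG. 2 at 30 frames per second, as shown in FIG. 8.
[0037]
In the image decoding apparatus 20, the JPEG-2000 decoding unit 21 decodes, according to the JPEG-2000 system, the encoded codestream D11 transmitted from the image encoding apparatus 10 or supplied from it via a recording medium. The encoded codestream D11 contains parameters describing whether tile encoding is used and the tile size, and the JPEG-2000 decoding unit 21 decodes in accordance with them. Because each tile is encoded completely independently of its neighbors, random-access decoding is possible. The JPEG-2000 decoding unit 21 supplies the resulting decoded images D20 to the image display unit 22.
[0038]
The image display unit 22 displays the decoded images D20 supplied from the JPEG-2000 decoding unit 21. Because there is one decoded image D20 per camera, the image display unit 22 displays them in the arrangement shown in FIG. 1.
[0039]
Thus, in the first embodiment, regarding each camera image as a tile of one image and performing tile encoding enables random-access decoding: the encoded codestream of any camera image can be extracted and decoded, which greatly improves operability. In an Internet environment, for example, the client can decode only the camera image at the viewpoint the viewer is actually looking at. This is efficient because not all of the camera images need to be decoded.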
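[0040]
(2) Second embodiment
The image decoding apparatus 20 of the first embodiment may not be fast enough. When the number of cameras is large, for example, decoding all of the images in real time requires a fairly large LSI even when implemented in hardware, so an inexpensive codec LSI may lack the performance for real-time decoding. The second embodiment described below therefore actively exploits the scalability of the JPEG-2000 encoded codestream. Scalability here means that the scale can be changed freely, that is, that a single encoded codestream can be decoded at a freely chosen resolution or image quality.
[0041]
The characteristics of the JPEG-2000 encoded codestream and the specific means of achieving scalability are as follows. As described above, the JPEG-2000 system encodes an image using the wavelet transform. Because the low-frequency component is decomposed hierarchically by resolution, as shown in FIG. 3, the encoded codestream inherently has a multi-resolution structure.
[0042]
The JPEG-2000 system also employs bit-plane coding, encoding from the MSB (most significant bit) to the LSB (least significant bit). Consequently, as shown in FIG. 9, the encoding process can generate an encoded codestream that is layered in the direction from MSB to LSB.
[0043]
The JPEG-2000 system generates the encoded codestream in units called packets. By specification, one packet is assigned to each resolution level. In the example of FIG. 9, the wavelet transform is carried out to level 3, so there are four packets per layer.
[0044]
Using these characteristics, when the performance of the image decoding apparatus 20 is insufficient, for example, it is effective to decode and display only the low-frequency components in FIG. 9 (Packet-0, Packet-4, and Packet-8). The horizontal and vertical sizes are then one eighth of those of the original image, so the processing load is greatly reduced compared with decoding the encoded codestream at the original size, and decoded images for all of the cameras can be generated in real time.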
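The packet selection described above can be sketched as index arithmetic. The sketch assumes that packets are numbered layer by layer with four resolution levels per layer, so that Packet-0, Packet-4, and Packet-8 are the lowest-resolution packets of three layers; this numbering is inferred from the packet names in the text, and the function names are hypothetical.

```python
def packets_for(max_resolution, num_layers, num_resolutions=4):
    """Indices of the packets needed to decode up to `max_resolution`
    (0 = lowest) using the first `num_layers` quality layers."""
    return [layer * num_resolutions + r
            for layer in range(num_layers)
            for r in range(max_resolution + 1)]


def downscale_factor(max_resolution, num_resolutions=4):
    """Factor by which width and height shrink at `max_resolution`."""
    return 2 ** (num_resolutions - 1 - max_resolution)


low_res_packets = packets_for(max_resolution=0, num_layers=3)
first_layer_only = packets_for(max_resolution=3, num_layers=1)
```

Restricting `max_resolution` trades image size for speed, while restricting `num_layers` keeps the full size and trades image quality instead.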
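[0045]
FIG. 10 shows an example with an actual image, illustrating the resolutions from the lowest (level 0) to the highest (level 3). The encoded codestream of the packets up to whatever level the performance of the image decoding apparatus 20 allows can be decoded.
[0046]
It is also effective to decode only the encoded codestream of the packets in the upper layers. In this case, an image with the same resolution as the original camera image but degraded quality is output. FIG. 11 shows an example with an actual image, illustrating the decoded images from the lowest layer (layer 0) to the highest layer (layer 2). The encoded codestream of the packets up to whatever layer the performance of the image decoding apparatus 20 allows can be decoded.
[0047]
Thus, in the second embodiment, decoding the encoded codestream up to the level or the layer that the performance of the image decoding apparatus 20 allows makes real-time decoding possible even when the number of cameras is large.
[0048]
(3) Third embodiment
In this embodiment, as shown in FIG. 12 for example, a total of nine camera images from the first to ninth cameras are joined horizontally and vertically into one image. Rather than simply joining the camera images, they are flipped in advance and arranged symmetrically so that pixel values do not differ greatly across adjacent boundaries, and then joined. As a result, high-frequency components are less likely to arise in the wavelet transform at the joins, where edges and the like tend to appear, so the overall coding efficiency improves and the compression ratio rises. The image encoding/decoding system 1 shown in FIG. 7 can be used as the actual configuration, so the description below refers to FIG. 7 as needed.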
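The mirrored arrangement described above might be sketched as follows, with images held as lists of pixel rows. Which tiles are flipped is an assumption here (odd columns flipped horizontally, odd rows flipped vertically), since FIG. 12 is not reproduced in the text, and all function names are hypothetical.

```python
def flip_h(img):
    return [row[::-1] for row in img]


def flip_v(img):
    return img[::-1]


def mirror_tile(img, row, col):
    """Flip a tile so that neighbouring tiles are mirror images.
    Applying it twice restores the tile, so it is its own inverse."""
    if col % 2:
        img = flip_h(img)
    if row % 2:
        img = flip_v(img)
    return img


def build_mosaic(grid):
    """Join a 2-D grid of equal-size tiles into one overall image."""
    mosaic = []
    for r, tiles in enumerate(grid):
        flipped = [mirror_tile(t, r, c) for c, t in enumerate(tiles)]
        for y in range(len(flipped[0])):
            mosaic.append([p for t in flipped for p in t[y]])
    return mosaic


def split_mosaic(mosaic, rows, cols):
    """Cut the decoded overall image into tiles and undo the flips."""
    th, tw = len(mosaic) // rows, len(mosaic[0]) // cols
    return [[mirror_tile([mosaic[r * th + y][c * tw:(c + 1) * tw]
                          for y in range(th)], r, c)
             for c in range(cols)]
            for r in range(rows)]
```

The decoder only needs the grid dimensions to undo the arrangement, because the flip rule depends on the tile position alone.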
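[0049]
The camera video input unit 11 shown in FIG. 7 obtains camera images from the plurality of cameras and supplies this camera image information D10 to the JPEG-2000 encoding unit 12.
[0050]
The JPEG-2000 encoding unit 12 joins the camera image information D10 supplied from the camera video input unit 11 horizontally and vertically into one image, as described above, and encodes that image according to the JPEG-2000 system. The JPEG-2000 encoding unit 12 transmits the resulting encoded codestream D11 to the image decoding apparatus 20 via the network N, or supplies it to the codestream recording unit 13 to be recorded on a recording medium.
[0051]
The nine camera images of FIG. 12 are the images for a single time instant. For a moving image of 30 frames per second, as in ordinary television video, the JPEG-2000 encoding unit 12 encodes the image of FIG. 12 at 30 frames per second.
[0052]
The JPEG-2000 decoding unit 21 decodes, according to the JPEG-2000 system, the encoded codestream D11 transmitted from the image encoding apparatus 10 or supplied from it via a recording medium, and restores the camera images to their state before the symmetric flipping. The JPEG-2000 decoding unit 21 supplies the resulting decoded images D20 to the image display unit 22.
[0053]
The image display unit 22 displays the decoded images D20 supplied from the JPEG-2000 decoding unit 21. Because there is one decoded image D20 per camera, the image display unit 22 displays them in the arrangement shown in FIG. 1.
[0054]
Thus, in the third embodiment, flipping the camera images so that pixel values do not differ greatly across adjacent boundaries and joining them symmetrically in the horizontal and vertical directions into one image makes high-frequency components less likely to arise in the wavelet transform at the joins, where edges and the like tend to appear, and improves the overall coding efficiency.
[0055]
(4) Fourth embodiment
In this embodiment, as shown in FIG. 13 for example, the pixel columns at the same position in a plurality of camera images are grouped together to generate one image. In the example of FIG. 13, the columns of four camera images are grouped four at a time in order from the left edge. The resulting image has the same vertical size as a camera image and four times its horizontal size. The image encoding/decoding system 1 shown in FIG. 7 can be used as the actual configuration, so the description below refers to FIG. 7 as needed.
[0056]
The camera video input unit 11 shown in FIG. 7 obtains camera images from the plurality of cameras and supplies this camera image information D10 to the JPEG-2000 encoding unit 12.
[0057]
The JPEG-2000 encoding unit 12 groups the pixel columns at the same position in the camera image information D10 supplied from the camera video input unit 11 into one image, as described above, and encodes that image according to the JPEG-2000 system. The JPEG-2000 encoding unit 12 transmits the resulting encoded codestream D11 to the image decoding apparatus 20 via the network N, or supplies it to the codestream recording unit 13 to be recorded on a recording medium.
[0058]
Because adjacent camera images have similar characteristics, and in particular similar content (pixel values) at the same position, the correlation between adjacent pixels in the newly generated image is high, and an improved compression ratio can be expected.
[0059]
The JPEG-2000 decoding unit 21 decodes, according to the JPEG-2000 system, the encoded codestream D11 transmitted from the image encoding apparatus 10 or supplied from it via a recording medium, and distributes the grouped pixel columns of the decoded image back to the pixel columns of the respective camera images, thereby generating the plurality of camera images. The JPEG-2000 decoding unit 21 supplies the resulting decoded images D20 to the image display unit 22.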
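The column grouping and its inverse can be sketched as follows, with images held as lists of pixel rows. This is a minimal illustration that assumes all camera images have the same size and that the grouped columns appear in camera order; the function names are hypothetical.

```python
def interleave_columns(images):
    """Group same-position columns: output column k*N + i holds
    column k of camera i, so the width grows by a factor of N."""
    height, width = len(images[0]), len(images[0][0])
    return [[img[y][x] for x in range(width) for img in images]
            for y in range(height)]


def deinterleave_columns(combined, n):
    """Distribute the grouped columns back to the n camera images."""
    return [[row[k::n] for row in combined] for k in range(n)]


cams = [[[10, 11], [12, 13]],
        [[20, 21], [22, 23]],
        [[30, 31], [32, 33]],
        [[40, 41], [42, 43]]]
combined = interleave_columns(cams)
restored = deinterleave_columns(combined, 4)
```

With four cameras, each run of four neighboring pixels in `combined` comes from the same position in four different views, which is where the higher correlation is expected.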
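[0060]
The image display unit 22 displays the decoded images D20 supplied from the JPEG-2000 decoding unit 21. Because there is one decoded image D20 per camera, the image display unit 22 displays them in the arrangement shown in FIG. 1.
[0061]
Instead of grouping the pixel columns at the same position in the camera images, the pixel rows at the same position can be grouped, as shown in FIG. 14 for example. In the example of FIG. 14, the pixel rows at the same position are extracted from four camera images one row at a time from top to bottom and arranged in order to generate one image. The resulting image has the same horizontal size as a camera image and four times its vertical size.
[0062]
The JPEG-2000 encoding unit 12 groups the pixel rows at the same position in the camera images in this way to generate one image, and encodes the generated image according to the JPEG-2000 system.
[0063]
The JPEG-2000 decoding unit 21 decodes the encoded codestream according to the JPEG-2000 system and distributes the grouped pixel rows of the decoded image back to the pixel rows of the respective camera images, thereby generating the plurality of camera images.
[0064]
The image display unit 22 displays the decoded images D20 supplied from the JPEG-2000 decoding unit 21. Because there is one decoded image D20 per camera, the image display unit 22 displays them in the arrangement shown in FIG. 1.
[0065]
Thus, in the fourth embodiment, grouping the pixel columns or pixel rows at the same position in the camera images into one image raises the correlation between adjacent pixels and makes it possible to improve the compression ratio.
[0066]
In the first, third, and fourth embodiments described above, the JPEG-2000 encoding unit 12 generally encodes each image as an I picture (intra-frame coded image), as shown in FIG. 15.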
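[0067]
(5) Fifth embodiment
In the first, third, and fourth embodiments described above, all images are encoded as I pictures (intra-frame coded images). In this embodiment, the images are divided into those encoded as I pictures and those encoded as difference images. An image encoded as a difference image is called an S picture below. Whereas the MPEG system performs motion prediction per macroblock and encodes the prediction error, this embodiment, for simplicity, encodes a simple difference image without motion prediction.
[0068]
FIG. 16 shows the concept of the encoding scheme of this embodiment. Whereas all camera images are encoded as I pictures in FIG. 15, in FIG. 16 only the image of the first camera is encoded as an I picture and the other camera images are encoded as S pictures. For example, the difference between the input image of the second camera and the decoded image of the first camera is taken, and this difference image is encoded. Likewise, the difference between the input image of the third camera and the decoded image of the second camera is taken, and this difference image is encoded.
[0069]
FIG. 17 shows the schematic configuration of the image encoding/decoding system of this embodiment. As shown in FIG. 17, the image encoding/decoding system 2 consists of an image encoding apparatus 30 and an image decoding apparatus 40, which are connected via a network N, for example the Internet. The image encoding apparatus 30 has a camera video input unit 31, a JPEG-2000 encoding unit 32, a JPEG-2000 decoding unit 33, a subtractor 34, a difference image encoding unit 35, and a codestream recording unit 36. The image decoding apparatus 40 has a JPEG-2000 decoding unit 41, a difference image decoding unit 42, an adder 43, and an image display unit 44.
[0070]
In the image encoding apparatus 30, the camera video input unit 31 obtains omnidirectional images from the plurality of cameras and supplies this camera image information D30 to the JPEG-2000 encoding unit 32 and the subtractor 34.
[0071]
The JPEG-2000 encoding unit 32 encodes the camera image information D30 supplied from the camera video input unit 31 according to the JPEG-2000 system and supplies the resulting encoded codestream D31 to the JPEG-2000 decoding unit 33. The JPEG-2000 encoding unit 32 also transmits the encoded codestream D32, encoded as an I picture, to the image decoding apparatus 40 via the network N, or supplies it to the codestream recording unit 36.
[0072]
The JPEG-2000 decoding unit 33 decodes the encoded codestream D31 supplied from the JPEG-2000 encoding unit 32 according to the JPEG-2000 system and generates decoded image information D33. When camera image information D30 to be encoded as an S picture is supplied from the camera video input unit 31, the subtractor 34 subtracts the decoded image information D33 of the adjacent camera from this camera image information D30 and supplies the resulting difference image information D34 to the difference image encoding unit 35.
[0073]
The difference image encoding unit 35 encodes the difference image information D34 supplied from the subtractor 34 using, for example, Lempel-Ziv coding or arithmetic coding, both commonly used for file compression. The difference image information D34 is not encoded with the JPEG-2000 system because a difference image is a kind of noise image with very little internal correlation, so the JPEG-2000 system, which exploits correlation within the image, is not suitable. The difference image encoding unit 35 transmits the resulting encoded codestream D35 to the image decoding apparatus 40 via the network N, or supplies it to the codestream recording unit 36 to be recorded on a recording medium.
[0074]
In the image decoding apparatus 40, the JPEG-2000 decoding unit 41 decodes, according to the JPEG-2000 system, the encoded codestream D32 that was encoded as an I picture and transmitted from the image encoding apparatus 30 or supplied from it via a recording medium. The JPEG-2000 decoding unit 41 supplies the resulting decoded image information D40 to the adder 43 and the image display unit 44.
[0075]
The difference image decoding unit 42 decodes the encoded codestream D35 that was encoded as an S picture and transmitted from the image encoding apparatus 30 or supplied from it via a recording medium, and supplies the resulting difference image information D41 to the adder 43. When the difference image information D41 is supplied from the difference image decoding unit 42, the adder 43 adds it to the decoded image information D40 or to the decoded image information of the adjacent camera, and supplies the resulting decoded image information D42 to the image display unit 44.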
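The chain of I and S pictures described above can be sketched on small pixel lists. In this sketch, coarse quantization stands in for the lossy JPEG-2000 encode and decode round trip, and the difference images are assumed to be stored losslessly, as they would be with an entropy coder such as Lempel-Ziv or arithmetic coding. The function names and the quantization step are hypothetical.

```python
def intra_roundtrip(image, step=8):
    """Stand-in for JPEG-2000 encode + decode: lossy quantization."""
    return [min(255, (p // step) * step + step // 2) for p in image]


def encode_chain(cameras, step=8):
    """Camera 1 becomes the I picture; each later camera becomes an
    S picture, the difference from the previous decoded image."""
    reference = intra_roundtrip(cameras[0], step)
    i_picture = reference  # represented here by its decoded form
    s_pictures = []
    for image in cameras[1:]:
        diff = [p - r for p, r in zip(image, reference)]
        s_pictures.append(diff)  # would be entropy-coded losslessly
        # The encoder tracks what the decoder will reconstruct.
        reference = [r + d for r, d in zip(reference, diff)]
    return i_picture, s_pictures


def decode_chain(i_picture, s_pictures):
    """Add each difference image to the previously decoded image."""
    decoded = [i_picture]
    for diff in s_pictures:
        decoded.append([r + d for r, d in zip(decoded[-1], diff)])
    return decoded


cameras = [[100, 102, 98, 97], [101, 103, 99, 95], [104, 103, 100, 96]]
decoded = decode_chain(*encode_chain(cameras))
```

Each S picture depends on the image decoded before it, which is why the decoder must process the cameras in order and keep one reference image in memory.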
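[0076]
The image display unit 44 displays the decoded image information D40 supplied from the JPEG-2000 decoding unit 41 and the decoded image information D42 supplied from the adder 43. Because there is one decoded image per camera, the image display unit 44 displays them in the omnidirectional arrangement shown in FIG. 1.
[0077]
Thus, in the fifth embodiment, dividing the images into those encoded as I pictures and those encoded as S pictures, and encoding the difference between adjacent camera images whose content is similar, improves the compression ratio compared with using I pictures only. On the other hand, memory is needed to hold at least the decoded reference image, and encoding the camera images in sequence introduces a delay proportional to the number of cameras, so it is preferable to decide whether to use S pictures according to the requirements of the system.
[0078]
In the example of FIG. 16, always fixing the I picture to the first camera removes the need for switching and simplifies hardware control, but the I picture need not be fixed to a single camera. For example, as shown in FIG. 18, a plurality of camera images may be encoded as I pictures, with the camera images adjacent to an I picture encoded as S pictures. This reduces the delay compared with the case of FIG. 16. The encoding may also be switched so that an important camera image is encoded as an I picture. In an Internet environment, for example, when the server transmits the camera image at the center of the viewpoint that the viewer at the client is actually looking at, it is preferable to switch so that this camera image is encoded as an I picture.
[0079]
(6) Others
The present invention is not limited to the embodiments described above, and various modifications are of course possible without departing from the gist of the invention.
[0080]
For example, the embodiments above describe obtaining omnidirectional images with a plurality of cameras arranged in a ring, but the invention is not limited to this; images in a plurality of directions may instead be obtained with a plurality of cameras arranged in an arc.
[0081]
The embodiments above are described as hardware configurations, but the invention is not limited to this; any of the processing can also be realized by having a CPU (Central Processing Unit) execute a computer program. In that case, the computer program can be provided recorded on a recording medium, or provided by transmission over the Internet or another transmission medium.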
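[0082]
[Effects of the invention]
As described in detail above, the image encoding apparatus and method according to the present invention generate an overall image whose components are a plurality of camera images captured by a plurality of adjacent cameras, and encode the overall image using each component camera image as the unit of encoding.
[0083]
The image decoding apparatus and method according to the present invention decode an encoded codestream that was generated by encoding an overall image, whose components are a plurality of camera images captured by a plurality of adjacent cameras, using each component camera image as the unit of encoding, and display the decoded camera images adjacent to one another.
[0084]
In such an image encoding apparatus and method and image decoding apparatus and method, each camera image is regarded as a component of one image, each component camera image is encoded as a unit of encoding, and the decoded camera images are displayed in their original adjacent arrangement. This enables random-access decoding: any camera image can be extracted and decoded, which greatly improves operability.
[0085]
The image encoding apparatus and method according to the present invention also arrange the pixels within a plurality of camera images captured by a plurality of adjacent cameras so that adjacently placed camera images are symmetric to each other, and encode an overall image generated by arranging the rearranged camera images in the horizontal and vertical directions.
[0086]
The image decoding apparatus and method according to the present invention decode an encoded codestream that was generated by encoding such an overall image, that is, one in which the pixels within a plurality of camera images captured by a plurality of adjacent cameras are arranged so that adjacently placed camera images are symmetric to each other and the rearranged camera images are arranged in the horizontal and vertical directions. They divide the decoded image horizontally and vertically to generate the plurality of camera images and, where the pixels of the camera images have been arranged so that adjacent images are symmetric, restore the original pixel arrangement.
[0087]
In such an image encoding apparatus and method and image decoding apparatus and method, the camera images are flipped so that pixel values do not differ greatly across adjacent boundaries, joined horizontally and vertically so as to be symmetric, and encoded as one overall image. At decoding, the decoded image is divided horizontally and vertically to generate the original camera images, and where the pixels of the camera images have been arranged so that adjacently placed images are symmetric, the original pixel arrangement is restored. As a result, high-frequency components are less likely to arise at the joins, where edges and the like tend to appear, and the overall coding efficiency improves.
[0088]
The image encoding apparatus and method according to the present invention also encode an overall image generated by grouping together the pixel columns or pixel rows at the same position in a plurality of camera images captured by a plurality of adjacent cameras.
[0089]
The image decoding apparatus and method according to the present invention decode an encoded codestream that was generated by encoding an overall image in which the pixel columns or pixel rows at the same position in a plurality of camera images captured by a plurality of adjacent cameras are grouped together, and generate the plurality of camera images by distributing the pixel columns or pixel rows of the decoded image back to the same position in each camera image.
[0090]
In such an image encoding apparatus and method and image decoding apparatus and method, the pixel columns or pixel rows at the same position in the camera images are grouped and encoded as one overall image. At decoding, the original camera images are generated by distributing the pixel columns or pixel rows of the decoded image back to the same position. This raises the correlation between adjacent pixels and makes it possible to improve the compression ratio.
[0091]
The image encoding apparatus and method according to the present invention also intra-frame encode a first camera image captured by one or more first cameras among a plurality of adjacent cameras, decode the encoded first camera image, and encode a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera.
[0092]
Here, the image encoding apparatus and method encode a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera.
[0093]
The image decoding apparatus and method according to the present invention receive a first encoded codestream, generated by intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras, and a second encoded codestream, generated by decoding the encoded first camera image and encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera. They decode the first encoded codestream to generate the first camera image, decode the second encoded codestream to generate the first difference image, and combine the first difference image with the first camera image to generate the second camera image.
[0094]
Here, the image decoding apparatus and method decode a third encoded codestream, generated by encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera, to generate the second difference image, and combine the second difference image with the second camera image to generate the third camera image.
[0095]
In such an image encoding apparatus and method and image decoding apparatus and method, the camera images are divided into those that are intra-frame encoded and those that are encoded as difference images, and the difference between adjacent camera images, whose content is similar, is taken and encoded. At decoding, the decoded difference image and the decoded adjacent camera image are combined to generate the original camera image. This improves the compression ratio compared with intra-frame encoding all of the camera images.
[0096]
The program according to the present invention causes a computer to execute the image encoding process or image decoding process described above, and the recording medium according to the present invention is a computer-readable medium on which such a program is recorded.
[0097]
With such a program and recording medium, the image encoding process or image decoding process described above can be realized in software.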
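[Brief description of the drawings]
[FIG. 1] Diagrams explaining omnidirectional images in the first embodiment: (A) shows an example in which cameras are arranged on a circle and capture outward from its center, and (B) shows an example in which cameras are arranged on a circle and capture toward its center.
[FIG. 2] A diagram explaining an example in which a plurality of camera images are regarded as tile images in the first embodiment.
[FIG. 3] A diagram explaining the subbands when the wavelet transform is carried out to the second level.
[FIG. 4] Diagrams explaining the subbands when an actual image is wavelet-transformed: (A) shows an example of decomposition to the first level, and (B) shows an example of decomposition to the third level.
[FIG. 5] A diagram explaining the tile encoding parameters in the first embodiment.
[FIG. 6] A diagram explaining the range covered by the wavelet transform in tile encoding.
[FIG. 7] A diagram explaining the schematic configuration of the image encoding/decoding system in the first embodiment.
[FIG. 8] A diagram explaining the encoding of groups of tile images along the time axis.
[FIG. 9] A diagram explaining the layer structure and packet arrangement when the encoded codestream is divided into a plurality of layers in the second embodiment.
[FIG. 10] A diagram explaining resolution-progressive decoding of an actual image.
[FIG. 11] A diagram explaining quality-progressive decoding of an actual image.
[FIG. 12] A diagram explaining an example in which a plurality of camera images are arranged symmetrically in the third embodiment.
[FIG. 13] A diagram explaining an example in which the pixels in the same column of a plurality of camera images are grouped into one image in the fourth embodiment.
[FIG. 14] A diagram explaining an example in which the pixels in the same row of a plurality of camera images are grouped into one image in the fourth embodiment.
[FIG. 15] A diagram explaining an example in which all camera images are encoded as I pictures.
[FIG. 16] A diagram explaining an example in which one of a plurality of camera images is encoded as an I picture and the other camera images are encoded as S pictures in the fifth embodiment.
[FIG. 17] A diagram explaining the schematic configuration of the image encoding/decoding system in the fifth embodiment.
[FIG. 18] A diagram explaining an example in which several of a plurality of camera images are encoded as I pictures and the other camera images are encoded as S pictures in the fifth embodiment.
[Explanation of reference signs]
1, 2 image encoding/decoding system; 10 image encoding apparatus; 11 camera video input unit; 12 JPEG-2000 encoding unit; 13 codestream recording unit; 20 image decoding apparatus; 21 JPEG-2000 decoding unit; 22 image display unit; 30 image encoding apparatus; 31 camera video input unit; 32 JPEG-2000 encoding unit; 33 JPEG-2000 decoding unit; 34 subtractor; 35 difference image encoding unit; 36 codestream recording unit; 40 image decoding apparatus; 41 JPEG-2000 decoding unit; 42 difference image decoding unit; 43 adder; 44 image display unit
[0001]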
[Technical field to which the invention belongs]
The present invention relates to an image encoding apparatus and method for encoding images from a plurality of cameras by, for example, the JPEG-2000 system, an image decoding apparatus and method for decoding the compressed images, and a program and a recording medium.
[0002]
[Prior art]
A representative conventional image compression method is the JPEG (Joint Photographic Experts Group) method standardized by ISO (International Organization for Standardization). It uses the discrete cosine transform (DCT) and is known to provide good encoded and decoded images when a relatively large number of bits is allocated. However, when the number of coded bits is reduced beyond a certain point, the block distortion peculiar to DCT becomes conspicuous and the degradation becomes subjectively noticeable.
[0003]
Meanwhile, in recent years there has been active research on methods that divide an image into a plurality of frequency bands with a filter bank, a combination of high-pass and low-pass filters, and encode each band separately. Among them, wavelet transform coding is regarded as a promising new technology to replace DCT because, unlike DCT, it does not suffer from conspicuous block distortion at high compression.
[0004]
[Problems to be solved by the invention]
At present, electronic still cameras and video camcorders use the JPEG method described above or the MPEG (Moving Picture Experts Group) method, both of which use DCT as the transform.
[0005]
MPEG uses intra-frame coded pictures (I pictures) and inter-frame coded pictures (P pictures: unidirectional prediction; B pictures: bidirectional prediction) to improve compression efficiency. Because the decoder cannot decode P and B pictures without the I picture, a delay arises when encoding omnidirectional images made up of multi-camera images, which are becoming more common in environments such as immersive image communication and games. In addition, a large number of frame memories must be provided for motion compensation.
[0006]
For this reason, still-image-based codecs are widely used: DV in consumer camcorders, Motion-JPEG and DV in surveillance codecs, Motion-JPEG and DV in professional non-linear editing systems, and Motion-JPEG in the video codecs of digital cameras. The reasons include low delay, easy later editing because the coding is still-image based, and ease of reuse because each image is independent.
[0007]
Products based on the wavelet transform are nevertheless expected to appear on the market, and research institutions are actively studying ways to improve the efficiency of such coding methods. In fact, a standardization recommendation was issued in January 2001 for the JPEG-2000 system (developed by ISO/IEC JTC1 SC29/WG1, the same body as JPEG), which is expected to become the next-generation international still-image standard and the successor to JPEG. JPEG-2000 adopts the wavelet transform in place of the DCT of existing JPEG as its basic transform. Besides achieving higher compression than JPEG, it offers a rich set of functions that JPEG lacks, such as progressive decoding, error resilience, and lossless as well as lossy compression and decompression.
[0008]
In addition, the Motion-JPEG2000 system for moving images has also been standardized. It encodes each image constituting a moving image as one of a sequence of JPEG-2000 images.
[0009]
The present invention has been proposed in view of these circumstances. Its object is to provide an image encoding apparatus and method that efficiently encode a plurality of camera images captured by a plurality of cameras using the JPEG-2000 or Motion-JPEG2000 system, which are still-image-based coding systems using the wavelet transform; an image decoding apparatus and method that decode the compressed images; a program that causes a computer to execute the image encoding or image decoding process; and a computer-readable recording medium on which the program is recorded.
[0010]
[Means for solving the problems]
To achieve the above object, the image encoding apparatus and method according to the present invention generate an overall image whose components are a plurality of camera images captured by a plurality of adjacent cameras, and encode the overall image using each component camera image as the unit of encoding.
[0011]
To achieve the above object, the image decoding apparatus and method according to the present invention decode an encoded codestream that was generated by encoding an overall image, whose components are a plurality of camera images captured by a plurality of adjacent cameras, using each component camera image as the unit of encoding, and display the decoded camera images adjacent to one another.
[0012]
In such an image encoding apparatus and method and image decoding apparatus and method, each camera image is regarded as a component of one image, each component camera image is encoded as a unit of encoding, and the decoded camera images are displayed in their original adjacent arrangement.
[0013]
To achieve the above object, the image encoding apparatus and method according to the present invention also arrange the pixels within a plurality of camera images captured by a plurality of adjacent cameras so that adjacently placed camera images are symmetric to each other, and encode an overall image generated by arranging the rearranged camera images in the horizontal and vertical directions.
[0014]
To achieve the above object, the image decoding apparatus and method according to the present invention decode an encoded codestream that was generated by encoding such an overall image, that is, one in which the pixels within a plurality of camera images captured by a plurality of adjacent cameras are arranged so that adjacently placed camera images are symmetric to each other and the rearranged camera images are arranged in the horizontal and vertical directions. They divide the decoded image horizontally and vertically to generate the plurality of camera images and, where the pixels of the camera images have been arranged so that adjacently placed images are symmetric, restore the original pixel arrangement.
[0015]
In such an image encoding apparatus and method and image decoding apparatus and method, the camera images are flipped so that pixel values do not differ greatly across adjacent boundaries, joined horizontally and vertically so as to be symmetric, and encoded as one overall image. At decoding, the decoded image is divided horizontally and vertically to generate the original camera images, and where the pixels of the camera images have been arranged so that adjacently placed images are symmetric, the original pixel arrangement is restored.
[0016]
To achieve the above object, the image encoding apparatus and method according to the present invention also encode an overall image generated by grouping together the pixel columns or pixel rows at the same position in a plurality of camera images captured by a plurality of adjacent cameras.
[0017]
To achieve the above object, the image decoding apparatus and method according to the present invention decode an encoded codestream that was generated by encoding an overall image in which the pixel columns or pixel rows at the same position in a plurality of camera images captured by a plurality of adjacent cameras are grouped together, and generate the plurality of camera images by distributing the pixel columns or pixel rows of the decoded image back to the same position in each camera image.
[0018]
In such an image encoding apparatus and method and image decoding apparatus and method, the pixel columns or pixel rows at the same position in the camera images are grouped and encoded as one overall image. At decoding, the original camera images are generated by distributing the pixel columns or pixel rows of the decoded image back to the same position.
[0019]
To achieve the above object, the image encoding apparatus and method according to the present invention also intra-frame encode a first camera image captured by one or more first cameras among a plurality of adjacent cameras, decode the encoded first camera image, and encode a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera.
[0020]
Here, the image encoding apparatus and method encode a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera.
[0021]
To achieve the above object, the image decoding apparatus and method according to the present invention receive a first encoded codestream, generated by intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras, and a second encoded codestream, generated by decoding the encoded first camera image and encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera. They decode the first encoded codestream to generate the first camera image, decode the second encoded codestream to generate the first difference image, and combine the first difference image with the first camera image to generate the second camera image.
[0022]
Here, the image decoding apparatus and method decode a third encoded codestream, generated by encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera, to generate the second difference image, and combine the second difference image with the second camera image to generate the third camera image.
[0023]
In such an image encoding apparatus and method and image decoding apparatus and method, the camera images are divided into those that are intra-frame encoded and those that are encoded as difference images, and the difference between adjacent camera images, whose content is similar, is taken and encoded. At decoding, the decoded difference image and the decoded adjacent camera image are combined to generate the original camera image.
[0024]
The program according to the present invention causes a computer to execute the image encoding process or image decoding process described above, and the recording medium according to the present invention is a computer-readable medium on which such a program is recorded.
[0025]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, specific embodiments to which the present invention is applied will be described in detail with reference to the drawings. In these embodiments, the present invention is applied to an image encoding apparatus and method that encode images from a plurality of cameras by the JPEG-2000 system, and to an image decoding apparatus and method that decode the compressed images.
[0026]
(1) First embodiment
In the present embodiment, the tile encoding function of the JPEG-2000 standard, in which one image is divided into a plurality of rectangular regions that are encoded separately, is applied to omnidirectional images from a plurality of cameras: each camera image is regarded as a tile image and tile encoding is performed. Omnidirectional images obtained by a plurality of cameras fall roughly into two types: images obtained by arranging the cameras in a ring and capturing outward from the center of the ring, as shown in FIG. 1(A), and images obtained by arranging the cameras in a ring and capturing toward the center of the ring, as shown in FIG. 1(B). Then, as shown in FIG. 2, for example, the camera images of the first to sixteenth cameras obtained in this way are each regarded as a tile image, and tile encoding is performed.
[0027]
Hereinafter, before describing the image encoding / decoding system according to the present embodiment, first, wavelet transform and tile encoding in the JPEG-2000 standard will be described.
[0028]
In this wavelet transform, the low-frequency component is usually transformed repeatedly, as shown in FIG. 3, because most of the energy of an image is concentrated in the low-frequency component. This can also be understood from the way the subbands are formed, as shown in FIG. 4, as the division level advances from division level = 1 shown in FIG. 4A to division level = 3.
[0029]
Here, the number of levels of the wavelet transform in FIG. 3 is two, and as a result a total of seven subbands are formed. That is, the horizontal size X_SIZE and the vertical size Y_SIZE are each halved by the first filtering process, and four subbands LL-1, LH-1, HL-1, and HH-1 are generated. LL-1 is then further divided by the second filtering process, and four subbands LL-2, LH-2, HL-2, and HH-2 are generated. In FIG. 3, L and H represent low and high frequencies, respectively, and the number after L and H represents the division level. For example, LH-1 represents the subband at division level = 1 that is low-frequency in the horizontal direction and high-frequency in the vertical direction.
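The subband count above can be checked with a small sketch. It assumes that X_SIZE and Y_SIZE are divisible by 2 raised to the number of levels, so that every halving is exact; the function name `subbands` is hypothetical and is not part of the JPEG-2000 standard.

```python
def subbands(x_size, y_size, levels):
    """List (name, width, height) of the subbands left after `levels` dyadic splits.

    Assumes x_size and y_size are divisible by 2**levels (illustrative only).
    """
    bands = []
    w, h = x_size, y_size
    for level in range(1, levels + 1):
        # Each filtering pass halves the horizontal and vertical sizes.
        w, h = w // 2, h // 2
        # The three high-frequency subbands of this level are final.
        for name in ("LH", "HL", "HH"):
            bands.append((f"{name}-{level}", w, h))
    # Only the LL band of the last level remains; earlier LL bands were split again.
    bands.append((f"LL-{levels}", w, h))
    return bands


if __name__ == "__main__":
    for band in subbands(640, 480, 2):
        print(band)
```

Each level contributes three high-frequency subbands and the last level adds one LL band, giving 3 × levels + 1 subbands, that is, seven for the two-level case of FIG. 3.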
[0030]
Next, tile encoding in the JPEG-2000 standard will be described. FIG. 5 shows the parameters for tile encoding defined by the JPEG-2000 standard. As shown in FIG. 5, one image is divided into, for example, 20 tiles whose tile indexes are T0 to T19. The horizontal size of each tile is given by XTsiz, and the vertical size by YTsiz. In the present embodiment, (XTsiz, YTsiz) are the horizontal and vertical sizes of a camera image. In the JPEG-2000 standard, the position of an image or a tile is expressed using a coordinate system called a reference grid, which makes it easy to determine the sampling positions of the image when performing resolution conversion. Since each tile must be encoded and decoded independently, pixels beyond the tile boundary cannot be referenced. The position of the tiles is specified by the relative position between the origin of the reference grid and the upper left corner of the first tile, and this relative position is defined as XTOsiz and YTOsiz in FIG. 5.
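As an illustration of how a tile index maps to a rectangle on the reference grid, the following sketch follows the tile-grid arithmetic of JPEG-2000 Part 1. XTsiz, YTsiz, XTOsiz, and YTOsiz appear in the text above; Xsiz, Ysiz, XOsiz, and YOsiz (image extent and image offset on the reference grid) are the names used in the standard's SIZ marker and are assumed here. The function name `tile_rect` is hypothetical.

```python
from math import ceil


def tile_rect(t, Xsiz, Ysiz, XTsiz, YTsiz, XOsiz=0, YOsiz=0, XTOsiz=0, YTOsiz=0):
    """Return (x0, y0, x1, y1) of tile index t on the reference grid.

    x1 and y1 are exclusive. Tiles are numbered in raster order.
    """
    num_x = ceil((Xsiz - XTOsiz) / XTsiz)  # tiles per row
    num_y = ceil((Ysiz - YTOsiz) / YTsiz)  # tiles per column
    if not 0 <= t < num_x * num_y:
        raise IndexError("tile index out of range")
    p, q = t % num_x, t // num_x  # column and row of the tile
    # Clip the nominal tile rectangle to the image area.
    x0 = max(XTOsiz + p * XTsiz, XOsiz)
    y0 = max(YTOsiz + q * YTsiz, YOsiz)
    x1 = min(XTOsiz + (p + 1) * XTsiz, Xsiz)
    y1 = min(YTOsiz + (q + 1) * YTsiz, Ysiz)
    return x0, y0, x1, y1


if __name__ == "__main__":
    # Assumed example: 16 camera images of 320 x 240 placed as a 4 x 4 tile grid.
    print(tile_rect(5, Xsiz=1280, Ysiz=960, XTsiz=320, YTsiz=240))
```

With XTsiz and YTsiz set to the camera image size and all offsets zero, each tile rectangle coincides with exactly one camera image, which is the arrangement this embodiment relies on.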
[0031]
In the tile encoding of the JPEG-2000 standard, when the image in a tile is wavelet transformed and the filter extends into an adjacent tile area, the pixels of the adjacent tile are not used; instead, the wavelet transform is performed after symmetrically extending the pixels inside the tile being encoded, as shown in FIG. 6. In the example of FIG. 6, the pixels b and c are symmetrically extended in the horizontal direction and the pixels e and f are symmetrically extended in the horizontal direction with respect to the outside of the tile to be encoded. The JPEG-2000 standard provides a reversible 5 × 3 wavelet transform filter and an irreversible 9 × 7 wavelet transform filter, and defines the number of pixels to be symmetrically extended from a tile boundary for each filter, so this embodiment may simply follow that definition.
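The symmetric extension can be sketched for one row of a tile as follows. This is a minimal illustration of whole-sample symmetric extension, in which the boundary sample itself is not repeated; the function name `symmetric_extend` and the choice of two extension samples are assumptions for the example, not values taken from the standard.

```python
def symmetric_extend(samples, n):
    """Extend a row by n mirrored samples on each side without using neighbors.

    The boundary sample is the mirror axis and is not duplicated.
    """
    length = len(samples)
    if n > length - 1:
        raise ValueError("extension longer than the row allows")
    # Mirror around the first sample: ..., samples[2], samples[1] | samples[0], ...
    left = [samples[i] for i in range(n, 0, -1)]
    # Mirror around the last sample: ..., samples[-1] | samples[-2], samples[-3], ...
    right = [samples[length - 1 - i] for i in range(1, n + 1)]
    return left + list(samples) + right


if __name__ == "__main__":
    print("".join(symmetric_extend(list("abcdef"), 2)))
```

For the row a b c d e f and an extension of two samples, the pixels b and c are mirrored beyond the left boundary and the pixels e and d beyond the right boundary, so the filter never reads a pixel of an adjacent tile.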
[0032]
Next, FIG. 7 shows a schematic configuration of an image encoding / decoding system according to the first embodiment. As shown in FIG. 7, the image encoding / decoding system 1 comprises an image encoding device 10 having a camera image input unit 11, a JPEG-2000 encoding unit 12, and a code stream recording unit 13, and an image decoding device 20 having a JPEG-2000 decoding unit 21 and an image display unit 22. The image encoding device 10 and the image decoding device 20 are connected via a network N, for example, the Internet.
[0033]
In the image encoding device 10, the camera image input unit 11 obtains omnidirectional images from a plurality of cameras and supplies the camera image information D10 to the JPEG-2000 encoding unit 12.
[0034]
The JPEG-2000 encoding unit 12 regards each camera image in the camera image information D10 supplied from the camera image input unit 11 as a tile image, as described above, and encodes it according to the JPEG-2000 system. The JPEG-2000 encoding unit 12 transmits the obtained encoded code stream D11 to the image decoding device 20 via the network N, or supplies the encoded code stream D11 to the code stream recording unit 13 to record it on a recording medium. When there are a plurality of camera images, an encoded code stream D11 exists for each camera.
[0035]
The code stream recording unit 13 records the encoded code stream D11 supplied from the JPEG-2000 encoding unit 12 on a recording medium (not shown) such as a hard disk or a memory card.
[0036]
Note that the total of 16 camera images shown in FIG. 2 are the images for a single point in time. Therefore, in the case of a moving image of 30 frames per second like an ordinary television picture, for example, the JPEG-2000 encoding unit 12 encodes the image of FIG. 2 at a rate of 30 frames per second, as shown in the drawings.
[0037]
On the other hand, in the image decoding device 20, the JPEG-2000 decoding unit 21 decodes, according to the JPEG-2000 system, the encoded code stream D11 transmitted from the image encoding device 10 or supplied from the image encoding device 10 via a recording medium. Since the presence or absence of tile encoding and the parameters describing the tile size are specified in the encoded code stream D11, the JPEG-2000 decoding unit 21 decodes in accordance with them. Note that each tile is encoded completely independently, without being affected by adjacent tiles, so random access decoding is possible. The JPEG-2000 decoding unit 21 supplies the obtained decoded image D20 to the image display unit 22.
[0038]
The image display unit 22 displays the decoded image D20 supplied from the JPEG-2000 decoding unit 21. Here, since the decoded images D20 exist for a plurality of cameras, the image display unit 22 displays them in the arrangement shown in FIG.
[0039]
As described above, in the first embodiment, tile encoding is performed by regarding each of a plurality of camera images as a tile image of one image, which makes random access decoding possible. Since the encoded code stream of any desired camera image can be extracted and decoded, operability is greatly improved. For example, in an Internet environment, the client side can decode only the camera image at the viewpoint position that a person is actually watching. This eliminates the need to decode all camera images, which is efficient.
[0040]
(2) Second embodiment
In the first embodiment, the image decoding device 20 may not be fast enough. For example, when the number of cameras is large, decoding of all of them in real time requires a large-scale LSI even if hardware is used. Therefore, there is a possibility that real-time decoding cannot be performed with an inexpensive codec LSI in terms of performance. Therefore, in the second embodiment described below, the scalability of the JPEG-2000 coded code stream is actively used. Here, the scalability means that the scale can be freely changed, that is, decoding can be performed by freely changing the resolution and image quality using one encoded code stream.
[0041]
The features of the JPEG-2000 coded code stream and specific means for achieving scalability will be described. In the JPEG-2000 system, an image is encoded using wavelet transform as described above. At this time, since the resolution of the low frequency component is hierarchically divided as shown in FIG. 3, the original encoded code stream has a multi-resolution structure.
[0042]
In the JPEG-2000 system, bit plane coding is adopted, and the data is decomposed into bit planes and coded from the MSB (Most Significant Bit) toward the LSB (Least Significant Bit). As a result, an encoded code stream layered in the direction from the MSB to the LSB can be generated in the course of encoding, as shown in FIG. 9.
[0043]
In the JPEG-2000 system, an encoded code stream is generated in a unit called a packet. Here, by definition, one packet is assigned to the same resolution level. In the example of FIG. 9, since the wavelet transform is performed up to the level 3, there are four packets per one layer.
[0044]
Taking advantage of these features, when the performance of the image decoding device 20 is not sufficient, for example, it is effective to decode and display only the low-frequency components (Packet-0, Packet-4, and Packet-8) in FIG. 9. In this case, since the horizontal and vertical sizes are each 1/8 of those of the original image, the processing load is greatly reduced compared with decoding the encoded code stream at the size of the original image, and the decoded images for all of the cameras can be produced in real time.
[0045]
FIG. 10 shows an example realized with actual images. As shown in FIG. 10, the resolution from the lowest band (level 0) to the highest band (level 3) is shown. Therefore, it is possible to decode the encoded code stream of the packet to a possible level according to the performance of the image decoding device 20.
[0046]
It is also effective to decode only the encoded code stream of the upper layer packet. In this case, an image having a deteriorated image quality is output at the same resolution as the original image of the camera image. FIG. 11 shows an example realized with actual images. As shown in FIG. 11, a decoded image from the lowest band (layer 0) to the highest band (layer 2) is shown. Therefore, it is possible to decode an encoded code stream of a packet up to a possible layer according to the performance of the image decoding device 20.
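The two kinds of partial decoding described in paragraphs [0044] and [0046] can be sketched as a packet selection. The sketch assumes that packets are numbered layer by layer with the resolution levels in ascending order inside each layer, which is consistent with Packet-0, Packet-4, and Packet-8 being the low-frequency packets for a three-level transform; the three layers and all function names are assumptions for illustration.

```python
def packets_up_to_resolution(levels, layers, max_resolution):
    """Packet indexes needed to decode resolution levels 0..max_resolution."""
    per_layer = levels + 1  # one packet per resolution level in each layer
    return [layer * per_layer + r
            for layer in range(layers)
            for r in range(max_resolution + 1)]


def packets_up_to_layer(levels, max_layer):
    """Packet indexes needed to decode layers 0..max_layer at full resolution."""
    per_layer = levels + 1
    return list(range((max_layer + 1) * per_layer))


def reduced_size(width, height, levels, max_resolution):
    """Decoded image size when only resolution levels 0..max_resolution are used."""
    scale = 1 << (levels - max_resolution)  # each skipped level halves both sizes
    return (width + scale - 1) // scale, (height + scale - 1) // scale


if __name__ == "__main__":
    print(packets_up_to_resolution(levels=3, layers=3, max_resolution=0))
    print(reduced_size(640, 480, levels=3, max_resolution=0))
    print(packets_up_to_layer(levels=3, max_layer=0))
```

Selecting by resolution lowers the pixel count to be decoded, whereas selecting by layer keeps the full size and lowers the image quality, so a decoder can pick whichever trade-off its performance allows.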
[0047]
As described above, in the second embodiment, the encoded code stream is decoded up to the level or the layer that the performance of the image decoding device 20 allows, so that decoding can be performed in real time even when the number of cameras is large.
[0048]
(3) Third embodiment
In the present embodiment, for example, as shown in FIG. 12, a total of nine camera images of the first to ninth cameras are joined in the horizontal and vertical directions to form one image. At this time, instead of simply joining a plurality of camera images, the camera images are inverted and arranged so as to be symmetrical in advance so that there is no large difference in pixel values at adjacent boundaries, and then joined. As a result, when the wavelet transform is performed, a high-frequency component is hardly generated in an adjacent portion where an edge or the like is likely to appear, so that the overall coding efficiency is improved, and the compression rate is improved. Since the image encoding / decoding system 1 shown in FIG. 7 can be used as an actual configuration, description will be made with reference to FIG. 7 as necessary.
[0049]
The camera image input unit 11 shown in FIG. 7 obtains camera images from a plurality of cameras and supplies the camera image information D10 to the JPEG-2000 encoding unit 12.
[0050]
The JPEG-2000 encoding unit 12 joins the camera images in the camera image information D10 supplied from the camera image input unit 11 in the horizontal and vertical directions as described above to form one image, and encodes this image according to the JPEG-2000 system. The JPEG-2000 encoding unit 12 transmits the obtained encoded code stream D11 to the image decoding device 20 via the network N, or supplies the encoded code stream D11 to the code stream recording unit 13 to record it on a recording medium.
[0051]
Note that the nine camera images in FIG. 12 are the images for a single point in time. For example, in the case of a moving image of 30 frames per second like an ordinary television picture, the JPEG-2000 encoding unit 12 encodes this joined image at a rate of 30 frames per second.
[0052]
The JPEG-2000 decoding unit 21 decodes, according to the JPEG-2000 system, the encoded code stream D11 transmitted from the image encoding device 10 or supplied from the image encoding device 10 via the recording medium, and returns the result to the images as they were before the symmetric conversion. The JPEG-2000 decoding unit 21 supplies the obtained decoded image D20 to the image display unit 22.
[0053]
The image display unit 22 displays the decoded image D20 supplied from the JPEG-2000 decoding unit 21. Here, since the decoded images D20 exist for a plurality of cameras, the image display unit 22 displays them in the arrangement shown in FIG.
[0054]
As described above, in the third embodiment, a plurality of camera images are inverted so that there is no large difference in pixel values at adjacent boundaries, and are joined in the horizontal and vertical directions so as to be symmetrical to form one image. This makes it difficult to generate high-frequency components in adjacent portions where edges and the like are likely to appear when performing the wavelet transform, thereby improving the overall coding efficiency.
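The joining and its inverse can be sketched as follows. The exact inversion pattern of FIG. 12 is not reproduced here; the sketch assumes one possible pattern, in which a camera image is flipped horizontally when it lands in an odd tile column and vertically when it lands in an odd tile row. Images are plain lists of pixel rows, and all function names are hypothetical.

```python
def mirror_tile(img, row, col):
    """Flip a tile according to its grid position (assumed pattern, self-inverse)."""
    if col % 2:
        img = [r[::-1] for r in img]  # horizontal flip
    if row % 2:
        img = img[::-1]               # vertical flip
    return img


def join_mirrored(images, cols):
    """Join equally sized images (raster order) into one image after mirroring."""
    tile_h = len(images[0])
    rows = len(images) // cols
    joined = []
    for r in range(rows):
        tiles = [mirror_tile(images[r * cols + c], r, c) for c in range(cols)]
        for y in range(tile_h):
            joined.append([p for tile in tiles for p in tile[y]])
    return joined


def split_mirrored(joined, cols, tile_w, tile_h):
    """Cut the decoded image back into tiles and undo the mirroring."""
    rows = len(joined) // tile_h
    images = []
    for r in range(rows):
        for c in range(cols):
            tile = [joined[r * tile_h + y][c * tile_w:(c + 1) * tile_w]
                    for y in range(tile_h)]
            images.append(mirror_tile(tile, r, c))  # the same flips restore the tile
    return images


if __name__ == "__main__":
    cams = [[[k * 10, k * 10 + 1], [k * 10 + 2, k * 10 + 3]] for k in range(9)]
    big = join_mirrored(cams, cols=3)
    print(split_mirrored(big, cols=3, tile_w=2, tile_h=2) == cams)
```

Because each flip is its own inverse, the decoder only needs to know the grid position of a tile to restore the original pixel arrangement; no side information beyond the tile layout is required under this assumed pattern.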
[0055]
(4) Fourth embodiment
In the present embodiment, for example, as shown in FIG. 13, a single image is generated by grouping together the pixel columns at the same position of a plurality of camera images. In the example of FIG. 13, for four camera images, the columns are grouped four at a time in order from the left end. As a result, the finally generated image has the same vertical size as a camera image and a horizontal size four times that of a camera image. Since the image encoding / decoding system 1 shown in FIG. 7 can be used as an actual configuration, description will be made with reference to FIG. 7 as necessary.
[0056]
The camera image input unit 11 shown in FIG. 7 obtains camera images from a plurality of cameras and supplies the camera image information D10 to the JPEG-2000 encoding unit 12.
[0057]
The JPEG-2000 encoding unit 12 groups the pixel columns at the same position of the camera images in the camera image information D10 supplied from the camera image input unit 11 into one image, as described above, and encodes this image according to the JPEG-2000 system. The JPEG-2000 encoding unit 12 transmits the obtained encoded code stream D11 to the image decoding device 20 via the network N, or supplies the encoded code stream D11 to the code stream recording unit 13 to record it on a recording medium.
[0058]
Here, since the characteristics of the adjacent camera images, particularly, the contents (pixel values) of the images at the same position are similar, the correlation between the adjacent pixels of the newly generated image becomes higher, and an improvement in the compression ratio can be expected.
[0059]
The JPEG-2000 decoding unit 21 decodes, according to the JPEG-2000 system, the encoded code stream D11 transmitted from the image encoding device 10 or supplied from the image encoding device 10 via a recording medium, and generates a plurality of camera images by distributing the pixel columns of the decoded image to the same positions in the respective camera images. The JPEG-2000 decoding unit 21 supplies the obtained decoded image D20 to the image display unit 22.
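The column grouping and its inverse can be sketched as follows, with each image held as a list of pixel rows. The function names are hypothetical, and the JPEG-2000 encoding and decoding that would sit between the two steps are omitted.

```python
def interleave_columns(images):
    """Group the columns at the same position of N images side by side.

    The result has the same height and N times the width of one image.
    """
    height = len(images[0])
    width = len(images[0][0])
    return [[img[y][x] for x in range(width) for img in images]
            for y in range(height)]


def deinterleave_columns(merged, n):
    """Distribute every n-th column of the merged image back to camera k."""
    return [[row[k::n] for row in merged] for k in range(n)]


if __name__ == "__main__":
    cam1 = [[1, 2], [3, 4]]
    cam2 = [[5, 6], [7, 8]]
    merged = interleave_columns([cam1, cam2])
    print(merged)
    print(deinterleave_columns(merged, 2) == [cam1, cam2])
```

In the merged image, horizontally adjacent pixels come from the same column position of neighboring cameras, which is where the text expects the higher correlation to arise.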
[0060]
The image display unit 22 displays the decoded image D20 supplied from the JPEG-2000 decoding unit 21. Here, since the decoded images D20 exist for a plurality of cameras, the image display unit 22 displays them in the arrangement shown in FIG.
[0061]
Note that, instead of grouping pixel columns at the same position in a plurality of camera images, pixel rows at the same position in a plurality of camera images can be grouped, for example, as shown in FIG. 14. In the example of FIG. 14, for four camera images, the pixel rows at the same position are extracted line by line from the top to the bottom of the images and arranged in order to generate one image. As a result, the finally generated image has the same horizontal size as a camera image and a vertical size four times that of a camera image.
[0062]
The JPEG-2000 encoding unit 12 generates a single image by combining pixel rows at the same position of a plurality of camera images in this way, and encodes the generated image according to the JPEG-2000 system.
[0063]
The JPEG-2000 decoding unit 21 decodes the encoded code stream according to the JPEG-2000 system and generates a plurality of camera images by distributing the pixel rows of the decoded image to the same positions in the respective camera images.
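The row-based variant can be sketched in the same way. Images are again lists of pixel rows, the function names are hypothetical, and the JPEG-2000 encoding and decoding steps are left out.

```python
def interleave_rows(images):
    """Stack the rows at the same position of N images one after another.

    The result has the same width and N times the height of one image.
    """
    height = len(images[0])
    return [list(img[y]) for y in range(height) for img in images]


def deinterleave_rows(merged, n):
    """Distribute every n-th row of the merged image back to camera k."""
    return [merged[k::n] for k in range(n)]


if __name__ == "__main__":
    cam1 = [[1, 2], [3, 4]]
    cam2 = [[5, 6], [7, 8]]
    merged = interleave_rows([cam1, cam2])
    print(merged)
    print(deinterleave_rows(merged, 2) == [cam1, cam2])
```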
[0064]
The image display unit 22 displays the decoded image D20 supplied from the JPEG-2000 decoding unit 21. Here, since the decoded images D20 exist for a plurality of cameras, the image display unit 22 displays them in the arrangement shown in FIG.
[0065]
As described above, in the fourth embodiment, generating one image by grouping pixel columns or pixel rows at the same position of a plurality of camera images increases the correlation between adjacent pixels and makes it possible to improve the compression ratio.
[0066]
In the first, third, and fourth embodiments described above, the JPEG-2000 encoding unit 12 generally encodes every image as an I picture (intra-frame encoded image), as shown in FIG. 15.
[0067]
(5) Fifth embodiment
In the first, third, and fourth embodiments described above, all images are encoded as I pictures (intra-frame encoded images). In the present embodiment, by contrast, the images are divided into those to be encoded as I pictures and those to be encoded as difference images. Hereinafter, an image encoded as a difference image is referred to as an S picture. In the MPEG system, motion prediction is performed on a macroblock basis and the prediction error is encoded; in the present embodiment, for simplicity, a simple difference image obtained without motion prediction is encoded.
[0068]
First, FIG. 16 shows a conceptual diagram of an encoding method according to the present embodiment. In FIG. 15 described above, all are encoded as I pictures, whereas in FIG. 16, only the first camera is encoded as I pictures, and the other camera images are encoded as S pictures. For example, a difference between the input image of the second camera and the decoded image of the first camera is obtained, and the difference image is encoded. Similarly, the difference between the input image of the third camera and the decoded image of the second camera is obtained, and the difference image is encoded.
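The chain of differences can be sketched as follows. Coarse quantization is used purely as a stand-in for JPEG-2000 encoding followed by decoding, and the difference images are assumed to be coded losslessly; images are flat lists of integer pixel values, and all names are hypothetical.

```python
def lossy_roundtrip(pixels, step=4):
    """Stand-in for JPEG-2000 encode + decode of an I picture (quantization only)."""
    return [(p // step) * step for p in pixels]


def encode_chain(cameras, step=4):
    """Code the first camera as an I picture and the others as S pictures."""
    i_decoded = lossy_roundtrip(cameras[0], step)
    reference = i_decoded  # decoded image of the previous camera
    diffs = []
    for current in cameras[1:]:
        diff = [c - r for c, r in zip(current, reference)]
        diffs.append(diff)
        # With a lossless difference, the decoded image equals reference + diff.
        reference = [r + d for r, d in zip(reference, diff)]
    return i_decoded, diffs


def decode_chain(i_decoded, diffs):
    """Rebuild every camera image by adding each difference to the previous image."""
    images = [i_decoded]
    for diff in diffs:
        images.append([r + d for r, d in zip(images[-1], diff)])
    return images


if __name__ == "__main__":
    cams = [[10, 20, 30], [11, 21, 29], [12, 22, 28]]
    i_pic, s_pics = encode_chain(cams)
    print(decode_chain(i_pic, s_pics))
```

Taking the difference against the decoded image of the neighboring camera, rather than against its input image, keeps the encoder and the decoder working from the same reference, so the coding error of the I picture does not accumulate along the chain.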
[0069]
Next, FIG. 17 shows a schematic configuration of an image encoding / decoding system according to the present embodiment. As shown in FIG. 17, the image encoding / decoding system 2 includes an image encoding device 30 and an image decoding device 40, which are connected to each other via a network N such as the Internet. Here, the image encoding device 30 includes a camera image input unit 31, a JPEG-2000 encoding unit 32, a JPEG-2000 decoding unit 33, a subtractor 34, a difference image encoding unit 35, and a code stream recording unit 36. The image decoding device 40 includes a JPEG-2000 decoding unit 41, a difference image decoding unit 42, an adder 43, and an image display unit 44.
[0070]
In the image encoding device 30, the camera image input unit 31 obtains omnidirectional images from a plurality of cameras, and supplies the camera image information D30 to the JPEG-2000 encoding unit 32 and the subtractor 34.
[0071]
The JPEG-2000 encoding unit 32 encodes the camera image information D30 supplied from the camera image input unit 31 according to the JPEG-2000 system. The JPEG-2000 encoding unit 32 supplies the obtained encoded code stream D31 to the JPEG-2000 decoding unit 33. Further, the JPEG-2000 encoding unit 32 transmits the encoded code stream D32 encoded as an I picture to the image decoding device 40 via the network N, or supplies the encoded code stream D32 to the code stream recording unit 36.
[0072]
The JPEG-2000 decoding unit 33 decodes the encoded code stream D31 supplied from the JPEG-2000 encoding unit 32 according to the JPEG-2000 system, and generates decoded image information D33. When the camera image information D30 to be encoded as an S picture is supplied from the camera image input unit 31, the subtractor 34 subtracts the decoded image information D33 of the adjacent camera from the camera image information D30 and supplies the obtained difference image information D34 to the difference image encoding unit 35.
[0073]
The difference image encoding unit 35 encodes the difference image information D34 supplied from the subtractor 34 using, for example, a Lempel-Ziv code or an arithmetic code of the kind often used for file compression. The difference image information D34 is not encoded according to the JPEG-2000 system because a difference image is a kind of noise image with very little correlation inside the image, which makes it unsuitable for that system. The difference image encoding unit 35 transmits the obtained encoded code stream D35 to the image decoding device 40 via the network N, or supplies the encoded code stream D35 to the code stream recording unit 36 to record it on a recording medium.
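A general-purpose lossless coder can be applied to a difference image as in the following sketch. Python's standard `zlib` module (DEFLATE, which combines LZ77 dictionary coding with Huffman coding) is used here as a stand-in for the Lempel-Ziv code mentioned above; packing the signed differences as 16-bit values and the function names are assumptions for the example.

```python
import zlib
from array import array


def encode_difference(diff):
    """Losslessly compress a list of signed pixel differences."""
    raw = array("h", diff).tobytes()  # signed 16-bit holds the range -255..255
    return zlib.compress(raw)


def decode_difference(stream):
    """Recover the list of signed pixel differences from the compressed stream."""
    values = array("h")
    values.frombytes(zlib.decompress(stream))
    return values.tolist()


if __name__ == "__main__":
    diff = [3, -2, 0, 0, 255, -255]
    stream = encode_difference(diff)
    print(decode_difference(stream) == diff)
```

Because the coder is lossless, the image rebuilt at the decoder by adding the difference to the reference image matches the image that the encoder used as its next reference.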
[0074]
On the other hand, in the image decoding device 40, the JPEG-2000 decoding unit 41 decodes, according to the JPEG-2000 system, the encoded code stream D32 that was encoded as an I picture and either transmitted from the image encoding device 30 or supplied from the image encoding device 30 via a recording medium. The JPEG-2000 decoding unit 41 supplies the obtained decoded image information D40 to the adder 43 and the image display unit 44.
[0075]
The difference image decoding unit 42 decodes the encoded code stream D35 that was encoded as an S picture and either transmitted from the image encoding device 30 or supplied from the image encoding device 30 via a recording medium, and supplies the obtained difference image information D41 to the adder 43. When the difference image information D41 is supplied from the difference image decoding unit 42, the adder 43 adds the difference image information D41 to the decoded image information D40 or to the decoded image information of the adjacent camera, and supplies the obtained decoded image information D42 to the image display unit 44.
[0076]
The image display unit 44 displays the decoded image information D40 supplied from the JPEG-2000 decoding unit 41 and the decoded image information D42 supplied from the adder 43. Here, since the decoded images exist for a plurality of cameras, the image display unit 44 displays the decoded images in the omnidirectional image arrangement shown in FIG.
[0077]
As described above, in the fifth embodiment, the images are divided into those to be encoded as I pictures and those to be encoded as S pictures, and encoding is performed by taking the difference between adjacent camera images having similar image content. As a result, the compression ratio can be improved compared with the case where only I pictures are used. On the other hand, at least a memory for holding the decoded image used as a reference is required, and encoding the camera images in sequence introduces a delay that grows with the number of cameras. It is therefore preferable to choose between I pictures and S pictures according to the requirements of the system.
[0078]
In the example of FIG. 16, the position of the I picture is always fixed to the first camera, which eliminates the need for switching and simplifies the hardware control; however, the I picture need not be fixed to a single camera. For example, as shown in FIG. 18, a plurality of camera images may be encoded as I pictures, and the camera images adjacent to them encoded as S pictures. This reduces the delay time compared with the case in which only the first camera is encoded as an I picture. It is also possible to switch so that an important camera image is encoded as an I picture. For example, in an Internet environment, when the server transmits the camera image at the center of the viewpoint actually being watched by a person on the client side, it is preferable to switch so that the camera image at the center of the viewpoint is encoded as an I picture.
[0079]
(6) Other
The present invention is not limited to only the above-described embodiment, and it goes without saying that various modifications can be made without departing from the spirit of the present invention.
[0080]
For example, in the above-described embodiment, an omnidirectional image is obtained by a plurality of cameras arranged in a ring, but the present invention is not limited to this. For example, an image covering a given range of azimuths may be obtained by a plurality of cameras arranged in an arc.
[0081]
Further, in the above-described embodiment, a hardware configuration has been described. However, the present invention is not limited to this, and any of the processing can also be realized by causing a CPU (Central Processing Unit) to execute a computer program. In this case, the computer program can be provided by being recorded on a recording medium, or can be provided by being transmitted via the Internet or another transmission medium.
[0082]
[Effects of the invention]
As described in detail above, the image encoding device and method according to the present invention generate an entire image having, as its components, a plurality of camera images captured by a plurality of adjacent cameras, and encode the entire image using the camera image serving as each component as the unit of encoding.
[0083]
Further, the image decoding apparatus and method according to the present invention, when decoding an encoded code stream generated by encoding an entire image, which has as its components a plurality of camera images captured by a plurality of adjacent cameras, in units of the component camera images, decode the encoded code stream and display the decoded plurality of camera images adjacent to each other.
[0084]
In such an image encoding device and method, and an image decoding device and method, each of a plurality of camera images is regarded as a component of one image, and a camera image serving as each of the components is encoded as a unit of encoding. A plurality of camera images after decoding are displayed so as to have an adjacent original arrangement. As a result, random access decoding becomes possible, and an arbitrary camera image can be extracted and decoded, so that operability is greatly improved.
[0085]
Further, the image encoding apparatus and method according to the present invention arrange the pixels in a plurality of camera images captured by a plurality of adjacent cameras so that adjacently placed camera images are symmetrical to each other, and encode an entire image generated by arranging the rearranged camera images in the horizontal and vertical directions.
[0086]
Further, the image decoding apparatus and method according to the present invention arranges pixels in a plurality of camera images captured by a plurality of adjacent cameras so that adjacently arranged camera images are symmetrical with each other. When decoding an encoded code stream generated by encoding an entire image in which the arranged camera images are arranged in the horizontal and vertical directions, the encoded code stream is decoded, and the decoded image is output in the horizontal and vertical directions. The plurality of camera images are divided in the vertical direction to generate the plurality of camera images, and when the pixels of the plurality of camera images are arranged so that adjacent images are symmetric, the original pixel arrangement is restored.
[0087]
In such an image encoding apparatus and method, and image decoding apparatus and method, a plurality of camera images are inverted so as to be symmetrical, so that there is no large difference in pixel values at adjacent boundaries, joined in the horizontal and vertical directions, and encoded as one entire image. At the time of decoding, the decoded image is divided in the horizontal and vertical directions to generate the original plurality of camera images, and where the pixels of the camera images were arranged so that adjacently placed images are symmetrical, the original pixel arrangement is restored. This makes it difficult for high-frequency components to be generated in adjacent portions where edges and the like are likely to appear, thereby improving the overall coding efficiency.
[0088]
Further, the image encoding device and method according to the present invention encode an entire image generated by arranging pixel columns or pixel rows at the same position of a plurality of camera images captured by a plurality of adjacent cameras collectively. .
[0089]
In addition, the image decoding apparatus and method according to the present invention, when decoding an encoded code stream generated by encoding an entire image in which pixel columns or pixel rows at the same position of a plurality of camera images captured by a plurality of adjacent cameras are arranged together, decode the encoded code stream and generate the plurality of camera images by distributing the pixel columns or pixel rows of the decoded image to the same positions in the respective camera images.
[0090]
In such an image encoding device and method, and an image decoding device and method, pixel columns or pixel rows at the same position of a plurality of camera images are combined and encoded as one entire image. At the time of decoding, a plurality of original camera images are generated by distributing pixel columns or pixel rows of the decoded image to the same position. This makes it possible to increase the correlation between adjacent pixels and improve the compression ratio.
[0091]
Further, the image encoding apparatus and method according to the present invention intra-frame encode a first camera image captured by one or more first cameras among a plurality of adjacent cameras, decode the encoded first camera image, and encode a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera.
[0092]
Here, the image encoding apparatus and method encode a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera.
[0093]
In addition, the image decoding apparatus and method according to the present invention decode a first encoded code stream, generated by intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras, and a second encoded code stream, generated by decoding the encoded first camera image and encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera. The first encoded code stream is decoded to generate the first camera image, the second encoded code stream is decoded to generate the first difference image, and the first difference image and the first camera image are combined to generate the second camera image.
[0094]
Here, the image decoding apparatus and method may further decode a third encoded code stream, obtained by encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera, to generate the second difference image, and combine the second difference image and the second camera image to generate the third camera image.
[0095]
In such an image encoding apparatus and method, and image decoding apparatus and method, the camera images are divided into camera images to be intra-frame encoded and camera images to be encoded as difference images, and for the latter the difference between adjacent camera images having similar image content is taken and encoded. At the time of decoding, a decoded difference image and the decoded adjacent camera image are combined to generate the original camera image. This makes it possible to improve the compression ratio as compared with the case where all camera images are intra-frame encoded.
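The difference encoding between adjacent camera images described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the codec of the embodiment: a uniform quantizer with the hypothetical step `STEP` stands in for the lossy intra-frame and difference-image encoders, and all function names are invented for the example. The point illustrated is that the encoder takes each difference against the decoded neighbouring image, so that the decoder, which only has decoded images, reconstructs the same reference.

```python
from typing import List

Image = List[List[int]]
STEP = 4  # hypothetical quantizer step standing in for a lossy codec


def lossy_encode(img: Image) -> Image:
    """Stand-in for intra-frame or difference-image encoding (uniform quantization)."""
    return [[(p + STEP // 2) // STEP for p in row] for row in img]


def lossy_decode(code: Image) -> Image:
    """Stand-in for the corresponding decoder."""
    return [[q * STEP for q in row] for row in code]


def subtract(a: Image, b: Image) -> Image:
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a: Image, b: Image) -> Image:
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def encode_chain(cameras: List[Image]) -> List[Image]:
    """Intra-encode the first camera image, then encode each neighbour as a
    difference from the decoded previous camera image."""
    streams = [lossy_encode(cameras[0])]
    reference = lossy_decode(streams[0])  # what the decoder will also see
    for cam in cameras[1:]:
        diff_code = lossy_encode(subtract(cam, reference))
        streams.append(diff_code)
        reference = add(reference, lossy_decode(diff_code))
    return streams


def decode_chain(streams: List[Image]) -> List[Image]:
    """Decode the first image, then add each decoded difference image to the
    previously reconstructed neighbour."""
    decoded = [lossy_decode(streams[0])]
    for diff_code in streams[1:]:
        decoded.append(add(decoded[-1], lossy_decode(diff_code)))
    return decoded


if __name__ == "__main__":
    cams = [[[10, 20], [30, 40]], [[12, 21], [33, 38]], [[13, 25], [31, 37]]]
    print(decode_chain(encode_chain(cams)))
```

Because each difference is taken against the decoded image rather than the original, the reconstruction error of every camera image in this sketch stays within the error of a single quantization step and does not accumulate along the chain of cameras.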
[0096]
Further, a program according to the present invention causes a computer to execute the above-described image encoding process or image decoding process, and a recording medium according to the present invention is a computer-readable medium on which such a program is recorded.
[0097]
According to such a program and a recording medium, the above-described image encoding process or image decoding process can be realized by software.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining an omnidirectional image according to a first embodiment. FIG. 1A shows an example in which cameras are arranged on a concentric circle and capture images outward from the center of the circle. FIG. 1B shows an example in which cameras are arranged on a concentric circle and capture images toward the center of the circle.
FIG. 2 is a diagram illustrating an example in which a plurality of camera images are regarded as tile images in the first embodiment.
FIG. 3 is a diagram illustrating subbands when wavelet transform is performed up to a second level.
FIGS. 4A and 4B are diagrams illustrating subbands when an actual image is subjected to wavelet transform. FIG. 4A shows an example of division to the first level, and FIG. 4B shows a further example of division.
FIG. 5 is a diagram illustrating parameters at the time of tile encoding according to the first embodiment.
FIG. 6 is a diagram illustrating a range covered by a wavelet transform in the tile encoding.
FIG. 7 is a diagram illustrating a schematic configuration of an image encoding/decoding system according to the first embodiment.
FIG. 8 is a diagram illustrating encoding of a tile image group in the time direction.
FIG. 9 is a diagram illustrating a layer structure and packet arrangement when an encoded code stream is divided into a plurality of layers in the second embodiment.
FIG. 10 is a diagram illustrating resolution progressive in an actual image.
FIG. 11 is a diagram illustrating image quality progressive in an actual image.
FIG. 12 is a diagram illustrating an example in which a plurality of camera images are arranged in a symmetrical relationship in the third embodiment.
FIG. 13 is a diagram illustrating an example in which pixels in the same column of a plurality of camera images are combined into one image in the fourth embodiment.
FIG. 14 is a diagram illustrating an example in which pixels in the same row of a plurality of camera images are combined into one image in the fourth embodiment.
FIG. 15 is a diagram illustrating an example in which all camera images are encoded as I pictures.
FIG. 16 is a diagram illustrating an example in which one camera image among a plurality of camera images is encoded as an I picture and the other camera images are encoded as S pictures in the fifth embodiment.
FIG. 17 is a diagram illustrating a schematic configuration of an image encoding/decoding system according to the fifth embodiment.
FIG. 18 is a diagram illustrating an example in which, in the fifth embodiment, two or more of the plurality of camera images are encoded as I pictures and the other camera images are encoded as S pictures.
[Explanation of symbols]
1, 2 image encoding/decoding system, 10 image encoding device, 11 camera video input unit, 12 JPEG-2000 encoding unit, 13 code stream recording unit, 20 image decoding device, 21 JPEG-2000 decoding unit, 22 image display unit, 30 image encoding device, 31 camera video input unit, 32 JPEG-2000 encoding unit, 33 JPEG-2000 decoding unit, 34 subtractor, 35 difference image encoding unit, 36 code stream recording unit, 40 image decoding device, 41 JPEG-2000 decoding unit, 42 difference image decoding unit, 43 adder, 44 image display unit

Claims

An image encoding device comprising:
means for generating an entire image having, as respective constituent elements, a plurality of camera images captured by a plurality of adjacent cameras; and
means for encoding the entire image using each camera image serving as a constituent element as a unit of encoding.

The image encoding device according to claim 1, wherein the plurality of cameras are arranged in a ring shape, and capture an image in a center direction or an outer direction of the ring.

The image encoding apparatus according to claim 1, wherein said encoding means intra-frame encodes the entire image.

The image encoding apparatus according to claim 3, wherein said encoding means performs tile encoding of the plurality of camera images as tile images in the JPEG-2000 format.

An image encoding apparatus comprising:
means for arranging the pixels of a plurality of camera images captured by a plurality of adjacent cameras so that camera images placed adjacent to each other are symmetric;
means for arranging the rearranged camera images in the horizontal and vertical directions to generate an entire image; and
means for encoding the entire image.

The image encoding apparatus according to claim 5, wherein the plurality of cameras are arranged in a ring shape, and capture an image in a center direction or an outer direction of the ring.

An image encoding apparatus comprising:
means for gathering and arranging pixel columns or pixel rows at the same position of a plurality of camera images captured by a plurality of adjacent cameras to generate an entire image; and
means for encoding the entire image.

The image encoding apparatus according to claim 7, wherein the plurality of cameras are arranged in a ring shape, and capture an image in a center direction or an outer direction of the ring.

An image encoding device comprising:
means for intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras;
means for decoding the encoded first camera image; and
means for encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera.

The image encoding device according to claim 9, further comprising means for encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera.

The image encoding apparatus according to claim 9, wherein the plurality of cameras are arranged in a ring shape, and capture an image in a center direction or an outer direction of the ring.

The image encoding device according to claim 9, wherein the position of the first camera is always constant.

The image encoding apparatus according to claim 9, wherein the position of the first camera is a center position of a viewpoint.

An image encoding method comprising:
a step of generating an entire image having, as respective constituent elements, a plurality of camera images captured by a plurality of adjacent cameras; and
a step of encoding the entire image using each camera image serving as a constituent element as a unit of encoding.

An image encoding method comprising:
a step of arranging the pixels of a plurality of camera images captured by a plurality of adjacent cameras so that camera images placed adjacent to each other are symmetric;
a step of arranging the rearranged camera images in the horizontal and vertical directions to generate an entire image; and
a step of encoding the entire image.

An image encoding method comprising:
a step of gathering and arranging pixel columns or pixel rows at the same position of a plurality of camera images captured by a plurality of adjacent cameras to generate an entire image; and
a step of encoding the entire image.

An image encoding method comprising:
a step of intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras;
a step of decoding the encoded first camera image; and
a step of encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera.

The image encoding method according to claim 17, further comprising a step of encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera.

A program for causing a computer to execute predetermined processing, the program comprising:
a step of generating an entire image having, as respective constituent elements, a plurality of camera images captured by a plurality of adjacent cameras; and
a step of encoding the entire image using each camera image serving as a constituent element as a unit of encoding.

A program for causing a computer to execute predetermined processing, the program comprising:
a step of arranging the pixels of a plurality of camera images captured by a plurality of adjacent cameras so that camera images placed adjacent to each other are symmetric;
a step of arranging the rearranged camera images in the horizontal and vertical directions to generate an entire image; and
a step of encoding the entire image.

A program for causing a computer to execute predetermined processing, the program comprising:
a step of gathering and arranging pixel columns or pixel rows at the same position of a plurality of camera images captured by a plurality of adjacent cameras to generate an entire image; and
a step of encoding the entire image.

A program for causing a computer to execute predetermined processing, the program comprising:
a step of intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras;
a step of decoding the encoded first camera image; and
a step of encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera.

The program according to claim 22, further comprising a step of encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera.

A computer-readable recording medium on which is recorded a program for causing a computer to execute predetermined processing, the program comprising:
a step of generating an entire image having, as respective constituent elements, a plurality of camera images captured by a plurality of adjacent cameras; and
a step of encoding the entire image using each camera image serving as a constituent element as a unit of encoding.

A computer-readable recording medium on which is recorded a program for causing a computer to execute predetermined processing, the program comprising:
a step of arranging the pixels of a plurality of camera images captured by a plurality of adjacent cameras so that camera images placed adjacent to each other are symmetric;
a step of arranging the rearranged camera images in the horizontal and vertical directions to generate an entire image; and
a step of encoding the entire image.

A computer-readable recording medium on which is recorded a program for causing a computer to execute predetermined processing, the program comprising:
a step of gathering and arranging pixel columns or pixel rows at the same position of a plurality of camera images captured by a plurality of adjacent cameras to generate an entire image; and
a step of encoding the entire image.

A computer-readable recording medium on which is recorded a program for causing a computer to execute predetermined processing, the program comprising:
a step of intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras;
a step of decoding the encoded first camera image; and
a step of encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera.

The recording medium according to claim 27, wherein the program further comprises a step of encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera.

An image decoding apparatus that decodes an encoded code stream generated by producing an entire image having, as constituent elements, a plurality of camera images captured by a plurality of adjacent cameras, and encoding the entire image in units of the camera images serving as the constituent elements, the apparatus comprising:
means for decoding the encoded code stream; and
means for displaying the plurality of decoded camera images adjacent to each other.

The image decoding apparatus according to claim 29, wherein the plurality of cameras are arranged in a ring shape, and capture an image in a center direction or an outer direction of the ring.

The image decoding apparatus according to claim 29, wherein said decoding means performs tile decoding using the plurality of camera images as tile images in the JPEG-2000 format.

The image decoding apparatus according to claim 29, wherein said decoding means selects a camera image to be decoded according to a viewpoint position.

The image decoding apparatus according to claim 29, wherein said decoding means decodes only the encoded code stream of the low-frequency components of a camera image when the decoding speed is not sufficient.

The image decoding apparatus according to claim 29, wherein said decoding means decodes only the encoded code stream of the upper layers of a camera image when the decoding speed is not sufficient.

An image decoding device that decodes an encoded code stream generated by arranging the pixels of a plurality of camera images captured by a plurality of adjacent cameras so that camera images placed adjacent to each other are symmetric, arranging the rearranged camera images in the horizontal and vertical directions to generate an entire image, and encoding the entire image, the device comprising:
means for decoding the encoded code stream;
means for dividing the decoded image in the horizontal and vertical directions to generate the plurality of camera images; and
means for returning the pixels of the plurality of camera images, which were arranged so that adjacently placed images are symmetric, to their original pixel arrangement.

The image decoding device according to claim 35, wherein the plurality of cameras are arranged in a ring shape, and capture an image in a center direction or an outer direction of the ring.

An image decoding apparatus that decodes an encoded code stream generated by encoding an entire image in which pixel columns or pixel rows at the same position of a plurality of camera images captured by a plurality of adjacent cameras are gathered and arranged, the apparatus comprising:
means for decoding the encoded code stream; and
means for generating the plurality of camera images by distributing the pixel columns or pixel rows of the decoded image back to those same positions.

The image decoding device according to claim 37, wherein the plurality of cameras are arranged in a ring shape, and capture an image in a center direction or an outer direction of the ring.

An image decoding device that decodes a first encoded code stream, generated by intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras, and a second encoded code stream, generated by decoding the encoded first camera image and encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera, the device comprising:
means for decoding the first encoded code stream to generate the first camera image;
means for decoding the second encoded code stream to generate the first difference image; and
means for combining the first difference image and the first camera image to generate the second camera image.

The image decoding apparatus according to claim 39, further comprising:
means for decoding a third encoded code stream, obtained by encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera, to generate the second difference image; and
means for combining the second difference image and the second camera image to generate the third camera image.

The image decoding apparatus according to claim 39, wherein the plurality of cameras are arranged in a ring shape, and capture an image in a center direction or an outer direction of the ring.

An image decoding method for decoding an encoded code stream generated by producing an entire image having, as constituent elements, a plurality of camera images captured by a plurality of adjacent cameras, and encoding the entire image in units of the camera images serving as the constituent elements, the method comprising:
a step of decoding the encoded code stream; and
a step of displaying the plurality of decoded camera images adjacent to each other.

An image decoding method for decoding an encoded code stream generated by arranging the pixels of a plurality of camera images captured by a plurality of adjacent cameras so that camera images placed adjacent to each other are symmetric, arranging the rearranged camera images in the horizontal and vertical directions to generate an entire image, and encoding the entire image, the method comprising:
a step of decoding the encoded code stream;
a step of dividing the decoded image in the horizontal and vertical directions to generate the plurality of camera images; and
a step of returning the pixels of the plurality of camera images, which were arranged so that adjacently placed images are symmetric, to their original pixel arrangement.

An image decoding method for decoding an encoded code stream generated by encoding an entire image in which pixel columns or pixel rows at the same position of a plurality of camera images captured by a plurality of adjacent cameras are gathered and arranged, the method comprising:
a step of decoding the encoded code stream; and
a step of generating the plurality of camera images by distributing the pixel columns or pixel rows of the decoded image back to those same positions.

An image decoding method for decoding a first encoded code stream, generated by intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras, and a second encoded code stream, generated by decoding the encoded first camera image and encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera, the method comprising:
a step of decoding the first encoded code stream to generate the first camera image;
a step of decoding the second encoded code stream to generate the first difference image; and
a step of combining the first difference image and the first camera image to generate the second camera image.

The image decoding method according to claim 45, further comprising:
a step of decoding a third encoded code stream, obtained by encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera, to generate the second difference image; and
a step of combining the second difference image and the second camera image to generate the third camera image.

A program for causing a computer to execute an image decoding process of decoding an encoded code stream generated by producing an entire image having, as constituent elements, a plurality of camera images captured by a plurality of adjacent cameras, and encoding the entire image in units of the camera images serving as the constituent elements, the program comprising:
a step of decoding the encoded code stream; and
a step of displaying the plurality of decoded camera images adjacent to each other.

A program for causing a computer to execute an image decoding process of decoding an encoded code stream generated by arranging the pixels of a plurality of camera images captured by a plurality of adjacent cameras so that camera images placed adjacent to each other are symmetric, arranging the rearranged camera images in the horizontal and vertical directions to generate an entire image, and encoding the entire image, the program comprising:
a step of decoding the encoded code stream;
a step of dividing the decoded image in the horizontal and vertical directions to generate the plurality of camera images; and
a step of returning the pixels of the plurality of camera images, which were arranged so that adjacently placed images are symmetric, to their original pixel arrangement.

A program for causing a computer to execute an image decoding process of decoding an encoded code stream generated by encoding an entire image in which pixel columns or pixel rows at the same position of a plurality of camera images captured by a plurality of adjacent cameras are gathered and arranged, the program comprising:
a step of decoding the encoded code stream; and
a step of generating the plurality of camera images by distributing the pixel columns or pixel rows of the decoded image back to those same positions.

A program for causing a computer to execute an image decoding process of decoding a first encoded code stream, generated by intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras, and a second encoded code stream, generated by decoding the encoded first camera image and encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera, the program comprising:
a step of decoding the first encoded code stream to generate the first camera image;
a step of decoding the second encoded code stream to generate the first difference image; and
a step of combining the first difference image and the first camera image to generate the second camera image.

The program according to claim 50, further comprising:
a step of decoding a third encoded code stream, obtained by encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera, to generate the second difference image; and
a step of combining the second difference image and the second camera image to generate the third camera image.

A computer-readable recording medium on which is recorded a program for causing a computer to execute an image decoding process of decoding an encoded code stream generated by producing an entire image having, as constituent elements, a plurality of camera images captured by a plurality of adjacent cameras, and encoding the entire image in units of the camera images serving as the constituent elements, the program comprising:
a step of decoding the encoded code stream; and
a step of displaying the plurality of decoded camera images adjacent to each other.

A computer-readable recording medium on which is recorded a program for causing a computer to execute an image decoding process of decoding an encoded code stream generated by arranging the pixels of a plurality of camera images captured by a plurality of adjacent cameras so that camera images placed adjacent to each other are symmetric, arranging the rearranged camera images in the horizontal and vertical directions to generate an entire image, and encoding the entire image, the program comprising:
a step of decoding the encoded code stream;
a step of dividing the decoded image in the horizontal and vertical directions to generate the plurality of camera images; and
a step of returning the pixels of the plurality of camera images, which were arranged so that adjacently placed images are symmetric, to their original pixel arrangement.

A computer-readable recording medium on which is recorded a program for causing a computer to execute an image decoding process of decoding an encoded code stream generated by encoding an entire image in which pixel columns or pixel rows at the same position of a plurality of camera images captured by a plurality of adjacent cameras are gathered and arranged, the program comprising:
a step of decoding the encoded code stream; and
a step of generating the plurality of camera images by distributing the pixel columns or pixel rows of the decoded image back to those same positions.

A computer-readable recording medium on which is recorded a program for causing a computer to execute an image decoding process of decoding a first encoded code stream, generated by intra-frame encoding a first camera image captured by one or more first cameras among a plurality of adjacent cameras, and a second encoded code stream, generated by decoding the encoded first camera image and encoding a first difference image between the decoded first camera image and a second camera image captured by a second camera adjacent to the first camera, the program comprising:
a step of decoding the first encoded code stream to generate the first camera image;
a step of decoding the second encoded code stream to generate the first difference image; and
a step of combining the first difference image and the first camera image to generate the second camera image.

The recording medium according to claim 55, wherein the program further comprises:
a step of decoding a third encoded code stream, obtained by encoding a second difference image between the decoded second camera image and a third camera image captured by a third camera adjacent to the second camera, to generate the second difference image; and
a step of combining the second difference image and the second camera image to generate the third camera image.