JP3789794B2

JP3789794B2 - Stereoscopic image processing method, apparatus, and system

Info

Publication number: JP3789794B2
Application number: JP2001294981A
Authority: JP
Inventors: 健増谷; 五郎濱岸
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2001-09-26
Filing date: 2001-09-26
Publication date: 2006-06-28
Anticipated expiration: 2021-09-26
Also published as: JP2003111101A

Description

【０００１】
【発明の属する技術分野】
この発明は立体画像処理技術に関し、とくに、立体画像を処理または表示する方法、装置、システムおよび関連するコンピュータプログラムとデータ構造に関する。
【０００２】
【従来の技術】
ここ数年、インターネット利用人口が急増し、インターネット利用の新たなステージともいえるブロードバンド時代に入ろうとしている。ブロードバンド通信では通信帯域が格段に広がるため、従来敬遠されがちだった重い画像データの配信も盛んになる。「マルチメディア」や「ビデオ・オン・デマンド」などの概念は提起されて久しいが、ブロードバンド時代になって、はじめてこれらのことばが一般のユーザに実感をもって体験される状況になった。
【０００３】
画像、とくに動画像の配信が広がれば、ユーザは当然ながらコンテンツの充実と画質の向上を求める。これらは、既存の映像ソフトのデジタル化とそのためのオーサリングツールの開発、高効率かつロスの少ない画像符号化技術の追求などに負うところが大きい。
【０００４】
【発明が解決しようとする課題】
こうした状況下、近い将来画像配信サービスのひとつの形態として、擬似三次元画像（以下単に「立体画像」ともいう）の配信が技術的に注目され、かつ相当の市場を獲得することが考えられる。立体画像は、よりリアルな映像を求めるユーザの希望を叶え、とくに映画やゲームなど臨場感を追求するアプリケーションでは魅力的である。さらに立体画像は、２１世紀の商取引のひとつの標準になると思われるＥＣ（電子商取引）における商品プレゼンテーションにおいて、商品のリアルな表示にも有用である。
【０００５】
しかしながら、立体画像の配信という新しいネットビジネスを考えたとき、そのためのインフラストラクチャもビジネス推進のためのモデルもまだ存在しないといってもよい。本発明者はそうした現状に着目して本発明をなしたものであり、その目的は、立体画像の流通促進を技術的側面から可能にするための立体画像処理技術を提供することにある。
【０００６】
【課題を解決するための手段】
本発明の理解のために、まず本明細書における以下の概念を定義する。
「立体画像」：画像データそのものではなく、立体的に表示された結果、ユーザの目に投ずる画像を観念的に指す。立体画像として表示できる画像データのほうは、後述する「マルチプレクス画像」とよぶ。すなわち、マルチプレクス画像を表示すると、立体画像が見える。
「視差画像」：通常、奥行き感のある立体視のためには、視差が生じるよう右目に投ずるべき画像（以下、単に右目画像という）と左目に投ずるべき画像（以下、単に左目画像という）を準備する必要がある。右目画像と左目画像のように視差を生じさせる画像の対を視差画像と総称する場合もあるが、本明細書では、視差を生じさせる原因となる画像それぞれを視差画像とよぶ。つまり、右目画像も左目画像もそれぞれ視差画像である。これら以外にも、一般には、立体画像において想定された各視点からの画像がそれぞれ視差画像となる。
【０００７】
「基礎画像」：立体画像が表示されるために、立体視に必要な処理をなす対象の画像、またはすでに処理がなされた画像をいう。具体的な例として、後述のセパレート形式の視差画像の他、マルチプレクス形式またはサイドバイサイド形式のごとく、すでに複数の視差画像が何らかの形で合成されてできた画像（これらを「合成画像」ともいう）を含む。
「サイドバイサイド形式」：基礎画像の構成の態様のひとつ。複数の視差画像を水平方向、垂直方向またはそれらの両方向に並置して合成した形式。通常は間引きされた視差画像を並置する。例えば水平方向に２枚の視差画像を並置して構成する場合、それぞれの視差画像を水平方向に一画素ごとに間引く。サイドバイサイド形式の基礎画像を単に「サイドバイサイド画像」ともよぶ。
【０００８】
「マルチプレクス形式」：基礎画像の構成の態様のひとつ。立体画像を表示するための最終的な画像データの形式。マルチプレクス形式の基礎画像を単に「マルチプレクス画像」ともよぶ。
「セパレート形式」：基礎画像の構成の態様のひとつ。単独の二次元画像だが、他の二次元画像と組み合わされて立体視されることが想定されており、それら複数の二次元画像のそれぞれを指す。「セパレート形式」の基礎画像を単に「セパレート画像」ともよぶ。セパレート画像はマルチプレクス画像やサイドバイサイド画像と違い、合成画像ではない。
【０００９】
「視点」：立体画像にはそれを見る視点が想定されている。視点の数と視差画像の数は通常等しい。左目画像と右目画像のふたつの視差画像があるとき、視点の数は「２」である。ただし、視点がふたつでも、ユーザの頭の想定位置はひとつである。同様に、左右方向のユーザの移動を考慮した立体画像を表示する場合、例えば左右方向に４つの視点ｖａ、ｖｂ、ｖｃ、ｖｄを想定し、それぞれから見える視差画像をＩａ、Ｉｂ、Ｉｃ、Ｉｄとすれば、例えば（Ｉａ，Ｉｂ）（Ｉｂ，Ｉｃ）（Ｉｃ，Ｉｄ）の３組の視差画像によって奥行き感のある立体画像が表示できる。この状態でさらに、上下方向に回り込んだ立体画像を生成するために、相対的に上の方向から見た４つの画像と、同様に下の方向から見た４つの画像を利用するとすれば、視点の数は「８」となる。
【００１０】
以上の定義のもと、本発明のある態様は、立体画像処理方法に関する。この方法は、立体画像の流通の起点ともいうべき符号化側の技術と把握することができる。この方法は、立体画像を表示するための基礎画像に、その立体画像を表示するための一連の処理における所定の場面にて参照すべき情報（以下「立体情報」ともいう）を付加するものである。
【００１１】
「付加する」とは、基礎画像の中に組み込んでもよいし、基礎画像のヘッダその他の領域に組み込んでもよいし、基礎画像と関連づけられた別ファイルなどに組み込んでもよく、要するに基礎画像との対応関係を設ければよい。「立体画像を表示するための一連の処理における所定の場面」の例は、例えばサイドバイサイド画像をマルチプレクス画像へ変換する場面である。
【００１２】
この方法によれば、立体情報を参照することにより、適切な方法で立体画像を表示することができる。この方法で多数の基礎画像を準備すれば、種々の情報端末がそのデータを取りだして立体表示できるため、この方法は立体画像流通のための基礎技術として働く。この方法は、例えば立体画像サーバにて利用可能である。
【００１３】
本発明の別の態様は、上述の方法によって生成された画像データの構造に関する。このデータ構造は、立体画像を表示するための基礎画像の主データと、その立体画像を表示するための一連の処理における所定の場面にて参照すべき情報、すなわち立体情報を保持する副データとの組合せとして形成されている。主データは基礎画像を所定の手法にて圧縮したものであってもよい。「組合せ」とは、両者が一体の場合の他、両者に何らかの関連づけがなされていればよい。このデータ構造によれば、上述のごとく、表示側にて容易に立体表示が実現する。
【００１４】
本発明のさらに別の態様も立体画像処理方法に関する。この方法は、上述のデータ構造を解釈して利用するもの、すなわち一般には立体画像を表示する復号側の技術と把握することができる。この方法は、立体画像を表示するための基礎画像に付加された、その立体画像を表示するための一連の処理における所定の場面にて参照すべき情報、すなわち立体情報を検出するものである。検出を容易にするために、立体情報の付加は予め符号化側と合意された所定の形式にしたがってなされてもよい。この方法はさらに、検出された立体情報をもとに基礎画像の構成の態様を別のものに変換してもよい。
【００１５】
本発明のさらに別の態様も立体画像処理方法に関する。この方法は、メモリを有する装置にて立体画像を扱う際、画面に最終的に表示される基礎画像の構成の態様とは異なる態様の基礎画像を前記メモリへ保持しておき、適宜これを読み出して利用するものである。例えば、立体画像の表示にマルチプレクス画像が用いられても、その立体画像に拡大その他の処理を施したいとき、マルチプレクス形式よりもサイドバイサイド形式のほうが好都合なことがある。その場合、メモリにはサイドバイサイド画像を保持しておけば処理の高速化が実現する。
【００１６】
本発明のさらに別の態様も立体画像処理方法に関する。この方法は、立体画像を表示するための基礎画像に付加された、その立体画像を表示するための一連の処理における所定の場面にて参照すべき情報、すなわち立体情報を検出し、その情報をもとに立体画像の表示画面の輝度を調整する。たとえば、立体情報として、立体画像に想定された視点の数に関する情報を入れておき、輝度をその数に応じて調整してもよい。
【００１７】
仮に視点数が「４」であると、４枚の視差画像を合成してマルチプレクス画像が形成される。４つの視点のうちいずれかひとつの視点から見える画素数は通常の二次元画像を見た場合の１／４にとどまる。したがって、画面の輝度は理論上通常の１／４となる。このため、視点数に応じて表示装置の画面の輝度を高める処理が有効になる。この処理は、例えば視点数を検出するソフトウエアと輝度を調整する回路の協働によってなされる。
【００１８】
本発明のさらに別の態様は、立体画像処理装置に関する。この装置は、立体画像を準備する符号化側のものであり、立体画像を表示するための基礎画像を取得する画像取得部と、取得された基礎画像に、その立体画像を表示するための一連の処理における所定の場面にて参照すべき情報、すなわち立体情報を付加する情報付加部とを含む。画像取得部は、自ら基礎画像を生成してもよいし、既製の基礎画像を入力してもよい。
【００１９】
本発明のさらに別の態様も立体画像処理装置に関する。この装置は、立体画像を実際に表示し、またはそのための前処理を行うものであり、立体画像を表示するための基礎画像を取得する画像取得部と、取得された基礎画像に付加された、その立体画像を表示するための一連の処理における所定の場面にて参照すべき情報、すなわち立体情報を検出する情報検出部とを含む。画像取得部は、たとえばネットワーク経由または記録媒体などから基礎画像を取得または入力する。画像取得部は、予め圧縮されていた画像データを入力する画像入力部と、入力された画像データを伸張することによって基礎画像を生成する画像伸張部とを含んでもよく、検出された情報をもとに前記立体画像の表示画面の輝度を調整する輝度調整部を含んでもよい。
【００２０】
本発明のさらに別の態様は、立体画像処理システムに関する。このシステムは立体画像の合成装置と表示装置を含み、合成装置は、立体画像を表示するための基礎画像に、その立体画像を表示するための一連の処理における所定の場面にて参照すべき情報、すなわち立体情報を組み込み、表示装置は、立体情報を検出してこれをもとに基礎画像に適宜画像処理を施し、立体画像を表示する。画像処理の例として、基礎画像の構成の態様の変更がある。本システムはサーバ・クライアントシステムであってもよい。本システムは立体画像の流通促進に寄与できる。
【００２１】
本発明のさらに別の態様は立体画像処理方法に関する。この方法は、立体画像を表示するための基礎画像を取得し、この基礎画像の一部を検査することによって立体画像を表示するための一連の処理における所定の場面にて参照すべき情報、すなわち立体情報を推定するものである。いままでに述べた場合とは異なり、ここでは立体情報が明示的に付加されていない場合の処理を考えている。そのため、基礎画像の一部が実際に検査される。一例として、基礎画像上のいくつかの領域を調べることにより、これがサイドバイサイド画像であるか否かが推定される。
【００２２】
この方法によれば、明示的に立体情報が与えられていない場合でも、基礎画像からそれを知ることができる。したがって一般的な手法で作成された過去の画像を利用することができ、ソフト資産の有効活用が図られる。
【００２３】
なお、以上の構成要素の任意の組合せ、本発明の表現を方法、装置、システム、コンピュータプログラム、記録媒体、伝送媒体などの間で変換したものもまた、本発明の態様として有効である。
【００２４】
【発明の実施の形態】
ＬＣＤに画像を表示するとき、通常、表示の最小単位はドットである。しかし、ＲＧＢに対応する３個のドットが集まってひとつのピクセルが形成され、通常の画像表示または画像処理ではピクセルが処理の最小単位として意識される。
【００２５】
しかし、立体画像をＬＣＤに表示する場合、別の配慮が必要になる。右目画像と左目画像は、レンチキュラーレンズやパララックスバリアなどの光学フィルタを通して、視差をもってユーザの目に到達する。左右両目の画像をピクセル単位、すなわち３ドット単位で交互に配置すると、右目画像のみが見える領域と左目画像だけが見える領域との間に、両方の画像が見える領域が発生し、色も混ざり、非常に見にくくなる。そのため、物理的な最小表示要素であるドット単位による交互の配置が望ましい。そこで、立体表示すべき基礎画像として、ドット単位で右目画像と左目画像を交互に配置したマルチプレクス画像が利用されることが多い。
【００２６】
視差画像が右目画像と左目画像の２枚のみからなる場合、すなわち水平視点数が「２」の場合、マルチプレクス画像は右目画像と左目画像をドット単位でストライプ状に配すれば足りる。しかし、視点数が「４」で、４枚の視差画像をもちいて水平方向の視点移動を考慮した立体画像を表示する場合、図１に示すごとく、画面１０の前におかれたパララックスバリア１２により、第１〜第４の視点ＶＰ１〜４からそれぞれ対応する視差画像のドットのみが見える。画面１０では、第１の視点ＶＰ１に対応する第１の視差画像のドットに「１」を付して示しており、以下の視点でも同様である。この例では、第１〜第４の視差画像がドット単位で順にストライプ状に配され、マルチプレクス画像が形成される。
【００２７】
さらに、垂直方向にも視点移動を考えたとき、パララックスバリア１２はストライプ状ではなくマトリクス状に並ぶピンホールになり、マルチプレクス画像もドット単位で入れ替わるマトリクス状になる。図２は、水平視点数、垂直視点数ともに「４」の場合のマルチプレクス画像２０の例を示す。ここで、（ｉ，ｊ）と表記される領域は、それぞれ水平方向の第ｉ視点、かつ垂直方向の第ｊ視点から見えるべきドットを示す。同図のごとく、水平方向には、ｉが１、２、３、４、１、・・・とサイクリックに変化し、同様に垂直方向には、ｊが１、２、３、４、１、・・・とサイクリックに変化する。
【００２８】
立体画像の利用促進を考えた場合、図２に示すマルチプレクス画像２０を必要な端末に送信すればよい。マルチプレクス画像２０であれば、すでに立体視するための最終形式になっているため、端末側ではそれを単に表示すれば済む。もちろんこのとき、立体視のためにパララックスバリア等の光学フィルタの存在を仮定している。
【００２９】
しかし本発明者は、ここでひとつ問題が生ずることを認識した。すなわち、送信に際して、当然ながら画像データを圧縮すべきであるが、マルチプレクス画像２０の場合、ＪＰＥＧ（Joint Photographic Expert Group）を代表とする通常の非可逆圧縮が事実上利用できないことである。なぜなら、マルチプレクスされた複数の視差画像は、それぞれ違う視点の画像であるから、画素レベルで考えるとそれらは本質的に無関係であり、ＪＰＥＧ等の空間周波数に依拠する手法で圧縮すると、せっかく各視点からの独立した視差画像を利用したにも拘わらず、それらの画像間で高周波成分がそぎ落とされ、結果的に正しい立体表示ができなくなる。とくに、独立した画像を画素単位で交互に並べたとき、非常に細かい高周波成分が多数生じるから、この問題は場合により致命的である。ネットワークの帯域が広がっているとはいえ、通常の画像は問題なく圧縮できるときに、立体画像のための基礎画像だけは圧縮できないとなれば、普及の足かせとなる。
【００３０】
そこで、送信や保存の場合で圧縮可能な形式として、サイドバイサイド画像の利用度が高くなることが考えられる。図３は水平、垂直とも４つの視点をもつサイドバイサイド画像３０を示す。ここで、（ｉ，ｊ）と表記される領域は、それぞれ水平方向の第ｉ視点、かつ垂直方向の第ｊ視点から見えるべき一枚の視差画像を示す。すなわちサイドバイサイド画像３０は、視差画像を水平または垂直の一方向か両方向に並置する形で合成したものであり、各視差画像は、それをサイドバイサイド画像３０から切り取れば、一枚の画像として機能する。
【００３１】
ただし、各視差画像は４×４＝１６の視点のひとつのみに対応すればよいため、立体画像として表示すべき画像サイズの１／１６のサイズでよく、通常は立体画像と同じサイズのオリジナルの画像から、水平方向と垂直方向のそれぞれについて、４ドットおきに１ドットを選んで生成される。わかりやすい例でいえば、視点数が「２」の右目画像と左目画像だけからなるマルチプレクス画像の場合、右目からは奇数列のドットのみが見えればよく、左目からは偶数列のドットのみが見えればよい。したがって、右目画像は予めオリジナルの画像から奇数列だけを取り出して水平方向に１／２に間引かれたものであればよく、左目画像も同様に偶数列だけを取り出せばよい。一般に視点の数が「ｎ」なら、サイドバイサイド画像を構成する各視差画像はオリジナルの画像サイズの１／ｎでよく、すべての視差画像をタイルのように並置すればちょうどオリジナルの画像サイズに戻る。
【００３２】
サイドバイサイド画像３０の場合、各視差画像がその境界を除いて独立しているため、非可逆圧縮をしても、悪影響はせいぜい境界部分にしか生じない。そのため、通常はサイドバイサイド３０をＪＰＥＧ等によって圧縮し、ネットワークを介して容易に送信したり、小さなストレージでも多数保存できるようになる。このように、サイドバイサイド画像３０は普及面で好適であるが、逆に欠点もあり、それは特別なビュアを要する点である。すなわち、いずれの表示装置でも、最終的にはマルチプレクス画像に変換しないと立体表示ができず、サイドバイサイド画像３０からマルチプレクス画像への変換処理が必要になる。
【００３３】
以上の一長一短を有するふたつの形式に加え、普及面、とくに画像の準備の観点から第３の形式としてセパレート画像が考えられる。セパレート画像は、集合体として立体画像を形成できるが、単独では通常の二次元画像に過ぎない。図４は１６枚のセパレート画像と立体画像との関係を示す。１６枚のうち、例えば「視点（４，２）の画像」と表記されたセパレート画像３２は、視点（４，２）を想定したもので、その画像サイズはオリジナルの画像と同じである。したがって、１６枚のセパレート画像は、それぞれユーザが移動しながら撮影したカメラ画像と考えればわかりやすい。
【００３４】
このように、セパレート画像はそのサイズが撮影時のままでよいため、間引きや合成といった処理を必要とせず、準備は楽である。また、それぞれの画像はオリジナルの状態で残るため、単独で別途利用できる。しかし、立体表示の場合、全体で１６枚の視差画像を要するため、伝送や保存の面では不利であり、また、やはり特別なビュアが必要になる。
【００３５】
以上が立体画像の普及にあたって考えられる主要な３形式である。これらの変形は最後に述べるとして、以下、基礎画像がこれらの３形式のいずれかで表現されているとき、普及促進および立体表示を技術的に実現するための立体画像処理方法を説明する。以下、簡単のために水平視点数が「２」、垂直視点数が「１」の場合を例示する。
【００３６】
図５、図６、図７はそれぞれ、本実施の形態に係る基礎画像のうち、サイドバイサイド画像４０、マルチプレクス画像５０、２枚でセットのセパレート画像６０、６２のデータ構造を模式的に示す。
【００３７】
図５に示すごとく、サイドバイサイド画像４０は、左目画像である第１視差画像４４と、右目画像である第２視差画像４６を水平に合成したもので、その画像データに後述するヘッダ領域４２が付加されている。同様に図６のマルチプレクス画像５０にも同じフォーマットにしたがうヘッダ領域４２が付加されている。図７のふたつのセパレート画像６０、６２には、それぞれヘッダ領域４２が付加されている。いずれの場合も、このデータ構造は、立体画像を表示するための基礎画像である主データと、その立体画像を表示するための一連の処理における所定の場面にて参照すべき情報を保持する副データとの組合せと考えることができる。なお、この主データが基礎画像を所定の手法にて圧縮したものである場合、副データはその圧縮手法において規定されるヘッダ領域に格納されてもよい。ヘッダ領域の規定がすでに存在する場合、その領域のうち例えばユーザ定義領域を利用することができる。
【００３８】
図８はヘッダ領域４２の詳細構成を模式的に示す。同図において、各領域は以下の立体情報を保持する。
（１）ＤＩＭ領域７０：３ビットで基礎画像の次元および構成の態様を示す。
０００：セパレート画像または立体視できない二次元画像全般
００１：三次元画像のうちマルチプレクス画像
０１０：三次元画像のうちサイドバイサイド画像
０１１：リザーブ
１ｘｘ：リザーブ
リザーブされる形式の例として、サイドバイサイド画像のように複数の視差画像を並置しながら、ただしそれらの視差画像を一切間引かないオリジナル画像のまま並置する「ジョイント画像」や、偶数フィールドと奇数フィールドで視差画像を時分割で交互に表示すべき視差画像であることを示す「フィールドシーケンシャル画像」などが考えられる。「ジョイント画像」は平行法や交差法で観察されることが多いが、ビュアで間引きしてサイドバイサイド画像に変換したり、直接マルチプレクス画像へ変換することもできるため、ひとつのフォーマットとして有効である。
【００３９】
（２）ＢＤＬ領域７２：１ビットでサイドバイサイド画像の境界処理の有無を示す。ＤＩＭが「０１ｘ」のときに意味をもつ。
０：境界処理なし
１：境界処理あり
前述のごとく、サイドバイサイド画像を非可逆圧縮するとき、その境界部分で画像が悪影響を受ける。これを軽減するために、次項で示す処理がなされているか否かを示す。
【００４０】
（３）ＨＤＬ領域７４：２ビットでサイドバイサイド画像の境界処理の内容を示す。ＢＤＬが「１」のときに意味をもつ。
００：白枠を入れる
０１：黒枠を入れる
１０：端の画素をコピーして入れる
１１：リザーブ
圧縮による悪影響を低減するため、境界部分に白枠、黒枠等を入れて複数の視差画像の混じりを減らす。端の画素のコピーも同様の効果がある。
【００４１】
（４）ＷＤＴ領域７６：２ビットでサイドバイサイド画像の境界処理の画素数を指定する。ＢＤＬが「１」のとき意味をもつ。
００〜１１：画素数
（５）ＶＰＨ領域７８：８ビットで立体画像に想定された水平視点数を示す。基礎画像の作成時に手動で記述してもよいし、基礎画像を生成するソフトウエアが自動生成してもよい。
００００００００：不明またはリザーブ
０００００００１〜１１１１１１１１：水平視点数
（６）ＶＰＶ領域８０：８ビットで立体画像に想定された垂直視点数を示す。
００００００００：不明またはリザーブ
０００００００１〜１１１１１１１１：垂直視点数
なお、ＶＰＨとＶＰＶがともに０００００００１のとき、基礎画像は立体視のできない通常の二次元画像と判断してもよい。
【００４２】
（７）ＯＤＨ領域８２：１ビットで複数の視差画像の水平方向の並びを示す。
０：撮影時のカメラの並びと同じ
１：撮影時のカメラの並びと逆
つまり、ＯＤＨが「０」のとき、撮影時いちばん左側のカメラで撮影された視差画像がそのまま基礎画像においてもいちばん左側に記録されており、以降順に記録されている。通常並びがランダムということは考えにくいため、２種類の規定でよい。
【００４３】
（８）ＯＤＶ領域８４：１ビットで複数の視差画像の垂直方向の並びを示す。
０：撮影時のカメラの並びと同じ
１：撮影時のカメラの並びと逆
なお、前述の視点数に関するＶＰＨとＶＰＶがともに８ビットであり、通常は十分すぎると考えられるため、これらの最上位ビットをそれぞれＯＤＨ、ＯＤＶに割り当ててもよい。
【００４４】
（９）ＰＳＨ領域８６：８ビットで各セパレート画像が水平方向において何番目の視点位置の画像であるかを示す。ＤＩＭが「０００」のときに意味がある。
００００００００：不明またはリザーブ
０００００００１〜１１１１１１１１：水平方向の位置
なお、各セパレート画像上に決められた原点、例えば画像の左上角の点の座標のような絶対値を別途ヘッダ領域４２に盛り込んでもよく、その場合、処理の高速化につながる。
【００４５】
（１０）ＰＳＶ領域８８：８ビットで各セパレート画像が垂直方向において何番目の視点位置の画像であるかを示す。
００００００００：不明またはリザーブ
０００００００１〜１１１１１１１１：垂直方向の位置
例えば図４で「視点（４，２）の画像」と表記されたセパレート画像は、ＰＳＨ＝４、ＰＳＶ＝２という記述になる。
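The header fields (1) to (10) listed above can be sketched as a packed bit field. This is a minimal sketch, not the patent's actual byte layout: the field names and bit widths come from the description, but the packing order (fields concatenated MSB-first in the order listed), the names `StereoHeader`, `pack_header`, and `unpack_header`, and the `DIM_*` constant names are assumptions made for illustration.

```python
from dataclasses import dataclass

# (field name, bit width) in the order the fields are listed in the description.
# Concatenating them MSB-first into one integer is an assumption of this sketch.
FIELDS = [("dim", 3), ("bdl", 1), ("hdl", 2), ("wdt", 2), ("vph", 8),
          ("vpv", 8), ("odh", 1), ("odv", 1), ("psh", 8), ("psv", 8)]
TOTAL_BITS = sum(width for _, width in FIELDS)

# DIM codes given in the description (constant names are hypothetical).
DIM_SEPARATE_OR_2D = 0b000
DIM_MULTIPLEX = 0b001
DIM_SIDE_BY_SIDE = 0b010


@dataclass
class StereoHeader:
    dim: int = 0  # dimension / configuration of the basic image
    bdl: int = 0  # boundary processing present (side-by-side only)
    hdl: int = 0  # boundary type: 00 white frame, 01 black frame, 10 edge copy
    wdt: int = 0  # boundary width code
    vph: int = 0  # horizontal viewpoint count (0 = unknown)
    vpv: int = 0  # vertical viewpoint count (0 = unknown)
    odh: int = 0  # horizontal order: 0 same as camera order, 1 reversed
    odv: int = 0  # vertical order: 0 same as camera order, 1 reversed
    psh: int = 0  # horizontal viewpoint position of a separate image
    psv: int = 0  # vertical viewpoint position of a separate image


def pack_header(header: StereoHeader) -> int:
    """Concatenate all fields into a single integer, first field in the top bits."""
    value = 0
    for name, width in FIELDS:
        field = getattr(header, name)
        if not 0 <= field < (1 << width):
            raise ValueError(f"{name}={field} does not fit in {width} bits")
        value = (value << width) | field
    return value


def unpack_header(value: int) -> StereoHeader:
    """Inverse of pack_header."""
    header = StereoHeader()
    shift = TOTAL_BITS
    for name, width in FIELDS:
        shift -= width
        setattr(header, name, (value >> shift) & ((1 << width) - 1))
    return header
```

For example, the separate image labelled "viewpoint (4, 2)" in FIG. 4 would be described in this sketch as `StereoHeader(dim=DIM_SEPARATE_OR_2D, vph=4, vpv=4, psh=4, psv=2)`.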
【００４６】
以上がヘッダ領域４２の一例である。この領域を利用して立体画像の流通を実現するための装置を説明する。
図９はこの領域を生成する画像処理装置１００の構成を示す。この装置１００は、立体画像を表示するための基礎画像を取得する画像取得部１０２と、取得された基礎画像に、立体画像を表示するための一連の処理における所定の場面、例えば後に表示側の装置にてマルチプレクス画像を生成する場面において参照すべき立体情報を付加する情報付加部１０４とを含む。この構成は、ハードウエア的には、任意のコンピュータのＣＰＵ、メモリ、その他のＬＳＩで実現でき、ソフトウエア的にはメモリのロードされた基礎画像生成機能および立体情報付加機能のあるプログラムなどによって実現されるが、ここではそれらの連携によって実現される機能ブロックを描いている。したがって、これらの機能ブロックがハードウエアのみ、ソフトウエアのみ、またはそれらの組合せによっていろいろな形で実現できることは、当業者には理解されるところである。したがって、以下、構成の名称を明示的に示さないものは、例えばＣＰＵを中心とする制御部によってなされると考えてよい。
【００４７】
画像取得部１０２は、ネットワークやユーザのデジタルカメラなどの画像ソースからオリジナル画像を入力し、これをそのまま基礎画像とするか、または加工して基礎画像を生成する。例えばセパレート画像が必要な場合、単にオリジナル画像をそのまま基礎画像とすればよい。一方、サイドバイサイド画像が必要な場合、オリジナル画像を複数並置して合成する。マルチプレクス画像が必要な場合、各視点からの視差画像をストライプ状やマトリクス状に再構成する。
【００４８】
画像取得部１０２はさらに、得られた基礎画像を必要に応じて圧縮する。それに先立ち、圧縮によって立体画像の画質に影響が出るか否かを判定し、出ると判定したときは圧縮を禁止してもよい。例えばマルチプレクス画像を空間周波数成分に関して圧縮する場合、圧縮を禁止したり、これを一旦サイドバイサイド画像へ変換した後圧縮してもよい。
【００４９】
情報付加部１０４は、そうして得られた基礎画像に前述のヘッダ情報を付加し、その結果得られた立体表示のための画像データを図示しない記憶装置へ記録したり、ネットワーク経由で所定の個所へ配信する。以上、この装置１００によれば、立体表示を望む者のために、予め立体情報の付いた基礎画像を準備することができる。
【００５０】
なお、画像取得部１０２は、必ずしもオリジナル画像を最初に入手するとは限らない。すでにマルチプレクス画像になっているものをネットワーク等から入力し、その立体情報を検出し、それがマルチプレクス画像であることを判定し、そのままの状態では圧縮の不向きであることを認識し、これをサイドバイサイド画像に変換した後圧縮し、立体情報を書き換えるといった処理も可能である。その場合、この装置１００は立体画像流通の中継点として利用することもできる。
【００５１】
一方、図１０は、実際に立体表示を行う復号側の画像処理装置２００の構成を示す。この装置２００は、立体画像を表示するための基礎画像を取得する画像取得部２０２と、取得された基礎画像に付加された、立体画像を表示するための一連の処理における所定の場面にて参照すべき立体情報を検出する情報検出部２０４とを含む。この装置２００は典型的にはユーザ側の端末であり、画像取得部２０２は、すでに立体情報が付加された基礎画像を取得する。画像取得部２０２は、予め圧縮されていた画像データを入力したとき、これを伸張することによって基礎画像を生成または再生してもよい。
【００５２】
つづいて、情報検出部２０４がその基礎画像に付加されたヘッダ領域をパースし、立体情報を検出する。検出した立体情報から、この基礎画像がマルチプレクス画像でないことが判明すれば、この装置２００はこの基礎画像をマルチプレクス画像へ変換し、立体画像を表示する。この装置２００はそのオプショナルな機能として、検出された立体情報のうちとくに水平視点数と垂直視点数をもとに、前述の輝度に関する考察にしたがい、この装置２００の表示画面（図示せず）の輝度を高めてもよい。
【００５３】
この装置２００は、単に立体画像の表示だけでなく、当然ながら基礎画像を保存、編集することもできる。保存の際、基礎画像がマルチプレクス画像であればこれをサイドバイサイド画像その他へ変換し、立体情報を書き換えたうえで保存してもよい。編集の際、例えば画像を拡大縮小したいことがある。そのとき、マルチプレクス画像であると処理は煩雑であるから、これをいったんサイドバイサイド画像へ変換し、しかる後に所定の画像処理を施し、最後にマルチプレクス画像へ戻して表示してもよい。
【００５４】
なおこの装置２００は、こうした編集その他の画像処理の便宜を図るべく、マルチプレクス画像以外の形式の画像、とくにサイドバイサイド画像を常時メモリその他の記憶装置に保持しておき、必要に応じて適宜これを読み出して利用すればよい。
【００５５】
この装置２００の付加的な構成として、表示装置のもつ視点数や最適観察距離などの特性をデータとして取得する特性取得部を設ければ、さらに利便性が増す。例えば基礎画像の想定視点数と表示装置のそれとが異なる場合、前記の特性をもとに基礎画像から表示すべき視差画像を自動的に選択する表示画像選択部を設けることができる。基礎画像の想定視点数が「４」で、表示装置のそれが「２」であれば、４つの視差画像からふたつを選択する。これらふたつの視差画像は連続する視点のものである必要はなく、立体感を強調するには、むしろ視点を飛ばした２画像を選択してもよい。基礎画像の視点数が「２」で表示装置のそれが「４」であれば、同じ視差画像を２回づつ表示することで画面正面に立体視が可能な領域を確保できる。
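The selection of parallax images described in paragraph [0055] can be sketched as follows. This is an illustrative sketch only: the function name `select_views`, the `skip` parameter (how many viewpoints apart the chosen images are), and the choice to centre the selection are assumptions; the description only states that non-adjacent views may be chosen to emphasize depth and that each view may be shown twice when the display has more viewpoints than the basic image.

```python
def select_views(num_source, num_display, skip=1):
    """Return 0-based source view indices, one per display viewpoint.

    num_source:  viewpoint count assumed by the basic image
    num_display: viewpoint count of the display device
    skip:        spacing between chosen views when views must be dropped
    """
    if num_source >= num_display:
        span = (num_display - 1) * skip
        if span > num_source - 1:
            raise ValueError("skip is too large for the available views")
        # Centring the chosen views is an assumption of this sketch.
        start = (num_source - 1 - span) // 2
        return [start + k * skip for k in range(num_display)]
    # Fewer source views than display viewpoints: repeat each view.
    return [k * num_source // num_display for k in range(num_display)]
```

With four source views and a two-view display, `skip=1` picks the two central views and `skip=3` picks the outermost pair for a stronger depth effect; with two source views and a four-view display, each view is shown twice.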
【００５６】
さらにこの画像処理装置２００が、表示画面を見る観察者の頭部位置を検出する位置検出部を備えていれば、表示画像選択部は、頭部位置に合わせて選択すべき視差画像を変化させ、観察者に回りこんだ画像を見せることもできる。
【００５７】
また、光学フィルタが取り替え可能な場合、例えばこの装置２００の表示部に、光学フィルタにパターン印刷された視点数などの情報を含む表示を光学的に読み取る読取部を設けてもよい。読み取られたデータが、視差画像の視点数との不一致を示唆するとき、上述のように視差画像を適宜最適選択および表示してもよいし、光学フィルタの取り替えを促す表示を行ってもよい。
【００５８】
図１１は、立体画像流通のためのネットワークシステム３００の構成を示す。ここで合成装置３０２は図９の画像処理装置１００であり、流通の起点として作用する。一方、表示装置３０４は図１０の画像処理装置２００であり、流通の終点として作用する。同図のごとく、合成装置３０２が基礎画像を多数記録する記憶装置３０６をもち、画像サーバとして振る舞うことにより、ユーザは所望の立体画像をインターネットその他のネットワーク３０８を介して容易に取得することができる。
【００５９】
以上、本発明を実施の形態をもとに説明した。これらの実施の形態は例示であり、それらの各構成要素や各処理プロセスの組合せにいろいろな変形例が可能なこと、またそうした変形例も本発明の範囲にあることは当業者に理解されるところである。以下、そうした例をいくつか挙げる。
【００６０】
図９や図１０の画像処理装置１００、２００の機能はそれぞれコンピュータプログラムの形でユーザへ提供することができる。ユーザが基礎画像を自ら生成したい場合、図９の画像処理装置１００の機能をオーサリングツールとして整えたうえでユーザへ提供すればよい。
【００６１】
図８で示したヘッダ領域４２の構成はビット数も含め、当然自由度が大きい。例えば、
・立体画像としてユーザから観察される基礎画像の著作権情報
・基礎画像を見るのにふさわしいパララックスバリアやレンチキュラーレンズなどの光学フィルタが満たすべき条件
などをさらに組み込むことができる。「光学フィルタが満たすべき条件」の例として、視差画像の視点間距離、すなわち眼間距離や撮影時のカメラの画角などがある。こうした条件は、立体画像を前後方向に正しいスケールで再生したい場合の光学フィルタの設計には必須のパラメータである。また、前述の基礎画像の視点数が表示装置の視点数より多い場合の画像の選択においても、より自然な立体感が得られる画像を自動的に選択するために参照することができる。
【００６２】
実施の形態では最終的に表示する立体画像をマルチプレクス画像としたが、表示すべき画像は観察方法により変わる。したがって、さまざまな画像が観察方法に適合した画像に変換処理されて表示されてもよい。
例えば液晶シャッタメガネを用いる場合、表示すべき画像はフィールドシーケンシャル画像である。また、ヘッドマウントディスプレイで、左右の目に対応して別々の表示手段をもつタイプのものでは、表示する立体画像はセパレート画像となり、別々の画像出力手段によりそれぞれの表示手段に送られる。表示装置がひとつのヘッドマウントディスプレイで、サイドバイサイド画像を表示することもできる。この場合、光学的な手段により、画像が左半分と右半分に分離され、かつ、水平方向に拡大されて観察されるように構成すればよい。さらに、交差法、平行法といった観察方法では、ジョイント画像を表示すればよい。
【００６３】
実施の形態では、立体画像を表示する側の装置、すなわち図１０の画像処理装置２００は、基礎画像に立体情報が付加されている前提で処理を開始した。しかし、仮に本実施の形態によらない既存の基礎画像があれば、これは実施の形態に特徴的なヘッダ領域を有さないため、その基礎画像を検査する検査・推定処理部を設け、画像処理装置２００の側で立体情報を推定してもよい。例えばサイドバイサイド画像であるか否かは、画像を水平方向および垂直方向にそれぞれｍ等分およびｎ等分し、それぞれの領域の画像の近似度をｍとｎの値を変えながら評価してもよい。あるｍとｎの組について各領域またはその一部の近似度が高ければ、これは水平視点数ｍ、垂直視点数ｎのサイドバイサイド画像と推定できる。近似度の評価は、例えば画素値の差分二乗和による。このほかにも、基礎画像に微分フィルタを作用させてみて、領域の境界線が浮かび上がることも考えられ、それによってサイドバイサイド画像であるか否かの推定ができる場合もある。
【００６４】
【発明の効果】
本発明によれば、立体画像の流通が促進できる。
【図面の簡単な説明】
【図１】ユーザが水平方向にマルチプレクスされた４枚の視差画像を立体視する状態を示す図である。
【図２】マルチプレクス画像を示す図である。
【図３】サイドバイサイド画像を示す図である。
【図４】複数のセパレート画像の集合を示す図である。
【図５】実施の形態によるサイドバイサイド画像の構成を模式的に示す図である。
【図６】実施の形態によるマルチプレクス画像の構成を模式的に示す図である。
【図７】実施の形態によるセパレート画像の構成を模式的に示す図である。
【図８】実施の形態によって基礎画像に付加されたヘッダ領域の構成図である。
【図９】実施の形態に係る、画像流通の起点となる画像処理装置の構成図である。
【図１０】実施の形態に係る、画像流通の終点となる画像処理装置の構成図である。
【図１１】図９の画像処理装置を合成装置、図１０の画像処理装置を表示装置とする画像流通のためのネットワークシステムの構成を示す図である。
【符号の説明】
１２パララックスバリア、２０，５０マルチプレクス画像、３０，４０サイドバイサイド画像、３２，６０，６２セパレート画像、４２ヘッダ領域、１００，２００画像処理装置、１０２，２０２画像取得部、１０４情報付加部、２０４情報検出部、３００ネットワークシステム、３０２合成装置、３０４表示装置、３０６記憶装置。
[0001]
[Technical field to which the invention belongs]
The present invention relates to stereoscopic image processing technology, and more particularly, to a method, apparatus, system, and related computer program and data structure for processing or displaying a stereoscopic image.
[0002]
[Prior art]
Over the past few years, the number of Internet users has grown rapidly, and we are about to enter the broadband era, which may be called a new stage of Internet use. Because broadband communication widens the communication band dramatically, the distribution of heavy image data, which has tended to be avoided in the past, will also flourish. Concepts such as “multimedia” and “video on demand” were proposed long ago, but only in the broadband era have these terms become something ordinary users can actually experience.
[0003]
As the distribution of images, especially moving images, spreads, users will naturally demand richer content and higher image quality. Meeting these demands depends largely on the digitization of existing video software, the development of authoring tools for that purpose, and the pursuit of highly efficient, low-loss image coding technology.
[0004]
[Problems to be solved by the invention]
Under such circumstances, the distribution of pseudo three-dimensional images (hereinafter also simply referred to as “stereoscopic images”) is expected to attract technical attention as one form of image distribution service in the near future and to capture a considerable market. Stereoscopic images fulfill the user's desire for more realistic images, and are especially attractive for applications that pursue a sense of presence, such as movies and games. Furthermore, stereoscopic images are also useful for the realistic display of products in product presentations in EC (electronic commerce), which is expected to become one of the standards for commerce in the 21st century.
[0005]
However, when considering a new net business of stereoscopic image distribution, it may be said that neither an infrastructure for that purpose nor a model for business promotion exists yet. The present inventor has made the present invention paying attention to such a current situation, and an object of the present invention is to provide a stereoscopic image processing technique for enabling the promotion of distribution of stereoscopic images from a technical aspect.
[0006]
[Means for Solving the Problems]
In order to understand the present invention, the following concepts in this specification are first defined.
“Stereoscopic image”: Not the image data itself but, conceptually, the image that reaches the user's eyes as a result of stereoscopic display. The image data that can be displayed as a stereoscopic image is called a “multiplex image”, described later. That is, when a multiplex image is displayed, a stereoscopic image is seen.
“Parallax image”: Usually, stereoscopic viewing with a sense of depth requires preparing an image to be projected to the right eye (hereinafter simply referred to as a right-eye image) and an image to be projected to the left eye (hereinafter simply referred to as a left-eye image) so that parallax arises. A pair of images that cause parallax, such as a right-eye image and a left-eye image, is sometimes collectively called a parallax image, but in this specification each individual image that causes parallax is called a parallax image. That is, the right-eye image and the left-eye image are each a parallax image. More generally, the image from each viewpoint assumed in the stereoscopic image is a parallax image.
[0007]
“Basic image”: An image that is to undergo the processing necessary for stereoscopic viewing so that a stereoscopic image can be displayed, or an image that has already undergone such processing. Specific examples include parallax images in the separate format described later, as well as images in which a plurality of parallax images have already been combined in some form, as in the multiplex format or the side-by-side format (these are also referred to as “composite images”).
“Side-by-side format”: One of the configurations of a basic image. A format in which a plurality of parallax images are juxtaposed and combined in the horizontal direction, the vertical direction, or both. Usually, thinned-out parallax images are juxtaposed. For example, when two parallax images are juxtaposed in the horizontal direction, each parallax image is thinned out by every other pixel in the horizontal direction. A basic image in the side-by-side format is also simply referred to as a “side-by-side image”.
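The two-view horizontal case in the definition above can be sketched as follows. This is a minimal sketch under stated assumptions: images are plain lists of rows, the function name `make_side_by_side` is hypothetical, and the choice of which alternate columns each view keeps (the left-eye image keeps 0-based columns 0, 2, ... and the right-eye image keeps columns 1, 3, ...) is an assumption made for illustration.

```python
def make_side_by_side(left, right):
    """Thin two equally sized images to half width and juxtapose them.

    left, right: images as lists of rows; each row is a list of pixel values.
    Returns an image of the original size: left half from `left`,
    right half from `right`.
    """
    if len(left) != len(right):
        raise ValueError("images must have the same height")
    out = []
    for l_row, r_row in zip(left, right):
        if len(l_row) != len(r_row):
            raise ValueError("images must have the same width")
        # Keep every other column of each view (column choice is assumed).
        out.append(l_row[0::2] + r_row[1::2])
    return out
```

Because each view is thinned to half its width, the juxtaposed result has the same width as one original image when that width is even.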
[0008]
"Multiplex format": One of the basic image composition modes. Final image data format for displaying stereoscopic images. A basic image in a multiplex format is also simply referred to as a “multiplex image”.
“Separate format”: One of the basic image configurations. Although it is a single two-dimensional image, it is assumed that it is stereoscopically viewed in combination with other two-dimensional images, and indicates each of the plurality of two-dimensional images. The basic image of “separate format” is also simply referred to as “separate image”. Separate images are not composite images, unlike multiplexed images and side-by-side images.
[0009]
“Viewpoint”: A stereoscopic image assumes viewpoints from which it is viewed. The number of viewpoints and the number of parallax images are usually equal. When there are two parallax images, a left-eye image and a right-eye image, the number of viewpoints is “2”. However, even with two viewpoints, there is only one assumed position of the user's head. Similarly, when displaying a stereoscopic image that takes the user's movement in the left-right direction into account, if, for example, four viewpoints va, vb, vc, and vd are assumed in the left-right direction and the parallax images seen from them are Ia, Ib, Ic, and Id, a stereoscopic image with a sense of depth can be displayed by, for example, the three pairs of parallax images (Ia, Ib), (Ib, Ic), and (Ic, Id). If, in addition, four images viewed from relatively above and four images viewed from below are used to generate a stereoscopic image that can also be looked around in the vertical direction, the number of viewpoints is “8”.
[0010]
Based on the above definitions, one aspect of the present invention relates to a stereoscopic image processing method. This method can be understood as a technique on the encoding side, which may be called the starting point of the distribution of stereoscopic images. This method adds, to a basic image for displaying a stereoscopic image, information to be referred to in a predetermined scene in a series of processes for displaying that stereoscopic image (hereinafter also referred to as “stereoscopic information”).
[0011]
“Adding” may mean incorporating the information into the basic image itself, into the header or another area of the basic image, or into a separate file or the like associated with the basic image; in short, it suffices to establish a correspondence with the basic image. An example of a “predetermined scene in a series of processes for displaying a stereoscopic image” is the scene in which a side-by-side image is converted into a multiplex image.
[0012]
According to this method, a stereoscopic image can be displayed by an appropriate method by referring to the stereoscopic information. If a large number of basic images are prepared by this method, various information terminals can take out the data and display them stereoscopically, and this method works as a basic technology for distributing stereoscopic images. This method can be used in, for example, a stereoscopic image server.
[0013]
Another aspect of the present invention relates to the structure of image data generated by the method described above. This data structure is formed as a combination of main data, which is a basic image for displaying a stereoscopic image, and sub-data, which holds the information to be referred to in a predetermined scene in a series of processes for displaying the stereoscopic image, that is, the stereoscopic information. The main data may be the basic image compressed by a predetermined method. “Combination” covers not only the case where the two are integrated but also any case where some association is established between them. With this data structure, as described above, stereoscopic display is easily realized on the display side.
[0014]
Still another embodiment of the present invention also relates to a stereoscopic image processing method. This method can be understood as a technique for interpreting and using the above-described data structure, that is, generally a decoding technique for displaying a stereoscopic image. This method detects information to be referred to in a predetermined scene in a series of processing for displaying a stereoscopic image, that is, stereoscopic information added to a basic image for displaying the stereoscopic image. In order to facilitate detection, the stereoscopic information may be added according to a predetermined format agreed with the encoding side in advance. In this method, the configuration of the basic image may be converted into another based on the detected stereoscopic information.
[0015]
Still another aspect of the present invention also relates to a stereoscopic image processing method. In this method, when a stereoscopic image is handled by a device having a memory, a basic image whose configuration differs from that of the basic image finally displayed on the screen is held in the memory and is read out and used as appropriate. For example, even if a multiplex image is used to display a stereoscopic image, the side-by-side format may be more convenient than the multiplex format when enlargement or other processing is to be applied to the stereoscopic image. In that case, holding the side-by-side image in the memory speeds up the processing.
[0016]
Still another aspect of the present invention also relates to a stereoscopic image processing method. This method detects the information to be referred to in a predetermined scene in a series of processes for displaying a stereoscopic image, that is, the stereoscopic information, added to a basic image for displaying the stereoscopic image, and adjusts the luminance of the display screen of the stereoscopic image based on that information. For example, information on the number of viewpoints assumed for the stereoscopic image may be included in the stereoscopic information, and the luminance may be adjusted according to that number.
[0017]
If the number of viewpoints is “4”, a multiplex image is formed by combining four parallax images. The number of pixels visible from any one of the four viewpoints is only 1/4 of that seen when viewing a normal two-dimensional image. The luminance of the screen is therefore theoretically 1/4 of normal. For this reason, it is effective to raise the luminance of the display screen according to the number of viewpoints. This process is performed, for example, through cooperation between software that detects the number of viewpoints and a circuit that adjusts the luminance.
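The reasoning above (luminance falls to 1/n of normal for n viewpoints) suggests a compensation factor equal to the viewpoint count. The sketch below assumes exactly that ideal compensation; the function name `luminance_gain`, the optional `max_gain` clamp standing in for a display's hardware limit, and the treatment of a count of 0 as "unknown" are all assumptions, not details given in the text.

```python
def luminance_gain(h_views, v_views=1, max_gain=None):
    """Return the factor by which to raise display luminance.

    h_views, v_views: horizontal and vertical viewpoint counts read from
    the stereoscopic information. A count of 0 is treated as unknown and
    handled as a single view (an assumption of this sketch).
    """
    total_views = max(h_views, 1) * max(v_views, 1)
    gain = float(total_views)  # luminance is theoretically 1/total_views
    if max_gain is not None:
        gain = min(gain, float(max_gain))  # hypothetical hardware limit
    return gain
```

For the four-viewpoint example in the text, `luminance_gain(4)` gives a factor of 4, which restores the theoretical 1/4 luminance to its normal level.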
[0018]
Yet another aspect of the present invention relates to a stereoscopic image processing apparatus. This apparatus is on the encoding side, which prepares stereoscopic images, and includes an image acquisition unit that acquires a basic image for displaying a stereoscopic image, and an information adding unit that adds, to the acquired basic image, information to be referred to in a predetermined scene in a series of processes for displaying the stereoscopic image, that is, stereoscopic information. The image acquisition unit may generate the basic image itself or may input a ready-made basic image.
[0019]
Still another aspect of the present invention also relates to a stereoscopic image processing apparatus. This apparatus actually displays a stereoscopic image or performs preprocessing for that purpose, and includes an image acquisition unit that acquires a basic image for displaying a stereoscopic image, and an information detection unit that detects information, added to the acquired basic image, to be referred to in a predetermined scene in a series of processes for displaying the stereoscopic image, that is, stereoscopic information. The image acquisition unit acquires or inputs the basic image, for example, via a network or from a recording medium. The image acquisition unit may include an image input unit that inputs previously compressed image data and an image expansion unit that generates the basic image by expanding the input image data. A luminance adjustment unit that adjusts the luminance of the display screen of the stereoscopic image based on the detected information may also be included.
[0020]
Yet another aspect of the present invention relates to a stereoscopic image processing system. This system includes a synthesizing device and a display device for stereoscopic images. The synthesizing device incorporates, into a basic image for displaying a stereoscopic image, information to be referred to in a predetermined scene in a series of processes for displaying the stereoscopic image, that is, stereoscopic information. The display device detects the stereoscopic information, applies image processing to the basic image as appropriate based on it, and displays the stereoscopic image. One example of the image processing is changing the configuration of the basic image. The system may be a server-client system. The system can contribute to promoting the distribution of stereoscopic images.
[0021]
Still another aspect of the present invention relates to a stereoscopic image processing method. This method acquires a basic image for displaying a stereoscopic image and, by examining a part of the basic image, estimates the information to be referred to in a predetermined scene in a series of processes for displaying the stereoscopic image, that is, the stereoscopic information. Unlike the cases described so far, this considers processing when the stereoscopic information has not been explicitly added, so a part of the basic image is actually inspected. As an example, by examining several regions of the basic image, it is estimated whether or not it is a side-by-side image.
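One way such an estimate could work, following the variation described in paragraph [0063] of the original text (divide the image into m by n equal regions and compare them using the sum of squared pixel differences), is sketched below. The function names, the grayscale list-of-rows image representation, the comparison of every region against only the first region, the search limit `max_views`, and the `threshold` value are all assumptions of this sketch.

```python
def tile_difference(image, m, n):
    """Mean squared difference between the first tile and all other tiles
    when the image is split into m columns and n rows of equal tiles."""
    height, width = len(image), len(image[0])
    th, tw = height // n, width // m
    total, count = 0, 0
    for ty in range(n):
        for tx in range(m):
            if ty == 0 and tx == 0:
                continue  # the reference tile itself
            for y in range(th):
                for x in range(tw):
                    d = image[y][x] - image[ty * th + y][tx * tw + x]
                    total += d * d
                    count += 1
    return total / count


def estimate_side_by_side(image, max_views=4, threshold=100.0):
    """Return (m, n) if the image looks like an m x n side-by-side image,
    otherwise None. The threshold is a hypothetical tuning value."""
    height, width = len(image), len(image[0])
    best = None
    for n in range(1, max_views + 1):
        for m in range(1, max_views + 1):
            if m * n == 1 or width % m or height % n:
                continue
            score = tile_difference(image, m, n)
            if score <= threshold and (best is None or score < best[0]):
                best = (score, m, n)
    return None if best is None else (best[1], best[2])
```

Parallax images of the same scene differ only slightly, so the regions of a genuine side-by-side image score low. An image with a repeating pattern can match several (m, n) pairs, which is one reason this remains an estimate rather than a definite determination.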
[0022]
According to this method, even when stereoscopic information is not explicitly given, it can be known from the basic image. Therefore, past images created by a general method can be used, and software assets can be effectively used.
[0023]
It should be noted that any combination of the above-described constituent elements and the expression of the present invention converted between a method, an apparatus, a system, a computer program, a recording medium, a transmission medium, etc. are also effective as an aspect of the present invention.
[0024]
DETAILED DESCRIPTION OF THE INVENTION
When an image is displayed on the LCD, the minimum unit of display is usually a dot. However, three dots corresponding to RGB are gathered to form one pixel, and in normal image display or image processing, the pixel is recognized as the minimum unit of processing.
[0025]
However, displaying a stereoscopic image on an LCD requires a further consideration. The right-eye image and the left-eye image reach the user's eyes with parallax through an optical filter such as a lenticular lens or a parallax barrier. If the left-eye and right-eye images are arranged alternately in pixel units, that is, in units of three dots, a region in which both images are visible arises between the region in which only the right-eye image is visible and the region in which only the left-eye image is visible; the colors also mix, and the result is very hard to view. For this reason, an alternating arrangement in units of dots, the physical minimum display element, is desirable. Therefore, a multiplex image in which the right-eye image and the left-eye image are arranged alternately in units of dots is often used as the basic image to be displayed stereoscopically.
[0026]
When the parallax images consist of only two images, a right-eye image and a left-eye image, that is, when the number of horizontal viewpoints is “2”, the multiplex image need only arrange the right-eye image and the left-eye image in stripes in dot units. However, when a stereoscopic image that allows for horizontal viewpoint movement is displayed using four parallax images, with the number of viewpoints being “4”, the parallax barrier 12 placed in front of the screen 10 ensures, as shown in FIG. 1, that only the dots of the corresponding parallax image are visible from each of the first to fourth viewpoints VP1 to VP4. On the screen 10, “1” is given to the dots of the first parallax image corresponding to the first viewpoint VP1, and likewise for the other viewpoints. In this example, the first to fourth parallax images are arranged in sequence in stripes in dot units to form the multiplexed image.
[0027]
Further, when viewpoint movement in the vertical direction is also taken into account, the parallax barrier 12 consists of pinholes arranged in a matrix rather than stripes, and the multiplexed image likewise becomes a matrix in which the parallax images alternate in units of dots. FIG. 2 shows an example of the multiplexed image 20 when the number of horizontal viewpoints and the number of vertical viewpoints are both “4”. Here, the region denoted by (i, j) indicates the dots that should be visible from the i-th viewpoint in the horizontal direction and the j-th viewpoint in the vertical direction. As shown in the figure, i changes cyclically as 1, 2, 3, 4, 1, ... in the horizontal direction, and j likewise changes cyclically as 1, 2, 3, 4, 1, ... in the vertical direction.
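The cyclic arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each parallax image is assumed to be available as a full-size grid of dots, viewpoints are indexed from 0 inside the code, and the names `multiplex` and `labelled_views` are hypothetical helpers that do not appear in the specification.

```python
def multiplex(views, n_h, n_v):
    """Interleave parallax images dot by dot (illustrative sketch only).

    views[j][i] is the full-size dot grid (a list of rows) for the i-th
    horizontal and j-th vertical viewpoint, both 0-based in this code.
    """
    height = len(views[0][0])
    width = len(views[0][0][0])
    return [
        # The dot at (row, col) is taken from viewpoint (col mod n_h, row mod n_v).
        [views[row % n_v][col % n_h][row][col] for col in range(width)]
        for row in range(height)
    ]


def labelled_views(n_h, n_v, height, width):
    """Hypothetical test data: every dot of view (i, j) holds its 1-based label."""
    return [
        [[[(i + 1, j + 1)] * width for _ in range(height)] for i in range(n_h)]
        for j in range(n_v)
    ]


image = multiplex(labelled_views(4, 4, 8, 8), 4, 4)
print(image[0][:5])                   # labels along the first row
print([row[0] for row in image[:5]])  # labels down the first column
```

Along the first row the horizontal index cycles 1, 2, 3, 4, 1 while the vertical index stays at 1, and down the first column the vertical index cycles in the same way, which matches the (i, j) pattern described for FIG. 2.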
[0028]
In consideration of promoting the use of stereoscopic images, the multiplex image 20 shown in FIG. 2 may be transmitted to a necessary terminal. The multiplex image 20 is already in the final format for stereoscopic viewing, so it can be simply displayed on the terminal side. Of course, at this time, the presence of an optical filter such as a parallax barrier is assumed for stereoscopic viewing.
[0029]
However, the inventor has recognized that a problem arises here. Image data should naturally be compressed for transmission, but for the multiplexed image 20, ordinary irreversible compression as represented by JPEG (Joint Photographic Experts Group) is practically unusable. The multiplexed parallax images are images from different viewpoints, so they are essentially unrelated to one another at the pixel level. When they are compressed by a method that relies on spatial frequency, such as JPEG, high-frequency components between the parallax images are removed even though each parallax image is independent of the others, and as a result correct stereoscopic display can no longer be performed. In particular, when independent images are arranged alternately in units of pixels, a large number of very fine high-frequency components are generated, so this problem can be fatal. Even as network bandwidth widens, if ordinary images can be compressed without any problem while only the basic images for stereoscopic images cannot, this will be a drag on their spread.
[0030]
It is therefore conceivable that side-by-side images will be used more widely as a format that can be compressed for transmission or storage. FIG. 3 shows a side-by-side image 30 having four viewpoints both horizontally and vertically. Here, the region denoted by (i, j) indicates one parallax image that should be seen from the i-th viewpoint in the horizontal direction and the j-th viewpoint in the vertical direction. That is, the side-by-side image 30 is a combination of parallax images juxtaposed in the horizontal direction, the vertical direction, or both, and each parallax image functions as a single image in its own right if it is cut out from the side-by-side image 30.
[0031]
However, since each parallax image only needs to correspond to one of the 4 × 4 = 16 viewpoints, it may be 1/16 the size of the image to be displayed as a stereoscopic image. It is usually generated from an original image of the same size as the stereoscopic image by taking one dot out of every four in each of the horizontal and vertical directions. As an easy-to-understand example, in the case of a multiplex image consisting only of a right-eye image and a left-eye image, with the number of viewpoints being “2”, it is sufficient that only the odd-numbered dots are visible to the right eye and only the even-numbered dots are visible to the left eye. Accordingly, the right-eye image may be one in which only the odd-numbered columns have been extracted from the original image in advance, thinning it to half its width in the horizontal direction, and the left-eye image may likewise be one in which only the even-numbered columns have been extracted. In general, if the number of viewpoints is “n”, each parallax image constituting the side-by-side image may be 1/n of the original image size, and when all the parallax images are juxtaposed like tiles, the original image size is restored.
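For the two-viewpoint case, the thinning and juxtaposition just described might look like the following Python sketch. It assumes images are lists of rows, that "odd-numbered" columns are counted from 1 (so they sit at 0-based indices 0, 2, ...), and that the left-eye image is placed on the left half. The function names are hypothetical.

```python
def thin_columns(image, start, step):
    """Keep every `step`-th column of each row, beginning at 0-based index `start`."""
    return [row[start::step] for row in image]


def make_side_by_side(left, right):
    """Build a two-view side-by-side image from full-size left and right images.

    Assumption: 1-based odd columns (indices 0, 2, ...) are kept for the
    right-eye image and 1-based even columns (indices 1, 3, ...) for the
    left-eye image, as in the two-viewpoint example in the text.
    """
    left_half = thin_columns(left, 1, 2)
    right_half = thin_columns(right, 0, 2)
    # Juxtapose the thinned views horizontally; the original width is restored.
    return [l_row + r_row for l_row, r_row in zip(left_half, right_half)]


left = [["L0", "L1", "L2", "L3"]]
right = [["R0", "R1", "R2", "R3"]]
print(make_side_by_side(left, right))
```

Each thinned view is half the original width, so the juxtaposed result has the same width as either original, in line with the 1/n rule stated above.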
[0032]
In the case of the side-by-side image 30, each parallax image is independent except at its boundaries, so even if irreversible compression is performed, adverse effects occur only at the boundary portions. For this reason, the side-by-side image 30 can be compressed by JPEG or the like in the usual way, which makes it easy to transmit over a network and allows many images to be stored in a small amount of storage. As described above, the side-by-side image 30 is preferable from the standpoint of popularization, but it also has a drawback: a special viewer is required. That is, whatever the display device, stereoscopic display is not possible unless the image is finally converted into a multiplex image, so conversion processing from the side-by-side image 30 to a multiplex image is necessary.
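The conversion that such a viewer has to perform can be illustrated for the simplest case of two horizontal viewpoints. The sketch below assumes the left half holds the left-eye image and the right half the right-eye image, and that 1-based odd dots go to the right eye as in the earlier two-viewpoint example. The function name is hypothetical.

```python
def side_by_side_to_multiplex(sbs):
    """Interleave the two halves of a two-view side-by-side image column by column."""
    out = []
    for row in sbs:
        half = len(row) // 2
        left_half, right_half = row[:half], row[half:]
        merged = []
        for k in range(half):
            merged.append(right_half[k])  # 1-based odd dot: right-eye image
            merged.append(left_half[k])   # 1-based even dot: left-eye image
        out.append(merged)
    return out


print(side_by_side_to_multiplex([["L1", "L3", "R0", "R2"]]))
```

A real viewer would also have to handle more viewpoints, vertical viewpoints, and any boundary processing applied before compression; none of that is covered by this sketch.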
[0033]
In addition to the two formats described above, each with its advantages and disadvantages, a separate image is conceivable as a third format from the standpoint of dissemination, particularly ease of image preparation. A set of separate images can form a stereoscopic image as an aggregate, but each one on its own is merely an ordinary two-dimensional image. FIG. 4 shows the relationship between 16 separate images and a stereoscopic image. Of the 16 images, for example, the separate image 32 labeled “viewpoint (4, 2) image” assumes the viewpoint (4, 2), and its image size is the same as that of the original image. The 16 separate images are therefore easy to understand if they are thought of as camera images taken as the user moves from viewpoint to viewpoint.
[0034]
As described above, since the size of the separate image may remain as it is at the time of shooting, it does not require processing such as thinning or combining, and preparation is easy. In addition, since each image remains in its original state, it can be used separately. However, stereoscopic display requires a total of 16 parallax images, which is disadvantageous in terms of transmission and storage, and also requires a special viewer.
[0035]
The above are the three main formats considered for the spread of stereoscopic images. Modifications of these will be described at the end. The following describes a stereoscopic image processing method that technically supports both the promotion of their spread and stereoscopic display when the basic image is expressed in any of these three formats. For simplicity, the case where the number of horizontal viewpoints is “2” and the number of vertical viewpoints is “1” is used as the example below.
[0036]
FIGS. 5, 6, and 7 schematically show the data structures of the side-by-side image 40, the multiplex image 50, and the two separate images 60 and 62, respectively, among the basic images according to the present embodiment.
[0037]
As shown in FIG. 5, the side-by-side image 40 is obtained by horizontally combining a first parallax image 44, which is a left-eye image, and a second parallax image 46, which is a right-eye image, and a header area 42, described later, is added to the image data. Similarly, a header area 42 in the same format is added to the multiplexed image 50 in FIG. 6, and a header area 42 is added to each of the two separate images 60 and 62 shown in FIG. 7. In each case, the data structure can be regarded as a combination of main data, which is a basic image for displaying a stereoscopic image, and sub data, which holds information to be referred to at a predetermined stage in the series of processes for displaying the stereoscopic image. If the main data is a basic image compressed by a predetermined method, the sub data may be stored in a header area defined by that compression method. If a definition of the header area already exists, a user-defined area within it can be used, for example.
[0038]
FIG. 8 schematically shows a detailed configuration of the header area 42. In the figure, each region holds the following three-dimensional information.
(1) DIM area 70: The dimensions and configuration of the basic image are indicated by 3 bits.
000: Separate image, or 2D image that cannot be viewed stereoscopically
001: 3D image in multiplex format
010: 3D image in side-by-side format
011: Reserved
1xx: Reserved
Formats that could be assigned to the reserved codes include a “joint image”, in which a plurality of parallax images are juxtaposed as in a side-by-side image but without being thinned, and a “field sequential image”, which indicates that the parallax images should be displayed alternately in a time-division manner as even fields and odd fields. “Joint images” are often observed by the parallel-viewing method or the cross-viewing method, but they can also be converted into side-by-side images by thinning in the viewer, or converted directly into multiplexed images, so they are effective as a format in their own right.
[0039]
(2) BDL area 72: 1 bit indicates the presence or absence of side-by-side image boundary processing. Meaningful when DIM is "01x".
0: No boundary processing
1: With boundary processing
As described above, when a side-by-side image is irreversibly compressed, the image is adversely affected at the boundary portions. This bit indicates whether the processing described in the next item has been applied to reduce that effect.
[0040]
(3) HDL area 74: 2 bits indicate the content of the side-by-side image boundary processing. Meaningful when BDL is "1".
00: Insert a white frame
01: Insert a black frame
10: Copy edge pixels
11: Reserved
To reduce the adverse effects of compression, a white frame, a black frame, or the like is added at the boundary portion to reduce mixing between the parallax images. Copying the edge pixels has the same effect.
[0041]
(4) WDT area 76: The number of pixels for the boundary processing of the side-by-side image is designated by 2 bits. Meaningful when BDL is "1".
00-11: Number of pixels
(5) VPH area 78: 8 bits indicate the number of horizontal viewpoints assumed for a stereoscopic image. It may be described manually when creating the basic image, or may be automatically generated by software that generates the basic image.
00000000: Unknown or reserved
00000001-11111111: Number of horizontal viewpoints
(6) VPV area 80: 8 bits indicate the number of vertical viewpoints assumed for a stereoscopic image.
00000000: Unknown or reserved
00000001-11111111: Number of vertical viewpoints
When both VPH and VPV are 00000001, the basic image may be determined as a normal two-dimensional image that cannot be stereoscopically viewed.
[0042]
(7) ODH area 82: One bit indicates the arrangement of a plurality of parallax images in the horizontal direction.
0: Same as the camera array at the time of shooting
1: Reverse of camera alignment
That is, when ODH is “0”, the parallax image captured by the leftmost camera at the time of shooting is recorded on the left side of the basic image, and the remaining images are recorded in order after it. Since the arrangement is unlikely to be random in normal use, these two rules are sufficient.
[0043]
(8) ODV area 84: One bit indicates the arrangement of a plurality of parallax images in the vertical direction.
0: Same as the camera array at the time of shooting
1: Reverse of camera alignment
It should be noted that VPH and VPV, which give the number of viewpoints, are both 8 bits, which is usually more than sufficient. Therefore, the most significant bit of each may be assigned to ODH and ODV, respectively.
[0044]
(9) PSH area 86: 8 bits indicate the horizontal viewpoint position of each viewpoint image. Meaningful when DIM is "000".
00000000: Unknown or reserved
00000001-11111111: Horizontal position
Note that an absolute value such as the origin determined on each separate image, for example, the coordinates of the upper left corner point of the image, may be included in the header area 42 separately, which leads to a higher processing speed.
[0045]
(10) PSV area 88: 8 bits indicate the vertical viewpoint position of each viewpoint image.
00000000: Unknown or reserved
00000001-11111111: Vertical position
For example, a separate image labeled “viewpoint (4, 2) image” in FIG. 4 has a description of PSH = 4 and PSV = 2.
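The ten fields listed above can be packed into and recovered from a single integer as in the following Python sketch. The field names and bit widths are taken from the text, but the packing order (fields concatenated in the listed order, most significant first) is an assumption, since the actual layout of FIG. 8 is not reproduced here. The functions `pack_header` and `parse_header` are hypothetical.

```python
# (name, bit width) in the order the fields are listed in the text.
# The bit order within the header is an assumption made for illustration.
FIELDS = [
    ("DIM", 3), ("BDL", 1), ("HDL", 2), ("WDT", 2),
    ("VPH", 8), ("VPV", 8), ("ODH", 1), ("ODV", 1),
    ("PSH", 8), ("PSV", 8),
]


def pack_header(values):
    """Pack a dict of field values into one integer; missing fields default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def parse_header(word):
    """Recover the field values from an integer produced by pack_header."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


# A two-view side-by-side image with a black frame inserted at the boundary.
header = pack_header({"DIM": 0b010, "BDL": 1, "HDL": 0b01, "WDT": 2, "VPH": 2, "VPV": 1})
info = parse_header(header)
print(info["DIM"], info["VPH"], info["VPV"])
```

Keeping ODH and ODV as separate one-bit fields is just one option; as noted above, they could instead occupy the most significant bits of VPH and VPV.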
[0046]
The above is an example of the header area 42. An apparatus for realizing the distribution of stereoscopic images using this area will be described.
FIG. 9 shows the configuration of the image processing apparatus 100 that generates this area. The apparatus 100 includes an image acquisition unit 102 that acquires a basic image for displaying a stereoscopic image, and an information adding unit 104 that adds, to the acquired basic image, stereoscopic information to be referred to at a predetermined stage in the series of processes for displaying a stereoscopic image, for example when a multiplex image is later generated by an apparatus on the display side. In terms of hardware, this configuration can be realized by the CPU, memory, and other LSIs of any computer; in terms of software, it is realized by, for example, a program loaded into memory that has a basic image generation function and a stereoscopic information addition function. Here, however, functional blocks realized by their cooperation are depicted. Those skilled in the art will therefore understand that these functional blocks can be realized in various forms by hardware only, software only, or a combination thereof. In the following, any operation whose agent is not explicitly named may be regarded as being performed by, for example, a control unit centered on the CPU.
[0047]
The image acquisition unit 102 receives an original image from an image source such as a network or a user's digital camera, and uses the original image as it is or processes it to generate a basic image. For example, when a separate image is required, the original image may be used as the basic image as it is. On the other hand, when side-by-side images are required, a plurality of original images are juxtaposed and combined. When a multiplexed image is required, the parallax images from each viewpoint are reconstructed in a stripe shape or a matrix shape.
[0048]
The image acquisition unit 102 further compresses the obtained basic image as necessary. Before doing so, it may determine whether the compression would affect the image quality of the stereoscopic image, and prohibit the compression if it determines that it would. For example, when a multiplex image is about to be compressed by a method that operates on spatial frequency components, the compression may be prohibited, or the image may be compressed after being converted into a side-by-side image.
[0049]
The information adding unit 104 adds the above-described header information to the basic image obtained in this way, and either records the resulting image data for stereoscopic display in a storage device (not shown) or delivers it to a predetermined location via a network. Thus, with the apparatus 100, a basic image with stereoscopic information can be prepared in advance for those who want stereoscopic display.
[0050]
Note that the image acquisition unit 102 does not necessarily have to start from an original image. It may receive an image that is already a multiplex image from a network or the like, detect its stereoscopic information, determine that it is a multiplex image, recognize that it is unsuitable for compression as it is, convert it into a side-by-side image, compress it, and rewrite the stereoscopic information. In that case, the apparatus 100 can also be used as a relay point in the distribution of stereoscopic images.
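The relay behavior described above (detect a multiplex image, convert it to side-by-side, and rewrite the stereoscopic information) can be sketched as follows for two horizontal viewpoints. The header is modeled as a plain dict with a `DIM` key using the codes given for the DIM area. The function names, the dict representation, and the restriction to two views are all assumptions of this sketch, and the actual compression step is left out.

```python
DIM_MULTIPLEX = 0b001      # DIM code for a multiplex image
DIM_SIDE_BY_SIDE = 0b010   # DIM code for a side-by-side image


def multiplex_to_side_by_side(image):
    """De-interleave a two-view multiplex image.

    Assumption: 1-based even dots (indices 1, 3, ...) belong to the left-eye
    image, which is placed on the left half; the rest go to the right half.
    """
    return [row[1::2] + row[0::2] for row in image]


def prepare_for_compression(header, image):
    """Return (header, image) in a form suited to spatial-frequency compression."""
    if header["DIM"] != DIM_MULTIPLEX:
        return header, image          # already compressible as it is
    converted = multiplex_to_side_by_side(image)
    # Rewrite the stereoscopic information to match the new format.
    return dict(header, DIM=DIM_SIDE_BY_SIDE), converted


header, image = prepare_for_compression(
    {"DIM": DIM_MULTIPLEX, "VPH": 2, "VPV": 1},
    [["R0", "L1", "R2", "L3"]],
)
print(header["DIM"] == DIM_SIDE_BY_SIDE, image)
```

Copying the header with `dict(header, DIM=...)` leaves the caller's original header untouched, which is convenient if the relay also needs to log what it received.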
[0051]
On the other hand, FIG. 10 shows the configuration of a decoding-side image processing apparatus 200 that actually performs stereoscopic display. The apparatus 200 includes an image acquisition unit 202 that acquires a basic image for displaying a stereoscopic image, and an information detection unit 204 that detects stereoscopic information which has been added to the acquired basic image and which is to be referred to at a predetermined stage in the series of processes for displaying the stereoscopic image. The apparatus 200 is typically a user terminal, and the image acquisition unit 202 acquires a basic image to which stereoscopic information has already been added. The image acquisition unit 202 may generate or reproduce the basic image by decompressing image data that has been compressed in advance.
[0052]
Subsequently, the information detection unit 204 parses the header area added to the basic image and detects the stereoscopic information. If it is determined from the detected stereoscopic information that the basic image is not a multiplexed image, the apparatus 200 converts the basic image into a multiplexed image and displays the stereoscopic image. As an optional function, the apparatus 200 may increase the brightness of its display screen (not shown) based on the number of horizontal viewpoints and the number of vertical viewpoints in the detected stereoscopic information, in accordance with the consideration on luminance mentioned above.
[0053]
In addition to displaying stereoscopic images, the apparatus 200 can store and edit basic images. When storing, if the basic image is a multiplex image, it may be converted into a side-by-side image or the like, and the stereoscopic information rewritten before it is stored. When editing, the user may want, for example, to enlarge or reduce the image. Because such processing is complicated for a multiplex image, the image may first be converted into a side-by-side image, subjected to the desired image processing, and finally converted back into a multiplex image for display.
[0054]
To facilitate editing and other image processing, the apparatus 200 may always keep the image in a format other than a multiplex image, in particular as a side-by-side image, in memory or another storage device, and read and use it as necessary.
[0055]
As an additional configuration of the apparatus 200, if a characteristic acquisition unit that acquires characteristics such as the number of viewpoints and the optimum observation distance of the display apparatus as data is provided, the convenience is further increased. For example, when the assumed number of viewpoints of the basic image is different from that of the display device, a display image selection unit that automatically selects a parallax image to be displayed from the basic image based on the above characteristics can be provided. If the assumed number of viewpoints of the basic image is “4” and that of the display device is “2”, two are selected from the four parallax images. These two parallax images do not need to have continuous viewpoints, and in order to enhance the stereoscopic effect, two images with the viewpoints skipped may be selected. If the number of viewpoints of the basic image is “2” and that of the display device is “4”, the same parallax image is displayed twice, so that a stereoscopically viewable area can be secured in front of the screen.
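One way a display image selection unit could pick viewpoints is sketched below for a single (horizontal) direction. The centering rule, the `skip` parameter, and the ordering used when views are repeated are assumptions chosen for illustration; the text only states that skipped viewpoints may be chosen and that the same parallax image may be displayed twice.

```python
def select_views(n_image, n_display, skip=1):
    """Return 0-based indices of the parallax images to show, one per display viewpoint.

    n_image   -- number of viewpoints assumed by the basic image
    n_display -- number of viewpoints of the display device
    skip      -- spacing between chosen views when the image has views to spare
    """
    if n_image >= n_display:
        span = (n_display - 1) * skip
        if span >= n_image:
            raise ValueError("skip is too large for the available viewpoints")
        start = (n_image - 1 - span) // 2   # roughly centre the chosen views
        return [start + k * skip for k in range(n_display)]
    if n_display % n_image:
        raise ValueError("display viewpoints must be a multiple of image viewpoints")
    repeat = n_display // n_image
    # Assumed ordering: each view is repeated on adjacent display viewpoints.
    return [view for view in range(n_image) for _ in range(repeat)]


print(select_views(4, 2))          # two adjacent views out of four
print(select_views(4, 2, skip=2))  # two views with one viewpoint skipped
print(select_views(2, 4))          # each of two views shown twice
```

A larger `skip` widens the baseline between the chosen views, which corresponds to the stronger stereoscopic effect mentioned above.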
[0056]
Further, if the image processing apparatus 200 includes a position detection unit that detects the position of the head of an observer viewing the display screen, the display image selection unit can change the parallax images it selects in accordance with the head position, so that the observer can also be shown a view that wraps around the subject.
[0057]
When the optical filter is replaceable, the display unit of the apparatus 200 may, for example, be provided with a reading unit that optically reads a marking, pattern-printed on the optical filter, that contains information such as the number of viewpoints. When the data read indicate a mismatch with the number of viewpoints of the parallax images, the parallax images may be optimally selected and displayed as described above, or a message prompting replacement of the optical filter may be displayed.
[0058]
FIG. 11 shows the configuration of a network system 300 for distributing stereoscopic images. Here, the synthesizing apparatus 302 is the image processing apparatus 100 of FIG. 9 and functions as the starting point of distribution. The display device 304, on the other hand, is the image processing apparatus 200 of FIG. 10 and acts as the end point of distribution. As shown in the figure, the synthesizing apparatus 302 has a storage device 306 that records a large number of basic images and acts as an image server, so that the user can easily obtain a desired stereoscopic image via the Internet or another network 308.
[0059]
The present invention has been described based on the embodiments. Those skilled in the art will understand that these embodiments are illustrative, that various modifications can be made to the combinations of the constituent elements and processing steps, and that such modifications are also within the scope of the present invention. Some examples follow.
[0060]
The functions of the image processing apparatuses 100 and 200 in FIGS. 9 and 10 can be provided to the user in the form of computer programs. If the user wants to generate basic images himself or herself, the functions of the image processing apparatus 100 in FIG. 9 may be packaged and provided as an authoring tool.
[0061]
The header area 42 shown in FIG. 8 allows a large degree of freedom, including in the number of bits. For example, the following can be further incorporated:
・ Copyright information of the basic image observed by the user as a stereoscopic image
・ Conditions to be satisfied by an optical filter, such as a parallax barrier or lenticular lens, suitable for viewing the basic image
Examples of the “conditions to be satisfied by the optical filter” include the distance between the viewpoints of the parallax images, that is, the interocular distance, and the angle of view of the camera at the time of shooting. Such conditions are indispensable parameters for designing the optical filter when a stereoscopic image is to be reproduced with the correct scale in the front-rear direction. They can also be referred to when selecting images in the case where the number of viewpoints of the basic image is larger than that of the display device, in order to automatically select images that provide a more natural stereoscopic effect.
[0062]
In the embodiment, the stereoscopic image to be finally displayed is a multiplexed image, but the image to be displayed varies depending on the observation method. Therefore, various images may be displayed after being converted into images suitable for the observation method.
For example, when liquid crystal shutter glasses are used, the image to be displayed is a field sequential image. In the case of a head-mounted display that has separate display means for the left and right eyes, the stereoscopic image to be displayed consists of separate images, which are sent to the respective display means by separate image output means. In the case of a head-mounted display with a single display device, a side-by-side image may be displayed, and the display may be configured so that optical means separate the image into its left half and right half for observation by the respective eyes. Furthermore, a joint image may be displayed and observed by a method such as the cross-viewing method or the parallel-viewing method.
[0063]
In the embodiment, the apparatus that displays a stereoscopic image, that is, the image processing apparatus 200 in FIG. 10, starts processing on the premise that stereoscopic information has been added to the basic image. However, an existing basic image that was not produced according to the present embodiment does not have the header area characteristic of the embodiment. For such images, an inspection and estimation processing unit that inspects the basic image may be provided so that the stereoscopic information is estimated on the image processing apparatus 200 side. For example, whether or not the image is a side-by-side image may be evaluated by dividing the image into m equal parts in the horizontal direction and n equal parts in the vertical direction and evaluating the degree of approximation between the images in the resulting regions while varying the values of m and n. If the degree of approximation between the regions, or between parts of them, is high for a certain pair of m and n, the image can be estimated to be a side-by-side image with m horizontal viewpoints and n vertical viewpoints. The degree of approximation is evaluated based on, for example, the sum of squared differences of pixel values. Alternatively, applying a differential filter to the basic image can be expected to make the boundary lines between regions stand out, which may also make it possible to estimate whether the image is a side-by-side image.
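The region-comparison estimate can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the specification: images are grayscale lists of rows, every region is compared with the first region, the sum of squared differences is normalized by the region area, and a candidate (m, n) is accepted when its worst score is at or below an arbitrary `threshold`. All function names are hypothetical.

```python
def split_regions(image, m, n):
    """Split an image (list of rows) into m columns by n rows of equal regions."""
    region_h = len(image) // n
    region_w = len(image[0]) // m
    return [
        [row[x * region_w:(x + 1) * region_w]
         for row in image[y * region_h:(y + 1) * region_h]]
        for y in range(n) for x in range(m)
    ]


def mean_squared_difference(a, b):
    """Sum of squared pixel differences between two regions, divided by the area."""
    total = sum((p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
    return total / (len(a) * len(a[0]))


def estimate_side_by_side(image, max_m=4, max_n=4, threshold=100.0):
    """Return the (m, n) whose regions are most alike, or None if none qualifies."""
    height, width = len(image), len(image[0])
    best = None
    for n in range(1, max_n + 1):
        for m in range(1, max_m + 1):
            # Skip the trivial 1 x 1 split and splits that do not divide evenly.
            if m * n == 1 or height % n or width % m:
                continue
            regions = split_regions(image, m, n)
            score = max(mean_squared_difference(regions[0], r) for r in regions[1:])
            if score <= threshold and (best is None or score < best[0]):
                best = (score, m, n)
    return None if best is None else (best[1], best[2])


# Synthetic data: two 8-dot-wide views of a gradient, the second shifted by one dot.
side_by_side = [[(x % 8 + x // 8) * 3 + y * 20 for x in range(16)] for y in range(4)]
plain = [[x * 3 + y * 20 for x in range(16)] for y in range(4)]
print(estimate_side_by_side(side_by_side), estimate_side_by_side(plain))
```

On real photographs the threshold would need tuning, and a uniform or repetitive image could be misjudged, so an estimate of this kind is best treated as a hint rather than a definite classification.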
[0064]
[Effect of the invention]
According to the present invention, the distribution of stereoscopic images can be promoted.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a state in which a user stereoscopically views four parallax images multiplexed in a horizontal direction.
FIG. 2 is a diagram illustrating a multiplexed image.
FIG. 3 is a diagram showing a side-by-side image.
FIG. 4 is a diagram illustrating a set of a plurality of separate images.
FIG. 5 is a diagram schematically illustrating a configuration of a side-by-side image according to the embodiment.
FIG. 6 is a diagram schematically illustrating a configuration of a multiplexed image according to the embodiment.
FIG. 7 is a diagram schematically illustrating a configuration of a separate image according to the embodiment.
FIG. 8 is a configuration diagram of a header area added to a basic image according to the embodiment.
FIG. 9 is a configuration diagram of an image processing apparatus serving as a starting point of image distribution according to the embodiment.
FIG. 10 is a configuration diagram of an image processing apparatus as an end point of image distribution according to the embodiment.
FIG. 11 is a diagram illustrating a configuration of a network system for image distribution in which the image processing apparatus in FIG. 9 is a synthesizing apparatus and the image processing apparatus in FIG. 10 is a display apparatus.
[Explanation of symbols]
12 parallax barrier, 20, 50 multiplexed image, 30, 40 side-by-side image, 32, 60, 62 separate image, 42 header area, 100, 200 image processing device, 102, 202 image acquisition unit, 104 information addition unit, 204 information detection unit, 300 network system, 302 synthesis device, 304 display device, 306 storage device.

Claims

A stereoscopic image processing apparatus comprising:
an image acquisition unit that acquires a basic image for displaying a stereoscopic image; and
an information estimation unit that estimates stereoscopic information for displaying a stereoscopic image based on pixel values of the basic image acquired by the image acquisition unit,
wherein the information estimation unit divides the basic image acquired by the image acquisition unit into a plurality of regions and determines whether the image is a side-by-side image by evaluating the degree of approximation of each of the divided regions.

The stereoscopic image processing apparatus described above, wherein the basic image is a side-by-side image, and
the information estimation unit divides the side-by-side image acquired by the image acquisition unit into a plurality of regions and estimates the number of viewpoints of the side-by-side image by evaluating the degree of approximation of each of the divided regions.

The stereoscopic image processing apparatus according to claim 1, wherein the information estimation unit evaluates the degree of approximation based on a sum of squared differences of pixel values.

A stereoscopic image processing apparatus comprising:
an image acquisition unit that acquires a basic image for displaying a stereoscopic image; and
an information estimation unit that estimates stereoscopic information for displaying a stereoscopic image based on pixel values of the basic image acquired by the image acquisition unit,
wherein the information estimation unit determines whether the image is a side-by-side image by applying differentiation to the basic image acquired by the image acquisition unit.

The stereoscopic image processing apparatus according to claim 1, further comprising an information adding unit that adds stereoscopic information to the basic image.

The stereoscopic image processing apparatus according to claim 5, wherein the information adding unit adds stereoscopic information to a header area of a basic image.

The stereoscopic image processing apparatus according to claim 1, wherein the basic image does not have a header area that holds stereoscopic information.

A stereoscopic image processing method comprising:
an image acquisition step of acquiring a basic image for displaying a stereoscopic image; and
an information estimation step of estimating stereoscopic information for displaying a stereoscopic image based on pixel values of the basic image acquired in the image acquisition step,
wherein, in the information estimation step, the basic image acquired in the image acquisition step is divided into a plurality of regions, and whether the image is a side-by-side image is determined by evaluating the degree of approximation of each of the divided regions.

A computer program for causing a computer to execute:
an image acquisition step of acquiring a basic image for displaying a stereoscopic image; and
an information estimation step of estimating stereoscopic information for displaying a stereoscopic image based on pixel values of the basic image acquired in the image acquisition step,
wherein, in the information estimation step, the basic image acquired in the image acquisition step is divided into a plurality of regions, and whether the image is a side-by-side image is determined by evaluating the degree of approximation of each of the divided regions.