JP4117041B2

JP4117041B2 - Digital camera system

Info

Publication number: JP4117041B2
Application number: JP02895397A
Authority: JP
Inventors: 公一江尻; 海克関
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1997-02-13
Filing date: 1997-02-13
Publication date: 2008-07-09
Anticipated expiration: 2017-02-13
Also published as: JPH10229566A

Description

【０００１】
【発明の属する技術分野】
本発明は、デジタルカメラシステムに関する。
【０００２】
【従来の技術】
人間の目は、奥行きのある対象物でも、ピンぼけを感じることはない。また、眼鏡をかけた場合、網膜に射影される画像は大きく歪んでいるにもかかわず、人間はすぐにこれを補正して知覚するようになる。さらに、人間の知覚は、鮮明な画像が得られているのは視野の中心付近のみであるにもかかわらず、知覚視野としては広い範囲にわたって鮮明な画像を認識している。
【０００３】
このような人間の知覚画像と、従来のカメラにより撮影された画像との主要な相違点は次の３点に要約できる。
（１）知覚画像には画像の奥行き（３次元）情報が存在するが、１枚の写真画像にはこれがない。
（２）１回の（１枚の）画像として知覚される視野は、写真画像に比べ飛躍的に広い。
（３）知覚画像はパンフォーカスである（ピンぼけでない）。
【０００４】
デジタルカメラシステムの分野において、上に述べたような人間の視覚の持つ機能の実現は未開拓である。僅かにパンフォーカスに関して、ピント位置を少しずつずらして撮影された複数の画像から、最もシャープな画像の画素を選び、パンフォーカス画像を合成する手法が「児玉ほか；”複数の異なる焦点画像からの全焦点画像の再構成”，１９９５年テレビジョン学会年次大会，P.１４９」に見られる程度である。
【０００５】
【発明が解決しようとする課題】
本発明の目的は、前述の人間の知覚画像に近い画像を獲得するためのデジタルカメラシステムを提供することにある。
特に本発明は、被写体の奥行き情報の付加された画像を獲得するためのデジタルカメラシステムを提供することにある。
【０００６】
【課題を解決するための手段】
本発明のデジタルカメラは、被写体の方向に対して垂直な方向に移動しながら撮影された、同一被写体に対する複数の画像を保存する手段と、前記保存された複数の画像のうちの所定画像とそれ以外の画像を順次比較し、該比較した両画像の差分がゼロになる面積が最大となる位置を基準位置として、前記所定画像とそれ以外の画像において、前記基準位置と最大類似度を示す位置との距離をブロック単位に求め、前記距離を被写体の奥行き情報として抽出する手段とを有することを特徴とする。
【０００９】
【発明の実施の形態】
以下、本発明の実施の形態を、図面を用いて説明する。
【００１０】
本発明の第１の実施例によれば、撮影した画像に、被写体の奥行き情報を付加することができる。これが本発明の主題である。まず、奥行き情報の獲得方法について説明する。
【００１１】
被写体の奥行きを撮影した画像だけから見つけるためには、複数の画像が必要である。人間は奥行きの認識に両眼の視差を利用している。視差だけでは十分な精度が得られない場合は、人間自身が左右に動いて、より大きな視差を故意に作っている。これと同じことをカメラで行うには、撮影者が動き、異なる場所で同一の被写体を撮影すればよい。こうすることによって、２枚の画像に視差が生じる。
【００１２】
以上のことを図１により説明する。図１に示すように、カメラのレンズ１と撮像板２の距離をＦo、レンズ１と被写体Ｏiの距離をＲo(i)とした場合に、撮影者がカメラを持ったまま、被写体の方向に対し垂直な方向に距離Ｌだけ動くと、つまり、レンズ１及び撮像板２がそれぞれ１（１）及び２（１）の位置から１（２）及び２（２）の位置まで移動すると、被写体Ｏiの像は、撮像板１の上をＦo×Ｌ／Ｒo(i)だけ移動する。Ｆoが一定で、かつＲo>>Ｆoであれば、撮像板２上の像の移動量は、距離Ｒo(i)にほぼ反比例する。したがって、人がカメラを持って移動する間に、個々の画像要素（画素）がどれだけ移動したかを記憶しておけば、被写体の奥行き情報を復元できる。
【００１３】
図２は、このような原理に基づく本発明の第１実施例によるデジタルカメラシステムの概略構成図である。図３は、本実施例のデジタルカメラシステムを用い、撮影画像に被写体の奥行き情報を付加する手順を示すフローチャートである。
【００１４】
図２において、１００はレンズ、撮像板（ＣＣＤ等）、焦点調整や絞り調整、シャッター等の機構を含む撮像部である。１０２は撮像系１００により撮影された画像のデータ等を記憶するための記憶部、１０４は奥行き情報を抽出するための奥行き情報抽出部、１０６は撮影者がシャッター操作や撮影モードの切り替え等の操作をするための操作部、１０８は外部コンピュータ等とのデータの授受のための外部インターフェイス部、１１０は各部の制御のための制御部である。
【００１５】
撮影画像に被写体の奥行き情報を付加する場合、まず、静止画像撮影モードで、被写体を１枚撮影する（図３、ステップＳ１）。撮影された静止画像フレームは記憶部１０２に記憶される。
【００１６】
次に、動画撮影モードに切り替え、静止画像を撮影した時の光軸方向に直交する方向に移動しながら同じ被写体を連続撮影する（ステップＳ２）。撮影された一連の動画フレームは記憶部１０２に記憶される。ただし、動画フレームの記憶域と静止画像フレームの記憶域とは別個の物理メモリであってもよい。なお、動画フレームは奥行き情報の抽出のために利用されるものであるので、奥行き情報の抽出が済んだ段階で消去される。
【００１７】
必要数の動画フレームが撮影されると、撮影動作を終了し、奥行き情報の抽出処理が奥行き情報抽出部１０４によって実行される（ステップＳ３）。奥行き情報抽出部１０４は、まず、最初の動画フレームと、それに続く各動画フレームとを順次比較し、比較した両画像の差分がゼロになる面積が最大になる位置（基準位置）を探す。具体的には、比較する２フレームを相対的に微小量ずつ並行移動させながら、画素単位又は小領域単位の差分を算出し、この差分値のフレーム当たりの合計が最小になった相対的位置を探す。次に、最初の動画フレームをｍ×ｎの長方形ブロックに分割し、このブロック単位で、最初の動画フレームと後続の各動画フレームを比較する。比較の初期位置は上記基準位置である。この比較により、各ブロック単位に、その周辺で最大の類似度を有する動画領域を探す
（探索の範囲は予め定めておく）。そして、各ブロックの最大類似度を示す動画領域までの相対的距離（Ｉ，Ｊ）を求め、これを奥行き情報として記憶部１０２に格納する。
【００１８】
図４は、奥行き情報抽出の説明図である。１２０（１）は最初の動画フレーム、１２０（２），１２０（３），・・・，１２０（ｎ）は後続の動画フレームである。動画フレーム１２０（１）と次の動画フレーム１２０（２）を比較すると、三角形の像の部分が相対的に水平方向に移動していることが分かる。この移動した部分が差分フレーム１２１に網点領域として図示されている。このような像の水平、垂直方向の相対的移動量（Ｉ，Ｊ）をブロック単位に抽出するわけである。
【００１９】
抽出される奥行き情報の例を図５に示す。図５において、（０，０）は移動の観測されなかった領域を意味し、（０，-1）と（０，-3）は１方向のみ移動した領域を意味する。移動量が均一な領域は、カメラからほぼ等距離にある被写体を表す。領域間の移動量の差が両者の見かけの距離になる。この例では、（０，０）の領域と（０，-3）の領域が見かけ上、最も離れている。なお、上記処理における画素の座標値は、正確には、レンズ中心からの角度で表現すべき値であるが、近似的に撮像板の画素アドレスを利用することも可能である。
【００２０】
このようにして、連続した動画フレームを用いて被写体の奥行き情報が抽出され、これが静止画像フレームに付加された形で記憶される。奥行き情報を静止画像フレームとともに外部のコンピュータ等に取り込むことにより、撮影画像の３次元化等の編集が可能になる。
【００２１】
本発明の第２の実施例によれば、デジタルカメラシステムにおいて、視野の飛躍的な拡大を実現できる。前述のように、人間に知覚される視野は、写真にくらべ遥かに広い。言い換えれば、人間の知覚する画像１枚のサイズは、通常のカメラの撮影画像より遥かに大きい。このことをさらに考察すれば、人間はしばしば一地点で観察される画像を１群の画像として認識していることが多い。記憶の中の風景は通常、パノラマ画像である。つまり、一地点から見えるすべての画像を広角画像としても、個別の狭角の画像としても認識している。理想的には、個別の狭角画像が切れめなしに連続して、あらゆる方位をカバーする。これをカメラ画像で実現するためには、同一地点から撮影さた一連の画像がメモリに記憶されなければならないが、その個々の画像は、歪曲収差が除去される必要があり、また、一連の画像の一部であること及びその光軸方位が分かるようなタグが付加されている必要がある。
【００２２】
図６は、そのような原理に基づき視野拡大を実現する、本発明の第２実施例によるデジタルカメラシステムの概略構成図である。図６において、２００はレンズ、撮像板（ＣＣＤ等）、焦点調整や絞り調整、シャッター等の機構を含む撮像部である。２０２は撮像系２００により撮影された画像のデータ等を記憶するための記憶部、２０４は画像の歪曲収差の除去を行う歪曲収差補正部、２０５は平面座標と球面座標の変換を行う座標変換部、２０６は複数の画像より１枚の大きな表示画像を合成する処理を実行する画像合成部である。２０８は撮影者がシャッター操作等のカメラ操作をするための操作部、２１０は外部コンピュータ等とのデータの授受のための外部インターフェイス部、２１２は各部の制御のための制御部である。
【００２３】
広角の画像を得るためには、まず、同一地点より少しずつ方位を変えながら撮影する操作を連続的に行い、撮影した画像のファイルは記憶部２０２に保管する必要がある。この画像ファイルの保管方法として、平面状の格子点の画像として保管する方法（方法１）と、画素の方位角度に対応させて保管する方法（方法２）のいずれを採用してもよい。ただし、いずれの保管方法を採用するにしても、歪曲収差補正部２０４によって、画像の歪曲収差を予め除去しておく。なお、この歪曲収差の除去には、例えば本出願人による特願平８−２７３２９４号又は特願平８−２９６０２５号特許出願に添付の明細書に記載の方法等を利用すればよいので、ここでは説明を省略する。
【００２４】
保管方法１では、撮像板に撮像した画像をそのままの形で保管する。各画素は、カメラへの入射光１本に対応するが、画像の滑らかな合成を可能にするためには、各画素への入射光の方位角度を再現可能にするためのパラメータを保持しておく必要がある。
【００２５】
図７において、カメラの２つの異なった方位が光軸Ｌ１，Ｌ３で示されている。１１（１）と１１（２）はそれぞれの方位の時のレンズ１１の位置を示し、１２（１）と１２（２）はそれぞれの方位の時の撮像板１２（ＣＣＤ等）の位置を示している。任意の光線Ｌ２は、撮像板１２が１２（１）の位置にある時にも１２（２）の位置にある時にも撮影されるが、両方の位置で撮影された画像を光線Ｌ２の位置でつながあわせても、１枚の画像を合成することはできない。なぜなら、その２つの画像平面は異なった平面であり、つなぎ合わせた部分で画像は滑らかな形状変換をしないからである。画像を滑らかに合成できるようにするためには、個々の画像の元の入射光線の方位角度を再現可能にする必要があり、そのために保持しておく必要のある重要なパラメータ次のとおりである。
＊画像のファイル名称
＊画像の光軸方位を示すタグ
＊レンズと撮像板との距離
＊撮像板の画素ピッチ（水平方法及び垂直方向）
＊基準画像（通常、一連の画像中の最初に撮影された画像）に対する各画像の相対方位角度
このようなパラメータの付加、保存及び画像の保存は制御部２１２により管理される。
【００２６】
保管方法２では、画像の各画素を、その入射光の方位角度に対応させて保管する。画像の合成を容易にするため、方法１と違って、平面の撮像板に撮影された画素を、球面の座標（方位角度）上で等間隔になるように座標変換部２０５で予め変換を施して保存する方法である。
【００２７】
図８は、この座標変換の様子を示しており、２１はレンズ、２２は撮像板である。光軸Ｌ１以外の入射光Ｌ２，Ｌ２’が撮像板２２上につくる等ピッチの画素ｐ１，ｐ２の座標ではなく、これに対応する球面２３上の位置ｑ１，ｑ２の座標に変換する。このような変換を全画素について施したものを保存する。ただし、図８から分かるように、単純な変換を施したのでは、球面２３上では光軸から離れるほど画素ピッチは小さくなりメモリ管理に都合が悪く、また後述の合成処理にも不都合であるので、補間によって等間隔な球面上画素に変換する。方法２の場合に保持する必要のある重要なパラメータは次のとおりである。
＊画像のファイル名称
＊画像の光軸方位を示すタグ
＊画像の画素間角度ピッチ（水平方向及び垂直方向：通常の撮像板では、方位角度は画素位置に依存するが、前述のように一定角度ピッチになるよう予め変換される）
＊基準画像（通常、一連の画像中の最初に撮影された画像）に対する各画像の相対方位角度
このようなパラメータの付加、保存及び画像の保存は制御部２１２により管理される。
【００２８】
以上のようにして一連の画像と関連したパラメータが記憶部２０２内に準備されると、画像合成部２０６において視野の拡大された画像の合成が可能である。図９は、この画像合成処理の流れを示すフローチャートである。図１０及び図１１は、この画像合成の説明のための図である。
【００２９】
図１０は、異なる方位（光軸方向）Ｐ１，Ｐ２を撮影した画像３１，３２の関係を示しており、カメラのレンズ中心Ｏは不動であり、被写体は無限遠にあるとする。画像３１の座標系は（ｘ１，ｙ１，ｚ１）で、画像３２の座標系は（ｘ２，ｙ２，ｚ２）で表されている。ξは両座標系の（ｘ，ｙ）平面の共通軸を表し、αはｘ１軸と共通軸ξのなす角度、βはｘ２軸と共通軸ξのなす角度、γはｚ１軸とｚ２軸のなす角度である。画像３１を基準画像とすれば、図示のα，β，γはそのまま画像３２の基準画像に対する相対角度を表す角度パラメータとして用いることができる。
【００３０】
画像合成処理の内容は次のとおりである。最初のステップＳ１１において、見ようとする方位（Φ，Ψ）と画像のサイズ（Ｗ）を指定する。これを（Φ，Ψ，Ｗ，α，β，γ）と置く。
【００３１】
次のステップＳ１２において、先に指定された方位（Φ，Ψ）に最も近い光軸方位を持つ画像（Φ'，Ψ'，Ｗ'，α'，β'，γ'）を記憶部２０２より取り出す。
【００３２】
次のステップＳ１３において、方位（Φ'，Ψ'）の方向の球面を定義し、この球面上に画像（Φ'，Ψ'，Ｗ'，α'，β'，γ'）を射影する。前記方法１によって画像が保管されている場合、画像の各画素は方位（Φ'，Ψ'）に垂直な平面上の１点に対応するので、定義した球面上の１点に変換し、そこに画素値を出力する必要がある。この手続きは図８におけるｐ１，ｐ２からｑ１，ｑ２への変換に相当し、座標変換部２０５を利用して実行される。前記保管方法２が利用されている場合には、この変換手続きは不要である。
【００３３】
次のステップＳ１４において、前ステップで球面に定義された画素を、方位（Φ，Ψ）に垂直な目的平面Ｐ0 に投影する（図１１を参照）。目的平面Ｐ0 上の画素位置（ｉ，ｊ）は
（Ｆ0 tan(θ)）^2 ＝ (ｉｐx）^2 ＋ (ｊｐy）^2
で表すことができる。ここでｐx,ｐyは平面Ｐ0上の水平，垂直方向の画素ピッチであり、Ｆ0はレンズと撮像板（撮像平面）との距離（図１１では球の半径）、θは光軸と画素（ｉ，ｊ）への入射光のなす角度である。
【００３４】
次のステップＳ１５において、（Φ'，Ψ'）に最も近い方位の未使用の画像
（Φ1，Ψ1，Ｗ1，α1，β1，γ1）を記憶部２０２より取り出し、その画像に関しステップＳ１３からの処理を続ける。処理が終了するのは、未使用の画像がなくなったか、又は、目的平面Ｐ0が覆い尽くされたとステップＳ１６又はＳ１７で判断された時である。
【００３５】
このようにして、目的の平面Ｐ0上に複数の画像による広角の合成画像が作成され、記憶部２０２に保存される。この合成画像データは、外部インターフェイス部２１０を介して外部のコンピュータ等へ転送し、そのまま画面に表示させることができる。
【００３６】
なお、同一地点から撮影された複数の画像と、それに関するタグ及びパラメータを記憶部２０２に保存し、それらのデータを外部のコンピュータ等へ転送し、必要な処理は外部のコンピュータ等で実行させることも可能である。
【００３７】
さて、前述のように、人間に知覚される画像の特徴の一つはパンフォーカスであることである。人間は、眼の焦点位置を絶えず動かしながら対象物を観察しており、これら個々の異なるピント位置の画像を脳の中で再構成していると考えられている。現在の技術では、唯１枚の画像をもとにしたパンフォーカス画像は得られていないが、その代替技術として、ピント位置を少しずつずらした複数の画像から、最もシャープな画像の画素を選ぶことによってパンフォーカス画像を合成する方法が知られていることは前述のとおりである。
【００３８】
本発明の第３の実施例によれば、デジタルカメラシステムにおいて、パンフォーカス画像を得ることができる。図１２は、そのようなデジタルカメラシステムの概略構成図であり、図１３はその説明図である。
【００３９】
図１２において、３００は撮像部である。この撮像部３００は、レンズ位置をずらしながら多数の画像を撮影し、撮影した一連の画像を出力する機構を備える。この機構としては、例えば、レンズを撮像板に近い位置から順次せり出しながら（又は、その逆に遠い位置から撮像板に近い位置に順次移動させながら）撮影し、最もシャープな画像（空間周波数が最大となった画像）が得られるレンズ位置を合焦点位置として検出する、公知のオートフォーカス機構を利用できる。ただし、オートフォーカスの目的では、合焦点の１枚の画像だけが保存され、他の画像は捨てられるが、本発明のデジタルカメラシステムでは、撮影されたすべての画像が記憶部３０１に保存され、パンフォーカス画像の合成に利用される。
【００４０】
３０２は、パンフォーカス画像の合成処理等を実行する演算処理部、３０３は操作部、３０４は記憶部３０１内のデータの外部転送等のための外部インターフェース部、３０５は各部を制御する制御部である。なお、パンフォーカス画像を合成するための処理は演算量が大きいため、内部の演算処理部３０２の能力が不足する場合等には、外部の汎用コンピュータ等の高性能演算処理装置３０６へ画像とその関連データを転送し、必要な演算を実行させることも可能である。
【００４１】
さて、撮像部３００により撮影された一連の画像はすべて、それぞれの撮影時のレンズ位置（レンズと撮像板の距離）の情報とともに記憶部３０１に格納される。この一連の画像には、同一のファイル名称が付与されるとともに、各画像に撮影順にシリアル番号が付けられる。例えば、図１３に示す一連の画像３１０のそれぞれに付けられた風景１−１，風景１−２，．．．等の「風景１」がファィル名称であり、ハイフンの後の数字はシリアル番号である。また、シリアル番号の後のＦ0，Ｆ1，Ｆ2，．．．はレンズ位置を表すが、Ｆ0はレンズの焦点距離（無限遠像に対応）とし、Ｆ1，Ｆ2，．．．はＦ0 より順次遠いレンズ位置とする。
【００４２】
本発明によれば、一連の画像の各画素に、濃度勾配に相関の高い変数、例えば微分値もしくは高次微分値を利用して重み付けをした複数の画像より、ピンぼけを補正したパンフォーカス画像を作成する。以下、図１３を参照し、パンフォーカス画像の合成処理の具体例を説明する。
【００４３】
まず、一連の画像３１０中の「風景１−ｋ」画像のfk（ｉ，ｊ）画素とその周辺（４近傍）画素を読み出し、次のラプラシアンを計算する。
Δfk（I(k),J(k)）
＝｜fk（I(k),J(k)）− fk（I(k)-1,J(k)）｜
＋｜fk（I(k),J(k)）− fk（I(k),J(k)-1）｜
＋｜fk（I(k),J(k)）− fk（I(k),J(k)+1）｜
＋｜fk（I(k),J(k)）− fk（I(k)+1,J(k)）｜
ただし、Ｉ(k)＝ｉ×（F0／Fk），Ｊ(k)＝ｊ×（F0／Fk）
と定義する。
【００４４】
このΔfk(I,J)のＷ乗、｛Δfk(I,J)｝^W、を第１のフレームメモリ（重みテーブル）３１２の番地(I,J)に記憶する。ここで、Ｗはある定数である。
【００４５】
重みの加算値Σk｛Δfk（I,J）｝^W が計算され、これが第２のフレームメモリ（重み総和テーブル）３１４の番地（Ｉ，Ｊ）に記憶される（実際には前に書き込んだ値を読み出して、新しい値を加算して再度記憶する）。
【００４６】
異なるレンズ位置の画像の演算
Ｓｆ(I,J)＝Σk｛fk（I(k),J(k)）^W×Δfk（I,J）｝
が行われ、その演算結果が第３のフレームメモリ３１６の番地（Ｉ，Ｊ）に記憶される。この計算の意味するところは、２次微分値に対応した重みで画素輝度（濃度）を積算することである。
【００４７】
ここで、Ｉ(k)、Ｊ(k)は一般に整数ではないが、非整数の場合は、最も近い整数値を採用するか、又は隣接整数値を持つ、fk（[I(k)],[J(k)])，fk（[I(k)]+1,[J(k)])，fk（［I(k)],[J(k)]+1)，fk（[I(k)]+1,[J(k)]+1）から補間する。
【００４８】
レンズ位置に対応して取り込まれたＳf(I,J)を第１のフレームメモリ３１２の値で除した値、すなわち
Ｓf(I,J)／Σk｛Δfk（I,J）｝^W
が求められる。これが、ピンぼけの補正されたパンフォーカス画像である。この画像のデータは、内部の演算処理部３０２によって生成された場合には記憶部３０１に記憶され、必要に応じて外部インターフェイス部３０４を介し外部のコンピュータ等に転送される。なお、パンフォーカス画像が得られたならば、元の一連の画像は不要となるので廃棄してよい。
【００４９】
【発明の効果】
以上に詳細に説明した如く、本発明によれば、デジタルカメラシステムにおいて、被写体の奥行き情報を付加した画像を得られるようになるため、撮影画像の３次元化等の編集を容易に行うことができるようになる。
【図面の簡単な説明】
【図１】カメラを移動した時の撮像板上の像の移動量と被写体の奥行きとの関連を説明するための図である。
【図２】本発明の第１の実施例によるデジタルカメラシステムの概略構成図である。
【図３】奥行き情報を抽出するための手順を示すフローチャートである。
【図４】奥行き情報の抽出の説明図である。
【図５】抽出される奥行き情報の例を示す図である。
【図６】本発明の第２の実施例によるデジタルカメラシステムの概略構成図である。
【図７】異方位の画像のつなぎ合わせに関する説明図である。
【図８】平面から球面への座標変換に関する説明図である。
【図９】広角画像の合成処理の流れを示すフローチャートである。
【図１０】広角画像の合成を説明するための図である。
【図１１】広角画像の合成を説明するための図である。
【図１２】本発明の第３の実施例によるデジタルカメラシステムの概略構成図である。
【図１３】パンフォーカス画像の合成処理の説明のための図である。
【符号の説明】
１，１１，２１レンズ
２，１２，２２撮像板
１００撮像部
１０２記憶部
１０４奥行き情報抽出部
１０６操作部
１０８外部インターフェイス部
１１０制御部
２００撮像部
２０２記憶部
２０４歪曲収差補正部
２０５座標変換部
２０６画像合成部
２０８操作部
２１０外部インターフェイス部
２１２制御部
３００撮像部
３０１記憶部
３０２演算処理部
３０３操作部
３０４外部インターフェイス部
３０５制御部
３０６外部の演算処理装置
[0001]
[Technical Field of the Invention]
The present invention relates to a digital camera system.
[0002]
[Prior art]
The human eye perceives no defocus blur even when viewing objects that extend in depth. In addition, when a person wears glasses, the image projected onto the retina is strongly distorted, yet the person quickly learns to correct for this in perception. Furthermore, although a sharp image is obtained only near the center of the visual field, humans perceive a sharp image over a wide perceptual field of view.
[0003]
The main differences between such human perceptual images and images taken by conventional cameras can be summarized as the following three points.
(1) A perceived image contains depth (three-dimensional) information, whereas a single photographic image does not.
(2) The field of view perceived as a single image is dramatically wider than that of a photographic image.
(3) A perceived image is pan-focus (free of defocus blur).
[0004]
In the field of digital camera systems, realizing the functions of human vision described above remains unexplored. The only exception concerns pan focus: a method that selects the pixels of the sharpest image from a plurality of images captured while shifting the focus position little by little, and synthesizes a pan-focus image from them, is described in Kodama et al., "Reconstruction of an all-in-focus image from multiple differently focused images," 1995 Television Society Annual Conference, p. 149.
[0005]
[Problems to be solved by the invention]
An object of the present invention is to provide a digital camera system for acquiring images close to the human perceptual image described above.
In particular, an object of the present invention is to provide a digital camera system for acquiring images to which depth information of the subject has been added.
[0006]
[Means for Solving the Problems]
The digital camera of the present invention comprises: means for storing a plurality of images of the same subject captured while moving in a direction perpendicular to the direction of the subject; and means for sequentially comparing a predetermined image among the stored images with each of the other images, taking as a reference position the position at which the area where the difference between the two compared images is zero becomes largest, determining for each block the distance between the reference position and the position showing maximum similarity in the predetermined image and the other image, and extracting that distance as depth information of the subject.
[0009]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0010]
According to the first embodiment of the present invention, depth information of a subject can be added to a captured image. This is the subject of the present invention. First, a method for acquiring depth information will be described.
[0011]
In order to find the depth of the subject only from the photographed image, a plurality of images are required. Humans use binocular parallax for depth recognition. If the parallax alone does not provide sufficient accuracy, the person moves to the left and right to intentionally create a larger parallax. To do the same thing with the camera, the photographer moves and shoots the same subject at different locations. By doing so, parallax occurs between the two images.
[0012]
The above will be explained with reference to FIG. 1. As shown in FIG. 1, let Fo be the distance between the camera lens 1 and the imaging plate 2, and let Ro(i) be the distance between the lens 1 and a subject Oi. When the photographer, holding the camera, moves a distance L in a direction perpendicular to the direction of the subject, that is, when the lens 1 and the imaging plate 2 move from positions 1(1) and 2(1) to positions 1(2) and 2(2), respectively, the image of the subject Oi moves by Fo × L / Ro(i) on the imaging plate 2. If Fo is constant and Ro >> Fo, the amount of image movement on the imaging plate 2 is approximately inversely proportional to the distance Ro(i). Therefore, by recording how far each image element (pixel) moves while the person carrying the camera moves, the depth information of the subject can be recovered.
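The relation between camera movement, image shift, and subject distance can be checked numerically. The following is a minimal sketch, not part of the specification; the function names and the numeric values (a 10 mm lens-to-plate distance and a 100 mm camera movement) are illustrative assumptions.

```python
def image_shift(fo: float, move: float, distance: float) -> float:
    """Image displacement on the imaging plate: Fo * L / Ro(i)."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return fo * move / distance


def distance_from_shift(fo: float, move: float, shift: float) -> float:
    """Invert the relation to recover the subject distance: Ro(i) = Fo * L / shift."""
    if shift <= 0:
        # A zero shift corresponds to a subject effectively at infinity.
        raise ValueError("shift must be positive")
    return fo * move / shift


if __name__ == "__main__":
    fo_mm = 10.0     # hypothetical lens-to-plate distance Fo
    move_mm = 100.0  # hypothetical camera movement L
    for ro_mm in (1000.0, 2000.0, 4000.0):
        shift = image_shift(fo_mm, move_mm, ro_mm)
        print(f"Ro = {ro_mm:.0f} mm -> shift = {shift:.3f} mm")
```

Doubling the subject distance halves the shift, which is the inverse proportionality the paragraph relies on.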
[0013]
FIG. 2 is a schematic configuration diagram of a digital camera system according to the first embodiment of the present invention based on such a principle. FIG. 3 is a flowchart showing a procedure for adding depth information of a subject to a captured image using the digital camera system of the present embodiment.
[0014]
In FIG. 2, reference numeral 100 denotes an imaging unit including a lens, an imaging plate (CCD or the like), and mechanisms for focus adjustment, aperture adjustment, the shutter, and so on. Reference numeral 102 denotes a storage unit for storing data such as images captured by the imaging unit 100, 104 denotes a depth information extraction unit for extracting depth information, 106 denotes an operation unit with which the photographer performs operations such as shutter operation and switching of the shooting mode, 108 denotes an external interface unit for exchanging data with an external computer or the like, and 110 denotes a control unit for controlling each unit.
[0015]
When depth information of a subject is to be added to a captured image, a single still image of the subject is first captured in the still image shooting mode (FIG. 3, step S1). The captured still image frame is stored in the storage unit 102.
[0016]
Next, the mode is switched to the moving image shooting mode, and the same subject is continuously shot while moving in the direction orthogonal to the optical axis direction when the still image is shot (step S2). A series of captured moving image frames is stored in the storage unit 102. However, the moving image frame storage area and the still image frame storage area may be separate physical memories. Since the moving image frame is used for extracting depth information, it is deleted when the depth information has been extracted.
[0017]
When the required number of moving image frames has been captured, shooting ends and the depth information extraction unit 104 executes the depth information extraction process (step S3). The depth information extraction unit 104 first compares the first moving image frame with each subsequent moving image frame in turn and searches for the position (reference position) at which the area where the difference between the two compared images is zero becomes largest. Specifically, while the two frames being compared are translated relative to each other in small increments, the difference is calculated per pixel or per small region, and the relative position at which the per-frame total of these difference values is smallest is sought. Next, the first moving image frame is divided into m × n rectangular blocks, and the first moving image frame is compared with each subsequent moving image frame block by block. The initial position for this comparison is the reference position. Through this comparison, the moving image region with the highest similarity is searched for in the neighborhood of each block (the search range is determined in advance). The relative distance (I, J) from each block to the moving image region showing the highest similarity is then obtained and stored in the storage unit 102 as depth information.
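The two-stage search described in this paragraph (a whole-frame alignment followed by a per-block search around the reference position) might be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the specification: similarity is measured by the mean absolute difference over the overlapping pixels, ties are resolved in favor of the smallest displacement, square blocks are used, and all function and parameter names are hypothetical.

```python
def mean_abs_diff(ref, other, dx, dy, x0, y0, w, h):
    """Mean |ref[y][x] - other[y+dy][x+dx]| over the window pixels that stay inside `other`."""
    height, width = len(other), len(other[0])
    total, count = 0, 0
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < height and 0 <= xx < width:
                total += abs(ref[y][x] - other[yy][xx])
                count += 1
    # Assumption: reject shifts that leave less than half of the window overlapping.
    if count * 2 < w * h:
        return float("inf")
    return total / count


def best_shift(ref, other, x0, y0, w, h, center, radius):
    """Try every shift within `radius` of `center`; return the one with the smallest difference."""
    cx, cy = center
    cands = [(dx, dy)
             for dy in range(cy - radius, cy + radius + 1)
             for dx in range(cx - radius, cx + radius + 1)]
    # Visit shifts nearest to the center first, so ties favor the smallest displacement.
    cands.sort(key=lambda s: abs(s[0] - cx) + abs(s[1] - cy))
    best, best_cost = center, float("inf")
    for dx, dy in cands:
        cost = mean_abs_diff(ref, other, dx, dy, x0, y0, w, h)
        if cost < best_cost:
            best, best_cost = (dx, dy), cost
    return best


def extract_depth_info(ref, other, block, global_radius, block_radius):
    """Return (reference position, per-block displacements relative to that position)."""
    height, width = len(ref), len(ref[0])
    # Stage 1: whole-frame alignment gives the reference position.
    base = best_shift(ref, other, 0, 0, width, height, (0, 0), global_radius)
    # Stage 2: per-block search that starts from the reference position.
    result = []
    for y0 in range(0, height - block + 1, block):
        row = []
        for x0 in range(0, width - block + 1, block):
            dx, dy = best_shift(ref, other, x0, y0, block, block, base, block_radius)
            row.append((dx - base[0], dy - base[1]))
        result.append(row)
    return base, result
```

Blocks that border a moving object can report spurious displacements because of occlusion, so a practical system would likely filter or smooth the resulting table.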
[0018]
FIG. 4 is an explanatory diagram of the depth information extraction. 120 (1) is the first moving image frame, and 120 (2), 120 (3),..., 120 (n) are the subsequent moving image frames. Comparing the moving image frame 120 (1) with the next moving image frame 120 (2), it can be seen that the triangular image portion is relatively moved in the horizontal direction. This moved portion is shown as a halftone dot area in the difference frame 121. The relative movement amount (I, J) of the image in the horizontal and vertical directions is extracted in units of blocks.
[0019]
An example of the extracted depth information is shown in FIG. 5. In FIG. 5, (0, 0) denotes a region in which no movement was observed, and (0, -1) and (0, -3) denote regions that moved in one direction only. A region with a uniform amount of movement represents subject matter at approximately the same distance from the camera. The difference in the amount of movement between two regions corresponds to the apparent distance between them. In this example, the (0, 0) region and the (0, -3) region are apparently the farthest apart. Strictly speaking, the pixel coordinates used in the above processing should be expressed as angles from the lens center, but the pixel addresses of the imaging plate can be used as an approximation.
[0020]
In this way, the depth information of the subject is extracted using the continuous moving image frames, and is stored in a form added to the still image frame. By importing the depth information into an external computer or the like together with the still image frame, editing such as three-dimensional imaging of the captured image becomes possible.
[0021]
According to the second embodiment of the present invention, a dramatic expansion of the field of view can be realized in a digital camera system. As noted above, the field of view perceived by humans is far wider than that of a photograph. In other words, the size of a single image perceived by a human is far larger than an image captured by an ordinary camera. Considering this further, humans often recognize the images observed from one point as a group of images. A landscape held in memory is usually a panoramic image. That is, everything visible from one point is recognized both as a wide-angle image and as individual narrow-angle images. Ideally, the individual narrow-angle images join seamlessly and cover every direction. To realize this with camera images, a series of images captured from the same point must be stored in memory; distortion must be removed from each individual image, and each image must carry a tag indicating that it belongs to the series and giving its optical axis direction.
[0022]
FIG. 6 is a schematic configuration diagram of a digital camera system according to the second embodiment of the present invention, which realizes field-of-view expansion based on this principle. In FIG. 6, reference numeral 200 denotes an imaging unit including a lens, an imaging plate (CCD or the like), and mechanisms for focus adjustment, aperture adjustment, the shutter, and so on. Reference numeral 202 denotes a storage unit for storing data such as images captured by the imaging unit 200, 204 denotes a distortion correction unit that removes distortion from images, 205 denotes a coordinate conversion unit that converts between plane coordinates and spherical coordinates, and 206 denotes an image synthesis unit that synthesizes one large display image from a plurality of images. Reference numeral 208 denotes an operation unit with which the photographer performs camera operations such as shutter operation, 210 denotes an external interface unit for exchanging data with an external computer or the like, and 212 denotes a control unit for controlling each unit.
[0023]
To obtain a wide-angle image, images must first be captured continuously from the same point while the direction is changed little by little, and the captured image files must be stored in the storage unit 202. Either of two methods may be used to store these image files: storing each image as an image of planar lattice points (method 1), or storing it in correspondence with the azimuth angles of its pixels (method 2). Whichever storage method is adopted, the distortion correction unit 204 removes the distortion of the image beforehand. This distortion can be removed using, for example, the methods described in the specifications attached to Japanese Patent Application No. Hei 8-273294 or No. Hei 8-296025 filed by the present applicant, so a description is omitted here.
[0024]
In storage method 1, the image captured on the imaging plate is stored as is. Each pixel corresponds to one ray of light entering the camera, but to allow images to be combined smoothly, parameters that make it possible to reproduce the azimuth angle of the light incident on each pixel must be retained.
[0025]
In FIG. 7, two different orientations of the camera are indicated by optical axes L1 and L3. 11(1) and 11(2) indicate the positions of the lens 11 in the respective orientations, and 12(1) and 12(2) indicate the positions of the imaging plate 12 (CCD or the like) in the respective orientations. An arbitrary ray L2 is captured both when the imaging plate 12 is at position 12(1) and when it is at position 12(2), but joining the images captured at the two positions at the location of ray L2 does not yield a single composite image. This is because the two image planes are different planes, and the shape of the image does not change smoothly across the joint. To allow images to be combined smoothly, the azimuth angles of the original incident rays of each image must be reproducible, and the important parameters that must be retained for this purpose are as follows.
* Image file name
* Tag indicating the optical axis direction of the image
* Distance between the lens and the imaging plate
* Pixel pitch of the imaging plate (horizontal and vertical directions)
* Relative azimuth angle of each image with respect to a reference image (usually the first image captured in the series)
The addition and storage of these parameters, and the storage of the images, are managed by the control unit 212.
[0026]
In storage method 2, each pixel of an image is stored in correspondence with the azimuth angle of its incident light. Unlike method 1, and in order to make image synthesis easier, the pixels captured on the flat imaging plate are converted in advance by the coordinate conversion unit 205 so that they are equally spaced in spherical coordinates (azimuth angles), and are then stored.
[0027]
FIG. 8 illustrates this coordinate conversion; 21 is a lens and 22 is an imaging plate. Rather than the coordinates of the equal-pitch pixels p1 and p2 that the incident rays L2 and L2′ (rays other than the optical axis L1) form on the imaging plate 22, the coordinates of the corresponding positions q1 and q2 on the spherical surface 23 are used. The result of applying this conversion to all pixels is stored. However, as can be seen from FIG. 8, with a simple conversion the pixel pitch on the spherical surface 23 becomes smaller with increasing distance from the optical axis, which is inconvenient for memory management and also for the synthesis processing described later; the pixels are therefore converted by interpolation into pixels equally spaced on the spherical surface. The important parameters that must be retained for method 2 are as follows.
* Image file name
* Tag indicating the optical axis direction of the image
* Angular pitch between pixels of the image (horizontal and vertical directions; on an ordinary imaging plate the azimuth angle depends on the pixel position, but as described above the pixels are converted in advance to a constant angular pitch)
* Relative azimuth angle of each image with respect to a reference image (usually the first image captured in the series)
The addition and storage of these parameters, and the storage of the images, are managed by the control unit 212.
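The conversion used by storage method 2 can be illustrated in one dimension. The sketch below is an assumption-laden example, not the method of the specification: it handles a single scanline through the optical axis, uses linear interpolation, and the names `pixel_angle` and `resample_equal_angle` as well as all parameter names are hypothetical.

```python
import math


def pixel_angle(n, pitch, f):
    """Azimuth angle of the ray reaching pixel n (counted from the optical axis)."""
    return math.atan2(n * pitch, f)


def resample_equal_angle(row, pitch, f, angle_pitch):
    """Resample one scanline so that successive samples are `angle_pitch` radians apart.

    row: pixel values along a line through the optical axis (at least two pixels);
    the optical axis is assumed to pass through the middle of the row.
    """
    center = (len(row) - 1) / 2.0
    max_angle = math.atan2(center * pitch, f)
    half = int(max_angle / angle_pitch)
    out = []
    for m in range(-half, half + 1):
        # Position on the flat plate, in pixel units, hit by the ray at angle m * angle_pitch.
        pos = center + f * math.tan(m * angle_pitch) / pitch
        lo = min(max(int(math.floor(pos)), 0), len(row) - 2)
        t = pos - lo
        # Linear interpolation between the two neighboring plate pixels.
        out.append((1 - t) * row[lo] + t * row[lo + 1])
    return out
```

With `pitch = 1` and `f = 2`, the angle between pixels 0 and 1 is about 0.46 rad while the angle between pixels 1 and 2 is about 0.32 rad, which shows the shrinking angular pitch away from the optical axis that motivates the interpolation.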
[0028]
Once the series of images and the associated parameters have been prepared in the storage unit 202 as described above, the image synthesis unit 206 can synthesize an image with an expanded field of view. FIG. 9 is a flowchart showing the flow of this image synthesis process. FIGS. 10 and 11 are diagrams for explaining this image synthesis.
[0029]
FIG. 10 shows the relationship between images 31 and 32 captured in different directions (optical axis directions) P1 and P2; the lens center O of the camera is assumed to be stationary and the subject is assumed to be at infinity. The coordinate system of the image 31 is denoted (x1, y1, z1), and that of the image 32 is denoted (x2, y2, z2). ξ represents the axis common to the (x, y) planes of the two coordinate systems, α is the angle between the x1 axis and the common axis ξ, β is the angle between the x2 axis and the common axis ξ, and γ is the angle between the z1 axis and the z2 axis. If the image 31 is taken as the reference image, the illustrated α, β, and γ can be used directly as angle parameters representing the relative angle of the image 32 with respect to the reference image.
[0030]
The contents of the image composition process are as follows. In the first step S11, the orientation (Φ, Ψ) to be viewed and the image size (W) are designated. This is set as (Φ, Ψ, W, α, β, γ).
[0031]
In the next step S12, the image (Φ′, Ψ′, W′, α′, β′, γ′) whose optical axis direction is closest to the previously specified direction (Φ, Ψ) is retrieved from the storage unit 202.
[0032]
In the next step S13, a spherical surface in the direction (Φ′, Ψ′) is defined, and the image (Φ′, Ψ′, W′, α′, β′, γ′) is projected onto this spherical surface. When the image has been stored by method 1, each pixel of the image corresponds to a point on a plane perpendicular to the direction (Φ′, Ψ′), so it must be converted to a point on the defined spherical surface and its pixel value output there. This procedure corresponds to the conversion from p1, p2 to q1, q2 in FIG. 8 and is executed using the coordinate conversion unit 205. When storage method 2 is used, this conversion procedure is unnecessary.
[0033]
In the next step S14, the pixels defined on the spherical surface in the previous step are projected onto the target plane P0 perpendicular to the direction (Φ, Ψ) (see FIG. 11). The pixel position (i, j) on the target plane P0 can be expressed as
(F0 tan(θ))^2 = (i px)^2 + (j py)^2
Here, px and py are the horizontal and vertical pixel pitches on the plane P0, F0 is the distance between the lens and the imaging plate (imaging plane) (the radius of the sphere in FIG. 11), and θ is the angle between the optical axis and the light incident on pixel (i, j).
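The relation between a pixel (i, j) on the target plane and the angle θ can be evaluated directly. The following is a small sketch under assumptions: pixel indices are counted from the point where the optical axis meets the plane, the optical axis is taken as the +z direction, and the function names are hypothetical.

```python
import math


def incidence_angle(i, j, px, py, f0):
    """Solve (F0 tan(theta))^2 = (i px)^2 + (j py)^2 for theta."""
    return math.atan(math.hypot(i * px, j * py) / f0)


def sphere_point(i, j, px, py, f0):
    """Point where the ray through plane pixel (i, j) meets the sphere of radius F0.

    The sphere is centered on the lens center and the plane touches it on the optical axis.
    """
    x, y = i * px, j * py
    norm = math.sqrt(x * x + y * y + f0 * f0)
    return (f0 * x / norm, f0 * y / norm, f0 * f0 / norm)


if __name__ == "__main__":
    theta = incidence_angle(3, 4, 1.0, 1.0, 5.0)
    print(f"theta = {math.degrees(theta):.1f} degrees")
    print("sphere point:", sphere_point(3, 4, 1.0, 1.0, 5.0))
```

Going from the plane to the sphere (or back) is then a matter of scaling each ray to length F0 (or to the plane z = F0).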
[0034]
In the next step S15, the unused image (Φ1, Ψ1, W1, α1, β1, γ1) whose direction is closest to (Φ′, Ψ′) is retrieved from the storage unit 202, and the processing from step S13 onward is continued for that image. The processing ends when it is determined in step S16 or S17 that no unused image remains or that the target plane P0 has been completely covered.
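The order in which stored images are picked in steps S12 through S17 might be sketched as below. Several points are assumptions made only for illustration: Φ and Ψ are interpreted as azimuth and elevation in radians, the direction (Φ′, Ψ′) of the first selected image stays the anchor for later selections, the coverage test of steps S16/S17 is replaced by a caller-supplied callback, and the projection itself is omitted. All names are hypothetical.

```python
import math


def direction(phi, psi):
    """Unit vector for azimuth phi and elevation psi (assumed angle convention)."""
    return (math.cos(psi) * math.cos(phi),
            math.cos(psi) * math.sin(phi),
            math.sin(psi))


def angular_distance(a, b):
    """Angle in radians between two (phi, psi) directions."""
    dot = sum(u * v for u, v in zip(direction(*a), direction(*b)))
    return math.acos(max(-1.0, min(1.0, dot)))


def composition_order(target, images, is_covered):
    """Return image names in the order they would be projected onto the target plane.

    target: requested (phi, psi); images: dict mapping name -> (phi, psi);
    is_covered: callback taking the names used so far (stand-in for steps S16/S17).
    """
    unused = dict(images)
    order = []
    anchor = target
    while unused and not is_covered(order):
        name = min(unused, key=lambda n: angular_distance(anchor, unused[n]))
        orientation = unused.pop(name)
        if not order:
            # After step S12, later picks are measured from the first image's direction.
            anchor = orientation
        order.append(name)
    return order
```

For example, `composition_order((0.1, 0.0), {"a": (0.0, 0.0), "b": (0.3, 0.0)}, lambda used: False)` visits "a" first and then "b".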
[0035]
In this way, a wide-angle composite image built from a plurality of images is created on the target plane P0 and stored in the storage unit 202. This composite image data can be transferred to an external computer or the like via the external interface unit 210 and displayed on a screen as is.
[0036]
It is also possible to store in the storage unit 202 the plurality of images captured from the same point together with their tags and parameters, transfer that data to an external computer or the like, and have the external computer or the like carry out the necessary processing.
[0037]
As described above, one of the characteristics of images perceived by humans is that they are pan-focus. Humans are thought to observe an object while continually shifting the focal position of the eye and to reconstruct these individual images with different focus positions in the brain. With current technology, a pan-focus image cannot be obtained from a single image alone; as noted above, however, an alternative technique is known in which a pan-focus image is synthesized by selecting the pixels of the sharpest image from a plurality of images captured with the focus position shifted little by little.
[0038]
According to the third embodiment of the present invention, a pan-focus image can be obtained in a digital camera system. FIG. 12 is a schematic configuration diagram of such a digital camera system, and FIG. 13 is an explanatory diagram thereof.
[0039]
In FIG. 12, reference numeral 300 denotes an imaging unit. The imaging unit 300 has a mechanism that captures many images while shifting the lens position and outputs the series of captured images. A known autofocus mechanism can be used for this purpose, for example one that captures images while moving the lens progressively outward from a position close to the imaging plate (or, conversely, from a distant position toward the imaging plate) and detects as the in-focus position the lens position that yields the sharpest image (the image whose spatial frequency is highest). For autofocus purposes, only the single in-focus image is kept and the other images are discarded; in the digital camera system of the present invention, however, all captured images are stored in the storage unit 301 and used to synthesize the pan-focus image.
[0040]
Reference numeral 302 denotes an arithmetic processing unit that executes the pan-focus image synthesis process and the like, 303 denotes an operation unit, 304 denotes an external interface unit used, among other things, for transferring data in the storage unit 301 to external devices, and 305 denotes a control unit that controls each unit. Because synthesizing a pan-focus image requires a large amount of computation, when the capability of the internal arithmetic processing unit 302 is insufficient, the images and their associated data can instead be transferred to a high-performance arithmetic processing device 306, such as an external general-purpose computer, which then performs the necessary computation.
[0041]
All of the images in the series captured by the imaging unit 300 are stored in the storage unit 301 together with information on the lens position (the distance between the lens and the imaging plate) at the time of each capture. The images in the series are given the same file name, and a serial number is assigned to each image in the order of capture. For example, in “landscape 1-1”, “landscape 1-2”, ... attached to the series of images 310 shown in FIG. 13, “landscape 1” is the file name and the number after the hyphen is the serial number. F0, F1, F2, ... represent the lens positions: F0 is the focal length of the lens (the position corresponding to an image at infinity), and F1, F2, ... are lens positions successively farther from F0.
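The naming scheme can be illustrated with a small sketch. The names `StackEntry` and `label_series` are hypothetical, and pairing serial number 1 with lens position F0 is an assumption; the text states only that serial numbers follow the order of capture.

```python
from dataclasses import dataclass


@dataclass
class StackEntry:
    name: str             # shared file name plus "-<serial number>"
    lens_position: float  # distance between lens and imaging plate


def label_series(file_name, lens_positions):
    """Attach a serial number (starting at 1, in capture order) to each image.

    Assumption: lens_positions is ordered as captured, F0 first.
    """
    return [StackEntry(f"{file_name}-{n}", pos)
            for n, pos in enumerate(lens_positions, start=1)]
```

For example, `label_series("landscape 1", [50.0, 50.5, 51.0])` yields entries named "landscape 1-1", "landscape 1-2", and "landscape 1-3"; the lens positions used here are made-up values for illustration.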
[0042]
According to the present invention, a pan-focus image in which defocus is corrected is created from a plurality of images, with each pixel of the series of images weighted using a variable that is highly correlated with the density gradient, such as a differential value or a higher-order differential value. Hereinafter, a specific example of the pan-focus image synthesis processing will be described with reference to FIG. 13.
[0043]
First, the pixel fk(i, j) of the image “landscape 1-k” in the series of images 310 and its four neighboring pixels are read, and the following Laplacian is calculated:
\[
\begin{aligned}
\Delta f_k(I(k), J(k)) ={}& \left| f_k(i(k), j(k)) - f_k(i(k)-1, j(k)) \right| \\
&+ \left| f_k(i(k), j(k)) - f_k(i(k), j(k)-1) \right| \\
&+ \left| f_k(i(k), j(k)) - f_k(i(k), j(k)+1) \right| \\
&+ \left| f_k(i(k), j(k)) - f_k(i(k)+1, j(k)) \right|
\end{aligned}
\]
where I(k) = i × (F0/Fk) and J(k) = j × (F0/Fk).
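The Laplacian above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the image is a list of rows of luminance values, the four neighbors are taken to be (i±1, j) and (i, j±1), and the function names `abs_laplacian` and `to_reference` are hypothetical.

```python
def abs_laplacian(img, i, j):
    """Sum of absolute differences between pixel (i, j) and its four neighbors.

    (i, j) must be an interior pixel so that all four neighbors exist.
    """
    c = img[i][j]
    return (abs(c - img[i - 1][j]) + abs(c - img[i][j - 1])
            + abs(c - img[i][j + 1]) + abs(c - img[i + 1][j]))


def to_reference(i, j, f0, fk):
    """Scale pixel coordinates (i, j) of image k by F0/Fk to obtain (I, J)."""
    return i * f0 / fk, j * f0 / fk
```

A sharply focused detail gives a large value and a flat or blurred area gives a value near zero, which is why the quantity can serve as a per-pixel focus weight.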
[0044]
Δfk(I, J) raised to the power W, that is, {Δfk(I, J)}^W, is stored at the address (I, J) of the first frame memory (weight table) 312. Here, W is a constant.
[0045]
The weight sum Σk {Δfk(I, J)}^W is calculated and stored at the address (I, J) of the second frame memory (weight sum table) 314 (in practice, the previously written value is read, the new value is added to it, and the result is stored again).
[0046]
For the images at the different lens positions, the sum
\[
Sf(I, J) = \sum_k \left\{ f_k(I(k), J(k)) \times \{\Delta f_k(I, J)\}^W \right\}
\]
is calculated, and the result is stored at the address (I, J) of the third frame memory 316. This calculation integrates the pixel luminance (density) with a weight corresponding to the second-order differential value.
[0047]
Here, I(k) and J(k) are generally not integers. When they are not integers, the value is interpolated from fk([I(k)], [J(k)]), fk([I(k)]+1, [J(k)]), fk([I(k)], [J(k)]+1), and fk([I(k)]+1, [J(k)]+1), where [x] denotes the integer part of x.
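One way to carry out this interpolation is sketched below. The text says only that the value is interpolated from the four surrounding pixels; bilinear weighting is assumed here, and the function name `interpolate` is hypothetical.

```python
import math


def interpolate(img, x, y):
    """Bilinearly interpolate img at real-valued (x, y).

    img[a][b] is indexed by integer a (first coordinate) and b (second
    coordinate); x and y must be non-negative and inside the image.
    """
    a, b = math.floor(x), math.floor(y)
    u, v = x - a, y - b
    # Clamp the "+1" neighbors so points on the last row/column stay in range.
    a1 = min(a + 1, len(img) - 1)
    b1 = min(b + 1, len(img[0]) - 1)
    return ((1 - u) * (1 - v) * img[a][b] + u * (1 - v) * img[a1][b]
            + (1 - u) * v * img[a][b1] + u * v * img[a1][b1])
```

When x and y are integers the fractional parts are zero and the function returns the pixel value itself, so the same routine covers both the integer and the non-integer case.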
[0048]
Then, Sf(I, J) accumulated over the lens positions is divided by the value of the second frame memory (weight sum table) 314, that is, the value
\[
\frac{Sf(I, J)}{\sum_k \{\Delta f_k(I, J)\}^W}
\]
is obtained. This is the pan-focus image with defocus corrected. When this image data is generated by the internal arithmetic processing unit 302, it is stored in the storage unit 301 and transferred to an external computer or the like via the external interface unit 304 as necessary. Once the pan-focus image has been obtained, the original series of images is no longer necessary and may be discarded.
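The whole synthesis can be sketched as a per-pixel weighted average. This is an illustrative sketch, not the implementation of the embodiment: the function name `pan_focus` is hypothetical, the images are assumed to be already scaled to a common grid (so the F0/Fk coordinate mapping and the interpolation are skipped), border pixels are copied from the first image, and the fallback for a zero weight sum is an added assumption.

```python
def pan_focus(images, w=2.0):
    """Weighted per-pixel average of a registered focus stack.

    images: list of equally sized 2-D lists of luminance values.
    w: the constant exponent W applied to the Laplacian.
    """
    rows, cols = len(images[0]), len(images[0][0])
    out = [row[:] for row in images[0]]  # borders keep the first image's values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            weight_sum = 0.0  # role of the second frame memory (weight sum)
            value_sum = 0.0   # role of the third frame memory (weighted luminance)
            for img in images:
                c = img[i][j]
                lap = (abs(c - img[i - 1][j]) + abs(c - img[i][j - 1])
                       + abs(c - img[i][j + 1]) + abs(c - img[i + 1][j]))
                weight = lap ** w
                weight_sum += weight
                value_sum += c * weight
            if weight_sum > 0:
                out[i][j] = value_sum / weight_sum
            else:
                # Assumption: where every image is flat, use the plain mean.
                out[i][j] = sum(img[i][j] for img in images) / len(images)
    return out
```

A larger W makes the result approach a selection of the single sharpest image at each pixel, while a smaller W blends the images more evenly.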
[0049]
[Effects of the invention]
As described above in detail, according to the present invention, an image to which depth information of the subject is added can be obtained in a digital camera system, so that editing such as converting a captured image into a three-dimensional image can be performed easily.
[Brief description of the drawings]
FIG. 1 is a diagram for explaining the relationship between the amount of movement of an image on an imaging plate and the depth of a subject when a camera is moved.
FIG. 2 is a schematic configuration diagram of a digital camera system according to a first embodiment of the present invention.
FIG. 3 is a flowchart showing a procedure for extracting depth information.
FIG. 4 is an explanatory diagram of extraction of depth information.
FIG. 5 is a diagram illustrating an example of extracted depth information.
FIG. 6 is a schematic configuration diagram of a digital camera system according to a second embodiment of the present invention.
FIG. 7 is an explanatory diagram relating to stitching images of different orientations.
FIG. 8 is an explanatory diagram relating to coordinate conversion from a plane to a spherical surface.
FIG. 9 is a flowchart showing a flow of wide-angle image composition processing.
FIG. 10 is a diagram for explaining synthesis of a wide-angle image.
FIG. 11 is a diagram for explaining synthesis of a wide-angle image.
FIG. 12 is a schematic configuration diagram of a digital camera system according to a third embodiment of the present invention.
FIG. 13 is a diagram for explaining pan-focus image synthesis processing.
[Explanation of symbols]
1, 11, 21 Lens
2, 12, 22 Imaging plate
100 Imaging unit
102 Storage unit
104 Depth information extraction unit
106 Operation unit
108 External interface unit
110 Control unit
200 Imaging unit
202 Storage unit
204 Distortion aberration correction unit
205 Coordinate conversion unit
206 Image composition unit
208 Operation unit
210 External interface unit
212 Control unit
300 Imaging unit
301 Storage unit
302 Arithmetic processing unit
303 Operation unit
304 External interface unit
305 Control unit
306 External arithmetic processing unit

Claims

A digital camera system comprising:
means for storing a plurality of images of the same subject captured while moving in a direction perpendicular to the direction of the subject; and
means for sequentially comparing a predetermined image among the stored plurality of images with each of the other images, taking as a reference position the position at which the area where the difference between the two compared images is zero is maximized, obtaining, for each block, the distance between the reference position and the position showing the maximum similarity in the predetermined image and the other image, and extracting the distance as depth information of the subject.