JP2013005025A - Stereoscopic image generating device, stereoscopic image generating method, program, and recording medium

Info

Publication number: JP2013005025A
Application number: JP2011131255A
Authority: JP
Inventors: Kenji Tsukuba (健史筑波); Masahiro Shioi (正宏塩井); Takeaki Suenaga (健明末永); Atsutoshi Simeno (敦稔〆野)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2011-06-13
Filing date: 2011-06-13
Publication date: 2013-01-07
Anticipated expiration: 2031-06-13
Also published as: WO2012172853A1; JP5210416B2

Abstract

PROBLEM TO BE SOLVED: To provide a stereoscopic image with a more natural sense of depth by generating the depth model of an image on the basis of a vanishing point when the position of the vanishing point can be estimated, and on the basis of the degree of conspicuousness (saliency) when the position of the vanishing point cannot be estimated.
SOLUTION: A stereoscopic image generating device 1 is provided with: a vanishing point estimation part 20 that estimates the vanishing point from an image being processed; a depth model generation part 30 that generates different depth models depending on whether the vanishing point can be estimated by the vanishing point estimation part 20; and a view-point image generation part 40 that generates an image to be presented to the right eye and an image to be presented to the left eye on the basis of the depth model generated by the depth model generation part 30, the image being processed, and information on the assumed viewing conditions. The depth model generation part 30 generates the depth model on the basis of the vanishing point if the vanishing point can be estimated by the vanishing point estimation part 20, and on the basis of the degree of conspicuousness of each pixel in the image being processed if it cannot.

Description

The present invention relates to a stereoscopic image generation apparatus, a stereoscopic image generation method, a program, and a recording medium that add binocular stereoscopic information to a 2D image to generate a 3D image.

In recent years, with the spread of 3D television (3DTV) and the start of 3D digital broadcasting, an environment for viewing 3D video at home is taking shape. As playback environments have matured, however, a shortage of 3D video content has been pointed out. As one approach to resolving this shortage, attention has focused on 2D-to-3D conversion, in which binocular stereoscopic information is artificially added to a 2D image to generate a 3D image.

As a technique for realizing 2D-to-3D conversion, the technique described in Patent Document 1, for example, is known. Patent Document 1 discloses a stereoscopic image generation apparatus that holds three types of basic depth models, each indicating the depth values of a basic type of image, generates a depth model of an input image by changing the composition ratio of the three basic depth models according to the pattern of the input image, and generates images to be presented to the left and right eyes from the generated depth model and the input image.

Patent Document 1: Japanese Patent No. 4214976 (Japanese Patent Laid-Open No. 2005-151534)

J. Shi and C. Tomasi, “Good Features to Track,” 9th IEEE Conference on Computer Vision and Pattern Recognition, June 1994.
B. D. Lucas and T. Kanade, “An iterative image registration technique with an application to stereo vision,” Proceedings of the 1981 DARPA Imaging Understanding Workshop, pp. 121-130, 1981.
CG-ARTS Association, Digital Image Processing, 2nd Edition, 2009.
Noboru Ota, Color Engineering, 2nd Edition, Tokyo Denki University Press, 2001.
L. Itti, C. Koch, E. Niebur, “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 11, pp. 1254-1259, Nov 1998.
A. Telea, “An image inpainting technique based on the fast marching method,” Journal of Graphics Tools 9 (2004), pp. 25-36.

However, with the stereoscopic image generation apparatus disclosed in Patent Document 1, it is difficult to express the depth model of an image that does not match the assumed basic depth models. For example, when the vanishing point lies outside the screen, the depth cannot be expressed no matter how the composition ratio of the three basic depth models is changed. As a result, a stereoscopic image with a natural sense of depth could not be generated.

The present invention has been made to solve the problems described above. Its object is to provide a stereoscopic image generation apparatus, a stereoscopic image generation method, a program, and a recording medium capable of generating a stereoscopic image with a more natural sense of depth. When the vanishing point position can be estimated from geometric depth cues, the depth model of the image is generated based on the vanishing point. When the vanishing point position cannot be estimated from geometric depth cues, the depth model of the image is generated based on saliency, which represents visual attractiveness within the image according to human visual characteristics.

To solve the above problems, the first technical means of the present invention is a stereoscopic image generation apparatus that adds binocular stereoscopic information to a 2D image to generate a 3D image, comprising: vanishing point estimation means for estimating a vanishing point from a processing target image; depth model generation means for generating a different depth model depending on whether the vanishing point estimation means was able to estimate the vanishing point; and viewpoint image generation means for generating a right-eye presentation image and a left-eye presentation image based on the depth model generated by the depth model generation means, the processing target image, and assumed viewing condition information, wherein the depth model generation means generates the depth model based on the vanishing point when the vanishing point estimation means was able to estimate the vanishing point, and generates the depth model based on the saliency of each pixel in the processing target image when the vanishing point estimation means was not able to estimate the vanishing point.

The second technical means is the first technical means, further comprising reduced image generation means for generating a reduced image of a predetermined image size from the processing target image, wherein the reduced image is input to the vanishing point estimation means and the depth model generation means, and further comprising enlarged depth model generation means for generating, from the depth model of the reduced image generated by the depth model generation means, an enlarged depth model having the same image size as the processing target image.

The third technical means is the first technical means, further comprising spatio-temporal smoothing means that smooths, in the spatial direction, the depth model of the processing target image generated by the depth model generation means, and then smooths that depth model in the temporal direction based on the spatially smoothed depth model of the processing target image and the spatio-temporally smoothed depth model of a comparison target image earlier than the processing target image, thereby generating a spatio-temporally smoothed depth model of the processing target image.
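The two-stage smoothing of the third technical means can be sketched as follows. This is only an illustrative sketch: the 3x3 box filter, the recursive blend, and the names `smooth_spatial`, `smooth_spatiotemporal`, and `alpha` are assumptions introduced here, since this passage does not specify the filters or weights.

```python
def smooth_spatial(depth):
    """Spatial smoothing with a 3x3 box filter and edge clamping (assumed filter)."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp row index at the border
                    xx = min(max(x + dx, 0), w - 1)  # clamp column index at the border
                    total += depth[yy][xx]
            out[y][x] = total / 9.0
    return out


def smooth_spatiotemporal(depth_t, prev_smoothed, alpha=0.5):
    """Blend the spatially smoothed depth model of the current image with the
    spatio-temporally smoothed depth model of the earlier image.
    alpha is a hypothetical weight on the current image."""
    spatial = smooth_spatial(depth_t)
    if prev_smoothed is None:  # no earlier image available: spatial smoothing only
        return spatial
    return [
        [alpha * s + (1.0 - alpha) * p for s, p in zip(row_s, row_p)]
        for row_s, row_p in zip(spatial, prev_smoothed)
    ]
```

Feeding each output back in as `prev_smoothed` for the next image makes the result depend on all earlier images of the scene, which is one simple way to suppress frame-to-frame flicker in the depth model.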

The fourth technical means is any one of the first to third technical means, wherein the assumed viewing condition information includes the pixel pitch of the display that displays the 3D image, the image size of the display, the distance from the viewer to the display, a parallax range representing the amount of depth of the 3D image, and a baseline length, which is the distance between the left and right virtual viewpoints.

The fifth technical means is any one of the first to fourth technical means, wherein the saliency of each pixel in the processing target image is calculated to be higher at locations where the color difference between the pixel of interest and its surrounding pixels is large, where the color difference between the pixel of interest and the image as a whole is large, or where the color difference between a local region containing the pixel of interest and its surrounding region is large.

The sixth technical means is the fifth technical means, wherein, when the vanishing point estimation means was not able to estimate the vanishing point, the depth model generation means generates the depth model so that locations where the saliency of the pixels in the processing target image is high are placed on the near side.
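As a rough illustration of the fifth and sixth technical means, the sketch below computes saliency as the color difference between each pixel and the image as a whole, then maps high saliency to the near side. The Euclidean RGB distance, the use of the mean color to represent the whole image, the convention that larger depth values are nearer, and the function names are all assumptions made for this example, not details taken from the specification.

```python
import math


def saliency_map(image):
    """image: rows of (r, g, b) tuples. Saliency is the distance of each pixel's
    color from the mean color of the whole image (assumed color-difference measure)."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = tuple(sum(p[c] for p in pixels) / n for c in range(3))
    return [[math.dist(p, mean) for p in row] for row in image]


def depth_from_saliency(saliency, near=255.0, far=0.0):
    """Linearly map saliency to depth so that the most salient pixel is nearest.
    Assumed convention: larger depth value means closer to the viewer."""
    lo = min(min(row) for row in saliency)
    hi = max(max(row) for row in saliency)
    span = hi - lo
    if span == 0:  # uniform image: no salient region, place everything at the far side
        return [[far for _ in row] for row in saliency]
    return [[far + (near - far) * (s - lo) / span for s in row] for row in saliency]
```

For an image that is black except for one white pixel, the white pixel receives the depth value `near` and the black pixels receive `far`.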

The seventh technical means is any one of the first to sixth technical means, wherein the vanishing point estimation means comprises: intra-frame vanishing point estimation means for estimating the vanishing point of the processing target image from straight-line information in the processing target image; and inter-frame vanishing point estimation means for estimating the vanishing point of the processing target image based on the processing target image, a comparison target image earlier than the processing target image, and the position of the vanishing point in the comparison target image.

The eighth technical means is the seventh technical means, further comprising scene change detection means for detecting whether a scene change has occurred between the processing target image and the comparison target image, wherein the intra-frame vanishing point estimation means is selected when the scene change detection means detects a scene change, and the inter-frame vanishing point estimation means is selected when the scene change detection means does not detect a scene change.

The ninth technical means is the eighth technical means, further comprising storage means for storing vanishing point information including the position of the vanishing point of the comparison target image, wherein the inter-frame vanishing point estimation means is selected when the vanishing point information of the comparison target image is stored in the storage means, and the intra-frame vanishing point estimation means is selected when the vanishing point information of the comparison target image is not stored in the storage means.
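One possible reading of how the eighth and ninth technical means combine is sketched below: the inter-frame estimator is used only when no scene change was detected and a vanishing point is stored for the comparison target image. The function name, the string labels, and the way the two conditions are combined are assumptions for illustration.

```python
from typing import Optional, Tuple


def select_estimator(scene_change: int, prev_vp: Optional[Tuple[float, float]]) -> str:
    """Return 'intra' or 'inter'.

    scene_change: 1 if a scene change was detected at the processing target image, else 0.
    prev_vp: stored vanishing point position of the comparison target image,
             or None if nothing is stored.
    """
    if scene_change == 1 or prev_vp is None:
        # A new scene, or no stored vanishing point to propagate:
        # estimate from straight lines within the current image.
        return "intra"
    # Same scene and a stored vanishing point exists: propagate it between images.
    return "inter"
```

For example, `select_estimator(0, (320.0, 120.0))` selects the inter-frame estimator, while `select_estimator(1, (320.0, 120.0))` and `select_estimator(0, None)` both fall back to the intra-frame estimator.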

The tenth technical means is any one of the seventh to ninth technical means, wherein the comparison target image is the image immediately preceding the processing target image.

The eleventh technical means is a stereoscopic image generation method performed by a stereoscopic image generation apparatus that adds binocular stereoscopic information to a 2D image to generate a 3D image, the method comprising: a vanishing point estimation step of estimating a vanishing point from a processing target image; a depth model generation step of generating a different depth model depending on whether the vanishing point was able to be estimated in the vanishing point estimation step; and a viewpoint image generation step of generating a right-eye presentation image and a left-eye presentation image based on the depth model generated in the depth model generation step, the processing target image, and assumed viewing condition information, wherein the depth model generation step generates the depth model based on the vanishing point when the vanishing point was able to be estimated in the vanishing point estimation step, and generates the depth model based on the saliency of each pixel in the processing target image when the vanishing point was not able to be estimated in the vanishing point estimation step.

The twelfth technical means is a program for causing a computer to execute the stereoscopic image generation method of the eleventh technical means.

The thirteenth technical means is a computer-readable recording medium on which the program of the twelfth technical means is recorded.

According to the present invention, when the vanishing point position can be estimated from geometric depth cues, generating the depth model of the image based on the vanishing point makes it possible to generate a stereoscopic image that emphasizes the sense of depth given by those cues.
Further, according to the present invention, when the vanishing point position cannot be estimated from geometric depth cues, generating the depth model of the image from saliency, which represents visual attractiveness within the image according to human visual characteristics, makes it possible to generate a stereoscopic image that emphasizes the sense of depth of the portion that draws a viewer's attention.

FIG. 1 is a block diagram showing a configuration example of a stereoscopic image generation apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining an example of the per-frame operation of the stereoscopic image generation apparatus according to the embodiment of the present invention.
FIG. 3 is a schematic diagram of scene change detection based on a luminance histogram.
FIG. 4 is a block diagram showing a configuration example of the scene change detection unit according to the embodiment of the present invention.
FIG. 5 is a flowchart for explaining an operation example of the scene change detection unit according to the embodiment of the present invention.
FIG. 6 is a block diagram showing a configuration example of the vanishing point estimation unit according to the embodiment of the present invention.
FIG. 7 is a flowchart for explaining an operation example of the vanishing point estimation unit according to the embodiment of the present invention.
FIG. 8 is a flowchart for explaining an operation example of the intra-frame vanishing point estimation unit according to the embodiment of the present invention.
FIG. 9 is a diagram showing an example of images corresponding to the flow of FIG. 8.
FIG. 10 is a schematic diagram for explaining straight-line detection by the Hough transform.
FIG. 11 is a flowchart for explaining an operation example of the inter-frame vanishing point estimation unit according to the embodiment of the present invention.
FIG. 12 is a diagram showing an example of images corresponding to the flow of FIG. 11.
FIG. 13 is a diagram showing an example of the ranges over which the intra-frame vanishing point estimation means and the inter-frame vanishing point estimation means are applied within the same scene.
FIG. 14 is a block diagram showing a configuration example of the depth model generation unit according to the embodiment of the present invention.
FIG. 15 is a flowchart for explaining an operation example of the depth model generation unit according to the embodiment of the present invention.
FIG. 16 is a diagram showing an example of a basic depth model when the vanishing point is inside the screen, according to the embodiment of the present invention.
FIG. 17 is a diagram showing an example of a basic depth model when the vanishing point is outside the screen, according to the embodiment of the present invention.
FIG. 18 is a diagram showing an example of the process of obtaining a depth model based on the vanishing point according to the embodiment of the present invention.
FIG. 19 is a diagram showing an example of the process of obtaining a depth model based on saliency according to the embodiment of the present invention.
FIG. 20 is a bird's-eye view of the camera (viewpoint) arrangement for generating viewpoint images.
FIG. 21 is a block diagram showing a configuration example of the viewpoint image generation unit according to the embodiment of the present invention.
FIG. 22 is a flowchart for explaining an operation example of the viewpoint image generation unit according to the embodiment of the present invention.
FIG. 23 is a diagram showing an example of the process of generating viewpoint images according to the embodiment of the present invention.
FIG. 24 is a diagram showing parallax vectors in the crossed direction and the divergent direction.
FIG. 25 is a block diagram showing a modification of the depth model generation unit according to the embodiment of the present invention.
FIG. 26 is a flowchart for explaining an operation example of the saliency-based depth model creation means in the modification of the depth model generation unit according to the embodiment of the present invention.
FIG. 27 is a diagram showing an example of the process of obtaining a depth model based on saliency (modification) according to the embodiment of the present invention.
FIG. 28 is a diagram showing an example of a modification of the reference depth model and an example of the corresponding depth model, for the depth model based on saliency (modification) according to the embodiment of the present invention.
FIG. 29 is a block diagram showing a configuration example of the stereoscopic image generation apparatus (first modification) according to the embodiment of the present invention.
FIG. 30 is a flowchart for explaining an example of the per-frame operation of the stereoscopic image generation apparatus (first modification) according to the embodiment of the present invention.
FIG. 31 is a block diagram showing a configuration example of the stereoscopic image generation apparatus (second modification) according to the embodiment of the present invention.
FIG. 32 is a flowchart for explaining an example of the per-frame operation of the stereoscopic image generation apparatus (second modification) according to the embodiment of the present invention.
FIG. 33 is a block diagram showing a configuration example of the spatio-temporal smoothing unit according to the embodiment of the present invention.
FIG. 34 is a flowchart for explaining an operation example of the spatio-temporal smoothing unit according to the embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In the drawings, parts having the same function are denoted by the same reference numerals, and repeated description is omitted.
FIG. 1 is a block diagram showing a schematic configuration example of a stereoscopic image generation apparatus according to the present invention. In the figure, reference numeral 1 denotes the stereoscopic image generation apparatus. The stereoscopic image generation apparatus 1 includes a scene change detection unit 10, a vanishing point estimation unit 20, a depth model generation unit 30, and a viewpoint image generation unit 40. FIG. 2 is a flowchart for explaining an example of the per-frame operation of the stereoscopic image generation apparatus 1 according to the present invention.

In FIG. 2, the stereoscopic image generation apparatus 1 of FIG. 1 first outputs the input image at time t (hereinafter also referred to as the processing target image F(t)) to the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 (step S11 in FIG. 2).

(About the scene change detection unit 10)
The scene change detection unit 10 in FIG. 1 corresponds to the scene change detection means of the present invention. It calculates a predetermined image feature amount from the input processing target image F(t) and from the image input immediately before it (hereinafter also referred to as the comparison target image F(t-1)), compares the similarity of the calculated image feature amounts to detect a boundary between temporally consecutive images (a scene change), and outputs scene change information S(t), which indicates whether a scene change occurred at the processing target image F(t), to the vanishing point estimation unit 20 and the depth model generation unit 30 (step S12 in FIG. 2). As an example of the predetermined image feature amount, scene change detection based on a luminance histogram, which represents the frequency of occurrence of the luminance values of an image, is described below with reference to FIGS. 3 to 5.

As shown in FIG. 3, scene change detection based on the luminance histogram calculates the luminance histograms H_L(t) and H_L(t-1) from the processing target image F(t) and the comparison target image F(t-1), respectively, and compares the similarity d(H_L(t), H_L(t-1)) of the calculated luminance histograms with a predetermined threshold to determine whether a scene change occurred at the processing target image F(t).

As shown in FIG. 4, the scene change detection unit 10 includes a luminance histogram generation unit 101, a buffer 102, a histogram similarity calculation unit 103, and a scene change determination unit 104. FIG. 5 is a flowchart for explaining an operation example of the scene change detection unit 10. In FIG. 5, the luminance histogram generation unit 101 of FIG. 4 acquires luminance information from the input processing target image F(t), calculates from it the luminance histogram H_L(t) representing the frequency of occurrence of each luminance value, and outputs the result (luminance histogram H_L(t)) to the buffer 102 and the histogram similarity calculation unit 103 (step S21 in FIG. 5).

The buffer 102 in FIG. 4 stores the luminance histogram H_L(t) of the processing target image F(t) for use in scene change detection at the next image F(t+1) (step S22 in FIG. 5). The histogram similarity calculation unit 103 in FIG. 4 calculates the similarity d(H_L(t), H_L(t-1)) by equation (1) from the input luminance histogram H_L(t) of the processing target image F(t) and the luminance histogram H_L(t-1) of the comparison target image F(t-1) read from the buffer 102, and outputs the result to the scene change determination unit 104 (step S23 in FIG. 5).

In equation (1), W represents the number of pixels per line of the image, H represents the number of lines of the image, v represents a luminance value, V represents the number of gradation levels of the luminance value, and H_L(v|t) represents the frequency of occurrence of the luminance value v in the image F(t) at time t. The similarity d(H_L(t), H_L(t-1)) in equation (1) takes values in the range 0 to 2. The closer the value is to 0, the more similar the shapes of the histograms are, and the closer it is to 2, the more they differ.
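Equation (1) itself is not reproduced in this text. One form consistent with the symbols defined above and with the stated range of 0 to 2 is the sum over all luminance values v of |H_L(v|t) - H_L(v|t-1)|, divided by W x H. The sketch below implements that assumed form; the function names are hypothetical, and the actual equation (1) may differ.

```python
def luminance_histogram(image, levels=256):
    """image: rows of integer luminance values in [0, levels).
    Returns the occurrence count of each luminance value."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    return hist


def histogram_distance(hist_t, hist_prev, width, height):
    """Assumed form of d(H_L(t), H_L(t-1)): the sum of absolute bin differences,
    normalized by the pixel count W x H, so the result lies in [0, 2]."""
    return sum(abs(a - b) for a, b in zip(hist_t, hist_prev)) / (width * height)
```

With this form, two identical images give 0, and two images whose luminance values do not overlap at all (for example, all black followed by all white) give 2, matching the range described above.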

The scene change determination unit 104 in FIG. 4 compares the input histogram similarity d(H_L(t), H_L(t-1)) with a predetermined threshold d_th, sets by equation (2) the scene change information S(t) indicating whether a scene change occurred at the processing target image F(t), and outputs it to the outside (step S24 in FIG. 5).

That is, when the similarity d(H_L(t), H_L(t-1)) is smaller than the threshold d_th, the scene change determination unit 104 determines that there is no scene change and sets the scene change information S(t) to "0". Otherwise, it determines that there is a scene change and sets the scene change information S(t) to "1".
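The decision rule just described can be written compactly as follows. This is a restatement of the rule in the preceding paragraph, not a reproduction of the original typeset equation (2).

```latex
S(t) =
\begin{cases}
0, & d\bigl(H_L(t),\, H_L(t-1)\bigr) < d_{th} \\
1, & \text{otherwise}
\end{cases}
```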

As described above, the scene change detection unit 10 calculates a predetermined image feature amount from the processing target image F(t) and the comparison target image F(t-1) and compares the similarity of the calculated image feature amounts, and can thereby detect boundaries between temporally consecutive images (scene changes).

(About the vanishing point estimation unit 20)
Returning to FIG. 1, the vanishing point estimation unit 20 corresponds to the vanishing point estimation means of the present invention. Based on the input scene change information S(t) of the processing target image F(t) and the immediately preceding vanishing point information VP(t-1) stored inside the vanishing point estimation unit 20, it selects one of two vanishing point estimation means: intra-frame vanishing point estimation means, which estimates the vanishing point position from straight lines in the image, and inter-frame vanishing point estimation means, which estimates the vanishing point position in the current frame from the correspondence of feature points between images and the vanishing point position of the previous frame. Using the selected means, it estimates the position of the vanishing point from the input processing target image F(t) and outputs vanishing point information VP(t) describing the result to the depth model generation unit 30 (step S13 in FIG. 2). Here, the "vanishing point" is the single point at which lines that are parallel in three-dimensional space converge when they are projected onto a plane.
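To make the definition of the vanishing point concrete, the sketch below finds the point where two image lines meet, using homogeneous coordinates. It illustrates only the geometric idea. The function names are hypothetical, and the way the apparatus actually identifies a vanishing point from detected straight lines may differ.

```python
def line_through(p, q):
    """Homogeneous line (a, b, c), with a*x + b*y + c = 0, through image points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)


def intersect(l1, l2, eps=1e-9):
    """Intersection of two homogeneous lines, or None if they are parallel in the image."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    w = a1 * b2 - a2 * b1  # zero when the two image lines have the same direction
    if abs(w) < eps:
        return None
    return ((b1 * c2 - b2 * c1) / w, (c1 * a2 - c2 * a1) / w)
```

For example, the two edges of a straight road, which are parallel in the scene, might appear in the image as the lines through (0, 100), (50, 50) and through (200, 100), (150, 50). Their intersection at (100, 0) is the candidate vanishing point. When the projected lines remain parallel in the image, `intersect` returns `None`, which corresponds to a case where no vanishing point can be obtained from that pair of lines.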

続いて、本実施形態における消失点推定部２０について詳細に説明する。図６に示すように、消失点推定部２０は、切替部２０１，切替部２０２、フレーム内消失点推定部２１、フレーム間消失点推定部２２、バッファ２０３、およびバッファ２０４で構成されている。また、図６のフレーム内消失点推定部２１は、エッジ検出部２１１、直線検出部２１２、および消失点同定部２１３で構成されている。このフレーム内消失点推定部２１は、本発明のフレーム内消失点推定手段に相当し、処理対象画像Ｆ（ｔ）内の直線情報から処理対象画像Ｆ（ｔ）の消失点を推定する。また、図６のフレーム間消失点推定部２２は、特徴点検出部２２１、対応点算出部２２２、変換行列算出部２２３、および消失点位置算出部２２４で構成されている。このフレーム間消失点推定部２２は、本発明のフレーム間消失点推定手段に相当し、処理対象画像Ｆ（ｔ）と処理対象画像Ｆ（ｔ）よりも過去の比較対象画像Ｆ（ｔ−１）と比較対象画像Ｆ（ｔ−１）における消失点の位置とに基づいて、処理対象画像Ｆ（ｔ）の消失点を推定する。図７は、消失点推定部２０の動作例を説明するためのフロー図である。 Next, the vanishing point estimation unit 20 of this embodiment is described in detail. As illustrated in FIG. 6, the vanishing point estimation unit 20 includes a switching unit 201, a switching unit 202, an intra-frame vanishing point estimation unit 21, an inter-frame vanishing point estimation unit 22, a buffer 203, and a buffer 204. The intra-frame vanishing point estimation unit 21 in FIG. 6 includes an edge detection unit 211, a straight line detection unit 212, and a vanishing point identification unit 213. It corresponds to the intra-frame vanishing point estimation means of the present invention and estimates the vanishing point of the processing target image F(t) from the straight line information in F(t). The inter-frame vanishing point estimation unit 22 in FIG. 6 includes a feature point detection unit 221, a corresponding point calculation unit 222, a transformation matrix calculation unit 223, and a vanishing point position calculation unit 224. It corresponds to the inter-frame vanishing point estimation means of the present invention and estimates the vanishing point of F(t) based on F(t), the comparison target image F(t−1) that precedes F(t), and the position of the vanishing point in F(t−1). FIG. 7 is a flowchart for explaining an operation example of the vanishing point estimation unit 20.

図７において、図６の消失点推定部２０は、入力されたシーンチェンジ情報Ｓ（ｔ）、およびバッファ２０４より読み出した一つ前の消失点情報ＶＰ（ｔ−１）に基づいて消失点推定手段を選択する（図７のステップＳ３１）。具体的には、シーンチェンジが有る場合（「Ｓ（ｔ）＝１」）、もしくは、消失点情報ＶＰ（ｔ−１）が、前フレームに消失点が無いことを示す場合、つまり、消失点情報ＶＰ（ｔ−１）が「vp_num=0」の場合（図７のステップＳ３１においてＹｅｓ）、図６の切替部２０１は画像の入力先を、図６の切替部２０２は消失点情報の出力元を、フレーム内消失点推定部２１へそれぞれ切り替え（図７のステップＳ３２）、その後、フレーム内消失点推定部２１は、画像内の直線から消失点位置を推定し、その結果（消失点情報ＶＰ（ｔ））を出力する（図７のステップＳ３３）。 In FIG. 7, the vanishing point estimation unit 20 of FIG. 6 selects a vanishing point estimation means based on the input scene change information S(t) and the previous vanishing point information VP(t−1) read from the buffer 204 (step S31 in FIG. 7). Specifically, when there is a scene change (S(t) = 1), or when VP(t−1) indicates that the previous frame has no vanishing point, that is, when VP(t−1) has vp_num = 0 (Yes in step S31 in FIG. 7), the switching unit 201 in FIG. 6 switches the image input destination, and the switching unit 202 in FIG. 6 switches the output source of the vanishing point information, to the intra-frame vanishing point estimation unit 21 (step S32 in FIG. 7). The intra-frame vanishing point estimation unit 21 then estimates the vanishing point position from the straight lines in the image and outputs the result as vanishing point information VP(t) (step S33 in FIG. 7).
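The selection in step S31 can be sketched as a small function. This is only an illustration: the function name and the string return values are hypothetical, while the two inputs mirror S(t) and the vp_num field of VP(t−1) described above.

```python
def select_estimator(scene_change: int, prev_vp_num: int) -> str:
    """Choose the estimator for the current frame (step S31).

    scene_change -- S(t): 1 if a scene change was detected, else 0
    prev_vp_num  -- vp_num of VP(t-1): vanishing points found in the previous frame
    """
    # Scene change, or nothing to track from the previous frame:
    # estimate from the straight lines of the current frame.
    if scene_change == 1 or prev_vp_num == 0:
        return "intra"
    # Same scene and a previous vanishing point exists: track it across frames.
    return "inter"
```

For the first frame of a scene the function returns "intra", and it keeps doing so until some frame yields a vanishing point.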

（フレーム内消失点推定部２１について）
ここで、フレーム内消失点推定部２１について詳細に説明する。図８は、フレーム内消失点推定部２１の動作例を説明するためのフロー図である。図９は、図８のフローに対応した画像例を示す図である。図８のステップＳ３３１において、図６のエッジ検出部２１１は、入力された処理対象画像Ｆ（ｔ）（図９（Ａ）を参照）から直線検出に用いるエッジ点情報Ｅｄｇｅ（ｔ）を算出する。具体的には、まず、色成分（例えば、ＲＧＢ（Ｒｅｄ（赤）、Ｇｒｅｅｎ（緑）、Ｂｌｕｅ（青）））毎に微分オペレータを適用し、ｘ方向、ｙ方向における各色成分ｉの勾配ベクトルＧ_ｉ（ｘ，ｙ｜ｔ）＝（ΔＧ_ｉｘ（ｔ）, ΔＧ_ｉｙ（ｔ））（ｉ＝１，２，３）を算出する。なお、ｉ＝１、２、３は、それぞれ、Ｒ成分、Ｇ成分、Ｂ成分である。 (Intraframe vanishing point estimation unit 21)
Here, the intra-frame vanishing point estimation unit 21 is described in detail. FIG. 8 is a flowchart for explaining an operation example of the intra-frame vanishing point estimation unit 21. FIG. 9 shows image examples corresponding to the flow of FIG. 8. In step S331 in FIG. 8, the edge detection unit 211 in FIG. 6 calculates edge point information Edge(t), which is used for straight line detection, from the input processing target image F(t) (see FIG. 9(A)). Specifically, a differential operator is first applied to each color component (for example, RGB: red, green, and blue) to calculate the gradient vector G_i(x, y | t) = (ΔG_ix(t), ΔG_iy(t)) (i = 1, 2, 3) of each color component i in the x and y directions. Here, i = 1, 2, and 3 denote the R, G, and B components, respectively.

続いて、エッジ検出部２１１は、式（３）の演算を、座標（ｘ，ｙ）の画素毎に行うことで、エッジ強度Ｅ（ｘ，ｙ｜ｔ）を算出する。 Subsequently, the edge detection unit 211 calculates the edge strength E (x, y | t) by performing the calculation of Expression (3) for each pixel of the coordinates (x, y).

続いて、エッジ検出部２１１は、式（４）の演算を、座標（ｘ，ｙ）の画素毎に行うことで、エッジ強度Ｅ（ｘ，ｙ｜ｔ）から局所的にエッジ強度が極大値となる座標をエッジ点として抽出し、その結果を記述したエッジ点情報Ｅｄｇｅ（ｔ）を直線検出部２１２へ出力する。 Subsequently, the edge detection unit 211 performs the calculation of Expression (4) for each pixel of the coordinates (x, y), so that the edge strength is locally maximum from the edge strength E (x, y | t). Are extracted as edge points, and edge point information Edge (t) describing the result is output to the straight line detection unit 212.

つまり、エッジ検出部２１１は、座標（ｘ，ｙ）を中心とした窓サイズＷ１×Ｗ２の範囲内で、エッジ強度Ｅ（ｘ，ｙ｜ｔ）が極大値となる場合（ＬｏｃａｌＭａｘｉｍａ）、エッジ点Ｅｄｇｅ（ｘ，ｙ｜ｔ）に「１」、それ以外はエッジ点Ｅｄｇｅ（ｘ，ｙ｜ｔ）に「０」を設定する。ここで、Ｗ１はｘ方向の窓のサイズ、Ｗ２はｙ方向の窓のサイズを表す。 That is, the edge detection unit 211 has an edge when the edge intensity E (x, y | t) is a maximum value within the window size W1 × W2 centered on the coordinates (x, y) (Local Maxima). “1” is set to the point Edge (x, y | t), and “0” is set to the edge point Edge (x, y | t) otherwise. Here, W1 represents the size of the window in the x direction, and W2 represents the size of the window in the y direction.
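The local-maximum test described above can be sketched with NumPy. The sketch assumes the edge strength E(x, y | t) has already been computed (Expression (3) is not reproduced in this text), that W1 and W2 are odd, and that pixels with zero strength are never edge points; the function name is hypothetical.

```python
import numpy as np

def edge_points(strength: np.ndarray, w1: int = 3, w2: int = 3) -> np.ndarray:
    """Return Edge(x, y): 1 where the strength is the maximum of its W1 x W2 window."""
    h, w = strength.shape
    rx, ry = w1 // 2, w2 // 2          # half window sizes in x and y
    s = strength.astype(float)
    # Pad with -inf so that border pixels only compete with real neighbours.
    padded = np.pad(s, ((ry, ry), (rx, rx)), constant_values=-np.inf)
    local_max = np.full((h, w), -np.inf)
    for dy in range(2 * ry + 1):
        for dx in range(2 * rx + 1):
            local_max = np.maximum(local_max, padded[dy:dy + h, dx:dx + w])
    # Assumption: flat zero-strength areas are not reported as edge points.
    return ((s >= local_max) & (s > 0)).astype(np.uint8)
```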

図８のステップＳ３３２に進んで、図６の直線検出部２１２は、入力されたエッジ点情報Ｅｄｇｅ（ｔ）にハフ変換を適用して直線情報Ｌ（ｔ）（図９（Ｂ）を参照）を取得する。ここで、ハフ変換による直線検出に関して図１０に基づいて説明する。なお、図１０（Ａ）において、ある直線Ｌ上にある特徴点Ａ，Ｂ，Ｃを、エッジ検出部２１１において得られた「Ｅｄｇｅ（ｘ，ｙ|ｔ）＝１」となるエッジ点とする。まず、ハフ変換では、図１０（Ａ）に示す画像空間上の直線Ｌを極座標表現（ρ、θ）を用いて表現する。ρは画像空間の原点から直線Ｌへ引いた垂線の距離を表し、θはその垂線が画像空間のｘ軸となす角度を表す。なお、ρの範囲はρ≧０であり、θの範囲は０≦θ<２πである。 Proceeding to step S332 in FIG. 8, the straight line detection unit 212 in FIG. 6 applies the Hough transform to the input edge point information Edge(t) to obtain straight line information L(t) (see FIG. 9(B)). Straight line detection by the Hough transform is described with reference to FIG. 10. In FIG. 10(A), the feature points A, B, and C on a straight line L are edge points for which Edge(x, y | t) = 1, as obtained by the edge detection unit 211. The Hough transform first expresses the straight line L in the image space of FIG. 10(A) in polar form (ρ, θ), where ρ is the length of the perpendicular drawn from the origin of the image space to the straight line L, and θ is the angle that this perpendicular makes with the x axis of the image space. The range of ρ is ρ ≥ 0, and the range of θ is 0 ≤ θ < 2π.

図１０（Ａ）の画像空間上にある特徴点Ａ，Ｂ，Ｃを通過する直線Ｌは、パラメータ（ρ₀,θ₀）を用いて式（５）によって表される。 A straight line L that passes through the feature points A, B, and C in the image space of FIG. 10(A) is expressed by Expression (5) using the parameters (ρ₀, θ₀).

また、特徴点Ａ，Ｂ，Ｃをそれぞれ通過する直線群は、パラメータ空間へ写像すると、図１０（Ｂ）においてパラメータ空間上の曲線ａ，曲線ｂ，曲線ｃとして表現される。つまり、パラメータ空間上で、曲線ａ、曲線ｂ、曲線ｃの交点（ρ₀,θ₀）が、特徴点Ａ，Ｂ，Ｃを通過する直線Ｌとして検出される。 When the groups of straight lines passing through the feature points A, B, and C are each mapped to the parameter space, they appear as curves a, b, and c in the parameter space of FIG. 10(B). The intersection (ρ₀, θ₀) of curves a, b, and c in the parameter space is therefore detected as the straight line L passing through the feature points A, B, and C.

以上のように「Ｅｄｇｅ(ｘ，ｙ|ｔ)＝１」となるエッジ点に対して、ハフ変換を適用し、パラメータ空間上で所定の閾値以上の曲線が交差し、かつ交差数の多い順にＮＬ個の極座標（ρｓ，θｓ）（ｓ＝１，・・・，ＮＬ）を直線Ｌｓとして抽出し、その極座標（ρｓ，θｓ）を記述したデータを直線情報Ｌ（ｔ）とする。なお、ＮＬの範囲は０≦ＮＬ≦ＮＬ_ｍａｘである。ここで、ＮＬ_ｍａｘは抽出する直線数の上限値を表す所定の定数である。また、上記直線抽出の条件を満たさず、直線が検出されない場合は、ＮＬ＝０となる。 As described above, the Hough transform is applied to the edge points for which Edge(x, y | t) = 1. Among the points in the parameter space at which at least a predetermined threshold number of curves intersect, NL polar coordinates (ρs, θs) (s = 1, ..., NL) are extracted as straight lines Ls in descending order of the number of intersecting curves, and the data describing these polar coordinates (ρs, θs) is the straight line information L(t). The range of NL is 0 ≤ NL ≤ NL_max, where NL_max is a predetermined constant representing the upper limit of the number of straight lines to be extracted. If the above extraction condition is not satisfied and no straight line is detected, NL = 0.
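The voting procedure can be sketched as follows. The sketch uses the standard normal form ρ = x·cos θ + y·sin θ, which matches the definitions of ρ and θ given above, and keeps only cells with ρ ≥ 0. The accumulator resolution (1 pixel in ρ, n_theta bins in θ), the parameter names, and the default values are assumptions made for illustration, and no merging of neighbouring accumulator peaks is attempted.

```python
import numpy as np

def hough_lines(edge, n_theta=360, vote_threshold=3, nl_max=10):
    """Return up to nl_max lines (rho, theta), strongest first, from a 0/1 edge map."""
    ys, xs = np.nonzero(edge)
    thetas = np.arange(n_theta) * (2.0 * np.pi / n_theta)   # 0 <= theta < 2*pi
    rho_max = int(np.ceil(np.hypot(*edge.shape)))
    acc = np.zeros((rho_max + 1, n_theta), dtype=int)
    theta_idx = np.arange(n_theta)
    for x, y in zip(xs, ys):
        # Each edge point traces one curve in (rho, theta) space.
        rho = np.rint(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        valid = rho >= 0                                     # keep rho >= 0 only
        acc[rho[valid], theta_idx[valid]] += 1
    # Cells crossed by at least vote_threshold curves, sorted by vote count.
    cand = np.argwhere(acc >= vote_threshold)
    votes = acc[cand[:, 0], cand[:, 1]]
    order = np.argsort(-votes, kind="stable")
    return [(float(r), float(thetas[t])) for r, t in cand[order][:nl_max]]
```

An empty result corresponds to the case NL = 0 in the text.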

再び図８に戻って、図６の消失点同定部２１３は、入力された直線情報Ｌ（ｔ）から直線数ＮＬを取得し、直線数ＮＬと所定の閾値と大小関係を比較し、ステップＳ３３４〜ステップＳ３３５の処理によって消失点の位置推定を行うか否かを決定する（ステップＳ３３３）。直線数ＮＬが閾値ＴｈＬ（≧２）より小さい場合（または以下の場合）（ステップＳ３３３においてＮｏ）、消失点の推定に十分な幾何的な奥行手掛かりが無いと判定し、ステップＳ３３６へ進む。また、直線数ＮＬが閾値ＴｈＬ（≧２）以上の場合（または大きい場合）（ステップＳ３３３においてＹｅｓ）、消失点同定部２１３は、消失点の推定に十分な幾何的な奥行手掛かりがあると判定し、入力された直線情報Ｌ（ｔ）から、直線を表す角度θに関して式（６）、式（７）の条件を満たす直線Ｌｉ(ｉ=1,・・・, ＮＬ)と直線Ｌｊ (ｊ=1,・・・, ＮＬ)を選び、その交点Ｐ_ｉｊ(ｉ≠ｊ）を式（８）の行列演算によって算出し、交点情報を取得する（ステップＳ３３４）。なお、直線の交点を求める際に、一度選んだ直線Ｌｉと直線Ｌｊ同士の重複演算はしないものとする。なお、式（６）の条件は選んだ二直線が平行でないことを表し、式（７）の条件は水平方向（｜θ−π｜≒π／２）近傍、及び垂直方向（｜θ−π｜≒０、または｜θ−π｜≒π）近傍の直線でないことを表す。 Returning to FIG. 8, the vanishing point identifying unit 213 in FIG. 6 obtains the number of straight lines NL from the input straight line information L(t), compares NL with a predetermined threshold, and decides whether to estimate the vanishing point position by the processing of steps S334 to S335 (step S333). When NL is less than the threshold ThL (≥ 2) (or less than or equal to it) (No in step S333), the unit determines that there are not enough geometric depth cues to estimate a vanishing point, and the process proceeds to step S336. When NL is greater than or equal to ThL (or greater than it) (Yes in step S333), the vanishing point identifying unit 213 determines that there are enough geometric depth cues, selects from L(t) pairs of straight lines Li (i = 1, ..., NL) and Lj (j = 1, ..., NL) whose angles θ satisfy the conditions of Expressions (6) and (7), calculates their intersection P_ij (i ≠ j) by the matrix operation of Expression (8), and acquires the intersection information (step S334). When the intersections are computed, a pair of straight lines Li and Lj that has already been selected is not processed again. The condition of Expression (6) states that the two selected straight lines are not parallel, and the condition of Expression (7) states that neither is a straight line near the horizontal direction (|θ − π| ≈ π/2) or near the vertical direction (|θ − π| ≈ 0 or |θ − π| ≈ π).
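As an illustration of the intersection step, two lines in the normal form ρ = x·cos θ + y·sin θ meet where a 2×2 linear system is satisfied. The sketch below solves that system directly; it is one plausible reading of the "matrix operation" mentioned in the text, not a reproduction of Expression (8). The function name and the eps tolerance are hypothetical, and the angle conditions of Expressions (6) and (7) are reduced here to a simple parallelism check.

```python
import numpy as np

def intersect(line_i, line_j, eps=1e-9):
    """Intersection (x, y) of two lines given as (rho, theta), or None if parallel."""
    (rho_i, th_i), (rho_j, th_j) = line_i, line_j
    # Row k encodes x*cos(theta_k) + y*sin(theta_k) = rho_k.
    a = np.array([[np.cos(th_i), np.sin(th_i)],
                  [np.cos(th_j), np.sin(th_j)]])
    if abs(np.linalg.det(a)) < eps:      # (nearly) parallel lines never meet
        return None
    x, y = np.linalg.solve(a, np.array([rho_i, rho_j], dtype=float))
    return float(x), float(y)
```

Iterating over index pairs with i < j visits each pair of lines once, which matches the note that a pair already selected is not processed again.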

続いて、消失点同定部２１３は、取得した直線の交点Ｐ_ｉｊの分布モデルを、式（９）に示すＫｃ個のガウス分布の混合モデルＧＭＭ（ＧａｕｓｓｉａｎＭｉｘｔｕｒｅＭｏｄｅｌ）を用いて表されると仮定し、ＥＭ（Ｅｘｐｅｃｔａｔｉｏｎ−Ｍａｘｉｍｉｚａｔｉｏｎ）アルゴリズムによって、分布モデルのパラメータ（ｗｉ，μｉ,Σｉ）（ｉ＝１，・・・，Ｋｃ）を取得し、消失点の位置を決定する（ステップＳ３３５）。 Subsequently, the vanishing point identifying unit 213 assumes that the distribution model of the acquired intersection point P _ij of the straight line is expressed using a mixed model GMM (Gaussian Mixture Model) of Kc Gaussian distributions shown in Expression (9). Then, the parameters (wi, μi, Σi) (i = 1,..., Kc) of the distribution model are acquired by the EM (Expectation-Maximization) algorithm, and the position of the vanishing point is determined (step S335).

なお、式（９）において、Ｐ（ｘ）は、ベクトルｘ（交点Ｐｉｊの座標）が出現する確率を表す。Ｋｃはクラス数（ガウス分布の個数）を表し、ｗｉはクラスｉのガウス分布の重み係数を表し、重み係数の総和は１となる。また、μｉはクラスｉの平均ベクトル（クラスｉの重心座標）を表し、Σｉはクラスｉの共分散行列を表し、Ｄはベクトルｘの次元数を表す。式（９）中のＮ（ｘ｜μｉ,Σｉ）は、クラスｉのガウス分布（正規分布）を表し、平均ベクトルμｉ、共分散行列Σｉを用いて表現される。つまり、消失点同定部２１３は、重み係数ｗｉが大きい上位Ｎ（≧１）クラスの分布の平均ベクトルμｉ（重心座標）を、消失点位置と定める。以降、簡単化のため、消失点の数をＮ＝１として説明するが、これに限定されるものではない。 In Equation (9), P (x) represents the probability that the vector x (coordinates of the intersection Pij) will appear. Kc represents the number of classes (number of Gaussian distributions), wi represents a weighting coefficient of class i Gaussian distribution, and the sum of the weighting coefficients is 1. Μi represents a class i average vector (class i barycentric coordinates), Σi represents a class i covariance matrix, and D represents the number of dimensions of the vector x. N (x | μi, Σi) in Expression (9) represents a Gaussian distribution (normal distribution) of class i, and is expressed using an average vector μi and a covariance matrix Σi. That is, the vanishing point identifying unit 213 determines the average vector μi (center of gravity coordinates) of the distribution of the top N (≧ 1) classes having a large weighting coefficient wi as the vanishing point position. Hereinafter, for the sake of simplicity, the number of vanishing points will be described as N = 1, but the present invention is not limited to this.
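Expression (9) itself is not reproduced in this text. From the symbol definitions above (K_c classes, weights w_i summing to 1, mean vectors μ_i, covariance matrices Σ_i, dimension D), the standard Gaussian mixture form would read as follows; this is offered as a reconstruction under that assumption, not as the original expression.

```latex
P(x) = \sum_{i=1}^{K_c} w_i \, N(x \mid \mu_i, \Sigma_i),
\qquad \sum_{i=1}^{K_c} w_i = 1,
\\[4pt]
N(x \mid \mu_i, \Sigma_i) =
\frac{1}{(2\pi)^{D/2} \, \lvert \Sigma_i \rvert^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_i)^{\mathsf T} \Sigma_i^{-1} (x - \mu_i) \right)
```

Here x is the coordinate vector of an intersection P_ij, so D = 2, and with N = 1 the vanishing point is the mean μ_i of the class with the largest weight w_i.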

続いて、消失点同定部２１３は、ステップＳ３３３またはステップＳ３３５の結果に基づいて、図９（Ｃ）に示すように消失点情報ＶＰ（ｔ）を設定する（ステップＳ３３６）。なお、消失点情報ＶＰ（ｔ）は、例えば、表１のデータとして表現される。 Subsequently, the vanishing point identifying unit 213 sets vanishing point information VP (t) as shown in FIG. 9C based on the result of Step S333 or Step S335 (Step S336). The vanishing point information VP (t) is expressed as data in Table 1, for example.

表１において、消失点情報ＶＰ（ｔ）は、時刻ｔ（又は、画像のフレームに付した番号（フレーム番号）でもよい）を表す「vp_time」、検出した消失点の数を表す「vp_num」、及び検出したn個の消失点の位置「vp_pos[n]」を表すリストによって示される。 In Table 1, vanishing point information VP (t) is “vp_time” indicating time t (or a number (frame number) assigned to a frame of an image), “vp_num” indicating the number of detected vanishing points, And a list representing the position “vp_pos [n]” of the n vanishing points detected.
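Table 1 is not reproduced in this text, but the three fields named above suggest a record like the following. The field names come from the description; the class name, the Python types, and the default values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class VanishingPointInfo:
    """Vanishing point information VP(t) with the fields named in the text."""
    vp_time: int                       # time t, or a frame number
    vp_num: int = 0                    # number of detected vanishing points
    vp_pos: List[Tuple[float, float]] = field(default_factory=list)  # (x, y) per point
```

A frame without a vanishing point is then simply `VanishingPointInfo(vp_time=t)`, whose vp_num is 0.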

再び図７のステップＳ３１に戻って、シーンチェンジが無く（「Ｓ（ｔ）＝０」）、かつ、消失点情報ＶＰ（ｔ−１）が、前フレームに消失点が有ることを示す場合（消失点情報ＶＰ（ｔ−１）の「vp_num>0」）（ステップＳ３１においてＮｏ）、図６の切替部２０１は画像の入力先を、図６の切替部２０２は消失点情報の出力元を、フレーム間消失点推定部２２へそれぞれ切り替え（ステップＳ３４）、その後、フレーム間消失点推定部２２は、入力された処理対象画像Ｆ（ｔ）とバッファ２０３で記憶した１つ前の画像Ｆ（ｔ−１）から画像間の特徴点の対応関係を求め、その対応関係と、一つ前の消失点情報ＶＰ（ｔ−１）より処理対象画像Ｆ（ｔ）における消失点位置を推定し、その結果（消失点情報ＶＰ（ｔ））を出力する（ステップＳ３５）。すなわち、消失点推定部２０は、比較対象画像Ｆ（ｔ−１）の消失点の位置を含む消失点情報ＶＰ（ｔ−１）を記憶する記憶手段（図６のバッファ２０４）を備え、前フレームに消失点が有るか否かは、この記憶手段に前フレームの消失点情報ＶＰ（ｔ−１）が記憶されており、この消失点情報ＶＰ（ｔ−１）が「vp_num>0」であるか否かで判定される。 Returning to step S31 in FIG. 7, when there is no scene change (S(t) = 0) and the vanishing point information VP(t−1) indicates that the previous frame has a vanishing point (VP(t−1) has vp_num > 0) (No in step S31), the switching unit 201 in FIG. 6 switches the image input destination, and the switching unit 202 in FIG. 6 switches the output source of the vanishing point information, to the inter-frame vanishing point estimation unit 22 (step S34). The inter-frame vanishing point estimation unit 22 then obtains the correspondence of feature points between the input processing target image F(t) and the previous image F(t−1) stored in the buffer 203, estimates the vanishing point position in F(t) from this correspondence and the previous vanishing point information VP(t−1), and outputs the result as vanishing point information VP(t) (step S35). That is, the vanishing point estimation unit 20 includes storage means (the buffer 204 in FIG. 6) that stores the vanishing point information VP(t−1), which includes the position of the vanishing point of the comparison target image F(t−1). Whether the previous frame has a vanishing point is determined by whether VP(t−1) is stored in this storage means and has vp_num > 0.

（フレーム間消失点推定部２２について）
ここで、フレーム間消失点推定部２２の詳細について説明する。図１１は、フレーム間消失点推定部２２の動作例を説明するためのフロー図である。また、図１２は、図１１のフローに対応した画像例を示す図である。図１１のステップＳ３５１において、図６の特徴点検出部２２１は、図１２（Ａ）に示すように、入力された処理対象画像Ｆ（ｔ）と一つ前の画像Ｆ（ｔ−１）との画像間の対応関係を求めるために用いるＮＫ個の特徴点Ｋｓ（ｓ＝１，・・・，ＮＫ）を検出し、その特徴点Ｋｓの座標（ｘ_Ks,t，ｙ_Ks,t）を記述した特徴点情報Ｋ（ｔ）を図６の対応点算出部２２２へ出力する（ステップＳ３５１）。 (About the inter-frame vanishing point estimation unit 22)
Here, the inter-frame vanishing point estimation unit 22 is described in detail. FIG. 11 is a flowchart for explaining an operation example of the inter-frame vanishing point estimation unit 22. FIG. 12 shows image examples corresponding to the flow of FIG. 11. In step S351 in FIG. 11, the feature point detection unit 221 in FIG. 6 detects, as shown in FIG. 12(A), NK feature points Ks (s = 1, ..., NK) that are used to obtain the correspondence between the input processing target image F(t) and the previous image F(t−1), and outputs feature point information K(t), which describes the coordinates (x_{Ks,t}, y_{Ks,t}) of each feature point Ks, to the corresponding point calculation unit 222 in FIG. 6 (step S351).

なお、特徴点とは、画素間の色や輝度の変化等に基づいて被写体のエッジの一部や頂点として抽出される点である。例えば、画素（ｘ，ｙ）を中心とした局所領域Ｓの範囲内のｘ方向、ｙ方向の輝度の勾配ベクトルＧｉ（ｘ，ｙ）(ｉ＝ｘ，ｙ)を用いて表される二次モーメント行列Ａ（式（１０））の第一固有値λ１、及び第二固有値λ２を求め、式（１１）に示す条件を満たす画素（ｘ，ｙ）を特徴点として検出する。 A feature point is a point extracted as a part or vertex of a subject's edge based on a change in color or luminance between pixels. For example, a secondary expressed using a gradient vector Gi (x, y) (i = x, y) of luminance in the x and y directions within the range of the local region S with the pixel (x, y) as the center. A first eigenvalue λ1 and a second eigenvalue λ2 of the moment matrix A (Expression (10)) are obtained, and a pixel (x, y) that satisfies the condition shown in Expression (11) is detected as a feature point.

つまり、二次モーメント行列Ａの第一固有値λ１、及び第二固有値λ２のうち小さい方の固有値が所定の閾値λｔｈより大きい（または以上）場合に特徴点とするものである(例えば、非特許文献１を参照)。なお、式（１０）において係数ｗ（ｕ，ｖ）は、画素（ｘ，ｙ）からｘ方向にｕ，ｙ方向にｖだけ離れた画素（ｘ＋ｕ，ｙ＋ｖ）に関する重み係数を表し、例えば、式（１２）の条件を満たすように定めた、局所領域Ｓの範囲内の２次ガウス分布の値を正規化した値を用いる。 That is, a pixel is treated as a feature point when the smaller of the first eigenvalue λ1 and the second eigenvalue λ2 of the second moment matrix A is greater than (or greater than or equal to) a predetermined threshold λth (see, for example, Non-Patent Document 1). In Expression (10), the coefficient w(u, v) is the weighting coefficient for the pixel (x + u, y + v), which is offset from the pixel (x, y) by u in the x direction and by v in the y direction. For example, values of a two-dimensional Gaussian distribution within the local region S, normalized so as to satisfy the condition of Expression (12), are used.
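The minimum-eigenvalue criterion can be sketched for a single local region S. Expressions (10) to (12) are not reproduced in this text, so the sketch assumes the usual weighted second moment matrix built from the luminance gradients and uses uniform weights that sum to 1 when none are supplied (the text mentions normalized Gaussian weights instead). The function name is hypothetical.

```python
import numpy as np

def min_eigenvalue(gx, gy, weights=None):
    """Smaller eigenvalue of the second moment matrix A over one local region S.

    gx, gy  -- luminance gradients in x and y for every pixel of the region
    weights -- w(u, v); assumed to sum to 1 (uniform if omitted)
    """
    gx = np.asarray(gx, dtype=float)
    gy = np.asarray(gy, dtype=float)
    if weights is None:
        weights = np.full(gx.shape, 1.0 / gx.size)
    a = np.array([[np.sum(weights * gx * gx), np.sum(weights * gx * gy)],
                  [np.sum(weights * gx * gy), np.sum(weights * gy * gy)]])
    # eigvalsh returns eigenvalues in ascending order for a symmetric matrix.
    return float(np.linalg.eigvalsh(a)[0])
```

A pixel would then be kept as a feature point when this value exceeds λth. A straight edge (gradient in one direction only) gives a value of 0, while a corner gives a clearly positive value.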

図１１のステップＳ３５２に進んで、図６の対応点算出部２２２は、図１２（Ｂ）に示すように、入力された処理対象画像Ｆ（ｔ）と、バッファ２０３より読み出した一つ前の画像Ｆ（ｔ−１）と、ステップＳ３５１で取得した処理対象画像Ｆ（ｔ）の特徴点情報Ｋ（ｔ）とに基づいて、処理対象画像Ｆ（ｔ）の各特徴点Ｋｓ（ｓ＝１，・・・，ＮＫ）が一つ前の画像Ｆ（ｔ−１）上にある位置（x_{Ks,t -1}，y_{Ks,t -1}）をオプティカルフローにより算出し、その特徴点Ｋｓの時刻ｔ，時刻ｔ−１における位置を記述した対応点情報Ｑ（ｔ，ｔ−１）を図６の変換行列算出部２２３へ出力する（ステップＳ３５２）。 Proceeding to step S352 in FIG. 11, the corresponding point calculation unit 222 in FIG. 6 calculates by optical flow, as shown in FIG. 12(B), the position (x_{Ks,t−1}, y_{Ks,t−1}) on the previous image F(t−1) of each feature point Ks (s = 1, ..., NK) of the processing target image F(t). The calculation is based on the input processing target image F(t), the previous image F(t−1) read from the buffer 203, and the feature point information K(t) of F(t) acquired in step S351. The unit then outputs corresponding point information Q(t, t−1), which describes the positions of each feature point Ks at time t and time t−1, to the transformation matrix calculation unit 223 in FIG. 6 (step S352).

なお、処理対象画像Ｆ（ｔ）の各特徴点Ｋｓが一つ前の画像Ｆ（ｔ−１）上にある位置（x_Ks,t-1，y_Ks,t-1）は、例えば、式（１３）に示す勾配法によるオプティカルフローの拘束条件を（x_Ks,t-1，y_Ks,t-1）について解くことで取得できる（例えば、非特許文献２を参照）。 The position (x_{Ks,t−1}, y_{Ks,t−1}) on the previous image F(t−1) of each feature point Ks of the processing target image F(t) can be obtained, for example, by solving the gradient-method optical flow constraint shown in Expression (13) for (x_{Ks,t−1}, y_{Ks,t−1}) (see, for example, Non-Patent Document 2).

ここで、式（１３）において、Ｇｉ（ｘ，ｙ｜ｔ）（ｉ＝ｘ，ｙ，ｔ）は画像Ｆ（ｔ）の輝度に関するｘ方向、ｙ方向、ｔ方向（時間方向）の勾配ベクトルを表し、Ｓは特徴点Ｋｓを中心とする所定サイズの局所領域を表す。 Here, in Expression (13), Gi (x, y | t) (i = x, y, t) is a gradient vector in the x direction, y direction, and t direction (time direction) related to the luminance of the image F (t). S represents a local area of a predetermined size centered on the feature point Ks.
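Expression (13) is not reproduced in this text. As a rough illustration of a gradient-method constraint, the sketch below solves the common brightness-constancy form G_x·u + G_y·v + G_t = 0 in the least-squares sense over one local region S. The sign convention, the function name, and the way the displacement (u, v) is turned into the position at time t−1 are assumptions, not details taken from the source.

```python
import numpy as np

def flow_in_region(gx, gy, gt):
    """Least-squares displacement (u, v) with gx*u + gy*v + gt = 0 over region S."""
    gx = np.asarray(gx, dtype=float)
    gy = np.asarray(gy, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # One equation per pixel of the region: [gx gy] @ [u v]^T = -gt.
    a = np.stack([gx.ravel(), gy.ravel()], axis=1)
    b = -gt.ravel()
    (u, v), *_ = np.linalg.lstsq(a, b, rcond=None)
    return float(u), float(v)
```

Under this convention, a feature point at (x, y) in F(t) with displacement (u, v) from t−1 to t would lie at about (x − u, y − v) in F(t−1).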

図１１のステップＳ３５３に進んで、図６の変換行列算出部２２３は、ステップＳ３５２で取得した対応点情報Ｑ（ｔ，ｔ−１）から、特徴点Ｋｓ（ｓ＝１，・・・，ＮＫ）を一つ前の画像Ｆ（ｔ−１）上の位置から、処理対象画像Ｆ（ｔ）上の位置へ射影する変換行列Ｈを算出し、その変換行列Ｈを記述した情報を図６の消失点位置算出部２２４へ出力する（ステップＳ３５３）。なお、２枚の画像間（Ｆ（ｔ），Ｆ（ｔ−１））の対応関係は、変換行列Ｈを用いて式（１４）で表すことができる（例えば、非特許文献３を参照）。この式（１４）において、記号「〜」は同値関係を表し、定数倍の違いを許して等しいことを意味する。 Proceeding to step S353 in FIG. 11, the transformation matrix calculation unit 223 in FIG. 6 calculates, from the corresponding point information Q(t, t−1) acquired in step S352, a transformation matrix H that projects the feature points Ks (s = 1, ..., NK) from their positions on the previous image F(t−1) to their positions on the processing target image F(t), and outputs information describing H to the vanishing point position calculation unit 224 in FIG. 6 (step S353). The correspondence between the two images F(t) and F(t−1) can be expressed by Expression (14) using the transformation matrix H (see, for example, Non-Patent Document 3). In Expression (14), the symbol "~" denotes an equivalence relation, meaning equality up to a constant factor.

また、変換行列Ｈは、一般的な変換を表現することができるため、射影変換と呼ばれる。ここで、画像間の対応関係を平行移動として表現できると仮定すると、式（１４）は、式（１５）として表現される。 The transformation matrix H is called projective transformation because it can express general transformation. Here, if it is assumed that the correspondence between images can be expressed as parallel movement, Expression (14) is expressed as Expression (15).

式（１５）中の係数ｔｘ、ｔｙはそれぞれｘ方向、ｙ方向への移動量を表す。また、画像間の対応関係を平行移動、回転、拡大・縮小を含めたアフィン変換として表現できると仮定すると、式（１４）は、式（１６）として表現される。 Coefficients tx and ty in equation (15) represent the amounts of movement in the x and y directions, respectively. Assuming that the correspondence between images can be expressed as affine transformation including translation, rotation, and enlargement / reduction, Expression (14) is expressed as Expression (16).

式（１６）中の係数ａ，ｂ，ｃ，ｄは拡大・縮小、及び回転を表し、係数ｔｘ、ｔｙは式（１５）と同様である。なお、式（１４）、式（１５）、式（１６）における変換行列Ｈの各係数ｈ_ｉｊ（ｉ，ｊ＝１，２，３）は、各変換モデルの拘束条件と対応点情報Ｑ（ｔ，ｔ−１）から導かれる連立方程式を最小二乗法により解くことで算出する。なお、十分な対応点数が無く、変換行列Ｈを算出できない場合は、所定の変換行列Ｈ_０を用いる。 Coefficients a, b, c, and d in Expression (16) represent enlargement / reduction and rotation, and coefficients tx and ty are the same as those in Expression (15). The coefficients h _ij (i, j = 1, 2, 3) of the transformation matrix H in the equations (14), (15), and (16) are the constraint conditions and corresponding point information Q ( It is calculated by solving simultaneous equations derived from t, t-1) by the method of least squares. If there is not a sufficient number of corresponding points and the conversion matrix H cannot be calculated, a predetermined conversion matrix H ₀ is used.
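For the affine case, the least-squares fit can be sketched with NumPy. The matrix layout [[a, b, tx], [c, d, ty], [0, 0, 1]] follows the coefficient names in the text, but Expression (16) itself is not reproduced here, so the layout, the function name, and the minimum of three correspondences are assumptions.

```python
import numpy as np

def fit_affine(prev_pts, cur_pts):
    """Fit H so that H @ (x, y, 1) at time t-1 approximates the point at time t.

    Returns a 3x3 matrix [[a, b, tx], [c, d, ty], [0, 0, 1]], or None when
    there are too few correspondences to determine the six coefficients.
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    cur_pts = np.asarray(cur_pts, dtype=float)
    if len(prev_pts) < 3:
        return None                       # caller falls back to a preset H_0
    m = np.hstack([prev_pts, np.ones((len(prev_pts), 1))])   # rows (x, y, 1)
    params, *_ = np.linalg.lstsq(m, cur_pts, rcond=None)     # shape (3, 2)
    h = np.eye(3)
    h[:2, :] = params.T
    return h
```

When the function returns None, the caller would substitute the predetermined matrix H_0 mentioned above. Its value is not given in the text.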

図１１のステップＳ３５４に進んで、図６の消失点位置算出部２２４は、「時刻ｔにおける消失点の位置は、一つ前の画像Ｆ（ｔ−１）上の消失点位置を、図６の変換行列算出部２２３で算出した変換行列Ｈを用いて、処理対象画像Ｆ（ｔ）上へ射影した位置にある」と仮定して、時刻ｔの消失点位置を算出し（ステップ３５４）、その結果に基づいて消失点情報ＶＰ（ｔ）を設定する（ステップＳ３５５）。画像Ｆ（ｔ−１）上の消失点を変換行列Ｈにより画像Ｆ（ｔ）上に射影したときの画像例を図１２（Ｃ）に示す。 Proceeding to step S354 in FIG. 11, the vanishing point position calculation unit 224 in FIG. 6 calculates the vanishing point position at time t on the assumption that it is the position obtained by projecting the vanishing point position on the previous image F(t−1) onto the processing target image F(t) with the transformation matrix H calculated by the transformation matrix calculation unit 223 in FIG. 6 (step S354), and sets the vanishing point information VP(t) based on the result (step S355). FIG. 12(C) shows an image example in which the vanishing point on the image F(t−1) is projected onto the image F(t) by the transformation matrix H.
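Projecting the previous vanishing point with H amounts to one matrix product in homogeneous coordinates followed by division by the third component, which is how the "equal up to a constant factor" relation is resolved. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def project_point(h, point):
    """Map (x, y) on F(t-1) to F(t) with a 3x3 transformation matrix H."""
    x, y = point
    xh, yh, w = np.asarray(h, dtype=float) @ np.array([x, y, 1.0])
    # Divide by the third homogeneous coordinate (w is 1 for affine H).
    return float(xh / w), float(yh / w)
```

For a pure translation by (4, −2), the point (10, 20) maps to (14, 18). A general projective H can give w = 0 for some points, a case this sketch does not handle.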

再び図７のステップＳ３６に戻って、図６のバッファ２０３は、１つ前の画像Ｆ（ｔ−１）を削除し、入力された処理対象画像Ｆ（ｔ）を記憶する。また、図６のバッファ２０４は、１つ前の消失点情報ＶＰ（ｔ−１）を削除し、フレーム内消失点推定部２１、または、フレーム間消失点推定部２２より入力された消失点情報ＶＰ（ｔ）を記憶して、処理対象画像Ｆ（ｔ）における消失点推定の処理を終了する（ステップＳ３６）。 Returning to step S36 in FIG. 7 again, the buffer 203 in FIG. 6 deletes the previous image F (t−1) and stores the input processing target image F (t). Also, the buffer 204 in FIG. 6 deletes the previous vanishing point information VP (t−1), and the vanishing point information input from the intra-frame vanishing point estimation unit 21 or the inter-frame vanishing point estimation unit 22. VP (t) is stored, and the vanishing point estimation process for the processing target image F (t) is terminated (step S36).

以上、本実施形態の消失点推定部２０によれば、図１３に示すように、同一シーン（空間方向、時間方向に相関のある時系列画像群）において、先頭フレームＦ（ｔ_０）から同一シーン内で最初に消失点が検出されるフレームＦ（ｔ_０＋ｋ−１）までは、フレーム内消失点推定手段（フレーム内消失点推定部２１）により消失点を推定し、同一シーン内で最初に消失点が検出されるフレームＦ（ｔ_０＋ｋ−１）の次フレームＦ（ｔ_０＋ｋ）から同一シーン内の最終フレームＦ（ｔ_０＋Ｎ）までは、フレーム間消失点推定手段（フレーム間消失点推定部２２）により消失点を推定するため、フレーム単位にフレーム内消失点推定手段により消失点を推定する場合と比べて、カメラワークにロバストでかつ、消失点の揺れを抑制し安定した消失点の推定が可能となる。 As described above, according to the vanishing point estimation unit 20 of the present embodiment, as shown in FIG. 13, in the same scene (a time series image group correlated in the spatial direction and the time direction), the same from the first frame F (t ₀ ). Until the frame F (t ₀ + k−1) in which the vanishing point is first detected in the scene, the vanishing point is estimated by the intra-frame vanishing point estimating means (intra-frame vanishing point estimating unit 21). From the next frame F (t ₀ + k) of the frame F (t ₀ + k−1) in which the vanishing point is detected to the last frame F (t ₀ + N) in the same scene, the inter-frame vanishing point estimation means (inter-frame Since the vanishing point is estimated by the vanishing point estimation unit 22), the vanishing point is more robust to the camera work than the case where the vanishing point is estimated by the intra-frame vanishing point estimation unit, and the vanishing point fluctuation is suppressed and stabilized. The vanishing point can be estimated.

（奥行モデル生成部３０について）
再び図２に戻って、図１の奥行モデル生成部３０は、本発明の奥行モデル生成手段に相当し、消失点推定部２０により消失点が推定できたか否かに基づいて異なる奥行モデルを生成する。つまり、奥行モデル生成部３０は、消失点情報ＶＰ（ｔ）に基づいて、奥行モデルの作成手段（消失点位置から奥行モデルを作成する第１の奥行モデル作成手段、人の視覚特性に基づいた画像内の誘目性を表す顕著度から奥行モデルを作成する第２の奥行モデル作成手段）を選択し、選択した奥行モデル作成手段により、処理対象画像Ｆ（ｔ）における各画素の奥行値を設定し、各画素の奥行値を表す奥行モデルＤ（ｔ）を視点画像生成部４０へ出力する（図２のステップＳ１４）。 (About the depth model generation unit 30)
Returning to FIG. 2, the depth model generation unit 30 in FIG. 1 corresponds to the depth model generation means of the present invention and generates different depth models depending on whether the vanishing point estimation unit 20 was able to estimate a vanishing point. That is, based on the vanishing point information VP(t), the depth model generation unit 30 selects a depth model creation means: either a first depth model creation means, which creates a depth model from the vanishing point position, or a second depth model creation means, which creates a depth model from saliency, a measure based on human visual characteristics of how strongly each part of the image attracts attention. It then sets the depth value of each pixel in the processing target image F(t) with the selected means and outputs a depth model D(t) representing the depth value of each pixel to the viewpoint image generation unit 40 (step S14 in FIG. 2).

続いて、本実施形態における奥行モデル生成部３０について詳細に説明する。図１４に示すように、奥行モデル生成部３０は、切替部３０１、切替部３０２、領域分割部３０３、距離算出部３０４、顕著度算出部３０５、および奥行値設定部３０６で構成されている。図１５は、奥行モデル生成部３０の動作例を説明するためのフロー図である。 Next, the depth model generation unit 30 in the present embodiment will be described in detail. As illustrated in FIG. 14, the depth model generation unit 30 includes a switching unit 301, a switching unit 302, an area dividing unit 303, a distance calculating unit 304, a saliency calculating unit 305, and a depth value setting unit 306. FIG. 15 is a flowchart for explaining an operation example of the depth model generation unit 30.

図１５において、図１４の奥行モデル生成部３０は、入力された消失点情報ＶＰ（ｔ）に基づいて奥行モデル作成手段を選択する（ステップＳ４１）。つまり、現フレームに消失点が有る場合（消失点情報ＶＰ（ｔ）の「vp_num>0」）（ステップＳ４１においてＹｅｓ）、図１４の切替部３０１は画像の入力先を領域分割部３０３へ、図１４の切替部３０２は奥行値設定部３０６へ入力するデータの出力元を距離算出部３０４へそれぞれ切り替え、消失点に基づく第１の奥行モデル作成手段が選択される（ステップＳ４２）。 In FIG. 15, the depth model generation unit 30 in FIG. 14 selects the depth model creation means based on the input vanishing point information VP (t) (step S41). That is, when there is a vanishing point in the current frame (“vp_num> 0” of vanishing point information VP (t)) (Yes in step S41), the switching unit 301 in FIG. 14 transfers the image input destination to the region dividing unit 303. The switching unit 302 in FIG. 14 switches the output source of the data input to the depth value setting unit 306 to the distance calculation unit 304, and the first depth model creation means based on the vanishing point is selected (step S42).

ステップＳ４３に進んで、図１４の距離算出部３０４は、消失点情報ＶＰ（ｔ）の消失点の座標と各画素の座標との距離Ｄｉｓｔ（ｘ，ｙ）を算出し、その結果を記述した距離情報Ｄｉｓｔ（ｔ）を奥行値設定部３０６へ出力する（ステップＳ４３）。具体的には、消失点の座標と各画素の座標との距離Ｄｉｓｔ（ｘ，ｙ）は、式（１７）、式（１８）、式（１９）のいずれかに基づいて算出される。なお、式（１７）、式（１８）、式（１９）中のΔｘ、Δｙはそれぞれ各画素と消失点とのｘ方向の距離、ｙ方向の距離を表す。 In step S43, the distance calculation unit 304 in FIG. 14 calculates the distance Dist (x, y) between the coordinates of the vanishing point of the vanishing point information VP (t) and the coordinates of each pixel, and describes the result. The distance information Dist (t) is output to the depth value setting unit 306 (step S43). Specifically, the distance Dist (x, y) between the coordinates of the vanishing point and the coordinates of each pixel is calculated based on any one of Expression (17), Expression (18), and Expression (19). In Expressions (17), (18), and (19), Δx and Δy represent the distance in the x direction and the distance in the y direction between each pixel and the vanishing point, respectively.
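Expressions (17) to (19) are not reproduced in this text, so the following sketch uses the Euclidean distance √(Δx² + Δy²) purely as one natural candidate. The function name is hypothetical, and the vanishing point may lie outside the image.

```python
import numpy as np

def distance_map(height, width, vp):
    """Dist(x, y): distance from every pixel to the vanishing point vp = (vx, vy)."""
    vx, vy = vp
    ys, xs = np.mgrid[0:height, 0:width]   # pixel coordinate grids
    dx = xs - vx                           # x-direction offset from the vanishing point
    dy = ys - vy                           # y-direction offset from the vanishing point
    return np.hypot(dx, dy)
```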

ここで、画面内に消失点ＶＰがある場合の、それぞれ式（１７）、式（１８）、式（１９）に基づく距離情報Ｄｉｓｔ（ｔ）の一例を図１６に示す。図１６（Ａ）は、画面内に消失点ＶＰがある一例を表す。図１６（Ｂ）のＢＤ１ａは式（１７）、図１６（Ｃ）のＢＤ１ｂは式（１８）、図１６（Ｄ）のＢＤ１ｃは式（１９）に基づく距離情報Ｄｉｓｔ（ｔ）を表している。また、画面外に消失点ＶＰがある場合の、それぞれ式（１７）、式（１８）、式（１９）に基づく距離情報Ｄｉｓｔ（ｔ）の一例を図１７に示す。図１７（Ａ）は、画面外に消失点ＶＰがある一例を表す。図１７（Ｂ）のＢＤ２ａは式（１７）、図１７（Ｃ）のＢＤ２ｂは式（１８）、図１７（Ｄ）のＢＤ２ｃは式（１９）に基づく距離情報Ｄｉｓｔ（ｔ）を表している。なお、図１６及び図１７において。白い部分が最も近く、黒くなるにつれて遠くなるものとする。 FIG. 16 shows examples of the distance information Dist(t) based on Expressions (17), (18), and (19) when the vanishing point VP is inside the screen. FIG. 16(A) shows an example in which the vanishing point VP is inside the screen. BD1a in FIG. 16(B), BD1b in FIG. 16(C), and BD1c in FIG. 16(D) represent the distance information Dist(t) based on Expressions (17), (18), and (19), respectively. FIG. 17 shows the corresponding examples when the vanishing point VP is outside the screen. FIG. 17(A) shows an example in which the vanishing point VP is outside the screen. BD2a in FIG. 17(B), BD2b in FIG. 17(C), and BD2c in FIG. 17(D) represent the distance information Dist(t) based on Expressions (17), (18), and (19), respectively. In FIGS. 16 and 17, white parts are nearest, and parts become farther away as they become darker.

ステップＳ４４に進んで、図１４の領域分割部３０３は、処理対象画像Ｆ（ｔ）を領域分割（クラスタリング）により、特徴量が類似する（特徴量の値が予め定めた範囲内となる）複数の画素の集合（領域；クラス）に分割する。例えば、領域分割部３０３は、特徴量空間でのクラスタリングにより画像を複数の領域へ分割する。特徴量空間によるクラスタリングとは、画像空間の各画素を特徴量空間（例えば、色、エッジ、動きベクトル）に写像し、その特徴量空間においてＫ-ｍｅａｎｓ法、Ｍｅａｎ-Ｓｈｉｆｔ法、又はＫ最近傍探索法（近似Ｋ最近傍探索法）などの手法により行うクラスタリングである。特徴量空間でのクラスタリング処理の終了後、各領域の代表値となる画素値（例えば平均値）により、そのクラス内の画素について、元の画像空間における画素値を置き換え、各領域に対して領域を識別するラベルを各領域内の全画素に付与し、その結果を記述した領域情報Ｒ（ｔ）を奥行値設定部３０６へ出力する（ステップＳ４４）。 Proceeding to step S44, the region dividing unit 303 in FIG. 14 divides the processing target image F(t) by region division (clustering) into sets of pixels (regions, or classes) whose feature amounts are similar, that is, whose feature values fall within a predetermined range. For example, the region dividing unit 303 divides the image into a plurality of regions by clustering in a feature space. Clustering in a feature space maps each pixel of the image space into a feature space (for example, color, edge, or motion vector) and clusters the pixels there with a method such as K-means, Mean-Shift, or K-nearest-neighbor search (or approximate K-nearest-neighbor search). After clustering in the feature space is finished, the pixel values in the original image space of the pixels in each class are replaced with a representative pixel value of that region (for example, the average value), a label identifying the region is assigned to all pixels in each region, and region information R(t) describing the result is output to the depth value setting unit 306 (step S44).
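As an illustration of clustering in a feature space, the sketch below runs a plain K-means on pixel colors only. The deterministic initialisation, the fixed iteration count, and the function name are assumptions; a practical implementation could add edge or motion features, as the text notes.

```python
import numpy as np

def kmeans_regions(image, k=2, iters=10):
    """Cluster pixels by color; return (label map, representative colors)."""
    feats = image.reshape(-1, image.shape[-1]).astype(float)
    # Assumed initialisation: k pixels spread evenly over the pixel list.
    idx = np.linspace(0, len(feats) - 1, k).astype(int)
    centers = feats[idx].copy()
    for _ in range(iters):
        # Assign every pixel to its nearest center in color space.
        dist = np.linalg.norm(feats[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members (skip empty classes).
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(axis=0)
    return labels.reshape(image.shape[:2]), centers
```

Indexing `centers[labels]` then gives the image in which every pixel is replaced by the representative value of its region, and `labels` plays the role of the region information R(t).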

Proceeding to step S45, the depth value setting unit 306 in FIG. 14 sets the depth value of each pixel based on the input distance information Dist(t) and region information R(t). Specifically, as shown in equation (20), the average of the distances Dist(x, y) of the pixels in each region indicated by the region information R(t) is scaled, and the value obtained by shifting the result by the reference depth value D_base(x, y) is set as the depth value D(x, y) of each pixel (step S45).

In equation (20), D_max is the upper limit of the depth value, D_min is the lower limit of the depth value, Dist_max is the maximum value of the distance information Dist(t), Dist_min is the minimum value of the distance information Dist(t), and D_base(x, y) is a predetermined constant for adjusting the reference value of the depth value of each pixel (the depth value taken as the farthest view). FIG. 18 shows an example of a depth model based on the vanishing point. In FIG. 18, image A is an example of the processing target image F(t); image B is an example of the region division result (region division information R(t)) of the processing target image F(t) obtained by the region dividing unit 303; image C is an example of the vanishing point VP of the processing target image F(t); image D is an example of the distance information Dist(t) of the processing target image F(t) obtained by the distance calculation unit 304; and image E is an example of the depth model obtained by the depth value setting unit 306 from the region division information R(t) of image B and the distance information Dist(t) of image D. In image E of FIG. 18, bright portions are near and dark portions are far.
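Equation (20) itself is not reproduced in this text, so the following is only a sketch of one form consistent with the description: the regional mean of Dist is normalized by Dist_min and Dist_max, scaled to the range D_max - D_min, and shifted by D_base. The function name, the flat-list layout, and the use of a scalar `d_base` are assumptions.

```python
def depth_from_distance(dist, labels, d_min, d_max, d_base):
    """Assumed form of step S45: per-region mean of Dist, scaled and shifted.

    dist:   distance value per pixel (flat list)
    labels: region label per pixel (flat list, same length)
    """
    lo, hi = min(dist), max(dist)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    sums, counts = {}, {}
    for d, r in zip(dist, labels):
        sums[r] = sums.get(r, 0.0) + d
        counts[r] = counts.get(r, 0) + 1
    mean = {r: sums[r] / counts[r] for r in sums}
    # Every pixel in a region receives the same depth value.
    return [(d_max - d_min) * (mean[r] - lo) / span + d_base for r in labels]

depth = depth_from_distance(
    dist=[0, 2, 8, 10], labels=[0, 0, 1, 1], d_min=0, d_max=100, d_base=10
)
```

Because the mean is taken per region, all pixels of a region share one depth value, which matches the uniform regions visible in image E of FIG. 18.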

Returning to step S41 in FIG. 15, when the current frame has no vanishing point ("vp_num=0" in the vanishing point information VP(t)) (No in step S41), the switching unit 301 in FIG. 14 switches the input destination of the image to the saliency calculation unit 305, and the switching unit 302 in FIG. 14 switches the source of the data input to the depth value setting unit 306 to the saliency calculation unit 305, so that the saliency-based depth model creation means is selected (step S46).

The saliency calculation unit 305 in FIG. 14 calculates, from the input processing target image F(t), the saliency M(t), which represents visual attractiveness within the image based on human visual characteristics (step S47). Examples of portions that people tend to notice are locations where the color difference between the pixel of interest and its surrounding pixels is large (local color difference), and locations where the color difference between the pixel of interest and the whole image, or between a local region containing the pixel of interest and its surrounding region, is large (global color difference). The color difference is a quantitative measure of the perceptual difference between colors. As the color space for evaluating the color difference, the CIELAB color space (also called the CIE 1976 L*a*b* space), which is a uniform color space, is used. Based on human visual characteristics, the saliency M(x, y) of each pixel is calculated by equation (21).

In equation (21), ΔE_local represents the local color difference, ΔE_global represents the global color difference, and the coefficients α and β are predetermined weighting coefficients. In other words, equation (21) expresses the saliency as a linear sum of the local color difference and the global color difference. The local color difference ΔE_local is calculated by equation (22), and the global color difference ΔE_global is calculated by equation (23). In equations (22) and (23), L* is the lightness index, a* is the red-green perceptual chromaticity, and b* is the yellow-blue perceptual chromaticity. The CIELUV color space (also called the CIE 1976 L*u*v* color space) may be used instead as the color space for evaluating the color difference. The coefficient w(u, v) in equation (22) is the same as in equation (12), so its description is omitted.
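Equations (21) to (23) are not reproduced in this text. The first line below follows directly from the stated linear sum. The other two lines are only one plausible reading: they assume a neighborhood N around the pixel of interest, the Euclidean (CIE76) color difference in CIELAB, and the image-wide mean color (written with a bar) as the global reference.

```latex
M(x,y) = \alpha\,\Delta E_{local}(x,y) + \beta\,\Delta E_{global}(x,y)

\Delta E_{local}(x,y) = \sum_{(u,v)\in N} w(u,v)\,
  \sqrt{\bigl(L^*(x,y)-L^*(x+u,y+v)\bigr)^2
      + \bigl(a^*(x,y)-a^*(x+u,y+v)\bigr)^2
      + \bigl(b^*(x,y)-b^*(x+u,y+v)\bigr)^2}

\Delta E_{global}(x,y) =
  \sqrt{\bigl(L^*(x,y)-\bar{L}^*\bigr)^2
      + \bigl(a^*(x,y)-\bar{a}^*\bigr)^2
      + \bigl(b^*(x,y)-\bar{b}^*\bigr)^2}
```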

Proceeding to step S48, the depth value setting unit 306 in FIG. 14 sets the depth value of each pixel by the calculation of equation (24) based on the input saliency M(t), and outputs the depth model D(t) describing the result (step S48).

In equation (24), D_max is the upper limit of the depth value, D_min is the lower limit of the depth value, M_max is the maximum value of the saliency M(t), M_min is the minimum value of the saliency M(t), and D_base(x, y) is a predetermined constant for adjusting the reference value of the depth value of each pixel (the depth value taken as the farthest view). That is, equation (24) scales the saliency M(x, y) of each pixel and sets the value shifted by the reference depth value D_base(x, y) as the depth value D(x, y) of each pixel. FIG. 19 shows an example of a depth model based on saliency. In FIG. 19, image A is an example of the processing target image F(t); image B is an example of the saliency M(t) of the processing target image F(t) obtained by the saliency calculation unit 305; image C is an example of the reference depth model (D_base); and image D is an example of the depth model created by the depth value setting unit 306 by combining the saliency M(t) of image B with the depth model (D_base) of image C.
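Equation (24) is not reproduced in this text. One form consistent with the variables defined above, given here only as an assumption, is a min-max normalization of the saliency followed by the shift:

```latex
D(x,y) = (D_{max} - D_{min})\,
         \frac{M(x,y) - M_{min}}{M_{max} - M_{min}}
         + D_{base}(x,y)
```

Under this reading, a pixel with M(x, y) = M_min stays on the reference surface D_base(x, y), and a pixel with M(x, y) = M_max is raised by the full range D_max - D_min.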

In image B of FIG. 19, bright portions (white) are portions that people tend to notice (high attractiveness), and dark portions (black) are portions that people tend not to notice (low attractiveness). In image D of FIG. 19, bright portions are near and dark portions are far. As image D of FIG. 19 shows, the saliency-based depth model creation means superimposes the scaled saliency on the reference depth surface (D_base) and emphasizes the relative depth difference between the region of interest and its surrounding region, thereby producing a pseudo sense of depth.

When the depth model is generated based on saliency, depth is set relative to the reference depth surface so that portions with high saliency (high attractiveness) are nearer and portions with low saliency (low attractiveness) are farther. This emphasizes the relative depth difference between the region of interest and its surrounding region, and a pseudo sense of depth can be perceived. In other words, when the saliency of the region of interest is higher than that of the surrounding region, its depth is set nearer than the surrounding region. When the saliency of the region of interest is equal to that of the surrounding region, the two are set to the same relative depth. When the saliency of the region of interest is lower than that of the surrounding region, its depth is set farther than the surrounding region.

As described above, when the vanishing point position can be estimated from geometric depth cues, the depth model generation unit 30 creates the depth model of the image based on the vanishing point, and can thereby create a depth model that emphasizes the sense of depth conveyed by the geometric depth cues. When the vanishing point position cannot be estimated from geometric depth cues, the depth model generation unit 30 creates the depth model of the image from the saliency, which represents visual attractiveness within the image based on human visual characteristics, and can thereby create a depth model that emphasizes the sense of depth of the portion that people attend to.

(About the viewpoint image generation unit 40)
Returning to FIG. 2, the viewpoint image generation unit 40 in FIG. 1 corresponds to the viewpoint image generation means of the present invention. Based on preset assumed viewing condition information, it calculates, from the depth value of each pixel represented by the depth model D(t), a disparity vector (shift amount) representing the displacement from each pixel on the reference image F(t) (input image; processing target image) to the corresponding pixel on the viewpoint image Fi(t) (i = l, r; Fr: right-eye presentation image, Fl: left-eye presentation image). It then generates each viewpoint image Fi(t) (i = l, r) from the pixels on the reference image F(t) and the corresponding calculated disparity vectors (step S15).

Here, the "assumed viewing condition information" is information for generating the stereoscopic image (left-eye presentation image and right-eye presentation image) presented to the viewer. It comprises the pixel pitch (distance between pixels) μ of the display that shows the stereoscopic image, the image size of the display, the distance f between the viewer and the display (assumed viewing distance), the parallax range (range of the disparity vector) representing the amount of depth of the stereoscopic image, and the baseline length t (the distance between the virtual right viewpoint Cr of the viewpoint image Fr(t) and the virtual left viewpoint Cl of the viewpoint image Fl(t)).

FIG. 20 is an overhead view of an example camera (viewpoint) arrangement for generating viewpoint images based on this assumed viewing condition information. The example of FIG. 20 assumes stereoscopic capture by the parallel method: the camera at the virtual right viewpoint Cr and the camera at the virtual left viewpoint Cl are placed parallel to the camera at the reference viewpoint Cc along the x axis, and each camera observes a point of interest P in three-dimensional space. Let Xc be the position of the point of interest P projected onto the image plane Ic of the reference viewpoint Cc, Xl its position projected onto the image plane Il of the virtual left viewpoint Cl, and Xr its position projected onto the image plane Ir of the virtual right viewpoint Cr. In FIG. 20, using the distance f from each viewpoint to its image plane (focal length, or viewing distance), the distance Z in the z direction from the viewpoint to the point of interest P, and the distance t/2 in the x direction from the reference viewpoint Cc to each virtual viewpoint (Cr, Cl), the geometric relationships between Xl and Xc and between Xr and Xc are expressed by equations (25) and (26), respectively.

From the above, the disparity vector (shift amount) di (i = l, r), which represents the displacement from a pixel on the reference image F(t) to the corresponding pixel on the viewpoint image Fi(t) (i = l, r), is derived by equations (27) and (28), obtained by rearranging equations (25) and (26).

The variable μ in equations (27) and (28) is the pixel pitch. That is, given the reference image F(t), the depth model D(t) of relative depth values, and a function that converts the depth model D(t) into the absolute depth value Z (Z = z(D(t))), each viewpoint image Fi(t) (i = l, r) can be generated based on equations (27) and (28).
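Equations (25) to (28) are not reproduced in this text, so the following sketch derives the disparity from the parallel camera geometry described for FIG. 20 under a pinhole model. With Cc at x = 0 and the virtual viewpoints at x = -t/2 (Cl) and x = +t/2 (Cr), a point at lateral position X and depth Z projects to Xc = fX/Z, Xl = f(X + t/2)/Z, and Xr = f(X - t/2)/Z. The sign convention and the function name `disparity_pixels` are assumptions.

```python
def disparity_pixels(z, f, t, mu):
    """Disparity of a point at absolute depth z, in pixels.

    z:  absolute depth Z of the point (same length unit as f and t)
    f:  distance from viewpoint to image plane (assumed viewing distance)
    t:  baseline length between the virtual left and right viewpoints
    mu: pixel pitch of the display
    Returns (d_l, d_r), assumed to be Xl - Xc and Xr - Xc divided by mu.
    """
    shift = f * (t / 2.0) / z  # |Xl - Xc| = |Xr - Xc| on the image plane
    return shift / mu, -shift / mu

# Example values are illustrative only (lengths in millimetres).
d_l, d_r = disparity_pixels(z=2000.0, f=1000.0, t=65.0, mu=0.5)
```

The two disparities are equal in magnitude and opposite in sign, and both shrink as Z grows, which is why the function z(D(t)) converting relative depth to absolute depth is needed before the disparity can be evaluated.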

The viewpoint image generation unit 40 is described below on the basis of the above. FIG. 21 is a block diagram showing a configuration example of the viewpoint image generation unit 40. FIG. 22 is a flowchart for explaining an operation example of the viewpoint image generation unit 40. FIG. 23 is a diagram for explaining an example of viewpoint image generation in the viewpoint image generation unit 40. As shown in FIG. 21, the viewpoint image generation unit 40 comprises a disparity vector calculation unit 401, a texture shift unit 402, a gap filling unit (also called an occlusion compensation unit) 403, and a floating window superimposing unit 404.

In FIG. 22, the disparity vector calculation unit 401 in FIG. 21 calculates, based on the input assumed viewing condition information, the depth model D(t), and the function that converts the depth model D(t) into absolute depth values (Z = z(D(t))), the disparity vector di (i = l, r) from each pixel on the reference image F(t) to the corresponding pixel on each viewpoint image Fi(t) using equations (27) and (28), and outputs the result to the texture shift unit 402 (step S51 in FIG. 22). Instead of evaluating equations (27) and (28) for each pixel, the disparity vector may be obtained from a lookup table, set in advance based on the assumed viewing condition information, that maps a depth value (relative depth value) to a disparity vector, as shown by the LUT in FIG. 23(B).
The divergent direction and the crossed direction of the disparity vector (shift amount) in FIG. 23(B) are explained with reference to FIG. 24. In FIG. 24, let P be a point of interest, Pr the projection of P onto the display surface as seen from the right eye, and Pl the projection of P onto the display surface as seen from the left eye. A disparity vector in the divergent direction is the case, shown in FIG. 24(A), in which the point of interest P lies behind the display surface and the value of the disparity vector from Pr to Pl, or from Pl to Pr, on the display surface is positive. Similarly, a disparity vector in the crossed direction is the case, shown in FIG. 24(B), in which the point of interest P lies in front of the display surface and the value of the disparity vector from Pr to Pl, or from Pl to Pr, on the display surface is negative. When the value of the disparity vector is zero, the point of interest P lies on the display surface.

Next, the texture shift unit 402 in FIG. 21 sets each pixel (x, y) of the reference image F(t) as the pixel value of the corresponding pixel (u, v) of each viewpoint image Fi(t) (i = l, r) based on the corresponding disparity vector di (i = l, r), and outputs the generated viewpoint image to the gap filling unit 403 (step S52 in FIG. 22). When setting pixel values, the texture shift starts from pixels whose disparity vector is on the divergent side (for example, d2 on the LUT in FIG. 23(B)).

For example, consider generating the viewpoint image Fl(t) of the virtual left viewpoint (left-eye presentation image) from the reference image iF and the depth model iD in FIG. 23(A). In the depth model iD, the depth value of the white portion is D1 and the depth value of the black portion is D2. When the texture shift is performed based on the LUT in FIG. 23(B), each pixel of layer L2, which has the depth value D2 in FIG. 23(A), is first shifted by d2 in the divergent direction. Each pixel of layer L1, which has the depth value D1 in FIG. 23(A), is then shifted by d1 in the crossed direction. The result is the viewpoint image oF1, which has a missing region Gs1 that is not at the left or right edge of the screen and a missing region Gl1 that is at the left or right edge of the screen. A missing region (occlusion region) is a region of the viewpoint image oF1 in FIG. 23(A) in which no pixel value is set because no corresponding pixel exists on the reference image.
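The far-first ordering of the texture shift can be sketched on a single image row. This is a minimal illustration, assuming a signed per-pixel horizontal shift and a depth value where a larger number means nearer (bright = near, as in the depth models above); the function name and the use of `None` to mark missing pixels are assumptions.

```python
def texture_shift_row(row, shifts, depth):
    """Forward-warp one row of the reference image.

    row:    pixel values of the reference image row
    shifts: signed horizontal shift (in pixels) for each pixel
    depth:  relative depth per pixel, larger = nearer (assumed convention)
    Returns the shifted row; None marks a missing (occlusion) pixel.
    """
    out = [None] * len(row)
    # Far pixels are written first so that nearer pixels overwrite them.
    for x in sorted(range(len(row)), key=lambda i: depth[i]):
        u = x + shifts[x]
        if 0 <= u < len(row):
            out[u] = row[x]
    return out

row = ["a", "b", "c", "d", "e", "f"]
depth = [0, 0, 1, 1, 0, 0]        # the middle two pixels are the near layer
shifts = [-1, -1, 1, 1, -1, -1]   # far layer moves left, near layer moves right
shifted = texture_shift_row(row, shifts, depth)
# shifted == ["b", None, None, "c", "d", None]
```

In the result, the two interior `None` entries correspond to a missing region such as Gs1, and the `None` at the right edge corresponds to a missing region such as Gl1.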

Next, the gap filling unit 403 in FIG. 21 interpolates, in the input viewpoint image Fi(t) (i = l, r), the pixels of any missing region that is not at the screen edge (for example, Gs1 in the viewpoint image oF1 of FIG. 23(A)) from the group of pixels surrounding the missing region, and outputs the interpolated viewpoint image Fi(t) (i = l, r) to the floating window superimposing unit 404 (step S53 in FIG. 22). The pixels of the missing region are interpolated by, for example, linear interpolation, a median filter, or a known image restoration method (see, for example, Non-Patent Document 6).
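Of the interpolation methods listed, linear interpolation is the simplest to illustrate. The sketch below fills only interior runs of missing pixels in one row and leaves runs that touch the row ends untouched; the function name and the `None` marker for missing pixels are assumptions.

```python
def fill_interior_gaps(row):
    """Linearly interpolate runs of None that have valid pixels on both sides."""
    out = list(row)
    n = len(out)
    x = 0
    while x < n:
        if out[x] is None:
            start = x
            while x < n and out[x] is None:
                x += 1
            end = x  # index of the first valid pixel after the run, or n
            if start > 0 and end < n:  # skip runs touching the row ends
                left, right = out[start - 1], out[end]
                gap = end - start + 1
                for i in range(start, end):
                    w = (i - start + 1) / gap
                    out[i] = (1 - w) * left + w * right
        else:
            x += 1
    return out

filled = fill_interior_gaps([10, None, None, 40, None])
```

Here the two interior missing pixels receive values of approximately 20 and 30, while the missing pixel at the right edge is left for the floating window step.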

Next, the floating window superimposing unit 404 obtains the maximum width W1 of the missing regions located at the screen edge (for example, Gl1 in the viewpoint image oF1 of FIG. 23(A)) across both input viewpoint images Fi(t) (i = l, r). It then inserts a floating window (black band) of width W2 (= αW1) at the right and left edges of each viewpoint image and outputs the result (step S54 in FIG. 22). W2 is the value obtained by scaling W1 by a predetermined constant α. The viewpoint image after insertion of the floating window is, for example, the viewpoint image oF2 in FIG. 23(A), in which floating windows fw1 and fw2 are inserted at the left and right edges of the screen, respectively. The floating window is inserted to suppress binocular rivalry, in which the left and right retinal images are perceived alternately. Binocular rivalry arises when the position or shape of an object differs so much between the images presented to the left and right eyes (for example, at a missing region located at the left or right edge of a generated viewpoint image) that the object cannot be fused binocularly as a single object.
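Step S54 can be sketched as follows, assuming images are stored as lists of rows and missing pixels are marked with `None`. The helper names, the rounding of αW1 to whole pixels, and the black value 0 are assumptions.

```python
def edge_gap_width(row):
    """Width of the widest run of None touching the left or right end of a row."""
    n = len(row)
    left = 0
    while left < n and row[left] is None:
        left += 1
    right = 0
    while right < n and row[n - 1 - right] is None:
        right += 1
    return max(left, right)

def add_floating_window(images, alpha=1.0, black=0):
    """Insert black bands of width W2 = alpha * W1 at both edges of every image.

    images: list of viewpoint images, each a list of rows.
    W1 is the maximum edge-gap width over all rows of all images.
    """
    w1 = max(edge_gap_width(r) for img in images for r in img)
    w2 = int(round(alpha * w1))
    result = []
    for img in images:
        new_img = []
        for r in img:
            r = list(r)
            for i in range(min(w2, len(r))):
                r[i] = black                # left band
                r[len(r) - 1 - i] = black   # right band
            new_img.append(r)
        result.append(new_img)
    return result

left_view = [[None, 1, 2, 3, 4, 5]]
right_view = [[1, 2, 3, 4, None, None]]
out_left, out_right = add_floating_window([left_view, right_view])
```

The same band width is applied to both viewpoint images, so the widest edge gap in either image (two pixels in this example) determines the band for both.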

Thus, according to this embodiment, when the vanishing point position can be estimated from geometric depth cues, generating the depth model of the image based on the vanishing point makes it possible to generate a stereoscopic image that emphasizes the sense of depth conveyed by the geometric depth cues. When the vanishing point position cannot be estimated from geometric depth cues, generating the depth model of the image from the saliency, which represents visual attractiveness within the image based on human visual characteristics, makes it possible to generate a stereoscopic image that emphasizes the sense of depth of the portion that people attend to.

(Modification of the depth model generation unit 30 (depth model generation unit 30a))
In the above embodiment, as an example of the saliency-based depth model creation means, the depth model generation unit 30 scales the saliency M(x, y) of each pixel by equation (24) and sets the value shifted by the reference depth value D_base(x, y) as the depth value D(x, y) of each pixel. The present invention is not limited to this. For example, the configuration of the depth model generation unit 30 may be changed, as shown in FIG. 25, by removing the switching unit 301 so that the image F(t) is input to both the region dividing unit 303 and the saliency calculation unit 305. That is, the depth model generation unit 30a comprises the switching unit 302, the region dividing unit 303, the distance calculation unit 304, the saliency calculation unit 305, and the depth value setting unit 306.

An operation example of the saliency-based depth model creation means in this case is described with reference to FIG. 25. The operation of the vanishing-point-based depth model creation means is the same as steps S42 to S45 in FIG. 15, so its description is omitted here.

First, the saliency calculation unit 305 in FIG. 25 calculates the saliency M(t) from the processing target image F(t) by the same processing as step S47 in FIG. 15, and outputs the result to the depth value setting unit 306 (step S47′ in FIG. 26). Next, the region dividing unit 303 in FIG. 25 divides the processing target image F(t) into regions by the same processing as step S44 in FIG. 15, and outputs region information R(t) describing the result to the depth value setting unit 306 (step S48′ in FIG. 26).

The depth value setting unit 306 in FIG. 25 then, based on the input saliency M(t) and region information R(t), scales the average of the saliency M(x, y) of the pixels in each region indicated by the region information R(t), as shown in equation (29), and sets the value shifted by the reference depth value D_base(x, y) as the depth value D(x, y) of each pixel (step S49′ in FIG. 26).

In equation (29), D_max is the upper limit of the depth value, D_min is the lower limit of the depth value, M_max is the maximum value of the saliency M(t), M_min is the minimum value of the saliency M(t), and D_base(x, y) is a predetermined constant for adjusting the reference value of the depth value of each pixel (the depth value taken as the farthest view). FIG. 27 shows an example of a saliency-based depth model in this modification. In FIG. 27, image A is an example of the processing target image F(t); image B is an example of the region division result (region information R(t)) of the processing target image F(t) obtained by the region dividing unit 303; image C is the saliency M(t) of the processing target image F(t) obtained by the saliency calculation unit 305; image D is an example of the reference depth model (D_base); and image E is an example of the depth model obtained by the depth value setting unit 306 from the region information R(t) of image B, the saliency M(t) of image C, and the reference depth model (D_base) of image D. In image C of FIG. 27, bright portions (white) are portions that people tend to notice (high attractiveness), and dark portions (black) are portions that people tend not to notice (low attractiveness). In image E of FIG. 27, bright portions are near and dark portions are far.
Image D of FIG. 27 uses a plane with a single depth value as the reference depth model (D_base), but the reference depth model is not limited to this. For example, the plane equation shown in equation (30) below may be defined in advance, and the reference depth value D_base(x, y) may be set from the coordinates (x, y) of each pixel. Image A of FIG. 28 shows an example of a reference depth model expressed by equation (30): the coefficients a, b, and c of equation (30) are set so that the depth is nearer toward the bottom edge and farther toward the top edge. Image B of FIG. 28 shows the depth model D(t) created when image A of FIG. 28 is input in place of image D of FIG. 27.
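Equation (30) is not reproduced in this text. A natural reading of "plane equation with coefficients a, b, c" is D_base(x, y) = a·x + b·y + c, and the sketch below assumes that form, with y increasing downward and a larger depth value meaning nearer. The function names are hypothetical.

```python
def plane_base_depth(width, height, a, b, c):
    """Reference depth surface, assumed to be D_base(x, y) = a*x + b*y + c.

    Returned as a list of rows, indexed [y][x], with y = 0 at the top edge.
    """
    return [[a * x + b * y + c for x in range(width)] for y in range(height)]

def add_offset(base, offset):
    """Shift a per-pixel offset map (e.g. scaled saliency) by the base surface."""
    return [
        [bv + ov for bv, ov in zip(brow, orow)]
        for brow, orow in zip(base, offset)
    ]

# b > 0 makes rows near the bottom edge nearer than rows near the top edge.
base = plane_base_depth(width=2, height=3, a=0, b=10, c=5)
# base == [[5, 5], [15, 15], [25, 25]]
model = add_offset(base, [[0, 0], [0, 20], [0, 0]])
```

With a = 0 the surface varies only vertically, which corresponds to the behavior described for image A of FIG. 28; the offset map stands in for the scaled regional saliency.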

As described above, when the vanishing point position cannot be estimated from geometric depth cues, the depth model generation unit 30a sets a uniform depth value for each region based on the saliency, which represents visual attractiveness within the image based on human visual characteristics, and on the region division result (region information) of the image. It can thereby generate a depth model in which errors in the front-to-back ordering of depth are suppressed.

(Modification of the scene change detection unit 10)
In the above embodiment, the scene change detection unit 10 uses a luminance histogram as the image feature amount for scene change detection, but the present invention is not limited to this. For example, instead of the luminance histogram, a color histogram representing the frequency of occurrence of each color component, the mean error of the inter-frame difference, or the distribution of motion vectors may be used as the image feature amount.

(Modification of edge detection unit 211)
In the above embodiment, the edge detection unit 211 performs edge detection by extracting, as edge points, the points at which the edge strength is locally maximal in image space (local maxima), but the present invention is not limited to this. For example, a known edge detection technique such as Canny edge detection may be used. A known differential operator (edge detector) such as the Sobel filter, the Prewitt filter, the LoG (Laplacian of Gaussian) filter, or the DoG (Difference of Gaussians) filter may also be used.
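As an illustration of one of the differential operators listed above, the following sketch computes the Sobel gradient magnitude of a grayscale image. It is a minimal sketch under assumptions: the image is a 2D list of numbers, border pixels are simply left at zero, and the function name is hypothetical. Selecting edge points from the magnitude (for example by local maxima) is not shown.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical derivatives.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return the gradient magnitude for interior pixels; borders stay 0.0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            out[y][x] = math.hypot(gx, gy)
    return out
```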

(Modification of vanishing point identification unit 213)
In the above embodiment, the vanishing point identification unit 213 uses a Gaussian distribution as the distribution model of the mixture model, but the present invention is not limited to this. For example, a member of the exponential family (such as the Laplace, beta, or Bernoulli distribution) may be used as the distribution model. The vanishing point identification unit 213 may also start from a predetermined value for the number of classes Kc used in the mixture model and determine the final value as in the following example. The vanishing point identification unit 213 sets the number of classes Kc to a predetermined number Kc′ and performs clustering by the K-means method. Then, if there are classes Ci and Cj whose inter-class distance is less than or equal to (or less than) a predetermined threshold, the vanishing point identification unit 213 merges Ci and Cj into a new class Ck′. By repeating this process until the number of classes converges to a constant value, the vanishing point identification unit 213 determines the number of classes Kc (≤ Kc′). Note that the technique used by the vanishing point identification unit 213 to estimate the distribution model of the intersection points is not limited to parametric estimation such as a mixture model; non-parametric techniques such as the mean-shift method, the K-means method, or K-nearest-neighbor search (approximate K-nearest-neighbor search) may also be used.
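The merge step of the class-count procedure above can be sketched as follows. This is a minimal sketch under assumptions: the K-means step with Kc′ classes is taken as already done, the inter-class distance is taken to be the Euclidean distance between class centroids, and a merged class is represented by the count-weighted mean of the two centroids. None of these choices is stated in the specification, and the function name is hypothetical.

```python
import math


def merge_close_classes(centroids, counts, threshold):
    """Repeatedly merge any two classes whose centroids lie within threshold.

    centroids: list of coordinate tuples (e.g. intersection-point clusters).
    counts:    number of points in each class.
    Returns the merged centroids and counts; len(result) is the final Kc.
    """
    centroids = [tuple(c) for c in centroids]
    counts = list(counts)
    merged = True
    while merged:  # stop once a full pass finds nothing to merge
        merged = False
        for i in range(len(centroids)):
            for j in range(i + 1, len(centroids)):
                if math.dist(centroids[i], centroids[j]) <= threshold:
                    n = counts[i] + counts[j]
                    new_c = tuple((a * counts[i] + b * counts[j]) / n
                                  for a, b in zip(centroids[i], centroids[j]))
                    centroids[i], counts[i] = new_c, n
                    del centroids[j], counts[j]
                    merged = True
                    break
            if merged:
                break
    return centroids, counts
```

The loop ends when the number of classes no longer changes, which mirrors the convergence condition in the text.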

(Modification of region division unit 303)
In the above embodiment, the region division unit 303 performs clustering in feature space, but the present invention is not limited to this, and clustering may be performed in image space. Clustering in image space is a technique that divides the image into regions, without mapping to a feature space, based on the similarity between pixels, or between the pixel groups (regions) that make up a region, in the original image space. For example, the region division unit 303 may perform clustering in image space by (a) the pixel combination method, (b) the region growing method, or (c) the split-and-merge method.
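The region growing method mentioned above can be illustrated with a small sketch. It is a minimal sketch under assumptions: the image is a 2D list of grayscale values, similarity is the absolute difference between 4-connected neighboring pixels, seeds are visited in raster order, and the function name and threshold are hypothetical.

```python
from collections import deque


def region_growing(img, threshold):
    """Label 4-connected regions whose neighboring pixels differ by <= threshold."""
    h, w = len(img), len(img[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue  # already absorbed into an earlier region
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and abs(img[ny][nx] - img[y][x]) <= threshold):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels
```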

(Modification of saliency calculation unit 305)
In the above embodiment, the saliency calculation unit 305 calculates the saliency based on both a local color difference and a global color difference, but the present invention is not limited to this. The saliency may be calculated from only one of the two indices, that is, the local color difference (the first term ΔE_local in Expression (21)) or the global color difference (the second term ΔE_global in Expression (21)). The color difference may also be calculated using only the lightness index L*, without using the red-green perceptual chromaticity a* and the yellow-blue perceptual chromaticity b*. In this case, the saliency reflects the human visual characteristic that locations with a large brightness contrast are highly attractive. Further, the local color difference ΔE_local and the global color difference ΔE_global may be obtained by Expression (31) and Expression (32), respectively, using the lightness difference ΔL*, the chroma difference ΔC*, and the hue difference ΔH* based on the CIE system (see, for example, Non-Patent Document 4).

Note that the coefficient w(u, v) in Expression (31) is the same as in Expression (12). The coefficients l, c, and h in Expressions (31) and (32) are predetermined weighting coefficients. The way of obtaining the saliency is not limited to color difference; the saliency may be calculated from multiple image features such as color difference, edge gradient, and motion vectors (see, for example, Non-Patent Document 5).
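Expressions (31) and (32) are not reproduced in this text, so the following is only an illustrative sketch of a color difference built from ΔL*, ΔC*, and ΔH* with three weighting coefficients. It assumes a CIE94-style form in which each difference is divided by its coefficient, and it covers a single pair of L*a*b* colors only; the spatial weighting w(u, v) and the summation over neighboring pixels are omitted. The function name and the way l, c, and h enter the formula are assumptions.

```python
import math


def lch_color_difference(lab1, lab2, l=1.0, c=1.0, h=1.0):
    """Color difference from lightness, chroma and hue differences.

    lab1, lab2: (L*, a*, b*) tuples.
    Assumed form: sqrt((dL/l)^2 + (dC/c)^2 + (dH/h)^2).
    """
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    dL = L1 - L2
    dC = math.hypot(a1, b1) - math.hypot(a2, b2)
    # dH^2 is what remains of the a*b* distance after removing the chroma part.
    dH_sq = max((a1 - a2) ** 2 + (b1 - b2) ** 2 - dC ** 2, 0.0)
    return math.sqrt((dL / l) ** 2 + (dC / c) ** 2 + dH_sq / h ** 2)
```

With l = c = h = 1 this reduces to the Euclidean distance in L*a*b* space, so the coefficients only change the relative influence of lightness, chroma, and hue.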

(First modification of stereoscopic image generation device 1)
In the above embodiment, the images input to and output from the scene change detection unit 10, the vanishing point estimation unit 20, and the depth model generation unit 30 of the stereoscopic image generation device 1 were assumed to have the same image size as the input image F(t), but the present invention is not limited to this. For example, to reduce the amount of computation and the memory size, processing may be added that reduces the image input to the scene change detection unit 10, the vanishing point estimation unit 20, and the depth model generation unit 30 to a predetermined image size in advance, and enlarges the depth model output from the depth model generation unit 30 to the input image size. That is, as shown in FIG. 29, the first modification of the stereoscopic image generation device 1 (stereoscopic image generation device 2) comprises a reduction processing unit 50, a scene change detection unit 10, a vanishing point estimation unit 20, a depth model generation unit 30, an enlargement processing unit 60, and a viewpoint image generation unit 40. The reduction processing unit 50 corresponds to the reduced image generation means of the present invention and generates a reduced image of a predetermined image size from the input image F(t). The generated reduced image is input to the vanishing point estimation unit 20 and the depth model generation unit 30. The enlargement processing unit 60 corresponds to the enlarged depth model generation means of the present invention and generates, from the depth model of the reduced image generated by the depth model generation unit 30, an enlarged depth model having the same image size as the input image F(t).

An operation example of the stereoscopic image generation device 2 will be described with reference to FIG. 30. The operations of the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 in FIG. 29 (steps S63, S64, S65, and S67 in FIG. 30) are the same as the respective operations of the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 of the stereoscopic image generation device 1 shown in FIG. 1 (steps S12, S13, S14, and S15 in FIG. 2), and their description is therefore omitted.

In FIG. 30, the stereoscopic image generation device 2 of FIG. 29 first outputs the input image at time t to the reduction processing unit 50 and the viewpoint image generation unit 40 (step S61 in FIG. 30).

The reduction processing unit 50 of FIG. 29 reduces the input processing target image F(t) to a predetermined image size and outputs the reduced image Fd(t) to the scene change detection unit 10, the vanishing point estimation unit 20, and the depth model generation unit 30 (step S62 in FIG. 30). The image is reduced using, for example, the nearest neighbor method, the bilinear method, or the bicubic method.

The enlargement processing unit 60 of FIG. 29 enlarges the input depth model D(t) to the image size of the input image F(t) and outputs the enlarged depth model Du(t) to the viewpoint image generation unit 40 (step S66 in FIG. 30). The depth model is enlarged using, for example, the nearest neighbor method, the bilinear method, or the bicubic method.

According to the stereoscopic image generation device 2, the scene change detection, vanishing point estimation, and depth model generation are performed on a reduced image smaller than the input image, so the memory size and the amount of computation can be reduced compared with the stereoscopic image generation device 1 of FIG. 1.
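The reduction and enlargement steps of this modification can be sketched with the nearest neighbor method, the simplest of the three interpolation methods named above. This is a minimal sketch under assumptions: images and depth models are 2D lists, the source index is chosen by integer scaling, and the function name is hypothetical. The same function serves both for reducing F(t) to Fd(t) and for enlarging D(t) to Du(t).

```python
def resize_nearest(img, new_w, new_h):
    """Resize a 2D list to new_w x new_h by nearest neighbor sampling."""
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]


# Reduce a 4x4 frame to 2x2, then enlarge a 2x2 depth model back to 4x4.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
reduced = resize_nearest(frame, 2, 2)
depth_small = [[1, 2], [3, 4]]  # stand-in for a depth model of the reduced image
depth_full = resize_nearest(depth_small, 4, 4)
```

Because the depth model is computed on the reduced image, its cost depends on the reduced pixel count rather than on the input resolution.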

(Second modification of stereoscopic image generation device 1)
In the above embodiment, the stereoscopic image generation device 1 generates the depth model of the image based on the vanishing point when the vanishing point position can be estimated from geometric depth cues, and based on the saliency, which represents the attractiveness within the image according to human visual characteristics, when it cannot. Consequently, in the frames before and after the depth model generation means is switched, the depth model differs along the time direction, and the change in parallax (depth) is expected to be large. Likewise, in the frames before and after a scene change, the depth model differs along the time direction, and the change in parallax (depth) is expected to be large. Therefore, in the second modification of the stereoscopic image generation device 1 (stereoscopic image generation device 3 of FIG. 31), a spatio-temporal direction smoothing unit 70 that smooths the depth model in the spatio-temporal direction is provided between the depth model generation unit 30 and the viewpoint image generation unit 40 to reduce the change in parallax in the time direction. That is, as shown in FIG. 31, the stereoscopic image generation device 3 comprises a scene change detection unit 10, a vanishing point estimation unit 20, a depth model generation unit 30, a spatio-temporal direction smoothing unit 70, and a viewpoint image generation unit 40. The spatio-temporal direction smoothing unit 70 corresponds to the spatio-temporal direction smoothing means of the present invention and, as shown in FIG. 32, comprises a spatial direction smoothing unit 701, a time direction smoothing unit 702, and a buffer 703. The spatio-temporal direction smoothing unit 70 smooths the depth model D(t) of the processing target image F(t) generated by the depth model generation unit 30 in the spatial direction to obtain the spatially smoothed depth model Ds(t) of the image F(t). Then, based on Ds(t) and on the spatio-temporally smoothed depth model Dt(t−1) of the comparison target image F(t−1), which precedes the image F(t), it smooths Ds(t) in the time direction and generates the spatio-temporally smoothed depth model Dt(t) of the image F(t).

An operation example of the stereoscopic image generation device 3 will be described with reference to FIGS. 33 and 34. The operations of the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 in FIG. 31 (steps S72, S73, S74, and S76 in FIG. 33) are the same as the respective operations of the scene change detection unit 10, the vanishing point estimation unit 20, the depth model generation unit 30, and the viewpoint image generation unit 40 of the stereoscopic image generation device 1 shown in FIG. 1 (steps S12, S13, S14, and S15 in FIG. 2), and their description is therefore omitted.

(About the spatio-temporal direction smoothing unit 70)
The spatio-temporal direction smoothing unit 70 of FIG. 31 performs smoothing in the spatio-temporal direction on the depth model D(t) of the input processing target image F(t) and outputs the result (smoothed depth model Dt(t)) (step S75 in FIG. 33).

Specifically, the spatial direction smoothing unit 701 of FIG. 32 smooths the depth model D(t) in the spatial direction with a one-dimensional smoothing filter applied in the order horizontal then vertical, or vertical then horizontal, and outputs the result (depth model Ds(t)) to the time direction smoothing unit 702 (step S81 in FIG. 34). A one-dimensional Gaussian filter, for example, is used as the one-dimensional smoothing filter.
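The two-pass spatial smoothing described above can be sketched as a separable filter: a one-dimensional Gaussian applied along rows, then along columns. This is a minimal sketch under assumptions: the depth model is a 2D list, border samples are clamped to the nearest valid pixel, and the function names, sigma, and radius are hypothetical values not given in the specification.

```python
import math


def gaussian_kernel_1d(sigma, radius):
    """Normalized 1D Gaussian weights of length 2*radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(depth, kernel):
    """Filter every row with the 1D kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(depth[0])
    return [[sum(k * row[min(max(x + i - radius, 0), width - 1)]
                 for i, k in enumerate(kernel))
             for x in range(width)]
            for row in depth]


def transpose(m):
    return [list(col) for col in zip(*m)]


def smooth_spatial(depth, sigma=1.0, radius=2):
    """Horizontal pass followed by vertical pass, giving Ds(t) from D(t)."""
    kernel = gaussian_kernel_1d(sigma, radius)
    horizontal = smooth_rows(depth, kernel)
    return transpose(smooth_rows(transpose(horizontal), kernel))
```

Applying two one-dimensional passes costs on the order of 2k multiplications per pixel for a kernel of length k, instead of k squared for an equivalent two-dimensional kernel.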

The time direction smoothing unit 702 of FIG. 32 generates the smoothed depth model Dt(t) by Expression (33) below from the input spatially smoothed depth model Ds(t) and the smoothed depth model Dt(t−1) of the previous frame stored in the buffer 703, and outputs the result to the buffer 703 and to the outside (step S82 in FIG. 34). The coefficient α in Expression (33) is a predetermined value between 0 and 1.

The buffer 703 of FIG. 32 deletes the smoothed depth model Dt(t−1) of the previous frame and stores the input smoothed depth model Dt(t) of the current frame (step S83 in FIG. 34).
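The interaction between the time direction smoothing unit 702 and the buffer 703 can be sketched as a recursive filter. Expression (33) is not reproduced in this text, so the sketch assumes the common form Dt(t) = α·Ds(t) + (1 − α)·Dt(t−1); which term α weights is an assumption, as are the class name and the behavior on the first frame, where the buffer is empty and Ds(t) is passed through unchanged.

```python
class TemporalSmoother:
    """Recursive temporal smoothing of depth models with a one-frame buffer."""

    def __init__(self, alpha):
        if not 0.0 <= alpha <= 1.0:
            raise ValueError("alpha must lie between 0 and 1")
        self.alpha = alpha
        self.buffer = None  # holds Dt(t-1), as buffer 703 does

    def update(self, ds):
        """Take Ds(t), return Dt(t), and replace the buffered Dt(t-1)."""
        if self.buffer is None:
            dt = [row[:] for row in ds]  # assumed first-frame behavior
        else:
            a = self.alpha
            dt = [[a * s + (1.0 - a) * p for s, p in zip(s_row, p_row)]
                  for s_row, p_row in zip(ds, self.buffer)]
        self.buffer = dt  # the previous model is discarded
        return dt
```

Under this assumed form, a smaller α gives the previous frame more influence, so an abrupt change in the depth model is spread over more frames.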

According to the stereoscopic image generation device 3, smoothing the depth model in the spatio-temporal direction reduces the change in parallax (depth) in the frames before and after the depth model generation means is switched and in the frames before and after a scene change.

The above description has focused on the embodiments of the stereoscopic image generation device according to the present invention, but the present invention may also take the form of a stereoscopic image generation method performed by the stereoscopic image generation device 1. It may also take the form of a program for causing a computer to execute this stereoscopic image generation method.

That is, part of the stereoscopic image generation device 1 in the above embodiment may be implemented by a computer. In that case, a program for realizing this control function may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into and executed by a computer system. The "computer system" here is a computer system built into the stereoscopic image generation device 1 and includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into the computer system. The "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

Part or all of the stereoscopic image generation device 1 in the above embodiment may also be implemented as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the stereoscopic image generation device 1 may be made into an individual processor, or some or all of the blocks may be integrated into a single processor. The method of circuit integration is not limited to LSI and may be realized with a dedicated circuit or a general-purpose processor. If a circuit integration technology that replaces LSI emerges as semiconductor technology advances, an integrated circuit based on that technology may be used.

An embodiment of the present invention has been described in detail above with reference to the drawings, but the specific configuration is not limited to the above, and various design changes and the like can be made without departing from the gist of the present invention.

1, 2, 3 ... stereoscopic image generation device; 10 ... scene change detection unit; 20 ... vanishing point estimation unit; 21 ... intra-frame vanishing point estimation unit; 22 ... inter-frame vanishing point estimation unit; 30 ... depth model generation unit; 40 ... viewpoint image generation unit; 50 ... reduction processing unit; 60 ... enlargement processing unit; 70 ... spatio-temporal direction smoothing unit; 101 ... luminance histogram generation unit; 102, 203, 204, 703 ... buffer; 103 ... histogram similarity calculation unit; 104 ... scene change determination unit; 201, 202, 301, 302 ... switching unit; 211 ... edge detection unit; 212 ... straight line detection unit; 213 ... vanishing point identification unit; 221 ... feature point detection unit; 222 ... corresponding point calculation unit; 223 ... transformation matrix calculation unit; 224 ... vanishing point position calculation unit; 303 ... region division unit; 304 ... distance calculation unit; 305 ... saliency calculation unit; 306 ... depth value setting unit; 401 ... parallax vector calculation unit; 402 ... texture shift unit; 403 ... gap filling unit; 404 ... floating window superimposition unit; 701 ... spatial direction smoothing unit; 702 ... time direction smoothing unit.

To solve the above problem, a first technical means of the present invention is a stereoscopic image generation device that adds binocular stereoscopic information to a 2D image to generate a 3D image, comprising: vanishing point estimation means for estimating a vanishing point from a processing target image; depth model generation means for generating a different depth model depending on whether the vanishing point could be estimated by the vanishing point estimation means; and viewpoint image generation means for generating a right-eye presentation image and a left-eye presentation image based on the depth model generated by the depth model generation means, the processing target image, and assumed viewing condition information, wherein the depth model generation means generates the depth model based on the vanishing point when the vanishing point could be estimated by the vanishing point estimation means, and generates the depth model based on the saliency of each pixel in the processing target image when the vanishing point could not be estimated by the vanishing point estimation means, wherein the vanishing point estimation means comprises intra-frame vanishing point estimation means for estimating the vanishing point of the processing target image from straight-line information in the processing target image, and inter-frame vanishing point estimation means for obtaining a projective transformation matrix from the correspondence of feature points between the processing target image and a comparison target image earlier than the processing target image and estimating the vanishing point of the processing target image from the obtained projective transformation matrix and the position of the vanishing point in the comparison target image, and wherein the stereoscopic image generation device comprises scene change detection means for detecting whether a scene change has occurred between the processing target image and the comparison target image, the intra-frame vanishing point estimation means being selected when a scene change is detected by the scene change detection means, and the inter-frame vanishing point estimation means being selected when no scene change is detected by the scene change detection means.
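The selection between the two estimation means, and the inter-frame estimate itself, can be sketched as follows. This is a minimal sketch under assumptions: the projective transformation matrix is a 3x3 nested list that maps points of the comparison target image to the processing target image, the intra-frame estimator is passed in as a callable because its internals are outside this sketch, and the fallback to intra-frame estimation when no previous vanishing point is stored follows the fourth technical means. All function and parameter names are hypothetical.

```python
def project_point(h_matrix, point):
    """Apply a 3x3 projective transformation to a 2D point."""
    x, y = point
    xs = h_matrix[0][0] * x + h_matrix[0][1] * y + h_matrix[0][2]
    ys = h_matrix[1][0] * x + h_matrix[1][1] * y + h_matrix[1][2]
    w = h_matrix[2][0] * x + h_matrix[2][1] * y + h_matrix[2][2]
    if w == 0:
        return None  # point maps to infinity; no finite vanishing point
    return (xs / w, ys / w)


def estimate_vanishing_point(frame, scene_changed, prev_vp, h_matrix,
                             intra_frame_estimator):
    """Choose intra-frame or inter-frame estimation for the current frame."""
    if scene_changed or prev_vp is None:
        # Scene change (or nothing stored): estimate from lines in this frame.
        return intra_frame_estimator(frame)
    # Same scene: carry the previous vanishing point over with the homography.
    return project_point(h_matrix, prev_vp)
```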

A second technical means is the first technical means, wherein the saliency of each pixel in the processing target image is calculated as a linear sum of a local color difference, obtained by multiplying the color difference between a pixel of interest and its surrounding pixels by a predetermined weighting coefficient, and a global color difference, obtained by multiplying the color difference between the pixel of interest and the entire image, or the color difference between a local region containing the pixel of interest and its surrounding region, by a predetermined weighting coefficient, and wherein the color difference is defined based on the respective differences in the lightness index, the red-green perceptual chromaticity, and the yellow-blue perceptual chromaticity.

A third technical means is the second technical means, wherein, when the vanishing point is estimated by the vanishing point estimation means, the depth model generation means obtains, based on a distance determined from the position of the vanishing point and the coordinates of each pixel in the processing target image and on region division information of the processing target image, the average value of the distance within each region indicated by the region division information, and sets the depth value of each pixel in the processing target image as the sum of a value obtained by normalizing the average value with a first weighting coefficient and a predetermined depth value, the first weighting coefficient being the difference between the upper and lower limits of the depth value divided by the difference between the upper and lower limits of the distance; and wherein, when the vanishing point is not estimated by the vanishing point estimation means, the depth model generation means obtains, based on the saliency of each pixel in the processing target image and on the region division information of the processing target image, the average value of the saliency within each region indicated by the region division information, and sets the depth value of each pixel in the processing target image as the sum of a value obtained by normalizing the average value with a second weighting coefficient and a predetermined depth value, the second weighting coefficient being the difference between the upper and lower limits of the depth value divided by the difference between the upper and lower limits of the saliency.

A fourth technical means is any one of the first to third technical means, further comprising storage means for storing vanishing point information including the position of the vanishing point of the comparison target image, wherein the inter-frame vanishing point estimation means is selected when the vanishing point information of the comparison target image is stored in the storage means, and the intra-frame vanishing point estimation means is selected when the vanishing point information of the comparison target image is not stored in the storage means.

A fifth technical means is the fourth technical means, wherein the comparison target image is the image immediately preceding the processing target image.

The sixth technical means is a stereoscopic image generating method performed by a stereoscopic image generating apparatus that adds binocular stereoscopic information to a 2D image and generates a 3D image. In the method, the stereoscopic image generating apparatus performs a vanishing point estimating step of estimating a vanishing point from a processing target image; a depth model generating step of generating a different depth model depending on whether or not the vanishing point could be estimated in the vanishing point estimating step; and a viewpoint image generating step of generating a right-eye presentation image and a left-eye presentation image based on the depth model generated in the depth model generating step, the processing target image, and assumed viewing condition information. In the depth model generating step, when the vanishing point could be estimated in the vanishing point estimating step, the depth model is generated based on the vanishing point, and when the vanishing point could not be estimated in the vanishing point estimating step, the depth model is generated based on the saliency of each pixel in the processing target image. The vanishing point estimating step includes an intra-frame vanishing point estimating step of estimating the vanishing point of the processing target image from straight-line information in the processing target image, and an inter-frame vanishing point estimating step of obtaining a projective transformation matrix from the correspondence of feature points between the processing target image and a comparison target image that is earlier than the processing target image, and estimating the vanishing point of the processing target image from the obtained projective transformation matrix and the position of the vanishing point in the comparison target image. The stereoscopic image generating apparatus further performs a scene change detecting step of detecting whether or not a scene change has occurred between the processing target image and the comparison target image; when a scene change is detected in the scene change detecting step, the intra-frame vanishing point estimating step is selected, and when no scene change is detected in the scene change detecting step, the inter-frame vanishing point estimating step is selected.
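
The inter-frame step and the scene-change rule of the sixth technical means can be illustrated with a short sketch. It is an illustration under assumptions rather than the disclosed implementation: the function names `project_vanishing_point` and `choose_step` are hypothetical, the 3x3 projective transformation matrix `h` is assumed to have been obtained already from feature point correspondences and to map coordinates of the comparison target image to coordinates of the processing target image, and returning `None` for a point that maps to infinity is a design choice of this sketch.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def project_vanishing_point(h: Matrix3, previous_vp: Point) -> Optional[Point]:
    """Map the comparison image's vanishing point through the matrix h."""
    x, y = previous_vp
    # Multiply h by the homogeneous point (x, y, 1).
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        # The point maps to infinity; this sketch reports "not estimated".
        return None
    # Divide by w to return to image coordinates.
    return (u / w, v / w)


def choose_step(scene_change_detected: bool) -> str:
    """Scene change -> intra-frame step; no scene change -> inter-frame step."""
    return "intra_frame" if scene_change_detected else "inter_frame"
```

For a pure translation of 5 pixels right and 3 pixels up, for example, a vanishing point at (100, 50) in the comparison target image maps to (105, 47) in the processing target image.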

The seventh technical means is a program for causing a computer to execute the stereoscopic image generating method of the sixth technical means.

The eighth technical means is a computer-readable recording medium on which the program of the seventh technical means is recorded.

Claims

A stereoscopic image generating apparatus that adds binocular stereoscopic information to a 2D image and generates a 3D image, comprising:
vanishing point estimating means for estimating a vanishing point from a processing target image;
depth model generating means for generating a different depth model depending on whether or not the vanishing point could be estimated by the vanishing point estimating means; and
viewpoint image generating means for generating a right-eye presentation image and a left-eye presentation image based on the depth model generated by the depth model generating means, the processing target image, and assumed viewing condition information,
wherein, when the vanishing point could be estimated by the vanishing point estimating means, the depth model generating means generates the depth model based on the vanishing point, and when the vanishing point could not be estimated by the vanishing point estimating means, the depth model generating means generates the depth model based on the saliency of each pixel in the processing target image.

The stereoscopic image generating apparatus according to claim 1, further comprising: reduced image generating means for generating a reduced image of a predetermined image size from the processing target image; and
enlarged depth model generating means for generating, from the depth model of the reduced image generated by the depth model generating means, an enlarged depth model having the same image size as the processing target image, wherein the reduced image is input to the vanishing point estimating means and the depth model generating means.

The stereoscopic image generating apparatus according to claim 1, further comprising smoothing means that smooths, in the spatial direction, the depth model of the processing target image generated by the depth model generating means, and that smooths the depth model of the processing target image in the temporal direction based on the spatially smoothed depth model of the processing target image and the spatiotemporally smoothed depth model of a comparison target image that is earlier than the processing target image, thereby generating a spatiotemporally smoothed depth model of the processing target image.

The stereoscopic image generating apparatus according to claim 1, wherein the assumed viewing condition information includes the pixel pitch of the display that displays the 3D image, the image size of the display, the distance from the viewer to the display, a parallax range representing the depth of the 3D image, and a baseline length that is the distance between the left and right virtual viewpoints.

The stereoscopic image generating apparatus according to claim 1, wherein the saliency of each pixel in the processing target image is calculated so as to be higher at a location where the color difference between the target pixel and its surrounding pixels is larger, at a location where the color difference between the target pixel and the entire image is larger, or at a location where the color difference between a local region including the target pixel and its peripheral region is larger.

The stereoscopic image generating apparatus according to claim 5, wherein, when the vanishing point could not be estimated by the vanishing point estimating means, the depth model generating means generates the depth model such that locations where the saliency of each pixel in the processing target image is high are positioned on the near side.

The stereoscopic image generating apparatus as described, wherein the vanishing point estimating means includes intra-frame vanishing point estimating means for estimating the vanishing point of the processing target image from straight-line information in the processing target image, and inter-frame vanishing point estimating means for estimating the vanishing point of the processing target image based on the processing target image, a comparison target image that is earlier than the processing target image, and the position of the vanishing point in the comparison target image.

The stereoscopic image generating apparatus according to claim 7, further comprising scene change detecting means for detecting whether or not a scene change has occurred between the processing target image and the comparison target image, wherein, when a scene change is detected by the scene change detecting means, the intra-frame vanishing point estimating means is selected, and when no scene change is detected by the scene change detecting means, the inter-frame vanishing point estimating means is selected.

The stereoscopic image generating apparatus according to claim 8, further comprising storage means for storing vanishing point information including the position of the vanishing point of the comparison target image, wherein, when the vanishing point information of the comparison target image is stored in the storage means, the inter-frame vanishing point estimating means is selected, and when the vanishing point information of the comparison target image is not stored in the storage means, the intra-frame vanishing point estimating means is selected.

The stereoscopic image generation apparatus according to claim 7, wherein the comparison target image is an image immediately before the processing target image.

A stereoscopic image generating method performed by a stereoscopic image generating apparatus that adds binocular stereoscopic information to a 2D image and generates a 3D image, the method comprising, by the stereoscopic image generating apparatus:
a vanishing point estimating step of estimating a vanishing point from a processing target image;
a depth model generating step of generating a different depth model depending on whether or not the vanishing point could be estimated in the vanishing point estimating step; and
a viewpoint image generating step of generating a right-eye presentation image and a left-eye presentation image based on the depth model generated in the depth model generating step, the processing target image, and assumed viewing condition information,
wherein, in the depth model generating step, when the vanishing point could be estimated in the vanishing point estimating step, the depth model is generated based on the vanishing point, and when the vanishing point could not be estimated in the vanishing point estimating step, the depth model is generated based on the saliency of each pixel in the processing target image.

A program for causing a computer to execute the stereoscopic image generation method according to claim 11.

A computer-readable recording medium on which the program according to claim 12 is recorded.