JP5809607B2

JP5809607B2 - Image processing apparatus, image processing method, and image processing program

Info

Publication number: JP5809607B2
Application number: JP2012149302A
Authority: JP
Inventors: Xiaojun Wu (ウ 小軍); Shohei Nobuhara (延原 章平); Takashi Matsuyama (松山 隆司)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-07-03
Filing date: 2012-07-03
Publication date: 2015-11-11
Anticipated expiration: 2032-07-03
Also published as: JP2014010805A

Description

The present invention relates to an image processing apparatus, an image processing method, and an image processing program that generate an image from an arbitrary viewpoint from multi-viewpoint images.

Conventionally, arbitrary viewpoint image synthesis has been known as a technique that fuses, inside a computer, image data captured by multi-view cameras and presents the appearance of the scene from an arbitrary viewpoint (see, for example, Non-Patent Document 1). Arbitrary viewpoint image synthesis commonly uses the Visual Hull, which is the intersection of the cones formed by each camera's optical center and the silhouette of the object photographed from that viewpoint. The synthesis first obtains the 3D (three-dimensional) shape of the object using the Visual Hull, then refines that shape, and finally computes the texture according to the line-of-sight information (viewpoint position and line-of-sight direction) of the virtual camera.

Arbitrary viewpoint image synthesis is thus a series of computations: multi-viewpoint images from a multi-camera system are input, the 3D shape of the object to be synthesized is obtained, and the texture of the shape's surface is computed. The factors that most strongly affect the quality of the synthesized image include the accuracy and the spatial resolution of the 3D shape.

FIG. 4 is a flowchart showing the processing of arbitrary viewpoint image synthesis according to the prior art, which is described below with reference to that figure. First, from images captured from multiple viewpoints by a multi-camera system, the 3D shape of the object is computed as a Visual Hull (step S11). The shape obtained in this step is inaccurate and only approximates the shape of the object. Next, the Visual Hull is refined toward the accurate shape by a known method (step S12). As a result, object silhouettes corresponding to the multiple viewpoints are obtained.
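As an illustration of step S11, a Visual Hull can be approximated by voxel carving: a voxel is kept only if it projects inside the object silhouette in every input camera. The sketch below is a minimal, hypothetical example, assuming orthographic cameras looking along the coordinate axes and binary silhouettes given as sets of pixel coordinates; the patent does not prescribe this method, and the function names are illustrative only.

```python
from itertools import product

def project_ortho(voxel, axis):
    """Hypothetical orthographic camera: drop the coordinate along the view axis."""
    x, y, z = voxel
    return {0: (y, z), 1: (x, z), 2: (x, y)}[axis]

def visual_hull(grid_size, silhouettes):
    """Keep voxels whose projection lies inside every silhouette.

    silhouettes maps a view axis (0, 1 or 2) to the set of (u, v) pixels
    that belong to the object silhouette in that view.
    """
    hull = set()
    for voxel in product(range(grid_size), repeat=3):
        if all(project_ortho(voxel, axis) in sil for axis, sil in silhouettes.items()):
            hull.add(voxel)
    return hull

# A 2x2x2 cube inside a 4x4x4 grid, seen from three axis-aligned views.
square = {(u, v) for u in (1, 2) for v in (1, 2)}
hull = visual_hull(4, {0: square, 1: square, 2: square})
print(len(hull))  # 8
```

With only the view along axis 0, `visual_hull(4, {0: square})` keeps 16 voxels instead of 8, which illustrates why the shape from step S11 only approximates the object and needs refinement.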

Next, the angle-of-view information of the virtual camera used to synthesize the arbitrary viewpoint image, such as its viewpoint position and line-of-sight direction, is determined (step S13). Then, based on this angle-of-view information, the texture of the 3D shape surface is computed and texture mapping is performed (step S14). Finally, the resulting virtual camera image is rendered (step S15). The arbitrary viewpoint image is thus synthesized and displayed.

Non-Patent Document 1: T. Matsuyama, T. Takai, X. Wu, and S. Nobuhara, "Generation, Editing, and Visualization of 3D Video," The Transactions of the Virtual Reality Society of Japan, Vol. 7, No. 4, pp. 521-532 (2002).

However, in prior-art arbitrary viewpoint image synthesis, the angle of view of the virtual camera is determined only after the 3D shape has been refined (Visual Hull refinement, step S12). Because the refinement is performed independently of the virtual camera's angle-of-view information, the entire 3D shape must be refined, so the computational load is high and the processing is inefficient.

The present invention was made in view of these circumstances, and its object is to provide an image processing apparatus, an image processing method, and an image processing program that, when generating an image from an arbitrary viewpoint, reduce the computational load, increase processing efficiency, and improve the quality of the generated image.

The present invention is an image processing apparatus that generates an image of a virtual camera placed at an arbitrary viewpoint position from images of a plurality of input cameras that capture an object from different positions, the apparatus comprising: three-dimensional shape calculation means for calculating the three-dimensional shape of the object from the images of the plurality of input cameras; angle-of-view information determination means for determining angle-of-view information of the virtual camera; projection means for projecting the three-dimensional shape onto the image of the virtual camera based on the angle-of-view information; depth calculation means for calculating, for each pixel of the virtual camera image, a depth value in the virtual camera coordinate system; optimization processing means for optimizing the set of depth values so as to maximize photo-consistency, an index representing the accuracy of a three-dimensional shape; and image generation means for generating the image of the virtual camera by calculating the pixel values of the virtual camera image based on the optimized depth values of all pixels of the virtual camera.

The present invention is also an image processing method performed by an image processing apparatus that generates an image of a virtual camera placed at an arbitrary viewpoint position from images of a plurality of input cameras that capture an object from different positions, the method comprising: a three-dimensional shape calculation step of calculating the three-dimensional shape of the object from the images of the plurality of input cameras; an angle-of-view information determination step of determining angle-of-view information of the virtual camera; a projection step of projecting the three-dimensional shape onto the image of the virtual camera based on the angle-of-view information; a depth calculation step of calculating, for each pixel of the virtual camera image, a depth value in the virtual camera coordinate system; an optimization processing step of optimizing the set of depth values so as to maximize photo-consistency, an index representing the accuracy of a three-dimensional shape; and an image generation step of generating the image of the virtual camera by calculating the pixel values of the virtual camera image based on the optimized depth values of all pixels of the virtual camera.

The present invention is also an image processing program for causing a computer to function as the above image processing apparatus.

According to the present invention, the 3D shape is refined based on the angle-of-view information of the virtual camera used in arbitrary viewpoint image generation, which limits the region to be refined. This reduces the computational load, increases processing efficiency, and improves the quality of the generated image.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a flowchart showing the operation of the image processing unit 4 shown in FIG. 1.
FIG. 3 is a flowchart showing details of the processing for refining the Visual Hull of the visible region (step S16) shown in FIG. 2.
FIG. 4 is a flowchart showing the processing of arbitrary viewpoint image synthesis according to the prior art.

An image processing apparatus according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the embodiment. In this figure, reference numerals 11 to 13 denote imaging units that together form a multi-camera system. Each of the imaging units 11 to 13 is a camera fitted with a zoom lens, each set differently. Although FIG. 1 shows three imaging units, four or more may be used; about 12 is usually desirable. Reference numeral 2 denotes an image storage unit that stores the image data output by the imaging units 11 to 13. Reference numeral 3 denotes a viewpoint input unit through which the user performs an input operation to specify a desired arbitrary viewpoint position.

Reference numeral 4 denotes an image processing unit that synthesizes an image as seen from the arbitrary viewpoint position entered through the viewpoint input unit 3. Reference numeral 5 denotes an image display unit, such as a display device, that displays the image synthesized by the image processing unit 4. In describing the configuration of the image processing apparatus with reference to FIG. 1, the description and illustration of well-known functions and components that an apparatus for synthesizing arbitrary viewpoint images from multi-viewpoint images ordinarily has are omitted unless they bear directly on the present invention. In this specification, one frame of a moving image (video) is referred to as an image.

Next, the operation of the image processing unit 4 shown in FIG. 1 is described with reference to FIG. 2, a flowchart of that operation. In FIG. 2, steps identical to those of the prior-art processing shown in FIG. 4 are given the same reference signs, and their description is omitted. The processing in FIG. 2 differs from the prior art in that the Visual Hull refinement of step S12 is not performed; instead, after the virtual camera's angle of view has been determined (step S13), the Visual Hull is refined only in the visible region (step S16).
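The difference between the two flows is purely one of step ordering. The hypothetical sketch below encodes the step sequences of FIG. 4 and FIG. 2 as lists (the step labels are paraphrases, not names from the patent) and checks whether refinement happens only after the virtual camera's angle of view is fixed, which is what allows the refinement to be restricted to the visible region.

```python
PRIOR_ART = ["S11 Visual Hull", "S12 refine whole shape", "S13 set virtual view",
             "S14 texture mapping", "S15 render"]
EMBODIMENT = ["S11 Visual Hull", "S13 set virtual view", "S16 refine visible region",
              "S14 texture mapping", "S15 render"]

def refinement_uses_view(steps):
    """True if the refinement step runs after the virtual view (S13) is determined."""
    view = next(i for i, s in enumerate(steps) if s.startswith("S13"))
    refine = next(i for i, s in enumerate(steps) if "refine" in s)
    return refine > view

print(refinement_uses_view(PRIOR_ART))   # False
print(refinement_uses_view(EMBODIMENT))  # True
```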

When the image processing unit 4 has determined the angle-of-view information of the virtual camera (viewpoint position, line-of-sight direction, and angle of view) from the viewpoint position information supplied by the viewpoint input unit 3 (step S13), it limits the region to be refined according to that angle-of-view information and refines the 3D shape so as to improve an evaluation index called photo-consistency (step S16). Then, based on the determined angle-of-view information, it computes the texture of the 3D shape surface and performs texture mapping (step S14).
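Limiting refinement to the region visible from the virtual camera can be illustrated with a simple view-frustum test. The sketch below assumes a pinhole virtual camera at the origin looking down the +z axis with symmetric horizontal and vertical fields of view, and surface points already expressed in that camera's coordinates; the function name, near/far limits, and sample points are hypothetical, and occlusion is ignored here.

```python
import math

def in_view(point, fov_x_deg, fov_y_deg, near=0.1, far=100.0):
    """Return True if a camera-space point lies inside the virtual camera's frustum."""
    x, y, z = point
    if not (near <= z <= far):
        return False
    half_x = math.tan(math.radians(fov_x_deg) / 2)  # half-width of the view per unit depth
    half_y = math.tan(math.radians(fov_y_deg) / 2)
    return abs(x) <= z * half_x and abs(y) <= z * half_y

surface_points = [(0.0, 0.0, 5.0), (4.0, 0.0, 5.0), (0.0, 0.5, 2.0), (0.0, 0.0, -3.0)]
to_refine = [p for p in surface_points if in_view(p, 60, 45)]
print(to_refine)  # [(0.0, 0.0, 5.0), (0.0, 0.5, 2.0)]
```

Only the surviving points would be handed to the refinement step; a fuller implementation would also discard occluded points, for example by comparing against the per-pixel depth computed in step S163.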

Next, the Visual Hull refinement of step S16 in FIG. 2 is described in detail with reference to FIG. 3, a flowchart of the refinement of the visible region. For simplicity, the following notation is used: the 3D shape is $S$, the virtual camera is $\hat{c}$ (also written ^c), and the input cameras are $c_i \in C$ ($i = 1, \ldots, n$).

First, once the virtual camera's angle of view has been determined, the image processing unit 4 projects the 3D shape $S$ onto the screen of the virtual camera $\hat{c}$, that is, onto the image to be synthesized (step S161). Next, it selects one pixel $p$ on the screen of $\hat{c}$ (step S162) and computes, for that pixel, the depth $d_p$ in the camera coordinate system of $\hat{c}$, i.e., the distance from the viewpoint to the object (step S163). Steps S162 and S163 are repeated for every pixel of $\hat{c}$, which yields a virtual distance image as seen from $\hat{c}$.
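Steps S161 to S163 amount to rendering a depth map of $S$ from the virtual camera. The patent does not say how $S$ is represented, so the sketch below assumes, hypothetically, a point set already in the virtual camera's coordinate system and a pinhole model with illustrative intrinsics `f`, `cx`, `cy`; it keeps the nearest depth per pixel, as a z-buffer does.

```python
def virtual_depth_image(points, width, height, f, cx, cy):
    """Project camera-space points and keep the smallest depth d_p for each pixel p."""
    depth = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # behind the virtual camera
        u = round(f * x / z + cx)
        v = round(f * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] is None or z < depth[v][u]:
                depth[v][u] = z  # the nearest surface occludes farther ones
    return depth

pts = [(0.0, 0.0, 4.0), (0.0, 0.0, 2.0), (1.5, 0.0, 3.0)]
d = virtual_depth_image(pts, width=5, height=5, f=2.0, cx=2.0, cy=2.0)
print(d[2][2], d[2][3])  # 2.0 3.0
```

The first two points fall on the same pixel, and only the nearer one (depth 2.0) is kept; pixels that no point reaches stay `None`.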

Next, the image processing unit 4 optimizes the photo-consistency of the resulting virtual distance image (step S164). Photo-consistency is one of the indices that express the accuracy of a 3D shape. When a point in 3D space is photographed by several cameras, the pixel values it produces in each camera can be compared to measure how well the cameras agree. In practice, an error function expressing the pixel difference between cameras is defined, the 3D shape is regarded as a set of points, and the photo-consistency is defined as the reciprocal of the combined error over all points. If the 3D shape $S$ has been obtained exactly, the error is 0 at every point on its surface.
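The definition above can be made concrete with a toy example. Purely for illustration, the sketch assumes that each surface point is observed as one grayscale value per camera and that the per-point error is the mean absolute difference over camera pairs; the patent's own per-pixel error (Equation (1)) is built on ZNCC instead. Photo-consistency is then the reciprocal of the summed error and becomes infinite when every point's error is 0.

```python
from itertools import combinations

def point_error(values):
    """Mean absolute difference between the pixel values seen by each camera pair."""
    pairs = list(combinations(values, 2))
    if not pairs:
        raise ValueError("a point must be seen by at least two cameras")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def photo_consistency(observations):
    """Reciprocal of the total error over all surface points (inf if the error is 0)."""
    total = sum(point_error(values) for values in observations)
    return float("inf") if total == 0 else 1.0 / total

exact = [[100, 100, 100], [50, 50, 50]]  # every camera agrees on every point
noisy = [[100, 104, 100], [50, 50, 56]]  # cameras disagree on both points
print(photo_consistency(exact))  # inf
print(photo_consistency(noisy))  # about 0.15
```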

Here, for a pixel $p$ of the virtual distance image, the true distance is defined as $d_p + \delta_p$, and the error function $E_p(\delta_p)$ is defined as in Equation (1).

In Equation (1), $C_q$ denotes the subset of all input cameras $C$ that capture the point on the surface of the 3D shape $S$ corresponding to pixel $p$. $\mathrm{ZNCC}(\pi_{\delta_p}, c_i, c_j)$ is the zero-mean normalized cross-correlation (ZNCC) between the pixel values at the positions where the small plane $\pi_{\delta_p}$ projects onto the images of cameras $c_i$ and $c_j$. ${}_{|C_q|}C_2$ is the number of ways to choose two cameras from the camera set $C_q$.
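Equation (1) itself is not reproduced in this text. One form consistent with the definitions of $C_q$, ZNCC, and ${}_{|C_q|}C_2$ would average the pairwise ZNCC scores over all camera pairs in $C_q$ and subtract the average from 1, so that perfect agreement gives zero error. That form, the patch-based sampling, and the function names below are assumptions for illustration, not the patent's exact formula.

```python
import math
from itertools import combinations

def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length pixel patches."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return 0.0 if denom == 0 else sum(x * y for x, y in zip(da, db)) / denom

def error_p(patches):
    """Hypothetical E_p: 1 minus the mean ZNCC over all camera pairs in C_q.

    patches holds one pixel patch per camera in C_q, sampled where the small
    plane pi_{delta_p} at depth d_p + delta_p projects into that camera.
    """
    pairs = list(combinations(patches, 2))  # |C_q| choose 2 pairs
    return 1.0 - sum(zncc(a, b) for a, b in pairs) / len(pairs)

same = [[10, 20, 30, 40], [12, 22, 32, 42], [5, 10, 15, 20]]
print(error_p(same))  # 0.0 (the patches differ only by gain and offset)
```

Because ZNCC ignores gain and offset, this error tolerates brightness differences between cameras; under this assumed form it ranges from 0 (perfect correlation) to 2 (perfect anti-correlation).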

Here, in addition to $E_p(\delta_p)$, the error functions $E_d(\delta_p)$ and $E_c(\delta_p, \delta_{p'})$ are combined:

$$E_d(\delta_p) = |\delta_p| \qquad (2)$$

$$E_c(\delta_p, \delta_{p'}) = |\delta_p - \delta_{p'}| \qquad (3)$$

The objective function of the optimization is therefore defined as follows.

Here, $P$ is the set of all pixels, $N$ is the set of connectivity (neighbor) relations between pixels, and $\lambda_p$, $\lambda_d$, and $\lambda_c$ are the weighting coefficients of the respective terms.
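The formula itself is missing from this text. Read against $P$, $N$, the three weights, and Equations (1) to (3), a plausible reconstruction (an assumption, not the patent's verbatim equation) is the weighted sum

$$E(\delta) = \lambda_p \sum_{p \in P} E_p(\delta_p) + \lambda_d \sum_{p \in P} E_d(\delta_p) + \lambda_c \sum_{(p, p') \in N} E_c(\delta_p, \delta_{p'})$$

minimized over all $\delta_p$; since photo-consistency is the reciprocal of the combined error, a lower energy means higher photo-consistency. The patent does not name a solver. The sketch below swaps in iterated conditional modes (ICM) over a few discrete candidate offsets on a single row of pixels; the labels, the tabulated data term, and the function names are hypothetical.

```python
def energy(deltas, data_cost, lam_p, lam_d, lam_c):
    """Hypothetical total energy on a 1-D pixel row (N = horizontal neighbours)."""
    e = 0.0
    for i, d in enumerate(deltas):
        e += lam_p * data_cost[i][d]  # photo-consistency term E_p
        e += lam_d * abs(d)           # E_d = |delta_p|, Equation (2)
    for i in range(len(deltas) - 1):
        e += lam_c * abs(deltas[i] - deltas[i + 1])  # E_c, Equation (3)
    return e

def icm(labels, data_cost, lam_p=1.0, lam_d=0.1, lam_c=0.5, max_sweeps=10):
    """Iterated conditional modes: greedily re-pick each delta_p until nothing changes."""
    deltas = [0] * len(data_cost)  # start from the unrefined depth d_p (label 0)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(deltas)):
            # Recomputing the full energy keeps the sketch short; a real solver
            # would evaluate only the terms that involve pixel i.
            best = min(labels, key=lambda d: energy(deltas[:i] + [d] + deltas[i + 1:],
                                                   data_cost, lam_p, lam_d, lam_c))
            if best != deltas[i]:
                deltas[i] = best
                changed = True
        if not changed:
            break
    return deltas

labels = [-1, 0, 1]  # candidate depth corrections delta_p
data_cost = [        # hypothetical E_p(delta_p) for three pixels
    {-1: 1.0, 0: 1.0, 1: 0.0},
    {-1: 1.0, 0: 1.0, 1: 0.0},
    {-1: 1.0, 0: 0.2, 1: 0.3},
]
print(icm(labels, data_cost))             # [1, 1, 1]
print(icm(labels, data_cost, lam_c=0.0))  # [1, 1, 0]
```

On its own the third pixel prefers $\delta_p = 0$, but with $\lambda_c > 0$ the smoothness term $E_c$ pulls it to agree with its neighbor. Graph cuts or belief propagation are common alternatives to ICM for energies of this form.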

Next, the image processing unit 4 performs texture mapping (step S14). The subsequent processing is the same as in the prior art (the processing shown in FIG. 4).
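For each virtual-camera pixel, the optimized depth $d_p + \delta_p$ fixes a 3D point, and a pixel value can be obtained from the input cameras that see that point. The sketch below is an assumption for illustration. It back-projects a pixel with a pinhole model and projects the point into each input camera. The input cameras are modeled, as a hypothetical simplification, as pure translations of the virtual camera with the same intrinsics. The sampled grayscale values are averaged with equal weights. The patent does not specify the blending rule, and occlusion between input cameras is ignored.

```python
def backproject(u, v, depth, f, cx, cy):
    """Pixel (u, v) at the given depth in the virtual camera -> 3D point in its coordinates."""
    return ((u - cx) * depth / f, (v - cy) * depth / f, depth)

def project(point, cam_offset, f, cx, cy):
    """Project into an input camera translated by cam_offset (same orientation and intrinsics)."""
    x, y, z = (p - o for p, o in zip(point, cam_offset))
    if z <= 0:
        return None  # the point is behind this input camera
    return round(f * x / z + cx), round(f * y / z + cy)

def render_pixel(u, v, depth, cameras, f, cx, cy):
    """Equal-weight average of the values the input cameras see at the back-projected point."""
    point = backproject(u, v, depth, f, cx, cy)
    samples = []
    for offset, image in cameras:
        uv = project(point, offset, f, cx, cy)
        if uv is None:
            continue
        pu, pv = uv
        if 0 <= pv < len(image) and 0 <= pu < len(image[0]):
            samples.append(image[pv][pu])
    return sum(samples) / len(samples) if samples else None

img_a = [[0] * 5 for _ in range(5)]
img_b = [[0] * 5 for _ in range(5)]
img_a[2][2] = 100
img_b[2][1] = 110
cameras = [((0.0, 0.0, 0.0), img_a), ((2.0, 0.0, 0.0), img_b)]
print(render_pixel(2, 2, 4.0, cameras, f=2.0, cx=2.0, cy=2.0))  # 105.0
```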

As described above, because the 3D shape is refined after the virtual camera's angle of view has been determined, only the visible region needs to be refined, based on the virtual camera's angle-of-view information; this reduces the processing load and improves computational efficiency. Moreover, arbitrary viewpoint image synthesis takes multi-viewpoint images from a multi-camera system as input, obtains the 3D shape of the object, and computes the texture of its surface, and the factors that most strongly affect the quality of the synthesized image include the accuracy and spatial resolution of the 3D shape. In this embodiment, improving the photo-consistency of the 3D shape on the basis of the virtual camera's angle-of-view information improves the quality of the synthesized image.

A program for realizing the functions of the processing unit in FIG. 1 may be recorded on a computer-readable recording medium, and arbitrary viewpoint image synthesis may be performed by having a computer system read and execute the program recorded on that medium. The “computer system” here includes an OS and hardware such as peripheral devices. It also includes a WWW system provided with a web-page providing (or display) environment. A “computer-readable recording medium” means a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. It further includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that acts as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the “transmission medium” that transmits the program is a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line. The program may realize only part of the functions described above. It may also be a so-called difference file (difference program) that realizes those functions in combination with a program already recorded in the computer system.

Although an embodiment of the present invention has been described above with reference to the drawings, the embodiment is merely an example of the present invention, and the present invention is clearly not limited to it. Components may therefore be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

Arbitrary viewpoint image synthesis based on the 3D shape of the object increases the added value of image distribution and enables an interactive viewing experience. It is also expected to enable highly realistic remote video conferencing and telework, and to contribute greatly to the creation of killer applications for image communication.

DESCRIPTION OF SYMBOLS: 11, 12, 13: imaging units; 2: image storage unit; 3: viewpoint input unit; 4: image processing unit; 5: image display unit

Claims

An image processing apparatus that generates an image of a virtual camera placed at an arbitrary viewpoint position from images of a plurality of input cameras that capture an object from different positions, the apparatus comprising:
Three-dimensional shape calculation means for calculating a three-dimensional shape of the object from the images of the plurality of input cameras;
Angle-of-view information determination means for determining angle-of-view information of the virtual camera;
Projection means for projecting the three-dimensional shape onto the image of the virtual camera based on the angle-of-view information;
Depth calculation means for calculating, for each pixel of the virtual camera image, a depth value in a virtual camera coordinate system;
Optimization processing means for optimizing the set of depth values so as to maximize photo-consistency, which is an index representing the accuracy of a three-dimensional shape; and
Image generation means for generating the image of the virtual camera by calculating the pixel values of the image of the virtual camera based on the optimized depth values of all pixels of the virtual camera.

An image processing method performed by an image processing apparatus that generates an image of a virtual camera placed at an arbitrary viewpoint position from images of a plurality of input cameras that capture an object from different positions, the method comprising:
A three-dimensional shape calculation step of calculating a three-dimensional shape of the object from the images of the plurality of input cameras;
An angle-of-view information determination step of determining angle-of-view information of the virtual camera;
A projection step of projecting the three-dimensional shape onto the image of the virtual camera based on the angle-of-view information;
A depth calculation step of calculating, for each pixel of the virtual camera image, a depth value in a virtual camera coordinate system;
An optimization processing step of optimizing the set of depth values so as to maximize photo-consistency, which is an index representing the accuracy of a three-dimensional shape; and
An image generation step of generating the image of the virtual camera by calculating the pixel values of the image of the virtual camera based on the optimized depth values of all pixels of the virtual camera.

An image processing program for causing a computer to function as the image processing apparatus according to claim 1.