JP2010250452A

JP2010250452A - Arbitrary viewpoint image synthesizing device

Info

Publication number: JP2010250452A
Application number: JP2009097580A
Authority: JP
Inventors: Takayuki Hamamoto; 隆之浜本; Tadaaki Hosaka; 忠明保坂; Keiichi Kimura; 圭一木村; Sakae Saito; 栄齋藤; Seiichi Tanaka; 誠一田中; Nao Shibuhisa; 奈保澁久; Shunichi Sato; 俊一佐藤
Original assignee: Tokyo University of Science; Sharp Corp
Current assignee: Tokyo University of Science; Sharp Corp
Priority date: 2009-04-14
Filing date: 2009-04-14
Publication date: 2010-11-04
Also published as: WO2010119852A1

Abstract

PROBLEM TO BE SOLVED: To provide an arbitrary viewpoint image synthesizing device which generates an image in which the image capturing direction and the line-of-sight direction match, by synthesizing images captured by a plurality of cameras with different image capturing directions. SOLUTION: The image from each of cameras 2A, 2B and 2C is coordinate-transformed by coordinate transformation parts 21, 22 and 23 based on parameters stored in a camera parameter storage part 26. The transformed images are passed to a distance estimation part 24 and an image synthesis part 25. The distance estimation part 24 evaluates the three images by block matching, taking two images at a time as a pair. The image synthesis part 25 synthesizes the virtual camera viewpoint image by extracting the corresponding pixel colors from the coordinate-transformed images and pasting them onto the generated distance image of the object. COPYRIGHT: (C)2011,JPO&INPIT

Description

本発明は、例えばテレビ会議システムまたはテレビ電話システムに適用可能な任意視点画像合成装置に関するものであり、表示装置の周囲に配置した複数のカメラからの被写体画像を、利用者の視点位置に応じた画像に合成する任意視点画像合成装置に関するものである。 The present invention relates to an arbitrary viewpoint image synthesis device applicable to, for example, a video conference system or a videophone system, and more specifically to an arbitrary viewpoint image synthesis device that synthesizes subject images from a plurality of cameras arranged around a display device into an image corresponding to the user's viewpoint position.

テレビ会議システムやテレビ電話システム等のテレビ対話システムにおいて動画像より得られる非言語情報は、身振り、姿勢等、様々なものがある。その中でも、視線は、発言に対する自信や相手に対する信頼の有無を示したり、相手に発言を促したりと互いの気持ちや雰囲気の伝達に大きく寄与する情報である。そのためテレビ対話システムでは、端末の利用者同士が実空間と同じように視線を正確に得られることが重要となる。 Non-linguistic information obtained from a moving image in a video dialogue system such as a video conference system or a video phone system includes a variety of gestures and postures. Among them, the line of sight is information that greatly contributes to the transmission of each other's feelings and atmosphere, such as showing confidence in the speech and the presence or absence of trust in the partner, or prompting the partner to speak. For this reason, it is important in the television dialogue system that the users of the terminals can obtain the line of sight accurately as in the real space.

しかし、現状のテレビ対話システムではこれを実現することは容易ではない。１対１の対話において互いが正面を向き合った時に視線を一致させるためには、利用者を撮影するカメラをディスプレイ中央部に配置することが必要である。この場合、正面方向を向いた画像を得ることができるが、カメラがディスプレイの画像を遮るために実用的な配置とは言えない。そのため、カメラをディスプレイ周辺部に配置することになるが、この場合、カメラの撮影方向と利用者の視線方向にズレが生じるため、正面を向いていない画像を得ることになる。この視線がズレた画像をシステムで利用した場合、対話する相手と視線を合わせることができないという問題が生じる。特にこの問題は大型ディスプレイ利用時に顕著に表れる。 However, it is not easy to realize this with the current television dialogue system. In order to match the line of sight when the front faces each other in a one-on-one dialogue, it is necessary to arrange a camera for photographing the user in the center of the display. In this case, an image facing the front direction can be obtained, but it cannot be said to be a practical arrangement because the camera blocks the image on the display. For this reason, the camera is arranged at the periphery of the display. In this case, an image that is not facing the front is obtained because of a deviation between the shooting direction of the camera and the viewing direction of the user. When an image in which the line of sight is shifted is used in the system, there arises a problem that the line of sight cannot be matched with the other party to talk with. This problem is particularly noticeable when using large displays.

前述のように視線はコミュニケーションにとって重要な情報であるため、視線の不一致は動画像を利用して対話を行う利点を大きく損なってしまう。これまで、ハードウェア及びアルゴリズムの観点から視線が一致しない問題の解決が試みられている。従来より、ハーフミラーを用いたシステムが数多く知られているが、装置がかさばる、箱を覗き込む感じ（画像が奥に見える）、カメラが奥にあり真正面にいる必要がある、等の課題があり、主流になっていない。 As described above, the line of sight is important information for communication, so a gaze mismatch greatly undermines the advantage of conversing through moving images. Attempts have been made to solve the gaze mismatch problem from both the hardware side and the algorithm side. Many systems using half mirrors are known, but they have not become mainstream because of problems such as the bulk of the device, the sensation of peering into a box (the image appears recessed), and the need for the user to be directly in front of a camera placed at the back.

このような問題を解決するために、ディスプレイ中央視点の画像を周囲のカメラから合成することで、視線一致を実現する方法が知られている（例えば、後述の特許文献１、２参照）。 In order to solve such a problem, a method of realizing line-of-sight matching by synthesizing an image of a display center viewpoint from surrounding cameras is known (for example, see Patent Documents 1 and 2 described later).

特許文献１のネットワーク会議画像処理装置においては、２台のビデオカメラ、ディスプレイ装置、画像処理部、画像合成部、視点検出部、画像表示処理部、画像作成部、表示制御部及び伝送路を有し、他のネットワーク会議画像処理装置との間で画像情報、視点情報及び画像制御情報を伝送し、この情報によって会議者の視点の位置変化に対応して伝送する画像の撮影位置及び撮影範囲を変化させ、会議者の視点の移動に対応して、その視線を一致させることができる。 The network conference image processing apparatus disclosed in Patent Document 1 includes two video cameras, a display device, an image processing unit, an image composition unit, a viewpoint detection unit, an image display processing unit, an image creation unit, a display control unit, and a transmission path. Then, image information, viewpoint information, and image control information are transmitted to and from other network conference image processing apparatuses, and the shooting position and shooting range of the image to be transmitted corresponding to the change in the position of the viewpoint of the conference party are determined based on this information. The line of sight can be matched in accordance with the movement of the viewpoint of the conference person.

また、特許文献２の画像処理装置および画像処理方法においては、ディスプレイの両側に複数のカメラを配置した撮影部で撮影された入力画像より、被写体の３次元情報を抽出し、その３次元情報と受信者の自由な視点位置の情報より再構成・表示部で被写体の出力画像を表示する。これにより利用者の視線一致を可能にし、受信者の自由な視点位置での送信者の画像を生成することができる。 Further, in the image processing apparatus and the image processing method disclosed in Patent Document 2, three-dimensional information of a subject is extracted from an input image photographed by a photographing unit in which a plurality of cameras are arranged on both sides of a display, and the three-dimensional information and The output image of the subject is displayed on the reconstruction / display unit based on the information on the free viewpoint position of the receiver. Thereby, the user's line of sight can be matched, and the image of the sender at a free viewpoint position of the receiver can be generated.

また、予め撮影した視線の一致した目の画像を対話者の顔に貼り付けるような簡易的な方法も知られている（例えば、後述の特許文献３参照）。特許文献３の撮像機能付携帯電話機とその制御方法及び制御プログラムにおいては、カメラに対して視線が一致した画像を記憶部に予め記憶しておき、カメラが撮影する通話中の通話者の画像に、予め記憶されている視線が一致した画像を、貼り付けて視線が補正された画像を作成し、視線が補正された画像を相手通話者に送信することができる。 A simple method is also known in which an image of eyes that have been photographed in advance and that matches the line of sight is pasted on the face of the conversation person (for example, see Patent Document 3 described later). In the mobile phone with an imaging function and the control method and control program of Patent Document 3, an image whose line of sight coincides with the camera is stored in advance in the storage unit, and the image of the caller during a call captured by the camera is stored. Then, it is possible to create an image in which the line of sight is corrected by pasting an image in which the line of sight stored in advance is matched, and to transmit the image in which the line of sight is corrected to the other party.

Patent Document 1: 特開平１１−３５５８０４号公報 JP 11-355804 A
Patent Document 2: 特開２００１−５２１７７号公報 JP 2001-52177 A
Patent Document 3: 特開２００５−１１７１０６号公報 JP 2005-117106 A

しかしながら、特許文献１〜２の画像処理装置では、特徴点ベースで３次元位置を推定し、空間モデルにマッピングするため、２次元画像としての画質が劣化し、自然なコミュニケーションの妨げになる、という課題があった。特に、顔または目の画質の劣化や不自然さは、視線一致等の自然なコミュニケーションに悪影響を及ぼす、という課題があった。また、ディスプレイに表示した目の位置に撮影方向と視線方向を一致させた仮想カメラを正確に設置できず、不自然な視線となる、という課題もあった。 However, in the image processing apparatuses of Patent Documents 1 and 2, since the three-dimensional position is estimated on the basis of feature points and mapped to a spatial model, the image quality as a two-dimensional image is deteriorated, which hinders natural communication. There was a problem. In particular, there has been a problem that deterioration in image quality or unnaturalness of the face or eyes adversely affects natural communication such as line-of-sight matching. In addition, there is a problem in that a virtual camera in which the shooting direction and the line-of-sight direction coincide with the position of the eyes displayed on the display cannot be accurately set, resulting in an unnatural line of sight.

また、特許文献３は、予め撮影した正面の目を貼り付けるため、目だけは正面を見ている感じはあるものの、顔全体として見た場合、不自然な画像になる、という課題があった。特に、正面の目は予め撮影した静止画のため、動画像にした場合、目の部分だけ動きが無いという不自然な画像になり、動画像には適用できない、という課題もあった。 In addition, Patent Document 3 has a problem that, since the front eyes photographed in advance are pasted, there is a feeling that only the eyes are looking at the front, but when the whole face is viewed, an unnatural image is formed. . In particular, since the front eye is a still image taken in advance, there is a problem that when it is converted into a moving image, it becomes an unnatural image in which only the eye portion does not move and cannot be applied to a moving image.

本発明は、このような事情に鑑みてなされたもので、撮影方向の異なる複数のカメラの画像を合成処理して、撮影方向と視線方向を一致させた画像を生成する任意視点画像合成装置を提供することを目的とする。 The present invention has been made in view of such circumstances, and an arbitrary viewpoint image synthesis device that generates an image in which a shooting direction and a line-of-sight direction are matched by combining images of a plurality of cameras having different shooting directions. The purpose is to provide.

本発明は、表示装置の表示部周囲の上部、右側面部、左側面部の少なくとも３箇所に設置され、被写体の画像を撮影するカメラと、前記カメラにより撮影された各画像に基づいて、被写体の視線方向にある仮想カメラから撮影された画像を合成する画像合成処理部と、を備え、
前記画像合成処理部は、前記カメラのカメラキャリブレーションによるカメラパラメータを用いてカメラ座標系から仮想カメラ座標系に変換する座標変換部と、前記座標変換部で変換した画像に対し被写体の距離モデルを推定する距離推定部と、前記被写体の距離モデルより仮想カメラ視点の画像を合成する画像合成部とを備えることを特徴とする任意視点画像合成装置である。 The present invention is an arbitrary viewpoint image synthesis device comprising: cameras that are installed at at least three locations around the display unit of a display device, namely its upper part, right side part, and left side part, and that capture images of a subject; and an image synthesis processing unit that, based on the images captured by the cameras, synthesizes an image as captured from a virtual camera located in the line-of-sight direction of the subject,
wherein the image synthesis processing unit comprises: a coordinate conversion unit that converts from the camera coordinate system to a virtual camera coordinate system using camera parameters obtained by camera calibration of the cameras; a distance estimation unit that estimates a distance model of the subject from the images converted by the coordinate conversion unit; and an image synthesis unit that synthesizes a virtual camera viewpoint image from the distance model of the subject.

ここで、前記距離推定部は、前記座標変換部で変換した画像に対し、２枚１組ずつブロックマッチングによる評価を行い、仮想平面距離を距離推定を行う範囲で変化させ、全ての距離において評価を行い、評価値がもっとも小さな距離を注目画素の距離値として求めて、被写体の距離モデルを推定することを特徴とする。 Here, the distance estimation unit evaluates the images converted by the coordinate conversion unit by block matching, two images at a time; it varies the virtual plane distance over the range in which distance estimation is performed, evaluates at every distance, takes the distance with the smallest evaluation value as the distance value of the pixel of interest, and thereby estimates the distance model of the subject.

また、前記距離推定部は、前記ブロックマッチングによる評価を変則形ブロックで行うことを特徴とする。 The distance estimation unit may perform the evaluation by the block matching using an irregular block.

また、前記画像合成部は、前記座標変換部で変換した画像から前記被写体の距離モデルに対応する画素の色を抽出して貼り付けることを特徴とする。 The image composition unit may extract and paste a color of a pixel corresponding to the distance model of the subject from the image converted by the coordinate conversion unit.

また、前記画像合成部は、前記距離値より仮想カメラの画素に対応する周囲のカメラの画素を１対１の関係で求めることを特徴とする。 In addition, the image composition unit obtains pixels of surrounding cameras corresponding to the pixels of the virtual camera from the distance value in a one-to-one relationship.

本発明によれば、上記構成とすることにより、正面画像を高画質化でき、視線一致感が向上するという効果が得られる。また、特に目の周辺領域の画質を向上させることにより、一層、視線一致感を向上することが可能となり、テレビ対話システムに適用可能な任意視点画像合成装置を実現することができる。 According to the present invention, with the above-described configuration, it is possible to obtain an effect that the front image can be improved in image quality and the line-of-sight matching feeling is improved. In particular, by improving the image quality of the peripheral region of the eyes, it is possible to further improve the line-of-sight matching feeling, and it is possible to realize an arbitrary viewpoint image composition device applicable to a television conversation system.

また、本発明によれば、前記距離推定部において、前記カメラから取得した画像を変則形ブロックでブロックマッチングを行うため、オクルージョン影響を減らすことができ、正面画像の合成結果を高画質化することができる。すなわち、視線一致感が向上するという効果が得られる。 Further, according to the present invention, the distance estimation unit performs block matching on the image acquired from the camera with an irregular block, so that the influence of occlusion can be reduced and the synthesis result of the front image can be improved in image quality. Can do. That is, the effect of improving the line-of-sight match is obtained.

本発明の実施の形態による任意視点画像合成システムの構成を示すブロック図である。 A block diagram showing the configuration of the arbitrary viewpoint image synthesis system according to an embodiment of the present invention.
図１に示した実施の形態による任意視点画像合成システムのフローチャートである。 A flowchart of the arbitrary viewpoint image synthesis system according to the embodiment shown in FIG. 1.
図１に示した実施の形態による画像合成処理部のブロック図である。 A block diagram of the image synthesis processing unit according to the embodiment shown in FIG. 1.
図３に示した画像合成処理部の詳細な処理内容を記載したブロック図である。 A block diagram describing the detailed processing of the image synthesis processing unit shown in FIG. 3.
図４に示した実施の形態による画像合成処理部のフローチャートである。 A flowchart of the image synthesis processing unit according to the embodiment shown in FIG. 4.
距離推定部の詳細なフローチャートである。 A detailed flowchart of the distance estimation unit.
ピンホールカメラモデルの説明図である。 An explanatory diagram of the pinhole camera model.
画像平面を前に出したピンホールカメラモデルの説明図である。 An explanatory diagram of the pinhole camera model with the image plane moved to the front.
世界座標系とカメラ座標系の関係の説明図である。 An explanatory diagram of the relationship between the world coordinate system and the camera coordinate system.
ピンホールカメラモデルを実際のカメラに適用した場合の問題点についての説明図であり、（ａ）は実際のカメラモデル、（ｂ）は非直交座標軸がもたらす画像歪みである。 An explanatory diagram of the problems that arise when the pinhole camera model is applied to an actual camera; (a) is the actual camera model and (b) is the image distortion caused by non-orthogonal coordinate axes.
カメラ内部変数の説明図である。 An explanatory diagram of the camera internal variables.
キャリブレーション用チェッカーパターン例の説明図である。 An explanatory diagram of an example checker pattern for calibration.
ＣＧチェッカーパターン撮影環境の説明図である。 An explanatory diagram of the CG checker pattern shooting environment.
ＣＧ撮影環境パラメータの一例を示す表である。 A table showing an example of the CG shooting environment parameters.
キャリブレーション用チェッカーパターン例の説明図である。 An explanatory diagram of an example checker pattern for calibration.
ＣＧシミュレーション用カメラパラメータの一例を示す表である。 A table showing an example of the camera parameters for CG simulation.
３眼カメラシステムの説明図である。 An explanatory diagram of the trinocular camera system.
キャリブレーション用チェッカーパターン例の説明図である。 An explanatory diagram of an example checker pattern for calibration.
３眼カメラシステムのカメラパラメータの一例を示す表である。 A table showing an example of the camera parameters of the trinocular camera system.
３眼カメラシステムの各カメラ座標系におけるカメラパラメータの説明図である。 An explanatory diagram of the camera parameters in each camera coordinate system of the trinocular camera system.
中央カメラ座標系を世界座標系としたときの左右カメラ座標系におけるカメラパラメータの説明図である。 An explanatory diagram of the camera parameters in the left and right camera coordinate systems when the center camera coordinate system is taken as the world coordinate system.
中央カメラ座標系を世界座標系としたときのカメラパラメータの一例を示す表である。 A table showing an example of the camera parameters when the center camera coordinate system is taken as the world coordinate system.
マッチングに用いるブロック形状の説明図であり、（ａ）は単眼キャリブレーションを適用した場合、（ｂ）は内部パラメータ拘束方式を適用した場合である。 An explanatory diagram of the block shapes used for matching; (a) is the case where monocular calibration is applied and (b) is the case where the internal parameter constraint method is applied.
内部パラメータ拘束を用いて推定した３眼カメラシステムのカメラパラメータの一例を示す表である。 A table showing an example of the camera parameters of the trinocular camera system estimated using the internal parameter constraint.
中央カメラ座標系を世界座標系としたときのカメラパラメータの一例を示す表である。 A table showing an example of the camera parameters when the center camera coordinate system is taken as the world coordinate system.
キャリブレーション時とシステム利用時のカメラ配置の説明図であり、（ａ）はキャリブレーション時のカメラ配置、（ｂ）はシステム利用時のカメラ配置である。 An explanatory diagram of the camera arrangement at calibration time and at system use; (a) is the camera arrangement at calibration time and (b) is the camera arrangement at system use.
カメラ視点変換のためのカメラ配置の説明図であり、（ａ）はキャリブレーション時のカメラ配置、（ｂ）はシステム利用時のカメラ配置である。 An explanatory diagram of the camera arrangement for camera viewpoint conversion; (a) is the camera arrangement at calibration time and (b) is the camera arrangement at system use.
仮想カメラと仮想平面の配置位置の説明図である。 An explanatory diagram of the placement of the virtual camera and the virtual plane.
変換行列の対応関係の説明図である。 An explanatory diagram of the correspondence between the transformation matrices.
仮想平面位置とカメラ視点変換対応箇所関係の説明図である。 An explanatory diagram of the relationship between the virtual plane position and the corresponding locations in camera viewpoint conversion.
ＴＶ対話システム概略図の説明図である。 A schematic diagram of the TV dialogue system.
マッチングに用いるブロック形状の説明図である。 An explanatory diagram of the block shapes used for matching.
周辺カメラ配置位置の説明図であり、（ａ）はカメラ配置位置、（ｂ）はカメラ配置座標である。 An explanatory diagram of the placement of the surrounding cameras; (a) shows the camera positions and (b) shows the camera placement coordinates.
ＣＧカメラ設定の一例を示す表である。 A table showing an example of the CG camera settings.
ＣＧシミュレーション用カメラパラメータの一例を示す表である。 A table showing an example of the camera parameters for CG simulation.
中央視点画像合成実験条件の一例を示す表である。 A table showing an example of the experimental conditions for center viewpoint image synthesis.
ＴＶ対話システムのカメラ配置の説明図である。 An explanatory diagram of the camera arrangement of the TV dialogue system.
ＴＶ対話システムカメラパラメータの一例を示す表である。 A table showing an example of the camera parameters of the TV dialogue system.
ＵｓｅｒＡ，Ｂの合成条件の一例を示す表である。 A table showing an example of the synthesis conditions for User A and User B.
本発明の他の実施例による任意視点画像合成システムの構成を示すブロック図である。 A block diagram showing the configuration of an arbitrary viewpoint image synthesis system according to another embodiment of the present invention.
本発明のさらに他の実施例による任意視点画像合成システムの構成を示すブロック図である。 A block diagram showing the configuration of an arbitrary viewpoint image synthesis system according to yet another embodiment of the present invention.

以下に、本発明の実施形態について、図面を参照して詳細に説明する。 Embodiments of the present invention will be described below in detail with reference to the drawings.

図１は、本発明の第１の実施形態に係る、任意視点画像合成システム１の全体構成を示す機能ブロック図である。図１に示す任意視点画像合成システム１は、画像を表示する表示装置５と、被写体６の撮影画像をもとに被写体６の視線方向と撮影方向を一致させる画像を合成する任意視点画像合成装置１０とから構成される。任意視点画像合成装置１０は、被写体６を撮影するカメラ２Ａ，２Ｂ，２Ｃと、撮影された画像から仮想カメラ視点の画像を合成する画像合成処理部３と、合成画像を出力する画像出力部４とから基本的に構成されている。 FIG. 1 is a functional block diagram showing the overall configuration of an arbitrary viewpoint image synthesis system 1 according to the first embodiment of the present invention. The arbitrary viewpoint image synthesis system 1 shown in FIG. 1 comprises a display device 5 that displays images and an arbitrary viewpoint image synthesis device 10 that, based on captured images of a subject 6, synthesizes an image in which the line-of-sight direction of the subject 6 matches the shooting direction. The arbitrary viewpoint image synthesis device 10 basically comprises cameras 2A, 2B, and 2C that photograph the subject 6, an image synthesis processing unit 3 that synthesizes a virtual camera viewpoint image from the captured images, and an image output unit 4 that outputs the synthesized image.

図１に示すように、カメラ２Ａは表示装置５の中央上部に、カメラ２Ｂは表示装置５の側面左下部に、カメラ２Ｃは表示装置５の側面右下部に装着されている。このように表示装置５の表示部分の周囲３箇所に配置されたカメラ２Ａ，２Ｂ，２Ｃにより、被写体６を同時に撮影する。 As shown in FIG. 1, the camera 2 A is mounted on the upper center of the display device 5, the camera 2 B is mounted on the lower left side of the display device 5, and the camera 2 C is mounted on the lower right side of the display device 5. Thus, the subject 6 is simultaneously photographed by the cameras 2A, 2B, and 2C arranged at the three positions around the display portion of the display device 5.

図２に任意視点画像合成システム１の処理手順のフローチャートを示す。 FIG. 2 shows a flowchart of the processing procedure of the arbitrary viewpoint image synthesis system 1.
まず、カメラ２Ａ，２Ｂ，２Ｃにより被写体６を撮影し画像を取得する（ステップＳ２１：以降ステップを省略する）。次に画像合成処理部３により、撮影した３つの画像から、撮影方向と視線方向を一致させた仮想カメラの視点の画像を合成する（Ｓ２２）。合成した画像を画像出力部４から表示装置５へ出力し（Ｓ２３）、表示装置５に仮想カメラ視点の画像を表示する（Ｓ２４）。 First, the subject 6 is photographed by the cameras 2A, 2B, and 2C to acquire images (step S21; the word "step" is omitted hereinafter). Next, the image synthesis processing unit 3 synthesizes, from the three captured images, an image from the viewpoint of a virtual camera whose shooting direction matches the line-of-sight direction (S22). The synthesized image is output from the image output unit 4 to the display device 5 (S23), and the virtual camera viewpoint image is displayed on the display device 5 (S24).

次に、画像合成処理部３の詳細構成を図３に示す。 Next, FIG. 3 shows the detailed configuration of the image synthesis processing unit 3.
画像合成処理部３は、カメラ２Ａ，２Ｂ，２Ｃにより撮影された画像をカメラ座標系から仮想カメラ座標系に変換する座標変換部２１，２２，２３と、仮想カメラ座標系に変換した画像に対し被写体の距離画像（距離モデル）を推定する距離推定部２４と、推定した距離モデルより仮想カメラ視点の画像を合成する画像合成部２５と、座標変換に用いるカメラのパラメータを記憶するカメラパラメータ記憶部２６とから基本的に構成されている。 The image synthesis processing unit 3 basically comprises coordinate conversion units 21, 22, and 23 that convert the images captured by the cameras 2A, 2B, and 2C from the camera coordinate system to the virtual camera coordinate system, a distance estimation unit 24 that estimates a distance image (distance model) of the subject from the images converted to the virtual camera coordinate system, an image synthesis unit 25 that synthesizes a virtual camera viewpoint image from the estimated distance model, and a camera parameter storage unit 26 that stores the camera parameters used for the coordinate conversion.

さらに、画像合成処理部３の各部の詳細な処理関係を図４に示す。 Furthermore, FIG. 4 shows the detailed processing relationships among the parts of the image synthesis processing unit 3.
カメラパラメータ記憶部２６に記憶したパラメータに基づいて、座標変換部２１，２２，２３では、各カメラ２Ａ，２Ｂ，２Ｃからの画像Ｉ_１，Ｉ_２，Ｉ_３から座標変換した結果の画像Ｉ_{１ｃｏｎｖ}，Ｉ_{２ｃｏｎｖ}，Ｉ_{３ｃｏｎｖ}を、距離推定部２４、及び画像合成部２５へ渡す。距離推定部２４では、３枚の画像Ｉ_{１ｃｏｎｖ}，Ｉ_{２ｃｏｎｖ}，Ｉ_{３ｃｏｎｖ}に対し、２枚１組ずつブロックマッチングによる評価を行う。ここでは、仮想平面距離ｚを距離推定を行う範囲で変化させ、全ての距離において評価を行う。このとき評価値がもっとも小さな距離を注目画素（ｉ，ｊ）での距離値ｄ（ｉ，ｊ）とする。この操作を全ての画素に対して行うことで、被写体の距離画像を推定できる（この処理の詳しい説明は後述する）。 Based on the parameters stored in the camera parameter storage unit 26, the coordinate conversion units 21, 22, and 23 coordinate-convert the images I_1, I_2, and I_3 from the cameras 2A, 2B, and 2C and pass the resulting images I_{1conv}, I_{2conv}, and I_{3conv} to the distance estimation unit 24 and the image synthesis unit 25. The distance estimation unit 24 evaluates the three images I_{1conv}, I_{2conv}, and I_{3conv} by block matching, two images at a time. Here, the virtual plane distance z is varied over the range in which distance estimation is performed, and the evaluation is carried out at every distance. The distance with the smallest evaluation value is taken as the distance value d(i, j) at the pixel of interest (i, j). Performing this operation for all pixels yields an estimate of the distance image of the subject (this processing is described in detail later).

次に画像合成部２５では、作成した被写体の距離画像に対し、座標変換画像Ｉ_{１ｃｏｎｖ}，Ｉ_{２ｃｏｎｖ}，Ｉ_{３ｃｏｎｖ}から対応する画素の色を抽出して貼り付ける。被写体６の距離情報が得られている場合、幾何学計算で仮想カメラの画素に対応する周囲のカメラの画素を１対１の関係で求めることができる。この操作を仮想カメラの全ての画素に対して行うことで仮想カメラ視点の画像を合成できる（この処理の詳しい説明は後述する）。 Next, the image synthesis unit 25 extracts the colors of the corresponding pixels from the coordinate-converted images I_{1conv}, I_{2conv}, and I_{3conv} and pastes them onto the generated distance image of the subject. When the distance information of the subject 6 is available, the pixels of the surrounding cameras that correspond to each pixel of the virtual camera can be found in a one-to-one relationship by geometric calculation. Performing this operation for all pixels of the virtual camera synthesizes the virtual camera viewpoint image (this processing is described in detail later).
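The geometric lookup described above can be illustrated with a minimal Python sketch. It is not the device's actual procedure: it assumes an ideal pinhole camera with focal length `f`, image coordinates with the origin at the image center, a surrounding camera that shares the virtual camera's orientation (rotation is omitted), and nearest-neighbor sampling. The names `corresponding_pixel`, `synthesize`, and `cam_pos` are hypothetical.

```python
def corresponding_pixel(x, y, d, f, cam_pos):
    """Map virtual-camera pixel (x, y) with distance d to a surrounding camera.

    Assumption: both cameras are ideal pinhole cameras with focal length f and
    the same orientation; cam_pos is the surrounding camera's position in the
    virtual camera coordinate system.
    """
    # Back-project the pixel to a 3D point in virtual camera coordinates.
    X, Y, Z = x * d / f, y * d / f, d
    # Express the point relative to the surrounding camera.
    cx, cy, cz = cam_pos
    Xc, Yc, Zc = X - cx, Y - cy, Z - cz
    # Central projection into the surrounding camera's image plane.
    return f * Xc / Zc, f * Yc / Zc


def synthesize(distance, image, f, cam_pos, fill=0):
    """Paste colors from one surrounding camera onto the distance image."""
    h, w = len(distance), len(distance[0])
    out = [[fill] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            x, y = j - w // 2, i - h // 2  # image-center origin
            u, v = corresponding_pixel(x, y, distance[i][j], f, cam_pos)
            sj, si = round(u) + w // 2, round(v) + h // 2
            if 0 <= si < h and 0 <= sj < w:  # skip pixels that fall outside
                out[i][j] = image[si][sj]
    return out


# A camera placed at the virtual camera position reproduces its own image.
image = [[10 * r + c for c in range(4)] for r in range(4)]
distance = [[10] * 4 for _ in range(4)]
same_view = synthesize(distance, image, f=5, cam_pos=(0, 0, 0))
```

A real system would also apply each camera's rotation and intrinsic parameters from calibration, and would need a rule for choosing among the three cameras when more than one sees the point.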

図５に画像合成処理部３のフローチャートを示す。 FIG. 5 shows a flowchart of the image synthesis processing unit 3.
図５において、座標変換部２１，２２，２３は、カメラ２Ａ、カメラ２Ｂ、カメラ２Ｃから画像Ｉ_１，Ｉ_２，Ｉ_３を取得し（Ｓ５１）、カメラ２Ａ、カメラ２Ｂ、カメラ２Ｃから取得したカメラ座標系の画像Ｉ_１，Ｉ_２，Ｉ_３を仮想カメラ座標系の画像Ｉ_{１ｃｏｎｖ}，Ｉ_{２ｃｏｎｖ}，Ｉ_{３ｃｏｎｖ}に変換し、距離推定部２４と画像合成部２５に送る（Ｓ５２）。 In FIG. 5, the coordinate conversion units 21, 22, and 23 acquire the images I_1, I_2, and I_3 from the cameras 2A, 2B, and 2C (S51), convert these camera coordinate system images I_1, I_2, and I_3 into the virtual camera coordinate system images I_{1conv}, I_{2conv}, and I_{3conv}, and send them to the distance estimation unit 24 and the image synthesis unit 25 (S52).

次に、距離推定部２４は、仮想カメラ座標系に変換した画像Ｉ_{１ｃｏｎｖ}，Ｉ_{２ｃｏｎｖ}，Ｉ_{３ｃｏｎｖ}に対し、仮想平面距離ｚを距離推定を行う範囲で変化させ、全ての距離において２枚１組ずつブロックマッチングし、評価値がもっとも小さな距離を注目画素の距離値として距離画像ｄを推定する（Ｓ５３）。 Next, for the images I_{1conv}, I_{2conv}, and I_{3conv} converted to the virtual camera coordinate system, the distance estimation unit 24 varies the virtual plane distance z over the range in which distance estimation is performed, performs block matching on two images at a time at every distance, and estimates the distance image d by taking the distance with the smallest evaluation value as the distance value of the pixel of interest (S53).

最後に、画像合成部２５が、作成した被写体の距離画像ｄに対応する画素の色を画像Ｉ_１，Ｉ_２，Ｉ_３から抽出して距離画像ｄに貼り付け、仮想カメラ視点の画像を合成する（Ｓ５４）。 Finally, the image synthesis unit 25 extracts from the images I_1, I_2, and I_3 the colors of the pixels corresponding to the generated distance image d of the subject, pastes them onto the distance image d, and synthesizes the virtual camera viewpoint image (S54).

図６にＳ５３の距離推定部２４で行う距離推定の手法のより詳細なフローチャートを示す。 FIG. 6 shows a more detailed flowchart of the distance estimation method performed by the distance estimation unit 24 in S53.
距離推定の手法では、画像Ｉ_{１ｃｏｎｖ}，Ｉ_{２ｃｏｎｖ}，Ｉ_{３ｃｏｎｖ}の仮想平面距離ｚを距離推定を行う範囲で変更し（Ｓ６１）、２枚１組ずつでブロックマッチングを行う（Ｓ６２）。次に距離画像推定のための評価値Ｉ_{ＳＵＭＡＬＬ}（ｉ，ｊ）について計算を行う（Ｓ６３）。評価値Ｉ_{ＳＵＭＡＬＬ}（ｉ，ｊ）の詳細については後述する。仮想平面距離ｚを、距離推定を行う全ての範囲で変化させたかを確認する（Ｓ６４）。全ての範囲で変化させていない場合、Ｓ６１に戻る。全ての範囲で完了した場合、距離画像推定を行い、距離画像ｄを出力する（Ｓ６５）。距離画像ｄの詳細については後述する。 In the distance estimation method, the virtual plane distance z of the images I_{1conv}, I_{2conv}, and I_{3conv} is changed within the range in which distance estimation is performed (S61), and block matching is performed on two images at a time (S62). Next, the evaluation value I_{SUMALL}(i, j) for distance image estimation is calculated (S63). The evaluation value I_{SUMALL}(i, j) is described in detail later. It is then checked whether the virtual plane distance z has been varied over the entire range in which distance estimation is performed (S64). If not, the process returns to S61. Once the entire range has been covered, distance image estimation is performed and the distance image d is output (S65). The distance image d is described in detail later.
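The loop S61 to S65 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the method as implemented: the evaluation value is taken to be a sum of absolute differences (SAD) accumulated over all image pairs, the block is a plain square (the text uses an irregular block), images are grayscale lists of lists, and `warp_to_virtual` is a hypothetical callable that returns the three coordinate-converted images for a given virtual plane distance z.

```python
from itertools import combinations


def block_sad(img_a, img_b, i, j, half):
    """SAD over a square block centered on (i, j); out-of-image pixels are skipped."""
    h, w = len(img_a), len(img_a[0])
    total = 0
    for di in range(-half, half + 1):
        for dj in range(-half, half + 1):
            y, x = i + di, j + dj
            if 0 <= y < h and 0 <= x < w:
                total += abs(img_a[y][x] - img_b[y][x])
    return total


def estimate_distance_image(warp_to_virtual, z_candidates, half=1):
    """For each pixel, keep the candidate z whose summed pairwise cost is smallest."""
    best_cost, dist = None, None
    for z in z_candidates:                      # S61: change virtual plane distance
        images = warp_to_virtual(z)
        h, w = len(images[0]), len(images[0][0])
        if best_cost is None:
            best_cost = [[float("inf")] * w for _ in range(h)]
            dist = [[None] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                # S62/S63: match every pair of images and sum the block costs
                cost = sum(block_sad(a, b, i, j, half)
                           for a, b in combinations(images, 2))
                if cost < best_cost[i][j]:      # keep the smallest evaluation value
                    best_cost[i][j] = cost
                    dist[i][j] = z
    return dist                                 # S65: distance image d


# Synthetic check: the three images agree only when z == 2.
base = [[(3 * x + 7 * y) % 11 for x in range(8)] for y in range(6)]


def shift(img, s):
    w = len(img[0])
    return [[row[(x + s) % w] for x in range(w)] for row in img]


def fake_warp(z):
    k = z - 2
    return [shift(base, -k), base, shift(base, k)]


d = estimate_distance_image(fake_warp, [1, 2, 3])  # every pixel selects z = 2
```

The exhaustive loop over z mirrors the flowchart; a practical implementation would vectorize the cost computation and handle occluded regions, which is what the irregular block shape in the text is meant to address.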

以下、任意視点画像合成装置１０に係る処理の詳細について述べる。 The details of the processing related to the arbitrary viewpoint image composition device 10 will be described below.

［カメラキャリブレーションの説明］ [Explanation of camera calibration]
まず初めに、カメラキャリブレーションについて説明する。これは画像合成処理部３の座標変換部２１，２２，２３とカメラパラメータ記憶部２６に関するものである。 First, camera calibration will be described. This relates to the coordinate conversion units 21, 22, and 23 and the camera parameter storage unit 26 of the image synthesis processing unit 3.
同一のシーンを同じ型番のカメラを用いて撮影したとしても、その取得画像はレンズ歪みや画像中心のずれなど各カメラの個体差の影響を受けるため、同一の画像を取得することはできない。そのため常に同一の条件で画像処理を行うためには個体差の補正を行わなければならない。また三角測量の原理に基づくステレオマッチングなど複数のカメラから取得した画像を用いて処理を行う場合、カメラの個体差の補正だけでなく空間的な位置関係を把握する必要がある。カメラキャリブレーションとは、このカメラ固有の個体差を示す内部変数とカメラの空間的位置を示す外部変数を求めるプロセスのことである。これにはいくつかの手法があり、形が既知のパターンを置きそれを観察する手法と、形が未知の世界を観察する手法に分類される。前者の代表的な手法として、本発明ではＺｈａｎｇの手法を用いた。まずカメラモデルおよびカメラパラメータについて説明を行い、単眼カメラキャリブレーションの手法であるＺｈａｎｇの手法について述べる。次にその手法の本任意視点画像合成装置１０で用いる多眼撮像装置におけるキャリブレーションへの適用について述べる。 Even if the same scene is shot with cameras of the same model, the acquired images are affected by the individual differences of each camera, such as lens distortion and image center shift, so identical images cannot be obtained. To always perform image processing under the same conditions, these individual differences must therefore be corrected. In addition, when processing uses images acquired from a plurality of cameras, as in stereo matching based on the principle of triangulation, it is necessary not only to correct the individual differences of the cameras but also to know their spatial positional relationship. Camera calibration is the process of obtaining the internal variables that represent a camera's individual differences and the external variables that represent the camera's spatial position. There are several methods, which can be classified into those that place and observe a pattern of known shape and those that observe a world of unknown shape. The present invention uses Zhang's method, a representative technique of the former kind. The camera model and camera parameters are explained first, followed by Zhang's method, which is a monocular camera calibration method. The application of that method to calibration of the multi-view imaging apparatus used in the arbitrary viewpoint image synthesis device 10 is then described.

［カメラモデルとカメラパラメータ：カメラモデルと座標系の説明］ [Camera model and camera parameters: description of camera model and coordinate system]
我々は、物体に反射した光が目のレンズを通り網膜に届くことで被写体を見ている。カメラにおいても網膜の役目をＣＣＤやＣＭＯＳセンサが行っている点を除き同様に考えることができる。この機構を簡単なモデルで表したものが図７に示すピンホールカメラモデルである。ピンホールカメラは画像平面３１とピンホール３２により構成され、ピンホール３２とはピンホール平面３３の中央部に設けられた微小な穴である。ピンホール平面３３とは、ピンホール３２を通る光以外は全て遮断する仮想平面のことである。 We see a subject when light reflected by an object passes through the lens of the eye and reaches the retina. A camera can be thought of in the same way, except that a CCD or CMOS sensor plays the role of the retina. The pinhole camera model shown in FIG. 7 represents this mechanism as a simple model. The pinhole camera consists of an image plane 31 and a pinhole 32, where the pinhole 32 is a minute hole at the center of the pinhole plane 33. The pinhole plane 33 is a virtual plane that blocks all light except the light passing through the pinhole 32.

画像平面３１は網膜やＣＭＯＳセンサに対応するものである。ここでｆはカメラの焦点距離、ｚはカメラ（ピンホール３２）と被写体３４の距離、ｙは被写体３４の大きさ、ｙ_{ｓｃｒｅｅｎ}は画像平面３１における被写体３４の投影像３５の大きさをそれぞれ示す。このピンホールカメラモデルにおいては画像平面３１に至る光はすべてピンホール３２を通過し、画像平面３１と交差した位置で像を結ぶことになる。この様な射影を中心射影と呼ぶ。また、このとき図７より、次の（１）式を得る。 The image plane 31 corresponds to the retina or a CMOS sensor. Here, f is the focal length of the camera, z is the distance between the camera (pinhole 32) and the subject 34, y is the size of the subject 34, and y_{screen} is the size of the projected image 35 of the subject 34 on the image plane 31. In this pinhole camera model, all light reaching the image plane 31 passes through the pinhole 32 and forms an image where it intersects the image plane 31. Such a projection is called a central projection. From FIG. 7, the following equation (1) is then obtained.
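Equation (1) is not reproduced in this text. A reconstruction from similar triangles, using only the symbols f, z, y, and y_{screen} defined above, is given here as an assumption about its form; the minus sign reflects the inverted image in the pinhole model of FIG. 7.

```latex
-y_{screen} = f \, \frac{y}{z} \tag{1}
```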

This equation shows that the image appears upside down on the image plane 31. Because the inverted projection is unintuitive to work with, the model of FIG. 8, in which the image plane 31 and the pinhole plane 33 of FIG. 7 are interchanged, is often used instead. A coordinate system defined with the camera at its center in this way is called a camera coordinate system. The image coordinate system has its origin at the image center, and its x-axis and y-axis are aligned with the arrangement axes of the camera's sensor elements. The camera coordinate system in three-dimensional space has its origin at point C and the optical axis as its Z-axis, with the X-axis and Y-axis arranged to form a right-handed system. The pinhole 32 then becomes the projection center C, and all light rays from the subject 34 travel toward this projection center C. In FIG. 8, the following equation (2) holds, which shows that the image appears on the image plane without being inverted.

The relationship between a point m (x_screen, y_screen) in the image coordinate system and a point M (X, Y, Z) in the three-dimensional camera coordinate system is then expressed as equation (3).
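The central projection described above can be sketched in a few lines of Python. The equations themselves are not reproduced in this text, so the sketch assumes the standard front-image-plane form of FIG. 8, x_screen = f X / Z and y_screen = f Y / Z. The function name `project_pinhole` and the sample values are illustrative only.

```python
def project_pinhole(X, Y, Z, f):
    """Central projection of a camera-frame point (X, Y, Z) onto the
    front image plane at focal length f (assumed form of equation (3))."""
    if Z <= 0:
        raise ValueError("the point must lie in front of the camera (Z > 0)")
    return f * X / Z, f * Y / Z


# Illustrative values only: a point 2 m away seen with a 50 mm focal length.
x_s, y_s = project_pinhole(0.2, 0.1, 2.0, f=0.05)
```

Because the image plane sits in front of the projection center in this model, the signs of x_screen and y_screen follow those of X and Y, which is the non-inverted behavior the text attributes to FIG. 8.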

[Camera model and camera parameters: description of external parameters]
Expressed linearly, equation (3) can be written as equation (4).

Here, s is a scalar, and m and M are written in homogeneous coordinates, as defined by the following equations (5) and (6), respectively.

P is the projection matrix of the central projection and is expressed by the following equation (7).

In FIGS. 7 and 8, points in three-dimensional space were described in the camera coordinate system, but in practice the world coordinate system is often used. FIG. 9 shows the relationship between the world coordinate system and the camera coordinate system. The subscript w on a coordinate axis denotes the world coordinate system, and c denotes the camera coordinate system. As shown in FIG. 9, the world coordinate system is converted to the camera coordinate system by rotating it and then translating it. The rotation is expressed by the 3 × 3 rotation matrix R of the following equation (8), and the translation by the 3 × 1 translation vector t of the following equation (9).

Here, θ_X, θ_Y, and θ_Z are as given in the following equations (10), (11), and (12), and represent rotations about the respective coordinate axes. The direction of rotation follows the right-handed convention. Conversely, the rotation about each coordinate axis can also be obtained from the rotation matrix R.

Using equations (8) and (9), the geometric relationship between a point M_C in the camera coordinate system and the corresponding coordinates M_W in the world coordinate system is expressed by the following equation (13).

R and t together are called the external parameters of the camera.
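As an illustration of the external parameters, the following Python sketch builds a rotation matrix from three axis angles and applies the world-to-camera transform in the form M_C = R M_W + t. The composition order R = R_Z R_Y R_X and the helper names are assumptions, since equations (8) to (13) are not reproduced in this text.

```python
import numpy as np


def rotation_xyz(theta_x, theta_y, theta_z):
    """Right-handed rotations about X, Y, Z, composed as Rz @ Ry @ Rx
    (the composition order is an assumption)."""
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx


def world_to_camera(M_w, R, t):
    """Rotate, then translate: M_c = R M_w + t."""
    return R @ np.asarray(M_w, dtype=float) + np.asarray(t, dtype=float)


# Illustrative values: a 90 degree rotation about Z and a shift of 5 along Z.
R_demo = rotation_xyz(0.0, 0.0, np.pi / 2)
M_c = world_to_camera([1.0, 0.0, 0.0], R_demo, [0.0, 0.0, 5.0])
```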

[Camera model and camera parameters: description of internal parameters]
The above described the definition of the ideal camera model and the camera coordinate system, but an actual camera cannot be fully represented by this simple model. When the lens 41 is attached to the actual camera 2 as shown in FIG. 10(a), the optical axis Z1 passing through the center of the sensor 42 does not necessarily coincide with the optical axis Z0 of the lens, because the sensor 42 may be tilted when it is attached with the adhesive 43. When the pixels of the sensor 42 are not square, the two coordinate axes of the image may have different scales. Furthermore, as shown in FIG. 10(b), the two coordinate axes of the actual image are not necessarily orthogonal, so distortion 43 may occur in the image.

As shown in FIG. 11, the following new parameters are introduced: u_0 and v_0, the offset between the point where the optical axis intersects the sensor surface and the image center coordinates; θ, the angle between the two coordinate axes of the actual image; and k_u and k_v, the unit lengths of the two coordinate axes. The coordinate system c-xy has the image center c as its origin, and its x-axis and y-axis have the same scale. The coordinate system o-uv is the image coordinate system. If the coordinates in the coordinate systems o-uv and c-xy are written as m = [u, v]^T and m_s = [x, y]^T, respectively, the following equation (14) holds.

Here, H is given by the following equation (15).

From equations (4) and (14), equation (16) holds.

Here, P_M is given by the following equation (17).

Replacing f k_u and f k_v with α_u and α_v, respectively, equation (17) becomes the following equation (18).

Here, A and P_N are the matrices given by the following equations (19) and (20).

The matrix A is composed of variables internal to the camera, and these are called the internal parameters of the camera.
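For orientation, a form of the internal parameter matrix that is commonly used with this parameterization (focal scales α_u and α_v, axis angle θ, image center (u_0, v_0)) is shown below. It is given as an assumption, because equations (14) to (20) are not reproduced in this text and the patent's own matrix may be arranged differently.

```latex
A =
\begin{bmatrix}
\alpha_u & -\alpha_u \cot\theta & u_0 \\
0 & \alpha_v / \sin\theta & v_0 \\
0 & 0 & 1
\end{bmatrix},
\qquad
s\,\tilde{m} = A \,[\, R \mid t \,]\, \tilde{M}
```

With θ = 90° the off-diagonal term vanishes and A reduces to an upper-triangular matrix containing only α_u, α_v, u_0, and v_0.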

[Monocular camera calibration: description of the calibration method using a known pattern]
Calibration using a known pattern estimates the internal and external parameters of a camera by observing an object whose three-dimensional shape is known and comparing it with its two-dimensional projected image. The principle is described below.

From equations (4), (15), and (18), the geometric correspondence between a point m in the image coordinate system of the camera and a point M in the world coordinate system is expressed by the following equation (21).

Here, P is given by the following equation (22).

The projection matrix P is a 3 × 4 matrix determined by 11 variables, counting internal and external variables together. Expanding equation (21) for one world-coordinate point and its projection onto the image coordinates gives the following equation (23).

Rearranging equation (23) gives the following equation (24).

When n corresponding points are obtained from one image, stacking equation (24) for all of them gives the following equation (25).

B is a 2n × 12 matrix. If n ≥ 6, the 11 variables of the projection matrix P can be obtained as the eigenvector of B^T B corresponding to its smallest eigenvalue. Substituting equations (8), (9), and (19) into equation (21) gives the following equation (26).

Since the object is always in front of the camera, the following equation (27) is obtained.

Evaluating the above equations gives the following equation (28).

In this way, each camera parameter can be estimated by acquiring six or more known three-dimensional points. These camera parameters are stored in the camera parameter storage unit 26.
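The linear estimation of P from six or more known points can be sketched as follows. This is a minimal illustration, not the patent's implementation: the row layout of B is the usual direct linear transform arrangement and is assumed here, and the smallest-eigenvalue eigenvector of B^T B is taken as the last right singular vector of B, which is the same vector computed in a numerically gentler way. All names and sample values are hypothetical.

```python
import numpy as np


def estimate_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 projection matrix P (up to scale) from n >= 6 point pairs."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        # Two linear equations in the 12 entries of P per correspondence.
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    B = np.asarray(rows, dtype=float)  # shape (2n, 12)
    # Last right singular vector of B = eigenvector of B^T B with the
    # smallest eigenvalue.
    p = np.linalg.svd(B)[2][-1]
    return p.reshape(3, 4)


def project(P, pts):
    """Project 3-D points with P and normalize the homogeneous result."""
    homog = np.c_[pts, np.ones(len(pts))] @ P.T
    return homog[:, :2] / homog[:, 2:3]


# Synthetic check with made-up parameters (pixels and metres).
A_demo = np.array([[800.0, 0.0, 511.5], [0.0, 800.0, 383.5], [0.0, 0.0, 1.0]])
P_true = A_demo @ np.hstack([np.eye(3), [[0.0], [0.0], [1.7]]])
world = np.array([[-0.5, -0.4, 0.0], [0.5, -0.4, 0.2], [0.5, 0.4, -0.2],
                  [-0.5, 0.4, 0.3], [0.0, 0.0, -0.3], [0.2, -0.1, 0.1],
                  [-0.3, 0.25, -0.1], [0.1, 0.3, 0.25]])
image = project(P_true, world)
P_est = estimate_projection_matrix(world, image)
reprojection_error = np.abs(project(P_est, world) - image).max()
```

The estimate is defined only up to scale, so the check compares reprojected points rather than matrix entries. The known points must not all lie in one plane, otherwise B loses rank and P is not determined.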

[Monocular camera calibration: description of Zhang's camera calibration method]
Zhang's camera calibration estimates camera parameters from feature points in images obtained by photographing a planar pattern 51, such as that shown in FIG. 12, from multiple directions. Because only a camera and the planar pattern 51 are needed, calibration can be performed with simpler equipment than methods that use a three-dimensional object. The principle is described below.

The relationship between a point M (X, Y, Z) in three-dimensional space and a point m (x, y) on the image plane is expressed by equation (4) above. If the plane 51 carrying the calibration checker pattern is taken to be Z = 0, the following equation (29) is obtained.

The correspondence between points on the checker pattern and points on the image can therefore be represented by the 3 × 3 matrix of the following equation (30). This matrix is called a homography matrix.

Equation (30) is written in the form of the following equation (31).

By the properties of the rotation matrix R, r_1 and r_2 are mutually orthogonal unit vectors, which yields the constraints of the following equations (32) and (33).

Here, A^{-T} = (A^T)^{-1}.

Since each homography matrix H = [h_1 h_2 h_3] yields two equations, the six unknowns can be determined by obtaining three or more homography matrices. The calculation proceeds as follows. First, A^{-T} A^{-1} is symmetric and is expressed by the following equation (34).

A vector b of the elements of B is then defined by the following equation (35).

From this, the following equation (36) holds.

Equations (32) and (33) can therefore be expressed as the following equation (37).

When n images have been obtained, stacking the n instances of the above equation vertically gives equation (38).

V is a 2n × 6 matrix. Accordingly, b is obtained as the eigenvector of V^T V corresponding to its smallest eigenvalue. Once B has been determined from b, the internal parameters of the camera are obtained from B = λ A^{-T} A^{-1} by equation (39).

Once A has been obtained, the external parameters are also obtained from equation (30) as the following equation (40).
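The steps from the homographies to A and then to [R|t] can be sketched as below. Equations (34) to (40) are not reproduced in this text, so the sketch follows the closed-form solution of Zhang's published method, with b ordered as [B11, B12, B22, B13, B23, B33]; that ordering, the column scaling added for numerical conditioning, and all function names are assumptions. The homographies are synthesized from made-up parameters rather than detected from images.

```python
import numpy as np


def v_ij(H, i, j):
    """Row v_ij with h_i^T B h_j = v_ij . b (columns of H are h_1, h_2, h_3)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0], hi[0] * hj[1] + hi[1] * hj[0], hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2], hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])


def intrinsics_from_homographies(Hs):
    """Solve V b = 0 for three or more homographies and recover A in closed form."""
    V = np.array([row for H in Hs
                  for row in (v_ij(H, 0, 1), v_ij(H, 0, 0) - v_ij(H, 1, 1))])
    scale = np.linalg.norm(V, axis=0)            # conditioning only
    b = np.linalg.svd(V / scale)[2][-1] / scale  # null vector of V
    if b[0] < 0:                                 # B11 must be positive
        b = -b
    B11, B12, B22, B13, B23, B33 = b
    v0 = (B12 * B13 - B11 * B23) / (B11 * B22 - B12 ** 2)
    lam = B33 - (B13 ** 2 + v0 * (B12 * B13 - B11 * B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11 * B22 - B12 ** 2))
    gamma = -B12 * alpha ** 2 * beta / lam
    u0 = gamma * v0 / beta - B13 * alpha ** 2 / lam
    return np.array([[alpha, gamma, u0], [0.0, beta, v0], [0.0, 0.0, 1.0]])


def extrinsics_from_homography(A, H):
    """Recover R = [r1 r2 r1 x r2] and t from H = A [r1 r2 t] (up to scale)."""
    A_inv = np.linalg.inv(A)
    lam = 1.0 / np.linalg.norm(A_inv @ H[:, 0])
    if (A_inv @ H[:, 2])[2] < 0:                 # keep the pattern in front of the camera
        lam = -lam
    r1, r2, t = lam * (A_inv @ H[:, 0]), lam * (A_inv @ H[:, 1]), lam * (A_inv @ H[:, 2])
    return np.column_stack([r1, r2, np.cross(r1, r2)]), t


def rot(rx, ry):
    """Rotation about X followed by rotation about Y (illustrative poses only)."""
    cx, sx, cy, sy = np.cos(rx), np.sin(rx), np.cos(ry), np.sin(ry)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    return Ry @ Rx


A_true = np.array([[800.0, 0.0, 511.5], [0.0, 820.0, 383.5], [0.0, 0.0, 1.0]])
poses = [(rot(0.3, 0.0), [0.1, 0.0, 1.7]), (rot(0.0, -0.4), [0.0, 0.1, 1.5]),
         (rot(0.25, 0.35), [-0.1, 0.05, 1.9]), (rot(-0.3, 0.2), [0.05, -0.1, 1.6])]
Hs = [A_true @ np.column_stack([R[:, 0], R[:, 1], t]) for R, t in poses]
A_est = intrinsics_from_homographies(Hs)
R_est, t_est = extrinsics_from_homography(A_est, Hs[0])
```

With noisy homographies the recovered R is generally not exactly orthonormal, and a practical implementation would typically re-orthogonalize it and refine all parameters by nonlinear minimization of the reprojection error.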

Ideal images of the kind that would be photographed in the environment of FIG. 13 were created by CG, and camera calibration was performed by the Zhang method. The distance from the camera 2-1 to the checker pattern plane 51 was set to 1700 mm. FIG. 14 shows an example of the parameters of this photographing environment. In this environment, 13 images were acquired while changing the angle of the checker pattern. As examples, FIG. 15(a) shows the checker pattern directly facing the camera, and FIGS. 15(b) to 15(f) show it with the angle changed little by little. In FIG. 15, (x, y, z) indicates by how many degrees the checker pattern was rotated about each axis.

Camera calibration by the Zhang method was performed using 13 images with different rotation directions, as shown in FIG. 15. An OpenGL library was used for parameter estimation, and its output is the external parameters of the camera with respect to world coordinates whose origin is the upper-left corner of the checker pattern. FIG. 16 shows the internal and external parameters estimated in this way. The external parameters are those calculated for the image of FIG. 15(a). For an ideal camera with a resolution of 1024 × 768 pixels, the image center (u_0, v_0) is (511.5, 383.5), so it can be confirmed that the image center in the internal parameters was calculated well. It can also be confirmed from the external parameters that there is almost no rotation and that the checker pattern and the camera directly face each other at a distance of approximately 1700 mm.

[Description of multi-camera calibration]
When determining the positional relationship of two or more cameras, all cameras must share a common world coordinate system, so their external parameters must be estimated simultaneously. Consider a three-camera system as in FIG. 17, in which two cameras 2-2 and 2-3 are added to the configuration of FIG. 13. In this system, a plurality of scenes with different angles of the checker pattern 51 are photographed simultaneously by the three cameras (cameras 2-1, 2-2, 2-3), and camera calibration is performed using the resulting image group. Examples of images taken by the left camera 2-3 are shown in FIGS. 18(a), (c), and (e), and examples taken by the right camera 2-2 in FIGS. 18(b), (d), and (f). (x, y, z) indicates by how many degrees the checker pattern 51 was rotated about each axis, and images with the same rotation angles were acquired simultaneously from the left and right cameras 2-2 and 2-3. As FIG. 18 shows, with this way of photographing, the checker pattern 51 appears at a position shifted from the image center in both the left and right cameras 2-2 and 2-3. FIG. 19 shows the camera parameters estimated by applying the camera calibration described above to the image group photographed in this way. From FIG. 19, it can be confirmed that the image center is shifted in both the left and right cameras compared with the monocular calibration result of FIG. 18.

As shown in FIG. 20, the camera parameters of the central camera 2-1 and the left and right cameras 2-2 and 2-3 have now been obtained. By resetting the world coordinate system to the camera coordinate system of the central camera 2-1 as shown in FIG. 21, the external parameters [R'_left | t'_left] and [R'_right | t'_right] of the left and right cameras 2-2 and 2-3 can be obtained as equations (41) and (42).
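Re-expressing a side camera's external parameters relative to the central camera can be sketched as follows. Equations (41) and (42) are not reproduced in this text, so the sketch assumes the convention M_cam = R M_world + t for every camera; under that assumption the relative pose is R' = R_cam R_ref^T and t' = t_cam - R' t_ref. The sketch also returns the camera position -R'^T t', because the sign convention of the patent's own translation vector may differ. All names and values are hypothetical.

```python
import numpy as np


def relative_extrinsics(R_ref, t_ref, R_cam, t_cam):
    """Pose of a camera relative to a reference camera, assuming
    M_cam = R M_world + t for both cameras."""
    R_rel = R_cam @ R_ref.T
    t_rel = t_cam - R_rel @ t_ref
    return R_rel, t_rel


def camera_position(R, t):
    """Position of the camera center in the reference frame."""
    return -R.T @ t


# Made-up example: both cameras face the pattern squarely at 1700 mm, and the
# second camera sits 600 mm to the negative X side of the reference camera.
R_center, t_center = np.eye(3), np.array([0.0, 0.0, 1700.0])
R_side, t_side = np.eye(3), np.array([600.0, 0.0, 1700.0])
R_rel, t_rel = relative_extrinsics(R_center, t_center, R_side, t_side)
position = camera_position(R_rel, t_rel)
```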

These external parameters represent the absolute positional relationship between the central camera 2-1 and the left and right cameras 2-2 and 2-3 in the world coordinate system, that is, in real space. FIG. 22 shows the calculated values of these new external parameters. In theory, t_left = [-600, 0, 0] for the left camera and t_right = [600, 0, 0] for the right camera, but the calculated results differ greatly from these values. The cause is that, as described above, the camera parameters are calculated in the order of internal parameters and then external parameters, so any estimation error in the internal parameters affects the estimation of the external parameters. This method therefore cannot correctly determine the positional relationship between a plurality of cameras.

[Description of the camera calibration method with constrained internal parameters]
It was shown above that, in a camera calibration method that applies the monocular calibration as it is, estimation errors in the internal parameters affect the estimation of the external parameters. This section therefore describes a method that reduces the estimation error of the external parameters by constraining the internal parameters during calibration. For high-precision cameras and lenses such as those used in this system, it can be approximated that the optical axis center (u_0, v_0) of the internal parameters lies at the center of the image and that there is no image distortion. If the size of the image obtained by the camera is WIDTH × HEIGHT, then (u_0, v_0), which indicates the image center, and θ, which indicates image distortion, can be fixed as in equation (43).

Equation (19) can then be rewritten as the following equation (44).
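The constraint can be illustrated with a short Python sketch. Equations (43) and (44) are not reproduced in this text; the sketch assumes u_0 = (WIDTH - 1) / 2 and v_0 = (HEIGHT - 1) / 2, which matches the image center (511.5, 383.5) stated earlier for a 1024 × 768 image, and θ = 90°, which removes the skew term. The function name and the focal scale values are hypothetical.

```python
def constrained_intrinsics(alpha_u, alpha_v, width, height):
    """Internal parameter matrix with the image center pinned to the middle of
    the image and no skew (assumed reading of equations (43) and (44))."""
    u0 = (width - 1) / 2.0
    v0 = (height - 1) / 2.0
    return [[alpha_u, 0.0, u0],
            [0.0, alpha_v, v0],
            [0.0, 0.0, 1.0]]


# Only alpha_u and alpha_v remain to be estimated; 800.0 is a placeholder.
A_fixed = constrained_intrinsics(800.0, 800.0, width=1024, height=768)
```

Fixing three of the five internal parameters leaves only the two focal scales free, so errors in the image center can no longer leak into the external parameters.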

To further improve the estimation accuracy of the internal parameters, an image group is acquired individually for each camera with the checker pattern positioned so that it appears at the center of that camera's image. Only one scene, used to obtain the external parameters, is photographed simultaneously by the three cameras. FIG. 23 shows the difference between the image groups used when the monocular camera calibration is applied as it is and when the internal parameter constraint method of this section is applied. FIG. 23(a) is the case where the monocular camera calibration is applied, and FIG. 23(b) is the case where the internal parameter constraint method is applied. The images with thick frames are those used to calculate the external parameters. In the internal parameter constraint method of FIG. 23(b), an image 62 photographed simultaneously by the three cameras is added to the calibration images 61 for each camera.

Camera calibration with the internal parameter constraint was performed on the three-camera system. The camera arrangement is the same as in FIG. 17. FIG. 24 shows the estimation result. FIG. 25 shows the camera parameters (external parameters [R|t]) of the left camera 2-2 and the right camera 2-3 obtained with equations (41) and (42) when the camera coordinate system of the central camera is taken as the world coordinate system. The camera positions are estimated with an error of about 2 mm, which shows that this method is effective for camera calibration of a multi-camera system.

The above described the coordinate systems and camera parameters, and described in detail the calibration of a monocular camera, in particular the Zhang calibration method. Application of the Zhang method to a multi-camera system was also described. In addition, a camera calibration simulation using CG images showed that the proposed method can estimate the camera positions with an error of about 2 mm.

[Description of the central viewpoint image synthesis method using multiple cameras]
This section describes a method of synthesizing a central viewpoint image using multiple cameras, for synthesizing a face-region image in which eye contact is possible. First, the prototype system is described, and then estimation of the subject distance using projective transformation is described. Finally, a method of synthesizing an image as seen from the center position of the display on the basis of the estimated distance information is described.

[Synthesis of the central viewpoint image: camera viewpoint conversion using projective transformation]
This system uses four cameras (cameras 2A, 2B, 2C, 2D) during calibration, as shown in FIG. 26(a), and three cameras (cameras 2A, 2B, 2C) during use, as shown in FIG. 26(b). To explain camera viewpoint conversion, this section first considers a system with two cameras (cameras 2C, 2D) during calibration, as shown in FIG. 27(a), and one camera (camera 2-1) during camera viewpoint conversion, as shown in FIG. 27(b). First, multi-camera calibration is performed on the system of FIG. 27(a). The camera parameters obtained are those for the case where the camera 2D is the origin of the world coordinate system, as in equation (41). The rotation matrix R'_0 and translation vector t'_0 of the camera 2D are expressed by equations (45) and (46), and the rotation matrix R'_1 and translation vector t'_1 of the camera 2C by equations (47) and (48).

Equations (47) and (48) give values indicating how much the camera 2C is rotated and translated relative to the camera 2D. In FIG. 27(b), single-camera viewpoint conversion can be considered by treating the camera 2D as the virtual camera 2-0 and the camera 2C as the camera 2-1.

Consider the case where, as in FIG. 28, a virtual camera is placed at the position of the camera 2-0 and a virtual plane parallel to the virtual camera (camera 2-0) is placed at a distance z from it. If the origin of the world coordinate system is moved to the point where the optical axis (Z) of the virtual camera 2-0 intersects the virtual plane, the translation vectors t_im and t_1 of the virtual camera coordinate system and the camera 1 coordinate system with respect to the new world coordinate system can be expressed as the following equations (49) and (50), respectively.

The rotation matrices R_im and R_1 of the virtual camera coordinate system and the camera 1 coordinate system with respect to the world coordinate system can likewise be expressed as the following equations (51) and (52), respectively.

Since the virtual camera directly replaces the camera 2-0, the internal parameters A_im of the virtual camera are expressed by the following equation (53).

Next, projective transformation with respect to the virtual plane is considered. FIG. 29 shows the correspondence between the transformation matrices in this projective transformation.

The relationships between the image coordinates m_im and m_1 of the respective cameras and the world coordinates M are expressed as the following equations (54) and (55), respectively.

Since the virtual plane is represented by Z = 0 in world coordinates, equation (55) can be rewritten for a point M on the virtual plane as the following equation (56).

Here, H_1(z) is a 3 × 3 homography matrix that relates the plane in the world coordinate system to the image coordinate system of the camera 1, and it is a function of z. Rewriting equation (54) in the same way gives the following equation (57).

Next, equation (56) is rearranged to obtain the following equation (58).

Substituting this into equation (57) gives the following equation (59).

Equation (59) gives the correspondence between a point m_im in the image coordinates of the virtual camera and a point m_1 in the image coordinates of the camera 2-1. Using equation (59), an image captured by the camera 2-1 can therefore be converted into an image as captured from the virtual camera viewpoint. However, the conversion is correct only for the parts of the subject that lie at the assumed distance z of the virtual plane; all other parts are converted incorrectly. For example, when the subject is a sphere as in FIG. 30 and z is varied so that the virtual plane passes through the sphere, only the parts drawn with thick lines in FIG. 30 are converted correctly. FIG. 30(a) shows the case where the virtual plane is at z, FIG. 30(b) the case where it is at z + α, and FIG. 30(c) the case where it is at z + 2α; in each, the thick line shows which part of the sphere is converted correctly.
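The plane-induced mapping from the camera 2-1 image to the virtual camera image can be sketched as follows. Equations (49) to (59) are not reproduced in this text, so the sketch assumes that the relative pose takes virtual-camera coordinates to camera-1 coordinates as M_1 = R_1 M_im + t_1, and that the world origin lies on the virtual camera's optical axis at distance z with axes parallel to the virtual camera. Under those assumptions the mapping is H_im(z) H_1(z)^{-1}. Function names and numeric values are hypothetical.

```python
import numpy as np


def plane_homography(A, R, t):
    """H = A [r1 r2 t]: maps (X, Y, 1) on the world plane Z = 0 to
    homogeneous image coordinates."""
    return A @ np.column_stack([R[:, 0], R[:, 1], t])


def camera_to_virtual_homography(A_im, A_1, R_1, t_1, z):
    """Map camera-1 pixels to virtual-camera pixels for a plane at distance z."""
    offset = np.array([0.0, 0.0, z])  # plane origin seen from the virtual camera
    H_im = plane_homography(A_im, np.eye(3), offset)
    H_1 = plane_homography(A_1, R_1, R_1 @ offset + t_1)
    return H_im @ np.linalg.inv(H_1)


# Made-up example: shared intrinsics, camera 1 shifted and slightly rotated.
A_demo = np.array([[800.0, 0.0, 511.5], [0.0, 800.0, 383.5], [0.0, 0.0, 1.0]])
angle = 0.1
R_demo = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_demo = np.array([0.0, -300.0, 0.0])
z_demo = 1000.0
H_demo = camera_to_virtual_homography(A_demo, A_demo, R_demo, t_demo, z_demo)

# A scene point lying exactly on the virtual plane, seen by both cameras.
M_im = np.array([100.0, -50.0, z_demo])
m_im = A_demo @ M_im / M_im[2]
M_1 = R_demo @ M_im + t_demo
m_1 = A_demo @ M_1 / M_1[2]
mapped = H_demo @ m_1
mapped /= mapped[2]
```

A point that lies off the plane would map to a different pixel than its true virtual-camera projection, which is the misconversion the text describes for the parts of the sphere away from the thick line.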

[Description of subject model creation]
Equation (56) showed the correspondence between the virtual camera placed at the center of the display and the camera 2-1 placed on its periphery; the same applies to all the other cameras placed around the display 5. Consider the present TV dialogue system shown in FIG. 31. As in FIG. 1, in the system of FIG. 31 the camera 2-1 (camera 2A in FIG. 1) is mounted at the upper center of the display device 5, the camera 2-2 (camera 2B in FIG. 1) at the lower right of its side, and the camera 2-3 (camera 2C in FIG. 1) at the lower left of its side. Because FIG. 31 is a view from behind the display device 5, the left and right cameras appear interchanged. If the image coordinates of the cameras 2-2 and 2-3 are m_2 and m_3 and their projective transformation matrices are H_2(z) and H_3(z), the following equations (60) and (61) are obtained.

The coordinate conversion units 21, 22, and 23 apply equations (59), (60), and (61) to the images I_1, I_2, and I_3 acquired from the cameras 2-1, 2-2, and 2-3, respectively, and thereby obtain images I_1conv, I_2conv, and I_3conv in which each image has been converted to the viewpoint of the camera 2-0, which is the virtual camera.

Next, the distance estimation unit 24 evaluates the three converted images I_1conv, I_2conv, and I_3conv by block matching, two images at a time. As shown in FIG. 32, various block shapes can be used for matching; this section uses rectangular matching as an example. With (i, j) as the pixel of interest, the evaluation value is the sum of absolute differences I_SUMαβ of the pixel values of the converted images, given by the following equation (62).

Here, α and β are the corresponding camera numbers, m is the horizontal block size, n is the vertical block size, and I_αconv(i', j') and I_βconv(i', j') are the pixel values at position (i', j') of the two converted images being evaluated. The result of evaluating equation (62) for all three pairings of the three images is the evaluation value I_SUMALL, given by the following equation (63).

Here, W_αβ is the weight applied to each pair's sum of absolute differences; it is tuned manually within the range 0 to 1. Next, the virtual plane distance z is varied over the distance estimation range, and equation (63) is evaluated at every distance. Writing the evaluation value at each distance as I_SUMALL(i, j; z), and taking the distance at which I_SUMALL is smallest as the distance value d(i, j) of the pixel of interest, the result can be expressed as the following equation (64). Performing this operation for every pixel yields an estimated distance image of the subject.
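The distance search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a rectangular block with odd m and n, a weighted sum over the three image pairs, and that the three converted images have already been computed for each candidate distance z. The names `block_sad`, `combined_cost`, `estimate_distance`, and `views_by_z` are hypothetical.

```python
def block_sad(img_a, img_b, i, j, m, n):
    """Sum of absolute differences over an m-wide, n-tall block centred on (i, j).

    Assumes odd m and n; pixels that fall outside the image are skipped.
    """
    height, width = len(img_a), len(img_a[0])
    total = 0
    for jj in range(j - n // 2, j + n // 2 + 1):
        for ii in range(i - m // 2, i + m // 2 + 1):
            if 0 <= ii < width and 0 <= jj < height:
                total += abs(img_a[jj][ii] - img_b[jj][ii])
    return total


# The three two-image sets that can be formed from three converted images.
PAIRS = ((0, 1), (1, 2), (2, 0))


def combined_cost(views, i, j, m, n, weights):
    """Weighted sum of the pairwise block costs (assumed form of the total cost)."""
    return sum(w * block_sad(views[a], views[b], i, j, m, n)
               for (a, b), w in zip(PAIRS, weights))


def estimate_distance(views_by_z, i, j, m=3, n=3, weights=(1.0, 1.0, 1.0)):
    """Return the candidate distance z with the smallest combined cost at (i, j).

    views_by_z maps each candidate z to the three images converted for that z.
    """
    return min(views_by_z,
               key=lambda z: combined_cost(views_by_z[z], i, j, m, n, weights))


# Toy data: at z = 1700 the three views agree, at z = 1400 one view differs.
flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
off = [[10, 10, 10], [10, 40, 10], [10, 10, 10]]
views_by_z = {1400: [flat, off, flat], 1700: [flat, flat, flat]}
d = estimate_distance(views_by_z, 1, 1)
```

Running `estimate_distance` for every pixel would give the distance image; the toy data only exercises a single pixel.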

[Determination of pixel values]
The image composition unit 25 pastes onto the generated distance image of the subject the colors of the corresponding pixels from the surrounding cameras. Once the subject's distance is known, the surrounding-camera pixel corresponding to each virtual-camera pixel can be found one-to-one by geometric calculation, as shown in equations (56), (57), and (58). Let I_im(i, j, z) be the pixel value of the virtual camera's pixel of interest computed for distance value z, and let I_αconv(i, j, z) (α = 1, 2, 3) be the pixel values of the surrounding cameras corresponding to that pixel; the color to be pasted is then calculated by equation (65).

Here, U_α (α = 1, 2, 3) is the weight applied to each pixel value, tuned manually subject to U₁ + U₂ + U₃ = 1. Performing this operation for every pixel of the virtual camera synthesizes the virtual camera viewpoint image.
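Equation (65) is not reproduced in this text. Given that the weights U_α sum to 1, a natural reading is a weighted sum of the three corresponding pixel values; the sketch below assumes that form. The function names and the sample colors are hypothetical.

```python
def blend_pixel(values, weights):
    """Weighted sum of the corresponding camera pixel values.

    Assumed form of the colour calculation: sum over alpha of U_alpha * I_alpha.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights U_1 + U_2 + U_3 must sum to 1")
    return sum(u * v for u, v in zip(weights, values))


def blend_rgb(pixels, weights):
    """Apply blend_pixel to each colour channel of three (R, G, B) pixels."""
    return tuple(blend_pixel(channel, weights) for channel in zip(*pixels))


# Three cameras see slightly different colours for the same surface point.
colour = blend_rgb([(200, 100, 50), (100, 100, 50), (100, 200, 50)],
                   (0.5, 0.25, 0.25))
```

With these sample weights the first camera contributes half of the final color and the other two a quarter each.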

The above has described a method for converting the viewpoint position using the acquired camera calibration data and a method for creating a distance model of the subject from the conversion result, followed by a method for synthesizing an image from the central viewpoint of the display by pasting the colors of the surrounding cameras onto that distance model. The example used three cameras, but the method also applies when three or more cameras are used. With four cameras, for example, the method is applied to two sets of three cameras and the two resulting composite images are integrated. That is, when cameras 2A, 2B, 2C, and 2D are installed, two sets of three cameras such as (2A, 2B, 2C) and (2B, 2C, 2D) are considered, one composite image is generated for each set, and the two images are then integrated. A simple average, a weighted average, or the like can be used for the integration. In short, an arbitrary viewpoint image can be synthesized with this method if there are at least three cameras, and using three or more cameras can further improve the quality of the frontal image.
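The integration of two composite images mentioned above can be sketched as a per-pixel weighted average. This is a minimal illustration on grayscale images stored as lists of rows; the function name `integrate` and the parameter `weight_a` are hypothetical, and a weight of 0.5 corresponds to the simple average.

```python
def integrate(image_a, image_b, weight_a=0.5):
    """Per-pixel weighted average of two composite images of equal size.

    weight_a = 0.5 gives the simple average; other values in [0, 1] give a
    weighted average that favours one of the two three-camera sets.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    return [[weight_a * a + weight_b * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


# Composite from one three-camera set and from the other (toy values).
composite_1 = [[100, 200]]
composite_2 = [[200, 100]]
simple = integrate(composite_1, composite_2)
weighted = integrate(composite_1, composite_2, weight_a=0.75)
```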

[Optimal parameters for face-region image synthesis]
Next, images from the surrounding cameras are generated by CG and used to simulate central viewpoint image synthesis, and the optimal parameters are described for varying the number of cameras, the camera arrangement, and the block shape used in matching.

[Effect of camera arrangement on synthesis accuracy]
As shown in FIG. 33, five candidate camera positions 2A, 2B, 2C, 2D, and 2E around the display device 5 are considered. FIG. 33(a) shows the camera positions and FIG. 33(b) their coordinates. The subject 6 is on the far side of the display device 5, and FIG. 33 is drawn as seen from the back of the display device 5. Cameras 2A, 2B, and 2D are above the center of the display unit (the virtual camera position), and cameras 2C and 2E are below it. Cameras 2B and 2C are to the right of the display-unit center, and cameras 2D and 2E are to the left. All cameras are oriented with their optical axes toward the point (0, 0, 1700), and the CG face model of the subject 6 is placed with its center at that point. Six camera viewpoint images, including that of the central virtual camera, are generated by CG; the camera settings are shown in FIG. 34. Several camera arrangements are selected from these, and the virtual camera viewpoint image (hereinafter, the central viewpoint image) is synthesized.

The camera parameters obtained by camera calibration are used in the calculation. Calibration was performed with a CG checker pattern, and the resulting camera parameters are shown in FIG. 35, where the virtual camera is labeled "0". The other synthesis conditions used in the simulation are shown in FIG. 36.

With any pair of two cameras, the distance estimate of the face surface was good enough to show the roundness of the face, and a central viewpoint image could be synthesized. However, many synthesis errors occurred near the eyes. This is thought to be because distance estimation fails in occlusion regions, that is, regions that cannot be seen from one or both cameras. In the nose region, too, many synthesis errors occurred for every combination. This is again an effect of occlusion, and it cannot be avoided whichever two cameras are selected.

Therefore, starting from cameras 2C and 2E, the pair least affected by occlusion, central viewpoint images were synthesized using three cameras by adding either camera 2A or camera 2B. With three cameras, the distance near the eyes was estimated more accurately. Both three-camera sets gave better results than two cameras; cameras 2B, 2C, and 2E avoided occlusion better, whereas the nose region was synthesized better with cameras 2A, 2C, and 2E. This amounts to placing at least one camera above and at least one below the center of the display unit (the virtual camera position), and at least one on each side of the display-unit center.

[Effect of block shape on synthesis accuracy]
Next, using three cameras, central viewpoint images were synthesized while the block shape was varied. Two combinations of three cameras, (2A, 2C, 2E) and (2B, 2C, 2E), were evaluated with three block shapes: the rectangle shown in FIG. 32(a), the rhombus shown in FIG. 32(b), and the irregular shape shown in FIG. 32(c).

As a result, a comparable synthesis result was obtained even though the processing load is half that of rectangular matching.
Next, distance estimation by irregular-shape matching produced, for both camera combinations, eye-region images almost free of occlusion effects. The nose region was synthesized better with cameras 2A, 2C, and 2E.
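The text does not say which block shape halves the processing load relative to the rectangle. One shape for which this holds is a rhombus inscribed in the block's bounding square, since it covers roughly half of the square's pixels. The sketch below counts the pixels in each shape; the radius r = 8 and the function names are hypothetical, and the irregular shape of FIG. 32(c) is not modeled because its layout is not given here.

```python
def rectangle_offsets(r):
    """Offsets (dx, dy) of a square block of half-width r around the centre pixel."""
    return [(dx, dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def rhombus_offsets(r):
    """Offsets of the rhombus |dx| + |dy| <= r inscribed in the same square."""
    return [(dx, dy) for dx, dy in rectangle_offsets(r) if abs(dx) + abs(dy) <= r]


r = 8
rect_count = len(rectangle_offsets(r))    # (2r + 1)^2 pixels
rhombus_count = len(rhombus_offsets(r))   # 2r^2 + 2r + 1 pixels
ratio = rhombus_count / rect_count
```

For r = 8 the square holds 289 pixels and the rhombus 145, so a per-pixel matching cost summed over the rhombus needs about half as many absolute differences.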

Based on these results, the present system places cameras at the three positions 2A, 2C, and 2E and uses irregular-shape block matching. That is, the study of camera arrangement, number of cameras, and matching block shape showed that placing cameras at three positions (upper, lower right, and lower left of the display) reduces the influence of occlusion when synthesizing the eye region, and that block matching with irregular-shape blocks gave better synthesis results than matching with rectangular blocks.

[Face-region image synthesis using real images]
The subject 6 was photographed with the camera arrangement shown in FIG. 37, and face-region image synthesis was performed. The subject 6 sits 1700 mm from the display 5, and each camera is positioned to capture the subject's upper body. First, all cameras were calibrated; the results are shown in FIG. 38. The origin of the world coordinate system is camera 2-0. The calibration shows that cameras 2-2 and 2-3 are about 1300 mm apart in the x direction and that cameras 2-1 and 2-0 are about 400 mm apart in the y direction. These estimates agree with the approximate measured values. After calibration, the subject, looking at the center of the display (hereinafter, frontal gaze), was photographed with four cameras, including camera 2-0 for comparison.

Two subjects, User A and User B, were used, and central viewpoint images were synthesized from the three images other than that of camera 2-0. FIG. 39 shows the synthesis conditions for each subject. These parameters were adjusted per subject; only the distance estimation range differs, and the other synthesis conditions are common. The results confirmed that the distance of the face region was estimated well. For both subjects, the frontal-gaze central viewpoint image was synthesized well in the eye, nose, and mouth regions.

[Images with changing gaze direction]
In practice, a person's gaze changes constantly during a conversation, and the system must handle this as well. To check how changes in gaze direction affect the composite image, central viewpoint images were synthesized from images in which the gaze was directed up, down, left, and right.

The synthesis conditions are the same as in FIG. 39. The central viewpoint image was synthesized well for each gaze direction. Even when the gaze direction changed, matching during distance estimation did not fail and the distance was estimated well. Changes in the position of the skin and eyebrows caused by eye movement were also synthesized well.

[Images with changing virtual camera position]
In an actual usage environment, the user is not necessarily directly in front of the center of the display. If the user leans to one side, for example, the head position changes and the appearance of the conversation partner should change with it. To obtain a visual effect close to that of real space, it is therefore desirable to be able to place the virtual camera at an arbitrary position on the display according to the position of the user's head. Image synthesis was therefore performed at viewpoints other than the central one while the virtual camera position was varied. Here the subject stays in the same position as before and only the virtual camera position is moved. The virtual camera parameters for an arbitrary position can be obtained by calculation from the parameters of camera 2-0.

The frontal-gaze images were used, and the virtual camera position was moved from (−600, 0, 0) to (600, 0, 0) in the x direction in 300 mm steps. The optical axis of the camera always points toward the point (0, 0, 1450). The synthesis conditions are the same as in FIG. 39. As a result, the visual effect of a changing virtual camera position was obtained. In particular, compared with the image synthesized from the display center under the same conditions, the gaze direction was confirmed to change as the camera position changed. A step-like artifact was observed in the upper nose region; this can probably be reduced by making the estimation step width smaller than the current 3 mm.
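The sweep of virtual camera positions can be sketched as follows. The code lists the positions from x = −600 mm to x = 600 mm in 300 mm steps and computes, for each, the unit optical-axis direction toward the fixed point (0, 0, 1450). How the full virtual camera parameters are derived from those of camera 2-0 is not specified in this text, so only the positions and viewing directions are shown; the function name `optical_axis` and the yaw angle are illustrative additions.

```python
import math


def optical_axis(position, target):
    """Unit vector pointing from the camera position toward the target point."""
    direction = [t - p for p, t in zip(position, target)]
    norm = math.sqrt(sum(c * c for c in direction))
    return tuple(c / norm for c in direction)


TARGET = (0.0, 0.0, 1450.0)  # point the optical axis always faces (mm)

# Virtual camera positions along x: -600, -300, 0, 300, 600 (mm).
positions = [(float(x), 0.0, 0.0) for x in range(-600, 601, 300)]
axes = [optical_axis(p, TARGET) for p in positions]

# Horizontal rotation of each axis away from the z direction, in degrees.
yaw_deg = [math.degrees(math.atan2(a[0], a[2])) for a in axes]
```

At the outermost positions the axis is turned by roughly 22 degrees toward the center, which is consistent with the gaze direction appearing to change as the virtual camera moves.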

[Face-region image synthesis using moving images]
The three cameras of the constructed system, camera 2-1, camera 2-2, and camera 2-3, were synchronized to record a video of blinking at 30 fps, and a central viewpoint image was synthesized for every frame.

The results confirmed that synthesis works well even during blinking. Because this method performs arbitrary viewpoint image synthesis based on distance estimation, without processing such as extracting the eyes, the result should in principle be unaffected by whether the subject's eyes are open or closed, provided the subject stays within the distance estimation range. The method can therefore be said to be robust to the subject's movements.

The above described the synthesis of display central viewpoint images using the TV dialogue system. First, the calibration data of the camera system and the processing results for images acquired with a frontal gaze were shown, confirming that the method can synthesize a central viewpoint image. Next, images in which the gaze was directed up, down, left, and right were processed, confirming that the central viewpoint image can be synthesized regardless of gaze direction. It was further confirmed that rewriting the virtual camera parameters gives the visual effect of moving the camera position. Finally, a sequence of images of blinking was processed, confirming that the method also handles eye-closing actions such as blinking.

As described above, the arbitrary viewpoint image composition device of the present invention comprises a display device, imaging devices installed at at least three locations (upper, lower right, and lower left) around the display unit of the display device, and an image composition processing unit. It can therefore improve the quality of the frontal image and the sense of eye contact.

Further, according to the present invention, the image composition processing unit comprises a coordinate conversion unit that converts the viewpoint position using the camera calibration data of the imaging devices, a distance estimation unit that creates a subject distance model from the result of the conversion by the coordinate conversion unit, and an image composition unit that synthesizes an image by pasting the colors acquired by the imaging devices onto the subject distance model. The quality of the frontal image and the sense of eye contact can therefore be improved. In particular, improving the image quality around the eyes further improves the sense of eye contact, making it possible to realize an arbitrary viewpoint image composition device applicable to a TV dialogue system.

Further, according to the present invention, the distance estimation unit comprises a block matching unit that performs block matching on the images acquired from the imaging devices using irregular-shape blocks. This reduces the influence of occlusion and improves the quality of the synthesized frontal image, which in turn improves the sense of eye contact.

The present invention is not limited to the configuration of the above embodiment. Another embodiment of the present invention is described in detail with reference to the drawings. FIG. 40 is a block diagram showing the overall configuration of this embodiment. Parts identical to those in FIG. 1 carry the same reference numerals. The system 101 has the same configuration as the system 1 of FIG. 1, except that cameras 2B and 2C of the arbitrary viewpoint image composition device 110 are placed differently. The cameras are installed at three locations: camera 2A above the center of the display device 5, camera 2B at the lower-left end of the side of the display device 5, and camera 2C at the lower-right end of the display device 5. This maximizes the area of the triangle connecting the three cameras 2A, 2B, and 2C, which has the advantage over the configuration of FIG. 1 that the central viewpoint image is easier to obtain over a wider range.

Still another embodiment of the present invention is described in detail with reference to the drawings. FIG. 41 is a functional block diagram showing a configuration example of this embodiment. The arbitrary viewpoint image composition device 210 of the system 201 in this embodiment differs from the arbitrary viewpoint image composition device 10 of FIG. 1 in that it has an image transmission unit 4500 and an image reception unit 4502 in place of the image output unit 4. In the arbitrary viewpoint image composition device 210, the images captured by the imaging devices are synthesized by the image composition processing unit 3, and the synthesized image is sent through the image transmission unit 4500 to an external network 4501, which connects to other arbitrary viewpoint image composition devices 210 and the like. Images transmitted from those other devices are received through the image reception unit 4502 and shown on the display device 5. The system 201 thus lets users at remote locations communicate over the network 4501 with matched lines of sight, and is well suited to video conferencing.

DESCRIPTION OF SYMBOLS
1 Arbitrary viewpoint image system
2A, 2B, 2C, 2D, 2E Camera
3 Image composition processing unit
4 Image output unit
5 Display device
6 Subject
10 Arbitrary viewpoint image composition device
21, 22, 23 Coordinate conversion unit
24 Distance estimation unit
25 Image composition unit
26 Camera parameter storage unit
31 Image plane
32 Pinhole
33 Pinhole plane
34 Subject
35 Projected image
41 Lens
42 Sensor
43 Adhesive
51 Checker pattern plane
101 System
110 Arbitrary viewpoint image composition device
201 System
210 Arbitrary viewpoint image composition device
4500 Image transmission unit
4501 Network
4502 Image reception unit
I₁, I₂, I₃ Captured image
I_1conv, I_2conv, I_3conv Coordinate-converted image

Claims

1. An arbitrary viewpoint image composition apparatus comprising:
cameras that capture images of a subject, installed at at least three locations, namely an upper part, a right side part, and a left side part, around the display unit of a display device; and
an image composition processing unit that synthesizes, from the images captured by the cameras, an image as captured by a virtual camera in the line-of-sight direction of the subject,
wherein the image composition processing unit comprises:
a coordinate conversion unit that converts from a camera coordinate system to a virtual camera coordinate system using camera parameters obtained by camera calibration of the cameras;
a distance estimation unit that estimates a distance model of the subject from the images converted by the coordinate conversion unit; and
an image composition unit that synthesizes an image at the virtual camera viewpoint from the distance model of the subject.

2. The arbitrary viewpoint image composition apparatus according to claim 1, wherein the distance estimation unit
evaluates the images converted by the coordinate conversion unit by block matching, two images per set, varies the virtual plane distance over the distance estimation range, performs the evaluation at every distance, and estimates the distance model of the subject by taking the distance with the smallest evaluation value as the distance value of the pixel of interest.

3. The arbitrary viewpoint image composition apparatus according to claim 2, wherein the distance estimation unit
performs the evaluation by block matching using irregular-shape blocks.

4. The arbitrary viewpoint image composition apparatus according to claim 1, wherein the image composition unit
extracts, from the images converted by the coordinate conversion unit, the colors of the pixels corresponding to the distance model of the subject and pastes them onto the distance model.

5. The arbitrary viewpoint image composition apparatus according to claim 4, wherein the image composition unit
obtains, from the distance values, the pixels of the surrounding cameras corresponding to the pixels of the virtual camera in a one-to-one relationship.