JP2008065684A

JP2008065684A - Stereoscopic image synthesizing device, shape data generation method, and its program

Info

Publication number: JP2008065684A
Application number: JP2006244199A
Authority: JP
Inventors: Shiro Ozawa; 史朗小澤; Hisao Abe; 尚生阿部; Shosuke Naruto; 章介鳴戸; Tsukuru Kamiya; 造神谷
Original assignee: NTT Comware Corp
Current assignee: NTT Comware Corp
Priority date: 2006-09-08
Filing date: 2006-09-08
Publication date: 2008-03-21
Also published as: WO2008029530A1

Abstract

PROBLEM TO BE SOLVED: To provide a stereoscopic image synthesizing device that outputs stereoscopic image data and shape data, so that a live-action stereoscopic image photographed by a video camera can be presented on a stereoscopic video display device while a haptic and tactile sense matching that stereoscopic image is presented by a haptic/tactile sense presenting device.

SOLUTION: In this stereoscopic image synthesizing device, which synthesizes stereoscopic images from a left image obtained by imaging an object from the viewpoint of the left eye and a right image obtained by imaging the object from the viewpoint of the right eye, a shape calculation part 405 calculates the shape data of the object from an input distance image of the object. A shape video space adjusting part 407 converts the shape data calculated by the shape calculation part 405 into the display space of the stereoscopic images synthesized by the device itself. A shape data output part 409 outputs the shape data converted by the shape video space adjusting part 407 to a haptic/tactile sense presenting device.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates in particular to a stereoscopic video composition apparatus, a shape data generation method, and a program that generate shape data together with stereoscopic video data.

A conventional stereoscopic video display device prepares in advance images viewed from two viewpoints corresponding to the left and right eyes and displays them on a three-dimensional display of the barrier type (see, for example, Patent Document 1 and Patent Document 2) or of the polarized-glasses shutter type, so that the user perceives the image in three dimensions.
There are also haptic/tactile presentation devices, such as a force feedback device that has a pen-type operation unit and lets the user experience force and tactile sensations by operating the pen, and a haptic device that is worn on the arm and lets the user experience the force sense of the entire arm and the tactile sensation of the hand.
Japanese Unexamined Patent Application Publication No. H8-248355; Published Japanese Translation of PCT Application No. 2003-521181

However, a conventional stereoscopic video display device only presents stereoscopic video: even if the image has a stereoscopic effect and an object appears to stand out from the screen, the object cannot be touched. It has also been possible to present force and tactile sensations with a haptic/tactile presentation device while displaying CG (computer graphics) based on shape data such as CAD (computer-aided design) data. In that case, however, shape data matching the video must be prepared in advance. The approach can therefore be applied to CG, which needs shape data to generate the video in the first place, but not to live-action video captured by a video camera or the like.

The present invention has been made in view of these circumstances. Its object is to provide a stereoscopic video composition apparatus that can output stereoscopic video data and shape data with which a stereoscopic video of live-action footage captured by a video camera or the like can be presented on a stereoscopic video display device while force and tactile sensations matching that stereoscopic video are presented on a haptic/tactile presentation device.

The present invention has been made to solve the problem described above. The stereoscopic video composition apparatus of the present invention synthesizes a stereoscopic video from a left image obtained by imaging a subject from the viewpoint of the left eye and a right image obtained by imaging the subject from the viewpoint of the right eye, and is characterized by comprising: a shape calculation unit that calculates shape data of the subject from a distance image of the subject input from a distance image capturing device, which captures a distance image in which each pixel represents a distance; a shape video space adjustment unit that applies, to the shape data calculated by the shape calculation unit, a transformation into the display space of the stereoscopic video synthesized by the apparatus itself; and a shape data output unit that outputs the shape data transformed by the shape video space adjustment unit to a sensation presentation device.

The stereoscopic video composition apparatus of the present invention is also the stereoscopic video composition apparatus described above, further comprising a conversion generation unit that extracts three or more subjects from the left image and the right image based on their respective colors, calculates the position of each of the three or more subjects in the display space of the stereoscopic video, extracts the three or more subjects from the distance image based on their respective distances, calculates the position of each of the three or more subjects in the space based on the distance image, and, from these calculated positions, obtains the transformation from the space based on the distance image to the display space. The shape video space adjustment unit applies the transformation obtained by the conversion generation unit to the shape data.

The stereoscopic video composition apparatus of the present invention is also the stereoscopic video composition apparatus described above, further comprising a conversion generation unit that obtains the transformation from the space based on the distance image to the display space from three or more points in the display space of the stereoscopic video designated by a user operation and the points of the distance image associated with each of those points by a user operation. The shape video space adjustment unit applies the transformation obtained by the conversion generation unit to the shape data.

The shape data generation method of the present invention is a shape data generation method in a stereoscopic video composition apparatus that synthesizes a stereoscopic video from a left image obtained by imaging a subject from the viewpoint of the left eye and a right image obtained by imaging the subject from the viewpoint of the right eye. It is characterized by comprising: a first step in which the stereoscopic video composition apparatus calculates shape data of the subject from a distance image of the subject input from a distance image capturing device, which captures a distance image in which each pixel represents a distance; a second step in which the stereoscopic video composition apparatus applies, to the shape data calculated in the first step, a transformation into the display space of the stereoscopic video synthesized by the apparatus itself; and a third step in which the stereoscopic video composition apparatus outputs the shape data transformed in the second step to a sensation presentation device.

The program of the present invention causes a computer to function as a stereoscopic video composition apparatus that synthesizes a stereoscopic video from a left image obtained by imaging a subject from the viewpoint of the left eye and a right image obtained by imaging the subject from the viewpoint of the right eye. Specifically, it causes the computer to function as: a shape calculation unit that calculates shape data of the subject from a distance image of the subject input from a distance image capturing device, which captures a distance image in which each pixel represents a distance; a shape video space adjustment unit that applies, to the shape data calculated by the shape calculation unit, a transformation into the display space of the stereoscopic video synthesized by the apparatus itself; and a shape data output unit that outputs the shape data transformed by the shape video space adjustment unit to a sensation presentation device.

According to the present invention, by inputting to the apparatus live-action video of a subject captured from left and right viewpoints by video cameras or the like, together with video consisting of distance images of the subject captured by a distance image capturing device, it is possible to generate shape data that lets a haptic/tactile presentation device provide force and tactile sensations matching the stereoscopic video displayed on a stereoscopic video display device.

As shown in FIG. 1, the stereoscopic video composition apparatus 400 of the present embodiment synthesizes a stereoscopic video from the video seen from the left-eye viewpoint, captured by the left video capturing device 100, and the video seen from the right-eye viewpoint, captured by the right video capturing device 200, and displays it on the stereoscopic video display device 500. It also calculates shape data of the subject from the distance image captured by the distance image capturing device 300, transforms this shape data from the three-dimensional space based on the distance image into the display space of the stereoscopic video, and outputs the transformed shape data to the haptic/tactile presentation device 600. The user can thus view the stereoscopic video displayed on the stereoscopic video display device 500 and, at the same time, obtain force and tactile sensations for the subject in that video through the haptic/tactile presentation device 600.

Embodiments of the present invention are described below with reference to the drawings. FIG. 2 is a schematic block diagram showing the configuration of the stereoscopic video composition apparatus 400 according to one embodiment of the present invention. The left video capturing device 100 is a video camera that captures video corresponding to what is seen from the viewpoint of the user's left eye. The right video capturing device 200 is a video camera installed parallel to and on the right side of the left video capturing device 100, and captures video corresponding to what is seen from the viewpoint of the user's right eye. The distance image capturing device 300 is installed parallel to and near the left video capturing device 100 and the right video capturing device 200. It uses the TOF (time of flight) method, which determines distance by measuring the time from when the device emits light until that light is reflected by the measurement target and received by the device, to generate in real time video in which each pixel represents the distance from the device. The stereoscopic video composition apparatus 400 receives the left-eye and right-eye videos from the left video capturing device 100 and the right video capturing device 200, synthesizes a stereoscopic video, and outputs it to the stereoscopic video display device 500. It also receives the distance image from the distance image capturing device 300, calculates shape data of the subject, transforms the calculated shape data so that it is mapped into the display space of the stereoscopic video, and then outputs it to the haptic/tactile presentation device 600.
In the present embodiment, the haptic/tactile presentation device 600 functions as the sensation presentation device.

FIG. 3 is a schematic block diagram showing the configuration of the stereoscopic video composition apparatus 400 according to the present embodiment.
Reference numeral 401 denotes a left video data input unit that receives the video input from the left video capturing device 100 and outputs left images extracted from it frame by frame. Reference numeral 402 denotes a right video data input unit that receives the video input from the right video capturing device 200 and outputs right images extracted from it frame by frame. Reference numeral 403 denotes a distance image data input unit that receives the video input from the distance image capturing device 300 and outputs distance images extracted from it frame by frame.

Reference numeral 404 denotes a stereoscopic video composition unit that combines the left image and the right image received from the left video data input unit 401 and the right video data input unit 402 and generates stereoscopic video data in a format suited to the stereoscopic video display device 500. Reference numeral 405 denotes a shape calculation unit that generates shape data of the subject based on the distance image received from the distance image data input unit 403. Details of the shape calculation unit 405 are described later with reference to FIG. 5.

Reference numeral 406 denotes a conversion generation unit that obtains a matrix representing the affine transformation from the space based on the distance image to the display space of the stereoscopic video. As long as the positional relationship among the left video capturing device 100, the right video capturing device 200, and the distance image capturing device 300 does not change, this affine transformation does not change either. In the present embodiment, therefore, three markers of different colors are captured in advance by the left video capturing device 100, the right video capturing device 200, and the distance image capturing device 300. The conversion generation unit 406 generates the matrix representing the affine transformation from the captured video, and the shape video space adjustment unit 407 uses this matrix for all subsequent transformations.

FIG. 4 is a flowchart outlining this advance generation of the affine transformation. The conversion generation unit 406 extracts the three markers from the left image and the right image based on their respective colors (S1; details are described later with reference to FIG. 6) and calculates the position of each marker in the display space of the stereoscopic video from the parallax between the left image and the right image (S2; details are described later with reference to FIG. 7). The conversion generation unit 406 then extracts the same three markers, based on their respective distances, from a distance image captured at the same time as the left and right images, and calculates the position of each marker in the space based on the distance image (S3; details are described later with reference to FIG. 8). From the positions of the three markers in the space based on the distance image and their positions in the display space of the stereoscopic video, it obtains the affine transformation from the space based on the distance image to the display space of the stereoscopic video (S4).

Because any affine transformation can be expressed as a combination of rotation about each axis, scaling along each axis, and translation, the matrix representing an affine transformation in three-dimensional space has nine unknowns in total, three for each of these. Therefore, if the coordinates before and after the transformation are known for three points that do not lie on the same straight line, substituting them yields nine linear equations, and the matrix representing the affine transformation can be determined uniquely. In step S4, the conversion generation unit 406 uses the positions (coordinates) of the three subjects in the space based on the distance image and in the display space of the stereoscopic video as the before and after coordinates of the three points, and obtains the matrix representing the affine transformation.

The shape video space adjustment unit 407 uses the matrix representing the affine transformation obtained by the conversion generation unit 406 to transform the shape data of the subject generated by the shape calculation unit 405 into the display space of the stereoscopic video. That is, it applies the affine transformation by multiplying the matrix by the vector of each point of the shape data. Reference numeral 408 denotes a stereoscopic video output unit that outputs the stereoscopic video data generated by the stereoscopic video composition unit 404 to the stereoscopic video display device 500. Reference numeral 409 denotes a shape data output unit that outputs the shape data transformed by the shape video space adjustment unit 407 to the haptic/tactile presentation device 600.
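The matrix-times-vector step can be sketched as follows. This is a minimal illustration only: the 4x4 homogeneous layout, the function name `apply_affine`, and the sample matrix are assumptions for the example and are not taken from the patent, which does not specify the matrix representation.

```python
def apply_affine(matrix, points):
    """Apply a 4x4 homogeneous affine matrix to a list of (x, y, z) points.

    The homogeneous layout (last column holds the translation) is an
    assumption made for this sketch.
    """
    transformed = []
    for x, y, z in points:
        v = (x, y, z, 1.0)
        # Only the first three rows are needed; the fourth row is (0, 0, 0, 1).
        transformed.append(
            tuple(sum(matrix[r][c] * v[c] for c in range(4)) for r in range(3))
        )
    return transformed


# Hypothetical matrix: scale by 2 along each axis, then translate by (1, 0, -1).
M = [
    [2.0, 0.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, -1.0],
    [0.0, 0.0, 0.0, 1.0],
]
shape_points = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)]
display_points = apply_affine(M, shape_points)
print(display_points)
```

Because the matrix depends only on the fixed camera arrangement, it can be computed once and reused for every frame of shape data.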

FIG. 5 is a flowchart describing the processing in which the shape calculation unit 405 generates shape data of the subject based on the distance image received from the distance image data input unit 403. First, to extract the subject, a distance extraction threshold Dm is set in advance by a user operation, and the shape calculation unit 405 stores this extraction threshold Dm in the storage unit (Sa1).
Next, the shape calculation unit 405 performs steps Sa3 to Sa6 for all Imax pixels, i = 0 to Imax-1, that make up the distance image received from the distance image data input unit 403 (Sa2). In step Sa3, the shape calculation unit 405 acquires the distance Di of the i-th pixel (initially the 0th). It then determines whether the distance Di acquired in step Sa3 is a distance that should be extracted as the subject, that is, whether it is smaller than the extraction threshold Dm stored in step Sa1 (Sa4). If the distance Di is determined in step Sa4 to be smaller than the extraction threshold Dm, the process proceeds to step Sa5, where the shape calculation unit 405 stores the position of the i-th pixel in the distance image together with its distance Di, and then proceeds to step Sa6.

If, on the other hand, the distance Di is determined in step Sa4 not to be smaller than the extraction threshold Dm (that is, equal to or larger), the process proceeds directly to step Sa6. In step Sa6, the shape calculation unit 405 increments i by 1 and, if i is smaller than Imax, returns to step Sa3 and repeats the processing described above. In this way, the shape calculation unit 405 repeats steps Sa3 to Sa6 until i reaches Imax, that is, for every pixel of the distance image.
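The loop of steps Sa2 to Sa6 can be sketched as below. The representation of the distance image as a list of rows, the function name `extract_near_pixels`, and the sample values are assumptions for illustration; the patent iterates over a flat pixel index i, which visits the same pixels.

```python
def extract_near_pixels(distance_image, threshold):
    """Return (x, y, distance) for every pixel closer than the threshold.

    distance_image is assumed to be a list of rows of distances.
    """
    kept = []
    for y, row in enumerate(distance_image):
        for x, distance in enumerate(row):
            if distance < threshold:            # Sa4: strictly smaller than Dm
                kept.append((x, y, distance))   # Sa5: store position and distance
    return kept


# Hypothetical 2x3 distance image; 5.0 stands for distant background.
depth = [
    [5.0, 1.2, 1.1],
    [5.0, 1.3, 5.0],
]
near = extract_near_pixels(depth, threshold=2.0)
print(near)
```

Pixels whose distance equals the threshold are skipped, matching the "not smaller" branch of step Sa4.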

When i reaches Imax in step Sa6, the process proceeds to step Sa7, where the shape calculation unit 405 forms groups from the pixels whose positions were stored in step Sa5, putting together pixels whose positions in the distance image are vertically or horizontally adjacent. Two pixels are judged to be adjacent if either of the following holds: their horizontal positions are the same and their vertical positions differ by one pixel, or their horizontal positions differ by one pixel and their vertical positions are the same. Next, for each group generated in step Sa7, the shape calculation unit 405 generates shape data polygonized into triangles, each made of three adjacent points, based on the pixel positions (horizontal and vertical) and distances (Sa8).
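The patent does not say how the triangles of step Sa8 are chosen. One possible sketch, given purely as an assumption, splits each fully occupied 2x2 block of grid pixels into two triangles; the function name `triangulate` and the dictionary representation of a group are hypothetical.

```python
def triangulate(pixels):
    """Build triangles from one pixel group.

    pixels maps (x, y) -> distance. Each returned triangle is a tuple of
    three (x, y, distance) vertices. The splitting rule is an assumption.
    """
    def vertex(p):
        return (p[0], p[1], pixels[p])

    triangles = []
    for (x, y) in sorted(pixels):
        right, below, diagonal = (x + 1, y), (x, y + 1), (x + 1, y + 1)
        if right in pixels and below in pixels:
            # Upper-left triangle of the 2x2 block anchored at (x, y).
            triangles.append((vertex((x, y)), vertex(right), vertex(below)))
            if diagonal in pixels:
                # Lower-right triangle completes the quad.
                triangles.append((vertex(right), vertex(diagonal), vertex(below)))
    return triangles


# Hypothetical group: one 2x2 block of near pixels.
group = {(0, 0): 1.0, (1, 0): 1.1, (0, 1): 1.2, (1, 1): 1.3}
mesh = triangulate(group)
print(mesh)
```

This simple rule leaves out blocks whose upper-left pixel is missing, so a fuller implementation would need additional cases.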

FIG. 6 is a flowchart describing the processing in which the conversion generation unit 406 calculates the positions of the markers in the left image and the right image. For each of the left image and the right image, the conversion generation unit 406 performs the processing of the flowchart in FIG. 6 three times, changing the extraction thresholds each time, to calculate the positions of the three markers. In other words, the processing of FIG. 6 is performed six times in total: once per marker for each of the two images. In the present embodiment, the color value that expresses the color of each pixel is represented by the values of its red, green, and blue components.
First, to detect the three markers, upper limits (Rmax, Gmax, Bmax) and lower limits (Rmin, Gmin, Bmin) are set in advance by a user operation as the extraction thresholds for the red, green, and blue components of each marker's color, and the conversion generation unit 406 stores these values in the storage unit (Sb1).

Next, the conversion generation unit 406 performs steps Sb3 to Sb6 for all Jmax pixels, j = 0 to Jmax-1, that make up the image received from the left video data input unit 401 or the right video data input unit 402 (Sb2). In step Sb3, the conversion generation unit 406 acquires the red, green, and blue component values (Rj, Gj, Bj) of the j-th pixel (initially the 0th). It then determines whether the component values (Rj, Gj, Bj) acquired in step Sb3 are the color of the marker, that is, whether they lie within the range between the lower limits (Rmin, Gmin, Bmin) and the upper limits (Rmax, Gmax, Bmax) stored in the storage unit in step Sb1 (Sb4). The values are judged to be within the extraction threshold range when all of the following expressions (1) to (3) are satisfied.
Rmin < Rj < Rmax   (1)
Gmin < Gj < Gmax   (2)
Bmin < Bj < Bmax   (3)
If the values are determined in step Sb4 to be within the extraction threshold range, the process proceeds to step Sb5, where the conversion generation unit 406 stores the position of the j-th pixel in the image, and then proceeds to step Sb6.

If, on the other hand, the values are determined in step Sb4 not to be within the extraction threshold range, the process proceeds directly to step Sb6. In step Sb6, the conversion generation unit 406 increments j by 1 and, if j is smaller than Jmax, returns to step Sb3 and repeats the processing described above. In this way, the conversion generation unit 406 repeats steps Sb3 to Sb6 until j reaches Jmax, that is, for every pixel of the image.
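The color test of expressions (1) to (3) and the surrounding pixel loop might look like the following sketch. The image layout (rows of RGB tuples), the function names, and the threshold values are assumptions chosen for illustration.

```python
def in_marker_range(rgb, lower, upper):
    """True when every component satisfies lower < value < upper (strict)."""
    return all(lo < value < hi for value, lo, hi in zip(rgb, lower, upper))


def find_marker_pixels(image, lower, upper):
    """Return the (x, y) positions of pixels whose color is in range."""
    hits = []
    for y, row in enumerate(image):
        for x, rgb in enumerate(row):
            if in_marker_range(rgb, lower, upper):   # Sb4
                hits.append((x, y))                  # Sb5
    return hits


# Hypothetical 2x2 image and thresholds for a red marker.
image = [
    [(250, 10, 10), (0, 0, 0)],
    [(240, 20, 5), (200, 30, 30)],
]
red_pixels = find_marker_pixels(image, lower=(220, -1, -1), upper=(256, 40, 40))
print(red_pixels)
```

Because the inequalities are strict, a bound must sit just outside the values it should admit, which is why the sample lower limit for green and blue is -1 rather than 0.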

When j reaches Jmax in step Sb6, the process proceeds to step Sb7, where the conversion generation unit 406 forms groups from the pixels whose positions were stored in step Sb5, putting together pixels whose positions in the image are vertically or horizontally adjacent. Two pixels are judged to be adjacent if either of the following holds: their horizontal positions are the same and their vertical positions differ by one pixel, or their horizontal positions differ by one pixel and their vertical positions are the same. Next, from the groups generated in step Sb7, the conversion generation unit 406 judges the one with the largest area, that is, the largest number of pixels, to be the marker and extracts it. It then calculates the coordinates of the marker's centroid by averaging the coordinates of the pixels that make up the marker, and outputs them (Sb8).

For example, suppose the pixels whose positions the conversion generation unit 406 stored in step Sb5 are the eight pixels with X, Y coordinates (10, 10), (10, 11), (11, 11), (25, 60), (24, 61), (25, 61), (26, 61), and (25, 62). In step Sb7, the conversion generation unit 406 generates group 1, consisting of the three pixels (10, 10), (10, 11), and (11, 11), and group 2, consisting of the five pixels (25, 60), (24, 61), (25, 61), (26, 61), and (25, 62). In step Sb8, the conversion generation unit 406 compares the pixel counts of the three-pixel group 1 and the five-pixel group 2, judges group 2, which has more pixels, to be the marker, and extracts it. It calculates the centroid of the extracted marker as the average of (25, 60), (24, 61), (25, 61), (26, 61), and (25, 62). The centroid is therefore ((25 + 24 + 25 + 26 + 25) / 5, (60 + 61 + 61 + 61 + 62) / 5) = (25, 61).
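The grouping and centroid steps (Sb7 and Sb8) can be sketched with a simple flood fill over 4-connected neighbors, run here on the eight pixels of the worked example. The function names and the flood-fill approach are assumptions; the patent only specifies the adjacency rule and the choice of the largest group.

```python
def group_pixels(positions):
    """Split pixel positions into groups of vertically/horizontally adjacent pixels."""
    remaining = set(positions)
    groups = []
    while remaining:
        start = remaining.pop()
        group, stack = [start], [start]
        while stack:
            x, y = stack.pop()
            for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if neighbor in remaining:
                    remaining.remove(neighbor)
                    group.append(neighbor)
                    stack.append(neighbor)
        groups.append(group)
    return groups


def marker_centroid(positions):
    """Centroid of the largest group, taken to be the marker (Sb8)."""
    largest = max(group_pixels(positions), key=len)
    count = len(largest)
    return (sum(x for x, _ in largest) / count, sum(y for _, y in largest) / count)


pixels = [(10, 10), (10, 11), (11, 11),
          (25, 60), (24, 61), (25, 61), (26, 61), (25, 62)]
centroid = marker_centroid(pixels)
print(centroid)  # (25.0, 61.0)
```

Choosing the largest group acts as a basic noise filter: small clusters of stray pixels that happen to fall in the color range are ignored.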

FIG. 7 is a diagram describing how the conversion generation unit 406 calculates the position of a marker along the Z axis of the stereoscopic video display space, that is, in the direction perpendicular to the image (the depth direction). To keep the explanation simple, the figure covers only one marker, but the conversion generation unit 406 performs the processing of FIG. 7 for each of the three markers and calculates the position of each. The coordinate XL is the horizontal coordinate of the centroid of the marker image M1 in the left image G1, as calculated by the conversion generation unit 406 in the processing of FIG. 6, with the left edge of the left image G1 as the origin. The coordinate XR is the horizontal coordinate of the centroid of the marker image M2 in the right image G2, as calculated in the processing of FIG. 6, with the left edge of the right image G2 as the origin. The direction perpendicular to the image (the Z axis) has its origin at the viewpoint of the user viewing the stereoscopic video displayed on the stereoscopic video display device 500, and the conversion generation unit 406 calculates the coordinate Z in this direction by expression (4).
Z = 1 / (XL - XR)   (4)
The conversion generation unit 406 also averages the centroid positions of the marker images in the left image and the right image calculated in the processing of FIG. 6 to obtain the marker's coordinate along the X axis (the horizontal direction of the image) and along the Y axis (the vertical direction of the image) in the stereoscopic video display space.

For example, when the center of gravity of the marker image in the left image calculated in the processing of FIG. 6 has X coordinate XL = 80 and Y coordinate YL = 42, and the center of gravity of the marker image in the right image has X coordinate XR = 50 and Y coordinate YR = 40, the conversion generation unit 406 calculates the X coordinate in the display space of the stereoscopic video by equation (5), the Y coordinate by equation (6), and the Z coordinate by equation (7), giving (X, Y, Z) = (65, 41, 0.033).
X = (XL + XR) / 2 = (80 + 50) / 2 = 65 ... (5)
Y = (YL + YR) / 2 = (42 + 40) / 2 = 41 ... (6)
Z = 1 / (XL - XR) = 1 / (80 - 50) = 0.033 ... (7)
Here, the Z coordinate is very small compared with the X and Y coordinates. This is because the Z coordinate obtained by equation (4) is on a different scale from the X and Y coordinates; the scale is adjusted by multiplying the Z coordinate by a predetermined constant C. The magnitude of the constant C may also be adjusted so as to emphasize the position in the Z-axis direction.
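Equations (4) to (7) and the scaling by the constant C can be put together in a short sketch. The function name `display_space_position` and the parameter name `c` are hypothetical, and the check for zero disparity is an added assumption, since equation (4) is undefined when XL equals XR.

```python
def display_space_position(left, right, c=1.0):
    """Marker position (X, Y, Z) in the display space from left/right centroids.

    left  -- (XL, YL), center of gravity of the marker in the left image
    right -- (XR, YR), center of gravity of the marker in the right image
    c     -- the predetermined constant C that rescales Z (c=1.0 gives equation (4))
    """
    xl, yl = left
    xr, yr = right
    disparity = xl - xr
    if disparity == 0:
        # Assumption of this sketch: treat zero disparity as an error.
        raise ValueError("XL equals XR: equation (4) is undefined")
    x = (xl + xr) / 2          # equation (5)
    y = (yl + yr) / 2          # equation (6)
    z = c / disparity          # equation (4)/(7), multiplied by C
    return x, y, z


x, y, z = display_space_position((80, 42), (50, 40))
print(x, y, round(z, 3))  # 65.0 41.0 0.033
```

Passing a larger `c`, for example `c=30`, would bring Z onto a scale comparable to X and Y in this example; the appropriate value depends on the setup and is not given in the text.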

FIG. 8 is a flowchart explaining the processing by which the conversion generation unit 406 calculates the position of each of the three markers in the space based on the distance image. For simplicity, the figure describes only one marker, but the conversion generation unit 406 performs the processing of FIG. 8 for each of the three markers and calculates their positions in the space based on the distance image. That is, the processing of FIG. 8 is performed three times.
First, in order to detect the three markers, a lower limit Dmin and an upper limit Dmax, which are the distance thresholds for extraction, are set in advance for each marker by user operation, and the conversion generation unit 406 stores the lower limit Dmin and the upper limit Dmax in the storage unit (Sc1).
Next, the conversion generation unit 406 performs the processing of steps Sc3 to Sc6 for all Kmax pixels, from k = 0 to Kmax - 1, that make up the distance image received from the distance image data input unit 403 (Sc2). In step Sc3, the conversion generation unit 406 acquires the distance Dk of the k-th (initially the 0th) pixel. Next, the conversion generation unit 406 determines whether the distance Dk acquired in step Sc3 is a distance that should be extracted as a marker, that is, whether it lies between the lower limit Dmin and the upper limit Dmax stored in the storage unit in step Sc1 (Sc4). If it is determined in step Sc4 that the distance Dk lies between the lower limit Dmin and the upper limit Dmax, the process proceeds to step Sc5, where the conversion generation unit 406 stores the position of the k-th pixel in the distance image together with its distance Dk, and then proceeds to step Sc6.

On the other hand, if it is determined in step Sc4 that the distance Dk does not lie between the lower limit Dmin and the upper limit Dmax, the process proceeds directly to step Sc6. In step Sc6, the conversion generation unit 406 increments k by 1 and, if k is smaller than Kmax, returns to step Sc3 and repeats the processing described above. In this way, the conversion generation unit 406 repeats steps Sc3 to Sc6 until k reaches Kmax, that is, for all pixels of the distance image.

When the value of k reaches Kmax in step Sc6, the process proceeds to step Sc7, where the conversion generation unit 406 groups together those pixels whose positions were stored in step Sc5 and that are vertically or horizontally adjacent to one another in the distance image. Two pixels are judged to be adjacent if either of the following holds: they have the same horizontal position and their vertical positions differ by one pixel, or their horizontal positions differ by one pixel and they have the same vertical position. Next, from the groups generated in step Sc7, the conversion generation unit 406 extracts the group with the largest area, that is, the largest number of pixels, judging it to be the marker, and calculates and outputs the coordinates of the marker's center of gravity by averaging the coordinates of the pixels that make up the marker (Sc8).
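The flow of FIG. 8 for a single marker (steps Sc2 to Sc8) might be sketched as below. All names are hypothetical. Two points are assumptions of this sketch rather than statements of the text: the bounds Dmin and Dmax are treated as inclusive, and the mean of the stored distances Dk is returned as a third coordinate alongside the averaged pixel position.

```python
def extract_candidates(distance_image, d_min, d_max):
    """Sc2-Sc6: keep the position and distance of every pixel within [d_min, d_max]."""
    kept = {}
    for y, row in enumerate(distance_image):
        for x, d in enumerate(row):
            if d_min <= d <= d_max:  # inclusive bounds are an assumption
                kept[(x, y)] = d
    return kept


def largest_group(positions):
    """Sc7: group vertically/horizontally adjacent pixels and return the largest group."""
    remaining = set(positions)
    best = []
    while remaining:
        stack = [remaining.pop()]
        group = []
        while stack:
            x, y = stack.pop()
            group.append((x, y))
            for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if n in remaining:
                    remaining.remove(n)
                    stack.append(n)
        if len(group) > len(best):
            best = group
    return best


def marker_position(distance_image, d_min, d_max):
    """Sc8: average the coordinates (and, by assumption, the distances) of the marker."""
    kept = extract_candidates(distance_image, d_min, d_max)
    marker = largest_group(kept)
    if not marker:
        return None  # no pixel fell inside the distance range
    n = len(marker)
    cx = sum(x for x, _ in marker) / n
    cy = sum(y for _, y in marker) / n
    mean_d = sum(kept[p] for p in marker) / n
    return cx, cy, mean_d


# Hypothetical 5x4 distance image: a three-pixel marker and one stray pixel.
image = [
    [9.0, 9.0, 9.0, 9.0, 9.0],
    [9.0, 2.0, 2.2, 9.0, 9.0],
    [9.0, 2.1, 9.0, 9.0, 2.0],
    [9.0, 9.0, 9.0, 9.0, 9.0],
]
print(marker_position(image, 1.5, 2.5))
```

Running the function once per marker, each with its own Dmin and Dmax, corresponds to performing the processing of FIG. 8 three times.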

As described above, the shape video space adjustment unit 407 uses the affine transformation generated by the conversion generation unit 406 to convert the shape data from the space based on the distance image into the display space of the stereoscopic video. The stereoscopic video composition device 400 of this embodiment therefore outputs shape data whose motion and coordinate system match those of the live-action stereoscopic video. When the live-action video is displayed on the stereoscopic video display device 500, the haptic/tactile sense presentation device 600 that receives the shape data can satisfy the desire for tactile feedback that anyone intuitively has. Conventional live-action stereoscopic video could only be viewed as a three-dimensional object, but adding tactile sensation allows the object to be grasped more reliably and opens up possibilities for new media and interfaces.
Furthermore, because the stereoscopic video composition device 400 of this embodiment can output stereoscopic video data and shape data for a plurality of subjects, it can be used particularly effectively in fields such as remote communication and spatial recording.

The storage unit of the stereoscopic video composition device 400 is assumed to consist of a nonvolatile memory such as a hard disk device, a magneto-optical disk device, or a flash memory; a read-only storage medium such as a CD-ROM; a volatile memory such as a RAM (Random Access Memory); or a combination of these.
It is also assumed that an input device, a display device, and the like (none of which are shown) are connected to the stereoscopic video composition device 400 as peripheral devices. Here, the input device refers to an input device such as a keyboard or a mouse. The display device refers to a CRT (Cathode Ray Tube), a liquid crystal display device, or the like.

A program for realizing the functions of the left video data input unit 401, the right video data input unit 402, the distance image data input unit 403, the stereoscopic video synthesis unit 404, the shape calculation unit 405, the conversion generation unit 406, the shape video space adjustment unit 407, the stereoscopic video output unit 408, and the shape output unit 409 in FIG. 3 may be recorded on a computer-readable recording medium, and the processing of these units may be performed by causing a computer system to read and execute the program recorded on the recording medium. The term "computer system" here includes an OS and hardware such as peripheral devices.

The "computer system" also includes a homepage providing environment (or display environment) when a WWW system is used.
The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. The "computer-readable recording medium" further includes media that hold a program dynamically for a short time, such as a communication line used when a program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and media that hold a program for a certain period, such as the volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, or may realize those functions in combination with a program already recorded in the computer system.

In this embodiment, the conversion generation unit 406 has been described as generating the affine transformation from images of three markers. However, it may generate the affine transformation from images of more than three markers, or from images obtained by photographing a single marker three or more times at different locations. Also, instead of photographing the markers in advance, the markers may be photographed together with the subject and the affine transformation generated in real time.
In this embodiment, the conversion generation unit 406 extracts the three markers from the left image and the right image on the basis of color, and from the distance image on the basis of distance. Alternatively, the conversion generation unit 406 of the stereoscopic video composition device 400 may display the left image, the right image, and the distance image; accept as markers the points in the display space of the stereoscopic video that the user designates on the left image and the right image with an input device such as a mouse, together with the points that the user designates on the distance image, again with an input device such as a mouse, in correspondence with those points; and generate the affine transformation based on these markers. In this case as well, three or more markers are required to generate the affine transformation. This makes it possible to obtain the affine transformation from the space based on the distance image to the display space of the stereoscopic video from images of an arbitrary scene, without using physical markers.
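As a rough illustration of estimating an affine transformation from corresponding marker positions, the sketch below solves for a general 3D affine map from exactly four non-coplanar point correspondences using Gauss-Jordan elimination. It does not attempt to reproduce the embodiment's own derivation from three markers: the four-point requirement, the function names `solve_linear`, `fit_affine`, and `apply_affine`, and the sample coordinates are all assumptions made for this example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate points (coplanar or repeated)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_affine(src, dst):
    """Fit dst = A * src + t from four correspondences.

    src -- marker positions in the space based on the distance image
    dst -- the same markers in the display space of the stereoscopic video
    Returns three rows [a, b, c, t], one per output coordinate.
    """
    design = [[x, y, z, 1.0] for x, y, z in src]
    return [solve_linear(design, [p[axis] for p in dst]) for axis in range(3)]


def apply_affine(rows, point):
    """Convert one point of the shape data into the display space."""
    x, y, z = point
    return tuple(a * x + b * y + c * z + t for a, b, c, t in rows)


# Hypothetical correspondences: scale by 2, then translate by (1, 2, 3).
src = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
dst = [(1.0, 2.0, 3.0), (3.0, 2.0, 3.0), (1.0, 4.0, 3.0), (1.0, 2.0, 5.0)]
rows = fit_affine(src, dst)
print(apply_affine(rows, (1.0, 1.0, 1.0)))
```

With more than four correspondences, a least-squares fit would be the natural extension; that variant is not shown here.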

An embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to this embodiment, and designs and the like within a scope that does not depart from the gist of the invention are also included.

The stereoscopic video composition device of the present invention is suitable for use in remote communication, spatial recording, and the like, but is not limited to these uses.

FIG. 1 is a diagram explaining the outline of an embodiment of the present invention.
FIG. 2 is a schematic block diagram showing the configuration of a system using the stereoscopic video composition device 400 in the embodiment.
FIG. 3 is a schematic block diagram showing the configuration of the stereoscopic video composition device 400 in the embodiment.
FIG. 4 is a flowchart explaining the operation of the conversion generation unit 406 in the embodiment.
FIG. 5 is a flowchart explaining the processing by which the shape calculation unit 405 in the embodiment generates shape data of a subject.
FIG. 6 is a flowchart explaining the processing by which the conversion generation unit 406 in the embodiment calculates the positions of the markers in the left image and the right image.
FIG. 7 is a diagram explaining how the conversion generation unit 406 in the embodiment calculates the position of a marker in the Z-axis direction in the display space of the stereoscopic video.
FIG. 8 is a flowchart explaining the processing by which the conversion generation unit 406 in the embodiment calculates the position of each of the three markers in the space based on the distance image.

Explanation of symbols

100 ... left video capturing device
200 ... right video capturing device
300 ... distance image capturing device
400 ... stereoscopic video composition device
500 ... stereoscopic video display device
600 ... haptic/tactile sense presentation device
401 ... left video data input unit
402 ... right video data input unit
403 ... distance image data input unit
404 ... stereoscopic video synthesis unit
405 ... shape calculation unit
406 ... conversion generation unit
407 ... shape video space adjustment unit
408 ... stereoscopic video output unit
409 ... shape output unit

Claims

A stereoscopic video composition device that synthesizes a stereoscopic video from a left image obtained by imaging a subject from a left-eye viewpoint and a right image obtained by imaging the subject from a right-eye viewpoint, the device comprising:
a shape calculation unit that calculates shape data of the subject from a distance image of the subject input from a distance image capturing device that captures distance images in which each pixel represents a distance;
a shape video space adjustment unit that converts the shape data calculated by the shape calculation unit into the display space of the stereoscopic video synthesized by the device itself; and
a shape data output unit that outputs the shape data converted by the shape video space adjustment unit to a sensation presentation device.

The stereoscopic video composition device according to claim 1, further comprising a conversion generation unit that extracts three or more subjects from the left image and the right image based on their respective colors, calculates the positions of the three or more subjects in the display space of the stereoscopic video, extracts the three or more subjects from the distance image based on their respective distances, calculates the positions of the three or more subjects in the space based on the distance image, and calculates, based on the calculated positions, a conversion from the space based on the distance image to the display space,
wherein the shape video space adjustment unit applies the conversion obtained by the conversion generation unit to the shape data.

The stereoscopic video composition device according to claim 1, further comprising a conversion generation unit that calculates a conversion from the space based on the distance image to the display space, based on three or more points in the display space of the stereoscopic video designated by user operation and the points of the distance image associated with each of those points by user operation,
wherein the shape video space adjustment unit applies the conversion obtained by the conversion generation unit to the shape data.

A shape data generation method in a stereoscopic video composition device that synthesizes a stereoscopic video from a left image obtained by imaging a subject from a left-eye viewpoint and a right image obtained by imaging the subject from a right-eye viewpoint, the method comprising:
a first step in which the stereoscopic video composition device calculates shape data of the subject from a distance image of the subject input from a distance image capturing device that captures distance images in which each pixel represents a distance;
a second step in which the stereoscopic video composition device converts the shape data calculated in the first step into the display space of the stereoscopic video synthesized by the device itself; and
a third step in which the stereoscopic video composition device outputs the shape data converted in the second step to a sensation presentation device.

A program for causing a computer to function as a stereoscopic video composition device that synthesizes a stereoscopic video from a left image obtained by imaging a subject from a left-eye viewpoint and a right image obtained by imaging the subject from a right-eye viewpoint, the program causing the computer to function as:
a shape calculation unit that calculates shape data of the subject from a distance image of the subject input from a distance image capturing device that captures distance images in which each pixel represents a distance;
a shape video space adjustment unit that converts the shape data calculated by the shape calculation unit into the display space of the stereoscopic video synthesized by the device itself; and
a shape data output unit that outputs the shape data converted by the shape video space adjustment unit to a sensation presentation device.