JP2006128816A

JP2006128816A - Recording program and reproducing program corresponding to stereoscopic video and stereoscopic audio, recording apparatus and reproducing apparatus, and recording medium

Info

Publication number: JP2006128816A
Application number: JP2004311391A
Authority: JP
Inventors: Takayuki Sugawara (菅原隆幸); Takuma Suzuki (鈴木琢磨); Toshiko Murata (村田寿子); Masako Yurino (百合野正子); Jitsuki Haishi (羽石実希)
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2004-10-26
Filing date: 2004-10-26
Publication date: 2006-05-18

Abstract

PROBLEM TO BE SOLVED: To provide a recording and reproducing technique supporting stereoscopic video and stereoscopic audio that can record and reproduce stereoscopic video content and stereoscopic audio content in association with each other.

SOLUTION: In the recording technique, stereoscopic video data and stereoscopic audio data are recorded on a recording medium. In addition, three-dimensional position information of objects in the video is created per time unit together with the stereoscopic video information, and is recorded in a predetermined area that can be referenced when the stereoscopic video data and stereoscopic audio data on the recording medium are reproduced. In the reproducing technique, the three-dimensional position information of objects in the video is detected from stereoscopic video data read from the recording medium, or from received stereoscopic video data, and the spatial audio is localized in space on the basis of that position information.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a recording program, a reproducing program, a recording apparatus, a reproducing apparatus, and a recording medium supporting stereoscopic video and stereoscopic audio, for recording and reproducing the motion of objects in stereoscopic video in association with stereoscopic audio.

Several approaches to 3D (stereoscopic) video reproduction have been proposed. For 3D (stereoscopic) sound fields, techniques have also been proposed that localize a sound at a position where no speaker exists, so that it is reproduced as if a speaker were located there. However, it has not been possible to reproduce, in coordination with each other, the stereoscopic video of content displayed by a stereoscopic video technology capable of expressing space and the sound produced by a spatial sound localization technology.

For example, Japanese Patent Laid-Open No. 2000-17355 (Patent Document 1), by the present inventors, discloses a method of calculating the distance to a target object. A two-dimensional search is performed on the screen, both along the direction of the epipolar line connecting corresponding pixel points of a stereo image of the target object and along the direction orthogonal to it, so that the corresponding pixel points are covered in both directions, and a disparity vector is obtained from the result.

Japanese Patent Laid-Open No. 7-236199 (Patent Document 2) discloses a scheme in which the left-right movement of objects in stereoscopic video is detected automatically by motion vector detection, and the corresponding sound field is reproduced three-dimensionally by main speakers and surround speakers driven by a main audio signal and a sub audio signal. However, a motion vector describes the movement of an object along the time axis of the content; it is not depth information of the 3D stereoscopic video. Moreover, the 3D sound field in this prior art merely relies on the surround impression produced by several speakers and does not localize sound in space.

Japanese Patent Laid-Open No. 2001-306081 (Patent Document 3) discloses a music space configuration control apparatus that controls the configuration of an audio space in real time. It uses the parameters that DirectX provides for describing 3D sound sources, such as position, direction, and Doppler parameters, to provide an effective music space. However, it discloses no parameter for depth, the spatial dimension of stereoscopic video, and it does not describe synchronized reproduction of spatial video and sound.

Thus, no system has been known that reproduces video and sound in coordination by applying sound localization to movement in the depth direction of 3D stereoscopic video, or that keeps reproducing the sound of the content by sound localization even after the image has moved beyond the display area, so as to give realistic audiovisual reproduction.

Patent Document 1: JP 2000-17355 A
Patent Document 2: JP 7-236199 A
Patent Document 3: JP 2001-306081 A
NHK Broadcasting Technology Laboratory, "Basics of 3D Video", Ohmsha, 1995
Jens Blauert, "Spatial Acoustics", Kashima Press, 1985
B. Javidi, F. Okano (Editors), "Three-Dimensional Television, Video, and Display Technologies", Springer-Verlag (2002), pp. 101-123

The present invention has been made in view of the technical problems of the prior art described above, and an object thereof is to provide a recording technique supporting stereoscopic video and stereoscopic audio that can record stereoscopic video content and stereoscopic audio content in association with each other.

Another object of the present invention is to provide a reproducing technique supporting stereoscopic video and stereoscopic audio that, when reproducing stereoscopic video data and audio data recorded on a recording medium, can reproduce the three-dimensional position of an object in the video in synchronization with the three-dimensional localization of its sound.

The recording program supporting stereoscopic video and stereoscopic audio according to the invention of claim 1 causes a computer to execute: a step of recording the data streams of both stereoscopic video data and audio data on a recording medium; and a step of recording, in predetermined time units, three-dimensional position information of an object for which three-dimensional localization control of the sound emitted by that object as a sound source is to be performed at reproduction, together with identification information of the object, in a predetermined storage area of the recording medium that can be referenced when the stereoscopic video data and audio data are reproduced.

The reproducing program supporting stereoscopic video and stereoscopic audio according to the invention of claim 2 causes a computer to execute: a step of reading and reproducing stereoscopic video data and audio data recorded on a recording medium; a step of reading, from a predetermined storage area of the recording medium, identification information and three-dimensional position information of an object for which three-dimensional localization control of a sound source is to be performed; and a step of performing, when the audio data is reproduced, sound image position control using at least two speakers, based on the three-dimensional position information corresponding to the identification information of the object, so that the sound image of the audio data is localized at the three-dimensional position of the object.

The recording apparatus supporting stereoscopic video and stereoscopic audio according to the invention of claim 3 comprises: means for recording the data streams of both stereoscopic video data and audio data on a recording medium; and means for recording, in predetermined time units, three-dimensional position information of an object for which three-dimensional localization control of the sound emitted by that object as a sound source is to be performed at reproduction, together with identification information of the object, in a predetermined storage area of the recording medium that can be referenced when the stereoscopic video data and audio data are reproduced.

The reproducing apparatus supporting stereoscopic video and stereoscopic audio according to the invention of claim 4 comprises: means for reading and reproducing stereoscopic video data and audio data recorded on a recording medium; means for reading, from a predetermined storage area of the recording medium, identification information and three-dimensional position information of an object for which three-dimensional localization control of a sound source is to be performed; and means for performing, when the audio data is reproduced, sound image position control using at least two speakers, based on the three-dimensional position information corresponding to the identification information of the object, so that the sound image of the audio data is localized at the three-dimensional position of the object.

The recording medium supporting stereoscopic video and stereoscopic audio according to the invention of claim 5 has recorded on it the data streams of both stereoscopic video data and audio data and, in a predetermined storage area of the recording medium that can be referenced when the stereoscopic video data and audio data are reproduced, three-dimensional position information of an object in predetermined time units together with identification information of the object.

With the recording technique of the present invention, stereoscopic video data and audio data are recorded on a recording medium, and three-dimensional position information of objects in the video can be created in predetermined time units together with the stereoscopic video information and recorded in a predetermined storage area of the recording medium that can be referenced when the stereoscopic video data and audio data are reproduced.

With the reproducing technique of the present invention, three-dimensional position information of objects in the video is detected from stereoscopic video data read from a recording medium or from received stereoscopic video data, and the spatial audio can be accurately localized in space on the basis of that position information.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

(First Embodiment) FIG. 1 is a block diagram of a recording system supporting stereoscopic video and stereoscopic audio according to one embodiment of the present invention, and FIG. 2 shows how a target object is captured with it. As its stereoscopic video recording section, the system comprises a left-right pair of imaging cameras A1 and B2, a disparity vector extractor 3, a depth distance calculator 4, an object analyzer 5, a horizontal position analyzer 6, a vertical position analyzer 7, a depth position analyzer 8, a position information formatter 9, and a video compressor 14. As its audio recording section, it comprises a microphone group 11, a sound source selector 12, and an audio compressor 13. A CPU 10, an information multiplexer 15, and a recorder 16 are shared by both sections.

The recording operation of the system configured as above is described next. The video recording section captures a left image Pl and a right image Pr with the pair of imaging cameras A1 and B2 and outputs the captured images of the target object to the disparity vector extractor 3. The disparity vector extractor 3 finds pixel-by-pixel correspondences according to an evaluation function such as the correlation function disclosed in Japanese Patent Laid-Open No. 9-33249. The two cameras A1 and B2 are arranged so that their optical axes lie in the same X-Z plane. If this arrangement were exact, corresponding points would need to be searched only along the scan line, which is the epipolar line. In practice, however, the cameras are rarely aligned to within one pixel of the scan line. The disparity vector extractor 3 therefore searches for disparity vectors between the left image Pl and the right image Pr in the horizontal direction, which is the epipolar line direction, and also over a limited range in the vertical direction.

The calculation along the epipolar line direction is described with reference to FIG. 3. Suppose that a corresponding point identified in advance, or a feature point that is very easy to identify, lies at point R (Xl, Yl) in the left image Pl and at point S (Xr, Yr) in the right image Pr. Connecting R and S with a straight line gives the direction of the epipolar line EP. Because EP is nearly parallel to the X axis, the correspondence is basically found by a horizontal search. For example, using the sum of differences or the sum of squared differences over a small block of 4 horizontal by 2 vertical pixels as the evaluation measure, the X position that minimizes this measure is found. If a vertical mounting error of the cameras is possible, the line is assumed to be tilted by an angle θ from the epipolar line, and the search range is extended vertically by about the product of the horizontal search range and tan θ.
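The block search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it reads "sum of differences" as the sum of absolute differences (SAD), represents images as nested lists of luminance values, and uses hypothetical function names (`vertical_search_range`, `sad`, `find_match`).

```python
import math

def vertical_search_range(horizontal_range, theta_deg):
    """Vertical margin = horizontal search range x tan(theta), rounded up."""
    return math.ceil(horizontal_range * math.tan(math.radians(theta_deg)))

def sad(left, right, xl, yl, xr, yr, bw=4, bh=2):
    """Sum of absolute differences between two bw x bh blocks."""
    return sum(abs(left[yl + j][xl + i] - right[yr + j][xr + i])
               for j in range(bh) for i in range(bw))

def find_match(left, right, xl, yl, h_range, v_range, bw=4, bh=2):
    """Return (xr, yr) in the right image that minimizes SAD against the
    left-image block whose top-left corner is (xl, yl)."""
    height, width = len(right), len(right[0])
    best = None
    for dy in range(-v_range, v_range + 1):
        for dx in range(-h_range, h_range + 1):
            xr, yr = xl + dx, yl + dy
            # Skip candidate blocks that fall outside the right image.
            if 0 <= xr <= width - bw and 0 <= yr <= height - bh:
                cost = sad(left, right, xl, yl, xr, yr, bw, bh)
                if best is None or cost < best[0]:
                    best = (cost, xr, yr)
    return best[1], best[2]
```

With a horizontal range of 4 pixels and an assumed tilt of 15 degrees, `vertical_search_range(4, 15)` gives a vertical margin of 2 lines, so the search stays close to the scan line while tolerating a small mounting error.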

The search is carried out within the search range thus set. If, as a result, point R (Xl, Yl) in the left image corresponds to point S (Xr, Yr) in the right image, the disparity vector at point R (Xl, Yl) is expressed as V = (Xl - Xr, Yl - Yr). Such a disparity vector is obtained for every macroblock of the left image. This processing is carried out over the whole screen, and the finally selected disparity vectors V are sent to the depth distance calculator 4.

The depth distance calculator 4 calculates the magnitude of the disparity vector, searching in units of small blocks of, for example, 4 horizontal by 2 vertical pixels, and obtains the position. The resulting horizontal position, vertical position, and depth are transmitted to the object analyzer 5 as (X, Y, Z). The object analyzer 5 arranges the small blocks of each image in raster order from upper left to lower right and formats each of X, Y, and Z with a predetermined number of bits. For example, for an NTSC-class resolution of 720x480, X is given 10 bits and Y 9 bits. For an HDTV-class resolution of 1920x1080, X is given 11 bits and Y 9 bits.
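As a worked example of the disparity vector and the magnitude that serves as the depth value, the sketch below uses illustrative points R = (8, 3) and S = (5, 2). The function names are hypothetical, and the Euclidean length is an assumed reading of "magnitude".

```python
import math

def disparity_vector(r, s):
    """V = (Xl - Xr, Yl - Yr) for left-image point r and right-image point s."""
    (xl, yl), (xr, yr) = r, s
    return (xl - xr, yl - yr)

def disparity_magnitude(v):
    """Euclidean length of the disparity vector (assumed depth measure Z)."""
    return math.hypot(v[0], v[1])

v = disparity_vector((8, 3), (5, 2))  # (3, 1)
z = disparity_magnitude(v)            # sqrt(3**2 + 1**2)
```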

FIG. 4 shows an example of the format for 720x480. One such structure is transmitted per frame. The frame layer contains, for each 4x2 macroblock in the picture and in raster order, a 1-bit object flag, then a 7-bit object ID, followed by the horizontal position X, the vertical position Y, and the depth position Z.

The object flag is set to 1 for the macroblock corresponding to the center of an object (described below) and to 0 for all other macroblocks; it can be regarded as a flag indicating whether the macroblock marks an object. The object ID is the identification number of the object, also described below. Because 7 bits are used, 127 objects can be defined per frame (0 denotes the region outside any object). The object ID serves as the link to the corresponding sound source information.
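One way to picture the per-macroblock record is as a packed 32-bit word. The sketch below follows the field order given in the text (flag, ID, X, Y, Z) with the 720x480 widths of 10 and 9 bits for X and Y. The 5-bit width for Z is an assumption chosen so that the record fills the 32 bits mentioned later in the text; it is not stated in the source, and the function names are hypothetical.

```python
# (name, bit width) in transmission order; the Z width is an assumption.
FIELDS = (("flag", 1), ("object_id", 7), ("x", 10), ("y", 9), ("z", 5))

def pack_record(**values):
    """Pack one macroblock record into 4 big-endian bytes."""
    word = 0
    for name, bits in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word.to_bytes(4, "big")

def unpack_record(data):
    """Inverse of pack_record: return a dict of field values."""
    word = int.from_bytes(data, "big")
    shift = 32
    out = {}
    for name, bits in FIELDS:
        shift -= bits
        out[name] = (word >> shift) & ((1 << bits) - 1)
    return out
```

Because the flag and the 7-bit ID share the first byte, a reader can check whether a macroblock marks an object center, and which object, by inspecting a single byte.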

An object is not necessarily present in the picture; in that case the ESC information of FIG. 4 is used. This structure begins with an 8-bit field, NumOfObject, giving the number of ESC objects that follow. For each ESC object it then records the horizontal, vertical, and depth positions of the object's center (or an equivalent point), together with the object ID. This makes it possible to reproduce special scenes in which an object has left the picture, for example flying around behind the viewer, while its sound image is still heard from behind.
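A rough sketch of the ESC structure follows. Only the 8-bit NumOfObject count and the presence of an ID and three positions per entry come from the text. The entry layout used here (1-byte ID followed by three signed 16-bit coordinates, so that off-screen positions can be negative) is purely an assumption for illustration, as are the function names.

```python
import struct

ENTRY = ">Bhhh"  # assumed layout: object ID, then signed X, Y, Z
ENTRY_SIZE = struct.calcsize(ENTRY)

def pack_esc(objects):
    """objects: list of (object_id, x, y, z) for objects outside the picture."""
    if len(objects) > 255:
        raise ValueError("NumOfObject is an 8-bit field")
    out = struct.pack(">B", len(objects))
    for obj_id, x, y, z in objects:
        out += struct.pack(ENTRY, obj_id, x, y, z)
    return out

def unpack_esc(data):
    """Return the list of (object_id, x, y, z) tuples stored in data."""
    (count,) = struct.unpack_from(">B", data, 0)
    return [struct.unpack_from(ENTRY, data, 1 + ENTRY_SIZE * i)
            for i in range(count)]
```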

The information for these macroblocks is sent to the object analyzer 5. The object analyzer 5 differentiates the image data using a Laplacian operator or the like, extracts object contours, defines each large connected region as one object, and sends the contour information describing the object's region to the horizontal position analyzer 6, the vertical position analyzer 7, and the depth position analyzer 8. The horizontal position analyzer 6 calculates half the sum of the minimum and maximum horizontal coordinates of the object region. The vertical position analyzer 7 calculates half the sum of the minimum and maximum vertical coordinates. The depth position analyzer 8 uses the contour information to calculate the magnitude of the disparity vector of the 4x2-pixel small block at the object's center. The center value is used here, but the centroid, which takes the area of the object image into account, may be used instead.
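The center calculations above amount to a bounding-box midpoint, with the area centroid as the stated alternative. A small sketch, assuming the contour and the region are given as lists of (x, y) pixel coordinates and using hypothetical function names:

```python
def object_center(contour):
    """Midpoint of the horizontal and vertical extents: (min + max) / 2."""
    xs = [x for x, _ in contour]
    ys = [y for _, y in contour]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)

def center_macroblock(center, bw=4, bh=2):
    """Column and row of the 4x2 small block that contains the center."""
    return (int(center[0]) // bw, int(center[1]) // bh)

def centroid(pixels):
    """Area centroid: mean of all pixel coordinates in the object region."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)
```

For a non-symmetric region the two measures differ: an L-shaped region made of the pixels (0,0), (1,0), (2,0), (0,1) has bounding-box center (1.0, 0.5) but centroid (0.75, 0.25).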

In the position information formatter 9, each calculated value is written as the X, Y, Z position information of the object; the object flag is set to 1 for the macroblock corresponding to the object's center and to 0 for all other macroblocks. Further objects are detected in the same way, and objects are assigned IDs from 1 to 127 in order of appearance, which are recorded in the object ID field. A value of 0 is written for regions outside any object. Because an object may have moved in the next frame image, a detected object is recognized as the same object when it satisfies several conditions. Examples of such conditions are that it is the nearest object, that its contour information is nearly the same in terms of features (such as the number of corners for a rectangular object, or the luminance and color difference signals inside the object, as 8-bit data, agreeing to within 10%), and that, between temporally adjacent frames, it is the closest detected object. From the next frame onward, the same ID value is recorded for objects recognized as the same. The position information formatter 9 formats the data as shown in FIG. 4 and sends it to the information multiplexer 15, and at the same time sends the X, Y, Z position information of the objects to the CPU 10.
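The frame-to-frame ID assignment can be sketched as a greedy nearest-neighbor match with a feature tolerance. This is a simplified illustration under stated assumptions: each object is reduced to a position and a single mean luminance value, the 10% tolerance is taken relative to the previous frame's value, and the function name is hypothetical.

```python
import math

def assign_ids(prev, current, tolerance=0.10):
    """prev: {object_id: (position, luma)} from the previous frame.
    current: list of (position, luma) detected in this frame.
    Returns one object ID per current object, reusing IDs where possible."""
    ids = []
    used = set()
    next_id = max(prev, default=0) + 1
    for pos, luma in current:
        # Candidates: unused previous objects whose luminance is within 10%.
        candidates = [
            (math.dist(pos, p_pos), obj_id)
            for obj_id, (p_pos, p_luma) in prev.items()
            if obj_id not in used and abs(luma - p_luma) <= tolerance * p_luma
        ]
        if candidates:
            _, obj_id = min(candidates)  # nearest matching object
        else:
            obj_id = next_id             # new object: next ID in order
            next_id += 1
            if obj_id > 127:
                raise ValueError("only IDs 1 to 127 are available")
        used.add(obj_id)
        ids.append(obj_id)
    return ids
```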

Next, the method of recording audio information is described. For audio, the microphone group 11 is arranged around cameras 1 and 2, facing all directions. That is, as shown in FIG. 5, multiple highly directional microphones are placed on the surface of a sphere centered on the cameras, each pointing along the normal of the sphere. FIG. 5 is conceptual: a three-dimensional coordinate system with X, Y, and Z axes is virtually set with the camera position at the origin, and microphones are installed at the six axis directions and, as densely as possible, in between so that intermediate directions can be interpolated. The microphone in the camera lens direction (the +Z direction microphone in FIG. 5) is positioned so that it does not enter the camera's field of view.

The position information of each detected object is input from the position information formatter 9 to the CPU 10. The CPU 10 sends a selection signal to the sound source selector 12 so that the sound from the microphone facing the direction corresponding to each object's position is recorded. The sound source selector 12 selects the sound sources from the multiple microphones according to this signal and sends the audio data from the selected microphones to the audio compressor 13. At the same time, the object ID is passed from the position information formatter 9 through the CPU 10 and the sound source selector 12 to the audio compressor 13. The audio compressor 13, described later, uses the object ID to write identifier information into the compressed audio data, following the way descriptors are written in the MPEG standard.
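The microphone selection step can be sketched as choosing the microphone whose pointing direction is closest to the direction of the object as seen from the camera. The sketch assumes only the six axis microphones, represented as unit vectors in the camera-centered coordinate system; the table `MICS` and the function name are hypothetical.

```python
import math

# Assumed layout: one microphone per axis direction, as unit vectors.
MICS = {
    "+X": (1, 0, 0), "-X": (-1, 0, 0),
    "+Y": (0, 1, 0), "-Y": (0, -1, 0),
    "+Z": (0, 0, 1), "-Z": (0, 0, -1),
}

def select_microphone(position, mics=MICS):
    """Return the name of the microphone best aligned with the object."""
    norm = math.sqrt(sum(c * c for c in position))
    if norm == 0:
        raise ValueError("object at the camera position has no direction")

    def cosine(direction):
        # Cosine of the angle between the object direction and the mic axis.
        return sum(p * d for p, d in zip(position, direction)) / norm

    return max(mics, key=lambda name: cosine(mics[name]))
```

An object in front of the camera and slightly to the right, such as (0.2, 0.1, 3.0), maps to the "+Z" microphone. Interpolating between neighboring microphones, which the text also allows for, is not covered by this sketch.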

Meanwhile, the image from camera B2 (or camera A1) is sent to the video compressor 14. The video compressor 14 receives the 2D image from the camera and the depth information and object ID information from the position information formatter 9, and performs MPEG image compression. To remain compatible with the MPEG specification, the depth information and object ID information are carried in the user data area or in a private stream, and the object ID uses the additional_identifier_info of the Registration_Descriptor defined in the MPEG standard. Alternatively, the following methods may be adopted: (1) defining a new private descriptor and creating an Object_ID_descriptor as shown in FIG. 19; (2) using the reserved data stream values of Stream_ID, that is, the range 11111010 to 11111110 that is undefined in MPEG; or (3) using a Stream_Type value from 0x80 to 0xFF, which MPEG designates as User Private. The Stream_ID in (2) is the one defined in the PES packet syntax of ISO 13818-1. Beyond these, any of the user data areas defined by MPEG may be used, and the identification code may also be placed in the user data portion permitted by the syntax of AC3 or other audio formats.
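As an illustration of carrying the object ID in a registration descriptor, the sketch below builds the descriptor bytes in the layout used by MPEG-2 Systems: an 8-bit descriptor_tag of 0x05, an 8-bit descriptor_length, a 32-bit format_identifier, and then the additional identification bytes (the field the text calls additional_identifier_info, named additional_identification_info in ISO/IEC 13818-1). The format identifier `b"OBJ0"` is a made-up placeholder; real identifiers are assigned by a registration authority. Storing the object ID as a single byte is likewise an assumption.

```python
REGISTRATION_DESCRIPTOR_TAG = 0x05

def registration_descriptor(format_identifier: bytes, object_id: int) -> bytes:
    """Build a registration descriptor whose trailing byte is the object ID."""
    if len(format_identifier) != 4:
        raise ValueError("format_identifier must be exactly 4 bytes")
    if not 0 <= object_id <= 127:
        raise ValueError("object ID must fit in 7 bits")
    body = format_identifier + bytes([object_id])
    return bytes([REGISTRATION_DESCRIPTOR_TAG, len(body)]) + body

# Hypothetical identifier "OBJ0" with object ID 5.
descriptor = registration_descriptor(b"OBJ0", 5)
```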

The audio data from the sound source selector 12 is compressed in the audio compressor 13 using MPEG audio compression or a similar scheme (MPEG1 audio, MPEG2 audio, AAC, Dolby AC3, ATRAC, etc.).

Next, a method of recording the additional information while conforming to the MPEG standard is described. In the MPEG image compression standard, user data areas are provided in both the picture layer and the GOP layer. The additional information is recorded in user_data or private_data_byte, which the MPEG syntax sets aside as areas where data unrelated to the video and audio can be embedded, or in a data packet such as private_stream, which the user can define freely. For example, the picture layer of MPEG1 video is structured as shown in FIG. 17, and a mechanism is defined by which user_data can be recorded in 8-bit units after user_data_start_code is sent, ahead of the slice layer. In the system layer of a multiplexed transport stream such as MPEG2, as shown in FIG. 18, setting transport_private_data_flag to 1 signals that private_data is present, and private_data of the length set in transport_private_data_length can be sent, subject to the restriction that the data does not extend beyond the transport packet.

Several other mechanisms for recording user-specific data are defined in MPEG systems, such as setting stream_id to private_stream to declare a dedicated packet. The information with the structure of FIG. 4 in the present invention, such as the depth information and object ID, can be recorded in these areas for each picture. Any of these mechanisms may be used; this embodiment uses the user_data of MPEG1 video.
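A minimal sketch of wrapping a payload as picture-layer user data follows. The start code value 0x000001B2 is the one MPEG video defines for user_data_start_code. The check against start code emulation is a simplified, byte-aligned one and is an addition of this sketch; the standard's actual constraint on zero runs inside user data is stricter. The function name is hypothetical.

```python
USER_DATA_START_CODE = bytes.fromhex("000001B2")

def build_user_data(payload: bytes) -> bytes:
    """Prefix a payload with user_data_start_code.

    Simplified safety check: reject payloads containing the byte-aligned
    start code prefix 00 00 01, which a decoder could mistake for the
    beginning of the next syntax element.
    """
    if b"\x00\x00\x01" in payload:
        raise ValueError("payload would emulate an MPEG start code prefix")
    return USER_DATA_START_CODE + payload
```

A real multiplexer would also need to escape or re-encode offending payload bytes rather than reject them; that step is outside this sketch.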

In MPEG, user_data_start_code is defined as 0x000001B2 and precedes the slice layer. In this embodiment, after this code is sent, a code that can be uniquely identified in advance, for example 0x0f0f0f0f2417fdaa, is transmitted to indicate that the function value used for authentication in the present invention is present in the user data area. This code is recorded so that the data can be distinguished when other applications also use user_data; its value has no particular meaning. After the code, the 32-bit (4-byte) per-macroblock structure of FIG. 4 is recorded in the picture layer, in raster order, for each MPEG picture. For a 720 × 480 picture, the amount of information is 720 × 480 × 4 / (4 × 2) = 172,800 bytes, or roughly 170 KB. If this is too large, the information is compressed. For example, depth information is often identical to, or only slightly different from, that of the adjacent macroblock, so when it has not changed a 4-byte skip code (for example, the all-zero 0x00000000) is set, which allows the information for the macroblocks being skipped to be reduced from 4 bytes to 1 byte. Alternatively, a slight difference (for example, within plus or minus 1) is treated as 0. Besides this skip method, any compression method may be used, such as DPCM, which transmits only the difference between values, or entropy-reducing coding.
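The user-data layout and the DPCM option described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's actual format: the FIG. 4 bit layout is not reproduced here, so each macroblock record is treated as an opaque 32-bit value; big-endian byte order is assumed; and the function names are hypothetical. A real encoder would also have to keep start-code-like byte patterns out of the payload, which this sketch ignores.

```python
import struct

USER_DATA_START_CODE = 0x000001B2    # user_data_start_code as given in the text
APP_ID_CODE = 0x0F0F0F0F2417FDAA     # example identifier code quoted in the text
MB_W, MB_H = 4, 2                    # macroblock size used in this document


def macroblock_count(width, height):
    """Number of 4x2 macroblocks in a picture of the given size."""
    return (width // MB_W) * (height // MB_H)


def pack_user_data(records):
    """Start code + identifier code + one 32-bit record per macroblock (raster order)."""
    header = struct.pack(">IQ", USER_DATA_START_CODE, APP_ID_CODE)
    body = b"".join(struct.pack(">I", r) for r in records)
    return header + body


def dpcm_encode(values):
    """Replace each value by its difference from the previous one (first vs. 0)."""
    prev, diffs = 0, []
    for v in values:
        diffs.append(v - prev)
        prev = v
    return diffs


def dpcm_decode(diffs):
    """Inverse of dpcm_encode: accumulate the differences."""
    prev, values = 0, []
    for d in diffs:
        prev += d
        values.append(prev)
    return values
```

For a 720 × 480 picture, `macroblock_count` gives 43,200 macroblocks, so the uncompressed body is 43,200 × 4 = 172,800 bytes. Slowly varying depth values turn into runs of small differences under `dpcm_encode`, which is what makes a subsequent skip or entropy code effective.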

The procedure of the recording process performed by the above stereoscopic video and stereoscopic audio recording system is described with reference to the flowchart of FIG. 6. Image data from camera A1 and camera B2 and acoustic data from the microphone group 11 are input for a predetermined period and stored in a storage device (step S1). Next, the disparity vectors of the image are extracted (step S2). Here, a disparity vector search is computed for every macroblock in the image, in the horizontal direction, which is the epipolar line direction, and over a limited vertical range. Next, the depth distance of each macroblock is calculated; that is, the magnitude of its disparity vector is computed (step S3).
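Steps S2 and S3 can be illustrated with a simple block-matching search. The text does not name a matching criterion, so the sum of absolute differences (SAD) used here is an assumption, as are the search ranges and all function names. Images are plain lists of rows of grey levels.

```python
import math


def sad(left, right, bx, by, dx, dy, bw, bh):
    """Sum of absolute differences between a left block and a displaced right block."""
    total = 0
    for y in range(bh):
        for x in range(bw):
            total += abs(left[by + y][bx + x] - right[by + y + dy][bx + x + dx])
    return total


def block_disparity(left, right, bx, by, bw=4, bh=2, max_dx=8, max_dy=1):
    """Search mainly horizontally (epipolar direction), with a small vertical range."""
    h, w = len(right), len(right[0])
    best_key, best_vec = None, (0, 0)
    for dy in range(-max_dy, max_dy + 1):
        for dx in range(-max_dx, max_dx + 1):
            # Skip candidates whose block would fall outside the right image.
            if bx + dx < 0 or bx + dx + bw > w or by + dy < 0 or by + dy + bh > h:
                continue
            cost = sad(left, right, bx, by, dx, dy, bw, bh)
            key = (cost, abs(dx) + abs(dy))  # prefer the smaller shift on ties
            if best_key is None or key < best_key:
                best_key, best_vec = key, (dx, dy)
    return best_vec


def disparity_magnitude(vec):
    """Step S3: the magnitude of the disparity vector, used as the depth measure."""
    return math.hypot(vec[0], vec[1])
```

Running `block_disparity` for every 4 × 2 macroblock and taking `disparity_magnitude` of each result yields the per-macroblock depth values that the later steps rely on.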

Next, it is determined whether an object exists in the image (step S4). The image data is differentiated with a simple operator such as the Laplacian, and the presence of an object contour is checked. If the result is YES, full-scale object detection is performed (step S5): the object analyzer differentiates the image data with a Laplacian operator or the like, extracts the object contours, defines each large connected mass as one object, and determines the region of that object. The determined object is then extracted (defined), and the macroblocks included in it are identified (step S6). Next, the horizontal and vertical position information of the object is analyzed (step S7). Because depth information has already been obtained for each macroblock, the macroblocks covering the object region are specified and their center is taken as the position of the object. Next, the object ID and position information are formatted (step S8): the obtained horizontal position, vertical position, and depth are expressed as position information (X, Y, Z) and, together with the object flag and the object ID, put into the format shown in FIG. 4. Next, the sound source to be linked to the position information of the object is selected (step S9).
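The contour test of step S4 and the position estimate of step S7 might look like the sketch below. The 4-neighbour Laplacian kernel, the threshold value, and the use of the mean depth as Z are assumptions made for illustration; the text only says that a Laplacian-type operator is used and that the center of the covering macroblocks becomes the object position. Function names are hypothetical.

```python
def laplacian(img):
    """4-neighbour discrete Laplacian; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = (img[y - 1][x] + img[y + 1][x]
                         + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x])
    return out


def has_contour(img, threshold=50):
    """Step S4: report whether any Laplacian response exceeds the threshold."""
    return any(abs(v) > threshold for row in laplacian(img) for v in row)


def object_position(macroblocks, depth):
    """Step S7: center of the macroblocks covering the object.

    macroblocks: list of (mb_x, mb_y) indices; depth: dict mapping index -> depth.
    Z is taken as the mean depth of those macroblocks (an assumption).
    """
    n = len(macroblocks)
    x = sum(mx for mx, _ in macroblocks) / n
    y = sum(my for _, my in macroblocks) / n
    z = sum(depth[mb] for mb in macroblocks) / n
    return (x, y, z)
```

A flat image produces no response and is classified as containing no object, while any sharp intensity step produces a large Laplacian magnitude.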

If the determination in step S4 is NO, step S15 determines whether to extend the motion of an object from an earlier scene, such as the preceding pictures, and thereby represent an object that is not present in the video data. If YES, the process jumps to step S7. If NO, the position information is formatted as described above with the object ID set to 0 (step S16). For an object that is not present in the video, the ESC information of FIG. 4 is used to record the number of ESC objects that follow and, for each, the horizontal position, vertical position, and depth position of its center (or an equivalent point) and its object ID.

Next, the video data and the multiple audio data linked to objects are compressed (step S10). Identification information identical to the ID of the linked video object is then written into each audio stream (step S11). MPEG multiplexing is performed (step S12), and the result is recorded on the medium in predetermined units or, for transmission, packetized in the manner specific to the transmission path and output (step S13). It is then determined whether input image data remains (step S14). If so (YES), the process returns to step S1; if not (NO), the process ends.

Thus, the recording system of this embodiment describes the stereoscopic position information of objects, in predetermined time units, in the user data area defined by MPEG, in a private stream, or in an information body in a separate area. It also describes the object identification information, in predetermined time units, linked to the stream identification information of both the stereoscopic video data and the audio data. The stereoscopic video data can thereby be recorded on the recording medium together with the audio data.

(Second Embodiment) Next, a stereoscopic video and stereoscopic audio recording system according to a second embodiment of the present invention, based on CG (computer graphics), is described with reference to FIG. 7. CG requires no camera or microphone for capture, so the stereoscopic video and stereoscopic audio are created entirely by the CPU 20 executing a program. The CPU 20 therefore contains an image signal generator 21 and a sound source signal generator 22, each started by dedicated software. CG image data basically carries position and depth information in advance for small image elements such as polygons, so the position information of the polygon portion corresponding to the 4 × 2 macroblock described earlier is easy to calculate. Audio is likewise easy to create: predetermined sound source data for the position of an object in the CG image is generated with a synthesizer and used as the sound source. The video is input to the video compressor 23 and the audio to the audio compressor 24, where each is compressed. As in the system of FIG. 1, the compressed data is multiplexed by the information multiplexer 26 together with the information formatted by the position information formatter 25, and is recorded on the recording medium 17 by the recorder 27.

Thus, the recording system of this embodiment describes the stereoscopic position information of objects, in predetermined time units, in the user data area defined by MPEG, in a private stream, or in an information body in a separate area. It also describes the object identification information, in predetermined time units, linked to the stream identification information of both the stereoscopic video data and the audio data. The stereoscopic video data can thereby be recorded on the recording medium together with the audio data.

(Third Embodiment) FIG. 8 shows a stereoscopic video and stereoscopic audio recording system according to a third embodiment of the present invention. In the first embodiment shown in FIG. 1 and the second embodiment shown in FIG. 7, the final information is recorded on the recording medium 17; in the system of this embodiment, it is instead packetized in the manner specific to communication or broadcasting and transmitted to a broadcast or communication network. The system therefore includes a communication (broadcast) packetizer 18, which packetizes the multiplexed stereoscopic video and stereoscopic audio data from the information multiplexer 15 into packet data for communication (broadcast) and sends it to the communication or broadcast network. Device elements shared with the system of the first embodiment shown in FIG. 1 are given the same reference numerals.

The recording method of the system of this embodiment follows the flowchart of FIG. 6, as in the first embodiment. As in the first embodiment, the system describes the stereoscopic position information of objects, in predetermined time units, in the user data area defined by MPEG, in a private stream, or in an information body in a separate area, and describes the object identification information, in predetermined time units, linked to the stream identification information of both the stereoscopic video data and the audio data. The stereoscopic video data can thereby be multiplexed with the audio data, packetized, and transmitted.

(Fourth Embodiment) Next, a stereoscopic video and stereoscopic audio reproduction system for reproducing the recorded information that was created by the recording systems of the above embodiments and recorded on the recording medium 17 is described with reference to FIG. 9. The reproduction system of this embodiment includes a reproducer 31, an information separator 32, a video decoder 33, a position information extractor 34, an audio decoder 35, a visual field converter 36, a stereoscopic image display 37, a sound source selector 38, a sound image position controller 39, and a speaker array 40.

In this reproduction system, the reproducer 31 reads the multiplexed data from the recording medium 17 and transmits it to the information separator 32. The information separator 32 separates the video and audio packets and transmits the video signal to the video decoder 33 and the audio signal to the audio decoder 35. The video decoder 33 decodes the video and, at the same time, passes the user data of the video picture layer to the position information extractor 34. The position information extractor 34 extracts from the user data the depth information, object position information, and object ID information recorded in the format of FIG. 4. The object position information and object ID information are transmitted to the sound source selector 38; the depth information and the decoded video signal are transmitted to the visual field converter 36. The visual field converter 36 generates, from the depth information and the decoded two-dimensional image, parallax images suited to the stereoscopic display method of the stereoscopic image display 37. These parallax images are generated with the visual field conversion used in CG for converting between coordinate systems: the formula for conversion to the viewpoint coordinate system yields an image seen from a different viewpoint, and with depth information an image can be generated from any viewpoint. For example, let the coordinates of the viewpoint be (x_i, y_i, z_i) and the coordinates of the gaze point be (x_a, y_a, z_a), and let the distance between the viewpoint and the gaze point be (x_f, y_f, z_f).

First, the origin is moved by a translation; call this transformation T_1. T_1 is simply a translation by (−x_a, −y_a, −z_a). Next, the orientation of the coordinate axes is changed by rotation. As shown in FIG. 10, the vector from point O_a toward point O_f is obtained from the z-axis vector at O_a by first rotating it by the angle α about the y-axis and then by the angle β about the x-axis. Because it is actually the coordinates of point O_f that are moved, the rotation directions are reversed. These two rotations are the transformations T_2 and T_3.

Here α is the angle formed at O_a by the foot O_f′ of the projection of O_f onto the xz plane (the projection that leaves the length y_f between O_f and O_f′), so

$$\sin\alpha = \frac{x_f}{\sqrt{x_f^2 + z_f^2}}, \qquad \cos\alpha = \frac{z_f}{\sqrt{x_f^2 + z_f^2}}.$$

β is determined by the length between O_a and O_f,

$$\overline{O_a O_f} = \sqrt{x_f^2 + y_f^2 + z_f^2},$$

and the length y_f between O_f and O_f′:

$$\sin\beta = \frac{y_f}{\sqrt{x_f^2 + y_f^2 + z_f^2}}, \qquad \cos\beta = \frac{\sqrt{x_f^2 + z_f^2}}{\sqrt{x_f^2 + y_f^2 + z_f^2}}.$$

The last transformation, T_4, converts from a coordinate system in which the z-axis points toward the viewer relative to the xy plane to one in which the viewing direction, that is, the far side of the xy plane, is positive. This simply replaces z with −z. Multiplying the four transformation matrices T_1 to T_4 together gives the transformation matrix for the viewpoint coordinates.
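The chain of four transformations T_1 to T_4 can be sketched in code as follows. The matrices themselves are not reproduced in this text, so the sketch rests on assumptions: 4 × 4 homogeneous matrices acting on column vectors, (x_f, y_f, z_f) taken as viewpoint minus gaze point, α measured in the xz plane, and β as the elevation toward y. The names `view_matrix` and `transform_point` are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(m, p):
    """Apply a 4x4 homogeneous matrix to a 3D point (column-vector convention)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


def view_matrix(viewpoint, gaze):
    """Compose T1 (translate), T2 (rotate about y), T3 (rotate about x), T4 (flip z)."""
    xa, ya, za = gaze
    xf, yf, zf = (viewpoint[i] - gaze[i] for i in range(3))
    r_xz = math.hypot(xf, zf)
    r = math.sqrt(xf * xf + yf * yf + zf * zf)
    if r == 0.0:
        raise ValueError("viewpoint and gaze point coincide")
    # Guard the degenerate case where the offset is parallel to the y-axis.
    sa, ca = (xf / r_xz, zf / r_xz) if r_xz > 0.0 else (0.0, 1.0)
    sb, cb = yf / r, r_xz / r
    t1 = [[1, 0, 0, -xa], [0, 1, 0, -ya], [0, 0, 1, -za], [0, 0, 0, 1]]
    t2 = [[ca, 0, -sa, 0], [0, 1, 0, 0], [sa, 0, ca, 0], [0, 0, 0, 1]]   # reversed rotation by alpha
    t3 = [[1, 0, 0, 0], [0, cb, -sb, 0], [0, sb, cb, 0], [0, 0, 0, 1]]   # rotation by beta
    t4 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]      # z -> -z
    return matmul(t4, matmul(t3, matmul(t2, t1)))
```

Under these conventions the gaze point maps to the origin and the viewpoint lands on the z-axis at distance √(x_f² + y_f² + z_f²) from it, so the line of sight runs along the positive z direction after the final flip.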

In IP (integral photography, also called integral imaging), the elemental images corresponding to the lenses of a lens array, each as it would be captured by a camera at the corresponding lens position, are generated by calculation using the viewpoint coordinate transformation matrix above together with the image size. The stereoscopic image data generated in this way is transmitted to the stereoscopic image display 37 for stereoscopic reproduction. A binocular stereoscopic display method using a parallax barrier may also be adopted for the stereoscopic image display 37. In that case, substituting an α set according to the viewing distance into the above formula and setting β and γ to 0 generates images with parallax for the right eye and the left eye.

Of the stereoscopic image display methods, the representative parallax barrier method and IP method are described here. The parallax barrier method can be realized with liquid crystal. As shown in FIG. 11, two liquid crystal panels 101 and 102 are stacked. One panel, 101, has narrow slit-shaped openings, and images for the left (L) and right (R) eyes are arranged alternately at suitable intervals on the panel 102 behind it. When viewed through the slit openings from the predetermined viewpoints 103L and 103R, the images are perceived separately by the right eye and the left eye. Because different images can thus be presented to the right and left eyes, the result is perceived as a stereoscopic image. A backlight 104 is provided to illuminate the image on the liquid crystal panel 102.

With the parallax barrier method, however, the eyes are always focused on the liquid crystal screen while the image is perceived at a different position. This physiological unnaturalness has been reported to cause problems such as user fatigue and visually induced motion sickness. In recent years, methods have therefore been proposed that avoid the vergence-accommodation conflict (the mismatch between the convergence point and the focus position) and satisfy the four physiological factors of stereoscopic vision: binocular parallax (when looking at an object, the left and right eyes capture two different images seen from different directions); accommodation (the thickness of the crystalline lens is controlled as the distance to the object changes); convergence (the eyeballs rotate inward or outward as the object moves nearer or farther); and motion parallax (the image changes as the viewer moves or changes the viewing angle). A promising one among them is the IP method, which Lippmann published in 1908.

The IP method acquires depth information of an object using a two-dimensionally arranged lens array (also called a fly's-eye lens or compound-eye lens). In the 1990s, technology for generating IP moving images was developed by replacing recording on conventional photographic plates with electronic techniques. Furthermore, the researchers of the same document used a gradient-index lens array (also called a GRIN lens array) and a Hi-Vision (high-definition) camera to capture a subject and acquire the group of elemental images, transmitted the images to a liquid crystal display in real time, and succeeded in forming the image in space with a fly's-eye lens, demonstrating the feasibility of three-dimensional television broadcasting by the IP method (Non-Patent Document 3). FIG. 12 illustrates the principle of the IP method. As shown in FIG. 12A, a GRIN lens array 110 consisting of many minute element lenses is used at the time of capture; the light from each element lens is collected by a condenser lens 111, and each pixel of the micro camera 112 captures the light ray of one direction. On reproduction, as shown in FIG. 12B, the image 120 from the camera is reproduced on a display 121 such as an LCD, and one point from each of the micro cameras combines to form, as a whole, a reproduced image viewed from one direction.

Using a lens array 122 in which microlenses are arranged two-dimensionally makes it possible to produce both horizontal and vertical motion parallax; arranging them only horizontally gives parallax in the horizontal direction only. In this method, the multiple elemental images that would be seen through the multiple lenses are created by viewpoint conversion based on the depth information, arranged, and displayed on the LCD 121 as an elemental image array, as if they had been captured as in FIG. 12A, thereby realizing stereoscopic reproduction.

For audio, the audio decoder 35 decodes the compressed audio data of the multiple sound sources and transmits it to the sound source selector 38. The sound source selector 38 receives the object position information and object ID information from the position information extractor 34 described above, selects, for each object present in the image, the decoded sound source linked to its object ID, and transmits the selected sound source data paired with the object position information to the sound image position controller 39. The sound image position controller 39 uses the sound image localization control method based on the speaker array 40, described below, to control, for each sound source, the localization of the audio source linked to the image object according to that object's position. The localization control of each sound source yields audio data for each of the speakers, so each speaker has audio data from multiple objects. These are all added linearly and the gain is adjusted, so that a single audio signal is output per speaker, and the result is transmitted to the speaker array 40, which outputs the transmitted audio data.
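The final mixing stage described above, in which the contributions of all objects to one speaker are added linearly and gain-adjusted, can be sketched as below. The text does not say how the gain is chosen, so a single scalar `gain` is assumed here; the function name and data layout are also hypothetical.

```python
def mix_for_speakers(per_object_feeds, gain=1.0):
    """Sum the contributions of every object for each speaker, then apply a gain.

    per_object_feeds[k][s] is the list of samples that object k contributes
    to speaker s after localization control. All lists are assumed equal length.
    """
    if not per_object_feeds:
        return []
    n_speakers = len(per_object_feeds[0])
    n_samples = len(per_object_feeds[0][0])
    mixed = []
    for s in range(n_speakers):
        channel = [gain * sum(feed[s][i] for feed in per_object_feeds)
                   for i in range(n_samples)]
        mixed.append(channel)
    return mixed
```

Each returned channel is the single signal sent to one speaker of the array, regardless of how many objects contributed to it.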

The sound image localization control method is described here. A speaker array is used, and sound image localization is realized by reproduction signals given delays corresponding to the difference between the path from the center of the speaker array to a focal point and the path from each speaker to that focal point, so that the sound pressure near the focal point in space is raised locally. The principle is explained with reference to FIG. 13. First, speakers are assembled into an array as shown in FIG. 13 to form the speaker array 40, and a delay circuit 131 is provided for each speaker. When the delay circuits 131 are set by the known method described above so that the sound is focused near the listening position, reproduction is possible in which the sound pressure component generated at the focal point is far higher at the listening position than the direct sound from the speakers. By applying this principle continuously in real time, the localization of the sound image is controlled in step with the position of the object in the stereoscopic moving image.
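The delay rule described above can be sketched as follows: each speaker's delay is derived from the difference between the center-to-focus path and its own path to the focus. The speed of sound (343 m/s), the use of the mean speaker position as the array center, and the constant offset that keeps every delay non-negative are assumptions added for this illustration; `focusing_delays` is a hypothetical name.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def focusing_delays(speakers, focus, c=SPEED_OF_SOUND):
    """Per-speaker delays (seconds) so that all wavefronts reach the focus together.

    speakers: list of (x, y, z) positions in metres; focus: (x, y, z).
    """
    n = len(speakers)
    center = tuple(sum(s[i] for s in speakers) / n for i in range(3))
    d_center = math.dist(center, focus)
    # Raw delay: path difference relative to the array center, as in the text.
    raw = [(d_center - math.dist(s, focus)) / c for s in speakers]
    # Shift so the smallest delay is zero; a delay line cannot be negative.
    offset = min(raw)
    return [t - offset for t in raw]
```

Speakers nearer the focal point receive larger delays, so that delay plus travel time is the same for every speaker. Recomputing the delays whenever the object position (X, Y, Z) changes moves the focal point along with the object.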

Through this series of data processing, the image is reproduced as a stereoscopic image, and for each object in it the position of the sound source of the audio data is controlled in step with the object's three-dimensional motion, so that the sound image is localized and reproduced as if the sound emitted by the object were heard from the stereoscopically viewed space. That is, as shown in FIG. 14, the stereoscopic video 120 is reproduced on the LCD 121 and formed into stereoscopic video from the elemental images using, for example, the IP lens array 122. The speaker array 40 set up with it then localizes the sound image so that the sound source of a stereoscopic object, such as a car or an airplane perceived as coming out in front of the display 121, is heard as if from the position of that object.

Next, reproduction processing by the stereoscopic video and stereoscopic audio reproduction system using the IP method is described with reference to the flowchart of FIG. 15. First, the multiplexed data is read from the recording medium 17 (step R1), and MPEG demultiplexing is performed (step R2). Of the separated information, the video data undergoes video decoding and the audio data undergoes audio decoding (steps R3 and R9). Next, the object ID and position information are separated from the user data of the decoded video data (step R4). From the separated information, the depth information of each macroblock is detected (step R5), and the object ID is detected (step R6).

Meanwhile, the object ID described in, for example, the stream ID of the audio data is detected from the audio data decoded in step R9 (step R10). The IDs of the multiple audio objects are then collated with the video IDs, and the sound source of the audio data linked to each video object is selected (step R11).
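The ID collation of steps R10 and R11 amounts to a lookup keyed by object ID. The sketch below assumes dictionaries as the data structures and treats object ID 0 as "no object", following the recording-side convention of step S16; the function name is hypothetical.

```python
def select_linked_sources(video_objects, audio_streams):
    """Pair each video object with the audio stream that carries the same object ID.

    video_objects: dict mapping object ID -> (X, Y, Z) position.
    audio_streams: dict mapping object ID -> decoded samples.
    Returns a list of (object ID, samples, position) tuples.
    """
    linked = []
    for object_id, position in video_objects.items():
        if object_id == 0:
            continue  # ID 0 marks "no object" on the recording side
        if object_id in audio_streams:
            linked.append((object_id, audio_streams[object_id], position))
    return linked
```

Each returned tuple is what the sound image position controller needs: the samples to play and the position at which to localize them.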

Next, the visual field conversion of the video image is performed using the depth information held for the pixels in each macroblock (step R7); an image from a different viewpoint is obtained with the formula for conversion to the viewpoint coordinate system. The video is then displayed stereoscopically by the IP method (step R8). A binocular stereoscopic display method using a parallax barrier can also be used for this display.

For audio, for each of the selected sound sources whose link relationship has been established, the sound image position controller 39 controls the localization of the audio source according to the position of the image object linked to it, using for example the sound image localization control method based on the speaker array 40 (step R12). The per-speaker audio data resulting from the localization control of each sound source is then added linearly and the gain adjusted, so that a single audio signal is output per speaker, and it is output from the speaker array 40 (step R13).

In this way, the reproduction system of this embodiment reproduces the stereoscopic video 120 on the LCD 121 and forms stereoscopic video from the elemental images using the IP lens array 122. The speaker array 40 set up with it then localizes the sound image so that the sound source of a stereoscopic object, such as a car or an airplane perceived as coming out in front of the display 121, is heard as if from the position of that object.

(Fifth Embodiment) Next, a stereoscopic video and stereoscopic audio reproduction system according to a fifth embodiment of the present invention is described with reference to FIG. 16. In the reproduction system of the fourth embodiment shown in FIG. 9, the final information is reproduced from the recording medium 17. The reproduction system of this embodiment instead receives packet information packetized in the manner specific to communication or broadcasting: as shown in FIG. 16, it receives packets of stereoscopic video and audio data from a broadcast or communication network through a communication (broadcast) depacketizer 31′ and reproduces them. The data depacketized by the depacketizer 31′ is the same as the data reproduced from the recording medium 17 in FIG. 9, so the components from the information separator 32 onward, and their processing functions, are the same as in the fourth embodiment shown in FIG. 9.

Thus, the reproduction system of the fifth embodiment can likewise reproduce stereoscopic video received over a communication network or by broadcast on the LCD 121 shown in FIG. 14 and form stereoscopic video from the elemental images using the IP lens array 122. The speaker array 40 set up with it then localizes the sound image so that the sound source of a stereoscopic object, such as a car or an airplane perceived as coming out in front of the display 121, is heard as if from the position of that object.

The present invention is not limited to the above embodiments, and the following modifications are possible. In the above embodiments, the stereoscopic video method was described as the IP method, but any method that allows stereoscopic perception may be used, such as the parallax barrier method, the lenticular lens method, the super multi-view method, a two-view method using polarized glasses, or anaglyph. Likewise, sound image localization control was described using the speaker array method, but a method that can realize a virtual sound field space, such as the binaural/transaural method, or a method using sound field control based on wave acoustics theory, as typified by the Kirchhoff-Helmholtz equation, may also be used.

Furthermore, the stereoscopic video and audio data need not be recorded on a recording medium; they can be transmitted over any transmission medium, such as communication or broadcasting. In that case, the recording apparatus can be used as a transmission apparatus, and the reproducing apparatus can be used as a receiving apparatus.

A recording medium on which the signal data of the present invention is recorded has the medium-specific effect that object position information and object ID information are recorded on it, which makes it well suited to realizing a system that reproduces stereoscopic video and a stereophonic sound field. The term "medium" in "recording medium" is not limited to the narrow sense of a medium on which data can be recorded; it also includes electromagnetic waves, light, and the like used to transmit signal data. In addition, the information recorded on the recording medium is taken to include the data itself, such as an electronic file, in a state in which it is not recorded.

Further, in the above embodiments the depth information of the video was described as being recorded for every picture, but it may instead be recorded about every 0.5 seconds or about every 1 second. This can be realized by using the user data of the MPEG GOP layer. The object ID may be 8 bits or 16 bits. An object may be either a physical object in the video or a region. Depending on the algorithm, object detection may contain errors, but these may be ignored. Furthermore, a region need not be specifiable by a closed curve. In addition, sound localization is regarded as controlled as long as it yields even a slight localization effect compared with ordinary stereo reproduction.
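The choice between an 8-bit and a 16-bit object ID, and the carriage of position data in MPEG user data, can be sketched as follows. The actual record format is the one defined in FIG. 4, which is not reproduced in this passage, so the byte layout here (an ID width byte, a record count, then ID and signed 16-bit x, y, z per object) is purely hypothetical. The only detail taken from MPEG-2 video is the `user_data_start_code` value 0x000001B2.

```python
import struct
from typing import List, Tuple

USER_DATA_START_CODE = b"\x00\x00\x01\xB2"  # MPEG-2 video user_data_start_code

Record = Tuple[int, int, int, int]  # (object_id, x, y, z); layout is hypothetical


def pack_user_data(records: List[Record], id_bits: int = 8) -> bytes:
    """Serialize object records behind a user data start code."""
    if id_bits not in (8, 16):
        raise ValueError("object ID must be 8 or 16 bits")
    id_fmt = "B" if id_bits == 8 else "H"
    body = struct.pack(">BB", id_bits, len(records))
    for oid, x, y, z in records:
        body += struct.pack(">" + id_fmt + "hhh", oid, x, y, z)
    # User data must not contain the start code prefix 00 00 01; a real
    # format would insert marker bits, this sketch simply rejects such data.
    if b"\x00\x00\x01" in body:
        raise ValueError("payload would emulate a start code")
    return USER_DATA_START_CODE + body


def unpack_user_data(data: bytes) -> List[Record]:
    """Parse the records written by pack_user_data."""
    if data[:4] != USER_DATA_START_CODE:
        raise ValueError("not a user data block")
    id_bits, count = struct.unpack_from(">BB", data, 4)
    if id_bits not in (8, 16):
        raise ValueError("unsupported object ID width")
    rec = struct.Struct(">" + ("B" if id_bits == 8 else "H") + "hhh")
    return [rec.unpack_from(data, 6 + i * rec.size) for i in range(count)]
```

Writing one such block per GOP rather than per picture would correspond to the coarser update interval of about 0.5 to 1 second mentioned above.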

In addition, in the above embodiments the depth information and object ID information were described as being carried in MPEG video user data, but they may instead be recorded in a private stream defined by the MPEG system layer. In this case, because the data is synchronized in time for each picture or for each group of several GOPs, a PTS (Presentation Time Stamp) is used to synchronize it with the image and audio data, so the synchronous stream format of Private_stream_1 defined by MPEG is desirable. Alternatively, the information in FIG. 4 may be recorded in an area separate from the video and audio data, that is, apart from the MPEG stream, as a separate file on the recording medium: the structure of FIG. 4 is kept as it is and written to a named file for each program of encoded video and audio data. In that case, the entries can be recorded either in picture (field or image) order of reproduction (= input image order) or in picture (field or image) order of MPEG encoding. The file name may be a number or ASCII characters, as long as the program can be identified. There may be one file per program, one per playlist combining several programs, or a single file for the entire medium.
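Synchronization through the PTS can be illustrated by encoding the time stamp as it appears in a PES packet header. The sketch below follows the MPEG-2 systems layer: a 33-bit value counted at 90 kHz, spread over five bytes with marker bits, with stream_id 0xBD for private_stream_1. The helper names are hypothetical, and the sketch covers only the PTS field, not a complete PES packet.

```python
PRIVATE_STREAM_1 = 0xBD  # stream_id of private_stream_1 in the MPEG-2 systems layer
PTS_CLOCK_HZ = 90_000    # PTS ticks per second


def seconds_to_pts(seconds: float) -> int:
    """Convert a presentation time in seconds to 90 kHz PTS ticks (33-bit wrap)."""
    return round(seconds * PTS_CLOCK_HZ) & ((1 << 33) - 1)


def encode_pts(pts: int) -> bytes:
    """Encode a 33-bit PTS as the 5-byte PES header field ('0010' prefix, PTS only)."""
    pts &= (1 << 33) - 1
    return bytes([
        0x20 | (((pts >> 30) & 0x07) << 1) | 1,  # prefix, PTS[32..30], marker
        (pts >> 22) & 0xFF,                      # PTS[29..22]
        (((pts >> 15) & 0x7F) << 1) | 1,         # PTS[21..15], marker
        (pts >> 7) & 0xFF,                       # PTS[14..7]
        ((pts & 0x7F) << 1) | 1,                 # PTS[6..0], marker
    ])


def decode_pts(field: bytes) -> int:
    """Recover the 33-bit PTS from the 5-byte field."""
    return (
        (((field[0] >> 1) & 0x07) << 30)
        | (field[1] << 22)
        | ((field[2] >> 1) << 15)
        | (field[3] << 7)
        | (field[4] >> 1)
    )
```

A position record intended for a given picture would carry the same PTS value as that picture, so that a player can match the two after demultiplexing.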

Further, the functions of the apparatuses described above may be realized on a computer by a program. The program may be read from a recording medium on which it is recorded and loaded into the computer, or it may be transmitted over a communication network and loaded into the computer.

Brief description of the drawings

Functional block diagram of the recording apparatus for stereoscopic video and stereophonic sound according to the first embodiment of the present invention.
Block diagram of the main part of the recording apparatus of the first embodiment.
Explanatory diagram of the epipolar constraint generally used to obtain the parallax between two images.
Explanatory diagram of the format of the depth information, object ID information, and related data.
Explanatory diagram of a microphone arrangement for recording sound from all directions.
Flowchart of the recording process performed by the recording apparatus of the first embodiment.
Block diagram of the recording apparatus for stereoscopic video and stereophonic sound according to the second embodiment of the present invention.
Block diagram of the transmission apparatus for stereoscopic video and stereophonic sound according to the third embodiment of the present invention.
Block diagram of the reproducing apparatus for stereoscopic video and stereophonic sound according to the fourth embodiment of the present invention.
Explanatory diagram of general viewpoint conversion.
Explanatory diagram of a general parallax barrier method.
Explanatory diagram (1) of a general IP method.
Explanatory diagram (2) of a general IP method.
Explanatory diagram of the array speaker used in the fourth embodiment of the present invention.
Block diagram of the system combining the speaker array and the IP stereoscopic video method according to the fourth embodiment.
Flowchart of the stereoscopic video and stereophonic sound reproduction process according to the fourth embodiment.
Block diagram of the reproduction system for stereoscopic video and stereophonic sound according to the fifth embodiment of the present invention.
Explanatory diagram of the video layer of an MPEG video stream.
Explanatory diagram (1) of the system layer of an MPEG multiplexed transport stream.
Explanatory diagram (2) of the system layer of an MPEG multiplexed transport stream.
Explanatory diagram of the Object_ID_Descriptor that describes the new object ID adopted in the first embodiment of the present invention.

Explanation of symbols

1 Camera A
2 Camera B
3 Parallax vector extractor
4 Depth distance calculator
5 Object analyzer
6 Horizontal position analyzer
7 Vertical position analyzer
8 Depth position analyzer
9 Position information formatter
10 CPU
11 Multiple microphone group
12 Sound source selector
13 Audio compressor
14 Video compressor
15 Information multiplexer
16 Recorder
17 Recording medium
31 Reproducer
32 Information separator
33 Video decoder
34 Position information extractor
35 Audio decoder
36 View converter
37 Stereoscopic image display
38 Sound source selector
39 Sound image position controller
40 Speaker array

Claims

A recording program for stereoscopic video and stereophonic sound that causes a computer to execute:
a step of recording both stereoscopic video data and audio data on a recording medium; and
a step of recording, in predetermined time units and together with identification information of an object, three-dimensional position information of the object whose sound source is to be subjected to three-dimensional localization control during reproduction, in a predetermined storage area of the recording medium that can be referred to when the stereoscopic video data and audio data are reproduced.

A reproduction program for stereoscopic video and stereophonic sound that causes a computer to execute:
a step of reading and reproducing stereoscopic video data and audio data recorded on a recording medium;
a step of reading, from a predetermined storage area of the recording medium, identification information and three-dimensional position information of an object whose sound source is to be subjected to three-dimensional localization control; and
a step of performing, when the audio data is reproduced, sound image position control using at least two speakers, based on the three-dimensional position information corresponding to the identification information of the object, so that the three-dimensional localization position of the sound image of the audio data coincides with the three-dimensional position of the object.

A recording apparatus for stereoscopic video and stereophonic sound, comprising:
means for recording both stereoscopic video data and audio data on a recording medium; and
means for recording, in predetermined time units and together with identification information of an object, three-dimensional position information of the object whose sound source is to be subjected to three-dimensional localization control during reproduction, in a predetermined storage area of the recording medium that can be referred to when the stereoscopic video data and audio data are reproduced.

A reproducing apparatus for stereoscopic video and stereophonic sound, comprising:
means for reading and reproducing stereoscopic video data and audio data recorded on a recording medium;
means for reading, from a predetermined storage area of the recording medium, identification information and three-dimensional position information of an object whose sound source is to be subjected to three-dimensional localization control; and
means for performing, when the audio data is reproduced, sound image position control using at least two speakers, based on the three-dimensional position information corresponding to the identification information of the object, so that the three-dimensional localization position of the sound image of the audio data coincides with the three-dimensional position of the object.

A recording medium for stereoscopic video and stereophonic sound on which both stereoscopic video data and audio data are recorded, and on which three-dimensional position information of an object whose sound source is to be subjected to three-dimensional localization control is recorded, in predetermined time units and together with identification information of the object, in a predetermined storage area that can be referred to when the stereoscopic video data and audio data on the recording medium are reproduced.