JP2006191357A

JP2006191357A - Reproduction device and reproduction program

Info

Publication number: JP2006191357A
Application number: JP2005001365A
Authority: JP
Inventors: Takayuki Sugawara; 隆幸菅原; Shinji Nakamura; 伸司中村; Kunio Yamada; 邦男山田; Akinari Suehiro; 晃也末廣
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2005-01-06
Filing date: 2005-01-06
Publication date: 2006-07-20

Abstract

PROBLEM TO BE SOLVED: To provide a reproduction device and a reproduction program that can switch between reproducing a two-dimensional image and a three-dimensional image, based on the two-dimensional image and its depth information, when the two-dimensional image and the depth information of each pixel are recorded on a recording medium standardized in a predetermined format. SOLUTION: The reproduction device 100 is provided with: an information separator 102 and a video decoder 103 that separate a first bit stream from a multiplexed stream recorded on a recording medium 9 conforming to the DVD-Video specifications and reproduce a two-dimensional image; the information separator 102, the video decoder 103, and a depth information extraction unit 104 that separate a second bit stream from the multiplexed stream and reproduce information related to the depth of the two-dimensional image; and a visual field converter 106 and a stereoscopic image display unit 105 that generate a three-dimensional image based on the reproduced two-dimensional image and the information related to the depth of the two-dimensional image. COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a playback apparatus and a playback program for playing back information on the depth of a two-dimensional image, the information being obtained for each predetermined unit of the two-dimensional image and recorded on a recording medium formatted to a predetermined standard.

Conventionally, for example, Patent Document 1 discloses an apparatus that, for stereo images of a target object captured by a pair of left and right imaging cameras (a pair of side-by-side two-dimensional images: a left-eye image and a right-eye image), obtains a disparity vector between a pair of pixel points associated with each other by a correlation function or the like, and obtains the distance from the stereo images to the target object based on this disparity vector.

That is, according to Patent Document 1, the direction of an epipolar line, which is for example the projection line on the right-eye image of a point of interest in the left-eye image, and the direction orthogonal to this epipolar line are obtained.

Then, a two-dimensional search range on the left-eye image is determined, composed of an epipolar-line-direction search range and an orthogonal-direction search range based respectively on the obtained epipolar line direction and its orthogonal direction. The difference between the coordinates of a point within the two-dimensional search range on the left-eye image and the coordinates of the corresponding point on the right-eye image is obtained as the disparity vector of that point. This disparity vector is obtained for every point within the two-dimensional search range on the left-eye image, and the obtained disparity vectors are used to calculate the distance from the stereo images to the target object, in other words, the depth information of the target object displayed in the stereo images.

The calculated distance from the stereo images to the target object (depth information) makes it possible to display the target object as a stereoscopic image, that is, a three-dimensional image.

As a technique for recording the two types of video data, corresponding to two types of two-dimensional video, that are needed to create the three-dimensional video described above, Patent Document 2 discloses a method of recording the two types of video data needed for stereoscopic video display: one of the two types of video data is stored as the main video stream of the video signal, and the other video data, created from that one by, for example, MPEG (Moving Picture Experts Group) predictive coding, is recorded as auxiliary data in any usable area of the video signal, such as a private stream or a user area.
JP 2000-28356 A
JP H09-327041 A (Japanese Patent Laid-Open No. 9-327041)

However, in conventional techniques such as those disclosed in Patent Documents 1 and 2, no specific recording method has been developed for recording the depth information of the two-dimensional images constituting a two-dimensional video (two-dimensional video data), which is needed to generate three-dimensional video from that two-dimensional video data, on a recording medium standardized in a predetermined format in a way that both preserves compatibility with the standardized format and allows switched playback (playback that switches efficiently between two-dimensional and three-dimensional video). Accordingly, there has been a demand for a playback method that generates three-dimensional images using depth information recorded by such a specific recording method.

The present invention has been made in view of the above circumstances. Its object is to provide a playback apparatus and a playback program that, when a two-dimensional image and the depth information for each of its pixels are recorded on a recording medium standardized in a predetermined format, can switch between playing back two-dimensional and three-dimensional images based on that two-dimensional image and depth information.

To solve the above problem, the invention according to claim 1 is a playback apparatus for playing back a multiplexed stream recorded on a recording medium formatted to a predetermined standard, the multiplexed stream multiplexing a first bit stream, formed by compression-encoding a two-dimensional image in a compression encoding format conforming to the format of the recording medium, and a second bit stream, formed by formatting information on the depth of the two-dimensional image obtained for each predetermined unit of the two-dimensional image. The apparatus comprises: means for separating the first bit stream from the multiplexed stream and reproducing the two-dimensional image; means for separating the second bit stream from the multiplexed stream and reproducing the information on the depth of the two-dimensional image; and means for generating a three-dimensional image based on the reproduced two-dimensional image and the information on the depth of that two-dimensional image.

To solve the above problem, the invention according to claim 2 is a playback apparatus for playing back a multiplexed stream recorded on a recording medium formatted to a predetermined standard, the multiplexed stream being formed by multiplexing a video stream, formed by compression-encoding a two-dimensional image in a compression encoding format conforming to the format of the recording medium, with information on the depth of the two-dimensional image stored in an arbitrary use area defined by the compression encoding format within the video stream. The apparatus comprises: means for separating the bit stream from the multiplexed stream and reproducing the two-dimensional image; means for reproducing the information on the depth of the two-dimensional image stored in the arbitrary use area of the separated bit stream; and means for generating a three-dimensional image based on the reproduced two-dimensional image and the information on the depth of that two-dimensional image.

To solve the above problem, in the invention according to claim 3, the two-dimensional image is a plurality of frame images constituting a two-dimensional video, and the depth information is a pixel value for each pixel, the pixel being the predetermined unit, in each frame image constituting the two-dimensional video. The pixel values representing the per-pixel depth information in each frame image are compression-encoded and recorded on the recording medium, in a compression encoding format conforming to the format of the recording medium, by at least one of the following: a first compression encoding process that performs differential encoding within each frame image; a second compression encoding process that performs run-length encoding; a third compression encoding process that performs predictive encoding from each frame image and at least one of the future and past frame images on the time axis; and a fourth compression encoding process that encodes using an orthogonal transform within each frame image. The means for reproducing the information on the depth comprises means for decoding the pixel values so encoded, using the decoding process corresponding to that at least one compression encoding process, and thereby reproducing the information on the depth of the two-dimensional image.
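The first two encoding processes named above have simple decoders, sketched below in Python. This is only an illustration under stated assumptions: the text does not specify the bitstream syntax, so the `(value, run_length)` pair layout, the "first value plus successive differences" layout, and the function names `rle_decode` and `dpcm_decode` are all hypothetical.

```python
from typing import List, Tuple


def rle_decode(pairs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run_length) pairs into a flat row of depth values."""
    row: List[int] = []
    for value, run in pairs:
        row.extend([value] * run)
    return row


def dpcm_decode(first: int, deltas: List[int]) -> List[int]:
    """Rebuild a row from its first value and the successive differences."""
    row = [first]
    for delta in deltas:
        row.append(row[-1] + delta)
    return row


# One 8-pixel depth row: five background pixels (10), then a nearer object (200).
row_from_rle = rle_decode([(10, 5), (200, 3)])
row_from_dpcm = dpcm_decode(10, [0, 0, 0, 0, 190, 0, 0])
```

Depth maps tend to contain large flat regions, which is why run-length and differential coding are plausible fits; both decoders above recover the same row from their respective encodings.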

To solve the above problem, the invention according to claim 4 is a computer-executable playback program for playing back a multiplexed stream recorded on a recording medium formatted to a predetermined standard, the multiplexed stream multiplexing a first bit stream, formed by compression-encoding a two-dimensional image in a compression encoding format conforming to the format of the recording medium, and a second bit stream, formed by formatting information on the depth of the two-dimensional image obtained for each predetermined unit of the two-dimensional image. The program causes the computer to execute: a process of separating the first bit stream from the multiplexed stream and reproducing the two-dimensional image; a process of separating the second bit stream from the multiplexed stream and reproducing the information on the depth of the two-dimensional image; and a process of generating a three-dimensional image based on the reproduced two-dimensional image and the information on the depth of that two-dimensional image.

To solve the above problem, the invention according to claim 5 is a computer-executable playback program for playing back a multiplexed stream recorded on a recording medium formatted to a predetermined standard, the multiplexed stream being formed by multiplexing a video stream, formed by compression-encoding a two-dimensional image in a compression encoding format conforming to the format of the recording medium, with information on the depth of the two-dimensional image stored in an arbitrary use area defined by the compression encoding format within the video stream. The program causes the computer to execute: a process of separating the bit stream from the multiplexed stream and reproducing the two-dimensional image; a process of reproducing the information on the depth of the two-dimensional image stored in the arbitrary use area of the separated bit stream; and a process of generating a three-dimensional image based on the reproduced two-dimensional image and the information on the depth of that two-dimensional image.

To solve the above problem, in the invention according to claim 6, the two-dimensional image is a plurality of frame images constituting a two-dimensional video, and the depth information is a pixel value for each pixel, the pixel being the predetermined unit, in each frame image constituting the two-dimensional video. The pixel values representing the per-pixel depth information in each frame image are compression-encoded and recorded on the recording medium, in a compression encoding format conforming to the format of the recording medium, by at least one of the following: a first compression encoding process that performs differential encoding within each frame image; a second compression encoding process that performs run-length encoding; a third compression encoding process that performs predictive encoding from each frame image and at least one of the future and past frame images on the time axis; and a fourth compression encoding process that encodes using an orthogonal transform within each frame image. The process of reproducing the information on the depth includes a process of decoding the pixel values so encoded, using the decoding process corresponding to that at least one compression encoding process, and thereby reproducing the information on the depth of the two-dimensional image.

According to the inventions of claims 1 and 4, playing back the multiplexed stream recorded on the recording medium without using the depth information multiplexed and recorded together with it reproduces the two-dimensional image based on the first bit stream. Playing back the first video stream using the depth information multiplexed and recorded together with it makes it possible to switch to, and play back, the three-dimensional image generated from the two-dimensional image and the depth information.
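The switching behaviour described above can be sketched as follows. This is a minimal Python illustration, not the apparatus's actual implementation: the packet tuples, the stream labels `video` and `depth`, and the function names `demultiplex` and `play` are hypothetical, and real DVD-Video demultiplexing operates on packs and stream IDs that are not modelled here.

```python
from typing import Dict, List, Tuple

Packet = Tuple[str, str]  # (stream label, payload); a hypothetical stand-in for real packets

VIDEO, DEPTH = "video", "depth"


def demultiplex(packets: List[Packet]) -> Dict[str, List[str]]:
    """Split the multiplexed stream into the first (video) and second (depth) streams."""
    streams: Dict[str, List[str]] = {VIDEO: [], DEPTH: []}
    for label, payload in packets:
        if label in streams:  # other streams (e.g. audio) are ignored in this sketch
            streams[label].append(payload)
    return streams


def play(packets: List[Packet], mode: str) -> List[tuple]:
    """Return what would be presented per frame in '2d' or '3d' mode."""
    streams = demultiplex(packets)
    if mode == "2d":
        # The depth stream is simply not used, so 2D playback is unaffected by it.
        return [("2d", frame) for frame in streams[VIDEO]]
    # In 3D mode each frame is paired with its depth data for view generation.
    return [("3d", frame, depth)
            for frame, depth in zip(streams[VIDEO], streams[DEPTH])]


packets = [(VIDEO, "f0"), (DEPTH, "d0"), ("audio", "a0"), (VIDEO, "f1"), (DEPTH, "d1")]
frames_2d = play(packets, "2d")
frames_3d = play(packets, "3d")
```

The point of the design is that the same recorded stream serves both modes: a player that ignores the depth stream still produces ordinary two-dimensional playback.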

According to the inventions of claims 2 and 5, a two-dimensional image is compression-encoded in a compression encoding format conforming to the format of the recording medium to generate a bit stream; information on the depth of the two-dimensional image is stored in the arbitrary use area defined by the compression encoding format within the generated bit stream, producing a bit stream with depth information; and the result is multiplexed and recorded on the recording medium.

Therefore, in a playback apparatus (not shown), playing back the multiplexed stream recorded on the recording medium without using the depth information multiplexed and recorded together with it reproduces the two-dimensional image based on the multiplexed stream. Playing back the multiplexed stream using the depth information multiplexed together with it makes it possible to switch to, and play back, the three-dimensional image generated from the two-dimensional image and the information on the depth.

In particular, according to the inventions of claims 3 and 6, when the pixel values representing the per-pixel depth information in each frame image constituting the two-dimensional video have been compression-encoded by at least one of the first to fourth compression encoding processes, the per-pixel depth information in each frame image can be reproduced by the decoding process corresponding to that at least one compression encoding process.

Hereinafter, a plurality of embodiments, which are the best mode for carrying out the present invention, will be described with reference to the drawings.

Each embodiment of the present invention describes a recording apparatus and a recording program that record two-dimensional video, together with the depth information obtained from it and needed for three-dimensional playback, on a recording medium in a manner compatible with a format conforming to the DVD (Digital Versatile Disc) Video standard as the standardized format, along with a playback apparatus and a playback program that play back the two-dimensional video and depth information so recorded. The recording format supported by the recording apparatus and program of the present invention is not limited to the DVD-Video-compliant format. It can be applied, in the same way as to the DVD-Video format, to the formats of application programs that have a mechanism (whether a hardware or a software configuration) for playing back two-dimensional video and to the formats of other recording media.

(First embodiment)
FIG. 1 is a block diagram showing the schematic configuration of a recording apparatus 200 according to the first embodiment of the present invention, and FIG. 2 is a diagram showing the positional relationship of the recording apparatus 200 of this embodiment to an object (target object) OB whose video it records.

As shown in FIGS. 1 and 2, the recording apparatus 200 includes a pair of left and right imaging cameras 1A and 1B, which have imaging lenses L1 and L2 respectively and capture a left-eye moving image and a right-eye moving image, that is, left and right two-dimensional videos {video data (video bit streams)} of the target object OB. The imaging lenses L1A and L1B of the imaging cameras 1A and 1B are placed side by side at the same height along a predetermined direction, separated by a fixed or variable spacing corresponding to the distance between human eyes, and share a common fixed or variable focal length. That is, the imaging cameras 1A and 1B are arranged two-dimensionally so that the optical axes of the imaging lenses L1A and L1B lie in the same plane and their imaging areas are located on the same plane.

The recording apparatus 200 also includes a video compressor 2 that compresses at least one of the two-dimensional videos captured by the imaging cameras 1A and 1B (in this embodiment, the video captured by the imaging camera 1B), and a disparity vector extractor 3 that extracts a physical quantity (the disparity vector: disparity direction and disparity amount) representing the difference in appearance (parallax) between the frame images (left-eye image and right-eye image) captured by the imaging cameras 1A and 1B.

Further, the recording apparatus 200 includes a depth information calculator 4 that calculates the magnitude of the disparity vector extracted by the disparity vector extractor 3 and uses that magnitude to calculate the information on the depth of the left-eye and right-eye images (depth information) needed to generate three-dimensional video, and a depth information formatter 5 that formats the depth information calculated by the depth information calculator 4. Depending on the setting, the depth information formatted by the depth information formatter 5 is either sent to the video compressor 2 and compressed, or sent directly to the information multiplexer 8, described later, and multiplexed.
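The passage above states only that depth information is calculated from the magnitude of the disparity vector; it does not give the formula at this point. A common choice for two parallel cameras is the triangulation relation Z = f * B / d, and the Python sketch below assumes that relation. The focal length parameter `focal_px`, the function names, and the treatment of zero disparity as infinitely distant are assumptions made for illustration.

```python
import math


def disparity_magnitude(dx: float, dy: float) -> float:
    """Magnitude of a disparity vector (dx, dy), in pixels."""
    return math.hypot(dx, dy)


def depth_from_disparity(disparity_px: float, focal_px: float, baseline: float) -> float:
    """Depth Z = f * B / d under the assumed parallel-camera triangulation model.

    disparity_px: disparity magnitude in pixels
    focal_px:     focal length expressed in pixels (assumed known)
    baseline:     spacing between the two camera optical axes, in the unit wanted for Z
    """
    if disparity_px <= 0:
        return math.inf  # no measurable parallax: treat the point as infinitely far
    return focal_px * baseline / disparity_px


# Example: 65 mm baseline, 1000-pixel focal length, disparity vector (10, 0).
d = disparity_magnitude(10.0, 0.0)
z = depth_from_disparity(d, focal_px=1000.0, baseline=0.065)  # metres
```

Under this model, nearer objects produce larger disparities, so depth resolution is finest close to the cameras.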

The recording apparatus 200 includes a microphone 6 that picks up sound (voices and the like) around the target object OB, and an audio compressor 7 that compresses the audio signal corresponding to the ambient sound picked up by the microphone 6 in the MPEG format (MPEG audio), a compression format conforming to the DVD-Video standard.

Further, the recording apparatus 200 includes an information multiplexer 8 that multiplexes the video data compressed by the video compressor 2, the audio data compressed by the audio compressor 7, and the depth information to generate data in a format conforming to the DVD-Video standard; a recording medium 9 conforming to the DVD-Video standard; and a recorder 10 that records the DVD-Video-format data multiplexed by the information multiplexer 8 on the recording medium 9.

The recording apparatus 200 also includes a control unit 11 that is connected to the imaging cameras 1A and 1B, the video compressor 2, the disparity vector extractor 3, the depth information calculator 4, the depth information formatter 5, the microphone 6, the audio compressor 7, the information multiplexer 8, and the recorder 10, and that controls the apparatus as a whole. That is, the control unit 11 controls the imaging operation of the imaging cameras 1A and 1B, the compression process of the video compressor 2, the disparity vector extraction process of the disparity vector extractor 3, the depth information calculation process of the depth information calculator 4, the formatting process of the depth information formatter 5, the ambient sound acquisition process of the microphone 6, the compression process of the audio compressor 7, the multiplexing process of the information multiplexer 8, and the recording process of the recorder 10.

The disparity vector extractor 3 receives the left-eye image Pl and the right-eye image Pr of the target object OB captured at the same timing by the pair of left and right imaging cameras 1A and 1B, and associates each pixel of the received left-eye image Pl with the corresponding pixel of the right-eye image Pr using a predetermined evaluation function (for example, the evaluation function disclosed in JP H09-33249 A).

Here, as described above, the pair of left and right imaging cameras 1A and 1B are arranged so that the optical axes Ol and Or of their imaging lenses L1A and L1B lie in the same plane (in this embodiment, the same X-Z plane, as shown in FIG. 3(a)) at a predetermined spacing (denoted B in FIG. 3(a)). As shown in FIG. 3(a), the left-eye image Pl and the right-eye image Pr of the target object OB are obtained so as to lie in the same X-Y plane on the near side (imaging camera side) of the target object OB.

As long as the optical axes of the imaging cameras 1A and 1B are correctly placed in the same X-Z plane, the search for the point (pixel) on the right-eye image Pr corresponding to each point (pixel) of the left-eye image Pl need only be performed along the scanning line that is the epipolar line, that is, the projection line of each pixel of the left-eye image Pl onto the right-eye image Pr. In practice, however, it is rather rare for the corresponding pixel to lie on that scanning line of the right-eye image Pr without even one pixel of error.

Therefore, in this embodiment, the disparity vector extractor 3 searches for the pixel on the right-eye image Pr corresponding to each pixel on the left-eye image Pl, that is, the corresponding pixels between the left-eye image Pl and the right-eye image Pr, and extracts the disparity vector between the corresponding pixels found. It does so along the epipolar line direction, which is the direction of the projection line of each pixel on the left-eye image Pl onto the right-eye image Pr {in this embodiment, because the optical axes Ol and Or of the imaging lenses L1A and L1B lie in the same X-Z plane, along the vertical direction orthogonal to the epipolar line direction in addition to the epipolar line direction itself}.

The corresponding-pixel search process of the disparity vector extractor 3, and its process of calculating the disparity vector between the corresponding pixels found, will now be described with reference to FIGS. 2, 3(a), and 3(b).

For example, as shown in FIG. 3(a), suppose that a point on the target object OB that has been identified in advance, or a feature point that is very easy to identify (coordinates (x, y, z)), appears at point R (Xl, Yl) in the left-eye image Pl and at point S (Xr, Yr) in the right-eye image Pr. The epipolar line on the right-eye image Pr of point R on the left-eye image Pl is denoted EPr, and the epipolar line on the left-eye image Pl of point S on the right-eye image Pr is denoted EPl.

In this case, because the optical axes of the imaging cameras 1A and 1B lie in the same X-Z plane, the Y coordinate Yl of point R and the Y coordinate Yr of point S are equal, and the direction of the epipolar lines EPr and EPl can be obtained as the direction D1 of the straight line connecting points R and S (hereinafter, the direction D1 of the epipolar line EP).

In this embodiment, if the installation error of the imaging cameras 1A and 1B in the height direction (Y-axis direction) is negligible and their optical axes Ol and Or lie almost exactly in the same X-Z plane, the direction D1 of the epipolar line EP is approximately parallel to the X axis, as shown in FIG. 3(a). Accordingly, the search for the corresponding point S (Xr, Yr) on the right-eye image Pr for, say, point R (Xl, Yl) of the left-eye image Pl is carried out within a search range W that is set on the left-eye image Pl and the right-eye image Pr and is defined by a width ΔE along the direction D1 of the epipolar line EP and a width ΔT along the direction orthogonal to it.

Specifically, a small block (for example, a pixel block of 4 horizontal pixels by 2 vertical pixels) is set around and including point R (Xl, Yl) of the left-eye image Pl within the search range W. The pixel value of each pixel of this small block {represented by a total of 16 bits, for example 8 bits for the luminance signal (Y signal) and 4 bits for each of the color difference signals (Cb signal and Cr signal)} is compared with the pixel value (the same 16 bits) of each pixel of the small block corresponding to each point of the right-eye image Pr within the search range W. The difference, the sum, the sum of squared differences, or the like is used as an evaluation parameter, and the point on the right-eye image Pr at which this evaluation parameter is minimized is taken as the point corresponding to point R (Xl, Yl) of the left-eye image Pl. In the present embodiment, this corresponding point is point S (Xr, Yr).
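
By way of illustration only, the correspondence search described above can be sketched as follows. This is not the disclosed implementation: it assumes images given as lists of rows of luminance values (the 16-bit Y/Cb/Cr pixel value is reduced to Y), uses the sum of absolute differences as the evaluation parameter, treats point R as the top-left pixel of the small block, and assumes the epipolar direction is parallel to the X axis. All function names are hypothetical.

```python
def sad(block_a, block_b):
    """Evaluation parameter: sum of absolute differences of two blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def get_block(image, x, y, width=4, height=2):
    """Cut the width x height block whose top-left pixel is (x, y)."""
    return [row[x:x + width] for row in image[y:y + height]]


def find_corresponding_point(left, right, xl, yl, delta_e, delta_t,
                             width=4, height=2):
    """Search range W: +/-delta_e along the epipolar direction (assumed
    parallel to the X axis) and +/-delta_t across it. Assumes (xl, yl)
    lies far enough inside the image for W to be non-empty."""
    target = get_block(left, xl, yl, width, height)
    rows, cols = len(right), len(right[0])
    best = None
    for yr in range(max(0, yl - delta_t), min(rows - height, yl + delta_t) + 1):
        for xr in range(max(0, xl - delta_e), min(cols - width, xl + delta_e) + 1):
            cost = sad(target, get_block(right, xr, yr, width, height))
            if best is None or cost < best[0]:
                best = (cost, xr, yr)
    return best[1], best[2]


def disparity_vector(xl, yl, xr, yr):
    """V = (Xl - Xr, Yl - Yr)."""
    return xl - xr, yl - yr
```

The point returned by `find_corresponding_point` plays the role of point S, and `disparity_vector` then gives the vector V for point R.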

When installation errors of the imaging cameras 1A and 1B in the height direction (Y-axis direction) are taken into account and, for example, the direction D1 of the epipolar line EP is inclined from the X axis by an angle θ, for example downward {shown as the direction D1' of the epipolar line EP in FIG. 3(c)}, the vertical search range is enlarged by the amount given by the product of the horizontal search range and tan θ. In this way, even if an installation error in the height direction of the imaging cameras 1A and 1B has occurred, the point S (Xr, Yr) on the right-eye image Pr corresponding to point R (Xl, Yl) of the left-eye image Pl can be extracted accurately.
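
As a small worked sketch of this enlargement, under the assumption that the extra range is rounded up to a whole number of pixels (the text does not specify the rounding) and with a hypothetical function name:

```python
import math


def expanded_vertical_range(delta_t, horizontal_range, theta_deg):
    """Enlarge the vertical search width by horizontal_range * tan(theta).

    Rounding up to whole pixels is an assumption of this sketch.
    """
    extra = math.ceil(horizontal_range * math.tan(math.radians(abs(theta_deg))))
    return delta_t + extra
```

For instance, with a horizontal search range of 64 pixels and a tilt of 2 degrees, the product is about 2.2 pixels, so a vertical width of 2 pixels would grow to 5 pixels.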

After the point S (Xr, Yr) on the right-eye image Pr corresponding to point R (Xl, Yl) of the left-eye image Pl has been extracted in this way, the disparity vector extractor 3 takes the difference between point R (Xl, Yl) of the left-eye image Pl and the corresponding point S (Xr, Yr), and expresses the result as the disparity vector V (Xl-Xr, Yl-Yr) at point P (Xl, Yl). This disparity vector is obtained, by the same method as the extraction method described for point R (Xl, Yl), for every point in every macroblock (MB; in the present embodiment, a pixel block of 16 horizontal pixels × 16 vertical pixels) within the search range W of the left-eye image Pl. The disparity vector extractor 3 performs this processing over the entire left-eye image Pl, thereby extracting all disparity vectors V for all corresponding points on the left-eye image Pl and the right-eye image Pr, and sends all the extracted disparity vectors V to the depth information calculator 4.

The depth information calculator 4 receives each disparity vector V sent to it and calculates the magnitude of each received disparity vector (the distance in the depth direction; hereinafter referred to as depth information). The depth information calculator 4 then performs the search with, for example, small blocks of 4 horizontal pixels by 2 vertical pixels (4 × 2), and sends the result to the depth information formatter 5 as position information of each disparity vector in units of these small blocks {horizontal position (X), vertical position (Y), depth information (Z)}.

The depth information formatter 5 expands the depth information value of each small block it receives into, for example, 8-bit data (hereinafter referred to as depth data) for each pixel constituting that small block. For example, the depth information formatter 5 can assign to each pixel of a small block the same depth data value that was set for that small block, or a per-pixel average of depth data values. The depth information formatter 5 can also apply smoothing (processing that makes the depth data values connect smoothly between small blocks), for example through a low-pass filter, to the data values set for the pixels of adjacent small blocks.

Based on the depth data of each pixel of the left-eye image Pl obtained in this way, the depth information formatter 5 formats the depth data by arranging it in, for example, raster order (the order in which pixels are scanned from the upper-left pixel to the lower-right pixel). The formatted depth data is sent to the video compressor 2.
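
The expansion of block-level depth values into per-pixel 8-bit depth data, an optional smoothing step, and the raster-order formatting might look like the following sketch. It assumes 4 × 2 small blocks, a horizontal 3-tap [1, 2, 1]/4 low-pass filter with repeated edge pixels (the text only says "a low-pass filter"), and hypothetical function names.

```python
def expand_blocks(block_depth, block_w=4, block_h=2):
    """Give every pixel of a small block the 8-bit depth value of that block."""
    pixels = []
    for block_row in block_depth:
        line = [min(255, max(0, d)) for d in block_row for _ in range(block_w)]
        pixels.extend([list(line) for _ in range(block_h)])
    return pixels


def smooth_horizontal(pixels):
    """Assumed filter: 3-tap [1, 2, 1]/4 along each row, edges repeated."""
    out = []
    for row in pixels:
        padded = [row[0]] + row + [row[-1]]
        out.append([(padded[i] + 2 * padded[i + 1] + padded[i + 2] + 2) // 4
                    for i in range(len(row))])
    return out


def raster_order(pixels):
    """Serialize from the upper-left pixel to the lower-right pixel."""
    return bytes(v for row in pixels for v in row)
```

With two neighbouring blocks of depth 10 and 50, the smoothed row becomes 10, 10, 10, 20, 40, 50, 50, 50, so the step between the blocks is spread over two pixels.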

FIG. 4 shows an outline of the DVD video format.

As shown in FIG. 4, in the DVD video format the recording layer of a recording medium compliant with the DVD video standard is set as the Volume space, and this Volume space is divided into the Volume and File structure 19, the DVD-video zone 20, and the DVD-others zone 21. The DVD-video zone 20 stores, as all the files necessary for DVD video playback, one VMG (Video Manager) 22 and a plurality (n) of VTSs (Video Title Sets) 23a1 to 23an. The VMG 22 contains information such as VMGI (Video Manager Information), including identification information for the subsequent VTSs 23a1 to 23an, the start and end addresses of the various pieces of information, and which video stream playback should start from.

The DVD-others zone 21 is an area set aside as a free-use area in the DVD video format compliant with the DVD video standard.

Each of the VTSs 23a1 to 23an consists of a field 24 that stores control data (Control Data), such as address information and identification information for the audio data (audio bit stream; hereinafter also simply called the audio stream) and video data (video bit stream; hereinafter also simply called the video stream) to be played back, and a VOBS (Video Object Set) 25 stored after this Control Data field 24. The VOBS 25 is a set (content) of MPEG (Moving Picture Experts Group) streams in which video streams and audio streams are multiplexed, and it is composed of a plurality of VOBs (Video Objects) 26, which are smaller units of MPEG stream.

Each VOB 26 is composed of a plurality of further subdivided units called cells (CELL) 27. A CELL 27 represents a playback unit and is given a unique ID number. Each CELL 27 is in turn composed of a plurality of VOBUs (Video Object Units) 28. Each VOBU 28 has a structure corresponding to a GOP (Group of Pictures) of the MPEG stream and has a playback time of about 0.4 to 1.0 seconds. The GOP is a structure for predictive coding in MPEG2, described later, and denotes the group from one I-picture (intra-frame coded picture) up to the next I-picture {comprising P-pictures (a plurality of forward predictive coded pictures) and B-pictures (a plurality of bidirectionally predictive coded pictures)}. On storage media, for example, a GOP generally consists of 15 pictures.

Each VOBU 28 stores NV_PACK 29, A_PACK 30, V_PACK 31, and D_PACK 32, which are separate MPEG-multiplexed streams of data, by time-division multiplexing. The leading NV_PACK 29 contains packed stream search information and the like (packed, for example, in units of transport packets). A_PACK 30 contains packed compression-coded audio data, and V_PACK 31 contains packed compression-coded video data.
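
The nesting of VOB, CELL, VOBU, and packs described above can be pictured with the following data-structure sketch. The class names, field names, and the one-letter pack kinds are hypothetical and chosen only for illustration; the actual on-disc layout is defined by the DVD video standard and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pack:
    kind: str       # "NV", "A", "V" or "D" (hypothetical labels for the pack types)
    payload: bytes


@dataclass
class VOBU:
    packs: List[Pack] = field(default_factory=list)


@dataclass
class Cell:
    cell_id: int    # unique ID number of the playback unit
    vobus: List[VOBU] = field(default_factory=list)


@dataclass
class VOB:
    cells: List[Cell] = field(default_factory=list)


def collect_payload(vob, kind):
    """Concatenate, in stored order, the payloads of all packs of one kind."""
    return b"".join(pack.payload
                    for cell in vob.cells
                    for vobu in cell.vobus
                    for pack in vobu.packs
                    if pack.kind == kind)
```

Collecting all packs of kind `"D"` in this way yields the time-divided depth data as one contiguous byte string, while the `"V"` and `"A"` packs remain available to an ordinary two-dimensional playback path.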

As described above, the D_PACKs 32 are time-divided, and a plurality of D_PACKs 32 make up each frame layer 33. Each frame layer 33 has a 2-bit start code (Start Code) 34 at its head, which is set to begin with, for example, 000001FF in hexadecimal. The start code 34 is set as this special data beginning with hexadecimal 000001FF so that it can be distinguished from the coded data even when run-length coding, discrete cosine transform (DCT) processing, and/or variable length coding (VLC) is applied as the data coding processing described later.
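
A reader of the depth stream could locate the frame layers by scanning for this start code, as in the sketch below. It assumes that the hexadecimal value 000001FF is matched as a four-byte pattern and that the pattern never occurs inside the coded payload; the function name is hypothetical.

```python
START_CODE = bytes.fromhex("000001FF")  # assumed to be matched as four bytes


def split_frame_layers(depth_stream):
    """Return the payload that follows each start code, one entry per frame layer."""
    frames = []
    pos = depth_stream.find(START_CODE)
    while pos != -1:
        nxt = depth_stream.find(START_CODE, pos + len(START_CODE))
        end = nxt if nxt != -1 else len(depth_stream)
        frames.append(depth_stream[pos + len(START_CODE):end])
        pos = nxt
    return frames
```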

In the present embodiment, the right-eye image Pr captured by the imaging camera 1B is sent to the video compressor 2. This video data, that is, the captured two-dimensional image (right-eye image Pr), is compression-coded in the video compressor 2 using the MPEG compression format, which is the compression format conforming to the DVD video format shown in FIG. 4, and is sent to the information multiplexer 8 as a bit stream (MPEG video stream). The audio signal corresponding to the ambient sound picked up by the microphone 6 is compression-coded in the audio compressor 7 using the MPEG audio format conforming to the DVD video format shown in FIG. 4, and is sent to the information multiplexer 8 as audio data (MPEG audio stream).

In the information multiplexer 8, the received video data (MPEG bit stream) and audio data (MPEG audio stream) are packed and multiplexed in accordance with the DVD video format shown in FIG. 4. As a result, MPEG multiplexed transport data corresponding to the DVD video format is generated. The generated MPEG multiplexed transport data is sent to the recorder 10, which records it on the recording medium 9 compliant with the DVD video standard.

MPEG, which is one example of a standard for the compression coding performed in the video compressor 2, is briefly described here.

MPEG is the abbreviation of the name (Moving Picture Experts Group) of an organization for studying moving picture coding standards that was established in 1988 under ISO/IEC JTC1/SC2 (International Organization for Standardization / International Electrotechnical Commission, Joint Technical Committee 1 / Subcommittee 2; now SC29). MPEG1 (MPEG phase 1) is a compression standard aimed at storage media of about 1.5 Mbps. It inherits the basic techniques of JPEG, which is intended for still image compression coding, and of H.261 (standardized by CCITT SGXV, now ITU-T SG15), which is intended for video compression at the low transfer rates of ISDN video conferencing and videophones, and introduces new compression coding techniques for storage media. It was established as ISO/IEC 11172 in August 1993.

MPEG2 (MPEG phase 2) was established in November 1994 as ISO/IEC 13818 and H.262, as a general-purpose compression coding standard that can serve diverse applications such as communications and broadcasting.

MPEG is designed as a combination of several techniques. In particular, it combines predictive coding techniques for removing redundancy in the time domain {the intra-frame predictive coding (inter-frame differential coding) technique and the motion compensation technique} with the discrete cosine transform technique for removing redundancy in the frequency domain.

That is, the inter-frame differential coding technique within predictive coding exploits the fact that two consecutive frame images in video data resemble each other: it takes the difference between the two frame images and encodes this difference information, thereby removing temporally redundant portions.
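
The idea can be illustrated with a minimal sketch that simply subtracts the previous frame from the current one; frames are assumed to be lists of rows of integer pixel values, and the function names are hypothetical. When consecutive frames are similar, most entries of the difference are zero or small, which is what makes the subsequent coding stages effective.

```python
def frame_difference(current, previous):
    """Pixel-wise difference between two consecutive frame images."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current, previous)]


def count_zero_differences(diff):
    """Number of pixels that did not change between the two frames."""
    return sum(1 for row in diff for d in row if d == 0)
```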

The motion compensation technique within predictive coding has three prediction-direction modes: from the past, from the future, and from both the past and the future. These three modes can be switched in units of 16 × 16 pixel macroblocks (MB). The prediction direction is determined by, for example, the type of the frame image (picture) given as the input image.

As described above, MPEG predictive-coded frames comprise three types of images (pictures) that differ in data amount.

A P picture is a picture coded by switching, for each macroblock, between a mode that performs forward inter-frame predictive coding from a past frame image and a mode that codes the pixels in the macroblock independently without prediction.

A B picture is a picture coded by switching, for each macroblock, among a mode that performs forward inter-frame predictive coding from a past frame image, a mode that performs backward inter-frame predictive coding from a future frame image, a mode that performs bidirectional inter-frame predictive coding from both a past and a future frame image, and a mode that codes the pixels in the macroblock independently without prediction.

An I picture is a picture in which the pixels of every macroblock in the frame image are intra-frame coded independently, macroblock by macroblock.

The motion compensation technique detects the motion region (motion vector) between frame images with half-pixel (half-pel) accuracy by pattern matching for each macroblock, shifts the reference image by the detected motion vector, and then performs the inter-frame predictive coding described above. A motion vector has horizontal and vertical components, and the coded data of the motion vector can be transmitted as additional information of the macroblock together with MC (Motion Compensation) mode information indicating the prediction start position.
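
How a block is fetched from the reference image at half-pixel accuracy can be sketched as below. The sketch assumes motion vectors expressed in half-pixel units and forms half-pixel samples by averaging the neighbouring whole-pixel samples with rounding; it does no bounds checking, so the caller is assumed to keep the block inside the reference image. The function name and argument layout are hypothetical.

```python
def predict_block(ref, x, y, mv_x, mv_y, size=16):
    """Motion-compensated prediction for the block at (x, y).

    mv_x, mv_y are in half-pixel units (assumption of this sketch):
    the integer part selects the whole-pixel offset and the lowest bit
    selects whether neighbouring samples are averaged.
    """
    ix, iy = x + (mv_x >> 1), y + (mv_y >> 1)
    hx, hy = mv_x & 1, mv_y & 1
    block = []
    for j in range(size):
        row = []
        for i in range(size):
            a = ref[iy + j][ix + i]
            b = ref[iy + j][ix + i + hx]
            c = ref[iy + j + hy][ix + i]
            d = ref[iy + j + hy][ix + i + hx]
            row.append((a + b + c + d + 2) // 4)
        block.append(row)
    return block
```

With a whole-pixel vector all four samples coincide and the reference pixels are copied unchanged; with a half-pixel horizontal vector each predicted sample is the rounded mean of two horizontally adjacent reference pixels.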

The discrete cosine transform technique (DCT technique) is an orthogonal transform obtained by discretizing, over a finite space, an integral transform whose kernel is the cosine function. Using the DCT, the pixel values of a frame image can be converted into frequency-domain data so that the high-frequency components in the frame image can be cut.
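
For reference, a direct (unoptimized) two-dimensional DCT of a square block can be written as follows. This is the textbook orthonormal DCT-II, given here only to make the transform concrete; practical encoders use fast algorithms, and the function name is hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an n x n block (direct O(n^4) form)."""
    n = len(block)
    scale = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)
    coeffs = [[0.0] * n for _ in range(n)]
    for v in range(n):          # vertical frequency index
        for u in range(n):      # horizontal frequency index
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[v][u] = scale[u] * scale[v] * total
    return coeffs
```

For a flat 8 × 8 block in which every pixel equals 100, only the top-left coefficient is non-zero (it equals 800), which illustrates how smooth image content concentrates in the low-frequency corner.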

FIG. 5 is a block diagram showing an example of an encoding device for performing the MPEG compression coding described above.

As shown in FIG. 5, the encoding device 50 includes an arithmetic unit 51 for subtraction and a motion compensation predictor 52, each of which receives the video data (frame-by-frame moving images) to be compression-coded; a DCT unit 53 for the discrete transform processing described above; a quantizer 54 that performs quantization; a VLC unit 55 that performs variable length coding; a buffer 56; and a code amount controller 57 for feedback control based on the code amount.

The encoding device 50 also includes an inverse quantizer 58 for inverse quantization, an inverse DCT unit 59 that performs the inverse DCT, an arithmetic unit 60 for addition, and an image memory 61 for storing images.

In the encoding device 50 shown in FIG. 5, for the N-th frame image in the bit stream (MPEG video stream) VS input to the arithmetic unit 51, the arithmetic unit 51 computes the difference from the reference decoded image (the (N-1)-th frame image) that has been motion-compensated by the motion compensation predictor 52 described later (inter-frame differential coding), and the resulting difference image is sent to the DCT unit 53. The DCT unit 53 converts the received difference image into frequency-domain image data. That is, in the DCT unit 53 the difference image is divided into 8 × 8 DCT blocks, each obtained by dividing a macroblock into four, and two-dimensional DCT processing is performed on each DCT block. Because video data generally has many low-frequency components and few high-frequency components, the image data after DCT processing (each DCT coefficient block) is concentrated in the low-frequency band.

The image data (each DCT coefficient block) obtained by the DCT processing is sent to the quantizer 54 and quantized.

That is, in the quantizer 54, each DCT coefficient block is divided, cell by corresponding cell, by an 8 × 8 quantization matrix. The quantization matrix consists of quantization values obtained by weighting each cell of the two-dimensional frequency values, expressed as an 8 × 8 matrix, according to visual characteristics (the characteristics of human visual perception) and multiplying by a quantization scale value, which scales the whole matrix by a scalar and is specified, for example, per frame or per macroblock. As a result, the data toward the lower right of the 8 × 8 DCT coefficient data (the high-frequency coefficients) becomes 0, so that the high-frequency data can be removed.
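
The cell-by-cell division can be sketched as follows. The sketch follows the description literally (divisor = visual weight × quantization scale) and truncates toward zero, which is an assumption; the exact arithmetic of the MPEG standards, including their normalization constants and rounding rules, is not reproduced. The function names and the small 2 × 2 example matrices are hypothetical.

```python
def quantize(coeffs, weights, scale):
    """Divide each DCT coefficient by (visual weight * quantization scale).

    Truncation toward zero is an assumption of this sketch.
    """
    return [[int(c / (w * scale)) for c, w in zip(c_row, w_row)]
            for c_row, w_row in zip(coeffs, weights)]


def dequantize(levels, weights, scale):
    """Inverse operation: multiply each level by the same divisor."""
    return [[q * w * scale for q, w in zip(q_row, w_row)]
            for q_row, w_row in zip(levels, weights)]
```

With coefficients [[800, 30], [-45, 5]], weights [[8, 16], [16, 32]], and a scale of 2, the quantized block is [[50, 0], [-1, 0]]: the small coefficients carrying larger weights vanish, as the text describes for the high-frequency cells.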

The quantized data from which the high-frequency data has been removed in this way (each 8 × 8 quantized data block) is sent to both the VLC unit 55 and the inverse quantizer 58.

In the VLC unit 55, variable length coding is performed block by block.

Here, the top-left cell of each 8 × 8 quantized data block corresponds to the constant term after the DCT and represents the average value of the whole waveform, in other words the direct-current (DC) component, while the remaining cells correspond to the alternating-current (AC) components (higher-order components).

At this time, in the VLC unit 55, the DC component in the top-left cell of each quantized data block is predictively coded by differential PCM (Differential Pulse Code Modulation), which is one form of predictive coding. The remaining cells, which make up the AC components, are read out in order by a zigzag scan (Zigzag Scan) that proceeds from the low-frequency region (upper left) toward the high-frequency region (lower right) along directions orthogonal to the diagonal connecting the upper left and the lower right, and are arranged in a single sequence. The data toward the end of the sequence read out by this zigzag scan corresponds to high-frequency components, where zeros occur consecutively.
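
The zigzag readout and the differential coding of the DC component can be sketched as follows. The scan order generated here is the conventional zigzag pattern; the initial DPCM predictor of 0 is an assumption of the sketch, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, column) positions in conventional zigzag order."""
    def key(rc):
        r, c = rc
        s = r + c                        # index of the anti-diagonal
        return (s, r if s % 2 else -r)   # alternate the traversal direction
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def zigzag_scan(block):
    """Read a square block into a single sequence, low frequencies first."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


def dpcm_encode(dc_values, predictor=0):
    """Code each DC value as its difference from the previous DC value."""
    diffs = []
    for dc in dc_values:
        diffs.append(dc - predictor)
        predictor = dc
    return diffs


def dpcm_decode(diffs, predictor=0):
    """Rebuild the DC values by accumulating the differences."""
    values = []
    for d in diffs:
        predictor += d
        values.append(predictor)
    return values
```

The first element of the scanned sequence is the DC component, and the remaining 63 elements are the AC components in order of increasing frequency.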

The VLC unit 55 therefore treats, for example, the run length of consecutive zeros together with the following valid non-zero data value as one event, and performs Huffman coding by assigning shorter codes to events with higher probability of occurrence. The motion vectors obtained by the motion compensation predictor 52 are also sent to the VLC unit 55 and Huffman coded.
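
The two steps, forming (run, value) events and assigning shorter codes to more frequent events, can be sketched as below. This builds a Huffman code from observed counts purely for illustration; it does not reproduce the fixed code tables of the MPEG standards, and the end-of-block marker `"EOB"` and the function names are assumptions of the sketch.

```python
import heapq


def run_level_events(ac):
    """Turn an AC sequence into (zero run, non-zero value) events."""
    events, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            events.append((run, value))
            run = 0
    events.append("EOB")  # assumed marker: only zeros remain in the block
    return events


def huffman_code(symbol_counts):
    """Build a prefix code in which frequent symbols get shorter codes."""
    # Heap items: (count, unique tie-breaker, {symbol: code so far}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(symbol_counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {s: "0" for s in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        n0, _, codes0 = heapq.heappop(heap)
        n1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]
```

For the AC sequence 5, 0, 0, -2, 0, 0, 0 the events are (0, 5), (2, -2), and the end-of-block marker, so the trailing run of zeros costs a single symbol.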

The Huffman-coded data obtained in this way (with the coded motion vector data attached), that is, the MPEG video stream VS of the difference image MPEG-compression-coded by the encoding device 50, is temporarily stored in the buffer 56 and output at a predetermined transfer rate. At this time, data representing the amount of code generated for each macroblock in the output MPEG video stream VS is sent to the code amount controller 57, which obtains the error code amount between the received generated code amount and the target code amount. This error code amount is fed back to the quantizer 54, and the quantizer 54 adjusts the quantization scale value based on it. As a result, an optimal quantization scale value can be set according to the code amount of the MPEG video stream VS after MPEG compression coding.
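
One simple way to realize such feedback is a proportional update of the quantization scale, sketched below. The control law, the gain of 0.5, and the clamping range of 1 to 31 are all assumptions made for illustration; the text does not specify how the quantizer 54 converts the error code amount into a new scale value.

```python
def update_quantizer_scale(scale, generated_bits, target_bits,
                           gain=0.5, min_scale=1, max_scale=31):
    """Raise the scale when too many bits were produced, lower it otherwise.

    Proportional control with a clamped integer result (assumed scheme).
    """
    error = generated_bits - target_bits
    adjusted = scale * (1 + gain * error / target_bits)
    return max(min_scale, min(max_scale, round(adjusted)))
```

A larger scale value makes the quantization coarser and reduces the generated code amount, so the loop pushes the output toward the target.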

Meanwhile, each 8 × 8 quantized data block sent from the quantizer 54 to the inverse quantizer 58 is inversely quantized by the reverse of the quantization performed in the quantizer 54, and the resulting DCT coefficient block undergoes inverse DCT processing in the inverse DCT unit 59, which yields a difference image. The difference image is sent to the arithmetic unit 60, where it is added to the motion-compensated reference decoded image (previous frame image) sent from the motion compensation predictor 52. The frame image obtained by this addition, that is, the reproduction frame image corresponding to the input N-th frame image, is stored in the image memory 61 as the reference decoded image for the (N+1)-th frame image to be input next.

In the motion compensation predictor 52, pattern matching is performed for each macroblock between the current input frame image (for example, the N-th frame image) and the previous frame image stored in the image memory 61 (for example, the (N-1)-th frame image). The previous frame image shifted by the motion vector obtained as a result is sent to each of the arithmetic units 51 and 60 as the reference decoded image.

As described above, the video data input to the encoding device 50 is compression-coded by the MPEG techniques, namely time-domain predictive coding (frame differential coding and motion compensation), frequency-domain compression coding (discrete cosine transform), and variable length coding (Huffman coding).

FIG. 6 is a block diagram showing a decoding device 70 that decodes and reproduces video data compression-coded by the encoding device 50 shown in FIG. 5.

As shown in FIG. 6, the decoding device 70 includes a buffer 71 that receives the MPEG video stream VS compression-coded by the encoding device 50, a VLD (Variable Length Decoding) unit 72 that performs variable length decoding, an inverse quantizer 73 for inverse quantization, an inverse DCT unit 74 that performs the inverse DCT, an arithmetic unit 75 for addition, a motion compensation predictor 76, and an image memory 77.

In the decoding device 70 shown in FIG. 6, the input MPEG video stream VS (assumed here to correspond to, for example, the N-th frame image) is accumulated (buffered) in the buffer 71 and then input to the VLD unit 72.

In the VLD unit 72, the MPEG video stream VS undergoes variable length decoding, which yields the DC component data and the AC component data. The DC component data is placed in the top-left cell of an 8 × 8 quantized data block, and the AC component data is placed in order into the remaining cells of that 8 × 8 quantized block, following the zigzag scan described above from the low-frequency region (upper left) to the high-frequency region (lower right). The VLD unit 72 also decodes the coded motion vectors and sends them to the motion compensation predictor 76.
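
The placement of decoded coefficients back into an 8 × 8 block can be sketched as follows, assuming the conventional zigzag order and assuming that any AC positions not supplied are zero. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, column) positions in conventional zigzag order."""
    def key(rc):
        r, c = rc
        s = r + c
        return (s, r if s % 2 else -r)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def place_coefficients(dc, ac, n=8):
    """Rebuild a quantized block from its DC value and its AC sequence."""
    block = [[0] * n for _ in range(n)]
    order = zigzag_order(n)
    block[0][0] = dc                      # top-left cell holds the DC component
    for (r, c), value in zip(order[1:], ac):
        block[r][c] = value               # remaining cells follow the zigzag path
    return block
```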

Each 8 × 8 quantized data block obtained in this way is inversely quantized in the inverse quantizer 73 by the reverse of the quantization performed in the quantizer 54. The DCT coefficient block obtained by inverse quantization undergoes inverse DCT processing in the inverse DCT unit 74, which yields the difference image corresponding to the input MPEG video stream VS.

The difference image is sent to the arithmetic unit 75, where it is added to the motion-compensated reference decoded image (the previously reproduced frame image) sent from the motion compensation predictor 76. The frame image obtained by this addition, that is, the reproduction frame image (decoded data) corresponding to the input N-th frame image, is played back using a playback device (not shown). This reproduction frame image is also stored in the image memory 77 as the reference decoded image for the (N+1)-th frame image to be input next.
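
The addition performed by the arithmetic unit 75 can be sketched as follows. Clipping the result to the 8-bit range 0 to 255 is an assumption of the sketch, as is the function name; the returned frame would both be output for playback and kept as the reference for the next frame.

```python
def reconstruct_frame(difference, reference):
    """Add the decoded difference image to the motion-compensated reference.

    Results are clipped to 0..255 (assumed 8-bit samples).
    """
    return [[max(0, min(255, ref + diff))
             for diff, ref in zip(diff_row, ref_row)]
            for diff_row, ref_row in zip(difference, reference)]
```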

In the motion compensation predictor 76, pattern matching is performed for each macroblock between the current input frame image (for example, the N-th frame image) and the previously reproduced frame image stored in the image memory 77 (for example, the (N-1)-th frame image). The previously reproduced frame image shifted by the motion vector obtained as a result is sent to the arithmetic unit 75 as the reference decoded image.

以上述べたように、符号化装置５０により符号化され、復号化装置７０に入力されたＭＰＥＧビデオストリームＶＳは、復号化装置７０により再生用フレーム画像として復号化される。 As described above, the MPEG video stream VS encoded by the encoding device 50 and input to the decoding device 70 is decoded by the decoding device 70 as a playback frame image.

By giving, for example, the video compressor 2 in the recording apparatus 200 of the present embodiment the configuration and/or function of the compression encoding device 50, the depth data sent from the depth information formatter 5 can be compression-encoded. Likewise, by giving a playback device (not shown) the configuration and/or function of the decoding device 70, the corresponding frame images can be reproduced from the MPEG video stream VS compression-encoded by the recording apparatus 200.

FIG. 7 shows a schematic of the hierarchical structure (syntax) of the MPEG video stream VS generated by the encoding device 50.

As shown in FIG. 7, the MPEG video stream VS consists of six layers. The sequence layer, the topmost layer, has the syntax shown as table T1 in FIG. 8(a). Note that the syntax of the MPEG video stream VS described below, including table T1, is the syntax on the decoding device 70 side, used to extract data elements from the MPEG bitstream VS. The syntax on the encoding device 50 side is the decoder-side syntax with conditional statements such as if and while statements omitted.

As shown in FIG. 8(a), Next_Start_code in video_sequence() of the sequence layer is a function that searches for a start code described in the MPEG bitstream. The do { } while construct extracts from the MPEG bitstream the data elements described by the functions inside the braces of the do statement for as long as the condition defined by the while statement is true. In this sequence layer, therefore, the decoding method is as follows: while the sequence header code (Sequence_header_code) condition of the while statement is true, the data elements described by the sequence header (Sequence_header) function are extracted from the MPEG bitstream; and while the group start code (Group_start_code) condition of the GOP layer (group_of_pictures layer) is true, the data elements described by the GOP layer functions are further extracted from the extracted data elements.
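
A start-code search of the kind performed by Next_Start_code can be sketched as follows. The sketch assumes byte-aligned start codes consisting of the prefix 0x000001 followed by one code byte (for example 0xB3 for the sequence header code and 0xB8 for the group start code in MPEG-2 video); the function names are hypothetical and the loop only lists the codes rather than parsing the layers.

```python
START_CODE_PREFIX = b"\x00\x00\x01"


def next_start_code(data, pos=0):
    """Return (offset, code_byte) of the next start code at or after pos, or None."""
    i = data.find(START_CODE_PREFIX, pos)
    if i < 0 or i + 3 >= len(data):
        return None                      # no prefix, or prefix without a code byte
    return i, data[i + 3]


def list_start_codes(data):
    """Walk the stream and collect every start code, in order of appearance."""
    found = []
    pos = 0
    while True:
        hit = next_start_code(data, pos)
        if hit is None:
            break
        found.append(hit)
        pos = hit[0] + 4                 # continue after the 4-byte start code
    return found
```

A layered parser would dispatch on the returned code byte, repeating the sequence-header and GOP handling while the corresponding code keeps appearing, as the do { } while description above states.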

The sequence header is a layer for specifying the screen format and the like, and has the syntax shown as table T2 in FIG. 8(b).

The GOP layer has the syntax structure shown as table T3 in FIG. 9. As indicated by "user_data" in FIG. 9, it provides a user data area UD1 (see FIG. 7) in which the user can store arbitrary data unrelated to the video and audio data.

The Picture layer, which is the layer below the GOP layer and defines the data elements to be extracted, has the syntax structure shown as table T4 in FIG. 10. As indicated by "user_data" in FIG. 10, it provides a user data area UD2 (see FIG. 7) in which the user can store arbitrary data unrelated to the video and audio data.

The Slice layer, which is the layer below the Picture layer and defines the data elements to be extracted, represents a band of macroblocks and has the syntax structure shown as table T5 in FIG. 11.

The Macroblock layer, which is the layer below the Slice layer and defines the data elements to be extracted, has the syntax structure shown as table T6 in FIG. 12. A macroblock (MB) in ordinary MPEG intra-frame predictive coding uses four 8 × 8 DCT blocks for the Y signal and two 8 × 8 DCT blocks for the color difference signals (Cb and Cr), so the for statement inside the if statement (macroblock_pattern) of that syntax iterates over the range "for (i = 0; i < 6; i++)".

Further, the Block layer, which is the layer below the Macroblock layer and defines the data elements to be extracted (8 × 8 small blocks), has the syntax structure shown as table T7 in FIG. 13.

That is, the video data (right-eye image Pr) input to the video compressor 2 is compression-encoded by the MPEG compression encoding process of the video compressor 2 shown in FIG. 5, generating an MPEG video stream VS having the structure shown in FIGS. 7 to 13.

次に、本実施形態における奥行きデータをＤＶＤビデオフォーマットに準拠するＭＰＥＧ圧縮符号化フォーマットに互換性を有するように記録する処理について説明する。 Next, a process for recording the depth data in the present embodiment so as to be compatible with the MPEG compression encoding format based on the DVD video format will be described.

In the present embodiment, the video compressor 2 is configured to store the 8-bit depth data of each pixel sent from the depth information formatter 5, in one of the data formats (1) to (4) below, in the user data area UD1 and/or the user data area UD2 of the MPEG compression-encoded video data (MPEG video stream) VS based on, for example, the right-eye image Pr, and to send the MPEG video stream VS containing the depth data to the information multiplexer 8.

(1) A format in which the depth data is stored in raster order, from the depth data of the upper-left pixel to that of the lower-right pixel.
(2) A format in which, when pixels have the same depth data value, the data is compression-encoded and stored as a sequence of pairs consisting of the number of pixels having that value (the run length), expressed in 7 bits, and the shared data value.
(3) A format in which the depth data of each pixel is treated as an 8-bit grayscale image (Y signal) and stored after intra-frame predictive coding, in the same way as an I picture in the MPEG compression encoding described above.
(4) A format in which, in addition to the intra-frame coding of (3), the data is stored after the forward inter-frame predictive coding and bidirectional inter-frame interpolative predictive coding used for P pictures and B pictures in the MPEG compression encoding described above.

For example, in format (2), as shown in FIG. 4, each frame layer 33 has a 1-bit depth data valid flag (DE) 35 following the start code 34. When this depth data valid flag 35 is set to "1", the depth data that follows is valid.

On the other hand, when the depth data valid flag 35 is set to "0", the depth data that follows is invalid; the depth of the corresponding frame layer 33 is treated as uniform across all pixels, and the depth data of every pixel is set to the OFFSET value described later.

Following the depth data valid flag 35, the frame layer 33 has a 7-bit reserved (Reserved) field 36 and a field OFFSET 37 that stores an 8-bit offset (OFFSET) value to be added to the depth data of each pixel.

The OFFSET 37 field stores a value from −127 to +127 (8 bits) as the offset value. When the depth data stored in the pixel layer described later, with the offset added, would exceed 255 (the maximum value representable in 8 bits), that is, on overflow, a limiter function keeps the depth data within 8 bits each time.

For example, when the depth data valid flag 35 is set to "0" and OFFSET 37 is also set to "0", the depth data values of all pixels are judged to be "0" even if some depth data values are stored in the data fields (pixel layers) described later.
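
The interaction of the DE flag, the OFFSET value, and the limiter can be sketched as follows. This is an illustration under stated assumptions: the function name `resolve_depth` is hypothetical, and clamping at the lower bound 0 is an assumption, since the text only describes limiting on overflow above 255.

```python
def resolve_depth(de_flag, offset, stored_values):
    """Return the effective depth value of each pixel in one frame layer."""
    if not -127 <= offset <= 127:
        raise ValueError("OFFSET must lie in -127..+127")

    def limit(value):
        # Limiter: keep the result representable in 8 bits (0..255).
        # The lower clamp at 0 is an assumption of this sketch.
        return max(0, min(255, value))

    if de_flag == 0:
        # Stored depth data is ignored; every pixel takes the OFFSET value.
        return [limit(offset)] * len(stored_values)
    # DE = 1: stored depth data is valid and the offset is added to it.
    return [limit(v + offset) for v in stored_values]
```

With DE = 0 and OFFSET = 0 the function returns zeros for every pixel regardless of the stored values, which is the case described above.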

そして、フレームレイヤ３３は、フィールドＯＦＦＳＥＴ３７に続いて、各画素（ピクセル）の奥行きデータを格納するためのフィールド（ピクセルレイヤ３８）を有している。 The frame layer 33 has a field (pixel layer 38) for storing depth data of each pixel (pixel) following the field OFFSET37.

ピクセルレイヤ３８は、同一の奥行きデータが連続する画素数をランレングスとして８ビットで表すためのNumOfSkipPixelフィールド３８ａと、その同一の奥行きデータを−１２８から＋１２７の値として８ビットで表すフィールド（ＺＰ）３８ｂとから構成されている。 The pixel layer 38 includes a NumOfSkipPixel field 38a for representing the number of consecutive pixels having the same depth data as a run length in 8 bits, and a field (ZP) for representing the same depth data as a value from −128 to +127 in 8 bits. 38b.

For example, when four pixels having the depth data "100" are consecutive in, say, the row direction, a pixel layer is not allocated to each of the four pixels. Instead, the number of consecutive pixels (run length) "4" is stored in the NumOfSkipPixel field 38a of a single pixel layer 38, and the depth data value "100" is stored in the corresponding ZP field 38b. This stores, in compressed form, the same data as storing the depth data "100" four times in succession. As an application of format (2), because the depth data of adjacent pixels is often very similar, it is also possible to take the difference between adjacent pixels and store the number of adjacent pixels and the resulting difference data in the NumOfSkipPixel field 38a and the ZP field 38b of the pixel layer 38, respectively; this compresses the depth data further and improves coding efficiency.
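
The run-length layout of format (2) can be sketched as follows. Several details are assumptions made for illustration only: the DE flag is placed in the most significant bit of the first byte, the start code 34 is omitted because its value is not given here, each pixel layer is packed as one unsigned byte (NumOfSkipPixel) followed by one signed byte (ZP), and all function names are hypothetical.

```python
import struct


def rle_encode(values):
    """Collapse equal neighbouring values into (run_length, value) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][1] == v and runs[-1][0] < 255:
            runs[-1][0] += 1             # extend the current run (8-bit count)
        else:
            runs.append([1, v])          # start a new run
    return [(n, v) for n, v in runs]


def rle_decode(runs):
    """Expand (run_length, value) pairs back into a flat list of values."""
    out = []
    for n, v in runs:
        out.extend([v] * n)
    return out


def pack_frame_layer(de_flag, offset, runs):
    """Serialize DE flag + 7 reserved bits, OFFSET, then the pixel layers."""
    header = struct.pack(">Bb", (de_flag & 1) << 7, offset)
    body = b"".join(struct.pack(">Bb", n, v) for n, v in runs)
    return header + body


def unpack_frame_layer(data):
    """Inverse of pack_frame_layer: return (de_flag, offset, runs)."""
    flags, offset = struct.unpack_from(">Bb", data, 0)
    runs = [struct.unpack_from(">Bb", data, i) for i in range(2, len(data), 2)]
    return flags >> 7, offset, runs
```

Because ZP is packed as a signed byte here, values outside −128 to +127 would be rejected by `struct.pack`; a real layout would need to settle how that range relates to the 8-bit depth values.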

In data format (3), the depth data of each pixel is treated as an 8-bit grayscale image (Y signal), so there are no color difference signals. A macroblock (MB) in ordinary MPEG intra-frame predictive coding uses a total of six DCT blocks, namely four 8 × 8 DCT blocks for the Y signal and two 8 × 8 DCT blocks for the color difference signals (Cb and Cr) (see the for statement range "for (i = 0; i < 6; i++)" in table T6 of FIG. 12). In the present embodiment, by contrast, the 8-bit grayscale image that carries the depth data is intra-frame predictive coded using only the four 8 × 8 DCT blocks for the Y signal, so the range of the for statement is set to "for (i = 0; i < 4; i++)".
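
The four-block loop of format (3) can be illustrated with a sketch that splits one 16 × 16 depth macroblock into its four 8 × 8 luminance blocks. The block ordering (top-left, top-right, bottom-left, bottom-right) and the function name `split_macroblock` are assumptions of this sketch.

```python
def split_macroblock(mb):
    """Split a 16x16 block into four 8x8 blocks; no Cb/Cr blocks are produced."""
    blocks = []
    for i in range(4):                   # mirrors "for (i = 0; i < 4; i++)"
        r0, c0 = (i // 2) * 8, (i % 2) * 8
        blocks.append([row[c0:c0 + 8] for row in mb[r0:r0 + 8]])
    return blocks
```

An ordinary color macroblock would continue with two further iterations for the Cb and Cr blocks, which is the "i < 6" case referred to above.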

具体的に述べれば、図１０にテーブルＴ４として示すように、ＭＰＥＧビットストリームのピクチャレイヤにおいて、スライスレイヤ（Ｓｌｉｃｅ関数）により定義されるデータエレメント抽出処理に移行する前の最後のｉｆ文に、「ユーザデータスタートコード（user_data_start_code）を送った後に、ユーザデータ（user_data）を８ビット単位で記録することが出来る仕組み」が定義されている。 Specifically, as shown as table T4 in FIG. 10, in the picture layer of the MPEG bitstream, the last if statement before the transition to the data element extraction process defined by the slice layer (Slice function) A mechanism is defined in which user data (user_data) can be recorded in 8-bit units after sending a user data start code (user_data_start_code).

In more detail, in an MPEG bitstream the user data start code (user_data_start_code) is generally defined as "0x000001B2" before the transition to the data element extraction process defined by the slice layer. That is, the video compressor 2 according to the present embodiment sends the user code to the information multiplexer 8 as part of the MPEG video stream VS, and then, within the user data area UD1 or UD2, sends a predetermined, uniquely identifiable code (identification code), for example 0x0f0f0f0f2428fdaa, that indicates the presence of the function value used for authentication in the present embodiment. This identification code is recorded so that the user data can be identified when other devices or applications also use user data (user_data); the value of the identification code itself has no particular meaning. Following the identification code, the video compressor 2 stores the depth data of each pixel, organized in the frame layer structure shown in FIG. 4, in raster order in, for example, the user data area UD2 of the picture layer of the MPEG video stream VS.
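
Writing and locating the depth payload in a user data area can be sketched as follows. The start code and identification code values are taken from the text; everything else is an assumption for illustration, including the function names and the rule that a user data section ends at the next 0x000001 prefix. The sketch does not guard against payload bytes that happen to contain 0x000001, which a real encoder would have to avoid.

```python
USER_DATA_START_CODE = bytes.fromhex("000001b2")
DEPTH_ID_CODE = bytes.fromhex("0f0f0f0f2428fdaa")
START_CODE_PREFIX = b"\x00\x00\x01"


def build_user_data(frame_layer):
    """Prefix the frame-layer bytes with the start code and identification code."""
    return USER_DATA_START_CODE + DEPTH_ID_CODE + frame_layer


def extract_depth_payload(picture_bytes):
    """Return the payload of the first user data section carrying the ID code."""
    pos = picture_bytes.find(USER_DATA_START_CODE)
    while pos >= 0:
        body_start = pos + len(USER_DATA_START_CODE)
        end = picture_bytes.find(START_CODE_PREFIX, body_start)
        if end < 0:
            end = len(picture_bytes)     # user data runs to the end of the buffer
        body = picture_bytes[body_start:end]
        if body.startswith(DEPTH_ID_CODE):
            return body[len(DEPTH_ID_CODE):]
        # Some other application's user data: keep looking.
        pos = picture_bytes.find(USER_DATA_START_CODE, body_start)
    return None
```

A player that ignores the identification code simply skips the user data section, which is what allows the same stream to be played back as ordinary two-dimensional video.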

In this way, the MPEG video stream VS in which the depth data of each pixel is stored in the user data area UD1 or UD2 is sent to the information multiplexer 8 and multiplexed, in accordance with the DVD video format shown in FIG. 4, with the audio data (MPEG audio stream AS) sent from the audio compressor 7. The multiplexed data stream (MPEG multiplexed transport stream TS) is recorded by the recorder 10 on the recording medium 9 compliant with the DVD video format.

以上述べたように、本実施形態に係る記録装置２００によれば、２次元映像から得られた３次元映像生成用の各画素の奥行き情報を、ＤＶＤビデオフォーマットに準拠するＭＰＥＧ圧縮符号化処理により圧縮符号化されたＭＰＥＧビデオストリームＶＳにおけるユーザ任意使用領域であるユーザデータ領域ＵＤ１あるいはＵＤ２に格納することができる。そして、このように奥行き情報が一体化されたＭＰＥＧビデオストリームをＭＰＥＧオーディオストリームと多重化し、ＭＰＥＧ多重化トランスポートストリームＴＳとしてＤＶＤビデオフォーマットに準拠した記録媒体９に記録することができる。 As described above, according to the recording apparatus 200 according to the present embodiment, the depth information of each pixel for generating 3D video obtained from 2D video is obtained by MPEG compression encoding processing compliant with the DVD video format. It can be stored in the user data area UD1 or UD2, which is a user arbitrary use area in the compression-coded MPEG video stream VS. Then, the MPEG video stream in which the depth information is integrated in this way can be multiplexed with the MPEG audio stream and recorded as an MPEG multiplexed transport stream TS on the recording medium 9 compliant with the DVD video format.

このため、図示しない再生装置においては、記録媒体９に記録されたＭＰＥＧ多重化トランスポートストリームＴＳを、一体に多重化記録された奥行き情報を用いることなく再生することにより、ＭＰＥＧビデオストリームＶＳおよびＭＰＥＧオーディオストリームＡＳに基づく２次元映像およびオーディオ信号をそれぞれ再生することができる。そして、記録媒体９に記録されたＭＰＥＧ多重化トランスポートストリームＴＳを、一体に多重化記録された奥行き情報を用いて再生することにより、奥行き情報およびＭＰＥＧビデオストリームＶＳに基づく３次元映像およびＭＰＥＧオーディオストリームＡＳに基づくオーディオ信号をそれぞれ再生することができる。 For this reason, in a playback apparatus (not shown), the MPEG multiplexed transport stream TS recorded on the recording medium 9 is played back without using the depth information recorded in an integrated manner, so that the MPEG video streams VS and MPEG are recorded. Two-dimensional video and audio signals based on the audio stream AS can be reproduced. Then, by reproducing the MPEG multiplexed transport stream TS recorded on the recording medium 9 using the depth information integrally recorded by multiplexing, 3D video and MPEG audio based on the depth information and the MPEG video stream VS are reproduced. Each audio signal based on the stream AS can be reproduced.

したがって、本実施形態に係る記録装置２００によれば、再生装置側において、記録媒体９に記録された奥行き情報の使用／不使用の切り替えにより２次元映像および３次元映像の切り替え再生を可能にする記録システムを構築することができ、この記録装置２００により記録媒体９に記録された２次元映像および３次元映像の再生効率を向上させることができる。 Therefore, according to the recording apparatus 200 according to the present embodiment, on the playback apparatus side, switching playback of 2D video and 3D video is enabled by switching between using / not using depth information recorded on the recording medium 9. A recording system can be constructed, and the reproduction efficiency of 2D video and 3D video recorded on the recording medium 9 by the recording device 200 can be improved.

(Second Embodiment)
FIG. 14 is a block diagram showing a schematic configuration of a recording apparatus 200A according to the second embodiment of the present invention. In the recording apparatus 200A of the present embodiment, the processing of the video compressor and the information multiplexer differs from that of the recording apparatus 200 shown in FIG. 1, so only those differences are described.

In the recording apparatus 200A according to the present embodiment, the 8-bit depth data of each pixel sent from the depth information formatter 5 is sent to the information multiplexer 8A as a bitstream separate from the MPEG video stream VS that is MPEG compression-encoded by the video compressor 2A. It is sent either directly or after the video compressor 2A has converted it into per-frame-layer data compression-encoded by the run-length coding of (2), data intra-frame predictive coded according to (3), or data compression-encoded by the inter-frame predictive coding of (4). The MPEG video stream VS and the MPEG audio stream AS MPEG compression-encoded by the audio compressor 7 are also each sent to the information multiplexer 8A.

情報多重化器８Ａは、送信されてきた左眼画像Ｐｌの各画素の奥行きデータに対応するデータを、ＤＶＤビデオフォーマット（図４参照）におけるＤ＿ＰＡＣＫ３２のデータ内容（例えば、２ｋＢ）として、ＭＰＥＧビデオストリームＶＳに対応するＶ＿ＰＡＣＫ３１およびＭＰＥＧオーディオストリームＡＳに対応するＡ＿ＰＡＣＫ３０等と多重化することにより、奥行きデータをＤＶＤビデオフォーマット化する。 The information multiplexer 8A uses the transmitted data corresponding to the depth data of each pixel of the left eye image Pl as the data content (for example, 2 kB) of D_PACK32 in the DVD video format (see FIG. 4). By multiplexing with V_PACK31 corresponding to VS, A_PACK30 corresponding to MPEG audio stream AS, etc., the depth data is converted into a DVD video format.

このようにして、各画素の奥行きデータが多重化されたＭＰＥＧ多重化トランスポートストリームＴＳは、記録器１０により、ＤＶＤビデオフォーマットに準拠した記録媒体９に記録される。 In this way, the MPEG multiplexed transport stream TS in which the depth data of each pixel is multiplexed is recorded on the recording medium 9 compliant with the DVD video format by the recorder 10.

以上述べたように、本実施形態に係る記録装置２００Ａによれば、２次元映像から得られた３次元映像生成用の各画素の奥行き情報を、ＤＶＤビデオフォーマットに従った多重化処理により、そのＤＶＤビデオフォーマットに対応する各フレームレイヤ３３における各ピクセルレイヤに格納して奥行き情報が一体化されたＭＰＥＧ多重化トランスポートストリームＴＳを生成し、生成されたＭＰＥＧ多重化トランスポートストリームＴＳをＤＶＤビデオフォーマットに準拠した記録媒体９に記録することができる。 As described above, according to the recording apparatus 200A according to the present embodiment, the depth information of each pixel for generating a 3D image obtained from the 2D image is obtained by multiplexing the DVD video format. An MPEG multiplexed transport stream TS in which depth information is integrated by storing in each pixel layer in each frame layer 33 corresponding to the DVD video format is generated, and the generated MPEG multiplexed transport stream TS is converted into the DVD video format. Can be recorded on the recording medium 9 compliant with the above.

このため、第１実施形態と同様に、図示しない再生装置においては、記録媒体９に記録されたＭＰＥＧ多重化トランスポートストリームＴＳを、一体に多重化記録された奥行き情報を用いることなく再生することにより、ＭＰＥＧビデオストリームＶＳおよびＭＰＥＧオーディオストリームＡＳに基づく２次元映像およびオーディオ信号をそれぞれ再生することができる。そして、ＭＰＥＧ多重化トランスポートストリームＴＳを、一体に多重化記録された奥行き情報を用いて再生することにより、奥行き情報およびＭＰＥＧビデオストリームＶＳに基づく３次元映像およびＭＰＥＧオーディオストリームＡＳに基づくオーディオ信号をそれぞれ再生することができる。 For this reason, as in the first embodiment, in a playback device (not shown), the MPEG multiplexed transport stream TS recorded on the recording medium 9 is played back without using the depth information recorded in a multiplexed manner. Thus, it is possible to reproduce the two-dimensional video and audio signal based on the MPEG video stream VS and the MPEG audio stream AS, respectively. Then, by reproducing the MPEG multiplexed transport stream TS using the depth information recorded in an integrated manner, the 3D video based on the depth information and the MPEG video stream VS and the audio signal based on the MPEG audio stream AS are obtained. Each can be played.

したがって、本実施形態に係る記録装置２００Ａにおいても、第１実施形態と同様の効果である、記録媒体９に記録された２次元映像および３次元映像の再生効率の向上という効果を得ることができる。 Therefore, also in the recording apparatus 200A according to the present embodiment, it is possible to obtain the effect of improving the reproduction efficiency of the 2D video and the 3D video recorded on the recording medium 9, which is the same effect as the first embodiment. .

As a modification of the first and second embodiments, the information multiplexer can also set the transport private data flag (transport_private_data_flag) in the system layer (see table T8 in FIG. 15) of the generated MPEG multiplexed transport stream TS to "1" to indicate explicitly that private data (private_data) is present. Then, subject to the restriction that the data must not extend beyond the transport packet, it stores the depth data of each pixel as private_data in a private data field whose length is given by transport_private_data_length of the MPEG multiplexed transport stream TS, and records it on the recording medium 9.
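
The length restriction on the private data can be sketched as follows, assuming the standard 188-byte MPEG-2 transport packet with a 4-byte header and an adaptation field that carries only the private data (no PCR or other optional fields). The function names are hypothetical, and the sketch builds only the adaptation field, not a complete packet.

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4
# adaptation_field_length (1) + flags byte (1) + transport_private_data_length (1)
ADAPTATION_OVERHEAD = 3
MAX_PRIVATE_BYTES = TS_PACKET_SIZE - TS_HEADER_SIZE - ADAPTATION_OVERHEAD

TRANSPORT_PRIVATE_DATA_FLAG = 0x02       # bit position in the adaptation field flags


def build_adaptation_field(private_data):
    """Return an adaptation field that carries private_data and nothing else."""
    if len(private_data) > MAX_PRIVATE_BYTES:
        raise ValueError("private data would extend beyond the transport packet")
    body = bytes([TRANSPORT_PRIVATE_DATA_FLAG, len(private_data)]) + private_data
    return bytes([len(body)]) + body     # adaptation_field_length counts what follows


def split_private_data(depth_bytes):
    """Cut depth data into pieces that each fit into one transport packet."""
    return [depth_bytes[i:i + MAX_PRIVATE_BYTES]
            for i in range(0, len(depth_bytes), MAX_PRIVATE_BYTES)]
```

Under these assumptions at most 181 bytes of depth data fit in one packet, so a frame's depth data would normally be spread over many packets.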

また、第１および第２の実施形態の変形例に係る記録装置は、ストリーム識別情報（stream_id）にプライベートストリーム（private_stream）を設定して専用のパケットを宣言し、この専用のパケットとして確保されたフィールド（プライベートストリームフィールド）に奥行きデータをパック化し、記録器１０を介して記録媒体９にフレーム画像毎に記録することも可能である。 Further, the recording apparatus according to the modification of the first and second embodiments sets a private stream (private_stream) in the stream identification information (stream_id), declares a dedicated packet, and is secured as this dedicated packet. It is also possible to pack the depth data into a field (private stream field) and record it on the recording medium 9 via the recorder 10 for each frame image.
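
Packing depth data into a dedicated private-stream packet can be sketched as follows. The text does not say which private stream is meant; this sketch assumes the MPEG-2 Systems stream_id 0xBF (private_stream_2), whose PES packets carry the payload directly after the length field. The function name is hypothetical.

```python
PACKET_START_CODE_PREFIX = b"\x00\x00\x01"
PRIVATE_STREAM_2 = 0xBF                  # assumed choice of private stream


def build_private_stream_packet(payload):
    """Wrap payload in a PES packet header for private_stream_2."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload does not fit the 16-bit PES_packet_length")
    return (PACKET_START_CODE_PREFIX
            + bytes([PRIVATE_STREAM_2])
            + len(payload).to_bytes(2, "big")
            + payload)
```

One such packet per frame image would correspond to the per-frame recording described above, provided the frame's depth data fits within the 16-bit length.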

(Third Embodiment)
FIG. 16 is a block diagram showing a schematic configuration of a recording apparatus 200B according to the third embodiment of the present invention. In the recording apparatus 200B of the present embodiment, the processing of the video compressor and the information multiplexer differs from that of the recording apparatus 200A shown in FIG. 14, so only those differences are described.

In the recording apparatus 200B according to the present embodiment, the 8-bit depth data of each pixel sent from the depth information formatter 5 is sent to the information multiplexer 8B as a data stream separate from the MPEG video stream VS that is MPEG compression-encoded by the video compressor 2A. It is sent either directly or after the video compressor 2A has converted it into per-frame-layer data compression-encoded by the run-length coding of (2), data intra-frame predictive coded according to (3), or data compression-encoded by the inter-frame predictive coding of (4). The MPEG video stream VS and the MPEG audio stream AS MPEG compression-encoded by the audio compressor 7 are also each sent to the information multiplexer 8B.

本実施形態に係る情報多重化器８Ｂは、送信されてきた左眼画像Ｐｌの各画素の奥行きデータに対応するデータを、図４に示すＤＶＤビデオフォーマットにおけるＤ＿ＰＡＣＫ３２ではなく、図１７に示すように、ＤＶＤビデオ規格に準拠するＤＶＤビデオフォーマットにおいて自由使用領域として設定された領域であるＤＶＤ-others zone２１のデータ内容としてＤＶＤ−video zone２０とリンクして多重化している。 As shown in FIG. 17, the information multiplexer 8 B according to the present embodiment uses data corresponding to the transmitted depth data of each pixel of the left eye image Pl as shown in FIG. 17 instead of D_PACK32 in the DVD video format shown in FIG. 4. The data content of the DVD-others zone 21, which is an area set as a free use area in the DVD video format compliant with the DVD video standard, is linked to the DVD-video zone 20 and multiplexed.

DVD-others zone２１は、１つのＤＶＭＧ（D-Video Manager）８２および複数（ｎ個）のＤＶＴＳ（D-Video Title Set）８３ａ１〜８３ａｎから構成されている。ＤＶＭＧ８２は、ビデオマネージャーインフォメーション等、後続するＤＶＴＳ８３ａ１〜８３ａｎの識別情報や様々な情報自体のスタートアドレスやエンドアドレス、どこのビデオストリームから再生を開始するか等の情報が記述されている。 The DVD-others zone 21 includes one DVMG (D-Video Manager) 82 and a plurality (n) of DVTS (D-Video Title Set) 83a1 to 83an. The DVMG 82 describes information such as identification information of subsequent DVTSs 83a1 to 83an such as video manager information, start addresses and end addresses of various information itself, and from which video stream playback is started.

各ＤＶＴＳ８３ａ１〜８３ａｎは、再生されるべきオーディオデータやビデオデータ（ビデオストリーム）のアドレス情報や識別情報等の制御データ（Control Data）が格納されたフィールドＤＶＴＳＩ８４と、このＤＶＴＳＩ８４の後に格納されたＤＶＯＢＳ（D-Video Object Set:ビデオオブジェクトセット）８５というビデオストリームおよびオーディオストリームが多重化されたＭＰＥＧ（Motion Picture Experts Group）ストリームのセット（コンテンツ）とから構成されており、このＤＶＯＢＳ８５は、複数のＤＶＯＢ８６（D-Video Object）という小単位のＭＰＥＧストリームから構成されている。 Each of the DVTSs 83a1 to 83an includes a field DVTSI 84 in which control data (control data) such as address information and identification information of audio data and video data (video stream) to be reproduced is stored, and a DVOBS ( A D-Video Object Set (Video Object Set) 85 is composed of a set (content) of MPEG (Motion Picture Experts Group) streams in which a video stream and an audio stream are multiplexed. The DVOBS 85 includes a plurality of DVOBs 86 ( D-Video Object) is composed of a small unit MPEG stream.

各ＤＶＯＢ８６は、さらに細分化された複数のセル（ＤＣＥＬＬ）８７という単位から構成されている。このＤＣＥＬＬ８７は、再生単位を表し、固有のＩＤ番号が付与されている。各ＤＣＥＬＬ８７は、さらに複数のＤＶＯＢＵ（D-Video Object Unit:ビデオオブジェクトユニット）８８から構成されている。 Each DVOB 86 is composed of a plurality of subdivided cells (DCELL) 87 units. This DCELL 87 represents a reproduction unit and is given a unique ID number. Each DCELL 87 further includes a plurality of DVOBUs (D-Video Object Units) 88.

In the present embodiment, each DVOBU 88 is formed by grouping several frames' worth of the frame layers 33 of the second embodiment, in which the depth data of each pixel is stored as the data of each pixel layer 38.

すなわち、本実施形態に係る情報多重化器８Ｂは、奥行き情報フォーマット器５から送信されてきたフレーム画像毎の各画素の奥行きデータからフレームレイヤ３３を生成し、生成した複数のフレーム画像に対応する複数のフレームレイヤ３３をグループ化してＤＶＤビデオフォーマットにおけるＤＶＯＢＵ８８を生成することにより、奥行きデータをＤＶＤビデオフォーマット化する。 That is, the information multiplexer 8B according to the present embodiment generates the frame layer 33 from the depth data of each pixel for each frame image transmitted from the depth information formatter 5, and corresponds to the generated plurality of frame images. A plurality of frame layers 33 are grouped to generate DVOBU 88 in the DVD video format, thereby converting the depth data into the DVD video format.
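
The grouping of frame layers into DVOBUs can be sketched as follows. The class and function names are hypothetical, frame layers are represented as opaque byte strings, and the group size of 15 frames is an assumption, since the text only says that several frames are grouped.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DVOBU:
    """One D-Video Object Unit: the frame layers of a few consecutive frames."""
    frame_layers: List[bytes] = field(default_factory=list)


@dataclass
class DCell:
    """One playback unit with its own ID, made of several DVOBUs."""
    cell_id: int
    dvobus: List[DVOBU] = field(default_factory=list)


def group_into_dvobus(frame_layers, frames_per_unit=15):
    """Group per-frame frame layers into DVOBUs of frames_per_unit frames each."""
    return [DVOBU(frame_layers[i:i + frames_per_unit])
            for i in range(0, len(frame_layers), frames_per_unit)]
```

For example, `DCell(cell_id=1, dvobus=group_into_dvobus(layers))` would hold the depth data of one cell; the last DVOBU is shorter when the frame count is not a multiple of the group size.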

本実施形態においても、第１および第２実施形態と同様に、図示しない再生装置においては、記録媒体９に記録された多重化データを、記録媒体９におけるDVD-others zone２１に対応するエリアに記録された奥行き情報を用いることなく再生することにより、ＭＰＥＧビデオストリームＶＳおよびＭＰＥＧオーディオストリームＡＳに基づく２次元映像およびオーディオ信号をそれぞれ再生することができる。そして、記録媒体９におけるDVD-others zone２１に対応するエリアに記録された奥行き情報を用いてＭＰＥＧ多重化トランスポートストリームＴＳの再生処理を行うことにより、奥行き情報およびＭＰＥＧビデオストリームＶＳに基づく３次元映像およびＭＰＥＧオーディオストリームＡＳに基づくオーディオ信号をそれぞれ再生することができる。 Also in the present embodiment, as in the first and second embodiments, in a playback apparatus (not shown), multiplexed data recorded on the recording medium 9 is recorded in an area corresponding to the DVD-others zone 21 on the recording medium 9. By reproducing without using the depth information, a two-dimensional video and an audio signal based on the MPEG video stream VS and the MPEG audio stream AS can be reproduced, respectively. Then, by performing the reproduction process of the MPEG multiplexed transport stream TS using the depth information recorded in the area corresponding to the DVD-others zone 21 in the recording medium 9, the 3D video based on the depth information and the MPEG video stream VS is obtained. And an audio signal based on the MPEG audio stream AS can be reproduced.

したがって、本実施形態に係る記録装置２００Ｂにおいても、第１および第２実施形態と同様の効果である、記録媒体９に記録された２次元映像および３次元映像の再生効率の向上という効果を得ることができる。 Therefore, the recording apparatus 200B according to this embodiment also has the same effect as the first and second embodiments, that is, the effect of improving the reproduction efficiency of the two-dimensional video and the three-dimensional video recorded on the recording medium 9. be able to.

In particular, in the present embodiment, the depth data stored in the DVD-others zone 21 is given the same data structure as the video data stored in the DVD-video zone 20, and the number of frames (playback duration) corresponding to each DVOB 86, DCELL 87, and DVOBU 88 of the DVD-others zone 21 is set equal to the number of frames (playback duration) corresponding to each VOB 26, CELL 27, and VOBU 28 of the DVD-video zone 20. This improves accessibility, for operations such as search, to the data recorded on the recording medium 9.
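
Why equal frame counts help a search can be illustrated with a small sketch. It assumes each zone is described by a list giving the number of frames in each unit, in order; the function names are hypothetical.

```python
def zones_aligned(video_unit_frames, depth_unit_frames):
    """True when both zones have the same number of frames in every unit."""
    return list(video_unit_frames) == list(depth_unit_frames)


def locate_frame(unit_frames, frame_index):
    """Return (unit_index, offset_in_unit) for a frame counted from zero."""
    for unit_index, count in enumerate(unit_frames):
        if frame_index < count:
            return unit_index, frame_index
        frame_index -= count             # skip this unit and keep counting
    raise IndexError("frame index beyond the recorded units")
```

When the zones are aligned, the pair returned for the video zone also addresses the matching depth data in the other zone, so one lookup serves both.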

このようにＤＶＤビデオ規格に準拠したフォーマットで、DVD-video zone２０およびDVD-others zone２１の内の何れか一方に、２次元映像に対応する各フレーム画像の各画素の奥行き情報を格納しておくことにより、２次元映像および３次元映像をＤＶＤビデオ規格互換で記録媒体９に記録することができる。 In this way, the depth information of each pixel of each frame image corresponding to the 2D video is stored in one of the DVD-video zone 20 and the DVD-others zone 21 in a format compliant with the DVD video standard. Thus, 2D video and 3D video can be recorded on the recording medium 9 in conformity with the DVD video standard.

(Fourth embodiment)
FIG. 18 is a block diagram showing a schematic configuration of a recording apparatus 200C according to the fourth embodiment of the present invention.

The recording apparatus 200C of this embodiment includes a computer 91. Based on computer graphics (CG) technology, the computer 91 has a two-dimensional image generation function (module) 91a, which generates, for example frame by frame, either new two-dimensional image data from the image data of an existing two-dimensional image or entirely new two-dimensional image data, and an audio signal generation function (module) 91b, which generates an audio signal by signal processing. The computer 91 is connected to the parallax vector extractor 3, the video compressor 2, and the audio compressor 7.

The computer 91 has a built-in memory (not shown), and a program (software) stored in this memory activates the two-dimensional image generation function 91a and the audio signal generation function 91b.

The two-dimensional image data (CG image data) generated by the two-dimensional image generation function 91a using CG technology carries position information and depth information for each small unit (for example, each polygon).

That is, according to this embodiment, the two-dimensional image generation function 91a of the computer 91 sequentially generates frame-by-frame two-dimensional image data using CG technology and sends it to the video compressor 2 as video data (a video stream). It also sends the position information {horizontal position (X), vertical position (Y), depth information (Z)} of each polygon of each two-dimensional image to the depth information formatter 5.

The audio signal generation function 91b either processes an audio signal serving as source material or generates an entirely new audio signal, and sends the result to the audio compressor 7.

The processing of the video compressor 2, depth information formatter 5, audio compressor 7, information multiplexer 8, and recorder 10 is substantially the same as that of the corresponding components of the recording apparatus 200 of the first embodiment, so its description is omitted.

That is, according to this embodiment, the position information {horizontal position (X), vertical position (Y), depth information (Z)} of each polygon of each two-dimensional image generated by the two-dimensional image generation function 91a of the computer 91 is sent sequentially to the depth information formatter 5A.

The depth information formatter 5A compares the position information of each received polygon with the position information of each small block {a (4 × 2) pixel range} and, according to the result, converts the depth value of each polygon into a depth value for each small block. As in the first embodiment, the converted value is then expanded into depth data of, for example, 8 bits for each pixel of that small block.
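
The conversion from per-polygon depth to per-pixel depth can be sketched in Python as follows. This is only an illustration under assumptions that the text does not specify: each polygon is reduced to a hypothetical axis-aligned bounding box (x0, y0, x1, y1, z), a small block takes the depth of the nearest polygon whose box contains the block centre, depth is quantized linearly to 8 bits with 0 as the nearest value, and blocks covered by no polygon receive the far value 255. The names `quantize` and `block_depth_map` are invented for this sketch.

```python
# Sketch only: the polygon representation, the centre-in-box test, and the
# linear 8-bit quantization are assumptions, not the formatter 5A's method.

BLOCK_W, BLOCK_H = 4, 2  # small block size given in the text: (4 x 2) pixels


def quantize(z, z_near, z_far):
    """Map a depth z in [z_near, z_far] linearly to 0..255 (0 = nearest)."""
    t = (z - z_near) / (z_far - z_near)
    return max(0, min(255, round(255 * t)))


def block_depth_map(polygons, width, height, z_near, z_far):
    """Return a height x width list of 8-bit depth values.

    polygons: iterable of (x0, y0, x1, y1, z) boxes; x1 and y1 are exclusive.
    """
    depth = [[255] * width for _ in range(height)]  # 255 = far background
    for by in range(0, height, BLOCK_H):
        for bx in range(0, width, BLOCK_W):
            cx, cy = bx + BLOCK_W / 2, by + BLOCK_H / 2  # block centre
            hits = [z for (x0, y0, x1, y1, z) in polygons
                    if x0 <= cx < x1 and y0 <= cy < y1]
            if not hits:
                continue
            value = quantize(min(hits), z_near, z_far)  # nearest polygon wins
            # Expand the block value to every pixel of the block.
            for y in range(by, min(by + BLOCK_H, height)):
                for x in range(bx, min(bx + BLOCK_W, width)):
                    depth[y][x] = value
    return depth
```

For an 8 × 4 image, a single box covering the top-left block fills exactly that block's eight pixels with one quantized value and leaves every other pixel at the background value.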

The other processing is the same as in the first embodiment.

That is, according to this embodiment, a two-dimensional video and its depth information can be generated without using an imaging camera. The generated depth information can be integrated with the MPEG video stream, multiplexed with the MPEG audio stream, and recorded as an MPEG multiplexed transport stream TS on the recording medium 9 compliant with the DVD video format.

As a result, the same effect as in the first embodiment is obtained, namely improved efficiency in reproducing the two-dimensional and three-dimensional video recorded on the recording medium 9.

(Fifth embodiment)
FIG. 19 is a block diagram showing a schematic configuration of a recording apparatus 200D according to the fifth embodiment of the present invention.

The recording apparatus 200D of this embodiment includes a computer 91 and a depth information formatter 5A that have the same functional configuration and connections as in the fourth embodiment. The processing of the video compressor 2A, audio compressor 7, information multiplexer 8A, and recorder 10 is substantially the same as that of the corresponding components of the recording apparatus 200A of the second embodiment, so its description is omitted.

That is, in this embodiment as in the fourth embodiment, the position information of each polygon of each two-dimensional image generated by the two-dimensional image generation function 91a of the computer 91 is sent sequentially to the depth information formatter 5A.

The depth information formatter 5A compares the position information of each received polygon with the position information of each small block {a (4 × 2) pixel range} and, according to the result, converts the depth value of each polygon into a depth value for each small block. The converted value is then expanded into depth data of, for example, 8 bits for each pixel of that small block.

The other processing is the same as in the second embodiment.

That is, according to this embodiment, a two-dimensional video and its depth information can be generated without using an imaging camera. Through multiplexing in accordance with the DVD video format, the generated depth information is stored in each pixel layer of each frame layer 33 corresponding to that format, producing an MPEG multiplexed transport stream TS with the depth information integrated. The generated MPEG multiplexed transport stream TS can then be recorded on the recording medium 9 compliant with the DVD video format.

As a result, the same effect as in the second embodiment is obtained, namely improved efficiency in reproducing the two-dimensional and three-dimensional video recorded on the recording medium 9.

(Sixth embodiment)
FIG. 20 is a block diagram showing a schematic configuration of a recording apparatus 200E according to the sixth embodiment of the present invention.

The recording apparatus 200E of this embodiment includes a computer 91 and a depth information formatter 5A that have the same functional configuration and connections as in the fourth embodiment. The processing of the video compressor 2A, audio compressor 7, information multiplexer 8B, and recorder 10 is substantially the same as that of the corresponding components of the recording apparatus 200B of the third embodiment, so its description is omitted.

The other processing is the same as in the third embodiment.

That is, according to this embodiment, a two-dimensional video and its depth information can be generated without using an imaging camera. A frame layer 33 is then generated from the generated depth information, and the frame layers 33 corresponding to the generated frame images are grouped to form the DVOBU 88 of the DVD video format, so that the depth information is put into DVD video format. The MPEG multiplexed transport stream TS in which the depth information of each pixel is multiplexed can then be recorded on the recording medium 9 compliant with the DVD video format.

Therefore, the recording apparatus 200E according to this embodiment also provides the same effect as the third embodiment, namely improved efficiency in reproducing the two-dimensional and three-dimensional video recorded on the recording medium 9.

(Seventh embodiment)
FIG. 21 is a block diagram showing a schematic configuration of a playback apparatus 100 according to the seventh embodiment of the present invention. The playback apparatus 100 can switch between reproducing two-dimensional video and three-dimensional video based on the MPEG multiplexed transport stream TS recorded on the recording medium 9 by any one of the recording apparatuses 200 and 200A to 200E according to the first to sixth embodiments of the present invention and their modifications.

Specifically, the playback apparatus 100 includes a reproducer 101 that accesses the recording medium 9 and reads and reproduces the data recorded on it; an information separator 102 for packet separation, connected to the reproducer 101; a video decoder 103, connected to the information separator 102 and having, for example, the configuration of the decoding apparatus shown in FIG. 6; and a depth information extractor 104 for extracting depth data, connected to the video decoder 103.

The playback apparatus 100 further includes a stereoscopic image display 105 capable of displaying stereoscopic video (three-dimensional moving images) by a predetermined stereoscopic display method, and a visual field converter 106 for generating stereoscopic images (parallax images) suited to the stereoscopic display method of the stereoscopic image display 105.

The playback apparatus 100 also includes an audio decoder 107 for decoding audio data, connected to the information separator 102, and a speaker 108 for reproducing the audio signal decoded by the audio decoder 107.

The playback apparatus 100 further includes a controller 109 that is connected to the reproducer 101, information separator 102, video decoder 103, depth information extractor 104, stereoscopic image display 105, visual field converter 106, and audio decoder 107, and that controls the apparatus as a whole.

Next, the overall operation of the playback apparatus 100 according to this embodiment will be described.

In the playback apparatus 100, the reproducer 101 reads from the recording medium 9 the MPEG multiplexed transport stream TS that was recorded by the recording method described in the first embodiment. The stream is sent to the information separator 102, which separates the packets of the MPEG video stream VS and the packets of the MPEG audio stream AS from the MPEG multiplexed transport stream TS.

The packets of the MPEG video stream VS separated by the information separator 102 are received by the video decoder 103, which decodes them by the method shown in FIG. 5 above to generate decoded data (frame images for playback).

The video decoder 103 also reads the user data stored in the user data area UD1 or UD2 of the MPEG video stream VS, and the depth information extractor 104 extracts from this user data the depth data recorded, for example, in the frame layer format of FIG. 4 or FIG. 17. When the user data is stored in the D_PACK 32, a private data field, or a private stream field of the MPEG multiplexed transport stream TS, the information separator 102 reads the user data and the depth information extractor 104 extracts the depth data from it.

The frame images for playback generated by the video decoder 103 and the depth data extracted by the depth information extractor 104 are each received by the visual field converter 106.

Based on the received frame images and depth data, the visual field converter 106 generates parallax images suited to the stereoscopic display method of the stereoscopic image display 105, and the stereoscopic image display 105 displays the generated parallax images stereoscopically.
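
As a rough illustration of how a frame image and its depth data can yield a pair of parallax images, the following Python sketch shifts each pixel horizontally by an amount that grows as the pixel gets nearer. It is a simplified stand-in, not the visual field transformation that the converter 106 applies. The function names, the convention that depth 0 is nearest and 255 farthest, the `max_shift` parameter, the shift direction chosen for each eye, and the hole filling from the left neighbour are all assumptions made for this sketch.

```python
# Sketch only: a horizontal-shift approximation of parallax image generation.
# Depth convention (0 = near, 255 = far) and shift directions are assumptions.


def shift_row(pixels, depth, max_shift, direction):
    """Shift one scanline; nearer pixels move further and occlude farther ones."""
    width = len(pixels)
    out = [None] * width
    out_depth = [256] * width  # larger than any 8-bit depth value
    for x in range(width):
        shift = round(max_shift * (255 - depth[x]) / 255)
        tx = x + direction * shift
        # Keep the nearest source pixel when several land on the same target.
        if 0 <= tx < width and depth[x] < out_depth[tx]:
            out[tx] = pixels[x]
            out_depth[tx] = depth[x]
    # Fill holes left by the shift with the last known value to the left.
    last = pixels[0]
    for x in range(width):
        if out[x] is None:
            out[x] = last
        else:
            last = out[x]
    return out


def stereo_pair(frame, depth_map, max_shift=2):
    """Return (left, right) images from a 2D frame and its per-pixel depth."""
    left = [shift_row(p, d, max_shift, +1) for p, d in zip(frame, depth_map)]
    right = [shift_row(p, d, max_shift, -1) for p, d in zip(frame, depth_map)]
    return left, right
```

When every pixel is at the far depth the two outputs equal the input frame, which corresponds to plain two-dimensional display; nearer depth values separate the two views.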

Meanwhile, the packets of the MPEG audio stream AS separated by the information separator 102 are decoded by the audio decoder 107 to generate decoded data (an audio signal), and the generated audio signal is reproduced by the speaker 108.

Next, the parallax image generation processing of the visual field converter 106 in this embodiment will be described.

To generate parallax images, a visual field transformation method is used as the coordinate system transformation method of CG. This method yields an image from a changed viewpoint by means of a transformation into the viewpoint coordinate system. If a two-dimensional image and its depth information are available, the depth information can be used to generate an image (stereoscopic image) seen from an arbitrary viewpoint.

For example, let (x_i, y_i, z_i) be the coordinates of the viewpoint, corresponding to the principal point on the optical axis of the lens L1A or L1B of the imaging camera 1A or 1B shown in FIG. 1, and let (x_a, y_a, z_a) be the coordinates of the gaze point, corresponding to a feature point of the target object OB. If the distance between the viewpoint and the gaze point is written component-wise as (x_f, y_f, z_f), it is given by
x_f = x_i - x_a
y_f = y_i - y_a
z_f = z_i - z_a

In the visual field transformation method of this embodiment, the origin is first moved by translation so that the gaze point O_a becomes the origin of the orthogonal coordinate system (x, y, z). Call this transformation T_1; it is simply a translation by (-x_a, -y_a, -z_a). Next, the orientation of the coordinates is changed by rotation. As shown in FIG. 22, the vector from the origin (gaze point) O_a toward the point O_f is obtained by rotating the z-axis vector from the origin O_a first by the angle α about the y-axis and then by the angle β about the x-axis. In practice, because it is the coordinate values of the point O_f that are moved, the rotation directions are reversed.

These rotations are expressed as
Transformation T_2: rotation by -α about the y-axis
Transformation T_3: rotation by -β about the x-axis
Here, α is the angle between the z-axis and the line connecting the origin O_a to the projection point O_f' obtained by projecting the point O_f onto the xy plane, so sin α and cos α are each given by the following equation.
(equation image not reproduced)

Likewise, β is given by the following equation in terms of the length (x_f^2 + y_f^2)^{1/2} between the origin O_a and the point O_f and the length y_f between the point O_f and the projection point O_f'.
(equation image not reproduced)

As the final transformation, T_4 converts the (x, y, z) coordinate system in which the positive z direction relative to the xy plane points from the origin O_a toward the viewpoint (see FIG. 22) into one in which the positive z direction points away from the viewpoint, that is, toward the far side of the xy plane as seen from the viewpoint through the origin O_a (the viewing direction). This is simply the substitution z → -z. Multiplying the four transformation matrices T_1 to T_4 together gives the viewpoint coordinate transformation matrix T, expressed as follows.
(equation image not reproduced)
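
The chain of transformations T_1 to T_4 can be tried out with the short Python sketch below. It is an illustration under stated assumptions, not the matrix of the specification: the formulas for the angles are not legible in this text, so the sines and cosines used here are the usual ones that bring the translated viewpoint (x_f, y_f, z_f) onto the z-axis, with α measured in the xz plane and β as the elevation above it. The code uses column vectors in a right-handed system, so the composite is formed as T_4·T_3·T_2·T_1, and the sign of each rotation is fixed by that choice. The helper names `mat_mul`, `transform_point`, and `view_matrix` are hypothetical.

```python
import math

# Sketch only: angle formulas, handedness, and column-vector convention are
# assumptions; they are chosen so that the viewpoint ends up on the z-axis.


def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform_point(m, p):
    """Apply a 4x4 homogeneous matrix to a 3D point (x, y, z)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


def view_matrix(eye, target):
    """Compose T = T4 * T3 * T2 * T1 for viewpoint `eye` and gaze point `target`."""
    xa, ya, za = target
    xf, yf, zf = eye[0] - xa, eye[1] - ya, eye[2] - za
    r = math.hypot(xf, zf)                      # length of the xz projection
    d = math.sqrt(xf * xf + yf * yf + zf * zf)  # viewpoint to gaze point
    if d == 0:
        raise ValueError("viewpoint and gaze point coincide")
    sa, ca = (xf / r, zf / r) if r > 0 else (0.0, 1.0)
    sb, cb = yf / d, r / d
    t1 = [[1, 0, 0, -xa], [0, 1, 0, -ya], [0, 0, 1, -za], [0, 0, 0, 1]]  # translate
    t2 = [[ca, 0, -sa, 0], [0, 1, 0, 0], [sa, 0, ca, 0], [0, 0, 0, 1]]   # undo alpha
    t3 = [[1, 0, 0, 0], [0, cb, -sb, 0], [0, sb, cb, 0], [0, 0, 0, 1]]   # undo beta
    t4 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]]       # z -> -z
    return mat_mul(t4, mat_mul(t3, mat_mul(t2, t1)))
```

With this composition the gaze point maps to the origin and the viewpoint maps to a point on the negative z-axis at the viewing distance, so positive z points away from the viewer as the text describes for T_4.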

For example, if the stereoscopic image display 105 uses the binocular stereoscopic display method based on a parallax barrier described below, the visual field converter 106 substitutes into the expression for the transformation matrix T a value of α determined from the depth data extracted by the depth information extractor 104, and sets β and γ to 0. This produces a right-eye image and a left-eye image with parallax between them.

If the stereoscopic image display 105 instead uses the IP method described below (integral photography, also called integral imaging), the visual field converter 106 can generate stereoscopic image data based on a plurality of element images obtained by imaging the target object with imaging cameras corresponding to the positions of the individual lenses that make up the lens array, by computing with the viewpoint coordinate transformation matrix T together with the size of each element image. The apparatus is configured to transmit the stereoscopic image data generated in this way to the stereoscopic image display 105 for stereoscopic image reproduction.

Two representative stereoscopic display methods, the parallax barrier method and the IP method, are described here. The parallax barrier method can be implemented with liquid crystal, for example, by stacking two liquid crystal panels. As shown in FIG. 23, one liquid crystal panel (liquid crystal light-shielding barrier) 110 has narrow slit-shaped openings 110a formed at a fixed period. The other liquid crystal panel 112 is placed facing it at a suitable distance behind it as seen from the predetermined viewpoints (left eye and right eye). On the screen of the panel 112 facing the barrier, left-eye and right-eye images (L and R) are arranged alternately at the same period as the openings, and a backlight 111 is installed along the longitudinal direction on the opposite face.

With this parallax barrier method, when the user views the left-eye and right-eye images through the openings of the liquid crystal light-shielding barrier 110 from the predetermined viewpoint, the right eye perceives only the right-eye image and the left eye only the left-eye image. Because each eye sees a different image, the perceived position of the fused image shifts away from the position of the left-eye and right-eye images (the screen position), and the user perceives a stereoscopic image.
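
The alternating arrangement of left-eye and right-eye images on the rear panel can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the images are lists of pixel rows of equal size, that one pixel column corresponds to one barrier period, and that even columns carry the left-eye image and odd columns the right-eye image; the actual column assignment depends on the barrier geometry, which the text does not give. The function name `interleave_columns` is hypothetical.

```python
# Sketch only: column-wise interleaving for a two-view parallax barrier panel.
# Which eye gets the even columns is an assumption.


def interleave_columns(left, right):
    """Build the panel image: even columns from `left`, odd from `right`."""
    if len(left) != len(right) or any(
            len(l_row) != len(r_row) for l_row, r_row in zip(left, right)):
        raise ValueError("left and right images must have the same size")
    return [
        [l_row[x] if x % 2 == 0 else r_row[x] for x in range(len(l_row))]
        for l_row, r_row in zip(left, right)
    ]
```

Each eye then sees only half of the horizontal resolution, which is the usual cost of a two-view barrier arrangement.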

However, although the eyes always remain focused on the screen of the liquid crystal panel 112, the image is perceived at a position different from the screen. This can feel physiologically unnatural and may cause user fatigue or visually induced motion sickness. In recent years, therefore, stereoscopic display methods have been proposed that satisfy each of five physiological factors of stereoscopic vision: the vergence-accommodation conflict (a mismatch between the convergence point and the focus position); binocular parallax (the left and right eyes capture two different images of an object, each from a different direction); focus adjustment (the thickness of the crystalline lens is controlled as the distance to the object changes); convergence (the eyeballs rotate inward or outward as the distance changes); and motion parallax (the image differs as the user moves or changes the viewing angle).

Among the proposed methods, a promising one is the IP method, which Lippmann published in 1908. The IP method acquires depth information of the object to be displayed using a two-dimensionally arranged lens array (also called a fly-eye lens or compound-eye lens). In the 1990s, a technique for producing moving images by the IP method was developed by replacing the conventional photographic plate with electronic recording. Furthermore, the researchers of the cited literature (NHK Science & Technology Research Laboratories, "Fundamentals of 3D Video") used a gradient-index lens array (also called a GRIN lens array) and a high-definition camera to image the subject and acquire the group of element images corresponding to the lens array, transmitted the element images to a liquid crystal display in real time for display, and succeeded in forming a stereoscopic image (three-dimensional video) in space with a fly-eye lens. This demonstrated the feasibility of three-dimensional television broadcasting by the IP method.

FIG. 24 illustrates the principle of stereoscopic image formation in the IP method. As shown in FIG. 24(a), at capture time a large number of minute element lenses 121 are arranged to form a GRIN lens array serving as the IP lens array 122. The image of the object OB1 formed by each element lens 121 is focused through a condenser lens 123 onto a high-definition camera 124 as an element image, and the camera 124 captures all the element images at once. At playback time, as shown in FIG. 24(b), the video from the camera is reproduced on a display such as a liquid crystal display (LCD: Liquid Crystal Display) 125, and the light from all the element lenses 121 converges at a point, producing as a whole a reproduced image as seen from one direction.

Arranging the minute element lenses in two dimensions produces motion parallax in both the horizontal and vertical directions, while arranging them only horizontally produces parallax in the horizontal direction only.

本実施形態においてＩＰ方式に対応する視差画像を生成表示するために、視野変換器１０６は、複数の要素レンズを経由して見えた複数の要素画像を、奥行き情報取り出し器１０４により取り出された奥行きデータに基づいて上述した視点変換によりそれぞれ作成し、作成した要素画像を配列することにより、あたかも図２４の（ａ）の構成で撮像したかのように立体画像表示器１０５における液晶ディスプレイに対してその要素画像配列を表示することにより、立体視再生を実現することができる。実際にＩＰ方式に対応する立体画像表示器１０５は、図２５に示すように、液晶ディスプレイ１２５と、複数の要素レンズ１２１から構成されたＩＰレンズアレイ（例えばＧＲＩＮレンズアレイ）１２２とを備え、その液晶ディスプレイ１２５およびレンズアレイ１２１を近接して対向配置して構成されている。すなわち、立体画像表示器１０５は、２次元映像を表示する構成と略同一の構成を有する薄型ディスプレイとして設計されている。 In this embodiment, in order to generate and display a parallax image corresponding to the IP method, the field of view converter 106 extracts a plurality of element images viewed via a plurality of element lenses by the depth information extractor 104. By creating each of the above-described viewpoint conversions based on the data and arranging the created element images, the liquid crystal display in the stereoscopic image display 105 is displayed as if it was captured with the configuration of FIG. By displaying the element image array, stereoscopic reproduction can be realized. As shown in FIG. 25, the stereoscopic image display 105 that actually corresponds to the IP system includes a liquid crystal display 125 and an IP lens array (for example, a GRIN lens array) 122 composed of a plurality of element lenses 121. The liquid crystal display 125 and the lens array 121 are arranged close to each other and face each other. That is, the stereoscopic image display 105 is designed as a thin display having a configuration that is substantially the same as the configuration for displaying a two-dimensional video.

一方、コントローラ１０９の制御に応じて、再生装置１０５の視野変換器１０６は奥行きデータを用いて視野変換処理を行わないことも可能であり、この場合、立体画像表示器１０５は、コントローラ１０９の制御に応じて、ビデオ復号器１０３から送られた再生用フレーム画像（２次元画像）をフレーム毎に順次表示するようになっている。 On the other hand, in accordance with the control of the controller 109, the visual field converter 106 of the playback device 105 may not perform the visual field conversion process using the depth data. In this case, the stereoscopic image display device 105 controls the controller 109. Accordingly, the playback frame image (two-dimensional image) sent from the video decoder 103 is sequentially displayed for each frame.

以上述べたように、本実施形態に係る再生装置１００によれば、記録媒体９に記録された奥行き情報の使用／不使用の切り替えにより２次元映像および３次元映像を切り替え再生することができ、記録媒体９に記録された２次元映像および３次元映像の再生効率を向上させることができる。 As described above, according to the playback apparatus 100 according to the present embodiment, two-dimensional video and three-dimensional video can be switched and played by switching use / non-use of depth information recorded on the recording medium 9, The reproduction efficiency of 2D video and 3D video recorded on the recording medium 9 can be improved.

Further, because the present embodiment creates the data for stereoscopic viewing from the depth information, it prevents the degradation of only one eye's image that occurs in conventional approaches, and thus improves the satisfaction of users viewing stereoscopically.

In the first to sixth embodiments and their modifications, the recorder 10 records the MPEG multiplexed transport stream TS, in which the depth information is multiplexed, on the recording medium 9, but the present invention is not limited to this configuration. For example, as shown in FIG. 26, the recorder 10A in a recording apparatus 200F according to this modification may use a communication (broadcast) packetizer 130 to packetize the MPEG transport stream TS sent from the information multiplexer 8 according to a communication or broadcast format, and may transmit the packetized stream over a communication network (or broadcast network) 131 that supports that format.

In the seventh embodiment, the playback apparatus 100 uses its reproducing unit 101 to read the MPEG multiplexed transport stream TS from the recording medium 9 and perform playback processing, but the present invention is not limited to this configuration. For example, as shown in FIG. 27, a playback apparatus 100A according to this modification receives a stream that has been packetized in a communication/broadcast format and transmitted over the communication network (or broadcast network) 131, and depacketizes it. The playback apparatus 100A then sends the depacketized data to the information separator 102, which separates the packets of the MPEG video stream VS and the packets of the MPEG audio stream AS from the MPEG multiplexed transport stream TS. The playback processing described above (see FIGS. 21 to 25) can then be executed.

According to this modification, a stream transmitted over the communication network (broadcast network) 131 can also be read, instead of the MPEG multiplexed transport stream TS being read from the recording medium 9, and the two-dimensional or three-dimensional video corresponding to the received stream can be played back.

(Eighth embodiment)
FIG. 28(a) shows at least one computer with built-in memory (including general-purpose computers, microcomputers, and the like) 150 on which a recording program according to the eighth embodiment of the present invention is installed.

As shown in FIG. 28(a), the computer 150 is connected to the pair of imaging cameras 1A and 1B, the microphone 6, and the recording medium 9 shown in FIG. 1. Because the processing procedure of the computer shown in FIG. 28(b) corresponds to the processing functions of the hardware blocks shown in FIG. 1 (the video compressor 2, disparity vector extractor 3, depth information calculator 4, depth information formatter 5, audio compressor 7, information multiplexer 8, and recorder 10), the flow of processing is described only briefly.

That is, as shown in FIG. 28(b), the computer 150, following the recording program stored in the memory 150a, inputs a predetermined duration of left-eye images Pl and right-eye images Pr of the target object OB, captured at the same timing by the left and right imaging cameras 1A and 1B, and stores them in the memory 150a (step S110).

Next, by executing processing corresponding to the disparity vector extractor 3 shown in FIG. 1, the computer 150 extracts, as described in the first embodiment, the disparity vectors V for all corresponding points in all macroblocks within a search range set based on the epipolar line direction and the direction orthogonal to it (step S120).

Subsequently, by executing processing corresponding to the depth information calculator 4 shown in FIG. 1, the computer 150 calculates, as described in the first embodiment, the magnitude of each disparity vector extracted in step S120 and generates position information {horizontal position (X), vertical position (Y), depth information (Z)} for each disparity vector in units of small blocks (step S130).

Then, by executing processing corresponding to the depth information formatter 5 shown in FIG. 1, the computer 150 formats the depth information value of each small block as depth data for each pixel (step S140).
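Steps S130 and S140 can be sketched roughly as follows. The mapping from disparity magnitude to an 8-bit depth value is not specified in this passage, so the linear scaling by a hypothetical `max_magnitude` is an assumption, as are the names `block_depths` and `to_pixel_depth` and the dictionary layout keyed by block position.

```python
import math
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # (block column X, block row Y), assumed indexing


def block_depths(vectors: Dict[Block, Tuple[float, float]],
                 max_magnitude: float) -> Dict[Block, int]:
    """Step S130 (sketch): disparity magnitude per small block -> depth value Z.

    Assumption: Z scales linearly with magnitude and is truncated to 0..255.
    """
    depths = {}
    for block, (vx, vy) in vectors.items():
        magnitude = math.hypot(vx, vy)
        depths[block] = min(255, int(255 * magnitude / max_magnitude))
    return depths


def to_pixel_depth(depths: Dict[Block, int], blocks_x: int, blocks_y: int,
                   block_size: int) -> List[List[int]]:
    """Step S140 (sketch): copy each block's Z to every pixel of that block."""
    width, height = blocks_x * block_size, blocks_y * block_size
    image = [[0] * width for _ in range(height)]
    for (bx, by), z in depths.items():
        for y in range(by * block_size, (by + 1) * block_size):
            for x in range(bx * block_size, (bx + 1) * block_size):
                image[y][x] = z
    return image
```

For example, two 2x2 blocks with disparity vectors (3, 4) and (0, 0) and `max_magnitude=5.0` give a depth image whose left half is 255 and whose right half is 0.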

Next, by executing processing corresponding to the video compressor 2 shown in FIG. 1, the computer 150 MPEG compression-encodes, for example, the right-eye image Pr to generate the MPEG video stream VS. It also stores the 8-bit depth data of each pixel obtained in step S140, according to one of the data formats (1) to (4) described in the first embodiment, in one of the following: the user data area UD1/UD2 of the MPEG video stream VS, or the D_PACK32, the private data field, or the private stream field of the MPEG multiplexed transport stream TS (step S150).

In parallel with steps S110 to S150, the computer 150 executes processing corresponding to the audio compressor 7 shown in FIG. 1, MPEG compression-encoding the audio signal collected by the microphone 6 to generate the MPEG audio stream AS (step S160).

Subsequently, by executing processing corresponding to the information multiplexer 8 shown in FIG. 1, the computer 150 multiplexes the MPEG video stream VS and the MPEG audio stream AS according to the DVD video format shown in FIG. 4 (step S170).

The computer 150 then records the data stream generated by the multiplexing (the MPEG multiplexed transport stream TS) on the recording medium 9 in predetermined units (for example, the 2 kB units of the DVD video standard). Alternatively, if the computer 150 is connected to a communication network or broadcast network having a predetermined communication format, it converts the MPEG multiplexed transport stream TS generated by the multiplexing into packets of that communication format and transmits them over the network (step S180).
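Recording in fixed 2 kB units, as in step S180, amounts to cutting the multiplexed stream into 2048-byte pieces. The sketch below only illustrates that chunking; the function name `split_into_sectors` and the zero-byte padding of a short final unit are assumptions, and an actual DVD-Video multiplexer aligns its packs to the unit size rather than padding raw bytes this way.

```python
from typing import List

SECTOR_SIZE = 2048  # 2 kB recording unit mentioned for the DVD video standard


def split_into_sectors(stream: bytes, sector_size: int = SECTOR_SIZE) -> List[bytes]:
    """Cut a multiplexed stream into fixed-size recording units.

    Assumption: a short final unit is padded with zero bytes.
    """
    sectors = []
    for offset in range(0, len(stream), sector_size):
        chunk = stream[offset:offset + sector_size]
        if len(chunk) < sector_size:
            chunk += b"\x00" * (sector_size - len(chunk))
        sectors.append(chunk)
    return sectors
```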

Subsequently, the computer 150 determines whether image data (frame image data) is still being input from the imaging cameras 1A and/or 1B (step S190). If the result is YES, it returns to step S110 and continues the processing from step S110 onward.

If the result of the determination in step S190 is NO, the computer 150 ends the processing.

That is, the recording program according to the present embodiment can cause the computer 150 to store the depth data of each pixel in the user data area UD1 or UD2 of the MPEG video stream VS, multiplex that MPEG video stream VS with the MPEG audio stream AS, and record the result on the recording medium 9. Therefore, as in the first embodiment, two-dimensional and three-dimensional video can be switched during playback by switching between use and non-use of the depth information recorded on the recording medium 9, which improves the playback efficiency of the two-dimensional and three-dimensional video.

(Ninth embodiment)
FIG. 29(a) shows at least one computer with built-in memory (including general-purpose computers, microcomputers, and the like) 160 on which a playback program according to the ninth embodiment of the present invention is installed.

As shown in FIG. 29(a), the computer 160 is connected to the recording medium 9, the stereoscopic image display 105, and the speaker 108 shown in FIG. 21. Because the processing procedure of the computer shown in FIG. 29(b) corresponds to the processing functions of the hardware blocks shown in FIG. 21 (the reproducing unit 101, information separator 102, video decoder 103, depth information extractor 104, visual field converter 106, and audio decoder 107), the flow of processing is described only briefly.

That is, as shown in FIG. 29(b), the computer 160, following the playback program stored in the memory 160a, either plays back the MPEG multiplexed transport stream TS recorded on the recording medium 9 or, if the computer 160 is connected to a communication network or broadcast network having a predetermined communication format, receives the MPEG multiplexed transport stream TS (formatted in that communication format) transmitted over the network (step S210).

Next, by executing processing corresponding to the information separator 102 shown in FIG. 21, the computer 160 separates, as described in the seventh embodiment, the packets of the MPEG video stream VS and the packets of the MPEG audio stream AS from the MPEG multiplexed transport stream TS (step S220). If the stream TS is packetized in a predetermined communication format, the computer 160 first removes that packetization and then executes step S220.

Subsequently, by executing processing corresponding to the video decoder 103 shown in FIG. 21, the computer 160 decodes, as described in the seventh embodiment, the separated packets of the MPEG video stream VS to generate decoded data (playback frame images), and reads out the user data stored in the user data area UD1 or UD2 of the MPEG video stream VS (step S230).
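The user data read-out of step S230 can be sketched as a scan for user data start codes. In MPEG-1/2 video a user data section begins with the start code 0x000001B2 and runs until the next start code prefix 0x000001. The function name `extract_user_data` is hypothetical, and the sketch does not interpret the payload: how the depth data is laid out inside the user data is defined by the earlier embodiments and is not modeled here.

```python
from typing import List

START_CODE_PREFIX = b"\x00\x00\x01"
USER_DATA_START_CODE = START_CODE_PREFIX + b"\xb2"  # MPEG-1/2 user_data_start_code


def extract_user_data(video_stream: bytes) -> List[bytes]:
    """Return the raw payload of every user data section in an MPEG video stream.

    Each payload runs from just after the user data start code up to the next
    start code prefix (or the end of the stream). Any zero stuffing placed
    before the next start code is left in the payload.
    """
    payloads = []
    pos = video_stream.find(USER_DATA_START_CODE)
    while pos != -1:
        begin = pos + len(USER_DATA_START_CODE)
        end = video_stream.find(START_CODE_PREFIX, begin)
        if end == -1:
            end = len(video_stream)
        payloads.append(video_stream[begin:end])
        pos = video_stream.find(USER_DATA_START_CODE, end)
    return payloads
```

A depth extractor in the sense of step S240 would then parse these payloads according to whichever of the recording formats was used.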

If the user data is stored in the D_PACK32, the private data field, or the private stream field of the MPEG multiplexed transport stream TS, the computer 160 reads the user data from the corresponding storage area of the MPEG multiplexed transport stream TS during step S220.

Then, by executing processing corresponding to the depth information extractor 104 shown in FIG. 21, the computer 160 extracts, as described in the seventh embodiment, the depth data for each macroblock, recorded in the frame-layer format of, for example, FIG. 4 or FIG. 17, from the user data read in step S230 (step S240). Next, based on the playback frame images and the depth data, the computer 160 generates parallax images corresponding to the stereoscopic display method of the stereoscopic image display 105 (step S250) and displays the generated parallax images stereoscopically on the stereoscopic image display 105 (step S260).

Meanwhile, the computer 160 decodes the packets of the MPEG audio stream AS separated in step S220 (step S270) and plays back the decoded data (audio signal) through the speaker 108 (step S280).

The computer 160 determines whether image data (frame image data) to be played back still remains on the recording medium 9, or whether image data is still being input to the computer 160 from the communication network (broadcast network) (step S290). If the result is YES, it returns to step S210 and continues the processing from step S210 onward.

If the result of the determination in step S290 is NO, the computer 160 ends the processing.

That is, with the playback program according to the present embodiment, as with the playback apparatus of the seventh embodiment, two-dimensional and three-dimensional video can be switched during playback by switching between use and non-use of the depth information recorded on the recording medium 9, which improves the playback efficiency of the two-dimensional and three-dimensional video.

In the seventh and ninth embodiments, the IP method and the parallax barrier method were described as the stereoscopic video display methods of the stereoscopic image display 105, but the present invention is not limited to these methods. Any method that allows stereoscopic perception can be applied, such as the lenticular lens method, the super-multi-view method, the two-view method using polarized glasses, or the anaglyph method. Further, in the present invention the MPEG multiplexed transport stream containing the depth data need not be recorded on a recording medium. As shown in FIG. 26, it can be transmitted via various transmission media such as a communication network or a broadcast network, in which case the recording apparatus is used as a transmission apparatus. Likewise, as shown in FIG. 27, the playback apparatus according to the present invention can be used as a receiving apparatus that receives the MPEG multiplexed transport stream via various transmission media such as a communication network or a broadcast network.

Further, because the recording media according to the first to eighth embodiments have the medium-specific effect of holding depth information for three-dimensional video display, a system that supports multiple stereoscopic (three-dimensional) video playback methods and enables playback of three-dimensional video can be suitably realized.

Furthermore, the term "medium" in the recording medium according to the present invention denotes not only a medium in the narrow sense, on which data can be recorded, but also physical media used for data transmission, such as electromagnetic waves and light. In addition, the information recorded on the recording medium is taken to include the data itself, such as an electronic file, in a state where it is not recorded.

Although the depth information of the video data has been described as being recorded for each frame image, in the present invention it may instead be recorded about every 0.5 seconds or about every 1 second. This can be realized by using the user data recorded in the user data area provided in the GOP layer of MPEG.

In each of the above embodiments, the video data and the audio data are each compression-encoded and then multiplexed, but the multiplexing may instead be centered on the video data without using audio data. Naturally, data other than audio and video data, such as sub-pictures and control information, may also be multiplexed.

Further, in each of the above embodiments the depth data is obtained, for example, for each pixel of the left-eye image, but the present invention is not limited to this configuration, and the depth data may be obtained for units other than pixels.

The programs installed in the computers in the eighth and ninth embodiments may be installed from various recording media accessible to the computer, such as a CD-ROM or DVD-ROM, or may be transmitted over the communication network described above and then installed in the computer.

FIG. 1 is a block diagram showing the schematic configuration of a recording apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing the positional relationship between the recording apparatus of the first embodiment and the object whose video is to be recorded.
FIG. 3(a) is a perspective view showing the right-eye image, left-eye image, target object, and epipolar lines according to the first embodiment, and FIG. 3(b) is a diagram showing the search ranges set on the right-eye and left-eye images shown in FIG. 3(a).
FIG. 4 is a diagram showing an outline of the DVD video format.
FIG. 5 is a block diagram showing an example of an encoding device for executing MPEG compression encoding.
FIG. 6 is a block diagram showing a decoding device that decodes and plays back video data compression-encoded by the encoding device shown in FIG. 5.
FIG. 7 is a diagram showing the schematic hierarchical structure of the MPEG video stream VS generated by the encoding device shown in FIG. 5.
FIG. 8(a) is a diagram showing the syntax of the sequence layer shown in FIG. 7, and FIG. 8(b) is a diagram showing the syntax of the sequence header.
FIG. 9 is a diagram showing the syntax of the GOP layer.
FIG. 10 is a diagram showing the syntax of the picture layer.
FIG. 11 is a diagram showing the syntax of the slice layer.
FIG. 12 is a diagram showing the syntax of the macroblock layer.
FIG. 13 is a diagram showing the syntax of the block layer.
FIG. 14 is a block diagram showing the schematic configuration of a recording apparatus according to the second embodiment of the present invention.
FIG. 15 is a diagram showing the syntax of the system layer of the MPEG multiplexed transport stream.
FIG. 16 is a block diagram showing the schematic configuration of a recording apparatus according to the third embodiment of the present invention.
FIG. 17 is a diagram showing an outline of the DVD video format according to the third embodiment of the present invention.
FIG. 18 is a block diagram showing the schematic configuration of a recording apparatus according to the fourth embodiment of the present invention.
FIG. 19 is a block diagram showing the schematic configuration of a recording apparatus according to the fifth embodiment of the present invention.
FIG. 20 is a block diagram showing the schematic configuration of a recording apparatus according to the sixth embodiment of the present invention.
FIG. 21 is a block diagram showing the schematic configuration of a playback apparatus according to the seventh embodiment of the present invention.
FIG. 22 is a diagram showing the coordinate system used to explain the visual field conversion method in the seventh embodiment of the present invention.
FIG. 23 is a diagram schematically illustrating the parallax barrier method, an example of a stereoscopic display method according to the seventh embodiment of the present invention.
FIG. 24(a) is a diagram schematically illustrating the IP method at the time of image capture, an example of a stereoscopic display method according to the seventh embodiment of the present invention, and FIG. 24(b) is a diagram schematically illustrating the IP method at the time of display.
FIG. 25 is a diagram showing the schematic configuration of the stereoscopic image display shown in FIG. 21 when the IP method is applied as the stereoscopic display method according to the seventh embodiment of the present invention.
FIG. 26 is a block diagram showing the schematic configuration of a recording apparatus according to a modification of the first to sixth embodiments of the present invention.
FIG. 27 is a block diagram showing the schematic configuration of a playback apparatus according to a modification of the seventh embodiment of the present invention.
FIG. 28(a) is a diagram showing the schematic configuration of a system including a computer with built-in memory on which the recording program according to the eighth embodiment of the present invention is installed, and FIG. 28(b) is a flowchart schematically showing the processing procedure of the computer shown in FIG. 28(a).
FIG. 29(a) is a diagram showing the schematic configuration of a system including a computer with built-in memory on which the playback program according to the ninth embodiment of the present invention is installed, and FIG. 29(b) is a flowchart schematically showing the processing procedure of the computer shown in FIG. 29(a).

Explanation of symbols

1A, 1B Imaging camera
2, 2A Video compressor
3 Disparity vector extractor
4 Depth information calculator
5, 5A Depth information formatter
6 Microphone
7 Audio compressor
8, 8A, 8B Information multiplexer
9 Recording medium
10 Recorder
11 Control unit
91 Computer
100, 100A Playback apparatus
101 Reproducing unit
102 Information separator
103 Video decoder
104 Depth information extractor
105 Stereoscopic image display
106 Visual field converter
107 Audio decoder
108 Speaker
109 Controller
130 Communication (broadcast) packetizer
131 Communication (broadcast) network
140 Communication (broadcast) depacketizer
150, 160 Computer
150a, 160a Memory
200, 200A to 200F Recording apparatus

Claims

A playback apparatus for playing back a multiplexed stream recorded on a recording medium formatted in accordance with a predetermined standard, the multiplexed stream being obtained by multiplexing a first bit stream, formed by compression-encoding a two-dimensional image in a compression encoding format conforming to the format of the recording medium, and a second bit stream, formed by formatting information relating to the depth of the two-dimensional image obtained for each predetermined unit of the two-dimensional image, the playback apparatus comprising:
means for separating the first bit stream from the multiplexed stream and reproducing the two-dimensional image;
means for separating the second bit stream from the multiplexed stream and reproducing the information relating to the depth of the two-dimensional image; and
means for generating a three-dimensional image based on the reproduced two-dimensional image and the reproduced information relating to the depth of the two-dimensional image.

A playback apparatus for playing back a multiplexed stream recorded on a recording medium formatted in accordance with a predetermined standard, the multiplexed stream being formed by multiplexing a video stream, formed by compression-encoding a two-dimensional image in a compression encoding format conforming to the format of the recording medium, and information relating to the depth of the two-dimensional image, stored in an arbitrary-use area of the video stream defined by the compression encoding format, the playback apparatus comprising:
means for separating the bit stream from the multiplexed stream and reproducing the two-dimensional image;
means for reproducing the information relating to the depth of the two-dimensional image stored in the arbitrary-use area of the separated bit stream; and
means for generating a three-dimensional image based on the reproduced two-dimensional image and the reproduced information relating to the depth of the two-dimensional image.

The playback apparatus according to claim 1 or 2, wherein:
the two-dimensional image is a plurality of frame images constituting a two-dimensional video, and the information relating to the depth is a pixel value for each pixel, as the predetermined unit, in each frame image constituting the two-dimensional video;
the pixel values representing the depth information for each pixel in each frame image have been compression-encoded, in the compression encoding format conforming to the format of the recording medium, by at least one of a first compression encoding process that performs differential encoding within each frame image, a second compression encoding process that performs run-length encoding, a third compression encoding process that predictively encodes each frame image from at least one of the future and past frame images on the time axis of that frame image, and a fourth compression encoding process that encodes each frame image using an orthogonal transform within the frame image, and have been recorded on the recording medium; and
the means for reproducing the information relating to the depth is means for decoding the pixel values representing the depth information for each pixel of each frame image, compression-encoded by at least one of the first to fourth compression encoding processes, by a decoding process corresponding to that at least one compression encoding process, thereby reproducing the information relating to the depth of the two-dimensional image.

A computer-executable playback program for playing back a multiplexed stream recorded on a recording medium formatted in accordance with a predetermined standard, the multiplexed stream being obtained by multiplexing a first bit stream, formed by compression-encoding a two-dimensional image in a compression encoding format conforming to the format of the recording medium, and a second bit stream, formed by formatting information relating to the depth of the two-dimensional image obtained for each predetermined unit of the two-dimensional image, the playback program causing a computer to execute:
processing of separating the first bit stream from the multiplexed stream and reproducing the two-dimensional image;
processing of separating the second bit stream from the multiplexed stream and reproducing the information relating to the depth of the two-dimensional image; and
processing of generating a three-dimensional image based on the reproduced two-dimensional image and the reproduced information relating to the depth of the two-dimensional image.

A computer-executable reproduction program for reproducing a multiplexed stream recorded on a recording medium formatted in accordance with a predetermined standard, the multiplexed stream being configured by multiplexing a video stream, formed by compression-encoding a two-dimensional image using a compression encoding format conforming to the format of the recording medium, and information on the depth of the two-dimensional image stored in an arbitrary-use area defined by the compression encoding format within the video stream,
the reproduction program causing the computer to execute each of:
processing for separating the bit stream from the multiplexed stream and reproducing the two-dimensional image;
processing for reproducing the information on the depth of the two-dimensional image stored in the arbitrary-use area in the separated bit stream; and
processing for generating a three-dimensional image based on the reproduced two-dimensional image and the reproduced information on the depth of the two-dimensional image.
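To make the second processing step more concrete, the sketch below scans a video elementary stream for user data segments and returns their payloads. It assumes, for illustration only, that the arbitrary-use area corresponds to the MPEG-2 video user data area, which is introduced by the start code 0x000001B2; the claim itself does not name a specific format. The function name `extract_user_data` is hypothetical, and the sketch does not strip zero stuffing bytes that may precede a start code.

```python
from typing import List

START_PREFIX = b"\x00\x00\x01"  # start code prefix used by MPEG-2 video
USER_DATA_CODE = 0xB2           # user_data_start_code is 0x000001B2


def extract_user_data(video_stream: bytes) -> List[bytes]:
    """Return the payload of every user data segment in the stream.

    A segment runs from the byte after its start code up to the next
    start code prefix, or to the end of the stream if none follows.
    """
    segments: List[bytes] = []
    pos = video_stream.find(START_PREFIX)
    while pos != -1:
        next_pos = video_stream.find(START_PREFIX, pos + 3)
        code_index = pos + 3
        if code_index < len(video_stream) and video_stream[code_index] == USER_DATA_CODE:
            end = next_pos if next_pos != -1 else len(video_stream)
            segments.append(video_stream[code_index + 1:end])
        pos = next_pos
    return segments


# Hypothetical stream: sequence header, user data with two bytes, picture.
stream = (
    b"\x00\x00\x01\xb3AB"
    + b"\x00\x00\x01\xb2\x10\x20"
    + b"\x00\x00\x01\x00ZZ"
)
print(extract_user_data(stream))  # [b'\x10 ']
```

The returned payloads would then be handed to whatever decoder matches the way the depth information was packed into the area.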

The two-dimensional image is a plurality of frame images constituting a two-dimensional video, and the depth information is a pixel value for each pixel as the predetermined unit in each frame image constituting the two-dimensional video,
The pixel values representing the depth information for each pixel in each frame image constituting the two-dimensional video have been compression-encoded, by a compression encoding format conforming to the format of the recording medium, through at least one of a first compression encoding process that performs differential encoding within each frame image, a second compression encoding process that performs run-length encoding, a third compression encoding process that performs predictive encoding from each frame image and at least one of the future and past frame images on the time axis of that frame image, and a fourth compression encoding process that performs encoding using an orthogonal transform within each frame image, and have been recorded on the recording medium, and
The processing for reproducing the information on the depth is processing for decoding the pixel values representing the depth information for each pixel of each frame image, which have been compression-encoded by at least one of the first to fourth compression encoding processes, by a decoding process corresponding to that at least one compression encoding process, thereby reproducing the information on the depth of the two-dimensional image. The reproduction program according to claim 4 or 5.
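The decoding counterpart of the third compression encoding process (prediction from a past frame on the time axis) might look like the following Python sketch. It assumes the simplest possible predictor, namely the co-located pixel of the previously decoded depth frame with no motion compensation, and 8-bit depth values clipped to 0..255. Both are assumptions for illustration, and the names `decode_predicted_frame` and `decode_sequence` are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # depth frame as rows of pixel values


def decode_predicted_frame(reference: Frame, residual: Frame) -> Frame:
    """Add a residual to the reference frame, clipping to the 8-bit range."""
    return [
        [max(0, min(255, ref + res)) for ref, res in zip(ref_row, res_row)]
        for ref_row, res_row in zip(reference, residual)
    ]


def decode_sequence(first_frame: Frame, residuals: List[Frame]) -> List[Frame]:
    """Decode frames in order, each predicted from the previous decoded frame."""
    frames = [first_frame]
    for residual in residuals:
        frames.append(decode_predicted_frame(frames[-1], residual))
    return frames


first = [[100, 100], [50, 50]]
residuals = [
    [[5, 0], [0, -5]],
    [[200, 0], [0, -100]],
]
decoded = decode_sequence(first, residuals)
print(decoded[1])  # [[105, 100], [50, 45]]
print(decoded[2])  # [[255, 100], [50, 0]]
```

Prediction from a future frame would follow the same pattern, with the reference frame decoded first and the frames reordered for display afterwards.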