WO2009025503A2 - Method of generating contents information and apparatus for managing contents using the contents information - Google Patents


Info

Publication number
WO2009025503A2
WO2009025503A2 (PCT application PCT/KR2008/004858)
Authority
WO
WIPO (PCT)
Prior art keywords
contents
information
scenes
stereoscopic
dimensional
Application number
PCT/KR2008/004858
Other languages
French (fr)
Other versions
WO2009025503A3 (en)
Inventor
Kug Jin Yun
Nam Ho Hur
Bong Ho Lee
Hyun Lee
Jin Woong Kim
Soo In Lee
Yoon-Jin Lee
Young-Kown Lim
Original Assignee
Electronics And Telecommunications Research Institute
Net & Tv Inc.
Application filed by Electronics And Telecommunications Research Institute, Net & Tv Inc. filed Critical Electronics And Telecommunications Research Institute
Priority to US12/673,604 (published as US20110175985A1)
Priority to EP08793371.9A (published as EP2183924A4)
Publication of WO2009025503A2
Publication of WO2009025503A3

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/172Processing image signals image signals comprising non-image signal components, e.g. headers or format information
    • H04N13/178Metadata, e.g. disparity information

Definitions

  • the apparatus for reproducing contents includes a depacketizing unit 310, a decoding unit 320, and a display unit 330.
  • the depacketizing unit 310 depacketizes the received MPEG-4 packets to recover the media data.
  • the decoding unit 320 decodes the recovered media data in the depacketizing unit 310 to recover contents.
  • the display unit 330 displays the recovered contents.
  • StereoScopicDescrTag represents a stereoscopic descriptor tag.
  • if the Scene_change_number is greater than 0, the stereoscopic descriptor includes ScenechangeSpecificInfo, and if the Scene_change_number is 0, the stereoscopic descriptor includes a 3-bit Contents_format.
  • the stereoscopic descriptor includes 1-bit StereoscopicCamera_setting, 4-bit Reserved, 16-bit Baseline, 16-bit Focal_Length, 16-bit ConvergencePoint_distance, 16-bit Max_disparity, and 16-bit Min_disparity.
  • the ScenechangeSpecificInfo includes 16-bit Start_AU_index, 3-bit Contents_format, Reserved, and DecoderSpecificInfo parameters for each scene.
  • the stereoscopic descriptor includes the ScenechangeSpecificInfo and the Contents_format regardless of the Scene_change_number, and may allow a user to designate the structure of the ScenechangeSpecificInfo rather than having it fixed in advance.
  • the StereoscopicCamera_setting, the Reserved, the Baseline, the Focal_Length, and the ConvergencePoint_distance may be represented to be included in StereoscopicCameraInfo being a separate field, and the Max_disparity and the Min_disparity may be represented to be included in StereoscopicContentsInfo being a separate field.
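The field widths above can be made concrete with a small packing sketch. This is purely illustrative: the field names and bit widths come from the descriptor text, but the serialization order, the byte alignment of the 1-bit/4-bit fields, and the helper names are assumptions, not the normative MPEG-4 descriptor syntax.

```python
import struct

def pack_stereoscopic_camera_info(parallel, baseline, focal_length,
                                  convergence_distance):
    """Pack StereoscopicCamera_setting (1 bit) plus Reserved padding into
    one byte, followed by the 16-bit Baseline, Focal_Length, and
    ConvergencePoint_distance parameters (big-endian)."""
    flags = (1 if parallel else 0) << 7  # 1-bit setting; remaining bits reserved
    return struct.pack('>BHHH', flags, baseline, focal_length,
                       convergence_distance)

def pack_stereoscopic_contents_info(max_disparity, min_disparity):
    """Pack the 16-bit Max_disparity and Min_disparity parameters."""
    return struct.pack('>HH', max_disparity, min_disparity)
```

For example, pack_stereoscopic_camera_info(True, 65, 50, 2000) yields 7 bytes: one flag/reserved byte followed by the three 16-bit camera parameters.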
  • FIG. 4 is a flowchart illustrating a method of generating a stereoscopic descriptor according to an exemplary embodiment of the present invention.
  • FIG. 5 is a view showing a structure and components of a stereoscopic descriptor according to an exemplary embodiment of the present invention when the contents are composed of two-dimensional contents and three-dimensional contents, or various types of three-dimensional contents
  • FIG. 6 is a view showing a structure and components of a stereoscopic descriptor according to an exemplary embodiment of the present invention when contents are formed in one type.
  • the structure of the stereoscopic descriptor according to an exemplary embodiment of the present invention can be applied to any system that services MPEG-2/MPEG-4 system-based stereoscopic contents; the current MPEG-2/MPEG-4 system specifications, however, do not define such a stereoscopic descriptor.
  • the control signal generating unit 230 adds Scenechange_number fields 510 and 610 (S410).
  • the number of scene changes represents the number of times the contents type changes when the contents are composed of two-dimensional contents and three-dimensional contents, or various types of three-dimensional contents.
  • the scene means a unit in which the same contents type is transferred.
  • FIGS. 1a, 1b, and 1d are composed of three scenes, wherein the number of scene changes is 2.
  • the contents of FIG. 1c are composed of one scene, wherein the number of scene changes is 0.
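The relation between scenes and scene changes described above can be sketched as follows; the function name and the use of plain strings for contents types are illustrative assumptions.

```python
def scene_change_number(segment_types):
    """Count how many times the contents type changes across an ordered
    list of transferred segments: a 2D/3D/2D stream such as FIG. 1(a) has
    three scenes and two scene changes; a single-type stream has zero."""
    changes = 0
    for prev, cur in zip(segment_types, segment_types[1:]):
        if cur != prev:
            changes += 1
    return changes
```

Thus scene_change_number(['2D', '3D', '2D']) reports 2 scene changes for three scenes, while a single-type stream reports 0.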
  • the control signal generating unit 230 adds a scene change specification information (ScenechangeSpecificInfo) field 520 (S420).
  • the scene change specification information field 520 which is a field including information on each of the plurality of scenes, includes the start frame index (Start_AU_index), the contents format (Contents_format), the reserved (Reserved), and the decoder specification information (DecoderSpecificInfo) parameters for each of a plurality of scenes, as shown in FIG. 5.
  • the start frame index parameter represents the access unit (AU) number of each scene.
  • AU is generally a frame.
  • the contents format parameter is a parameter representing the types of contents.
  • Table 4 represents an example of a contents format parameter value.
  • Mono means a general 2D motion picture type.
  • FIG. 7 is a view showing the types of 3D contents.
  • the stereoscopic contents include a left image and a right image, wherein side by side means a form in which the left image and the right image are placed side by side within one frame, as shown in (a) of FIG. 7.
  • top/down means a form in which the left image and the right image are arranged up and down in a frame as shown in (b) of FIG. 7.
  • field sequential means a form in which the fields of the left image and the right image are alternately arranged in a frame. That is, the frame is formed in order of "a 1st vertical line of a left image, a 2nd vertical line of a right image, a 3rd vertical line of a left image, a 4th vertical line of a right image ...".
  • the frame sequential means a form in which the frame of the left image and the frame of the right image are alternately transferred.
  • the frame is transferred in order of "a 1st frame of a left image, a 1st frame of a right image, a 2nd frame of a left image, a 2nd frame of a right image ...".
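The frame-compatible arrangements above can be illustrated with a minimal decomposition sketch; plain nested lists stand in for pixel buffers, and the helper names are assumptions.

```python
def split_side_by_side(frame):
    """In the side-by-side form, the left half of each row belongs to the
    left image and the right half to the right image."""
    half = len(frame[0]) // 2
    return [row[:half] for row in frame], [row[half:] for row in frame]

def split_top_down(frame):
    """In the top/down form, the upper half of the rows is the left image
    and the lower half is the right image."""
    half = len(frame) // 2
    return frame[:half], frame[half:]
```

Note that both forms halve the per-view resolution, which is why they fit one stereoscopic pair into a single conventional frame.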
  • a main + additional image or a depth/disparity map is a form that configures the data by taking either the left image or the right image as a main image and the other as a sub image, or by taking the left or right image as a main image and adding a depth/disparity map.
  • in the depth/disparity map form, the stereoscopic image can be generated from the left or right image together with the depth/disparity map, which is information obtained through separate signal processing of the obtained left and right images.
  • the depth/disparity map has an advantage of having a smaller data amount than the image form.
  • the reserved (Reserved), which is a reserved bit field, consists of bits inserted so that the total reaches 16 bits.
  • the decoder specification information (DecoderSpecificInfo) parameter includes header information required for decoding contents.
  • the scene change specification information field 520 of the stereoscopic descriptor of the contents having the form shown in FIG. 1a includes [0,000,11111, 2D contents header/3600,001,11111, 3D contents header/5800,000,11111, 2D contents header].
  • a general MPEG-4 system can transfer the header information required for decoding in the case where the contents are composed of only two-dimensional contents or a single type of three-dimensional contents, but cannot transfer the header information required for decoding when the contents are composed of two-dimensional contents and three-dimensional contents, or various types of three-dimensional contents.
  • an exemplary embodiment of the present invention can transfer the header information required for decoding by using the stereoscopic descriptor including the scene change specification information field 520 as described above, when the contents are composed of two-dimensional contents and three-dimensional contents, or various types of three-dimensional contents.
  • a barrier can automatically be turned on/off.
  • the barrier is attached to an LCD to separate the stereoscopic image, making it possible for the left eye to see the left image and the right eye to see the right image.
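A terminal can derive the barrier state from the (Start_AU_index, Contents_format) pairs of the scene change specification information. The sketch below uses the FIG. 1a example values quoted above (scene starts at access units 0, 3600, and 5800); the helper names and the string labels standing in for the 3-bit format codes are hypothetical.

```python
def contents_format_at(au_index, scene_records):
    """scene_records: (Start_AU_index, Contents_format) pairs sorted by
    start index; return the format in effect at the given access unit."""
    current = scene_records[0][1]
    for start, fmt in scene_records:
        if au_index >= start:
            current = fmt
    return current

def barrier_on(au_index, scene_records):
    """Turn the barrier on exactly while the current scene carries
    three-dimensional contents."""
    return contents_format_at(au_index, scene_records) != 'mono'

# The FIG. 1a example: 2D until AU 3600, 3D until AU 5800, then 2D again.
SCENES = [(0, 'mono'), (3600, 'side_by_side'), (5800, 'mono')]
```

With these records, the barrier is off at AU 100, on at AU 4000, and off again at AU 6000.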
  • the control signal generating unit 230 adds a contents format field 620 (S430).
  • the stereoscopic descriptor includes the contents format field but does not include the start frame index (Start_AU_index), the reserved (Reserved), or the decoder specification information (DecoderSpecificInfo) fields.
  • the control signal generating unit 230 adds stereoscopic camera information (StereoscopicCameraInfo) fields 530 and 630 (S440).
  • the stereoscopic camera information fields 530 and 630, which are fields including information on a stereoscopic camera, include stereoscopic camera setting (StereoscopicCamera_setting), reserved (Reserved), baseline (Baseline), focal length (Focal_Length), and convergence point distance (ConvergencePoint_distance) parameters.
  • the stereoscopic camera setting, which represents the arrangement form of the cameras when producing or photographing three-dimensional contents, is classified into parallel and cross arrangements.
  • FIG. 8 shows parallel and cross arrangements of cameras. As shown in (a) of FIG. 8, two cameras are arranged in parallel in the parallel arrangement, and as shown in (b) of FIG. 8, cameras are arranged such that the photographing directions cross each other at an object in the cross arrangement.
  • the baseline represents a distance between two cameras and the focal length represents a distance from a lens to an image plane.
  • the image plane is generally a film on which the image is formed.
  • the convergence point distance represents the distance from the baseline to the convergence point, wherein the convergence point means the point at which the photographing directions of the cameras cross at a subject.
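The geometry of the cross arrangement gives a simple relation between the Baseline and ConvergencePoint_distance parameters; the helper below is an illustrative sketch of that geometry, not part of the descriptor itself.

```python
import math

def convergence_angle_deg(baseline, convergence_distance):
    """In the cross arrangement, each camera is toed in toward the other
    by an angle theta with tan(theta) = (Baseline / 2) /
    ConvergencePoint_distance, so the two optical axes meet at the
    convergence point."""
    return math.degrees(math.atan((baseline / 2.0) / convergence_distance))
```

For instance, two cameras 2 m apart converging on a point 1 m away are each toed in by 45 degrees, whereas a typical narrow baseline and distant subject give an angle of only a degree or two.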
  • the control signal generating unit 230 adds stereoscopic contents information (StereoscopicContentsInfo) fields 540 and 640 (S450).
  • the stereoscopic contents information fields which are fields including information on the disparity of the stereoscopic contents, include Max_disparity and Min_disparity parameters.
  • disparity refers to the difference between the images obtained from the two cameras. In other words, a specific point (subject) of the left image is at a slightly different position in the right image.
  • the difference in the image is referred to as the disparity and the information representing the disparity value is referred to as a magnitude of the disparity.
  • the Max_disparity represents the magnitude of the maximum disparity of the three-dimensional contents and the Min_disparity represents the magnitude of the minimum disparity of the three-dimensional contents.
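Given point correspondences between the two views, the Max_disparity and Min_disparity values carried by the descriptor can be collected as below. The sign convention and helper name are assumptions, and finding the correspondences themselves is the separate signal-processing step mentioned above.

```python
def disparity_range(correspondences):
    """correspondences: (x_left, x_right) horizontal coordinates of the
    same scene point in the left and right images; return (Max_disparity,
    Min_disparity) over all matched points."""
    disparities = [x_left - x_right for x_left, x_right in correspondences]
    return max(disparities), min(disparities)
```

For example, matches at offsets +3, -2, and 0 pixels give a (Max_disparity, Min_disparity) pair of (3, -2).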
  • the above-mentioned exemplary embodiments of the present invention are not embodied only by a method and apparatus.
  • the above-mentioned exemplary embodiments may be embodied by a program performing functions, which correspond to the configuration of the exemplary embodiments of the present invention, or a recording medium on which the program is recorded.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Library & Information Science (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The present invention relates to a method of generating contents information for managing contents including stereoscopic contents, which can be two-dimensional contents and three-dimensional contents, and to an apparatus for managing contents, the method including the steps of adding, when there is a scene change of contents, a second field including information on each of a plurality of scenes corresponding to a plurality of types, respectively, to the contents information.

Description

METHOD OF GENERATING CONTENTS INFORMATION AND
APPARATUS FOR MANAGING CONTENTS USING THE
CONTENTS INFORMATION
Technical Field
[1] The present invention relates to a method of generating contents information, and more particularly, to a method of generating a stereoscopic descriptor, which is contents information for managing contents including stereoscopic contents, which can be two-dimensional or three-dimensional contents, and an apparatus for managing contents using the stereoscopic descriptor.
[2] The present invention is derived from a study that was supported by the IT R&D program of MIC/IITA [2007-S-004-01, Development of Glassless Single-User 3D Broadcasting Technologies].
Background Art
[3] In order to transfer contents based on MPEG-4, an initial object descriptor (IOD), a binary format for scene (BIFS), an object descriptor (OD), and media data are needed.
[4] The initial object descriptor, which is data first transferred from an MPEG-4 session, is a descriptor having information on the binary format for scene stream or the object descriptor stream.
[5] The contents include several media objects, such as a still image, a text, a motion picture, audio, or the like, wherein the binary format for scene stream represents a spatial position and a temporal relation between the media objects.
[6] The object descriptor is a descriptor including information required for the relationship and decoding of the binary format for scene stream and the media objects.
[7] However, the object descriptor of MPEG-4 focuses on the management of a two-dimensional motion picture, so that it cannot manage a three-dimensional motion picture.
[8] Therefore, as a method of managing a motion picture using an object descriptor supporting a three-dimensional motion picture, the related art provides an apparatus for managing a three-dimensional motion picture using the information and structure of the MPEG-4 object descriptor.
[9] The apparatus proposes a structure of an object descriptor including information on the number of media streams according to a kind of a three-dimensional motion picture (information representing whether an image is a stereoscopic three-dimensional motion picture or a multiview three-dimensional motion picture), a display mode (two-dimensional/field shuttering/frame shuttering/polarizer display modes for the stereoscopic three-dimensional motion picture and two-dimensional/panorama/stereo display modes for the multiview three-dimensional motion picture), the number of cameras, the number of views, and the number of media streams according to the views, and provides an apparatus for managing a three-dimensional motion picture using an object descriptor with the structure.
[10] However, there is a problem in that it is impossible to manage contents composed of two-dimensional contents and three-dimensional contents or various types of three-dimensional contents, in the related art.
[11] The above information disclosed in this Background section is only for enhancement of understanding of the background of the invention and therefore it may contain information that does not form the prior art that is already known in this country to a person of ordinary skill in the art.
Disclosure of Invention
Technical Problem
[12] An object of the present invention is to provide a method of generating contents information and an apparatus for managing contents using the contents information having advantages of managing contents composed of two-dimensional contents and three-dimensional contents, or various types of three-dimensional contents.
Technical Solution
[13] To achieve the technical object, an exemplary embodiment of the present invention provides a method of generating contents information including adding a first field representing the number of scene changes of contents to contents information, and adding a second field including information on each of a plurality of scenes corresponding to a plurality of types, respectively, to the contents information, when there is a scene change of contents.
[14] To achieve the technical object, another embodiment of the present invention provides a method of generating contents information including adding a first field representing the number of scene changes of contents to contents information, and adding a second field including information on a contents type to the contents information when there is no scene change of contents.
[15] To achieve the technical object, yet another embodiment of the present invention provides an apparatus for managing contents including: a control signal generating unit that generates a binary format for a scene descriptor, an object descriptor, and a stereoscopic descriptor; an encoding unit that encodes media data and control signals input from the control signal generating unit and outputs an encoding stream (elementary stream, ES); and a unit that generates a file after receiving the encoding stream, the stereoscopic descriptor including information required for decoding and reproducing the contents composed of two-dimensional contents and three-dimensional contents, or various types of three-dimensional contents.
Advantageous Effects
[16] According to an exemplary embodiment of the present invention, it is possible to manage, using the stereoscopic descriptor, contents composed of two-dimensional contents and three-dimensional contents or various types of three-dimensional contents, and to automatically turn on/off a barrier using a start frame index and contents format information included in the stereoscopic descriptor in a three-dimensional (3D) terminal.
Brief Description of the Drawings
[17] FIG. 1 illustrates configuration of contents that are provided by an apparatus for managing contents according to an exemplary embodiment of the present invention.
[18] FIG. 2 is a view showing an apparatus for managing contents using a stereoscopic descriptor according to an exemplary embodiment of the present invention.
[19] FIG. 3 is a view showing an apparatus for reproducing contents using a stereoscopic descriptor according to an exemplary embodiment of the present invention.
[20] FIG. 4 is a flowchart illustrating a method of generating a stereoscopic descriptor according to an exemplary embodiment of the present invention.
[21] FIG. 5 is a view showing a structure and components of a stereoscopic descriptor according to an exemplary embodiment of the present invention when contents are composed of two-dimensional contents and three-dimensional contents, or various types of three-dimensional contents.
[22] FIG. 6 is a view showing a structure and components of a stereoscopic descriptor according to an exemplary embodiment of the present invention in the case where contents are configured by a single type.
[23] FIG. 7 shows the types of 3D contents.
[24] FIG. 8 illustrates parallel and cross arrangements of cameras.
Mode for the Invention
[25] In the following detailed description, only certain exemplary embodiments of the present invention have been shown and described, simply by way of illustration. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not restrictive. Like reference numerals designate like elements throughout the specification.
[26] In the specification, unless explicitly described to the contrary, the word "comprise", and variations such as "comprises" and "comprising", will be understood to imply the inclusion of stated elements but not the exclusion of any other elements. In addition, the terms "-er" and "-or" described in the specification mean units for processing at least one function and operation and can be implemented by hardware components, software components, and combinations thereof.
[27] First, the configuration of contents that are provided by an apparatus for managing contents using a stereoscopic descriptor according to an exemplary embodiment of the present invention will be described. The contents include a motion picture and a still image.
[28] FIG. 1 is a view showing the configuration of contents that are provided by an apparatus for managing contents according to an exemplary embodiment of the present invention.
[29] In FIG. 1, the horizontal axis represents time, and "2D" means two-dimensional contents and "3D" means three-dimensional contents.
[30] In FIG. 1, (a) to (d) show the types of contents transferred over time with respect to configuration forms of respective contents.
[31] (a) of FIG. 1 shows a form composed of three-dimensional contents only in a specific time and two-dimensional contents in the remaining time. In other words, the configuration of (a) of FIG. 1 is composed of three-dimensional contents only from time t1 to time t2 and two-dimensional contents in the remaining time.
[32] (b) of FIG. 1 shows a form composed of three-dimensional contents only in a specific time and two-dimensional contents in the remaining time. In other words, the configuration of (b) of FIG. 1 is composed of three-dimensional contents only from time t1 to time t2 and two-dimensional contents in the remaining time.
[33] (c) of FIG. 1 shows a form composed of a single type of three-dimensional contents.
At this time, the three-dimensional contents include a left image and a right image, wherein the left image and the right image can be provided from one source or from two sources. In (c) and (d) of FIG. 1, an option shows the case where the left image and the right image are provided from the two sources.
[34] (d) of FIG. 1 shows a form composed of various types of three-dimensional contents.
[35] Next, an apparatus for managing contents and an apparatus for reproducing contents using a stereoscopic descriptor according to an exemplary embodiment of the present invention will be described with reference to FIG. 2 and FIG. 3. FIG. 2 is a view showing an apparatus for managing contents using a stereoscopic descriptor according to an exemplary embodiment of the present invention, and FIG. 3 is a view showing an apparatus for reproducing contents using a stereoscopic descriptor according to an exemplary embodiment of the present invention.
[36] As shown in FIG. 2, an apparatus for managing contents according to an exemplary embodiment of the present invention includes a storing unit 210, a three-dimensional contents generating unit 220, a control signal generating unit 230, an encoding unit 240, an MP4 file generating unit 250, and a packetizing unit 260.
[37] The storing unit 210 stores contents obtained by a camera, and the three-dimensional contents generating unit 220 generates the three-dimensional contents by converting the sizes and colors of images transferred from the storing unit 210.
[38] The control signal generating unit 230 generates a binary format for a scene descriptor, an object descriptor, and a stereoscopic descriptor in MPEG-4. The stereoscopic descriptor includes information required for decoding and reproducing contents when the contents are composed of two-dimensional contents and three- dimensional contents, or various types of three-dimensional contents.
[39] The encoding unit 240 encodes two-dimensional contents input from the storing unit 210, three-dimensional contents input from the three-dimensional contents generating unit 220, and MPEG-4 control signals input from the control signal generating unit 230, and outputs each encoding stream (elementary stream, ES).
[40] The MP4 file generating unit 250 receives each encoding stream and generates the MP4 file defined in the MPEG-4 system specifications.
[41] The packetizing unit 260 either receives the MP4 file from the MP4 file generating unit 250, extracts the media data and the MPEG-4 control signals included in the MP4 file, and generates the packets defined in the MPEG-4 system specifications, or receives the encoding streams from the encoding unit 240, extracts the media data and the MPEG-4 control signals, and generates the packets defined in the MPEG-4 system specifications. It then transmits the packets through a network.
[42] As shown in FIG. 3, the apparatus for reproducing contents according to an exemplary embodiment of the present invention includes a depacketizing unit 310, a decoding unit 320, and a display unit 330.
[43] The depacketizing unit 310 depacketizes the received MPEG-4 packets to recover the media data. The decoding unit 320 decodes the media data recovered by the depacketizing unit 310 to recover the contents.
[44] The display unit 330 displays the recovered contents.
[45] A pseudo code representing the stereoscopic descriptor according to an exemplary embodiment of the present invention will now be described. Tables 1 to 3 show examples of the pseudo codes representing the stereoscopic descriptor according to an exemplary embodiment of the present invention.
[46] First, reviewing Table 1, StereoScopicDescrTag represents the stereoscopic descriptor tag. As shown in Table 1, if Scene_change_number, a variable representing the number of scene changes, is not 0, the stereoscopic descriptor includes ScenechangeSpecificInfo; if Scene_change_number is 0, it instead includes a 3-bit Contents_format. Further, the stereoscopic descriptor includes a 1-bit StereoscopicCamera_setting, 4-bit Reserved, 16-bit Baseline, 16-bit Focal_Length, 16-bit ConvergencePoint_distance, 16-bit Max_disparity, and 16-bit Min_disparity.
[47] The ScenechangeSpecificInfo includes a 16-bit Start_AU_index, 3-bit Contents_format, 5-bit Reserved, and DecoderSpecificInfo.
[48] As shown in Table 2, the stereoscopic descriptor includes the ScenechangeSpecificInfo and the Contents_format regardless of the Scene_change_number and may allow a user to designate the structure of the ScenechangeSpecificInfo, rather than previously designate it.
[49] Further, as in Table 3, the StereoscopicCamera_setting, the Reserved, the Baseline, the Focal_Length, and the ConvergencePoint_distance may be represented to be included in StereoscopicCameralnfo being a separate field and the Max_disparity and the Min_disparity may be represented to be included in StereoscopicContentsInfo being a separate field.
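Since the pseudo code of Tables 1 to 3 is reproduced only as images in the publication, the bit layout described above can be sketched as a simple serializer. This is a non-authoritative sketch: the field widths follow the description in the text, but the width of Scene_change_number itself (8 bits here) and the bit placement when ScenechangeSpecificInfo is present are assumptions, and the descriptor tag and length header that real MPEG-4 descriptors carry are omitted.

```python
import struct

def pack_stereoscopic_descriptor(scene_change_number, contents_format,
                                 camera_setting, baseline, focal_length,
                                 convergence_point_distance,
                                 max_disparity, min_disparity):
    """Sketch of the descriptor body described for Table 1.

    Widths from the text: 3-bit Contents_format (only when
    Scene_change_number == 0), 1-bit StereoscopicCamera_setting,
    4-bit Reserved, then five 16-bit fields (Baseline, Focal_Length,
    ConvergencePoint_distance, Max_disparity, Min_disparity).
    An 8-bit Scene_change_number is an assumption.
    """
    out = bytearray()
    out.append(scene_change_number & 0xFF)  # assumed 8-bit counter
    if scene_change_number == 0:
        # 3-bit Contents_format + 1-bit camera setting + 4 reserved bits
        out.append(((contents_format & 0x7) << 5)
                   | ((camera_setting & 0x1) << 4))
    else:
        # ScenechangeSpecificInfo entries would precede this byte
        # (variable length, omitted in this sketch)
        out.append((camera_setting & 0x1) << 4)
    out += struct.pack(">5H", baseline, focal_length,
                       convergence_point_distance,
                       max_disparity, min_disparity)
    return bytes(out)
```

With Scene_change_number equal to 0, the sketched body is 12 bytes: one counter byte, one packed-flag byte, and five big-endian 16-bit parameters.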
[50] The meanings of each parameter and field will be described below.
[51] (Table 1)
[52] [The pseudo code of Table 1 appears as an image (imgf000009_0001) in the original publication and is not reproduced here.]
[53]-[55] (Table 2)
[56] [The pseudo code of Table 2 appears as an image (imgf000010_0001) in the original publication and is not reproduced here.]
[57] (Table 3)
[58] [The pseudo code of Table 3 appears as an image (imgf000011_0001) in the original publication and is not reproduced here.]
[59] A method of generating a stereoscopic descriptor according to an exemplary embodiment of the present invention will now be described with reference to FIG. 4 to FIG. 8.
[60] FIG. 4 is a flowchart illustrating a method of generating a stereoscopic descriptor according to an exemplary embodiment of the present invention.
[61] FIG. 5 is a view showing a structure and components of a stereoscopic descriptor according to an exemplary embodiment of the present invention when the contents are composed of two-dimensional contents and three-dimensional contents, or various types of three-dimensional contents, and FIG. 6 is a view showing a structure and components of a stereoscopic descriptor according to an exemplary embodiment of the present invention when the contents are formed in one type. The structure of the stereoscopic descriptor according to an exemplary embodiment of the present invention can be applied to any system that services MPEG-2/MPEG-4 system-based stereoscopic contents, although the current MPEG-2/MPEG-4 system specifications do not themselves define a stereoscopic descriptor.
[62] First, the control signal generating unit 230 adds Scene_change_number fields 510 and 610 (S410). The number of scene changes represents the number of times the contents type changes when the contents are composed of two-dimensional contents and three-dimensional contents, or of various types of three-dimensional contents. A scene means a unit in which the same contents type is transferred.
[63] For example, the contents of FIGS. 1a, 1b, and 1d are composed of three scenes, so the number of scene changes is 2. The contents of FIG. 1c are composed of one scene, so the number of scene changes is 0.
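The counting rule above can be illustrated with a short sketch; the string labels "2D" and "3D" are illustrative and not part of the descriptor syntax.

```python
def count_scene_changes(content_types):
    """Number of times the contents type changes along the timeline.

    content_types: per-segment type labels in playback order,
    e.g. ["2D", "3D", "2D"] for the layout of FIG. 1a.
    """
    return sum(1 for prev, cur in zip(content_types, content_types[1:])
               if cur != prev)

count_scene_changes(["2D", "3D", "2D"])  # FIGS. 1a/1b: returns 2
count_scene_changes(["3D"])              # FIG. 1c: returns 0
```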
[64] When the number of scene changes is not 0, that is, the contents are composed of a plurality of scenes in which two-dimensional and three-dimensional contents are mixed, or various types of three-dimensional contents are mixed, the control signal generating unit 230 adds a scene change specification information (ScenechangeSpecificInfo) field 520 (S420).
[65] The scene change specification information field 520 includes information on each of the plurality of scenes. As shown in FIG. 5, it includes the start frame index (Start_AU_index), contents format (Contents_format), reserved (Reserved), and decoder specification information (DecoderSpecificInfo) parameters for each scene.
[66] The start frame index parameter represents the access unit (AU) number at which each scene starts. An AU is generally a frame.
[67] The contents format parameter represents the type of the contents. Table 4 shows an example of contents format parameter values. Mono means a general 2D motion picture type.
[68] The types of 3D contents will be described with reference to FIG. 7.
[69] FIG. 7 is a view showing the types of 3D contents. The stereoscopic contents include a left image and a right image; side by side means a form in which the left image and the right image are placed side by side in one frame, as shown in (a) of FIG. 7.
[70] Here, "n" means the horizontal image size of each of the right image and the left image, and "m" means the vertical image size.
[71] Top/down means a form in which the left image and the right image are arranged up and down in a frame, as shown in (b) of FIG. 7.
[72] As shown in (c) of FIG. 7, field sequential means a form in which the vertical lines of the left image and the right image are alternately arranged in a frame.
[73] That is, the frame is formed in the order of "a 1st vertical line of the left image, a 2nd vertical line of the right image, a 3rd vertical line of the left image, a 4th vertical line of the right image, ...".
[74] As shown in (d) of FIG. 7, frame sequential means a form in which frames of the left image and frames of the right image are alternately transferred. In other words, the frames are transferred in the order of "a 1st frame of the left image, a 1st frame of the right image, a 2nd frame of the left image, a 2nd frame of the right image, ...".
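The single-frame packing forms of (a) to (c) of FIG. 7 can be illustrated with a small sketch over toy images stored as 2D lists of pixels. This is illustrative only; in the actual field-sequential form each image contributes half of its vertical lines, which the sketch models by taking even columns from the left image and odd columns from the right image.

```python
def side_by_side(left, right):
    """FIG. 7(a): left image in the left half of the frame,
    right image in the right half."""
    return [l_row + r_row for l_row, r_row in zip(left, right)]

def top_down(left, right):
    """FIG. 7(b): left image on top, right image below."""
    return left + right

def field_sequential(left, right):
    """FIG. 7(c): vertical lines alternate between the two images
    (even columns from the left image, odd columns from the right)."""
    return [[l_px if i % 2 == 0 else r_px
             for i, (l_px, r_px) in enumerate(zip(l_row, r_row))]
            for l_row, r_row in zip(left, right)]
```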
[75] The main + additional image or depth/disparity map form configures the data by taking one of the left image and the right image as a main image and the other as a sub image, or by taking the left image or the right image as a main image and adding a depth/disparity map.
[76] The stereoscopic image can be generated from the left image or the right image together with the depth/disparity map, where the depth/disparity map is information obtained through separate signal processing of the obtained left and right images.
[77] The depth/disparity map has the advantage of a smaller data amount than an additional image.
[78] (Table 4)
[79] [The contents format parameter values of Table 4 appear as an image (imgf000013_0001) in the original publication and are not reproduced here.]
[80] The reserved (Reserved) bit is a padding bit inserted so that the field aligns to 16 bits.
[81] The decoder specification information (DecoderSpecificInfo) parameter includes the header information required for decoding the contents.
[82] At this time, when the 3D contents header is the same as the existing 2D contents header, it is not written; that is, identical header information is not written repeatedly.
[83] For example, the scene change specification information field 520 of the stereoscopic descriptor of the contents having the form shown in FIG. 1a includes [0, 000, 11111, 2D contents header / 3600, 001, 11111, 3D contents header / 5800, 000, 11111, 2D contents header].
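The lookup implied by the example above can be sketched as follows. The AU boundaries (0, 3600, 5800) come from the example in the text, while the dictionary layout and the "2D"/"3D" labels are illustrative assumptions.

```python
def format_at(entries, au_index):
    """Return the contents format active at a given access-unit number,
    given ScenechangeSpecificInfo entries sorted by Start_AU_index."""
    current = entries[0]
    for entry in entries:
        if entry["start_au"] <= au_index:
            current = entry
        else:
            break
    return current["format"]

# Scene boundaries from the FIG. 1a example: 2D until AU 3600,
# 3D until AU 5800, then 2D again.
scenes = [
    {"start_au": 0,    "format": "2D"},
    {"start_au": 3600, "format": "3D"},
    {"start_au": 5800, "format": "2D"},
]
```

A player could use such a lookup per access unit, for example to decide whether the barrier should be on (3D scene) or off (2D scene).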
[84] A general MPEG-4 system can transfer the header information required for decoding when the contents are composed only of two-dimensional contents or of a single type of three-dimensional contents, but cannot do so when the contents are composed of two-dimensional contents and three-dimensional contents, or of various types of three-dimensional contents. An exemplary embodiment of the present invention, however, can transfer the header information required for decoding in such cases by using the stereoscopic descriptor including the scene change specification information field 520 as described above.
[85] In a 3D terminal, when 3D contents are activated in the binary format for scene descriptor based on the start frame index and the contents format information, a barrier can automatically be turned on or off.
[86] The barrier is attached to an LCD to separate the stereoscopic image, making it possible for the left eye to see the left image and the right eye to see the right image.
[87] When the number of scene changes is 0, the control signal generating unit 230 adds a contents format field 620 (S430). In other words, when the contents are composed only of two-dimensional contents or of a single type of three-dimensional contents, the stereoscopic descriptor includes the contents format field but does not include the start frame index (Start_AU_index), the reserved (Reserved), or the decoder specification information (DecoderSpecificInfo).
[88] Next, the control signal generating unit 230 adds stereoscopic camera information (StereoscopicCameraInfo) fields 530 and 630 (S440).
[89] The stereoscopic camera information fields 530 and 630, which contain information on a stereoscopic camera, include the stereoscopic camera setting (StereoscopicCamera_setting), reserved (Reserved), baseline (Baseline), focal length (Focal_Length), and convergence point distance (ConvergencePoint_distance) parameters.
[90] The stereoscopic camera setting, which represents the arrangement form of the cameras used in producing or photographing three-dimensional contents, is divided into a parallel arrangement and a cross arrangement.
[91] FIG. 8 shows the parallel and cross arrangements of cameras. As shown in (a) of FIG. 8, the two cameras are arranged in parallel in the parallel arrangement; as shown in (b) of FIG. 8, in the cross arrangement the cameras are arranged such that their photographing directions cross each other at the object.
[92] The baseline represents a distance between two cameras and the focal length represents a distance from a lens to an image plane. The image plane is generally a film on which the image is formed.
[93] The convergence point distance represents the distance from the baseline to the convergence point, where the convergence point is the point at which the photographing directions of the cameras cross at the subject.
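For the cross arrangement, the baseline and the convergence point distance together fix how far each camera must be rotated inward. A minimal sketch of that geometry, under the assumption that the convergence point lies on the perpendicular bisector of the baseline:

```python
import math

def toe_in_angle_deg(baseline, convergence_point_distance):
    """Per-camera inward rotation (degrees) in the cross arrangement so
    that both optical axes meet at a convergence point located
    convergence_point_distance away from the midpoint of the baseline."""
    return math.degrees(math.atan2(baseline / 2.0,
                                   convergence_point_distance))

toe_in_angle_deg(2.0, 1.0)  # cameras 2 m apart converging 1 m away: ~45 degrees
```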
[94] The control signal generating unit 230 adds stereoscopic contents information (StereoscopicContentsInfo) fields 540 and 640 (S450).
[95] The stereoscopic contents information fields, which include information on the disparity of the stereoscopic contents, include the Max_disparity and Min_disparity parameters. The disparity is the difference between the images obtained from the two cameras: a specific point (subject) in the left image appears at a slightly different position in the right image. This difference is referred to as the disparity, and the information representing the disparity value is referred to as the magnitude of the disparity.
[96] Max_disparity represents the magnitude of the maximum disparity of the three-dimensional contents, and Min_disparity represents the magnitude of the minimum disparity of the three-dimensional contents.
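The Max_disparity and Min_disparity values can be derived from point correspondences between the two views. A hedged sketch follows; the sign convention disparity = x_left − x_right is an assumption, not specified by the text.

```python
def disparity_range(matches):
    """Min and Max disparity over matched points.

    matches: (x_left, x_right) horizontal coordinates of the same scene
    point in the left and right images; disparity = x_left - x_right.
    """
    disparities = [xl - xr for xl, xr in matches]
    return min(disparities), max(disparities)

disparity_range([(10, 8), (20, 25), (5, 5)])  # returns (-5, 2)
```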
[97] The above-mentioned exemplary embodiments of the present invention are not embodied only by a method and apparatus. Alternatively, the above-mentioned exemplary embodiments may be embodied by a program performing functions, which correspond to the configuration of the exemplary embodiments of the present invention, or a recording medium on which the program is recorded. These embodiments can be easily devised from the description of the above-mentioned exemplary embodiments by those skilled in the art to which the present invention pertains.
[98] While this invention has been described in connection with what is presently considered to be practical exemplary embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

[1] A method of generating contents information comprising: adding a first field representing the number of scene changes of contents to contents information; and adding a second field including information on each of a plurality of scenes corresponding to a plurality of types to the contents information, respectively, when there is the scene change of contents.
[2] The method of claim 1, wherein the second field includes information on a start frame of each of the plurality of scenes, and information on the plurality of types each corresponding to the plurality of scenes.
[3] The method of claim 2, wherein the second field further includes a plurality of header information required for decoding each of the plurality of scenes and corresponding to each of the plurality of scenes.
[4] The method of claim 1, further comprising adding a third field including information on a camera photographing the contents to contents information.
[5] The method of claim 4, wherein the third field includes: information on the type of arrangement of a plurality of cameras photographing the contents; information on a baseline that is a distance between the plurality of cameras; information on a focal length that is a distance between lenses of the plurality of cameras and an image plane; and information on a distance between the baseline and a convergence point.
[6] The method of claim 1, further comprising adding, to the contents information, a fourth field including information on a disparity between the contents of the plurality of scenes.
[7] The method of claim 6, wherein the fourth field includes: information on a magnitude of the Max disparity between the contents of the plurality of scenes; and information on a magnitude of the Min disparity between the contents of the plurality of scenes.
[8] A method of generating contents information comprising: adding a first field representing the number of scene changes of contents to contents information; and adding a second field including information on the types of contents to the contents information, when there is no scene change of contents.
[9] The method of claim 8, further comprising adding a third field including information on a camera photographing the contents.
[10] The method of claim 9, wherein the third field includes: information on the type of arrangement of a plurality of cameras photographing the contents; information on a baseline that is a distance between the plurality of cameras; information on a focal length that is a distance between lenses of the plurality of cameras and an image plane; and information on a convergence point distance that is a distance between the baseline and a convergence point.
[11] An apparatus for managing contents comprising: a control signal generating unit generating a binary format for a scene descriptor, an object descriptor, and a stereoscopic descriptor; an encoding unit encoding media data and control signals input from the control signal generating unit and outputting an encoding stream (elementary stream, ES); and a unit generating a file after receiving the encoding stream, wherein the stereoscopic descriptor includes information required for decoding and reproducing the contents composed of two-dimensional contents and three- dimensional contents, or various types of three-dimensional contents.
[12] The apparatus of claim 11, further comprising a packetizing unit extracting the media data and the control signals included in the file and generating packets.
[13] The apparatus of claim 11, wherein the stereoscopic descriptor includes a scene change specification information field on each of a plurality of scenes corresponding to a plurality of types, respectively.
[14] The apparatus of claim 13, wherein the scene change specification information field includes: a plurality of start frame index parameters representing each start access unit (AU) of the plurality of scenes; a plurality of contents format parameters representing each content type of the plurality of scenes; and a plurality of decoder specification information parameters including header information required for decoding each contents of the plurality of scenes.
[15] The apparatus of claim 13, wherein the stereoscopic descriptor further includes a stereoscopic camera information field including information on a stereoscopic camera.
[16] The apparatus of claim 15, wherein the stereoscopic camera information field includes: a stereoscopic camera setting parameter representing an arrangement form of a plurality of cameras photographing three-dimensional contents; a baseline parameter representing a baseline that is a distance between the plurality of cameras; a focal length parameter representing a distance between lenses of the plurality of cameras and an image plane; and a convergence point distance representing a distance between the baseline and a convergence point.
[17] The apparatus of claim 13, wherein the stereoscopic descriptor further includes a stereoscopic contents information field including information on a disparity between contents of the plurality of scenes.
[18] The apparatus of claim 17, wherein the stereoscopic contents information field includes: a Max disparity parameter representing a magnitude of a Max disparity between contents of the plurality of scenes; and a Min disparity parameter representing a magnitude of a Min disparity between contents of the plurality of scenes.
[19] The apparatus of claim 11, further comprising a three-dimensional contents generating unit converting sizes and colors of contents into three-dimensional contents.
PCT/KR2008/004858 2007-08-21 2008-08-20 Method of generating contents information and apparatus for managing contents using the contents information WO2009025503A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US12/673,604 US20110175985A1 (en) 2007-08-21 2008-08-20 Method of generating contents information and apparatus for managing contents using the contents information
EP08793371.9A EP2183924A4 (en) 2007-08-21 2008-08-20 Method of generating contents information and apparatus for managing contents using the contents information

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020070083985A KR101382618B1 (en) 2007-08-21 2007-08-21 Method for making a contents information and apparatus for managing contens using the contents information
KR10-2007-0083985 2007-08-21

Publications (2)

Publication Number Publication Date
WO2009025503A2 true WO2009025503A2 (en) 2009-02-26
WO2009025503A3 WO2009025503A3 (en) 2009-04-23

Family

ID=40378828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2008/004858 WO2009025503A2 (en) 2007-08-21 2008-08-20 Method of generating contents information and apparatus for managing contents using the contents information

Country Status (4)

Country Link
US (1) US20110175985A1 (en)
EP (1) EP2183924A4 (en)
KR (1) KR101382618B1 (en)
WO (1) WO2009025503A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101158723B1 (en) * 2011-05-09 2012-06-22 한밭대학교 산학협력단 System and method for fast game pictures encoder based on scene descriptor
KR20150004989A (en) * 2013-07-03 2015-01-14 한국전자통신연구원 Apparatus for acquiring 3d image and image processing method using the same

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
JP2000078611A (en) * 1998-08-31 2000-03-14 Toshiba Corp Stereoscopic video image receiver and stereoscopic video image system
BR0014954A (en) 1999-10-22 2002-07-30 Activesky Inc Object-based video system
JP2001258023A (en) 2000-03-09 2001-09-21 Sanyo Electric Co Ltd Multimedia reception system and multimedia system
US6803912B1 (en) * 2001-08-02 2004-10-12 Mark Resources, Llc Real time three-dimensional multiple display imaging system
KR100556826B1 (en) * 2003-04-17 2006-03-10 한국전자통신연구원 System and Method of Internet Broadcasting for MPEG4 based Stereoscopic Video
KR100576544B1 (en) 2003-12-09 2006-05-03 한국전자통신연구원 Apparatus and Method for Processing of 3D Video using MPEG-4 Object Descriptor Information
KR100697972B1 (en) * 2004-11-16 2007-03-23 한국전자통신연구원 Apparatus and Method for 3D Broadcasting Service
CN101171843B (en) * 2005-03-10 2010-10-13 高通股份有限公司 Content classification for multimedia processing
KR100747598B1 (en) * 2005-12-09 2007-08-08 한국전자통신연구원 System and Method for Transmitting/Receiving Three Dimensional Video based on Digital Broadcasting

Non-Patent Citations (1)

Title
See references of EP2183924A4 *

Also Published As

Publication number Publication date
EP2183924A2 (en) 2010-05-12
KR101382618B1 (en) 2014-04-10
KR20090019499A (en) 2009-02-25
WO2009025503A3 (en) 2009-04-23
US20110175985A1 (en) 2011-07-21
EP2183924A4 (en) 2013-07-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08793371

Country of ref document: EP

Kind code of ref document: A2

WWE Wipo information: entry into national phase

Ref document number: 12673604

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2008793371

Country of ref document: EP