WO2016084592A1 - Transmission device, transmission method, reception device, and reception method - Google Patents
Transmission device, transmission method, reception device, and reception method
- Publication number
- WO2016084592A1 (PCT/JP2015/081524)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- view
- stream
- video
- information
- sound source
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/21805—Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/2365—Multiplexing of several video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/2368—Multiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/20—Arrangements for obtaining desired frequency or directional characteristics
- H04R1/32—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
- H04R1/326—Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for microphones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S5/00—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation
- H04S5/02—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation of the pseudo four-channel type, e.g. in which rear channel signals are derived from two-channel stereo signals
Definitions
- The present technology relates to a transmission device, a transmission method, a reception device, and a reception method, and more particularly to a transmission device that transmits an audio stream having audio data and position information of an object sound source together with a video stream.
- The object sound source position information described above is based on a single view.
- However, the position and direction of the camera differ for each view. Therefore, when the view is switched on the receiving side, 3D audio rendering cannot be performed correctly except for the reference view.
- The purpose of this technology is to enable 3D audio rendering to be performed correctly even when the view is switched on the receiving side.
- The concept of this technology is a transmission device comprising: an encoding unit that generates a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source based on the first view; and a transmission unit that transmits a container of a predetermined format including these streams.
- In this technology, the encoding unit generates an audio stream together with a first video stream having video data of the first view and a second video stream having video data of the second view.
- This audio stream has audio data of the object sound source and position information of the object sound source with reference to the first view.
- The transmission unit transmits a container of a predetermined format including the first video stream, the second video stream, the audio stream, and position correction information for correcting the position information of the object sound source to position information based on the second view.
- For example, the position correction information may be a difference component between the positions and directions of the first view and the second view.
- For example, the container may be a transport stream (MPEG-2 TS) adopted in digital broadcasting standards.
- Alternatively, the container may be MP4, used for Internet distribution and the like, or a container of another format.
- For example, the position correction information may be inserted into the layer of the audio stream. In this case, synchronization between the audio data and position information of the object sound source and the position correction information is guaranteed.
- In this case, the position correction information may be inserted into a metadata area including the position information, or, for example, into the user data area.
- When there are a plurality of second views, a plurality of pieces of position correction information corresponding to the plurality of second views may be inserted into the audio stream layer, and information indicating the second video stream corresponding to each piece of position correction information may be inserted into the container layer.
- For example, the position correction information may be inserted into the layer of the second video stream. In this case, it becomes easy to associate the position correction information with the second video stream, and the position correction information may be inserted into, for example, the user data area.
- For example, the position correction information may be inserted into the container layer.
- In this case, the position correction information may be inserted as signaling information.
- This allows the receiving side to obtain the position correction information at the system layer.
- In this case, the container may be MPEG-2 TS, and the position correction information may be inserted into the video elementary stream loop corresponding to the second video stream in the program map table.
- Alternatively, an information stream including the position correction information may be inserted into the container.
- In this case, the position correction information can be easily obtained from an information stream independent of the audio stream and the video streams.
- In this technology, the position correction information for correcting the position information of the object sound source to position information based on the second view is transmitted together with the first video stream, the second video stream, and the audio stream. Therefore, when switching to the second view on the receiving side, the position information of the object sound source, corrected with the position correction information so as to be based on the second view, can be used, and 3D audio rendering can be performed correctly.
- Another concept of this technology is a reception device comprising: a receiving unit that receives a container of a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, an audio stream having audio data of an object sound source and position information of the object sound source based on the first view, and position correction information for correcting the position information of the object sound source to position information based on the second view; and a processing unit that processes information included in the container.
- In this technology, the reception unit receives a container of a predetermined format including a first video stream having video data of the first view, a second video stream having video data of the second view, an audio stream having the audio data of the object sound source and position information of the object sound source based on the first view, and position correction information for correcting the position information of the object sound source to position information based on the second view. The information included in the container is then processed by the processing unit.
- For example, the processing unit may include: a decoding unit that obtains the video data of the first view, the video data of the second view, the audio data of the object sound source, and the position information from the first video stream, the second video stream, and the audio stream;
- a selector that selectively outputs the video data of the first view or the video data of the second view; and a rendering unit that maps the audio data of the object sound source to an arbitrary speaker position based on the position information of the object sound source.
- The rendering unit may use the position information corrected, based on the position correction information, to be based on the second view when the video data of the second view is selected by the selector.
- In this case, when the video data of the second view is selected, rendering is performed using the position information corrected, based on the position correction information, so that the second view serves as the reference. Therefore, even when view switching is performed, 3D audio rendering can be performed correctly.
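The selector/renderer interaction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; all names are hypothetical, and `correct_fn` stands in for whatever applies one view's position correction information (e.g. the conversion of FIG. 5):

```python
# Hypothetical sketch of the receiver processing unit described above.
# correct_fn applies one view's position correction information (its
# difference component) to a view-1-based object position.
def process(selected_view, object_audio, object_position, diffs,
            correct_fn, render_fn):
    # Selector behavior: view 1 positions pass through unchanged; any
    # other view's position is first corrected with that view's
    # difference component.
    if selected_view == 1:
        position = object_position
    else:
        position = correct_fn(object_position, diffs[selected_view])
    # Rendering unit: map the object audio to speaker positions based on
    # the (possibly corrected) position information.
    return render_fn(object_audio, position)
```

The point of the sketch is only the branching: rendering always receives position information expressed in the coordinate frame of the currently selected view.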
- Still another concept of this technology is a reception device comprising: a receiving unit that receives a container of a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source based on the first view;
- An acquisition unit for acquiring position correction information for correcting the position information of the object sound source to position information based on the second view;
- a decoding unit for obtaining video data of the first view, video data of the second view, audio data of the object sound source, and position information from the first video stream, the second video stream, and the audio stream
- a selector that selectively outputs the video data of the first view or the video data of the second view;
- a rendering unit that maps the audio data of the object sound source to an arbitrary speaker position based on the position information of the object sound source;
- In this technology, the reception unit receives a container of a predetermined format including a first video stream having video data of the first view, a second video stream having video data of the second view, and an audio stream having the audio data of the object sound source and position information of the object sound source with reference to the first view.
- The acquisition unit acquires position correction information for correcting the position information of the object sound source to position information based on the second view.
- For example, the acquisition unit may acquire the position correction information from the audio stream layer, the second video stream layer, or the container layer.
- Alternatively, the acquisition unit may acquire the position correction information from a server on the network.
- The decoding unit obtains the video data of the first view, the video data of the second view, the audio data of the object sound source, and the position information from the first video stream, the second video stream, and the audio stream.
- The selector selectively outputs the video data of the first view or the video data of the second view.
- The rendering unit maps the audio data of the object sound source to an arbitrary speaker position based on the position information of the object sound source.
- When the video data of the second view is selected by the selector, the position information corrected, based on the position correction information, to be based on the second view is used.
- According to this technology, 3D audio rendering can be performed correctly even when the view is switched on the receiving side. Note that the effects described in this specification are merely examples and are not limiting; additional effects may also be obtained.
- FIG. 10 is a diagram illustrating a configuration example of configuration information "userdataConfig ()". A diagram illustrating a structural example of a component group descriptor (component_group_descriptor). A diagram illustrating a configuration example of the transport stream TS when the difference components VP2 and VP3 are inserted into the layer of the audio stream. A diagram illustrating a structural example of a video sequence (Video_sequence). A diagram illustrating a structural example of user data.
- A diagram illustrating a configuration example of multiview position information 2 (multiview_Position_information2 ()). A diagram illustrating a structural example of user data SEI in MPEG4-AVC or HEVC. A diagram illustrating a configuration example of the transport stream TS when the difference components VP2 and VP3 are inserted into the layer of a video stream. A diagram illustrating a structural example of a multiview position information descriptor. A diagram illustrating a configuration example of the transport stream TS when the difference components VP2 and VP3 are inserted as signaling into the layer of the container (system). A block diagram illustrating another configuration example of the transmission device.
- A diagram illustrating a configuration example of a position correction information stream (elementary stream). A diagram illustrating a configuration example of the transport stream TS when the difference components VP2 and VP3 are inserted as a position correction information stream. A diagram collectively illustrating the transmission systems of the position correction information. A block diagram illustrating a configuration example of the reception device. Block diagrams illustrating other configuration examples of the reception device.
- FIG. 1 shows a configuration example of a transmission/reception system 10 as an embodiment.
- The transmission/reception system 10 includes a transmission device 100 and a reception device 200.
- The transmission device 100 transmits the transport stream TS on a broadcast wave or in network packets.
- The transport stream TS includes a plurality of video streams having video data of a plurality of views, and an audio stream having audio data and position information of one or more object sound sources.
- FIG. 2 shows an example of an assumed situation of view (video) shooting with cameras and audio listening with microphones.
- The transport stream TS includes the video streams and the audio stream corresponding to this assumed situation.
- Specifically, the transport stream TS includes a video stream having video data SV1 of view 1 (View1) shot by the camera 11, a video stream having video data SV2 of view 2 (View2) shot by the camera 12, and a video stream having video data SV3 of view 3 (View3) shot by the camera 13.
- In addition, the transport stream TS includes one audio stream.
- This audio stream includes the audio data obtained by the microphone 21 (audio data of object sound source 1 (Object1)) and position information of the microphone 21 based on view 1 (position information of object sound source 1), together with the audio data obtained by the microphone 22 (audio data of object sound source 2 (Object2)) and position information of the microphone 22 based on view 1 (position information of object sound source 2).
- This transport stream TS also includes a difference component between the positions and directions of view 1 and view 2.
- This difference component constitutes position correction information for correcting the position information of each object sound source to position information with view 2 as the reference.
- Similarly, the transport stream TS includes a difference component between the positions and directions of view 1 and view 3.
- This difference component constitutes position correction information for correcting the position information of each object sound source to position information with view 3 as the reference.
- The reception device 200 receives the transport stream TS transmitted from the transmission device 100 on broadcast waves or in network packets.
- This transport stream TS includes three video streams having the video data of view 1, view 2, and view 3, respectively, and one audio stream having the audio data and position information of object sound source 1 and object sound source 2.
- The transport stream TS also includes a difference component between the positions and directions of view 1 and view 2 as position correction information for correcting the position information of each object sound source to position information with view 2 as the reference.
- In addition, the transport stream TS includes a difference component between the positions and directions of view 1 and view 3 as position correction information for correcting the position information of each object sound source to position information with view 3 as the reference.
- The reception device 200 selectively presents images based on the video data of view 1, view 2, and view 3. It also performs rendering by mapping the audio data of the object sound sources to arbitrary speaker positions based on the position information of the object sound sources, and reproduces the sound. At this time, when view 2 or view 3 is selected, the position information corrected with the corresponding difference component is used so that rendering is performed correctly.
- Specifically, when view 2 is selected, the position information corrected, using the difference component between the positions and directions of views 1 and 2, to be based on view 2 is used.
- Likewise, when view 3 is selected, the position information corrected, using the difference component between the positions and directions of views 1 and 3, to be based on view 3 is used.
- FIG. 3 shows the relationship between view 1 (View1) and the position of an object sound source.
- The position of the object sound source can be expressed by polar coordinates s (r, θ, φ) with view 1 as the reference, and can also be expressed by orthogonal coordinates p (x, y, z).
- s: polar coordinates
- p: orthogonal coordinates
- The transmission device 100 transmits an audio stream including the audio data and position information of the object sound source.
- This audio stream includes 3D audio metadata.
- In this metadata, the coordinate values of the polar coordinates s (r, θ, φ) of the object sound source and a gain value are inserted.
- FIG. 4 shows the positional relationship between view 2 (View2) and the object sound source.
- The position of the object sound source can be expressed by polar coordinates s′ (r′, θ′, φ′) with view 2 as the reference, and can also be expressed by orthogonal coordinates p′ (x′, y′, z′).
- Here, the difference component between the positions and directions of view 1 and view 2 includes a spatial difference component (Δx, Δy, Δz) and a directional difference component (Δθ, Δφ).
- Using these difference components, the reception device 200 corrects the position information s (r, θ, φ) of the object sound source to the position information s′ (r′, θ′, φ′) based on view 2.
- FIG. 5 shows an example of the conversion formula used in that case.
- In this formula, (x, y, z) corresponds to the coordinate values of the orthogonal coordinates p (x, y, z) of the object sound source, and (x1, y1, z1) corresponds to the difference components (Δx, Δy, Δz).
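The correction can be sketched numerically as follows. The exact formula is given in FIG. 5, which is not reproduced in this text, so this is only a plausible reconstruction under an assumed axis and angle convention: convert the view-1 polar position to orthogonal coordinates, translate by the spatial difference component, convert back to polar coordinates, then apply the directional difference component:

```python
import math

def polar_to_cartesian(r, theta, phi):
    # theta: azimuth, phi: elevation (convention assumed; the patent's
    # FIG. 5 defines the exact mapping)
    x = r * math.cos(phi) * math.cos(theta)
    y = r * math.cos(phi) * math.sin(theta)
    z = r * math.sin(phi)
    return x, y, z

def cartesian_to_polar(x, y, z):
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)
    phi = math.asin(z / r) if r else 0.0
    return r, theta, phi

def correct_position(r, theta, phi, dx, dy, dz, dtheta, dphi):
    # Shift the object position into view 2's coordinate frame
    # ((dx, dy, dz) = view 2's origin expressed in view 1 coordinates),
    # then apply the directional offsets (dtheta, dphi) of view 2.
    x, y, z = polar_to_cartesian(r, theta, phi)
    r2, theta2, phi2 = cartesian_to_polar(x - dx, y - dy, z - dz)
    return r2, theta2 - dtheta, phi2 - dphi
```

For example, a source 2 m straight ahead of view 1, viewed from a view-2 origin shifted 1 m toward the source with no angular offset, yields r′ = 1 m with unchanged angles.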
- FIG. 6 shows a configuration example of the transmission device 100.
- The transmission device 100 includes a control unit 111, video encoders 112, 113, and 114, a 3D audio encoder 115, a system encoder 116, and a transmission unit 117.
- The control unit 111 controls the operation of each unit of the transmission device 100.
- The video encoders 112, 113, and 114 input the video data SV1, SV2, and SV3 of views 1, 2, and 3, respectively, encode the video data, and obtain video streams.
- The 3D audio encoder 115 inputs object data related to object sound sources 1 and 2, performs, for example, MPEG-H 3D Audio encoding on the object data, and obtains an audio stream.
- The object data related to object sound source 1 consists of object audio data SA1 and object metadata META1.
- The object metadata META1 includes the coordinate values and gain value of the polar coordinates s (r, θ, φ) of object sound source 1.
- The object data related to object sound source 2 consists of object audio data SA2 and object metadata META2.
- The object metadata META2 includes the coordinate values and gain value of the polar coordinates s (r, θ, φ) of object sound source 2.
- The system encoder 116 converts the video streams output from the video encoders 112, 113, and 114 and the audio stream output from the 3D audio encoder 115 into PES packets, further converts them into transport packets, and multiplexes them to obtain the transport stream TS as a multiplexed stream.
- The transmission unit 117 transmits the transport stream TS to the reception device 200 on a broadcast wave or in network packets.
- In this embodiment, the difference components VP2 (Δx, Δy, Δz, Δθ, Δφ) between the positions and directions of views 1 and 2 and the difference components VP3 (Δx, Δy, Δz, Δθ, Δφ) between the positions and directions of views 1 and 3 are inserted into (1) the audio stream layer, (2) the video stream layer, or (3) the container layer.
- The difference component VP2 constitutes position correction information for correcting the position information of object sound sources 1 and 2 to position information with view 2 as the reference.
- The difference component VP3 constitutes position correction information for correcting the position information of object sound sources 1 and 2 to position information with view 3 as the reference.
- When inserting into the audio stream layer, the 3D audio encoder 115 inserts the difference components VP2 and VP3 as position correction information into the layer of the audio stream. In this case, they are inserted into the metadata area or the user data area.
- FIG. 7 shows the structure of an audio frame of MPEG-H 3D Audio.
- This audio frame is composed of a plurality of MPEG audio stream packets.
- Each MPEG audio stream packet is composed of a header and a payload.
- The header has information such as the packet type (Packet Type), packet label (Packet Label), and packet length (Packet Length).
- Information defined by the packet type in the header is arranged in the payload.
- The payload information includes "SYNC", corresponding to a synchronization start code, "Frame", which is the actual data, and "Config", which indicates the configuration of this "Frame".
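The header/payload layout described above can be illustrated with a toy packetizer. Note that this is only a sketch: real MPEG-H 3D Audio stream packets use variable-length (escape-coded) type, label, and length fields, whereas fixed 16-bit fields are used here for simplicity, and the type values are arbitrary:

```python
import struct

# Illustrative packet types (values are placeholders for this sketch).
PACTYP_SYNC, PACTYP_CONFIG, PACTYP_FRAME = 0, 1, 2

def pack_packet(ptype, label, payload):
    # Header: packet type, packet label, packet length (fixed 16-bit
    # big-endian fields here; the real format is variable-length).
    return struct.pack(">HHH", ptype, label, len(payload)) + payload

def parse_packets(data):
    # Walk the byte stream, splitting it back into (type, label, payload).
    packets, pos = [], 0
    while pos < len(data):
        ptype, label, length = struct.unpack_from(">HHH", data, pos)
        pos += 6
        packets.append((ptype, label, data[pos:pos + length]))
        pos += length
    return packets
```

Concatenating a "Config" packet and a "Frame" packet and then parsing the stream recovers both, which is the essential self-delimiting property the header provides.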
- The object data related to the object sound source is composed of object audio data and object metadata as described above. These data are included in "Frame".
- The object audio data is included as encoded sample data of an SCE (Single Channel Element).
- The object metadata is included as an extension element (Ext_element).
- An extension element (Ext_element) including user data can also be defined.
- When the difference components VP2 and VP3 are inserted into the metadata area, they are inserted into the extension element (Ext_element) that includes the object metadata.
- FIG. 8A shows a configuration example (Syntax) of object metadata (object_metadata ()).
- FIG. 8B shows a configuration example (Syntax) of "object_metadata_efficient ()" included in the object metadata.
- Multiview position information 1 (multiview_Position_information1 ()) having the difference components VP2 and VP3 is arranged in the intra-coded metadata ("intracoded_object_metadata_efficient ()") of this "object_metadata_efficient ()".
- FIG. 9 shows a configuration example (Syntax) of multiview position information 1 (multiview_Position_information1 ()).
- A 1-bit field "process_multiview" is a flag indicating multiview.
- In the case of multiview, there is an 8-bit field "multiview_count". This field indicates the total number of views. In the example illustrated in FIG. 2, the total number of views is "3".
- Then, there are difference component fields for "total number − 1" views, that is, for each view other than view 1 (View1).
- The difference component fields consist of an 8-bit field "Δx", an 8-bit field "Δy", an 8-bit field "Δz", a 9-bit field "Δθ", and a 7-bit field "Δφ".
- The field "Δx" indicates Δx, that is, the value of the x coordinate of the target view (View) when view 1 (View1) is the origin.
- The field "Δy" indicates Δy, that is, the value of the y coordinate of the target view (View) when view 1 (View1) is the origin.
- The field "Δz" indicates Δz, that is, the value of the z coordinate of the target view (View) when view 1 (View1) is the origin.
- The field "Δθ" indicates Δθ, that is, the difference in θ with respect to view 1 (View1).
- The field "Δφ" indicates Δφ, that is, the difference in φ with respect to view 1 (View1).
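The bit layout above (a 1-bit flag, an 8-bit count, then 8+8+8+9+7 bits per view) can be exercised with a small bit writer/reader. This is only a sketch of the field widths given in the text; the coding of negative difference values is not specified here, so all fields are treated as unsigned:

```python
class BitWriter:
    def __init__(self):
        self.bits = []

    def write(self, value, nbits):
        # MSB-first bit packing
        for i in range(nbits - 1, -1, -1):
            self.bits.append((value >> i) & 1)

    def to_bytes(self):
        bits = self.bits + [0] * (-len(self.bits) % 8)  # pad to a byte boundary
        return bytes(
            sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
            for i in range(0, len(bits), 8)
        )

class BitReader:
    def __init__(self, data):
        self.data, self.pos = data, 0

    def read(self, nbits):
        v = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            v = (v << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return v

def write_multiview_position_information1(diffs):
    # diffs: one (dx, dy, dz, dtheta, dphi) tuple per view other than view 1
    w = BitWriter()
    w.write(1, 1)               # process_multiview flag
    w.write(len(diffs) + 1, 8)  # multiview_count: total number of views
    for dx, dy, dz, dtheta, dphi in diffs:
        w.write(dx, 8)
        w.write(dy, 8)
        w.write(dz, 8)
        w.write(dtheta, 9)
        w.write(dphi, 7)
    return w.to_bytes()

def read_multiview_position_information1(data):
    r = BitReader(data)
    assert r.read(1) == 1       # process_multiview
    count = r.read(8)
    return [(r.read(8), r.read(8), r.read(8), r.read(9), r.read(7))
            for _ in range(count - 1)]
```

A round trip through the writer and reader reproduces the per-view difference components, which is a quick way to check the field widths against the description.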
- When the difference components VP2 and VP3 are inserted into the user data area, they are inserted into an extension element (Ext_element) that includes user data.
- In this case, an element (Ext_userdata) including user data (user_data ()) is newly defined as an extension element (Ext_element). Accordingly, configuration information "userdataConfig ()" for the element (Ext_userdata) is newly defined in "Config".
- FIG. 10A shows a configuration example (Syntax) of the configuration information "userdataConfig ()".
- A 32-bit field "userdata_identifier" indicates user data by being set to a predefined value.
- A 16-bit field "userdata_frame_length" indicates the number of bytes of the user data (user_data ()).
- FIG. 10B shows a configuration example (Syntax) of the user data (user_data ()).
- By inserting "0x47413934" ("GA94") into the 32-bit field "user_data_identifier", "ATSC_user_data ()" is included in the "user_structure ()" field.
- FIG. 10C shows a configuration example (Syntax) of "ATSC_user_data ()".
- By inserting, for example, "0x07" into the 8-bit field "user_data_type_code", multiview position information 1 (multiview_Position_information1 ()) (see FIG. 9) is included in the "user_data_type_structure ()" field.
- When a plurality of pieces of position correction information are inserted into the audio stream layer in this way, the system encoder 116 inserts, into the container (system) layer, information indicating the video stream corresponding to each of the plurality of difference components. For example, the system encoder 116 inserts a component group descriptor (component_group_descriptor) into the audio elementary stream loop corresponding to the audio stream.
- FIG. 11 shows a structural example (Syntax) of the component group descriptor.
- An 8-bit field "descriptor_tag" indicates the descriptor type; here, it indicates a component group descriptor.
- An 8-bit field "descriptor_length" indicates the length (size) of the descriptor as the number of subsequent bytes.
- A 4-bit field "component_group_type" indicates the type of the component group. Here, it is set to "0", indicating a video/audio component group related to 3D audio multiview.
- A 4-bit field "num_video" indicates the number of video streams (video elementary streams). Then, an 8-bit field "component_tag" is repeated this number of times in a for loop. This "component_tag" field indicates the value of the component tag (Component_tag) of the related video component.
- In this embodiment, the component tag values of the video streams including the video data of view 1 (View1), view 2 (View2), and view 3 (View3) are described in order.
- information such as a packet identifier (PID), a component tag (Component_tag), and a stream type (Stream_Type) is arranged in a video elementary stream loop corresponding to each video stream.
- the 8-bit field of “num_audio” indicates the number of audio streams (audio elementary streams). Then, the 8-bit field of “component_tag” is repeated this number of times in a for loop. This “component_tag” field indicates the value of the component tag (Component_tag) of the related audio component.
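A minimal parsing sketch of this descriptor follows. The exact byte packing (component_group_type and num_video sharing one byte, then the video component_tag loop, num_audio, and the audio component_tag loop) is inferred from the field widths above and should be treated as an assumption; the function name is hypothetical.

```python
def parse_component_group_descriptor(data: bytes):
    """Parse the component group descriptor sketched in FIG. 11."""
    tag = data[0]                    # descriptor_tag (component group descriptor)
    length = data[1]                 # descriptor_length: number of bytes that follow
    body = data[2:2 + length]
    group_type = body[0] >> 4        # 4-bit component_group_type ("0" = 3D-audio multi-view)
    num_video = body[0] & 0x0F       # 4-bit num_video
    pos = 1
    # component_tag of each related video component, in view order (View1, View2, View3)
    video_tags = [body[pos + i] for i in range(num_video)]
    pos += num_video
    num_audio = body[pos]            # 8-bit num_audio
    pos += 1
    audio_tags = [body[pos + i] for i in range(num_audio)]
    return group_type, video_tags, audio_tags
```

With three video component tags and one audio component tag, the descriptor body would be seven bytes long, and the parser returns the group type plus the two tag lists.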
- FIG. 12 shows a configuration example of the transport stream TS when the difference components VP2 and VP3 are inserted into the audio stream layer.
- the transport stream TS includes a PES packet “Video PES1” of the video stream including the video data of view 1 (View1), a PES packet “Video PES2” of the video stream including the video data of view 2 (View2), and a PES packet “Video PES3” of the video stream including the video data of view 3 (View3).
- the above-described multiview position information 1 (multiview_Position_information1()) (see FIG. 9) is inserted into the PES payload of the PES packet of the audio stream.
- the transport stream TS includes a PMT (Program Map Table) as PSI (Program Specific Information).
- This PSI is information describing to which program each elementary stream included in the transport stream belongs.
- the PMT has a program descriptor (Program Descriptor) that describes information related to the entire program.
- an elementary stream loop having information related to each elementary stream exists in the PMT.
- in the PMT, there are video elementary stream loops (Video ES loop) corresponding to the respective video streams and an audio elementary stream loop (Audio ES loop) corresponding to the audio stream. In each loop, information such as a packet identifier (PID), a component tag (Component_tag), and a stream type (Stream_Type) is arranged.
- Further, the above-described component group descriptor (component_group_descriptor) is arranged in the audio elementary stream loop.
- the video encoder 113 inserts a difference component VP2 as position correction information into the layer of the video stream.
- the video encoder 114 inserts a difference component VP3 as position correction information into the layer of the video stream. In this case, it is inserted into the user data area.
- FIG. 13 shows a configuration example (Syntax) of a video sequence (Video_sequence).
- This video sequence (Video_sequence) has a field of extension and user data (2) (extension_and_user_data (2)).
- FIG. 14A shows a configuration example (Syntax) of the extension and user data (2).
- the extension and user data (2) has a user data (user_data ()) field.
- FIG. 14B shows a configuration example (Syntax) of this user data. By inserting “0x47413934” (“GA94”) into the 32-bit field of “user_data_identifier”, “ATSC_user_data()” is included in the field of “user_structure()”.
- FIG. 14C illustrates a configuration example (syntax) of “ATSC_user_data ()”.
- For example, “0x07” is inserted into the 8-bit field of “user_data_type_code”, so that multi-view position information 2 (multiview_Position_information2()) is included in the field of “user_data_type_structure()”.
- FIG. 15 shows a configuration example (Syntax) of multiview position information 2 (multiview_Position_information2 ()).
- a 1-bit field of “process_multiview” is a flag indicating multi-view. In the case of multi-view, a difference component field exists.
- the multi-view position information 2 inserted in the layer of the video stream including the video data of view 2 (View2) has a field of the difference component VP2, and the multi-view position information 2 inserted in the layer of the video stream including the video data of view 3 (View3) has a field of the difference component VP3.
- the difference component field consists of an 8-bit field “Δx”, an 8-bit field “Δy”, an 8-bit field “Δz”, a 9-bit field “Δθ”, and a 7-bit field “Δφ”.
- the field of “Δx” indicates Δx, that is, the value of the x coordinate of the target view (View) when view 1 (View1) is the origin.
- the field of “Δy” indicates Δy, that is, the value of the y coordinate of the target view (View) when view 1 (View1) is the origin.
- the field of “Δz” indicates Δz, that is, the value of the z coordinate of the target view (View) when view 1 (View1) is the origin.
- the field of “Δθ” indicates Δθ, that is, the difference of θ with respect to view 1 (View1).
- the field of “Δφ” indicates Δφ, that is, the difference of φ with respect to view 1 (View1).
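Packed together, these fields occupy 40 bits. A sketch of unpacking them follows; MSB-first packing and signed two's-complement values are assumptions, since the figure's exact bit layout is not reproduced here.

```python
def parse_difference_component(payload: int):
    """Unpack a 40-bit difference component: 8-bit dx, 8-bit dy,
    8-bit dz, 9-bit dtheta, 7-bit dphi (MSB-first, two's-complement)."""
    def signed(value, bits):
        # interpret an unsigned bit pattern as a signed value
        return value - (1 << bits) if value >= 1 << (bits - 1) else value
    dphi = signed(payload & 0x7F, 7);   payload >>= 7
    dtheta = signed(payload & 0x1FF, 9); payload >>= 9
    dz = signed(payload & 0xFF, 8);     payload >>= 8
    dy = signed(payload & 0xFF, 8);     payload >>= 8
    dx = signed(payload & 0xFF, 8)
    return dx, dy, dz, dtheta, dphi
```

A payload assembled in the same order round-trips to the original (Δx, Δy, Δz, Δθ, Δφ) tuple.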
- FIG. 16A shows a configuration example (syntax) of user data SEI.
- By inserting “0x47413934” (“GA94”) into the 32-bit field of “USER_identifier”, “ATSC1_data()” is included in the field of “USER_structure()”.
- FIG. 16B shows a configuration example (syntax) of “ATSC1_data ()”.
- For example, “0x07” is inserted into the 8-bit field of “user_data_type_code”, so that multi-view position information 2 (multiview_Position_information2()) is included in the field of “user_data_type_structure()”.
- FIG. 17 illustrates a configuration example of the transport stream TS when the difference components VP2 and VP3 are inserted into the video stream layer.
- in FIG. 17, the description of portions corresponding to those in FIG. 12 is omitted as appropriate.
- the above-described multiview position information 2 (multiview_Position_information2 ()) (see FIG. 15) is inserted into the PES packet “Video PES2” of the video stream including the video data of View 2 (View 2). Further, the above-described multiview position information 2 (multiview_Position_information2 ()) (see FIG. 15) is inserted into the PES packet “Video PES3” of the video stream including the video data of the view 3 (View3).
- the system encoder 116 inserts a multi-view position information descriptor (multiview_Position_information_descriptor) in the video elementary stream loop corresponding to the video streams of the view 2 (View2) and the view 3 (View3).
- FIG. 18 shows a configuration example (Syntax) of the multi-view position information descriptor.
- An 8-bit field of “descriptor_tag” indicates a descriptor type. Here, it indicates a multi-view position information descriptor.
- the 8-bit field of “descriptor_length” indicates the length (size) of the descriptor, and indicates the number of subsequent bytes as the length of the descriptor.
- the 1-bit field of “PTS_flag” is flag information indicating that there is time information (PTS) corresponding to the acquisition position information of the object sound source. When “1”, 33-bit time information exists. Further, this descriptor includes a field for a difference component.
- the field of the difference component VP2 exists in the multi-view position information descriptor inserted in the video elementary stream loop corresponding to the video stream of View 2 (View 2). Further, the multi-view position information descriptor inserted in the video elementary stream loop corresponding to the video stream of View 3 (View 3) has a field of the difference component VP3.
- the difference component field consists of an 8-bit field “Δx”, an 8-bit field “Δy”, an 8-bit field “Δz”, a 9-bit field “Δθ”, and a 7-bit field “Δφ”.
- the field of “Δx” indicates Δx, that is, the value of the x coordinate of the target view (View) when view 1 (View1) is the origin.
- the field of “Δy” indicates Δy, that is, the value of the y coordinate of the target view (View) when view 1 (View1) is the origin.
- the field of “Δz” indicates Δz, that is, the value of the z coordinate of the target view (View) when view 1 (View1) is the origin.
- the field of “Δθ” indicates Δθ, that is, the difference of θ with respect to view 1 (View1).
- the field of “Δφ” indicates Δφ, that is, the difference of φ with respect to view 1 (View1).
- FIG. 19 shows a configuration example of the transport stream TS when the difference components VP2 and VP3 are inserted as signaling into the container (system) layer.
- the description of portions corresponding to those in FIG. 12 is omitted as appropriate.
- the multi-view position information descriptor having the above-described difference component VP2 is inserted in the video elementary stream loop corresponding to the video stream of view 2 (View2).
- the multi-view position information descriptor in which the above-described difference component VP3 exists is inserted in the video elementary stream loop corresponding to the video stream of View 3 (View 3).
- FIG. 20 illustrates a configuration example of the transmission device 100 in that case.
- portions corresponding to those in FIG. 6 are denoted by the same reference numerals, and detailed description thereof will be omitted as appropriate.
- the transmission device 100 includes position correction information encoders 118 and 119.
- the position correction information encoder 118 encodes the difference component VP2 to generate a position correction information stream. Similarly, the position correction information encoder 119 encodes the difference component VP3 to generate a position correction information stream.
- the system encoder 116 converts the video streams output from the video encoders 112, 113, and 114, the audio stream output from the 3D audio encoder 115, and the position correction information streams output from the position correction information encoders 118 and 119 into PES packets, further converts them into transport packets, and multiplexes them to obtain a transport stream TS as a multiplexed stream.
- FIG. 21 shows a configuration example (Syntax) of the position correction information stream (elementary stream).
- the 8-bit field of “data_identifier” is a value indicating PES data of position correction information.
- the 4-bit field “PES_data_packet_header_length” indicates the length of the field “PES_Data_private_data_byte”. Service-dependent private data is inserted into the field “PES_Data_private_data_byte”.
- the difference component field consists of an 8-bit field “Δx”, an 8-bit field “Δy”, an 8-bit field “Δz”, a 9-bit field “Δθ”, and a 7-bit field “Δφ”.
- the field of “Δx” indicates Δx, that is, the value of the x coordinate of the target view (View) when view 1 (View1) is the origin.
- the field of “Δy” indicates Δy, that is, the value of the y coordinate of the target view (View) when view 1 (View1) is the origin.
- the field of “Δz” indicates Δz, that is, the value of the z coordinate of the target view (View) when view 1 (View1) is the origin.
- the field of “Δθ” indicates Δθ, that is, the difference of θ with respect to view 1 (View1).
- the field of “Δφ” indicates Δφ, that is, the difference of φ with respect to view 1 (View1).
- FIG. 22 shows a configuration example of the transport stream TS when the difference components VP2 and VP3 are inserted as position correction information streams.
- in FIG. 22, the description of portions corresponding to those in FIG. 12 is omitted as appropriate.
- there exist the PES packet “Position PES1” of the position correction information stream including the difference component VP2 related to view 2 (View2) and the PES packet “Position PES2” of the position correction information stream including the difference component VP3 related to view 3 (View3).
- the PMT includes position correction information elementary stream loops (Position ES loop) corresponding to the two position correction information streams, respectively.
- FIG. 23 collectively shows the transmission method of the position correction information described above.
- each piece of position correction information is added for each video signal, and only the necessary information is transmitted for each video (the amount of information to be transmitted is small). When a view is selected on the receiving side, the corresponding position correction information can be used as it is, so that no extra processing is required.
- Video data SV1, SV2, and SV3 of views 1, 2, and 3 are supplied to video encoders 112, 113, and 114, respectively.
- the video data SV1, SV2, and SV3 are encoded by, for example, MPEG2, MPEG4-AVC, HEVC, or the like to obtain video streams.
- the object data SA1 and META1 related to the object sound source 1 and the object data SA2 and META2 related to the object sound source 2 are supplied to the 3D audio encoder 115.
- in the 3D audio encoder 115, for example, MPEG-H 3D Audio encoding is performed on the object data related to the object sound sources 1 and 2 to obtain an audio stream.
- Video streams obtained by the video encoders 112, 113, and 114 are supplied to the system encoder 116.
- the audio stream obtained by the 3D audio encoder 115 is supplied to the system encoder 116.
- a stream supplied from each encoder is converted into a PES packet, further converted into a transport packet, and multiplexed to obtain a transport stream TS as a multiplexed stream.
- the transport stream TS obtained by the system encoder 116 is supplied to the transmission unit 117.
- the transport stream TS is transmitted to the reception device 200 on a broadcast wave or a net packet.
- the difference component VP2 between the positions and directions of views 1 and 2 and the difference component VP3 between the positions and directions of views 1 and 3 are inserted into (1) the layer of the audio stream, (2) the layer of the video stream, or (3) the container layer, and transmitted to the receiving device 200.
- FIG. 24 illustrates a configuration example of the receiving device 200.
- the receiving device 200 includes a control unit 211, a receiving unit 212, a system decoder 213, a selector 214, a video decoder 215, a display unit 216, a 3D audio decoder 217, a 3D audio renderer 218, and a speaker system 219.
- the control unit 211 controls the operation of each unit of the receiving device 200.
- the reception unit 212 receives the transport stream TS transmitted from the transmission device 100 on broadcast waves or net packets.
- this transport stream TS includes three video streams each having the video data of view 1, view 2, and view 3, and an audio stream having the audio data and position information of the object sound source 1 and the object sound source 2 (see FIG. 2).
- the system decoder 213 extracts three video stream packets each having video data of view 1, view 2 and view 3 from the transport stream TS, and reconstructs the three video streams. Further, the system decoder 213 extracts an audio stream packet from the transport stream TS, and reconstructs the audio stream.
- the system decoder 213 extracts various information such as descriptor information from the transport stream TS, and sends it to the control unit 211.
- the various information includes information of a multiview position information descriptor (multiview_Position_information_descriptor) (see FIG. 18) when the difference components VP2 and VP3 are inserted as signaling.
- the various information includes information on a component group descriptor (component_group_descriptor) (see FIG. 11) when inserted in the layer of the audio stream.
- the selector 214 selectively outputs one of the three video streams reconstructed by the system decoder 213 based on the selection control of the control unit 211 according to the user's view selection.
- the video decoder 215 performs a decoding process on the video stream output from the selector 214 to obtain video data of a view selected by the user.
- the video decoder 215 extracts various information inserted in the layer of the video stream and sends it to the control unit 211.
- the various information includes information of multiview position information 2 (multiview_Position_information2 ()) (see FIG. 15) when the difference components VP2 and VP3 are inserted in the video stream layer.
- the display unit 216 includes a display panel such as an LCD (Liquid Crystal Display) or an organic EL display (organic electroluminescence display).
- the display unit 216 obtains display video data by performing scaling processing, image quality adjustment processing, and the like on the video data obtained by the video decoder 215, and displays an image based on the display video data on the display panel.
- the 3D audio decoder 217 performs decoding processing on the audio stream reconstructed by the system decoder 213 to obtain object data related to the object sound sources 1 and 2.
- the object data related to the object sound source 1 is composed of object audio data SA1 and object metadata META1, and the object metadata META1 includes the coordinate values of the polar coordinates s (r, θ, φ) of the object sound source 1 and a gain (Gain) value.
- the object data related to the object sound source 2 is composed of object audio data SA2 and object metadata META2, and the object metadata META2 includes the coordinate values of the polar coordinates s (r, θ, φ) of the object sound source 2 and a gain (Gain) value.
- the 3D audio decoder 217 extracts various information inserted in the layer of the audio stream and sends it to the control unit 211.
- the various information includes information of multiview position information 1 (multiview_Position_information1 ()) (see FIG. 9) when the difference components VP2 and VP3 are inserted in the layer of the audio stream.
- the 3D audio renderer 218 obtains audio data of a predetermined channel matched to the speaker system 219 based on the object data (audio data, position information) related to the object sound sources 1 and 2 obtained by the 3D audio decoder 217.
- the 3D audio renderer 218 refers to the speaker arrangement information and maps the audio data of each object sound source to the speaker existing at an arbitrary position based on the position information.
- the 3D audio renderer 218 includes a position correction calculation unit 218a.
- when view 1 (View1) is selected, the 3D audio renderer 218 uses the position information (r, θ, φ) included in the object data (audio data, position information) related to the sound sources 1 and 2 obtained by the 3D audio decoder 217 as it is.
- when view 2 or view 3 is selected, the 3D audio renderer 218 corrects the position information (r, θ, φ) included in the object data (audio data, position information) related to the sound sources 1 and 2 obtained by the 3D audio decoder 217 by the position correction calculation unit 218a using the conversion formula shown in FIG. 5, and uses the corrected position information (r′, θ′, φ′).
- when view 2 is selected, the position correction calculation unit 218a uses the difference component VP2 (Δx, Δy, Δz, Δθ, Δφ) between the positions and directions of view 1 and view 2 to correct (convert) the position information (r, θ, φ) based on view 1 into position information (r′, θ′, φ′) based on view 2.
- similarly, when view 3 is selected, the position correction calculation unit 218a uses the difference component VP3 (Δx, Δy, Δz, Δθ, Δφ) between the positions and directions of view 1 and view 3 to correct (convert) the position information (r, θ, φ) based on view 1 into position information (r′, θ′, φ′) based on view 3.
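The correction performed by the position correction calculation unit 218a can be sketched as follows. The exact conversion formula of FIG. 5 is not reproduced in this excerpt, so this is an illustrative assumption: the polar position is converted to Cartesian coordinates, the view-origin offset (Δx, Δy, Δz) is removed, and the angular differences (Δθ, Δφ) are then subtracted; θ is treated as azimuth and φ as elevation, both in degrees.

```python
import math

def correct_position(r, theta, phi, dx, dy, dz, dtheta, dphi):
    """Convert position info (r, theta, phi) based on view 1 into
    (r', theta', phi') based on the selected view, given the
    difference component (dx, dy, dz, dtheta, dphi)."""
    # polar (view 1 origin) -> Cartesian
    x = r * math.cos(math.radians(phi)) * math.cos(math.radians(theta))
    y = r * math.cos(math.radians(phi)) * math.sin(math.radians(theta))
    z = r * math.sin(math.radians(phi))
    # shift to the selected view's origin
    x, y, z = x - dx, y - dy, z - dz
    # Cartesian -> polar, then remove the angular differences
    r2 = math.sqrt(x * x + y * y + z * z)
    theta2 = math.degrees(math.atan2(y, x)) - dtheta
    phi2 = (math.degrees(math.asin(z / r2)) if r2 else 0.0) - dphi
    return r2, theta2, phi2
```

With a zero difference component the position is returned unchanged; a pure translation along the x axis shortens the radius of a source lying on that axis accordingly.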
- the speaker system 219 obtains an acoustic output corresponding to the display image of the display unit 216 based on audio data of a predetermined channel obtained by the 3D audio renderer 218.
- the reception unit 212 receives the transport stream TS transmitted from the transmission device 100 on broadcast waves or net packets.
- this transport stream TS includes three video streams each having the video data of view 1, view 2, and view 3, and an audio stream having the audio data and position information of the object sound source 1 and the object sound source 2.
- This transport stream TS is supplied to the system decoder 213.
- in the system decoder 213, three video stream packets each having the video data of view 1, view 2, and view 3 are extracted from the transport stream TS, and the three video streams are reconstructed. Further, the system decoder 213 extracts audio stream packets from the transport stream TS and reconstructs the audio stream.
- the system decoder 213 extracts various information such as descriptor information from the transport stream TS and sends it to the control unit 211.
- the various information includes information on the multi-view position information descriptor (see FIG. 18) when the difference components VP2 and VP3 are inserted as signaling.
- the various types of information include information on component group descriptors (see FIG. 11) when inserted into the layer of the audio stream.
- the three video streams reconstructed by the system decoder 213 are supplied to the selector 214.
- the selector 214 selectively outputs one of the three video streams based on selection control of the control unit 211 according to the user's view selection.
- the video stream output from the selector 214 is supplied to the video decoder 215.
- the video decoder 215 performs a decoding process on the video stream, and obtains video data of a view selected by the user.
- the video decoder 215 extracts various information inserted in the layer of the video stream and sends it to the control unit 211.
- the various information includes information of multi-view position information 2 (see FIG. 15) when the difference components VP2 and VP3 are inserted in the layer of the video stream.
- Video data obtained by the video decoder 215 is supplied to the display unit 216.
- in the display unit 216, the video data obtained by the video decoder 215 is subjected to scaling processing, image quality adjustment processing, and the like to obtain display video data, and an image based on the display video data is displayed on the display panel.
- the audio stream reconstructed by the system decoder 213 is supplied to the 3D audio decoder 217.
- the audio stream reconstructed by the system decoder 213 is subjected to decoding processing, and object data related to the object sound sources 1 and 2 is obtained.
- various information inserted in the layer of the audio stream is extracted by the 3D audio decoder 217 and sent to the control unit 211.
- the various information includes information of multi-view position information 1 (see FIG. 9) when the difference components VP2 and VP3 are inserted in the audio stream layer.
- the object data related to the object sound sources 1 and 2 obtained by the 3D audio decoder 217 is supplied to the 3D audio renderer 218.
- the 3D audio renderer 218 obtains audio data of a predetermined channel according to the speaker system 219 based on the object data (audio data, position information) related to the object sound sources 1 and 2.
- the speaker arrangement information is referred to, and the audio data of each object sound source is mapped to the speaker existing at an arbitrary position based on the position information.
- when view 2 or view 3 is selected, the position information included in the object data (audio data, position information) related to the sound sources 1 and 2 obtained by the 3D audio decoder 217 is corrected by the position correction calculation unit 218a using the difference components (Δx, Δy, Δz, Δθ, Δφ).
- the audio data of a predetermined channel output from the 3D audio renderer 218 is supplied to the speaker system 219.
- an acoustic output corresponding to the display image on the display unit 216 is obtained based on the audio data of the predetermined channel.
- FIG. 25 shows a configuration example of the receiving apparatus 200 when the difference components VP2 and VP3 are inserted as position correction information streams in the container layer.
- parts corresponding to those in FIG. 24 are denoted by the same reference numerals, and detailed description thereof is omitted.
- two position correction information streams each including the difference information VP2 and VP3 are obtained from the transport stream TS.
- the position correction information stream including the difference information VP2 is supplied to the position correction information decoder 221.
- the position correction information decoder 221 decodes the position correction information stream to obtain a difference component VP2.
- the position correction information stream including the difference information VP3 is supplied to the position correction information decoder 222.
- the position correction information stream is decoded to obtain a difference component VP3.
- difference components VP2 and VP3 are supplied to the 3D audio renderer 218.
- the speaker arrangement information is referred to, and the audio data of each object sound source is mapped to a speaker existing at an arbitrary position based on the position information (r, θ, φ).
- in this case, the position information (r′, θ′, φ′) of the object sound sources 1 and 2, corrected (converted) based on the difference components VP2 and VP3, respectively, is used.
- in this way, the transmission device 100 inserts position correction information (difference components VP2, VP3) for correcting (converting) the position information (r, θ, φ) based on view 1 of each object sound source into position information (r′, θ′, φ′) based on view 2 or view 3 into the layer of the audio stream, the layer of the video stream, or the container layer, and transmits it. Therefore, on the receiving side, when switching to view 2 or view 3, the position information of the object sound source can be corrected and used, and 3D audio rendering can be performed correctly.
- in the above description, an example is shown in which the position correction information is the difference components (Δx, Δy, Δz, Δθ, Δφ). However, the position correction information is not limited to the difference components (Δx, Δy, Δz, Δθ, Δφ).
- in the above description, an example has been shown in which the receiving device 200 acquires the difference components VP2 and VP3 from the audio stream layer, the video stream layer, or the container layer. However, it is also conceivable that the difference components are acquired from a server connected to the network.
- access information to the server may be transmitted from the transmission apparatus 100 to the reception apparatus 200 after being inserted into the audio stream layer, the video stream layer, or the container layer.
- FIG. 26 shows a configuration example of the receiving apparatus 200 in that case.
- portions corresponding to those in FIG. 24 are denoted by the same reference numerals, and detailed description thereof is omitted.
- the difference components VP2 and VP3 are acquired by accessing a server connected to the network.
- difference components VP2 and VP3 are supplied to the 3D audio renderer 218.
- the speaker arrangement information is referred to, and the audio data of each object sound source is mapped to a speaker existing at an arbitrary position based on the position information (r, θ, φ).
- in this case, the position information (r′, θ′, φ′) of the object sound sources 1 and 2, corrected (converted) based on the difference components VP2 and VP3, respectively, is used.
- in the above-described embodiment, an example in which the container is a transport stream (MPEG-2 TS) has been shown.
- the present technology can be similarly applied to a system in which distribution is performed in containers of MP4, MMT (MPEG Media Transport), or other formats.
- this technique can also take the following structures.
- (1) A transmission device including: an encoding unit that generates a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source with reference to the first view; and a transmission unit that transmits a container of a predetermined format including the first video stream, the second video stream, the audio stream, and position correction information for correcting the position information of the object sound source to position information with reference to the second view.
- (2) The transmission device according to (1), wherein the position correction information is a difference component between the positions and directions of the first view and the second view.
- the container is MPEG2-TS, and the position correction information is inserted into a video elementary stream loop corresponding to the second video stream in the program map table (the transmission device according to (9)).
- an information stream including the position correction information is inserted.
- A transmission method including: an encoding step of generating, by an encoding unit, a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source with reference to the first view; and a transmission step of transmitting, by a transmission unit, a container of a predetermined format including the first video stream, the second video stream, the audio stream, and position correction information for correcting the position information of the object sound source to position information with reference to the second view.
- A receiving device including: a receiving unit that receives a container of a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, an audio stream having audio data of an object sound source and position information of the object sound source with reference to the first view, and position correction information for correcting the position information of the object sound source to position information with reference to the second view; and
- a processing unit that processes the information contained in the container.
- the processing unit includes a decoding unit that obtains the video data of the first view, the video data of the second view, and the audio data and position information of the object sound source from the first video stream, the second video stream, and the audio stream;
- a selector that selectively outputs the video data of the first view or the video data of the second view
- a rendering unit that maps the audio data of the object sound source to an arbitrary speaker position based on the position information of the object sound source;
- the rendering unit uses the position information corrected to be based on the second view based on the position correction information.
- A receiving method including a receiving step of receiving, by a receiving unit, a container of a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, an audio stream having audio data of an object sound source and position information of the object sound source with reference to the first view, and position correction information for correcting the position information of the object sound source to position information with reference to the second view.
- A receiving device including: a receiving unit that receives a container of a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source with reference to the first view; and an acquisition unit that acquires position correction information for correcting the position information of the object sound source to position information with reference to the second view;
- a decoding unit for obtaining video data of the first view, video data of the second view, audio data of the object sound source, and position information from the first video stream, the second video stream, and the audio stream
- a selector that selectively outputs the video data of the first view or the video data of the second view;
- a rendering unit that maps the audio data of the object sound source to an arbitrary speaker position based on the position information of the object sound source, wherein, when the video data of the second view is selected by the selector, the rendering unit uses position information corrected based on the position correction information so as to be with reference to the second view.
- A receiving method including: a receiving step of receiving, by a receiving unit, a container of a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source with reference to the first view; an acquisition step of acquiring position correction information for correcting the position information of the object sound source to position information with reference to the second view; a decoding step of obtaining the video data of the first view, the video data of the second view, and the audio data and position information of the object sound source from the first video stream, the second video stream, and the audio stream; a selection step of selectively outputting the video data of the first view or the video data of the second view; and a rendering step of obtaining audio data matched to the speaker system based on the audio data and position information of the object sound source, wherein, in the rendering step, when the video data of the second view is selected in the selection step, position information corrected based on the position correction information so as to be with reference to the second view is used.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Databases & Information Systems (AREA)
- Otolaryngology (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Stereophonic System (AREA)
- Television Systems (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
Description
A concept of the present technology resides in a transmission device including:
an encoding unit that generates a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view; and
a transmission unit that transmits a container in a predetermined format including the first video stream, the second video stream, the audio stream, and position correction information for correcting the position information of the object sound source to position information relative to the second view.
Another concept of the present technology resides in a reception device including:
a receiving unit that receives a container in a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view, and position correction information for correcting the position information of the object sound source to position information relative to the second view; and
a processing unit that processes information included in the container.
A further concept of the present technology resides in a reception device including:
a receiving unit that receives a container in a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view;
an acquisition unit that acquires position correction information for correcting the position information of the object sound source to position information relative to the second view;
a decoding unit that obtains the video data of the first view, the video data of the second view, and the audio data and position information of the object sound source from the first video stream, the second video stream, and the audio stream;
a selector that selectively outputs the video data of the first view or the video data of the second view; and
a rendering unit that maps the audio data of the object sound source to arbitrary speaker positions based on the position information of the object sound source, wherein, when the video data of the second view is selected by the selector, the rendering unit uses position information corrected based on the position correction information so as to be relative to the second view.
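The correction performed by the rendering unit amounts to a change of reference frame for the object position. The sketch below is purely illustrative and not part of the patent: it assumes the position information is carried as polar coordinates (azimuth, elevation, radius) relative to view 1, and that the difference components consist of a Cartesian offset between the two camera positions plus an azimuth difference between their shooting directions; the actual coordinate conventions are defined elsewhere in the specification.

```python
import math

def correct_position(azimuth, elevation, radius, dx, dy, dz, dtheta):
    """Re-express an object position given relative to view 1 as a
    position relative to view 2, using hypothetical difference
    components: (dx, dy, dz) between the camera positions and the
    azimuth difference dtheta between their shooting directions.
    Angles are in radians; axis conventions are assumptions."""
    # polar -> Cartesian in view-1 coordinates
    x = radius * math.cos(elevation) * math.cos(azimuth)
    y = radius * math.cos(elevation) * math.sin(azimuth)
    z = radius * math.sin(elevation)
    # translate the origin to the view-2 camera position
    x, y, z = x - dx, y - dy, z - dz
    # rotate about the vertical axis by the direction difference
    xr = x * math.cos(dtheta) + y * math.sin(dtheta)
    yr = -x * math.sin(dtheta) + y * math.cos(dtheta)
    # Cartesian -> polar in view-2 coordinates
    r2 = math.sqrt(xr * xr + yr * yr + z * z)
    az2 = math.atan2(yr, xr)
    el2 = math.asin(z / r2) if r2 else 0.0
    return az2, el2, r2
```

With zero difference components the position is returned unchanged, which matches the case where view 1 itself is selected and no correction is applied.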
1. Embodiment
2. Modifications
[Configuration Example of the Transmission/Reception System]
FIG. 1 shows a configuration example of a transmission/reception system 10 as an embodiment. The transmission/reception system 10 is composed of a transmission device 100 and a reception device 200. The transmission device 100 transmits a transport stream TS carried on a broadcast wave or in network packets.
FIG. 6 shows a configuration example of the transmission device 100. The transmission device 100 has a control unit 111, video encoders 112, 113, and 114, a 3D audio encoder 115, a system encoder 116, and a transmission unit 117. The control unit 111 controls the operation of each unit of the transmission device 100.
The case where the difference components VP2 and VP3 are inserted in the layer of the audio stream will be described. In this case, the 3D audio encoder 115 inserts the difference components VP2 and VP3 as position correction information into the layer of the audio stream, specifically into a metadata region or a user data region.
The case where the difference components VP2 and VP3 are inserted in the layer of the video streams will be described. In this case, the video encoder 113 inserts the difference component VP2, and the video encoder 114 inserts the difference component VP3, as position correction information into the layer of the respective video stream, specifically into a user data region.
The case where the difference components VP2 and VP3 are inserted in the layer of the container (system) will be described. In this case, they are inserted either as signaling information or as a position correction information stream.
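Whichever layer carries them, the difference components must be serialized into a small payload. This excerpt does not fix a byte syntax, so the following is a hypothetical illustration only: it assumes each correction (VP2 or VP3) is three position deltas plus two direction deltas, packed as big-endian 32-bit floats; the field names and layout are inventions of this sketch, not the patent's table syntax.

```python
import struct

# Hypothetical wire format for one set of difference components:
# five big-endian 32-bit floats (dx, dy, dz in metres; dtheta, dphi
# in degrees). The real streams define their own table structures.
_FMT = ">5f"

def pack_correction(dx, dy, dz, dtheta, dphi):
    """Serialize one position-correction record (20 bytes)."""
    return struct.pack(_FMT, dx, dy, dz, dtheta, dphi)

def unpack_correction(payload):
    """Parse a record produced by pack_correction."""
    return struct.unpack(_FMT, payload)
```

A receiver-side parser for the metadata region, user data region, or signaling descriptor would call the unpack routine on the extracted payload bytes.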
FIG. 24 shows a configuration example of the reception device 200. The reception device 200 has a control unit 211, a receiving unit 212, a system decoder 213, a selector 214, a video decoder 215, a display unit 216, a 3D audio decoder 217, a 3D audio renderer 218, and a speaker system 219. The control unit 211 controls the operation of each unit of the reception device 200.
In the embodiment described above, an example was shown in which view 2 and view 3 exist in addition to view 1, and in which object sound sources 1 and 2 exist. In the present technology, the number of views and the number of object sound sources are not limited to this example.
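The interaction between the selector and the renderer can be sketched as a small dispatch: when an alternative view is selected, the renderer is handed corrected positions rather than the positions decoded from the audio stream. The helper below is a non-normative sketch with invented names; in the embodiment these roles belong to the selector 214, the 3D audio renderer 218, and its position correction computation unit 218a.

```python
def positions_for_view(view, base_positions, position_corrections):
    """Choose the position information handed to the 3D audio renderer.

    base_positions: {object_id: position} relative to view 1, as
    decoded from the audio stream.
    position_corrections: {view_id: callable} mapping each alternative
    view to a function that re-expresses a position relative to that
    view (built from the transmitted difference components).
    """
    if view == 1 or view not in position_corrections:
        # view 1 selected: use the decoded positions unchanged
        return dict(base_positions)
    correct = position_corrections[view]
    # alternative view selected: apply its position correction
    return {oid: correct(pos) for oid, pos in base_positions.items()}
```

The number of entries in `position_corrections` is open-ended, mirroring the point above that the technology is not limited to two alternative views or two object sound sources.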
(1) A transmission device including:
an encoding unit that generates a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view; and
a transmission unit that transmits a container in a predetermined format including the first video stream, the second video stream, the audio stream, and position correction information for correcting the position information of the object sound source to position information relative to the second view.
(2) The transmission device according to (1), in which the position correction information is a difference component of the positions and directions of the first view and the second view.
(3) The transmission device according to (1) or (2), in which the position correction information is inserted in a layer of the audio stream.
(4) The transmission device according to (3), in which the position correction information is inserted in a metadata region that includes the position information.
(5) The transmission device according to (3), in which the position correction information is inserted in a user data region.
(6) The transmission device according to (3), in which, when there is a plurality of second views, a plurality of pieces of position correction information corresponding to the plurality of second views is inserted in the layer of the audio stream, and
information indicating the second video stream to which each of the plurality of pieces of position correction information corresponds is inserted in a layer of the container.
(7) The transmission device according to (1) or (2), in which the position correction information is inserted in a layer of the second video stream.
(8) The transmission device according to (1) or (2), in which the position correction information is inserted in a layer of the container.
(9) The transmission device according to (8), in which the position correction information is inserted as signaling information.
(10) The transmission device according to (9), in which the container is an MPEG2-TS, and
the position correction information is inserted in a video elementary stream loop corresponding to the second video stream in a program map table.
(11) The transmission device according to (8), in which an information stream including the position correction information is inserted.
(12) A transmission method including:
an encoding step of generating a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view; and
a transmission step of transmitting, by a transmission unit, a container in a predetermined format including the first video stream, the second video stream, the audio stream, and position correction information for correcting the position information of the object sound source to position information relative to the second view.
(13) A reception device including:
a receiving unit that receives a container in a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view, and position correction information for correcting the position information of the object sound source to position information relative to the second view; and
a processing unit that processes information included in the container.
(14) The reception device according to (13), in which the processing unit includes:
a decoding unit that obtains the video data of the first view, the video data of the second view, and the audio data and position information of the object sound source from the first video stream, the second video stream, and the audio stream;
a selector that selectively outputs the video data of the first view or the video data of the second view; and
a rendering unit that maps the audio data of the object sound source to arbitrary speaker positions based on the position information of the object sound source,
the rendering unit using, when the video data of the second view is selected by the selector, position information corrected based on the position correction information so as to be relative to the second view.
(15) A reception method including:
a receiving step of receiving, by a receiving unit, a container in a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view, and position correction information for correcting the position information of the object sound source to position information relative to the second view; and
a processing step of processing information included in the container.
(16) A reception device including:
a receiving unit that receives a container in a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view;
an acquisition unit that acquires position correction information for correcting the position information of the object sound source to position information relative to the second view;
a decoding unit that obtains the video data of the first view, the video data of the second view, and the audio data and position information of the object sound source from the first video stream, the second video stream, and the audio stream;
a selector that selectively outputs the video data of the first view or the video data of the second view; and
a rendering unit that maps the audio data of the object sound source to arbitrary speaker positions based on the position information of the object sound source,
the rendering unit using, when the video data of the second view is selected by the selector, position information corrected based on the position correction information so as to be relative to the second view.
(17) The reception device according to (16), in which the acquisition unit acquires the position correction information from a layer of the audio stream, a layer of the second video stream, or a layer of the container.
(18) The reception device according to (16), in which the acquisition unit acquires the position correction information from a server on a network.
(19) A reception method including:
a receiving step of receiving, by a receiving unit, a container in a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view;
an acquisition step of acquiring position correction information for correcting the position information of the object sound source to position information relative to the second view;
a decoding step of obtaining the video data of the first view, the video data of the second view, and the audio data and position information of the object sound source from the first video stream, the second video stream, and the audio stream;
a selection step of selectively outputting the video data of the first view or the video data of the second view; and
a rendering step of obtaining audio data tailored to the speaker system based on the audio data and position information of the object sound source,
in which, when the video data of the second view is selected in the selection step, position information corrected based on the position correction information so as to be relative to the second view is used.
100・・・Transmission device
111・・・Control unit
112, 113, 114・・・Video encoders
115・・・3D audio encoder
116・・・System encoder
117・・・Transmission unit
118, 119・・・Position correction information encoders
200・・・Reception device
211・・・Control unit
212・・・Receiving unit
213・・・System decoder
214・・・Selector
215・・・Video decoder
216・・・Display unit
217・・・3D audio decoder
218・・・3D audio renderer
218a・・・Position correction computation unit
219・・・Speaker system
221, 222・・・Position correction information decoders
231・・・Communication interface
Claims (19)
- A transmission device comprising:
an encoding unit that generates a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view; and
a transmission unit that transmits a container in a predetermined format including the first video stream, the second video stream, the audio stream, and position correction information for correcting the position information of the object sound source to position information relative to the second view.
- The transmission device according to claim 1, wherein the position correction information is a difference component of the positions and directions of the first view and the second view.
- The transmission device according to claim 1, wherein the position correction information is inserted in a layer of the audio stream.
- The transmission device according to claim 3, wherein the position correction information is inserted in a metadata region that includes the position information.
- The transmission device according to claim 3, wherein the position correction information is inserted in a user data region.
- The transmission device according to claim 3, wherein, when there is a plurality of second views, a plurality of pieces of position correction information corresponding to the plurality of second views is inserted in the layer of the audio stream, and
information indicating the second video stream to which each of the plurality of pieces of position correction information corresponds is inserted in a layer of the container.
- The transmission device according to claim 1, wherein the position correction information is inserted in a layer of the second video stream.
- The transmission device according to claim 1, wherein the position correction information is inserted in a layer of the container.
- The transmission device according to claim 8, wherein the position correction information is inserted as signaling information.
- The transmission device according to claim 9, wherein the container is an MPEG2-TS, and
the position correction information is inserted in a video elementary stream loop corresponding to the second video stream in a program map table.
- The transmission device according to claim 8, wherein an information stream including the position correction information is inserted.
- A transmission method comprising:
an encoding step of generating a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view; and
a transmission step of transmitting, by a transmission unit, a container in a predetermined format including the first video stream, the second video stream, the audio stream, and position correction information for correcting the position information of the object sound source to position information relative to the second view.
- A reception device comprising:
a receiving unit that receives a container in a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view, and position correction information for correcting the position information of the object sound source to position information relative to the second view; and
a processing unit that processes information included in the container.
- The reception device according to claim 13, wherein the processing unit includes:
a decoding unit that obtains the video data of the first view, the video data of the second view, and the audio data and position information of the object sound source from the first video stream, the second video stream, and the audio stream;
a selector that selectively outputs the video data of the first view or the video data of the second view; and
a rendering unit that maps the audio data of the object sound source to arbitrary speaker positions based on the position information of the object sound source,
wherein, when the video data of the second view is selected by the selector, the rendering unit uses position information corrected based on the position correction information so as to be relative to the second view.
- A reception method comprising:
a receiving step of receiving, by a receiving unit, a container in a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view, and position correction information for correcting the position information of the object sound source to position information relative to the second view; and
a processing step of processing information included in the container.
- A reception device comprising:
a receiving unit that receives a container in a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view;
an acquisition unit that acquires position correction information for correcting the position information of the object sound source to position information relative to the second view;
a decoding unit that obtains the video data of the first view, the video data of the second view, and the audio data and position information of the object sound source from the first video stream, the second video stream, and the audio stream;
a selector that selectively outputs the video data of the first view or the video data of the second view; and
a rendering unit that maps the audio data of the object sound source to arbitrary speaker positions based on the position information of the object sound source,
wherein, when the video data of the second view is selected by the selector, the rendering unit uses position information corrected based on the position correction information so as to be relative to the second view.
- The reception device according to claim 16, wherein the acquisition unit acquires the position correction information from a layer of the audio stream, a layer of the second video stream, or a layer of the container.
- The reception device according to claim 16, wherein the acquisition unit acquires the position correction information from a server on a network.
- A reception method comprising:
a receiving step of receiving, by a receiving unit, a container in a predetermined format including a first video stream having video data of a first view, a second video stream having video data of a second view, and an audio stream having audio data of an object sound source and position information of the object sound source relative to the first view;
an acquisition step of acquiring position correction information for correcting the position information of the object sound source to position information relative to the second view;
a decoding step of obtaining the video data of the first view, the video data of the second view, and the audio data and position information of the object sound source from the first video stream, the second video stream, and the audio stream;
a selection step of selectively outputting the video data of the first view or the video data of the second view; and
a rendering step of obtaining audio data tailored to the speaker system based on the audio data and position information of the object sound source,
wherein, when the video data of the second view is selected in the selection step, position information corrected based on the position correction information so as to be relative to the second view is used.
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020177013143A KR102605480B1 (ko) | 2014-11-28 | 2015-11-09 | 송신 장치, 송신 방법, 수신 장치 및 수신 방법 |
CN201580063452.7A CN107004419B (zh) | 2014-11-28 | 2015-11-09 | 发送装置、发送方法、接收装置和接收方法 |
JP2016561483A JP6624068B2 (ja) | 2014-11-28 | 2015-11-09 | 送信装置、送信方法、受信装置および受信方法 |
US15/523,723 US10880597B2 (en) | 2014-11-28 | 2015-11-09 | Transmission device, transmission method, reception device, and reception method |
CA2967249A CA2967249C (en) | 2014-11-28 | 2015-11-09 | Transmission device, transmission method, reception device, and reception method |
EP15862526.9A EP3226241B1 (en) | 2014-11-28 | 2015-11-09 | Transmission device, transmission method, reception device, and reception method |
MX2017006581A MX2017006581A (es) | 2014-11-28 | 2015-11-09 | Dispositivo de transmision, metodo de transmision, dispositivo de recepcion, y metodo de recepcion. |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014241953 | 2014-11-28 | ||
JP2014-241953 | 2014-11-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016084592A1 true WO2016084592A1 (ja) | 2016-06-02 |
Family
ID=56074162
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/081524 WO2016084592A1 (ja) | 2014-11-28 | 2015-11-09 | 送信装置、送信方法、受信装置および受信方法 |
Country Status (8)
Country | Link |
---|---|
US (1) | US10880597B2 (ja) |
EP (1) | EP3226241B1 (ja) |
JP (1) | JP6624068B2 (ja) |
KR (1) | KR102605480B1 (ja) |
CN (1) | CN107004419B (ja) |
CA (1) | CA2967249C (ja) |
MX (1) | MX2017006581A (ja) |
WO (1) | WO2016084592A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019187434A1 (ja) * | 2018-03-29 | 2019-10-03 | ソニー株式会社 | 情報処理装置、情報処理方法、及びプログラム |
WO2019187437A1 (ja) * | 2018-03-29 | 2019-10-03 | ソニー株式会社 | 情報処理装置、情報処理方法、及びプログラム |
WO2020153092A1 (ja) * | 2019-01-25 | 2020-07-30 | ソニー株式会社 | 情報処理装置及び情報処理方法 |
JPWO2019187430A1 (ja) * | 2018-03-29 | 2021-04-08 | ソニー株式会社 | 情報処理装置、方法、及びプログラム |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9792957B2 (en) | 2014-10-08 | 2017-10-17 | JBF Interlude 2009 LTD | Systems and methods for dynamic video bookmarking |
US10460765B2 (en) * | 2015-08-26 | 2019-10-29 | JBF Interlude 2009 LTD | Systems and methods for adaptive and responsive video |
US11856271B2 (en) | 2016-04-12 | 2023-12-26 | JBF Interlude 2009 LTD | Symbiotic interactive video |
CN106774930A (zh) * | 2016-12-30 | 2017-05-31 | 中兴通讯股份有限公司 | 一种数据处理方法、装置及采集设备 |
US11050809B2 (en) | 2016-12-30 | 2021-06-29 | JBF Interlude 2009 LTD | Systems and methods for dynamic weighting of branched video paths |
US10820034B2 (en) | 2017-05-26 | 2020-10-27 | At&T Intellectual Property I, L.P. | Providing streaming video from mobile computing nodes |
US10257578B1 (en) | 2018-01-05 | 2019-04-09 | JBF Interlude 2009 LTD | Dynamic library display for interactive videos |
GB2574238A (en) * | 2018-05-31 | 2019-12-04 | Nokia Technologies Oy | Spatial audio parameter merging |
US11601721B2 (en) | 2018-06-04 | 2023-03-07 | JBF Interlude 2009 LTD | Interactive video dynamic adaptation and user profiling |
JP2020005038A (ja) * | 2018-06-25 | 2020-01-09 | キヤノン株式会社 | 送信装置、送信方法、受信装置、受信方法、及び、プログラム |
CN110858925B (zh) * | 2018-08-22 | 2021-10-15 | 华为技术有限公司 | 一种实现视频流切换的方法、设备、系统和存储介质 |
US20200296462A1 (en) | 2019-03-11 | 2020-09-17 | Wci One, Llc | Media content presentation |
US12096081B2 (en) | 2020-02-18 | 2024-09-17 | JBF Interlude 2009 LTD | Dynamic adaptation of interactive video players using behavioral analytics |
US12047637B2 (en) | 2020-07-07 | 2024-07-23 | JBF Interlude 2009 LTD | Systems and methods for seamless audio and video endpoint transitions |
EP3968643A1 (en) * | 2020-09-11 | 2022-03-16 | Nokia Technologies Oy | Alignment control information for aligning audio and video playback |
US11882337B2 (en) | 2021-05-28 | 2024-01-23 | JBF Interlude 2009 LTD | Automated platform for generating interactive videos |
US11934477B2 (en) | 2021-09-24 | 2024-03-19 | JBF Interlude 2009 LTD | Video player integration within websites |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005229618A (ja) * | 2004-02-13 | 2005-08-25 | Texas Instruments Inc | 動的音源とリスナーの位置による音声レンダリング |
JP2012004835A (ja) * | 2010-06-16 | 2012-01-05 | Canon Inc | 再生装置及びその制御方法及びプログラム |
WO2015162947A1 (ja) * | 2014-04-22 | 2015-10-29 | ソニー株式会社 | 情報再生装置及び情報再生方法、並びに情報記録装置及び情報記録方法 |
Family Cites Families (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7448063B2 (en) * | 1991-11-25 | 2008-11-04 | Actv, Inc. | Digital interactive system for providing full interactivity with live programming events |
US20040261127A1 (en) * | 1991-11-25 | 2004-12-23 | Actv, Inc. | Digital interactive system for providing full interactivity with programming events |
US5823786A (en) * | 1993-08-24 | 1998-10-20 | Easterbrook; Norman John | System for instruction of a pupil |
US5835667A (en) * | 1994-10-14 | 1998-11-10 | Carnegie Mellon University | Method and apparatus for creating a searchable digital video library and a system and method of using such a library |
US5714997A (en) * | 1995-01-06 | 1998-02-03 | Anderson; David P. | Virtual reality television system |
TW436777B (en) * | 1995-09-29 | 2001-05-28 | Matsushita Electric Ind Co Ltd | A method and an apparatus for reproducing bitstream having non-sequential system clock data seamlessly therebetween |
CA2269778A1 (en) * | 1996-09-16 | 1998-03-19 | Advanced Research Solutions, Llc | Data correlation and analysis tool |
US6353461B1 (en) * | 1997-06-13 | 2002-03-05 | Panavision, Inc. | Multiple camera video assist control system |
US6961954B1 (en) * | 1997-10-27 | 2005-11-01 | The Mitre Corporation | Automated segmentation, information extraction, summarization, and presentation of broadcast news |
US6750919B1 (en) * | 1998-01-23 | 2004-06-15 | Princeton Video Image, Inc. | Event linked insertion of indicia into video |
KR100324512B1 (ko) * | 1998-07-14 | 2002-06-26 | 구자홍 | 실시간데이터기록및재생장치와그제어방법 |
US6714909B1 (en) * | 1998-08-13 | 2004-03-30 | At&T Corp. | System and method for automated multimedia content indexing and retrieval |
US6144375A (en) * | 1998-08-14 | 2000-11-07 | Praja Inc. | Multi-perspective viewer for content-based interactivity |
US6266100B1 (en) * | 1998-09-04 | 2001-07-24 | Sportvision, Inc. | System for enhancing a video presentation of a live event |
US6229550B1 (en) * | 1998-09-04 | 2001-05-08 | Sportvision, Inc. | Blending a graphic |
US6825875B1 (en) * | 1999-01-05 | 2004-11-30 | Interval Research Corporation | Hybrid recording unit including portable video recorder and auxillary device |
US6466275B1 (en) * | 1999-04-16 | 2002-10-15 | Sportvision, Inc. | Enhancing a video of an event at a remote location using data acquired at the event |
EP1275247A2 (en) * | 2000-03-31 | 2003-01-15 | United Video Properties, Inc. | Personal video recording system with home surveillance feed |
US20020115047A1 (en) * | 2001-02-16 | 2002-08-22 | Golftec, Inc. | Method and system for marking content for physical motion analysis |
US6537076B2 (en) * | 2001-02-16 | 2003-03-25 | Golftec Enterprises Llc | Method and system for presenting information for physical motion analysis |
US20020170068A1 (en) * | 2001-03-19 | 2002-11-14 | Rafey Richter A. | Virtual and condensed television programs |
US7203693B2 (en) * | 2001-06-12 | 2007-04-10 | Lucent Technologies Inc. | Instantly indexed databases for multimedia content analysis and retrieval |
US20030033602A1 (en) * | 2001-08-08 | 2003-02-13 | Simon Gibbs | Method and apparatus for automatic tagging and caching of highlights |
US8947347B2 (en) * | 2003-08-27 | 2015-02-03 | Sony Computer Entertainment Inc. | Controlling actions in a video game unit |
EP1757087A4 (en) * | 2004-04-16 | 2009-08-19 | James A Aman | AUTOMATIC VIDEO RECORDING OF EVENTS, PURSUIT AND CONTENT PRODUCTION SYSTEM |
DE102005008369A1 (de) * | 2005-02-23 | 2006-09-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Vorrichtung und Verfahren zum Simulieren eines Wellenfeldsynthese-Systems |
US8306131B2 (en) * | 2005-02-25 | 2012-11-06 | Kyocera Corporation | Communications systems |
JP4669340B2 (ja) * | 2005-07-28 | 2011-04-13 | 富士通株式会社 | 情報処理装置、情報処理方法および情報処理プログラム |
JP4683227B2 (ja) | 2006-05-30 | 2011-05-18 | 日本電気株式会社 | 映像音声ファイルシステム |
EP2092409B1 (en) * | 2006-12-01 | 2019-01-30 | LG Electronics Inc. | Apparatus and method for inputting a command, method for displaying user interface of media signal, and apparatus for implementing the same, apparatus for processing mix signal and method thereof |
KR101461958B1 (ko) * | 2007-06-29 | 2014-11-14 | 엘지전자 주식회사 | 디지털 방송 시스템 및 데이터 처리 방법 |
JP5593596B2 (ja) * | 2008-02-04 | 2014-09-24 | ソニー株式会社 | 映像信号送信装置および映像信号送信方法 |
JP4557035B2 (ja) * | 2008-04-03 | 2010-10-06 | ソニー株式会社 | 情報処理装置、情報処理方法、プログラム及び記録媒体 |
CN101350931B (zh) * | 2008-08-27 | 2011-09-14 | 华为终端有限公司 | 音频信号的生成、播放方法及装置、处理系统 |
US20110052155A1 (en) * | 2009-09-02 | 2011-03-03 | Justin Desmarais | Methods for producing low-cost, high-quality video excerpts using an automated sequence of camera switches |
US8749609B2 (en) * | 2009-09-03 | 2014-06-10 | Samsung Electronics Co., Ltd. | Apparatus, system and method for video call |
US8370358B2 (en) * | 2009-09-18 | 2013-02-05 | Microsoft Corporation | Tagging content with metadata pre-filtered by context |
EP2346028A1 (en) * | 2009-12-17 | 2011-07-20 | Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. | An apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal |
KR101090356B1 (ko) * | 2009-12-28 | 2011-12-13 | 주식회사 더블유코퍼레이션 | 오디오 신호 및 비디오 신호의 동기화 오차 보정 방법 및 장치 |
US9699431B2 (en) * | 2010-02-10 | 2017-07-04 | Satarii, Inc. | Automatic tracking, recording, and teleprompting device using multimedia stream with video and digital slide |
US9704393B2 (en) * | 2011-01-11 | 2017-07-11 | Videonetics Technology Private Limited | Integrated intelligent server based system and method/systems adapted to facilitate fail-safe integration and/or optimized utilization of various sensory inputs |
WO2012103649A1 (en) * | 2011-01-31 | 2012-08-09 | Cast Group Of Companies Inc. | System and method for providing 3d sound |
EP2541547A1 (en) | 2011-06-30 | 2013-01-02 | Thomson Licensing | Method and apparatus for changing the relative positions of sound objects contained within a higher-order ambisonics representation |
KR101843834B1 (ko) | 2011-07-01 | 2018-03-30 | 돌비 레버러토리즈 라이쎈싱 코오포레이션 | 향상된 3d 오디오 오서링과 렌더링을 위한 시스템 및 툴들 |
US8867886B2 (en) * | 2011-08-08 | 2014-10-21 | Roy Feinson | Surround video playback |
US8917877B2 (en) * | 2011-10-12 | 2014-12-23 | Sony Corporation | Distance-based rendering of media files |
JP2013090016A (ja) | 2011-10-13 | 2013-05-13 | Sony Corp | 送信装置、送信方法、受信装置および受信方法 |
US20130129304A1 (en) * | 2011-11-22 | 2013-05-23 | Roy Feinson | Variable 3-d surround video playback with virtual panning and smooth transition |
WO2013149672A1 (en) | 2012-04-05 | 2013-10-10 | Huawei Technologies Co., Ltd. | Method for determining an encoding parameter for a multi-channel audio signal and multi-channel audio encoder |
US20140002582A1 (en) * | 2012-06-29 | 2014-01-02 | Monkeymedia, Inc. | Portable proprioceptive peripatetic polylinear video player |
US8929573B2 (en) * | 2012-09-14 | 2015-01-06 | Bose Corporation | Powered headset accessory devices |
KR20140102386A (ko) * | 2013-02-13 | 2014-08-22 | 삼성전자주식회사 | 디스플레이장치 및 그 제어방법 |
CN104023265A (zh) * | 2013-03-01 | 2014-09-03 | 联想(北京)有限公司 | 一种音频信息流的切换方法、装置及电子设备 |
US9282399B2 (en) * | 2014-02-26 | 2016-03-08 | Qualcomm Incorporated | Listen to people you recognize |
US9693009B2 (en) * | 2014-09-12 | 2017-06-27 | International Business Machines Corporation | Sound source selection for aural interest |
US9930405B2 (en) * | 2014-09-30 | 2018-03-27 | Rovi Guides, Inc. | Systems and methods for presenting user selected scenes |
- 2015
- 2015-11-09 EP EP15862526.9A patent/EP3226241B1/en active Active
- 2015-11-09 JP JP2016561483A patent/JP6624068B2/ja active Active
- 2015-11-09 CN CN201580063452.7A patent/CN107004419B/zh active Active
- 2015-11-09 KR KR1020177013143A patent/KR102605480B1/ko active IP Right Grant
- 2015-11-09 CA CA2967249A patent/CA2967249C/en active Active
- 2015-11-09 US US15/523,723 patent/US10880597B2/en active Active
- 2015-11-09 MX MX2017006581A patent/MX2017006581A/es unknown
- 2015-11-09 WO PCT/JP2015/081524 patent/WO2016084592A1/ja active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005229618A (ja) * | 2004-02-13 | 2005-08-25 | Texas Instruments Inc | 動的音源とリスナーの位置による音声レンダリング |
JP2012004835A (ja) * | 2010-06-16 | 2012-01-05 | Canon Inc | 再生装置及びその制御方法及びプログラム |
WO2015162947A1 (ja) * | 2014-04-22 | 2015-10-29 | ソニー株式会社 | 情報再生装置及び情報再生方法、並びに情報記録装置及び情報記録方法 |
Non-Patent Citations (2)
Title |
---|
"Information technology - Coding of audio-visual objects -Part 1: Systems", INTERNATIONAL STANDARD ISO/IEC14496-1, June 2010 (2010-06-01), pages 158, XP082004073 * |
See also references of EP3226241A4 * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7314929B2 (ja) | 2018-03-29 | 2023-07-26 | ソニーグループ株式会社 | 情報処理装置、情報処理方法、及びプログラム |
US11533348B2 (en) | 2018-03-29 | 2022-12-20 | Sony Group Corporation | Information processing apparatus, information processing method, and program |
JP7396267B2 (ja) | 2018-03-29 | 2023-12-12 | ソニーグループ株式会社 | 情報処理装置、情報処理方法、及びプログラム |
JPWO2019187437A1 (ja) * | 2018-03-29 | 2021-04-01 | ソニー株式会社 | 情報処理装置、情報処理方法、及びプログラム |
JPWO2019187434A1 (ja) * | 2018-03-29 | 2021-04-01 | ソニー株式会社 | 情報処理装置、情報処理方法、及びプログラム |
JPWO2019187430A1 (ja) * | 2018-03-29 | 2021-04-08 | ソニー株式会社 | 情報処理装置、方法、及びプログラム |
WO2019187437A1 (ja) * | 2018-03-29 | 2019-10-03 | ソニー株式会社 | 情報処理装置、情報処理方法、及びプログラム |
US11323757B2 (en) | 2018-03-29 | 2022-05-03 | Sony Group Corporation | Information processing apparatus, information processing method, and program |
US11743520B2 (en) | 2018-03-29 | 2023-08-29 | Sony Group Corporation | Information processing apparatus, information processing method, and program |
WO2019187434A1 (ja) * | 2018-03-29 | 2019-10-03 | ソニー株式会社 | 情報処理装置、情報処理方法、及びプログラム |
JPWO2020153092A1 (ja) * | 2019-01-25 | 2021-12-02 | ソニーグループ株式会社 | 情報処理装置及び情報処理方法 |
WO2020153092A1 (ja) * | 2019-01-25 | 2020-07-30 | ソニー株式会社 | 情報処理装置及び情報処理方法 |
JP7415954B2 (ja) | 2019-01-25 | 2024-01-17 | ソニーグループ株式会社 | 情報処理装置及び情報処理方法 |
US12073841B2 (en) | 2019-01-25 | 2024-08-27 | Sony Group Corporation | Information processing device and information processing method |
Also Published As
Publication number | Publication date |
---|---|
US10880597B2 (en) | 2020-12-29 |
KR20170088843A (ko) | 2017-08-02 |
CA2967249C (en) | 2023-03-14 |
KR102605480B1 (ko) | 2023-11-24 |
MX2017006581A (es) | 2017-09-01 |
EP3226241B1 (en) | 2022-08-17 |
CN107004419A (zh) | 2017-08-01 |
JPWO2016084592A1 (ja) | 2017-09-07 |
CA2967249A1 (en) | 2016-06-02 |
JP6624068B2 (ja) | 2019-12-25 |
EP3226241A4 (en) | 2018-06-20 |
EP3226241A1 (en) | 2017-10-04 |
US20180310049A1 (en) | 2018-10-25 |
CN107004419B (zh) | 2021-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6624068B2 (ja) | 送信装置、送信方法、受信装置および受信方法 | |
US20210243463A1 (en) | Transmission device, transmitting method, reception device, and receiving method | |
CN105981391B (zh) | 发送装置、发送方法、接收装置、接收方法、显示装置及显示方法 | |
JP6908168B2 (ja) | 受信装置、受信方法、送信装置および送信方法 | |
CN110622516B (zh) | 用于鱼眼视频数据的高级发信号 | |
KR20200024829A (ko) | Dash 에서 피쉬아이 가상 현실 비디오에 대한 강화된 하이레벨 시그널링 | |
WO2017006948A1 (ja) | 受信装置、受信方法、送信装置および送信方法 | |
JP2021105735A (ja) | 受信装置および受信方法 | |
JP2024015131A (ja) | 送信装置、送信方法、受信装置および受信方法 | |
WO2017104519A1 (ja) | 送信装置、送信方法、受信装置および受信方法 | |
CA3071560A1 (en) | Transmission apparatus, transmission method, reception apparatus, and reception method | |
KR20150045869A (ko) | 전송 스트림 시스템 타겟 디코더 모델에 기초한 하이브리드 서비스를 제공하는 영상 수신 장치 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15862526 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016561483 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15523723 Country of ref document: US |
|
ENP | Entry into the national phase |
Ref document number: 2967249 Country of ref document: CA |
|
ENP | Entry into the national phase |
Ref document number: 20177013143 Country of ref document: KR Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: MX/A/2017/006581 Country of ref document: MX |
|
REEP | Request for entry into the european phase |
Ref document number: 2015862526 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |