WO2016129412A1 - Transmission device, transmission method, reception device, and reception method - Google Patents
Transmission device, transmission method, reception device, and reception method
- Publication number
- WO2016129412A1 (PCT/JP2016/052610)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- packet
- audio
- stream
- data
- information
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Definitions
- the present technology relates to a transmission device, a transmission method, a reception device, and a reception method, and particularly to a transmission device that handles an audio stream.
- For example, object data composed of encoded sample data and metadata is transmitted together with channel data such as 5.1-channel and 7.1-channel data, so that sound reproduction with enhanced realism can be performed on the receiving side.
- MPEG-H 3D Audio is known as an encoding scheme for such 3D audio.
- The audio frame constituting the audio stream includes a “Frame” packet (first packet) having encoded data as payload information, and a “Config” packet (second packet) having, as payload information, configuration information indicating the configuration of the payload information of the “Frame” packet.
- Conventionally, association information linking a “Frame” packet to its corresponding “Config” packet has not been inserted into the “Frame” packet.
- The order of a plurality of “Frame” packets included in an audio frame is restricted according to the type of encoded data included in the payload, so that the decoding process is performed appropriately. Therefore, for example, when a plurality of audio streams are integrated into one audio stream on the receiving side, this restriction must be observed, which increases the processing load.
- the purpose of this technology is to reduce the processing load when integrating multiple audio streams on the receiving side.
- The concept of this technology is a transmission device comprising: an encoding unit that generates a predetermined number of audio streams; and a transmission unit that transmits a container in a predetermined format including the predetermined number of audio streams.
- The audio stream is composed of audio frames, each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet. In this transmission device, common index information is inserted into the payloads of the related first and second packets.
- the audio stream is composed of an audio frame including a first packet having encoded data as payload information and a second packet having configuration information indicating the configuration of the payload information of the first packet as payload information.
- the encoded data that the first packet has as payload information may be channel encoded data or object encoded data. Common index information is inserted into the payloads of the associated first packet and second packet.
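The relationship between the two packet types can be illustrated with a minimal sketch; the class and field names below are hypothetical and do not follow the actual MPEG-H syntax:

```python
from dataclasses import dataclass

# Hypothetical, simplified model of the two packet types described above.
@dataclass
class ConfigPacket:
    element_index: int   # common index shared with the related Frame packet
    config_info: str     # configuration of the related Frame payload

@dataclass
class FramePacket:
    element_index: int   # same value as in the related Config packet
    encoded_data: bytes  # channel encoded data or object encoded data

def is_related(config: ConfigPacket, frame: FramePacket) -> bool:
    """A Frame and a Config belong together iff their indices match."""
    return config.element_index == frame.element_index
```

With this linkage, a receiver can match each Frame to its Config by index alone, regardless of packet order.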
- the transmission unit transmits a container of a predetermined format including the predetermined number of audio streams.
- the container may be a transport stream (MPEG-2 TS) adopted in the digital broadcasting standard.
- MPEG-2 TS transport stream
- the container may be MP4 used for Internet distribution or the like, or a container of other formats.
- the order of the plurality of first packets included in the audio frame is not limited by the order definition according to the type of encoded data included in the payload. Therefore, for example, when a single audio stream is generated by integrating a plurality of audio streams on the receiving side, it is not necessary to observe the order definition, and the processing load can be reduced.
- Another concept of this technology is a receiving device comprising: a receiving unit that receives a container in a predetermined format including a predetermined number of audio streams, wherein the audio stream is composed of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and common index information is inserted in the payloads of the related first and second packets;
- a stream integration unit that extracts some or all of the first and second packets from the predetermined number of audio streams and integrates them into one audio stream using the index information inserted in the payload portions of those packets; and a processing unit that processes the one audio stream.
- a container having a predetermined format including a predetermined number of audio streams is received by the receiving unit.
- the audio stream is composed of an audio frame including a first packet having encoded data as payload information and a second packet having configuration information indicating the configuration of the payload information of the first packet as payload information. Then, common index information is inserted in the payloads of the related first packet and second packet.
- The stream integration unit extracts some or all of the first and second packets from the predetermined number of audio streams and integrates them into one audio stream using the index information inserted in the payload portions of the first and second packets.
- In this case, the order of the plurality of first packets included in the audio frame is not limited by the order definition according to the type of encoded data included in the payload, and the configurations of the audio streams are integrated without being decomposed.
- One audio stream is processed by the processing unit.
- the processing unit may perform a decoding process on one audio stream.
- the processing unit may be configured to transmit one audio stream to an external device.
- In the present technology, some or all of the first and second packets extracted from the predetermined number of audio streams are integrated into one audio stream using the index information inserted in their payload portions. Therefore, the audio streams can be integrated without decomposing their configurations, and the processing load can be reduced.
- FIG. 1 shows a configuration example of a transmission / reception system 10 as an embodiment.
- the transmission / reception system 10 includes a service transmitter 100 and a service receiver 200.
- The service transmitter 100 transmits the transport stream TS on broadcast waves or in network packets.
- the transport stream TS has a predetermined number, that is, one or a plurality of audio streams in addition to the video stream.
- The audio stream is composed of audio frames each including a first packet (“Frame” packet) having encoded data as payload information and a second packet (“Config” packet) having, as payload information, configuration information indicating the configuration of the payload information of the first packet; common index information is inserted into the payloads of the related first and second packets.
- FIG. 2 shows an example of the structure of an audio frame (1024 samples) in 3D audio transmission data handled in this embodiment.
- This audio frame is composed of a plurality of MPEG audio stream packets.
- Each MPEG audio stream packet is composed of a header and a payload.
- The header contains information such as the packet type (Packet Type), packet label (Packet Label), and packet length (Packet Length).
- payload information defined by the packet type of the header is arranged.
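For illustration, a header carrying these three fields could be parsed as below. This assumes a hypothetical fixed-width layout (1-byte type, 1-byte label, 2-byte big-endian length); the real header may use variable-length coding, which is omitted here:

```python
import struct

def parse_packet_header(buf: bytes) -> dict:
    # Hypothetical fixed-width layout for illustration only.
    pkt_type, pkt_label, pkt_length = struct.unpack(">BBH", buf[:4])
    return {"type": pkt_type, "label": pkt_label, "length": pkt_length}
```

The payload that follows the header is then interpreted according to the parsed packet type.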
- the payload information includes “SYNC” corresponding to the synchronization start code, “Frame” that is actual data of 3D audio transmission data, and “Config” indicating the configuration of this “Frame”.
- “Frame” includes channel encoded data and object encoded data constituting 3D audio transmission data. Note that there are cases where only channel encoded data is included, or only object encoded data is included.
- the channel encoded data is composed of encoded sample data such as SCE (Single Channel Element), CPE (Channel Pair Element), and LFE (Low Frequency Element).
- the object encoded data is composed of SCE (Single Channel Element) encoded sample data and metadata for rendering it by mapping it to a speaker located at an arbitrary position. This metadata is included as an extension element (Ext_element).
- identification information for identifying the related “Config” is inserted into each “Frame”. That is, common index information is inserted into the related “Frame” and “Config”.
- FIG. 3A shows a configuration example of a conventional audio stream.
- As “Config”, configuration information “SCE_config” corresponding to the “Frame” of SCE, configuration information “CPE_config” corresponding to the “Frame” of CPE, and configuration information “EXE_config” corresponding to the “Frame” of EXE exist.
- The order of elements is defined as SCE → CPE → EXE, as shown in FIG. 3(a), so that the decoding process is performed appropriately; an order such as CPE → SCE → EXE is not permitted.
- FIG. 3B shows a configuration example of an audio stream in this embodiment.
- As “Config”, configuration information “SCE_config” corresponding to the “Frame” of SCE exists, and “Id0” is added as an element index to this configuration information “SCE_config”.
- configuration information “CPE_config” corresponding to “Frame” of CPE exists as “Config”, and “Id1” is added as an element index to this configuration information “CPE_config”. Also, as “Config”, there is configuration information “EXE_config” corresponding to EXE “Frame”, and “Id2” is added as an element index to this configuration information “EXE_config”.
- an element index common to the related “Config” is added to each “Frame”. That is, “Id0” is added as an element index to “Frame” of SCE. Also, “Id1” is added as an element index to “Frame” of CPE. In addition, “Id2” is added to the “Frame” of EXE as an element index.
- FIG. 4A schematically shows a configuration example of “Config”.
- “mpegh3daConfig()” is the highest-level syntax element, below which is “mpegh3daDecoderConfig()” for decoding. Further below, there is a “Config()” corresponding to each element stored in “Frame”, and an element index (Element_index) is inserted into each of them.
- mpegh3daSingleChannelElementConfig () corresponds to an SCE element
- mpegh3daChannelPairElementConfig () corresponds to a CPE element
- mpegh3daLfeElementConfig () corresponds to an LFE element
- mpegh3daExtElementConfig () corresponds to an EXE element.
- FIG. 4B schematically shows a configuration example of “Frame”.
- “mpegh3daFrame()” is the highest-level syntax element, below which the “Element()” that is the entity of each element exists, and an element index (Element_index) is inserted into each.
- “mpegh3daSingleChannelElement ()” is an SCE element
- “mpegh3daChannelPairElement ()” is a CPE element
- mpegh3daLfeElement () is an LFE element
- “mpegh3daExtElement ()” is an EXE element.
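The correspondence between element types and the syntax elements of FIGS. 4(a) and 4(b) can be restated as a plain lookup table (content taken from the lists above; the table itself is just an illustrative sketch):

```python
# (Config() syntax, Element() syntax) per element type, as listed above;
# each instance carries an Element_index in its payload.
SYNTAX = {
    "SCE": ("mpegh3daSingleChannelElementConfig()", "mpegh3daSingleChannelElement()"),
    "CPE": ("mpegh3daChannelPairElementConfig()",   "mpegh3daChannelPairElement()"),
    "LFE": ("mpegh3daLfeElementConfig()",           "mpegh3daLfeElement()"),
    "EXE": ("mpegh3daExtElementConfig()",           "mpegh3daExtElement()"),
}

def config_syntax(element_type: str) -> str:
    return SYNTAX[element_type][0]

def frame_syntax(element_type: str) -> str:
    return SYNTAX[element_type][1]
```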
- FIG. 5 shows a configuration example of 3D audio transmission data.
- It consists of first data consisting only of channel encoded data, second data consisting only of object encoded data, and third data consisting of both channel encoded data and object encoded data.
- the channel encoded data of the first data is 5.1 channel channel encoded data, and is composed of encoded sample data of SCE1, CPE1, CPE2, and LFE1.
- the object encoded data of the second data is encoded data of an immersive audio object (Immersive audio object).
- This immersive audio object encoded data is object encoded data for immersive sound, and consists of encoded sample data SCE2 and metadata EXE1 for rendering it by mapping it to a speaker located at an arbitrary position.
- the channel encoded data included in the third data is 2-channel (stereo) channel encoded data, and is composed of encoded sample data of CPE3.
- The object encoded data included in the third data is speech language object encoded data, and consists of encoded sample data SCE3 and metadata EXE2 for rendering it by mapping it to a speaker located at an arbitrary position.
- Encoded data is distinguished by the concept of group by type.
- the 5.1 channel encoded channel data is group 1
- the immersive audio object encoded data is group 2
- the 2 channel (stereo) channel encoded data is group 3
- the speech language object encoded data is group 4.
- group 1 and group 2 are bundled to form preset group 1
- group 1 and group 4 are bundled to form preset group 2.
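The group layout of FIG. 5 can be summarized as data; the contents below are taken directly from the description, and the dictionary form is only an illustrative sketch:

```python
# Groups of Fig. 5, keyed by group number.
GROUPS = {
    1: "5.1ch channel encoded data (SCE1, CPE1, CPE2, LFE1)",
    2: "immersive audio object encoded data (SCE2 + EXE1)",
    3: "2ch (stereo) channel encoded data (CPE3)",
    4: "speech language object encoded data (SCE3 + EXE2)",
}

# Preset groups bundle groups, as stated above.
PRESET_GROUPS = {1: [1, 2], 2: [1, 4]}
```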
- the service transmitter 100 transmits 3D audio transmission data including encoded data of a plurality of groups in one stream or a plurality of streams (Multiple stream) as described above.
- transmission is performed with three streams.
- FIG. 6 schematically shows a configuration example of an audio frame in the case of transmitting with 3 streams in the configuration example of 3D audio transmission data in FIG.
- the first stream identified by PID1 includes the first data consisting only of the channel encoded data together with “SYNC” and “Config”.
- the second stream identified by PID2 includes the second data consisting only of the object encoded data together with “SYNC” and “Config”.
- The third stream identified by PID3 includes the third data, including channel encoded data and object encoded data, together with “SYNC” and “Config”.
- The service receiver 200 receives the transport stream TS transmitted from the service transmitter 100 on broadcast waves or in network packets.
- the transport stream TS has a predetermined number, in this embodiment, three audio streams in addition to the video stream.
- The audio stream is composed of audio frames each including the first packet (“Frame” packet) having the encoded data as payload information and the second packet (“Config” packet) having, as payload information, the configuration information indicating the configuration of the payload information of the first packet; common index information is inserted into the payloads of the related first and second packets.
- The service receiver 200 extracts some or all of the first and second packets from the three audio streams and integrates them into one audio stream using the index information inserted in the payload portions of the first and second packets. The service receiver 200 then processes this one audio stream: for example, it performs decoding processing on it to obtain a 3D audio output, or transmits it to an external device.
- FIG. 7 illustrates a configuration example of the stream generation unit 110 included in the service transmitter 100.
- the stream generation unit 110 includes a video encoder 112, a 3D audio encoder 113, and a multiplexer 114.
- the video encoder 112 receives the video data SV, encodes the video data SV, and generates a video stream (video elementary stream).
- the 3D audio encoder 113 inputs necessary channel data and object data as the audio data SA.
- the 3D audio encoder 113 performs encoding on the audio data SA to obtain 3D audio transmission data.
- The 3D audio transmission data includes first data consisting only of channel encoded data (group 1 data), second data consisting only of object encoded data (group 2 data), and third data including both channel encoded data and object encoded data (data of groups 3 and 4).
- The 3D audio encoder 113 generates a first audio stream (Stream 1) including the first data, a second audio stream (Stream 2) including the second data, and a third audio stream (Stream 3) including the third data (see FIG. 6).
- FIG. 8A shows the configuration of an audio frame (Audio Frame) that constitutes the first audio stream (Stream 1).
- “Id0” is inserted as a common element index into “Frame” of SCE1 and “Config” corresponding thereto.
- “Id1” is inserted as a common element index into “Frame” of CPE1 and “Config” corresponding thereto.
- “Id2” is inserted as a common element index into “Frame” of CPE2 and “Config” corresponding thereto.
- “Id3” is inserted as a common element index into “Frame” of LFE1 and “Config” corresponding thereto. Note that the values of the “Config” and “Frame” packet labels (PL) are all “PL1” in the first audio stream (Stream 1).
- FIG. 8B shows the configuration of an audio frame (Audio frame) that constitutes the second audio stream (Stream 2).
- “Id4” is inserted as a common element index into the related “Frame” and “Config”.
- the values of the “Config” and “Frame” packet labels (PL) are all “PL2” in the second audio stream (Stream 2).
- FIG. 8C shows the configuration of an audio frame (Audio Frame) that constitutes the third audio stream (Stream 3).
- “Id5” is inserted as a common element index into “Frame” of CPE3 and “Config” corresponding thereto.
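Gathering the element indices stated for FIGS. 8(a) to 8(c) shows that they are unique across all three streams, which is what later makes index-based integration unambiguous. The element carrying Id4 is inferred from the FIG. 5 data layout and is an assumption; the text names only the stream:

```python
# Element-index assignment per the text; Id4's element (SCE2) is an
# assumption inferred from the Fig. 5 layout of the second data.
ELEMENT_INDEX = {
    "SCE1": 0, "CPE1": 1, "CPE2": 2, "LFE1": 3,  # Stream 1 (packet label PL1)
    "SCE2": 4,                                    # Stream 2 (packet label PL2)
    "CPE3": 5,                                    # Stream 3
}

# Uniqueness across streams allows Frames and Configs to be matched by
# index alone after the streams are merged.
assert len(set(ELEMENT_INDEX.values())) == len(ELEMENT_INDEX)
```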
- The multiplexer 114 converts the video stream output from the video encoder 112 and the three audio streams output from the 3D audio encoder 113 into PES packets, further converts them into transport packets, and multiplexes them to obtain a transport stream TS as a multiplexed stream.
- the video data is supplied to the video encoder 112.
- the video data SV is encoded, and a video stream including the encoded video data is generated.
- the audio data SA is supplied to the 3D audio encoder 113.
- the audio data SA includes channel data and object data.
- the 3D audio encoder 113 encodes the audio data SA to obtain 3D audio transmission data.
- The 3D audio transmission data includes first data consisting only of channel encoded data (group 1 data), second data consisting only of object encoded data (group 2 data), and third data composed of channel encoded data and object encoded data (data of groups 3 and 4) (see FIG. 5).
- the 3D audio encoder 113 generates three audio streams (see FIGS. 6 and 8). In this case, in each audio stream, common index information is inserted into “Frame” and “Config” related to the same element. As a result, “Frame” and “Config” are associated with the index information for each element.
- the video stream generated by the video encoder 112 is supplied to the multiplexer 114. Further, the three audio streams generated by the audio encoder 113 are supplied to the multiplexer 114. In the multiplexer 114, a stream supplied from each encoder is converted into a PES packet, further converted into a transport packet, and multiplexed to obtain a transport stream TS as a multiplexed stream.
- FIG. 9 shows a configuration example of the service receiver 200.
- the service receiver 200 includes a CPU 221, a flash ROM 222, a DRAM 223, an internal bus 224, a remote controller receiver 225, and a remote controller transmitter 226.
- the service receiver 200 includes a receiving unit 201, a demultiplexer 202, a video decoder 203, a video processing circuit 204, a panel driving circuit 205, and a display panel 206.
- The service receiver 200 also includes multiplexing buffers 211-1 to 211-N, a combiner 212, a 3D audio decoder 213, an audio output processing circuit 214, a speaker system 215, and a distribution interface 232.
- the CPU 221 controls the operation of each unit of service receiver 200.
- the flash ROM 222 stores control software and data.
- the DRAM 223 constitutes a work area for the CPU 221.
- the CPU 221 develops software and data read from the flash ROM 222 on the DRAM 223 to activate the software, and controls each unit of the service receiver 200.
- the remote control receiving unit 225 receives the remote control signal (remote control code) transmitted from the remote control transmitter 226 and supplies it to the CPU 221.
- the CPU 221 controls each part of the service receiver 200 based on this remote control code.
- the CPU 221, flash ROM 222, and DRAM 223 are connected to the internal bus 224.
- The receiving unit 201 receives the transport stream TS transmitted from the service transmitter 100 on broadcast waves or in network packets.
- the transport stream TS includes three audio streams that constitute 3D audio transmission data in addition to the video stream (see FIGS. 6 and 8).
- the demultiplexer 202 extracts a video stream packet from the transport stream TS and sends it to the video decoder 203.
- the video decoder 203 reconstructs a video stream from the video packets extracted by the demultiplexer 202 and performs decoding processing to obtain uncompressed video data.
- the video processing circuit 204 performs scaling processing, image quality adjustment processing, and the like on the video data obtained by the video decoder 203 to obtain video data for display.
- the panel drive circuit 205 drives the display panel 206 based on the display image data obtained by the video processing circuit 204.
- the display panel 206 includes, for example, an LCD (Liquid Crystal Display), an organic EL display (organic electroluminescence display), and the like.
- Under the control of the CPU 221, the demultiplexer 202 selectively extracts, with a PID filter, the packets of one or more audio streams containing encoded data of groups that conform to the speaker configuration and the viewer (user) selection information, from among the predetermined number of audio streams included in the transport stream TS.
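The selective extraction can be sketched as a simple PID filter; the packet representation (a `(pid, payload)` tuple) is hypothetical, chosen only for illustration:

```python
def pid_filter(ts_packets, wanted_pids):
    """Keep only transport packets whose PID belongs to a selected audio
    stream; ts_packets is a list of (pid, payload) tuples (hypothetical)."""
    return [pkt for pkt in ts_packets if pkt[0] in wanted_pids]
```

In the receiver, `wanted_pids` would correspond to the audio streams whose groups match the speaker configuration and user selection.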
- the multiplexing buffers 211-1 to 211 -N take in the respective audio streams extracted by the demultiplexer 202.
- The number N of multiplexing buffers 211-1 to 211-N is set to a necessary and sufficient number. In actual operation, however, only as many buffers as there are audio streams extracted by the demultiplexer 202 are used.
- The combiner 212 extracts, for each audio frame, some or all of the “Config” and “Frame” packets from those of the multiplexing buffers 211-1 to 211-N into which the audio streams extracted by the demultiplexer 202 have been taken, and integrates them into one audio stream.
- FIG. 10 shows an example of the integration process when “Frame” and “Config” are not linked by index information for each element.
- This is an example in which the data of group 1 included in the first audio stream (Stream 1), the data of group 2 included in the second audio stream (Stream 2), and the data of group 3 included in the third audio stream (Stream 3) are integrated.
- The synthesized stream in FIG. 10(a1) is an example in which the configurations of the audio streams are integrated without being decomposed. In this case, the element order is violated at the locations of LFE1 and CPE3 indicated by the arrows. To avoid this, each element must be analyzed, the configuration of the first audio stream decomposed, and the elements of the third audio stream interleaved so that CPE3 precedes LFE1, as shown in the composite stream of FIG. 10(a2).
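The order violation of FIG. 10(a1) can be illustrated with a simple rank check. The rank order SCE → CPE → LFE → EXE is inferred from the LFE1/CPE3 example above and is an assumption of this sketch:

```python
# Assumed element rank order; without per-element indices, a frame's
# elements must appear in non-decreasing rank.
RANK = {"SCE": 0, "CPE": 1, "LFE": 2, "EXE": 3}

def order_ok(elements):
    """True iff the element sequence respects the assumed order definition."""
    ranks = [RANK[e] for e in elements]
    return ranks == sorted(ranks)

# Naive concatenation of Stream 1 (SCE1, CPE1, CPE2, LFE1) and Stream 3
# (CPE3, ...) puts CPE3 after LFE1 and violates the order.
```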
- FIG. 11 shows an example of the integration process when “Frame” and “Config” are linked by index information for each element. This example likewise integrates the data of group 1 included in the first audio stream (Stream 1), the data of group 2 included in the second audio stream (Stream 2), and the data of group 3 included in the third audio stream (Stream 3).
- the composite stream in FIG. 11A1 is an example in which the configurations of the audio streams are integrated without being decomposed.
- The composite stream in FIG. 11(a2) is another example in which the configurations of the audio streams are integrated without being decomposed.
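With per-element indices, the combiner can integrate streams by simple concatenation, since each Frame is tied to its Config by index rather than by position. A sketch over a hypothetical `(kind, element_index, payload)` packet tuple:

```python
def integrate(streams):
    """Merge the Config and Frame packets of several audio streams into one
    stream without re-sorting Frames by element type; each packet is a
    (kind, element_index, payload) tuple (hypothetical representation)."""
    configs, frames = [], []
    for stream in streams:
        for kind, idx, payload in stream:
            (configs if kind == "Config" else frames).append((kind, idx, payload))
    return configs + frames  # all Configs first, Frames in arrival order
```

Because Frames need not be reordered, the per-stream configurations survive the merge intact, which is the processing-load reduction the text describes.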
- the 3D audio decoder 213 performs decoding processing on one audio stream obtained by integration by the combiner 212 to obtain audio data for driving each speaker.
- the audio output processing circuit 214 performs necessary processing such as D / A conversion and amplification on the audio data for driving each speaker and supplies the audio data to the speaker system 215.
- The speaker system 215 includes a plurality of speakers for a plurality of channels, for example, 2 channels, 5.1 channels, 7.1 channels, or 22.2 channels.
- the distribution interface 232 distributes (transmits) one audio stream obtained by integration by the combiner 212 to, for example, the device 300 connected to the local area network.
- This local area network connection includes an Ethernet connection, a wireless connection such as “WiFi” or “Bluetooth”. “WiFi” and “Bluetooth” are registered trademarks.
- the device 300 includes a surround speaker, a second display, and an audio output device attached to the network terminal.
- the device 300 performs the same decoding process as the 3D audio decoder 213 to obtain audio data for driving a predetermined number of speakers.
- The receiving unit 201 receives the transport stream TS transmitted from the service transmitter 100 on broadcast waves or in network packets.
- the transport stream TS includes three audio streams that constitute 3D audio transmission data (see FIGS. 6 and 8). This transport stream TS is supplied to the demultiplexer 202.
- the demultiplexer 202 extracts a video stream packet from the transport stream TS and supplies it to the video decoder 203.
- In the video decoder 203, a video stream is reconstructed from the video packets extracted by the demultiplexer 202, and decoding processing is performed to obtain uncompressed video data. This video data is supplied to the video processing circuit 204.
- the video processing circuit 204 performs scaling processing, image quality adjustment processing, and the like on the video data obtained by the video decoder 203 to obtain video data for display.
- This display video data is supplied to the panel drive circuit 205.
- the panel drive circuit 205 drives the display panel 206 based on the display video data. As a result, an image corresponding to the video data for display is displayed on the display panel 206.
- Under the control of the CPU 221, the demultiplexer 202 selectively extracts, with a PID filter, the packets of one or more audio streams containing encoded data of groups that match the speaker configuration and the viewer selection information, from among the predetermined number of audio streams included in the transport stream TS.
- the audio stream taken out by the demultiplexer 202 is taken into the corresponding multiplexing buffer among the multiplexing buffers 211-1 to 211 -N.
- In the combiner 212, some or all of the “Config” and “Frame” packets are extracted, for each audio frame, from those of the multiplexing buffers 211-1 to 211-N into which the audio streams extracted by the demultiplexer 202 have been taken, and are integrated into one audio stream.
- One audio stream obtained by integration by the combiner 212 is supplied to the 3D audio decoder 213.
- the audio stream is subjected to decoding processing, and audio data for driving each speaker constituting the speaker system 215 is obtained.
- the audio data is supplied to the audio output processing circuit 214.
- the audio output processing circuit 214 performs necessary processing such as D / A conversion and amplification on the audio data for driving each speaker.
- the processed audio data is supplied to the speaker system 215.
- a sound output corresponding to the display image on the display panel 206 is obtained from the speaker system 215.
- the audio stream obtained by integration by the combiner 212 is supplied to the distribution interface 232.
- this audio stream is distributed (transmitted) to the device 300 connected to the local area network.
- the audio stream is decoded, and audio data for driving a predetermined number of speakers is obtained.
- As described above, when generating an audio stream by 3D audio encoding, the service transmitter 100 inserts index information common to the “Frame” and “Config” related to the same element. Therefore, when a plurality of audio streams are integrated into one audio stream on the receiving side, the order definition need not be observed, and the processing load can be reduced.
- this technique can also take the following structures.
- (1) A transmission device comprising: an encoding unit that generates a predetermined number of audio streams; and a transmission unit that transmits a container of a predetermined format including the predetermined number of audio streams, wherein the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and common index information is inserted into the payloads of the related first packet and second packet.
- (2) The transmission device according to (1), wherein the encoded data included in the first packet as payload information is channel encoded data or object encoded data.
- (3) A transmission method comprising: an encoding step of generating a predetermined number of audio streams; and a transmission step of transmitting, by a transmission unit, a container of a predetermined format including the predetermined number of audio streams, wherein the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and common index information is inserted into the payloads of the related first packet and second packet.
- (4) A reception device comprising: a reception unit that receives a container of a predetermined format including a predetermined number of audio streams, wherein the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and common index information is inserted into the payloads of the related first packet and second packet; a stream integration unit that extracts some or all of the first packets and second packets from the predetermined number of audio streams and integrates them into one audio stream using the index information inserted in the payload portions of the first packets and second packets; and a processing unit that processes the one audio stream.
- (5) The reception device according to (4), wherein the processing unit performs decoding processing on the one audio stream.
- (6) The reception device according to (4) or (5), wherein the processing unit transmits the one audio stream to an external device.
- (7) A reception method comprising: a reception step of receiving, by a reception unit, a container of a predetermined format including a predetermined number of audio streams, wherein the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and common index information is inserted into the payloads of the related first packet and second packet; a stream integration step of extracting some or all of the first packets and second packets from the predetermined number of audio streams and integrating them into one audio stream using the index information inserted in the payload portions of the first packets and second packets; and a processing step of processing the one audio stream.
- the main feature of the present technology is that, when generating an audio stream by 3D audio encoding, common index information is inserted into the "Frame" and "Config" packets related to the same element, whereby the processing load of stream integration on the receiving side can be reduced (see FIGS. 3 and 8).
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
Description
The present technology resides in a transmission device comprising: an encoding unit that generates a predetermined number of audio streams; and a transmission unit that transmits a container of a predetermined format including the predetermined number of audio streams, wherein the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and common index information is inserted into the payloads of the related first packet and second packet.
The present technology also resides in a reception device comprising: a reception unit that receives a container of a predetermined format including a predetermined number of audio streams, wherein the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and common index information is inserted into the payloads of the related first packet and second packet; a stream integration unit that extracts some or all of the first packets and second packets from the predetermined number of audio streams and integrates them into one audio stream using the index information inserted in the payload portions of the first packets and second packets; and a processing unit that processes the one audio stream.
1. Embodiment
2. Modification
[Configuration Example of Transmission/Reception System]
FIG. 1 shows a configuration example of a transmission/reception system 10 as an embodiment. This transmission/reception system 10 is composed of a service transmitter 100 and a service receiver 200. The service transmitter 100 transmits a transport stream TS on broadcast waves or in network packets. In addition to a video stream, this transport stream TS has a predetermined number of, that is, one or more, audio streams.
FIG. 7 shows a configuration example of the stream generation unit 110 included in the service transmitter 100. This stream generation unit 110 has a video encoder 112, a 3D audio encoder 113, and a multiplexer 114.
FIG. 9 shows a configuration example of the service receiver 200. This service receiver 200 has a CPU 221, a flash ROM 222, a DRAM 223, an internal bus 224, a remote control reception unit 225, and a remote control transmitter 226.
Note that the embodiment described above shows an example in which the container is a transport stream (MPEG-2 TS). However, the present technology can be applied in the same way to systems in which distribution is performed in containers of MP4 or other formats, for example an MPEG-DASH-based stream distribution system, or a transmission/reception system handling an MMT (MPEG Media Transport) structure transport stream.
(1) A transmission device comprising:
an encoding unit that generates a predetermined number of audio streams; and
a transmission unit that transmits a container of a predetermined format including the predetermined number of audio streams, wherein
the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and
common index information is inserted into the payloads of the related first packet and second packet.
(2) The transmission device according to (1), wherein the encoded data included in the first packet as payload information is channel encoded data or object encoded data.
(3) A transmission method comprising:
an encoding step of generating a predetermined number of audio streams; and
a transmission step of transmitting, by a transmission unit, a container of a predetermined format including the predetermined number of audio streams, wherein
the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and
common index information is inserted into the payloads of the related first packet and second packet.
(4) A reception device comprising:
a reception unit that receives a container of a predetermined format including a predetermined number of audio streams, wherein
the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and
common index information is inserted into the payloads of the related first packet and second packet;
a stream integration unit that extracts some or all of the first packets and second packets from the predetermined number of audio streams and integrates them into one audio stream using the index information inserted in the payload portions of the first packets and second packets; and
a processing unit that processes the one audio stream.
(5) The reception device according to (4), wherein the processing unit performs decoding processing on the one audio stream.
(6) The reception device according to (4) or (5), wherein the processing unit transmits the one audio stream to an external device.
(7) A reception method comprising:
a reception step of receiving, by a reception unit, a container of a predetermined format including a predetermined number of audio streams, wherein
the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and
common index information is inserted into the payloads of the related first packet and second packet;
a stream integration step of extracting some or all of the first packets and second packets from the predetermined number of audio streams and integrating them into one audio stream using the index information inserted in the payload portions of the first packets and second packets; and
a processing step of processing the one audio stream.
100・・・Service transmitter
110・・・Stream generation unit
112・・・Video encoder
113・・・3D audio encoder
114・・・Multiplexer
200・・・Service receiver
201・・・Reception unit
202・・・Demultiplexer
203・・・Video decoder
204・・・Video processing circuit
205・・・Panel drive circuit
206・・・Display panel
211-1 to 211-N・・・Multiplexing buffers
212・・・Combiner
213・・・3D audio decoder
214・・・Audio output processing circuit
215・・・Speaker system
221・・・CPU
222・・・Flash ROM
223・・・DRAM
224・・・Internal bus
225・・・Remote control reception unit
226・・・Remote control transmitter
232・・・Distribution interface
300・・・Device
Claims (7)
- A transmission device comprising:
an encoding unit that generates a predetermined number of audio streams; and
a transmission unit that transmits a container of a predetermined format including the predetermined number of audio streams, wherein
the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and
common index information is inserted into the payloads of the related first packet and second packet.
- The transmission device according to claim 1, wherein the encoded data included in the first packet as payload information is channel encoded data or object encoded data.
- A transmission method comprising:
an encoding step of generating a predetermined number of audio streams; and
a transmission step of transmitting, by a transmission unit, a container of a predetermined format including the predetermined number of audio streams, wherein
the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and
common index information is inserted into the payloads of the related first packet and second packet.
- A reception device comprising:
a reception unit that receives a container of a predetermined format including a predetermined number of audio streams, wherein
the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and
common index information is inserted into the payloads of the related first packet and second packet;
a stream integration unit that extracts some or all of the first packets and second packets from the predetermined number of audio streams and integrates them into one audio stream using the index information inserted in the payload portions of the first packets and second packets; and
a processing unit that processes the one audio stream.
- The reception device according to claim 4, wherein the processing unit performs decoding processing on the one audio stream.
- The reception device according to claim 4, wherein the processing unit transmits the one audio stream to an external device.
- A reception method comprising:
a reception step of receiving, by a reception unit, a container of a predetermined format including a predetermined number of audio streams, wherein
the audio stream consists of audio frames each including a first packet having encoded data as payload information and a second packet having, as payload information, configuration information indicating the configuration of the payload information of the first packet, and
common index information is inserted into the payloads of the related first packet and second packet;
a stream integration step of extracting some or all of the first packets and second packets from the predetermined number of audio streams and integrating them into one audio stream using the index information inserted in the payload portions of the first packets and second packets; and
a processing step of processing the one audio stream.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16749056.4A EP3258467B1 (en) | 2015-02-10 | 2016-01-29 | Transmission and reception of audio streams |
JP2016574724A JP6699564B2 (ja) | 2015-02-10 | 2016-01-29 | 送信装置、送信方法、受信装置および受信方法 |
CN201680008488.XA CN107210041B (zh) | 2015-02-10 | 2016-01-29 | 发送装置、发送方法、接收装置以及接收方法 |
US15/540,306 US10475463B2 (en) | 2015-02-10 | 2016-01-29 | Transmission device, transmission method, reception device, and reception method for audio streams |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2015024240 | 2015-02-10 | ||
JP2015-024240 | 2015-02-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016129412A1 true WO2016129412A1 (ja) | 2016-08-18 |
Family
ID=56614657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2016/052610 WO2016129412A1 (ja) | 2015-02-10 | 2016-01-29 | 送信装置、送信方法、受信装置および受信方法 |
Country Status (5)
Country | Link |
---|---|
US (1) | US10475463B2 (ja) |
EP (1) | EP3258467B1 (ja) |
JP (1) | JP6699564B2 (ja) |
CN (1) | CN107210041B (ja) |
WO (1) | WO2016129412A1 (ja) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109168032B (zh) * | 2018-11-12 | 2021-08-27 | 广州酷狗计算机科技有限公司 | 视频数据的处理方法、终端、服务器及存储介质 |
CN113724717B (zh) * | 2020-05-21 | 2023-07-14 | 成都鼎桥通信技术有限公司 | 车载音频处理系统、方法、车机控制器和车辆 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997044955A1 (en) * | 1996-05-17 | 1997-11-27 | Matsushita Electric Industrial Co., Ltd. | Data multiplexing method, method and device for reproducing multiplexed data, and recording medium containing the data multiplexed by said method |
JP2001292432A (ja) * | 2000-04-05 | 2001-10-19 | Mitsubishi Electric Corp | 限定受信制御方式 |
WO2004066303A1 (ja) * | 2003-01-20 | 2004-08-05 | Pioneer Corporation | 情報記録媒体、情報記録装置及び方法、情報再生装置及び方法、情報記録再生装置及び方法、記録又は再生制御用のコンピュータプログラム、並びに制御信号を含むデータ構造 |
JP2009177706A (ja) * | 2008-01-28 | 2009-08-06 | Funai Electric Co Ltd | 放送受信装置 |
JP2012033243A (ja) * | 2010-08-02 | 2012-02-16 | Sony Corp | データ生成装置およびデータ生成方法、データ処理装置およびデータ処理方法 |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6385704B1 (en) * | 1997-11-14 | 2002-05-07 | Cirrus Logic, Inc. | Accessing shared memory using token bit held by default by a single processor |
CA2553708C (en) * | 2004-02-06 | 2014-04-08 | Sony Corporation | Information processing device, information processing method, program, and data structure |
CN101479787B (zh) * | 2006-09-29 | 2012-12-26 | Lg电子株式会社 | 用于编码和解码基于对象的音频信号的方法和装置 |
JP5258967B2 (ja) * | 2008-07-15 | 2013-08-07 | エルジー エレクトロニクス インコーポレイティド | オーディオ信号の処理方法及び装置 |
KR101742136B1 (ko) * | 2011-03-18 | 2017-05-31 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | 오디오 콘텐츠를 표현하는 비트스트림의 프레임들 내의 프레임 요소 배치 |
JP5798247B2 (ja) | 2011-07-01 | 2015-10-21 | ドルビー ラボラトリーズ ライセンシング コーポレイション | 向上した3dオーディオ作成および表現のためのシステムおよびツール |
BR122021021503B1 (pt) * | 2012-09-12 | 2023-04-11 | Fraunhofer - Gesellschaft Zur Forderung Der Angewandten Forschung E.V. | Aparelho e método para fornecer capacidades melhoradas de downmix guiado para áudio 3d |
EP2757558A1 (en) * | 2013-01-18 | 2014-07-23 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Time domain level adjustment for audio signal decoding or encoding |
RU2630754C2 (ru) * | 2013-05-24 | 2017-09-12 | Долби Интернешнл Аб | Эффективное кодирование звуковых сцен, содержащих звуковые объекты |
WO2015180866A1 (en) * | 2014-05-28 | 2015-12-03 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Data processor and transport of user control data to audio decoders and renderers |
RU2698779C2 (ru) * | 2014-09-04 | 2019-08-29 | Сони Корпорейшн | Устройство передачи, способ передачи, устройство приема и способ приема |
WO2016039287A1 (ja) * | 2014-09-12 | 2016-03-17 | ソニー株式会社 | 送信装置、送信方法、受信装置および受信方法 |
WO2016052191A1 (ja) * | 2014-09-30 | 2016-04-07 | ソニー株式会社 | 送信装置、送信方法、受信装置および受信方法 |
EP3208801A4 (en) * | 2014-10-16 | 2018-03-28 | Sony Corporation | Transmitting device, transmission method, receiving device, and receiving method |
-
2016
- 2016-01-29 EP EP16749056.4A patent/EP3258467B1/en active Active
- 2016-01-29 JP JP2016574724A patent/JP6699564B2/ja active Active
- 2016-01-29 CN CN201680008488.XA patent/CN107210041B/zh active Active
- 2016-01-29 WO PCT/JP2016/052610 patent/WO2016129412A1/ja active Application Filing
- 2016-01-29 US US15/540,306 patent/US10475463B2/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1997044955A1 (en) * | 1996-05-17 | 1997-11-27 | Matsushita Electric Industrial Co., Ltd. | Data multiplexing method, method and device for reproducing multiplexed data, and recording medium containing the data multiplexed by said method |
JP2001292432A (ja) * | 2000-04-05 | 2001-10-19 | Mitsubishi Electric Corp | 限定受信制御方式 |
WO2004066303A1 (ja) * | 2003-01-20 | 2004-08-05 | Pioneer Corporation | 情報記録媒体、情報記録装置及び方法、情報再生装置及び方法、情報記録再生装置及び方法、記録又は再生制御用のコンピュータプログラム、並びに制御信号を含むデータ構造 |
JP2009177706A (ja) * | 2008-01-28 | 2009-08-06 | Funai Electric Co Ltd | 放送受信装置 |
JP2012033243A (ja) * | 2010-08-02 | 2012-02-16 | Sony Corp | データ生成装置およびデータ生成方法、データ処理装置およびデータ処理方法 |
Non-Patent Citations (1)
Title |
---|
STEVE VERNON ET AL.: "An Integrated Multichannel Audio Coding System for Digital Television Distribution and Emission", PROC. 108TH CONVENTION OF THE AES, 19 February 2000 (2000-02-19), pages 1 - 12, XP009121840 * |
Also Published As
Publication number | Publication date |
---|---|
US20180005640A1 (en) | 2018-01-04 |
CN107210041B (zh) | 2020-11-17 |
JP6699564B2 (ja) | 2020-05-27 |
EP3258467A4 (en) | 2018-07-04 |
JPWO2016129412A1 (ja) | 2017-11-24 |
CN107210041A (zh) | 2017-09-26 |
EP3258467A1 (en) | 2017-12-20 |
US10475463B2 (en) | 2019-11-12 |
EP3258467B1 (en) | 2019-09-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6908168B2 (ja) | 受信装置、受信方法、送信装置および送信方法 | |
JP6904463B2 (ja) | 送信装置および送信方法 | |
JP7529013B2 (ja) | 送信装置および送信方法 | |
US20240089534A1 (en) | Transmission apparatus, transmission method, reception apparatus and reception method for transmitting a plurality of types of audio data items | |
WO2016129412A1 (ja) | 送信装置、送信方法、受信装置および受信方法 | |
JP6876924B2 (ja) | 送信装置、送信方法、受信装置および受信方法 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16749056 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016574724 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15540306 Country of ref document: US |
|
REEP | Request for entry into the european phase |
Ref document number: 2016749056 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |