WO2017022461A1

WO2017022461A1 - Receiving device, sending device and data processing method

Info

Publication number: WO2017022461A1
Application number: PCT/JP2016/071059
Authority: WO
Inventors: 高橋　和幸
Original assignee: ソニー株式会社
Priority date: 2015-07-31
Filing date: 2016-07-15
Publication date: 2017-02-09

Abstract

The present technology relates to a receiving device, a sending device and a data processing method that make it possible to perform dialog control using dialog control information. The receiving device: receives a stream which is transmitted via a transmission path and which includes a multi-channel or stereo audio component; acquires dialog control information which is transmitted by a system layer that processes signaling specified by a prescribed standard and which is for controlling the multi-channel or stereo audio dialog; and controls the multi-channel or stereo audio dialog transmitted via the transmission path on the basis of the dialog control information. The present technology is applicable, for example, to a TV receiver which is compatible with multi-channel audio.

Description

Reception device, transmission device, and data processing method

The present technology relates to a receiving device, a transmitting device, and a data processing method, and more particularly, to a receiving device, a transmitting device, and a data processing method that can perform dialog control using dialog control information.

In digital broadcasting, multi-channel audio such as 22.2ch is being introduced as an acoustic system. In addition, digital broadcasting standards define various descriptors related to audio streams (see, for example, Non-Patent Document 1).

Meanwhile, a technique called dialog control is becoming widespread in multi-channel audio. However, no technical method for dialog control has been established, and there has been a demand for a proposal for performing dialog control using dialog control information.

The present technology has been made in view of such a situation, and makes it possible to perform dialog control using dialog control information.

A receiving device according to a first aspect of the present technology includes: a receiving unit that receives a stream including a multi-channel or stereo audio component transmitted via a transmission path; an acquisition unit that acquires dialog control information for controlling a multi-channel or stereo audio dialog, the dialog control information being transmitted in a system layer that processes signaling defined by a predetermined standard; and a control unit that controls the multi-channel or stereo audio dialog transmitted via the transmission path based on the dialog control information.

The receiving device according to the first aspect of the present technology may be an independent device, or may be an internal block constituting one device. The data processing method according to the first aspect of the present technology is a data processing method corresponding to the above-described receiving device according to the first aspect of the present technology.

In the receiving device and the data processing method according to the first aspect of the present technology, a stream including a multi-channel or stereo audio component transmitted via a transmission path is received; dialog control information for controlling a multi-channel or stereo audio dialog, transmitted in a system layer that processes signaling defined by a predetermined standard, is acquired; and the multi-channel or stereo audio dialog transmitted via the transmission path is controlled based on the dialog control information.

A transmitting device according to a second aspect of the present technology includes: an acquisition unit that acquires a stream including a multi-channel or stereo audio component; a generation unit that generates dialog control information for controlling a multi-channel or stereo audio dialog; and a transmission unit that transmits the dialog control information together with the stream via a transmission path, wherein the dialog control information is transmitted in a system layer that processes signaling defined by a predetermined standard.

The transmission device according to the second aspect of the present technology may be an independent device, or may be an internal block constituting one device. A data processing method according to the second aspect of the present technology is a data processing method corresponding to the transmission device according to the second aspect of the present technology described above.

In the transmitting device and the data processing method according to the second aspect of the present technology, a stream including a multi-channel or stereo audio component is acquired, dialog control information for controlling a multi-channel or stereo audio dialog is generated, and the dialog control information is transmitted together with the stream via a transmission path. The dialog control information is transmitted in a system layer that processes signaling defined by a predetermined standard.

According to the first aspect and the second aspect of the present technology, dialog control using dialog control information can be performed.

It should be noted that the effects described here are not necessarily limiting, and any of the effects described in the present disclosure may be obtained.

FIG. 1 is a diagram showing the configuration of an embodiment of a transmission system to which the present technology is applied.
FIG. 2 is a diagram showing an example of a 22.2ch speaker arrangement.
FIG. 3 is a diagram showing the layer structure of the 22.2ch speaker arrangement.
FIG. 4 is a diagram showing an example of a 22.2ch channel map.
FIG. 5 is a diagram showing an example of a speaker arrangement for three-dimensional VBAP.
FIG. 6 is a diagram explaining the coordinate system in three-dimensional VBAP.
FIG. 7 is a diagram showing a configuration example of an MPEG-H 3D Audio audio decoder.
FIG. 8 is a diagram showing an example of the syntax of an audio stream.
FIG. 9 is a diagram showing an example of the syntax of an audio component descriptor.
FIG. 10 is a diagram showing an example of information regarding dialog control arranged in the audio component descriptor.
FIG. 11 is a diagram showing an example of the syntax and semantics of the present technology descriptor.
FIG. 12 is a diagram showing an example of the syntax of the DE_control_data descriptor defined by DVB.
FIG. 13 is a diagram showing an example of the syntax of the component descriptor defined by DVB.
FIG. 14 is a diagram showing a configuration example of the transmitting device.
FIG. 15 is a diagram showing a configuration example of the receiving device.
FIG. 16 is a flowchart explaining the transmission process.
FIG. 17 is a flowchart explaining the reception process.
FIG. 18 is a diagram showing a configuration example of a computer.

Hereinafter, embodiments of the present technology will be described with reference to the drawings. The description will be made in the following order.

1. System configuration
2. Overview of multi-channel dialog control technology
3. Contents of dialog control technology of this technology
4. Configuration of each device
5. Flow of processing executed in each device
6. Computer configuration

<1. System configuration>

FIG. 1 is a diagram illustrating a configuration of an embodiment of a transmission system to which the present technology is applied. The system refers to a logical collection of a plurality of devices.

In FIG. 1, the transmission system 1 includes a transmitting device 10 and a receiving device 20. In the transmission system 1, for example, data transmission conforming to a predetermined standard such as a digital broadcasting standard is performed.

The transmitting device 10 transmits content via the transmission path 30. For example, the transmitting device 10 transmits, as a digital broadcast signal via the transmission path 30, a stream of the video and audio (components) constituting content such as a television program, together with signaling.

The receiving device 20 receives and outputs the content transmitted from the transmitting device 10 via the transmission path 30. For example, the receiving device 20 receives the digital broadcast signal transmitted from the transmitting device 10, acquires the stream of video and audio (components) constituting the content and the signaling, and plays back the video and audio of content such as a television program.

In the transmission system 1 of FIG. 1, for example, data transmission conforming to a digital broadcasting standard such as ISDB (Integrated Services Digital Broadcasting), DVB (Digital Video Broadcasting), or ATSC (Advanced Television Systems Committee), or data transmission conforming to another standard, is performed.

As the transmission path 30, when data transmission conforming to a digital broadcasting standard is performed, a satellite link, a cable television network (wired line), or the like can be used in addition to terrestrial broadcasting. When data transmission conforming to a standard other than a digital broadcasting standard is performed, a communication line such as the Internet or a telephone network can be used as the transmission path 30, for example.

<2. Overview of multi-channel dialog control technology>

Dialog control is becoming widespread for multi-channel audio such as 22.2ch. Dialog control refers to controlling a multi-channel audio dialog, for example dialog volume level (voice level) control, dialog replacement control, or dialog localization position control. In the following description, information related to dialog control is referred to as dialog control information.

(22.2ch multi-channel audio)
Here, 22.2ch multi-channel audio will be described as an example of multi-channel audio. Note that multi-channel audio refers to an acoustic system that transmits audio data of two or more channels such as 22.2ch in order to reproduce the localization of a sound image and the sense of spread of a sound field. In the following description, multi-channel audio will be described, but the present technology can also be applied to stereo audio.

FIG. 2 is a diagram showing an example of speaker arrangement in an acoustic system employing 22.2ch multi-channel audio. FIGS. 3 and 4 show the layer configuration of the speaker arrangement of FIG. 2 and the channel map, and are referred to as appropriate in the following description.

In the acoustic system of FIG. 2, the circles indicate the positions of the speakers. A total of 22 speakers are arranged in three layers (the upper layer, the middle layer, and the lower layer) so as to form a 360-degree three-dimensional sound space. In the lower layer, two channels of subwoofers are arranged for low frequency enhancement (LFE: Low Frequency Effect). FIGS. 3A to 3C show, for each layer from the lower layer to the upper layer, the speakers and subwoofers arranged at the respective positions in the acoustic system of FIG. 2.

In FIG. 2, in the middle layer, a front left channel speaker FL, a front right channel speaker FR, a front center channel speaker FC, a rear left channel speaker BL, a rear right channel speaker BR, a front left center channel speaker FLC, a front right center channel speaker FRC, a rear center channel speaker BC, a side left channel speaker SiL, and a side right channel speaker SiR are arranged relative to the content video display area (TV screen). That is, in the acoustic system of FIG. 2, a total of 10 channels of speakers are arranged in the middle layer.

Also, in FIG. 2, in the upper layer, an upper front left channel speaker TpFL, an upper front right channel speaker TpFR, an upper front center channel speaker TpFC, an upper center channel speaker TpC, an upper rear left channel speaker TpBL, an upper rear right channel speaker TpBR, an upper side left channel speaker TpSiL, an upper side right channel speaker TpSiR, and an upper rear center channel speaker TpBC are arranged relative to the content video display area (TV screen). That is, in the acoustic system of FIG. 2, a total of nine channels of speakers are arranged in the upper layer.

Further, in FIG. 2, in the lower layer, a lower front center channel speaker BtFC, a lower front left channel speaker BtFL, and a lower front right channel speaker BtFR are arranged relative to the content video display area (TV screen). In addition, a subwoofer LFE1 and a subwoofer LFE2 for low frequency enhancement (LFE) are arranged at the front of the lower layer. That is, in the acoustic system of FIG. 2, three channels of speakers and two channels of subwoofers are arranged in the lower layer.

In the acoustic system of FIG. 2, the labels of the speakers used in 22.2ch multi-channel audio are listed in the channel map of FIG. 4.
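
As a quick consistency check, the speaker labels listed above can be tallied per layer. The following Python snippet is only an illustrative sketch: the dictionary name `LAYERS_22_2` and the helper `count_channels` are hypothetical names introduced here, while the labels themselves are those given in the text.

```python
# Speaker labels of the 22.2ch layout as listed in the text, grouped by layer.
# LAYERS_22_2 and count_channels are hypothetical names used for illustration.
LAYERS_22_2 = {
    "upper": ["TpFL", "TpFR", "TpFC", "TpC", "TpBL", "TpBR",
              "TpSiL", "TpSiR", "TpBC"],
    "middle": ["FL", "FR", "FC", "BL", "BR", "FLC", "FRC",
               "BC", "SiL", "SiR"],
    "lower": ["BtFC", "BtFL", "BtFR"],
    "lfe": ["LFE1", "LFE2"],
}


def count_channels(layers):
    """Return (number of full-range speakers, number of LFE subwoofers)."""
    speakers = sum(len(labels) for name, labels in layers.items() if name != "lfe")
    return speakers, len(layers["lfe"])


speakers, subwoofers = count_channels(LAYERS_22_2)
print(f"{speakers}.{subwoofers}ch")  # prints "22.2ch"
```

The tally (9 + 10 + 3 speakers and 2 subwoofers) matches the "22.2" in the name of the acoustic system.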

Here, in the transmission system 1 of FIG. 1, when the receiving device 20 is configured to be capable of realizing the 22.2ch multi-channel audio shown in FIG. 2 and receives dialog control information from the transmitting device 10 via the transmission path 30, the receiving device 20 controls the 22.2ch multi-channel audio dialog based on the dialog control information.

Specifically, for example, when receiving dialog control information including an instruction to adjust the dialog volume from the transmitting device 10, the receiving device 20 adjusts the 22.2ch volume level based on the dialog control information. When receiving dialog control information including a dialog replacement instruction from the transmitting device 10, the receiving device 20 replaces, for example, the Japanese dialog input to the front center channel speaker FC and the lower front center channel speaker BtFC with an English or French dialog based on the dialog control information.

As described above, in the transmission system 1 of FIG. 1, when the receiving device 20 is configured to be capable of realizing the 22.2ch multi-channel audio shown in FIG. 2 and dialog control information is transmitted from the transmitting device 10 via the transmission path 30, the receiving device 20 controls the multi-channel audio dialog based on the dialog control information, for example by dialog volume level control, dialog replacement control, or dialog localization position control.

(MPEG-H 3D Audio)
Dialog control can also be used in MPEG-H 3D Audio, which defines audio compression for 3D audio that can accommodate a large number of speakers. In MPEG-H 3D Audio, VBAP (Vector Base Amplitude Panning) is used as a technique for controlling the localization of a sound image with a plurality of speakers. The details of VBAP are described in Ville Pulkki, “Virtual Sound Source Positioning Using Vector Base Amplitude Panning”, Journal of AES, vol. 45, no. 6, pp. 456-466, 1997.

FIG. 5 is a diagram showing an example of speaker arrangement of a three-dimensional VBAP.

In FIG. 5, five speakers SP1 to SP5 are arranged, and the sound of each channel is output from these speakers SP1 to SP5. Here, the speakers SP1 to SP5 are arranged on a spherical surface centered on the origin O at the position of the head of the user U11. Further, the three-dimensional vectors starting from the origin O and pointing in the direction of the positions of the speakers SP1 to SP5 are denoted as vectors I₁ to I₅.

Further, among the regions on the spherical surface centered on the origin O, a triangular region surrounded by the speaker SP1, the speaker SP4, and the speaker SP5 is defined as a region TR21. Similarly, a triangular region surrounded by the speaker SP3, the speaker SP4, and the speaker SP5 among the region on the spherical surface centered on the origin O is defined as a region TR22, and is surrounded by the speaker SP2, the speaker SP3, and the speaker SP5. A triangular region is referred to as a region TR23.

In the three-dimensional VBAP, these regions TR21 to TR23 are one mesh (three-dimensional mesh). Now, assuming that a three-dimensional vector indicating a position where the sound image is to be localized is a vector P, in the example of FIG. 5, the vector P indicates a position on the region TR21.

In this example, the three-dimensional vectors pointing toward the positions of the speaker SP1, the speaker SP4, and the speaker SP5 are the vector I₁, the vector I₄, and the vector I₅, so the vector P can be expressed as a linear sum of the vector I₁, the vector I₄, and the vector I₅, as shown in equation (1).

P = g₁ I₁ + g₄ I₄ + g₅ I₅   (1)

In equation (1), the coefficients g₁, g₄, and g₅ by which the vector I₁, the vector I₄, and the vector I₅ are multiplied are obtained. If these coefficients are used as the gains of the sound output from the speaker SP1, the speaker SP4, and the speaker SP5, respectively, the sound image can be localized at the desired sound image position.

In this case, the gain of the sound output from the speaker SP2 and the speaker SP3, which do not constitute the region TR21, is zero. That is, no sound is output from the speaker SP2 and the speaker SP3.
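
The gain calculation of equation (1) can be sketched in Python as follows. This is a minimal illustration and not the implementation of any particular decoder: it solves the 3x3 linear system by Cramer's rule, the function names `det3` and `vbap_gains` are hypothetical, and the speaker vectors in the usage example are made-up values chosen only so that the result is easy to check by hand.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are the vectors a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def vbap_gains(p, i1, i4, i5):
    """Solve P = g1*I1 + g4*I4 + g5*I5 for (g1, g4, g5) by Cramer's rule."""
    d = det3(i1, i4, i5)
    if abs(d) < 1e-12:
        raise ValueError("speaker vectors are linearly dependent")
    return (det3(p, i4, i5) / d,
            det3(i1, p, i5) / d,
            det3(i1, i4, p) / d)


# Hypothetical (non-normalized) speaker vectors, for illustration only.
I1, I4, I5 = (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)
P = (9.0, 7.0, 4.0)  # equals 2*I1 + 3*I4 + 4*I5

g1, g4, g5 = vbap_gains(P, I1, I4, I5)
# Speakers outside the active triangle (SP2, SP3) get zero gain.
gains = {"SP1": g1, "SP2": 0.0, "SP3": 0.0, "SP4": g4, "SP5": g5}
print(gains)
```

In practice the gains would typically also be normalized before being applied; that step is omitted here because the text does not describe it.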

As described above, in FIG. 5, if the five speakers SP1 to SP5 are arranged in a three-dimensional space, a sound image can be localized by three-dimensional VBAP at an arbitrary position (sound image position) on the region composed of the regions TR21 to TR23.

As shown in FIG. 6, in the three-dimensional VBAP, the sound image position VSP is represented by a polar coordinate system based on Azimuth (φ), Elevation (θ), and Radius (r). Here, the relationship between the polar coordinate notation and the orthogonal coordinate notation is s(φ, θ, r) = p(x, y, z). The “front” indicated by the arrow in the x-axis direction in the figure indicates, for example, the direction of the content video display area (for example, the “TV screen” in FIG. 2) relative to the head of the user U11.
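
The conversion from the polar notation s(φ, θ, r) to the orthogonal notation p(x, y, z) can be sketched as follows. The text does not spell out the axis conventions beyond "front" lying along the x axis, so this sketch assumes that azimuth is measured in the horizontal plane from the x axis toward the y axis and that elevation is measured upward from the horizontal plane; the function name `polar_to_cartesian` is hypothetical.

```python
import math


def polar_to_cartesian(azimuth_deg, elevation_deg, radius):
    """Convert (azimuth, elevation, radius) to (x, y, z).

    Assumed convention (not stated in the text): the x axis points to the
    front, azimuth rotates from x toward y in the horizontal plane, and
    elevation is the angle above the horizontal plane.
    """
    phi = math.radians(azimuth_deg)
    theta = math.radians(elevation_deg)
    x = radius * math.cos(theta) * math.cos(phi)
    y = radius * math.cos(theta) * math.sin(phi)
    z = radius * math.sin(theta)
    return x, y, z


# A source straight ahead at unit distance lies on the x axis.
print(polar_to_cartesian(0.0, 0.0, 1.0))
```

A different sign or rotation direction for the azimuth would only change the sign of y, so the sketch is easy to adapt once the actual convention is known.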

(Decoder configuration example)
FIG. 7 is a diagram illustrating a configuration example of the audio decoder 50 corresponding to MPEG-H 3D Audio.

In FIG. 7, the audio decoder 50 corresponding to MPEG-H 3D Audio includes a USAC-3D decoder 51, a format converter 52, a metadata decoder 53, an object renderer 54, a SAOC-3D decoder 55, an HOA renderer 56, a mixer 57, and a binaural renderer 58.

The MPEG-H bitstream is input to the USAC-3D decoder 51. The USAC-3D decoder 51 decodes the MPEG-H bitstream in accordance with USAC (Unified Speech and Audio Coding).

The USAC-3D decoder 51 supplies the channels (Channels), the compressed metadata (Compressed object metadata), the objects (Objects), the SAOC transport channels (SAOC Transport Channels), and the HOA (Higher Order Ambisonics) coefficients obtained by decoding to the format converter 52, the metadata decoder 53, the object renderer 54, the SAOC-3D decoder 55, and the HOA renderer 56, respectively.

The format converter 52 performs processing such as format conversion on the channel information of each channel supplied from the USAC-3D decoder 51, and supplies the processing result to the mixer 57.

The metadata decoder 53 decodes the compressed metadata supplied from the USAC-3D decoder 51 and supplies the metadata obtained thereby to the object renderer 54 and the SAOC-3D decoder 55, respectively.

The object renderer 54 is supplied with object information regarding each object sound source (for example, polar coordinate information of the object sound source) from the USAC-3D decoder 51 and with metadata (for example, metadata including position information of each speaker SP) from the metadata decoder 53.

Based on the object information, the metadata, and the like, the object renderer 54 performs processing so that, within the triangular region TR (for example, the region TR21 in FIG. 5) surrounded by the three speakers SP (for example, the speakers SP1, SP4, and SP5 in FIG. 5) in the vicinity of the target object sound source, the sound image is localized at the position of the target object sound source (for example, the sound image position VSP corresponding to the vector P in FIG. 5). The object renderer 54 supplies the processing result to the mixer 57.

The SAOC-3D decoder 55 performs processing related to the object sound source based on the SAOC transport channel information supplied from the USAC-3D decoder 51, the metadata (metadata including position information of the speakers SP) supplied from the metadata decoder 53, and the like, and supplies the processing result to the mixer 57.

The HOA renderer 56 performs processing related to the microphone arranged on the spherical surface based on the HOA coefficient supplied from the USAC-3D decoder 51, and supplies the processing result to the mixer 57.

The mixer 57 mixes the processing results from the format converter 52, the object renderer 54, the SAOC-3D decoder 55, and the HOA renderer 56, and outputs the result to each speaker SP (for example, the speakers SP1 to SP5 in FIG. 5). When outputting to headphones, the output from the mixer 57 is processed by the binaural renderer 58 before being output.

The audio decoder 50 compatible with MPEG-H 3D Audio is configured as described above.

Here, in the transmission system 1 of FIG. 1, when the receiving device 20 is configured to be capable of realizing the three-dimensional VBAP shown in FIG. 5 and receives dialog control information from the transmitting device 10 via the transmission path 30, the receiving device 20 controls, based on the dialog control information, the dialog of the sound image localized, for example, at a position on the region composed of the regions TR21 to TR23.

Specifically, for example, when receiving dialog control information including an instruction to adjust the dialog volume from the transmitting device 10, the receiving device 20 adjusts, based on the dialog control information, the volume level of the sound image localized at a position on the region composed of the regions TR21 to TR23. When receiving dialog control information including a dialog replacement instruction from the transmitting device 10, the receiving device 20 replaces, for example, the Japanese dialog input to the object sound source Ob1 and the object sound source Ob2 with an English or French dialog based on the dialog control information.

As described above, in the transmission system 1 of FIG. 1, when the receiving device 20 is configured to be capable of realizing the three-dimensional VBAP shown in FIG. 5 and dialog control information is transmitted from the transmitting device 10 via the transmission path 30, the receiving device 20 controls the multi-channel audio dialog based on the dialog control information, for example by dialog volume level control, dialog replacement control, or dialog localization position control.

<3. Contents of dialog control technology of this technology>

As described above, the receiving device 20 controls the dialog of the multi-channel audio based on the dialog control information from the transmitting device 10. For example, ARIB (Association of Radio Industries and Businesses) has standardized the transmission of dialog control information in an audio stream.

(Audio stream syntax)
FIG. 8 is a diagram illustrating the syntax of an audio stream for transmitting dialog control information. This syntax is the syntax of the bitstream added to the end of MPEG4_ancillary_data ().

ext_downmixing_level_status2 is set to “0” in an audio mode in which a value other than “13” is set as channelConfiguration, a parameter defined in MPEG-4 Audio; in that case, the downmix coefficients (dmix_c_idx, dmix_d_idx, dmix_e_idx, dmix_f_idx, dmix_g_idx, dmix_l_idx) are not transmitted.

ext_dialogue_status is a flag indicating whether or not dialog control information exists. When “1” is set as ext_dialogue_status, dialog control information exists.

Here, when "1" is set in ext_dialogue_status, the following information related to dialog control is arranged.

num_dialogue_chans is set to the number of channels dedicated to the dialog. The number of bits of num_dialogue_chans is determined according to the number of channels of main audio.

In the 3-bit sn_dialogue_plus_index, the upper limit value of the allowable value of the gain control amount on the receiving device 20 side is set. In the 3-bit sn_dialogue_minus_index, a lower limit value of an allowable value of the gain control amount on the receiving device 20 side is set.

The language code of the main dialog is set in the 24-bit dialogue_main_lang_code. This language code value conforms to, for example, ISO 639-2, and values defined in ISO/IEC 8859-1 can be used for the characters.

In the 8-bit dialogue_main_lang_comment_bytes, the number of bytes of character string information indicating the contents of the main dialog is set. The 8-bit dialogue_main_lang_comment_data is arranged in a loop repeated for the number of bytes indicated by dialogue_main_lang_comment_bytes. In dialogue_main_lang_comment_data, the byte data of the character string information indicating the contents of the main dialog is set.

Also, dialogue_src_index and the 4-bit dialogue_gain_index are arranged in a loop repeated for the number of channels dedicated to the dialog indicated by num_dialogue_chans. In dialogue_src_index, the index of the channel dedicated to the dialog is set. The number of bits of dialogue_src_index is determined according to the number of channels of main audio. In dialogue_gain_index, the gain correction value index of the additional dialog is set.

The number of additional dialogs is set in the 4-bit num_additional_lang_chans. A 24-bit dialogue_additional_lang_code and an 8-bit dialogue_additional_lang_comment_bytes are arranged in a loop repeated for the number of additional dialogs indicated by num_additional_lang_chans. In dialogue_additional_lang_code, the language code of the additional dialog is set. This language code value conforms to, for example, ISO 639-2, and values defined in ISO/IEC 8859-1 can be used for the characters.

In dialogue_additional_lang_comment_bytes, the number of bytes of character string information indicating the contents of the additional dialog is set. The 8-bit dialogue_additional_lang_comment_data is arranged in a loop repeated for the number of bytes indicated by dialogue_additional_lang_comment_bytes. In dialogue_additional_lang_comment_data, the byte data of the character string information indicating the contents of the additional dialog is set.

byte_alignment() is a function for adjusting the data length to a byte unit (a multiple of 8 bits), and its starting point is taken from ext_dialogue_status.
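
The byte alignment described above amounts to padding up to the next multiple of 8 bits, counted from the starting point. A minimal sketch, assuming the bit count is measured from ext_dialogue_status as the text states (the function name `byte_alignment_padding` is hypothetical):

```python
def byte_alignment_padding(bits_since_start):
    """Return the number of padding bits needed to reach a multiple of 8.

    bits_since_start is the number of bits written or read since the
    starting point (ext_dialogue_status, according to the text).
    """
    if bits_since_start < 0:
        raise ValueError("bit count must be non-negative")
    return (-bits_since_start) % 8


print(byte_alignment_padding(13))  # prints 3
```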

The details of the syntax of the bitstream added to the end of MPEG4_ancillary_data() shown in FIG. 8 are described in “ARIB STD-B32 Version 3.2, Association of Radio Industries and Businesses”.

Here, when dialog control information is transmitted with an audio stream as shown in FIG. 8, there is a possibility that the reception device 20 may have difficulty handling the dialog control information.

For example, suppose the receiving device 20 can adjust the volume within the range of 0 to 100, and the dialog control information restricts dialog volume adjustment to the range of 30 to 70. Because the dialog volume can then be adjusted only within 30 to 70, the volume can be lowered only to 30 even if the user wants to lower it to 0. In such a case, the usual processing flow is to present a user interface indicating that, although the volume can normally be adjusted in the range of 0 to 100, it is currently limited to the range of 30 to 70, thereby notifying the user of the reason why the volume cannot be lowered.

However, in the current ARIB method, dialog control information is transmitted in the audio stream and is not transmitted in the system layer, which is the layer that processes signaling (descriptors) and the like. The dialog control information acquired from the audio stream therefore has to be passed to the system layer using an API (Application Programming Interface) or the like. As a result, the system layer can present, based on the dialog control information obtained from the audio stream, a user interface indicating that volume adjustment is limited to the range of 30 to 70, for example, but the processing is expected to take time because, among other things, it has to go through the API.

For this reason, there has been a demand to make dialog control information easier to handle in the receiving device 20. In the present technology, therefore, the dialog control information is transmitted in the system layer (in a descriptor) so that the receiving device 20 can easily handle it.

Note that in ARIB STD-B60 Version 1.3, which is currently being formulated, information indicating whether the audio stream includes dialog control information is planned to be placed in the audio component descriptor (MH-Audio Component Descriptor) transmitted in the system layer.
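
The volume example above can be sketched as a simple clamp plus a user notice. This is an illustration only: the function name `apply_dialog_volume`, its parameters, and the wording of the notice are hypothetical and are not taken from any standard or from the receiving device 20 itself.

```python
def apply_dialog_volume(requested, allowed_min, allowed_max,
                        device_min=0, device_max=100):
    """Clamp a requested dialog volume to the range permitted by the
    dialog control information, intersected with the device's own range.

    Returns (applied_volume, notice); notice is None when the request
    could be honored as-is, otherwise a message for the user interface.
    """
    lo = max(device_min, allowed_min)
    hi = min(device_max, allowed_max)
    if lo > hi:
        raise ValueError("permitted range does not overlap the device range")
    applied = min(max(requested, lo), hi)
    notice = None
    if applied != requested:
        notice = f"Dialog volume is currently limited to {lo}-{hi}."
    return applied, notice


# The user asks for 0, but the dialog control information allows 30 to 70.
print(apply_dialog_volume(0, 30, 70))
```

With the values from the text, a request for volume 0 is applied as 30 and the notice explains why the volume cannot be lowered further.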

(Configuration of audio component descriptor)
FIG. 9 is a diagram illustrating an example of the syntax of an audio component descriptor (MH-Audio Component Descriptor) defined in the ARIB STD-B60 version 1.3.

A tag value that identifies each descriptor is set in the 16-bit descriptor_tag. The descriptor length is set in the 8-bit descriptor_length. The descriptor_length is followed by a 4-bit reserved area (reserved_future_use).

The stream type is set in the 4-bit stream_content. In the 8-bit component_type, information related to the encoding of the audio component is set.

Information for identifying the component stream is set in the 16-bit component_tag. The audio stream format is set in the 8-bit stream_type.

In the 8-bit simulcast_group_tag, the same number is set for components performing simulcast. In the 1-bit ES_multi_lingual_flag, “1” is set when bilingual multiplexing is performed in the elementary stream.

In the 1-bit main_component_flag, “1” is set when the target audio component is the main audio. A predefined sound quality mode is set in the 2-bit quality_indicator.

The sampling frequency is specified in the 3-bit sampling_rate. The sampling_rate is followed by a 1-bit reserved area (reserved_future_use).

The language code of the audio component is set in the 24-bit ISO_639_language_code. Further, when “1” is specified as ES_multi_lingual_flag, a 24-bit ISO_639_language_code_2 is arranged. In ISO_639_language_code_2, the language code of the second audio component in the multilingual mode is set.

Information specifying the character description of the component stream is set in the 8-bit text_char.
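
The field layout described above can be read with a few byte and bit operations. The following Python sketch assumes the fields are packed most significant bit first in exactly the order and widths given in the text, and that text_char fills the remainder of the descriptor; the function name and the tag value used in the sample are hypothetical.

```python
def parse_mh_audio_component_descriptor(buf):
    """Parse the fields described in the text from a bytes object."""
    if len(buf) < 3:
        raise ValueError("buffer too short for descriptor header")
    tag = int.from_bytes(buf[0:2], "big")   # 16-bit descriptor_tag
    length = buf[2]                         # 8-bit descriptor_length
    body = buf[3:3 + length]
    if len(body) < length or length < 10:
        raise ValueError("descriptor body too short")
    flags = body[6]
    desc = {
        "descriptor_tag": tag,
        "descriptor_length": length,
        "stream_content": body[0] & 0x0F,   # low 4 bits after 4 reserved bits
        "component_type": body[1],
        "component_tag": int.from_bytes(body[2:4], "big"),
        "stream_type": body[4],
        "simulcast_group_tag": body[5],
        "ES_multi_lingual_flag": (flags >> 7) & 0x1,
        "main_component_flag": (flags >> 6) & 0x1,
        "quality_indicator": (flags >> 4) & 0x3,
        "sampling_rate": (flags >> 1) & 0x7,  # lowest bit is reserved
        "ISO_639_language_code": body[7:10].decode("latin-1"),
    }
    pos = 10
    if desc["ES_multi_lingual_flag"]:
        if len(body) < pos + 3:
            raise ValueError("missing ISO_639_language_code_2")
        desc["ISO_639_language_code_2"] = body[pos:pos + 3].decode("latin-1")
        pos += 3
    desc["text_char"] = body[pos:]
    return desc


# Sample bytes built by hand; 0x1234 is a made-up tag value.
sample = bytes([0x12, 0x34, 10, 0xF3, 0x83, 0x00, 0x10, 0x11, 0xFF, 0x5F]) + b"jpn"
print(parse_mh_audio_component_descriptor(sample))
```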

The audio component descriptor is configured as described above. In this audio component descriptor, among the 8 bits (b7 to b0) of component_type, the most significant bit (b7) can indicate the presence / absence of dialog control information.

Specifically, as shown in FIG. 10, when “0” is set in the most significant bit (b7) of component_type, it indicates that the audio stream does not include dialog control information. When “1” is set, it indicates that the audio stream includes dialog control information.

In this way, the audio component descriptor (MH-Audio Component Descriptor) defined in ARIB STD-B60 Version 1.3 contains only information indicating whether or not the audio stream includes dialog control information. Even if it is known whether the audio stream carries dialog control information, the dialog control information itself still has to be passed to the system layer using an API or the like, so the processing takes time.

For this reason, dialog control information may be difficult to handle in the receiving device 20, and there has been a demand for a proposal that makes dialog control information easy to handle and performs dialog control using it.

Therefore, the present technology proposes a method in which a descriptor including dialog control information (hereinafter also referred to as the present technology descriptor) is defined to meet this demand and is transmitted in the system layer, so that the receiving device 20 can perform dialog control using the dialog control information transmitted in the system layer.
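
Testing the most significant bit of component_type is a one-line operation. A minimal sketch (the function name `has_dialog_control_info` is hypothetical):

```python
def has_dialog_control_info(component_type):
    """Return True when bit b7 of the 8-bit component_type is set to 1."""
    if not 0 <= component_type <= 0xFF:
        raise ValueError("component_type must fit in 8 bits")
    return bool((component_type >> 7) & 0x1)


print(has_dialog_control_info(0x83))  # prints True
print(has_dialog_control_info(0x03))  # prints False
```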

(Configuration example of the present technology descriptor)
FIG. 11 is a diagram illustrating an example of the syntax and semantics of the present technology descriptor.

1-bit ext_dialogue_status is a flag indicating that dialog control is performed. Here, dialog control is performed when “1” is set as ext_dialogue_status. Next to ext_dialogue_status is a 7-bit reserved area (reserved).

Here, when "1" is set in ext_dialogue_status, the following information related to dialog control (dialog control information) is arranged.

The number of audio components for which dialog control is performed is set in 5-bit num_of_dialog_chans. Next to num_of_dialog_chans is a 3-bit reserved area (reserved).

The upper limit of the audio level at which dialog control is performed is set in the 8-bit dialog_plus_index. Also, the lower limit of the audio level at which dialog control is performed is set in the 8-bit dialog_minus_index.

The country code of the main dialog is set in 24-bit dialog_lang_code. Here, for example, a 3-byte code issued by the International Organization for Standardization (ISO) can be used.

The number of bytes of character information describing the content of the main dialog is set in the 8-bit dialog_main_lang_comment_bytes. The 8-bit dialog_main_lang_comment_data is arranged in a loop repeated for the number of bytes indicated by dialog_main_lang_comment_bytes. The content of the main dialog is set in dialog_main_lang_comment_data.

Also, a 5-bit dialog_src_index, a 3-bit reserved area (reserved), and a 4-bit dialog_gain_index are arranged in a loop repeated for the number of audio components subject to dialog control, as indicated by num_of_dialog_chans. The index of the dialog dedicated channel is set in dialog_src_index. The gain correction index of the additional dialog is set in dialog_gain_index.

The number of additional dialogs is set in the 4-bit num_additional_lang_chans. Next to num_additional_lang_chans is a 4-bit reserved area (reserved).

In a loop repeated for the number of additional dialogs indicated by num_additional_lang_chans, a 24-bit dialog_additional_lang_code and an 8-bit dialog_additional_lang_comment_bytes are arranged. The country code of the additional dialog is set in dialog_additional_lang_code. Here, for example, a 3-byte code issued by the International Organization for Standardization (ISO) can be used.

The number of bytes of character information describing the content of the additional dialog is set in dialog_additional_lang_comment_bytes. The 8-bit dialog_additional_lang_comment_data is arranged in a loop repeated for the number of bytes indicated by dialog_additional_lang_comment_bytes. The content of the additional dialog is set in dialog_additional_lang_comment_data.

The present technology descriptor is configured as described above. The syntax and semantics of the present technology descriptor shown in FIG. 11 are an example, and other configurations may be adopted.
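The field layout described above can be read with a small bit-level parser. The following is a sketch under stated assumptions: the field order and bit widths are taken literally from the text, any descriptor tag or length header is outside its scope, and any padding that FIG. 11 may contain but the text does not mention is not modeled. The class and function names (BitReader, DialogControlInfo, parse_dialog_control) are hypothetical.

```python
from dataclasses import dataclass, field


class BitReader:
    """Reads unsigned, most-significant-bit-first fields from bytes."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0  # current read position, in bits

    def read(self, nbits: int) -> int:
        if self._pos + nbits > len(self._data) * 8:
            raise ValueError("descriptor body is truncated")
        value = 0
        for _ in range(nbits):
            byte = self._data[self._pos // 8]
            value = (value << 1) | ((byte >> (7 - self._pos % 8)) & 1)
            self._pos += 1
        return value

    def read_bytes(self, count: int) -> bytes:
        return bytes(self.read(8) for _ in range(count))


@dataclass
class DialogControlInfo:
    ext_dialogue_status: int
    dialog_plus_index: int = 0
    dialog_minus_index: int = 0
    dialog_lang_code: bytes = b""
    main_comment: bytes = b""
    channels: list = field(default_factory=list)    # (src_index, gain_index)
    additional: list = field(default_factory=list)  # (lang_code, comment)


def parse_dialog_control(body: bytes) -> DialogControlInfo:
    r = BitReader(body)
    info = DialogControlInfo(ext_dialogue_status=r.read(1))
    r.read(7)  # reserved
    if info.ext_dialogue_status != 1:
        return info  # no dialog control information follows
    num_chans = r.read(5)  # num_of_dialog_chans
    r.read(3)  # reserved
    info.dialog_plus_index = r.read(8)
    info.dialog_minus_index = r.read(8)
    info.dialog_lang_code = r.read_bytes(3)
    info.main_comment = r.read_bytes(r.read(8))
    for _ in range(num_chans):
        src_index = r.read(5)
        r.read(3)  # reserved
        gain_index = r.read(4)
        info.channels.append((src_index, gain_index))
    num_additional = r.read(4)  # num_additional_lang_chans
    r.read(4)  # reserved
    for _ in range(num_additional):
        lang_code = r.read_bytes(3)
        comment = r.read_bytes(r.read(8))
        info.additional.append((lang_code, comment))
    return info
```

A bit reader is used rather than byte slicing because the per-channel entry, as described in the text, totals 12 bits and therefore does not always end on a byte boundary.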

As described above, in the present technology, the present technology descriptor of FIG. 11 is newly defined and transmitted in the system layer. Based on the present technology descriptor transmitted in the system layer (the dialog control information included in it), the receiving device 20 can perform dialog control of multi-channel audio, such as dialog volume level control, dialog replacement control, or dialog localization position control.

(Support through extension of existing descriptors)
In the above description, the present technology descriptor (FIG. 11) is newly defined and transmitted in the system layer. However, other methods may be adopted as long as the contents (dialog control information) described in the present technology descriptor (FIG. 11) can be transmitted in the system layer.

For example, by describing the contents (dialog control information) of the present technology descriptor (FIG. 11) in the audio component descriptor (MH-Audio Component Descriptor) of FIG. 9 described above, an audio component descriptor containing dialog control information can be transmitted in the system layer. As a result, the receiving device 20 can perform dialog control such as dialog volume level control based on the audio component descriptor (the dialog control information contained in it) transmitted in the system layer.

In the audio component descriptor (FIG. 9), for example, the content of the present technology descriptor (FIG. 11) can be described after the text_char loop arranged last. However, what is arranged after the loop of text_char is an example, and the contents of the technical descriptor (FIG. 11) can be arranged at an arbitrary position in the audio component descriptor (FIG. 9). Further, the content of this technical descriptor (FIG. 11) is not limited to the audio component descriptor (FIG. 9), but may be described in another descriptor transmitted in the system layer.

(DVB support)
Further, the above description assumes descriptors standardized by ARIB, but the present technology may also be applied to other digital broadcasting standards such as DVB (Digital Video Broadcasting). That is, in other digital broadcasting standards such as DVB as well, transmitting the present technology descriptor (FIG. 11) including dialog control information in the system layer allows the receiving device 20 to perform dialog control using the dialog control information.

Here, in DVB, as shown in FIG. 12, information on dialog control is described in the DE (Dialogue Enhancement)_control_data descriptor. The detailed contents of the DE_control_data descriptor shown in FIG. 12 are described in “ETSI TS 101 154 V2.2.1 (2015-06)”.

In addition, as shown in FIG. 13, DVB defines a component descriptor (Component Descriptor). By describing the contents (dialog control information) of the present technology descriptor of FIG. 11 in this component descriptor, a component descriptor including the dialog control information can be transmitted in the system layer. As a result, the receiving device 20 can perform dialog control such as dialog volume level control based on the component descriptor (the dialog control information contained in it) transmitted in the system layer.

In the component descriptor (FIG. 13), for example, the content of the present technical descriptor (FIG. 11) can be described after the text_char loop arranged last. However, the arrangement following the text_char loop is an example, and the content of the present technology descriptor (FIG. 11) can be arranged at an arbitrary position in the component descriptor (FIG. 13). Further, the content of this technical descriptor (FIG. 11) is not limited to the component descriptor (FIG. 13), and may be described in another descriptor transmitted in the system layer.

The detailed contents of the component descriptor (Component Descriptor) shown in FIG. 13 are described in “ETSI EN 300 468 V1.14.1 (2014-05)”.

(Support for other standards)
In the above description, dialog control information was described as being standardized by ARIB for transmission in the audio stream. In other standards such as MPEG-H and AC-4 (Audio Code number 4) as well, dialog control information is transmitted at the bitstream level but is not transmitted in the system layer.

Therefore, it is desirable to make dialog control information easy to handle in standards other than digital broadcasting standards, such as MPEG-H and AC-4, in the same way as in ARIB.

For this purpose, even in standards other than digital broadcasting standards, such as MPEG-H and AC-4, dialog control information can be transmitted in the system layer (in a descriptor) either by transmitting the present technology descriptor (FIG. 11) in the system layer or by extending an existing descriptor to describe the contents of the present technology descriptor (FIG. 11). Accordingly, the receiving device 20 can perform dialog control such as dialog volume level control based on the descriptor (the dialog control information included in it) transmitted in the system layer.

In addition, when the present technology is applied to standards other than digital broadcasting standards, such as MPEG-H and AC-4, the configuration of the transmission system 1 in FIG. 1 is, for example, as follows. That is, in the transmission system 1 of FIG. 1, the transmission path 30 is a communication line such as the Internet or a telephone network, and the receiving device 20 requests the transmitting device 10, installed as a server, to distribute content via the communication line, and thereby receives and reproduces the content stream distributed by streaming from the transmission device 10.

(Correspondence by application)
Further, the dialog control information may be transmitted as a data broadcast application such as HTML5 (HyperText Markup Language 5), for example, in addition to transmission by descriptor. In this case, the reception device 20 can perform dialog control by receiving and executing an application transmitted from the transmission device 10 via the transmission path 30. The application is not limited to data broadcasting, but may be distributed from a server via communication.

<4. Configuration of each device>

Next, detailed configurations of the transmission device 10 and the reception device 20 that constitute the transmission system 1 of FIG. 1 will be described.

(Configuration of transmitter)
FIG. 14 is a diagram illustrating a configuration example of the transmission device 10 of FIG. 1.

In FIG. 14, the transmission device 10 includes a control unit 101, a component acquisition unit 102, an encoder 103, a signaling generation unit 104, a signaling processing unit 105, a packet generation unit 106, a physical layer frame generation unit 107, and a transmission unit 108.

The control unit 101 controls the operation of each unit of the transmission device 10.

The component acquisition unit 102 acquires data such as video, audio, and subtitles (components) constituting content (for example, a television program) provided by a specific service, and supplies the acquired data to the encoder 103. The encoder 103 encodes data (components) such as video and audio supplied from the component acquisition unit 102 according to a predetermined encoding method, and supplies the encoded data to the packet generation unit 106.

Note that the content is acquired, for example, from a storage location of already recorded content according to the broadcast time slot, or as live content from a studio or an on-location site. The content can also be configured to include multi-channel audio components.

The signaling generation unit 104 acquires raw data for generating signaling from an external server, a built-in storage, or the like. The signaling generation unit 104 generates signaling using the raw data of signaling and supplies it to the signaling processing unit 105. Here, as the signaling, for example, the present technology descriptor (FIG. 11) including the dialog control information or the existing descriptor (FIG. 9) is generated. The signaling processing unit 105 processes the signaling supplied from the signaling generation unit 104 and supplies it to the packet generation unit 106.

The packet generation unit 106 processes the video and audio (component) data supplied from the encoder 103 and the signaling data supplied from the signaling processing unit 105, generates packets in which those data are stored, and supplies the packets to the physical layer frame generation unit 107.

The physical layer frame generation unit 107 generates a physical layer frame by encapsulating a plurality of packets supplied from the packet generation unit 106 and supplies the physical layer frame to the transmission unit 108.

The transmission unit 108 performs, for example, OFDM (Orthogonal Frequency Division Multiplexing) modulation on the physical layer frame supplied from the physical layer frame generation unit 107, and transmits it as a digital broadcast signal via the antenna 111. Thus, the present technology descriptor (FIG. 11) or the existing descriptor (FIG. 9) including the dialog control information is transmitted in the system layer by the digital broadcast signal.

In the transmission device 10 of FIG. 14, not all the functional blocks need to be physically arranged in a single device; at least some of the functional blocks may be configured as devices physically independent of the other functional blocks.

(Receiver configuration)
FIG. 15 is a diagram illustrating a configuration example of the receiving device 20 of FIG. 1.

In FIG. 15, the reception device 20 includes a control unit 201, a reception unit 202, a physical layer frame processing unit 203, a packet processing unit 204, a signaling processing unit 205, a decoder 206, a video output unit 207, and an audio output unit 208.

The video output unit 207 is connected to a display device 221, and the audio output unit 208 is connected to speakers 222-1 to 222-N (N is an integer of 1 or more). In FIG. 15, the speakers 222-1 to 222-N are arranged to correspond to the speaker arrangement of the 22.2ch multi-channel audio sound system shown in FIG. 2. Although omitted in FIG. 15, a subwoofer may be arranged in addition to the speakers 222.

The control unit 201 controls the operation of each unit of the receiving device 20.

The reception unit 202 receives, via the antenna 211, the digital broadcast signal transmitted from the transmission device 10, performs processing such as OFDM demodulation, and supplies the resulting physical layer frame to the physical layer frame processing unit 203.

The physical layer frame processing unit 203 performs processing on the physical layer frame supplied from the receiving unit 202, extracts a packet, and supplies the packet to the packet processing unit 204.

The packet processing unit 204 processes the packet supplied from the physical layer frame processing unit 203 and acquires component and signaling data. Of the data acquired by the packet processing unit 204, signaling data is supplied to the signaling processing unit 205, and component data is supplied to the decoder 206.

The signaling processing unit 205 appropriately processes the signaling data supplied from the packet processing unit 204 and supplies it to the control unit 201.

The control unit 201 controls the operation of each unit based on the signaling supplied from the signaling processing unit 205. Specifically, the control unit 201 controls the packet filtering performed by the packet processing unit 204 based on the analysis result of the signaling, so that component data such as video and audio is supplied to the decoder 206.
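The routing and filtering performed by the packet processing unit 204 under control of the control unit 201 can be sketched as follows. This is an illustration only; the Packet type, its kind and component_id fields, and the route_packets function are hypothetical and do not correspond to any packet format defined in this description.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    kind: str          # "signaling" or "component" (hypothetical labels)
    component_id: int  # hypothetical identifier; unused for signaling
    payload: bytes


def route_packets(packets, selected_ids):
    """Split packets into signaling data and selected component data.

    selected_ids stands for the filter the control unit derives from
    its analysis of the signaling.
    """
    signaling, components = [], []
    for packet in packets:
        if packet.kind == "signaling":
            # Would be supplied to the signaling processing unit 205.
            signaling.append(packet.payload)
        elif packet.kind == "component" and packet.component_id in selected_ids:
            # Would be supplied to the decoder 206.
            components.append(packet.payload)
    return signaling, components
```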

The decoder 206 decodes the component data such as video and audio supplied from the packet processing unit 204 according to a predetermined decoding method, supplies the resulting video data to the video output unit 207, and supplies the resulting audio data to the audio output unit 208.

The video output unit 207 causes the display device 221 to display video corresponding to the video data supplied from the decoder 206 in accordance with control from the control unit 201. In addition, the audio output unit 208 causes the speakers 222-1 to 222-N to output audio corresponding to the audio data supplied from the decoder 206 in accordance with control from the control unit 201. Thereby, in the receiving device 20, for example, video and audio of content (for example, a television program) corresponding to the user's channel selection operation are output.

Further, the control unit 201 controls the audio output unit 208 based on the dialog control information included in the present technology descriptor (FIG. 11) or the existing descriptor (FIG. 9) transmitted in the system layer, and thereby controls the dialog of the multi-channel audio (for example, the 22.2ch multi-channel audio shown in FIG. 2) realized by the speakers 222-1 to 222-N.

The receiving device 20 of FIG. 15 can be, for example, a fixed receiver such as a television receiver, a recorder, or a set top box (STB), a mobile receiver such as a smartphone or a tablet terminal, or a device mounted in an automobile, such as an in-vehicle television.

<5. Flow of processing executed by each device>

Next, with reference to the flowcharts of FIGS. 16 to 17, the flow of processing executed by each device constituting the transmission system 1 of FIG. 1 will be described.

(Transmission process)
First, the flow of transmission processing executed by the transmission device 10 of FIG. 1 will be described with reference to the flowchart of FIG. 16.

In step S101, component / signaling acquisition processing is performed.

In this component / signaling acquisition process, components such as video and audio (multi-channel audio) are acquired by the component acquisition unit 102, and data of components such as video and audio are encoded by the encoder 103.

In the component / signaling acquisition process, signaling is generated by the signaling generation unit 104 and the signaling is processed by the signaling processing unit 105.

Specifically, for example, when the MMT (MPEG Media Transport) method is used as the transport protocol, dialog control information (the contents of the present technology descriptor of FIG. 11) is described in a descriptor placed in a table such as the MPT (MMT Package Table) or the MH-EIT (Event Information Table). For example, the present technology descriptor (FIG. 11) may be newly defined as a descriptor for describing the dialog control information, or an existing descriptor such as the audio component descriptor (FIG. 9) may be extended and the dialog control information described in the extension area.
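On the transmitting side, generating the body of such a descriptor can be sketched as follows. This is a minimal sketch under assumptions: the field order and bit widths follow the textual description of FIG. 11, reserved bits are written as 0, the result is zero-padded to a byte boundary, and no descriptor tag or length header is produced because the text does not specify one. The names BitWriter and build_dialog_control_body are hypothetical.

```python
class BitWriter:
    """Accumulates unsigned fields, most significant bit first."""

    def __init__(self) -> None:
        self._bits = []

    def write(self, value: int, nbits: int) -> None:
        if not 0 <= value < (1 << nbits):
            raise ValueError(f"value {value} does not fit in {nbits} bits")
        self._bits.extend((value >> s) & 1 for s in range(nbits - 1, -1, -1))

    def write_bytes(self, data: bytes) -> None:
        for byte in data:
            self.write(byte, 8)

    def to_bytes(self) -> bytes:
        # Zero-pad to a byte boundary (an assumption of this sketch).
        bits = self._bits + [0] * (-len(self._bits) % 8)
        out = bytearray()
        for i in range(0, len(bits), 8):
            byte = 0
            for bit in bits[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


def build_dialog_control_body(plus_index, minus_index, lang_code, comment,
                              channels, additional):
    """channels: list of (src_index, gain_index);
    additional: list of (3-byte lang_code, comment bytes)."""
    if len(lang_code) != 3 or any(len(code) != 3 for code, _ in additional):
        raise ValueError("language codes must be exactly 3 bytes (24 bits)")
    w = BitWriter()
    w.write(1, 1)               # ext_dialogue_status
    w.write(0, 7)               # reserved
    w.write(len(channels), 5)   # num_of_dialog_chans
    w.write(0, 3)               # reserved
    w.write(plus_index, 8)      # dialog_plus_index
    w.write(minus_index, 8)     # dialog_minus_index
    w.write_bytes(lang_code)    # dialog_lang_code
    w.write(len(comment), 8)    # dialog_main_lang_comment_bytes
    w.write_bytes(comment)      # dialog_main_lang_comment_data
    for src_index, gain_index in channels:
        w.write(src_index, 5)   # dialog_src_index
        w.write(0, 3)           # reserved
        w.write(gain_index, 4)  # dialog_gain_index
    w.write(len(additional), 4)  # num_additional_lang_chans
    w.write(0, 4)                # reserved
    for code, text in additional:
        w.write_bytes(code)      # dialog_additional_lang_code
        w.write(len(text), 8)    # dialog_additional_lang_comment_bytes
        w.write_bytes(text)      # dialog_additional_lang_comment_data
    return w.to_bytes()
```

The resulting bytes would then be placed in a descriptor carried in a table such as the MPT or the MH-EIT; that wrapping is omitted here.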

In step S102, packet / frame generation processing is performed.

In this packet / frame generation process, a packet is generated by the packet generation unit 106, and a physical layer frame is generated by the physical layer frame generation unit 107.

In step S103, digital broadcast signal transmission processing is performed.

In this digital broadcast signal transmission process, the transmission unit 108 performs a process on the physical layer frame and transmits it as a digital broadcast signal via the antenna 111. Thus, the present technology descriptor (FIG. 11) or the audio component descriptor (FIG. 9) including the dialog control information is transmitted in the system layer by the digital broadcast signal.

The flow of transmission processing has been described above.

(Reception processing)
Next, the flow of reception processing executed by the reception device 20 of FIG. 1 will be described with reference to the flowchart of FIG. 17.

In step S201, digital broadcast signal reception processing is performed.

In this digital broadcast signal reception process, the receiver 202 receives the digital broadcast signal via the antenna 211.

In step S202, packet / frame processing is performed.

In this packet / frame processing, the physical layer frame processing unit 203 extracts a packet from the physical layer frame, and the packet processing unit 204 processes the packet.

In step S203, signaling / component processing is performed.

In this signaling / component processing, the control unit 201 controls the operation of each unit based on the signaling, and the decoder 206 decodes component data such as video and audio. As a result, the video output unit 207 displays the content video on the display device 221 in accordance with the control from the control unit 201. Also, the audio output unit 208 outputs the audio of the content from the speakers 222-1 to 222-N according to the control from the control unit 201.

Further, the control unit 201 controls the audio output unit 208 based on the dialog control information included in the present technology descriptor (FIG. 11) or the audio component descriptor (FIG. 9) transmitted in the system layer, and thereby performs dialog control of the multi-channel audio realized by the speakers 222-1 to 222-N, such as dialog volume level control, dialog replacement control, or dialog localization position control.

More specifically, the control unit 201 adjusts the volume level of the 22.2ch multi-channel audio realized by the speakers 222-1 to 222-N according to the dialog volume adjustment instruction included in the dialog control information. With this volume level adjustment, for example, only the level of the narration voice can be raised for a hearing-impaired listener.

For example, in this dialog control, when an instruction to raise the levels of the dialog dedicated channel speakers FC (for the front center channel) and BtFC (for the lower front center channel) by x dB from the reference value is described, each level of the 20.2 channels other than the speakers FC and BtFC is lowered by x dB, within the range 0 dB ≦ x ≦ +12 dB indicated by dialog_plus_index. On the other hand, when an instruction to lower the levels of the dialog dedicated channel speakers FC and BtFC by x dB from the reference value is described, each level of the speakers FC and BtFC is lowered by x dB, within the range −∞ dB ≦ −x ≦ 0 dB indicated by dialog_minus_index.
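The level adjustment described above can be sketched as a per-channel offset calculation. This is an illustration under assumptions: the upper limit is passed directly in dB because the text does not define how dialog_plus_index maps to decibels, requests above the limit are clamped (the text does not say how out-of-range requests are handled), and the function names are hypothetical.

```python
def db_to_linear(db: float) -> float:
    """Convert a level offset in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def dialog_level_offsets(channels, dialog_channels, x_db, plus_limit_db=12.0):
    """Return a dict of per-channel level offsets in dB.

    x_db >= 0: emphasize the dialog by lowering every non-dialog
               channel by x_db (clamped to plus_limit_db).
    x_db <  0: attenuate the dialog by lowering the dialog dedicated
               channels by |x_db| (no lower bound, matching -inf dB).
    """
    x_db = min(x_db, plus_limit_db)
    offsets = {}
    for name in channels:
        is_dialog = name in dialog_channels
        if x_db >= 0:
            offsets[name] = 0.0 if is_dialog else -x_db
        else:
            offsets[name] = x_db if is_dialog else 0.0
    return offsets
```

For instance, with dialog channels FC and BtFC and x_db = 6, the offsets for FC and BtFC stay at 0 dB while every other channel receives −6 dB; each offset can then be turned into an amplitude factor with db_to_linear before it is applied to the samples.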

Further, in accordance with the dialog replacement instruction included in the dialog control information, the control unit 201 replaces, in the 22.2ch multi-channel audio realized by the speakers 222-1 to 222-N, for example, the Japanese dialog input to the speaker FC (for the front center channel) and the speaker BtFC (for the lower front center channel) with an English or French dialog.

For example, in this dialog control, when a dialog replacement instruction is received, dialog_gain_index [0] (−3 dB), indicating the assignment level to the speaker FC, and dialog_gain_index [1] (0 dB), indicating the assignment level to the speaker BtFC, are referenced. In place of the Japanese dialog, the English dialog with its level lowered by 3 dB is assigned to the speaker FC, and the English dialog with its level lowered by 0 dB is assigned to the speaker BtFC. This replaces the Japanese dialog with the English dialog.
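The replacement step can be sketched as follows, assuming that each dialog dedicated channel carries only dialog, that audio is represented as plain lists of float samples, and that the assignment levels are already expressed in dB (the mapping from dialog_gain_index values to decibels is not given in the text). The function names are hypothetical.

```python
def db_to_linear(db: float) -> float:
    """Convert a level offset in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def replace_dialog(channel_samples, replacement, assignment_db):
    """Return a new channel dict in which each dialog dedicated channel
    listed in assignment_db carries the replacement dialog, scaled by
    that channel's assignment level in dB.

    channel_samples: dict of speaker name -> list of float samples
    replacement:     list of float samples of the replacing dialog
    assignment_db:   dict of speaker name -> assignment level in dB
    """
    out = dict(channel_samples)  # shallow copy; the input is not modified
    for speaker, gain_db in assignment_db.items():
        gain = db_to_linear(gain_db)
        out[speaker] = [sample * gain for sample in replacement]
    return out
```

With the values from the example above, assignment_db would be {"FC": -3.0, "BtFC": 0.0}, so the English dialog reaches the speaker FC attenuated by 3 dB and the speaker BtFC unattenuated, while all other channels pass through unchanged.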

Note that 22.2ch multi-channel audio has been described here as an example, but dialog control can be performed in the same manner when 3D VBAP or the like is used. In addition to descriptors standardized by ARIB, dialog control can likewise be performed in other digital broadcasting standards such as DVB, and in other standards such as MPEG-H and AC-4, by transmitting a descriptor including dialog control information in the system layer.

The flow of reception processing has been described above.

<6. Computer configuration>

The series of processes described above can be executed by hardware or software. When a series of processing is executed by software, a program constituting the software is installed in the computer. FIG. 18 is a diagram illustrating a configuration example of hardware of a computer that executes the above-described series of processing by a program.

In the computer 900, a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, and a RAM (Random Access Memory) 903 are connected to each other by a bus 904. An input / output interface 905 is further connected to the bus 904. An input unit 906, an output unit 907, a recording unit 908, a communication unit 909, and a drive 910 are connected to the input / output interface 905.

The input unit 906 includes a keyboard, a mouse, a microphone, and the like. The output unit 907 includes a display, a speaker, and the like. The recording unit 908 includes a hard disk, a nonvolatile memory, and the like. The communication unit 909 includes a network interface or the like. The drive 910 drives a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer 900 configured as described above, the CPU 901 loads the program recorded in the ROM 902 or the recording unit 908 into the RAM 903 via the input / output interface 905 and the bus 904 and executes it, whereby the series of processes described above is performed.

The program executed by the computer 900 (CPU 901) can be provided by being recorded on a removable medium 911 as a package medium, for example. The program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.

In the computer 900, the program can be installed in the recording unit 908 via the input / output interface 905 by installing the removable medium 911 in the drive 910. Further, the program can be received by the communication unit 909 via a wired or wireless transmission medium and installed in the recording unit 908. In addition, the program can be installed in the ROM 902 or the recording unit 908 in advance.

Here, in the present specification, the processing performed by the computer according to the program does not necessarily have to be performed in chronological order in the order described as the flowchart. That is, the processing performed by the computer according to the program includes processing executed in parallel or individually (for example, parallel processing or object processing). The program may be processed by a single computer (processor) or may be distributedly processed by a plurality of computers.

Note that the embodiments of the present technology are not limited to the above-described embodiments, and various modifications can be made without departing from the gist of the present technology.

Also, the present technology can take the following configurations.

(1)
A receiving device including:
a receiving unit that receives a stream including multi-channel or stereo audio components transmitted via a transmission path;
an acquisition unit that acquires dialog control information for controlling a multi-channel or stereo audio dialog, the dialog control information being transmitted in a system layer that processes signaling defined by a predetermined standard; and
a control unit that controls the multi-channel or stereo audio dialog transmitted via the transmission path based on the dialog control information.
(2)
The receiving device according to (1), wherein
the predetermined standard is a digital broadcasting standard,
the receiving unit receives a broadcast wave of the digital broadcast,
the acquisition unit acquires the dialog control information transmitted in a system layer that processes signaling defined in the digital broadcasting standard, and
the control unit controls the multi-channel or stereo audio dialog transmitted via the broadcast wave.
(3)
The receiving apparatus according to (2), wherein the dialog control information is described in a newly defined descriptor.
(4)
The receiving apparatus according to (2), wherein the dialog control information is described in an extension area of a component descriptor.
(5)
The receiving apparatus according to (2) or (4), wherein the dialog control information is described in an extension area of an audio component descriptor.
(6)
The receiving apparatus according to (1), wherein the dialog control information is described as an application including an HTML5 (HyperText Markup Language 5) standard.
(7)
The receiving apparatus according to any one of (1) to (6), wherein the dialog control information includes information related to dialog volume level control, dialog replacement control, or dialog localization position control.
(8)
A data processing method for a receiving device, the method including steps in which the receiving device:
receives a stream including multi-channel or stereo audio components transmitted via a transmission path;
acquires dialog control information for controlling a multi-channel or stereo audio dialog, the dialog control information being transmitted in a system layer that processes signaling defined by a predetermined standard; and
controls the multi-channel or stereo audio dialog transmitted via the transmission path based on the dialog control information.
(9)
A transmission device including:
an acquisition unit that acquires a stream including multi-channel or stereo audio components;
a generation unit that generates dialog control information for controlling a multi-channel or stereo audio dialog; and
a transmission unit that transmits the dialog control information together with the stream via a transmission path,
wherein the dialog control information is transmitted in a system layer that processes signaling defined by a predetermined standard.
(10)
The transmission apparatus according to (9), wherein
the predetermined standard is a digital broadcasting standard,
the transmission unit transmits the dialog control information together with the stream by a broadcast wave of the digital broadcast, and
the dialog control information is transmitted in a system layer that processes signaling defined in the digital broadcasting standard.
(11)
The transmission device according to (10), wherein the dialog control information is described in a newly defined descriptor.
(12)
The transmission apparatus according to (10), wherein the dialog control information is described in an extension area of a component descriptor.
(13)
The transmission apparatus according to (10) or (12), wherein the dialog control information is described in an extension area of an audio component descriptor.
(14)
The transmission apparatus according to (9), wherein the dialog control information is described as an application including the HTML5 standard.
(15)
The transmission apparatus according to any one of (9) to (14), wherein the dialog control information includes information related to dialog volume level control, dialog replacement control, or dialog localization position control.
(16)
A data processing method for a transmission device, the method including steps in which the transmission device:
acquires a stream including multi-channel or stereo audio components;
generates dialog control information for controlling a multi-channel or stereo audio dialog; and
transmits the dialog control information together with the stream via a transmission path,
wherein the dialog control information is transmitted in a system layer that processes signaling defined by a predetermined standard.

1 transmission system, 10 transmission device, 20 reception device, 30 transmission path, 101 control unit, 102 component acquisition unit, 104 signaling generation unit, 106 packet generation unit, 107 physical layer frame generation unit, 108 transmission unit, 201 control unit, 202 receiving unit, 203 physical layer frame processing unit, 204 packet processing unit, 205 signaling processing unit, 206 decoder, 207 video output unit, 208 audio output unit, 221 display device, 222-1 to 222-N speaker, 900 computer, 901 CPU

Claims

A receiving device comprising:
a receiving unit that receives a stream including multi-channel or stereo audio components transmitted via a transmission path;
an acquisition unit that acquires dialog control information for controlling a multi-channel or stereo audio dialog, the dialog control information being transmitted in a system layer that processes signaling defined by a predetermined standard; and
a control unit that controls the multi-channel or stereo audio dialog transmitted via the transmission path based on the dialog control information.
The receiving device according to claim 1, wherein
the predetermined standard is a digital broadcasting standard,
the receiving unit receives a broadcast wave of the digital broadcast,
the acquisition unit acquires the dialog control information transmitted in a system layer that processes signaling defined in the digital broadcasting standard, and
the control unit controls the multi-channel or stereo audio dialog transmitted via the broadcast wave.
The receiving device according to claim 2, wherein the dialog control information is described in a newly defined descriptor.
The receiving device according to claim 2, wherein the dialog control information is described in an extension area of a component descriptor.
The receiving device according to claim 4, wherein the dialog control information is described in an extension area of an audio component descriptor.
The receiving device according to claim 1, wherein the dialog control information is described as an application including an HTML5 (HyperText Markup Language 5) standard.
The receiving device according to claim 1, wherein the dialog control information includes information related to dialog volume level control, dialog replacement control, or dialog localization position control.
A data processing method for a receiving device, the method including steps in which
the receiving device
receives a stream including multi-channel or stereo audio components transmitted via a transmission path,
acquires dialog control information for controlling a dialog of the multi-channel or stereo audio, the dialog control information being transmitted in a system layer that processes signaling defined by a predetermined standard, and
controls, on the basis of the dialog control information, the dialog of the multi-channel or stereo audio transmitted via the transmission path.
A transmission device including:
an acquisition unit that acquires a stream including multi-channel or stereo audio components;
a generation unit that generates dialog control information for controlling a dialog of the multi-channel or stereo audio; and
a transmission unit that transmits the dialog control information together with the stream via a transmission path,
wherein the dialog control information is transmitted in a system layer that processes signaling defined by a predetermined standard.
The transmission device according to claim 9, wherein
the predetermined standard is a digital broadcasting standard,
the transmission unit transmits the dialog control information together with the stream by a broadcast wave of the digital broadcasting, and
the dialog control information is transmitted in a system layer that processes signaling defined in the digital broadcasting standard.
The transmission device according to claim 10, wherein the dialog control information is described in a newly defined descriptor.
The transmission device according to claim 10, wherein the dialog control information is described in an extension area of a component descriptor.
The transmission device according to claim 12, wherein the dialog control information is described in an extension area of an audio component descriptor.
The transmission device according to claim 9, wherein the dialog control information is described as an application including an HTML5 standard.
The transmission device according to claim 9, wherein the dialog control information includes information related to dialog volume level control, dialog replacement control, or dialog localization position control.
A data processing method for a transmission device, the method including steps in which
the transmission device
acquires a stream including multi-channel or stereo audio components,
generates dialog control information for controlling a dialog of the multi-channel or stereo audio, and
transmits the dialog control information together with the stream via a transmission path,
wherein the dialog control information is transmitted in a system layer that processes signaling defined by a predetermined standard.