WO2016038876A1 - Encoding device, decoding device, and speech signal processing device

Info

Publication number: WO2016038876A1
Application number: PCT/JP2015/004534
Authority: WO
Inventors: 岳大杉本; 靖茂中山; 小森　智康
Original assignee: 日本放送協会
Priority date: 2014-09-08
Filing date: 2015-09-07
Publication date: 2016-03-17
Also published as: JP6683618B2; JP6924862B2; JP6924863B2; JP2020101837A; JPWO2016038876A1; JP2020101836A

Abstract

The purpose of the present invention is to provide a system wherein an audience can control dialog by using a receiver or the like within the framework for channel-based methods of production and encoding. A coding device (1) is configured to encode an input speech signal and comprises: a compression encoding unit (11) for compressing and encoding a speech signal, and then outputting the speech signal as a compressed speech signal; and a multiplexing unit (12) for multiplexing input meta-data for controlling dialog with the compressed speech signal and outputting the multiplexed data.

Description

Encoding device, decoding device, and audio signal processing device

Cross-reference to related applications

This application claims the priority of Japanese Patent Application No. 2014-182695 (filed on September 8, 2014), the entire disclosure of which is incorporated herein by reference.

The present invention relates to an encoding device, a decoding device, and an audio signal processing device.

Many of the opinions that viewers express about broadcast audio concern how easy the dialog (narration, speech, etc.) is to hear. Conventional Japanese broadcast audio employs a channel-based method in which a sound engineer on the broadcast station side fixes the volume balance between the dialog and the background (for example, Non-Patent Document 1). One example of a channel-based coding system is MPEG-4 AAC (for example, Non-Patent Document 2).

In order to make dialog easier to hear, studies in Europe and the United States are moving toward adopting an object-based method (for example, Patent Document 1) for the audio systems of next-generation broadcasting. The object-based method transmits audio using a coding system such as MPEG-H 3D Audio (for example, Non-Patent Document 3) or Dolby's AC-4, and allows important audio objects such as dialog to be controlled at the receiver.

In the channel-based method adopted in Japan as described above, the viewer operating the receiver cannot adjust the volume of the dialog. However, given the diversity of viewers' preferences, ages, and playback environments, a single volume balance set by the broadcast station may not suit every situation. This is considered to be one of the factors that make dialog difficult to hear.

Japan's 8K SHV 22.2ch audio encoding method is the above-mentioned MPEG-4 AAC, which is a channel-based method in which audio signals and speakers correspond one-to-one. In addition, the audio encoding system of Japanese terrestrial digital broadcasting is MPEG-2 AAC, which is a channel-based system. Therefore, at present, it is impossible to control a sound object such as a dialog.

An object of the present invention, made in view of such circumstances, is to provide an encoding device, a decoding device, and an audio signal processing device that realize a mechanism by which a viewer can control the dialog using a receiver or the like, within the framework of a channel-based production method and a channel-based coding method.

The invention according to the first aspect, which achieves the above object, is
an encoding device that encodes an input audio signal, comprising:
a compression encoding unit that compresses and encodes the audio signal and outputs a compressed audio signal; and
a multiplexing unit that multiplexes input dialog control metadata with the compressed audio signal and outputs the result.

Further, the dialog control metadata may include a flag indicating whether or not the program corresponds to the dialog control function, and an upper limit value and a lower limit value of gain control in the receiver or a playback device connected to the receiver.

Further, the multiplexing unit may encode the information on the upper limit value and the lower limit value.

The invention according to the second aspect, which achieves the above object, is
a decoding device comprising:
a separation unit that separates an input signal, in which dialog control metadata and a compressed audio signal are multiplexed, into the dialog control metadata and the compressed audio signal; and
a decoding unit that decodes the compressed audio signal.

The invention according to the third aspect, which achieves the above object, is
an audio signal processing device that performs audio signal processing using the dialog control metadata separated in the decoding device and either the audio signal decoded in the decoding device or the compressed audio signal that has not been decoded, comprising:
a dialog control availability determination unit that determines whether or not dialog control is possible based on a flag indicating whether or not the program corresponds to a dialog control function;
a dialog dedicated channel signal specifying unit that specifies a dialog dedicated channel signal; and
a control unit that acquires, as dialog control information, the upper limit value and lower limit value of the gain control amount of the dialog dedicated channel signal, performs different signal processing on the dialog dedicated channel signal and on any number of other channel signals, and outputs the result as an audio signal.

The audio signal processing device may further include a control information acquisition unit that acquires control information for the dialog from an external control information input device, and the control unit may adjust and output the audio signal based on the control information.

Further, the control information acquisition unit may acquire dialog volume adjustment information as the control information for the dialog, and the control unit may adjust and output the audio signal based on the adjustment information.

Further, the dialog dedicated channel signal specifying unit may specify the signal of the dialog dedicated channel based on the audio system metadata acquired from the decoding device.

Further, the dialog dedicated channel signal specifying unit may specify the signal of the dialog dedicated channel using information acquired from an external device other than the decoding device.

Further, the control unit may further perform speech speed conversion processing on the dialog.

In addition, when the control unit acquires dialog volume adjustment information that is higher than the upper limit value of the gain control amount or lower than the lower limit value, the control unit may limit the adjustment to the upper limit value or lower limit value of the gain control amount.

In addition, the control unit may reduce the gain of the channel signals other than the dialog dedicated channel signal when it acquires adjustment information for increasing the dialog volume, and may reduce only the gain of the dialog dedicated channel signal when it acquires adjustment information for reducing the dialog volume.

In addition, the control unit may convert the number of channels by conversion means including downmix after controlling the dialog.

In addition, the control unit may perform signal processing including frequency correction processing on either or both of the dialog dedicated channel signal and any number of other channel signals.

In addition, the control unit may perform the audio signal processing directly on the compressed audio signal separated from the bit stream in the decoding device, without decoding it, and then either decode the result and output it as an audio signal or output it as a compressed audio signal without decoding.

In addition, after the audio signal processing, the control unit may multiplex the compressed audio signal with either or both of the dialog control metadata and the audio method metadata, and output the result as a bit stream.

According to the encoding device, the decoding device, and the audio signal processing device of the present invention, it is possible to realize a mechanism that allows the viewer to control the dialog using a receiver or a playback device connected to the receiver, within the framework of the channel-based production method and the channel-based coding method.

FIG. 1 is a diagram showing a three-dimensional sound system according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of an encoding device according to an embodiment of the present invention.
FIG. 3 is a functional block diagram of a decoding device according to an embodiment of the present invention.
FIG. 4 is a functional block diagram of an audio signal processing device and a control information input device according to an embodiment of the present invention.
FIG. 5 is a diagram showing the operation flow of an audio signal processing system according to an embodiment of the present invention.

Hereinafter, a mechanism for enabling dialog control in a receiver or in a playback device connected to the receiver (an external playback device such as a speaker, or a recording device), hereinafter collectively referred to as "a receiver or the like", will be described. In this embodiment, a 22.2ch sound system for 8K SHV is described as an example of a sound system having a plurality of audio channels and a dialog dedicated channel.

The audio signal processing system according to the present embodiment includes an encoding device 1, a decoding device 2, an audio signal processing device 3, and a control information input device 4, which communicate with each other via a wired or wireless network. The following description covers each function of the audio signal processing system according to the present invention, but this is not intended to exclude other functions that the system may have.

FIG. 1 is a diagram showing the three-dimensional sound system at the time of production, for a production method that supports the dialog control function in the 22.2ch sound system. As shown in FIG. 1, programs for the ultra-high-definition, high-presence video and audio system are produced under standard production conditions in which a large-screen video display 1a (for example, 7680 × 4320 pixels) and speakers are arranged. Under the standard production conditions, centered on the listening position in front of the large-screen video display 1a, the audio signal is produced with a total of 22 channels of speakers, excluding the low-frequency effects speakers LFE1 and LFE2: an upper layer consisting of 9 channels, a middle layer consisting of 10 channels, and a lower layer consisting of 3 channels. The positions at which the 22 channels of speakers are arranged are defined in the standard SMPTE ST 2036-2-2008.

In order to realize the dialog control function using the channel-based method, a dedicated channel for dialog that does not overlap background sounds is required. In the present embodiment, the FC in FIG. 1 will be described as a dialog-dedicated channel as an example. There may be a plurality of dialog-dedicated channels. When there are a plurality of dialog dedicated channels, these dialog dedicated channels may reproduce the same audio signal or different audio signals.

FIG. 2 is a functional block diagram of the encoding device 1. The encoding device 1 includes a compression encoding unit 11 and a multiplexing unit 12. Various operations performed by the compression encoding unit 11 and the multiplexing unit 12 are processed by an arbitrary processing device such as a processor or a microcomputer (not shown).

The compression encoding unit 11 acquires the input audio signal and compresses and encodes the digital audio signal. The compression encoding unit 11 converts the compression encoded audio signal into a 22.2ch compressed audio signal and outputs the converted signal to the multiplexing unit 12.

The multiplexing unit 12 receives the compressed audio signal from the compression encoding unit 11, together with the input dialog control metadata and metadata indicating the audio method (for example, the channel configuration in MPEG Audio).

Next, the multiplexing unit 12 encodes the dialog control metadata and the metadata indicating the audio method, and multiplexes it with the acquired compressed audio signal. The dialog control metadata is, for example, data such as a flag indicating whether or not the program corresponds to a dialog control function, an upper limit value and a lower limit value of gain control in a receiver or the like. The multiplexing unit 12 stores metadata in, for example, a DSE (Data Stream Element) in the user extension area when transmitting in MPEG-4 AAC. The multiplexing unit 12 outputs the multiplexed data as a bit stream.
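
As a rough illustration of this multiplexing step, the following sketch packs the dialog control metadata (the flag and the two gain limits) into a small payload and attaches it to a compressed audio frame. The byte layout, the class `DialogControlMetadata`, and the functions `pack_dialog_control` and `multiplex` are assumptions made only for this example; the description does not define a syntax, and this is not the DSE syntax of MPEG-4 AAC.

```python
import math
import struct
from dataclasses import dataclass


@dataclass
class DialogControlMetadata:
    supported: bool        # flag: the program corresponds to the dialog control function
    upper_limit_db: float  # upper limit of gain control in the receiver or the like
    lower_limit_db: float  # lower limit of gain control (may be -inf)


def pack_dialog_control(meta: DialogControlMetadata) -> bytes:
    # Hypothetical layout: one flag byte followed by two big-endian float32 values.
    return struct.pack(">Bff", int(meta.supported),
                       meta.upper_limit_db, meta.lower_limit_db)


def multiplex(metadata_payload: bytes, compressed_audio: bytes) -> bytes:
    # Hypothetical container: 2-byte metadata length, the metadata, then the audio.
    return struct.pack(">H", len(metadata_payload)) + metadata_payload + compressed_audio


meta = DialogControlMetadata(supported=True, upper_limit_db=18.0, lower_limit_db=-math.inf)
bitstream = multiplex(pack_dialog_control(meta), b"\x00\x01\x02")
```

In a real transmission the payload would be carried in whatever metadata area the coding system provides; the length prefix here only stands in for that framing.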

FIG. 3 is a functional block diagram of the decoding device 2. The decoding device 2 includes a separation unit 21, a metadata separation unit 22, and a decoding unit 23. Various operations performed by the separation unit 21, the metadata separation unit 22, and the decoding unit 23 are processed by an arbitrary processing device such as a processor or a microcomputer (not shown).

The separation unit 21 separates the bit stream acquired from the encoding device 1. Specifically, the separation unit 21 separates the bit stream (input signal) into metadata and a compressed audio signal, and outputs them to the metadata separation unit 22 and the decoding unit 23, respectively.

The metadata separation unit 22 separates the acquired metadata into dialog control metadata and audio method metadata.
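
The two-stage separation performed by the separation unit 21 and the metadata separation unit 22 might be sketched as follows. The container layout (a 2-byte length prefix, a 9-byte dialog control payload, and the remaining metadata bytes treated as audio method metadata) is a hypothetical format chosen only for this example, as are the function names.

```python
import struct


def separate(bitstream: bytes) -> tuple[bytes, bytes]:
    """Separation unit 21: split the bit stream into metadata and compressed audio."""
    (meta_len,) = struct.unpack_from(">H", bitstream, 0)
    metadata = bitstream[2:2 + meta_len]
    compressed_audio = bitstream[2 + meta_len:]
    return metadata, compressed_audio


def separate_metadata(metadata: bytes) -> tuple[dict, bytes]:
    """Metadata separation unit 22: split into dialog control and audio method metadata."""
    flag, upper_db, lower_db = struct.unpack_from(">Bff", metadata, 0)
    dialog_control = {
        "supported": bool(flag),
        "upper_limit_db": upper_db,
        "lower_limit_db": lower_db,
    }
    audio_method = metadata[9:]  # hypothetical: remaining bytes describe the channel configuration
    return dialog_control, audio_method


# Build a toy bit stream: dialog control payload, 1 byte of audio method metadata, then audio.
payload = struct.pack(">Bff", 1, 18.0, -60.0) + b"\x0d"
stream = struct.pack(">H", len(payload)) + payload + b"AUDIO"

metadata, audio = separate(stream)
dialog_control, audio_method = separate_metadata(metadata)
```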

The decoding unit 23 decodes the acquired compressed audio signal into an audio signal. Note that the decoding unit 23 does not have to decode the acquired compressed audio signal. In this case, the control unit 34 of the audio signal processing device 3 performs the audio signal processing described later directly on the compressed audio signal, and then decodes the result and outputs it as an audio signal. Alternatively, after performing the audio signal processing described later on the compressed audio signal, the control unit 34 may output the result as a compressed audio signal without decoding it.

FIG. 4 is a functional block diagram of the audio signal processing device 3 and the control information input device 4. The audio signal processing device 3 is arranged, for example, at a subsequent stage of the decoding device 2 and acquires dialog control metadata, audio method metadata, and an audio signal from the decoding device 2. The audio signal processing device 3 includes a dialog control availability determination unit 31, a dialog dedicated channel signal specifying unit 32, an audio signal separation unit 33, a control unit 34, a control information acquisition unit 35, and a storage unit 36. Various operations (audio signal processing) performed by the dialog dedicated channel signal specifying unit 32, the audio signal separation unit 33, the control unit 34, and the control information acquisition unit 35 are processed by an arbitrary processing device such as a processor or a microcomputer (not shown).

Based on the dialog control metadata acquired from the decoding device 2 (the flag indicating whether or not the program corresponds to the dialog control function), the dialog control availability determination unit 31 determines whether the audio signal acquired from the decoding device 2 belongs to a program that corresponds to the dialog control function (that is, whether or not dialog control is possible). When the dialog control availability determination unit 31 determines that the audio signal does not belong to such a program, the audio signal processing device 3 outputs the audio signal to a receiver or the like without performing audio signal processing.

The dialog dedicated channel signal specifying unit 32 specifies the signal of the dialog dedicated channel based on the audio method metadata acquired from the decoding device 2. The dialog dedicated channel signal specifying unit 32 may specify the signal of the dialog dedicated channel using information acquired from an external device other than the decoding device 2.

The audio signal separation unit 33 separates the audio signal into a dialog dedicated channel signal and other background sound channel signals based on the specification by the dialog dedicated channel signal specifying unit 32.
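
A minimal sketch of this separation, assuming that a decoded frame is held as a mapping from channel label to samples and that the dialog dedicated channel has already been specified by label. The function name `split_channels` and the data representation are illustrative assumptions; only FC and LFE1 are labels taken from this description, and FL and FR are added as typical 22.2ch labels.

```python
def split_channels(frame: dict[str, list[float]],
                   dialog_channels: set[str]) -> tuple[dict, dict]:
    """Split a frame into dialog dedicated channel signals and background sound channel signals."""
    dialog = {name: s for name, s in frame.items() if name in dialog_channels}
    background = {name: s for name, s in frame.items() if name not in dialog_channels}
    return dialog, background


# Toy frame with four of the 24 channels; FC is the dialog dedicated channel.
frame = {"FL": [0.1, 0.2], "FR": [0.1, 0.2], "FC": [0.5, 0.4], "LFE1": [0.0, 0.0]}
dialog, background = split_channels(frame, {"FC"})
```

Passing a set of labels rather than a single label keeps the sketch usable when there are a plurality of dialog dedicated channels.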

The control unit 34 acquires a dialog dedicated channel signal and a background sound channel signal from the audio signal separation unit 33.

Next, based on the dialog control metadata acquired from the decoding device 2, the control unit 34 acquires the upper limit value and lower limit value of gain control in the receiver or the like (for example, an upper limit value of +18 dB and a lower limit value of −∞).

In addition, since the audio system is 22.2 ch, the control unit 34 refers to the storage unit 36 and specifies a dialog dedicated channel (FC in FIG. 1 in this embodiment). The control unit 34 may specify the dialog dedicated channel from other information (for example, program information).

Further, the control unit 34 acquires, via the control information acquisition unit 35, control information (for example, volume adjustment information) that the viewer has input by remote control operation, in accordance with the reception and viewing environment, to the control information input device 4 outside the audio signal processing device 3. The control unit 34 controls the dialog dedicated channel signal and the background sound channel signal by using the dialog control metadata and the control information given by the viewer.

In this control, the control unit 34 may perform speech speed conversion processing on the dialog. Further, when the control unit 34 acquires dialog volume adjustment information that is higher than the upper limit value of the gain control amount or lower than the lower limit value, the control unit 34 may limit the adjustment to the upper limit value or lower limit value of the gain control amount.
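
The limiting of the adjustment can be illustrated with a short sketch. The function names `clamp_gain_db` and `db_to_linear` are assumptions introduced for this example; the limits of +18 dB and −∞ are the example values given in this description, and treating −∞ dB as a linear gain of zero (mute) is the usual decibel convention rather than something the description states.

```python
import math


def clamp_gain_db(requested_db: float, upper_db: float, lower_db: float) -> float:
    """Limit the viewer's requested dialog gain to the range carried in the metadata."""
    return max(lower_db, min(upper_db, requested_db))


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor; -inf dB maps to 0 (mute)."""
    return 0.0 if gain_db == -math.inf else 10.0 ** (gain_db / 20.0)


# With an upper limit of +18 dB and a lower limit of -inf, a request of +24 dB is
# limited to +18 dB, while any attenuation request passes through unchanged.
limited = clamp_gain_db(24.0, 18.0, -math.inf)
```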

In this control, the control unit 34 may perform different signal processing on the dialog dedicated channel signal and the background sound channel signal. For example, the control unit 34 may reduce the gain of the channel signals other than the dialog dedicated channel signal when it acquires adjustment information for increasing the dialog volume, and may reduce only the gain of the dialog dedicated channel signal when it acquires adjustment information for reducing the dialog volume. Further, the control unit 34 may simultaneously increase or decrease the volume of the dialog dedicated channel signal and the background sound channel signal after adjusting the dialog volume. Further, the control unit 34 may perform signal processing including frequency correction processing on either or both of the dialog dedicated channel signal and any number of other channel signals.
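
The asymmetric gain handling in the example above can be sketched as follows, assuming each group of channels is held as a mapping from channel label to samples and the adjustment is given in dB (positive to raise the dialog, negative to lower it). The function `apply_dialog_adjustment` and this representation are assumptions for illustration only.

```python
def apply_dialog_adjustment(dialog: dict, background: dict, adjust_db: float):
    """Return (dialog, background) after a relative dialog adjustment of adjust_db decibels."""
    if adjust_db > 0:
        # Raising the dialog: leave the dialog as it is and attenuate the other channels.
        g = 10.0 ** (-adjust_db / 20.0)
        return dialog, {ch: [x * g for x in s] for ch, s in background.items()}
    # Lowering the dialog: attenuate only the dialog dedicated channel signal.
    g = 10.0 ** (adjust_db / 20.0)
    return {ch: [x * g for x in s] for ch, s in dialog.items()}, background


dialog = {"FC": [1.0]}
background = {"FL": [1.0], "FR": [1.0]}

louder_dialog, quieter_background = apply_dialog_adjustment(dialog, background, 20.0)
quieter_dialog, same_background = apply_dialog_adjustment(dialog, background, -20.0)
```

In both directions no channel is amplified above its original level, so the overall volume can then be raised or lowered for all channels together as a separate step.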

In addition, the control unit 34 converts the number of channels by conversion means including downmixing as necessary, and then outputs to the receiver a 22.2ch audio signal combining the dialog dedicated channel signal and the background sound channel signal. The receiver outputs the audio signal from a playback device connected to the receiver, and as a result, the viewer can listen to the audio as specified by the control information. When the above audio signal processing is performed on the compressed audio signal as it is, the control unit 34 may multiplex the compressed audio signal with either or both of the dialog control metadata and the audio method metadata and output the result as a bit stream, or may output the compressed audio signal without multiplexing the metadata.

FIG. 5 is a diagram showing an operation flow according to an embodiment of the present invention.

The encoding device 1 acquires the input audio signal (step S1) and performs compression encoding (step S2). Next, the encoding device 1 multiplexes the compressed audio signal that has been compression-encoded, the dialog control metadata, and the metadata indicating the audio method (step S3). The encoding device 1 outputs the multiplexed data to the decoding device 2 as a bit stream (step S4).

The decoding device 2 separates the bit stream acquired from the encoding device 1 into metadata and a compressed audio signal (step S5). The decoding device 2 also separates the metadata into dialog control metadata and audio method metadata (step S6). Next, the decoding device 2 decodes the acquired compressed audio signal into an audio signal (step S7), and outputs the dialog control metadata, audio method metadata, and audio signal to the audio signal processing device 3 (step S8).

The audio signal processing device 3 determines whether or not the audio signal acquired from the decoding device 2 belongs to a program corresponding to the dialog control function (step S9). If the audio signal processing device 3 determines that the audio signal does not belong to such a program (No in step S9), it does not perform steps S10 to S14.

On the other hand, when the audio signal processing device 3 determines that the audio signal belongs to a program corresponding to the dialog control function (Yes in step S9), it acquires the upper limit value and lower limit value of gain control in the receiver or the like from the dialog control metadata (step S10). Next, the audio signal processing device 3 specifies the signal of the dialog dedicated channel (step S11). Based on the specification, the audio signal processing device 3 separates the audio signal into a dialog dedicated channel signal and other background sound channel signals (step S12).

The audio signal processing device 3 acquires control information (for example, volume adjustment information) from the control information input device 4 outside the audio signal processing device 3 via the control information acquisition unit 35 (step S13). The audio signal processing device 3 adjusts the audio signal based on the control information (step S14).

Next, the audio signal processing device 3 outputs an audio signal to a receiver or the like (step S15).
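
Steps S9 to S15 can be summarized in a compact sketch. It assumes a frame held as a mapping from channel label to samples and dialog control metadata held as a dictionary; the function `process_frame`, the dictionary keys, and the choice to apply the limited gain directly to the dialog dedicated channel are assumptions made for this example, not the only processing the description allows.

```python
import math


def process_frame(frame: dict, dialog_control: dict, dialog_channels: set,
                  requested_db: float = 0.0) -> dict:
    # S9: if the program does not correspond to the dialog control function, pass through.
    if not dialog_control["supported"]:
        return frame
    # S10: acquire the gain limits; S13: requested_db is the viewer's volume adjustment.
    gain_db = max(dialog_control["lower_limit_db"],
                  min(dialog_control["upper_limit_db"], requested_db))
    gain = 0.0 if gain_db == -math.inf else 10.0 ** (gain_db / 20.0)
    # S11, S12, S14: treat dialog and background channels separately, then
    # S15: recombine them in the original channel order for output.
    return {ch: ([x * gain for x in s] if ch in dialog_channels else list(s))
            for ch, s in frame.items()}


frame = {"FC": [1.0], "FL": [1.0]}
control = {"supported": True, "upper_limit_db": 6.0, "lower_limit_db": -math.inf}

boosted = process_frame(frame, control, {"FC"}, requested_db=20.0)  # limited to +6 dB
muted = process_frame(frame, control, {"FC"}, requested_db=-math.inf)
untouched = process_frame(frame, {"supported": False}, {"FC"}, requested_db=20.0)
```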

Therefore, the encoding device 1, the decoding device 2, the audio signal processing device 3, and the control information input device 4 according to the present embodiment make it possible to realize, within the framework of the channel-based production method and the channel-based coding method, a mechanism that allows the viewer to control the dialog using a receiver or the like.

Although the present invention has been described based on the drawings and embodiments, those skilled in the art can easily make various modifications and corrections based on the present disclosure. Therefore, it should be noted that these modifications and corrections are included in the scope of the present invention. For example, the functions included in each functional unit, each means, each step, and the like can be rearranged so that there is no logical contradiction, and a plurality of functional units, steps, and the like can be combined into one or divided. Further, the present invention is not limited to being implemented exactly as in the embodiments described above, and may be implemented by appropriately combining their features or omitting some of them.

Needless to say, the present invention is applicable to audio systems other than 22.2ch. The present invention is not limited to MPEG-4 AAC, and can be applied to any audio coding system having a metadata area capable of storing dialog control information. Furthermore, it goes without saying that the present invention is not necessarily applied only to dialogs, but can be applied to control for the purpose of individually controlling by providing a dedicated channel for some kind of audio signal.

Description of symbols

1 Encoding device
11 Compression encoding unit
12 Multiplexing unit
2 Decoding device
21 Separation unit
22 Metadata separation unit
23 Decoding unit
3 Audio signal processing device
31 Dialog control availability determination unit
32 Dialog dedicated channel signal specifying unit
33 Audio signal separation unit
34 Control unit
35 Control information acquisition unit
36 Storage unit
4 Control information input device

Claims

1. An encoding device that encodes an input audio signal, comprising:
a compression encoding unit that compresses and encodes the audio signal and outputs a compressed audio signal; and
a multiplexing unit that multiplexes input dialog control metadata with the compressed audio signal and outputs the result.
2. The encoding device according to claim 1, wherein the dialog control metadata includes a flag indicating whether or not the program corresponds to a dialog control function, and an upper limit value and a lower limit value of gain control in a receiver or a playback device connected to the receiver.
3. The encoding device according to claim 2, wherein the multiplexing unit encodes information on the upper limit value and the lower limit value.
4. A decoding device comprising:
a separation unit that separates an input signal, in which dialog control metadata and a compressed audio signal are multiplexed, into the dialog control metadata and the compressed audio signal; and
a decoding unit that decodes the compressed audio signal.
5. An audio signal processing device that performs audio signal processing using the dialog control metadata separated in the decoding device and either the audio signal decoded in the decoding device or the compressed audio signal that has not been decoded, comprising:
a dialog control availability determination unit that determines whether or not dialog control is possible based on a flag indicating whether or not the program corresponds to a dialog control function;
a dialog dedicated channel signal specifying unit that specifies a dialog dedicated channel signal; and
a control unit that acquires, as dialog control information, the upper limit value and lower limit value of the gain control amount of the dialog dedicated channel signal, performs different signal processing on the dialog dedicated channel signal and on any number of other channel signals, and outputs the result as an audio signal.
6. The audio signal processing device according to claim 5, further comprising a control information acquisition unit that acquires control information for the dialog from an external control information input device,
wherein the control unit adjusts and outputs the audio signal based on the control information.
7. The audio signal processing device according to claim 6, wherein the control information acquisition unit acquires dialog volume adjustment information as the control information for the dialog, and
the control unit adjusts and outputs the audio signal based on the adjustment information.
8. The audio signal processing device according to any one of claims 5 to 7, wherein the dialog dedicated channel signal specifying unit specifies the signal of the dialog dedicated channel based on audio method metadata acquired from the decoding device.
9. The audio signal processing device according to any one of claims 5 to 7, wherein the dialog dedicated channel signal specifying unit specifies the signal of the dialog dedicated channel using information acquired from an external device other than the decoding device.
10. The audio signal processing device according to any one of claims 5 to 9, wherein the control unit further performs speech speed conversion processing on the dialog.
11. The audio signal processing device according to any one of claims 1 to 10, wherein, when the control unit acquires dialog volume adjustment information that is higher than the upper limit value of the gain control amount or lower than the lower limit value, the control unit limits the adjustment to the upper limit value or lower limit value of the gain control amount.
12. The audio signal processing device according to any one of claims 5 to 11, wherein the control unit
reduces the gain of the channel signals other than the dialog dedicated channel signal when it acquires adjustment information for increasing the dialog volume, and
reduces only the gain of the dialog dedicated channel signal when it acquires adjustment information for reducing the dialog volume.
13. The audio signal processing device according to any one of claims 5 to 12, wherein the control unit converts the number of channels by conversion means including downmix after controlling the dialog.
14. The audio signal processing device according to claim 5, wherein the control unit performs signal processing including frequency correction processing on either or both of the dialog dedicated channel signal and any number of other channel signals.
15. The audio signal processing device according to any one of claims 5 to 14, wherein the control unit performs the audio signal processing directly on the compressed audio signal separated from the bit stream in the decoding device, without decoding it, and then either decodes the result and outputs it as an audio signal or outputs it as a compressed audio signal without decoding.
16. The audio signal processing device according to claim 5, wherein, after the audio signal processing, the control unit multiplexes the compressed audio signal with either or both of the dialog control metadata and audio method metadata, and outputs the result as a bit stream.