JP2020101837A

JP2020101837A - Audio signal processing device

Info

Publication number: JP2020101837A
Application number: JP2020056076A
Authority: JP
Inventors: Takehiro Sugimoto (杉本岳大); Yasushige Nakayama (中山靖茂); Tomoyasu Komori (小森智康)
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2014-09-08
Filing date: 2020-03-26
Publication date: 2020-07-02
Anticipated expiration: 2035-09-07
Also published as: JPWO2016038876A1; JP6683618B2; JP2020101836A; JP6924862B2; WO2016038876A1; JP6924863B2

Abstract

PROBLEM TO BE SOLVED: To enable a viewer to control dialog using a receiver or the like, within the framework of channel-based production and coding techniques.
SOLUTION: An audio signal processing device 3 comprises: a dialog control availability determination unit 31 that determines whether dialog control is allowed, based on a flag indicating whether the program supports a dialog control function; a dialog dedicated channel signal specifying unit 32 that specifies the dialog dedicated channel signal when dialog control is allowed; an audio signal separation unit 33 that separates the audio signal into the dialog dedicated channel signal and the other signals; and a control unit 34 that acquires dialog control information (the upper and lower limits of the gain control amount of the dialog dedicated channel signal) and applies different signal processing to the dialog dedicated channel signal and to the other signals. When the control unit 34 acquires adjustment information for increasing the dialog volume, it reduces the gain of the other signals; when it acquires adjustment information for decreasing the dialog volume, it reduces only the gain of the dialog dedicated channel signal.
SELECTED DRAWING: Figure 4

Description

Cross-reference to related application

This application claims priority to Japanese Patent Application No. 2014-182695 (filed on September 8, 2014), the entire disclosure of which is incorporated herein by reference.

The present invention relates to an audio signal processing device.

Many viewer comments about broadcast audio concern how easy it is to hear dialog (narration, speech, spoken lines, etc.). Conventional Japanese broadcast audio uses a channel-based scheme, in which an audio engineer at the broadcast station sets a single, fixed volume balance between dialog and background before transmission (see, e.g., Non-Patent Document 1). One example of a channel-based scheme is MPEG-4 AAC (see, e.g., Non-Patent Document 2). Many viewers are concerned about how easy dialog is to hear.

To make dialog easier to hear, next-generation broadcast audio systems in Europe and the United States are being studied with a view to adopting an object-based scheme (see, e.g., Patent Document 1). An object-based scheme transmits audio using a coding method such as MPEG-H 3D Audio (see, e.g., Non-Patent Document 3) or Dolby AC-4, and allows important audio objects such as dialog to be controlled at the receiver.

Non-Patent Document 1: ITU-R, "Advanced sound system for programme production", [Online], February 2014, retrieved September 7, 2014, Internet <http://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.2051-0-201402-I!!PDF-E.pdf>
Non-Patent Document 2: Association of Radio Industries and Businesses (ARIB), ARIB STD-B32 Version 3.0, "Video coding, audio coding and multiplexing methods in digital broadcasting", [Online], July 31, 2014, retrieved September 7, 2014, Internet <http://www.arib.or.jp/english/html/overview/doc/2-STD-B32v3_0.pdf>
Non-Patent Document 3: ISO/IEC DIS 23008-3, "Information technology - High efficiency coding and media delivery in heterogeneous environments - Part 3: 3D audio"

In the channel-based scheme adopted in Japan, as described above, a viewer operating the receiver cannot adjust the dialog volume. However, given the diversity of viewer preferences, ages, and reproduction environments, there are likely situations that a single volume balance set at the broadcast station cannot accommodate. This is considered one of the factors that make dialog hard to hear.

The audio coding scheme for Japan's 8K SHV 22.2 ch broadcasting is MPEG-4 AAC, mentioned above, a channel-based scheme in which audio signals correspond one-to-one to loudspeakers. The audio coding scheme for Japanese terrestrial digital broadcasting is MPEG-2 AAC, which is also channel-based. Consequently, audio objects such as dialog currently cannot be controlled.

In view of these circumstances, an object of the present invention is to provide an encoding device, a decoding device, and an audio signal processing device that realize a mechanism allowing a viewer to control dialog using a receiver or the like, within the framework of channel-based production and channel-based coding.

An audio signal processing device according to the present invention performs channel-based audio signal processing on audio signals corresponding to the respective channels, and comprises: a dialog control availability determination unit that determines whether dialog control is possible based on a flag indicating whether the program supports a dialog control function; a dialog dedicated channel signal specifying unit that specifies the dialog dedicated channel signal when the dialog control availability determination unit determines that dialog control is possible; an audio signal separation unit that separates the audio signal, based on the specification by the dialog dedicated channel signal specifying unit, into the dialog dedicated channel signal and the channel signals other than the dialog dedicated channel signal; and a control unit that acquires the upper and lower limits of the gain control amount of the dialog dedicated channel signal as dialog control information and applies different signal processing to the dialog dedicated channel signal and to the other channel signals.
When the control unit acquires adjustment information for increasing the dialog volume, it reduces the gain of the channel signals other than the dialog dedicated channel signal; when it acquires adjustment information for decreasing the dialog volume, it reduces only the gain of the dialog dedicated channel signal.
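The gain rule above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `dialog_adjustment_gains`, the `ChannelGains` type, and the convention of expressing the viewer's request as a single signed dB value `delta_db` are all assumptions. Note that neither branch applies positive gain to any channel; one plausible motivation (not stated in the source) is that this keeps the mix from being pushed toward clipping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelGains:
    """Gains in dB for the dialog dedicated channel(s) and for all other channels."""
    dialog_db: float
    background_db: float


def dialog_adjustment_gains(delta_db: float) -> ChannelGains:
    """Hypothetical helper: map a requested dialog level change (dB) to gains.

    delta_db > 0:  make dialog louder relative to the background by
                   attenuating every non-dialog channel by delta_db.
    delta_db <= 0: make dialog quieter by attenuating only the dialog channel.
    No channel ever receives positive gain.
    """
    if delta_db > 0:
        return ChannelGains(dialog_db=0.0, background_db=-delta_db)
    return ChannelGains(dialog_db=delta_db, background_db=0.0)


print(dialog_adjustment_gains(6.0))   # ChannelGains(dialog_db=0.0, background_db=-6.0)
print(dialog_adjustment_gains(-4.5))  # ChannelGains(dialog_db=-4.5, background_db=0.0)
```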

Further, after performing dialog control, the control unit may convert the number of channels by conversion means including downmixing.

Further, the control unit may apply signal processing including frequency correction to the dialog dedicated channel signal, to the other channel signals, or to both.

Further, when the audio signal is a compressed audio signal separated from a bitstream, the control unit may perform the signal processing directly on the compressed audio signal without decoding it.

With the audio signal processing device of the present invention, a mechanism can be realized that allows a viewer to control dialog using a receiver or a reproduction device connected to the receiver, within the framework of channel-based production and channel-based coding.

FIG. 1 is a diagram showing a three-dimensional (3D) sound system according to an embodiment of the present invention.
FIG. 2 is a functional block diagram of an encoding device according to an embodiment of the present invention.
FIG. 3 is a functional block diagram of a decoding device according to an embodiment of the present invention.
FIG. 4 is a functional block diagram of an audio signal processing device and a control information input device according to an embodiment of the present invention.
FIG. 5 is a diagram showing the operation flow of an audio signal processing system according to an embodiment of the present invention.

The following describes a mechanism that enables dialog control in a receiver or in a reproduction device connected to the receiver (an external device such as a loudspeaker or a recording device); these are hereinafter collectively referred to as "the receiver or the like". In this embodiment, the 22.2 ch sound system for 8K SHV is described as an example of a sound system having multiple audio channels and a dialog dedicated channel.

The audio signal processing system of this embodiment comprises an encoding device 1, a decoding device 2, an audio signal processing device 3, and a control information input device 4, which communicate with one another by wire or wirelessly over a network. The following description covers each function of the audio signal processing system according to the present invention, but is not intended to exclude other functions that these devices may have.

FIG. 1 shows the three-dimensional sound system used when producing content for the 22.2 ch sound system with a production method that supports the dialog control function. As shown in FIG. 1, programs for the ultra-high-definition, highly immersive audiovisual system are produced under standard production conditions in which a large-screen video display 1a (e.g., 7680 × 4320 pixels) and loudspeakers are arranged. Under these conditions, with the display 1a in front of the listening position, the sound signal is produced on a total of 22 loudspeaker channels, excluding the low-frequency effects loudspeakers LFE1 and LFE2: an upper layer of 9 channels, a middle layer of 10 channels, and a lower layer of 3 channels. The positions of these 22 loudspeakers are specified in SMPTE ST 2036-2-2008.

Realizing the dialog control function in a channel-based scheme requires a channel dedicated to dialog, on which no background sound is mixed. In this embodiment, FC in FIG. 1 is used as the dialog dedicated channel by way of example. There may be more than one dialog dedicated channel; in that case, the dialog dedicated channels may reproduce the same audio signal or different audio signals.
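As a sketch of the layout described above, the 22.2 ch channel set can be held as a per-layer table with the dialog dedicated channel marked separately. The channel labels follow commonly used 22.2 ch naming, but the table name, the dictionary structure, and the channel ordering are illustrative assumptions, not taken from SMPTE ST 2036-2 or from any bitstream channel order.

```python
# Illustrative 22.2 ch channel table grouped by layer (ordering is arbitrary).
LAYOUT_22_2 = {
    "upper": ["TpFL", "TpFR", "TpFC", "TpC", "TpBL", "TpBR", "TpSiL", "TpSiR", "TpBC"],
    "middle": ["FL", "FR", "FC", "BL", "BR", "FLc", "FRc", "BC", "SiL", "SiR"],
    "lower": ["BtFC", "BtFL", "BtFR"],
    "lfe": ["LFE1", "LFE2"],
}

# Dialog dedicated channel(s); FC is the example used in this embodiment.
# Several labels may be listed if the program uses more than one dialog channel.
DIALOG_CHANNELS = ("FC",)


def layer_counts(layout):
    """Number of channels in each layer."""
    return {layer: len(labels) for layer, labels in layout.items()}


def main_channels(layout):
    """All channels except the two LFE channels (the '.2' in '22.2')."""
    return [c for layer, labels in layout.items() if layer != "lfe" for c in labels]


print(layer_counts(LAYOUT_22_2))        # {'upper': 9, 'middle': 10, 'lower': 3, 'lfe': 2}
print(len(main_channels(LAYOUT_22_2)))  # 22
```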

FIG. 2 is a functional block diagram of the encoding device 1. The encoding device 1 includes a compression encoding unit 11 and a multiplexing unit 12. The operations of the compression encoding unit 11 and the multiplexing unit 12 are executed by any processing device, such as a processor or a microcomputer (not shown).

The compression encoding unit 11 acquires the input audio signal and compression-encodes it digitally. It converts the result into a 22.2 ch compressed audio signal and outputs it to the multiplexing unit 12.

The multiplexing unit 12 acquires the compressed audio signal from the compression encoding unit 11, together with the input dialog control metadata and metadata indicating the audio format (for example, the channel configuration in MPEG Audio).

Next, the multiplexing unit 12 encodes the dialog control metadata and the audio format metadata and multiplexes them with the compressed audio signal. The dialog control metadata includes, for example, a flag indicating whether the program supports the dialog control function and the upper and lower limits of the gain control applied in the receiver or the like. For MPEG-4 AAC transmission, the multiplexing unit 12 stores the metadata in, for example, the DSE (Data Stream Element) user extension area. The multiplexing unit 12 outputs the multiplexed data as a bitstream.
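To make the metadata concrete, the sketch below packs the two items named above (the support flag and the gain limits) into a small byte payload and parses it back. The three-byte layout, the 0.5 dB step size, and the reserved code for −∞ dB are hypothetical choices for illustration; they are not the DSE syntax of MPEG-4 AAC or any ARIB-defined format, and the sketch does not construct an actual DSE.

```python
import math
import struct

# Hypothetical payload layout (3 bytes, big-endian):
#   byte 0: 1 if the program supports dialog control, else 0
#   byte 1: upper gain limit, signed, in 0.5 dB steps
#   byte 2: lower gain limit, signed, in 0.5 dB steps (-128 means -inf dB)
NEG_INF_CODE = -128
STEP_DB = 0.5


def _encode_db(db: float) -> int:
    if db == -math.inf:
        return NEG_INF_CODE
    code = round(db / STEP_DB)
    if not -127 <= code <= 127:
        raise ValueError(f"{db} dB cannot be represented")
    return code


def _decode_db(code: int) -> float:
    return -math.inf if code == NEG_INF_CODE else code * STEP_DB


def pack_dialog_metadata(supported: bool, upper_db: float, lower_db: float) -> bytes:
    return struct.pack(">Bbb", int(supported), _encode_db(upper_db), _encode_db(lower_db))


def unpack_dialog_metadata(payload: bytes) -> tuple:
    flag, upper, lower = struct.unpack(">Bbb", payload)
    return bool(flag), _decode_db(upper), _decode_db(lower)


# Example limits from this document: +18 dB upper, -inf dB lower.
payload = pack_dialog_metadata(True, 18.0, -math.inf)
print(payload.hex())                    # 012480
print(unpack_dialog_metadata(payload))  # (True, 18.0, -inf)
```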

FIG. 3 is a functional block diagram of the decoding device 2. The decoding device 2 includes a separation unit 21, a metadata separation unit 22, and a decoding unit 23. The operations of these units are executed by any processing device, such as a processor or a microcomputer (not shown).

The separation unit 21 separates the bitstream (input signal) acquired from the encoding device 1 into metadata and a compressed audio signal, and outputs them to the metadata separation unit 22 and the decoding unit 23, respectively.

The metadata separation unit 22 separates the acquired metadata into dialog control metadata and audio format metadata.

The decoding unit 23 decodes the acquired compressed audio signal into an audio signal. Alternatively, the decoding unit 23 may leave the compressed audio signal undecoded. In that case, the control unit 34 of the audio signal processing device 3 applies the audio signal processing described below directly to the compressed audio signal, then decodes it and outputs it as an audio signal. The control unit 34 may also apply that processing and output the result as a compressed audio signal without decoding it.

FIG. 4 is a functional block diagram of the audio signal processing device 3 and the control information input device 4. The audio signal processing device 3 is placed, for example, downstream of the decoding device 2, and acquires the dialog control metadata, the audio format metadata, and the audio signal from the decoding device 2. The audio signal processing device 3 includes a dialog control availability determination unit 31, a dialog dedicated channel signal specifying unit 32, an audio signal separation unit 33, a control unit 34, a control information acquisition unit 35, and a storage unit 36. The operations (audio signal processing) of the dialog dedicated channel signal specifying unit 32, the audio signal separation unit 33, the control unit 34, and the control information acquisition unit 35 are executed by any processing device, such as a processor or a microcomputer (not shown).

Based on the dialog control metadata acquired from the decoding device 2 (the flag indicating whether the program supports the dialog control function), the dialog control availability determination unit 31 determines whether the audio signal acquired from the decoding device 2 belongs to a program that supports the dialog control function, that is, whether dialog control is possible. If the dialog control availability determination unit 31 determines that the program does not support the dialog control function, the audio signal processing device 3 outputs the audio signal to the receiver or the like without applying the audio signal processing.

The dialog dedicated channel signal specifying unit 32 specifies the dialog dedicated channel signal based on the audio format metadata acquired from the decoding device 2. It may instead specify the dialog dedicated channel signal using information acquired from an external device other than the decoding device 2.

Based on the specification by the dialog dedicated channel signal specifying unit 32, the audio signal separation unit 33 separates the audio signal into the dialog dedicated channel signal and the remaining background channel signals.
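Units 31 to 33 can be sketched together: if the flag says the program does not support dialog control, the signal is passed through untouched; otherwise the channels are split by label into the dialog dedicated group and the background group. Representing the decoded signal as a dictionary from channel label to a list of samples, and the name `separate_dialog`, are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Signals = Dict[str, List[float]]  # channel label -> samples (assumed representation)


def separate_dialog(
    signals: Signals, supports_dialog_control: bool, dialog_labels: Sequence[str]
) -> Optional[Tuple[Signals, Signals]]:
    """Return (dialog, background) groups, or None meaning 'pass through unchanged'."""
    # Unit 31: no dialog control for programs whose flag is not set
    if not supports_dialog_control:
        return None
    # Unit 32: the dialog dedicated channel(s) must be present in the signal
    missing = [label for label in dialog_labels if label not in signals]
    if missing:
        raise KeyError(f"dialog channel(s) not found: {missing}")
    # Unit 33: split the channels by label
    dialog = {c: s for c, s in signals.items() if c in dialog_labels}
    background = {c: s for c, s in signals.items() if c not in dialog_labels}
    return dialog, background


decoded = {"FL": [0.1, 0.2], "FR": [0.1, 0.0], "FC": [0.5, 0.4]}
dialog, background = separate_dialog(decoded, True, ["FC"])
print(sorted(dialog), sorted(background))  # ['FC'] ['FL', 'FR']
```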

The control unit 34 acquires the dialog dedicated channel signal and the background channel signals from the audio signal separation unit 33.

Next, based on the dialog control metadata acquired from the decoding device 2, the control unit 34 acquires the upper and lower limits of the gain control in the receiver or the like (for example, an upper limit of +18 dB and a lower limit of −∞ dB).
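Applying these limits requires converting dB values to linear amplitude factors, and a lower limit of −∞ dB corresponds to a factor of zero, meaning the dialog can be muted entirely. A minimal sketch, assuming the usual 20·log10 amplitude convention; the function name `db_to_linear` is hypothetical.

```python
import math


def db_to_linear(db: float) -> float:
    """Amplitude factor for a gain in dB (20*log10 convention); -inf dB -> 0.0."""
    if db == -math.inf:
        return 0.0
    return 10.0 ** (db / 20.0)


print(round(db_to_linear(18.0), 3))  # 7.943  (upper limit of the example)
print(db_to_linear(-math.inf))       # 0.0    (lower limit: dialog fully muted)
```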

Because the audio format is 22.2 ch, the control unit 34 also refers to the storage unit 36 to identify the dialog dedicated channel (FC in FIG. 1 in this embodiment). The control unit 34 may instead identify the dialog dedicated channel from other information (e.g., program information).

Further, the control unit 34 acquires, via the control information acquisition unit 35, control information (e.g., volume adjustment information) that the viewer has entered into the control information input device 4, which is external to the audio signal processing device 3, for example by remote control operation to suit the viewing environment. The control unit 34 controls the dialog dedicated channel signal and the background channel signals using the dialog control metadata and the control information provided by the viewer.

In this control, the control unit 34 may apply speech rate conversion to the dialog. Also, when the control unit 34 acquires dialog volume adjustment information that exceeds the upper limit or falls below the lower limit of the gain control amount, it may restrict the adjustment to that upper or lower limit.
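The limiting step can be sketched as a clamp of the viewer's requested dialog change to the transmitted range. The function name and the choice to reject an inverted range with an exception are assumptions; Python floats handle −∞ directly, so a lower limit of `-math.inf` needs no special case.

```python
import math


def limit_dialog_request(requested_db: float, lower_db: float, upper_db: float) -> float:
    """Hypothetical helper: clamp the requested dialog change to [lower_db, upper_db]."""
    if lower_db > upper_db:
        raise ValueError("lower limit exceeds upper limit")
    return max(lower_db, min(upper_db, requested_db))


# Limits from the example: +18 dB upper, -inf dB lower
print(limit_dialog_request(24.0, -math.inf, 18.0))   # 18.0
print(limit_dialog_request(-40.0, -math.inf, 18.0))  # -40.0
```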

In this control, the control unit 34 may apply different signal processing to the dialog dedicated channel signal and to the background channel signals. For example, when it acquires adjustment information for increasing the dialog volume, the control unit 34 may reduce the gain of the channel signals other than the dialog dedicated channel signal, and when it acquires adjustment information for decreasing the dialog volume, it may reduce only the gain of the dialog dedicated channel signal. After this dialog volume adjustment, the control unit 34 may also raise or lower the volume of the dialog dedicated channel signal and the background channel signals together. Further, the control unit 34 may apply signal processing including frequency correction to the dialog dedicated channel signal, to any number of the other channel signals, or to both.
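The source does not specify what the frequency correction consists of. Purely as an illustration, the sketch below applies a peaking boost to the dialog channel using the well-known peaking filter from Robert Bristow-Johnson's Audio EQ Cookbook; the 3 kHz center frequency, +6 dB gain, Q of 1.0, and 48 kHz sampling rate are hypothetical values, not values from the patent.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def peaking_eq(fs: float, f0: float, gain_db: float, q: float) -> Tuple[List[float], List[float]]:
    """Peaking EQ biquad (RBJ Audio EQ Cookbook), normalized so that a[0] == 1."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = [1.0 + alpha * amp, -2.0 * cos_w0, 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * cos_w0, 1.0 - alpha / amp]
    a0 = a[0]
    return [x / a0 for x in b], [x / a0 for x in a]


def biquad(x: Sequence[float], b: Sequence[float], a: Sequence[float]) -> List[float]:
    """Direct Form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    y: List[float] = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def response_db(b: Sequence[float], a: Sequence[float], f: float, fs: float) -> float:
    """Magnitude response in dB at frequency f, evaluated on the unit circle."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))


# Hypothetical correction applied to the dialog channel only.
b, a = peaking_eq(fs=48000.0, f0=3000.0, gain_db=6.0, q=1.0)
dialog_fc = [1.0] * 500  # a DC test signal standing in for the FC channel
corrected_fc = biquad(dialog_fc, b, a)
```

By construction, this filter's gain is exactly `gain_db` at `f0` and 0 dB at DC, so the background channels and the dialog's low end are left unchanged.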

As necessary, the control unit 34 converts the number of channels by conversion means including downmixing, and then outputs the audio signal combining the dialog dedicated channel signal and the background channel signals (22.2 ch when no conversion is performed) to the receiver. The receiver outputs this audio signal from a reproduction device connected to it, so that the viewer hears the audio as specified in the control information. When the above audio signal processing has been performed on the compressed audio signal as is, the control unit 34 may multiplex the compressed audio signal with the dialog control metadata, the audio format metadata, or both, and output the result as a bitstream to the receiver or the like, or it may output the compressed audio signal without multiplexing the metadata.
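The order of operations matters here: dialog control is applied first and channel-count conversion afterwards, presumably because once the dialog channel has been mixed into other channels it can no longer be separated. The sketch below shows a generic matrix downmix; the three-channel input, the function name `downmix`, and the coefficients (FC sent to both outputs at 1/√2, about −3 dB) are hypothetical and are not the 22.2 ch downmix coefficients of any standard.

```python
import math
from typing import Dict, List

Signals = Dict[str, List[float]]  # channel label -> samples


def downmix(signals: Signals, matrix: Dict[str, Dict[str, float]]) -> Signals:
    """out[o][n] = sum over input channels c of matrix[o][c] * signals[c][n]."""
    length = len(next(iter(signals.values())))
    out: Signals = {}
    for out_label, row in matrix.items():
        acc = [0.0] * length
        for in_label, coef in row.items():
            for n, sample in enumerate(signals[in_label]):
                acc[n] += coef * sample
        out[out_label] = acc
    return out


# Hypothetical 3-to-2 matrix: FC is split equally to L and R at 1/sqrt(2).
G = 1.0 / math.sqrt(2.0)
STEREO = {"L": {"FL": 1.0, "FC": G}, "R": {"FR": 1.0, "FC": G}}

# Signals after dialog control (e.g. background already attenuated).
controlled = {"FL": [0.5, 0.0], "FR": [0.0, 0.5], "FC": [1.0, 1.0]}
stereo = downmix(controlled, STEREO)
```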

FIG. 5 shows the operation flow according to an embodiment of the present invention.

The encoding device 1 acquires the input audio signal (step S1) and compression-encodes it (step S2). Next, the encoding device 1 multiplexes the compressed audio signal with the dialog control metadata and the metadata indicating the audio format (step S3), and outputs the multiplexed data as a bitstream to the decoding device 2 (step S4).

The decoding device 2 separates the bitstream acquired from the encoding device 1 into metadata and a compressed audio signal (step S5), and further separates the metadata into dialog control metadata and audio format metadata (step S6). Next, the decoding device 2 decodes the compressed audio signal into an audio signal (step S7) and outputs the dialog control metadata, the audio format metadata, and the audio signal to the audio signal processing device 3 (step S8).

The audio signal processing device 3 determines whether the audio signal acquired from the decoding device 2 belongs to a program that supports the dialog control function (step S9). If it determines that the program does not support the dialog control function (No in step S9), steps S10 to S14 are not performed.

If, on the other hand, the audio signal processing device 3 determines that the program supports the dialog control function (Yes in step S9), it acquires the upper and lower limits of the gain control in the receiver or the like from the dialog control metadata (step S10). It then identifies the dialog dedicated channel signal (step S11) and, based on that identification, separates the audio signal into the dialog dedicated channel signal and the other background channel signals (step S12).

The audio signal processing device 3 acquires control information (e.g., volume adjustment information) from the control information input device 4, which is external to the audio signal processing device 3, via the control information acquisition unit 35 (step S13), and adjusts the audio signal based on that control information (step S14).

Next, the audio signal processing device 3 outputs the audio signal to the receiver or the like (step S15).
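Steps S9 to S15 can be strung together in a compact sketch operating on already decoded samples. Everything here is an assumption for illustration: the function name `process_program`, the dictionary-of-lists signal representation, and expressing the viewer's control information (S13) as a single requested dB change.

```python
import math
from typing import Dict, List, Sequence

Signals = Dict[str, List[float]]  # channel label -> samples


def process_program(
    signals: Signals,
    supports_dialog_control: bool,
    dialog_labels: Sequence[str],
    lower_db: float,
    upper_db: float,
    requested_db: float,
) -> Signals:
    """Hypothetical end-to-end sketch of steps S9-S15 on decoded samples."""
    # S9: programs without dialog control support are output unchanged
    if not supports_dialog_control:
        return signals
    # S10 + S13: restrict the viewer's request to the transmitted limits
    delta = max(lower_db, min(upper_db, requested_db))
    # S14 gain rule: raising dialog attenuates the other channels,
    # lowering it attenuates only the dialog channel(s)
    dialog_db, background_db = (0.0, -delta) if delta > 0 else (delta, 0.0)

    def factor(db: float) -> float:
        return 0.0 if db == -math.inf else 10.0 ** (db / 20.0)

    # S11-S12: classify each channel by label; S14: apply its gain; S15: output
    out: Signals = {}
    for label, samples in signals.items():
        g = factor(dialog_db if label in dialog_labels else background_db)
        out[label] = [g * s for s in samples]
    return out


mix = {"FC": [1.0], "FL": [1.0], "FR": [1.0]}
louder = process_program(mix, True, ["FC"], -math.inf, 18.0, 6.0)
muted = process_program(mix, True, ["FC"], -math.inf, 18.0, -math.inf)
```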

Thus, the encoding device 1, the decoding device 2, the audio signal processing device 3, and the control information input device 4 according to this embodiment realize a mechanism that allows the viewer to control dialog using a receiver or the like, within the framework of channel-based production and channel-based coding.

Although the present invention has been described with reference to the drawings and embodiments, those skilled in the art can readily make various variations and modifications based on this disclosure, and such variations and modifications are included within the scope of the present invention. For example, the functions included in each functional unit, means, and step can be rearranged as long as no logical inconsistency arises, and multiple functional units or steps can be combined into one or divided. Moreover, the embodiments described above are not limited to being implemented exactly as described; they may be implemented by combining their features as appropriate or by omitting some of them.

It goes without saying that the present invention is also applicable to audio formats other than 22.2 ch. Nor is the invention limited to MPEG-4 AAC: it is applicable to any audio coding scheme that has a metadata area capable of storing dialog control information. Furthermore, the invention is not necessarily limited to dialog; it can be applied to any case in which a dedicated channel is provided for some audio signal so that the signal can be controlled individually.

DESCRIPTION OF SYMBOLS
1 Encoding device
11 Compression encoding unit
12 Multiplexing unit
2 Decoding device
21 Separation unit
22 Metadata separation unit
23 Decoding unit
3 Audio signal processing device
31 Dialog control availability determination unit
32 Dialog dedicated channel signal specifying unit
33 Audio signal separation unit
34 Control unit
35 Control information acquisition unit
36 Storage unit
4 Control information input device

Claims

An audio signal processing device that performs channel-based audio signal processing on an audio signal corresponding to each channel, comprising:
a dialog control availability determination unit that determines whether dialog control is possible based on a flag indicating whether the program supports a dialog control function;
a dialog dedicated channel signal specifying unit that specifies a dialog dedicated channel signal when the dialog control availability determination unit determines that dialog control is possible;
an audio signal separation unit that separates the audio signal into the dialog dedicated channel signal and channel signals other than the dialog dedicated channel signal, based on the specification by the dialog dedicated channel signal specifying unit; and
a control unit that acquires an upper limit and a lower limit of a gain control amount of the dialog dedicated channel signal as dialog control information, and performs different signal processing on the dialog dedicated channel signal and on the channel signals other than the dialog dedicated channel signal,
wherein, when the control unit acquires adjustment information for increasing the dialog volume, it reduces the gain of the channel signals other than the dialog dedicated channel signal, and when it acquires adjustment information for decreasing the dialog volume, it reduces only the gain of the dialog dedicated channel signal.

The audio signal processing device according to claim 1, wherein the control unit converts the number of channels by conversion means including downmixing after performing dialog control.

The audio signal processing device according to claim 1, wherein the control unit performs signal processing including frequency correction on the dialog dedicated channel signal, on the channel signals other than the dialog dedicated channel signal, or on both.

The audio signal processing device described, wherein, when the audio signal is a compressed audio signal separated from a bitstream, the control unit performs the signal processing directly on the compressed audio signal without decoding it.