JP6228389B2

JP6228389B2 - Acoustic signal reproduction device

Info

Publication number: JP6228389B2
Application number: JP2013102436A
Authority: JP
Inventors: 大出 訓史; 中山 靖茂; 渡辺 馨
Original assignee: Japan Broadcasting Corp; NHK Engineering System Inc
Current assignee: Japan Broadcasting Corp; NHK Engineering System Inc
Priority date: 2013-05-14
Filing date: 2013-05-14
Publication date: 2017-11-08
Anticipated expiration: 2033-05-14
Also published as: JP2014222856A

Description

The present invention relates to an acoustic signal reproduction device for a multichannel acoustic system having a plurality of acoustic space layers.

In addition to the 2-channel and 5.1-channel acoustic systems currently used in program production, a number of acoustic systems have been proposed, including "three-dimensional (3D) acoustic systems" that go beyond 5.1 channels, such as 7.1-channel and 22.2-channel systems. ITU-R, an international standardization body for audio, has defined requirements for three-dimensional acoustic systems beyond 5.1 channels (advanced multichannel audio systems) as an ITU-R Recommendation (Non-Patent Document 1), and further acoustic systems are expected to be proposed in the future. Expressing these acoustic systems in a common format yields a flexible system that is applicable to next-generation audio systems and usable in a variety of fields.

Non-Patent Document 1: "Performance requirements for an advanced multichannel stereophonic sound system for use with or without accompanying picture", Recommendation ITU-R BS.1909

As a common format capable of expressing various acoustic systems, "acoustic signals having a single acoustic space layer" are under study. Here, the sound constructed from a plurality of spatially arranged channel signals is regarded as a single acoustic space layer. In conventional program production, all the sounds needed for a program are placed in a single acoustic space layer. By dividing the acoustic space layer that has so far been kept as one into several layers for audio program production, and by using the format of an "acoustic signal having a plurality of acoustic space layers", the received acoustic signal can easily be transformed, converted, or replaced to suit the recipient in a program exchange or the home environment. Hereinafter, "multichannel acoustic system" is used to mean "an acoustic system having a plurality of acoustic space layers".

For example, broadcast programs transmitted using a multichannel acoustic system are produced with various acoustic systems and reproduced in various playback environments. In program production, the sound image is sometimes matched to the picture on the screen. However, the viewing angles assumed for the video formats proposed as next-generation broadcast formats span a wide range, for example from 30 degrees to 100 degrees, so even when speakers are installed at the same spatial positions, the role of a speaker may differ depending on the video format.

Accordingly, an object of the present invention, made in view of the above, is to provide an acoustic signal reproduction device that supports a multichannel acoustic system having a plurality of acoustic space layers and that can appropriately select and reproduce acoustic channel signals whose roles depend on the video format.

To solve the above problems, an acoustic signal reproduction device according to the present invention is a reproduction device for a multichannel acoustic signal that includes a plurality of acoustic space layers each corresponding to a different video format. The multichannel acoustic signal includes, in its metadata, the video format corresponding to each acoustic space layer. The device comprises a decoding unit that decodes the acoustic channel signals of one acoustic space layer selected on the basis of the video formats corresponding to the acoustic space layers and the video format of the playback environment, and that does not decode the acoustic channel signals of acoustic space layers other than the selected one.

Preferably, the device further comprises a reproduction channel conversion unit that, when the screen layout of the playback environment differs from the standard layout for the video format, converts the acoustic channel signals so that those acoustic channel signals that are linked to the screen are reproduced from the acoustic channel positions in the screen layout of the playback environment.

Preferably, the device further comprises a reproduction channel conversion unit that, when the speaker layout of the playback environment differs from the standard layout for the video format, converts the acoustic channel signals so that those acoustic channel signals that are not linked to the screen are reproduced from the standard layout for the video format.

The acoustic signal reproduction device according to the present invention supports a multichannel acoustic system having a plurality of acoustic space layers and makes it possible to appropriately select and reproduce acoustic channel signals whose roles depend on the video format.

FIG. 1 is a diagram showing the configuration of an acoustic signal reproduction device according to a first embodiment of the present invention.

FIG. 2 is a diagram showing an example of an acoustic space layer contained in a multichannel acoustic signal and of the screen layouts of the video formats.

FIG. 3 is a diagram showing an example of the metadata in a multichannel acoustic signal.

FIG. 4 is a diagram showing how channel roles differ with the video format.

FIG. 5 is a diagram showing the conversion of acoustic channel signals when the screen layout of the playback environment differs from the standard layout for the video format.

FIG. 6 is a diagram showing the conversion of acoustic channel signals when the speaker layout of the playback environment differs from the standard layout for the video format.

FIG. 7 is a diagram showing the relationship between the acoustic channel positions in the screen layout of the playback environment and the standard layout for the video format.

FIG. 8 is a diagram outlining the amplitude panning method.

FIG. 9 is a diagram showing the configuration of an acoustic signal creation device according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. The present invention addresses multichannel acoustic signals, that is, "acoustic signals having a plurality of acoustic space layers". The applicant has filed a Korean patent application (10-2012-0112984) concerning "acoustic signals having a single acoustic space layer" and a Japanese patent application (Japanese Patent Application No. 2013-010544) concerning "acoustic signals having a plurality of acoustic space layers".

FIG. 1 is a diagram showing the configuration of an acoustic signal reproduction device according to an embodiment of the present invention. The acoustic signal reproduction device 10 comprises a demultiplexer 11 (DEMUX), a decoding unit 12, and a reproduction channel conversion unit 13. The output signal of the acoustic signal reproduction device 10 is reproduced as sound by speakers 14.

The demultiplexer 11 separates the input multichannel acoustic data stream into metadata and acoustic channel signals. It outputs the acoustic channel signals to the decoding unit 12 and outputs the metadata to the decoding unit 12 and the reproduction channel conversion unit 13.

FIG. 2 is a diagram showing an example of an acoustic space layer contained in the multichannel acoustic signal (acoustic data stream) of this embodiment, together with the screen layouts of the video formats. In the acoustic space layer 100 shown in FIG. 2, ten channels (110 to 200) are arranged around the user. As illustrated, the screen layout 320 corresponding to UHDTV-1 (4K) and the screen layout 310 corresponding to UHDTV-2 (Super Hi-Vision), both of which are video formats of UHDTV (Ultra High Definition Television), have different viewing angles.

FIG. 3 is a diagram showing an example of the acoustic channel signals and metadata in a multichannel acoustic signal. The metadata in FIG. 3 (Sound Essence 000) states that the multichannel acoustic signal is a one-layer acoustic space layer (Sound Field) having the role of speech (Talk), and that a first acoustic space layer (Sound Field 01) and a second acoustic space layer (Sound Field 02) correspond to the video formats UHDTV-2 and UHDTV-1 respectively.

The first acoustic space layer (Sound Field 01) corresponds to UHDTV-2. The metadata states that it is a 10-channel complete mix (a finished program in which all the sounds needed for playback are recorded) and that the viewing angle to the left and right of the screen is 60 degrees. For each of the acoustic channel signals 110 to 200, the metadata also gives the acoustic channel signal number (Channel 01-10), the channel label (FL FR FLc FRc FC SiL SiR BL BR BC), and whether the position of the channel is linked to the screen (Linked/Unlinked).

The second acoustic space layer (Sound Field 02) corresponds to UHDTV-1. The metadata states that it is a 10-channel complete mix (a finished program in which all the sounds needed for playback are recorded) and that the viewing angle to the left and right of the screen is 30 degrees. For each of the acoustic channel signals 110 to 200, the metadata also gives the acoustic channel signal number (Channel 01-10), the channel label (FLw FRw FL FR FC SiL SiR BL BR BC), and whether the position of the channel is linked to the screen (Linked/Unlinked). As described above, the channels of the first and second acoustic space layers are placed at the same spatial positions, but the roles (channel labels) of the channels to the left and right of the screen differ.
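As an illustration only, the metadata of FIG. 3 could be held in memory along the following lines. The class names, field names, and the helper `make_sound_field` are hypothetical and are not part of the disclosed format. The Linked/Unlinked split (first five channels Linked) follows what the description of FIG. 5 says about the first layer and is simply assumed for the second layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelInfo:
    number: int   # acoustic channel signal number (Channel 01-10)
    label: str    # channel label, e.g. "FL"
    linked: bool  # True if the channel position is linked to the screen


@dataclass(frozen=True)
class SoundField:
    name: str                 # e.g. "Sound Field 01"
    video_format: str         # e.g. "UHDTV-2"
    viewing_angle_deg: float  # viewing angle stated in the metadata
    channels: tuple           # tuple of ChannelInfo


def make_sound_field(name, video_format, viewing_angle_deg, labels, linked_count=5):
    # Assumption: the first `linked_count` channels are Linked, the rest Unlinked.
    channels = tuple(
        ChannelInfo(number=i + 1, label=label, linked=i < linked_count)
        for i, label in enumerate(labels)
    )
    return SoundField(name, video_format, viewing_angle_deg, channels)


SOUND_FIELD_01 = make_sound_field(
    "Sound Field 01", "UHDTV-2", 60.0,
    "FL FR FLc FRc FC SiL SiR BL BR BC".split(),
)
SOUND_FIELD_02 = make_sound_field(
    "Sound Field 02", "UHDTV-1", 30.0,
    "FLw FRw FL FR FC SiL SiR BL BR BC".split(),
)
```

In this sketch the two layers share channel numbers 1 to 10, and only the labels of the first four channels differ, mirroring the statement that the spatial positions are the same while the roles change.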

FIG. 4 is a diagram showing how channel roles differ with the video format. FIG. 4(a) shows the screen layout 320 corresponding to UHDTV-1 and the role of each channel. For UHDTV-1, of the channels at the two edges of the screen layout 320, channel 130 is the L channel and channel 140 is the R channel. Channel 110, which lies outside channel 130 (on the wide side), is the Lw channel, and channel 120, which lies outside channel 140, is the Rw channel.

FIG. 4(b) shows the screen layout 310 corresponding to UHDTV-2 and the role of each channel. For UHDTV-2, of the channels at the two edges of the screen layout 310, channel 110 is the L channel and channel 120 is the R channel. Channel 130, which lies inside channel 110 (on the center side), is the Lc channel, and channel 140, which lies inside channel 120, is the Rc channel.

Although FIG. 4 uses the L and R channels on either side of the screen as examples, the roles of other channels, including LFE (the low-frequency effects channel), may also change with the video format.

The decoding unit 12 selects the acoustic space layer that matches the playback environment on the basis of the metadata contained in the multichannel acoustic signal and of playback environment data, and decodes the acoustic channel signals contained in the selected acoustic space layer. The playback environment data comprises items such as the video format, screen size, speaker layout, and user position. It is either data predetermined for each device or data entered by the user, for example through a remote control.

For the multichannel acoustic signal shown in FIGS. 2 and 3, when the decoding unit 12 receives playback environment data indicating that the video format is UHDTV-2, it selects the first acoustic space layer (Sound Field 01) and decodes the acoustic channel signals 110 to 200. When it receives playback environment data indicating that the video format is UHDTV-1, it selects the second acoustic space layer (Sound Field 02) and decodes the acoustic channel signals 110 to 200. By referring to the metadata and the playback environment data, the decoding unit 12 need not decode the acoustic channel signals of any acoustic space layer that is not to be reproduced, which reduces the power consumed by decoding.
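The selection behaviour of the decoding unit 12 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names, the dictionary layout of the metadata and stream, and the `decode` callback standing in for an actual audio decoder are all hypothetical.

```python
def select_sound_field(layer_formats, playback_video_format):
    """Return the name of the layer whose video format matches the playback environment.

    layer_formats: mapping of layer name -> video format, as read from the metadata.
    """
    for name, video_format in layer_formats.items():
        if video_format == playback_video_format:
            return name
    raise LookupError(f"no acoustic space layer for {playback_video_format!r}")


def decode_selected(encoded_layers, layer_formats, playback_video_format, decode):
    """Decode only the selected layer; other layers are never passed to `decode`."""
    selected = select_sound_field(layer_formats, playback_video_format)
    return selected, [decode(channel) for channel in encoded_layers[selected]]


# Hypothetical example data: two layers of three encoded channels each.
layer_formats = {"Sound Field 01": "UHDTV-2", "Sound Field 02": "UHDTV-1"}
encoded_layers = {
    "Sound Field 01": [b"a1", b"a2", b"a3"],
    "Sound Field 02": [b"b1", b"b2", b"b3"],
}

decoded_inputs = []  # records what the stand-in decoder was actually given


def fake_decode(channel):
    decoded_inputs.append(channel)
    return channel.decode("ascii").upper()


selected, pcm = decode_selected(encoded_layers, layer_formats, "UHDTV-1", fake_decode)
```

Because the unselected layer is never handed to the decoder, the decoding work scales with one layer rather than with the number of layers carried in the stream.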

The reproduction channel conversion unit 13 generates the acoustic channel signal to be fed to each playback speaker on the basis of the metadata and the playback environment data. When the screen layout and speaker layout of the playback environment are the same as the standard layout for the video format, or fall within a predetermined tolerance of it, the reproduction channel conversion unit 13 has the acoustic channel signals reproduced unchanged from the corresponding speakers. This avoids degrading sound quality through unnecessary signal processing. The standard layout for a video format, in terms of screen layout and speaker layout, is the ideal playback environment for each acoustic space layer. It can, for example, be defined on the basis of a standard, or set as appropriate to the playback environment that the program producer has in mind. When the screen layout and speaker layout of the playback environment differ from the standard layout for the video format (by more than the tolerance), the reproduction channel conversion unit 13 converts the acoustic channel signals.
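The tolerance check described above might look like the following sketch. It assumes, purely for illustration, that each layout is expressed as an azimuth in degrees per channel label and that the tolerance is a single angle. The function name, the 5-degree default, and the example azimuths are hypothetical.

```python
def needs_conversion(actual_deg, standard_deg, tolerance_deg=5.0):
    """Return True if any position deviates from the standard layout beyond the tolerance.

    actual_deg / standard_deg: mapping of channel label -> azimuth in degrees.
    """
    for label, standard in standard_deg.items():
        actual = actual_deg[label]
        # Smallest absolute difference between two azimuths, handling wrap-around.
        deviation = abs((actual - standard + 180.0) % 360.0 - 180.0)
        if deviation > tolerance_deg:
            return True
    return False


standard = {"FL": 60.0, "FR": -60.0, "BC": 180.0}
print(needs_conversion({"FL": 58.0, "FR": -61.0, "BC": -179.0}, standard))  # within tolerance
print(needs_conversion({"FL": 45.0, "FR": -45.0, "BC": 180.0}, standard))   # outside tolerance
```

When the function returns False, the signals can be passed straight to the corresponding speakers, which is the case the description singles out as avoiding unnecessary signal processing.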

FIG. 5 shows the conversion of acoustic channel signals when the screen layout of the playback environment differs from the standard layout for the video format. Here, the standard screen layout and speaker layout for UHDTV-2 are taken to be the screen layout 310 and the channels 110 to 200 shown in FIG. 4(b). In FIG. 5, the screen layout 310 differs from the standard UHDTV-2 layout shown in FIG. 4(b). The acoustic channel positions in the screen layout of the playback environment, illustrated as L channel 410, Lc channel 430, Rc channel 440, and R channel 420, therefore differ from the standard positions of channels 110, 130, 140, and 120. In the metadata of FIG. 3, the acoustic channel signals 110 to 150 of the first acoustic space layer, which corresponds to UHDTV-2, are marked as having positions linked to the screen (Linked). When the screen layout differs from the standard layout, the playback positions of the acoustic channel signals linked to the screen must be converted. The reproduction channel conversion unit 13 therefore converts the acoustic channel signals 110 to 140 so that virtual sound images are generated at the positions of channels 410 to 440 shown in FIG. 5. Conversion can be omitted for the acoustic channel signal 150, whose channel position is not changed by the change in screen layout. Because the positions of the acoustic channel signals 160 to 200 are not linked to the screen (Unlinked), the reproduction channel conversion unit 13 has them reproduced unchanged from the corresponding speakers.

FIG. 6 shows the conversion of acoustic channel signals when the speaker layout of the playback environment differs from the standard layout for the video format. In FIG. 6, the screen layout 310 differs from the standard UHDTV-2 screen layout shown in FIG. 4(b), and in addition the speakers for channels 110 to 140 have been placed at the positions of L channel 210, Lc channel 230, Rc channel 240, and R channel 220, matching the acoustic channel positions in the screen layout of the playback environment. In the metadata of FIG. 3, the acoustic channel signals 110 to 150 of the first acoustic space layer, which corresponds to UHDTV-2, are marked as having positions linked to the screen (Linked). In the case of FIG. 6, the speaker layout coincides with the acoustic channel positions in the screen layout of the playback environment, so the reproduction channel conversion unit 13 has the acoustic channel signals 110 to 150 reproduced unchanged from the corresponding speakers. Because the positions of the acoustic channel signals 160 to 200 are not linked to the screen (Unlinked), the reproduction channel conversion unit 13 converts them so that virtual sound images are generated at the standard-layout positions of the corresponding speakers.

FIG. 7 is a diagram showing the relationship between speakers placed to suit the playback environment and the standard layout for the video format. Channels 210 and 220 are speakers placed to match the screen layout of the playback environment, and channels 110 to 200 indicate the standard speaker layout for the video format. In FIG. 7, when the metadata of the L-channel acoustic channel signal states that the channel position is linked to the screen (Linked), the reproduction channel conversion unit 13 has the L-channel acoustic channel signal reproduced unchanged from channel 210. When the metadata of the L-channel acoustic channel signal states that the channel position is not linked to the screen (Unlinked), the reproduction channel conversion unit 13 converts the acoustic channel signal so that a virtual sound image is generated at the position of channel 110.
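The per-channel decisions of FIGS. 5 to 7 can be summarized in a small sketch. It assumes that positions are given as azimuths in degrees and that a speaker counts as "at" a target when it lies within a tolerance. The function names, return values, tolerance, and example angles are hypothetical and are not taken from the specification.

```python
def angular_difference_deg(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def plan_channel(linked, standard_deg, screen_deg, speaker_deg, tolerance_deg=5.0):
    """Decide how one acoustic channel signal is reproduced.

    linked:       True if the metadata marks the channel as Linked to the screen.
    standard_deg: channel position in the standard layout for the video format.
    screen_deg:   channel position implied by the actual screen layout
                  (only used for Linked channels; may be None otherwise).
    speaker_deg:  position of the physical speaker assigned to the channel.
    Returns ("direct", target) to play from the speaker as is, or
    ("virtual", target) to render a virtual sound image at `target`.
    """
    # Linked channels follow the screen; Unlinked channels keep the standard position.
    target = screen_deg if linked else standard_deg
    if angular_difference_deg(speaker_deg, target) <= tolerance_deg:
        return ("direct", target)
    return ("virtual", target)


# FIG. 5 style: speakers at standard positions, screen narrower than standard.
print(plan_channel(True, 60.0, 45.0, 60.0))    # Linked channel needs a virtual image
print(plan_channel(False, 110.0, None, 110.0))  # Unlinked channel plays directly

# FIG. 6 / FIG. 7 style: speaker moved to the screen edge.
print(plan_channel(True, 60.0, 45.0, 45.0))   # Linked channel plays directly
print(plan_channel(False, 60.0, None, 45.0))  # Unlinked channel needs a virtual image
```

The single rule "Linked channels target the screen position, Unlinked channels target the standard position" reproduces all four outcomes described for FIGS. 5 to 7.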

Any suitable method can be used by the reproduction channel conversion unit 13 to generate virtual sound images. Examples of methods for converting acoustic channel signals include VBAP (Vector Base Amplitude Panning) and methods that convert the signals so that the acoustic intensity matches. The acoustic channel signals may also be rendered with a method such as WFS (Wave Field Synthesis).

FIG. 8 is a diagram outlining amplitude panning, one example of a method for converting acoustic channel signals. In amplitude panning, to generate a virtual sound image at the position of speaker SP13, the two adjacent speakers SP22 and SP23 on either side of the direction of the virtual sound image are used. Consider the case where the acoustic channel signal S(t) is distributed to the left and right speakers SP23 and SP22 with weighting coefficients W_L and W_R. Let 2θ be the angle subtended by the speakers SP23 and SP22, and let φ be the angle from the center of the speakers SP23 and SP22 to the virtual sound image (speaker SP13). Under the tangent law of amplitude panning, the weighting coefficients W_L and W_R that localize the sound at angle φ are given by Equation (1).

If the sum of the weighting coefficients W_L and W_R is set to 1, the denominator on the right-hand side of Equation (1) becomes 1, and Equation (2) is obtained for the weighting coefficients W_L and W_R.
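Equations (1) and (2) are not reproduced in this text. As a reconstruction only, assuming the usual form of the tangent law and assuming that φ is measured as positive toward the left speaker (the sign convention is not stated in the text), they would read:

```latex
\begin{align}
\frac{\tan\varphi}{\tan\theta} &= \frac{W_L - W_R}{W_L + W_R} \tag{1} \\
W_L &= \frac{1}{2}\left(1 + \frac{\tan\varphi}{\tan\theta}\right), \qquad
W_R = \frac{1}{2}\left(1 - \frac{\tan\varphi}{\tan\theta}\right) \tag{2}
\end{align}
```

Under these assumptions, W_L + W_R = 1 makes the denominator of (1) equal to 1, so W_L - W_R = tan φ / tan θ, and solving the two linear relations gives (2). For φ = θ the weights are W_L = 1 and W_R = 0, which places the sound at the left speaker, and for φ = 0 both weights are 1/2.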

In this way, the reproduction channel conversion unit 13 converts the acoustic channel signal S(t) assigned to speaker SP13 into the acoustic signals S_L(t) and S_R(t) of Equation (3), which are to be reproduced from the left speaker SP23 and the right speaker SP22, and can thereby correct the speaker position by means of a virtual sound image.
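A minimal numerical sketch of this conversion follows. It assumes the standard tangent law, tan φ / tan θ = (W_L - W_R) / (W_L + W_R), with W_L + W_R = 1 and φ measured as positive toward the left speaker, and it assumes that each output is simply the input scaled by its weight, S_L(t) = W_L S(t) and S_R(t) = W_R S(t). The equations themselves are not reproduced in this text, so these forms and the function names are assumptions.

```python
import math


def tangent_law_weights(phi_deg, theta_deg):
    """Return (w_l, w_r) with w_l + w_r = 1 for a virtual source at angle phi.

    theta_deg: half the angle subtended by the two speakers (0 < theta < 90).
    phi_deg:   angle from the speaker-pair center to the virtual sound image,
               positive toward the left speaker (assumed sign convention).
    """
    if not 0.0 < theta_deg < 90.0:
        raise ValueError("theta_deg must be between 0 and 90 degrees")
    if abs(phi_deg) > theta_deg:
        raise ValueError("the virtual sound image must lie between the speakers")
    ratio = math.tan(math.radians(phi_deg)) / math.tan(math.radians(theta_deg))
    return 0.5 * (1.0 + ratio), 0.5 * (1.0 - ratio)


def pan(signal, phi_deg, theta_deg):
    """Split one channel signal into left and right speaker signals."""
    w_l, w_r = tangent_law_weights(phi_deg, theta_deg)
    return [w_l * s for s in signal], [w_r * s for s in signal]


left, right = pan([1.0, -2.0, 0.5], phi_deg=0.0, theta_deg=30.0)
print(left, right)  # a centered image splits the signal equally
```

Normalizing the weights to sum to 1 keeps the summed amplitude constant. A constant-power normalization (W_L² + W_R² = 1) is a common alternative, but it is not what the text describes.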

Thus, according to this embodiment, the decoding unit 12 decodes the acoustic channel signals of one acoustic space layer selected on the basis of the video formats corresponding to the acoustic space layers, as recorded in the metadata contained in the multichannel acoustic signal, and the video format of the playback environment. This supports a multichannel acoustic system having a plurality of acoustic space layers and makes it possible to appropriately select and reproduce acoustic channel signals whose roles depend on the video format.

In addition, when the screen layout of the playback environment differs from the standard layout for the video format, the reproduction channel conversion unit 13 converts the acoustic channel signals so that those acoustic channel signals that are linked to the screen are reproduced from the acoustic channel positions in the screen layout of the playback environment. As a result, the acoustic channel signals can be converted appropriately even when the screen layout of the playback environment differs from the standard layout for the video format, enabling sound reproduction with a greater sense of presence.

Likewise, when the speaker layout of the playback environment differs from the standard layout for the video format, the reproduction channel conversion unit 13 converts the acoustic channel signals so that those acoustic channel signals that are not linked to the screen are reproduced from the standard layout for the video format. As a result, the acoustic channel signals can be converted appropriately even when the speaker layout of the playback environment differs from the standard layout for the video format, enabling sound reproduction with a greater sense of presence.

FIG. 9 is a diagram showing the configuration of an acoustic signal creation device according to an embodiment of the present invention. The acoustic signal creation device 20 comprises a mixer 21, an encoding unit 22, and a multiplexer 23 (MUX).

The mixer 21 mixes a plurality of acoustic signals and outputs the result to the encoding unit 22 as acoustic channel signals for each acoustic space layer.

The encoding unit 22 encodes the acoustic channel signals of each acoustic space layer received from the mixer 21 and outputs them to the multiplexer 23.

The multiplexer 23 (multiplexing unit) multiplexes the acoustic channel signals of the acoustic space layers with metadata that includes the video format corresponding to each acoustic space layer. It multiplexes the metadata entered by the program producer or the like with the encoded acoustic channel signals to create a multichannel acoustic signal having a plurality of acoustic space layers. To deliver the multichannel acoustic signal by broadcasting or transmission, the multiplexer 23 multiplexes the multichannel acoustic signal and sends it over radio waves, an IP line, or the like to remote locations such as homes.
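The specification does not define a stream syntax. Purely to illustrate the idea of carrying metadata and encoded channel signals in one stream, the sketch below uses a hypothetical toy container: a 4-byte big-endian header length, a JSON header holding the metadata and the size of each encoded channel, and then the channel payloads back to back. The matching `demux` corresponds to the separation performed on the reproduction side. None of these names or layout choices come from the source.

```python
import json
import struct


def mux(metadata, encoded_channels):
    """Pack metadata and encoded channel payloads into one byte stream (toy format)."""
    header = json.dumps(
        {"metadata": metadata, "sizes": [len(c) for c in encoded_channels]}
    ).encode("utf-8")
    return struct.pack(">I", len(header)) + header + b"".join(encoded_channels)


def demux(stream):
    """Split a stream produced by `mux` back into metadata and channel payloads."""
    (header_len,) = struct.unpack_from(">I", stream, 0)
    header = json.loads(stream[4:4 + header_len].decode("utf-8"))
    offset = 4 + header_len
    channels = []
    for size in header["sizes"]:
        channels.append(stream[offset:offset + size])
        offset += size
    return header["metadata"], channels


# Hypothetical example: metadata naming the video format of each layer.
metadata = {"Sound Field 01": "UHDTV-2", "Sound Field 02": "UHDTV-1"}
payloads = [b"\x01\x02\x03", b"", b"\xff" * 5]
stream = mux(metadata, payloads)
recovered_metadata, recovered_payloads = demux(stream)
```

Placing the metadata ahead of the payloads lets a receiver read the video format of each layer before touching any channel data, which is what makes it possible to skip decoding of unselected layers.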

Thus, according to this embodiment, the multiplexer 23 multiplexes the acoustic channel signals of the acoustic space layers with metadata that includes the video format corresponding to each acoustic space layer. This allows the acoustic signal reproduction device to support a multichannel acoustic system having a plurality of acoustic space layers and to appropriately select and reproduce acoustic channel signals whose roles depend on the video format.

Although the present invention has been described with reference to the drawings and embodiments, it should be noted that a person skilled in the art can readily make various variations and modifications on the basis of this disclosure. Such variations and modifications therefore fall within the scope of the present invention. For example, the functions included in each member, means, step, and so on can be rearranged in any logically consistent way, and a plurality of means, steps, and so on can be combined into one or divided.

Description of Symbols
10 Acoustic signal reproduction device
11 Demultiplexer
12 Decoding unit
13 Reproduction channel conversion unit
14 Speaker
20 Acoustic signal creation device
21 Mixer
22 Encoding unit
23 Multiplexer (multiplexing unit)

Claims

An acoustic signal reproduction device for a multichannel acoustic signal that includes a plurality of acoustic space layers each corresponding to a different video format, wherein
the multichannel acoustic signal includes, in metadata, the video format corresponding to each acoustic space layer, and the acoustic signal reproduction device comprises a decoding unit that decodes the acoustic channel signals of one acoustic space layer selected on the basis of the video format corresponding to the acoustic space layer and the video format of a playback environment, and that does not decode the acoustic channel signals of acoustic space layers other than the selected acoustic space layer.

The acoustic signal reproduction device according to claim 1, further comprising a reproduction channel conversion unit that, when the screen layout of the playback environment differs from the standard layout for the video format, converts the acoustic channel signals so that, among the acoustic channel signals, an acoustic channel signal that is linked to the screen is reproduced from the acoustic channel position in the screen layout of the playback environment.

The acoustic signal reproduction device according to claim 1, further comprising a reproduction channel conversion unit that, when the speaker layout of the playback environment differs from the standard layout for the video format, converts the acoustic channel signals so that, among the acoustic channel signals, an acoustic channel signal that is not linked to the screen is reproduced from the standard layout for the video format.