JP6204682B2

JP6204682B2 - Acoustic signal reproduction device

Info

Publication number: JP6204682B2
Application number: JP2013079621A
Authority: JP
Inventors: 大出　訓史; 靖茂中山; 郁子澤谷
Original assignee: Japan Broadcasting Corp; NHK Engineering System Inc
Current assignee: Japan Broadcasting Corp; NHK Engineering System Inc
Priority date: 2013-04-05
Filing date: 2013-04-05
Publication date: 2017-09-27
Anticipated expiration: 2033-04-05
Also published as: JP2014204321A

Description

The present invention relates to an acoustic signal reproduction device for a multi-channel acoustic system having a plurality of acoustic space layers.

In addition to the two-channel and 5.1-channel acoustic systems currently used in program production, several acoustic systems have been proposed, including "three-dimensional acoustic systems" that go beyond 5.1 channels, such as 7.1-channel and 22.2-channel systems. ITU-R, an international standardization body for audio, has specified in an ITU-R Recommendation the requirements (Non-Patent Document 1) for three-dimensional acoustic systems (advanced multichannel audio systems) beyond the 5.1-channel system, and further acoustic systems are expected to be proposed. Expressing these acoustic systems in a common format yields a flexible system that is applicable to next-generation audio systems and usable in a variety of fields.

Non-Patent Document 1: "Performance requirements for an advanced multichannel stereophonic sound system for use with or without accompanying picture", Recommendation ITU-R BS.1909

As a common format capable of representing various acoustic systems, the "acoustic signal having a single acoustic space layer" is under study. Here, the sound constructed from a plurality of spatially arranged channel signals is defined as a single acoustic space layer. In program production to date, all the sounds required for a program have been placed in a single acoustic space layer. By dividing what has so far been a single acoustic space layer into several layers for audio program production, and by using the format of an "acoustic signal having a plurality of acoustic space layers", the received acoustic signal can easily be modified, converted, or replaced to suit the recipient in a program exchange or the home environment. Hereinafter, "multi-channel acoustic system" means an "acoustic system having a plurality of acoustic space layers".

For example, broadcast programs transmitted using a multi-channel acoustic system are produced with various acoustic systems and reproduced in various reproduction environments. As the number of speakers increases, finer control of indirect sound (reverberant components such as reflections) is required. However, while the production environment realizes a sound environment, such as an optimal reverberation time, in accordance with a standard, the reproduction environment on the receiving side is not built to any standard. Because of the difference between the production environment and the reproduction environment, the reproduced sound has sometimes differed from what the producer intended.

Accordingly, an object of the present invention, made in view of the above, is to provide an acoustic signal reproduction device that supports a multi-channel acoustic system having a plurality of acoustic space layers and can adjust indirect sound to suit the reproduction environment.

To solve the above problems, an acoustic signal reproduction device according to the present invention is a device for reproducing a multi-channel acoustic signal having a plurality of acoustic space layers, including at least one acoustic space layer of indirect sound. Metadata included in the multi-channel acoustic signal includes: the number of acoustic space layers; a feature quantity relating to the acoustics of the production environment; a description indicating whether each acoustic space layer is direct sound or indirect sound; the arrangement of the acoustic channel signals in each acoustic space layer; and a recommended loudness for each layer, indicating the reproduction balance between the direct-sound acoustic space layer and the indirect-sound acoustic space layer. The device comprises a volume adjustment unit that, based on the feature quantity of the production environment and the feature quantity of the reproduction environment, adjusts the volume of the indirect-sound acoustic space layer so that a feature quantity close to that of the production environment is realized during reproduction, and combines the direct-sound acoustic space layer with the indirect-sound acoustic space layer.

When the volume of the entire multi-channel acoustic signal is adjusted, the volume adjustment unit preferably varies the amount of loudness change from the recommended loudness for each acoustic space layer, based on the feature quantity of the production environment and the feature quantity of the reproduction environment, and adjusts the volumes of the direct-sound acoustic space layer and the indirect-sound acoustic space layer before combining them.

Preferably, the device further comprises a decoding unit that decodes the multi-channel acoustic signal. When the multi-channel acoustic signal includes a plurality of indirect-sound acoustic space layers, the decoding unit decodes only the indirect-sound acoustic space layer, among them, for which the difference between the feature quantity of the production environment described in the metadata and the feature quantity of the reproduction environment is smallest.

Preferably, the feature quantity of the production environment includes at least one of the reverberation time, the interaural cross-correlation, the interaural level difference, and the interaural time difference obtained when each acoustic space layer is reproduced at its recommended loudness in the production environment.

The acoustic signal reproduction device according to the present invention supports a multi-channel acoustic system having a plurality of acoustic space layers and makes it possible to adjust indirect sound to suit the reproduction environment.

FIG. 1 is a diagram showing the configuration of an acoustic signal reproduction device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the acoustic space layers included in a multi-channel acoustic signal.
FIG. 3 is a diagram showing an example of the metadata in a multi-channel acoustic signal.
FIG. 4 is a diagram showing the configuration of an acoustic signal creation device according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. The present invention addresses multi-channel acoustic signals, that is, "acoustic signals having a plurality of acoustic space layers". The applicant has filed a Korean patent application (10-2012-0112984) on the "acoustic signal having a single acoustic space layer" and a Japanese patent application (Japanese Patent Application No. 2013-010544) on the "acoustic signal having a plurality of acoustic space layers".

FIG. 1 is a diagram showing the configuration of an acoustic signal reproduction device according to an embodiment of the present invention. The acoustic signal reproduction device 10 comprises a demultiplexer 11 (DEMUX), a decoding unit 12, a reproduction channel conversion unit 13, and a volume adjustment unit 15. The output signal of the acoustic signal reproduction device 10 is reproduced as sound by the speakers 14.

The demultiplexer 11 separates the input multi-channel acoustic data stream into metadata and acoustic channel signals. It outputs the acoustic channel signals to the decoding unit 12, and outputs the metadata to the decoding unit 12, the volume adjustment unit 15, and the reproduction channel conversion unit 13.

FIG. 2 is a diagram showing an example of the acoustic space layers included in the multi-channel acoustic signal (acoustic data stream) of this embodiment. The multi-channel acoustic signal of this embodiment contains, as independent acoustic space layers, the direct sound that arrives straight from the sound source and indirect sound such as room reflections (reverberation). The first acoustic space layer 200 is an acoustic space layer of direct sound, with five channels (210 to 250) arranged around the user 100. The second acoustic space layer 300 is an acoustic space layer of indirect sound, with five channels (310 to 350) arranged around the user 100. The multi-channel acoustic signal of this embodiment includes a plurality of indirect-sound acoustic space layers, and the indirect-sound layer can be switched according to a feature quantity of the reproduction environment (for example, the reverberation time).

FIG. 3 is a diagram showing an example of metadata representing the acoustic space layers shown in FIG. 2. The metadata of FIG. 3 (Sound Essence 000) describes, for the multi-channel acoustic signal, the number of acoustic space layers (two layers, direct sound and indirect sound), the reverberation time (2 seconds) as the feature quantity of the production environment, and the type of the multi-channel acoustic signal (Music). The metadata further states that the first acoustic space layer 200 (Sound Field 01), which is direct sound, is linked with the second acoustic space layer 300 (Sound Field 02) or the third acoustic space layer 400 (Sound Field 03), which are indirect sound, to constitute the multi-channel acoustic signal. The second acoustic space layer 300 and the third acoustic space layer 400 are indirect sounds with different patterns, and the acoustic signal reproduction device selects between them as appropriate according to a feature quantity of the reproduction environment (for example, the reverberation time). Three or more acoustic space layers may also be included as indirect sound. The feature quantity of the production environment is not limited to the reverberation time: other measures such as the interaural cross-correlation, the interaural level difference, or the interaural time difference may be used, and values of each feature quantity calculated per frequency band may also serve as the feature quantity of the production environment. The ratio of direct sound to indirect sound is described in each acoustic space layer as the recommended loudness, as explained below.

The metadata states that the first acoustic space layer 200, which is direct sound, consists of 5 channels with a recommended loudness of −18 LKFS, and that acoustic channel signal 210 (Channel 01) is reproduced from the L channel, acoustic channel signal 220 (Channel 02) from the R channel, acoustic channel signal 230 (Channel 03) from the C channel, acoustic channel signal 240 (Channel 04) from the BL channel, and acoustic channel signal 250 (Channel 05) from the BR channel. The entry for the first acoustic space layer 200 also states that it is direct sound (Direct) and gives links to the second acoustic space layer 300 and the third acoustic space layer 400, which are indirect sound.

The metadata states that the second acoustic space layer 300, which is indirect sound, consists of 5 channels with a recommended loudness of −25 LKFS, and that acoustic channel signal 310 (Channel 06) is reproduced from the L channel, acoustic channel signal 320 (Channel 07) from the R channel, acoustic channel signal 330 (Channel 08) from the C channel, acoustic channel signal 340 (Channel 09) from the BL channel, and acoustic channel signal 350 (Channel 10) from the BR channel. The entry for the second acoustic space layer 300 also states that it is indirect sound (Reverberation D) and gives a link to the first acoustic space layer 200, which is direct sound.

The metadata likewise states that the third acoustic space layer 400, an indirect sound with a different pattern from the second acoustic space layer 300, consists of 5 channels with a recommended loudness of −25 LKFS, and that acoustic channel signal 410 (Channel 11) is reproduced from the L channel, acoustic channel signal 420 (Channel 12) from the R channel, acoustic channel signal 430 (Channel 13) from the C channel, acoustic channel signal 440 (Channel 14) from the BL channel, and acoustic channel signal 450 (Channel 15) from the BR channel. The entry for the third acoustic space layer 400 also states that it is indirect sound (Reverberation W) and gives a link to the first acoustic space layer 200, which is direct sound.
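As an illustration only, the FIG. 3 metadata described above could be modeled as the following Python data structure. The class and field names (`SoundEssence`, `SoundField`, `links`, and so on) are assumptions made for this sketch; the publication specifies the information the metadata carries, not a concrete syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Speaker positions of the five channels in each layer, in channel order.
POSITIONS = ["L", "R", "C", "BL", "BR"]


@dataclass
class SoundField:
    """One acoustic space layer (hypothetical representation)."""
    field_id: str
    kind: str                          # "Direct" or an indirect pattern label
    recommended_loudness_lkfs: float
    channels: Dict[str, str]           # channel id -> speaker position
    links: List[str] = field(default_factory=list)

    @property
    def is_direct(self) -> bool:
        return self.kind == "Direct"


@dataclass
class SoundEssence:
    """Top-level metadata of the multi-channel acoustic signal."""
    essence_id: str
    num_layers: int                    # layers reproduced together (direct + indirect)
    reverberation_time_s: float        # feature quantity of the production environment
    content_type: str
    fields: List[SoundField]


def five_channels(first: int) -> Dict[str, str]:
    """Map five consecutive channel ids to L, R, C, BL, BR."""
    return {f"Channel {first + i:02d}": pos for i, pos in enumerate(POSITIONS)}


essence = SoundEssence(
    essence_id="Sound Essence 000",
    num_layers=2,
    reverberation_time_s=2.0,
    content_type="Music",
    fields=[
        SoundField("Sound Field 01", "Direct", -18.0, five_channels(1),
                   ["Sound Field 02", "Sound Field 03"]),
        SoundField("Sound Field 02", "Reverberation D", -25.0, five_channels(6),
                   ["Sound Field 01"]),
        SoundField("Sound Field 03", "Reverberation W", -25.0, five_channels(11),
                   ["Sound Field 01"]),
    ],
)

indirect_fields = [f for f in essence.fields if not f.is_direct]
```

Here `num_layers` is 2 because one direct layer is reproduced together with one of the two alternative indirect layers.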

The decoding unit 12 decodes the acoustic channel signals of the required acoustic space layers based on a feature quantity of the reproduction environment stored in advance and the metadata from the demultiplexer 11. More specifically, the decoding unit 12 decodes the acoustic channel signals of the first acoustic space layer 200, which is direct sound, and those of either the second acoustic space layer 300 or the third acoustic space layer 400, which are indirect sound. Based on the feature quantity of the production environment described in the metadata and the feature quantity of the reproduction environment, the decoding unit 12 selects which of the second acoustic space layer 300 and the third acoustic space layer 400 to decode as the indirect sound. For example, based on the feature quantity of the reproduction environment, the decoding unit 12 selects and decodes the indirect-sound acoustic space layer that, when reproduced in that environment, yields a reproduced sound closer to the production environment. This makes it possible to hear a reproduced sound closer to the production environment. When there is only one indirect-sound acoustic space layer, the decoding unit 12 naturally decodes that layer.
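The selection step can be sketched as follows. This is a minimal sketch under a stated assumption: for each candidate indirect layer, the device is assumed to hold an estimate of the reverberation time that would result when that layer is reproduced in the listening room (`predicted_rt_s`). The publication does not say how such a per-layer value is obtained, and the numeric values below are invented for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class IndirectLayer:
    name: str
    # ASSUMPTION: estimated reverberation time (seconds) when this layer
    # is reproduced in the listening room; not defined in the source.
    predicted_rt_s: float


def select_indirect_layer(layers: Sequence[IndirectLayer],
                          production_rt_s: float) -> IndirectLayer:
    """Return the layer whose predicted feature is closest to the production value."""
    if not layers:
        raise ValueError("at least one indirect layer is required")
    return min(layers, key=lambda layer: abs(layer.predicted_rt_s - production_rt_s))


candidates = [
    IndirectLayer("Sound Field 02", predicted_rt_s=1.4),  # hypothetical value
    IndirectLayer("Sound Field 03", predicted_rt_s=1.9),  # hypothetical value
]
chosen = select_indirect_layer(candidates, production_rt_s=2.0)
```

Only the chosen layer would then be passed to the decoder, so the other indirect layers need not be decoded at all.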

The volume adjustment unit 15 adjusts the indirect-sound acoustic space layer (the second acoustic space layer 300 or the third acoustic space layer 400) based on the feature quantity of the production environment described in the metadata and the feature quantity of the reproduction environment, and combines it with the first acoustic space layer 200 of direct sound. For example, when the production-environment feature quantity for the decoded indirect-sound acoustic space layer differs from the feature quantity of the reproduction environment, the volume adjustment unit 15 adjusts the volume of the indirect-sound acoustic space layer based on both feature quantities and combines it with the direct-sound acoustic space layer. For example, when the reverberation time of the reproduction environment is shorter than that of the production environment, the volume adjustment unit 15 raises the reproduction level of the indirect-sound acoustic space layer above its recommended loudness before combining it with the direct-sound acoustic space layer. This makes it possible to realize a reverberation time close to that of the production environment in a reproduction environment whose reverberation time is shorter. In other words, even when a difference between the reproduced sound in the production environment and in the reproduction environment remains after selecting the indirect-sound layer that comes closest to the production environment, a reproduced sound still closer to the production environment can be heard. When the feature quantities of the production and reproduction environments match, or differ by no more than a predetermined tolerance, the volume adjustment unit 15 may combine the indirect-sound acoustic space layer (the second acoustic space layer 300 or the third acoustic space layer 400) with the first acoustic space layer 200 of direct sound without modification.
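The adjustment logic described above might look like the following sketch. The publication states only the direction of the adjustment (raise the indirect layer when the listening room is less reverberant, leave it unchanged within a tolerance); the linear rule, the `db_per_second` slope, the tolerance, and the clamp used here are assumptions chosen purely for illustration.

```python
def indirect_gain_offset_db(production_rt_s: float,
                            playback_rt_s: float,
                            tolerance_s: float = 0.1,      # hypothetical tolerance
                            db_per_second: float = 3.0,    # hypothetical slope
                            max_offset_db: float = 6.0) -> float:
    """Offset (dB) to apply to the indirect layer relative to its recommended loudness.

    Positive when the playback room has a shorter reverberation time than
    the production room, negative when it is longer, zero within tolerance.
    """
    diff = production_rt_s - playback_rt_s
    if abs(diff) <= tolerance_s:
        return 0.0
    offset = db_per_second * diff
    # Clamp so that extreme rooms do not produce unbounded gain changes.
    return max(-max_offset_db, min(max_offset_db, offset))


recommended_indirect_lkfs = -25.0                 # from the FIG. 3 example
offset_db = indirect_gain_offset_db(production_rt_s=2.0, playback_rt_s=0.5)
target_indirect_lkfs = recommended_indirect_lkfs + offset_db
```

A real device would presumably derive this mapping from measurements of the listening room; the linear rule here only demonstrates the control flow.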

When adjusting the volume of the entire multi-channel acoustic signal, the volume adjustment unit 15 can, based on the recommended loudness of each acoustic space layer described in the metadata, vary the amount of loudness change per acoustic space layer for the acoustic channel signals of the first acoustic space layer 200 (direct sound) and those of whichever of the second acoustic space layer 300 or the third acoustic space layer 400 (indirect sound) is in use. In the case of FIG. 3, for example, when lowering the overall volume from −24 LKFS to −28 LKFS in a reproduction environment with a short reverberation time, the volume adjustment unit 15 adjusts each layer according to its role: the first acoustic space layer 200 (direct sound) is lowered by a relatively large amount, from −18 LKFS to −24 LKFS, while the second acoustic space layer 300 or the third acoustic space layer 400 (indirect sound) is lowered by a relatively small amount, from −25 LKFS to −27 LKFS. This makes it possible to listen to the content with a reproduced sound closer to the production environment. In a reproduction environment with a long reverberation time, suppressing the indirect sound yields a volume adjustment that is easier to listen to.
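The per-layer arithmetic in this example can be checked with a few lines of Python. The loudness values are those given in the text; the conversion from a level change in dB to a linear amplitude factor, 10^(dB/20), is the standard relation, and the dictionary keys are names chosen for this sketch.

```python
def db_to_linear(change_db: float) -> float:
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (change_db / 20.0)


# Recommended loudness per layer (from the metadata) and the targets in the example.
recommended_lkfs = {"direct": -18.0, "indirect": -25.0}
target_lkfs = {"direct": -24.0, "indirect": -27.0}

# The change applied to each layer differs: 6 dB down for direct, 2 dB down for indirect.
change_db = {name: target_lkfs[name] - recommended_lkfs[name]
             for name in recommended_lkfs}

# Amplitude factors to multiply each layer's samples by (about 0.50 and 0.79).
gain = {name: db_to_linear(delta) for name, delta in change_db.items()}
```

Because the indirect layer is attenuated less than the direct layer, its share of the mix increases, which is the intended compensation for a room with little reverberation of its own.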

The reproduction channel conversion unit 13 merges the plurality of acoustic space layers into a single acoustic space and generates the acoustic channel signals to be fed to each reproduction speaker, matched to the reproduction environment.
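For the simplest case, in which every layer already uses the same speaker positions as the playback system, merging the layers reduces to a gain-weighted sum per speaker. The sketch below shows only that case; the `Layer` type and `mix_layers` function are hypothetical names, and conversion to a different speaker layout, which the unit would also have to handle, is not covered.

```python
from typing import Dict, List

# One layer: speaker position -> list of samples (hypothetical representation).
Layer = Dict[str, List[float]]


def mix_layers(layers: List[Layer], gains: List[float]) -> Layer:
    """Sum the layers per speaker position after applying one gain per layer."""
    if len(layers) != len(gains):
        raise ValueError("one gain per layer is required")
    mixed: Layer = {}
    for layer, layer_gain in zip(layers, gains):
        for position, samples in layer.items():
            out = mixed.setdefault(position, [0.0] * len(samples))
            if len(out) != len(samples):
                raise ValueError("all signals must have the same length")
            for i, sample in enumerate(samples):
                out[i] += layer_gain * sample
    return mixed


direct = {"L": [1.0, 0.0], "R": [0.0, 1.0]}
indirect = {"L": [0.5, 0.5], "R": [0.5, 0.5]}
speaker_feeds = mix_layers([direct, indirect], gains=[1.0, 0.5])
```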

As described above, according to this embodiment, the volume adjustment unit 15 adjusts the indirect-sound acoustic space layer based on the feature quantity of the production environment described in the metadata included in the multi-channel acoustic signal and the feature quantity of the reproduction environment, and combines it with the direct-sound acoustic space layer. The device thus supports a multi-channel acoustic system having a plurality of acoustic space layers and can adjust indirect sound to suit the reproduction environment. In particular, by transmitting the sound that makes up a broadcast program separated into direct sound and indirect sound, the reflected sound can be adjusted to suit the reproduction environment, and the acoustic space can be reproduced as the producer intended.

The volume adjustment unit 15 also adjusts the volumes of the direct-sound and indirect-sound acoustic space layers based on the recommended loudness before combining them. When the production and reproduction environments differ greatly, for example when the reverberation time of the reproduction environment is extremely short, it adjusts the volume so that the reproduction level exceeds the recommended loudness before combining. This allows the program to be heard with an acoustic quality closer to that experienced in the production environment.

When the multi-channel acoustic signal includes a plurality of indirect-sound acoustic space layers, the decoding unit 12 decodes the indirect-sound acoustic space layer selected based on the feature quantity of the production environment described in the metadata and the feature quantity of the reproduction environment. The reflected sound can thus be switched to suit the reproduction environment, and the acoustic space can be reproduced as the producer intended. Because decoding of the indirect-sound layers other than the selected one can be skipped, the processing load of decoding is also reduced.

The feature quantity of the production environment includes at least one of the reverberation time, the interaural cross-correlation, the interaural level difference, and the interaural time difference. This makes it possible to build a plurality of indirect-sound acoustic space layers according to various measures, not only the reverberation time, and to select the indirect-sound layer best suited to the reproduction environment. In other words, these feature quantities allow the program to be heard with an acoustic quality closer to the production environment. Although the description above uses the reverberation time as the example, when the interaural cross-correlation, the interaural level difference, or the interaural time difference is used as the feature quantity of the production environment, the metadata naturally carries the corresponding feature quantity, and the indirect-sound acoustic space layers contain acoustic channel signals prepared for that feature quantity.

FIG. 4 is a diagram showing the configuration of an acoustic signal creation device according to an embodiment of the present invention. The acoustic signal creation device 20 comprises a mixer 21, an encoding unit 22, and a multiplexer 23 (MUX).

The mixer 21 mixes a plurality of acoustic signals and outputs them to the encoding unit 22 as acoustic channel signals for each acoustic space layer. The acoustic signals input to the mixer 21 include the acoustic channel signals of the direct-sound acoustic space layer and of the indirect-sound acoustic space layers.

The encoding unit 22 encodes the acoustic channel signals of each acoustic space layer from the mixer 21 and outputs them to the multiplexer 23.

The multiplexer 23 (multiplexing unit) multiplexes the acoustic channel signals of the direct-sound and indirect-sound acoustic space layers with metadata that includes the feature quantity of the production environment. It multiplexes the metadata entered by the program producer or others with the encoded acoustic channel signals to create a multi-channel acoustic signal having a plurality of acoustic space layers. The feature quantity of the production environment includes at least one of the reverberation time, the interaural cross-correlation, the interaural level difference, and the interaural time difference. The multiplexer 23 can multiplex metadata that includes the recommended loudness of each of the direct-sound and indirect-sound acoustic space layers, and it can multiplex the acoustic channel signals of a plurality of indirect-sound acoustic space layers. To deliver the multi-channel acoustic signal by broadcast or transmission, the multiplexer 23 multiplexes it and sends it to remote locations such as homes over radio waves, an IP network, or the like.

As described above, according to this embodiment, the multiplexer 23 multiplexes the acoustic channel signals of the direct-sound and indirect-sound acoustic space layers with metadata that includes the feature quantity of the production environment. The acoustic signal reproduction device can thus support a multi-channel acoustic system having a plurality of acoustic space layers and adjust indirect sound to suit the reproduction environment. In particular, by transmitting the sound that makes up a broadcast program separated into direct sound and indirect sound, the reflected sound can be adjusted to suit the reproduction environment, and the acoustic space can be reproduced as the producer intended.

The multiplexer 23 also multiplexes metadata that includes the recommended loudness of each of the direct-sound and indirect-sound acoustic space layers. This allows the program to be heard on the acoustic signal reproduction device with an acoustic quality closer to the production environment.

The multiplexer 23 also multiplexes the acoustic channel signals of a plurality of indirect-sound acoustic space layers. The acoustic signal reproduction device can thus switch the reflected sound to suit the reproduction environment and reproduce the acoustic space as the producer intended.

The feature quantities of the reproduction environment include at least one of reverberation time, interaural cross-correlation, interaural level difference, and interaural time difference. This allows the acoustic signal reproduction device to select the indirect-sound acoustic space layer best suited to the reproduction environment using various criteria, not only reverberation time. In other words, these feature quantities make it possible to listen to a program with acoustic quality closer to that of the production environment.
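Selecting one of several indirect-sound layers by comparing feature quantities could look like the following sketch. It assumes that the metadata of each indirect layer carries its own feature values and that the comparison uses a single chosen feature; the function name `select_indirect_layer` and the dictionary keys are hypothetical.

```python
from typing import Dict, List


def select_indirect_layer(candidates: List[Dict[str, float]],
                          playback: Dict[str, float],
                          feature: str = "reverberation_time_s") -> int:
    """Return the index of the layer whose feature value is closest to playback.

    `candidates` holds one feature dictionary per indirect-sound layer;
    `playback` holds the measured features of the reproduction environment.
    On a tie, the earliest layer is returned.
    """
    if not candidates:
        raise ValueError("no indirect-sound layer to choose from")
    return min(range(len(candidates)),
               key=lambda i: abs(candidates[i][feature] - playback[feature]))
```

Passing a different `feature` key, such as an interaural level difference, changes the selection criterion without changing the selection logic.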

Although the present invention has been described with reference to the drawings and embodiments, note that a person skilled in the art can readily make various changes and modifications based on this disclosure. Such changes and modifications are therefore included in the scope of the present invention. For example, the functions included in each member, means, step, and so on can be rearranged as long as no logical contradiction results, and a plurality of means or steps can be combined into one or divided.

Description of Symbols

10 Acoustic signal reproduction device
11 Demultiplexer
12 Decoding unit
13 Reproduction channel conversion unit
14 Speaker
15 Volume adjustment unit
20 Acoustic signal creation device
21 Mixer
22 Encoding unit
23 Multiplexer (multiplexing unit)

Claims

An acoustic signal reproduction device for reproducing a multi-channel acoustic signal having a plurality of acoustic space layers including at least one indirect-sound acoustic space layer, wherein
the metadata included in the multi-channel acoustic signal includes the number of acoustic space layers, a feature quantity relating to the acoustics of the production environment, a description indicating whether each acoustic space layer is direct sound or indirect sound, the arrangement of the acoustic channel signals of each acoustic space layer, and a recommended loudness for each layer indicating the reproduction balance between the direct-sound acoustic space layer and the indirect-sound acoustic space layer, and
the device comprises a volume adjustment unit that, based on the feature quantity of the production environment and a feature quantity of the reproduction environment, adjusts the volume of the indirect-sound acoustic space layer so that a feature quantity close to that of the production environment is realized during reproduction, and synthesizes the direct-sound acoustic space layer and the indirect-sound acoustic space layer.

The acoustic signal reproduction device according to claim 1, wherein, when the volume of the entire multi-channel acoustic signal is adjusted,
the volume adjustment unit varies, for each acoustic space layer, the amount by which the loudness is changed from the recommended loudness, based on the feature quantity of the production environment and the feature quantity of the reproduction environment, and thereby adjusts the volumes of the direct-sound acoustic space layer and the indirect-sound acoustic space layer and synthesizes them.

The acoustic signal reproduction device according to claim 1 or 2,
further comprising a decoding unit that decodes the multi-channel acoustic signal, wherein,
when the multi-channel acoustic signal includes a plurality of indirect-sound acoustic space layers, the decoding unit decodes only the indirect-sound acoustic space layer for which the difference between the feature quantity of the production environment described in the metadata and the feature quantity of the reproduction environment is smallest.

The acoustic signal reproduction device according to any one of claims 1 to 3, wherein
the feature quantity of the production environment includes at least one of reverberation time, interaural cross-correlation, interaural level difference, and interaural time difference obtained when each acoustic space layer is reproduced at the recommended loudness in the production environment.