JP2023072027A

JP2023072027A - Decoder and method, and program

Info

Publication number: JP2023072027A
Application number: JP2023038916A
Authority: JP
Inventors: 徹知念; Toru Chinen; 正之西口; Masayuki Nishiguchi; 潤宇史; Runyu Shi; 光行畠中; Mitsuyuki Hatanaka; 優樹山本; Yuki Yamamoto
Original assignee: Sony Group Corp
Current assignee: Sony Group Corp
Priority date: 2014-03-24
Filing date: 2023-03-13
Publication date: 2023-05-23
Also published as: JP2019049745A; JP7412367B2; JP2021064013A; JP6863359B2

Abstract

PROBLEM TO BE SOLVED: To enable the amount of calculation for decoding an audio signal to be reduced.

SOLUTION: A priority information acquisition unit acquires priority information of each channel from a bit stream for supply to an output selection unit. A channel audio signal decoding unit decodes encoded data of an audio signal of each channel, and supplies an obtained MDCT coefficient to the output selection unit. The output selection unit supplies the MDCT coefficient to an IMDCT unit in the case where the priority degree indicated in the priority information is more than or equal to a predetermined degree, and supplies 0 as the MDCT coefficient to a 0-value output unit in the case where the priority degree is less than the predetermined degree. The 0-value output unit generates an audio signal on the basis of 0 supplied as the MDCT coefficient. In addition, the IMDCT unit performs IMDCT on the basis of the MDCT coefficient to generate an audio signal. The present technology can be applied to an encoder and a decoder.

SELECTED DRAWING: Figure 10

Description

The present technology relates to an encoding device and method, a decoding device and method, and a program, and more particularly to an encoding device and method, a decoding device and method, and a program that make it possible to reduce the amount of calculation required to decode an audio signal.

Known methods for encoding audio signals include multi-channel coding under the international standards MPEG (Moving Picture Experts Group)-2 AAC (Advanced Audio Coding), MPEG-4 AAC, and MPEG-D USAC (Unified Speech and Audio Coding) (see, for example, Non-Patent Document 1 and Non-Patent Document 2).

Non-Patent Document 1: INTERNATIONAL STANDARD ISO/IEC 14496-3 Fourth edition 2009-09-01 Information technology - coding of audio-visual objects - part 3: Audio
Non-Patent Document 2: INTERNATIONAL STANDARD ISO/IEC 23003-3 First edition 2012-04-01 Information technology - coding of audio-visual objects - part 3: Unified speech and audio coding

Meanwhile, playback with a greater sense of presence than conventional 5.1-channel surround playback, and transmission of multiple sound sources (objects), require coding technology that uses more audio channels.

For example, consider encoding and decoding 24-channel audio signals together with the audio signals of multiple objects, compared with encoding and decoding 2-channel audio signals. A mobile device with limited computing power may be able to decode the 2-channel audio signals in real time, yet have difficulty decoding the 24-channel audio signals and the audio signals of multiple objects in real time.

Current audio codecs such as MPEG-D USAC must decode the audio signals of all channels and all objects, which makes it difficult to reduce the amount of calculation at decoding time. As a result, some decoding-side devices may be unable to reproduce the audio signals in real time.

The present technology has been made in view of such circumstances and makes it possible to reduce the amount of calculation for decoding.

A decoding device according to a first aspect of the present technology includes: an acquisition unit that acquires encoded audio signals of a plurality of channels or a plurality of objects, and priority information of each of the audio signals at a predetermined time; and an audio signal decoding unit that, on the basis of the priority information, decodes the encoded audio signals of a predetermined number of channels or objects corresponding to the priority information.

A decoding method or program according to the first aspect of the present technology includes the steps of: acquiring encoded audio signals of a plurality of channels or a plurality of objects, and priority information of each of the audio signals at a predetermined time; and decoding, on the basis of the priority information, the encoded audio signals of a predetermined number of channels or objects corresponding to the priority information.

In the first aspect of the present technology, encoded audio signals of a plurality of channels or a plurality of objects, and priority information of each of the audio signals at a predetermined time, are acquired, and, on the basis of the priority information, the encoded audio signals of a predetermined number of channels or objects corresponding to the priority information are decoded.

An encoding device according to a second aspect of the present technology includes: a priority information generation unit that generates priority information, at a predetermined time, of the audio signals of a plurality of channels or a plurality of objects; and a packing unit that stores the priority information in a bitstream.

An encoding method or program according to the second aspect of the present technology includes the steps of: generating priority information, at a predetermined time, of the audio signals of a plurality of channels or a plurality of objects; and storing the priority information in a bitstream.

In the second aspect of the present technology, priority information, at a predetermined time, of the audio signals of a plurality of channels or a plurality of objects is generated, and the priority information is stored in a bitstream.

FIG. 1 is a diagram explaining a bitstream.
FIG. 2 is a diagram explaining encoding.
FIG. 3 is a diagram explaining priority information.
FIG. 4 is a diagram explaining the meaning of the values of the priority information.
FIG. 5 is a diagram showing a configuration example of an encoding device.
FIG. 6 is a diagram showing a configuration example of a channel audio encoding unit.
FIG. 7 is a diagram showing a configuration example of an object audio encoding unit.
FIG. 8 is a flowchart explaining encoding processing.
FIG. 9 is a diagram showing a configuration example of a decoding device.
FIG. 10 is a diagram showing a configuration example of an unpacking/decoding unit.
FIG. 11 is a flowchart explaining decoding processing.
FIG. 12 is a flowchart explaining selective decoding processing.
FIG. 13 is a diagram showing another configuration example of the unpacking/decoding unit.
FIG. 14 is a flowchart explaining selective decoding processing.
FIG. 15 is a diagram showing an example of the syntax of object metadata.
FIG. 16 is a diagram explaining generation of an audio signal.
FIG. 17 is a diagram explaining generation of an audio signal.
FIG. 18 is a diagram explaining selection of the output destination of MDCT coefficients.
FIG. 19 is a diagram explaining gain adjustment of an audio signal and a high-band power value.
FIG. 20 is a diagram explaining gain adjustment of an audio signal and a high-band power value.
FIG. 21 is a diagram showing another configuration example of the unpacking/decoding unit.
FIG. 22 is a flowchart explaining selective decoding processing.
FIG. 23 is a diagram explaining gain adjustment of an audio signal.
FIG. 24 is a diagram explaining gain adjustment of an audio signal.
FIG. 25 is a diagram showing another configuration example of the unpacking/decoding unit.
FIG. 26 is a flowchart explaining selective decoding processing.
FIG. 27 is a diagram explaining VBAP gain.
FIG. 28 is a diagram explaining VBAP gain.
FIG. 29 is a diagram showing another configuration example of the unpacking/decoding unit.
FIG. 30 is a flowchart explaining decoding processing.
FIG. 31 is a flowchart explaining selective decoding processing.
FIG. 32 is a diagram showing a configuration example of a computer.

Embodiments to which the present technology is applied are described below with reference to the drawings.

〈First Embodiment〉
〈Overview of the present technology〉
In encoding the audio signal of each channel constituting a multi-channel signal and the audio signals of objects, the present technology transmits priority information for the audio signal of each channel and priority information for the audio signal of each object, so that the amount of calculation for decoding can be reduced.

On the decoding side, the present technology performs a frequency-time transform when the degree of priority indicated by the priority information of a channel or object is equal to or higher than a predetermined degree. When the degree of priority is lower than the predetermined degree, the frequency-time transform is not performed and its result is set to 0, which reduces the amount of calculation for decoding the audio signals.

The following describes a case in which the multi-channel audio signals and the object audio signals are encoded in accordance with the AAC standard, but similar processing is performed when they are encoded by another scheme.

For example, when multi-channel audio signals and the audio signals of multiple objects are encoded in accordance with the AAC standard and transmitted, the audio signal of each channel and each object is encoded and transmitted frame by frame.

Specifically, as shown in FIG. 1, the encoded audio signals and the information needed to decode them are stored in a plurality of elements (bitstream elements), and a bitstream made up of those elements is transmitted.

In this example, the bitstream for one frame contains t elements, EL1 to ELt, arranged in order from the beginning, followed by an identifier TERM that marks the end position of the information for that frame.

For example, the element EL1 at the beginning is an ancillary data area called a DSE (Data Stream Element). The DSE describes information about each of the channels, such as information on downmixing of the audio signals and identification information.

The elements EL2 to ELt that follow element EL1 store the encoded audio signals.

In particular, an element that stores a single-channel audio signal is called an SCE (Single Channel Element), and an element that stores the audio signals of a pair of two channels is called a CPE (Channel Pair Element). The audio signal of each object is stored in an SCE.

In the present technology, priority information for the audio signal of each channel constituting the multi-channel signal and priority information for the audio signal of each object are generated and stored in the DSE.
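
The frame layout described above can be illustrated with a small Python sketch. It models only the ordering of a frame (a DSE first, then SCE/CPE elements, then TERM). The class `Element`, the function `build_frame`, and all field names are hypothetical and do not reproduce the actual AAC bitstream syntax.

```python
from dataclasses import dataclass, field
from typing import List

TERM = "TERM"  # stand-in for the identifier that ends a frame


@dataclass
class Element:
    """Hypothetical model of one bitstream element."""
    kind: str                                   # "DSE", "SCE", or "CPE"
    payload: bytes = b""                        # encoded audio (SCE/CPE only)
    channel_priorities: List[int] = field(default_factory=list)  # DSE only
    object_priorities: List[int] = field(default_factory=list)   # DSE only


def build_frame(channel_priorities, object_priorities, audio_elements):
    """Return one frame: DSE (EL1), audio elements (EL2..ELt), then TERM."""
    dse = Element(
        "DSE",
        channel_priorities=list(channel_priorities),
        object_priorities=list(object_priorities),
    )
    return [dse, *audio_elements, TERM]


# Two channels carried as one CPE, plus one object carried as an SCE.
frame = build_frame([3, 0], [7], [Element("CPE", b"\x00"), Element("SCE", b"\x01")])
```

Keeping the priority values in the first element means a decoder can read them before it touches any encoded audio in the same frame.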

For example, suppose that the audio signals of consecutive frames F11 to F13 are encoded as shown in FIG. 2.

In this case, the encoding device (encoder) analyzes, for each of those frames, how high the degree of priority of the audio signal of each channel is, and generates priority information for each channel as shown, for example, in FIG. 3. The encoding device likewise generates priority information for the audio signal of each object.

For example, the encoding device analyzes the degree of priority of an audio signal on the basis of the sound pressure and spectral shape of the audio signal and the correlation of spectral shapes between channels or between objects.

FIG. 3 shows an example of the priority information of each channel when the total number of channels is M. That is, for each channel from the channel with channel number 0 to the channel with channel number M-1, a numerical value indicating the degree of priority of that channel's audio signal is shown as priority information.

For example, the priority information of the channel with channel number 0 is 3, and that of the channel with channel number 1 is 0. Hereinafter, the channel with a given channel number m (m = 0, 1, ..., M-1) is also referred to as channel m.

As shown in FIG. 4, the value of the priority information shown in FIG. 3 is one of the values from 0 to 7. The larger the value, the higher the degree of priority of the audio signal at playback, that is, the higher its importance.

Accordingly, an audio signal whose priority information is 0 has the lowest priority, and an audio signal whose priority information is 7 has the highest priority.
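
Because the values 0 to 7 fit in 3 bits, the priority values of a frame could be stored compactly. The sketch below packs and unpacks a list of priorities at 3 bits each. This layout is an assumption made for illustration, not the DSE syntax of the source, and `pack_priorities` and `unpack_priorities` are hypothetical helper names.

```python
def pack_priorities(priorities):
    """Pack priority values (0..7) at 3 bits each, zero-padded to whole bytes."""
    bits = 0
    for p in priorities:
        if not 0 <= p <= 7:
            raise ValueError(f"priority out of range: {p}")
        bits = (bits << 3) | p
    nbits = 3 * len(priorities)
    pad = (-nbits) % 8          # padding bits needed to reach a byte boundary
    bits <<= pad
    return bits.to_bytes((nbits + pad) // 8, "big")


def unpack_priorities(data, count):
    """Read back `count` 3-bit priority values from the packed bytes."""
    bits = int.from_bytes(data, "big")
    total = len(data) * 8
    return [(bits >> (total - 3 * (i + 1))) & 0b111 for i in range(count)]


packed = pack_priorities([3, 0, 7])
restored = unpack_priorities(packed, 3)
```

The caller must supply the number of values when unpacking, since the padding bits are indistinguishable from a trailing priority of 0.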

When multi-channel audio signals and the audio signals of multiple objects are reproduced simultaneously, the sounds reproduced from those signals usually include some that are less important than others. In other words, among all the sounds there are some whose absence would not strike the listener as unnatural.

Therefore, if audio signals with low priority are left undecoded as necessary, the amount of calculation for decoding can be reduced while degradation of sound quality is suppressed. For this purpose, the encoding device assigns priority information to each audio signal for each frame so that the audio signals that are not to be decoded can be selected appropriately. The priority information indicates the degree of importance of each audio signal at playback, that is, the degree to which its decoding should be prioritized.

Once the priority information of each audio signal has been determined in this way, it is stored in the DSE of element EL1 shown in FIG. 1. In the example of FIG. 3 in particular, the multi-channel audio signal consists of M channels, so the priority information of each of the M channels, channel 0 to channel M-1, is stored in the DSE.

Similarly, the priority information of each object is stored in the DSE of element EL1. Here, if there are N objects with object numbers 0 to N-1, priority information is determined for each of the N objects and stored in the DSE.

Hereinafter, the object with a given object number n (n = 0, 1, ..., N-1) is also referred to as object n.

When priority information is defined for each audio signal in this way, the playback side, that is, the side that decodes the audio signals, can easily identify which audio signals are important at playback and should be decoded preferentially, that is, which should be used for playback.

Returning to FIG. 2, suppose for example that the priority information of the audio signals of frames F11 and F13 of a given channel is 7, and that the priority information of the audio signal of frame F12 of that channel is 0.

Suppose also that the side that decodes the audio signals, that is, the decoding device (decoder), does not decode audio signals whose degree of priority is lower than a predetermined degree.

Here, let the predetermined degree of priority be called a threshold, and let the threshold be 4. In the above example, the audio signals of frames F11 and F13 of the given channel, whose priority information is 7, are decoded.

In contrast, the audio signal of frame F12 of the given channel, whose priority information is 0, is not decoded.

In this example, therefore, the audio signal of frame F12 becomes a silent signal, and the audio signals of frames F11 and F13 are combined to give the final audio signal of the given channel.

More specifically, when each audio signal is encoded, a time-frequency transform is applied to the audio signal, the information obtained by the transform is encoded, and the resulting encoded data is stored in an element.

Any processing may be used as the time-frequency transform, but the following description assumes that the MDCT (Modified Discrete Cosine Transform) is used.

In the decoding device, the encoded data is decoded, and the IMDCT (Inverse Modified Discrete Cosine Transform) is applied to the resulting MDCT coefficients to generate an audio signal. That is, the IMDCT is performed here as the inverse (frequency-time transform) of the time-frequency transform.
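
For reference, the transform pair can be written in its textbook direct form. The sketch below is a minimal, unwindowed O(N²) MDCT/IMDCT in pure Python, assuming a 1/N scaling on the inverse. It is not the windowed, fast implementation an AAC codec would use, and the function names are chosen here for illustration. It shows that the IMDCT output of a single block contains aliasing, which cancels when the overlapping halves of consecutive blocks are added.

```python
import math


def mdct(block):
    """Forward MDCT in direct form: 2N time samples -> N coefficients."""
    n = len(block) // 2
    return [
        sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]


def imdct(coeffs):
    """Inverse MDCT in direct form: N coefficients -> 2N aliased samples."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)) / n
        for t in range(2 * n)
    ]


def overlap_add(prev_out, next_out):
    """Add the second half of one IMDCT output to the first half of the next."""
    n = len(prev_out) // 2
    return [prev_out[n + i] + next_out[i] for i in range(n)]


signal = [0.1 * i for i in range(12)]      # three half-blocks of N = 4 samples
first = imdct(mdct(signal[0:8]))           # block covering samples 0..7
second = imdct(mdct(signal[4:12]))         # block covering samples 4..11
middle = overlap_add(first, second)        # reconstructs samples 4..7
```

The direct form costs on the order of N² multiplications per block, which gives a sense of why skipping the inverse transform for some signals saves computation. Feeding all-zero coefficients to `imdct` returns all zeros.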

More precisely, therefore, for frames F11 and F13, whose priority information is equal to or greater than the threshold of 4, the IMDCT is performed and audio signals are generated.

For frame F12, whose priority information is less than the threshold of 4, the IMDCT is not performed; the IMDCT result is set to 0 and the audio signal is generated from it. The audio signal of frame F12 thus becomes a silent signal, that is, zero data.
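
The per-frame decision described above can be sketched as follows. The function `decode_frames` and its parameters are hypothetical; `inverse_transform` stands in for the IMDCT, and the overlap between frames is ignored to keep the example short. The point is only that the inverse transform is skipped, and zeros are output, whenever the priority is below the threshold.

```python
def decode_frames(frames, threshold, inverse_transform, frame_len):
    """Decode (priority, coefficients) pairs, skipping low-priority frames.

    Returns the per-frame outputs and the number of inverse transforms run.
    """
    outputs = []
    transforms_run = 0
    for priority, coeffs in frames:
        if priority >= threshold:
            outputs.append(inverse_transform(coeffs))
            transforms_run += 1
        else:
            # Below the threshold: no transform, the result is taken to be 0.
            outputs.append([0.0] * frame_len)
    return outputs, transforms_run


def placeholder_transform(coeffs):
    """Stand-in for the IMDCT; simply copies its input (illustration only)."""
    return list(coeffs)


# Frames F11, F12, F13 of one channel with priorities 7, 0, 7; threshold 4.
# The coefficient values are made up.
frames = [(7, [0.5, -0.5]), (0, [0.25, 0.25]), (7, [0.1, 0.2])]
decoded, transforms_run = decode_frames(frames, 4, placeholder_transform, 2)
```

Here two of the three frames are transformed, and the middle frame is emitted as silence.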

As another example, in FIG. 3, when the threshold is 4, the audio signals of channel 0, channel 1, and channel M-2, whose priority information is less than the threshold of 4, are not decoded among the audio signals of channels 0 to M-1.
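
The same comparison applies across the channels of a single frame. In the sketch below, only the priorities of channel 0 (3) and channel 1 (0) come from the text; the channel count, the remaining values, and the function name `split_by_priority` are made up for illustration.

```python
def split_by_priority(priorities, threshold):
    """Return (indices to decode, indices to skip) for one frame."""
    decode = [i for i, p in enumerate(priorities) if p >= threshold]
    skip = [i for i, p in enumerate(priorities) if p < threshold]
    return decode, skip


# Assumed M = 6 channels, so channel M-2 is channel 4.
priorities = [3, 0, 7, 5, 2, 6]
decode, skip = split_by_priority(priorities, threshold=4)
```

With these values, channels 0, 1, and 4 fall below the threshold and are skipped, matching the pattern described for FIG. 3.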

As described above, by not decoding audio signals whose degree of priority indicated by the priority information is low according to the result of comparison with the threshold, the amount of calculation for decoding can be reduced while degradation of sound quality is kept to a minimum.

〈Configuration example of encoding device〉
Next, specific embodiments of an encoding device and a decoding device to which the present technology is applied are described. The encoding device is described first.

FIG. 5 is a diagram showing a configuration example of an encoding device to which the present technology is applied.

The encoding device 11 in FIG. 5 includes a channel audio encoding unit 21, an object audio encoding unit 22, a metadata input unit 23, and a packing unit 24.

The channel audio encoding unit 21 is supplied with the audio signal of each channel of a multi-channel signal with M channels. For example, the audio signal of each channel is supplied from a microphone corresponding to that channel. In FIG. 5, the labels "#0" to "#M-1" denote the channel numbers of the channels.

The channel audio encoding unit 21 encodes the supplied audio signal of each channel, generates priority information on the basis of the audio signal, and supplies the encoded data obtained by the encoding and the priority information to the packing unit 24.

The object audio encoding unit 22 is supplied with the audio signal of each of N objects. For example, the audio signal of each object is supplied from a microphone attached to that object. In FIG. 5, the labels "#0" to "#N-1" denote the object numbers of the objects.

The object audio encoding unit 22 encodes the supplied audio signal of each object, generates priority information on the basis of the audio signal, and supplies the encoded data obtained by the encoding and the priority information to the packing unit 24.

The metadata input unit 23 supplies the metadata of each object to the packing unit 24. The metadata of an object is, for example, spatial position information indicating the position of the object in space. More specifically, the spatial position information is, for example, three-dimensional coordinate information indicating the coordinates of the object's position in three-dimensional space.

The packing unit 24 packs the encoded data and priority information supplied from the channel audio encoding unit 21, the encoded data and priority information supplied from the object audio encoding unit 22, and the metadata supplied from the metadata input unit 23 to generate a bitstream, and outputs the bitstream.

The bitstream obtained in this way contains, for each frame, the encoded data of each channel, the priority information of each channel, the encoded data of each object, the priority information of each object, and the metadata of each object.

Here, the audio signals of the M channels and the audio signals of the N objects stored in the bitstream for one frame are audio signals of the same frame, which are to be reproduced simultaneously.

This description uses an example in which priority information is generated for each audio signal frame by frame, as the priority information of the audio signal of each channel and each object. However, one piece of priority information may instead be generated per arbitrary predetermined unit of time, for example for several frames of an audio signal.

〈Configuration example of channel audio encoding unit〉
In more detail, the channel audio encoding unit 21 in FIG. 5 is configured, for example, as shown in FIG. 6.

The channel audio encoding unit 21 shown in FIG. 6 includes an encoding unit 51 and a priority information generation unit 52.

The encoding unit 51 includes an MDCT unit 61 and encodes the audio signal of each channel supplied from outside.

That is, the MDCT unit 61 performs the MDCT on the externally supplied audio signal of each channel. The encoding unit 51 encodes the MDCT coefficients of each channel obtained by the MDCT, and supplies the resulting encoded data of each channel, that is, the encoded audio signal, to the packing unit 24.

The priority information generation unit 52 analyzes the externally supplied audio signal of each channel, generates priority information for the audio signal of each channel, and supplies it to the packing unit 24.

〈Configuration example of object audio encoding unit〉
In more detail, the object audio encoding unit 22 in FIG. 5 is configured, for example, as shown in FIG. 7.

The object audio encoding unit 22 shown in FIG. 7 includes an encoding unit 91 and a priority information generation unit 92.

The encoding unit 91 includes an MDCT unit 101 and encodes the audio signal of each object supplied from outside.

That is, the MDCT unit 101 performs the MDCT on the externally supplied audio signal of each object. The encoding unit 91 encodes the MDCT coefficients of each object obtained by the MDCT, and supplies the resulting encoded data of each object, that is, the encoded audio signal, to the packing unit 24.

The priority information generation unit 92 analyzes the externally supplied audio signal of each object, generates priority information for the audio signal of each object, and supplies it to the packing unit 24.

〈Description of encoding processing〉
Next, the processing performed by the encoding device 11 is described.

When the encoding device 11 is supplied with one frame of the audio signals of the plurality of channels and the audio signals of the plurality of objects that are to be reproduced simultaneously, it performs encoding processing and outputs a bitstream containing the encoded audio signals.

The encoding processing by the encoding device 11 is described below with reference to the flowchart of FIG. 8. This encoding processing is performed for each frame of the audio signals.

In step S11, the priority information generation unit 52 of the channel audio encoding unit 21 generates priority information for the supplied audio signal of each channel and supplies it to the packing unit 24. For example, the priority information generation unit 52 analyzes the audio signal of each channel and generates the priority information on the basis of the sound pressure and spectral shape of the audio signal, the correlation of spectral shapes between channels, and the like.
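
The text does not specify how these features are mapped to a priority value. As a purely illustrative assumption, the following Python sketch derives a priority from sound pressure alone, by mapping the RMS level of a frame linearly in dB onto the range 0 to 7. The function names and the -70 dB floor are hypothetical, samples are assumed to be normalized to the range -1.0 to 1.0, and spectral shape and inter-channel correlation are ignored.

```python
import math


def rms_dbfs(samples):
    """RMS level of a frame in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq == 0:
        return float("-inf")
    return 10 * math.log10(mean_sq)


def priority_from_level(samples, floor_db=-70.0):
    """Map the frame level to a priority 0..7 (hypothetical linear-in-dB rule)."""
    level = rms_dbfs(samples)
    if level <= floor_db:
        return 0
    fraction = min(1.0, (level - floor_db) / (0.0 - floor_db))
    return min(7, int(fraction * 8))


silent = priority_from_level([0.0] * 4)          # frame of digital silence
quiet = priority_from_level([0.001] * 4)         # roughly -60 dB
loud = priority_from_level([1.0, -1.0] * 2)      # full-scale frame
```

A real analyzer would combine several features; this rule is only meant to show where a per-frame, per-channel value in the range 0 to 7 could come from.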

ステップＳ１２において、パッキング部２４は、優先度情報生成部５２から供給された各チャネルのオーディオ信号の優先度情報をビットストリームのDSEに格納する。すなわち、優先度情報がビットストリームの先頭のエレメントに格納される。 In step S12, the packing unit 24 stores the priority information of the audio signal of each channel supplied from the priority information generating unit 52 in the DSE of the bitstream. That is, priority information is stored in the leading element of the bitstream.

ステップＳ１３において、オブジェクトオーディオ符号化部２２の優先度情報生成部９２は、供給された各オブジェクトのオーディオ信号の優先度情報を生成し、パッキング部２４に供給する。例えば優先度情報生成部９２は、オブジェクトごとにオーディオ信号を解析し、オーディオ信号の音圧やスペクトル形状、オブジェクト間のスペクトル形状の相関などに基づいて優先度情報を生成する。 In step S13 , the priority information generation unit 92 of the object audio encoding unit 22 generates priority information of the supplied audio signal of each object and supplies it to the packing unit 24 . For example, the priority information generator 92 analyzes the audio signal for each object and generates priority information based on the sound pressure and spectral shape of the audio signal, the correlation of spectral shapes between objects, and the like.

なお、各チャネルや各オブジェクトのオーディオ信号の優先度情報の生成時には、優先度情報の値となる優先度合いごとに、それらの優先度合いが割り当てられるオーディオ信号の数が、チャネル数やオブジェクト数に対して予め定められているようにしてもよい。 Note that, when the priority information of the audio signals of the channels and objects is generated, the number of audio signals to which each priority degree (the value of the priority information) is assigned may be determined in advance relative to the number of channels or the number of objects.

例えば図３の例では、優先度情報が「７」とされるオーディオ信号の数、つまりチャネルの数は５個、優先度情報が「６」とされるオーディオ信号の数は３個などと、予め定められているようにしてもよい。 For example, in the example of FIG. 3, the number of audio signals whose priority information is "7", that is, the number of channels is five, and the number of audio signals whose priority information is "6" is three. It may be determined in advance.
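
The quota-based assignment described above can be illustrated with a minimal Python sketch. It assumes that signals are ranked by the RMS level of the current frame as a stand-in for "sound pressure"; the function names `rms` and `assign_priorities`, the `quotas` format, and the fallback priority of 0 are hypothetical and are not taken from the specification, which also mentions spectral shape and inter-signal correlation as possible criteria.

```python
import math


def rms(samples):
    """Root-mean-square level of one frame (assumed proxy for sound pressure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def assign_priorities(frames, quotas):
    """Assign priority values to signals under fixed per-level quotas.

    frames: list of per-signal sample lists for one frame.
    quotas: list of (priority_value, count) pairs, highest priority first.
    Signals left over once the quotas are used up get priority 0 (assumption).
    """
    # Rank signal indices from loudest to quietest.
    order = sorted(range(len(frames)), key=lambda i: rms(frames[i]), reverse=True)
    priorities = [0] * len(frames)
    pos = 0
    for value, count in quotas:
        for idx in order[pos:pos + count]:
            priorities[idx] = value
        pos += count
    return priorities
```

With quotas such as `[(7, 5), (6, 3)]`, the five loudest signals would receive priority 7 and the next three priority 6, matching the kind of predetermined counts mentioned for FIG. 3.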

ステップＳ１４において、パッキング部２４は、優先度情報生成部９２から供給された各オブジェクトのオーディオ信号の優先度情報をビットストリームのDSEに格納する。 In step S14, the packing unit 24 stores the priority information of the audio signal of each object supplied from the priority information generation unit 92 in the DSE of the bitstream.

ステップＳ１５において、パッキング部２４は、各オブジェクトのメタデータをビットストリームのDSEに格納する。 In step S15, the packing unit 24 stores the metadata of each object in the DSE of the bitstream.

例えばメタデータ入力部２３は、ユーザの入力操作を受けたり、外部との通信を行ったり、外部の記録領域からの読み出しを行ったりすることで、各オブジェクトのメタデータを取得し、パッキング部２４に供給する。パッキング部２４は、このようにしてメタデータ入力部２３から供給されたメタデータをDSEに格納する。 For example, the metadata input unit 23 acquires the metadata of each object by receiving a user input operation, communicating with the outside, or reading from an external recording area, and supplies the metadata to the packing unit 24. The packing unit 24 stores the metadata supplied from the metadata input unit 23 in this way in the DSE.

以上の処理により、ビットストリームのDSEには、全チャネルのオーディオ信号の優先度情報、全オブジェクトのオーディオ信号の優先度情報、および全オブジェクトのメタデータが格納されたことになる。 By the above processing, the DSE of the bitstream stores the priority information of audio signals of all channels, the priority information of audio signals of all objects, and the metadata of all objects.

ステップＳ１６において、チャネルオーディオ符号化部２１の符号化部５１は、供給された各チャネルのオーディオ信号を符号化する。 In step S16, the encoding unit 51 of the channel audio encoding unit 21 encodes the supplied audio signal of each channel.

より具体的には、MDCT部６１は各チャネルのオーディオ信号に対してMDCTを行い、符号化部５１は、MDCTにより得られた各チャネルのMDCT係数を符号化し、その結果得られた各チャネルの符号化データをパッキング部２４に供給する。 More specifically, the MDCT unit 61 performs MDCT on the audio signal of each channel, and the encoding unit 51 encodes the MDCT coefficients of each channel obtained by the MDCT and supplies the resulting encoded data of each channel to the packing unit 24.

ステップＳ１７において、パッキング部２４は符号化部５１から供給された各チャネルのオーディオ信号の符号化データを、ビットストリームのSCEまたはCPEに格納する。すなわち、ビットストリームにおいてDSEに続いて配置されている各エレメントに符号化データが格納される。 In step S17, the packing unit 24 stores the encoded data of the audio signal of each channel supplied from the encoding unit 51 in the SCE or CPE of the bitstream. In other words, coded data is stored in each element arranged following the DSE in the bitstream.

ステップＳ１８において、オブジェクトオーディオ符号化部２２の符号化部９１は、供給された各オブジェクトのオーディオ信号を符号化する。 In step S18, the encoding unit 91 of the object audio encoding unit 22 encodes the supplied audio signal of each object.

より具体的には、MDCT部１０１は各オブジェクトのオーディオ信号に対してMDCTを行い、符号化部９１は、MDCTにより得られた各オブジェクトのMDCT係数を符号化し、その結果得られた各オブジェクトの符号化データをパッキング部２４に供給する。 More specifically, the MDCT unit 101 performs MDCT on the audio signal of each object, and the encoding unit 91 encodes the MDCT coefficients of each object obtained by the MDCT and supplies the resulting encoded data of each object to the packing unit 24.

ステップＳ１９において、パッキング部２４は符号化部９１から供給された各オブジェクトのオーディオ信号の符号化データを、ビットストリームのSCEに格納する。すなわち、ビットストリームにおいてDSEよりも後に配置されているいくつかのエレメントに符号化データが格納される。 In step S19, the packing unit 24 stores the encoded data of the audio signal of each object supplied from the encoding unit 91 in the SCE of the bitstream. That is, encoded data is stored in some elements located after the DSE in the bitstream.

以上の処理により、処理対象となっているフレームについて、全チャネルのオーディオ信号の優先度情報と符号化データ、全オブジェクトのオーディオ信号の優先度情報と符号化データ、および全オブジェクトのメタデータが格納されたビットストリームが得られる。 Through the above processing, a bitstream is obtained that stores, for the frame being processed, the priority information and encoded data of the audio signals of all channels, the priority information and encoded data of the audio signals of all objects, and the metadata of all objects.
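
The element order produced by steps S12 to S19 can be pictured with the following Python sketch. It is only an illustration of the ordering (DSE first, encoded data afterwards), not of the actual bit-level syntax: the function name `pack_frame`, the dictionary keys, and the `azimuth` metadata field are hypothetical, and every channel is placed in its own SCE for simplicity even though the text allows channels to be carried in an SCE or a CPE.

```python
def pack_frame(channel_priorities, object_priorities, object_metadata,
               channel_payloads, object_payloads):
    """Return one frame as an ordered list of (element_type, content) pairs."""
    # Steps S12, S14, S15: priority information and metadata go into the DSE,
    # which is the leading element of the frame.
    dse = {
        "channel_priorities": list(channel_priorities),
        "object_priorities": list(object_priorities),
        "object_metadata": list(object_metadata),
    }
    elements = [("DSE", dse)]
    # Step S17: encoded channel data follows the DSE (SCE only in this sketch).
    elements += [("SCE", payload) for payload in channel_payloads]
    # Step S19: encoded object data is stored in further SCEs.
    elements += [("SCE", payload) for payload in object_payloads]
    return elements


frame = pack_frame([7, 6], [3], [{"azimuth": 30.0}], [b"c0", b"c1"], [b"o0"])
```

Because the DSE comes first, a decoder can read all priority information before it touches any encoded audio data.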

ステップＳ２０において、パッキング部２４は、得られたビットストリームを出力し、符号化処理は終了する。 In step S20, the packing unit 24 outputs the obtained bitstream, and the encoding process ends.

以上のようにして符号化装置１１は、各チャネルのオーディオ信号の優先度情報と、各オブジェクトのオーディオ信号の優先度情報とを生成してビットストリームに格納し、出力する。したがって、復号側において、どのオーディオ信号がより優先度合いの高いものであるかを簡単に把握することができるようになる。 As described above, the encoding device 11 generates the priority information of the audio signal of each channel and the priority information of the audio signal of each object, stores them in a bitstream, and outputs them. Therefore, the decoding side can easily grasp which audio signal has a higher priority.

これにより、復号側では、優先度情報に応じて、符号化されたオーディオ信号の復号を選択的に行うことができる。その結果、オーディオ信号により再生される音声の音質の劣化を最小限に抑えつつ、復号の計算量を低減させることができる。 This allows the decoding side to selectively decode the encoded audio signal according to the priority information. As a result, it is possible to reduce the computational complexity of decoding while minimizing the deterioration of the sound quality of the sound reproduced by the audio signal.

特に、各オブジェクトのオーディオ信号の優先度情報をビットストリームに格納しておくことで、復号側において、復号の計算量を低減できるだけでなく、その後のレンダリング等の処理の計算量も低減させることができる。 In particular, storing the priority information of the audio signal of each object in the bitstream makes it possible on the decoding side to reduce not only the computational complexity of decoding but also the computational complexity of subsequent processing such as rendering.

〈復号装置の構成例〉 <Configuration example of decoding device>
次に、以上において説明した符号化装置１１から出力されたビットストリームを入力とし、ビットストリームに含まれる符号化データを復号する復号装置について説明する。 Next, a decoding device that receives as input the bitstream output from the encoding device 11 described above and decodes the encoded data contained in the bitstream will be described.

そのような復号装置は、例えば図９に示すように構成される。 Such a decoding device is configured, for example, as shown in FIG. 9.

図９に示す復号装置１５１は、アンパッキング／復号部１６１、レンダリング部１６２、およびミキシング部１６３を有している。 Decoding device 151 shown in FIG. 9 has unpacking/decoding section 161 , rendering section 162 and mixing section 163 .

アンパッキング／復号部１６１は、符号化装置１１から出力されたビットストリームを取得するとともに、ビットストリームのアンパッキングおよび復号を行う。 The unpacking/decoding unit 161 acquires the bitstream output from the encoding device 11, and unpacks and decodes the bitstream.

アンパッキング／復号部１６１は、アンパッキングおよび復号により得られた各オブジェクトのオーディオ信号と、各オブジェクトのメタデータとをレンダリング部１６２に供給する。このとき、アンパッキング／復号部１６１は、ビットストリームに含まれている優先度情報に応じて各オブジェクトの符号化データの復号を行う。 The unpacking/decoding unit 161 supplies the audio signal of each object obtained by unpacking and decoding and the metadata of each object to the rendering unit 162 . At this time, the unpacking/decoding unit 161 decodes the encoded data of each object according to the priority information included in the bitstream.

また、アンパッキング／復号部１６１は、アンパッキングおよび復号により得られた各チャネルのオーディオ信号をミキシング部１６３に供給する。このとき、アンパッキング／復号部１６１は、ビットストリームに含まれている優先度情報に応じて各チャネルの符号化データの復号を行う。 The unpacking/decoding section 161 also supplies the audio signal of each channel obtained by unpacking and decoding to the mixing section 163 . At this time, the unpacking/decoding unit 161 decodes the encoded data of each channel according to the priority information included in the bitstream.

レンダリング部１６２は、アンパッキング／復号部１６１から供給された各オブジェクトのオーディオ信号、および各オブジェクトのメタデータとしての空間位置情報に基づいて、Ｍチャネルのオーディオ信号を生成し、ミキシング部１６３に供給する。このときレンダリング部１６２は、各オブジェクトの音像が、それらのオブジェクトの空間位置情報により示される位置に定位するようにＭ個の各チャネルのオーディオ信号を生成する。 The rendering unit 162 generates M-channel audio signals based on the audio signal of each object supplied from the unpacking/decoding unit 161 and the spatial position information serving as the metadata of each object, and supplies them to the mixing unit 163. At this time, the rendering unit 162 generates the audio signal of each of the M channels so that the sound image of each object is localized at the position indicated by the spatial position information of that object.

ミキシング部１６３は、アンパッキング／復号部１６１から供給された各チャネルのオーディオ信号と、レンダリング部１６２から供給された各チャネルのオーディオ信号とをチャネルごとに重み付け加算を行って、最終的な各チャネルのオーディオ信号を生成する。ミキシング部１６３は、このようにして得られた最終的な各チャネルのオーディオ信号を、外部の各チャネルに対応するスピーカに供給し、音声を再生させる。 The mixing unit 163 performs, channel by channel, weighted addition of the audio signal of each channel supplied from the unpacking/decoding unit 161 and the audio signal of each channel supplied from the rendering unit 162 to generate the final audio signal of each channel. The mixing unit 163 supplies the final audio signal of each channel obtained in this way to the external speaker corresponding to that channel, which reproduces the sound.
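
The per-channel weighted addition performed by the mixing unit 163 can be sketched as follows. The specification does not state how the weights are chosen, so the parameters `w_decoded` and `w_rendered` (and the function name `mix`) are assumptions made for illustration only.

```python
def mix(decoded_channels, rendered_channels, w_decoded=1.0, w_rendered=1.0):
    """Weighted sum, channel by channel and sample by sample.

    decoded_channels:  M sample lists from the channel decoding path.
    rendered_channels: M sample lists from the object rendering path.
    """
    if len(decoded_channels) != len(rendered_channels):
        raise ValueError("both inputs must have the same number of channels")
    return [
        [w_decoded * d + w_rendered * r for d, r in zip(dec, ren)]
        for dec, ren in zip(decoded_channels, rendered_channels)
    ]
```

A channel whose decoded signal is silent (all zeros) simply passes the rendered contribution through, scaled by `w_rendered`.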

〈アンパッキング／復号部の構成例〉 <Configuration example of unpacking/decoding unit>
また、図９に示した復号装置１５１のアンパッキング／復号部１６１は、より詳細には例えば図１０に示すように構成される。 Further, the unpacking/decoding unit 161 of the decoding device 151 shown in FIG. 9 is more specifically configured as shown in FIG. 10, for example.

図１０に示すアンパッキング／復号部１６１は、優先度情報取得部１９１、チャネルオーディオ信号取得部１９２、チャネルオーディオ信号復号部１９３、出力選択部１９４、０値出力部１９５、IMDCT部１９６、オブジェクトオーディオ信号取得部１９７、オブジェクトオーディオ信号復号部１９８、出力選択部１９９、０値出力部２００、およびIMDCT部２０１を有している。 The unpacking/decoding unit 161 shown in FIG. 10 has a priority information acquisition unit 191, a channel audio signal acquisition unit 192, a channel audio signal decoding unit 193, an output selection unit 194, a 0-value output unit 195, an IMDCT unit 196, an object audio signal acquisition unit 197, an object audio signal decoding unit 198, an output selection unit 199, a 0-value output unit 200, and an IMDCT unit 201.

優先度情報取得部１９１は、供給されたビットストリームから、各チャネルのオーディオ信号の優先度情報を取得して出力選択部１９４に供給するとともに、ビットストリームから各オブジェクトのオーディオ信号の優先度情報を取得して出力選択部１９９に供給する。 The priority information acquisition unit 191 acquires the priority information of the audio signal of each channel from the supplied bitstream and supplies it to the output selection unit 194, and also acquires the priority information of the audio signal of each object from the bitstream and supplies it to the output selection unit 199.

また、優先度情報取得部１９１は、供給されたビットストリームから各オブジェクトのメタデータを取得してレンダリング部１６２に供給するとともに、ビットストリームをチャネルオーディオ信号取得部１９２およびオブジェクトオーディオ信号取得部１９７に供給する。 The priority information acquisition unit 191 also acquires the metadata of each object from the supplied bitstream and supplies it to the rendering unit 162, and supplies the bitstream to the channel audio signal acquisition unit 192 and the object audio signal acquisition unit 197.

チャネルオーディオ信号取得部１９２は、優先度情報取得部１９１から供給されたビットストリームから各チャネルの符号化データを取得して、チャネルオーディオ信号復号部１９３に供給する。チャネルオーディオ信号復号部１９３は、チャネルオーディオ信号取得部１９２から供給された各チャネルの符号化データを復号し、その結果得られたMDCT係数を出力選択部１９４に供給する。 The channel audio signal acquisition unit 192 acquires encoded data of each channel from the bitstream supplied from the priority information acquisition unit 191 and supplies the encoded data to the channel audio signal decoding unit 193 . The channel audio signal decoding unit 193 decodes the encoded data of each channel supplied from the channel audio signal acquisition unit 192 and supplies the resulting MDCT coefficients to the output selection unit 194 .

出力選択部１９４は、優先度情報取得部１９１から供給された各チャネルの優先度情報に基づいて、チャネルオーディオ信号復号部１９３から供給された各チャネルのMDCT係数の出力先を選択的に切り替える。 The output selection unit 194 selectively switches the output destination of the MDCT coefficients of each channel supplied from the channel audio signal decoding unit 193 based on the priority information of each channel supplied from the priority information acquisition unit 191 .

すなわち、出力選択部１９４は、所定のチャネルについての優先度情報が所定の閾値Ｐ未満である場合、そのチャネルのMDCT係数を０として０値出力部１９５に供給する。また、出力選択部１９４は、所定のチャネルについての優先度情報が所定の閾値Ｐ以上である場合、チャネルオーディオ信号復号部１９３から供給された、そのチャネルのMDCT係数をIMDCT部１９６に供給する。 That is, when the priority information for a given channel is less than a given threshold value P, the output selection section 194 sets the MDCT coefficient of that channel to 0 and supplies it to the 0 value output section 195 . Also, when the priority information for a given channel is equal to or greater than a given threshold value P, the output selection section 194 supplies the MDCT coefficients of that channel supplied from the channel audio signal decoding section 193 to the IMDCT section 196 .

０値出力部１９５は、出力選択部１９４から供給されたMDCT係数に基づいてオーディオ信号を生成し、ミキシング部１６３に供給する。この場合、MDCT係数は０であるので、無音のオーディオ信号が生成される。 The 0-value output unit 195 generates an audio signal based on the MDCT coefficients supplied from the output selection unit 194 and supplies the audio signal to the mixing unit 163 . In this case, since the MDCT coefficient is 0, a silent audio signal is generated.

IMDCT部１９６は、出力選択部１９４から供給されたMDCT係数に基づいてIMDCTを行ってオーディオ信号を生成し、ミキシング部１６３に供給する。 The IMDCT section 196 performs IMDCT based on the MDCT coefficients supplied from the output selection section 194 to generate an audio signal, and supplies the audio signal to the mixing section 163 .

オブジェクトオーディオ信号取得部１９７は、優先度情報取得部１９１から供給されたビットストリームから各オブジェクトの符号化データを取得して、オブジェクトオーディオ信号復号部１９８に供給する。オブジェクトオーディオ信号復号部１９８は、オブジェクトオーディオ信号取得部１９７から供給された各オブジェクトの符号化データを復号し、その結果得られたMDCT係数を出力選択部１９９に供給する。 The object audio signal acquisition unit 197 acquires encoded data of each object from the bitstream supplied from the priority information acquisition unit 191 and supplies the encoded data to the object audio signal decoding unit 198 . The object audio signal decoding unit 198 decodes the encoded data of each object supplied from the object audio signal acquisition unit 197 and supplies the resulting MDCT coefficients to the output selection unit 199 .

出力選択部１９９は、優先度情報取得部１９１から供給された各オブジェクトの優先度情報に基づいて、オブジェクトオーディオ信号復号部１９８から供給された各オブジェクトのMDCT係数の出力先を選択的に切り替える。 The output selection unit 199 selectively switches the output destination of the MDCT coefficients of each object supplied from the object audio signal decoding unit 198 based on the priority information of each object supplied from the priority information acquisition unit 191 .

すなわち、出力選択部１９９は、所定のオブジェクトについての優先度情報が所定の閾値Ｑ未満である場合、そのオブジェクトのMDCT係数を０として０値出力部２００に供給する。また、出力選択部１９９は、所定のオブジェクトについての優先度情報が所定の閾値Ｑ以上である場合、オブジェクトオーディオ信号復号部１９８から供給された、そのオブジェクトのMDCT係数をIMDCT部２０１に供給する。 That is, when the priority information for a given object is less than a given threshold Q, the output selector 199 sets the MDCT coefficient of that object to 0 and supplies it to the 0 value output section 200 . Also, when the priority information for a given object is equal to or greater than a given threshold Q, the output selector 199 supplies the MDCT coefficients of the object supplied from the object audio signal decoder 198 to the IMDCT unit 201 .

なお、閾値Ｑの値は、閾値Ｐの値と同じであってもよいし、閾値Ｐの値と異なる値であってもよい。復号装置１５１の計算能力等に応じて適切に閾値Ｐおよび閾値Ｑを定めることにより、オーディオ信号の復号の計算量を、復号装置１５１がリアルタイムに復号することが可能な範囲内の計算量まで低減させることができる。 Note that the value of the threshold Q may be the same as or different from the value of the threshold P. By setting the threshold P and the threshold Q appropriately according to the computational capability and the like of the decoding device 151, the computational complexity of decoding the audio signals can be reduced to a level at which the decoding device 151 can decode in real time.
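
The specification leaves open how a threshold is derived from the computational capability. One possible rule, shown here purely as an assumption, is to pick the lowest threshold for which the number of signals that would pass through IMDCT does not exceed what the device can handle in real time. The function name `choose_threshold`, the `max_signals` budget, and the maximum priority value of 7 are hypothetical.

```python
def choose_threshold(priorities, max_signals, max_priority=7):
    """Lowest threshold T such that at most max_signals have priority >= T.

    priorities:  priority values of all channels (for P) or objects (for Q).
    max_signals: number of signals the device can fully decode per frame.
    """
    if max_signals < 0:
        raise ValueError("max_signals must be non-negative")
    # T = max_priority + 1 selects no signal at all, so the loop always returns.
    for threshold in range(max_priority + 2):
        selected = sum(1 for p in priorities if p >= threshold)
        if selected <= max_signals:
            return threshold
```

The same function could be called once with the channel priorities to obtain P and once with the object priorities to obtain Q, which naturally allows the two thresholds to differ.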

０値出力部２００は、出力選択部１９９から供給されたMDCT係数に基づいてオーディオ信号を生成し、レンダリング部１６２に供給する。この場合、MDCT係数は０であるので、無音のオーディオ信号が生成される。 The 0-value output unit 200 generates an audio signal based on the MDCT coefficients supplied from the output selection unit 199 and supplies the audio signal to the rendering unit 162 . In this case, since the MDCT coefficient is 0, a silent audio signal is generated.

IMDCT部２０１は、出力選択部１９９から供給されたMDCT係数に基づいてIMDCTを行ってオーディオ信号を生成し、レンダリング部１６２に供給する。 The IMDCT unit 201 performs IMDCT based on the MDCT coefficients supplied from the output selection unit 199 to generate an audio signal, and supplies the audio signal to the rendering unit 162 .

〈復号処理の説明〉 <Description of decoding processing>
次に、復号装置１５１の動作について説明する。 Next, the operation of the decoding device 151 will be described.

復号装置１５１は、符号化装置１１から１フレーム分のビットストリームが供給されると、復号処理を行ってオーディオ信号を生成し、スピーカへと出力する。以下、図１１のフローチャートを参照して、復号装置１５１により行われる復号処理について説明する。 When a bitstream for one frame is supplied from the encoding device 11, the decoding device 151 performs decoding processing to generate audio signals and outputs them to the speakers. The decoding process performed by the decoding device 151 will be described below with reference to the flowchart of FIG. 11.

ステップＳ５１において、アンパッキング／復号部１６１は、符号化装置１１から送信されてきたビットストリームを取得する。すなわち、ビットストリームが受信される。 In step S51 , the unpacking/decoding unit 161 acquires the bitstream transmitted from the encoding device 11 . That is, a bitstream is received.

ステップＳ５２において、アンパッキング／復号部１６１は選択復号処理を行う。 In step S52, the unpacking/decoding unit 161 performs selective decoding processing.

なお、選択復号処理の詳細は後述するが、選択復号処理では各チャネルの符号化データと、各オブジェクトの符号化データとが優先度情報に基づいて選択的に復号される。そして、その結果得られた各チャネルのオーディオ信号がミキシング部１６３に供給され、各オブジェクトのオーディオ信号がレンダリング部１６２に供給される。また、ビットストリームから取得された各オブジェクトのメタデータがレンダリング部１６２に供給される。 Although details of the selective decoding process will be described later, in the selective decoding process, encoded data of each channel and encoded data of each object are selectively decoded based on priority information. Then, the audio signal of each channel obtained as a result is supplied to the mixing unit 163 , and the audio signal of each object is supplied to the rendering unit 162 . Metadata of each object obtained from the bitstream is also supplied to the rendering unit 162 .

ステップＳ５３において、レンダリング部１６２は、アンパッキング／復号部１６１から供給された各オブジェクトのオーディオ信号、および各オブジェクトのメタデータとしての空間位置情報に基づいて、各オブジェクトのオーディオ信号のレンダリングを行う。 In step S53, the rendering unit 162 renders the audio signal of each object based on the audio signal of each object supplied from the unpacking/decoding unit 161 and the spatial position information as metadata of each object.

例えばレンダリング部１６２は、空間位置情報に基づいてVBAP（Vector Base Amplitude Panning）により、各オブジェクトの音像が空間位置情報により示される位置に定位するように各チャネルのオーディオ信号を生成し、ミキシング部１６３に供給する。 For example, the rendering unit 162 uses VBAP (Vector Base Amplitude Panning) based on the spatial position information to generate the audio signal of each channel so that the sound image of each object is localized at the position indicated by the spatial position information, and supplies the audio signals to the mixing unit 163.
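
As a rough illustration of the VBAP principle, the sketch below computes the gains for a single loudspeaker pair in two dimensions: the source direction is expressed as a linear combination of the two loudspeaker direction vectors, and the gains are then power-normalized. This is a simplified pairwise 2D case under assumed conventions (azimuth in degrees, unit direction vectors); the function name `vbap_2d` is hypothetical, and the rendering unit 162 is not stated to work in exactly this way.

```python
import math


def vbap_2d(source_deg, left_deg, right_deg):
    """Gains (g_left, g_right) that pan a source between two loudspeakers."""
    px, py = math.cos(math.radians(source_deg)), math.sin(math.radians(source_deg))
    ax, ay = math.cos(math.radians(left_deg)), math.sin(math.radians(left_deg))
    bx, by = math.cos(math.radians(right_deg)), math.sin(math.radians(right_deg))
    # Solve g_left * a + g_right * b = p with Cramer's rule.
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    g_left = (px * by - py * bx) / det
    g_right = (ax * py - ay * px) / det
    # Normalize so that g_left^2 + g_right^2 = 1 (constant power).
    norm = math.hypot(g_left, g_right)
    return g_left / norm, g_right / norm
```

A source midway between the two loudspeakers receives equal gains, while a source exactly in the direction of one loudspeaker is reproduced by that loudspeaker alone. A source outside the arc between the pair yields a negative gain, which a full implementation would handle by selecting a different loudspeaker pair.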

ステップＳ５４において、ミキシング部１６３は、アンパッキング／復号部１６１から供給された各チャネルのオーディオ信号と、レンダリング部１６２から供給された各チャネルのオーディオ信号とをチャネルごとに重み付け加算し、外部のスピーカに供給する。これにより、各スピーカには、それらのスピーカに対応するチャネルのオーディオ信号が供給されるので、各スピーカは供給されたオーディオ信号に基づいて音声を再生する。 In step S54, the mixing unit 163 performs, channel by channel, weighted addition of the audio signal of each channel supplied from the unpacking/decoding unit 161 and the audio signal of each channel supplied from the rendering unit 162, and supplies the results to the external speakers. As a result, each speaker is supplied with the audio signal of its corresponding channel and reproduces sound based on that signal.

各チャネルのオーディオ信号がスピーカに供給されると、復号処理は終了する。 The decoding process ends when the audio signal of each channel is supplied to the speaker.

以上のようにして、復号装置１５１は、ビットストリームから優先度情報を取得して、その優先度情報に応じて各チャネルおよび各オブジェクトの符号化データを復号する。 As described above, the decoding device 151 acquires priority information from the bitstream and decodes the encoded data of each channel and each object according to the priority information.

〈選択復号処理の説明〉 <Description of selective decoding processing>
続いて、図１２のフローチャートを参照して、図１１のステップＳ５２の処理に対応する選択復号処理について説明する。 Next, the selective decoding process corresponding to the process of step S52 in FIG. 11 will be described with reference to the flowchart of FIG. 12.

ステップＳ８１において、優先度情報取得部１９１は、供給されたビットストリームから、各チャネルのオーディオ信号の優先度情報、および各オブジェクトのオーディオ信号の優先度情報を取得して、それぞれ出力選択部１９４および出力選択部１９９に供給する。 In step S81, the priority information acquisition unit 191 acquires the priority information of the audio signal of each channel and the priority information of the audio signal of each object from the supplied bitstream, and supplies them to the output selection unit 194 and the output selection unit 199, respectively.

また、優先度情報取得部１９１は、ビットストリームから各オブジェクトのメタデータを取得してレンダリング部１６２に供給するとともに、ビットストリームをチャネルオーディオ信号取得部１９２およびオブジェクトオーディオ信号取得部１９７に供給する。 The priority information acquisition unit 191 also acquires metadata of each object from the bitstream and supplies the metadata to the rendering unit 162 , and supplies the bitstream to the channel audio signal acquisition unit 192 and the object audio signal acquisition unit 197 .

ステップＳ８２において、チャネルオーディオ信号取得部１９２は、処理対象とするチャネルのチャネル番号に０を設定し、保持する。 In step S82, the channel audio signal acquisition unit 192 sets 0 to the channel number of the channel to be processed and holds it.

ステップＳ８３において、チャネルオーディオ信号取得部１９２は、保持しているチャネル番号がチャネル数Ｍ未満であるか否かを判定する。 In step S83, the channel audio signal acquisition unit 192 determines whether or not the retained channel number is less than the number M of channels.

ステップＳ８３において、チャネル番号がＭ未満であると判定された場合、ステップＳ８４において、チャネルオーディオ信号復号部１９３は、処理対象のチャネルのオーディオ信号の符号化データを復号する。 If it is determined in step S83 that the channel number is less than M, in step S84 the channel audio signal decoding unit 193 decodes the encoded data of the audio signal of the channel to be processed.

すなわち、チャネルオーディオ信号取得部１９２は、優先度情報取得部１９１から供給されたビットストリームから、処理対象のチャネルの符号化データを取得してチャネルオーディオ信号復号部１９３に供給する。 That is, the channel audio signal acquisition unit 192 acquires encoded data of the channel to be processed from the bitstream supplied from the priority information acquisition unit 191 and supplies the encoded data to the channel audio signal decoding unit 193 .

すると、チャネルオーディオ信号復号部１９３は、チャネルオーディオ信号取得部１９２から供給された符号化データを復号し、その結果得られたMDCT係数を出力選択部１９４に供給する。 Then, the channel audio signal decoding unit 193 decodes the encoded data supplied from the channel audio signal acquisition unit 192 and supplies the resulting MDCT coefficients to the output selection unit 194 .

ステップＳ８５において、出力選択部１９４は、優先度情報取得部１９１から供給された処理対象のチャネルの優先度情報が、図示せぬ上位の制御装置等により指定された閾値Ｐ以上であるか否かを判定する。ここで閾値Ｐは、例えば復号装置１５１の計算能力等に応じて定められる。 In step S85, the output selection unit 194 determines whether the priority information of the channel to be processed, supplied from the priority information acquisition unit 191, is equal to or greater than the threshold P specified by a higher-level control device or the like (not shown). Here, the threshold P is determined according to, for example, the computational capability of the decoding device 151.

ステップＳ８５において、優先度情報が閾値Ｐ以上であると判定された場合、出力選択部１９４は、チャネルオーディオ信号復号部１９３から供給された、処理対象のチャネルのMDCT係数をIMDCT部１９６に供給し、処理はステップＳ８６に進む。この場合、処理対象のチャネルのオーディオ信号の優先度合いは、所定の優先度合い以上であるので、そのチャネルについての復号、より詳細にはIMDCTが行われる。 In step S85, when it is determined that the priority information is equal to or greater than the threshold value P, the output selection unit 194 supplies the MDCT coefficients of the channel to be processed, supplied from the channel audio signal decoding unit 193, to the IMDCT unit 196. , the process proceeds to step S86. In this case, since the priority of the audio signal of the channel to be processed is equal to or higher than the predetermined priority, decoding, more specifically IMDCT, is performed on that channel.

ステップＳ８６において、IMDCT部１９６は、出力選択部１９４から供給されたMDCT係数に基づいてIMDCTを行って、処理対象のチャネルのオーディオ信号を生成し、ミキシング部１６３に供給する。オーディオ信号が生成されると、その後、処理はステップＳ８７へと進む。 In step S86 , the IMDCT section 196 performs IMDCT based on the MDCT coefficients supplied from the output selection section 194 to generate an audio signal of the channel to be processed, and supplies the audio signal to the mixing section 163 . After the audio signal is generated, the process proceeds to step S87.

これに対して、ステップＳ８５において、優先度情報が閾値Ｐ未満であると判定された場合、出力選択部１９４は、MDCT係数を０として０値出力部１９５に供給する。 On the other hand, when it is determined in step S85 that the priority information is less than the threshold value P, the output selection unit 194 sets the MDCT coefficient to 0 and supplies it to the 0 value output unit 195 .

０値出力部１９５は、出力選択部１９４から供給された０であるMDCT係数から、処理対象のチャネルのオーディオ信号を生成し、ミキシング部１６３に供給する。したがって、０値出力部１９５では、実質的にはIMDCTなどのオーディオ信号を生成するための処理は何も行われない。 The 0-value output unit 195 generates an audio signal of the channel to be processed from the MDCT coefficients of 0 supplied from the output selection unit 194 , and supplies the audio signal to the mixing unit 163 . Therefore, the 0-value output unit 195 does not substantially perform any processing for generating an audio signal such as IMDCT.

なお、０値出力部１９５により生成されるオーディオ信号は無音信号である。オーディオ信号が生成されると、その後、処理はステップＳ８７へと進む。 Note that the audio signal generated by the 0-value output unit 195 is a silent signal. After the audio signal is generated, the process proceeds to step S87.

ステップＳ８５において優先度情報が閾値Ｐ未満であると判定されたか、またはステップＳ８６においてオーディオ信号が生成されると、ステップＳ８７において、チャネルオーディオ信号取得部１９２は、保持しているチャネル番号に１を加え、処理対象のチャネルのチャネル番号を更新する。 When it is determined in step S85 that the priority information is less than the threshold P, or when an audio signal has been generated in step S86, the channel audio signal acquisition unit 192 adds 1 to the held channel number in step S87, thereby updating the channel number of the channel to be processed.

チャネル番号が更新されると、その後、処理はステップＳ８３に戻り、上述した処理が繰り返し行われる。すなわち、新たな処理対象のチャネルのオーディオ信号が生成される。 After the channel number is updated, the process returns to step S83 and the above-described processes are repeated. That is, the audio signal of the new channel to be processed is generated.

また、ステップＳ８３において、処理対象のチャネルのチャネル番号がＭ未満ではないと判定された場合、全てのチャネルについてオーディオ信号が得られたので、処理はステップＳ８８へと進む。 Further, when it is determined in step S83 that the channel number of the channel to be processed is not less than M, the process proceeds to step S88 because audio signals have been obtained for all channels.

ステップＳ８８において、オブジェクトオーディオ信号取得部１９７は、処理対象とするオブジェクトのオブジェクト番号に０を設定し、保持する。 In step S88, the object audio signal acquisition unit 197 sets 0 to the object number of the object to be processed and holds it.

ステップＳ８９において、オブジェクトオーディオ信号取得部１９７は、保持しているオブジェクト番号がオブジェクト数Ｎ未満であるか否かを判定する。 In step S89, the object audio signal acquisition unit 197 determines whether or not the retained object number is less than the number N of objects.

ステップＳ８９において、オブジェクト番号がＮ未満であると判定された場合、ステップＳ９０において、オブジェクトオーディオ信号復号部１９８は、処理対象のオブジェクトのオーディオ信号の符号化データを復号する。 If it is determined in step S89 that the object number is less than N, in step S90 the object audio signal decoding unit 198 decodes the encoded data of the audio signal of the object to be processed.

すなわち、オブジェクトオーディオ信号取得部１９７は、優先度情報取得部１９１から供給されたビットストリームから、処理対象のオブジェクトの符号化データを取得してオブジェクトオーディオ信号復号部１９８に供給する。 That is, the object audio signal acquisition unit 197 acquires encoded data of the object to be processed from the bitstream supplied from the priority information acquisition unit 191 and supplies the encoded data to the object audio signal decoding unit 198 .

すると、オブジェクトオーディオ信号復号部１９８は、オブジェクトオーディオ信号取得部１９７から供給された符号化データを復号し、その結果得られたMDCT係数を出力選択部１９９に供給する。 Then, the object audio signal decoding unit 198 decodes the encoded data supplied from the object audio signal acquisition unit 197 and supplies the resulting MDCT coefficients to the output selection unit 199 .

ステップＳ９１において、出力選択部１９９は、優先度情報取得部１９１から供給された処理対象のオブジェクトの優先度情報が、図示せぬ上位の制御装置等により指定された閾値Ｑ以上であるか否かを判定する。ここで閾値Ｑは、例えば復号装置１５１の計算能力等に応じて定められる。 In step S91, the output selection unit 199 determines whether the priority information of the object to be processed, supplied from the priority information acquisition unit 191, is equal to or greater than the threshold Q specified by a higher-level control device or the like (not shown). Here, the threshold Q is determined according to, for example, the computational capability of the decoding device 151.

ステップＳ９１において、優先度情報が閾値Ｑ以上であると判定された場合、出力選択部１９９は、オブジェクトオーディオ信号復号部１９８から供給された、処理対象のオブジェクトのMDCT係数をIMDCT部２０１に供給し、処理はステップＳ９２に進む。 If it is determined in step S91 that the priority information is equal to or greater than the threshold Q, the output selection unit 199 supplies the MDCT coefficients of the object to be processed, supplied from the object audio signal decoding unit 198, to the IMDCT unit 201. , the process proceeds to step S92.

ステップＳ９２において、IMDCT部２０１は、出力選択部１９９から供給されたMDCT係数に基づいてIMDCTを行って、処理対象のオブジェクトのオーディオ信号を生成し、レンダリング部１６２に供給する。オーディオ信号が生成されると、その後、処理はステップＳ９３へと進む。 In step S92 , the IMDCT unit 201 performs IMDCT based on the MDCT coefficients supplied from the output selection unit 199 to generate an audio signal of the object to be processed, and supplies the audio signal to the rendering unit 162 . After the audio signal is generated, the process proceeds to step S93.

これに対して、ステップＳ９１において、優先度情報が閾値Ｑ未満であると判定された場合、出力選択部１９９は、MDCT係数を０として０値出力部２００に供給する。 On the other hand, when it is determined in step S91 that the priority information is less than the threshold value Q, the output selection unit 199 sets the MDCT coefficient to 0 and supplies it to the 0 value output unit 200 .

０値出力部２００は、出力選択部１９９から供給された０であるMDCT係数から、処理対象のオブジェクトのオーディオ信号を生成し、レンダリング部１６２に供給する。したがって、０値出力部２００では、実質的にはIMDCTなどのオーディオ信号を生成するための処理は何も行われない。 The 0-value output unit 200 generates an audio signal of the object to be processed from the MDCT coefficients of 0 supplied from the output selection unit 199 , and supplies the audio signal to the rendering unit 162 . Therefore, the 0-value output unit 200 does not substantially perform any processing for generating an audio signal such as IMDCT.

なお、０値出力部２００により生成されるオーディオ信号は無音信号である。オーディオ信号が生成されると、その後、処理はステップＳ９３へと進む。 Note that the audio signal generated by the 0-value output unit 200 is a silent signal. After the audio signal is generated, the process proceeds to step S93.

ステップＳ９１において優先度情報が閾値Ｑ未満であると判定されたか、またはステップＳ９２においてオーディオ信号が生成されると、ステップＳ９３において、オブジェクトオーディオ信号取得部１９７は、保持しているオブジェクト番号に１を加え、処理対象のオブジェクトのオブジェクト番号を更新する。 When it is determined in step S91 that the priority information is less than the threshold Q, or when an audio signal has been generated in step S92, the object audio signal acquisition unit 197 adds 1 to the held object number in step S93, thereby updating the object number of the object to be processed.

オブジェクト番号が更新されると、その後、処理はステップＳ８９に戻り、上述した処理が繰り返し行われる。すなわち、新たな処理対象のオブジェクトのオーディオ信号が生成される。 After the object number is updated, the process returns to step S89 and the above-described processes are repeated. That is, the audio signal of the new object to be processed is generated.

また、ステップＳ８９において、処理対象のオブジェクトのオブジェクト番号がＮ未満ではないと判定された場合、全てのチャネルおよびオブジェクトについてオーディオ信号が得られたので選択復号処理は終了し、その後、処理は図１１のステップＳ５３に進む。 If it is determined in step S89 that the object number of the object to be processed is not less than N, audio signals have been obtained for all channels and objects, so the selective decoding process ends, and the process then proceeds to step S53 in FIG. 11.
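
The loop of steps S82 to S87 (and, with threshold Q, steps S88 to S93) can be sketched in Python as below. The sketch starts from MDCT coefficients that have already been decoded, as in steps S84 and S90, and only decides whether IMDCT is run. `naive_imdct` is an assumed, unoptimized stand-in without windowing or overlap-add, and its scaling is one common convention rather than that of any particular codec; the function names and the returned call counter are hypothetical.

```python
import math


def naive_imdct(coeffs):
    """Direct O(N^2) IMDCT: N coefficients in, 2N time samples out."""
    n = len(coeffs)
    return [
        sum(c * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)) / n
        for t in range(2 * n)
    ]


def selective_decode(coeff_sets, priorities, threshold, imdct=naive_imdct):
    """Run IMDCT only for signals whose priority is >= threshold.

    Returns (list of per-signal sample lists, number of IMDCT calls).
    """
    outputs = []
    imdct_calls = 0
    for index in range(len(coeff_sets)):          # S82/S83/S87: loop over signals
        coeffs = coeff_sets[index]                # S84: coefficients already decoded
        if priorities[index] >= threshold:        # S85: compare with the threshold
            outputs.append(imdct(coeffs))         # S86: IMDCT path
            imdct_calls += 1
        else:
            # 0-value output path: emit silence without running IMDCT.
            outputs.append([0.0] * (2 * len(coeffs)))
    return outputs, imdct_calls
```

For three signals with priorities 7, 2, and 5 and a threshold of 4, IMDCT runs twice and the second signal is output as silence.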

以上のようにして、復号装置１５１は、各チャネルまたは各オブジェクトについて、優先度情報と閾値とを比較して、処理対象のフレームのチャネルやオブジェクトごとに符号化されたオーディオ信号の復号を行うか否かを判定しながら、符号化されたオーディオ信号を復号する。 As described above, the decoding device 151 compares the priority information with the threshold for each channel or each object and decodes the encoded audio signals while determining, for each channel or object of the frame being processed, whether or not to decode its encoded audio signal.

すなわち、復号装置１５１では、各オーディオ信号の優先度情報に応じた所定の数だけ、符号化されたオーディオ信号が復号され、残りのオーディオ信号は復号されない。 That is, the decoding device 151 decodes a predetermined number of encoded audio signals according to the priority information of each audio signal, and does not decode the remaining audio signals.

これにより、再生環境に合わせて優先度合いの高いオーディオ信号のみを選択的に復号することができ、オーディオ信号により再生される音声の音質の劣化を最小限に抑えつつ、復号の計算量を低減させることができる。 As a result, only audio signals with a high degree of priority can be selectively decoded according to the reproduction environment, and the amount of calculation for decoding can be reduced while minimizing the deterioration of the sound quality of the sound reproduced from the audio signals.

しかも、各オブジェクトのオーディオ信号の優先度情報に基づいて、符号化されたオーディオ信号の復号を行うことで、オーディオ信号の復号の計算量だけでなく、レンダリング部１６２等における処理など、その後の処理の計算量も低減させることができる。 Moreover, by decoding the encoded audio signals based on the priority information of the audio signal of each object, it is possible to reduce not only the amount of calculation for decoding the audio signals but also the amount of calculation for subsequent processing, such as the processing in the rendering unit 162.
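
The per-object loop described above (compare the priority information with the threshold Q, then either run IMDCT or output silence, then increment the object number) can be sketched roughly as follows. This is only an illustrative sketch: the function name `selective_decode`, the `imdct` callable, and the assumption that a silent frame has twice as many samples as there are MDCT coefficients are all hypothetical and are not taken from the description.

```python
from typing import Callable, List, Sequence


def selective_decode(
    mdct_per_object: Sequence[Sequence[float]],
    priorities: Sequence[int],
    threshold_q: int,
    imdct: Callable[[Sequence[float]], List[float]],
) -> List[List[float]]:
    """Return one signal per object; objects below the threshold stay silent."""
    outputs: List[List[float]] = []
    object_number = 0
    # Loop while the object number is less than N (the number of objects).
    while object_number < len(mdct_per_object):
        coefs = mdct_per_object[object_number]
        if priorities[object_number] >= threshold_q:
            # IMDCT path: the transform is actually computed.
            outputs.append(imdct(coefs))
        else:
            # 0-value output path: the IMDCT is skipped entirely.
            # Assumed length: 2 samples per MDCT coefficient.
            outputs.append([0.0] * (2 * len(coefs)))
        object_number += 1  # update the object number of the object to process
    return outputs
```

In this sketch the saving comes from the skipped `imdct` calls; any later stage that ignores all-zero signals would save work as well.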

〈第１の実施の形態の変形例１〉 <Modification 1 of the first embodiment>
〈優先度情報について〉 <About priority information>
なお、以上においては各チャネルや各オブジェクトの１つのオーディオ信号に対して、１つの優先度情報が生成されると説明したが、複数の優先度情報が生成されるようにしてもよい。 In the above description, one piece of priority information is generated for one audio signal of each channel or each object, but a plurality of pieces of priority information may be generated.

そのような場合、例えば複数の各優先度情報は復号の計算量、すなわち復号側の計算能力に応じて、計算能力ごとに生成される。 In such a case, for example, a plurality of pieces of priority information are generated for each computational capacity according to the computational complexity of decoding, that is, the computational capacity of the decoding side.

具体的には、例えば2チャネル相当のオーディオ信号をリアルタイムに復号するための計算量に基づいて、2チャネル相当の計算能力を有する機器のための優先度情報が生成される。 Specifically, for example, based on the amount of calculation for decoding audio signals corresponding to two channels in real time, priority information is generated for a device having a calculation capability corresponding to two channels.

このような2チャネル相当の機器のための優先度情報では、例えば全オーディオ信号のうち、より優先度合いが低い、つまり０に近い値が優先度情報として割り当てられるオーディオ信号が多くなるように優先度情報が生成される。 In such priority information for devices equivalent to two channels, the priority information is generated so that, for example, more of the audio signals are assigned a lower degree of priority, that is, a value close to 0.

また、例えば24チャネル相当のオーディオ信号をリアルタイムに復号するための計算量に基づいて、24チャネル相当の計算能力を有する機器のための優先度情報も生成される。24チャネル相当の機器のための優先度情報では、例えば全オーディオ信号のうち、より優先度合いが高い、つまり７に近い値が優先度情報として割り当てられるオーディオ信号が多くなるように優先度情報が生成される。 Also, for example, based on the amount of calculation for decoding audio signals corresponding to 24 channels in real time, priority information for devices having a calculation capability corresponding to 24 channels is generated. In the priority information for devices equivalent to 24 channels, the priority information is generated so that, for example, more of the audio signals are assigned a higher degree of priority, that is, a value close to 7.

この場合、例えば優先度情報生成部５２は、図８のステップＳ１１において、各チャネルのオーディオ信号に対して2チャネル相当の機器のための優先度情報を生成するとともに、それらの優先度情報に2チャネル相当の機器のためのものであることを示す識別子を付加し、パッキング部２４に供給する。 In this case, for example, in step S11 of FIG. 8, the priority information generation unit 52 generates priority information for devices equivalent to two channels for the audio signal of each channel, adds to that priority information an identifier indicating that it is for devices equivalent to two channels, and supplies it to the packing unit 24.

さらに、優先度情報生成部５２は、ステップＳ１１において、各チャネルのオーディオ信号に対して24チャネル相当の機器のための優先度情報も生成するとともに、それらの優先度情報に24チャネル相当の機器のためのものであることを示す識別子を付加し、パッキング部２４に供給する。 Furthermore, in step S11, the priority information generation unit 52 also generates priority information for devices equivalent to 24 channels for the audio signal of each channel, adds to that priority information an identifier indicating that it is for devices equivalent to 24 channels, and supplies it to the packing unit 24.

同様に、優先度情報生成部９２も図８のステップＳ１３において、2チャネル相当の機器のための優先度情報と、24チャネル相当の機器のための優先度情報とを生成して識別子を付加し、パッキング部２４に供給する。 Similarly, in step S13 of FIG. 8, the priority information generation unit 92 also generates priority information for devices equivalent to 2 channels and priority information for devices equivalent to 24 channels, adds the identifiers, and supplies them to the packing unit 24.

これにより、例えばポータブルオーディオプレーヤや、多機能型携帯電話機、タブレット型コンピュータ、テレビジョン受像機、パーソナルコンピュータ、高品位な音響機器などの再生機器の計算能力に応じた優先度情報が複数得られることになる。 As a result, a plurality of pieces of priority information are obtained, each suited to the computing power of a playback device such as a portable audio player, a multifunctional mobile phone, a tablet computer, a television receiver, a personal computer, or high-quality audio equipment.

例えばポータブルオーディオプレーヤなどの再生機器は、比較的計算能力が低いので、そのような再生機器では、2チャネル相当の機器のための優先度情報に基づいて符号化されたオーディオ信号を復号すれば、リアルタイムでオーディオ信号の再生を行うことができる。 For example, a playback device such as a portable audio player has relatively low computing power, so such a device can reproduce audio signals in real time if it decodes the encoded audio signals based on the priority information for devices equivalent to two channels.

以上のように、１つのオーディオ信号に対して複数の優先度情報が生成される場合、復号装置１５１では、例えば上位の制御装置により、複数の優先度情報のうちのどの優先度情報を用いて復号を行うかが優先度情報取得部１９１等に対して指示がされる。どの優先度情報を用いるかの指示は、例えば識別子が供給されることにより行われる。 As described above, when a plurality of pieces of priority information are generated for one audio signal, in the decoding device 151, for example, a higher-level control device instructs the priority information acquisition unit 191 and the like which of the pieces of priority information is to be used for decoding. This instruction is given, for example, by supplying an identifier.

なお、どの識別子の優先度情報を用いるかが、復号装置１５１ごとに予め定められているようにしてもよい。 Note that which identifier's priority information is to be used may be determined in advance for each decoding device 151.

例えば優先度情報取得部１９１において、予めどの識別子の優先度情報を用いるかが定められた場合、または上位の制御装置により識別子が指定された場合、図１２のステップＳ８１では、優先度情報取得部１９１は、定められた識別子が付加されている優先度情報を取得する。そして、取得された優先度情報が優先度情報取得部１９１から、出力選択部１９４や出力選択部１９９に供給される。 For example, in the priority information acquisition unit 191, when the priority information of which identifier is to be used is determined in advance, or when the identifier is specified by the upper control device, in step S81 of FIG. 12, the priority information acquisition unit 191 acquires priority information to which a predetermined identifier is added. Then, the acquired priority information is supplied from the priority information acquisition section 191 to the output selection section 194 and the output selection section 199 .

換言すれば、ビットストリームに格納されている複数の優先度情報のなかから、復号装置１５１、より詳細にはアンパッキング／復号部１６１の計算能力等に応じて適切な優先度情報が１つ選択される。 In other words, from among the plurality of pieces of priority information stored in the bitstream, one appropriate piece of priority information is selected according to the computing power and the like of the decoding device 151, more specifically, of the unpacking/decoding unit 161.

この場合、各チャネルの優先度情報と、各オブジェクトの優先度情報とで異なる識別子が利用されてビットストリームから優先度情報が読み出されてもよい。 In this case, the priority information may be read from the bitstream using different identifiers for the priority information of each channel and the priority information of each object.

このように、ビットストリームに含まれている複数の優先度情報のなかから、特定の優先度情報を選択して取得することにより、復号装置１５１の計算能力等に応じて適切な優先度情報を選択し、復号を行うことができる。これにより、何れの復号装置１５１においてもリアルタイムでオーディオ信号を再生することができるようになる。 In this way, by selecting and acquiring specific priority information from among the plurality of pieces of priority information included in the bitstream, appropriate priority information can be selected according to the computing power and the like of the decoding device 151, and decoding can be performed with it. As a result, any decoding device 151 can reproduce the audio signals in real time.
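
As a rough illustration of selecting one set of priority information by identifier, the following sketch assumes, purely for the example, that each identifier is an integer giving the channel-equivalent capability the set was generated for (for example 2 or 24). The function name `select_priority_info` and the fallback rule are hypothetical. In the description the identifier may simply be supplied by a higher-level control device or fixed in advance.

```python
from typing import Dict, List, Tuple


def select_priority_info(
    priority_sets: Dict[int, List[int]],
    device_capacity_channels: int,
) -> Tuple[int, List[int]]:
    """Pick the priority set whose identifier best fits the device capacity.

    priority_sets maps an identifier (assumed here to be the channel-equivalent
    capability the set targets) to the per-signal priority values.
    """
    if not priority_sets:
        raise ValueError("the bitstream carries no priority information")
    # Sets generated for a capability the device can actually sustain.
    usable = [cap for cap in priority_sets if cap <= device_capacity_channels]
    # Prefer the most generous usable set; otherwise fall back to the
    # smallest one available (an assumption of this sketch).
    chosen = max(usable) if usable else min(priority_sets)
    return chosen, priority_sets[chosen]
```

For instance, with sets tagged 2 and 24, a device able to handle about five channels in real time would read the set tagged 2.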

〈第２の実施の形態〉 <Second embodiment>
〈アンパッキング／復号部の構成例〉 <Configuration example of unpacking/decoding unit>
なお、以上においては、符号化装置１１から出力されるビットストリームに優先度情報が含まれている例について説明したが、符号化装置によっては、ビットストリームに優先度情報が含まれていないこともあり得る。 In the above description, an example in which the bitstream output from the encoding device 11 includes priority information has been described, but depending on the encoding device, the bitstream may not include priority information.

そこで、復号装置１５１において優先度情報を生成するようにしてもよい。例えば、ビットストリームに含まれているオーディオ信号の符号化データから抽出できる、オーディオ信号の音圧を示す情報やスペクトル形状を示す情報を用いて優先度情報を生成することが可能である。 Therefore, priority information may be generated in the decoding device 151 . For example, it is possible to generate the priority information using information indicating the sound pressure of the audio signal or information indicating the spectrum shape, which can be extracted from the encoded data of the audio signal included in the bitstream.

このように、復号装置１５１において優先度情報を生成する場合、復号装置１５１のアンパッキング／復号部１６１は、例えば図１３に示すように構成される。なお、図１３において、図１０における場合と対応する部分には同一の符号を付してあり、その説明は適宜省略する。 When priority information is generated in the decoding device 151 in this way, the unpacking/decoding unit 161 of the decoding device 151 is configured as shown in FIG. 13, for example. In FIG. 13, parts corresponding to those in FIG. 10 are denoted by the same reference numerals, and their description is omitted as appropriate.

図１３に示すアンパッキング／復号部１６１は、チャネルオーディオ信号取得部１９２、チャネルオーディオ信号復号部１９３、出力選択部１９４、０値出力部１９５、IMDCT部１９６、オブジェクトオーディオ信号取得部１９７、オブジェクトオーディオ信号復号部１９８、出力選択部１９９、０値出力部２００、IMDCT部２０１、優先度情報生成部２３１、および優先度情報生成部２３２を有している。 The unpacking/decoding unit 161 shown in FIG. 13 includes a channel audio signal acquisition unit 192, a channel audio signal decoding unit 193, an output selection unit 194, a 0-value output unit 195, an IMDCT unit 196, an object audio signal acquisition unit 197, an object audio signal decoding unit 198, an output selection unit 199, a 0-value output unit 200, an IMDCT unit 201, a priority information generation unit 231, and a priority information generation unit 232.

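図１３に示すアンパッキング／復号部１６１の構成は、優先度情報取得部１９１が設けられておらず、新たに優先度情報生成部２３１、および優先度情報生成部２３２が設けられている点で図１０のアンパッキング／復号部１６１と異なり、他の構成は図１０のアンパッキング／復号部１６１と同じとなっている。 The configuration of the unpacking/decoding unit 161 shown in FIG. 13 differs from that of the unpacking/decoding unit 161 of FIG. 10 in that the priority information acquisition unit 191 is not provided and the priority information generation unit 231 and the priority information generation unit 232 are newly provided. The rest of the configuration is the same as that of the unpacking/decoding unit 161 of FIG. 10.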

チャネルオーディオ信号取得部１９２は、供給されたビットストリームから各チャネルの符号化データを取得して、チャネルオーディオ信号復号部１９３および優先度情報生成部２３１に供給する。 The channel audio signal acquisition unit 192 acquires the encoded data of each channel from the supplied bitstream, and supplies it to the channel audio signal decoding unit 193 and the priority information generation unit 231 .

優先度情報生成部２３１は、チャネルオーディオ信号取得部１９２から供給された各チャネルの符号化データに基づいて、各チャネルの優先度情報を生成し、出力選択部１９４に供給する。 The priority information generation unit 231 generates priority information for each channel based on the encoded data for each channel supplied from the channel audio signal acquisition unit 192 and supplies the priority information to the output selection unit 194 .

オブジェクトオーディオ信号取得部１９７は、供給されたビットストリームから各オブジェクトの符号化データを取得して、オブジェクトオーディオ信号復号部１９８および優先度情報生成部２３２に供給する。また、オブジェクトオーディオ信号取得部１９７は、供給されたビットストリームから各オブジェクトのメタデータを取得して、レンダリング部１６２に供給する。 The object audio signal acquisition unit 197 acquires encoded data of each object from the supplied bitstream, and supplies the encoded data to the object audio signal decoding unit 198 and the priority information generation unit 232 . Also, the object audio signal acquisition unit 197 acquires metadata of each object from the supplied bitstream and supplies the metadata to the rendering unit 162 .

優先度情報生成部２３２は、オブジェクトオーディオ信号取得部１９７から供給された各オブジェクトの符号化データに基づいて、各オブジェクトの優先度情報を生成し、出力選択部１９９に供給する。 The priority information generation unit 232 generates priority information for each object based on the encoded data of each object supplied from the object audio signal acquisition unit 197 and supplies the priority information to the output selection unit 199 .

〈選択復号処理の説明〉 <Description of selective decoding process>
アンパッキング／復号部１６１が図１３に示した構成とされる場合、復号装置１５１は、図１１に示した復号処理のステップＳ５２に対応する処理として、図１４に示す選択復号処理を行う。以下、図１４のフローチャートを参照して、復号装置１５１による選択復号処理について説明する。 When the unpacking/decoding unit 161 has the configuration shown in FIG. 13, the decoding device 151 performs the selective decoding process shown in FIG. 14 as the process corresponding to step S52 of the decoding process shown in FIG. 11. The selective decoding process by the decoding device 151 will be described below with reference to the flowchart of FIG. 14.

ステップＳ１３１において、優先度情報生成部２３１は各チャネルのオーディオ信号の優先度情報を生成する。 In step S131, the priority information generator 231 generates priority information of the audio signal of each channel.

例えばチャネルオーディオ信号取得部１９２は、供給されたビットストリームから各チャネルの符号化データを取得して、チャネルオーディオ信号復号部１９３および優先度情報生成部２３１に供給する。 For example, the channel audio signal acquisition unit 192 acquires the encoded data of each channel from the supplied bitstream and supplies it to the channel audio signal decoding unit 193 and the priority information generation unit 231 .

優先度情報生成部２３１は、チャネルオーディオ信号取得部１９２から供給された各チャネルの符号化データに基づいて各チャネルの優先度情報を生成し、出力選択部１９４に供給する。 The priority information generation unit 231 generates priority information for each channel based on the encoded data for each channel supplied from the channel audio signal acquisition unit 192 and supplies the priority information to the output selection unit 194 .

例えばビットストリームには、オーディオ信号の符号化データとして、MDCT係数を得るためのスケールファクタ、サイド情報、および量子化スペクトルが含まれている。ここで、スケールファクタはオーディオ信号の音圧を示す情報であり、量子化スペクトルはオーディオ信号のスペクトル形状を示す情報である。 For example, the bitstream contains scale factors for obtaining MDCT coefficients, side information, and quantization spectra as coded data of the audio signal. Here, the scale factor is information indicating the sound pressure of the audio signal, and the quantization spectrum is information indicating the spectral shape of the audio signal.

優先度情報生成部２３１は、各チャネルの符号化データとして含まれているスケールファクタや量子化スペクトルに基づいて、各チャネルのオーディオ信号の優先度情報を生成する。このように、スケールファクタや量子化スペクトルを用いて優先度情報を生成すれば、符号化データの復号を行う前に、直ちに優先度情報を得ることができ、優先度情報の生成のための計算量も低減させることができる。 The priority information generation unit 231 generates the priority information of the audio signal of each channel based on the scale factor and quantized spectrum included as the encoded data of each channel. If the priority information is generated from the scale factor and quantized spectrum in this way, it can be obtained immediately, before the encoded data is decoded, and the amount of calculation for generating the priority information can also be reduced.

なお、優先度情報は、その他、MDCT係数の自乗平均値を計算することで得られる、オーディオ信号の音圧や、MDCT係数のピーク包絡から得られるオーディオ信号のスペクトル形状に基づいて生成されるようにしてもよい。この場合、優先度情報生成部２３１は、適宜、符号化データの復号を行ったり、チャネルオーディオ信号復号部１９３からMDCT係数を取得したりする。 The priority information may alternatively be generated based on the sound pressure of the audio signal, obtained by calculating the mean square value of the MDCT coefficients, or on the spectral shape of the audio signal, obtained from the peak envelope of the MDCT coefficients. In this case, the priority information generation unit 231 decodes the encoded data or acquires the MDCT coefficients from the channel audio signal decoding unit 193 as appropriate.
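
The description does not state how a sound pressure value is converted into priority information. As one hypothetical possibility, the mean square of the MDCT coefficients could be expressed in decibels and mapped linearly onto the range 0 to 7. The function name `priority_from_mdct` and the decibel limits `floor_db` and `ceil_db` below are illustrative assumptions only.

```python
import math
from typing import Sequence


def priority_from_mdct(
    coefs: Sequence[float],
    floor_db: float = -70.0,   # assumed level mapped to priority 0
    ceil_db: float = 0.0,      # assumed level mapped to the highest priority
    max_priority: int = 7,
) -> int:
    """Map the mean-square level of one frame's MDCT coefficients to 0..7."""
    if not coefs:
        return 0
    mean_square = sum(c * c for c in coefs) / len(coefs)
    if mean_square <= 0.0:
        return 0  # a silent frame gets the lowest priority
    level_db = 10.0 * math.log10(mean_square)
    # Linear mapping of the level onto [0, 1], clamped at both ends.
    ratio = (level_db - floor_db) / (ceil_db - floor_db)
    ratio = min(1.0, max(0.0, ratio))
    return round(ratio * max_priority)
```

A mapping based on scale factors, which avoids decoding the MDCT coefficients at all, would follow the same pattern with the scale factor values as input.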

各チャネルの優先度情報が得られると、その後、ステップＳ１３２乃至ステップＳ１３７の処理が行われるが、これらの処理は図１２のステップＳ８２乃至ステップＳ８７の処理と同様であるので、その説明は省略する。但し、この場合、すでに各チャネルの符号化データは取得されているので、ステップＳ１３４では符号化データの復号のみが行われる。 After the priority information of each channel is obtained, the processes of steps S132 to S137 are performed. Since these processes are the same as the processes of steps S82 to S87 in FIG. 12, the description thereof is omitted. . However, in this case, since the encoded data of each channel has already been acquired, only the encoded data is decoded in step S134.

また、ステップＳ１３３において、チャネル番号がＭ未満でないと判定されると、ステップＳ１３８において、優先度情報生成部２３２は各オブジェクトのオーディオ信号の優先度情報を生成する。 Also, when it is determined in step S133 that the channel number is not less than M, in step S138 the priority information generation unit 232 generates priority information of the audio signal of each object.

例えばオブジェクトオーディオ信号取得部１９７は、供給されたビットストリームから各オブジェクトの符号化データを取得して、オブジェクトオーディオ信号復号部１９８および優先度情報生成部２３２に供給する。また、オブジェクトオーディオ信号取得部１９７は、供給されたビットストリームから各オブジェクトのメタデータを取得して、レンダリング部１６２に供給する。 For example, the object audio signal acquisition unit 197 acquires encoded data of each object from the supplied bitstream, and supplies it to the object audio signal decoding unit 198 and the priority information generation unit 232 . Also, the object audio signal acquisition unit 197 acquires metadata of each object from the supplied bitstream and supplies the metadata to the rendering unit 162 .

優先度情報生成部２３２は、オブジェクトオーディオ信号取得部１９７から供給された各オブジェクトの符号化データに基づいて各オブジェクトの優先度情報を生成し、出力選択部１９９に供給する。例えば優先度情報は、各チャネルにおける場合と同様に、スケールファクタや量子化スペクトルに基づいて生成される。 The priority information generation unit 232 generates priority information for each object based on the encoded data of each object supplied from the object audio signal acquisition unit 197 and supplies the priority information to the output selection unit 199 . For example, priority information is generated based on scale factors and quantized spectra, as in each channel.

また、MDCT係数から得られる音圧やスペクトル形状に基づいて優先度情報が生成されてもよい。この場合、優先度情報生成部２３２は、適宜、符号化データの復号を行ったり、オブジェクトオーディオ信号復号部１９８からMDCT係数を取得したりする。 Also, priority information may be generated based on the sound pressure and spectral shape obtained from the MDCT coefficients. In this case, the priority information generator 232 appropriately decodes the encoded data and acquires the MDCT coefficients from the object audio signal decoder 198 .

各オブジェクトの優先度情報が得られると、その後、ステップＳ１３９乃至ステップＳ１４４の処理が行われて選択復号処理は終了するが、これらの処理は図１２のステップＳ８８乃至ステップＳ９３の処理と同様であるので、その説明は省略する。但し、この場合、すでに各オブジェクトの符号化データは取得されているので、ステップＳ１４１では符号化データの復号のみが行われる。 After the priority information of each object is obtained, the processing of steps S139 to S144 is performed, and the selective decoding processing ends. These processings are the same as the processing of steps S88 to S93 in FIG. Therefore, its description is omitted. However, in this case, since the encoded data of each object has already been acquired, only the encoded data is decoded in step S141.

選択復号処理が終了すると、その後、処理は図１１のステップＳ５３へと進む。 After the selective decoding process ends, the process proceeds to step S53 in FIG. 11.

以上のようにして、復号装置１５１は、ビットストリームに含まれている符号化データに基づいて、各チャネルや各オブジェクトのオーディオ信号の優先度情報を生成する。このように復号装置１５１において優先度情報を生成することで、各オーディオ信号について適切な優先度情報を少ない計算量で得ることができ、復号の計算量やレンダリング等の計算量を低減させることができる。また、オーディオ信号により再生される音声の音質の劣化を最小限に抑えることもできる。 As described above, the decoding device 151 generates the priority information of the audio signal of each channel and each object based on the encoded data included in the bitstream. By generating the priority information in the decoding device 151 in this way, appropriate priority information for each audio signal can be obtained with a small amount of calculation, and the amount of calculation for decoding, rendering, and the like can be reduced. The deterioration of the sound quality of the sound reproduced from the audio signals can also be minimized.

なお、図１０に示したアンパッキング／復号部１６１の優先度情報取得部１９１が、供給されたビットストリームから、各チャネルおよび各オブジェクトのオーディオ信号の優先度情報を取得しようとしたが、ビットストリームから優先度情報が取得できなかった場合に、優先度情報が生成されるようにしてもよい。そのような場合、優先度情報取得部１９１は、優先度情報生成部２３１や優先度情報生成部２３２と同様の処理を行い、符号化データから各チャネルおよび各オブジェクトのオーディオ信号の優先度情報を生成する。 Note that the priority information may also be generated when the priority information acquisition unit 191 of the unpacking/decoding unit 161 shown in FIG. 10 attempts to acquire the priority information of the audio signal of each channel and each object from the supplied bitstream but cannot obtain it. In such a case, the priority information acquisition unit 191 performs the same processing as the priority information generation unit 231 and the priority information generation unit 232, and generates the priority information of the audio signal of each channel and each object from the encoded data.

〈第３の実施の形態〉 <Third embodiment>
〈優先度情報の閾値について〉 <About the threshold of priority information>
さらに、以上においては、各チャネルや各オブジェクトについて、優先度情報と、閾値Ｐや閾値Ｑとを比較して復号するオーディオ信号、より詳細にはIMDCTを行うMDCT係数を選択すると説明したが、これらの閾値Ｐや閾値Ｑがオーディオ信号のフレームごとに動的に変更されるようにしてもよい。 Furthermore, in the above description, the audio signals to be decoded, more specifically the MDCT coefficients to be subjected to IMDCT, are selected by comparing the priority information with the threshold value P or the threshold value Q for each channel and each object. However, the threshold value P and the threshold value Q may be changed dynamically for each frame of the audio signal.

例えば図１０に示したアンパッキング／復号部１６１の優先度情報取得部１９１では、復号を必要とせずに、ビットストリームから各チャネルおよび各オブジェクトの優先度情報を取得することができる。 For example, the priority information obtaining unit 191 of the unpacking/decoding unit 161 shown in FIG. 10 can obtain the priority information of each channel and each object from the bitstream without decoding.

したがって、例えば優先度情報取得部１９１が全チャネルのオーディオ信号の優先度情報を読み出せば、処理対象となっているフレームにおける優先度情報の分布を得ることができる。また、復号装置１５１では、例えば何チャネルまでなら同時に、つまりリアルタイムで処理できるかなど、予め自分自身の計算能力が分かっている。 Therefore, for example, if the priority information acquisition unit 191 reads the priority information of the audio signals of all channels, it is possible to obtain the distribution of the priority information in the frames to be processed. Also, the decoding device 151 knows in advance its own computational ability, such as how many channels it can process simultaneously, that is, in real time.

そこで、優先度情報取得部１９１が処理対象のフレームにおける優先度情報の分布と、復号装置１５１の計算能力とに基づいて、その処理対象のフレームについての優先度情報の閾値Ｐを定めるようにしてもよい。 Therefore, the priority information acquisition unit 191 may determine the threshold value P of the priority information for the frame to be processed based on the distribution of the priority information in that frame and the computing power of the decoding device 151.

例えば閾値Ｐは、復号装置１５１がリアルタイムで処理を行うことのできる範囲内で最も多くのオーディオ信号が復号されるように定められる。 For example, the threshold value P is determined so that the maximum number of audio signals is decoded within the range that the decoding device 151 can process in real time.

また、優先度情報取得部１９１は、閾値Ｐにおける場合と同様に閾値Ｑを動的に定めることができる。この場合、優先度情報取得部１９１は全オブジェクトのオーディオ信号の優先度情報に基づいて、それらの優先度情報の分布を求め、求めた分布と、復号装置１５１の計算能力とに基づいて、処理対象のフレームについての優先度情報の閾値Ｑを定める。 In addition, the priority information acquisition unit 191 can dynamically determine the threshold Q as in the case of the threshold P. In this case, the priority information acquisition unit 191 obtains the distribution of the priority information based on the priority information of the audio signals of all objects, and based on the obtained distribution and the computing power of the decoding device 151, the processing is performed. A threshold Q for the priority information for the frame of interest is defined.

このような閾値Ｐや閾値Ｑの決定は、比較的少ない計算量で行うことができる。 Such determination of the threshold P and the threshold Q can be performed with a relatively small amount of calculation.

このように優先度情報の閾値を動的に変化させることで、リアルタイムで復号を行いつつ、オーディオ信号により再生される音声の音質の劣化を最小限に抑えることができる。特にこのような場合、優先度情報を複数用意する必要がなく、また優先度情報に識別子を設ける必要もないので、ビットストリームの符号量も少なくてすむ。 By dynamically changing the threshold of the priority information in this way, it is possible to minimize the deterioration of the sound quality of the sound reproduced by the audio signal while decoding in real time. Especially in such a case, there is no need to prepare a plurality of pieces of priority information, and there is no need to provide an identifier for the priority information, so the code amount of the bitstream can be reduced.
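
One simple way to realize such a per-frame threshold, given here only as a sketch under assumed names, is to take the smallest threshold for which the number of signals at or above it does not exceed the number of signals the device can process in real time. The function name `choose_threshold` and the parameter `max_signals` are hypothetical, and priority values are assumed to lie between 0 and 7.

```python
from typing import Sequence


def choose_threshold(
    priorities: Sequence[int],
    max_signals: int,
    max_priority: int = 7,
) -> int:
    """Return the smallest threshold P with count(priority >= P) <= max_signals.

    max_signals stands for the device's real-time capacity (assumed >= 0).
    """
    for p in range(0, max_priority + 2):
        decoded = sum(1 for value in priorities if value >= p)
        if decoded <= max_signals:
            return p
    # Not reached for max_signals >= 0: at p = max_priority + 1 nothing is decoded.
    return max_priority + 1
```

Because signals that share a priority value cannot be separated by a threshold alone, this sketch may decode fewer signals than the capacity allows when several signals tie at the cut-off value.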

〈オブジェクトのメタデータについて〉 <About object metadata>
さらに、以上において説明した第１の実施の形態乃至第３の実施の形態では、ビットストリームの先頭のエレメントには、１フレーム分のオブジェクトのメタデータや優先度情報などが格納されると説明した。 Furthermore, in the first to third embodiments described above, it was explained that the leading element of the bitstream stores the object metadata, priority information, and the like for one frame.

この場合、ビットストリームの先頭のエレメントにおける、オブジェクトのメタデータおよび優先度情報が格納される部分のシンタックスは、例えば図１５に示すようになる。 In this case, the syntax of the portion where the object metadata and priority information are stored in the head element of the bitstream is as shown in FIG. 15, for example.

図１５に示す例では、オブジェクトのメタデータのなかに、オブジェクトの空間位置情報と優先度情報が１フレーム分だけ格納されている。 In the example shown in FIG. 15, the object's spatial position information and priority information for only one frame are stored in the object's metadata.

この例では「num_objects」はオブジェクトの数を示している。また、「object_priority[o]」はO番目のオブジェクトの優先度情報を示している。ここで、O番目のオブジェクトとは、オブジェクト番号により特定されるオブジェクトである。 In this example, "num_objects" indicates the number of objects. Also, "object_priority[o]" indicates the priority information of the O-th object. Here, the O-th object is an object specified by an object number.

「position_azimuth[o]」は、視聴者であるユーザからみた、つまり所定の基準位置からみたO番目のオブジェクトの３次元空間位置を表す水平方向角度を示している。また、「position_elevation[o]」は、視聴者であるユーザからみたO番目のオブジェクトの３次元空間位置を表す垂直方向角度を示している。さらに「position_radius[o]」は、視聴者からO番目のオブジェクトまでの距離を示している。 "position_azimuth[o]" indicates a horizontal angle representing the three-dimensional spatial position of the O-th object as seen from the user who is the viewer, that is, as seen from a predetermined reference position. Also, "position_elevation[o]" indicates the vertical angle representing the three-dimensional spatial position of the O-th object as seen from the user who is the viewer. Furthermore, "position_radius[o]" indicates the distance from the viewer to the Oth object.

したがって、３次元空間におけるオブジェクトの位置は、これらの「position_azimuth[o]」、「position_elevation[o]」、および「position_radius[o]」から特定されることになり、これらの情報がオブジェクトの空間位置情報とされる。 Therefore, the position of the object in the three-dimensional space is specified by "position_azimuth[o]", "position_elevation[o]", and "position_radius[o]", and these pieces of information serve as the spatial position information of the object.

また、「gain_factor[o]」はO番目のオブジェクトの利得を示している。 Also, "gain_factor[o]" indicates the gain of the O-th object.

このように、図１５に示すメタデータには、１つのオブジェクトについての「object_priority[o]」、「position_azimuth[o]」、「position_elevation[o]」、「position_radius[o]」、および「gain_factor[o]」が、そのオブジェクトのデータとして順番に配置されている。そして、メタデータ内には、各オブジェクトのデータが、例えばオブジェクトのオブジェクト番号順に並べられて配置されている。 Thus, in the metadata shown in FIG. 15, "object_priority[o]", "position_azimuth[o]", "position_elevation[o]", "position_radius[o]", and "gain_factor[o]" for one object are arranged in this order as the data of that object. Within the metadata, the data of the individual objects are arranged, for example, in order of object number.
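
The per-object record described above could be held in memory as shown in the following sketch. The field names follow the syntax elements of FIG. 15, but the bit widths of the fields are not given here, so no bitstream parsing is attempted. The angle unit (degrees), the axis convention used in `to_cartesian`, and the helper `build_frame_metadata` are assumptions made only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ObjectMetadata:
    # Fields in the order in which they are arranged for one object.
    object_priority: int
    position_azimuth: float    # horizontal angle, assumed to be in degrees
    position_elevation: float  # vertical angle, assumed to be in degrees
    position_radius: float     # distance from the viewer
    gain_factor: float

    def to_cartesian(self) -> Tuple[float, float, float]:
        """Convert the spatial position to x, y, z (assumed axis convention)."""
        az = math.radians(self.position_azimuth)
        el = math.radians(self.position_elevation)
        x = self.position_radius * math.cos(el) * math.cos(az)
        y = self.position_radius * math.cos(el) * math.sin(az)
        z = self.position_radius * math.sin(el)
        return (x, y, z)


def build_frame_metadata(records: Sequence[Sequence[float]]) -> List[ObjectMetadata]:
    """Build one frame's metadata; list index o plays the role of the object number."""
    return [ObjectMetadata(int(r[0]), r[1], r[2], r[3], r[4]) for r in records]
```

Keeping the list in object-number order mirrors the arrangement described for the metadata, so `frame[o].object_priority` corresponds to "object_priority[o]".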

〈第４の実施の形態〉 <Fourth embodiment>
〈オーディオ信号の完全再構成と不連続性に起因するノイズについて〉 <About perfect reconstruction of audio signals and noise caused by discontinuity>
以上においては、復号装置１５１においてビットストリームから読み出されたチャネルまたはオブジェクトごとの各フレーム（以下では、特に時間フレームと称する）の優先度情報が、予め定められた閾値未満である場合にIMDCT等の復号処理を省くことで、復号時の処理量を削減する例について説明した。具体的には、優先度情報が閾値未満である場合には、０値出力部１９５や０値出力部２００から無音のオーディオ信号を出力する、つまりオーディオ信号として０データを出力すると説明した。 In the above, an example was described in which the amount of processing at the time of decoding is reduced by omitting decoding processing such as IMDCT when the priority information of each frame (hereinafter specifically referred to as a time frame) of each channel or object, read from the bitstream in the decoding device 151, is less than a predetermined threshold. Specifically, it was explained that when the priority information is less than the threshold, the 0-value output unit 195 or the 0-value output unit 200 outputs a silent audio signal, that is, outputs 0 data as the audio signal.

ところが、そのような場合、聴感上の音質劣化が生じてしまう。具体的には、オーディオ信号の完全再構成に起因する音質劣化と、グリッチノイズ等の信号の不連続性に起因するノイズの発生による音質劣化が生じる。 However, in such a case, the sound quality deteriorates in terms of audibility. Specifically, sound quality deterioration due to complete reconstruction of the audio signal and sound quality deterioration due to the generation of noise such as glitch noise due to discontinuity of the signal occurs.

（完全再構成に起因する音質劣化） (Sound quality deterioration related to perfect reconstruction)
例えば、優先度情報が閾値未満である場合にオーディオ信号として０データを出力すると、０データの出力と、０データではない通常のオーディオ信号の出力との切り替え時に音質劣化が生じる。 For example, if 0 data is output as an audio signal when the priority information is less than the threshold, sound quality deterioration occurs when switching between the output of 0 data and the output of a normal audio signal that is not 0 data.

上述したようにアンパッキング／復号部１６１では、IMDCT部１９６やIMDCT部２０１において、ビットストリームから読み出された時間フレームごとのMDCT係数に対してIMDCTが行われる。そして、より詳細にはアンパッキング／復号部１６１では、現時間フレームについてのIMDCTの結果または０データと、１時間フレーム前のIMDCTの結果または０データとから、現時間フレームのオーディオ信号が生成される。 As described above, in the unpacking/decoding unit 161, the IMDCT unit 196 and the IMDCT unit 201 perform IMDCT on the MDCT coefficients of each time frame read from the bitstream. More specifically, the unpacking/decoding unit 161 generates the audio signal of the current time frame from the IMDCT result or 0 data for the current time frame and the IMDCT result or 0 data for the immediately preceding time frame.

ここで、オーディオ信号の生成について、図１６を参照して説明する。なお、ここでは、オブジェクトのオーディオ信号の生成を例として説明するが、各チャネルのオーディオ信号の生成についても同様である。また、以下では、０値出力部２００から出力されるオーディオ信号、およびIMDCT部２０１から出力されるオーディオ信号を、特にIMDCT信号とも称することとする。同様に、０値出力部１９５から出力されるオーディオ信号、およびIMDCT部１９６から出力されるオーディオ信号を、特にIMDCT信号とも称することとする。 Here, the generation of an audio signal will be described with reference to FIG. 16. The generation of the audio signal of an object is described here as an example, but the same applies to the generation of the audio signal of each channel. In the following, the audio signal output from the 0-value output unit 200 and the audio signal output from the IMDCT unit 201 are also specifically referred to as IMDCT signals. Similarly, the audio signal output from the 0-value output unit 195 and the audio signal output from the IMDCT unit 196 are also specifically referred to as IMDCT signals.

図１６では、図中、横方向は時間を示しており、文字「data[n-1]」乃至「data[n+2]」が記された長方形は、それぞれ所定のオブジェクトの時間フレーム（n-1）乃至時間フレーム（n+2）のビットストリームを表している。また、各時間フレームのビットストリーム内の数値は、その時間フレームのオブジェクトの優先度情報の値を示しており、この例では各時間フレームの優先度情報の値は「７」となっている。 In FIG. 16, the horizontal direction indicates time, and the rectangles marked with the letters "data[n-1]" to "data[n+2]" are the time frames (n -1) to time frames (n+2). Also, the numerical value in the bitstream of each time frame indicates the value of the priority information of the object of that time frame, and in this example, the value of the priority information of each time frame is "7".

さらに、図１６において文字「MDCT_coef[q]」（但し、q＝n-1,n,…）が記された長方形は、それぞれ時間フレーム（q）のMDCT係数を表している。 Furthermore, in FIG. 16, rectangles marked with the letters “MDCT_coef[q]” (where q=n-1, n, . . . ) each represent the MDCT coefficients of the time frame (q).

いま、閾値Ｑ＝４であるとすると、時間フレーム（n-1）の優先度情報の値「７」は閾値Ｑ以上であるので、時間フレーム（n-1）についてのMDCT係数に対してIMDCTが行われる。同様に、時間フレーム（n）の優先度情報の値「７」も閾値Ｑ以上であるので、時間フレーム（n）についてのMDCT係数に対してIMDCTが行われる。 Now, assuming that the threshold Q is 4, the value "7" of the priority information of the time frame (n-1) is greater than or equal to the threshold Q. is done. Similarly, since the priority information value "7" for time frame (n) is also equal to or greater than threshold Q, IMDCT is performed on the MDCT coefficients for time frame (n).

その結果、時間フレーム（n-1）のIMDCT信号OPS11と、時間フレーム（n）のIMDCT信号OPS12が得られたとする。 As a result, it is assumed that an IMDCT signal OPS11 of time frame (n-1) and an IMDCT signal OPS12 of time frame (n) are obtained.

この場合、アンパッキング／復号部１６１は、時間フレーム（n）のIMDCT信号OPS12の前半部分と、１時間フレーム前の時間フレーム（n-1）のIMDCT信号OPS11の後半部分とを足し合わせて、時間フレーム（n）のオーディオ信号、つまり期間FL(n)のオーディオ信号とする。換言すれば、IMDCT信号OPS11の期間FL(n)の部分と、IMDCT信号OPS12の期間FL(n)の部分とがオーバーラップ加算されて、処理対象のオブジェクトの符号化前の時間フレーム（n）のオーディオ信号が再現される。 In this case, the unpacking/decoding unit 161 adds the first half of the IMDCT signal OPS12 of time frame (n) and the second half of the IMDCT signal OPS11 of time frame (n-1), one time frame earlier, and uses the sum as the audio signal of time frame (n), that is, the audio signal of period FL(n). In other words, the period FL(n) portion of the IMDCT signal OPS11 and the period FL(n) portion of the IMDCT signal OPS12 are overlap-added, and the pre-encoding audio signal of time frame (n) of the object to be processed is reproduced.

このような処理は、IMDCT信号がMDCT前の信号に完全再構成されるために必要な処理である。 Such processing is necessary for completely reconstructing the IMDCT signal into a pre-MDCT signal.

しかしながら、上述したアンパッキング／復号部１６１では、例えば図１７に示すように、各時間フレームの優先度情報に応じて、IMDCT部２０１のIMDCT信号と０値出力部２００のIMDCT信号を切り替えるタイミングにおいて、IMDCT信号がMDCT前の信号に完全再構成されなくなる。つまり、オーバーラップ加算時にもとの信号ではなく０データが用いられると、完全再構成されないため、もとのオーディオ信号を再現することができず、オーディオ信号の聴感上の音質が劣化してしまう。 However, in the unpacking/decoding unit 161 described above, for example as shown in FIG. 17, at the timing when the output is switched between the IMDCT signal of the IMDCT unit 201 and the IMDCT signal of the 0-value output unit 200 according to the priority information of each time frame, the IMDCT signal is no longer perfectly reconstructed into the pre-MDCT signal. In other words, if 0 data is used instead of the original signal in the overlap addition, perfect reconstruction is not achieved, so the original audio signal cannot be reproduced and the perceived sound quality of the audio signal deteriorates.

なお、図１７において、図１６における場合と対応する部分には同一の文字等を記してあり、その説明は省略する。 In FIG. 17, the same characters and the like are written in the parts corresponding to those in FIG. 16, and the description thereof will be omitted.

図１７の例では、時間フレーム（n-1）の優先度情報の値は「７」であるが、他の時間フレーム（n）乃至時間フレーム（n+2）の優先度情報は最も低い「０」となっている。 In the example of FIG. 17, the priority information value of the time frame (n-1) is "7", but the priority information of the other time frames (n) to (n+2) is the lowest " 0”.

Therefore, assuming that the threshold Q is 4, for time frame (n-1) the IMDCT unit 201 performs IMDCT on the MDCT coefficients, and the IMDCT signal OPS21 of time frame (n-1) is obtained. For time frame (n), on the other hand, IMDCT is not performed on the MDCT coefficients, and the 0 data output from the 0-value output unit 200 is used as the IMDCT signal OPS22 of time frame (n).

In this case, the first half of the 0 data serving as the IMDCT signal OPS22 of time frame (n) and the second half of the IMDCT signal OPS21 of time frame (n-1), one time frame earlier, are added together to form the final audio signal of time frame (n). That is, the portions of the IMDCT signal OPS22 and the IMDCT signal OPS21 in period FL(n) are overlap-added to form the final audio signal of time frame (n) of the object being processed.

Thus, when the output source of the IMDCT signal switches from the IMDCT unit 201 to the 0-value output unit 200, or from the 0-value output unit 200 to the IMDCT unit 201, the IMDCT signal from the IMDCT unit 201 is no longer perfectly reconstructed, and the perceived sound quality deteriorates.
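The perfect reconstruction obtained by overlap addition, and its loss when 0 data takes the place of one IMDCT signal, can be illustrated numerically. The following is only a sketch and not the decoder's implementation: it assumes a textbook MDCT/IMDCT pair with a sine window and a 2/N synthesis normalization, a toy frame length of N = 4, and hypothetical function names (`mdct`, `imdct`, `overlap_add`) that do not appear in the specification.

```python
import math

def sine_window(length):
    # Sine window; satisfies w[i]**2 + w[i + length // 2]**2 == 1.
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]

def mdct(block):
    # Windowed MDCT: 2N time samples -> N coefficients.
    n = len(block) // 2
    w = sine_window(2 * n)
    return [
        sum(w[i] * block[i]
            * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]

def imdct(coeffs):
    # Windowed IMDCT: N coefficients -> 2N time-aliased samples.
    n = len(coeffs)
    w = sine_window(2 * n)
    return [
        w[i] * (2.0 / n)
        * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
              for k in range(n))
        for i in range(2 * n)
    ]

def overlap_add(prev_imdct, cur_imdct):
    # Second half of the previous frame plus first half of the current frame.
    n = len(cur_imdct) // 2
    return [prev_imdct[n + i] + cur_imdct[i] for i in range(n)]

N = 4                                            # toy frame length (assumption)
signal = [float(i + 1) for i in range(3 * N)]    # arbitrary test signal
ops_prev = imdct(mdct(signal[0:2 * N]))          # stands in for frame (n-1)
ops_cur = imdct(mdct(signal[N:3 * N]))           # stands in for frame (n)
target = signal[N:2 * N]                         # samples of period FL(n)

decoded = overlap_add(ops_prev, ops_cur)         # both frames decoded
zeroed = overlap_add(ops_prev, [0.0] * (2 * N))  # frame (n) replaced by 0 data

err_decoded = max(abs(a - b) for a, b in zip(decoded, target))
err_zeroed = max(abs(a - b) for a, b in zip(zeroed, target))
```

In this sketch `err_decoded` is at the level of floating-point rounding, whereas `err_zeroed` is large: the time-domain aliasing contained in the second half of the previous frame is no longer cancelled when the other term of the overlap addition is 0 data.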

(Sound quality deterioration due to noise caused by discontinuity)
In addition, when the output source of the IMDCT signal switches from the IMDCT unit 201 to the 0-value output unit 200, or from the 0-value output unit 200 to the IMDCT unit 201, the signal is not perfectly reconstructed, so the signal may become discontinuous at the junction between the IMDCT signal obtained by IMDCT and the IMDCT signal consisting of 0 data. Glitch noise then occurs at the discontinuous junction, and the perceived sound quality of the audio signal deteriorates.

Furthermore, in order to improve sound quality, the unpacking/decoding unit 161 may apply processing such as SBR (Spectral Band Replication) to the audio signal obtained by overlap-adding the IMDCT signals output from the IMDCT unit 201 and the 0-value output unit 200.

Various kinds of processing are conceivable downstream of the IMDCT unit 201 and the 0-value output unit 200; the following description takes SBR as an example.

In SBR, the high-frequency component of the original audio signal before encoding is generated from the audio signal obtained by overlap addition, which is the low-frequency component, and the high-frequency power values stored in the bitstream.

Specifically, the audio signal of one time frame is divided into several sections called time slots, and the audio signal of each time slot is band-divided into signals of a plurality of low-frequency subbands (hereinafter also referred to as low-frequency subband signals).

Then, a signal of each high-frequency subband (hereinafter also referred to as a high-frequency subband signal) is generated based on the low-frequency subband signal of each subband and the power value of each high-frequency subband. For example, the target high-frequency subband signal is generated by adjusting the power of the low-frequency subband signal of a given subband according to the power value of the target high-frequency subband, or by frequency-shifting it.

Further, the high-frequency subband signals and the low-frequency subband signals are combined to generate an audio signal containing the high-frequency component, and the audio signals containing the high-frequency component generated for the individual time slots are joined to form the audio signal of one time frame containing the high-frequency component.

When such SBR is performed downstream of the IMDCT unit 201 and the 0-value output unit 200, high-frequency components are generated by SBR for an audio signal made up of IMDCT signals output from the IMDCT unit 201. However, since the IMDCT signal output from the 0-value output unit 200 is 0 data, for an audio signal made up of IMDCT signals output from the 0-value output unit 200, the high-frequency component obtained by SBR is also 0 data.
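Why 0 data in the low band also yields 0 data in the high band can be seen from a much simplified sketch of the power adjustment step. This is an illustration under stated assumptions, not an SBR implementation: the subband filter bank and the frequency shift are omitted, the function name `adjust_power` is hypothetical, and "power" is taken to be the mean square of the samples in one time slot.

```python
import math

def adjust_power(low_subband, target_power):
    # Scale a low-frequency subband signal so that its mean power matches
    # the power value given for the target high-frequency subband.
    power = sum(x * x for x in low_subband) / len(low_subband)
    if power == 0.0:
        # 0 data in, 0 data out: no gain can raise the power of silence.
        return [0.0] * len(low_subband)
    gain = math.sqrt(target_power / power)
    return [gain * x for x in low_subband]

# Slot taken from a decoded (non-zero) low band.
decoded_slot = adjust_power([0.5, -0.5, 0.25, -0.25], target_power=0.01)
# Slot taken from 0 data: the generated high band stays 0.
zero_slot = adjust_power([0.0, 0.0, 0.0, 0.0], target_power=0.01)
```

Whatever power value the bitstream carries, scaling a signal that is identically zero leaves it zero, which is the situation described above for the output of the 0-value output unit 200.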

Then, when the output source of the IMDCT signal switches from the IMDCT unit 201 to the 0-value output unit 200, or from the 0-value output unit 200 to the IMDCT unit 201, the junction may become discontinuous in the high-frequency range as well. In such a case, glitch noise occurs and the perceived sound quality deteriorates.

Therefore, in the present technology, the output destination of the MDCT coefficients is selected in consideration of the preceding and following time frames, and fade-in processing and fade-out processing are applied to the audio signal, so that the deterioration of perceived sound quality described above is suppressed and sound quality is improved.

<Selection of the output destination of the MDCT coefficients in consideration of the preceding and following time frames>
First, the selection of the output destination of the MDCT coefficients in consideration of the preceding and following time frames will be described. The audio signal of an object is again used as the example, but the same applies to the audio signal of each channel. The processing described below is performed for each object and for each channel.

For example, in the embodiment described above, the output selection unit 199 was described as selectively switching the output destination of the MDCT coefficients of each object based on the priority information of the current time frame. In the present embodiment, by contrast, the output selection unit 199 switches the output destination of the MDCT coefficients based on the priority information of three temporally consecutive time frames: the current time frame, the time frame immediately before it, and the time frame immediately after it. In other words, whether or not to decode the encoded data is selected based on the priority information of three consecutive time frames.

Specifically, for the object being processed, the output selection unit 199 supplies the MDCT coefficients of time frame (n) of that object to the IMDCT unit 201 when the conditional expression shown in the following expression (1) is satisfied.
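Expression (1) itself is not reproduced in this text. Judging from the explanation given for it, in which the IMDCT unit is selected when at least one of the three consecutive time frames has priority information greater than or equal to the threshold, it presumably has the following form. This is a reconstruction from the surrounding description, not the original rendering:

```latex
\[
(\mathrm{object\_priority}[n-1] \ge \mathrm{thre}) \;\lor\;
(\mathrm{object\_priority}[n] \ge \mathrm{thre}) \;\lor\;
(\mathrm{object\_priority}[n+1] \ge \mathrm{thre})
\qquad (1)
\]
```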

In expression (1), object_priority[q] (where q = n-1, n, n+1) denotes the priority information of each time frame (q), and thre denotes the threshold Q.

Therefore, if, among the three consecutive time frames consisting of the current time frame and the time frames before and after it, there is at least one time frame whose priority information is greater than or equal to the threshold Q, the IMDCT unit 201 is selected as the supply destination of the MDCT coefficients. In this case, the encoded data is decoded; more specifically, IMDCT is performed on the MDCT coefficients. On the other hand, if the priority information of all three time frames is less than the threshold Q, the MDCT coefficients are set to 0 and output to the 0-value output unit 200. In this case, decoding of the encoded data, more specifically IMDCT on the MDCT coefficients, is substantially not performed.
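The three-frame selection rule can be sketched as follows. The function name `select_imdct` is hypothetical, and treating frames outside the available range as being below the threshold is an assumption made only so that the example runs on a short list.

```python
def select_imdct(priorities, index, threshold):
    # True  -> supply the MDCT coefficients to the IMDCT unit.
    # False -> output 0 data instead.
    # The slice covers the previous, current and next time frames; frames
    # outside the list are simply absent (assumed below the threshold).
    window = priorities[max(index - 1, 0):index + 2]
    return any(p >= threshold for p in window)

# Priority information of time frames (n-1), (n), (n+1), (n+2).
priorities = [7, 0, 0, 0]
decisions = [select_imdct(priorities, i, threshold=4)
             for i in range(len(priorities))]
```

With these values `decisions` is `[True, True, False, False]`: time frame (n) is still decoded even though its own priority information is 0, because the priority information of the preceding time frame is at or above the threshold.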

As a result, the audio signal is perfectly reconstructed from the IMDCT signals as shown in FIG. 18, and deterioration of the perceived sound quality is suppressed. Note that in FIG. 18, parts corresponding to those in FIG. 16 are denoted by the same characters and the like, and their description is omitted.

In the example shown in the upper part of FIG. 18, the priority information values of the time frames are the same as in the example shown in FIG. 17. Assuming, for example, that the threshold Q is 4, in the upper example the priority information of time frame (n-1) is greater than or equal to the threshold Q, while in time frames (n) through (n+2) the priority information is less than the threshold Q.

Therefore, according to the conditional expression shown in expression (1), IMDCT is performed on the MDCT coefficients of time frame (n-1) and time frame (n), and the IMDCT signal OPS31 and the IMDCT signal OPS32 are obtained, respectively. In time frame (n+1), where the conditional expression is not satisfied, IMDCT is not performed on the MDCT coefficients, and 0 data is used as the IMDCT signal OPS33.

Consequently, the audio signal of time frame (n), which was not perfectly reconstructed in the example of FIG. 17, is perfectly reconstructed in the example shown in the upper part of FIG. 18, and deterioration of the perceived sound quality is suppressed. In this example, however, the audio signal is not perfectly reconstructed in the next time frame (n+1), so the fade-out processing described later is performed in time frames (n) and (n+1) to suppress deterioration of the perceived sound quality.

In the example shown in the lower part of the figure, the priority information is less than the threshold Q in time frames (n-1) through (n+1), and is greater than or equal to the threshold Q in time frame (n+2).

Therefore, according to the conditional expression shown in expression (1), IMDCT is not performed on the MDCT coefficients in time frame (n), where the conditional expression is not satisfied, and 0 data is used as the IMDCT signal OPS41. IMDCT is performed on the MDCT coefficients of time frame (n+1) and time frame (n+2), and the IMDCT signal OPS42 and the IMDCT signal OPS43 are obtained, respectively.

In this example, the audio signal can be perfectly reconstructed in time frame (n+2), where the priority information switches from a value less than the threshold Q to a value greater than or equal to the threshold Q, so deterioration of the perceived sound quality can be suppressed. In this case too, however, the audio signal is not perfectly reconstructed in the immediately preceding time frame (n+1), so the fade-in processing described later is performed in time frames (n+1) and (n+2) to suppress deterioration of the perceived sound quality.

Here, the priority information is read ahead by only one time frame, and the output destination of the MDCT coefficients is selected from the priority information of three consecutive time frames. For this reason, fade-out processing is performed in time frames (n) and (n+1) in the upper example in the figure, and fade-in processing is performed in time frames (n+1) and (n+2) in the lower example.

However, if the priority information can be read ahead by two time frames, fade-out processing may instead be performed in time frames (n+1) and (n+2) in the upper example in the figure, and fade-in processing may be performed in time frames (n) and (n+1) in the lower example.

<Fade-in processing and fade-out processing>
Next, fade-in processing and fade-out processing of the audio signal will be described. The audio signal of an object is again used as the example, but the same applies to the audio signal of each channel. Fade-in processing and fade-out processing are performed for each object and for each channel.

In the present technology, as in the example shown in FIG. 18, fade-in processing or fade-out processing is performed in the time frame in which an IMDCT signal obtained by IMDCT and an IMDCT signal consisting of 0 data are overlap-added, and in the time frame before or after it.

In fade-in processing, the gain of the audio signal is adjusted so that the amplitude (magnitude) of the audio signal of the time frame increases with time. Conversely, in fade-out processing, the gain of the audio signal is adjusted so that the amplitude of the audio signal of the time frame decreases with time.

This makes it possible to suppress deterioration of the perceived sound quality even when the junction between the IMDCT signal obtained by IMDCT and the IMDCT signal consisting of 0 data is discontinuous. Hereinafter, the gain value by which the audio signal is multiplied in such gain adjustment is also referred to as the fading signal gain.

Furthermore, in the present technology, fade-in processing or fade-out processing is also performed in SBR for the junction between the IMDCT signal obtained by IMDCT and the IMDCT signal consisting of 0 data.

That is, SBR uses the power value of each high-frequency subband for each time slot; in the present technology, a gain value determined per time slot for fade-in processing or fade-out processing is multiplied by the power value of each high-frequency subband before SBR is performed. In other words, the gain of the high-frequency power values is adjusted.

Hereinafter, the gain value determined for each time slot and multiplied by the high-frequency power value is also referred to as the fading SBR gain.

Specifically, the fading SBR gain for fade-in processing is determined so that the gain value increases with time, that is, so that the fading SBR gain of a later time slot has a larger value. Conversely, the fading SBR gain for fade-out processing is determined so that the fading SBR gain of a later time slot has a smaller value.

By performing fade-in processing and fade-out processing during SBR as well, deterioration of the perceived sound quality can be suppressed even when the high-frequency range becomes discontinuous.

Specifically, gain adjustment such as fade-in processing and fade-out processing of the audio signal and the high-frequency power values is performed as shown, for example, in FIG. 19 and FIG. 20. Note that in FIG. 19 and FIG. 20, parts corresponding to those in FIG. 18 are denoted by the same characters, reference signs, and the like, and their description is omitted.

The example shown in FIG. 19 corresponds to the case shown in the upper part of FIG. 18. In this example, the audio signals of time frame (n) and time frame (n+1) are multiplied by the fading signal gain indicated by the polygonal line GN11.

The value of the fading signal gain indicated by the polygonal line GN11 changes linearly with time from "1" to "0" over time frame (n), and remains "0" throughout time frame (n+1). Accordingly, the gain adjustment of the audio signal by the fading signal gain makes the audio signal change gradually into 0 data, so deterioration of the perceived sound quality can be suppressed.

Also, in this example, the high-frequency power value of each time slot of time frame (n) is multiplied by the fading SBR gain indicated by the arrow GN12.

The value of the fading SBR gain indicated by the arrow GN12 changes from "1" to "0" so that it becomes smaller for later time slots. Accordingly, the high-frequency gain adjustment by the fading SBR gain makes the high-frequency component of the audio signal change gradually into 0 data, so deterioration of the perceived sound quality can be suppressed.

In contrast, the example shown in FIG. 20 corresponds to the case shown in the lower part of FIG. 18. In this example, the audio signals of time frame (n+1) and time frame (n+2) are multiplied by the fading signal gain indicated by the polygonal line GN21.

The value of the fading signal gain indicated by the polygonal line GN21 remains "0" throughout time frame (n+1), and changes linearly with time from "0" to "1" over time frame (n+2). Accordingly, the gain adjustment of the audio signal by the fading signal gain makes the audio signal change gradually from 0 data into the original signal, so deterioration of the perceived sound quality can be suppressed.

Also, in this example, the high-frequency power value of each time slot of time frame (n+2) is multiplied by the fading SBR gain indicated by the arrow GN22.

The value of the fading SBR gain indicated by the arrow GN22 changes from "0" to "1" so that it becomes larger for later time slots. Accordingly, the high-frequency gain adjustment by the fading SBR gain makes the high-frequency component of the audio signal change gradually from 0 data into the original signal, so deterioration of the perceived sound quality can be suppressed.
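The two kinds of gain described for FIG. 19 and FIG. 20 can be sketched as follows. The function names are hypothetical, and several details are assumptions not stated in the text: the exact sample at which the linear ramp reaches its end value, the linear spacing of the per-slot fading SBR gain (the text only says it changes monotonically between "1" and "0"), and the frame length and slot count used in the example.

```python
def fading_signal_gain(frame_len, fade_in):
    # Per-sample gain over two consecutive time frames (frame_len >= 2).
    ramp = [i / (frame_len - 1) for i in range(frame_len)]  # 0 -> 1, linear
    if fade_in:
        # Like GN21: held at 0 for one frame, then ramps from 0 to 1.
        return [0.0] * frame_len + ramp
    # Like GN11: ramps from 1 to 0, then held at 0 for the next frame.
    return ramp[::-1] + [0.0] * frame_len

def fading_sbr_gain(num_slots, fade_in):
    # One gain per time slot (num_slots >= 2), multiplied by the
    # high-frequency power values of that slot.
    ramp = [s / (num_slots - 1) for s in range(num_slots)]
    return ramp if fade_in else ramp[::-1]

def apply_gain(values, gains):
    # Element-wise multiplication of samples (or power values) by gains.
    return [v * g for v, g in zip(values, gains)]

fade_out = fading_signal_gain(frame_len=4, fade_in=False)
fade_in = fading_signal_gain(frame_len=4, fade_in=True)
sbr_out = fading_sbr_gain(num_slots=3, fade_in=False)
faded = apply_gain([1.0] * 8, fade_out)
```

The same `apply_gain` helper serves both cases: per-sample gains are applied to the overlap-added audio signal, and per-slot gains are applied to the high-frequency power values before SBR.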

<Configuration example of the unpacking/decoding unit>
When the selection of the output destination of the MDCT coefficients described above and gain adjustment such as fade-in processing and fade-out processing are performed, the unpacking/decoding unit 161 is configured as shown in FIG. 21, for example. In FIG. 21, parts corresponding to those in FIG. 10 are denoted by the same reference numerals, and their description is omitted as appropriate.

The unpacking/decoding unit 161 shown in FIG. 21 includes a priority information acquisition unit 191, a channel audio signal acquisition unit 192, a channel audio signal decoding unit 193, an output selection unit 194, a 0-value output unit 195, an IMDCT unit 196, an overlap addition unit 271, a gain adjustment unit 272, an SBR processing unit 273, an object audio signal acquisition unit 197, an object audio signal decoding unit 198, an output selection unit 199, a 0-value output unit 200, an IMDCT unit 201, an overlap addition unit 274, a gain adjustment unit 275, and an SBR processing unit 276.

The configuration of the unpacking/decoding unit 161 shown in FIG. 21 is the configuration of the unpacking/decoding unit 161 shown in FIG. 10 with the overlap addition unit 271 through the SBR processing unit 276 additionally provided.

The overlap addition unit 271 generates the audio signal of each time frame by overlap-adding the IMDCT signals (audio signals) supplied from the 0-value output unit 195 or the IMDCT unit 196, and supplies the audio signal to the gain adjustment unit 272.

The gain adjustment unit 272 adjusts the gain of the audio signal supplied from the overlap addition unit 271 based on the priority information supplied from the priority information acquisition unit 191, and supplies the result to the SBR processing unit 273.

The SBR processing unit 273 acquires the power value of each high-frequency subband for each time slot from the priority information acquisition unit 191, and adjusts the gain of the high-frequency power values based on the priority information supplied from the priority information acquisition unit 191. The SBR processing unit 273 also performs SBR on the audio signal supplied from the gain adjustment unit 272 using the gain-adjusted high-frequency power values, and supplies the resulting audio signal to the mixing unit 163.

The overlap addition unit 274 generates the audio signal of each time frame by overlap-adding the IMDCT signals (audio signals) supplied from the 0-value output unit 200 or the IMDCT unit 201, and supplies the audio signal to the gain adjustment unit 275.

The gain adjustment unit 275 adjusts the gain of the audio signal supplied from the overlap addition unit 274 based on the priority information supplied from the priority information acquisition unit 191, and supplies the result to the SBR processing unit 276.

The SBR processing unit 276 acquires the power value of each high-frequency subband for each time slot from the priority information acquisition unit 191, and adjusts the gain of the high-frequency power values based on the priority information supplied from the priority information acquisition unit 191. The SBR processing unit 276 also performs SBR on the audio signal supplied from the gain adjustment unit 275 using the gain-adjusted high-frequency power values, and supplies the resulting audio signal to the rendering unit 162.

<Description of selective decoding processing>
Next, the operation of the decoding device 151 when the unpacking/decoding unit 161 has the configuration shown in FIG. 21 will be described. In this case, the decoding device 151 performs the decoding processing described with reference to FIG. 11, except that the processing shown in FIG. 22 is performed as the selective decoding processing of step S52.

The selective decoding processing corresponding to the processing of step S52 in FIG. 11 will be described below with reference to the flowchart of FIG. 22.

In step S181, the priority information acquisition unit 191 acquires the high-frequency power values of the audio signal of each channel from the supplied bitstream and supplies them to the SBR processing unit 273, and also acquires the high-frequency power values of the audio signal of each object from the bitstream and supplies them to the SBR processing unit 276.

After the high-frequency power values are acquired, the processing of steps S182 through S187 is performed to generate the audio signal (IMDCT signal) of the channel being processed. This processing is the same as that of steps S81 through S86 in FIG. 12, so its description is omitted.

In step S186, however, when a conditional expression similar to expression (1) above is satisfied, that is, when at least one of the priority information of the current time frame of the channel being processed and the priority information of the time frames immediately before and after the current time frame is greater than or equal to the threshold P, the priority information is determined to be greater than or equal to the threshold P. In addition, the IMDCT signal generated by the 0-value output unit 195 or the IMDCT unit 196 is output to the overlap addition unit 271.

If the priority information is not determined in step S186 to be greater than or equal to the threshold P, or once the IMDCT signal has been generated in step S187, the processing of step S188 is performed.

In step S188, the overlap addition unit 271 overlap-adds the IMDCT signals supplied from the 0-value output unit 195 or the IMDCT unit 196, and supplies the resulting audio signal of the current time frame to the gain adjustment unit 272.

Specifically, as described with reference to FIG. 18, for example, the first half of the IMDCT signal of the current time frame and the second half of the IMDCT signal of the immediately preceding time frame are added together to form the audio signal of the current time frame.

In step S189, the gain adjustment unit 272 adjusts the gain of the audio signal supplied from the overlap addition unit 271 based on the priority information of the channel being processed, supplied from the priority information acquisition unit 191, and supplies the result to the SBR processing unit 273.

具体的にはゲイン調整部２７２は、現時間フレームの直前の時間フレームの優先度情報が閾値Ｐ以上であり、かつ現時間フレームの優先度情報と、現時間フレームの直後の時間フレームの優先度情報が閾値Ｐ未満である場合、図１９の折れ線GN11に示されるフェーディング信号ゲインでオーディオ信号のゲインを調整する。この場合、図１９における時間フレーム（n）が現時間フレームに対応し、現時間フレームの直後の時間フレームでは、折れ線GN11に示されるように、フェーディング信号ゲイン＝０でのゲイン調整が行われる。 Specifically, the gain adjustment unit 272 determines that the priority information of the time frame immediately before the current time frame is equal to or greater than the threshold value P, and the priority information of the current time frame and the priority of the time frame immediately after the current time frame If the information is less than the threshold P, the gain of the audio signal is adjusted with the fading signal gain indicated by the polygonal line GN11 in FIG. In this case, the time frame (n) in FIG. 19 corresponds to the current time frame, and in the time frame immediately after the current time frame, gain adjustment is performed with the fading signal gain=0, as indicated by the polygonal line GN11. .

また、ゲイン調整部２７２は、現時間フレームの優先度情報が閾値Ｐ以上であり、現時間フレームの直前の２時間フレームの優先度情報がともに閾値Ｐ未満である場合、図２０の折れ線GN21に示されるフェーディング信号ゲインでオーディオ信号のゲインを調整する。この場合、図２０における時間フレーム（n+2）が現時間フレームに対応し、現時間フレームの直前の時間フレームでは、折れ線GN21に示されるように、フェーディング信号ゲイン＝０でのゲイン調整が行われる。 If the priority information of the current time frame is equal to or greater than the threshold value P and the priority information of the two time frames immediately before the current time frame are both less than the threshold value P, the gain adjustment unit 272 shifts to the polygonal line GN21 in FIG. Adjust the gain of the audio signal with the indicated fading signal gain. In this case, the time frame (n+2) in FIG. 20 corresponds to the current time frame, and in the time frame immediately preceding the current time frame, gain adjustment with fading signal gain=0 is performed as indicated by the polygonal line GN21. done.

なお、ゲイン調整部２７２は、これらの２つの例の場合のみゲイン調整を行い、それ以外の場合にはゲイン調整を行わず、オーディオ信号をそのままSBR処理部２７３に供給する。 Note that the gain adjustment unit 272 performs gain adjustment only in these two cases, and does not perform gain adjustment in other cases, and supplies the audio signal to the SBR processing unit 273 as it is.
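The two conditions under which gain adjustment is performed can be summarized in a short sketch. The function `select_fade`, its argument names, and the returned labels are hypothetical; only the comparisons against the threshold follow the description above.

```python
def select_fade(prev2, prev1, cur, nxt, threshold):
    """Classify the current frame from neighboring priority values.

    prev2, prev1: priorities of the two frames before the current one
    cur:          priority of the current frame
    nxt:          priority of the frame after the current one
    """
    # Previous frame at or above the threshold, current and next below it.
    if prev1 >= threshold and cur < threshold and nxt < threshold:
        return "fade_out"
    # Current frame at or above the threshold, two preceding frames below it.
    if cur >= threshold and prev1 < threshold and prev2 < threshold:
        return "fade_in"
    # Any other pattern: the audio signal is passed through unchanged.
    return "none"
```

Each branch looks at three consecutive time frames, which matches the statement that the gain is adjusted based on the priority information of three consecutive time frames.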

In step S190, the SBR processing unit 273 performs SBR on the audio signal supplied from the gain adjustment unit 272, based on the high-band power value and the priority information of the channel being processed, both supplied from the priority information acquisition unit 191.

Specifically, when the priority information of the time frame immediately before the current time frame is equal to or greater than the threshold P, and the priority information of both the current time frame and the time frame immediately after it is less than the threshold P, the SBR processing unit 273 adjusts the gain of the high-band power value with the fading SBR gain indicated by the arrow GN12 in FIG. 19. That is, the high-band power value is multiplied by the fading SBR gain.

The SBR processing unit 273 then performs SBR using the gain-adjusted high-band power value and supplies the resulting audio signal to the mixing unit 163. In this case, time frame (n) in FIG. 19 corresponds to the current time frame.

When the priority information of the current time frame is equal to or greater than the threshold P, and the priority information of the two time frames immediately before the current time frame is less than the threshold P in both, the SBR processing unit 273 adjusts the gain of the high-band power value with the fading SBR gain indicated by the arrow GN22 in FIG. 20. The SBR processing unit 273 then performs SBR using the gain-adjusted high-band power value and supplies the resulting audio signal to the mixing unit 163. In this case, time frame (n+2) in FIG. 20 corresponds to the current time frame.

Note that the SBR processing unit 273 adjusts the gain of the high-band power value only in these two cases. In all other cases it performs SBR using the acquired high-band power value as is, without gain adjustment, and supplies the resulting audio signal to the mixing unit 163.

Once SBR has been performed and the audio signal of the current time frame has been obtained, the processing of steps S191 to S196 is performed. This processing is the same as that of steps S87 to S92 in FIG. 12, so its description is omitted.

However, in step S195, the priority information is determined to be equal to or greater than the threshold Q when the conditional expression of expression (1) above is satisfied. In addition, the IMDCT signal (audio signal) generated by the 0-value output unit 200 or the IMDCT unit 201 is output to the overlap addition unit 274.

Once the IMDCT signal of the current time frame has been obtained in this way, the processing of steps S197 to S199 is performed to generate the audio signal of the current time frame. This processing is the same as that of steps S188 to S190, so its description is omitted.

In step S200, the object audio signal acquisition unit 197 adds 1 to the object number, and the processing returns to step S193. When it is determined in step S193 that the object number is not less than N, the selective decoding processing ends, and the processing then proceeds to step S53 in FIG. 11.

As described above, the unpacking/decoding unit 161 selects the output destination of the MDCT coefficients according to the priority information of the current time frame and of the time frames before and after it. As a result, the audio signal is perfectly reconstructed at the transition between a time frame whose priority information is equal to or greater than the threshold and a time frame whose priority information is less than the threshold, and deterioration of the perceived sound quality can be suppressed.

The unpacking/decoding unit 161 also adjusts the gain of the audio signal after overlap addition and of the high-band power value, based on the priority information of three consecutive time frames. That is, fade-in processing and fade-out processing are performed as appropriate. This suppresses the occurrence of glitch noise and suppresses deterioration of the perceived sound quality.

<Fifth Embodiment>
<Fade-in processing and fade-out processing>
In the fourth embodiment, gain adjustment was described as being performed on the audio signal after overlap addition, with a further gain adjustment on the high-band power value at the time of SBR. In that case, gain adjustment, that is, fade-in processing and fade-out processing, is performed separately on the low-band component and the high-band component of the final audio signal.

Therefore, so that fade-in processing and fade-out processing can be realized with less processing, gain adjustment may be omitted immediately after overlap addition and at the time of SBR, and performed instead on the audio signal obtained by SBR.

In such a case, gain adjustment is performed as shown in FIGS. 23 and 24, for example. In FIGS. 23 and 24, parts corresponding to those in FIGS. 19 and 20 are denoted by the same characters and the like, and their description is omitted.

The example shown in FIG. 23 is one in which the priority information changes in the same way as in FIG. 19. In this example, assuming that the threshold Q = 4, the priority information of time frame (n-1) is equal to or greater than the threshold Q, while the priority information of time frames (n) to (n+2) is less than the threshold Q.

In such a case, the audio signal obtained by SBR in time frame (n) and time frame (n+1) is gain-adjusted by being multiplied by the fading signal gain indicated by the polygonal line GN31.

The fading signal gain indicated by the polygonal line GN31 is the same as the fading signal gain indicated by the polygonal line GN11 in FIG. 19. In the example of FIG. 23, however, the audio signal subject to gain adjustment contains both the low-band component and the high-band component, so the gain of both components can be adjusted with a single fading signal gain.

Adjusting the gain of the audio signal with such a fading signal gain makes the audio signal change gradually toward zero data over the portion where an IMDCT signal obtained by IMDCT and an IMDCT signal set to zero data are overlap-added, and over the portion immediately before it. This suppresses deterioration of the perceived sound quality.

In contrast, the example shown in FIG. 24 is one in which the priority information changes in the same way as in FIG. 20. In this example, assuming that the threshold Q = 4, the priority information of time frame (n) and time frame (n+1) is less than the threshold Q, while the priority information of time frame (n+2) is equal to or greater than the threshold Q.

In such a case, the audio signal obtained by SBR in time frame (n+1) and time frame (n+2) is gain-adjusted by being multiplied by the fading signal gain indicated by the polygonal line GN41.

The fading signal gain indicated by the polygonal line GN41 is the same as the fading signal gain indicated by the polygonal line GN21 in FIG. 20. In the example of FIG. 24, however, the audio signal subject to gain adjustment contains both the low-band component and the high-band component, so the gain of both components can be adjusted with a single fading signal gain.

Adjusting the gain of the audio signal with such a fading signal gain makes the audio signal change gradually from zero data to the original signal over the portion where an IMDCT signal obtained by IMDCT and an IMDCT signal set to zero data are overlap-added, and over the portion immediately after it. This suppresses deterioration of the perceived sound quality.

<Configuration example of the unpacking/decoding unit>
When gain adjustment is performed by the fade-in processing and fade-out processing described with reference to FIGS. 23 and 24, the unpacking/decoding unit 161 is configured as shown in FIG. 25, for example. In FIG. 25, parts corresponding to those in FIG. 21 are denoted by the same reference numerals, and their description is omitted as appropriate.

The unpacking/decoding unit 161 shown in FIG. 25 includes the priority information acquisition unit 191, channel audio signal acquisition unit 192, channel audio signal decoding unit 193, output selection unit 194, 0-value output unit 195, IMDCT unit 196, overlap addition unit 271, SBR processing unit 273, gain adjustment unit 272, object audio signal acquisition unit 197, object audio signal decoding unit 198, output selection unit 199, 0-value output unit 200, IMDCT unit 201, overlap addition unit 274, SBR processing unit 276, and gain adjustment unit 275.

The configuration of the unpacking/decoding unit 161 shown in FIG. 25 differs from that of the unpacking/decoding unit 161 shown in FIG. 21 in that the gain adjustment unit 272 and the gain adjustment unit 275 are arranged downstream of the SBR processing unit 273 and the SBR processing unit 276, respectively.

In the unpacking/decoding unit 161 shown in FIG. 25, the SBR processing unit 273 performs SBR on the audio signal supplied from the overlap addition unit 271, based on the high-band power value supplied from the priority information acquisition unit 191, and supplies the resulting audio signal to the gain adjustment unit 272. In this case, the SBR processing unit 273 does not adjust the gain of the high-band power value.

The gain adjustment unit 272 adjusts the gain of the audio signal supplied from the SBR processing unit 273, based on the priority information supplied from the priority information acquisition unit 191, and supplies the result to the mixing unit 163.

The SBR processing unit 276 performs SBR on the audio signal supplied from the overlap addition unit 274, based on the high-band power value supplied from the priority information acquisition unit 191, and supplies the resulting audio signal to the gain adjustment unit 275. In this case, the SBR processing unit 276 does not adjust the gain of the high-band power value.

The gain adjustment unit 275 adjusts the gain of the audio signal supplied from the SBR processing unit 276, based on the priority information supplied from the priority information acquisition unit 191, and supplies the result to the rendering unit 162.

<Description of the selective decoding processing>
Next, the operation of the decoding device 151 when the unpacking/decoding unit 161 has the configuration shown in FIG. 25 will be described. In this case, the decoding device 151 performs the decoding processing described with reference to FIG. 11. However, the processing shown in FIG. 26 is performed as the selective decoding processing of step S52.

The selective decoding processing corresponding to step S52 in FIG. 11 is described below with reference to the flowchart in FIG. 26. The processing of steps S231 to S238 is the same as that of steps S181 to S188 in FIG. 22, so its description is omitted. However, in step S232, the priority information is not supplied to the SBR processing unit 273 or the SBR processing unit 276.

In step S239, the SBR processing unit 273 performs SBR on the audio signal supplied from the overlap addition unit 271, based on the high-band power value supplied from the priority information acquisition unit 191, and supplies the resulting audio signal to the gain adjustment unit 272.

In step S240, the gain adjustment unit 272 adjusts the gain of the audio signal supplied from the SBR processing unit 273, based on the priority information of the channel being processed supplied from the priority information acquisition unit 191, and supplies the result to the mixing unit 163.

Specifically, when the priority information of the time frame immediately before the current time frame is equal to or greater than the threshold P, and the priority information of both the current time frame and the time frame immediately after it is less than the threshold P, the gain adjustment unit 272 adjusts the gain of the audio signal with the fading signal gain indicated by the polygonal line GN31 in FIG. 23. In this case, time frame (n) in FIG. 23 corresponds to the current time frame, and in the time frame immediately after the current time frame, gain adjustment is performed with a fading signal gain of 0, as indicated by the polygonal line GN31.

When the priority information of the current time frame is equal to or greater than the threshold P, and the priority information of the two time frames immediately before the current time frame is less than the threshold P in both, the gain adjustment unit 272 adjusts the gain of the audio signal with the fading signal gain indicated by the polygonal line GN41 in FIG. 24. In this case, time frame (n+2) in FIG. 24 corresponds to the current time frame, and in the time frame immediately before the current time frame, gain adjustment is performed with a fading signal gain of 0, as indicated by the polygonal line GN41.

Note that the gain adjustment unit 272 performs gain adjustment only in these two cases. In all other cases it performs no gain adjustment and supplies the audio signal to the mixing unit 163 as is.
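As a rough illustration of applying a single fading signal gain to the full-band audio signal obtained by SBR, the following sketch multiplies each sample of the current time frame by a per-sample gain. The exact shapes of the polygonal lines GN31 and GN41 are defined by the figures and are not reproduced here; the linear ramp across one frame, the function names, and the mode labels are all assumptions.

```python
def fade_gains(mode, num_samples):
    """Per-sample fading gain for one frame (assumed linear ramp)."""
    if num_samples < 2:
        raise ValueError("need at least two samples per frame")
    ramp = [i / (num_samples - 1) for i in range(num_samples)]
    if mode == "fade_out":
        return [1.0 - r for r in ramp]   # 1 -> 0 over the frame
    if mode == "fade_in":
        return ramp                      # 0 -> 1 over the frame
    return [1.0] * num_samples           # no gain adjustment


def apply_fade(signal, mode):
    """Scale a full-band (post-SBR) frame by the fading gain."""
    gains = fade_gains(mode, len(signal))
    return [x * g for x, g in zip(signal, gains)]
```

Because the input already contains both the low-band and high-band components, one multiplication per sample covers what the fourth embodiment handled with two separate gain adjustments.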

After the gain of the audio signal has been adjusted, the processing of steps S241 to S247 is performed. This processing is the same as that of steps S191 to S197 in FIG. 22, so its description is omitted.

Once the audio signal of the current time frame of the object being processed has been obtained in this way, the processing of steps S248 and S249 is performed to generate the final audio signal of the current time frame. This processing is the same as that of steps S239 and S240, so its description is omitted.

In step S250, the object audio signal acquisition unit 197 adds 1 to the object number, and the processing returns to step S243. When it is determined in step S243 that the object number is not less than N, the selective decoding processing ends, and the processing then proceeds to step S53 in FIG. 11.

As described above, the unpacking/decoding unit 161 adjusts the gain of the audio signal obtained by SBR, based on the priority information of three consecutive time frames. This makes it possible to suppress the occurrence of glitch noise more simply and to suppress deterioration of the perceived sound quality.

In this embodiment, an example has been described in which both the selection of the output destination of the MDCT coefficients using the priority information of three time frames and the gain adjustment using the fading signal gain are performed. However, only the gain adjustment using the fading signal gain may be performed.

In such a case, the output selection unit 194 and the output selection unit 199 select the output destination of the MDCT coefficients by the same processing as in the first embodiment. When the priority information of the current time frame is less than the threshold, the gain adjustment unit 272 and the gain adjustment unit 275 perform fade-in processing or fade-out processing by linearly increasing or decreasing the fading signal gain of the current time frame. Whether to perform fade-in processing or fade-out processing may be determined from the priority information of the current time frame and that of the time frames before and after it.

<Sixth Embodiment>
<Fade-in processing and fade-out processing>
In the rendering unit 162, VBAP is performed, for example, to generate from the audio signal of each object the audio signal of each channel used to reproduce the sound of each object.

Specifically, in VBAP, a gain value of the audio signal (hereinafter also referred to as a VBAP gain) is calculated for each object in every time frame, for each channel, that is, for each speaker that outputs sound. The sum of the audio signals of the objects, each multiplied by its VBAP gain for the same channel (speaker), is used as the audio signal of that channel. In other words, the audio signal of each object is assigned to the channels with the VBAP gain calculated for each channel.
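The summation described above can be sketched as follows. This only illustrates how per-object, per-speaker gains combine object signals into channel signals; it does not compute the VBAP gains themselves, and the function name `mix_objects` and the list-of-lists layout are assumptions.

```python
def mix_objects(object_signals, vbap_gains):
    """Mix object signals into per-speaker channel signals.

    object_signals[o][t]: sample t of object o for the current frame
    vbap_gains[o][s]:     gain of object o for speaker (channel) s
    Returns channels[s][t].
    """
    num_speakers = len(vbap_gains[0])
    num_samples = len(object_signals[0])
    channels = [[0.0] * num_samples for _ in range(num_speakers)]
    for signal, gains in zip(object_signals, vbap_gains):
        for s, gain in enumerate(gains):
            for t, sample in enumerate(signal):
                # Each channel is the gain-weighted sum over all objects.
                channels[s][t] += gain * sample
    return channels
```

In this sketch the gain is constant over the frame; the per-sample interpolation discussed later in this embodiment would replace `gain` with a value that depends on `t`.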

Therefore, for the audio signals of objects, the occurrence of glitch noise and the resulting deterioration of the perceived sound quality may be suppressed by appropriately adjusting the VBAP gain, rather than by adjusting the gain of the object audio signal or the high-band power value.

In such a case, linear interpolation or the like is performed on the VBAP gain of each time frame, for example, to calculate a VBAP gain for each sample of the audio signal within each time frame, and the audio signal of each channel is generated using the resulting VBAP gains.

For example, the VBAP gain value of the first sample of the time frame being processed is set to the VBAP gain value of the last sample of the immediately preceding time frame. The VBAP gain value of the last sample of the time frame being processed is set to the VBAP gain value calculated by normal VBAP for that time frame.

Within the time frame being processed, the VBAP gain value of each sample between the first and last samples is then determined so that the VBAP gain changes linearly from the first sample to the last sample.

However, when the priority information of the time frame being processed is less than the threshold, the VBAP calculation is not performed, and the VBAP gain value of the last sample of that time frame is set to 0. The VBAP gain of each sample is then determined so that the VBAP gain changes linearly from the first sample to the last sample of the time frame being processed.
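The per-sample gain rule just described can be sketched for one object and one speaker. The function name `frame_sample_gains` and its parameters are hypothetical; `frame_vbap_gain` stands for the value that normal VBAP would produce for the frame, which in the low-priority case is simply not used.

```python
def frame_sample_gains(prev_end_gain, frame_vbap_gain, priority,
                       threshold, num_samples):
    """Linearly interpolated VBAP gain for every sample of one frame."""
    # Last-sample gain: normal VBAP result, or 0 for a low-priority frame.
    end_gain = frame_vbap_gain if priority >= threshold else 0.0
    if num_samples == 1:
        return [end_gain]
    # First sample equals the last-sample gain of the previous frame.
    step = (end_gain - prev_end_gain) / (num_samples - 1)
    return [prev_end_gain + step * i for i in range(num_samples)]
```

Feeding the last element of one frame's result in as `prev_end_gain` for the next frame chains the frames together: a low-priority frame after a normal one ramps down to 0, and further low-priority frames stay at 0 throughout.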

By adjusting the gain of the audio signal of each object through the VBAP gain in this way, the gains of the low-band and high-band components can be adjusted at once, so that the occurrence of glitch noise and deterioration of the perceived sound quality can be suppressed with a smaller amount of processing.

When the VBAP gain is determined for each sample in this way, the per-sample VBAP gain of each time frame is as shown in FIGS. 27 and 28, for example.

In FIGS. 27 and 28, parts corresponding to those in FIGS. 19 and 20 are denoted by the same characters and the like, and their description is omitted. Also in FIGS. 27 and 28, "VBAP_gain[q][s]" (where q = n-1, n, n+1, n+2) denotes the VBAP gain of time frame (q) of the object being processed for the speaker whose speaker index is s, the speaker index identifying the speaker corresponding to a given channel.

The example shown in FIG. 27 is one in which the priority information changes in the same way as in FIG. 19. In this example, assuming that the threshold Q = 4, the priority information of time frame (n-1) is equal to or greater than the threshold Q, while the priority information of time frames (n) to (n+2) is less than the threshold Q.

In such a case, the VBAP gains of time frames (n-1) to (n+1) are, for example, the gains indicated by the polygonal line GN51.

In this example, the priority information of time frame (n-1) is equal to or greater than the threshold Q, so the VBAP gain of each sample is determined based on the VBAP gain calculated by normal VBAP.

That is, the VBAP gain value of the first sample of time frame (n-1) is the same as the VBAP gain value of the last sample of time frame (n-2). The VBAP gain value of the last sample of time frame (n-1) is the VBAP gain value of the channel corresponding to speaker s, calculated by normal VBAP for time frame (n-1) for the object being processed. The VBAP gain value of each sample of time frame (n-1) is determined so as to change linearly from the first sample to the last sample.

Since the priority information of time frame (n) is less than the threshold Q, the VBAP gain value of the last sample of time frame (n) is set to 0.

That is, the VBAP gain value of the first sample of time frame (n) is the same as the VBAP gain value of the last sample of time frame (n-1), and the VBAP gain value of the last sample of time frame (n) is 0. The VBAP gain value of each sample of time frame (n) is determined so as to change linearly from the first sample to the last sample.

Furthermore, since the priority information of time frame (n+1) is less than the threshold Q, the VBAP gain value of the last sample of time frame (n+1) is set to 0, and as a result the VBAP gain values of all samples of time frame (n+1) are 0.

Setting the VBAP gain value of the last sample of a time frame whose priority information is less than the threshold Q to 0 in this way enables fade-out processing equivalent to the example of FIG. 23.

In contrast, the example shown in FIG. 28 is one in which the priority information changes in the same way as in FIG. 24. In this example, assuming that the threshold Q = 4, the priority information of time frames (n-1) to (n+1) is less than the threshold Q, while the priority information of time frame (n+2) is equal to or greater than the threshold Q.

In such a case, the VBAP gains of time frames (n-1) to (n+2) are, for example, the gains indicated by the polygonal line GN61.

In this example, the priority information of both time frame (n) and time frame (n+1) is less than the threshold Q, so the VBAP gains of all samples of time frame (n+1) are 0.

また、時間フレーム（n+2）の優先度情報は閾値Ｑ以上であるので、処理対象となっているオブジェクトについて、通常のVBAPにより算出されたスピーカｓに対応するチャネルのVBAPゲインに基づいて、各サンプルのVBAPゲインが定められる。 Also, since the priority information of the time frame (n+2) is equal to or higher than the threshold Q, for the object to be processed, based on the VBAP gain of the channel corresponding to the speaker s calculated by normal VBAP, A VBAP gain for each sample is determined.

すなわち、時間フレーム（n+2）の先頭のサンプルのVBAPゲインの値は、時間フレーム（n+1）の末尾のサンプルのVBAPゲインの値である０とされ、時間フレーム（n+2）の末尾のサンプルのVBAPゲインの値は、時間フレーム（n+2）に対する通常のVBAPにより算出されたVBAPゲインの値とされている。そして、時間フレーム（n+2）の各サンプルのVBAPゲインの値は、先頭のサンプルから末尾のサンプルまで線形に変化するように定められている。 That is, the VBAP gain value of the sample at the beginning of the time frame (n+2) is set to 0, which is the VBAP gain value of the sample at the end of the time frame (n+1), and the VBAP gain value of the sample at the end of the time frame (n+2). The VBAP gain value of the last sample is the VBAP gain value calculated by normal VBAP for time frame (n+2). The value of the VBAP gain of each sample in the time frame (n+2) is determined to change linearly from the first sample to the last sample.

このように、優先度情報が閾値Ｑ未満である時間フレームの末尾のサンプルのVBAPゲインの値を０とすることで、図２４の例と等価なフェードイン処理が可能となる。 In this way, by setting the value of the VBAP gain of the sample at the end of the time frame whose priority information is less than the threshold Q to 0, fade-in processing equivalent to the example of FIG. 24 can be performed.
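The per-sample gain rule described for FIG. 28 can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the function name `frame_vbap_gains` and its parameters are hypothetical, and it assumes a single object and a single output channel, with the frame's end-of-frame gain set to 0 when the priority is below the threshold Q and to the normal VBAP gain otherwise.

```python
def frame_vbap_gains(prev_end_gain, priority, threshold_q, vbap_gain, frame_len):
    """Per-sample gains for one object/channel pair in one time frame (hypothetical helper).

    prev_end_gain: gain of the last sample of the immediately preceding frame
    priority:      priority information of the current frame
    threshold_q:   threshold Q
    vbap_gain:     gain computed by normal VBAP for the current frame
    frame_len:     number of samples in the frame
    """
    # Target gain at the last sample: 0 below the threshold, else the normal VBAP gain.
    end_gain = vbap_gain if priority >= threshold_q else 0.0
    if frame_len == 1:
        return [end_gain]
    # Linear change from the previous frame's last gain (first sample)
    # to the target gain (last sample).
    span = end_gain - prev_end_gain
    return [prev_end_gain + span * i / (frame_len - 1) for i in range(frame_len)]


# Fade-in as in time frame (n+2): previous end gain 0, priority >= Q.
fade_in = frame_vbap_gains(0.0, priority=5, threshold_q=4, vbap_gain=0.8, frame_len=5)
# Silent frame as in time frame (n+1): previous end gain 0, priority < Q.
silent = frame_vbap_gains(0.0, priority=3, threshold_q=4, vbap_gain=0.8, frame_len=4)
```

With this rule a frame that follows a below-threshold frame always starts from gain 0, so the ramp avoids an abrupt jump at the frame boundary.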

〈アンパッキング／復号部の構成例〉 <Configuration example of unpacking/decoding unit>
図２７および図２８を参照して説明したフェードイン処理やフェードアウト処理によるゲイン調整が行われる場合、アンパッキング／復号部１６１は、例えば図２９に示すように構成される。なお、図２９において、図２５における場合と対応する部分には同一の符号を付してあり、その説明は適宜省略する。 When gain adjustment is performed by the fade-in processing and fade-out processing described with reference to FIGS. 27 and 28, the unpacking/decoding unit 161 is configured as shown in FIG. 29, for example. In FIG. 29, parts corresponding to those in FIG. 25 are denoted by the same reference numerals, and their description is omitted as appropriate.

図２９に示すアンパッキング／復号部１６１は優先度情報取得部１９１、チャネルオーディオ信号取得部１９２、チャネルオーディオ信号復号部１９３、出力選択部１９４、０値出力部１９５、IMDCT部１９６、オーバーラップ加算部２７１、SBR処理部２７３、ゲイン調整部２７２、オブジェクトオーディオ信号取得部１９７、オブジェクトオーディオ信号復号部１９８、出力選択部１９９、０値出力部２００、IMDCT部２０１、オーバーラップ加算部２７４、およびSBR処理部２７６から構成される。 The unpacking/decoding unit 161 shown in FIG. 29 includes a priority information acquisition unit 191, a channel audio signal acquisition unit 192, a channel audio signal decoding unit 193, an output selection unit 194, a 0-value output unit 195, an IMDCT unit 196, an overlap addition unit 271, an SBR processing unit 273, a gain adjustment unit 272, an object audio signal acquisition unit 197, an object audio signal decoding unit 198, an output selection unit 199, a 0-value output unit 200, an IMDCT unit 201, an overlap addition unit 274, and an SBR processing unit 276.

図２９に示すアンパッキング／復号部１６１の構成は、ゲイン調整部２７５が設けられていない点で、図２５に示したアンパッキング／復号部１６１の構成と異なり、その他の点では同じ構成となっている。 The configuration of the unpacking/decoding unit 161 shown in FIG. 29 differs from that of the unpacking/decoding unit 161 shown in FIG. 25 in that the gain adjustment unit 275 is not provided; in all other respects the two configurations are the same.

図２９に示すアンパッキング／復号部１６１では、SBR処理部２７６は、優先度情報取得部１９１から供給された高域のパワー値に基づいて、オーバーラップ加算部２７４から供給されたオーディオ信号に対してSBRを行い、その結果得られたオーディオ信号をレンダリング部１６２に供給する。 In the unpacking/decoding unit 161 shown in FIG. 29, the SBR processing unit 276 performs SBR on the audio signal supplied from the overlap addition unit 274, based on the high-frequency power value supplied from the priority information acquisition unit 191, and supplies the resulting audio signal to the rendering unit 162.

また、優先度情報取得部１９１は、供給されたビットストリームから各オブジェクトのメタデータと優先度情報を取得してレンダリング部１６２に供給する。なお、各オブジェクトの優先度情報は、出力選択部１９９にも供給される。 Also, the priority information acquisition unit 191 acquires metadata and priority information of each object from the supplied bitstream and supplies them to the rendering unit 162 . The priority information of each object is also supplied to the output selection section 199 .

〈復号処理の説明〉 <Description of decoding processing>
続いて、アンパッキング／復号部１６１が図２９に示した構成とされる場合における復号装置１５１の動作について説明する。 Next, the operation of the decoding device 151 when the unpacking/decoding unit 161 has the configuration shown in FIG. 29 will be described.

この場合、復号装置１５１は、図３０に示す復号処理を行う。以下、図３０のフローチャートを参照して、復号装置１５１により行われる復号処理について説明する。但し、ステップＳ２８１では、図１１のステップＳ５１の処理と同様の処理が行われるので、その説明は省略する。 In this case, the decoding device 151 performs the decoding process shown in FIG. 30. The decoding process performed by the decoding device 151 will be described below with reference to the flowchart of FIG. 30. Note that in step S281 the same processing as in step S51 of FIG. 11 is performed, so its description is omitted.

ステップＳ２８２において、アンパッキング／復号部１６１は選択復号処理を行う。 In step S282, the unpacking/decoding unit 161 performs selective decoding processing.

ここで、図３１のフローチャートを参照して、図３０のステップＳ２８２の処理に対応する選択復号処理について説明する。 Here, the selective decoding process corresponding to the process of step S282 in FIG. 30 will be described with reference to the flowchart of FIG. 31.

なお、ステップＳ３１１乃至ステップＳ３２８の処理は、図２６のステップＳ２３１乃至ステップＳ２４８の処理と同様であるので、その説明は省略する。但し、ステップＳ３１２では、優先度情報取得部１９１は、ビットストリームから取得された優先度情報をレンダリング部１６２にも供給する。 Note that the processing from step S311 to step S328 is the same as the processing from step S231 to step S248 in FIG. 26, so description thereof will be omitted. However, in step S312 , the priority information acquisition unit 191 also supplies the priority information acquired from the bitstream to the rendering unit 162 .

ステップＳ３２９において、オブジェクトオーディオ信号取得部１９７がオブジェクト番号に１を加えると、処理はステップＳ３２３に戻る。そして、ステップＳ３２３においてオブジェクト番号がＮ未満ではないと判定されると、選択復号処理は終了し、その後、処理は図３０のステップＳ２８３へと進む。 In step S329, the object audio signal acquisition unit 197 adds 1 to the object number, and the process returns to step S323. Then, when it is determined in step S323 that the object number is not less than N, the selective decoding process ends, and the process proceeds to step S283 in FIG. 30.

したがって、図３１に示した選択復号処理では、各チャネルのオーディオ信号については、第５の実施の形態における場合と同様にフェーディング信号ゲインによるゲイン調整が行われ、各オブジェクトについては、ゲイン調整は行われず、SBRにより得られたオーディオ信号がそのままレンダリング部１６２に出力される。 Therefore, in the selective decoding process shown in FIG. 31, the audio signal of each channel is gain-adjusted using the fading signal gain, as in the fifth embodiment, whereas for each object no gain adjustment is performed and the audio signal obtained by SBR is output to the rendering unit 162 as it is.

図３０の復号処理の説明に戻り、ステップＳ２８３において、レンダリング部１６２は、SBR処理部２７６から供給された各オブジェクトのオーディオ信号と、優先度情報取得部１９１から供給された各オブジェクトのメタデータとしての位置情報、および各オブジェクトの現時間フレームの優先度情報とに基づいて、各オブジェクトのオーディオ信号のレンダリングを行う。 Returning to the description of the decoding process in FIG. 30, in step S283 the rendering unit 162 renders the audio signal of each object based on the audio signal of each object supplied from the SBR processing unit 276, the position information supplied from the priority information acquisition unit 191 as the metadata of each object, and the priority information of the current time frame of each object.

例えばレンダリング部１６２は、図２７や図２８を参照して説明したように、オブジェクトごとに、各チャネルについて現時間フレームの優先度情報と、現時間フレームの直前の時間フレームの末尾のサンプルのVBAPゲインに基づいて、現時間フレームの各サンプルのVBAPゲインを算出する。このときレンダリング部１６２は、適宜、位置情報に基づいてVBAPによりVBAPゲインを算出する。 For example, as described with reference to FIGS. 27 and 28, the rendering unit 162 calculates, for each object and for each channel, the VBAP gain of each sample in the current time frame based on the priority information of the current time frame and the VBAP gain of the last sample of the time frame immediately preceding the current time frame. In doing so, the rendering unit 162 calculates the VBAP gain by VBAP from the position information as needed.

そして、レンダリング部１６２は、各オブジェクトについて算出した各チャネルのサンプルごとのVBAPゲインと、各オブジェクトのオーディオ信号とに基づいて、各チャネルのオーディオ信号を生成し、ミキシング部１６３に供給する。 Then, the rendering unit 162 generates an audio signal of each channel based on the VBAP gain for each sample of each channel calculated for each object and the audio signal of each object, and supplies the audio signal to the mixing unit 163 .
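As a rough illustration of this step, the sketch below mixes object signals into channel signals using per-sample gains. It assumes the simplest reading of the text, namely that each channel signal is the sum over objects of gain times object sample; the function name `render_objects` and the nested-list layout are hypothetical and are not taken from the patent.

```python
def render_objects(object_signals, object_gains, num_channels):
    """Mix object signals into channel signals for one time frame (hypothetical sketch).

    object_signals[k][t]:  sample t of object k
    object_gains[k][c][t]: per-sample gain of object k for channel c
    Returns channels[c][t], the audio signal of each channel.
    """
    frame_len = len(object_signals[0]) if object_signals else 0
    channels = [[0.0] * frame_len for _ in range(num_channels)]
    for signal, gains in zip(object_signals, object_gains):
        for c in range(num_channels):
            for t in range(frame_len):
                # Each object contributes its sample scaled by the per-sample gain.
                channels[c][t] += gains[c][t] * signal[t]
    return channels


# Two objects, two channels, two samples per frame.
signals = [[1.0, 2.0], [10.0, 20.0]]
gains = [
    [[1.0, 1.0], [0.0, 0.0]],  # object 0: channel 0 only
    [[0.5, 0.5], [0.0, 1.0]],  # object 1: fades in on channel 1
]
mixed = render_objects(signals, gains, num_channels=2)
```

Because the gains are applied per sample, an object whose gain ramps from 0 within a frame fades in on the affected channel without any separate gain-adjustment stage.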

なお、ここでは時間フレーム内の各サンプルのVBAPゲインが線形に変化するように各サンプルのVBAPゲインを算出する例について説明したが、VBAPゲインが非線形に変化するようにしてもよい。また、VBAPにより各チャネルのオーディオ信号が生成される例について説明したが、他の方法により各チャネルのオーディオ信号を生成する場合でも、VBAPにおける場合と同様の処理により、各オブジェクトのオーディオ信号のゲインを調整することが可能である。 An example has been described here in which the VBAP gain of each sample is calculated so that it changes linearly within the time frame, but the VBAP gain may instead change nonlinearly. Likewise, an example has been described in which the audio signal of each channel is generated by VBAP, but even when the audio signal of each channel is generated by another method, the gain of the audio signal of each object can be adjusted by processing similar to that used with VBAP.

各チャネルのオーディオ信号が生成されると、その後、ステップＳ２８４の処理が行われて復号処理は終了するが、ステップＳ２８４の処理は図１１のステップＳ５４の処理と同様であるので、その説明は省略する。 After the audio signal of each channel is generated, the process of step S284 is performed and the decoding process ends. Since the process of step S284 is the same as the process of step S54 in FIG. 11, its description is omitted.

このようにして復号装置１５１は、各オブジェクトについて、優先度情報に基づいてサンプルごとにVBAPゲインを算出し、各チャネルのオーディオ信号の生成時に、VBAPゲインによりオブジェクトのオーディオ信号のゲイン調整を行う。これにより、より少ない処理量でグリッチノイズの発生を抑制し、聴感上の音質の劣化を抑制することができる。 In this way, the decoding device 151 calculates the VBAP gain for each sample based on the priority information for each object, and adjusts the gain of the audio signal of the object using the VBAP gain when generating the audio signal of each channel. As a result, it is possible to suppress the occurrence of glitch noise with a smaller amount of processing, and to suppress the deterioration of the sound quality on the sense of hearing.

なお、第４の実施の形態乃至第６の実施の形態では、現時間フレームの直前および直後の時間フレームの優先度情報を利用してMDCT係数の出力先を選択したり、フェーディング信号ゲイン等によるゲイン調整を行ったりすると説明した。しかし、これに限らず、現時間フレームの優先度情報と、現時間フレームの所定時間フレームだけ前の時間フレームの優先度情報や、現時間フレームの所定時間フレームだけ後の時間フレームの優先度情報とが用いられるようにしてもよい。 In the fourth to sixth embodiments, it was explained that the priority information of the time frames immediately before and after the current time frame is used to select the output destination of the MDCT coefficients and to perform gain adjustment using the fading signal gain or the like. However, the present technology is not limited to this: the priority information of the current time frame may be used together with the priority information of a time frame a predetermined number of time frames before the current time frame, or the priority information of a time frame a predetermined number of time frames after the current time frame.
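One way to picture this generalization is a selection rule that looks at the current frame and at frames a configurable distance before and after it. The sketch below is purely illustrative: the function `should_decode`, its parameters, and in particular the "decode if any inspected frame reaches the threshold" rule are assumptions made for this example; the text above only states which frames' priority information may be used.

```python
def should_decode(priorities, n, threshold_q, frames_before=1, frames_after=1):
    """Decide whether frame n is sent to the IMDCT path (hypothetical rule).

    priorities:    per-frame priority information of one channel or object
    n:             index of the current time frame
    threshold_q:   threshold Q
    frames_before: how many frames back the earlier reference frame lies
    frames_after:  how many frames ahead the later reference frame lies
    """
    candidates = (n - frames_before, n, n + frames_after)
    # Frames outside the available range are simply ignored.
    return any(
        0 <= k < len(priorities) and priorities[k] >= threshold_q
        for k in candidates
    )


priorities = [1, 1, 1, 6, 1]
decode_frame_2 = should_decode(priorities, 2, threshold_q=4)  # neighbor frame 3 has priority 6
decode_frame_1 = should_decode(priorities, 1, threshold_q=4)  # no inspected frame reaches Q
```

With `frames_before=1` and `frames_after=1` this reduces to using the immediately preceding and following frames, which is the case described for the fourth to sixth embodiments.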

ところで、上述した一連の処理は、ハードウェアにより実行することもできるし、ソフトウェアにより実行することもできる。一連の処理をソフトウェアにより実行する場合には、そのソフトウェアを構成するプログラムが、コンピュータにインストールされる。ここで、コンピュータには、専用のハードウェアに組み込まれているコンピュータや、各種のプログラムをインストールすることで、各種の機能を実行することが可能な、例えば汎用のコンピュータなどが含まれる。 By the way, the series of processes described above can be executed by hardware or by software. When executing a series of processes by software, a program that constitutes the software is installed in the computer. Here, the computer includes, for example, a computer built into dedicated hardware and a general-purpose computer capable of executing various functions by installing various programs.

図３２は、上述した一連の処理をプログラムにより実行するコンピュータのハードウェアの構成例を示すブロック図である。 FIG. 32 is a block diagram showing a hardware configuration example of a computer that executes the series of processes described above by a program.

コンピュータにおいて、ＣＰＵ（Central Processing Unit）５０１，ＲＯＭ（Read Only Memory）５０２，ＲＡＭ（Random Access Memory）５０３は、バス５０４により相互に接続されている。 In the computer, a CPU (Central Processing Unit) 501 , a ROM (Read Only Memory) 502 and a RAM (Random Access Memory) 503 are interconnected by a bus 504 .

バス５０４には、さらに、入出力インターフェース５０５が接続されている。入出力インターフェース５０５には、入力部５０６、出力部５０７、記録部５０８、通信部５０９、およびドライブ５１０が接続されている。 An input/output interface 505 is also connected to the bus 504 . An input unit 506 , an output unit 507 , a recording unit 508 , a communication unit 509 and a drive 510 are connected to the input/output interface 505 .

入力部５０６は、キーボード、マウス、マイクロフォン、撮像素子などよりなる。出力部５０７は、ディスプレイ、スピーカなどよりなる。記録部５０８は、ハードディスクや不揮発性のメモリなどよりなる。通信部５０９は、ネットワークインターフェースなどよりなる。ドライブ５１０は、磁気ディスク、光ディスク、光磁気ディスク、又は半導体メモリなどのリムーバブルメディア５１１を駆動する。 An input unit 506 includes a keyboard, mouse, microphone, imaging device, and the like. The output unit 507 includes a display, a speaker, and the like. A recording unit 508 is composed of a hard disk, a nonvolatile memory, or the like. A communication unit 509 includes a network interface and the like. A drive 510 drives a removable medium 511 such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory.

以上のように構成されるコンピュータでは、ＣＰＵ５０１が、例えば、記録部５０８に記録されているプログラムを、入出力インターフェース５０５およびバス５０４を介して、ＲＡＭ５０３にロードして実行することにより、上述した一連の処理が行われる。 In the computer configured as described above, for example, the CPU 501 loads the program recorded in the recording unit 508 into the RAM 503 via the input/output interface 505 and the bus 504 and executes the above-described series of programs. is processed.

コンピュータ（ＣＰＵ５０１）が実行するプログラムは、例えば、パッケージメディア等としてのリムーバブルメディア５１１に記録して提供することができる。また、プログラムは、ローカルエリアネットワーク、インターネット、デジタル衛星放送といった、有線または無線の伝送媒体を介して提供することができる。 A program executed by the computer (CPU 501) can be provided by being recorded on a removable medium 511 such as a package medium, for example. Also, the program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.

コンピュータでは、プログラムは、リムーバブルメディア５１１をドライブ５１０に装着することにより、入出力インターフェース５０５を介して、記録部５０８にインストールすることができる。また、プログラムは、有線または無線の伝送媒体を介して、通信部５０９で受信し、記録部５０８にインストールすることができる。その他、プログラムは、ＲＯＭ５０２や記録部５０８に、あらかじめインストールしておくことができる。 In the computer, the program can be installed in the recording section 508 via the input/output interface 505 by loading the removable medium 511 into the drive 510 . Also, the program can be received by the communication unit 509 and installed in the recording unit 508 via a wired or wireless transmission medium. In addition, the program can be installed in the ROM 502 or the recording unit 508 in advance.

なお、コンピュータが実行するプログラムは、本明細書で説明する順序に沿って時系列に処理が行われるプログラムであっても良いし、並列に、あるいは呼び出しが行われたとき等の必要なタイミングで処理が行われるプログラムであっても良い。 The program executed by the computer may be a program that is processed in chronological order according to the order described in this specification, or may be executed in parallel or at a necessary timing such as when a call is made. It may be a program in which processing is performed.

また、本技術の実施の形態は、上述した実施の形態に限定されるものではなく、本技術の要旨を逸脱しない範囲において種々の変更が可能である。 Further, the embodiments of the present technology are not limited to the above-described embodiments, and various modifications are possible without departing from the gist of the present technology.

例えば、本技術は、１つの機能をネットワークを介して複数の装置で分担、共同して処理するクラウドコンピューティングの構成をとることができる。 For example, the present technology can take a configuration of cloud computing in which one function is shared by a plurality of devices via a network and processed jointly.

また、上述のフローチャートで説明した各ステップは、１つの装置で実行する他、複数の装置で分担して実行することができる。 Further, each step described in the flowchart above can be executed by one device, or can be shared by a plurality of devices and executed.

さらに、１つのステップに複数の処理が含まれる場合には、その１つのステップに含まれる複数の処理は、１つの装置で実行する他、複数の装置で分担して実行することができる。 Furthermore, when one step includes a plurality of processes, the plurality of processes included in the one step can be executed by one device or shared by a plurality of devices.

また、本明細書中に記載された効果はあくまで例示であって限定されるものではなく、他の効果があってもよい。 Moreover, the effects described in this specification are only examples and are not limited, and other effects may be provided.

さらに、本技術は、以下の構成とすることも可能である。 Furthermore, the present technology can also be configured as follows.

（１）
複数のチャネルまたは複数のオブジェクトの符号化されたオーディオ信号、および所定の時間における各前記オーディオ信号の優先度情報を取得する取得部と、
前記優先度情報に基づいて、前記優先度情報に応じた所定の数のチャネルまたはオブジェクトの前記符号化されたオーディオ信号を復号するオーディオ信号復号部と
を備える復号装置。
（２）
前記オーディオ信号復号部は、前記優先度情報により示される優先度合いが所定の度合い以上である、前記符号化されたオーディオ信号を復号する
（１）に記載の復号装置。
（３）
前記取得部は、前記所定の時間における前記複数のチャネルまたは前記複数のオブジェクトのオーディオ信号の前記優先度情報に基づいて、前記所定の度合いを変更する
（２）に記載の復号装置。
（４）
前記取得部は、前記オーディオ信号ごとに複数の前記優先度情報を取得し、
前記オーディオ信号復号部は、前記複数の前記優先度情報のなかから選択された１つの前記優先度情報に基づいて、前記符号化されたオーディオ信号を復号する
（１）乃至（３）の何れか一項に記載の復号装置。
（５）
前記複数の前記優先度情報は、前記符号化されたオーディオ信号の復号側の計算能力に応じて、前記計算能力ごとに生成されたものである
（４）に記載の復号装置。
（６）
前記符号化されたオーディオ信号に基づいて、前記優先度情報を生成する優先度情報生成部をさらに備える
（１）乃至（５）の何れか一項に記載の復号装置。
（７）
前記優先度情報生成部は、前記符号化されたオーディオ信号から得られる、オーディオ信号の音圧またはスペクトル形状に基づいて前記優先度情報を生成する
（６）に記載の復号装置。
（８）
前記オーディオ信号復号部は、チャネルごとまたはオブジェクトごとに、前記所定の時間の前記優先度情報と、前記所定の時間よりも前または後の時間の前記優先度情報とに基づいて、前記所定の時間の前記符号化されたオーディオ信号を復号するかを選択する
（１）に記載の復号装置。
（９）
前記復号が行われた場合、前記復号により得られた信号を出力信号とし、前記復号が行われなかった場合、０データを出力信号として、チャネルごとまたはオブジェクトごとに、前記所定の時間の前記出力信号と、前記所定の時間よりも前または後の時間の前記出力信号とを加算して前記所定の時間のオーディオ信号を生成する加算部と、
チャネルごとまたはオブジェクトごとに、前記所定の時間の前記優先度情報と、前記所定の時間よりも前または後の時間の前記優先度情報とに基づいて、前記所定の時間のオーディオ信号のゲイン調整を行うゲイン調整部と
をさらに備える（１）に記載の復号装置。
（１０）
チャネルごとまたはオブジェクトごとに、前記所定の時間の前記優先度情報と、前記所定の時間よりも前または後の時間の前記優先度情報とに基づいて、高域のパワー値をゲイン調整するとともに、ゲイン調整された前記パワー値と、前記所定の時間のオーディオ信号とに基づいて、前記所定の時間のオーディオ信号の高域成分を生成する高域生成部をさらに備える
（９）に記載の復号装置。
（１１）
チャネルごとまたはオブジェクトごとに、高域のパワー値と、前記所定の時間のオーディオ信号とに基づいて、高域成分が含まれる前記所定の時間のオーディオ信号を生成する高域生成部をさらに備え、
前記ゲイン調整部は、高域成分が含まれる前記所定の時間のオーディオ信号のゲイン調整を行う
（９）に記載の復号装置。
（１２）
前記所定の時間の前記優先度情報に基づいて、オブジェクトのオーディオ信号を複数の各チャネルに所定のゲイン値で割り当てて、前記複数の各チャネルのオーディオ信号を生成するレンダリング部をさらに備える
（１）に記載の復号装置。
（１３）
複数のチャネルまたは複数のオブジェクトの符号化されたオーディオ信号、および所定の時間における各前記オーディオ信号の優先度情報を取得し、
前記優先度情報に基づいて、前記優先度情報に応じた所定の数のチャネルまたはオブジェクトの前記符号化されたオーディオ信号を復号する
ステップを含む復号方法。
（１４）
複数のチャネルまたは複数のオブジェクトの符号化されたオーディオ信号、および所定の時間における各前記オーディオ信号の優先度情報を取得し、
前記優先度情報に基づいて、前記優先度情報に応じた所定の数のチャネルまたはオブジェクトの前記符号化されたオーディオ信号を復号する
ステップを含む処理をコンピュータに実行させるプログラム。
（１５）
複数のチャネルまたは複数のオブジェクトのオーディオ信号の所定の時間における優先度情報を生成する優先度情報生成部と、
前記優先度情報をビットストリームに格納するパッキング部と
を備える符号化装置。
（１６）
前記優先度情報生成部は、前記オーディオ信号ごとに複数の前記優先度情報を生成する
（１５）に記載の符号化装置。
（１７）
前記優先度情報生成部は、符号化された前記オーディオ信号の復号側の計算能力に応じて、前記計算能力ごとに前記優先度情報を生成する
（１６）に記載の符号化装置。
（１８）
前記優先度情報生成部は、前記オーディオ信号の音圧またはスペクトル形状に基づいて前記優先度情報を生成する
（１５）乃至（１７）の何れか一項に記載の符号化装置。
（１９）
前記複数のチャネルまたは前記複数のオブジェクトのオーディオ信号を符号化する符号化部をさらに備え、
前記パッキング部は、前記優先度情報と符号化された前記オーディオ信号とを前記ビットストリームに格納する
（１５）乃至（１８）の何れか一項に記載の符号化装置。
（２０）
複数のチャネルまたは複数のオブジェクトのオーディオ信号の所定の時間における優先度情報を生成し、
前記優先度情報をビットストリームに格納する
ステップを含む符号化方法。
（２１）
複数のチャネルまたは複数のオブジェクトのオーディオ信号の所定の時間における優先度情報を生成し、
前記優先度情報をビットストリームに格納する
ステップを含む処理をコンピュータに実行させるプログラム。
(1)
A decoding device comprising:
an acquisition unit that acquires encoded audio signals of a plurality of channels or a plurality of objects, and priority information of each of the audio signals at a predetermined time; and
an audio signal decoding unit that decodes, based on the priority information, the encoded audio signals of a predetermined number of channels or objects according to the priority information.
(2)
The decoding device according to (1), wherein the audio signal decoding unit decodes the encoded audio signal for which the degree of priority indicated by the priority information is equal to or higher than a predetermined degree.
(3)
The decoding device according to (2), wherein the acquisition unit changes the predetermined degree based on the priority information of the audio signals of the plurality of channels or the plurality of objects at the predetermined time.
(4)
The decoding device according to any one of (1) to (3), wherein
the acquisition unit acquires a plurality of pieces of the priority information for each of the audio signals, and
the audio signal decoding unit decodes the encoded audio signal based on one piece of the priority information selected from the plurality of pieces of the priority information.
(5)
The decoding device according to (4), wherein the plurality of pieces of the priority information are generated for each computational capability, according to the computational capability of the side that decodes the encoded audio signal.
(6)
The decoding device according to any one of (1) to (5), further comprising a priority information generation unit that generates the priority information based on the encoded audio signal.
(7)
The decoding device according to (6), wherein the priority information generation unit generates the priority information based on the sound pressure or the spectral shape of the audio signal obtained from the encoded audio signal.
(8)
The decoding device according to (1), wherein the audio signal decoding unit selects, for each channel or each object, whether to decode the encoded audio signal of the predetermined time, based on the priority information of the predetermined time and the priority information of a time before or after the predetermined time.
(9)
The decoding device according to (1), further comprising:
an addition unit that, for each channel or each object, uses the signal obtained by the decoding as an output signal when the decoding is performed and uses 0 data as the output signal when the decoding is not performed, and adds the output signal of the predetermined time and the output signal of a time before or after the predetermined time to generate the audio signal of the predetermined time; and
a gain adjustment unit that, for each channel or each object, adjusts the gain of the audio signal of the predetermined time based on the priority information of the predetermined time and the priority information of a time before or after the predetermined time.
(10)
The decoding device according to (9), further comprising a high-frequency generation unit that, for each channel or each object, adjusts the gain of a high-frequency power value based on the priority information of the predetermined time and the priority information of a time before or after the predetermined time, and generates a high-frequency component of the audio signal of the predetermined time based on the gain-adjusted power value and the audio signal of the predetermined time.
(11)
The decoding device according to (9), further comprising a high-frequency generation unit that, for each channel or each object, generates the audio signal of the predetermined time containing a high-frequency component, based on a high-frequency power value and the audio signal of the predetermined time,
wherein the gain adjustment unit adjusts the gain of the audio signal of the predetermined time containing the high-frequency component.
(12)
The decoding device according to (1), further comprising a rendering unit that, based on the priority information of the predetermined time, assigns the audio signal of an object to each of a plurality of channels with a predetermined gain value to generate the audio signal of each of the plurality of channels.
(13)
A decoding method including the steps of:
acquiring encoded audio signals of a plurality of channels or a plurality of objects, and priority information of each of the audio signals at a predetermined time; and
decoding, based on the priority information, the encoded audio signals of a predetermined number of channels or objects according to the priority information.
(14)
A program that causes a computer to execute processing including the steps of:
acquiring encoded audio signals of a plurality of channels or a plurality of objects, and priority information of each of the audio signals at a predetermined time; and
decoding, based on the priority information, the encoded audio signals of a predetermined number of channels or objects according to the priority information.
(15)
An encoding device comprising:
a priority information generation unit that generates priority information, at a predetermined time, of audio signals of a plurality of channels or a plurality of objects; and
a packing unit that stores the priority information in a bitstream.
(16)
The encoding device according to (15), wherein the priority information generation unit generates a plurality of pieces of the priority information for each of the audio signals.
(17)
The encoding device according to (16), wherein the priority information generation unit generates the priority information for each computational capability, according to the computational capability of the side that decodes the encoded audio signal.
(18)
The encoding device according to any one of (15) to (17), wherein the priority information generation unit generates the priority information based on the sound pressure or the spectral shape of the audio signal.
(19)
The encoding device according to any one of (15) to (18), further comprising an encoding unit that encodes the audio signals of the plurality of channels or the plurality of objects,
wherein the packing unit stores the priority information and the encoded audio signals in the bitstream.
(20)
An encoding method including the steps of:
generating priority information, at a predetermined time, of audio signals of a plurality of channels or a plurality of objects; and
storing the priority information in a bitstream.
(21)
A program that causes a computer to execute processing including the steps of:
generating priority information, at a predetermined time, of audio signals of a plurality of channels or a plurality of objects; and
storing the priority information in a bitstream.
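Configurations (2) and (3) can be pictured with a short sketch. The first function follows configuration (2) directly: a signal is decoded when its priority is at or above the predetermined degree. The second function is only one hypothetical way the predetermined degree could be changed from the priority information, here by capping the number of decoded signals; neither the name `adapt_threshold` nor this capping rule appears in the source, and integer priority values are assumed.

```python
def select_signals_to_decode(priorities, threshold):
    """Indices of channels/objects whose priority is at or above the threshold."""
    return [i for i, p in enumerate(priorities) if p >= threshold]


def adapt_threshold(priorities, max_decodable):
    """Hypothetical rule: pick a threshold so that at most max_decodable signals pass.

    Assumes integer priorities. Ties at the cut-off are excluded, so fewer than
    max_decodable signals may be selected.
    """
    if max_decodable <= 0:
        return max(priorities, default=0) + 1  # nothing passes
    ranked = sorted(priorities, reverse=True)
    if len(ranked) <= max_decodable:
        return min(ranked, default=0)  # everything passes
    # Threshold just above the first signal that must be excluded.
    return ranked[max_decodable] + 1


priorities = [7, 2, 5, 5, 1]
threshold = adapt_threshold(priorities, max_decodable=3)
to_decode = select_signals_to_decode(priorities, threshold)
```

A decoder with less computational capacity would pass a smaller `max_decodable`, which raises the threshold and reduces the number of signals that go through full decoding.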

１１符号化装置，２１チャネルオーディオ符号化部，２２オブジェクトオーディオ符号化部，２３メタデータ入力部，２４パッキング部，５１符号化部，５２優先度情報生成部，６１ MDCT部，９１符号化部，９２優先度情報生成部，１０１ MDCT部，１５１復号装置，１６１アンパッキング／復号部，１６２レンダリング部，１６３ミキシング部，１９１優先度情報取得部，１９３チャネルオーディオ信号復号部，１９４出力選択部，１９６ IMDCT部，１９８オブジェクトオーディオ信号復号部，１９９出力選択部，２０１ IMDCT部，２３１優先度情報生成部，２３２優先度情報生成部，２７１オーバーラップ加算部，２７２ゲイン調整部，２７３ SBR処理部，２７４オーバーラップ処理部，２７５ゲイン調整部，２７６ SBR処理部 11 encoding device, 21 channel audio encoding unit, 22 object audio encoding unit, 23 metadata input unit, 24 packing unit, 51 encoding unit, 52 priority information generation unit, 61 MDCT unit, 91 encoding unit, 92 priority information generation unit, 101 MDCT unit, 151 decoding device, 161 unpacking/decoding unit, 162 rendering unit, 163 mixing unit, 191 priority information acquisition unit, 193 channel audio signal decoding unit, 194 output selection unit, 196 IMDCT unit, 198 object audio signal decoding unit, 199 output selection unit, 201 IMDCT unit, 231 priority information generation unit, 232 priority information generation unit, 271 overlap addition unit, 272 gain adjustment unit, 273 SBR processing unit, 274 overlap processor, 275 gain adjuster, 276 SBR processor

本技術は復号装置および方法、並びにプログラムに関し、特に、オーディオ信号の復号の計算量を低減させることができるようにした復号装置および方法、並びにプログラムに関する。 TECHNICAL FIELD The present technology relates to a decoding device, method, and program, and more particularly to a decoding device, method, and program capable of reducing the amount of calculation for decoding an audio signal.

本技術の第１の側面の復号装置は、チャネルまたはオブジェクトの符号化されたオーディオ信号、および前記オーディオ信号の優先度情報を取得する取得部と、前記優先度情報に基づいて、前記優先度情報に応じた所定の数のチャネルまたはオブジェクトの前記符号化されたオーディオ信号を復号するオーディオ信号復号部と、チャネルごとまたはオブジェクトごとに、高域のパワー値とオーディオ信号とに基づいてSBR処理を行い、高域成分が含まれるオーディオ信号を生成するSBR処理部と、前記SBR処理により生成されたオブジェクトのオーディオ信号を複数の各チャネルに所定のゲイン値で割り当てて、前記複数の各チャネルのオーディオ信号を生成するレンダリング部とを備える。 A decoding device according to a first aspect of the present technology includes: an acquisition unit that acquires an encoded audio signal of a channel or an object and priority information of the audio signal; an audio signal decoding unit that decodes, based on the priority information, the encoded audio signals of a predetermined number of channels or objects according to the priority information; an SBR processing unit that performs SBR processing for each channel or each object based on a high-frequency power value and the audio signal, and generates an audio signal containing a high-frequency component; and a rendering unit that assigns the audio signal of an object generated by the SBR processing to each of a plurality of channels with a predetermined gain value to generate the audio signal of each of the plurality of channels.

本技術の第１の側面の復号方法またはプログラムは、チャネルまたはオブジェクトの符号化されたオーディオ信号、および前記オーディオ信号の優先度情報を取得し、前記優先度情報に基づいて、前記優先度情報に応じた所定の数のチャネルまたはオブジェクトの前記符号化されたオーディオ信号を復号し、チャネルごとまたはオブジェクトごとに、高域のパワー値とオーディオ信号とに基づいてSBR処理を行い、高域成分が含まれるオーディオ信号を生成し、前記SBR処理により生成されたオブジェクトのオーディオ信号を複数の各チャネルに所定のゲイン値で割り当てて、前記複数の各チャネルのオーディオ信号を生成するステップを含む。 A decoding method or program according to the first aspect of the present technology includes the steps of: acquiring an encoded audio signal of a channel or an object and priority information of the audio signal; decoding, based on the priority information, the encoded audio signals of a predetermined number of channels or objects according to the priority information; performing SBR processing for each channel or each object based on a high-frequency power value and the audio signal to generate an audio signal containing a high-frequency component; and assigning the audio signal of an object generated by the SBR processing to each of a plurality of channels with a predetermined gain value to generate the audio signal of each of the plurality of channels.

本技術の第１の側面においては、チャネルまたはオブジェクトの符号化されたオーディオ信号、および前記オーディオ信号の優先度情報が取得され、前記優先度情報に基づいて、前記優先度情報に応じた所定の数のチャネルまたはオブジェクトの前記符号化されたオーディオ信号が復号され、チャネルごとまたはオブジェクトごとに、高域のパワー値とオーディオ信号とに基づいてSBR処理が行われ、高域成分が含まれるオーディオ信号が生成され、前記SBR処理により生成されたオブジェクトのオーディオ信号が複数の各チャネルに所定のゲイン値で割り当てられて、前記複数の各チャネルのオーディオ信号が生成される。 In the first aspect of the present technology, an encoded audio signal of a channel or an object and priority information of the audio signal are acquired; based on the priority information, the encoded audio signals of a predetermined number of channels or objects according to the priority information are decoded; SBR processing is performed for each channel or each object based on a high-frequency power value and the audio signal, so that an audio signal containing a high-frequency component is generated; and the audio signal of an object generated by the SBR processing is assigned to each of a plurality of channels with a predetermined gain value, so that the audio signal of each of the plurality of channels is generated.

Claims

A decoding device comprising:
an acquisition unit that acquires encoded audio signals of a plurality of channels or a plurality of objects, and priority information of each of the audio signals at a predetermined time; and
an audio signal decoding unit that decodes, based on the priority information, the encoded audio signals of a predetermined number of channels or objects according to the priority information.

The decoding device according to claim 1, wherein the audio signal decoding section decodes the encoded audio signal whose priority level indicated by the priority level information is equal to or higher than a predetermined level.

The decoding device according to claim 2, wherein the acquisition unit changes the predetermined degree based on the priority information of the audio signals of the plurality of channels or the plurality of objects at the predetermined time.

The decoding device according to claim 1, wherein
the acquisition unit acquires a plurality of pieces of the priority information for each of the audio signals, and
the audio signal decoding unit decodes the encoded audio signal based on one piece of the priority information selected from the plurality of pieces of the priority information.

The decoding device according to claim 4, wherein the plurality of pieces of the priority information are generated for each computational capability, according to the computational capability of the side that decodes the encoded audio signal.

The decoding device according to claim 1, further comprising a priority information generating section that generates the priority information based on the encoded audio signal.

The decoding device according to claim 6, wherein the priority information generating section generates the priority information based on sound pressure or spectral shape of the audio signal obtained from the encoded audio signal.

The decoding device according to claim 1, wherein the audio signal decoding unit selects, for each channel or each object, whether to decode the encoded audio signal of the predetermined time, based on the priority information of the predetermined time and the priority information of a time before or after the predetermined time.

The decoding device according to claim 1, further comprising:
an addition unit that, for each channel or each object, uses the signal obtained by the decoding as an output signal when the decoding is performed and uses 0 data as the output signal when the decoding is not performed, and adds the output signal of the predetermined time and the output signal of a time before or after the predetermined time to generate the audio signal of the predetermined time; and
a gain adjustment unit that, for each channel or each object, adjusts the gain of the audio signal of the predetermined time based on the priority information of the predetermined time and the priority information of a time before or after the predetermined time.

The decoding device according to claim 9, further comprising a high-frequency generation unit that, for each channel or each object, adjusts the gain of a high-frequency power value based on the priority information of the predetermined time and the priority information of a time before or after the predetermined time, and generates a high-frequency component of the audio signal of the predetermined time based on the gain-adjusted power value and the audio signal of the predetermined time.

The decoding device according to claim 9, further comprising a high-frequency generation unit that, for each channel or each object, generates the audio signal of the predetermined time containing a high-frequency component, based on a high-frequency power value and the audio signal of the predetermined time,
wherein the gain adjustment unit adjusts the gain of the audio signal of the predetermined time containing the high-frequency component.

The decoding device according to claim 1, further comprising a rendering unit that, based on the priority information of the predetermined time, assigns the audio signal of an object to each of a plurality of channels with a predetermined gain value to generate the audio signal of each of the plurality of channels.

A decoding method comprising the steps of:
acquiring encoded audio signals of a plurality of channels or a plurality of objects, and priority information of each of the audio signals at a predetermined time; and
decoding, in accordance with the priority information, the encoded audio signals of a predetermined number of the channels or objects.
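
The method above can be sketched as follows. This is a minimal illustration that assumes a larger priority value means higher importance, that the "predetermined number" is a fixed budget `max_count`, and that signals left undecoded are replaced by zero data. The `decode` callable stands in for a real audio decoder and, like every other name here, is hypothetical.

```python
from typing import Callable, Dict, List


def select_for_decoding(priorities: Dict[str, int], max_count: int) -> List[str]:
    """Pick up to max_count signals, highest priority first (ties broken by name)."""
    ranked = sorted(priorities, key=lambda name: (-priorities[name], name))
    return ranked[:max_count]


def decode_frame(
    encoded: Dict[str, bytes],
    priorities: Dict[str, int],
    max_count: int,
    decode: Callable[[bytes], List[float]],
    frame_len: int,
) -> Dict[str, List[float]]:
    """Decode only the selected signals; every other signal becomes zero data."""
    chosen = set(select_for_decoding(priorities, max_count))
    return {
        name: decode(data) if name in chosen else [0.0] * frame_len
        for name, data in encoded.items()
    }


# Usage with a stand-in decoder that simply maps bytes to floats.
encoded = {"L": b"\x01\x02", "R": b"\x03\x04", "obj1": b"\x05\x06", "obj2": b"\x07\x08"}
priorities = {"L": 7, "R": 7, "obj1": 2, "obj2": 5}
result = decode_frame(encoded, priorities, 3, lambda d: [float(b) for b in d], 2)
```

Because the selection uses only the priority information, the expensive decoding step is never run for the signals that fall outside the budget.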

A program that causes a computer to execute processing comprising the steps of:
acquiring encoded audio signals of a plurality of channels or a plurality of objects, and priority information of each of the audio signals at a predetermined time; and
decoding, in accordance with the priority information, the encoded audio signals of a predetermined number of the channels or objects.

An encoding device comprising:
a priority information generation unit that generates priority information, at a predetermined time, of audio signals of a plurality of channels or a plurality of objects; and
a packing unit that stores the priority information in a bitstream.

The encoding device according to claim 15, wherein the priority information generation unit generates a plurality of pieces of the priority information for each of the audio signals.

The encoding device according to claim 16, wherein the priority information generation unit generates the priority information for each computational capability of a decoding side of the encoded audio signal.

The encoding device according to claim 15, wherein the priority information generation unit generates the priority information based on the sound pressure or the spectral shape of the audio signal.
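
As one way to picture priority generation from sound pressure, the sketch below maps the RMS level of a frame to an integer priority. The 8-level scale, the -70 dB lower bound, and the linear mapping are assumptions chosen for illustration; the claim only states that sound pressure or spectral shape is used.

```python
import math
from typing import Sequence


def rms_dbfs(frame: Sequence[float], floor_db: float = -120.0) -> float:
    """RMS level of a frame in dB relative to full scale (1.0), clamped at floor_db."""
    if not frame:
        return floor_db
    mean_sq = sum(x * x for x in frame) / len(frame)
    if mean_sq <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_sq))


def priority_from_level(frame: Sequence[float], min_db: float = -70.0, levels: int = 8) -> int:
    """Map the frame level linearly onto priorities 0 .. levels-1 (assumed scale)."""
    db = rms_dbfs(frame)
    if db <= min_db:
        return 0                      # effectively silent: lowest priority
    if db >= 0.0:
        return levels - 1             # at or above full scale: highest priority
    fraction = (db - min_db) / (0.0 - min_db)
    return min(levels - 1, int(fraction * levels))
```

Under these assumptions, a silent frame receives priority 0, so a decoder with limited computational capability can skip it first.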

The encoding device according to claim 15, further comprising an encoding unit that encodes the audio signals of the plurality of channels or the plurality of objects,
wherein the packing unit stores the priority information and the encoded audio signals in the bitstream.
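
The packing described in this claim might look like the following sketch. The byte layout (a one-byte signal count, then for each signal a one-byte priority, a two-byte big-endian payload length, and the payload) is purely hypothetical and is not the bitstream syntax of the patent or of any standard; it only shows priority information stored alongside encoded data so that a decoder can read it without decoding the payload.

```python
import struct
from typing import List, Sequence, Tuple


def pack_frame(items: Sequence[Tuple[int, bytes]]) -> bytes:
    """Store (priority, encoded payload) pairs in one frame using the hypothetical layout."""
    out = bytearray([len(items)])
    for priority, payload in items:
        out += struct.pack(">BH", priority, len(payload))  # priority, payload length
        out += payload
    return bytes(out)


def read_priorities(stream: bytes) -> List[int]:
    """Read every priority from a packed frame while skipping over the payloads."""
    count = stream[0]
    pos = 1
    priorities = []
    for _ in range(count):
        priority, length = struct.unpack_from(">BH", stream, pos)
        priorities.append(priority)
        pos += 3 + length  # 3 header bytes plus the payload that is skipped
    return priorities
```

Keeping the priority in a fixed-size header ahead of each payload is a design choice of this sketch: it lets the decoding side decide what to decode before touching any encoded audio data.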

An encoding method comprising the steps of:
generating priority information, at a predetermined time, of audio signals of a plurality of channels or a plurality of objects; and
storing the priority information in a bitstream.

A program that causes a computer to execute processing comprising the steps of:
generating priority information, at a predetermined time, of audio signals of a plurality of channels or a plurality of objects; and
storing the priority information in a bitstream.