JP2019144527A

JP2019144527A - Audio signal coding apparatus, audio signal decoding apparatus, audio signal coding method, and audio signal decoding method

Info

Publication number: JP2019144527A
Application number: JP2018203458A
Authority: JP
Inventors: Takuya Kawashima (河嶋 拓也); Hiroyuki Ebara (江原 宏幸)
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2014-07-25
Filing date: 2018-10-30
Publication date: 2019-08-29
Also published as: ES2707337T3; PT3413307T; PT3174050T; TR201901476T4; ZA201701428B; ES2823250T3; JP2019070823A; JP6957444B2

Abstract

To provide a coding technique and a decoding technique for realizing high-quality audio signals while reducing the overall bit rate. SOLUTION: An audio signal coding apparatus (100) includes: a time-frequency transformer (101) that outputs sub-band spectra from an input signal; a sub-band energy quantizer (102); a tonality calculator (103) that analyzes the tonality of the sub-band spectra; a bit allocator (104) that, on the basis of the tonality analysis result and the quantized sub-band energy, selects a second sub-band to be quantized by a second quantizer and determines a first number of bits to be allocated to a first sub-band to be quantized by a first quantizer; the first quantizer (106), which performs coding using the first number of bits; the second quantizer (107), which performs coding using a pitch filter; and a multiplexer (108). SELECTED DRAWING: Figure 1

Description

The present disclosure relates to encoding and decoding techniques that improve the sound quality of acoustic signals such as speech signals and music signals.

An encoding technique that compresses acoustic signals at a low bit rate is important for making effective use of radio waves and similar resources in mobile communication. In recent years, expectations for higher call-voice quality have also grown, and call services with a strong sense of presence are desired. This could be achieved by encoding an acoustic signal with a wide frequency band at a high bit rate, but that approach conflicts with the effective use of radio waves and frequency bands.

Here, as an example, consider the acoustic signal encoding technique adopted in the G.719 standard (Non-Patent Document 1).

In the G.719 standard, when an acoustic signal is encoded, bits are allocated to the spectrum obtained by frequency-transforming the signal. Specifically, the spectrum is divided into subbands each having a predetermined frequency bandwidth, and units (a unit being the granularity of the required number of bits) for quantization by lattice vector quantization are distributed in descending order of subband energy, as follows.

(1) One unit is allocated to the subband with the largest energy among all subbands.

Because one bit is allocated per spectral sample, one unit is 8 bits if, for example, the subband contains 8 spectral samples. (The maximum number of bits that can be allocated per spectral sample is 9, so with 8 spectral samples up to 72 bits can ultimately be assigned.)

(2) The quantized subband energy of the subband that received the unit is lowered by 2 levels (6 dB). If the bit allocation to that subband has exceeded the maximum value (9 bits), the subband is excluded from the quantization targets in subsequent iterations.

(3) The process returns to (1) and the same processing is repeated.
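Steps (1) to (3) can be sketched as a greedy loop. This is a minimal illustration and not the normative G.719 procedure: the function name, the energy values in dB, and the rule of retiring a subband once it holds 9 bits per sample or can no longer afford a unit are assumptions made for the example.

```python
def greedy_unit_allocation(energies_db, widths, total_bits,
                           max_bits_per_sample=9, drop_db=6.0):
    """Repeatedly give one unit (one bit per spectral sample) to the strongest subband."""
    working = list(energies_db)             # energies, lowered as units are granted
    bits_per_sample = [0] * len(widths)     # bits per spectral sample in each subband
    active = set(range(len(widths)))        # subbands still eligible for a unit
    remaining = total_bits
    while active:
        # Step (1): pick the eligible subband with the largest current energy.
        best = max(sorted(active), key=lambda b: working[b])
        unit = widths[best]                 # one unit = one bit for every sample
        if unit > remaining:
            active.discard(best)            # assumption: drop bands we cannot afford
            continue
        bits_per_sample[best] += 1
        remaining -= unit
        # Step (2): lower the energy by 2 levels (6 dB); retire saturated subbands.
        working[best] -= drop_db
        if bits_per_sample[best] >= max_bits_per_sample:
            active.discard(best)
        # Step (3): loop back to step (1).
    return bits_per_sample, remaining


# Three 8-sample subbands and a budget of only 40 bits (5 units).
allocation, leftover = greedy_unit_allocation([30.0, 20.0, 6.0], [8, 8, 8], 40)
print(allocation, leftover)
```

With this small budget the weakest subband receives no bits at all, which is the situation the low-bit-rate discussion below is concerned with.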

FIG. 6 shows the subband energy of each subband. The horizontal axis represents frequency, and the vertical axis represents amplitude on a logarithmic scale. In the figure, each subband energy is drawn as a horizontal line rather than a point; the width of each line represents the frequency bandwidth of that subband.

FIGS. 7 and 8 show examples of the bit allocation to each subband when the encoding method defined in the G.719 standard is used. In each figure, the horizontal axis represents frequency, and the vertical axis represents the number of allocated bits. FIG. 7 shows the case of a bit rate of 128 kbit/s, and FIG. 8 the case of 64 kbit/s.

At 128 kbit/s the bit budget is ample, so the maximum of 9 bits can be allocated to many subbands (spectra), and the acoustic signal can be kept at high quality.

At 64 kbit/s, by contrast, no subband receives the maximum of 9 bits, but no subband is left without bits either. Degradation of the acoustic signal quality is suppressed while radio waves and frequency bands are still used effectively.

Patent Document 1: Japanese Translation of PCT Application (Tokuhyo) No. 2013-534328
Patent Document 2: International Publication No. 2005/027095

Non-Patent Document 1: ITU-T Standard G.719, 2008

However, radio waves and frequency bands need to be used even more effectively. When the above method adopted in the G.719 standard is used to encode an acoustic signal with a sampling frequency of about 32 kHz at a low bit rate of about 20 kbit/s or less, the units (bits) needed to quantize all subbands can no longer be secured.

FIG. 9 shows an example of the bit allocation to each subband when the encoding method defined in the G.719 standard is used at 20 kbit/s. As shown, bits cannot be allocated to the high-frequency part, nor, in some cases, even to the perceptually important low-frequency part. The spectra in those subbands then cannot be encoded, and the quality of the acoustic signal degrades markedly.

To address this, adopting a method that dynamically changes how bits are allocated is conceivable (Patent Document 1).

However, there is a limit to how far quality degradation of the acoustic signal can be countered by changing only the bit allocation method while keeping a single, unchanged encoding (quantization) method.

The present disclosure provides an encoding technique and a decoding technique for realizing a high-quality acoustic signal while reducing the overall bit rate.

An acoustic signal encoding device according to the present disclosure comprises: a time-frequency conversion unit that converts an input acoustic signal into the frequency domain to generate a spectrum, divides the spectrum into subbands, one per predetermined frequency band, and outputs subband spectra; a subband energy quantization unit that obtains a quantized subband energy for each subband; a tonality calculation unit that analyzes the tonality of the subband spectra and outputs the analysis result; a bit allocation unit that, on the basis of the tonality analysis result and the quantized subband energy, selects from among the subbands a second subband to be quantized by a second quantization unit and determines a first number of bits to be allocated to a first subband to be quantized by a first quantization unit; and a multiplexing unit that multiplexes and outputs information including the encoded information output from the first quantization unit and the second quantization unit, the quantized subband energy, and the tonality analysis result. The first quantization unit pulse-codes the subband spectrum contained in the first subband using the first number of bits, and the second quantization unit encodes the subband spectrum contained in the second subband using a pitch filter.

These general or specific aspects may be implemented as a system, a method, an integrated circuit, or a computer program, or as any combination of a system, an apparatus, a method, an integrated circuit, and a computer program.

According to the encoding device, the decoding device, and the like of the present disclosure, a high-quality acoustic signal can be encoded and decoded while the overall bit rate is reduced.

FIG. 1: Configuration diagram of the encoding device according to Embodiment 1 of the present disclosure
FIG. 2: Detailed configuration diagram of the bit allocation unit of the encoding device according to Embodiment 1 of the present disclosure
FIG. 3: Explanatory diagram showing the operation of the encoding device according to Embodiment 1 of the present disclosure
FIG. 4: Configuration diagram of the decoding device according to Embodiment 2 of the present disclosure
FIG. 5: Detailed configuration diagram of the bit allocation unit of the decoding device according to Embodiment 2 of the present disclosure
FIG. 6: Explanatory diagram of the subband energy in a prior-art encoding device
FIG. 7: Explanatory diagram of the bit allocation to subbands in a prior-art encoding device
FIG. 8: Explanatory diagram of the bit allocation to subbands in a prior-art encoding device
FIG. 9: Explanatory diagram of the bit allocation to subbands in a prior-art encoding device

The configuration and operation of embodiments of the present disclosure are described below with reference to the drawings. The acoustic signal, which is the input signal to the encoding device and the output signal from the decoding device of the present disclosure, is a concept that covers speech signals, music signals with a wider band, and signals in which these are mixed.

In the present disclosure, an "input acoustic signal" is a concept that covers music signals, speech signals, and signals in which both are mixed. "Quantized subband energy" is the quantized form of the subband energy, which is the sum or average of the energy of the subband spectrum within a subband; the subband energy can be obtained, for example, as the sum of squares of the subband spectrum within the subband. "Tonality" refers to the degree to which the spectrum peaks at specific frequency components, and the analysis result can be expressed as a numerical value, a code, or the like. "Pulse coding" refers to coding that approximates a spectrum using pulses.

"Relatively low" means lower when subbands are compared with one another, for example lower than the average over all subbands or lower than a predetermined value. A "high-frequency subband" is a subband located on the high-frequency side among the plurality of subbands.

The terms first (spectrum) quantization unit, second (spectrum) quantization unit, first (spectrum) decoding unit, second (spectrum) decoding unit, first subband, second subband, third subband, fourth subband, first number of bits, second number of bits, third number of bits, and fourth number of bits, as used in the embodiments and the claims, each denote a category and do not imply an order.

(Embodiment 1)
FIG. 1 is a block diagram showing the configuration and operation of acoustic signal encoding device 100 according to Embodiment 1. Acoustic signal encoding device 100 shown in FIG. 1 comprises time-frequency conversion unit 101, subband energy quantization unit 102, tonality calculation unit 103, bit allocation unit 104, normalization unit 105, first spectrum quantization unit 106, second spectrum quantization unit 107, and multiplexing unit 108. Antenna A is connected to multiplexing unit 108. Acoustic signal encoding device 100 and antenna A together constitute a terminal device or a base station device.

Time-frequency conversion unit 101 converts the time-domain input acoustic signal into the frequency domain to generate an input acoustic signal spectrum (hereinafter, "spectrum"). An example of the time-frequency transform is the MDCT (modified discrete cosine transform), but the transform is not limited to this; for example, the DCT (discrete cosine transform), the DFT (discrete Fourier transform), or the Fourier transform may be used.
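For orientation, the MDCT named above can be written directly from its textbook definition. The sketch below is an unoptimized O(N^2) form with no analysis window and no overlap handling; the function name and the frame length are assumptions for illustration, and a real codec would use a windowed, fast implementation.

```python
import numpy as np


def mdct(frame):
    """Direct-form MDCT: maps a frame of 2N time samples to N coefficients."""
    frame = np.asarray(frame, dtype=float)
    two_n = frame.shape[0]
    half = two_n // 2                        # N, the number of output coefficients
    n = np.arange(two_n)                     # time index 0 .. 2N-1
    k = np.arange(half)                      # frequency index 0 .. N-1
    # X[k] = sum_n x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    basis = np.cos(np.pi / half * np.outer(k + 0.5, n + 0.5 + half / 2))
    return basis @ frame


spectrum = mdct(np.sin(2 * np.pi * 3 * np.arange(32) / 32))
print(spectrum.shape)
```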

Time-frequency conversion unit 101 also divides the spectrum into subbands, each of which is a predetermined frequency band. The predetermined frequency bands may be equally spaced, or they may have different widths, for example wider in the high-frequency range and narrower in the low-frequency range.

Time-frequency conversion unit 101 then outputs the spectrum divided per subband, as subband spectra, to subband energy quantization unit 102, tonality calculation unit 103, and normalization unit 105.

Subband energy quantization unit 102 obtains, for each subband, the subband energy, which is the energy of the subband spectrum, and quantizes it to obtain the quantized subband energy. Specifically, the subband energy can be obtained as the sum of squares of the subband spectrum within the subband, but the calculation is not limited to this. For example, the subband energy can be obtained by integrating the amplitude of the subband spectrum over each subband. When the subband energy is averaged, the sum of squares is divided by the number of spectral samples in the subband (the subband width). The subband energy obtained in this way is then quantized with a predetermined step size.
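The subband split and the energy quantization described above can be sketched as follows. The band edges, the 3 dB step size, and the choice to quantize in the log domain are assumptions made for illustration; the text only requires a sum of squares (optionally averaged) quantized with a predetermined step.

```python
import math

import numpy as np


def split_subbands(spectrum, edges):
    """Cut a spectrum into subbands; edges[i]..edges[i+1] delimits subband i."""
    spectrum = np.asarray(spectrum, dtype=float)
    return [spectrum[lo:hi] for lo, hi in zip(edges[:-1], edges[1:])]


def quantized_subband_energy(subband, step_db=3.0, average=True):
    """Return (quantization index, quantized level in dB) for one subband."""
    energy = float(np.sum(np.asarray(subband, dtype=float) ** 2))  # sum of squares
    if average:
        energy /= len(subband)              # divide by the subband width
    level_db = 10.0 * math.log10(max(energy, 1e-12))   # guard against log(0)
    index = int(round(level_db / step_db))  # uniform quantization in the dB domain
    return index, index * step_db


# Unequal widths: a narrow low band and a wider high band.
bands = split_subbands([1.0] * 4 + [10.0] * 8, edges=[0, 4, 12])
print([quantized_subband_energy(b) for b in bands])
```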

The obtained quantized subband energy is output to normalization unit 105 and bit allocation unit 104, and the encoded quantized subband energy, obtained by encoding the quantized subband energy, is output to multiplexing unit 108.

Tonality calculation unit 103 analyzes the subband spectrum contained in each subband and determines its tonality. Tonality refers to the degree to which the spectrum peaks at specific frequency components, and is a concept that includes peakiness, meaning that a prominent peak exists. Quantitatively, it can be obtained, for example, as the ratio between the maximum spectral amplitude in the subband of interest and the average spectral amplitude in that subband; when this value exceeds a predetermined threshold, the spectrum of the subband is defined as having tonality (peakiness). In this embodiment, a peak/tonal flag of 1 is generated when the predetermined threshold is exceeded and a peak/tonal flag of 0 when the value is at or below the threshold, and the flag is output as the analysis result to bit allocation unit 104 and multiplexing unit 108. The ratio itself may of course be output directly as the analysis result.
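A minimal sketch of the peak/tonal flag follows, using the ratio of the maximum amplitude to the average amplitude within a subband. The threshold value of 4.0 and the function name are assumptions; the text does not specify a threshold.

```python
import numpy as np


def peak_tonal_flag(subband, threshold=4.0):
    """Return (flag, ratio): flag is 1 if max/mean amplitude exceeds the threshold."""
    mags = np.abs(np.asarray(subband, dtype=float))
    mean_amp = float(np.mean(mags))
    if mean_amp == 0.0:
        return 0, 0.0                       # silent subband: treated as non-tonal
    ratio = float(np.max(mags)) / mean_amp
    return (1 if ratio > threshold else 0), ratio


print(peak_tonal_flag([0, 0, 0, 8, 0, 0, 0, 0]))   # one dominant peak
print(peak_tonal_flag([1, 1, 1, 1, 1, 1, 1, 1]))   # flat, noise-like
```

A flat subband yields a ratio near 1 and a flag of 0, which marks it as a candidate for the pitch-filter-based quantizer.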

The significance of the tonality calculation unit is as follows.

Under low-bit-rate conditions, a pitch-filter-based method (that is, a method that represents the high-frequency spectrum using the low-frequency spectrum) is effective for efficiently quantizing spectra whose energy is spread over the whole subband, such as noise-like spectra. Therefore, the degree of energy dispersion within a subband is judged from a measure of the peakiness/tonality of the spectrum in the subband (such as the ratio of peak power to average power), and subbands whose spectra are not highly peaky/tonal are made targets of pitch-filter-based quantization.

Bit allocation unit 104 refers to the quantized subband energy of each subband and to the peak/tonal flags, and allocates bits to the subband spectrum of each subband from the bit budget, meaning the total number of bits that can be used for encoding. Specifically, it calculates and determines the first number of bits, which is the number of bits allocated to a first subband (a subband quantized by first spectrum quantization unit 106), and outputs it to first spectrum quantization unit 106 as allocated bit information. It also selects and identifies the second subband (a subband quantized by second spectrum quantization unit 107) and outputs it to second spectrum quantization unit 107 as the quantization mode.

The configuration and operation of bit allocation unit 104 are described in detail later.

In this embodiment, bit allocation unit 104 refers first to the peak/tonal flag and then to the quantized subband energy of each subband, but the order of reference is arbitrary.

The second subband to be quantized by second spectrum quantization unit 107 may be chosen from the entire band. However, because bands with low quantized subband energy and bands with low tonality are generally found mainly in the high-frequency range, the candidates may be limited to subbands in a specific high-frequency range. For example, only the four or five highest-frequency subbands may be considered.

Alternatively, because an acoustic signal usually has high tonality on the low-frequency side and low tonality on the high-frequency side, it is in practice the high-frequency subbands that become targets of pitch-filter-based quantization. For this reason, all subbands from the subband selected by tonality toward the high-frequency side may be made targets of pitch-filter quantization, with only the number of that subband transmitted as the quantization mode.
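One possible reading of this alternative is sketched below: among the candidate high-frequency subbands, the lowest one whose peak/tonal flag is 0 becomes the start band, and every candidate at or above it is handled by the pitch-filter quantizer. The function name and the rule for choosing the start band are assumptions; the text does not fix how the subband is selected by tonality.

```python
def second_subbands(flags, candidate_bands):
    """Return (start band, list of pitch-filter subbands); (None, []) if none qualify."""
    ordered = sorted(candidate_bands)
    for band in ordered:
        if flags[band] == 0:                # first non-tonal candidate found
            # Only `band` needs to be signalled as the quantization mode.
            return band, [b for b in ordered if b >= band]
    return None, []


flags = [1, 1, 1, 1, 0, 1, 0, 0]            # per-subband peak/tonal flags
print(second_subbands(flags, range(3, 8)))  # candidates: the five highest subbands
```

Signalling a single band number costs fewer bits than one flag per subband, at the price of also sending tonal subbands above the start band through the pitch-filter path.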

Normalization unit 105 generates normalized subband spectra by normalizing (dividing) each subband spectrum by the input quantized subband energy. This normalizes the differences in amplitude between subbands. Normalization unit 105 then outputs the normalized subband spectra to first spectrum quantization unit 106 and second spectrum quantization unit 107.
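The normalization can be sketched as a per-subband division. The example assumes that the quantized subband energy is available as a level in dB derived from the average energy, so that the divisor is the corresponding RMS amplitude; this representation and the function name are assumptions, since the text only states that the spectrum is divided by the quantized subband energy.

```python
import numpy as np


def normalize_subband(subband, level_db):
    """Divide a subband spectrum by the amplitude implied by its quantized level."""
    scale = 10.0 ** (level_db / 20.0)       # dB level of mean energy -> RMS amplitude
    return np.asarray(subband, dtype=float) / scale


loud = normalize_subband([10.0, -10.0, 10.0, -10.0], level_db=20.0)
quiet = normalize_subband([0.1, -0.1, 0.1, -0.1], level_db=-20.0)
print(loud, quiet)                          # both come out at unit amplitude
```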

Normalization unit 105 is an optional component.

In this embodiment there is a single normalization unit 105, but two may be provided instead, one placed in front of each of first spectrum quantization unit 106 and second spectrum quantization unit 107.

First spectrum quantization unit 106 is an example of the first quantization unit. Using the first number of bits allocated by bit allocation unit 104, it quantizes those of the input normalized subband spectra that belong to the first subband, that is, the subband to be quantized by first spectrum quantization unit 106. It outputs the quantization result to second spectrum quantization unit 107 as the quantized spectrum, and outputs first encoded information, generated by encoding the quantized spectrum, to multiplexing unit 108.

First spectrum quantization unit 106 uses a pulse coding unit. Examples of the pulse coding unit include a lattice vector quantization unit that performs lattice vector quantization and a pulse coding unit that approximates the subband spectrum with a small number of pulses. In other words, any quantization unit may be used as long as it employs a quantization method suited to spectra with high tonality or a method that quantizes with a small number of pulses.

At very low bit rates, quantization by pulse coding that approximates the subband spectrum with a small number of pulses can be expected to preserve sound quality better than lattice vector quantization.

Second spectrum quantization unit 107 is an example of the second quantization unit and can employ, for example, the following quantization method based on band extension (a prediction model using a pitch filter).

Here, the pitch filter is a processing block that performs the processing expressed by Equation 1 below.

In general, a pitch filter is a filter that emphasizes the pitch period (T) of a time-domain signal (that is, emphasizes the pitch component on the frequency axis); with one tap, it is a digital filter applied to a discrete signal x[i] as expressed, for example, by Equation 1. The pitch filter in this embodiment, however, is defined as a processing block that performs the processing expressed by Equation 1, and it does not necessarily perform pitch emphasis on a time-domain signal.
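Equation 1 itself is not reproduced in this text. For orientation only, a one-tap pitch filter with lag T is conventionally written as the recursion below; the gain symbol β is an assumption and may differ from the notation of the original equation.

```latex
y[i] = x[i] + \beta \, y[i - T]
```

With x[i] = 0, this recursion reduces to y[i] = β·y[i − T], so each output sample is a scaled copy of the sample T positions earlier.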

In this embodiment, the pitch filter (the processing block expressed by Equation 1) is applied to the quantized MDCT coefficient sequence Mq[i]. Specifically, in Equation 1, y[i] (K ≤ i ≤ K', where K' is the upper frequency limit of the MDCT coefficients to be encoded) is calculated with x[i] = 0 (i ≥ K, where K is the lower frequency limit of the MDCT coefficients to be encoded) and y[i] = Mq[i] (i < K). The value of T that minimizes the error between the MDCT coefficients to be encoded, Mt[i], and the calculated y[i] is encoded as lag information. Spectral coding based on such a pitch filter is disclosed in Patent Document 2 and elsewhere.

Second spectrum quantization unit 107 refers to the quantization mode to identify the second subband (normalized subband spectrum) that it is to quantize. This determines K and K'. It then searches for the subband or band of the quantized spectrum (corresponding to Mq[i], i < K) that has the maximum correlation with the normalized subband spectrum of the identified second subband (frequencies K to K'; corresponding to Mt[i], K ≤ i ≤ K'), and generates its position as lag information (corresponding to T). Examples of lag information include the absolute or relative position of a subband or band, or a subband number. Second spectrum quantization unit 107 then encodes the lag information and outputs it to multiplexing unit 108 as second encoded information.
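The lag search can be sketched as an exhaustive loop over candidate lags. This is an illustrative sketch, not the procedure of the patent: the function name, the candidate lag range, and the use of a least-squares gain to make the error independent of scale are assumptions. As noted in the text, the embodiment does not transmit a gain because the decoder can derive it from the subband energies.

```python
import numpy as np


def search_lag(mq_low, mt_target, lags):
    """Return (best lag T, squared error) for predicting mt_target from mq_low.

    mq_low    : quantized spectrum Mq[i] for i < K
    mt_target : spectrum Mt[i] to be encoded for K <= i <= K'
    lags      : candidate lags, each between 1 and K
    """
    low = np.asarray(mq_low, dtype=float)
    target = np.asarray(mt_target, dtype=float)
    k = len(low)
    best_lag, best_err = None, np.inf
    for lag in lags:
        y = np.zeros(k + len(target))
        y[:k] = low                          # y[i] = Mq[i] for i < K
        for i in range(k, len(y)):           # x[i] = 0, so y[i] copies y[i - T]
            y[i] = y[i - lag]
        pred = y[k:]
        energy = float(pred @ pred)
        gain = float(target @ pred) / energy if energy > 0.0 else 0.0
        err = float(np.sum((target - gain * pred) ** 2))
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag, best_err


low_band = [0.0, 0.0, 5.0, 0.0, 0.0, 1.0, 0.0, 0.0]    # K = 8
high_band = [2.5, 0.0, 0.0, 0.5]                       # scaled copy of low_band[2:6]
print(search_lag(low_band, high_band, range(1, 9)))
```

In this toy input the high band is a scaled copy of the low band starting at index 2, so the search recovers the lag that points back to that position.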

In this embodiment, the encoded quantized subband energy is multiplexed by multiplexing unit 108 and transmitted, so the gain can be generated on the decoder side; the gain is therefore not encoded. The gain may, however, be encoded and transmitted. In that case, the gain between the second subband to be quantized and the subband of the quantized spectrum with the maximum correlation is calculated, and second spectrum quantization unit 107 encodes the lag information and the gain and outputs them to multiplexing unit 108 as second encoded information.

High-frequency subbands are generally set wider than low-frequency subbands, and some of the low-frequency subbands to be copied may not have been subject to lattice vector quantization because their energy is small. In such a case, those subbands may be treated as zero spectra, or noise may be added to avoid abrupt spectral changes between subbands.

Multiplexing unit 108 multiplexes the quantized subband energy, the first encoded information, the second encoded information, and the peak/tonal flags, and outputs the result to antenna A as encoded information.

Antenna A then transmits the encoded information toward the acoustic signal decoding device. The encoded information reaches the acoustic signal decoding device via various nodes and base stations.

Next, bit allocation unit 104 is described in detail.

FIG. 2 is a block diagram showing the detailed configuration and operation of bit allocation unit 104 of acoustic signal encoding device 100 according to Embodiment 1. Bit allocation unit 104 shown in FIG. 2 comprises bit reservoir 111, bit reservoir 112, bit allocation calculation unit 113, and quantization mode determination unit 114.

Bit reservoir 111 refers to the peak/tonal flag output by tonality calculation unit 103 and, when the peak/tonal flag is 0, reserves the number of bits needed for the second spectrum quantization performed by second spectrum quantization unit 107.

In this embodiment, the second spectrum quantization is based on a pitch filter, so the number of bits required to encode the lag information is reserved. The reserved bits are subtracted from the bit budget (bit asset), which is the total number of bits available for quantization, and the remaining bit budget is output to bit reservoir 112. Although the bit budget is shown as being supplied from the subband energy quantization unit 102, this merely expresses that the bits left after subtracting those needed for variable-length coding of the quantized subband energy are available to the first spectrum quantization unit 106, the second spectrum quantization unit 107, and the encoding of the peak/tonal flag. The subband energy quantization unit 102 does not necessarily generate the bit budget information itself.

Bit reservoir 112 reserves the bits used for the peak/tonal flag. In this embodiment, for example, the peak/tonal flag is sent for five subbands in the high-frequency region, so bit reservoir 112 reserves 5 bits.

Bit reservoir 112 then subtracts the bits it has reserved from the bit budget received from bit reservoir 111 and outputs the result to the bit allocation calculation unit 113 in the adaptive bit allocation unit. The total number of bits reserved by bit reservoirs 111 and 112 is the third number of bits, and a subband whose peak/tonal flag is 0 corresponds to a third subband.
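The two reservation steps can be illustrated with a minimal sketch. The function name, the per-subband lag cost, and its value of 7 bits are assumptions made here for illustration; the source states only that lag bits are reserved when the flag is 0 and that one flag bit is reserved per high-frequency subband.

```python
from typing import List, Tuple

# Assumed value: the source does not state how many bits the lag code needs.
LAG_BITS_PER_SUBBAND = 7


def reserve_bits(bit_budget: int, peak_tonal_flags: List[int]) -> Tuple[int, int]:
    """Return (remaining_budget, third_bit_count) after both reservoirs.

    peak_tonal_flags holds one flag (0 or 1) per high-frequency subband.
    """
    # Bit reservoir 111: lag bits for each subband whose flag is 0
    # (the "third subbands"); charging them per subband is an assumption.
    lag_bits = LAG_BITS_PER_SUBBAND * sum(1 for f in peak_tonal_flags if f == 0)
    # Bit reservoir 112: one bit per transmitted peak/tonal flag.
    flag_bits = len(peak_tonal_flags)
    third_bit_count = lag_bits + flag_bits
    if third_bit_count > bit_budget:
        raise ValueError("bit budget too small for the reserved bits")
    return bit_budget - third_bit_count, third_bit_count
```

With five high-frequency subbands, two of which carry flag 0, the sketch reserves 14 lag bits plus 5 flag bits and passes the remainder on to the allocation step.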

The order of bit reservoirs 111 and 112 may be swapped. In this embodiment they are shown as separate blocks, but both reservations may be performed at once in a single block, or within the bit allocation calculation unit 113.

The bit allocation calculation unit 113 calculates the bit allocation to the subbands quantized by the first spectrum quantization unit 106. Specifically, it first distributes the number of bits output from bit reservoir 112 among the subbands with reference to the quantized subband energy. As described in the prior-art section, the perceptual importance of each subband is judged from the magnitude of its quantized subband energy, and bits are allocated preferentially to subbands judged important. As a result, no bits are allocated to subbands whose quantized subband energy is zero or below a predetermined value.

During allocation, the input peak/tonal flag is also referenced, and subbands whose flag is 0 (third subbands) are excluded from bit allocation. In other words, bits are allocated only to subbands with high peakiness (here, subbands whose peak/tonal flag is set to 1). The unit identifies the subbands to which bits are allocated (first subbands), combines this with the number of bits allocated to each subband to form allocated-bit information, and first outputs this to the quantization mode determination unit 114.
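The following is a hedged sketch of an energy-driven allocation that skips flag-0 subbands. The greedy rule used here (repeatedly grant a step of bits to the subband with the largest remaining energy, then lower its priority) is an assumed stand-in, not the allocation rule of the source, which only says that bits go preferentially to high-energy subbands. All names and parameters are hypothetical, and low-frequency subbands are assumed to carry flag 1.

```python
from typing import List


def allocate_bits(energies: List[float], flags: List[int], budget: int,
                  step_bits: int = 1, energy_drop: float = 1.0) -> List[int]:
    """Greedy allocation over subbands whose peak/tonal flag is 1.

    energies: quantized subband energies (any monotone scale).
    flags:    1 = eligible for the first quantizer, 0 = excluded (third subband).
    """
    work = list(energies)          # remaining "priority" of each subband
    bits = [0] * len(energies)
    while budget >= step_bits:
        # Only flagged subbands with positive remaining priority compete.
        candidates = [i for i in range(len(work)) if flags[i] == 1 and work[i] > 0]
        if not candidates:
            break
        best = max(candidates, key=lambda i: work[i])
        bits[best] += step_bits
        budget -= step_bits
        work[best] -= energy_drop  # assumed: each grant lowers the priority
    return bits
```

In this sketch a flagged subband with zero energy ends up with no bits, which is exactly the case the quantization mode determination unit 114 later has to handle.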

The quantization mode determination unit 114 receives the allocated-bit information output from the bit allocation calculation unit 113 and the peak/tonal flag. If there is a high-frequency subband that has high tonality (and is therefore a quantization target of the first spectrum quantization unit 106) but has been allocated no bits, the unit redefines it as a subband to be quantized by the second spectrum quantization unit 107 (fourth subband) and outputs the number of bits required for quantization by the second spectrum quantization unit (fourth number of bits) to the bit allocation calculation unit 113 so that it can be subtracted from the allocated-bit information. That is, it assigns to that band the number of bits the second spectrum quantization unit 107 needs and outputs the assigned number (fourth number of bits). Alternatively, it may subtract the assigned number of bits from the bit budget available to the first spectrum quantization unit 106 and output the updated budget to the bit allocation calculation unit 113.

The quantization mode determination unit 114 also identifies the subbands to be quantized by the second spectrum quantization unit 107 and outputs them to that unit as the quantization mode. Specifically, the high-frequency subbands with low tonality (peak/tonal flag of 0; third subbands) and the high-frequency subbands to which no bits were allocated (fourth subbands) are together defined as the subbands quantized by the second spectrum quantization unit 107 (second subbands) and output as the quantization mode.

The bit allocation calculation unit 113 then updates the bit budget by subtracting the number of bits received from the quantization mode determination unit 114 (fourth number of bits) from the number of bits input from bit reservoir 112 (bit budget), and recalculates the bit allocation to the subbands quantized by the first spectrum quantization unit 106. If it instead receives an updated bit budget from the quantization mode determination unit 114, it uses that budget for the recalculation. The first number of bits is ultimately the total number of bits (bit budget) minus the third and fourth numbers of bits.
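The closing relationship of this paragraph can be written compactly. The symbols are introduced here for illustration only and do not appear in the source: $B_{\mathrm{total}}$ is the total bit budget, and $B_1$, $B_3$, $B_4$ are the first, third, and fourth numbers of bits.

$$B_1 = B_{\mathrm{total}} - B_3 - B_4$$

As an example with assumed values, $B_{\mathrm{total}} = 60$, $B_3 = 12$, and $B_4 = 7$ give $B_1 = 60 - 12 - 7 = 41$ bits for the first spectrum quantization unit 106.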

The recalculated number of bits (first number of bits) and the information on the subbands quantized by the first spectrum quantization unit 106 (first subbands) are then output as the allocated-bit information, this time to the first spectrum quantization unit 106.

If the first bit allocation by the bit allocation calculation unit 113 makes recalculation unnecessary, for example because every subband has been allocated bits, the allocated-bit information may be output directly to the first spectrum quantization unit 106.

FIG. 3 is a flowchart showing the operation of the acoustic signal encoding device 100 according to Embodiment 1, specifically the operation of the bit allocation unit 104.

First, the bit allocation unit 104 acquires the quantized subband energy from the subband energy quantization unit 102 (S1).

Next, the bit allocation unit 104 acquires the peak/tonal flag for the high-frequency region from the tonality calculation unit 103 (S2).

Based on the peak/tonal flag, the bit allocation unit 104 then identifies the subbands to be quantized by the second spectrum quantization unit 107 (third subbands) and, in bit reservoirs 111 and 112, reserves the bits for quantization by the second spectrum quantization unit 107 (third number of bits) (S3).

In the bit allocation calculation unit 113, the bit allocation unit 104 determines, based on the quantized subband energy, the number of bits to allocate to the subbands targeted by the first spectrum quantization unit 106 (S4).

In the quantization mode determination unit 114, the bit allocation unit 104 checks the bits allocated to the high-frequency subbands by the bit allocation calculation unit 113, re-identifies as necessary the subbands to be quantized by the second spectrum quantization unit 107 (second subbands), and updates the bit budget for the first spectrum quantization unit 106 (S5).

Finally, the bit allocation unit 104, again in the bit allocation calculation unit 113, uses the updated bit budget to recalculate the bit allocation to the first spectrum quantization unit 106 (first number of bits) (S6).
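Steps S3 to S6 can be sketched as a two-pass control flow. This is a simplified illustration, not the procedure of the source: it covers only the high-frequency subbands, takes the allocator as a parameter, and assumes that every subband handed to the second quantizer costs the same hypothetical `lag_bits`. `threshold_allocate` is a toy allocator included only so the example runs.

```python
from typing import Callable, List, Tuple

Allocator = Callable[[List[float], List[int], int], List[int]]


def run_bit_allocation(energies: List[float], flags: List[int], bit_budget: int,
                       lag_bits: int, allocate: Allocator) -> Tuple[List[int], List[int]]:
    """Return (bits per subband for the first quantizer, second subbands)."""
    # S3: reserve lag bits for flag-0 subbands plus one flag bit per subband.
    third_bits = lag_bits * flags.count(0) + len(flags)
    budget = bit_budget - third_bits
    # S4: first allocation pass over the subbands with flag 1.
    bits = allocate(energies, flags, budget)
    # S5: tonal subbands left with zero bits become fourth subbands.
    fourth = [i for i, f in enumerate(flags) if f == 1 and bits[i] == 0]
    if fourth:
        budget -= lag_bits * len(fourth)  # fourth number of bits (assumed cost)
        # S6: recompute the allocation without the fourth subbands.
        reduced = [0 if i in fourth else f for i, f in enumerate(flags)]
        bits = allocate(energies, reduced, budget)
    second = sorted([i for i, f in enumerate(flags) if f == 0] + fourth)
    return bits, second


def threshold_allocate(energies: List[float], flags: List[int], budget: int) -> List[int]:
    """Toy allocator: equal share to flagged subbands with energy >= 1.0."""
    chosen = [i for i, (e, f) in enumerate(zip(energies, flags)) if f == 1 and e >= 1.0]
    bits = [0] * len(energies)
    for i in chosen:
        bits[i] = budget // len(chosen)
    return bits
```

If the first pass leaves no flagged subband without bits, the second pass is skipped, which mirrors the shortcut the description allows when recalculation is unnecessary.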

As described above, the acoustic signal encoding device of this embodiment can encode acoustic signals with high quality while reducing the overall bit rate.

In particular, the configuration and operation of FIGS. 2 and 3 achieve a bit allocation that maximizes the number of subbands quantized by the first quantization unit without leaving any unquantized subband (one whose bit allocation is 0) in the high-frequency region, where subbands are especially wide. This realizes an adaptive bit allocation that draws out the best performance at a limited bit rate.

(Embodiment 2)
FIG. 4 is a block diagram showing the configuration and operation of the acoustic signal decoding device 200 according to Embodiment 2. The acoustic signal decoding device 200 shown in FIG. 4 comprises a separation unit 201, a subband energy decoding unit 202, a bit allocation unit 203, a first spectrum decoding unit 204, a second spectrum decoding unit 205, a denormalization unit 206, and a frequency-time conversion unit 207. Antenna A is connected to the separation unit 201. Together, the acoustic signal decoding device 200 and antenna A constitute a terminal device or a base station device.

The separation unit 201 receives the encoded information received by antenna A and separates it into the encoded quantized subband energy, the first encoded information, the second encoded information, and the peak/tonal flag. The encoded quantized subband energy is output to the subband energy decoding unit 202, the first encoded information to the first spectrum decoding unit 204, the second encoded information to the second spectrum decoding unit 205, and the peak/tonal flag to the bit allocation unit 203.

The subband energy decoding unit 202 decodes the encoded quantized subband energy to generate decoded quantized subband energy, and outputs it to the bit allocation unit 203 and the denormalization unit 206.

The bit allocation unit 203 refers to the decoded quantized subband energy of each subband and the peak/tonal flag to determine the allocation of bits used by the first spectrum decoding unit 204 and the second spectrum decoding unit 205. Specifically, it determines the number of bits to assign when the first spectrum decoding unit 204 decodes the first encoded information (first number of bits) and the subbands to which those bits are assigned (first subbands), and outputs them as allocated-bit information. It also identifies and selects the subbands in which the second encoded information is to be decoded by the second spectrum decoding unit 205 (second subbands) and outputs them to the second spectrum decoding unit 205 as the quantization mode.

As shown in FIG. 5, the bit allocation unit 203 has the same configuration and operation as the bit allocation unit 104 described for the encoding device, so the description of the bit allocation unit 104 applies for the details of its operation.

The first spectrum decoding unit 204 decodes the first encoded information using the first number of bits indicated in the allocated-bit information to generate a first decoded spectrum, and outputs it to the second spectrum decoding unit 205.

The second spectrum decoding unit 205 decodes the second encoded information for the subbands specified by the quantization mode, using the first decoded spectrum, to generate a second decoded spectrum. It then combines the second decoded spectrum with the first decoded spectrum to generate and output a reproduced spectrum.

The denormalization unit 206 adjusts the amplitude (gain) of the reproduced spectrum with reference to the decoded quantized subband energy and outputs the result to the frequency-time conversion unit 207.
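One way to picture the gain adjustment is the sketch below. It assumes that the decoded subband energies are linear-domain sums of squares and that each subband is rescaled so its energy matches the decoded value; the source states only that the amplitude is adjusted with reference to the decoded quantized subband energy. The function and parameter names are hypothetical.

```python
import math
from typing import List


def denormalize(spectrum: List[float], band_edges: List[int],
                band_energies: List[float]) -> List[float]:
    """Rescale each subband so that its energy equals the decoded energy.

    band_edges has one more entry than band_energies; subband b covers
    the coefficients band_edges[b] .. band_edges[b + 1] - 1.
    """
    out = list(spectrum)
    for b, target in enumerate(band_energies):
        lo, hi = band_edges[b], band_edges[b + 1]
        current = sum(x * x for x in spectrum[lo:hi])
        # An all-zero subband cannot be rescaled; it is left at zero.
        gain = math.sqrt(target / current) if current > 0 else 0.0
        for k in range(lo, hi):
            out[k] = spectrum[k] * gain
    return out
```

A codec that stores energies on a logarithmic scale would convert them to the linear domain before applying this kind of gain.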

The frequency-time conversion unit 207 converts the frequency-domain reproduced spectrum into a time-domain output acoustic signal and outputs it. Examples of the frequency-time conversion include the inverses of the transforms given as examples of the time-frequency conversion.

As described above, the acoustic signal decoding device of this embodiment can decode acoustic signals with high quality while reducing the overall bit rate.

(Summary)
The acoustic signal encoding device and acoustic signal decoding device of the present disclosure have been described above in Embodiments 1 and 2. The encoding device and decoding device of the present disclosure are concepts that cover semi-finished or component-level forms, typified by a system board or a semiconductor element, as well as finished-product forms such as a terminal device or a base station device. When in semi-finished or component-level form, they become a finished product when combined with an antenna, a DA/AD converter, an amplifier, a speaker, a microphone, and the like.

The block diagrams of FIGS. 1, 2, 4, and 5 represent the configuration and operation (method) of purpose-built hardware, and also cover the case where a program for carrying out the operation (method) of the present disclosure is installed on general-purpose hardware and executed by a processor. Examples of general-purpose computers include personal computers, portable information terminals such as smartphones, and mobile phones.

Purpose-built hardware is not limited to finished products (consumer electronics) such as mobile phones and fixed-line phones; it also includes semi-finished products and components such as system boards and semiconductor elements.

The acoustic signal encoding device and acoustic signal decoding device according to the present disclosure can be applied to equipment involved in recording, transmitting, and reproducing acoustic signals.

DESCRIPTION OF SYMBOLS
100 Acoustic signal encoding device
101 Time-frequency conversion unit
102 Subband energy quantization unit
103 Tonality calculation unit
104 Bit allocation unit
105 Normalization unit
106 First spectrum quantization unit
107 Second spectrum quantization unit
108 Multiplexing unit
111 Bit reservoir
112 Bit reservoir
113 Bit allocation calculation unit
114 Quantization mode determination unit
200 Acoustic signal decoding device
201 Separation unit
202 Subband energy decoding unit
203 Bit allocation unit
204 First spectrum decoding unit
205 Second spectrum decoding unit
206 Denormalization unit
207 Frequency-time conversion unit
211 Bit reservoir
212 Bit reservoir
213 Bit allocation calculation unit
214 Quantization mode determination unit

Claims

A first encoding unit that generates a first encoded signal by encoding a low frequency signal having a predetermined frequency or less of a voice or audio input signal, and generates a low frequency decoded signal by decoding the first encoded signal;
A second encoding unit that generates a high frequency encoded signal by encoding, based on the low frequency decoded signal, a signal of higher frequency than the low frequency signal;
A first multiplexing unit that multiplexes the first encoded signal and the high-frequency encoded signal and outputs an encoded signal;
The second encoding unit includes:
calculates an energy ratio between a high frequency noise component, which is a noise component of the high frequency signal, and a high frequency non-tonal component of a high frequency decoded signal generated from the low frequency decoded signal, and outputs the ratio as the high frequency encoded signal,
Encoding device.

An energy calculator that calculates energy of the voice or audio input signal and outputs the energy as quantized band energy; and
The first multiplexing unit multiplexes the quantized band energy, the first encoded signal, and the high frequency encoded signal, and outputs the result.
The encoding device according to claim 1.

The second encoding unit includes:
A separation unit that separates the low frequency decoded signal into a low frequency non-tonal signal that is a non-tonal component of the low frequency decoded signal and a low frequency tonal signal that is a tonal component of the low frequency decoded signal;
A first band extension unit that outputs position information of a specific band that maximizes a correlation between the high-frequency signal and the low-frequency tonal signal as lag information;
A second band extending unit that outputs the low frequency non-tonal signal corresponding to the lag information as a high frequency non-tonal signal;
A first calculator that calculates energy of a high-frequency noise component that is a noise component from the high-frequency signal corresponding to the lag information;
A second calculator that calculates the energy ratio between the high-frequency noise component and the high-frequency non-tonal signal and outputs it as a scale factor;
A second multiplexing unit that multiplexes the lag information and the scale factor and outputs the result as a high frequency encoded signal,
The encoding device according to claim 2.

The second encoding unit includes:
A noise adding unit that adds a noise signal to the low-frequency decoded signal;
The encoding device according to claim 3.

The second encoding unit includes:
A noise addition unit that adds a noise signal to the low-frequency non-tonal signal output from the separation unit;
The encoding device according to claim 3.

A decoding device that receives a first encoded signal obtained by encoding, in an encoding device, a low frequency signal at or below a predetermined frequency of a voice or audio input signal, and a high frequency encoded signal obtained by encoding a signal of higher frequency than the low frequency signal, the decoding device comprising:
A separation unit that separates the first encoded signal and the high-frequency encoded signal;
A first decoding unit for decoding the first encoded signal to generate a low-frequency decoded signal;
A second decoding unit that decodes the highband encoded signal and generates a wideband decoded signal using the lowband decoded signal,
The high frequency encoded signal includes an energy ratio between a high frequency noise component which is a noise component and a high frequency non-tonal component of a high frequency decoded signal generated from the low frequency decoded signal,
The second decoding unit
Adjusting the amplitude of the low frequency non-tonal signal, which is a non-tonal component of the low frequency decoded signal, with reference to the decoded ratio;
Decoding device.

A decoding device that receives a first encoded signal obtained by encoding, in an encoding device, a low frequency signal at or below a predetermined frequency of a voice or audio input signal, a high frequency encoded signal obtained by encoding a signal of higher frequency than the low frequency signal, and a band energy encoded signal, the decoding device comprising:
A first decoding unit for decoding the first encoded signal to generate a low-frequency decoded signal;
A second decoding unit that decodes the high frequency encoded signal and generates a wideband decoded signal using the low frequency decoded signal;
A third decoding unit that decodes the band energy encoded signal to generate quantized band energy,
The second decoding unit
A separation unit that separates the low frequency decoded signal into a low frequency non-tonal signal that is a non-tonal component of the low frequency decoded signal and a low frequency tonal signal that is a tonal component of the low frequency decoded signal;
A first band extension unit that generates a high frequency non-tonal signal by copying the low frequency non-tonal signal to a high frequency using lag information obtained by decoding the high frequency encoded signal;
A first scaling unit that adjusts an amplitude of the high-frequency non-tonal signal using a scale factor obtained by decoding the high-frequency encoded signal;
From the energy of the high frequency non-tonal signal and the quantization band energy, a tonal signal energy estimation unit that estimates the energy of the high frequency tonal signal;
A first combining unit that combines the low frequency non-tonal signal and the high frequency non-tonal signal to generate a wideband non-tonal signal;
A second band extending unit that generates a high frequency tonal signal by copying the low frequency tonal signal to a high frequency using the lag information;
A second scaling unit that adjusts the amplitude of the high frequency tonal signal based on the energy of the high frequency tonal signal;
A second combining unit that combines the low frequency tonal signal and the high frequency tonal signal whose amplitude is adjusted to generate a wideband tonal signal;
An adder that adds the wideband non-tonal signal and the wideband tonal signal to generate a wideband decoded signal;
The lag information is position information of a specific band that maximizes the correlation between the high frequency signal and the low frequency tonal signal,
The scale factor is an energy ratio between a high-frequency noise component that is a noise component of a high-frequency signal corresponding to the lag information and a high-frequency non-tonal signal.
Decoding device.

The second decoding unit
A noise adding unit that adds a noise signal to the low-frequency decoded signal;
The decoding device according to claim 6.

The second decoding unit
A noise addition unit that adds a noise signal to the low-frequency non-tonal signal output from the separation unit;
The decoding device according to claim 6.

A terminal device comprising the encoding device according to claim 1.

A terminal device comprising the decoding device according to claim 6.

Encoding a low frequency signal of a voice or audio input signal below a predetermined frequency to generate a first encoded signal;
Decoding the first encoded signal to generate a low-frequency decoded signal;
Encoding, based on the low frequency decoded signal, a signal of higher frequency than the low frequency signal to generate a high frequency encoded signal;
Calculating an energy ratio between a high frequency noise component which is a noise component of the high frequency signal and a high frequency non-tonal component of the high frequency decoded signal generated from the low frequency decoded signal;
Multiplexing the first encoded signal and the high frequency encoded signal including the ratio to output an encoded signal;
Encoding method.

The encoding method according to claim 12 includes:
Calculate the energy of the voice or audio input signal and output it as quantized band energy,
Separating the low frequency decoded signal into a low frequency non-tonal signal that is a non-tonal component of the low frequency decoded signal and a low frequency tonal signal that is a tonal component of the low frequency decoded signal;
The position information of a specific band that maximizes the correlation between the high frequency signal and the low frequency tonal signal is output as lag information,
The low frequency non-tonal signal corresponding to the lag information is output as a high frequency non-tonal signal,
From the high frequency signal corresponding to the lag information, calculate the energy of the high frequency noise component that is a noise component,
Calculating an energy ratio between the high-frequency noise component and the high-frequency non-tonal signal and outputting it as a scale factor;
Encoding method.

For a first encoded signal obtained by encoding, in an encoding device, a low frequency signal at or below a predetermined frequency of a voice or audio input signal, and a high frequency encoded signal obtained by encoding a signal of higher frequency than the low frequency signal,
Separating the first encoded signal and the high frequency encoded signal;
Decoding the first encoded signal to generate a low-frequency decoded signal;
Decoding the highband encoded signal, generating a wideband decoded signal using the lowband decoded signal;
The high frequency encoded signal includes an energy ratio between a high frequency noise component which is a noise component and a high frequency non-tonal component of a high frequency decoded signal generated from the low frequency decoded signal,
Generating the decoded ratio, and adjusting the amplitude of the low frequency non-tonal signal, which is a non-tonal component of the low frequency decoded signal, with reference to the ratio;
Decoding method.

For a first encoded signal obtained by encoding, in an encoding device, a low frequency signal at or below a predetermined frequency of a voice or audio input signal, a high frequency encoded signal obtained by encoding a signal of higher frequency than the low frequency signal, and a band energy encoded signal,
Decoding the first encoded signal to generate a low-frequency decoded signal;
Decoding the highband encoded signal, generating a wideband decoded signal using the lowband decoded signal;
Decoding the band energy encoded signal to generate quantized band energy;
The low-frequency decoded signal is separated into a low-frequency non-tonal signal that is a non-tonal component of the low-frequency decoded signal and a low-frequency tonal signal that is a tonal component of the low-frequency decoded signal, and the high frequency encoding Using the lag information obtained by decoding the signal, the low frequency non-tonal signal is copied to the high frequency to generate a high frequency non-tonal signal,
Adjust the amplitude of the high-frequency non-tonal signal using a scale factor obtained by decoding the high-frequency encoded signal,
From the energy of the high frequency non-tonal signal and the quantization band energy, the energy of the high frequency tonal signal is estimated,
Combining the low frequency non-tonal signal and the high frequency non-tonal signal to generate a wideband non-tonal signal;
Using the lag information, the low frequency tonal signal is copied to a high frequency to generate a high frequency tonal signal,
Based on the energy of the high frequency tonal signal, adjust the amplitude of the high frequency tonal signal,
A broadband tonal signal is generated by combining the low frequency tonal signal and the high frequency tonal signal whose amplitude is adjusted,
Adding the broadband non-tonal signal and the broadband tonal signal to generate a wideband decoded signal;
The lag information is position information of a specific band that maximizes the correlation between the high frequency signal and the low frequency tonal signal,
The scale factor is an energy ratio between a high-frequency noise component that is a noise component of a high-frequency signal corresponding to the lag information and a high-frequency non-tonal signal.
Decryption method.
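
As a non-limiting illustration, the decoding steps recited above may be sketched in Python as follows. The sketch is not taken from the specification and rests on several assumptions: the tonal/non-tonal separation is a simple magnitude threshold, the high band is treated as a single band, the scale factor (an energy ratio) is applied to amplitudes as its square root, and the high-frequency tonal energy is estimated as the quantized band energy minus the energy of the scaled high-frequency non-tonal signal. The names `split_tonal`, `decode_wideband`, `hb_width`, and `threshold` are hypothetical.

```python
import math
from typing import List, Tuple


def energy(spec: List[float]) -> float:
    """Sum of squared spectral coefficients."""
    return sum(x * x for x in spec)


def split_tonal(spec: List[float], threshold: float) -> Tuple[List[float], List[float]]:
    """Split a spectrum into (non_tonal, tonal) parts.

    Assumption: a coefficient is tonal when its magnitude exceeds `threshold`.
    """
    non_tonal = [x if abs(x) <= threshold else 0.0 for x in spec]
    tonal = [x if abs(x) > threshold else 0.0 for x in spec]
    return non_tonal, tonal


def decode_wideband(low_dec: List[float], lag: int, scale_factor: float,
                    band_energy_q: float, hb_width: int,
                    threshold: float) -> List[float]:
    """Build a wideband spectrum from the low-frequency decoded spectrum."""
    low_nt, low_t = split_tonal(low_dec, threshold)

    # Copy the non-tonal part from the position given by the lag and scale it.
    # Assumption: amplitude gain is the square root of the energy ratio.
    nt_gain = math.sqrt(scale_factor)
    high_nt = [nt_gain * x for x in low_nt[lag:lag + hb_width]]

    # Assumption: tonal energy is what remains of the quantized band energy.
    tonal_target = max(band_energy_q - energy(high_nt), 0.0)

    # Copy the tonal part with the same lag and match it to the estimate.
    high_t = low_t[lag:lag + hb_width]
    e_t = energy(high_t)
    t_gain = math.sqrt(tonal_target / e_t) if e_t > 0.0 else 0.0
    high_t = [t_gain * x for x in high_t]

    # Combine low and high bands per component, then add the two components.
    wide_nt = low_nt + high_nt
    wide_t = low_t + high_t
    return [a + b for a, b in zip(wide_nt, wide_t)]
```

With `low_dec = [1.0, 4.0, 0.5, -0.5, 3.0, 1.0, 0.2, -0.2]`, `lag = 0`, `scale_factor = 0.25`, `band_energy_q = 4.375`, `hb_width = 4`, and `threshold = 2.0`, the first eight output coefficients equal `low_dec` and the four high-band coefficients are `[0.5, 2.0, 0.25, -0.25]`.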

A program that causes a processor to execute:
a process of generating a first encoded signal by encoding a low-frequency signal at or below a predetermined frequency of a voice or audio input signal;
a process of decoding the first encoded signal to generate a low-frequency decoded signal;
a process of generating a high-frequency encoded signal by encoding, based on the low-frequency decoded signal, a signal higher in frequency than the low-frequency signal;
a process of calculating an energy ratio between a high-frequency noise component, which is the noise component of the high-frequency signal, and a high-frequency non-tonal component of a high-frequency decoded signal generated from the low-frequency decoded signal; and
a process of multiplexing the first encoded signal and the high-frequency encoded signal, which contains the ratio, and outputting an encoded signal.
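
As a non-limiting illustration, the encoder-side quantities named in the claims, namely the lag information and the energy ratio, may be computed along the following lines. The sketch is not taken from the specification. It assumes that the lag is found by an exhaustive search maximizing a normalized cross-correlation, and that the ratio is taken as noise energy divided by non-tonal energy (the claim does not fix the direction of the ratio). The names `search_lag` and `energy_ratio` are hypothetical, and the extraction of the noise and non-tonal components is left to the caller.

```python
import math
from typing import Sequence


def energy(spec: Sequence[float]) -> float:
    """Sum of squared spectral coefficients."""
    return sum(x * x for x in spec)


def search_lag(high: Sequence[float], low: Sequence[float]) -> int:
    """Return the offset into `low` whose segment best matches `high`.

    Assumption: the match criterion is cross-correlation normalized by the
    energy of the candidate low-band segment.
    """
    width = len(high)
    best_lag, best_score = 0, float("-inf")
    for lag in range(len(low) - width + 1):
        segment = low[lag:lag + width]
        seg_energy = energy(segment)
        if seg_energy == 0.0:
            continue  # an all-zero segment cannot be scaled to match
        corr = sum(h * s for h, s in zip(high, segment))
        score = corr / math.sqrt(seg_energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def energy_ratio(high_noise: Sequence[float],
                 high_non_tonal: Sequence[float]) -> float:
    """Energy of the high-band noise component over that of the copied
    non-tonal component (assumed direction of the ratio)."""
    e_nt = energy(high_non_tonal)
    return energy(high_noise) / e_nt if e_nt > 0.0 else 0.0
```

For example, `search_lag([2.0, 4.0, 2.0], [0.0, 1.0, 2.0, 1.0, 0.0, -1.0])` selects offset 1, where the low-band segment `[1.0, 2.0, 1.0]` has the same shape as the high band, and `energy_ratio([1.0, 1.0], [2.0, 2.0])` gives 0.25.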

A program that causes a processor to execute, for a first encoded signal obtained by encoding, in an encoding device, a low-frequency signal at or below a predetermined frequency of a voice or audio input signal, and a high-frequency encoded signal obtained by encoding a signal higher in frequency than the low-frequency signal:
a process of separating the first encoded signal and the high-frequency encoded signal;
a process of decoding the first encoded signal to generate a low-frequency decoded signal; and
a process of decoding the high-frequency encoded signal and generating a wideband decoded signal using the low-frequency decoded signal,
wherein the high-frequency encoded signal includes an energy ratio between a high-frequency noise component, which is a noise component, and a high-frequency non-tonal component of a high-frequency decoded signal generated from the low-frequency decoded signal, and
the program decodes the ratio and adjusts, with reference to the decoded ratio, the amplitude of a low-frequency non-tonal signal that is the non-tonal component of the low-frequency decoded signal.
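
As a non-limiting illustration, the multiplexing performed by the encoding program and the separation performed by the decoding program may be pictured with the following sketch. The frame layout is purely hypothetical and is not taken from the specification: a 16-bit length field, the first encoded signal as opaque bytes, then a high-frequency encoded signal consisting of a 16-bit lag and the energy ratio stored as an unquantized 32-bit float. A practical codec would quantize the ratio to a small number of bits.

```python
import struct
from typing import Tuple

# Hypothetical layout of the high-frequency encoded signal:
# big-endian unsigned 16-bit lag followed by a 32-bit float energy ratio.
_HIGH_FMT = ">Hf"


def multiplex(first_encoded: bytes, lag: int, ratio: float) -> bytes:
    """Pack the first encoded signal and the high-frequency encoded signal
    (lag and energy ratio) into one frame."""
    high_encoded = struct.pack(_HIGH_FMT, lag, ratio)
    header = struct.pack(">H", len(first_encoded))
    return header + first_encoded + high_encoded


def demultiplex(frame: bytes) -> Tuple[bytes, int, float]:
    """Separate a frame into the first encoded signal, the lag, and the
    energy ratio."""
    (length,) = struct.unpack_from(">H", frame, 0)
    first_encoded = frame[2:2 + length]
    lag, ratio = struct.unpack_from(_HIGH_FMT, frame, 2 + length)
    return first_encoded, lag, ratio
```

A frame built with `multiplex(b"\x01\x02\x03", 1, 0.25)` is 11 bytes long, and `demultiplex` recovers the same payload, lag, and ratio, after which the decoded ratio can be used to adjust the amplitude of the low-frequency non-tonal signal.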